// Copyright 2023-2024 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_shader_object
:toc: left
:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/
:sectnums:

This document describes the proposed design for a new extension which aims to comprehensively address problems the pipeline abstraction has created for both applications and implementations.

== Problem Statement

When Vulkan 1.0 and its precursor Mantle were originally developed, the then-existing shader and state binding models of earlier APIs were beginning to show worrying limitations, both in terms of draw call scaling and the driver complexity needed to support them. Application developers were being artificially constrained from accessing the full capabilities of GPUs, and many IHVs were forced to maintain rat's nests of driver code full of heavy-handed draw time state validation and hacky shader patching, all in the service of simplicity at the API level. IHVs were understandably highly motivated to move away from such API designs.

Enter the new low-level APIs like Mantle and ultimately Vulkan. These APIs set out to reduce driver overhead by exposing lower-level abstractions that would hopefully avoid the need for the draw time state validation and shader patching that was so problematic for IHVs, and so detrimental to performance for applications.

One of the most significant changes to this end was the new concept of pipelines, which promised to shift the burden of the shader state combinatorics out of drivers and into applications, ideally avoiding the need for driver-side draw time state validation and shader patching entirely. The thinking went that application developers would design or redesign their renderers with pipelines in mind, and in so doing they would naturally learn to accomplish their goals with fewer combinations of state.

Implicit in such a design was an assumption that applications would be able to know and provide nearly all of this state upfront. A very limited set of dynamic states was specified for the few pieces of state that had effectively unbounded ranges of values, but otherwise even state that could have been fully dynamic on all implementations was required to be baked into the static pipeline objects. This, the thinking went, would benefit even those implementations where the state was internally dynamic by enabling new possibilities for optimization during shader compilation.

Also implicit in the design of pipelines was an assumption that the driver overhead of the pipeline abstraction would either be negligible, or that it would at least always be outweighed by the performance savings at draw time when compared to earlier APIs. The possibility that either setting dozens of individual pieces of state each time a pipeline is bound or tracking which of those dozens of pieces of state had changed since the previous pipeline bind might cause some implementations to exhibit problematically high overhead at pipeline bind time does not seem to have been a central consideration.

Many of these assumptions have since proven to be unrealistic.

On the application side, many developers considering or implementing Vulkan and similar APIs found them unable to efficiently support important use cases which were easily supportable in earlier APIs. This has not been simply a matter of developers being stuck in an old way of thinking or unwilling to "rein in" an unnecessarily large number of state combinations, but a reflection of the reality that the natural design patterns of the most demanding class of applications which use graphics APIs -- video games -- are inherently and deeply dependent on the very "dynamism" that pipelines set out to constrain.

As a result, renderers with a choice of API have largely chosen to avoid Vulkan and its "pipelined" contemporaries, while those without a choice have largely just devised workarounds to make these new APIs behave like the old ones -- usually in the form of the now nearly ubiquitous hash-n-cache pattern. These applications set various pieces of "pipeline" state independently, then hash it all at draw time and use the hash as a key into an application-managed pipeline cache, reusing an existing pipeline if it exists or creating and caching a new one if it does not. In effect, the messy and inefficient parts of GL drivers that pipelines sought to eliminate have simply moved into applications, except without the benefits of implementation specific knowledge which might have reduced their complexity or improved their performance.
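
The hash-n-cache pattern described above can be sketched roughly as follows. This is only an illustration, not code from any particular renderer: `DrawState`, `PipelineCache`, `create_pipeline_stub()`, and `get_or_create_pipeline()` are hypothetical names, the "pipeline" is a plain integer standing in for a `VkPipeline`, and the choice of FNV-1a hashing with linear probing is an assumption made for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the "pipeline" state an application sets
 * independently between draws. Real renderers track far more than this. */
typedef struct DrawState {
    uint32_t vertexShaderId;
    uint32_t fragmentShaderId;
    uint32_t cullMode;
    uint32_t blendEnable;
} DrawState;

#define CACHE_SLOTS 64u /* power of two; fixed size to keep the sketch short */

typedef struct PipelineCache {
    uint64_t keys[CACHE_SLOTS];
    uint32_t pipelines[CACHE_SLOTS]; /* stand-in for VkPipeline handles */
    uint8_t  used[CACHE_SLOTS];
    uint32_t createCount;            /* number of expensive creations so far */
} PipelineCache;

/* FNV-1a over the raw bytes of the state (the struct has no padding). */
static uint64_t hash_state(const DrawState *s)
{
    const unsigned char *p = (const unsigned char *)s;
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < sizeof *s; ++i) {
        h ^= p[i];
        h *= 1099511628211ull;
    }
    return h;
}

/* Stand-in for the expensive pipeline compile; returns a nonzero handle. */
static uint32_t create_pipeline_stub(PipelineCache *c)
{
    return ++c->createCount;
}

/* Called at draw time: reuse the pipeline cached for this state, or create
 * and cache a new one. The sketch treats equal hashes as equal state and
 * returns 0 if the table is full; production code would handle both. */
static uint32_t get_or_create_pipeline(PipelineCache *c, const DrawState *s)
{
    uint64_t key = hash_state(s);
    for (uint32_t i = 0; i < CACHE_SLOTS; ++i) {
        uint32_t slot = (uint32_t)((key + i) & (CACHE_SLOTS - 1u));
        if (!c->used[slot]) {
            c->used[slot] = 1;
            c->keys[slot] = key;
            c->pipelines[slot] = create_pipeline_stub(c);
            return c->pipelines[slot];
        }
        if (c->keys[slot] == key)
            return c->pipelines[slot];
    }
    return 0;
}
```

Every draw pays for the hash and lookup even when nothing has changed, and a miss triggers a full pipeline compile in the middle of a frame, which is the overhead the rest of this section is concerned with.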

This is not just a problem of "legacy" application code where it might be viable for the API to wait it out until application codebases are rewritten or replaced. Applications need the features they need, and are unlikely to remove features they need just to satisfy what they know to be artificial limitations imposed by a graphics API's made-up abstraction. This is especially true for developers working on platforms where the pipeline API does not offer substantial performance benefits over other APIs that do not share the same limitations.

On the driver side, pipelines have provided some of their desired benefits for some implementations, but for others they have largely just shifted draw time overhead to pipeline bind time (while in some cases still not entirely eliminating the draw time overhead in the first place). Implementations where nearly all "pipeline" state is internally dynamic are forced to either redundantly re-bind all of this state each time a pipeline is bound, or to track what state has changed from pipeline to pipeline -- either of which creates considerable overhead on CPU-constrained platforms.

For certain implementations, the pipeline abstraction has also locked away a significant amount of the flexibility supported by their hardware, thereby paradoxically leaving many of their capabilities inaccessible in the newer and ostensibly "low level" API, though still accessible through older, high level ones. In effect, this is a return to the old problem of the graphics API artificially constraining applications from accessing the full capabilities of the GPU, only on a different axis.

Finally, on fixed hardware platforms like game consoles and embedded systems, pipelines have created some additional and unique challenges. These platforms tend to have limited CPU performance, memory, and storage capacity all at the same time. Because of this, it is generally not desirable for applications on these platforms to waste storage space shipping both uncompiled SPIR-V and precompiled pipeline caches; however, it is also not desirable to compile the same shaders from scratch on each system (even if they could be cached for subsequent runs). Also, the hardware and even driver versions on these systems are typically known in advance, and drivers might only ever change in tandem with applications. Vulkan applications on these systems are forced to waste precious storage space not only on shipping both SPIR-V and pipeline cached versions of their shaders, but also on pipeline caches containing potentially large numbers of slightly differently optimized permutations of the same shader code, with only minor differences in pipeline state (arguably this last point is a compression problem, but opaque pipeline caches mostly leave applications at the mercy of the driver to solve it for them).

Fortunately, some of these problems have been acknowledged and various efforts have already begun to address several of them.

These existing efforts have mainly chosen to tackle problems through the lens of existing hash-n-cache type application architectures, and have focused on those problems which are most acute at pipeline compile time. Their goals have included things like reducing pipeline counts, improving the usability and efficiency of pipeline caches, and introducing more granularity to the pipeline compilation and caching process. The extensions they have produced have preferred a targeted, piecemeal, and minimally invasive "band-aid" approach over a more holistic "rip off the band-aid" redesign.

Such efforts have undoubtedly produced valuable improvements, but they have left the class of problems which manifest at bind time largely unaddressed. It might be possible to continue the existing piecemeal approach with a refocus onto bind time, but the solution space afforded by this kind of approach would necessarily remain constrained by the design decisions of the past.

== Solution Space

Several approaches are immediately apparent:

 . Extend the existing graphics pipeline library concept somehow, perhaps by adding optional new, more granular library types and/or making pipeline binaries directly bindable without needing to be explicitly linked into a pipeline object
 . Continue to expose more (maybe optional) dynamic state to minimize the number of pipeline objects needed
 . Abandon pipelines entirely and introduce new functionality to compile and bind shaders directly

Option 1 is a natural extension of recent efforts and requires relatively few API changes, but it adds even more complexity to the already very complex pipeline concept, while also failing to adequately address significant parts of the problem. While directly bindable pipeline libraries do reduce the dimensionality of pipeline combinatorics, they do not provide any meaningful absolute CPU performance improvement at pipeline bind time. The total overhead of binding N different pipeline libraries is still roughly on par with the overhead of binding a single (monolithic or linked) pipeline.

Option 2 also requires relatively few API changes and would do more to address bind time CPU performance than option 1, but this option is limited in both the class of issues it can address and its portability across implementations. Much of the universally supportable "low hanging fruit" dynamic state has already been exposed by the existing extended dynamic state extensions, and the remaining state is mostly not universally dynamic. Exposing states A and B as dynamic on one implementation and states B and C on another is still valuable, but it limits this approach's benefits for simplifying application architectures. Even though this option is not a complete solution, it can and should be pursued in parallel with other efforts -- both for its own sake and as a potential foundation for a more comprehensive solution.

Option 3 is more radical, but brings the API design more in line with developer expectations. The pipeline abstraction has been a consistent problem for many developers trying to use Vulkan since its inception, and this option can produce a cleaner, more user-friendly abstraction that bypasses the complexity of pipelines. With the benefit of years of hindsight and broader Working Group knowledge about the constraints of each other's implementations, it can aim to achieve a design which better balances API simplicity with adherence to the explicit design ethos of Vulkan.

This proposal focuses on option 3, for the reasons outlined above.

== Proposal

=== Shaders

This extension introduces a new object type `VkShaderEXT` which represents a single compiled shader stage. `VkShaderEXT` objects may be created either independently or linked with other `VkShaderEXT` objects created at the same time. To create `VkShaderEXT` objects, applications call `vkCreateShadersEXT()`:

[source,c]
----
VkResult vkCreateShadersEXT(
    VkDevice                                    device,
    uint32_t                                    createInfoCount,
    const VkShaderCreateInfoEXT*                pCreateInfos,
    const VkAllocationCallbacks*                pAllocator,
    VkShaderEXT*                                pShaders);
----

This function compiles the source code for one or more shader stages into `VkShaderEXT` objects. Whenever `createInfoCount` is greater than one, the shaders being created may optionally be linked together. Linking allows the implementation to perform cross-stage optimizations based on a promise by the application that the linked shaders will always be used together.

A set of linked shaders may perform anywhere from the same as to substantially better than equivalent unlinked shaders, but this tradeoff is left to the application and linking is never mandatory.

[source,c]
----
typedef enum VkShaderCreateFlagBitsEXT {
    VK_SHADER_CREATE_LINK_STAGE_BIT_EXT = 0x00000001,
    VK_SHADER_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT = 0x00000002,
    VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT = 0x00000004,
    VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT = 0x00000008,
    VK_SHADER_CREATE_DISPATCH_BASE_BIT_EXT = 0x00000010,
    VK_SHADER_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_EXT = 0x00000020,
    VK_SHADER_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT = 0x00000040
} VkShaderCreateFlagBitsEXT;
typedef VkFlags VkShaderCreateFlagsEXT;

typedef enum VkShaderCodeTypeEXT {
    VK_SHADER_CODE_TYPE_BINARY_EXT = 0,
    VK_SHADER_CODE_TYPE_SPIRV_EXT = 1
} VkShaderCodeTypeEXT;

typedef struct VkShaderCreateInfoEXT {
    VkStructureType                             sType;
    const void*                                 pNext;
    VkShaderCreateFlagsEXT                      flags;
    VkShaderStageFlagBits                       stage;
    VkShaderStageFlags                          nextStage;
    VkShaderCodeTypeEXT                         codeType;
    size_t                                      codeSize;
    const void*                                 pCode;
    const char*                                 pName;
    uint32_t                                    setLayoutCount;
    const VkDescriptorSetLayout*                pSetLayouts;
    uint32_t                                    pushConstantRangeCount;
    const VkPushConstantRange*                  pPushConstantRanges;
    const VkSpecializationInfo*                 pSpecializationInfo;
} VkShaderCreateInfoEXT;
----

To specify that shaders should be linked, include the `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` flag in each of the `VkShaderCreateInfoEXT` structures passed to `vkCreateShadersEXT()`. The presence or absence of `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` must match across all `VkShaderCreateInfoEXT` structures passed to a single `vkCreateShadersEXT()` call: i.e., if any member of `pCreateInfos` includes `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` then all other members must include it too. `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` is ignored if `createInfoCount` is one, and a shader created this way is considered unlinked.
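
An application could check the all-or-nothing rule above before calling `vkCreateShadersEXT()`. The following is a minimal sketch under stated assumptions: `CreateInfoSketch` is a hypothetical stand-in that carries only the `flags` member, and `LINK_STAGE_BIT` mirrors the value of `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` listed in the enum above so that the sketch compiles without the Vulkan headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors VK_SHADER_CREATE_LINK_STAGE_BIT_EXT from the enum above. */
#define LINK_STAGE_BIT 0x00000001u

/* Hypothetical stand-in for VkShaderCreateInfoEXT with only the field
 * this check needs. */
typedef struct CreateInfoSketch {
    uint32_t flags;
} CreateInfoSketch;

/* True when the link flag is present on every element or on none. */
static bool link_flags_consistent(const CreateInfoSketch *infos, uint32_t count)
{
    for (uint32_t i = 1; i < count; ++i) {
        if ((infos[i].flags & LINK_STAGE_BIT) != (infos[0].flags & LINK_STAGE_BIT))
            return false;
    }
    return true;
}

/* True when the resulting shaders would be linked. The flag is ignored
 * when only one shader is created, so a batch of one is never linked. */
static bool batch_is_linked(const CreateInfoSketch *infos, uint32_t count)
{
    return count > 1
        && (infos[0].flags & LINK_STAGE_BIT) != 0
        && link_flags_consistent(infos, count);
}
```

Other flag bits may differ freely between elements; only the link bit is compared.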

The stage of the shader being compiled is specified by `stage`. Applications must also specify which stage types will be allowed to immediately follow the shader being created. For example, a vertex shader might specify a `nextStage` value of `VK_SHADER_STAGE_FRAGMENT_BIT` to indicate that the vertex shader being created will always be followed by a fragment shader (and never a geometry or tessellation shader). Applications that do not know this information at shader creation time or need the same shader to be compatible with multiple subsequent stages can specify a mask that includes as many valid next stages as they wish. For example, a vertex shader can specify a `nextStage` mask of `VK_SHADER_STAGE_GEOMETRY_BIT | VK_SHADER_STAGE_FRAGMENT_BIT` to indicate that the next stage could be either a geometry shader or fragment shader (but not a tessellation shader).

[NOTE]
====
Certain implementations may incur a compile time and/or memory usage penalty whenever more than one stage bit is set in `nextStage`, so applications should strive to set the minimum number of bits they are able to. However, applications should *not* interpret this advice to mean that they should create multiple `VkShaderEXT` objects that differ only by the value of `nextStage`, as this will incur unnecessary overhead on implementations where `nextStage` is ignored.
====
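
The `nextStage` mask can be read as a simple membership test at bind time. The sketch below illustrates that reading only; `next_stage_allowed()` is a hypothetical application-side helper, and the `STAGE_*` constants are local copies of the core `VkShaderStageFlagBits` values so that the code compiles without the Vulkan headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the core VkShaderStageFlagBits values. */
enum {
    STAGE_VERTEX       = 0x01,
    STAGE_TESS_CONTROL = 0x02,
    STAGE_TESS_EVAL    = 0x04,
    STAGE_GEOMETRY     = 0x08,
    STAGE_FRAGMENT     = 0x10
};

/* True when boundNext (a single stage bit) is one of the stages the
 * shader declared in its nextStage mask at creation time. */
static bool next_stage_allowed(uint32_t nextStageMask, uint32_t boundNext)
{
    return (nextStageMask & boundNext) != 0;
}
```

For the vertex shader example above, a mask of `STAGE_GEOMETRY | STAGE_FRAGMENT` accepts a geometry or fragment shader as the following stage and rejects a tessellation control shader.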

The shader code is pointed to by `pCode` and may be provided as SPIR-V, or in an opaque implementation defined binary form specific to the physical device. The format of the shader code is specified by `codeType`.

The `codeType` of all `VkShaderCreateInfoEXT` structures passed to a `vkCreateShadersEXT()` call must match. This also means that only shaders created with the same `codeType` may be linked together.

Descriptor set layouts and push constant ranges used by each shader are specified directly (not via a `VkPipelineLayout`), though multiple stages can of course point to the same structures.

Any time after a `VkShaderEXT` object has been created, its binary shader code can be queried using `vkGetShaderBinaryDataEXT()`:

[source,c]
----
VkResult vkGetShaderBinaryDataEXT(
    VkDevice                                    device,
    VkShaderEXT                                 shader,
    size_t*                                     pDataSize,
    void*                                       pData);
----

When `pData` is `NULL`, the variable pointed to by `pDataSize` is filled with the number of bytes needed to store the shader’s binary code and `VK_SUCCESS` is returned.

When `pData` is non-`NULL`, `pDataSize` points to the application-provided size of `pData`. If the provided size is large enough then the location pointed to by `pData` is filled with the shader’s binary code and `VK_SUCCESS` is returned, otherwise nothing is written to `pData` and `VK_INCOMPLETE` is returned.
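
The size query followed by the data query forms the usual two-call idiom. The sketch below shows the caller's side of it. To stay runnable without a Vulkan driver it calls `mock_get_binary()`, a hypothetical mock that follows the semantics described above and serves a fixed five-byte blob; `SketchResult` and `fetch_binary()` are likewise illustrative names, not part of the extension.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for VK_SUCCESS and VK_INCOMPLETE. */
typedef enum { SKETCH_SUCCESS = 0, SKETCH_INCOMPLETE = 5 } SketchResult;

/* Fixed blob served by the mock in place of real shader binary code. */
static const unsigned char kBlob[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0x01 };

/* Mock of the query semantics described above. */
static SketchResult mock_get_binary(size_t *pDataSize, void *pData)
{
    if (pData == NULL) {
        *pDataSize = sizeof kBlob;   /* size query */
        return SKETCH_SUCCESS;
    }
    if (*pDataSize < sizeof kBlob)
        return SKETCH_INCOMPLETE;    /* too small: nothing is written */
    memcpy(pData, kBlob, sizeof kBlob);
    return SKETCH_SUCCESS;
}

/* Two-call idiom: query the size, allocate, then fetch the data.
 * Returns a malloc'd buffer the caller must free, or NULL on failure. */
static void *fetch_binary(size_t *outSize)
{
    size_t size = 0;
    if (mock_get_binary(&size, NULL) != SKETCH_SUCCESS)
        return NULL;

    void *data = malloc(size);
    if (data == NULL)
        return NULL;

    if (mock_get_binary(&size, data) != SKETCH_SUCCESS) {
        free(data);
        return NULL;
    }
    *outSize = size;
    return data;
}
```

In real code the two mock calls would be replaced by calls to `vkGetShaderBinaryDataEXT()` with the same `device` and `shader` arguments.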

The binary shader code returned in `pData` can be saved by the application and used in a future `vkCreateShadersEXT()` call (including on a different `VkInstance` and/or `VkDevice`) with a compatible physical device by setting `codeType` to `VK_SHADER_CODE_TYPE_BINARY_EXT`. This means that on fixed platforms like game consoles and embedded systems, applications need not ship SPIR-V shader code at all. If the binary shader code in any `VkShaderCreateInfoEXT` passed to `vkCreateShadersEXT()` is not compatible with the physical device then the `vkCreateShadersEXT()` call returns `VK_INCOMPATIBLE_SHADER_BINARY_EXT`.

Applications must pass the same values of `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` to a `vkCreateShadersEXT()` call with a `codeType` of `VK_SHADER_CODE_TYPE_BINARY_EXT` as were passed when those shaders were originally compiled from SPIR-V.
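
On platforms where SPIR-V is still shipped alongside cached binaries, an application might try the cached binary first and fall back to SPIR-V when the binary is reported as incompatible. The sketch below outlines that control flow only. Everything in it is hypothetical: `mock_create()` stands in for `vkCreateShadersEXT()`, and the `binaryVersion` tag is an invented way for the mock to decide compatibility, since how a real implementation decides is opaque to the application.

```c
#include <stddef.h>

/* Values mirror VkShaderCodeTypeEXT as listed earlier in this section. */
typedef enum { CODE_BINARY = 0, CODE_SPIRV = 1 } CodeTypeSketch;

/* Stand-ins for VK_SUCCESS and VK_INCOMPATIBLE_SHADER_BINARY_EXT. */
typedef enum { CREATE_OK, CREATE_INCOMPATIBLE_BINARY } CreateResultSketch;

/* Invented compatibility tag used only by the mock below. */
typedef struct DeviceSketch { unsigned binaryVersion; } DeviceSketch;
typedef struct CodeSketch {
    CodeTypeSketch type;
    unsigned       binaryVersion; /* ignored for SPIR-V */
} CodeSketch;

/* Mock: SPIR-V is always accepted, binaries only when the tags match. */
static CreateResultSketch mock_create(const DeviceSketch *dev, const CodeSketch *code)
{
    if (code->type == CODE_BINARY && code->binaryVersion != dev->binaryVersion)
        return CREATE_INCOMPATIBLE_BINARY;
    return CREATE_OK;
}

/* Prefer the cached binary and fall back to SPIR-V. Returns the code type
 * that was used, or -1 if neither source could be used. */
static int create_with_fallback(const DeviceSketch *dev,
                                const CodeSketch *cachedBinary,
                                const CodeSketch *spirv)
{
    if (cachedBinary != NULL && mock_create(dev, cachedBinary) == CREATE_OK)
        return CODE_BINARY;
    if (spirv != NULL && mock_create(dev, spirv) == CREATE_OK)
        return CODE_SPIRV;
    return -1;
}
```

After a fallback, the application would typically query the freshly compiled shader's binary again and replace the stale cache entry.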

`VkShaderEXT` objects can be bound on a command buffer using `vkCmdBindShadersEXT()`:

[source,c]
----
void vkCmdBindShadersEXT(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    stageCount,
    const VkShaderStageFlagBits*                pStages,
    const VkShaderEXT*                          pShaders);
----

It is possible to unbind shaders for a particular stage by calling `vkCmdBindShadersEXT()` with elements of `pShaders` set to `VK_NULL_HANDLE`. For example, an application may want to arbitrarily bind and unbind a known compatible passthrough geometry shader without knowing or caring what specific vertex and fragment shaders are bound at that time.

Regardless of whether the shaders were created with `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT`, the interfaces of all stages bound at `vkCmdDraw*()` time must be compatible. This means that the union of descriptor set layouts and push constant ranges across all bound shaders must not conflict, and that the inputs of each stage must be compatible with the outputs of the previous stage. It is the application's responsibility to ensure that this is the case, and the implementation will not do any draw time state validation to guard against this kind of invalid usage.

If any of the shaders bound at `vkCmdDraw*()` time were created with `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` then all shaders that were linked to that shader must also be bound. It is the application's responsibility to ensure that this is the case, and the implementation will not do any draw time state validation to guard against this kind of invalid usage.
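
Because the implementation does not validate this, an application may want its own debug-only check that every bound linked shader has all of its link partners bound. The following sketch is one possible bookkeeping scheme, not something the extension defines: `ShaderSketch`, its `linkGroup` and `groupSize` fields, and `linked_sets_complete()` are all hypothetical and would be maintained by the application when it creates shaders.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side record kept next to each VkShaderEXT. */
typedef struct ShaderSketch {
    uint32_t linkGroup; /* 0 = unlinked, else id of the creating batch */
    uint32_t groupSize; /* number of shaders created in that batch */
} ShaderSketch;

/* bound[i] is the shader currently bound to stage slot i, or NULL.
 * Returns true when, for every bound linked shader, the number of bound
 * shaders from the same link group equals the size of that group. */
static bool linked_sets_complete(const ShaderSketch *const *bound, uint32_t slotCount)
{
    for (uint32_t i = 0; i < slotCount; ++i) {
        if (bound[i] == NULL || bound[i]->linkGroup == 0)
            continue;

        uint32_t seen = 0;
        for (uint32_t j = 0; j < slotCount; ++j) {
            if (bound[j] != NULL && bound[j]->linkGroup == bound[i]->linkGroup)
                ++seen;
        }
        if (seen != bound[i]->groupSize)
            return false;
    }
    return true;
}
```

A check like this would run just before recording a draw in debug builds and be compiled out of release builds.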

When drawing with shaders bound with `vkCmdBindShadersEXT()` most state must be set dynamically. Specifically, the following existing commands must be used to set the corresponding state:

 * `vkCmdSetViewportWithCount()`
 * `vkCmdSetScissorWithCount()`
 * `vkCmdSetLineWidth()`
 * `vkCmdSetDepthBias()`
 * `vkCmdSetBlendConstants()`
 * `vkCmdSetDepthBounds()`
 * `vkCmdSetStencilCompareMask()`
 * `vkCmdSetStencilWriteMask()`
 * `vkCmdSetStencilReference()`
 * `vkCmdBindVertexBuffers2()`
 * `vkCmdSetCullMode()`
 * `vkCmdSetDepthBoundsTestEnable()`
 * `vkCmdSetDepthCompareOp()`
 * `vkCmdSetDepthTestEnable()`
 * `vkCmdSetDepthWriteEnable()`
 * `vkCmdSetFrontFace()`
 * `vkCmdSetPrimitiveTopology()`
 * `vkCmdSetStencilOp()`
 * `vkCmdSetStencilTestEnable()`
 * `vkCmdSetDepthBiasEnable()`
 * `vkCmdSetPrimitiveRestartEnable()`
 * `vkCmdSetRasterizerDiscardEnable()`
 * `vkCmdSetVertexInputEXT()`
 * `vkCmdSetLogicOpEXT()`
 * `vkCmdSetPatchControlPointsEXT()`
 * `vkCmdSetTessellationDomainOriginEXT()`
 * `vkCmdSetDepthClampEnableEXT()`
 * `vkCmdSetPolygonModeEXT()`
 * `vkCmdSetRasterizationSamplesEXT()`
 * `vkCmdSetSampleMaskEXT()`
 * `vkCmdSetAlphaToCoverageEnableEXT()`
 * `vkCmdSetAlphaToOneEnableEXT()`
 * `vkCmdSetLogicOpEnableEXT()`
 * `vkCmdSetColorBlendEnableEXT()`
 * `vkCmdSetColorBlendEquationEXT()`
 * `vkCmdSetColorWriteMaskEXT()`

If link:{refpage}VK_KHR_fragment_shading_rate.html[VK_KHR_fragment_shading_rate] is supported and enabled:

 * `vkCmdSetFragmentShadingRateKHR()`

If link:{refpage}VK_EXT_transform_feedback.html[VK_EXT_transform_feedback] is supported and enabled:

 * `vkCmdSetRasterizationStreamEXT()`

If link:{refpage}VK_EXT_discard_rectangle.html[VK_EXT_discard_rectangle] is supported and enabled:

 * `vkCmdSetDiscardRectangleEnableEXT()`
 * `vkCmdSetDiscardRectangleModeEXT()`
 * `vkCmdSetDiscardRectangleEXT()`

If link:{refpage}VK_EXT_conservative_rasterization.html[VK_EXT_conservative_rasterization] is supported and enabled:

 * `vkCmdSetConservativeRasterizationModeEXT()`
 * `vkCmdSetExtraPrimitiveOverestimationSizeEXT()`

If link:{refpage}VK_EXT_depth_clip_enable.html[VK_EXT_depth_clip_enable] is supported and enabled:

 * `vkCmdSetDepthClipEnableEXT()`

If link:{refpage}VK_EXT_sample_locations.html[VK_EXT_sample_locations] is supported and enabled:

 * `vkCmdSetSampleLocationsEnableEXT()`
 * `vkCmdSetSampleLocationsEXT()`

If link:{refpage}VK_EXT_blend_operation_advanced.html[VK_EXT_blend_operation_advanced] is supported and enabled:

 * `vkCmdSetColorBlendAdvancedEXT()`

If link:{refpage}VK_EXT_provoking_vertex.html[VK_EXT_provoking_vertex] is supported and enabled:

 * `vkCmdSetProvokingVertexModeEXT()`

If link:{refpage}VK_EXT_line_rasterization.html[VK_EXT_line_rasterization] is supported and enabled:

 * `vkCmdSetLineRasterizationModeEXT()`
 * `vkCmdSetLineStippleEnableEXT()`
 * `vkCmdSetLineStippleEXT()`

If link:{refpage}VK_EXT_depth_clip_control.html[VK_EXT_depth_clip_control] is supported and enabled:

 * `vkCmdSetDepthClipNegativeOneToOneEXT()`

If link:{refpage}VK_EXT_color_write_enable.html[VK_EXT_color_write_enable] is supported and enabled:

 * `vkCmdSetColorWriteEnableEXT()`

If link:{refpage}VK_NV_clip_space_w_scaling.html[VK_NV_clip_space_w_scaling] is supported and enabled:

 * `vkCmdSetViewportWScalingEnableNV()`
 * `vkCmdSetViewportWScalingNV()`

If link:{refpage}VK_NV_viewport_swizzle.html[VK_NV_viewport_swizzle] is supported and enabled:

 * `vkCmdSetViewportSwizzleNV()`

If link:{refpage}VK_NV_fragment_coverage_to_color.html[VK_NV_fragment_coverage_to_color] is supported and enabled:

 * `vkCmdSetCoverageToColorEnableNV()`
 * `vkCmdSetCoverageToColorLocationNV()`

If link:{refpage}VK_NV_framebuffer_mixed_samples.html[VK_NV_framebuffer_mixed_samples] is supported and enabled:

 * `vkCmdSetCoverageModulationModeNV()`
 * `vkCmdSetCoverageModulationTableEnableNV()`
 * `vkCmdSetCoverageModulationTableNV()`

If link:{refpage}VK_NV_coverage_reduction_mode.html[VK_NV_coverage_reduction_mode] is supported and enabled:

 * `vkCmdSetCoverageReductionModeNV()`

If link:{refpage}VK_NV_representative_fragment_test.html[VK_NV_representative_fragment_test] is supported and enabled:

 * `vkCmdSetRepresentativeFragmentTestEnableNV()`

If link:{refpage}VK_NV_shading_rate_image.html[VK_NV_shading_rate_image] is supported and enabled:

 * `vkCmdSetCoarseSampleOrderNV()`
 * `vkCmdSetShadingRateImageEnableNV()`
 * `vkCmdSetViewportShadingRatePaletteNV()`

If link:{refpage}VK_NV_scissor_exclusive.html[VK_NV_scissor_exclusive] is supported and enabled:

 * `vkCmdSetExclusiveScissorEnableNV()`
 * `vkCmdSetExclusiveScissorNV()`

If link:{refpage}VK_NV_fragment_shading_rate_enums.html[VK_NV_fragment_shading_rate_enums] is supported and enabled:

 * `vkCmdSetFragmentShadingRateEnumNV()`

Certain dynamic state setting commands have modified behavior from their original versions:

 * `vkCmdSetPrimitiveTopology()` does not have any constraints on the topology class (i.e., it behaves as if the `dynamicPrimitiveTopologyUnrestricted` property is `VK_TRUE` even when the actual property is `VK_FALSE`).
 * `vkCmdSetLogicOpEXT()` may be used on any implementation regardless of its support for the `extendedDynamicState2LogicOp` feature.
 * `vkCmdSetPatchControlPointsEXT()` may be used on any implementation regardless of its support for the `extendedDynamicState2PatchControlPoints` feature.

Any `VkShaderEXT` can be destroyed using `vkDestroyShaderEXT()`:

[source,c]
----
void vkDestroyShaderEXT(
    VkDevice                                    device,
    VkShaderEXT                                 shader,
    const VkAllocationCallbacks*                pAllocator);
----

Destroying a `VkShaderEXT` object used by action commands in one or more command buffers in the _recording_ or _executable_ states causes those command buffers to enter the _invalid_ state. A `VkShaderEXT` object must not be destroyed as long as any command buffer that issues any action command that uses it is in the _pending_ state.
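
A common way to respect the _pending_ rule is to defer destruction until the last submission that used the shader is known to have finished. The sketch below shows one such scheme under stated assumptions: `DestroyQueue`, `defer_destroy()`, and `flush_destroys()` are hypothetical names, shaders are represented by integer ids rather than `VkShaderEXT` handles, and the application is assumed to number its submissions and to learn (for example from a fence) which submission index has completed.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 32u

typedef struct PendingDestroy {
    uint32_t shaderId;      /* stand-in for a VkShaderEXT handle */
    uint64_t lastUseSubmit; /* index of the last submission using it */
} PendingDestroy;

typedef struct DestroyQueue {
    PendingDestroy items[MAX_PENDING];
    uint32_t count;
    uint32_t destroyed;     /* how many stand-in destroys have run */
} DestroyQueue;

/* Queue a shader for destruction once lastUseSubmit has completed.
 * Returns false if the fixed-size queue is full. */
static bool defer_destroy(DestroyQueue *q, uint32_t shaderId, uint64_t lastUseSubmit)
{
    if (q->count == MAX_PENDING)
        return false;
    q->items[q->count].shaderId = shaderId;
    q->items[q->count].lastUseSubmit = lastUseSubmit;
    ++q->count;
    return true;
}

/* Destroy every queued shader whose last use is at or before
 * completedSubmit, and keep the rest in their original order. */
static void flush_destroys(DestroyQueue *q, uint64_t completedSubmit)
{
    uint32_t kept = 0;
    for (uint32_t i = 0; i < q->count; ++i) {
        if (q->items[i].lastUseSubmit <= completedSubmit)
            ++q->destroyed; /* real code would call vkDestroyShaderEXT() here */
        else
            q->items[kept++] = q->items[i];
    }
    q->count = kept;
}
```

This only covers the _pending_ case; command buffers still in the _recording_ or _executable_ states that reference a destroyed shader become invalid and must be reset or re-recorded.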

== Examples

=== Graphics

Consider an application which always treats sets of shader stages as complete programs.

At startup time, the application compiles and links the shaders for each complete program:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo[2] = {
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize,
        .pCode = pFragmentShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    }
};

VkShaderEXT shaders[2];

vkCreateShadersEXT(device, 2, shaderInfo, NULL, shaders);
----

Later at draw time, the application binds the linked vertex and fragment shaders forming a complete program:

[source,c]
----
VkShaderStageFlagBits stages[2] = {
    VK_SHADER_STAGE_VERTEX_BIT,
    VK_SHADER_STAGE_FRAGMENT_BIT
};
vkCmdBindShadersEXT(commandBuffer, 2, stages, shaders);
----

Alternatively, the same result could be achieved by:

[source,c]
----
{
    VkShaderStageFlagBits stage = VK_SHADER_STAGE_VERTEX_BIT;
    vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[0]);
}

{
    VkShaderStageFlagBits stage = VK_SHADER_STAGE_FRAGMENT_BIT;
    vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[1]);
}
----

If the `tessellationShader` or `geometryShader` features are enabled on the device, the application binds `VK_NULL_HANDLE` to the corresponding unused shader stages:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
VkShaderEXT unusedShaders[3] = { VK_NULL_HANDLE, VK_NULL_HANDLE, VK_NULL_HANDLE };
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, unusedShaders);
----

Alternatively, the same result could be achieved by:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
// Setting pShaders to NULL is equivalent to specifying an array of stageCount VK_NULL_HANDLE values
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, NULL);
----

Finally, the application issues a draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----

Now consider a different application which needs to mix and match vertex and fragment shaders in arbitrary combinations that are not predictable at shader compile time.

At startup time, the application compiles unlinked vertex and fragment shaders:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo[3] = {
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[0],
        .pCode = pFragmentShaderSpirv[0],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[1],
        .pCode = pFragmentShaderSpirv[1],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    }
};

VkShaderEXT shaders[3];

vkCreateShadersEXT(device, 3, shaderInfo, NULL, shaders);
----

Alternatively, the same result could be achieved by:

[source,c]
----
VkShaderEXT shaders[3];

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[0]);
}

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[0],
        .pCode = pFragmentShaderSpirv[0],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[1]);
}

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[1],
        .pCode = pFragmentShaderSpirv[1],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[2]);
}
----

Later at draw time, the application binds independent vertex and fragment shaders forming a complete program:

[source,c]
----
VkShaderStageFlagBits stages[2] = {
    VK_SHADER_STAGE_VERTEX_BIT,
    VK_SHADER_STAGE_FRAGMENT_BIT
};
vkCmdBindShadersEXT(commandBuffer, 2, stages, shaders);
----

If the `tessellationShader` or `geometryShader` features are enabled on the device, the application binds `VK_NULL_HANDLE` to the corresponding unused stages:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
// Setting pShaders to NULL is equivalent to specifying an array of stageCount VK_NULL_HANDLE values
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, NULL);
----

Then, the application issues a draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----

Later, the application binds a different fragment shader without disturbing any other stages:

[source,c]
----
VkShaderStageFlagBits stage = VK_SHADER_STAGE_FRAGMENT_BIT;
vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[2]);
----

Finally, the application issues another draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----

=== Compute

At startup time, the application compiles a compute shader:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo = {
    .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
    .pNext = NULL,
    .flags = 0,
    .stage = VK_SHADER_STAGE_COMPUTE_BIT,
    .nextStage = 0,
    .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
    .codeSize = computeShaderSpirvSize,
    .pCode = pComputeShaderSpirv,
    .pName = "main",
    .setLayoutCount = 1,
    .pSetLayouts = &descriptorSetLayout,
    .pushConstantRangeCount = 0,
    .pPushConstantRanges = NULL,
    .pSpecializationInfo = NULL
};

VkShaderEXT shader;

vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shader);
----

Later, the application binds the compute shader:

[source,c]
----
VkShaderStageFlagBits stage = VK_SHADER_STAGE_COMPUTE_BIT;
vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shader);
----

Finally, the application dispatches the compute work:

[source,c]
----
vkCmdDispatch(commandBuffer, ...);
----

== Issues

=== RESOLVED: How should implementations which absolutely must link shader stages implement this extension?

The purpose of this extension is to expose the flexibility of those implementations which allow arbitrary combinations of unlinked but compatible shader stages and state to be bound independently. Attempting to modify this extension to support implementations which do not have this flexibility would defeat the entire purpose of the extension. For this reason, implementations which do not have the required flexibility should not implement this extension.

IHVs whose implementations have such limitations today are encouraged to consider incorporating changes which could remove these limitations into their future hardware roadmaps.

=== RESOLVED: Should this extension try to reuse pipeline objects and concepts?

No - the pipeline abstraction was never designed to accommodate such a radically different model.

Avoiding the introduction of a new object type and a handful of new entry points is not a compelling reason to continue to pile less and less pipeline-like functionality into pipelines. Doing so would needlessly constrict or even undermine the design and future extensibility of both models.

=== RESOLVED: Should binary shader support be exposed in some way similar to existing pipeline caches or pipeline binaries?

No - fixed platforms like game consoles and embedded systems have constraints which make shipping both SPIR-V and binary copies of the same shader code undesirable.

=== RESOLVED: Should there be some kind of shader program object to represent a set of linked shaders?

No - the compiled code for each shader stage is represented by a single `VkShaderEXT` object whether it is linked to other stages or not.

Introducing a shader program object would overly complicate the API and impose a new and unnecessary object lifetime management burden on applications. Vulkan is a low level API, and it should be the application's responsibility to ensure that it keeps any promises it chooses to make about binding the correct stages together.

[NOTE]
====
Whenever shaders are created linked together, the rules for binding them give implementations the freedom to (for example) internally store the compiled code for multiple linked stages in a single stage's `VkShaderEXT` object and to leave the other stages' `VkShaderEXT` objects internally unused, though this is *strongly* discouraged.
====

=== RESOLVED: Should there be some mechanism for applications to provide static state that is known at compile time?

Not as part of this extension - it is possible to imagine some kind of "shader optimization hint" functionality to let applications provide implementations with "static state" similar to the existing static state in pipelines, but on an opt-in rather than opt-out basis. By providing a given piece of state in an optimization hint at shader creation time, an application could promise that the equivalent piece of dynamic state would always be set to some specific value whenever that shader is used, thereby allowing implementations to perform compile time optimizations similar to those they can make with pipelines today.

For already pipeline-friendly applications with lots of static state this could serve as a "gentler" version of pipelines that might provide the best of both worlds, but it is unclear that the benefits of such a scheme for the (pipeline-unfriendly) majority of applications which actually need this extension would outweigh the costs of the added complexity to the API.

If such functionality turns out to be important, it can be noninvasively layered on top of this extension in the form of another extension. Until then, applications wanting something that behaves like pipelines should just use pipelines.

=== RESOLVED: Should this extension expose some abstraction for setting groups of related state?

No - an earlier version of this proposal exposed a mechanism for applications to pre-create "interface shaders" which could then be bound on a command buffer to reduce draw time overhead. This added complexity to the API, and it was unclear that this solution would be able to deliver meaningful performance improvements over setting individual pieces of state on the command buffer.

Such an abstraction may prove beneficial for certain implementations, but it should not be designed until those implementations have at least attempted to implement support for this extension in its existing form.

=== RESOLVED: There is currently no dynamic state setting functionality for sample shading. How should this be handled?

Sample shading is already implicitly enabled (with `minSampleShading` = 1.0) whenever a shader reads from the `SampleId` or `SamplePosition` builtins. The main functionality missing in the absence of dynamic sample shading is the ability to specify `minSampleShading` values other than 1.0.

This could be addressed by introducing a new `MinSampleShading` shader builtin which can be either hard-coded or specialized at SPIR-V compile time using the existing specialization constant mechanism. However, since introducing this functionality is orthogonal to the objective of this extension, it is left to a different extension.

Until such an extension is available, applications that need to specify a `minSampleShading` value other than 1.0 should use pipelines.

=== RESOLVED: Is `VK_INCOMPATIBLE_SHADER_BINARY_EXT` a success code, or an error code?

A success code.

Initially this token was named `VK_ERROR_INCOMPATIBLE_SHADER_BINARY_EXT`,
but as pointed out in
https://github.com/KhronosGroup/Vulkan-Docs/issues/2295 the numeric value
assigned to the token was positive.

On further discussion we agreed that the return code was a success code,
much as `VK_INCOMPLETE` is, and aliased the original name to the current name
without `ERROR` in it.
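
The practical consequence is that a check of the form `result < 0` does not catch an incompatible binary, so callers need to test for this code explicitly. The following is a minimal sketch of one way to classify the return value of `vkCreateShadersEXT`. It uses stand-in names (`SketchResult`, `SKETCH_*`, `classifyShaderCreateResult`) so that it builds without the Vulkan headers; these names are hypothetical and not part of the API. The positive value used for the stand-in token is an assumption and only its sign matters to the logic. Falling back to SPIR-V is likewise just one possible application policy.

[source,c]
----
#include <stdint.h>

/* Stand-ins so this sketch builds without the Vulkan headers. Real code
 * would include <vulkan/vulkan.h> and use VkResult and the VK_* tokens. */
typedef int32_t SketchResult;
#define SKETCH_SUCCESS                    0
#define SKETCH_INCOMPATIBLE_SHADER_BINARY 1000482000 /* assumed value; positive, like any success code */
#define SKETCH_ERROR_OUT_OF_HOST_MEMORY   (-1)

/* What the (hypothetical) caller does next. */
typedef enum {
    SHADER_CREATE_OK,
    SHADER_CREATE_FALLBACK_TO_SPIRV,
    SHADER_CREATE_FAILED
} ShaderCreateAction;

static ShaderCreateAction classifyShaderCreateResult(SketchResult result)
{
    /* Vulkan error codes are negative; everything else is a success code. */
    if (result < 0) {
        return SHADER_CREATE_FAILED;
    }

    /* A success code that still means the binary was not usable, so it
     * must be tested for explicitly. */
    if (result == SKETCH_INCOMPATIBLE_SHADER_BINARY) {
        return SHADER_CREATE_FALLBACK_TO_SPIRV;
    }

    return SHADER_CREATE_OK;
}
----

An application that ships only binaries has no SPIR-V to fall back to and would instead need to treat this result as a failure in its own logic.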

== Further Functionality

 * Shader optimization hints
 * State grouping
 * Ray tracing shader objects