summaryrefslogtreecommitdiff
path: root/proposals/VK_AMD_shader_early_and_late_fragment_tests.adoc
blob: 8a8ec31b45e300023297444be606b8d260df7373 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
// Copyright 2021-2024 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

# VK_AMD_shader_early_and_late_fragment_tests
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document describes a proposal for a new SPIR-V execution mode that allows fragment shaders to be discarded by early fragment operations, even if they contain writes to storage resources or other side effects.


## Problem Statement

Most graphics devices are able to take advantage of early fragment operations when a fragment shader avoids certain operations - e.g. writing to depth or stencil, or to storage resources, providing significant performance advantages in most cases.
This is implicitly enabled wherever possible, and can be explicitly enabled by specifying the `EarlyFragmentTests` execution mode in SPIR-V.
However the `EarlyFragmentTests` execution mode makes it invalid to write fragment depth from a shader.

Some implementations can perform depth testing both before _and_ after fragment shading, allowing a conservative early test to discard most fragments and a late test to discard with more precision.
https://registry.khronos.org/OpenGL/extensions/ARB/ARB_conservative_depth.txt[GL_ARB_conservative_depth] added a way to enable this optimisation even when depth was written by the fragment shader, allowing further optimisations to be achieved in certain conditions.
However, if the shader also writes to storage resources, no such optimisation is possible due to the predictability requirements of the specification.
In cases where an application does not care whether storage writes are performed by a fragment shader when discarded, it is possible to use this capability for a significant performance improvement on some console platforms, but so far Vulkan has no mechanism to do this.
For some applications, this can mean an unnecessary performance hit that should be relatively straightforward to solve.


## Solution Space

There is only really one solution to this problem, which is to expose this capability to applications in some way.
The main question is how to expose this, at what granularity, and whether we should provide any guarantees.
Ultimately this should be relatively easy to turn on/off, and ideally should be set per-fragment shader in some form.

The main options for signifying the switch are:

  . Pipeline creation flag
  . SPIR-V execution mode
  . Non-semantic instruction

If it becomes a pipeline creation flag, it is easy to turn on/off on a per-pipeline basis.
However, the knowledge of whether the writes can be discarded is usually a property of whatever algorithm has been written in the fragment shader code itself, meaning this property has to be specified in two places.
From that perspective, it makes sense to have something in the shader code itself.

A SPIR-V execution mode is a straightforward way to express this, and it is consistent with the way that conservative depth is expressed in SPIR-V (e.g. `DepthGreater` https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Mode[Execution Mode]).
The downside of using an execution mode which is essentially an optimisation hint is that drivers have to implement the extension for the SPIR-V to be valid; when in fact it can be safely ignored in all cases.

Non-Semantic instructions seem like they could offer a way to specify the behavior without need for drivers to implement anything new.
Unfortunately, as the induced behavior violates the current Vulkan specification, it is not suitable for this use case, as it cannot be safely added to all shaders.

Without a wider "this can be ignored but can change behavior" extension along the lines of the non-semantic extension, a SPIR-V execution mode is likely the most suitable option.
Applications, a user-facing library, or a Vulkan software layer could be used to automatically remove the execution mode when not supported.
Implementations should also eventually be able to support the execution mode as a no-op if they do not have the required capabilities.


## Proposal

### New Vulkan feature

```c
typedef struct VkPhysicalDeviceShaderEarlyAndLateFragmentTestsFeaturesAMD {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           shaderEarlyAndLateFragmentTests;
} VkPhysicalDeviceShaderEarlyAndLateFragmentTestsFeaturesAMD;
```

This feature allows the new execution mode in SPIR-V shaders consumed by the implementation.


### New SPIR-V Execution Modes

A new execution mode is introduced which allows for early depth and stencil tests to be performed both early and late when depth and stencil writes are performed, in combination with the depth optimisations.
In order to allow for stencil reference writes with this new execution mode, similar stencil reference write optimisations are provided.

[cols="1,6,2,1",options="header"]
|====
2+^| Execution mode ^| Extra Operands ^| Enabling Capabilities
| 5017 | *EarlyAndLateFragmentTestsAMD* +
Fragment tests can be performed both before and after fragment shader execution, with latter tests taking values written to _FragDepth_ and _FragStencilRefEXT_ into account. Early tests are not guaranteed, late tests are.+
 +
If neither of *ExecutionModeDepthReplacing* or *ExecutionModeStencilRefReplacingEXT* are specified, functions identically to *EarlyFragmentTests*. +
If this and *ExecutionModeStencilRefReplacingEXT* are both specified, one of *StencilRefGreaterAMD*, *StencilRefLessAMD*, or *StencilRefUnchangedAMD* must also be specified. +
If this and *ExecutionModeDepthReplacing* are both specified, one of *DepthGreater*, *DepthLess*, or *DepthUnchanged* must also be specified. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
See client API for detail on fragment operations.
|
| *Shader*
| 5079 | *StencilRefUnchangedFrontAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is equal to the stencil reference value set for the front face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefGreaterAMD*, *StencilRefLessAMD*, and *StencilRefUnchangedAMD* can be specified.
|
| *StencilExportEXT*
| 5080 | *StencilRefGreaterFrontAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is greater than or equal to the stencil reference value set for the front face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefGreaterAMD*, *StencilRefLessAMD*, and *StencilRefUnchangedAMD* can be specified.
|
| *StencilExportEXT*
| 5081 | *StencilRefLessFrontAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is less than or equal to the stencil reference value  set for the front face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefGreaterAMD*, *StencilRefLessAMD*, and *StencilRefUnchangedAMD* can be specified.
|
| *StencilExportEXT*
| 5082 | *StencilRefUnchangedBackAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is equal to the stencil reference value set for the back face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefGreaterAMD*, *StencilRefLessAMD*, and *StencilRefUnchangedAMD* can be specified.
|
| *StencilExportEXT*
| 5083 | *StencilRefGreaterBackAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is greater than or equal to the stencil reference value set for the back face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefGreaterAMD*, *StencilRefLessAMD*, and *StencilRefUnchangedAMD* can be specified.
|
| *StencilExportEXT*
| 5084 | *StencilRefLessBackAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is less than or equal to the stencil reference value set for the back face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefGreaterAMD*, *StencilRefLessAMD*, and *StencilRefUnchangedAMD* can be specified.
|
| *StencilExportEXT*
|====

This allows implementations to perform both early and late tests explicitly.


### New GLSL Layout Qualifiers

The following new layout qualifiers are added to GLSL:

Fragment shaders allow the following stand-alone declaration:

```
__early_and_late_fragment_testsAMD
```

to request that certain fragment tests be performed before and after fragment shader execution, as described in
the "`Fragment Operations`" chapter of the Vulkan 1.2 Specification.
This declaration must appear in a line on its own.

The following additional standalone declarations may be specified:

```
layout-qualifier-id:
    __stencil_ref_unchanged_frontAMD
    __stencil_ref_less_frontAMD
    __stencil_ref_greater_frontAMD
    __stencil_ref_unchanged_backAMD
    __stencil_ref_less_backAMD
    __stencil_ref_greater_backAMD
```

These declarations must each appear in a line on their own.
Only one __stencil_ref_*_frontAMD and one __stencil_ref_*_backAMD declaration may be specified.
Each declaration constrains the intentions of the final value of `gl_FragStencilRefARB` written by any shader invocation. 
Implementations are allowed to perform optimizations assuming that the stencil test fails (or passes) for a given fragment if all values of `gl_FragStencilRefARB` consistent with the declaration would fail (or pass).
This potentially includes skipping shader execution if the fragment is discarded because it is occluded and the shader has no side effects.
If the final value of `gl_FragStencilRefARB` is inconsistent with the declaration for the facing of the shaded polygon, the result of the stencil test for the corresponding fragment is undefined.
If the stencil test passes and stencil writes are enabled, the value written to the stencil buffer is always the value of `gl_FragStencilRefARB`, whether or not it is consistent with the layout qualifier.

Each of the above qualifiers maps directly to the equivalently named spir-v execution mode.


### New HLSL Attributes

The following new https://github.com/microsoft/DirectXShaderCompiler/blob/master/docs/SPIR-V.rst#vulkan-specific-attributes[Vulkan Specific Attribute] is added:

  * `early_and_late_tests`: Marks an entry point as enabling early and late depth tests.
    If depth is written via `SV_Depth`, `depth_unchanged` must also be specified (SV_DepthLess and SV_DepthGreater can be written freely). 
    If a stencil reference value is written via `SV_StencilRef`, one of `stencil_ref_unchanged_front`, `stencil_ref_greater_equal_front`, or `stencil_ref_less_equal_front` and one of `stencil_ref_unchanged_back`, `stencil_ref_greater_equal_back`, or `stencil_ref_less_equal_back` must be specified.
  * `depth_unchanged`: Specifies that any depth written to `SV_Depth` will not invalidate the result of early depth tests.
     Sets the `DepthUnchanged` execution mode in SPIR-V.
  * `stencil_ref_unchanged_front`: Specifies that any stencil ref written to `SV_StencilRef` will not invalidate the result of early stencil tests when the fragment is front facing.
    Sets the `StencilRefUnchangedFrontAMD` execution mode in SPIR-V.
  * `stencil_ref_greater_equal_front`: Specifies that any stencil ref written to `SV_StencilRef` will be greater than or equal to the stencil reference value set by the API when the fragment is front facing.
    Sets the `StencilRefGreaterFrontAMD` execution mode in SPIR-V.
  * `stencil_ref_less_equal_front`: Specifies that any stencil ref written to `SV_StencilRef` will be less than or equal to the stencil reference value set by the API when the fragment is front facing.
    Sets the `StencilRefLessFrontAMD` execution mode in SPIR-V.
  * `stencil_ref_unchanged_back`: Specifies that any stencil ref written to `SV_StencilRef` will not invalidate the result of early stencil tests when the fragment is back facing.
    Sets the `StencilRefUnchangedBackAMD` execution mode in SPIR-V.
  * `stencil_ref_greater_equal_back`: Specifies that any stencil ref written to `SV_StencilRef` will be greater than or equal to the stencil reference value set by the API when the fragment is back facing.
    Sets the `StencilRefGreaterBackAMD` execution mode in SPIR-V.
  * `stencil_ref_less_equal_back`: Specifies that any stencil ref written to `SV_StencilRef` will be less than or equal to the stencil reference value set by the API when the fragment is back facing.
    Sets the `StencilRefLessBackAMD` execution mode in SPIR-V.

Shaders must not specify more than one of `stencil_ref_unchanged_front`, `stencil_ref_greater_equal_front`, and `stencil_ref_less_equal_front`.
Shaders must not specify more than one of `stencil_ref_unchanged_back`, `stencil_ref_greater_equal_back`, and `stencil_ref_less_equal_back`.


## Issues

### UNRESOLVED: Should we expose a feature/property indiciting if the implementation is actually going to perform early and late tests?

It would be useful if ultimately all implementations could ship this feature, treating it as a no-op where relevant - but if some implementations cannot gain any advantage from this, it might be reasonable to expose a property indicating this.