// Copyright 2017-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

[appendix]
[[memory-model]]
= Memory Model

[NOTE]
.Note
====
This memory model describes synchronizations provided by all
implementations; however, some of the synchronizations defined require extra
features to be supported by the implementation.
ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
See slink:VkPhysicalDeviceVulkanMemoryModelFeatures.
endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
====

[[memory-model-agent]]
== Agent

_Operation_ is a general term for any task that is executed on the system.

[NOTE]
.Note
====
An operation is by definition something that is executed.
Thus if an instruction is skipped due to control flow, it does not
constitute an operation.
====

Each operation is executed by a particular _agent_.
Possible agents include each shader invocation, each host thread, and each
fixed-function stage of the pipeline.


[[memory-model-memory-location]]
== Memory Location

A _memory location_ identifies unique storage for 8 bits of data.
Memory operations access a _set of memory locations_ consisting of one or
more memory locations at a time, e.g. an operation accessing a 32-bit
integer in memory would read/write a set of four memory locations.
Memory operations that access whole aggregates may: access any padding bytes
between elements or members, but no padding bytes at the end of the
aggregate.
Two sets of memory locations _overlap_ if the intersection of their sets of
memory locations is non-empty.
A memory operation must: not affect memory at a memory location not within
its set of memory locations.
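
[NOTE]
.Note
====
As an informal host-side illustration, consider a C++ aggregate with
internal padding (a minimal sketch; the struct and its layout are
hypothetical and implementation-defined):

[source,c++]
----
#include <cstdint>

// On a typical implementation this struct occupies 8 bytes, with three
// padding bytes between a and b.
struct S {
    std::uint8_t  a; // one memory location
    std::uint32_t b; // a set of four memory locations
};

void copyWhole(S* dst, const S* src) {
    *dst = *src; // whole-aggregate access: may also access the padding
                 // locations between a and b, but none past the end
}
----
====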

Memory locations for buffers and images are explicitly allocated in
slink:VkDeviceMemory objects, and are implicitly allocated for SPIR-V
variables in each shader invocation.

ifdef::VK_KHR_workgroup_memory_explicit_layout[]
Variables with code:Workgroup storage class that point to a block-decorated
type share a set of memory locations.
endif::VK_KHR_workgroup_memory_explicit_layout[]


[[memory-model-allocation]]
== Allocation

The values stored in newly allocated memory locations are determined by a
SPIR-V variable's initializer, if present, or else are undefined:.
At the time an allocation is created there have been no
<<memory-model-memory-operation,memory operations>> to any of its memory
locations.
The initialization is not considered to be a memory operation.

[NOTE]
.Note
====
For tessellation control shader output variables, a consequence of
initialization not being considered a memory operation is that some
implementations may need to insert a barrier between the initialization of
the output variables and any reads of those variables.
====


[[memory-model-memory-operation]]
== Memory Operation

For an operation A and memory location M:

  * [[memory-model-access-read]] A _reads_ M if and only if the data stored
    in M is an input to A.
  * [[memory-model-access-write]] A _writes_ M if and only if the data
    output from A is stored to M.
  * [[memory-model-access-access]] A _accesses_ M if and only if it either
    reads or writes (or both) M.

[NOTE]
.Note
====
A write whose value is the same as what was already in those memory
locations is still considered to be a write and has all the same effects.
====


[[memory-model-references]]
== Reference

A _reference_ is an object that a particular agent can: use to access a set
of memory locations.
On the host, a reference is a host virtual address.
On the device, a reference is:

  * The descriptor that a variable is bound to, for variables in
    code:Image, code:Uniform, or code:StorageBuffer storage classes.
    If the variable is an array (or array of arrays, etc.) then each element
    of the array may: be a unique reference.
ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
  * The address range for a buffer in code:PhysicalStorageBuffer storage
    class, where the base of the address range is queried with
ifndef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    flink:vkGetBufferDeviceAddressEXT
endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
ifdef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    flink:vkGetBufferDeviceAddress
endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    and the length of the range is the size of the buffer.
endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
ifdef::VK_KHR_workgroup_memory_explicit_layout[]
  * A single common reference for all variables with code:Workgroup storage
    class that point to a block-decorated type.
  * The variable itself for non-block-decorated type variables in
    code:Workgroup storage class.
endif::VK_KHR_workgroup_memory_explicit_layout[]
  * The variable itself for variables in other storage classes.

Two memory accesses through distinct references may: require availability
and visibility operations as defined
<<memory-model-location-ordered,below>>.


[[memory-model-program-order]]
== Program-Order

A _dynamic instance_ of an instruction is defined in SPIR-V
(https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#DynamicInstance)
as a way of referring to a particular execution of a static instruction.
Program-order is an ordering on dynamic instances of instructions executed
by a single shader invocation:

  * (Basic block): If instructions A and B are in the same basic block, and
    A is listed in the module before B, then the n'th dynamic instance of A
    is program-ordered before the n'th dynamic instance of B.
  * (Branch): The dynamic instance of a branch or switch instruction is
    program-ordered before the dynamic instance of the OpLabel instruction
    to which it transfers control.
  * (Call entry): The dynamic instance of an code:OpFunctionCall instruction
    is program-ordered before the dynamic instances of the
    code:OpFunctionParameter instructions and the body of the called
    function.
  * (Call exit): The dynamic instance of the instruction following an
    code:OpFunctionCall instruction is program-ordered after the dynamic
    instance of the return instruction executed by the called function.
  * (Transitive Closure): If dynamic instance A of any instruction is
    program-ordered before dynamic instance B of any instruction and B is
    program-ordered before dynamic instance C of any instruction then A is
    program-ordered before C.
  * (Complete definition): No other dynamic instances are program-ordered.

For instructions executed on the host, the source language defines the
program-order relation (e.g. as "`sequenced-before`").


ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
[[shader-call-related]]
== Shader Call Related

Shader-call-related is an equivalence relation on invocations defined as the
symmetric and transitive closure of:

  * A is shader-call-related to B if A is created by an
    <<ray-tracing-repack,invocation repack>> instruction executed by B.


[[shader-call-order]]
== Shader Call Order

Shader-call-order is a partial order on dynamic instances of instructions
executed by invocations that are shader-call-related:

  * (Program order): If dynamic instance A is program-ordered before B, then
    A is shader-call-ordered before B.
  * (Shader call entry): If A is a dynamic instance of an
    <<ray-tracing-repack,invocation repack>> instruction and B is a dynamic
    instance executed by an invocation that is created by A, then A is
    shader-call-ordered before B.
  * (Shader call exit): If A is a dynamic instance of an
    <<ray-tracing-repack,invocation repack>> instruction, B is the next
    dynamic instance executed by the same invocation, and C is a dynamic
    instance executed by an invocation that is created by A, then C is
    shader-call-ordered before B.
  * (Transitive closure): If A is shader-call-ordered-before B and B is
    shader-call-ordered-before C, then A is shader-call-ordered-before C.
  * (Complete definition): No other dynamic instances are
    shader-call-ordered.
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]


[[memory-model-scope]]
== Scope

Atomic and barrier instructions include scopes which identify sets of shader
invocations that must: obey the requested ordering and atomicity rules of
the operation, as defined below.

The various scopes are described in detail in <<shaders-scope, the Shaders
chapter>>.


[[memory-model-atomic-operation]]
== Atomic Operation

An _atomic operation_ on the device is any SPIR-V operation whose name
begins with code:OpAtomic.
An atomic operation on the host is any operation performed with an
std::atomic typed object.

Each atomic operation has a memory <<memory-model-scope,scope>> and a
<<memory-model-memory-semantics,semantics>>.
Informally, the scope determines which other agents it is atomic with
respect to, and the <<memory-model-memory-semantics,semantics>> constrains
its ordering against other memory accesses.
Device atomic operations have explicit scopes and semantics.
Each host atomic operation implicitly uses the code:CrossDevice scope, and
uses memory semantics equivalent to a C++ std::memory_order value of
relaxed, acquire, release, acq_rel, or seq_cst.
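
[NOTE]
.Note
====
For example, on the host, each of the following is an atomic operation with
code:CrossDevice scope and the indicated semantics (a minimal C++ sketch):

[source,c++]
----
#include <atomic>

std::atomic<int> counter{0};

void hostAgent() {
    counter.fetch_add(1, std::memory_order_relaxed); // relaxed semantics
    counter.store(42, std::memory_order_release);    // release semantics
    int v = counter.load(std::memory_order_acquire); // acquire semantics
    (void)v;
}
----
====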

Two atomic operations A and B are _potentially-mutually-ordered_ if and only
if all of the following are true:

  * They access the same set of memory locations.
  * They use the same reference.
  * A is in the instance of B's memory scope.
  * B is in the instance of A's memory scope.
  * A and B are not the same operation (irreflexive).

Two atomic operations A and B are _mutually-ordered_ if and only if they are
potentially-mutually-ordered and any of the following are true:

  * A and B are both device operations.
  * A and B are both host operations.
  * A is a device operation, B is a host operation, and the implementation
    supports concurrent host- and device-atomics.

[NOTE]
.Note
====
If two atomic operations are not mutually-ordered, and if their sets of
memory locations overlap, then each must: be synchronized against the other
as if they were non-atomic operations.
====


[[memory-model-scoped-modification-order]]
== Scoped Modification Order

For a given atomic write A, all atomic writes that are mutually-ordered with
A occur in an order known as A's _scoped modification order_.
A's scoped modification order relates no other operations.

[NOTE]
.Note
====
Invocations outside the instance of A's memory scope may: observe the values
at A's set of memory locations becoming visible to them in an order that
disagrees with the scoped modification order.
====

[NOTE]
.Note
====
It is valid to have non-atomic operations or atomics in a different scope
instance to the same set of memory locations, as long as they are
synchronized against each other as if they were non-atomic (if they are not,
it is treated as a <<memory-model-access-data-race,data race>>).
That means this definition of A's scoped modification order could include
atomic operations that occur much later, after intervening non-atomics.
That is a bit non-intuitive, but it helps to keep this definition simple and
non-circular.
====


[[memory-model-memory-semantics]]
== Memory Semantics

Non-atomic memory operations, by default, may: be observed by one agent in a
different order than they were written by another agent.

Atomics and some synchronization operations include _memory semantics_,
which are flags that constrain the order in which other memory accesses
(including non-atomic memory accesses and
<<memory-model-availability-visibility,availability and visibility
operations>>) performed by the same agent can: be observed by other agents,
or can: observe accesses by other agents.

Device instructions that include semantics are code:OpAtomic*,
code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier.
Host instructions that include semantics are some std::atomic methods and
memory fences.

SPIR-V supports the following memory semantics:

  * Relaxed: No constraints on order of other memory accesses.
  * Acquire: A memory read with this semantic performs an _acquire
    operation_.
    A memory barrier with this semantic is an _acquire barrier_.
  * Release: A memory write with this semantic performs a _release
    operation_.
    A memory barrier with this semantic is a _release barrier_.
  * AcquireRelease: A memory read-modify-write operation with this semantic
    performs both an acquire operation and a release operation, and inherits
    the limitations on ordering from both of those operations.
    A memory barrier with this semantic is both a release and acquire
    barrier.

[NOTE]
.Note
====
SPIR-V does not support "`consume`" semantics on the device.
====

The memory semantics operand also includes _storage class semantics_ which
indicate which storage classes are constrained by the synchronization.
SPIR-V storage class semantics include:

  * UniformMemory
  * WorkgroupMemory
  * ImageMemory
  * OutputMemory

Each SPIR-V memory operation accesses a single storage class.
Semantics in synchronization operations can include a combination of storage
classes.

The UniformMemory storage class semantic applies to accesses to memory in
the
ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
code:PhysicalStorageBuffer,
endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:ShaderRecordBufferKHR,
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Uniform, and code:StorageBuffer storage classes.
The WorkgroupMemory storage class semantic applies to accesses to memory in
the code:Workgroup storage class.
The ImageMemory storage class semantic applies to accesses to memory in the
code:Image storage class.
The OutputMemory storage class semantic applies to accesses to memory in the
code:Output storage class.

[NOTE]
.Note
====
Informally, these constraints limit how memory operations can be reordered,
and these limits apply not only to the order of accesses as performed in the
agent that executes the instruction, but also to the order the effects of
writes become visible to all other agents within the same instance of the
instruction's memory scope.
====

[NOTE]
.Note
====
Release and acquire operations in different threads can: act as
synchronization operations, to guarantee that writes that happened before
the release are visible after the acquire.
(This is not a formal definition, just an informative forward reference.)
====

[NOTE]
.Note
====
The OutputMemory storage class semantic is only useful in tessellation
control shaders, which is the only execution model where output variables
are shared between invocations.
====

The memory semantics operand can: also include availability and visibility
flags, which apply availability and visibility operations as described in
<<memory-model-availability-visibility,availability and visibility>>.
The availability/visibility flags are:

  * MakeAvailable: Semantics must: be Release or AcquireRelease.
    Performs an availability operation before the release operation or
    barrier.
  * MakeVisible: Semantics must: be Acquire or AcquireRelease.
    Performs a visibility operation after the acquire operation or barrier.

The specifics of these operations are defined in
<<memory-model-availability-visibility-semantics,Availability and Visibility
Semantics>>.

Host atomic operations may: support a different list of memory semantics and
synchronization operations, depending on the host architecture and source
language.


[[memory-model-release-sequence]]
== Release Sequence

After an atomic operation A performs a release operation on a set of memory
locations M, the _release sequence headed by A_ is the longest continuous
subsequence of A's scoped modification order that consists of:

  * the atomic operation A as its first element
  * atomic read-modify-write operations on M by any agent

[NOTE]
.Note
====
The atomics in the last bullet must: be mutually-ordered with A by virtue of
being in A's scoped modification order.
====

[NOTE]
.Note
====
This intentionally omits "`atomic writes to M performed by the same agent
that performed A`", which is present in the corresponding C++ definition.
====


[[memory-model-synchronizes-with]]
== Synchronizes-With

_Synchronizes-with_ is a relation between operations, where each operation
is either an atomic operation or a memory barrier (known as a fence on the
host).

If A and B are atomic operations, then A synchronizes-with B if and only if
all of the following are true:

  * A performs a release operation
  * B performs an acquire operation
  * A and B are mutually-ordered
  * B reads a value written by A or by an operation in the release sequence
    headed by A
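
[NOTE]
.Note
====
A minimal host-side C++ sketch of these conditions: the release store A
synchronizes-with the acquire load B when B reads the value written by A.

[source,c++]
----
#include <atomic>
#include <cassert>

int data = 0;                   // non-atomic payload
std::atomic<bool> flag{false};

void producer() {
    data = 1;                                    // program-ordered before A
    flag.store(true, std::memory_order_release); // A: release operation
}

void consumer() {
    if (flag.load(std::memory_order_acquire)) {  // B: acquire operation
        // B read the value written by A, so A synchronizes-with B and
        // the write to data is guaranteed to be visible here.
        assert(data == 1);
    }
}
----
====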

code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier
are _memory barrier_ instructions in SPIR-V.

If A is a release barrier and B is an atomic operation that performs an
acquire operation, then A synchronizes-with B if and only if all of the
following are true:

  * there exists an atomic write X (with any memory semantics)
  * A is program-ordered before X
  * X and B are mutually-ordered
  * B reads a value written by X or by an operation in the release sequence
    headed by X
  ** If X is relaxed, it is still considered to head a hypothetical release
     sequence for this rule
  * A and B are in the instance of each other's memory scopes
  * X's storage class is in A's semantics.

If A is an atomic operation that performs a release operation and B is an
acquire barrier, then A synchronizes-with B if and only if all of the
following are true:

  * there exists an atomic read X (with any memory semantics)
  * X is program-ordered before B
  * X and A are mutually-ordered
  * X reads a value written by A or by an operation in the release sequence
    headed by A
  * A and B are in the instance of each other's memory scopes
  * X's storage class is in B's semantics.

If A is a release barrier and B is an acquire barrier, then A
synchronizes-with B if all of the following are true:

  * there exists an atomic write X (with any memory semantics)
  * A is program-ordered before X
  * there exists an atomic read Y (with any memory semantics)
  * Y is program-ordered before B
  * X and Y are mutually-ordered
  * Y reads the value written by X or by an operation in the release
    sequence headed by X
  ** If X is relaxed, it is still considered to head a hypothetical release
     sequence for this rule
  * A and B are in the instance of each other's memory scopes
  * X's and Y's storage class is in A's and B's semantics.
  ** NOTE: X and Y must have the same storage class, because they are
     mutually-ordered.
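
[NOTE]
.Note
====
A host-side analogue of the barrier-to-barrier rule above (a minimal C++
sketch using fences): the release fence A is program-ordered before the
relaxed write X, the relaxed read Y is program-ordered before the acquire
fence B, and Y reads the value written by X, so A synchronizes-with B.

[source,c++]
----
#include <atomic>
#include <cassert>

int payload = 0;
std::atomic<int> x{0};

void agent1() {
    payload = 7;
    std::atomic_thread_fence(std::memory_order_release); // A: release barrier
    x.store(1, std::memory_order_relaxed);               // X: relaxed write
}

void agent2() {
    int observed = x.load(std::memory_order_relaxed);    // Y: relaxed read
    std::atomic_thread_fence(std::memory_order_acquire); // B: acquire barrier
    if (observed == 1) {
        assert(payload == 7); // guaranteed: A synchronizes-with B
    }
}
----
====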

If A is a release barrier, B is an acquire barrier, and C is a control
barrier (where A can: equal C, and B can: equal C), then A synchronizes-with
B if all of the following are true:

  * A is program-ordered before (or equals) C
  * C is program-ordered before (or equals) B
  * A and B are in the instance of each other's memory scopes
  * A and B are in the instance of C's execution scope

[NOTE]
.Note
====
This is similar to the barrier-barrier synchronization above, but with a
control barrier filling the role of the relaxed atomics.
====

ifdef::VK_EXT_fragment_shader_interlock[]

Let F be an ordering of fragment shader invocations, such that invocation
F~1~ is ordered before invocation F~2~ if and only if F~1~ and F~2~ overlap
as described in <<shaders-scope-fragment-interlock,Fragment Shader
Interlock>> and F~1~ executes the interlocked code before F~2~.

If A is an code:OpEndInvocationInterlockEXT instruction and B is an
code:OpBeginInvocationInterlockEXT instruction, then A synchronizes-with B
if the agent that executes A is ordered before the agent that executes B in
F. A and B are both considered to have code:FragmentInterlock memory scope
and semantics of UniformMemory and ImageMemory, and A is considered to have
Release semantics and B is considered to have Acquire semantics.

[NOTE]
.Note
====
code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT do
not perform implicit availability or visibility operations.
Usually, shaders using fragment shader interlock will declare the relevant
resources as `coherent` to get implicit
<<memory-model-instruction-av-vis,per-instruction availability and
visibility operations>>.
====

endif::VK_EXT_fragment_shader_interlock[]

ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
If A is a release barrier and B is an acquire barrier, then A
synchronizes-with B if all of the following are true:

  * A is shader-call-ordered-before B
  * A and B are in the instance of each other's memory scopes

endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]

No other release and acquire barriers synchronize-with each other.


[[memory-model-system-synchronizes-with]]
== System-Synchronizes-With

_System-synchronizes-with_ is a relation between arbitrary operations on the
device or host.
Certain operations system-synchronize-with each other, which informally
means the first operation occurs before the second and that the
synchronization is performed without using application-visible memory
accesses.

If there is an <<synchronization-dependencies-execution,execution
dependency>> between two operations A and B, then the operation in the first
synchronization scope system-synchronizes-with the operation in the second
synchronization scope.

[NOTE]
.Note
====
This covers all Vulkan synchronization primitives, including device
operations executing before a synchronization primitive is signaled, wait
operations happening before subsequent device operations, signal operations
happening before host operations that wait on them, and host operations
happening before flink:vkQueueSubmit.
The list is spread throughout the synchronization chapter, and is not
repeated here.
====

System-synchronizes-with implicitly includes all storage class semantics and
has code:CrossDevice scope.

If A system-synchronizes-with B, we also say A is
_system-synchronized-before_ B and B is _system-synchronized-after_ A.


[[memory-model-non-private]]
== Private vs. Non-Private

By default, non-atomic memory operations are treated as _private_, meaning
such a memory operation is not intended to be used for communication with
other agents.
Memory operations with the NonPrivatePointer/NonPrivateTexel bit set are
treated as _non-private_, and are intended to be used for communication with
other agents.

More precisely, making private memory operations
<<memory-model-location-ordered,location-ordered>> between distinct agents
requires system-synchronizes-with rather than shader-based synchronization.
Private memory operations still obey program-order.

Atomic operations are always considered non-private.


[[memory-model-inter-thread-happens-before]]
== Inter-Thread-Happens-Before

Let SC be a non-empty set of storage class semantics.
Then (using template syntax) operation A _inter-thread-happens-before_<SC>
operation B if and only if any of the following is true:

  * A system-synchronizes-with B
  * A synchronizes-with B, and both A and B have all of SC in their
    semantics
  * A is an operation on memory in a storage class in SC or that has all of
    SC in its semantics, B is a release barrier or release atomic with all
    of SC in its semantics, and A is program-ordered before B
  * A is an acquire barrier or acquire atomic with all of SC in its
    semantics, B is an operation on memory in a storage class in SC or that
    has all of SC in its semantics, and A is program-ordered before B
  * A and B are both host operations and A inter-thread-happens-before B as
    defined in the host language specification
  * A inter-thread-happens-before<SC> some X and X
    inter-thread-happens-before<SC> B


[[memory-model-happens-before]]
== Happens-Before

Operation A _happens-before_ operation B if and only if any of the following
is true:

  * A is program-ordered before B
  * A inter-thread-happens-before<SC> B for some set of storage classes SC

_Happens-after_ is defined similarly.

[NOTE]
.Note
====
Unlike C++, happens-before is not always sufficient for a write to be
visible to a read.
Additional <<memory-model-availability-visibility,availability and
visibility>> operations may: be required for writes to be
<<memory-model-visible-to,visible-to>> other memory accesses.
====

[NOTE]
.Note
====
Happens-before is not transitive, but each of program-order and
inter-thread-happens-before<SC> are transitive.
These can be thought of as covering the "`single-threaded`" case and the
"`multi-threaded`" case, and it is not necessary (and not valid) to form
chains between the two.
====


[[memory-model-availability-visibility]]
== Availability and Visibility

_Availability_ and _visibility_ are states of a write operation, which
(informally) track how far the write has permeated the system, i.e. which
agents and references are able to observe the write.
Availability state is per _memory domain_.
Visibility state is per (agent,reference) pair.
Availability and visibility states are per-memory location for each write.

Memory domains are named according to the agents whose memory accesses use
the domain.
Domains used by shader invocations are organized hierarchically into
multiple smaller memory domains which correspond to the different
<<shaders-scope, scopes>>.
Each memory domain is considered the _dual_ of a scope, and vice versa.
The memory domains defined in Vulkan include:

  * _host_ - accessible by host agents
  * _device_ - accessible by all device agents for a particular device
  * _shader_ - accessible by shader agents for a particular device,
    corresponding to the code:Device scope
  * _queue family instance_ - accessible by shader agents in a single queue
    family, corresponding to the code:QueueFamily scope.
ifdef::VK_EXT_fragment_shader_interlock[]
  * _fragment interlock instance_ - accessible by fragment shader agents
    that <<shaders-scope-fragment-interlock,overlap>>, corresponding to the
    code:FragmentInterlock scope.
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
  * _shader call instance_ - accessible by shader agents that are
    <<shader-call-related,shader-call-related>>, corresponding to the
    code:ShaderCallKHR scope.
endif::VK_KHR_ray_tracing_pipeline[]
  * _workgroup instance_ - accessible by shader agents in the same
    workgroup, corresponding to the code:Workgroup scope.
  * _subgroup instance_ - accessible by shader agents in the same subgroup,
    corresponding to the code:Subgroup scope.

The memory domains are nested in the order listed above,
ifdef::VK_KHR_ray_tracing_pipeline[]
except for shader call instance domain,
endif::VK_KHR_ray_tracing_pipeline[]
with memory domains later in the list nested in the domains earlier in the
list.
ifdef::VK_KHR_ray_tracing_pipeline[]
The shader call instance domain is at an implementation-dependent location
in the list, and is nested according to that location.
The shader call instance domain is not broader than the queue family
instance domain.
endif::VK_KHR_ray_tracing_pipeline[]

[NOTE]
.Note
====
Memory domains do not correspond to storage classes or to device-local and
host-local slink:VkDeviceMemory allocations; rather, they indicate whether a
write can be made visible only to agents in the same subgroup, same
workgroup,
ifdef::VK_EXT_fragment_shader_interlock[]
overlapping fragment shader invocation,
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader-call-related ray tracing invocation,
endif::VK_KHR_ray_tracing_pipeline[]
in any shader invocation, or anywhere on the device or host.
The shader, queue family instance,
ifdef::VK_EXT_fragment_shader_interlock[]
fragment interlock instance,
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader call instance,
endif::VK_KHR_ray_tracing_pipeline[]
workgroup instance, and subgroup instance domains are only used for
shader-based availability/visibility operations; in other cases, writes can
be made available from, or visible to, the shader via the device domain.
====

_Availability operations_, _visibility operations_, and _memory domain
operations_ alter the state of the write operations that happen-before them
and that are included in their _source scope_, making those writes available
or visible to their _destination scope_.

  * For an availability operation, the source scope is a set of
    (agent,reference,memory location) tuples, and the destination scope is a
    set of memory domains.
  * For a memory domain operation, the source scope is a memory domain and
    the destination scope is a memory domain.
  * For a visibility operation, the source scope is a set of memory domains
    and the destination scope is a set of (agent,reference,memory location)
    tuples.

How the scopes are determined depends on the specific operation.
Availability and memory domain operations expand the set of memory domains
to which the write is available.
Visibility operations expand the set of (agent,reference,memory location)
tuples to which the write is visible.

Recall that availability and visibility states are per-memory location, and
let W be a write operation to one or more locations performed by agent A via
reference R. Let L be one of the locations written.
(W,L) (the write W to L) is initially not available to any memory domain
and only visible to (A,R,L).
An availability operation AV that happens-after W and that includes (A,R,L)
in its source scope makes (W,L) _available_ to the memory domains in its
destination scope.

A memory domain operation DOM that happens-after AV and for which (W,L) is
available in the source scope makes (W,L) available in the destination
memory domain.

A visibility operation VIS that happens-after AV (or DOM) and for which
(W,L) is available in any domain in the source scope makes (W,L) _visible_
to all (agent,reference,L) tuples included in its destination scope.

If write W~2~ happens-after W, and their sets of memory locations overlap,
then W will not be available/visible to all agents/references for those
memory locations that overlap (and future AV/DOM/VIS ops cannot revive W's
write to those locations).

Availability, memory domain, and visibility operations are treated like
other non-atomic memory accesses for the purpose of
<<memory-model-memory-semantics,memory semantics>>, meaning they can be
ordered by release-acquire sequences or memory barriers.

An _availability chain_ is a sequence of availability operations to
increasingly broad memory domains, where element N+1 of the chain is
performed in the dual scope instance of the destination memory domain of
element N and element N happens-before element N+1.
An example is an availability operation with destination scope of the
workgroup instance domain that happens-before an availability operation to
the shader domain performed by an invocation in the same workgroup.
An availability chain AVC that happens-after W and that includes (A,R,L) in
the source scope makes (W,L) _available_ to the memory domains in its final
destination scope.
An availability chain with a single element is just the availability
operation.

Similarly, a _visibility chain_ is a sequence of visibility operations from
increasingly narrow memory domains, where element N of the chain is
performed in the dual scope instance of the source memory domain of element
N+1 and element N happens-before element N+1.
An example is a visibility operation with source scope of the shader domain
that happens-before a visibility operation with source scope of the
workgroup instance domain performed by an invocation in the same workgroup.
A visibility chain VISC that happens-after AVC (or DOM) and for which (W,L)
is available in any domain in the source scope makes (W,L) _visible_ to all
(agent,reference,L) tuples included in its final destination scope.
A visibility chain with a single element is just the visibility operation.


[[memory-model-vulkan-availability-visibility]]
== Availability, Visibility, and Domain Operations

The following operations generate availability, visibility, and domain
operations.
When multiple availability/visibility/domain operations are described, they
are system-synchronized-with each other in the order listed.

An operation that performs a <<synchronization-dependencies-memory,memory
dependency>> generates:

  * If the source access mask includes ename:VK_ACCESS_HOST_WRITE_BIT, then
    the dependency includes a memory domain operation from host domain to
    device domain.
  * An availability operation with source scope of all writes in the first
    <<synchronization-dependencies-access-scopes,access scope>> of the
    dependency and a destination scope of the device domain.
  * A visibility operation with source scope of the device domain and
    destination scope of the second access scope of the dependency.
  * If the destination access mask includes ename:VK_ACCESS_HOST_READ_BIT or
    ename:VK_ACCESS_HOST_WRITE_BIT, then the dependency includes a memory
    domain operation from device domain to host domain.
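
[NOTE]
.Note
====
For example, a compute-to-compute memory dependency recorded with
flink:vkCmdPipelineBarrier generates an availability operation for the first
access scope followed by a visibility operation for the second access scope
(a hedged C++ sketch; the helper name, command buffer, and stage masks are
placeholders):

[source,c++]
----
#include <vulkan/vulkan.h>

// Hypothetical helper: cmd is assumed to be in the recording state.
void recordShaderWriteToReadDependency(VkCommandBuffer cmd) {
    VkMemoryBarrier barrier{};
    barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT; // first access scope
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;  // second access scope

    // Availability operation (shader writes -> device domain), then a
    // visibility operation (device domain -> shader reads).
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         0, 1, &barrier, 0, nullptr, 0, nullptr);
}
----
====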

flink:vkFlushMappedMemoryRanges performs an availability operation, with a
source scope of (agents,references) = (all host threads, all mapped memory
ranges passed to the command), and destination scope of the host domain.

flink:vkInvalidateMappedMemoryRanges performs a visibility operation, with a
source scope of the host domain and a destination scope of
(agents,references) = (all host threads, all mapped memory ranges passed to
the command).
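
[NOTE]
.Note
====
For example, publishing a host write to non-coherent mapped memory (a hedged
C++ sketch; the helper name and parameters are placeholders, and the inverse
direction would use flink:vkInvalidateMappedMemoryRanges before reading):

[source,c++]
----
#include <vulkan/vulkan.h>
#include <cstddef>
#include <cstring>

// Hypothetical: mapped points into a mapped range of memory that was not
// allocated with the HOST_COHERENT property.
void publishHostWrite(VkDevice device, VkDeviceMemory memory,
                      void* mapped, const void* src, std::size_t size) {
    std::memcpy(mapped, src, size); // host write

    VkMappedMemoryRange range{};
    range.sType  = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
    range.memory = memory;
    range.offset = 0;
    range.size   = VK_WHOLE_SIZE;

    // Availability operation: (host threads, this range) -> host domain.
    vkFlushMappedMemoryRanges(device, 1, &range);
}
----
====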

flink:vkQueueSubmit performs a memory domain operation from host to device,
and a visibility operation with source scope of the device domain and
destination scope of all agents and references on the device.


[[memory-model-availability-visibility-semantics]]
== Availability and Visibility Semantics

A memory barrier or atomic operation via agent A that includes MakeAvailable
in its semantics performs an availability operation whose source scope
includes agent A and all references in the storage classes in that
instruction's storage class semantics, and all memory locations, and whose
destination scope is a set of memory domains selected as specified below.
The implicit availability operation is program-ordered between the barrier
or atomic and all other operations program-ordered before the barrier or
atomic.

A memory barrier or atomic operation via agent A that includes MakeVisible
in its semantics performs a visibility operation whose source scope is a set
of memory domains selected as specified below, and whose destination scope
includes agent A and all references in the storage classes in that
instruction's storage class semantics, and all memory locations.
The implicit visibility operation is program-ordered between the barrier or
atomic and all other operations program-ordered after the barrier or atomic.

The memory domains are selected based on the memory scope of the instruction
as follows:

  * code:Device scope uses the shader domain
  * code:QueueFamily scope uses the queue family instance domain
ifdef::VK_EXT_fragment_shader_interlock[]
  * code:FragmentInterlock scope uses the fragment interlock instance domain
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
  * code:ShaderCallKHR scope uses the shader call instance domain
endif::VK_KHR_ray_tracing_pipeline[]
  * code:Workgroup scope uses the workgroup instance domain
  * code:Subgroup scope uses the subgroup instance domain
  * code:Invocation scope performs no availability/visibility operations.

When an availability operation performed by an agent A includes a memory
domain D in its destination scope, where D corresponds to scope instance S,
it also includes the memory domains that correspond to each smaller scope
instance S' that is a subset of S and that includes A. Similarly for
visibility operations.


[[memory-model-instruction-av-vis]]
== Per-Instruction Availability and Visibility Semantics

A memory write instruction that includes MakePointerAvailable, or an image
write instruction that includes MakeTexelAvailable, performs an availability
operation whose source scope includes the agent and reference used to
perform the write and the memory locations written by the instruction, and
whose destination scope is a set of memory domains selected by the Scope
operand specified in <<memory-model-availability-visibility-semantics,
Availability and Visibility Semantics>>.
The implicit availability operation is program-ordered between the write and
all other operations program-ordered after the write.

A memory read instruction that includes MakePointerVisible, or an image read
instruction that includes MakeTexelVisible, performs a visibility operation
whose source scope is a set of memory domains selected by the Scope operand
as specified in <<memory-model-availability-visibility-semantics,
Availability and Visibility Semantics>>, and whose destination scope
includes the agent and reference used to perform the read and the memory
locations read by the instruction.
The implicit visibility operation is program-ordered between the read and
all other operations program-ordered before the read.

[NOTE]
.Note
====
Although reads with per-instruction visibility only perform visibility ops
from the shader or
ifdef::VK_EXT_fragment_shader_interlock[]
fragment interlock instance or
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader call instance or
endif::VK_KHR_ray_tracing_pipeline[]
workgroup instance or subgroup instance domain, they will also see writes
that were made visible via the device domain, i.e. those writes previously
performed by non-shader agents and made visible via API commands.
====

[NOTE]
.Note
====
It is expected that all invocations in a subgroup execute on the same
processor with the same path to memory, and thus availability and visibility
operations with subgroup scope can be expected to be "`free`".
====


[[memory-model-location-ordered]]
== Location-Ordered

Let X and Y be memory accesses to overlapping sets of memory locations M,
where X != Y. Let (A~X~,R~X~) be the agent and reference used for X, and
(A~Y~,R~Y~) be the agent and reference used for Y. For now, let "`->`"
denote happens-before and "`->^rcpo^`" denote the reflexive closure of
program-ordered before.

If D~1~ and D~2~ are different memory domains, then let DOM(D~1~,D~2~) be a
memory domain operation from D~1~ to D~2~.
Otherwise, let DOM(D,D) be a placeholder such that X->DOM(D,D)->Y if and
only if X->Y.

X is _location-ordered_ before Y for a location L in M if and only if any of
the following is true:

  * A~X~ == A~Y~ and R~X~ == R~Y~ and X->Y
  ** NOTE: this case means no availability/visibility ops are required when
     it is the same (agent,reference).

  * X is a read, both X and Y are non-private, and X->Y
  * X is a read, and X (transitively) system-synchronizes with Y

  * If R~X~ == R~Y~ and A~X~ and A~Y~ access a common memory domain D (e.g.
    are in the same workgroup instance if D is the workgroup instance
    domain), and both X and Y are non-private:
  ** X is a write, Y is a write, AVC(A~X~,R~X~,D,L) is an availability chain
     making (X,L) available to domain D, and X->^rcpo^AVC(A~X~,R~X~,D,L)->Y
  ** X is a write, Y is a read, AVC(A~X~,R~X~,D,L) is an availability chain
     making (X,L) available to domain D, VISC(A~Y~,R~Y~,D,L) is a visibility
     chain making writes to L available in domain D visible to Y, and
     X->^rcpo^AVC(A~X~,R~X~,D,L)->VISC(A~Y~,R~Y~,D,L)->^rcpo^Y
  ** If
     slink:VkPhysicalDeviceVulkanMemoryModelFeatures::pname:vulkanMemoryModelAvailabilityVisibilityChains
     is ename:VK_FALSE, then AVC and VISC must: each only have a single
     element in the chain, in each sub-bullet above.

  * Let D~X~ and D~Y~ each be either the device domain or the host domain,
    depending on whether A~X~ and A~Y~ execute on the device or host:
  ** X is a write and Y is a write, and
     X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->Y
  ** X is a write and Y is a read, and
     X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->VIS(A~Y~,R~Y~,D~Y~,L)->Y

[NOTE]
.Note
====
The final bullet (synchronization through device/host domain) requires
API-level synchronization operations, since the device/host domains are not
accessible via shader instructions.
Also, "`device domain`" is not to be confused with "`device scope`", which
synchronizes through the "`shader domain`".
====


[[memory-model-access-data-race]]
== Data Race

Let X and Y be operations that access overlapping sets of memory locations
M, where X != Y, and at least one of X and Y is a write, and X and Y are not
mutually-ordered atomic operations.
If there does not exist a location-ordered relation between X and Y for each
location in M, then there is a _data race_.

Applications must: ensure that no data races occur during the execution of
their application.

[NOTE]
.Note
====
Data races can only occur due to instructions that are actually executed.
For example, an instruction skipped due to control flow must not contribute
to a data race.
====
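
[NOTE]
.Note
====
A minimal host-side C++ sketch: the first pair of accesses races because no
location-ordered relation exists between them; the second pair is ordered by
a release/acquire pair.

[source,c++]
----
#include <atomic>
#include <thread>

int shared = 0;
std::atomic<int> guard{0};

void racy() { // invalid: contains a data race
    std::thread t1([] { shared = 1; });              // unordered write
    std::thread t2([] { int r = shared; (void)r; }); // unordered read
    t1.join();
    t2.join();
}

void raceFree() {
    std::thread t1([] {
        shared = 1;
        guard.store(1, std::memory_order_release); // orders the write
    });
    std::thread t2([] {
        if (guard.load(std::memory_order_acquire)) {
            int r = shared; // location-ordered after the write: no race
            (void)r;
        }
    });
    t1.join();
    t2.join();
}
----
====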


[[memory-model-visible-to]]
== Visible-To

Let X be a write and Y be a read whose sets of memory locations overlap, and
let M be the set of memory locations that overlap.
Let M~2~ be a non-empty subset of M. Then X is _visible-to_ Y for memory
locations M~2~ if and only if all of the following are true:

  * X is location-ordered before Y for each location L in M~2~.
  * There does not exist another write Z to any location L in M~2~ such that
    X is location-ordered before Z for location L and Z is location-ordered
    before Y for location L.

If X is visible-to Y, then Y reads the value written by X for locations
M~2~.

[NOTE]
.Note
====
It is possible for there to be a write between X and Y that overwrites a
subset of the memory locations, but the remaining memory locations (M~2~)
will still be visible-to Y.
====


[[memory-model-acyclicity]]
== Acyclicity

_Reads-from_ is a relation between operations, where the first operation is
a write, the second operation is a read, and the second operation reads the
value written by the first operation.
_From-reads_ is a relation between operations, where the first operation is
a read, the second operation is a write, and the first operation reads a
value written earlier than the second operation in the second operation's
scoped modification order (or the first operation reads from the initial
value, and the second operation is any write to the same locations).

Then the implementation must: guarantee that no cycles exist in the union of
the following relations:

  * location-ordered
  * scoped modification order (over all atomic writes)
  * reads-from
  * from-reads

[NOTE]
.Note
====
This is a "`consistency`" axiom, which informally guarantees that sequences
of operations cannot violate causality.
====


[[memory-model-scoped-modification-order-coherence]]
=== Scoped Modification Order Coherence

Let A and B be mutually-ordered atomic operations, where A is
location-ordered before B. Then the following rules are a consequence of
acyclicity:

  * If A and B are both reads and A does not read the initial value, then
    the write that A takes its value from must: be earlier in its own scoped
    modification order than (or the same as) the write that B takes its
    value from (no cycles between location-order, reads-from, and
    from-reads).
  * If A is a read and B is a write and A does not read the initial value,
    then A must: take its value from a write earlier than B in B's scoped
    modification order (no cycles between location-order, scoped
    modification order, and reads-from).
  * If A is a write and B is a read, then B must: take its value from A or a
    write later than A in A's scoped modification order (no cycles between
    location-order, scoped modification order, and from-reads).
  * If A and B are both writes, then A must: be earlier than B in A's scoped
    modification order (no cycles between location-order and scoped
    modification order).
  * If A is a write and B is a read-modify-write and B reads the value
    written by A, then B comes immediately after A in A's scoped
    modification order (no cycles between scoped modification order and
    from-reads).


[[memory-model-shader-io]]
== Shader I/O

If a shader invocation A in a shader stage other than code:Vertex performs a
memory read operation X from an object in storage class
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR,
code:HitAttributeKHR, code:IncomingRayPayloadKHR, or
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Input, then X is system-synchronized-after all writes to the
corresponding
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR,
code:HitAttributeKHR, code:IncomingRayPayloadKHR, or
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Output storage variable(s) in the shader invocation(s) that contribute
to generating invocation A, and those writes are all visible-to X.

[NOTE]
.Note
====
It is not necessary for the upstream shader invocations to have completed
execution; they only need to have generated the output that is being read.
====


[[memory-model-deallocation]]
== Deallocation

ifndef::VKSC_VERSION_1_0[]

A call to flink:vkFreeMemory must: happen-after all memory operations on all
memory locations in that slink:VkDeviceMemory object.

[NOTE]
.Note
====
Normally, device memory operations in a given queue are synchronized with
flink:vkFreeMemory by having a host thread wait on a fence signaled by that
queue, and the wait happens-before the call to flink:vkFreeMemory on the
host.
====
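
[NOTE]
.Note
====
A hedged C++ sketch of that pattern (the helper name is a placeholder, and
the fence is assumed to have been signaled by the queue that last accessed
the memory):

[source,c++]
----
#include <vulkan/vulkan.h>
#include <cstdint>

void freeAfterUse(VkDevice device, VkFence fence, VkDeviceMemory memory) {
    // The fence wait happens-after all device memory operations on memory,
    // and happens-before the call to vkFreeMemory on the host.
    vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
    vkFreeMemory(device, memory, nullptr);
}
----
====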

endif::VKSC_VERSION_1_0[]

The deallocation of SPIR-V variables is managed by the system and
happens-after all operations on those variables.


[[memory-model-informative-descriptions]]
== Descriptions (Informative)

This subsection offers more easily understandable consequences of the memory
model for application and compiler developers.

Let SC be the storage class(es) specified by a release or acquire operation
or barrier.

  * An atomic write with release semantics must not be reordered against any
    read or write to SC that is program-ordered before it (regardless of the
    storage class the atomic is in).

  * An atomic read with acquire semantics must not be reordered against any
    read or write to SC that is program-ordered after it (regardless of the
    storage class the atomic is in).

  * Any write to SC program-ordered after a release barrier must not be
    reordered against any read or write to SC program-ordered before that
    barrier.

  * Any read from SC program-ordered before an acquire barrier must not be
    reordered against any read or write to SC program-ordered after the
    barrier.

A control barrier (even if it has no memory semantics) must not be reordered
against any memory barriers.

This memory model allows memory accesses with and without availability and
visibility operations, as well as atomic operations, all to be performed on
the same memory location.
This is critical to allow reasoning about memory that is reused in multiple
ways, e.g. across the lifetime of different shader invocations or draw
calls.
While GLSL (and legacy SPIR-V) applies the "`coherent`" decoration to
variables (for historical reasons), this model treats each memory access
instruction as having optional implicit availability/visibility operations.
GLSL to SPIR-V compilers should map all (non-atomic) operations on a
coherent variable to the corresponding MakePointerAvailable/MakeTexelAvailable
and MakePointerVisible/MakeTexelVisible flags in this model.

Atomic operations implicitly have availability/visibility operations, and
the scope of those operations is taken from the atomic operation's scope.


[[memory-model-tessellation-output-ordering]]
== Tessellation Output Ordering

For SPIR-V that uses the Vulkan Memory Model, the code:OutputMemory storage
class is used to synchronize accesses to tessellation control output
variables.
For legacy SPIR-V that does not enable the Vulkan Memory Model via
code:OpMemoryModel, tessellation outputs can be ordered using a control
barrier with no particular memory scope or semantics, as defined below.

Let X and Y be memory operations performed by shader invocations A~X~ and
A~Y~.
Operation X is _tessellation-output-ordered_ before operation Y if and only
if all of the following are true:

  * There is a dynamic instance of an code:OpControlBarrier instruction C
    such that X is program-ordered before C in A~X~ and C is program-ordered
    before Y in A~Y~.
  * A~X~ and A~Y~ are in the same instance of C's execution scope.

If shader invocations A~X~ and A~Y~ in the code:TessellationControl
execution model execute memory operations X and Y, respectively, on the
code:Output storage class, and X is tessellation-output-ordered before Y
with a scope of code:Workgroup, then X is location-ordered before Y, and if
X is a write and Y is a read then X is visible-to Y.


ifdef::VK_NV_cooperative_matrix[]
[[memory-model-cooperative-matrix]]
== Cooperative Matrix Memory Access

For each dynamic instance of a cooperative matrix load or store instruction
(code:OpCooperativeMatrixLoadNV or code:OpCooperativeMatrixStoreNV), a
single implementation-dependent invocation within the instance of the
matrix's scope performs a non-atomic load or store (respectively) to each
memory location that is defined to be accessed by the instruction.
endif::VK_NV_cooperative_matrix[]