From d99193d3fcc4b2a0dacc0a9d7e4951ea611a3e96 Mon Sep 17 00:00:00 2001 From: Jon Leech Date: Wed, 7 Feb 2024 19:51:30 -0800 Subject: Fix some improperly named files. --- proposals/VK_KHR_cooperative_matrix.adoc | 50 +++++++ proposals/VK_KHR_cooperative_matrix.asciidoc | 50 ------- proposals/VK_KHR_shader_expect_assume.adoc | 93 ++++++++++++ proposals/VK_KHR_shader_expect_assume.asciidoc | 93 ------------ proposals/VK_KHR_shader_maximal_reconvergence.adoc | 162 +++++++++++++++++++++ .../VK_KHR_shader_maximal_reconvergence.asciidoc | 162 --------------------- proposals/VK_KHR_shader_subgroup_rotate.adoc | 150 +++++++++++++++++++ proposals/VK_KHR_shader_subgroup_rotate.asciidoc | 150 ------------------- 8 files changed, 455 insertions(+), 455 deletions(-) create mode 100644 proposals/VK_KHR_cooperative_matrix.adoc delete mode 100644 proposals/VK_KHR_cooperative_matrix.asciidoc create mode 100644 proposals/VK_KHR_shader_expect_assume.adoc delete mode 100644 proposals/VK_KHR_shader_expect_assume.asciidoc create mode 100644 proposals/VK_KHR_shader_maximal_reconvergence.adoc delete mode 100644 proposals/VK_KHR_shader_maximal_reconvergence.asciidoc create mode 100644 proposals/VK_KHR_shader_subgroup_rotate.adoc delete mode 100644 proposals/VK_KHR_shader_subgroup_rotate.asciidoc diff --git a/proposals/VK_KHR_cooperative_matrix.adoc b/proposals/VK_KHR_cooperative_matrix.adoc new file mode 100644 index 00000000..83766b7a --- /dev/null +++ b/proposals/VK_KHR_cooperative_matrix.adoc @@ -0,0 +1,50 @@ +// Copyright 2021-2024 The Khronos Group Inc. +// +// SPDX-License-Identifier: CC-BY-4.0 + += VK_KHR_cooperative_matrix +:toc: left +:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/ +:sectnums: +
+This document proposes adding support for so-called cooperative matrix
+operations that enable multiple shader invocations to cooperatively and
+efficiently perform matrix multiplications. 
+
+== Problem Statement
+
+A growing number of GPU applications are making use of matrix multiplication
+operations. Modern GPU HW can take advantage of cross-invocation communication
+channels or other hardware facilities to implement matrix multiplication
+operations more efficiently, but there is currently no suitable standard
+SPIR-V/API mechanism to expose these features to applications or libraries.
+
+== Solution Space
+
+Applications or libraries can use subgroup primitives to write more efficient
+matrix multiplication kernels but, while technically possible on some hardware,
+this approach often does not make it possible to write optimal kernels and
+requires applications to have a lot of device-specific knowledge.
+
+With VK_NV_cooperative_matrix, NVIDIA exposed a new set of abstractions for such
+cooperative matrix operations. These include cooperative load and store
+instructions, a matrix multiplication-addition instruction, as well as limited
+support for element-wise operations on these matrices. Since the release of
+that extension, a growing body of evidence in the form of discussions and
+other similar vendor extensions suggests that this approach is suitable for
+a wide variety of devices and applications and is thus a good candidate for
+standardisation.
+
+== Proposal
+
+Work towards a standard extension that exposes abstractions similar to those
+released under VK_NV_cooperative_matrix.
+
+== Examples
+
+See specifications and presentations for VK_NV_cooperative_matrix.
+
+== Issues
+
+None.
+
diff --git a/proposals/VK_KHR_cooperative_matrix.asciidoc b/proposals/VK_KHR_cooperative_matrix.asciidoc deleted file mode 100644 index 83766b7a..00000000 --- a/proposals/VK_KHR_cooperative_matrix.asciidoc +++ /dev/null @@ -1,50 +0,0 @@ -// Copyright 2021-2024 The Khronos Group Inc. 
-// -// SPDX-License-Identifier: CC-BY-4.0 - -= VK_KHR_cooperative_matrix -:toc: left -:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/ -:sectnums: - -This document proposes adding support for so-called cooperative matrix -operations that enables multiple shader invocations to cooperatively and -efficiently perform matrix multiplications. - -== Problem Statement - -A growing number of GPU applications are making use of matrix multiplication -operations. Modern GPU HW can take advantage of cross-invocation communication -channels or other hardware facilities to implement matrix multiplications -operations more efficiently but there is currently no suitable standard -SPIR-V/API mechanism to expose these features to applications or libraries. - -== Solution Space - -Applications or libraries can use subgroup primitives to write more efficient -matrix multiplication kernels but, while technically possible on some hardware, -this approach often does not make it possible to write optimal kernels and -requires applications to have a lot of device-specific knowledge. - -NVIDIA exposed with VK_NV_cooperative_matrix a new set of abstractions for such -cooperative matrix operations. These include cooperative load and store -instructions, a matrix multiplication-addition instruction as well a limited -support for element-wise operations on these matrices. Since the release of -that extension, a growing body of evidence in the form of discussions and -other similar vendor extensions suggests that this approach is suitable for -a wide variety of devices and applications and is thus a good candidate for -standardisation. - -== Proposal - -Work towards a standard extension that exposes abstractions similar as those -released under VK_NV_cooperative_matrix. - -== Examples - -See specifications and presentations for VK_NV_cooperative_matrix. - -== Issues - -None. 
- 
diff --git a/proposals/VK_KHR_shader_expect_assume.adoc b/proposals/VK_KHR_shader_expect_assume.adoc new file mode 100644 index 00000000..9cfe62c2 --- /dev/null +++ b/proposals/VK_KHR_shader_expect_assume.adoc @@ -0,0 +1,93 @@ +// Copyright 2021-2024 The Khronos Group, Inc. +// +// SPDX-License-Identifier: CC-BY-4.0 +
+= VK_KHR_shader_expect_assume
+:toc: left
+:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/
+:sectnums:
+
+This document proposes adding support for expect/assume SPIR-V instructions
+to guide shader program optimizations.
+
+== Problem Statement
+
+Shader writers or generators, as well as other SPIR-V producers (e.g. machine
+learning compilers), often have access to information that could enable the SPIR-V
+consumers in Vulkan implementations to make better optimization decisions, such
+as knowledge of the likely value of objects or whether a given condition holds,
+but which they cannot communicate to a Vulkan SPIR-V consumer using existing features.
+
+== Solution Space
+
+SPIR-V already provides some mechanisms for producers to give hints to consumers
+in a limited number of scenarios:
+
+- `OpBranchConditional` can accept branch weights that enable producers to
+indicate the likelihood of each path. This does not, however, generalize
+to `OpSwitch` constructs.
+
+- Various so-called _Loop Controls_ make it possible for producers to provide
+metadata about the iteration count of loops or desired unrolling behaviour.
+
+There is, however, no generic mechanism exposed for SPIR-V producers to communicate
+optimisation information to consumers. 
SPIR-V does support dedicated instructions, +introduced by the +http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_expect_assume.html[SPV_KHR_expect_assume] +extension, that make it possible for producers to communicate to consumers the +likely value of an object or whether a given condition holds, but this extension +is currently not exposed in Vulkan. + +== Proposal + +Expose the +http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_expect_assume.html[SPV_KHR_expect_assume] +extension in Vulkan. + +The `SPV_KHR_expect_assume` extension introduces two new instructions: + +- `OpExpectKHR` makes it possible to state the most probable value of its input. +- `OpAssumeTrueKHR` enables the optimizer to assume that the provided condition is +always true. + +== Examples + +As an illustration, consider the following pseudocode example: + +[source] +---- +c = 20 +d = 2 +b = c / d + +if (a - b > 0) { + ... +} else { + ... +} +---- + +The writer or producer may know that a > 10. This knowledge makes it possible +to completely remove the `else` branch. In this case, the producer could perform +that optimisation alone. However, if the producer only knows that `a` is greater +than _some_ value provided, say with a specialization constant, it can no longer +perform the optimisation. Adding that information to the SPIR-V module would +enable the SPIR-V consumer to do it. + +Another possible use could be to provide guarantees that a particular value +is not NaN or infinite: + +[source] +---- +value = load(...) +assume(!isnan(value)) +---- + +== Issues + +1) What shader stages should the instructions introduced by this extension +be allowed in? + +*PROPOSED*: No restrictions are placed on the shader stages the instructions can +be used in. 
+ diff --git a/proposals/VK_KHR_shader_expect_assume.asciidoc b/proposals/VK_KHR_shader_expect_assume.asciidoc deleted file mode 100644 index 9cfe62c2..00000000 --- a/proposals/VK_KHR_shader_expect_assume.asciidoc +++ /dev/null @@ -1,93 +0,0 @@ -// Copyright 2021-2024 The Khronos Group, Inc. -// -// SPDX-License-Identifier: CC-BY-4.0 - -= VK_KHR_shader_expect_assume -:toc: left -:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/ -:sectnums: - -This document proposes adding support for expect/assume SPIR-V instructions -to guide shader program optimizations. - -== Problem Statement - -Shader writers or generators as well as other SPIR-V producers (e.g. Machine -Learning compilers) often have access to information that could enable the SPIR-V -consumers in Vulkan implementations to make better optimization decisions, such -as knowledge of the likely value of objects or whether a given condition holds, -but which they cannot communicate to a Vulkan SPIR-V consumer using existing features. - -== Solution Space - -SPIR-V already provides some mechanisms for producers to give hints to consumers -in a limited number of scenarios: - -- `OpBranchConditional` can accept branch weights that enable producers to -indicate the likelihood of each path. This does not however generalize -to `OpSwitch` constructs. - -- Various so called _Loop Controls_ make it possible for producers to provide -metadata about the iteration count of loops or desired unrolling behaviour. - -There is however no exposed generic mechanism for SPIR-V producers to communicate -optimisation information to consumers. 
SPIR-V does support dedicated instructions, -introduced by the -http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_expect_assume.html[SPV_KHR_expect_assume] -extension, that make it possible for producers to communicate to consumers the -likely value of an object or whether a given condition holds, but this extension -is currently not exposed in Vulkan. - -== Proposal - -Expose the -http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_expect_assume.html[SPV_KHR_expect_assume] -extension in Vulkan. - -The `SPV_KHR_expect_assume` extension introduces two new instructions: - -- `OpExpectKHR` makes it possible to state the most probable value of its input. -- `OpAssumeTrueKHR` enables the optimizer to assume that the provided condition is -always true. - -== Examples - -As an illustration, consider the following pseudocode example: - -[source] ----- -c = 20 -d = 2 -b = c / d - -if (a - b > 0) { - ... -} else { - ... -} ----- - -The writer or producer may know that a > 10. This knowledge makes it possible -to completely remove the `else` branch. In this case, the producer could perform -that optimisation alone. However, if the producer only knows that `a` is greater -than _some_ value provided, say with a specialization constant, it can no longer -perform the optimisation. Adding that information to the SPIR-V module would -enable the SPIR-V consumer to do it. - -Another possible use could be to provide guarantees that a particular value -is not NaN or infinite: - -[source] ----- -value = load(...) -assume(!isnan(value)) ----- - -== Issues - -1) What shader stages should the instructions introduced by this extension -be allowed in? - -*PROPOSED*: No restrictions are placed on the shader stages the instructions can -be used in. 
- 
diff --git a/proposals/VK_KHR_shader_maximal_reconvergence.adoc b/proposals/VK_KHR_shader_maximal_reconvergence.adoc new file mode 100644 index 00000000..7b361e4e --- /dev/null +++ b/proposals/VK_KHR_shader_maximal_reconvergence.adoc @@ -0,0 +1,162 @@ +// Copyright 2024 The Khronos Group, Inc. +// +// SPDX-License-Identifier: CC-BY-4.0 +
+= VK_KHR_shader_maximal_reconvergence
+:toc: left
+:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/
+:sectnums:
+
+== Problem Statement
+
+The SPIR-V specification defines several types of instructions as communicating between invocations.
+It refers to these instructions as
+https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#tangled_instruction[tangled
+instructions].
+Tangled instructions include very useful instructions such as subgroup
+operations and derivatives.
+In order to correctly reason about their programs, shader authors need to be
+able to understand, and to be provided some guarantees about, which invocations
+will be tangled together.
+Unfortunately, SPIR-V does not provide strong guarantees surrounding the
+divergence and reconvergence of invocations.
+The
+https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#uniform_control_flow[guarantees]
+it does provide are rather weak and lead to unreliable behaviour across
+different devices (or even different drivers of the same device).
+
+VK_KHR_shader_subgroup_uniform_control_flow provides stronger guarantees, but
+still has some drawbacks from a shader author's point of view.
+Shader authors would like to be able to reason about the divergence and
+reconvergence of invocations executing shaders written in an HLL and have that
+reasoning translate faithfully into SPIR-V.
+
+== Solution Space
+
+The following options were considered to address this issue:
+
+1. Add new mechanisms to SPIR-V, and optionally HLLs, that provide explicit
+   divergence and reconvergence information directly in the shader.
+2.
Add new guarantees to SPIR-V (through a new execution mode) that guarantee
+   that divergence and reconvergence in SPIR-V map intuitively from the shader's
+   representation in an HLL.
+
+The main advantage of option 1 is that it is completely explicit.
+The main disadvantage is that it likely requires additional changes in HLLs
+(otherwise just use option 2) and that it requires shader authors to write more
+verbose code to achieve what should, intuitively, be obvious behavior.
+
+The main advantage of option 2 is that there is almost no burden placed on
+shader authors (beyond requesting the new style of execution).
+Their code works how they expect across different devices.
+The main disadvantage is that drivers must be cautious to preserve the
+information implicitly encoded in the SPIR-V control flow graph throughout
+internal transformations in order to guarantee the expected divergence and
+reconvergence.
+Option 2 is a clear win for shader authors, and the difficulty for
+implementations is expected to be manageable.
+
+== Proposal
+
+=== SPV_KHR_maximal_reconvergence
+
+This extension exposes the ability to use the SPV_KHR_maximal_reconvergence
+SPIR-V extension, which provides
+extra guarantees surrounding divergence and reconvergence.
+
+The extension introduces the idea of a tangle, which is the set of invocations
+that execute a specific dynamic instruction instance, and provides a set of
+rules to reason about which invocations are included in each tangle.
+
+The rules are designed to match shader author intuition of divergence and
+reconvergence in an HLL.
+That is, divergence and reconvergence information is inferred directly from the
+control flow graph of the SPIR-V module.
+
+=== Examples
+
+[source,c]
+----
+uint myMaterialIndex = ...;
+for (;;) {
+    uint materialIndex = subgroupBroadcastFirst(myMaterialIndex);
+    if (myMaterialIndex == materialIndex) {
+        // Vulkan specification requires uniform access to the resource.
+        vec4 diffuse = texture(diffuseSamplers[materialIndex], uv);
+
+        // ... 
+
+        break;
+    }
+}
+----
+
+In the above example, the shader author relies on invocations executing
+different loop iterations being diverged from each other; however, SPIR-V does
+not guarantee this to be the case.
+Without maximal reconvergence, an implementation may interleave invocations
+among different iterations of the loop, inadvertently breaking the uniform
+access.
+Another potential problem is that implementations may treat the resource access
+as occurring outside the loop altogether, depending on how the compiler analyzes
+the program.
+With maximal reconvergence, invocations executing different loop iterations
+are never in the same tangle and the break block is always considered to be
+inside the loop.
+With those restrictions, this example behaves as the shader author expects.
+
+[source,c]
+----
+// Free should be initialized to 0.
+layout(set=0, binding=0) buffer BUFFER { uint free; uint data[]; } b;
+void main() {
+    bool needs_space = false;
+    ...
+    if (needs_space) {
+        // gl_SubgroupSize may be larger than the actual subgroup size so
+        // calculate the actual subgroup size.
+        uvec4 mask = subgroupBallot(needs_space);
+        uint size = subgroupBallotBitCount(mask);
+        uint base = 0;
+        if (subgroupElect()) {
+            // "free" tracks the next free slot for writes.
+            // The first invocation in the subgroup allocates space
+            // for each invocation in the subgroup that requires it.
+            base = atomicAdd(b.free, size);
+        }
+
+        // Broadcast the base index to other invocations in the subgroup.
+        base = subgroupBroadcastFirst(base);
+        // Calculate the offset from "base" for each invocation.
+        uint offset = subgroupBallotExclusiveBitCount(mask);
+
+        // Write the data in the allocated slot for each invocation that
+        // requested space.
+        b.data[base + offset] = ...;
+    }
+    ... 
+}
+----
+
+This example is borrowed from the
+https://github.com/KhronosGroup/Vulkan-Guide/blob/master/chapters/extensions/VK_KHR_shader_subgroup_uniform_control_flow.adoc[guide
+for VK_KHR_shader_subgroup_uniform_control_flow].
+Even with subgroup uniform control flow, the rewritten example carried the caveat
+that the code could only be executed from subgroup uniform control flow.
+With maximal reconvergence, the unaltered version of the code (as listed above)
+can be used directly to perform atomic compaction.
+The extra subgroup operations required by subgroup uniform control flow are no longer required.
+Maximal reconvergence guarantees that the election, broadcast, and bit count all
+operate on the same tangle.
+
+== Issues
+
+=== RESOLVED: Can a single behavior be provided for switch statements?
+
+Unfortunately, maximal reconvergence cannot guarantee a single behavior for
+switch statements.
+There are too many different implementations of switch statements;
+restricting the divergence and reconvergence behavior would have serious
+negative performance impacts on some implementations.
+Instead, shader authors should avoid switch statements in favour of if/else
+statements if they require guarantees about divergence and reconvergence.
+
diff --git a/proposals/VK_KHR_shader_maximal_reconvergence.asciidoc b/proposals/VK_KHR_shader_maximal_reconvergence.asciidoc deleted file mode 100644 index 7b361e4e..00000000 --- a/proposals/VK_KHR_shader_maximal_reconvergence.asciidoc +++ /dev/null @@ -1,162 +0,0 @@ -// Copyright 2024 The Khronos Group, Inc. -// -// SPDX-License-Identifier: CC-BY-4.0 -
-= VK_KHR_shader_maximal_reconvergence
-:toc: left
-:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/
-:sectnums:
-
-== Problem Statement
-
-The SPIR-V specification defines several types of instructions as communicating between invocations. 
-It refers to these instructions as -https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#tangled_instruction[tangled -instructions]. -Tangled instructions include very useful instructions such as subgroup -operations and derivatives. -In order to correctly reason about their programs, shader authors need to be -able to understand, and be provided some guarantees, about which invocations -will be tangled together. -Unfortunately, SPIR-V does not provide strong guarantees surrounding the -divergence and reconvergence of invocations. -The -https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#uniform_control_flow[guarantees] -it does provide are rather weak and lead to unreliable behaviour across -different devices (or even different drivers of the same device). - -VK_KHR_shader_subgroup_uniform_control_flow provides stronger guarantees, but -still has some drawbacks from a shader author's point of view. -Shader authors would like to be able to reason about the divergence and -reconvergence of invocations executing shaders written in a HLL and have that -reasoning translate faithfully into SPIR-V. - -== Solution Space - -The following options were considered to address this issue: - -1. Add new mechanisms to SPIR-V, and optionally HLLs, that provide explicit - divergence and reconvergence information directly in the shader. -2. Add new guarantees to SPIR-V (through a new execution mode) that guarantee - divergence and reconvergence in SPIR-V maps intuitively from the shader's - representation in a HLL. - -The main advantage of option 1 is that is completely explicit. -The main disadvantage is it likely requires additional changes in HLL -(otherwise just use option 2) and that it requires shader authors to write more -verbose code to achieve what should, intuitively, be obvious behavior. - -The main advantage of option 2 is that there is almost no burden placed on -shader authors (beyond requesting the new style of execution). 
-Their code works how they expect across different devices. -The main disadvantage is that drivers must be cautious to preserve the -information implicitly encoded in the SPIR-V control flow graph throughout -internal transformations in order to guarantee the expected divergence and -reconvergence. -Option 2 is a clear win for shader authors and the difficulty for -implementations is expected to be manageable. - -== Proposal - -=== SPV_KHR_maximal_reconvergence - -This extension exposes the ability to use the SPIR-V extension, which provides -extra guarantees surrounding divergence and reconvergence. - -The extension introduces the idea of a tangle, which is the set of invocations -that execute a specific dynamic instruction instance and provides a set of -rules to reason about which invocations are included in each tangle. - -The rules are designed to match shader author intuition of divergence and -reconvergence in an HLL. -That is, divergence and reconvergence information is inferred directly from the -control flow graph of the SPIR-V module. - -=== Examples - -[source,c] ----- -uint myMaterialIndex = ...; -for (;;) { - uint materialIndex = subgroupBroadcastFirst(myMaterialIndex); - if (myMaterialIndex == materialIndex) { - // Vulkan specification requires uniform access to the resource. - vec4 diffuse = texture(diffuseSamplers[materialIndex], uv); - - // ... - - break; - } -} ----- - -In the above example, the shader author relies on invocations executing -different loop iterations being diverged from each other; however, SPIR-V does -not guarantee this to be the case. -Without maximal reconvergence, an implementation may interleave invocations -among different iterations of the loop, inadvertently breaking the uniform -access. -Another potential problem is that implementations may treat the resource access -as occurring outside the loop altogether depending on how the compiler analyzes -the program. 
-With maximal reconvergence, invocations are executing different loop iterations -are never in the same tangle and the break block is always considered to be -inside the loop. -With those restrictions, this example behaves as the shader author expects. - -[source,c] ----- -// Free should be initialized to 0. -layout(set=0, binding=0) buffer BUFFER { uint free; uint data[]; } b; -void main() { - bool needs_space = false; - ... - if (needs_space) { - // gl_SubgroupSize may be larger than the actual subgroup size so - // calculate the actual subgroup size. - uvec4 mask = subgroupBallot(needs_space); - uint size = subgroupBallotBitCount(mask); - uint base = 0; - if (subgroupElect()) { - // "free" tracks the next free slot for writes. - // The first invocation in the subgroup allocates space - // for each invocation in the subgroup that requires it. - base = atomicAdd(b.free, size); - } - - // Broadcast the base index to other invocations in the subgroup. - base = subgroupBroadcastFirst(base); - // Calculate the offset from "base" for each invocation. - uint offset = subgroupBallotExclusiveBitCount(mask); - - // Write the data in the allocated slot for each invocation that - // requested space. - b.data[base + offset] = ...; - } - ... -} ----- - -This example is borrowed from the -https://github.com/KhronosGroup/Vulkan-Guide/blob/master/chapters/extensions/VK_KHR_shader_subgroup_uniform_control_flow.adoc[guide -for VK_KHR_shader_subgroup_uniform_control flow]. -Even with subgroup uniform control flow the rewritten example had a caveat that -the code could only be executed from subgroup uniform control flow. -With maximal reconvergence, the unaltered version of code (as listed above) can -be used directly to perform atomic compaction. -The extra subgroup operations required by subgroup uniform control flow are no longer required. -Maximal reconvergence guarantees that the election, broadcast and bit count all -operate on the same tangle. 
- -== Issues - -=== RESOLVED: Can a single behavior be provided for switch statements? - -Unfortunately, maximal reconvergence cannot guarantee a single behavior for -switch statements. -There are too many different implementations for a switch statement, -restricting the divergence and reconvergence behavior would have serious -negative performance impacts on some implementations. -Instead, shader authors should avoid switch statements in favour of if/else -statements if they require guarantees about divergence and reconvergence. - diff --git a/proposals/VK_KHR_shader_subgroup_rotate.adoc b/proposals/VK_KHR_shader_subgroup_rotate.adoc new file mode 100644 index 00000000..83eb8646 --- /dev/null +++ b/proposals/VK_KHR_shader_subgroup_rotate.adoc @@ -0,0 +1,150 @@ +// Copyright 2021-2024 The Khronos Group, Inc. +// +// SPDX-License-Identifier: CC-BY-4.0 + +# Subgroup rotation instruction +:toc: left +:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/ +:sectnums: + +## Problem Statement + +Subgroup operations are useful in the implementation of many compute algorithms. +Rotating values across invocations within a subgroup in particular can be useful +in the implementation of the convolution routines used in neural network inference. + +A rotation by N rotates values "down" N invocations within the subgroup. + +A rotation by (SubgroupSize - N) rotates values "up" N invocations +within the subgroup. + +Taking the example of a subgroup of size 16, a rotation by 2 would, +when executed by the invocation identified by id 0, return the value from the +invocation identified by id 2. The same rotation instruction, when executed +by the invocation identified by id 14, would return the value from the invocation +identified by id 0. + +While this rotation operation can be built on top of existing subgroup instructions, +doing so results in far from optimal performance on some implementations. 
+
+## Solution Space
+
+### Using existing broadcast instruction
+
+It is possible to broadcast the value for each invocation to all other invocations
+and for each invocation to calculate the id of the invocation whose value it needs
+to retain. This is very inefficient and the cost of the rotation operation as a
+whole grows linearly with the size of the subgroup. It is included here only for
+the sake of completeness.
+
+### Using existing shuffle instruction
+
+The rotation operation above can be built on top of the *OpGroupNonUniformShuffle*
+instruction, here abbreviated as `Shuffle`, as follows:
+
+```
+ShuffleRotate(value, amount) = Shuffle(value, ((amount + LocalId) & (SubgroupSize - 1)))
+```
+
+*OpGroupNonUniformShuffle* does not require the source
+invocation's id to be dynamically uniform within the subgroup, which results in
+inefficient code for implementations that can optimise the case where the source
+ID is dynamically uniform. Admittedly, it is possible for applications to decorate
+the calculated source id with `Uniform`, and for implementations to detect that pattern
+and emit optimised code, but this approach can be complex and costly to implement as
+well as brittle, especially without introducing a new high-level language construct.
+
+### Using existing relative shuffle instruction
+
+It is similarly possible to implement the rotation operation using the
+*OpGroupNonUniformShuffleUp* or *OpGroupNonUniformShuffleDown* relative shuffle
+instructions, which are more efficient on some implementations. However, these
+instructions also do not require the source invocation id to be dynamically
+uniform, and their relative nature makes calculating the source invocation ID
+required for a rotation operation more complex than with a general shuffle.
+
+### New shuffle features
+
+Another solution that was considered is the addition of new subgroup features
+that only enable shuffle instructions for cases where the source invocation ID
+is dynamically uniform. 
While this would be a significant step toward enabling a
+more efficient implementation of the rotation operation described here on
+implementations that can optimise this case, it would not solve the implementation
+complexity issues mentioned above.
+
+This functionality would however be otherwise useful and could be added to the
+current proposal or be the subject of a separate proposal.
+
+### New dedicated SPIR-V instruction
+
+Introduce a new dedicated SPIR-V instruction that performs subgroup rotation
+operations and requires the rotation amount to be dynamically uniform.
+
+## Proposal
+
+Expose a new dedicated SPIR-V instruction, as defined by
+http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_subgroup_rotate.html[SPV_KHR_subgroup_rotate],
+that rotates values across the invocations of a subgroup and requires
+the rotation amount to be dynamically uniform within the subgroup.
+
+Specify new built-in functions to expose the SPIR-V instruction in GLSL:
+
+```
+genType subgroupRotate(genType value, uint delta);
+genIType subgroupRotate(genIType value, uint delta);
+genUType subgroupRotate(genUType value, uint delta);
+genBType subgroupRotate(genBType value, uint delta);
+genDType subgroupRotate(genDType value, uint delta);
+
+genType subgroupClusteredRotate(genType value, uint delta, uint clusterSize);
+genIType subgroupClusteredRotate(genIType value, uint delta, uint clusterSize);
+genUType subgroupClusteredRotate(genUType value, uint delta, uint clusterSize);
+genBType subgroupClusteredRotate(genBType value, uint delta, uint clusterSize);
+genDType subgroupClusteredRotate(genDType value, uint delta, uint clusterSize);
+
+If GL_EXT_shader_subgroup_extended_types_int8 is enabled:
+
+genI8Type subgroupRotate(genI8Type value, uint delta);
+genU8Type subgroupRotate(genU8Type value, uint delta);
+
+genI8Type subgroupClusteredRotate(genI8Type value, uint delta, uint clusterSize);
+genU8Type 
subgroupClusteredRotate(genU8Type value, uint delta, uint clusterSize); + +If GL_EXT_shader_subgroup_extended_types_int16 is enabled: + +genI16Type subgroupRotate(genI16Type value, uint delta); +genU16Type subgroupRotate(genU16Type value, uint delta); + +genI16Type subgroupClusteredRotate(genI16Type value, uint delta, uint clusterSize); +genU16Type subgroupClusteredRotate(genU16Type value, uint delta, uint clusterSize); + +If GL_EXT_shader_subgroup_extended_types_int64 is enabled: + +genI64Type subgroupRotate(genI64Type value, uint delta); +genU64Type subgroupRotate(genU64Type value, uint delta); + +genI64Type subgroupClusteredRotate(genI64Type value, uint delta, uint clusterSize); +genU64Type subgroupClusteredRotate(genU64Type value, uint delta, uint clusterSize); + +If GL_EXT_shader_subgroup_extended_types_float16 is enabled: + +genF16Type subgroupRotate(genF16Type value, uint delta); + +genF16Type subgroupClusteredRotate(genF16Type value, uint delta, uint clusterSize); + +``` + +Each of the rotate functions shuffles `value` to the invocation with a `gl_SubgroupInvocationID` equal to `(gl_SubgroupInvocationID + delta) % gl_SubgroupSize` for `subgroupRotate`, or to the invocation with a `gl_SubgroupInvocationID` equal to `(gl_SubgroupInvocationID - (gl_SubgroupInvocationID % clusterSize)) + ((gl_SubgroupInvocationID % clusterSize + delta) % clusterSize)` for `subgroupClusteredRotate` functions. + +## Examples + +``` +OpCapability GroupNonUniformShuffleRotateKHR +... +%result = OpGroupNonUniformShuffleRotateKHR %result_type Subgroup %value %amount +``` + +## Further Functionality + +See the above description for new shuffle features that would require the +source invocation id to be dynamically uniform. 
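The index arithmetic proposed above, and its equivalence to the shuffle-based emulation described in the Solution Space, can be modelled outside of GLSL. The following Python sketch is illustrative only: the function names are invented for this model, and the shuffle emulation assumes a power-of-two subgroup size, as the masking formula requires.

```python
# Pure-Python model of the proposed rotation semantics. Each list index
# stands for a subgroup invocation (gl_SubgroupInvocationID); each element
# is that invocation's `value` argument.

def subgroup_rotate(values, delta):
    """Invocation i receives the value passed by invocation
    (i + delta) % SubgroupSize, i.e. values rotate 'down' by delta."""
    n = len(values)  # SubgroupSize
    return [values[(i + delta) % n] for i in range(n)]

def subgroup_clustered_rotate(values, delta, cluster_size):
    """Same rotation, performed independently within each
    cluster_size-sized slice of the subgroup."""
    n = len(values)
    out = []
    for i in range(n):
        base = i - (i % cluster_size)  # first invocation of i's cluster
        out.append(values[base + (i % cluster_size + delta) % cluster_size])
    return out

def shuffle_rotate(values, amount):
    """Rotation emulated with a general shuffle, as in the Solution Space:
    Shuffle(value, (amount + LocalId) & (SubgroupSize - 1)).
    Valid only when SubgroupSize is a power of two."""
    n = len(values)
    return [values[(amount + i) & (n - 1)] for i in range(n)]

subgroup = list(range(16))  # SubgroupSize = 16, invocation i holds value i
rotated = subgroup_rotate(subgroup, 2)
# Matches the Problem Statement example: invocation 0 receives the value
# from invocation 2, and invocation 14 receives the value from invocation 0.
assert rotated[0] == 2 and rotated[14] == 0
# The shuffle-based emulation agrees with the dedicated rotation.
assert shuffle_rotate(subgroup, 2) == rotated
# Clustered rotation wraps within each 4-invocation cluster.
assert subgroup_clustered_rotate(subgroup, 1, 4)[3] == 0
```

The clustered variant never reads outside an invocation's own cluster, which is why its source index is built from the cluster base plus a rotation within `clusterSize` rather than within the whole subgroup.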
diff --git a/proposals/VK_KHR_shader_subgroup_rotate.asciidoc b/proposals/VK_KHR_shader_subgroup_rotate.asciidoc deleted file mode 100644 index 83eb8646..00000000 --- a/proposals/VK_KHR_shader_subgroup_rotate.asciidoc +++ /dev/null @@ -1,150 +0,0 @@ -// Copyright 2021-2024 The Khronos Group, Inc. -// -// SPDX-License-Identifier: CC-BY-4.0 - -# Subgroup rotation instruction -:toc: left -:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/ -:sectnums: - -## Problem Statement - -Subgroup operations are useful in the implementation of many compute algorithms. -Rotating values across invocations within a subgroup in particular can be useful -in the implementation of the convolution routines used in neural network inference. - -A rotation by N rotates values "down" N invocations within the subgroup. - -A rotation by (SubgroupSize - N) rotates values "up" N invocations -within the subgroup. - -Taking the example of a subgroup of size 16, a rotation by 2 would, -when executed by the invocation identified by id 0, return the value from the -invocation identified by id 2. The same rotation instruction, when executed -by the invocation identified by id 14, would return the value from the invocation -identified by id 0. - -While this rotation operation can be built on top of existing subgroup instructions, -doing so results in far from optimal performance on some implementations. - -## Solution Space - -### Using existing broadcast instruction - -It is possible to broadcast the value for each invocation to all other invocations -and for each invocation to calculate the id of the invocation whose value it needs -to retain. This is very inefficient and the cost of the rotation operation as a -whole grows linearly with the size of the subgroup. It is included here only for -the sake of completeness. 
- -### Using existing shuffle instruction - -The rotation operation above can be built on top of the *OpGroupNonUniformShuffle* -instruction, here abbreviated as `Shuffle`, as follows: - -``` -ShuffleRotate(value, amount) = Shuffle(value, ((amount + LocalId) & (SubgroupSize - 1))) -``` - -*OpGroupNonUniformShuffle* does not require the source -invocation's id to be dynamically uniform within the subgroup which results in -inefficient code for implementations that can optimise the case where the source -ID is dynamically uniform. Admittedly, it is possible for applications to decorate -the calculated source id with `Uniform` and implementations to detect that pattern -and emit optimised code but this approach can be complex and costly to implement as -well as brittle, especially without introducing a new high-level language construct. - -### Using existing relative shuffle instruction - -It is similarly possible to implement the rotation operation using the -*OpGroupNonUniformShuffleUp* or *OpGroupNonUniformShuffleDown* relative shuffle -instruction that are more efficient on some implementations. However, these -instructions also do not require the source invocation id to be dynamically -uniform and their relative nature makes calculating the source invocation ID -required for a rotation operation more complex than with a general shuffle. - -### New shuffle features - -Another solution that was considered is the addition of new subgroup features -that only enable shuffle instructions for cases where the source invocation ID -is dynamically uniform. While this would be a significant step toward enabling a -more efficient implementation of the rotation operation described here on -implementations that can optimise this case, it would not solve the implementation -complexity issues mentioned above. - -This functionality would however be otherwise useful and could be added to the -current proposal or be the object of a separate proposal. 
- -### New dedicated SPIR-V instruction - -Introduce a new dedicated SPIR-V instruction that performs subgroup rotation -operations and requires the rotation distance to be dynamically uniform. - -## Proposal - -Expose a new dedicated SPIR-V instruction, as defined by -http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_subgroup_rotate.html[SPV_KHR_subgroup_rotate] -to express rotating values across the invocations of a subgroup that requires -the rotation amount to be dynamically uniform within the subgroup. - -Specify new built-in functions to expose the SPIR-V instruction in GLSL: - -``` -genType subgroupRotate(genType value, uint delta); -genIType subgroupRotate(genIType value, uint delta); -genUType subgroupRotate(genUType value, uint delta); -genBType subgroupRotate(genBType value, uint delta); -genDType subgroupRotate(genDType value, uint delta); - -genType subgroupClusteredRotate(genType value, uint delta, uint clusterSize); -genIType subgroupClusteredRotate(genIType value, uint delta, uint clusterSize); -genUType subgroupClusteredRotate(genUType value, uint delta, uint clusterSize); -genBType subgroupClusteredRotate(genBType value, uint delta, uint clusterSize); -genDType subgroupClusteredRotate(genDType value, uint delta, uint clusterSize); - -If GL_EXT_shader_subgroup_extended_types_int8 is enabled: - -genI8Type subgroupRotate(genI8Type value, uint delta); -genU8Type subgroupRotate(genU8Type value, uint delta); - -genI8Type subgroupClusteredRotate(genI8Type value, uint delta, uint clusterSize); -genU8Type subgroupClusteredRotate(genU8Type value, uint delta, uint clusterSize); - -If GL_EXT_shader_subgroup_extended_types_int16 is enabled: - -genI16Type subgroupRotate(genI16Type value, uint delta); -genU16Type subgroupRotate(genU16Type value, uint delta); - -genI16Type subgroupClusteredRotate(genI16Type value, uint delta, uint clusterSize); -genU16Type subgroupClusteredRotate(genU16Type value, uint 
delta, uint clusterSize); - -If GL_EXT_shader_subgroup_extended_types_int64 is enabled: - -genI64Type subgroupRotate(genI64Type value, uint delta); -genU64Type subgroupRotate(genU64Type value, uint delta); - -genI64Type subgroupClusteredRotate(genI64Type value, uint delta, uint clusterSize); -genU64Type subgroupClusteredRotate(genU64Type value, uint delta, uint clusterSize); - -If GL_EXT_shader_subgroup_extended_types_float16 is enabled: - -genF16Type subgroupRotate(genF16Type value, uint delta); - -genF16Type subgroupClusteredRotate(genF16Type value, uint delta, uint clusterSize); - -``` - -Each of the rotate functions shuffles `value` to the invocation with a `gl_SubgroupInvocationID` equal to `(gl_SubgroupInvocationID + delta) % gl_SubgroupSize` for `subgroupRotate`, or to the invocation with a `gl_SubgroupInvocationID` equal to `(gl_SubgroupInvocationID - (gl_SubgroupInvocationID % clusterSize)) + ((gl_SubgroupInvocationID % clusterSize + delta) % clusterSize)` for `subgroupClusteredRotate` functions. - -## Examples - -``` -OpCapability GroupNonUniformShuffleRotateKHR -... -%result = OpGroupNonUniformShuffleRotateKHR %result_type Subgroup %value %amount -``` - -## Further Functionality - -See the above description for new shuffle features that would require the -source invocation id to be dynamically uniform. -- cgit v1.2.3