Implementations normally perform computations in floating-point, and must meet the range and precision requirements defined under “Floating-Point Computation” below.

These requirements only apply to computations performed in Vulkan operations outside of shader execution, such as texture image specification and sampling, and per-fragment operations. Range and precision requirements during shader execution differ and are specified by the Precision and Operation of SPIR-V Instructions section.

In some cases, the representation and/or precision of operations is implicitly limited by the specified format of vertex or texel data consumed by Vulkan. Specific floating-point formats are described later in this section.

Most floating-point computation is performed in SPIR-V shader modules. The properties of computation within shaders are constrained as defined by the Precision and Operation of SPIR-V Instructions section.

Some floating-point computation is performed outside of shaders, such as viewport and depth range calculations. For these computations, we do not specify how floating-point numbers are to be represented, or the details of how operations on them are performed, but only place minimal requirements on representation and precision as described in the remainder of this section.

editing-note | |
---|---|

(Jon, Bug 14966) This is a rat’s nest of complexity, both in terms of describing/enumerating places such computation may take place (other than “not shader code”) and in how implementations may do it. We have consciously deferred the resolution of this issue to post-1.0, and in the meantime, the following language inherited from the OpenGL Specification is inserted as a placeholder. Hopefully it can be tightened up considerably. |

We require simply that numbers' floating-point parts contain enough bits and
that their exponent fields are large enough so that individual results of
floating-point operations are accurate to about 1 part in 10^{5}. The
maximum representable magnitude for all floating-point values must be at
least 2^{32}.
$x \cdot 0 = 0 \cdot x = 0$
for any non-infinite
and non-NaN
$x$
.
$1 \cdot x = x \cdot 1 = x$
.
$x + 0 = 0 + x = x$
.
$0^0 = 1$
.

Occasionally, further requirements will be specified. Most single-precision floating-point formats meet these requirements.

The special values $Inf$ and $-Inf$ encode values with magnitudes too large to be represented; the special value $NaN$ encodes “Not A Number” values resulting from undefined arithmetic operations such as $0 / 0$ . Implementations may support $Inf$ s and $NaN$ s in their floating-point computations.

Any representable floating-point value is legal as input to a Vulkan command that requires floating-point data. The result of providing a value that is not a floating-point number to such a command is unspecified, but must not lead to Vulkan interruption or termination. In [IEEE 754] arithmetic, for example, providing a negative zero or a denormalized number to an Vulkan command must yield deterministic results, while providing a $NaN$ or $Inf$ yields unspecified results.

16-bit floating point numbers are defined in the “16-bit floating point numbers” section of the Khronos Data Format Specification.

Any representable 16-bit floating-point value is legal as input to a Vulkan command that accepts 16-bit floating-point data. The result of providing a value that is not a floating-point number (such as $Inf$ or $NaN$ ) to such a command is unspecified, but must not lead to Vulkan interruption or termination. Providing a denormalized number or negative zero to Vulkan must yield deterministic results.

Unsigned 11-bit floating point numbers are defined in the “Unsigned 11-bit floating point numbers” section of the Khronos Data Format Specification.

When a floating-point value is converted to an unsigned 11-bit floating-point representation, finite values are rounded to the closest representable finite value.

While less accurate, implementations are allowed to always round in the direction of zero. This means negative values are converted to zero. Likewise, finite positive values greater than 65024 (the maximum finite representable unsigned 11-bit floating-point value) are converted to 65024. Additionally: negative infinity is converted to zero; positive infinity is converted to positive infinity; and both positive and negative $NaN$ are converted to positive $NaN$ .

Any representable unsigned 11-bit floating-point value is legal as input to a Vulkan command that accepts 11-bit floating-point data. The result of providing a value that is not a floating-point number (such as $Inf$ or $NaN$ ) to such a command is unspecified, but must not lead to Vulkan interruption or termination. Providing a denormalized number to Vulkan must yield deterministic results.

Unsigned 10-bit floating point numbers are defined in the “Unsigned 10-bit floating point numbers” section of the Khronos Data Format Specification.

When a floating-point value is converted to an unsigned 10-bit floating-point representation, finite values are rounded to the closest representable finite value.

While less accurate, implementations are allowed to always round in the direction of zero. This means negative values are converted to zero. Likewise, finite positive values greater than 64512 (the maximum finite representable unsigned 10-bit floating-point value) are converted to 64512. Additionally: negative infinity is converted to zero; positive infinity is converted to positive infinity; and both positive and negative $NaN$ are converted to positive $NaN$ .

Any representable unsigned 10-bit floating-point value is legal as input to a Vulkan command that accepts 10-bit floating-point data. The result of providing a value that is not a floating-point number (such as $Inf$ or $NaN$ ) to such a command is unspecified, but must not lead to Vulkan interruption or termination. Providing a denormalized number to Vulkan must yield deterministic results.