2.8. Fixed-Point Data Conversions

When generic vertex attributes and pixel color or depth components are represented as integers, they are often (but not always) considered to be normalized. Normalized integer values are treated specially when being converted to and from floating-point values, and are usually referred to as normalized fixed-point.

In the remainder of this section, $b$ denotes the bit width of the fixed-point integer representation. When the integer is one of the types defined by the API, $b$ is the bit width of that type. When the integer comes from an image containing color or depth component texels, $b$ is the number of bits allocated to that component in its specified image format.

The signed and unsigned fixed-point representations are assumed to be $b$ -bit binary two’s-complement integers and binary unsigned integers, respectively.

2.8.1. Conversion from Normalized Fixed-Point to Floating-Point

Unsigned normalized fixed-point integers represent numbers in the range $[0,1]$ . The conversion from an unsigned normalized fixed-point value $c$ to the corresponding floating-point value $f$ is defined as

\[ f = { c \over { 2^b - 1 } } \]

Signed normalized fixed-point integers represent numbers in the range $[-1,1]$ . The conversion from a signed normalized fixed-point value $c$ to the corresponding floating-point value $f$ is performed using

\[ f = \max( {c \over {2^{b-1} - 1}}, -1.0 ) \]

Only the range $[-2^{b-1}+1,2^{b-1}-1]$ is used to represent signed fixed-point values in the range $[-1,1]$ . For example, if $b = 8$ , then the integer value $-127$ corresponds to $-1.0$ and the value 127 corresponds to $1.0$ . Note that while zero is exactly expressible in this representation, one value ( $-128$ in the example) is outside the representable range, and must be clamped before use. This equation is used everywhere that signed normalized fixed-point values are converted to floating-point.

2.8.2. Conversion from Floating-Point to Normalized Fixed-Point

The conversion from a floating-point value $f$ to the corresponding unsigned normalized fixed-point value $c$ is defined by first clamping $f$ to the range $[0,1]$ , then computing

\[ c = \operatorname{convertFloatToUint} ( f \times ( 2^b - 1 ) , b ) \]

where $\operatorname{convertFloatToUint}(r,b)$ returns one of the two unsigned binary integer values with exactly $b$ bits which are closest to the floating-point value $r$ . Implementations should round to nearest. If $r$ is equal to an integer, then that integer value is returned. In particular, if $f$ is equal to 0.0 or 1.0, then $c$ must be assigned 0 or $2^b-1$ , respectively.

The conversion from a floating-point value $f$ to the corresponding signed normalized fixed-point value $c$ is performed by clamping $f$ to the range $[-1,1]$ , then computing

\[ c = \operatorname{convertFloatToInt} ( f \times ( 2^{b - 1} - 1 ) , b ) \]

where $\operatorname{convertFloatToInt}(r,b)$ returns one of the two signed two’s-complement binary integer values with exactly $b$ bits which are closest to the floating-point value $r$ . Implementations should round to nearest. If $r$ is equal to an integer, then that integer value must be returned. In particular, if $f$ is equal to -1.0, 0.0, or 1.0, then $c$ must be assigned $-(2^{b-1}-1)$ , 0, or $2^{b-1}-1$ , respectively.

This equation is used everywhere that floating-point values are converted to signed normalized fixed-point.