The Q notation is a way to specify the parameters of a binary fixed-point number format. For example, the format denoted by Q8.8 has 8 bits for the integer part and 8 bits for the fraction part.
A number of other notations have been used for the same purpose.
The Q notation, as defined by Texas Instruments, consists of the letter Q followed by a pair of numbers m.n, where m is the number of bits used for the integer part of the value, and n is the number of fraction bits.
By default, the notation describes a signed binary fixed-point format, with the unscaled integer stored in two's complement format, as used in most binary processors. The first bit always gives the sign of the value (1 = negative, 0 = non-negative), and it is not counted in the m parameter. Thus, the total number w of bits used is 1 + m + n.
For example, the specification Q3.12 describes a signed binary fixed-point number with w = 16 bits in total, comprising the sign bit, three bits for the integer part, and 12 bits for the fraction. That is, a 16-bit signed (two's complement) integer that is implicitly multiplied by the scaling factor 2^−12.
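The scaling described above can be illustrated with a minimal C sketch that converts between a real value and its Q3.12 representation. The names `q3_12_from_double` and `q3_12_to_double` are hypothetical, and the sketch assumes the input already lies within the Q3.12 range of −8 (inclusive) to +8 (exclusive); it does not saturate.

```c
#include <stdint.h>

#define Q3_12_FRAC_BITS 12                      /* n in Q3.12 */
#define Q3_12_SCALE     (1 << Q3_12_FRAC_BITS)  /* 2^12 = 4096 */

/* Hypothetical helper: scale by 2^12 and round to the nearest integer.
   Assumes -8.0 <= x < 8.0; values outside that range are not handled. */
static int16_t q3_12_from_double(double x)
{
    double scaled = x * Q3_12_SCALE;
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;     /* round half away from zero */
    return (int16_t)(int32_t)scaled;            /* cast truncates toward zero */
}

/* Hypothetical helper: apply the implicit scaling factor 2^-12. */
static double q3_12_to_double(int16_t q)
{
    return (double)q / Q3_12_SCALE;
}
```

With these helpers, 1.5 is stored as the integer 6144, and a value such as 0.1, which has no exact binary representation, is stored as 410 and reads back as 0.10009765625.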
In particular, when n is zero, the numbers are just integers. If m is zero, all bits except the sign bit are fraction bits; then the range of the stored number is from −1.0 (inclusive) to +1.0 (exclusive).
The m and the dot may be omitted, in which case they are inferred from the size of the variable or register where the value is stored. Thus, Q12 means a signed integer with any number of bits that is implicitly multiplied by 2^−12.
The letter U can be prefixed to the Q to denote an unsigned binary fixed-point format. For example, UQ1.15 describes values represented as unsigned 16-bit integers with an implicit scaling factor of 2^−15, which range from 0.0 to (2^16 − 1)/2^15 = +1.999969482421875.
A variant of the Q notation has been in use by ARM. In this variant, the m number includes the sign bit. For example, a 16-bit signed integer would be denoted Q15.0 in the TI variant, but Q16.0 in the ARM variant.
The resolution (difference between successive values) of a Qm.n or UQm.n format is always 2^−n. The range of representable values depends on the notation used:
Format | TI variant | ARM variant
Signed Qm.n | −2^m to +2^m − 2^−n | −2^(m−1) to +2^(m−1) − 2^−n
Unsigned UQm.n | 0 to 2^m − 2^−n | 0 to 2^m − 2^−n
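The ranges in the table can be computed mechanically. The following C sketch is illustrative only: the type and function names (`q_variant`, `q_range`, `q_format_range`) are hypothetical, and it assumes 0 ≤ m, n < 64, with m ≥ 1 for signed formats in the ARM variant.

```c
#include <stdbool.h>

typedef enum { Q_VARIANT_TI, Q_VARIANT_ARM } q_variant;  /* hypothetical names */

typedef struct {
    double min;         /* smallest representable value */
    double max;         /* largest representable value */
    double resolution;  /* difference between successive values, 2^-n */
} q_range;

/* 2^e as a double, for 0 <= e < 64 */
static double two_pow(int e)
{
    return (double)(1ULL << e);
}

/* Range and resolution of Qm.n (is_signed) or UQm.n (!is_signed). */
static q_range q_format_range(int m, int n, bool is_signed, q_variant v)
{
    q_range r;
    r.resolution = 1.0 / two_pow(n);
    if (!is_signed) {
        /* Unsigned formats have the same range in both variants. */
        r.min = 0.0;
        r.max = two_pow(m) - r.resolution;
    } else {
        /* In the ARM variant m counts the sign bit, so one bit fewer
           is available for the integer magnitude. */
        int int_bits = (v == Q_VARIANT_ARM) ? m - 1 : m;
        r.min = -two_pow(int_bits);
        r.max = two_pow(int_bits) - r.resolution;
    }
    return r;
}
```

For instance, `q_format_range(15, 0, true, Q_VARIANT_TI)` and `q_format_range(16, 0, true, Q_VARIANT_ARM)` both describe an ordinary 16-bit signed integer ranging from −32768 to 32767.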
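Q numbers are a ratio of two integers: the numerator is kept in storage, while the denominator d is equal to 2^n and is implied by the format.
Consider the following example: in a Q8 format the denominator is 2^8 = 256, so the value 1.5 is represented as 384/256; the integer 384 is stored, and the denominator 256 is implied.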
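If the Q number's base is to be maintained (n remains constant), the Q number math operations must keep the denominator d constant. The following formulas show math operations on two general Q numbers with numerators N1 and N2 and common denominator d: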
\begin{align}
\frac{N_1}{d} + \frac{N_2}{d} &= \frac{N_1 + N_2}{d} \\
\frac{N_1}{d} - \frac{N_2}{d} &= \frac{N_1 - N_2}{d} \\
\left( \frac{N_1}{d} \times \frac{N_2}{d} \right) \times d &= \frac{N_1 \times N_2}{d} \\
\left( \frac{N_1}{d} / \frac{N_2}{d} \right) / d &= \frac{N_1 / N_2}{d}
\end{align}
Because the denominator is a power of two, the multiplication by d can be implemented as an arithmetic shift to the left and the division by d as an arithmetic shift to the right; on many processors shifts are faster than multiplication and division.
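As an illustration of keeping the denominator constant, the following C sketch implements the four operations for a Q8 format (d = 2^8 = 256). The type `q8_t` and the function names are hypothetical, the choice of Q8 is an assumption, and overflow, rounding and division by zero are deliberately not handled. Sums and differences operate on the numerators directly; a product of two numerators carries a factor of d^2 and is divided by d once, while a quotient would lose the factor of d, so the dividend is multiplied by d first.

```c
#include <stdint.h>

#define Q8_FRAC_BITS 8                    /* n; assumed for this sketch */
#define Q8_DENOM     (1 << Q8_FRAC_BITS)  /* d = 2^8 = 256 */

typedef int32_t q8_t;                     /* hypothetical: the stored numerator */

static q8_t q8_add(q8_t a, q8_t b) { return a + b; }
static q8_t q8_sub(q8_t a, q8_t b) { return a - b; }

/* (a/d) * (b/d) = (a*b/d) / d, so the new numerator is a*b/d.
   Dividing by the constant power of two lets a compiler use a shift. */
static q8_t q8_mul(q8_t a, q8_t b)
{
    return (q8_t)(((int64_t)a * b) / Q8_DENOM);
}

/* (a/d) / (b/d) = (a*d/b) / d, so the new numerator is a*d/b.
   Assumes b != 0. */
static q8_t q8_div(q8_t a, q8_t b)
{
    return (q8_t)(((int64_t)a * Q8_DENOM) / b);
}
```

With 1.5 stored as 384 and 2.0 stored as 512, the sum is 896 (3.5), the difference is −128 (−0.5), the product is 768 (3.0), and the quotient is 192 (0.75).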
To maintain accuracy, the intermediate multiplication and division results must be double precision and care must be taken in rounding the intermediate result before converting back to the desired Q number.
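One possible way to apply both points to division is sketched below for a Q15 format held in 16-bit storage. The function name `q15_div_rounded` is hypothetical and the choice of Q15 is an assumption. The dividend is pre-scaled in a 32-bit intermediate, and half of the divisor is added or subtracted before the truncating division so that the result is rounded to the nearest value. The sketch assumes b is non-zero and that the quotient fits in 16 bits (roughly |a| < |b|); it does not saturate.

```c
#include <stdint.h>

#define Q15_FRAC_BITS 15   /* n; assumed for this sketch */

/* Hypothetical helper: divide two Q15 numbers, rounding to nearest.
   Assumes b != 0 and that the result is representable in int16_t. */
static int16_t q15_div_rounded(int16_t a, int16_t b)
{
    /* Pre-scale the dividend by 2^15 in a double-width intermediate.
       Multiplication is used because left-shifting a negative value
       is undefined in C. */
    int32_t temp = (int32_t)a * (1 << Q15_FRAC_BITS);

    /* Move the dividend half a divisor away from zero, so that the
       truncating integer division below rounds to nearest. */
    if ((temp >= 0) == (b >= 0))
        temp += b / 2;
    else
        temp -= b / 2;

    return (int16_t)(temp / b);
}
```

For example, dividing the stored value 1 by the stored value 3 gives 10923 with this rounding, whereas plain truncation of 32768/3 would give 10922.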
Using C, the operations are as follows (note that here Q refers to the number of fraction bits):
// saturate to range of int16_t
int16_t sat16(int32_t x);
int16_t q_mul(int16_t a, int16_t b);
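The bodies of these two functions are not given here. A minimal sketch of how they might be written is shown below; it assumes Q = 15 (a Q15 format in 16-bit storage) and introduces a rounding constant K equal to one half of 2^Q, neither of which is fixed by the signatures above. It also relies on right-shifting a negative value being an arithmetic shift, which is implementation-defined in C although common in practice.

```c
#include <stdint.h>

#define Q 15                /* number of fraction bits; assumed for this sketch */
#define K (1 << (Q - 1))    /* assumed rounding constant: half of 2^Q */

/* saturate to range of int16_t */
int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

int16_t q_mul(int16_t a, int16_t b)
{
    /* Widen before multiplying so the product fits in 32 bits. */
    int32_t temp = (int32_t)a * (int32_t)b;

    /* Rounding: mid values are rounded up. */
    temp += K;

    /* Correct by dividing by 2^Q (arithmetic shift assumed),
       then saturate to the 16-bit result range. */
    return sat16(temp >> Q);
}
```

In this sketch, multiplying 0.5 by 0.5 (16384 × 16384) yields 8192, that is 0.25, and multiplying −1.0 by −1.0 (−32768 × −32768) saturates to 32767 because +1.0 is not representable in Q15.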