In computer science, multiply-with-carry (MWC) is a method invented by George Marsaglia for generating sequences of random integers based on an initial set from two to many thousands of randomly chosen seed values. The main advantages of the MWC method are that it invokes simple computer integer arithmetic and leads to very fast generation of sequences of random numbers with immense periods, ranging from around
260
22000000
As with all pseudorandom number generators, the resulting sequences are functions of the supplied seed values.
xn=bxn-1\bmodp
p
Normal Lehmer generator implementations choose a modulus close to the machine word size. An MWC generator instead maintains its state in base
b
b
b
b
b=28
b=264
b=232
The initial state ("seed") values are arbitrary, except that they must not be all zero, nor all at the maximum permitted values (
x0=b-1
c0=a-1
c0
a-2
xn,cn
xn=(axn-1+cn-1
)\bmodb, c | ||||
|
\right\rfloor
This is called a lag-1 MWC sequence. Sometimes an odd base is preferred, in which case
b=2k-1
r
r
xn,cn
n>r
xn=(axn-r+cn-1
)\bmodb, c | ||||
|
\right\rfloor
and the MWC generator output is the sequence of
x
xr,xr+1,xr+2,...
In this case, the initial state ("seed") values must not be all zero nor
x0=b-1
cr-1=a-1
The MWC multiplier
a
r
p=abr-1
a
r
b
p
Because 2 is a quadratic residue of numbers of the form
8k\pm1
b=2k
p=abr-1
2k
The basic form of an MWC generator has parameters a, b and r, and r+1 words of state. The state consists of r residues modulo b
0 ≤ x0, x1, x2,..., xr−1 < b,and a carry cr−1 < a.
Although the theory of MWC generators permits a > b, a is almost always chosen smaller for convenience of implementation.
The state transformation function of an MWC generator is one step of Montgomery reduction modulo p. The state is a large integer with most significant word cn−1 and least significant word xn−r. Each step, xn−r·(abr−1) is added to this integer. This is done in two parts: −1·xn−r is added to xn−r, resulting in a least significant word of zero. And second, a·xn−r is added to the carry. This makes the integer one word longer, producing two new most significant words xn and cn.
So far, this has simply added a multiple of p to the state, resulting in a different representative of the same residue class modulo p. But finally, the state is shifted down one word, dividing by b. This discards the least significant word of zero (which, in practice, is never computed at all) and effectively multiplies the state by b−1 (mod p).
Thus, a multiply-with-carry generator is a Lehmer generator with modulus p and multiplier b−1 (mod p). This is the same as a generator with multiplier b, but producing output in reverse order, which does not affect the quality of the resultant pseudorandom numbers.
Couture and L'Ecuyer have proved the surprising result that the lattice associated with a multiply-with-carry generator is very close to the lattice associated with the Lehmer generator it simulates. Thus, the mathematical techniques developed for Lehmer generators (such as the spectral test) can be applied to multiply-with-carry generators.
A linear congruential generator with base b = 232 is implemented as
xn+1
32 | |
=(ax | |
n+c) \bmod2 |
,
This can be computed using only the low 32 bits of the product of a and the current x.However, many microprocessors can compute a full 64-bit product in almost the same time as the low 32 bits. Indeed, many compute the 64-bit product and then ignore the high half.
A lag-1 multiply-with-carry generator allows us to make the period nearly 263 by using almost the same computer operations, except that the top half of the 64-bit product is not ignored after the product is computed.Instead, a 64-bit sum is computed, and the top half is used as a new carry value c rather than the fixed additive constant of the standard congruential sequence:Compute ax+c in 64 bits, then use the top half as the new c, and bottom half as the new x.
With multiplier a specified, each pair of input values x, c is converted to a new pair,
x\leftarrow(ax+c)\bmod232, c\leftarrow\left\lfloor
ax+c | |
232 |
\right\rfloor.
If x and c are not both zero, then the period of the resulting multiply-with-carry sequence will be the order of b = 232in the multiplicative group of residues moduloabr − 1, that is, the smallest n such thatbn ≡ 1 (mod abr − 1).
If p = abr − 1 is prime, then Fermat's little theorem guarantees that the order of any element must divide p − 1 = abr − 2, so one way to ensure a large order is to choose a such that p is a "safe prime",that is both p and (p − 1)/2 = abr/2 − 1 are prime. In such a case, for b= 232 and r = 1, the period will be abr/2 − 1, approaching 263, which in practice may be an acceptably large subset of the number of possible 32-bit pairs (x, c).
More specifically, in such a case, the order of any element divides p − 1, and there are only four possible divisors: 1, 2, abr/2 − 1, or abr − 2. The first two apply only to the elements 1 and −1, and quadratic reciprocity arguments show that the fourth option cannot apply to b, so only the third option remains.
Following are some maximal values of a for computer applications which satisfy the above safe prime condition, for lag-1 generators:
Bits in a | b | Maximum a such that ab−1 is a safe prime | Period | |
---|---|---|---|---|
15 | 216 | 215−50 = 32,718 | 1,072,103,423 | |
16 | 216 | 216−352 = 65,184 | 2,135,949,311 | |
31 | 232 | 231−563 = 2,147,483,085 | 4,611,684,809,394,094,079 | |
32 | 232 | 232−178 = 4,294,967,118 | 9,223,371,654,602,686,463 | |
64 | 264 | 264−742 = 18,446,744,073,709,550,874 | 263(264−742)−1 ≈ | |
128 | 2128 | 2128−10,408 | 2127(2128−10,408)−1 ≈ | |
256 | 2256 | 2256−9166 | 2255(2256−9166)−1 ≈ | |
512 | 2512 | 2512−150,736 | 2511(2512−150,736)−1 ≈ |
While a safe prime ensures that almost any element of the group has a large order, the period of the generator is specifically the order of b. For small moduli, more computationally expensive methods can be used to find multipliers a where the period is ab/2 − 1.The following are again maximum values of a of various sizes.
Bits in a | br | Maximum a such that b has order abr/2-1 | Period | |
---|---|---|---|---|
8 | 28 | 28−7 = 249 | 31,871 | |
8 | (28)2 = 216 | 28−32 = 224 | 7,340,031 | |
15 | 216 | 215−29 = 32,739 | 1,072,791,551 | |
16 | 216 | 216−22 = 65,514 | 2,146,762,751 | |
8 | (28)4 = 232 | 28−64 = 192 | 412,316,860,415 | |
15 | (216)2 = 232 | 215−26 = 32,742 | 70,312,909,602,815 | |
16 | (216)2 = 232 | 216−2 = 65,534 | 140,733,193,388,031 | |
31 | 232 | 231−68 = 2,147,483,580 | 4,611,685,872,398,499,839 | |
32 | 232 | 232−76 = 4,294,967,220 | 9,223,371,873,646,018,559 | |
8 | (28)8 = 264 | 28−41 = 215 | 263(28−41)−1 ≈ | |
15 | (216)4 = 264 | 215−50 = 32,718 | 263(215−50)−1 ≈ | |
16 | (216)4 = 264 | 216−56 = 65,480 | 263(216−56)−1 ≈ | |
31 | (232)2 = 264 | 231−38 = 2,147,483,610 | 263(231−38)−1 ≈ | |
32 | (232)2 = 264 | 232−43 = 4,294,967,253 | 263(232−43)−1 ≈ | |
63 | 264 | 263−140 = 9,223,372,036,854,775,668 | 263(263−140)−1 ≈ | |
64 | 264 | 264−116 = 18,446,744,073,709,551,500 | 263(264−116)−1 ≈ |
The output of a multiply-with-carry generator is equivalent to the radix-b expansion of a fraction with denominator p = abr − 1. Here is an example for the simple case of b = 10 and r = 1, so the result is a repeating decimal.
Starting with , the MWC sequence
xn=(7xn-1+cn-1
)\bmod10, c | ||||
|
\right\rfloor,
produces this sequence of states:
10,01,07,49,67,55,40,04,28,58,61,13,22,16,43,25,37,52,19,64,34,31, 10,01,07,...
with period 22. Consider just the sequence of xi:
0,1,7,9,7,5,0,4,8,8,1,3,2,6,3,5,7,2,9,4,4,1, 0,1,7,9,7,5,0,...
Notice that if those repeated segments of x values are put in reverse order:
1449275 … 97101449275 … 9710144 …
10 | |
69 |
=0.144927536231884057971014492753623\ldots
This is true in general: The sequence of xs produced by a lag-r MWC generator:
xn=(axn-r+cn-1
)\bmodb, c | ||||
|
\right\rfloor,
when put in reverse order, will be the radix-b expansion of a fraction j/(abr - 1) for some 0 < j < abr.
Continuing the above example, if we start with , and generate the ordinary congruential sequence
xn=7xn-1\bmod69
31,10,1,7,49,67,55,40,4,28,58,61,13,22,16,43,25,37,52,19,64,34, 31,10,1,7,...and that sequence, reduced mod 10, is
1,0,1,7,9,7,5,0,4,8,8,1,3,2,6,3,5,7,2,9,4,4, 1,0,1,7,9,7,5,0,...the same sequence of xs resulting from the MWC sequence.
This is true in general, (but apparently only for lag-1 MWC sequences):
Given initial values
x0,c0
x1,x2,\ldots
xn=(axn-1+cn-1)\bmod
b, c | ||||
|
\right\rfloor
is exactly the Lehmer random number generator output sequence yn = ayn - 1 mod (ab - 1),reduced modulo b.
Choosing a different initial value y0 merely rotates the cycle of xs.
Establishing the period of a lag-r MWC generator usually entails choosing multiplier a so that p = abr − 1 is prime. Then p − 1 will have to be factored in order to find the order of b mod p. If p is a safe prime, then this is simple, and the order of b will be either p − 1 or (p − 1)/2. In other cases, p - 1 may be difficult to factor.
However, the algorithm also permits a negative multiplier. This leads to a slight modification of the MWC procedure, and produces a modulus p = = abr + 1. This makes p − 1 = abr easy to factor, making it practical to establish the period of very large generators.
The modified procedure is called complementary-multiply-with-carry (CMWC), and the setup is the same as that for lag-r MWC: multiplier a, base b, and r + 1 seeds,
x0, x1, x2, ..., xr−1, and cr−1.
The modification is to the generation of a new pair (x, c). Rearranging the computation to avoid negative numbers, the new x value is complemented by subtracting it from b − 1:
xn=(b-1)-(axn-r+cn-1
)\bmodb, c | ||||
|
\right\rfloor.
The resulting sequence of xs produced by the CMWC RNG will have period the order of b in the multiplicative group of residues modulo abr+1, and the output xs, in reverse order, will form the base b expansion of j/(abr+1) for some 0 < j < abr.
Use of lag-r CMWC makes it much easier to find periods for rs as large as 512, 1024, 2048, etc.(Making r a power of 2 makes it slightly simpler to access elements in the array containing the r most recent xs.)
Another advantage of this modified procedure is that the period is a multiple of b, so the output is exactly equidistributed mod b. (The ordinary MWC, over its full period, produces each possible output an equal number of times except that zero is produced one time less, a bias which is negligible if the period is long enough.)
One disadvantage of the CMWC construction is that, with a power-of-two base, the maximum achievable period is less than for a similar-sized MWC generator; you lose several bits. Thus, an MWC generator is usually preferable for small lags. This can remedied by using b = 2k−1, or choosing a lag one word longer to compensate.
Some examples:With b = 232, and a = 109111 or 108798 or 108517, the period of the lag-1024 CMWC
xn=(b-1)-(axn-1024+cn-1
)\bmodb, c | ||||
|
\right\rfloor.
With b = 232 and a = 3636507990, p = ab1359 − 1 is a safe prime, so the MWC sequence based on that a has period 3636507990·243487 ≈ 1013101.
With b = 232, a CMWC RNG with near record period may be based on the prime p = 15455296b42658 + 1. The order of b for that prime is 241489·21365056 ≈ 10410928.
The MWC modulus of abr−1 is chosen to make computation particularly simple, but brings with it some disadvantages, notably that the period is at most half the modulus. There are several ways to generalize this, at the cost of more multiplications per iteration.
First, it is possible to add additional terms to the product, producing a modulus of the form arbr+asbs−1. This requires computing cnb + xn = arxn−r + asxn−s. (The carry is limited to one word if ar + as ≤ b.)
However, this does not fix the period issue, which depends on the low bits of the modulus. Fortunately, the Montgomery reduction algorithm permits other moduli, as long as they are relatively prime to the base b, and this can be applied to permit a modulus of the form arbr−a0, for a wide range of values a0. Goresky and Klapper developed the theory of these generalized multiply-with-carry generators, proving, in particular, that choosing a negative a0 and ar–a0 < b the carry value is always smaller than b, making the implementation efficient. The more general form of the modulus improves also the quality of the generator, albeit one cannot always get full period.
To implement a Goresky-Klapper generator one precomputes a (mod b), and changes the iteration as follows:
t=cn-1+
r | |
\sum | |
1 |
aixn-i, xn=
-1 | |
a | |
0 |
t\bmodb, cn=\left\lfloor
t-a0xn | |
b |
\right\rfloor.
In the common case that b = 2k, a0 must be odd for the inverse to exist.
The following is an implementation of the CMWC algorithm in the C programming language. Also, included in the program is a sample initialization function. In this implementation the base is 232-1 and lag r=4096. The period of the resulting generator is about
2131104
// How many bits in rand?// https://stackoverflow.com/a/27593398
// CMWC working parts
struct cmwc_state ;
// Collect 32 bits of rand. You are encouraged to use a better source instead.uint32_t rand32(void)
// Init the state with seedvoid initCMWC(struct cmwc_state *state, unsigned int seed)
// CMWC engineuint32_t randCMWC(struct cmwc_state *state) //EDITED parameter *state was missing
int main
The following are implementations of small-state MWC generators with 64-bit output using 128-bit multiplications.
/* The state must be neither all zero, nor x = 2^64 - 1, c = MWC_A1 - 1. The condition 0 < c < MWC_A1 - 1 is thus sufficient. */
uint64_t x, c = 1;
uint64_t inline next
/* The state must be neither all zero, nor x = y = z = 2^64 - 1, c = MWC_A3 - 1. The condition 0 < c < MWC_A3 - 1 is thus sufficient. */
uint64_t x, y, z, c = 1;
uint64_t inline next
The following are implementations of small-state Goresky-Klapper's generalized MWC generators with 64-bit output using 128-bit multiplications.
/* The state must be neither all zero, nor x = 2^64 - 1, c = GMWC_A1 + GMWC_MINUS_A0. The condition 0 < c < GMWC_A1 + GMWC_MINUS_A0 is thus sufficient. */
uint64_t x = 0, c = 1;
uint64_t inline next
/* The state must be neither all zero, nor x = y = z = 2^64 - 1, c = GMWC_A3 + GMWC_MINUS_A0. The condition 0 < c < GMWC_A3 + GMWC_MINUS_A0 is thus sufficient. */
uint64_t x, y, z, c = 1; /* The state can be seeded with any set of values, not all zeroes. */
uint64_t inline next
Because of simplicity and speed, CMWC is known to be used in game development, particularly in modern roguelike games. It is informally known as the Mother of All PRNGs, a name originally coined by Marsaglia himself.[2] In libtcod, CMWC4096 replaced MT19937 as the default PRNG.[3]