Elliptic curve scalar multiplication is the operation of successively adding a point along an elliptic curve to itself repeatedly. It is used in elliptic curve cryptography (ECC).The literature presents this operation as scalar multiplication, as written in Hessian form of an elliptic curve. A widespread name for this operation is also elliptic curve point multiplication, but this can convey the wrong impression of being a multiplication between two points.
Given a curve, E, defined by some equation in a finite field (such as E:), point multiplication is defined as the repeated addition of a point along that curve. Denote as for some scalar (integer) n and a point that lies on the curve, E. This type of curve is known as a Weierstrass curve.
The security of modern ECC depends on the intractability of determining n from given known values of Q and P if n is large (known as the elliptic curve discrete logarithm problem by analogy to other cryptographic systems). This is because the addition of two points on an elliptic curve (or the addition of one point to itself) yields a third point on the elliptic curve whose location has no immediately obvious relationship to the locations of the first two, and repeating this many times over yields a point nP that may be essentially anywhere. Intuitively, this is not dissimilar to the fact that if you had a point P on a circle, adding 42.57 degrees to its angle may still be a point "not too far" from P, but adding 1000 or 1001 times 42.57 degrees will yield a point that requires a bit more complex calculation to find the original angle. Reversing this process, i.e., given Q=nP and P, and determining n, can only be done by trying out all possible n—an effort that is computationally intractable if n is large.
There are three commonly defined operations for elliptic curve points: addition, doubling and negation.
Point at infinity is the identity element of elliptic curve arithmetic. Adding it to any point results in that other point, including adding point at infinity to itself.That is:
\begin{align} l{O}+l{O}=l{O}\\ l{O}+P=P \end{align}
Point at infinity is also written as .
Point negation is finding such a point, that adding it to itself will result in point at infinity .
\begin{align} P+(-P)=l{O} \end{align}
For elliptic curves of the form E:, negation is a point with the same x coordinate but negated y coordinate:
\begin{align} (x,y)+(-(x,y))&=l{O}\\ (x,y)+(x,-y)&=l{O}\\ (x,-y)&=-(x,y)\end{align}
With 2 distinct points, P and Q, addition is defined as the negation of the point resulting from the intersection of the curve, E, and the straight line defined by the points P and Q, giving the point, R.[1]
\begin{align} P+Q&=R\\ (xp,yp)+(xq,yq)&=(xr,yr) \end{align}
Assuming the elliptic curve, E, is given by, this can be calculated as:
\begin{align} λ&=
yq-yp | |
xq-xp |
\\ xr&=λ2-xp-xq\\ yr&=λ(xp-xr)-yp\\ \end{align}
Where the points P and Q are coincident (at the same coordinates), addition is similar, except that there is no well-defined straight line through P, so the operation is closed using a limiting case, the tangent to the curve, E, at P.
This is calculated as above, taking derivatives (dE/dx)/(dE/dy):[2]
λ=
| ||||||||||
2yp |
where a is from the defining equation of the curve, E, above.
The straightforward way of computing a point multiplication is through repeated addition. However, there are more efficient approaches to computing the multiplication.
The simplest method is the double-and-add method,[3] similar to square-and-multiply in modular exponentiation. The algorithm works as follows:
To compute sP, start with the binary representation for s:, where .
let bits = bit_representation(s) # the vector of bits (from LSB to MSB) representing s let res =
\begin{align}l{O}\end{align}
let bits = bit_representation(s) # the vector of bits (from LSB to MSB) representing s let i = length(bits) - 2 let res = P while (i >= 0): # traversing from second MSB to LSB res = res + res # double if bits[i]
Note that both of the iterative methods above are vulnerable to timing analysis. See Montgomery Ladder below for an alternative approach.
f(P, d) is if d = 0 then return 0 # computation complete else if d = 1 then return P else if d mod 2 = 1 then return point_add(P, f(P, d - 1)) # addition when d is odd else return f(point_double(P), d / 2) # doubling when d is even
where f is the function for multiplying, P is the coordinate to multiply, d is the number of times to add the coordinate to itself. Example: 100P can be written as and thus requires six point double operations and two point addition operations. 100P would be equal to f(P, 100).
This algorithm requires log2(d) iterations of point doubling and addition to compute the full point multiplication. There are many variations of this algorithm such as using a window, sliding window, NAF, NAF-w, vector chains, and Montgomery ladder.
In the windowed version of this algorithm,[3] one selects a window size w and computes all
2w
dP
d=0,1,2,...,2w-1
d=d0+
wd | |
2 | |
1 |
+22wd2+ … +2mwdm
Q ← 0 for i from m to 0 do Q ← point_double_repeat(Q, w) if di > 0 then Q ← point_add(Q, diP) # using pre-computed value of diP return Q
This algorithm has the same complexity as the double-and-add approach with the benefit of using fewer point additions (which in practice are slower than doubling). Typically, the value of w is chosen to be fairly small making the pre-computation stage a trivial component of the algorithm. For the NIST recommended curves,
w=4
n+1
2w-2+\tfrac{n}{w}
In the sliding-window version, we look to trade off point additions for point doubles. We compute a similar table as in the windowed version except we only compute the points
dP
d=2w-1,2w-1+1,...,2w-1
d=d0+2d1+
2d | |
2 | |
2 |
+ … +
md | |
2 | |
m |
Q ← 0 for i from m downto 0 do if di = 0 then Q ← point_double(Q) else t ← extract j (up to w − 1) additional bits from d (including di) i ← i − j if j < w then Perform double-and-add using t return Q else Q ← point_double_repeat(Q, w) Q ← point_add(Q, tP) return Q
This algorithm has the benefit that the pre-computation stage is roughly half as complex as the normal windowed method while also trading slower point additions for point doublings. In effect, there is little reason to use the windowed method over this approach, except that the former can be implemented in constant time. The algorithm requires
w-1+n
2w-1-1+\tfrac{n}{w}
In the non-adjacent form we aim to make use of the fact that point subtraction is just as easy as point addition to perform fewer (of either) as compared to a sliding-window method. The NAF of the multiplicand
d
i ← 0 while (d > 0) do if (d mod 2) = 1 then di ← d mods 2w d ← d − di else di = 0 d ← d/2 i ← i + 1 return (di−1, di-2, …, d0)
Where the signed modulo function mods is defined as if (d mod 2w) >= 2w−1 return (d mod 2w) − 2w else return d mod 2w
This produces the NAF needed to now perform the multiplication. This algorithm requires the pre-computation of the points
\lbrace1,3,5,...,2w-1-1\rbraceP
P
P=\lbracex,y\rbrace
-P=\lbracex,-y\rbrace
dP
Q ← 0 for j ← i − 1 downto 0 do Q ← point_double(Q) if (dj != 0) Q ← point_add(Q, djP) return Q
The wNAF guarantees that on average there will be a density of
\tfrac{1}{w+1}
2w-2-1
n
\tfrac{n}{w+1}
One property of the NAF is that we are guaranteed that every non-zero element
di
w-1
w
d
2w
di
It has been shown that through application of a FLUSH+RELOAD side-channel attack on OpenSSL, the full private key can be revealed after performing cache-timing against as few as 200 signatures performed.[4]
The Montgomery ladder[5] approach computes the point multiplication in a fixed number of operations. This can be beneficial when timing, power consumption, or branch measurements are exposed to an attacker performing a side-channel attack. The algorithm uses the same representation as from double-and-add.
R0 ← 0 R1 ← P for i from m downto 0 do if di = 0 then R1 ← point_add(R0, R1) R0 ← point_double(R0) else R0 ← point_add(R0, R1) R1 ← point_double(R1) // invariant property to maintain correctness assert R1
This algorithm has in effect the same speed as the double-and-add approach except that it computes the same number of point additions and doubles regardless of the value of the multiplicand d. This means that at this level the algorithm does not leak any information through branches or power consumption.
However, it has been shown that through application of a FLUSH+RELOAD side-channel attack on OpenSSL, the full private key can be revealed after performing cache-timing against only one signature at a very low cost.[6]
Rust code for Montgomery Ladder:[7]
The security of a cryptographic implementation is likely to face the threat of the so called timing attacks which exploits the data-dependent timing characteristics of the implementation. Machines running cryptographicimplementations consume variable amounts of time to process different inputs and so the timings vary based on the encryption key. To resolve this issue, cryptographic algorithms are implementedin a way which removes data dependent variable timing characteristic from the implementation leadingto the so called constant-time implementations. Software implementations are considered to be constant-time in the following sense as stated in:[8] “avoids all input-dependent branches, all input-dependent array indices, and other instructions with input-dependent timings.” The GitHub page[9] lists coding rulesfor implementations of cryptographic operations, and more generally for operations involving secret or sensitive values.
The Montgomery ladder is an
x
Algorithm Montgomery-Ladder(xP,n) input: An
l
n
x
xP
P
x
nP
n
P
i
l-1
i
n
⊕
\langle
\rangle
\langle
\rangle
\langle
\rangle
\langle
\rangle
\langle
\rangle
\langle
\rangle
\langle
\rangle
\langle
\rangle
The Ladder-Step function (given below) used within the ladder is the core of the algorithm and is a combined form of the differential add and doubling operations. The fieldconstant a24 is defined as a24 =
(A+2)/4
A
Function Ladder-Step(
\langle
\rangle
\langle
\rangle
\langle
\rangle
\langle
\rangle
The CSwap function manages the conditional branching and helps the ladder to run following the requirements of a constant time implementation. The function swaps the pair of field elements
\langle
\rangle
\langle
\rangle
b
Since the inception of the standard Montgomery curve Curve25519 at 128-bit security level, there has been various software implementations to compute the ECDH on various architectures and to achieve best possible performance cryptographic developers have resorted to write the implementations using assembly language of the underlying architecture. The work[13] provided a couple of 64-bit assembly implementations targeting the AMD64 architecture. The implementations were developed using a tool known as qhasm[14] which can generate high-speed assembly language cryptographic programs. It is to be noted that the function CSwap was used in the implementations of these ladders. After that there has been several attempts to optimise the ladder implementation through hand-written assembly programs out of which the notion of CSelect was first used in[15] and then in.[16] Apart from using sequential instructions, vector instructions have also been used to optimise the ladder computation through various works.[17] [18] [19] [20] Along with AMD64, attempts have also been made to achieve efficient implementations on other architectures like ARM. The works[21] and [22] provides efficient implementations targeting the ARM architecture. The libraries lib25519[23] and [24] are two state-of-art libraries containing efficient implementations of the Montgomery ladder for Curve25519. Nevertheless, the libraries have implementations of other cryptographic primitives as well.
Apart from Curve25519, there have been several attempts to compute the ladder over other curves at various security levels. Efficient implementations of the ladder over the standard curve Curve448 at 224-bit security level have also been studied in literature. A curve named Curve41417 providing security just over 200 bits was proposed [25] in which a variant of Karatsuba strategy was used to implement the field multiplication needed for the related ECC software. In pursuit of searching Montgomery curves that are competitive to Curve25519 and Curve448 research has been done and couple of curves were proposed along with efficient sequential and vectorised implementations of the corresponding ladders. At 256-bit security level efficient implementations of the ladder have also been addressed through three different Montgomery curves.[26]