In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm for large matrices, with a better asymptotic complexity, although the naive algorithm is often better for smaller matrices. The Strassen algorithm is slower than the fastest known algorithms for extremely large matrices, but such galactic algorithms are not useful in practice, as they are much slower for matrices of practical size. For small matrices even faster algorithms exist.
Strassen's algorithm works over any ring, such as ordinary plus/multiply arithmetic, but not over arbitrary semirings such as the min-plus semiring or Boolean algebra, because it requires subtraction; over such semirings only the naive algorithm (so-called combinatorial matrix multiplication) still works.
Volker Strassen first published this algorithm in 1969 and thereby proved that the $n^3$ general matrix multiplication algorithm was not optimal.
Let $A$, $B$ be two square matrices over a ring $\mathcal{R}$, for example matrices whose entries are integers or the real numbers. The goal of matrix multiplication is to calculate the matrix product $C = AB$. The following exposition of the algorithm assumes that all of these matrices have sizes that are powers of two, i.e. $A, B, C \in \operatorname{Mat}_{2^n \times 2^n}(\mathcal{R})$, but this is only conceptually necessary: if the matrices $A$, $B$ are not of type $2^n \times 2^n$, the "missing" rows and columns can be filled with zeros to obtain matrices with sizes that are powers of two, though real implementations of the algorithm do not do this in practice.
The Strassen algorithm partitions $A$, $B$ and $C$ into equally sized block matrices

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}, \quad C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix},$$
with $A_{ij}, B_{ij}, C_{ij} \in \operatorname{Mat}_{2^{n-1} \times 2^{n-1}}(\mathcal{R})$. The naive algorithm would then be

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} B_{11} + A_{12} B_{21} & A_{11} B_{12} + A_{12} B_{22} \\ A_{21} B_{11} + A_{22} B_{21} & A_{21} B_{12} + A_{22} B_{22} \end{bmatrix}.$$
This construction does not reduce the number of multiplications: 8 multiplications of matrix blocks are still needed to calculate the $C_{ij}$.
The Strassen algorithm defines instead new values:
$$\begin{align} M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}); \\ M_2 &= (A_{21} + A_{22}) B_{11}; \\ M_3 &= A_{11}(B_{12} - B_{22}); \\ M_4 &= A_{22}(B_{21} - B_{11}); \\ M_5 &= (A_{11} + A_{12}) B_{22}; \\ M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}); \\ M_7 &= (A_{12} - A_{22})(B_{21} + B_{22}), \end{align}$$
using only 7 multiplications (one for each $M_k$) instead of 8. We may now express the $C_{ij}$ in terms of the $M_k$:

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} M_1 + M_4 - M_5 + M_7 & M_3 + M_5 \\ M_2 + M_4 & M_1 - M_2 + M_3 + M_6 \end{bmatrix}.$$
We recursively iterate this division process until the submatrices degenerate into numbers (elements of the ring $\mathcal{R}$). If, as mentioned above, the original matrices $A$ and $B$ had sizes that were not powers of two, then the resulting product has extra zero rows and columns just like $A$ and $B$; these are stripped at this point to obtain the (smaller) matrix $C$ one really wanted.
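The recursion above translates directly into code. The following is a minimal sketch in Python with NumPy, assuming square inputs whose size is a power of two; the function name and the `cutoff` value are illustrative choices, not part of the original formulation.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices A and B (size a power of two) with
    Strassen's recursion, falling back to conventional multiplication
    below `cutoff`."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # conventional multiplication for small blocks

    k = n // 2  # split each matrix into four k-by-k blocks
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]

    # The seven block products M1..M7.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    # Reassemble the four blocks of C.
    C = np.empty_like(A)
    C[:k, :k] = M1 + M4 - M5 + M7
    C[:k, k:] = M3 + M5
    C[k:, :k] = M2 + M4
    C[k:, k:] = M1 - M2 + M3 + M6
    return C
```

For example, `strassen(A, B)` on two random $256 \times 256$ arrays agrees with `A @ B` up to floating-point rounding.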
Practical implementations of Strassen's algorithm switch to standard methods of matrix multiplication for small enough submatrices, for which those methods are more efficient. The particular crossover point at which Strassen's algorithm becomes more efficient depends on the specific implementation and hardware. Earlier authors estimated that Strassen's algorithm is faster for matrices with widths from 32 to 128 for optimized implementations.[2] However, this crossover point has been increasing in recent years: a 2010 study found that even a single step of Strassen's algorithm is often not beneficial on current architectures, compared to a highly optimized traditional multiplication, until matrix sizes exceed 1000 or more, and that even for matrix sizes of several thousand the benefit is typically marginal at best (around 10% or less). A more recent study (2016) observed benefits for matrices as small as 512, with a benefit of around 20%.
It is possible to reduce the number of matrix additions by instead using the following form discovered by Winograd:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} A & C \\ B & D \end{bmatrix} = \begin{bmatrix} t + bB & w + v + (a + b - c - d)D \\ w + u + d(B + C - A - D) & w + u + v \end{bmatrix}$$

where $t = aA$, $u = (c - a)(C - D)$, $v = (c + d)(C - A)$, $w = t + (c + d - a)(A + D - C)$.
This reduces the number of matrix additions and subtractions from 18 to 15. The number of matrix multiplications is still 7, and the asymptotic complexity is the same.
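As a quick sanity check, the identity can be verified numerically on random blocks. This sketch assumes NumPy; the variable names simply follow the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4  # block size; any size works
a, b, c, d = (rng.standard_normal((k, k)) for _ in range(4))
A, B, C, D = (rng.standard_normal((k, k)) for _ in range(4))

# Winograd's seven products and fifteen additions/subtractions.
t = a @ A
u = (c - a) @ (C - D)
v = (c + d) @ (C - A)
w = t + (c + d - a) @ (A + D - C)

winograd = np.block([
    [t + b @ B,                   w + v + (a + b - c - d) @ D],
    [w + u + d @ (B + C - A - D), w + u + v],
])
standard = np.block([[a, b], [c, d]]) @ np.block([[A, C], [B, D]])
assert np.allclose(winograd, standard)
```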
The outline of the algorithm above showed that one can get away with just 7, instead of the traditional 8, matrix-matrix multiplications for the sub-blocks of the matrix. On the other hand, one has to do additions and subtractions of blocks, though this is of no concern for the overall complexity: adding matrices of size $N/2$ requires only $(N/2)^2$ operations, whereas multiplication is substantially more expensive (traditionally $2(N/2)^3$ addition and multiplication operations). The question then is how many operations exactly one needs for Strassen's algorithm, and how this compares with the standard matrix multiplication that takes approximately $2N^3$ (where $N = 2^n$) arithmetic operations, i.e. asymptotically $\Theta(N^3)$.
The number of additions and multiplications required in the Strassen algorithm can be calculated as follows: let $f(n)$ be the number of operations for a $2^n \times 2^n$ matrix. Then by recursive application of the Strassen algorithm, we see that $f(n) = 7f(n-1) + \ell 4^n$, for some constant $\ell$ that depends on the number of additions performed at each application of the algorithm. Hence $f(n) = (7 + o(1))^n$, i.e., the asymptotic complexity for multiplying matrices of size $N = 2^n$ using the Strassen algorithm is

$$O([7 + o(1)]^n) = O(N^{\log_2 7 + o(1)}) \approx O(N^{2.8074}).$$
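To make the step from the recurrence to this bound explicit, one can unroll it (a standard derivation, filling in the step asserted above):

$$f(n) = 7f(n-1) + \ell 4^n = 7^2 f(n-2) + 7\ell 4^{n-1} + \ell 4^n = \cdots = 7^n f(0) + \ell \sum_{k=0}^{n-1} 7^k 4^{n-k}.$$

The sum is geometric with ratio $7/4 > 1$, so it is dominated by its largest term:

$$\ell \sum_{k=0}^{n-1} 7^k 4^{n-k} = \ell\, 4^n \,\frac{(7/4)^n - 1}{7/4 - 1} \le \tfrac{4}{3}\,\ell\, 7^n,$$

hence $f(n) = O(7^n) = O(N^{\log_2 7})$.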
Strassen's algorithm needs to be compared to the "naive" way of doing the matrix multiplication that would require 8 instead of 7 multiplications of sub-blocks. This would then give rise to the complexity one expects from the standard approach: $O(8^n) = O(N^{\log_2 8}) = O(N^3)$.
The comparison of these two complexities shows that asymptotically, Strassen's algorithm is faster: there exists a size $N_{\text{threshold}}$ such that matrices larger than $N_{\text{threshold}}$ are multiplied more efficiently with Strassen's algorithm than the "traditional" way. The asymptotic statement does not by itself determine $N_{\text{threshold}}$, which depends on the constant factors, the implementation, and the hardware, as discussed above.
The bilinear complexity or rank of a bilinear map is an important concept in the asymptotic complexity of matrix multiplication. The rank of a bilinear map $\phi : A \times B \to C$ over a field $F$ is defined as (somewhat of an abuse of notation)

$$R(\phi/F) = \min\left\{ r \,\middle|\, \exists f_i \in A^*,\, g_i \in B^*,\, w_i \in C,\ \forall a \in A, b \in B,\ \phi(a, b) = \sum_{i=1}^{r} f_i(a)\, g_i(b)\, w_i \right\}.$$

In other words, the rank of a bilinear map is the length of its shortest bilinear computation.
The existence of Strassen's algorithm shows that the rank of $2 \times 2$ matrix multiplication is no more than seven. To see this, let us express this algorithm (alongside the standard algorithm) as such a bilinear computation. In the case of matrices, the dual spaces $A^*$ and $B^*$ consist of maps into the field $F$ induced by a scalar double-dot product, i.e. in this case the sum of all the entries of a Hadamard product.
| $i$ | Standard: $f_i(a)$ | Standard: $g_i(b)$ | Standard: $w_i$ | Strassen: $f_i(a)$ | Strassen: $g_i(b)$ | Strassen: $w_i$ |
|---|---|---|---|---|---|---|
| 1 | $\begin{bmatrix}1&0\\0&0\end{bmatrix}:a$ | $\begin{bmatrix}1&0\\0&0\end{bmatrix}:b$ | $\begin{bmatrix}1&0\\0&0\end{bmatrix}$ | $\begin{bmatrix}1&0\\0&1\end{bmatrix}:a$ | $\begin{bmatrix}1&0\\0&1\end{bmatrix}:b$ | $\begin{bmatrix}1&0\\0&1\end{bmatrix}$ |
| 2 | $\begin{bmatrix}0&1\\0&0\end{bmatrix}:a$ | $\begin{bmatrix}0&0\\1&0\end{bmatrix}:b$ | $\begin{bmatrix}1&0\\0&0\end{bmatrix}$ | $\begin{bmatrix}0&0\\1&1\end{bmatrix}:a$ | $\begin{bmatrix}1&0\\0&0\end{bmatrix}:b$ | $\begin{bmatrix}0&0\\1&-1\end{bmatrix}$ |
| 3 | $\begin{bmatrix}1&0\\0&0\end{bmatrix}:a$ | $\begin{bmatrix}0&1\\0&0\end{bmatrix}:b$ | $\begin{bmatrix}0&1\\0&0\end{bmatrix}$ | $\begin{bmatrix}1&0\\0&0\end{bmatrix}:a$ | $\begin{bmatrix}0&1\\0&-1\end{bmatrix}:b$ | $\begin{bmatrix}0&1\\0&1\end{bmatrix}$ |
| 4 | $\begin{bmatrix}0&1\\0&0\end{bmatrix}:a$ | $\begin{bmatrix}0&0\\0&1\end{bmatrix}:b$ | $\begin{bmatrix}0&1\\0&0\end{bmatrix}$ | $\begin{bmatrix}0&0\\0&1\end{bmatrix}:a$ | $\begin{bmatrix}-1&0\\1&0\end{bmatrix}:b$ | $\begin{bmatrix}1&0\\1&0\end{bmatrix}$ |
| 5 | $\begin{bmatrix}0&0\\1&0\end{bmatrix}:a$ | $\begin{bmatrix}1&0\\0&0\end{bmatrix}:b$ | $\begin{bmatrix}0&0\\1&0\end{bmatrix}$ | $\begin{bmatrix}1&1\\0&0\end{bmatrix}:a$ | $\begin{bmatrix}0&0\\0&1\end{bmatrix}:b$ | $\begin{bmatrix}-1&1\\0&0\end{bmatrix}$ |
| 6 | $\begin{bmatrix}0&0\\0&1\end{bmatrix}:a$ | $\begin{bmatrix}0&0\\1&0\end{bmatrix}:b$ | $\begin{bmatrix}0&0\\1&0\end{bmatrix}$ | $\begin{bmatrix}-1&0\\1&0\end{bmatrix}:a$ | $\begin{bmatrix}1&1\\0&0\end{bmatrix}:b$ | $\begin{bmatrix}0&0\\0&1\end{bmatrix}$ |
| 7 | $\begin{bmatrix}0&0\\1&0\end{bmatrix}:a$ | $\begin{bmatrix}0&1\\0&0\end{bmatrix}:b$ | $\begin{bmatrix}0&0\\0&1\end{bmatrix}$ | $\begin{bmatrix}0&1\\0&-1\end{bmatrix}:a$ | $\begin{bmatrix}0&0\\1&1\end{bmatrix}:b$ | $\begin{bmatrix}1&0\\0&0\end{bmatrix}$ |
| 8 | $\begin{bmatrix}0&0\\0&1\end{bmatrix}:a$ | $\begin{bmatrix}0&0\\0&1\end{bmatrix}:b$ | $\begin{bmatrix}0&0\\0&1\end{bmatrix}$ | | | |
| | $ab = \sum_{i=1}^{8} f_i(a)\,g_i(b)\,w_i$ | | | $ab = \sum_{i=1}^{7} f_i(a)\,g_i(b)\,w_i$ | | |
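The Strassen column of this table can be checked mechanically: with the double-dot product implemented as the sum of entries of a Hadamard product, the seven triples reproduce the $2 \times 2$ product. A small NumPy sketch, with the array literals transcribed from the table:

```python
import numpy as np

# (f_i, g_i, w_i) triples from the Strassen column of the table above.
triples = [
    ([[1, 0], [0, 1]],  [[1, 0], [0, 1]],  [[1, 0], [0, 1]]),
    ([[0, 0], [1, 1]],  [[1, 0], [0, 0]],  [[0, 0], [1, -1]]),
    ([[1, 0], [0, 0]],  [[0, 1], [0, -1]], [[0, 1], [0, 1]]),
    ([[0, 0], [0, 1]],  [[-1, 0], [1, 0]], [[1, 0], [1, 0]]),
    ([[1, 1], [0, 0]],  [[0, 0], [0, 1]],  [[-1, 1], [0, 0]]),
    ([[-1, 0], [1, 0]], [[1, 1], [0, 0]],  [[0, 0], [0, 1]]),
    ([[0, 1], [0, -1]], [[0, 0], [1, 1]],  [[1, 0], [0, 0]]),
]

def ddot(F, X):
    """Scalar double-dot product: sum of entries of the Hadamard product."""
    return np.sum(np.asarray(F) * X)

rng = np.random.default_rng(0)
a, b = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
ab = sum(ddot(F, a) * ddot(G, b) * np.asarray(W) for F, G, W in triples)
assert np.allclose(ab, a @ b)  # rank of 2x2 multiplication is at most 7
```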
It is known that the total number of elementary multiplications $L$ required for matrix multiplication is tightly asymptotically bound to the rank $R$, i.e. $L = \Theta(R)$, or more specifically, since the constants are known, $R/2 \le L \le R$. One useful property of the rank is that it is submultiplicative for tensor products, and this enables one to show that $2^n \times 2^n \times 2^n$ matrix multiplication can be accomplished with no more than $7^n$ elementary multiplications for any $n$. (This $n$-fold tensor product of the $2 \times 2 \times 2$ matrix multiplication map with itself, an $n$-th tensor power, is realized by the recursive step in the algorithm shown.)
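Concretely, writing $\langle n, n, n \rangle$ for the $n \times n \times n$ matrix multiplication tensor (a standard convention in the literature, introduced here rather than in the text above), the submultiplicativity argument reads:

$$R\!\left(\langle 2^n, 2^n, 2^n \rangle\right) = R\!\left(\langle 2, 2, 2 \rangle^{\otimes n}\right) \le R\!\left(\langle 2, 2, 2 \rangle\right)^{n} \le 7^n.$$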
Strassen's algorithm is cache oblivious. Analysis of its cache behavior has shown it to incur

$$\Theta\left(1 + \frac{n^2}{b} + \frac{n^{\log_2 7}}{b\sqrt{M}}\right)$$

cache misses during its execution, assuming an idealized cache of size $M$ (i.e. with $M/b$ lines of length $b$).
The description above states that the matrices are square and that their size is a power of two, and that padding should be used if needed. This restriction allows the matrices to be split in half, recursively, until the limit of scalar multiplication is reached. The restriction simplifies the explanation and the analysis of complexity, but is not actually necessary;[6] and in fact, padding the matrix as described will increase the computation time and can easily eliminate the fairly narrow time savings obtained by using the method in the first place.
A good implementation will observe the following:
- It is not necessary or desirable to use the Strassen algorithm all the way down to the scalar level. Compared to conventional matrix multiplication, the algorithm adds a considerable $O(n^2)$ workload in additions and subtractions, so below a certain size it is better to use conventional multiplication. Thus, for instance, a $1600 \times 1600$ matrix does not need to be padded to $2048 \times 2048$: one can instead split down to $25 \times 25$ blocks and apply conventional multiplication at that level.
- The algorithm can be applied to square matrices of any dimension. If the dimension is even, the matrices are split in half as described; if it is odd, zero padding by one row and one column is applied first. Such padding can be applied on the fly and lazily, and the extra rows and columns discarded as the result is formed; a simplified sketch follows this list. For instance, suppose the matrices are $199 \times 199$. They can be split so that the upper-left portion is $100 \times 100$ and the lower-right is $99 \times 99$. Wherever the operations require it, dimensions of $99$ are zero padded to $100$ first. Note, for instance, that the product $M_2$ is only used in the lower row of the output, so it only needs to be $99$ rows high; thus the left factor $A_{21} + A_{22}$ used to generate it need only be $99$ rows high as well, and it is only necessary to pad $A_{22}$ to $100$ columns to match $A_{21}$.
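The following is a minimal Python/NumPy sketch of the odd-dimension handling just described; the function names are illustrative, not from any particular library. For simplicity it pads a whole level whenever its size is odd, rather than padding only the individual operands that need it as in the finer-grained scheme above, and it strips the extra row and column as the result is formed.

```python
import numpy as np

def strassen_any_size(A, B, cutoff=64):
    """Strassen multiplication for square matrices of any size: pad an
    odd-sized level with one zero row/column, recurse, then strip it."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    if n % 2:  # lazy padding: only this level grows by one row/column
        Ap = np.zeros((n + 1, n + 1), dtype=A.dtype)
        Bp = np.zeros((n + 1, n + 1), dtype=B.dtype)
        Ap[:n, :n] = A
        Bp[:n, :n] = B
        return strassen_any_size(Ap, Bp, cutoff)[:n, :n]
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    M1 = strassen_any_size(A11 + A22, B11 + B22, cutoff)
    M2 = strassen_any_size(A21 + A22, B11, cutoff)
    M3 = strassen_any_size(A11, B12 - B22, cutoff)
    M4 = strassen_any_size(A22, B21 - B11, cutoff)
    M5 = strassen_any_size(A11 + A12, B22, cutoff)
    M6 = strassen_any_size(A21 - A11, B11 + B12, cutoff)
    M7 = strassen_any_size(A12 - A22, B21 + B22, cutoff)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```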
Furthermore, there is no need for the matrices to be square. Non-square matrices can be split in half using the same methods, yielding smaller non-square matrices. If the matrices are sufficiently non-square, it will be worthwhile reducing the initial operation to more square products, using simple methods which are essentially $O(n^2)$; for instance:
- a product of size $[2N \times N] \ast [N \times 10N]$ can be done as 20 separate $[N \times N] \ast [N \times N]$ operations, arranged to form the result;
- a product of size $[N \times 10N] \ast [10N \times N]$ can be done as 10 separate $[N \times N] \ast [N \times N]$ operations, summed to form the result.
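The second reduction is just a block inner product. A short NumPy sketch, under the assumption that each square sub-product would be delegated to a Strassen routine (here `@` stands in for it):

```python
import numpy as np

def wide_times_tall(A, B, N):
    """Compute an [N x 10N] * [10N x N] product as ten [N x N] * [N x N]
    products, summed to form the result. Each square product could itself
    be handled by a Strassen routine."""
    C = np.zeros((N, N))
    for j in range(10):
        C += A[:, j*N:(j+1)*N] @ B[j*N:(j+1)*N, :]
    return C

N = 8
A = np.random.rand(N, 10 * N)
B = np.random.rand(10 * N, N)
assert np.allclose(wide_times_tall(A, B, N), A @ B)
```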
These techniques will make the implementation more complicated, compared to simply padding to a power-of-two square; however, it is a reasonable assumption that anyone undertaking an implementation of Strassen, rather than conventional multiplication, will place a higher priority on computational efficiency than on simplicity of the implementation.
In practice, Strassen's algorithm can be implemented to attain better performance than conventional multiplication even for matrices as small as $500 \times 500$.
A similar divide-and-conquer saving appears in the Karatsuba algorithm, which multiplies $n$-digit numbers in $O(n^{\log_2 3})$ time instead of the classical $O(n^2)$.