In mathematics, low-rank approximation refers to the process of approximating a given matrix by a matrix of lower rank. More precisely, it is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure.
Low-rank approximation is closely related to numerous other techniques, including principal component analysis, factor analysis, total least squares, latent semantic analysis, orthogonal regression, and dynamic mode decomposition.
Given a structure specification \mathcal{S} : \mathbb{R}^{n_p} \to \mathbb{R}^{m \times n}, a vector of structure parameters p \in \mathbb{R}^{n_p}, a norm \| \cdot \|, and a desired rank r, the structured low-rank approximation problem is

minimize over \widehat{p} \quad \|p - \widehat{p}\| \quad subject to \quad \operatorname{rank}(\mathcal{S}(\widehat{p})) \leq r.
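For example, with n_p = 5 the 3 \times 3 Hankel structure mentioned above (an illustrative choice) corresponds to the structure specification

\mathcal{S}(p) = \begin{bmatrix} p_1 & p_2 & p_3 \\ p_2 & p_3 & p_4 \\ p_3 & p_4 & p_5 \end{bmatrix},

and the problem is to find the parameter vector \widehat{p} closest to p for which the Hankel matrix \mathcal{S}(\widehat{p}) has rank at most r.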
The unstructured problem with fit measured by the Frobenius norm, i.e.,

minimize over \widehat{D} \quad \|D - \widehat{D}\|_F \quad subject to \quad \operatorname{rank}(\widehat{D}) \leq r,

has an analytic solution in terms of the singular value decomposition of the data matrix (the Eckart–Young–Mirsky theorem). Let

D = U \Sigma V^\top \in \mathbb{R}^{m \times n}, \quad m \leq n,

be a singular value decomposition of D, where \Sigma =: \operatorname{diag}(\sigma_1, \ldots, \sigma_m) is m \times m with \sigma_1 \geq \cdots \geq \sigma_m \geq 0, U \in \mathbb{R}^{m \times m}, and V \in \mathbb{R}^{n \times m}. For a given r \in \{1, \ldots, m-1\}, partition U, \Sigma, and V as

U =: \begin{bmatrix} U_1 & U_2 \end{bmatrix}, \quad \Sigma =: \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \quad and \quad V =: \begin{bmatrix} V_1 & V_2 \end{bmatrix},

where U_1 is m \times r, \Sigma_1 is r \times r, and V_1 is n \times r. Then the rank-r matrix obtained from the truncated singular value decomposition

\widehat{D}^* = U_1 \Sigma_1 V_1^\top

is such that

\|D - \widehat{D}^*\|_F = \min_{\operatorname{rank}(\widehat{D}) \leq r} \|D - \widehat{D}\|_F = \sqrt{\sigma_{r+1}^2 + \cdots + \sigma_m^2}.

The minimizer \widehat{D}^* is unique if and only if \sigma_{r+1} \neq \sigma_r.
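For example, the truncated singular value decomposition and the error formula above can be checked directly in MATLAB; the sketch below assumes a data matrix D and a target rank r are already in the workspace (the variable names are illustrative).

% best rank-r approximation of D in the Frobenius norm
[U, S, V] = svd(D);
Dhat = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';   % Dhat = U1 * Sigma1 * V1'
s = diag(S);                                   % singular values of D
err = norm(D - Dhat, 'fro');                   % equals sqrt(s(r+1)^2 + ... + s(end)^2)
err_formula = sqrt(sum(s(r+1:end).^2));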
The following argument shows that the truncated singular value decomposition is also optimal in the spectral norm. Let A \in \mathbb{R}^{m \times n}, m \leq n, and suppose that A = U \Sigma V^\top is the singular value decomposition of A, where U and V are orthogonal matrices and \Sigma is an m \times n rectangular diagonal matrix with diagonal entries (\sigma_1, \sigma_2, \ldots, \sigma_m) such that \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_m \geq 0. We claim that the best rank-k approximation to A in the spectral norm, denoted by \| \cdot \|_2, is given by

A_k := \sum_{i=1}^{k} \sigma_i u_i v_i^\top,

where u_i and v_i denote the i-th columns of U and V, respectively.
First, note that we have

\|A - A_k\|_2 = \left\| \sum_{i=1}^{m} \sigma_i u_i v_i^\top - \sum_{i=1}^{k} \sigma_i u_i v_i^\top \right\|_2 = \left\| \sum_{i=k+1}^{m} \sigma_i u_i v_i^\top \right\|_2 = \sigma_{k+1}.
Therefore, we need to show that if B_k = X Y^\top, where X and Y have k columns, then \|A - A_k\|_2 = \sigma_{k+1} \leq \|A - B_k\|_2.

Since Y has k columns, the null space of Y^\top intersects the (k+1)-dimensional span of v_1, \ldots, v_{k+1} nontrivially, so there is a nontrivial linear combination of the first k+1 columns of V, i.e.,

w = \gamma_1 v_1 + \cdots + \gamma_{k+1} v_{k+1},

such that Y^\top w = 0. Without loss of generality, we can scale w so that \|w\|_2 = 1 or, equivalently,

\gamma_1^2 + \cdots + \gamma_{k+1}^2 = 1.
Therefore,

\|A - B_k\|_2^2 \geq \|(A - B_k) w\|_2^2 = \|A w\|_2^2 = \gamma_1^2 \sigma_1^2 + \cdots + \gamma_{k+1}^2 \sigma_{k+1}^2 \geq \sigma_{k+1}^2.
The result follows by taking the square root of both sides of the above inequality.
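The identity \|A - A_k\|_2 = \sigma_{k+1} used above is easy to check numerically; the following MATLAB sketch (the matrix size and the rank k are arbitrary illustrative choices) compares the spectral-norm error of the truncated singular value decomposition with the (k+1)-st singular value.

% numerical check of ||A - A_k||_2 = sigma_{k+1}
A = randn(5, 8); k = 2;                        % random 5-by-8 example, target rank 2
[U, S, V] = svd(A);
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';
disp([norm(A - Ak, 2), S(k+1, k+1)])           % the two values agree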
A similar argument establishes optimality in the Frobenius norm. Let A \in \mathbb{R}^{m \times n}, m \leq n, and suppose that A = U \Sigma V^\top is the singular value decomposition of A. We claim that the best rank-k approximation to A in the Frobenius norm, denoted by \| \cdot \|_F, is given by

A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top,

where u_i and v_i denote the i-th columns of U and V, respectively.
First, note that we have

\|A - A_k\|_F^2 = \left\| \sum_{i=k+1}^{m} \sigma_i u_i v_i^\top \right\|_F^2 = \sum_{i=k+1}^{m} \sigma_i^2.
Therefore, we need to show that if B_k = X Y^\top, where X and Y have k columns, then

\|A - A_k\|_F^2 = \sum_{i=k+1}^{m} \sigma_i^2 \leq \|A - B_k\|_F^2.
By the triangle inequality applied to the spectral norm, if A = A' + A'' then \sigma_1(A) \leq \sigma_1(A') + \sigma_1(A''). Suppose A'_{k} and A''_{k} respectively denote the rank-k approximations of A' and A'' by the SVD method described above. Then, for any i, j \geq 1,

\begin{align}
\sigma_i(A') + \sigma_j(A'') &= \sigma_1(A' - A'_{i-1}) + \sigma_1(A'' - A''_{j-1}) \\
&\geq \sigma_1(A - A'_{i-1} - A''_{j-1}) \\
&\geq \sigma_1(A - A_{i+j-2}) \qquad (\text{since } \operatorname{rank}(A'_{i-1} + A''_{j-1}) \leq i + j - 2) \\
&= \sigma_{i+j-1}(A).
\end{align}

Since \sigma_{k+1}(B_k) = 0, when A' = A - B_k and A'' = B_k we conclude that, for i \geq 1 and j = k + 1,

\sigma_i(A - B_k) \geq \sigma_{k+i}(A).
Therefore,

\|A - B_k\|_F^2 = \sum_{i=1}^{m} \sigma_i(A - B_k)^2 \geq \sum_{i=k+1}^{m} \sigma_i(A)^2 = \|A - A_k\|_F^2,
as required.
The Frobenius norm weights all elements of the approximation error D - \widehat{D} uniformly. Element-specific weights can be taken into account by considering the weighted low-rank approximation problem

minimize over \widehat{D} \quad \operatorname{vec}(D - \widehat{D})^\top W \operatorname{vec}(D - \widehat{D}) \quad subject to \quad \operatorname{rank}(\widehat{D}) \leq r,

where \operatorname{vec}(A) vectorizes the matrix A column-wise and W is a given positive (semi)definite weight matrix.
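Written out for a candidate approximation \widehat{D}, the weighted cost is a quadratic form in the vectorized residual; in MATLAB (a two-line sketch with illustrative names, W stored as an mn \times mn matrix):

e = D(:) - Dhat(:);     % vec(D - Dhat), column-wise vectorization
cost = e' * W * e;      % weighted approximation error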
The general weighted low-rank approximation problem does not admit an analytic solution in terms of the singular value decomposition and is solved by local optimization methods, which provide no guarantee that a globally optimal solution is found.
In the case of uncorrelated weights, the weighted low-rank approximation problem can also be formulated as follows:[4] [5] for a non-negative weight matrix W and a data matrix A, minimize

\sum_{i,j} \big( W_{i,j} (A_{i,j} - B_{i,j}) \big)^2

over matrices B of rank at most r.
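In this element-wise form the cost can be evaluated without forming an mn \times mn weight matrix; in MATLAB (a one-line sketch, with W here an m \times n matrix of element-wise weights):

cost = norm(W .* (A - B), 'fro')^2;   % equals sum_{i,j} (W_ij * (A_ij - B_ij))^2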
Let \|A\|_p = \left( \sum_{i,j} |A_{i,j}|^p \right)^{1/p} denote the entry-wise p-norm. For p = 2, there are algorithms that run in nnz(A) + n \cdot \operatorname{poly}(k/\epsilon) time, where nnz(A) is the number of nonzero entries of A, k is the target rank, and \epsilon is the approximation accuracy. For p = 1, minimizing \|B - A\|_1 gives an approximation that is more robust to outliers than the Frobenius-norm solution. Approximation algorithms with provable guarantees are also known for p = 0 and for general p \geq 1.
Let P = \{p_1, \ldots, p_m\} and Q = \{q_1, \ldots, q_n\} be two point sets in an arbitrary metric space, and let A be the m \times n matrix with A_{i,j} = \operatorname{dist}(p_i, q_j). Low-rank approximation of such distance matrices has also been studied. Low-rank approximation problems in the distributed and streaming settings have also been considered.[14]
Using the equivalences

\operatorname{rank}(\widehat{D}) \leq r \iff there are P \in \mathbb{R}^{m \times r} and L \in \mathbb{R}^{r \times n} such that \widehat{D} = P L

and

\operatorname{rank}(\widehat{D}) \leq r \iff there is a full row rank R \in \mathbb{R}^{(m-r) \times m} such that R \widehat{D} = 0,

the weighted low-rank approximation problem becomes equivalent to the parameter optimization problems

minimize over \widehat{D}, P, and L \quad \operatorname{vec}^\top(D - \widehat{D}) W \operatorname{vec}(D - \widehat{D}) \quad subject to \quad \widehat{D} = P L

and

minimize over \widehat{D} and R \quad \operatorname{vec}^\top(D - \widehat{D}) W \operatorname{vec}(D - \widehat{D}) \quad subject to \quad R \widehat{D} = 0 \quad and \quad R R^\top = I_{m-r},

where I_{m-r} is the identity matrix of size m - r. These are called the image and kernel representations of the rank constraint, respectively.
The image representation of the rank constraint suggests a parameter optimization method in which the cost function is minimized alternately over one of the variables (P or L) with the other one fixed. Although simultaneous minimization over both P and L is a difficult nonconvex optimization problem, minimization over one of the variables alone is a linear least squares problem and can be solved globally and efficiently. The resulting optimization algorithm (called alternating projections) is globally convergent with a linear convergence rate to a locally optimal solution of the weighted low-rank approximation problem. A starting value for the P (or L) parameter must be supplied.
Matlab implementation of the alternating projections algorithm for weighted low-rank approximation:
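A minimal sketch follows; it assumes a data matrix d (m-by-n), an mn-by-mn positive definite weight matrix w, an initial m-by-r value p for the factor P, a tolerance tol, and an iteration limit maxiter (the function name and stopping rule are illustrative).

function [dh, f] = wlra_ap(d, w, p, tol, maxiter)
% Alternating projections for weighted low-rank approximation (sketch).
[m, n] = size(d); r = size(p, 2); f = inf(1, maxiter);
for i = 2:maxiter
    % minimize over L for fixed P:  vec(P*L) = kron(eye(n), P) * vec(L)
    bp = kron(eye(n), p);
    vl = (bp' * w * bp) \ (bp' * w * d(:));
    l  = reshape(vl, r, n);
    % minimize over P for fixed L:  vec(P*L) = kron(L', eye(m)) * vec(P)
    bl = kron(l', eye(m));
    vp = (bl' * w * bl) \ (bl' * w * d(:));
    p  = reshape(vp, m, r);
    % weighted approximation error and convergence test
    dh = p * l; e = d(:) - dh(:); f(i) = e' * w * e;
    if abs(f(i) - f(i - 1)) < tol, break, end
end
f = f(i);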
The alternating projections algorithm exploits the fact that the low-rank approximation problem, parameterized in the image form, is bilinear in the variables P and L. The bilinear nature of the problem is used in an alternative approach, known as variable projections.
Consider again the weighted low-rank approximation problem, parameterized in the image form. Minimization with respect to the L variable (a linear least squares problem) leads to the following expression for the approximation error as a function of P:

f(P) = \sqrt{ \operatorname{vec}^\top(D) \Big( W - W (I_n \otimes P) \big( (I_n \otimes P)^\top W (I_n \otimes P) \big)^{-1} (I_n \otimes P)^\top W \Big) \operatorname{vec}(D) }.

The original problem is therefore equivalent to the nonlinear least squares problem of minimizing f(P) with respect to P, which can be done by standard local optimization methods.
Matlab implementation of the variable projections algorithm for weighted low-rank approximation:
function [f, vl] = cost_fun(p, d, w)
% Variable projection cost: for a fixed P (argument p), compute the optimal
% vec(L) (returned as vl) and the corresponding weighted approximation error f.
bp = kron(eye(size(d, 2)), p);            % I_n kron P
vl = (bp' * w * bp) \ bp' * w * d(:);     % optimal vec(L) for the given P
f  = d(:)' * w * (d(:) - bp * vl);        % squared cost f(P)^2

The variable projections approach can also be applied to low-rank approximation problems parameterized in the kernel form. The method is effective when the number of eliminated variables is much larger than the number of optimization variables left at the stage of the nonlinear least squares minimization. Such problems occur in system identification, parameterized in the kernel form, where the eliminated variables are the approximating trajectory and the remaining variables are the model parameters. In the context of linear time-invariant systems, the elimination step is equivalent to Kalman smoothing.
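Returning to the image-form problem, the outer minimization of f(P) can be delegated to a general-purpose solver; a minimal driver that uses cost_fun above (assuming the Optimization Toolbox function fminunc and an illustrative initial value p0 for P) is:

p = fminunc(@(p) cost_fun(p, d, w), p0);    % minimize f(P)^2 over P
[f, vl] = cost_fun(p, d, w);                % final cost and optimal vec(L)
l  = reshape(vl, size(p, 2), size(d, 2));   % optimal factor L
dh = p * l;                                 % weighted low-rank approximation of d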
Often the approximation is required not only to have low rank, but also to satisfy other convex constraints imposed by the application. The problem of interest is then

minimize over \widehat{p} \quad \|p - \widehat{p}\| \quad subject to \quad \operatorname{rank}(\mathcal{S}(\widehat{p})) \leq r \quad and \quad g(\widehat{p}) \leq 0.
This problem has many real-world applications, including recovering a good solution from an inexact (semidefinite programming) relaxation. A typical example of the additional constraint g(\widehat{p}) \leq 0 is element-wise non-negativity. The problem is challenging because of the combination of the convex constraint and the nonconvex low-rank constraint, and different techniques have been developed based on different realizations of g(\widehat{p}) \leq 0.
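One simple heuristic (not one of the specialized methods alluded to above; the non-negativity constraint and all names here are illustrative assumptions) alternates a projection onto the set of matrices of rank at most r with a projection onto the convex set:

% alternating-projection heuristic for rank-r approximation with non-negativity
Dhat = D;                                         % start from the data matrix
for iter = 1:100
    [U, S, V] = svd(Dhat, 'econ');
    Dhat = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';  % project onto rank <= r
    Dhat = max(Dhat, 0);                          % project onto the non-negative orthant
end

Convergence to a point satisfying both constraints is not guaranteed in general, which is why specialized techniques are used in practice.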