Matrix completion is the task of filling in the missing entries of a partially observed matrix, which is equivalent to performing data imputation in statistics. A wide range of datasets are naturally organized in matrix form. One example is the movie-ratings matrix, as appears in the Netflix problem: given a ratings matrix in which each entry $(i,j)$ represents the rating of movie $j$ by customer $i$ if customer $i$ has rated movie $j$, and is otherwise missing, we would like to predict the remaining entries in order to make good recommendations to customers on what to watch next.
Without any restrictions on the number of degrees of freedom in the completed matrix this problem is underdetermined since the hidden entries could be assigned arbitrary values. Thus we require some assumption on the matrix to create a well-posed problem, such as assuming it has maximal determinant, is positive definite, or is low-rank.
For example, one may assume the matrix has low-rank structure and then seek the lowest-rank completion or, if the rank $r$ of the completed matrix is known, a matrix of rank $r$ that matches the known entries.
From the statistical learning point of view, the matrix completion problem is an application of matrix regularization, which generalizes vector regularization. For example, in the low-rank matrix completion problem one may apply a regularization penalty taking the form of the nuclear norm:

$$R(X) = \lambda \|X\|_*$$
One of the variants of the matrix completion problem is to find the lowest-rank matrix $X$ which matches the matrix $M$, which we wish to recover, on all entries in the set $E$ of observed entries:

$$\begin{align} &\underset{X}{\text{min}} && \operatorname{rank}(X) \\ &\text{subject to} && X_{ij} = M_{ij}, \quad \forall (i,j) \in E \end{align}$$
Candès and Recht proved that with assumptions on the sampling of the observed entries and sufficiently many sampled entries this problem has a unique solution with high probability.
An equivalent formulation, given that the matrix $M$ to be recovered is known to be of rank $r$, is to solve for a matrix $X$ of rank $r$ satisfying $X_{ij} = M_{ij}$ for all $(i,j) \in E$.
A number of assumptions on the sampling of the observed entries and the number of sampled entries are frequently made to simplify the analysis and to ensure the problem is not underdetermined.
To make the analysis tractable, it is often assumed that the set $E$ of observed entries, with fixed cardinality $|E|$, is sampled uniformly at random from the collection of all subsets of entries of that cardinality. To simplify the analysis further, it is instead assumed that $E$ is constructed by Bernoulli sampling, i.e. that each entry is a member of $E$ independently with probability $p$. If $p$ is set to $\frac{N}{mn}$, where $N$ is the desired expected cardinality of $E$ and $m$, $n$ are the dimensions of the matrix (with $m < n$ without loss of generality), then $|E|$ is within $O(n\log n)$ of $N$ with high probability, so Bernoulli sampling is a good approximation of uniform sampling.
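As a small illustration, the following Python sketch (the sizes and target $N$ are arbitrary assumptions) draws a Bernoulli-sampled observation set and checks that its cardinality concentrates around $N = pmn$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 500          # matrix dimensions (m < n)
N = 10_000               # desired expected number of observed entries
p = N / (m * n)          # Bernoulli sampling probability

# Each entry is observed independently with probability p.
mask = rng.random((m, n)) < p
print(mask.sum())        # |E| concentrates around N = p*m*n
```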
Suppose the $m$ by $n$ matrix $M$ (with $m < n$) we are trying to recover has rank $r$. There is an information-theoretic lower bound on how many entries must be observed before $M$ can be uniquely reconstructed. The set of $m$ by $n$ matrices with rank less than or equal to $r$ is an algebraic variety in $\mathbb{C}^{m \times n}$ with dimension $(n+m)r - r^2$. Using this result, one can show that at least $4nr - 4r^2$ entries must be observed for matrix completion in $\mathbb{C}^{n \times n}$ to have a unique solution when $r \leq n/2$.
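For a concrete sense of scale, take a square matrix with $n = m = 1000$ and $r = 10$. The dimension count above gives

$$(n+m)r - r^2 = 2000 \cdot 10 - 10^2 = 19{,}900,$$

so a rank-$10$ matrix with $10^6$ entries has only about $2 \times 10^4$ degrees of freedom, and the bound above requires at least $4nr - 4r^2 = 39{,}600$ observed entries for a unique completion.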
Secondly, there must be at least one observed entry per row and column of $M$. To see this, write the singular value decomposition of $M$ as $U\Sigma V^\dagger$. If row $i$ is completely unobserved, the $i$-th coordinates of the left singular vectors of $M$ could be changed arbitrarily and still yield a matrix matching $M$ over the set of observed entries; similarly, if column $j$ is unobserved, the $j$-th coordinates of the right singular vectors could be arbitrary. If we assume Bernoulli sampling of the set of observed entries, the coupon collector effect implies that on the order of $O(n\log n)$ entries must be observed to ensure that each row and column has at least one observed entry with high probability.
Combining the necessary conditions and assuming that $r \ll m,n$ (a typical assumption), the lower bound on the number of observed entries required to prevent the matrix completion problem from being underdetermined is on the order of $nr\log n$.
The concept of incoherence arose in compressed sensing. It is introduced in the context of matrix completion to ensure the singular vectors of $M$ are not too "sparse", in the sense that all coordinates of each singular vector are of comparable magnitude instead of just a few coordinates having significantly larger magnitudes. The standard basis vectors are then undesirable as singular vectors, while a vector such as $\frac{1}{\sqrt{n}}(1,1,\ldots,1)^\top$ in $\mathbb{R}^n$ is desirable. As an example of what could go wrong if the singular vectors are sufficiently "sparse", consider the $m$ by $n$ matrix

$$\begin{bmatrix}1 & 0 & \cdots & 0\\ \vdots & & & \vdots\\ 0 & 0 & \cdots & 0\end{bmatrix} = I_m \begin{bmatrix}1 & 0 & \cdots & 0\\ \vdots & & & \vdots\\ 0 & 0 & \cdots & 0\end{bmatrix} I_n,$$

whose singular vectors are standard basis vectors: unless the single non-zero entry happens to be observed, nearly every entry of $M$ must be sampled before it can be distinguished from the zero matrix.
Candès and Recht define the coherence of a matrix $U$, whose column space is an $r$-dimensional subspace of $\mathbb{R}^n$, as

$$\mu(U) = \frac{n}{r}\max_i \|P_U e_i\|^2,$$

where $P_U$ is the orthogonal projection onto $U$. Incoherence then asserts that, given the singular value decomposition $U\Sigma V^\dagger$ of the $m$ by $n$ matrix $M$:

1. $\mu(U), \mu(V) \leq \mu_0$;
2. the entries of $\sum_k u_k v_k^\dagger$ are bounded in absolute value by $\mu_1\sqrt{\frac{r}{mn}}$;

for some constants $\mu_0, \mu_1$.
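As a small illustration, here is a numpy sketch of this definition (the test matrices are arbitrary assumptions): an orthonormal basis $Q$ of the column space gives $\|P_U e_i\|^2$ as the squared norm of the $i$-th row of $Q$.

```python
import numpy as np

def coherence(U):
    """mu(U) = (n/r) * max_i ||P_U e_i||^2, for the column space of U."""
    n, r = U.shape
    Q, _ = np.linalg.qr(U)               # orthonormal basis of col(U)
    row_norms_sq = (Q ** 2).sum(axis=1)  # ||P_U e_i||^2 for each i
    return (n / r) * row_norms_sq.max()

rng = np.random.default_rng(0)
print(coherence(rng.standard_normal((100, 5))))  # random subspace: small mu
print(coherence(np.eye(100)[:, :5]))             # spanned by basis vectors: mu = n/r = 20
```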
In real-world applications, one often observes only a few entries, corrupted at least by a small amount of noise. For example, in the Netflix problem, the ratings are uncertain. Candès and Plan showed that it is possible to fill in the many missing entries of large low-rank matrices from just a few noisy samples by nuclear norm minimization. The noisy model assumes that we observe

$$Y_{ij} = M_{ij} + Z_{ij}, \quad (i,j) \in \Omega,$$

where $\{Z_{ij} : (i,j) \in \Omega\}$ is a noise term that may be stochastic or deterministic. Alternatively the model can be expressed as

$$P_\Omega(Y) = P_\Omega(M) + P_\Omega(Z),$$

where $Z$ is an $n \times n$ matrix with entries $Z_{ij}$ for $(i,j) \in \Omega$, assuming that $\|P_\Omega(Z)\|_F \leq \delta$ for some $\delta > 0$. To recover the incomplete matrix, we try to solve the following optimization problem:

$$\begin{align} &\underset{X}{\text{min}} && \|X\|_* \\ &\text{subject to} && \|P_\Omega(X-Y)\|_F \leq \delta \end{align}$$

Among all matrices consistent with the data, this finds the one with minimum nuclear norm. Candès and Plan have shown that this reconstruction is accurate: when perfect noiseless recovery occurs, matrix completion is stable with respect to perturbations, and the error is proportional to the noise level $\delta$, so when the noise level is small the error is small.
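A minimal sketch of this program using the cvxpy modeling library; the toy sizes, sampling rate, and noise level are illustrative assumptions, not values from the literature:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 30, 30, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low-rank truth
mask = (rng.random((m, n)) < 0.5).astype(float)                # observed set Omega
Y = M + 0.01 * rng.standard_normal((m, n))                     # noisy observations
delta = 0.01 * np.sqrt(mask.sum())                             # bound on ||P_Omega(Z)||_F

X = cp.Variable((m, n))
prob = cp.Problem(
    cp.Minimize(cp.normNuc(X)),                                # nuclear norm ||X||_*
    [cp.norm(cp.multiply(mask, X - Y), "fro") <= delta],       # ||P_Omega(X-Y)||_F <= delta
)
prob.solve()
print(np.linalg.norm(X.value - M) / np.linalg.norm(M))         # relative recovery error
```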
Note that the sampling operator in matrix completion does not obey the restricted isometry property (RIP), which for matrices would require that

$$(1-\delta)\|X\|_F^2 \leq \frac{1}{p}\|P_\Omega(X)\|_F^2 \leq (1+\delta)\|X\|_F^2$$

for all matrices $X$ of sufficiently small rank and all sufficiently small $\delta < 1$.
High-rank matrix completion is NP-hard in general. However, under certain assumptions, some incomplete high-rank or even full-rank matrices can be completed.
Eriksson, Balzano and Nowak have considered the problem of completing a matrix with the assumption that the columns of the matrix belong to a union of multiple low-rank subspaces. Since the columns belong to a union of subspaces, the problem may be viewed as a missing-data version of the subspace clustering problem. Let $X$ be an $n \times N$ matrix whose (complete) columns lie in a union of at most $k$ subspaces, each of rank $\leq r < n$, and assume $N \gg kn$. Eriksson, Balzano and Nowak showed that under mild assumptions each column of $X$ can be perfectly recovered with high probability from an incomplete version so long as at least $CrN\log^2(n)$ entries of $X$ are observed uniformly at random, where $C > 1$ is a constant depending on the usual incoherence conditions, the geometric arrangement of subspaces, and the distribution of columns over the subspaces.
The algorithm involves several steps: (1) local neighborhoods; (2) local subspaces; (3) subspace refinement; (4) full matrix completion. This method can be applied to Internet distance matrix completion and topology identification.
Various matrix completion algorithms have been proposed. These include convex relaxation-based algorithms, gradient-based algorithms, and alternating minimization-based algorithms.
The rank minimization problem is NP-hard. One approach, proposed by Candès and Recht, is to form a convex relaxation of the problem and minimize the nuclear norm $\|M\|_*$ (which gives the sum of the singular values of $M$) instead of $\operatorname{rank}(M)$ (which counts the number of non-zero singular values of $M$). This is analogous to minimizing the $L_1$-norm rather than the $L_0$-norm for vectors. The convex relaxation can be solved using semidefinite programming (SDP) by noticing that the optimization problem is equivalent to

$$\begin{align} &\underset{W_1, W_2}{\text{min}} && \operatorname{trace}(W_1) + \operatorname{trace}(W_2) \\ &\text{subject to} && X_{ij} = M_{ij}, \quad \forall (i,j) \in E \\ &&& \begin{bmatrix} W_1 & X \\ X^\dagger & W_2 \end{bmatrix} \succeq 0 \end{align}$$

The complexity of using SDP to solve the convex relaxation is $O(\max(m,n)^4)$.
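A sketch of this SDP in cvxpy on a small toy instance (the instance itself is an assumption; at the optimum $\operatorname{trace}(W_1)+\operatorname{trace}(W_2) = 2\|X\|_*$, so this is the nuclear-norm relaxation in disguise):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 20, 20, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low-rank truth
mask = (rng.random((m, n)) < 0.6).astype(float)                # observed set E

X = cp.Variable((m, n))
W1 = cp.Variable((m, m), symmetric=True)
W2 = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(
    cp.Minimize(cp.trace(W1) + cp.trace(W2)),
    [cp.bmat([[W1, X], [X.T, W2]]) >> 0,       # block PSD constraint
     cp.multiply(mask, X - M) == 0],           # X_ij = M_ij on E
)
prob.solve()
print(np.linalg.norm(X.value - M) / np.linalg.norm(M))
```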
Candès and Recht show, using the study of random variables on Banach spaces, that if the number of observed entries is on the order of

$$\max\{\mu_1^2,\ \sqrt{\mu_0}\,\mu_1,\ \mu_0 n^{0.25}\}\, nr\log n$$

(assume without loss of generality $m < n$), the rank minimization problem has a unique solution, which is also the solution of its convex relaxation, with probability $1-\frac{c}{n^3}$ for some constant $c$. If the rank of $M$ is small ($r \leq \frac{n^{0.2}}{\mu_0}$), the required number of observations reduces to the order of $\mu_0 n^{1.2} r\log n$. These results are near-optimal, since the minimum number of entries that must be observed for the matrix completion problem not to be underdetermined is on the order of $nr\log n$.
This result has been improved by Candès and Tao. They achieve bounds that differ from the optimal bounds only by polylogarithmic factors by strengthening the assumptions. Instead of the incoherence property, they assume the strong incoherence property with parameter $\mu_3$. This property states that:

1. $\left|\langle e_a, P_U e_{a'}\rangle - \frac{r}{m}1_{a=a'}\right| \leq \mu_3\frac{\sqrt{r}}{m}$ for $a, a' \leq m$, and $\left|\langle e_b, P_V e_{b'}\rangle - \frac{r}{n}1_{b=b'}\right| \leq \mu_3\frac{\sqrt{r}}{n}$ for $b, b' \leq n$;
2. the entries of $\sum_i u_i v_i^\dagger$ are bounded in magnitude by $\mu_3\sqrt{\frac{r}{mn}}$.

Intuitively, strong incoherence of a subspace $U$ asserts that the projections of the standard basis vectors onto $U$ behave as they would if the subspace were drawn at random: no basis direction is unusually aligned with $U$.
Candès and Tao find that when $r$ is $O(1)$ and the number of observed entries is on the order of $\mu_3^4 n(\log n)^2$, the rank minimization problem has a unique solution, which is also the solution of its convex relaxation, with probability $1-\frac{c}{n^3}$ for some constant $c$. For arbitrary $r$, the number of observed entries sufficient for this assertion to hold is on the order of $\mu_3^2 nr(\log n)^6$.
Another convex relaxation approach is to minimize the Frobenius squared norm under a rank constraint. This is equivalent to solving

$$\begin{align} &\underset{X}{\text{min}} && \|P_\Omega(X-M)\|_F^2 \\ &\text{subject to} && \operatorname{rank}(X) \leq k \end{align}$$

By introducing an orthogonal projection matrix $Y$ (meaning $Y^2 = Y$, $Y = Y^\top$) to model the rank of $X$ via $X = YX$, $\operatorname{trace}(Y) \leq k$, and taking this problem's convex relaxation, we obtain the following semidefinite program:

$$\begin{align} &\underset{X,Y,\theta}{\text{min}} && \|P_\Omega(X-M)\|_F^2 + \frac{1}{2\gamma}\operatorname{trace}(\theta) \\ &\text{subject to} && \operatorname{trace}(Y) \leq k, \quad \begin{bmatrix} Y & X \\ X^\top & \theta \end{bmatrix} \succeq 0, \quad 0 \preceq Y \preceq I, \end{align}$$

where $\gamma > 0$ is a regularization parameter.
If Y is a projection matrix (i.e., has binary eigenvalues) in this relaxation, then the relaxation is tight. Otherwise, it gives a valid lower bound on the overall objective. Moreover, it can be converted into a feasible solution with a (slightly) larger objective by rounding the eigenvalues of Y greedily. Remarkably, this convex relaxation can be solved by alternating minimization on X and Y without solving any SDPs, and thus it scales beyond the typical numerical limits of state-of-the-art SDP solvers like SDPT3 or Mosek.
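The greedy rounding step mentioned above can be sketched in a few lines of numpy; this helper is illustrative, not code from the cited work: keep the $k$ leading eigenvectors of the relaxed $Y$ to obtain a feasible projection matrix.

```python
import numpy as np

def round_to_projection(Y, k):
    """Round a symmetric matrix Y with fractional eigenvalues to a rank-k projection."""
    _, eigvecs = np.linalg.eigh(Y)   # eigenvalues ascending, leading vectors are last
    top = eigvecs[:, -k:]            # keep the k leading eigenvectors
    return top @ top.T               # binary eigenvalues: Y^2 = Y, Y = Y^T, trace = k

Y = np.diag([0.9, 0.7, 0.3, 0.1])    # relaxed solution (illustrative)
P = round_to_projection(Y, 2)
print(np.allclose(P @ P, P), np.trace(P))  # True 2.0
```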
This approach is a special case of a more general reformulation technique, which can be applied to obtain a valid lower bound on any low-rank problem with a trace-matrix-convex objective.
Keshavan, Montanari and Oh consider a variant of matrix completion where the rank of the $m$ by $n$ matrix $M$, which is to be recovered, is known to be $r$. They assume Bernoulli sampling of entries, constant aspect ratio $\frac{m}{n}$, bounded magnitude of the entries of $M$ (with upper bound $M_{\max}$), and constant condition number $\frac{\sigma_1}{\sigma_r}$, where $\sigma_1$ and $\sigma_r$ are the largest and smallest singular values of $M$ respectively. Further, they assume the two incoherence conditions are satisfied with $\mu_0$ and $\mu_1\sqrt{\frac{\sigma_1}{\sigma_r}}$, where $\mu_0$ and $\mu_1$ are constants.
Let $M^E$ be the matrix that matches $M$ on the set $E$ of observed entries and is $0$ elsewhere. They then propose the following algorithm:

1. Trim $M^E$ by removing all observations from columns with degree larger than $\frac{2|E|}{n}$, setting the entries in those columns to $0$, and likewise remove all observations from rows with degree larger than $\frac{2|E|}{n}$.
2. Project the trimmed $M^E$ onto its first $r$ principal components; call the resulting matrix $\operatorname{Tr}(M^E)$.
3. Solve

$$\min_{X,Y}\ \min_{S\in\mathbb{R}^{r\times r}}\ \frac{1}{2}\sum_{(i,j)\in E}\left(M_{ij} - (XSY^\dagger)_{ij}\right)^2 + \rho\, G(X,Y),$$

where $G(X,Y)$ is a regularization function that penalizes candidate factors $X$, $Y$ with rows of large norm, keeping them incoherent. The minimization is carried out by gradient descent over $X$, $Y$, initialized at $X_0$, $Y_0$ given by the factorization $\operatorname{Tr}(M^E) = X_0 S_0 Y_0^\dagger$; for each candidate pair $X$, $Y$ the inner minimization over $S$ is solved exactly, and the final estimate is $XSY^\dagger$.
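A numpy sketch of Steps 1 and 2 under the above description (the toy data is an arbitrary assumption, and the rescaling by $\frac{mn}{|E|}$, which compensates for the fact that $\mathbb{E}[M^E] = pM$ under Bernoulli sampling, is made explicit):

```python
import numpy as np

def trim_and_project(ME, observed, r):
    """Steps 1-2: trim over-represented columns/rows, then rank-r projection."""
    ME = ME.copy()
    m, n = ME.shape
    limit = 2 * observed.sum() / n                  # degree threshold 2|E|/n
    ME[:, observed.sum(axis=0) > limit] = 0         # trim heavy columns
    ME[observed.sum(axis=1) > limit, :] = 0         # trim heavy rows
    scale = m * n / observed.sum()                  # undo subsampling: E[ME] = p*M
    U, s, Vt = np.linalg.svd(scale * ME, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]              # Tr(ME): first r principal components

rng = np.random.default_rng(3)
m, n, r = 100, 120, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
observed = rng.random((m, n)) < 0.3
estimate = trim_and_project(np.where(observed, M, 0.0), observed, r)
print(np.linalg.norm(estimate - M) / np.linalg.norm(M))  # moderate spectral error
```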
Steps 1 and 2 of the algorithm yield a matrix $\operatorname{Tr}(M^E)$ that is very close to the true matrix $M$, as measured by the root mean square error, with high probability. In particular, with probability $1-\frac{1}{n^3}$,

$$\frac{1}{mnM_{\max}^2}\left\|M - \operatorname{Tr}(M^E)\right\|_F^2 \leq C\,\frac{r}{m|E|}\sqrt{\frac{m}{n}}$$

for some constant $C$, where $\|\cdot\|_F$ denotes the Frobenius norm; that is, the rank-$r$ projection of the trimmed $M^E$ is already a good estimate of $M$.
In Step 3, the space of candidate matrices $X$, $Y$ can be reduced by noticing that the inner minimization problem has the same solution for $(X,Y)$ as for $(XQ, YR)$, where $Q$ and $R$ are orthonormal $r \times r$ matrices. Gradient descent can then be performed over the cross product of two Grassmann manifolds. If $r \ll m,n$ and the observed entry set is on the order of $nr\log n$, the matrix returned by Step 3 is exactly $M$. Then the algorithm is order-optimal, since we know that for the matrix completion problem not to be underdetermined the number of entries must be on the order of $nr\log n$.
Alternating minimization represents a widely applicable and empirically successful approach for finding low-rank matrices that best fit the given data. For example, for the problem of low-rank matrix completion, this method is believed to be one of the most accurate and efficient, and formed a major component of the winning entry in the Netflix problem. In the alternating minimization approach, the low-rank target matrix is written in a bilinear form:
$$X = UV^T;$$

the algorithm then alternates between finding the best $U$ and the best $V$.
The alternating minimization algorithm can be viewed as an approximate way to solve the following non-convex problem:
$$\min_{U\in\mathbb{R}^{m\times k},\ V\in\mathbb{R}^{n\times k}} \left\|P_\Omega(UV^T) - P_\Omega(M)\right\|_F^2$$
The AltMinComplete algorithm proposed by Jain, Netrapalli and Sanghavi is listed here:

1. Input: observed set $\Omega$, values $P_\Omega(M)$
2. Partition $\Omega$ into $2T+1$ subsets $\Omega_0, \ldots, \Omega_{2T}$, with each element of $\Omega$ belonging to one of the $\Omega_t$ with equal probability (sampling with replacement)
3. Initialize: $\hat{U}^0 = \operatorname{SVD}\!\left(\frac{1}{p}P_{\Omega_0}(M),\, k\right)$, i.e., the top-$k$ left singular vectors of $\frac{1}{p}P_{\Omega_0}(M)$
4. Clipping: set to zero all elements of $\hat{U}^0$ with magnitude greater than $\frac{2\mu\sqrt{k}}{\sqrt{n}}$, then orthonormalize the columns of $\hat{U}^0$
5. For $t = 0, \ldots, T-1$:
   - $\hat{V}^{t+1} \leftarrow \underset{V\in\mathbb{R}^{n\times k}}{\operatorname{argmin}} \left\|P_{\Omega_{t+1}}\!\left(\hat{U}^t V^T - M\right)\right\|_F^2$
   - $\hat{U}^{t+1} \leftarrow \underset{U\in\mathbb{R}^{m\times k}}{\operatorname{argmin}} \left\|P_{\Omega_{T+t+1}}\!\left(U(\hat{V}^{t+1})^T - M\right)\right\|_F^2$
6. Return $X = \hat{U}^T(\hat{V}^T)^T$
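For illustration, a simplified numpy sketch of the alternating least-squares loop, omitting the partitioning of $\Omega$ into $2T+1$ subsets and the clipping step (the toy problem sizes are arbitrary assumptions):

```python
import numpy as np

def alt_min_complete(M_obs, mask, k, T=50):
    """Alternating minimization for matrix completion (simplified sketch)."""
    m, n = M_obs.shape
    p = mask.mean()
    # Spectral initialization: top-k left singular vectors of (1/p) P_Omega(M).
    U, _, _ = np.linalg.svd(M_obs / p, full_matrices=False)
    U = U[:, :k]
    V = np.zeros((n, k))
    for _ in range(T):
        # Update V: each row of V solves a least-squares over its observed entries.
        for j in range(n):
            rows = mask[:, j]
            V[j], *_ = np.linalg.lstsq(U[rows], M_obs[rows, j], rcond=None)
        # Update U symmetrically, row by row.
        for i in range(m):
            cols = mask[i]
            U[i], *_ = np.linalg.lstsq(V[cols], M_obs[i, cols], rcond=None)
    return U @ V.T

rng = np.random.default_rng(4)
m, n, k = 60, 80, 2
M = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
mask = rng.random((m, n)) < 0.4
X = alt_min_complete(np.where(mask, M, 0.0), mask, k)
print(np.linalg.norm(X - M) / np.linalg.norm(M))  # small relative error
```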
They showed that by observing

$$|\Omega| = O\!\left(\left(\frac{\sigma_1^*}{\sigma_k^*}\right)^6 k^7 \log n \log\!\left(k\|M\|_F/\epsilon\right)\right)$$

random entries of an incoherent matrix $M$ (where $\frac{\sigma_1^*}{\sigma_k^*}$ is the condition number of $M$), the AltMinComplete algorithm can recover $M$ to accuracy $\epsilon$ in $O(\log(1/\epsilon))$ steps. In terms of sample complexity $|\Omega|$, alternating minimization may theoretically require a larger $\Omega$ than nuclear-norm based convex relaxation, but this does not seem to be limiting empirically. The time complexity of AltMinComplete is $O(|\Omega| k^2 \log(1/\epsilon))$.
It is worth noting that, although convex relaxation-based methods have rigorous analysis, alternating minimization-based algorithms are more successful in practice.
Several applications of matrix completion are summarized by Candès and Plan as follows:
Collaborative filtering is the task of making automatic predictions about the interests of a user by collecting taste information from many users. Companies like Apple, Amazon, Barnes and Noble, and Netflix try to predict user preferences from partial knowledge. In this kind of matrix completion problem, the unknown full matrix is often considered low-rank because only a few factors typically contribute to an individual's tastes or preferences.
In control, one would like to fit a discrete-time linear time-invariant state-space model
$$\begin{align} x(t+1) &= Ax(t) + Bu(t)\\ y(t) &= Cx(t) + Du(t) \end{align}$$
to a sequence of inputs $u(t) \in \mathbb{R}^m$ and outputs $y(t) \in \mathbb{R}^p$, $t = 0, \ldots, N$, where $x(t) \in \mathbb{R}^n$ is the state of the system at time $t$ and $n$ is the order of the system model. From the input/output pair, one would like to recover the matrices $A, B, C, D$ and the initial state $x(0)$. This can be cast as a low-rank matrix completion problem, since the minimal system order is the rank of a Hankel matrix assembled from the input/output data.
The localization (or global positioning) problem emerges naturally in IoT sensor networks. The problem is to recover the sensor map in Euclidean space from a local or partial set of pairwise distances. Thus it is a matrix completion problem with rank two if the sensors are located in a 2-D plane and three if they are in a 3-D space.
Most real-world social networks have low-rank distance matrices. When we are not able to measure the complete network, which can be due to reasons such as private nodes or limited storage and compute resources, we only know a fraction of the distance entries. Criminal networks are a good example of such networks. Low-rank matrix completion can be used to recover these unobserved distances.