Stress majorization explained
Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of n m-dimensional data items, a configuration X of n points in r-dimensional space is sought that minimizes the so-called stress function σ(X). Usually r is 2 or 3, i.e. the n × r matrix X lists points in 2- or 3-dimensional Euclidean space, so that the result may be visualised (i.e. an MDS plot). The function σ is a cost or loss function that measures the squared differences between ideal (m-dimensional) distances and actual distances in r-dimensional space. It is defined as:

\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2
where w_ij ≥ 0 is a weight for the measurement between a pair of points (i, j), d_ij(X) is the Euclidean distance between points i and j, and δ_ij is the ideal distance between the points (their separation) in the m-dimensional data space. Note that w_ij can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).
A configuration X which minimizes σ(X) gives a plot in which points that are close together correspond to points that are also close together in the original m-dimensional data space.
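For illustration, the stress function translates directly into code. The following Python/NumPy sketch assumes the ideal distances δ_ij and weights w_ij are supplied as symmetric n × n arrays delta and W, and that X is an n × r array of coordinates; the function names are illustrative only:

    import numpy as np

    def pairwise_dist(X):
        # n x n matrix of Euclidean distances d_ij(X) between the rows of X
        diff = X[:, None, :] - X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    def stress(X, delta, W):
        # sigma(X) = sum over i < j of w_ij * (d_ij(X) - delta_ij)^2
        d = pairwise_dist(X)
        iu = np.triu_indices(len(X), k=1)   # index pairs with i < j
        return np.sum(W[iu] * (d[iu] - delta[iu]) ** 2)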
There are many ways that σ(X) could be minimized. For example, Kruskal[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.[2] De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds σ from above and touches the surface of σ at a point Z, called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").
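In symbols, writing τ(X, Z) for the majorizing function constructed in the next section, these two properties read:

\tau(X,Z)\ge\sigma(X) for all X, and \tau(Z,Z)=\sigma(Z)

so if X* minimizes τ(X, Z) over X, then σ(X*) ≤ τ(X*, Z) ≤ τ(Z, Z) = σ(Z), i.e. a majorization step can never increase the stress.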
The SMACOF algorithm
The stress function σ(X) can be expanded as follows:

\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2=\sum_{i<j}w_{ij}\delta_{ij}^2+\sum_{i<j}w_{ij}d_{ij}^2(X)-2\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)
Note that the first term is a constant C and the second term is quadratic in X (i.e. for the Hessian matrix V the second term is equivalent to tr X'VX) and therefore relatively easily solved. The third term is bounded by:
\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)=\operatorname{tr}X'B(X)X\ge\operatorname{tr}X'B(Z)Z

where B(Z) has:

b_{ij}=-\frac{w_{ij}\delta_{ij}}{d_{ij}(Z)} for d_{ij}(Z)\ne0, i\ne j

b_{ij}=0 for d_{ij}(Z)=0, i\ne j

and b_{ii}=-\sum_{j=1,j\ne i}^n b_{ij}.
Proof of this inequality is by the Cauchy-Schwarz inequality, see Borg[3] (pp. 152–153).
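In outline, the argument runs as follows (writing x_ia and z_ia for the coordinates of the i-th points of X and Z): for each pair (i, j) the Cauchy–Schwarz inequality gives

\sum_{a=1}^r(x_{ia}-x_{ja})(z_{ia}-z_{ja})\le d_{ij}(X)d_{ij}(Z)

so d_ij(X) ≥ Σ_a (x_ia - x_ja)(z_ia - z_ja)/d_ij(Z) whenever d_ij(Z) ≠ 0; multiplying by w_ij δ_ij, summing over i < j and rewriting the sum as a quadratic form gives tr X'B(Z)Z, with equality when X = Z.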
Thus, we have a simple quadratic function τ(X, Z) that majorizes stress:

\sigma(X)=C+\operatorname{tr}X'VX-2\operatorname{tr}X'B(X)X
\le C+\operatorname{tr}X'VX-2\operatorname{tr}X'B(Z)Z=\tau(X,Z)
The iterative minimization procedure is then:
- at the k-th step, set Z\leftarrow X^{k-1}
- X^k\leftarrow\min_X\tau(X,Z)
- stop if \sigma(X^{k-1})-\sigma(X^k)<\epsilon, otherwise repeat.
This algorithm has been shown to decrease stress monotonically (see de Leeuw[2]).
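A compact sketch of this iteration in Python/NumPy is given below, continuing the sketch above (it reuses pairwise_dist and stress from there). The explicit form of V (v_ij = -w_ij for i ≠ j, with the diagonal making each row sum to zero) and the closed-form minimizer of τ(X, Z), namely X ← V^+ B(Z)Z with V^+ the pseudoinverse (the Guttman transform), are the usual SMACOF conventions and are assumed here rather than taken from the text above; names and tolerances are illustrative:

    def smacof(delta, W, X0, eps=1e-6, max_iter=300):
        # V: v_ij = -w_ij for i != j, diagonal chosen so every row sums to zero
        # (assumed standard SMACOF convention, not spelled out in the text above).
        V = -np.asarray(W, dtype=float)
        np.fill_diagonal(V, 0.0)
        np.fill_diagonal(V, -V.sum(axis=1))
        V_pinv = np.linalg.pinv(V)           # V is singular, so use its pseudoinverse

        X = np.asarray(X0, dtype=float)
        prev = stress(X, delta, W)
        for _ in range(max_iter):
            Z = X
            d = pairwise_dist(Z)
            # B(Z) as defined above: b_ij = -w_ij * delta_ij / d_ij(Z),
            # b_ij = 0 where d_ij(Z) = 0, and the diagonal makes rows sum to zero.
            B = np.where(d > 0, -W * delta / np.where(d > 0, d, 1.0), 0.0)
            np.fill_diagonal(B, 0.0)
            np.fill_diagonal(B, -B.sum(axis=1))
            # Minimizer of tau(X, Z): X <- V^+ B(Z) Z (the Guttman transform).
            X = V_pinv @ B @ Z
            cur = stress(X, delta, W)
            if prev - cur < eps:             # stop once the decrease in stress is small
                break
            prev = cur
        return X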
Use in graph drawing
Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.[4][5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the δ_ij are usually set to the graph-theoretic distances between nodes i and j and the weights w_ij are taken to be δ_ij^{-α}. Here, α is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for α = 2.[6]
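As a sketch of this setup, assuming SciPy is available for the shortest-path computation and reusing the smacof function sketched in the previous section (the small graph here is an arbitrary example; α = 2 follows the recommendation above):

    import numpy as np
    from scipy.sparse.csgraph import shortest_path

    # Adjacency matrix of a small example graph (illustrative only).
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)

    # delta_ij = graph-theoretic (shortest-path) distance between nodes i and j.
    delta = shortest_path(A, directed=False, unweighted=True)

    # Weights w_ij = delta_ij ** (-alpha), with w_ii = 0 on the diagonal.
    alpha = 2
    with np.errstate(divide="ignore"):
        W = np.where(delta > 0, delta ** (-float(alpha)), 0.0)

    # Layout from a random start, using the smacof sketch above.
    rng = np.random.default_rng(0)
    layout = smacof(delta, W, rng.standard_normal((len(delta), 2)))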
Notes and References
- Kruskal, J. B. (1964). "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis". Psychometrika, 29(1): 1–27.
- de Leeuw, J. (1977). "Applications of convex analysis to multidimensional scaling". In J. R. Barra et al. (eds.), Recent Developments in Statistics, pp. 133–145. Amsterdam: North Holland.
- Borg, I.; Groenen, P. (1997). Modern Multidimensional Scaling: Theory and Applications. New York: Springer.
- Michailidis, G.; de Leeuw, J. (2001). "Data visualization through graph drawing". Computational Statistics, 16(3): 435–450.
- Gansner, E.; Koren, Y.; North, S. (2004). "Graph Drawing by Stress Majorization". Proceedings of the 12th International Symposium on Graph Drawing (GD 2004), Lecture Notes in Computer Science 3383, pp. 239–250. Springer.
- Cohen, J. D. (1997). "Drawing graphs to convey proximity: an incremental arrangement method". ACM Transactions on Computer-Human Interaction, 4(3): 197–229.