Stress majorization explained

Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of n m-dimensional data items, a configuration X of n points in r-dimensional space (with r \ll m) is sought that minimizes the so-called stress function \sigma(X). Usually r is 2 or 3, i.e. the (n \times r) matrix X lists points in 2- or 3-dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function \sigma is a cost or loss function that measures the squared differences between ideal (m-dimensional) distances and actual distances in r-dimensional space. It is defined as:

\sigma(X) = \sum_{i<j\le n} w_{ij} (d_{ij}(X) - \delta_{ij})^2

where w_{ij} \ge 0 is a weight for the measurement between a pair of points (i, j), d_{ij}(X) is the Euclidean distance between points i and j, and \delta_{ij} is the ideal distance between the points (their separation) in the m-dimensional data space. Note that w_{ij} can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).
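The stress function above can be evaluated directly from these definitions. A minimal NumPy sketch (the function name `stress` and the toy data are mine, not from the article):

```python
import numpy as np

def stress(X, delta, W):
    """Raw stress: sigma(X) = sum_{i<j} w_ij * (d_ij(X) - delta_ij)**2."""
    # Pairwise Euclidean distances between the rows (points) of X.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.triu_indices(len(X), k=1)   # index pairs with i < j
    return np.sum(W[i, j] * (D[i, j] - delta[i, j]) ** 2)

# Three collinear points whose ideal distances match their layout exactly,
# so the stress is zero.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
delta = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 1.0],
                  [2.0, 1.0, 0.0]])
W = np.ones((3, 3))
print(stress(X, delta, W))   # 0.0 for this exact configuration
```

Any perturbation of X away from the ideal distances makes the stress strictly positive (given positive weights).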

A configuration X which minimizes \sigma(X) gives a plot in which points that are close together correspond to points that are also close together in the original m-dimensional data space.

There are many ways that \sigma(X) could be minimized. For example, Kruskal[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.[2] De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds \sigma from above and touches the surface of \sigma at a point Z, called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").

The SMACOF algorithm

The stress function \sigma can be expanded as follows:

\sigma(X) = \sum_{i<j\le n} w_{ij} (d_{ij}(X) - \delta_{ij})^2 = \sum_{i<j} w_{ij} \delta_{ij}^2 + \sum_{i<j} w_{ij} d_{ij}^2(X) - 2 \sum_{i<j} w_{ij} \delta_{ij} d_{ij}(X)

Note that the first term is a constant C and the second term is quadratic in X (i.e. for the Hessian matrix V the second term is equivalent to \operatorname{tr} X'VX) and therefore relatively easily solved. The third term is bounded by:

\sum_{i<j} w_{ij} \delta_{ij} d_{ij}(X) = \operatorname{tr} X'B(X)X \ge \operatorname{tr} X'B(Z)Z

where B(Z) has

b_{ij} = -\frac{w_{ij} \delta_{ij}}{d_{ij}(Z)} for d_{ij}(Z) \ne 0, i \ne j

b_{ij} = 0 for d_{ij}(Z) = 0, i \ne j

and b_{ii} = -\sum_{j=1, j \ne i}^{n} b_{ij}.

Proof of this inequality is by the Cauchy-Schwarz inequality, see Borg[3] (pp. 152–153).
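The piecewise definition of B(Z) translates directly into code. A sketch (the helper name `b_matrix` and the example data are mine):

```python
import numpy as np

def b_matrix(Z, delta, W):
    """B(Z): b_ij = -w_ij * delta_ij / d_ij(Z) where d_ij(Z) != 0,
    b_ij = 0 where d_ij(Z) = 0, and b_ii = -sum_{j != i} b_ij."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    safe = np.where(D > 0, D, 1.0)            # avoid division by zero
    B = np.where(D > 0, -W * delta / safe, 0.0)
    np.fill_diagonal(B, 0.0)                  # clear the diagonal first...
    np.fill_diagonal(B, -B.sum(axis=1))       # ...then b_ii = -sum_{j != i} b_ij
    return B

# A 3-4-5 right triangle as the supporting configuration Z.
Z = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
delta = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
B = b_matrix(Z, delta, np.ones((3, 3)))
```

By construction every row of B(Z) sums to zero, and B inherits the symmetry of the weights and distances.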

Thus, we have a simple quadratic function \tau(X,Z) that majorizes stress:

\sigma(X) = C + \operatorname{tr} X'VX - 2 \operatorname{tr} X'B(X)X \le C + \operatorname{tr} X'VX - 2 \operatorname{tr} X'B(Z)Z = \tau(X,Z)

The iterative minimization procedure is then:

  1. at the kth step set Z \leftarrow X^{k-1}
  2. compute X^k \leftarrow \min_X \tau(X,Z)
  3. stop if \sigma(X^{k-1}) - \sigma(X^k) < \epsilon; otherwise repeat.

This algorithm has been shown to decrease stress monotonically (see de Leeuw[2]).
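The loop above can be put together in a self-contained sketch. It uses the standard result from the SMACOF literature that \min_X \tau(X,Z) is attained at the Guttman transform X = V^+ B(Z) Z, where V has entries v_{ij} = -w_{ij} for i \ne j and v_{ii} = \sum_{j \ne i} w_{ij}; the helper names and the test data are mine:

```python
import numpy as np

def pairwise(X):
    # Euclidean distances between all rows of X.
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def smacof(delta, W, r=2, eps=1e-9, max_iter=1000, seed=0):
    """Minimize stress by iterative majorization (Guttman transform steps)."""
    n = len(delta)
    X = np.random.default_rng(seed).standard_normal((n, r))  # random start
    # V: v_ij = -w_ij off the diagonal, v_ii = sum_{j != i} w_ij.
    V = -W * (1 - np.eye(n))
    np.fill_diagonal(V, -V.sum(axis=1))
    V_pinv = np.linalg.pinv(V)        # V is singular (rows sum to 0)

    def stress(X):
        D = pairwise(X)
        i, j = np.triu_indices(n, k=1)
        return np.sum(W[i, j] * (D[i, j] - delta[i, j]) ** 2)

    prev = stress(X)
    for _ in range(max_iter):
        D = pairwise(X)                           # Z = X^{k-1}
        safe = np.where(D > 0, D, 1.0)
        B = np.where(D > 0, -W * delta / safe, 0.0)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        X = V_pinv @ B @ X                        # X^k = V^+ B(Z) Z
        cur = stress(X)
        if prev - cur < eps:                      # stress decreases monotonically
            break
        prev = cur
    return X, cur

# Recover a unit square from its exact pairwise distances: the final
# stress should be near zero.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X, s = smacof(pairwise(pts), np.ones((4, 4)))
```

The recovered configuration matches the square only up to rotation, reflection, and translation, since the stress depends on X solely through its pairwise distances.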

Use in graph drawing

Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.[4] [5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the \delta_{ij} are usually set to the graph-theoretic distances between nodes i and j and the weights w_{ij} are taken to be \delta_{ij}^{-\alpha}. Here, \alpha is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for \alpha = 2.[6]
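In the graph-drawing setting, \delta_{ij} and w_{ij} can be computed directly from the graph before running the layout optimization. A sketch using BFS hop distances and \alpha = 2 (the example graph and helper name are mine):

```python
import numpy as np
from collections import deque

def graph_distances(adj):
    """All-pairs shortest-path (hop count) distances via BFS from each node."""
    n = len(adj)
    delta = np.full((n, n), np.inf)
    for s in range(n):
        delta[s, s] = 0.0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if delta[s, v] == np.inf:       # first visit: shortest hop count
                    delta[s, v] = delta[s, u] + 1
                    q.append(v)
    return delta

# A 4-cycle 0-1-2-3-0, as adjacency lists.
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
delta = graph_distances(adj)
alpha = 2
W = np.where(delta > 0, delta, 1.0) ** (-alpha)  # w_ij = delta_ij^(-alpha)
np.fill_diagonal(W, 0.0)                         # no self-pair terms
```

These delta and W matrices can then be fed to any stress-minimizing routine; the inverse-square weights emphasise getting short graph distances right, which tends to produce cleaner local structure in the layout.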

Notes and References

  1. .
  2. .
  3. .
  4. .
  5. .
  6. .