Stress majorization explained
Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of n m-dimensional data items, a configuration X of n points in r-dimensional space is sought that minimizes the so-called stress function σ(X). Usually r is 2 or 3, i.e. the n × r matrix X lists points in 2- or 3-dimensional Euclidean space, so that the result may be visualised (i.e. an MDS plot). The function σ is a cost or loss function that measures the squared differences between ideal (m-dimensional) distances and actual distances in r-dimensional space. It is defined as:

\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2
where w_ij ≥ 0 is a weight for the measurement between a pair of points (i, j), d_ij(X) is the Euclidean distance between points i and j, and δ_ij is the ideal distance between the points (their separation) in the m-dimensional data space. Note that w_ij can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).
A configuration X which minimizes σ(X) gives a plot in which points that are close together correspond to points that are also close together in the original m-dimensional data space.
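For illustration, the stress function translates directly into code. The following Python/NumPy sketch assumes the ideal distances δ_ij and weights w_ij are supplied as symmetric n × n arrays delta and W, and that X is an n × r array of coordinates; the function names are illustrative only:

    import numpy as np

    def pairwise_dist(X):
        # n x n matrix of Euclidean distances d_ij(X) between the rows of X
        diff = X[:, None, :] - X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    def stress(X, delta, W):
        # sigma(X) = sum over i < j of w_ij * (d_ij(X) - delta_ij)^2
        d = pairwise_dist(X)
        iu = np.triu_indices(len(X), k=1)   # index pairs with i < j
        return np.sum(W[iu] * (d[iu] - delta[iu]) ** 2)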
There are many ways that σ(X) could be minimized. For example, Kruskal[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.[2] De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds σ from above and touches the surface of σ at a point Z, called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").
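In symbols, writing τ(X, Z) for the majorizing function constructed in the next section, these two properties read:

\tau(X,Z)\ge\sigma(X) for all X, and \tau(Z,Z)=\sigma(Z)

so if X* minimizes τ(X, Z) over X, then σ(X*) ≤ τ(X*, Z) ≤ τ(Z, Z) = σ(Z), i.e. a majorization step can never increase the stress.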
The SMACOF algorithm
The stress function σ(X) can be expanded as follows:

\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2=\sum_{i<j}w_{ij}\delta_{ij}^2+\sum_{i<j}w_{ij}d_{ij}^2(X)-2\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)
Note that the first term is a constant C and the second term is quadratic in X (i.e. for the Hessian matrix V the second term is equivalent to tr X'VX) and therefore relatively easily solved. The third term is bounded by:
\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)=\operatorname{tr}X'B(X)X\ge\operatorname{tr}X'B(Z)Z

where B(Z) has:

b_{ij}=-\frac{w_{ij}\delta_{ij}}{d_{ij}(Z)} for d_{ij}(Z)\ne0, i\ne j

b_{ij}=0 for d_{ij}(Z)=0, i\ne j

and b_{ii}=-\sum_{j=1,j\ne i}^n b_{ij}.
Proof of this inequality is by the Cauchy-Schwarz inequality, see Borg[3] (pp. 152–153).
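In outline, the argument runs as follows (writing x_ia and z_ia for the coordinates of the i-th points of X and Z): for each pair (i, j) the Cauchy–Schwarz inequality gives

\sum_{a=1}^r(x_{ia}-x_{ja})(z_{ia}-z_{ja})\le d_{ij}(X)d_{ij}(Z)

so d_ij(X) ≥ Σ_a (x_ia - x_ja)(z_ia - z_ja)/d_ij(Z) whenever d_ij(Z) ≠ 0; multiplying by w_ij δ_ij, summing over i < j and rewriting the sum as a quadratic form gives tr X'B(Z)Z, with equality when X = Z.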
Thus, we have a simple quadratic function τ(X, Z) that majorizes stress:

\sigma(X)=C+\operatorname{tr}X'VX-2\operatorname{tr}X'B(X)X
\le C+\operatorname{tr}X'VX-2\operatorname{tr}X'B(Z)Z=\tau(X,Z)
The iterative minimization procedure is then:
- at the k-th step, set Z\leftarrow X^{k-1}
- X^k\leftarrow\min_X\tau(X,Z)
- stop if \sigma(X^{k-1})-\sigma(X^k)<\epsilon, otherwise repeat.
This algorithm has been shown to decrease stress monotonically (see de Leeuw[2]).
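A compact sketch of this iteration in Python/NumPy is given below, continuing the sketch above (it reuses pairwise_dist and stress from there). The explicit form of V (v_ij = -w_ij for i ≠ j, with the diagonal making each row sum to zero) and the closed-form minimizer of τ(X, Z), namely X ← V^+ B(Z)Z with V^+ the pseudoinverse (the Guttman transform), are the usual SMACOF conventions and are assumed here rather than taken from the text above; names and tolerances are illustrative:

    def smacof(delta, W, X0, eps=1e-6, max_iter=300):
        # V: v_ij = -w_ij for i != j, diagonal chosen so every row sums to zero
        # (assumed standard SMACOF convention, not spelled out in the text above).
        V = -np.asarray(W, dtype=float)
        np.fill_diagonal(V, 0.0)
        np.fill_diagonal(V, -V.sum(axis=1))
        V_pinv = np.linalg.pinv(V)           # V is singular, so use its pseudoinverse

        X = np.asarray(X0, dtype=float)
        prev = stress(X, delta, W)
        for _ in range(max_iter):
            Z = X
            d = pairwise_dist(Z)
            # B(Z) as defined above: b_ij = -w_ij * delta_ij / d_ij(Z),
            # b_ij = 0 where d_ij(Z) = 0, and the diagonal makes rows sum to zero.
            B = np.where(d > 0, -W * delta / np.where(d > 0, d, 1.0), 0.0)
            np.fill_diagonal(B, 0.0)
            np.fill_diagonal(B, -B.sum(axis=1))
            # Minimizer of tau(X, Z): X <- V^+ B(Z) Z (the Guttman transform).
            X = V_pinv @ B @ Z
            cur = stress(X, delta, W)
            if prev - cur < eps:             # stop once the decrease in stress is small
                break
            prev = cur
        return X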
Use in graph drawing
Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.[4][5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the δ_ij are usually set to the graph-theoretic distances between nodes i and j and the weights w_ij are taken to be δ_ij^{-α}. Here, α is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for α = 2.[6]
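As a sketch of this setup, assuming SciPy is available for the shortest-path computation and reusing the smacof function sketched in the previous section (the small graph here is an arbitrary example; α = 2 follows the recommendation above):

    import numpy as np
    from scipy.sparse.csgraph import shortest_path

    # Adjacency matrix of a small example graph (illustrative only).
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)

    # delta_ij = graph-theoretic (shortest-path) distance between nodes i and j.
    delta = shortest_path(A, directed=False, unweighted=True)

    # Weights w_ij = delta_ij ** (-alpha), with w_ii = 0 on the diagonal.
    alpha = 2
    with np.errstate(divide="ignore"):
        W = np.where(delta > 0, delta ** (-float(alpha)), 0.0)

    # Layout from a random start, using the smacof sketch above.
    rng = np.random.default_rng(0)
    layout = smacof(delta, W, rng.standard_normal((len(delta), 2)))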
Notes and References
- Kruskal, J. B. (1964). "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis". Psychometrika, 29(1): 1–27.
- de Leeuw, J. (1977). "Applications of convex analysis to multidimensional scaling". In J. R. Barra et al. (eds.), Recent Developments in Statistics, pp. 133–145. Amsterdam: North Holland.
- Borg, I.; Groenen, P. (1997). Modern Multidimensional Scaling: Theory and Applications. New York: Springer.
- Michailidis, G.; de Leeuw, J. (2001). "Data visualization through graph drawing". Computational Statistics, 16(3): 435–450.
- Gansner, E.; Koren, Y.; North, S. (2004). "Graph Drawing by Stress Majorization". Proceedings of the 12th International Symposium on Graph Drawing (GD 2004), Lecture Notes in Computer Science 3383, pp. 239–250. Springer.
- Cohen, J. D. (1997). "Drawing graphs to convey proximity: an incremental arrangement method". ACM Transactions on Computer-Human Interaction, 4(3): 197–229.