Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen according to the principle of maximum entropy, which says that the probability distribution best representing the current state of knowledge is the one with the largest entropy. While a standard random walk chooses, for every vertex, a uniform probability distribution among its outgoing edges, locally maximizing the entropy rate, MERW maximizes the entropy rate globally (as average entropy production) by assuming a uniform probability distribution among all paths in a given graph.
MERW is used in various fields of science. A direct application is choosing probabilities to maximize the transmission rate through a constrained channel, analogously to Fibonacci coding. Its properties have also made it useful, for example, in the analysis of complex networks,[1] such as link prediction,[2] community detection,[3] robust transport over networks[4] and centrality measures,[5] as well as in image analysis, for example for detecting visual saliency regions,[6] object localization,[7] tampering detection[8] or the tractography problem.[9]
Additionally, it recreates some properties of quantum mechanics, suggesting a way to repair the discrepancy between diffusion models and quantum predictions, like Anderson localization.[10]
Consider a graph with $n$ vertices, defined by an adjacency matrix $A\in\{0,1\}^{n\times n}$: $A_{ij}=1$ if vertices $i$ and $j$ are connected by an edge, and $A_{ij}=0$ otherwise. For simplicity assume it is an undirected graph, so that $A$ is symmetric.
We would like to choose a random walk as a Markov process on this graph: for every vertex $i$ and each of its outgoing edges to some $j$, choose the probability $S_{ij}$ that the walker uses this edge after visiting $i$. Formally, find a stochastic matrix $S$ (containing the transition probabilities of a Markov chain) such that

$$0\le S_{ij}\le A_{ij}\quad\text{for all }i,j\qquad\text{and}\qquad\sum_{j=1}^{n}S_{ij}=1\quad\text{for all }i.$$

Assuming the graph is connected and aperiodic, the evolution of this stochastic process leads to a stationary probability distribution $\rho$ satisfying $\rho S=\rho$.
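To make the stationarity condition concrete, here is a minimal Python/NumPy sketch finding $\rho$ with $\rho S=\rho$ by repeatedly applying $S$ (the function name, tolerance and iteration cap are choices of this illustration, not part of the source):

```python
import numpy as np

def stationary_distribution(S, tol=1e-12, max_iters=100_000):
    """Find rho with rho @ S = rho for a row-stochastic matrix S
    of a connected, aperiodic chain, by power iteration."""
    n = S.shape[0]
    rho = np.full(n, 1.0 / n)          # start from the uniform distribution
    for _ in range(max_iters):
        new_rho = rho @ S              # one step of the chain
        if np.max(np.abs(new_rho - rho)) < tol:
            return new_rho
        rho = new_rho
    return rho
```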
Using the Shannon entropy for every vertex and averaging over the probability of visiting that vertex (so that its entropy can be used), we get the following formula for the average entropy production (entropy rate) of the stochastic process:

$$H(S)=\sum_{i=1}^{n}\rho_i\sum_{j=1}^{n}S_{ij}\log(1/S_{ij}).$$
This definition turns out to be equivalent to the asymptotic average entropy (per length) of the probability distribution in the space of paths for this stochastic process.
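As a concrete reading of this formula, a short NumPy sketch computing the entropy rate from $S$ and $\rho$ (the function name and the convention $0\log 0=0$ are choices of this illustration):

```python
import numpy as np

def entropy_rate(S, rho):
    """H(S) = sum_i rho_i sum_j S_ij log(1/S_ij), with 0 log 0 treated as 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        per_vertex = np.where(S > 0, -S * np.log(S), 0.0).sum(axis=1)
    return float(rho @ per_vertex)
```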
In the standard random walk, referred to here as generic random walk (GRW), we naturally choose that each outgoing edge is equally probable:
$$S_{ij}=\frac{A_{ij}}{\sum_{k=1}^{n}A_{ik}}.$$

For a symmetric $A$, this leads to the stationary probability distribution $\rho$ with

$$\rho_i=\frac{\sum_{j=1}^{n}A_{ij}}{\sum_{j=1}^{n}\sum_{k=1}^{n}A_{jk}}.$$

It locally maximizes entropy production (entropy rate) for every vertex, but usually leads to a suboptimal global average entropy rate $H(S)$.
MERW chooses the stochastic matrix which maximizes $H(S)$, or equivalently, the one which assumes a uniform probability distribution among all paths in a given graph. Its formula is obtained by first calculating the dominant eigenvalue $\lambda$ and the corresponding eigenvector $\psi$ of the adjacency matrix, i.e. the largest $\lambda\in\mathbb{R}$ with corresponding $\psi\in\mathbb{R}^{n}$ such that $\psi A=\lambda\psi$. Then the stochastic matrix is

$$S_{ij}=\frac{A_{ij}}{\lambda}\,\frac{\psi_j}{\psi_i},$$

for which every possible path of length $l$ from vertex $i$ to vertex $j$ has the same probability

$$\frac{1}{\lambda^{l}}\,\frac{\psi_j}{\psi_i}.$$

Its entropy rate is $\log(\lambda)$ and its stationary probability distribution is

$$\rho_i=\frac{\psi_i^{2}}{\sum_{j=1}^{n}\psi_j^{2}}.$$
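For concreteness, a minimal NumPy sketch of both walks on an undirected graph given by its adjacency matrix (the function names and the small example graph are illustrative assumptions, not part of the source):

```python
import numpy as np

def grw(A):
    """Generic random walk: each outgoing edge of a vertex is equally probable."""
    return A / A.sum(axis=1, keepdims=True)

def merw(A):
    """Maximal entropy random walk built from the dominant eigenpair of A.

    Returns the transition matrix S, the stationary distribution rho
    and the entropy rate log(lambda)."""
    eigvals, eigvecs = np.linalg.eigh(A)            # A is symmetric (undirected graph)
    lam = eigvals[-1]                               # dominant eigenvalue
    psi = np.abs(eigvecs[:, -1])                    # Perron eigenvector, positive entries
    S = (A / lam) * psi[None, :] / psi[:, None]     # S_ij = (A_ij / lam) * psi_j / psi_i
    rho = psi**2 / np.sum(psi**2)                   # stationary distribution
    return S, rho, np.log(lam)

# Example: a small path graph 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S, rho, h = merw(A)
assert np.allclose(S.sum(axis=1), 1.0)   # S is row-stochastic
assert np.allclose(rho @ S, rho)         # rho is stationary for S
```

On this path graph, GRW gives a stationary distribution proportional to the vertex degrees $(1,2,2,1)$, while MERW puts more weight on the two middle vertices, whose entries of $\psi$ are larger.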
In contrast to GRW, the MERW transition probabilities generally depend on the structure of the entire graph (they are nonlocal). Hence, they should not be imagined as being directly applied by the walker: if random-looking decisions are made based on the local situation, as a person would make them, the GRW approach is more appropriate. MERW is based on the principle of maximum entropy, making it the safest assumption when we do not have any additional knowledge about the system. For example, it would be appropriate for modelling our knowledge about an object performing some complex dynamics, not necessarily random, like a particle.
Assume for simplicity that the considered graph is undirected, connected and aperiodic, allowing us to conclude from the Perron–Frobenius theorem that the dominant eigenvector is unique. Hence $A^{l}$ is asymptotically ($l\to\infty$) dominated by $\lambda^{l}\psi\psi^{\top}$ (or, in bra–ket notation, $\lambda^{l}|\psi\rangle\langle\psi|$).
MERW requires a uniform distribution along paths. The number $m_{i,l}$ of paths of length $2l$ with vertex $i$ in the center is

$$m_{i,l}=\sum_{j=1}^{n}\sum_{k=1}^{n}\left(A^{l}\right)_{ji}\left(A^{l}\right)_{ik}\approx\sum_{j=1}^{n}\sum_{k=1}^{n}\left(\lambda^{l}\psi\psi^{\top}\right)_{ji}\left(\lambda^{l}\psi\psi^{\top}\right)_{ik}=\sum_{j=1}^{n}\sum_{k=1}^{n}\lambda^{2l}\psi_j\psi_i\psi_i\psi_k=\lambda^{2l}\,\psi_i^{2}\underbrace{\sum_{j=1}^{n}\psi_j\sum_{k=1}^{n}\psi_k}_{=:\,C},$$

hence the probability of being at vertex $i$ in the middle of such paths is

$$\rho_i=\lim_{l\to\infty}\frac{m_{i,l}}{\sum_{j=1}^{n}m_{j,l}}=\lim_{l\to\infty}\frac{\lambda^{2l}\psi_i^{2}\,C}{\sum_{j=1}^{n}\lambda^{2l}\psi_j^{2}\,C}=\frac{\psi_i^{2}}{\sum_{j=1}^{n}\psi_j^{2}}.$$
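This path-counting argument is easy to verify numerically; the following sketch (with an arbitrary small connected, aperiodic example graph, an assumption of this illustration) compares the frequency of each vertex at the center of long paths with $\psi_i^2/\sum_j\psi_j^2$:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],     # small connected, aperiodic example graph
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

l = 30
Al = np.linalg.matrix_power(A, l)
m = Al.sum(axis=0) * Al.sum(axis=1)   # m_{i,l} = (sum_j (A^l)_ji) * (sum_k (A^l)_ik)
rho_paths = m / m.sum()               # frequency of i in the middle of length-2l paths

eigvals, eigvecs = np.linalg.eigh(A)
psi = np.abs(eigvecs[:, -1])
rho_merw = psi**2 / np.sum(psi**2)

print(np.max(np.abs(rho_paths - rho_merw)))   # tends to 0 as l grows
```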
Analogously calculating the probability distribution for two succeeding vertices, one obtains that the probability of being at the $i$-th vertex and then at the $j$-th vertex is

$$\frac{\psi_i A_{ij}\psi_j}{\sum_{k=1}^{n}\sum_{m=1}^{n}\psi_k A_{km}\psi_m}=\frac{\psi_i A_{ij}\psi_j}{\psi A\psi^{\top}}=\frac{\psi_i A_{ij}\psi_j}{\lambda\sum_{k=1}^{n}\psi_k^{2}}.$$

Dividing it by the probability $\rho_i$ of being at the $i$-th vertex gives the conditional probability $S_{ij}$ of the $j$-th vertex being next after the $i$-th one:

$$S_{ij}=\frac{A_{ij}}{\lambda}\,\frac{\psi_j}{\psi_i}.$$
We have assumed $A_{ij}\in\{0,1\}$, but the obtained formulas also hold for a general nonnegative symmetric $A$. Choosing for example $A_{ij}=\exp(-E_{ij})$, where $E_{ij}$ is interpreted as the energy of the edge between vertices $i$ and $j$, the probability of a length-$l$ path $(\gamma_0,\ldots,\gamma_l)$ becomes

$$\mathrm{Pr}(\gamma_0,\ldots,\gamma_l)=\rho_{\gamma_0}S_{\gamma_0\gamma_1}\cdots S_{\gamma_{l-1}\gamma_l}=\psi_{\gamma_0}\,\frac{A_{\gamma_0\gamma_1}\cdots A_{\gamma_{l-1}\gamma_l}}{\lambda^{l}}\,\psi_{\gamma_l}=\psi_{\gamma_0}\,\frac{\exp\!\big(-(E_{\gamma_0\gamma_1}+\cdots+E_{\gamma_{l-1}\gamma_l})\big)}{\lambda^{l}}\,\psi_{\gamma_l},$$

where $\psi$ is taken to be normalized so that $\sum_i\psi_i^{2}=1$. In other words, MERW gives a Boltzmann distribution among paths, with the energy of a path being the sum of the edge energies $E_{ij}$ along it.
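A small sketch illustrating this identity: the probability of a concrete path computed step by step from $\rho$ and $S$ coincides with the closed form $\psi_{\gamma_0}\bigl(\prod A\bigr)\psi_{\gamma_l}/\lambda^{l}$ (the function name, the example energies and the chosen path are assumptions of this illustration):

```python
import numpy as np

def merw_path_probability(A, path):
    """Return the MERW probability of `path` computed in two equivalent ways."""
    eigvals, eigvecs = np.linalg.eigh(A)
    lam = eigvals[-1]
    psi = np.abs(eigvecs[:, -1])                 # unit 2-norm, so rho_i = psi_i**2
    S = (A / lam) * psi[None, :] / psi[:, None]

    # (1) step by step: rho_{g0} * S_{g0 g1} * ... * S_{g_{l-1} g_l}
    p_steps = psi[path[0]]**2
    for a, b in zip(path[:-1], path[1:]):
        p_steps *= S[a, b]

    # (2) closed form: psi_{g0} * (product of edge weights) / lam**l * psi_{gl}
    edge_prod = np.prod([A[a, b] for a, b in zip(path[:-1], path[1:])])
    p_closed = psi[path[0]] * edge_prod / lam**(len(path) - 1) * psi[path[-1]]
    return p_steps, p_closed

# Weighted example, A_ij = exp(-E_ij) for some arbitrary symmetric energies
E = np.array([[1.0, 0.5, 2.0],
              [0.5, 1.0, 0.3],
              [2.0, 0.3, 1.0]])
A = np.exp(-E)
print(merw_path_probability(A, [0, 1, 2, 1]))   # the two values agree
```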
Let us first look at a simple nontrivial situation: Fibonacci coding, where we want to transmit a message as a sequence of 0s and 1s, but without using two successive 1s: after a 1 there has to be a 0. To maximize the amount of information transmitted in such a sequence, we should assume a uniform probability distribution in the space of all possible sequences fulfilling this constraint. To practically use such long sequences, after a 1 we have to use a 0, but there remains the freedom of choosing the probability of a 0 after a 0. Let us denote this probability by $q$. For a given $q$, the stationary probability distribution of symbols turns out to be $\rho=\bigl(\tfrac{1}{2-q},\,1-\tfrac{1}{2-q}\bigr)$, hence the entropy rate is

$$H(S)=\rho_0\bigl(q\log(1/q)+(1-q)\log(1/(1-q))\bigr),$$

which is maximized for $q=(\sqrt{5}-1)/2\approx 0.618$, the inverse of the golden ratio. In contrast, a standard random walk would choose the suboptimal $q=0.5$. While a larger $q$ lowers the entropy of the choice made after a 0 (for $q>1/2$), it also makes the forced, information-free 0 after each 1 less frequent; this value of $q$ optimally balances the two effects, and the resulting maximal entropy rate is $\log(\lambda)=\log\bigl((1+\sqrt{5})/2\bigr)$, the logarithm of the dominant eigenvalue of the adjacency matrix of this two-state constraint.
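A quick numerical check of this example (variable names are illustrative): sweeping over $q$ reproduces the optimum near $0.618$ and an entropy rate equal to the logarithm of the golden ratio.

```python
import numpy as np

def entropy_rate_fib(q):
    """Entropy rate of the constrained source with P(0|0) = q, P(1|0) = 1-q, P(0|1) = 1."""
    rho0 = 1.0 / (2.0 - q)                      # stationary probability of symbol 0
    return rho0 * (q * np.log(1 / q) + (1 - q) * np.log(1 / (1 - q)))

qs = np.linspace(0.01, 0.99, 9801)
q_best = qs[np.argmax(entropy_rate_fib(qs))]
print(q_best)                                                   # ~ 0.618 = (sqrt(5) - 1) / 2
print(entropy_rate_fib(q_best), np.log((1 + np.sqrt(5)) / 2))   # both ~ 0.4812 nats
```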
A more complex example is the defected one-dimensional cyclic lattice: say 1000 nodes connected in a ring, where every node except the defects has a self-loop (an edge to itself). In the standard random walk (GRW) the stationary probability of a defect is 2/3 of the probability of a non-defect vertex: there is nearly no localization, and the same holds for standard diffusion, which is the infinitesimal limit of GRW. For MERW we first have to find the dominant eigenvector of the adjacency matrix, i.e. the one maximizing $\lambda$ in

$$(\lambda\psi)_x=(A\psi)_x=\psi_{x-1}+(1-V_x)\psi_x+\psi_{x+1}$$

for all positions $x$, where $V_x=1$ for the defects and $V_x=0$ otherwise. Subtracting $3\psi_x$ from both sides and multiplying by $-1$, we get

$$E\psi_x=-(\psi_{x-1}-2\psi_x+\psi_{x+1})+V_x\psi_x,$$

where $E=3-\lambda$ is now minimized, playing the role of the ground-state energy of a discretized stationary Schrödinger equation

$$E\psi=-C\psi_{xx}+V\psi,\qquad C=\hbar^{2}/2m,$$

while the MERW stationary probability distribution $\rho_x\propto\psi_x^{2}$ corresponds to the quantum ground-state probability density. In contrast to GRW, MERW therefore predicts strong localization of the walker in the largest defect-free region, analogously to the ground state of a quantum particle.
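The contrast between GRW and MERW on such a lattice can be seen directly in a short sketch (the number and placement of defects and the random seed are assumptions of this illustration):

```python
import numpy as np

N = 1000                                          # ring of 1000 nodes, as in the example
rng = np.random.default_rng(0)
defects = rng.choice(N, size=50, replace=False)   # assumed number of defects

idx = np.arange(N)
A = np.zeros((N, N))
A[idx, (idx + 1) % N] = 1                   # ring edges
A[(idx + 1) % N, idx] = 1
A[idx, idx] = 1                             # self-loops everywhere ...
A[defects, defects] = 0                     # ... except at the defects

deg = A.sum(axis=1)
rho_grw = deg / deg.sum()                   # GRW: proportional to degree, defects get 2/3

eigvals, eigvecs = np.linalg.eigh(A)
psi = np.abs(eigvecs[:, -1])
rho_merw = psi**2 / np.sum(psi**2)          # MERW: square of the dominant eigenvector

print(rho_grw.max() / rho_grw.mean())       # ~ 1.02 : nearly uniform
print(rho_merw.max() / rho_merw.mean())     # >> 1  : strong localization
```

A dense eigensolver is sufficient at this size; for much larger graphs one would instead use a sparse routine such as scipy.sparse.linalg.eigsh to obtain the dominant eigenpair.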