Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen according to the principle of maximum entropy, which says that the probability distribution best representing the current state of knowledge is the one with the largest entropy. While a standard random walk chooses, for every vertex, a uniform probability distribution among its outgoing edges, locally maximizing the entropy rate, MERW maximizes the entropy rate globally (as average entropy production) by assuming a uniform probability distribution among all paths in a given graph.
MERW is used in various fields of science. A direct application is choosing probabilities to maximize the transmission rate through a constrained channel, analogously to Fibonacci coding. Its properties have also made it useful, for example, in the analysis of complex networks,[1] such as link prediction,[2] community detection,[3] robust transport over networks[4] and centrality measures,[5] as well as in image analysis, for example for detecting visual saliency regions,[6] object localization,[7] tampering detection[8] or the tractography problem.[9]
Additionally, it recreates some properties of quantum mechanics, suggesting a way to repair the discrepancy between diffusion models and quantum predictions, like Anderson localization.[10]
Consider a graph with $n$ vertices, defined by an adjacency matrix $A\in\{0,1\}^{n\times n}$: $A_{ij}=1$ if vertices $i$ and $j$ are connected by an edge, and $A_{ij}=0$ otherwise. For simplicity assume it is an undirected graph, so that $A$ is symmetric.
We would like to choose a random walk as a Markov process on this graph: for every vertex $i$ and each of its outgoing edges to some $j$, choose the probability $S_{ij}$ that the walker uses this edge after visiting $i$. Formally, find a stochastic matrix $S$ (containing the transition probabilities of a Markov chain) such that

$$0\le S_{ij}\le A_{ij}\quad\text{for all }i,j\qquad\text{and}\qquad\sum_{j=1}^{n}S_{ij}=1\quad\text{for all }i.$$

Assuming the graph is connected and aperiodic, the evolution of this stochastic process leads to a stationary probability distribution $\rho$ satisfying $\rho S=\rho$.
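To make the stationarity condition concrete, here is a minimal Python/NumPy sketch finding $\rho$ with $\rho S=\rho$ by repeatedly applying $S$ (the function name, tolerance and iteration cap are choices of this illustration, not part of the source):

```python
import numpy as np

def stationary_distribution(S, tol=1e-12, max_iters=100_000):
    """Find rho with rho @ S = rho for a row-stochastic matrix S
    of a connected, aperiodic chain, by power iteration."""
    n = S.shape[0]
    rho = np.full(n, 1.0 / n)          # start from the uniform distribution
    for _ in range(max_iters):
        new_rho = rho @ S              # one step of the chain
        if np.max(np.abs(new_rho - rho)) < tol:
            return new_rho
        rho = new_rho
    return rho
```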
Using the Shannon entropy for every vertex and averaging over the probability of visiting that vertex (so that its entropy can be used), we get the following formula for the average entropy production (entropy rate) of the stochastic process:

$$H(S)=\sum_{i=1}^{n}\rho_i\sum_{j=1}^{n}S_{ij}\log(1/S_{ij}).$$
This definition turns out to be equivalent to the asymptotic average entropy (per length) of the probability distribution in the space of paths for this stochastic process.
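As a concrete reading of this formula, a short NumPy sketch computing the entropy rate from $S$ and $\rho$ (the function name and the convention $0\log 0=0$ are choices of this illustration):

```python
import numpy as np

def entropy_rate(S, rho):
    """H(S) = sum_i rho_i sum_j S_ij log(1/S_ij), with 0 log 0 treated as 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        per_vertex = np.where(S > 0, -S * np.log(S), 0.0).sum(axis=1)
    return float(rho @ per_vertex)
```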
In the standard random walk, referred to here as generic random walk (GRW), we naturally choose that each outgoing edge is equally probable:
$$S_{ij}=\frac{A_{ij}}{\sum_{k=1}^{n}A_{ik}}.$$

For a symmetric $A$, this leads to the stationary probability distribution $\rho$ with

$$\rho_i=\frac{\sum_{j=1}^{n}A_{ij}}{\sum_{j=1}^{n}\sum_{k=1}^{n}A_{jk}}.$$

It locally maximizes entropy production (entropy rate) for every vertex, but usually leads to a suboptimal global average entropy rate $H(S)$.
MERW chooses the stochastic matrix which maximizes $H(S)$, or equivalently, the one which assumes a uniform probability distribution among all paths in a given graph. Its formula is obtained by first calculating the dominant eigenvalue $\lambda$ and the corresponding eigenvector $\psi$ of the adjacency matrix, i.e. the largest $\lambda\in\mathbb{R}$ with corresponding $\psi\in\mathbb{R}^{n}$ such that $\psi A=\lambda\psi$. Then the stochastic matrix is

$$S_{ij}=\frac{A_{ij}}{\lambda}\,\frac{\psi_j}{\psi_i},$$

for which every possible path of length $l$ from vertex $i$ to vertex $j$ has the same probability

$$\frac{1}{\lambda^{l}}\,\frac{\psi_j}{\psi_i}.$$

Its entropy rate is $\log(\lambda)$ and its stationary probability distribution is

$$\rho_i=\frac{\psi_i^{2}}{\sum_{j=1}^{n}\psi_j^{2}}.$$
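For concreteness, a minimal NumPy sketch of both walks on an undirected graph given by its adjacency matrix (the function names and the small example graph are illustrative assumptions, not part of the source):

```python
import numpy as np

def grw(A):
    """Generic random walk: each outgoing edge of a vertex is equally probable."""
    return A / A.sum(axis=1, keepdims=True)

def merw(A):
    """Maximal entropy random walk built from the dominant eigenpair of A.

    Returns the transition matrix S, the stationary distribution rho
    and the entropy rate log(lambda)."""
    eigvals, eigvecs = np.linalg.eigh(A)            # A is symmetric (undirected graph)
    lam = eigvals[-1]                               # dominant eigenvalue
    psi = np.abs(eigvecs[:, -1])                    # Perron eigenvector, positive entries
    S = (A / lam) * psi[None, :] / psi[:, None]     # S_ij = (A_ij / lam) * psi_j / psi_i
    rho = psi**2 / np.sum(psi**2)                   # stationary distribution
    return S, rho, np.log(lam)

# Example: a small path graph 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S, rho, h = merw(A)
assert np.allclose(S.sum(axis=1), 1.0)   # S is row-stochastic
assert np.allclose(rho @ S, rho)         # rho is stationary for S
```

On this path graph, GRW gives a stationary distribution proportional to the vertex degrees $(1,2,2,1)$, while MERW puts more weight on the two middle vertices, whose entries of $\psi$ are larger.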
In contrast to GRW, the MERW transition probabilities generally depend on the structure of the entire graph (they are nonlocal). Hence, they should not be imagined as being directly applied by the walker: if random-looking decisions are made based on the local situation, as a person would make them, the GRW approach is more appropriate. MERW is based on the principle of maximum entropy, making it the safest assumption when we do not have any additional knowledge about the system. For example, it would be appropriate for modelling our knowledge about an object performing some complex dynamics, not necessarily random, like a particle.
Assume for simplicity that the considered graph is undirected, connected and aperiodic, allowing us to conclude from the Perron–Frobenius theorem that the dominant eigenvector is unique. Hence $A^{l}$ is asymptotically ($l\to\infty$) dominated by $\lambda^{l}\psi\psi^{\top}$ (or, in bra–ket notation, $\lambda^{l}|\psi\rangle\langle\psi|$).
MERW requires a uniform distribution along paths. The number $m_{i,l}$ of paths of length $2l$ with vertex $i$ in the center is

$$m_{i,l}=\sum_{j=1}^{n}\sum_{k=1}^{n}\left(A^{l}\right)_{ji}\left(A^{l}\right)_{ik}\approx\sum_{j=1}^{n}\sum_{k=1}^{n}\left(\lambda^{l}\psi\psi^{\top}\right)_{ji}\left(\lambda^{l}\psi\psi^{\top}\right)_{ik}=\sum_{j=1}^{n}\sum_{k=1}^{n}\lambda^{2l}\psi_j\psi_i\psi_i\psi_k=\lambda^{2l}\,\psi_i^{2}\underbrace{\sum_{j=1}^{n}\psi_j\sum_{k=1}^{n}\psi_k}_{=:\,C},$$

hence the probability of being at vertex $i$ in the middle of such paths is

$$\rho_i=\lim_{l\to\infty}\frac{m_{i,l}}{\sum_{j=1}^{n}m_{j,l}}=\lim_{l\to\infty}\frac{\lambda^{2l}\psi_i^{2}\,C}{\sum_{j=1}^{n}\lambda^{2l}\psi_j^{2}\,C}=\frac{\psi_i^{2}}{\sum_{j=1}^{n}\psi_j^{2}}.$$
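This path-counting argument is easy to verify numerically; the following sketch (with an arbitrary small connected, aperiodic example graph, an assumption of this illustration) compares the frequency of each vertex at the center of long paths with $\psi_i^2/\sum_j\psi_j^2$:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],     # small connected, aperiodic example graph
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

l = 30
Al = np.linalg.matrix_power(A, l)
m = Al.sum(axis=0) * Al.sum(axis=1)   # m_{i,l} = (sum_j (A^l)_ji) * (sum_k (A^l)_ik)
rho_paths = m / m.sum()               # frequency of i in the middle of length-2l paths

eigvals, eigvecs = np.linalg.eigh(A)
psi = np.abs(eigvecs[:, -1])
rho_merw = psi**2 / np.sum(psi**2)

print(np.max(np.abs(rho_paths - rho_merw)))   # tends to 0 as l grows
```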
Analogously calculating the probability distribution for two succeeding vertices, one obtains that the probability of being at the $i$-th vertex and then at the $j$-th vertex is

$$\frac{\psi_i A_{ij}\psi_j}{\sum_{k=1}^{n}\sum_{m=1}^{n}\psi_k A_{km}\psi_m}=\frac{\psi_i A_{ij}\psi_j}{\psi A\psi^{\top}}=\frac{\psi_i A_{ij}\psi_j}{\lambda\sum_{k=1}^{n}\psi_k^{2}}.$$

Dividing it by the probability $\rho_i$ of being at the $i$-th vertex gives the conditional probability $S_{ij}$ of the $j$-th vertex being next after the $i$-th one:

$$S_{ij}=\frac{A_{ij}}{\lambda}\,\frac{\psi_j}{\psi_i}.$$
We have assumed $A_{ij}\in\{0,1\}$, but the obtained formulas also hold for a general nonnegative symmetric $A$. Choosing for example $A_{ij}=\exp(-E_{ij})$, where $E_{ij}$ is interpreted as the energy of the edge between vertices $i$ and $j$, the probability of a length-$l$ path $(\gamma_0,\ldots,\gamma_l)$ becomes

$$\mathrm{Pr}(\gamma_0,\ldots,\gamma_l)=\rho_{\gamma_0}S_{\gamma_0\gamma_1}\cdots S_{\gamma_{l-1}\gamma_l}=\psi_{\gamma_0}\,\frac{A_{\gamma_0\gamma_1}\cdots A_{\gamma_{l-1}\gamma_l}}{\lambda^{l}}\,\psi_{\gamma_l}=\psi_{\gamma_0}\,\frac{\exp\!\big(-(E_{\gamma_0\gamma_1}+\cdots+E_{\gamma_{l-1}\gamma_l})\big)}{\lambda^{l}}\,\psi_{\gamma_l},$$

where $\psi$ is taken to be normalized so that $\sum_i\psi_i^{2}=1$. In other words, MERW gives a Boltzmann distribution among paths, with the energy of a path being the sum of the edge energies $E_{ij}$ along it.
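A small sketch illustrating this identity: the probability of a concrete path computed step by step from $\rho$ and $S$ coincides with the closed form $\psi_{\gamma_0}\bigl(\prod A\bigr)\psi_{\gamma_l}/\lambda^{l}$ (the function name, the example energies and the chosen path are assumptions of this illustration):

```python
import numpy as np

def merw_path_probability(A, path):
    """Return the MERW probability of `path` computed in two equivalent ways."""
    eigvals, eigvecs = np.linalg.eigh(A)
    lam = eigvals[-1]
    psi = np.abs(eigvecs[:, -1])                 # unit 2-norm, so rho_i = psi_i**2
    S = (A / lam) * psi[None, :] / psi[:, None]

    # (1) step by step: rho_{g0} * S_{g0 g1} * ... * S_{g_{l-1} g_l}
    p_steps = psi[path[0]]**2
    for a, b in zip(path[:-1], path[1:]):
        p_steps *= S[a, b]

    # (2) closed form: psi_{g0} * (product of edge weights) / lam**l * psi_{gl}
    edge_prod = np.prod([A[a, b] for a, b in zip(path[:-1], path[1:])])
    p_closed = psi[path[0]] * edge_prod / lam**(len(path) - 1) * psi[path[-1]]
    return p_steps, p_closed

# Weighted example, A_ij = exp(-E_ij) for some arbitrary symmetric energies
E = np.array([[1.0, 0.5, 2.0],
              [0.5, 1.0, 0.3],
              [2.0, 0.3, 1.0]])
A = np.exp(-E)
print(merw_path_probability(A, [0, 1, 2, 1]))   # the two values agree
```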
Let us first look at a simple nontrivial situation: Fibonacci coding, where we want to transmit a message as a sequence of 0s and 1s, but without using two successive 1s: after a 1 there has to be a 0. To maximize the amount of information transmitted in such a sequence, we should assume a uniform probability distribution in the space of all possible sequences fulfilling this constraint. To practically use such long sequences, after a 1 we have to use a 0, but there remains the freedom of choosing the probability of a 0 after a 0. Let us denote this probability by $q$. For a given $q$, the stationary probability distribution of symbols turns out to be $\rho=\bigl(\tfrac{1}{2-q},\,1-\tfrac{1}{2-q}\bigr)$, hence the entropy rate is

$$H(S)=\rho_0\bigl(q\log(1/q)+(1-q)\log(1/(1-q))\bigr),$$

which is maximized for $q=(\sqrt{5}-1)/2\approx 0.618$, the inverse of the golden ratio. In contrast, a standard random walk would choose the suboptimal $q=0.5$. While a larger $q$ lowers the entropy of the choice made after a 0 (for $q>1/2$), it also makes the forced, information-free 0 after each 1 less frequent; this value of $q$ optimally balances the two effects, and the resulting maximal entropy rate is $\log(\lambda)=\log\bigl((1+\sqrt{5})/2\bigr)$, the logarithm of the dominant eigenvalue of the adjacency matrix of this two-state constraint.
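A quick numerical check of this example (variable names are illustrative): sweeping over $q$ reproduces the optimum near $0.618$ and an entropy rate equal to the logarithm of the golden ratio.

```python
import numpy as np

def entropy_rate_fib(q):
    """Entropy rate of the constrained source with P(0|0) = q, P(1|0) = 1-q, P(0|1) = 1."""
    rho0 = 1.0 / (2.0 - q)                      # stationary probability of symbol 0
    return rho0 * (q * np.log(1 / q) + (1 - q) * np.log(1 / (1 - q)))

qs = np.linspace(0.01, 0.99, 9801)
q_best = qs[np.argmax(entropy_rate_fib(qs))]
print(q_best)                                                   # ~ 0.618 = (sqrt(5) - 1) / 2
print(entropy_rate_fib(q_best), np.log((1 + np.sqrt(5)) / 2))   # both ~ 0.4812 nats
```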
A more complex example is the defected one-dimensional cyclic lattice: say 1000 nodes connected in a ring, where every node except the defects has a self-loop (an edge to itself). In the standard random walk (GRW) the stationary probability of a defect is 2/3 of the probability of a non-defect vertex: there is nearly no localization, and the same holds for standard diffusion, which is the infinitesimal limit of GRW. For MERW we first have to find the dominant eigenvector of the adjacency matrix, i.e. the one maximizing $\lambda$ in

$$(\lambda\psi)_x=(A\psi)_x=\psi_{x-1}+(1-V_x)\psi_x+\psi_{x+1}$$

for all positions $x$, where $V_x=1$ for the defects and $V_x=0$ otherwise. Subtracting $3\psi_x$ from both sides and multiplying by $-1$, we get

$$E\psi_x=-(\psi_{x-1}-2\psi_x+\psi_{x+1})+V_x\psi_x,$$

where $E=3-\lambda$ is now minimized, playing the role of the ground-state energy of a discretized stationary Schrödinger equation

$$E\psi=-C\psi_{xx}+V\psi,\qquad C=\hbar^{2}/2m,$$

while the MERW stationary probability distribution $\rho_x\propto\psi_x^{2}$ corresponds to the quantum ground-state probability density. In contrast to GRW, MERW therefore predicts strong localization of the walker in the largest defect-free region, analogously to the ground state of a quantum particle.
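The contrast between GRW and MERW on such a lattice can be seen directly in a short sketch (the number and placement of defects and the random seed are assumptions of this illustration):

```python
import numpy as np

N = 1000                                          # ring of 1000 nodes, as in the example
rng = np.random.default_rng(0)
defects = rng.choice(N, size=50, replace=False)   # assumed number of defects

idx = np.arange(N)
A = np.zeros((N, N))
A[idx, (idx + 1) % N] = 1                   # ring edges
A[(idx + 1) % N, idx] = 1
A[idx, idx] = 1                             # self-loops everywhere ...
A[defects, defects] = 0                     # ... except at the defects

deg = A.sum(axis=1)
rho_grw = deg / deg.sum()                   # GRW: proportional to degree, defects get 2/3

eigvals, eigvecs = np.linalg.eigh(A)
psi = np.abs(eigvecs[:, -1])
rho_merw = psi**2 / np.sum(psi**2)          # MERW: square of the dominant eigenvector

print(rho_grw.max() / rho_grw.mean())       # ~ 1.02 : nearly uniform
print(rho_merw.max() / rho_merw.mean())     # >> 1  : strong localization
```

A dense eigensolver is sufficient at this size; for much larger graphs one would instead use a sparse routine such as scipy.sparse.linalg.eigsh to obtain the dominant eigenpair.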