Stochastic chains with memory of variable length are a family of stochastic chains of finite order in a finite alphabet, such that, at each instant, only one finite suffix of the past, called the context, is needed to predict the next symbol. These models were introduced in the information theory literature by Jorma Rissanen in 1983,[1] as a universal tool for data compression, but have recently been used to model data in different areas such as biology,[2] linguistics[3] and music.[4]
A stochastic chain with memory of variable length is a stochastic chain $(X_n)_{n\in\mathbb{Z}}$ taking values in a finite alphabet $A$ and characterized by a probabilistic context tree $(\tau,p)$, such that $\tau$ is the set of contexts: a context $X_{n-\ell},\ldots,X_{n-1}$, with $\ell$ its length, is the finite suffix of the past $X_{-\infty},\ldots,X_{n-1}$ that suffices to predict the next symbol $X_n$; and $p$ is a family of transition probabilities associated with each context.
The class of stochastic chains with memory of variable length was introduced by Jorma Rissanen in the article A universal data compression system.[1] This class of stochastic chains was popularized in the statistical and probabilistic community by P. Bühlmann and A. J. Wyner in 1999, in the article Variable Length Markov Chains. Named by Bühlmann and Wyner as “variable length Markov chains” (VLMC), these chains are also known as “variable-order Markov models” (VOM), “probabilistic suffix trees”[2] and “context tree models”.[5] The name “stochastic chains with memory of variable length” seems to have been introduced by Galves and Löcherbach, in 2008, in the article of the same name.[6]
Consider a system consisting of a lamp, an observer, and a door between the two. The lamp has two possible states: on, represented by 1, or off, represented by 0. When the lamp is on, the observer may see the light through the door, depending on the state of the door at that time: open, 1, or closed, 0. The state of the door is independent of the state of the lamp.
Let $(X_n)_{n\geq 0}$ be the chain representing the state of the lamp, taking values in $A=\{0,1\}$ and evolving according to a family of transition probabilities $p$. Let $(\xi_n)_{n\geq 0}$ be a sequence of independent random variables taking values in $A$, independent of the chain $(X_n)_{n\geq 0}$, representing the state of the door, and such that

$$P(\xi_n=1)=1-\varepsilon,$$

where $0<\varepsilon<1$. Define the new chain $(Z_n)_{n\geq 0}$, representing what the observer sees, by

$$Z_n=X_n\,\xi_n.$$

The observer has access only to the sequence $(Z_n)_{n\geq 0}$.
In order to predict the next symbol, the observer needs to determine the last instant at which the lamp was seen on, i.e. to identify the last instant $k$, with $k<n$, such that $Z_k=1$.
Using a context tree, it is possible to represent the past states of the sequence, showing which of them are relevant for identifying the next state.
The stochastic chain $(Z_n)_{n\in\mathbb{Z}}$ is then a chain with memory of variable length, taking values in $A$ and compatible with the probabilistic context tree $(\tau,p)$, where

$$\tau=\{1,10,100,\ldots\}\cup\{0^\infty\}.$$
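As an illustration, the lamp–door dynamics can be simulated directly. The sketch below is in Python; since the text does not fix the lamp's own dynamics, the sketch assumes, for concreteness only, that the lamp flips state with some probability $q$ at each step (the function names are also mine):

```python
import random

def simulate_observer_chain(n, q=0.5, eps=0.1, seed=0):
    """Simulate Z_n = X_n * xi_n: the observer sees light only when
    the lamp is on (X_n = 1) AND the door is open (xi_n = 1)."""
    rng = random.Random(seed)
    x, z = 0, []
    for _ in range(n):
        # Hypothetical lamp dynamics (an assumption for illustration):
        # the lamp flips its state with probability q.
        if rng.random() < q:
            x = 1 - x
        # Door states are i.i.d. with P(xi = 1) = 1 - eps,
        # independent of the lamp.
        xi = 1 if rng.random() < 1 - eps else 0
        z.append(x * xi)
    return z

def last_on_instant(z, n):
    """Largest k < n with z[k] = 1, or None if the lamp was never seen on."""
    for k in range(n - 1, -1, -1):
        if z[k] == 1:
            return k
    return None
```

Observing $Z$ alone, the relevant part of the past at time $n$ is exactly the suffix since the last observed 1, which is why the contexts are the strings $1, 10, 100, \ldots$ together with the all-zero past.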
Given a sample $X_0,\ldots,X_{n-1}$, one can estimate the context tree that generated it. In the article A Universal Data Compression System,[1] Rissanen introduced a consistent algorithm to estimate the probabilistic context tree that generates the data. The algorithm can be summarized in two steps: build a maximum candidate context, then shorten it by successively pruning branches via likelihood-ratio tests.
Let $X_0,\ldots,X_{n-1}$ be a sample from a chain with memory of variable length compatible with the probabilistic context tree $(\tau,p)$. For any finite string $x_{-j}^{-1}$ with $j\leq n$, denote by $N_n(x_{-j}^{-1})$ the number of occurrences of the string in the sample,

$$N_n(x_{-j}^{-1})=\sum_{t=0}^{n-j}\mathbf{1}\left\{X_{t}^{t+j-1}=x_{-j}^{-1}\right\}.$$
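The count $N_n$ amounts to a sliding-window scan over the sample. A minimal Python sketch (the function name is mine):

```python
def occurrences(sample, s):
    """N_n(s): number of indices t = 0, ..., n - j (with j = len(s))
    such that (X_t, ..., X_{t+j-1}) equals the string s."""
    s = tuple(s)
    j = len(s)
    return sum(1 for t in range(len(sample) - j + 1)
               if tuple(sample[t:t + j]) == s)
```

For example, `occurrences([0, 1, 0, 1, 0], (0, 1))` counts the two windows starting at $t=0$ and $t=2$.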
Rissanen first built a maximum candidate context, given by $X_{n-K(n)}^{n-1}$, where $K(n)=C\log{n}$ and $C$ is an arbitrary constant. The intuitive reason for the choice $C\log{n}$ is the impossibility of estimating the probabilities of sequences of length greater than $\log{n}$ from a sample of size $n$.
From there, Rissanen shortens the maximum candidate by successively cutting branches according to a sequence of tests based on the statistical likelihood ratio. More formally, if $\sum_{b\in A}N_n(x_{-k}^{-1}b)>0$, define the estimator of the transition probability $p$ by

$$\hat{p}_n(a\mid x_{-k}^{-1})=\frac{N_n(x_{-k}^{-1}a)}{\sum_{b\in A}N_n(x_{-k}^{-1}b)},$$

where $x_{-j}^{-1}a=(x_{-j},\ldots,x_{-1},a)$. If $\sum_{b\in A}N_n(x_{-k}^{-1}b)=0$, define $\hat{p}_n(a\mid x_{-k}^{-1})=1/|A|$.
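The estimator $\hat{p}_n$ is a ratio of such counts, with the uniform fallback $1/|A|$ when the context never occurs in the sample. A self-contained Python sketch (helper names are mine):

```python
def occurrences(sample, s):
    """N_n(s): sliding-window count of the string s in the sample."""
    s, j = tuple(s), len(s)
    return sum(1 for t in range(len(sample) - j + 1)
               if tuple(sample[t:t + j]) == s)

def p_hat(sample, context, a, alphabet=(0, 1)):
    """Estimated transition probability of symbol a given the context:
    the fraction of occurrences of the context followed by a.
    Falls back to the uniform 1/|A| when the context is unseen."""
    totals = {b: occurrences(sample, tuple(context) + (b,)) for b in alphabet}
    denom = sum(totals.values())
    if denom == 0:
        return 1 / len(alphabet)
    return totals[a] / denom
```

On the perfectly alternating sample `[0, 1, 0, 1, 0, 1]`, for instance, every occurrence of the context `(0,)` is followed by a 1, so the estimate for that pair is 1.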
For $i\geq 1$, define

$$\Lambda_n(x_{-i}^{-1})=2\sum_{y\in A}\sum_{a\in A}N_n(y\,x_{-i}^{-1}a)\log\left[\frac{\hat{p}_n(a\mid y\,x_{-i}^{-1})}{\hat{p}_n(a\mid x_{-i}^{-1})}\right],$$

where $y\,x_{-i}^{-1}=(y,x_{-i},\ldots,x_{-1})$ and

$$\hat{p}_n(a\mid y\,x_{-i}^{-1})=\frac{N_n(y\,x_{-i}^{-1}a)}{\sum_{b\in A}N_n(y\,x_{-i}^{-1}b)}.$$
Note that $\Lambda_n(x_{-i}^{-1})$ is the likelihood-ratio statistic for testing the compatibility of the sample with the probabilistic context tree $(\tau,p)$ against the alternative that it is compatible with $(\tau',p')$, where $\tau$ and $\tau'$ differ only by the nodes extending the context $x_{-i}^{-1}$ one symbol further into the past.
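Under the usual convention $0\log 0=0$, the statistic $\Lambda_n$ can be computed directly from the counts. A Python sketch (the helper and function names are mine; the pruning decision itself, i.e. comparing against a threshold, is left out here):

```python
import math

def occurrences(sample, s):
    """Sliding-window count of the string s in the sample."""
    s, j = tuple(s), len(s)
    return sum(1 for t in range(len(sample) - j + 1)
               if tuple(sample[t:t + j]) == s)

def p_hat(sample, context, a, alphabet=(0, 1)):
    """Empirical transition probability, uniform if the context is unseen."""
    totals = {b: occurrences(sample, tuple(context) + (b,)) for b in alphabet}
    denom = sum(totals.values())
    return totals[a] / denom if denom else 1 / len(alphabet)

def lambda_n(sample, context, alphabet=(0, 1)):
    """Likelihood-ratio statistic: 2 * sum over y, a of N(y+context+a)
    * log( p_hat(a | y+context) / p_hat(a | context) ).
    Terms with N(y+context+a) = 0 contribute nothing (0 log 0 = 0)."""
    ctx = tuple(context)
    stat = 0.0
    for y in alphabet:
        for a in alphabet:
            n_ya = occurrences(sample, (y,) + ctx + (a,))
            if n_ya > 0:
                stat += 2 * n_ya * math.log(
                    p_hat(sample, (y,) + ctx, a) / p_hat(sample, ctx, a))
    return stat
```

For a purely alternating sample, extending the context `(1,)` one symbol into the past changes no transition probability, so the statistic is 0, matching the intuition that the longer context can be pruned.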
The length of the current estimated context is defined by

$$\hat{\ell}_n(X_0^{n-1})=\max\left\{i=1,\ldots,K(n):\Lambda_n(X_{n-i}^{n-1})>C\log{n}\right\},$$
where $C$ is any positive constant. Finally, Rissanen[1] proved the following consistency result: if $X_0,\ldots,X_{n-1}$ is a sample from a finite probabilistic context tree $(\tau,p)$, then

$$P\left(\hat{\ell}_n(X_0^{n-1})\neq\ell(X_0^{n-1})\right)\longrightarrow 0$$

as $n\to\infty$, where $\ell(X_0^{n-1})$ denotes the length of the true context.
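Putting the pieces together, the rule scans $i=1,\ldots,K(n)$ and keeps the largest $i$ whose statistic clears the threshold $C\log n$. A self-contained Python sketch (returning 0 when no $i$ passes the test, which is a convention of this sketch, and with all names mine):

```python
import math

def occurrences(sample, s):
    """Sliding-window count of the string s in the sample."""
    s, j = tuple(s), len(s)
    return sum(1 for t in range(len(sample) - j + 1)
               if tuple(sample[t:t + j]) == s)

def p_hat(sample, context, a, alphabet=(0, 1)):
    """Empirical transition probability, uniform if the context is unseen."""
    totals = {b: occurrences(sample, tuple(context) + (b,)) for b in alphabet}
    denom = sum(totals.values())
    return totals[a] / denom if denom else 1 / len(alphabet)

def lambda_n(sample, context, alphabet=(0, 1)):
    """Likelihood-ratio statistic for extending `context` one step back."""
    ctx, stat = tuple(context), 0.0
    for y in alphabet:
        for a in alphabet:
            n_ya = occurrences(sample, (y,) + ctx + (a,))
            if n_ya > 0:
                stat += 2 * n_ya * math.log(
                    p_hat(sample, (y,) + ctx, a) / p_hat(sample, ctx, a))
    return stat

def estimated_context_length(sample, C=2.0, alphabet=(0, 1)):
    """max{ i = 1..K(n) : Lambda_n(X_{n-i}^{n-1}) > C log n }, K(n) = C log n."""
    n = len(sample)
    K = int(C * math.log(n))
    best = 0  # convention of this sketch when no i passes the test
    for i in range(1, K + 1):
        suffix = tuple(sample[n - i:])
        if lambda_n(sample, suffix, alphabet) > C * math.log(n):
            best = i
    return best
```

On a deterministic alternating sample every $\Lambda_n$ vanishes, so no extension is ever significant and the sketch returns 0.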
The estimator of the context tree by BIC with a penalty constant $c>0$ is

$$\hat{\tau}_{BIC}=\underset{\tau\in\mathcal{T}_n}{\arg\max}\left\{\log L_\tau(X_1^n)-c\,\mathrm{df}(\tau)\log{n}\right\},$$

where $L_\tau(X_1^n)$ is the maximum likelihood of the sample under the tree $\tau$ and $\mathrm{df}(\tau)$ is the number of degrees of freedom (free parameters) of $\tau$.
The smallest maximizer criterion[3] selects the smallest tree $\tau$ in a set of champion trees $\mathcal{C}$ whose normalized log-likelihood loss with respect to any larger champion tree $\tau'$ vanishes, i.e.

$$\lim_{n\to\infty}\frac{\log L_{\tau'}(X_1^n)-\log L_{\tau}(X_1^n)}{n}=0.$$