In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well-known Markov chain models. In contrast to Markov chain models, where each random variable in a sequence with the Markov property depends on a fixed number of preceding random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization.
This realization sequence is often called the context; therefore the VOM models are also called context trees.[1] VOM models are nicely rendered by colorized probabilistic suffix trees (PST).[2] The flexibility in the number of conditioning random variables turns out to be a real advantage for many applications, such as statistical analysis, classification and prediction.[3] [4] [5]
Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet {a, b, c}. Specifically, consider the string constructed from infinite concatenations of the sub-string aaabc: aaabcaaabcaaabcaaabc…aaabc.
The VOM model of maximal order 2 can approximate the above string using only the following five conditional probability components: Pr(a | aa) = 0.5, Pr(b | aa) = 0.5, Pr(c | b) = 1.0, Pr(a | c) = 1.0, Pr(a | ca) = 1.0.
In this example, Pr(c | ab) = Pr(c | b) = 1.0; therefore, the shorter context b is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.
To construct the Markov chain of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: Pr(a | a), Pr(a | b), Pr(a | c), Pr(b | a), Pr(b | b), Pr(b | c), Pr(c | a), Pr(c | b), Pr(c | c). To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: Pr(a | aa), Pr(a | ab), …, Pr(c | cc). And to construct the Markov chain of order 3 for the next character, one must estimate the following 81 conditional probability components: Pr(a | aaa), Pr(a | aab), …, Pr(c | ccc).
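As a quick illustration (a hypothetical Python sketch, not taken from the cited sources), these conditional probability components can be recovered empirically by counting which symbol follows each context in a long realization of the repeated sub-string:

```python
from collections import Counter, defaultdict

text = "aaabc" * 1000  # finite stand-in for the infinite concatenation

def conditional_probs(s, order):
    """Estimate Pr(next symbol | context of the given length) from s."""
    counts = defaultdict(Counter)
    for i in range(order, len(s)):
        counts[s[i - order:i]][s[i]] += 1
    return {(ctx, sym): c / sum(followers.values())
            for ctx, followers in counts.items()
            for sym, c in followers.items()}

order1 = conditional_probs(text, 1)
order2 = conditional_probs(text, 2)
print(order2[("aa", "a")], order2[("aa", "b")])  # 0.5 0.5
print(order1[("b", "c")])                        # 1.0
print(order2[("ab", "c")])                       # 1.0, equal to Pr(c | b), so the
                                                 # shorter context b suffices
```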
In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases.
The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent of the future states; accordingly, "a great reduction in the number of model parameters can be achieved."[1]
Let A be a state space (finite alphabet) of size |A|.
Consider a sequence with the Markov property x_1^n = x_1 x_2 … x_n of n realizations of random variables, where x_i ∈ A is the state (symbol) at position i (1 ≤ i ≤ n), and the concatenation of states x_i and x_{i+1} is denoted by x_i x_{i+1}.
Given a training set of observed states x_1^n, the construction algorithm of the VOM models learns a model P that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states. Specifically, the learner generates a conditional probability distribution P(x_i | s) for a symbol x_i ∈ A given a context s ∈ A*, where the * sign represents a sequence of states of any length, including the empty context.
VOM models attempt to estimate conditional distributions of the form P(x_i | s), where the context length |s| ≤ D varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length |s| = D and hence can be regarded as special cases of the VOM models.
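As a minimal sketch of this definition (hypothetical Python code using a simple count-and-threshold scheme, not any specific published algorithm; the names fit_vom and predict and the parameter min_count are illustrative), one can estimate P(x_i | s) for every context s with |s| ≤ D that occurs often enough in the training sequence, and fall back to shorter contexts at prediction time:

```python
from collections import Counter, defaultdict

def fit_vom(training, max_order, min_count=2):
    """Estimate P(symbol | context) for all contexts s with |s| <= max_order.

    A context is kept only if it is observed at least `min_count` times, so
    rarely seen long contexts fall back to shorter ones at prediction time.
    This is a naive count-based sketch; published VOM algorithms add
    smoothing and pruning criteria on top of this idea.
    """
    counts = defaultdict(Counter)
    for order in range(max_order + 1):            # order 0 is the empty context
        for i in range(order, len(training)):
            counts[training[i - order:i]][training[i]] += 1
    model = {}
    for ctx, followers in counts.items():
        total = sum(followers.values())
        if total >= min_count:
            model[ctx] = {sym: c / total for sym, c in followers.items()}
    return model

def predict(model, history, max_order):
    """Return the longest stored suffix of `history` and its distribution."""
    for order in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - order:]
        if ctx in model:
            return ctx, model[ctx]
    return "", {}                                  # no context stored at all
```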
Effectively, for a given training sequence, the VOM models have been found to obtain better model parameterization than the fixed-order Markov models, which leads to a better variance-bias tradeoff of the learned models.[3] [4] [5]
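Continuing the hypothetical sketch above, a short usage example on the aaabc string shows how compact the resulting parameterization is. This naive version does not prune longer contexts that add nothing over their suffix (e.g. ab versus b), so it stores slightly more than the five components quoted earlier, but still far fewer than the 27 components of a full order-2 Markov chain:

```python
model = fit_vom("aaabc" * 1000, max_order=2)
print(sorted(model))                         # 8 contexts: '', 'a', 'aa', 'ab', 'b', 'bc', 'c', 'ca'
print(predict(model, "aaab", max_order=2))   # ('ab', {'c': 1.0})
print(predict(model, "abca", max_order=2))   # ('ca', {'a': 1.0})
```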
Various efficient algorithms have been devised for estimating the parameters of the VOM model.[4]
VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression,[1] document compression,[4] classification and identification of DNA and protein sequences[6] (http://www.eng.tau.ac.il/~bengal/VOMBAT.pdf),[3] statistical process control,[5] spam filtering,[7] haplotyping,[8] speech recognition,[9] sequence analysis in social sciences, and others.