In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well-known Markov chain models. In contrast to Markov chain models, where each random variable in a sequence with the Markov property depends on a fixed number of preceding random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization.
This realization sequence is often called the context; therefore the VOM models are also called context trees.[1] VOM models are nicely rendered by colorized probabilistic suffix trees (PST).[2] The flexibility in the number of conditioning random variables turns out to be a real advantage for many applications, such as statistical analysis, classification and prediction.[3] [4] [5]
Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet {a, b, c}. Specifically, consider the string constructed from infinite concatenations of the sub-string aaabc: aaabcaaabcaaabcaaabc…aaabc.
The VOM model of maximal order 2 can approximate the above string using only the following five conditional probability components: Pr(a | aa) = 0.5, Pr(b | aa) = 0.5, Pr(c | b) = 1.0, Pr(a | c) = 1.0, Pr(a | ca) = 1.0.
In this example, Pr(c | ab) = Pr(c | b) = 1.0; therefore, the shorter context b is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.
To construct the Markov chain of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: Pr(a | a), Pr(a | b), Pr(a | c), Pr(b | a), Pr(b | b), Pr(b | c), Pr(c | a), Pr(c | b), Pr(c | c). To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: Pr(a | aa), Pr(a | ab), …, Pr(c | cc). And to construct the Markov chain of order 3 for the next character, one must estimate the following 81 conditional probability components: Pr(a | aaa), Pr(a | aab), …, Pr(c | ccc).
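As a quick illustration (a hypothetical Python sketch, not taken from the cited sources), these conditional probability components can be recovered empirically by counting which symbol follows each context in a long realization of the repeated sub-string:

```python
from collections import Counter, defaultdict

text = "aaabc" * 1000  # finite stand-in for the infinite concatenation

def conditional_probs(s, order):
    """Estimate Pr(next symbol | context of the given length) from s."""
    counts = defaultdict(Counter)
    for i in range(order, len(s)):
        counts[s[i - order:i]][s[i]] += 1
    return {(ctx, sym): c / sum(followers.values())
            for ctx, followers in counts.items()
            for sym, c in followers.items()}

order1 = conditional_probs(text, 1)
order2 = conditional_probs(text, 2)
print(order2[("aa", "a")], order2[("aa", "b")])  # 0.5 0.5
print(order1[("b", "c")])                        # 1.0
print(order2[("ab", "c")])                       # 1.0, equal to Pr(c | b), so the
                                                 # shorter context b suffices
```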
In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases.
The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent of the future states; accordingly, "a great reduction in the number of model parameters can be achieved."[1]
Let A be a state space (finite alphabet) of size |A|.
Consider a sequence with the Markov property x_1^n = x_1 x_2 … x_n of n realizations of random variables, where x_i ∈ A is the state (symbol) at position i (1 ≤ i ≤ n), and the concatenation of states x_i and x_{i+1} is denoted by x_i x_{i+1}.
Given a training set of observed states x_1^n, the construction algorithm of the VOM models learns a model P that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states. Specifically, the learner generates a conditional probability distribution P(x_i | s) for a symbol x_i ∈ A given a context s ∈ A*, where the * sign represents a sequence of states of any length, including the empty context.
VOM models attempt to estimate conditional distributions of the form P(x_i | s), where the context length |s| ≤ D varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length |s| = D and hence can be regarded as special cases of the VOM models.
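As a minimal sketch of this definition (hypothetical Python code using a simple count-and-threshold scheme, not any specific published algorithm; the names fit_vom and predict and the parameter min_count are illustrative), one can estimate P(x_i | s) for every context s with |s| ≤ D that occurs often enough in the training sequence, and fall back to shorter contexts at prediction time:

```python
from collections import Counter, defaultdict

def fit_vom(training, max_order, min_count=2):
    """Estimate P(symbol | context) for all contexts s with |s| <= max_order.

    A context is kept only if it is observed at least `min_count` times, so
    rarely seen long contexts fall back to shorter ones at prediction time.
    This is a naive count-based sketch; published VOM algorithms add
    smoothing and pruning criteria on top of this idea.
    """
    counts = defaultdict(Counter)
    for order in range(max_order + 1):            # order 0 is the empty context
        for i in range(order, len(training)):
            counts[training[i - order:i]][training[i]] += 1
    model = {}
    for ctx, followers in counts.items():
        total = sum(followers.values())
        if total >= min_count:
            model[ctx] = {sym: c / total for sym, c in followers.items()}
    return model

def predict(model, history, max_order):
    """Return the longest stored suffix of `history` and its distribution."""
    for order in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - order:]
        if ctx in model:
            return ctx, model[ctx]
    return "", {}                                  # no context stored at all
```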
Effectively, for a given training sequence, the VOM models have been found to obtain better model parameterization than the fixed-order Markov models, which leads to a better variance-bias tradeoff of the learned models.[3] [4] [5]
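Continuing the hypothetical sketch above, a short usage example on the aaabc string shows how compact the resulting parameterization is. This naive version does not prune longer contexts that add nothing over their suffix (e.g. ab versus b), so it stores slightly more than the five components quoted earlier, but still far fewer than the 27 components of a full order-2 Markov chain:

```python
model = fit_vom("aaabc" * 1000, max_order=2)
print(sorted(model))                         # 8 contexts: '', 'a', 'aa', 'ab', 'b', 'bc', 'c', 'ca'
print(predict(model, "aaab", max_order=2))   # ('ab', {'c': 1.0})
print(predict(model, "abca", max_order=2))   # ('ca', {'a': 1.0})
```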
Various efficient algorithms have been devised for estimating the parameters of the VOM model.[4]
VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression,[1] document compression,[4] classification and identification of DNA and protein sequences[6] (http://www.eng.tau.ac.il/~bengal/VOMBAT.pdf),[3] statistical process control,[5] spam filtering,[7] haplotyping,[8] speech recognition,[9] sequence analysis in social sciences, and others.