Kolmogorov structure function explained

In 1973, Andrey Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let each datum be a finite binary string and a model be a finite set of binary strings. Consider model classes consisting of models of given maximal Kolmogorov complexity. The Kolmogorov structure function of an individual data string expresses the relation between the complexity level constraint on a model class and the least log-cardinality of a model in the class containing the data. The structure function determines all stochastic properties of the individual data string: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the true model is in the model class considered or not. In the classical case we talk about a set of data with a probability distribution, and the properties are those of the expectations. In contrast, here we deal with individual data strings and the properties of the individual string under consideration. In this setting, a property holds with certainty rather than with high probability as in the classical case. The Kolmogorov structure function precisely quantifies the goodness-of-fit of an individual model with respect to individual data.

The Kolmogorov structure function is used in algorithmic information theory, also known as the theory of Kolmogorov complexity, to describe the structure of a string by means of models of increasing complexity.

Kolmogorov's definition

The structure function was originally proposed by Kolmogorov in 1973 at a Soviet Information Theory symposium in Tallinn, but these results were not published (p. 182). They were announced in 1974,^[1] the only written record by Kolmogorov himself. One of his last scientific statements is (translated from the original Russian by L.A. Levin):

Contemporary definition

The structure function is discussed in Cover and Thomas.^[2] It is studied extensively in Vereshchagin and Vitányi,^[3] where its main properties are also resolved. The Kolmogorov structure function can be written as

h_x(\alpha) = \min_S \{\log|S| : x \in S,\ K(S) \leq \alpha\}

where x is a binary string of length n with x \in S, where S is a contemplated model (a set of n-length strings) for x, K(S) is the Kolmogorov complexity of S, and \alpha is a nonnegative integer value bounding the complexity of the contemplated S's. Clearly, this function is nonincreasing and reaches \log|\{x\}| = 0 for \alpha = K(x) + c, where c is the required number of bits to change x into \{x\} and K(x) is the Kolmogorov complexity of x.
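Because Kolmogorov complexity is incomputable, the structure function cannot be evaluated exactly. The following is only a minimal sketch of the shape of the definition, assuming a small hypothetical model family (all strings sharing a prefix with x) and an assumed description cost that stands in for K(S). The names `prefix_models` and `structure_function` are illustrative and not taken from the literature.

```python
import math

def prefix_models(x):
    """Hypothetical model family: S_k = all strings of len(x) sharing x's first k bits.

    The cost k is an assumed stand-in for K(S); the true K(S) is incomputable.
    Returns (cost_bits, log2_cardinality) pairs; every S_k contains x by construction.
    """
    n = len(x)
    return [(k, n - k) for k in range(n + 1)]

def structure_function(models, alpha):
    """Least log-cardinality among the models whose cost is at most alpha."""
    feasible = [log_size for cost, log_size in models if cost <= alpha]
    return min(feasible) if feasible else math.inf

x = "0110100110"
models = prefix_models(x)
curve = [structure_function(models, alpha) for alpha in range(len(x) + 1)]
print(curve)
```

Under these assumed costs the curve is nonincreasing and runs from n down to 0, mirroring the two properties stated for the true structure function.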

The algorithmic sufficient statistic

We define a set S containing x such that

K(S) + K(x|S) = K(x) + O(1).

The function h_x(\alpha) never decreases more than a fixed independent constant below the diagonal called the sufficiency line L, defined by

L(\alpha) + \alpha = K(x).

It is approached to within a constant distance by the graph of h_x for certain arguments (for instance, for \alpha = K(x) + c). For these \alpha's we have \alpha + h_x(\alpha) = K(x) + O(1), and the associated model S (a witness for h_x(\alpha)) is called an optimal set for x, and its description of K(S) \leq \alpha bits is therefore an algorithmic sufficient statistic. We write "algorithmic" for "Kolmogorov complexity" by convention. The main properties of an algorithmic sufficient statistic are the following: if S is an algorithmic sufficient statistic for x, then

K(S) + \log|S| = K(x) + O(1).

That is, the two-part description of x using the model S and, as data-to-model code, the index of x in the enumeration of S in \log|S| bits, is as concise as the shortest one-part code of x in K(x) bits. This can be easily seen as follows:

K(x) \leq K(x,S) + O(1) \leq K(S) + K(x|S) + O(1) \leq K(S) + \log|S| + O(1) \leq K(x) + O(1).

Using straightforward inequalities and the sufficiency property, we find that K(x|S) = \log|S| + O(1). (For example, given S \ni x, we can describe x self-delimitingly (you can determine its end) in \log|S| + O(1) bits.) Therefore, the randomness deficiency \log|S| - K(x|S) of x in S is a constant, which means that x is a typical (random) element of S. However, there can be models S containing x that are not sufficient statistics. An algorithmic sufficient statistic S for x has the additional property, apart from being a model of best fit, that K(x,S) = K(x) + O(1), and therefore by the Kolmogorov complexity symmetry of information (the information about x in S is about the same as the information about S in x) we have K(S|x^*) = O(1): the algorithmic sufficient statistic S is a model of best fit that is almost completely determined by x. (x^* is a shortest program for x.) The algorithmic sufficient statistic associated with the least such \alpha is called the algorithmic minimal sufficient statistic.
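To make the two-part description concrete, here is a minimal sketch for one computable model class: the set of all n-bit strings with the same number of ones as x. The model cost used below (the bits needed to name a number between 0 and n) is an assumed proxy for K(S), not the true Kolmogorov complexity, and the function names are illustrative.

```python
import math

def index_in_weight_class(x):
    """Rank of bit string x among all strings of the same length and the same
    number of ones, in lexicographic order with '0' < '1'."""
    n, ones_left, rank = len(x), x.count("1"), 0
    for i, bit in enumerate(x):
        if bit == "1":
            # Strings that agree so far but have '0' here come first; they
            # place all remaining ones in the n - i - 1 later positions.
            rank += math.comb(n - i - 1, ones_left)
            ones_left -= 1
    return rank

def two_part_length(x):
    """(model_bits, index_bits) for the weight-class model of x."""
    n, k = len(x), x.count("1")
    model_bits = math.ceil(math.log2(n + 1))           # assumed proxy for K(S)
    index_bits = math.ceil(math.log2(math.comb(n, k)))  # data-to-model code, log|S|
    return model_bits, index_bits

x = "0000000100000000"
model_bits, index_bits = two_part_length(x)
print(index_in_weight_class(x), model_bits, index_bits)
```

For this sparse 16-bit string the two parts together take 9 bits under the stated assumptions, fewer than the 16 bits of the literal string. A string with about half ones would gain nothing from this model class.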

With respect to the picture: The MDL structure function \lambda_x(\alpha) is explained below. The goodness-of-fit structure function \beta_x(\alpha) is the least randomness deficiency (see above) of any model S \ni x for x such that K(S) \leq \alpha. This structure function gives the goodness-of-fit of a model S (containing x) for the string x. When it is low the model fits well, and when it is high the model does not fit well. If \beta_x(\alpha) = 0 for some \alpha, then there is a typical model S \ni x for x such that K(S) \leq \alpha and x is typical (random) for S. That is, S is the best-fitting model for x. For more details see especially^[3] and^[4].
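Written out in the same style as the definition of h_x, the goodness-of-fit structure function described above reads as follows. The symbol \delta(x \mid S) for the randomness deficiency is notation assumed here for readability.

```latex
\delta(x \mid S) = \log|S| - K(x \mid S), \qquad
\beta_x(\alpha) = \min_S \{\delta(x \mid S) : S \ni x,\ K(S) \leq \alpha\}
```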

Selection of properties

Within the constraints that the graph goes down at an angle of at least 45 degrees, that it starts at n and ends approximately at K(x), every graph (up to an O(\log n) additive term in argument and value) is realized by the structure function of some data x and vice versa. Where the graph first hits the diagonal, the argument (complexity) is that of the minimum sufficient statistic. It is incomputable to determine this place.

Main property

It is proved that at each level \alpha of complexity the structure function allows us to select the best model S for the individual string x within a strip of O(\log n) with certainty, not with great probability.

The MDL variant

The minimum description length (MDL) function: The length of the minimal two-part code for x, consisting of the model cost K(S) and the length of the index of x in S, in the model class of sets of given maximal Kolmogorov complexity \alpha (that is, with the complexity of S upper bounded by \alpha), is given by the MDL function or constrained MDL estimator:

\lambda_x(\alpha) = \min_S \{\Lambda(S) : S \ni x,\ K(S) \leq \alpha\},

where \Lambda(S) = \log|S| + K(S) \geq K(x) - O(1) is the total length of the two-part code of x with the help of model S.
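The constrained minimization can be illustrated on a finite list of candidate models. This is a minimal sketch under stated assumptions: the three candidate sets and their description costs are hypothetical stand-ins for K(S), chosen only so that the arithmetic is computable, and the names `candidate_models` and `mdl_estimator` are illustrative.

```python
import math

def candidate_models(x):
    """Three hypothetical models containing x, as (name, cost_bits, log2_size).

    Costs are assumed proxies for K(S), built from the bits needed to
    name a number in 0..n; they are not true Kolmogorov complexities.
    """
    n, k = len(x), x.count("1")
    name_n = math.ceil(math.log2(n + 1))
    return [
        ("all strings of length n", name_n, float(n)),
        ("strings of length n with k ones", 2 * name_n, math.log2(math.comb(n, k))),
        ("singleton {x}", name_n + n, 0.0),
    ]

def mdl_estimator(models, alpha):
    """Minimal two-part code length cost + log2|S| over models with cost <= alpha."""
    feasible = [(cost + log_size, name) for name, cost, log_size in models if cost <= alpha]
    return min(feasible) if feasible else (math.inf, None)

x = "0000000100000000"
models = candidate_models(x)
for alpha in (5, 10, 21):
    print(alpha, mdl_estimator(models, alpha))
```

With these assumed costs the estimate is 21 bits at budget 5 and drops to 14 bits once the budget admits the model that fixes the number of ones. Admitting the costlier singleton model afterwards does not improve the total.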

Main property

It is proved that at each level \alpha of complexity the structure function allows us to select the best model S for the individual string x within a strip of O(\log n) with certainty, not with great probability.

Application in statistics

The mathematics developed above were taken as the foundation of MDL by its inventor Jorma Rissanen.^[5]

Probability models

For every computable probability distribution P it can be proved^[6] that

-\log P(x) = \log|S| + O(\log n).

For example, if P is some computable distribution on the set S of strings of length n, then each x \in S has probability P(x) = \exp(O(\log n))/|S| = n^{O(1)}/|S|. Kolmogorov's structure function becomes

h'_x(\alpha) = \min_P \{-\log P(x) : P(x) > 0,\ K(P) \leq \alpha\}

where x is a binary string of length n with -\log P(x) > 0, where P is a contemplated model (a computable probability distribution on n-length strings) for x, K(P) is the Kolmogorov complexity of P, and \alpha is an integer value bounding the complexity of the contemplated P's. Clearly, this function is nonincreasing and reaches \log|\{x\}| = 0 for \alpha = K(x) + c, where c is the required number of bits to change x into \{x\} and K(x) is the Kolmogorov complexity of x. Then h'_x(\alpha) = h_x(\alpha) + O(\log n). For every complexity level \alpha the function h'_x(\alpha) is the Kolmogorov complexity version of the maximum likelihood (ML).
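The closeness of -\log P(x) and \log|S| can be checked numerically in one simple case. The sketch below assumes an i.i.d. Bernoulli model fitted to the fraction of ones in x, and compares it with the set of all strings having the same length and number of ones. Both choices are illustrative assumptions, not the construction used in the cited proof.

```python
import math

def neg_log2_bernoulli(x):
    """-log2 P(x) under the i.i.d. Bernoulli model with p = (number of ones) / n."""
    n, k = len(x), x.count("1")
    bits = 0.0
    for count in (k, n - k):
        if count:  # skip empty symbol classes to avoid log2(0)
            bits -= count * math.log2(count / n)
    return bits

def log2_weight_class(x):
    """log2 of the number of strings with the same length and number of ones as x."""
    n, k = len(x), x.count("1")
    return math.log2(math.comb(n, k))

x = "0010110100" * 10  # n = 100 with 40 ones
gap = neg_log2_bernoulli(x) - log2_weight_class(x)
print(round(gap, 2), "bits; log2(n + 1) =", round(math.log2(len(x) + 1), 2))
```

The gap lies between 0 and \log(n+1) bits for any binary string, by the standard bounds on binomial coefficients, which is an instance of the O(\log n) term.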

Main property

It is proved that at each level \alpha of complexity the structure function allows us to select the best model for the individual string x within a strip of O(\log n) with certainty, not with great probability.

The MDL variant and probability models

The MDL function: The length of the minimal two-part code for x, consisting of the model cost K(P) and the code length -\log P(x), in the model class of computable probability mass functions of given maximal Kolmogorov complexity \alpha (that is, with the complexity of P upper bounded by \alpha), is given by the MDL function or constrained MDL estimator:

\lambda'_x(\alpha) = \min_P \{\Lambda(P) : P(x) > 0,\ K(P) \leq \alpha\},

where \Lambda(P) = -\log P(x) + K(P) \geq K(x) - O(1) is the total length of the two-part code of x with the help of model P.
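A minimal sketch of the same trade-off with probability models: assume Bernoulli models whose parameter p is stated to d bits of precision, and charge d bits as a stand-in for K(P). This cost model and the names below are assumptions made for illustration. The search is exhaustive, so it is only practical for small budgets.

```python
import math

def code_length_bits(x, p):
    """-log2 P(x) for an i.i.d. Bernoulli(p) model with 0 < p < 1."""
    k = x.count("1")
    return -(k * math.log2(p) + (len(x) - k) * math.log2(1 - p))

def constrained_mdl(x, alpha):
    """Best (total_bits, precision_bits, p) over grid parameters with precision <= alpha.

    At precision d the candidates are the midpoints (j + 0.5) / 2**d, which
    never equal 0 or 1. The precision d is the assumed model cost.
    """
    best = (math.inf, None, None)
    for d in range(alpha + 1):
        for j in range(2 ** d):
            p = (j + 0.5) / 2 ** d
            total = d + code_length_bits(x, p)
            if total < best[0]:
                best = (total, d, p)
    return best

x = "0000000001" * 10  # n = 100 with 10 ones
for alpha in (0, 2, 6):
    print(alpha, constrained_mdl(x, alpha))
```

At budget 0 only p = 0.5 is available and the code takes 100 bits. A few bits of precision let the model track the low fraction of ones and the total drops sharply, after which extra precision costs more than it saves.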

Main property

It is proved that at each level \alpha of complexity the MDL function allows us to select the best model P for the individual string x within a strip of O(\log n) with certainty, not with great probability.^[3]

Extension to rate distortion and denoising

It turns out that the approach can be extended to a theory of rate distortion of individual finite sequences and denoising of individual finite sequences^[7] using Kolmogorov complexity. Experiments using real compressor programs have been carried out with success.^[8] Here the assumption is that for natural data the Kolmogorov complexity is not far from the length of a compressed version using a good compressor.
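The compressor assumption can be tried directly. The sketch below uses Python's standard `zlib` module as the compressor. This is a stand-in chosen for convenience and is not claimed to be the compressor used in the cited experiments. The compressed length is only a computable proxy for Kolmogorov complexity.

```python
import os
import zlib

def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data, a computable proxy for K(data)."""
    return 8 * len(zlib.compress(data, 9))

regular = b"01" * 5000           # highly regular, 10000 bytes
random_like = os.urandom(10000)  # incompressible with overwhelming probability

print("regular:    ", compressed_bits(regular), "bits")
print("random-like:", compressed_bits(random_like), "bits")
```

The regular input compresses to a small fraction of its length while the random-looking input does not, which is the behaviour the approximation relies on.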

Literature

Cover, T.M.; Gacs, P.; Gray, R.M. (1989). Kolmogorov's contributions to Information Theory and Algorithmic Complexity. Annals of Probability 17 (3): 840–865. doi:10.1214/aop/1176991250. JSTOR 2244387.
Kolmogorov, A.N.; Uspenskii, V.A. (1 January 1987). Algorithms and Randomness. Theory of Probability and Its Applications 32 (3): 389–412. doi:10.1137/1132060.
Li, M.; Vitányi, P.M.B. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). New York: Springer. ISBN 978-0387339986. Especially pp. 401–431 about the Kolmogorov structure function, and pp. 613–629 about rate distortion and denoising of individual sequences.
Shen, A. (1 April 1999). Discussion on Kolmogorov Complexity and Statistical Analysis. The Computer Journal 42 (4): 340–342. doi:10.1093/comjnl/42.4.340.
V'yugin, V.V. (1987). On Randomness Defect of a Finite Object Relative to Measures with Given Complexity Bounds. Theory of Probability and Its Applications 32 (3): 508–512. doi:10.1137/1132071.
V'yugin, V.V. (1 April 1999). Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences. The Computer Journal 42 (4): 294–317. doi:10.1093/comjnl/42.4.294.

Notes and References

[1] Abstract of a talk for the Moscow Mathematical Society in Uspekhi Mat. Nauk Volume 29, Issue 4(178), in the Communications of the Moscow Mathematical Society, page 155 (in the Russian edition, not translated into English). http://www.mathnet.ru/php/archive.phtml?jrnid=rm&wshow=issue&year=1974&volume=29&volume_alt=&issue=4&issue_alt=178&option_lang=eng
[2] Cover, Thomas M.; Thomas, Joy A. (1991). Elements of information theory. New York: Wiley. pp. 175–178. ISBN 978-0471062592.
[3] Vereshchagin, N.K.; Vitanyi, P.M.B. (1 December 2004). Kolmogorov's Structure Functions and Model Selection. IEEE Transactions on Information Theory 50 (12): 3265–3290. arXiv:cs/0204037. doi:10.1109/TIT.2004.838346.
[4] Gacs, P.; Tromp, J.T.; Vitanyi, P.M.B. (2001). Algorithmic statistics. IEEE Transactions on Information Theory 47 (6): 2443–2463. arXiv:math/0006233. doi:10.1109/18.945257.
[5] Rissanen, Jorma (2007). Information and complexity in statistical modeling (Online-Ausg.). New York: Springer. ISBN 978-0-387-36610-4.
[6] Shen, A.Kh. (1983). The concept of (α, β)-stochasticity in the Kolmogorov sense, and its properties. Soviet Math. Dokl. 28 (1): 295–299. https://scholar.google.com/scholar?hl=en&as_sdt=0,5&q=Shen+1983+Soviet+Math+Doklady
[7] Vereshchagin, Nikolai K.; Vitanyi, Paul M.B. (1 July 2010). Rate Distortion and Denoising of Individual Data Using Kolmogorov Complexity. IEEE Transactions on Information Theory 56 (7): 3438–3454. arXiv:cs/0411014. doi:10.1109/TIT.2010.2048491.
[8] de Rooij, Steven; Vitanyi, Paul (1 March 2012). Approximating Rate-Distortion Graphs of Individual Data: Experiments in Lossy Compression and Denoising. IEEE Transactions on Computers 61 (3): 395–407. arXiv:cs/0609121. doi:10.1109/TC.2011.25.