Growth function explained

The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family or class of functions. It is used mainly in statistical learning theory, where it helps to study the properties of statistical learning methods. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties. It is a basic concept in machine learning.^[1]

Definitions

Set-family definition

Let $H$ be a set family (a set of sets) and $C$ a set. Their intersection is defined as the following set-family:

$H\cap C := \{h\cap C \mid h\in H\}$

The intersection-size (also called the index) of $H$ with respect to $C$ is $|H\cap C|$. If a set $C_m$ has $m$ elements then the index is at most $2^m$. If the index is exactly $2^m$ then the set $C$ is said to be shattered by $H$, because $H\cap C$ contains all the subsets of $C$, i.e.:

$|H\cap C| = 2^{|C|}$

The growth function measures the size of $H\cap C$ as a function of $|C|$. Formally:

$\operatorname{Growth}(H,m) := \max_{C:|C|=m} |H\cap C|$
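The definition can be checked by brute force when the domain is finite. The following is a minimal sketch under that assumption; the names `trace` and `growth` and the toy family are illustrative, not part of the original text.

```python
from itertools import combinations

def trace(family, c):
    """Return H ∩ C: the distinct sets h ∩ C for h in the family."""
    c = frozenset(c)
    return {frozenset(h) & c for h in family}

def growth(family, domain, m):
    """Maximum of |H ∩ C| over all m-element subsets C of a finite domain."""
    return max(len(trace(family, c)) for c in combinations(domain, m))

# Hypothetical toy family over the finite domain {1, 2, 3, 4}.
domain = [1, 2, 3, 4]
family = [set(), {1}, {1, 2}, {2, 3}, {1, 2, 3, 4}]

# The set {1, 2} is shattered: all 2^2 subsets appear in the intersection.
print(len(trace(family, {1, 2})))

# Growth(H, m) for m = 0..4; it can never exceed min(2^m, |H|).
print([growth(family, domain, m) for m in range(5)])
```

For an infinite domain the maximum ranges over infinitely many sets $C$, so this enumeration only applies to finite toy cases.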

Hypothesis-class definition

Equivalently, let $H$ be a hypothesis-class (a set of binary functions) and $C$ a set with $m$ elements. The restriction of $H$ to $C$ is the set of binary functions on $C$ that can be derived from $H$:

$H_C := \{(h(x_1),\ldots,h(x_m)) \mid h\in H, x_i\in C\}$

The growth function measures the size of $H_C$ as a function of $|C|$:

$\operatorname{Growth}(H,m) := \max_{C:|C|=m} |H_C|$
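A small sketch of the restriction $H_C$, using threshold classifiers as a stand-in hypothesis class. The helper names `restriction` and `threshold`, the sample points and the chosen thresholds are all assumptions made for illustration.

```python
def restriction(hypotheses, points):
    """Return H_C: the distinct label vectors (h(x_1), ..., h(x_m))."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def threshold(t):
    """Binary function that labels x with 1 exactly when x > t."""
    return lambda x: int(x > t)

points = [0.5, 1.5, 2.5]
hypotheses = [threshold(t) for t in (0.0, 1.0, 2.0, 3.0, 3.5)]

patterns = restriction(hypotheses, points)
# Two different hypotheses (t = 3.0 and t = 3.5) give the same label vector,
# so the restriction has fewer elements than the list of hypotheses.
print(sorted(patterns))
```

Each label vector corresponds to the subset of `points` labelled 1, which is why the two definitions count the same thing.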

Examples

1. The domain is the real line $\mathbb{R}$. The set-family $H$ contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form $\{x\in\mathbb{R} \mid x > x_0\}$ for some $x_0\in\mathbb{R}$. For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains $m+1$ sets: the empty set, the set containing the largest element of $C$, the set containing the two largest elements of $C$, and so on. Therefore: $\operatorname{Growth}(H,m) = m+1$. The same is true whether $H$ contains open half-lines, closed half-lines, or both.

2. The domain is the segment $[0,1]$. The set-family $H$ contains all the open sets. For any finite set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all possible subsets of $C$. There are $2^m$ such subsets, so $\operatorname{Growth}(H,m) = 2^m$.

3. The domain is the Euclidean space $\mathbb{R}^n$. The set-family $H$ contains all the half-spaces of the form $x\cdot\phi\geq 1$, where $\phi$ is a fixed vector. Then $\operatorname{Growth}(H,m) = \operatorname{Comp}(n,m)$, where Comp is the number of components in a partitioning of an $n$-dimensional space by $m$ hyperplanes.

4. The domain is the real line $\mathbb{R}$. The set-family $H$ contains all the real intervals, i.e., all sets of the form $\{x\in\mathbb{R} \mid x\in[x_0,x_1]\}$ for some $x_0,x_1\in\mathbb{R}$. For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all runs of between 0 and $m$ consecutive elements of $C$. The number of such runs is ${m+1\choose 2}+1$, so $\operatorname{Growth}(H,m) = {m+1\choose 2}+1$.
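The counts in examples 1 and 4 can be reproduced numerically. The sketch below assumes the sample points are distinct and relies on the observation that only thresholds lying below, between, or above the sorted points can produce different subsets. The helper names are illustrative.

```python
from math import comb

def cut_points(points):
    """One threshold below all points, one in each gap, one above all points."""
    pts = sorted(points)
    gaps = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return [pts[0] - 1] + gaps + [pts[-1] + 1]

def ray_trace(points):
    """Distinct subsets of `points` of the form {x in points : x > x0}."""
    return {frozenset(x for x in points if x > x0) for x0 in cut_points(points)}

def interval_trace(points):
    """Distinct subsets of `points` of the form {x in points : lo <= x <= hi}."""
    cuts = cut_points(points)
    return {frozenset(x for x in points if lo <= x <= hi)
            for lo in cuts for hi in cuts}

sample = [0.3, 1.7, 2.2, 5.0]   # any m distinct reals give the same counts
m = len(sample)
print(len(ray_trace(sample)), m + 1)                    # rays: m + 1
print(len(interval_trace(sample)), comb(m + 1, 2) + 1)  # intervals: C(m+1, 2) + 1
```

Because every set of $m$ distinct reals yields the same number of subsets, the count for one sample is already the maximum over all samples of that size.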

Polynomial or exponential

The main property that makes the growth function interesting is that it can be either polynomial or exponential - nothing in-between.

The following is a property of the intersection-size:

If, for some set $C_m$ of size $m$, and for some number $n\leq m$, $|H\cap C_m| \geq \operatorname{Comp}(n,m)$, then there exists a subset $C_n\subseteq C_m$ of size $n$ such that $|H\cap C_n| = 2^n$.

This implies the following property of the growth function. For every family $H$ there are two cases:

The exponential case: $\operatorname{Growth}(H,m) = 2^m$ identically.

The polynomial case: $\operatorname{Growth}(H,m)$ is majorized by $\operatorname{Comp}(n,m)\leq m^n+1$, where $n$ is the smallest integer for which $\operatorname{Growth}(H,n) < 2^n$.

Other properties

Trivial upper bound

For any finite $H$:

$\operatorname{Growth}(H,m)\leq |H|$

since for every $C$, the number of elements in $H\cap C$ is at most $|H|$. Therefore, the growth function is mainly interesting when $H$ is infinite.

Exponential upper bound

For any nonempty $H$:

$\operatorname{Growth}(H,m)\leq 2^m$

I.e., the growth function has an exponential upper bound.

We say that a set-family $H$ shatters a set $C$ if their intersection contains all possible subsets of $C$, i.e. $H\cap C = 2^C$. If $H$ shatters some $C$ of size $m$, then $\operatorname{Growth}(H,m) = 2^m$, which is the upper bound.
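Shattering can be tested directly from this definition. The sketch below uses a hypothetical finite stand-in for the interval family, namely all runs of consecutive integers inside {1, 2, 3}; the function name `shatters` is illustrative.

```python
def shatters(family, c):
    """True if the family's intersection with c contains all 2^|c| subsets of c."""
    c = frozenset(c)
    return len({frozenset(h) & c for h in family}) == 2 ** len(c)

# All runs of consecutive integers within {1, 2, 3}, including the empty run.
intervals = [set(range(a, b)) for a in range(1, 5) for b in range(a, 5)]

print(shatters(intervals, {1, 3}))     # every subset of {1, 3} is cut out by some run
print(shatters(intervals, {1, 2, 3}))  # {1, 3} is not a run, so this set is not shattered
```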

Cartesian intersection

Define the Cartesian intersection of two set-families as:

$H_1\otimes H_2 := \{h_1\cap h_2 \mid h_1\in H_1, h_2\in H_2\}$

Then:^[1]

$\operatorname{Growth}(H_1\otimes H_2,m)\leq\operatorname{Growth}(H_1,m)\cdot\operatorname{Growth}(H_2,m)$

Union

For every two set-families:^[1]

$\operatorname{Growth}(H_1\cup H_2,m)\leq\operatorname{Growth}(H_1,m)+\operatorname{Growth}(H_2,m)$
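Both inequalities can be checked numerically on a small finite domain. In the sketch below the two families ("upper rays" and "lower rays" on {0, ..., 4}) are hypothetical examples, and `growth` is a brute-force helper that is only feasible for tiny domains.

```python
from itertools import combinations

def growth(family, domain, m):
    """Brute-force Growth(H, m) over a finite domain."""
    return max(len({frozenset(h) & frozenset(c) for h in family})
               for c in combinations(domain, m))

domain = range(5)
upper = [frozenset(x for x in domain if x >= t) for t in range(6)]      # H1
lower = [frozenset(x for x in domain if x <= t) for t in range(-1, 5)]  # H2

cartesian = [a & b for a in upper for b in lower]  # H1 ⊗ H2: all integer intervals
union = upper + lower                              # H1 ∪ H2

for m in range(1, 6):
    g1, g2 = growth(upper, domain, m), growth(lower, domain, m)
    g_cart, g_union = growth(cartesian, domain, m), growth(union, domain, m)
    print(m, g_cart, "<=", g1 * g2, "|", g_union, "<=", g1 + g2)
```

In this example neither bound is attained with equality, which shows that the inequalities can be loose.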

VC dimension

The VC dimension of $H$ is defined according to these two cases:

In the polynomial case, $\operatorname{VCDim}(H) = n-1$, which is the largest integer $d$ for which $\operatorname{Growth}(H,d) = 2^d$.

In the exponential case, $\operatorname{VCDim}(H) = \infty$.

So $\operatorname{VCDim}(H)\geq d$ if-and-only-if $\operatorname{Growth}(H,d) = 2^d$.

The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether $\operatorname{Growth}(H,d)$ is equal to or smaller than $2^d$, while the growth function tells us exactly how $\operatorname{Growth}(H,m)$ changes as a function of $m$.

Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:

If $\operatorname{VCDim}(H) = d$, then for all $m$:

$\operatorname{Growth}(H,m)\leq\sum_{i=0}^{d}{m\choose i}$

In particular, for all $m > d+1$:

$\operatorname{Growth}(H,m)\leq (em/d)^d = O(m^d)$

so when the VC dimension is finite, the growth function grows polynomially with $m$.

This upper bound is tight, i.e., for all $m > d$ there exists $H$ with VC dimension $d$ such that:^[1]

$\operatorname{Growth}(H,m) = \sum_{i=0}^{d}{m\choose i}$
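The two bounds of the lemma are easy to tabulate. The sketch below uses illustrative function names and compares the binomial sum with the looser $(em/d)^d$ bound. It also checks the interval family from the examples, whose VC dimension is 2 because its growth function equals $2^m$ at $m=2$ and falls below it at $m=3$.

```python
from math import comb, e

def sauer_bound(m, d):
    """Sauer-Shelah bound: sum of C(m, i) for i = 0..d."""
    return sum(comb(m, i) for i in range(d + 1))

def simple_bound(m, d):
    """Looser closed-form bound (e*m/d)^d, stated for m > d + 1."""
    return (e * m / d) ** d

d = 2
for m in (4, 10, 100, 1000):
    print(m, sauer_bound(m, d), round(simple_bound(m, d), 1), 2 ** m)

# For intervals on the real line, C(m+1, 2) + 1 equals the Sauer-Shelah sum with d = 2,
# so this family attains the bound.
print(all(comb(m + 1, 2) + 1 == sauer_bound(m, 2) for m in range(1, 50)))
```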

Entropy

While the growth function is related to the maximum intersection-size, the entropy is related to the average intersection-size:

$\operatorname{Entropy}(H,m) = E_{|C_m|=m}\big[\log_2(|H\cap C_m|)\big]$

The intersection-size has the following property. For every set-family $H$:

$|H\cap(C_1\cup C_2)|\leq |H\cap C_1|\cdot|H\cap C_2|$

Hence:

$\operatorname{Entropy}(H,m_1+m_2)\leq\operatorname{Entropy}(H,m_1)+\operatorname{Entropy}(H,m_2)$

Moreover, the sequence $\operatorname{Entropy}(H,m)/m$ converges to a constant $c\in[0,1]$ when $m\to\infty$.

Moreover, the random variable $\log_2(|H\cap C_m|)/m$ is concentrated near $c$.
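As a hedged illustration of these limits, consider two of the families from the Examples section, under the added assumption that the $m$ points of $C_m$ are drawn independently from a continuous distribution, so that they are distinct with probability 1. The intersection-size is then the same for every sample, and the entropy has a closed form. The labels $H_{\text{rays}}$ and $H_{\text{open}}$ are introduced here only for this example.

```latex
% Rays on the real line: |H \cap C_m| = m+1 for every sample of m distinct points.
\operatorname{Entropy}(H_{\text{rays}}, m) = \log_2(m+1),
\qquad
\frac{\log_2(m+1)}{m} \to 0 \quad (c = 0).

% Subadditivity: (m_1+1)(m_2+1) = m_1 m_2 + m_1 + m_2 + 1 \geq m_1 + m_2 + 1, hence
\log_2(m_1+m_2+1) \leq \log_2(m_1+1) + \log_2(m_2+1).

% Open sets in [0,1]: |H \cap C_m| = 2^m for every sample of m distinct points.
\operatorname{Entropy}(H_{\text{open}}, m) = \log_2(2^m) = m,
\qquad
\frac{m}{m} = 1 \quad (c = 1).
```

The two families sit at the two ends of the interval $[0,1]$ in which the constant $c$ must lie.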

Applications in probability theory

Let $\Omega$ be a set on which a probability measure $\Pr$ is defined. Let $H$ be a family of subsets of $\Omega$ (= a family of events).

Suppose we choose a set $C_m$ that contains $m$ elements of $\Omega$, where each element is chosen at random according to the probability measure $\Pr$, independently of the others (i.e., with replacements). For each event $h\in H$, we compare the following two quantities:

Its relative frequency in $C_m$, i.e., $|h\cap C_m|/m$;

Its probability $\Pr[h]$.

We are interested in the difference, $D(h,C_m) := \big||h\cap C_m|/m-\Pr[h]\big|$. This difference satisfies the following upper bound:

$\Pr\left[\forall h\in H: D(h,C_m) \leq \sqrt{8(\ln\operatorname{Growth}(H, 2m) + \ln(4/\delta)) \over m}\right] > 1 - \delta$

which is equivalent to:

$\Pr\big[\forall h\in H: D(h,C_m) \leq \varepsilon\big] > 1 - 4\cdot \operatorname{Growth}(H, 2m)\cdot \exp(-\varepsilon^2\cdot m/8)$

In words: the probability that for all events in $H$, the relative frequency is near the probability, is lower-bounded by an expression that depends on the growth function of $H$.

A corollary of this is that, if the growth function is polynomial in $m$ (i.e., there exists some $n$ such that $\operatorname{Growth}(H,m)\leq m^n+1$), then the above probability approaches 1 as $m\to\infty$. I.e., the family $H$ enjoys uniform convergence in probability.
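To see how the bound behaves in the polynomial case, the sketch below evaluates it for the family of rays from the examples, for which $\operatorname{Growth}(H,2m) = 2m+1$. The function names and the choice $\delta = 0.05$ are assumptions made for illustration.

```python
from math import exp, log, sqrt

def epsilon_bound(growth_2m, m, delta):
    """Deviation that holds for all events with probability greater than 1 - delta."""
    return sqrt(8 * (log(growth_2m) + log(4 / delta)) / m)

def failure_bound(growth_2m, m, eps):
    """Upper bound on the probability that some event deviates by more than eps."""
    return 4 * growth_2m * exp(-eps ** 2 * m / 8)

delta = 0.05
for m in (10**3, 10**4, 10**5, 10**6):
    g = 2 * m + 1                       # Growth(H, 2m) for rays on the real line
    eps = epsilon_bound(g, m, delta)
    # Plugging eps back into the second form recovers delta (the two forms are equivalent).
    print(m, round(eps, 4), round(failure_bound(g, m, eps), 4))
```

The deviation shrinks roughly like $\sqrt{\ln(m)/m}$, so it tends to 0 as $m$ grows. With an exponential growth function the $\ln\operatorname{Growth}(H,2m)$ term would grow linearly in $m$ and the bound would not shrink.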

Notes and References

, especially Section 3.2