Complexity index explained

In modern computer science and statistics, the complexity index of a function denotes its level of informational content, which in turn affects the difficulty of learning the function from examples. This is different from computational complexity, which is the difficulty of computing a function. Complexity indices characterize the entire class of functions to which the function of interest belongs. Focusing on Boolean functions, the detail of a class $C$ of Boolean functions $c$ essentially denotes how deeply the class is articulated.

Technical definition

To identify this index we must first define a sentry function of $C$. Let us focus for a moment on a single function $c$, called a concept, defined on a set $\mathfrak{X}$ of elements that we may picture as points in a Euclidean space. In this framework, the sentry function associates to $c$ a set of points that, being external to the concept, prevent it from expanding into another function of $C$. Dually, these points may be seen as guarding a given concept $c$ against being fully enclosed (invaded) by another concept within the class. We therefore call these points either sentinels or sentry points; they are assigned by the sentry function $\boldsymbol{S}$ to each concept of $C$ in such a way that:

the sentry points are external to the concept $c$ to be sentineled and internal to at least one other concept including it,
each concept $c'$ including $c$ has at least one of the sentry points of $c$ either in the gap between $c$ and $c'$, or outside $c'$ and distinct from the sentry points of $c'$, and
they constitute a minimal set with these properties.

The technical definition (see References) is rooted in the inclusion of an augmented concept $c^+$, made up of $c$ plus its sentry points, by another augmented concept $\left(c'\right)^+$ in the same class.

Definition of sentry function

For a concept class $C$ on a space $\mathfrak{X}$, a sentry function is a total function $\boldsymbol{S}: C\cup\{\emptyset,\mathfrak{X}\}\mapsto 2^{\mathfrak{X}}$ satisfying the following conditions:

(1) Sentinels are outside the sentineled concept: $c\cap\boldsymbol{S}(c)=\emptyset$ for all $c\in C$.

(2) Sentinels are inside the invading concept: having introduced the sets $c^+=c\cup\boldsymbol{S}(c)$, an invading concept $c'\in C$ is such that $c'\not\subseteq c$ and $c^+\subseteq\left(c'\right)^+$. Denoting by $\mathrm{up}(c)$ the set of concepts invading $c$, we must have that if $c_2\in\mathrm{up}(c_1)$, then $c_2\cap\boldsymbol{S}(c_1)\neq\emptyset$.

(3) $\boldsymbol{S}(c)$ is a minimal set with the above properties: no $\boldsymbol{S}'\neq\boldsymbol{S}$ exists satisfying (1) and (2) and having the property that $\boldsymbol{S}'(c)\subseteq\boldsymbol{S}(c)$ for every $c\in C$.

(4) Sentinels are honest guardians. It may be that $c\subseteq\left(c'\right)^+$ but $\boldsymbol{S}(c)\cap c'=\emptyset$, so that $c'\notin\mathrm{up}(c)$. This, however, must be a consequence of the fact that all points of $\boldsymbol{S}(c)$ are involved in really sentineling $c$ against other concepts in $\mathrm{up}(c)$, and not just in avoiding inclusion of $c^+$ by $\left(c'\right)^+$. Thus if we remove $c'$, $\boldsymbol{S}(c)$ remains unchanged: whenever $c_1$ and $c_2$ are such that $c_1\subset c_2\cup\boldsymbol{S}(c_2)$ and $c_2\cap\boldsymbol{S}(c_1)=\emptyset$, the restriction of $\boldsymbol{S}$ to $\{c_1\}\cup\mathrm{up}(c_1)-\{c_2\}$ is a sentry function on this set.

$\boldsymbol{S}(c)$ is the frontier of $c$ upon $\boldsymbol{S}$.
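Conditions (1) and (2) above can be checked mechanically when the concept class is finite. The following is a minimal Python sketch under stated assumptions: concepts are represented as `frozenset` objects and the sentry function as a dictionary. The helper names (`augmented`, `up`, `condition_1`, `condition_2`) and the three-concept chain used as data are hypothetical and do not come from the cited sources. Conditions (3) and (4), and the values of the sentry function on the empty set and on the whole space, are not covered.

```python
from typing import Dict, FrozenSet, Hashable, List, Sequence

Concept = FrozenSet[Hashable]
SentryFunction = Dict[Concept, Concept]  # maps each concept c to S(c)


def augmented(c: Concept, S: SentryFunction) -> Concept:
    """Augmented concept c+ = c ∪ S(c)."""
    return c | S[c]


def up(c: Concept, concepts: Sequence[Concept], S: SentryFunction) -> List[Concept]:
    """Concepts invading c: those c' not included in c with c+ ⊆ (c')+."""
    c_plus = augmented(c, S)
    return [o for o in concepts if not o <= c and c_plus <= augmented(o, S)]


def condition_1(concepts: Sequence[Concept], S: SentryFunction) -> bool:
    """Sentinels lie outside the concept they guard."""
    return all(not (c & S[c]) for c in concepts)


def condition_2(concepts: Sequence[Concept], S: SentryFunction) -> bool:
    """Every concept invading c contains at least one sentinel of c."""
    return all(bool(o & S[c]) for c in concepts for o in up(c, concepts, S))


# Hypothetical data: a chain of three nested concepts on the space {1, 2, 3}.
c_small, c_mid, c_big = frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})
chain = [c_small, c_mid, c_big]

# Candidate assignments of sentinels (assumptions made for illustration only).
good = {c_small: frozenset({2}), c_mid: frozenset({3}), c_big: frozenset()}
bad = {c_small: frozenset({3}), c_mid: frozenset({3}), c_big: frozenset()}

print(condition_1(chain, good), condition_2(chain, good))
print(condition_1(chain, bad), condition_2(chain, bad))
```

In the `bad` assignment the point 3 guards the smallest concept, but the middle concept invades it without containing 3, so condition (2) fails while condition (1) still holds.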

With reference to the picture on the right, $\{x_1,x_2,x_3\}$ is a candidate frontier of $c_0$ against $c_1,c_2,c_3,c_4$. All points are in the gap between a $c_i$ and $c_0$. They avoid inclusion of $c_0\cup\{x_1,x_2,x_3\}$ in $c_3$, provided that these points are not used by the latter for sentineling itself against other concepts. Vice versa, we expect that $c_1$ uses $x_1$ and $x_3$ as its own sentinels, $c_2$ uses $x_2$ and $x_3$, and $c_4$ uses $x_1$ and $x_2$ analogously. Point $x_4$ is not allowed as a sentry point of $c_0$ since, like any diplomatic seat, it should be located outside all other concepts just to ensure that it is not occupied in case of invasion by $c_0$.

Definition of detail

The frontier size of the most expensive concept to be sentineled with the least efficient sentineling function, i.e. the quantity $D_C=\sup_{\boldsymbol{S},c}\#\boldsymbol{S}(c)$, is called the detail of $C$. Here $\boldsymbol{S}$ also spans over sentry functions on subsets of $\mathfrak{X}$, sentineling in this case the intersections of the concepts with these subsets. Indeed, proper subsets of $\mathfrak{X}$ may host sentineling tasks that prove harder than those emerging with $\mathfrak{X}$ itself.

The detail $D_C$ is a complexity measure of concept classes dual to the VC dimension $D_{VC}$. The former uses points to separate sets of concepts, the latter uses concepts to partition sets of points. In particular, the following inequality holds: $D_C\leq D_{VC}+1$. See also Rademacher complexity for a recently introduced class complexity index.
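On a small finite class the VC dimension can be found by exhaustive search, which makes the inequality easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, the data are the four concepts of the discrete example given later in this article, and the value of the detail is the one stated there rather than computed by the code.

```python
from itertools import combinations


def is_shattered(points, concepts):
    """True if the concepts cut out every one of the 2^|points| subsets of `points`."""
    traces = {c & points for c in concepts}
    return len(traces) == 2 ** len(points)


def vc_dimension(space, concepts):
    """Size of the largest subset of `space` shattered by `concepts` (brute force)."""
    best = 0
    for k in range(1, len(space) + 1):
        if any(is_shattered(frozenset(sub), concepts) for sub in combinations(space, k)):
            best = k
    return best


# The four concepts of the discrete example, written as sets of points.
space = ["x1", "x2", "x3"]
concepts = [
    frozenset(),                    # c1
    frozenset({"x2", "x3"}),        # c2
    frozenset({"x1", "x3"}),        # c3
    frozenset({"x1", "x2", "x3"}),  # c4
]

d_vc = vc_dimension(space, concepts)
d_c = 2  # detail as stated in the article for this class (assumed, not computed)

print(d_vc, d_c <= d_vc + 1)
```

The search is exponential in the size of the space, so it is only meant for toy classes such as this one.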

Example: continuous spaces

The class $C$ of circles in $\mathbb{R}^2$ has detail $D_C=2$, as shown in the picture on the left below. Similarly for the class of segments on $\mathbb{R}$, as shown in the picture on the right.

Example: discrete spaces

Consider the class $C=\{c_1,c_2,c_3,c_4\}$ on $\mathfrak{X}=\{x_1,x_2,x_3\}$, whose concepts are illustrated in the following scheme, where "+" denotes an element $x_j$ belonging to $c_i$, "-" an element outside $c_i$, and ⃝ a sentry point:

	$x_1$	$x_2$	$x_3$
$c_1=$	-⃝	-⃝	-
$c_2=$	-⃝	+	+
$c_3=$	+	-⃝	+
$c_4=$	+	+	+

This class has $D_C=2$. As usual, we may have different sentineling functions. A worst case, as illustrated, is $\boldsymbol{S}(c_1)=\{x_1,x_2\}$, $\boldsymbol{S}(c_2)=\{x_1\}$, $\boldsymbol{S}(c_3)=\{x_2\}$, $\boldsymbol{S}(c_4)=\emptyset$. However, a cheaper one is $\boldsymbol{S}(c_1)=\{x_3\}$, $\boldsymbol{S}(c_2)=\{x_1\}$, $\boldsymbol{S}(c_3)=\{x_2\}$, $\boldsymbol{S}(c_4)=\emptyset$:

	$x_1$	$x_2$	$x_3$
$c_1=$	-	-	-⃝
$c_2=$	-⃝	+	+
$c_3=$	+	-⃝	+
$c_4=$	+	+	+
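The two sentry functions listed above can be encoded directly to compare their frontier sizes. This is a hedged sketch rather than a reference implementation: the helper names are hypothetical, only conditions (1) and (2) of the definition are checked, and minimality and the honesty condition are left out.

```python
x1, x2, x3 = "x1", "x2", "x3"

# Concepts read off the scheme: "+" entries are the points belonging to each concept.
c1 = frozenset()
c2 = frozenset({x2, x3})
c3 = frozenset({x1, x3})
c4 = frozenset({x1, x2, x3})
concepts = [c1, c2, c3, c4]

# The two sentry functions given in the text.
worst = {c1: frozenset({x1, x2}), c2: frozenset({x1}), c3: frozenset({x2}), c4: frozenset()}
cheap = {c1: frozenset({x3}), c2: frozenset({x1}), c3: frozenset({x2}), c4: frozenset()}


def satisfies_1_and_2(concepts, S):
    """Check that sentinels are outside c and inside every concept invading c."""
    for c in concepts:
        if c & S[c]:  # condition (1) violated: a sentinel lies inside c
            return False
        c_plus = c | S[c]
        for other in concepts:
            invades = not other <= c and c_plus <= (other | S[other])
            if invades and not (other & S[c]):  # condition (2) violated
                return False
    return True


def largest_frontier(S):
    """Largest number of sentry points assigned to a single concept."""
    return max(len(points) for points in S.values())


print(satisfies_1_and_2(concepts, worst), largest_frontier(worst))
print(satisfies_1_and_2(concepts, cheap), largest_frontier(cheap))
```

With these data the first assignment has a largest frontier of two points and the second of one, in line with the text's description of a worst case and a cheaper alternative.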

References

Apolloni, B.; Malchiodi, D.; Gaito, S. (2006). Algorithmic Inference in Machine Learning. International Series on Advanced Intelligence. Vol. 5 (2nd ed.). Adelaide: Magill, Advanced Knowledge International.
Apolloni, B.; Chiaravalli, S. (1997). "PAC learning of concept classes through the boundaries of their items". Theoretical Computer Science. 172 (1–2): 91–120. doi:10.1016/S0304-3975(95)00240-5.