Concept class explained

In computational learning theory in mathematics, a concept over a domain X is a total Boolean function over X. A concept class is a class of concepts. Concept classes are a subject of computational learning theory.

Concept class terminology frequently appears in model theory associated with probably approximately correct (PAC) learning.[1] In this setting, if one takes a set Y as a set of (classifier output) labels, and X is a set of examples, the map

c:X\toY

, i.e. from examples to classifier labels (where

Y=\{0,1\}

and where c is a subset of X), c is then said to be a concept. A concept class

C

is then a collection of such concepts.

Given a class of concepts C, a subclass D is reachable if there exists a sample s such that D contains exactly those concepts in C that are extensions to s. Not every subclass is reachable.[2]

Background

A sample

s

is a partial function from

X

to

\{0,1\}

. Identifying a concept with its characteristic function mapping

X

to

\{0,1\}

, it is a special case of a sample.

Two samples are consistent if they agree on the intersection of their domains. A sample

s'

extends another sample

s

if the two are consistent and the domain of

s

is contained in the domain of

s'

.

Examples

Suppose that

C=S+(X)

. Then:

\{\{x\}\}

is reachable with the sample

s=\{(x,1)\}

;

S+(Y)

for

Y\subseteqX

are reachable with a sample that maps the elements of

X-Y

to zero;

S(X)

, which consists of the singleton sets, is not reachable.

Applications

Let

C

be some concept class. For any concept

c\inC

, we call this concept

1/d

-good for a positive integer

d

if, for all

x\inX

, at least

1/d

of the concepts in

C

agree with

c

on the classification of

x

. The fingerprint dimension

FD(C)

of the entire concept class

C

is the least positive integer

d

such that every reachable subclass

C'\subseteqC

contains a concept that is

1/d

-good for it. This quantity can be used to bound the minimum number of equivalence queries needed to learn a class of concepts according to the following inequality:FD(C) - 1\leq \#EQ(C)\leq \lceil FD(C)\ln(|C|)\rceil.

Notes and References

  1. https://arxiv.org/abs/1801.06566 Chase, H., & Freitag, J. (2018). Model Theory and Machine Learning. arXiv preprint arXiv:1801.06566
  2. Angluin. D.. 2004. Queries revisited. Theoretical Computer Science. 313. 2. 188–191. 10.1016/j.tcs.2003.11.004 .