Covering number explained

In mathematics, a covering number is the number of balls of a given size needed to completely cover a given space, with possible overlaps between the balls. The covering number quantifies the size of a set and can be applied to general metric spaces. Two related concepts are the packing number, the number of disjoint balls that fit in a space, and the metric entropy, the number of points that fit in a space when constrained to lie at some fixed minimum distance apart.

Definition

Let (M, d) be a metric space, let K be a subset of M, and let r be a positive real number. Let Br(x) denote the ball of radius r centered at x. A subset C of M is an r-external covering of K if:

K\subseteqcupxBr(x)

.In other words, for every

y\inK

there exists

x\inC

such that

d(x,y)\leqr

.

If furthermore C is a subset of K, then it is an r-internal covering.

The external covering number of K, denoted

ext
N
r(K)
, is the minimum cardinality of any external covering of K. The internal covering number, denoted
int
N
r(K)
, is the minimum cardinality of any internal covering.

A subset P of K is a packing if

P\subseteqK

and the set

\{Br(x)\}x

is pairwise disjoint. The packing number of K, denoted
pack
N
r(K)
, is the maximum cardinality of any packing of K.

A subset S of K is r-separated if each pair of points x and y in S satisfies d(x, y) ≥ r. The metric entropy of K, denoted

met
N
r(K)
, is the maximum cardinality of any r-separated subset of K.

Properties

The following properties relate to covering numbers in the standard Euclidean space,

Rm

:

Application to machine learning

Let

K

be a space of real-valued functions, with the l-infinity metric (see example 3 above).Suppose all functions in

K

are bounded by a real constant

M

.Then, the covering number can be used to bound the generalization errorof learning functions from

K

,relative to the squared loss:

\operatorname{Prob}\left[ \suph\in\vertGeneralizationError(h)-EmpiricalError(h)\vert\geq\epsilon \right]\leq

int
N
r

(K)2\exp{-m\epsilon2\over2M4}

where

r={\epsilon\over8M}

and

m

is the number of samples.

See also