Small-bias sample space explained

In theoretical computer science, a small-bias sample space (also known as \epsilon-biased sample space, \epsilon-biased generator, or small-bias probability space) is a probability distribution that fools parity functions. In other words, no parity function can distinguish between a small-bias sample space and the uniform distribution with high probability, and hence, small-bias sample spaces naturally give rise to pseudorandom generators for parity functions.

The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are derandomization, error-correcting codes, and probabilistically checkable proofs. The connection with error-correcting codes is in fact very strong, since \epsilon-biased sample spaces are equivalent to \epsilon-balanced error-correcting codes.

Definition

Bias

Let X be a probability distribution over \{0,1\}^n. The bias of X with respect to a set of indices I \subseteq \{1,\ldots,n\} is defined as[1]

\operatorname{bias}_I(X) = \left| \Pr_{x\sim X}\left(\sum_{i\in I} x_i = 0\right) - \Pr_{x\sim X}\left(\sum_{i\in I} x_i = 1\right) \right| = \left| 2\Pr_{x\sim X}\left(\sum_{i\in I} x_i = 0\right) - 1 \right| ,

where the sum is taken over F_2, the finite field with two elements. In other words, the sum \sum_{i\in I} x_i equals 0 if the number of ones in the sample x \in \{0,1\}^n at the positions defined by I is even, and otherwise, the sum equals 1. For I = \emptyset, the empty sum is defined to be zero, and hence \operatorname{bias}_\emptyset(X) = 1.
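For small examples, the bias can be computed directly from a multiset of samples. The following Python sketch assumes the distribution is uniform over a given list of 0/1 tuples and uses 0-based indices; the function name bias_of is ours, chosen for illustration.

def bias_of(samples, I):
    # Return |Pr[sum_{i in I} x_i = 0] - Pr[sum_{i in I} x_i = 1]| for the
    # uniform distribution over the multiset `samples`; the sum is over F_2.
    if not samples:
        raise ValueError("need at least one sample")
    prob_zero = sum(1 for x in samples if sum(x[i] for i in I) % 2 == 0) / len(samples)
    return abs(2 * prob_zero - 1)

# The uniform distribution over {0,1}^2 has bias 0 for every non-empty I,
# and bias 1 for I = {} because the empty sum is always 0.
uniform = [(0, 0), (0, 1), (1, 0), (1, 1)]
assert bias_of(uniform, [0]) == 0.0
assert bias_of(uniform, [0, 1]) == 0.0
assert bias_of(uniform, []) == 1.0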

ϵ-biased sample space

A probability distribution X over \{0,1\}^n is called an \epsilon-biased sample space if \operatorname{bias}_I(X) \leq \epsilon holds for all non-empty subsets I \subseteq \{1,2,\ldots,n\}.

ϵ-biased set

An \epsilon-biased sample space X that is generated by picking a uniform element from a multiset X \subseteq \{0,1\}^n is called an \epsilon-biased set. The size s of an \epsilon-biased set X is the size of the multiset that generates the sample space.
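Whether a given multiset is \epsilon-biased can be verified directly from the definition by checking every non-empty index set. The following Python sketch does this by brute force; the function name is ours, and the running time is exponential in n, so it is only meant for tiny examples.

from itertools import combinations

def is_eps_biased(X, eps):
    # X is a multiset of n-bit tuples; check bias_I(X) <= eps for all non-empty I.
    n = len(X[0])
    for size in range(1, n + 1):
        for I in combinations(range(n), size):
            zeros = sum(1 for x in X if sum(x[i] for i in I) % 2 == 0)
            if abs(2 * zeros / len(X) - 1) > eps:
                return False
    return True

# The full cube {0,1}^2 is 0-biased; two copies of (1, 1) are not even 0.5-biased.
assert is_eps_biased([(0, 0), (0, 1), (1, 0), (1, 1)], 0.0)
assert not is_eps_biased([(1, 1), (1, 1)], 0.5)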

ϵ-biased generator

An \epsilon-biased generator G : \{0,1\}^\ell \to \{0,1\}^n is a function that maps strings of length \ell to strings of length n such that the multiset X_G = \{G(y) \mid y \in \{0,1\}^\ell\} is an \epsilon-biased set. The seed length of the generator is the number \ell and is related to the size of the \epsilon-biased set X_G via the equation s = 2^\ell.
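The equation s = 2^\ell simply reflects that the multiset X_G is obtained by running the generator on every possible seed. A minimal Python sketch of this bookkeeping, using a toy placeholder generator that is not a genuine small-bias generator:

from itertools import product

def multiset_of(G, ell):
    # X_G = [G(y) for all y in {0,1}^ell]; its size is s = 2**ell.
    return [G(y) for y in product((0, 1), repeat=ell)]

# Toy placeholder: maps 2-bit seeds to 3-bit strings (illustration only).
toy_G = lambda y: (y[0], y[1], (y[0] + y[1]) % 2)
X_G = multiset_of(toy_G, 2)
assert len(X_G) == 2 ** 2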

Connection with epsilon-balanced error-correcting codes

There is a close connection between \epsilon-biased sets and \epsilon-balanced linear error-correcting codes. A linear code C : \{0,1\}^n \to \{0,1\}^s of message length n and block length s is \epsilon-balanced if the Hamming weight of every nonzero codeword C(x) is between (\tfrac{1}{2}-\epsilon)s and (\tfrac{1}{2}+\epsilon)s. Since C is a linear code, its generator matrix is an (n \times s)-matrix A over F_2 with C(x) = x \cdot A.

Then it holds that a multiset X \subseteq \{0,1\}^n is \epsilon-biased if and only if the linear code C_X, whose columns are exactly the elements of X, is \epsilon-balanced.[2]
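For small parameters this equivalence can be checked directly: interpret the elements of X as the columns of the generator matrix A and verify that every nonzero codeword x \cdot A has relative Hamming weight within \epsilon of 1/2. A minimal Python sketch (brute force over all 2^n messages; the function name is ours):

from itertools import product

def is_eps_balanced_code(X, eps):
    # X is a list of s columns, each an n-bit tuple, forming the (n x s) matrix A.
    s, n = len(X), len(X[0])
    for msg in product((0, 1), repeat=n):
        if not any(msg):
            continue  # the zero message gives the zero codeword, which is excluded
        # Bit j of the codeword is the F_2 inner product of msg with column j.
        weight = sum(sum(m * col[i] for i, m in enumerate(msg)) % 2 for col in X)
        if not (0.5 - eps) * s <= weight <= (0.5 + eps) * s:
            return False
    return True

# The full cube {0,1}^2, read as four columns, yields a 0-balanced code.
assert is_eps_balanced_code([(0, 0), (0, 1), (1, 0), (1, 1)], 0.0)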

Constructions of small epsilon-biased sets

Usually the goal is to find \epsilon-biased sets that have a small size s relative to the parameters n and \epsilon. This is because a smaller size s means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.

Theoretical bounds

The probabilistic method gives a non-explicit construction that achieves size s = O(n/\epsilon^2). The construction is non-explicit in the sense that finding the \epsilon-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness. However, this non-explicit construction is useful because it shows that these efficient codes exist. On the other hand, the best known lower bound for the size of \epsilon-biased sets is s = \Omega(n/(\epsilon^2 \log(1/\epsilon))), that is, in order for a set to be \epsilon-biased, it must be at least that big.
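A sketch of the probabilistic argument in Python: picking s = c \cdot n/\epsilon^2 uniformly random strings yields, with high probability (by a Chernoff bound for each parity and a union bound over all 2^n - 1 parities), an \epsilon-biased set; certifying a particular sample, however, would require checking exponentially many parities. The constant c = 4 below is an illustrative choice, not taken from the text.

import random

def random_candidate_set(n, eps, c=4):
    # Draw s = c*n/eps^2 uniformly random n-bit strings; with high probability
    # (for a suitable constant c) the resulting multiset is eps-biased.
    s = int(c * n / eps ** 2)
    return [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(s)]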

Explicit constructions

There are many explicit, i.e., deterministic constructions of \epsilon-biased sets with various parameter settings:

- s = n/\operatorname{poly}(\epsilon). The construction makes use of Justesen codes (which is a concatenation of Reed–Solomon codes with the Wozencraft ensemble) as well as expander walk sampling.
- s = O\left(\frac{n}{\epsilon \log(n/\epsilon)}\right)^2. One such construction is the concatenation of Reed–Solomon codes with the Hadamard code; this concatenation turns out to be an \epsilon-balanced code, which gives rise to an \epsilon-biased sample space via the connection mentioned above (a sketch of a construction of this flavor appears after this list).
- An \epsilon-balanced code with s = O\left(\frac{n}{\epsilon^3 \log(1/\epsilon)}\right).
- s = O\left(\frac{n}{\epsilon^2 \log(1/\epsilon)}\right)^{5/4}.
- s = O\left(\frac{n}{\epsilon^{2+o(1)}}\right), which is almost optimal because of the lower bound.

These bounds are mutually incomparable. In particular, none of these constructions yields the smallest \epsilon-biased sets for all settings of \epsilon and n.
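As an illustration of an explicit construction of this flavor, the following Python sketch implements the well-known "powering" construction of Alon, Goldreich, Håstad and Peralta, which is closely related to the Reed–Solomon/Hadamard concatenation mentioned in the list above: the sample indexed by a seed (x, y) in F_{2^m} \times F_{2^m} has i-th bit equal to the F_2 inner product of the bit representations of x^i and y, giving a set of size 2^{2m} with bias at most (n-1)/2^m. The choice m = 8 and the representation of F_{2^8} by the irreducible polynomial x^8 + x^4 + x^3 + x + 1 are ours, made for illustration.

M = 8
MODULUS = 0x11B  # the irreducible polynomial x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    # Multiplication in F_{2^M}: carryless multiplication reduced mod MODULUS.
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return result

def inner_product_f2(a, b):
    # F_2 inner product of the bit representations of a and b.
    return bin(a & b).count("1") % 2

def powering_sample(x, y, n):
    # The n-bit sample indexed by the seed (x, y): bit i equals <x^i, y>.
    bits, power = [], 1  # power runs through x^0, x^1, ..., x^(n-1)
    for _ in range(n):
        bits.append(inner_product_f2(power, y))
        power = gf_mul(power, x)
    return tuple(bits)

# Enumerating all 2**(2*M) seeds gives an eps-biased set with eps <= (n-1)/2**M,
# e.g. bias at most 19/256 for n = 20.
X = [powering_sample(x, y, 20) for x in range(2 ** M) for y in range(2 ** M)]
assert len(X) == 2 ** (2 * M)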

Application: almost k-wise independence

An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.

k-wise independent spaces

A random variable Y over \{0,1\}^n is a k-wise independent space if, for all index sets I \subseteq \{1,\ldots,n\} of size k, the marginal distribution Y|_I is exactly equal to the uniform distribution over \{0,1\}^k. That is, for all such I and all strings z \in \{0,1\}^k, the distribution Y satisfies \Pr_Y(Y|_I = z) = 2^{-k}.

Constructions and bounds

k-wise independent spaces are fairly well understood: simple constructions, such as Joffe's construction below, give k-wise independent spaces of size n^k; better constructions achieve size roughly n^{k/2}; and a lower bound of roughly n^{k/2} shows that this is essentially optimal.

Joffe's construction

Joffe's construction gives a k-wise independent space Y over the finite field with some prime number n > k of elements, i.e., Y is a distribution over F_n^n. The initial k marginals of the distribution are drawn independently and uniformly at random:

(Y_0, \ldots, Y_{k-1}) \sim F_n^k.

For each i with k \leq i < n, the marginal distribution of Y_i is then defined as

Y_i = Y_0 + Y_1 \cdot i + Y_2 \cdot i^2 + \dots + Y_{k-1} \cdot i^{k-1},

where the calculation is done in F_n. Joffe proves that the distribution Y constructed in this way is k-wise independent as a distribution over F_n^n. The distribution Y is uniform on its support, and hence, the support of Y forms a k-wise independent set. It contains all n^k strings in F_n^k that have been extended to strings of length n using the deterministic rule above.
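A minimal Python sketch of this construction, drawing the first k coordinates uniformly at random and extending them by polynomial evaluation modulo the prime n (function name ours):

import random

def joffe_sample(n, k):
    # One sample of the k-wise independent distribution Y over F_n^n,
    # for a prime n > k: the first k coordinates are uniform, and
    # Y_i = Y_0 + Y_1*i + ... + Y_{k-1}*i^(k-1) mod n for k <= i < n.
    coeffs = [random.randrange(n) for _ in range(k)]  # Y_0, ..., Y_{k-1}
    sample = list(coeffs)
    for i in range(k, n):
        sample.append(sum(c * pow(i, j, n) for j, c in enumerate(coeffs)) % n)
    return sample

# Example with the prime n = 7 and k = 3: a string of length 7 over F_7;
# the support of the distribution has size 7**3.
print(joffe_sample(7, 3))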

Almost k-wise independent spaces

A random variable Y over \{0,1\}^n is a \delta-almost k-wise independent space if, for all index sets I \subseteq \{1,\ldots,n\} of size k, the restricted distribution Y|_I and the uniform distribution U_k on \{0,1\}^k are \delta-close in 1-norm, i.e., \|Y|_I - U_k\|_1 \leq \delta.
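For tiny parameters, \delta-almost k-wise independence of an empirical distribution can be checked directly from this definition by computing the 1-norm distance of every restricted marginal from uniform. A minimal Python sketch (function name ours; exponential running time):

from collections import Counter
from itertools import combinations, product

def is_almost_kwise(samples, k, delta):
    # samples: multiset of n-bit tuples, taken with uniform weight.
    n = len(samples[0])
    for I in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in I) for x in samples)
        dist = sum(abs(counts.get(z, 0) / len(samples) - 2 ** -k)
                   for z in product((0, 1), repeat=k))
        if dist > delta:
            return False
    return True

# The uniform multiset over {0,1}^3 is 0-almost 2-wise independent.
assert is_almost_kwise(list(product((0, 1), repeat=3)), 2, 0.0)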

Constructions

There is a general framework for combining small k-wise independent spaces with small \epsilon-biased spaces to obtain \delta-almost k-wise independent spaces of even smaller size. In particular, let G_1 : \{0,1\}^h \to \{0,1\}^n be a linear mapping that generates a k-wise independent space and let G_2 : \{0,1\}^\ell \to \{0,1\}^h be a generator of an \epsilon-biased set over \{0,1\}^h. That is, when given a uniformly random input, the output of G_1 is a k-wise independent space, and the output of G_2 is \epsilon-biased. Then G : \{0,1\}^\ell \to \{0,1\}^n with G(x) = G_1(G_2(x)) is a generator of a \delta-almost k-wise independent space, where \delta = 2^{k/2}\epsilon.[3]

As mentioned above, one can construct a generator G_1 with h = \tfrac{k}{2}\log n, and a generator G_2 with \ell = \log s = \log h + O(\log(\epsilon^{-1})). Hence, the concatenation G of G_1 and G_2 has seed length \ell = \log k + \log\log n + O(\log(\epsilon^{-1})). In order for G to yield a \delta-almost k-wise independent space, we need to set \epsilon = \delta 2^{-k/2}, which leads to a seed length of \ell = \log\log n + O(k + \log(\delta^{-1})) and a sample space of total size 2^\ell \leq \log n \cdot \operatorname{poly}(2^k \delta^{-1}).
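A minimal Python sketch of the composition itself; the two inner generators are placeholders standing in for any concrete constructions with the stated properties, and the comment records the seed-length bookkeeping from the text:

def compose(g1, g2):
    # Return the generator G with G(x) = g1(g2(x)); g1 should map h-bit
    # strings to n-bit strings (linear, k-wise independent) and g2 should
    # map ell-bit seeds to h-bit strings (epsilon-biased).
    return lambda seed: g1(g2(seed))

# Bookkeeping from the text: with h = (k/2)*log2(n) and
# epsilon = delta * 2**(-k/2), the seed length of G is
# ell = log2(log2(n)) + O(k + log2(1/delta)).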

References

Notes and References

  1. cf., e.g.,
  2. cf., e.g., p. 2 of
  3. Section 4 in