Small-bias sample space explained

In theoretical computer science, a small-bias sample space (also known as $\epsilon$-biased sample space, $\epsilon$-biased generator, or small-bias probability space) is a probability distribution that fools parity functions. In other words, no parity function can distinguish between a small-bias sample space and the uniform distribution with high probability, and hence, small-bias sample spaces naturally give rise to pseudorandom generators for parity functions.

The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are derandomization, error-correcting codes, and probabilistically checkable proofs. The connection with error-correcting codes is in fact very strong since $\epsilon$-biased sample spaces are equivalent to $\epsilon$-balanced error-correcting codes.

Definition

Bias

Let $X$ be a probability distribution over $\{0,1\}^n$. The bias of $X$ with respect to a set of indices $I \subseteq \{1,\ldots,n\}$ is defined as^[1]

$$\text{bias}_I(X) = \left| \Pr_{x\sim X}\left(\sum_{i\in I} x_i = 0\right) - \Pr_{x\sim X}\left(\sum_{i\in I} x_i = 1\right) \right| = \left| 2 \cdot \Pr_{x\sim X}\left(\sum_{i\in I} x_i = 0\right) - 1 \right|,$$

where the sum is taken over $\mathbb{F}_2$, the finite field with two elements. In other words, the sum $\sum_{i\in I} x_i$ equals $0$ if the number of ones in the sample $x \in \{0,1\}^n$ at the positions defined by $I$ is even, and otherwise, the sum equals $1$. For $I = \emptyset$, the empty sum is defined to be zero, and hence $\text{bias}_\emptyset(X) = 1$.
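The definition can be evaluated directly for a distribution that is uniform over a small list of samples. The following is a minimal sketch under that assumption; the function name `bias`, the 0-based indexing, and the sample list are illustrative choices, not part of the definition.

```python
from fractions import Fraction

def bias(samples, index_set):
    """Bias of the uniform distribution over `samples` w.r.t. `index_set`.

    `samples` is a list of equal-length 0/1 tuples (a multiset);
    `index_set` holds 0-based positions (the text uses 1-based indices).
    """
    # Count the samples whose selected bits sum to 0 over F_2 (even parity).
    even = sum(1 for x in samples if sum(x[i] for i in index_set) % 2 == 0)
    return abs(2 * Fraction(even, len(samples)) - 1)

# Illustrative multiset over {0,1}^3.
samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]

print(bias(samples, {0}))        # parity of the first bit alone
print(bias(samples, {0, 1, 2}))  # parity of all three bits
print(bias(samples, set()))      # empty index set: the empty sum is always 0
```

Exact fractions are used so that comparisons against a threshold $\epsilon$ are not affected by floating-point rounding.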

ϵ-biased sample space

A probability distribution $X$ over $\{0,1\}^n$ is called an $\epsilon$-biased sample space if $\text{bias}_I(X) \leq \epsilon$ holds for all non-empty subsets $I \subseteq \{1,2,\ldots,n\}$.

ϵ-biased set

An $\epsilon$-biased sample space $X$ that is generated by picking a uniform element from a multiset $X \subseteq \{0,1\}^n$ is called an $\epsilon$-biased set. The size $s$ of an $\epsilon$-biased set $X$ is the size of the multiset that generates the sample space.
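For small $n$, whether a given multiset is an $\epsilon$-biased set can be checked by brute force over all non-empty index sets. This is only a sketch: the helper names `max_bias` and `is_eps_biased_set` are hypothetical, and the running time is exponential in $n$.

```python
from fractions import Fraction
from itertools import combinations

def bias(samples, index_set):
    """Bias of the uniform distribution over `samples` for one index set."""
    even = sum(1 for x in samples if sum(x[i] for i in index_set) % 2 == 0)
    return abs(2 * Fraction(even, len(samples)) - 1)

def max_bias(samples):
    """Largest bias over all non-empty index sets (exponential in n)."""
    n = len(samples[0])
    return max(
        bias(samples, subset)
        for size in range(1, n + 1)
        for subset in combinations(range(n), size)
    )

def is_eps_biased_set(samples, eps):
    """True if the multiset `samples` is an eps-biased set."""
    return max_bias(samples) <= eps

# Illustrative multiset of size s = 3 over {0,1}^2.
three_points = [(0, 0), (0, 1), (1, 0)]
print(max_bias(three_points))
print(is_eps_biased_set(three_points, Fraction(1, 3)))
print(is_eps_biased_set(three_points, Fraction(1, 4)))
```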

ϵ-biased generator

An $\epsilon$-biased generator $G:\{0,1\}^\ell \to \{0,1\}^n$ is a function that maps strings of length $\ell$ to strings of length $n$ such that the multiset $X_G = \{G(y) \mid y \in \{0,1\}^\ell\}$ is an $\epsilon$-biased set. The seed length of the generator is the number $\ell$ and is related to the size $s$ of the $\epsilon$-biased set $X_G$ via the equation $s = 2^\ell$.
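The relation between a generator and its multiset $X_G$ can be illustrated with a toy example. The map `toy_generator` below is a hypothetical function chosen only because it is small enough to enumerate; it is not one of the constructions discussed in this article.

```python
from fractions import Fraction
from itertools import combinations, product

def toy_generator(seed):
    """Hypothetical G: {0,1}^2 -> {0,1}^3, chosen only for illustration."""
    y1, y2 = seed
    return (y1, y2, y1 & y2)

def generated_multiset(generator, seed_length):
    """X_G: the outputs G(y) for every seed y, listed with multiplicity."""
    return [generator(seed) for seed in product((0, 1), repeat=seed_length)]

def max_bias(samples):
    """Largest bias over all non-empty index sets (brute force)."""
    n = len(samples[0])
    worst = Fraction(0)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            even = sum(1 for x in samples if sum(x[i] for i in subset) % 2 == 0)
            worst = max(worst, abs(2 * Fraction(even, len(samples)) - 1))
    return worst

seed_length = 2
x_g = generated_multiset(toy_generator, seed_length)
print(len(x_g) == 2 ** seed_length)  # the size s equals 2^ell
print(max_bias(x_g))                 # the smallest eps for which G is eps-biased
```

Here the seed has 2 bits and the output has 3 bits, so the multiset has $s = 4$ elements rather than the $8$ strings of $\{0,1\}^3$.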

Connection with epsilon-balanced error-correcting codes

There is a close connection between $\epsilon$-biased sets and $\epsilon$-balanced linear error-correcting codes. A linear code $C:\{0,1\}^n \to \{0,1\}^s$ of message length $n$ and block length $s$ is $\epsilon$-balanced if the Hamming weight of every nonzero codeword $C(x)$ is between $(\frac{1}{2} - \epsilon)s$ and $(\frac{1}{2} + \epsilon)s$. Since $C$ is a linear code, its generator matrix is an $(n \times s)$-matrix $A$ over $\mathbb{F}_2$ with $C(x) = x \cdot A$.

Then it holds that a multiset $X \subset \{0,1\}^n$ is $\epsilon$-biased if and only if the linear code $C_X$, whose generator matrix has exactly the elements of $X$ as its columns, is $\epsilon$-balanced.^[2]
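The reason behind this connection is that the parity of a sample over an index set $I$ is one bit of the codeword of the indicator vector of $I$. With the definitions used in this article, this gives $\text{bias}_I(X) = |1 - 2w/s|$, where $w$ is the Hamming weight of that codeword; the exact constant relating the two $\epsilon$ parameters depends on how bias and balance are normalized. The sketch below checks this identity on a toy multiset; the names `encode` and `identity_holds` are illustrative.

```python
from fractions import Fraction
from itertools import product

def encode(message, columns):
    """Codeword message * A over F_2, where the columns of A are the elements of X."""
    return [sum(m * c for m, c in zip(message, column)) % 2 for column in columns]

def bias(samples, index_set):
    """Bias of the uniform distribution over `samples` for one index set."""
    even = sum(1 for x in samples if sum(x[i] for i in index_set) % 2 == 0)
    return abs(2 * Fraction(even, len(samples)) - 1)

# Toy multiset with n = 3 and s = 4 (illustrative only).
X = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
s = len(X)

def identity_holds():
    """Check bias_I(X) == |1 - 2 * weight / s| for every nonzero message."""
    for message in product((0, 1), repeat=3):
        if not any(message):
            continue  # the all-zero message corresponds to the empty index set
        index_set = [i for i, bit in enumerate(message) if bit]
        weight = sum(encode(message, X))
        if bias(X, index_set) != abs(1 - 2 * Fraction(weight, s)):
            return False
    return True

print(identity_holds())
```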

Constructions of small epsilon-biased sets

Usually the goal is to find $\epsilon$-biased sets that have a small size $s$ relative to the parameters $n$ and $\epsilon$. This is because a smaller size $s$ means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.

Theoretical bounds

The probabilistic method gives a non-explicit construction that achieves size $s = O(n/\epsilon^2)$. The construction is non-explicit in the sense that finding the $\epsilon$-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness. However, this non-explicit construction is useful because it shows that these efficient codes exist. On the other hand, the best known lower bound for the size of $\epsilon$-biased sets is $s = \Omega(n/(\epsilon^2 \log(1/\epsilon)))$, that is, in order for a set to be $\epsilon$-biased, it must be at least that big.
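The counting behind the probabilistic upper bound can be spelled out as follows. This is a standard Hoeffding-plus-union-bound sketch, not necessarily the argument of any particular reference; the variables $Z_j$ and the explicit constant $2\ln 2$ are introduced here for illustration.

```latex
% Draw x_1, ..., x_s independently and uniformly from {0,1}^n.
% For a fixed non-empty I, let Z_j = (-1)^{\sum_{i \in I} (x_j)_i},
% which is a uniform \pm 1 random variable.
\begin{align*}
\mathrm{bias}_I(X) &= \Bigl|\tfrac{1}{s}\sum_{j=1}^{s} Z_j\Bigr|, \\
\Pr\bigl[\mathrm{bias}_I(X) > \epsilon\bigr]
  &\le 2\exp\bigl(-\epsilon^2 s/2\bigr)
  && \text{(Hoeffding's inequality)}, \\
\Pr\bigl[\exists\, I \neq \emptyset : \mathrm{bias}_I(X) > \epsilon\bigr]
  &\le 2^{n} \cdot 2\exp\bigl(-\epsilon^2 s/2\bigr) < 1
  && \text{whenever } s > \frac{2(n+1)\ln 2}{\epsilon^2}.
\end{align*}
```

So a random multiset of size $s = O(n/\epsilon^2)$ is $\epsilon$-biased with positive probability, which is all the existence statement needs.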

Explicit constructions

There are many explicit, i.e., deterministic, constructions of $\epsilon$-biased sets with various parameter settings:

One construction achieves $s = \frac{n}{\text{poly}(\epsilon)}$. The construction makes use of Justesen codes (which are a concatenation of Reed–Solomon codes with the Wozencraft ensemble) as well as expander walk sampling.

Another construction achieves $s = O\left(\frac{n}{\epsilon \log(n/\epsilon)}\right)^2$. One of the constructions in that line of work is the concatenation of Reed–Solomon codes with the Hadamard code; this concatenation turns out to be an $\epsilon$-balanced code, which gives rise to an $\epsilon$-biased sample space via the connection mentioned above.

Concatenating algebraic geometric codes with the Hadamard code gives an $\epsilon$-balanced code with $s = O\left(\frac{n}{\epsilon^3 \log(1/\epsilon)}\right)$.

Ben-Aroya and Ta-Shma (2009) achieve $s = O\left(\frac{n}{\epsilon^2 \log(1/\epsilon)}\right)^{5/4}$.

Another construction achieves $s = O\left(\frac{n}{\epsilon^{2+o(1)}}\right)$, which is almost optimal because of the lower bound.

These bounds are mutually incomparable. In particular, none of these constructions yields the smallest $\epsilon$-biased sets for all settings of $\epsilon$ and $n$.
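The Reed–Solomon and Hadamard concatenation mentioned above can be sketched on a toy instance. Everything concrete here is an assumption made for illustration: the field $GF(2^3)$ with modulus $x^3 + x + 1$, messages of $d = 2$ field symbols, and all function names. Each sample is indexed by a pair $(a, b)$ of field elements, and the parity over an index set equals the inner product $\langle p(a), b \rangle$ for a nonzero polynomial $p$ of degree less than $d$, so the bias is at most $(d-1)/2^m$.

```python
from fractions import Fraction
from itertools import combinations

M = 3               # work in GF(2^M), so q = 8 (illustrative choice)
MODULUS = 0b1011    # x^3 + x + 1, irreducible over GF(2)
D = 2               # number of Reed-Solomon message symbols
Q = 1 << M
N = M * D           # length n of each sample string

def gf_mul(a, b):
    """Multiply two elements of GF(2^M), represented as M-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & Q:           # reduce modulo the field polynomial
            a ^= MODULUS
    return result

def gf_pow(a, e):
    """a raised to the e-th power in GF(2^M), with a^0 = 1."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def inner(u, v):
    """Inner product mod 2 of the bit vectors of u and v."""
    return bin(u & v).count("1") % 2

def sample_point(a, b):
    """Sample indexed by (a, b): coordinate (j, t) is <a^j * 2^t, b>."""
    return tuple(inner(gf_mul(gf_pow(a, j), 1 << t), b)
                 for j in range(D) for t in range(M))

def max_bias(samples):
    """Largest bias over all non-empty index sets (brute force)."""
    n = len(samples[0])
    worst = Fraction(0)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            even = sum(1 for x in samples if sum(x[i] for i in subset) % 2 == 0)
            worst = max(worst, abs(2 * Fraction(even, len(samples)) - 1))
    return worst

samples = [sample_point(a, b) for a in range(Q) for b in range(Q)]
print(len(samples), N)      # s = q^2 samples of length n = M * D
print(max_bias(samples))    # at most (D - 1) / Q for this toy instance
```

In this toy instance the set has 64 elements, the same as $\{0,1\}^6$ itself, so nothing is saved; the sketch only illustrates the mechanism behind the bias bound.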

Application: almost k-wise independence

An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.

k-wise independent spaces

A random variable $Y$ over $\{0,1\}^n$ is a k-wise independent space if, for all index sets $I \subseteq \{1,\ldots,n\}$ of size $k$, the marginal distribution $Y|_I$ is exactly equal to the uniform distribution over $\{0,1\}^k$. That is, for all such $I$ and all strings $z \in \{0,1\}^k$, the distribution $Y$ satisfies $\Pr_Y(Y|_I = z) = 2^{-k}$.
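For a distribution that is uniform over a small multiset, this definition can be checked by enumeration. The sketch below uses a hypothetical helper `is_k_wise_independent` and, as an example, the four strings of length 3 whose third bit is the XOR of the first two.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def is_k_wise_independent(samples, k):
    """Check that every k coordinates of a uniform sample are jointly uniform."""
    n = len(samples[0])
    target = Fraction(1, 2 ** k)
    for index_set in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in index_set) for x in samples)
        # Every one of the 2^k patterns must appear with probability 2^-k.
        if len(counts) != 2 ** k:
            return False
        if any(Fraction(c, len(samples)) != target for c in counts.values()):
            return False
    return True

# Third bit is the XOR of the first two bits.
xor_space = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_k_wise_independent(xor_space, 2))
print(is_k_wise_independent(xor_space, 3))
```

The example is pairwise independent although it contains only 4 of the 8 strings in $\{0,1\}^3$, and it is not 3-wise independent.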

Constructions and bounds

k-wise independent spaces are fairly well understood.

A simple construction by Joffe achieves size $n^k$.

Another construction yields a k-wise independent space whose size is $n^{k/2}$.

Chor et al. (1985) prove that no k-wise independent space can be significantly smaller than $n^{k/2}$.

Joffe's construction

Joffe constructs a $k$-wise independent space $Y$ over the finite field with some prime number $n > k$ of elements, i.e., $Y$ is a distribution over $\mathbb{F}_n^n$. The initial $k$ marginals of the distribution are drawn independently and uniformly at random:

$$(Y_0, \ldots, Y_{k-1}) \sim \mathbb{F}_n^k.$$

For each $i$ with $k \leq i < n$, the marginal distribution of $Y_i$ is then defined as

$$Y_i = Y_0 + Y_1 \cdot i + Y_2 \cdot i^2 + \cdots + Y_{k-1} \cdot i^{k-1},$$

where the calculation is done in $\mathbb{F}_n$. Joffe proves that the distribution $Y$ constructed in this way is $k$-wise independent as a distribution over $\mathbb{F}_n^n$. The distribution $Y$ is uniform on its support, and hence, the support of $Y$ forms a $k$-wise independent set. It contains all $n^k$ strings in $\mathbb{F}_n^k$ that have been extended to strings of length $n$ using the deterministic rule above.
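The extension rule above can be written out and checked by enumeration for one small parameter choice. This is a sketch that follows the rule exactly as stated in this section, with $n = 5$ and $k = 2$ picked only for illustration; the function names are hypothetical, and the check makes no claim beyond the parameters it enumerates.

```python
from collections import Counter
from itertools import combinations, product

def joffe_sample(seed, n):
    """Extend seed (Y_0, ..., Y_{k-1}) in F_n^k to n symbols via the rule in the text."""
    k = len(seed)
    extension = [
        sum(y * pow(i, j, n) for j, y in enumerate(seed)) % n
        for i in range(k, n)
    ]
    return tuple(seed) + tuple(extension)

def joffe_space(n, k):
    """All n^k seeds, each extended to a string of length n."""
    return [joffe_sample(seed, n) for seed in product(range(n), repeat=k)]

def is_k_wise_uniform(samples, n, k):
    """Check that every k coordinates are jointly uniform over F_n^k."""
    expected = len(samples) // n ** k
    for index_set in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in index_set) for x in samples)
        if len(counts) != n ** k or set(counts.values()) != {expected}:
            return False
    return True

space = joffe_space(n=5, k=2)
print(len(space))                        # n^k strings of length n
print(is_k_wise_uniform(space, 5, 2))    # pairwise check for this instance only
```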

Almost k-wise independent spaces

A random variable $Y$ over $\{0,1\}^n$ is a $\delta$-almost k-wise independent space if, for all index sets $I \subseteq \{1,\ldots,n\}$ of size $k$, the restricted distribution $Y|_I$ and the uniform distribution $U_k$ on $\{0,1\}^k$ are $\delta$-close in 1-norm, i.e., $\|Y|_I - U_k\|_1 \leq \delta$.
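For a uniform distribution over a small multiset, the smallest admissible $\delta$ can be computed by taking the worst 1-norm distance over all index sets of size $k$. The helper name `almost_independence_delta` and both example multisets are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def almost_independence_delta(samples, k):
    """Smallest delta such that `samples` is delta-almost k-wise independent."""
    n = len(samples[0])
    uniform = Fraction(1, 2 ** k)
    worst = Fraction(0)
    for index_set in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in index_set) for x in samples)
        # 1-norm distance between the restricted distribution and U_k.
        distance = sum(
            abs(Fraction(counts[z], len(samples)) - uniform)
            for z in product((0, 1), repeat=k)
        )
        worst = max(worst, distance)
    return worst

print(almost_independence_delta([(0, 0, 0), (1, 1, 1)], 2))
print(almost_independence_delta(
    [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)], 2))
```

A value of $0$ means the space is exactly k-wise independent, which is the case for the second example.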

Constructions

There is a general framework for combining small k-wise independent spaces with small $\epsilon$-biased spaces to obtain $\delta$-almost k-wise independent spaces of even smaller size. In particular, let $G_1:\{0,1\}^h \to \{0,1\}^n$ be a linear mapping that generates a k-wise independent space and let $G_2:\{0,1\}^\ell \to \{0,1\}^h$ be a generator of an $\epsilon$-biased set over $\{0,1\}^h$. That is, when given a uniformly random input, the output of $G_1$ is a k-wise independent space, and the output of $G_2$ is $\epsilon$-biased. Then $G:\{0,1\}^\ell \to \{0,1\}^n$ with $G(x) = G_1(G_2(x))$ is a generator of a $\delta$-almost $k$-wise independent space, where $\delta = 2^{k/2}\epsilon$.^[3]
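The bound $\delta = 2^{k/2}\epsilon$ can be checked numerically on a toy composition. All concrete choices below are assumptions for illustration: `g1` is the linear map $(x_1, x_2) \mapsto (x_1, x_2, x_1 \oplus x_2)$, whose output on uniform input is pairwise independent, and a three-element multiset stands in for the outputs of $G_2$ (its size is not a power of two, so it is an $\epsilon$-biased set rather than a seeded generator).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def g1(x):
    """Linear map {0,1}^2 -> {0,1}^3; pairwise independent on uniform input."""
    return (x[0], x[1], x[0] ^ x[1])

def max_bias(samples):
    """Largest bias over all non-empty index sets (brute force)."""
    n = len(samples[0])
    worst = Fraction(0)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            even = sum(1 for x in samples if sum(x[i] for i in subset) % 2 == 0)
            worst = max(worst, abs(2 * Fraction(even, len(samples)) - 1))
    return worst

def almost_independence_delta(samples, k):
    """Worst 1-norm distance to uniform over all index sets of size k."""
    n = len(samples[0])
    uniform = Fraction(1, 2 ** k)
    worst = Fraction(0)
    for index_set in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in index_set) for x in samples)
        distance = sum(abs(Fraction(counts[z], len(samples)) - uniform)
                       for z in product((0, 1), repeat=k))
        worst = max(worst, distance)
    return worst

k = 2
biased_set = [(0, 0), (0, 1), (1, 0)]     # stands in for the outputs of G_2
eps = max_bias(biased_set)
composed = [g1(x) for x in biased_set]    # outputs of G_1 applied to G_2's outputs
delta = almost_independence_delta(composed, k)
bound = 2 ** (k / 2) * eps

print(eps, delta, delta <= bound)
```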

As mentioned above, there is a generator $G_1$ with $h = \tfrac{k}{2}\log n$, and there is a generator $G_2$ with $\ell = \log s = \log h + O(\log(\epsilon^{-1}))$. Hence, the concatenation $G$ of $G_1$ and $G_2$ has seed length $\ell = \log k + \log\log n + O(\log(\epsilon^{-1}))$. In order for $G$ to yield a $\delta$-almost k-wise independent space, we need to set $\epsilon = \delta 2^{-k/2}$, which leads to a seed length of $\ell = \log\log n + O(k + \log(\delta^{-1}))$ and a sample space of total size $2^\ell \leq \log n \cdot \text{poly}(2^k \cdot \delta^{-1})$.
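The substitution behind the final seed length can be written out step by step. This is a sketch of the arithmetic only, using the parameters stated in this section; constants hidden in the $O(\cdot)$ terms are not tracked.

```latex
\begin{align*}
\ell &= \log k + \log\log n + O\bigl(\log(\epsilon^{-1})\bigr) \\
     &= \log k + \log\log n + O\bigl(\log(2^{k/2}\,\delta^{-1})\bigr)
        && \text{since } \epsilon = \delta\, 2^{-k/2} \\
     &= \log k + \log\log n + O\bigl(k + \log(\delta^{-1})\bigr) \\
     &= \log\log n + O\bigl(k + \log(\delta^{-1})\bigr)
        && \text{since } \log k \le k, \\
2^{\ell} &\le \log n \cdot \mathrm{poly}\bigl(2^{k} \cdot \delta^{-1}\bigr).
\end{align*}
```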

References

- Ben-Aroya, Avraham; Ta-Shma, Amnon (2009). "Constructing Small-Bias Sets from Algebraic-Geometric Codes". 2009 50th Annual IEEE Symposium on Foundations of Computer Science. pp. 191–197. doi:10.1109/FOCS.2009.44. ISBN 978-1-4244-5116-6. CiteSeerX 10.1.1.149.9273.
- Chor, Benny; Goldreich, Oded; Håstad, Johan; Freidmann, Joel; Rudich, Steven; Smolensky, Roman (1985). "The bit extraction problem or t-resilient functions". 26th Annual Symposium on Foundations of Computer Science (SFCS 1985). pp. 396–407. doi:10.1109/SFCS.1985.55. ISBN 978-0-8186-0644-1. CiteSeerX 10.1.1.39.6768. S2CID 6968065.

Notes and References

cf., e.g.,
cf., e.g., p. 2 of
Section 4 in