In machine learning, automatic basis function construction (or basis discovery) is the mathematical method of looking for a set of task-independent basis functions that map the state space to a lower-dimensional embedding, while still representing the value function accurately. Automatic basis construction is independent of prior knowledge of the domain, which allows it to perform well where expert-constructed basis functions are difficult or impossible to create.
In reinforcement learning (RL), most real-world Markov Decision Process (MDP) problems have large or continuous state spaces, which typically require some sort of approximation to be represented efficiently.
Linear function approximators[1] (LFAs) are widely adopted for their low theoretical complexity. Two sub-problems need to be solved for better approximation: weight optimization and basis construction. One way to solve the second problem is to design special basis functions by hand. Such basis functions work well in specific tasks but are restricted to their domains. Thus, constructing basis functions automatically is preferred for broader applications.
A Markov decision process with a finite state space and a fixed policy is defined by the 5-tuple (S, A, P, \gamma, r), where S = \{1, 2, \ldots, s\} is the finite state space, A is the action set, P is the transition probability matrix, \gamma \in [0,1) is the discount factor, and r is the reward vector. The Bellman equation is defined as:

v = r + \gamma P v.
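As a concrete illustration, the sketch below solves this linear system for a hypothetical two-state MDP; the matrices and the discount factor are invented for the example.

    import numpy as np

    # Hypothetical 2-state MDP under a fixed policy (numbers invented).
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])   # row-stochastic transition matrix
    r = np.array([1.0, 0.0])     # reward vector
    gamma = 0.95                 # discount factor

    # Bellman equation v = r + gamma*P*v  <=>  (I - gamma*P) v = r
    v = np.linalg.solve(np.eye(2) - gamma * P, r)
    print(v)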
When the number of elements in S is small, v can be represented as a vector with one component per state. When S is large or infinite, v is approximated by a linear combination of basis functions \Phi = \{\phi_1, \phi_2, \ldots, \phi_n\}:

v \approx \hat{v} = \sum_{i=1}^{n} \theta_i \phi_i = \Phi\theta.

Here \Phi is an |S| \times n matrix whose columns are the basis functions, \theta is a weight vector with n components, and n \ll |S|.
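To make the notation concrete, here is a minimal sketch assuming a hypothetical four-state chain MDP and an arbitrary two-column basis; the weight vector \theta is fit by least squares so that \Phi\theta approximates v.

    import numpy as np

    # Hypothetical 4-state chain MDP (all numbers invented).
    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    r = np.array([0.0, 0.0, 0.0, 1.0])
    gamma = 0.9

    v = np.linalg.solve(np.eye(4) - gamma * P, r)  # exact value function

    # Basis matrix Phi (|S| x n, here n = 2): a constant and a linear feature.
    Phi = np.column_stack([np.ones(4), np.arange(4)])

    # Weight optimization: least-squares projection of v onto span(Phi).
    theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    v_hat = Phi @ theta                            # approximate value function
    print(np.linalg.norm(v - v_hat))               # approximation error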
Basis construction looks for ways to construct better basis functions \Phi automatically. A good construction method should produce a compact basis (small n) that still represents the value function accurately.
In the proto-value function approach, Mahadevan analyzes the connectivity graph between states to determine a set of basis functions.
The normalized graph Laplacian is defined as:

L = I - D^{-1/2} W D^{-1/2}.
Here W is the adjacency matrix of the undirected graph (N, E) formed by the states of the fixed-policy MDP, and D is the diagonal matrix of node degrees.
In a discrete state space, the adjacency matrix W can be constructed by connecting states that are joined by one-step transitions; the basis functions are then the eigenvectors of L with the smallest eigenvalues, which are the smoothest functions over the graph.
This spectral framework can be used for value function approximation (VFA). Given the fixed policy, the edge weights are determined by the transition probabilities between the corresponding states. To obtain smooth value approximations, diffusion wavelets are used.[3]
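A minimal sketch of this construction, assuming a hypothetical undirected chain graph over four states: form the normalized Laplacian and take the eigenvectors with the smallest eigenvalues (the smoothest functions on the graph) as basis functions.

    import numpy as np

    # Hypothetical adjacency matrix W of an undirected 4-state chain graph.
    W = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

    deg = W.sum(axis=1)                      # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))

    # Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(4) - D_inv_sqrt @ W @ D_inv_sqrt

    # Eigenvectors with the smallest eigenvalues are the smoothest over
    # the graph; keep the first n as proto-value basis functions.
    eigvals, eigvecs = np.linalg.eigh(L)
    n = 2
    Phi = eigvecs[:, :n]                     # |S| x n basis matrix
    print(eigvals[:n])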
Krylov basis construction uses the actual transition matrix instead of the random-walk Laplacian. This method assumes that the transition model P and the reward r are available.
The vectors in the Neumann series are denoted y_i = P^i r for i \in [0, \infty). It can be shown that the Krylov space spanned by y_0, y_1, \ldots, y_{m-1} is enough to represent any value function, where m is the degree of the minimal polynomial of I - \gamma P.
Suppose the minimal polynomial takes the form

p(A) = \frac{1}{\alpha_0} \sum_{i=0}^{m-1} \alpha_{i+1} A^i,

with p(A) A = I, so that B = p(A) is the inverse of A = I - \gamma P. The value function then satisfies

v = B r = \frac{1}{\alpha_0} \sum_{i=0}^{m-1} \alpha_{i+1} (I - \gamma P)^i r = \sum_{i=0}^{m-1} \beta_i y_i,

where the coefficients \beta_i collect the binomial expansion of each power (I - \gamma P)^i in terms of the vectors y_i; hence v lies in the span of the first m Krylov vectors.
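The sketch below (reusing a hypothetical chain MDP with invented numbers) builds the Krylov vectors y_i = P^i r and checks numerically that the exact value function lies in their span.

    import numpy as np

    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    r = np.array([0.0, 0.0, 0.0, 1.0])
    gamma = 0.9

    # Krylov (Neumann series) vectors y_i = P^i r, i = 0, ..., m-1.
    m = 4
    Y = np.column_stack([np.linalg.matrix_power(P, i) @ r for i in range(m)])

    # Exact value function and its least-squares coordinates in the basis.
    v = np.linalg.solve(np.eye(4) - gamma * P, r)
    beta, *_ = np.linalg.lstsq(Y, v, rcond=None)
    print(np.linalg.norm(Y @ beta - v))      # ~0: v lies in the Krylov span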
Algorithm Augmented Krylov Method[5]

    z_1, z_2, \ldots, z_k : the k augmenting vectors (typically top eigenvectors of P)
    z_{k+1} := r
    for i := 1 : (l + k)
        if i > k + 1
            z_i := P z_{i-1}
        end if
        for j := 1 : (i - 1)
            z_i := z_i - <z_j, z_i> z_j
        end for
        if ||z_i|| ≈ 0
            break
        end if
        z_i := z_i / ||z_i||   (normalize, so the projections above are valid)
    end for
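A Python sketch of this pseudocode, assuming the augmenting vectors z_1, \ldots, z_k are top eigenvectors of P; the example data and the choice k = 1, l = 3 are invented.

    import numpy as np

    def augmented_krylov(P, r, k, l):
        # Orthonormal basis from the top-k eigenvectors of P augmented
        # with Krylov vectors of the reward r (sketch of the pseudocode).
        eigvals, eigvecs = np.linalg.eig(P)
        order = np.argsort(-eigvals.real)                  # descending eigenvalues
        Z = [eigvecs[:, order[j]].real for j in range(k)]  # z_1 .. z_k
        Z.append(r.astype(float))                          # z_{k+1} := r
        basis = []
        for i in range(k + l):
            z = Z[i] if i < len(Z) else P @ basis[-1]      # z_i := P z_{i-1}
            for q in basis:                                # Gram-Schmidt step
                z = z - (q @ z) * q
            norm = np.linalg.norm(z)
            if norm < 1e-10:                               # ||z_i|| ~ 0: stop
                break
            basis.append(z / norm)                         # normalize
        return np.column_stack(basis)

    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    r = np.array([0.0, 0.0, 0.0, 1.0])
    Phi = augmented_krylov(P, r, k=1, l=3)
    print(Phi.shape)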
The Bellman error, from which Bellman error basis functions (BEBFs) are built, is defined as:

\varepsilon = r + \gamma P \hat{v} - \hat{v} = r + \gamma P \Phi\theta - \Phi\theta.

Loosely speaking, the Bellman error points towards the optimal value function.[6] Each new BEBF is orthogonal to the space spanned by the current basis, so the sequence of BEBFs forms an orthogonal basis; with a sufficient number of BEBFs, any value function can be represented exactly.
Algorithm BEBF

    stage i = 1: \phi_1 = r
    stage i \in [2, N]:
        compute the weight vector \theta_i for the current basis \Phi_i
        compute the new Bellman error: \varepsilon = r + \gamma P \Phi_i \theta_i - \Phi_i \theta_i
        add the Bellman error to form a new basis function: \Phi_{i+1} = [\Phi_i : \varepsilon]
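A minimal sketch of this loop on invented data; here \theta_i is computed from the projected (LSTD-style) fixed point \Phi^T (r + \gamma P \Phi\theta - \Phi\theta) = 0, one reasonable choice since the algorithm leaves the weight optimizer open. With this choice, each Bellman error is orthogonal to the current basis.

    import numpy as np

    def bebf_basis(P, r, gamma, N):
        # Grow a basis by repeatedly appending the Bellman error.
        Phi = r.reshape(-1, 1).astype(float)       # stage 1: phi_1 = r
        for _ in range(2, N + 1):
            # theta_i from (Phi^T Phi - gamma Phi^T P Phi) theta = Phi^T r.
            A = Phi.T @ Phi - gamma * Phi.T @ P @ Phi
            theta = np.linalg.solve(A, Phi.T @ r)
            # Bellman error of the current approximation:
            eps = r + gamma * P @ Phi @ theta - Phi @ theta
            if np.linalg.norm(eps) < 1e-10:        # value function is exact
                break
            Phi = np.column_stack([Phi, eps])      # Phi_{i+1} = [Phi_i : eps]
        return Phi

    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    r = np.array([0.0, 0.0, 0.0, 1.0])
    print(bebf_basis(P, r, gamma=0.9, N=4).shape)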
Bellman Average Reward Bases (BARBs)[7] are similar to Krylov bases, but the reward function is dilated by the average-adjusted transition matrix P - P^*, where P^* is the limiting (average) transition matrix of P. BARBs converge faster than BEBFs and Krylov bases when \gamma is close to 1.
Algorithm BARBs

    stage i = 1: \phi_1 = P^* r
    stage i \in [2, N]:
        compute the weight vector \theta_i for the current basis \Phi_i
        compute the new basis function: \phi_{i+1} = r - P^* r + P \Phi_i \theta_i - \Phi_i \theta_i
        add it to the basis: \Phi_{i+1} = [\Phi_i : \phi_{i+1}]
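A sketch of BARBs on invented data; P^* is computed from the stationary distribution (valid for an ergodic chain), and \theta_i is fit by a plain least-squares choice, since the weight-optimization step is left open.

    import numpy as np

    def barbs_basis(P, r, N):
        # Limiting matrix P*: for an ergodic chain, P* = 1 pi^T with pi
        # the stationary distribution of P.
        eigvals, eigvecs = np.linalg.eig(P.T)
        pi = eigvecs[:, np.argmax(eigvals.real)].real
        pi = pi / pi.sum()
        P_star = np.outer(np.ones(len(r)), pi)

        Phi = (P_star @ r).reshape(-1, 1)          # stage 1: phi_1 = P* r
        for _ in range(2, N + 1):
            # theta_i: least-squares fit of (I - P) Phi theta ~ r - P* r.
            theta, *_ = np.linalg.lstsq((np.eye(len(r)) - P) @ Phi,
                                        r - P_star @ r, rcond=None)
            # New basis vector: phi = r - P* r + P Phi theta - Phi theta.
            phi_new = r - P_star @ r + P @ Phi @ theta - Phi @ theta
            if np.linalg.norm(phi_new) < 1e-10:
                break
            Phi = np.column_stack([Phi, phi_new])  # Phi_{i+1} = [Phi_i : phi]
        return Phi

    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    r = np.array([0.0, 0.0, 0.0, 1.0])
    print(barbs_basis(P, r, N=4).shape)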
There are two principal types of basis construction methods.
The first type is reward-sensitive, like Krylov bases and BEBFs; these methods dilate the reward function geometrically through the transition matrix. However, when the discount factor \gamma is close to 1, the underlying Neumann series converges slowly and many basis functions are needed; BARBs alleviate this by separating out the average reward.
The other type is reward-insensitive: proto-value basis functions derived from the graph Laplacian. This approach uses graph information, but the construction of the adjacency matrix makes the method hard to analyze.