Exponential tilting explained

Exponential tilting (ET), exponential twisting, or exponential change of measure (ECM) is a distribution-shifting technique used in many parts of mathematics. The different exponential tiltings of a random variable $X$ are known as the natural exponential family of $X$.

Exponential tilting is used in Monte Carlo estimation for rare-event simulation, and in rejection and importance sampling in particular. In mathematical finance,^[1] exponential tilting is also known as Esscher tilting (or the Esscher transform); it is often combined with indirect Edgeworth approximation and is used in such contexts as insurance futures pricing.^[2]

The earliest formalization of exponential tilting is often attributed to Esscher,^[3] with its use in importance sampling attributed to David Siegmund.^[4]

Overview

Given a random variable $X$ with probability distribution $P$, density $f$, and moment generating function (MGF) $M_X(\theta)=E[e^{\theta X}]<\infty$, the exponentially tilted measure $P_\theta$ is defined as follows:

$$P_\theta(X\in dx)=\frac{E[e^{\theta X}I[X\in dx]]}{M_X(\theta)}=e^{\theta x-\kappa(\theta)}P(X\in dx),$$

where $\kappa(\theta)$ is the cumulant generating function (CGF) defined as

$$\kappa(\theta)=\log E[e^{\theta X}]=\log M_X(\theta).$$

We call $P_\theta(X\in dx)=f_\theta(x)$ the $\theta$-tilted density of $X$. It satisfies $f_\theta(x)\propto e^{\theta x}f(x)$.

The exponential tilting of a random vector $X$ has an analogous definition:

$$P_\theta(X\in dx)=e^{\theta^Tx-\kappa(\theta)}P(X\in dx),$$

where $\kappa(\theta)=\log E[\exp\{\theta^TX\}]$.

Example

The exponentially tilted measure in many cases has the same parametric form as that of $f$. One-dimensional examples include the normal distribution, the exponential distribution, the binomial distribution and the Poisson distribution.

For example, in the case of the normal distribution $\mathcal{N}(\mu,\sigma^2)$, the tilted density $f_\theta(x)$ is the $\mathcal{N}(\mu+\theta\sigma^2,\sigma^2)$ density. The table below provides more examples of tilted densities.
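The normal case can be checked numerically. The following is a minimal sketch, not part of the original text: it multiplies the $\mathcal{N}(\mu,\sigma^2)$ density by $e^{\theta x-\kappa(\theta)}$, using the normal CGF $\kappa(\theta)=\mu\theta+\sigma^2\theta^2/2$, and compares the result with the $\mathcal{N}(\mu+\theta\sigma^2,\sigma^2)$ density. The function names and parameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cgf(theta, mu, sigma):
    """Cumulant generating function of N(mu, sigma^2)."""
    return mu * theta + 0.5 * sigma**2 * theta**2

def tilted_normal_pdf(x, theta, mu, sigma):
    """theta-tilted density e^{theta x - kappa(theta)} f(x)."""
    return math.exp(theta * x - normal_cgf(theta, mu, sigma)) * normal_pdf(x, mu, sigma)

mu, sigma, theta = 1.0, 2.0, 0.3   # illustrative values, not from the source
max_gap = max(
    abs(tilted_normal_pdf(x, theta, mu, sigma)
        - normal_pdf(x, mu + theta * sigma**2, sigma))
    for x in [-5.0, -1.0, 0.0, 2.2, 6.0]
)
print(max_gap)  # difference is at the level of floating-point rounding
```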

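| Original distribution^[5] ^[6] | θ-tilted distribution |
| --- | --- |
| $\mathrm{Gamma}(\alpha,\beta)$ | $\mathrm{Gamma}(\alpha,\beta-\theta)$ |
| $\mathrm{Binomial}(n,p)$ | $\mathrm{Binomial}\left(n,\frac{pe^\theta}{1-p+pe^\theta}\right)$ |
| $\mathrm{Poisson}(\lambda)$ | $\mathrm{Poisson}(\lambda e^\theta)$ |
| $\mathrm{Exponential}(\lambda)$ | $\mathrm{Exponential}(\lambda-\theta)$ |
| $\mathcal{N}(\mu,\sigma^2)$ | $\mathcal{N}(\mu+\theta\sigma^2,\sigma^2)$ |
| $\mathcal{N}(\mu,\Sigma)$ | $\mathcal{N}(\mu+\Sigma\theta,\Sigma)$ |
| $\chi^2(\kappa)$ | $\mathrm{Gamma}\left(\frac{\kappa}{2},\frac{2}{1-2\theta}\right)$ |

The Gamma and Exponential rows use the rate parametrization, while the $\chi^2$ row writes the tilted Gamma distribution with shape $\kappa/2$ and scale $2/(1-2\theta)$. In that row $\kappa$ denotes the degrees of freedom, not the CGF.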
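Two of the discrete rows can be verified directly from the relation $f_\theta(x)\propto e^{\theta x}f(x)$. The sketch below is an illustration under assumed parameter values: it reweights a probability mass function by $e^{\theta k}$, renormalises, and compares with the tabulated tilted family. The helper `tilt_pmf` is a hypothetical name, and the Poisson support is truncated at 80 terms, beyond which the mass is negligible for these parameters.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def tilt_pmf(pmf, theta, support):
    """Reweight pmf(k) by e^{theta k} on the given support and renormalise."""
    weights = [math.exp(theta * k) * pmf(k) for k in support]
    total = sum(weights)   # the MGF at theta (up to truncation of the support)
    return [w / total for w in weights]

theta = 0.5

# Poisson(lam) tilts to Poisson(lam * e^theta).
lam = 2.0
support = range(80)
tilted = tilt_pmf(lambda k: poisson_pmf(k, lam), theta, support)
poisson_gap = max(abs(t - poisson_pmf(k, lam * math.exp(theta)))
                  for k, t in zip(support, tilted))

# Binomial(n, p) tilts to Binomial(n, p e^theta / (1 - p + p e^theta)).
n, p = 10, 0.3
p_tilted = p * math.exp(theta) / (1.0 - p + p * math.exp(theta))
tilted = tilt_pmf(lambda k: binomial_pmf(k, n, p), theta, range(n + 1))
binomial_gap = max(abs(t - binomial_pmf(k, n, p_tilted))
                   for k, t in zip(range(n + 1), tilted))
```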

For some distributions, however, the exponentially tilted distribution does not belong to the same parametric family as $f$. An example of this is the Pareto distribution with $f(x)=\alpha/(1+x)^{\alpha+1},x>0$, where $f_\theta(x)$ is well defined for $\theta<0$ but is not a standard distribution. In such examples, the random variable generation may not always be straightforward.^[7]

In statistical mechanics, the energy of a system in equilibrium with a heat bath has the Boltzmann distribution: $P(E\in dE)\propto e^{-\beta E}dE$, where $\beta$ is the inverse temperature. Exponential tilting then corresponds to changing the temperature: $P_\theta(E\in dE)\propto e^{-(\beta-\theta)E}dE$.

Similarly, the energy and particle number of a system in equilibrium with a heat and particle bath has the grand canonical distribution: $P((N,E)\in(dN,dE))\propto e^{\beta\mu N-\beta E}dNdE$, where $\mu$ is the chemical potential. Exponential tilting then corresponds to changing both the temperature and the chemical potential.

Advantages

In many cases, the tilted distribution belongs to the same parametric family as the original. This is particularly true when the original density belongs to an exponential family of distributions. This simplifies random variable generation during Monte Carlo simulations. Exponential tilting may still be useful if this is not the case, though normalization must be possible and additional sampling algorithms may be needed.

In addition, there exists a simple relationship between the original and tilted CGF,

$$\kappa_\theta(\eta)=\log(E_\theta[e^{\eta X}])=\kappa(\theta+\eta)-\kappa(\theta).$$

We can see this by observing that

$$F_\theta(x)=\int_{-\infty}^x\exp\{\theta y-\kappa(\theta)\}f(y)dy.$$

Thus,

\begin{align} \kappa_\theta(\eta)&=\log\int e^{\eta x}dF_\theta(x)\\ &=\log\int e^{\eta x}e^{\theta x-\kappa(\theta)}dF(x)\\ &=\log E[e^{(\eta+\theta)X-\kappa(\theta)}]\\ &=\log(e^{\kappa(\eta+\theta)-\kappa(\theta)})\\ &=\kappa(\eta+\theta)-\kappa(\theta) \end{align}

This relationship allows for easy calculation of the CGF of the tilted distribution and thus the distribution's moments. Moreover, it results in a simple form of the likelihood ratio. Specifically,

$$\ell=\frac{dP}{dP_\theta}=\frac{f(x)}{f_\theta(x)}=e^{-\theta x+\kappa(\theta)}.$$
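Both identities can be checked in closed form for the exponential distribution. The sketch below assumes an Exponential($\lambda$) variable, whose CGF is $\log(\lambda/(\lambda-\theta))$ for $\theta<\lambda$ and whose tilt is Exponential($\lambda-\theta$); the parameter values and function names are illustrative only.

```python
import math

lam, theta, eta = 3.0, 1.0, 0.5   # illustrative; requires theta + eta < lam

def cgf_exponential(t, rate):
    """CGF of Exponential(rate), valid for t < rate."""
    return math.log(rate / (rate - t))

def pdf_exponential(x, rate):
    return rate * math.exp(-rate * x)

# CGF of the tilted law, computed directly from Exponential(lam - theta) ...
cgf_tilted_direct = cgf_exponential(eta, lam - theta)
# ... and through kappa(theta + eta) - kappa(theta).
cgf_tilted_formula = cgf_exponential(theta + eta, lam) - cgf_exponential(theta, lam)

# Likelihood ratio f(x) / f_theta(x) against exp(-theta x + kappa(theta)).
x = 0.7
ratio_direct = pdf_exponential(x, lam) / pdf_exponential(x, lam - theta)
ratio_formula = math.exp(-theta * x + cgf_exponential(theta, lam))
```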

Properties

If $\kappa(\eta)=\log E[\exp(\eta X)]$ is the CGF of $X$, then the CGF of the $\theta$-tilted $X$ is

$$\kappa_\theta(\eta)=\kappa(\theta+\eta)-\kappa(\theta).$$

This means that the $i$-th cumulant of the tilted $X$ is $\kappa^{(i)}(\theta)$. In particular, the expectation of the tilted distribution is

$$E_\theta[X]=\tfrac{d}{d\eta}\kappa_\theta(\eta)|_{\eta=0}=\kappa'(\theta).$$

The variance of the tilted distribution is

$$Var_\theta[X]=\tfrac{d^2}{d\eta^2}\kappa_\theta(\eta)|_{\eta=0}=\kappa''(\theta).$$
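As a rough numerical illustration of the mean and variance formulas, the sketch below differentiates the CGF of a Gamma distribution by central finite differences and compares the results with the moments of the tilted Gamma($\alpha,\beta-\theta$) law. The choice of distribution, the step size `h`, and the parameter values are assumptions made for the example.

```python
import math

alpha, beta, theta = 2.0, 3.0, 1.0   # Gamma(shape alpha, rate beta); requires theta < beta

def cgf_gamma(t):
    """CGF of Gamma(alpha, rate beta), valid for t < beta."""
    return -alpha * math.log(1.0 - t / beta)

h = 1e-3   # finite-difference step (an arbitrary small value)
tilted_mean = (cgf_gamma(theta + h) - cgf_gamma(theta - h)) / (2.0 * h)
tilted_var = (cgf_gamma(theta + h) - 2.0 * cgf_gamma(theta) + cgf_gamma(theta - h)) / h**2

# Moments of the tilted law Gamma(alpha, rate beta - theta) for comparison.
exact_mean = alpha / (beta - theta)
exact_var = alpha / (beta - theta) ** 2
```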

Repeated tilting is additive. That is, tilting first by $\theta_1$ and then $\theta_2$ is the same as tilting once by $\theta_1+\theta_2$.

If $X$ is the sum of independent, but not necessarily identical, random variables $X_1,X_2,\dots$, then the $\theta$-tilted distribution of $X$ is the distribution of the sum of $X_1,X_2,\dots$ each $\theta$-tilted individually.

Since $\log\tfrac{dP}{dP_\theta}=-\theta X+\kappa(\theta)$, the Kullback-Leibler divergence between the original distribution $P$ and the tilted distribution $P_\theta$ is

$$D_{KL}(P\parallel P_\theta)=E\left[\log\tfrac{P}{P_\theta}\right]=\kappa(\theta)-\theta E[X].$$

Similarly, since $E_\theta[X]=\kappa'(\theta)$, we have the Kullback-Leibler divergence as

$$D_{KL}(P_\theta\parallel P)=E_\theta\left[\log\tfrac{P_\theta}{P}\right]=\theta\kappa'(\theta)-\kappa(\theta).$$
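The two divergences can be compared with a direct summation in the Poisson case, where $\kappa(\theta)=\lambda(e^\theta-1)$ and the tilted law is Poisson($\lambda e^\theta$). This is a sketch under assumed parameter values; the support is truncated at 80 terms, which is harmless for the values chosen here.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def kl_divergence(p, q):
    """Sum of p_k log(p_k / q_k) over a common (truncated) support."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)

lam, theta = 2.0, 0.5                      # illustrative values
kappa = lam * (math.exp(theta) - 1.0)      # CGF of Poisson(lam) at theta
kappa_prime = lam * math.exp(theta)        # its derivative, the tilted mean

support = range(80)
p = [poisson_pmf(k, lam) for k in support]                     # original law
p_theta = [poisson_pmf(k, lam * math.exp(theta)) for k in support]   # tilted law

kl_p_ptheta = kl_divergence(p, p_theta)    # compare with kappa - theta * E[X]
kl_ptheta_p = kl_divergence(p_theta, p)    # compare with theta * kappa' - kappa
```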

Applications

Rare-event simulation

The exponential tilting of $X$, assuming it exists, supplies a family of distributions that can be used as proposal distributions for acceptance-rejection sampling or importance distributions for importance sampling. One common application is sampling from a distribution conditional on a sub-region of the domain, i.e. $X|X\in A$. With an appropriate choice of $\theta$, sampling from $P_\theta$ can meaningfully reduce the required amount of sampling or the variance of an estimator.

Saddlepoint approximation

The saddlepoint approximation method is a density approximation methodology often used for the distribution of sums and averages of independent, identically distributed random variables. It employs Edgeworth series but generally performs better at extreme values. From the definition of the natural exponential family, it follows that

$$f_\theta(\bar{x})=f(\bar{x})\exp\{n(\theta\bar{x}-\kappa(\theta))\}.$$

Applying the Edgeworth expansion for $f_\theta(\bar{x})$, we have

$$f_\theta(\bar{x})=\psi(z)(Var[\bar{X}])^{-1/2}\left\{1+\frac{\rho_3(\theta)h_3(z)}{6}+\frac{\rho_4(\theta)h_4(z)}{24}+\dots\right\},$$

where $\psi(z)$ is the standard normal density of

$$z=\frac{\bar{x}-\kappa_{\bar{x}}'(\theta)}{\sqrt{\kappa_{\bar{x}}''(\theta)}},$$

$\rho_n(\theta)=\kappa^{(n)}(\theta)/\kappa''(\theta)^{n/2}$ are the standardized cumulants, and $h_n$ are the Hermite polynomials.

When considering values of $\bar{x}$ progressively farther from the center of the distribution, $|z|\to\infty$ and the $h_n(z)$ terms become unbounded. However, for each value of $\bar{x}$, we can choose $\theta$ such that

$$\kappa'(\theta)=\bar{x}.$$

This value of $\theta$ is referred to as the saddle-point, and with it the above expansion is always evaluated at the expectation of the tilted distribution. This choice of $\theta$ leads to the final representation of the approximation given by

$$f(\bar{x})\approx\left(\frac{n}{2\pi\kappa''(\theta)}\right)^{1/2}\exp\{n(\kappa(\theta)-\theta\bar{x})\}.$$

^[8] ^[9]
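To see how the final formula behaves, the sketch below applies it to the mean of $n$ i.i.d. Exponential(1) variables, an example chosen here because the exact density (Gamma with shape $n$ and rate $n$) is available for comparison. For this distribution $\kappa(\theta)=-\log(1-\theta)$, so the saddle-point equation gives $\theta=1-1/\bar{x}$. The function names are hypothetical.

```python
import math

def saddlepoint_density(xbar, n):
    """Saddlepoint approximation to the density of the mean of n Exponential(1) draws."""
    theta = 1.0 - 1.0 / xbar             # solves kappa'(theta) = 1 / (1 - theta) = xbar
    kappa = -math.log(1.0 - theta)       # CGF of Exponential(1) at theta
    kappa2 = 1.0 / (1.0 - theta) ** 2    # second derivative of the CGF at theta
    return math.sqrt(n / (2.0 * math.pi * kappa2)) * math.exp(n * (kappa - theta * xbar))

def exact_density(xbar, n):
    """Exact density: the mean of n Exponential(1) draws is Gamma(shape n, rate n)."""
    log_f = n * math.log(n) + (n - 1) * math.log(xbar) - n * xbar - math.lgamma(n)
    return math.exp(log_f)

n = 10
ratios = [saddlepoint_density(x, n) / exact_density(x, n) for x in (0.3, 1.0, 2.5)]
print(ratios)   # the ratio stays close to 1, including in both tails
```

In this particular example the ratio does not depend on $\bar{x}$ at all, which is why the approximation remains accurate far from the center.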

Rejection sampling

Using the tilted distribution $P_\theta$ as the proposal, the rejection sampling algorithm prescribes sampling from $f_\theta(x)$ and accepting with probability

$$\frac{1}{c}\exp(-\theta x+\kappa(\theta)),$$

where

$$c=\sup_{x}\frac{dP}{dP_\theta}(x),$$

with the supremum taken over the region being sampled. That is, a uniformly distributed random variable $p\sim\mathrm{Unif}(0,1)$ is generated, and the sample from $f_\theta(x)$ is accepted if

$$p\leq\frac{1}{c}\exp(-\theta x+\kappa(\theta)).$$
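A minimal sketch of this acceptance rule, for an assumed target: a standard normal variable conditioned on $X>a$ with $a>0$. The proposal is the $\theta$-tilted law $\mathcal{N}(\theta,1)$, and the choice $\theta=a$ is an illustrative assumption rather than a prescribed value. On the region $x>a$ the supremum of $\exp(-\theta x+\kappa(\theta))$ is attained at $x=a$, so the acceptance probability reduces to $\exp(-\theta(x-a))$.

```python
import math
import random

def sample_normal_tail(a, rng):
    """Draw from N(0,1) conditioned on X > a using the tilted proposal N(theta, 1)."""
    theta = a   # illustrative choice: centre the proposal at the threshold (needs a > 0)
    while True:
        x = rng.gauss(theta, 1.0)   # proposal draw from f_theta
        if x <= a:
            continue                # outside the target region, reject
        # (1/c) exp(-theta x + kappa(theta)) with c = exp(-theta a + kappa(theta))
        if rng.random() <= math.exp(-theta * (x - a)):
            return x

rng = random.Random(0)
a = 2.0
draws = [sample_normal_tail(a, rng) for _ in range(5000)]
sample_mean = sum(draws) / len(draws)

# Exact conditional mean phi(a) / (1 - Phi(a)) for comparison.
phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
tail = 0.5 * math.erfc(a / math.sqrt(2.0))
exact_mean = phi / tail
```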

Importance sampling

Applying the exponentially tilted distribution as the importance distribution yields the equation

$$E(h(X))=E_\theta[\ell(X)h(X)],$$

where

$$\ell(X)=\frac{dP}{dP_\theta}$$

is the likelihood ratio. So, one samples from $f_\theta$, evaluates $h$ at the sample, and multiplies by the likelihood ratio to obtain an unbiased estimate of the expectation under the original distribution $P$. Moreover, the second moment of a single term is $E_\theta[(\ell(X)h(X))^2]$, so its variance is

$$Var_\theta(\ell(X)h(X))=E_\theta[(\ell(X)h(X))^2]-(E[h(X)])^2.$$

Example

Assume independent and identically distributed $\{X_i\}$ such that $\kappa(\theta)<\infty$. In order to estimate $P(X_1+\dots+X_n>c)$, we can employ importance sampling by taking

$$h(X)=I\left(\sum_{i=1}^nX_i>c\right).$$

The constant $c$ can be rewritten as $na$ for some other constant $a$. Then,

$$P\left(\sum_{i=1}^nX_i>na\right)=E_{\theta_a}\left[\exp\left\{-\theta_a\sum_{i=1}^nX_i+n\kappa(\theta_a)\right\}I\left(\sum_{i=1}^nX_i>na\right)\right],$$

where $\theta_a$ denotes the $\theta$ defined by the saddle-point equation

$$\kappa'(\theta_a)=a.$$
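The estimator above can be sketched for standard normal summands, an assumption made here because $\kappa(\theta)=\theta^2/2$ gives $\theta_a=a$ and the exact answer $1-\Phi(a\sqrt{n})$ is available as a check. The function name, sample size, and seed are illustrative.

```python
import math
import random

def tail_prob_is(n, a, num_samples, rng):
    """Estimate P(X_1 + ... + X_n > n a) for i.i.d. N(0,1) by exponential tilting."""
    theta = a                     # saddle-point: kappa'(theta) = theta = a
    kappa = 0.5 * theta * theta   # CGF of N(0,1) at theta
    total = 0.0
    for _ in range(num_samples):
        # Each X_i is drawn from the tilted law N(theta, 1).
        s = sum(rng.gauss(theta, 1.0) for _ in range(n))
        if s > n * a:
            total += math.exp(-theta * s + n * kappa)   # likelihood ratio weight
    return total / num_samples

rng = random.Random(1)
n, a = 10, 1.0
estimate = tail_prob_is(n, a, 20000, rng)
exact = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))   # 1 - Phi(a sqrt(n))
```

Under the tilted law roughly half of the simulated sums exceed $na$, whereas under the original law the event has probability below $10^{-3}$ for these values.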

Stochastic processes

Given the tilting of a normal random variable, it is intuitive that the exponential tilting of $X_t$, a Brownian motion with drift $\mu$ and variance $\sigma^2$, is a Brownian motion with drift $\mu+\theta\sigma^2$ and variance $\sigma^2$. Thus, any Brownian motion with drift under $P$ can be thought of as a Brownian motion without drift under $P_{\theta^*}$. To observe this, consider the process $X_t=B_t+\mu t$:

$$f(X_t)=f_{\theta^*}(X_t)\frac{dP}{dP_{\theta^*}}=f(B_t)\exp\left\{\mu B_T-\frac{1}{2}\mu^2T\right\}.$$

The likelihood ratio term, $\exp\{\mu B_T-\frac{1}{2}\mu^2T\}$, is a martingale and is commonly denoted $M_T$. Thus, a Brownian motion with drift process (as well as many other continuous processes adapted to the Brownian filtration) is a $P_{\theta^*}$-martingale.^[10] ^[11]
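A small Monte Carlo sketch, under assumed values of $\mu$ and $T$, illustrates the role of $M_T$: sampling a driftless Brownian motion at time $T$ and weighting by $M_T$ should reproduce expectations of the drifted process. Here the weights should average to about 1, and the weighted mean of $B_T$ should be about $E[B_T+\mu T]=\mu T$.

```python
import math
import random

rng = random.Random(2)
mu, T, num_samples = 0.5, 1.0, 40000   # illustrative values

sum_m, sum_bm = 0.0, 0.0
for _ in range(num_samples):
    b_T = rng.gauss(0.0, math.sqrt(T))             # driftless Brownian motion at time T
    m_T = math.exp(mu * b_T - 0.5 * mu * mu * T)   # likelihood ratio M_T
    sum_m += m_T
    sum_bm += b_T * m_T

mean_m = sum_m / num_samples          # Monte Carlo estimate of E[M_T]
mean_drifted = sum_bm / num_samples   # Monte Carlo estimate of E[B_T M_T]
```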

Stochastic Differential Equations

The stochastic differential equation

$$dX(t)=\mu(t)dt+\sigma(t)dB(t)$$

has the tilted representation

$$dX_\theta(t)=\mu_\theta(t)dt+\sigma(t)dB(t),$$

where $\mu_\theta(t)=\mu(t)+\theta\sigma(t)$. Girsanov's formula states the likelihood ratio

$$\frac{dP}{dP_\theta}=\exp\left\{-\int_0^T\frac{\mu_\theta(t)-\mu(t)}{\sigma^2(t)}dB(t)+\int_0^T\left(\frac{\sigma^2(t)}{2}\right)dt\right\}.$$

Therefore, Girsanov's formula can be used to implement importance sampling for certain SDEs.

Tilting can also be useful for simulating a process $X(t)$ via rejection sampling of the SDE

$$dX(t)=\mu(X(t))dt+dB(t).$$

We may focus on the SDE since we know that $X(t)$ can be written $\int_0^tdX(s)+X(0)$. As previously stated, a Brownian motion with drift can be tilted to a Brownian motion without drift. Therefore, we choose $P_{proposal}=P_{\theta^*}$. The likelihood ratio is

$$\frac{dP}{dP_{\theta^*}}(dX(s):0\leq s\leq t)=\prod_{\tau\leq t}\exp\left\{\mu(X(\tau))dX(\tau)-\frac{\mu(X(\tau))^2}{2}dt\right\}=\exp\left\{\int_0^t\mu(X(\tau))dX(\tau)-\int_0^t\frac{\mu(X(s))^2}{2}ds\right\}.$$

This likelihood ratio will be denoted $M(t)$. To ensure this is a true likelihood ratio, it must be shown that $E_{\theta^*}[M(t)]=1$. Assuming this condition holds, it can be shown that

$$f_{X(t)}(y)=f_{X(t)}^{\theta^*}(y)E_{\theta^*}[M(t)|X(t)=y].$$

So, rejection sampling prescribes that one sample from a standard Brownian motion and accept with probability

$$\frac{f_{X(t)}(y)}{f_{X(t)}^{\theta^*}(y)}\frac{1}{c}=\frac{1}{c}E_{\theta^*}[M(t)|X(t)=y].$$

Choice of tilting parameter

Siegmund's algorithm

Assume i.i.d. $X_i$ with a light-tailed distribution and $E[X]<0$. In order to estimate $\psi(c)=P(\tau(c)<\infty)$, where

$$\tau(c)=\inf\left\{t:\sum_{i=1}^tX_i>c\right\},$$

when $c$ is large and hence $\psi(c)$ small, the algorithm uses exponential tilting to derive the importance distribution. The algorithm is used in many settings, such as sequential tests,^[12] G/G/1 queue waiting times, and ruin theory, where $\psi$ is the probability of ultimate ruin. In this context, it is logical to ensure that $P_\theta(\tau(c)<\infty)=1$. The criterion $\theta>\theta_0$, where $\theta_0$ is such that $\kappa'(\theta_0)=0$, achieves this. Siegmund's algorithm uses $\theta=\theta^*$, if it exists, where $\theta^*$ is defined as the nonzero solution of $\kappa(\theta^*)=0$. It has been shown that $\theta^*$ is the only tilting parameter producing bounded relative error ($\limsup_{x\to\infty}\frac{Var\,I_{A(x)}}{P(A(x))^2}<\infty$).^[13]
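The following sketch shows the idea for a random walk with $\mathcal{N}(-\mu,1)$ increments, an assumed example in which $\kappa(\theta)=-\mu\theta+\theta^2/2$ and therefore $\theta^*=2\mu$. Under the tilt the increments are $\mathcal{N}(+\mu,1)$, so the level $c$ is crossed with probability one, and because $\kappa(\theta^*)=0$ the likelihood ratio at the crossing time is simply $\exp(-\theta^*S_{\tau(c)})$. The function name and parameter values are hypothetical.

```python
import math
import random

def siegmund_estimate(c, mu, num_samples, rng):
    """Estimate psi(c) for a random walk with N(-mu, 1) steps, mu > 0."""
    theta_star = 2.0 * mu   # nonzero root of kappa(theta) = -mu*theta + theta^2/2
    total = 0.0
    for _ in range(num_samples):
        s = 0.0
        while s <= c:                 # tilted steps are N(+mu, 1), so the loop ends
            s += rng.gauss(mu, 1.0)
        total += math.exp(-theta_star * s)   # likelihood ratio at tau(c)
    return total / num_samples

rng = random.Random(3)
c, mu = 5.0, 0.5
psi_hat = siegmund_estimate(c, mu, 5000, rng)
upper_bound = math.exp(-2.0 * mu * c)   # every weight is below exp(-theta* c)
```

Since the walk overshoots $c$ at the crossing time, each weight is strictly smaller than $\exp(-\theta^*c)$, which bounds the estimate from above.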

Black-Box algorithms

We can only see the input and output of a black box, without knowing its structure. A black-box algorithm uses only minimal information about that structure. When we generate random numbers, the output may not belong to a common parametric class, such as the normal or exponential distributions, so an automated way of performing ECM is useful. Let $X_1,X_2,\dots$ be i.i.d. random variables with distribution $G$; for simplicity we assume $X\geq0$. Define $\mathfrak{F}_n=\sigma(X_1,\dots,X_n,U_1,\dots,U_n)$, where $U_1,U_2,\dots$ are independent uniforms on $(0,1)$. A randomized stopping time for $X_1,X_2,\dots$ is then a stopping time with respect to the filtration $\{\mathfrak{F}_n\}$. Let further $\mathfrak{G}$ be a class of distributions $G$ on $[0,\infty)$ with

$$k_G=\log\int_0^\infty e^{\theta x}G(dx)<\infty$$

and define $G_\theta$ by

$$\frac{dG_\theta}{dG}(x)=e^{\theta x-k_G}.$$

We define a black-box algorithm for ECM for the given $\theta$ and the given class $\mathfrak{G}$ of distributions as a pair of a randomized stopping time $\tau$ and an $\mathfrak{F}_\tau$-measurable random variable $Z$ such that $Z$ is distributed according to $G_\theta$ for any $G\in\mathfrak{G}$. Formally, we write this as $P_G(Z<x)=G_\theta(x)$ for all $x$. In other words, the rules of the game are that the algorithm may use simulated values from $G$ and additional uniforms to produce a random variable from $G_\theta$.^[14]
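One simple instance of such a pair $(\tau,Z)$ exists when $\theta<0$: since $X\geq0$, the factor $e^{\theta X}$ lies in $(0,1]$ and can serve as an acceptance probability. The sketch below is an illustration of that special case only, not a general-purpose algorithm; the function `tilt_black_box` and its interface are assumptions introduced for the example. It only calls the sampler for $G$ and draws additional uniforms.

```python
import math
import random

def tilt_black_box(draw, theta, rng):
    """Return (Z, tau): a draw Z from G_theta and the number tau of draws from G used.

    Sketch for theta < 0 and nonnegative draws only, so that e^{theta x} is in (0, 1].
    """
    if theta >= 0:
        raise ValueError("this sketch requires theta < 0")
    tau = 0
    while True:
        x = draw(rng)    # black-box draw from the unknown distribution G
        tau += 1
        if rng.random() <= math.exp(theta * x):   # accept with probability e^{theta x}
            return x, tau

# Usage with G = Exponential(1) and theta = -1, for which G_theta is Exponential(2).
rng = random.Random(4)
results = [tilt_black_box(lambda r: r.expovariate(1.0), -1.0, rng) for _ in range(20000)]
mean_z = sum(z for z, _ in results) / len(results)      # close to 1/2
mean_tau = sum(t for _, t in results) / len(results)    # close to 2 draws per output
```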

Notes and References

[1] Gerber, H.U.; Shiu, E.S.W. (1994). Option pricing by Esscher transforms. Transactions of the Society of Actuaries. 46: 99–191.
[2] Cruz, Marcelo (2015). Fundamental Aspects of Operational Risk and Insurance Analytics. Wiley. pp. 784–796. ISBN 978-1-118-11839-9.
[3] Butler, Ronald (2007). Saddlepoint Approximations with Applications. Cambridge University Press. p. 156. ISBN 9780521872508.
[4] Siegmund, D. (1976). Importance Sampling in the Monte Carlo Study of Sequential Tests. 4 (4): 673–684. doi:10.1214/aos/1176343541.
[5] Asmussen, Soren; Glynn, Peter (2007). Stochastic Simulation. Springer. p. 130. ISBN 978-0-387-30679-7.
[6] Fuh, Cheng-Der; Teng, Huei-Wen; Wang, Ren-Her (2013). Efficient Importance Sampling for Rare Event Simulation with Applications.
[7] Asmussen, Soren; Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 164–167.
[8] Butler, Ronald (2007). Saddlepoint Approximations with Applications. Cambridge University Press. pp. 156–157. ISBN 9780521872508.
[9] Seeber, G.U.H. (1992). Advances in GLIM and Statistical Modelling. Springer. pp. 195–200. ISBN 978-0-387-97873-4.
[10] Asmussen, Soren; Glynn, Peter (2007). Stochastic Simulation. Springer. p. 407. ISBN 978-0-387-30679-7.
[11] Steele, J. Michael (2001). Stochastic Calculus and Financial Applications. Springer. pp. 213–229. ISBN 978-1-4419-2862-7.
[12] Siegmund, D. (1985). Sequential Analysis. Springer-Verlag.
[13] Asmussen, Soren; Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 164–167. ISBN 978-0-387-30679-7.
[14] Asmussen, Soren; Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 416–420.
