Stochastic programming explained
In the field of mathematical optimization, stochastic programming is a framework for modeling optimization problems that involve uncertainty. A stochastic program is an optimization problem in which some or all problem parameters are uncertain, but follow known probability distributions.[1] [2] This framework contrasts with deterministic optimization, in which all problem parameters are assumed to be known exactly. The goal of stochastic programming is to find a decision which both optimizes some criteria chosen by the decision maker, and appropriately accounts for the uncertainty of the problem parameters. Because many real-world decisions involve uncertainty, stochastic programming has found applications in a broad range of areas, from finance to transportation to energy optimization.[3] [4]
Methods
Several stochastic programming methods have been developed:
Two-stage problem definition
The basic idea of two-stage stochastic programming is that (optimal) decisions should be based on data available at the time the decisions are made and cannot depend on future observations. The two-stage formulation is widely used in stochastic programming. The general formulation of a two-stage stochastic programming problem is given by

$$\min_{x\in X}\{ g(x) = f(x) + E_\xi[Q(x,\xi)] \}$$

where $Q(x,\xi)$ is the optimal value of the second-stage problem

$$\min_{y}\{ q(y,\xi) \,:\, T(\xi)x + W(\xi)y = h(\xi) \}.$$

The classical two-stage linear stochastic programming problem can be formulated as

$$\min_{x\in\mathbb{R}^n} \; g(x) = c^T x + E_\xi[Q(x,\xi)] \quad \text{subject to} \quad Ax = b,\; x \geq 0,$$

where $Q(x,\xi)$ is the optimal value of the second-stage problem

$$\min_{y\in\mathbb{R}^m} \; q(\xi)^T y \quad \text{subject to} \quad T(\xi)x + W(\xi)y = h(\xi),\; y \geq 0.$$

In this formulation $x\in\mathbb{R}^n$ is the first-stage decision variable vector, $y\in\mathbb{R}^m$ is the second-stage decision variable vector, and $\xi(q,T,W,h)$ contains the data of the second-stage problem. At the first stage we have to make a "here-and-now" decision $x$ before the realization of the uncertain data $\xi$, viewed as a random vector, is known. At the second stage, after a realization of $\xi$ becomes available, we optimize our behavior by solving an appropriate optimization problem.

At the first stage we optimize (minimize in the above formulation) the cost $c^T x$ of the first-stage decision plus the expected cost of the (optimal) second-stage decision. We can view the second-stage problem simply as an optimization problem which describes our supposedly optimal behavior when the uncertain data is revealed, or we can consider its solution as a recourse action, where the term $W(\xi)y$ compensates for a possible inconsistency of the system $T(\xi)x \leq h(\xi)$ and $q(\xi)^T y$ is the cost of this recourse action.
The considered two-stage problem is linear because the objective functions and the constraints are linear. Conceptually this is not essential, and one can consider more general two-stage stochastic programs. For example, one can add integrality constraints to the first-stage problem so that its feasible set is discrete, and non-linear objectives and constraints can also be incorporated if needed.[5]
Distributional assumption
The formulation of the above two-stage problem assumes that the second-stage data $\xi$ is modeled as a random vector with a known probability distribution. This is justified in many situations. For example, the distribution of $\xi$ could be inferred from historical data if one assumes that the distribution does not significantly change over the considered period of time. Also, the empirical distribution of the sample could be used as an approximation to the distribution of the future values of $\xi$. If one has a prior model for $\xi$, one could obtain a posterior distribution by a Bayesian update.
Scenario-based approach
Discretization
To solve the two-stage stochastic problem numerically, one often needs to assume that the random vector $\xi$ has a finite number of possible realizations, called scenarios, say $\xi_1,\dots,\xi_K$, with respective probability masses $p_1,\dots,p_K$. Then the expectation in the first-stage problem's objective function can be written as the summation

$$E[Q(x,\xi)] = \sum_{k=1}^K p_k Q(x,\xi_k),$$

and, moreover, the two-stage problem can be formulated as one large linear programming problem (this is called the deterministic equivalent of the original problem; see the section below).
When $\xi$ has an infinite (or very large) number of possible realizations, the standard approach is to represent this distribution by scenarios. This approach raises three questions, namely:
- How to construct scenarios (see the section on scenario construction below);
- How to solve the deterministic equivalent. Optimizers such as CPLEX and GLPK can solve large linear/nonlinear problems. The NEOS Server,[6] hosted at the University of Wisconsin–Madison, allows free access to many modern solvers. The structure of a deterministic equivalent is particularly amenable to decomposition methods,[7] such as Benders' decomposition or scenario decomposition;
- How to measure quality of the obtained solution with respect to the "true" optimum.
These questions are not independent. For example, the number of scenarios constructed will affect both the tractability of the deterministic equivalent and the quality of the obtained solutions.
Stochastic linear programming
A stochastic linear program is a specific instance of the classical two-stage stochastic program. A stochastic LP is built from a collection of multi-period linear programs (LPs), each having the same structure but somewhat different data. The $k$th two-period LP, representing the $k$th scenario, may be regarded as having the following form:

$$\begin{array}{lccccccc}
\text{Minimize} & f^T x & + & g^T y & + & h_k^T z_k & & \\
\text{subject to} & Tx & + & Uy & & & = & r \\
& & & V_k y & + & W_k z_k & = & s_k \\
& x & , & y & , & z_k & \geq & 0
\end{array}$$

The vectors $x$ and $y$ contain the first-period variables, whose values must be chosen immediately. The vector $z_k$ contains all of the variables for subsequent periods. The constraints $Tx + Uy = r$ involve only first-period variables and are the same in every scenario. The other constraints involve variables of later periods and differ in some respects from scenario to scenario, reflecting uncertainty about the future.
Note that solving the $k$th two-period LP is equivalent to assuming the $k$th scenario in the second period with no uncertainty. In order to incorporate uncertainties in the second stage, one should assign probabilities to the different scenarios and solve the corresponding deterministic equivalent.
Deterministic equivalent of a stochastic problem
With a finite number of scenarios, two-stage stochastic linear programs can be modelled as large linear programming problems. This formulation is often called the deterministic equivalent linear program, or abbreviated to deterministic equivalent. (Strictly speaking a deterministic equivalent is any mathematical program that can be used to compute the optimal first-stage decision, so these will exist for continuous probability distributions as well, when one can represent the second-stage cost in some closed form.) For example, to form the deterministic equivalent to the above stochastic linear program, we assign a probability $p_k$ to each scenario $k = 1,\dots,K$. Then we can minimize the expected value of the objective, subject to the constraints from all scenarios:
$$\begin{array}{lccccccccccccc}
\text{Minimize} & f^T x & + & g^T y & + & p_1 h_1^T z_1 & + & p_2 h_2^T z_2 & + & \cdots & + & p_K h_K^T z_K & & \\
\text{subject to} & Tx & + & Uy & & & & & & & & & = & r \\
& & & V_1 y & + & W_1 z_1 & & & & & & & = & s_1 \\
& & & V_2 y & & & + & W_2 z_2 & & & & & = & s_2 \\
& & & \vdots & & & & & & \ddots & & & & \vdots \\
& & & V_K y & & & & & & & + & W_K z_K & = & s_K \\
& x & , & y & , & z_1 & , & z_2 & , & \ldots & , & z_K & \geq & 0
\end{array}$$
We have a different vector $z_k$ of later-period variables for each scenario $k$. The first-period variables $x$ and $y$ are the same in every scenario, however, because we must make a decision for the first period before we know which scenario will be realized. As a result, the constraints involving just $x$ and $y$ need only be specified once, while the remaining constraints must be given separately for each scenario.
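To make the construction concrete, here is a small sketch that assembles and solves the deterministic equivalent of a toy two-stage problem with SciPy's `linprog`. All numbers (demands, probabilities, unit costs) are invented for illustration: the first stage orders $x$ units at unit cost 1, and each scenario $k$ allows an emergency recourse purchase $y_k$ at unit cost 1.5 to cover demand $d_k$.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical scenario data (illustrative only).
d = np.array([1.0, 2.0, 4.0])   # demand in each scenario
p = np.array([0.3, 0.4, 0.3])   # scenario probabilities
K = len(d)

# Decision vector: [x, y_1, ..., y_K] (first-stage order, per-scenario recourse).
# Objective: first-stage cost 1*x plus probability-weighted recourse costs 1.5*y_k.
c = np.concatenate(([1.0], 1.5 * p))

# Demand-coverage constraints x + y_k >= d_k, written as -(x + y_k) <= -d_k.
A_ub = np.zeros((K, 1 + K))
A_ub[:, 0] = -1.0
A_ub[np.arange(K), 1 + np.arange(K)] = -1.0
b_ub = -d

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (1 + K))
print(res.x[0], res.fun)  # optimal first-stage order and expected total cost
```

For this data the optimal first-stage order is $x = 2$ with expected cost 2.9: ordering more would waste money in the low-demand scenarios, ordering less would force expensive recourse too often — exactly the trade-off the deterministic equivalent encodes.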
Scenario construction
In practice it might be possible to construct scenarios by eliciting experts' opinions on the future. The number of constructed scenarios should be relatively modest so that the obtained deterministic equivalent can be solved with reasonable computational effort. It is often claimed that a solution that is optimal using only a few scenarios provides more adaptable plans than one that assumes a single scenario only. In some cases such a claim can be verified by simulation. In theory, there are measures that guarantee an obtained solution solves the original problem with reasonable accuracy. Typically in applications only the first-stage optimal solution $x^*$ has practical value, since almost always a "true" realization of the random data will be different from the set of constructed (generated) scenarios.
Suppose $\xi$ contains $d$ independent random components, each of which has three possible realizations (for example, future realizations of each random parameter are classified as low, medium and high); then the total number of scenarios is $K = 3^d$. Such exponential growth of the number of scenarios makes model development using expert opinion very difficult even for reasonably sized $d$. The situation becomes even worse if some random components of $\xi$ have continuous distributions.
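The growth can be seen directly by enumerating the scenario set; a short sketch (the component count and level labels are arbitrary):

```python
from itertools import product

levels = ("low", "medium", "high")  # three realizations per component
d = 10                              # number of independent random components

# Every scenario is one choice of level for each of the d components.
scenarios = list(product(levels, repeat=d))
print(len(scenarios))  # 3**10 = 59049
```

Already at $d = 10$ there are 59,049 scenarios; eliciting expert probabilities for each one individually is clearly hopeless.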
Monte Carlo sampling and Sample Average Approximation (SAA) Method
A common approach to reduce the scenario set to a manageable size is by using Monte Carlo simulation. Suppose the total number of scenarios is very large or even infinite. Suppose further that we can generate a sample $\xi^1,\xi^2,\dots,\xi^N$ of $N$ replications of the random vector $\xi$. Usually the sample is assumed to be independent and identically distributed (an i.i.d. sample). Given a sample, the expectation function $q(x) = E[Q(x,\xi)]$ is approximated by the sample average

$$\hat q_N(x) = \frac{1}{N}\sum_{j=1}^N Q(x,\xi^j),$$

and consequently the first-stage problem is given by
$$\begin{array}{rlrrr}
\hat g_N(x) = & \min\limits_{x\in\mathbb{R}^n} & c^T x + \dfrac{1}{N}\displaystyle\sum_{j=1}^N Q(x,\xi^j) & & \\
& \text{subject to} & Ax & = & b \\
& & x & \geq & 0
\end{array}$$
This formulation is known as the Sample Average Approximation (SAA) method. The SAA problem is a function of the considered sample and in that sense is random. For a given sample $\xi^1,\xi^2,\dots,\xi^N$ the SAA problem is of the same form as a two-stage stochastic linear programming problem with the scenarios $\xi^j$, $j = 1,\dots,N$, each taken with the same probability $p_j = 1/N$.
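As an illustration, the following sketch (the distribution and cost numbers are invented) forms the sample-average approximation of $g(x) = c\,x + E[Q(x,\xi)]$ at a fixed first-stage decision, where $Q(x,\xi) = 1.5\max(\xi - x, 0)$ is a simple recourse cost and $\xi$ is uniform on $[0,4]$. For $x = 2$ the exact value is $2 + 1.5\cdot 0.5 = 2.75$, so the SAA estimate should land close to that.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                      # sample size
xi = rng.uniform(0.0, 4.0, N)    # i.i.d. sample of the random demand

x = 2.0                          # fixed first-stage decision
Q = 1.5 * np.maximum(xi - x, 0)  # second-stage (recourse) cost per replication

g_hat = 1.0 * x + Q.mean()       # SAA objective value at x
print(g_hat)                     # close to the exact value 2.75
```

Because the sample average is unbiased and its standard error shrinks like $1/\sqrt{N}$, with $N = 100{,}000$ the estimate is within a few thousandths of the exact value.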
Statistical inference
Consider the following stochastic programming problem

$$\min_{x\in X}\{ g(x) = f(x) + E[Q(x,\xi)] \}$$

Here $X$ is a nonempty closed subset of $\mathbb{R}^n$, $\xi$ is a random vector whose probability distribution $P$ is supported on a set $\Xi \subset \mathbb{R}^d$, and $Q: X\times\Xi \to \mathbb{R}$. In the framework of two-stage stochastic programming, $Q(x,\xi)$ is given by the optimal value of the corresponding second-stage problem.
Assume that $g(x)$ is well defined and finite valued for all $x\in X$. This implies that for every $x\in X$ the value $Q(x,\xi)$ is finite almost surely.
Suppose that we have a sample $\xi^1,\dots,\xi^N$ of $N$ realizations of the random vector $\xi$. This random sample can be viewed as historical data of $N$ observations of $\xi$, or it can be generated by Monte Carlo sampling techniques. Then we can formulate the corresponding sample average approximation

$$\min_{x\in X}\Big\{ \hat g_N(x) = f(x) + \frac{1}{N}\sum_{j=1}^N Q(x,\xi^j) \Big\}$$
By the Law of Large Numbers we have that, under some regularity conditions, $\frac{1}{N}\sum_{j=1}^N Q(x,\xi^j)$ converges pointwise with probability 1 to $E[Q(x,\xi)]$ as $N\to\infty$. Moreover, under mild additional conditions the convergence is uniform. We also have $E[\hat g_N(x)] = g(x)$, i.e., $\hat g_N(x)$ is an unbiased estimator of $g(x)$. Therefore, it is natural to expect that the optimal value and optimal solutions of the SAA problem converge to their counterparts of the true problem as $N\to\infty$.
Consistency of SAA estimators
Suppose the feasible set $X$ of the SAA problem is fixed, i.e., it is independent of the sample. Let $\vartheta^*$ and $S^*$ be the optimal value and the set of optimal solutions, respectively, of the true problem, and let $\hat\vartheta_N$ and $\hat S_N$ be the optimal value and the set of optimal solutions, respectively, of the SAA problem.
- Let $g: X\to\mathbb{R}$ be a function and $\hat g_N: X\to\mathbb{R}$ a sequence of (deterministic) real valued functions. The following two properties are equivalent:
- for any $\bar x \in X$ and any sequence $\{x_N\} \subset X$ converging to $\bar x$ it follows that $\hat g_N(x_N)$ converges to $g(\bar x)$;
- the function $g(\cdot)$ is continuous on $X$ and $\hat g_N(\cdot)$ converges to $g(\cdot)$ uniformly on any compact subset of $X$.
- If the objective of the SAA problem $\hat g_N(x)$ converges to the true problem's objective $g(x)$ with probability 1, as $N\to\infty$, uniformly on the feasible set $X$, then $\hat\vartheta_N$ converges to $\vartheta^*$ with probability 1 as $N\to\infty$.
- Suppose that there exists a compact set $C \subset \mathbb{R}^n$ such that
- the set $S^*$ of optimal solutions of the true problem is nonempty and is contained in $C$;
- the function $g(x)$ is finite valued and continuous on $C$;
- the sequence of functions $\hat g_N(x)$ converges to $g(x)$ with probability 1, as $N\to\infty$, uniformly in $x \in C$;
- for $N$ large enough the set $\hat S_N$ is nonempty and $\hat S_N \subset C$ with probability 1;

then

$$\hat\vartheta_N \to \vartheta^* \quad\text{and}\quad \mathbb{D}(\hat S_N, S^*) \to 0$$

with probability 1 as $N\to\infty$. Note that $\mathbb{D}(A,B)$ denotes the deviation of set $A$ from set $B$, defined as

$$\mathbb{D}(A,B) := \sup_{x\in A}\{ \inf_{x'\in B} \|x - x'\| \}$$
In some situations the feasible set $X$ of the SAA problem is estimated; then the corresponding SAA problem takes the form

$$\min_{x\in X_N} \hat g_N(x)$$

where $X_N$ is a subset of $\mathbb{R}^n$ depending on the sample and therefore is random. Nevertheless, consistency results for SAA estimators can still be derived under some additional assumptions:
- Suppose that there exists a compact set $C \subset \mathbb{R}^n$ such that
- the set $S^*$ of optimal solutions of the true problem is nonempty and is contained in $C$;
- the function $g(x)$ is finite valued and continuous on $C$;
- the sequence of functions $\hat g_N(x)$ converges to $g(x)$ with probability 1, as $N\to\infty$, uniformly in $x \in C$;
- for $N$ large enough the set $\hat S_N$ is nonempty and $\hat S_N \subset C$ with probability 1;
- if $x_N \in X_N$ and $x_N$ converges with probability 1 to a point $x$, then $x \in X$;
- for some point $x \in S^*$ there exists a sequence $x_N \in X_N$ such that $x_N \to x$ with probability 1;

then

$$\hat\vartheta_N \to \vartheta^* \quad\text{and}\quad \mathbb{D}(\hat S_N, S^*) \to 0$$

with probability 1 as $N\to\infty$.
Asymptotics of the SAA optimal value
Suppose the sample $\xi^1,\dots,\xi^N$ is i.i.d. and fix a point $x\in X$. Then the sample average estimator $\hat g_N(x)$ of $g(x)$ is unbiased and has variance $\frac{1}{N}\sigma^2(x)$, where

$$\sigma^2(x) := \mathrm{Var}[Q(x,\xi)]$$

is supposed to be finite. Moreover, by the central limit theorem we have that

$$\sqrt{N}\,[\hat g_N(x) - g(x)] \xrightarrow{\mathcal{D}} Y_x$$

where $\xrightarrow{\mathcal{D}}$ denotes convergence in distribution and $Y_x$ has a normal distribution with mean $0$ and variance $\sigma^2(x)$, written as $Y_x \sim \mathcal{N}(0,\sigma^2(x))$.
In other words, $\hat g_N(x)$ has an asymptotically normal distribution, i.e., for large $N$, $\hat g_N(x)$ has approximately normal distribution with mean $g(x)$ and variance $\frac{1}{N}\sigma^2(x)$. This leads to the following (approximate) $100(1-\alpha)$% confidence interval for $g(x)$:

$$\left[ \hat g_N(x) - z_{\alpha/2}\frac{\hat\sigma(x)}{\sqrt{N}},\;\; \hat g_N(x) + z_{\alpha/2}\frac{\hat\sigma(x)}{\sqrt{N}} \right]$$

where $z_{\alpha/2} := \Phi^{-1}(1-\alpha/2)$ (here $\Phi(\cdot)$ denotes the cdf of the standard normal distribution) and

$$\hat\sigma^2(x) := \frac{1}{N-1}\sum_{j=1}^N \left[ Q(x,\xi^j) - \frac{1}{N}\sum_{j'=1}^N Q(x,\xi^{j'}) \right]^2$$

is the sample variance estimate of $\sigma^2(x)$. That is, the error of estimation of $g(x)$ is (stochastically) of order $O(1/\sqrt{N})$.
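The confidence interval above is straightforward to compute from the same Monte Carlo sample used for the SAA objective. A sketch follows; the distribution of the $Q(x,\xi^j)$ values is invented purely to have numbers, and $f(x)$ is taken to be zero:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N = 10_000
Q = rng.exponential(scale=3.0, size=N)  # stand-in for the Q(x, xi^j) values

g_hat = Q.mean()                        # sample-average estimate of g(x)
sigma_hat = Q.std(ddof=1)               # sample std. dev. (N-1 in the denominator)

alpha = 0.05
z = norm.ppf(1 - alpha / 2)             # z_{alpha/2} ~= 1.96
half_width = z * sigma_hat / np.sqrt(N)
ci = (g_hat - half_width, g_hat + half_width)
print(ci)  # approximate 95% confidence interval for g(x)
```

The half-width $z_{\alpha/2}\hat\sigma/\sqrt{N}$ shrinks like $1/\sqrt{N}$: quadrupling the sample size halves the interval.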
Applications and Examples
Biological applications
Stochastic dynamic programming is frequently used to model animal behaviour in such fields as behavioural ecology.[8] [9] Empirical tests of models of optimal foraging, of life-history transitions such as fledging in birds, and of egg laying in parasitoid wasps have shown the value of this modelling technique in explaining the evolution of behavioural decision making. These models are typically many-staged, rather than two-staged.
Economic applications
Stochastic dynamic programming is a useful tool in understanding decision making under uncertainty. The accumulation of capital stock under uncertainty is one example; often it is used by resource economists to analyze bioeconomic problems[10] in which the uncertainty enters through factors such as weather.
Example: multistage portfolio optimization
See main article: Intertemporal portfolio choice.
See also: Merton's portfolio problem.
The following is an example from finance of multi-stage stochastic programming. Suppose that at time $t = 0$ we have initial capital $W_0$ to invest in $n$ assets. Suppose further that we are allowed to rebalance our portfolio at times $t = 1,\dots,T-1$ but without injecting additional cash into it. At each period $t$ we make a decision about redistributing the current wealth $W_t$ among the $n$ assets. Let $x_0 = (x_{10},\dots,x_{n0})$ be the initial amounts invested in the $n$ assets. We require that each $x_{i0}$ is nonnegative and that the balance equation $\sum_{i=1}^n x_{i0} = W_0$ should hold.
Consider the total returns $\xi_t = (\xi_{1t},\dots,\xi_{nt})$ for each period $t = 1,\dots,T$. This forms a vector-valued random process $\xi_1,\dots,\xi_T$. At time period $t = 1$, we can rebalance the portfolio by specifying the amounts $x_1 = (x_{11},\dots,x_{n1})$ invested in the respective assets. At that time the returns in the first period have been realized, so it is reasonable to use this information in the rebalancing decision. Thus, the second-stage decisions, at time $t = 1$, are actually functions of the realization of the random vector $\xi_1$, i.e., $x_1 = x_1(\xi_1)$. Similarly, at time $t$ the decision $x_t = (x_{1t},\dots,x_{nt})$ is a function $x_t = x_t(\xi_{[t]})$ of the available information given by $\xi_{[t]} = (\xi_1,\dots,\xi_t)$, the history of the random process up to time $t$. A sequence of functions $x_t = x_t(\xi_{[t]})$, $t = 0,\dots,T-1$, with $x_0$ being constant, defines an implementable policy of the decision process. It is said that such a policy is feasible if it satisfies the model constraints with probability 1, i.e., the nonnegativity constraints $x_{it}(\xi_{[t]}) \geq 0$, $i = 1,\dots,n$, $t = 0,\dots,T-1$, and the balance of wealth constraints

$$\sum_{i=1}^n x_{it}(\xi_{[t]}) = W_t,$$

where in period $t = 1,\dots,T$ the wealth $W_t$ is given by

$$W_t = \sum_{i=1}^n \xi_{it}\, x_{i,t-1}(\xi_{[t-1]}),$$

which depends on the realization of the random process and the decisions up to time $t$.
Suppose the objective is to maximize the expected utility of this wealth at the last period, that is, to consider the problem

$$\max\; E[U(W_T)].$$

This is a multistage stochastic programming problem, where stages are numbered from $t = 0$ to $t = T-1$. Optimization is performed over all implementable and feasible policies. To complete the problem description one also needs to define the probability distribution of the random process $\xi_1,\dots,\xi_T$. This can be done in various ways. For example, one can construct a particular scenario tree defining the time evolution of the process. If at every stage the random return of each asset is allowed to have two continuations, independent of other assets, then the total number of scenarios is $2^{nT}$.
In order to write dynamic programming equations, consider the above multistage problem backward in time. At the last stage $t = T-1$, a realization

$$\xi_{[T-1]} = (\xi_1,\dots,\xi_{T-1})$$

of the random process is known and $x_{T-2}$ has been chosen. Therefore, one needs to solve the following problem

$$\begin{array}{lrcl}
\max\limits_{x_{T-1}} & E[U(W_T) \mid \xi_{[T-1]}] & & \\
\text{subject to} & W_T & = & \sum_{i=1}^n \xi_{iT}\, x_{i,T-1} \\
& \sum_{i=1}^n x_{i,T-1} & = & W_{T-1} \\
& x_{T-1} & \geq & 0
\end{array}$$
where $E[U(W_T) \mid \xi_{[T-1]}]$ denotes the conditional expectation of $U(W_T)$ given $\xi_{[T-1]}$. The optimal value of the above problem depends on $W_{T-1}$ and $\xi_{[T-1]}$ and is denoted $Q_{T-1}(W_{T-1},\xi_{[T-1]})$.
Similarly, at stages $t = T-2,\dots,1$, one should solve the problem

$$\begin{array}{lrcl}
\max\limits_{x_t} & E[Q_{t+1}(W_{t+1},\xi_{[t+1]}) \mid \xi_{[t]}] & & \\
\text{subject to} & W_{t+1} & = & \sum_{i=1}^n \xi_{i,t+1}\, x_{it} \\
& \sum_{i=1}^n x_{it} & = & W_t \\
& x_t & \geq & 0
\end{array}$$

whose optimal value is denoted by $Q_t(W_t,\xi_{[t]})$. Finally, at stage $t = 0$, one solves the problem

$$\begin{array}{lrcl}
\max\limits_{x_0} & E[Q_1(W_1,\xi_{[1]})] & & \\
\text{subject to} & W_1 & = & \sum_{i=1}^n \xi_{i1}\, x_{i0} \\
& \sum_{i=1}^n x_{i0} & = & W_0 \\
& x_0 & \geq & 0
\end{array}$$
Stagewise independent random process
For a general distribution of the process $\xi_t$, it may be hard to solve these dynamic programming equations. The situation simplifies dramatically if the process $\xi_t$ is stagewise independent, i.e., $\xi_t$ is (stochastically) independent of $\xi_1,\dots,\xi_{t-1}$ for $t = 2,\dots,T$. In this case, the corresponding conditional expectations become unconditional expectations, and the function $Q_t(W_t)$, $t = 1,\dots,T-1$, does not depend on $\xi_{[t]}$. That is, $Q_{T-1}(W_{T-1})$ is the optimal value of the problem

$$\begin{array}{lrcl}
\max\limits_{x_{T-1}} & E[U(W_T)] & & \\
\text{subject to} & W_T & = & \sum_{i=1}^n \xi_{iT}\, x_{i,T-1} \\
& \sum_{i=1}^n x_{i,T-1} & = & W_{T-1} \\
& x_{T-1} & \geq & 0
\end{array}$$

and $Q_t(W_t)$ is the optimal value of

$$\begin{array}{lrcl}
\max\limits_{x_t} & E[Q_{t+1}(W_{t+1})] & & \\
\text{subject to} & W_{t+1} & = & \sum_{i=1}^n \xi_{i,t+1}\, x_{it} \\
& \sum_{i=1}^n x_{it} & = & W_t \\
& x_t & \geq & 0
\end{array}$$

for $t = T-2,\dots,1$.
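Stagewise independence is what collapses the value functions to functions of wealth alone, and for logarithmic utility this goes further: $Q_t(W) = \log W + \text{const}$, so every stage reduces to the same one-period problem and the optimal policy invests a fixed fraction of current wealth. The sketch below finds that fraction by grid search; the two assets, their two-point returns, and the grid resolution are all illustrative assumptions, not part of the general theory.

```python
import numpy as np

# Hypothetical stagewise-independent returns: a risky asset with two equally
# likely outcomes per period, and a riskless asset returning 1.05 each period.
risky = np.array([0.8, 1.5])
prob = np.array([0.5, 0.5])
riskless = 1.05

# One-period problem: maximize E[log(a*risky + (1-a)*riskless)] over a in [0,1],
# where a is the fraction of wealth placed in the risky asset.
a_grid = np.linspace(0.0, 1.0, 1001)
returns = np.outer(a_grid, risky - riskless) + riskless  # portfolio return per outcome
values = (prob * np.log(returns)).sum(axis=1)            # expected log return per a

a_opt = a_grid[np.argmax(values)]
print(a_opt)  # optimal fixed fraction in the risky asset

# Under stagewise independence this same fraction is optimal at every stage:
# the T-period optimal expected utility is log(W_0) + T * values.max().
```

For these numbers the first-order condition gives $a^* = 14/15 \approx 0.933$, and the grid search recovers it; with a general (non-logarithmic) utility the same backward recursion applies but $Q_t$ must be tabulated as a genuine function of $W_t$.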
Software tools
Modelling languages
All discrete stochastic programming problems can be represented with any algebraic modeling language, by manually implementing explicit or implicit non-anticipativity to make sure the resulting model respects the structure of the information made available at each stage. An instance of an SP problem generated by a general modelling language tends to grow quite large (linearly in the number of scenarios), and its matrix loses the structure that is intrinsic to this class of problems, which could otherwise be exploited at solution time by specific decomposition algorithms. Extensions to modelling languages specifically designed for SP are starting to appear; see:
- AIMMS - supports the definition of SP problems
- EMP SP (Extended Mathematical Programming for Stochastic Programming) - a module of GAMS created to facilitate stochastic programming (includes keywords for parametric distributions, chance constraints and risk measures such as Value at risk and Expected shortfall).
- SAMPL - a set of extensions to AMPL specifically designed to express stochastic programs (includes syntax for chance constraints, integrated chance constraints and Robust Optimization problems)
These tools can generate SMPS instance-level format, which conveys the structure of the problem to the solver in a non-redundant form.
Further reading
- John R. Birge and François V. Louveaux. Introduction to Stochastic Programming. Springer Verlag, New York, 1997.
- Peter Kall and Stein W. Wallace. Stochastic Programming. Wiley-Interscience Series in Systems and Optimization. John Wiley & Sons, Chichester, 1994. ISBN 0-471-95158-7.
- G. Ch. Pflug: Optimization of Stochastic Models. The Interface between Simulation and Optimization. Kluwer, Dordrecht, 1996.
- András Prékopa. Stochastic Programming. Kluwer Academic Publishers, Dordrecht, 1995.
- Andrzej Ruszczynski and Alexander Shapiro (eds.) (2003) Stochastic Programming. Handbooks in Operations Research and Management Science, Vol. 10, Elsevier.
- Alexander Shapiro, Darinka Dentcheva, and Andrzej Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. MPS/SIAM Series on Optimization 9. SIAM / Mathematical Programming Society, Philadelphia, PA, 2009. ISBN 978-0-89871-687-0.
- Stein W. Wallace and William T. Ziemba (eds.) (2005) Applications of Stochastic Programming. MPS-SIAM Book Series on Optimization 5
- Alan J. King and Stein W. Wallace. Modeling with Stochastic Programming. Springer Series in Operations Research and Financial Engineering. Springer, New York, 2012. ISBN 978-0-387-87816-4.
Notes and References
- Alexander Shapiro, Darinka Dentcheva, and Andrzej Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. MPS/SIAM Series on Optimization 9. SIAM / Mathematical Programming Society, Philadelphia, PA, 2009. ISBN 978-0-89871-687-0.
- John R. Birge and François Louveaux. Introduction to Stochastic Programming. Springer Series in Operations Research and Financial Engineering. Springer, 2011. ISBN 978-1-4614-0236-7. doi:10.1007/978-1-4614-0237-4.
- Stein W. Wallace and William T. Ziemba (eds.). Applications of Stochastic Programming. MPS-SIAM Book Series on Optimization 5, 2005.
- Applications of stochastic programming are described at the following website, Stochastic Programming Community.
- Alexander Shapiro and Andy Philpott. A Tutorial on Stochastic Programming.
- NEOS Server for Optimization (web site).
- Andrzej Ruszczyński and Alexander Shapiro (eds.). Stochastic Programming. Handbooks in Operations Research and Management Science 10. Elsevier, 2003. ISBN 978-0444508546.
- Mangel, M. & Clark, C. W. 1988. Dynamic modeling in behavioral ecology. Princeton University Press
- Houston, A. I & McNamara, J. M. 1999. Models of adaptive behaviour: an approach based on state. Cambridge University Press
- Howitt, R., Msangi, S., Reynaud, A. and K. Knapp. 2002. "Using Polynomial Approximations to Solve Stochastic Dynamic Programming Problems: or A 'Betty Crocker' Approach to SDP." University of California, Davis, Department of Agricultural and Resource Economics Working Paper.