Innovation Method Explained
In statistics, the Innovation Method provides an estimator for the parameters of stochastic differential equations given a time series of (potentially noisy) observations of the state variables. In the framework of continuous-discrete state space models, the innovation estimator is obtained by maximizing the log-likelihood of the corresponding discrete-time innovation process with respect to the parameters. The innovation estimator can be classified as an M-estimator, a quasi-maximum likelihood estimator or a prediction error estimator, depending on which inferential considerations are emphasized. The innovation method is a system identification technique for developing mathematical models of dynamical systems from measured data and for the optimal design of experiments.
Background
Stochastic differential equations (SDEs) have become an important mathematical tool for describing the time evolution of many random phenomena in the natural, social and applied sciences. Statistical inference for SDEs is thus of great importance in applications for model building, model selection, model identification and forecasting. To carry out statistical inference for SDEs, measurements of the state variables of these random phenomena are indispensable. In practice, usually only a few state variables are measured, and the physical devices that measure them introduce random measurement errors (observational errors).
Mathematical model for inference
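The innovation estimator for SDEs is defined in the framework of continuous-discrete state space models.[1] These models arise as a natural mathematical representation of the temporal evolution of continuous random phenomena and of their measurements at a succession of time instants. In the simplest formulation, these continuous-discrete models are expressed in terms of an SDE of the form

$d\mathbf{x}(t) = \mathbf{f}(t,\mathbf{x}(t);\theta)\,dt + \sum_{i=1}^{m} \mathbf{g}_i(t,\mathbf{x}(t);\theta)\,d\mathbf{w}^i(t)$ (1)

describing the time evolution of the $d$ state variables $\mathbf{x}$ of the phenomenon for all time instants $t \geq t_0$, and an observation equation

$\mathbf{z}_{t_k} = \mathbf{C}\mathbf{x}(t_k) + \mathbf{e}_{t_k}$ (2)

describing the time series of measurements $\mathbf{z}_{t_0},\ldots,\mathbf{z}_{t_{M-1}}$ of at least one of the variables $\mathbf{x}$ of the random phenomenon on $M$ time instants $t_0,\ldots,t_{M-1}$. In the model (1)-(2), $\mathbf{f}$ and $\mathbf{g}_i$ are differentiable functions, $\mathbf{w} = (\mathbf{w}^1,\ldots,\mathbf{w}^m)$ is an $m$-dimensional standard Wiener process, $\theta \in \mathbb{R}^p$ is a vector of $p$ parameters, $\{\mathbf{e}_{t_k} : \mathbf{e}_{t_k} \sim \mathcal{N}(\mathbf{0}, \Pi_{t_k})\}_{k=0,\ldots,M-1}$ is a sequence of $r$-dimensional i.i.d. Gaussian random vectors independent of $\mathbf{w}$, $\Pi_{t_k}$ is an $r \times r$ positive definite matrix, and $\mathbf{C}$ is an $r \times d$ matrix.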
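As a rough illustration of the model (1)-(2), the following sketch simulates a scalar state equation with the Euler-Maruyama scheme and samples it with additive Gaussian noise. The function name `simulate_observations`, the Ornstein-Uhlenbeck drift and diffusion, and all numerical values are assumptions chosen for the example; they are not taken from the article.

```python
import math
import random

def simulate_observations(f, g, theta, x0, t0, delta, n_obs,
                          noise_var, substeps=100, seed=0):
    """Simulate dx = f dt + g dw by Euler-Maruyama and record noisy samples.

    Scalar case of the state space model with C = 1: z_k = x(t_k) + e_k,
    where e_k ~ N(0, noise_var). All names here are illustrative.
    """
    rng = random.Random(seed)
    h = delta / substeps                      # integration step between samples
    t, x = t0, x0
    times, states, obs = [], [], []
    for _ in range(n_obs):
        times.append(t)
        states.append(x)
        obs.append(x + rng.gauss(0.0, math.sqrt(noise_var)))
        for _ in range(substeps):             # advance the state to the next sample
            dw = rng.gauss(0.0, math.sqrt(h))
            x = x + f(t, x, theta) * h + g(t, x, theta) * dw
            t += h
    return times, states, obs

# Hypothetical example: Ornstein-Uhlenbeck process dx = -a*x dt + s dw.
theta = {"a": 1.0, "s": 0.5}
times, states, obs = simulate_observations(
    f=lambda t, x, th: -th["a"] * x,
    g=lambda t, x, th: th["s"],
    theta=theta, x0=1.0, t0=0.0, delta=0.1, n_obs=50, noise_var=0.01)
```

The inner loop uses a step much smaller than the sampling period so that the simulated path is a reasonable stand-in for the continuous-time process between observation times.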
Statistical problem to solve
Once the dynamics of a phenomenon is described by a state equation such as (1), and the way the state variables are measured is specified by an observation equation such as (2), the inference problem to solve is the following:[2] given $M$ partial and noisy observations $\mathbf{z}_{t_0},\ldots,\mathbf{z}_{t_{M-1}}$ of the stochastic process $\mathbf{x}$ on the observation times $t_0,\ldots,t_{M-1}$, estimate the unobserved state variables of $\mathbf{x}$ and the unknown parameters $\theta$ in (1) that best fit the given observations.
Discrete-time innovation process
Let $\{t\}_M$ be the sequence of $M$ observation times $t_0,\ldots,t_{M-1}$ of the states of (1), and $Z_\rho = \{\mathbf{z}_{t_k} : t_k \leq \rho,\ t_k \in \{t\}_M\}$ the time series of partial and noisy measurements of $\mathbf{x}$ described by the observation equation (2).

Further, let $\mathbf{x}_{t/\rho} = E(\mathbf{x}(t)|Z_\rho)$ and $\mathbf{U}_{t/\rho} = E(\mathbf{x}(t)\mathbf{x}^\intercal(t)|Z_\rho) - \mathbf{x}_{t/\rho}\mathbf{x}_{t/\rho}^\intercal$ be the conditional mean and variance of $\mathbf{x}$ with $\rho \leq t$, where $E(\cdot)$ denotes the expected value of random vectors.

The random sequence $\{\nu_{t_k}\}_{k=1,\ldots,M-1}$ with

$\nu_{t_k} = \mathbf{z}_{t_k} - \mathbf{C}\mathbf{x}_{t_k/t_{k-1}}(\theta)$, (3)

defines the discrete-time innovation process,[3] where $\nu_{t_k}$ is proved to be an independent normally distributed random vector with zero mean and variance

$\Sigma_{t_k} = \mathbf{C}\mathbf{U}_{t_k/t_{k-1}}(\theta)\mathbf{C}^\intercal + \Pi_{t_k}$, (4)

for small enough $\Delta = \max_k\{t_{k+1}-t_k\}$, with $t_k, t_{k+1} \in \{t\}_M$. In practice,[4] this distribution for the discrete-time innovation is valid when, with a suitable selection of both the number $M$ of observations and the time distance $t_{k+1}-t_k$ between consecutive observations, the time series of observations $\mathbf{z}_{t_0},\ldots,\mathbf{z}_{t_{M-1}}$ of the SDE contains the main information about the continuous-time process $\mathbf{x}$. That is, when the sampling of the continuous-time process $\mathbf{x}$ has low distortion (aliasing) and when there is a suitable signal-to-noise ratio.
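For a linear SDE with additive noise, the conditional mean and variance that enter the innovation and its variance are available in closed form, so the innovation process can be computed exactly with a Kalman-type recursion. The sketch below does this for a scalar Ornstein-Uhlenbeck model observed with noise. The model, the function name `ou_innovations` and the sample values are assumptions made for illustration.

```python
import math

def ou_innovations(obs, delta, a, s, noise_var, x0, u0):
    """Innovations and their variances for dx = -a*x dt + s dw, z = x + e.

    Assumes a > 0, a constant sampling period delta, C = 1, and given
    filter estimates (x0, u0) at the first observation time. Returns a
    list of (nu_k, Sigma_k) pairs for k = 1, ..., M-1.
    """
    decay = math.exp(-a * delta)
    # Exact variance contributed by the noise over one sampling period.
    process_var = s * s * (1.0 - decay * decay) / (2.0 * a)
    x_filt, u_filt = x0, u0
    pairs = []
    for z in obs[1:]:
        x_pred = decay * x_filt                       # conditional mean prediction
        u_pred = decay * decay * u_filt + process_var  # conditional variance prediction
        nu = z - x_pred                               # innovation
        sigma = u_pred + noise_var                    # innovation variance
        pairs.append((nu, sigma))
        gain = u_pred / sigma                         # filter update
        x_filt = x_pred + gain * nu
        u_filt = u_pred - gain * u_pred
    return pairs

# Hypothetical data: four observations sampled every 0.5 time units.
pairs = ou_innovations([1.0, 0.7, 0.4, 0.35], delta=0.5, a=1.0, s=0.5,
                       noise_var=0.01, x0=1.0, u0=0.0)
```

Each innovation is the part of the new observation that the model could not predict from the past data, which is why a well-specified model should leave innovations that look like independent Gaussian noise.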
Innovation estimator
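The innovation estimator for the parameters of the SDE (1) is the one that maximizes the likelihood function of the discrete-time innovation process $\{\nu_{t_k}\}_{k=1,\ldots,M-1}$ with respect to the parameters. More precisely, given $M$ measurements $Z_{t_{M-1}}$ of the state space model (1)-(2) with $\theta = \theta_0$ on $\{t\}_M$, the innovation estimator for the parameters $\theta_0$ of (1) is defined by

$\widehat{\theta}_M = \arg\{\min_{\theta} U_M(\theta, Z_{t_{M-1}})\}$, (5)

where

$U_M(\theta, Z_{t_{M-1}}) = (M-1)\ln(2\pi) + \sum_{k=1}^{M-1}\left[\ln(\det(\Sigma_{t_k})) + \nu_{t_k}^\intercal \Sigma_{t_k}^{-1}\nu_{t_k}\right]$,

with $\nu_{t_k}$ the discrete-time innovation (3) and $\Sigma_{t_k}$ the innovation variance (4) of the model (1)-(2) at $t_k$, for all $k = 1,\ldots,M-1$. In the above expression for $U_M(\theta, Z_{t_{M-1}})$, the conditional mean $\mathbf{x}_{t_k/t_{k-1}}(\theta)$ and variance $\mathbf{U}_{t_k/t_{k-1}}(\theta)$ are computed by the continuous-discrete filtering algorithm for the evolution of the moments (Section 6.4 in), for all $k = 1,\ldots,M-1$.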
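The minimization that defines the innovation estimator can be sketched for scalar innovations as follows. The objective sums the log-variance and the normalized squared innovation over all observation times, and a plain grid search stands in for a proper optimizer. The helper names, the grid, and the noise-free synthetic data are assumptions made for this example only.

```python
import math

def innovation_objective(pairs):
    """U_M for scalar innovations: pairs is a list of (nu_k, Sigma_k)."""
    total = len(pairs) * math.log(2.0 * math.pi)
    for nu, sigma in pairs:
        total += math.log(sigma) + nu * nu / sigma
    return total

def grid_innovation_estimator(candidates, innovations_for):
    """Return the candidate parameter with the smallest objective value.

    innovations_for(theta) must return the (nu_k, Sigma_k) pairs produced
    by a filter run with parameter theta (hypothetical callback).
    """
    return min(candidates,
               key=lambda th: innovation_objective(innovations_for(th)))

# Hypothetical toy problem: dx = -a*x dt + s dw observed without noise,
# sampled every 0.5 time units, with data lying exactly on exp(-t).
z = [math.exp(-0.5 * k) for k in range(4)]

def innovations_for(a, s=0.1, delta=0.5):
    sigma = s * s * (1.0 - math.exp(-2.0 * a * delta)) / (2.0 * a)
    return [(z[k] - math.exp(-a * delta) * z[k - 1], sigma)
            for k in range(1, len(z))]

best_a = grid_innovation_estimator([0.5, 1.0, 1.5], innovations_for)
```

In applications the grid search would normally be replaced by a local or global numerical optimizer, since each evaluation of the objective requires a full filter pass over the data.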
Differences with the maximum likelihood estimator
The maximum likelihood estimator of the parameters $\theta$ in the model (1)-(2) involves the evaluation of the (usually unknown) transition density function $p_\theta(t_{k+1}-t_k, \mathbf{x}(t_k), \mathbf{x}(t_{k+1}))$ between the states $\mathbf{x}(t_k)$ and $\mathbf{x}(t_{k+1})$ of the diffusion process $\mathbf{x}$ for all the observation times $t_k$ and $t_{k+1}$.[5] Instead, the innovation estimator (5) is obtained by maximizing the likelihood of the discrete-time innovation process $\{\nu_{t_k}\}_{k=1,\ldots,M-1}$, taking into account that $\nu_{t_1},\ldots,\nu_{t_{M-1}}$ are Gaussian and independent random vectors. Remarkably, whereas the transition density function $p_\theta(t_{k+1}-t_k, \mathbf{x}(t_k), \mathbf{x}(t_{k+1}))$ changes when the SDE for $\mathbf{x}$ does, the transition density function $\mathfrak{p}_\theta(t_{k+1}-t_k, \nu_{t_k}, \nu_{t_{k+1}})$ for the innovation process remains Gaussian independently of the SDE for $\mathbf{x}$. Only when the diffusion $\mathbf{x}$ is described by a linear SDE with additive noise is the density function $p_\theta(t_{k+1}-t_k, \mathbf{x}(t_k), \mathbf{x}(t_{k+1}))$ Gaussian and equal to $\mathfrak{p}_\theta(t_{k+1}-t_k, \nu_{t_k}, \nu_{t_{k+1}})$, so that the maximum likelihood and the innovation estimators coincide.[6] Otherwise, the innovation estimator is an approximation to the maximum likelihood estimator and, in this sense, the innovation estimator is a Quasi-Maximum Likelihood estimator. In addition, the innovation method is a particular instance of the Prediction Error method according to the definition given in [7]. Therefore, the asymptotic results obtained for that general class of estimators are valid for the innovation estimators.[8][9] Intuitively, following the typical control engineering viewpoint, the innovation process, viewed as a measure of the prediction errors of the fitted model, is expected to be approximately a white noise process when the model fits the data,[10] which can be used as a practical tool for designing models and for optimal experimental design.
Properties
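The innovation estimator (5) has a number of important attributes:
- The value $-U_M(\widehat{\theta}_M, Z_{t_{M-1}})$ of the innovation estimator (5) can be used to compute the Akaike or Bayesian information criterion.
- The confidence limits $\widehat{\theta}_M \pm \triangle$ for the innovation estimator $\widehat{\theta}_M$ are estimated with $\triangle = t_{1-\alpha,M-p}\sqrt{\dfrac{\operatorname{diag}(\operatorname{Var}(\widehat{\theta}_M))}{M-p}}$, where $t_{1-\alpha,M-p}$ is the Student's t-distribution with $100(1-\alpha)\%$ significance level and $M-p$ degrees of freedom. Here, $\operatorname{Var}(\widehat{\theta}_M) = (I(\widehat{\theta}_M))^{-1}$ denotes the variance of the innovation estimator $\widehat{\theta}_M$, where $I(\widehat{\theta}_M) = \sum_{k=1}^{M-1} I_k(\widehat{\theta}_M)$ is the Fisher information matrix of the innovation estimator $\widehat{\theta}_M$ of $\theta_0$, and $[I_k(\widehat{\theta}_M)]_{m,n} = \dfrac{\partial\mu^\intercal}{\partial\theta_m}\Sigma^{-1}\dfrac{\partial\mu}{\partial\theta_n} + \dfrac{1}{2}\operatorname{trace}\left(\Sigma^{-1}\dfrac{\partial\Sigma}{\partial\theta_m}\Sigma^{-1}\dfrac{\partial\Sigma}{\partial\theta_n}\right)$ is the entry $(m,n)$ of the matrix $I_k(\widehat{\theta}_M)$ with $\mu = \mathbf{C}\mathbf{x}_{t_k/t_{k-1}}(\widehat{\theta}_M)$ and $\Sigma = \Sigma_{t_k}(\widehat{\theta}_M)$, for $1 \leq m,n \leq p$.
- The distribution of the fitting-innovation process $\{\nu_{t_k} : \nu_{t_k} = \mathbf{z}_{t_k} - \mathbf{C}\mathbf{x}_{t_k/t_{k-1}}(\widehat{\theta}_M)\}_{k=1,\ldots,M-1}$ measures the goodness of fit of the model to the data.
- For a smooth enough function $\mathbf{h}$, nonlinear observation equations of the form $\mathbf{z}_{t_k} = \mathbf{h}(t_k, \mathbf{x}(t_k)) + \mathbf{e}_{t_k}$ (6) can be transformed to the simpler one (2), and the innovation estimator (5) can be applied.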
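Two of the attributes listed above can be illustrated with a few lines of code: turning the minimized objective into information criteria, and checking whether the fitted innovations behave like white noise. The sketch assumes scalar innovations, treats the minimized objective as minus twice the Gaussian log-likelihood of the innovations, and uses the number of innovations as the sample size in the Bayesian criterion. These conventions and all function names are assumptions made for the example.

```python
import math

def information_criteria(u_min, n_params, n_innovations):
    """AIC and BIC from the minimized objective (taken as -2 log-likelihood)."""
    aic = u_min + 2.0 * n_params
    bic = u_min + n_params * math.log(n_innovations)
    return aic, bic

def standardized_innovations(pairs):
    """Scale each scalar innovation nu_k by the square root of Sigma_k."""
    return [nu / math.sqrt(sigma) for nu, sigma in pairs]

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation; values near zero suggest white noise."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

# Hypothetical fitted innovations (nu_k, Sigma_k) and minimized objective.
fitted = [(0.10, 0.04), (-0.25, 0.04), (0.05, 0.04), (0.18, 0.04), (-0.12, 0.04)]
aic, bic = information_criteria(u_min=-3.2, n_params=2, n_innovations=len(fitted))
rho1 = lag1_autocorrelation(standardized_innovations(fitted))
```

A markedly non-zero autocorrelation, or standardized innovations whose spread is far from one, would point to a model that does not fit the data well.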
Approximate Innovation estimators
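In practice, closed-form expressions for computing $\mathbf{x}_{t_k/t_{k-1}}(\theta)$ and $\mathbf{U}_{t_k/t_{k-1}}(\theta)$ in (5) are only available for a few models (1)-(2). Therefore, approximate filtering algorithms such as the following are used in applications.

Given $M$ measurements $Z_{t_{M-1}}$ and the initial filter estimates $\mathbf{y}_{t_0/t_0} = \mathbf{x}_{t_0/t_0}$, $\mathbf{V}_{t_0/t_0} = \mathbf{U}_{t_0/t_0}$, the approximate Linear Minimum Variance (LMV) filter for the model (1)-(2) is iteratively defined at each observation time $t_k \in \{t\}_M$ by the prediction estimates[12]

$\mathbf{y}_{t_{k+1}/t_k} = E(\mathbf{y}(t_{k+1})|Z_{t_k})$ and $\mathbf{V}_{t_{k+1}/t_k} = E(\mathbf{y}(t_{k+1})\mathbf{y}^\intercal(t_{k+1})|Z_{t_k}) - \mathbf{y}_{t_{k+1}/t_k}\mathbf{y}_{t_{k+1}/t_k}^\intercal$, (7)

with initial conditions $\mathbf{y}_{t_k/t_k}$ and $\mathbf{V}_{t_k/t_k}$, and the filter estimates

$\mathbf{y}_{t_{k+1}/t_{k+1}} = \mathbf{y}_{t_{k+1}/t_k} + \mathbf{K}_{t_{k+1}}(\mathbf{z}_{t_{k+1}} - \mathbf{C}\mathbf{y}_{t_{k+1}/t_k})$ and $\mathbf{V}_{t_{k+1}/t_{k+1}} = \mathbf{V}_{t_{k+1}/t_k} - \mathbf{K}_{t_{k+1}}\mathbf{C}\mathbf{V}_{t_{k+1}/t_k}$, (8)

with filter gain

$\mathbf{K}_{t_{k+1}} = \mathbf{V}_{t_{k+1}/t_k}\mathbf{C}^\intercal(\mathbf{C}\mathbf{V}_{t_{k+1}/t_k}\mathbf{C}^\intercal + \Pi_{t_{k+1}})^{-1}$

for all $t_k, t_{k+1} \in \{t\}_M$, where $\mathbf{y}$ is an approximation to the solution $\mathbf{x}$ of (1) on the observation times $\{t\}_M$.

Given $M$ measurements $Z_{t_{M-1}}$ of the state space model (1)-(2) with $\theta = \theta_0$ on $\{t\}_M$, the approximate innovation estimator for the parameters $\theta_0$ of (1) is defined by

$\widehat{\vartheta}_M = \arg\{\min_{\theta} \widetilde{U}_M(\theta, Z_{t_{M-1}})\}$, (9)

where

$\widetilde{U}_M(\theta, Z_{t_{M-1}}) = (M-1)\ln(2\pi) + \sum_{k=1}^{M-1}\left[\ln(\det(\widetilde{\Sigma}_{t_k})) + \widetilde{\nu}_{t_k}^\intercal(\widetilde{\Sigma}_{t_k})^{-1}\widetilde{\nu}_{t_k}\right]$,

with

$\widetilde{\nu}_{t_k} = \mathbf{z}_{t_k} - \mathbf{C}\mathbf{y}_{t_k/t_{k-1}}(\theta)$ and $\widetilde{\Sigma}_{t_k} = \mathbf{C}\mathbf{V}_{t_k/t_{k-1}}(\theta)\mathbf{C}^\intercal + \Pi_{t_k}$

approximations to the discrete-time innovation (3) and innovation variance (4), respectively, resulting from the filtering algorithm (7)-(8).

For models with complete observations free of noise (i.e., with $\mathbf{C} = \mathbf{I}$ and $\Pi_{t_k} = \mathbf{0}$ in (2)), the approximate innovation estimator (9) reduces to the known Quasi-Maximum Likelihood estimators for SDEs.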
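One pass of an approximate LMV filter can be sketched in the scalar case as a prediction followed by the gain-based update. In the sketch below the prediction comes from a single Euler step with a first-order linearization of the drift, which is just one simple choice of approximation and is an assumption of this example, as are the function names and numbers.

```python
def euler_prediction(y_filt, v_filt, t, delta, f, dfdx, g):
    """Approximate prediction of mean and variance over one sampling period.

    Uses one Euler step of dx = f dt + g dw and linearizes f around the
    current filter estimate (assumed approximation, scalar state).
    """
    y_pred = y_filt + f(t, y_filt) * delta
    v_pred = (1.0 + dfdx(t, y_filt) * delta) ** 2 * v_filt + g(t, y_filt) ** 2 * delta
    return y_pred, v_pred

def lmv_update(y_pred, v_pred, z, c, noise_var):
    """Filter estimates from predictions, for scalar observation z = c*x + e."""
    sigma = c * v_pred * c + noise_var      # approximate innovation variance
    gain = v_pred * c / sigma               # filter gain
    nu = z - c * y_pred                     # approximate innovation
    y_filt = y_pred + gain * nu
    v_filt = v_pred - gain * c * v_pred
    return y_filt, v_filt, nu, sigma

# Hypothetical model dx = -x dt + 0.5 dw, observed directly with noise.
y_pred, v_pred = euler_prediction(1.0, 0.0, t=0.0, delta=0.1,
                                  f=lambda t, x: -x,
                                  dfdx=lambda t, x: -1.0,
                                  g=lambda t, x: 0.5)
y_filt, v_filt, nu, sigma = lmv_update(y_pred, v_pred, z=0.95, c=1.0, noise_var=0.01)
```

Repeating these two steps over all observation times yields the approximate innovations and variances that enter the objective of the approximate innovation estimator.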
Main conventional-type estimators
Conventional-type innovation estimators are those estimators (9) derived from conventional-type continuous-discrete or discrete-discrete approximate filtering algorithms. Among the approximate continuous-discrete filters, there are innovation estimators based on Local Linearization (LL) filters,[13] on the extended Kalman filter,[14] [15] and on second-order filters. Approximate innovation estimators based on discrete-discrete filters result from the discretization of the SDE (1) by means of a numerical scheme.[16] [17] Typically, the effectiveness of these innovation estimators is directly related to the stability of the filtering algorithms involved.
A shared drawback of these conventional-type filters is that, once the observations are given, the error between the approximate and the exact innovation process is fixed and completely settled by the time distance between observations. This might set a large bias of the approximate innovation estimators in some applications, bias that cannot be corrected by increasing the number of observations. However, the conventional-type innovation estimators are useful in many practical situations for which only medium or low accuracy for the parameter estimation is required.
Order-β innovation estimators
Let us consider the finer time discretization $(\tau)_{h>0} = \{\tau_n : \tau_{n+1} - \tau_n \leq h \text{ for } n = 0,1,\ldots,N\}$ of the time interval $[t_0, t_{M-1}]$ satisfying the condition $(\tau)_h \supset \{t\}_M$. Further, let $\mathbf{y}_n$ be the approximate value of $\mathbf{x}(\tau_n)$ obtained from a discretization of the equation (1) for all $\tau_n \in (\tau)_h$, and

$\mathbf{y} = \{\mathbf{y}(t),\ t \in [t_0, t_{M-1}] : \mathbf{y}(\tau_n) = \mathbf{y}_n \text{ for all } \tau_n \in (\tau)_h\}$ (10)

a continuous-time approximation to $\mathbf{x}$.

An order-$\beta$ LMV filter is an approximate LMV filter for which $\mathbf{y}$ is an order-$\beta$ weak approximation to $\mathbf{x}$ satisfying (10) and the weak convergence condition

$\sup_{t_k \leq t \leq t_{k+1}} \left\vert E\left(g(\mathbf{x}(t))|Z_{t_k}\right) - E\left(g(\mathbf{y}(t))|Z_{t_k}\right)\right\vert \leq L_k h^\beta$

for all $t_k, t_{k+1} \in \{t\}_M$ and any $2(\beta+1)$ times continuously differentiable function $g : \mathbb{R}^d \to \mathbb{R}$ for which $g$ and all its partial derivatives up to order $2(\beta+1)$ have polynomial growth, $L_k$ being a positive constant. This order-$\beta$ LMV filter converges with rate $\beta$ to the exact LMV filter as $h$ goes to zero, where $h$ is the maximum stepsize of the time discretization $(\tau)_h \supset \{t\}_M$ on which the approximation $\mathbf{y}$ to $\mathbf{x}$ is defined.

An order-$\beta$ innovation estimator is an approximate innovation estimator (9) for which the approximations to the discrete-time innovation (3) and innovation variance (4) result from an order-$\beta$ LMV filter.
Approximations $\mathbf{y}$ of any kind converging to $\mathbf{x}$ in a weak sense (e.g., those in [18]) can be used to design an order-$\beta$ LMV filter and, consequently, an order-$\beta$ innovation estimator. These order-$\beta$ innovation estimators are intended for the recurrent practical situation in which a diffusion process must be identified from a small number of observations that are distant in time, or in which high accuracy for the estimated parameters is required.
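The idea of computing filter predictions from a weak approximation on a finer time grid can be sketched with stochastic simulation. The example below uses the Euler-Maruyama scheme, which under standard smoothness conditions has weak order 1, and estimates the predicted mean and variance over one sampling interval by Monte Carlo. The Gaussian sampling of the initial filter distribution, the function name and the fixed number of paths are assumptions of this sketch, not the adaptive algorithms mentioned in the article.

```python
import math
import random

def simulated_prediction(f, g, y0, v0, t0, delta, h, n_paths=2000, seed=0):
    """Monte Carlo prediction of mean and variance over one sampling interval.

    Each path starts from a draw of N(y0, v0) (assumed Gaussian filter
    distribution) and is advanced with Euler-Maruyama steps of size <= h.
    """
    rng = random.Random(seed)
    steps = max(1, round(delta / h))
    step = delta / steps                    # uniform step that hits t0 + delta
    finals = []
    for _ in range(n_paths):
        y = y0 + math.sqrt(v0) * rng.gauss(0.0, 1.0)
        t = t0
        for _ in range(steps):
            y += f(t, y) * step + g(t, y) * rng.gauss(0.0, math.sqrt(step))
            t += step
        finals.append(y)
    mean = sum(finals) / n_paths
    var = sum((y - mean) ** 2 for y in finals) / n_paths
    return mean, var

# Hypothetical model dx = -x dt + 0.3 dw, predictions on two grids.
coarse = simulated_prediction(lambda t, y: -y, lambda t, y: 0.3,
                              y0=1.0, v0=0.0, t0=0.0, delta=1.0, h=1.0)
fine = simulated_prediction(lambda t, y: -y, lambda t, y: 0.3,
                            y0=1.0, v0=0.0, t0=0.0, delta=1.0, h=0.05)
```

With the coarse grid the prediction uses a single step per sampling interval, as a conventional discrete-discrete filter would; refining the grid reduces the discretization error independently of how far apart the observations are, at the price of more simulation work.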
Properties
An order-$\beta$ innovation estimator $\widehat{\vartheta}_M(h)$ has a number of important properties:
- For a given set $Z_{t_{M-1}}$ of $M$ observations, $\widehat{\vartheta}_M(h)$ converges to the exact innovation estimator $\widehat{\theta}_M$ as the maximum stepsize $h$ of the time discretization $(\tau)_h \supset \{t\}_M$ goes to zero.
- For a finite sample of $M$ observations, the expected value of $\widehat{\vartheta}_M(h)$ converges to the expected value of the exact innovation estimator $\widehat{\theta}_M$ as $h$ goes to zero.
- For an increasing number of observations, $\widehat{\vartheta}_M(h)$ is asymptotically normally distributed and its bias decreases when $h$ goes to zero.
- As with the convergence of the order-$\beta$ LMV filter to the exact LMV filter, the convergence and asymptotic properties of $\widehat{\vartheta}_M(h)$ impose no constraints on the time distance $t_{k+1}-t_k$ between two consecutive observations $t_k$ and $t_{k+1}$, nor on the time discretization $(\tau)_h \supset \{t\}_M$.
- Approximations for the Akaike or Bayesian information criterion and confidence limits are directly obtained by replacing the exact estimator $\widehat{\theta}_M$ by its approximation $\widehat{\vartheta}_M(h)$. These approximations converge to the corresponding exact ones when the maximum stepsize $h$ of the time discretization $(\tau)_h \supset \{t\}_M$ goes to zero.
- The distribution of the approximate fitting-innovation process $\{\widetilde{\nu}_{t_k} : \widetilde{\nu}_{t_k} = \mathbf{z}_{t_k} - \mathbf{C}\mathbf{y}_{t_k/t_{k-1}}(\widehat{\vartheta}_M(h))\}_{k=1,\ldots,M-1}$ measures the goodness of fit of the model to the data, which is also used as a practical tool for designing models and for optimal experimental design.
- For a smooth enough function $\mathbf{h}$, nonlinear observation equations of the form (6) can be transformed to the simpler one (2), and the order-$\beta$ innovation estimator can be applied.
Figure 1 presents the histograms of the differences $(\widehat{\alpha}_M - \widehat{\alpha}^D_{\Delta,M}, \widehat{\sigma}_M - \widehat{\sigma}^D_{\Delta,M})$ and $(\widehat{\alpha}_M - \widehat{\alpha}_{h,M}, \widehat{\sigma}_M - \widehat{\sigma}_{h,M})$ between the exact innovation estimator $(\widehat{\alpha}_M, \widehat{\sigma}_M)$ and the conventional $(\widehat{\alpha}^D_{\Delta,M}, \widehat{\sigma}^D_{\Delta,M})$ and order-$\beta$ $(\widehat{\alpha}_{h,M}, \widehat{\sigma}_{h,M})$ innovation estimators for the parameters $\alpha$ and $\sigma$ of the equation

$dx = \alpha t x\,dt + \sigma\sqrt{t}\,x\,dw$ (11)

obtained from 100 time series $\{z_{t_k}\}_{k=0,\ldots,M-1}$ of $M = 10$ noisy observations

$z_{t_k} = x(t_k) + e_{t_k}$, for $k = 0,1,\ldots,M-1$, (12)

of $x$ on the observation times $\{t\}_{M=10} = \{t_k = 0.5 + k\Delta : k = 0,\ldots,M-1\}$. The classical and the order-$\beta$ Local Linearization filters of the innovation estimators $(\widehat{\alpha}^D_{\Delta,M}, \widehat{\sigma}^D_{\Delta,M})$ and $(\widehat{\alpha}_{h,M}, \widehat{\sigma}_{h,M})$ are defined on the uniform time discretizations $(\tau)_{h=\Delta} \equiv \{t\}_M$ and $(\tau)_{h=\Delta/2,\Delta/8,\Delta/32} = \{\tau_n : \tau_n = 0.5 + nh,\ n = 0,\ldots,N\}$, respectively. The number of stochastic simulations of the order-$\beta$ Local Linearization filter is estimated via an adaptive sampling algorithm with moderate tolerance. Figure 1 illustrates the convergence of the order-$\beta$ innovation estimator $(\widehat{\alpha}_{h,M}, \widehat{\sigma}_{h,M})$ to the exact innovation estimator $(\widehat{\alpha}_M, \widehat{\sigma}_M)$ as $h$ decreases, which substantially improves the estimation provided by the conventional innovation estimator $(\widehat{\alpha}^D_{\Delta,M}, \widehat{\sigma}^D_{\Delta,M})$.
Deterministic approximations
The order-$\beta$ innovation estimators overcome the drawback of the conventional-type innovation estimators concerning the impossibility of reducing bias. However, a viable bias reduction with an order-$\beta$ innovation estimator might eventually require that the associated order-$\beta$ LMV filter perform a large number of stochastic simulations. In situations where only low or medium precision approximate estimators are needed, an alternative deterministic filter algorithm, called the deterministic order-$\beta$ LMV filter, can be obtained by tracking the first two conditional moments of the order-$\beta$ weak approximation $\mathbf{y}$ at all the time instants $\tau_n \in (\tau)_h$ in between two consecutive observation times $t_k$ and $t_{k+1}$. That is, the values of the predictions $\mathbf{y}_{t_{k+1}/t_k}$ and $\mathbf{V}_{t_{k+1}/t_k}$ in the filtering algorithm are computed recursively from these two moments, stepping through the time instants $\tau_n, \tau_{n+1} \in (\tau)_h \cap [t_k, t_{k+1}]$. The approximate innovation estimators $\widehat{\vartheta}_{h,M}$ defined with these deterministic order-$\beta$ LMV filters no longer converge to the exact innovation estimator, but allow a significant bias reduction in the estimated parameters for a given finite sample at a lower computational cost.
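A deterministic moment recursion of this kind is easy to write down for a scalar linear SDE, where the mean and variance of the Euler approximation obey closed recursions. The sketch below propagates both moments over one sampling interval on grids of decreasing stepsize and compares the predicted mean with the exact value. The linear model, the Euler scheme and the function name are assumptions chosen for illustration; nonlinear models would need a scheme-specific recursion.

```python
import math

def euler_moment_prediction(m0, v0, a, s, delta, h):
    """Mean and variance of the Euler scheme for dx = a*x dt + s dw.

    Starting from filter estimates (m0, v0), the two moments are advanced
    deterministically over one sampling interval with steps of size <= h.
    """
    steps = max(1, round(delta / h))
    step = delta / steps
    m, v = m0, v0
    for _ in range(steps):
        m = (1.0 + a * step) * m
        v = (1.0 + a * step) ** 2 * v + s * s * step
    return m, v

# Hypothetical case a = -1, s = 0.5, sampling period 1: exact mean is exp(-1).
exact_mean = math.exp(-1.0)
errors = [abs(euler_moment_prediction(1.0, 0.0, -1.0, 0.5, 1.0, h)[0] - exact_mean)
          for h in (1.0, 0.5, 0.25)]
```

The prediction error shrinks as the stepsize decreases even though the sampling period stays fixed, which mirrors the bias reduction described above, and no random numbers are needed.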
Figure 2 presents the histograms and the confidence limits of the approximate innovation estimators $(\widehat{\alpha}_{h,M}, \widehat{\sigma}_{h,M})$ and $(\widehat{\alpha}_{\cdot,M}, \widehat{\sigma}_{\cdot,M})$ for the parameters $\alpha$ and $\sigma$ of the Van der Pol oscillator with random frequency

$dx_1 = x_2\,dt$

$dx_2 = (-(x_1^2 - 1)x_2 - \alpha x_1)\,dt + \sigma x_1\,dw$ (14)

obtained from 100 time series $\{z_{t_k}\}_{k=0,\ldots,M-1}$ of $M = 30$ partial and noisy observations

$z_{t_k} = x_1(t_k) + e_{t_k}$, for $k = 0,1,\ldots,M-1$, (15)

of $\mathbf{x}$ on the observation times $\{t\}_{M=30} = \{t_k = k\Delta : k = 0,\ldots,M-1\}$. The deterministic order-$\beta$ Local Linearization filter of the innovation estimators $(\widehat{\alpha}_{h,M}, \widehat{\sigma}_{h,M})$ and $(\widehat{\alpha}_{\cdot,M}, \widehat{\sigma}_{\cdot,M})$ is defined, for each estimator, on uniform time discretizations $(\tau)_h = \{\tau_n : \tau_n = nh,\ n = 0,\ldots,N\}$ and on an adaptive time-stepping discretization $(\tau)_{\cdot}$ with moderate relative and absolute tolerances, respectively. Observe the bias reduction of the estimated parameters as $h$ decreases.
Software
A Matlab implementation of various approximate innovation estimators is provided by the SdeEstimation toolbox.[19] This toolbox has Local Linearization filters, including deterministic and stochastic options with fixed step sizes and sample numbers. It also offers adaptive time stepping and sampling algorithms, along with local and global optimization algorithms for innovation estimation. For models with complete observations free of noise, various approximations to the Quasi-Maximum Likelihood estimator are implemented in R.[20]
Notes and References
- Jazwinski A.H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.
- Nielsen J.N., Vestergaard M. (2000) "Estimation in continuous-time stochastic volatility models using nonlinear filters", International Journal of Theoretical and Applied Finance, 3(2), 279–308. doi:10.1142/S0219024900000139. ISSN 0219-0249.
- Kailath T., Lectures on Wiener and Kalman Filtering. New York: Springer-Verlag, 1981.
- Jimenez J.C., Yoshimoto A., Miwakeichi F. (2021) "State and parameter estimation of stochastic physical systems from uncertain and indirect measurements", The European Physical Journal Plus, 136(8), 869. doi:10.1140/epjp/s13360-021-01859-1. Bibcode 2021EPJP..136..869J. S2CID 238846267. ISSN 2190-5444.
- Schweppe F. (1965) "Evaluation of likelihood functions for Gaussian signals", IEEE Transactions on Information Theory, 11(1), 61–70. doi:10.1109/TIT.1965.1053737. ISSN 1557-9654.
- Jimenez J.C., Ozaki T. (2006) "An Approximate Innovation Method For The Estimation Of Diffusion Processes From Discrete Data", Journal of Time Series Analysis, 27(1), 77–97. doi:10.1111/j.1467-9892.2005.00454.x. S2CID 18072651. ISSN 0143-9782.
- Ljung L., System Identification, Theory for the User (2nd edn). Englewood Cliffs: Prentice Hall, 1999.
- Ljung L., Caines P.E. (1980) "Asymptotic normality of prediction error estimators for approximate system models", Stochastics, 3(1–4), 29–46. doi:10.1080/17442507908833135. S2CID 43397253. ISSN 0090-9491.
- Nolsoe K., Nielsen, J.N., Madsen H. (2000) "Prediction-based estimating function for diffusion processes with measurement noise", Technical Reports 2000, No. 10, Informatics and Mathematical Modelling, Technical University of Denmark.
- Ozaki T., Jimenez J.C., Haggan-Ozaki V. (2000) "The Role of the Likelihood Function in the Estimation of Chaos Models", Journal of Time Series Analysis, 21(4), 363–387. doi:10.1111/1467-9892.00189. S2CID 122681657. ISSN 0143-9782.
- Jimenez J.C. (2020) "Bias reduction in the estimation of diffusion processes from discrete observations", IMA Journal of Mathematical Control and Information, 37(4), 1468–1505. doi:10.1093/imamci/dnaa021. Retrieved 2023-07-06.
- Jimenez J.C. (2019) "Approximate linear minimum variance filters for continuous-discrete state space models: convergence and practical adaptive algorithms", IMA Journal of Mathematical Control and Information, 36(2), 341–378. doi:10.1093/imamci/dnx047. Retrieved 2023-07-06.
- Shoji I. (1998) "A comparative study of maximum likelihood estimators for nonlinear dynamical system models", International Journal of Control, 71(3), 391–404. doi:10.1080/002071798221731. ISSN 0020-7179.
- Nielsen J.N., Madsen H. (2001) "Applying the EKF to stochastic differential equations with level effects", Automatica, 37(1), 107–112. doi:10.1016/S0005-1098(00)00128-X. ISSN 0005-1098.
- Singer H. (2002) "Parameter Estimation of Nonlinear Stochastic Differential Equations: Simulated Maximum Likelihood versus Extended Kalman Filter and Itô-Taylor Expansion", Journal of Computational and Graphical Statistics, 11(4), 972–995. doi:10.1198/106186002808. S2CID 120719418. ISSN 1061-8600.
- Ozaki T., Iino M. (2001) "An innovation approach to non-Gaussian time series analysis", Journal of Applied Probability, 38(A), 78–92. doi:10.1239/jap/1085496593. S2CID 119422248. ISSN 0021-9002.
- Peng H., Ozaki T., Jimenez J.C. (2002) "Modeling and control for foreign exchange based on a continuous time stochastic microstructure model", Proceedings of the 41st IEEE Conference on Decision and Control, 2002, vol. 4, 4440–4445. doi:10.1109/CDC.2002.1185071. ISBN 0-7803-7516-5. S2CID 8239063. https://ieeexplore.ieee.org/document/1185071
- Kloeden P.E., Platen E., Numerical Solution of Stochastic Differential Equations, 3rd edn. Berlin: Springer, 1999.
- "GitHub - locallinearization/SdeEstimation", GitHub. Retrieved 2023-07-06.
- Iacus S.M., Simulation and inference for stochastic differential equations: with R examples, New York: Springer, 2008.