The Volterra series is a model for nonlinear behavior similar to the Taylor series. It differs from the Taylor series in its ability to capture "memory" effects. The Taylor series can be used to approximate the response of a nonlinear system to a given input if the output of the system depends strictly on the input at that particular time. In the Volterra series, the output of the nonlinear system depends on the input to the system at all other times as well. This provides the ability to capture the "memory" effect of devices like capacitors and inductors.
It has been applied in the fields of medicine (biomedical engineering) and biology, especially neuroscience.[1] It is also used in electrical engineering to model intermodulation distortion in many devices, including power amplifiers and frequency mixers. Its main advantage lies in its generalizability: it can represent a wide range of systems. Thus, it is sometimes considered a non-parametric model. In mathematics, a Volterra series denotes a functional expansion of a dynamic, nonlinear, time-invariant functional. Volterra series are frequently used in system identification. The Volterra series, which is used to prove the Volterra theorem, is an infinite sum of multidimensional convolutional integrals.
The Volterra series is a modernized version of the theory of analytic functionals due to the Italian mathematician Vito Volterra, in work dating from 1887.[2][3] Norbert Wiener became interested in this theory in the 1920s through his contact with Volterra's student Paul Lévy. Wiener applied his theory of Brownian motion to the integration of Volterra analytic functionals. The use of the Volterra series for system analysis originated from a restricted 1942 wartime report[4] of Wiener's, who was then a professor of mathematics at MIT. He used the series to make an approximate analysis of the effect of radar noise in a nonlinear receiver circuit. The report became public after the war.[5] As a general method of analysis of nonlinear systems, the Volterra series came into use after about 1957 as the result of a series of reports, at first privately circulated, from MIT and elsewhere.[6] The name Volterra series came into use a few years later.
The theory of the Volterra series can be viewed from two different perspectives: either one considers an operator mapping between two real (or complex) function spaces, or a functional mapping from a real (or complex) function space into the real (or complex) numbers. The latter, functional perspective is in more frequent use, due to the assumed time-invariance of the system.
A continuous time-invariant system with x(t) as input and y(t) as output can be expanded in the Volterra series as
y(t) = h_0 + \sum_{n=1}^{N} \int_a^b \cdots \int_a^b h_n(\tau_1, \dots, \tau_n) \prod_{j=1}^{n} x(t - \tau_j) \, d\tau_j.
Here the constant term h_0 on the right side is usually taken to be zero by suitable choice of the output level y. The function h_n(\tau_1, \dots, \tau_n) is called the n-th-order Volterra kernel and can be regarded as a higher-order impulse response of the system. For the representation to be unique, the kernels must be symmetrical in the n variables \tau_1, \dots, \tau_n. If a kernel is not symmetrical, it can be replaced by a symmetrized kernel, namely the average over the n! permutations of these n variables \tau.
If N is finite, the series is said to be truncated. If a, b, and N are finite, the series is called doubly finite.
Sometimes the n-th-order term is divided by n!, a convention which is convenient when taking the output of one Volterra system as the input of another ("cascading").
The causality condition: Since in any physically realizable system the output can only depend on previous values of the input, the kernels h_n(t_1, t_2, \ldots, t_n) will be zero if any of the variables t_1, t_2, \ldots, t_n is negative. The integrals may then be written over the half-infinite range from zero to infinity; hence, if the operator is causal, a \geq 0.
Fréchet's approximation theorem: The use of the Volterra series to represent a time-invariant functional relation is often justified by appealing to a theorem due to Fréchet. This theorem states that a time-invariant functional relation (satisfying certain very general conditions) can be approximated uniformly and to an arbitrary degree of precision by a sufficiently high finite-order Volterra series. Among other conditions, the set of admissible input functions x(t) for which the approximation will hold is required to be compact. The theorem, however, gives no indication of how many terms are needed for a good approximation, which is an essential question in applications.
The discrete-time case is similar to the continuous-time case, except that the integrals are replaced by summations:

y(n) = h_0 + \sum_{p=1}^{P} \sum_{\tau_1=a}^{b} \cdots \sum_{\tau_p=a}^{b} h_p(\tau_1, \dots, \tau_p) \prod_{j=1}^{p} x(n - \tau_j),

where P \in \{1, 2, \dots\} \cup \{\infty\}. Here h_p(\tau_1, \dots, \tau_p) are the discrete-time Volterra kernels. If P is finite, the series operator is said to be truncated; if a, b, and P are finite, the operator is called a doubly finite Volterra series; and if a \geq 0, the operator is said to be causal.
We can always consider, without loss of generality, the kernel h_p(\tau_1, \dots, \tau_p) as symmetrical. In fact, by the commutativity of multiplication it is always possible to symmetrize a kernel by forming a new kernel taken as the average of the kernels for all permutations of the variables \tau_1, \dots, \tau_p.
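This symmetrization is straightforward to carry out numerically. The following Python sketch (the helper name and array layout are illustrative assumptions, not from the literature) averages a kernel stored as a NumPy array over all permutations of its axes:

    import itertools
    import numpy as np

    def symmetrize(kernel):
        # Average the p-dimensional kernel over all p! permutations of its axes.
        p = kernel.ndim
        perms = list(itertools.permutations(range(p)))
        return sum(np.transpose(kernel, axes=perm) for perm in perms) / len(perms)

    # Example: a non-symmetric second-order kernel
    h2 = np.array([[1.0, 2.0],
                   [0.0, 3.0]])
    h2_sym = symmetrize(h2)  # [[1.0, 1.0], [1.0, 3.0]]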
For a causal system with symmetrical kernels we can rewrite the p-th-order term approximately in triangular form:

\sum_{\tau_1=0}^{M} \sum_{\tau_2=\tau_1}^{M} \cdots \sum_{\tau_p=\tau_{p-1}}^{M} h_p(\tau_1, \dots, \tau_p) \prod_{j=1}^{p} x(n - \tau_j).
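To make the discrete-time definition concrete, here is a minimal Python sketch that evaluates a causal, doubly finite second-order series (a = 0, b = M, P = 2); the function name and the kernel arrays h0, h1, h2 are illustrative placeholders:

    import numpy as np

    def volterra_output(x, h0, h1, h2):
        # Causal, doubly finite second-order Volterra series.
        # h1 has shape (M+1,); h2 has shape (M+1, M+1) and is assumed symmetric.
        M = len(h1) - 1
        y = np.full(len(x), h0, dtype=float)
        for n in range(len(x)):
            # Delayed inputs x(n), x(n-1), ..., x(n-M), zero-padded before n = 0.
            w = np.array([x[n - t] if n - t >= 0 else 0.0 for t in range(M + 1)])
            y[n] += h1 @ w        # first-order (linear convolution) term
            y[n] += w @ h2 @ w    # second-order term
        return y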
Estimating the Volterra coefficients individually is complicated, since the basis functionals of the Volterra series are correlated. This leads to the problem of simultaneously solving a set of integral equations for the coefficients. Hence, estimation of Volterra coefficients is generally performed by estimating the coefficients of an orthogonalized series, e.g. the Wiener series, and then recomputing the coefficients of the original Volterra series. The main appeal of the Volterra series over an orthogonalized series lies in its intuitive, canonical structure: all interactions of the input have one fixed degree. The orthogonalized basis functionals will generally be quite complicated.
An important aspect, with respect to which the following methods differ, is whether the orthogonalization of the basis functionals is performed over the idealized specification of the input signal (e.g. Gaussian white noise) or over the actual realization of the input (i.e. the pseudo-random, bounded, almost-white version of Gaussian white noise, or any other stimulus). The latter methods, despite their lack of mathematical elegance, have been shown to be more flexible (as arbitrary inputs can be easily accommodated) and more precise (since the idealized version of the input signal is not always realizable).
This method, developed by Lee and Schetzen, orthogonalizes with respect to the actual mathematical description of the signal, i.e. the projection onto the new basis functionals is based on the knowledge of the moments of the random signal.
We can write the Volterra series in terms of homogeneous operators as

y(n) = h_0 + \sum_{p=1}^{P} H_p x(n),

where

H_p x(n) = \sum_{\tau_1=a}^{b} \cdots \sum_{\tau_p=a}^{b} h_p(\tau_1, \dots, \tau_p) \prod_{j=1}^{p} x(n - \tau_j).
To allow identification by orthogonalization, the Volterra series must be rearranged in terms of orthogonal non-homogeneous G operators (Wiener series):

y(n) = \sum_p H_p x(n) \equiv \sum_p G_p x(n).
The G operators can be defined by the following:
E\{H_i x(n) \, G_j x(n)\} = 0, \quad i < j,

E\{G_i x(n) \, G_j x(n)\} = 0, \quad i \neq j,
whenever H_i x(n) is an arbitrary homogeneous Volterra operator and the input x(n) is stationary white noise (SWN) with zero mean and variance A.
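For concreteness, and as an illustration not spelled out above, the first three G operators for zero-mean SWN of variance A take the standard Lee–Schetzen form

G_0 x(n) = k_0,

G_1 x(n) = \sum_{\tau_1} k_1(\tau_1) \, x(n - \tau_1),

G_2 x(n) = \sum_{\tau_1} \sum_{\tau_2} k_2(\tau_1, \tau_2) \, x(n - \tau_1) \, x(n - \tau_2) - A \sum_{\tau_1} k_2(\tau_1, \tau_1),

where the k_p are called Wiener kernels.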
Recalling that every Volterra functional is orthogonal to all Wiener functionals of greater order, and considering the Volterra functional

H^*_{\overline{p}} x(n) = \prod_{i=1}^{\overline{p}} x(n - \tau_i),

we can write

E\left\{ y(n) \, H^*_{\overline{p}} x(n) \right\} = E\left\{ \sum_{p=0}^{\overline{p}} G_p x(n) \, H^*_{\overline{p}} x(n) \right\},

since the Wiener functionals of order greater than \overline{p} drop out by orthogonality.
If x is SWN, \tau_1 \neq \tau_2 \neq \ldots \neq \tau_{\overline{p}}, and A = \sigma_x^2, we have

E\left\{ y(n) \prod_{i=1}^{\overline{p}} x(n - \tau_i) \right\} = \overline{p}! \, A^{\overline{p}} \, k_{\overline{p}}(\tau_1, \dots, \tau_{\overline{p}}).

So, if we exclude the diagonal elements (\tau_i \neq \tau_j, \forall i, j), it is

k_p(\tau_1, \dots, \tau_p) = \frac{E\left\{ y(n) \, x(n - \tau_1) \cdots x(n - \tau_p) \right\}}{p! \, A^p}.
If we want to consider the diagonal elements, the solution proposed by Lee and Schetzen is

k_p(\tau_1, \dots, \tau_p) = \frac{E\left\{ \left( y(n) - \sum_{m=0}^{p-1} G_m x(n) \right) x(n - \tau_1) \cdots x(n - \tau_p) \right\}}{p! \, A^p}.
The main drawback of this technique is that the estimation errors made on all elements of the lower-order kernels will affect each diagonal element of order p by means of the summation \sum_{m=0}^{p-1} G_m x(n).
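In practice the expectations in these formulas are replaced by sample averages over a long SWN record. As a minimal numerical sketch of the crosscorrelation idea (the function name is illustrative), the zeroth- and first-order Wiener kernels can be estimated as follows:

    import numpy as np

    def lee_schetzen_k0_k1(x, y, M):
        # Estimate k0 and k1(tau), tau = 0..M, by crosscorrelation,
        # replacing expectations with sample averages over an SWN record.
        A = np.var(x)              # input variance, A = sigma_x^2
        k0 = np.mean(y)            # k0 = E{y(n)}
        k1 = np.empty(M + 1)
        for tau in range(M + 1):
            # k1(tau) = E{y(n) x(n - tau)} / A
            k1[tau] = np.mean(y[tau:] * x[:len(x) - tau]) / A
        return k0, k1

Higher-order kernels follow the same pattern, with the residual y(n) - \sum_{m<p} G_m x(n) used in place of y(n) at the diagonal points, as in the formula above.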
Once the Wiener kernels have been identified, the Volterra kernels can be obtained by using the Wiener-to-Volterra formulas, reported in the following for a fifth-order Volterra series:
h_5 = k_5,

h_4 = k_4,

h_3 = k_3 - 10 A \sum_{\tau_4} k_5(\tau_1, \tau_2, \tau_3, \tau_4, \tau_4),

h_2 = k_2 - 6 A \sum_{\tau_3} k_4(\tau_1, \tau_2, \tau_3, \tau_3),

h_1 = k_1 - 3 A \sum_{\tau_2} k_3(\tau_1, \tau_2, \tau_2) + 15 A^2 \sum_{\tau_2} \sum_{\tau_3} k_5(\tau_1, \tau_2, \tau_2, \tau_3, \tau_3),

h_0 = k_0 - A \sum_{\tau_1} k_2(\tau_1, \tau_1) + 3 A^2 \sum_{\tau_1} \sum_{\tau_2} k_4(\tau_1, \tau_1, \tau_2, \tau_2).
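These conversion formulas translate directly into code. The sketch below (function and array names are illustrative) implements them for a series truncated at third order, so that the k_4 and k_5 correction terms vanish:

    import numpy as np

    def wiener_to_volterra_3rd(k0, k1, k2, k3, A):
        # Convert Wiener kernels (orders 0..3, identified with input
        # variance A) to Volterra kernels for a third-order truncation.
        h3 = k3
        h2 = k2
        # h1(tau1) = k1(tau1) - 3A * sum_{tau2} k3(tau1, tau2, tau2)
        h1 = k1 - 3.0 * A * np.einsum('ijj->i', k3)
        # h0 = k0 - A * sum_{tau1} k2(tau1, tau1)
        h0 = k0 - A * np.trace(k2)
        return h0, h1, h2, h3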
In the traditional orthogonal algorithm, using inputs with high \sigma_x has the advantage of stimulating high-order nonlinearities, so as to achieve a more accurate identification of the high-order kernels. As a drawback, the use of high \sigma_x values causes high identification errors in the lower-order kernels, mainly due to the nonideality of the input and to truncation errors. On the contrary, the use of a lower \sigma_x in the identification process can lead to a better estimation of the lower-order kernels, but can be insufficient to stimulate the high-order nonlinearities.
This phenomenon, which can be called locality of the truncated Volterra series, can be revealed by calculating the output error of a series as a function of different variances of the input. This test can be repeated with series identified with different input variances, obtaining different curves, each with a minimum in correspondence of the variance used in the identification.
To overcome this limitation, a low \sigma_x value should be used for the lower-order kernels and gradually increased for the higher-order kernels; the identification formulas must then account for the use of different variances.
The traditional Wiener kernel identification should be changed as follows:[9]
k_0^{(0)} = E\{y^{(0)}(n)\},

k_1^{(1)}(\tau_1) = \frac{1}{A_1} E\left\{ y^{(1)}(n) \, x^{(1)}(n - \tau_1) \right\},

k_2^{(2)}(\tau_1, \tau_2) = \frac{1}{2 A_2^2} \left\{ E\left\{ y^{(2)}(n) \prod_{i=1}^{2} x^{(2)}(n - \tau_i) \right\} - A_2 \, k_0^{(2)} \, \delta_{\tau_1 \tau_2} \right\},

k_3^{(3)}(\tau_1, \tau_2, \tau_3) = \frac{1}{6 A_3^3} \left\{ E\left\{ y^{(3)}(n) \prod_{i=1}^{3} x^{(3)}(n - \tau_i) \right\} - A_3^2 \left[ k_1^{(3)}(\tau_1) \, \delta_{\tau_2 \tau_3} + k_1^{(3)}(\tau_2) \, \delta_{\tau_1 \tau_3} + k_1^{(3)}(\tau_3) \, \delta_{\tau_1 \tau_2} \right] \right\}.

Here x^{(p)}(n) denotes the input of variance A_p used to identify the p-th-order kernel, and y^{(p)}(n) the corresponding output.
In the above formulas, the impulse functions are introduced for the identification of the diagonal kernel points. If the Wiener kernels are extracted with the new formulas, the following Wiener-to-Volterra formulas (given explicitly up to the fifth order) are needed:
h_5 = k_5^{(5)},

h_4 = k_4^{(4)},

h_3 = k_3^{(3)} - 10 A_3 \sum_{\tau_4} k_5^{(5)}(\tau_1, \tau_2, \tau_3, \tau_4, \tau_4),

h_2 = k_2^{(2)} - 6 A_2 \sum_{\tau_3} k_4^{(4)}(\tau_1, \tau_2, \tau_3, \tau_3),

h_1 = k_1^{(1)} - 3 A_1 \sum_{\tau_2} k_3^{(3)}(\tau_1, \tau_2, \tau_2) + 15 A_1^2 \sum_{\tau_2} \sum_{\tau_3} k_5^{(5)}(\tau_1, \tau_2, \tau_2, \tau_3, \tau_3),

h_0 = k_0^{(0)} - A_0 \sum_{\tau_1} k_2^{(2)}(\tau_1, \tau_1) + 3 A_0^2 \sum_{\tau_1} \sum_{\tau_2} k_4^{(4)}(\tau_1, \tau_1, \tau_2, \tau_2).
As can be seen, the drawback with respect to the previous formulas[8] is that for the identification of the n-th-order kernel, all lower-order kernels must be identified again with the higher variance. However, an outstanding improvement in the output MSE will be obtained if the Wiener and Volterra kernels are obtained with the new formulas.[9]
This method was developed by Wray and Green (1994) and exploits the fact that a simple two-layer fully connected neural network (i.e., a multilayer perceptron) is computationally equivalent to the Volterra series and therefore contains the kernels hidden in its architecture. After such a network has been trained to successfully predict the output based on the current state and memory of the system, the kernels can be computed from the weights and biases of that network.
The general notation for the n-th-order Volterra kernel is given by

h_n(\tau_1, \dots, \tau_n) = \sum_{i=1}^{M} c_i \, a_{ni} \, \omega_{\tau_1 i} \cdots \omega_{\tau_n i},

where n is the order of the kernel, c_i are the weights from the hidden layer to the linear output node, a_{ji} are the coefficients of the polynomial expansion of the output function of the hidden nodes, and \omega_{ji} are the weights from the input layer to the hidden layer.
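The formula can be applied directly to a trained network. The following Python sketch (all names are hypothetical) extracts the first- and second-order kernels from a one-hidden-layer tanh network y = c_0 + c \cdot \tanh(W x + b), where x holds the delayed inputs; the a_{ji} are obtained here from the Taylor expansion of tanh around each unit's bias:

    import numpy as np

    def mlp_volterra_kernels(W, b, c):
        # W: (hidden, delays) input-to-hidden weights; b: (hidden,) biases;
        # c: (hidden,) hidden-to-output weights of a linear output node.
        t = np.tanh(b)
        a1 = 1.0 - t**2            # f'(b): first-order Taylor coefficient
        a2 = -t * (1.0 - t**2)     # f''(b)/2: second-order Taylor coefficient
        # h1(tau) = sum_i c_i a1_i W[i, tau]
        h1 = (c * a1) @ W
        # h2(tau1, tau2) = sum_i c_i a2_i W[i, tau1] W[i, tau2]
        h2 = np.einsum('i,ij,ik->jk', c * a2, W, W)
        return h1, h2

Note that this extraction only yields kernels up to the number of input delays represented in the network, so the input layer must be sized to cover the effective memory of the system.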
This method and its more efficient version (fast orthogonal algorithm) were invented by Korenberg.[10] In this method the orthogonalization is performed empirically over the actual input. It has been shown to perform more precisely than the crosscorrelation method. Another advantage is that arbitrary inputs can be used for the orthogonalization and that fewer data points suffice to reach a desired level of accuracy. Also, estimation can be performed incrementally until some criterion is fulfilled.
Linear regression is a standard tool of linear analysis. Hence, one of its main advantages is the widespread availability of standard tools for solving linear regressions efficiently. It has some educational value, since it highlights the basic property of the Volterra series: a linear combination of nonlinear basis functionals, as shown in the sketch below. For estimation, the order of the original system should be known, since the Volterra basis functionals are not orthogonal, and thus estimation cannot be performed incrementally.
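As a minimal sketch of this regression approach (for a second-order series with memory M; all names are illustrative), one builds a design matrix whose columns are the monomials of delayed inputs and solves for the kernel coefficients by least squares:

    import numpy as np

    def volterra_lstsq(x, y, M):
        # Fit h0, h1, and the (symmetric) h2 of a second-order Volterra
        # series with memory M by ordinary least squares.
        N = len(x)
        rows = []
        for n in range(M, N):
            w = x[n - M:n + 1][::-1]                        # x(n), ..., x(n-M)
            quad = np.outer(w, w)[np.triu_indices(M + 1)]   # unique 2nd-order products
            rows.append(np.concatenate(([1.0], w, quad)))
        X = np.array(rows)
        theta, *_ = np.linalg.lstsq(X, y[M:], rcond=None)
        return theta  # [h0, h1 coefficients, upper-triangular h2 coefficients]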
This method was invented by Franz and Schölkopf[11] and is based on statistical learning theory. Consequently, this approach is also based on minimizing the empirical error (often called empirical risk minimization). Franz and Schölkopf proposed that the kernel method could essentially replace the Volterra series representation, although noting that the latter is more intuitive.
This method was developed by van Hemmen and coworkers[12] and utilizes Dirac delta functions to sample the Volterra coefficients.