Industrial process data validation and reconciliation, or more briefly, process data reconciliation (PDR), is a technology that uses process information and mathematical methods in order to automatically ensure data validation and reconciliation by correcting measurements in industrial processes. The use of PDR allows for extracting accurate and reliable information about the state of industry processes from raw measurement data and produces a single consistent set of data representing the most likely process operation.
Industrial processes, for example chemical or thermodynamic processes in chemical plants, refineries, oil or gas production sites, or power plants, are often represented by two fundamental means:
F(y)=0
y=(y1,\ldots,yn)
Data originates typically from measurements taken at different places throughout the industrial site, for example temperature, pressure, volumetric flow rate measurements etc. To understand the basic principles of PDR, it is important to first recognize that plant measurements are never 100% correct, i.e. raw measurement
y
F(y)=0
Random errors means that the measurement
y
y*
y*
y
\bar{y}
y*
Other sources of errors when calculating plant balances include process faults such as leaks, unmodeled heat losses, incorrect physical properties or other physical parameters used in equations, and incorrect structure such as unmodeled bypass lines. Other errors include unmodeled plant dynamics such as holdup changes, and other instabilities in plant operations that violate steady state (algebraic) models. Additional dynamic errors arise when measurements and samples are not taken at the same time, especially lab analyses.
The normal practice of using time averages for the data input partly reduces the dynamic problems. However, that does not completely resolve timing inconsistencies for infrequently-sampled data like lab analyses.
This use of average values, like a moving average, acts as a low-pass filter, so high frequency noise is mostly eliminated. The result is that, in practice, data reconciliation is mainly making adjustments to correct systematic errors like biases.
ISA-95 is the international standard for the integration of enterprise and control systems[1] It asserts that:
Data reconciliation is a serious issue for enterprise-control integration. The data have to be valid to be useful for the enterprise system. The data must often be determined from physical measurements that have associated error factors. This must usually be converted into exact values for the enterprise system. This conversion may require manual, or intelligent reconciliation of the converted values [...].Systems must be set up to ensure that accurate data are sent to production and from production. Inadvertent operator or clerical errors may result in too much production, too little production, the wrong production, incorrect inventory, or missing inventory.
PDR has become more and more important due to industrial processes that are becoming more and more complex. PDR started in the early 1960s with applications aiming at closing material balances in production processes where raw measurements were available for all variables.[2] At the same time the problem of gross error identification and elimination has been presented.[3] In the late 1960s and 1970s unmeasured variables were taken into account in the data reconciliation process.,[4] [5] PDR also became more mature by considering general nonlinear equation systems coming from thermodynamic models.,[6],[7] [8] Quasi steady state dynamics for filtering and simultaneous parameter estimation over time were introduced in 1977 by Stanley and Mah.[7] Dynamic PDR was formulated as a nonlinear optimization problem by Liebman et al. in 1992.[9]
Data reconciliation is a technique that targets at correcting measurement errors that are due to measurement noise, i.e. random errors. From a statistical point of view the main assumption is that no systematic errors exist in the set of measurements, since they may bias the reconciliation results and reduce the robustness of the reconciliation.
Given
n
yi
\begin{align}
min | |
x,y* |
&
| ||||||||||
\sum | ||||||||||
i=1 |
\right)2\\ subjectto&F(x,y*)=0\\ &ymin\ley*\leymax\\ &xmin\lex\lexmax, \end{align}
where
* | |
y | |
i |
i
i=1,\ldots,n
yi
i
i=1,\ldots,n
xj
j
j=1,\ldots,m
\sigmai
i
i=1,\ldots,n
F(x,y*)=0
p
xmin,xmax,ymin,ymax
The term
\left( |
| ||||||
\sigmai |
\right)2
| ||||||||||
f(y | ||||||||||
i=1 |
\right)2
In other words, one wants to minimize the overall correction (measured in the least squares term) that is needed in order to satisfy the system constraints. Additionally, each least squares term is weighted by the standard deviation of the corresponding measurement. The standard deviation is related to the accuracy of the measurement. For example, at a 95% confidence level, the standard deviation is about half the accuracy.
Data reconciliation relies strongly on the concept of redundancy to correct the measurements as little as possible in order to satisfy the process constraints. Here, redundancy is defined differently from redundancy in information theory. Instead, redundancy arises from combining sensor data with the model (algebraic constraints), sometimes more specifically called "spatial redundancy",[7] "analytical redundancy", or "topological redundancy".
Redundancy can be due to sensor redundancy, where sensors are duplicated in order to have more than one measurement of the same quantity. Redundancy also arises when a single variable can be estimated in several independent ways from separate sets of measurements at a given time or time averaging period, using the algebraic constraints.
Redundancy is linked to the concept of observability. A variable (or system) is observable if the models and sensor measurements can be used to uniquely determine its value (system state). A sensor is redundant if its removal causes no loss of observability. Rigorous definitions of observability, calculability, and redundancy, along with criteria for determining it, were established by Stanley and Mah,[10] for these cases with set constraints such as algebraic equations and inequalities. Next, we illustrate some special cases:
Topological redundancy is intimately linked with the degrees of freedom (
dof
a=b+c
When speaking about topological redundancy we have to distinguish between measured and unmeasured variables. In the following let us denote by
x
y
F(x,y)=0
y
x
F(x,y)=0
n
red=n-dof
dof
\begin{align} red=n-dof=n-(n+m-p)=p-m, \end{align}
i.e. the redundancy is the difference between the number of equations
p
m
Simple counts of variables, equations, and measurements are inadequate for many systems, breaking down for several reasons: (a) Portions of a system might have redundancy, while others do not, and some portions might not even be possible to calculate, and (b) Nonlinearities can lead to different conclusions at different operating points. As an example, consider the following system with 4 streams and 2 units.
We incorporate only flow conservation constraints and obtain
a+b=c
c=d
F(x,y)=0
p-m\ge0
If we have measurements for
c
d
a
b
c
a
b
a
d
b
c
In 1981, observability and redundancy criteria were proven for these sorts of flow networks involving only mass and energy balance constraints.[12] After combining all the plant inputs and outputs into an "environment node", loss of observability corresponds to cycles of unmeasured streams. That is seen in the second case above, where streams a and b are in a cycle of unmeasured streams. Redundancy classification follows, by testing for a path of unmeasured streams, since that would lead to an unmeasured cycle if the measurement was removed. Measurements c and d are redundant in the second case above, even though part of the system is unobservable.
Redundancy can be used as a source of information to cross-check and correct the measurements
y
x
Data validation denotes all validation and verification actions before and after the reconciliation step.
Data filtering denotes the process of treating measured data such that the values become meaningful and lie within the range of expected values. Data filtering is necessary before the reconciliation process in order to increase robustness of the reconciliation step. There are several ways of data filtering, for example taking the average of several measured values over a well-defined time period.
Result validation is the set of validation or verification actions taken after the reconciliation process and it takes into account measured and unmeasured variables as well as reconciled values. Result validation covers, but is not limited to, penalty analysis for determining the reliability of the reconciliation, or bound checks to ensure that the reconciled values lie in a certain range, e.g. the temperature has to be within some reasonable bounds.
Result validation may include statistical tests to validate the reliability of the reconciled values, by checking whether gross errors exist in the set of measured values. These tests can be for example
If no gross errors exist in the set of measured values, then each penalty term in the objective function is a random variable that is normally distributed with mean equal to 0 and variance equal to 1. By consequence, the objective function is a random variable which follows a chi-square distribution, since it is the sum of the square of normally distributed random variables. Comparing the value of the objective function
f(y*)
P\alpha
f(y*)\leP95
The individual test compares each penalty term in the objective function with the critical values of the normal distribution. If the
i
Advanced process data reconciliation (PDR) is an integrated approach of combining data reconciliation and data validation techniques, which is characterized by
Simple models include mass balances only. When adding thermodynamic constraints such as energy balances to the model, its scope and the level of redundancy increases. Indeed, as we have seen above, the level of redundancy is defined as
p-m
p
thumb|350px|The workflow of an advanced data validation and reconciliation process.Gross errors are measurement systematic errors that may bias the reconciliation results. Therefore, it is important to identify and eliminate these gross errors from the reconciliation process. After the reconciliation statistical tests can be applied that indicate whether or not a gross error does exist somewhere in the set of measurements. These techniques of gross error remediation are based on two concepts:
Gross error elimination determines one measurement that is biased by a systematic error and discards this measurement from the data set. The determination of the measurement to be discarded is based on different kinds of penalty terms that express how much the measured values deviate from the reconciled values. Once the gross errors are detected they are discarded from the measurements and the reconciliation can be done without these faulty measurements that spoil the reconciliation process. If needed, the elimination is repeated until no gross error exists in the set of measurements.
Gross error relaxation targets at relaxing the estimate for the uncertainty of suspicious measurements so that the reconciled value is in the 95% confidence interval. Relaxation typically finds application when it is not possible to determine which measurement around one unit is responsible for the gross error (equivalence of gross errors). Then measurement uncertainties of the measurements involved are increased.
It is important to note that the remediation of gross errors reduces the quality of the reconciliation, either the redundancy decreases (elimination) or the uncertainty of the measured data increases (relaxation). Therefore, it can only be applied when the initial level of redundancy is high enough to ensure that the data reconciliation can still be done (see Section 2,).
Advanced PDR solutions offer an integration of the techniques mentioned above:
The result of an advanced PDR procedure is a coherent set of validated and reconciled process data.
PDR finds application mainly in industry sectors where either measurements are not accurate or even non-existing, like for example in the upstream sector where flow meters are difficult or expensive to position (see [13]); or where accurate data is of high importance, for example for security reasons in nuclear power plants (see [14]). Another field of application is performance and process monitoring (see [15]) in oil refining or in the chemical industry.
As PDR enables to calculate estimates even for unmeasured variables in a reliable way, the German Engineering Society (VDI Gesellschaft Energie und Umwelt) has accepted the technology of PDR as a means to replace expensive sensors in the nuclear power industry (see VDI norm 2048,).