Surrogate data explained
Surrogate data, sometimes known as analogous data, usually refers to time series data produced using well-defined (linear) models, such as ARMA processes, that reproduce statistical properties of a measured data set, such as its autocorrelation structure.[1] The resulting surrogate data can then be used, for example, to test for non-linear structure in the empirical data; this is called surrogate data testing.
Surrogate or analogous data also refers to data used to supplement available data from which a mathematical model is built. Under this definition, it may be generated (i.e., synthetic data) or transformed from another source.[2]
Uses
Surrogate data is used in environmental and laboratory settings, where study data from one source is used to estimate characteristics of another source.[3] For example, it has been used to model population trends in animal species.[4] It can also be used to model biodiversity, because gathering actual data on all species in a given area would be difficult.[5]
Surrogate data may be used in forecasting. Data from similar series may be pooled to improve forecast accuracy.[6] Use of surrogate data may enable a model to account for patterns not seen in historical data.[7]
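As a rough illustration of pooling, the sketch below blends the mean level of a target series with the mean level of several analogous series. The function name `pooled_level_forecast` and the fixed `weight` parameter are hypothetical choices made for this example; the cited works may pool series in quite different ways.

```python
from statistics import fmean


def pooled_level_forecast(target, analogous, weight=0.5):
    """Blend the target's own mean with the average mean of analogous series.

    weight is the share given to the target's own history (an assumed,
    hand-picked value here, not one estimated from the data).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    own_level = fmean(target)
    # Average the per-series means so that long series do not dominate.
    pooled_level = fmean([fmean(series) for series in analogous])
    return weight * own_level + (1.0 - weight) * pooled_level


# A short target history supplemented by two similar series.
forecast = pooled_level_forecast([10, 14], [[8, 10], [10, 12]], weight=0.25)
```

A smaller `weight` leans more heavily on the analogous series, which may be reasonable when the target history is short.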
Another use of surrogate data is to test models for non-linearity. The term surrogate data testing refers to algorithms used to analyze models in this way.[8] These tests typically involve generating data, whereas surrogate data in general can be produced or gathered in many ways.
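A test of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any cited work: it uses phase-randomized Fourier surrogates as the linear null model and a time-reversal asymmetry statistic, both of which are common but by no means the only choices. The function names are hypothetical, and the code assumes NumPy is available.

```python
import numpy as np


def phase_randomized_surrogate(x, rng):
    """Return a series with the same power spectrum as x but random phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.size)
    phases[0] = 0.0  # leave the zero-frequency term (the mean) untouched
    if n % 2 == 0:
        phases[-1] = 0.0  # the Nyquist term must stay real for even n
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


def time_reversal_asymmetry(x, lag=1):
    """Normalized third moment of increments; near zero for linear Gaussian data."""
    x = np.asarray(x, dtype=float)
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def surrogate_test(x, n_surrogates=99, seed=0):
    """Compare the statistic on x with its distribution over surrogates."""
    rng = np.random.default_rng(seed)
    observed = time_reversal_asymmetry(x)
    null = np.array([
        time_reversal_asymmetry(phase_randomized_surrogate(x, rng))
        for _ in range(n_surrogates)
    ])
    # Two-sided rank-based p-value: share of values at least as extreme.
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_surrogates + 1)
    return observed, null, p_value
```

A small p-value suggests that the measured series is unlikely under the linear null model represented by the surrogates; it does not by itself identify the kind of non-linearity present.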
Methods
One method of obtaining surrogate data is to find a source with similar conditions or parameters and use its data in modeling. Another method is to focus on patterns of the underlying system and to search for a similar pattern in related data sources (for example, patterns in other related species or environmental areas).
Rather than using existing data from a separate source, surrogate data may be generated through statistical processes, which may involve random data generation using constraints of the model or system.
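One way to picture such a generation process is to fit a simple linear model to the measured series and then draw random realizations from it. The sketch below assumes a first-order autoregressive, AR(1), model fitted by moments, which is only one of many possible constraints; the names `fit_ar1` and `generate_ar1_surrogate` are hypothetical.

```python
import random
from statistics import fmean


def fit_ar1(series):
    """Estimate the mean, lag-1 coefficient and noise std of an AR(1) model.

    Assumes the series is not constant (otherwise the variance is zero).
    """
    mean = fmean(series)
    centered = [value - mean for value in series]
    n = len(centered)
    variance = sum(c * c for c in centered) / n
    lag1_cov = sum(a * b for a, b in zip(centered, centered[1:])) / n
    phi = lag1_cov / variance
    noise_std = (variance * (1.0 - phi * phi)) ** 0.5
    return mean, phi, noise_std


def generate_ar1_surrogate(series, seed=None):
    """Simulate a random series of equal length from the fitted AR(1) model."""
    rng = random.Random(seed)
    mean, phi, noise_std = fit_ar1(series)
    # Start from the stationary distribution so no burn-in is needed.
    stationary_std = noise_std / (1.0 - phi * phi) ** 0.5
    state = rng.gauss(0.0, stationary_std)
    surrogate = []
    for _ in series:
        surrogate.append(mean + state)
        state = phi * state + rng.gauss(0.0, noise_std)
    return surrogate


measured = [0.0, 1.0, 0.5, 1.5, 1.0, 2.0, 1.2, 0.8, 0.3, 0.9]
surrogate = generate_ar1_surrogate(measured, seed=42)
```

Each call with a different seed yields a new series that, on average, shares the mean, variance and lag-1 autocorrelation estimated from the measured data, while carrying no other structure from it.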
Further reading
- Schreiber, T.; Schmitz, A. (1996). "Improved Surrogate Data for Nonlinearity Tests". Physical Review Letters. 77 (4): 635–638. arXiv:chao-dyn/9909041. Bibcode:1996PhRvL..77..635S. doi:10.1103/PhysRevLett.77.635. PMID 10062864. S2CID 13193081.
Notes and References
- Prichard; Theiler (1994). "Generating surrogate data for time series with several simultaneously measured variables". Physical Review Letters. 73 (7): 951–954. arXiv:comp-gas/9405002. Bibcode:1994PhRvL..73..951P. doi:10.1103/physrevlett.73.951. PMID 10057582. S2CID 32748996.
- Kaefer, Paul E. (2015). Transforming Analogous Time Series Data to Improve Natural Gas Demand Forecast Accuracy (M.Sc. thesis). Marquette University. Archived from the original on 2016-03-12: https://web.archive.org/web/20160312044649/http://epublications.marquette.edu/theses_open/320/. Retrieved 2016-02-18.
- "Surrogate Data Meaning". Columbia Analytical Services, Inc., now ALS Environmental. Archived from the original on February 16, 2017: https://web.archive.org/web/20170216130237/http://www.caslab.com/Surrogate_Data_Meaning/. Retrieved February 15, 2017. "What is Surrogate Data? Data from studies of test organisms or a test substance that are used to estimate the characteristics or effects on another organism or substance."
- Hernández-Camacho, Claudia J.; Bakker, Victoria J.; Aurioles-Gamboa, David; Laake, Jeff; Gerber, Leah R. (September 2015). Reed, Aaron W. (ed.). "The Use of Surrogate Data in Demographic Population Viability Analysis: A Case Study of California Sea Lions". PLOS ONE. 10 (9): e0139158. Bibcode:2015PLoSO..1039158H. doi:10.1371/journal.pone.0139158. PMC 4587556. PMID 26413746.
- Faith, D.P.; Walker, P.A. (1996). "Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas". Biodiversity and Conservation. 5 (4). Springer Nature: 399–415. Bibcode:1996BiCon...5..399F. doi:10.1007/BF00056387. S2CID 24066193.
- Duncan, George T.; Gorr, Wilpen L.; Szczypula, Janusz (2001). "Forecasting Analogous Time Series". In Armstrong, J. Scott (ed.). Principles of Forecasting: A Handbook for Researchers and Practitioners. Kluwer Academic Publishers. pp. 195–213. ISBN 0-7923-7930-6.
- Kaefer, Paul E.; Ishola, Babatunde; Brown, Ronald H.; Corliss, George F. (2015). "Using Surrogate Data to Mitigate the Risks of Natural Gas Forecasting on Unusual Days". International Institute of Forecasters: 35th International Symposium on Forecasting. forecasters.org/isf. Archived from the original on 2021-05-17: https://web.archive.org/web/20210517041956/https://forecasters.org/wp-content/uploads/gravity_forms/7-621289a708af3e7af65a7cd487aee6eb/2015/07/Kaefer_Paul_ISF2015.pdf. Retrieved 2022-07-20.
- Schreiber, Thomas; Schmitz, Andreas (1999). "Surrogate time series". Physica D. 142 (3–4): 346–382. arXiv:chao-dyn/9909037. Bibcode:2000PhyD..142..346S. CiteSeerX 10.1.1.46.3999. doi:10.1016/s0167-2789(00)00043-9. S2CID 13889229.