Data model (GIS)

A geographic data model, geospatial data model, or simply data model in the context of geographic information systems, is a mathematical and digital structure for representing phenomena over the Earth. Generally, such data models represent various aspects of these phenomena by means of geographic data, including spatial locations, attributes, change over time, and identity. For example, the vector data model represents geography as collections of points, lines, and polygons, and the raster data model represents geography as cell matrices that store numeric values.[1] Data models are implemented throughout the GIS ecosystem, including the software tools for data management and spatial analysis, data stored in a variety of GIS file formats, specifications and standards, and specific designs for GIS installations.

While the unique nature of spatial information has led to its own set of model structures, much of the process of data modeling is similar to the rest of information technology, including the progression from conceptual models to logical models to physical models, and the difference between generic models and application-specific designs.

History

The earliest computer systems that represented geographic phenomena were quantitative analysis models developed during the quantitative revolution in geography in the 1950s and 1960s; these could not be called a geographic information system because they did not attempt to store geographic data in a consistent permanent structure, but were usually statistical or mathematical models. The first true GIS software modeled spatial information using data models that would come to be known as raster or vector:

Most first-generation GIS were custom-built for specific needs, with data models designed to be stored and processed as efficiently as possible within the technology limitations of the day (especially punched cards and limited mainframe processing time). By the 1970s, the early systems had produced sufficient results to compare them and evaluate the effectiveness of their underlying data models.[6] This led to efforts at the Harvard Lab and elsewhere focused on developing a new generation of generic data models, such as the POLYVRT topological vector model that would form the basis for commercial software and data such as the Esri Coverage.[7]

As commercial off-the-shelf GIS software, GIS installations, and GIS data proliferated in the 1980s, scholars began to look for conceptual models of geographic phenomena that seemed to underlie the common data models, trying to discover why the raster and vector data models seemed to make common sense, and how they measured and represented the real world.[8] This was one of the primary threads that formed the subdiscipline of geographic information science in the early 1990s.

Further developments in GIS data modeling in the 1990s were driven by rapid increases in both the GIS user base and computing capability. Major trends included 1) the development of extensions to the traditional data models to handle more complex needs such as time, three-dimensional structures, uncertainty, and multimedia; and 2) the need to efficiently manage exponentially increasing volumes of spatial data with enterprise needs for multiuser access and security. These trends eventually culminated in the emergence of spatial databases incorporated into relational databases and object-relational databases.

Types of data models

Because the world is much more complex than can be represented in a computer, all geospatial data are incomplete approximations of the world.[9] Thus, most geospatial data models encode some form of strategy for collecting a finite sample of an often infinite domain, and a structure to organize the sample in such a way as to enable interpolation of the nature of the unsampled portion. For example, a building consists of an infinite number of points in space. A vector polygon represents it with a few ordered points, which are connected by straight lines into a closed outline, with all interior points assumed to be part of the building. Furthermore, a "height" attribute may be the only representation of its three-dimensional volume.
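The building example can be sketched in code. The following is a minimal illustration, not a reference implementation: the `Building` class, the sample coordinates, and the choice of the ray-casting test and shoelace formula are all assumptions made for this sketch. It shows how a handful of ordered vertices plus one attribute stand in for the infinite set of points in the real structure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Building:
    """Hypothetical record: a finite sample standing in for a real building."""
    outline: List[Tuple[float, float]]  # ordered vertices; the closing edge is implied
    height: float                       # the only stored trace of the third dimension


def contains(outline, x, y):
    """Even-odd (ray-casting) test: is (x, y) inside the closed outline?"""
    inside = False
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def footprint_area(outline):
    """Shoelace formula for the area of a simple polygon."""
    n = len(outline)
    twice_area = sum(
        outline[i][0] * outline[(i + 1) % n][1]
        - outline[(i + 1) % n][0] * outline[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Four stored vertices describe every point of a 20 m by 10 m footprint.
building = Building(outline=[(0, 0), (20, 0), (20, 10), (0, 10)], height=12.0)
inside_point = contains(building.outline, 5, 5)     # an unsampled interior point
outside_point = contains(building.outline, 25, 5)
volume = footprint_area(building.outline) * building.height
```

Any interior location can be classified from the four stored vertices alone, and the volume estimate is only as good as the assumption that the building is a flat-roofed prism.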

The process of designing geospatial data models is similar to data modeling in general, at least in its overall pattern. For example, it can be segmented into three distinct levels of model abstraction: conceptual, logical, and physical.[10]

Each of these models can be designed in one of two situations or scopes: as a generic model or as an application-specific design.

Conceptual spatial models

See also: Conceptual model.

Generic geospatial conceptual models attempt to capture both the physical nature of geographic phenomena and how people think about them and work with them.[12] Contrary to the standard modeling process described above, the data models upon which GIS is built were not originally designed based on a general conceptual model of geographic phenomena, but were largely designed according to technical expediency, likely influenced by common sense conceptualizations that had not yet been documented.

That said, an early conceptual framework that was very influential in early GIS development was the recognition by Brian Berry and others that geographic information can be decomposed into the description of three very different aspects of each phenomenon: space, time, and attribute/property/theme.[13] As a further development in 1978, David Sinton presented a framework that characterized different strategies for measurement, data, and mapping as holding one of the three aspects constant, controlling a second, and measuring the third.[14]

During the 1980s and 1990s, a body of spatial information theories gradually emerged as a major subfield of geographic information science, incorporating elements of philosophy (especially ontology), linguistics, and sciences of spatial cognition. By the early 1990s, a basic dichotomy had emerged of two alternative ways of making sense of the world and its contents: as objects or as fields.

These two conceptual models are not meant to represent different phenomena, but often are different ways of conceptualizing and describing the same phenomenon. For example, a lake is an object, but the temperature, clarity, and proportion of pollution of the water in the lake are each fields (the water itself may be considered as a third concept of a mass, but this is not as widely accepted as objects and fields).[16]

Vector data model

See also: Vector graphics.

The vector logical model represents each geographic location or phenomenon by a geometric shape and a set of values for its attributes. Each geometric shape is selected from a set of available geometric primitives, such as points, lines, and polygons, and is represented using coordinate geometry as a structured set of (x, y) coordinates in a geographic coordinate system.
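As a rough sketch of this shape-plus-attributes structure, the following models a feature as a geometry type, a coordinate list, and a dictionary of attribute values. The `Feature` class, the three-primitive vocabulary, and the sample road data are hypothetical; real vector formats define richer and stricter structures.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Coordinate = Tuple[float, float]

# Minimum number of vertices each (assumed) primitive needs to be meaningful.
MIN_VERTICES = {"point": 1, "line": 2, "polygon": 3}


@dataclass
class Feature:
    """Hypothetical vector feature: one geometric shape plus attribute values."""
    geometry_type: str
    coordinates: List[Coordinate]
    attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        if self.geometry_type not in MIN_VERTICES:
            raise ValueError(f"unknown primitive: {self.geometry_type}")
        if len(self.coordinates) < MIN_VERTICES[self.geometry_type]:
            raise ValueError("too few coordinates for this primitive")
        if self.geometry_type == "point" and len(self.coordinates) != 1:
            raise ValueError("a point has exactly one coordinate")


def bounding_box(feature):
    """Smallest axis-aligned rectangle (min_x, min_y, max_x, max_y) around the shape."""
    xs = [x for x, _ in feature.coordinates]
    ys = [y for _, y in feature.coordinates]
    return (min(xs), min(ys), max(xs), max(ys))


# A small layer of related features, queried by attribute and by geometry.
roads = [
    Feature("line", [(0, 0), (5, 2), (9, 2)], {"name": "Main St", "lanes": 2}),
    Feature("line", [(5, 2), (5, 8)], {"name": "Oak Ave", "lanes": 4}),
]
wide_roads = [f.attributes["name"] for f in roads if f.attributes["lanes"] >= 4]
main_st_extent = bounding_box(roads[0])
```

Keeping geometry and attributes in one record lets the same feature answer both spatial questions (its extent) and thematic ones (its lane count).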

Although there are dozens of vector file formats (i.e., physical data models) used in various GIS software, most conform to the Simple Feature Access (SFA) specification from the Open Geospatial Consortium (OGC). It was developed in the 1990s by finding common ground between existing vector models, and is now enshrined as ISO 19125, the reference standard for the vector data model. OGC-SFA defines a set of vector geometric primitives, including points, line strings, and polygons, as well as multi-part collections of each.[17]

The geometric shape stored in a vector data set representing a phenomenon may or may not be of the same dimension as the real-world phenomenon itself.[18] It is common to represent a feature by a lower dimension than its real nature, based on the scale and purpose of the representation. For example, a city (a two-dimensional region) may be represented as a point, or a road (a three-dimensional structure) may be represented as a line. As long as the user is aware that the latter is a representation choice and a road is not really a line, this generalization can be useful for applications such as transport network analysis.

Based on this basic strategy of geometric shapes and attributes, vector data models use a variety of structures to collect these into a single data set (often called a layer), usually containing a set of related features (e.g., roads). These can be categorized into several approaches:

Vector data structures can also be classified by how they manage topological relationships between objects in a dataset:[22]

Vector data are commonly used to represent conceptual objects (e.g., trees, buildings, counties), but they can also represent fields. As an example of the latter, a temperature field could be represented by an irregular sample of points (e.g., weather stations), or by isotherms, a sample of lines of equal temperature.
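To illustrate how a point sample can stand in for a continuous field, the sketch below estimates temperature at an unsampled location from a few stations. The source does not name an interpolation method; inverse distance weighting (IDW) is assumed here purely as a simple example, and the station coordinates and values are invented.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate of a field at (x, y).

    samples is a list of (x, y, value) tuples, e.g. weather stations.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a sample point: use the measurement
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical stations: (x, y, temperature in degrees Celsius)
stations = [
    (0.0, 0.0, 10.0),
    (10.0, 0.0, 20.0),
    (0.0, 10.0, 14.0),
    (10.0, 10.0, 16.0),
]

at_station = idw(stations, 0.0, 0.0)   # returns the measured value
at_center = idw(stations, 5.0, 5.0)    # equidistant from all four stations
```

At the center, all four stations are equally distant, so the estimate reduces to their plain average. Closer to any one station, that station's value dominates.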

Raster data model

See also: Raster graphics.

The raster logical model represents a field using a tessellation of geographic space into a regularly spaced two-dimensional array of locations (each called a cell), with a single attribute value for each cell (or more than one value in a multi-band raster). Typically, each cell either represents a single central point sample (in which case the measurement model for the entire raster is called a lattice) or it represents a summary (usually the mean) of the field variable over the square area (in which case the model is called a grid). The general data model is essentially the same as that used for images and other raster graphics, with the addition of capabilities for the geographic context. A small example follows:

May 2019 Precipitation (mm)
6 7 10 9 8 6 7 8
6 8 9 10 8 7 7 7
7 8 9 10 9 8 7 6
8 8 9 11 10 9 9 7
8 9 10 11 11 10 10 8
9 9 10 10 11 10 9 8
7 8 9 10 10 9 9 7
7 7 8 9 8 8 7 6

To represent a raster grid in a computer file, it must be serialized into a single (one-dimensional) list of values. While there are various possible ordering schemes, the most commonly used is row-major, in which the cells in the first row are listed first, followed immediately by the cells in the second row, and so on, as follows:

6 7 10 9 8 6 7 8 6 8 9 10 8 7 7 7 7 8 9 10 9 8 7 6 8 8 9 11 10 9 9 7 . . .
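The row-major ordering can be sketched in a few lines, using the precipitation grid from the example above. The function names are illustrative only. The key relationship is that the cell at zero-based row r and column c lands at position r × (number of columns) + c in the flat list.

```python
# The May 2019 precipitation grid (mm), one inner list per row.
grid = [
    [6, 7, 10, 9, 8, 6, 7, 8],
    [6, 8, 9, 10, 8, 7, 7, 7],
    [7, 8, 9, 10, 9, 8, 7, 6],
    [8, 8, 9, 11, 10, 9, 9, 7],
    [8, 9, 10, 11, 11, 10, 10, 8],
    [9, 9, 10, 10, 11, 10, 9, 8],
    [7, 8, 9, 10, 10, 9, 9, 7],
    [7, 7, 8, 9, 8, 8, 7, 6],
]


def flatten_row_major(grid):
    """Serialize a 2D grid into one list: all of row 0, then row 1, and so on."""
    return [value for row in grid for value in row]


def flat_index(row, col, n_cols):
    """Position of cell (row, col) in the row-major list (zero-based)."""
    return row * n_cols + col


values = flatten_row_major(grid)
n_cols = len(grid[0])

# The cell in the fourth row, fourth column sits at position 3 * 8 + 3 = 27.
position = flat_index(3, 3, n_cols)
same_cell = values[position] == grid[3][3]
```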

To reconstruct the original grid, a header is required with general parameters for the grid. At the very least, it requires the number of columns (that is, the number of cells in each row) so that the reader knows where to begin each new row, and the datatype of each value (i.e., the number of bits in each value before the next value begins).[25]
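A minimal sketch of such a header follows, using Python's standard `struct` module. The byte layout (row count, column count, and bits per value, then the cell values) is invented for illustration and does not correspond to any real raster file format.

```python
import struct

# Hypothetical header layout, little-endian with no padding:
# unsigned 32-bit row count, unsigned 32-bit column count, 8-bit "bits per value".
HEADER_FORMAT = "<IIB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

# struct type codes for the unsigned integer widths this sketch supports.
TYPE_CODES = {8: "B", 16: "H", 32: "I"}


def write_raster(grid, bits=8):
    """Serialize a grid as header bytes followed by row-major cell values."""
    n_rows, n_cols = len(grid), len(grid[0])
    flat = [value for row in grid for value in row]
    header = struct.pack(HEADER_FORMAT, n_rows, n_cols, bits)
    body = struct.pack(f"<{len(flat)}{TYPE_CODES[bits]}", *flat)
    return header + body


def read_raster(blob):
    """Use the header to cut the flat value list back into rows."""
    n_rows, n_cols, bits = struct.unpack_from(HEADER_FORMAT, blob, 0)
    flat = struct.unpack_from(
        f"<{n_rows * n_cols}{TYPE_CODES[bits]}", blob, HEADER_SIZE
    )
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


sample = [
    [6, 7, 10, 9, 8, 6, 7, 8],
    [6, 8, 9, 10, 8, 7, 7, 7],
    [7, 8, 9, 10, 9, 8, 7, 6],
]
blob = write_raster(sample, bits=8)
restored = read_raster(blob)
```

Without the column count, the reader could not tell a 3 × 8 grid from a 4 × 6 one. Without the bit width, it could not tell where one value ends and the next begins.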

While the raster model is closely tied to the field conceptual model, objects can also be represented in raster, essentially by transforming an object X into a discrete (Boolean) field of presence/absence of X. Alternatively, a layer of objects (usually polygons) could be transformed into a discrete field of object identifiers. In this case, some raster file formats allow a vector-like table of attributes to be joined to the raster by matching the ID values. Raster representations of objects are often temporary, only created and used as part of a modelling procedure, rather than in a permanent data store.
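Both transformations can be sketched with a small identifier raster and a lookup table. The zone IDs, the `land_use` column, and the function names are hypothetical; actual raster attribute tables are format-specific.

```python
# A raster of object identifiers: each cell stores the ID of the polygon covering it.
id_raster = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 2],
]

# A vector-like attribute table keyed by the same IDs (hypothetical values).
attribute_table = {
    1: {"land_use": "forest"},
    2: {"land_use": "water"},
    3: {"land_use": "urban"},
}


def presence(id_raster, target_id):
    """Boolean field: True where object target_id is present, False elsewhere."""
    return [[cell == target_id for cell in row] for row in id_raster]


def join_attribute(id_raster, table, column):
    """Replace each cell's ID with one attribute of the matching table row."""
    return [[table[cell][column] for cell in row] for row in id_raster]


water_mask = presence(id_raster, 2)
land_use = join_attribute(id_raster, attribute_table, "land_use")
```

The join stores each attribute once per object rather than once per cell, which is why the identifier raster plus table is more compact than a separate raster for every attribute.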

To be useful in GIS, a raster file must be georeferenced to correspond to real-world locations, as a raw raster can only express locations in terms of rows and columns. This is typically done with a set of metadata parameters, either in the file header (as in the GeoTIFF format) or in a sidecar file (such as a world file). At the very least, the georeferencing metadata must include the location of at least one cell in the chosen coordinate system and the resolution or cell size, which is the distance between adjacent cells. An affine transformation is the most common type of georeferencing, allowing rotation and rectangular cells. More complex georeferencing schemes include polynomial and spline transformations.
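An affine georeferencing can be sketched with six parameters, following the convention commonly used by world files: x = A·col + B·row + C and y = D·col + E·row + F, where (C, F) is the center of the upper-left cell. The `Affine` class, the function names, and the sample coordinates are assumptions made for this illustration.

```python
from typing import NamedTuple


class Affine(NamedTuple):
    """Six affine parameters (names follow the common world-file lettering)."""
    a: float  # change in x per column (cell width)
    b: float  # change in x per row (rotation term)
    c: float  # x of the center of the upper-left cell
    d: float  # change in y per column (rotation term)
    e: float  # change in y per row (usually negative: rows run southward)
    f: float  # y of the center of the upper-left cell


def cell_to_world(t, row, col):
    """Map a zero-based (row, col) to the coordinates of the cell center."""
    x = t.a * col + t.b * row + t.c
    y = t.d * col + t.e * row + t.f
    return x, y


def world_to_cell(t, x, y):
    """Invert the transformation and round to the nearest cell."""
    det = t.a * t.e - t.b * t.d
    if det == 0:
        raise ValueError("transformation is not invertible")
    dx, dy = x - t.c, y - t.f
    col = (t.e * dx - t.b * dy) / det
    row = (-t.d * dx + t.a * dy) / det
    return round(row), round(col)


# Hypothetical north-up raster with 30 m square cells.
transform = Affine(a=30.0, b=0.0, c=500015.0, d=0.0, e=-30.0, f=4100985.0)
x, y = cell_to_world(transform, row=2, col=3)
row, col = world_to_cell(transform, x, y)
```

With B and D set to zero the raster is north-up. Nonzero values rotate or shear it, and different magnitudes for A and E give rectangular rather than square cells.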

Raster data sets can be very large, so image compression techniques are often used. Compression algorithms identify spatial patterns in the data, then transform the data into parameterized representations of the patterns, from which the original data can be reconstructed. In most GIS applications, lossless compression algorithms (e.g., Lempel-Ziv) are preferred over lossy ones (e.g., JPEG), because the complete original data are needed, not an interpolation.
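The defining property of lossless compression, that the original values are recovered exactly, can be demonstrated with Python's standard `zlib` module. Its DEFLATE algorithm combines LZ77 (a Lempel-Ziv variant) with Huffman coding. The sample raster below is invented: a mostly uniform band of the kind where spatial patterns make compression effective.

```python
import zlib

# Hypothetical 8-bit raster band in row-major order: a large uniform area
# (value 0, e.g. "no rain") followed by a smaller uniform area (value 7).
cells = bytes([0] * 900 + [7] * 100)

compressed = zlib.compress(cells, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)

is_lossless = restored == cells        # every cell value is recovered exactly
ratio = len(cells) / len(compressed)   # how many times smaller the stored form is
```

A lossy codec would not satisfy the equality check, which is why it is usually unsuitable when cell values are measurements rather than picture elements.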

Extensions

Starting in the 1990s, as the original data models and GIS software matured, one of the primary foci of data modeling research was on developing extensions to the traditional models to handle more complex geographic information.

Spatiotemporal models

Time has always played an important role in analytical geography, dating at least back to Brian Berry's regional science matrix (1964) and the time geography of Torsten Hägerstrand (1970).[26] At the dawn of the GIScience era in the early 1990s, the work of Gail Langran opened the door to research into methods of explicitly representing change over time in GIS data;[27] this led to many conceptual and data models emerging in the decades since.[28] Some forms of temporal data began to be supported in off-the-shelf GIS software by 2010.

Several common models for representing time in vector and raster GIS data include:[29]

Three-dimensional models

See also: 3D computer graphics.

There are several approaches for representing three-dimensional map information, and for managing it in the data model. Some of these were developed specifically for GIS, while others have been adopted from 3D computer graphics or computer-aided drafting (CAD).

Notes and References

  1. Wade, T. and Sommer, S. (eds.). A to Z GIS.
  2. Robertson, J.C. (1967). "The Symap Programme for Computer Mapping". The Cartographic Journal. 4 (2): 108–113. doi:10.1179/caj.1967.4.2.108.
  3. Tomlinson, Roger (1968). "A Geographic Information System for Regional Planning". In Stewart, G.A. (ed.). Land Evaluation: Papers of a CSIRO Symposium. Macmillan of Australia. pp. 200–210.
  4. Cooke, Donald F. (1998). "Topology and TIGER: The Census Bureau's Contribution". In Foresman, Timothy W. (ed.). The History of Geographic Information Systems: Perspectives from the Pioneers. Prentice Hall. pp. 47–57.
  5. Tomlinson, Roger F.; Calkins, Hugh W.; Marble, Duane F. (1976). Computer handling of geographical data. UNESCO Press.
  6. Dueker, Kenneth J. (1972). "A Framework for Encoding Spatial Data". Geographical Analysis. 4 (1): 98–105. doi:10.1111/j.1538-4632.1972.tb00460.x (free access).
  7. Peucker, Thomas K.; Chrisman, Nicholas (1975). "Cartographic Data Structures". The American Cartographer. 2 (1): 55–69. doi:10.1559/152304075784447289.
  8. Peuquet, Donna J. (1988). "Representations of Geographic Space: Toward a Conceptual Synthesis". Annals of the Association of American Geographers. 78 (3): 375–394. doi:10.1111/j.1467-8306.1988.tb00214.x.
  9. Huisman, Otto; de By, Rolf A. (2009). Principles of Geographic Information Systems. Enschede, The Netherlands: ITC. p. 64. Retrieved 1 November 2021.
  10. Longley, Paul A.; Goodchild, Michael F.; Maguire, David J.; Rhind, David W. (2011). Geographic Information Systems & Science (3rd ed.). Wiley. pp. 207–228.
  11. Esri. "ESRI Shapefile Technical Description". Esri Technical Library. Retrieved 30 October 2021.
  12. Mennis, J.; Peuquet, D.J.; Qian, L. (2000). "A conceptual framework for incorporating cognitive principles into geographical database representation". International Journal of Geographical Information Science. 14 (6): 501–520. doi:10.1080/136588100415710. S2CID 7458359.
  13. Berry, Brian J.L. (1964). "Approaches to regional analysis: A synthesis". Annals of the Association of American Geographers. 54 (1): 2–11. doi:10.1111/j.1467-8306.1964.tb00469.x. S2CID 128770492.
  14. Sinton, David J. (1978). "The inherent structure of information as a constraint to analysis: Mapped thematic data as a case study". In Dutton, Geoff (ed.). Harvard Papers in GIS. Vol. 7. Harvard University.
  15. Peuquet, Donna J.; Smith, Barry; Brogaard, Berit (1997). The Ontology of Fields: Report of a Specialist Meeting Held under the Auspices of the Varenius Project.
  16. Plewe, Brandon (2019). "A Case for Geographic Masses". In Timpf, Sabine; Schlieder, Christoff; Kattenbeck, Marcus; Ludwig, Bernd (eds.). 14th International Conference on Spatial Information Theory (COSIT 2019). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
  17. Open Geospatial Consortium (2010). Simple feature access - Part 1: Common architecture. pp. 20–32.
  18. Bolstad, Paul (2019). GIS Fundamentals: A First Text on Geographic Information Systems (6th ed.). XanEdu. pp. 39–71.
  19. Morehouse, Scott (1985). "ARC/INFO: A geo-relational model for spatial information". Proceedings of the International Symposium on Cartography and Computing (Auto-Carto VII). p. 388.
  20. Jensen, John R.; Jensen, Ryan R. (2013). "5: Spatial Data Models and Databases". Introductory Geographic Information Systems. Pearson. pp. 125–147.
  21. Open Geospatial Consortium. "Simple Feature Access - Part 2: SQL Option". Retrieved 4 November 2021.
  22. Peuquet, Donna J. (1984). "A conceptual framework and comparison of spatial data models". Cartographica. 21 (4): 66–113. doi:10.3138/D794-N214-221R-23R5.
  23. Peucker, Thomas K.; Chrisman, Nicholas (1975). "Cartographic Data Structures". The American Cartographer. 2 (1): 55–69. doi:10.1559/152304075784447289.
  24. Esri. "What is a network dataset?". ArcGIS Pro Documentation. Retrieved 4 November 2021.
  25. Lo, C.P.; Yeung, Albert K.W. (2002). Concepts and Techniques of Geographic Information Systems. Prentice Hall. p. 81.
  26. Hägerstrand, Torsten (1970). "What about people in regional science?". Papers of the Regional Science Association. 24 (1): 6–21. doi:10.1007/BF01936872. S2CID 198174673.
  27. Langran, Gail (1992). Time in Geographic Information Systems. Taylor & Francis.
  28. Peuquet, Donna J. (1994). "It's about time: a conceptual framework for the representation of temporal dynamics in geographic information systems". Annals of the Association of American Geographers. 84 (3): 441–461. doi:10.1111/j.1467-8306.1994.tb01869.x.
  29. Gregory, Ian N. (2002). "Time-variant GIS Databases of Changing Historical Administrative Boundaries: A European Comparison". Transactions in GIS. 6 (2): 161–178. doi:10.1111/1467-9671.00103. S2CID 38450649.
  30. Plewe, Brandon (2019). "A Qualified Assertion Database for the History of Places". International Journal of Humanities and Arts Computing. 13 (1–2): 95–115. doi:10.3366/ijhac.2019.0233. S2CID 207941717.
  31. Esri. "Fundamentals of netCDF data storage". ArcGIS Pro Documentation. Retrieved 5 November 2021.
  32. Soller, D.R. et al. (1999). "Inclusion of digital map products in the National Geologic Map Database". In Soller, D.R. (ed.). Digital Mapping Techniques '99—Workshop Proceedings. U.S. Geological Survey Open-File Report 99-386. pp. 35–38.