The modifiable areal unit problem (MAUP) is a source of statistical bias that can significantly impact the results of statistical hypothesis tests. MAUP affects results when point-based measures of spatial phenomena are aggregated into spatial partitions or areal units (such as regions or districts) as in, for example, population density or illness rates.[1][2] The resulting summary values (e.g., totals, rates, proportions, densities) are influenced by both the shape and scale of the aggregation unit.[3]
For example, census data may be aggregated into county districts, census tracts, postcode areas, police precincts, or any other arbitrary spatial partition. The results of the aggregation therefore depend on the mapmaker's choice of which "modifiable areal unit" to use in the analysis. A census choropleth map of population density calculated from state boundaries will yield radically different results from one calculated from county boundaries. Furthermore, census district boundaries are also subject to change over time,[4] so the MAUP must be considered when comparing past data with current data.
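A minimal sketch in Python of how the same fine-grained counts produce different densities depending on the chosen units. The four tracts, their populations and areas, the zone names and the `aggregate_density` helper are all hypothetical, made up for illustration; density is recomputed from summed population over summed area for each zone.

```python
# Hypothetical fine-grained units: name -> (population, land area in km^2).
tracts = {
    "t1": (9000, 1.0),
    "t2": (1000, 4.0),
    "t3": (8000, 2.0),
    "t4": (500, 5.0),
}

def aggregate_density(tracts, zoning):
    """Population density (people per km^2) of each zone in a zoning.

    Density is recomputed from summed population and summed area,
    not averaged from the individual tract densities.
    """
    densities = {}
    for zone, members in zoning.items():
        population = sum(tracts[m][0] for m in members)
        area = sum(tracts[m][1] for m in members)
        densities[zone] = population / area
    return densities

# Coarse scale: every tract falls in one "state"-like unit.
state = {"S": ["t1", "t2", "t3", "t4"]}
# Finer scale: two alternative two-zone partitions of the same tracts.
zoning_a = {"A1": ["t1", "t2"], "A2": ["t3", "t4"]}
zoning_b = {"B1": ["t1", "t3"], "B2": ["t2", "t4"]}

for name, zoning in [("state", state), ("zoning A", zoning_a), ("zoning B", zoning_b)]:
    rounded = {z: round(d) for z, d in aggregate_density(tracts, zoning).items()}
    print(name, rounded)
```

With these made-up numbers the single state-like zone has about 1,542 people per km^2, zoning A gives 2,000 and 1,214, and zoning B gives 5,667 and 167, although the underlying tract counts never change.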
The issue was first recognized by Gehlke and Biehl in 1934 and later described in detail in an entry in the Concepts and Techniques in Modern Geography (CATMOG) series by Stan Openshaw (1984) and in the book by Giuseppe Arbia (1988). In particular, Openshaw (1984) observed that "the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating". The problem is especially apparent when the aggregate data are used for cluster analysis in spatial epidemiology, spatial statistics or choropleth mapping, where misinterpretations can easily go unnoticed. Many fields of science, especially human geography, are prone to disregard the MAUP when drawing inferences from statistics based on aggregated data.[2] The MAUP is closely related to the topics of ecological fallacy and ecological bias (Arbia, 1988). Stan Openshaw's work on this topic led Michael F. Goodchild to suggest that it be referred to as the "Openshaw effect."[5]
Ecological bias caused by the MAUP has been documented as two separate effects that usually occur simultaneously during the analysis of aggregated data. First, the scale effect causes variation in statistical results between different levels of aggregation (radial distance), so the association between variables depends on the size of the areal units for which data are reported. Generally, correlation increases as areal unit size increases. Second, the zoning effect describes variation in correlation statistics caused by regrouping the data into different configurations at the same scale (areal shape).[6]
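Both effects can be seen in a deliberately constructed toy example (not real data). The sketch below assumes a hypothetical 4 x 4 grid of cells where x = row + column and y = row - column, so the two variables are uncorrelated across the 16 cells. It then groups the same cells into four zones of four cells each, using three different zone shapes, and correlates the zone means. The grid, the variable definitions and the helpers `pearson` and `zone_correlation` are illustrative assumptions.

```python
import math
from itertools import product

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 4 x 4 grid of cells indexed by (row, col).
cells = list(product(range(4), range(4)))
# Constructed so that x and y are uncorrelated across the 16 cells.
x_val = {(r, c): r + c for r, c in cells}
y_val = {(r, c): r - c for r, c in cells}

def zone_correlation(zone_of):
    """Correlate zone means, where zone_of maps a cell to its zone id."""
    members = {}
    for cell in cells:
        members.setdefault(zone_of(cell), []).append(cell)
    zones = sorted(members)
    mean_x = [sum(x_val[m] for m in members[z]) / len(members[z]) for z in zones]
    mean_y = [sum(y_val[m] for m in members[z]) / len(members[z]) for z in zones]
    return pearson(mean_x, mean_y)

# Scale effect: individual cells versus zones of four cells.
cell_r = pearson([x_val[m] for m in cells], [y_val[m] for m in cells])
print(f"16 cells: r = {cell_r:+.2f}")

# Zoning effect: three partitions into four zones of four cells each.
zonings = {
    "rows": lambda cell: cell[0],
    "columns": lambda cell: cell[1],
    "2x2 blocks": lambda cell: (cell[0] // 2, cell[1] // 2),
}
for name, zone_of in zonings.items():
    print(f"{name}: r = {zone_correlation(zone_of):+.2f}")
```

Run as written, it prints r = +0.00 for the 16 cells, +1.00 for rows, -1.00 for columns and +0.00 for 2 x 2 blocks: at one fixed scale, the zone shape alone decides the sign and strength of the aggregated correlation. The example is extreme by design; with real data the shifts are usually less dramatic.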
Since the 1930s, research has found extra variation in statistical results because of the MAUP. The standard methods of calculating within-group and between-group variance do not account for the extra variance seen in MAUP studies as the groupings change. The MAUP can also be used as a methodology: by fitting the same model to multiple sets of spatial groupings, one can calculate upper and lower limits as well as average values of the regression parameters. The MAUP is a critical source of error in spatial studies, whether observational or experimental. As such, unit consistency, particularly in a time-series cross-sectional (TSCS) context, is essential. Further, robustness checks of the sensitivity of results to alternative spatial aggregations should be performed routinely to mitigate the associated biases in the resulting statistical estimates.
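A minimal sketch of that idea in Python, assuming a simple one-predictor least-squares regression fitted to zone means. It re-estimates the slope under several alternative zonings of the same units and reports the lower limit, upper limit and average, which also serves as a basic robustness check. The eight observations, the three zonings and the helper names `ols_slope`, `zone_means` and `slope_sensitivity` are hypothetical; a practical check would normally compare many more partitions.

```python
# Hypothetical individual-level observations: (x, y) for units 0..7.
units = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 6), (6, 8), (7, 6), (8, 9)]

# Hypothetical alternative zonings at the same scale: four zones of two units.
zonings = {
    "A": [[0, 1], [2, 3], [4, 5], [6, 7]],
    "B": [[0, 2], [1, 3], [4, 6], [5, 7]],
    "C": [[0, 3], [1, 2], [4, 7], [5, 6]],
}

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def zone_means(units, zones):
    """Mean x and mean y of each zone (zones are lists of unit indices)."""
    xs = [sum(units[i][0] for i in z) / len(z) for z in zones]
    ys = [sum(units[i][1] for i in z) / len(z) for z in zones]
    return xs, ys

def slope_sensitivity(units, zonings):
    """Slope under each zoning plus its lower limit, upper limit and average."""
    slopes = {name: ols_slope(*zone_means(units, zones))
              for name, zones in zonings.items()}
    values = list(slopes.values())
    return slopes, min(values), max(values), sum(values) / len(values)

individual = ols_slope([u[0] for u in units], [u[1] for u in units])
slopes, low, high, average = slope_sensitivity(units, zonings)
print(f"individual-level slope: {individual:.3f}")
for name, b in slopes.items():
    print(f"zoning {name}: slope {b:.3f}")
print(f"range [{low:.3f}, {high:.3f}], average {average:.3f}")
```

With these toy numbers the slope is about 1.024 on the individual units but 1.050, 1.088 and 1.125 under zonings A, B and C. Reporting the spread alongside a single estimate makes the dependence on the chosen units explicit.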
Several suggestions have been made in the literature to reduce aggregation bias during regression analysis. For example, a researcher might correct the variance-covariance matrix using samples from individual-level data.[7]