Spatial statistics, sets and balls. Balls?
2026-03-20
Accounting for the spatial dependence that may be present in data is a central concern of spatial statistics.
Important for proper uncertainty quantification in GLMMs (and friends)
Useful for spatial interpolation (more on this later)
There are two broad categories of spatial data: point-referenced data and areal data.
Models are tailored for each of these two data types.
They differ in their assumptions and in how they quantify the “proximity” between sample units.
Change of Support: Predicting a process on one spatial resolution (or scale) using data collected from a different resolution (Gelfand et al. 2001).
Spatial Data Fusion: Analyzing the same phenomenon when observations are simultaneously available at multiple resolutions (Moraga et al. 2017).
Spatial Misalignment: Handling response and explanatory variables that are observed on different spatial resolutions (Godoy et al. 2026).
Develop a spatial model which:
How? Adapting what is done for point-referenced data!
Challenge: A meaningful and computationally feasible distance function between spatial units regardless of their type (e.g., point or area).
In mathematics, a set is simply a collection of distinct objects (elements).
In spatial statistics, we usually work with coordinates or areas (or regions) in an index set \(D\).
Instead of distinguishing between points and areas, we may regard them both as sets.
How do we quantify distance between sets?
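Definition (Hausdorff distance): \[ h(A_1, A_2) = \inf \{ r \geq 0 \, : \, A_1 \subseteq {\rm B}_r(A_2), A_2 \subseteq {\rm B}_r(A_1) \}, \] where \(A_1 \subset D\) and \(A_2 \subset D\) are two non-empty sets, and \({\rm B}_r(A) = \{ s \in D \, : \, d(s, a) \leq r \text{ for some } a \in A \}\) is the set of points within distance \(r\) of \(A\) under a metric \(d\) on \(D\).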
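An equivalent way to write the same quantity, which is a standard identity rather than something stated on the slide, replaces the enlarged sets by point-to-set distances. With \(d\) denoting the reference metric on \(D\): \[ h(A_1, A_2) = \max \left\{ \sup_{a_1 \in A_1} \inf_{a_2 \in A_2} d(a_1, a_2), \; \sup_{a_2 \in A_2} \inf_{a_1 \in A_1} d(a_1, a_2) \right\}. \]
Each inner term is the farthest any point of one set lies from its nearest point in the other set, and the outer maximum makes the distance symmetric.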
Intuition: given a reference metric space \((D, d)\), the Hausdorff distance is the greatest distance one would have to travel from a point in one set to reach the nearest point of the other set.
Limitations: Expensive to compute (Knauer et al. 2011); hard to make Gaussian processes (GPs) work under this distance.
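For finite point sets, the Hausdorff distance can be computed by brute force from the matrix of pairwise distances. The sketch below is only an illustration, not the method used in the talk: the function name `hausdorff_distance`, the Euclidean metric, and the choice to stand in for an area by its four corner points are all assumptions made for the example. The result is the distance to the corner set, not to the filled region.

```python
import numpy as np


def hausdorff_distance(a1, a2):
    """Hausdorff distance between two finite point sets (Euclidean metric).

    a1, a2: array-likes of shape (n, k) and (m, k), one point per row.
    Hypothetical helper written for illustration only.
    """
    a1 = np.atleast_2d(np.asarray(a1, dtype=float))
    a2 = np.atleast_2d(np.asarray(a2, dtype=float))
    # All pairwise Euclidean distances, shape (n, m).
    pairwise = np.linalg.norm(a1[:, None, :] - a2[None, :, :], axis=-1)
    # Farthest any point of a1 lies from its nearest point in a2 ...
    a1_to_a2 = pairwise.min(axis=1).max()
    # ... and the same with the roles of the two sets swapped.
    a2_to_a1 = pairwise.min(axis=0).max()
    return max(a1_to_a2, a2_to_a1)


# A single location (a point-referenced unit) and the corners of a unit
# square (a crude stand-in for an areal unit; an assumption of this sketch).
point = np.array([[2.0, 0.0]])
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

print(hausdorff_distance(point, square))  # roughly 2.236, the square root of 5
```

The nearest corner is only 1 unit from the point, but the farthest corner, (0, 1), is \(\sqrt{5}\) away, and that is the value the distance reports. This brute-force version builds the full \(n \times m\) distance matrix, so its cost grows with the product of the two set sizes.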