Spatial statistics, sets and balls. Balls?

Lucas da Cunha Godoy

ldcgodoy@ucsc.com

EEB Department, UCSC

2026-03-20

Spatial Statistics

Brief Intro

Accounting for the spatial dependence that may be present in data is a central concern of spatial statistics.
- “Everything is related to everything else, but near things are more related than distant things.” (Tobler's first law of geography)
Important for proper uncertainty quantification in GLMMs (and friends)
Useful for spatial interpolation (more on this later)

Types of Data

There are two broad categories of spatial data:
- Areal
- Point-referenced
Models are tailored for each of these two data types.
They differ in their assumptions and in how they quantify the “proximity” between sample units.

Areal data

Point-referenced data

Spatial data in practice

Challenges

Change of Support: Predicting a process on one spatial resolution (or scale) using data collected from a different resolution (Gelfand et al. 2001).
- Downscaling: A specific type of change of support where coarse aggregate data is used to infer values at a finer resolution (Zheng et al. 2025).
Spatial Data Fusion: Analyzing the same phenomenon when observations are simultaneously available at multiple resolutions (Moraga et al. 2017).
Spatial Misalignment: Handling response and explanatory variables that are observed on different spatial resolutions (Godoy et al. 2026).

Objectives

Develop a spatial model which:
1. Provides a unified framework for point-referenced, areal, and even mixed spatial data
2. Allows for modeling data at disparate spatial resolutions
How? Adapting what is done for point-referenced data!
Challenge: A meaningful and computationally feasible distance function between spatial units regardless of their type (e.g., point or area).

Sets

What is a Set? And why sets?

In mathematics, a set is simply a collection of distinct objects (elements).
In spatial statistics, we usually work with coordinates or areas (or regions) in an index set \(D\).
Instead of distinguishing between points and areas, we may regard them both as sets.
How do we quantify distance between sets?

The Hausdorff distance (HD)

Definition \[ h(A_1, A_2) = \inf \{ r \geq 0 \, : \, A_1 \subseteq {\rm B}_r(A_2), A_2 \subseteq {\rm B}_r(A_1) \}, \] where \(A_1 \subset D\) and \(A_2 \subset D\) are two non-empty sets and \({\rm B}_r(A) = \bigcup_{x \in A} {\rm B}_r(x)\) is the set of points at distance less than \(r\) from \(A\).
Intuition: given a reference metric space \((D, d)\), the Hausdorff distance quantifies the greatest distance one would have to travel from a point in one set to reach the nearest point of the other set.
Limitations: Expensive to compute (Knauer et al. 2011), and it is hard to make Gaussian processes (GPs) work under this distance.
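
For finite point sets, the definition above reduces to a max-min formula, which the following minimal sketch implements. It is an illustration only: the function names and the toy data are hypothetical, the reference metric is assumed to be Euclidean, and areas would need more than a handful of points to be represented faithfully.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical toy data: the corners of a unit square and a single point.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
point = [(2.0, 0.0)]

print(hausdorff(square, point))
```

Each call compares every point of one set with every point of the other, so the cost grows with the product of the two set sizes. This is one way to see why the distance becomes expensive when many pairs of sets are involved.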

What about the balls?

What is a Ball?

In a metric space \((D, d)\), an open ball of radius \(r\) centered at \(x\) is: \[ \mathrm{B}_r(x) = \{ y \in D : d(x, y) < r \} \]

Balls?

Minimum Enclosing Balls (MEB)

The ball-Hausdorff Distance

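We define the distance between two sets \(A_1\) and \(A_2\) as: \[ bh(A_1, A_2) = d(c(A_1), c(A_2)) + \lvert R(A_1) - R(A_2) \rvert, \] where \(c(A)\) and \(R(A)\) denote the center and the radius of the minimum enclosing ball of \(A\).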
Intuition: Distance = (How far are the centers?) + (How different are the sizes?).
Efficiency: Computationally much cheaper than the standard Hausdorff distance (5x to 186x faster).
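
As a rough illustration of the formula, the sketch below computes the ball-Hausdorff distance for sets given as finite collections of 2D points (for a polygon, its vertices suffice, since a ball that contains the vertices contains the whole polygon). Everything here is an assumption made for the example: the function names are hypothetical, the metric is Euclidean, and the minimum enclosing ball is found by brute force over pairs and triples of points, which is only practical for small inputs and is not the algorithm used in the work presented.

```python
import math
from itertools import combinations

def _circle_from_two(p, q):
    """Circle having the segment pq as a diameter."""
    center = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return center, math.dist(p, q) / 2

def _circle_from_three(p, q, s):
    """Circumcircle of three points, or None if they are (nearly) collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, s
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), p)

def min_enclosing_ball(points, tol=1e-9):
    """Brute-force minimum enclosing circle of a small 2D point set."""
    pts = list(points)
    if len(pts) == 1:
        return pts[0], 0.0
    candidates = [_circle_from_two(p, q) for p, q in combinations(pts, 2)]
    for triple in combinations(pts, 3):
        circle = _circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    # Keep only the circles that contain every point, then take the smallest.
    enclosing = [(c, r) for c, r in candidates
                 if all(math.dist(c, p) <= r + tol for p in pts)]
    return min(enclosing, key=lambda cr: cr[1])

def ball_hausdorff(a, b):
    """Distance between centers plus absolute difference of radii."""
    (ca, ra), (cb, rb) = min_enclosing_ball(a), min_enclosing_ball(b)
    return math.dist(ca, cb) + abs(ra - rb)

# Hypothetical toy data: a unit square (an "area") and a single point.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
point = [(2.0, 0.0)]

print(ball_hausdorff(square, point))
```

Once the center and radius of each set are stored, every pairwise distance is a single evaluation of the formula, independent of how many points describe each set.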

Comparison with Hausdorff distance

Further theoretical justification

For a spatial model to be valid, its covariance matrix must be positive definite.
- This property guarantees that any linear combination of our spatial variables has a non-negative variance.
I proved theorems establishing conditions for the positive-definiteness of covariance functions under the ball-Hausdorff distance.
Impact: We can define GPs directly on sets and achieve the initial research objectives
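
To make the positive-definiteness requirement concrete, the sketch below builds a covariance matrix from ball-Hausdorff distances and checks it numerically with a Cholesky factorization. The exponential covariance function, its parameters `sigma2` and `phi`, the function names, and the toy balls are all assumptions chosen for illustration; the slides do not state which covariance functions the theorems cover, and a numerical check on one configuration is not a proof.

```python
import math

def ball_hausdorff(ball_a, ball_b):
    """Ball-Hausdorff distance between two (center, radius) pairs."""
    (ca, ra), (cb, rb) = ball_a, ball_b
    return math.dist(ca, cb) + abs(ra - rb)

def exp_cov_matrix(balls, sigma2=1.0, phi=1.0):
    """Exponential covariance (assumed for illustration) on bh distances."""
    return [[sigma2 * math.exp(-ball_hausdorff(a, b) / phi) for b in balls]
            for a in balls]

def is_positive_definite(mat):
    """Try a Cholesky factorization of a symmetric matrix.

    Returns True if every pivot is strictly positive, False otherwise.
    """
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = mat[i][i] - s
                if pivot <= 0.0:
                    return False
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (mat[i][j] - s) / low[j][j]
    return True

# Hypothetical spatial units, each summarized by its enclosing ball:
# radius 0 stands for a point, a positive radius for an area.
balls = [((0.0, 0.0), 0.0), ((1.0, 0.0), 0.5),
         ((0.0, 2.0), 1.0), ((3.0, 1.0), 0.0)]

cov = exp_cov_matrix(balls)
print(is_positive_definite(cov))
```

In this setup points and areas enter the covariance matrix in exactly the same way, which is the sense in which a GP can be indexed by sets rather than by coordinates alone.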

Application: Atmospheric Temperature

Data Fusion in California

Fusing in situ (point) and satellite (areal) data.
Comparing results to a physical model (gold standard).

Statistical vs. Physical Model (1/2)

Statistical vs. Physical Model (2/2)

Wrapping up

Potential Applications in Ecology?

Species Distribution Models (SDMs): Fusing opportunistic sightings (points) with range maps or habitat surveys (polygons).
Spatial misalignment: Species’ abundance (and occurrence) are rarely observed at the same spatial scale as environmental variables.
Remote Sensing Integration: Moving beyond “pixel as point” assumptions to treat sensor footprints as geometric sets.

A Brief Summary

Unified Framework: Points and areas are treated consistently as sets.
The “Balls” Shortcut: The ball-Hausdorff distance is much faster to compute than the classical Hausdorff distance, while still reflecting both how far apart sets are and how much they differ in size.
Theoretically Sound: Proven validity of GPs for sets.

Thank you!!

References

Gelfand, A. E., Zhu, L., and Carlin, B. P. (2001), “On the change of support problem for spatio-temporal data,” Biostatistics, Oxford University Press, 2, 31–45.

Godoy, L. da C., Prates, M. O., and Yan, J. (2026), “Voronoi linkage between mismatching voting stations and census tracts in analyzing the 2018 Brazilian presidential election data,” Spatial Statistics, 71, 100949. https://doi.org/10.1016/j.spasta.2025.100949.

Knauer, C., Löffler, M., Scherfenberg, M., and Wolle, T. (2011), “The directed Hausdorff distance between imprecise point sets,” Theoretical Computer Science, Elsevier, 412, 4173–4186.

Moraga, P., Cramb, S. M., Mengersen, K. L., and Pagano, M. (2017), “A geostatistical model for combined analysis of point-level and area-level data using INLA and SPDE,” Spatial Statistics, Elsevier, 21, 27–41.

Zheng, X., Cressie, N., Clarke, D. A., McGeoch, M. A., and Zammit-Mangion, A. (2025), “Spatial-statistical downscaling with uncertainty quantification in biodiversity modelling,” Methods in Ecology and Evolution, Wiley Online Library, 16, 837–853. https://doi.org/10.1111/2041-210X.14505.