Set-indexed random fields: Theory and practice
Available at lcgodoy.me/slides/2026-uconn/
2026-02-06
This research follows from my PhD thesis and is conducted in collaboration with:
Marcos O. Prates - Universidade Federal de Minas Gerais
Jun Yan - University of Connecticut
Fernando A. Quintana - Pontificia Universidad Católica de Chile
Bruno Sansó - University of California Santa Cruz
A random field (RF): \(\{ Z(\mathbf{s}) \; : \; \mathbf{s} \in D \}\), where \(D\) is an index set.
RFs are used extensively in spatial statistics (Cressie 1993), where sample units are conceptualized as elements of an index set \(D\).
Inference based on RFs relies on further assumptions, which depend heavily on the spatial structure (or geometry) of the observed data.
| Geometry | Branch | Index set |
|---|---|---|
| Areas/polygons | Areal models | Countable |
| Points | Geostatistics | Continuum |
Change of Support: Predicting a process on one spatial resolution (or scale) using data collected from a different resolution (Gelfand et al. 2001).
Spatial Data Fusion: Analyzing the same phenomenon when observations are simultaneously available at multiple resolutions (Moraga et al. 2017).
Spatial Misalignment: Handling response and explanatory variables that are observed on different spatial resolutions (Godoy et al. 2026a).
The assumptions regarding the index set \(D\) are inherited from Geostatistics.
Realizations observed over areal units (or blocks) are an aggregation of the point-level process \(Z(\mathbf{s})\): \[ Z(B) = {\lvert B \rvert}^{-1} \int_{B} Z(\mathbf{x}) \, \mathrm{d}\mathbf{x}, \] where \(\lvert B \rvert\) denotes the area of \(B\).
Covariances involving aggregations are as follows: \[ \mathrm{Cov}[Z(B), Z(\mathbf{s})] = {\lvert B \rvert}^{-1} \int_{B} \mathrm{Cov}[Z(\mathbf{x}), Z(\mathbf{s})] \mathrm{d}\mathbf{x} \]
Since no analytical solutions are available, Monte Carlo techniques are used to approximate the covariances (Gelfand et al. 2001): \[ \begin{align} \mathrm{Cov}[Z(B), Z(\mathbf{s})] & = {\lvert B \rvert}^{-1} \int_{B} \mathrm{Cov}[Z(\mathbf{x}), Z(\mathbf{s})] \, \mathrm{d}\mathbf{x} \\ & \approx L^{-1} \sum_{k = 1}^{L} \mathrm{Cov}[Z(\mathbf{s}_k), Z(\mathbf{s})], \end{align} \] where \(\mathbf{s}_1, \ldots, \mathbf{s}_L\) are sampled uniformly over \(B\).
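As a numerical sanity check, the Monte Carlo approximation above can be compared against a closed-form integral in one dimension. The sketch below assumes an exponential point-level covariance, \(B = [0, 1]\), and \(\mathbf{s} = 2\); all parameter values are illustrative:

```python
import numpy as np

# Exponential covariance on the real line (phi = range parameter).
def cov(x, y, phi=1.0):
    return np.exp(-np.abs(x - y) / phi)

# Monte Carlo approximation of Cov[Z(B), Z(s)] for B = [0, 1], s = 2:
# draw L points uniformly over B and average the point-level covariances.
rng = np.random.default_rng(42)
L = 100_000
s = 2.0
samples = rng.uniform(0.0, 1.0, size=L)   # s_1, ..., s_L in B
mc_cov = np.mean(cov(samples, s))         # L^{-1} sum_k Cov[Z(s_k), Z(s)]

# Closed form for this 1D example: int_0^1 exp(-(2 - x)) dx = e^{-1} - e^{-2}.
exact = np.exp(-1.0) - np.exp(-2.0)
print(mc_cov, exact)
```

Even in this easy 1D case, the quality of the approximation depends on \(L\), which echoes the point that there is no consensus on how to choose it.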
There is no consensus in the literature about how to choose \(L\) and these approximations may introduce unquantifiable biases (Gonçalves and Gamerman 2018).
A more flexible index set: The class of non-empty, closed and bounded sets in \(D\) (denoted \(\mathcal{C}_D\)).
Successful in practice: Competitive with areal models, often better than models for data fusion.
Lacking theoretical foundation: there is no formal proof that the resulting covariance functions are valid (positive definite), and smooth covariance functions are not supported.
Derive a theoretically sound RF for spatial data, which allows for modeling areal, point-referenced, and mixed spatial data seamlessly.
To achieve our goal, we will:
Definition (Hausdorff distance): \[ h(A_1, A_2) = \inf \{ r \geq 0 \, : \, A_1 \subseteq {\rm B}_r(A_2), A_2 \subseteq {\rm B}_r(A_1) \}, \] where \(A_1 \subset D\) and \(A_2 \subset D\) are two non-empty sets.
Intuition: given a reference metric space \((D, d)\), the Hausdorff distance quantifies the greatest distance one would have to travel from a point in one set to reach the other set.
Limitations: Computationally expensive to compute (Knauer et al. 2011), no results establishing positive-definite functions of this distance.
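For finite point sets, the definition can be checked numerically. This illustrative sketch computes the (symmetric) Hausdorff distance between two random point clouds in \(\mathbb{R}^2\) with SciPy and verifies it against a brute-force evaluation of the definition:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

rng = np.random.default_rng(1)
A1 = rng.uniform(size=(50, 2))          # finite point set in R^2
A2 = rng.uniform(size=(50, 2)) + 0.5    # shifted point set

# The Hausdorff distance is the max of the two directed distances.
h = max(directed_hausdorff(A1, A2)[0], directed_hausdorff(A2, A1)[0])

# Brute-force check against the definition:
# sup over points of the distance to the nearest point of the other set.
D = np.linalg.norm(A1[:, None, :] - A2[None, :, :], axis=-1)
h_brute = max(D.min(axis=1).max(), D.min(axis=0).max())
print(h, h_brute)
```

The brute-force version builds the full pairwise distance matrix, which hints at why exact computation becomes expensive for large or continuous sets.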
Definition: A metric space \((D, d)\) is a length space if \(d(x, y)\) equals the infimum of the lengths of paths connecting \(x \in D\) and \(y \in D\) (Burago et al. 2001).
Property: Length spaces possess approximate midpoints.
Lemma: Let \((D, d)\) be a length space. Then balls expand linearly: \[ {\rm B}_{r}({\rm B}_{k}(x)) = {\rm B}_{r + k}(x). \]
Consequence: \(h({\rm B}_r(x), {\rm B}_k(y)) = d(x, y) + \lvert r - k \rvert\).
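The consequence can be verified numerically on the real line (a length space), where balls are intervals, \({\rm B}_r(x) = [x - r, x + r]\), and the Hausdorff distance between two intervals reduces to the larger of the endpoint gaps. A small simulation check:

```python
import numpy as np

# Hausdorff distance between intervals [a1, b1] and [a2, b2] on the real line
# reduces to max(|a1 - a2|, |b1 - b2|).
def hausdorff_intervals(a1, b1, a2, b2):
    return max(abs(a1 - a2), abs(b1 - b2))

# Balls on the real line are intervals: B_r(x) = [x - r, x + r].
rng = np.random.default_rng(7)
checks = []
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    r, k = rng.uniform(0, 3, size=2)
    lhs = hausdorff_intervals(x - r, x + r, y - k, y + k)
    rhs = abs(x - y) + abs(r - k)   # d(x, y) + |r - k|
    checks.append(abs(lhs - rhs) < 1e-12)
print(all(checks))
```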
We denote the smallest ball containing a set \(A \in \mathcal{C}_D\) (its minimum enclosing ball) by \(\mathcal{B}(A)\).
Radius: \({\rm R}(A) = \inf_{x \in D} {\rm R}(x, A)\), where \({\rm R}(x, A) = \inf \{ r \geq 0 : A \subset {\rm B}_r(x) \}\)
Set of centers: \(\mathcal{E}(A) = \{ x \in D : {\rm R}(x, A) = {\rm R}(A) \}.\)
Chebyshev center: \(c(A) \in \mathcal{E}(A)\)
\(\mathcal{B}(A)\) always exists for closed and bounded sets on normed metric spaces (Garkavi 1970) and on complete manifolds (such as sphere and torus) (Burago et al. 2001).
Definition: Let \((D, d)\) be a length space and let \(\mathcal{C}_D\) be the class of non-empty, closed and bounded sets in \(D\). The ball-Hausdorff distance is defined as: \[ bh(A_1, A_2) = d(c(A_1), c(A_2)) + \lvert {\rm R}(A_1) - {\rm R}(A_2) \rvert, \] where \(A_1, A_2 \in \mathcal{C}_D\).
Applied context: The class \(\mathcal{C}_D\) encompasses the main types of data encountered in Spatial Statistics: points, areas/polygons, and mixtures of both.
Remark: On the real line, the ball-Hausdorff distance is equivalent to the Hausdorff distance.
Remark: If we use the \(\lVert \cdot \rVert_1\) distance for sets in \(\mathbb{R}^p\), the ball-Hausdorff distance can be isometrically embedded into \((\mathbb{R}^{p + 1}, \lVert \cdot \rVert_1)\).
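The embedding in the remark is immediate once each set is represented by its minimum enclosing ball: mapping \(A \mapsto (c(A), {\rm R}(A)) \in \mathbb{R}^{p+1}\) turns \(bh\) into the \(\ell_1\) distance between the images. A small numerical illustration (centers and radii are simulated, not computed from raw sets):

```python
import numpy as np

# Represent each set by its minimum enclosing ball: (center in R^p, radius).
# Embedding: phi(A) = (c(A), R(A)) in R^{p+1}.
def bh(c1, r1, c2, r2):
    # ball-Hausdorff distance with d = the l1 distance between centers
    return np.sum(np.abs(c1 - c2)) + abs(r1 - r2)

def phi(c, r):
    return np.append(c, r)

rng = np.random.default_rng(0)
ok = []
for _ in range(500):
    c1, c2 = rng.normal(size=(2, 2))      # centers in R^2
    r1, r2 = rng.uniform(0, 2, size=2)    # radii
    lhs = bh(c1, r1, c2, r2)
    rhs = np.sum(np.abs(phi(c1, r1) - phi(c2, r2)))   # l1 norm in R^3
    ok.append(abs(lhs - rhs) < 1e-12)
print(all(ok))
```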
Theorem: Let \((D, d)\) be a pseudometric space. An upper-bound for the ball-Hausdorff distance is given by: \[ bh(A_1, A_2) \leq d(c(A_1), c(A_2)) + \max \{R(A_1), R(A_2)\}, \] where \(A_1, A_2 \in \mathcal{C}_D\).
Covariance function (CF): \(K : D \times D \to \mathbb{R}_{+}\)
Isotropic CFs: \(K \{ d(\mathbf{s}_1, \mathbf{s}_2) \}\).
Positive definiteness (PD): The CF of a RF must satisfy \(\sum_{i, j = 1}^{n} c_i c_j K\{d(\mathbf{s}_i, \mathbf{s}_j)\} \geq 0\) for all \(n\), \(c_1, \ldots, c_n \in \mathbb{R}\), and \(\mathbf{s}_1, \ldots, \mathbf{s}_n \in D\).
Isotropic CF of the Euclidean distance: \(\Phi_p\) is the class of valid isotropic CF on \(\mathbb{R}^p\): \(\Phi_1 \supset \Phi_2 \supset \cdots \supset \Phi_{\infty} = \bigcap_{k = 1}^{\infty} \Phi_{k}\)
Notable members of \(\Phi_{\infty}\): Matérn and Powered Exponential.
Let \((D, d)\) and \((D^\ast, d^\ast)\) denote metric spaces.
Isometric embedding: \(\phi \,: \, D \to D^\ast\) such that \(d^\ast(\phi(s), \phi(t)) = d(s, t)\), for any \(s, t \in D\) (Wells and Williams 1975).
Conditionally Negative Definite (CND) Function: \(g \, : \, D \times D \to \mathbb{R}_{+}\) satisfying \[\sum_{i = 1}^{m} \sum_{j = 1}^{m} b_i b_j g(s_i, s_j) \leq 0\] for any \(s_1, \ldots, s_m \in D\) and \(b_1, \ldots, b_m \in \mathbb{R}\) with \(\sum b_i = 0\).
Theorem: A pseudometric space \((D, d)\) can be isometrically embedded in a Hilbert space if and only if \(d^2\) is CND.
Consequence: Let \((D, d)\) be a pseudometric space where \(d\) is CND. Then, any function belonging to the class \(\Phi_{\infty}\) is PD on \((D, d^{1/2})\).
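This consequence can be illustrated numerically. The Euclidean distance on \(\mathbb{R}^2\) is a well-known CND pseudometric, so applying a \(\Phi_{\infty}\) member (here, the exponential; the range value is illustrative) to \(d^{1/2}\) should yield a positive semi-definite Gram matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
pts = rng.uniform(size=(40, 2))

# Pairwise Euclidean distances; the Euclidean metric is a CND pseudometric.
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Apply a Phi_infty member (exponential CF) to d^{1/2}:
phi_range = 0.5
K = np.exp(-np.sqrt(d) / phi_range)

# PD check: all eigenvalues of the Gram matrix are (numerically) non-negative.
min_eig = np.linalg.eigvalsh(K).min()
print(min_eig >= -1e-10)
```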
Theorem: Let \((D, d)\) be a length space where the function \(d\) is CND. Define \(\mathcal{C}_D\) as the class of non-empty, closed and bounded sets in \(D\). Then, (i) \(bh\) is a CND pseudometric on \(\mathcal{C}_D\); (ii) any function in \(\Phi_{\infty}\) is PD on \((\mathcal{C}_D, bh^{1/2})\).
(i) follows from the fact that CND functions form a convex cone.
(ii) follows from the Theorem on CND functions and Hilbert-space embeddings.
Let \((D, d)\) be a length space where the function \(d\) is CND. Then, the Powered Exponential (PEXP) covariance function \[ K(h; \, \theta) = \sigma^2 \exp \left\{ - \left( \frac{h}{\phi} \right)^{\nu} \right\} \] is a valid family on \((\mathcal{C}_D, bh)\) for \(\nu \in (0, 1]\).
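A numerical illustration of this result on the real line: sets are intervals, summarized by their centers and radii, so \(bh\) has the closed form \(d(c_1, c_2) + \lvert {\rm R}_1 - {\rm R}_2 \rvert\). The PEXP parameter values below are illustrative; the resulting covariance matrix should be positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 50
centers = rng.uniform(0, 10, size=n)   # Chebyshev centers of n intervals
radii = rng.uniform(0, 2, size=n)      # their radii

# Ball-Hausdorff distances: bh = d(c1, c2) + |R1 - R2| (on the real line).
BH = np.abs(centers[:, None] - centers[None, :]) + \
     np.abs(radii[:, None] - radii[None, :])

# Powered Exponential covariance with nu in (0, 1].
sigma2, phi_range, nu = 1.0, 2.0, 0.8
K = sigma2 * np.exp(-((BH / phi_range) ** nu))

# Validity check: the covariance matrix must be positive semi-definite.
min_eig = np.linalg.eigvalsh(K).min()
print(min_eig >= -1e-10)
```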
Let \((D, d)\) be a length space where the function \(d\) is CND. Then, the Matérn covariance function \[ K(h; \, \theta) = \sigma^2 \frac{1}{2^{\nu - 1}\Gamma(\nu)} {\left(\frac{h}{\phi}\right)}^{\nu} K_{\nu} \left( \frac{h}{\phi} \right) \] is a valid family on \((\mathcal{C}_D, \sqrt{bh})\).
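The same check can be run for the Matérn family, now evaluated at \(\sqrt{bh}\). The sketch below implements the Matérn formula above with SciPy's modified Bessel function (parameter values are illustrative, and \(K(0) = \sigma^2\) is handled as the limiting case):

```python
import numpy as np
from scipy.special import kv, gamma

def matern(h, sigma2=1.0, phi=1.0, nu=1.5):
    # Matern CF: sigma^2 * 2^{1-nu}/Gamma(nu) * (h/phi)^nu * K_nu(h/phi).
    h = np.asarray(h, dtype=float)
    scaled = h / phi
    out = np.full_like(h, sigma2)     # limiting value K(0) = sigma^2
    pos = scaled > 0
    out[pos] = sigma2 * (2 ** (1 - nu) / gamma(nu)) * \
        scaled[pos] ** nu * kv(nu, scaled[pos])
    return out

rng = np.random.default_rng(11)
n = 50
centers = rng.uniform(0, 10, size=n)
radii = rng.uniform(0, 2, size=n)
BH = np.abs(centers[:, None] - centers[None, :]) + \
     np.abs(radii[:, None] - radii[None, :])

# Matern evaluated at the square root of the ball-Hausdorff distance.
K = matern(np.sqrt(BH))
min_eig = np.linalg.eigvalsh(K).min()
print(min_eig >= -1e-8)
```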
The ball-Hausdorff distance is based on the Hausdorff distance between minimum enclosing balls.
Its existence is guaranteed for most scenarios that are relevant in spatial statistics applications.
Conditions for the CND property of the distance and an algorithm for its computation have been proposed.
Rich families of covariance functions are readily available for the (element-wise) square-root of the ball-Hausdorff distance.
Model: \((Y(\mathbf{s}_i) \mid X(\mathbf{s}_i), z(\mathbf{s}_i)) \sim \mathcal{N}(\alpha + \beta^\top X(\mathbf{s}_i) + z(\mathbf{s}_i), \tau^2)\).
\(\mathbf{z} \sim \mathrm{NNGP}(\mathbf{0}, K(\cdot, \cdot))\), where \(K\) is a valid covariance function on \((\mathcal{C}_D, bh)\).
Inference using Stan
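To make the data-generating mechanism concrete, here is a hypothetical simulation sketch of the model on the real line: points are balls of radius zero, blocks are intervals, the latent field uses a PEXP covariance on \(bh\), and a full GP stands in for the NNGP (which is a sparse approximation of it). All parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2026)

# Mixed spatial data on the real line: points (radius 0) and intervals.
n_pts, n_blk = 30, 20
centers = rng.uniform(0, 10, size=n_pts + n_blk)
radii = np.concatenate([np.zeros(n_pts), rng.uniform(0.2, 1.0, size=n_blk)])

# Ball-Hausdorff distances and a PEXP covariance (nu in (0, 1]).
BH = np.abs(centers[:, None] - centers[None, :]) + \
     np.abs(radii[:, None] - radii[None, :])
sigma2, phi_range, nu = 1.0, 2.0, 1.0
K = sigma2 * np.exp(-((BH / phi_range) ** nu))

# Simulate the latent field z (full GP here; the NNGP approximates it)
# and the response Y = alpha + beta * X + z + noise.
alpha, beta, tau = 0.5, 1.2, 0.1
L = np.linalg.cholesky(K + 1e-10 * np.eye(len(centers)))
z = L @ rng.standard_normal(len(centers))
X = rng.standard_normal(len(centers))
Y = alpha + beta * X + z + tau * rng.standard_normal(len(centers))
print(Y.shape)
```

Because points and blocks share one covariance function, no Monte Carlo aggregation step is needed; in the actual analysis, inference proceeds in Stan.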
The proposed distance is a pseudometric (\(bh(A_1, A_2) = 0\) does not imply \(A_1 = A_2\)). However, introducing a nugget effect mitigates this issue.
Defining cross-covariance functions in this context would be a major step forward for spatial misalignment!
Other topics in this context:
Thank you!!