HAUSDORFF–GAUSSIAN PROCESS WITH A SPATIOTEMPORAL APPLICATION

Available at lcgodoy.me/slides/2024-ness/

Lucas da Cunha Godoy

lcgodoy@duck.com

Department of Statistics, UConn

NESS 2024

2024-05-23

Introduction

“Everything is related to everything else, but near things are more related than distant things.” —Tobler’s first law of geography

Spatial Data

Taking into account spatial dependence possibly present in data is a foremost aspect of spatial statistics.
Statistical inference depends heavily on the spatial structure/geometry of the observed spatial data.

Geometry	Branch
Point-referrenced	Geostatistics
	Point patterns
Areal units/polygons	Lattice/Areal models
Mixed	Spatial Data Fusion

General Spatial Model

The model below is flexible enough to analyze most spatial data structures (Cressie 1993)

\[\{ Z(\mathbf{s}) \; : \; \mathbf{s} \in D \},\]

where \(D\) is an index set.

Areal models: index set is countable.
Geostatistics: index set is a continuum.
Gaussian Process (GP): a stochastic process such that all its finite-dimensional marginal distributions are multivariate normal distributions.

Research Questions

The main research questions we are interested in are:
1. Can we propose a model for spatial data that accomodates areal, point-referrenced, and fused data?
2. If so, is this model competitive when compared to specialized models?
Proposal: A GP defined on a flexible index set.
Main challenge: Defining a valid correlation function.

Hausdorff–Gaussian Process (HGP)

Distances between Sets

Metric space: \((D, d)\), where \(D\) is a spatial region of interest.
Distance between a point and a set: \(d(x, A) = \inf_{a \in A} d(x, a)\), where \(d(x, y)\) is the distance between any two elements \(x, y \in D\).
Directed Hausdorff distance: \[{\vec h}(A, B) = \sup_{a \in A} d(a, B)\]
Hausdorff distance: \[h(A, B) = \max \left \{ \vec{h}(A, B), \vec{h}(B, A) \right \}\]

Hausdorff Distance for Spatial Data Analysis

Study region: In spatial statistics, \(D\) is tipically a closed and bounded subset of \(\mathbb{R}^2\).
In this context, the Hausdorff distance is a metric (Sendov 2004).

The HGP

Isotropic stochastic process: \(\{ Z(\mathbf{s}) \, : \, \mathbf{s} \in \mathcal{B}(D) \},\) where \(D \subset \mathbb{R}^2\).
Flexible index set: \(\mathcal{B}(D)\) contains all closed and bounded subsets of \(D\).
The HGP: The stochastic process \(Z(\cdot)\) is an HGP if:
- It is a GP
- Its correlation function is a function of the Hausdorff distance between sets (in our case, spatial units).
This process is suitable to analyze point-referrenced, areal, and mixed spatial data.

Ensuring Validity: Correlation Functions for the HGP

For a valid process, its correlation function must satisfy the following properties:

Diminish with increasing distance: \(\lim_{h \to \infty}r(h) = 0\).
Bounded from above by 1: \(r(0) = 1\).
Positive-definiteness: yields positive-definite correlation matrices for all its finite-dimensional marginal distributions.
Unfortunately, functions that are guaranteed to be positive definite on \((\lVert \cdot \rVert_2, \mathbb{R}^2)\) are not necessarily positive definite on other metric spaces (Li et al. 2023).

The Powered Exponential Correlation (PEC) Function

PEC function: \[r(h; \phi, \nu) = \exp\left \{ - \frac{h^{\nu}}{\phi^{\nu}}\right \},\] where \(\nu\) is a smoothness parameter and \(\phi\) governs the range of the spatial dependence.
Parametrization: We reparametrize this function with \(\rho = {\log(10)}^{1 / \nu} \phi\).
Interpretation: \(\rho\) is the distance at which the spatial correlation reduces to \(0.10\).

Visualizing the PEC function

HGP Recap

Flexibility: A process that handles point-referrenced, areal, and mixed spatial data by construction.
Hausdorff distance: Enables HGP’s correlation function to account for the shape, size, and orientation of spatial objects.
Validity: Using a PEC function ensures the HGP is a valid process.

flowchart LR
    A(Flexible index set) ==> B(GP)
    B ==> E(((HGP)))
    C(Hausdorff distance) ==> D(PEC)
    D ==> E

Spatial modeling

Notation

Set of spatial locations: \(\mathcal{S} = \{ \mathbf{s}_1, \ldots, \mathbf{s}_n \}\), where \(\mathbf{s}_i \in \mathcal{B}(D)\) and \(D \subset \mathbb{R}^2\).
Response variable: \(\mathbf{Y} = {(Y(\mathbf{s}_1), \ldots, Y(\mathbf{s}_n))}^\top\).
Covariates: \(\mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_p)\), where \(\mathbf{X}_{j} = (x_{j1}, \ldots, x_{jn})^\top\) and \(\mathbf{x}_i = (x_{1i}, \ldots, x_{pi})\).
Spatial random effects: \(\mathbf{Z} = {(Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n))}^\top\).
Density functions (for response and latent variables): \(p(\cdot \mid \cdot)\)
Prior and posterior distributions: \(\pi(\cdot)\) and \(\pi(\cdot \mid \cdot)\)

Spatial GLMM

A generalized linear mixed model (GLMM) can be written as \[\begin{aligned} & Y(\mathbf{s}_i) \mid \mathbf{x}_i, z(\mathbf{s}_i) \overset{{\rm ind}}{\sim} f(\cdot \mid \mu_i, \omega) \\ & g(\mu_i) = \mathbf{x}_i \boldsymbol{\beta} + z(\mathbf{s}_i). \end{aligned}\]

Probability distribution: \(f(\cdot)\)
Link function: \(g(\cdot)\)
Conditional mean: \(\mu_i = \mathbb{E}[Y(\mathbf{s}_i) \mid \mathbf{x}_i, z(\mathbf{s}_i)]\)
Model parameters: \(\theta = {\{\boldsymbol{\beta}^\top, \boldsymbol{\sigma}^\top, \delta^\top, \omega^\top \}}^\top\)
Joint density: \(p(\mathbf{y} \mid \mathbf{z}, \theta) = \prod_{i = 1}^n f(y(\mathbf{s}_i) \mid \mu_i, \omega)\)

Bayesian Estimation

Priors: \[\begin{align*} & \mathbf{Z} \sim \mathrm{HGP}\{0, v(\cdot), r(\cdot) \} \\ & \boldsymbol{\beta} \sim \mathcal{N}(\mathbf{0}, 10^2 \mathbf{I}) \quad \sigma \sim t_{+}(3) \\ & \rho \sim \mathrm{Exp}(a_\rho) \quad \nu \sim \mathrm{Beta}(a_\nu, b_\nu) \end{align*}\]
Posterior: \(\pi(\theta \mid \mathbf{y}, \mathbf{z}) \propto p(\mathbf{y} \mid \mathbf{z}, \theta) p(\mathbf{z} \mid \theta) \pi(\theta)\)
MCMC sampler: No-U-Turn (Homan and Gelman 2014).
Convergence assessment: traceplots and split-\({\hat{R}}\) (Vehtari et al. 2021).

Goodness-of-fit and Predictions

Goodness-of-fit criteria: DIC, WAIC, and LOOIC (lower values indicate better fit)
Unobserved outcome at \(m\) locations: \(\mathbf{Y}^\ast = (y(\mathbf{s}^\ast_1), \ldots, y(\mathbf{s}^\ast_m))^\top\).
Posterior predictive distributions: \(p(\mathbf{y}^{\ast} \mid \mathbf{y}) = \int p(\mathbf{y}^{\ast} \mid \mathbf{z}^{\ast}, \theta) p(\mathbf{z}^{\ast} \mid \mathbf{z}, \theta) \pi(\theta \mid \mathbf{y}) {\rm d} \theta\)
Monte Carlo samples from \(p(\mathbf{y}^{\ast} \mid \mathbf{y})\) are obtained using the draws from the posterior \(\pi(\theta \mid \mathbf{y})\)
Predictions assessment: Interval Score (IS) and RMSP (lower values indicate better fit)

Spatiotemporal application

Tuberculosis in Brazil

TB in Context: Tuberculosis (TB) remains a major global health threat, second in infectious disease mortality only to COVID-19.
Brazilian TB Trends: Brazil’s national TB incidence declined only slightly from 2006–2015. Rio Grande do Sul (RS) reported significantly higher incidence than the national average in 2021, with the eastern region even more affected.
Spatial Dependence: Studies demonstrate strong spatial dependence of TB infections (and co-infections) in Brazil, both nationwide and in specific cities. Time dependence has been largely overlooked in the literature.
Risk Factors: TB risk factors include densely populated areas, poverty, substance abuse, and incarceration.

Objectives

Reliable incidence estimates:
- TB incidence across the study period, incorporating location-specific factors.
- Smaller municipalities benefit from “borrowed strength” from neighbors (in the Hausdorff distance sense).
- Results enable the calculation of SIRs to pinpoint high-risk areas.
Forecasts:
- TB incidence rates one year ahead.
- These predictions offer crucial insights for proactive public health planning.

Model

Sample units: 54 municipalities, across 11 years (2011 to 2021). We use 2022 to assess the quality of predictions.
Number of TB cases: \(Y_t(\mathbf{s}_i) = Y(\mathbf{s}_i, t)\) at location \(\mathbf{s}_i\) and time \(t\).
Population: \(P_t(\mathbf{s}_i)\).
Five covariates and two way interactions between presence of prison and continuous variables (except IDESE).

\[\begin{aligned} & (Y_t(\mathbf{s}_i) \mid \mathbf{X}_{t}(\mathbf{s}_i), z_t(\mathbf{s}_i)) \overset{{\rm ind}}{\sim} \text{Poisson}(P_t(\mathbf{s}_i) \mu_{it}) \\ & \log(\mu_{it}) = \beta_0 + \mathbf{X}^\top_{t}(\mathbf{s}_i) \beta + z_t(\mathbf{s}_i) \end{aligned}\]

Spatiotemporal Trend

Explanatory Variables

Random Effects

No random effects
Unstructured: Random intercept per municipality (iid Normal)
Time dependent: \(\mathrm{AR}(1)\)
Spatial: Homoscedastic HGP
Separable spatiotemporal: \(\mathrm{AR}(1)\) + Homoscedastic HGP, Multivariate \(\mathrm{AR}(1)\) with HGP errors
Separable spatiotemporal and unstructured: Multivariate \(\mathrm{AR}(1)\) with HGP errors + random intercept per municipality

Results: GOF and Predictive Performance

	LOOIC	WAIC	DIC	RMSP	IS
BYM	3606.1	3449.9	3505.4	123.3	176.6
DAGAR	3520.9	3447.5	3443.6	22.4	88.8
HGP	3516.1	3447.7	3450.4	21.1	87.8

Results: Relative Risks

Parameter	Estimate
\(\exp(\beta_1)\)	2.34 (1.70, 3.19)
\(\exp(\beta_2)\)	1.33 (1.15, 1.56)
\(\exp(\beta_3)\)	1.03 (0.99, 1.07)
\(\exp(\beta_4)\)	0.97 (0.93, 1.00)
\(\exp(\beta_5)\)	0.99 (0.92, 1.07)
\(\exp(\beta_2 + \beta_{21})\)	1.75 (1.18, 2.52)
\(\exp(\beta_3 + \beta_{31})\)	2.25 (1.63, 3.09)
\(\exp(\beta_4 + \beta_{41})\)	2.51 (1.83, 3.46)

Spatiotemporal Dependence

Small Municipalities

Forecast

Population at Risk in 2022

Highlights

Tailored an HGP extension for spatiotemporal disease mapping.
Competitive with specialized models
It helps to gain insights into spatiotemporal disease mapping through spatiotemporal correlation functions.
More reliable estimates of risk factors
Out-of-sample predictions to inform public policies

Closing remarks

Conclusion

The HGP offers a powerful and versatile model for spatial analysis. Advantages include:

Common framework for diverse spatial data types.
Competitive performance when compared to specialized models.
Directly interpretable spatial parameters.
Reliable and accurate predictions.

Future work and Limitations

Extensions:
- non-Euclidean spaces
- Big data
Limitations:
- Difficulties in obtaining spectral densities for correlation functions
- Big “n” problem inherited from geostatistics
- Unclear how to incorporate anisotropy

References

Besag, J., York, J., and Mollié, A. (1991), “Bayesian image restoration, with two applications in spatial statistics,” Annals of the Institute of Statistical Mathematics, 43, 1–20.

Cressie, N. (1993), Statistics for spatial data, Wiley series in probability and statistics, Wiley.

Datta, A., Banerjee, S., Hodges, J. S., and Gao, L. (2019), “Spatial disease mapping using directed acyclic graph auto-regressive (DAGAR) models,” Bayesian analysis, NIH Public Access, 14, 1221.

Dean, C., Ugarte, M., and Militino, A. (2001), “Detecting interaction between random region and fixed age effects in disease mapping,” Biometrics, Wiley Online Library, 57, 197–202.

Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1998), “Model-based geostatistics,” Journal of the Royal Statistical Society Series C: Applied Statistics, Oxford University Press, 47, 299–350.

Gneiting, T., and Raftery, A. E. (2007), “Strictly proper scoring rules, prediction, and estimation,” Journal of the American statistical Association, Taylor & Francis, 102, 359–378.

Gonçalves, F. B., and Gamerman, D. (2018), “Exact Bayesian inference in spatiotemporal Cox processes driven by multivariate Gaussian processes,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), Wiley Online Library, 80, 157–175.

Homan, M. D., and Gelman, A. (2014), “The No-U-turn sampler: Adaptively setting path lengths in hamiltonian Monte Carlo,” Journal of Machine Learning Research, JMLR.org, 15, 1593–1623.

Johnson, O., Diggle, P., and Giorgi, E. (2020), “Dealing with spatial misalignment to model the relationship between deprivation and life expectancy: A model-based geostatistical approach,” International Journal of Health Geographics, Springer, 19, 1–13.

Leroux, B. G., Lei, X., and Breslow, N. (2000), “Estimation of disease rates in small areas: A new mixed model for spatial dependence,” in Statistical models in epidemiology, the environment, and clinical trials, Springer, pp. 179–191.

Li, D., Tang, W., and Banerjee, S. (2023), “Inference for Gaussian processes with Matérn covariogram on compact Riemannian manifolds,” Journal of Machine Learning Research, 24, 1–26.

Lindgren, F., Rue, H., and Lindström, J. (2011), “An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 423–498. https://doi.org/https://doi.org/10.1111/j.1467-9868.2011.00777.x.

Min, D., Zhilin, L., and Xiaoyong, C. (2007), “Extended Hausdorff distance for spatial objects in GIS,” International Journal of Geographical Information Science, Taylor & Francis, 21, 459–475.

Moraga, P., Cramb, S. M., Mengersen, K. L., and Pagano, M. (2017), “A geostatistical model for combined analysis of point-level and area-level data using INLA and SPDE,” Spatial Statistics, Elsevier, 21, 27–41.

Rue, H., Martino, S., and Chopin, N. (2009), “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), Wiley Online Library, 71, 319–392.

Sendov, B. (2004), “Hausdorff distance and image processing,” Russian Mathematical Surveys, IOP Publishing, 59, 319. https://doi.org/10.1070/RM2004v059n02ABEH000721.

Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., and Bürkner, P.-C. (2021), “Rank-normalization, folding, and localization: An improved \(\hat{R}\) for assessing convergence of MCMC (with discussion),” Bayesian Analysis, International Society for Bayesian Analysis, 16, 667–718.

Zhang, H. (2004), “Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics,” Journal of the American Statistical Association, Taylor & Francis, 99, 250–261.

Appendix

Areal Models: Limitations

Out-of-sample predictions: Non-trivial, it may be necessary to refit the model.
Interpretability: Relationship between the spatial dependence parameter and correlation between neighbors is counterintuitive.
Heteroscedasticity: Marginal SD depend on the number of neighbors.
Extensions for change of support and data fusion problems are cumbersome and there exists no consensus in the literature in how to approach such problems.

Data Fusion Models: Limitations

Computing: Although satisfactory approximations exist, it becomes computationally prohibitive for moderately large datasets.
Approximations: Inferences may drastically change depending on the precision of the approximations.
Biases: It is hard to quantify the impact of the discretizations on inferences (Gonçalves and Gamerman 2018).

Areal Data and the CAR Model

\(Z_k \mid Z_{-k} \sim \mathcal{N}\left (\frac{\psi} {m_k} \sum_{i \sim k} Z_i, \frac{\tau^2}{m_k} \right)\)
Special case: ICAR.
Extensions of the ICAR model, providing a measure of spatial dependence have been developed (Dean et al. 2001; Leroux et al. 2000).
The directed acyclic graph auto-regressive (DAGAR) model (Datta et al. 2019) proposes a new way to construct precision matrices using a DAG derived from the original adjacency matrix.

Fused Data

GP at the point-referrenced level and a stochastic integral at the areal level (AGP) (Moraga et al. 2017): \[Z(\mathbf{s}_i) = \begin{cases} {\mathcal{A}(\mathbf{s}_i)}^{-1} \int_{\mathbf{s}_i} Z(\mathbf{s}) \mathrm{d}\mathbf{s}, & \text{if } \mathcal{A}(\mathbf{s}_i) > 0, \\ Z(\mathbf{s}_i), & \; \text{otherwise,} \end{cases}\]
Covariance (Johnson et al. 2020): \[\mathrm{Cov}(Z(\mathbf{s}_i), Z(\mathbf{s}_j)) = {(\mathcal{A}(\mathbf{s}_i) \mathcal{A}(\mathbf{s}_j))}^{-1} \int_{\mathbf{s}_j} \int_{\mathbf{s}_i} C(\mathbf{x}, \mathbf{y}) \mathrm{d} \mathbf{x} \mathrm{d} \mathbf{y}\]
Arbitrary precision: The stochastic integrals are evaluated numerically and depend on a user-defined grid.

Theoretical Foundation: Positive Definiteness of the PEC

Proposition

Let \(h = h(\mathbf{s}, \mathbf{s}')\) be the Hausdorff distance between two spatial units, denoted \(\mathbf{s}, \mathbf{s}' \in \mathcal{B}(D)\), where \(D \subset \mathbb{R}^2\). The powered exponential correlation function \(\exp \{ - h^{\nu} / \phi^{\nu} \}\) is positive definite for \(\nu \in (1/2, 1)\).

The proposition above guarantees the validity of the HGP equipped with a PEC function with \(\nu \in (1/2, 1)\).
The proof is based on embedding the Hausdorff distance into a high-dimensional \(L_1\) normed Euclidean space, and using the fact that the exponential correlation function is positive definite on this space.

Practical Validation: Empirical Assessment of Positive Definiteness

Simulation Studies: Setup

Goal: Evaluate HGP’s ability to adapt to different scenarios and compare its performance to specialized models.
200 simulated datasets for each combination of data generation process and scenario.
Estimation assessment: MAPE, RMSE, frequentist coverage of the credible interval (CP).
Predictions assessment: bias, RMSP, interval score (Gneiting and Raftery 2007, IS), frequentist coverate of the prediction intervals (CPP).

Simulation Studies: Scenarios

Simulation Study: Areal Data

Data model: Poisson GLMM, single covariate, log link function, and a spatial random effect.
We fit Poisson GLMMs with a homoscedastic HGP prior on the random effect.
Assessment:
- Parameter estimation when generating data from HGP random effects.
- How good are the estimates of the regression coefficient under random effect misspecification?
- Goodness-of-fit comparison with DAGAR priors on the random effect.

Areal Random Effects

Data Generation:
- Besag-York-Mollié (BYM, Besag et al. 1991): \(\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \sigma^{2}((1 - \zeta) \mathbf{I} + \zeta \mathbf{Q}^{-})\), where \(\mathbf{Q}\) is the precision matrix from the ICAR model and \(\zeta\) is a mixing parameter.
- Aggregated GP (AGP): \(Z(\mathbf{s}_i) = {\mathcal{A}(\mathbf{s}_i)}^{-1} \int_{\mathbf{s}_i} K(\mathbf{s}) \mathrm{d} \mathbf{s}\), where \(\{ K(\mathbf{s}) \, : \, \mathbf{s} \in D \}\) is a (point-level) zero-mean, second order stationary and isotropic GP.
Competing Method:
- DAGAR: \(\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \Lambda(\tau, \psi))\), where \(\Lambda\) is a precision matrix, \(\tau\) is the conditional precision, and \(\psi\) measures spatial dependence.

Areal Simulation: Estimation

Generating data from HGP:
- CP of the hardest parameter to estimate (\(\rho\)) was \(92.5\%\) in the worst case scenario.
- For all the other parameters, the coverage was extremely close to the nominal level.
Regression parameter under misspecification:
- Largest (worst) MAPE was \(6.6\%\).
- CP of the credible interval suggests reliable estimates.

Areal Simulation: GOF

HGP outperforms DAGAR when simulating data from a AGP random effect.

Simulation Study: Data Fusion

Data model: Gaussian GLMM, no covariates, identity link function, and a spatial random effect.
We fit the models with a heteroscedastic HGP as prior on the random effect.
Competing method: AGP₁ and AGP₂ priors on the random effect. The latter employes a finer grid/mesh resolution to approximate integrals numerically.
Parameters estimation when generating data from HGP random effects.
Model comparison: Predictions assessment.

Competing Method: Meshes

Competing Method: Further Details

The model employs a Matérn covariance function (at point level) with smoothness parameter fixed at 1. The other two parameters are the SD \(\sigma\) and the practical range \(\rho\). Both are analogous to the parameters with the same name in the HGP model.
Inferences are conducted using the INLA (Rue et al. 2009) and the stochastic partial differential equation (Lindgren et al. 2011, SPDE) method.

Data Fusion Simulation: Estimation

The HGP inherits same problems from geostatistical models.
There are no weakly consistent estimators for the variance and spatial dependence parameters (Zhang 2004)
Predictions are still optimal (Zhang 2004)
The small scale variation parameter had a low coverage in the credible intervals (\(79.5\%\) for scenario 2).
The MAPE for the \(\sigma\), \(\sigma_a\), and \(\rho\) parameters fluctuated around \(10\%\) in both scenarios.

Data Fusion Simulation: Prediction

Uncertainty of the predictions: The HGP handles uncertainty in its predictions better than the AGP₁ model, even when the latter reflects the true data generation process.

HGP as an AGP Approximation: The HGP delivers satisfactory approximations to an AGP model.
HGP Advantages: The HGP offers potential computational advantages for large datasets while maintaining clear parameter interpretation.

Highlights

Flexibility & Interpretability: Models diverse spatial data types and provides meaningful spatial parameters, regardless of unit size or shape.
Predictions: The HGP delivers accurate predictions
Areal Data: The model delivers valuable insights on the spatial dependence and accounts for sample units’ size, shape, and orientation.
Data Fusion: The HGP simplifies the modeling process eliminating the need of a user-defined grid or mesh. Consequently, avoids computationally intensive numerical integrations with arbitrary precision, making inference seamless.