Tuberculosis in context

  • Mortality: Tuberculosis (TB) remains a major global health threat, second in infectious disease mortality only to COVID-19.

  • Rio Grande do Sul (RS) reported significantly higher incidence than the national average in 2021, with the eastern region even more affected.

  • Dependence: Studies demonstrate strong spatial dependence of TB infections in Brazil, but temporal and spatiotemporal structures have been largely overlooked.

  • Risk Factors: TB risk factors include densely populated areas, poverty, substance abuse, and incarceration (Cortez et al. 2021).

Spatiotemporal (SPT) models for areal data

  • Spatial models: CAR (Besag 1974), ICAR, BYM (Besag et al. 1991), DAGAR (Datta et al. 2019), RENeGe (Cruz-Reyes et al. 2023).

  • Nonseparable SPT models are more complex because they allow the spatial and temporal correlations to be intertwined (Cressie and Wikle 2015, pp. 309–321).

  • Separable models: one way to view these models is as multivariate spatial processes (MacNab 2022).

  • Advantages of separable models: Computational efficiency & positive-definiteness of the covariance function.

Proposed methodology & Objectives

  • Hausdorff–Gaussian Process (HGP): we propose using the newly developed HGP for the spatial portion of the model (Godoy et al. 2024).

  • Reliable incidence estimates:

    • Smaller municipalities benefit from borrowed strength from neighbors, improving estimate reliability.
    • Results enable the calculation of standardized incidence ratios to pinpoint high-risk areas.
  • Forecasting: Predicted TB incidence rates one year ahead offer crucial insights for proactive public health planning.

Hausdorff–Gaussian Process (HGP)


  • Areal spatial units are (closed and bounded) sets.

  • We need to generalize distance between points to distance between sets.

  • Ideally, this distance should:

    1. Take into account the shape, size, and orientation of spatial sample units.
    2. Be “spatially interpretable”.

Distances between sets

  • Distance between a point and a set: \(d(x, A) = \inf_{a \in A} d(x, a)\), where \(d(x, y)\) is the distance between any two elements \(x, y \in D\)

  • Directed Hausdorff & Hausdorff distance: \[{\vec h}(A, B) = \sup_{a \in A} d(a, B) \quad \text{and} \quad h(A, B) = \max \left \{ \vec{h}(A, B), \vec{h}(B, A) \right \}\]
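For finite point sets, the infimum and supremum in the definitions above reduce to a minimum and a maximum. The following is a minimal sketch under that assumption, using Euclidean distance and two hypothetical point clouds `A` and `B`; real municipal polygons would need a discretized boundary or a geometry library, which is not shown here.

```python
import math

def point_to_set(x, A):
    """d(x, A): smallest Euclidean distance from point x to any point of A."""
    return min(math.dist(x, a) for a in A)

def directed_hausdorff(A, B):
    """Directed Hausdorff distance: largest distance from a point of A to the set B."""
    return max(point_to_set(a, B) for a in A)

def hausdorff(A, B):
    """Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Hypothetical point clouds standing in for two discretized regions.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (4.0, 0.0)]

print(directed_hausdorff(A, B))  # 1.0
print(directed_hausdorff(B, A))  # 3.0
print(hausdorff(A, B))           # 3.0
```

The two directed distances differ, which is why the symmetric version takes their maximum.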


  • General spatial model: \(\{ Z(\mathbf{s}) \; : \; \mathbf{s} \in \mathcal{B}(D) \}\).

  • Index set: \(\mathcal{B}(D)\) represents the closed and bounded subsets of \(D \subset \mathbb{R}^2\).

  • Assumption: The HGP assumes \(Z(\mathbf{s})\) to be an isotropic Gaussian Process such that its spatial correlation function depends on the Hausdorff distance.

  • Powered Exponential Correlation (PEC) function: \(r(h) = \exp\left \{ - \frac{h^{\nu}}{\phi^{\nu}}\right \},\) where \(h\) denotes the Hausdorff distance between \(\mathbf{s}_1, \mathbf{s}_2 \in \mathcal{B}(D)\).
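As an illustration of the PEC function, the sketch below turns a matrix of pairwise Hausdorff distances into a correlation matrix. The distance matrix `H` and the values of `phi` and `nu` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def pec(h, phi, nu):
    """Powered exponential correlation: exp(-(h / phi)^nu)."""
    return math.exp(-((h / phi) ** nu))

def pec_matrix(H, phi, nu):
    """Apply the PEC function entrywise to a matrix of pairwise distances."""
    return [[pec(h, phi, nu) for h in row] for row in H]

# Hypothetical Hausdorff distances between three areal units.
H = [
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 3.0],
    [5.0, 3.0, 0.0],
]

R = pec_matrix(H, phi=2.0, nu=1.0)
for row in R:
    print([round(r, 4) for r in row])
```

With `nu = 1` this reduces to the exponential correlation function; the diagonal is 1 because every set is at Hausdorff distance zero from itself.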

Tuberculosis spatiotemporal modeling

Data & Model

  • Sample units: 54 municipalities, across 11 years (2011 to 2021). We use 2022 to assess the quality of predictions.

  • Number of TB cases: \(Y_t(\mathbf{s}_i)\) at location \(\mathbf{s}_i\) and time \(t\).

  • Population: \(P_t(\mathbf{s}_i)\).

  • Five covariates and their two-way interactions with the presence of a prison (except for IDESE).

\[\begin{aligned} & (Y_t(\mathbf{s}_i) \mid \mathbf{X}_{t}(\mathbf{s}_i), Z(\mathbf{s}_i, t)) \overset{{\rm ind}}{\sim} \text{Poisson}(P_t(\mathbf{s}_i) \mu_{it}) \\ & \log(\mu_{it}) = \alpha + \mathbf{X}^\top_{t}(\mathbf{s}_i) \boldsymbol{\beta} + Z(\mathbf{s}_i, t) \end{aligned}\]
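To make the likelihood concrete, the sketch below evaluates the Poisson log-density of a single observation given its population offset, covariates, and random effect. The function name, argument names, and numeric inputs are hypothetical; they are not taken from the authors' implementation.

```python
import math

def poisson_loglik(y, pop, alpha, x, beta, z):
    """Log-density of y ~ Poisson(pop * mu) with log(mu) = alpha + x'beta + z."""
    log_mu = alpha + sum(xj * bj for xj, bj in zip(x, beta)) + z
    rate = pop * math.exp(log_mu)
    return y * math.log(rate) - rate - math.lgamma(y + 1)

# Hypothetical municipality: 10,000 inhabitants and a baseline rate of 1 per 1,000.
ll = poisson_loglik(y=10, pop=10_000, alpha=math.log(0.001), x=[0.0], beta=[0.5], z=0.0)
print(ll)
```

Because the population enters as a multiplicative offset, \(\mu_{it}\) is interpreted as a rate per inhabitant.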


  • We assume \(Z(\mathbf{s}, t)\) is a separable zero-mean Gaussian process whose SPT covariance matrix is the Kronecker product of a spatial covariance matrix (HGP, BYM, or DAGAR) and a temporal correlation matrix (\(\mathrm{AR}(1)\)).

  • HGP spatial dependence: \(\rho \sim \mathrm{Exp}(a_\rho)\), where \(a_{\rho} = - \log(p_{\rho}) / \rho_0\). \(a_\rho\) is chosen such that \(\mathbb{P}(\rho > \rho_0) = p_\rho\).

  • Smoothness & marginal SD: \(\nu \sim \mathrm{Beta}(2.5, 1.5)\) (mode at \(0.75\)) & \(\sigma \sim t_{+}(3)\).

  • Temporal dependence: PC prior (Sørbye and Rue 2017) where \(\mathbb{P}(\lvert \psi \rvert > 0.8) = 0.1\).

  • Intercept & regression coefficients: flat prior on \(\alpha\) (i.e., \(\pi(\alpha) \propto 1\)) & \(\boldsymbol{\beta} \sim \mathcal{N}(\mathbf{0}, 10 \mathbf{I})\).
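The calibration of the exponential prior and the stated mode of the Beta prior can be checked numerically. This is a small sketch; the values `rho0 = 100.0` and `p = 0.05` are hypothetical and not taken from the analysis.

```python
import math

def exp_rate_for_tail(rho0, p):
    """Rate a such that an Exp(a) variable exceeds rho0 with probability p."""
    return -math.log(p) / rho0

def exp_tail_prob(rate, x):
    """P(rho > x) for rho ~ Exp(rate)."""
    return math.exp(-rate * x)

def beta_mode(a, b):
    """Mode of a Beta(a, b) density, valid when a > 1 and b > 1."""
    return (a - 1.0) / (a + b - 2.0)

a_rho = exp_rate_for_tail(rho0=100.0, p=0.05)  # hypothetical rho0 and p
print(a_rho)
print(exp_tail_prob(a_rho, 100.0))
print(beta_mode(2.5, 1.5))  # 0.75
```

The second printed value recovers the chosen tail probability up to floating-point error, confirming that \(a_\rho\) encodes the statement \(\mathbb{P}(\rho > \rho_0) = p_\rho\).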

Computational considerations

  • Super effortful: Evaluating \(\mathrm{vec}(\mathbf{Z}) \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathrm{R}_s \otimes \mathrm{R}_t)\) directly requires \(\mathcal{O}(N^3 T^3)\) flops and \(\mathcal{O}(N^2 T^2)\) storage.

  • Effortful: Using properties of the Kronecker product, we can reduce the cost to \(\approx \mathcal{O}(N^3 + T^3)\) flops and \(\mathcal{O}(N^2 + T^2)\) storage for the correlation matrices.

  • Neutral: Further linear algebra can be used to evaluate the quadratic form with fewer operations.

  • Clever: For an \(\mathrm{AR}(1)\) process, \(R^{-1}_t\) is tridiagonal, so its Cholesky factor is bidiagonal.

  • Super clever: The cost of obtaining \(\mathrm{chol}(R^{-1}_s)\) is dramatically reduced using nearest-neighbor approximations (Finley et al. 2019).
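Two of the shortcuts above can be verified on a toy problem. The sketch below assumes \(\mathbf{Z}\) is stored as a \(T \times N\) matrix with column-stacking \(\mathrm{vec}\), builds the closed-form tridiagonal inverse of an AR(1) correlation matrix, and compares the quadratic form computed through the full Kronecker product with the trace form \(\mathrm{tr}(Q_s \mathbf{Z}^\top Q_t \mathbf{Z})\). The spatial precision `Qs` and the values in `Z` are hypothetical, and the pure-Python helpers are for illustration only, not the authors' implementation.

```python
def ar1_corr(psi, T):
    """AR(1) correlation matrix with entries psi^|i-j|."""
    return [[psi ** abs(i - j) for j in range(T)] for i in range(T)]

def ar1_precision(psi, T):
    """Closed-form tridiagonal inverse of the AR(1) correlation matrix (T >= 2)."""
    c = 1.0 / (1.0 - psi ** 2)
    Q = [[0.0] * T for _ in range(T)]
    for t in range(T):
        Q[t][t] = c if t in (0, T - 1) else c * (1.0 + psi ** 2)
        if t + 1 < T:
            Q[t][t + 1] = Q[t + 1][t] = -c * psi
    return Q

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Kronecker product A (x) B as a dense list of lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def vec(X):
    """Stack the columns of X into a single list."""
    return [x for col in zip(*X) for x in col]

def quad_form_naive(Z, Qs, Qt):
    """vec(Z)' (Qs (x) Qt) vec(Z), forming the full NT x NT matrix."""
    v, K = vec(Z), kron(Qs, Qt)
    return sum(vi * kij * vj for vi, row in zip(v, K) for kij, vj in zip(row, v))

def quad_form_kron(Z, Qs, Qt):
    """Same quadratic form as tr(Qs Z' Qt Z), never forming the Kronecker product."""
    M = matmul(matmul(Qs, transpose(Z)), matmul(Qt, Z))
    return sum(M[i][i] for i in range(len(M)))

psi, T = 0.6, 4
Qt = ar1_precision(psi, T)
Qs = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]  # hypothetical spatial precision
Z = [[0.3, -1.2, 0.5], [1.1, 0.4, -0.7], [-0.2, 0.9, 1.3], [0.8, -0.5, 0.1]]  # T x N

print(quad_form_naive(Z, Qs, Qt))
print(quad_form_kron(Z, Qs, Qt))
```

Both calls should print the same value up to rounding. The trace form only touches \(N \times N\), \(T \times T\), and \(T \times N\) matrices, which is where the savings in flops and storage come from.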

Bayesian Inference & Model Assessment

  • Posterior: \(\pi(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{z}) \propto p(\mathbf{y} \mid \mathbf{z}, \boldsymbol{\theta}) p(\mathbf{z} \mid \boldsymbol{\theta}) \pi(\boldsymbol{\theta})\)

  • MCMC sampler: No-U-Turn sampler (Hoffman and Gelman 2014).

  • Convergence assessment: traceplots and split-\({\hat{R}}\) (Vehtari et al. 2021).

  • Goodness-of-fit criteria: LOOIC (lower values indicate better fit)

  • Posterior predictive distributions: \(p(\mathbf{y}^{\ast} \mid \mathbf{y})\)

  • Prediction assessment: Interval Score (IS) and RMSP (lower values indicate better predictions).
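The two prediction criteria can be sketched as follows. The interval score uses its standard definition for a central \((1 - \alpha)\) prediction interval, and RMSP is assumed here to denote the root mean squared prediction error; the level `alpha = 0.05` and the example numbers are hypothetical.

```python
import math

def interval_score(lower, upper, y, alpha=0.05):
    """Interval score: interval width plus a penalty when y falls outside."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score

def rmsp(predicted, observed):
    """Root mean squared prediction error (assumed meaning of RMSP)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

print(interval_score(2.0, 10.0, 5.0))   # observation inside the interval
print(interval_score(2.0, 10.0, 12.0))  # observation above the interval
print(rmsp([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

A narrow interval that still covers the observation gets the lowest score, so the criterion rewards both sharpness and calibration.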

Spatiotemporal Trend

Explanatory Variables

Results: GOF and Predictive Performance

HGP 3516.1 21.1 87.8
BYM 3606.1 123.3 176.6
DAGAR 3520.9 22.4 88.8

Results: Relative Risks

Parameter                         Description                   Estimate
\(\exp(\beta_1)\)                 Prison                        2.34 (1.70, 3.19)
\(\exp(\beta_2)\)                 Pop / km2                     1.33 (1.15, 1.56)
\(\exp(\beta_2 + \beta_{21})\)    Pop / km2, with prison        1.75 (1.18, 2.52)
\(\exp(\beta_3)\)                 HS dropout %                  1.03 (0.99, 1.07)
\(\exp(\beta_3 + \beta_{31})\)    HS dropout %, with prison     2.25 (1.63, 3.09)
\(\exp(\beta_4)\)                 Homicide rate                 0.97 (0.93, 1.00)
\(\exp(\beta_4 + \beta_{41})\)    Homicide rate, with prison    2.51 (1.83, 3.46)
\(\exp(\beta_5)\)                 IDESE                         0.99 (0.92, 1.07)

Spatiotemporal Dependence

Small Municipalities


Closing remarks

  • Tailored an HGP extension for spatiotemporal disease mapping.

  • Competitive with specialized models

  • Spatiotemporal correlation functions provide insight into the dependence structure in disease mapping.

  • More reliable estimates of risk factors

  • Out-of-sample predictions to inform public policies


Sensitivity analysis