
General setup

Let us set up the notation first. Suppose there exists a partition of a region $\mathrm{D} \subset \mathcal{R}^2$ (e.g., a city). This partition is denoted by $A_i$, $i = 1, \ldots, n$. Moreover, there exists another partition of the same city, denoted by $B_j$, $j = 1, \ldots, m$. These partitions can be seen as two different administrative divisions of the same city; it is common for different government agencies to release data for different divisions of the same city, country, or state. Assume we observe a random variable $Y(\cdot)$ at each region $A_i$ and are interested in predicting/estimating this variable in each of the regions $B_j$.
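To make the setup concrete, the sketch below builds two such partitions in Python with the shapely library. The unit square standing in for $\mathrm{D}$, the grid sizes, and all variable names are illustrative assumptions, not part of the methodology.

```python
# A toy version of the setup: two different partitions, A_1..A_n and
# B_1..B_m, of the same region D (here, a unit square).
from shapely.geometry import box

D = box(0.0, 0.0, 1.0, 1.0)

# Partition A: a 2 x 2 grid (n = 4 regions).
A = [box(i / 2, j / 2, (i + 1) / 2, (j + 1) / 2)
     for i in range(2) for j in range(2)]

# Partition B: a 3 x 3 grid (m = 9 regions) of the very same square.
B = [box(i / 3, j / 3, (i + 1) / 3, (j + 1) / 3)
     for i in range(3) for j in range(3)]

# Both families of polygons tile D exactly.
assert abs(sum(a.area for a in A) - D.area) < 1e-12
assert abs(sum(b.area for b in B) - D.area) < 1e-12
```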

Model-based approach

Suppose the random variable $Y(\cdot)$ varies continuously over $\mathrm{D}$ and is defined as

$$ Y(\mathbf{s}) = \mu + S(\mathbf{s}) + \varepsilon(\mathbf{s}), \quad \mathbf{s} \in \mathrm{D} \subset \mathcal{R}^2, $$

where $S(\cdot) \sim \mathrm{GP}(0, \sigma^2 \rho(\cdot; \, \phi, \kappa))$ and $\varepsilon(\cdot) \overset{\mathrm{i.i.d.}}{\sim} \mathrm{N}(0, \tau)$, with $S$ and $\varepsilon$ independent. For now, let us make the unrealistic assumption that all these parameters are known.
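The text leaves the correlation family $\rho(\cdot; \, \phi, \kappa)$ unspecified; one common isotropic choice with a range parameter $\phi$ and a smoothness parameter $\kappa$ is the Matérn family, sketched below in Python with scipy. Treat this as one possible assumption, not the only option.

```python
# A hedged sketch of one possible rho(.; phi, kappa): the Matern correlation.
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the 2nd kind


def matern_corr(h, phi, kappa):
    """Matern correlation at distances h, with range phi > 0 and smoothness kappa > 0."""
    h = np.asarray(h, dtype=float)
    out = np.ones_like(h)                  # rho(0) = 1
    pos = h > 0.0
    u = np.sqrt(2.0 * kappa) * h[pos] / phi
    out[pos] = (2.0 ** (1.0 - kappa) / gamma(kappa)) * u ** kappa * kv(kappa, u)
    return out


# Sanity check: kappa = 0.5 recovers the exponential correlation exp(-h / phi).
h = np.array([0.0, 0.1, 0.5, 1.0])
print(matern_corr(h, phi=0.3, kappa=0.5))
print(np.exp(-h / 0.3))
```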

Then, our assumption is that the observed data are as follows:

$$ \begin{aligned} Y(A_i) & = \frac{1}{\lvert A_i \rvert} \int_{A_i} Y(\mathbf{s}) \, \mathrm{d} \mathbf{s} \\ & = \frac{1}{\lvert A_i \rvert} \int_{A_i} [\mu + S(\mathbf{s}) + \varepsilon(\mathbf{s})] \, \mathrm{d} \mathbf{s} \\ & = \mu + \frac{1}{\lvert A_i \rvert} \int_{A_i} S(\mathbf{s}) \, \mathrm{d} \mathbf{s} + \frac{1}{\lvert A_i \rvert} \int_{A_i} \varepsilon(\mathbf{s}) \, \mathrm{d} \mathbf{s}, \end{aligned} $$

where $\lvert \cdot \rvert$ returns the area of a polygon. Consequently,

$$ \mathrm{E}[Y(A_i)] = \frac{1}{\lvert A_i \rvert} \int_{A_i} \mathrm{E}[Y(\mathbf{s})] \, \mathrm{d} \mathbf{s} = \frac{1}{\lvert A_i \rvert} \int_{A_i} \mu \, \mathrm{d} \mathbf{s} = \mu $$

and

$$ \begin{aligned} \mathrm{Cov}[Y(A_i), Y(A_j)] & = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \mathrm{Cov}[Y(\mathbf{s}), Y(\mathbf{s}')] \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}' \\ & = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \mathrm{C}(\lVert \mathbf{s} - \mathbf{s}' \rVert; \, \boldsymbol{\theta}) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}', \end{aligned} $$

where $\lVert \mathbf{s} - \mathbf{s}' \rVert$ is the Euclidean distance between the coordinates $\mathbf{s}$ and $\mathbf{s}'$, and $\mathrm{C}(\lVert \mathbf{s} - \mathbf{s}' \rVert; \, \boldsymbol{\theta})$ is an isotropic covariance function depending on the parameter $\boldsymbol{\theta}$. Furthermore, it can be shown (using Fubini's theorem and some algebraic manipulation) that

$$ \mathrm{Cov}(Y(A_i), Y(A_j)) = \frac{\sigma^2}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho(\lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}' + \mathbf{I}(i = j) \frac{\tau}{\lvert A_i \rvert}, $$

where $\rho(\cdot; \, \phi, \kappa)$ is a positive definite correlation function. Now, let $\mathrm{R}_{\kappa}(\phi)$ be a correlation matrix such that

$$ \mathrm{R}_{\kappa}(\phi)_{ij} = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho(\lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}'; $$

thus,

$$ Y(A_1, \ldots, A_n) \sim \mathrm{N}\big( \mu \mathbf{1}_n, \; \sigma^2 \mathrm{R}_{\kappa}(\phi) + \tau \, \mathrm{diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1}) \big). $$

Then, if we assume $(Y^{\top}(A_1, \ldots, A_n), Y^{\top}(B_1, \ldots, B_m))^{\top}$ to be jointly normal, we can use the conditional mean of $Y(B_1, \ldots, B_m)$ given $Y(A_1, \ldots, A_n)$ to predict the observed random variable over the partition $B_1, \ldots, B_m$.
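The double integrals above rarely admit closed forms, so in practice they are approximated numerically; one simple route is Monte Carlo integration over points drawn uniformly within each polygon. The sketch below is not the package's implementation: it reuses `D`, `A`, `B`, and `matern_corr` from the earlier sketches, and `mu`, `sigma2`, `tau`, `phi`, `kappa`, and `y_A` are made-up illustrative values.

```python
# Model-based prediction with "known" parameters: approximate the
# block-averaged correlations R_kappa(phi)_{ij} by Monte Carlo, assemble the
# joint covariance, and take the conditional mean of Y(B) given Y(A).
import numpy as np
from shapely.geometry import Point

rng = np.random.default_rng(42)
mu, sigma2, tau, phi, kappa = 0.0, 1.0, 0.1, 0.3, 0.5


def sample_in(poly, size):
    """Uniform draws inside a polygon via rejection sampling on its bounding box."""
    minx, miny, maxx, maxy = poly.bounds
    pts = []
    while len(pts) < size:
        x, y = rng.uniform((minx, miny), (maxx, maxy))
        if poly.contains(Point(x, y)):
            pts.append((x, y))
    return np.array(pts)


def block_corr(polys1, polys2, phi, kappa, size=200):
    """Monte Carlo estimate of the matrix of block-averaged correlations."""
    s1 = [sample_in(p, size) for p in polys1]
    s2 = [sample_in(p, size) for p in polys2]
    R = np.empty((len(polys1), len(polys2)))
    for i, si in enumerate(s1):
        for j, sj in enumerate(s2):
            d = np.linalg.norm(si[:, None, :] - sj[None, :, :], axis=-1)
            R[i, j] = matern_corr(d.ravel(), phi, kappa).mean()
    return R


# Joint covariance blocks; the nugget tau / |A_i| enters only the diagonal.
cov_AA = sigma2 * block_corr(A, A, phi, kappa) + tau * np.diag([1 / a.area for a in A])
cov_BA = sigma2 * block_corr(B, A, phi, kappa)

# Conditional (kriging-type) mean given observed areal averages y_A.
y_A = np.array([0.2, -0.1, 0.4, 0.0])   # made-up observations, one per A_i
y_B_hat = mu + cov_BA @ np.linalg.solve(cov_AA, y_A - mu)
```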


Now, suppose the parameters $\boldsymbol{\theta} = (\mu, \sigma^2, \phi, \tau)$ are unknown. The likelihood of $Y(A_1, \ldots, A_n)$ can still be computed.

In particular, if we use the parametrization $\alpha = \tau / \sigma^2$, we have closed-form maximum likelihood estimators for both $\mu$ and $\sigma^2$. Thus, we can optimize the profile likelihood for $\phi$ and $\alpha$ numerically. Then, we rely on the properties of the conditional normal distribution again to compute predictions for a new set of regions.
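Concretely, writing $V(\phi, \alpha) = \mathrm{R}_{\kappa}(\phi) + \alpha \, \mathrm{diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1})$, so that $\mathrm{Var}[Y(A_1, \ldots, A_n)] = \sigma^2 V(\phi, \alpha)$, the closed forms are the standard generalized-least-squares expressions $\hat{\mu} = (\mathbf{1}_n^{\top} V^{-1} \mathbf{y}) / (\mathbf{1}_n^{\top} V^{-1} \mathbf{1}_n)$ and $\hat{\sigma}^2 = (\mathbf{y} - \hat{\mu} \mathbf{1}_n)^{\top} V^{-1} (\mathbf{y} - \hat{\mu} \mathbf{1}_n) / n$. The sketch below plugs them into the profile log-likelihood and optimizes $(\phi, \alpha)$ numerically; the Nelder-Mead optimizer and the starting values are arbitrary choices, and the toy objects come from the earlier sketches.

```python
# Profile-likelihood fit for (phi, alpha), with alpha = tau / sigma2.
# Points are sampled once, so the Monte Carlo integral is a deterministic
# function of phi and the optimizer sees a smooth objective. Reuses A, y_A,
# kappa, sample_in, and matern_corr from the sketches above.
import numpy as np
from scipy.optimize import minimize

n = len(A)
ones = np.ones(n)
D_inv = np.diag([1.0 / a.area for a in A])      # diag(1 / |A_i|)
pts = [sample_in(a, 200) for a in A]
dists = [[np.linalg.norm(pi[:, None, :] - pj[None, :, :], axis=-1).ravel()
          for pj in pts] for pi in pts]


def R_of(phi):
    """Block-averaged correlation matrix R_kappa(phi) on the fixed samples."""
    return np.array([[matern_corr(d, phi, kappa).mean() for d in row]
                     for row in dists])


def neg_profile_loglik(log_params):
    phi, alpha = np.exp(log_params)             # optimize on the log scale
    V = R_of(phi) + alpha * D_inv
    mu_hat = (ones @ np.linalg.solve(V, y_A)) / (ones @ np.linalg.solve(V, ones))
    r = y_A - mu_hat
    sigma2_hat = (r @ np.linalg.solve(V, r)) / n
    _, logdet = np.linalg.slogdet(V)
    # Negative profile log-likelihood, up to an additive constant.
    return 0.5 * (n * np.log(sigma2_hat) + logdet)


fit = minimize(neg_profile_loglik, x0=np.log([0.3, 0.1]), method="Nelder-Mead")
phi_hat, alpha_hat = np.exp(fit.x)
```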

Areal Interpolation (AI)

Areal interpolation is a nonparametric approach that interpolates the $Y(A_i)$'s to construct the $Y(B_j)$'s. Define an $m \times n$ matrix $\mathbf{W} = \{ w_{ij} \}$, where $w_{ij}$ is the weight associated with the polygon $A_j$ in constructing $Y(B_i)$. The weights are $w_{ij} = \lvert A_j \cap B_i \rvert / \lvert B_i \rvert$ (Goodchild and Lam 1980; Gotway and Young 2002). The interpolation $\hat{Y}(B_1, \ldots, B_m)$ is constructed as

$$ \hat{Y}(B_1, \ldots, B_m) = \mathbf{W} Y(A_1, \ldots, A_n). $$

The expectation and variance of the predictor are, respectively,

$$ \mathrm{E}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \, \mathrm{E}[Y(A_1, \ldots, A_n)] $$

and

$$ \mathrm{Var}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \, \mathrm{Var}[Y(A_1, \ldots, A_n)] \, \mathbf{W}^{\top}. $$

In practice, the covariance matrix $\mathrm{Var}[Y(A_1, \ldots, A_n)]$ is unknown and, consequently, needs to be estimated.
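A minimal sketch of the weight matrix, again on the toy partitions from the first sketch, with `y_A` from the model-based sketch standing in for observed areal data:

```python
# Areal-interpolation weights w_ij = |A_j intersect B_i| / |B_i| and the
# plug-in predictor W y_A.
import numpy as np

W = np.array([[a.intersection(b).area / b.area for a in A] for b in B])

# Each row sums to one because the A_j's cover every B_i.
assert np.allclose(W.sum(axis=1), 1.0)

y_B_ai = W @ y_A  # nonparametric estimate of Y(B_1), ..., Y(B_m)
```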

The variance of each predictor, $\mathrm{Var}[\hat{Y}(B_i)]$, is needed as an uncertainty measure. It relies on both the variances of the $Y(A_j)$'s and their covariances:

$$ \mathrm{Var}[\hat{Y}(B_i)] = \sum_{j = 1}^{n} w^2_{ij} \mathrm{Var}[Y(A_j)] + 2 \sum_{j = 1}^{n} \sum_{l > j} w_{ij} w_{il} \mathrm{Cov}[Y(A_j), Y(A_l)]. $$

The variances are often observed in survey data, but the covariances are not. For practical purposes, we propose an approximation for $\mathrm{Cov}[Y(A_j), Y(A_l)]$ based on Moran's I, a measure of global spatial autocorrelation. Specifically, let $\rho_I$ be the Moran's I calculated with a weight matrix constructed from first-degree neighbors; that is, $\rho_I$ is the average of the pairwise correlations over all neighboring pairs. For regions $A_j$ and $A_l$ that are neighbors of each other, our approximation is

$$ \mathrm{Cov}[Y(A_j), Y(A_l)] = \rho_I \sqrt{\mathrm{Var}[Y(A_j)] \, \mathrm{Var}[Y(A_l)]}. $$

The covariance between non-neighboring $A_j$ and $A_l$ is discarded, so the final approximation for $\mathrm{Var}[\hat{Y}(B_i)]$ will be an underestimate. Alternatively, we can derive at least an upper bound for the variance of the estimates through a simple application of the Cauchy–Schwarz inequality, in which case $\rho_I$ is replaced with 1.
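The sketch below assembles this approximation on the toy objects above; `var_A` (the survey variances) and `rho_I` are made-up inputs, and "first-degree neighbors" is taken to mean polygons whose boundaries touch.

```python
# Moran's-I-based approximation of Var[Y_hat(B_i)] = diag(W Cov W^T), where
# Cov has Var[Y(A_j)] on the diagonal, rho_I * sqrt(var_j * var_l) for
# neighboring pairs, and 0 elsewhere. Setting rho_I = 1 instead yields the
# Cauchy-Schwarz upper bound mentioned in the text.
import numpy as np

var_A = np.array([0.05, 0.04, 0.06, 0.05])  # assumed known, e.g., from a survey
rho_I = 0.35                                # assumed Moran's I estimate

# First-degree (contiguity) neighbors: polygons whose boundaries touch.
n_A = len(A)
neighbor = np.array([[(j != l) and A[j].touches(A[l]) for l in range(n_A)]
                     for j in range(n_A)])

sd = np.sqrt(var_A)
cov_A = np.where(neighbor, rho_I * np.outer(sd, sd), 0.0)
np.fill_diagonal(cov_A, var_A)

var_B_hat = np.einsum("ij,jl,il->i", W, cov_A, W)  # one variance per B_i
```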

References

Goodchild, Michael F, and Nina Siu-Ngan Lam. 1980. “Areal Interpolation: A Variant of the Traditional Spatial Problem.” Geo-Processing 1: 279–312.
Gotway, Carol A, and Linda J Young. 2002. “Combining Incompatible Spatial Data.” Journal of the American Statistical Association 97 (458): 632–48.