### General setup

Let us set up the notations first. Suppose a there exists a partition of a region $${\rm D} \in {\cal R}^2$$ (e.g., a city). This partition is denoted by $$A_i$$, $$i = 1, \ldots, n$$. Moreover, there exists another parition of the same city, denoted $$B_j$$, where $$j = 1, \ldots, m$$. These partitions can be seen as two different administrative divisions within the same city. It is common for different government agencies to release data for different divisions of a same city, country, or state.

### Model-based approach

Assume we observe a random variable $$Y(\cdot)$$ at each region $$A_i$$ and we are interested in predict/estimate this variable in each of the regions $$B_j$$. Now suppose the random variable $$Y(\cdot)$$ varies continuously over $${\rm D}$$ and is defined as follows $Y(\mathbf{s}) = \mu + S(\mathbf{s}) + \varepsilon(\mathbf{s}), \, \mathbf{s} \in {\rm D} \subset {\cal R}^2.$ where $S(\cdot) \sim {\rm GP}(0, \sigma^2 \rho(\cdot; \, \phi, \kappa)) \; \text{ and } \; \varepsilon(\cdot) \overset{{\rm i.i.d.}}{\sim} {\rm N}(0, \sigma^2 \rho(\cdot; \, \phi, \kappa)),$ with $$S$$ and $$\varepsilon$$ independent of each other. For now, let’s make the unrealistic assumption that all those parameters are known. Then, our assumption is that the observed data is as follows \begin{align*} Y(A_i) & = \frac{1}{\lvert A_i \rvert} \int_{A_i} Y(\mathbf{s}) \, {\rm d} \mathbf{s} \\ & = \frac{1}{\lvert A_i \rvert} \int_{A_i} [\mu + S(\mathbf{s}) + \varepsilon(\mathbf{s})] \, {\rm d} \mathbf{s} \\ & = \mu + \frac{1}{\lvert A_i \rvert} \int_{A_i} S(\mathbf{s}) {\rm d} \mathbf{s} + \frac{1}{\lvert A_i \rvert} \int_{A_i} \varepsilon(\mathbf{s}) {\rm d} \mathbf{s}, \end{align*} where $$\lvert \cdot \rvert$$ returns the area of a polygon. Furthermore, it can be shown that (using Fubini’s Theorem and some algebraic manipulation) ${\rm Cov}(Y(A_i), Y(A_j)) = \frac{\sigma^2}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa ) \, {\rm d} \mathbf{s} \, {\rm d} \mathbf{s}' + \mathbf{I}(i = j) \frac{\tau}{\lvert A_i \rvert},$ where $$\rho(\cdot ; \, \phi, \kappa)$$ is a positive definite correlation function. Now, let $${\rm R}_{\kappa}(\phi)$$ be a correlation matrix such that ${\rm R}_{\kappa}(\phi)_{ij} = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa ) \, {\rm d} \mathbf{s} \, {\rm d} \mathbf{s}',$ thus, $Y(A_1, \cdots, A_n) \sim {\rm N}( \mu \mathbf{1}_n, \sigma^2 {\rm R}_{\kappa}(\phi) + \tau {\rm diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_1 \rvert^{-1})).$ Then, if we assume $$(Y^{\top}(A_1, \cdots, A_n), Y^{\top}(B_1, \cdots, A_m)^{\top})$$ to be jointly normal, we use can the conditional mean of $$Y^{\top}(B_1, \cdots, A_m)^{\top}$$ given $$Y^{\top}(A_1, \cdots, A_n)$$ to estimate the observed random variable in the partition $$B_1, \ldots, B_m$$.

Now, suppose the parameters $$\boldsymbol{\theta} = (\mu, \sigma^2, \phi, \tau)$$ are unknown. The Likelihood of $$Y(A_1, \ldots, A_n)$$ can still be computed.

In particular, if we use the parametrization $$\nu = \tau / \sigma^2$$, we have closed form for the Maximum Likelihood estimators both for $$\mu$$ and $$\sigma^2$$. Thus, we can optimize the profile likelihood for $$\phi$$ and $$\nu$$ numerically. Then, we resort on conditional Normal properties again to compute the predictions in a new different set of regions.

### Areal Interpolation (AI)

Areal interpolation is a nonparametric approach that interpolates $$Y(A_i)$$’s to construct $$Y(B_j)$$’s. Define an $$m \times n$$ matrix $$\mathbf{W} = \{ w_{ij} \}$$, where $$w_{ij}$$ is the weight associated with the polygon $$A_i$$ in constructing $$Y(B_j)$$. The weights are $$w_{ij} = \lvert A_i \cap B_j \rvert / \lvert B_j \rvert$$ (Goodchild and Lam 1980; Gotway and Young 2002). The interpolation for $$\hat Y(B_1, \ldots, B_m)$$ is constructed as $\begin{equation} \label{eq:np-est} \hat{Y}(B_1, \ldots, B_m) = \mathbf{W} Y(A_1, \ldots, A_n). \end{equation}$ The expectation and variance of the predictor are, respectively, ${\rm E}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} {\rm E}[Y(A_1, \ldots, A_n)]$ and $\begin{equation} \label{eq:np-matcov} \textrm{Var}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \textrm{Var}[Y(A_1, \ldots, A_n)] \mathbf{W}^{\top}. \end{equation}$ In practice, the covariance matrix $$\textrm{Var}[Y(A_1, \ldots, A_n)]$$ is unknown and, consequently needs to be estimated.

The variance each predictor $$\text{Var}[\hat Y(B_i)]$$ is needed as an uncertainty measure. It relies on both the variances of $$Y(A_j)$$’s and their covariances: \begin{align} \label{eq:np-single-var} \textrm{Var}[\hat{Y}(B_i)] = \sum_{i = 1}^n w^2_{ij} \textrm{Var} \left [ Y(A_i) \right ] + 2 \sum_{l \neq i} w_{ij} w_{il} \textrm{Cov} \left[ Y(A_i), Y(A_l) \right]. \end{align} The variances are often observed in survey data, but the covariances are not. For practical purpose, we propose an approximation for $$\textrm{Cov}[ Y(A_i), Y(A_l)]$$ based on Moran’s I, a global spatial autocorrelation. Specifically, let $$\rho_I$$ be the Moran’s I calculated with a weight matrix constructed with first-degree neighbors. That is, $$\rho_I$$ is the average of the pairwise correlation for all neighboring pairs. For regions $$A_i$$ and $$A_l$$, if they are neighbors of each other, our approximation is \begin{align} \label{eq:cova} \textrm{Cov} \left[ Y(A_i), Y(A_l) \right] = \rho_I \sqrt{\text{Var}[Y(A_i)] \text{Var}[Y(A_l)]}. \end{align} The covariance between non-neighboring $$A_i$$ and $$A_l$$ is discarded. The final uncertainty approximation for $$\textrm{Var}[\hat{Y}(B_i)]$$ will be an underestimate. Alternatively, we can derive, at least, an upper bound for the variance of the estimates by using a simple application from the Cauchy–Schwartz inequality, in which case, $$\rho_I$$ is replaced with~1.

## Reference

Goodchild, Michael F, and Nina Siu-Ngan Lam. 1980. “Areal Interpolation: A Variant of the Traditional Spatial Problem.” Geo-Processing 1: 279–312.

Gotway, Carol A, and Linda J Young. 2002. “Combining Incompatible Spatial Data.” Journal of the American Statistical Association 97 (458): 632–48.