
General setup

Let us set up the notation first. Suppose there exists a partition of a region $\mathrm{D} \subset \mathcal{R}^2$ (e.g., a city). This partition is denoted by $A_i$, $i = 1, \ldots, n$. Moreover, there exists another partition of the same city, denoted by $B_j$, $j = 1, \ldots, m$. These partitions can be seen as two different administrative divisions of the same city; it is common for different government agencies to release data for different divisions of the same city, country, or state. Assume we observe a random variable $Y(\cdot)$ at each region $A_i$ and are interested in predicting/estimating this variable in each of the regions $B_j$.
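To make the setup concrete, the sketch below builds two such partitions in Python with the shapely library. The unit square standing in for $\mathrm{D}$, the grid sizes, and all variable names are illustrative assumptions, not part of the methodology.

```python
# A toy version of the setup: two different partitions, A_1..A_n and
# B_1..B_m, of the same region D (here, a unit square).
from shapely.geometry import box

D = box(0.0, 0.0, 1.0, 1.0)

# Partition A: a 2 x 2 grid (n = 4 regions).
A = [box(i / 2, j / 2, (i + 1) / 2, (j + 1) / 2)
     for i in range(2) for j in range(2)]

# Partition B: a 3 x 3 grid (m = 9 regions) of the very same square.
B = [box(i / 3, j / 3, (i + 1) / 3, (j + 1) / 3)
     for i in range(3) for j in range(3)]

# Both families of polygons tile D exactly.
assert abs(sum(a.area for a in A) - D.area) < 1e-12
assert abs(sum(b.area for b in B) - D.area) < 1e-12
```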

Model-based approach

Suppose the random variable $Y(\cdot)$ varies continuously over $\mathrm{D}$ and is defined as

$$ Y(\mathbf{s}) = \mu + S(\mathbf{s}) + \varepsilon(\mathbf{s}), \quad \mathbf{s} \in \mathrm{D} \subset \mathcal{R}^2, $$

where $S(\cdot) \sim \mathrm{GP}(0, \sigma^2 \rho(\cdot; \, \phi, \kappa))$ and $\varepsilon(\cdot) \overset{\mathrm{i.i.d.}}{\sim} \mathrm{N}(0, \tau)$, with $S$ and $\varepsilon$ independent. For now, let us make the unrealistic assumption that all these parameters are known.
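The text leaves the correlation family $\rho(\cdot; \, \phi, \kappa)$ unspecified; one common isotropic choice with a range parameter $\phi$ and a smoothness parameter $\kappa$ is the Matérn family, sketched below in Python with scipy. Treat this as one possible assumption, not the only option.

```python
# A hedged sketch of one possible rho(.; phi, kappa): the Matern correlation.
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the 2nd kind


def matern_corr(h, phi, kappa):
    """Matern correlation at distances h, with range phi > 0 and smoothness kappa > 0."""
    h = np.asarray(h, dtype=float)
    out = np.ones_like(h)                  # rho(0) = 1
    pos = h > 0.0
    u = np.sqrt(2.0 * kappa) * h[pos] / phi
    out[pos] = (2.0 ** (1.0 - kappa) / gamma(kappa)) * u ** kappa * kv(kappa, u)
    return out


# Sanity check: kappa = 0.5 recovers the exponential correlation exp(-h / phi).
h = np.array([0.0, 0.1, 0.5, 1.0])
print(matern_corr(h, phi=0.3, kappa=0.5))
print(np.exp(-h / 0.3))
```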

Then, our assumption is that the observed data are as follows:

$$ \begin{aligned} Y(A_i) & = \frac{1}{\lvert A_i \rvert} \int_{A_i} Y(\mathbf{s}) \, \mathrm{d} \mathbf{s} \\ & = \frac{1}{\lvert A_i \rvert} \int_{A_i} [\mu + S(\mathbf{s}) + \varepsilon(\mathbf{s})] \, \mathrm{d} \mathbf{s} \\ & = \mu + \frac{1}{\lvert A_i \rvert} \int_{A_i} S(\mathbf{s}) \, \mathrm{d} \mathbf{s} + \frac{1}{\lvert A_i \rvert} \int_{A_i} \varepsilon(\mathbf{s}) \, \mathrm{d} \mathbf{s}, \end{aligned} $$

where $\lvert \cdot \rvert$ returns the area of a polygon. Consequently,

$$ \mathrm{E}[Y(A_i)] = \frac{1}{\lvert A_i \rvert} \int_{A_i} \mathrm{E}[Y(\mathbf{s})] \, \mathrm{d} \mathbf{s} = \frac{1}{\lvert A_i \rvert} \int_{A_i} \mu \, \mathrm{d} \mathbf{s} = \mu $$

and

$$ \begin{aligned} \mathrm{Cov}[Y(A_i), Y(A_j)] & = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \mathrm{Cov}[Y(\mathbf{s}), Y(\mathbf{s}')] \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}' \\ & = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \mathrm{C}(\lVert \mathbf{s} - \mathbf{s}' \rVert; \, \boldsymbol{\theta}) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}', \end{aligned} $$

where $\lVert \mathbf{s} - \mathbf{s}' \rVert$ is the Euclidean distance between the coordinates $\mathbf{s}$ and $\mathbf{s}'$, and $\mathrm{C}(\lVert \mathbf{s} - \mathbf{s}' \rVert; \, \boldsymbol{\theta})$ is an isotropic covariance function depending on the parameter $\boldsymbol{\theta}$. Furthermore, it can be shown (using Fubini's theorem and some algebraic manipulation) that

$$ \mathrm{Cov}(Y(A_i), Y(A_j)) = \frac{\sigma^2}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho(\lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}' + \mathbf{I}(i = j) \frac{\tau}{\lvert A_i \rvert}, $$

where $\rho(\cdot; \, \phi, \kappa)$ is a positive definite correlation function. Now, let $\mathrm{R}_{\kappa}(\phi)$ be a correlation matrix such that

$$ \mathrm{R}_{\kappa}(\phi)_{ij} = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho(\lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}'; $$

thus,

$$ Y(A_1, \ldots, A_n) \sim \mathrm{N}\big( \mu \mathbf{1}_n, \; \sigma^2 \mathrm{R}_{\kappa}(\phi) + \tau \, \mathrm{diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1}) \big). $$

Then, if we assume $(Y^{\top}(A_1, \ldots, A_n), Y^{\top}(B_1, \ldots, B_m))^{\top}$ to be jointly normal, we can use the conditional mean of $Y(B_1, \ldots, B_m)$ given $Y(A_1, \ldots, A_n)$ to predict the observed random variable over the partition $B_1, \ldots, B_m$.
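The double integrals above rarely admit closed forms, so in practice they are approximated numerically; one simple route is Monte Carlo integration over points drawn uniformly within each polygon. The sketch below is not the package's implementation: it reuses `D`, `A`, `B`, and `matern_corr` from the earlier sketches, and `mu`, `sigma2`, `tau`, `phi`, `kappa`, and `y_A` are made-up illustrative values.

```python
# Model-based prediction with "known" parameters: approximate the
# block-averaged correlations R_kappa(phi)_{ij} by Monte Carlo, assemble the
# joint covariance, and take the conditional mean of Y(B) given Y(A).
import numpy as np
from shapely.geometry import Point

rng = np.random.default_rng(42)
mu, sigma2, tau, phi, kappa = 0.0, 1.0, 0.1, 0.3, 0.5


def sample_in(poly, size):
    """Uniform draws inside a polygon via rejection sampling on its bounding box."""
    minx, miny, maxx, maxy = poly.bounds
    pts = []
    while len(pts) < size:
        x, y = rng.uniform((minx, miny), (maxx, maxy))
        if poly.contains(Point(x, y)):
            pts.append((x, y))
    return np.array(pts)


def block_corr(polys1, polys2, phi, kappa, size=200):
    """Monte Carlo estimate of the matrix of block-averaged correlations."""
    s1 = [sample_in(p, size) for p in polys1]
    s2 = [sample_in(p, size) for p in polys2]
    R = np.empty((len(polys1), len(polys2)))
    for i, si in enumerate(s1):
        for j, sj in enumerate(s2):
            d = np.linalg.norm(si[:, None, :] - sj[None, :, :], axis=-1)
            R[i, j] = matern_corr(d.ravel(), phi, kappa).mean()
    return R


# Joint covariance blocks; the nugget tau / |A_i| enters only the diagonal.
cov_AA = sigma2 * block_corr(A, A, phi, kappa) + tau * np.diag([1 / a.area for a in A])
cov_BA = sigma2 * block_corr(B, A, phi, kappa)

# Conditional (kriging-type) mean given observed areal averages y_A.
y_A = np.array([0.2, -0.1, 0.4, 0.0])   # made-up observations, one per A_i
y_B_hat = mu + cov_BA @ np.linalg.solve(cov_AA, y_A - mu)
```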


Now, suppose the parameters $\boldsymbol{\theta} = (\mu, \sigma^2, \phi, \tau)$ are unknown. The likelihood of $Y(A_1, \ldots, A_n)$ can still be computed.

In particular, if we use the parametrization $\alpha = \tau / \sigma^2$, we have closed-form maximum likelihood estimators for both $\mu$ and $\sigma^2$. Thus, we can optimize the profile likelihood for $\phi$ and $\alpha$ numerically. Then, we rely on the properties of the conditional normal distribution again to compute predictions for a new set of regions.
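Concretely, writing $V(\phi, \alpha) = \mathrm{R}_{\kappa}(\phi) + \alpha \, \mathrm{diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1})$, so that $\mathrm{Var}[Y(A_1, \ldots, A_n)] = \sigma^2 V(\phi, \alpha)$, the closed forms are the standard generalized-least-squares expressions $\hat{\mu} = (\mathbf{1}_n^{\top} V^{-1} \mathbf{y}) / (\mathbf{1}_n^{\top} V^{-1} \mathbf{1}_n)$ and $\hat{\sigma}^2 = (\mathbf{y} - \hat{\mu} \mathbf{1}_n)^{\top} V^{-1} (\mathbf{y} - \hat{\mu} \mathbf{1}_n) / n$. The sketch below plugs them into the profile log-likelihood and optimizes $(\phi, \alpha)$ numerically; the Nelder-Mead optimizer and the starting values are arbitrary choices, and the toy objects come from the earlier sketches.

```python
# Profile-likelihood fit for (phi, alpha), with alpha = tau / sigma2.
# Points are sampled once, so the Monte Carlo integral is a deterministic
# function of phi and the optimizer sees a smooth objective. Reuses A, y_A,
# kappa, sample_in, and matern_corr from the sketches above.
import numpy as np
from scipy.optimize import minimize

n = len(A)
ones = np.ones(n)
D_inv = np.diag([1.0 / a.area for a in A])      # diag(1 / |A_i|)
pts = [sample_in(a, 200) for a in A]
dists = [[np.linalg.norm(pi[:, None, :] - pj[None, :, :], axis=-1).ravel()
          for pj in pts] for pi in pts]


def R_of(phi):
    """Block-averaged correlation matrix R_kappa(phi) on the fixed samples."""
    return np.array([[matern_corr(d, phi, kappa).mean() for d in row]
                     for row in dists])


def neg_profile_loglik(log_params):
    phi, alpha = np.exp(log_params)             # optimize on the log scale
    V = R_of(phi) + alpha * D_inv
    mu_hat = (ones @ np.linalg.solve(V, y_A)) / (ones @ np.linalg.solve(V, ones))
    r = y_A - mu_hat
    sigma2_hat = (r @ np.linalg.solve(V, r)) / n
    _, logdet = np.linalg.slogdet(V)
    # Negative profile log-likelihood, up to an additive constant.
    return 0.5 * (n * np.log(sigma2_hat) + logdet)


fit = minimize(neg_profile_loglik, x0=np.log([0.3, 0.1]), method="Nelder-Mead")
phi_hat, alpha_hat = np.exp(fit.x)
```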

Areal Interpolation (AI)

Areal interpolation is a nonparametric approach that interpolates the $Y(A_i)$'s to construct the $Y(B_j)$'s. Define an $m \times n$ matrix $\mathbf{W} = \{ w_{ij} \}$, where $w_{ij}$ is the weight associated with the polygon $A_j$ in constructing $Y(B_i)$. The weights are $w_{ij} = \lvert A_j \cap B_i \rvert / \lvert B_i \rvert$ (Goodchild and Lam 1980; Gotway and Young 2002). The interpolation $\hat{Y}(B_1, \ldots, B_m)$ is constructed as

$$ \hat{Y}(B_1, \ldots, B_m) = \mathbf{W} Y(A_1, \ldots, A_n). $$

The expectation and variance of the predictor are, respectively,

$$ \mathrm{E}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \, \mathrm{E}[Y(A_1, \ldots, A_n)] $$

and

$$ \mathrm{Var}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \, \mathrm{Var}[Y(A_1, \ldots, A_n)] \, \mathbf{W}^{\top}. $$

In practice, the covariance matrix $\mathrm{Var}[Y(A_1, \ldots, A_n)]$ is unknown and, consequently, needs to be estimated.
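A minimal sketch of the weight matrix, again on the toy partitions from the first sketch, with `y_A` from the model-based sketch standing in for observed areal data:

```python
# Areal-interpolation weights w_ij = |A_j intersect B_i| / |B_i| and the
# plug-in predictor W y_A.
import numpy as np

W = np.array([[a.intersection(b).area / b.area for a in A] for b in B])

# Each row sums to one because the A_j's cover every B_i.
assert np.allclose(W.sum(axis=1), 1.0)

y_B_ai = W @ y_A  # nonparametric estimate of Y(B_1), ..., Y(B_m)
```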

The variance of each predictor, $\mathrm{Var}[\hat{Y}(B_i)]$, is needed as an uncertainty measure. It relies on both the variances of the $Y(A_j)$'s and their covariances:

$$ \mathrm{Var}[\hat{Y}(B_i)] = \sum_{j = 1}^{n} w^2_{ij} \mathrm{Var}[Y(A_j)] + 2 \sum_{j = 1}^{n} \sum_{l > j} w_{ij} w_{il} \mathrm{Cov}[Y(A_j), Y(A_l)]. $$

The variances are often observed in survey data, but the covariances are not. For practical purposes, we propose an approximation for $\mathrm{Cov}[Y(A_j), Y(A_l)]$ based on Moran's I, a measure of global spatial autocorrelation. Specifically, let $\rho_I$ be the Moran's I calculated with a weight matrix constructed from first-degree neighbors; that is, $\rho_I$ is the average of the pairwise correlations over all neighboring pairs. For regions $A_j$ and $A_l$ that are neighbors of each other, our approximation is

$$ \mathrm{Cov}[Y(A_j), Y(A_l)] = \rho_I \sqrt{\mathrm{Var}[Y(A_j)] \, \mathrm{Var}[Y(A_l)]}. $$

The covariance between non-neighboring $A_j$ and $A_l$ is discarded, so the final approximation for $\mathrm{Var}[\hat{Y}(B_i)]$ will be an underestimate. Alternatively, we can derive at least an upper bound for the variance of the estimates through a simple application of the Cauchy–Schwarz inequality, in which case $\rho_I$ is replaced with 1.
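The sketch below assembles this approximation on the toy objects above; `var_A` (the survey variances) and `rho_I` are made-up inputs, and "first-degree neighbors" is taken to mean polygons whose boundaries touch.

```python
# Moran's-I-based approximation of Var[Y_hat(B_i)] = diag(W Cov W^T), where
# Cov has Var[Y(A_j)] on the diagonal, rho_I * sqrt(var_j * var_l) for
# neighboring pairs, and 0 elsewhere. Setting rho_I = 1 instead yields the
# Cauchy-Schwarz upper bound mentioned in the text.
import numpy as np

var_A = np.array([0.05, 0.04, 0.06, 0.05])  # assumed known, e.g., from a survey
rho_I = 0.35                                # assumed Moran's I estimate

# First-degree (contiguity) neighbors: polygons whose boundaries touch.
n_A = len(A)
neighbor = np.array([[(j != l) and A[j].touches(A[l]) for l in range(n_A)]
                     for j in range(n_A)])

sd = np.sqrt(var_A)
cov_A = np.where(neighbor, rho_I * np.outer(sd, sd), 0.0)
np.fill_diagonal(cov_A, var_A)

var_B_hat = np.einsum("ij,jl,il->i", W, cov_A, W)  # one variance per B_i
```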

References

Goodchild, Michael F, and Nina Siu-Ngan Lam. 1980. “Areal Interpolation: A Variant of the Traditional Spatial Problem.” Geo-Processing 1: 279–312.
Gotway, Carol A, and Linda J Young. 2002. “Combining Incompatible Spatial Data.” Journal of the American Statistical Association 97 (458): 632–48.