# Spatially misaligned data: An application to the 2018 Brazilian Presidential Election
### Lucas Godoy <br> Marcos Oliveira Prates <br> Jun Yan
### October 1, 2021

---

## Motivation

* Election outcomes are of public interest.
* An important statistical analysis in political science involves predicting who
  will win an election based on polls, social media, and past elections.
* An interesting question concerns the characterization of each candidate's
  electorate.

* This is the main motivation of our work. More specifically, we are interested
  in quantifying the associations between socio-economic variables, provided by
  the IBGE, and the 2018 second round Brazilian election outcome in Belo
  Horizonte, the capital of the state of Minas Gerais.

---

## Motivation

* Our goal is to transfer the information available at the __census tract level__
  (provided by IBGE) to the __electoral section__ level (provided by the TSE).
* The problems with this apparently simple task are:
  1. the data are **spatially misaligned**;
  2. the electoral section data are measured at the voting
     locations (point-referenced), but this spatial geometry may be inappropriate
     given the nature of the data.
* We propose a nonparametric and a parametric solution to the problem, both
  relying on the Voronoi tessellation.

---

## Data

---

### Nonparametric approach

* Assumption: populations are uniformly distributed within census tracts.

* Construct an `$n \times m$` matrix `$\mathbf{W} = \{ w_{ij} \}$`, whose entry
  `$w_{ij}$` is the weight associated with the polygon `$B_j$` in the estimation
  of the variables `$Y_k(\cdot)$`, `$k = 1, \ldots, p$`, at the point `$\mathbf{s}_i$`.

* The following weights are called naive and Voronoi weights, respectively,

`$$w_{ij} = \mathbf{I} \{ \mathbf{s}_i \in B_j \},$$`

and

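`$$w_{ij} = \frac{\lvert B_j \cap V_i \rvert}{\lvert V_i \rvert},$$`

where `$V_i$` denotes the Voronoi cell of `$\mathbf{s}_i$` and `$\lvert \cdot \rvert$` denotes area.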
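The two weighting schemes can be illustrated with a minimal sketch. Everything below is a toy assumption rather than the election data: the tracts are two rectangles tiling the unit square, the two voting locations are made up, and the helper names (`in_tract`, `nearest_point`, `naive_weights`, `voronoi_weights`) are hypothetical. The Voronoi cells are approximated on a regular grid instead of being computed exactly.

```python
# Toy setup (illustrative assumptions only).
# Tracts B_1, B_2 are rectangles (xmin, xmax, ymin, ymax) tiling the unit square;
# points s_1, s_2 play the role of voting locations.
tracts = [(0.0, 0.5, 0.0, 1.0), (0.5, 1.0, 0.0, 1.0)]
points = [(0.25, 0.5), (0.55, 0.5)]


def in_tract(x, y, tract):
    """True if (x, y) lies in the half-open rectangle `tract`."""
    xmin, xmax, ymin, ymax = tract
    return xmin <= x < xmax and ymin <= y < ymax


def nearest_point(x, y, pts):
    """Index i of the point closest to (x, y), i.e. (x, y) belongs to V_i."""
    return min(range(len(pts)),
               key=lambda i: (x - pts[i][0]) ** 2 + (y - pts[i][1]) ** 2)


def naive_weights(pts, tracts):
    """w_ij = 1 if s_i falls inside B_j, and 0 otherwise."""
    return [[1.0 if in_tract(px, py, t) else 0.0 for t in tracts]
            for (px, py) in pts]


def voronoi_weights(pts, tracts, grid=200):
    """Grid approximation of w_ij = |B_j ∩ V_i| / |V_i| on the unit square."""
    counts = [[0] * len(tracts) for _ in pts]
    for a in range(grid):
        for b in range(grid):
            x, y = (a + 0.5) / grid, (b + 0.5) / grid  # grid-cell midpoint
            i = nearest_point(x, y, pts)
            for j, t in enumerate(tracts):
                if in_tract(x, y, t):
                    counts[i][j] += 1
                    break
    # Dividing by the row total normalizes by the (approximate) area of V_i.
    return [[c / sum(row) for c in row] for row in counts]


W_naive = naive_weights(points, tracts)
W_voronoi = voronoi_weights(points, tracts)
```

In this toy configuration the Voronoi boundary is the vertical line `x = 0.4`, so the cell of the second point overlaps both tracts and its row of `W_voronoi` is approximately `(1/6, 5/6)`, whereas the naive weights assign it entirely to the second tract.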

---

* Let `$p = 1$`, for simplicity. Given the weight matrix `$\mathbf{W}$`, the
  estimates are `$$\hat{Y}(\mathbf{s}_1, \ldots, \mathbf{s}_n) = \mathbf{W}
  Y(B_1, \ldots, B_m).$$`

* Moreover, the expectation and variance of this class of estimators are,
  respectively,

`$$\textrm{E}[\hat{Y}(\mathbf{s}_1, \ldots, \mathbf{s}_n)] = \mathbf{W}
\textrm{E}[Y(B_1, \ldots, B_m)],$$`

and

`$$\textrm{Var}[\hat{Y}(\mathbf{s}_1, \ldots, \mathbf{s}_n)] = \mathbf{W}
\textrm{Var}[Y(B_1, \ldots, B_m)] \mathbf{W}^{\top}.$$`

In practice, the covariance matrix `$\textrm{Var}[Y(B_1, \ldots, B_m)]$`
is unknown and, consequently, has to be estimated from the data. However, the
IBGE provides the variance associated with each variable. Therefore, for our
application, the main diagonal of the matrix
`$\textrm{Var}[Y(B_1, \ldots, B_m)]$` is known.
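As a small numerical illustration of the estimator and its variance, the sketch below applies a weight matrix to made-up tract-level values. It makes a strong simplifying assumption that is not stated in the slides: the off-diagonal entries of the tract covariance matrix are set to zero, so only the known variances enter. The weights, values, and helper names (`mat_vec`, `sandwich_diag`) are hypothetical.

```python
def mat_vec(W, y):
    """Compute the estimates W y for a weight matrix W (list of rows)."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


def sandwich_diag(W, variances):
    """Compute W diag(variances) W^T.

    Assumption: covariances between different tracts are treated as zero,
    so only the main diagonal of Var[Y(B_1, ..., B_m)] is used.
    """
    n, m = len(W), len(variances)
    return [[sum(W[i][j] * variances[j] * W[k][j] for j in range(m))
             for k in range(n)]
            for i in range(n)]


# Illustrative inputs: 2 points, 2 tracts.
W = [[1.0, 0.0],
     [1.0 / 6.0, 5.0 / 6.0]]
y_tracts = [10.0, 4.0]    # tract-level values Y(B_1), Y(B_2)
var_tracts = [4.0, 1.0]   # tract-level variances (main diagonal only)

y_hat = mat_vec(W, y_tracts)
var_hat = sandwich_diag(W, var_tracts)
```

Here the second estimate is the weighted average `10/6 + 20/6 = 5`, and its variance is `(1/6)^2 * 4 + (5/6)^2 * 1 = 29/36`. If the tracts were in fact positively correlated, ignoring the off-diagonal terms would understate this variance.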

---

### Parametric approach

* Model: `$Y(B_j) = \mu + S(B_j) + \varepsilon(B_j)$`, conveniently assuming
  `$\{ S(\mathbf{s}) \, : \, \mathbf{s} \in D \}$` to be a zero-mean,
  second-order stationary Gaussian random field. Additionally,
  `$\varepsilon(\mathbf{s}) \overset{\textrm{iid}}{\sim} \mathcal{N}(0, \tau^2)$`.

* Assumption: existence of a continuous underlying random field driving the
  observed data.

* That is, `$S(B_j) = \lvert B_j \rvert^{-1} \int_{B_j} S(\mathbf{s}) \, \textrm{d}\mathbf{s}$`
  and, similarly,
  `$\varepsilon(B_j) = \lvert B_j \rvert^{-1} \int_{B_j} \varepsilon(\mathbf{s}) \, \textrm{d}\mathbf{s}$`.

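* Thus, we have `$Y(\mathbf{B}) \sim \textrm{GP}(\mu, \Sigma(\phi, \sigma^2, \tau^2))$`,
  where, writing `$\mathbf{\theta} = (\phi, \sigma^2)$` for the parameters of the
  covariance function `$C$`,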

`\begin{align}
  \Sigma(\phi, \sigma^2, \tau^2)_{ij} & = \textrm{Cov}[Y(B_i), Y(B_j)]
  = \frac{1}{\lvert B_i \rvert \lvert B_j \rvert}
                                 \int_{B_i \times B_j}
                                 \textrm{Cov}[Y(\mathbf{u}), Y(\mathbf{v})] \,
                                 \textrm{d} \mathbf{u} \, \textrm{d} \mathbf{v} \nonumber \\
                               & = \left( \frac{1}{\lvert B_i \rvert \lvert B_j \rvert}
                                 \int_{B_i \times B_j}
                                 C( \lVert \mathbf{u} - \mathbf{v} \rVert; \mathbf{\theta}) \,
                                 \textrm{d} \mathbf{u} \, \textrm{d} \mathbf{v} \right ) + \mathbf{I} \{i = j\}
                                 \frac{\tau^2}{\lvert B_i \rvert}.
\end{align}`
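The block covariance above can be approximated numerically by averaging the point-level covariance over grid points placed inside each polygon. The sketch below is only an illustration under stated assumptions: the slides do not say which covariance function `$C$` is used, so an exponential covariance with variance `sigma2` and range `phi` is assumed; the polygons are toy rectangles; and the helper names (`exp_cov`, `grid_points`, `area`, `block_cov`) are hypothetical.

```python
import math


def exp_cov(h, sigma2, phi):
    """Exponential covariance C(h) = sigma2 * exp(-h / phi) (assumed form of C)."""
    return sigma2 * math.exp(-h / phi)


def grid_points(rect, k):
    """k x k midpoints inside the rectangle (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = rect
    dx, dy = (xmax - xmin) / k, (ymax - ymin) / k
    return [(xmin + (a + 0.5) * dx, ymin + (b + 0.5) * dy)
            for a in range(k) for b in range(k)]


def area(rect):
    xmin, xmax, ymin, ymax = rect
    return (xmax - xmin) * (ymax - ymin)


def block_cov(rects, sigma2, phi, tau2, k=8):
    """Approximate Sigma_ij by averaging C over pairs of grid points.

    The average over pairs (u, v) approximates the double integral divided
    by |B_i| |B_j|; the nugget contributes tau2 / |B_i| on the diagonal.
    """
    pts = [grid_points(r, k) for r in rects]
    m = len(rects)
    Sigma = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            total = sum(exp_cov(math.dist(u, v), sigma2, phi)
                        for u in pts[i] for v in pts[j])
            value = total / (len(pts[i]) * len(pts[j]))
            if i == j:
                value += tau2 / area(rects[i])
            Sigma[i][j] = Sigma[j][i] = value
    return Sigma


# Two toy rectangular "tracts" tiling the unit square.
rects = [(0.0, 0.5, 0.0, 1.0), (0.5, 1.0, 0.0, 1.0)]
Sigma = block_cov(rects, sigma2=1.0, phi=0.3, tau2=0.1)
```

A finer grid (larger `k`) gives a better approximation at a cost that grows with the fourth power of `k`, since every pair of grid points is visited. For irregular polygons the same idea applies with grid or random points restricted to each polygon.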

---

## Simulation study

---

## Simulation Results - under normality

<table class=" lightable-minimal lightable-hover" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:left;"> dependence </th>
   <th style="text-align:left;"> method </th>
   <th style="text-align:right;"> bias </th>
   <th style="text-align:right;"> MSE </th>
   <th style="text-align:right;"> width </th>
   <th style="text-align:right;"> coverage </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> weak </td>
   <td style="text-align:left;"> mle </td>
   <td style="text-align:right;"> -0.0026 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 0.9547 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> weak </td>
   <td style="text-align:left;"> voronoi </td>
   <td style="text-align:right;"> 0.0103 </td>
   <td style="text-align:right;"> 1.4102 </td>
   <td style="text-align:right;"> 2.8053 </td>
   <td style="text-align:right;"> 0.9981 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> weak </td>
   <td style="text-align:left;"> naive </td>
   <td style="text-align:right;"> -0.0671 </td>
   <td style="text-align:right;"> 5.8110 </td>
   <td style="text-align:right;"> 3.3172 </td>
   <td style="text-align:right;"> 0.9436 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> moderate </td>
   <td style="text-align:left;"> mle </td>
   <td style="text-align:right;"> -0.0026 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 0.9524 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> moderate </td>
   <td style="text-align:left;"> voronoi </td>
   <td style="text-align:right;"> 0.0273 </td>
   <td style="text-align:right;"> 1.8314 </td>
   <td style="text-align:right;"> 2.7210 </td>
   <td style="text-align:right;"> 0.9983 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> moderate </td>
   <td style="text-align:left;"> naive </td>
   <td style="text-align:right;"> -0.0803 </td>
   <td style="text-align:right;"> 6.5128 </td>
   <td style="text-align:right;"> 3.0461 </td>
   <td style="text-align:right;"> 0.9243 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> strong </td>
   <td style="text-align:left;"> mle </td>
   <td style="text-align:right;"> -0.0024 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 0.9504 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> strong </td>
   <td style="text-align:left;"> voronoi </td>
   <td style="text-align:right;"> 0.0797 </td>
   <td style="text-align:right;"> 6.3406 </td>
   <td style="text-align:right;"> 3.4125 </td>
   <td style="text-align:right;"> 0.9976 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> strong </td>
   <td style="text-align:left;"> naive </td>
   <td style="text-align:right;"> -0.0363 </td>
   <td style="text-align:right;"> 11.6271 </td>
   <td style="text-align:right;"> 3.3717 </td>
   <td style="text-align:right;"> 0.9105 </td>
  </tr>
</tbody>
</table>

---

## Simulation Results - marginal Beta

<table class=" lightable-minimal lightable-hover" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:left;"> method </th>
   <th style="text-align:left;"> Beta parameters </th>
   <th style="text-align:right;"> bias </th>
   <th style="text-align:right;"> MSE </th>
   <th style="text-align:right;"> width </th>
   <th style="text-align:right;"> coverage </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> mle </td>
   <td style="text-align:left;"> a = .3202, b = .3056 </td>
   <td style="text-align:right;"> -0.0008 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 0.9362 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> voronoi </td>
   <td style="text-align:left;"> a = .3202, b = .3056 </td>
   <td style="text-align:right;"> 0.0105 </td>
   <td style="text-align:right;"> 1.6866 </td>
   <td style="text-align:right;"> 2.7875 </td>
   <td style="text-align:right;"> 0.9980 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> naive </td>
   <td style="text-align:left;"> a = .3202, b = .3056 </td>
   <td style="text-align:right;"> -0.0280 </td>
   <td style="text-align:right;"> 5.8942 </td>
   <td style="text-align:right;"> 2.8223 </td>
   <td style="text-align:right;"> 0.8006 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mle </td>
   <td style="text-align:left;"> a = .4516, b = .8147 </td>
   <td style="text-align:right;"> -0.0010 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 0.9312 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> voronoi </td>
   <td style="text-align:left;"> a = .4516, b = .8147 </td>
   <td style="text-align:right;"> 0.0083 </td>
   <td style="text-align:right;"> 1.7426 </td>
   <td style="text-align:right;"> 2.7903 </td>
   <td style="text-align:right;"> 0.9982 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> naive </td>
   <td style="text-align:left;"> a = .4516, b = .8147 </td>
   <td style="text-align:right;"> -0.0199 </td>
   <td style="text-align:right;"> 5.9246 </td>
   <td style="text-align:right;"> 2.7587 </td>
   <td style="text-align:right;"> 0.8147 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mle </td>
   <td style="text-align:left;"> a = .6042, b = 6.0446 </td>
   <td style="text-align:right;"> -0.0004 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 1.0000 </td>
   <td style="text-align:right;"> 0.9279 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> voronoi </td>
   <td style="text-align:left;"> a = .6042, b = 6.0446 </td>
   <td style="text-align:right;"> 0.0024 </td>
   <td style="text-align:right;"> 1.7042 </td>
   <td style="text-align:right;"> 2.7083 </td>
   <td style="text-align:right;"> 0.9980 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> naive </td>
   <td style="text-align:left;"> a = .6042, b = 6.0446 </td>
   <td style="text-align:right;"> -0.0052 </td>
   <td style="text-align:right;"> 5.9145 </td>
   <td style="text-align:right;"> 2.5553 </td>
   <td style="text-align:right;"> 0.8155 </td>
  </tr>
</tbody>
</table>

---

## Simulation Results - marginal t with 1.5 df

---

## Application

---

### Association between proportion of blank and null votes and socio-economic variables

![](ness2021_files/figure-html/plot3-1.png)

---

## Discussion

* The robustness of Gaussian random fields for modeling areal data suggests that
  some type of Central Limit Theorem (CLT) holds for stochastic (spatial) integrals.

* Assuming such a CLT for stochastic integrals exists, our simulation study provides empirical evidence that, regardless of the distribution of `$S(\cdot)$`, the integrals converge in distribution to a Gaussian process and the nugget effect is averaged out.

* The nonparametric alternative is useful only in extreme cases, which may not be common in practice.

* Interesting results for the application:
  - The higher the average income, the lower the proportion of blank and null
    votes;
  - Higher percentages of elderly people are associated with lower proportions
    of blank and null votes;
  - The same behavior was observed for the proportion of (self-declared) white
    people.

---

# Thank you!