class: center, middle, inverse, title-slide

# Spatially misaligned data: An application to the 2018 Brazilian Presidential Election

### Lucas Godoy
Marcos Oliveira Prates
Jun Yan

### October 1, 2021

---
class: middle

<style type="text/css">
.remark-slide-content {
  font-size: 25px;
  padding: 1em 4em 1em 4em;
}
.full { padding: 0px; }
.full p { margin-top: 0px; }
</style>

## Motivation

* Election outcomes are of public interest.
* An important line of statistical analysis in political science is predicting who will win an election based on polls, social media, and past elections.
* Another interesting question is how to characterize each candidate's electorate.
* This is the main motivation of our work. More specifically, we are interested in quantifying the associations between socio-economic variables, provided by the IBGE, and the outcome of the second round of the 2018 Brazilian election in Belo Horizonte, the capital of the state of Minas Gerais.

---
class: middle

## Motivation

* Our goal is to transfer the information available at the __census tract level__ (provided by the IBGE) to the __electoral section__ level (provided by the TSE).
* The problems with this apparently simple task are:
    1. the data are **spatially misaligned**;
    2. the electoral section data are measured at the voting locations (point-referenced), but this spatial geometry may be inappropriate given the nature of the data.
* We propose a nonparametric and a parametric solution to the problem, both relying on the Voronoi tessellation.

---
class: middle

## Data

<img src="data:image/png;base64,#img/study_regions.png" width="1152" />

---

### Nonparametric approach

* Assumption: populations are uniformly distributed within census tracts.
* Construct an `\(n \times m\)` matrix `\(\mathbf{W} = \{ w_{ij} \}\)`, whose entry in the `\(i\)`th row and `\(j\)`th column is the weight given to the polygon `\(B_j\)` when estimating the variables `\(Y_k(\cdot)\)` at the point `\(\mathbf{s}_i\)`.
* With `\(V_i\)` denoting the Voronoi cell of `\(\mathbf{s}_i\)`, the naive and Voronoi weights are, respectively,
`$$w_{ij} = \mathbf{I} \{ \mathbf{s}_i \in B_j \},$$`
and
$$ w_{ij} = \frac{\lvert B_j \cap V_i \rvert}{\lvert V_i \rvert}.
$$

---
class: middle

* Let `\(p = 1\)` (a single variable `\(Y\)`), for simplicity. Given the weight matrix `\(\mathbf{W}\)`, the estimates are
$$ \hat{Y}(\mathbf{s}_1, \ldots, \mathbf{s}_n) = \mathbf{W} Y(B_1, \ldots, B_m). $$
* Moreover, the expectation and variance of this class of estimators are, respectively,
$$ \textrm{E}[\hat{Y}(\mathbf{s}_1, \ldots, \mathbf{s}_n)] = \mathbf{W} \textrm{E}[Y(B_1, \ldots, B_m)], $$
and
`$$\textrm{Var}[\hat{Y}(\mathbf{s}_1, \ldots, \mathbf{s}_n)] = \mathbf{W} \textrm{Var}[Y(B_1, \ldots, B_m)] \mathbf{W}^{\top}.$$`

In practice, the covariance matrix `\(\textrm{Var}[Y(B_1, \ldots, B_m)]\)` is unknown and, consequently, has to be estimated from the data. However, the IBGE provides the variance associated with each variable. Therefore, in our application, the main diagonal of the matrix `\(\textrm{Var}[Y(B_1, \ldots, B_m)]\)` is known.

---

### Parametric approach

* Model: `\(Y(B_j) = \mu + S(B_j) + \varepsilon(B_j)\)`, conveniently assuming `\(\{ S(\mathbf{s}) \, : \, \mathbf{s} \in D \}\)` to be a zero-mean, second-order stationary Gaussian random field. Additionally, `\(\varepsilon(\mathbf{s}) \overset{\textrm{iid}}{\sim} \mathcal{N}(0, \tau^2).\)`
* Assumption: a continuous underlying random field drives the observed data.
* That is, `\(S(B_j) = \lvert B_j \rvert^{-1} \int_{B_j} S(\mathbf{s}) \, \textrm{d}\mathbf{s}\)` and, similarly, `\(\varepsilon(B_j) = \lvert B_j \rvert^{-1} \int_{B_j} \varepsilon(\mathbf{s}) \, \textrm{d}\mathbf{s}\)`.
* Thus, we have `\(Y(\mathbf{B}) \sim \textrm{GP}(\mu, \Sigma(\phi, \sigma^2, \tau^2))\)`, where
`\begin{align} \Sigma(\phi, \sigma^2, \tau^2)_{ij} & = \textrm{Cov}[Y(B_i), Y(B_j)] = \frac{1}{\lvert B_i \rvert \lvert B_j \rvert} \int_{B_i \times B_j} \textrm{Cov}[Y(\mathbf{u}), Y(\mathbf{v})] \, \textrm{d} \mathbf{u} \, \textrm{d} \mathbf{v} \nonumber \\ & = \left( \frac{1}{\lvert B_i \rvert \lvert B_j \rvert} \int_{B_i \times B_j} C( \lVert \mathbf{u} - \mathbf{v} \rVert; \boldsymbol{\theta}) \, \textrm{d} \mathbf{u} \, \textrm{d} \mathbf{v} \right ) + \mathbf{I} \{i = j\} \frac{\tau^2}{\lvert B_i \rvert}. \end{align}`

---
class: middle, center, inverse

## Simulation study

---
class: middle, center

<img src="data:image/png;base64,#../manuscript/img/sim_city.png" width="62%" />

---

## Simulation Results - under normality

<table class=" lightable-minimal lightable-hover" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> dep </th> <th style="text-align:left;"> method </th> <th style="text-align:right;"> bias </th> <th style="text-align:right;"> mse </th> <th style="text-align:right;"> wdt </th> <th style="text-align:right;"> cvg </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> weak </td> <td style="text-align:left;"> mle </td> <td style="text-align:right;"> -0.0026 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 0.9547 </td> </tr> <tr> <td style="text-align:left;"> weak </td> <td style="text-align:left;"> voronoi </td> <td style="text-align:right;"> 0.0103 </td> <td style="text-align:right;"> 1.4102 </td> <td style="text-align:right;"> 2.8053 </td> <td
style="text-align:right;"> 0.9981 </td> </tr> <tr> <td style="text-align:left;"> weak </td> <td style="text-align:left;"> naive </td> <td style="text-align:right;"> -0.0671 </td> <td style="text-align:right;"> 5.8110 </td> <td style="text-align:right;"> 3.3172 </td> <td style="text-align:right;"> 0.9436 </td> </tr> <tr> <td style="text-align:left;"> moderate </td> <td style="text-align:left;"> mle </td> <td style="text-align:right;"> -0.0026 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 0.9524 </td> </tr> <tr> <td style="text-align:left;"> moderate </td> <td style="text-align:left;"> voronoi </td> <td style="text-align:right;"> 0.0273 </td> <td style="text-align:right;"> 1.8314 </td> <td style="text-align:right;"> 2.7210 </td> <td style="text-align:right;"> 0.9983 </td> </tr> <tr> <td style="text-align:left;"> moderate </td> <td style="text-align:left;"> naive </td> <td style="text-align:right;"> -0.0803 </td> <td style="text-align:right;"> 6.5128 </td> <td style="text-align:right;"> 3.0461 </td> <td style="text-align:right;"> 0.9243 </td> </tr> <tr> <td style="text-align:left;"> strong </td> <td style="text-align:left;"> mle </td> <td style="text-align:right;"> -0.0024 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 0.9504 </td> </tr> <tr> <td style="text-align:left;"> strong </td> <td style="text-align:left;"> voronoi </td> <td style="text-align:right;"> 0.0797 </td> <td style="text-align:right;"> 6.3406 </td> <td style="text-align:right;"> 3.4125 </td> <td style="text-align:right;"> 0.9976 </td> </tr> <tr> <td style="text-align:left;"> strong </td> <td style="text-align:left;"> naive </td> <td style="text-align:right;"> -0.0363 </td> <td style="text-align:right;"> 11.6271 </td> <td style="text-align:right;"> 3.3717 </td> <td style="text-align:right;"> 0.9105 </td> </tr> </tbody> </table> --- ## Simulation 
Results - marginal Beta <table class=" lightable-minimal lightable-hover" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> method </th> <th style="text-align:left;"> family </th> <th style="text-align:right;"> bias </th> <th style="text-align:right;"> mse </th> <th style="text-align:right;"> wdt </th> <th style="text-align:right;"> cvg </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> mle </td> <td style="text-align:left;"> a = .3202, b = .3056 </td> <td style="text-align:right;"> -0.0008 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 0.9362 </td> </tr> <tr> <td style="text-align:left;"> voronoi </td> <td style="text-align:left;"> a = .3202, b = .3056 </td> <td style="text-align:right;"> 0.0105 </td> <td style="text-align:right;"> 1.6866 </td> <td style="text-align:right;"> 2.7875 </td> <td style="text-align:right;"> 0.9980 </td> </tr> <tr> <td style="text-align:left;"> naive </td> <td style="text-align:left;"> a = .3202, b = .3056 </td> <td style="text-align:right;"> -0.0280 </td> <td style="text-align:right;"> 5.8942 </td> <td style="text-align:right;"> 2.8223 </td> <td style="text-align:right;"> 0.8006 </td> </tr> <tr> <td style="text-align:left;"> mle </td> <td style="text-align:left;"> a = .4516, b = .8147 </td> <td style="text-align:right;"> -0.0010 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 0.9312 </td> </tr> <tr> <td style="text-align:left;"> voronoi </td> <td style="text-align:left;"> a = .4516, b = .8147 </td> <td style="text-align:right;"> 0.0083 </td> <td style="text-align:right;"> 1.7426 </td> <td style="text-align:right;"> 2.7903 </td> <td style="text-align:right;"> 0.9982 </td> </tr> <tr> <td style="text-align:left;"> naive </td> <td style="text-align:left;"> a = .4516, b = .8147 </td> <td 
style="text-align:right;"> -0.0199 </td> <td style="text-align:right;"> 5.9246 </td> <td style="text-align:right;"> 2.7587 </td> <td style="text-align:right;"> 0.8147 </td> </tr> <tr> <td style="text-align:left;"> mle </td> <td style="text-align:left;"> a = .6042, b = 6.0446 </td> <td style="text-align:right;"> -0.0004 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 0.9279 </td> </tr> <tr> <td style="text-align:left;"> voronoi </td> <td style="text-align:left;"> a = .6042, b = 6.0446 </td> <td style="text-align:right;"> 0.0024 </td> <td style="text-align:right;"> 1.7042 </td> <td style="text-align:right;"> 2.7083 </td> <td style="text-align:right;"> 0.9980 </td> </tr> <tr> <td style="text-align:left;"> naive </td> <td style="text-align:left;"> a = .6042, b = 6.0446 </td> <td style="text-align:right;"> -0.0052 </td> <td style="text-align:right;"> 5.9145 </td> <td style="text-align:right;"> 2.5553 </td> <td style="text-align:right;"> 0.8155 </td> </tr> </tbody> </table> --- class: middle ## Simulation Results - marginal t with 1.5 df <table class=" lightable-minimal lightable-hover" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> method </th> <th style="text-align:left;"> family </th> <th style="text-align:right;"> bias </th> <th style="text-align:right;"> mse </th> <th style="text-align:right;"> wdt </th> <th style="text-align:right;"> cvg </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> voronoi </td> <td style="text-align:left;"> t (df = 1.5) </td> <td style="text-align:right;"> 0.1047 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 1.3029 </td> <td style="text-align:right;"> 0.9973 </td> </tr> <tr> <td style="text-align:left;"> mle </td> <td style="text-align:left;"> t (df = 1.5) </td> <td style="text-align:right;"> 0.3800 </td> <td 
style="text-align:right;"> 4.6630 </td> <td style="text-align:right;"> 1.0000 </td> <td style="text-align:right;"> 0.9366 </td> </tr> <tr> <td style="text-align:left;"> naive </td> <td style="text-align:left;"> t (df = 1.5) </td> <td style="text-align:right;"> -0.3179 </td> <td style="text-align:right;"> 5.5389 </td> <td style="text-align:right;"> 1.1392 </td> <td style="text-align:right;"> 0.9089 </td> </tr> </tbody> </table>

---
class: middle, center, inverse

## Application

---
class: middle, center

<img src="data:image/png;base64,#../poster/img/pp_white.png" width="1920" />

---

### Association between the proportion of blank and null votes and socio-economic variables

---

## Discussion

* The robustness of Gaussian random fields for modeling areal data suggests that some type of Central Limit Theorem (CLT) holds for stochastic (spatial) integrals.
* Assuming such a CLT exists, our simulation study provides empirical indication that, regardless of the distribution of `\(S(\cdot)\)`, the integrals converge in distribution to a Gaussian process and the nugget effect is averaged out.
* The nonparametric alternative is useful only in extreme cases, such as the heavy-tailed marginal t with 1.5 df in our simulations; such cases may not be common in practice.
* Interesting results for the application:
    - The higher the average income, the lower the proportion of blank and null votes;
    - Higher percentages of elderly people are associated with a lower proportion of blank and null votes;
    - The same behavior was observed for the proportion of (self-declared) white people.

---
class: middle, center, inverse

# Thank you!
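
---

### Backup: naive and Voronoi weights on a toy example

As a rough illustration of the nonparametric approach, the sketch below builds the naive and Voronoi weight matrices and applies `\(\hat{Y} = \mathbf{W} Y\)` on a one-dimensional toy "city". Everything here is hypothetical and is not the code used in this work: tracts are intervals instead of polygons, Voronoi cells are the intervals cut at midpoints between voting locations, and the function names and numbers are made up. The variance step assumes the off-diagonal entries of `\(\textrm{Var}[Y(B_1, \ldots, B_m)]\)` are zero, a simplification motivated only by the fact that just the main diagonal is known.

```python
def voronoi_cells_1d(points, lo, hi):
    """1D Voronoi cells on [lo, hi]: cut at midpoints between sorted points."""
    pts = sorted(points)
    cuts = [lo] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [hi]
    return list(zip(cuts[:-1], cuts[1:]))


def overlap(a, b):
    """Length of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def voronoi_weights(cells, tracts):
    """w_ij = |B_j intersect V_i| / |V_i|, one row per Voronoi cell."""
    return [[overlap(v, b) / (v[1] - v[0]) for b in tracts] for v in cells]


def naive_weights(points, tracts):
    """w_ij = I{s_i in B_j}; half-open intervals so each point hits one tract."""
    return [[1.0 if b[0] <= s < b[1] else 0.0 for b in tracts]
            for s in sorted(points)]


def estimate(W, y, var_diag):
    """Return W y and the diagonal of W diag(var_diag) W^T.

    Assumption: the tract-level covariance matrix is treated as diagonal.
    """
    y_hat = [sum(w * yj for w, yj in zip(row, y)) for row in W]
    var_hat = [sum(w * w * v for w, v in zip(row, var_diag)) for row in W]
    return y_hat, var_hat


# Hypothetical toy data: two tracts, three voting locations.
tracts = [(0.0, 4.0), (4.0, 10.0)]   # B_1, B_2
points = [1.0, 5.0, 9.0]             # s_1, s_2, s_3 (already sorted)
y = [10.0, 20.0]                     # Y(B_1), Y(B_2)
var_diag = [1.0, 4.0]                # known main diagonal of Var[Y]

cells = voronoi_cells_1d(points, 0.0, 10.0)
W_vor = voronoi_weights(cells, tracts)
W_naive = naive_weights(points, tracts)
y_vor, var_vor = estimate(W_vor, y, var_diag)
y_naive, var_naive = estimate(W_naive, y, var_diag)
```

In this toy setting the middle voting location has a Voronoi cell that straddles both tracts, so its Voronoi weights are 0.25 and 0.75, whereas the naive weights assign it entirely to the tract that contains it.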