drmr: A Bayesian approach to Dynamic Range Models in R

Available at lcgodoy.me/slides/2025-slmdrmr/

Lucas da Cunha Godoy

EEB Department, UCSC

2025-04-21

Introduction

The Challenge & Limits of SDMs

  • Critical Challenge: Predicting species’ responses to global environmental change is vital for conservation and management.

  • Usual Tool: Species Distribution Models (SDMs) have been the workhorse, correlating occurrences with environmental variables.

  • Limitations of SDMs:

    • Correlative: Struggle to predict responses under novel future conditions (Pagel and Schurr 2012).
    • Equilibrium Assumption: Often violated (Guisan and Thuiller 2005).
    • Lack Mechanism: Don’t explicitly model the underlying biological processes.

Dynamic Range Models (DRMs): A Mechanistic Approach

  • DRMs explicitly incorporate demographic processes that drive range dynamics (Pagel and Schurr 2012).

  • Allows linking environmental drivers directly to specific processes

  • Potential for more robust forecasting under novel conditions.

  • Why are they rarely in practice? Despite conceptual appeal, DRMs have been underutilized due to their complexity and computational fitting challenges.

Introducing drmr: Making DRMs Accessible

  • Goal: Bridge the gap between DRM potential and practical application.

  • Key Features:

    • Develops and applies age-structured DRMs in a Bayesian framework.
    • User-friendly interface.
    • Leverages Stan via cmdstanr for efficient fitting (Gabry et al. 2024).
    • Easily relate environmental drivers to specific recruitment and survival.

Preliminary

Zero-augmented probability distributions

  • Species’ densities are non-negative continuous variables.

  • Zero-augmented vs. Zero-inflated Distributions:

    • Zero-inflated models apply to discrete distributions (e.g., Poisson counts). They increase (inflate) the probability of observing zeros, which is already a possible outcome (\(P(Y = 0) > 0\)) in the base distribution.
    • Zero-augmented models apply to continuous distributions (e.g., densities). Since \(P(Y = 0)\) is theoretically \(0\) for continuous variables, these models add a discrete probability mass specifically at zero, augmenting the possibilities to include exact zeros.

Zero-augmented probability density function (PDF)

\[ f(y_{t, i} \mid \mu_{t, i}, \phi, \rho_{t, i}) = \begin{cases} \rho_{t, i}, & \text{ if } y_{t, i} = 0, \\ (1 - \rho_{t, i}) g(y_{t, i} \mid \mu_{t, i}, \phi), & \text{ if } y_{t, i} > 0, \end{cases} \]

  • \(\rho_{t, i} = \mathrm{Pr}(Y_{t, i} = 0)\) is the probability of absence.

  • \(g(\cdot \mid \mu_{t, i}, \phi)\) is the pdf of a continuous probability distribution.

  • \(\mu_{t, i}\) is the expected value (theoretical mean) of \(Y_{t, i}\) provided the species is present at time \(t\) and site \(i\).

Age-structured DRM

Setup

  • \(Y_{a, t, i}\): Unobserved density of individuals of age \(a\), time \(t\) and site \(i\).

  • \(\lambda_{a, t, i}\): Expected age-specific density.

  • \(Y_{t, i} = \sum_{a} Y_{a, t, i}\): Observed density of individuals of all ages at time \(t\) and site \(i\).

  • \(\mu_{t, i} = \sum_{a} \lambda_{a, t, i}\): Expected “overall” density.

  • Biological processes are encoded through \(\lambda_{a, t, i}\).

  • Key processes: recruitment, survival, and movement

Recruitment

  • Expected density for recruits: \[ \lambda_{1, t, i} = \exp \{ \psi_{t, i} \} \]

  • Where \(\psi_{t, i}\) is the expected log-transformed recruitment at time \(t\) and site \(i\).

  • We assume the expected log-transformed recruitment relates to the environmental drivers as follows: \[ \psi_{t, i} = \boldsymbol{\beta}^{\top}_r \mathbf{x}^{(r)}_{t, i} \]

  • It is also possible to incorporate temporal dependence to recruitment through an AR(1) process.

Survival

  • Expected density at age \(a\), time \(t\), and patch \(i\): \[ \lambda_{a, t, i} = \lambda_{a - 1, t - 1, i} \cdot s_{a - 1, t - 1, i} \]

  • \(s_{a, t, i}\) represents survival rates.

  • We assume the log-transformed survival rates relate to the environmental drivers as follows: \[ \log(s_{a, t, i}) = \boldsymbol{\beta}^{\top}_s \mathbf{x}^{(s)}_{t, i} - f_{a, t} \]

  • External information: additional mortality sources can be incorporated to the survival rates as well. Above, \(f_{a, t}\) is the instantaneous mortality rate for individuals of age \(a\) at time \(t\). Examples: fishing mortality, hunting, etc.

Movement

  • Movement is an important process affecting species’ densities.

  • Due to a high computational cost, we offer only a simple way to account for movement.

  • We estimate the probability of individuals remaining in the same site (\(\zeta\)) between two time points, and divide the probability of moving (\(1 - \zeta\)) evenly across neighboring sites.

The drmr Package & Case Study

Main Functions

  • Fitting: fit_drm, fit_sdm

  • Forecasting: predict_drm, predict_sdm

  • Visualization: marg_rec, marg_surv, marg_pabs

  • Simulation: pop_dyn, model_sim, pp_sim

Summer Flounder Dataset

  • An example analysis uses Summer flounder (Paralichthys dentatus) data from 1982-2016 NOAA bottom trawl surveys (Fredston et al. 2025) to illustrate the package’s features.

  • The data spans the US Atlantic coast (Cape Hatteras, NC to the Canada/Maine border) and was aggregated from individual hauls into 10 latitudinal patches with varying areas.

  • Response variable: Density (count per unit area).

  • Environmental drivers: SST and SBT.

Patches/sites

Fitting a simple model

  • Default model: two age groups with constant density and recruitment across time/space, and sets a fixed survival rate of \(0.70\) between age classes.

  • Output: is a list containing a stanfit object, equivalent to cmdstanr::sample() output, allowing full access to cmdstanr’s features for diagnostics, plotting, and assessment.

library(drmr) ## load package
data(sum_fl)  ## load data
baseline <-
  fit_drm(.data = dat_train,  ## dataset
          y_col = "dens",     ## response variable
          time_col = "year",  ## variable storing time
          site_col = "patch", ## variable storing "patch/site" id
          seed = 202505)      ## seed

Parameter estimates & diagnostics

  • The $summary method allows for checking the parameter estimates easily
baseline$stanfit$summary(variables = c("phi", "beta_r"))
# A tibble: 2 × 10
  variable     mean  median     sd    mad     q5    q95  rhat ess_bulk ess_tail
  <chr>       <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 phi[1]     1.08    1.08   0.0960 0.0945  0.925 1.24    1.00    4099.    3003.
2 beta_r[1] -0.0792 -0.0818 0.0735 0.0734 -0.197 0.0411  1.00    4011.    3083.

Toggles and increased model complexity

Toggle Description Default State
cloglog Use the complementary log-log (cloglog) link function for absence probability. If off, the logistic link function is used. Off
movement Enable the movement routine as described. Off
est_surv Estimate survival probabilities within the model. On
est_init Estimate \(\lambda\) initial values. Off (this one has many options)
time_ar Include an AR(1) term for recruitment. Off

Models fitted to this example

Model Description
baseline Baseline model
drm_1 Recruitment as a quadratic function of SST, and probability of absence depending on number of hauls, SST and SBT
drm_2 Same as drm_1 + estimating constant survival rates
drm_3 Same as drm_2 + AR(1) for recruitment
drm_4 Same as drm_3 + movement and increasing number of age-classes to 6
drm_5 Same as drm_3 and increasing number of age-classes to 6
drm_6 Same as drm_3 + fishing mortality for specific age-classes at each time point (16 age-classes)
drm_7 probability of absence as drm_3; AR(1) for recruitment (but constant across patches) and survival depending on SBT
sdm A zero-augmented Gamma SDM with the absence probability as in drm_3 and conditional density as a quadratic function of SST.

Code for fitting a complex model

drm_4 <-
  fit_drm(.data = dat_train,
          y_col = "dens",
          time_col = "year",
          site_col = "patch",
          family = "gamma",
          seed = 202505,
          formula_zero = ~ 1 + c_hauls + c_btemp + c_stemp,
          formula_rec = ~ 1 + c_stemp + I(c_stemp * c_stemp),
          formula_surv = ~ 1,
          n_ages = 6,
          adj_mat = adj_mat, ## A matrix for movement routine
          ages_movement = c(0, 0, 1, 1, 1, 0), ## ages allowed to move
          .toggles = list(est_surv = 1,
                          time_ar = 1,
                          movement = 1),
          init = "pathfinder")

Estimating relationship between processes and environmental variables

  • Recruitment and SST:
newdata_rec <- data.frame(c_stemp =
                            seq(from = quantile(dat_train$c_stemp, .05),
                                to = quantile(dat_train$c_stemp, .95),
                                length.out = 200))

rec_samples_3 <- marg_rec(drm_3, newdata_rec)
  • Survival and SBT:
newdata_surv <- data.frame(c_btemp =
                             seq(from = quantile(dat_train$c_btemp, .05),
                                 to = quantile(dat_train$c_btemp, .95),
                                 length.out = 200))

surv_samples <- marg_surv(drm_7, newdata_surv)

Estimated relationship between processes and environmental variables

Forecasting

  • The code for obtaining out-of-sample predictions is quite intuitive.

  • We only need to provide the output for a model fit, a new dataset (where we want the predictions), and a seed:

forecast_3 <- predict_drm(drm = drm_3,
                          new_data = dat_test,
                          past_data = filter(dat_train,
                                             year == max(year)),
                          seed = 125,
                          cores = 4)

Comparing models

Model \(\Delta\)-LOOIC RMSE IS PIC Time
drm_3 0.00 8.98 8.60 78.0 13.34
sdm -6.07 12.04 12.72 60.0 3.88
drm_4 -7.35 8.30 8.87 84.0 28.7
drm_5 -7.74 8.23 8.96 86.0 19.24
drm_7 -10.41 8.75 8.97 78.0 22.73
drm_6 -18.04 10.22 10.49 86.0 33.73
drm_1 -22.49 11.17 10.68 68.0 4.67
drm_2 -32.32 10.73 10.29 68.0 4.44
baseline -163.85 14.44 12.98 74.0 1.35

Concluding remarks

Highlights

  • The drmr substantially lowers the barrier for ecologists to use the DRM in their applications.

  • The code is easy to use and takes advantage of what has been developed for Stan: visualization, diagnostic tools, and estimation.

  • drmr allows for empirically testing which processes are more important to predict the distribution of a species.

  • The more complex a model is, the more (and better) data we need to be able to estimate those relationships.

Future work

  • Application to NE Mackerel

  • More options for spatial and temporal random effects for specific processes

  • Other population dynamic models (e.g., Ricker, Belverton-Holt)

  • GAM-like non-parametric relationships between processes and the environment

  • Support for length-composition data

  • More realistic movement routines

References

Fredston, A., Ovando, D., Cunha Godoy, L. da, Kong, J., Muffley, B., Thorson, J. T., and Pinsky, M. (2025), “Dynamic range models improve the near-term forecast for a marine species on the move,” Ecology Letters, 00, (under review). https://doi.org/10.32942/X24D00.
Gabry, J., Češnovar, R., Johnson, A., and Bronder, S. (2024), cmdstanr: R interface to CmdStan.
Guisan, A., and Thuiller, W. (2005), “Predicting species distribution: Offering more than simple habitat models,” Ecology letters, Wiley Online Library, 8, 993–1009. https://doi.org/https://doi.org/10.1111/j.1461-0248.2005.00792.x.
Pagel, J., and Schurr, F. M. (2012), “Forecasting species ranges by statistical estimation of ecological niches and spatial population dynamics,” Global Ecology and Biogeography, 21, 293–304. https://doi.org/https://doi.org/10.1111/j.1466-8238.2011.00663.x.