`drmr`: A Bayesian approach to Dynamic Range Models in `R`

Available at lcgodoy.me/slides/2025-slmdrmr/

Lucas da Cunha Godoy

ldcgodoy@ucsc.com

EEB Department, UCSC

2025-04-21

Introduction

The Challenge & Limits of SDMs

Critical Challenge: Predicting species’ responses to global environmental change is vital for conservation and management.
Usual Tool: Species Distribution Models (SDMs) have been the workhorse, correlating occurrences with environmental variables.
Limitations of SDMs:
- Correlative: Struggle to predict responses under novel future conditions (Pagel and Schurr 2012).
- Equilibrium Assumption: Often violated (Guisan and Thuiller 2005).
- Lack Mechanism: Don’t explicitly model the underlying biological processes.

Dynamic Range Models (DRMs): A Mechanistic Approach

DRMs explicitly incorporate demographic processes that drive range dynamics (Pagel and Schurr 2012).
Allows linking environmental drivers directly to specific processes
Potential for more robust forecasting under novel conditions.
Why are they rarely in practice? Despite conceptual appeal, DRMs have been underutilized due to their complexity and computational fitting challenges.

Introducing `drmr`: Making DRMs Accessible

Goal: Bridge the gap between DRM potential and practical application.
Key Features:
- Develops and applies age-structured DRMs in a Bayesian framework.
- User-friendly interface.
- Leverages Stan via cmdstanr for efficient fitting (Gabry et al. 2024).
- Easily relate environmental drivers to specific recruitment and survival.

Preliminary

Zero-augmented probability distributions

Species’ densities are non-negative continuous variables.
Zero-augmented vs. Zero-inflated Distributions:
- Zero-inflated models apply to discrete distributions (e.g., Poisson counts). They increase (inflate) the probability of observing zeros, which is already a possible outcome ($P(Y = 0) > 0$) in the base distribution.
- Zero-augmented models apply to continuous distributions (e.g., densities). Since $P(Y = 0)$ is theoretically $0$ for continuous variables, these models add a discrete probability mass specifically at zero, augmenting the possibilities to include exact zeros.

Zero-augmented probability density function (PDF)

\[ f(y_{t, i} \mid \mu_{t, i}, \phi, \rho_{t, i}) = \begin{cases} \rho_{t, i}, & \text{ if } y_{t, i} = 0, \\ (1 - \rho_{t, i}) g(y_{t, i} \mid \mu_{t, i}, \phi), & \text{ if } y_{t, i} > 0, \end{cases} \]

$\rho_{t, i} = \mathrm{Pr}(Y_{t, i} = 0)$ is the probability of absence.
$g(\cdot \mid \mu_{t, i}, \phi)$ is the pdf of a continuous probability distribution.
$\mu_{t, i}$ is the expected value (theoretical mean) of $Y_{t, i}$ provided the species is present at time $t$ and site $i$.

Age-structured DRM

Setup

$Y_{a, t, i}$: Unobserved density of individuals of age $a$, time $t$ and site $i$.
$\lambda_{a, t, i}$: Expected age-specific density.
$Y_{t, i} = \sum_{a} Y_{a, t, i}$: Observed density of individuals of all ages at time $t$ and site $i$.
$\mu_{t, i} = \sum_{a} \lambda_{a, t, i}$: Expected “overall” density.
Biological processes are encoded through $\lambda_{a, t, i}$.
Key processes: recruitment, survival, and movement

Recruitment

Expected density for recruits: \[ \lambda_{1, t, i} = \exp \{ \psi_{t, i} \} \]
Where $\psi_{t, i}$ is the expected log-transformed recruitment at time $t$ and site $i$.
We assume the expected log-transformed recruitment relates to the environmental drivers as follows: \[ \psi_{t, i} = \boldsymbol{\beta}^{\top}_r \mathbf{x}^{(r)}_{t, i} \]
It is also possible to incorporate temporal dependence to recruitment through an AR(1) process.

Survival

Expected density at age $a$, time $t$, and patch $i$: \[ \lambda_{a, t, i} = \lambda_{a - 1, t - 1, i} \cdot s_{a - 1, t - 1, i} \]
$s_{a, t, i}$ represents survival rates.
We assume the log-transformed survival rates relate to the environmental drivers as follows: \[ \log(s_{a, t, i}) = \boldsymbol{\beta}^{\top}_s \mathbf{x}^{(s)}_{t, i} - f_{a, t} \]
External information: additional mortality sources can be incorporated to the survival rates as well. Above, $f_{a, t}$ is the instantaneous mortality rate for individuals of age $a$ at time $t$. Examples: fishing mortality, hunting, etc.

Movement

Movement is an important process affecting species’ densities.
Due to a high computational cost, we offer only a simple way to account for movement.
We estimate the probability of individuals remaining in the same site ($\zeta$) between two time points, and divide the probability of moving ($1 - \zeta$) evenly across neighboring sites.

The `drmr` Package & Case Study

Main Functions

Fitting: fit_drm, fit_sdm
Forecasting: predict_drm, predict_sdm
Visualization: marg_rec, marg_surv, marg_pabs
Simulation: pop_dyn, model_sim, pp_sim

Summer Flounder Dataset

An example analysis uses Summer flounder (Paralichthys dentatus) data from 1982-2016 NOAA bottom trawl surveys (Fredston et al. 2025) to illustrate the package’s features.
The data spans the US Atlantic coast (Cape Hatteras, NC to the Canada/Maine border) and was aggregated from individual hauls into 10 latitudinal patches with varying areas.
Response variable: Density (count per unit area).
Environmental drivers: SST and SBT.

Patches/sites

Fitting a simple model

Default model: two age groups with constant density and recruitment across time/space, and sets a fixed survival rate of $0.70$ between age classes.
Output: is a list containing a stanfit object, equivalent to cmdstanr::sample() output, allowing full access to cmdstanr’s features for diagnostics, plotting, and assessment.

library(drmr) ## load package
data(sum_fl)  ## load data
baseline <-
  fit_drm(.data = dat_train,  ## dataset
          y_col = "dens",     ## response variable
          time_col = "year",  ## variable storing time
          site_col = "patch", ## variable storing "patch/site" id
          seed = 202505)      ## seed

Parameter estimates & diagnostics

The $summary method allows for checking the parameter estimates easily

baseline$stanfit$summary(variables = c("phi", "beta_r"))

# A tibble: 2 × 10
  variable     mean  median     sd    mad     q5    q95  rhat ess_bulk ess_tail
  <chr>       <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 phi[1]     1.08    1.08   0.0960 0.0945  0.925 1.24    1.00    4099.    3003.
2 beta_r[1] -0.0792 -0.0818 0.0735 0.0734 -0.197 0.0411  1.00    4011.    3083.

Toggles and increased model complexity

Toggle	Description	Default State
`cloglog`	Use the complementary log-log (cloglog) link function for absence probability. If off, the logistic link function is used.	Off
`movement`	Enable the movement routine as described.	Off
`est_surv`	Estimate survival probabilities within the model.	On
`est_init`	Estimate $\lambda$ initial values.	Off (this one has many options)
`time_ar`	Include an AR(1) term for recruitment.	Off

Models fitted to this example

Model	Description
`baseline`	Baseline model
`drm_1`	Recruitment as a quadratic function of SST, and probability of absence depending on number of hauls, SST and SBT
`drm_2`	Same as `drm_1` + estimating constant survival rates
`drm_3`	Same as `drm_2` + AR(1) for recruitment
`drm_4`	Same as `drm_3` + movement and increasing number of age-classes to 6
`drm_5`	Same as `drm_3` and increasing number of age-classes to 6
`drm_6`	Same as `drm_3` + fishing mortality for specific age-classes at each time point (16 age-classes)
`drm_7`	probability of absence as `drm_3`; AR(1) for recruitment (but constant across patches) and survival depending on SBT
`sdm`	A zero-augmented Gamma SDM with the absence probability as in `drm_3` and conditional density as a quadratic function of SST.

Code for fitting a complex model

drm_4 <-
  fit_drm(.data = dat_train,
          y_col = "dens",
          time_col = "year",
          site_col = "patch",
          family = "gamma",
          seed = 202505,
          formula_zero = ~ 1 + c_hauls + c_btemp + c_stemp,
          formula_rec = ~ 1 + c_stemp + I(c_stemp * c_stemp),
          formula_surv = ~ 1,
          n_ages = 6,
          adj_mat = adj_mat, ## A matrix for movement routine
          ages_movement = c(0, 0, 1, 1, 1, 0), ## ages allowed to move
          .toggles = list(est_surv = 1,
                          time_ar = 1,
                          movement = 1),
          init = "pathfinder")

Estimating relationship between processes and environmental variables

Recruitment and SST:

newdata_rec <- data.frame(c_stemp =
                            seq(from = quantile(dat_train$c_stemp, .05),
                                to = quantile(dat_train$c_stemp, .95),
                                length.out = 200))

rec_samples_3 <- marg_rec(drm_3, newdata_rec)

Survival and SBT:

newdata_surv <- data.frame(c_btemp =
                             seq(from = quantile(dat_train$c_btemp, .05),
                                 to = quantile(dat_train$c_btemp, .95),
                                 length.out = 200))

surv_samples <- marg_surv(drm_7, newdata_surv)

Estimated relationship between processes and environmental variables

Forecasting

The code for obtaining out-of-sample predictions is quite intuitive.
We only need to provide the output for a model fit, a new dataset (where we want the predictions), and a seed:

forecast_3 <- predict_drm(drm = drm_3,
                          new_data = dat_test,
                          past_data = filter(dat_train,
                                             year == max(year)),
                          seed = 125,
                          cores = 4)

Comparing models

Model	$\Delta$-LOOIC	RMSE	IS	PIC	Time
`drm_3`	0.00	8.98	8.60	78.0	13.34
`sdm`	-6.07	12.04	12.72	60.0	3.88
`drm_4`	-7.35	8.30	8.87	84.0	28.7
`drm_5`	-7.74	8.23	8.96	86.0	19.24
`drm_7`	-10.41	8.75	8.97	78.0	22.73
`drm_6`	-18.04	10.22	10.49	86.0	33.73
`drm_1`	-22.49	11.17	10.68	68.0	4.67
`drm_2`	-32.32	10.73	10.29	68.0	4.44
`baseline`	-163.85	14.44	12.98	74.0	1.35

Concluding remarks

Highlights

The drmr substantially lowers the barrier for ecologists to use the DRM in their applications.
The code is easy to use and takes advantage of what has been developed for Stan: visualization, diagnostic tools, and estimation.
drmr allows for empirically testing which processes are more important to predict the distribution of a species.
The more complex a model is, the more (and better) data we need to be able to estimate those relationships.

Future work

Application to NE Mackerel
More options for spatial and temporal random effects for specific processes
Other population dynamic models (e.g., Ricker, Belverton-Holt)
GAM-like non-parametric relationships between processes and the environment
Support for length-composition data
More realistic movement routines

References

Fredston, A., Ovando, D., Cunha Godoy, L. da, Kong, J., Muffley, B., Thorson, J. T., and Pinsky, M. (2025), “Dynamic range models improve the near-term forecast for a marine species on the move,” Ecology Letters, 00, (under review). https://doi.org/10.32942/X24D00.

Gabry, J., Češnovar, R., Johnson, A., and Bronder, S. (2024), cmdstanr: R interface to CmdStan.

Guisan, A., and Thuiller, W. (2005), “Predicting species distribution: Offering more than simple habitat models,” Ecology letters, Wiley Online Library, 8, 993–1009. https://doi.org/https://doi.org/10.1111/j.1461-0248.2005.00792.x.

Pagel, J., and Schurr, F. M. (2012), “Forecasting species ranges by statistical estimation of ecological niches and spatial population dynamics,” Global Ecology and Biogeography, 21, 293–304. https://doi.org/https://doi.org/10.1111/j.1466-8238.2011.00663.x.

drmr: A Bayesian approach to Dynamic Range Models in R