abslife: A new package for estimating the lifetime distribution for left-truncated, discrete-time data with applications to asset-backed securities

Lucas da Cunha Godoy

EEB Department, UCSC

2026-02-10

Acknowledgment

  • This research is supported in part by a 2024 Faculty Seed for Success Grant Award from Bentley University.

  • W. Michael Hoffman Center for Business Ethics at Bentley University.

Talk material

Available at lcgodoy.me/slides/2026-bentley

Roadmap

  1. Tools and advice for writing R packages

  2. Review of relevant statistical methods and asset-backed security data

  3. Working examples with the abslife R package

  4. Future directions

Writing an R package

Advice & Resources

  1. Get comfortable with the command line and git
  2. Freely available resources:
  3. Easily documenting R packages: roxygen2
  4. Making your documentation into a website: pkgdown
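
A minimal sketch of roxygen2-style documentation follows. The function `cum_hazard` is a hypothetical illustration, not part of abslife. Inside a package, running `roxygen2::roxygenise()` would turn the `#'` comments into a help file.

```r
#' Cumulative hazard from a vector of hazard rates
#'
#' @param hazard Numeric vector of hazard rates, each between 0 and 1.
#' @return Numeric vector with the running sum of `hazard`.
#' @examples
#' cum_hazard(c(0.1, 0.2, 0.05))
#' @export
cum_hazard <- function(hazard) {
  ## Basic input validation before computing anything
  stopifnot(is.numeric(hazard), all(hazard >= 0 & hazard <= 1))
  cumsum(hazard)
}
```

The same comments also feed pkgdown, so one source of documentation serves both the help pages and the website.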

Classes & Methods

  • If your package aims at making a methodology available, it is good practice to create your own “classes” and associated methods.
output <- my_method(data)

print(output)   ## prints minimal information about the output
summary(output) ## summarizes the results in a meaningful way
plot(output)    ## visualizes the results
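
The calls above rely on S3 dispatch. Here is a minimal sketch, assuming a toy `my_method` that only stores the sample mean. The body of `my_method` and both methods are hypothetical placeholders, not code from abslife.

```r
## Hypothetical constructor: "fits" nothing, it only stores the sample mean
my_method <- function(data) {
  out <- list(estimate = mean(data), n = length(data))
  class(out) <- "my_method"   # S3 class tag used for method dispatch
  out
}

## print(): minimal information about the object
print.my_method <- function(x, ...) {
  cat("my_method fit with", x$n, "observations\n")
  invisible(x)
}

## summary(): a more detailed view of the results
summary.my_method <- function(object, ...) {
  data.frame(n = object$n, estimate = object$estimate)
}

output <- my_method(c(2, 4, 6))
print(output)     # dispatches to print.my_method
summary(output)   # dispatches to summary.my_method
```

A `plot.my_method` method would follow the same pattern, so users keep using the generic functions they already know.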

Background

Event & time to event

  • Event = Change of state

    • Healthcare: Recovery (Ill \(\rightarrow\) Healthy)
    • Engineering: Failure (Working \(\rightarrow\) Broken)
    • Business: Churn (Subscribed \(\rightarrow\) Cancelled)
  • Time to event: The duration from the start of observation until the state changes.

  • Competing risk (CR): When an alternative event prevents the event of interest.

Asset-backed securities

  • By pooling auto loans into trusts, ABS links capital from investors to consumers needing credit, directly supporting manufacturer sales.

  • Significant market value: The sector is economically vital, with $170.4 billion in new ABS issuance recorded between November 2023 and October 2024.

  • Expanded data access: Regulatory changes have increased data availability in this market.

  • Empirical data on the loans allow us to estimate the duration mismatch and lenders’ profitability through a probabilistic view of cash flows.

Cash flow & lifetime data

Loan (Age) | Month 1 | Month 2 | \(\cdots\) | Month s
\(L_{1(x_1)}\) | \(CF_{1(x_1+1)}\) | \(CF_{1(x_1+2)}\) | \(\cdots\) | \(CF_{1(x_1+s)}\)
\(L_{2(x_2)}\) | \(CF_{2(x_2+1)}\) | \(CF_{2(x_2+2)}\) | \(\cdots\) | \(CF_{2(x_2+s)}\)
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\)
\(L_{n(x_n)}\) | \(CF_{n(x_n+1)}\) | \(CF_{n(x_n+2)}\) | \(\cdots\) | \(CF_{n(x_n+s)}\)
ABS CF | \(f\left(\sum_{j=1}^{n} CF_{j(x_j+1)}\right)\) | \(f\left(\sum_{j=1}^{n} CF_{j(x_j+2)}\right)\) | \(\cdots\) | \(f\left(\sum_{j=1}^{n} CF_{j(x_j+s)}\right)\)
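
The last row of the table aggregates loan-level cash flows month by month. A small sketch with made-up numbers follows. The function \(f\) is not specified in the table, so the identity is used here purely as a placeholder.

```r
## Hypothetical cash flows: 3 loans (rows) over s = 4 months (columns)
cf <- matrix(c(100, 100, 100,   0,   # loan 1 stops paying after month 3
               250, 250, 250, 250,   # loan 2 pays in every month
               180, 180,   0,   0),  # loan 3 stops paying after month 2
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("loan_", 1:3), paste0("month_", 1:4)))

## f() is left unspecified in the table; the identity is an assumption
f <- function(total) total

abs_cf <- f(colSums(cf))   # pool-level cash flow for each month
abs_cf
```

Because loans terminate at random ages, each column sum is a random quantity, which is why the lifetime distribution matters.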

Focus of the package/analyses

  • Event: Default or prepayment of a consumer auto loan or lease contract (with a fixed, known duration).

  • Discrete time to event: Ties are almost certain, since two (or more) loans will share the same termination age.

  • Left-truncation (LT): Occurs because investors only observe loans that survive long enough to be included in the trust.

  • Right-censoring (RC): Refers to leases that are currently active. We know they are paying now, but we don’t know when they will eventually end.
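
A short simulation sketches why left-truncation cannot be ignored. The uniform distributions and sample size are arbitrary assumptions chosen for illustration.

```r
set.seed(1)
n <- 1e5
x <- sample(1:60, n, replace = TRUE)   # true lifetimes (months), uniform on 1..60
y <- sample(1:30, n, replace = TRUE)   # loan age when the trust is formed

observed <- x >= y                     # loans that ended before entering the trust are never seen

mean(x)                                # close to the true mean, 30.5
mean(x[observed])                      # noticeably larger: long-lived loans are over-represented
```

The naive average of the observed lifetimes overstates the true mean, so an estimator has to account for the truncation time of each loan.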

Relevant Literature

Notation, Assumptions & Results

Notation

  • \(X\) and \(Y\) denote the discrete time to event and a left-truncation random variable, respectively.

  • Truncation: We only observe \(X\) given \(X \geq Y\).

  • Probability mass function: \(p(x) = \Pr(X = x)\)

  • Survival and distribution functions: \(S(x) = \Pr(X > x)\) and \(F(x) = \Pr(X \leq x) = 1 - S(x)\)

  • Hazard rate: \(\lambda(x) = \frac{p(x)}{\Pr(X \geq x)} = \frac{p(x)}{S(x - 1)}\)

  • Cause-specific hazard rate: \(\lambda_k(x) = \frac{\Pr(X = x, Z = k)}{\Pr(X \geq x)}\), where \(Z\) denotes the event type.
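
A worked example of these quantities on a made-up pmf follows. The hazard is computed as \(\Pr(X = x) / \Pr(X \geq x)\), and the pmf is then recovered from the hazards alone.

```r
## Hypothetical pmf on the support 1..4
x <- 1:4
p <- c(0.1, 0.2, 0.3, 0.4)

at_risk <- rev(cumsum(rev(p)))   # Pr(X >= x): 1.0, 0.9, 0.7, 0.4
hazard  <- p / at_risk           # lambda(x) = Pr(X = x) / Pr(X >= x)

## Recover the pmf from the hazards: lambda(x) times the probability
## of surviving every earlier time point
p_back <- hazard * cumprod(c(1, 1 - hazard[-length(hazard)]))

round(data.frame(x, p, at_risk, hazard, p_back), 3)
```

The hazard equals 1 at the last support point, and the hazards carry the same information as the pmf.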

Assumptions

  • \(X, Y \in \mathbb{N}\)

  • \(X \perp Y\): Reasonable in ABS applications (Lautier et al. 2023b).

  • Right-censoring:

    • \(C = Y + \tau\), where \(\tau\) is a constant.
    • When right-censoring is present, we observe: \((Y, \min\{X, C\})\) as opposed to \((Y, X)\).
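
The censoring mechanism can be sketched in a few lines. All values below are hypothetical toy numbers, not taken from the package datasets.

```r
## Hypothetical toy values
x   <- c(39, 34, 50, 45)   # true lifetimes X
y   <- c(33, 33, 30, 35)   # truncation times Y (every x >= y, so all are observed)
tau <- 10                  # constant length of the observation window

c_time   <- y + tau                  # censoring time C = Y + tau
obs_time <- pmin(x, c_time)          # what is recorded: min(X, C)
censored <- as.integer(x > c_time)   # 1 if the contract was still active at C

data.frame(y, obs_time, censored)
```

Only the third contract is censored here: its recorded time is 40, while its true lifetime of 50 is never observed.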

A typical dataset

LT

  Xi Yi
1 39 33
2 39 33
3 39 33
4 34 33
5 35 33
6 39 33

LT + RC

  Xi Yi Ci
1 37 29  1
2 30 29  1
3 37 30  1
4 32 32  1
5 37 32  1
6 36 33  1

LT + RC + CR

  Xi Yi Ci Zi
1 42 19  0  0
2 50 19  0  1
3 23 19  0  1
4 70 19  1  0
5 21 19  0  1
6 70 19  0  0

Estimators

  • \(\hat{\lambda}(x) = \frac{\sum_{i=1}^{n} \mathbf{1}_{X_i \leq C_i} \mathbf{1}_{\min(X_i, C_i)=x}}{\sum_{i=1}^{n} \mathbf{1}_{Y_i \leq x \leq \min(X_i, C_i)}}\)

  • \(\hat{\lambda}_k(x) = \frac{\sum_{i=1}^{n} \mathbf{1}_{Z_i = k} \mathbf{1}_{X_i \leq C_i} \mathbf{1}_{\min(X_i, C_i)=x}}{\sum_{i=1}^{n} \mathbf{1}_{Y_i \leq x \leq \min(X_i, C_i)}}\)

  • \(\hat{S}(x) = \prod_{\Delta + 1 \leq k \leq x}[1 - \hat{\lambda}(k)]\)

  • \(\hat{F}(x) = 1 - \hat{S}(x)\)

  • \(\hat{p}(x) = \hat{S}(x - 1) - \hat{S}(x)\)

  • We provide asymptotic distributions for the estimators above.
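
The point estimators above can be sketched in base R on toy data. This is an illustration of the formulas under made-up inputs, not the abslife implementation, and it omits standard errors.

```r
## Hypothetical toy data
y     <- c(1, 1, 2, 2, 3, 3)   # truncation times Y_i
t_obs <- c(2, 4, 3, 5, 4, 5)   # observed times min(X_i, C_i)
event <- c(1, 1, 1, 0, 1, 1)   # 1 if X_i <= C_i (event observed), 0 if censored

support <- min(y):max(t_obs)

## Numerator: observed events at x; denominator: size of the risk set at x
events  <- sapply(support, function(x) sum(event == 1 & t_obs == x))
at_risk <- sapply(support, function(x) sum(y <= x & x <= t_obs))
hazard  <- events / at_risk

surv <- cumprod(1 - hazard)   # running product of (1 - hazard) up to and including x
cdf  <- 1 - surv
pmf  <- -diff(c(1, surv))     # S(x - 1) - S(x), with the product before the support set to 1

data.frame(lifetime = support, at_risk, events, hazard, surv, cdf, pmf)
```

In this toy example the estimated pmf sums to 0.85. The remaining mass lies beyond the largest observed time because the last risk set contains a censored contract.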

Using the abslife package

Installing the package

  1. Make sure you have either the remotes or pak packages installed.
install.packages("remotes")
## install.packages("pak")
  2. Install abslife using remotes:
remotes::install_github("lcgodoy/abslife")
  3. Install abslife using pak:
pak::pak("lcgodoy/abslife")

Datasets in the package

The main function

  • Most of the package’s functionalities revolve around the estimate_hazard function.

    library(abslife)
    args(estimate_hazard)
    function (lifetime, trunc_time = NULL, censoring_indicator = NULL, 
        event_type = NULL, support_lifetime_rv = NULL, carry_hazard = FALSE, 
        ci_level = 0.95) 
    NULL
  • For details on the function arguments, run ?estimate_hazard, or check this link.

  • Available methods: print, summary, plot and calc_cdf.

Example 1: Left-truncation

data(aart)
head(aart)
  Xi Yi
1 39 33
2 39 33
3 39 33
4 34 33
5 35 33
6 39 33

Example 1: Estimating hazard rates

hazard_aart <- estimate_hazard(lifetime = aart$Xi,
                               trunc_time = aart$Yi)
hazard_aart
Observed lifetime support: [5, 49] 
Total number of timepoints observed: 45 
ev_life(hazard_aart)
Expected lifetime: 28 

Example 1: Plotting

plot(hazard_aart)

Example 1: Restricting support

hazard_aart <- estimate_hazard(lifetime = aart$Xi,
                               trunc_time = aart$Yi,
                               support_lifetime_rv = 5:37)
hazard_aart
Observed lifetime support: [5, 37] 
Total number of timepoints observed: 33 
plot(hazard_aart)

Example 1: Interpolating

hazard_aart <- estimate_hazard(lifetime = aart$Xi,
                               trunc_time = aart$Yi,
                               carry_hazard = TRUE)
hazard_aart
Observed lifetime support: [5, 49] 
Total number of timepoints observed: 45 
plot(hazard_aart)

Example 1: CDF and density

cdf_aart <- calc_cdf(hazard_aart)
summary(cdf_aart)
   lifetime        cdf      density
1         5 0.01904762 0.0190476190
6        10 0.10628735 0.0141298442
11       15 0.19848679 0.0199852375
16       20 0.28713123 0.0125064696
21       25 0.37058116 0.0191770702
26       30 0.45756023 0.0185808099
31       35 0.54922672 0.0245421009
36       40 0.99118211 0.0240042534
41       45 0.99975506 0.0002449414

Example 1: Plotting CDF

plot(cdf_aart)

Example 1: Sampling from the lifetime distribution

samples_cdf <- ralife_cdf(10000, cdf_aart)

hist(samples_cdf, xlab = "Lifetime")
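
For intuition, sampling from a discrete lifetime distribution can be sketched with inverse-CDF sampling in base R. The pmf below is hypothetical. This shows the general technique and makes no claim about how `ralife_cdf` works internally.

```r
set.seed(42)
lifetime <- 5:9
pmf      <- c(0.10, 0.20, 0.40, 0.20, 0.10)   # hypothetical, sums to 1
cdf      <- cumsum(pmf)
cdf[length(cdf)] <- 1                         # guard against floating-point rounding

## Inverse-CDF sampling: pick the smallest lifetime whose cdf is >= u
u       <- runif(10000)
samples <- lifetime[findInterval(u, cdf, left.open = TRUE) + 1]

mean(samples)         # Monte Carlo estimate of the expected lifetime
sum(lifetime * pmf)   # exact expected lifetime under this pmf
```

Simulated lifetimes like these can then be pushed through a cash flow model to propagate uncertainty.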

Example 2: LT + RC

data(mbalt)
head(mbalt)
  Zi Yi Di
1 37 29  1
2 30 29  1
3 37 30  1
4 32 32  1
5 37 32  1
6 36 33  1

Example 2: Estimating hazard rates

## Di = 1 appears to mark an observed event, so 1 - Di flags right-censored contracts
hazard_mbalt <- estimate_hazard(lifetime = mbalt$Zi,
                                trunc_time = mbalt$Yi,
                                censoring_indicator = 1 - mbalt$Di)
hazard_mbalt
Observed lifetime support: [5, 37] 
Total number of timepoints observed: 33 
summary(hazard_mbalt, by = 10)
   lifetime      hazard se_log_hazard    lower_ci    upper_ci
1         5 0.002508511    0.26692582 0.001486649 0.004232760
11       15 0.001737138    0.14573824 0.001305514 0.002311463
21       25 0.006408170    0.06229942 0.005671588 0.007240414
31       35 0.177075260    0.01376845 0.172360664 0.181918815
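
The printed bounds agree numerically with a Wald interval built on the log scale, \(\exp\{\log \hat{\lambda}(x) \pm z \cdot \text{se}\}\). The sketch below reproduces the first row of the summary. It is an observation about the output, not a statement about the package internals.

```r
## First row of the summary above (lifetime 5)
hazard        <- 0.002508511
se_log_hazard <- 0.26692582
ci_level      <- 0.95

z <- qnorm(1 - (1 - ci_level) / 2)   # standard normal quantile, about 1.96

lower_ci <- exp(log(hazard) - z * se_log_hazard)
upper_ci <- exp(log(hazard) + z * se_log_hazard)
c(lower = lower_ci, upper = upper_ci)
```

Working on the log scale keeps both bounds positive, which is useful when hazards are close to zero.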

Example 3: LT + RC + CR

data(aloans)
head(aloans)
    risk_cat  Z  Y C D R  bond
1   subprime 42 19 0 0 1 sdart
2   subprime 50 19 0 1 0 sdart
3 near_prime 23 19 0 1 0 sdart
4      prime 70 19 1 0 0 sdart
5   subprime 21 19 0 1 0 sdart
6      prime 70 19 0 0 1 sdart

Example 3: Estimating hazard rates

aloans_prime <- subset(aloans, risk_cat == "prime")
hazard_aloans <- estimate_hazard(lifetime = aloans_prime$Z,
                                 trunc_time = aloans_prime$Y,
                                 censoring_indicator = aloans_prime$C,
                                 event_type = aloans_prime$D)
hazard_aloans
Observed lifetime support: [3, 70] 
Total number of timepoints observed: 68 
Total number of event types: 2 
Event types observed: 0, 1 

Example 3: Estimating hazard rates

aloans_prime$etype <- ifelse(aloans_prime$D == 1, "Default",
                             "Prepayment")
hazard_aloans <- estimate_hazard(lifetime = aloans_prime$Z,
                                 trunc_time = aloans_prime$Y,
                                 censoring_indicator = aloans_prime$C,
                                 event_type = aloans_prime$etype,
                                 carry_hazard = TRUE)
hazard_aloans
Observed lifetime support: [3, 70] 
Total number of timepoints observed: 68 
Total number of event types: 2 
Event types observed: Default, Prepayment 
summary(hazard_aloans, by = 15)

  event_type lifetime      hazard se_log_hazard     lower_ci    upper_ci
1    Default        3 0.003278689     0.7059466 0.0008218590 0.013079858
2    Default       18 0.002227585     0.2883534 0.0012658650 0.003919956
3    Default       33 0.002093997     0.3329842 0.0010902836 0.004021728
4    Default       48 0.004700721     0.2575913 0.0028372809 0.007788011
5    Default       63 0.006711409     0.9966386 0.0009516415 0.047331916
6 Prepayment       18 0.014293670     0.1131432 0.0114508070 0.017842323
7 Prepayment       33 0.011167985     0.1435293 0.0084295122 0.014796098
8 Prepayment       48 0.020056409     0.1237401 0.0157371246 0.025561183
9 Prepayment       63 0.063492063     0.4838667 0.0245952567 0.163903235

Example 3: Plot + multiple CIs

plot(hazard_aloans, ci_level = c(.5, .75, .9, .99), lwd = 1)

Example 3: Customization

plot(hazard_aloans,
     ci_level = c(.5, .75, .9, .99),
     lwd = 1, ## line width
     col_line = "#d33682", ## line color
     color = "#268bd2") ## shaded region color

Example 3: ggplot2? Yes.

gg_aloans <-
  prep_ggplot(hazard_aloans,
              ci_level = c(.5, .75, .9, .99))
head(gg_aloans)
  event_type lifetime  risk_set      hazard se_log_hazard     lower_ci
1 Prepayment        4 0.3193651 0.002485089     0.4466576 0.0007864666
2 Prepayment        5 0.5125397 0.003096934     0.3157377 0.0013731758
3 Prepayment        6 0.7419048 0.004278990     0.2231279 0.0024084363
4 Prepayment        7 0.7825397 0.007302231     0.1660570 0.0047609412
5 Prepayment        8 0.8000000 0.011507937     0.1305487 0.0082216073
6 Prepayment        9 0.8120635 0.008600469     0.1501060 0.0058425617
     upper_ci level
1 0.007852424   99%
2 0.006984540   99%
3 0.007602342   99%
4 0.011200008   99%
5 0.016107873   99%
6 0.012660212   99%

Example 3: Plotting using ggplot2

library(ggplot2)

ggplot(data = gg_aloans,
       aes(x = lifetime,
           y = hazard)) +
  geom_ribbon(aes(alpha = level,
                  ymin = lower_ci,
                  ymax = upper_ci)) +
  geom_line() +
  facet_wrap(~ event_type)

Example 3: Customizing ggplot2

ggplot(data = gg_aloans,
       aes(x = lifetime,
           y = hazard)) +
  geom_ribbon(aes(alpha = level,
                  ymin = lower_ci,
                  ymax = upper_ci),
              fill = 2) +
  geom_line(color = 1) +
  labs(x = "x",
       y = expression(lambda(x)),
       alpha = "Confidence level") +
  theme_bw() +
  theme(legend.position = "inside",
        legend.position.inside = c(.2, .8)) +
  facet_wrap(~ event_type)

Financial applications

ABS data

  • The availability of intuitive tools for data-driven decision making has not yet matched the wide availability of ABS datasets.

  • abslife fills this gap by providing tools for the analysis of discrete time-to-event data.

  • The package opens the possibility for investors to assess the uncertainty around problems such as duration mismatch, profitability, loan pricing, and risk scoring.

Duration example

Month | 5 | 6 | 7 | \(\cdots\)
Probability | \(\hat{p}(5)\) | \(\hat{p}(6)\) | \(\hat{p}(7)\) | \(\cdots\)
CF | \(\text{PMT}_1\) | \(\text{PMT}_1\) | \(\text{PMT}_1\) | \(\cdots\)
 | \(\text{PMT}_2\) | \(\text{PMT}_2\) | \(\text{PMT}_2\) | \(\cdots\)
 | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\cdots\)
 | \(\text{PMT}_5 + \text{Prepay}\) | \(\text{PMT}_5\) | \(\text{PMT}_5\) | \(\cdots\)
 | | \(\text{PMT}_6 + \text{Prepay}\) | \(\text{PMT}_6\) | \(\cdots\)
 | | | \(\text{PMT}_7 + \text{Prepay}\) | \(\cdots\)
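
The table can be read as a probability-weighted set of cash flow paths, one per termination month. A sketch of the resulting expected present value follows. Every input (probabilities, payment, prepayment balances, discount rate) is hypothetical.

```r
## All inputs below are hypothetical
months <- 5:7
p_hat  <- c(0.2, 0.3, 0.5)       # Pr(termination in month 5, 6, 7); sums to 1 here
pmt    <- 300                    # level monthly payment
prepay <- c(5000, 4750, 4500)    # balance paid off if the loan ends in month 5, 6, 7
rate   <- 0.005                  # monthly discount rate

## Present value of the path that terminates in month m
pv_path <- function(m, balance) {
  t  <- seq_len(m)
  cf <- rep(pmt, m)
  cf[m] <- cf[m] + balance       # final payment plus prepayment of the balance
  sum(cf / (1 + rate)^t)
}

pv_by_month <- mapply(pv_path, months, prepay)
expected_pv <- sum(p_hat * pv_by_month)   # weight each path by its probability
expected_pv
```

The same weights could be applied to a time-weighted version of each path, giving a probabilistic analogue of duration.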

Wrapping up!

Future work

  • Enhanced documentation

  • Non-parametric Bootstrap

  • Incorporate future methods such as regression

  • More concrete financial examples

References

AART (2017), Ally Auto Receivables Trust, Prospectus, Ally Auto Assets LLC.
Lautier, J. P., Pozdnyakov, V., and Yan, J. (2023a), “Estimating a discrete distribution subject to random left-truncation with an application to structured finance,” Econometrics and Statistics. https://doi.org/10.1016/j.ecosta.2023.05.005.
Lautier, J. P., Pozdnyakov, V., and Yan, J. (2023b), “Pricing time-to-event contingent cash flows: A discrete-time survival analysis approach,” Insurance: Mathematics and Economics, 110, 53–71. https://doi.org/10.1016/j.insmatheco.2023.02.003.
Lautier, J. P., Pozdnyakov, V., and Yan, J. (2024), “On the convergence of credit risk in current consumer automobile loans,” Journal of the Royal Statistical Society Series A: Statistics in Society, qnae137. https://doi.org/10.1093/jrsssa/qnae137.
Wickham, H. (2019), Advanced R, Boca Raton, Florida: Chapman and Hall/CRC. https://doi.org/10.1201/9781351201315.