delicatessen.estimating_equations.survival.ee_aft

ee_aft(theta, X, t, delta, distribution, weights=None)

Estimating equation for a generalized accelerated failure time (AFT) model. Let \(T_i\) indicate the time of the event and \(C_i\) indicate the time to right censoring. Therefore, the observable data consists of \(t_i = \min(T_i, C_i)\) and \(\Delta_i = I(t_i = T_i)\). The estimating equations are

\[\begin{split}\sum_{i=1}^n = \begin{bmatrix} - \frac{1}{\sigma} \lambda_\epsilon (Z_i) X^T \\ - \frac{1}{\sigma} \lambda_\epsilon (Z_i) \times Z_i - \frac{\Delta_i}{\sigma} \\ \end{bmatrix} = 0\end{split}\]

where \(\theta = (\beta, \sigma)\), \(Z_i = \frac{\log(t_i) - X \beta^T}{\sigma}\) and

\[\lambda_\epsilon = \Delta_i \frac{f_\epsilon'(Z_i)}{f_\epsilon(Z_i)} - (1 - \Delta_i) \frac{S_\epsilon'(Z_i)}{S_\epsilon(Z_i)}.\]

Here the choice of the distribution for \(f_\epsilon\) and \(S_\epsilon\) are determined by the specified distributions. Options include exponential, Weibull, log-logistic, and log-normal. These functions are described in the following table

Distribution

Keyword

\(f_\epsilon' / f_\epsilon\)

\(S_\epsilon' / S_\epsilon\)

Exponential

exponential

\(1-\exp(Z_i)\)

\(\exp(Z_i)\)

Weibull

weibull

\(1-\exp(Z_i)\)

\(\exp(Z_i)\)

Log-Logistic

log-logistic

\(1 - \frac{2 \exp(Z_i)}{1 + \exp(Z_i)}\)

\(\frac{\exp(Z_i)}{1 + \exp(Z_i)}\)

Log-Normal

log-normal

\(- Z_i\)

\(\frac{\phi(Z_i)}{1 - \Phi(Z_i)}\)

The design matrix \(X\) should include an intercept term. Note that for optimization, the starting values for the intercept term likely be a positive number (e.g., \(5\)).

Note

The parametrization of the AFT model is the same as R’s survival library, except for scale parameter. Here, the inverse of the scale is equal to the R implementation.

Here, \(\theta\) is a 1-by-(b + 1) array, where b is the distinct covariates included as part of X. For example, if X is a 3-by-n matrix, then theta will be a 1-by-4 array. The code is general to allow for an arbitrary dimension of X.

Parameters
  • theta (ndarray, list, vector) – theta consists of b`+1 values. Therefore, initial values should consist of the same number as the number of columns present in ``X` plus 1. This can easily be implemented via [0, ] * X.shape[1] + [0, ]. Note that if using an exponential model, only b values need to be provided.

  • X (ndarray, list, vector) – 2-dimensional vector of n observed values for b variables.

  • t (ndarray, list, vector) – 1-dimensional vector of n observed times.

  • delta (ndarray, list, vector) – 1-dimensional vector of n values indicating whether the time was an event or censoring.

  • distribution (str) – Distribution to use for the AFT model. See table for options.

  • weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.

Returns

Returns a (b + 1)-by-n NumPy array evaluated for the input theta.

Return type

array

Examples

Construction of a estimating equation(s) with ee_aft should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_aft

Some generic survival data to estimate a AFT regression model

>>> n = 100
>>> data = pd.DataFrame()
>>> data['X'] = np.random.binomial(n=1, p=0.5, size=n)
>>> data['T'] = (1/1.25 + 1/np.exp(0.5)*data['X'])*np.random.weibull(a=0.75, size=n)
>>> data['C'] = np.random.weibull(a=1, size=n)
>>> data['C'] = np.where(data['C'] > 10, 10, data['C'])
>>> data['delta'] = np.where(data['T'] < data['C'], 1, 0)
>>> data['t'] = np.where(data['delta'] == 1, data['T'], data['C'])
>>> d_obs = data[['X', 't', 'delta']].copy()
>>> d_obs['C'] = 1

Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     aft = ee_aft(theta, t=d_obs['t'], delta=d_obs['delta'], X=d_obs[['X', 'W']], distribution='weibull')
>>>     return aft

Calling the M-estimator

>>> estr = MEstimator(stacked_equations=psi, init=[2., 0., 0.])
>>> estr.estimate(solver='lm')

Note

Optimization of the AFT model can be difficult. It may help to fit an exponential AFT model first and then use those coefficients as starting values for a more general model.

Inspecting the parameter estimates, variance, and confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

Inspecting parameter the specific parameter estimates

>>> estr.theta[0]     # beta_0     (scale)
>>> estr.theta[1]     # beta_X     (scale coefficients)
>>> estr.theta[-1]    # log(sigma) (shape)

References

Collett D. (2015). Accelerated failure time and other parametric models. In: Modelling Survival Data in Medical Research. CRC press. pg 236-250