delicatessen.estimating_equations.survival.ee_aft
- ee_aft(theta, X, t, delta, distribution, weights=None)
Estimating equation for a generalized accelerated failure time (AFT) model. Let \(T_i\) indicate the time of the event and \(C_i\) indicate the time to right censoring. Therefore, the observable data consists of \(t_i = \min(T_i, C_i)\) and \(\Delta_i = I(t_i = T_i)\). The estimating equations are
\[\begin{split}\sum_{i=1}^n = \begin{bmatrix} - \frac{1}{\sigma} \lambda_\epsilon (Z_i) X^T \\ - \frac{1}{\sigma} \lambda_\epsilon (Z_i) \times Z_i - \frac{\Delta_i}{\sigma} \\ \end{bmatrix} = 0\end{split}\]where \(\theta = (\beta, \sigma)\), \(Z_i = \frac{\log(t_i) - X \beta^T}{\sigma}\) and
\[\lambda_\epsilon = \Delta_i \frac{f_\epsilon'(Z_i)}{f_\epsilon(Z_i)} - (1 - \Delta_i) \frac{S_\epsilon'(Z_i)}{S_\epsilon(Z_i)}.\]Here the choice of the distribution for \(f_\epsilon\) and \(S_\epsilon\) are determined by the specified distributions. Options include exponential, Weibull, log-logistic, and log-normal. These functions are described in the following table
Distribution
Keyword
\(f_\epsilon' / f_\epsilon\)
\(S_\epsilon' / S_\epsilon\)
Exponential
exponential\(1-\exp(Z_i)\)
\(\exp(Z_i)\)
Weibull
weibull\(1-\exp(Z_i)\)
\(\exp(Z_i)\)
Log-Logistic
log-logistic\(1 - \frac{2 \exp(Z_i)}{1 + \exp(Z_i)}\)
\(\frac{\exp(Z_i)}{1 + \exp(Z_i)}\)
Log-Normal
log-normal\(- Z_i\)
\(\frac{\phi(Z_i)}{1 - \Phi(Z_i)}\)
The design matrix \(X\) should include an intercept term. Note that for optimization, the starting values for the intercept term likely be a positive number (e.g., \(5\)).
Note
The parametrization of the AFT model is the same as R’s
survivallibrary, except for scale parameter. Here, the inverse of the scale is equal to the R implementation.Here, \(\theta\) is a 1-by-(b + 1) array, where b is the distinct covariates included as part of
X. For example, ifXis a 3-by-n matrix, then theta will be a 1-by-4 array. The code is general to allow for an arbitrary dimension ofX.- Parameters
theta (ndarray, list, vector) – theta consists of b`+1 values. Therefore, initial values should consist of the same number as the number of columns present in ``X` plus 1. This can easily be implemented via
[0, ] * X.shape[1] + [0, ]. Note that if using an exponential model, only b values need to be provided.X (ndarray, list, vector) – 2-dimensional vector of n observed values for b variables.
t (ndarray, list, vector) – 1-dimensional vector of n observed times.
delta (ndarray, list, vector) – 1-dimensional vector of n values indicating whether the time was an event or censoring.
distribution (str) – Distribution to use for the AFT model. See table for options.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is
None, which assigns a weight of 1 to all observations.
- Returns
Returns a (b + 1)-by-n NumPy array evaluated for the input
theta.- Return type
array
Examples
Construction of a estimating equation(s) with
ee_aftshould be done similar to the following>>> import numpy as np >>> import pandas as pd >>> from delicatessen import MEstimator >>> from delicatessen.estimating_equations import ee_aft
Some generic survival data to estimate a AFT regression model
>>> n = 100 >>> data = pd.DataFrame() >>> data['X'] = np.random.binomial(n=1, p=0.5, size=n) >>> data['T'] = (1/1.25 + 1/np.exp(0.5)*data['X'])*np.random.weibull(a=0.75, size=n) >>> data['C'] = np.random.weibull(a=1, size=n) >>> data['C'] = np.where(data['C'] > 10, 10, data['C']) >>> data['delta'] = np.where(data['T'] < data['C'], 1, 0) >>> data['t'] = np.where(data['delta'] == 1, data['T'], data['C']) >>> d_obs = data[['X', 't', 'delta']].copy() >>> d_obs['C'] = 1
Defining psi, or the stacked estimating equations
>>> def psi(theta): >>> aft = ee_aft(theta, t=d_obs['t'], delta=d_obs['delta'], X=d_obs[['X', 'W']], distribution='weibull') >>> return aft
Calling the M-estimator
>>> estr = MEstimator(stacked_equations=psi, init=[2., 0., 0.]) >>> estr.estimate(solver='lm')
Note
Optimization of the AFT model can be difficult. It may help to fit an exponential AFT model first and then use those coefficients as starting values for a more general model.
Inspecting the parameter estimates, variance, and confidence intervals
>>> estr.theta >>> estr.variance >>> estr.confidence_intervals()
Inspecting parameter the specific parameter estimates
>>> estr.theta[0] # beta_0 (scale) >>> estr.theta[1] # beta_X (scale coefficients) >>> estr.theta[-1] # log(sigma) (shape)
References
Collett D. (2015). Accelerated failure time and other parametric models. In: Modelling Survival Data in Medical Research. CRC press. pg 236-250