delicatessen.estimating_equations.survival.ee_aft_weibull_measure

ee_aft_weibull_measure(theta, times, X, measure, mu, beta, sigma)

Estimating equation to calculate a survival measure (survival, density, risk, hazard, cumulative hazard) given a specific covariate pattern and Weibull accelerated failure time (AFT) model. The estimating equation for the survival function at time \(t\) is

\[\sum_{i=1}^n \left\{ \exp(-1 \lambda_i t^{\gamma}) - \theta \right\} = 0\]

and the estimating equation for the hazard function at time \(t\) is

\[\sum_{i=1}^n \left\{ \lambda_i \gamma t^{\gamma - 1} - \theta \right\} = 0\]

where

\[\begin{split}\gamma = \exp(\sigma) \\ \lambda_i = \exp(-1 (\mu + X \beta) * \gamma)\end{split}\]

For the other measures, we take advantage of the following transformation between survival meaures

\[\begin{split}F(t) = 1 - S(t) \\ H(t) = -\log(S(t)) \\ f(t) = h(t) S(t)\end{split}\]

Note

For proper uncertainty estimation, this estimating equation is meant to be stacked together with the corresponding Weibull AFT model.

Parameters
  • theta (ndarray, list, vector) – theta consists of t values. The initial values should consist of the same number of elements as provided in the times argument.

  • times (int, float, ndarray, list, vector) – A single time or 1-dimensional collection of times to calculate the measure at. The number of provided times should consist of the same number of elements as provided in the theta argument.

  • X (ndarray, list, vector) – 2-dimensional vector of n observed values for b variables.

  • measure (str) – Measure to calculate. Options include survival ('survival'), density ('density'), risk or the cumulative density ('risk'), hazard ('hazard'), or cumulative hazard ('cumulative_hazard').

  • mu (float, int) – The estimated scale parameter from the Weibull AFT. From ee_aft_weibull, will be the first element.

  • beta – The estimated scale coefficients from the Weibull AFT. From ee_aft_weibull, will be the middle element(s).

  • sigma – The estimated shape parameter from the Weibull AFT. From ee_aft_weibull, will be the last element.

Returns

Returns a t-by-n NumPy array evaluated for the input theta

Return type

array

Examples

Construction of a estimating equations for \(S(t=5)\) with ee_aft_weibull_measure should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_aft_weibull, ee_aft_weibull_measure

For demonstration, we will generated generic survival data

>>> n = 100
>>> data = pd.DataFrame()
>>> data['X'] = np.random.binomial(n=1, p=0.5, size=n)
>>> data['W'] = np.random.binomial(n=1, p=0.5, size=n)
>>> data['T'] = (1/1.25 + 1/np.exp(0.5)*data['X'])*np.random.weibull(a=0.75, size=n)
>>> data['C'] = np.random.weibull(a=1, size=n)
>>> data['C'] = np.where(data['C'] > 10, 10, data['C'])
>>> data['delta'] = np.where(data['T'] < data['C'], 1, 0)
>>> data['t'] = np.where(data['delta'] == 1, data['T'], data['C'])
>>> d_obs = data[['X', 'W', 't', 'delta']].copy()

Our interest will be in the survival among those with \(X=1,W=1\). Therefore, we will generate a copy of the data and set the values in that copy (to keep the dimension the same across both estimating equations).

>>> d_coef = d_obs.copy()
>>> d_coef['X'] = 1
>>> d_coef['W'] = 1

Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     aft = ee_aft_weibull(theta=theta[0:4],
>>>                     t=d_obs['t'], delta=d_obs['delta'], X=d_obs[['X', 'W']])
>>>     pred_surv_t = ee_aft_weibull_measure(theta=theta[4], X=d_coef[['X', 'W']],
>>>                                          times=5, measure='survival',
>>>                                          mu=theta[0], beta=theta[1:3], sigma=theta[3])
>>>     return np.vstack((aft, pred_surv_t))

Calling the M-estimator

>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0., 0., 0.5])
>>> estr.estimate(solver='lm')

Inspecting the estimate, variance, and confidence intervals for \(S(t=5)\)

>>> estr.theta[-1]                      # \hat{S}(t)
>>> estr.variance[-1, -1]               # \hat{Var}(\hat{S}(t))
>>> estr.confidence_intervals()[-1, :]  # 95% CI for S(t)

Next, we will consider evaluating the survival function at multiple time points (so we can easily create a plot of the survival function and the corresponding confidence intervals)

Note

When calculate the survival (or other measures) at many time points, it is generally best to pre-wash the coefficients to reduce the number of iterations and total run-time.

To make everything easier, we will generate a list of uniformly spaced values between the start and end points of our desired survival function. We will also generate initial values of the same length (to help the optimizer, we also start our starting values from near one and end near zero).

>>> resolution = 50
>>> time_spacing = list(np.linspace(0.01, 8, resolution))
>>> fast_inits = list(np.linspace(0.99, 0.01, resolution))

Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     aft = ee_aft_weibull(theta=theta[0:4],
>>>                     t=d_obs['t'], delta=d_obs['delta'], X=d_obs[['X', 'W']])
>>>     pred_surv_t = ee_aft_weibull_measure(theta=theta[4:], X=d_coef[['X', 'W']],
>>>                                          times=time_spacing, measure='survival',
>>>                                          mu=theta[0], beta=theta[1:3], sigma=theta[3])
>>>     return np.vstack((aft, pred_surv_t))

Calling the M-estimator

>>> estr = MEstimator(psi, init=list(estr.theta[0:4]) + fast_inits)
>>> estr.estimate(solver="lm")

To plot the survival curves, we could do the following:

>>> import matplotlib.pyplot as plt
>>> ci = estr.confidence_intervals()[4:, :]  # Extracting relevant CI
>>> plt.fill_between(time_spacing, ci[:, 0], ci[:, 1], alpha=0.2)
>>> plt.plot(time_spacing, estr.theta[4:], '-')
>>> plt.show()

References

Collett D. (2015). Accelerated failure time and other parametric models. In: Modelling survival data in medical research. CRC press. pg171-220