delicatessen.utilities.aft_predictions_individual

aft_predictions_individual(X, times, theta, distribution, measure='survival')

Compute predicted survival analysis measures from an accelerated failure time (AFT) model for a given design matrix and set of times. This function is meant to be used with the parameters estimated via ee_aft to generate predicted survival (or other measures) at user-specified time points.

Predictions are generated via

\[\begin{split}S(t) = S_{\epsilon}\left( \frac{\log(t) - X \beta^T}{\sigma} \right) \\ h(t) = (\sigma t)^{-1} h_{\epsilon}\left( \frac{\log(t) - X \beta^T}{\sigma} \right)\end{split}\]

where the corresponding function for the given AFT distribution is

Distribution | Keyword | \(S_\epsilon(x)\) | \(h_\epsilon(x)\)

Exponential | exponential | \(\exp(-\exp(x))\) | \(\exp(x)\)

Weibull | weibull | \(\exp(-\exp(x))\) | \(\exp(x)\)

Log-Logistic | log-logistic | \((1 + \exp(x))^{-1}\) | \((1 + \exp(-x))^{-1}\)

Log-Normal | log-normal | \(1 - \Phi(x)\) | \(\frac{\exp(-x^2 / 2)}{[1 - \Phi(x)] \sqrt{2 \pi }}\)

Note that distribution only needs to be set to the same value as the one used in ee_aft.
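
As a worked instance of the formulas above, substituting the Weibull row into the prediction equations gives the familiar Weibull survival and hazard functions. Here \(\mu = X \beta^T\) is shorthand introduced for this illustration only.

\[\begin{split}S(t) = \exp\left(-\exp\left(\frac{\log(t) - \mu}{\sigma}\right)\right) = \exp\left(-\left(\frac{t}{e^{\mu}}\right)^{1/\sigma}\right) \\ h(t) = (\sigma t)^{-1} \exp\left(\frac{\log(t) - \mu}{\sigma}\right) = \frac{1}{\sigma t}\left(\frac{t}{e^{\mu}}\right)^{1/\sigma}\end{split}\]

With \(\sigma = 1\) the hazard reduces to the constant \(e^{-\mu}\), which corresponds to the exponential model.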

Parameters
  • X (ndarray, list, vector) – 2-dimensional design matrix of n observations (rows) for b variables (columns).

  • times (float, int, ndarray, list, vector) – Either a single time point or a vector of time points to generate predicted measures at. This argument determines the shape of the output.

  • theta (ndarray, list, vector) – Estimated coefficients from MEstimator.theta with ee_aft.

  • distribution (str) – Distribution to use for the AFT model. See table for options.

  • measure (str, optional) – Measure to compute. Options include survival ('survival'), density ('density'), risk, i.e., the cumulative distribution function ('risk'), hazard ('hazard'), or cumulative hazard ('cumulative_hazard'). Default is 'survival'.
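
The measures listed for measure are linked by standard survival analysis identities. The following is a minimal sketch of those identities at a single time point, not a description of how the function computes them internally. The helper name measures_from_survival_and_hazard and the input values are hypothetical and only for illustration.

```python
import math


def measures_from_survival_and_hazard(survival, hazard):
    """Derive the remaining measures from S(t) and h(t) at one time point.

    Hypothetical helper for illustration; it is not part of delicatessen.
    """
    return {
        'survival': survival,                       # S(t)
        'hazard': hazard,                           # h(t)
        'risk': 1.0 - survival,                     # F(t) = 1 - S(t)
        'density': hazard * survival,               # f(t) = h(t) * S(t)
        'cumulative_hazard': -math.log(survival),   # H(t) = -log(S(t))
    }


# Made-up values of S(t) and h(t) at some time t
m = measures_from_survival_and_hazard(survival=0.8, hazard=0.01)
print(m['risk'], m['density'], m['cumulative_hazard'])
```

Because every measure can be recovered from the survival and hazard functions, the two prediction formulas above are sufficient to define all five options.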

Returns

Returns an n-by-t NumPy array of predictions, where n is the number of rows in the design matrix and t is the number of time points.

Return type

array
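
To make the n-by-t output shape concrete, here is a dependency-free sketch of the Weibull survival formula applied to every row of a design matrix and every time point. It is an illustration under stated assumptions, not the library's implementation: the function weibull_aft_survival is hypothetical, it takes the coefficients beta and the scale sigma as separate arguments rather than a single theta, it returns nested lists instead of a NumPy array, and the coefficient values are made up.

```python
import math


def weibull_aft_survival(X, times, beta, sigma):
    """Return an n-by-t nested list of S(t) = exp(-exp((log(t) - x.beta) / sigma)).

    Hypothetical helper for illustration; it is not part of delicatessen.
    """
    # Treat a single time point as a vector of length one
    if isinstance(times, (int, float)):
        times = [times]

    predictions = []
    for row in X:
        # Linear predictor for this observation
        mu = sum(x_j * b_j for x_j, b_j in zip(row, beta))
        # One predicted survival probability per requested time
        predictions.append(
            [math.exp(-math.exp((math.log(t) - mu) / sigma)) for t in times]
        )
    return predictions


# Three observations: an intercept column and one binary covariate
X = [[1, 0], [1, 1], [1, 0]]
beta = [5.0, -1.0]   # made-up coefficients, not estimates from any data
sigma = 1.0          # made-up scale

surv = weibull_aft_survival(X, times=[50, 100, 150], beta=beta, sigma=sigma)
print(len(surv), len(surv[0]))   # number of rows (n), number of times (t)
```

Each inner list corresponds to one row of the design matrix, so observations with identical covariates receive identical predictions.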

Examples

The following illustrates how to use aft_predictions_individual to generate predicted survival probabilities at specific times for individuals.

>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_aft
>>> from delicatessen.utilities import aft_predictions_individual
>>> from delicatessen.data import load_breast_cancer

Loading breast cancer data from Collett 2015

>>> dat = load_breast_cancer()
>>> delta = dat[:, 2]
>>> t = dat[:, 1]
>>> covars = np.asarray([np.ones(dat.shape[0]), dat[:, 0]]).T

Estimating the parameters of a Weibull AFT model

>>> def psi(theta):
>>>     return ee_aft(theta=theta, t=t, delta=delta,
>>>                   X=covars, distribution='weibull')
>>> estr = MEstimator(psi, init=[5., 0., 0.])
>>> estr.estimate()

Now we can generate predicted values of survival for each observation. Suppose we wanted the survival at time 50 for all units. The following code gives us the predicted survival for all units.

>>> aft_predictions_individual(X=covars, times=50.,
>>>                            theta=estr.theta,
>>>                            distribution='weibull')

Alternatively, we can request the predicted survival at multiple points at once. The following code computes the predicted survival at times 50, 100, 150, 200, 250 for all units.

>>> aft_predictions_individual(X=covars, times=[50, 100, 150, 200, 250],
>>>                            theta=estr.theta,
>>>                            distribution='weibull')

Different survival measures can be requested through the optional measure argument.

References

Collett D. (2015). Accelerated failure time and other parametric models. In: Modelling survival data in medical research. CRC Press. p. 242.