delicatessen.estimating_equations.survival.ee_exponential_measure

ee_exponential_measure(theta, times, n, measure, scale)

Estimating equation to calculate a survival measure (survival, density, risk, hazard, cumulative hazard) from the exponential model. The estimating equation for the survival function at time \(t\) is

\[\sum_{i=1}^n \left\{ \exp(- \lambda t) - \theta \right\} = 0\]

and the estimating equation for the hazard function at time \(t\) is

\[\sum_{i=1}^n \left\{ \lambda - \theta \right\} = 0\]

For the other measures, we take advantage of the following transformations

\[\begin{split}F(t) = 1 - S(t) \\ H(t) = -\log(S(t)) \\ f(t) = h(t) S(t)\end{split}\]

Note

For proper uncertainty estimation, this estimating equation is meant to be stacked with ee_exponential_model.

Parameters
  • theta (ndarray, list, vector) – theta consists of t values. The initial values should consist of the same number of elements as provided in the times argument.

  • times (int, float, ndarray, list, vector) – A single time or 1-dimensional collection of times to calculate the measure at. The number of provided times should consist of the same number of elements as provided in the theta argument.

  • n (int) – Number of observations in the input data. This argument ensures that the dimensions of the estimating equation are correct given the number of observations in the data.

  • measure (str) – Measure to calculate. Options include survival ('survival'), density ('density'), risk or the cumulative density ('risk'), hazard ('hazard'), or cumulative hazard ('cumulative_hazard').

  • scale (float, int) – The estimated scale parameter from the Weibull model. From ee_weibull_model, will be the first element.

Returns

Returns a t-by-n NumPy array evaluated for the input theta

Return type

array

Examples

Construction of a estimating equation(s) with ee_exponential_model and ee_exponential_measure should be done similar to the following. First, we will estimate the survival at time 5.

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_exponential_model, ee_exponential_measure

Some generic survival data to estimate an exponential model

>>> n = 100
>>> data = pd.DataFrame()
>>> data['C'] = np.random.weibull(a=1, size=n)
>>> data['C'] = np.where(data['C'] > 5, 5, data['C'])
>>> data['T'] = 0.8*np.random.weibull(a=1.0, size=n)
>>> data['delta'] = np.where(data['T'] < data['C'], 1, 0)
>>> data['t'] = np.where(data['delta'] == 1, data['T'], data['C'])

Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     exp = ee_exponential_model(theta=theta[0], t=data['t'],
>>>                                delta=data['delta'])
>>>     pred_surv_t = ee_exponential_measure(theta=theta[1], n=data.shape[0],
>>>                                          times=5, measure='survival',
>>>                                          scale=theta[0])
>>>     return np.vstack((exp, pred_surv_t))

Calling the M-estimator (note that init has 2 value, one for the scale and the other for \(S(t=5)\)).

>>> estr = MEstimator(stacked_equations=psi, init=[1., 0.5])
>>> estr.estimate(solver='lm')

Inspecting the estimate, variance, and confidence intervals for \(S(t=5)\)

>>> estr.theta[-1]                      # \hat{S}(t)
>>> estr.variance[-1, -1]               # \hat{Var}(\hat{S}(t))
>>> estr.confidence_intervals()[-1, :]  # 95% CI for S(t)

Next, we will consider evaluating the survival function at multiple time points (so we can easily create a plot of the survival function and the corresponding confidence intervals)

Note

When calculate the survival (or other measures) at many time points, it is generally best to pre-wash the coefficients to reduce the number of iterations and total run-time.

To make everything easier, we will generate a list of uniformly spaced values between the start and end points of our desired survival function. We will also generate initial values of the same length (to help the optimizer, we also start our starting values from near one and end near zero).

>>> resolution = 50
>>> time_spacing = list(np.linspace(0.01, 5, resolution))
>>> fast_inits = list(np.linspace(0.99, 0.01, resolution))

Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     exp = ee_exponential_model(theta=theta[0], t=data['t'],
>>>                                delta=data['delta'])
>>>     pred_surv_t = ee_exponential_measure(theta=theta[1:], n=data.shape[0],
>>>                                          times=time_spacing, measure='survival',
>>>                                          scale=theta[0])
>>>     return np.vstack((exp, pred_surv_t))

Calling the M-estimator

>>> mestr = MEstimator(psi, init=list(estr.theta[0]) + fast_inits)
>>> mestr.estimate(solver="lm")

To plot the survival curves, we could do the following:

>>> import matplotlib.pyplot as plt
>>> ci = mestr.confidence_intervals()[1:, :]  # Extracting relevant CI
>>> plt.fill_between(time_spacing, ci[:, 0], ci[:, 1], alpha=0.2)
>>> plt.plot(time_spacing, mestr.theta[1:], '-')
>>> plt.show()

References

Collett D. (2015). Modelling survival data in medical research. CRC press.