delicatessen.estimating_equations.survival.ee_weibull_measure

ee_weibull_measure(theta, times, n, measure, scale, shape)

Estimating equation to calculate a survival measure (survival, density, risk, hazard, cumulative hazard) for the Weibull model. The estimating equation for the survival function at time \(t\) is

\[\sum_{i=1}^n \left\{ \exp(- \lambda t^{\gamma}) - \theta \right\} = 0\]

and the estimating equation for the hazard function at time \(t\) is

\[\sum_{i=1}^n \left\{ \lambda \gamma t^{\gamma - 1} - \theta \right\} = 0\]

For the other measures, we take advantage of the following transformation between survival measures

\[\begin{split}F(t) = 1 - S(t) \\ H(t) = -\log(S(t)) \\ f(t) = h(t) S(t)\end{split}\]

Note

For proper uncertainty estimation, this estimating equation is meant to be stacked with ee_weibull_model.

Parameters

theta (ndarray, list, vector) – theta consists of t values. The initial values should consist of the same number of elements as provided in the times argument.
times (int, float, ndarray, list, vector) – A single time or 1-dimensional collection of times to calculate the measure at. The number of provided times should consist of the same number of elements as provided in the theta argument.
n (int) – Number of observations in the input data. This argument ensures that the dimensions of the estimating equation are correct given the number of observations in the data.
measure (str) – Measure to calculate. Options include survival ('survival'), density ('density'), risk or the cumulative density ('risk'), hazard ('hazard'), or cumulative hazard ('cumulative_hazard').
scale (float, int) – The estimated scale parameter from the Weibull model. From ee_weibull_model, will be the first element.
shape – The estimated shape parameter from the Weibull model. From ee_weibull_model, will be the second (last) element.

Returns

Returns a t-by-n NumPy array evaluated for the input theta.

Return type

array

Examples

Construction of a estimating equation(s) with ee_weibull_model and ee_weibull_measure should be done similar to the following. First, we will estimate the survival at time 5.

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_weibull_model, ee_weibull_measure

Some generic survival data to estimate a Weibull model

>>> n = 100
>>> data = pd.DataFrame()
>>> data['C'] = np.random.weibull(a=1, size=n)
>>> data['C'] = np.where(data['C'] > 5, 5, data['C'])
>>> data['T'] = 0.8*np.random.weibull(a=0.8, size=n)
>>> data['delta'] = np.where(data['T'] < data['C'], 1, 0)
>>> data['t'] = np.where(data['delta'] == 1, data['T'], data['C'])

Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     exp = ee_weibull_model(theta=theta[0:2], t=data['t'],
>>>                            delta=data['delta'])
>>>     pred_surv_t = ee_weibull_measure(theta=theta[2], n=data.shape[0],
>>>                                      times=5, measure='survival',
>>>                                      scale=theta[0], shape=theta[1])
>>>     return np.vstack((exp, pred_surv_t))

Calling the M-estimator

>>> estr = MEstimator(stacked_equations=psi, init=[1., 1., 0.5])
>>> estr.estimate(solver='lm')

Inspecting the estimate, variance, and confidence intervals for \(S(t=5)\)

>>> estr.theta[-1]                      # \hat{S}(t)
>>> estr.variance[-1, -1]               # \hat{Var}(\hat{S}(t))
>>> estr.confidence_intervals()[-1, :]  # 95% CI for S(t)

Next, we will consider evaluating the survival function at multiple time points (so we can easily create a plot of the survival function and the corresponding confidence intervals)

Note

When calculate the survival (or other measures) at many time points, it is generally best to pre-wash the coefficients to reduce the number of iterations and total run-time.

To make everything easier, we will generate a list of uniformly spaced values between the start and end points of our desired survival function. We will also generate initial values of the same length (to help the optimizer, we also start our starting values from near one and end near zero).

>>> resolution = 50
>>> time_spacing = list(np.linspace(0.01, 5, resolution))
>>> fast_inits = list(np.linspace(0.99, 0.01, resolution))

Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     exp = ee_weibull_model(theta=theta[0:2], t=data['t'],
>>>                            delta=data['delta'])
>>>     pred_surv_t = ee_weibull_measure(theta=theta[2:], n=data.shape[0],
>>>                                      times=time_spacing, measure='survival',
>>>                                      scale=theta[0], shape=theta[1])
>>>     return np.vstack((exp, pred_surv_t))

Calling the M-estimator

>>> mestr = MEstimator(psi, init=list(estr.theta[0:2]) + fast_inits)
>>> mestr.estimate(solver="lm")

To plot the survival curves, we could do the following:

>>> import matplotlib.pyplot as plt
>>> ci = mestr.confidence_intervals()[2:, :]  # Extracting relevant CI
>>> plt.fill_between(time_spacing, ci[:, 0], ci[:, 1], alpha=0.2)
>>> plt.plot(time_spacing, mestr.theta[2:], '-')
>>> plt.show()

References

Collett D. (2015). Modelling survival data in medical research. CRC press.