delicatessen.utilities.plogit_predict
- plogit_predict(theta, t, delta, X, S=None, times_to_predict=None, measure='survival', unique_times=None)
Compute predicted survival analysis measures from a pooled logistic regression model for a given design matrix and set of times. This function is meant to be used with ee_pooled_logistic to generate predicted survival (or other measures) at designated time points. Given a specified covariate and time design matrix, the coefficients from a pooled logistic regression model are used to generate conditional probabilities of the event. These are then transformed into the desired survival measure. Predictions can be output for a selected set of times (times_to_predict).
Note
Specifications of theta, t, delta, S, and unique_times should match those provided to ee_pooled_logistic.
- Parameters
theta (ndarray, list, vector) – Estimated parameter vector for the pooled logistic model, composed of the parameters for the baseline covariates and the time coefficients. These should be the values optimized by ee_pooled_logistic.
t (ndarray, list, vector) – 1-dimensional vector of n observed times. These should be the same values provided to ee_pooled_logistic.
delta (ndarray, list, vector) – 1-dimensional vector of n event indicators, where 1 indicates an event and 0 indicates right censoring. These should be the same values provided to ee_pooled_logistic.
X (ndarray, list, vector) – 2-dimensional array of n observed values for b variables. Covariate values can be modified from those given to ee_pooled_logistic, as is done with g-computation estimators.
S (ndarray, list, vector, None, optional) – Optional argument for parametric functional form specifications for time. Default is None, which uses disjoint indicators to model time. Expected to have np.max(t) rows. This should match the specification provided to ee_pooled_logistic.
times_to_predict (int, float, ndarray, list, vector, None, optional) – Time(s) to generate predicted values for. Specified times must be in \([0, \tau]\). Default is None, which generates predicted values at each unique event time (if S=None) or at each unit-time interval (if S is not None).
measure (str, optional) – Measure to compute. Options include survival ('survival'), density ('density'), risk or the cumulative density ('risk'), hazard ('hazard'), or cumulative hazard ('cumulative_hazard'). Default is 'survival'.
unique_times (None, ndarray, list, vector, optional) – Optional argument to compute the disjoint indicators for only a subset of times. This argument is intended for use with disjoint indicators for time that are stratified by some external variable. This argument is ignored when S is not None. This should match the specification provided to ee_pooled_logistic.
- Returns
Returns an n-by-K NumPy array of predictions, where n is the number of rows in the design matrix and K is the number of time points at which the measure is computed.
- Return type
array
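The transformation from conditional event probabilities to each measure option can be sketched in NumPy. This is a minimal sketch, not the delicatessen implementation: it assumes disjoint indicators for time (S=None), a logit link, and a parameter vector ordered as b covariate coefficients followed by one coefficient per time point with no separate intercept. The ordering and parametrization used by ee_pooled_logistic may differ. The helper discrete_measures, the cumulative-hazard definition (a running sum of the discrete hazards), and the coefficient values are hypothetical.

```python
import numpy as np

def expit(x):
    # Inverse logit: maps linear predictors to probabilities
    return 1.0 / (1.0 + np.exp(-x))

def discrete_measures(theta, X, n_times, measure="survival"):
    """Hypothetical helper returning an n-by-K array of a survival measure.

    Assumes theta = [b covariate coefficients, K time coefficients], with one
    logit-scale coefficient per time point and no separate intercept.
    """
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    b = X.shape[1]
    beta, alpha = theta[:b], theta[b:b + n_times]
    # Conditional probability of the event at each time, given survival to it
    hazard = expit((X @ beta)[:, None] + alpha[None, :])
    survival = np.cumprod(1.0 - hazard, axis=1)
    if measure == "survival":
        return survival
    if measure == "risk":
        return 1.0 - survival
    if measure == "hazard":
        return hazard
    if measure == "density":
        # Unconditional probability of the event at time k: S(k-1) * h(k), S(0) = 1
        previous = np.hstack([np.ones((X.shape[0], 1)), survival[:, :-1]])
        return previous * hazard
    if measure == "cumulative_hazard":
        # Assumed discrete-time definition: running sum of the hazards
        return np.cumsum(hazard, axis=1)
    raise ValueError(f"unknown measure: {measure!r}")

# Hypothetical values: one covariate and K = 3 time points
theta = [0.5, -3.0, -2.5, -2.8]
X = np.array([[0.0], [1.0]])
print(discrete_measures(theta, X, n_times=3, measure="risk").shape)  # (2, 3)
```

Under these assumptions, the density summed over all K time points equals the risk at the last time point, because each term S(k-1)h(k) equals S(k-1) - S(k).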
Examples
The following illustrates how to use plogit_predict to generate predicted survival measures at specific times for individuals.
>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_pooled_logistic
>>> from delicatessen.utilities import plogit_predict
>>> from delicatessen.data import load_breast_cancer
Here, we illustrate pooled logistic regression with the breast cancer data from the Middlesex Hospital in July 1987. These data can be loaded as follows:
>>> d = pd.DataFrame(load_breast_cancer(), columns=['d', 't', 'statin'])
To start, we estimate the coefficients of a pooled logistic regression where time is modeled using disjoint indicators. See ee_pooled_logistic for further details.
>>> unique_event_times = list(np.unique(d.loc[d['d'] == 1, 't']))
>>> def psi(theta):
...     return ee_pooled_logistic(theta=theta, X=d[['statin', ]], delta=d['d'], t=d['t'])
>>> # Starting values: the statin coefficient, then the time coefficients
>>> inits = [0., ] + [-3., ] + [0., ]*(len(unique_event_times) - 1)
>>> estr = MEstimator(stacked_equations=psi, init=inits)
>>> estr.estimate()
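Disjoint indicators for time can be understood through the person-period representation commonly used to fit pooled logistic regression: each observation contributes one row per unique event time at which it is still at risk, and each row receives an indicator for its time. The sketch below uses pandas on a small hypothetical dataset with the same column names as the example. The helper person_period is hypothetical and is not claimed to reproduce how ee_pooled_logistic computes its estimating equations internally.

```python
import numpy as np
import pandas as pd

def person_period(df, time="t", event="d", covariates=("statin",)):
    """Hypothetical helper: one row per unique event time at which each
    observation is still at risk, plus disjoint indicators for those times."""
    event_times = np.sort(df.loc[df[event] == 1, time].unique())
    rows = []
    for idx, row in df.iterrows():
        for k in event_times[event_times <= row[time]]:
            rows.append({"id": idx,
                         **{c: row[c] for c in covariates},
                         "time": k,
                         # The outcome is 1 only in the row where the event occurs
                         "y": int(row[event] == 1 and k == row[time])})
    long = pd.DataFrame(rows)
    # One indicator column per unique event time
    indicators = pd.get_dummies(long["time"], prefix="time", dtype=float)
    return pd.concat([long, indicators], axis=1), event_times

# Hypothetical data with the same column names as the example
d = pd.DataFrame({"d": [1, 0, 1, 1], "t": [2, 3, 5, 2], "statin": [0, 1, 1, 0]})
long, times = person_period(d)
print(times)  # [2 5]
print(long)
```

With one covariate and one indicator per unique event time, this representation has 1 + len(unique_event_times) coefficients, which matches the length of inits in the example.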
After estimating the parameters, predicted survival measures can be computed. Here, we compute the risk function for all observations at all unique event times.
>>> plogit_predict(theta=estr.theta, t=d['t'], delta=d['d'], X=d[['statin', ]], S=None, measure='risk')
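Because X may differ from the covariates used in estimation, the same predictions support g-computation: set a covariate to a fixed value for every row, predict the risk, and average over rows. The NumPy sketch below shows only this point estimate (no variance) under an assumed disjoint-indicator parametrization with one logit-scale coefficient per time point and no intercept. The helpers risk_matrix and gcomp_risk, the second covariate z, and all coefficient values are hypothetical.

```python
import numpy as np

def expit(x):
    # Inverse logit
    return 1.0 / (1.0 + np.exp(-x))

def risk_matrix(beta, alpha, X):
    """n-by-K risk, 1 - cumulative product of (1 - hazard), under an assumed
    model with one logit-scale coefficient per time point (hypothetical)."""
    lin = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    hazard = expit(lin[:, None] + np.asarray(alpha, dtype=float)[None, :])
    return 1.0 - np.cumprod(1.0 - hazard, axis=1)

def gcomp_risk(beta, alpha, X, column, value):
    """Average predicted risk at each time with X[:, column] set to value."""
    X_star = np.array(X, dtype=float)  # copy, so the observed design is untouched
    X_star[:, column] = value
    return risk_matrix(beta, alpha, X_star).mean(axis=0)

# Hypothetical design: column 0 is statin, column 1 is another covariate z
X = np.array([[0.0, 1.2], [1.0, -0.4], [1.0, 0.3], [0.0, -1.1]])
beta = [0.7, 0.2]           # hypothetical covariate coefficients
alpha = [-3.0, -2.6, -2.9]  # hypothetical time coefficients (K = 3)

risk_1 = gcomp_risk(beta, alpha, X, column=0, value=1.0)
risk_0 = gcomp_risk(beta, alpha, X, column=0, value=0.0)
print(risk_1 - risk_0)  # risk difference at each of the K time points
```

The copy inside gcomp_risk mirrors the point made for the X parameter: the covariates passed for prediction are modified, while the data used to estimate theta are left unchanged.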
Note that the shared arguments between ee_pooled_logistic and plogit_predict (besides X, which can be modified) should match each other. If they do not, unexpected behaviors may occur.
For further details on how to use plogit_predict, see the Applied Examples.
References
Abbott RD. (1985). Logistic regression in survival analysis. American Journal of Epidemiology, 121(3), 465-471.
D’Agostino RB, Lee ML, Belanger AJ, Cupples LA, Anderson K, & Kannel WB. (1990). Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Statistics in Medicine, 9(12), 1501-1515.
Zivich PN, Cole SR, Shook-Sa BE, DeMonte JB, & Edwards JK. (2025). Estimating equations for survival analysis with pooled logistic regression. arXiv:2504.13291