delicatessen.utilities.plogit_predict

plogit_predict(theta, t, delta, X, S=None, times_to_predict=None, measure='survival', unique_times=None)

Compute predicted survival analysis measures from a pooled logistic regression model for a given design matrix and set of times. This function is meant to be used with ee_pooled_logistic to generate predicted survival (or other measures) at designated time points.

Given a specified covariate and time design matrix, the coefficients from a pooled logistic regression model are used to generate conditional probabilities of the event. These are then transformed into the desired survival measure. Predictions can be output for a selected set of times (times_to_predict).
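The transformation from coefficients to a survival measure can be sketched for a single individual. This sketch assumes time is modeled with one coefficient per interval (disjoint indicators), so the conditional probability of the event in interval k is expit(x'β + α_k), and survival is the running product of one minus those probabilities. The coefficient values are made up, and the code only illustrates the standard discrete-time relationship. It is not plogit_predict's internal implementation.

```python
import numpy as np


def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical coefficients (made up for illustration): one covariate
# coefficient (beta) and one time coefficient per interval (alpha),
# i.e., time modeled with disjoint indicators.
beta = np.array([0.5])
alpha = np.array([-3.0, -2.5, -2.8])

x = np.array([1.0])                  # covariate values for a single individual
linear_pred = x @ beta + alpha       # x @ beta is a scalar, added to each alpha_k
hazard = expit(linear_pred)          # conditional probability of the event in each interval
survival = np.cumprod(1.0 - hazard)  # probability of being event-free through each interval
risk = 1.0 - survival                # cumulative probability of the event
```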

Note

Specifications of theta, t, delta, S, and unique_times should match those provided to ee_pooled_logistic.

Parameters
  • theta (ndarray, list, vector) – Estimated parameter vector for the pooled logistic model. Composed of the parameters for the baseline covariates and the time coefficients. These should be the values optimized by ee_pooled_logistic.

  • t (ndarray, list, vector) – 1-dimensional vector of n observed times. These should be the same values provided to ee_pooled_logistic.

  • delta (ndarray, list, vector) –

    1-dimensional vector of n event indicators, where 1 indicates an event and 0 indicates right censoring.

    These should be the same values provided to ee_pooled_logistic.

  • X (ndarray, list, vector) – 2-dimensional array of n observed values for b variables. Covariate values can be modified from those given to ee_pooled_logistic, as is done with g-computation estimators.

  • S (ndarray, list, vector, None, optional) – Optional argument for a parametric functional form specification for time. Default is None, which uses disjoint indicators to model time. Expected to have np.max(t) rows. This should match the specification provided to ee_pooled_logistic.

  • times_to_predict (int, float, ndarray, list, vector, None, optional) – Time(s) to generate predicted values for. Specified times must be within [0, τ]. Default is None, which generates predicted values at each unique event time (if S is None) or at each unit-time interval (if S is not None).

  • measure (str, optional) – Measure to compute. Options include survival ('survival'), density ('density'), risk or the cumulative distribution function ('risk'), hazard ('hazard'), and cumulative hazard ('cumulative_hazard'). Default is 'survival'.

  • unique_times (None, ndarray, list, vector, optional) – Optional argument to compute the disjoint indicators for only a subset of terms. This argument is intended for use with disjoint indicators for time that are stratified by some external variable. This argument is ignored when S is not None. This should match the specification provided to ee_pooled_logistic.
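The two time specifications described by S and unique_times can be illustrated with toy inputs. In this sketch, disjoint indicators give one column per unique event time, while a user-supplied S has np.max(t) rows. The polynomial columns (intercept, linear, quadratic) and the assumption that row k of S corresponds to time k + 1 are illustrative choices only; check ee_pooled_logistic for the exact layout it expects.

```python
import numpy as np

t = np.array([3, 5, 5, 8, 10])      # hypothetical observed times (integer units)
delta = np.array([1, 0, 1, 1, 0])   # hypothetical event indicators

# Disjoint indicators: one column per unique event time.
unique_event_times = np.unique(t[delta == 1])
disjoint = np.eye(len(unique_event_times))  # row k switches on the k-th event time

# Parametric alternative: one row per unit time, 1..max(t), with columns for
# an intercept, a linear term, and a quadratic term in time (hypothetical choice).
time_grid = np.arange(1, np.max(t) + 1)
S = np.column_stack([np.ones_like(time_grid), time_grid, time_grid ** 2])
```

With disjoint indicators, the number of time coefficients grows with the number of unique event times. The parametric S trades that flexibility for a fixed, small number of columns.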

Returns

Returns an n-by-K NumPy array of predictions, where n is the number of rows in the design matrix and K is the number of time points to compute the survival measure at.
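To make the n-by-K shape and the measure options concrete, the sketch below starts from a made-up matrix of discrete-time hazards (one row per individual, one column per time point) and derives the other measures using standard discrete-time identities. Cumulative hazard is shown as the running sum of the discrete hazards. That is one common convention, and -log(survival) is another, so plogit_predict may define it differently. The final lines mimic selecting a subset of columns, in the spirit of times_to_predict.

```python
import numpy as np

# Hypothetical discrete hazards for n = 2 individuals over K = 4 time points.
hazard = np.array([[0.10, 0.20, 0.10, 0.30],
                   [0.05, 0.05, 0.10, 0.10]])

survival = np.cumprod(1.0 - hazard, axis=1)
risk = 1.0 - survival
# Survival just before each time point (equal to 1 at the first time point).
survival_lag = np.hstack([np.ones((hazard.shape[0], 1)), survival[:, :-1]])
density = hazard * survival_lag                  # probability the event occurs at time k
cumulative_hazard = np.cumsum(hazard, axis=1)    # one discrete-time convention

# Keep only selected time points (columns), analogous to times_to_predict.
time_points = np.array([1, 2, 3, 4])
keep = np.isin(time_points, [2, 4])
survival_subset = survival[:, keep]              # shape (n, 2)
```

A quick consistency check is that the running sum of the density equals the risk at each time point.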

Return type

array

Examples

The following illustrates how to use plogit_predict to generate predicted survival measures at specific times for individuals.

>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_plogit
>>> from delicatessen.utilities import plogit_predict
>>> from delicatessen.data import load_breast_cancer

Here, we illustrate pooled logistic regression with breast cancer data from the Middlesex Hospital in July 1987. These data can be loaded as follows:

>>> d = pd.DataFrame(load_breast_cancer(), columns=['d', 't', 'statin'])

To start, we estimate the coefficients of a pooled logistic regression where time is modeled using disjoint indicators. See ee_plogit for further details.

>>> unique_event_times = list(np.unique(d.loc[d['d'] == 1, 't']))
>>> def psi(theta):
...     return ee_plogit(theta=theta, X=d[['statin', ]], delta=d['d'], t=d['t'])
>>> inits = [0., ] + [-3., ] + [0., ]*(len(unique_event_times) - 1)
>>> estr = MEstimator(stacked_equations=psi, init=inits)
>>> estr.estimate()
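The length of inits in the example is one covariate coefficient plus one coefficient per unique event time. The sketch below generalizes that construction to several covariates, assuming (as the example's inits suggest) that covariate coefficients come first and the time coefficients follow. The toy t, delta, and X are made up and are not the breast cancer data.

```python
import numpy as np

# Toy inputs (not the breast cancer data): 5 observations, 2 covariates.
t = np.array([2, 4, 4, 6, 9])
delta = np.array([1, 0, 1, 1, 0])
X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.], [1., 1.]])

unique_event_times = np.unique(t[delta == 1])
n_covariates = X.shape[1]
n_time = len(unique_event_times)

# Covariate coefficients first, then one coefficient per unique event time,
# following the same pattern as inits in the example.
inits = [0., ] * n_covariates + [-3., ] + [0., ] * (n_time - 1)
```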

After estimating the parameters, predicted survival measures can be computed. Here, we compute the risk function for all observations at all unique event times.

>>> plogit_predict(theta=estr.theta, t=d['t'], delta=d['d'], X=d[['statin', ]], S=None, measure='risk')

Note that the arguments shared by ee_plogit and plogit_predict (other than X, which can be modified) should match. If they do not, unexpected behavior may occur.
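Modifying X is what enables g-computation: set a covariate to a chosen value for everyone, predict, and average the rows of the n-by-K result. With delicatessen, the modified X would be passed to plogit_predict. The self-contained sketch below uses a hypothetical helper, predict_risk (not part of delicatessen), and made-up coefficients to show only the averaging logic. It yields point estimates only.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_risk(beta, alpha, X):
    """Hypothetical helper (not part of delicatessen): n-by-K risk matrix from
    covariate coefficients beta and disjoint time coefficients alpha."""
    hazard = expit((X @ beta)[:, None] + alpha[None, :])
    return 1.0 - np.cumprod(1.0 - hazard, axis=1)


beta = np.array([0.7, 0.02])            # hypothetical coefficients: treatment, age
alpha = np.array([-4.0, -3.8, -3.6])    # hypothetical time coefficients
X = np.array([[0., 50.], [1., 62.], [1., 45.], [0., 70.]])  # observed treatment, age

# g-computation: copy X, set treatment to 1 (or 0) for everyone, predict, average.
X1 = X.copy()
X1[:, 0] = 1.0
X0 = X.copy()
X0[:, 0] = 0.0
risk_all_treated = predict_risk(beta, alpha, X1).mean(axis=0)
risk_none_treated = predict_risk(beta, alpha, X0).mean(axis=0)
risk_difference = risk_all_treated - risk_none_treated  # one value per time point
```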

For further details on how to use plogit_predict, see the Applied Examples.

References

Abbott RD. (1985). Logistic regression in survival analysis. American Journal of Epidemiology, 121(3), 465-471.

D’Agostino RB, Lee ML, Belanger AJ, Cupples LA, Anderson K, & Kannel WB. (1990). Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Statistics in Medicine, 9(12), 1501-1515.

Zivich PN, Cole SR, Shook-Sa BE, DeMonte JB, & Edwards JK. (2025). Estimating equations for survival analysis with pooled logistic regression. arXiv:2504.13291