delicatessen.estimating_equations.measurement.ee_rogan_gladen_extended

ee_rogan_gladen_extended(theta, y, y_star, r, X, weights=None)

Estimating equations for the extended Rogan-Gladen correction for mismeasured binary outcomes. This estimator uses external validation data to estimate the sensitivity and specificity conditional on covariates, and then uses those estimates to correct the estimated proportion. The general form of the estimating equations is

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} R_i \times \left\{ \frac{Y^*_i + m(X_i; \beta) - 1}{m(X_i; \alpha) + m(X_i; \beta) - 1} - \mu \right\} \\ (1-R_i) Y_i \left\{ Y^*_i - m(X_i; \alpha) \right\} X_i^T \\ (1-R_i) (1 - Y_i) \left\{ (1 - Y^*_i) - m(X_i; \beta) \right\} X_i^T \\ \end{bmatrix} = 0\end{split}\]

where \(Y\) is the true value of the outcome, \(Y^*\) is the mismeasured value of the outcome, and \(R\) indicates whether an observation belongs to the main data (\(R=1\)) or the external validation data (\(R=0\)). The first estimating equation is for the corrected proportion \(\mu\), the second is for the sensitivity model with parameters \(\alpha\), and the third is for the specificity model with parameters \(\beta\).

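If \(X\) has \(p\) columns, then theta is a 1-by-(1+2\(p\)) array: the corrected proportion, followed by the \(p\) parameters of the sensitivity model and the \(p\) parameters of the specificity model. Note that the same design matrix is used for both the sensitivity and specificity models.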
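The stacked estimating functions can be sketched directly in NumPy to make the layout of theta and of the output concrete. This is an illustrative re-implementation, not the library's code: the function name `rogan_gladen_extended_sketch` is hypothetical, and it assumes that \(m\) is the inverse logit of a linear predictor, that \(\alpha\) indexes the sensitivity model and \(\beta\) the specificity model, and that missing gold-standard outcomes are coded as NaN. Weights are omitted.

```python
import numpy as np


def inverse_logit(z):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def rogan_gladen_extended_sketch(theta, y, y_star, r, X):
    """Hypothetical sketch of the stacked estimating functions (no weights)."""
    theta = np.asarray(theta, dtype=float)
    y_star = np.asarray(y_star, dtype=float)
    r = np.asarray(r, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Y is unobserved in the main data (r == 1); replace NaN by 0 so that
    # the (1 - r) factor zeroes those contributions instead of yielding NaN.
    y = np.where(np.isnan(y), 0.0, y)

    p = X.shape[1]
    mu = theta[0]                      # corrected proportion
    alpha = theta[1:1 + p]             # sensitivity model (logit scale)
    beta = theta[1 + p:1 + 2 * p]      # specificity model (logit scale)

    se = inverse_logit(X @ alpha)      # sensitivity for each observation
    sp = inverse_logit(X @ beta)       # specificity for each observation

    ee_mu = r * ((y_star + sp - 1) / (se + sp - 1) - mu)
    ee_se = ((1 - r) * y * (y_star - se)) * X.T              # p-by-n
    ee_sp = ((1 - r) * (1 - y) * ((1 - y_star) - sp)) * X.T  # p-by-n
    return np.vstack([ee_mu, ee_se, ee_sp])                  # (1+2p)-by-n


# Intercept-only data built from the counts used in the Examples section
counts = np.array([270, 680, 71, 18, 38, 203])
y_star = np.repeat([0, 1, 0, 1, 0, 1], counts)
y = np.repeat([np.nan, np.nan, 0, 0, 1, 1], counts)
r = np.repeat([1, 1, 0, 0, 0, 0], counts)
X = np.ones((counts.sum(), 1))

ef = rogan_gladen_extended_sketch([0.5, 1.0, 1.0], y, y_star, r, X)
print(ef.shape)
```

With a single intercept column (\(p=1\)), theta has three elements and the returned array has three rows, one per estimating equation, and one column per observation.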

Note

The Rogan-Gladen estimator may return corrected proportions outside of \([0,1]\) when the sum of sensitivity and specificity is less than or equal to one, that is, when \(m(X_i; \alpha) + m(X_i; \beta) \le 1\).

Parameters

theta (ndarray, list, vector) – Theta consists of 1+2p values, where p is the number of columns in X.
y (ndarray, list, vector) – 1-dimensional vector of n observed values. These are the gold-standard \(Y\) measurements in the external sample. All values should be either 0 or 1, and be non-missing among those with \(R=0\).
y_star (ndarray, list, vector) – 1-dimensional vector of n observed values. These are the mismeasured \(Y\) values. All values should be either 0 or 1, and be non-missing among all observations.
r (ndarray, list, vector) – 1-dimensional vector of n indicators of the data source. The indicator should be 1 for observations in the main data and 0 for observations in the external validation data.
X (ndarray, list, vector) – 2-dimensional array (n-by-p) containing the design matrix for the sensitivity and specificity models.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.

Returns

Returns a (1+2p)-by-n NumPy array evaluated for the input theta

Return type

array

Examples

Construction of the estimating equations with ee_rogan_gladen_extended should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_rogan_gladen_extended

Replicating the example from Cole et al. (2023).

>>> d = pd.DataFrame()
>>> d['Y_star'] = [0, 1] + [0, 1, 0, 1]
>>> d['Y'] = [np.nan, np.nan] + [0, 0, 1, 1]
>>> d['S'] = [1, 1] + [0, 0, 0, 0]
>>> d['n'] = [270, 680] + [71, 18, 38, 203]
>>> d = pd.DataFrame(np.repeat(d.values, d['n'], axis=0), columns=d.columns)
>>> d['C'] = 1

Applying the Rogan-Gladen correction to this example

>>> def psi(theta):
...     return ee_rogan_gladen_extended(theta=theta, y=d['Y'],
...                                     y_star=d['Y_star'],
...                                     X=d[['C', ]], r=d['S'])

Notice that y corresponds to the gold-standard outcomes (only available where r is 0), y_star corresponds to the mismeasured outcomes (available where r is 1 and where r is 0), and r corresponds to the indicator for the main data source (column S here). Now we can call the M-Estimator.

>>> estr = MEstimator(psi, init=[0.5, 1., 1.])
>>> estr.estimate(solver='lm')

Inspecting the parameter estimates, variance, and 95% confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

Note

The sensitivity and specificity parameters in ee_rogan_gladen_extended are on the logit scale, unlike ee_rogan_gladen, which returns the sensitivity and specificity directly.
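To read the logit-scale parameters as probabilities, apply the inverse logit. The sketch below is only illustrative: `theta_hat` is a hand-built stand-in for `estr.theta`, filled with the closed-form Rogan-Gladen values computed from the counts in this example, which is what an intercept-only fit would be expected to reproduce. The variable names are assumptions, not part of the library.

```python
import numpy as np


def inverse_logit(z):
    """Map a logit-scale value back to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


# Validation counts from the example (external data, where Y is observed)
n_y1_pos, n_y1_neg = 203, 38   # Y=1 with Y*=1, Y=1 with Y*=0
n_y0_neg, n_y0_pos = 71, 18    # Y=0 with Y*=0, Y=0 with Y*=1
# Main data counts (only Y* is observed)
n_main_pos, n_main_neg = 680, 270

# Closed-form sensitivity, specificity, and naive proportion
sens = n_y1_pos / (n_y1_pos + n_y1_neg)
spec = n_y0_neg / (n_y0_neg + n_y0_pos)
p_star = n_main_pos / (n_main_pos + n_main_neg)

# Closed-form Rogan-Gladen corrected proportion
mu = (p_star + spec - 1) / (sens + spec - 1)

# Stand-in for estr.theta: [mu, logit(sensitivity), logit(specificity)]
theta_hat = np.array([mu,
                      np.log(sens / (1 - sens)),
                      np.log(spec / (1 - spec))])

# Back-transform the logit-scale parameters to probabilities
sens_back = inverse_logit(theta_hat[1])
spec_back = inverse_logit(theta_hat[2])
print(theta_hat[0], sens_back, spec_back)
```

With covariates in X, the back-transformation would instead be applied to the linear predictor for a chosen covariate pattern rather than to a single coefficient.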

The corrected proportion is

>>> estr.theta[0]

Inverse probability weights can be used through the weights argument. See the applied examples for a demonstration.

References

Cole SR, Edwards JK, Breskin A, Rosin S, Zivich PN, Shook-Sa BE, & Hudgens MG. (2023). Illustration of 2 Fusion Designs and Estimators. American Journal of Epidemiology, 192(3), 467-474.

Rogan WJ & Gladen B. (1978). Estimating prevalence from the results of a screening test. American Journal of Epidemiology, 107(1), 71-76.

Ross RK, Cole SR, Edwards JK, Zivich PN, Westreich D, Daniels JL, Price JT & Stringer JSA. (2024). Leveraging External Validation Data: The Challenges of Transporting Measurement Error Parameters. Epidemiology, 35(2), 196-207.