delicatessen.estimating_equations.measurement.ee_rogan_gladen

ee_rogan_gladen(theta, y, y_star, r, weights=None)

Estimating equation for the Rogan-Gladen correction for mismeasured binary outcomes. This estimator uses external data to estimate the sensitivity and specificity, and then uses those external estimates to correct the estimated proportion. The general form of the estimating equations is

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} \mu \times \left\{ \alpha + \beta - 1 \right\} - \left\{ \mu^* + \alpha - 1 \right\} \\ R_i (Y_i^* - \mu^*) \\ (1-R_i) Y_i \left\{ Y^*_i - \beta \right\} \\ (1-R_i) (1-Y_i) \left\{ (1 - Y^*_i) - \alpha \right\} \\ \end{bmatrix} = 0\end{split}\]

where \(Y\) is the true value of the outcome, \(Y^*\) is the mismeasured value of the outcome, \(R\) is the indicator for the main study data, \(\mu\) is the corrected mean, \(\mu^*\) is the mismeasured mean in the main study data, \(\beta\) is the sensitivity, and \(\alpha\) is the specificity. The first estimating equation is the corrected proportion, the second is the naive proportion, the third is for sensitivity, and the fourth for specificity.

Here, theta is a 1-by-4 array.
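To make the stacking concrete, the following is a minimal NumPy sketch of the four estimating functions, assuming the parameter order (corrected mean, naive mean, sensitivity, specificity) and the standard Rogan-Gladen identity \(\mu = (\mu^* + \alpha - 1) / (\alpha + \beta - 1)\). The function name `rogan_gladen_ee_sketch` and the toy data are hypothetical, and this is not the library's implementation.

```python
import numpy as np


def rogan_gladen_ee_sketch(theta, y, y_star, r):
    """Illustrative 4-by-n stack of estimating functions (hypothetical helper)."""
    # Assumed order: corrected mean, naive mean, sensitivity, specificity
    mu, mu_star, beta, alpha = theta
    y_star = np.asarray(y_star, dtype=float)
    r = np.asarray(r, dtype=float)
    # Y is unobserved in the main study (R=1); fill with 0 so NaN does not propagate
    y = np.where(r == 1, 0.0, np.asarray(y, dtype=float))

    ee_corrected = np.ones(r.shape[0]) * (mu * (alpha + beta - 1) - (mu_star + alpha - 1))
    ee_naive = r * (y_star - mu_star)                      # main study only
    ee_sens = (1 - r) * y * (y_star - beta)                # validation data with Y=1
    ee_spec = (1 - r) * (1 - y) * ((1 - y_star) - alpha)   # validation data with Y=0
    return np.vstack([ee_corrected, ee_naive, ee_sens, ee_spec])


# Toy data: 4 main-study rows (Y missing), then 10 validation rows
r = [1, 1, 1, 1] + [0] * 10
y_star = [1, 1, 0, 1] + [1, 1, 1, 1, 0] + [0, 0, 0, 0, 1]
y = [np.nan] * 4 + [1, 1, 1, 1, 1] + [0, 0, 0, 0, 0]

# Sample proportions and the closed-form corrected mean
mu_star, beta, alpha = 3 / 4, 4 / 5, 4 / 5
mu = (mu_star + alpha - 1) / (alpha + beta - 1)

psi_values = rogan_gladen_ee_sketch([mu, mu_star, beta, alpha], y, y_star, r)
print(psi_values.shape)        # one row per parameter, one column per observation
print(psi_values.sum(axis=1))  # each row sums to (numerically) zero at the solution
```

Each row sums to zero when theta is set to the sample proportions and the closed-form correction, which is the condition the root-finding step searches for.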

Note

The Rogan-Gladen estimator may provide corrected proportions outside of \([0,1]\) when \(\alpha + \beta \le 1\).
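As a small illustration of this note, the sketch below evaluates the closed-form Rogan-Gladen correction for two sets of made-up inputs. The helper `rogan_gladen` and all numeric values are hypothetical and chosen only to show the behavior.

```python
def rogan_gladen(mu_star, sensitivity, specificity):
    """Closed-form Rogan-Gladen correction (hypothetical helper)."""
    return (mu_star + specificity - 1) / (sensitivity + specificity - 1)


# Sensitivity + specificity > 1: this estimate stays inside [0, 1]
in_range = rogan_gladen(mu_star=0.50, sensitivity=0.90, specificity=0.80)

# Sensitivity + specificity < 1: this estimate falls below 0
out_of_range = rogan_gladen(mu_star=0.70, sensitivity=0.40, specificity=0.50)

print(in_range, out_of_range)
```

Because the naive proportion and the accuracy parameters come from different samples, an out-of-range value can also arise when \(\alpha + \beta > 1\), namely whenever \(\mu^* < 1 - \alpha\) or \(\mu^* > \beta\).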

Parameters
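  • theta (ndarray, list, vector) – Theta consists of 4 values, in the order of the estimating equations: the corrected proportion, the naive proportion, the sensitivity, and the specificity.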

  • y (ndarray, list, vector) – 1-dimensional vector of n observed values. These are the gold-standard \(Y\) measurements in the external sample. All values should be either 0 or 1, and be non-missing among those with \(R=0\).

  • y_star (ndarray, list, vector) – 1-dimensional vector of n observed values. These are the mismeasured \(Y\) values. All values should be either 0 or 1, and be non-missing among all observations.

  • r (ndarray, list, vector) – 1-dimensional vector of n indicators for the data source. The indicator should be 1 for observations in the main study data and 0 for observations in the external validation data.

  • weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.

Returns

Returns a 4-by-n NumPy array of the estimating functions evaluated at the input theta.

Return type

array

Examples

Construction of an estimating equation with ee_rogan_gladen should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_rogan_gladen

Replicating the published example from Cole et al. (2023).

>>> d = pd.DataFrame()
>>> d['Y_star'] = [0, 1] + [0, 1, 0, 1]
>>> d['Y'] = [np.nan, np.nan] + [0, 0, 1, 1]
>>> d['S'] = [1, 1] + [0, 0, 0, 0]
>>> d['n'] = [270, 680] + [71, 18, 38, 203]
>>> d = pd.DataFrame(np.repeat(d.values, d['n'], axis=0), columns=d.columns)

Applying the Rogan-Gladen correction to this example

>>> def psi(theta):
...     return ee_rogan_gladen(theta=theta, y=d['Y'],
...                            y_star=d['Y_star'], r=d['S'])

Notice that y corresponds to the gold-standard outcomes (only available where R=0), y_star corresponds to the mismeasured outcomes (available for both R=1 and R=0), and r corresponds to the indicator for the main data source. Now we can call the M-estimator.

>>> estr = MEstimator(psi, init=[0.5, 0.5, .75, .75])
>>> estr.estimate(solver='lm')

Inspecting the parameter estimates, variance, and 95% confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

The corrected proportion is

>>> estr.theta[0]
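
As a cross-check on the point estimate, the corrected proportion can also be computed by hand from the cell counts used to build the data, assuming the standard closed form \(\mu = (\mu^* + \alpha - 1) / (\alpha + \beta - 1)\). The variable names below are illustrative only. This sketch reproduces the point estimate alone; the variance still requires the M-estimator.

```python
# Cell counts copied from the data construction above
n_main_pos, n_main_neg = 680, 270    # main study: Y*=1 and Y*=0
n_true_pos, n_false_neg = 203, 38    # validation data with Y=1: Y*=1 and Y*=0
n_true_neg, n_false_pos = 71, 18     # validation data with Y=0: Y*=0 and Y*=1

mu_star = n_main_pos / (n_main_pos + n_main_neg)        # naive proportion
sensitivity = n_true_pos / (n_true_pos + n_false_neg)   # beta
specificity = n_true_neg / (n_true_neg + n_false_pos)   # alpha

# Rogan-Gladen corrected proportion, approximately 0.80 for these counts
mu = (mu_star + specificity - 1) / (sensitivity + specificity - 1)
print(mu_star, sensitivity, specificity, mu)
```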

Inverse probability weights can be used through the weights argument. See the applied examples for a demonstration.

References

Cole SR, Edwards JK, Breskin A, Rosin S, Zivich PN, Shook-Sa BE, & Hudgens MG. (2023). Illustration of 2 Fusion Designs and Estimators. American Journal of Epidemiology, 192(3), 467-474.

Rogan WJ & Gladen B. (1978). Estimating prevalence from the results of a screening test. American Journal of Epidemiology, 107(1), 71-76.

Ross RK, Zivich PN, Stringer JSA, & Cole SR. (2024). M-estimation for common epidemiological measures: introduction and applied examples. International Journal of Epidemiology, 53(2), dyae030.