Coffman et al. (2021): Causal Mediation Analysis

For the twangMediation package in R, Coffman et al. provided a vignette on causal mediation analysis in the context of health disparity research. Specifically, the vignette examined possible mediating pathways for substance use disparities among sexual minority women. Data for this analysis comes from the National Survey of Drug Use and Health (NSDUH) and is available from the R package.

Drawing from this vignette, we show how delicatessen can be used to conduct causal mediation analyses. Note that the provided data consists of more than 40,000 observations, so fitting these models might take a second.

Setup

[1]:
import numpy as np
import scipy as sp
import pandas as pd
from formulaic import model_matrix

import delicatessen as deli
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression, ee_ipw
from delicatessen.utilities import inverse_logit, spline, regression_predictions

print("Versions")
print("NumPy:        ", np.__version__)
print("SciPy:        ", sp.__version__)
print("pandas:       ", pd.__version__)
print("Delicatessen: ", deli.__version__)
Versions
NumPy:         2.3.5
SciPy:         1.16.3
pandas:        2.3.3
Delicatessen:  4.1
[2]:
d = pd.read_csv("data/nsduh.csv")

In the context of the vignette, interest was in the disparities of adult smoking status (cigmon) by sexual minority (e.g., lesbian, gay, bisexual; lgb_flag) as mediated by early smoking initiation (prior to age 15; cig15). The adjustment set is age (age), race (race), education (educ), income (income), and employment (employ).

Average Causal Effect

To start, we will estimate the average causal effect of sexual minority on adult smoking status as done in the vignette. We will do this using an inverse probability of treatment weighting estimator. The weights are defined as

\[IPTW = \frac{A_i}{\Pr(A_i = 1 \mid W_i)} + \frac{1 - A_i}{1 - \Pr(A_i = 1 \mid W_i)}\]

and then can be used to compute the weighted mean. Here, we use the Hajek estimator due to its improved finite-sample performance relative to the Horvitz-Thompson estimator

[3]:
W_matrix = np.asarray(model_matrix("C(age) + C(race) + C(educ) + C(income) + C(employ)", d))
a = np.asarray(d['lgb_flag'])
y = np.asarray(d['cigmon'])
[4]:
def psi_ace(theta):
    ace, mu1, mu0 = theta[:3]
    alpha = theta[3:]

    # Propensity score model
    ee_ps = ee_regression(theta=alpha, X=W_matrix, y=a, model='logistic')
    ps = inverse_logit(np.dot(W_matrix, alpha))
    iptw = a/ps + (1-a)/(1-ps)

    # Hajek estimator for the causal risks
    ee_r1 = iptw * a * (y - mu1)
    ee_r0 = iptw * (1-a) * (y - mu0)
    ee_ace = np.ones(y.shape[0]) * (mu1 - mu0 - ace)

    return np.vstack([ee_ace, ee_r1, ee_r0, ee_ps])
[5]:
init_vals = [0., 0.5, 0.5, ] + [0., ]*W_matrix.shape[1]
estr = MEstimator(psi_ace, init=init_vals)
estr.estimate()
ci = estr.confidence_intervals()

print("ACE:   ", estr.theta[0])
print("95% CI:", ci[0, :])
ACE:    0.12460267719839338
95% CI: [0.10793198 0.14127338]

These results suggest a disparity by sexual minority status in adult smoking behaviors that is not explained by the adjustment set. These results are similar to twangMediation but do differ as a result of how the propensity scores are being estimated. In the vignette, XGBoost (an ensemble of decision trees) is used. Here, a simple parameteric model is used instead. It would be better practice to specify a more flexible model for the propensity score (e.g., interactions).

While it may seem like twangMediation has an advantage here, the corresponding statisitcal inference is invalid as XGBoost is not generally valid without the use of cross-fitting and doubly robust estimators. See Zivich & Breskin (2021) for details on why the twangMediation approach is not statistically valid. The approach based on parametric models used here with delicatessen is statistically valid.

Controlled Direct Effect

The controlled direct effect is one parameter we can consider in mediation analysis. Letting \(Y(a,m)\) denote the potential outcome under action \(a\) and mediator \(m\), the controlled direct effect is

\[E[Y(a,m)] - E[Y(a',m)]\]

so we are considering the effect of changing \(a\) when \(m\) is held at a fixed value.

For the controlled direct effect, we need to estimate a pair of weights (Coffman & Zhong 2012). First, we estimate the inverse probability of treatment weights as before. The second weight needed is the inverse probability of mediator weights, defined as the following

\[IPMW = \frac{M_i}{\Pr(M_i = 1 \mid A_i, W_i)} + \frac{1 - M_i}{1 - \Pr(M_i = 1 \mid A_i, W_i)}\]

which we can jointly estimate all parameters using a stacked extension of the previous estimating functions with an additional logistic regression model. Here, the same adjustment set is used for both nuisance models. However, it can be the case that there are additional confounding variables for mediators. These can be dealt with by extending the conditional probability for \(IPMW\).

For estimation, we again use the Hajek estimator with the product of the two weights

[6]:
W_matrix = np.asarray(model_matrix("C(age) + C(race) + C(educ) + C(income) + C(employ)", d))
X_matrix = np.asarray(model_matrix("lgb_flag + C(age) + C(race) + C(educ) + C(income) + C(employ)", d))
a = np.asarray(d['lgb_flag'])
m = np.asarray(d['cig15'])
y = np.asarray(d['cigmon'])
[7]:
def psi_cde(theta):
    ndim_W = W_matrix.shape[1]
    ace1, ace0, mu11, mu01, mu10, mu00 = theta[:6]
    alpha = theta[6: 6+ndim_W]
    gamma = theta[6+ndim_W: ]

    # Propensity score model for A
    ee_ps = ee_regression(theta=alpha, X=W_matrix, y=a, model='logistic')
    ps = inverse_logit(np.dot(W_matrix, alpha))
    iptw = a/ps + (1-a)/(1-ps)

    # Mediation score model for M
    ee_ms = ee_regression(theta=gamma, X=X_matrix, y=m, model='logistic')
    ms = inverse_logit(np.dot(X_matrix, gamma))
    ipmw = m/ms + (1-m)/(1-ms)

    # Hajek estimator for the causal risks
    ipw = iptw * ipmw
    ee_r11 = ipw * a * m * (y - mu11)
    ee_r01 = ipw * (1-a) * m * (y - mu01)
    ee_r10= ipw * a * (1-m) * (y - mu10)
    ee_r00 = ipw * (1-a) * (1-m) * (y - mu00)
    ee_ace1 = np.ones(y.shape[0]) * (mu11 - mu01 - ace1)
    ee_ace0 = np.ones(y.shape[0]) * (mu10 - mu00 - ace0)

    return np.vstack([ee_ace1, ee_ace0, ee_r11, ee_r01, ee_r10, ee_r00, ee_ps, ee_ms])
[8]:
init_vals = ([0., 0., 0.5, 0.5, 0.5, 0.5]
             + [0., ]*W_matrix.shape[1]
             + [0., ]*X_matrix.shape[1])
estr = MEstimator(psi_cde, init=init_vals)
estr.estimate()
ci = estr.confidence_intervals()

print("CDE(1):", estr.theta[0])
print("95% CI:", ci[0, :])
print("CDE(0):", estr.theta[1])
print("95% CI:", ci[1, :])
CDE(1): 0.10231467457422143
95% CI: [0.06129448 0.14333486]
CDE(0): 0.09821972928863482
95% CI: [0.07878279 0.11765667]

Note that the twangMediation documentation does not list the controlled direct effects, so no comparison can be made here. However, these numbers do happen to be similar to the pure direct effect estimates, which we will examine in the next section.

As we have estimated a pair of parameters (two controlled direct effects), confidence bands are more appropriate to report here (since they provide simultaneous coverage for both paramters). sup-t confidence bands can be easily computed using delicatessen with the following functionality.

[9]:
cb = estr.confidence_bands(subset=[0, 1], seed=4281014)
print("95% CB - CDE(1):", cb[0, :])
print("95% CB - CDE(0):", cb[1, :])
95% CB - CDE(1): [0.05552766 0.14910169]
95% CB - CDE(0): [0.07605025 0.12038921]

As shown here, these are slightly wider than the confidence intervals. This is to be expected, as we are interested in multiple parameters but still would like to have simultaneous coverage. However, the sup-t correction is not overly conservative like the Bonferroni correction.

Pure Direct Effect

The next parameter we could be interested in is the pure direct effect (also known as the natural direct effect). Using the potential outcomes from before, the pure direct effect is

\[E[Y(a, M(a'))] - E[Y(a', M(a'))]\]

where \(M(a)\) is the potential mediator under \(a\). Note that there is the seemingly odd combination of \(Y\) under \(a\) but the mediator is the one under \(a'\). Such a pattern does not occur in nature. Identification of the pure direct effect thus relies on a ‘cross-world’ assumption for identification. This assumption is what makes these effect ‘unnatural’ (to me).

For the right quantity, note that \(E[Y(a')]\). Thus, that part can be estimated using the IPW estimator for the never-act part of the average causal effect estimator. For the left quantity, we use ‘strategy 3’ of Tchetgen Tchetgen & Shpitser (2012). The described IPW estimator is

\[n^{-1} \sum_{i=1}^{n} Y_i \frac{I(A_i = 1)}{\Pr(A=1 \mid W_i)} \times \frac{\Pr(M_i = m \mid A = 0, W_i)}{\Pr(M_i = m \mid A_i, W_i)}\]

where the last probability fractions are the probability of the observed value of \(M\) for that individual.

[10]:
W_matrix = np.asarray(model_matrix("C(age) + C(race) + C(educ) + C(income) + C(employ)", d))
X_matrix = np.asarray(model_matrix("lgb_flag + C(age) + C(race) + C(educ) + C(income) + C(employ)", d))
d0 = d.copy()
d0['lgb_flag'] = 0
X0_matrix = np.asarray(model_matrix("lgb_flag + C(age) + C(race) + C(educ) + C(income) + C(employ)", d0))
a = np.asarray(d['lgb_flag'])
m = np.asarray(d['cig15'])
y = np.asarray(d['cigmon'])
[11]:
def psi_pde(theta):
    ndim_W = W_matrix.shape[1]
    pde, mu1m, mu0 = theta[:3]
    alpha = theta[3: 3+ndim_W]
    gamma = theta[3+ndim_W: ]

    # Propensity score model for A
    ee_ps = ee_regression(theta=alpha, X=W_matrix, y=a, model='logistic')
    ps = inverse_logit(np.dot(W_matrix, alpha))
    iptw = a/ps + (1-a)/(1-ps)

    # Mediation score model for M
    ee_ms = ee_regression(theta=gamma, X=X_matrix, y=m, model='logistic')
    ms = inverse_logit(np.dot(X_matrix, gamma))
    m0s = inverse_logit(np.dot(X0_matrix, gamma))
    ipmw = m*(m0s / ms) + (1-m)*(1-m0s)/(1-ms)

    # Hajek estimator for the causal risks
    ee_r1m = iptw * a * ipmw * (y - mu1m)
    ee_r0 = iptw * (1-a) * (y - mu0)
    ee_pde = np.ones(y.shape[0]) * (mu1m - mu0 - pde)

    return np.vstack([ee_pde, ee_r1m, ee_r0, ee_ps, ee_ms])
[12]:
init_vals = ([0., 0.5, 0.5]
             + [0., ]*W_matrix.shape[1]
             + [0., ]*X_matrix.shape[1])
estr = MEstimator(psi_pde, init=init_vals)
estr.estimate()
ci = estr.confidence_intervals()

print("PDE:   ", estr.theta[0])
print("95% CI:", ci[0, :])
PDE:    0.09552324822589883
95% CI: [0.07882165 0.11222485]

These results are fairly similar to those provided in twangMediation (but the results here are not stratified and we are using a different nuisance model).

Pure Indirect Effect

The pure (natural) indirect effect can be defined as

\[E[Y(a, M(a))] - E[Y(a, M(a'))]\]

which again involves a cross-world assumption. Note the second part here is the same as the left side of the pure direct effect. This shows how to decompose everything. The following code implements the estimator for the pure indirect effect

[13]:
def psi_pie(theta):
    ndim_W = W_matrix.shape[1]
    pie, mu1, mu1m = theta[:3]
    alpha = theta[3: 3+ndim_W]
    gamma = theta[3+ndim_W: ]

    # Propensity score model for A
    ee_ps = ee_regression(theta=alpha, X=W_matrix, y=a, model='logistic')
    ps = inverse_logit(np.dot(W_matrix, alpha))
    iptw = a/ps + (1-a)/(1-ps)

    # Mediation score model for M
    ee_ms = ee_regression(theta=gamma, X=X_matrix, y=m, model='logistic')
    ms = inverse_logit(np.dot(X_matrix, gamma))
    m0s = inverse_logit(np.dot(X0_matrix, gamma))
    ipmw = m*(m0s/ms) + (1-m)*(1-m0s)/(1-ms)

    # Hajek estimator for the causal risks
    ee_r1 = iptw * a * (y - mu1)
    ee_r1m = iptw * a * ipmw * (y - mu1m)
    ee_pie = np.ones(y.shape[0]) * (mu1 - mu1m - pie)

    return np.vstack([ee_pie, ee_r1, ee_r1m, ee_ps, ee_ms])
[14]:
init_vals = ([0., 0.5, 0.5]
             + [0., ]*W_matrix.shape[1]
             + [0., ]*X_matrix.shape[1])
estr = MEstimator(psi_pie, init=init_vals)
estr.estimate()
ci = estr.confidence_intervals()

print("PIE:   ", estr.theta[0])
print("95% CI:", ci[0, :])
PIE:    0.029079428972494544
95% CI: [0.02309128 0.03506758]

That provides the pure indirect effect estimate. This is again similar to twangMediation and if we add the pure effects together, we get back to the average causal effect.

Pure Effects Altogether

As a final step, we now package the pure direct and indirect effect estimators together into a single function. As with the controlled direct effect, interest is in a pair of parameters, so confidence bands are more appropriate to report here

[15]:
def psi_full(theta):
    ndim_W = W_matrix.shape[1]
    pde, pie, mu1, mu0, mu1m = theta[:5]
    alpha = theta[5: 5+ndim_W]
    gamma = theta[5+ndim_W: ]

    # Propensity score model for A
    ee_ps = ee_regression(theta=alpha, X=W_matrix, y=a, model='logistic')
    ps = inverse_logit(np.dot(W_matrix, alpha))
    iptw = a/ps + (1-a)/(1-ps)

    # Mediation score model for M
    ee_ms = ee_regression(theta=gamma, X=X_matrix, y=m, model='logistic')
    ms = inverse_logit(np.dot(X_matrix, gamma))
    m0s = inverse_logit(np.dot(X0_matrix, gamma))
    ipmw = m*(m0s/ms) + (1-m)*(1-m0s)/(1-ms)

    # Hajek estimator for the causal risks
    ee_r1 = iptw * a * (y - mu1)
    ee_r0 = iptw * (1-a) * (y - mu0)
    ee_r1m = iptw * a * ipmw * (y - mu1m)
    ee_pde = np.ones(y.shape[0]) * (mu1m - mu0 - pde)
    ee_pie = np.ones(y.shape[0]) * (mu1 - mu1m - pie)

    return np.vstack([ee_pde, ee_pie, ee_r1, ee_r0, ee_r1m, ee_ps, ee_ms])
[16]:
init_vals = ([0., 0., 0.5, 0.5, 0.5]
             + [0., ]*W_matrix.shape[1]
             + [0., ]*X_matrix.shape[1])
estr = MEstimator(psi_full, init=init_vals)
estr.estimate()
cb = estr.confidence_bands(subset=[0, 1], seed=921453)

print("PDE:   ", estr.theta[0])
print("95% CB:", ci[0, :])
print("PIE:   ", estr.theta[1])
print("95% CB:", ci[1, :])
PDE:    0.09552324822589882
95% CB: [0.02309128 0.03506758]
PIE:    0.029079428972494527
95% CB: [0.30311362 0.33564762]

This provides our results collected from before into a single estimating function. As a result, we are able to construct sup-t confidence bands for appropriate inference on the decomposition of the total effect.

We can also check that our decomposition sums back to the average causal effect.

[17]:
estr.theta[0] + estr.theta[1]
[17]:
np.float64(0.12460267719839335)

which matches the average causal effect for many of the decimals (we expect some minor differences due to floating point precision and the root-finding procedure tolerance not being infinite).

This example provides another illustration of how to use delicatessen to construct custom estimating equations and use it for statistically valid inference.

References

Coffman, D. L., & Zhong, W. (2012). Assessing mediation using marginal structural models in the presence of confounding and moderation. Psychological methods, 17(4), 642.

Coffman, D. L., Schuler, M. S., McCaffrey, D. F., Castellano, K. E., Zhou, H., Vegetabile, B., & Griffin, B. A. (2021). A tutorial for conducting causal mediation analysis with the twangMediation package.

Tchetgen Tchetgen, E. J., & Shpitser, I. (2012). Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Annals of statistics, 40(3), 1816.

Zivich, P. N., & Breskin, A. (2021). Machine learning for causal inference: on the use of cross-fit estimators. Epidemiology, 32(3), 393-401.