Hernan & Robins (2023): Causal Inference with Models

The following replicates selected examples from the textbook “Causal Inference: What If” by Hernan and Robins. These replications focus on Part II of the textbook. I recommend reading Part I prior to looking through the following code. It is a great, approachable, and freely available resource.

Here, we demonstrate the application of delicatessen for causal inference. Throughout, we use the sandwich variance estimator. This estimator is not described in the textbook, but it is an alternative to the bootstrap. Importantly, it is a consistent estimator of the variance that requires less computational effort. delicatessen automates the whole procedure, so it is also simpler for us.
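
As a brief sketch (using generic M-estimation notation, not the book's), the sandwich variance estimator for the stacked estimating functions \(\psi(O_i; \theta)\) takes the ‘bread-filling-bread’ form

\[\hat{V}(\hat{\theta}) = \frac{1}{n} B_n(\hat{\theta})^{-1} F_n(\hat{\theta}) \left\{ B_n(\hat{\theta})^{-1} \right\}^T\]

where \(B_n\) is the mean of the (negative) derivatives of the estimating functions and \(F_n\) is the mean of their outer products. MEstimator computes these pieces numerically for whatever stacked estimating equations we supply.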

Broadly, interest is in estimating the average causal effect of stopping smoking (variable name: qsmk) on 10-year weight change (variable name: wt82_71). If we let \(Y^a\) indicate the potential weight change under smoking status \(a\), then the average causal effect can be written as

\[E[Y^1] - E[Y^0]\]

Hereafter, we assume that the parameter of interest is identified (see the book for what this means). Our focus will be on estimators for this quantity.

Setup

Data for these replications is available from the Harvard School of Public Health website for the book. The data comes from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study (NHEFS). The NHEFS was jointly initiated by the National Center for Health Statistics and the National Institute on Aging in collaboration with other agencies of the United States Public Health Service. A detailed description of the NHEFS, together with publicly available data sets and documentation, can be found at wwwn.cdc.gov/nchs/nhanes/nhefs/.

The data set used in the book and this tutorial is a subset of the full NHEFS. First, we will load the data and run some basic variable manipulations.

[1]:
import numpy as np
import scipy as sp
import pandas as pd

import delicatessen as deli
from delicatessen import MEstimator
from delicatessen.estimating_equations import (ee_regression, ee_glm, ee_ipw_msm,
                                               ee_gformula, ee_gestimation_snmm)
from delicatessen.utilities import inverse_logit, regression_predictions

print("Versions")
print("NumPy:        ", np.__version__)
print("SciPy:        ", sp.__version__)
print("pandas:       ", pd.__version__)
print("Delicatessen: ", deli.__version__)
Versions
NumPy:         1.25.2
SciPy:         1.11.2
pandas:        1.4.1
Delicatessen:  2.1
[2]:
df = pd.read_csv('data/nhefs.csv')
df.dropna(subset=['sex', 'age', 'race', 'ht',
                  'school', 'alcoholpy', 'smokeintensity'],
          inplace=True)

# recoding some variables
df['inactive'] = np.where(df['active'] == 2, 1, 0)
df['no_exercise'] = np.where(df['exercise'] == 2, 1, 0)
df['university'] = np.where(df['education'] == 5, 1, 0)

# Subsetting only variables of interest
df = df[['wt82_71', 'qsmk', 'sex', 'age', 'race', 'wt71', 'wt82', 'ht',
         'school', 'alcoholpy', 'smokeintensity', 'smokeyrs', 'smkintensity82_71',
         'education', 'exercise', 'active', 'death']]

# creating quadratic terms
for col in ['age', 'wt71', 'smokeintensity', 'smokeyrs']:
    df[col+'_sq'] = df[col] * df[col]

df['I'] = 1

# Indicator terms
df['educ_2'] = np.where(df['education'] == 2, 1, 0)
df['educ_3'] = np.where(df['education'] == 3, 1, 0)
df['educ_4'] = np.where(df['education'] == 4, 1, 0)
df['educ_5'] = np.where(df['education'] == 5, 1, 0)
df['exer_1'] = np.where(df['exercise'] == 1, 1, 0)
df['exer_2'] = np.where(df['exercise'] == 2, 1, 0)
df['active_1'] = np.where(df['active'] == 1, 1, 0)
df['active_2'] = np.where(df['active'] == 2, 1, 0)

# Interaction terms
df['qsmk_smkint'] = df['qsmk'] * df['smokeintensity']

# Complete-case data
dc = df.dropna(subset=['wt82_71']).copy()

Chapter 12: IP weighting and marginal structural models

The first estimation approach is inverse probability weighting. Inverse probability weights are defined as

\[\frac{1}{\Pr(A=a | W)}\]

where \(W\) is the set of confounders. We will estimate these weights and then use them to estimate the parameters of a marginal structural model. An example of a marginal structural model is

\[E[Y^a] = \alpha_0 + \alpha_1 a\]

where \(\alpha\) are the parameters to estimate. Here, \(\alpha_1\) represents the average causal effect. We will estimate the marginal structural model using the observed data and a regression model weighted by the inverse probability weights (see the book for details).
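
Concretely (a sketch in standard estimating-equation notation rather than the book's), the weighted regression that fits this marginal structural model solves

\[\sum_{i=1}^{n} \hat{W}_i \left(Y_i - \alpha_0 - \alpha_1 A_i\right) (1, A_i)^T = 0\]

where \(\hat{W}_i\) is the estimated inverse probability weight for individual \(i\). Stacking this equation with the score equations of the propensity score model is what allows the sandwich variance to account for the weights being estimated.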

12.1: The causal question

Chapter 12 starts out with estimating the crude association between qsmk and wt82_71 (12.1). We will do this by fitting a linear regression model with delicatessen.

[3]:
def psi_ols(theta):
    return ee_regression(theta=theta, X=dc[['I', 'qsmk']],
                         y=dc['wt82_71'], model='linear')
[4]:
estr = MEstimator(psi_ols, init=[0., 0.])
estr.estimate(solver='hybr')
[5]:
print("Point:", estr.theta[1])
print("95% CI:", estr.confidence_intervals()[1, :])
Point: 2.540581454955888
95% CI: [1.58620716 3.49495575]

The book reports an estimate of 2.5 (95% CI: 1.7, 3.4).

You may notice that the confidence interval differs slightly. That is because we are using the sandwich variance (the book does not). While the variance estimators used here and in the book are expected to be asymptotically equivalent in this case, they can produce different results in finite samples as we see here.

12.2: Estimating inverse probability weights via modeling

Now we will estimate the parameters of the marginal structural model using unstabilized inverse probability weights.

To do this with delicatessen, we will first define the corresponding design matrices, then define the stacked estimating equations, and finally estimate the parameters and their covariance using MEstimator and present the output.

[6]:
# Design matrix for the propensity score model
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for the marginal structural model
msm = dc[['I', 'qsmk']]
# treatment variable
a = dc['qsmk']
# outcome variable
y = dc['wt82_71']
[7]:
def psi_ipw_msm1(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:2]
    beta = theta[2:]

    # Estimating the propensity scores
    ee_ps = ee_regression(theta=beta,        # Estimate propensity scores
                          X=W, y=a,          # ... given observed A,W
                          model='logistic')  # ... with logit model
    pi = inverse_logit(np.dot(W, beta))      # Get Pr(A = 1 | W)
    ipw = 1 / np.where(a == 1, pi, 1-pi)     # Convert to IPW

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,      # MSM parameters
                           X=msm, y=y,       # ... observed data
                           model='linear',   # ... with linear model
                           weights=ipw)      # ... but weighted by IPW

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_ps])
[8]:
# M-estimator
init_vals = [0., 0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[9]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[0:2, :].T)
[1.77997819 3.44053543]
95% CI
[[1.35249881 2.48589079]
 [2.20745757 4.39518006]]

which are the estimates of the \(\alpha_0\) and \(\alpha_1\) parameters. The book provides the point estimate for qsmk (\(\hat{\alpha}_1\)) as 3.4 (95% CI: 2.4, 4.5).

The point estimates presented here are the same as in the book. However, the confidence intervals differ slightly. The confidence intervals in the book use the ‘GEE trick’, which provides overly conservative confidence intervals. With delicatessen and the sandwich variance, we can estimate the variance without being overly conservative. So, the variance reported above would be preferred over the approach described in the book. This highlights the advantage of M-estimators for computing the variance in the presence of nuisance parameters (like the propensity score parameters in the IPW estimator).

Instead of coding this by-hand, we can also use the built-in ee_ipw_msm function. This function estimates a marginal structural model using inverse probability weights, as done above. Below is how to apply this functionality.

[10]:
def psi_ipw_msm1a(theta):
    # Built-in estimating equation
    return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
                      distribution='normal',
                      link='identity')
[11]:
# M-estimator
init_vals = [0., 0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[12]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[0:2, :].T)
[1.77997819 3.44053543]
95% CI
[[1.35249881 2.48589079]
 [2.20745757 4.39518006]]

As expected, this built-in functionality produces the same results as the by-hand version.

12.3: Stabilized inverse probability weights

Next, we are going to use stabilized weights. The stabilized weights will require us to estimate an additional parameter. We will accomplish this by stacking an estimating equation for that parameter. This extra estimating equation is for an intercept-only model for the probability of qsmk.
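
Recall from the book that the stabilized weights replace the numerator of 1 with the marginal probability of the observed treatment,

\[SW^A = \frac{\Pr(A = a)}{\Pr(A = a | W)}\]

The intercept-only logistic model below estimates that numerator.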

[13]:
def psi_ipw_msm2(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:2]                 # MSM parameters
    gamma = np.array([theta[2], ])     # Numerator parameter
    beta = theta[3:]                   # Propensity score parameters

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=beta,        # Propensity score model
                          X=W, y=a,          # ... with observed data
                          model='logistic')  # ... and logit model
    pi = inverse_logit(np.dot(W, beta))      # Predicted probability of A=1

    # Estimating intercept-only for numerator
    ee_num = ee_regression(theta=gamma,            # Numerator model
                           X=dc[['I']], y=a,       # ... intercept-only
                           model='logistic')       # ... logit model
    num = inverse_logit(np.dot(dc[['I']], gamma))  # Marginal probability

    # Construct stabilized weights
    ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,     # MSM
                           X=msm, y=y,      # ... with observed data
                           model='linear',  # ... linear
                           weights=ipw)     # ... weighted by stabilized

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_ps])
[14]:
# M-estimator
init_vals = [0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm2, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[15]:
ci = estr.confidence_intervals()
print("Intercept:", estr.theta[0])
print("95% CI:", ci[0, :])
print("qsmk:", estr.theta[1])
print("95% CI:", ci[1, :])
Intercept: 1.77997819053316
95% CI: [1.35249875 2.20745763]
qsmk: 3.44053542964726
95% CI: [2.48589062 4.39518024]

The estimate is the same as in the previous section. This is expected because, as long as the marginal structural model is saturated, the unstabilized and stabilized weights should produce the same answer. (Note there are some differences in the smaller decimal places, but these are due to floating-point error, not differences between the estimators.)

Note: ee_ipw_msm does not support the computation of stabilized weights (results are equivalent in this setting, so we can be more computationally efficient by opting for the unstabilized weights).

12.4: Marginal structural models

Now we will consider the IPW estimator for a continuous action. Here, the action is the change in smoking intensity (smkintensity82_71), the outcome remains wt82_71, and the analysis is restricted to those who smoked 25 or fewer cigarettes per day at baseline.
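
For a continuous action, the stabilized weights use conditional densities rather than probabilities,

\[SW^A = \frac{f_A(A)}{f_{A | W}(A | W)}\]

where, as in the book, both densities are modeled as normal, with means from linear regressions (intercept-only for the numerator) and constant variances estimated from the residuals.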

[16]:
# Restricting data by smoking intensity
ds = dc.loc[dc['smokeintensity'] <= 25].copy()

# Design matrix for the propensity score model
W = ds[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for the marginal structural model
ds['smkint_sq'] = ds['smkintensity82_71']**2
msm = ds[['I', 'smkintensity82_71', 'smkint_sq']]
# Treatment array
a = ds['smkintensity82_71']
# Outcome array
y = ds['wt82_71']
[17]:
def psi_ipw_msm3(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:3]               # Marginal structural model
    gamma = np.array([theta[3], ])   # Numerator
    beta = theta[4:]                 # Propensity score model
    div_ps = W.shape[0] - len(beta)  # Divisor for PS SD
    div_nm = W.shape[0] - len(gamma) # Divisor for Num SD

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=beta,           # Generalized PS model
                          X=W, y=a,             # ... for observed data
                          model='linear')       # ... linear regression
    mu = np.dot(W, beta)                        # Predicted values
    mu_resid = np.sum((a - mu)**2) / div_ps     # Standard deviation
    fAL = sp.stats.norm.pdf(a, mu,              # PDF from normal
                            np.sqrt(mu_resid))  # ... with estimates

    # Estimating intercept-only for numerator
    ee_num = ee_regression(theta=gamma,         # Numerator for stabilized
                           X=ds[['I', ]], y=a,  # ... for observed data
                           model='linear')      # ... linear regression
    num = np.dot(ds[['I', ]], gamma)            # Predicted values
    num_resid = np.sum((a - num)**2) / div_nm   # Standard deviation
    fA = sp.stats.norm.pdf(a, num,              # PDF from normal
                           np.sqrt(num_resid))  # ... with estimates

    # Stabilized weights
    ipw = fA / fAL

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,     # Marginal structural model
                           X=msm, y=y,      # ... observed data
                           model='linear',  # ... linear model
                           weights=ipw)     # ... weighted by IPW

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_ps])
[18]:
# M-estimator
init_vals = [0., 0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm3, init=init_vals)
estr.estimate(solver='hybr', tolerance=1e-12, maxiter=5000)
[19]:
ci = estr.confidence_intervals()
print(estr.theta[0:3])
print("95% CI:")
print(ci[0:3, :].T)
[ 2.00452474 -0.10898888  0.00269494]
95% CI:
[[ 1.44058085e+00 -1.66863862e-01 -1.75081112e-03]
 [ 2.56846862e+00 -5.11138978e-02  7.14069266e-03]]

The book reports coefficients of 2.005, −0.109, and 0.003, which match the output shown above.

As done in the book, we want to know the weight change under no change in smoking intensity and under a +20 increase in smoking intensity. Rather than add corresponding estimating equations for those quantities, we can directly manipulate the point and covariance estimates. The function regression_predictions will do this for us given the estimated parameters, their covariance, and the covariate values of interest.
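
As a sketch of what this helper computes (see the delicatessen documentation for the specifics), for a row of covariate values \(x\) the prediction and its variance are

\[\hat{y} = x \hat{\alpha}, \quad \widehat{Var}(\hat{y}) = x \hat{\Sigma} x^T\]

where \(\hat{\Sigma}\) is the corresponding block of the estimated sandwich covariance matrix, from which Wald-type confidence intervals follow.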

[20]:
# Creating dataframe for combinations to predict
p = pd.DataFrame()
p['smkintensity82_71'] = [0, 20]
p['smkint_sq'] = [0**2, 20**2]
p['I'] = 1
vals = np.asarray(p[['I', 'smkintensity82_71', 'smkint_sq']])

# Getting predicted values and variance for combinations
pred_y = regression_predictions(vals,
                                estr.theta[0:3],
                                estr.variance[0:3, 0:3])

# Displaying results
print("No change in smoking")
print(pred_y[0, 0])
print("95% CI:", pred_y[0, 2:])
print()
print("+20 smoking intensity")
print(pred_y[1, 0])
print("95% CI:", pred_y[1, 2:])
No change in smoking
2.004524735075991
95% CI: [1.44058085 2.56846862]

+20 smoking intensity
0.9027234481603187
95% CI: [-1.49966211  3.305109  ]

Again, we get results similar to those reported in the book: 2.0 (95% CI: 1.4, 2.6) and 0.9 (95% CI: −1.7, 3.5). However, our confidence intervals are slightly narrower. Again, this is expected since we are using a variance estimator that is not overly conservative.

Note: ee_ipw_msm does not support non-binary treatments, since allowing for generic distributions in the generalized propensity score model is difficult. So, you will need to implement these estimators by-hand (using an approach similar to the one above).

Binary outcome

We now repeat the process, but for a binary outcome and a binary treatment. We will use a GLM with a binomial distribution and logit link (this will estimate the causal odds ratio). This is available in ee_glm. We will also be using qsmk as the treatment again.
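
The corresponding marginal structural logistic model is

\[\text{logit} \Pr(Y^a = 1) = \alpha_0 + \alpha_1 a\]

so \(\exp(\alpha_1)\) is the causal odds ratio reported below.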

[21]:
# Design matrix for propensity scores
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
msm = dc[['I', 'qsmk']]
# Treatment array
a = dc['qsmk']
# Outcome array
y = dc['death']
[22]:
def psi_ipw_msm4(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:2]              # Marginal structural model
    gamma = np.array([theta[2], ])  # Numerator model
    beta = theta[3:]                # Propensity score model

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=beta,        # Propensity score
                          X=W, y=a,          # ... observed data
                          model='logistic')  # ... logistic model
    pi = inverse_logit(np.dot(W, beta))      # Predicted probability

    # Estimating intercept-only for numerator
    ee_num = ee_regression(theta=gamma,           # Numerator model
                           X=dc[['I']], y=a,      # ... observed data
                           model='logistic')      # ... logit model
    num = inverse_logit(np.dot(dc[['I']], gamma)) # Predicted prob

    # Stabilized inverse probability weights
    ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_glm(theta=alpha,               # MSM
                    X=msm, y=y,                # ... observed data
                    link='logit',              # ... logit link
                    distribution='binomial',   # ... binomial dist
                    weights=ipw)               # ... weighted

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_ps])
[23]:
# M-estimator
init_vals = [0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm4, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[24]:
ci = estr.confidence_intervals()
print("qsmk:", np.exp(estr.theta[1]))
print("95% CI:", np.exp(ci[1, :]))
qsmk: 1.030577892676402
95% CI: [0.78938458 1.34546687]

The previous results are for the causal odds ratio. Again, they are similar to those in the book (1.0; 95% CI: 0.8, 1.4), with slight differences in the confidence intervals.

We can replicate this approach using ee_ipw_msm. To simplify the process, I am going to use the previous propensity score model parameters as the starting values.

[25]:
init_ps = list(estr.theta[3:])
[26]:
def psi_ipw_msm4a(theta):
    # Built-in estimating equation
    return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
                      distribution='binomial',
                      link='logit')
[27]:
# M-estimator
init_vals = [0., 0., ] + init_ps
estr = MEstimator(psi_ipw_msm4a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[28]:
ci = estr.confidence_intervals()
print(np.exp(estr.theta[0:2]))
print("95% CI")
print(np.exp(ci[0:2, :]).T)
[0.22527109 1.030578  ]
95% CI
[[0.19459809 0.78938459]
 [0.26077884 1.34546713]]

These are the same (exponentiated) coefficients as in the previous approach. You may notice a slight discrepancy in the confidence intervals; this is a negligible numerical difference arising from the use of stabilized weights in the by-hand version versus unstabilized weights in ee_ipw_msm.

12.5: Effect modification and marginal structural models

We will now use marginal structural models to study effect measure modification. We will look at effect modification by sex (sex) of quitting smoking (qsmk) on 10-year weight change (wt82_71).

[29]:
# Design matrix for propensity scores
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
dc['qsmk_sex'] = dc['qsmk']*dc['sex']
msm = dc[['I', 'qsmk', 'sex', 'qsmk_sex']]
# Treatment array
a = dc['qsmk']
# Outcome array
y = dc['wt82_71']
[30]:
def psi_ipw_msm5(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:4]              # Marginal structural model
    gamma = np.array([theta[4], ])  # Numerator parameter
    beta = theta[5:]                # Propensity score

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=beta,        # Propensity score
                          X=W, y=a,          # ... observed data
                          model='logistic')  # ... logit model
    pi = inverse_logit(np.dot(W, beta))      # Predicted prob

    # Estimating intercept-only for numerator
    ee_num = ee_regression(theta=gamma,            # Numerator model
                           X=dc[['I']], y=a,       # ... observed data
                           model='logistic')       # ... logit model
    num = inverse_logit(np.dot(dc[['I']], gamma))  # Predicted prob

    # Stabilized inverse probability weights
    ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,     # Marginal structural model
                           X=msm, y=y,      # ... observed data
                           model='linear',  # ... linear model
                           weights=ipw)     # ... weighted

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_ps])
[31]:
# M-estimator
init_vals = [0., 0., 0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm5, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[32]:
ci = estr.confidence_intervals()
print(estr.theta[0:4])
print("95% CI")
print(ci[0:4, :].T)
[ 1.78444688  3.52197763 -0.00872478 -0.15947852]
95% CI
[[ 1.19225612  2.28240175 -0.88125781 -2.15657223]
 [ 2.37663763  4.76155352  0.86380824  1.83761518]]

While not reported in the book, other online references report the following coefficients: 1.7844, 3.5220, -0.0087, -0.1595.

As before, we can replicate this result using the built-in ee_ipw_msm function.

[33]:
def psi_ipw_msm5a(theta):
    # Built-in estimating equation
    return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
                      distribution='normal',
                      link='identity')
[34]:
# M-estimator
init_vals = [0., ]*msm.shape[1] + init_ps
estr = MEstimator(psi_ipw_msm5a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[35]:
ci = estr.confidence_intervals()
print(estr.theta[0:4])
print("95% CI")
print(ci[0:4, :].T)
[ 1.78444688  3.52197763 -0.00872478 -0.15947852]
95% CI
[[ 1.19225614  2.28240246 -0.88125778 -2.15657182]
 [ 2.37663761  4.7615528   0.86380821  1.83761477]]

which again replicates the by-hand results.

12.6: Censoring and missing data

To conclude this chapter, we will now consider the missing outcomes that were ignored earlier. To do this, we will use stabilized inverse probability of missingness weights (the book handles these as censoring weights). As we have done several times already, we will use delicatessen to stack the additional nuisance models together.
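
As implemented below, the weights applied to the marginal structural model are the product of the stabilized treatment weights and the stabilized missingness weights,

\[SW_i = SW_i^A \times SW_i^C = \frac{\Pr(A = A_i)}{\Pr(A = A_i | W_i)} \times \frac{\Pr(C = 0)}{\Pr(C = 0 | A_i, W_i)}\]

where \(C = 1\) indicates a missing outcome (the indicator r below codes an observed outcome as 1).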

[36]:
# Design matrix for propensity score model
W = df[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for missing model
X = df[['I', 'qsmk', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
msm = df[['I', 'qsmk']]
# Treatment array
a = df['qsmk']
# Outcome array
y = df['wt82_71']
# Missing indicator
r = np.where(df['wt82_71'].isna(), 0, 1)
[37]:
def psi_ipw_msm6(theta):
    # Dividing parameters up for their corresponding estimation equations
    alpha = theta[0:2]                # MSM
    gamma_n = np.array([theta[2], ])  # Numerator PS
    beta_n = np.array([theta[3], ])   # Numerator MW
    gamma_d = theta[4:4+W.shape[1]]   # Propensity score
    beta_d = theta[4+W.shape[1]:]     # Missing model

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=gamma_d,      # Propensity score
                          X=W, y=a,           # ... observed data
                          model='logistic')   # ... logit model
    pi_a = inverse_logit(np.dot(W, gamma_d))  # Predicted prob

    # Estimating intercept-only for numerator of IPTW
    ee_num = ee_regression(theta=gamma_n,     # Numerator
                           X=df[['I']], y=a,  # ... observed data
                           model='logistic')  # ... logit model
    num_a = inverse_logit(np.dot(df[['I']], gamma_n))

    # Estimating the missing scores using a logit model
    ee_ms = ee_regression(theta=beta_d,      # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, beta_d))  # Predicted prob

    # Estimating intercept-only for numerator of IPMW
    ee_sms = ee_regression(theta=beta_n,      # Numerator
                           X=df[['I']], y=r,  # ... observed data
                           model='logistic')  # ... logit model
    num_m = inverse_logit(np.dot(df[['I']], beta_n))

    # Stabilized inverse probability weights
    iptw = np.where(a == 1, num_a/pi_a, (1-num_a)/(1-pi_a))
    ipmw = np.where(r == 1, num_m/pi_m, 0)
    ipw = iptw*ipmw

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,     # MSM
                           X=msm, y=y,      # ... observed data
                           model='linear',  # ... linear model
                           weights=ipw)     # ... weighted
    # Setting rows with missing Y's as zero (no contribution)
    ee_msm = np.nan_to_num(ee_msm, copy=False, nan=0.)

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_sms, ee_ps, ee_ms])
[38]:
# M-estimator
init_vals = [0., 0., ] + [0., 0., ] + [0.,]*W.shape[1] + [0.,]*X.shape[1]
estr = MEstimator(psi_ipw_msm6, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[39]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[0:2, :].T)
[1.66199003 3.49649333]
95% CI
[[1.22590804 2.54496958]
 [2.09807203 4.44801708]]

Here, the book reports 3.5 (95% CI: 2.5, 4.5).

Again, we will replicate this using ee_ipw_msm instead. To incorporate the missingness weights, we will fit a separate missingness model and then pass its weights via the optional weights argument. Below is how this can be done.

[40]:
def psi_ipw_msm6a(theta):
    # Separating parameters out
    alpha = theta[:2+W.shape[1]]  # MSM & PS
    gamma = theta[2+W.shape[1]:]  # Missing score

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for MSM and PS
    ee_msm = ee_ipw_msm(theta=alpha, y=y, A=a, W=W, V=msm,
                        distribution='normal',
                        link='identity',
                        weights=ipmw)
    # Setting rows with missing Y's as zero (no contribution)
    ee_msm = np.nan_to_num(ee_msm, copy=False, nan=0.)

    # Return the stacked estimating equations
    return np.vstack([ee_msm, ee_ms])
[41]:
# M-estimator
init_vals = [0., ]*msm.shape[1] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_ipw_msm6a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[42]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[0:2, :].T)
[1.66199003 3.49649333]
95% CI
[[1.22590805 2.54496955]
 [2.09807201 4.4480171 ]]

Again, we are able to replicate the by-hand implementation. This concludes chapter 12.

Chapter 13: Standardization and the Parametric G-Formula

For Chapter 13, the book reviews the g-formula. Unlike IPW, the g-formula relies on modeling the outcome process. The g-computation algorithm estimator is

\[\hat{E}[Y^a] = n^{-1} \sum_{i=1}^n m(a, W_i; \beta)\]

where \(m\) is a statistical model for \(E[Y | A, W]\) and \(\beta\) are the parameters defining that model. This version looks slightly different from the standardization form given in the book, but it is equivalent. Broadly, we apply the g-computation algorithm by (1) estimating an outcome model, (2) predicting the outcomes had everyone been assigned \(a\), and (3) computing the mean of those predictions.
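
For comparison, the standardization form given in the book is

\[E[Y^a] = \sum_{w} E[Y | A=a, W=w] \Pr(W=w)\]

and the g-computation version above simply replaces the sum over the confounder distribution with an average over the empirical distribution of \(W\) in the sample.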

13.2: Estimating the mean outcome via modeling

First, we will fit a linear model for wt82_71 conditional on qsmk and the set of confounding variables. To begin, we will ignore the missing outcomes.

[43]:
# Design matrix for outcome model
X = dc[['I', 'qsmk', 'qsmk_smkint',
        'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Outcome array
y = dc['wt82_71']
[44]:
def psi_regression(theta):
    # Estimating the linear outcome model
    return ee_regression(theta=theta,
                         X=X, y=y,
                         model='linear')
[45]:
# M-estimator
init_vals = [0., ]*X.shape[1]
estr = MEstimator(psi_regression, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[46]:
print("Regression coefficients")
print(estr.theta)
Regression coefficients
[-1.58816566e+00  2.55959409e+00  4.66628395e-02 -1.43027166e+00
  5.60109604e-01  3.59635262e-01 -6.10095538e-03  7.90444028e-01
  5.56312403e-01  1.49156952e+00 -1.94977036e-01  4.91364717e-02
 -9.90651192e-04  1.34368569e-01 -1.86642949e-03  2.95975357e-01
  3.53912774e-01 -9.47569487e-01 -2.61377894e-01  4.55017905e-02
 -9.65325105e-04]

To ease the later steps, we are going to save these optimized values as a list to use as starting values for the root-finding procedure.

[47]:
init_reg = list(estr.theta)

13.3: Standardizing the mean outcome to the confounder distribution

Now we can apply the g-computation algorithm (a way to evaluate the g-formula). To do this, we create a copy of our data set, set qsmk=1 for all observations in that copy, and save the corresponding design matrix. We then repeat this process with qsmk=0.

[48]:
# Copy of the data that we will updated qsmk in
dca = dc.copy()
[49]:
# Setting qsmk to 1
dca['qsmk'] = 1
dca['qsmk_smkint'] = dca['qsmk'] * dca['smokeintensity']
# Design matrix from qsmk=1 data
X1 = dca[['I', 'qsmk', 'qsmk_smkint',
          'sex', 'race', 'age', 'age_sq',
          'educ_2', 'educ_3', 'educ_4', 'educ_5',
          'smokeintensity', 'smokeintensity_sq',
          'smokeyrs', 'smokeyrs_sq',
          'exer_1', 'exer_2', 'active_1', 'active_2',
          'wt71', 'wt71_sq']]
[50]:
# Setting qsmk to 0
dca['qsmk'] = 0
dca['qsmk_smkint'] = dca['qsmk'] * dca['smokeintensity']
# Design matrix from qsmk=0 data
X0 = dca[['I', 'qsmk', 'qsmk_smkint',
          'sex', 'race', 'age', 'age_sq',
          'educ_2', 'educ_3', 'educ_4', 'educ_5',
          'smokeintensity', 'smokeintensity_sq',
          'smokeyrs', 'smokeyrs_sq',
          'exer_1', 'exer_2', 'active_1', 'active_2',
          'wt71', 'wt71_sq']]

Now, we will use the design matrices to generate predicted outcomes (pseudo-outcomes) for each individual. The average causal effect can then be calculated from those pseudo-outcomes.
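
In other words, the estimators of the causal means and their difference are

\[\hat{E}[Y^1] = n^{-1} \sum_{i=1}^{n} \hat{Y}_i^1, \quad \hat{E}[Y^0] = n^{-1} \sum_{i=1}^{n} \hat{Y}_i^0, \quad \widehat{ACE} = \hat{E}[Y^1] - \hat{E}[Y^0]\]

where \(\hat{Y}_i^a\) is the predicted outcome for individual \(i\) from the design matrix with qsmk set to \(a\).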

The following are the corresponding estimating equations.

[51]:
def psi_gcomp1(theta):
    # Dividing parameters into corresponding estimation equations
    rd, r1, r0 = theta[0], theta[1], theta[2]   # Causal means
    beta = theta[3:]                            # Outcome model

    # Estimating the linear model
    ee_reg = ee_regression(theta=beta,      # Outcome model
                           X=X, y=y,        # ... observed data
                           model='linear')  # ... linear model

    # Generating pseudo-outcome using the model
    y1hat = np.dot(X1, beta)  # Predicted Y when qsmk=1
    y0hat = np.dot(X0, beta)  # Predicted Y when qsmk=0

    # Causal means
    ee_r1 = y1hat - r1   # Causal mean for qsmk=1
    ee_r0 = y0hat - r0   # Causal mean for qsmk=0

    # Average causal effect
    ee_rd = np.ones(y.shape[0]) * ((r1 - r0) - rd)

    # Stacking the estimating equations and returning
    return np.vstack([ee_rd, ee_r1, ee_r0, ee_reg])
[52]:
# M-estimator
init_vals = [0., 0., 0.,] + init_reg
estr = MEstimator(psi_gcomp1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[53]:
ci = estr.confidence_intervals()
print(estr.theta[0:3])
print("95% CI")
print(ci[0:3, :].T)
[3.5173742  5.27358732 1.75621312]
95% CI
[[2.58132997 4.42099134 1.33029619]
 [4.45341844 6.1261833  2.18213004]]

The first estimate is the average causal effect (the second and third are the causal means under everyone quitting smoking and no one quitting smoking, respectively). Here, the confidence intervals match the book, but we avoided the computational burden of the bootstrap (another advantage of the sandwich variance estimator).

In the book, they report 3.5 (95% CI: 2.6, 4.5). The confidence interval in the book was generated via the bootstrap, so there is potential for random error in the estimates.

Rather than implement g-computation by-hand, we can also use the built-in estimating equations. Here is an example of that approach.

[54]:
def psi_gcomp1a(theta):
    # Built-in g-formula estimating equation
    return ee_gformula(theta,
                       y=y, X=X,
                       X1=X1, X0=X0)
[55]:
# M-estimator
init_vals = [0., 0., 0.,] + init_reg
estr = MEstimator(psi_gcomp1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[56]:
ci = estr.confidence_intervals()
print(estr.theta[0:3])
print("95% CI")
print(ci[0:3, :].T)
[3.5173742  5.27358732 1.75621312]
95% CI
[[2.58132997 4.42099134 1.33029619]
 [4.45341844 6.1261833  2.18213004]]

which provides the same answers (as we would expect!)

Chapter 14: G-Estimation of Structural Nested Models

G-estimation differs from the previous approaches in what parameter it targets. Rather than the average causal effect, we will estimate the parameters of a structural nested model. The structural nested mean model is

\[E[Y^a - Y^{a=0} | A=a, W] = \varphi_0 a\]

Here, \(\varphi_0\) represents the effect of \(a\) on the difference scale (which, under this model, is equivalent to the average causal effect). However, we can also easily study effect measure modification with structural nested models. Consider

\[E[Y^a - Y^{a=0} | A=a, W] = \varphi_0 a + \varphi_1 a V\]

where \(V \in W\). Therefore, the structural nested model describes effect measure modification by \(V\). Importantly, structural nested models assume that we have correctly specified this model (hence the difference in the parameters of interest between methods).

G-estimation is a bit less straightforward to understand compared to the previous methods. Note that we will be using the propensity score for estimation, and we still assume the same identification conditions. See the book for a much more in-depth discussion.

14.5: G-estimation

To solve for our g-estimates, we are going to use a different approach than the one described in the main text of the book. Instead, we will use the approach (the estimating equations) described in Technical Point 14.2. As mentioned there, this approach is equivalent to the process described in the main text.
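
As a sketch in this notebook's notation (see Technical Point 14.2 for the formal statement), the one-parameter g-estimator solves

\[\sum_{i=1}^{n} \hat{W}_i^C \; H_i(\varphi) \left\{ A_i - \widehat{\Pr}(A_i = 1 | W_i) \right\} = 0, \quad H_i(\varphi) = Y_i - \varphi_0 A_i\]

where \(\hat{W}_i^C\) is the inverse probability of missingness weight and individuals with a missing outcome contribute zero. This is what the by-hand ee_snm term below computes.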

[57]:
# Design matrix for propensity scores
W = np.asarray(df[['I', 'sex', 'race', 'age', 'age_sq',
                   'educ_2', 'educ_3', 'educ_4', 'educ_5',
                   'smokeintensity', 'smokeintensity_sq',
                   'smokeyrs', 'smokeyrs_sq',
                   'exer_1', 'exer_2', 'active_1', 'active_2',
                   'wt71', 'wt71_sq']])
# Design matrix for missing model
X = np.asarray(df[['I', 'qsmk', 'sex', 'race', 'age', 'age_sq',
                   'educ_2', 'educ_3', 'educ_4', 'educ_5',
                   'smokeintensity', 'smokeintensity_sq',
                   'smokeyrs', 'smokeyrs_sq',
                   'exer_1', 'exer_2', 'active_1', 'active_2',
                   'wt71', 'wt71_sq']])
# Design matrix for structural nested model
snm = np.asarray(df[['I', ]])
# Treatment array
a = np.asarray(df['qsmk'])
# Outcome array
y = np.asarray(df['wt82_71'])
# Missing indicator
r = np.where(df['wt82_71'].isna(), 0, 1)
[58]:
def psi_snm1(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[:snm.shape[1]]             # SNM parameters
    beta = theta[snm.shape[1]:               # Propensity score
                 snm.shape[1]+W.shape[1]]
    gamma = theta[snm.shape[1]+W.shape[1]:]  # Missing score

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for PS
    ee_log = ee_regression(theta=beta,        # Propensity score
                           X=W, y=a,          # ... observed data
                           model='logistic',  # ... logit model
                           weights=ipmw)      # ... weighted
    pi = inverse_logit(np.dot(W, beta))       # Predicted prob

    # H(psi) equation for linear models
    h_psi = y - np.dot(snm*a[:, None], alpha)

    # Estimating equation for the structural nested mean model
    ee_snm = ipmw*(h_psi[:, None] * (a - pi)[:, None] * snm).T
    # Setting rows with missing Y's as zero (no contribution)
    ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)

    return np.vstack([ee_snm, ee_log, ee_ms])
[59]:
# M-estimator
init_vals = [0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[60]:
ci = estr.confidence_intervals()
print(estr.theta[0:1])
print("95% CI")
print(ci[0, :])
[3.4458988]
95% CI
[2.52709776 4.36469984]

The book reported 3.4 (95% CI: 2.5, 4.5). That confidence interval was generated by inverting test results. As such, differences are expected (different variance estimators are being used). As in the inverse probability weighting examples, the book's confidence intervals are expected to be conservative due to the use of the ‘GEE trick’.

Now consider how we can use the built-in g-estimation functionality, ee_gestimation_snmm.

[61]:
def psi_snm1a(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[:snm.shape[1]+W.shape[1]]  # SNM and PS
    gamma = theta[snm.shape[1]+W.shape[1]:]  # Missing scores

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for PS
    ee_snm = ee_gestimation_snmm(theta=alpha,
                                 y=y, A=a,
                                 W=W, V=snm,
                                 weights=ipmw)
    # Setting rows with missing Y's as zero (no contribution)
    ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)

    return np.vstack([ee_snm, ee_ms])
[62]:
# M-estimator
init_vals = [0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[63]:
ci = estr.confidence_intervals()
print(estr.theta[0:1])
print("95% CI")
print(ci[0, :])
[3.4458988]
95% CI
[2.52709776 4.36469984]

The structural nested model parameters match between implementations, as expected.

14.6: Structural nested models with two or more parameters

We now adapt the previous code to consider a structural nested model with two parameters. Thankfully, this is straightforward given how we coded the estimating equations, since snm is a global object.
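
Specifically, the structural nested model now includes effect measure modification by baseline smoking intensity,

\[E[Y^a - Y^{a=0} | A=a, W] = \varphi_0 a + \varphi_1 a \times \text{smokeintensity}\]

which corresponds to the snm design matrix defined below.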

[64]:
snm = np.asarray(df[['I', 'smokeintensity']])
[65]:
def psi_snm2(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[:snm.shape[1]]             # SNM parameters
    beta = theta[snm.shape[1]:               # Propensity score
                 snm.shape[1]+W.shape[1]]
    gamma = theta[snm.shape[1]+W.shape[1]:]  # Missing score

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for PS
    ee_log = ee_regression(theta=beta,        # Propensity score
                           X=W, y=a,          # ... observed data
                           model='logistic',  # ... logit model
                           weights=ipmw)      # ... weighted
    pi = inverse_logit(np.dot(W, beta))       # Predicted prob

    # H(psi) equation for linear models
    h_psi = y - np.dot(snm*a[:, None], alpha)

    # Estimating equation for the structural nested mean model
    ee_snm = ipmw*(h_psi[:, None] * (a - pi)[:, None] * snm).T
    # Setting rows with missing Y's as zero (no contribution)
    ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)

    return np.vstack([ee_snm, ee_log, ee_ms])
[66]:
# M-estimator
init_vals = [0., 0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm2, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[67]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[:2, :].T)
[2.85947039 0.03004128]
95% CI
[[ 1.0291672  -0.05951923]
 [ 4.68977357  0.11960179]]

which provides the same results as the book (2.86 and 0.03). Unlike the book, we provide confidence intervals. Inverting the test is not simple in this case, and the book does not use a bootstrapping procedure. Our confidence intervals are computed using the sandwich variance estimator.

Again, we can apply the built-in estimating equation for g-estimation. In truth, the new estimating equation function is the same as the previous g-estimation version (since snm is a global object when called). Regardless, we provide it again here.

[68]:
def psi_snm2a(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[:snm.shape[1]+W.shape[1]]  # SNM and PS
    gamma = theta[snm.shape[1]+W.shape[1]:]  # Missing scores

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for PS
    ee_snm = ee_gestimation_snmm(theta=alpha,
                                 y=y, A=a,
                                 W=W, V=snm,
                                 weights=ipmw)
    # Setting rows with missing Y's as zero (no contribution)
    ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)

    return np.vstack([ee_snm, ee_ms])
[69]:
# M-estimator
init_vals = [0., 0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm2a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[70]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[:2, :].T)
[2.85947039 0.03004128]
95% CI
[[ 1.0291672  -0.05951923]
 [ 4.68977357  0.11960179]]

This concludes chapter 14.

Replications of Chapters 15-17 are not yet available.