Hernan & Robins (2023): Causal Inference with Models
The following replicates selected examples from the textbook “Causal Inference: What If” by Hernan and Robins. These replications focus on Part II of the textbook. I recommend reading Part I prior to looking through the following code. It is a great, approachable, and freely available resource.
Here, we demonstrate application of delicatessen
for causal inference. Throughout, we use the sandwich variance estimator. This is not described in the textbook, but is an alternative to the bootstrap. Importantly, it is a consistent estimator of the variance that is computationally simpler (in terms of the computers computational effort). delicatessen
automates the whole procedure, so it is also simpler for us.
Broadly, interest is in estimating the average causal effect of stopping smoking (variable name: qsmk
) on 10-year weight change (variable name: wt82_71
). If we let \(Y^a\) indicate the potential weight change under smoking status \(a\), then the average causal effect can be written as
Herafter, we assume that the interest parameter is identified (see the book for what this means). Our focus will be on estimators for this quantity.
Setup
Data for these replications is available at the following Harvard School of Public Health website. Data comes from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study (NHEFS). The NHEFS was jointly initiated by the National Center for Health Statistics and the National Institute on Aging in collaboration with other agencies of the United States Public Health Service. A detailed description of the NHEFS, together with publicly available data sets and documentation, can be found at wwwn.cdc.gov/nchs/nhanes/nhefs/
The data set used in the book and this tutorial is a subset of the full NHEFS. First, we will load the data and run some basic variable manipulations.
[1]:
import numpy as np
import scipy as sp
import pandas as pd
import delicatessen as deli
from delicatessen import MEstimator
from delicatessen.estimating_equations import (ee_regression, ee_glm, ee_ipw_msm,
ee_gformula, ee_gestimation_snmm)
from delicatessen.utilities import inverse_logit, regression_predictions
print("Versions")
print("NumPy: ", np.__version__)
print("SciPy: ", sp.__version__)
print("pandas: ", pd.__version__)
print("Delicatessen: ", deli.__version__)
Versions
NumPy: 1.25.2
SciPy: 1.11.2
pandas: 1.4.1
Delicatessen: 2.1
[2]:
df = pd.read_csv('data/nhefs.csv')
df.dropna(subset=['sex', 'age', 'race', 'ht',
'school', 'alcoholpy', 'smokeintensity'],
inplace=True)
# recoding some variables
df['inactive'] = np.where(df['active'] == 2, 1, 0)
df['no_exercise'] = np.where(df['exercise'] == 2, 1, 0)
df['university'] = np.where(df['education'] == 5, 1, 0)
# Subsetting only variables of interest
df = df[['wt82_71', 'qsmk', 'sex', 'age', 'race', 'wt71', 'wt82', 'ht',
'school', 'alcoholpy', 'smokeintensity', 'smokeyrs', 'smkintensity82_71',
'education', 'exercise', 'active', 'death']]
# creating quadratic terms
for col in ['age', 'wt71', 'smokeintensity', 'smokeyrs']:
df[col+'_sq'] = df[col] * df[col]
df['I'] = 1
# Indicator terms
df['educ_2'] = np.where(df['education'] == 2, 1, 0)
df['educ_3'] = np.where(df['education'] == 3, 1, 0)
df['educ_4'] = np.where(df['education'] == 4, 1, 0)
df['educ_5'] = np.where(df['education'] == 5, 1, 0)
df['exer_1'] = np.where(df['exercise'] == 1, 1, 0)
df['exer_2'] = np.where(df['exercise'] == 2, 1, 0)
df['active_1'] = np.where(df['active'] == 1, 1, 0)
df['active_2'] = np.where(df['active'] == 2, 1, 0)
# Interaction terms
df['qsmk_smkint'] = df['qsmk'] * df['smokeintensity']
# Complete-case data
dc = df.dropna(subset=['wt82_71']).copy()
Chapter 12: IP weighting and marginal structural models
The first estimation approach is inverse probability weighting. Inverse probability weights are defined as
where \(W\) is the set of confounders. We will estimate these weights and then use them to estimate the parameters of a marginal structural model. An example of a marginal structural model is
where \(\alpha\) are the parameters to estimate. Here, \(\alpha_1\) represents the average causal effect. We will estimate the marginal structural model using the observed data and a regression model weighted by the inverse probability weights (see the books for details).
12.1: The causal question
Chapter 12 starts out with estimating the crude association between qsmk
and wt82_71
(12.1). We will do this by fitting a linear regression model with delicatessen
.
[3]:
def psi_ols(theta):
return ee_regression(theta=theta, X=dc[['I', 'qsmk']],
y=dc['wt82_71'], model='linear')
[4]:
estr = MEstimator(psi_ols, init=[0., 0.])
estr.estimate(solver='hybr')
[5]:
print("Point:", estr.theta[1])
print("95% CI:", estr.confidence_intervals()[1, :])
Point: 2.540581454955888
95% CI: [1.58620716 3.49495575]
The book reports an estimate of 2.5 (95% CI: 1.7, 3.4)
You may notice that the confidence interval differs slightly. That is because we are using the sandwich variance (the book does not). While the variance estimators used here and in the book are expected to be asymptotically equivalent in this case, they can produce different results in finite samples as we see here.
12.2: Estimating inverse probability weights via modeling
Now we will estimate the parameters of the marginal structural model using unstabilized inverse probability weights.
To do this with delicatessen
, we will define the corresponding design matrices. Then we will define the stacked estimating equations. Then we will estimate the parameters and covariance using MEstimator
and present the output.
[6]:
# Design matrix for the propensity score model
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
# Design matrix for the marginal structural model
msm = dc[['I', 'qsmk']]
# treatment variable
a = dc['qsmk']
# outcome variable
y = dc['wt82_71']
[7]:
def psi_ipw_msm1(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[0:2]
beta = theta[2:]
# Estimating the propensity scores
ee_ps = ee_regression(theta=beta, # Estimate propensity scores
X=W, y=a, # ... given observed A,W
model='logistic') # ... with logit model
pi = inverse_logit(np.dot(W, beta)) # Get Pr(A = 1 | W)
ipw = 1 / np.where(a == 1, pi, 1-pi) # Convert to IPW
# Estimating the MSM using a weighted linear model
ee_msm = ee_regression(theta=alpha, # MSM parameters
X=msm, y=y, # ... observed data
model='linear', # ... with linear model
weights=ipw) # ... but weighted by IPW
# Stacking the estimating equations and returning
return np.vstack([ee_msm, ee_ps])
[8]:
# M-estimator
init_vals = [0., 0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[9]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[0:2, :].T)
[1.77997819 3.44053543]
95% CI
[[1.35249881 2.48589079]
[2.20745757 4.39518006]]
which are the estimates of the \(\alpha_0\) and \(\alpha_1\) parameters. The book provides the point estimate for qsmk
(\(\hat{\alpha}_1\)) as 3.4 (95% CI: 2.4, 4.5).
The point estimates presented here are the same as the book. However, the confidence intervals differ slightly. The confidence intervals in the book use the ‘GEE trick’ which provides overly conservative confidence intervals. With delicatessen
and the sandwich variance, we can estimate the variance without being overly conservative. So, the variance reported above would be preferred over the approach described in the book. This highlights the advantage of M-estimators for computation of
the variance with nuisance parameters (like the IPW estimator when the propensity score is estimated).
Instead of coding this by-hand, we can also use the built-in ee_ipw_msm
function. This function estimates a marginal structural model using inverse probability weights, as done above. Below is how to apply this functionality
[10]:
def psi_ipw_msm1a(theta):
# Built-in estimating equation
return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
distribution='normal',
link='identity')
[11]:
# M-estimator
init_vals = [0., 0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[12]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[0:2, :].T)
[1.77997819 3.44053543]
95% CI
[[1.35249881 2.48589079]
[2.20745757 4.39518006]]
As expected, this built-in functionality produces the same results as the by-hand version.
12.3: Stabilized inverse probability weights
Next, we are going to use stabilized weights. The stabilized weights will require us to estimate an additional parameter. We will accomplish this by stacking an estimating equation for that parameter. This extra estimating equation is for an intercept-only model for the probability of qsmk
.
[13]:
def psi_ipw_msm2(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[0:2] # MSM parameters
gamma = np.array([theta[2], ]) # Numerator parameter
beta = theta[3:] # Propensity score parameters
# Estimating the propensity scores using a logit model
ee_ps = ee_regression(theta=beta, # Propensity score model
X=W, y=a, # ... with observed data
model='logistic') # ... and logit model
pi = inverse_logit(np.dot(W, beta)) # Predicted probability of A=1
# Estimating intercept-only for numerator
ee_num = ee_regression(theta=gamma, # Numerator model
X=dc[['I']], y=a, # ... intercept-only
model='logistic') # ... logit model
num = inverse_logit(np.dot(dc[['I']], gamma)) # Marginal probability
# Construct stabilized weights
ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))
# Estimating the MSM using a weighted linear model
ee_msm = ee_regression(theta=alpha, # MSM
X=msm, y=y, # ... with observed data
model='linear', # ... linear
weights=ipw) # ... weighted by stabilized
# Stacking the estimating equations and returning
return np.vstack([ee_msm, ee_num, ee_ps])
[14]:
# M-estimator
init_vals = [0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm2, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[15]:
ci = estr.confidence_intervals()
print("Intercept:", estr.theta[0])
print("95% CI:", ci[0, :])
print("qsmk:", estr.theta[1])
print("95% CI:", ci[1, :])
Intercept: 1.77997819053316
95% CI: [1.35249875 2.20745763]
qsmk: 3.44053542964726
95% CI: [2.48589062 4.39518024]
The estimate is the same in the previous section. This is expected because as long as the marginal structural model is saturated, the unstabilized and stabilized IPTW should produce the same answer. (note there are some differences out in the smaller decimals places, but this is due to floating point errors, not differences between the estimators).
Note: ee_ipw_msm
does not support the computation of stabilized weights (results are equivalent in this setting, so we can be more computationally efficient by opting for the unstabilized weights).
12.4: Marginal structural models
Now we will consider the IPW estimator for a continuous action. We will look at smokeintensity
on wt82_71
.
[16]:
# Restricting data by smoking intensity
ds = dc.loc[dc['smokeintensity'] <= 25].copy()
# Design matrix for the propensity score model
W = ds[['I', 'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
# Design matrix for the marginal structural model
ds['smkint_sq'] = ds['smkintensity82_71']**2
msm = ds[['I', 'smkintensity82_71', 'smkint_sq']]
# Treatment array
a = ds['smkintensity82_71']
# Outcome array
y = ds['wt82_71']
[17]:
def psi_ipw_msm3(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[0:3] # Marginal structural model
gamma = np.array([theta[3], ]) # Numerator
beta = theta[4:] # Propensity score model
div_ps = W.shape[0] - len(beta) # Divisor for PS SD
div_nm = W.shape[0] - len(gamma) # Divisor for Num SD
# Estimating the propensity scores using a logit model
ee_ps = ee_regression(theta=beta, # Generalized PS model
X=W, y=a, # ... for observed data
model='linear') # ... linear regression
mu = np.dot(W, beta) # Predicted values
mu_resid = np.sum((a - mu)**2) / div_ps # Standard deviation
fAL = sp.stats.norm.pdf(a, mu, # PDF from normal
np.sqrt(mu_resid)) # ... with estimates
# Estimating intercept-only for numerator
ee_num = ee_regression(theta=gamma, # Numerator for stabilized
X=ds[['I', ]], y=a, # ... for observed data
model='linear') # ... linear regression
num = np.dot(ds[['I', ]], gamma) # Predicted values
num_resid = np.sum((a - num)**2) / div_nm # Standard deviation
fA = sp.stats.norm.pdf(a, num, # PDF from normal
np.sqrt(num_resid)) # ... with estimates
# Stabilized weights
ipw = fA / fAL
# Estimating the MSM using a weighted linear model
ee_msm = ee_regression(theta=alpha, # Marginal structural model
X=msm, y=y, # ... observed data
model='linear', # ... linear model
weights=ipw) # ... weighted by IPW
# Stacking the estimating equations and returning
return np.vstack([ee_msm, ee_num, ee_ps])
[18]:
# M-estimator
init_vals = [0., 0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm3, init=init_vals)
estr.estimate(solver='hybr', tolerance=1e-12, maxiter=5000)
[19]:
ci = estr.confidence_intervals()
print(estr.theta[0:3])
print("95% CI:")
print(ci[0:3, :].T)
[ 2.00452474 -0.10898888 0.00269494]
95% CI:
[[ 1.44058085e+00 -1.66863862e-01 -1.75081112e-03]
[ 2.56846862e+00 -5.11138978e-02 7.14069266e-03]]
The book reports coefficients of: 2.005, −0.109, 0.003. This matches the output shown above.
As done in the book, we want to know the weight change for no change in smoking intensity and a +20 in smoking intensity. Rather than add corresponding estimation equations for those parts, we can directly manipulate the point and covariance estimates to get this. The function regression_predictions
will do this for us given the estimated parameters and the values of interest
[20]:
# Creating dataframe for combinations to predict
p = pd.DataFrame()
p['smkintensity82_71'] = [0, 20]
p['smkint_sq'] = [0**2, 20**2]
p['I'] = 1
vals = np.asarray(p[['I', 'smkintensity82_71', 'smkint_sq']])
# Getting predicted values and variance for combinations
pred_y = regression_predictions(vals,
estr.theta[0:3],
estr.variance[0:3, 0:3])
# Displaying results
print("No change in smoking")
print(pred_y[0, 0])
print("95% CI:", pred_y[0, 2:])
print()
print("+20 smoking intensity")
print(pred_y[1, 0])
print("95% CI:", pred_y[1, 2:])
No change in smoking
2.004524735075991
95% CI: [1.44058085 2.56846862]
+20 smoking intensity
0.9027234481603187
95% CI: [-1.49966211 3.305109 ]
Again, we get similar results to those reported in the book: 2.0 (95% CI: 1.4, 2.6) and 0.9 (95% CI: −1.7, 3.5). However, our confidence intervals are slightly more narrow. Again this would be expected since we are using a variance estimator that is not overly conservative.
Note: ee_ipw_msm
does not support non-binary treatments. Allowing for generic distributions for the generalized propensity score model is difficult. So, you will need to implement them by-hand (using a similar approach to above).
Binary outcome
We now repeat the process, but for a binary outcome and treatment. We will use a GLM with the binomial distribution and logistic link (this will estimate the causal odds ratio). This is avaible in ee_glm
. We will also be using qsmk
again.
[21]:
# Design matrix for propensity scores
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
msm = dc[['I', 'qsmk']]
# Treatment array
a = dc['qsmk']
# Outcome array
y = dc['death']
[22]:
def psi_ipw_msm4(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[0:2] # Marginal structural model
gamma = np.array([theta[2], ]) # Numerator model
beta = theta[3:] # Propensity score model
# Estimating the propensity scores using a logit model
ee_ps = ee_regression(theta=beta, # Propensity score
X=W, y=a, # ... observed data
model='logistic') # ... logistic model
pi = inverse_logit(np.dot(W, beta)) # Predicted probability
# Estimating intercept-only for numerator
ee_num = ee_regression(theta=gamma, # Numerator model
X=dc[['I']], y=a, # ... observed data
model='logistic') # ... logit model
num = inverse_logit(np.dot(dc[['I']], gamma)) # Predicted prob
# Stabilized inverse probability weights
ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))
# Estimating the MSM using a weighted linear model
ee_msm = ee_glm(theta=alpha, # MSM
X=msm, y=y, # ... observed data
link='logit', # ... logit link
distribution='binomial', # ... binomial dist
weights=ipw) # ... weighted
# Stacking the estimating equations and returning
return np.vstack([ee_msm, ee_num, ee_ps])
[23]:
# M-estimator
init_vals = [0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm4, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[24]:
ci = estr.confidence_intervals()
print("qsmk:", np.exp(estr.theta[1]))
print("95% CI:", np.exp(ci[1, :]))
qsmk: 1.030577892676402
95% CI: [0.78938458 1.34546687]
The previous results are for the causal odds ratio. Again, they are similar to the book with slight differences in the confidence intervals (i.e., 1.0; 95% CI: 0.8, 1.4).
We can replicate this approach using ee_ipw_msm
. To simplify the process, I am going to use the previous propensity score model parameters as the starting values.
[25]:
init_ps = list(estr.theta[3:])
[26]:
def psi_ipw_msm4a(theta):
# Built-in estimating equation
return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
distribution='binomial',
link='logit')
[27]:
# M-estimator
init_vals = [0., 0., ] + init_ps
estr = MEstimator(psi_ipw_msm4a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[28]:
ci = estr.confidence_intervals()
print(np.exp(estr.theta[0:2]))
print("95% CI")
print(np.exp(ci[0:2, :]).T)
[0.22527109 1.030578 ]
95% CI
[[0.19459809 0.78938459]
[0.26077884 1.34546713]]
Which are the same (exponentiated) coefficients as the previous approach. There is a slight discrepancy you may notice in the confidence intervals, but this is a floating point error resulting from a difference between the stabilized weights (by-hand) and the unstabilized weights (ee_ipw_msm
).
12.5: Effect modification and marginal structural models
We will now use marginal structural models to study effect measure modification. We will look at effect modification by sex (sex
) of quitting smoking (qsmk
) on 10-year weight change (wt82_71
).
[29]:
# Design matrix for propensity scores
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
dc['qsmk_sex'] = dc['qsmk']*dc['sex']
msm = dc[['I', 'qsmk', 'sex', 'qsmk_sex']]
# Treatment array
a = dc['qsmk']
# Outcome array
y = dc['wt82_71']
[30]:
def psi_ipw_msm5(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[0:4] # Marginal structural model
gamma = np.array([theta[4], ]) # Numerator parameter
beta = theta[5:] # Propensity score
# Estimating the propensity scores using a logit model
ee_ps = ee_regression(theta=beta, # Propensity score
X=W, y=a, # ... observed data
model='logistic') # ... logit model
pi = inverse_logit(np.dot(W, beta)) # Predicted prob
# Estimating intercept-only for numerator
ee_num = ee_regression(theta=gamma, # Numerator model
X=dc[['I']], y=a, # ... observed data
model='logistic') # ... logit model
num = inverse_logit(np.dot(dc[['I']], gamma)) # Predicted prob
# Stabilized inverse probability weights
ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))
# Estimating the MSM using a weighted linear model
ee_msm = ee_regression(theta=alpha, # Marginal structural model
X=msm, y=y, # ... observed data
model='linear', # ... linear model
weights=ipw) # ... weighted
# Stacking the estimating equations and returning
return np.vstack([ee_msm, ee_num, ee_ps])
[31]:
# M-estimator
init_vals = [0., 0., 0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm5, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[32]:
ci = estr.confidence_intervals()
print(estr.theta[0:4])
print("95% CI")
print(ci[0:4, :].T)
[ 1.78444688 3.52197763 -0.00872478 -0.15947852]
95% CI
[[ 1.19225612 2.28240175 -0.88125781 -2.15657223]
[ 2.37663763 4.76155352 0.86380824 1.83761518]]
While not reported in the book, other online references report the folloiwing coefficients: 1.7844, 3.5220, -0.0087, -0.1595.
As before, we can replicate this result using the built-in ee_ipw_msm
function.
[33]:
def psi_ipw_msm5a(theta):
# Built-in estimating equation
return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
distribution='normal',
link='identity')
[34]:
# M-estimator
init_vals = [0., ]*msm.shape[1] + init_ps
estr = MEstimator(psi_ipw_msm5a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[35]:
ci = estr.confidence_intervals()
print(estr.theta[0:4])
print("95% CI")
print(ci[0:4, :].T)
[ 1.78444688 3.52197763 -0.00872478 -0.15947852]
95% CI
[[ 1.19225614 2.28240246 -0.88125778 -2.15657182]
[ 2.37663761 4.7615528 0.86380821 1.83761477]]
which again replicates the by-hand results.
12.6: Censoring and missing data
To conclude, we will now consider the missing outcomes that were ignored earlier. To do this, we will use stabilized inverse probability of missingness weights (IPCW in the book). As we have done so many times already, we will use delicatessen
to stack additional nuisance models together.
[36]:
# Design matrix for propensity score model
W = df[['I', 'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
# Design matrix for missing model
X = df[['I', 'qsmk', 'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
msm = df[['I', 'qsmk']]
# Treatment array
a = df['qsmk']
# Outcome array
y = df['wt82_71']
# Missing indicator
r = np.where(df['wt82_71'].isna(), 0, 1)
[37]:
def psi_ipw_msm6(theta):
# Dividing parameters up for their corresponding estimation equations
alpha = theta[0:2] # MSM
gamma_n = np.array([theta[2], ]) # Numerator PS
beta_n = np.array([theta[3], ]) # Numerator MW
gamma_d = theta[4:4+W.shape[1]] # Propensity score
beta_d = theta[4+W.shape[1]:] # Missing model
# Estimating the propensity scores using a logit model
ee_ps = ee_regression(theta=gamma_d, # Propensity score
X=W, y=a, # ... observed data
model='logistic') # ... logit model
pi_a = inverse_logit(np.dot(W, gamma_d)) # Predicted prob
# Estimating intercept-only for numerator of IPTW
ee_num = ee_regression(theta=gamma_n, # Numerator
X=df[['I']], y=a, # ... observed data
model='logistic') # ... logit model
num_a = inverse_logit(np.dot(df[['I']], gamma_n))
# Estimating the missing scores using a logit model
ee_ms = ee_regression(theta=beta_d, # Missing score
X=X, y=r, # ... observed data
model='logistic') # ... logit model
pi_m = inverse_logit(np.dot(X, beta_d)) # Predicted prob
# Estimating intercept-only for numerator of IPMW
ee_sms = ee_regression(theta=beta_n, # Numerator
X=df[['I']], y=r, # ... observed data
model='logistic') # ... logit model
num_m = inverse_logit(np.dot(df[['I']], beta_n))
# Stabilized inverse probability weights
iptw = np.where(a == 1, num_a/pi_a, (1-num_a)/(1-pi_a))
ipmw = np.where(r == 1, num_m/pi_m, 0)
ipw = iptw*ipmw
# Estimating the MSM using a weighted linear model
ee_msm = ee_regression(theta=alpha, # MSM
X=msm, y=y, # ... observed data
model='linear', # ... linear model
weights=ipw) # ... weighted
# Setting rows with missing Y's as zero (no contribution)
ee_msm = np.nan_to_num(ee_msm, copy=False, nan=0.)
# Stacking the estimating equations and returning
return np.vstack([ee_msm, ee_num, ee_sms, ee_ps, ee_ms])
[38]:
# M-estimator
init_vals = [0., 0., ] + [0., 0., ] + [0.,]*W.shape[1] + [0.,]*X.shape[1]
estr = MEstimator(psi_ipw_msm6, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[39]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[0:2, :].T)
[1.66199003 3.49649333]
95% CI
[[1.22590804 2.54496958]
[2.09807203 4.44801708]]
Here, the book reports 3.5 (95% CI: 2.5, 4.5).
Again, we will replicate this using ee_ipw_msm
instead. To incorporate the missingness weights, we will fit a separate model and then use the optional weights
argument. Below is how this can be done
[40]:
def psi_ipw_msm6a(theta):
# Separating parameters out
alpha = theta[:2+W.shape[1]] # MSM & PS
gamma = theta[2+W.shape[1]:] # Missing score
# Estimating equation for IPMW
ee_ms = ee_regression(theta=gamma, # Missing score
X=X, y=r, # ... observed data
model='logistic') # ... logit model
pi_m = inverse_logit(np.dot(X, gamma)) # Predicted prob
ipmw = r / pi_m # Missing weights
# Estimating equations for MSM and PS
ee_msm = ee_ipw_msm(theta=alpha, y=y, A=a, W=W, V=msm,
distribution='normal',
link='identity',
weights=ipmw)
# Setting rows with missing Y's as zero (no contribution)
ee_msm = np.nan_to_num(ee_msm, copy=False, nan=0.)
# Return the stacked estimating equations
return np.vstack([ee_msm, ee_ms])
[41]:
# M-estimator
init_vals = [0., ]*msm.shape[1] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_ipw_msm6a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[42]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[0:2, :].T)
[1.66199003 3.49649333]
95% CI
[[1.22590805 2.54496955]
[2.09807201 4.4480171 ]]
Again, we are able to replicate the by-hand implementation. This concludes chapter 12.
Chapter 13: Standardization and the Parametric G-Formula
For Chapter 13, the book reviews the g-formula. Unlike IPW, the g-formula relies on modeling the outcome process. The g-computation algorithm estimator is
where \(m\) is a statistical model for \(E[Y | A, W]\) and \(\beta\) are the parameters defining the model. This version is slightly different from the standardization form given in the book, but it is equivalent. Broadly, we apply the g-computation algorithm via (1) estimate an outcome model, (2) predict the outcomes had everyone been assigned \(a\), and (3) compute the mean of those predictions.
13.2: Estimating the mean outcome via modeling
First, we will fit a linear model for wt82_71
conditional on qsmk
and the set of confounding variables. To begin, we will ignore the missing outcomes.
[43]:
# Design matrix for outcome model
X = dc[['I', 'qsmk', 'qsmk_smkint',
'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
# Outcome array
y = dc['wt82_71']
[44]:
def psi_regression(theta):
# Estimating the linear outcome model
return ee_regression(theta=theta,
X=X, y=y,
model='linear')
[45]:
# M-estimator
init_vals = [0., ]*X.shape[1]
estr = MEstimator(psi_regression, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[46]:
print("Regression coefficients")
print(estr.theta)
Regression coefficients
[-1.58816566e+00 2.55959409e+00 4.66628395e-02 -1.43027166e+00
5.60109604e-01 3.59635262e-01 -6.10095538e-03 7.90444028e-01
5.56312403e-01 1.49156952e+00 -1.94977036e-01 4.91364717e-02
-9.90651192e-04 1.34368569e-01 -1.86642949e-03 2.95975357e-01
3.53912774e-01 -9.47569487e-01 -2.61377894e-01 4.55017905e-02
-9.65325105e-04]
To ease application of later steps, we are going to save a list of these optimized values for starting values of the root-finding procedure.
[47]:
init_reg = list(estr.theta)
13.3: Standardizing the mean outcome to the confounder distribution
Now we can apply the g-computation algorithm (a way to evaluate the g-formula). To do this, we are going to create a copy of our data set. In that copy we are going to set qsmk=1
for all observations. We will then save the design matrix. We will then repeat this process for qsmk=0
.
[48]:
# Copy of the data that we will updated qsmk in
dca = dc.copy()
[49]:
# Setting qsmk to 1
dca['qsmk'] = 1
dca['qsmk_smkint'] = dca['qsmk'] * dca['smokeintensity']
# Design matrix from qsmk=1 data
X1 = dca[['I', 'qsmk', 'qsmk_smkint',
'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
[50]:
# Setting qsmk to 0
dca['qsmk'] = 0
dca['qsmk_smkint'] = dca['qsmk'] * dca['smokeintensity']
# Design matrix from qsmk=0 data
X0 = dca[['I', 'qsmk', 'qsmk_smkint',
'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']]
Now, we will use the design matrices to generate predicted outcomes (pseudo-outcomes) for each individual. The average causal effect can then be calculated from those pseudo-outcomes.
The following is the corresponding estimating equations
[51]:
def psi_gcomp1(theta):
# Dividing parameters into corresponding estimation equations
rd, r1, r0 = theta[0], theta[1], theta[2] # Causal means
beta = theta[3:] # Outcome model
# Estimating the linear model
ee_reg = ee_regression(theta=beta, # Outcome model
X=X, y=y, # ... observed data
model='linear') # ... linear model
# Generating pseudo-outcome using the model
y1hat = np.dot(X1, beta) # Predicted Y when qsmk=1
y0hat = np.dot(X0, beta) # Predicted Y when qsmk=0
# Causal means
ee_r1 = y1hat - r1 # Causal mean for qsmk=1
ee_r0 = y0hat - r0 # Causal mean for qsmk=0
# Average causal effect
ee_rd = np.ones(y.shape[0]) * ((r1 - r0) - rd)
# Stacking the estimating equations and returning
return np.vstack([ee_rd, ee_r1, ee_r0, ee_reg])
[52]:
# M-estimator
init_vals = [0., 0., 0.,] + init_reg
estr = MEstimator(psi_gcomp1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[53]:
ci = estr.confidence_intervals()
print(estr.theta[0:3])
print("95% CI")
print(ci[0:3, :].T)
[3.5173742 5.27358732 1.75621312]
95% CI
[[2.58132997 4.42099134 1.33029619]
[4.45341844 6.1261833 2.18213004]]
The first estimate is for the average causal effect (the second are the causal means under all quit smoking and all don’t quit smoking). Here, the confidence intervals match the book, but we were able to avoid the computational complexity of the bootstrap (another advantage of the sandwich variance estimator).
In the book, they report 3.5 (95% CI: 2.6, 4.5). The confidence interval in the book was generated via the bootstrap, so there is potential for random error in the estimates.
Rather than implement g-computation by-hand, we can also use the built-in estimating equations. Here is an example of that
[54]:
def psi_gcomp1a(theta):
# Built-in g-formula estimating equation
return ee_gformula(theta,
y=y, X=X,
X1=X1, X0=X0)
[55]:
# M-estimator
init_vals = [0., 0., 0.,] + init_reg
estr = MEstimator(psi_gcomp1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[56]:
ci = estr.confidence_intervals()
print(estr.theta[0:3])
print("95% CI")
print(ci[0:3, :].T)
[3.5173742 5.27358732 1.75621312]
95% CI
[[2.58132997 4.42099134 1.33029619]
[4.45341844 6.1261833 2.18213004]]
which provides the same answers (as we would expect!)
Chapter 14: G-Estimation of Structural Nested Models
G-estimation differs from the previous approaches in what parameter it targets. Rather than the average causal effect, we will estimate the parameters of a structural nested model. The structural nested mean model is
Here, \(\varphi_0\) represents the difference by \(a\) (which is equivalent to average causal effect). However, we can also study effect measure modification easily with structural nested models. Consider
where \(V \in W\). Therefore, the structural nested model described effect measure modification by \(V\). Importantly, structural nested models assume that we have correctly specified this model (hence the difference in interest parameters that occurs between methods).
G-estimation is a bit less straightforward to understand compare to the previous methods. Note that we will be using the propensity score for estimation and we still assume the same identification conditions. See the book for a much more in-depth discussion.
14.5 G-estimation
For solve our g-estimator, we are going to use a different approach than the one described in the main text of the book. Instead, we are going to use the approach (the estimating equations) described in Technical Point 14.2. As mentioned there, this approach is equivalent to the process described in the main text.
[57]:
# Design matrix for propensity scores
W = np.asarray(df[['I', 'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']])
# Design matrix for missing model
X = np.asarray(df[['I', 'qsmk', 'sex', 'race', 'age', 'age_sq',
'educ_2', 'educ_3', 'educ_4', 'educ_5',
'smokeintensity', 'smokeintensity_sq',
'smokeyrs', 'smokeyrs_sq',
'exer_1', 'exer_2', 'active_1', 'active_2',
'wt71', 'wt71_sq']])
# Design matrix for structural nested model
snm = np.asarray(df[['I', ]])
# Treatment array
a = np.asarray(df['qsmk'])
# Outcome array
y = np.asarray(df['wt82_71'])
# Missing indicator
r = np.where(df['wt82_71'].isna(), 0, 1)
[58]:
def psi_snm1(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[:snm.shape[1]] # SNM parameters
beta = theta[snm.shape[1]: # Propensity score
snm.shape[1]+W.shape[1]]
gamma = theta[snm.shape[1]+W.shape[1]:] # Missing score
# Estimating equation for IPMW
ee_ms = ee_regression(theta=gamma, # Missing score
X=X, y=r, # ... observed data
model='logistic') # ... logit model
pi_m = inverse_logit(np.dot(X, gamma)) # Predicted prob
ipmw = r / pi_m # Missing weights
# Estimating equations for PS
ee_log = ee_regression(theta=beta, # Propensity score
X=W, y=a, # ... observed data
model='logistic', # ... logit model
weights=ipmw) # ... weighted
pi = inverse_logit(np.dot(W, beta)) # Predicted prob
# H(psi) equation for linear models
h_psi = y - np.dot(snm*a[:, None], alpha)
# Estimating equation for the structural nested mean model
ee_snm = ipmw*(h_psi[:, None] * (a - pi)[:, None] * snm).T
# Setting rows with missing Y's as zero (no contribution)
ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)
return np.vstack([ee_snm, ee_log, ee_ms])
[59]:
# M-estimator
init_vals = [0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[60]:
ci = estr.confidence_intervals()
print(estr.theta[0:1])
print("95% CI")
print(ci[0, :])
[3.4458988]
95% CI
[2.52709776 4.36469984]
The book reported 3.4 (95% CI: 2.5, 4.5). This confidence interval was generated by inverting the test results. As such, there is expected differences (different estimators are being used). As in the inverse probability weighting examples, these confidence intervals are expected due to be conservative due to the use of the ‘GEE trick’.
Now consider how we can use the built-in g-estimation functionality, ee_gestimation_snmm
.
[61]:
def psi_snm1a(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[:snm.shape[1]+W.shape[1]] # SNM and PS
gamma = theta[snm.shape[1]+W.shape[1]:] # Missing scores
# Estimating equation for IPMW
ee_ms = ee_regression(theta=gamma, # Missing score
X=X, y=r, # ... observed data
model='logistic') # ... logit model
pi_m = inverse_logit(np.dot(X, gamma)) # Predicted prob
ipmw = r / pi_m # Missing weights
# Estimating equations for PS
ee_snm = ee_gestimation_snmm(theta=alpha,
y=y, A=a,
W=W, V=snm,
weights=ipmw)
# Setting rows with missing Y's as zero (no contribution)
ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)
return np.vstack([ee_snm, ee_ms])
[62]:
# M-estimator
init_vals = [0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[63]:
ci = estr.confidence_intervals()
print(estr.theta[0:1])
print("95% CI")
print(ci[0, :])
[3.4458988]
95% CI
[2.52709776 4.36469984]
The structural nested model parameters match between implementations, as expected.
14.6: Structural nested models with two or more parameters
We now adapt the previous code to consider more than 2 parameters in the structural nested model. Thankfully, this is pretty straightforward for how we coded the estimating equations since snm
is a global object.
[64]:
snm = np.asarray(df[['I', 'smokeintensity']])
[65]:
def psi_snm2(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[:snm.shape[1]] # SNM parameters
beta = theta[snm.shape[1]: # Propensity score
snm.shape[1]+W.shape[1]]
gamma = theta[snm.shape[1]+W.shape[1]:] # Missing score
# Estimating equation for IPMW
ee_ms = ee_regression(theta=gamma, # Missing score
X=X, y=r, # ... observed data
model='logistic') # ... logit model
pi_m = inverse_logit(np.dot(X, gamma)) # Predicted prob
ipmw = r / pi_m # Missing weights
# Estimating equations for PS
ee_log = ee_regression(theta=beta, # Propensity score
X=W, y=a, # ... observed data
model='logistic', # ... logit model
weights=ipmw) # ... weighted
pi = inverse_logit(np.dot(W, beta)) # Predicted prob
# H(psi) equation for linear models
h_psi = y - np.dot(snm*a[:, None], alpha)
# Estimating equation for the structural nested mean model
ee_snm = ipmw*(h_psi[:, None] * (a - pi)[:, None] * snm).T
# Setting rows with missing Y's as zero (no contribution)
ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)
return np.vstack([ee_snm, ee_log, ee_ms])
[66]:
# M-estimator
init_vals = [0., 0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm2, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[67]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[:2, :].T)
[2.85947039 0.03004128]
95% CI
[[ 1.0291672 -0.05951923]
[ 4.68977357 0.11960179]]
which provides the same results as the book (2.86 and 0.03). Unlike the book, we provide confidence intervals. Inverting the test is not simple in this case, and the book does not use a bootstrapping procedure. Our confidence intervals are computed using the sandwich variance estimator.
Again, we can apply the built-in estimating equations for g-estimation. In truth, the new estimating equation is the same as the previous g-estimation version (since snm
is a global object as called). Regardless, we provide again here.
[68]:
def psi_snm2a(theta):
# Dividing parameters into corresponding estimation equations
alpha = theta[:snm.shape[1]+W.shape[1]] # SNM and PS
gamma = theta[snm.shape[1]+W.shape[1]:] # Missing scores
# Estimating equation for IPMW
ee_ms = ee_regression(theta=gamma, # Missing score
X=X, y=r, # ... observed data
model='logistic') # ... logit model
pi_m = inverse_logit(np.dot(X, gamma)) # Predicted prob
ipmw = r / pi_m # Missing weights
# Estimating equations for PS
ee_snm = ee_gestimation_snmm(theta=alpha,
y=y, A=a,
W=W, V=snm,
weights=ipmw)
# Setting rows with missing Y's as zero (no contribution)
ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)
return np.vstack([ee_snm, ee_ms])
[69]:
# M-estimator
init_vals = [0., 0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm2a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)
[70]:
ci = estr.confidence_intervals()
print(estr.theta[0:2])
print("95% CI")
print(ci[:2, :].T)
[2.85947039 0.03004128]
95% CI
[[ 1.0291672 -0.05951923]
[ 4.68977357 0.11960179]]
This concludes chapter 14.
Replication of the following chapters 15-17 are not yet available.