Hernan & Robins (2023): Causal Inference with Models

The following section replicates selected examples from the textbook “Causal Inference: What If” by Hernan and Robins. These replications focus on Part II of the textbook. I recommend reading Part I prior to looking through the following code. It is a great, approachable, and freely available resource on causal inference.

Here, we demonstrate application of delicatessen for causal inference. Throughout, we use the empirical sandwich variance estimator. This is not described in the textbook, but is an alternative to the bootstrap. Importantly, it is a consistent estimator of the variance that is computationally simpler (in terms of the computers computational effort). delicatessen automates the whole procedure, so it is even simpler for us.

Broadly, interest will be in estimating the average causal effect of stopping smoking (variable name: qsmk) on 10-year weight change (variable name: wt82_71). If we let \(Y^a\) indicate the potential weight change under smoking status \(a\), then the average causal effect can be written as

\[E[Y^1] - E[Y^0]\]

Herafter, we assume that the interest parameter is identified (see the book for details on what this means). Our focus will be on estimators described in the book for this quantity (or related ones).

Setup

[1]:

import numpy as np
import scipy as sp
import pandas as pd

import delicatessen as deli
from delicatessen import MEstimator
from delicatessen.estimating_equations import (ee_regression, ee_glm, ee_ipw_msm,
                                               ee_gformula, ee_aipw,
                                               ee_gestimation_snmm,
                                               ee_iv_causal, ee_2sls)
from delicatessen.utilities import inverse_logit, regression_predictions

print("Versions")
print("NumPy:        ", np.__version__)
print("SciPy:        ", sp.__version__)
print("pandas:       ", pd.__version__)
print("Delicatessen: ", deli.__version__)

Versions
NumPy:         2.3.5
SciPy:         1.16.3
pandas:        2.3.3
Delicatessen:  4.1

Data for these replications is available at the following Harvard School of Public Health website. Data comes from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study (NHEFS). The NHEFS was jointly initiated by the National Center for Health Statistics and the National Institute on Aging in collaboration with other agencies of the United States Public Health Service. A detailed description of the NHEFS, together with publicly available data sets and documentation, can be found at wwwn.cdc.gov/nchs/nhanes/nhefs/

The data set used in the book and this tutorial is a subset of the full NHEFS. First, we will load the data and run some basic variable manipulations.

[2]:

df = pd.read_csv('data/nhefs.csv')
df.dropna(subset=['sex', 'age', 'race', 'ht',
                  'school', 'alcoholpy', 'smokeintensity'],
         inplace=True)

# recoding some variables
df['inactive'] = np.where(df['active'] == 2, 1, 0)
df['no_exercise'] = np.where(df['exercise'] == 2, 1, 0)
df['university'] = np.where(df['education'] == 5, 1, 0)

# Subsetting only variables of interest
df = df[['wt82_71', 'qsmk', 'sex', 'age', 'race', 'wt71', 'wt82', 'ht',
         'school', 'alcoholpy', 'smokeintensity', 'smokeyrs', 'smkintensity82_71',
         'education', 'exercise', 'active', 'death']]

# creating quadratic terms
for col in ['age', 'wt71', 'smokeintensity', 'smokeyrs']:
    df[col+'_sq'] = df[col] * df[col]

df['I'] = 1

# Indicator terms
df['educ_2'] = np.where(df['education'] == 2, 1, 0)
df['educ_3'] = np.where(df['education'] == 3, 1, 0)
df['educ_4'] = np.where(df['education'] == 4, 1, 0)
df['educ_5'] = np.where(df['education'] == 5, 1, 0)
df['exer_1'] = np.where(df['exercise'] == 1, 1, 0)
df['exer_2'] = np.where(df['exercise'] == 2, 1, 0)
df['active_1'] = np.where(df['active'] == 1, 1, 0)
df['active_2'] = np.where(df['active'] == 2, 1, 0)

# Interaction terms
df['qsmk_smkint'] = df['qsmk'] * df['smokeintensity']

# Complete-case data
dc = df.dropna(subset=['wt82_71']).copy()

Chapter 12: IP weighting and marginal structural models

The first estimation approach is inverse probability weighting. Inverse probability weights are defined as

\[\frac{1}{\Pr(A=a | W)}\]

where \(W\) is the set of confounders. We will estimate these weights and then use them to estimate the parameters of a marginal structural model. An example of a marginal structural model is

\[E[Y^a] = \alpha_0 + \alpha_1 a\]

where \(\alpha\) are the parameters to estimate. Here, \(\alpha_1\) represents the average causal effect. We will estimate the marginal structural model using the observed data and a regression model weighted by the inverse probability weights (see the books for details).

12.1: The causal question

Chapter 12 starts out with estimating the crude association between qsmk and wt82_71 (12.1). We will do this by fitting a linear regression model with delicatessen.

[3]:

def psi_ols(theta):
    return ee_regression(theta=theta, X=dc[['I', 'qsmk']],
                         y=dc['wt82_71'], model='linear')

[4]:

estr = MEstimator(psi_ols, init=[0., 0.])
estr.estimate()

[5]:

estr.print_results(subset=[1, ])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:              2
Solving algorithm:         lm | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    2.54     0.49     5.22     1.59     3.49     0.00    22.39
==============================================================

The book reports an unadjusted estimate of 2.5 (95% CI: 1.7, 3.4)

You may notice that the confidence interval differs slightly. That is because we are using the sandwich variance (the book uses a different approach). While the variance estimators used here and in the book are expected to be asymptotically equal (i.e., equal as \(n\) goes to \(\infty\)), they can produce different results in finite samples as we see here.

12.2: Estimating inverse probability weights via modeling

Now we will estimate the parameters of the marginal structural model using unstabilized inverse probability weights.

To do this with delicatessen, we will define the corresponding design matrices. Then we will define the stacked estimating equations. Then we will estimate the parameters and covariance using MEstimator and present the output.

[6]:

# Design matrix for the propensity score model
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for the marginal structural model
msm = dc[['I', 'qsmk']]
# treatment variable
a = dc['qsmk']
# outcome variable
y = dc['wt82_71']

[7]:

def psi_ipw_msm1(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:2]
    beta = theta[2:]

    # Estimating the propensity scores
    ee_ps = ee_regression(theta=beta,        # Estimate propensity scores
                          X=W, y=a,          # ... given observed A,W
                          model='logistic')  # ... with logit model
    pi = inverse_logit(np.dot(W, beta))      # Get Pr(A = 1 | W)
    ipw = 1 / np.where(a == 1, pi, 1-pi)     # Convert to IPW

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,      # MSM parameters
                           X=msm, y=y,       # ... observed data
                           model='linear',   # ... with linear model
                           weights=ipw)      # ... but weighted by IPW

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_ps])

[8]:

# M-estimator
init_vals = [0., 0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[9]:

estr.print_results(subset=[0, 1])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             21
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    1.78     0.22     8.16     1.35     2.21     0.00    51.42
    3.44     0.49     7.06     2.49     4.40     0.00    39.17
==============================================================

which are the estimates of the \(\alpha_0\) and \(\alpha_1\) parameters. The book provides the point estimate for qsmk (\(\hat{\alpha}_1\)) as 3.4 (95% CI: 2.4, 4.5).

The point estimates presented here are the same as the book. However, the confidence intervals differ slightly. The confidence intervals in the book use the ‘GEE trick’, treating the weights as known in the weighted linear model. The GEE trick provides overly conservative confidence intervals. With delicatessen and the sandwich variance, we can estimate the variance without being overly conservative. So, the variance reported above would be preferred over the approach described in the book. This highlights the advantage of M-estimators for computation of the variance with nuisance parameters (like the IPW estimator when the propensity score is estimated).

Instead of coding this by-hand, we can also use the built-in ee_ipw_msm function. This function estimates a marginal structural model using inverse probability weights, as done above. Below is how to apply this functionality

[10]:

def psi_ipw_msm1a(theta):
    # Built-in estimating equation
    return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
                      distribution='normal',
                      link='identity')

[11]:

# M-estimator
init_vals = [0., 0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[12]:

estr.print_results(subset=[0, 1])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             21
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    1.78     0.22     8.16     1.35     2.21     0.00    51.42
    3.44     0.49     7.06     2.49     4.40     0.00    39.17
==============================================================

As expected, this built-in functionality produces the same results as the by-hand version.

12.3: Stabilized inverse probability weights

Next, we are going to use stabilized weights. The stabilized weights will require us to estimate an additional parameter. We will accomplish this by stacking an estimating equation for that parameter. This extra estimating equation is for an intercept-only model for the probability of qsmk.

[13]:

def psi_ipw_msm2(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:2]                 # MSM parameters
    gamma = np.array([theta[2], ])     # Numerator parameter
    beta = theta[3:]                   # Propensity score parameters

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=beta,        # Propensity score model
                          X=W, y=a,          # ... with observed data
                          model='logistic')  # ... and logit model
    pi = inverse_logit(np.dot(W, beta))      # Predicted probability of A=1

    # Estimating intercept-only for numerator
    ee_num = ee_regression(theta=gamma,            # Numerator model
                           X=dc[['I']], y=a,       # ... intercept-only
                           model='logistic')       # ... logit model
    num = inverse_logit(np.dot(dc[['I']], gamma))  # Marginal probability

    # Construct stabilized weights
    ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,     # MSM
                           X=msm, y=y,      # ... with observed data
                           model='linear',  # ... linear
                           weights=ipw)     # ... weighted by stabilized

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_ps])

[14]:

# M-estimator
init_vals = [0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm2, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[15]:

estr.print_results(subset=[0, 1])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             22
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    1.78     0.22     8.16     1.35     2.21     0.00    51.42
    3.44     0.49     7.06     2.49     4.40     0.00    39.17
==============================================================

The estimate is the same in the previous section. This is expected because as long as the marginal structural model is saturated, the unstabilized and stabilized IPTW should produce the same answer. (Note there are some differences out in the smaller decimals places, but this is due to floating point errors, not differences between the estimators).

Note: ee_ipw_msm does not support the computation of stabilized weights (results are equivalent in this setting, so we can be more computationally efficient by opting for the unstabilized weights).

12.4: Marginal structural models

Now we will consider the IPW estimator for a continuous action. We will look at smokeintensity on wt82_71.

[16]:

# Restricting data by smoking intensity
ds = dc.loc[dc['smokeintensity'] <= 25].copy()

# Design matrix for the propensity score model
W = ds[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for the marginal structural model
ds['smkint_sq'] = ds['smkintensity82_71']**2
msm = ds[['I', 'smkintensity82_71', 'smkint_sq']]
# Treatment array
a = ds['smkintensity82_71']
# Outcome array
y = ds['wt82_71']

[17]:

def psi_ipw_msm3(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:3]               # Marginal structural model
    gamma = np.array([theta[3], ])   # Numerator
    beta = theta[4:]                 # Propensity score model
    div_ps = W.shape[0] - len(beta)  # Divisor for PS SD
    div_nm = W.shape[0] - len(gamma) # Divisor for Num SD

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=beta,           # Generalized PS model
                          X=W, y=a,             # ... for observed data
                          model='linear')       # ... linear regression
    mu = np.dot(W, beta)                        # Predicted values
    mu_resid = np.sum((a - mu)**2) / div_ps     # Standard deviation
    fAL = sp.stats.norm.pdf(a, mu,              # PDF from normal
                            np.sqrt(mu_resid))  # ... with estimates

    # Estimating intercept-only for numerator
    ee_num = ee_regression(theta=gamma,         # Numerator for stabilized
                           X=ds[['I', ]], y=a,  # ... for observed data
                           model='linear')      # ... linear regression
    num = np.dot(ds[['I', ]], gamma)            # Predicted values
    num_resid = np.sum((a - num)**2) / div_nm   # Standard deviation
    fA = sp.stats.norm.pdf(a, num,              # PDF from normal
                           np.sqrt(num_resid))  # ... with estimates

    # Stabilized weights
    ipw = fA / fAL

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,     # Marginal structural model
                           X=msm, y=y,      # ... observed data
                           model='linear',  # ... linear model
                           weights=ipw)     # ... weighted by IPW

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_ps])

[18]:

# M-estimator
init_vals = [0., 0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm3, init=init_vals)
estr.estimate(solver='hybr', tolerance=1e-12, maxiter=5000)

[19]:

estr.print_results(subset=[0, 1, 2], decimals=3)

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1162 | No. Parameters:             23
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-12 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
   2.005    0.288    6.967    1.441    2.568    0.000   38.165
  -0.109    0.030   -3.691   -0.167   -0.051    0.000   12.128
   0.003    0.002    1.188   -0.002    0.007    0.235    2.091
==============================================================

The book reports coefficients of: 2.005, −0.109, 0.003. This matches the output shown above.

As done in the book, we want to know the weight change for no change in smoking intensity and a +20 in smoking intensity. Rather than add corresponding estimation equations for those parts, we can directly manipulate the point and covariance estimates to get this. The function regression_predictions will do this for us given the estimated parameters and the values of interest

[20]:

# Creating dataframe for combinations to predict
p = pd.DataFrame()
p['smkintensity82_71'] = [0, 20]
p['smkint_sq'] = [0**2, 20**2]
p['I'] = 1
vals = np.asarray(p[['I', 'smkintensity82_71', 'smkint_sq']])

# Getting predicted values and variance for combinations
pred_y = regression_predictions(vals,
                                estr.theta[0:3],
                                estr.variance[0:3, 0:3])

# Displaying results
print("No change in smoking")
print(pred_y[0, 0])
print("95% CI:", pred_y[0, 2:])
print()
print("+20 smoking intensity")
print(pred_y[1, 0])
print("95% CI:", pred_y[1, 2:])

No change in smoking
2.004524735075372
95% CI: [1.44058093 2.56846854]

+20 smoking intensity
0.9027234481660886
95% CI: [-1.49966123  3.30510813]

Again, we get similar results to those reported in the book: 2.0 (95% CI: 1.4, 2.6) and 0.9 (95% CI: −1.7, 3.5). However, our confidence intervals are slightly more narrow. Again this would be expected since we are using a variance estimator that is not overly conservative.

Note: ee_ipw_msm does not support non-binary treatments. Allowing for generic distributions for the generalized propensity score model is difficult. So, you will need to implement them by-hand (using a similar approach to above).

Binary outcome

We now repeat the process, but for a binary outcome and treatment. We will use a GLM with the binomial distribution and logistic link (this will estimate the causal odds ratio). This is avaible in ee_glm. We will also be using qsmk again.

[21]:

# Design matrix for propensity scores
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
msm = dc[['I', 'qsmk']]
# Treatment array
a = dc['qsmk']
# Outcome array
y = dc['death']

[22]:

def psi_ipw_msm4(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:2]              # Marginal structural model
    gamma = np.array([theta[2], ])  # Numerator model
    beta = theta[3:]                # Propensity score model

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=beta,        # Propensity score
                          X=W, y=a,          # ... observed data
                          model='logistic')  # ... logistic model
    pi = inverse_logit(np.dot(W, beta))      # Predicted probability

    # Estimating intercept-only for numerator
    ee_num = ee_regression(theta=gamma,           # Numerator model
                           X=dc[['I']], y=a,      # ... observed data
                           model='logistic')      # ... logit model
    num = inverse_logit(np.dot(dc[['I']], gamma)) # Predicted prob

    # Stabilized inverse probability weights
    ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_glm(theta=alpha,               # MSM
                    X=msm, y=y,                # ... observed data
                    link='logit',              # ... logit link
                    distribution='binomial',   # ... binomial dist
                    weights=ipw)               # ... weighted

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_ps])

[23]:

# M-estimator
init_vals = [0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm4, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[24]:

ci = estr.confidence_intervals()
print("qsmk:", np.exp(estr.theta[1]))
print("95% CI:", np.exp(ci[1, :]))

qsmk: 1.0305778926815938
95% CI: [0.78938459 1.34546684]

The previous results are for the causal odds ratio. Again, they are similar to the book with slight differences in the confidence intervals (i.e., 1.0; 95% CI: 0.8, 1.4).

We can replicate this approach using ee_ipw_msm. To simplify the process, I am going to use the previous propensity score model parameters as the starting values.

[25]:

init_ps = list(estr.theta[3:])

[26]:

def psi_ipw_msm4a(theta):
    # Built-in estimating equation
    return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
                      distribution='binomial',
                      link='logit')

[27]:

# M-estimator
init_vals = [0., 0., ] + init_ps
estr = MEstimator(psi_ipw_msm4a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[28]:

ci = estr.confidence_intervals()
print(np.exp(estr.theta[0:2]))
print("95% CI")
print(np.exp(ci[0:2, :]).T)

[0.22527109 1.030578  ]
95% CI
[[0.19459809 0.78938473]
 [0.26077884 1.3454669 ]]

Which are the same (exponentiated) coefficients as the previous approach. There is a slight discrepancy you may notice in the confidence intervals, but this is a floating point error resulting from a difference between the stabilized weights (by-hand) and the unstabilized weights (ee_ipw_msm).

12.5: Effect modification and marginal structural models

We will now use marginal structural models to study effect measure modification. We will look at effect modification by sex (sex) of quitting smoking (qsmk) on 10-year weight change (wt82_71).

[29]:

# Design matrix for propensity scores
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
dc['qsmk_sex'] = dc['qsmk']*dc['sex']
msm = dc[['I', 'qsmk', 'sex', 'qsmk_sex']]
# Treatment array
a = dc['qsmk']
# Outcome array
y = dc['wt82_71']

[30]:

def psi_ipw_msm5(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[0:4]              # Marginal structural model
    gamma = np.array([theta[4], ])  # Numerator parameter
    beta = theta[5:]                # Propensity score

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=beta,        # Propensity score
                          X=W, y=a,          # ... observed data
                          model='logistic')  # ... logit model
    pi = inverse_logit(np.dot(W, beta))      # Predicted prob

    # Estimating intercept-only for numerator
    ee_num = ee_regression(theta=gamma,            # Numerator model
                           X=dc[['I']], y=a,       # ... observed data
                           model='logistic')       # ... logit model
    num = inverse_logit(np.dot(dc[['I']], gamma))  # Predicted prob

    # Stabilized inverse probability weights
    ipw = np.where(a == 1, num/pi, (1-num)/(1-pi))

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,     # Marginal structural model
                           X=msm, y=y,      # ... observed data
                           model='linear',  # ... linear model
                           weights=ipw)     # ... weighted

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_ps])

[31]:

# M-estimator
init_vals = [0., 0., 0., 0., ] + [0., ] + [0.,]*W.shape[1]
estr = MEstimator(psi_ipw_msm5, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[32]:

estr.print_results(subset=[0, 1, 2, 3])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             24
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    1.78     0.30     5.91     1.19     2.38     0.00    28.09
    3.52     0.63     5.57     2.28     4.76     0.00    25.22
   -0.01     0.45    -0.02    -0.88     0.86     0.98     0.02
   -0.16     1.02    -0.16    -2.16     1.84     0.88     0.19
==============================================================

While not reported in the book, other online references report the following coefficients: 1.7844, 3.5220, -0.0087, -0.1595.

As before, we can replicate this result using the built-in ee_ipw_msm function.

[33]:

def psi_ipw_msm5a(theta):
    # Built-in estimating equation
    return ee_ipw_msm(theta, y=y, A=a, W=W, V=msm,
                      distribution='normal',
                      link='identity')

[34]:

# M-estimator
init_vals = [0., ]*msm.shape[1] + init_ps
estr = MEstimator(psi_ipw_msm5a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[35]:

estr.print_results(subset=[0, 1, 2, 3])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             23
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    1.78     0.30     5.91     1.19     2.38     0.00    28.09
    3.52     0.63     5.57     2.28     4.76     0.00    25.22
   -0.01     0.45    -0.02    -0.88     0.86     0.98     0.02
   -0.16     1.02    -0.16    -2.16     1.84     0.88     0.19
==============================================================

which again replicates the by-hand results.

12.6: Censoring and missing data

To conclude, we will now consider the missing outcomes that were ignored earlier. To do this, we will use stabilized inverse probability of missingness weights (IPCW in the book). As we have done so many times already, we will use delicatessen to stack additional nuisance models together.

[36]:

# Design matrix for propensity score model
W = df[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for missing model
X = df[['I', 'qsmk', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Design matrix for marginal structural model
msm = df[['I', 'qsmk']]
# Treatment array
a = df['qsmk']
# Outcome array
y = df['wt82_71']
# Missing indicator
r = np.where(df['wt82_71'].isna(), 0, 1)

[37]:

def psi_ipw_msm6(theta):
    # Dividing parameters up for their corresponding estimation equations
    alpha = theta[0:2]                # MSM
    gamma_n = np.array([theta[2], ])  # Numerator PS
    beta_n = np.array([theta[3], ])   # Numerator MW
    gamma_d = theta[4:4+W.shape[1]]   # Propensity score
    beta_d = theta[4+W.shape[1]:]     # Missing model

    # Estimating the propensity scores using a logit model
    ee_ps = ee_regression(theta=gamma_d,      # Propensity score
                          X=W, y=a,           # ... observed data
                          model='logistic')   # ... logit model
    pi_a = inverse_logit(np.dot(W, gamma_d))  # Predicted prob

    # Estimating intercept-only for numerator of IPTW
    ee_num = ee_regression(theta=gamma_n,     # Numerator
                           X=df[['I']], y=a,  # ... observed data
                           model='logistic')  # ... logit model
    num_a = inverse_logit(np.dot(df[['I']], gamma_n))

    # Estimating the missing scores using a logit model
    ee_ms = ee_regression(theta=beta_d,      # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, beta_d))  # Predicted prob

    # Estimating intercept-only for numerator of IPMW
    ee_sms = ee_regression(theta=beta_n,      # Numerator
                           X=df[['I']], y=r,  # ... observed data
                           model='logistic')  # ... logit model
    num_m = inverse_logit(np.dot(df[['I']], beta_n))

    # Stabilized inverse probability weights
    iptw = np.where(a == 1, num_a/pi_a, (1-num_a)/(1-pi_a))
    ipmw = np.where(r == 1, num_m/pi_m, 0)
    ipw = iptw*ipmw

    # Estimating the MSM using a weighted linear model
    ee_msm = ee_regression(theta=alpha,     # MSM
                           X=msm, y=y,      # ... observed data
                           model='linear',  # ... linear model
                           weights=ipw)     # ... weighted
    # Setting rows with missing Y's as zero (no contribution)
    ee_msm = np.nan_to_num(ee_msm, copy=False, nan=0.)

    # Stacking the estimating equations and returning
    return np.vstack([ee_msm, ee_num, ee_sms, ee_ps, ee_ms])

[38]:

# M-estimator
init_vals = [0., 0., ] + [0., 0., ] + [0.,]*W.shape[1] + [0.,]*X.shape[1]
estr = MEstimator(psi_ipw_msm6, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[39]:

estr.print_results(subset=[0, 1])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1629 | No. Parameters:             43
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    1.66     0.22     7.47     1.23     2.10     0.00    43.50
    3.50     0.49     7.20     2.54     4.45     0.00    40.62
==============================================================

Here, the book reports 3.5 (95% CI: 2.5, 4.5).

Again, we will replicate this using ee_ipw_msm instead. To incorporate the missingness weights, we will fit a separate model and then use the optional weights argument. Below is how this can be done

[40]:

def psi_ipw_msm6a(theta):
    # Separating parameters out
    alpha = theta[:2+W.shape[1]]  # MSM & PS
    gamma = theta[2+W.shape[1]:]  # Missing score

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for MSM and PS
    ee_msm = ee_ipw_msm(theta=alpha, y=y, A=a, W=W, V=msm,
                        distribution='normal',
                        link='identity',
                        weights=ipmw)
    # Setting rows with missing Y's as zero (no contribution)
    ee_msm = np.nan_to_num(ee_msm, copy=False, nan=0.)

    # Return the stacked estimating equations
    return np.vstack([ee_msm, ee_ms])

[41]:

# M-estimator
init_vals = [0., ]*msm.shape[1] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_ipw_msm6a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[42]:

estr.print_results(subset=[0, 1])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1629 | No. Parameters:             41
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    1.66     0.22     7.47     1.23     2.10     0.00    43.50
    3.50     0.49     7.20     2.54     4.45     0.00    40.62
==============================================================

Again, we are able to replicate the by-hand implementation. This concludes chapter 12.

Chapter 13: Standardization and the Parametric G-Formula

For Chapter 13, the book reviews the g-formula. Unlike IPW, the g-formula relies on modeling the outcome process. The g-computation algorithm estimator is

\[\hat{E}[Y^a] = n^{-1} \sum_{i=1}^n m(a, W_i; \beta)\]

where \(m\) is a statistical model for \(E[Y | A, W]\) and \(\beta\) are the parameters defining the model. This version is slightly different from the standardization form given in the book, but it is equivalent. Broadly, we apply the g-computation algorithm via (1) estimate an outcome model, (2) predict the outcomes had everyone been assigned \(a\), and (3) compute the mean of those predictions.

13.2: Estimating the mean outcome via modeling

First, we will fit a linear model for wt82_71 conditional on qsmk and the set of confounding variables. To begin, we will ignore the missing outcomes.

[43]:

# Design matrix for outcome model
X = dc[['I', 'qsmk', 'qsmk_smkint',
        'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Outcome array
y = dc['wt82_71']

[44]:

def psi_regression(theta):
    # Estimating the linear outcome model
    return ee_regression(theta=theta,
                         X=X, y=y,
                         model='linear')

[45]:

# M-estimator
init_vals = [0., ]*X.shape[1]
estr = MEstimator(psi_regression, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[46]:

estr.print_results()

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             21
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
   -1.59     4.71    -0.34   -10.83     7.65     0.74     0.44
    2.56     0.84     3.06     0.92     4.20     0.00     8.82
    0.05     0.04     1.14    -0.03     0.13     0.25     1.99
   -1.43     0.51    -2.83    -2.42    -0.44     0.00     7.75
    0.56     0.60     0.94    -0.61     1.73     0.35     1.52
    0.36     0.19     1.92    -0.01     0.73     0.06     4.18
   -0.01     0.00    -3.15    -0.01    -0.00     0.00     9.25
    0.79     0.63     1.25    -0.45     2.03     0.21     2.25
    0.56     0.55     1.01    -0.53     1.64     0.31     1.67
    1.49     0.76     1.97     0.01     2.97     0.05     4.36
   -0.19     0.72    -0.27    -1.60     1.21     0.79     0.35
    0.05     0.06     0.86    -0.06     0.16     0.39     1.36
   -0.00     0.00    -0.98    -0.00     0.00     0.33     1.62
    0.13     0.11     1.21    -0.08     0.35     0.23     2.14
   -0.00     0.00    -1.07    -0.01     0.00     0.29     1.81
    0.30     0.47     0.64    -0.62     1.21     0.53     0.93
    0.35     0.52     0.68    -0.66     1.37     0.50     1.01
   -0.95     0.40    -2.38    -1.73    -0.17     0.02     5.84
   -0.26     0.75    -0.35    -1.72     1.20     0.73     0.46
    0.05     0.10     0.46    -0.15     0.24     0.64     0.64
   -0.00     0.00    -1.50    -0.00     0.00     0.13     2.89
==============================================================

To ease application of later steps, we are going to save a list of these optimized values for starting values of the root-finding procedure.

[47]:

init_reg = list(estr.theta)

13.3: Standardizing the mean outcome to the confounder distribution

Now we can apply the g-computation algorithm (a way to evaluate the g-formula). To do this, we are going to create a copy of our data set. In that copy we are going to set qsmk=1 for all observations. We will then save the design matrix. We will then repeat this process for qsmk=0.

[48]:

# Copy of the data that we will updated qsmk in
dca = dc.copy()

[49]:

# Setting qsmk to 1
dca['qsmk'] = 1
dca['qsmk_smkint'] = dca['qsmk'] * dca['smokeintensity']
# Design matrix from qsmk=1 data
X1 = dca[['I', 'qsmk', 'qsmk_smkint',
          'sex', 'race', 'age', 'age_sq',
          'educ_2', 'educ_3', 'educ_4', 'educ_5',
          'smokeintensity', 'smokeintensity_sq',
          'smokeyrs', 'smokeyrs_sq',
          'exer_1', 'exer_2', 'active_1', 'active_2',
          'wt71', 'wt71_sq']]

[50]:

# Setting qsmk to 0
dca['qsmk'] = 0
dca['qsmk_smkint'] = dca['qsmk'] * dca['smokeintensity']
# Design matrix from qsmk=0 data
X0 = dca[['I', 'qsmk', 'qsmk_smkint',
          'sex', 'race', 'age', 'age_sq',
          'educ_2', 'educ_3', 'educ_4', 'educ_5',
          'smokeintensity', 'smokeintensity_sq',
          'smokeyrs', 'smokeyrs_sq',
          'exer_1', 'exer_2', 'active_1', 'active_2',
          'wt71', 'wt71_sq']]

Now, we will use the design matrices to generate predicted outcomes (pseudo-outcomes) for each individual. The average causal effect can then be calculated from those pseudo-outcomes.

The following is the corresponding estimating equations

[51]:

def psi_gcomp1(theta):
    # Dividing parameters into corresponding estimation equations
    rd, r1, r0 = theta[0], theta[1], theta[2]   # Causal means
    beta = theta[3:]                            # Outcome model

    # Estimating the linear model
    ee_reg = ee_regression(theta=beta,      # Outcome model
                           X=X, y=y,        # ... observed data
                           model='linear')  # ... linear model

    # Generating pseudo-outcome using the model
    y1hat = np.dot(X1, beta)  # Predicted Y when qsmk=1
    y0hat = np.dot(X0, beta)  # Predicted Y when qsmk=0

    # Causal means
    ee_r1 = y1hat - r1   # Causal mean for qsmk=1
    ee_r0 = y0hat - r0   # Causal mean for qsmk=0

    # Average causal effect
    ee_rd = np.ones(y.shape[0]) * ((r1 - r0) - rd)

    # Stacking the estimating equations and returning
    return np.vstack([ee_rd, ee_r1, ee_r0, ee_reg])

[52]:

# M-estimator
init_vals = [0., 0., 0.,] + init_reg
estr = MEstimator(psi_gcomp1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[53]:

estr.print_results(subset=[0, 1, 2])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             24
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    3.52     0.48     7.36     2.58     4.45     0.00    42.36
    5.27     0.44    12.12     4.42     6.13     0.00   109.95
    1.76     0.22     8.08     1.33     2.18     0.00    50.48
==============================================================

The first estimate is for the average causal effect (the second are the causal means under all quit smoking and all don’t quit smoking). Here, the confidence intervals match the book, but we were able to avoid the computational complexity of the bootstrap (another advantage of the sandwich variance estimator).

In the book, they report 3.5 (95% CI: 2.6, 4.5). The confidence interval in the book was generated via the bootstrap, so there is potential for random error in the estimates.

Rather than implement g-computation by-hand, we can also use the built-in estimating equations. Here is an example of that

[54]:

def psi_gcomp1a(theta):
    # Built-in g-formula estimating equation
    return ee_gformula(theta,
                       y=y, X=X,
                       X1=X1, X0=X0)

[55]:

# M-estimator
init_vals = [0., 0., 0.,] + init_reg
estr = MEstimator(psi_gcomp1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[56]:

estr.print_results(subset=[0, 1, 2])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             24
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    3.52     0.48     7.36     2.58     4.45     0.00    42.36
    5.27     0.44    12.12     4.42     6.13     0.00   109.95
    1.76     0.22     8.08     1.33     2.18     0.00    50.48
==============================================================

which provides the same answers (as we would expect!)

Fine Point 13.2: A doubly robust estimator

As a way to combine the estimators of the previous chapters, this aside in the book discusses construction of ‘doubly robust’ estimators. Namely, this estimator takes both the propensity score and outcome regression approaches, and then combines them in such a way that the resulting estimator is consistent when either of the two nuisance models is correctly specified. The Fine Point reviews a doubly robust estimator proposed by Bang & Robins. The doubly robust estimator considered here is different and is sometimes referred to as the canonical augmented inverse probability weighting (AIPW) estimator. Briefly, we apply the estimating functions for the nuisance models and then we combine these nuisance parameters via the following formula

\[\hat{E}[Y^a] = n^{-1} \sum_{i=1}^n \frac{Y_i I(A_i = a)}{\pi_{A=a}(W_i)} + m(a, W_i; \beta) I(A_i) \frac{\pi_{A\ne a}(W_i)}{\pi_{A=a}(W_i)}\]

which can be implemented manually with the following estimating functions

[57]:

# Refreshing design matrix for the propensity score model
W = dc[['I', 'sex', 'race', 'age', 'age_sq',
        'educ_2', 'educ_3', 'educ_4', 'educ_5',
        'smokeintensity', 'smokeintensity_sq',
        'smokeyrs', 'smokeyrs_sq',
        'exer_1', 'exer_2', 'active_1', 'active_2',
        'wt71', 'wt71_sq']]
# Refreshing treatment array
a = dc['qsmk']

[58]:

def psi_aipw1(theta):
    # Dividing parameters into corresponding estimation equations
    ndim_X = X.shape[1]
    rd, r1, r0 = theta[0], theta[1], theta[2]   # Causal means
    beta = theta[3: 3+ndim_X]                   # Outcome model
    alpha = theta[3+ndim_X:]                    # Outcome model

    # Estimating the propensity scores
    ee_ps = ee_regression(theta=alpha,       # Estimate propensity scores
                          X=W, y=a,          # ... given observed A,W
                          model='logistic')  # ... with logit model
    pi = inverse_logit(np.dot(W, alpha))     # Get Pr(A = 1 | W)

    # Estimating the outcome model
    ee_reg = ee_regression(theta=beta,      # Outcome model
                           X=X, y=y,        # ... observed data
                           model='linear')  # ... linear model
    y1 = np.dot(X1, beta)                   # Pseudo potential outcome
    y0 = np.dot(X0, beta)                   # Pseudo potential outcome

    # Causal means
    ee_r1 = (y*a/pi - y1*(a-pi)/pi) - r1              # Causal mean for qsmk=1
    ee_r0 = (y*(1-a)/(1-pi) + y0*(a-pi)/(1-pi)) - r0  # Causal mean for qsmk=0

    # Average causal effect
    ee_rd = np.ones(y.shape[0]) * ((r1 - r0) - rd)

    # Stacking the estimating equations and returning
    return np.vstack([ee_rd, ee_r1, ee_r0, ee_reg, ee_ps])

[59]:

# M-estimator
init_vals = [0., 0., 0., ] + [0., ]*X.shape[1] + [0.,]*W.shape[1]
estr = MEstimator(psi_aipw1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[60]:

estr.print_results(subset=[0, 1, 2])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             43
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    3.46     0.48     7.14     2.51     4.41     0.00    39.99
    5.22     0.44    11.82     4.36     6.09     0.00   104.64
    1.77     0.22     8.07     1.34     2.20     0.00    50.39
==============================================================

Here, the AIPW results are similar to the previous g-computation and IPW results.

We can also use the built-in functionality for the canonical AIPW estimator (ee_aipw) to do this same approach without all the manual coding.

[61]:

def psi_aipw2(theta):
    return ee_aipw(theta=theta, y=y, A=a, W=W, X=X, X1=X1, X0=X0)

[62]:

# M-estimator
init_vals = [0., 0., 0., ] + [-1.5, ] + [0., ]*(X.shape[1]-1) + [0.,]*W.shape[1]
estr = MEstimator(psi_aipw2, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[63]:

estr.print_results(subset=[0, 1, 2])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             43
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    3.46     0.48     7.14     2.51     4.41     0.00    39.99
    5.22     0.44    11.82     4.36     6.09     0.00   104.64
    1.77     0.22     8.07     1.34     2.20     0.00    50.39
==============================================================

These results match the by-hand implementation, as expected.

There is also an additional way to implement the AIPW estimator. This approach fits a weighted outcome model, where the weights are from IPW. This model is then used to predict values, like the standard g-computation algorithm. As shown below, this approach might be easier to implement than the others. This version is sometimes referred to as the weighted-regression AIPW

[64]:

def psi_aipw3(theta):
    # Dividing parameters into corresponding estimation equations
    ndim_X = X.shape[1]
    rd, r1, r0 = theta[0], theta[1], theta[2]   # Causal means
    beta = theta[3: 3+ndim_X]                   # Outcome model
    alpha = theta[3+ndim_X:]                    # Outcome model

    # Estimating the propensity scores
    ee_ps = ee_regression(theta=alpha,       # Estimate propensity scores
                          X=W, y=a,          # ... given observed A,W
                          model='logistic')  # ... with logit model
    pi = inverse_logit(np.dot(W, alpha))     # Get Pr(A = 1 | W)
    ipw = 1 / np.where(a == 1, pi, 1-pi)     # Convert to IPW

    # Estimating the outcome model
    ee_reg = ee_regression(theta=beta,      # Outcome model
                           X=X, y=y,        # ... observed data
                           model='linear',  # ... linear model
                           weights=ipw)     # ... weighted by IPW
    y1 = np.dot(X1, beta)                   # Pseudo potential outcome
    y0 = np.dot(X0, beta)                   # Pseudo potential outcome

    # Causal means
    ee_r1 = y1 - r1  # Causal mean for qsmk=1
    ee_r0 = y0 - r0  # Causal mean for qsmk=0

    # Average causal effect
    ee_rd = np.ones(y.shape[0]) * ((r1 - r0) - rd)

    # Stacking the estimating equations and returning
    return np.vstack([ee_rd, ee_r1, ee_r0, ee_reg, ee_ps])

[65]:

# M-estimator
init_vals = [0., 0., 0., ] + [0., ]*X.shape[1] + [0.,]*W.shape[1]
estr = MEstimator(psi_aipw3, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[66]:

estr.print_results(subset=[0, 1, 2])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1566 | No. Parameters:             43
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    3.43     0.48     7.16     2.49     4.36     0.00    40.13
    5.19     0.44    11.90     4.33     6.04     0.00   106.14
    1.76     0.22     8.06     1.33     2.19     0.00    50.19
==============================================================

Note that these estimates differ slightly, but this is expected since it solves a different equation. While not relevant in this setting, the weighted-regression AIPW may be preferred over the canonical AIPW, since the former is guaranteed to be bounded in the parameter space while the latter is not.

As detailed in Shook-Sa et al. Biometrics 2025, the M-estimation setup offers the additional feature that the AIPW estimators have doubly robust point and variance estimation. This is not always the case for variance estimation with AIPW (which is not desirable).

Chapter 14: G-Estimation of Structural Nested Models

G-estimation differs from the previous approaches in what parameter it targets. Rather than the average causal effect, we will estimate the parameters of a structural nested model. The structural nested mean model is

\[E[Y^a - Y^{a=0} | A=a, W] = \varphi_0 a\]

Here, \(\varphi_0\) represents the difference by \(a\). However, we can also study effect measure modification easily with structural nested models. Consider

\[E[Y^a - Y^{a=0} | A=a, W] = \varphi_0 a + \varphi_1 a V\]

where \(V \in W\). Therefore, the structural nested model described effect measure modification by \(V\). Importantly, structural nested models assume that we have correctly specified this model (hence the difference in interest parameters that occurs between methods).

G-estimation is a bit less straightforward to understand compare to the previous methods. Note that we will be using the propensity score for estimation and we still assume the same identification conditions. See the book for a much more in-depth discussion.

14.5 G-estimation

For solve our g-estimator, we are going to use a different approach than the one described in the main text of the book. Instead, we are going to use the approach (the estimating equations) described in Technical Point 14.2. As mentioned there, this approach is equivalent to the process described in the main text.

[67]:

# Design matrix for propensity scores
W = np.asarray(df[['I', 'sex', 'race', 'age', 'age_sq',
                   'educ_2', 'educ_3', 'educ_4', 'educ_5',
                   'smokeintensity', 'smokeintensity_sq',
                   'smokeyrs', 'smokeyrs_sq',
                   'exer_1', 'exer_2', 'active_1', 'active_2',
                   'wt71', 'wt71_sq']])
# Design matrix for missing model
X = np.asarray(df[['I', 'qsmk', 'sex', 'race', 'age', 'age_sq',
                   'educ_2', 'educ_3', 'educ_4', 'educ_5',
                   'smokeintensity', 'smokeintensity_sq',
                   'smokeyrs', 'smokeyrs_sq',
                   'exer_1', 'exer_2', 'active_1', 'active_2',
                   'wt71', 'wt71_sq']])
# Design matrix for structural nested model
snm = np.asarray(df[['I', ]])
# Treatment array
a = np.asarray(df['qsmk'])
# Outcome array
y = np.asarray(df['wt82_71'])
# Missing indicator
r = np.where(df['wt82_71'].isna(), 0, 1)

[68]:

def psi_snm1(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[:snm.shape[1]]             # SNM parameters
    beta = theta[snm.shape[1]:               # Propensity score
                 snm.shape[1]+W.shape[1]]
    gamma = theta[snm.shape[1]+W.shape[1]:]  # Missing score

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for PS
    ee_log = ee_regression(theta=beta,        # Propensity score
                           X=W, y=a,          # ... observed data
                           model='logistic',  # ... logit model
                           weights=ipmw)      # ... weighted
    pi = inverse_logit(np.dot(W, beta))       # Predicted prob

    # H(psi) equation for linear models
    h_psi = y - np.dot(snm*a[:, None], alpha)

    # Estimating equation for the structural nested mean model
    ee_snm = ipmw*(h_psi[:, None] * (a - pi)[:, None] * snm).T
    # Setting rows with missing Y's as zero (no contribution)
    ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)

    return np.vstack([ee_snm, ee_log, ee_ms])

[69]:

# M-estimator
init_vals = [0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm1, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[70]:

estr.print_results(subset=[0, ])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1629 | No. Parameters:             40
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    3.45     0.47     7.35     2.53     4.36     0.00    42.21
==============================================================

The book reported 3.4 (95% CI: 2.5, 4.5), generating confidence interval by inverting the test results. As such, there is expected differences (different estimators are being used). As in the inverse probability weighting examples, these confidence intervals are expected due to be conservative due to the use of the ‘GEE trick’.

Now consider how we can use the built-in g-estimation functionality, ee_gestimation_snmm.

[71]:

def psi_snm1a(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[:snm.shape[1]+W.shape[1]]  # SNM and PS
    gamma = theta[snm.shape[1]+W.shape[1]:]  # Missing scores

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for PS
    ee_snm = ee_gestimation_snmm(theta=alpha,
                                 y=y, A=a,
                                 W=W, V=snm,
                                 weights=ipmw)
    # Setting rows with missing Y's as zero (no contribution)
    ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)

    return np.vstack([ee_snm, ee_ms])

[72]:

# M-estimator
init_vals = [0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm1a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[73]:

estr.print_results(subset=[0, ])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1629 | No. Parameters:             40
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    3.45     0.47     7.35     2.53     4.36     0.00    42.21
==============================================================

The structural nested model parameters match between implementations, as expected.

14.6: Structural nested models with two or more parameters

We now adapt the previous code to consider more than 2 parameters in the structural nested model. Thankfully, this is pretty straightforward for how we coded the estimating equations since snm is a global object.

[74]:

snm = np.asarray(df[['I', 'smokeintensity']])

[75]:

def psi_snm2(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[:snm.shape[1]]             # SNM parameters
    beta = theta[snm.shape[1]:               # Propensity score
                 snm.shape[1]+W.shape[1]]
    gamma = theta[snm.shape[1]+W.shape[1]:]  # Missing score

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for PS
    ee_log = ee_regression(theta=beta,        # Propensity score
                           X=W, y=a,          # ... observed data
                           model='logistic',  # ... logit model
                           weights=ipmw)      # ... weighted
    pi = inverse_logit(np.dot(W, beta))       # Predicted prob

    # H(psi) equation for linear models
    h_psi = y - np.dot(snm*a[:, None], alpha)

    # Estimating equation for the structural nested mean model
    ee_snm = ipmw*(h_psi[:, None] * (a - pi)[:, None] * snm).T
    # Setting rows with missing Y's as zero (no contribution)
    ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)

    return np.vstack([ee_snm, ee_log, ee_ms])

[76]:

# M-estimator
init_vals = [0., 0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm2, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[77]:

estr.print_results(subset=[0, 1])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1629 | No. Parameters:             41
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    2.86     0.93     3.06     1.03     4.69     0.00     8.83
    0.03     0.05     0.66    -0.06     0.12     0.51     0.97
==============================================================

which provides the same results as the book (2.86 and 0.03). Unlike the book, we provide confidence intervals. Inverting the test is not simple in this case, and the book does not report variance estimates from a bootstrapping procedure.

Again, we can apply the built-in estimating equations for g-estimation. In truth, the new estimating equation is the same as the previous g-estimation version (since snm is a global object as called). Regardless, we provide again here.

[78]:

def psi_snm2a(theta):
    # Dividing parameters into corresponding estimation equations
    alpha = theta[:snm.shape[1]+W.shape[1]]  # SNM and PS
    gamma = theta[snm.shape[1]+W.shape[1]:]  # Missing scores

    # Estimating equation for IPMW
    ee_ms = ee_regression(theta=gamma,       # Missing score
                          X=X, y=r,          # ... observed data
                          model='logistic')  # ... logit model
    pi_m = inverse_logit(np.dot(X, gamma))   # Predicted prob
    ipmw = r / pi_m                          # Missing weights

    # Estimating equations for PS
    ee_snm = ee_gestimation_snmm(theta=alpha,
                                 y=y, A=a,
                                 W=W, V=snm,
                                 weights=ipmw)
    # Setting rows with missing Y's as zero (no contribution)
    ee_snm = np.nan_to_num(ee_snm, copy=False, nan=0.)

    return np.vstack([ee_snm, ee_ms])

[79]:

# M-estimator
init_vals = [0., 0., ] + init_ps + [0., ]*X.shape[1]
estr = MEstimator(psi_snm2a, init=init_vals)
estr.estimate(solver='hybr', maxiter=5000)

[80]:

estr.print_results(subset=[0, 1])

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1629 | No. Parameters:             41
Solving algorithm:       hybr | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    2.86     0.93     3.06     1.03     4.69     0.00     8.83
    0.03     0.05     0.66    -0.06     0.12     0.51     0.97
==============================================================

This concludes chapter 14.

Chapter 16: Instrumental Variable Analysis

Chapter 16 focuses on instrumental variable (IV) analysis. Here, we are going to utilize an instrument (high state cigarette prices, variable name: highprice) to estimate the effect of quitting smoking on weight gain.

Here, we will reload the data set, since the rows we need to drop due to missing data differ from the previous examples.

[81]:

df = pd.read_csv('data/nhefs.csv')
df['highprice'] = (df['price82'] >= 1.5).astype('int')
df.dropna(subset=['wt82', 'price82'],
         inplace=True)
df = df[['highprice', 'qsmk', 'wt82_71', 'price82']].copy()
df['I'] = 1

[82]:

z = np.asarray(df['highprice'])
a = np.asarray(df['qsmk'])
y = np.asarray(df['wt82_71'])

16.1: The three instrumental conditions

As described in the book, we will first examine the relationship between our instrument \(Z\) and the exposure \(A\). Below we present estimating equations for \(\Pr(A=1 | Z=1) - \Pr(A=1 | Z=0)\)

[83]:

def psi_iv_strength(theta):
    ee_az1 = z*(a - theta[0])
    ee_az0 = (1-z)*(a - theta[1])
    ee_rd = np.ones(a.shape[0]) * (theta[0] - theta[1] - theta[2])
    return np.vstack([ee_az1, ee_az0, ee_rd])

[84]:

estr = MEstimator(psi_iv_strength, init=[0.5, 0.5, 0.])
estr.estimate()

[85]:

estr.print_results()

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1476 | No. Parameters:              3
Solving algorithm:         lm | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    0.26     0.01    22.33     0.24     0.28     0.00   364.43
    0.20     0.06     3.15     0.07     0.32     0.00     9.27
    0.06     0.06     1.00    -0.06     0.19     0.32     1.65
==============================================================

These are the same (point) estimates as those reported in the book. Note that our results also provide the confidence intervals. Here, \(Z\) is considered to be a weak instrument since the risk difference is 6.3%. Our results also show that the corresponding confidence intervals are also quite wide.

16.2: The usual IV estimand

The usual IV is defined as

\[\beta = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[A \mid Z=1] - E[A \mid Z=0]}\]

There are a few ways to implement this IV estimator with estimating equations. The first is to estimate each of the pieces and combine them as shown in the previous equation. The estimating equations for this approach are

\[\begin{split} \sum_{i=1}^n \begin{bmatrix} (\alpha_1 - \alpha_2) - \beta(\alpha_3 - \alpha_4) \\ Z_i (Y_i - \alpha_1) \\ (1-Z_i) \times (Y_i - \alpha_2) \\ Z_i (A_i - \alpha_3) \\ (1-Z_i) \times (A_i - \alpha_4) \\ \end{bmatrix} = 0\end{split}\]

where the first equation is the usual IV, and the following 4 are the \(Z\) conditional means for \(Y\) and \(A\) respectively. This estimating equation can be implemented by

[86]:

def psi_usual_iv1(theta):
    ee_uiv = np.ones(y.shape[0]) * ((theta[1] - theta[2])
                                    - theta[0]*(theta[3] - theta[4]))
    ee_yz1 = z*(y - theta[1])      # for E[Y | Z=1]
    ee_yz0 = (1-z)*(y - theta[2])  # for E[Y | Z=0]
    ee_az1 = z*(a - theta[3])      # for E[A | Z=1]
    ee_az0 = (1-z)*(a - theta[4])  # for E[A | Z=0]
    return np.vstack([ee_uiv, ee_yz1, ee_yz0, ee_az1, ee_az0])

[87]:

estr = MEstimator(psi_usual_iv1, init=[0., 0., 0., 0.5, 0.5])
estr.estimate()

[88]:

estr.print_results(decimals=3)

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1476 | No. Parameters:              5
Solving algorithm:         lm | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
   2.396   22.341    0.107  -41.391   46.183    0.915    0.129
   2.686    0.208   12.888    2.278    3.095    0.000  123.834
   2.536    1.444    1.756   -0.294    5.365    0.079    3.662
   0.258    0.012   22.328    0.235    0.280    0.000  364.433
   0.195    0.062    3.153    0.074    0.316    0.002    9.272
==============================================================

The \(Z\) conditional means match those reported in the book (2.686, 2.536, 0.2578, 0.1951), which gives us the same IV estimate as the book (2.4 kg).

Another way is shown in Boos & Stefanski (2013). The usual IV can be expressed as the following pair of estimating equations

\[\begin{split} \sum_{i=1}^n \begin{bmatrix} (Y_i - A_i \beta) \times (Z_i - \alpha) \\ Z_i - \alpha \\ \end{bmatrix} = 0\end{split}\]

so there is only one nuisance parameter (the mean of \(Z\)) that needs to be estimated. The following implements this estimating equation by-hand

[89]:

def psi_usual_iv2(theta):
    ee_uiv = (y - a*theta[0]) * (z - theta[1])
    ee_z = z - theta[1]
    return np.vstack([ee_uiv, ee_z])

[90]:

estr = MEstimator(psi_usual_iv2, init=[0., 0.5])
estr.estimate()

[91]:

estr.print_results(decimals=3)

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1476 | No. Parameters:              2
Solving algorithm:         lm | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
   2.396   22.341    0.107  -41.391   46.184    0.915    0.129
   0.972    0.004  227.288    0.964    0.981    0.000      inf
==============================================================

There is also a built-in version of the usual IV estimator, shown below

[92]:

def psi_usual_iv3(theta):
    return ee_iv_causal(theta, y=y, A=a, Z=z)

[93]:

estr = MEstimator(psi_usual_iv3, init=[0., 0.5])
estr.estimate()

[94]:

estr.print_results(decimals=3)

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1476 | No. Parameters:              2
Solving algorithm:         lm | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
   2.396   22.341    0.107  -41.391   46.184    0.915    0.129
   0.972    0.004  227.288    0.964    0.981    0.000      inf
==============================================================

As expected, the built-in implementation provides the same result.

In the book, another approach for estimation is also described: two-stage least squares (2SLS). 2SLS works by estimating two linear regression models. The first is for \(A\) given \(Z\). From that model, we can get the predicted value of \(A\) given \(Z\) and an intercept, denoted by \(\hat{A}\). Next, we can fit a model for \(Y\) given \(\hat{A}\) and an intercept. The coefficient for \(\hat{A}\) from this model is the IV estimate. Below is code to implement this approach by-hand

[95]:

Z = np.asarray(df[['I', 'highprice']])

[96]:

def psi_2sls1(theta):
    beta = theta[:2]
    alpha = theta[2:]

    # First-stage regression
    ee_stage1 = ee_regression(theta=alpha, y=a, X=Z, model='linear')

    # Second-stage regression
    a_hat = np.dot(Z, alpha)
    A_hat = np.asarray([a_hat, np.ones(a.shape[0])]).T
    ee_stage2 = ee_regression(theta=beta, y=y, X=A_hat, model='linear')

    # Returning stacked estimating equations
    return np.vstack([ee_stage2, ee_stage1])

[97]:

estr = MEstimator(psi_2sls1, init=[0., 0., 0., 0.])
estr.estimate()

[98]:

estr.print_results()

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1476 | No. Parameters:              4
Solving algorithm:         lm | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    2.40    22.34     0.11   -41.40    46.19     0.91     0.13
    2.07     5.73     0.36    -9.17    13.31     0.72     0.48
    0.20     0.06     3.15     0.07     0.32     0.00     9.27
    0.06     0.06     1.00    -0.06     0.19     0.32     1.65
==============================================================

Here, the first element is the IV estimate (note how the A_hat matrix is built). This estimate matches the previous methods and the book. The book reports their 95% CI as −36.5 to 41.3, which differs slightly from what we have. This is because we are using a different variance estimator. The one reported in the book assumes homoskedasticity, whereas the empirical sandwich variance estimator does not.

There is also a built-in procedure for 2SLS. The following is how that built-in procedure can be used to replicate the by-hand code

[99]:

def psi_2sls2(theta):
    return ee_2sls(theta, y=df['wt82_71'], A=df['qsmk'],
                   Z=df[['highprice', ]], W=df[['I', ]])

[100]:

estr = MEstimator(psi_2sls2, init=[0., 0., 0., 0.])
estr.estimate()

[101]:

estr.print_results()

==============================================================
              Estimation Method: M-estimator
--------------------------------------------------------------
No. Observations:        1476 | No. Parameters:              4
Solving algorithm:         lm | Max Iterations:           5000
Solving tolerance:      1e-09 | Allow P-Inverse:             1
Derivative Method:     approx | Deriv Approx:            1e-09
Small N Correction:      None | Distribution:           Z-stat
==============================================================
   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value
--------------------------------------------------------------
    2.40    22.34     0.11   -41.38    46.17     0.91     0.13
    2.07     5.73     0.36    -9.17    13.30     0.72     0.48
    0.06     0.06     1.00    -0.06     0.19     0.32     1.65
    0.20     0.06     3.15     0.07     0.32     0.00     9.27
==============================================================

The built-in procedure matches the by-hand implementation, as expected. Note that the 2SLS implementation is more general and allows for one to adjust for other variables or use multiple instruments. Therefore, it is likely to be the preferred implementation over the usual IV expression.

16.5: The three instrument conditions revisited

Below is code to run the analysis at different thresholds for the ‘high price’ cut-off, as described in the book.

[102]:

def psi_2sls3(theta):
    return ee_2sls(theta, y=df['wt82_71'], A=df['qsmk'],
                   Z=z_new[:, None], W=df[['I', ]])

[103]:

for c in [1.60, 1.70, 1.80, 1.90]:
    z_new = np.where(df['price82'] >= c, 1, 0)
    estr = MEstimator(psi_2sls3, init=[0., 0., 0., 0.])
    estr.estimate()
    ci = estr.confidence_intervals()[0, :]
    print("Cutoff: $"+str(c))
    print(np.round([estr.theta[0], ci[0], ci[1]], 3))

Cutoff: $1.6
[  41.281 -278.122  360.685]
Cutoff: $1.7
[ -40.912 -428.146  346.322]
Cutoff: $1.8
[-21.103 -77.16   34.953]
Cutoff: $1.9
[-12.811 -54.941  29.318]

These results are the same as those reported in the book (41.3, −40.9, −21.1, −12.8). Again confidence intervals may differ slightly due to differences between variance estimators.

Chapter 17: Causal Survival Analysis

Replication of the following chapter 17 is not available yet.

References

Shook-Sa BE, Zivich PN, Lee C, Xue K, Ross RK, Edwards JK, Stringer JSA, & Cole SR. (2025). Double robust variance estimation with parametric working models. Biometrics, 81(2), ujaf054.