delicatessen.estimating_equations.causal.ee_2sls

ee_2sls(theta, y, A, Z, W=None, weights=None)

Estimating equations for Two-Stage Least Squares (2SLS) for instrumental variable (IV) analysis. The stacked pair of estimating equations is

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} \left\{ Y_i - \hat{X}_i^T \beta \right\} \hat{X}_i \\ \left\{ A_i - X_i^T \alpha \right\} X_i \end{bmatrix} = 0\end{split}\]

where \(A\) is the action of interest, \(Y\) is the outcome of interest, \(Z\) is the instrument(s), \(W\) is a set of exogenous variables (possibly empty), \(X = (1, Z, W)\), \(\hat{X} = (1, \hat{A}, W)\), and \(\hat{A} = X^T \alpha\). The intercept is not added automatically: it enters both models only when a constant column is supplied in W, and that column counts toward c. The length of the theta vector is 1 + b + 2c, where b is the number of columns in \(Z\) and c is the number of columns in \(W\).
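The stacked equations above can be sketched directly in NumPy. This is an illustrative sketch only, not the delicatessen implementation: the function name `psi_2sls_sketch` is hypothetical, and it assumes that no constant column is added automatically (so the designs are built from Z and W exactly as passed) and that theta is ordered with the second-stage parameters first.

```python
import numpy as np

def psi_2sls_sketch(theta, y, A, Z, W=None):
    """Hypothetical sketch of the stacked 2SLS estimating functions.

    Assumes theta = (beta, alpha), with 1 + c second-stage parameters
    followed by b + c first-stage parameters, and no automatic intercept.
    """
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    n = y.shape[0]
    Z = np.asarray(Z, dtype=float).reshape(n, -1)       # n-by-b
    if W is None:
        W = np.empty((n, 0))                            # no exogenous columns
    else:
        W = np.asarray(W, dtype=float).reshape(n, -1)   # n-by-c
    c = W.shape[1]

    theta = np.asarray(theta, dtype=float)
    beta, alpha = theta[:1 + c], theta[1 + c:]

    X = np.hstack([Z, W])                    # first-stage design
    A_hat = X @ alpha                        # predicted action
    X_hat = np.column_stack([A_hat, W])      # second-stage design

    ee_second = (y - X_hat @ beta)[:, None] * X_hat   # n-by-(1+c)
    ee_first = (A - A_hat)[:, None] * X               # n-by-(b+c)
    return np.hstack([ee_second, ee_first]).T         # (1+b+2c)-by-n

# Example call with one instrument and no W: the output has 2 rows
rng = np.random.default_rng(0)
z_demo = rng.binomial(1, 0.5, size=50)
a_demo = 0.5 * z_demo + rng.normal(size=50)
y_demo = 2 * a_demo + rng.normal(size=50)
out_demo = psi_2sls_sketch([0.0, 0.0], y_demo, a_demo, z_demo)
```

Each row of the returned array holds the n per-observation contributions for one parameter, so the row sums are what a root-finder would drive to zero.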

Parameters

theta (ndarray, list, vector) – Parameter vector of 1 + b + 2c values. The first 1 + c parameters are for the second-stage model, and the remaining b + c parameters are for the first-stage model.
y (ndarray, list, vector) – 1-dimensional vector of n observed values for the outcome of interest.
A (ndarray, list, vector) – 1-dimensional vector of n observed values for the action of interest.
Z (ndarray, list, vector) – 2-dimensional array of n observed values for the b instrumental variable(s).
W (ndarray, list, vector, None, optional) – 2-dimensional array of n observed values for c exogenous variables. The columns are included, as provided, in both the first- and second-stage regression models. To add an intercept to both regression models, include a constant column in W. Default is None.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations. This argument is intended to support the use of sampling or missingness weights.

Returns

Returns a (1 + b + 2c)-by-n NumPy array evaluated for the input theta and y, A, Z (and W, if provided)

Return type

array

Examples

Construction of the estimating equation(s) with ee_2sls should be done similarly to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_2sls
>>> from delicatessen.utilities import inverse_logit

Some generic data

>>> n = 200
>>> d = pd.DataFrame()
>>> d['Z'] = np.random.binomial(n=1, p=0.5, size=n)
>>> d['U'] = np.random.normal(size=n)
>>> pr_a = inverse_logit(d['U'] + d['Z'])
>>> d['A'] = np.random.binomial(n=1, p=pr_a, size=n)
>>> d['X'] = np.random.normal(size=n)
>>> d['Y'] = 2*d['A'] - d['U'] + 0.1*d['X'] + np.random.normal(size=n)
>>> d['C'] = 1

To start, consider 2SLS without any exogenous variables. The psi function is

>>> def psi(theta):
>>>     return ee_2sls(theta,
>>>                    y=d['Y'],
>>>                    A=d['A'],
>>>                    Z=d[['Z', ]])

Calling the M-estimator. Here, 2SLS has 2 parameters: 1 coefficient in the second-stage model and 1 coefficient in the first-stage model. Starting values of 0. for all parameters are generally reasonable for 2SLS.

>>> estr = MEstimator(psi,
>>>                   init=[0., 0., ])
>>> estr.estimate()

Inspecting the parameter estimates, variance, and 95% confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

More specifically, the corresponding parameters are

>>> estr.theta[0]   # Second-stage model
>>> estr.theta[1]   # First-stage model

Here, the parameter of interest is estr.theta[0], which under the IV assumptions is a causal effect of \(A\) on \(Y\).
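As a sanity check on the two-parameter case above, the root of the estimating equations has a closed form when there is one instrument, no W, and no intercept. The sketch below simulates its own data with NumPy alone (the seed and variable names are illustrative assumptions, not part of the library) and shows that the two-stage estimate reduces to a ratio of cross-products.

```python
import numpy as np

rng = np.random.default_rng(2024)   # arbitrary seed, for reproducibility only
n = 200
Z = rng.binomial(n=1, p=0.5, size=n)
U = rng.normal(size=n)
pr_a = 1 / (1 + np.exp(-(U + Z)))   # inverse logit of U + Z
A = rng.binomial(n=1, p=pr_a, size=n)
Y = 2*A - U + rng.normal(size=n)

# First stage (no intercept): solve sum{(A - alpha*Z) * Z} = 0
alpha_hat = np.sum(Z * A) / np.sum(Z * Z)
A_hat = alpha_hat * Z

# Second stage (no intercept): solve sum{(Y - beta*A_hat) * A_hat} = 0
beta_hat = np.sum(Y * A_hat) / np.sum(A_hat * A_hat)

# Substituting A_hat = alpha_hat * Z gives the equivalent ratio form
beta_ratio = np.sum(Z * Y) / np.sum(Z * A)
```

Under these assumptions, `beta_hat` and `beta_ratio` agree up to floating-point error, and both play the role of estr.theta[0] in the example above.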

To add an intercept term to the models or add exogenous variables, 2SLS is specified as

>>> def psi(theta):
>>>     return ee_2sls(theta,
>>>                    y=d['Y'],
>>>                    A=d['A'],
>>>                    Z=d[['Z', ]],
>>>                    W=d[['C', 'X']])

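Here, 6 parameters are estimated: the second-stage model has coefficients for the predicted action, the intercept, and the exogenous variable, and the first-stage model has coefficients for the instrument, the intercept, and the exogenous variable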

>>> estr = MEstimator(psi,
>>>                   init=[0., 0., 0., 0., 0., 0.])
>>> estr.estimate()
>>> estr.theta[0:3]   # Second-stage model
>>> estr.theta[3:]    # First-stage model

The parameter of interest is again estr.theta[0].
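For a cross-check of the point estimates in the six-parameter case, the same two stages can be fit sequentially with ordinary least squares. This is a sketch under stated assumptions: `two_stage_lstsq` is a hypothetical helper rather than part of delicatessen, the data are simulated here with NumPy only, and the output is ordered like theta above (second-stage coefficients first).

```python
import numpy as np

def two_stage_lstsq(y, A, Z, W):
    """Hypothetical helper: sequential least-squares fits for 2SLS.

    Returns (beta, alpha) concatenated, second-stage coefficients first.
    """
    X = np.column_stack([Z, W])                        # first-stage design
    alpha, *_ = np.linalg.lstsq(X, A, rcond=None)
    A_hat = X @ alpha                                  # predicted action
    X_hat = np.column_stack([A_hat, W])                # second-stage design
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return np.concatenate([beta, alpha])

rng = np.random.default_rng(7)      # arbitrary seed, for reproducibility only
n = 200
Z = rng.binomial(n=1, p=0.5, size=n).astype(float)
U = rng.normal(size=n)
A = rng.binomial(n=1, p=1 / (1 + np.exp(-(U + Z))), size=n).astype(float)
X_cov = rng.normal(size=n)
Y = 2*A - U + 0.1*X_cov + rng.normal(size=n)
W = np.column_stack([np.ones(n), X_cov])   # intercept column plus X

theta = two_stage_lstsq(Y, A, Z, W)
second_stage = theta[0:3]   # predicted action, intercept, X
first_stage = theta[3:]     # instrument, intercept, X
```

Only the point estimates are comparable. Standard errors taken naively from the second least-squares fit ignore the uncertainty in the first stage, which is one motivation for stacking both stages as estimating equations and using the sandwich variance.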

Finally, multiple instruments are also supported by including multiple columns in Z. Below is an example of how the function would look, assuming the data contain two instruments named Z1 and Z2 (these are not part of the generic data generated above)

>>> def psi(theta):
>>>     return ee_2sls(theta,
>>>                    y=d['Y'],
>>>                    A=d['A'],
>>>                    Z=d[['Z1', 'Z2']],
>>>                    W=d[['C', 'X']])
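The length of init for any of these specifications follows from the 1 + b + 2c rule stated earlier. A small sketch of that arithmetic is given below; `n_params_2sls` is a hypothetical helper, not a delicatessen function.

```python
def n_params_2sls(b, c=0):
    """Length of theta: (1 + c) second-stage plus (b + c) first-stage parameters."""
    return (1 + c) + (b + c)

# Two instruments (Z1, Z2) and two columns in W (C, X)
init = [0.0] * n_params_2sls(b=2, c=2)
```

For the two-instrument specification above this gives 7 starting values, which could then be passed as init to MEstimator as in the earlier examples.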

References

Meijer E, & Wansbeek T. (2007). The sample selection model from a method of moments perspective. Econometric Reviews, 26(1), 25-51.

Zivich PN, Cole SR, Edwards JK, Mulholland GE, Shook-Sa BE, & Tchetgen Tchetgen EJ. (2023). Introducing proximal causal inference for epidemiologists. American Journal of Epidemiology, 192(7), 1224-1227.

Zivich PN (2024). RE:’Estimating the effect of a treatment when there is non-adherence in a trial’. American Journal of Epidemiology, 194(2), 552-553.