delicatessen.estimating_equations.causal.ee_2sls
- ee_2sls(theta, y, A, Z, W=None, weights=None)
Estimating equations for Two-Stage Least Squares (2SLS) for instrumental variable (IV) analysis. The pair of estimating equations are
\[\begin{split}\sum_{i=1}^n \begin{bmatrix} \left\{ Y_i - \hat{X}_i^T \beta \right\} \hat{X}_i \\ \left\{ A_i - X_i^T \alpha \right\} X_i \end{bmatrix} = 0\end{split}\]
where \(A\) is the action of interest, \(Y\) is the outcome of interest, \(Z\) is the instrument(s), \(W\) is a set of exogenous variables (possibly the empty set, or none), \(X = (1, Z, W)\), \(\hat{X} = (1, \hat{A}, W)\), and \(\hat{A} = X^T \alpha\). Here, the length of the theta vector is 1 + b + 2c, where b is the number of covariates in \(Z\) and c is the number of covariates in \(W\).
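To make the stacked estimating equations concrete, the following is a minimal NumPy sketch, not delicatessen's implementation. It assumes that no intercept is added automatically (so the parameter count matches 1 + b + 2c and any intercept is supplied as a column of W), that the second-stage parameters come first in theta, and that the columns are ordered as (A-hat, W) and (Z, W). The name `sketch_2sls_ee` and the simulated data are hypothetical.

```python
import numpy as np


def sketch_2sls_ee(theta, y, A, Z, W=None):
    """Illustrative stacked 2SLS estimating functions (not the library code)."""
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    n = y.shape[0]
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    # Assumption: W is used as provided; an empty W means no exogenous terms
    W = np.empty((n, 0)) if W is None else np.asarray(W, dtype=float).reshape(n, -1)
    c = W.shape[1]

    theta = np.asarray(theta, dtype=float)
    beta, alpha = theta[:1 + c], theta[1 + c:]   # second-stage, then first-stage

    X = np.hstack([Z, W])                        # first-stage design
    a_hat = X @ alpha                            # predicted action
    X_hat = np.hstack([a_hat[:, None], W])       # second-stage design

    ee_second = (y - X_hat @ beta) * X_hat.T     # (1 + c)-by-n
    ee_first = (A - X @ alpha) * X.T             # (b + c)-by-n
    return np.vstack([ee_second, ee_first])


# Hypothetical simulated data: one instrument, an intercept, and one covariate
rng = np.random.default_rng(0)
n = 500
Z = rng.binomial(1, 0.5, size=(n, 1)).astype(float)
W = np.column_stack([np.ones(n), rng.normal(size=n)])
U = rng.normal(size=n)
A = 0.5 * Z[:, 0] + U + rng.normal(size=n)
y = 2 * A - U + 0.1 * W[:, 1] + rng.normal(size=n)

# Closed-form two-stage fit, which should solve the estimating equations
X = np.hstack([Z, W])
alpha = np.linalg.lstsq(X, A, rcond=None)[0]
X_hat = np.hstack([(X @ alpha)[:, None], W])
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
theta_hat = np.concatenate([beta, alpha])

ee = sketch_2sls_ee(theta_hat, y, A, Z, W)
ee_sums = ee.sum(axis=1)   # each row sum should be numerically near zero
```

Evaluating the sketch at the two-stage least-squares fit gives row sums that are zero up to floating-point error, which is the defining property of the root that an M-estimator searches for.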
- Parameters
theta (ndarray, list, vector) – Theta consists of 1 + b + 2c values. The first 1 + c parameters are for the second-stage model, with the remaining b + c parameters corresponding to the first-stage model.
y (ndarray, list, vector) – 1-dimensional vector of n observed values for the outcome of interest.
A (ndarray, list, vector) – 1-dimensional vector of n observed values for the action of interest.
Z (ndarray, list, vector) – 2-dimensional vector of n observed values for the b instrumental variable(s).
W (ndarray, list, vector, None, optional) – 2-dimensional vector of n observed values for c exogenous variables. This design matrix is stacked together in the first- and second-stage regression models as provided. This argument allows for the addition of an intercept to both regression models. Default is None.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations. This argument is intended to support the use of sampling or missingness weights.
- Returns
Returns a (1 + b + 2c)-by-n NumPy array evaluated for the input theta and y, A, Z
- Return type
array
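The parameter layout described above can be sketched with a small helper. The function name `theta_layout` is hypothetical and is not part of delicatessen; it only encodes the stated counts (1 + c second-stage parameters followed by b + c first-stage parameters).

```python
def theta_layout(b, c):
    """Return (total length, second-stage slice, first-stage slice)."""
    n_second = 1 + c          # action coefficient plus c exogenous terms
    n_first = b + c           # b instruments plus c exogenous terms
    total = n_second + n_first
    return total, slice(0, n_second), slice(n_second, total)


# One instrument and no exogenous variables: 2 parameters in total
total_a, second_a, first_a = theta_layout(b=1, c=0)

# One instrument with two columns in W: 6 parameters in total
total_b, second_b, first_b = theta_layout(b=1, c=2)

print(total_a, second_a, first_a)
print(total_b, second_b, first_b)
```

The returned slices can be applied to a fitted parameter vector to separate the second-stage and first-stage estimates.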
Examples
Construction of the estimating equation(s) with ee_2sls should be done similar to the following
>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_2sls
>>> from delicatessen.utilities import inverse_logit
Some generic data
>>> n = 200
>>> d = pd.DataFrame()
>>> d['Z'] = np.random.binomial(n=1, p=0.5, size=n)
>>> d['U'] = np.random.normal(size=n)
>>> pr_a = inverse_logit(d['U'] + d['Z'])
>>> d['A'] = np.random.binomial(n=1, p=pr_a, size=n)
>>> d['X'] = np.random.normal(size=n)
>>> d['C'] = 1  # Intercept column, passed through W in later examples
>>> d['Y'] = 2*d['A'] - d['U'] + 0.1*d['X'] + np.random.normal(size=n)
To start, consider 2SLS without any exogenous variables. The psi function is
>>> def psi(theta):
>>>     return ee_2sls(theta,
>>>                    y=d['Y'],
>>>                    A=d['A'],
>>>                    Z=d[['Z', ]])
Calling the M-estimator. Here, 2SLS has 2 parameters: 1 coefficient in the second-stage model and 1 coefficient in the first-stage model. Generally, starting with all 0. as initials is reasonable for 2SLS.
>>> estr = MEstimator(psi,
>>>                   init=[0., 0., ])
>>> estr.estimate()
Inspecting the parameter estimates, variance, and 95% confidence intervals
>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()
More specifically, the corresponding parameters are
>>> estr.theta[0]  # Second-stage model
>>> estr.theta[1]  # First-stage model
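For intuition about these two parameters, the single-instrument case without W has simple closed-form solutions, assuming (consistent with the 2-parameter count) that no intercept is included. The NumPy sketch below uses hypothetical simulated data and writes out the logistic function rather than importing it. It illustrates the algebra only and makes no claim about how the library computes its estimates.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 200
Z = rng.binomial(n=1, p=0.5, size=n)
U = rng.normal(size=n)
pr_a = 1 / (1 + np.exp(-(U + Z)))                   # inverse logit written out
A = rng.binomial(n=1, p=pr_a, size=n)
Y = 2 * A - U + rng.normal(size=n)

alpha_hat = np.sum(Z * A) / np.sum(Z * Z)           # first-stage coefficient
a_hat = alpha_hat * Z                               # predicted action
beta_hat = np.sum(a_hat * Y) / np.sum(a_hat ** 2)   # second-stage coefficient

# Algebraically, the second-stage coefficient reduces to a ratio of cross-products
beta_ratio = np.sum(Z * Y) / np.sum(Z * A)

# Both estimating equations sum to (numerically) zero at these values
first_stage_sum = np.sum((A - alpha_hat * Z) * Z)
second_stage_sum = np.sum((Y - beta_hat * a_hat) * a_hat)
```

Under the stated assumption, `beta_hat` and `beta_ratio` coincide, and both sums vanish up to floating-point error.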
Here, the parameter of interest is estr.theta[0], which under the IV assumptions is a causal effect of \(A\) on \(Y\).
To add an intercept term to the models or add exogenous variables, 2SLS is specified as
>>> def psi(theta):
>>>     return ee_2sls(theta,
>>>                    y=d['Y'],
>>>                    A=d['A'],
>>>                    Z=d[['Z', ]],
>>>                    W=d[['C', 'X']])
Here, 6 parameters are estimated since the intercept and the single exogenous variable each show up in both stages of 2SLS
>>> estr = MEstimator(psi,
>>>                   init=[0., 0., 0., 0., 0., 0.])
>>> estr.estimate()
>>> estr.theta[0:3]  # Second-stage model
>>> estr.theta[3:]   # First-stage model
The parameter of interest is again estr.theta[0].
Finally, there is also support for multiple instruments. This can be done by including multiple covariates in Z. Below is an example of how the function would look
>>> def psi(theta):
>>>     return ee_2sls(theta,
>>>                    y=d['Y'],
>>>                    A=d['A'],
>>>                    Z=d[['Z1', 'Z2']],
>>>                    W=d[['C', 'X']])
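The generic data above does not define Z1 and Z2, so the following sketch simulates hypothetical data with two instruments and computes 2SLS point estimates by two least-squares fits in NumPy. It only illustrates the expected parameter layout (1 + b + 2c = 7 with b = 2 and c = 2). The data-generating values are assumptions, and variance estimation is left to MEstimator.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
Z = np.column_stack([rng.binomial(1, 0.5, size=n),    # hypothetical instrument Z1
                     rng.normal(size=n)])             # hypothetical instrument Z2
W = np.column_stack([np.ones(n),                      # intercept column C
                     rng.normal(size=n)])             # exogenous covariate X
U = rng.normal(size=n)                                # unmeasured confounder
A = Z[:, 0] + 0.5 * Z[:, 1] + U + rng.normal(size=n)
Y = 2 * A - U + 0.1 * W[:, 1] + rng.normal(size=n)

# First stage: regress A on (Z, W)
X = np.hstack([Z, W])
alpha_hat = np.linalg.lstsq(X, A, rcond=None)[0]

# Second stage: regress Y on (A-hat, W)
X_hat = np.column_stack([X @ alpha_hat, W])
beta_hat = np.linalg.lstsq(X_hat, Y, rcond=None)[0]

theta_hat = np.concatenate([beta_hat, alpha_hat])     # second-stage parameters first
print(theta_hat.shape)
print(beta_hat[0])    # coefficient on the predicted action
```

With this layout, a vector of seven zeros would be the corresponding choice of starting values, following the guidance on initials given earlier.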
References
Meijer E, & Wansbeek T. (2007). The sample selection model from a method of moments perspective. Econometric Reviews, 26(1), 25-51.
Zivich PN, Cole SR, Edwards JK, Mulholland GE, Shook-Sa BE, & Tchetgen Tchetgen EJ. (2023). Introducing proximal causal inference for epidemiologists. American Journal of Epidemiology, 192(7), 1224-1227.
Zivich PN. (2024). RE: "Estimating the effect of a treatment when there is non-adherence in a trial". American Journal of Epidemiology, 194(2), 552-553.