delicatessen.estimating_equations.causal.ee_gestimation_snmm_iv
- ee_gestimation_snmm_iv(theta, y, Z, A, W, V, X=None, model='linear', model_instrument='logistic', weights=None)
- Estimating equations for g-estimation of structural mean models (SMMs) with an instrumental variable (IV). The
parameter(s) of interest are the parameter(s) of the corresponding SMM. With an IV, the linear (additive) SMM remains the same (see
ee_gestimation_snmm). Rather than model the propensity score for the action variable (\(A\)), we instead model the IV (\(Z\)). For the inefficient g-estimator, we solve for \(\beta\) in the following estimating equation
\[\sum_{i=1}^n \begin{bmatrix} \left\{ H(\beta) \times (Z_i - \pi_i) \right\} \times V_i \\ \left\{ Z_i - g(W_i^T \alpha) \right\} W_i \end{bmatrix} = 0\]
where \(\pi_i = \hat{E}[Z \mid W]\), and \(H(\beta) = Y - \beta A \mathbb{V}\) for a linear SMM and \(H(\beta) = Y \times \exp(-A \beta \mathbb{V})\) for a log-linear SMM. Note that \(V \subseteq W\), where \(W\) is the set of confounding variables for the instrument (if there are any). The length of the parameter vector is b+c, where b is the number of columns in V, and c is the number of columns in W.
Alternatively, the efficient g-estimator can be used. As with the other g-estimator of the structural mean model, we replace \(H(\beta)\) with \(\{H(\beta) - E[H(\beta) \mid W]\}\) in the prior estimating equation and specify a model for \(E[H(\beta) \mid W]\).
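To make the stacked structure of the inefficient estimating equation concrete, the following is a minimal NumPy-only sketch of the two blocks as they are written above. It is not delicatessen's implementation: the function name `psi_snmm_iv_sketch`, the helper `expit`, and the simulated data are assumptions for illustration, and a logistic model is assumed for \(g\).

```python
import numpy as np


def expit(x):
    """Inverse logit, written out so the sketch only needs NumPy."""
    return 1.0 / (1.0 + np.exp(-x))


def psi_snmm_iv_sketch(theta, y, Z, A, W, V, model="linear"):
    """Hypothetical illustration of the stacked estimating functions.

    Returns a (b+c)-by-n array: b rows for the SMM, c rows for the
    logistic instrument model. Not the library's code.
    """
    theta = np.asarray(theta, dtype=float)
    y, Z, A = (np.asarray(v, dtype=float) for v in (y, Z, A))
    W, V = np.asarray(W, dtype=float), np.asarray(V, dtype=float)

    b = V.shape[1]                       # number of SMM parameters
    beta, alpha = theta[:b], theta[b:]   # split theta into (beta, alpha)

    pi = expit(W @ alpha)                # fitted E[Z | W] under a logistic model
    if model == "linear":
        h = y - A * (V @ beta)           # H(beta) for the linear SMM
    elif model == "poisson":
        h = y * np.exp(-A * (V @ beta))  # H(beta) for the log-linear SMM
    else:
        raise ValueError("model must be 'linear' or 'poisson'")

    ee_smm = (h * (Z - pi))[None, :] * V.T   # b-by-n block
    ee_iv = (Z - pi)[None, :] * W.T          # c-by-n block
    return np.vstack([ee_smm, ee_iv])


# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n = 200
Z = rng.binomial(n=1, p=0.5, size=n)
U = rng.normal(size=n)
A = rng.binomial(n=1, p=expit(U + Z))
Y = 2 * A - U + rng.normal(size=n)
C = np.ones((n, 1))                      # intercept-only design matrix

out_linear = psi_snmm_iv_sketch([0.0, 0.0], Y, Z, A, W=C, V=C)
out_loglin = psi_snmm_iv_sketch([0.3, 0.0], Y, Z, A, W=C, V=C, model="poisson")
```

Each column of the returned array corresponds to one observation, so summing across columns gives the left-hand side of the estimating equation at the supplied `theta`.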
- theta : ndarray, list, vector
Theta consists of b+c values if X is None, and b+c+d values if X is not None (matching the number of rows of the returned array).
- y : ndarray, list, vector
1-dimensional vector of n observed values of the outcome.
- Z : ndarray, list, vector
1-dimensional vector of n observed values of the instrument. Values should all be 0 or 1.
- A : ndarray, list, vector
1-dimensional vector of n observed values of the action. Values should all be 0 or 1.
- W : ndarray, list, vector
2-dimensional vector of n observed values for c columns of a design matrix to model the expected value of the instrument, Z.
- V : ndarray, list, vector
2-dimensional vector of n observed values for b columns of a design matrix for the structural mean model. Note that the design matrix here is expected to not include the observed values of A.
- X : ndarray, list, vector, None, optional
Default of this argument is None, which implements the estimating equation for the inefficient g-estimator. To use the efficient g-estimator, a 2-dimensional vector of n observed values for d columns of a design matrix for the \(E[H(\beta) \mid W]\) model should be provided here.
- model : str, optional
Type of structural mean model to fit. Options are currently: linear, poisson. Default is linear. The Poisson model specification can be used for positive continuous data, or with binary data in order to estimate causal risk ratios.
- model_instrument : str, optional
Type of model to fit for the instrument, E[Z | W]. This choice should be made depending on the type of variable the instrument is. Options are currently: linear, poisson, or logistic. Default is logistic.
- weights : ndarray, list, vector, None, optional
1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations. This argument is intended to support the use of sampling or missingness weights.
- array :
Returns a (b+c)-by-n (inefficient) or (b+c+d)-by-n (efficient) NumPy array evaluated for the input theta.
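Because the number of rows in the returned array must match the length of theta, it can help to count the parameters before choosing starting values. The sketch below is a hypothetical helper (`n_parameters` is not part of delicatessen), and it assumes that d is the number of columns in X.

```python
import numpy as np


def n_parameters(V, W, X=None):
    """Hypothetical helper: length of theta implied by the design matrices."""
    b = np.asarray(V).shape[1]                       # SMM parameters
    c = np.asarray(W).shape[1]                       # instrument model parameters
    d = 0 if X is None else np.asarray(X).shape[1]   # assumed: columns of X
    return b + c + d


# Placeholder design matrices: intercept-only V, intercept plus one covariate in W
V_demo = np.ones((200, 1))
W_demo = np.ones((200, 2))

init = [0.0] * n_parameters(V_demo, W_demo)                     # inefficient g-estimator
n_efficient = n_parameters(V_demo, W_demo, X=np.ones((200, 2)))  # efficient g-estimator
```

The resulting `init` list has one starting value per row of the array the estimating equation returns.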
Construction of an estimating equation(s) with ee_gestimation_snmm_iv should be done similar to the following
>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_gestimation_snmm_iv
>>> from delicatessen.utilities import inverse_logit
Some generic data
>>> n = 200
>>> d = pd.DataFrame()
>>> d['Z'] = np.random.binomial(n=1, p=0.5, size=n)
>>> d['U'] = np.random.normal(size=n)
>>> pr_a = inverse_logit(d['U'] + d['Z'])
>>> d['A'] = np.random.binomial(n=1, p=pr_a, size=n)
>>> d['X'] = np.random.normal(size=n)
>>> d['Y'] = 2*d['A'] - d['U'] + 0.1*d['X'] + np.random.normal(size=n)
>>> d['C'] = 1
To start, consider 2SLS without any exogenous variables. The psi function is
>>> def psi(theta):
...     return ee_gestimation_snmm_iv(theta, y=d['Y'],
...                                   Z=d['Z'], A=d['A'],
...                                   W=d[['C', ]], V=d[['C', ]],
...                                   model_instrument='logistic')
Calling the M-estimator. There are 2 parameters in total: one for the structural mean model and one for the nuisance (instrument) model. Generally, starting with all 0. as initials is reasonable.
>>> estr = MEstimator(psi, init=[0., 0., ])
>>> estr.estimate()
Inspecting the parameter estimates, variance, and 95% confidence intervals
>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()
More specifically, the corresponding parameters are
>>> estr.theta[0]  # Structural mean model parameter
>>> estr.theta[1]  # Nuisance model parameter
Here, the parameter of interest is estr.theta[0], which under the IV assumptions is a causal effect of \(A\) on \(Y\).
To add exogenous variables (i.e., variables related to the instrument and outcome), the corresponding g-estimator is specified as
>>> def psi(theta):
...     return ee_gestimation_snmm_iv(theta, y=d['Y'],
...                                   Z=d['Z'], A=d['A'],
...                                   W=d[['C', 'X']], V=d[['C', ]],
...                                   model_instrument='logistic')
Here, 3 parameters are estimated since there is a single exogenous variable that shows up in only the nuisance model
>>> estr = MEstimator(psi, init=[0., 0., 0.])
>>> estr.estimate()
>>> estr.theta[0]   # Structural mean model parameter
>>> estr.theta[1:]  # Nuisance model parameters
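As a sanity check on the first, intercept-only example (V and W both containing only the constant column), the estimating equation as written above can be solved by hand: the instrument model gives \(\pi_i\) equal to the sample mean of Z, and the linear SMM row then has a closed-form root. The sketch below is NumPy-only, uses simulated data that are assumptions for illustration, and compares that root with the classical Wald ratio. It does not call delicatessen, so it only reflects the equation as documented here.

```python
import numpy as np

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(2024)
n = 2000
Z = rng.binomial(n=1, p=0.5, size=n)
U = rng.normal(size=n)
A = rng.binomial(n=1, p=1 / (1 + np.exp(-(U + Z))))
Y = 2 * A - U + rng.normal(size=n)

# With an intercept-only instrument model, pi_i is the sample mean of Z
resid_z = Z - Z.mean()

# Closed-form root of sum_i (Y_i - beta * A_i) * (Z_i - pi_i) = 0
beta_ee = np.sum(Y * resid_z) / np.sum(A * resid_z)

# Wald ratio: difference in mean outcome over difference in mean action, by Z
wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (A[Z == 1].mean() - A[Z == 0].mean())
```

For a binary instrument the two quantities agree up to floating-point error, which gives a quick reference value for the structural mean model parameter in this simple setting.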
Modification by \(X\) can be assessed by including that variable in the structural mean model (via V=d[['C', 'X']]). A continuous variable can also be used as an IV by updating the model_instrument argument. Finally, a multiplicative structural mean model can be considered instead by updating the model argument.
Robins JM (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics-Theory and Methods, 23(8), 2379-2412.
Vansteelandt S, & Joffe M (2014). Structural nested models and G-estimation: the partially realized promise. Statist Sci, 29(4), 707-731.