delicatessen.estimating_equations.causal.ee_gestimation_snmm_iv

ee_gestimation_snmm_iv(theta, y, Z, A, W, V, X=None, model='linear', model_instrument='logistic', weights=None)

Estimating equations for g-estimation of structural mean models (SMMs) with an instrumental variable (IV). The parameter(s) of interest are the parameter(s) of the corresponding SMM. With an IV, the linear (additive) SMM remains the same (see ee_gestimation_snmm). Rather than model the propensity score for the action variable (\(A\)), we instead model the IV (\(Z\)). For the inefficient g-estimator, we solve for \(\beta\) in the following estimating equation

\[\sum_{i=1}^n \begin{bmatrix} \left\{ H(\beta) \times (Z_i - \pi_i) \right\} \times V_i \\ \left\{ Z_i - g(W_i^T \alpha) \right\} W_i \end{bmatrix} = 0\]

where \(\pi_i = \hat{E}[Z \mid W]\), and \(H(\beta) = Y - \beta A \mathbb{V}\) for a linear SMM and \(H(\beta) = Y \times \exp(-A \beta \mathbb{V})\) for a log-linear SMM. Note that \(V \subseteq W\), where \(W\) is the set of confounding variables for the instrument (if there are any). The length of the parameter vector is b+c, where b is the number of columns in V, and c is the number of columns in W.
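As a rough illustration of the stacked estimating functions, the following NumPy sketch evaluates both blocks for a given parameter vector. It is not the library's implementation: the function name `snmm_iv_sketch`, the logistic choice for \(g\), and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def snmm_iv_sketch(theta, y, Z, A, W, V, model='linear'):
    """Hypothetical sketch of the inefficient g-estimator's estimating
    functions, assuming a logistic model for the instrument."""
    y, Z, A = (np.asarray(v, dtype=float) for v in (y, Z, A))
    W, V = np.asarray(W, dtype=float), np.asarray(V, dtype=float)
    b = V.shape[1]                              # number of SMM parameters
    beta = np.asarray(theta[:b], dtype=float)   # SMM parameters
    alpha = np.asarray(theta[b:], dtype=float)  # instrument model parameters

    # Nuisance model: pi_i = g(W_i^T alpha), with g the inverse logit
    pi = 1 / (1 + np.exp(-(W @ alpha)))

    # H(beta) for the linear or log-linear structural mean model
    snm = A * (V @ beta)
    if model == 'linear':
        H = y - snm
    elif model == 'poisson':
        H = y * np.exp(-snm)
    else:
        raise ValueError("model must be 'linear' or 'poisson'")

    ee_snm = (H * (Z - pi))[:, None] * V        # n-by-b block
    ee_iv = (Z - pi)[:, None] * W               # n-by-c block
    return np.vstack([ee_snm.T, ee_iv.T])       # (b+c)-by-n array

# Simulated data (illustrative only)
rng = np.random.default_rng(0)
n = 200
Z = rng.binomial(1, 0.5, size=n)
U = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-(U + Z))))
y = 2*A - U + rng.normal(size=n)
C = np.ones((n, 1))                             # intercept-only design matrix

out = snmm_iv_sketch([0., 0.], y, Z, A, W=C, V=C)
print(out.shape)   # (2, 200)
```

Each row of the returned array corresponds to one parameter and each column to one observation, so summing across columns gives the left-hand side of the estimating equation.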

Alternatively, the efficient g-estimator can also be used. As with the other g-estimator of the structural mean model, we replace \(H(\beta)\) with \(\{H(\beta) - E[H(\beta) \mid W]\}\) in the prior estimating equation and specify a model for \(E[H(\beta) \mid W]\).

theta (ndarray, list, vector): Theta consists of b+c values if X is None, and b+c+d values if X is not None.
y (ndarray, list, vector): 1-dimensional vector of n observed values of the outcome.
Z (ndarray, list, vector): 1-dimensional vector of n observed values of the instrument. With the default logistic instrument model, values should all be 0 or 1.
A (ndarray, list, vector): 1-dimensional vector of n observed values of the action. Values should all be 0 or 1.
W (ndarray, list, vector): 2-dimensional vector of n observed values for c columns of a design matrix to model the expected value of Z.
V (ndarray, list, vector): 2-dimensional vector of n observed values for b columns of a design matrix for the structural mean model. Note that the design matrix here is expected to not include the observed values of A.
X (ndarray, list, vector, None, optional): Default of this argument is None, which implements the estimating equation for the inefficient g-estimator. To use the efficient g-estimator, a 2-dimensional vector of n observed values for d columns of a design matrix for the \(E[H(\beta) \mid W]\) model should be provided here.
model (str, optional): Type of structural mean model to fit. Options are currently: linear, poisson. Default is linear. The Poisson model specification can be used for positive continuous data, or with binary data in order to estimate causal risk ratios.
model_instrument (str, optional): Type of model to fit for the instrument, \(E[Z \mid W]\). This choice should be made depending on the type of variable the instrument is. Options are currently: linear, poisson, or logistic. Default is logistic.
weights (ndarray, list, vector, None, optional): 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations. This argument is intended to support the use of sampling or missingness weights.

array :: Returns a (b+c)-by-n (inefficient) or (b+c+d)-by-n (efficient) NumPy array evaluated for the input theta, where d is the number of columns in X.

Construction of the estimating equation(s) with ee_gestimation_snmm_iv can be done similarly to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen.utilities import inverse_logit
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_gestimation_snmm_iv

Some generic data

>>> n = 200
>>> d = pd.DataFrame()
>>> d['Z'] = np.random.binomial(n=1, p=0.5, size=n)
>>> d['U'] = np.random.normal(size=n)
>>> pr_a = inverse_logit(d['U'] + d['Z'])
>>> d['A'] = np.random.binomial(n=1, p=pr_a, size=n)
>>> d['X'] = np.random.normal(size=n)
>>> d['Y'] = 2*d['A'] - d['U'] + 0.1*d['X'] + np.random.normal(size=n)
>>> d['C'] = 1

To start, consider the g-estimator without any exogenous variables, so both design matrices contain only an intercept. The psi function is

>>> def psi(theta):
>>>     return ee_gestimation_snmm_iv(theta, y=d['Y'],
>>>                                   Z=d['Z'], A=d['A'],
>>>                                   W=d[['C', ]], V=d[['C', ]],
>>>                                   model_instrument='logistic')

Calling the M-estimator. Here, there are 2 parameters in total: one for the structural mean model and one for the nuisance (instrument) model. Generally, starting with all 0. as initial values is reasonable.

>>> estr = MEstimator(psi,
>>>                   init=[0., 0., ])
>>> estr.estimate()

Inspecting the parameter estimates, variance, and 95% confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()
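To make the link between the variance and the confidence intervals concrete, here is a minimal sketch of Wald-type 95% intervals computed by hand. It assumes that the variance attribute holds the estimated covariance matrix of the parameter estimates; the numbers in `theta` and `variance` are hypothetical stand-ins, not results from the data above.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical values standing in for estr.theta and estr.variance
theta = np.array([1.8, 0.05])
variance = np.array([[0.25, 0.00],
                     [0.00, 0.02]])

z = NormalDist().inv_cdf(0.975)          # critical value for a 95% interval
se = np.sqrt(np.diag(variance))          # standard errors from the diagonal
ci = np.column_stack([theta - z*se,      # lower limits
                      theta + z*se])     # upper limits
print(ci)
```

Each row of `ci` holds the lower and upper limit for the corresponding element of `theta`.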

More specifically, the corresponding parameters are

>>> estr.theta[0]   # Structural mean model parameter
>>> estr.theta[1]   # Nuisance model parameter

Here, the parameter of interest is estr.theta[0], which under the IV assumptions is a causal effect of \(A\) on \(Y\).
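For this intercept-only specification with a linear SMM, the first estimating equation can be solved by hand, which gives a quick sanity check. An intercept-only logistic model yields \(\hat{\pi} = \bar{Z}\), so solving \(\sum_i (Y_i - \beta A_i)(Z_i - \bar{Z}) = 0\) gives the familiar ratio (Wald) IV estimator. The sketch below uses NumPy only; the simulated data and variable names are illustrative assumptions and not part of the library.

```python
import numpy as np

# Simulated data (illustrative only), mirroring the structure of the example
rng = np.random.default_rng(2024)
n = 5000
Z = rng.binomial(1, 0.5, size=n)
U = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-(U + Z))))
Y = 2*A - U + rng.normal(size=n)

# Intercept-only instrument model: the fitted value is the sample mean of Z
pi_hat = Z.mean()
resid_z = Z - pi_hat

# Closed-form solution of sum_i (Y_i - beta*A_i) * (Z_i - pi_hat) = 0
beta_hat = np.sum(Y * resid_z) / np.sum(A * resid_z)

# Equivalent ratio of mean differences across instrument levels
wald = ((Y[Z == 1].mean() - Y[Z == 0].mean())
        / (A[Z == 1].mean() - A[Z == 0].mean()))

# The estimating equation evaluated at beta_hat should be numerically zero
ee_sum = np.sum((Y - beta_hat * A) * resid_z)
print(beta_hat, wald, ee_sum)
```

The two expressions for the estimate agree up to floating-point error, and a root-finding solution of the same estimating equations should match them for this simple specification.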

To add exogenous variables (i.e., variables related to the instrument and outcome), the corresponding g-estimator is specified as

>>> def psi(theta):
>>>     return ee_gestimation_snmm_iv(theta, y=d['Y'],
>>>                                   Z=d['Z'], A=d['A'],
>>>                                   W=d[['C', 'X']], V=d[['C', ]],
>>>                                   model_instrument='logistic')

Here, 3 parameters are estimated since there is a single exogenous variable that shows up in only the nuisance model

>>> estr = MEstimator(psi,
>>>                   init=[0., 0., 0.])
>>> estr.estimate()
>>> estr.theta[0]     # Structural Mean Model Parameters
>>> estr.theta[1:]    # Nuisance Model Parameters

Modification by \(X\) can be assessed by including that variable in the structural mean model (via V=d[['C', 'X']]). A continuous variable can also be used as an IV by updating the model_instrument argument. Finally, a multiplicative structural mean model can be considered instead by updating the model argument.

Robins JM (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics-Theory and Methods, 23(8), 2379-2412.

Vansteelandt S, & Joffe M (2014). Structural nested models and G-estimation: the partially realized promise. Statistical Science, 29(4), 707-731.