delicatessen.estimating_equations.causal.ee_gestimation_snmm

ee_gestimation_snmm(theta, y, A, W, V, X=None, model='linear', weights=None)

Estimating equations for g-estimation of structural mean models (SMMs). The parameters of interest are the parameters of the corresponding SMM. Rather than estimating the average causal effect, g-estimation of an SMM estimates the average causal effect among those who received the action, within strata of a set of covariates, \(V\). The available SMMs are the linear SMM and the log-linear SMM. The linear SMM is defined as

\[E[Y^a - Y^{0} | A=a, V] = \beta_1 a + \beta_2 a V\]

This model corresponds to the average causal effect among those with \(A=a\) by \(V\). The log-linear SMM is defined as

\[\frac{E[Y^a | A=a, V]}{E[Y^{0} | A=a, V]} = \exp(\beta_1 a + \beta_2 a V)\]

This model corresponds to the causal mean ratio among those with \(A=a\) by \(V\). Note that the log-linear SMM is only defined when \(Y > 0\). The parameters of either SMM are identified under the assumptions of causal consistency and exchangeability with positivity.
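As a small illustration of how the two parameterizations are read, the following sketch evaluates each SMM contrast for a binary \(V\). The helper name smm_contrast and the coefficient values are hypothetical and chosen only for illustration; they are not part of delicatessen.

```python
import math


def smm_contrast(beta1, beta2, a, v, model="linear"):
    """Evaluate the SMM contrast for action a and covariate value v.

    Hypothetical helper for illustration only.
    """
    linear_predictor = beta1 * a + beta2 * a * v
    if model == "linear":
        # E[Y^a - Y^0 | A=a, V=v]: a mean difference
        return linear_predictor
    if model == "log-linear":
        # E[Y^a | A=a, V=v] / E[Y^0 | A=a, V=v]: a mean ratio
        return math.exp(linear_predictor)
    raise ValueError("model must be 'linear' or 'log-linear'")


# Linear SMM with illustrative coefficients beta = (-2.0, 2.7)
diff_v0 = smm_contrast(-2.0, 2.7, a=1, v=0)   # beta_1
diff_v1 = smm_contrast(-2.0, 2.7, a=1, v=1)   # beta_1 + beta_2

# Log-linear SMM with illustrative coefficients beta = (-0.1, 0.2)
ratio_v0 = smm_contrast(-0.1, 0.2, a=1, v=0, model="log-linear")  # exp(beta_1)
ratio_v1 = smm_contrast(-0.1, 0.2, a=1, v=1, model="log-linear")  # exp(beta_1 + beta_2)
```

In both models the contrast is null when a=0: the difference is 0 and the ratio is 1.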

Two different estimating equations are available for g-estimation. The first is referred to as the ‘inefficient’ g-estimator. For the inefficient g-estimator we solve for \(\beta\) in the following estimating equation

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} \left\{ H_i(\beta) \times (A_i - \pi_i) \right\} \times V_i \\ \left\{ A_i - \text{expit}(W_i^T \alpha) \right\} W_i \end{bmatrix} = 0\end{split}\]

where \(\pi_i = \text{expit}(W_i^T \alpha)\), \(H_i(\beta) = Y_i - A_i V_i^T \beta\) for a linear SMM, and \(H_i(\beta) = Y_i \times \exp(-A_i V_i^T \beta)\) for a log-linear SMM. Note that \(V \subseteq W\), where \(W\) is the set of confounding variables. The length of the parameter vector is b+c, where b is the number of columns in V, and c is the number of columns in W.
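The inefficient estimating functions can be written out directly in NumPy. The following is a minimal sketch of the equations above, not the implementation used by delicatessen; the function name psi_inefficient is hypothetical, weights are ignored, and theta is assumed to be ordered as the b SMM coefficients followed by the c propensity score coefficients.

```python
import numpy as np


def psi_inefficient(theta, y, A, W, V, model="linear"):
    """Sketch of the inefficient g-estimator estimating functions.

    Hypothetical helper. Returns a (b+c)-by-n array, assuming theta is
    ordered as (beta, alpha).
    """
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    W = np.asarray(W, dtype=float)
    V = np.asarray(V, dtype=float)
    theta = np.asarray(theta, dtype=float)

    b = V.shape[1]
    beta, alpha = theta[:b], theta[b:]

    # Propensity score model: pi_i = expit(W_i^T alpha)
    pi = 1.0 / (1.0 + np.exp(-(W @ alpha)))

    # H_i(beta): outcome with the effect of the action removed
    blip = A * (V @ beta)
    if model == "linear":
        H = y - blip
    elif model == "poisson":
        H = y * np.exp(-blip)
    else:
        raise ValueError("model must be 'linear' or 'poisson'")

    ee_smm = (H * (A - pi)) * V.T   # b-by-n block for beta
    ee_ps = (A - pi) * W.T          # c-by-n block for alpha
    return np.vstack([ee_smm, ee_ps])
```

Summing the rows of the returned array over the n columns gives the left-hand side of the estimating equation; a root-finder then searches for the theta at which those sums are zero.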

The second implementation for g-estimation is the ‘efficient’ g-estimator. For the efficient g-estimator we replace \(H(\beta)\) with \(\{H(\beta) - E[H(\beta) | W]\}\) in the prior estimating equation and specify a model for \(E[H(\beta) | W]\). The corresponding stacked estimating equations are

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} \left\{ (H_i(\beta) - g^{-1}(W_i^T \gamma)) \times (A_i - \pi_i) \right\} \times V_i \\ \left\{ A_i - \text{expit}(W_i^T \alpha) \right\} W_i \\ \left\{ H_i(\beta) - g^{-1}(W_i^T \gamma) \right\} W_i \end{bmatrix} = 0\end{split}\]

where \(g^{-1}\) is the inverse transformation for the specified SMM. Therefore, there are b+c+d parameters for the efficient g-estimator, where d is the number of parameters in the model for \(E[H(\beta) | W]\).
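A corresponding sketch for the efficient g-estimator follows. It is an illustration under stated assumptions rather than the library's code: the name psi_efficient is hypothetical, weights are ignored, theta is assumed to be ordered as (beta, alpha, gamma), the design matrix for \(E[H(\beta) | W]\) is passed as X, and \(g^{-1}\) is assumed to be the identity for the linear SMM and the exponential function for the log-linear SMM.

```python
import numpy as np


def psi_efficient(theta, y, A, W, V, X, model="linear"):
    """Sketch of the efficient g-estimator estimating functions.

    Hypothetical helper. Returns a (b+c+d)-by-n array, assuming theta is
    ordered as (beta, alpha, gamma).
    """
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    W, V, X = (np.asarray(m, dtype=float) for m in (W, V, X))
    theta = np.asarray(theta, dtype=float)

    b, c = V.shape[1], W.shape[1]
    beta, alpha, gamma = theta[:b], theta[b:b + c], theta[b + c:]

    # Propensity score model: pi_i = expit(W_i^T alpha)
    pi = 1.0 / (1.0 + np.exp(-(W @ alpha)))

    blip = A * (V @ beta)
    if model == "linear":
        H = y - blip
        H_hat = X @ gamma            # assumed identity link
    elif model == "poisson":
        H = y * np.exp(-blip)
        H_hat = np.exp(X @ gamma)    # assumed log link
    else:
        raise ValueError("model must be 'linear' or 'poisson'")

    resid = H - H_hat
    ee_smm = (resid * (A - pi)) * V.T   # b-by-n block for beta
    ee_ps = (A - pi) * W.T              # c-by-n block for alpha
    ee_h = resid * X.T                  # d-by-n block for gamma
    return np.vstack([ee_smm, ee_ps, ee_h])
```

The only change from the inefficient version is that \(H(\beta)\) is centered by its modeled conditional mean, with one extra block of estimating functions for that model's coefficients.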

Parameters

theta (ndarray, list, vector) – Parameter values. Theta consists of b+c values if X is None, and b+c+d values if X is not None.
y (ndarray, list, vector) – 1-dimensional vector of n observed values of the outcome.
A (ndarray, list, vector) – 1-dimensional vector of n observed values of the action. The A values should all be 0 or 1.
W (ndarray, list, vector) – 2-dimensional vector of n observed values for c columns of a design matrix to model the expected value of A.
V (ndarray, list, vector) – 2-dimensional vector of n observed values for b columns of a design matrix for the structural mean model. Note that the design matrix here is expected to not include the observed values of A.
X (ndarray, list, vector, None, optional) – Default of this argument is None, which implements the estimating equation for the inefficient g-estimator. To use the efficient g-estimator, a 2-dimensional vector of n observed values for d columns of a design matrix for the \(E[H(\beta) | W]\) model should be provided here.
model (str, optional) – Type of structural mean model to fit. Options are currently: linear, poisson. Default is linear. The Poisson model specification can be used for positive continuous data, or with binary data in order to estimate causal risk ratios.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations. This argument is intended to support the use of sampling or missingness weights.

Returns

Returns a (b+c)-by-n (inefficient) or (b+c+d)-by-n (efficient) NumPy array evaluated for the input theta.

Return type

array

Examples

Construction of the estimating equations with ee_gestimation_snmm should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_gestimation_snmm

Some generic data

>>> n = 200
>>> d = pd.DataFrame()
>>> d['W'] = np.random.normal(size=n)
>>> d['V'] = np.random.binomial(1, p=0.5, size=n)
>>> d['A'] = np.random.binomial(1, p=logistic.cdf(0.25 + 0.5*d['V'] + d['W']), size=n)
>>> d['Ya0'] = 12.75 - 3.5*d['V'] + d['W'] + np.random.normal(size=n)
>>> d['Ya1'] = 10.75 - 0.8*d['V'] + d['W'] + np.random.normal(size=n)
>>> d['Y'] = (1-d['A'])*d['Ya0'] + d['A']*d['Ya1']
>>> d['C'] = 1

Defining psi, or the stacked estimating equations. Note that A is the action of interest and Y is the outcome of interest. Here, we are interested in estimating the following linear SMM

\[E[Y^a - Y^{0} | A=a, V] = \beta_1 a + \beta_2 a V\]

>>> def psi(theta):
...     return ee_gestimation_snmm(theta,
...                                y=d['Y'], A=d['A'],
...                                W=d[['C', 'V', 'W']],
...                                V=d[['C', 'V']])

Calling the M-estimator. Since there are 2 coefficients in the SMM and 3 coefficients in the \(E[A|W]\) model, the total number of initial values should be 2+3=5:

>>> estr = MEstimator(psi,
...                   init=[0., ]*5)
>>> estr.estimate(solver='lm')

Inspecting the parameter estimates, variance, and 95% confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

More specifically, the corresponding parameters are

>>> estr.theta[0]     # beta_1 of SMM
>>> estr.theta[1]     # beta_2 of SMM
>>> estr.theta[2:]    # propensity score regression coefficients
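For the simulated data above, the data-generating mechanism implies \(\beta_1 = 10.75 - 12.75 = -2\) and \(\beta_2 = -0.8 - (-3.5) = 2.7\), because W and the mean-zero noise terms cancel in the expected difference between Ya1 and Ya0. The sketch below checks this directly from the potential outcomes. It departs from the example in two ways that are assumptions made here for illustration: it uses a much larger sample, and it draws from a seeded NumPy Generator (the seed is arbitrary) rather than np.random. With n = 200, estimates from the M-estimator will vary noticeably around these values.

```python
import numpy as np

rng = np.random.default_rng(2024)   # arbitrary seed, for reproducibility only
n = 200_000                         # larger than the n = 200 used in the example

# Same data-generating mechanism as the example above
W = rng.normal(size=n)
V = rng.binomial(1, 0.5, size=n)
pr_a = 1.0 / (1.0 + np.exp(-(0.25 + 0.5 * V + W)))
A = rng.binomial(1, pr_a, size=n)
Ya0 = 12.75 - 3.5 * V + W + rng.normal(size=n)
Ya1 = 10.75 - 0.8 * V + W + rng.normal(size=n)

# Average causal effect among those with A=1, by stratum of V
diff = Ya1 - Ya0
effect_v0 = diff[(A == 1) & (V == 0)].mean()
effect_v1 = diff[(A == 1) & (V == 1)].mean()

beta1 = effect_v0                 # should be close to -2
beta2 = effect_v1 - effect_v0     # should be close to 2.7
```

This check is only possible in a simulation, where both potential outcomes are known; with real data only Y is observed, which is why g-estimation is needed.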

The efficient g-estimator can be implemented by providing a design matrix to the argument X

>>> def psi(theta):
...     return ee_gestimation_snmm(theta,
...                                y=d['Y'], A=d['A'],
...                                W=d[['C', 'V', 'W']],
...                                V=d[['C', 'V']],
...                                X=d[['C', 'V', 'W']])

Here, there are 2+3+3=8 parameters to estimate

>>> estr = MEstimator(psi,
...                   init=[0., ]*8)
>>> estr.estimate(solver='lm')

A log-linear SMM for this example can be estimated by specifying model='poisson'.

References

Dukes O, & Vansteelandt S (2018). A note on G-estimation of causal risk ratios. American Journal of Epidemiology, 187(5), 1079-1084.

Robins JM, Mark SD, & Newey WK (1992). Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics, 48(2), 479-495.

Vansteelandt S, & Joffe M (2014). Structural nested models and G-estimation: the partially realized promise. Statist Sci, 29(4), 707-731.

Vansteelandt S, & Sjolander A (2016). Revisiting g-estimation of the effect of a time-varying exposure subject to time-varying confounding. Epidemiologic Methods, 5(1), 37-56.