delicatessen.estimating_equations.causal.ee_gestimation_snmm

ee_gestimation_snmm(theta, y, A, W, V, X=None, model='linear', weights=None)

Estimating equations for g-estimation of structural mean models (SMMs). The parameters of interest are the parameters of the corresponding SMM. Rather than estimating the average causal effect, g-estimation of an SMM estimates the average causal effect among those acted on, within strata of a set of covariates, \(V\). Options for the SMM include the linear SMM and the log-linear SMM. The linear SMM is defined as

\[E[Y^a - Y^{0} | A=a, V] = \beta_1 a + \beta_2 a V\]

This model corresponds to the average causal effect among those with \(A=a\) by \(V\). The log-linear SMM is defined as

\[\frac{E[Y^a | A=a, V]}{E[Y^{0} | A=a, V]} = \exp(\beta_1 a + \beta_2 a V)\]

This model corresponds to the causal mean ratio among those with \(A=a\) by \(V\). Note that the log-linear SMM is only defined when \(Y > 0\). The parameters of either SMM are identified under the assumptions of causal consistency, and exchangeability with positivity.

Two different estimating equations are available for g-estimation. The first is referred to as the ‘inefficient’ g-estimator. For the inefficient g-estimator, we solve for \(\beta\) in the following estimating equation

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} \left\{ H(\beta) \times (A_i - \pi_i) \right\} \times V_i \\ \left\{ A_i - \text{expit}(W_i^T \alpha) \right\} W_i \end{bmatrix} = 0\end{split}\]

where \(\pi_i = \text{expit}(W_i^T \alpha)\), \(H(\beta) = Y - \beta A \mathbb{V}\) for a linear SMM, and \(H(\beta) = Y \times \exp(-A \beta \mathbb{V})\) for a log-linear SMM. Note that \(V \subseteq W\), where \(W\) is the set of confounding variables. The length of the parameter vector is b+c, where b is the number of columns in V and c is the number of columns in W.
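The stacked estimating functions above can be sketched directly in NumPy for the linear SMM. This is a minimal illustration and not the library's implementation: the helper names `expit` and `psi_inefficient_linear` are hypothetical, the simulated data are arbitrary, and \(\beta A \mathbb{V}\) is read here as \(A_i V_i^T \beta\), which is an assumption about the notation. Because the linear SMM is linear in \(\beta\), the first block can be solved in closed form for a fixed \(\alpha\), which gives a quick check that the rows for \(\beta\) sum to zero.

```python
import numpy as np

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def psi_inefficient_linear(theta, y, A, W, V):
    """Hypothetical helper (not the library's code): stacked estimating
    functions of the inefficient g-estimator for a linear SMM.
    Returns a (b+c)-by-n array."""
    y, A = np.asarray(y, dtype=float), np.asarray(A, dtype=float)
    W, V = np.asarray(W, dtype=float), np.asarray(V, dtype=float)
    b = V.shape[1]
    theta = np.asarray(theta, dtype=float)
    beta, alpha = theta[:b], theta[b:]
    pi = expit(W @ alpha)                     # propensity score
    H = y - A * (V @ beta)                    # assumed reading of Y - beta*A*V
    ee_smm = (H * (A - pi))[:, None] * V      # n-by-b block for beta
    ee_act = (A - pi)[:, None] * W            # n-by-c block for alpha
    return np.vstack([ee_smm.T, ee_act.T])

# Arbitrary simulated data for illustration
rng = np.random.default_rng(2024)
n = 500
w = rng.normal(size=n)
v = rng.binomial(1, 0.5, size=n)
W = np.column_stack([np.ones(n), v, w])
V = np.column_stack([np.ones(n), v])
alpha = np.array([0.25, 0.5, 1.0])
A = rng.binomial(1, expit(W @ alpha))
y = 12.75 - 3.5*v + w + A*(-2.0 + 2.7*v) + rng.normal(size=n)

# For a fixed alpha, the beta block is linear in beta and has a closed form
resid = A - expit(W @ alpha)
M = V.T @ (V * (resid * A)[:, None])
rhs = V.T @ (resid * y)
beta_hat = np.linalg.solve(M, rhs)

ee = psi_inefficient_linear(np.concatenate([beta_hat, alpha]), y, A, W, V)
print(ee.shape)               # (5, 500), i.e. (b+c)-by-n
print(ee[:2].sum(axis=1))     # numerically zero at beta_hat
```

In practice both \(\beta\) and \(\alpha\) are solved for jointly by the M-estimator; holding \(\alpha\) fixed here only keeps the check short.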

The second implementation for g-estimation is the ‘efficient’ g-estimator. For the efficient g-estimator we replace \(H(\beta)\) with \(\{H(\beta) - E[H(\beta) | W]\}\) in the prior estimating equation and specify a model for \(E[H(\beta) | W]\). The corresponding stacked estimating equations are

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} \left\{ (H(\beta) - g^{-1}(W_i^T \gamma)) \times (A_i - \pi_i) \right\} \times V_i \\ \left\{ A_i - \text{expit}(W_i^T \alpha) \right\} W_i \\ \left\{ H(\beta) - g^{-1}(W_i^T \gamma) \right\} W_i \end{bmatrix} = 0\end{split}\]

where \(g^{-1}\) is the inverse transformation for the specified SMM. Therefore, there are b+c+d parameters for the efficient g-estimator, where d is the number of parameters in the model for \(E[H(\beta) | W]\).
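The efficient version adds a third block for the \(E[H(\beta) | W]\) model. The sketch below is an illustration for the linear SMM only, under the assumption that \(g^{-1}\) is the identity in that case; the helper `psi_efficient_linear` is hypothetical, the design matrix for the outcome model is called `X` to mirror the function argument, and \(\beta A \mathbb{V}\) is again read as \(A_i V_i^T \beta\). For any fixed \(\beta\), fitting \(\gamma\) by least squares makes the third block sum to zero, which is used as a check.

```python
import numpy as np

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def psi_efficient_linear(theta, y, A, W, V, X):
    """Hypothetical helper (not the library's code): stacked estimating
    functions of the efficient g-estimator for a linear SMM.
    Returns a (b+c+d)-by-n array."""
    y, A = np.asarray(y, dtype=float), np.asarray(A, dtype=float)
    W, V, X = (np.asarray(m, dtype=float) for m in (W, V, X))
    b, c = V.shape[1], W.shape[1]
    theta = np.asarray(theta, dtype=float)
    beta, alpha, gamma = theta[:b], theta[b:b + c], theta[b + c:]
    pi = expit(W @ alpha)
    H = y - A * (V @ beta)                       # assumed reading of H(beta)
    resid_H = H - X @ gamma                      # identity g^{-1} assumed
    ee_smm = (resid_H * (A - pi))[:, None] * V   # block for beta
    ee_act = (A - pi)[:, None] * W               # block for alpha
    ee_out = resid_H[:, None] * X                # block for gamma
    return np.vstack([ee_smm.T, ee_act.T, ee_out.T])

# Arbitrary simulated data for illustration
rng = np.random.default_rng(7)
n = 400
w = rng.normal(size=n)
v = rng.binomial(1, 0.5, size=n)
W = np.column_stack([np.ones(n), v, w])
V = np.column_stack([np.ones(n), v])
X = W.copy()
alpha = np.array([0.25, 0.5, 1.0])
A = rng.binomial(1, expit(W @ alpha))
y = 12.75 - 3.5*v + w + A*(-2.0 + 2.7*v) + rng.normal(size=n)

# For a fixed beta, gamma solves an ordinary least squares problem
beta = np.array([-2.0, 2.7])
H = y - A * (V @ beta)
gamma_hat, *_ = np.linalg.lstsq(X, H, rcond=None)

theta = np.concatenate([beta, alpha, gamma_hat])
ee = psi_efficient_linear(theta, y, A, W, V, X)
print(ee.shape)               # (8, 400), i.e. (b+c+d)-by-n
print(ee[5:].sum(axis=1))     # numerically zero at gamma_hat
```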

Parameters
  • theta (ndarray, list, vector) – Theta consists of b+c values if X is None, and b+c+d values if X is not None.

  • y (ndarray, list, vector) – 1-dimensional vector of n observed values of the outcome.

  • A (ndarray, list, vector) – 1-dimensional vector of n observed values of the action. The A values should all be 0 or 1.

  • W (ndarray, list, vector) – 2-dimensional vector of n observed values for c columns of a design matrix to model the expected value of A.

  • V (ndarray, list, vector) – 2-dimensional vector of n observed values for b columns of a design matrix for the structural mean model. Note that the design matrix here is expected to not include the observed values of A.

  • X (ndarray, list, vector, None, optional) – Default of this argument is None, which implements the estimating equation for the inefficient g-estimator. To use the efficient g-estimator, a 2-dimensional vector of n observed values for d columns of a design matrix for the \(E[H(\beta) | W]\) model should be provided here.

  • model (str, optional) – Type of structural mean model to fit. Options are currently: linear, poisson. Default is linear. Specifying poisson fits the log-linear SMM, which can be used for positive continuous data, or with binary data in order to estimate causal risk ratios.

  • weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations. This argument is intended to support the use of sampling or missingness weights.

Returns

Returns a (b+c)-by-n (inefficient) or (b+c+d)-by-n (efficient) NumPy array evaluated for the input theta.

Return type

array

Examples

Construction of the estimating equations with ee_gestimation_snmm should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_gestimation_snmm

Some generic data

>>> n = 200
>>> d = pd.DataFrame()
>>> d['W'] = np.random.normal(size=n)
>>> d['V'] = np.random.binomial(1, p=0.5, size=n)
>>> d['A'] = np.random.binomial(1, p=logistic.cdf(0.25 + 0.5*d['V'] + d['W']), size=n)
>>> d['Ya0'] = 12.75 - 3.5*d['V'] + d['W'] + np.random.normal(size=n)
>>> d['Ya1'] = 10.75 - 0.8*d['V'] + d['W'] + np.random.normal(size=n)
>>> d['Y'] = (1-d['A'])*d['Ya0'] + d['A']*d['Ya1']
>>> d['C'] = 1

Defining psi, or the stacked estimating equations. Note that A is the action of interest and Y is the outcome of interest. Here, we are interested in estimating the following linear SMM

\[E[Y^a - Y^{0} | A=a, V] = \beta_1 a + \beta_2 a V\]

>>> def psi(theta):
>>>     return ee_gestimation_snmm(theta,
>>>                                y=d['Y'], A=d['A'],
>>>                                W=d[['C', 'V', 'W']],
>>>                                V=d[['C', 'V']])

Calling the M-estimator. Since there are 2 coefficients in the SMM and 3 coefficients in the \(E[A|W]\) model, the total number of initial values should be 2+3=5:

>>> estr = MEstimator(psi,
>>>                   init=[0., ]*5)
>>> estr.estimate(solver='lm')

Inspecting the parameter estimates, variance, and 95% confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

More specifically, the corresponding parameters are

>>> estr.theta[0]     # beta_1 of SMM
>>> estr.theta[1]     # beta_2 of SMM
>>> estr.theta[2:]    # propensity score regression coefficients
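As a sanity check on what \(\beta_1\) and \(\beta_2\) represent, the data-generating mechanism above implies \(E[Y^1 - Y^0 | A=1, V] = -2 + 2.7 V\), since the two potential outcomes differ only in their intercepts and their \(V\) coefficients. The sketch below regenerates a larger data set with plain NumPy and compares the potential outcomes directly among those with \(A=1\). The seed, sample size, and variable names are illustrative choices and not part of the original example; this comparison is only possible in a simulation, where both potential outcomes are known.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Same data-generating mechanism as the example, written with NumPy only
w = rng.normal(size=n)
v = rng.binomial(1, 0.5, size=n)
pr_a = 1.0 / (1.0 + np.exp(-(0.25 + 0.5*v + w)))
a = rng.binomial(1, pr_a)
ya0 = 12.75 - 3.5*v + w + rng.normal(size=n)
ya1 = 10.75 - 0.8*v + w + rng.normal(size=n)

# Average of Y^1 - Y^0 among the acted on, within strata of V
diff = ya1 - ya0
beta_1 = diff[(a == 1) & (v == 0)].mean()       # effect when V=0
effect_v1 = diff[(a == 1) & (v == 1)].mean()    # effect when V=1
beta_2 = effect_v1 - beta_1                     # change in effect by V

# Both should be close to the values implied above: -2 and 2.7
print(beta_1, beta_2)
```

With only n=200 observations, as in the example above, the g-estimates in estr.theta[0] and estr.theta[1] can differ noticeably from these values due to sampling variability.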

The efficient g-estimator can be implemented by providing a design matrix to the argument X

>>> def psi(theta):
>>>     return ee_gestimation_snmm(theta,
>>>                                y=d['Y'], A=d['A'],
>>>                                W=d[['C', 'V', 'W']],
>>>                                V=d[['C', 'V']],
>>>                                X=d[['C', 'V', 'W']])

Here, there are 2+3+3=8 parameters to estimate

>>> estr = MEstimator(psi,
>>>                   init=[0., ]*8)
>>> estr.estimate(solver='lm')

A log-linear SMM for this example can be estimated by specifying model='poisson'.
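To build intuition for the log-linear case, recall that \(H(\beta) = Y \times \exp(-A \beta \mathbb{V})\). The sketch below uses hypothetical data in which the action multiplies a strictly positive outcome by exactly \(\exp(\beta_1 + \beta_2 V)\), so that evaluating \(H\) at the true \(\beta\) recovers \(Y^0\). It uses NumPy only, the values of \(\beta\) are arbitrary, and \(A \beta \mathbb{V}\) is read as \(A_i V_i^T \beta\), which is an assumption about the notation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
v = rng.binomial(1, 0.5, size=n)
a = rng.binomial(1, 0.5, size=n)
V = np.column_stack([np.ones(n), v])
beta = np.array([-0.2, 0.3])            # arbitrary illustrative values

# Strictly positive outcome with an exactly multiplicative effect of the action
ya0 = np.exp(rng.normal(size=n))
ya1 = ya0 * np.exp(V @ beta)
y = np.where(a == 1, ya1, ya0)

# The log-linear transformation removes the effect of the action
H = y * np.exp(-a * (V @ beta))
print(np.allclose(H, ya0))              # True
```

The same idea underlies the linear case, where the effect is subtracted rather than divided out.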

References

Dukes O, & Vansteelandt S (2018). A note on G-estimation of causal risk ratios. American Journal of Epidemiology, 187(5), 1079-1084.

Robins JM, Mark SD, & Newey WK (1992). Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics, 48(2), 479-495.

Vansteelandt S, & Joffe M (2014). Structural nested models and G-estimation: the partially realized promise. Statistical Science, 29(4), 707-731.

Vansteelandt S, & Sjolander A (2016). Revisiting g-estimation of the effect of a time-varying exposure subject to time-varying confounding. Epidemiologic Methods, 5(1), 37-56.