delicatessen.estimating_equations.causal.ee_ipw

ee_ipw(theta, y, A, W, truncate=None, weights=None)

Estimating equations for the inverse probability weighting (IPW) estimator. This implementation of the IPW estimator is used to estimate the average causal effect, with the propensity scores estimated by a logistic model.

The stacked estimating equations are

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} (\theta_1 - \theta_2) - \theta_0 \\ \frac{A_i Y_i}{\pi_i} - \theta_1 \\ \frac{(1-A_i) Y_i}{1-\pi_i} - \theta_2 \\ \left\{ A_i - \text{expit}(W_i^T \alpha) \right\} W_i \end{bmatrix} = 0\end{split}\]

where \(A\) is the action, \(W\) is the set of confounders, and \(\pi_i = \text{expit}(W_i^T \alpha)\). The first estimating equation is for the average causal effect, the second is for the mean under \(A:=1\), the third is for the mean under \(A:=0\), and the last is the logistic regression model for the propensity scores. Here, the length of the theta vector is 3+`b`, where `b` is the number of parameters in the propensity score model.
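As a rough, minimal sketch of these stacked estimating functions (a hypothetical helper written here for illustration only, not part of delicatessen; it assumes theta is ordered as described above and that W is an n-by-`b` design matrix including an intercept column), the evaluation could be written as

>>> import numpy as np
>>> from scipy.special import expit
>>> def ipw_functions_sketch(theta, y, A, W):
>>>     mu_d, mu_1, mu_0 = theta[0], theta[1], theta[2]        # difference, mean under A=1, mean under A=0
>>>     alpha = np.asarray(theta[3:])                          # propensity score coefficients
>>>     y, A, W = np.asarray(y), np.asarray(A), np.asarray(W)
>>>     pi = expit(W @ alpha)                                  # propensity scores, expit(W^T alpha)
>>>     ef_ace = np.ones(y.shape[0]) * ((mu_1 - mu_0) - mu_d)  # average causal effect
>>>     ef_r1 = (A * y) / pi - mu_1                            # mean under A := 1
>>>     ef_r0 = ((1 - A) * y) / (1 - pi) - mu_0                # mean under A := 0
>>>     ef_ps = (A - pi) * W.T                                 # logistic model score, b-by-n
>>>     return np.vstack([ef_ace, ef_r1, ef_r0, ef_ps])        # (3+b)-by-n

Summing each row of the returned array over the n observations gives the left-hand side of the display above. This is purely illustrative; ee_ipw (which also supports the truncate and weights options below) should be used in practice.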

Parameters
  • theta (ndarray, list, vector) – Theta consists of 3+`b` values: the causal mean difference, the mean under A=1, the mean under A=0, and the `b` propensity score model coefficients, in that order.

  • y (ndarray, list, vector) – 1-dimensional vector of n observed values.

  • A (ndarray, list, vector) – 1-dimensional vector of n observed values. The A values should all be 0 or 1.

  • W (ndarray, list, vector) – 2-dimensional array of n observed values for b variables to model the probability of A with.

  • truncate (None, list, set, ndarray, optional) – Bounds to truncate the estimated probabilities of A at. For example, estimated probabilities above 0.99 or below 0.01 can be set to 0.99 or 0.01, respectively. This is done by specifying truncate=(0.01, 0.99). Note this step is done via numpy.clip(.., a_min, a_max), so order is important (see the short sketch after this parameter list). Default is None, which applies no truncation.

  • weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations. This argument is intended to support the use of missingness weights. The propensity score model is not fit using these weights.
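As a minimal sketch of the truncation step that truncate applies (pi below is a hypothetical array of estimated propensity scores, not a delicatessen object):

>>> import numpy as np
>>> pi = np.array([0.005, 0.40, 0.995])
>>> np.clip(pi, a_min=0.01, a_max=0.99)    # probabilities bounded to [0.01, 0.99]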

Returns

Returns a (3+`b`)-by-n NumPy array evaluated for the input theta.

Return type

array

Examples

Construction of the estimating equations with ee_ipw should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_ipw

Some generic data

>>> n = 200
>>> d = pd.DataFrame()
>>> d['W'] = np.random.binomial(1, p=0.5, size=n)
>>> d['A'] = np.random.binomial(1, p=(0.25 + 0.5*d['W']), size=n)
>>> d['Ya0'] = np.random.binomial(1, p=(0.75 - 0.5*d['W']), size=n)
>>> d['Ya1'] = np.random.binomial(1, p=(0.75 - 0.5*d['W'] - 0.1*1), size=n)
>>> d['Y'] = (1-d['A'])*d['Ya0'] + d['A']*d['Ya1']
>>> d['C'] = 1

Defining psi, or the stacked estimating equations. Note that 'A' is the action.

>>> def psi(theta):
>>>     return ee_ipw(theta, y=d['Y'], A=d['A'],
>>>                   W=d[['C', 'W']])

Calling the M-estimation procedure. Since W contains 2 covariates (columns) here and the IPW estimator adds 3 parameters, the initial values should be of length 3+2=5. In general, it will be best to start with [0., 0.5, 0.5, …] as the initial values when Y is binary. Otherwise, starting with all 0. as the initial values is reasonable.

>>> estr = MEstimator(stacked_equations=psi, init=[0., 0.5, 0.5, 0., 0.])
>>> estr.estimate(solver='lm')

Inspecting the parameter estimates, variance, and 95% confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

More specifically, the corresponding parameters are

>>> estr.theta[0]    # causal mean difference of 1 versus 0
>>> estr.theta[1]    # causal mean under A=1
>>> estr.theta[2]    # causal mean under A=0
>>> estr.theta[3:]   # logistic regression coefficients

If you want to see how truncating the probabilities works, try repeating the above code but specifying truncate=(0.1, 0.9) as an optional argument in ee_ipw.
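As a sketch, that call might look like the following (psi_trunc is only an illustrative name); the resulting estimates can then be compared to the untruncated ones above.

>>> def psi_trunc(theta):
>>>     return ee_ipw(theta, y=d['Y'], A=d['A'],
>>>                   W=d[['C', 'W']],
>>>                   truncate=(0.1, 0.9))
>>> estr = MEstimator(stacked_equations=psi_trunc, init=[0., 0.5, 0.5, 0., 0.])
>>> estr.estimate(solver='lm')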

References

Hernán MA, & Robins JM. (2006). Estimating causal effects from epidemiological data. Journal of Epidemiology & Community Health, 60(7), 578-586.

Cole SR, & Hernán MA. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656-664.