delicatessen.estimating_equations.causal.ee_ipw
- ee_ipw(theta, y, A, W, truncate=None, weights=None)
Estimating equation for the inverse probability weighting (IPW) estimator. This implementation of the IPW estimator targets the average causal effect. The propensity scores are estimated with a logistic model.
The stacked estimating equations are
\[\begin{split}\sum_{i=1}^n \begin{bmatrix} (\theta_1 - \theta_2) - \theta_0 \\ \frac{A_i Y_i}{\pi_i} - \theta_1 \\ \frac{(1-A_i) Y_i}{1-\pi_i} - \theta_2 \\ \left\{ A_i - \text{expit}(W_i^T \alpha) \right\} W_i \end{bmatrix} = 0\end{split}\]where \(A\) is the action, \(W\) is the set of confounders, and \(\pi_i = \text{expit}(W_i^T \alpha)\). The first estimating equation is for the average causal effect, the second is for the mean under \(A:=1\), the third is for the mean under \(A:=0\), and the last is the logistic regression model for the propensity scores. Here, the length of the theta vector is 3+`b`, where `b` is the number of parameters in the propensity score regression model.
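The first three equations have closed-form solutions once the propensity scores are in hand. A minimal NumPy sketch, using the true propensity score in place of the fitted logistic model (an assumption made here for brevity; in practice \(\pi_i\) comes from the estimated \(\alpha\)):

```python
import numpy as np

# Sketch: closed-form solutions to the first three estimating equations.
# Uses the TRUE propensity score pi rather than fitting the logistic
# model for alpha (an assumption of this illustration only).
rng = np.random.default_rng(0)
n = 10_000
W = rng.binomial(1, 0.5, size=n)
pi = 0.25 + 0.5 * W                         # P(A=1 | W)
A = rng.binomial(1, pi, size=n)
Y = rng.normal(loc=1.0 - 0.5 * W - 0.1 * A)

theta1 = np.mean(A * Y / pi)                # mean under A := 1
theta2 = np.mean((1 - A) * Y / (1 - pi))    # mean under A := 0
theta0 = theta1 - theta2                    # average causal effect

# Each equation sums to (numerically) zero at its solution
print(np.sum(A * Y / pi - theta1))
print(np.sum((1 - A) * Y / (1 - pi) - theta2))
```

Because the sample means solve the corresponding equations exactly, both sums are zero up to floating-point error.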
- Parameters
theta (ndarray, list, vector) – Theta consists of 3+`b` values.
y (ndarray, list, vector) – 1-dimensional vector of n observed values.
A (ndarray, list, vector) – 1-dimensional vector of n observed values. The A values should all be 0 or 1.
W (ndarray, list, vector) – 2-dimensional vector of n observed values for b variables to model the probability of A with.
truncate (None, list, set, ndarray, optional) – Bounds to truncate the estimated probabilities of A at. For example, estimated probabilities above 0.99 or below 0.01 can be set to 0.99 or 0.01, respectively. This is done by specifying truncate=(0.01, 0.99). Note this step is done via numpy.clip(..., a_min, a_max), so order is important. Default is None, which applies no truncation.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations. This argument is intended to support the use of missingness weights. The propensity score model is not fit using these weights.
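As a quick sketch of the clipping step described for truncate (illustrative values only; the lower bound comes first in the tuple):

```python
import numpy as np

# truncate=(0.01, 0.99) is applied as np.clip(pi_hat, a_min, a_max);
# swapping the two bounds would not bound the probabilities correctly.
pi_hat = np.array([0.005, 0.20, 0.50, 0.995])
print(np.clip(pi_hat, 0.01, 0.99))  # -> [0.01, 0.2, 0.5, 0.99]
```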
- Returns
Returns a (3+`b`)-by-n NumPy array evaluated for the input theta.
- Return type
array
Examples
Construction of an estimating equation with ee_ipw should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_ipw
Some generic data
>>> n = 200
>>> d = pd.DataFrame()
>>> d['W'] = np.random.binomial(1, p=0.5, size=n)
>>> d['A'] = np.random.binomial(1, p=(0.25 + 0.5*d['W']), size=n)
>>> d['Ya0'] = np.random.binomial(1, p=(0.75 - 0.5*d['W']), size=n)
>>> d['Ya1'] = np.random.binomial(1, p=(0.75 - 0.5*d['W'] - 0.1*1), size=n)
>>> d['Y'] = (1-d['A'])*d['Ya0'] + d['A']*d['Ya1']
>>> d['C'] = 1
Defining psi, or the stacked estimating equations. Note that 'A' is the action.

>>> def psi(theta):
>>>     return ee_ipw(theta, y=d['Y'], A=d['A'],
>>>                   W=d[['C', 'W']])
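For intuition about the (3+`b`)-by-n array that psi returns, here is a simplified, hypothetical re-implementation of the stacked equations. It omits truncation and weights and is not the library's actual code; it just mirrors the four sets of equations from the formula above:

```python
import numpy as np

def ee_ipw_sketch(theta, y, A, W):
    """Simplified, hypothetical re-implementation for intuition only.

    The real ee_ipw also supports truncation and weights; this sketch
    just stacks the four sets of equations row by row.
    """
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    W = np.asarray(W, dtype=float)
    alpha = np.asarray(theta[3:])
    pi = 1 / (1 + np.exp(-W @ alpha))            # expit(W_i^T alpha)
    ee_ace = np.full_like(y, (theta[1] - theta[2]) - theta[0])
    ee_mu1 = A * y / pi - theta[1]               # mean under A := 1
    ee_mu0 = (1 - A) * y / (1 - pi) - theta[2]   # mean under A := 0
    ee_logit = ((A - pi)[:, None] * W).T         # b-by-n score equations
    return np.vstack([ee_ace, ee_mu1, ee_mu0, ee_logit])
```

With b=2 covariate columns, the returned array has shape (5, n), matching the 3+2 parameters in theta.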
Calling the M-estimation procedure. Since W is n-by-2 here and IPW has 3 additional parameters, the initial values should be of length 3+2=5. In general, it will be best to start with [0., 0.5, 0.5, ...] as the initials when Y is binary. Otherwise, starting with all 0. as initials is reasonable.

>>> estr = MEstimator(stacked_equations=psi, init=[0., 0.5, 0.5, 0., 0.])
>>> estr.estimate(solver='lm')
Inspecting the parameter estimates, variance, and 95% confidence intervals
>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()
More specifically, the corresponding parameters are
>>> estr.theta[0]    # causal mean difference of 1 versus 0
>>> estr.theta[1]    # causal mean under A=1
>>> estr.theta[2]    # causal mean under A=0
>>> estr.theta[3:]   # logistic regression coefficients
If you want to see how truncating the probabilities works, try repeating the above code but specifying truncate=(0.1, 0.9) as an optional argument in ee_ipw.
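As a standalone illustration of why one might truncate (plain NumPy, with invented numbers: one near-zero propensity score dominates the weighted mean):

```python
import numpy as np

# Invented example: the first unit has pi near 0, so its weight 1/pi
# explodes; clipping at (0.1, 0.9) tames the extreme weight.
y  = np.array([5.0, 1.0, 1.2, 0.8])
A  = np.array([1.0, 1.0, 1.0, 1.0])
pi = np.array([0.01, 0.6, 0.5, 0.7])

raw = np.mean(A * y / pi)                         # ~126.3
clipped = np.mean(A * y / np.clip(pi, 0.1, 0.9))  # ~13.8
print(raw, clipped)
```

Truncation trades a small amount of bias for a large reduction in variance when propensity scores are close to 0 or 1.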
References
Hernán MA, & Robins JM. (2006). Estimating causal effects from epidemiological data. Journal of Epidemiology & Community Health, 60(7), 578-586.
Cole SR, & Hernán MA. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656-664.