Custom Estimating Equations

While there are many alternative libraries for regression, causal inference, pharmacokinetic models, a key advantages of delicatessen is the provided flexibility in how you can specify estimating equations. Specifically, delicatessen allows low-level access to the estimating equations, which allows you develop your own novel combinations. However, building your own stack of estimating equations can be complicated when first starting out. Here, I provide an overview and tips for how to build your own estimating equations using delicatessen. Understanding this process will unleash the true power of estimating equations in your quantitative research.

In general, it will be best if you find a paper or book that directly provides the mathematical expression of an estimating equation(s) for you. Alternatively, if you can find the score function or gradient for a regression model, that is the corresponding estimating equation. This section does not address how to derive your own estimating equation(s). Rather, this guide focuses on how to translate an estimating equation into code that is compatible with delicatessen, as delicatessen assumes you are giving it a valid estimating equation. For some guidance on deriving the mathematical expressions for your own estimating equations, see Ross et al. (2024).

[1]:
import numpy as np
import pandas as pd
from scipy.stats import logistic

from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

Building from scratch

To begin, we will go through the case of building an estimating equation completely from scratch. To do this, we will go through an example with linear regression. First, we have the estimating equation (which is the score function) given in Boos & Stefanski (2013) and Ross et al. (2024).

\[\sum_{i=1}^{n} (Y_i - X_i \beta^T) X_i^T = 0\]

where \(Y\) is the outcome and \(X\) is the design matrix.

We will demonstrate using the following simulated data set

[2]:
np.random.seed(8091421)
n = 200
d = pd.DataFrame()
d['X'] = np.random.normal(size=n)
d['W'] = np.random.normal(size=n)
d['C'] = 1
d['Y'] = 5 + d['X'] + np.random.normal(size=n)

Here, we are interested in the model \(Y = \beta_0 + \beta_1 X + \epsilon\). To help see how to code the estimating equation for this regression mode, it is helpful to write-out explicitly (instead of using the matrix algebra notation from above). Here the estimating equation is

\[\begin{split} \sum_{i=1}^{n} \begin{bmatrix} \left[ Y_i - (\beta_0 + \beta_1 X_i)\right] \times 1 \\ \left[ Y_i - (\beta_0 + \beta_1 X_i)\right] \times X_i \\ \end{bmatrix} = 0\end{split}\]

where the first equation is for \(\beta_0\) and the second is for \(\beta_1\).

To program this estimating equation, we have a few options (as suggested by the multiple mathematical expressions we have). First, we can build the estimating equation using a for-loop where each i piece will be stacked together. While this for-loop approach will be ‘slow’, it is often a good strategy to implement a for-loop version that is easier to debug first (unless you are a linear algebra wizard!).

Below calculates the estimating equation for each i in the for-loop. This function returns a stacked array of each i observation as a 3-by-n array. That array can then be passed to the MEstimator

[3]:
def psi(theta):
    # Transforming to arrays
    X = np.asarray(d[['C', 'X', 'W']])   # Design matrix
    y = np.asarray(d['Y'])               # Dependent variable
    beta = np.asarray(theta)[:, None]    # Parameters
    n = X.shape[0]                       # Number of observations

    # Where to store each of the resulting estimating functions
    est_vals = []

    # Looping through each observation from 1 to n
    for i in range(n):
        v_i = (y[i] - np.dot(X[i], beta)) * X[i]
        est_vals.append(v_i)

    # returning 3-by-n NumPy array
    return np.asarray(est_vals).T

delicatessen is not picky about the particulars of the estimating functions. One only needs to create a function that has a single argument (i.e., theta). That argument needs to return a len(theta) by \(n\) NumPy array. Below is code that calls our custom estimating equation

[4]:
estr = MEstimator(psi, init=[0., 0., 0.])
estr.estimate()
estr.theta
[4]:
array([ 5.08271932e+00,  9.68128991e-01, -1.27507410e-04])

for which the coefficients match the coefficients from a ordinary least squares model (variance estimates may differ, since by default most OLS software use the inverse of the information matrix to estimate the variance rather than the empirical sandwich variance estimator).

Here, we can vectorize the operations. The advantage of the vectorized-form is that it will run much faster. With some careful experimentation, the following is a vectorized version. Remember that delicatessen is expecting a 3-by-n array to be given by the psi function in this example. Failure to provide this is a common mistake when building custom estimating equations.

[5]:
def psi(theta):
    X = np.asarray(d[['C', 'X', 'W']])    # Design matrix
    y = np.asarray(d['Y'])[:, None]       # Dependent variable
    beta = np.asarray(theta)[:, None]     # Parameters
    return ((y - np.dot(X, beta)) * X).T  # Computes all estimating functions


estr = MEstimator(psi, init=[0., 0., 0.])
estr.estimate()
estr.theta
[5]:
array([ 5.08271932e+00,  9.68128991e-01, -1.27507410e-04])

This code provides the same results. Vectorizing (even parts of an estimating equation) can help to improve run-times if you find the root-finding step to be taking a long time.

Building with basics

Instead of building everything from scratch, you can also piece together built-in estimating equations with your custom estimating equations code. To demonstrate this, we will go through an example with inverse probability weighting.

The inverse probability weighting estimator consists of four estimating equations: the difference between the weighted means, the weighted mean under \(A=1\), the weighted mean under \(A=0\), and the propensity score model. We can express this mathematically as

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} (\theta_1 - \theta_2) - \theta_0 \\ \frac{A_i \times Y_i}{\pi_i(W_i, \alpha)} - \theta_1 \\ \frac{(1-A_i) \times Y_i}{1-\pi_i(W_i, \alpha)} - \theta_2 \\ (A_i - \text{expit}(W_i^T \alpha)) W_i \end{bmatrix} = 0\end{split}\]

where \(A\) is the action of interest, \(Y\) is the outcome of interest, and \(W\) is the set of confounding variables.

Rather than re-code the logistic regression model (to estimate the propensity scores), we will use the built-in logistic regression functionality. Below is a stacked estimating equation for the inverse probability weighting estimator above

[6]:
def psi(theta):
    # Ensuring correct typing
    W = np.asarray(d['C', 'W'])     # Design matrix of confounders
    A = np.asarray(d['A'])          # Action
    y = np.asarray(y)               # Outocome
    beta = theta[3:]                # Regression parameters

    # Estimating propensity score
    preds_reg = ee_regression(theta=beta,        # Built-in regression
                              X=W,               # Plug-in covariates for X
                              y=A,               # Plug-in treatment for Y
                              model='logistic')  # Specify logistic
    # Estimating weights
    pi = inverse_logit(np.dot(W, beta))          # Pr(A|W) using delicatessen.utilities

    # Calculating Y(a=1)
    ya1 = (A * y) / pi - theta[1]                # i's contribution is (AY) / \pi

    # Calculating Y(a=0)
    ya0 = ((1-A) * y) / (1-pi) - theta[2]        # i's contribution is ((1-A)Y) / (1-\pi)

    # Calculating Y(a=1) - Y(a=0) (using np.ones to ensure a 1-by-n array)
    ate = np.ones(y.shape[0]) * (theta[1] - theta[2]) - theta[0]

    # Output (3+b)-by-n stacked array
    return np.vstack((ate,             # theta[0] is for the ATE
                      ya1[None, :],    # theta[1] is for R1
                      ya0[None, :],    # theta[2] is for R0
                      preds_reg))      # theta[3:] is for the regression coefficients

This example demonstrates how estimating equations can easily be stacked together using delicatessen. Specifically, both built-in and user-specified functions can be specified together seamlessly. All it requires is specifying both in the estimating equation and returning a stacked array of the estimates.

One important piece to note here is that the returned array needs to be in the same order as the theta’s are input. As done here, all the theta values after the 3rd are for the propensity score model. Therefore, the propensity score model values are last in the returned stack. Returning the values in a different order than input is a common mistake when programming your own. delicatessen currently cannot detect this issue, but it will often result in the root-finding procedure failing and returning the starting value(s).

Handling np.nan

Sometimes, np.nan will show up in your data set. However, delicatessen does not naturally handle np.nan (as there are many options on how to handle missing data, and delicatessen by design does not assume how the user would like missing data to be handled). When there are np.nan’s detected in the output estimating equations, delicatessen will return an error (by design). The following section discusses how to handle np.nan with delicatessen.

In the first case, we will consider handling np.nan with a built-in estimating equation. When trying to fit a regression model where there are np.nan’s present, the missing values will be set to a placeholder value and their contributions will be manually removed using an indicator function for missingness. This process is equivalent to a complete-case analysis. Below is an example using the built-in logistic regression estimating equations:

[7]:
d = pd.DataFrame()
d['X'] = np.random.normal(size=100)
y = np.random.binomial(n=1, p=0.5 + 0.01 * d['X'], size=100)
d['y'] = np.where(np.random.binomial(n=1, p=0.9, size=100), y, np.nan)
d['C'] = 1

X = np.asarray(d[['C', 'X']])
y = np.asarray(d['y'])
r = np.where(d['y'].isna(), 0, 1)
y_no_nan = np.asarray(d['y'].fillna(-999))
[8]:
def psi(theta):
    # Estimating logistic model values with filled-in Y's
    a_model = ee_regression(theta,
                            X=X, y=y_no_nan,
                            model='logistic')
    # Setting contributions with missing to zero manually
    a_model = a_model * r
    return a_model


estr = MEstimator(psi, init=[0, 0, ])
estr.estimate()
estr.theta
[8]:
array([0.03331353, 0.27781908])

If the contribution to the estimating function with missing Y’s had not been included, the optimized points would have been nan. Alternatively, we could have used numpy.nan_to_num. However, this method to handling nan does not work with automatic differentiation (i.e., when deriv_method='exact').

As a second example, we will consider estimating the mean with missing data and correcting for informative missing by inverse probability weighting. To reduce random error, this example uses 1000 observations. Here, we set nan’s to be zero’s prior to subtracting off the mean using np.where. This is shown below:

[9]:
# Generating data
n = 1000
d = pd.DataFrame()
d['X'] = np.random.normal(size=n)
y = 5 + d['X'] + np.random.normal(size=n)
d['y'] = np.where(np.random.binomial(n=1, p=logistic.cdf(1 + d['X']), size=n), y, np.nan)
d['C'] = 1

X = np.asarray(d[['C', 'X']])
y = np.asarray(d['y'])
r = np.asarray(np.where(d['y'].isna(), 0, 1))


def psi(theta):
    # Estimating logistic model values
    a_model = ee_regression(theta[1:], X=X, y=r,
                            model='logistic')
    pi = inverse_logit(np.dot(X, theta[1:]))

    y_w = np.where(r, y / pi, 0) - theta[0]  # nan-to-zero then subtract off
    return np.vstack((y_w[None, :],
                      a_model))


estr = MEstimator(psi, init=[0, 0, 0])
estr.estimate()
estr.theta
[9]:
array([5.04578442, 0.96731873, 0.97044598])

Our estimate for the mean (i.e., theta[0]) close to the truth (i.e., 5).

Wrapping estimators

If you find yourself using delicatessen for the same tasks, you may want to develop wrapper functions. This is a process I use for my own work and other software libraries I have developed. To illustrate the idea behind wrapping up delicatessen, we consider the previous missing data example. We will create a new class object, which will organize and process the data for us. This data will then be passed to MEstimator for estimation

[10]:
class IPMW:
    def __init__(self, data, outcome):
        # Storing data inputs
        self.data = data.copy()
        self.outcome = outcome

        # Calculating missing indicator
        self.missing_indicator = np.where(self.data[self.outcome].isna(), 0, 1)
        self.y_no_nan = self.data[self.outcome].fillna(-999)

        # Empty storage for later
        self.M_dmatrix = None

    def missing_model(self, design_matrix):
        self.M_dmatrix = self.data[design_matrix]

    def estimate(self, decimals=3):
        def psi(theta):
            # Estimating logistic model values
            a_model = ee_regression(theta[1:],
                                    X=self.M_dmatrix,
                                    y=self.missing_indicator,
                                    model='logistic')
            pi = inverse_logit(np.dot(self.M_dmatrix, theta[1:]))

            y_w = np.where(self.missing_indicator, self.y_no_nan / pi, 0) - theta[0]
            return np.vstack((y_w[None, :],
                              a_model))

        # M-estimator
        estr = MEstimator(psi, init=[0, 0, 0])
        estr.estimate()

        print("The estimated mean is", np.round(estr.theta[0], decimals=decimals))
[11]:
ipmw = IPMW(d, outcome='y')
ipmw.missing_model(['C', 'X'])
ipmw.estimate()
The estimated mean is 5.046

This illustrates how you can package delicatessen for your needs (or your software library).

Common Mistakes

Here is a list of common mistakes when programming your own estimating functions, most of which we have done ourselves:

  1. The psi function doesn’t return a NumPy array.

  2. The psi function returns the wrong shape. Remember, it should be a \(b\)-by-\(n\) NumPy array!

  3. The psi function is summing over \(n\). delicatessen needs to perform the sum internally (in order to compute the bread and filling), so do not sum over \(n\) in psi!

  4. The theta values and b must be in the same order. If theta[0] is the mean, the 1st row of the returned array better be the estimating function for that mean!

  5. Automatic differentiation with np.nan_to_num. This will result in the bread matrix having nan values.

  6. Trying to use a SciPy function with deriv_method='exact' (only some functionalities are currently supported. please open an issue on GitHub if you have one you would like to see added).

If you still have trouble, please open an issue at pzivich/Delicatessen This will help me to add other common mistakes here and improve the documentation for custom estimating equations.

Additional Examples

For further examples on how to use delicatessen with custom estimating equations, see the Applied Examples page.