Boos & Stefanski (2013): M-Estimation (Estimating Equations)

Here, we will implement some of the examples described in Chapter 7 of Boos & Stefanski (2013). If you have the book (or access to it), reading along with each section may be helpful. We code each of the estimating equations by hand and, where available, compare them to the pre-built options offered by delicatessen.

Examples of M-Estimation provided in that chapter are replicated here using delicatessen. Reading the chapter and looking at the corresponding implementations is likely to be the best approach to learning both the theory and application of M-Estimation.

Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.

Setup

[1]:
import numpy as np
import scipy as sp
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

import delicatessen
from delicatessen import MEstimator

np.random.seed(80950841)

print("NumPy version:       ", np.__version__)
print("SciPy version:       ", sp.__version__)
print("Pandas version:      ", pd.__version__)
print("Delicatessen version:", delicatessen.__version__)
NumPy version:        1.25.2
SciPy version:        1.11.2
Pandas version:       1.4.1
Delicatessen version: 2.1

7.2 The Basic Approach

7.2.2 Sample Mean and Variance

The first example is the estimating equations for the mean and variance. To demonstrate the example, we will use some generic data for \(Y\). Below is an example data set that will be used up to Section 7.2.6.

[2]:
n = 200
data = pd.DataFrame()
data['Y'] = np.random.normal(loc=10, scale=2, size=n)
data['X'] = np.random.normal(loc=5, size=n)
data['C'] = 1

Here, estimating equations for both the mean and variance are stacked together:

\[\begin{split}\psi(Y_i, \theta) = \begin{bmatrix} Y_i - \theta_1\\ (Y_i - \theta_1)^2 - \theta_2 \end{bmatrix}\end{split}\]

The top estimating equation is the mean, and the bottom estimating equation is the (asymptotic) variance. The following is a by-hand implementation of these estimating equations

[3]:
def psi_mean_var(theta):
    """By-hand stacked estimating equations"""
    return (data['Y'] - theta[0],
            (data['Y'] - theta[0])**2 - theta[1])



estr = MEstimator(psi_mean_var, init=[0, 0])
estr.estimate()

print("=========================================================")
print("Mean & Variance")
print("=========================================================")
print("M-Estimation: by-hand")
print("Theta:", estr.theta)
print("Var:  \n", estr.asymptotic_variance)
print("---------------------------------------------------------")
print("Closed-Form")
print("Mean: ", np.mean(data['Y']))
print("Var:  ", np.var(data['Y'], ddof=0))
print("=========================================================")
=========================================================
Mean & Variance
=========================================================
M-Estimation: by-hand
Theta: [10.16284625  4.11208477]
Var:
 [[ 4.11208477 -1.6739995 ]
 [-1.6739995  36.16386927]]
---------------------------------------------------------
Closed-Form
Mean:  10.162846250198633
Var:   4.112084770881207
=========================================================

The M-Estimator solves for \(\hat{\theta}\) via a root-finding procedure given the initial values in init. Since the variance must be \(>0\), a positive initial value for \(\theta_2\) is a reasonable choice (the root-finder also converges from 0 here). For the sandwich variance, delicatessen numerically approximates the bread matrix. This differs from the closed-form variance estimator provided in Chapter 7, but both should return approximately the same answer. The advantage of numerically approximating the derivatives is that the process can be applied to arbitrary estimating functions.

Notice that \(\theta_2\) also matches the first element of the (asymptotic) variance matrix. These two values should match, since they estimate the same quantity. Further, as shown, the closed-form solutions for the mean and variance are equal to those from the M-estimation approach.
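As a quick sanity check (not from the chapter), \(\hat{\theta}_2\) can be compared to the variance matrix directly, and the finite-sample variance reported by delicatessen should be the asymptotic variance divided by \(n\). A minimal sketch, assuming the documented relationship between the variance and asymptotic_variance attributes:

# Sketch: relation between the two variance attributes
print(estr.variance)                                   # variance of theta-hat
print(estr.asymptotic_variance / n)                    # should be (approximately) the same matrix
print(estr.theta[1], estr.asymptotic_variance[0, 0])   # theta_2 and the (1,1) element match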

The following uses the built-in estimating equation for the mean and variance

[4]:
from delicatessen.estimating_equations import ee_mean_variance

def psi_mean_var_default(theta):
    """Built-in stacked estimating equations"""
    return ee_mean_variance(y=np.asarray(data['Y']), theta=theta)


estr = MEstimator(psi_mean_var_default, init=[0, 0])
estr.estimate()

print("=========================================================")
print("Mean & Variance")
print("=========================================================")
print("M-Estimation: built-in")
print("Theta:", estr.theta)
print("Var:  \n", estr.asymptotic_variance)
print("=========================================================")
=========================================================
Mean & Variance
=========================================================
M-Estimation: built-in
Theta: [10.16284625  4.11208477]
Var:
 [[ 4.11208477 -1.6739995 ]
 [-1.6739995  36.16386927]]
=========================================================

7.2.3 Ratio Estimator

Now consider if we wanted to estimate the ratio between two means. This can be expressed as a single estimating equation

\[\psi(Y_i, \theta) = \begin{bmatrix} Y_i - X_i \times \theta_1 \end{bmatrix}\]

and is implemented by-hand as

[5]:
def psi_ratio(theta):
    return data['Y'] - data['X']*theta


estr = MEstimator(psi_ratio, init=[0, ])
estr.estimate()

print("=========================================================")
print("Ratio Estimator")
print("=========================================================")
print("M-Estimation: single estimating equation")
print("Theta:", estr.theta)
print("Var:  ",estr.asymptotic_variance)
print("---------------------------------------------------------")
print("Closed-Form")

theta = np.mean(data['Y']) / np.mean(data['X'])
b = 1 / np.mean(data['X'])**2
c = np.mean((data['Y'] - theta*data['X'])**2)
var = b * c
print("Ratio:",theta)
print("Var:  ",var)

print("=========================================================")
=========================================================
Ratio Estimator
=========================================================
M-Estimation: single estimating equation
Theta: [2.08234516]
Var:   [[0.33842324]]
---------------------------------------------------------
Closed-Form
Ratio: 2.0823451609959682
Var:   0.33842329733168625
=========================================================

As you may notice, only a single initial value is provided (since only a single array is being returned). Because we are estimating a ratio, an initial value \(>0\) is a sensible choice for the root-finder.
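Although the chapter does not show it, Wald-type 95% confidence intervals follow directly from the point estimate and the sandwich variance. A minimal sketch, assuming estr.variance holds the finite-sample variance of \(\hat{\theta}\):

# Sketch: Wald-type 95% confidence interval for the ratio
se = np.sqrt(np.diag(estr.variance))     # standard error(s) of theta-hat
lcl = estr.theta - 1.96 * se             # lower confidence limit
ucl = estr.theta + 1.96 * se             # upper confidence limit
print(estr.theta, lcl, ucl)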

However, there is another set of stacked estimating equations we can consider for the ratio. Specifically, we can estimate each of the means and then take the ratio of those means. Below is this alternative set of estimating equations

\[\begin{split}\psi(Y_i, \theta) = \begin{bmatrix} Y_i - \theta_1\\ X_i - \theta_2\\ \theta_1 - \theta_2 \theta_3 \end{bmatrix}\end{split}\]

Note that the last element is the ratio

[6]:
def psi_ratio_three(theta):
    return (data['Y'] - theta[0],
            data['X'] - theta[1],
            np.ones(data.shape[0])*theta[0] - theta[1]*theta[2])


estr = MEstimator(psi_ratio_three, init=[0.1, 0.1, 0.1])
estr.estimate()

print("=========================================================")
print("Ratio Estimator")
print("=========================================================")
print("M-Estimation: three estimating equations")
print("Theta:", estr.theta)
print("Var:  \n", estr.asymptotic_variance)
print("=========================================================")
=========================================================
Ratio Estimator
=========================================================
M-Estimation: three estimating equations
Theta: [10.16284625  4.88048112  2.08234516]
Var:
 [[ 4.11208477  0.04326814  0.82409608]
 [ 0.04326814  0.95223639 -0.39742316]
 [ 0.82409608 -0.39742316  0.3384232 ]]
=========================================================

Here, we use a trick with np.ones to make sure the dimension of the third estimating equation stays \(n\). Without multiplying by the array of ones, the third element would be a single value. However, MEstimator expects a \(3 \times n\) array, so multiplying the 3rd equation by an array of 1’s ensures the correct dimension.

Also notice this form requires the use of 3 init values, unlike the other ratio estimator. As before, the ratio initial value is set \(>0\) to be nice to the root-finding algorithm.

7.2.4 Delta Method via M-Estimation

The delta method has been used in a variety of contexts, including estimating the variance of transformations of parameters. Instead of separately estimating the parameters, transforming them, and then using the delta method to estimate the variance of the transformed parameters, we can apply the transformation within an estimating equation and automatically estimate the variance of the transformed parameter(s) via the sandwich variance estimator. To do this, we stack the estimating equation for the transformation into our set of estimating equations. Below are the mean-variance estimating equations stacked with two transformations of the variance

\[\begin{split}\psi(Y_i, \theta) = \begin{bmatrix} Y_i - \theta_1\\ (Y_i - \theta_1)^2 - \theta_2\\ \sqrt{\theta_2} - \theta_3\\ \log(\theta_2) - \theta_4 \end{bmatrix}\end{split}\]

These equations can be expressed programmatically as

[7]:
def psi_delta(theta):
    return (data['Y'] - theta[0],
            (data['Y'] - theta[0])**2 - theta[1],
            np.ones(data.shape[0])*np.sqrt(theta[1]) - theta[2],
            np.ones(data.shape[0])*np.log(theta[1]) - theta[3])


estr = MEstimator(psi_delta, init=[10., 2., 1., 1.])
estr.estimate()

print("=========================================================")
print("Delta Method")
print("=========================================================")
print("M-Estimation")
print("Theta:", estr.theta)
print("Var:  \n", estr.variance)
print("=========================================================")
=========================================================
Delta Method
=========================================================
M-Estimation
Theta: [10.16284625  4.11208477  2.0278276   1.41393014]
Var:
 [[ 0.02056042 -0.00837    -0.00206379 -0.00203546]
 [-0.00837     0.18081935  0.04458452  0.04397267]
 [-0.00206379  0.04458452  0.01099318  0.01084232]
 [-0.00203546  0.04397267  0.01084232  0.01069352]]
=========================================================

Notice the use of the np.ones trick to ensure that the final equations are the correct shapes. Here, there are 4 parameters, so init must contain 4 values.
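As a check on the stacked approach (not part of the chapter), the classical delta method applied to \(\theta_3 = \sqrt{\theta_2}\) uses the gradient \(1 / (2\sqrt{\theta_2})\) and should give approximately the same variance as the sandwich estimator above. A minimal sketch using the estimates from this cell:

# Sketch: classical delta method for Var(sqrt(theta_2)), compared to the stacked result
grad = 1 / (2 * np.sqrt(estr.theta[1]))    # d sqrt(theta_2) / d theta_2
var_sqrt = grad**2 * estr.variance[1, 1]   # delta method variance for theta_3
print(var_sqrt, estr.variance[2, 2])       # should approximately match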

7.2.6 Instrumental Variable Estimation

Consider the following instrumental variable approach to correcting for measurement error of a variable. Here, \(Y\) is the outcome of interest, \(X\) is the true but unmeasured variable, \(X^*\) is the possibly mismeasured version of \(X\), and \(T\) is the instrument for \(X\). We are interested in estimating \(\beta_1\) of

\[Y_i = \beta_0 + \beta_1 X_i + e_i\]

Since \(X^*\) is mismeasured, we can’t immediately estimate \(\beta_1\). Instead, we need to use an instrumental variable approach. Below is some generated data consistent with this measurement error story

[8]:
# Generating some data
n = 500
data = pd.DataFrame()
data['X'] = np.random.normal(size=n)
data['Y'] = 0.5 + 2*data['X'] + np.random.normal(loc=0, size=n)
data['X-star'] = data['X'] + np.random.normal(loc=0, size=n)
data['T'] = -0.75 - 1*data['X'] + np.random.normal(loc=0, size=n)

Two variations on the estimating equations for instrumental variable analyses are provided in the book. The first estimating equation is

\[\begin{split}\psi(Y_i,X_i^*,T_i, \theta) = \begin{bmatrix} T_i - \theta_1\\ (Y_i - \theta_2X_i^*)(\theta_1 - T_i) \end{bmatrix}\end{split}\]

where \(\theta_1\) is the mean of the instrument, and \(\theta_2\) corresponds to \(\beta_1\). The previous estimating equations can be translated as

[9]:
def psi_instrument(theta):
    return (theta[0] - data['T'],
            (data['Y'] - data['X-star']*theta[1])*(theta[0] - data['T']))


estr = MEstimator(psi_instrument, init=[0.1, 0.1])
estr.estimate()

print("=========================================================")
print("Instrumental Variable")
print("=========================================================")
print("M-Estimation")
print("Theta:", estr.theta)
print("Var:  \n", estr.variance)
print("=========================================================")
=========================================================
Instrumental Variable
=========================================================
M-Estimation
Theta: [-0.89989957  2.01777751]
Var:
 [[ 0.00430115 -0.0006694 ]
 [-0.0006694   0.023841  ]]
=========================================================

As mentioned in the chapter, certain joint distributions may be of interest. To capture these additional distributions, the estimating equations were updated to

\[\begin{split}\psi(Y_i,X_i^*,T_i, \theta) = \begin{bmatrix} T_i - \theta_1\\ \theta_2 - X_i^* \\ (Y_i - \theta_3 X_i^*)(\theta_2 - X_i^*)\\ (Y_i - \theta_4 X_i^*)(\theta_1 - T_i) \end{bmatrix}\end{split}\]

This set of estimating equations further allows for inference on the difference between \(\beta_1\) and the coefficient for \(Y\) given \(X^*\). Here, \(\theta_1\) is the mean of the instrument, \(\theta_2\) is the mean of the mismeasured version of \(X\), \(\theta_3\) corresponds to the coefficient for \(Y\) given \(X^*\), and \(\theta_4\) is \(\beta_1\).

Again, we can easily translate these equations for delicatessen

[10]:
def psi(theta):
    return (theta[0] - data['T'],
            theta[1] - data['X-star'],
            (data['Y'] - data['X-star']*theta[2])*(theta[1] - data['X-star']),
            (data['Y'] - data['X-star']*theta[3])*(theta[0] - data['T'])
            )


estr = MEstimator(psi, init=[0.1, 0.1, 0.1, 0.1])
estr.estimate()

print("=========================================================")
print("Instrumental Variable")
print("=========================================================")
print("M-Estimation")
print("Theta:", estr.theta)
print("Var:  \n", estr.variance)
print("=========================================================")
=========================================================
Instrumental Variable
=========================================================
M-Estimation
Theta: [-0.89989957  0.02117577  0.95717618  2.01777751]
Var:
 [[ 0.00430115 -0.00207361 -0.00011136 -0.0006694 ]
 [-0.00207361  0.0041239   0.00023703  0.00039778]
 [-0.00011136  0.00023703  0.00302462  0.00171133]
 [-0.0006694   0.00039778  0.00171133  0.023841  ]]
=========================================================
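With the joint covariance matrix above, inference on the difference between the instrumental variable coefficient (\(\theta_4\)) and the naive coefficient (\(\theta_3\)) follows from the usual rule for the variance of a linear contrast. A minimal sketch (not part of the chapter):

# Sketch: difference between the IV coefficient and the naive coefficient, with its SE
diff = estr.theta[3] - estr.theta[2]
var_diff = estr.variance[3, 3] + estr.variance[2, 2] - 2 * estr.variance[2, 3]
print(diff, np.sqrt(var_diff))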

7.4 Nonsmooth Estimating Functions

7.4.1 Robust Location Estimation

To begin, we generate some generic data used for this example

[11]:
y = np.random.normal(size=250)
n = y.shape[0]

The robust location estimator reduces the influence of outliers by applying bounds. The robust mean proposed by Huber (1964) is

\[\psi(Y_i, \theta_1) = g_k(Y_i) - \theta_1\]

where \(k\) indicates the bound, and \(g_k(Y_i)\) returns \(k\) if \(Y_i > k\), \(-k\) if \(Y_i < -k\), and \(Y_i\) otherwise.
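Written out as a piecewise function, the bounding function is

\[\begin{split}g_k(Y_i) = \begin{cases} k & \text{if } Y_i > k\\ -k & \text{if } Y_i < -k\\ Y_i & \text{otherwise} \end{cases}\end{split}\]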

Below is the estimating equation translated into code

[12]:
def psi_robust_mean(theta):
    k = 3                          # Bound value
    yr = np.where(y > k, k, y)     # Applying upper bound
    yr = np.where(yr < -k, -k, yr)   # Applying lower bound to the already upper-bounded values
    return yr - theta


estr = MEstimator(psi_robust_mean, init=[0.])
estr.estimate()

print("=========================================================")
print("Robust Location")
print("=========================================================")
print("M-Estimation")
print("Theta:", estr.theta)
print("Var:  \n", estr.variance)
print("=========================================================")
=========================================================
Robust Location
=========================================================
M-Estimation
Theta: [0.03056108]
Var:
 [[0.00370521]]
=========================================================

Notice that the estimating equation here is not smooth (i.e., it is non-differentiable at \(\pm k\)).
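Even though the estimating function is nonsmooth, the point estimate here has a closed form: it is simply the mean of the bounded observations. A quick check (not from the chapter):

# Sketch: the robust location estimate equals the mean of the bounded observations
y_bounded = np.clip(y, -3, 3)              # apply the bound k = 3
print(np.mean(y_bounded), estr.theta[0])   # should match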

7.5 Regression M-estimators

7.5.1 Linear Model with Random \(X\)

For the next examples, the following simulated data is used

[13]:
n = 500
data = pd.DataFrame()
data['X'] = np.random.normal(size=n)
data['Z'] = np.random.normal(size=n)
data['Y'] = 0.5 + 2*data['X'] - 1*data['Z'] + np.random.normal(size=n)
data['C'] = 1

Here, we are interested in estimating the relationship between \(Y\) and \(X,Z\). We will do this via linear regression. Note that we need to manually add an intercept (the column C in the data).

It is also worthwhile to note that the variance here is robust (to violations of the homoscedasticity assumption). As a comparison, we provide the equivalent using statsmodels’ generalized linear model with heteroscedasticity-corrected (HC1) variances.

As with all the preceding estimating equations, there are multiple ways to code these. Since linear regression involves matrix manipulations, and the programmed estimating equations must return the correct shape for delicatessen, we highlight two variations here.

First, we present a for-loop implementation of the estimating equation

[14]:
def psi(theta):
    # Transforming to arrays
    x = np.asarray(data[['C', 'X', 'Z']])
    y = np.asarray(data['Y'])
    beta = np.asarray(theta)[:, None]
    n = x.shape[0]

    # Where to store each of the resulting estimates
    est_vals = []

    # Looping through each observation
    for i in range(n):
        v_i = (y[i] - np.dot(x[i], beta)) * x[i]
        est_vals.append(v_i)

    # returning 3-by-n object
    return np.asarray(est_vals).T

mestimator = MEstimator(psi, init=[0.1, 0.1, 0.1])
mestimator.estimate()

print("=========================================================")
print("Linear Model")
print("=========================================================")
print("M-Estimation: by-hand")
print(mestimator.theta)
print(mestimator.variance)
print("---------------------------------------------------------")

print("GLM Estimator")
glm = smf.glm("Y ~ X + Z", data).fit(cov_type="HC1")
print(np.asarray(glm.params))
print(np.asarray(glm.cov_params()))
print("=========================================================")
=========================================================
Linear Model
=========================================================
M-Estimation: by-hand
[ 0.41082601  1.96289222 -1.02663555]
[[ 2.18524086e-03  7.28169364e-05  1.54216649e-04]
 [ 7.28169364e-05  2.08315683e-03 -4.09520393e-05]
 [ 1.54216649e-04 -4.09520393e-05  2.14573772e-03]]
---------------------------------------------------------
GLM Estimator
[ 0.41082601  1.96289222 -1.02663555]
[[ 2.18524092e-03  7.28169947e-05  1.54216630e-04]
 [ 7.28169947e-05  2.08315690e-03 -4.09519947e-05]
 [ 1.54216630e-04 -4.09519947e-05  2.14573770e-03]]
=========================================================

As the second approach, a vectorized version is used

[15]:
def psi_regression(theta):
    x = np.asarray(data[['C', 'X', 'Z']])
    y = np.asarray(data['Y'])[:, None]
    beta = np.asarray(theta)[:, None]
    return ((y - np.dot(x, beta)) * x).T


mestimator = MEstimator(psi_regression, init=[0.1, 0.1, 0.1])
mestimator.estimate()

print("=========================================================")
print("Linear Model")
print("=========================================================")
print("M-Estimation: by-hand")
print(mestimator.theta)
print(mestimator.variance)
print("=========================================================")
=========================================================
Linear Model
=========================================================
M-Estimation: by-hand
[ 0.41082601  1.96289222 -1.02663555]
[[ 2.18524113e-03  7.28169589e-05  1.54216655e-04]
 [ 7.28169589e-05  2.08315685e-03 -4.09520174e-05]
 [ 1.54216655e-04 -4.09520174e-05  2.14573766e-03]]
=========================================================

While these two approaches give the same answer, vectorized versions will generally be faster than for-loop variations (but may be less ‘human readable’). Having said that, it is easier to make a mistake with a vectorized version. We would generally recommend creating a for-loop version first (and then creating a vectorized version if that for-loop is too slow).
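As a rough illustration (not from the chapter), the two implementations can be compared with the standard library’s timeit module; exact timings will depend on the machine. A minimal sketch, reusing psi (for-loop) and psi_regression (vectorized) from above:

import timeit

# Sketch: rough timing of the for-loop versus vectorized estimating functions
beta_hat = mestimator.theta
print("For-loop:  ", timeit.timeit(lambda: psi(beta_hat), number=100))
print("Vectorized:", timeit.timeit(lambda: psi_regression(beta_hat), number=100))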

The following uses the built-in linear regression functionality (which uses the vectorized implementation)

[16]:
from delicatessen.estimating_equations import ee_regression

def psi_regression(theta):
    return ee_regression(theta=theta,
                         X=data[['C', 'X', 'Z']],
                         y=data['Y'],
                         model='linear')


mestimator = MEstimator(psi_regression, init=[0.1, 0.1, 0.1])
mestimator.estimate()

print("=========================================================")
print("Linear Model")
print("=========================================================")
print("M-Estimation: built-in")
print(mestimator.theta)
print(mestimator.variance)
print("=========================================================")
=========================================================
Linear Model
=========================================================
M-Estimation: built-in
[ 0.41082601  1.96289222 -1.02663555]
[[ 2.18524113e-03  7.28169589e-05  1.54216655e-04]
 [ 7.28169589e-05  2.08315685e-03 -4.09520174e-05]
 [ 1.54216655e-04 -4.09520174e-05  2.14573766e-03]]
=========================================================

7.5.4 Robust Regression

The next example is robust regression, where the standard linear regression model is made robust to outliers. We use the bounding function \(g_k(\cdot)\) from 7.4.1 (here with \(k = 1.345\)), but now apply it to the residuals of the regression model.

[17]:
def psi_robust_regression(theta):
    k = 1.345
    x = np.asarray(data[['C', 'X', 'Z']])
    y = np.asarray(data['Y'])[:, None]
    beta = np.asarray(theta)[:, None]
    preds = np.clip(y - np.dot(x, beta), -k, k)
    return (preds * x).T


mestimator = MEstimator(psi_robust_regression, init=[0.5, 2., -1.])
mestimator.estimate()

print("=========================================================")
print("Robust Linear Model")
print("=========================================================")
print("M-Estimation: by-hand")
print(mestimator.theta)
print(mestimator.variance)
print("=========================================================")
=========================================================
Robust Linear Model
=========================================================
M-Estimation: by-hand
[ 0.41223641  1.95577495 -1.02508413]
[[ 2.31591830e-03  1.82105969e-04  2.57209766e-04]
 [ 1.82105969e-04  2.12098818e-03 -6.95783670e-05]
 [ 2.57209766e-04 -6.95783670e-05  2.38212585e-03]]
=========================================================

The following uses the built-in robust linear regression functionality

[18]:
from delicatessen.estimating_equations import ee_robust_regression

def psi_robust_regression(theta):
    return ee_robust_regression(theta=theta,
                                X=data[['C', 'X', 'Z']],
                                y=data['Y'],
                                model='linear',
                                loss='huber', k=1.345)

mestimator = MEstimator(psi_robust_regression, init=[0.5, 2., -1.])
mestimator.estimate()

print("=========================================================")
print("Robust Linear Model")
print("=========================================================")
print("M-Estimation: built-in")
print(mestimator.theta)
print(mestimator.variance)
print("=========================================================")
=========================================================
Robust Linear Model
=========================================================
M-Estimation: built-in
[ 0.41223641  1.95577495 -1.02508413]
[[ 2.31591830e-03  1.82105969e-04  2.57209766e-04]
 [ 1.82105969e-04  2.12098818e-03 -6.95783670e-05]
 [ 2.57209766e-04 -6.95783670e-05  2.38212585e-03]]
=========================================================

You’ll notice that the coefficients have changed slightly here. That is because we have reduced the influence of outliers on the estimation of the linear regression parameters (however, our simulated data mechanism doesn’t really produce major outliers, so the change is small).

7.5.5 Generalized Linear Models

The last category of models considered here is generalized linear models (GLMs). These models are much more flexible than the previous regression models in terms of the distributions that can be assumed. To illustrate this, we will use GLMs to estimate the odds ratio, risk ratio, and risk difference (since the book does not provide specific examples).

The following is some generic data for the example

[19]:
d = pd.DataFrame()
d['X'] = [1, -1, 0, 1, 2, 1, -2, -1, 0, 3, -3, 1, 1, -1, -1, -2, 2, 0, -1, 0]
d['Z'] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
d['Y'] = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
d['I'] = 1

For the subsequent examples, we will use the built-in GLM functionality.

Odds Ratio

To estimate odds ratios, we will use logistic regression. Logistic regression can be implemented through a GLM with a binomial distribution and the logit link. Below is the implementation

[20]:
from delicatessen.estimating_equations import ee_glm

def psi(theta):
    return ee_glm(theta, X=d[['I', 'X', 'Z']], y=d['Y'],
                  distribution='binomial', link='logit')

mestr = MEstimator(psi, init=[0., 0., 0.])
mestr.estimate()

print("=========================================================")
print("Logistic")
print("=========================================================")
print("ln(OR):", mestr.theta)
print("OR:    ", np.exp(mestr.theta))
print("=========================================================")
=========================================================
Logistic
=========================================================
ln(OR): [-0.34613077  0.16182415  0.28150058]
OR:     [0.70741997 1.17565348 1.32511677]
=========================================================

Risk Ratio

To estimate risk ratios, we can use a log-binomial regression model. This can be implemented through a GLM with the binomial distribution and the log link. Below is code to do this

[21]:
def psi(theta):
    return ee_glm(theta, X=d[['I', 'X', 'Z']], y=d['Y'],
                  distribution='binomial', link='log')

mestr = MEstimator(psi, init=[-.9, 0., 0.])
mestr.estimate(solver='lm')

print("=========================================================")
print("Log-binomial")
print("=========================================================")
print("ln(RR):", mestr.theta)
print("RR:    ", np.exp(mestr.theta))
print("=========================================================")
=========================================================
Log-binomial
=========================================================
ln(RR): [-0.89140821  0.06939184  0.16163837]
RR:     [0.41007787 1.07185612 1.1754351 ]
=========================================================

As seen here, the odds ratio and risk ratios differ. This is expected since the outcome is not that rare.

Note: the log-binomial model can be a bit difficult to fit because the log link does not constrain the predicted risks to be \(\le 1\), even though risks are bounded. What this means is that decent starting values may need to be provided. As seen here, the intercept is given a reasonable starting value.
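One simple way to pick a reasonable intercept starting value (our suggestion, not one from the chapter) is the log of the marginal risk; with 9 of 20 observations having the outcome, this is close to the \(-0.9\) used above:

# Sketch: log of the marginal risk as an intercept starting value
print(np.mean(d['Y']))           # marginal risk, 0.45
print(np.log(np.mean(d['Y'])))   # approximately -0.80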

Risk Difference

Finally, we can estimate the risk difference via a linear-binomial regression model. This can be done with the binomial distribution and the identity link. Below is code for this model

[22]:
def psi(theta):
    return ee_glm(theta, X=d[['I', 'X', 'Z']], y=d['Y'],
                  distribution='binomial', link='identity')

mestr = MEstimator(psi, init=[.2, 0., 0.])
mestr.estimate(solver='lm')

print("=========================================================")
print("Risk difference")
print("=========================================================")
print("RD:   ", mestr.theta)
print("=========================================================")
=========================================================
Risk difference
=========================================================
RD:    [0.41621376 0.03456562 0.06753619]
=========================================================

Like the log-binomial model, this model can be difficult to fit because the predicted risks are bounded but the identity link does not enforce that bound. This is especially a concern for the linear-binomial model, since we need to ensure the starting values don’t produce predicted risks outside of \([0,1]\) (and risk differences outside of \([-1,1]\)). To help with this, a starting value of \(0.2\) was used for the intercept.
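Analogously (again our suggestion, not from the chapter), the marginal risk itself gives an intercept starting value that keeps the predicted risks inside \([0,1]\) when the slope coefficients start at zero:

# Sketch: marginal risk as an intercept starting value for the identity link
print(np.mean(d['Y']))   # 0.45; the example above used 0.2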

This concludes the replication of the examples provided in Boos & Stefanski (2013).