Morris et al. (2022): Precision in Randomized Trials

The following is a replication of the analyses described in Morris et al. (2022), which uses data from Wilson et al. (2017). Morris et al. examine three different approaches to adjusting for prognostic factors in randomized trials: direct adjustment, standardization, and inverse probability weighting. This example highlights how causal inference methods (like those in Hernán & Robins (2023)) can be used in randomized trials.

Here, we do not consider direct adjustment, and attention is restricted to the any-test outcome. For further details and comparisons between the covariate adjustment approaches, see the original publication.

Setup

[1]:
import numpy as np
import pandas as pd
import patsy

import delicatessen
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

print("Versions")
print("NumPy:        ", np.__version__)
print("pandas:       ", pd.__version__)
print("Delicatessen: ", delicatessen.__version__)
Versions
NumPy:         1.25.2
pandas:        1.4.1
Delicatessen:  2.1

Data is available from Supplement S1 of Wilson et al. (2017) (referenced below).

[2]:
d = pd.read_excel("data/GetTestedData.xls", sheet_name=1)

# Subsetting the desired columns
cols = ['group',      # Randomized arm
        'anytest',    # Outcome 1 (any test)
        'anydiag',    # Outcome 2 (any diagnosis)
        'gender',     # gender (male, female, transgender)
        'msm',        # MSM, other
        'age',        # age (continuous)
        'partners',   # Number of partners in <12 months
        'ethnicgrp']  # Ethnicity (5 categories)
d = d[cols].copy()

# Re-coding columns as numbers
d['group_n'] = np.where(d['group'] == 'SH:24', 1, np.nan)
d['group_n'] = np.where(d['group'] == 'Control', 0, d['group_n'])
d['gender_n'] = np.where(d['gender'] == 'Female', 0, np.nan)
d['gender_n'] = np.where(d['gender'] == 'Male', 1, d['gender_n'])
d['gender_n'] = np.where(d['gender'] == 'Transgender', 2, d['gender_n'])
d['msm_n'] = np.where(d['msm'] == 'other', 0, np.nan)
d['msm_n'] = np.where(d['msm'] == 'msm', 1, d['msm_n'])
d['msm_n'] = np.where(d['msm'] == '99', 2, d['msm_n'])
d['partners_n'] = np.where(d['partners'] == '1', 0, np.nan)
categories = ['2', '3', '4', '5', '6', '7', '8', '9', '10+']
for index in range(len(categories)):
    d['partners_n'] = np.where(d['partners'] == categories[index],
                               index + 1, d['partners_n'])

d['ethnicgrp_n'] = np.where(d['ethnicgrp'] == 'White/ White British', 0, np.nan)
d['ethnicgrp_n'] = np.where(d['ethnicgrp'] == 'Black/ Black British', 1, d['ethnicgrp_n'])
d['ethnicgrp_n'] = np.where(d['ethnicgrp'] == 'Mixed/ Multiple ethnicity', 2, d['ethnicgrp_n'])
d['ethnicgrp_n'] = np.where(d['ethnicgrp'] == 'Asian/ Asian British', 3, d['ethnicgrp_n'])
d['ethnicgrp_n'] = np.where(d['ethnicgrp'] == 'Other', 4, d['ethnicgrp_n'])

# Dropping old columns and renaming new ones
d = d.drop(columns=['group', 'gender', 'msm', 'partners', 'ethnicgrp'])
relabs = dict()
for c in cols:
    relabs[c + "_n"] = c

d = d.rename(columns=relabs)

As in the main paper, we conduct a complete-case analysis (i.e., discard all observations with any missing data).

As delicatessen interacts with NumPy arrays, we will further extract the covariates from the complete-case data set. Here, we use patsy, which allows for R-like formulas, to make the covariate manipulations easier. Specifically, the C(...) functionality generates an array of indicator variables (saving us the trouble of hand-coding disjoint indicators).
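As a quick illustration of what C(...) does (using a small, hypothetical data frame rather than the trial data), each non-reference level of a categorical column becomes its own 0/1 indicator alongside the intercept:

toy = pd.DataFrame({'gender': [0, 1, 2, 0]})
print(np.asarray(patsy.dmatrix("C(gender)", toy)))
# [[1. 0. 0.]
#  [1. 1. 0.]
#  [1. 0. 1.]
#  [1. 0. 0.]]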

Finally, we generate copies of the data set with the group column set to all-1 and to all-0. The corresponding design matrices will be used to generate the predicted outcomes under each treatment for the standardization approach.

[3]:
ds = d.dropna().copy()

# Outcome
y = np.asarray(ds['anytest'])

# Treatment
a = np.asarray(ds['group'])

# Observed covariates
prop_score_covs = patsy.dmatrix("age + C(gender) + C(msm) + C(partners) + C(ethnicgrp)",
                                ds)
outcome_covs = patsy.dmatrix("group + age + C(gender) + C(msm) + C(partners) + C(ethnicgrp)",
                             ds)

# Setting group=a and getting matrices
dsa = ds.copy()
dsa['group'] = 1
outcome_a1_covs = patsy.dmatrix("group + age + C(gender) + C(msm) + C(partners) + C(ethnicgrp)",
                                dsa)
dsa['group'] = 0
outcome_a0_covs = patsy.dmatrix("group + age + C(gender) + C(msm) + C(partners) + C(ethnicgrp)",
                                dsa)

Standardization

For standardization (or g-computation), we stack several estimating equations together. Let \(Y\) indicate the outcome, \(A\) indicate the treatment, and \(W\) indicate the prognostic factors included in the model (age, gender, sexual orientation, number of partners, ethnicity). The parameters are \(\theta\), which we further divide into the interest parameters (\(\mu_0, \mu_1, \mu_2, \mu_3\)) and the nuisance parameters (\(\beta\)). As the name implies, nuisance parameters are not of direct interest but are necessary for estimation of the interest parameters. Importantly, by stacking the estimating equations, the uncertainty from estimating the nuisance parameters is automatically propagated into the uncertainty of the interest parameters.

The estimating equations are

\[\begin{split}\psi(Y_i, A_i, W_i; \theta) = \begin{bmatrix} (\mu_1 - \mu_0) - \mu_2 \\ \log(\mu_1 / \mu_0) - \mu_3 \\ \hat{Y}_i^{a=1} - \mu_1 \\ \hat{Y}_i^{a=0} - \mu_0 \\ (Y_i - \text{expit}(g(A_i,W_i)^T \beta)) g(A_i,W_i) \end{bmatrix}\end{split}\]

where \(g(A_i,W_i)\) is a function that generates the design matrix, and \(\hat{Y}_i^{a} = \text{expit}(g(a,W_i)^T \beta)\). The first equation in the stack is the risk difference (RD) and the second is the log-transformed risk ratio (RR). Notice that these are defined in terms of other parameters, specifically the parameters from the third and fourth equations, which correspond to the risk under all-1 and the risk under all-0. The predicted values of the outcome are generated using the parameters from the final equation, which is the logistic model for the probability of \(Y_i\) conditional on \(A_i,W_i\).
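To see what the third and fourth equations imply, setting the sum over \(i\) of the third equation to zero and solving for \(\mu_1\) gives the usual g-computation form (and analogously for \(\mu_0\)): the average of the predicted outcomes with everyone's treatment set to 1,

\[\hat{\mu}_1 = \frac{1}{n} \sum_{i=1}^{n} \text{expit}\left(g(1, W_i)^T \hat{\beta}\right)\]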

An astute reader may notice that we could shorten this stack of equations. Rather than estimating \(\mu_0,\mu_1\), we could have plugged the corresponding \(\hat{Y}_i^{a}\) directly into the first and second equations (thereby removing the need for the third and fourth). While this is correct, we prefer the process above for three reasons:

1. The stacked equations above give us the corresponding risk under each arm, which may also be of interest (e.g., what was the risk under a=0?). Therefore, no extra steps are necessary to get these estimated parameters.

2. The second reason is more technical. Here, we use a root-finding procedure to get \(\hat{\theta}\). During this process we plug in different values of \(\hat{\theta}\) until we find the (approximate) zeroes. For the RR, some of the values explored along the way may result in \(\hat{Y}_i^{a=0} = 0\). This causes a division by zero and may break the root-finding procedure. By instead dividing by the average of \(\hat{Y}_i^{a=0}\) across all \(i\)'s, we reduce the opportunity for a rogue zero.

3. Related to the previous reason, the root-finding procedure will run faster, since we only need to take the log of a single number as opposed to an entire array. While this won't make much of a difference in run-time here, it is good practice (and may be important in problems with more complex estimating equations).

Below we write out the estimating equations by-hand (except for the regression component, for which we use the built-in ee_regression functionality), estimate with the M-estimator, and then present the results (similar to Table 3).

[4]:
def psi(theta):
    # Extracting parameters from theta for ease
    risk_diff = theta[0]
    log_risk_ratio = theta[1]
    risk_a1 = theta[2]
    risk_a0 = theta[3]
    beta = theta[4:]

    # Estimating nuisance model (outcome regression)
    ee_log = ee_regression(theta=beta,        # beta coefficients
                           X=outcome_covs,    # X covariates
                           y=y,               # y
                           model='logistic')  # logit

    # Generating predicted outcome values
    ee_ya1 = inverse_logit(np.dot(outcome_a1_covs, beta)) - risk_a1
    ee_ya0 = inverse_logit(np.dot(outcome_a0_covs, beta)) - risk_a0

    # Estimating interest parameters
    ee_risk_diff = np.ones(y.shape[0])*(risk_a1 - risk_a0) - risk_diff
    ee_risk_ratio = np.ones(y.shape[0])*np.log(risk_a1 / risk_a0) - log_risk_ratio

    # Returning stacked estimating equations (order matters)
    return np.vstack((ee_risk_diff,  # risk difference
                      ee_risk_ratio, # risk ratio
                      ee_ya1,        # risk a=1
                      ee_ya0,        # risk a=0
                      ee_log,))      # logistic model
[5]:
estr = MEstimator(psi,
                  init=[0, 0, 0.5, 0.5, ] + [0, ]*outcome_covs.shape[1])
estr.estimate(solver='lm')
[6]:
std = np.sqrt(np.diag(estr.variance))
fmt = '{:.3f}'

print("Risk Difference:", fmt.format(estr.theta[0]), "("+fmt.format(std[0])+")")
print("Risk Ratio:     ", fmt.format(estr.theta[1]), "("+fmt.format(std[1])+")")
Risk Difference: 0.260 (0.021)
Risk Ratio:      0.795 (0.075)

At this point, you may look at the results for the log risk ratio and notice some differences from Morris et al. in the third decimal place. We reserve discussion of this point until the end.
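Wald-type 95% confidence intervals can also be pulled directly from the fitted M-estimator. Below is a minimal sketch using MEstimator's confidence_intervals() method, exponentiating the log risk ratio row to place it on the risk ratio scale (this step is not part of the replication above):

ci = estr.confidence_intervals()          # 95% Wald-type intervals by default
print("RD 95% CI:", ci[0, :])             # risk difference
print("RR 95% CI:", np.exp(ci[1, :]))     # back-transformed log risk ratio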

Inverse Probability Weighting

Now, we consider inverse probability weighting, which relies on a different nuisance model: a propensity score model for the treatment rather than a model for the outcome. To make this distinction clear, the nuisance parameters are indicated by \(\alpha\) here.

The estimating equations are

\[\begin{split}\psi(Y_i, A_i, W_i; \theta) = \begin{bmatrix} (\mu_3 - \mu_4) - \mu_1 \\ \log(\mu_3 / \mu_4) - \mu_2 \\ \frac{Y_i A_i}{\text{expit}(g(W_i)^T \alpha)} - \mu_3 \\ \frac{Y_i (1-A_i)}{1 - \text{expit}(g(W_i)^T \alpha)} - \mu_4 \\ (A_i - \text{expit}(g(W_i)^T \alpha)) g(W_i) \end{bmatrix}\end{split}\]

where \(g(W_i)\) is a function that generates the design matrix. As before, the first and second equations are the risk difference and the log risk ratio, respectively. The third and fourth equations correspond to the risk under all-1 and the risk under all-0. The final estimating equation is for the propensity score model, which is used to generate the predicted probability of treatment given \(W_i\).
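As in the standardization section, solving the third equation for \(\mu_3\) makes the form of the estimator explicit: it is an inverse-probability-weighted (Horvitz-Thompson-type) mean (and analogously for \(\mu_4\)),

\[\hat{\mu}_3 = \frac{1}{n} \sum_{i=1}^{n} \frac{A_i Y_i}{\text{expit}(g(W_i)^T \hat{\alpha})}\]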

Again, we write out the estimating equations by-hand (except for the regression component, for which we use the built-in functionality), estimate with the M-estimator, and then present the results (similar to Table 3).

[7]:
def psi(theta):
    # Extracting parameters from theta for ease
    risk_diff = theta[0]
    log_risk_ratio = theta[1]
    risk_a1 = theta[2]
    risk_a0 = theta[3]
    beta = theta[4:]

    # Estimating nuisance model (propensity score model)
    ee_log = ee_regression(theta=beta,         # propensity model coefficients
                           X=prop_score_covs,  # W covariates
                           y=a,                # a
                           model='logistic')   # logit

    # Generating predicted propensity score
    prop_score = inverse_logit(np.dot(prop_score_covs, beta))

    # Calculating weighted pieces
    ee_ya1 = (a*y) / prop_score - risk_a1
    ee_ya0 = ((1-a)*y) / (1-prop_score) - risk_a0

    # Estimating interest parameters
    ee_risk_diff = np.ones(a.shape[0])*(risk_a1 - risk_a0) - risk_diff
    ee_risk_ratio = np.ones(a.shape[0])*np.log(risk_a1 / risk_a0) - log_risk_ratio

    # Returning stacked estimating equations (order matters)
    return np.vstack((ee_risk_diff,  # risk difference
                      ee_risk_ratio, # risk ratio
                      ee_ya1,        # risk a=1
                      ee_ya0,        # risk a=0
                      ee_log,))      # logistic model
[8]:
estr = MEstimator(psi, init=[0., 0., 0.5, 0.5] + [0, ]*prop_score_covs.shape[1])
estr.estimate(solver='lm')
[9]:
std = np.sqrt(np.diag(estr.variance))
fmt = '{:.3f}'

print("Risk Difference:", fmt.format(estr.theta[0]), "("+fmt.format(std[0])+")")
print("Risk Ratio:     ", fmt.format(estr.theta[1]), "("+fmt.format(std[1])+")")
Risk Difference: 0.261 (0.021)
Risk Ratio:      0.805 (0.075)

Results are the same as Morris et al. up to the third decimal place.

Differences in Results

As promised, let's return to the small differences in the results. Why do they occur? The most likely culprit is differences in the numerical procedures used to find \(\hat{\theta}\) across software packages.

delicatessen operates by a root-finding procedure: we look for the values of \(\hat{\theta}\) that lead to all of the estimating equations being approximately zero. Specifically, we use the Levenberg-Marquardt algorithm in these examples. delicatessen uses a root-finding procedure because it simplifies building stacked estimating equations. However, this comes at some cost in speed and robustness relative to other approaches.
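One practical upside of the root-finding formulation is that the solution is easy to check: plugging \(\hat{\theta}\) back into the psi function defined above, the summed estimating functions should all be numerically zero. A quick sanity check (not part of the replication):

ee_sums = psi(estr.theta).sum(axis=1)   # sum of each estimating function at theta-hat
print(np.max(np.abs(ee_sums)))          # should be approximately zero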

Other regression software tends to proceed under a different approach. Generally, most software aims to maximize the log-likelihood. As before, a variety of algorithms can be used to maximize the likelihood, but most take advantage of both the log-likelihood and the score function (the derivative of the log-likelihood) to find the maximum. This makes them quite robust to sparse data while also being fast. As the data here are somewhat sparse (the number-of-partners variable contributes many indicator columns), this is the most likely explanation for the small differences.
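To see this point in practice, one could refit the same logistic propensity score model with maximum-likelihood software and compare the coefficients against the nuisance estimates from the IPW M-estimator above. A minimal sketch, assuming statsmodels is available (it is not used elsewhere in this example):

import statsmodels.api as sm

# Maximum-likelihood (IRLS) fit of the same logistic propensity score model
glm = sm.GLM(a, np.asarray(prop_score_covs), family=sm.families.Binomial()).fit()

# Largest absolute difference from the M-estimator's nuisance coefficients
print(np.max(np.abs(np.asarray(glm.params) - estr.theta[4:])))

The coefficients should agree to several decimal places, with any remaining discrepancy reflecting the different convergence criteria of the two procedures.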

Conclusion

We replicated selected analyses from Morris et al. (2022) to showcase the basics of stacking estimating equations and how they are implemented in delicatessen. While delicatessen has built-in functionalities for standardization and inverse probability weighting, we opted to present the estimating equations by-hand here. Writing them by-hand takes some extra work, but it provides a better idea of how delicatessen can be used to build estimating equations. For example, each of the approaches above could be further extended to account for missing outcomes under weaker assumptions.

References

Morris TP, Walker AS, Williamson EJ, & White IR. (2022). “Planning a method for covariate adjustment in individually-randomised trials: a practical guide”. Trials, 23:328

Wilson E, Free C, Morris TP, Syred J, Ahamed I, Menon-Johansson AS, et al. (2017). "Internet-accessed sexually transmitted infection (e-STI) testing and results service: a randomised, single-blind, controlled trial". PLoS Medicine, 14(12), e1002479.