M-estimator

class MEstimator(stacked_equations, init, subset=None, finite_correction=None)

M-Estimator for stacked estimating equations.

Estimating equations are a general approach to point and variance estimation that consists of defining an estimator as the solution to a vector of equations that are equal to zero. The corresponding estimators, often called M-estimators or Z-estimators, satisify the following equation

\[\sum_{i=1}^n \psi(O_i, \hat{\theta}) = 0\]

where \(\psi\) is the \(v\)-dimensional vector of estimating function(s), \(\hat{\theta}\) is the \(v\)-dimensional parameter vector, and \(O_i\) is the observed data (where units are independent but not necessarily identically distributed).

M-Estimators consists of two broad step: point estimation and variance estimation. Point estimation is carried out by determining the values of \(\theta\) where the sum of the estimating equations are zero. This is done via standard root-finding algorithms.

For variance estimation, sandwich variance estimator is used. The asymptotic sandwich variance estimator consists of

\[V_n(O, \hat{\theta}) = B_n(O, \hat{\theta})^{-1} F_n(O, \hat{\theta}) \left\{B_n(O, \hat{\theta}^{-1})\right\}^T\]

where \(B\) is the ‘bread’ and \(F\) is the ‘filling’ matrix. These matrices are defined as

\[B_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^{n} - \frac{\partial}{\partial \theta} \psi(O_i, \hat{\theta})\]

\[F_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^{n} \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^T\]

respectively. The partial derivatives for the bread are calculated using either numerical approximation (e.g., forward difference method) or forward-mode automatic differentiation. Inverting the bread is done via NumPy’s linalg.pinv. For the filling, the dot product is taken at \(\hat{\theta}\).

After completion of these steps, point and variance estimates are stored. These can be extracted from MEstimator. Further, confidence intervals, Z-scores, P-values, or S-values can all be generated.

Parameters

stacked_equations (function, callable) – Function that returns a v-by-n NumPy array of the estimating equations. See provided examples in the documentation for how to construct a set of estimating equations.
init (list, set, array) – Initial values for the root-finding algorithm. A total of v values should be provided.
subset (list, set, array, None, optional) – Optional argument to conduct the root-finding procedure on a subset of parameters in the estimating equations. The input list is used to location index the parameter array via np.take(). The subset list will only affect the root-finding procedure (i.e., the sandwich variance estimator ignores the subset argument). Default is None, which runs the root-finding procedure for all parameters in the estimating equations.
finite_correction (str, None, optional) – Whether to apply a finite-sample correction when computing the empirical sandwich variance estimator. Default is None, which applies no correction. When set to a string, the t-distribution is used instead of the Z-distribution to compute confidence intervals, P-values, and S-values. Finite-sample corrections include: HC1. The HC1 correction replaces the scaling by \(n\) with \(n-p\) where \(p\) is the number of parameters.

Note

Because the root-finding procedure is NOT ran for parameters outside of the subset, those coefficients must be solved outside of MEstimator. In general, I do NOT recommend using the subset argument unless a series of complex estimating equations need to be solved. In general, this argument does not massively improve speed until the estimating equations consist of hundreds of parameters.

theta

Parameters for estimating equations. Becomes the estimated parameters after .estimate() is called

Type: ndarray

variance

Covariance matrix for the parameters, in the same order as the parameter vector

Type: ndarray

asymptotic_variance

Asymptotic covariance matrix for the parameters. Not scaled by \(n\)

Type: ndarray

bread

Bread matrix for the parameter vector

Type: ndarray

meat

Meat matrix for the parameter vector

Type: ndarray

Examples

Loading necessary functions and building a generic data set for estimation of the mean

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_mean_variance

>>> y_dat = [1, 2, 4, 1, 2, 3, 1, 5, 2]

M-estimation with built-in estimating equation for the mean and variance. First, psi, or the stacked estimating equations, is defined

>>> def psi(theta):
>>>     return ee_mean_variance(theta=theta, y=y_dat)

Calling the M-estimation procedure

>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ])
>>> estr.estimate()

Inspecting the output results

>>> estr.theta                                  # Point estimates
>>> estr.variance                               # Covariance
>>> estr.asymptotic_variance                    # Asymptotic covariance
>>> np.sqrt(np.diag(estr.asymptotic_variance))  # Standard deviation
>>> estr.variance                               # Covariance
>>> np.sqrt(np.diag(estr.variance))             # Standard error
>>> estr.confidence_intervals()                 # Confidence intervals
>>> estr.z_scores()                             # Z-scores
>>> estr.p_values()                             # P-values
>>> estr.s_values()                             # S-values

Alternatively, a custom estimating equation can be specified. This is done by constructing a valid estimating equation for the MEstimator. The MEstimator expects the psi function to return a v-by-n array, where v is the number of parameters (length of theta) and n is the total number of observations. Below is an example of the mean and variance estimating equation from before, but implemented by-hand

>>> def psi(theta):
>>>     y = np.array(y_dat)
>>>     mean = y - theta[0]
>>>     variance = (y - theta[0]) ** 2 - theta[1]
>>>     return mean, variance

The M-estimation procedure is called using the same approach as before

>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ])
>>> estr.estimate()

Note that len(init) should be equal to v. So in this case, two initial values are provided.

MEstimator can also be run with a user-provided root-finding algorithm. To specify a custom root-finder, a function must be created by the user that consists of two keyword arguments (stacked_equations, init) and must return only the optimized values. The following is an example with SciPy’s Levenberg-Marquardt algorithm implemented in root.

>>> def custom_solver(stacked_equations, init):
>>>     options = {"maxiter": 1000}
>>>     opt = root(stacked_equations,
>>>                x0=np.asarray(init),
>>>                method='lm', tol=1e-9, options=options)
>>>     return opt.x

The provided custom root-finder can then be implemented like the following (continuing with the estimating equation from the previous example):

>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ])
>>> estr.estimate(solver=custom_solver)

For more examples on how to apply MEstimator, see https://deli.readthedocs.io/en/latest/

References

Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.

Ross RK, Zivich PN, Stringer JSA, & Cole SR. (2024). M-estimation for common epidemiological measures: introduction and applied examples. International Journal of Epidemiology, 53(2), dyae030.

Stefanski LA, & Boos DD. (2002). The calculus of M-estimation. The American Statistician, 56(1), 29-38.

estimate(solver='lm', maxiter=5000, tolerance=1e-09, deriv_method='approx', dx=1e-09, allow_pinv=True)

Run the point and variance estimation procedures for given estimating equation and starting values. This function carries out the point and variance estimation of theta. After this procedure, the point estimates (in theta) and the covariance matrix (in variance) can be extracted from the MEstimator object.

Parameters

solver (str, function, callable, optional) – Method to use for the root-finding procedure. Default is the Levenberg-Marquardt algorithm (scipy.optimize.root(method='lm'), specified by solver='lm'). Other built-in option is the secant method (scipy.optimize.newton, specified by solver='newton'), and a modification of the Powell hybrid method (scipy.optimize.root(method='hybr'), specified by solver='hybr'). Finally, any generic root-finding algorithm can be used via a user-provided callable object. The function must consist of two keyword arguments: stacked_equations, and init. Additionally, the function should return only the optimized values. Please review the provided example in the documentation for how to implement a custom root-finding algorithm.
maxiter (int, optional) – Maximum iterations to consider for the root finding procedure. Default is 5000 iterations. For complex estimating equations, this value may need to be increased. This argument is not used when a custom root-finding method (e.g., solver) is provided.
tolerance (float, optional) – Maximum tolerance for errors in the root finding in scipy.optimize. Default is 1e-9. This argument is not used when a custom root-finding method (e.g., solver) is provided.
deriv_method (str, optional) – Method to compute the derivative of the estimating equations for the bread matrix. Default is 'approx'. Options include numerical approximation via the forward difference method via SciPy ('approx'), forward difference as implemented in delicatessen (‘fapprox’), backward difference as implemented in delicatessen (‘bapprox’), central difference implemented as in delicatessen (‘capprox’), or forward-mode automatic differentiation as implemented in delicatessen('exact').
dx (float, optional) – Spacing to use to numerically approximate the partial derivatives of the bread matrix. Default is 1e-9. Here, a small value for dx should be used, since some large values can result in poor approximations. This argument is only used with numerical approximation methods.
allow_pinv (bool, optional) – Whether to allow for the pseudo-inverse (via numpy.linalg.pinv) if the bread matrix is determined to be non-invertible. If you want to disallow the pseudo-inverse (i.e., use numpy.linalg.inv), set this argument to False. Default is True, which is more robust to the possible bread matrices.

Returns

Return type

None

confidence_bands(subset=None, alpha=0.05, method='supt', n_draws=1000000, seed=None)

Calculate two-sided \((1 - \alpha) \times\) 100% confidence bands from the point and sandwich variance estimates. Rather than cover a single parameter, the confidence bands provide coverage for parameter vectors. The formula for the confidence bands is

\[\hat{\theta} \pm \hat{c}_{\alpha / 2} \times \widehat{SE}(\hat{\theta})\]

Note

The .estimate() function must be called before the confidence bands can be calculated.

This formula looks very similar to the confidence interval formula, but there is a different critical value. Note that the formula above has a hat on the critical value, \(c\), to denote it is being estimated. Currently, only the sup-t method for estimating the critical value is supported.

Parameters

subset (list, set, array, None, optional) – Optional argument to compute the confidence bands for a subset of parameters. Note that this argument is distinct from the overall subset argument and is only used for the confidence bands. Default is None, which computes the confidence bands for all parameters (as if they are are all the parameters of interest).
alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence bands. Default is 0.05, which corresponds to 95% confidence bands.
method (str, optional) – Method to compute the confidence bands. Currently, only the sup-t and Bonferroni method are supported. Default is 'supt'
n_draws (int, optional) – Number of random draws to use for any methods based on simulated approximation. Default is one million, 1000000.
seed (int, optional) – Seed to initialize a pseudo RNG for methods based on simulated approximations. Default is None which does not use a reproducible seed. To consistently obtain the same confidence bands with approximation methods, please use a seed.

Returns

b-by-2 array, where row 1 is the confidence intervals for \(\theta_1\), …, and row b is the confidence intervals for \(\theta_b\)

Return type

array

References

Montiel Olea JL & Plagborg‐Møller M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1-17.

Zivich PN, Cole SR, Greifer N, Montoya LM, Kosorok MR, Edwards JK. (2025). Confidence Regions for Multiple Outcomes, Effect Modifiers, and Other Multiple Comparisons, arXiv:2510.07076

confidence_intervals(alpha=0.05)

Calculate two-sided Wald-type \((1 - \alpha) \times\) 100% confidence intervals using the point and sandwich variance estimates. The formula for the confidence intervals is

\[\hat{\theta} \pm z_{\alpha / 2} \times \widehat{SE}(\hat{\theta})\]

Note

The .estimate() function must be called before the confidence intervals can be calculated.

Parameters: alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence intervals. Default is 0.05, which corresponds to 95% confidence intervals.
Returns: b-by-2 array, where row 1 is the confidence intervals for \(\theta_1\), …, and row b is the confidence intervals for \(\theta_b\)
Return type: array

influence_functions(allow_pinv=True)

Calculate the influence function values for individual observations. Note that the influence function for \(i\) is computed using the following equality

\[IF(O_i; \theta) = \left\{ B_n(theta) \right\}^{-1} \psi(O_i; \theta)\]

or that the influence functions is equal to the estimating function scaled by the inverse of the bread matrix. While this approach was historically used to derive the influence function, there are more general alternatives. delicatessen offers this implementation since

Returns: Returns a n-by-b NumPy array evaluated for the input theta.
Return type: array

References

Fisher A, & Kennedy EH. (2021). Visually communicating and teaching intuition for influence functions. The American Statistician, 75(2), 162-172.

Hines O, Dukes O, Diaz-Ordaz K, & Vansteelandt S. (2022). Demystifying statistical learning based on efficient influence functions. The American Statistician, 76(3), 292-304.

Stefanski LA, & Boos DD. (2002). The calculus of M-estimation. The American Statistician, 56(1), 29-38.

p_values(null=0)

Calculate two-sided Wald-type P-values using the Z-scores compute using the point and the sandwich variance estimates. Once the Z-scores are computed, the corresponding P-values are obtained by comparing to the standard normal distribution.

The .estimate() function must be called before the P-values can be calculated.

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding P-values. Default is 0.
Returns: Array of P-values for \(\theta_1, ..., \theta_b\), respectively
Return type: array

print_results(subset=None, decimals=2, alpha=0.05)

Display estimator results to the console. This function prints the results of the estimator object (point estimates, standard error, confidence intervals, P-value, S-value) to the console. This function is only meant as an easy way to display the underlying results to the console without having to call each function separately.

Parameters

subset (list, set, array, None, optional) – List of indices of a subset of parameters to display in the table. Default is None, which displays all the paramters in the table.
decimals (int, optional) – Number of decimals to display in the table. Default is 2. The spacing of the table may be skewed when decimals > 5.
alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence bands. Default is 0.05, which corresponds to 95% confidence intervals.

Returns

Return type

None

s_values(null=0)

Calculate two-sided Wald-type S-values using the point estimates and the sandwich variance. The S-value, or Shannon Information value, is a transformation of the P-values that has been argued to be more easily interpretable as it can be related back to simple coin-flipping scenarios. The transformation from a P-value into a S-value is.

\[S = - \log_2(P)\]

where \(P\) is the corresponding P-value. The .estimate() function must be called before the S-values can be calculated.

The S-value can be contextualized in terms of coin-flips. Suppose the S-value is \(s\). Then \(s\) corresponds to the number of heads in a row with a fair coin (equal chances heads or tails). As \(s\) increases, one would be more ‘surprised’ by the result (e.g., it might not be surprising to have 2 heads in a row, but it would be surprising for 20 in a row).

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding S-values. Default is 0.
Returns: Array of S-values for \(\theta_1, ..., \theta_b\), respectively
Return type: array

References

Cole SR, Edwards JK, & Greenland S. (2021). Surprise! American Journal of Epidemiology, 190(2), 191-193.

Greenland S. (2019). Valid P-values behave exactly as they should: Some misleading criticisms of P-values and their resolution with S-values. The American Statistician, 73(sup1), 106-114.

z_scores(null=0)

Calculate the Z-score using the point estimates and the sandwich variance. The formula for the Z score is

\[\frac{\hat{\theta} - \theta}{\widehat{SE}(\hat{\theta})}\]

where \(\theta\) is the null. The .estimate() function must be called before the Z-scores can be calculated. Note that the default value for the null is zero.

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding P-values. Default is 0.
Returns: Array of Z-scores for \(\theta_1, ..., \theta_b\), respectively
Return type: array