M-estimator
- class MEstimator(stacked_equations, init, subset=None, finite_correction=None)
M-Estimator for stacked estimating equations.
Estimating equations are a general approach to point and variance estimation that consists of defining an estimator as the solution to a vector of equations that are equal to zero. The corresponding estimators, often called M-estimators or Z-estimators, satisify the following equation
\[\sum_{i=1}^n \psi(O_i, \hat{\theta}) = 0\]where \(\psi\) is the \(v\)-dimensional vector of estimating function(s), \(\hat{\theta}\) is the \(v\)-dimensional parameter vector, and \(O_i\) is the observed data (where units are independent but not necessarily identically distributed).
M-Estimators consists of two broad step: point estimation and variance estimation. Point estimation is carried out by determining the values of \(\theta\) where the sum of the estimating equations are zero. This is done via standard root-finding algorithms.
For variance estimation, sandwich variance estimator is used. The asymptotic sandwich variance estimator consists of
\[V_n(O, \hat{\theta}) = B_n(O, \hat{\theta})^{-1} F_n(O, \hat{\theta}) \left\{B_n(O, \hat{\theta}^{-1})\right\}^T\]where \(B\) is the ‘bread’ and \(F\) is the ‘filling’ matrix. These matrices are defined as
\[B_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^{n} - \frac{\partial}{\partial \theta} \psi(O_i, \hat{\theta})\]\[F_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^{n} \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^T\]respectively. The partial derivatives for the bread are calculated using either numerical approximation (e.g., forward difference method) or forward-mode automatic differentiation. Inverting the bread is done via NumPy’s
linalg.pinv. For the filling, the dot product is taken at \(\hat{\theta}\).After completion of these steps, point and variance estimates are stored. These can be extracted from
MEstimator. Further, confidence intervals, Z-scores, P-values, or S-values can all be generated.- Parameters
stacked_equations (function, callable) – Function that returns a v-by-n NumPy array of the estimating equations. See provided examples in the documentation for how to construct a set of estimating equations.
init (list, set, array) – Initial values for the root-finding algorithm. A total of v values should be provided.
subset (list, set, array, None, optional) – Optional argument to conduct the root-finding procedure on a subset of parameters in the estimating equations. The input list is used to location index the parameter array via
np.take(). The subset list will only affect the root-finding procedure (i.e., the sandwich variance estimator ignores the subset argument). Default isNone, which runs the root-finding procedure for all parameters in the estimating equations.finite_correction (str, None, optional) – Whether to apply a finite-sample correction when computing the empirical sandwich variance estimator. Default is
None, which applies no correction. When set to a string, the t-distribution is used instead of the Z-distribution to compute confidence intervals, P-values, and S-values. Finite-sample corrections include: HC1. The HC1 correction replaces the scaling by \(n\) with \(n-p\) where \(p\) is the number of parameters.
Note
Because the root-finding procedure is NOT ran for parameters outside of the subset, those coefficients must be solved outside of
MEstimator. In general, I do NOT recommend using thesubsetargument unless a series of complex estimating equations need to be solved. In general, this argument does not massively improve speed until the estimating equations consist of hundreds of parameters.- theta
Parameters for estimating equations. Becomes the estimated parameters after .estimate() is called
- Type
ndarray
- variance
Covariance matrix for the parameters, in the same order as the parameter vector
- Type
ndarray
- asymptotic_variance
Asymptotic covariance matrix for the parameters. Not scaled by \(n\)
- Type
ndarray
- bread
Bread matrix for the parameter vector
- Type
ndarray
- meat
Meat matrix for the parameter vector
- Type
ndarray
Examples
Loading necessary functions and building a generic data set for estimation of the mean
>>> import numpy as np >>> import pandas as pd >>> from scipy.stats import logistic >>> from delicatessen import MEstimator >>> from delicatessen.estimating_equations import ee_mean_variance
>>> y_dat = [1, 2, 4, 1, 2, 3, 1, 5, 2]
M-estimation with built-in estimating equation for the mean and variance. First,
psi, or the stacked estimating equations, is defined>>> def psi(theta): >>> return ee_mean_variance(theta=theta, y=y_dat)
Calling the M-estimation procedure
>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ]) >>> estr.estimate()
Inspecting the output results
>>> estr.theta # Point estimates >>> estr.variance # Covariance >>> estr.asymptotic_variance # Asymptotic covariance >>> np.sqrt(np.diag(estr.asymptotic_variance)) # Standard deviation >>> estr.variance # Covariance >>> np.sqrt(np.diag(estr.variance)) # Standard error >>> estr.confidence_intervals() # Confidence intervals >>> estr.z_scores() # Z-scores >>> estr.p_values() # P-values >>> estr.s_values() # S-values
Alternatively, a custom estimating equation can be specified. This is done by constructing a valid estimating equation for the
MEstimator. TheMEstimatorexpects thepsifunction to return a v-by-n array, where v is the number of parameters (length oftheta) and n is the total number of observations. Below is an example of the mean and variance estimating equation from before, but implemented by-hand>>> def psi(theta): >>> y = np.array(y_dat) >>> mean = y - theta[0] >>> variance = (y - theta[0]) ** 2 - theta[1] >>> return mean, variance
The M-estimation procedure is called using the same approach as before
>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ]) >>> estr.estimate()
Note that
len(init)should be equal to v. So in this case, two initial values are provided.MEstimatorcan also be run with a user-provided root-finding algorithm. To specify a custom root-finder, a function must be created by the user that consists of two keyword arguments (stacked_equations,init) and must return only the optimized values. The following is an example with SciPy’s Levenberg-Marquardt algorithm implemented inroot.>>> def custom_solver(stacked_equations, init): >>> options = {"maxiter": 1000} >>> opt = root(stacked_equations, >>> x0=np.asarray(init), >>> method='lm', tol=1e-9, options=options) >>> return opt.x
The provided custom root-finder can then be implemented like the following (continuing with the estimating equation from the previous example):
>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ]) >>> estr.estimate(solver=custom_solver)
For more examples on how to apply
MEstimator, see https://deli.readthedocs.io/en/latest/References
Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.
Ross RK, Zivich PN, Stringer JSA, & Cole SR. (2024). M-estimation for common epidemiological measures: introduction and applied examples. International Journal of Epidemiology, 53(2), dyae030.
Stefanski LA, & Boos DD. (2002). The calculus of M-estimation. The American Statistician, 56(1), 29-38.
- estimate(solver='lm', maxiter=5000, tolerance=1e-09, deriv_method='approx', dx=1e-09, allow_pinv=True)
Run the point and variance estimation procedures for given estimating equation and starting values. This function carries out the point and variance estimation of
theta. After this procedure, the point estimates (intheta) and the covariance matrix (invariance) can be extracted from theMEstimatorobject.- Parameters
solver (str, function, callable, optional) – Method to use for the root-finding procedure. Default is the Levenberg-Marquardt algorithm (
scipy.optimize.root(method='lm'), specified bysolver='lm'). Other built-in option is the secant method (scipy.optimize.newton, specified bysolver='newton'), and a modification of the Powell hybrid method (scipy.optimize.root(method='hybr'), specified bysolver='hybr'). Finally, any generic root-finding algorithm can be used via a user-provided callable object. The function must consist of two keyword arguments:stacked_equations, andinit. Additionally, the function should return only the optimized values. Please review the provided example in the documentation for how to implement a custom root-finding algorithm.maxiter (int, optional) – Maximum iterations to consider for the root finding procedure. Default is 5000 iterations. For complex estimating equations, this value may need to be increased. This argument is not used when a custom root-finding method (e.g.,
solver) is provided.tolerance (float, optional) – Maximum tolerance for errors in the root finding in
scipy.optimize. Default is 1e-9. This argument is not used when a custom root-finding method (e.g.,solver) is provided.deriv_method (str, optional) – Method to compute the derivative of the estimating equations for the bread matrix. Default is
'approx'. Options include numerical approximation via the forward difference method via SciPy ('approx'), forward difference as implemented in delicatessen (‘fapprox’), backward difference as implemented in delicatessen (‘bapprox’), central difference implemented as in delicatessen (‘capprox’), or forward-mode automatic differentiation as implemented in delicatessen('exact').dx (float, optional) – Spacing to use to numerically approximate the partial derivatives of the bread matrix. Default is 1e-9. Here, a small value for
dxshould be used, since some large values can result in poor approximations. This argument is only used with numerical approximation methods.allow_pinv (bool, optional) – Whether to allow for the pseudo-inverse (via
numpy.linalg.pinv) if the bread matrix is determined to be non-invertible. If you want to disallow the pseudo-inverse (i.e., usenumpy.linalg.inv), set this argument toFalse. Default isTrue, which is more robust to the possible bread matrices.
- Returns
- Return type
None
- confidence_bands(subset=None, alpha=0.05, method='supt', n_draws=1000000, seed=None)
Calculate two-sided \((1 - \alpha) \times\) 100% confidence bands from the point and sandwich variance estimates. Rather than cover a single parameter, the confidence bands provide coverage for parameter vectors. The formula for the confidence bands is
\[\hat{\theta} \pm \hat{c}_{\alpha / 2} \times \widehat{SE}(\hat{\theta})\]Note
The
.estimate()function must be called before the confidence bands can be calculated.This formula looks very similar to the confidence interval formula, but there is a different critical value. Note that the formula above has a hat on the critical value, \(c\), to denote it is being estimated. Currently, only the sup-t method for estimating the critical value is supported.
- Parameters
subset (list, set, array, None, optional) – Optional argument to compute the confidence bands for a subset of parameters. Note that this argument is distinct from the overall
subsetargument and is only used for the confidence bands. Default isNone, which computes the confidence bands for all parameters (as if they are are all the parameters of interest).alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence bands. Default is 0.05, which corresponds to 95% confidence bands.
method (str, optional) – Method to compute the confidence bands. Currently, only the sup-t and Bonferroni method are supported. Default is
'supt'n_draws (int, optional) – Number of random draws to use for any methods based on simulated approximation. Default is one million,
1000000.seed (int, optional) – Seed to initialize a pseudo RNG for methods based on simulated approximations. Default is
Nonewhich does not use a reproducible seed. To consistently obtain the same confidence bands with approximation methods, please use a seed.
- Returns
b-by-2 array, where row 1 is the confidence intervals for \(\theta_1\), …, and row b is the confidence intervals for \(\theta_b\)
- Return type
array
References
Montiel Olea JL & Plagborg‐Møller M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1-17.
Zivich PN, Cole SR, Greifer N, Montoya LM, Kosorok MR, Edwards JK. (2025). Confidence Regions for Multiple Outcomes, Effect Modifiers, and Other Multiple Comparisons, arXiv:2510.07076
- confidence_intervals(alpha=0.05)
Calculate two-sided Wald-type \((1 - \alpha) \times\) 100% confidence intervals using the point and sandwich variance estimates. The formula for the confidence intervals is
\[\hat{\theta} \pm z_{\alpha / 2} \times \widehat{SE}(\hat{\theta})\]Note
The
.estimate()function must be called before the confidence intervals can be calculated.- Parameters
alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence intervals. Default is 0.05, which corresponds to 95% confidence intervals.
- Returns
b-by-2 array, where row 1 is the confidence intervals for \(\theta_1\), …, and row b is the confidence intervals for \(\theta_b\)
- Return type
array
- influence_functions(allow_pinv=True)
Calculate the influence function values for individual observations. Note that the influence function for \(i\) is computed using the following equality
\[IF(O_i; \theta) = \left\{ B_n(theta) \right\}^{-1} \psi(O_i; \theta)\]or that the influence functions is equal to the estimating function scaled by the inverse of the bread matrix. While this approach was historically used to derive the influence function, there are more general alternatives. delicatessen offers this implementation since
- Returns
Returns a n-by-b NumPy array evaluated for the input
theta.- Return type
array
References
Fisher A, & Kennedy EH. (2021). Visually communicating and teaching intuition for influence functions. The American Statistician, 75(2), 162-172.
Hines O, Dukes O, Diaz-Ordaz K, & Vansteelandt S. (2022). Demystifying statistical learning based on efficient influence functions. The American Statistician, 76(3), 292-304.
Stefanski LA, & Boos DD. (2002). The calculus of M-estimation. The American Statistician, 56(1), 29-38.
- p_values(null=0)
Calculate two-sided Wald-type P-values using the Z-scores compute using the point and the sandwich variance estimates. Once the Z-scores are computed, the corresponding P-values are obtained by comparing to the standard normal distribution.
The
.estimate()function must be called before the P-values can be calculated.- Parameters
null (int, float, ndarray, optional) – Null or reference for the the corresponding P-values. Default is 0.
- Returns
Array of P-values for \(\theta_1, ..., \theta_b\), respectively
- Return type
array
- print_results(subset=None, decimals=2, alpha=0.05)
Display estimator results to the console. This function prints the results of the estimator object (point estimates, standard error, confidence intervals, P-value, S-value) to the console. This function is only meant as an easy way to display the underlying results to the console without having to call each function separately.
- Parameters
subset (list, set, array, None, optional) – List of indices of a subset of parameters to display in the table. Default is
None, which displays all the paramters in the table.decimals (int, optional) – Number of decimals to display in the table. Default is
2. The spacing of the table may be skewed whendecimals > 5.alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence bands. Default is 0.05, which corresponds to 95% confidence intervals.
- Returns
- Return type
None
- s_values(null=0)
Calculate two-sided Wald-type S-values using the point estimates and the sandwich variance. The S-value, or Shannon Information value, is a transformation of the P-values that has been argued to be more easily interpretable as it can be related back to simple coin-flipping scenarios. The transformation from a P-value into a S-value is.
\[S = - \log_2(P)\]where \(P\) is the corresponding P-value. The
.estimate()function must be called before the S-values can be calculated.The S-value can be contextualized in terms of coin-flips. Suppose the S-value is \(s\). Then \(s\) corresponds to the number of heads in a row with a fair coin (equal chances heads or tails). As \(s\) increases, one would be more ‘surprised’ by the result (e.g., it might not be surprising to have 2 heads in a row, but it would be surprising for 20 in a row).
- Parameters
null (int, float, ndarray, optional) – Null or reference for the the corresponding S-values. Default is 0.
- Returns
Array of S-values for \(\theta_1, ..., \theta_b\), respectively
- Return type
array
References
Cole SR, Edwards JK, & Greenland S. (2021). Surprise! American Journal of Epidemiology, 190(2), 191-193.
Greenland S. (2019). Valid P-values behave exactly as they should: Some misleading criticisms of P-values and their resolution with S-values. The American Statistician, 73(sup1), 106-114.
- z_scores(null=0)
Calculate the Z-score using the point estimates and the sandwich variance. The formula for the Z score is
\[\frac{\hat{\theta} - \theta}{\widehat{SE}(\hat{\theta})}\]where \(\theta\) is the null. The
.estimate()function must be called before the Z-scores can be calculated. Note that the default value for the null is zero.- Parameters
null (int, float, ndarray, optional) – Null or reference for the the corresponding P-values. Default is 0.
- Returns
Array of Z-scores for \(\theta_1, ..., \theta_b\), respectively
- Return type
array