delicatessen.mestimation.MEstimator

class MEstimator(stacked_equations, init=None, subset=None)

M-Estimator for stacked estimating equations.

Estimating equations are a general approach to point and variance estimation that consists of defining an estimator as the solution to a vector of equations that are equal to zero. The corresponding estimators, often called M-estimators or Z-estimators, satisify the following equation

\[\sum_{i=1}^n \psi(O_i, \hat{\theta}) = 0\]

where \(\psi\) is the \(v\)-dimensional vector of estimating equation(s), \(\hat{\theta}\) is the \(v\)-dimensional parameter vector, and \(O_i\) is the observed data (where units are independent but not necessarily identically distributed).

Note

Estimating equations are advantageous in both theoretical and applied research. They simplifies proofs of consistency and asymptotic normality of estimators under a large-sample approximation framework. In application, this approach simplifies variance estimation and automates the delta-method.

M-Estimators consists of two broad step: point estimation and variance estimation. Point estimation is carried out by determining the values of \(\theta\) where the sum of the estimating equations are zero. This is done via standard root-finding algorithms.

For variance estimation, sandwich variance estimator is used. The asymptotic sandwich variance estimator consists of

\[V_n(O, \hat{\theta}) = B_n(O, \hat{\theta})^{-1} F_n(O, \hat{\theta}) \left\{B_n(O, \hat{\theta}^{-1})\right\}^T\]

where \(B\) is the ‘bread’ and \(F\) is the ‘filling’ matrix. These matrices are defined as

\[B_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^{n} - \frac{\partial}{\partial \theta} \psi(O_i, \hat{\theta})\]

\[F_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^{n} \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^T\]

respectively. The partial derivatives for the bread are calculated using either numerical approximation (e.g., forward difference method) or forward-mode automatic differentiation. Inverting the bread is done via NumPy’s linalg.pinv. For the filling, the dot product is taken at \(\hat{\theta}\).

Note

The difficult part (that must be done by the user) is to specify the estimating equations. Be sure to check the provided examples for the expected format. Pre-built estimating equations for common problems are also made available.

After completion of these steps, point and variance estimates are stored. These can be extracted from MEstimator. Further, confidence intervals, Z-scores, P-values, or S-values can all be generated.

Note

For complex regression problems, the root-finding algorithms are not as robust relative to maximization approaches. A simple solution for difficult problems is to ‘pre-wash’ or find the solution to the equations and provide those as the initial starting values.

Parameters

stacked_equations (function, callable) – Function that returns a v-by-n NumPy array of the estimating equations. See provided examples in the documentation for how to construct a set of estimating equations.
init (list, set, array) – Initial values for the root-finding algorithm. A total of v values should be provided.
subset (list, set, array, None, optional) – Optional argument to conduct the root-finding procedure on a subset of parameters in the estimating equations. The input list is used to location index the parameter array via np.take(). The subset list will only affect the root-finding procedure (i.e., the sandwich variance estimator ignores the subset argument). Default is None, which runs the root-finding procedure for all parameters in the estimating equations.

Note

Because the root-finding procedure is NOT ran for parameters outside of the subset, those coefficients must be solved outside of MEstimator. In general, I do NOT recommend using the subset argument unless a series of complex estimating equations need to be solved. In general, this argument does not massively improve speed until the estimating equations consist of hundreds of parameters.

Examples

Loading necessary functions and building a generic data set for estimation of the mean

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_mean_variance

>>> y_dat = [1, 2, 4, 1, 2, 3, 1, 5, 2]

M-estimation with built-in estimating equation for the mean and variance. First, psi, or the stacked estimating equations, is defined

>>> def psi(theta):
>>>     return ee_mean_variance(theta=theta, y=y_dat)

Calling the M-estimation procedure

>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ])
>>> estr.estimate()

Inspecting the output results

>>> estr.theta                                  # Point estimates
>>> estr.variance                               # Covariance
>>> estr.asymptotic_variance                    # Asymptotic covariance
>>> np.sqrt(np.diag(estr.asymptotic_variance))  # Standard deviation
>>> estr.variance                               # Covariance
>>> np.sqrt(np.diag(estr.variance))             # Standard error
>>> estr.confidence_intervals()                 # Confidence intervals
>>> estr.z_scores()                             # Z-scores
>>> estr.p_values()                             # P-values
>>> estr.s_values()                             # S-values

Alternatively, a custom estimating equation can be specified. This is done by constructing a valid estimating equation for the MEstimator. The MEstimator expects the psi function to return a v-by-n array, where v is the number of parameters (length of theta) and n is the total number of observations. Below is an example of the mean and variance estimating equation from before, but implemented by-hand

>>> def psi(theta):
>>>     y = np.array(y_dat)
>>>     mean = y - theta[0]
>>>     variance = (y - theta[0]) ** 2 - theta[1]
>>>     return mean, variance

The M-estimation procedure is called using the same approach as before

>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ])
>>> estr.estimate()

Note that len(init) should be equal to v. So in this case, two initial values are provided.

MEstimator can also be run with a user-provided root-finding algorithm. To specify a custom root-finder, a function must be created by the user that consists of two keyword arguments (stacked_equations, init) and must return only the optimized values. The following is an example with SciPy’s Levenberg-Marquardt algorithm implemented in root.

>>> def custom_solver(stacked_equations, init):
>>>     options = {"maxiter": 1000}
>>>     opt = root(stacked_equations,
>>>                x0=np.asarray(init),
>>>                method='lm', tol=1e-9, options=options)
>>>     return opt.x

The provided custom root-finder can then be implemented like the following (continuing with the estimating equation from the previous example):

>>> estr = MEstimator(stacked_equations=psi, init=[0, 0, ])
>>> estr.estimate(solver=custom_solver)

For more examples on how to apply MEstimator, see https://deli.readthedocs.io/en/latest/

References

Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.

Ross RK, Zivich PN, Stringer JSA, & Cole SR. (2024). M-estimation for common epidemiological measures: introduction and applied examples. International Journal of Epidemiology, 53(2), dyae030.

Stefanski LA, & Boos DD. (2002). The calculus of M-estimation. The American Statistician, 56(1), 29-38.

__init__(stacked_equations, init=None, subset=None)

Methods

`__init__`(stacked_equations[, init, subset])
`confidence_intervals`([alpha])	Calculate two-sided Wald-type \((1 - \alpha) \times\) 100% confidence intervals using the point and sandwich variance estimates.
`estimate`([solver, maxiter, tolerance, ...])	Run the point and variance estimation procedures for given estimating equation and starting values.
`p_values`([null])	Calculate two-sided Wald-type P-values using the Z-scores compute using the point and the sandwich variance estimates.
`s_values`([null])	Calculate two-sided Wald-type S-values using the point estimates and the sandwich variance.
`z_scores`([null])	Calculate the Z-score using the point estimates and the sandwich variance.

estimate(solver='lm', maxiter=5000, tolerance=1e-09, deriv_method='approx', dx=1e-09, allow_pinv=True)

Run the point and variance estimation procedures for given estimating equation and starting values. This function carries out the point and variance estimation of theta. After this procedure, the point estimates (in theta) and the covariance matrix (in variance) can be extracted from the MEstimator object.

Parameters

solver (str, function, callable, optional) – Method to use for the root-finding procedure. Default is the Levenberg-Marquardt algorithm (scipy.optimize.root(method='lm'), specified by solver='lm'). Other built-in option is the secant method (scipy.optimize.newton, specified by solver='newton'), and a modification of the Powell hybrid method (scipy.optimize.root(method='hybr'), specified by solver='hybr'). Finally, any generic root-finding algorithm can be used via a user-provided callable object. The function must consist of two keyword arguments: stacked_equations, and init. Additionally, the function should return only the optimized values. Please review the provided example in the documentation for how to implement a custom root-finding algorithm.
maxiter (int, optional) – Maximum iterations to consider for the root finding procedure. Default is 5000 iterations. For complex estimating equations, this value may need to be increased. This argument is not used when a custom root-finding method (e.g., solver) is provided.
tolerance (float, optional) – Maximum tolerance for errors in the root finding in scipy.optimize. Default is 1e-9. This argument is not used when a custom root-finding method (e.g., solver) is provided.
deriv_method (str, optional) – Method to compute the derivative of the estimating equations for the bread matrix. Default is 'approx'. Options include numerical approximation via the forward difference method via SciPy ('approx'), forward difference as implemented in delicatessen (‘fapprox’), backward difference as implemented in delicatessen (‘bapprox’), central difference implemented as in delicatessen (‘capprox’), or forward-mode automatic differentiation as implemented in delicatessen('exact').
dx (float, optional) – Spacing to use to numerically approximate the partial derivatives of the bread matrix. Default is 1e-9. Here, a small value for dx should be used, since some large values can result in poor approximations. This argument is only used with numerical approximation methods.
allow_pinv (bool, optional) – Whether to allow for the pseudo-inverse (via numpy.linalg.pinv) if the bread matrix is determined to be non-invertible. If you want to disallow the pseudo-inverse (i.e., use numpy.linalg.inv), set this argument to False. Default is True, which is more robust to the possible bread matrices.

Returns

Return type

None

confidence_intervals(alpha=0.05)

Calculate two-sided Wald-type \((1 - \alpha) \times\) 100% confidence intervals using the point and sandwich variance estimates. The formula for the confidence intervals is

\[\hat{\theta} \pm z_{\alpha / 2} \times \widehat{SE}(\hat{\theta})\]

Note

The .estimate() function must be called before the confidence intervals can be calculated.

Parameters: alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence intervals. Default is 0.05, which corresponds to 95% confidence intervals.
Returns: b-by-2 array, where row 1 is the confidence intervals for \(\theta_1\), …, and row b is the confidence intervals for \(\theta_b\)
Return type: array

z_scores(null=0)

Calculate the Z-score using the point estimates and the sandwich variance. The formula for the Z score is

\[\frac{\hat{\theta} - \theta}{\widehat{SE}(\hat{\theta})}\]

where \(\theta\) is the null. The .estimate() function must be called before the Z-scores can be calculated. Note that the default value for the null is zero.

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding P-values. Default is 0.
Returns: Array of Z-scores for \(\theta_1, ..., \theta_b\), respectively
Return type: array

p_values(null=0)

Calculate two-sided Wald-type P-values using the Z-scores compute using the point and the sandwich variance estimates. Once the Z-scores are computed, the corresponding P-values are obtained by comparing to the standard normal distribution.

The .estimate() function must be called before the P-values can be calculated.

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding P-values. Default is 0.
Returns: Array of P-values for \(\theta_1, ..., \theta_b\), respectively
Return type: array

s_values(null=0)

Calculate two-sided Wald-type S-values using the point estimates and the sandwich variance. The S-value, or Shannon Information value, is a transformation of the P-values that has been argued to be more easily interpretable as it can be related back to simple coin-flipping scenarios. The transformation from a P-value into a S-value is.

\[S = - \log_2(P)\]

where \(P\) is the corresponding P-value. The .estimate() function must be called before the S-values can be calculated.

The S-value can be contextualized in terms of coin-flips. Suppose the S-value is \(s\). Then \(s\) corresponds to the number of heads in a row with a fair coin (equal chances heads or tails). As \(s\) increases, one would be more ‘surprised’ by the result (e.g., it might not be surprising to have 2 heads in a row, but it would be surprising for 20 in a row).

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding S-values. Default is 0.
Returns: Array of S-values for \(\theta_1, ..., \theta_b\), respectively
Return type: array

References

Cole SR, Edwards JK, & Greenland S. (2021). Surprise! American Journal of Epidemiology, 190(2), 191-193.

Greenland S. (2019). Valid P-values behave exactly as they should: Some misleading criticisms of P-values and their resolution with S-values. The American Statistician, 73(sup1), 106-114.