GMM-estimator

class GMMEstimator(stacked_equations, init, subset=None, finite_correction=None, overid_maxiter=10, overid_tolerance=1e-09)

Generalized Method of Moments (GMM) Estimator for stacked estimating equations.

Estimating equations are a general approach to point and variance estimation that consists of defining an estimator as the solution to a vector of equations that are equal to zero. The corresponding GMM estimator satisifies the following equation

\[\text{argmin}_{\theta} \left[ \sum_{i=1}^n \psi(O_i, \hat{\theta}) \right]^T \text{Q} \left[ \sum_{i=1}^n \psi(O_i, \hat{\theta}) \right]^T\]

where \(\psi\) is the \(\ge v\)-dimensional vector of estimating function(s), \(\hat{\theta}\) is the \(v\)-dimensional parameter vector, \(\text{Q}\) is a weight matrix, and \(O_i\) is the observed data (where units are independent but not necessarily identically distributed).

GMM estimators consists of two broad step: point estimation and variance estimation. Point estimation is carried out by finding where the minimum of the negative product of the estimating equations is. Initially \(\text{Q}\) is set to be the identity matrix. This process is done via standard minimization algorithms.

Note

For over-identified settings, GMMEstimator uses an two-step iterative procedure, where \(\theta\) is estimated, then the meat matrix is computed. The meat matrix replaces the identity matrix as \(\text{Q}\). Then the optimization process is repeated.

For variance estimation, sandwich variance estimator is used. The asymptotic sandwich variance estimator consists of

\[V_n(O, \hat{\theta}) = B_n(O, \hat{\theta})^{-1} F_n(O, \hat{\theta}) \left\{B_n(O, \hat{\theta}^{-1})\right\}^T\]

where \(B\) is the ‘bread’ and \(F\) is the ‘filling’ matrix. These matrices are defined as

\[B_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^{n} - \frac{\partial}{\partial \theta} \psi(O_i, \hat{\theta})\]

\[F_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^{n} \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^T\]

respectively. The partial derivatives for the bread are calculated using either numerical approximation (e.g., forward difference method) or forward-mode automatic differentiation. Inverting the bread is done via NumPy’s linalg.pinv. For the filling, the dot product is taken at \(\hat{\theta}\).

After completion of these steps, point and variance estimates are stored. These can be extracted from GMMEstimator. Further, confidence intervals, Z-scores, P-values, or S-values can all be generated.

Parameters

stacked_equations (function, callable) – Function that returns a v-by-n NumPy array of the estimating equations. See provided examples in the documentation for how to construct a set of estimating equations.
init (list, set, array) – Initial values for the minimization algorithm. A total of v values should be provided.
subset (list, set, array, None, optional) – Optional argument to conduct the minimization procedure on a subset of parameters in the estimating equations. The input list is used to location index the parameter array via np.take(). The subset list will only affect the minimization procedure (i.e., the sandwich variance estimator ignores the subset argument). Default is None, which runs the minimization procedure for all parameters in the estimating equations.
finite_correction (str, None, optional) – Whether to apply a finite-sample correction when computing the empirical sandwich variance estimator. Default is None, which applies no correction. When set to a string, the t-distribution is used instead of the Z-distribution to compute confidence intervals, P-values, and S-values. Finite-sample corrections include: HC1. The HC1 correction replaces the scaling by \(n\) with \(n-p\) where \(p\) is the number of parameters.
overid_maxiter (int, optional) – Maximum number of iterations for the two-step GMM-estimation procedure with over-identified parameters. Note that this procedure is only used for estimating equations that include an over-identified parameter, otherwise this argument has no impact. Default is 10.
overid_tolerance (float, optional) – Error tolerance for the two-step GMM-estimation procedure with over-identified parameters. Note that this procedure is only used for estimating equations that include an over-identified parameter, otherwise this argument has no impact. If the tolerance threshold is met, the two-step procedure is terminated. The threshold is defined as the maximum of the absolute value of the difference between the parameters of the current iteration and the previous iteration. Default is 1e-9.

Note

Because the minimization procedure is NOT ran for parameters outside of the subset, those coefficients must be solved outside of GMMEstimator. In general, I do NOT recommend using the subset argument unless a series of complex estimating equations need to be solved. In general, this argument does not massively improve speed until the estimating equations consist of hundreds of parameters.

theta

Parameters for estimating equations. Becomes the estimated parameters after .estimate() is called

Type: ndarray

variance

Covariance matrix for the parameters, in the same order as the parameter vector

Type: ndarray

asymptotic_variance

Asymptotic covariance matrix for the parameters. Not scaled by \(n\)

Type: ndarray

bread

Bread matrix for the parameter vector

Type: ndarray

meat

Meat matrix for the parameter vector

Type: ndarray

weight_matrix

Weight matrix, \(\text{Q}\) used. For just-identified problems, the weight matrix is simply the identity matrix, otherwise the weight matrix is updated to the meat matrix from the prior iteration.

Type: ndarray

Examples

Loading necessary functions and building a generic data set for estimation of the mean

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import GMMEstimator
>>> from delicatessen.estimating_equations import ee_mean_variance

>>> y_dat = [1, 2, 4, 1, 2, 3, 1, 5, 2]

GMM-estimation with built-in estimating equation for the mean and variance. First, psi, or the stacked estimating equations, is defined

>>> def psi(theta):
>>>     return ee_mean_variance(theta=theta, y=y_dat)

Calling the GMM-estimation procedure

>>> estr = GMMEstimator(stacked_equations=psi, init=[0, 0, ])
>>> estr.estimate()

Inspecting the output results

>>> estr.theta                                  # Point estimates
>>> estr.variance                               # Covariance
>>> estr.asymptotic_variance                    # Asymptotic covariance
>>> np.sqrt(np.diag(estr.asymptotic_variance))  # Standard deviation
>>> estr.variance                               # Covariance
>>> np.sqrt(np.diag(estr.variance))             # Standard error
>>> estr.confidence_intervals()                 # Confidence intervals
>>> estr.z_scores()                             # Z-scores
>>> estr.p_values()                             # P-values
>>> estr.s_values()                             # S-values

For more examples on how to apply GMMEstimator, see https://deli.readthedocs.io/en/latest/

References

Duncan GM. (1987). A simplified approach to M-estimation with application to two-stage estimators. Journal of Econometrics, 34(3), 373-389.

Hansen LP. (1982). Large sample properties of generalized method of moments estimators. Econometrica: Journal of the Econometric Society, 1029-1054.

Jesus J, & Chandler RE. (2011). Estimating functions and the generalized method of moments. Interface Focus, 1(6), 871-885.

Wirjanto T. (1997). Estimating Functions and Over-Identified Models. Lecture Notes - Monograph Series, 239-256.

estimate(solver='bfgs', maxiter=5000, tolerance=1e-09, deriv_method='approx', dx=1e-09, allow_pinv=True)

Run the point and variance estimation procedures for given estimating equation and starting values. This function carries out the point and variance estimation of theta. After this procedure, the point estimates (in theta) and the covariance matrix (in variance) can be extracted from the GMMEstimator object.

Parameters

solver (str, function, callable, optional) – Method to use for the minimization procedure. Default is the BFGS algorithm (scipy.optimize.minimize(method='bfgs'), specified by solver='bfgs'). Other built-in options are the L-BFGS-B algorithm ('l-bfgs-b'), the conjugate gradient algorithm ('cg'), Nelder-Mead ('nelder-mead'), and the modified Powell algorithm ('powell'). Finally, any generic minimization algorithm can be used via a user-provided callable object. The function must consist of two keyword arguments: stacked_equations, and init. Additionally, the function should return only the optimized values. Please review the provided example in the documentation for how to implement a custom minimization algorithm.
maxiter (int, optional) – Maximum iterations to consider for the minimization procedure. Default is 5000 iterations. For complex estimating equations, this value may need to be increased. This argument is not used when a custom minimization method (e.g., solver) is provided.
tolerance (float, optional) – Maximum tolerance for errors in the minimization in scipy.optimize. Default is 1e-9. This argument is not used when a custom minimization method (e.g., solver) is provided.
deriv_method (str, optional) – Method to compute the derivative of the estimating equations for the bread matrix. Default is 'approx'. Options include numerical approximation via the forward difference method via SciPy ('approx'), forward difference as implemented in delicatessen (‘fapprox’), backward difference as implemented in delicatessen (‘bapprox’), central difference implemented as in delicatessen (‘capprox’), or forward-mode automatic differentiation as implemented in delicatessen('exact').
dx (float, optional) – Spacing to use to numerically approximate the partial derivatives of the bread matrix. Default is 1e-9. Here, a small value for dx should be used, since some large values can result in poor approximations. This argument is only used with numerical approximation methods.
allow_pinv (bool, optional) – Whether to allow for the pseudo-inverse (via numpy.linalg.pinv) if the bread matrix is determined to be non-invertible. If you want to disallow the pseudo-inverse (i.e., use numpy.linalg.inv), set this argument to False. Default is True, which is more robust to the possible bread matrices.

Returns

Return type

None

confidence_bands(subset=None, alpha=0.05, method='supt', n_draws=1000000, seed=None)

Calculate two-sided \((1 - \alpha) \times\) 100% confidence bands from the point and sandwich variance estimates. Rather than cover a single parameter, the confidence bands provide coverage for parameter vectors. The formula for the confidence bands is

\[\hat{\theta} \pm \hat{c}_{\alpha / 2} \times \widehat{SE}(\hat{\theta})\]

Note

The .estimate() function must be called before the confidence bands can be calculated.

This formula looks very similar to the confidence interval formula, but there is a different critical value. Note that the formula above has a hat on the critical value, \(c\), to denote it is being estimated. Currently, only the sup-t method for estimating the critical value is supported.

Parameters

subset (list, set, array, None, optional) – Optional argument to compute the confidence bands for a subset of parameters. Note that this argument is distinct from the overall subset argument and is only used for the confidence bands. Default is None, which computes the confidence bands for all parameters (as if they are are all the parameters of interest).
alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence bands. Default is 0.05, which corresponds to 95% confidence bands.
method (str, optional) – Method to compute the confidence bands. Currently, only the sup-t and Bonferroni method are supported. Default is 'supt'
n_draws (int, optional) – Number of random draws to use for any methods based on simulated approximation. Default is one million, 1000000.
seed (int, optional) – Seed to initialize a pseudo RNG for methods based on simulated approximations. Default is None which does not use a reproducible seed. To consistently obtain the same confidence bands with approximation methods, please use a seed.

Returns

b-by-2 array, where row 1 is the confidence intervals for \(\theta_1\), …, and row b is the confidence intervals for \(\theta_b\)

Return type

array

References

Montiel Olea JL & Plagborg‐Møller M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1-17.

Zivich PN, Cole SR, Greifer N, Montoya LM, Kosorok MR, Edwards JK. (2025). Confidence Regions for Multiple Outcomes, Effect Modifiers, and Other Multiple Comparisons, arXiv:2510.07076

confidence_intervals(alpha=0.05)

Calculate two-sided Wald-type \((1 - \alpha) \times\) 100% confidence intervals using the point and sandwich variance estimates. The formula for the confidence intervals is

\[\hat{\theta} \pm z_{\alpha / 2} \times \widehat{SE}(\hat{\theta})\]

Note

The .estimate() function must be called before the confidence intervals can be calculated.

Parameters: alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence intervals. Default is 0.05, which corresponds to 95% confidence intervals.
Returns: b-by-2 array, where row 1 is the confidence intervals for \(\theta_1\), …, and row b is the confidence intervals for \(\theta_b\)
Return type: array

influence_functions(allow_pinv=True)

Calculate the influence function values for individual observations. Note that the influence function for \(i\) is computed using the following equality

\[IF(O_i; \theta) = \left\{ B_n(theta) \right\}^{-1} \psi(O_i; \theta)\]

or that the influence functions is equal to the estimating function scaled by the inverse of the bread matrix. While this approach was historically used to derive the influence function, there are more general alternatives. delicatessen offers this implementation since

Returns: Returns a n-by-b NumPy array evaluated for the input theta.
Return type: array

References

Fisher A, & Kennedy EH. (2021). Visually communicating and teaching intuition for influence functions. The American Statistician, 75(2), 162-172.

Hines O, Dukes O, Diaz-Ordaz K, & Vansteelandt S. (2022). Demystifying statistical learning based on efficient influence functions. The American Statistician, 76(3), 292-304.

Stefanski LA, & Boos DD. (2002). The calculus of M-estimation. The American Statistician, 56(1), 29-38.

p_values(null=0)

Calculate two-sided Wald-type P-values using the Z-scores compute using the point and the sandwich variance estimates. Once the Z-scores are computed, the corresponding P-values are obtained by comparing to the standard normal distribution.

The .estimate() function must be called before the P-values can be calculated.

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding P-values. Default is 0.
Returns: Array of P-values for \(\theta_1, ..., \theta_b\), respectively
Return type: array

print_results(subset=None, decimals=2, alpha=0.05)

Display estimator results to the console. This function prints the results of the estimator object (point estimates, standard error, confidence intervals, P-value, S-value) to the console. This function is only meant as an easy way to display the underlying results to the console without having to call each function separately.

Parameters

subset (list, set, array, None, optional) – List of indices of a subset of parameters to display in the table. Default is None, which displays all the paramters in the table.
decimals (int, optional) – Number of decimals to display in the table. Default is 2. The spacing of the table may be skewed when decimals > 5.
alpha (float, optional) – The \(0 < \alpha < 1\) level for the corresponding confidence bands. Default is 0.05, which corresponds to 95% confidence intervals.

Returns

Return type

None

s_values(null=0)

Calculate two-sided Wald-type S-values using the point estimates and the sandwich variance. The S-value, or Shannon Information value, is a transformation of the P-values that has been argued to be more easily interpretable as it can be related back to simple coin-flipping scenarios. The transformation from a P-value into a S-value is.

\[S = - \log_2(P)\]

where \(P\) is the corresponding P-value. The .estimate() function must be called before the S-values can be calculated.

The S-value can be contextualized in terms of coin-flips. Suppose the S-value is \(s\). Then \(s\) corresponds to the number of heads in a row with a fair coin (equal chances heads or tails). As \(s\) increases, one would be more ‘surprised’ by the result (e.g., it might not be surprising to have 2 heads in a row, but it would be surprising for 20 in a row).

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding S-values. Default is 0.
Returns: Array of S-values for \(\theta_1, ..., \theta_b\), respectively
Return type: array

References

Cole SR, Edwards JK, & Greenland S. (2021). Surprise! American Journal of Epidemiology, 190(2), 191-193.

Greenland S. (2019). Valid P-values behave exactly as they should: Some misleading criticisms of P-values and their resolution with S-values. The American Statistician, 73(sup1), 106-114.

z_scores(null=0)

Calculate the Z-score using the point estimates and the sandwich variance. The formula for the Z score is

\[\frac{\hat{\theta} - \theta}{\widehat{SE}(\hat{\theta})}\]

where \(\theta\) is the null. The .estimate() function must be called before the Z-scores can be calculated. Note that the default value for the null is zero.

Parameters: null (int, float, ndarray, optional) – Null or reference for the the corresponding P-values. Default is 0.
Returns: Array of Z-scores for \(\theta_1, ..., \theta_b\), respectively
Return type: array