Basics

This section reviews the basics of estimating equations. The introduction is somewhat abstract; the Applied Examples review specific estimators for a variety of computational problems.

M-estimator

An M-estimator, \(\hat{\theta}\), is defined as the solution to the estimating equation

\[\sum_{i=1}^{n} \psi(O_i, \hat{\theta}) = 0\]

where \(\psi\) is a known \(v \times 1\)-dimension estimating function, \(O_i\) indicates the observed unit \(i \in \{1,...,n\}\), and the parameters are the vector \(\theta = (\theta_1, \theta_2, ..., \theta_v)\). Note that \(\theta\) is finite-dimensional and the number of parameters, \(v\), matches the dimension of the estimating function.

To solve this equation for \(\hat{\theta}\), we use a root-finding algorithm. Root-finding algorithms are procedures for finding the zeroes (i.e., roots) of an equation. This is accomplished in delicatessen by using SciPy’s root-finding algorithms.
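As a minimal sketch of this idea, the block below solves the estimating equations for the mean and variance by calling SciPy's `root` directly. It does not use delicatessen itself; the data `y` and the helper names `psi` and `summed_psi` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import root

# Hypothetical data: five observed units
y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])

def psi(theta):
    """Stacked estimating functions for the mean and variance, shape (2, n)."""
    mu, sigma2 = theta
    return np.vstack([y - mu,                    # estimating function for the mean
                      (y - mu) ** 2 - sigma2])   # estimating function for the variance

def summed_psi(theta):
    """Sum the estimating functions over the n units, shape (2,)."""
    return psi(theta).sum(axis=1)

# Find the root of the summed estimating functions from arbitrary starting values
solution = root(summed_psi, x0=np.array([0.0, 1.0]))
theta_hat = solution.x
```

Here the root has a closed form (the sample mean and the variance with divisor \(n\)), so the output can be checked against `y.mean()` and `y.var()`.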

GMM-estimator

The Generalized Method of Moments (GMM) estimator is instead defined as the solution to

\[\hat{\theta} = \text{argmin}_{\theta} \left[ \sum_{i=1}^n \psi(O_i, \theta) \right]^T \text{Q} \left[ \sum_{i=1}^n \psi(O_i, \theta) \right]\]

where \(\text{Q}\) is a weight matrix. As implemented in delicatessen, the weight matrix begins as the identity matrix.

For this equation, we use a minimization algorithm to solve for \(\theta\). This is accomplished in delicatessen by using SciPy’s minimization routines.

Note that solving this equation is equivalent to the M-estimator when the dimension of the parameters and the estimating equations match. However, the GMM estimator can also be used when there are more estimating equations than parameters. This is referred to as over-identification. In these settings, GMMEstimator can be used, but MEstimator cannot.
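The over-identified case can be sketched with SciPy's `minimize` and an identity weight matrix. This is an illustration under stated assumptions rather than delicatessen's implementation: the data `y`, the helpers `psi` and `gmm_objective`, and the choice of two moment conditions for a single parameter (mean equal to \(\mu\) and variance equal to \(\mu\), as for a Poisson variable) are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical count data
y = np.array([2.0, 0.0, 3.0, 1.0, 4.0, 2.0, 1.0, 3.0])

def psi(theta):
    """Two estimating functions for one parameter, shape (2, n)."""
    mu = theta[0]
    return np.vstack([y - mu,               # first moment condition: mean is mu
                      (y - mu) ** 2 - mu])  # second moment condition: variance is mu

def gmm_objective(theta, Q=np.identity(2)):
    """Quadratic form of the summed estimating functions with weight matrix Q."""
    s = psi(theta).sum(axis=1)
    return s @ Q @ s

# Minimize the quadratic form, starting from the sample mean
start = np.array([y.mean()])
result = minimize(gmm_objective, x0=start)
theta_gmm = result.x
```

Because there are two estimating equations and only one parameter, no value of \(\mu\) sets both sums to zero for these data, so a root-finder has nothing to find. The minimizer instead returns a compromise between the sample mean and the sample variance.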

Variance Estimation

Regardless of the point-estimation strategy, the empirical sandwich variance estimator is used to estimate the variance of \(\hat{\theta}\):

\[V_n(O,\hat{\theta}) = B_n(O,\hat{\theta})^{-1} F_n(O,\hat{\theta}) \left(B_n(O,\hat{\theta})^{-1}\right)^T\]

where the ‘bread’ is

\[B_n(O,\hat{\theta}) = n^{-1} \sum_{i=1}^n - \nabla \psi(O_i, \hat{\theta})\]

where \(\nabla\) indicates the gradient of the estimating functions with respect to \(\theta\), and the ‘meat’ or ‘filling’ is

\[F_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^n \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^T\]

The sandwich variance requires finding the derivative of the estimating functions and some matrix algebra. Again, we can get the computer to complete all these calculations for us. For the derivative, delicatessen offers two options: numerical approximation or forward-mode automatic differentiation.

After computing the derivatives, the filling is computed via a dot product. The bread is then inverted using NumPy. If the pseudo-inverse is allowed, the Moore-Penrose inverse is used. Finally, the bread and filling matrices are combined via dot products.
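The steps above can be sketched with NumPy alone. The block below is an illustrative assumption, not delicatessen's internal code: it uses hypothetical data `y`, a hypothetical `bread` helper that approximates the derivative by central differences, and the mean-and-variance estimating functions.

```python
import numpy as np

# Hypothetical data and the mean/variance estimating functions
y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
n = y.shape[0]

def psi(theta):
    """Stacked estimating functions, shape (2, n)."""
    mu, sigma2 = theta
    return np.vstack([y - mu, (y - mu) ** 2 - sigma2])

# Point estimates (closed-form roots for this simple problem)
theta_hat = np.array([y.mean(), y.var()])

def bread(theta, eps=1e-6):
    """Bread matrix via a central-difference approximation of the derivative."""
    v = theta.shape[0]
    B = np.empty((v, v))
    for j in range(v):
        step = np.zeros(v)
        step[j] = eps
        diff = psi(theta + step).sum(axis=1) - psi(theta - step).sum(axis=1)
        B[:, j] = -diff / (2 * eps) / n   # negative derivative, averaged over units
    return B

B = bread(theta_hat)
efuncs = psi(theta_hat)
F = efuncs @ efuncs.T / n                 # filling: averaged outer products
B_inv = np.linalg.inv(B)                  # np.linalg.pinv is the Moore-Penrose alternative
V = B_inv @ F @ B_inv.T                   # sandwich: bread^-1, filling, bread^-T
```

For these estimating functions the bread evaluated at `theta_hat` is the identity matrix, so `V` equals the filling and its first diagonal element is the sample variance of `y`.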

Automatic Differentiation Caveats

There are two caveats to the use of automatic differentiation. (1) Some NumPy functionalities are not fully supported. For example, np.log(x, where=0<x) will result in an error since there is an attempt to evaluate a log at zero internally. When these specialty functions are necessary, it is better to use numerical approximation for differentiation. (2) Consider the following discontinuous function: \(f(x) = x^2\) if \(x \ge 1\) and \(f(x) = 0\) otherwise. Because of how automatic differentiation operates, the derivative at \(x=1\) is reported as \(2x = 2\), even though \(f\) is not differentiable at that point (this is the same behavior as other automatic differentiation software, like autograd).
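The second caveat can be illustrated with a toy forward-mode implementation. The `Dual` class below is a hypothetical, deliberately minimal sketch and is not how delicatessen implements automatic differentiation; it only supports multiplication and the `>=` comparison.

```python
class Dual:
    """Toy dual number carrying a value and its derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__

    def __ge__(self, other):
        # Comparisons only look at the value, never at the derivative
        other_value = other.value if isinstance(other, Dual) else other
        return self.value >= other_value


def f(x):
    """Discontinuous function: x^2 if x >= 1, and 0 otherwise."""
    return x * x if x >= 1 else 0 * x


# Forward-mode derivative at x = 1: only the branch that was taken is differentiated
ad_derivative = f(Dual(1.0, 1.0)).deriv

# Central difference at x = 1 straddles the jump, so it is very large
h = 1e-6
numerical_derivative = (f(1.0 + h) - f(1.0 - h)) / (2 * h)
```

The comparison `x >= 1` selects the \(x^2\) branch, so the toy implementation differentiates that branch and returns 2, with no indication of the jump. The central difference spans both sides of the jump and grows without bound as `h` shrinks.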

Finite-Sample Corrections

The sandwich variance estimator can perform poorly with small sample sizes. To help improve performance, there are finite-sample corrections. These corrections can be requested using the optional finite_correction argument available for both MEstimator and GMMEstimator. Currently, only the HC1 correction, which replaces \(n\) in the divisor for the variance with \(n-p\) where \(p\) is the number of parameters, is available.
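The arithmetic of this correction can be sketched as follows. The sketch assumes that the variance of \(\hat{\theta}\) is obtained by dividing the sandwich by \(n\), so that HC1 divides by \(n-p\) instead; the function `scaled_variance` and the numbers are hypothetical and this is not delicatessen's code.

```python
def scaled_variance(asymptotic_variance, n, p=0):
    """Divide the sandwich by (n - p); p=0 gives the uncorrected divisor n."""
    return asymptotic_variance / (n - p)

# Hypothetical sandwich value for a single parameter in a two-parameter model
v_n = 4.0
n, p = 10, 2

uncorrected = scaled_variance(v_n, n)      # divisor n
corrected = scaled_variance(v_n, n, p)     # HC1-style divisor n - p
inflation = corrected / uncorrected        # equals n / (n - p)
```

Under that assumption, the correction inflates the variance by the factor \(n/(n-p)\), which is noticeable for small \(n\) and approaches 1 as \(n\) grows.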

Clustered Data

For clustered data, the delicatessen.utilities.aggregate_efuncs function can be used to condense observations along a group or cluster ID variable. This operation does not modify the point estimates but does modify the variance estimate. Implicitly, the exact variance estimator used here assumes observations are independent within clusters. While this may not be the case, the sandwich variance estimator is robust to violations of this assumption.
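The effect of condensing estimating functions by cluster can be sketched with NumPy. The helper `sum_by_cluster` below is a hypothetical stand-in written for this illustration, not the aggregate_efuncs function or its signature; the data and cluster IDs are also made up. The variance is written with sums rather than averages, which is algebraically the same as dividing the sandwich by the number of independent units.

```python
import numpy as np

# Hypothetical data: six observations in three clusters
y = np.array([1.0, 2.0, 2.0, 5.0, 6.0, 8.0])
cluster = np.array([0, 0, 1, 1, 2, 2])

def psi(theta):
    """Estimating function for the mean, shape (1, n)."""
    return (y - theta[0])[None, :]

def sum_by_cluster(efuncs, ids):
    """Sum the columns of efuncs that share a cluster ID, shape (v, n_clusters)."""
    labels = np.unique(ids)
    return np.stack([efuncs[:, ids == g].sum(axis=1) for g in labels], axis=1)

theta_hat = np.array([y.mean()])
efuncs = psi(theta_hat)
efuncs_cluster = sum_by_cluster(efuncs, cluster)

# Summed bread: the negative derivative of each unit's estimating function is 1
bread_sum = np.array([[float(y.shape[0])]])
bread_inv = np.linalg.inv(bread_sum)

# Same bread, different filling: outer products over units versus over clusters
var_independent = bread_inv @ (efuncs @ efuncs.T) @ bread_inv.T
var_clustered = bread_inv @ (efuncs_cluster @ efuncs_cluster.T) @ bread_inv.T
```

The total of the estimating functions is unchanged by the aggregation, so the point estimate is the same. Only the filling changes, and with it the variance estimate.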

Code and Issue Tracker

Please report any bugs, issues, or feature requests on GitHub at pzivich/Delicatessen.