Basics
=====================================

Here, the basics of estimating equations are reviewed. This introduction is somewhat abstract. In the Applied Examples, specific estimators are reviewed for a variety of computational problems.

M-estimator
-------------------------------

An M-estimator, :math:`\hat{\theta}`, is defined as the solution to the estimating equation

.. math::

    \sum_{i=1}^{n} \psi(O_i, \hat{\theta}) = 0

where :math:`\psi` is a known :math:`v \times 1`-dimension estimating function, :math:`O_i` indicates the observed unit :math:`i \in \{1,...,n\}`, and the parameters are the vector :math:`\theta = (\theta_1, \theta_2, ..., \theta_v)`. Note that :math:`v` is finite (i.e., :math:`\theta` is finite-dimensional) and that the number of parameters matches the dimension of the estimating functions.

To solve this equation for :math:`\theta`, we use a *root-finding* algorithm. Root-finding algorithms are procedures for finding the zeroes (i.e., roots) of an equation. In ``delicatessen``, this is accomplished using SciPy's root-finding algorithms.

GMM-estimator
-------------------------------

The Generalized Method of Moments (GMM) estimator is instead defined as the solution to

.. math::

    \hat{\theta} = \text{argmin}_{\theta} \left[ \sum_{i=1}^n \psi(O_i, \theta) \right]^T \text{Q} \left[ \sum_{i=1}^n \psi(O_i, \theta) \right]

where :math:`\text{Q}` is a weight matrix. As implemented in ``delicatessen``, the weight matrix begins as the identity matrix. To solve this equation for :math:`\theta`, we use a *minimization* algorithm. In ``delicatessen``, this is accomplished using SciPy's minimization routines.

When the dimensions of the parameters and the estimating equations match, solving this equation gives the same solution as the M-estimator: for a positive-definite :math:`\text{Q}`, the quadratic form reaches its minimum of zero exactly at the root of the estimating equations. However, the GMM estimator can also be used when there are more estimating equations than parameters, which is referred to as *over-identification*. In these settings ``GMMEstimator`` can be used, but ``MEstimator`` cannot.
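
As a minimal sketch (not ``delicatessen``'s own implementation), the root-finding and minimization strategies described above can be compared directly with SciPy's ``root`` and ``minimize``. The estimating functions (for a mean and a variance), the simulated data, the starting values, and the choice of the Nelder-Mead method are all assumptions made for illustration. Because the number of parameters equals the number of estimating functions here, both strategies should arrive at the same :math:`\hat{\theta}`.

.. code-block:: python

    import numpy as np
    from scipy.optimize import minimize, root

    # Hypothetical simulated data, for illustration only
    rng = np.random.default_rng(20240101)
    y = rng.normal(loc=5.0, scale=2.0, size=200)

    def psi(theta):
        """Estimating functions for a mean and a variance; shape (v, n) = (2, n)."""
        mu, sigma2 = theta
        return np.vstack([y - mu, (y - mu) ** 2 - sigma2])

    def summed_ee(theta):
        """Left-hand side of the estimating equation: sum over i of psi(O_i, theta)."""
        return psi(theta).sum(axis=1)

    start = np.array([1.0, 1.0])  # assumed starting values

    # M-estimator: root-finding on the summed estimating equations
    m_fit = root(summed_ee, x0=start)
    theta_m = m_fit.x

    # GMM-estimator: minimize the quadratic form, with Q set to the identity matrix
    Q = np.identity(2)

    def gmm_objective(theta):
        g = summed_ee(theta)
        return g @ Q @ g

    gmm_fit = minimize(gmm_objective, x0=start, method="Nelder-Mead",
                       options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 10000})
    theta_gmm = gmm_fit.x

In this toy problem the roots have a closed form (the sample mean and the variance with denominator :math:`n`), which makes it easy to check that both fits agree.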
Variance Estimation
-------------------------------

Regardless of the point-estimation strategy, the empirical sandwich variance estimator is used to estimate the variance of :math:`\hat{\theta}`:

.. math::

    V_n(O,\hat{\theta}) = B_n(O,\hat{\theta})^{-1} F_n(O,\hat{\theta}) \left(B_n(O,\hat{\theta})^{-1}\right)^T

where the 'bread' is

.. math::

    B_n(O,\hat{\theta}) = n^{-1} \sum_{i=1}^n - \nabla \psi(O_i, \hat{\theta})

where :math:`\nabla` indicates the matrix of partial derivatives (i.e., the Jacobian) of the estimating functions with respect to :math:`\theta`, and the 'meat' or 'filling' is

.. math::

    F_n(O, \hat{\theta}) = n^{-1} \sum_{i=1}^n \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^T

Here, :math:`V_n` estimates the asymptotic variance, so the estimated variance of :math:`\hat{\theta}` is :math:`n^{-1} V_n(O, \hat{\theta})`.

The sandwich variance requires the derivatives of the estimating functions and some matrix algebra, all of which can be left to the computer. For the derivatives, ``delicatessen`` offers two options: numerical approximation or forward-mode automatic differentiation. The filling is computed via a dot product of the estimating functions evaluated at :math:`\hat{\theta}`. The bread is inverted using NumPy; if the pseudo-inverse is allowed, the Moore-Penrose inverse is used instead. Finally, the inverted bread and the filling are combined via dot products.

Automatic Differentiation Caveats
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two caveats to the use of automatic differentiation. (1) Some NumPy functionalities are not fully supported. For example, ``np.log(x, where=0`_.
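
The bread, filling, and sandwich steps described under Variance Estimation can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions, not ``delicatessen``'s code: the helper names ``bread``, ``filling``, and ``sandwich`` and the ``allow_pinv`` flag are hypothetical, and the derivative uses simple central differences, one possible numerical approximation that is not necessarily the one ``delicatessen`` uses. The data and the mean/variance estimating functions are likewise assumed for illustration.

.. code-block:: python

    import numpy as np

    # Hypothetical simulated data, for illustration only
    rng = np.random.default_rng(20240101)
    y = rng.normal(loc=5.0, scale=2.0, size=200)
    n = y.shape[0]

    def psi(theta):
        """Estimating functions for a mean and a variance; shape (v, n)."""
        mu, sigma2 = theta
        return np.vstack([y - mu, (y - mu) ** 2 - sigma2])

    # Closed-form roots of the summed estimating equations
    theta_hat = np.array([np.mean(y), np.var(y)])

    def bread(theta, h=1e-6):
        """B_n = n^{-1} sum_i -d psi(O_i, theta) / d theta, via central differences."""
        v = theta.shape[0]
        B = np.empty((v, v))
        for j in range(v):
            step = np.zeros(v)
            step[j] = h
            # Column j: derivative of the summed estimating functions w.r.t. theta_j
            d_sum = (psi(theta + step).sum(axis=1)
                     - psi(theta - step).sum(axis=1)) / (2 * h)
            B[:, j] = -d_sum / n
        return B

    def filling(theta):
        """F_n = n^{-1} sum_i psi(O_i, theta) psi(O_i, theta)^T, as one matrix product."""
        est = psi(theta)
        return (est @ est.T) / n

    def sandwich(theta, allow_pinv=True):
        """V_n = B^{-1} F (B^{-1})^T; hypothetical helper for this sketch."""
        B = bread(theta)
        # Moore-Penrose pseudo-inverse if allowed, otherwise the ordinary inverse
        B_inv = np.linalg.pinv(B) if allow_pinv else np.linalg.inv(B)
        return B_inv @ filling(theta) @ B_inv.T

    asymptotic_var = sandwich(theta_hat)
    var_theta = asymptotic_var / n      # estimated variance of theta_hat
    se = np.sqrt(np.diag(var_theta))    # standard errors

For these estimating functions the bread is the identity matrix at :math:`\hat{\theta}`, so the sandwich variance of the mean reduces to the variance with denominator :math:`n` divided by :math:`n`, which gives a quick sanity check.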