Basics
Here, the basics of M-estimator will be reviewed. An M-estimator, \(\hat{\theta}\), is defined as the solution to the estimating equation
where \(\psi\) is a known \(v \times 1\)-dimension estimating function, \(O_i\) indicates the observed unit \(i \in \{1,...,n\}\), and the parameters are the vector \(\theta = (\theta_1, \theta_2, ..., \theta_v)\). Note that \(v\) is finite-dimensional and the number of parameters matches the dimension of the estimating functions.
Point Estimation
To implement the point estimation of \(\theta\), we use a root-finding algorithm. Root-finding algorithms are
procedures for finding the zeroes (i.e., roots) of an equation. This is accomplished in delicatessen
by using
SciPy’s root-finding algorithms.
Variance Estimation
To estimate the variance for \(\theta\), the M-estimator uses the empirical sandwich variance estimator:
where the ‘bread’ is
where the \(\psi'\) indicates the partial derivative, and the ‘filling’ is
The sandwich variance requires finding the derivative of the estimating functions and some matrix algebra. Again, we
can get the computer to complete all these calculations for us. For the derivative, delicatessen
offers two
options. First, the derivatives can be numerically approximated using the central difference method. This is done using
SciPy’s approx_fprime
functionality. As of v2.0
, the derivatives can also be computed using forward-mode
automatic differentiation. This approach provides the exact derivative (as opposed to an approximation). This is
implemented by-hand in delicatessen
via operator overloading. Finally, we use forward-mode because there is no
time advantage of backward-mode because the Jacobian is square.
Automatic Differentiation Caveats
There are some caveats to the use of automatic differentiation. First, some NumPy functionalities are not fully
supported. For example, np.log(x, where=0<x)
will result in an error since there is an attempt to evaluate a
log at zero internally. When using these specialty functions are necessary, it is better to use numerical approximation
for differentiation. The second is regarding discontinuities. Consider the following function \(f(x) = x**2\) if
\(x \ge 1\) and \(f(x) = 0\) otherwise. Because of how automatic differentiation operates, the derivative at
\(x=1\) will result in \(2x\) (this is the same behavior as other automatic differentiation software, like
autograd).
After computing the derivatives, the filling is computed via a dot product. The bread is then inverted using NumPy. Finally, the bread and filling matrices are combined via dot products.
This introduction has all been a little abstract. In the following Examples section, we will see how M-estimators can be used to address specific estimation problems.