delicatessen.estimating_equations.basic.ee_percentile

ee_percentile(theta, y, q)

Estimating equation for the q percentile. The estimating equation is

\[\sum_{i=1}^n \left\{ q - I(Y_i \le \theta) \right\} = 0\]

where \(0 < q < 1\) is the percentile. Notice that this estimating equation is non-smooth. Therefore, root-finding is difficult.
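To see the non-smoothness directly, the estimating function can be sketched in plain NumPy (percentile_ef is a hypothetical helper for illustration, not part of delicatessen):

```python
import numpy as np

def percentile_ef(theta, y, q):
    # Sum of per-observation contributions: q - I(y_i <= theta)
    return np.sum(q - (np.asarray(y) <= theta))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# The function is a step function in theta: it jumps at each observed
# value and may never equal zero exactly, which is why root-finding
# for this estimating equation is difficult.
percentile_ef(2.9, y, q=0.5)  # 0.5
percentile_ef(3.1, y, q=0.5)  # -0.5
```

Between 2.9 and 3.1 the function jumps from 0.5 to -0.5 without crossing zero at any evaluable point, so a root-finder must settle for an approximate solution.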

Note

As the derivative of the estimating equation is not defined at \(\hat{\theta}\), the bread (and sandwich) cannot be used to estimate the variance. This estimating equation is offered for completeness, but is not generally recommended for applications.

Parameters
  • theta (ndarray, list, vector) – Theta in this case consists of a single value, so initial values should be provided in the form [0, ].

  • y (ndarray, list, vector) – 1-dimensional vector of n observed values. No missing data should be included (missing data may cause unexpected behavior when attempting to calculate the percentile).

  • q (float) – Percentile to calculate. Must be in \((0, 1)\).

Returns

Returns a 1-by-n NumPy array evaluated for the input theta and y.
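The per-observation contributions behind that array can be sketched as follows (percentile_contributions is an illustrative stand-in, not the library function):

```python
import numpy as np

def percentile_contributions(theta, y, q):
    # One contribution per observation, q - I(y_i <= theta),
    # stacked as a 1-by-n array as described above
    return (q - (np.asarray(y) <= theta[0])).reshape(1, -1)

vals = percentile_contributions(theta=[0.0], y=[-1.0, 0.5, 2.0], q=0.5)
vals.shape  # (1, 3)
```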

Return type

array

Examples

Construction of an estimating equation with ee_percentile should be done similarly to the following

>>> import numpy as np
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_percentile

Some generic data to estimate the percentile for

>>> np.random.seed(89041)
>>> y_dat = np.random.normal(size=100)

Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     return ee_percentile(theta=theta, y=y_dat, q=0.5)

Calling the M-estimation procedure (note that init takes a single value).

>>> estr = MEstimator(stacked_equations=psi, init=[0, ])
>>> estr.estimate(solver='hybr', tolerance=1e-3, dx=1, order=15)

Notice that we use a different solver, tolerance, and derivative-approximation parameters here. These settings generally work better for the percentile because the estimating equation is non-smooth. Furthermore, optimization is difficult when only a few observations (fewer than 100) are available.

>>> estr.theta

Inspecting theta displays the estimated percentile (here, the median). In this example, there is a difference between the closed-form solution (-0.07978) and the M-estimation result (-0.06022).
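The closed-form value quoted above is the sample median. As a sanity check (a sketch only, recreating the same simulated data), exactly half of the observations fall at or below it:

```python
import numpy as np

np.random.seed(89041)
y_dat = np.random.normal(size=100)

closed_form = np.median(y_dat)  # closed-form solution for q = 0.5
# With continuous data and an even n, exactly half of the values
# lie at or below the sample median:
prop_at_or_below = np.mean(y_dat <= closed_form)  # 0.5
```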

References

Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.