delicatessen.estimating_equations.basic.ee_percentile
- ee_percentile(theta, y, q)
Estimating equation for the q-th percentile. The estimating equation is
\[\sum_{i=1}^n \left\{ q - I(Y_i \le \theta) \right\} = 0\]
where \(0 < q < 1\) is the percentile. Notice that this estimating equation is non-smooth. Therefore, root-finding is difficult.
Note
As the derivative of the estimating equation is not defined at \(\hat{\theta}\), the bread (and sandwich) cannot be used to estimate the variance. This estimating equation is offered for completeness, but is not generally recommended for applications.
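To make the form of the estimating equation concrete, here is a minimal sketch of the per-observation contributions (the function name is hypothetical; this is not the library's implementation):

```python
import numpy as np

def ee_percentile_sketch(theta, y, q):
    # Per-observation contributions q - I(y_i <= theta); the M-estimator
    # finds the theta at which these contributions sum to zero.
    y = np.asarray(y)
    return np.asarray([q - np.where(y <= theta[0], 1.0, 0.0)])  # 1-by-n array

# The sum is a step function of theta, so it jumps rather than smoothly
# crossing zero -- this is the non-smoothness noted above.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(ee_percentile_sketch([3.0], y, q=0.5).sum())   # jumps from +0.5 to -0.5 at theta = 3
```

Because the sum changes sign discontinuously at observed data points, derivative-based root-finders (and the numerically approximated bread) struggle, which motivates the solver settings shown in the example below.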
- Parameters
theta (ndarray, list, vector) – Theta in this case consists of a single value. Therefore, initial values of the form
[0, ]
should be provided.
y (ndarray, list, vector) – 1-dimensional vector of n observed values. No missing data should be included (missing data may cause unexpected behavior when attempting to calculate the percentile).
q (float) – Percentile to calculate. Must be in \((0, 1)\).
- Returns
Returns a 1-by-n NumPy array evaluated for the input theta and y.
- Return type
array
Examples
Construction of an estimating equation with ee_percentile should be done similar to the following

>>> import numpy as np
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_percentile
Some generic data to estimate the median of

>>> np.random.seed(89041)
>>> y_dat = np.random.normal(size=100)
Defining psi, or the stacked estimating equations
>>> def psi(theta):
>>>     return ee_percentile(theta=theta, y=y_dat, q=0.5)
Calling the M-estimation procedure (note that init consists of a single value).

>>> estr = MEstimator(stacked_equations=psi, init=[0, ])
>>> estr.estimate(solver='hybr', tolerance=1e-3, dx=1, order=15)
Notice that we use a different solver, tolerance, and parameters for numerically approximating the derivative here. These changes generally work better for percentiles since the estimating equation is non-smooth. Furthermore, root-finding is difficult when only a few observations (<100) are available.
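As a sanity check on the M-estimation result, the same quantity can be computed in closed form with NumPy's `np.quantile` (this comparison is an illustration, not part of the library's example):

```python
import numpy as np

# Reproduce the example's data with the same seed
np.random.seed(89041)
y_dat = np.random.normal(size=100)

# Closed-form empirical median, for comparison with the M-estimator's root
closed_form = np.quantile(y_dat, 0.5)

# By definition, roughly half of the observations fall at or below the median
proportion_below = np.mean(y_dat <= closed_form)
```

Because the estimating equation is non-smooth, the root found by the M-estimator need not match the closed-form sample quantile exactly, as the example output below illustrates.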
>>> estr.theta
Then displays the estimated percentile / median. In this example, there is a difference between the closed-form solution (-0.07978) and M-estimation (-0.06022).

References
Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.