delicatessen.utilities.robust_loss_functions
- robust_loss_functions(residual, loss, k, a=None, b=None)
Loss functions for robust mean and robust regression estimating equations. This function is called internally for
ee_mean_robust and ee_robust_regression. This function can also be called directly, so users can easily adapt their own regression models into robust regression models using the pre-defined loss functions.
Note
The loss functions here are technically the first-order derivatives of the loss functions you see in the literature.
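As an illustration of this relationship, consider Huber's loss as it is commonly written in the literature. The symbol \(\rho_k\) is introduced here only for this illustration and is not part of the function's interface. Differentiating each piece with respect to \(x\) gives the Huber score listed among the functions documented here:
\[\rho_k(x) = \begin{cases} x^2 / 2 & |x| < k \\ k |x| - k^2 / 2 & |x| \ge k \end{cases} \quad \Rightarrow \quad \frac{d \rho_k(x)}{dx} = \begin{cases} x & |x| < k \\ \text{sign}(x) k & |x| \ge k \end{cases}\]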
The following scores of the loss functions, \(f_k(x)\), are available.
Andrew’s Sine
\[f_k(x) = I(-k \pi \le x \le k \pi) \times \sin(x/k)\]
Huber
\[f_k(x) = x I(-k < x < k) + \text{sign}(x) k (1 - I(-k < x < k))\]
Tukey’s biweight
\[f_k(x) = I(-k < x < k) \times x \left( 1 - (x/k)^2 \right)^2\]
Fair
\[f_k(x) = \frac{x}{1 + |x|/k}\]
Cauchy
\[f_k(x) = \frac{x}{1 + (x/k)^2}\]
Ullah
\[f_k(x) = x \left[ 1 + (x/k)^4 \right]^{-2}\]
Welsch
\[f_k(x) = x \exp(-x^2 / (2k^2))\]
Hampel (requires two additional parameters, \(a\) and \(b\))
\[\begin{split}f_{k,a,b}(x) = \begin{cases} x & |x| < a \\ a \times \text{sign}(x) & a \le |x| < b \\ a \frac{k - |x|}{k - b} \times \text{sign}(x) & b \le |x| < k \\ 0 & |x| \ge k \end{cases}\end{split}\]
- Parameters
residual (ndarray, vector, list) – 1-dimensional vector of n observed values. Input should consist of the residuals (the difference between the observed value and the predicted value). For the robust mean, this is \(Y_i - \mu\). For robust regression, this is \(Y_i - X_i^T \beta\).
loss (str) – Loss function to use. Options include: ‘andrew’, ‘huber’, ‘tukey’, ‘fair’, ‘cauchy’, ‘ullah’, ‘welsch’, ‘hampel’
k (int, float) – Tuning parameter for the corresponding loss function. Note: no default is provided, since each loss function has different recommendations.
a (int, float, None, optional) – Lower parameter for the ‘hampel’ loss function
b (int, float, None, optional) – Upper parameter for the ‘hampel’ loss function
- Returns
Returns a 1-by-n NumPy array of the chosen loss function's score evaluated at each element of the input residual
- Return type
array
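To make the elementwise behaviour concrete, the following is a minimal NumPy sketch of two of the formulas listed above (Huber and Cauchy). The helper names `huber_score` and `cauchy_score` are hypothetical and written only for this illustration; they are not the library's implementation and may differ from it in edge-case handling.

```python
import numpy as np


def huber_score(residual, k):
    """Hypothetical stand-alone version of the Huber formula (illustration only)."""
    x = np.asarray(residual, dtype=float)
    inside = np.abs(x) < k                       # I(-k < x < k)
    return np.where(inside, x, np.sign(x) * k)   # x inside, sign(x) * k outside


def cauchy_score(residual, k):
    """Hypothetical stand-alone version of the Cauchy formula (illustration only)."""
    x = np.asarray(residual, dtype=float)
    return x / (1 + (x / k) ** 2)


residuals = np.array([-10.0, -1.0, 0.0, 0.5, 3.0])
huber_out = huber_score(residuals, k=1.345)
cauchy_out = cauchy_score(residuals, k=2.3849)

print(huber_out)         # large residuals are capped at -k and k
print(cauchy_out.shape)  # one value per input residual
```

Both helpers return one value per input residual, and large residuals contribute only a bounded amount, which is the source of the robustness.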
Examples
Using the robust loss function
>>> import numpy as np
>>> from delicatessen.utilities import robust_loss_functions
Some generic data to stand-in for the residuals
>>> residuals = np.random.standard_cauchy(size=20)
Huber’s loss function
>>> robust_loss_functions(residuals, loss='huber', k=1.345)
Andrew’s Sine
>>> robust_loss_functions(residuals, loss='andrew', k=1.339)
Tukey’s biweight
>>> robust_loss_functions(residuals, loss='tukey', k=4.685)
Fair
>>> robust_loss_functions(residuals, loss='fair', k=1.3998)
Cauchy
>>> robust_loss_functions(residuals, loss='cauchy', k=2.3849)
Ullah
>>> robust_loss_functions(residuals, loss='ullah', k=3.2296)
Welsch
>>> robust_loss_functions(residuals, loss='welsch', k=2.9846)
Hampel’s loss function
>>> robust_loss_functions(residuals, loss='hampel', k=8, a=2, b=4)
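The residuals passed to the function are \(Y_i - \mu\) for the robust mean. As a rough sketch of how such a score can be used, the example below solves \(\sum_i f_k(Y_i - \mu) = 0\) for \(\mu\) with the Huber score. It is written in plain NumPy so that it is self-contained: `huber_score` and `robust_mean_bisection` are hypothetical helpers, and the bisection solver is a stand-in chosen for simplicity rather than the root-finding procedure the library uses.

```python
import numpy as np


def huber_score(residual, k):
    """Hypothetical Huber score: x inside (-k, k), sign(x) * k outside."""
    return np.clip(np.asarray(residual, dtype=float), -k, k)


def robust_mean_bisection(y, k, tol=1e-10, max_iter=200):
    """Solve sum_i f_k(y_i - mu) = 0 for mu by bisection (illustration only)."""
    y = np.asarray(y, dtype=float)
    lo, hi = y.min(), y.max()          # the summed score changes sign on [min, max]
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        total = huber_score(y - mid, k).sum()
        if total > 0:                  # mu is too small: move the lower bound up
            lo = mid
        else:                          # mu is too large (or exact): move the upper bound down
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # one gross outlier
mu_hat = robust_mean_bisection(y, k=1.345)

print(y.mean())   # 22.0, pulled toward the outlier
print(mu_hat)     # approximately 3.0
```

Bisection works here because the summed Huber score is non-increasing in \(\mu\). Scores that redescend to zero, such as Tukey's biweight or Hampel, can have multiple roots, so this simple solver should not be assumed to carry over to them.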
References
Andrews DF. (1974). A robust method for multiple linear regression. Technometrics, 16(4), 523-531.
Beaton AE & Tukey JW. (1974). The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics, 16(2), 147-185.
Hampel FR. (1971). A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42(6), 1887-1896.
Huber PJ. (1964). Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1), 73-101.
Huber PJ & Ronchetti EM. (2009). Robust Statistics, 2nd Edition. Wiley. pp. 98-100.
de Menezes DQF, Prata DM, Secchi AR, & Pinto JC. (2021). A review on robust M-estimators for regression analysis. Computers & Chemical Engineering, 147, 107254.
Rey WJ. (1983). Type M estimators. In Introduction to Robust and Quasi-Robust Statistical Methods (pp. 134-189). Berlin, Heidelberg: Springer Berlin Heidelberg.