delicatessen.utilities.robust_loss_functions

robust_loss_functions(residual, loss, k, a=None, b=None)

Loss functions for robust mean and robust regression estimating equations. This function is called internally by ee_mean_robust and ee_robust_regression. It can also be called directly, so users can easily adapt their own regression models into robust regression models using the pre-defined loss functions.
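To make the role of these functions in an estimating equation concrete, the following is a minimal sketch of a robust mean, assuming Huber's score and a simple bisection root-finder. It does not call delicatessen; `huber_score` and `robust_mean` are hypothetical names introduced only for illustration, and the root-finder is not the one the library uses.

```python
import numpy as np

def huber_score(residual, k):
    # Hypothetical helper: Huber's score, x for |x| < k and sign(x) * k otherwise
    return np.clip(residual, -k, k)

def robust_mean(y, k, tol=1e-10, max_iter=200):
    # Hypothetical helper: solve sum_i f_k(y_i - mu) = 0 for mu by bisection.
    # The sum is non-increasing in mu, non-negative at min(y) and
    # non-positive at max(y), so a root lies between the two.
    y = np.asarray(y, dtype=float)
    lo, hi = y.min(), y.max()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(huber_score(y - mid, k)) > 0:
            lo = mid   # estimating equation still positive: root is above mid
        else:
            hi = mid   # estimating equation non-positive: root is at or below mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative data with one gross outlier
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
mu_hat = robust_mean(y, k=1.345)
```

Here the outlier contributes at most `k` to the estimating equation, so `mu_hat` stays near the bulk of the data while the ordinary mean is pulled to 22.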

Note

The loss functions here are technically the first-order derivatives of the loss functions you see in the literature.
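As a worked illustration of this note (a standard textbook result, not taken from the library's source), consider Huber's loss \(\rho_k\) as it is usually written in the literature. Differentiating with respect to \(x\) gives the score that appears as the Huber entry in the list that follows.

```latex
\[\rho_k(x) = \begin{cases} \frac{1}{2} x^2 & |x| < k \\ k |x| - \frac{1}{2} k^2 & |x| \ge k \end{cases}
\quad \Longrightarrow \quad
f_k(x) = \frac{d \rho_k(x)}{dx} = \begin{cases} x & |x| < k \\ k \, \text{sign}(x) & |x| \ge k \end{cases}\]
```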

The following scores of the loss functions, \(f_k()\), are available.

Andrew’s Sine

\[f_k(x) = I(-k \pi \le x \le k \pi) \times \sin(x/k)\]

Huber

\[f_k(x) = x I(-k < x < k) + \text{sign}(x) k (1 - I(-k < x < k))\]

Tukey’s biweight

\[f_k(x) = I(-k < x < k) \times x \left( 1 - (x/k)^2 \right)^2\]

Fair

\[f_k(x) = \frac{x}{1 + |x|/k}\]

Cauchy

\[f_k(x) = \frac{x}{1 + (x/k)^2}\]

Ullah

\[f_k(x) = x \left[ 1 + (x/k)^4 \right]^{-2}\]

Welsch

\[f_k(x) = x \exp(-x^2 / (2k^2))\]

Hampel (Hampel’s requires two additional parameters, \(a\) and \(b\))

\[\begin{split}f_{k,a,b}(x) = \begin{bmatrix} I(-a < x < a) \times x \\ + I(a \le |x| < b) \times a \times \text{sign}(x) \\ + I(b \le x < k) \times a \frac{k - x}{k - b} \\ + I(-k < x \le -b) \times -a \frac{k + x}{k - b} \\ + I(|x| \ge k) \times 0 \end{bmatrix}\end{split}\]
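The piecewise scores are the easiest to get wrong, so here is a minimal NumPy sketch of three of them. It assumes each score is an odd function that is exactly zero outside its support, and that \(a < b < k\) for Hampel. The helper names `andrew`, `tukey`, and `hampel` are hypothetical and this is not the library's implementation.

```python
import numpy as np

def andrew(x, k):
    # sin(x / k) on [-k*pi, k*pi], zero elsewhere
    return np.where(np.abs(x) <= k * np.pi, np.sin(x / k), 0.0)

def tukey(x, k):
    # x * (1 - (x/k)^2)^2 on (-k, k), zero elsewhere
    return np.where(np.abs(x) < k, x * (1 - (x / k) ** 2) ** 2, 0.0)

def hampel(x, k, a, b):
    # Assumes a < b < k. Linear on (-a, a), constant on a <= |x| < b,
    # linear descent to zero on b <= |x| < k, zero for |x| >= k
    ax = np.abs(x)
    out = np.where(ax < a, x, 0.0)
    out = np.where((ax >= a) & (ax < b), a * np.sign(x), out)
    out = np.where((ax >= b) & (ax < k),
                   a * np.sign(x) * (k - ax) / (k - b), out)
    return out

# Illustrative residuals spanning every region of each score
x = np.array([-10.0, -5.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0, 10.0])
andrew_vals = andrew(x, k=1.0)
tukey_vals = tukey(x, k=4.0)
hampel_vals = hampel(x, k=8.0, a=2.0, b=4.0)
```

All three scores return zero for the residuals at ±10, which is what makes them redescending: sufficiently extreme observations contribute nothing to the estimating equation.
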
Parameters
  • residual (ndarray, vector, list) – 1-dimensional vector of n observed values. Input should consist of the residuals (the difference between the observed value and the predicted value). For the robust mean, this is \(Y_i - \mu\). For robust regression, this is \(Y_i - X_i^T \beta\).

  • loss (str) – Loss function to use. Options include: ‘andrew’, ‘huber’, ‘tukey’, ‘fair’, ‘cauchy’, ‘ullah’, ‘welsch’, ‘hampel’

  • k (int, float) – Tuning parameter for the corresponding loss function. Note: no default is provided, since each loss function has different recommendations.

  • a (int, float, None, optional) – Lower parameter for the ‘hampel’ loss function

  • b (int, float, None, optional) – Upper parameter for the ‘hampel’ loss function
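As a small sketch of what the `residual` argument would contain in the two cases described above, the following builds both residual vectors with NumPy. The data, `mu`, and `beta` are made-up values for illustration only.

```python
import numpy as np

# Made-up outcome vector and design matrix (intercept plus one covariate)
y = np.array([1.0, 2.0, 4.0, 7.0])
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

# Robust mean: residuals are Y_i - mu for a candidate mean mu
mu = 3.0
resid_mean = y - mu

# Robust regression: residuals are Y_i - X_i^T beta for candidate coefficients beta
beta = np.array([1.0, 2.0])
resid_reg = y - X @ beta
```

Either 1-dimensional vector has the shape expected for `residual`.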

Returns

Returns a 1-by-n NumPy array of the selected loss function's score evaluated at each input residual

Return type

array

Examples

Using the robust loss function

>>> import numpy as np
>>> from delicatessen.utilities import robust_loss_functions

Some generic data to stand-in for the residuals

>>> residuals = np.random.standard_cauchy(size=20)

Huber’s loss function

>>> robust_loss_functions(residuals, loss='huber', k=1.345)

Andrew’s Sine

>>> robust_loss_functions(residuals, loss='andrew', k=1.339)

Tukey’s biweight

>>> robust_loss_functions(residuals, loss='tukey', k=4.685)

Fair

>>> robust_loss_functions(residuals, loss='fair', k=1.3998)

Cauchy

>>> robust_loss_functions(residuals, loss='cauchy', k=2.3849)

Ullah

>>> robust_loss_functions(residuals, loss='ullah', k=3.2296)

Welsch

>>> robust_loss_functions(residuals, loss='welsch', k=2.9846)

Hampel’s loss function

>>> robust_loss_functions(residuals, loss='hampel', k=8, a=2, b=4)

References

Andrews DF. (1974). A robust method for multiple linear regression. Technometrics, 16(4), 523-531.

Beaton AE & Tukey JW. (1974). The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics, 16(2), 147-185.

Hampel FR. (1971). A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42(6), 1887-1896.

Huber PJ. (1964). Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1), 73-101.

Huber PJ & Ronchetti EM. (2009). Robust Statistics (2nd ed.). Wiley. pp. 98-100.

de Menezes DQF, Prata DM, Secchi AR, & Pinto JC. (2021). A review on robust M-estimators for regression analysis. Computers & Chemical Engineering, 147, 107254.

Rey WJ. (1983). Type M estimators. In Introduction to Robust and Quasi-Robust Statistical Methods (pp. 134-189). Berlin, Heidelberg: Springer Berlin Heidelberg.