delicatessen.estimating_equations.regression.ee_lasso_regression

ee_lasso_regression(theta, X, y, model, penalty, epsilon=0.003, weights=None, center=0.0, offset=None)

Estimating equation for an approximate LASSO (least absolute shrinkage and selection operator) regressor. LASSO regression applies L1-regularization, i.e., a penalty on the magnitude of the coefficients.

The estimating equation for the approximate LASSO linear regression is

\[\sum_{i=1}^n \left\{(Y_i - X_i^T \theta) X_i - \lambda (1 + \epsilon) | \theta |^{\epsilon} sign(\theta) \right\} = 0\]

where \(\lambda\) is the penalty term.
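As a rough sketch (not the library's internal implementation), the linear-model version of these estimating functions could be evaluated with NumPy as follows, where lam denotes \(\lambda\):

>>> import numpy as np
>>> def approx_lasso_ef(theta, X, y, lam, epsilon=0.003):
>>>     # Sketch of the bridge-approximated LASSO estimating functions (linear model)
>>>     theta = np.asarray(theta, dtype=float)
>>>     resid = np.asarray(y) - np.asarray(X) @ theta       # Y_i - X_i^T theta
>>>     score = np.asarray(X) * resid[:, None]              # unpenalized score contributions, n-by-b
>>>     penalty = lam * (1 + epsilon) * np.abs(theta)**epsilon * np.sign(theta)
>>>     return (score - penalty).T                          # b-by-n array of estimating functions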

Note

As the derivative of the estimating equation for LASSO is not defined at \(\theta=0\), the bread (and sandwich) cannot be used to estimate the variance in all settings.

Here, an approximation based on the bridge penalty is used for the LASSO. For the bridge penalty, LASSO is the special case where \(\epsilon = 0\). By making \(\epsilon > 0\), we can approximate the LASSO. The exact LASSO may not be possible to implement directly due to the existence of multiple solutions.
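For intuition, here is a small numerical check (purely illustrative) of how close the bridge penalty \(|\theta|^{1 + \epsilon}\) is to the exact L1 penalty \(|\theta|\) at the default \(\epsilon = 0.003\):

>>> import numpy as np
>>> theta = np.array([0.01, 0.5, 2.0])
>>> np.abs(theta) ** (1 + 0.003)    # bridge penalty with epsilon = 0.003
>>> np.abs(theta) ** 1              # exact LASSO (L1) penalty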

Here, \(\theta\) is a 1-by-b array, where the elements correspond to the coefficients of the regression model and b is the number of distinct covariates included as part of X. For example, if X is an n-by-3 matrix, then \(\theta\) will be a 1-by-3 array. The code is general to allow for an arbitrary number of covariates in X.

Note

The ‘strength’ of the penalty term is indicated by \(\lambda\), which is the penalty argument scaled by (i.e., divided by) the number of observations.
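For example, a penalty argument of 10 with \(n = 500\) observations (as in the data generated below) corresponds to \(\lambda = 10 / 500 = 0.02\).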

Parameters
  • theta (ndarray, list, vector) – Theta in this case consists of b values. Therefore, initial values should consist of the same number of values as the number of columns in X. This can easily be implemented via [0, ] * X.shape[1] (see the short sketch after this parameter list).

  • X (ndarray, list, vector) – 2-dimensional vector of n observed values for b variables.

  • y (ndarray, list, vector) – 1-dimensional vector of n observed values.

  • model (str) – Type of regression model to estimate. Options are 'linear' (linear regression), 'logistic' (logistic regression), and 'poisson' (Poisson regression).

  • penalty (int, float, ndarray, list, vector) – Penalty term to apply to all coefficients (if a single integer or float is provided) or to each corresponding coefficient (if a list or vector of integers or floats is provided). Note that the penalty term should consist of either a single value or b values (to match the length of theta). The penalty is scaled by n.

  • epsilon (float, optional) – Approximation error to use for the LASSO approximation. Default argument is 0.003, which results in a bridge penalty exponent of 1.003.

  • weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.

  • center (int, float, ndarray, list, vector, optional) – Center or reference value to penalize estimated coefficients towards. Default is 0, which penalizes coefficients towards the null. Other center values can be specified for all coefficients (by providing an integer or float) or for each coefficient separately (by providing a vector of b values, matching the length of theta).

  • offset (ndarray, list, vector, None, optional) – A 1-dimensional offset to be included in the model. Default is None, which applies no offset term.
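As a brief sketch of how these arguments line up with the columns of X (the column names and values here are purely illustrative):

>>> import pandas as pd
>>> X = pd.DataFrame({'C': [1, 1, 1],            # intercept column of 1's
>>>                   'A': [0.2, -1.1, 0.7],
>>>                   'B': [1.5, 0.3, -0.4]})
>>> init = [0., ] * X.shape[1]                   # one starting value per column of X
>>> penalty = [0., 10., 10.]                     # no penalty on the intercept, 10 for each covariate
>>> center = 0.                                  # shrink all penalized coefficients towards zero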

Returns

Returns a b-by-n NumPy array evaluated for the input theta.

Return type

array

Examples

Construction of the estimating equation(s) with ee_lasso_regression should be done similarly to the following

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_lasso_regression

Some generic data to estimate a LASSO regression model

>>> n = 500
>>> data = pd.DataFrame()
>>> data['V'] = np.random.normal(size=n)
>>> data['W'] = np.random.normal(size=n)
>>> data['X'] = data['W'] + np.random.normal(scale=0.25, size=n)
>>> data['Z'] = np.random.normal(size=n)
>>> data['Y1'] = 0.5 + 2*data['W'] - 1*data['Z'] + np.random.normal(loc=0, size=n)
>>> data['Y2'] = np.random.binomial(n=1, p=logistic.cdf(0.5 + 2*data['W'] - 1*data['Z']), size=n)
>>> data['Y3'] = np.random.poisson(lam=np.exp(1 + 2*data['W'] - 1*data['Z']), size=n)
>>> data['C'] = 1

Note that C here is set to all 1’s. This will be the intercept in the regression.

Defining psi, or the stacked estimating equations. Note that the penalty is a list of values. Here, we are not penalizing the intercept (leaving the intercept unpenalized is generally recommended, since the intercept is unlikely to be zero). The remaining covariates each have a penalty of 10 applied.

>>> penalty_vals = [0., 10., 10., 10., 10.]
>>> def psi(theta):
>>>     x, y = data[['C', 'V', 'W', 'X', 'Z']], data['Y1']
>>>     return ee_lasso_regression(theta=theta, X=x, y=y, model='linear', penalty=penalty_vals)

Calling the M-estimator (note that init has 5 values now, since X.shape[1] is 5).

>>> estr = MEstimator(stacked_equations=psi, init=[0.01, 0.01, 0.01, 0.01, 0.01])
>>> estr.estimate(solver='lm', maxiter=20000)

Inspecting the parameter estimates

>>> estr.theta
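Because the bridge approximation smooths the penalty, the sandwich variance is generally computable as well; in delicatessen it can be inspected (assuming estimate() converged) via

>>> estr.variance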

Next, we can estimate the parameters for a logistic regression model as follows

>>> penalty_vals = [0., 10., 10., 10., 10.]
>>> def psi(theta):
>>>     x, y = data[['C', 'V', 'W', 'X', 'Z']], data['Y2']
>>>     return ee_lasso_regression(theta=theta, X=x, y=y, model='logistic', penalty=penalty_vals)
>>> estr = MEstimator(stacked_equations=psi, init=[0.01, 0.01, 0.01, 0.01, 0.01])
>>> estr.estimate(solver='lm', maxiter=20000)

Finally, we can estimate the parameters for a Poisson regression model as follows

>>> penalty_vals = [0., 10., 10., 10., 10.]
>>> def psi(theta):
>>>     x, y = data[['C', 'V', 'W', 'X', 'Z']], data['Y3']
>>>     return ee_lasso_regression(theta=theta, X=x, y=y, model='poisson', penalty=penalty_vals)
>>> estr = MEstimator(stacked_equations=psi, init=[0.01, 0.01, 0.01, 0.01, 0.01])
>>> estr.estimate(solver='lm', maxiter=20000)

Weighted models can be estimated by specifying the optional weights argument.
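For instance, a weighted version of the linear model above might look like the following (the weights here are made up purely for illustration):

>>> w = np.random.uniform(0.5, 1.5, size=n)      # hypothetical observation weights
>>> def psi(theta):
>>>     x, y = data[['C', 'V', 'W', 'X', 'Z']], data['Y1']
>>>     return ee_lasso_regression(theta=theta, X=x, y=y, model='linear',
>>>                                penalty=penalty_vals, weights=w)
>>> estr = MEstimator(stacked_equations=psi, init=[0.01, ] * 5)
>>> estr.estimate(solver='lm', maxiter=20000)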

References

Fu WJ. (1998). Penalized regressions: the Bridge versus the LASSO. Journal of Computational and Graphical Statistics, 7(3), 397-416.

Fu WJ. (2003). Penalized estimating equations. Biometrics, 59(1), 126-132.