delicatessen.estimating_equations.regression.ee_regression
- ee_regression(theta, X, y, model, weights=None, offset=None)
Estimating equation for regression. Options include linear, logistic, and Poisson regression. The general estimating equation is
\[\sum_{i=1}^n \left\{ Y_i - g(X_i^T \theta) \right\} X_i = 0\]
where \(g\) indicates a transformation function. For linear regression, \(g\) is the identity function. Logistic regression uses the inverse-logit function, \(\text{expit}(u) = 1 / (1 + \exp(-u))\). Finally, Poisson regression uses the exponential function, \(\exp(u)\).
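These three transformation functions can be sketched directly with NumPy and SciPy (the variable names here are illustrative, not part of delicatessen):

```python
import numpy as np
from scipy.special import expit  # inverse-logit, 1 / (1 + exp(-u))

u = np.array([-2.0, 0.0, 2.0])
g_linear = u             # linear regression: g is the identity
g_logistic = expit(u)    # logistic regression: inverse-logit
g_poisson = np.exp(u)    # Poisson regression: exponential
```

Note that `expit(0) = 0.5` and `exp(0) = 1`, which is why initial values of zero for \(\theta\) start each model at a neutral prediction.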
Here, \(\theta\) is a 1-by-b array, which corresponds to the coefficients in the regression model, and b is the number of distinct covariates included as part of X. For example, if X is an n-by-3 matrix, then \(\theta\) will be a 1-by-3 array. The code is general to allow for an arbitrary number of elements in X.
- Parameters
  - theta (ndarray, list, vector) – Theta in this case consists of b values. Therefore, initial values should consist of the same number of values as the number of columns in X. This can easily be implemented via [0, ] * X.shape[1].
  - X (ndarray, list, vector) – 2-dimensional vector of n observed values for b variables.
  - y (ndarray, list, vector) – 1-dimensional vector of n observed values.
  - model (str) – Type of regression model to estimate. Options are 'linear' (linear regression), 'logistic' (logistic regression), and 'poisson' (Poisson regression).
  - weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.
  - offset (ndarray, list, vector, None, optional) – A 1-dimensional offset to be included in the model. Default is None, which applies no offset term.
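To illustrate what an offset does, here is a self-contained NumPy/SciPy sketch of the Poisson estimating equation with a log person-time offset; the helper `poisson_ee_offset` is a hypothetical illustration, not part of delicatessen (with `ee_regression` one would instead pass `offset=np.log(t)`):

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
t = rng.uniform(1.0, 5.0, size=n)                  # person-time; enters as a log-offset
y = rng.poisson(lam=t * np.exp(0.5 + 1.0 * X[:, 1]))

def poisson_ee_offset(theta):
    # Poisson estimating equation with an offset:
    #   sum_i {y_i - exp(x_i' theta + log t_i)} x_i = 0
    mu = np.exp(X @ theta + np.log(t))
    return (X * (y - mu)[:, None]).sum(axis=0)

sol = root(poisson_ee_offset, x0=[0.0, 0.0])
theta_hat = sol.x   # should land close to the true (0.5, 1.0)
```

The offset enters inside the transformation \(g\), so the coefficients are interpreted per unit of person-time.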
- Returns
  Returns a b-by-n NumPy array evaluated for the input theta.
- Return type
  array
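To make the b-by-n return shape concrete, here is a pure-NumPy sketch of what the linear-model estimating functions look like; `ee_linear_sketch` is an illustrative stand-in, not the package function:

```python
import numpy as np

def ee_linear_sketch(theta, X, y):
    # Stacked estimating functions for linear regression: row j holds
    # (y_i - x_i' theta) * x_ij across the n observations, so the result
    # is b-by-n rather than an already-summed 1-by-b vector.
    X = np.asarray(X)
    y = np.asarray(y)
    resid = y - X @ np.asarray(theta)
    return (X * resid[:, None]).T

X = np.array([[1., 0.], [1., 1.], [1., 2.]])   # n=3 observations, b=2 columns
y = np.array([1., 3., 5.])
ef = ee_linear_sketch([1., 2.], X, y)          # this theta solves the tiny problem exactly
```

The M-estimator sums this array over the n axis and finds the \(\theta\) where every row's sum is zero; here `ef.sum(axis=1)` is already zero because \(\theta = (1, 2)\) fits the data exactly.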
Examples
Construction of estimating equations with ee_regression should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_regression
Some generic data to estimate the regression models
>>> n = 500
>>> data = pd.DataFrame()
>>> data['X'] = np.random.normal(size=n)
>>> data['Z'] = np.random.normal(size=n)
>>> data['Y1'] = 0.5 + 2*data['X'] - 1*data['Z'] + np.random.normal(loc=0, size=n)
>>> data['Y2'] = np.random.binomial(n=1, p=logistic.cdf(0.5 + 2*data['X'] - 1*data['Z']), size=n)
>>> data['Y3'] = np.random.poisson(lam=np.exp(0.5 + 2*data['X'] - 1*data['Z']), size=n)
>>> data['C'] = 1
Note that C here is set to all 1's. This will be the intercept in the regression.

To start, we will demonstrate linear regression for the outcome Y1. Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     return ee_regression(theta=theta, X=data[['C', 'X', 'Z']],
>>>                          y=data['Y1'], model='linear')
Calling the M-estimator (note that init requires 3 values, since X.shape[1] is 3).

>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0.])
>>> estr.estimate()
Inspecting the parameter estimates, variance, and confidence intervals
>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()
Next, we can estimate the parameters for a logistic regression model as follows
>>> def psi(theta):
>>>     return ee_regression(theta=theta, X=data[['C', 'X', 'Z']],
>>>                          y=data['Y2'], model='logistic')
>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0.])
>>> estr.estimate()
Finally, we can estimate the parameters for a Poisson regression model as follows
>>> def psi(theta):
>>>     return ee_regression(theta=theta, X=data[['C', 'X', 'Z']],
>>>                          y=data['Y3'], model='poisson')
>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0.])
>>> estr.estimate()
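For the linear model, the root of the weighted estimating equation coincides with weighted least squares, which can be verified in a self-contained sketch without delicatessen; the helper `weighted_ee` is illustrative only (with `ee_regression` one would simply pass `weights=w`):

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.5 + 2.0 * X[:, 1] + rng.normal(size=n)
w = rng.uniform(0.5, 1.5, size=n)            # observation-level weights

def weighted_ee(theta):
    # Weighted linear estimating equation: sum_i w_i (y_i - x_i' theta) x_i = 0
    resid = y - X @ theta
    return (X * (w * resid)[:, None]).sum(axis=0)

theta_hat = root(weighted_ee, x0=[0.0, 0.0]).x

# The root matches weighted least squares: (X' W X)^{-1} X' W y
wls = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
```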
Weighted models can be estimated by specifying the optional weights argument.

References
Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.