delicatessen.estimating_equations.regression.ee_regression

ee_regression(theta, X, y, model, weights=None, offset=None)

Estimating equation for regression. Options include: linear, logistic, and Poisson regression. The general estimating equation is

\[\sum_{i=1}^n \left\{ Y_i - g(X_i^T \theta) \right\} X_i = 0\]

where \(g\) indicates a transformation function. For linear regression, \(g\) is the identity function. Logistic regression uses the inverse-logit function, \(\text{expit}(u) = 1 / (1 + \exp(-u))\). Finally, Poisson regression uses the exponential function, \(g(u) = \exp(u)\).
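As a quick illustration of these three transformations, the following is a minimal NumPy sketch (not part of delicatessen itself):

```python
import numpy as np

u = np.array([-2.0, 0.0, 2.0])   # example linear predictor values, X_i^T theta

g_linear = u                       # linear regression: g(u) = u (identity)
g_logistic = 1 / (1 + np.exp(-u))  # logistic regression: inverse-logit, expit(u)
g_poisson = np.exp(u)              # Poisson regression: g(u) = exp(u)

print(g_logistic)  # expit(0) = 0.5 at the midpoint
```

Each transformation maps the linear predictor to the scale of the conditional mean of the outcome.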

Here, \(\theta\) is a 1-by-b array, whose elements correspond to the coefficients in the regression model, and b is the number of distinct covariates included as part of X. For example, if X is an n-by-3 matrix, then \(\theta\) will be a 1-by-3 array. The code is general and allows for an arbitrary number of columns in X.

Parameters
  • theta (ndarray, list, vector) – Theta in this case consists of b values. Therefore, the initial values should have the same length as the number of columns in X. This can easily be implemented via [0, ] * X.shape[1].

  • X (ndarray, list, vector) – 2-dimensional vector of n observed values for b variables.

  • y (ndarray, list, vector) – 1-dimensional vector of n observed values.

  • model (str) – Type of regression model to estimate. Options are 'linear' (linear regression), 'logistic' (logistic regression), and 'poisson' (Poisson regression).

  • weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.

  • offset (ndarray, list, vector, None, optional) – A 1-dimensional offset to be included in the model. Default is None, which applies no offset term.
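To illustrate the role of the offset, the sketch below assumes the offset is added to the linear predictor before the transformation, i.e., \(g(X_i^T \theta + \text{offset}_i)\); a common use is a log person-time term in Poisson models. The `poisson_mean` helper here is hypothetical and written only for illustration:

```python
import numpy as np

def poisson_mean(theta, X, offset=None):
    # Hypothetical helper: the offset shifts the linear predictor
    # before the exponential transformation, exp(X_i^T theta + offset_i)
    pred = X @ theta
    if offset is not None:
        pred = pred + offset
    return np.exp(pred)

X = np.array([[1.0, 0.5], [1.0, -0.5]])      # intercept column plus one covariate
theta = np.array([0.1, 2.0])
log_exposure = np.log(np.array([10.0, 20.0]))  # e.g., person-time as a log offset

print(poisson_mean(theta, X, offset=log_exposure))
```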

Returns

Returns a b-by-n NumPy array evaluated for the input theta.
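As a sketch of what this shape means, a minimal hand-rolled version of the linear-regression estimating function (illustrative only, not delicatessen's implementation) returns one column per observation:

```python
import numpy as np

def ee_linear_sketch(theta, X, y):
    # Each column i is {y_i - X_i^T theta} * X_i, so stacking over
    # the n observations gives a b-by-n array
    resid = y - X @ theta             # shape (n,)
    return (X * resid[:, None]).T     # shape (b, n)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])  # intercept + 1 covariate
y = 0.5 + 2 * X[:, 1] + rng.normal(size=500)

ef = ee_linear_sketch(np.array([0.0, 0.0]), X, y)
print(ef.shape)  # (2, 500)
```

The M-estimator sums this array across its columns (observations) and solves for the theta where each row's sum is zero.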

Return type

array

Examples

Construction of estimating equations with ee_regression should be done similarly to the following

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import logistic
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_regression

Some generic data to estimate the regression models

>>> n = 500
>>> data = pd.DataFrame()
>>> data['X'] = np.random.normal(size=n)
>>> data['Z'] = np.random.normal(size=n)
>>> data['Y1'] = 0.5 + 2*data['X'] - 1*data['Z'] + np.random.normal(loc=0, size=n)
>>> data['Y2'] = np.random.binomial(n=1, p=logistic.cdf(0.5 + 2*data['X'] - 1*data['Z']), size=n)
>>> data['Y3'] = np.random.poisson(lam=np.exp(0.5 + 2*data['X'] - 1*data['Z']), size=n)
>>> data['C'] = 1

Note that C here is set to all 1’s; it will serve as the intercept column in the regression.

To start, we will demonstrate linear regression for the outcome Y1. Defining psi, or the stacked estimating equations

>>> def psi(theta):
>>>     return ee_regression(theta=theta, X=data[['C', 'X', 'Z']], y=data['Y1'], model='linear')

Calling the M-estimator (note that init requires 3 values, since X.shape[1] is 3).

>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0.])
>>> estr.estimate()

Inspecting the parameter estimates, variance, and confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

Next, we can estimate the parameters for a logistic regression model as follows

>>> def psi(theta):
>>>     return ee_regression(theta=theta, X=data[['C', 'X', 'Z']], y=data['Y2'], model='logistic')
>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0.])
>>> estr.estimate()

Finally, we can estimate the parameters for a Poisson regression model as follows

>>> def psi(theta):
>>>     return ee_regression(theta=theta, X=data[['C', 'X', 'Z']], y=data['Y3'], model='poisson')
>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0.])
>>> estr.estimate()

Weighted models can be estimated by specifying the optional weights argument.
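As a sketch of how weights enter, each observation's contribution to the estimating function is scaled by its weight. The hand-rolled `ee_weighted_linear` below is illustrative only, not delicatessen's implementation:

```python
import numpy as np

def ee_weighted_linear(theta, X, y, weights):
    # Sketch: a weight w_i scales observation i's contribution to the
    # estimating function, w_i * {y_i - X_i^T theta} * X_i
    resid = y - X @ theta
    return (X * (weights * resid)[:, None]).T

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = 1.0 + X[:, 1] + rng.normal(size=100)

# Weights of 1 for every observation reproduce the unweighted estimating function
ef = ee_weighted_linear(np.zeros(2), X, y, weights=np.ones(100))
print(ef.shape)  # (2, 100)
```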

References

Boos DD, & Stefanski LA. (2013). M-estimation (estimating equations). In Essential Statistical Inference (pp. 297-337). Springer, New York, NY.