delicatessen.estimating_equations.regression.ee_mlogit
- ee_mlogit(theta, X, y, weights=None, offset=None)
Estimating equation for multinomial logistic regression. This estimating equation supports unranked categorical outcome data, unlike ee_regression and ee_glm.
Unlike the other regression estimating equations, ee_mlogit expects a matrix of indicators for each possible value of y, with the first column used as the referent category. In other words, the outcome variable is a matrix of dummy variables that includes the reference. The estimating equation for column \(r\) of the indicator variable \(Y_{r}\) of a \(Y\) with \(k\) unique categories is
\[\sum_{i=1}^n \left\{ Y_{r,i} - \frac{\exp(X_i^T \theta_r)}{1 + \sum_{j=2}^{k} \exp(X_i^T \theta_j)} \right\} X_i = 0\]
where \(\theta_r\) are the coefficients that correspond to the log odds ratio comparing \(Y_r\) to the referent category of \(Y\). Here, \(\theta\) is a 1-by-(b \(\times\) (k-1)) array, where b is the number of distinct covariates included as part of X. So, the stack of estimating equations consists of (k-1) sets of estimating equations, each of the dimension of \(X_i\). For example, if X is an n-by-3 matrix and \(Y\) has three unique categories, then \(\theta\) will be a 1-by-6 array.
- Parameters
theta (ndarray, list, vector) – Parameter values. Theta consists of b \(\times\) (k-1) values, so the initial values should contain one value per column of the design matrix for each category of the outcome matrix other than the reference.
X (ndarray, list, vector) – 2-dimensional design matrix of n observations for b covariates (n-by-b).
y (ndarray, list, vector) – 2-dimensional indicator matrix of n observed outcomes, with one column per category and the referent category in the first column (n-by-k).
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.
offset (ndarray, list, vector, None, optional) – A 1-dimensional offset to be included in the model. Default is None, which applies no offset term.
- Returns
Returns a (b \(\times\) (k-1))-by-n NumPy array evaluated for the input theta.
- Return type
array
Examples
Construction of an estimating equation with ee_mlogit should be done similar to the following
>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_mlogit
Some generic data to estimate a multinomial logistic regression model
>>> d = pd.DataFrame()
>>> d['W'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
>>> d['Y'] = [1, 1, 1, 1, 2, 2, 3, 3, 3, 1, 2, 2, 3, 3]
>>> d['C'] = 1
First, notice that Y needs to be pre-processed for use with ee_mlogit. To prepare the data, we need to convert d['Y'] into a matrix of indicator variables. We can do this manually by
>>> d['Y1'] = np.where(d['Y'] == 1, 1, 0)
>>> d['Y2'] = np.where(d['Y'] == 2, 1, 0)
>>> d['Y3'] = np.where(d['Y'] == 3, 1, 0)
This can also be accomplished with pd.get_dummies(d['Y'], drop_first=False).
For the reference category, we want to have Y=1 as the reference. Therefore, Y1 will be the first column in y. The pair of matrices are
>>> y = d[['Y1', 'Y2', 'Y3']]
>>> X = d[['C', 'W']]
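The manual construction and the pd.get_dummies route described above should yield the same indicator matrix. The following is a small self-contained check, written as a sketch that rebuilds only the Y column from the example; the names manual, dummies, row_totals, and same are illustrative and not part of delicatessen.

```python
import numpy as np
import pandas as pd

# Outcome column from the example data
d = pd.DataFrame()
d['Y'] = [1, 1, 1, 1, 2, 2, 3, 3, 3, 1, 2, 2, 3, 3]

# Manual indicator columns, one per category, with the reference (Y=1) first
manual = np.column_stack([np.where(d['Y'] == level, 1, 0) for level in (1, 2, 3)])

# Same matrix via pandas; drop_first=False keeps the reference column
dummies = pd.get_dummies(d['Y'], drop_first=False).astype(int)

# Each row should contain exactly one 1 (one observed category per observation)
row_totals = manual.sum(axis=1)

# True when both constructions agree element by element
same = np.array_equal(manual, dummies.to_numpy())
```

Since get_dummies orders its columns by sorted category value, choosing a referent other than the smallest value would likely require reordering the columns so that the referent comes first.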
Defining psi, or the stacked estimating equations
>>> def psi(theta):
...     return ee_mlogit(theta, X=X, y=y)
Calling the M-estimator (note that init requires 4 values, since X.shape[1] is 2 and y.shape[1] is 3, giving 2 \(\times\) (3-1) = 4).
>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0., 0.])
>>> estr.estimate()
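The count of starting values follows directly from the shapes of the two matrices. As a minimal sketch of that arithmetic (the variable names n_covariates, n_categories, and n_params are illustrative, and the shapes are hard-coded from the example rather than read from X and y):

```python
import numpy as np

n_covariates = 2   # X.shape[1] in the example (columns C and W)
n_categories = 3   # y.shape[1] in the example (columns Y1, Y2, Y3)

# One coefficient per covariate for each non-reference category
n_params = n_covariates * (n_categories - 1)

# A vector of zeros of the required length, usable as starting values
init = np.zeros(n_params).tolist()
```

With other data, the same expression applied to X.shape[1] and y.shape[1] gives the required length of init.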
Inspecting the parameter estimates, variance, and confidence intervals
>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()
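Because C and W saturate this small model, its solution can be written from the category counts within each level of W, which offers a hand check on the estimating equation. The sketch below is a plain NumPy re-implementation written for illustration only: the helper mlogit_scores and the names theta_closed, scores, and totals are hypothetical and are not delicatessen's code. It assumes the parameters are ordered as the coefficients for category 2 followed by the coefficients for category 3.

```python
import numpy as np

# Example data: intercept C, binary covariate W, outcome Y in {1, 2, 3}
W = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
Y = np.array([1, 1, 1, 1, 2, 2, 3, 3, 3, 1, 2, 2, 3, 3])
X = np.column_stack([np.ones_like(W), W])                        # n-by-b design matrix
y = np.column_stack([(Y == r).astype(int) for r in (1, 2, 3)])   # n-by-k indicators


def mlogit_scores(theta, X, y):
    """Hypothetical re-implementation of the estimating functions (illustration only)."""
    n, b = X.shape
    k = y.shape[1]
    # Assumed ordering: b coefficients for category 2, then b for category 3, ...
    coefs = np.asarray(theta, dtype=float).reshape(k - 1, b)
    exp_lin = np.exp(X @ coefs.T)                       # n-by-(k-1)
    denom = 1.0 + exp_lin.sum(axis=1, keepdims=True)    # reference contributes exp(0) = 1
    resid = y[:, 1:] - exp_lin / denom                  # observed minus predicted
    # Stack residual times covariates for each non-reference category
    return np.vstack([(resid[:, [r]] * X).T for r in range(k - 1)])


# Closed-form values from the category counts within W=0 and W=1
theta_closed = [np.log(2 / 4), np.log(2 / 1) - np.log(2 / 4),   # category 2 vs 1
                np.log(3 / 4), np.log(2 / 1) - np.log(3 / 4)]   # category 3 vs 1

scores = mlogit_scores(theta_closed, X, y)   # (b * (k-1))-by-n array
totals = scores.sum(axis=1)                  # one sum per estimating equation
```

Each entry of totals is zero up to floating-point error, so theta_closed solves the stacked estimating equations for these data; the values in estr.theta would be expected to be close to it.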
Here, the first two values of theta correspond to Y2 and the last two values of theta correspond to Y3.
A weighted multinomial logistic regression can be implemented by specifying the weights argument. An offset can be added by specifying the offset argument.
References
Kwak C & Clayton-Matthews A. (2002). Multinomial logistic regression. Nursing Research, 51(6), 404-410.