delicatessen.estimating_equations.regression.ee_mlogit
- ee_mlogit(theta, X, y, weights=None, offset=None)
Estimating equation for multinomial logistic regression. This estimating equation supports unranked categorical outcome data, unlike ee_regression and ee_glm.

Unlike the other regression estimating equations, ee_mlogit expects a matrix of indicators for each possible value of y, with the first column used as the referent category. In other words, the outcome variable is a matrix of dummy variables that includes the reference. The estimating equation for column \(r\) of the indicator matrix, \(Y_{r}\), of a \(Y\) with \(k\) unique categories is

\[\sum_{i=1}^n \left\{ Y_{r,i} - \frac{\exp(X_i^T \theta_r)}{1 + \sum_{j=2}^{k} \exp(X_i^T \theta_j)} \right\} X_i = 0\]

where \(\theta_r\) are the coefficients for the log odds ratios comparing \(Y_r\) to the referent category of \(Y\). Here, \(\theta\) is a 1-by-(b \(\times\) (k-1)) array, where b is the number of distinct covariates included as part of X. So, the stack of estimating equations consists of (k-1) estimating equations, each of the dimension of \(X_i\). For example, if X is an n-by-3 design matrix and \(Y\) has three unique categories, then \(\theta\) will be a 1-by-6 array.

- Parameters
theta (ndarray, list, vector) – Theta in this case consists of b \(\times\) (k-1) values. Therefore, initial values should consist of one value for each column of the design matrix, for each category of the outcome matrix besides the reference.
X (ndarray, list, vector) – 2-dimensional design matrix of n observed covariates for b variables.
y (ndarray, list, vector) – 2-dimensional indicator matrix of n observed outcomes.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.
offset (ndarray, list, vector, None, optional) – A 1-dimensional offset to be included in the model. Default is None, which applies no offset term.
- Returns
Returns a (b \(\times\) (k-1))-by-n NumPy array evaluated for the input theta.
- Return type
array
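To make the shapes concrete, the stacked estimating functions above can be sketched directly in NumPy. This is a minimal illustration with made-up data, not the library's implementation; the function name mlogit_ef is hypothetical.

```python
import numpy as np

def mlogit_ef(theta, X, y):
    """Sketch of the stacked multinomial logit estimating functions.

    theta : flat array of b*(k-1) coefficients (categories 2..k of y)
    X     : n-by-b design matrix
    y     : n-by-k indicator matrix (first column is the referent)
    Returns a (b*(k-1))-by-n array, one row per parameter.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n, b = X.shape
    k = y.shape[1]
    beta = np.asarray(theta).reshape(k - 1, b)     # one row per non-referent category
    exp_lp = np.exp(X @ beta.T)                    # n-by-(k-1): exp(X_i^T theta_r)
    denom = 1 + exp_lp.sum(axis=1, keepdims=True)  # 1 + sum_j exp(X_i^T theta_j)
    resid = y[:, 1:] - exp_lp / denom              # n-by-(k-1): Y_{r,i} - Pr(Y_i = r)
    # Stack (Y_{r,i} - Pr) * X_i for each non-referent category r
    return np.vstack([(resid[:, [r]] * X).T for r in range(k - 1)])

# Example shapes: n=5 observations, b=2 covariates, k=3 categories
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=5)])
y = np.eye(3)[rng.integers(0, 3, size=5)]
ee = mlogit_ef(np.zeros(4), X, y)
print(ee.shape)  # (4, 5): b*(k-1) rows by n columns
```

Summing each row of the returned array over observations and solving for the roots is what the M-estimator does internally.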
Examples
Construction of an estimating equation with ee_mlogit should be done similar to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_mlogit
Some generic data to estimate a multinomial logistic regression model
>>> d = pd.DataFrame()
>>> d['W'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
>>> d['Y'] = [1, 1, 1, 1, 2, 2, 3, 3, 3, 1, 2, 2, 3, 3]
>>> d['C'] = 1
First, notice that Y needs to be pre-processed for use with ee_mlogit. To prepare the data, we need to convert d['Y'] into a matrix of indicator variables. We can do this manually by

>>> d['Y1'] = np.where(d['Y'] == 1, 1, 0)
>>> d['Y2'] = np.where(d['Y'] == 2, 1, 0)
>>> d['Y3'] = np.where(d['Y'] == 3, 1, 0)
This can also be accomplished with pd.get_dummies(d['Y'], drop_first=False). For the reference category, we want to have Y=1 as the reference. Therefore, Y1 will be the first column in y. The pair of matrices are

>>> y = d[['Y1', 'Y2', 'Y3']]
>>> X = d[['C', 'W']]
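As a sketch of the pd.get_dummies alternative mentioned above: the column order matters, since ee_mlogit treats the first column as the referent, so check (or reorder) the columns it returns.

```python
import pandas as pd

d = pd.DataFrame({'Y': [1, 1, 1, 1, 2, 2, 3, 3, 3, 1, 2, 2, 3, 3]})
# get_dummies returns one indicator column per level, in sorted order,
# so Y=1 is already the first (referent) column here; reorder explicitly
# if a different referent is desired. astype(int) converts the boolean
# dummies of newer pandas versions to 0/1 integers.
y = pd.get_dummies(d['Y'], drop_first=False).astype(int)
print(list(y.columns))  # [1, 2, 3]
```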
Defining psi, or the stacked estimating equations
>>> def psi(theta):
>>>     return ee_mlogit(theta, X=X, y=y)
Calling the M-estimator (note that init requires 4 values, since X.shape[1] is 2 and y.shape[1] is 3).

>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0., 0.])
>>> estr.estimate()
Inspecting the parameter estimates, variance, and confidence intervals
>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()
Here, the first two values of theta correspond to Y2 and the last two values of theta correspond to Y3.
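Since each block of coefficients is on the log odds ratio scale (relative to the referent category), point estimates can be exponentiated for reporting. A minimal sketch with illustrative values (not fitted estimates):

```python
import numpy as np

# Illustrative (not fitted) point estimates on the log odds ratio scale,
# ordered as [intercept_Y2, W_Y2, intercept_Y3, W_Y3]
theta_hat = np.array([-0.75, 1.10, -0.20, 0.55])
odds_ratios = np.exp(theta_hat)
# odds_ratios[1] is the odds ratio for W comparing Y=2 to Y=1;
# odds_ratios[3] is the odds ratio for W comparing Y=3 to Y=1
```

The same transformation can be applied element-wise to the confidence interval limits.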
A weighted multinomial logistic regression can be implemented by specifying the weights argument. An offset can be added by specifying the offset argument.

References
Kwak C & Clayton-Matthews A. (2002). Multinomial logistic regression. Nursing Research, 51(6), 404-410.