delicatessen.estimating_equations.regression.ee_beta_regression

ee_beta_regression(theta, X, y, weights=None, offset=None)

Estimating equations for a beta regression model. These estimating equations support continuous outcome data bounded within \((0,1)\). Here, the mean–precision parameterization of beta regression is used, with the parameters for the beta distribution defined as

\[\alpha = \mu \phi, \qquad \beta = (1 - \mu) \phi, \qquad \phi > 0\]

where \(\mu = g^{-1}(X_i \eta^T)\) is the regression model and \(g^{-1}\) is the inverse link function. The corresponding estimating equations for beta regression are

\[\begin{split}\sum_{i=1}^n \begin{bmatrix} \mu (1-\mu) \phi \left\{ \text{logit}(Y_i) - \dot{\gamma}(\mu \phi) + \dot{\gamma}((1-\mu)\phi)\right\} X_i^T \\ \dot{\gamma}(\phi) - \mu \dot{\gamma}(\mu\phi) - (1-\mu)\dot{\gamma}((1-\mu)\phi) + \mu \log(Y_i) + (1-\mu)\log(1-Y_i) \end{bmatrix} = 0\end{split}\]

where \(\dot{\gamma}\) denotes the digamma function. Here, \(\theta\) is a 1-by-(b \(+\) 1) array, where b is the number of distinct covariates included as part of X and the additional element is the precision parameter. For example, if X is an n-by-3 matrix, then \(\theta\) will be a 1-by-4 array.
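The score of the beta log-likelihood can be checked numerically. The following is a minimal sketch, not the library's implementation: it assumes a logit link (so that \(d\mu / d\eta = \mu(1-\mu)\)), and the helper names beta_loglik and beta_score, the coefficient values, and the value of \(\phi\) are all hypothetical. It compares the analytic derivatives of the log-density under the mean–precision parameterization against central finite differences.

```python
import numpy as np
from scipy.special import digamma, expit, gammaln, logit


def beta_loglik(coefs, phi, X, y):
    """Sum of beta log-densities with alpha = mu*phi and beta = (1-mu)*phi."""
    mu = expit(X @ coefs)                      # logit link is an assumption
    alpha, beta_shape = mu * phi, (1.0 - mu) * phi
    return np.sum(gammaln(phi) - gammaln(alpha) - gammaln(beta_shape)
                  + (alpha - 1.0) * np.log(y)
                  + (beta_shape - 1.0) * np.log(1.0 - y))


def beta_score(coefs, phi, X, y):
    """Per-observation derivatives of the log-density, as a (b+1)-by-n array."""
    mu = expit(X @ coefs)
    alpha, beta_shape = mu * phi, (1.0 - mu) * phi
    # Rows for the regression coefficients
    resid = logit(y) - digamma(alpha) + digamma(beta_shape)
    row_coefs = (mu * (1.0 - mu) * phi * resid)[:, None] * X
    # Row for the precision parameter phi
    row_phi = (digamma(phi) - mu * digamma(alpha)
               - (1.0 - mu) * digamma(beta_shape)
               + mu * np.log(y) + (1.0 - mu) * np.log(1.0 - y))
    return np.vstack([row_coefs.T, row_phi])


# Illustrative data: an intercept column and one binary covariate
W = np.array([0] * 9 + [1] * 5, dtype=float)
X = np.column_stack([np.ones_like(W), W])
y = np.array([0.1, 0.2, 0.7, 0.11, 0.3, 0.4, 0.65,
              0.01, 0.14, 0.9, 0.8, 0.56, 0.99, 0.82])

params = np.array([-0.5, 1.0, 3.0])            # hypothetical (coefs, phi)
step = 1e-6
numeric = np.zeros(3)
for j in range(3):
    up, down = params.copy(), params.copy()
    up[j] += step
    down[j] -= step
    numeric[j] = (beta_loglik(up[:2], up[2], X, y)
                  - beta_loglik(down[:2], down[2], X, y)) / (2 * step)

score = beta_score(params[:2], params[2], X, y)
print(score.shape)                             # one row per parameter, one column per observation
print(np.allclose(score.sum(axis=1), numeric, atol=1e-5))
```

Summing the columns gives the quantity that is set to zero when solving for the parameters.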

Parameters

theta (array) – Theta in this case consists of b+1 values. Therefore, initial values should consist of one more value than the number of columns in X. This can easily be implemented by [0., ] * X.shape[1] + [0., ].
X (ndarray, list, vector) – 2-dimensional vector of n observed values for b variables.
y (ndarray, list, vector) – 1-dimensional vector of n observed values.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.
offset (ndarray, list, vector, None, optional) – A 1-dimensional offset to be included in the model. Default is None, which applies no offset term.

Returns

Returns a (b+1)-by-n NumPy array evaluated for the input theta.

Return type

array

Examples

Construction of the estimating equation(s) with ee_beta_regression should be done similarly to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_beta_regression

Some generic data to estimate a beta regression model with

>>> d = pd.DataFrame()
>>> d['W'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
>>> d['Y'] = [0.1, 0.2, 0.7, 0.11, 0.3, 0.4, 0.65, 0.01, 0.14, 0.9, 0.8, 0.56, 0.99, 0.82]
>>> d['C'] = 1

>>> y = d['Y']
>>> X = d[['C', 'W']]

Defining psi, or the stacked estimating equations

>>> def psi(theta):
...     return ee_beta_regression(theta, X=X, y=y)

Calling the M-estimator (note that init requires 3 values, since X.shape[1] is 2).

>>> estr = MEstimator(stacked_equations=psi, init=[0., 0., 0.])
>>> estr.estimate()

Inspecting the parameter estimates, variance, and confidence intervals

>>> estr.theta
>>> estr.variance
>>> estr.confidence_intervals()

Here, the last value of theta corresponds to the precision parameter (on the natural log scale), and the preceding values (one per column of X, so two in this example) correspond to the regression coefficients.
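As a rough cross-check that does not depend on delicatessen, the beta log-likelihood can be maximized directly on the same data. This is a sketch under stated assumptions: a logit link, the precision parameter on the natural log scale, and estimating equations equal to the score of this likelihood. The function name neg_loglik is hypothetical. If those assumptions hold, the point estimates should agree with the M-estimator up to numerical tolerance.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

# Same data as the example: intercept column C and binary covariate W
W = np.array([0] * 9 + [1] * 5, dtype=float)
X = np.column_stack([np.ones_like(W), W])
y = np.array([0.1, 0.2, 0.7, 0.11, 0.3, 0.4, 0.65,
              0.01, 0.14, 0.9, 0.8, 0.56, 0.99, 0.82])


def neg_loglik(theta):
    """Negative beta log-likelihood; theta = (coefficients..., log(phi))."""
    mu = expit(X @ theta[:-1])                 # logit link is an assumption
    phi = np.exp(theta[-1])                    # back-transform the log-scale precision
    alpha, beta_shape = mu * phi, (1.0 - mu) * phi
    loglik = (gammaln(phi) - gammaln(alpha) - gammaln(beta_shape)
              + (alpha - 1.0) * np.log(y)
              + (beta_shape - 1.0) * np.log(1.0 - y))
    return -np.sum(loglik)


# One starting value per column of X, plus one for log(phi)
init = np.zeros(X.shape[1] + 1)
fit = minimize(neg_loglik, x0=init, method="BFGS")

coefs, log_phi = fit.x[:-1], fit.x[-1]
print("regression coefficients:", coefs)
print("precision (natural scale):", np.exp(log_phi))
```

The last line shows the back-transformation: exponentiating the final element of the parameter vector returns the precision on its natural scale.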

Weighted beta regression can be implemented by specifying the weights argument. An offset can be added by specifying the offset argument.
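How weights and an offset could enter the estimating functions can be sketched with NumPy and SciPy alone. The function beta_ee_sketch below is a hypothetical stand-in, not the delicatessen implementation. It assumes a logit link, a log-scale precision parameter, weights that multiply each observation's contribution, and an offset that is added to the linear predictor. Under those assumptions, integer weights behave like replicated rows and a constant offset behaves like a shift of the intercept.

```python
import numpy as np
from scipy.special import digamma, expit, logit


def beta_ee_sketch(theta, X, y, weights=None, offset=None):
    """Hypothetical (b+1)-by-n beta regression estimating functions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    off = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)

    mu = expit(X @ np.asarray(theta[:-1], dtype=float) + off)  # assumed: offset on the linear predictor
    phi = np.exp(theta[-1])                                    # assumed: log-scale precision
    alpha, beta_shape = mu * phi, (1.0 - mu) * phi

    resid = logit(y) - digamma(alpha) + digamma(beta_shape)
    ee_coefs = (mu * (1.0 - mu) * phi * resid)[:, None] * X
    ee_phi = (digamma(phi) - mu * digamma(alpha)
              - (1.0 - mu) * digamma(beta_shape)
              + mu * np.log(y) + (1.0 - mu) * np.log(1.0 - y))
    # Assumed: each column (observation) is scaled by its weight
    return w * np.vstack([ee_coefs.T, ee_phi])


X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)
y = np.array([0.2, 0.4, 0.7, 0.9])
theta = [-0.3, 0.8, 0.5]                       # illustrative values only

# Integer weights versus physically replicating the rows
freq = np.array([2, 1, 1, 3])
weighted = beta_ee_sketch(theta, X, y, weights=freq).sum(axis=1)
replicated = beta_ee_sketch(theta, np.repeat(X, freq, axis=0),
                            np.repeat(y, freq)).sum(axis=1)
print(np.allclose(weighted, replicated))

# A constant offset versus shifting the intercept by the same amount
with_offset = beta_ee_sketch(theta, X, y, offset=np.full(4, 0.25))
shifted = beta_ee_sketch([-0.3 + 0.25, 0.8, 0.5], X, y)
print(np.allclose(with_offset, shifted))
```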

References

Ferrari S, & Cribari-Neto F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815.