delicatessen.estimating_equations.regression.ee_tobit

ee_tobit(theta, X, y, lower=None, upper=None, weights=None, offset=None)

Estimating equation for linear regression with censored outcomes. This estimating equation implements Tobit regression (Type I), which can be used for outcomes that are left-censored, right-censored, or both. Such data commonly arise with biological measurements that have limits of detection. The general estimating equation for the regression coefficients is

\[\sum_{i=1}^n \left\{ -I(Y_l = Y_i) \frac{\phi(\hat{Y}_i)}{\sigma \Phi(\hat{Y}_i)} + I(Y_l < Y_i < Y_u) \frac{Y_{i} - \bar{Y_i}}{\sigma^2} + I(Y_u = Y_i) \frac{\phi(\check{Y}_i)}{(1 - \Phi(\check{Y}_i))\sigma} \right\} X_i = 0\]

where \(Y_l\) is the lower limit, \(Y_u\) is the upper limit, \(\bar{Y_i} = X_i^T \beta\) is the linear predictor, \(\hat{Y}_i = (Y_l - \bar{Y_i})/\sigma\), \(\check{Y}_i = (Y_u - \bar{Y_i})/\sigma\), \(\phi\) is the standard normal probability density function, and \(\Phi\) is the standard normal cumulative distribution function. As this estimating equation shows, the Tobit model also requires estimating the scale parameter \(\sigma\), the standard deviation of the error term of the latent (uncensored) outcome. This additional parameter is estimated from both censored and uncensored observations using the following equation

\[\sum_{i=1}^n \left\{ -I(Y_l = Y_i) \frac{\hat{Y}_i \phi(\hat{Y}_i)}{\sigma \Phi(\hat{Y}_i)} + I(Y_l < Y_i < Y_u) \left(-\frac{1}{\sigma} + \frac{(Y_i - \bar{Y_i})^2}{\sigma^3} \right) + I(Y_u = Y_i) \frac{\check{Y}_i \phi(\check{Y}_i)}{\sigma (1 - \Phi(\check{Y}_i))} \right\} = 0\]
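To make the structure of these equations concrete, the following is a minimal sketch of the per-observation score of the Type I Tobit log-likelihood, differentiated directly with respect to \(\beta\) and \(\sigma\). It is an illustration only and not the implementation used by ee_tobit (which works with \(\log(\sigma)\)); the helper names tobit_loglik_i and tobit_score_i are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, provides pdf() and cdf()


def tobit_loglik_i(beta, sigma, x, y, lower=-math.inf, upper=math.inf):
    """Log-likelihood contribution of one observation (hypothetical helper)."""
    mu = sum(b * xj for b, xj in zip(beta, x))  # linear predictor X_i^T beta
    if y <= lower:  # left-censored at the lower limit
        return math.log(STD_NORMAL.cdf((lower - mu) / sigma))
    if y >= upper:  # right-censored at the upper limit
        return math.log(1 - STD_NORMAL.cdf((upper - mu) / sigma))
    # uncensored: normal density with mean mu and standard deviation sigma
    return math.log(STD_NORMAL.pdf((y - mu) / sigma) / sigma)


def tobit_score_i(beta, sigma, x, y, lower=-math.inf, upper=math.inf):
    """Score contribution of one observation (hypothetical helper).

    Returns (list of derivatives w.r.t. each beta_j, derivative w.r.t. sigma).
    """
    mu = sum(b * xj for b, xj in zip(beta, x))
    if y <= lower:
        z = (lower - mu) / sigma
        ratio = STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)
        d_mu = -ratio / sigma
        d_sigma = -z * ratio / sigma
    elif y >= upper:
        z = (upper - mu) / sigma
        ratio = STD_NORMAL.pdf(z) / (1 - STD_NORMAL.cdf(z))
        d_mu = ratio / sigma
        d_sigma = z * ratio / sigma
    else:
        resid = y - mu
        d_mu = resid / sigma**2
        d_sigma = -1 / sigma + resid**2 / sigma**3
    # The beta score is the derivative w.r.t. mu multiplied by X_i
    return [d_mu * xj for xj in x], d_sigma
```

Comparing tobit_score_i against a finite-difference derivative of tobit_loglik_i is a quick way to check the signs of each term.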

Note

For computational purposes, the parameter supplied in \(\theta\) is \(\log(\sigma)\) rather than \(\sigma\), which guarantees that \(\sigma = \exp(\log(\sigma))\) is always positive.

So, \(\theta = (\beta, \log(\sigma))\), which is a 1-by-(b+1) array, where b is the number of columns of X. For example, if X is an n-by-3 matrix, then \(\theta\) will be a 1-by-4 array. The code is general to allow for an arbitrary number of columns in X.

Parameters

theta (ndarray, list, vector) – Theta in this case consists of b+1 values: the b regression coefficients followed by \(\log(\sigma)\). Therefore, initial values should consist of one more value than the number of columns present in the design matrix.
X (ndarray, list, vector) – 2-dimensional design matrix of n observed covariates for b variables.
y (ndarray, list, vector) – 1-dimensional vector of n observed outcomes, with censored values recorded at the corresponding limit.
lower (float, int, None, optional) – Lower limit of the measurement. Default is None, which corresponds to -np.inf.
upper (float, int, None, optional) – Upper limit of the measurement. Default is None, which corresponds to np.inf.
weights (ndarray, list, vector, None, optional) – 1-dimensional vector of n weights. Default is None, which assigns a weight of 1 to all observations.
offset (ndarray, list, vector, None, optional) – A 1-dimensional offset to be included in the model. Default is None, which applies no offset term.
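As an illustration of how the lower and upper defaults behave, the sketch below classifies each outcome as left-censored, uncensored, or right-censored, treating None as an infinite limit. The helper censoring_indicators is hypothetical and only mirrors the documented defaults; it is not taken from the library's source.

```python
import math


def censoring_indicators(y, lower=None, upper=None):
    """Label each outcome by its censoring status (hypothetical helper).

    None limits are treated as -inf / +inf, mirroring the documented
    defaults, so no observation is censored in that direction.
    """
    lo = -math.inf if lower is None else lower
    hi = math.inf if upper is None else upper
    labels = []
    for yi in y:
        if yi <= lo:
            labels.append("left")
        elif yi >= hi:
            labels.append("right")
        else:
            labels.append("uncensored")
    return labels


# Both limits, then the lower limit only
print(censoring_indicators([-2, 0.3, 3], lower=-2, upper=3))
print(censoring_indicators([-2, 0.3, 3], lower=-2))
```

With only lower specified, the value 3 is treated as an ordinary uncensored observation.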

Returns

Returns a (b+1)-by-n NumPy array evaluated for the input theta.

Return type

array

Examples

Construction of the estimating equation(s) with ee_tobit should be done similarly to the following

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen import MEstimator
>>> from delicatessen.estimating_equations import ee_tobit

Some generic data to estimate the regression model

>>> n = 500
>>> d = pd.DataFrame()
>>> d['X'] = np.random.normal(size=n)
>>> d['Z'] = np.random.normal(size=n)
>>> d['Y'] = 0.5 + 2*d['X'] - d['Z'] + np.random.normal(loc=0, size=n)
>>> d['Yc'] = np.clip(d['Y'], a_min=-2, a_max=3)
>>> d['C'] = 1

Note that C here is set to all 1’s. This will be the intercept in the regression. Here, we observe Yc which are the censored observations. Defining psi, or the stacked estimating equations

>>> def psi(theta):
...     return ee_tobit(theta=theta, X=d[['C', 'X', 'Z']], y=d['Yc'], lower=-2, upper=3)

Calling the M-estimator (note that init requires 4 values: 3 regression coefficients, since X.shape[1] is 3, plus log(sigma)).

>>> estr = MEstimator(psi, init=[np.mean(d['Yc']), 0, 0, 0])
>>> estr.estimate()

Inspecting the estimated parameters

>>> estr.theta[:-1]  # beta, or model coefficients
>>> estr.theta[-1]   # log(sigma), or the log of the residual standard deviation
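Because the last parameter is on the log scale, it usually needs to be back-transformed for reporting. The following is a minimal sketch using the delta method; the helper sigma_from_log is hypothetical, and it assumes the variance of the \(\log(\sigma)\) estimate is available (for example, as the last diagonal element of the estimated covariance matrix, assumed here to be stored in estr.variance).

```python
import math


def sigma_from_log(log_sigma, var_log_sigma):
    """Back-transform log(sigma) and its variance (hypothetical helper).

    Delta method: Var(exp(t)) is approximately exp(t)**2 * Var(t).
    """
    sigma = math.exp(log_sigma)
    se_sigma = sigma * math.sqrt(var_log_sigma)
    # A 95% interval built on the log scale and then exponentiated
    # is guaranteed to stay positive
    half_width = 1.96 * math.sqrt(var_log_sigma)
    ci = (math.exp(log_sigma - half_width), math.exp(log_sigma + half_width))
    return sigma, se_sigma, ci


# Illustrative inputs only, not results from the example above
sigma, se_sigma, ci = sigma_from_log(log_sigma=0.0, var_log_sigma=0.04)
```

In the example above, the inputs would be estr.theta[-1] and the corresponding variance term.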

For data that is only censored in one direction (left or right censoring), only the lower or upper limit should be specified. Weighted models can be estimated by specifying the optional weights argument.

References

Amemiya, T. (1984). Tobit models: A survey. Journal of Econometrics, 24(1-2), 3-61.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24-36.