delicatessen.utilities.additive_design_matrix
- additive_design_matrix(X, specifications, return_penalty=False)
Generate an additive design matrix for generalized additive models (GAM) given a set of spline specifications to apply.
Note
This function is internally called by ee_additive_regression. It can also be called directly to help generate predicted values.
- Parameters
X (ndarray, vector, list) – Input independent variable data.
specifications (ndarray, vector, list) – A list of dictionaries that define the hyperparameters for the splines (e.g., number of knots, strength of penalty). For terms that should not have splines, None should be specified instead (see examples below). Each dictionary supports the following parameters: “knots”, “natural”, “power”, “penalty”, “normalized”.
    knots (list): controls the position of the knots, with knots placed at the given locations. There is no default, so this must be specified by the user.
    natural (bool): controls whether to generate natural (restricted) or unrestricted splines. Default is True, which corresponds to natural splines.
    power (float): controls the power to raise the spline terms to. Default is 3, which corresponds to cubic splines.
    penalty (float): penalty term (\(\lambda\)) applied to each corresponding spline basis term. Default is 0, which applies no penalty to the spline basis terms.
    normalized (bool): whether to normalize the spline terms. Default is False; this default is slated to change with the v3.0 release.
return_penalty (bool, optional) – Whether the list of the corresponding penalty terms should also be returned. This functionality is used internally to create the list of penalty terms provided to the Ridge regression model, where only the spline terms are penalized. Default is False.
- Returns
Returns an n-by-(b+k) design matrix as a NumPy array, where n is the number of rows (observations) in the input array, b is the number of columns in the input array, and k is determined by the specifications of the spline basis functions.
- Return type
array
Examples
A design matrix for an additive model can be constructed similarly to the following
>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen.utilities import additive_design_matrix
Some generic data to estimate a generalized additive model
>>> n = 200
>>> d = pd.DataFrame()
>>> d['X'] = np.random.normal(size=n)
>>> d['Z'] = np.random.normal(size=n)
>>> d['W'] = np.random.binomial(n=1, p=0.5, size=n)
>>> d['C'] = 1
To begin, consider the simple input design matrix of d[['C', 'X']]. This initial design matrix consists of an intercept term and a continuous term. Here, we will specify a natural spline with 20 knots for the second term only
>>> x_knots = np.linspace(np.min(d['X'])+0.1, np.max(d['X'])-0.1, 20)
>>> specs = [None, {"knots": x_knots, "penalty": 10}]
>>> Xa_design = additive_design_matrix(X=d[['C', 'X']], specifications=specs)
Other optional specifications are also available. Here, we will specify an unrestricted quadratic spline with a penalty of 5.5 for the second column of the design matrix.
>>> specs = [None, {"knots": [-2, -1, 0, 1, 2], "natural": False, "power": 2, "penalty": 5.5}]
>>> Xa_design = additive_design_matrix(X=d[['C', 'X']], specifications=specs)
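To make the "natural" and "power" options more concrete, the following is a minimal NumPy sketch of a truncated power spline basis, using the same knots as above. It assumes the common truncated-power construction, and truncated_power_basis is a hypothetical helper written for illustration; it is not delicatessen's implementation, whose exact basis and normalization may differ.

```python
import numpy as np


def truncated_power_basis(x, knots, power=3, natural=True):
    """Illustrative truncated power spline basis (hypothetical helper).

    Each column is max(x - knot, 0) ** power. With natural=True, the term
    for the last knot is subtracted from the others and then dropped, which
    leaves one fewer column than there are knots.
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # One column per knot: zero left of the knot, (x - knot)^power to the right
    terms = np.maximum(x[:, None] - knots[None, :], 0.0) ** power
    if natural:
        # Assumed restriction: the leading power cancels beyond the last knot,
        # so the fitted curve is one degree lower in the upper tail
        return terms[:, :-1] - terms[:, [-1]]
    return terms


x = np.array([-3.0, -0.5, 1.5, 2.5])
knots = [-2, -1, 0, 1, 2]
unrestricted = truncated_power_basis(x, knots, power=2, natural=False)
restricted = truncated_power_basis(x, knots, power=2, natural=True)
print(unrestricted.shape)  # (4, 5): one column per knot
print(restricted.shape)    # (4, 4): one fewer column than knots
```

Under this construction, the number of added columns depends on both the number of knots and whether the restriction is applied, which is why k in the returned design matrix is determined by the specifications.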
Now consider the input design matrix of d[['C', 'X', 'Z', 'W']]. This initial design matrix consists of an intercept, two continuous terms, and a categorical term. Here, we will specify splines for both continuous terms
>>> x_knots = np.linspace(np.min(d['X'])+0.1, np.max(d['X'])-0.1, 20)
>>> z_knots = np.linspace(np.min(d['Z'])+0.1, np.max(d['Z'])-0.1, 10)
>>> specs = [None,                                # Intercept term
...          {"knots": x_knots, "penalty": 25},   # X (continuous)
...          {"knots": z_knots, "penalty": 15},   # Z (continuous)
...          None]                                # W (categorical)
>>> Xa_design = additive_design_matrix(X=d[['C', 'X', 'Z', 'W']], specifications=specs)
Notice that the two continuous terms have different spline specifications.
Finally, we could opt to only generate a spline basis for one of the continuous variables
>>> specs = [None,                                # Intercept term
...          {"knots": x_knots, "penalty": 25},   # X (continuous)
...          None,                                # Z (continuous)
...          None]                                # W (categorical)
>>> Xa_design = additive_design_matrix(X=d[['C', 'X', 'Z', 'W']], specifications=specs)
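Specifications of splines can be modified and combined in a variety of ways. The object type of each entry in the specification list (None or a dictionary) determines whether the corresponding column receives spline terms, and the contents of the dictionary determine the form of those terms.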
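As a rough illustration of how a specification list maps onto columns and penalty terms, the sketch below walks the list entry by entry. sketch_additive_design is a hypothetical stand-in, not the library function: the truncated power basis, the placement of spline columns directly after their source column, and the rows-as-observations layout are all assumptions made for this example.

```python
import numpy as np


def sketch_additive_design(X, specifications):
    """Hypothetical stand-in showing how a specification list could be applied."""
    X = np.asarray(X, dtype=float)
    columns, penalties = [], []
    for j, spec in enumerate(specifications):
        # Every input column is kept as-is and left unpenalized
        columns.append(X[:, j])
        penalties.append(0.0)
        if spec is None:
            continue  # None: no spline terms for this column
        knots = np.asarray(spec["knots"], dtype=float)  # required, no default
        power = spec.get("power", 3)                    # default: cubic
        # Truncated power terms, one per knot (assumed basis)
        terms = np.maximum(X[:, [j]] - knots[None, :], 0.0) ** power
        if spec.get("natural", True):
            # Assumed restriction: difference against the last knot's term
            terms = terms[:, :-1] - terms[:, [-1]]
        for t in range(terms.shape[1]):
            columns.append(terms[:, t])
            # Only the spline terms carry the penalty
            penalties.append(float(spec.get("penalty", 0)))
    return np.column_stack(columns), penalties


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50),                      # intercept
                     rng.normal(size=50),              # continuous
                     rng.binomial(1, 0.5, size=50)])   # binary
specs = [None, {"knots": [-1, 0, 1], "penalty": 10}, None]
design, penalties = sketch_additive_design(X, specs)
print(design.shape)  # (50, 5)
print(penalties)     # [0.0, 0.0, 10.0, 10.0, 0.0]
```

In this sketch the penalty list has one entry per output column, with zeros for the original columns, which mirrors the description of return_penalty above, where only the spline terms are penalized.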