delicatessen.utilities.additive_design_matrix

additive_design_matrix(X, specifications, return_penalty=False)

Generate an additive design matrix for generalized additive models (GAM) given a set of spline specifications to apply.

Note

This function is internally called by ee_additive_regression. It can also be called directly to build the design matrix needed for generating predicted values.
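The prediction use case rests on one idea. A new design matrix built from new X values with the same specifications (in particular, the same knots) has columns that line up with the fitted coefficients, so predictions reduce to a matrix product. The sketch below illustrates that idea without calling delicatessen. The helpers spline_terms and build_design are hypothetical, the basis is an assumed truncated power construction, and the coefficients come from a textbook penalized least squares solve, (X'X + diag(\(\lambda\)))^{-1} X'y, which may scale the penalty differently from ee_additive_regression.

```python
import numpy as np


def spline_terms(x, knots, power=3, natural=True):
    # Hypothetical truncated power basis: max(x - knot, 0) ** power per knot
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    knots = np.sort(np.asarray(knots, dtype=float))
    terms = np.maximum(x - knots, 0.0) ** power
    if natural:
        # Assumed restriction: subtract the last knot's column (K knots -> K - 1 columns)
        terms = terms[:, :-1] - terms[:, [-1]]
    return terms


def build_design(x, knots):
    # Hypothetical helper: intercept, linear term, then spline terms from FIXED knots
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, spline_terms(x, knots)])


rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=300)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=x.size)
knots = np.linspace(-1.8, 1.8, 8)

# Fit: penalized least squares, penalty on the spline columns only
X_train = build_design(x, knots)
lam = np.r_[0.0, 0.0, np.full(X_train.shape[1] - 2, 0.1)]
beta = np.linalg.solve(X_train.T @ X_train + np.diag(lam), X_train.T @ y)

# Predict: rebuild the design matrix for new x values with the SAME knots
x_new = np.linspace(-2, 2, 5)
X_new = build_design(x_new, knots)
y_hat = X_new @ beta
print(y_hat.round(2))
```

Reusing the training knots is the essential step; recomputing knots from the new x values would change what each column means and make beta meaningless for the new rows.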

Parameters
  • X (ndarray, vector, list) – Input independent variable data.

  • specifications (ndarray, vector, list) – A list of dictionaries, one entry per column of X, that define the hyperparameters for the spline (e.g., number of knots, strength of penalty). For terms that should not have splines, None should be specified instead (see examples below). Each dictionary supports the following keys: “knots”, “natural”, “power”, “penalty”, and “normalized”.
      ◦ knots (list): the knot positions; knots are placed at exactly the given locations. There is no default, so knots must be specified by the user.
      ◦ natural (bool): whether to generate natural (restricted) or unrestricted splines. Default is True, which corresponds to natural splines.
      ◦ power (float): the power to which the spline terms are raised. Default is 3, which corresponds to cubic splines.
      ◦ penalty (float): penalty term (\(\lambda\)) applied to each corresponding spline basis term. Default is 0, which applies no penalty to the spline basis terms.
      ◦ normalized (bool): whether to normalize the spline terms. Default is False; this default is planned to change with the v3.0 release.

  • return_penalty (bool, optional) – Whether the list of the corresponding penalty terms should also be returned. This is used internally to build the list of penalty terms passed to the ridge regression model, in which only the spline terms are penalized. Default is False.
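To make the “knots”, “natural”, and “power” keys more concrete, the following is a minimal sketch of one common spline construction, the truncated power basis, in which each knot contributes the term max(x - knot, 0) raised to the chosen power. The helper truncated_power_basis is hypothetical, and the restriction used for natural=True (subtracting the last knot's term, which leaves one fewer column than knots) is an assumption made for illustration; delicatessen's internal construction may differ.

```python
import numpy as np


def truncated_power_basis(x, knots, power=3, natural=True):
    """Hypothetical sketch of a truncated power spline basis (not delicatessen code)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    knots = np.sort(np.asarray(knots, dtype=float))
    # max(x - k, 0) ** power for every observation (rows) and knot (columns)
    terms = np.maximum(x - knots, 0.0) ** power
    if natural:
        # Assumed restriction: subtract the last knot's column from the others
        terms = terms[:, :-1] - terms[:, [-1]]
    return terms


x = np.linspace(-3, 3, 7)
knots = [-2, -1, 0, 1, 2]
print(truncated_power_basis(x, knots).shape)                 # (7, 4)
print(truncated_power_basis(x, knots, natural=False).shape)  # (7, 5)
```

Because each column is zero to the left of its knot, a term only affects observations above that knot, which is what lets the penalty shrink local wiggles without touching the original linear column.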

Returns

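Returns an n-by-(b+k) design matrix as a NumPy array, where n is the number of observations (rows) in X, b is the number of columns in the input array, and k is the total number of spline basis columns implied by the specifications. If return_penalty=True, the list of corresponding penalty terms is also returned.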

Return type

array
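The shape of the returned array can be illustrated with a rough, hypothetical re-creation of the documented behaviour written in plain NumPy (additive_design_sketch and spline_terms are not part of delicatessen). It assumes a truncated power construction with K - 1 columns per natural spline, and it appends all spline columns after the original columns; the basis and column order used by the library may differ. The penalty list mirrors the return_penalty description: zero for the original columns and the specified \(\lambda\) for each spline column.

```python
import numpy as np


def spline_terms(x, knots, power=3, natural=True):
    # Hypothetical truncated power terms max(x - k, 0) ** power, one column per knot
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    knots = np.sort(np.asarray(knots, dtype=float))
    terms = np.maximum(x - knots, 0.0) ** power
    if natural:
        # Assumed restriction: subtract the last knot's column (K - 1 columns)
        terms = terms[:, :-1] - terms[:, [-1]]
    return terms


def additive_design_sketch(X, specifications, return_penalty=False):
    """Hypothetical re-creation of the documented behaviour, not the library code."""
    X = np.asarray(X, dtype=float)
    blocks = [X]                       # the b original columns are kept as-is
    penalties = [0.0] * X.shape[1]     # original columns are not penalized
    for j, spec in enumerate(specifications):
        if spec is None:               # no spline basis for column j
            continue
        basis = spline_terms(X[:, j], spec["knots"],
                             power=spec.get("power", 3),
                             natural=spec.get("natural", True))
        blocks.append(basis)
        penalties += [spec.get("penalty", 0.0)] * basis.shape[1]
    design = np.hstack(blocks)         # n-by-(b + k)
    if return_penalty:
        return design, penalties
    return design


rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
specs = [None, {"knots": np.linspace(-1.5, 1.5, 20), "penalty": 10}]
design, penalty = additive_design_sketch(X, specs, return_penalty=True)
print(design.shape)    # (200, 21)
print(len(penalty))    # 21
```

Here b = 2 and a natural spline with 20 knots adds k = 19 columns under the stated assumption, giving 21 columns in total.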

Examples

A design matrix for an additive model can be constructed as follows.

>>> import numpy as np
>>> import pandas as pd
>>> from delicatessen.utilities import additive_design_matrix

Some generic data to estimate a generalized additive model

>>> n = 200
>>> d = pd.DataFrame()
>>> d['X'] = np.random.normal(size=n)
>>> d['Z'] = np.random.normal(size=n)
>>> d['W'] = np.random.binomial(n=1, p=0.5, size=n)
>>> d['C'] = 1

To begin, consider the simple input design matrix d[['C', 'X']], which consists of an intercept term and a continuous term. Here, we specify a natural spline with 20 knots and a penalty of 10 for the second term only:

>>> x_knots = np.linspace(np.min(d['X'])+0.1, np.max(d['X'])-0.1, 20)
>>> specs = [None, {"knots": x_knots, "penalty": 10}]
>>> Xa_design = additive_design_matrix(X=d[['C', 'X']], specifications=specs)
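The knot locations in this example are 20 evenly spaced points pulled 0.1 inside the observed minimum and maximum of X, so some data always lie beyond the outermost knots. The short check below (using a seeded generator, an addition for reproducibility) confirms this, and also shows quantile-based placement with np.quantile, a common alternative that the example above does not use.

```python
import numpy as np

rng = np.random.default_rng(2)        # seeded so the check is reproducible
x = rng.normal(size=200)

# 20 evenly spaced knots, pulled 0.1 inside the observed minimum and maximum
x_knots = np.linspace(np.min(x) + 0.1, np.max(x) - 0.1, 20)

# Number of observations beyond each of the outermost knots
print(np.sum(x < x_knots[0]), np.sum(x > x_knots[-1]))

# Alternative (not used above): knots at the 5th to 95th percentiles of x
q_knots = np.quantile(x, np.linspace(0.05, 0.95, 20))
```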

Other optional specifications are also available. Here, we will specify an unrestricted quadratic spline with a penalty of 5.5 for the second column of the design matrix.

>>> specs = [None, {"knots": [-2, -1, 0, 1, 2], "natural": False, "power": 2, "penalty": 5.5}]
>>> Xa_design = additive_design_matrix(X=d[['C', 'X']], specifications=specs)
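With “natural”: False and “power”: 2, and assuming a truncated power construction, each of the five knots contributes its own column max(x - knot, 0)^2, and the penalty of 5.5 would apply to those columns. A few hand-picked x values make the columns easy to check by hand; this is an illustration of the assumed basis, not output from delicatessen.

```python
import numpy as np

knots = np.array([-2, -1, 0, 1, 2], dtype=float)
x = np.array([-2.5, -0.5, 0.5, 2.5])

# Unrestricted quadratic truncated power terms (assumed construction):
# one column per knot, each equal to max(x - knot, 0) ** 2
terms = np.maximum(x[:, None] - knots[None, :], 0.0) ** 2
print(terms)
```

Each row switches on one more column as x passes another knot, and an observation below the first knot (x = -2.5) gets all zeros.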

Now consider the input design matrix d[['C', 'X', 'Z', 'W']], which consists of an intercept, two continuous terms, and a binary (categorical) term. Here, we specify splines for both continuous terms:

>>> x_knots = np.linspace(np.min(d['X'])+0.1, np.max(d['X'])-0.1, 20)
>>> z_knots = np.linspace(np.min(d['Z'])+0.1, np.max(d['Z'])-0.1, 10)
>>> specs = [None,                              # Intercept term
...          {"knots": x_knots, "penalty": 25}, # X (continuous)
...          {"knots": z_knots, "penalty": 15}, # Z (continuous)
...          None]                              # W (categorical)
>>> Xa_design = additive_design_matrix(X=d[['C', 'X', 'Z', 'W']], specifications=specs)

Notice that the two continuous terms have different spline specifications (different numbers of knots and different penalties).
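One way to see how the two specifications differ is to count the columns and penalties each would add. The loop below assumes that a natural spline with K knots contributes K - 1 columns and that the spline columns follow the four original columns; both are assumptions about the implementation rather than documented guarantees. The knot vectors are stand-ins for the data-based knots above.

```python
import numpy as np

x_knots = np.linspace(-2, 2, 20)   # stand-ins for the data-based knots above
z_knots = np.linspace(-2, 2, 10)
specs = [None,                               # Intercept term
         {"knots": x_knots, "penalty": 25},  # X (continuous)
         {"knots": z_knots, "penalty": 15},  # Z (continuous)
         None]                               # W (categorical)

penalty = [0.0] * len(specs)                 # the 4 original columns: unpenalized
for spec in specs:
    if spec is None:
        continue
    # Assumption: a natural (default) spline gives one column fewer than its knots
    n_cols = len(spec["knots"]) - (1 if spec.get("natural", True) else 0)
    penalty += [spec.get("penalty", 0.0)] * n_cols

print(len(penalty))   # 32 = 4 + 19 + 9
```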

Finally, we could opt to generate a spline basis for only one of the continuous variables:

>>> specs = [None,                              # Intercept term
...          {"knots": x_knots, "penalty": 25}, # X (continuous)
...          None,                              # Z (continuous)
...          None]                              # W (categorical)
>>> Xa_design = additive_design_matrix(X=d[['C', 'X', 'Z', 'W']], specifications=specs)

Splines can be specified and combined in a variety of ways. Whether a column receives a spline basis is determined by the object type at its position in the specification list (None or a dictionary), and the form of that basis is determined by the keys of the dictionary.
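The dispatch rule described above can be sketched as a small hypothetical helper, resolve_specification, that treats None as “no spline” and fills in the documented defaults for a dictionary (natural=True, power=3, penalty=0, normalized=False), raising an error when knots are missing. Whether delicatessen validates its input in exactly this way is an assumption.

```python
def resolve_specification(spec):
    """Hypothetical helper: apply the documented defaults to one specification entry."""
    if spec is None:                      # column enters the design matrix unchanged
        return None
    if not isinstance(spec, dict):
        raise TypeError("each specification must be None or a dict")
    if "knots" not in spec:               # knots have no default
        raise ValueError("'knots' must be specified")
    defaults = {"natural": True, "power": 3, "penalty": 0, "normalized": False}
    return {**defaults, **spec}           # user-supplied keys override defaults


specs = [None, {"knots": [-1, 0, 1], "power": 2}]
resolved = [resolve_specification(s) for s in specs]
print(resolved)
```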