{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# Colditz (1994): Meta-Analysis\n",
    "\n",
    "When there are multiple studies focused on the same question, one might be interested in meta-analysis, whereby the results of those multiple studies are integrated together to boost precision. To illustrate how `delicatessen` can be used to conduct meta-analyses, we re-examine data from Colditz et al. (1994), which used meta-analysis to estimate the efficacy of BCG vaccination for prevention of tuberculosis (TB). The corresponding data can be obtained from the R package `metafor`.\n",
    "\n",
    "## Setup"
   ],
   "id": "f7a69d7d18d61c7"
  },
  {
   "cell_type": "code",
   "id": "initial_id",
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.718497Z",
     "start_time": "2026-04-29T18:15:05.674405700Z"
    }
   },
   "source": [
    "import numpy as np\n",
    "import scipy as sp\n",
    "import pandas as pd\n",
    "import delicatessen as deli\n",
    "from delicatessen import MEstimator\n",
    "from delicatessen.estimating_equations import (ee_mean, ee_meta_random,\n",
    "                                               ee_regression, ee_meta_regression)\n",
    "\n",
    "print(\"Versions\")\n",
    "print(\"NumPy:        \", np.__version__)\n",
    "print(\"SciPy:        \", sp.__version__)\n",
    "print(\"Pandas:       \", pd.__version__)\n",
    "print(\"Delicatessen: \", deli.__version__)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Versions\n",
      "NumPy:         2.3.5\n",
      "SciPy:         1.16.3\n",
      "Pandas:        2.3.3\n",
      "Delicatessen:  4.2\n"
     ]
    }
   ],
   "execution_count": 1
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.746402400Z",
     "start_time": "2026-04-29T18:15:06.722118300Z"
    }
   },
   "cell_type": "code",
   "source": [
    "# Loading data from metafor\n",
    "d = pd.read_csv(\"data/colditz1994.csv\")"
   ],
   "id": "b1749d6f6d7f56c4",
   "outputs": [],
   "execution_count": 2
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "The provided data describes the events per arm. We talk this raw data and convert it into summary measures instead. Explicitly, we manually estimate the risk differences and risk ratios. In general, this is what would be available to conduct a meta-analysis. ",
   "id": "3f7ae0bedd5446dd"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.799360600Z",
     "start_time": "2026-04-29T18:15:06.747934Z"
    }
   },
   "cell_type": "code",
   "source": [
    "d['n'] = d['tpos'] + d['tneg'] + d['cpos'] + d['cneg']\n",
    "d['r1'] = d['tpos'] / (d['tpos'] + d['tneg'])\n",
    "d['r0'] = d['cpos'] / (d['cpos'] + d['cneg'])\n",
    "d['rd'] = d['r1'] - d['r0']\n",
    "d['var_rd'] = ((d['r1'] * (1 - d['r1'])) / (d['tpos'] + d['tneg'])\n",
    "               + (d['r0'] * (1 - d['r0'])) / (d['cpos'] + d['cneg']))\n",
    "d['rr'] = d['r1'] / d['r0']\n",
    "d['var_lrr'] = ((1 / d['tpos']) - (1 / (d['tpos'] + d['tneg']))\n",
    "                + (1 / d['cpos']) - (1 / (d['cpos'] + d['cneg'])))\n",
    "d['c'] = 1"
   ],
   "id": "d9dca509f4fea56a",
   "outputs": [],
   "execution_count": 3
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Common-Effect Meta-Analysis\n",
    "\n",
    "In the first case, we can simply take a weighted average of the point estimates. Here, we use the inverse of the estimated variance as the weight. This means that studies that are more precise (i.e., smaller variance), have a larger contribution to the overall estimate relative to studies that are imprecise. In meta-analysis, this approach is commonly referred to as common-effect or equal-effect models.\n",
    "\n",
    "As this is simply a weighted average of the point estimates, we can use the existing `ee_mean` function. Here, the risk difference is considered first"
   ],
   "id": "aa21da0adafb1184"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.812506600Z",
     "start_time": "2026-04-29T18:15:06.803505Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def psi_rd(theta):\n",
    "    return ee_mean(theta=theta, y=d['rd'], weights=1/d['var_rd'])"
   ],
   "id": "6c2592b89c0be30a",
   "outputs": [],
   "execution_count": 4
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.837527800Z",
     "start_time": "2026-04-29T18:15:06.814645800Z"
    }
   },
   "cell_type": "code",
   "source": [
    "estr = MEstimator(psi_rd, init=[0., ], finite_correction='hc1')\n",
    "estr.estimate()\n",
    "estr.print_results(decimals=3)"
   ],
   "id": "e1a12e429fe0a1f4",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==============================================================\n",
      "              Estimation Method: M-estimator\n",
      "--------------------------------------------------------------\n",
      "No. Observations:          13 | No. Parameters:              1\n",
      "Solving algorithm:         lm | Max Iterations:           5000\n",
      "Solving tolerance:      1e-09 | Allow P-Inverse:             1\n",
      "Derivative Method:     approx | Deriv Approx:            1e-09\n",
      "Small N Correction:       hc1 | Distribution:           t-stat\n",
      "==============================================================\n",
      "   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value \n",
      "--------------------------------------------------------------\n",
      "  -0.001    0.001   -1.405   -0.002    0.001    0.185    2.431 \n",
      "==============================================================\n"
     ]
    }
   ],
   "execution_count": 5
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "On the risk difference scale, we see little change by vaccination. In general, the risk of TB was low in the data so this is not surprising.\n",
    "\n",
    "We can also repeat this process for the risk ratio. However, we can no longer simply take the mean of the risk ratios for the meta-analysis. For the average, we want the point estimates to combine linearly. Therefore, we instead meta-analyze the *log risk ratios* and the variance of the *log risk ratios*. The following code applies this process in `delicatessen`"
   ],
   "id": "64042cca87a2ee21"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.882292Z",
     "start_time": "2026-04-29T18:15:06.848054300Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def psi_rr(theta):\n",
    "    return ee_mean(theta=theta, y=np.log(d['rr']), weights=1/d['var_lrr'])"
   ],
   "id": "3a17603e01989fdd",
   "outputs": [],
   "execution_count": 6
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.907771200Z",
     "start_time": "2026-04-29T18:15:06.883998600Z"
    }
   },
   "cell_type": "code",
   "source": [
    "estr = MEstimator(psi_rr, init=[0., ], finite_correction='hc1')\n",
    "estr.estimate()\n",
    "estr.print_results(decimals=3)"
   ],
   "id": "ee7b19fc2479211d",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==============================================================\n",
      "              Estimation Method: M-estimator\n",
      "--------------------------------------------------------------\n",
      "No. Observations:          13 | No. Parameters:              1\n",
      "Solving algorithm:         lm | Max Iterations:           5000\n",
      "Solving tolerance:      1e-09 | Allow P-Inverse:             1\n",
      "Derivative Method:     approx | Deriv Approx:            1e-09\n",
      "Small N Correction:       hc1 | Distribution:           t-stat\n",
      "==============================================================\n",
      "   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value \n",
      "--------------------------------------------------------------\n",
      "  -0.430    0.229   -1.878   -0.930    0.069    0.085    3.558 \n",
      "==============================================================\n"
     ]
    }
   ],
   "execution_count": 7
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "which corresponds to a risk ratio of 0.65. This example highlights how the different scales interact with the absolute risk in different ways.\n",
    "\n",
    "Here, we also apply the HC1 finite-sample correction for the variance since we have relatively few observations.\n",
    "\n",
    "## Random-Effect Meta-Analysis\n",
    "\n",
    "The previous common-effect approach assumes that all studies include in the meta-analysis share a risk ratio. This is unlikely to be the case. Instead, we can use a random-effects model which assumes that the included studies are drawn from a distribution of true effects. In other words, heterogeneity in the risk ratio between studies is allowed.\n",
    "\n",
    "This approach involves the estimation of an additional parameter corresponding to the heterogeneity between studies. Random-effect meta-analysis is enabled through the `ee_meta_random` estimating functions. Implicitly, the method being used to combine studies in the random-effects model is the Paule-Mandel method, which is equivalent to Empirical Bayes (Viechtbauer et al. 2015). Here, we apply this approach to meta-analyze the risk ratios"
   ],
   "id": "7f7770d7f23955d3"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.922082Z",
     "start_time": "2026-04-29T18:15:06.912526200Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def psi_mr(theta):\n",
    "    return ee_meta_random(theta=theta, point_est=np.log(d['rr']), var_est=d['var_lrr'])"
   ],
   "id": "56b000d2b8a61ca3",
   "outputs": [],
   "execution_count": 8
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.949403Z",
     "start_time": "2026-04-29T18:15:06.922082Z"
    }
   },
   "cell_type": "code",
   "source": [
    "estr = MEstimator(psi_mr, init=[np.mean(np.log(d['rr'])), 0.], finite_correction='hc1')\n",
    "estr.estimate()\n",
    "estr.print_results(decimals=3)"
   ],
   "id": "bd56f3739c714b4b",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==============================================================\n",
      "              Estimation Method: M-estimator\n",
      "--------------------------------------------------------------\n",
      "No. Observations:          13 | No. Parameters:              2\n",
      "Solving algorithm:         lm | Max Iterations:           5000\n",
      "Solving tolerance:      1e-09 | Allow P-Inverse:             1\n",
      "Derivative Method:     approx | Deriv Approx:            1e-09\n",
      "Small N Correction:       hc1 | Distribution:           t-stat\n",
      "==============================================================\n",
      "   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value \n",
      "--------------------------------------------------------------\n",
      "  -0.715    0.188   -3.801   -1.129   -0.301    0.003    8.411 \n",
      "  -1.145    0.270   -4.244   -1.740   -0.551    0.001    9.501 \n",
      "==============================================================\n"
     ]
    }
   ],
   "execution_count": 9
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "Here, we see a stronger risk ratio (`exp(-0.715)=0.50`) that seems substantially different from the null (based on the S-value). Here also see that the second parameter, which is the log-transformed heterogeneity parameter is far from zero. Therefore, there is some underlying heterogeneity between these studies.\n",
    "\n",
    "One caution is that the random-effect meta-analysis can perform poorly when there is few studies being meta-analyzed. Therefore, this approach is not to be recommended when there are fewer than 10 studies.\n",
    "\n",
    "## Random-Effect Meta-Regression\n",
    "\n",
    "Beyond simply summarizing the point estimates, we can also model how the point estimates vary by characteristics of the studies. For example, we can model how the estimated risk ratios change as a function of the latitude of the study location.\n",
    "\n",
    "To simplify interpretation and improve computational performance, we take several initial steps. First, the latitude of the study locations are centered, so that the intercept corresponds to the mean latitude (rather than a latitude of zero). The second part is that we first fit a common-effect meta-regression model. The parameters of that model are used as starting values for the random-effect meta-regression model. The following code applies these two steps"
   ],
   "id": "deefff5d4a318dd"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.962234400Z",
     "start_time": "2026-04-29T18:15:06.952237500Z"
    }
   },
   "cell_type": "code",
   "source": "d['ablat_c'] = d['ablat'] - np.mean(d['ablat'])",
   "id": "28528737b0c96365",
   "outputs": [],
   "execution_count": 10
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.982422200Z",
     "start_time": "2026-04-29T18:15:06.963522700Z"
    }
   },
   "cell_type": "code",
   "source": [
    "# Finding good initial values by fitting a common-effect meta-regression\n",
    "def psi_reg(theta):\n",
    "    return ee_regression(theta=theta, y=np.log(d['rr']),\n",
    "                         X=d[['c', 'ablat_c']], model='linear',\n",
    "                         weights=1/d['var_lrr'])\n",
    "\n",
    "estr = MEstimator(psi_reg, init=[-0.1, -0.03])\n",
    "estr.estimate()\n",
    "init_beta = list(estr.theta)"
   ],
   "id": "dac69d15b559f252",
   "outputs": [],
   "execution_count": 11
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "Having done this background preparation, we can now estimate the parameters of the random-effects meta-regression model",
   "id": "8113f34d283ae28"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:06.995304Z",
     "start_time": "2026-04-29T18:15:06.984453300Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def psi_mreg(theta):\n",
    "    return ee_meta_regression(theta=theta, point_est=np.log(d['rr']),\n",
    "                              var_est=d['var_lrr'],\n",
    "                              X=d[['c', 'ablat_c']])"
   ],
   "id": "4b511402ed8e0d5e",
   "outputs": [],
   "execution_count": 12
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-04-29T18:15:07.034030200Z",
     "start_time": "2026-04-29T18:15:06.995304Z"
    }
   },
   "cell_type": "code",
   "source": [
    "estr = MEstimator(psi_mreg, init=init_beta + [0.1, ])\n",
    "estr.estimate()\n",
    "estr.print_results(decimals=4)"
   ],
   "id": "7edf8d5eede0a88d",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==============================================================\n",
      "              Estimation Method: M-estimator\n",
      "--------------------------------------------------------------\n",
      "No. Observations:          13 | No. Parameters:              3\n",
      "Solving algorithm:         lm | Max Iterations:           5000\n",
      "Solving tolerance:      1e-09 | Allow P-Inverse:             1\n",
      "Derivative Method:     approx | Deriv Approx:            1e-09\n",
      "Small N Correction:      None | Distribution:           Z-stat\n",
      "==============================================================\n",
      "   Theta   StdErr  Z-score      LCL      UCL  P-value  S-value \n",
      "--------------------------------------------------------------\n",
      " -0.7339   0.1088  -6.7472  -0.9471  -0.5207   0.0000  35.9495 \n",
      " -0.0286   0.0059  -4.8184  -0.0402  -0.0169   0.0000  19.3985 \n",
      " -1.9510   0.8860  -2.2019  -3.6876  -0.2144   0.0277   5.1755 \n",
      "==============================================================\n"
     ]
    }
   ],
   "execution_count": 13
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "In this model, we can see the risk ratio for the mean latitude (first parameter), how the risk ratio varies by a one degree change in latitude (second parameter), and the estimated heterogeneity. One will notice here that the point estimate for heterogeneity decreased somewhat. This is indicative of latitude playing a role in explaining the heterogeneity of results between studies.\n",
    "\n",
    "The advantage of `delicatessen` for meta-analysis is that the estimating equations automatically accomodate the dependence of the summary risk ratios on the heterogeneity parameter. Other methods for meta-analysis and meta-regression do not always appropriately incorporate this variability into the parameters of interest. As described for the DerSimonian–Laird estimator, the heterogeneity parameter is assumed to be known, which can then under-estimate the variance (Cornell et al. 2014). `delicatessen` solves this issue through the use of the empirical sandwich variance estimator\n",
    "\n",
    "## References\n",
    "\n",
    "Colditz GA, Brewer TF, Berkey CS, Wilson ME, Burdick E, Fineberg HV, & Mosteller F. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: meta-analysis of the published literature. *JAMA*, 271(9), 698-702.\n",
    "\n",
    "Cornell JE, Mulrow CD, Localio R, Stack CB, Meibohm AR, Guallar E, & Goodman SN. (2014). Random-effects meta-analysis of inconsistent effects: a time for change. *Annals of Internal Medicine*, 160(4), 267-270.\n",
    "\n",
    "Paule RC, & Mandel J. (1982). Consensus values and weighting factors. *Journal of research of the National Bureau of Standards*, 87(5), 377.\n",
    "\n",
    "Viechtbauer W, López-López JA, Sánchez-Meca J, & Marín-Martínez F. (2015). A comparison of procedures to test for moderators in mixed-effects meta-regression models. *American Psychological Association*, 20(3), 360.\n",
    "\n",
    "Veroniki AA, Jackson D, Viechtbauer W, et al. (2016). Methods to estimate the between‐study variance and its uncertainty in meta‐analysis. *Research Synthesis Methods*, 7(1), 55-79."
   ],
   "id": "d5e7ab2f276561c5"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}