{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": [ "# Ding (2024) Chapter 25: Mendelian Randomization\n", "\n", "Mendelian Randomization is an alternative identification and estimation strategy. It uses genetic variants as instrumental variables for the effect of an exposure on an outcome. Importantly, it is premised on the standard assumptions made in instrumental variable analysis, so it is only as valid as those assumptions.\n", "\n", "Mendelian Randomization with individual-level data becomes a single Two-Stage Least Squares problem, which is illustrated with `delicatessen` in other examples. However, many Mendelian Randomization analyses are based on summary-level data. Here, we use `delicatessen` to analyze summary-level data. The provided example comes from Peng Ding's book *A First Course in Causal Inference*. Using data from the R package `mr.raps` (`bmi.sbp`), the results from three genome-wide association studies (GWAS) are used to assess the effect of body mass index (BMI) on systolic blood pressure (SBP). For various reasons, this 'effect' is likely ill-defined. 
As such, this example should only be viewed as an illustration of how Mendelian Randomization analyses can be done using `delicatessen`.\n", "\n", "## Setup" ], "id": "b5510c844b4260f7" }, { "cell_type": "code", "id": "initial_id", "metadata": { "collapsed": true, "ExecuteTime": { "end_time": "2026-04-30T16:35:24.902264200Z", "start_time": "2026-04-30T16:35:23.664516600Z" } }, "source": [ "import numpy as np\n", "import scipy as sp\n", "import pandas as pd\n", "import delicatessen as deli\n", "from delicatessen import MEstimator\n", "from delicatessen.estimating_equations import ee_regression\n", "\n", "print(\"Versions\")\n", "print(\"NumPy: \", np.__version__)\n", "print(\"SciPy: \", sp.__version__)\n", "print(\"Pandas: \", pd.__version__)\n", "print(\"Delicatessen:\", deli.__version__)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Versions\n", "NumPy: 2.3.5\n", "SciPy: 1.16.3\n", "Pandas: 2.3.3\n", "Delicatessen: 4.2\n" ] } ], "execution_count": 1 }, { "metadata": { "ExecuteTime": { "end_time": "2026-04-30T16:35:29.267860500Z", "start_time": "2026-04-30T16:35:29.219389300Z" } }, "cell_type": "code", "source": [ "d = pd.read_csv(\"data/mr_bmi_sbp.csv\")\n", "d['I'] = 1  # column of ones for the intercept" ], "id": "771f01b93eae3dc2", "outputs": [], "execution_count": 3 }, { "metadata": {}, "cell_type": "markdown", "source": [ "Here, we analyze the summary estimates and their estimated variances using Egger regression. Egger regression fits a linear model, with an intercept, that regresses the instrument-outcome associations (`beta.outcome`) on the instrument-exposure associations (`beta.exposure`). 
Here, a weighted linear regression is used, where the weights are the inverse of the estimated variance of the instrument-outcome associations (`1 / se.outcome**2`).\n", "\n", "Fitting this model can be done using the `ee_regression` functionality in `delicatessen` as follows." ], "id": "a0edbed43cbbbf23" }, { "metadata": { "ExecuteTime": { "end_time": "2026-04-30T16:37:21.444063400Z", "start_time": "2026-04-30T16:37:21.370931200Z" } }, "cell_type": "code", "source": [ "def psi(theta):\n", "    # Weighted linear regression of beta.outcome on an intercept and beta.exposure\n", "    return ee_regression(theta, y=d['beta.outcome'],\n", "                         X=d[['I', 'beta.exposure']],\n", "                         weights=1/(d['se.outcome']**2),\n", "                         model='linear')" ], "id": "7f43a264618c445", "outputs": [], "execution_count": 6 }, { "metadata": { "ExecuteTime": { "end_time": "2026-04-30T16:37:34.353773Z", "start_time": "2026-04-30T16:37:34.293438400Z" } }, "cell_type": "code", "source": [ "# Starting values for the intercept and the slope\n", "estr = MEstimator(psi, init=[0, 0])\n", "estr.estimate()\n", "estr.print_results(decimals=4)" ], "id": "74f2780fe7ccabde", "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==============================================================\n", " Estimation Method: M-estimator\n", "--------------------------------------------------------------\n", "No. Observations: 160 | No. Parameters: 2\n", "Solving algorithm: lm | Max Iterations: 5000\n", "Solving tolerance: 1e-09 | Allow P-Inverse: 1\n", "Derivative Method: approx | Deriv Approx: 1e-09\n", "Small N Correction: None | Distribution: Z-stat\n", "==============================================================\n", " Theta StdErr Z-score LCL UCL P-value S-value \n", "--------------------------------------------------------------\n", " 0.0001 0.0020 0.0566 -0.0038 0.0040 0.9549 0.0666 \n", " 0.3173 0.1069 2.9678 0.1078 0.5268 0.0030 8.3812 \n", "==============================================================\n" ] } ], "execution_count": 8 }, { "metadata": {}, "cell_type": "markdown", "source": [ "In the output, the first row corresponds to the intercept and the second row to the coefficient for `beta.exposure`, which is the Egger regression estimate of the effect of BMI on SBP. These point estimates match those provided in Section 25.3 of the book. 
The standard error estimates, however, differ from those reported in the book because `delicatessen` uses the empirical sandwich variance estimator for inference. Despite this difference, the statistical conclusions we would draw are the same under both approaches.\n", "\n", "## References\n", "\n", "Ding P. (2024). *A First Course in Causal Inference*. Chapman and Hall/CRC." ], "id": "1bcb9939aeae697c" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }