Machine Learning for Health (ML4H)

Machine Learning for Health (ML4H) is an effort led by the Broad Institute in collaboration with faculty members from Massachusetts General Hospital, Brigham and Women’s Hospital, and MIT. Our goal is to use machine learning to drive new fundamental research into the genetic underpinnings of disease, disease subgroup classification, and risk prediction with applications in clinical trials and clinical decision support.

Our initial array of projects is focused on cardiovascular disease and healthy aging. Our ultimate vision is to accelerate the real-world impact of clinical AI across all areas of medicine. Here are a few snapshots of our work. See our publications page for a complete list.

S. Agarwal (Khera Lab) and MDR Klarqvist (ML4H) built a CNN to quantify fat depots from MRI and showed that the distribution of fat across the body can modify the effect of BMI and can either protect against or promote the development of diabetes or coronary artery disease. [Paper]

The Community Care Cohort Project (“C3PO”), led by S. Khurshid (Lubitz Lab) and C. Reeder (ML4H), contains longitudinal, high-resolution clinical data for more than half a million individuals. ML4H applies NLP to the ~80B tokens in C3PO to scale phenotype ascertainment and to drive new biological discovery and clinical impact across a wide variety of clinical analyses. [Paper]

N. Diamant (Stultz Lab, ML4H) devised a new approach for transfer learning on ECGs, Patient Contrastive Learning Representations (PCLR), which creates a more performant, efficient, and pragmatic representation of an ECG. PCLR outperforms training a deep learning model from scratch on data sets with fewer than a few thousand labeled events. [Paper]

S. Khurshid (Lubitz Lab) and S. F. Friedman (ML4H) developed ECG-AI, a deep learning model that uses 12-lead ECGs to predict time to incident atrial fibrillation (AF). Combining ECG-AI with a clinical risk model improved prediction of incident AF. A subsequent genome-wide association study (GWAS) showed that ECG-AI risk estimates are influenced by AF-specific genetic mechanisms. [Paper] [Paper]

J. Cunningham (BWH), P. Singh (ML4H) and C. Reeder (ML4H) developed a natural language processing model that accurately identifies heart failure events from unstructured discharge summaries. [Paper]

A. Radhakrishnan (MIT) and S. F. Friedman (ML4H) developed a cross-modal autoencoder framework integrating ECGs and MRIs for constructing a holistic representation of the cardiovascular state. The joint representations were shown to improve phenotype prediction from a single modality and enable data imputation. [Paper]

E. Lau (MGH) and P. Di Achille (ML4H) developed a segmentation-free deep learning model that interprets echocardiograms and automatically extracts left heart structural and functional measurements. These measurements were shown to be highly associated with future clinical outcomes. [Paper]

J. Pirruccello (MGH, Broad, ML4H) built a deep learning model to characterize aortic dimensions across 37K UK Biobank participants, discovering ~100 loci in a genome-wide association study (GWAS) and predicting the risk of aortic dissection. Subsequent work further investigated the genetic contributions to aortic diameter. [Paper] [Paper]

Phenotyping Cardiac Fibrosis at Scale: V. Nauffal (BWH, Broad, ML4H) and P. Di Achille (ML4H) developed a segmentation model to automate quantification of T1 time from mid-ventricular short-axis cardiac MRI T1 maps in the UK Biobank. Their work contributed to the largest dataset of myocardial T1 time and enabled new insights into biologic pathways relevant to fibrosis that could be targeted therapeutically. [Manuscript] [Research editorial]

ML4H also maintains the open-source ML4H codebase on behalf of the entire research community.

To discuss collaboration opportunities, please contact ML4H's director, Mahnaz Maddah.