MIT CSAIL
Biomedical data sharing and analysis with privacy
Abstract: Although open sharing of genomic or pharmacological data would greatly advance science, it is generally not viable because of data privacy and intellectual property concerns. Building on modern cryptographic tools, we introduce privacy-preserving computational protocols that could encourage data sharing and collaboration in biomedicine. First, we describe the first scalable and secure protocol for large-scale genome-wide association analysis. It supports quality control and population stratification correction while keeping the underlying genotypes and phenotypes confidential, and we show that it could feasibly scale to a million individuals. Second, we introduce a protocol for securely training a neural network model of drug-target interaction (DTI) that keeps all underlying drugs, targets, and observed interactions confidential. This protocol scales to a real dataset of more than a million interactions and is more accurate than state-of-the-art DTI prediction methods. Using it, we discovered novel DTIs, which we then validated experimentally with targeted assays. Our work lays a foundation for more effective and cooperative biomedical research.
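As a minimal sketch, assume additive secret sharing as the cryptographic building block. The abstract names only "modern cryptographic tools", so this choice is an assumption, and the code is not the protocol described above. The sketch shows how cohorts could pool alternate-allele counts, the kind of summary statistic a GWAS quality-control step needs, without any single computing server seeing a cohort's count. The modulus `PRIME`, the helpers `share`, `reconstruct`, and `secure_allele_count`, and the two-server setup are all hypothetical illustrations.

```python
import secrets

# Hypothetical field modulus: any prime larger than the largest possible sum works.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares mod PRIME; any n-1 shares look uniformly random."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine additive shares into the original value mod PRIME."""
    return sum(shares) % PRIME


def secure_allele_count(cohorts: list[list[int]], n_servers: int = 2) -> int:
    """Pool alternate-allele counts across cohorts (hypothetical illustration).

    Each cohort computes its local count in the clear (it owns that data),
    secret-shares it, and sends one share to each server. Each server adds
    only the shares it received, so no server learns any cohort's count.
    """
    server_totals = [0] * n_servers
    for genotypes in cohorts:
        local_count = sum(genotypes)  # genotypes coded as 0/1/2 alternate alleles
        for i, s in enumerate(share(local_count, n_servers)):
            server_totals[i] = (server_totals[i] + s) % PRIME
    # Combining the servers' totals reveals only the aggregate count.
    return reconstruct(server_totals)


if __name__ == "__main__":
    cohorts = [[0, 1, 2, 1], [2, 2, 0], [1, 0, 0, 1, 1]]
    total = secure_allele_count(cohorts)
    n_alleles = 2 * sum(len(c) for c in cohorts)  # diploid: two alleles per person
    print(total, total / n_alleles)
```

With the toy cohorts above, the pooled count is 11 alternate alleles out of 24. Additive sharing suits sums because servers can add shares locally with no communication. Statistics that need multiplication (for example, the matrix operations behind population stratification correction) require additional machinery that this sketch does not cover.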