In silico phenotyping via co-training for improved phenotype prediction from genotype.

Bioinformatics

Authors	Damian Roqueiro, Menno Witteveen, Verneri Anttila, Gisela Terwindt, Arn van den Maagdenberg, Karsten Borgwardt
Keywords	Humans; Phenotype; Genotype; Algorithms; Computer Simulation; Disease; Genotyping Techniques; Migraine with Aura; Migraine without Aura
Abstract	MOTIVATION: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction. RESULTS: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium. CONCLUSIONS: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction. AVAILABILITY AND IMPLEMENTATION: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-…
Year of Publication	2015
Journal	Bioinformatics
Volume	31
Issue	12
Pages	i303-10
Date Published	2015 Jun 15
ISSN	1367-4811
URL	http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=26072497
DOI	10.1093/bioinformatics/btv254
PubMed ID	26072497
PubMed Central ID	PMC4765855
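
The co-training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic two-view co-training loop in which each view (here a hypothetical "clinical" view and a hypothetical "genotype" view) takes turns labeling the unlabeled patient it is most confident about, using a simple nearest-centroid classifier. All function names, the toy feature values, and the labels "MA" (migraine with aura) and "MO" (migraine without aura) are made up for illustration.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def centroid_predict(model, x):
    """Return (label, margin); margin is how much closer the nearest
    centroid is than the runner-up (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, m)), c) for c, m in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def co_train(view_a, view_b, labels, rounds=10, per_round=1):
    """Impute missing labels (None) by alternating between two views.

    In each round, each view trains on all currently labeled patients and
    labels the `per_round` unlabeled patients it is most confident about.
    Returns a completed copy; the input list is left unchanged.
    """
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            known = [i for i, l in enumerate(labels) if l is not None]
            unknown = [i for i, l in enumerate(labels) if l is None]
            if not unknown:
                return labels
            model = centroid_fit(
                [view[i] for i in known], [labels[i] for i in known]
            )
            scored = sorted(
                ((centroid_predict(model, view[i]), i) for i in unknown),
                key=lambda t: -t[0][1],  # largest margin first
            )
            for (label, _margin), i in scored[:per_round]:
                labels[i] = label
    return labels


# Toy data (entirely hypothetical): 8 patients, two views each.
clinical = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.3, 0.1],
            [2.0, 2.1], [2.2, 1.9], [1.9, 2.0], [2.1, 2.2]]
genotype = [[0, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0],
            [2, 2, 1], [2, 1, 2], [1, 2, 2], [2, 2, 2]]
truth = ["MO"] * 4 + ["MA"] * 4
# Only one patient per class has been phenotyped; the rest are unlabeled.
labels = ["MO", None, None, None, "MA", None, None, None]

completed = co_train(clinical, genotype, labels)
print(completed)
```

In a realistic setting the imputed labels would then be used to enlarge the training set of a genotype-based predictor, and a confidence threshold would likely be needed so that uncertain patients stay unlabeled; neither step is shown here.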
