| Publication Type | Journal Article |
|---|---|
| Year of Publication | 2005 |
| Authors | Mikkelsen, TS; Galagan, JE; Mesirov, JP |
| Date Published | 2005 Feb 15 |
| Keywords | Algorithms; Bayes Theorem; Chromosome Mapping; DNA Mutational Analysis; Evolution, Molecular; Gene Expression Profiling; Genomics; Models, Genetic; Models, Statistical; Phylogeny; Sequence Alignment; Sequence Analysis, DNA |
MOTIVATION: A promising strategy for refining genome annotations is to detect features that conflict with known functional or evolutionary relationships between groups of genes. Previous work in this area has focused on the absence of 'housekeeping' genes or of components of well-studied pathways. We sought to develop a method that improves new annotations by automatically synthesizing and using the information available in a database of other annotated genomes.
RESULTS: We show that a probabilistic model of phylogenetic profiles, trained from a database of curated genome annotations, can be used to reliably detect errors in new annotations. We use our method to identify 22 genes that were missed in previously published annotations of prokaryotic genomes.
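The abstract does not describe the structure of the probabilistic model. As a rough illustration of the general idea only, the sketch below assumes a naive Bayes model over binary phylogenetic profiles (one row per genome, one column per gene family). This is a deliberately simple stand-in and not necessarily the model used in this work. For each gene family that a new annotation reports as absent, the sketch estimates the probability that the family is present given the presence or absence of every other family, and flags the absence as a possible annotation error when that probability exceeds a threshold. The function names `presence_posterior` and `flag_missing`, the Laplace-smoothing parameter `alpha`, the threshold of 0.9, and the toy profiles are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: a naive Bayes stand-in, not the authors' model.
# A phylogenetic profile is a 0/1 vector: 1 = gene family present in a genome.
Profile = Sequence[int]


def presence_posterior(
    training: List[Profile], query: Profile, j: int, alpha: float = 1.0
) -> float:
    """Estimate P(family j present | other families in query).

    Uses naive Bayes with Laplace smoothing (alpha) over the training profiles.
    """
    log_joint = {}
    for state in (0, 1):
        # Training genomes in which family j has the given state.
        rows = [p for p in training if p[j] == state]
        # Smoothed prior P(x_j = state).
        log_p = math.log((len(rows) + alpha) / (len(training) + 2 * alpha))
        for k, observed in enumerate(query):
            if k == j:
                continue
            # Smoothed P(x_k = 1 | x_j = state).
            ones = sum(p[k] for p in rows)
            p_one = (ones + alpha) / (len(rows) + 2 * alpha)
            log_p += math.log(p_one if observed == 1 else 1.0 - p_one)
        log_joint[state] = log_p
    # Normalize the two log-joint terms with a numerically stable logistic.
    diff = log_joint[0] - log_joint[1]
    if diff > 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


def flag_missing(
    training: List[Profile], query: Profile, threshold: float = 0.9, alpha: float = 1.0
) -> List[Tuple[int, float]]:
    """Return (family index, posterior) for families absent from the query
    annotation but predicted present with posterior >= threshold."""
    flagged = []
    for j, present in enumerate(query):
        if present == 0:
            posterior = presence_posterior(training, query, j, alpha)
            if posterior >= threshold:
                flagged.append((j, posterior))
    return flagged


# Toy data: families 0, 1, and 2 always co-occur; family 3 varies independently.
training = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
# New annotation that reports families 0 and 1 but not family 2.
query = [1, 1, 0, 1]

print(flag_missing(training, query))
```

On the toy data, family 2 is flagged with a posterior of about 0.95 because it co-occurs with families 0 and 1 in every training genome. A naive Bayes model treats the conditioning families as independent given the target family, which is a strong simplification; a model that captures dependencies among families would likely be more accurate, at the cost of needing more training data.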
AVAILABILITY: The method was evaluated using MATLAB and open-source software referenced in this work. Scripts and datasets are available from the authors upon request.