Bioinformatics DOI:10.1093/bioinformatics/btp190

Identifying novel constrained elements by exploiting biased substitution patterns.

Publication Type: Journal Article
Year of Publication: 2009
Authors: Garber, M, Guttman, M, Clamp, M, Zody, MC, Friedman, N, Xie, X
Journal: Bioinformatics
Volume: 25
Issue: 12
Pages: i54-62
Date Published: 2009 Jun 15
ISSN: 1367-4811
Keywords: Algorithms, Base Sequence, Evolution, Molecular, Genomics, Sequence Alignment, Software
Abstract

MOTIVATION: Comparing the genomes from closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and have ignored selection acting on the pattern of mutations.

RESULTS: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we find that as much as 10.2% of these regions is under selection.

AVAILABILITY: The algorithms are implemented in a Java software package, called SiPhy, freely available at http://www.broadinstitute.org/science/software/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

URL: http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=19478016
DOI: 10.1093/bioinformatics/btp190
PubMed: http://www.ncbi.nlm.nih.gov/pubmed/19478016?dopt=Abstract
Alternate Journal: Bioinformatics
PubMed ID: 19478016
PubMed Central ID: PMC2687944