Tool Development

Pilon is a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries, one with small inserts (e.g., 180 bp) and one with large inserts (e.g., 3-5 kb). Pilon significantly improves draft genome assemblies by correcting bases, fixing misassemblies, and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy compared with state-of-the-art tools, and it is unique in its ability to accurately identify large sequence variants, including duplications, and to resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open-source software.
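As a rough illustration of the two-library setup described above, the sketch below assembles a Pilon command line in Python. The file names, jar location, heap size, and the helper name build_pilon_command are placeholders chosen for this example. The options shown (--genome, --frags, --jumps, --output, --changes, --vcf) are Pilon's documented command-line options, but check them against the version you have installed. Pilon expects the BAM files to be sorted, indexed alignments of the reads to the draft assembly.

```python
from typing import List, Optional


def build_pilon_command(
    genome: str,
    frags_bam: str,
    jumps_bam: Optional[str] = None,
    output_prefix: str = "pilon",
    jar_path: str = "pilon.jar",   # placeholder: path to your Pilon jar
    java_heap: str = "16G",        # placeholder: adjust to genome size
) -> List[str]:
    """Build an argument list for a Pilon run (hypothetical helper)."""
    cmd = [
        "java", f"-Xmx{java_heap}", "-jar", jar_path,
        "--genome", genome,    # draft assembly in FASTA format
        "--frags", frags_bam,  # small-insert (fragment) library alignments
    ]
    if jumps_bam is not None:
        # Large-insert (jumping) library alignments, if available.
        cmd += ["--jumps", jumps_bam]
    # Write a list of changes and a VCF of variant calls with the output.
    cmd += ["--output", output_prefix, "--changes", "--vcf"]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_pilon_command("draft.fasta", "frags.bam", "jumps.bam")))
```

The returned list can be passed to subprocess.run once the paths point at real files.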

SynerClust performs orthogroup clustering, incorporating both sequence similarity and synteny in the context of a phylogenetic tree. This combination allows for fast and accurate orthogroup construction. SynerClust is a faster and more user-friendly implementation of the algorithm introduced by Wapinski et al. in 2007 in Nature and Bioinformatics. (Christophe Georgescu; Allison Griggs). SynerClust is freely available on GitHub.
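To make the idea of combining sequence similarity with synteny concrete, here is a toy sketch. It is not SynerClust's actual scoring scheme: the function names, the neighbor-overlap definition of synteny, and the equal weighting are all assumptions made purely for illustration.

```python
from typing import Dict, List


def synteny_score(
    neighbors_a: List[str],
    neighbors_b: List[str],
    orthologs: Dict[str, str],
) -> float:
    """Toy synteny measure: share of genes flanking A whose putative
    ortholog also flanks B (illustrative definition only)."""
    if not neighbors_a or not neighbors_b:
        return 0.0
    flank_b = set(neighbors_b)
    shared = sum(1 for gene in neighbors_a if orthologs.get(gene) in flank_b)
    return shared / max(len(neighbors_a), len(neighbors_b))


def combined_score(seq_similarity: float, synteny: float, weight: float = 0.5) -> float:
    """Weighted mix of the two signals; the 0.5 default is arbitrary."""
    return weight * seq_similarity + (1.0 - weight) * synteny


if __name__ == "__main__":
    # Hypothetical gene identifiers for two genomes.
    flank_a = ["a1", "a2", "a3", "a4"]
    flank_b = ["b1", "b2", "b3", "b9"]
    putative = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
    syn = synteny_score(flank_a, flank_b, putative)
    print(syn, combined_score(0.5, syn))
```

In this toy setting, two genes with only moderate sequence similarity can still score well when their genomic neighborhoods are conserved, which is the intuition behind using both signals.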