Scientists in the Broad community have developed many critical software tools for the analysis of increasingly large genome-related datasets, and they make these tools openly available to the scientific community. For the conditions governing the use of Broad Institute software, please see the software use agreement associated with the tools you choose to download.

Use our search function, browse the complete software collection or click on one of the software categories listed below:

  • NAST-iEr

    The NAST-iEr alignment utility aligns a single raw nucleotide sequence against one or more NAST-formatted sequences.

  • Pilon

    Pilon uses read alignment analysis to diagnose, report, and automatically improve genome assemblies, and it can also be used to make variant calls among similar haploid strains.

  • RC454

    RC454 is a program that takes a set of 454 read and quality files, together with a consensus assembly for those reads, and corrects for known 454 error modes such as homopolymer indels and carry forward/incomplete extension (CAFIE). It also corrects any indel that breaks the reading frame, unless that indel occurs in more than 25% of the reads. Because the algorithm corrects errors aggressively, it is important to align the reads to their own assembly rather than to an external reference, to prevent misalignments as much as possible. RC454 uses Mosaik to align the corrected reads between each step, so Mosaik is required to run the script.

  • Siphy

    Siphy analyzes multiple sequence alignments and singles out bases or small regions that are undergoing selection by looking for reductions in substitution rates and by detecting unexpected substitution patterns. A specific program to detect conserved transcription factor binding sites is also available.

  • Spines

    Spines is a C++ software package for genomic sequence alignment and analysis. The source code is publicly available under the GNU Lesser General Public License.


  • VICUNA

    VICUNA is a de novo assembly program targeting populations with high mutation rates. It creates a single linear representation of the mixed population on which intra-host variants can be mapped. For clinical samples rich in contamination (e.g., >95%), VICUNA can leverage existing genomes, if available, to assemble only target-like reads. After initial assembly, it can also use existing genomes to perform guided merging of contigs. For each data set (e.g., Illumina paired reads, 454), VICUNA outputs consensus sequence(s) and the corresponding multiple sequence alignment of constituent reads. VICUNA efficiently handles ultra-deep sequence data with tens-of-thousands-fold coverage.
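
The RC454 entry above states that a frame-breaking indel is corrected unless it occurs in more than 25% of the reads. As a rough illustration of that one rule only, here is a minimal Python sketch. The function name, its parameters, and the treatment of an indel at exactly 25% are assumptions made for illustration; this is not RC454's actual code, and it does not model the homopolymer or CAFIE corrections.

```python
def should_correct_indel(indel_length: int,
                         reads_with_indel: int,
                         total_reads: int,
                         max_fraction: float = 0.25) -> bool:
    """Hypothetical helper illustrating the frame-breaking indel rule.

    Returns True when the indel breaks the reading frame (its length is
    not a multiple of 3) and it appears in no more than `max_fraction`
    of the reads. The 0.25 default mirrors the 25% figure in the text;
    treating exactly 25% as correctable is an assumption.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    breaks_frame = indel_length % 3 != 0
    fraction = reads_with_indel / total_reads
    return breaks_frame and fraction <= max_fraction
```

Under these assumptions, a 1-base indel seen in 10 of 100 reads would be corrected, while the same indel seen in 30 of 100 reads would be kept as a likely real variant.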