Scientists in the Broad community have developed many critical software tools for the analysis of increasingly large genome-related datasets, and they make these tools openly available to the scientific community. For the conditions governing the use of Broad Institute software, please see the software use agreement associated with the tools you choose to download.


  • Spines

    Spines is a C++ software package for genomic sequence alignment and analysis. Its source code is publicly available under the GNU Lesser General Public License.

  • Sweep

    Sweep performs large-scale analysis of haplotype structure in genomes, primarily to detect evidence of natural selection.

  • Tagger

    Tagger is a web server for the selection and evaluation of tag SNPs from genotype data.

  • TreeChopper

    TreeChopper clusters tree leaf nodes according to phylogenetic distance.

  • Trinity

    Trinity, developed at the Broad Institute, is a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. It combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.

    Visit the Trinity SourceForge page.

  • Tumorscape

    The Tumorscape portal facilitates the use and understanding of high-resolution copy-number data amassed from multiple cancer types. It supports gene-level analysis, analysis by cancer type, and browsing and downloading of data.

  • Ultrasome

    Ultrasome is a highly efficient method for detecting gains and losses of chromosomal material in DNA copy-number data. It processes data from latest-generation copy-number arrays about 10,000 times faster than standard methods (e.g., circular binary segmentation, CBS) while retaining comparable analytical accuracy.

  • V-FAT

    V-FAT is a tool for automated computational finishing and annotation of de novo viral assemblies. It uses reference and read data to order and merge contigs, correct frameshifts, and produce NCBI-ready annotation files. It also performs a set of quality-assurance checks, including computing coverage by gene or amplicon and identifying potential consensus errors.

  • V-Phaser

    V-Phaser is a tool to call variants in mixed populations from ultra-deep sequence data. To increase sensitivity, it uses information about the covariation (i.e., phasing) between observed variants; to increase specificity, it uses an expectation-maximization algorithm that iteratively recalibrates base quality scores. V-Phaser can reliably detect rare variants that occur in mixed populations at frequencies below 1%. The V-Phaser package also includes V-Profiler, a tool to analyze and visualize variants.

  • VICUNA

    VICUNA is a de novo assembly program targeting populations with high mutation rates. It creates a single linear representation of the mixed population onto which intra-host variants can be mapped. For heavily contaminated clinical samples (e.g., >95% contamination), VICUNA can use existing genomes, if available, to assemble only reads that resemble the target. After initial assembly, it can also use existing genomes to guide the merging of contigs. For each data set (e.g., Illumina paired-end or 454 reads), VICUNA outputs the consensus sequence(s) and the corresponding multiple sequence alignment of the constituent reads. VICUNA efficiently handles ultra-deep sequence data with coverage in the tens of thousands.
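
As an illustration of the kind of haplotype statistic behind the Sweep entry's selection scans, the Python sketch below computes extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying a given core allele are identical over the whole span from the core SNP to a second marker. That Sweep uses exactly this statistic is an assumption here, and the function name `ehh` and its input format (one string of alleles per phased chromosome) are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], core: int, end: int, core_allele: str) -> float:
    """Extended haplotype homozygosity from SNP index `core` out to SNP index `end`.

    Each haplotype is one phased chromosome written as a string of alleles.
    Only chromosomes that carry `core_allele` at the core SNP are counted.
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    lo, hi = min(core, end), max(core, end)
    # Group carriers by their extended haplotype over the span [lo, hi].
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Probability that two carriers drawn without replacement are identical over the span.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


haps = ["AAAA", "AAAT", "AATT", "CAAA"]  # hypothetical toy haplotypes
print([ehh(haps, 0, j, "A") for j in range(4)])  # → [1.0, 1.0, 0.3333333333333333, 0.0]
```

EHH starts at 1 at the core SNP and decays with distance. A core allele that is common yet keeps high EHH over long distances is the pattern that long-range haplotype tests look for as evidence of recent positive selection.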
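
Tag SNP selection, as in the Tagger entry, is usually framed in terms of linkage disequilibrium: a SNP counts as "captured" when some chosen tag has r² with it at or above a threshold. The sketch below implements a simple greedy pairwise-r² scheme on phased haplotypes coded 0/1. It is an illustration under that assumption, not Tagger's own code, and the names `r_squared` and `greedy_tags` and the 0.8 default threshold are hypothetical choices.

```python
def r_squared(x: list[int], y: list[int]) -> float:
    """LD r^2 between two biallelic SNPs, each coded 0/1 across the same haplotypes."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py  # linkage disequilibrium coefficient D
    denom = px * (1 - px) * py * (1 - py)
    return d * d / denom if denom > 0 else 0.0


def greedy_tags(snps: dict[str, list[int]], threshold: float = 0.8) -> list[str]:
    """Pick tags greedily until every SNP has r^2 >= threshold with at least one tag."""
    names = list(snps)
    # captures[a]: SNPs that would be tagged if `a` were chosen (always includes `a`).
    captures = {
        a: {b for b in names if b == a or r_squared(snps[a], snps[b]) >= threshold}
        for a in names
    }
    uncaptured = set(names)
    tags: list[str] = []
    while uncaptured:
        # max() keeps the first SNP among ties, so input order breaks ties.
        best = max(names, key=lambda a: len(captures[a] & uncaptured))
        tags.append(best)
        uncaptured -= captures[best]
    return tags


# Hypothetical haplotypes: s1 and s2 are in perfect LD, s3 is independent of both.
snps = {"s1": [0, 0, 1, 1], "s2": [0, 0, 1, 1], "s3": [0, 1, 0, 1]}
print(greedy_tags(snps))  # → ['s1', 's3']
```

Because every SNP captures itself, each round captures at least one new SNP, so the loop always terminates.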
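
One simple way to cluster tree leaves by phylogenetic distance, as the TreeChopper entry describes, is single-linkage grouping: two leaves fall in the same cluster whenever a chain of leaves links them with pairwise distances at or below a cutoff. The sketch assumes pairwise leaf-to-leaf (patristic) distances have already been computed from the tree. TreeChopper's actual criterion is not described here, and `cluster_leaves` and its parameters are hypothetical.

```python
def cluster_leaves(
    leaves: list[str], dist: dict[tuple[str, str], float], cutoff: float
) -> list[set[str]]:
    """Single-linkage clusters of tree leaves under a distance cutoff.

    `dist` maps leaf pairs to their patristic (path-length) distance in the tree.
    Leaves end up together if a chain of pairs with distance <= cutoff links them.
    """
    parent = {leaf: leaf for leaf in leaves}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for leaf in leaves:
        clusters.setdefault(find(leaf), set()).add(leaf)
    return list(clusters.values())


# Hypothetical distances: A-B and C-D are close, the two pairs are far apart.
leaves = ["A", "B", "C", "D"]
dist = {("A", "B"): 0.1, ("C", "D"): 0.2, ("A", "C"): 1.0, ("B", "D"): 1.5}
print(sorted(sorted(c) for c in cluster_leaves(leaves, dist, cutoff=0.3)))  # → [['A', 'B'], ['C', 'D']]
```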
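
The de Bruijn graph idea in the Trinity entry can be illustrated with a toy sketch: each k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and the resulting graph splits into connected components that can be processed independently. This is a simplification assumed for illustration. Trinity itself builds its per-locus graphs from Inchworm contigs clustered by Chrysalis, and the function names `de_bruijn` and `components` are hypothetical.

```python
def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph.setdefault(kmer[:-1], set()).add(kmer[1:])
            graph.setdefault(kmer[1:], set())  # register sink nodes too
    return graph


def components(graph: dict[str, set[str]]) -> list[set[str]]:
    """Weakly connected components, i.e. edge direction is ignored."""
    neighbors: dict[str, set[str]] = {node: set() for node in graph}
    for u, targets in graph.items():
        for v in targets:
            neighbors[u].add(v)
            neighbors[v].add(u)
    seen: set[str] = set()
    result: list[set[str]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(comp)
    return result


# Hypothetical reads: the first two overlap each other, the third is unrelated.
graph = de_bruijn(["ACGTA", "CGTAC", "TTTGG"], k=3)
print(len(components(graph)))  # → 2
```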
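
Coverage by gene or amplicon, one of the V-FAT quality checks listed above, reduces to summarizing per-base read depth over annotated intervals. The sketch below reports mean depth and breadth (the fraction of bases at or above a minimum depth) per region. The function name, the 0-based half-open coordinate convention, the `min_depth` parameter, and the gene names in the example are assumptions, not V-FAT's interface.

```python
def coverage_by_region(
    depth: list[int], regions: dict[str, tuple[int, int]], min_depth: int = 1
) -> dict[str, tuple[float, float]]:
    """Mean depth and breadth (fraction of bases with depth >= min_depth) per region.

    Coordinates are 0-based and half-open: (start, end) covers depth[start:end].
    """
    stats: dict[str, tuple[float, float]] = {}
    for name, (start, end) in regions.items():
        if not 0 <= start < end <= len(depth):
            raise ValueError(f"region {name!r} is empty or out of bounds")
        window = depth[start:end]
        mean = sum(window) / len(window)
        breadth = sum(d >= min_depth for d in window) / len(window)
        stats[name] = (mean, breadth)
    return stats


# Hypothetical per-base depths and gene coordinates.
depth = [0, 4, 8, 10, 10, 10]
print(coverage_by_region(depth, {"gag": (0, 3), "pol": (3, 6)}))
# → {'gag': (4.0, 0.6666666666666666), 'pol': (10.0, 1.0)}
```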
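
The covariation signal described in the V-Phaser entry comes from reads that span two variant positions: if two low-frequency alleles repeatedly appear together on the same reads, they are more likely to form a real linked haplotype than to be two independent sequencing errors. The sketch below only tallies those co-occurrences. It does not reproduce V-Phaser's statistical model or its expectation-maximization recalibration, and `phase_counts` and the read representation (a dict from reference position to observed base) are hypothetical.

```python
from collections import Counter


def phase_counts(reads: list[dict[int, str]], pos_a: int, pos_b: int) -> Counter:
    """Tally the allele pairs seen together on reads that cover both positions.

    Each read is represented as a mapping from reference position to observed base.
    """
    pairs: Counter = Counter()
    for read in reads:
        if pos_a in read and pos_b in read:
            pairs[(read[pos_a], read[pos_b])] += 1
    return pairs


# Hypothetical reads: the minor alleles C@10 and T@20 always travel together.
reads = [
    {10: "A", 20: "G"},
    {10: "A", 20: "G"},
    {10: "A", 20: "G"},
    {10: "C", 20: "T"},
    {10: "C", 20: "T"},
    {10: "C"},  # does not reach position 20, so it is skipped
]
print(phase_counts(reads, 10, 20))  # → Counter({('A', 'G'): 3, ('C', 'T'): 2})
```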
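
The VICUNA entry pairs each consensus sequence with a multiple sequence alignment of the constituent reads. A column-wise majority vote is the simplest way to derive a consensus from such an alignment. It is shown below as an assumption for illustration rather than VICUNA's actual consensus rule; `majority_consensus` is a hypothetical name, and this simplified version treats every gap, including gaps at read ends, as a vote.

```python
from collections import Counter


def majority_consensus(alignment: list[str], gap: str = "-") -> str:
    """Column-wise majority consensus of a gapped multiple sequence alignment.

    Columns whose most common symbol is the gap character are left out.
    Ties go to the symbol that appears first in the column.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must contain at least one row, all of equal length")
    consensus = []
    for column in zip(*alignment):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical gapped alignment of four reads.
aln = ["ACGT-A", "ACCT-A", "ACGTTA", "A-GT-A"]
print(majority_consensus(aln))  # → ACGTA
```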