What is TranspoSeq?

TranspoSeq identifies non-reference somatic retrotransposon insertions given tumor and normal BAM files.


Source Code

Download the source: TranspoSeq_v2.0.tar.gz


Reference Files

TranspoSeq requires approximately 9 GB of compressed reference files. They can be downloaded either as a single archive (last row of the table below) or individually by category.

File                                      Download                      Size
RepeatMasker (Hg18)                       RepeatMasker_Hg18.tar.gz      46 MB
RepeatMasker (Hg19)                       RepeatMasker_Hg19.tar.gz      54 MB
Reference Genome (Hg18)                   RefGenome_Hg18.tar.gz         4.3 GB
Reference Genome (Hg19)                   RefGenome_Hg19.tar.gz         2.7 GB
RefSeq Genes (Hg18, Hg19)                 RefSeqGenes.tar.gz            5.6 MB
Previously annotated RIPs (Hg18, Hg19)    RIPs.tar.gz                   5.1 MB
Miscellaneous files                       Misc.tar.gz                   1.1 MB
Sam* jar file                             sam.jar                       528 KB
All Required Reference Files              TranspoSeq_Reffiles.tar.gz    8.9 GB


How does TranspoSeq work?

TranspoSeq is a computational framework that takes paired-end sequencing data as input and produces a list of annotated putative somatic retrotransposon insertion sites. First, the input BAMs are parsed for discordant read-pairs, and these pairs are aligned to a database of consensus retrotransposon sequences. Pairs in which one read aligns to the retrotransposon database and the other aligns to the reference genome with little ambiguity are clustered in the forward and reverse directions. Overlapping forward and reverse clusters are identified and annotated as support for a putative non-reference retrotransposon insertion at that genomic position. Finally, the read-pairs within each cluster are assembled de novo, and the resulting contig is aligned to both the reference genome and the retrotransposon database to annotate the inserted element. Events with strong evidence that pass the filtering criteria are retained and classified as somatic or germline.
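The clustering step described above can be illustrated with a minimal Python sketch. This is not TranspoSeq's actual code. The names Anchor, Cluster, cluster_anchors and candidate_sites, and the thresholds max_gap, min_reads and max_distance, are hypothetical and chosen only for illustration. The sketch assumes that the discordant pairs have already been reduced to "anchor" reads, that is, the uniquely mapped mates whose partners aligned to a retrotransposon consensus.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Anchor:
    """Genome-aligned mate of a read that hit a retrotransposon consensus."""
    chrom: str
    pos: int       # leftmost reference coordinate of the anchor read
    forward: bool  # True if the anchor maps to the forward strand


@dataclass
class Cluster:
    chrom: str
    start: int
    end: int
    forward: bool
    n_reads: int


def cluster_anchors(anchors: List[Anchor], max_gap: int = 200,
                    min_reads: int = 2) -> List[Cluster]:
    """Group same-strand anchors whose positions lie within max_gap bases.

    max_gap and min_reads are illustrative values, not TranspoSeq defaults.
    """
    clusters: List[Cluster] = []
    # Sorting by chromosome, strand, then position makes each cluster contiguous.
    for a in sorted(anchors, key=lambda x: (x.chrom, x.forward, x.pos)):
        last = clusters[-1] if clusters else None
        if (last is not None and last.chrom == a.chrom
                and last.forward == a.forward and a.pos - last.end <= max_gap):
            last.end = a.pos
            last.n_reads += 1
        else:
            clusters.append(Cluster(a.chrom, a.pos, a.pos, a.forward, 1))
    return [c for c in clusters if c.n_reads >= min_reads]


def candidate_sites(clusters: List[Cluster],
                    max_distance: int = 500) -> List[Tuple[str, int, int]]:
    """Pair each forward cluster with a nearby reverse cluster on its right.

    Forward-strand anchors point into the insertion from the left and
    reverse-strand anchors from the right, so the putative insertion site
    lies between the end of the forward cluster and the start of the
    reverse cluster. max_distance is an assumed threshold.
    """
    sites = []
    for f in (c for c in clusters if c.forward):
        for r in (c for c in clusters if not c.forward):
            if (r.chrom == f.chrom and f.start <= r.end
                    and r.start - f.end <= max_distance):
                sites.append((f.chrom, min(f.end, r.start), max(f.end, r.start)))
    return sites


# Toy input: three forward anchors, two reverse anchors, one isolated read.
example = [
    Anchor("chr1", 1000, True), Anchor("chr1", 1050, True),
    Anchor("chr1", 1100, True), Anchor("chr1", 1300, False),
    Anchor("chr1", 1340, False), Anchor("chr1", 5000, True),
]
print(candidate_sites(cluster_anchors(example)))
```

In this toy input the isolated read at position 5000 forms a single-read cluster and is discarded by the min_reads filter, while the forward and reverse clusters around positions 1100 to 1300 are paired into one candidate site. A real pipeline would read anchors from the BAM files, and would also perform the assembly, annotation and somatic versus germline classification steps, none of which are shown here.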

Schematic of TranspoSeq