Firehose Parameters

From GSA

DEPRECATED

NOTE: We no longer use this set of parameters. We have switched to the Queue-based architecture described here.

Current Standard Parameters

  • Reference: We prefer hg19 when possible, unless external collaborators are using hg18 or several hundred samples have already been processed on hg18.
  • dbSNP: We are still on dbSNP 129 because of the lower quality of later versions. We hope the upcoming release will be better.

Other parameters are listed in the command lines below.

The GSA Firehose Standard Run

Please note that the parameters below are currently used only for small-target and whole-exome GAII and HiSeq sequencing data. They are not necessarily optimal for low-pass whole-genome or other types of sequencing, and running these pipelines on inappropriate samples may produce erroneous results. Please see Using the GATK for Variant Detection for suggestions about how best to use the GATK for your purposes.

We always run the following for Exome and HybSel Samples:

SampleCleanBam

This pipeline realigns sequences around indels, as described here.

Inputs

  • reference
  • realigned bam
  • interval list
  • dbsnp
  • blacklist of bad lanes
  • base.name (the name of the sample)
  • temporary directory

Command Lines

  • RealignerTargetCreator (GenomeAnalysisTK-1.0.4013 release)
  java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -I <input.bam> -R <reference.genome> <interval.list> <blacklist.file> -o <base.name>.merged.intervals
  • IndelRealigner (GenomeAnalysisTK-1.0.4013 release)
  java -Djava.io.tmpdir=<tmp.dir> -Xmx5g -jar GenomeAnalysisTK.jar -T IndelRealigner -I <input.bam> -R <reference.genome> <blacklist.file> -stats <base.name>.indel.stats -O <base.name>.unfixed.cleaned.bam -maxInRam 200000 -targetIntervals <merged.intervals> -D <dbsnp.file>
  • FixMates
  java -Xmx4096m -jar /humgen/gsa-firehose/firehose/genepattern_data/taskLib/FixMates.13.765/FixMateInformation.jar I=<input.bam> SO=coordinate VALIDATION_STRINGENCY=SILENT TMP_DIR=<tmp.dir> O=<base.name>.cleaned.bam
 

Outputs

  • merged intervals (for IR)
  • unfixed cleaned bam
  • cleaned bam (with fixed mates)
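
The three steps above hand files to one another, which is easier to see when the placeholders are filled from a single set of variables. The following is only a dry-run sketch: it prints the commands rather than running them. The function and variable names are hypothetical, and it assumes that FixMates reads the unfixed cleaned bam written by IndelRealigner, which the command lines above do not state explicitly. The <interval.list> and <blacklist.file> placeholders are passed through as opaque argument strings because the page does not spell out their flags.

```shell
#!/bin/sh
# Dry-run sketch: prints the SampleCleanBam commands in order. Nothing is executed.
# All names here are hypothetical; flags are copied from the command lines above.

print_clean_bam_commands() {
    input_bam=$1        # <input.bam>
    reference=$2        # <reference.genome>
    dbsnp=$3            # <dbsnp.file>
    base_name=$4        # <base.name>
    tmp_dir=$5          # <tmp.dir>
    interval_args=$6    # whatever <interval.list> expands to (passed through as-is)
    blacklist_args=$7   # whatever <blacklist.file> expands to (passed through as-is)

    fixmates_jar=/humgen/gsa-firehose/firehose/genepattern_data/taskLib/FixMates.13.765/FixMateInformation.jar

    # Step 1: RealignerTargetCreator writes <base.name>.merged.intervals.
    echo "java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -I $input_bam -R $reference $interval_args $blacklist_args -o ${base_name}.merged.intervals"

    # Step 2: IndelRealigner consumes the merged intervals from step 1.
    echo "java -Djava.io.tmpdir=$tmp_dir -Xmx5g -jar GenomeAnalysisTK.jar -T IndelRealigner -I $input_bam -R $reference $blacklist_args -stats ${base_name}.indel.stats -O ${base_name}.unfixed.cleaned.bam -maxInRam 200000 -targetIntervals ${base_name}.merged.intervals -D $dbsnp"

    # Step 3: FixMates. ASSUMPTION: its input is the unfixed cleaned bam from step 2.
    echo "java -Xmx4096m -jar $fixmates_jar I=${base_name}.unfixed.cleaned.bam SO=coordinate VALIDATION_STRINGENCY=SILENT TMP_DIR=$tmp_dir O=${base_name}.cleaned.bam"
}

# Example call with made-up file names; the pass-through arguments are left empty.
print_clean_bam_commands sample1.bam reference.fasta dbsnp_129.rod sample1 /tmp "" ""
```

Printing instead of executing keeps the sketch safe to try without a GATK installation; piping the output to a shell would run the steps in sequence.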

SampleIndelGenotyper

This pipeline uses the Indel Genotyper to call insertions and deletions.

Inputs

  • reference
  • realigned bam
  • interval list
  • dbsnp
  • blacklist (optional)

Command Lines

  • IndelGenotyperNoTumor (GenomeAnalysisTK-1.0.3116 release)
   java -jar GenomeAnalysisTK.jar -l INFO -T IndelGenotyperV2 -verbose -I <input.bams> -R <reference.genome> <interval.list> -mnr 20000 -mrl 20000 -o <sample.id>_indels.verbose.bed -e gatk.err -O <sample.id>_indels.bed <blacklist.file>
  • Filter Indels
   perl /humgen/gsa-firehose/firehose/genepattern_data/taskLib/FilterIndelCalls.5.574/filterSingleSampleCalls.pl --calls <sample.verbose.calls> --max_cons_av_mm 1.9 --max_cons_nqs_av_mm 0.2 --mode ANNOTATE --output <sample.id>_filtered_indels.bed

Outputs

  • Indel Calls
  • Filtered indel calls
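
As a rough illustration of how the two commands connect, the dry-run sketch below prints them with shared variables. It assumes that the <sample.verbose.calls> input of the filter script is the verbose bed file written by the genotyper's -o option; the page does not confirm this. The function and variable names are hypothetical, and <interval.list> and <blacklist.file> are passed through as opaque argument strings.

```shell
#!/bin/sh
# Dry-run sketch: prints the SampleIndelGenotyper commands. Nothing is executed.
# All names here are hypothetical; flags are copied from the command lines above.

print_indel_commands() {
    input_bams=$1       # <input.bams>
    reference=$2        # <reference.genome>
    sample_id=$3        # <sample.id>
    interval_args=$4    # whatever <interval.list> expands to (passed through as-is)
    blacklist_args=$5   # whatever <blacklist.file> expands to (passed through as-is)

    filter_script=/humgen/gsa-firehose/firehose/genepattern_data/taskLib/FilterIndelCalls.5.574/filterSingleSampleCalls.pl

    # Step 1: call indels; -o is the verbose output, -O the plain bed output.
    echo "java -jar GenomeAnalysisTK.jar -l INFO -T IndelGenotyperV2 -verbose -I $input_bams -R $reference $interval_args -mnr 20000 -mrl 20000 -o ${sample_id}_indels.verbose.bed -e gatk.err -O ${sample_id}_indels.bed $blacklist_args"

    # Step 2: filter. ASSUMPTION: --calls takes the verbose bed from step 1.
    echo "perl $filter_script --calls ${sample_id}_indels.verbose.bed --max_cons_av_mm 1.9 --max_cons_nqs_av_mm 0.2 --mode ANNOTATE --output ${sample_id}_filtered_indels.bed"
}

# Example call with made-up file names; the pass-through arguments are left empty.
print_indel_commands sample1.cleaned.bam reference.fasta sample1 "" ""
```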

SetUnifiedGenotypertoEval

This creates SNP calls for a set of samples.

Inputs

  • reference
  • list of cleaned bams from SampleCleanBam
  • interval list
  • dbsnp
  • blacklist (optional)
  • sample set base name

Command Lines

   java -Xmx5g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -I <bam.list> -R <reference.genome> --DBSNP <dbsnp> <interval.list> -mrl 1000000 -mbq 20 -mmq 30 -confidence 10 <blacklist.file> <platform> -varout <base.name>.vcf -l INFO -A HaplotypeScore
   java -Xmx2g -jar GenomeAnalysisTK.jar -T VariantFiltration -R <reference.genome> -D <dbsnp> <interval.list> --clusterWindowSize 10 -B variant,VCF,<unfiltered.vcf> -filterName <filter.name> -filter <filter.expression> <filter2.name> <filter2.expression> <filter3.name> <filter3.expression> <filter4.name> <filter4.expression> -o <base.name>.filtered.vcf
   <java> -Xmx2g -jar GenomeAnalysisTK.jar -T VariantEval -R <reference.genome> -D <dbsnp> <interval.list> --extensiveSubsets -B eval,VCF,<vcf.file> -o <sample.id>filtered.eval
   <java> -Xmx2g -jar GenomeAnalysisTK.jar -T VariantEval -R <reference.genome> -D <dbsnp> <interval.list> --extensiveSubsets -B eval,VCF,<vcf.file> -o <sample.id>unfiltered.eval
  • VCFToMaf
   sh /humgen/gsa-firehose/firehose/genepattern_data/taskLib/VcfToMaf.5.494/wrapper.sh <vcf.file> <base.name>.maf <perl>
  • AnnotateMaf
   python /humgen/gsa-firehose/firehose/genepattern_data/taskLib/AnnotateMaf.18.402/run_matlab.py annotate_maflite <maf.file> <base.name>.maf.annotated
  • AnnotateVCFwithMAF
   python /humgen/gsa-firehose/firehose/genepattern_data/taskLib/AnnotateVCFwithMAF.3.439/AnnotateVCFwithMAF.py <vcf.file> <annotated.maf.file>

Outputs

  • SNP Calls (VCF)
  • Filtered SNP calls (VCF)
  • Eval metrics for unfiltered calls
  • Eval metrics for filtered calls
  • maf file
  • annotated maf file
  • maf annotated vcf
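
The two VariantEval command lines differ only in their output name, so they can be generated in a loop. The dry-run sketch below prints both. It assumes that the filtered run reads <base.name>.filtered.vcf (the VariantFiltration output) and the unfiltered run reads <base.name>.vcf (the UnifiedGenotyper output); the page gives only <vcf.file> for both. It also substitutes a plain java for the <java> placeholder. The function and variable names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the two VariantEval commands. Nothing is executed.
# All names here are hypothetical; flags are copied from the command lines above.

print_variant_eval_commands() {
    reference=$1        # <reference.genome>
    dbsnp=$2            # <dbsnp>
    base_name=$3        # <base.name>
    sample_id=$4        # <sample.id>
    interval_args=$5    # whatever <interval.list> expands to (passed through as-is)

    for label in filtered unfiltered; do
        # ASSUMPTION: which VCF feeds which evaluation is inferred from the file names.
        if [ "$label" = filtered ]; then
            vcf="${base_name}.filtered.vcf"
        else
            vcf="${base_name}.vcf"
        fi
        # Output name follows the page literally: <sample.id> directly followed by the label.
        echo "java -Xmx2g -jar GenomeAnalysisTK.jar -T VariantEval -R $reference -D $dbsnp $interval_args --extensiveSubsets -B eval,VCF,$vcf -o ${sample_id}${label}.eval"
    done
}

# Example call with made-up file names; the interval argument is left empty.
print_variant_eval_commands reference.fasta dbsnp_129.rod set1 set1 ""
```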