Firehose Parameters
DEPRECATED
NOTE: We are no longer using this set of parameters. We have switched to the Queue-based architecture described here.
Current Standard Parameters
- Reference: We prefer hg19 when possible, unless external collaborators are using hg18 or several hundred samples have already been processed on hg18.
- dbSNP: We are still on dbSNP 129 because of the lower quality of later versions. We hope the upcoming release will be better.
Other parameters are listed in the command lines below.
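The reference preference above can be written as a small decision rule. This is only a sketch: the function name and the legacy_threshold value of 300 are assumptions standing in for "several hundred samples", which this page does not quantify.

```python
def choose_reference(collaborators_use_hg18: bool,
                     samples_already_on_hg18: int,
                     legacy_threshold: int = 300) -> str:
    """Pick a reference build following the preference stated above.

    legacy_threshold is a hypothetical stand-in for "several hundred
    samples"; the page gives no exact number.
    """
    if collaborators_use_hg18 or samples_already_on_hg18 >= legacy_threshold:
        return "hg18"
    return "hg19"


# A new project with no hg18 history.
print(choose_reference(collaborators_use_hg18=False, samples_already_on_hg18=0))
# A project whose external collaborators work on hg18.
print(choose_reference(collaborators_use_hg18=True, samples_already_on_hg18=0))
```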
The GSA Firehose Standard Run
Please note that the parameters below are currently used only for small-target and whole-exome sequencing from GAII and HiSeq instruments. They are not necessarily optimal for low-pass whole-genome or other types of sequencing, and running these pipelines on inappropriate samples may produce erroneous results. Please see Using the GATK for Variant Detection for suggestions about how best to use the GATK for your purposes.
We always run the following for Exome and HybSel Samples:
SampleCleanBam
This pipeline realigns sequences around indels, as described here.
Inputs
- reference
- realigned bam
- interval list
- dbsnp
- blacklist of bad lanes
- base.name (the name of the sample)
- temporary directory
Command Lines
- RealignerTargetCreator (GenomeAnalysisTK-1.0.4013 release)
java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -I <input.bam> -R <reference.genome> <interval.list> <blacklist.file> -o <base.name>.merged.intervals
- IndelRealigner (GenomeAnalysisTK-1.0.4013 release)
java -Djava.io.tmpdir=<tmp.dir> -Xmx5g -jar GenomeAnalysisTK.jar -T IndelRealigner -I <input.bam> -R <reference.genome> <blacklist.file> -stats <base.name>.indel.stats -O <base.name>.unfixed.cleaned.bam -maxInRam 200000 -targetIntervals <merged.intervals> -D <dbsnp.file>
- FixMates
java -Xmx4096m -jar /humgen/gsa-firehose/firehose/genepattern_data/taskLib/FixMates.13.765/FixMateInformation.jar I=<input.bam> SO=coordinate VALIDATION_STRINGENCY=SILENT TMP_DIR=<tmp.dir> O=<base.name>.cleaned.bam
Outputs
- merged intervals (for IndelRealigner)
- unfixed cleaned bam
- cleaned bam (with fixed mates)
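The three SampleCleanBam steps hand files to each other: the merged intervals feed IndelRealigner, and the unfixed cleaned bam feeds FixMates. The sketch below builds the three command lines as argument lists to make that hand-off explicit. It makes several assumptions: the function name is hypothetical, the optional <interval.list> and <blacklist.file> placeholders are taken to expand to complete argument fragments supplied by the caller, and the FixMates input is taken to be the unfixed cleaned bam. It only builds and prints the commands and does not run anything.

```python
import shlex

# Jar path copied from the FixMates command line above.
FIX_MATES_JAR = ("/humgen/gsa-firehose/firehose/genepattern_data/taskLib/"
                 "FixMates.13.765/FixMateInformation.jar")


def sample_clean_bam_commands(input_bam, reference, dbsnp, base_name, tmp_dir,
                              interval_args=(), blacklist_args=()):
    """Return the three command lines as argument lists (nothing is executed).

    interval_args and blacklist_args are assumed to be ready-made argument
    fragments (for example ["-L", "targets.interval_list"]) or empty.
    """
    merged_intervals = f"{base_name}.merged.intervals"
    unfixed_bam = f"{base_name}.unfixed.cleaned.bam"
    cleaned_bam = f"{base_name}.cleaned.bam"

    target_creator = [
        "java", "-Xmx2g", "-jar", "GenomeAnalysisTK.jar",
        "-T", "RealignerTargetCreator", "-I", input_bam, "-R", reference,
        *interval_args, *blacklist_args, "-o", merged_intervals,
    ]
    realigner = [
        "java", f"-Djava.io.tmpdir={tmp_dir}", "-Xmx5g",
        "-jar", "GenomeAnalysisTK.jar",
        "-T", "IndelRealigner", "-I", input_bam, "-R", reference,
        *blacklist_args, "-stats", f"{base_name}.indel.stats",
        "-O", unfixed_bam, "-maxInRam", "200000",
        "-targetIntervals", merged_intervals, "-D", dbsnp,
    ]
    fix_mates = [
        "java", "-Xmx4096m", "-jar", FIX_MATES_JAR,
        f"I={unfixed_bam}",  # assumption: FixMates reads the realigner output
        "SO=coordinate", "VALIDATION_STRINGENCY=SILENT",
        f"TMP_DIR={tmp_dir}", f"O={cleaned_bam}",
    ]
    return [target_creator, realigner, fix_mates]


# Example file names below are made up for illustration.
for command in sample_clean_bam_commands(
        "sample.bam", "reference.fasta", "dbsnp_129.rod", "SAMPLE1", "/tmp",
        interval_args=["-L", "targets.interval_list"]):
    print(shlex.join(command))
```

Returning argument lists rather than one long string keeps each value a single argument even if a path contains spaces; shlex.join is used only for display.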
SampleIndelGenotyper
This pipeline uses the Indel Genotyper to call insertions and deletions.
Inputs
- reference
- realigned bam
- interval list
- dbsnp
- blacklist (optional)
Command Lines
- IndelGenotyperNoTumor (GenomeAnalysisTK-1.0.3116 release)
java -jar GenomeAnalysisTK.jar -l INFO -T IndelGenotyperV2 -verbose -I <input.bams> -R <reference.genome> <interval.list> -mnr 20000 -mrl 20000 -o <sample.id>_indels.verbose.bed -e gatk.err -O <sample.id>_indels.bed <blacklist.file>
- Filter Indels
perl /humgen/gsa-firehose/firehose/genepattern_data/taskLib/FilterIndelCalls.5.574/filterSingleSampleCalls.pl --calls <sample.verbose.calls> --max_cons_av_mm 1.9 --max_cons_nqs_av_mm 0.2 --mode ANNOTATE --output <sample.id>_filtered_indels.bed
Outputs
- Indel Calls
- Filtered indel calls
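The file names in this pipeline all derive from the sample id, and the filter step consumes the genotyper output. The sketch below spells that out. It assumes that <sample.verbose.calls> refers to the verbose bed written by the genotyper's -o option; the page does not state this explicitly, and the function names are hypothetical.

```python
# Script path copied from the Filter Indels command line above.
FILTER_SCRIPT = ("/humgen/gsa-firehose/firehose/genepattern_data/taskLib/"
                 "FilterIndelCalls.5.574/filterSingleSampleCalls.pl")


def indel_file_names(sample_id):
    """File names used by the indel pipeline for one sample."""
    return {
        "verbose_calls": f"{sample_id}_indels.verbose.bed",
        "calls": f"{sample_id}_indels.bed",
        "filtered_calls": f"{sample_id}_filtered_indels.bed",
    }


def filter_indels_command(sample_id, max_cons_av_mm=1.9,
                          max_cons_nqs_av_mm=0.2):
    """Build the filter command; defaults are the thresholds listed above."""
    names = indel_file_names(sample_id)
    return [
        "perl", FILTER_SCRIPT,
        # Assumption: the filter reads the verbose genotyper output.
        "--calls", names["verbose_calls"],
        "--max_cons_av_mm", str(max_cons_av_mm),
        "--max_cons_nqs_av_mm", str(max_cons_nqs_av_mm),
        "--mode", "ANNOTATE",
        "--output", names["filtered_calls"],
    ]


print(" ".join(filter_indels_command("SAMPLE1")))
```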
SetUnifiedGenotypertoEval
This creates SNP calls for a set of samples.
Inputs
- reference
- list of cleaned bams from SampleCleanBam
- interval list
- dbsnp
- blacklist (optional)
- sample set base name
Command Lines
- UnifiedGenotyper (GenomeAnalysisTK-1.0.3175 release)
java -Xmx5g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -I <bam.list> -R <reference.genome> --DBSNP <dbsnp> <interval.list> -mrl 1000000 -mbq 20 -mmq 30 -confidence 10 <blacklist.file> <platform> -varout <base.name>.vcf -l INFO -A HaplotypeScore
- VariantFiltration (GenomeAnalysisTK-1.0.3185 release)
java -Xmx2g -jar GenomeAnalysisTK.jar -T VariantFiltration -R <reference.genome> -D <dbsnp> <interval.list> --clusterWindowSize 10 -B variant,VCF,<unfiltered.vcf> -filterName <filter.name> -filter <filter.expression> <filter2.name> <filter2.expression> <filter3.name> <filter3.expression> <filter4.name> <filter4.expression> -o <base.name>.filtered.vcf
- VariantEval on filtered calls (GenomeAnalysisTK-1.0.3148:3150M release)
<java> -Xmx2g -jar GenomeAnalysisTK.jar -T VariantEval -R <reference.genome> -D <dbsnp> <interval.list> --extensiveSubsets -B eval,VCF,<vcf.file> -o <sample.id>filtered.eval
- VariantEval on unfiltered calls (GenomeAnalysisTK-1.0.3148:3150M release)
<java> -Xmx2g -jar GenomeAnalysisTK.jar -T VariantEval -R <reference.genome> -D <dbsnp> <interval.list> --extensiveSubsets -B eval,VCF,<vcf.file> -o <sample.id>unfiltered.eval
- VCFToMaf
sh /humgen/gsa-firehose/firehose/genepattern_data/taskLib/VcfToMaf.5.494/wrapper.sh <vcf.file> <base.name>.maf <perl>
- AnnotateMaf
python /humgen/gsa-firehose/firehose/genepattern_data/taskLib/AnnotateMaf.18.402/run_matlab.py annotate_maflite <maf.file> <base.name>.maf.annotated
- AnnotateVCFwithMAF
python /humgen/gsa-firehose/firehose/genepattern_data/taskLib/AnnotateVCFwithMAF.3.439/AnnotateVCFwithMAF.py <vcf.file> <annotated.maf.file>
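The VariantFiltration command above takes up to four filter name and expression pairs. A minimal sketch of how such pairs might expand into repeated -filterName / -filter arguments follows. It assumes the <filter2.name> and <filter2.expression> style placeholders expand to the same two flags as the first pair. The filter names and expressions in the example are invented for illustration; this page does not list the ones actually used.

```python
import shlex


def filter_args(filters):
    """Expand (name, expression) pairs into -filterName/-filter arguments."""
    args = []
    for name, expression in filters:
        args += ["-filterName", name, "-filter", expression]
    return args


# Hypothetical filters, not the ones used in the Firehose run.
example_filters = [
    ("ExampleLowQual", "QUAL < 30.0"),
    ("ExampleHighDepth", "DP > 100"),
]

# shlex.join quotes each expression so the shell passes it as one argument.
print(shlex.join(filter_args(example_filters)))
```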
Outputs
- SNP Calls (VCF)
- Filtered SNP calls (VCF)
- Eval metrics for unfiltered calls
- Eval metrics for filtered calls
- maf file
- annotated maf file
- maf annotated vcf
