As I know GATK is worked fine for Genome SNP Calling. Can I know it work well for Transcriptome SNP calling as well? I can't find much info regarding Transcriptome SNP calling.
If yes, can I know the step for Transcriptome SNP calling by GATK is same as what we did for Genome SNP calling? eg. alignment raw read to reference transcriptome, marking/remove PCR duplicates, local realignment around indel and quality score re-calibration (if know dbSNP is available).
Thanks and looking forward to hear from you.
Hi all: I find that among all the work flows of GATK http://www.broadinstitute.org/gatk/guide/topic?name=methods-and-workflows there are no workflows for RNA-seq analysis. I understand that GATK mainly focuses on variant calling, can anyone tell me how to use GATK for RNA-seq analysis?
I've been trying to get ReduceReads working in a pipeline I've made that incorporates GATK tools to call variants in RNA-seq data. After performing indel realignment and base recalibration I'm trying to use ReduceReads prior to calling variants using Unified Genotyper.
I've been using GATK version 2.3.9. When I try to use ReduceReads on a 1.7Gb .bam file, I need to set aside 100Gb memory to perform the operation for the process to complete (otherwise I'll get an error saying I didn't provide enough memory to run the program and to adjust the maximum heap size using the -Xmx option etc).
The problem isn't that ReduceReads doesn't work - it does, however of the 100Gb I set aside, it uses 80-90Gb of it. This means I can't run more than one job at a time due to the constraints of the machine I'm using etc.
I've been looking through the GATK forum and understand it may be a GATK version issue, though I've tried using GATK 2.5.2 ReduceReads for this step and it still requires 70-80Gb memory.
Can anyone provide any clues as to what I may be doing wrong? or whether I can do something to make it use less memory so I can run multiple jobs simultaneously?
The command I'm using is:
java -Xmx100g -Djava.io.tmpdir=/RAW/javatmp/ -jar /NMC/LCR/GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -T ReduceReads -R /SCRATCH/LCR/BWAIndex_hg19/genome.fa -I out.bam.sorted.bam.readGroups.bam.rmdup.bam.realigned.bam.recalibrated.bam -o out.bam.sorted.bam.readGroups.bam.rmdup.bam.realigned.bam.recalibrated.bam.reducedReads.bam
Thanks in advance,
Hi! I have worked some time on a mRNAseq set, single-end. Its a high quality set and lots of biological replicates (200+).
My question is, how could I best contribute to the methodology used for SNPs call in mRNAseq? What do we need tested to improve this method?