Tagged with #rna-seq
0 documentation articles | 0 announcements | 8 forum discussions

No posts found with the requested search criteria.
No posts found with the requested search criteria.

Created 2016-02-05 09:25:43 | Updated | Tags: haplotypecaller best-practices merge rna-seq
Comments (0)

Hello, and thanks for making all the GATK tools! I have recently started to try my hand at variant calling of my RNA-seq data, following the GATK Best Practices more or less verbatim, only excluding indel alignment (because I am only interested in SNPs at this point) and the BQSR (partly because I have very high quality data, but mostly because I couldn't get it to work in the workflow).

I have three replicates for each of my samples, and my question is where, if at all, I should merge the data from them. I am not sure if I can (or even should!) merge the FASTQ files before the alignment step, or merge the aligned BAM files, or something else. I read that for aligners such as BWA the options are (more or less) equivalent, but seeing as the RNA-seq Best Practice workflow using STAR... What would be the "correct" way to do it, if at all? How would merging (at some level) affect the speed of the workflow, and can I optimise that somehow?

If it's a bad idea to do merging, how would I determine the "true" variant from my three resulting VCF-files at the end, for cases where they differ?

Created 2015-10-03 00:45:25 | Updated | Tags: haplotypecaller rna-seq
Comments (3)

Hi, I am trying to call SNPs from RNA-seq data. The data that I have is a pooled sample (from the larvae of shellfish 1000s of larvae pooled together to get enough RNA) I have 6 of those samples. Can I use GATK to call SNPs in these pooled samples ?


Created 2015-09-28 14:11:22 | Updated | Tags: vcf rna-seq
Comments (10)

Hello All,

I am using I am using GATK RNA-seq variant pipeline for finding muttaion/vatiants on the list of gene given in teh follwoing command line

java-1.7 -jar -Xincgc -Xmx1586M GenomeAnalysisTK-3.2-2.jar -T HaplotypeCaller --filter_reads_with_N_cigar -R human_genome37_gatk.fa -D dbsnp_137.hg19.vcf -I sample_split.bam -o sample.vcf -L mylist.intervals

And the resulting VCF files has for variants AF either 100 % or 50 % . It would be great if anyone would explain me what does AF means in INFO column from VCF file. I AF means allele frequency or it has to be calculated separately from VCF file and if so how can I do it..??? Thank you

Created 2015-09-17 04:24:37 | Updated | Tags: validatesamfile rna-seq errors
Comments (4)

Hi everyone,

I have followed the instructions found here https://www.broadinstitute.org/gatk/guide/article?id=3891 to analyze RNA-seq data. However, when I tried to validate the final bam file using ValidateSamFile tool I get the following errors:

HISTOGRAM java.lang.String


It looks like the Split'N'Trim step is the cause of these errors since I validate the bam files before and after.

Can anyone tell me why I am getting these errors and how to fix them please?

I appreciate your help, Regards Hak

Created 2015-04-01 14:56:27 | Updated | Tags: rna-seq
Comments (3)


I am trying to run RNA-SeQC for alignment of reads against Zebrafish reference using this command:

java -Xmx24g -jar /share/pkg/RNA-SeQC/1.1.7/RNA-SeQC_v1.1.7.jar -r danRer7.fa -t danRer7/ucsc_new.gtf
-n 1000 -s 'sample1|tumor_aln_Filtered_SortedFixed_Reorder_Clean_MarkDup_AddRG.bam|RNASEQC analysis' -o metrics

After running for a while I get this error: Running GATK Depth of Coverage Analysis .... Arguments: -T DepthOfCoverage -R /home/sg15w/WES/Coel/BWAIndex/danRer7.fa -I /project/umw_michael_czech/BIOIFX-032/alignment/BWA/RealignedBamDir/tumor_aln_Filtered_SortedFixed_Reorder_Clean_MarkDup_AddRG.bam -o _metrics/sample1/highexpr//perBaseDoC.out -L _metrics/sample1/highexpr/intervals.list -l ERROR Arguments Array: [-T, DepthOfCoverage, -R, /home/sg15w/WES/Coel/BWAIndex/danRer7.fa, -I, /project/umw_michael_czech/BIOIFX-032/alignment/BWA/RealignedBamDir/tumor_aln_Filtered_SortedFixed_Reorder_Clean_MarkDup_AddRG.bam, -o, _metrics/sample1/highexpr//perBaseDoC.out, -L, _metrics/sample1/highexpr/intervals.list, -l, ERROR] org.broadinstitute.sting.utils.exceptions.UserException$MalformedGenomeLoc: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The genome loc coordinates 62098192-62098475 exceed the contig size (59938731) at org.broadinstitute.sting.utils.GenomeLocParser.vglHelper(GenomeLocParser.java:324) at org.broadinstitute.sting.utils.GenomeLocParser.validateGenomeLoc(GenomeLocParser.java:307) at org.broadinstitute.sting.utils.GenomeLocParser.createGenomeLoc(GenomeLocParser.java:265) at org.broadinstitute.sting.utils.GenomeLocParser.parseGenomeLoc(GenomeLocParser.java:389) at org.broadinstitute.sting.utils.interval.IntervalUtils.intervalFileToList(IntervalUtils.java:139) at org.broadinstitute.sting.utils.interval.IntervalUtils.parseIntervalArguments(IntervalUtils.java:71) at org.broadinstitute.sting.commandline.IntervalBinding.getIntervals(IntervalBinding.java:106) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.loadIntervals(GenomeAnalysisEngine.java:616) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeIntervals(GenomeAnalysisEngine.java:583) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:233) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146) at org.broadinstitute.cga.rnaseq.gatk.GATKTools.runDoC(GATKTools.java:59) at org.broadinstitute.cga.rnaseq.PerBaseDoC.runDoC(PerBaseDoC.java:888) at org.broadinstitute.cga.rnaseq.PerBaseDoC.runDoC(PerBaseDoC.java:858) at org.broadinstitute.cga.rnaseq.RNASeqMetrics.runMetrics(RNASeqMetrics.java:264) at org.broadinstitute.cga.rnaseq.RNASeqMetrics.execute(RNASeqMetrics.java:166) at org.broadinstitute.cga.rnaseq.RNASeqMetrics.main(RNASeqMetrics.java:135) RNA-SeQC Total Runtime: 115 min

Could you please help me fix this error.

Thanks Sharvari

Created 2014-12-11 18:37:39 | Updated | Tags: rnaseq cohort rna-seq
Comments (2)

Dear GATK team,

Is there a value in cohort calling in RNA-Seq similar to what is recommended in the GATK DNA-Seq workflow? I am trying to understand why cohort calling is highly emphasized in DNA-Seq but not mentioned in the RNA-Seq workflow.

Thank you!


Created 2014-11-30 22:28:31 | Updated | Tags: haplotypecaller rna-seq
Comments (1)

Hi, I am performing RNA-Seq to identify new polymorphisms in a species of sea star. Our short-term goal is to generate novel DNA sequences of coding genes for phylogenetic analysis. It is therefore important that polymorphisms be called accurately and that they can be phased.

Our reference genome is poorly assembled and comprises over 60,000 scaffolds and contigs. Subsequently, when paired-end RNA-Seq reads are aligned to this reference genome (using TopHat), the two halves of the pair are often mapped to different scaffolds or contigs. This seems to greatly lower the MAQ score, which in turn leads to HaplotypeCaller missing well-supported polymorphisms, because the reads that support them have MAQ values between 1 and 3.

The obvious solution for this is to set the --min-mapping-quality-score to 1 or 2, rather than the default of 20; and raising the --min_base_quality_score from the default value of 10 to maybe 25 or 30. This does, however, increase the risk of calling false positives from poorly aligned regions.

Has this situation been considered by the GATK development team, and is there a recommended way to account for it?

Created 2014-01-20 19:28:37 | Updated | Tags: bwa mapping rna-seq
Comments (1)

Hi all,

My question is on bwa software when one want to map RNA-seq data on the entire human genome. What should be the specific settings to use to get maximum mapping? Should it be effective if no options are used in the command line?

Thank you for your time