Tagged with #fastaalternatereferencemaker
0 documentation articles | 1 announcement | 10 forum discussions


No posts found with the requested search criteria.
Comments (0)

GATK 3.1 was released on March 18, 2014. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history


Haplotype Caller

  • Added new capabilities to the Haplotype Caller to use hardware-based optimizations. Can be enabled with --pair_hmm_implementation VECTOR_LOGLESS_CACHING. Please see the 3.1 Version Highlights for more details about expected speed ups and some background on the collaboration that made these possible.
  • Fixed bugs in computing the weights of edges in the assembly graph. This was causing bad genotypes to be output when running the Haplotype Caller over multiple samples simultaneously (as opposed to creating gVCFs in the new recommended pipeline, which was working as expected).

Variant Recalibrator

  • Fixed issue where output could be non-deterministic with very large data sets.

CalculateGenotypePosteriors

  • Fixed several bugs where bad input were causing the tool to crash instead of gracefully exiting with an error message.

Miscellaneous

  • RandomlySplitVariants can now output splits comprised of more than 2 output files.
  • FastaAlternateReferenceMaker can now output heterozygous sites using IUPAC ambiguity encoding.
  • Picard, Tribble, and Variant jars updated to version 1.109.1722.
Comments (1)

I was working with FastaAlternateReferenceMaker today. I wanted to use it on two samples. The first sample worked fine. The second sample came up with an error about a few REF values not conforming (e.g. :A) in the vcf. I wrote a script to correct the errors and remove the : from the vcf file. I tried to run it again, but received memory allocation errors. So, I dealt with that and am now allocating 6g for a 15G VCF and a 188M fasta reference. I got it running (I think) and then started getting the NegativeArraySizeException java error. I saw this had come up as a bug in earlier versions of GATK, but it seemed like it had been resolved...perhaps not? Anyhow, this 'quick little run' has taken up the better part of my day and I would be super greatful for some assistance troubleshooting this problem.

Command: $ java -Xmx6g -jar ../../shays/GATK/GenomeAnalysisTK.jar -l ERROR -R ../up/chrom6.2.fa -T FastaAlternateReferenceMaker -o D-chr6.cons.fa --variant D-c6.vcf

Trace:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NegativeArraySizeException at org.broad.tribble.readers.AsciiLineReader.readLine(AsciiLineReader.java:101) at org.broad.tribble.readers.AsciiLineReader.readLine(AsciiLineReader.java:120) at org.broadinstitute.variant.vcf.VCFCodec.readHeader(VCFCodec.java:85) at org.broad.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:76) at org.broad.tribble.index.IndexFactory$FeatureIterator.readHeader(IndexFactory.java:366) at org.broad.tribble.index.IndexFactory$FeatureIterator.(IndexFactory.java:354) at org.broad.tribble.index.IndexFactory.createDynamicIndex(IndexFactory.java:281) at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:388) at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:274) at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:211) at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:140) at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedQueryDataPool.(ReferenceOrderedDataSource.java:208) at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedDataSource.(ReferenceOrderedDataSource.java:88) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:873) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:725) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:259) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152) at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):
ERROR
ERROR Please check the documentation guide to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------
Comments (1)

Hey,

first of all thanks for the support offered here. But (sadly) I have an other two questions.

I used the FastaAlternateReferenceMaker to create a consesus sequence and it worked great. Here I have a question regarding a warning: "IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation". I do not get this warning. Which dictionary is needed here. Is this warning comming up, because I used .gz and .tbi compressed vcf-files?

And a furhter question regarding the creation of a consesus sequence. I`m unsure what to do in cases of heterozygous INDELs. What would you suggest to do? Skip them as they are only represented by one allele or include them in the consensus. Both seems to be incorrect. Do you know the general handling here?

Volavii

Comments (1)

Hey,

I`m trying to use the option --useIUPAC with the FastaAlternateReferenceMaker. I tried several ways to set this option but all lead to ERRORs (see below). Without the option no ERROR message occurrs.

Here my code and...

java -Xms1g -jar "/home/GenomeAnalysisTK.jar" \
-T "FastaAlternateReferenceMaker" \
-R "/home/refGen.fa" \
-L "chromosome3" \
-lw "100" \
--variant "/home/chromosome3_raw_snp_filter3.vcf.gz" \
-o "/home/chromosome3.fasta" \
--useIUPAC 

the ways I used useIUPAC:

--useIUPAC
##### ERROR MESSAGE: Argument with name 'useIUPAC' isn't defined.

--useIUPAC "true"
##### ERROR MESSAGE: Argument with name 'useIUPAC' isn't defined.

-useIUPAC "true"
##### ERROR MESSAGE: Argument with name 'useIUPAC' isn't defined.

-useIUPAC
##### ERROR MESSAGE: Argument with name 'useIUPAC' isn't defined.
Comments (10)

Hi,

I would like to change the reference genome with the sequences of the variants I've found in my deep sequencing data. So here is my command line: java -jar GenomeAnalysisTK.jar -R ../human_g1k_v37.fasta -T FastaAlternateReferenceMaker -o ../output_new_genome.fasta -L ../cancer.intervals -- ../output.snp.indel.LOH.vcf

I got this error:

ERROR MESSAGE: Invalid argument value '--' at position 8.
ERROR Invalid argument value '../output.snp.indel.LOH.vcf' at position 9.

Don't really understand...

Thank you for your time and help

Comments (3)

Dear all I am writing to you to understand why FastaAlternateReferenceMaker is missing a deletion in the final consensus. I am using gatk v. 2.8-1 and this commandline:

gatk -R ref.fa -T FastaAlternateReferenceMaker -o reads_vs_ref_gatk_consensus.fa --variant reads_vs_ref_var.vcf

output from gatk was:

INFO 08:57:46,123 HelpFormatter - -------------------------------------------------------------------------------- INFO 08:57:46,126 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.8-1-g932cd3a, Compiled 2013/12/06 16:47:15 INFO 08:57:46,126 HelpFormatter - Copyright (c) 2010 The Broad Institute INFO 08:57:46,126 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk INFO 08:57:46,130 HelpFormatter - Program Args: -R ref.fa -T FastaAlternateReferenceMaker -o reads_vs_ref_gatk_consensus.fa --variant reads_vs_ref_var.vcf INFO 08:57:46,130 HelpFormatter - Date/Time: 2014/02/20 08:57:46
INFO 08:57:46,131 HelpFormatter - --------------------------------------------------------------------------------
INFO 08:57:46,131 HelpFormatter - --------------------------------------------------------------------------------
INFO 08:57:46,137 ArgumentTypeDescriptor - Dynamically determined type of reads_vs_ref_var.vcf to be VCF
INFO 08:57:46,856 GenomeAnalysisEngine - Strictness is SILENT
INFO 08:57:46,978 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 08:57:47,010 RMDTrackBuilder - Creating Tribble index in memory for file reads_vs_ref_var.vcf
INFO 08:57:47,079 RMDTrackBuilder - Writing Tribble index to disk for file /home/aprea/test/consensus/reads_vs_ref_var.vcf.idx
INFO 08:57:47,223 GenomeAnalysisEngine - Preparing for traversal
INFO 08:57:47,224 GenomeAnalysisEngine - Done preparing for traversal
INFO 08:57:47,225 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 08:57:47,226 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 08:57:47,546 ProgressMeter - done 1.05e+04 0.0 s 30.0 s 99.0% 0.0 s 0.0 s
INFO 08:57:47,547 ProgressMeter - Total runtime 0.32 secs, 0.01 min, 0.00 hours
INFO 08:57:51,452 GATKRunReport - Uploaded run statistics report to AWS S3

There are 3 variants in the vcf file: the first is a SNP and is reported, the second is an insertion and is reported but the third, a deletion, is missing. Could you please tell me where's the problem? I attached both reference and vcf file.

Many thanks.

Comments (4)

Hi,

I am using FastaAlternateReferenceMaker to create consensus sequences as follows:

java -Xmx12g -jar /GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R /Reference/chromosome.1.fa -o /Output/consensus.1.fa --variant /VCF/chromosome1.vcf -L /Interval/chromosome.1.list

This command line produces a single fasta file including consensus sequences for the intervals provided in the list file.

Two questions:

(1) Presumably, sequence header for each sequence in the consensus fasta file corresponds to the contig name in the master reference file. Is there any way to modify the command line to print, say, the interval as the sequence header instead of the contig name?

Note: I can feed the program one interval at a time and name the output file accordingly, however, I'd like to stick to submitting my query per one chromosome at a time for the sake of saving up time.

(2) Number of sequences in the consensus fasta do not match the number of non-overlapping intervals in the input list. I would think that this is because some intervals are variant-free and therefore no alternate reference is reported for them. Do you confirm this is the case? I cannot easily check because the issue mentioned in (1).

Thank you.

Comments (11)

Hello

I tried to run FastaAlternateReferenceMaker and I get the following error:

WARNING 2013-09-18 16:28:28 IntervalList    Ignoring interval for unknown reference: Chr1:3580210-3580286

For all the intervals I submitted. I already looked around on the web, and I did not find any answer, knowing that my chromosome names are all with the 'Chr' format in all the files and that my interval files are tab delimited.

My interval file look like:

@HD     VN:1.4  SO:unsorted
@SQ     SN:Chr1 LN:158337067    UR:file:chromosome_3.1.fasta    M5:0631b350aa263a0f714de8ba9d609eb0
@SQ     SN:Chr2 LN:137060424    UR:file:Chromosome_3.1.fasta    M5:15898469d6142f8bb74f769bfe9b155f
@SQ     SN:Chr3 LN:121430405    UR:file:Chromosome_3.1.fasta    M5:c515c4da7c2cd2d24c9487db8f733cfd
...
Chr1    3580210 3580286 +       ID=MI0011294_1;accession_number=MI0011294
Chr1    3580220 3580240 +       ID=MIMAT0011792_1;accession_number=MIMAT0011792
Chr1    3607747 3607842 -       ID=MI0014499_1;accession_number=MI0014499
Chr1    3607802 3607822 -       ID=MIMAT0017395_1;accession_number=MIMAT0017395
Chr1    10227277        10227339        -       ID=MI0009752_1;accession_number=MI0009752
Chr1    10227315        10227337        -       ID=MIMAT0009241_1;accession_number=MIMAT0009241
Chr1    19881347        19881431        -       ID=MI0005457_1;accession_number=MI0005457
Chr1    19881398        19881419        -       ID=MIMAT0003539_1;accession_number=MIMAT0003539
Chr1    19930459        19930542        -       ID=MI0005454_1;accession_number=MI0005454
Chr1    19930511        19930532        -       ID=MIMAT0004332_1;accession_number=MIMAT0004332
...

The header of my interval file is a copy of the Chromosome_3.1.dict I do not know what is misformated and why I get this error

Thanks

Martin

Comments (3)

Command:

java -jar GenomeAnalysisTK.jar -R S288C_refseq.fasta -T FastaAlternateReferenceMaker -o WT_refseq.fasta --variant WT_common.vcf

Output (last line):

##### ERROR MESSAGE: Line 54: there aren't enough columns for line 
chr1   1   .   CCACACCACACCCACAC   CCACCCACACCACACCCACAC,CCACACCACACCCACACCACACCCACAC   5.79   .   INDEL;IS=1,1.000000;DP=3;VDB=2.063840e-02;AF1=1;AC1=8;DP4=0,0,3,0;MQ=28;FQ=-33.4   GT:PL:DP:SP:GQ 
(we expected 9 tokens, and saw 1 )

As far as I can tell, the vcf file conforms to spec and contains 9 tab separated columns. Therefore I don't understand the error. I've tried re-parsing the vcf file to ensure that there aren't missing or hidden characters, without success.

Comments (1)

Hi, I want to create out of my bam file a consensus file by using FastaAlternatereferneceMaker and my .vcf file. How should I care about positions where no reads mapped to. By default FastaAlternateReferenceMaker uses the REF base, but isnt it better to use here N insted? THANKS :=)

Comments (6)

Hi, I am using FastaAlternateReferenceMaker and have a set of intervals ordered first by chromosome and then by their start positions. I have tried ordering chromosomes alphabetically(chr1, chr10, chr11,..) as well as numerically (chr1, chr2, chr3...) but the output fasta sequence returned is not in the same order as listed in interval file. I find that even the names target_1, target_2 etc are also not used as fasta headers in the output file. I am stuck with mapping the input intervals with the output fasta sequences. Thanks in advance for all the help, Ramya