java -jar GenomeAnalysisTK.jar -R S288C_refseq.fasta -T FastaAlternateReferenceMaker -o WT_refseq.fasta --variant WT_common.vcf
Output (last line):
##### ERROR MESSAGE: Line 54: there aren't enough columns for line chr1 1 . CCACACCACACCCACAC CCACCCACACCACACCCACAC,CCACACCACACCCACACCACACCCACAC 5.79 . INDEL;IS=1,1.000000;DP=3;VDB=2.063840e-02;AF1=1;AC1=8;DP4=0,0,3,0;MQ=28;FQ=-33.4 GT:PL:DP:SP:GQ (we expected 9 tokens, and saw 1 )
As far as I can tell, the VCF file conforms to the spec and contains 9 tab-separated columns, so I don't understand the error. I've tried re-parsing the VCF file to make sure there are no missing or hidden characters, without success.
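One thing that may be worth checking: "saw 1" suggests the parser found no tab characters at all on that line, which is what happens when columns are separated by spaces. The following is only a minimal sketch of such a check; the function name `find_short_vcf_lines` and the default of 9 expected columns (taken from the error message) are my own assumptions, not anything from GATK.

```python
def find_short_vcf_lines(lines, expected_columns=9):
    """Return (line_number, column_count) for data lines that have fewer
    tab-separated columns than expected. Header lines (starting with '#')
    and empty lines are skipped."""
    short = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith("#"):
            continue
        # Split on tabs only: a line that uses spaces yields a single token.
        count = len(text.split("\t"))
        if count < expected_columns:
            short.append((number, count))
    return short


# Hypothetical example: the second data line uses spaces instead of tabs.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT",
    "chr1\t1\t.\tA\tG\t5.79\t.\tDP=3\tGT",
    "chr1 1 . A G 5.79 . DP=3 GT",
]
report = find_short_vcf_lines(example)
```

To run it on a real file, pass the open file object, e.g. `find_short_vcf_lines(open("WT_common.vcf"))`.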
Hi, I want to create a consensus sequence from my BAM file using FastaAlternateReferenceMaker and my .vcf file. How should I handle positions to which no reads mapped? By default FastaAlternateReferenceMaker uses the REF base, but isn't it better to use N here instead? Thanks :)
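For illustration only, here is a rough sketch of masking uncovered positions with N after the consensus has been made. It is not a FastaAlternateReferenceMaker feature: the function `mask_uncovered` and the list of uncovered ranges are hypothetical, the ranges would have to come from your own coverage report, and the coordinates only line up if the consensus has the same length as the reference (i.e. no indels were applied).

```python
def mask_uncovered(sequence, uncovered):
    """Return a copy of `sequence` with every position inside the given
    1-based, inclusive (start, end) ranges replaced by 'N'. Ranges that
    run past the end of the sequence are clipped."""
    bases = list(sequence)
    for start, end in uncovered:
        # Convert 1-based inclusive coordinates to 0-based Python indices.
        for index in range(max(start, 1) - 1, min(end, len(bases))):
            bases[index] = "N"
    return "".join(bases)


# Hypothetical example: positions 2-3 and 7-10 had no mapped reads.
masked = mask_uncovered("ACGTACGT", [(2, 3), (7, 10)])
```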
I am using FastaAlternateReferenceMaker to create consensus sequences as follows:
java -Xmx12g -jar /GenomeAnalysisTK-2.7-4-g6f46d11/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R /Reference/chromosome.1.fa -o /Output/consensus.1.fa --variant /VCF/chromosome1.vcf -L /Interval/chromosome.1.list
This command line produces a single fasta file including consensus sequences for the intervals provided in the list file.
(1) Presumably, the header of each sequence in the consensus fasta file corresponds to the contig name in the master reference file. Is there any way to modify the command line to print, say, the interval as the sequence header instead of the contig name?
Note: I can feed the program one interval at a time and name the output file accordingly; however, I'd like to stick to submitting my query one chromosome at a time to save time.
(2) The number of sequences in the consensus fasta does not match the number of non-overlapping intervals in the input list. I would think that this is because some intervals are variant-free and therefore no alternate reference is reported for them. Can you confirm this is the case? I cannot easily check because of the issue mentioned in (1).
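One way to narrow down point (2) is to compare the number of fasta records with the number of intervals after merging. This sketch assumes, without having confirmed it for this tool, that overlapping or directly adjacent intervals may be merged before processing, which would also reduce the record count. The helper names `merge_intervals` and `count_fasta_records` are made up for this example.

```python
def merge_intervals(intervals, merge_adjacent=True):
    """Merge overlapping (and optionally abutting) intervals.
    `intervals` is a list of (contig, start, end), 1-based inclusive."""
    gap = 1 if merge_adjacent else 0
    merged = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + gap:
            last = merged[-1]
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (contig, last[1], max(last[2], end))
        else:
            merged.append((contig, start, end))
    return merged


def count_fasta_records(lines):
    """Count header lines (those starting with '>') in a fasta file."""
    return sum(1 for line in lines if line.startswith(">"))


# Hypothetical intervals: the first two abut, so they merge into one.
merged = merge_intervals([("1", 100, 200), ("1", 201, 300), ("1", 500, 600)])
records = count_fasta_records([">1", "ACGT", ">2", "GGCC"])
```

If `len(merged)` equals the record count, merging rather than variant-free intervals would explain the difference.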
Hi, I am using FastaAlternateReferenceMaker and have a set of intervals ordered first by chromosome and then by start position. I have tried ordering the chromosomes alphabetically (chr1, chr10, chr11, ...) as well as numerically (chr1, chr2, chr3, ...), but the output fasta sequences are not returned in the same order as listed in the interval file. The names target_1, target_2, etc. are also not used as fasta headers in the output file. I am stuck trying to map the input intervals to the output fasta sequences. Thanks in advance for all the help, Ramya
I tried to run FastaAlternateReferenceMaker and I get the following error:
WARNING 2013-09-18 16:28:28 IntervalList Ignoring interval for unknown reference: Chr1:3580210-3580286
I get this warning for every interval I submitted. I have already searched the web and did not find an answer. My chromosome names use the 'Chr' format in all the files, and my interval file is tab-delimited.
My interval file looks like:
@HD VN:1.4 SO:unsorted
@SQ SN:Chr1 LN:158337067 UR:file:chromosome_3.1.fasta M5:0631b350aa263a0f714de8ba9d609eb0
@SQ SN:Chr2 LN:137060424 UR:file:Chromosome_3.1.fasta M5:15898469d6142f8bb74f769bfe9b155f
@SQ SN:Chr3 LN:121430405 UR:file:Chromosome_3.1.fasta M5:c515c4da7c2cd2d24c9487db8f733cfd
...
Chr1 3580210 3580286 + ID=MI0011294_1;accession_number=MI0011294
Chr1 3580220 3580240 + ID=MIMAT0011792_1;accession_number=MIMAT0011792
Chr1 3607747 3607842 - ID=MI0014499_1;accession_number=MI0014499
Chr1 3607802 3607822 - ID=MIMAT0017395_1;accession_number=MIMAT0017395
Chr1 10227277 10227339 - ID=MI0009752_1;accession_number=MI0009752
Chr1 10227315 10227337 - ID=MIMAT0009241_1;accession_number=MIMAT0009241
Chr1 19881347 19881431 - ID=MI0005457_1;accession_number=MI0005457
Chr1 19881398 19881419 - ID=MIMAT0003539_1;accession_number=MIMAT0003539
Chr1 19930459 19930542 - ID=MI0005454_1;accession_number=MI0005454
Chr1 19930511 19930532 - ID=MIMAT0004332_1;accession_number=MIMAT0004332
...
The header of my interval file is a copy of Chromosome_3.1.dict. I do not know what is misformatted or why I get this error.
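A quick sanity check that might help here is to parse the interval file the strict way: tabs only, contig names compared case-sensitively against the @SQ header. This is just a sketch, assuming the Picard-style layout of five tab-separated fields per interval (contig, start, end, strand, name); `check_interval_list` is a made-up helper, not part of GATK or Picard.

```python
def check_interval_list(lines):
    """Return (line_number, message) for every line that looks wrong."""
    contigs = {}
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text:
            continue
        fields = text.split("\t")
        if text.startswith("@"):
            # Collect contig names and lengths from tab-separated @SQ lines.
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
                if "SN" in tags:
                    length = tags.get("LN", "")
                    contigs[tags["SN"]] = int(length) if length.isdigit() else None
            continue
        if len(fields) != 5:
            problems.append((number, "expected 5 tab-separated fields, found %d" % len(fields)))
            continue
        contig, start, end = fields[0], fields[1], fields[2]
        if contig not in contigs:
            problems.append((number, "contig %r not in @SQ header" % contig))
        elif not (start.isdigit() and end.isdigit()) or int(start) > int(end):
            problems.append((number, "bad coordinates %s-%s" % (start, end)))
        elif contigs[contig] is not None and int(end) > contigs[contig]:
            problems.append((number, "end %s is past the contig length" % end))
    return problems


# Hypothetical example: line 4 has the wrong case, line 5 uses spaces.
example = [
    "@HD\tVN:1.4\tSO:unsorted",
    "@SQ\tSN:Chr1\tLN:158337067",
    "Chr1\t3580210\t3580286\t+\tID=x",
    "chr1\t1\t2\t+\tID=y",
    "Chr1 3 4 + ID=z",
]
problems = check_interval_list(example)
```

If the header itself lost its tabs, no contig names are collected and every interval is reported as unknown, which would resemble the warning above.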