Created 2015-08-09 21:20:47 | Updated | Tags: fastaalternatereference vcf fastaalternatereferencemaker

I am trying to use a VCF containing SNP variants to modify the mouse reference (GRCm38, C57BL/6J) with BALB/cJ SNPs.

After running this command:

java -jar ~/programs/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R ~/genome/mouse_GRCm38.p4/GRCm38.primary_assembly/GRCm38.primary_assembly.fa -o ~/BALBcJ.snp.primary.fa -V ~/BALB_cJ.snps.vcf

The following ERROR shows up:

##### ERROR reference contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 3, 4, 5, 6, 7, 8, 9, MT, X, Y, JH584299.1, GL456233.1, JH584301.1, GL456211.1, GL456350.1, JH584293.1, GL456221.1, JH584297.1, JH584296.1, GL456354.1, JH584294.1, JH584298.1, JH584300.1, GL456219.1, GL456210.1, JH584303.1, JH584302.1, GL456212.1, JH584304.1, GL456379.1, GL456216.1, GL456393.1, GL456366.1, GL456367.1, GL456239.1, GL456213.1, GL456383.1, GL456385.1, GL456360.1, GL456378.1, GL456389.1, GL456372.1, GL456370.1, GL456381.1, GL456387.1, GL456390.1, GL456394.1, GL456392.1, GL456382.1, GL456359.1, GL456396.1, GL456368.1, JH584292.1, JH584295.1]

To try to fix this, I used the Perl script in the link to sort the VCF so that it follows the contig order of the reference.

I did this:

./sortByRef.pl ~/BALB_cJ.snps.vcf /home/tiagocastro/genome/mouse_GRCm38.p4/GRCm38.primary_assembly/GRCm38.primary_assembly.fa.fai > ~/BALB_cJ.snps_sorted.vcf
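For illustration, the sorting step can be sketched in Python: header lines are kept first and unchanged, and records are ordered by the contig order in the .fai index, then by position. This is only a sketch of the idea, not the sortByRef.pl script itself; the function names `read_fai_order` and `sort_vcf_lines` are made up for this example.

```python
def read_fai_order(fai_lines):
    """Map contig name -> rank, taken from column 1 of each .fai line."""
    order = {}
    for line in fai_lines:
        line = line.strip()
        if line:
            order[line.split("\t")[0]] = len(order)
    return order


def sort_vcf_lines(vcf_lines, contig_rank):
    """Return header lines first (unchanged), then records sorted by
    (contig rank in the reference index, position)."""
    header, records = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
        else:
            records.append(line)

    def key(record):
        chrom, pos = record.split("\t", 2)[:2]
        # Raises KeyError if the VCF names a contig that is absent from the .fai.
        return contig_rank[chrom], int(pos)

    return header + sorted(records, key=key)
```

Keeping the header block intact matters because the `##fileformat` line is expected to be the first line of a VCF.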

Using the new VCF file, a new error is shown:

##### ERROR VCF3 VariantContext (this is an external codec and is not documented within GATK)

Looking at the head of each file, the sorted and the original VCF, I can see that they are different.
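One way to make the difference between the two files explicit is to compare only their header lines (those starting with `#`). A minimal sketch, with hypothetical function names, assuming both files have already been read into lists of lines:

```python
def header_lines(vcf_lines):
    """Return the lines of a VCF that start with '#', without trailing newlines."""
    return [line.rstrip("\n") for line in vcf_lines if line.startswith("#")]


def compare_headers(original_lines, sorted_lines):
    """Summarize how the header of the sorted VCF differs from the original."""
    a = header_lines(original_lines)
    b = header_lines(sorted_lines)
    return {
        "missing_in_sorted": [line for line in a if line not in b],
        "added_in_sorted": [line for line in b if line not in a],
        "identical": a == b,
        # The ##fileformat line is expected to be the very first header line.
        "sorted_starts_with_fileformat": bool(b) and b[0].startswith("##fileformat="),
    }
```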

Can someone help me?

Created 2015-05-29 19:52:11 | Updated | Tags: fastaalternatereference indels

Dear GATK team.

I would like to use FastaAlternateReferenceMaker to generate draft genome assemblies of samples from plant species that are closely related to my reference. I know there are quite a few large-scale insertions and deletions that I need to take into account.

My question is whether FastaAlternateReferenceMaker is able to process such large-scale (up to several kb in length) insertions and deletions?

The documentation states that it "works only for SNPs and for simple indels (but not for things like complex substitutions)". When are indels "simple" enough for the tool to be able to process them?
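To make the distinction concrete, here is one common convention for telling SNPs, simple indels and complex substitutions apart from the REF and ALT columns of a VCF record. This is an assumption for illustration, not the tool's own criterion, and the function name `classify_variant` is hypothetical. Note that under this convention the length of the inserted or deleted sequence plays no role, so the sketch says nothing about whether the tool has a size limit.

```python
def classify_variant(ref, alt):
    """Classify a biallelic REF/ALT pair (assumes ref != alt).

    - "snp":       both alleles are a single base
    - "insertion": REF is one anchor base and ALT starts with it
    - "deletion":  ALT is one anchor base and REF starts with it
    - "complex":   anything else (e.g. multi-base substitutions)
    """
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) == 1 and alt.startswith(ref):
        return "insertion"
    if len(alt) == 1 and ref.startswith(alt):
        return "deletion"
    return "complex"
```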

Thanks in advance for the information.

Robin

Created 2015-05-06 17:39:20 | Updated | Tags: fastaalternatereference

I specify my interval as 1-1527. The output does not align with position 1 of my (bacterial) reference genome, even though it is 1527 bp long. Position 1 of the output actually aligns with position 87 of my reference genome, and the output then skips one base about every 70 bp (see attachment). I can't figure out what the problem is.

gatk.sh -Xmx2g -R /global/scratch/ahlstrom/ref_MAP4/MAP4.fasta -T FastaAlternateReferenceMaker -L WGreverse.intervals --variant /global/scratch/ahlstrom/MAP4_VCF/$1_MAP4.vcf -o WGreverse/$1_WGreverse1.fasta
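When checking where the output starts relative to the reference, it can help to compare the two as plain sequences, with headers and line breaks removed, so that differences in line wrapping are not mistaken for skipped bases. A minimal sketch, assuming both files are ordinary FASTA read into lists of lines; the function names are hypothetical:

```python
def read_fasta_sequences(lines):
    """Return a list of (header, sequence) tuples with line breaks removed.

    Sequence lines that appear before the first '>' header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def first_mismatch(seq_a, seq_b):
    """Return the 1-based position of the first differing base,
    or None if the sequences agree over their shared length."""
    for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a != b:
            return position
    return None
```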

Thanks, Christina

Created 2015-03-02 10:13:32 | Updated 2015-03-02 10:30:32 | Tags: fastaalternatereference fastaalternatereferencemaker

Hello, I really like your FastaAlternateReferenceMaker for building a new consensus from FASTA+VCF. However, it deletes all headers from the FASTA file and replaces them with numbers. Is there a way to change this behavior, e.g. to simply keep the old headers?
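As a possible workaround outside the tool, the original headers could be copied back onto the output afterwards. The sketch below assumes that the output contains the same number of records as the original FASTA and in the same order, which may not hold if intervals were used; the function name `restore_headers` is hypothetical.

```python
def restore_headers(original_lines, renamed_lines):
    """Replace each '>' line in renamed_lines with the '>' line at the same
    rank in original_lines. Both arguments must be lists of lines."""
    original_headers = [l.rstrip("\n") for l in original_lines if l.startswith(">")]
    renamed_count = sum(1 for l in renamed_lines if l.startswith(">"))
    if renamed_count != len(original_headers):
        raise ValueError(
            "record counts differ: %d original vs %d renamed"
            % (len(original_headers), renamed_count)
        )
    headers = iter(original_headers)
    restored = []
    for line in renamed_lines:
        line = line.rstrip("\n")
        restored.append(next(headers) if line.startswith(">") else line)
    return restored
```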

Best, Boyke

Created 2013-09-27 13:49:14 | Updated | Tags: combinevariants haplotypecaller fastaalternatereference coverage

Hi, I'm calling variants with HaplotypeCaller in a population of 2 parents and 7 F1 individuals. After read-backed phasing, I'm combining the VCF files of my genotypes with CombineVariants. In the output file I very often find "./.". I thought this meant there is no coverage at that position, but at many of these positions I do have good coverage. Why do I then get ./.?

Moreover, I used FastaAlternateReferenceMaker to create a new reference sequence that includes the variants from the parents. In that case, after I run HC and do the phasing and CombineVariants steps, I only get "./." at positions where there really is no coverage (as I can see in my mappings).

Nadia
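To see how often "./." occurs for each sample in the combined file, the no-calls can be tallied per sample column. This is a minimal sketch with a hypothetical function name; it assumes a multi-sample VCF read into a list of lines, with GT as the first FORMAT field.

```python
def count_missing_genotypes(vcf_lines):
    """Return {sample: number of records where every GT allele is '.'}."""
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            counts = {sample: 0 for sample in samples}
            continue
        for sample, value in zip(samples, fields[9:]):
            genotype = value.split(":")[0]  # assumes GT comes first
            alleles = genotype.replace("|", "/").split("/")
            if all(allele == "." for allele in alleles):
                counts[sample] += 1
    return counts
```

The positions of the no-calls could then be compared against per-sample coverage to check whether they really coincide with uncovered sites.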

Created 2012-10-11 08:18:08 | Updated 2012-10-18 01:25:50 | Tags: fastaalternatereference

Hi GATK team,

I have a VCF file (from GATK) containing variants for a total of 20 individuals, and I'm wondering how to get a consensus sequence for each individual based on its own polymorphisms. Some individuals may not show a polymorphism at a particular position in a contig, whereas others may. I've checked the dedicated GATK tool (FastaAlternateReferenceMaker), but it doesn't solve my problem because only one consensus is generated. What I need is as many output files (each containing a consensus sequence) as there are mapped individuals.

Is there any way to achieve this using GATK?
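Independently of GATK, the per-individual idea can be sketched in plain Python: for each sample column, substitute the SNP alleles that the sample's genotype actually carries into a copy of the reference. This is a simplified illustration under several assumptions: only single-base REF and ALT alleles are applied, heterozygous genotypes take the first non-reference allele called, and GT is the first FORMAT field. The function name `per_sample_consensus` is hypothetical.

```python
BASES = set("ACGTN")


def per_sample_consensus(reference, vcf_lines):
    """reference: dict of contig name -> sequence string.
    Returns {sample: {contig: consensus sequence}} with SNPs applied."""
    samples = []
    consensus = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            # One mutable copy of the reference per sample.
            consensus = {
                s: {c: list(seq) for c, seq in reference.items()} for s in samples
            }
            continue
        chrom, pos, ref = fields[0], int(fields[1]), fields[3].upper()
        if len(ref) != 1:
            continue  # indels and longer alleles are skipped in this sketch
        alleles = [ref] + fields[4].upper().split(",")
        for sample, value in zip(samples, fields[9:]):
            genotype = value.split(":")[0].replace("|", "/").split("/")
            called = [int(a) for a in genotype if a != "."]
            non_ref = [alleles[i] for i in called if i > 0 and alleles[i] in BASES]
            if non_ref:
                consensus[sample][chrom][pos - 1] = non_ref[0]  # POS is 1-based
    return {
        s: {c: "".join(bases) for c, bases in contigs.items()}
        for s, contigs in consensus.items()
    }
```

Each sample's sequences could then be written to its own FASTA file, giving one output per individual.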

Thanks, C.