I am trying to use a VCF of SNP variants to convert the mouse reference (GRCm38, C57BL/6J) into a BALB/cJ version by substituting the BALB/cJ SNPs.
After running this command:
java -jar ~/programs/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R ~/genome/mouse_GRCm38.p4/GRCm38.primary_assembly/GRCm38.primary_assembly.fa -o ~/BALBcJ.snp.primary.fa -V ~/BALB_cJ.snps.vcf
The following ERROR shows up:
To fix this, I used the Perl script from the link to sort the VCF in the same contig order as the reference.
I did this:
./sortByRef.pl ~/BALB_cJ.snps.vcf /home/tiagocastro/genome/mouse_GRCm38.p4/GRCm38.primary_assembly/GRCm38.primary_assembly.fa.fai > ~/BALB_cJ.snps_sorted.vcf
Using the new VCF file, a new error is shown:
Looking at the head of each file, the sorted and the original VCF, I can see that they differ.
Can someone help me?
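For anyone who wants to check the ordering without the Perl script, here is a minimal Python sketch of the same idea: order the VCF records by the contig order listed in the reference's .fai index, then by position. The function name `sort_vcf_by_fai` is hypothetical, and the sketch assumes a plain-text, tab-separated VCF that fits in memory. It raises an error when a VCF contig is missing from the index, because a naming mismatch (for example `chr1` versus `1`) cannot be fixed by sorting alone.

```python
def sort_vcf_by_fai(vcf_lines, fai_lines):
    """Return VCF lines with records ordered like the contigs in a .fai index.

    Hypothetical helper: assumes tab-separated text lines held in memory.
    """
    # Column 1 of each .fai line is the contig name; line order is reference order.
    contig_rank = {}
    for line in fai_lines:
        if line.strip():
            name = line.split("\t")[0]
            contig_rank[name] = len(contig_rank)

    # Header lines (starting with '#') keep their original order at the top.
    header = [l for l in vcf_lines if l.startswith("#")]
    records = [l for l in vcf_lines if l.strip() and not l.startswith("#")]

    def sort_key(record):
        chrom, pos = record.split("\t")[:2]
        if chrom not in contig_rank:
            # Sorting cannot repair a contig that the reference does not know.
            raise KeyError(f"contig {chrom!r} is not in the reference index")
        return (contig_rank[chrom], int(pos))

    return header + sorted(records, key=sort_key)
```

Reading both files with `open(path).readlines()` and writing the returned list back out with `writelines` would give a re-sorted VCF to compare against the output of `sortByRef.pl`.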
Dear GATK team.
I would like to use FastaAlternateReferenceMaker to generate draft genome assemblies of samples from plant species that are closely related to my reference. I know there are quite a few large-scale insertions and deletions that I need to take into account.
My question is whether FastaAlternateReferenceMaker is able to process such large-scale (up to several kb in length) insertions and deletions?
The documentation states that it "works only for SNPs and for simple indels (but not for things like complex substitutions)". When are indels "simple" enough for the tool to be able to process them?
Thanks in advance for the information.
I specify my interval as 1-1527. The output does not align with position 1 of my (bacterial) reference genome, even though it is 1527 bp long. Position 1 of the output actually aligns with position 87 of my reference genome, and then skips one base about every 70bp (see attachment). I can't figure out what the problem is.
gatk.sh -Xmx2g -R /global/scratch/ahlstrom/ref_MAP4/MAP4.fasta -T FastaAlternateReferenceMaker -L WGreverse.intervals --variant /global/scratch/ahlstrom/MAP4_VCF/$1_MAP4.vcf -o WGreverse/$1_WGreverse1.fasta
Hello, I really like your FastaAlternateReferenceMaker for building a new consensus from FASTA+VCF. However, it deletes all headers from the FASTA file and replaces them with numbers. Is there a way to change this behavior, e.g. simply keep the old headers?
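As a possible workaround outside GATK, the original headers could be copied back onto the output after the run. The Python sketch below is only an illustration: `restore_fasta_headers` is a hypothetical name, and it assumes the output contains one record per reference contig, in the same order as the reference. That assumption would not hold if, for example, the run was restricted to intervals.

```python
def restore_fasta_headers(reference_lines, output_lines):
    """Replace the numbered headers in the output with the reference's headers.

    Assumption (not confirmed by the tool's docs here): output records appear
    in the same order, and in the same number, as the reference contigs.
    """
    original_headers = [l.rstrip("\n") for l in reference_lines if l.startswith(">")]
    restored = []
    next_header = 0
    for line in output_lines:
        if line.startswith(">"):
            if next_header >= len(original_headers):
                raise ValueError("output has more records than the reference")
            # Swap the numeric header for the matching original header.
            restored.append(original_headers[next_header] + "\n")
            next_header += 1
        else:
            # Sequence lines are passed through unchanged.
            restored.append(line)
    return restored
```

Checking that the record counts and sequence lengths match before trusting the renamed file would be a sensible extra step.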
Hi, I'm calling variants with HaplotypeCaller in a population of 2 parents and 7 F1 individuals. After read-backed phasing, I combine the VCF files of my genotypes with CombineVariants. In the output file I very often find "./.". I thought this meant there is no coverage at that position, but at many of these positions I do have good coverage. Why do I get "./." there? Moreover, when I used FastaAlternateReferenceMaker to create a new reference sequence including the variants from the parents, and then ran HC, the phasing, and the CombineVariants steps again, I only got "./." at positions where there really is no coverage (as I can see in my mappings). Nadia
Hi GATK team,
I have a VCF file (from GATK) containing variants for a total of 20 individuals, and I'm wondering how to get a consensus sequence for each individual that reflects its own polymorphisms. Some individuals may not show a polymorphism at a particular position in a contig whereas others may. I've checked the dedicated GATK tool (FastaAlternateReferenceMaker), but it doesn't solve my problem because only one consensus is generated. I need as many output files (each containing a consensus sequence) as there are mapped individuals.
Is there any way to achieve this using GATK?
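To make the requirement concrete, here is a rough Python sketch of per-individual consensus building done outside GATK. It is not what FastaAlternateReferenceMaker does internally. The function name `per_sample_consensus` is hypothetical, and the sketch makes several simplifying assumptions: only SNPs are applied (indels and symbolic alleles are skipped), a heterozygous call takes the first non-reference allele in the genotype, and missing calls ("./.") leave the reference base in place.

```python
def per_sample_consensus(reference, vcf_lines):
    """Build one consensus per sample from a multi-sample VCF (SNPs only).

    reference: dict mapping contig name -> sequence string.
    Returns: dict mapping sample name -> {contig name -> consensus string}.
    """
    samples = []
    consensus = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at column 10; each sample gets its own copy.
            samples = fields[9:]
            consensus = {s: {c: list(seq) for c, seq in reference.items()} for s in samples}
            continue
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alleles = [ref] + fields[4].split(",")
        # Assumption: keep plain single-base alleles only; skip everything else.
        if any(a.upper() not in ("A", "C", "G", "T", "N") for a in alleles):
            continue
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0].replace("|", "/").split("/")
            alt_indices = [int(g) for g in genotype if g.isdigit() and int(g) > 0]
            if alt_indices:
                # Assumption: the first non-reference allele wins for het calls.
                consensus[sample][chrom][pos - 1] = alleles[alt_indices[0]]
    return {s: {c: "".join(bases) for c, bases in contigs.items()}
            for s, contigs in consensus.items()}
```

Each entry of the returned dictionary could then be written to its own FASTA file, giving one output per individual.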