Hi everyone, I have a set of high-quality SNPs that were jointly called from a merged, realigned BAM. I then created a VCF for each sample using SelectVariants, re-ran UnifiedGenotyper with EMIT_ALL_SITES, and thinned the resulting VCF down to the no-call sites so I could mask them. Next, I used FastaAlternateReferenceMaker to produce a FASTA file for each sample, with Ns at no-call sites and SNPs at their appropriate positions. How can I use GATK to identify the contigs that have no read support for a given sample, and then use the -L flag with FastaAlternateReferenceMaker to avoid outputting those contigs? Apologies if this is trivial, but I haven't found a clear solution.
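One possible approach, sketched under assumptions rather than taken from GATK itself: run `samtools idxstats` on each per-sample BAM, which reports tab-separated columns (contig name, contig length, mapped reads, unmapped reads), keep the contigs with at least one mapped read, and write them to a list file that is then passed via -L. The function names and the example text are hypothetical, and whether a bare contig name per line is accepted in a `.list` file should be checked against the GATK version in use.

```python
def contigs_with_reads(idxstats_text, min_mapped=1):
    """Return contig names that have at least `min_mapped` mapped reads.

    `idxstats_text` is the text printed by `samtools idxstats sample.bam`.
    """
    kept = []
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, _length, mapped, _unmapped = fields[:4]
        if name == "*":
            continue  # summary row for reads with no contig assigned
        if int(mapped) >= min_mapped:
            kept.append(name)
    return kept


def write_interval_list(contigs, path):
    """Write one contig name per line (assumed to be usable with -L)."""
    with open(path, "w") as handle:
        for name in contigs:
            handle.write(name + "\n")


# Made-up idxstats output for illustration only.
example = "chr1\t1000\t25\t0\nchr2\t800\t0\t0\n*\t0\t0\t3\n"
covered = contigs_with_reads(example)
```

Here `covered` holds only the contigs with read support, so `write_interval_list(covered, "sample1.list")` would give a per-sample file to try with -L.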
I've followed the exact command-line instructions for two separate reference files (hg.fa and Homo_sapiens_assembly19.fasta). The index files are definitely being generated, but other tools (bwa mem) cannot find the index when I pass the reference as an argument. Full example below:
$ bwa index -a bwtsw -p Homo_sapiens_assembly19 Homo_sapiens_assembly19.fasta
$ samtools faidx Homo_sapiens_assembly19.fasta
$ picard CreateSequenceDictionary \
    REFERENCE=Homo_sapiens_assembly19.fasta \
    OUTPUT=Homo_sapiens_assembly19.dict
(all of these finish without an error)
$ ls Homo_sapiens_assembly19*
Homo_sapiens_assembly19.amb  Homo_sapiens_assembly19.bwt   Homo_sapiens_assembly19.fasta      Homo_sapiens_assembly19.pac
Homo_sapiens_assembly19.ann  Homo_sapiens_assembly19.dict  Homo_sapiens_assembly19.fasta.fai  Homo_sapiens_assembly19.sa
$ bwa mem Homo_sapiens_assembly19.fasta filtered.fastq
[E::bwa_idx_load_from_disk] fail to locate the index files
Any ideas? I could swear I've run this exact same pipeline before with the same files without any issues...
I'm running bwa-0.7.12 with picard-1.140 and samtools-1.2 on a MacBook Pro running OS X El Capitan. Every command was executed within the same directory.
Thanks for any help!
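A likely cause, offered as a guess rather than a confirmed diagnosis: `bwa index -p Homo_sapiens_assembly19` writes the index under that prefix (the listing shows `Homo_sapiens_assembly19.bwt`, not `Homo_sapiens_assembly19.fasta.bwt`), while `bwa mem` was given `Homo_sapiens_assembly19.fasta` as the index base. The sketch below only checks which candidate base has all five index files next to it; the helper names are hypothetical.

```python
import os

# File suffixes that `bwa index` writes next to the chosen index base.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(idxbase):
    """Return the index suffixes that do not exist for this index base."""
    return [s for s in BWA_SUFFIXES if not os.path.exists(idxbase + s)]


def usable_index_bases(candidates):
    """Return the candidate bases for which every index file is present."""
    return [c for c in candidates if not missing_index_files(c)]
```

Calling `usable_index_bases(["Homo_sapiens_assembly19.fasta", "Homo_sapiens_assembly19"])` in the working directory would show which form is complete. If only the prefix form is, then `bwa mem Homo_sapiens_assembly19 filtered.fastq`, or rebuilding the index without `-p`, would be the things to try.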
I have been working through several issues that came up while trying to run HaplotypeCaller. For this one I couldn't find anything on Google; to be honest, pasting the error into Google doesn't even turn up anything similar.
ERROR MESSAGE: Badly formed genome loc: Contig NC_007605 given as location, but this contig isn't present in the Fasta sequence dictionary
Can anyone please tell me what the problem is here? The FASTA file I am using is the one downloaded from the bundle: human_g1k_v37.fasta.gz
Any help would be really appreciated. Thank you!!
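The message indicates that a contig name used somewhere in the inputs (for example in an -L interval) has no matching @SQ entry in the reference's .dict file. A quick way to confirm this, sketched here with hypothetical helper names and illustrative dictionary text, is to list the SN: names in the dictionary and compare them with the contigs being requested:

```python
def dict_contigs(dict_text):
    """Return contig names (SN: fields of @SQ lines) from .dict file text."""
    names = []
    for line in dict_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header lines
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def missing_contigs(requested, dict_text):
    """Return the requested contigs that the dictionary does not contain."""
    known = set(dict_contigs(dict_text))
    return [c for c in requested if c not in known]


# Illustrative dictionary text only; read the real .dict file in practice.
example_dict = (
    "@HD\tVN:1.4\tSO:unsorted\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:MT\tLN:16569\n"
)
absent = missing_contigs(["1", "NC_007605"], example_dict)
```

Any name that ends up in `absent` points to a mismatch between the reference and the file that introduced that contig, which usually means the two were built against different reference versions.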
Hello GATK Team,
Is there a tool within GATK that takes a multiple sequence alignment in FASTA format and converts it to VCF? If not, could anyone point me to a tool that can do this?
Many thanks, Nick
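To make the conversion concrete, here is a minimal sketch of the idea, not a GATK feature. It assumes the first sequence in the alignment acts as the reference, reports positions as alignment columns (not ungapped reference coordinates), emits only single-base substitutions with haploid genotypes, and skips columns where the reference has a gap or an ambiguous base. The function names and the `chrom` label are hypothetical.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA text into parallel lists of names and sequences."""
    names, chunks = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:].split()[0])
            chunks.append([])
        else:
            chunks[-1].append(line.upper())
    seqs = ["".join(parts) for parts in chunks]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences differ in length; not an alignment")
    return names, seqs


def msa_to_vcf_lines(names, seqs, chrom="aln"):
    """Emit VCF lines for columns where a sample differs from sequence 0."""
    ref, samples = seqs[0], seqs[1:]
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
        + "\t".join(names[1:]),
    ]
    for i, ref_base in enumerate(ref):
        if ref_base not in "ACGT":
            continue  # gap or ambiguity code in the reference row
        alts = []
        for seq in samples:
            base = seq[i]
            if base in "ACGT" and base != ref_base and base not in alts:
                alts.append(base)
        if not alts:
            continue  # invariant column
        genotypes = []
        for seq in samples:
            base = seq[i]
            if base == ref_base:
                genotypes.append("0")
            elif base in alts:
                genotypes.append(str(alts.index(base) + 1))
            else:
                genotypes.append(".")  # gap or ambiguous base: no call
        lines.append("\t".join(
            [chrom, str(i + 1), ".", ref_base, ",".join(alts),
             ".", ".", ".", "GT"] + genotypes))
    return lines


# Tiny made-up alignment for illustration.
example = ">ref\nACGT\n>s1\nACGA\n>s2\nA-TT\n"
vcf_lines = msa_to_vcf_lines(*read_aligned_fasta(example))
```

Indels are deliberately ignored here; representing gaps as VCF insertions and deletions needs left-anchored alleles and is the part where a dedicated tool is worth using.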
I am a PhD student working in Sweden, currently trying to apply NGS data to phylogenetics.
I would like to know if there is a way to convert a sorted BAM file into a FASTA sequence using only the mapped reads, i.e., without incorporating any of the reference into the FASTA sequence.
I have sorted BAM files that result from mapping reads of one species to a reference from a different species. Right now I am extracting the reads from the BAM files and re-assembling them, but the result is sub-optimal: I often get multiple contigs, probably because of low-coverage regions in the BAM file, and this causes many alignment problems. I would like to get a single contig, perhaps with gaps inserted where no reads cover the reference, which would be much easier to align to other samples.
Regards, Filipe de Sousa
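The single-contig-with-gaps idea can be sketched as a per-column majority vote over the mapped reads, writing a gap character wherever nothing is covered, so that no reference base ever enters the output. This is only an illustration under strong assumptions: reads are given as hypothetical (0-based start, sequence) pairs already placed in reference coordinates, with no insertions, deletions, or clipping. Real BAM records would need their CIGAR strings interpreted first.

```python
from collections import Counter


def consensus_from_reads(ref_length, placed_reads, gap_char="N"):
    """Build a consensus from reads only; uncovered columns get `gap_char`.

    `placed_reads` is an iterable of (start, sequence) pairs, where `start`
    is the 0-based reference position of the read's first base.
    """
    columns = [Counter() for _ in range(ref_length)]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < ref_length:
                columns[pos][base] += 1
    out = []
    for counts in columns:
        # Most frequent base in the column, or the gap character if empty.
        out.append(counts.most_common(1)[0][0] if counts else gap_char)
    return "".join(out)


# Made-up reads on a 10-base reference, for illustration only.
reads = [(0, "ACG"), (1, "CGT"), (7, "TT")]
consensus = consensus_from_reads(10, reads)
```

Because every sample's output has the same length as the reference, the resulting sequences are already in a common coordinate system and can be stacked directly, as long as indels relative to the reference are not important for the analysis.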