I want to compare my VCF file to a VCF file supplied in the GATK bundle using Picard GenotypeConcordance. In Picard GenotypeConcordance terminology, I want to use a bundle VCF as the "truth sample."
The problem is that the VCF files in the bundle lack the sample name that Picard GenotypeConcordance needs.
That is, nothing in these supplied VCF files can satisfy this required Picard GenotypeConcordance option:
TRUTH_SAMPLE (String): The name of the truth sample within the truth VCF. Required.
Take dbsnp_138.hg19.vcf.gz as an example:
$ zcat dbsnp_138.hg19.vcf.gz | grep CHROM
#CHROM POS ID REF ALT QUAL FILTER INFO
Based on the VCF format description elsewhere on this GATK site (https://broadinstitute.org/gatk/guide/article?id=1268), I expect to see a FORMAT field and a sample name field following the INFO field.
How should I proceed?
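The header check described above (is there a FORMAT column, and any sample columns, after INFO on the #CHROM line?) can be scripted. This is a minimal sketch, assuming a plain or bgzip-compressed VCF; the function names are hypothetical and are not part of Picard or GATK.

```python
import gzip
from typing import List

# The eight mandatory columns of a VCF #CHROM header line.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def samples_from_header_line(line: str) -> List[str]:
    """Return the sample names from a VCF '#CHROM' header line.

    A sites-only VCF stops after INFO; a VCF with genotypes has a FORMAT
    column (index 8) followed by one column per sample (index 9 onward).
    """
    fields = line.rstrip("\n").split("\t")
    if fields[:8] != FIXED_COLUMNS:
        raise ValueError("not a VCF #CHROM header line")
    if len(fields) <= 9:
        # No FORMAT column, or FORMAT without any sample columns.
        return []
    return fields[9:]


def vcf_sample_names(path: str) -> List[str]:
    """Scan the header of a plain or bgzipped VCF and return its sample names."""
    # bgzip output is multi-member gzip, which gzip.open reads transparently.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return samples_from_header_line(line)
    raise ValueError("no #CHROM header line found in " + path)
```

For a sites-only file with the header shown above, vcf_sample_names would return an empty list, meaning there is no sample name available to pass as TRUTH_SAMPLE.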
Hello, I was trying to download the resource bundle using the credentials given on the website (username: gsapubftp-anonymous, empty password). Unfortunately, I am not able to download any files. In most cases the error is: 550 /tutorials/workshops/GATKbasic: no such file or directory. Is there any way to download the resource bundle? Thank you.
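To see exactly which paths the server refuses, the login described above can be reproduced from a script. This is a minimal sketch using Python's standard ftplib, assuming the host ftp.broadinstitute.org quoted elsewhere in this thread; list_bundle_dir and reply_code are hypothetical helper names, and the sketch says nothing about which bundle paths actually exist on the server. Per RFC 959, reply code 550 means the requested action was not taken because the file or directory is unavailable (for example, not found or no access).

```python
from ftplib import FTP, error_perm
from typing import List, Optional

# Assumed host (from the bundle URL quoted in this thread) and the
# username given on the website; the password is left empty.
HOST = "ftp.broadinstitute.org"
USER = "gsapubftp-anonymous"


def reply_code(message: str) -> Optional[int]:
    """Extract the leading three-digit FTP reply code from a server message."""
    head = message.strip()[:3]
    return int(head) if head.isdigit() else None


def list_bundle_dir(path: str) -> List[str]:
    """Log in with an empty password and list the entries of one remote directory."""
    with FTP(HOST, timeout=30) as ftp:
        ftp.login(user=USER, passwd="")
        try:
            return ftp.nlst(path)
        except error_perm as exc:
            # 550: the path is unavailable (not found or access denied).
            if reply_code(str(exc)) == 550:
                raise FileNotFoundError(f"{path}: {exc}") from exc
            raise
```

Calling list_bundle_dir on the top-level directory first, then on each subdirectory, shows which level of the path triggers the 550 reply.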
Hi GATK Team,
I am Mia Yang, and I would like to ask you some questions about the resources in the GATK bundle.
(1) Are there any differences between the original files (HapMap, 1000G, Mills, etc.) and the ones that can be downloaded directly from the GATK bundle? And are these resources compatible with other variant recalibration tools available out there?
(2) Some files have hg19 near the end of their names. Does this mean the files are only compatible with the hg19 genome, or that they are optimized for hg19 and not for other builds (e.g., hg38)?
Thank you in advance and have a nice day! =)
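Question (2) can be probed directly: VCF positions are coordinates on one specific reference build, and ##contig header lines (when present) record each contig's length. This is a minimal sketch, assuming the VCF carries ##contig lines and the reference has a samtools-style .fai index; all function names are hypothetical. The chromosome 1 lengths used are 249,250,621 bp for GRCh37/hg19 and 248,956,422 bp for GRCh38/hg38.

```python
from typing import Dict, Iterable, Optional, Tuple

# Chromosome 1 length in the two major human builds; matching it is a
# quick (not exhaustive) way to tell which build a header describes.
CHR1_LENGTHS = {"GRCh37/hg19": 249250621, "GRCh38/hg38": 248956422}


def contigs_from_vcf_header(lines: Iterable[str]) -> Dict[str, int]:
    """Collect ID -> length from '##contig=<ID=...,length=...>' header lines.

    Simplification: values that themselves contain commas are not handled.
    """
    prefix = "##contig=<"
    contigs: Dict[str, int] = {}
    for line in lines:
        if line.startswith("#CHROM"):
            break  # end of the header; no need to read the records
        if not line.startswith(prefix):
            continue
        body = line.strip()[len(prefix):].rstrip(">")
        fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
        if "ID" in fields and "length" in fields:
            contigs[fields["ID"]] = int(fields["length"])
    return contigs


def contigs_from_fai(lines: Iterable[str]) -> Dict[str, int]:
    """Collect name -> length from the first two columns of a FASTA .fai index."""
    contigs: Dict[str, int] = {}
    for line in lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def guess_build(contigs: Dict[str, int]) -> str:
    """Return the build whose chromosome 1 length matches ('chr1' or '1'), else 'unknown'."""
    length = contigs.get("chr1", contigs.get("1"))
    for build, expected in CHR1_LENGTHS.items():
        if length == expected:
            return build
    return "unknown"


def mismatched_contigs(
    vcf: Dict[str, int], ref: Dict[str, int]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """VCF contigs that are missing from the reference or have a different length."""
    return {name: (length, ref.get(name)) for name, length in vcf.items() if ref.get(name) != length}
```

A non-empty result from mismatched_contigs suggests the VCF was not produced against that reference, regardless of what its file name says.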
Dear GATK team,
Good day. I would like to ask about a few things regarding the GATK bundle provided at ftp://ftp.broadinstitute.org/bundle/2.8/hg19/.
The FTP site has several variant files (.vcf format) associated with 1000G, Mills, dbSNP, etc. How do the files provided in the bundle differ from the original files provided by their original sources? Also, does "hg19" mean that all the resource bundles are optimized for the hg19 reference genome?
Thank you in advance.
Hi, I noticed that the GATK resource bundle does not include all the patches listed here (http://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/#/def_asm_PATCHES). I need to add a few, such as GL582971.1. This brings me to my questions:
For the GLs included by default in the bundle, what was the inclusion criterion?
Do you have a resource bundle that includes ALL the GLs listed in the NCBI release notes (link above)?
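Before adding contigs such as GL582971.1, it helps to list which of the wanted contigs the bundle reference already contains. This is a minimal sketch, assuming the bundle reference comes with a Picard-style sequence dictionary (.dict), i.e. SAM-header @SQ lines carrying SN: (name) and LN: (length) tags; the function names are hypothetical. Matching is by exact name, so a contig present under a different naming convention would still be reported as missing.

```python
from typing import Iterable, List, Set


def dict_contig_names(lines: Iterable[str]) -> Set[str]:
    """Return the SN: values from the @SQ lines of a sequence dictionary."""
    names: Set[str] = set()
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        # Tags are tab-separated after the record type, e.g. SN:chr1  LN:249250621
        for tag in line.rstrip("\n").split("\t")[1:]:
            if tag.startswith("SN:"):
                names.add(tag[3:])
    return names


def missing_contigs(wanted: Iterable[str], present: Set[str]) -> List[str]:
    """Wanted contig names absent from the dictionary, in the order requested."""
    return [name for name in wanted if name not in present]
```

For example, passing ["GL582971.1"] and the names read from the bundle's .dict file to missing_contigs returns the accessions that would have to be added.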