# Tagged with #hapmap 0 documentation articles | 0 announcements | 3 forum discussions

Created 2015-02-06 09:10:56 | Updated | Tags: dbsnp hapmap 1000g omni mills

When I tried to call SNPs/indels with GATK v3.3, I ran into a problem: how can I get the following datasets?

- True sites training resource: HapMap
- True sites training resource: Omni
- Non-true sites training resource: 1000G
- Known sites resource, not used in training: dbSNP
- Known and true sites training resource: Mills

Does GATK provide the corresponding VCF files, such as "hapmap.vcf", "omni.vcf", "1000G.vcf", "dbsnp.vcf" and "mills.vcf"?
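For context, each role in the list above corresponds to a combination of the `known`, `training` and `truth` flags on a GATK 3 `VariantRecalibrator` `-resource:` argument. The sketch below maps the role descriptions to those flags by reading the labels literally; that mapping, the helper name `resource_arg`, the file path and the prior value in the example are all assumptions for illustration, so check the flags and priors against the GATK documentation before using them.

```python
# Sketch: turn the role descriptions from the question into GATK 3-style
# "-resource:NAME,known=...,training=...,truth=...,prior=... FILE" arguments.
# The label-to-flag mapping reads each label literally; it is an assumption.

ROLE_FLAGS = {
    "True sites training resource": {"known": False, "training": True, "truth": True},
    "Non-true sites training resource": {"known": False, "training": True, "truth": False},
    "Known sites resource, not used in training": {"known": True, "training": False, "truth": False},
    "Known and true sites training resource": {"known": True, "training": True, "truth": True},
}


def resource_arg(name: str, role: str, prior: float, vcf_path: str) -> str:
    """Build one -resource argument string (hypothetical helper)."""
    flags = ROLE_FLAGS[role]
    # Dicts keep insertion order, so flags come out as known, training, truth.
    tags = ",".join(f"{key}={str(value).lower()}" for key, value in flags.items())
    return f"-resource:{name},{tags},prior={prior} {vcf_path}"


if __name__ == "__main__":
    # Placeholder path and placeholder prior, for illustration only.
    print(resource_arg("hapmap", "True sites training resource", 15.0, "hapmap.vcf"))
```

Keeping the role table separate from the formatting makes it easy to correct a single resource's flags without touching the rest.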

Created 2014-02-27 19:56:41 | Updated 2014-02-27 20:03:15 | Tags: variantstovcf hapmap

I’d like to convert a hapmap file to vcf. The hapmap file is from http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/latest/forward/non-redundant/genotypes_chr1_ASW_r27_nr.b36_fwd.txt.gz

```bash
java -Xmx2g -jar GenomeAnalysisTK.jar \
-R ref.fasta \
-T VariantsToVCF \
-o output.vcf \
--variant:RawHapMap input.hapmap \
--dbsnp dbsnp.vcf
```

1. Since the HapMap file is on reference build b36, should ref.fasta be b36 as well? b37 is available everywhere, but the only b36 reference I can find is build 36.3 at ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/NCBI/build36.3/Homo_sapiens_NCBI_build36.3.tar.gz. Is this OK?
2. What is `--dbsnp` used for here? Should it also be a dbSNP file built on b36?
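On question 1: a HapMap genotype file lists only the observed alleles, so a converter has to look up the reference base at each HapMap position, which only makes sense if the FASTA is on the same build as the HapMap coordinates. The sketch below illustrates that step for a single row. It assumes the usual HapMap genotype-file layout (11 leading columns, then one two-letter genotype per sample, with `NN` for missing) and forward-strand alleles; it is not how VariantsToVCF is implemented, and the function name and example row are made up.

```python
# Sketch: convert one HapMap genotype row into VCF-style columns.
# Assumes the 11 leading HapMap columns (rs#, alleles, chrom, pos, strand,
# assembly#, center, protLSID, assayLSID, panelLSID, QCcode), then one
# two-letter genotype per sample ("NN" = missing), alleles on the + strand.


def hapmap_row_to_vcf(row: str, ref_base: str) -> list:
    """Return [CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, GT...]."""
    fields = row.split()
    rsid, alleles, chrom, pos = fields[0], fields[1], fields[2], fields[3]
    genotypes = fields[11:]

    # REF comes from the reference FASTA, not from the HapMap file, so the
    # FASTA must use the same build as the HapMap positions.
    alts = [a for a in alleles.split("/") if a != ref_base]
    order = [ref_base] + alts  # index 0 = REF, 1.. = ALT alleles

    gts = []
    for g in genotypes:
        if g == "NN":
            gts.append("./.")
        else:
            # Unphased genotype: write the smaller allele index first.
            i, j = sorted((order.index(g[0]), order.index(g[1])))
            gts.append(f"{i}/{j}")

    return [chrom, pos, rsid, ref_base, ",".join(alts) or ".", ".", ".", ".", "GT"] + gts


if __name__ == "__main__":
    # Made-up row for illustration.
    row = "rs0000 A/G chr1 1000 + ncbi_b36 center prot assay panel QC+ AA AG GG NN"
    print("\t".join(hapmap_row_to_vcf(row, "A")))
```

If the reference base at a position is not one of the HapMap alleles, every sample ends up carrying only ALT alleles; many such sites would suggest that the FASTA build (or the strand) does not match the HapMap file.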

Thanks,

Created 2012-10-18 23:10:15 | Updated 2012-10-19 17:56:13 | Tags: bundle liftovervariants hapmap

Hi,

We are sequencing some of the HapMap samples (NA19240, for instance) and compared the calls we get for them with the calls reported for these samples in the file "hapmap_3.3.b37.vcf", which was part of GATK bundle 1.5.

We were surprised to find many discordant calls, and when we checked them, we saw no evidence for many of them in the sequence data for the HapMap samples.

We see the discordance at many sites and the vast majority of those show multiple alleles for the positions (like this one:

13 32914977 rs11571660 A C,T . PASS AC=1,2785;AF=0.00036,0.99964;AN=2786;set=Intersection GT 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 ... )

We call many of these sites as homozygous reference.
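To make the discordance concrete: a VCF `GT` value indexes into the allele list REF, ALT1, ALT2, so with REF `A` and ALT `C,T` the genotype `2/2` means T/T, and the AF values follow from AC divided by AN. A small sketch that decodes the record quoted above, truncated here to its first three samples; the helper names are illustrative.

```python
# Sketch: decode GT indices and recompute AF from AC/AN for one VCF record.


def decode_gt(ref: str, alt: str, gt: str) -> list:
    """Map GT indices (0 = REF, 1.. = ALT) to allele strings."""
    alleles = [ref] + alt.split(",")
    return [alleles[int(i)] for i in gt.replace("|", "/").split("/")]


def af_from_counts(ac: str, an: str) -> list:
    """Per-ALT allele frequency: AC / AN."""
    n = int(an)
    return [int(c) / n for c in ac.split(",")]


# First columns of the record quoted above, truncated to three samples.
record = ("13 32914977 rs11571660 A C,T . PASS "
          "AC=1,2785;AF=0.00036,0.99964;AN=2786;set=Intersection GT 2/2 2/2 2/2")

fields = record.split()
ref, alt, info = fields[3], fields[4], fields[7]
info_map = dict(kv.split("=") for kv in info.split(";"))

if __name__ == "__main__":
    print([round(f, 5) for f in af_from_counts(info_map["AC"], info_map["AN"])])
    for gt in fields[9:]:
        print(gt, decode_gt(ref, alt, gt))
```

Every sample shown is therefore homozygous for `T`, the second ALT allele, whereas a homozygous-reference genotype relative to the REF recorded in this file would be A/A.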

The file header contains

```
##reference=Homo_sapiens_assembly18.fasta
```

but the file name (hapmap_3.3.b37.vcf) suggests it was made against b37, which implies hg19 coordinates. Was this file created with liftover?

Also, could it be that when the reference went from hg18 to hg19, it was updated to carry the most common allele in the population (note the MAF of 0.00036 in this example), but the VCF file for the HapMap 3.3 samples was not updated to reflect this?
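One way to test this hypothesis is to compare, at each discordant site, the REF allele and allele frequencies recorded in the VCF with the base the hg19/b37 FASTA actually has at that position. The sketch below assumes that reference base has already been looked up separately; the function name and category labels are hypothetical, and the example inputs are illustrative rather than the real hg19 base at this site.

```python
# Sketch: classify a site by comparing VCF alleles with the FASTA base.
# The reference base is passed in; how it was looked up is out of scope.


def classify_site(vcf_ref: str, alts: list, afs: list, fasta_base: str) -> str:
    """Describe how the FASTA base relates to the VCF REF/ALT alleles."""
    if vcf_ref == fasta_base:
        return "REF matches FASTA"
    if fasta_base not in alts:
        return "FASTA base not among VCF alleles"
    # Most frequent ALT allele according to the INFO AF values.
    major_alt = alts[afs.index(max(afs))]
    if major_alt == fasta_base:
        return "FASTA carries the major ALT allele"
    return "FASTA carries a minor ALT allele"


if __name__ == "__main__":
    # Illustrative FASTA bases only.
    for base in ["A", "T", "C", "G"]:
        print(base, classify_site("A", ["C", "T"], [0.00036, 0.99964], base))
```

If most discordant sites fell into the "major ALT allele" category, that would be consistent with the idea that the reference now carries the common allele while the VCF still records the old REF.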

So the bottom line is that the HapMap 3.3 file cannot be relied on for the actual genotypes of the HapMap samples: about 10% of the sites show variant calls where the hg19 reference shows no variation.
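A discordance rate like the one above can be computed by comparing genotypes at the sites both call sets share. The sketch below assumes each call set has already been loaded into a dict keyed by `(chrom, pos)` with genotypes as allele strings; the function name and data are hypothetical. Comparing allele strings rather than GT indices matters here, because two files made against different builds may record different REF alleles at the same position.

```python
# Sketch: fraction of shared sites whose unordered genotypes differ.
# Genotypes are tuples of allele strings, e.g. ("A", "T").


def discordance_rate(ours: dict, theirs: dict) -> float:
    """Return discordant / shared sites (0.0 if nothing is shared)."""
    shared = ours.keys() & theirs.keys()
    if not shared:
        return 0.0
    # Sort alleles so that ("C", "T") and ("T", "C") count as the same call.
    differ = sum(1 for site in shared if sorted(ours[site]) != sorted(theirs[site]))
    return differ / len(shared)


if __name__ == "__main__":
    ours = {("13", 1): ("A", "A"), ("13", 2): ("C", "T"), ("13", 3): ("G", "G")}
    theirs = {("13", 1): ("T", "T"), ("13", 2): ("T", "C"), ("13", 4): ("A", "A")}
    print(discordance_rate(ours, theirs))
```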

I know HapMap serves a different purpose in GATK (Variant Quality Score Recalibration), but we may want to warn users that the file cannot be used as a source of genotypes for the HapMap samples it reports on.

Thanks

Thon