I have annotated my VCF file of 20 samples from UnifiedGenotyper using the following steps.
My question is: how should I proceed if I want to select rare variants (MAF < 1%) in my candidate genes for each of these 20 samples?
I am reading a research paper that uses GATK for variant calling and filtration.
The method description goes:
"In addition to the default filters in GATK, variants were further filtered for genotype minimum quality of 30, minimum quality over depth of 5, minimum strand bias -0.10 and maximum fraction of reads with mapping quality of zero at 10%. Annotated variants were subsequently filtered to exclude the variants greater or equal to 1% of minor allele frequency based on dbSNP135 and the 1000 genome project and the NHLBI Exome Variant server (EVS). "
I want to make sure I understand how the authors did the filtration. Below is my guess; I need your help to confirm it:
java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T VariantFiltration \
    --filterExpression "GQ >= 30" \
    --filterExpression "DP >= 5" \
    --filterExpression "SB >= -2" \
    --filterExpression "MQ0 <= 0.1"
The next step is to annotate the variants, but I don't know how to "exclude the variants greater or equal to 1% of minor allele frequency based on dbSNP135 and the 1000 genome".
What is minor allele frequency (MAF), and how do you exclude variants based on it?
Is MAF selected by the "AF" field in VCF files? Should I use the SelectVariants of GATK to do something like this?
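To make the question concrete, here is a minimal Python sketch of what I think the MAF step amounts to. It is my own guess, not something taken from the paper. It assumes the annotation step has already written population allele frequencies into the INFO column; the keys KG_AF and EVS_AF are hypothetical placeholders for whatever the annotation tool actually emits. MAF is taken as the smaller of the ALT frequency and one minus that frequency. As far as I understand, the AF field written by UnifiedGenotyper is the allele frequency among my own 20 samples, so I suspect it is not the right field for this.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=40;KG_AF=0.004' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def minor_allele_frequency(alt_freq):
    """MAF is the frequency of the less common of the two alleles."""
    return min(alt_freq, 1.0 - alt_freq)


def is_rare(vcf_line, freq_keys=("KG_AF", "EVS_AF"), threshold=0.01):
    """Return True if no listed population MAF reaches the threshold.

    KG_AF and EVS_AF are hypothetical INFO keys. A missing annotation is
    treated as "not observed in that database", which is an assumption.
    """
    info = parse_info(vcf_line.rstrip("\n").split("\t")[7])
    for key in freq_keys:
        value = info.get(key)
        if value in (None, "", "."):
            continue
        # Simplification: only the first ALT allele is considered.
        freq = float(value.split(",")[0])
        if minor_allele_frequency(freq) >= threshold:
            return False
    return True


def filter_rare(lines, **kwargs):
    """Yield header lines unchanged and only the rare variant records."""
    for line in lines:
        if line.startswith("#") or is_rare(line, **kwargs):
            yield line
```

With this, something like `list(filter_rare(open("annotated.vcf")))` would keep the header plus the records whose annotated MAF is below 1% in every listed database (the file name is a placeholder). Is this the right idea, or is there a GATK-native way to do the same thing?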
Thanks for your help.