Hello GATK users,
As you know, we have been trying for the past few months to beef up support and improve documentation. This is a long game and although we'd like to think we've made some progress, there still remains a lot to be done.
One thing we're doing right now is developing teaching materials for the upcoming workshop. Even though that will only serve a portion of our user community directly, the materials will be a useful addition to the online documentation. We've also got some new documents in preparation that should help with parallelization options, as well as some frequently asked questions on a range of subjects, illustrations to clarify workflows, and a lot of updates and link fixes for the older doc articles.
All of this takes time to develop, and unfortunately, responding to questions on the forum takes time away from that.
So we're appealing to every one of you to help us by helping others. There's a whole subset of "beginner" questions that any intermediate user could easily answer. It would be a big help if you folks could jump in and answer those questions for us whenever you can. Some of you are already doing it, and we're really thankful, because it takes some of our support burden away and frees up time for us to work on the materials that can benefit many people at once. So we'd love to see more people help in this way, and to be able to move more questions to the "Ask the Community" section (which currently houses mostly the "weird datasets" questions that we honestly don't know how to answer).
We'd be happy to consider an incentive system, by the way. We can't really offer money or anything like that, but if there's any intangible form of reward that would motivate you (leaderboards? gold stars? "expert support" coupons? big smiles and karma points?), let us know!
To anyone who's worried about the quality of community-sourced answers: we're always going to monitor every discussion, so rest assured that there won't be any decrease in quality. Who knows, there may even be an increase! ;-)
I am trying to run the BaseRecalibrator on plant data for which I have no SNP or INDEL data. The documentation clearly states that providing a set of known variants is OPTIONAL, but the program crashes. What is going on and how do I run the program without the SNPs which are not available? I have followed the pipeline given in the Best Practices and want to see it through.
Documentation for the tool:
States the following:
"Optional Inputs" "--knownSites NA A database of known polymorphic sites"
However, I get this message:
Should I simply not bother with this step and just run the Haplotype Caller? And will this yield equally good results?
"summary": "Output variants not called in this comparison track", "name": "--discordance", "kind": "optional_in",
Shouldn't 'kind' be optional_out?
There seems to be a documentation error on this page: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandOddsRatio.php
refRatio and altRatio should be calculated as min/max rather than max/min values. That is what is in code here: https://github.com/broadgsa/gatk-protected/blob/master/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/StrandOddsRatio.java assuming the two correspond to each other. I coded it the same way myself and get more precise results too.
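For anyone comparing the page against the code, here is a minimal sketch of the calculation with refRatio and altRatio taken as min/max, as described above. It is an illustration and not the GATK source: the function name `strand_odds_ratio`, the layout of the 2x2 table (rows ref/alt, columns forward/reverse), and the pseudocount of 1.0 added to every cell are assumptions.

```python
import math


def strand_odds_ratio(table, pseudocount=1.0):
    """Log of a symmetric odds ratio on a 2x2 strand table, scaled by refRatio/altRatio.

    table is assumed to be [[ref_fwd, ref_rev], [alt_fwd, alt_rev]].
    pseudocount is an assumed smoothing value that avoids division by zero.
    """
    (ref_fwd, ref_rev), (alt_fwd, alt_rev) = [
        [count + pseudocount for count in row] for row in table
    ]

    # Symmetric odds ratio: R + 1/R, so swapping the strands gives the same value.
    r = (ref_fwd * alt_rev) / (ref_rev * alt_fwd)
    symmetric = r + 1.0 / r

    # min/max (not max/min), so each ratio lies in (0, 1].
    ref_ratio = min(ref_fwd, ref_rev) / max(ref_fwd, ref_rev)
    alt_ratio = min(alt_fwd, alt_rev) / max(alt_fwd, alt_rev)

    return math.log(symmetric * ref_ratio / alt_ratio)
```

With min/max, a strongly strand-skewed alt allele makes altRatio small, which raises the final value; with max/min the scaling would go the other way.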
Amount of padding (in bp) to add to each interval Use this to add padding to the intervals specified using -L and/or -XL. For example, '-L chr1:100' with a padding value of 20 would turn into '-L chr1:80-120'. This is typically used to add padding around exons when analyzing exomes. The general Broad exome calling pipeline uses 100 bp padding by default.
I think it should not be '-L chr1:80-120'.
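To make the arithmetic in the quoted text concrete, here is a small sketch that takes the description literally, assuming the padding is simply subtracted from the start and added to the end of the interval. The helper `pad_interval` is hypothetical and is not part of GATK; clamping the start at position 1 is also an assumption, and the sketch does not clamp the end to the contig length.

```python
def pad_interval(interval, padding):
    """Pad a 'contig:pos' or 'contig:start-end' interval on both sides.

    Assumes symmetric padding and 1-based coordinates; the start is
    clamped at 1 (an assumption, not documented GATK behavior).
    """
    contig, _, span = interval.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start = int(start_text)
    # A single position is treated as an interval of length one.
    end = int(end_text) if end_text else start
    padded_start = max(1, start - padding)
    return f"{contig}:{padded_start}-{end + padding}"


print(pad_interval("chr1:100", 20))
```

Under these assumptions, a single position with a padding of 20 becomes a 41 bp window centered on that position.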
I recently started considering using GATK and Queue more and developing my analysis tools with them. I intend to figure out how to generate jar files for GATK or Queue. I will then import the files into the Scala IDE for ease of viewing inline source code and code documentation. Should I try doing that, or is there a simpler way?
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_coverage_DepthOfCoverage.html states (bold is my emphasis):
This tool can be run in multi-threaded mode using this option.
Yet when I run the GATK with -nt, I get an error that -nt is not supported (truncated to save space):
ERROR A USER ERROR has occurred (version 2.8-1-g932cd3a): ERROR MESSAGE: Invalid command line: Argument nt has a bad value: The analysis DepthOfCoverage aggregates results by interval. Due to a current limitation of the GATK, analyses of this type do not currently support parallel execution. Please run your analysis without the -nt option.
In GATK 2.6, there have been some changes to BaseRecalibrator. Based on the AnalyzeCovariates page, it must now be run twice. To generate the first pass recalibration table file, it's the same command as before. To generate the second pass recalibration table file, you need to add the -BQSR argument. However, on the BaseRecalibrator page, there is no -BQSR documentation.
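As I understand the two-pass procedure described above, the two commands differ only in the extra -BQSR argument pointing at the first-pass table. The sketch below builds both argument lists in Python so the difference is easy to see. The helper name `base_recalibrator_command` and all file names are placeholders, and the -T, -R, -I and -o arguments are assumed from the usual GATK 2.x command-line form; please check them against --help for your version.

```python
def base_recalibrator_command(reference, bam, known_sites, output_table,
                              first_pass_table=None):
    """Build a BaseRecalibrator argument list (placeholder file names).

    If first_pass_table is given, -BQSR is appended so that the second
    pass is computed on top of the first-pass recalibration table.
    """
    command = [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "BaseRecalibrator",
        "-R", reference,
        "-I", bam,
        "--knownSites", known_sites,
        "-o", output_table,
    ]
    if first_pass_table is not None:
        command += ["-BQSR", first_pass_table]
    return command


# First pass: same command as before.
first = base_recalibrator_command(
    "ref.fasta", "input.bam", "known.vcf", "recal_before.table")
# Second pass: identical except for -BQSR and the output name.
second = base_recalibrator_command(
    "ref.fasta", "input.bam", "known.vcf", "recal_after.table",
    first_pass_table="recal_before.table")

print(" ".join(first))
print(" ".join(second))
```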
Hi, I use the option to get extended output from MuTect and it works well. Unfortunately, I cannot find a detailed description of all the columns, only of the ones that are in the regular output. Is it possible to have this kind of documentation? Thanks a lot.
I was here maybe a month or two ago and the documentation seemed fine. This was definitely after the website was redesigned. Now, it seems a bunch of documentation pages are either incomplete or empty. Here are some examples:
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_filters_BadMateFilter.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_BaseCounts.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthOfCoverage.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_SampleList.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_SnpEff.html
Is this a known issue? I saw a post about a similar problem from September 2012, but this seems to be something new.
I've a couple of essentially documentation-centric questions...
Firstly, the SelectVariants documentation describes selecting 1000 random variants from a VCF using '-number 1000'. However, when I try that (with the command "java -jar GenomeAnalysisTK.jar -T SelectVariants -R human_g1k_v37.fasta --variant variants.vcf -o 1000.vcf -number 1000"), it produces the error 'Argument with the name number isn't defined'. Trying --number instead doesn't make any difference, and the --help output does not list this argument (GATK 2.2.2). Is this option no longer available?
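In case it is useful to anyone with the same need, a fixed number of random records can also be drawn outside of GATK. The sketch below is not a GATK feature: it is a plain reservoir sampler over the lines of an uncompressed VCF, and the function name `sample_vcf_records` and the `seed` parameter are made up for the illustration. Header lines are kept, and the sampled records are returned in their original file order.

```python
import random


def sample_vcf_records(lines, k, seed=None):
    """Return all header lines plus k randomly chosen records (reservoir sampling).

    lines: iterable of VCF lines (header lines start with '#').
    If the file has k records or fewer, every record is kept.
    """
    rng = random.Random(seed)
    header, reservoir = [], []
    seen = 0
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        seen += 1
        if len(reservoir) < k:
            reservoir.append((seen, line))
        else:
            # Keep the new record with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = (seen, line)
    # Sorting by record index restores the original (coordinate) order.
    reservoir.sort()
    return header + [line for _, line in reservoir]
```

It could be used along the lines of `open("1000.vcf", "w").writelines(sample_vcf_records(open("variants.vcf"), 1000))`, with the file names taken from the command above.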
Secondly, the 'gatkforums.broadinstitute.org/discussion/48/using-varianteval#Understanding_Genotype_Concordance_values_from_Variant_Eval' section of the 'Using VariantEval' page has a series of images explaining the concordance values, however the images are missing. Please could these be restored?
Many thanks, James