If this is your first rodeo, you're probably asking yourself:
What can GATK do for me? Identify variants in a bunch of sample sequences, with great sensitivity and specificity.
No but really, how do I know what to do? For each step in the Best Practices, there is a tutorial that details how to run the tools involved, with example commands. The idea is to daisy-chain all those tutorials in the order that they're referenced in the Best Practices doc into a pipeline.
Oh, you mean I can just copy/paste all the tutorial commands as they are?
Not quite, because there are a few things that need to be tweaked. For example, the tutorials use the -L/--intervals argument to restrict analysis for demo purposes, but depending on your data and experimental design, you may need to remove it (e.g. for WGS) or adapt it (for WEx). Hopefully it's explained clearly enough in the tutorials.
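To make the -L tweak concrete, here is a minimal sketch in Python of how the same command changes between a demo run, WGS and WEx. It assumes the classic `java -jar GenomeAnalysisTK.jar -T <tool>` invocation style; the helper name `build_gatk_command`, the file names and the interval value `20` are all made up for illustration, so substitute whatever your tutorial and data actually use.

```python
from typing import List, Optional


def build_gatk_command(
    tool: str,
    reference: str,
    input_bam: str,
    output: str,
    intervals: Optional[str] = None,
    jar: str = "GenomeAnalysisTK.jar",  # assumed jar name/location
) -> List[str]:
    """Assemble a GATK command line as an argument list (hypothetical helper)."""
    cmd = ["java", "-jar", jar, "-T", tool,
           "-R", reference, "-I", input_bam, "-o", output]
    # Only restrict the analysis when an interval (or interval file) is given.
    if intervals is not None:
        cmd += ["-L", intervals]
    return cmd


# Tutorial-style demo: restricted to one contig to keep the run short.
demo = build_gatk_command("HaplotypeCaller", "reference.fasta",
                          "sample.bam", "demo.vcf", intervals="20")

# WGS: drop -L entirely so the whole genome is processed.
wgs = build_gatk_command("HaplotypeCaller", "reference.fasta",
                         "sample.bam", "wgs.vcf")

# WEx: adapt -L to point at your capture targets (hypothetical file name).
wex = build_gatk_command("HaplotypeCaller", "reference.fasta",
                         "sample.bam", "wex.vcf",
                         intervals="exome_targets.interval_list")

for cmd in (demo, wgs, wex):
    print(" ".join(cmd))
```

The resulting lists can be passed to `subprocess.run` once the paths point at real files; printing them first is a cheap way to check that each step of your pipeline carries the interval setting you intended.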
Why don't you just provide one script that runs all the tools? It's really hard to build and maintain a one-size-fits-all pipeline solution. Really really hard. And not nearly as much fun as developing new analysis methods. We do provide a pipelining program called Queue that has the advantage of understanding GATK argument syntax natively, but you still have to actually write scripts yourself in Scala to use it. Sorry. Maybe one day we will be able to offer GATK analysis on the Cloud. But not today.
What if I want to know what a command line argument does or change a parameter? First, check out the basic GATK command syntax FAQ if it's your first time using GATK, then consult the relevant Tool Documentation page. Keep in mind that some arguments are "engine parameters" that are shared by many tools, and are listed in a separate document. Also, you can always use the search box to find an argument description really quickly.
The documentation seems chaotic. Is there any logic to how it's organized? Sort of. (And, ouch. Tough crowd.) The main category names should be obvious enough (if not, see the "Documentation Categories" tab). Within categories, everything is just in alphabetical order. In future, we're going to try to provide more use-case based structure, but for now this is what we have. The best way to find practical information is to either go from the Best Practices doc (which provides links to all FAQs, method articles and tutorials directly related to a given step), or use the search box and search-by-tag functions (see the "Search tab"). Be sure to also check out the Presentations section, which provides workshop materials and videos that explain a lot of the motivation and methods behind the Best Practices.
Does GATK include other tools besides the ones in the Best Practices? Oh sure, there's a whole bunch of them, all listed in the Tool Documentation section, categorized by type of analysis. But be aware that anything that's not part of the Best Practices is most likely either a tool that was written for a one-off analysis years ago, an experimental feature that we're still not sure is actually useful, or an accessory utility that can be used in many different ways and takes expert inside knowledge to use properly. All these may be buggy, insufficiently documented, or both. We provide support for them as well as humanly possible but ultimately, you use them at your own risk.
Why do the answers to these questions keep getting longer and longer? I don't know what you're talking about.
What else should I know before I start? You should probably browse the titles of the Frequently Asked Questions -- there will be at least a handful you'll want to read, but it's hard for us to predict which ones.
The categories menu on the left-hand side shows the top categories under which the documentation articles are classified.
|Guide Index||Your guide to the Guide|
|Search Tags||Look up the information you need by tag|
|Best Practice Workflows||Recommended workflows for genome analysis|
|Tool Documentation||Detailed technical documentation (arguments, options etc.) for each tool|
|Methods and Algorithms||Description of analysis methods and explanation of the algorithms involved|
|FAQs||Frequently Asked Questions|
|Tutorials||Step-by-step tutorials that demonstrate how to use the tools in practice|
|Presentations||Materials from conferences, workshops and online events|
|Common Problems||Solutions and workarounds for common problems|
|Pipelining with Queue||Building analysis pipelines to run GATK and other tools efficiently|
|Developer Zone||All you need to know to write your own tools on top of the GATK|
|Third-Party Tools||Tools built on top of the GATK developed by other groups|
|Version History||Historical record of changes by version|
In the unlikely event that the documentation doesn't solve all of your problems, do your holiday shopping and make your coffee too, please use our support forum to get help from a live human. The live human in question will most likely be a member of the development team. However, in some cases you may get responses from other people in the user community who are not part of our group. This is usually a good thing, as there are quite a few expert users whom we happily trust with answering most questions, and we welcome the help (hurray for free labor). But as with any public forum, we do recommend you exercise judgment in following advice from strangers on the Internet. One thing you can do is check people's role and post count on their public profile.
Now, in practice: