Origin of the Practices
We develop and validate these workflows in collaboration with many investigators within the Broad Institute's network of affiliates, and they are deployed at very large scale in the Broad's production pipelines. As a general rule, the command-line arguments and parameters given in the documentation examples are meant to be broadly applicable (no pun intended). However, our testing focuses largely on data from human whole-genome or whole-exome samples sequenced with Illumina technology. If you are working with different types of data or experimental designs, you may need to adapt certain branches of the workflow, as well as certain parameter selections and values; see the FAQs and Common Problems documentation in particular for help with that. Note that we may not be able to provide recommendations on how to deal with very different experimental designs or divergent data types (such as Ion Torrent).
Beware legacy commands (or, trust but verify)
If someone hands you a script and tells you "this implements the GATK Best Practices", start by asking which version of the software it used and when it was written. Both our software and our usage recommendations evolve in step with the rapid pace of technological and methodological innovation in the field of genomics, so what was Best Practice last year (let alone in 2010) may no longer be applicable. Even if all the steps seem to be in accordance with our docs (same tools in the same order), you should still check every single argument in the commands. If anything is different, figure out what it does. It's one or two hours of your life that can save you days of troubleshooting. We're working on a way to produce versioned Best Practices documents that will mitigate this problem, but in the meantime, protect yourself by being thorough.
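One way to make the argument-by-argument check less error-prone is to compare the two commands mechanically. The following is only a minimal sketch: the helper names `parse_args` and `diff_args` are hypothetical, the two command lines are made up for illustration (substitute the command from your script and the one from the current documentation), and the parser assumes that every token starting with "-" is an argument name.

```python
import shlex


def parse_args(command):
    """Split a command line into a {flag: [values]} mapping.

    Simplifying assumption: any token starting with "-" is a flag, and the
    tokens that follow it (up to the next flag) are its values. Tokens before
    the first flag (program and tool name) are ignored, and a negative number
    used as a value would be mistaken for a flag.
    """
    args = {}
    current = None
    for token in shlex.split(command):
        if token.startswith("-"):
            current = token
            args.setdefault(current, [])
        elif current is not None:
            args[current].append(token)
    return args


def diff_args(script_cmd, docs_cmd):
    """Report flags that differ between an inherited command and the documented one."""
    a, b = parse_args(script_cmd), parse_args(docs_cmd)
    return {
        "only_in_script": sorted(set(a) - set(b)),
        "only_in_docs": sorted(set(b) - set(a)),
        "different_values": sorted(f for f in set(a) & set(b) if a[f] != b[f]),
    }


# Illustrative commands only; they are not taken from any official document.
script_cmd = (
    "gatk HaplotypeCaller -R ref.fasta -I sample.bam "
    "-O out.g.vcf -ERC GVCF -stand-call-conf 30"
)
docs_cmd = "gatk HaplotypeCaller -R ref.fasta -I sample.bam -O out.g.vcf.gz -ERC GVCF"

report = diff_args(script_cmd, docs_cmd)
for category, flags in report.items():
    print(f"{category}: {flags}")
```

Anything the report lists is a prompt to go and read the tool documentation for that argument. It does not tell you which of the two commands is right, and it will not notice an argument whose meaning or default changed between versions while its name stayed the same.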