Tagged with #mouse
0 documentation articles | 0 announcements | 10 forum discussions

No articles to display.


Created 2016-04-20 11:04:13 | Updated | Tags: bqsr mouse enu

Comments (2)


I am currently processing some aligned reads to discover novel variants from an ENU mutagenesis project. All the samples are Bl6 mice, with a few exceptions.

A previous forum thread discussed using BQSR on mouse sequences (gatkforums.broadinstitute.org/gatk/discussion/1243/what-are-the-standard-resources-for-non-human-genomes). There is also a research paper whose authors fed the dbSNP database into BQSR for their ENU project (ncbi.nlm.nih.gov/pmc/articles/PMC4623266/).

From what I understand about BQSR, the recalibration method treats any mismatch at a site that is not in the set of well-annotated known variants as an "error" (is this the error modelling?). For a project like this, where novel mutations are expected, I'm wondering whether feeding only known variants into the learning algorithm would overestimate the errors, especially if the mutation load is high.
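The concern above can be illustrated with a little arithmetic. The sketch below is not GATK's implementation: it assumes a simple smoothed estimate, Q = -10 * log10((mismatches + 1) / (observations + 2)), and all counts are hypothetical. It only shows the direction of the effect when real variants that are absent from the known-sites file are counted as mismatches.

```python
import math


def empirical_quality(mismatches: int, observations: int) -> float:
    """Phred-scaled empirical quality with +1/+2 smoothing (a simplifying assumption)."""
    error_rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(error_rate)


# Hypothetical counts for a single covariate bin.
observations = 1_000_000
sequencing_errors = 1_000       # genuine machine errors
unmasked_variant_bases = 500    # real variants missing from the known-sites file

q_masked = empirical_quality(sequencing_errors, observations)
q_unmasked = empirical_quality(sequencing_errors + unmasked_variant_bases, observations)

print(f"all variants masked:    Q{q_masked:.1f}")
print(f"some variants unmasked: Q{q_unmasked:.1f}")
```

With these made-up numbers the unmasked case yields a lower quality, i.e. the error rate is overestimated. How much this matters in practice depends on how many novel variant bases there are relative to true sequencing errors.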

Thanks in advance

Created 2016-04-11 04:59:55 | Updated | Tags: bqsr mouse mm10

Comments (1)


I've had good luck running BQSR on E. coli and C. remanei in the past, but mouse is giving me some trouble. The run works if I remove "--fix_misencoded_quality_scores", but I'm not sure that is very helpful, as I thought that was the point of running BQSR. I have formatted the mm10 reference as suggested, with the chromosomes in the correct order and the chromosome labels as plain numbers without "chr", but I keep getting the same error. For the known-sites VCF, I use a VCF that LoFreq generated from the same non-calibrated BAM file.

Thanks in advance for any help. Here is my input and error message:

$ java -jar /usr/local/packages/GATK/2.6-4/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home13/jpreston/genomes/sorted_mm10/sorted_mm10_2.fa -I /home9/anniep/dnaseq/bowtie/242_normal_sorted.bam -knownSites /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf --fix_misencoded_quality_scores -o /home9/anniep/dnaseq/bowtie/242_normal_sorted_recal.table
INFO 21:42:19,395 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:42:19,397 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.6-4-g3e5ff60, Compiled 2013/06/24 14:48:56
INFO 21:42:19,397 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 21:42:19,397 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 21:42:19,401 HelpFormatter - Program Args: -T BaseRecalibrator -R /home13/jpreston/genomes/sorted_mm10/sorted_mm10_2.fa -I /home9/anniep/dnaseq/bowtie/242_normal_sorted.bam -knownSites /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf --fix_misencoded_quality_scores -o /home9/anniep/dnaseq/bowtie/242_normal_sorted_recal.table
INFO 21:42:19,401 HelpFormatter - Date/Time: 2016/04/10 21:42:19
INFO 21:42:19,401 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:42:19,402 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:42:19,413 ArgumentTypeDescriptor - Dynamically determined type of /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf to be VCF
INFO 21:42:19,982 GenomeAnalysisEngine - Strictness is SILENT
INFO 21:42:20,069 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 21:42:20,076 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 21:42:20,134 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
INFO 21:42:20,168 RMDTrackBuilder - Loading Tribble index from disk for file /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf
WARN 21:42:20,288 RMDTrackBuilder - Index file /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf.idx is out of date (index older than input file), deleting and updating the index file
INFO 21:42:20,394 RMDTrackBuilder - Creating Tribble index in memory for file /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf
INFO 21:42:21,123 RMDTrackBuilder - Writing Tribble index to disk for file /home9/anniep/dnaseq/bowtie/242_normal_sorted_KS.vcf.idx
INFO 21:42:29,567 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 21:42:29,571 GenomeAnalysisEngine - Done preparing for traversal
INFO 21:42:29,571 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 21:42:29,571 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 21:42:29,596 BaseRecalibrator - The covariates being used here:
INFO 21:42:29,596 BaseRecalibrator - ReadGroupCovariate
INFO 21:42:29,596 BaseRecalibrator - QualityScoreCovariate
INFO 21:42:29,596 BaseRecalibrator - ContextCovariate
INFO 21:42:29,597 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 21:42:29,597 BaseRecalibrator - CycleCovariate
INFO 21:42:29,600 ReadShardBalancer$1 - Loading BAM index data
INFO 21:42:29,600 ReadShardBalancer$1 - Done loading BAM index data
WARN 21:42:30,549 RestStorageService - Error Response: PUT '/GATK_Run_Reports/CDOcM34kcITMDEM7GEsb3sgg7vCkk1ry.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 988, Content-MD5: d3m5ghuffx5ZqbXFYdU8eQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 7779b9821b9f7f1e59a9b5c561d53c79, Date: Mon, 11 Apr 2016 04:42:29 GMT, Authorization: AWS AKIAIMHBU7X642TCHQ2A:43FosbVvbsP2X/SqzsJ7PhY1w80=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-358.23.2.el6.x86_64; amd64; en; JVM 1.7.0_80), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: FE8C7B3005828024, x-amz-id-2: euWnPAZr6BuQp05KFLPCkdLpzrcwXMCEWm3Tlfjk2lGbgP89RENyKG387IdN+YcAXsa58zB7zzk=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Mon, 11 Apr 2016 04:42:29 GMT, Connection: close, Server: AmazonS3]

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.6-4-g3e5ff60):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool
ERROR ------------------------------------------------------------------------------------------
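The error message above complains about a mixture of differently encoded reads. One rough way to look for such a mixture is to inspect the raw quality strings, for example column 11 of the SAM text of the BAM. The sketch below is a heuristic, not what GATK does internally; the function names and the cut-offs (ASCII 59 and 74) are assumptions chosen for illustration.

```python
from collections import Counter


def guess_offset(quality: str) -> str:
    """Guess the Phred offset of one raw ASCII quality string.

    Heuristic assumptions: characters below ';' (ASCII 59) do not occur in
    Phred+64 data, and characters above 'J' (ASCII 74) are unusual in
    Phred+33 data. Strings that fit both ranges are reported as ambiguous.
    """
    codes = [ord(ch) for ch in quality]
    if min(codes) < 59:
        return "phred33"
    if max(codes) > 74:
        return "phred64"
    return "ambiguous"


def summarize_encodings(qualities):
    """Count the guessed encodings, skipping empty or missing ('*') strings."""
    return Counter(guess_offset(q) for q in qualities if q and q != "*")


# Hypothetical quality strings standing in for reads from one BAM file.
example = ["IIIIHHH#", "hhhhgggf", "@@ABCD", "*"]
print(summarize_encodings(example))
```

If both "phred33" and "phred64" show up with non-trivial counts, the file probably does contain a mixture. If everything looks like Phred+33, the fix flag is likely unnecessary for that file.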

Created 2015-04-02 20:30:24 | Updated | Tags: mouse referencecontext

Comments (1)

Hi, I ran into the error below while doing Base Quality Recalibration. I'm working on the mouse genome, and I downloaded the mm10 dbSNP file from the Broad Institute. Is the error related to this dbSNP file?

INFO 16:25:16,159 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 16:25:46,168 ProgressMeter - chr12:87273626 3.33e+07 30.0 s 0.0 s 14.5% 3.4 m 2.9 m
INFO 16:25:49,333 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO 16:25:49,333 HttpMethodDirector - Retrying request
INFO 16:25:49,336 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO 16:25:49,336 HttpMethodDirector - Retrying request
INFO 16:25:49,338 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO 16:25:49,338 HttpMethodDirector - Retrying request
INFO 16:25:49,339 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO 16:25:49,340 HttpMethodDirector - Retrying request
INFO 16:25:49,341 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO 16:25:49,341 HttpMethodDirector - Retrying request

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 1
    at org.broadinstitute.sting.gatk.contexts.ReferenceContext.getBase(ReferenceContext.java:179)
    at org.broadinstitute.sting.gatk.walkers.indels.RealignerTargetCreator.map(RealignerTargetCreator.java:219)
    at org.broadinstitute.sting.gatk.walkers.indels.RealignerTargetCreator.map(RealignerTargetCreator.java:124)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107)

Created 2014-04-23 15:38:54 | Updated | Tags: indelrealigner realignertargetcreator mouse

Comments (2)


I've run the IndelRealigner on my mouse WGS *bam files with known site data from the Sanger MGP, and now I'm trying to figure out how "well" it worked.

The list created by RealignerTargetCreator contains 6547185 intervals.

Parsing the output realigned.bam file for reads that had an "OC" tag added (as suggested in http://www.broadinstitute.org/gatk/events/3391/GATKw1310-BP-2-Realignment.pdf) shows that 1648299 reads were actually realigned.
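The OC-tag count described above can be sketched as follows. This is a minimal illustration that works on SAM text lines rather than on the BAM directly; the function name and the sample records are hypothetical, and it assumes the original CIGAR is stored as an `OC:Z:` optional field.

```python
def count_oc_tagged(sam_lines):
    """Return (realigned, total) for SAM text lines.

    A read counts as realigned if any optional field (column 12 onward)
    is an OC:Z: tag. Header lines starting with '@' are skipped.
    """
    total = 0
    realigned = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        total += 1
        if any(tag.startswith("OC:Z:") for tag in fields[11:]):
            realigned += 1
    return realigned, total


# Hypothetical records: one realigned read carrying an OC tag, one untouched read.
example_sam = [
    "@HD\tVN:1.4\tSO:coordinate\n",
    "read1\t0\t1\t100\t60\t48M2D52M\t*\t0\t0\tACGT\tIIII\tOC:Z:100M\tNM:i:2\n",
    "read2\t0\t1\t200\t60\t100M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n",
]
realigned, total = count_oc_tagged(example_sam)
print(f"{realigned} of {total} reads carry an OC tag")
```

For a real file, the same function could be given `sys.stdin` while piping in the output of `samtools view` on the realigned BAM. Comparing the per-interval counts against the target interval list would show which intervals produced no realigned reads at all.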

I used the default settings, which means that

1) -model was USE_READS - and from what I've read, this is the correct option to use, given that Smith-Waterman modelling doesn't give greatly improved results;

2) -LOD was 5.0 - but for my data, which is mouse whole-genome sequence at average 10x coverage, this may be too high and I might be losing true positives.

I've tried randomly picking out candidate intervals from the intervals and OC-tagged reads from the realigned.bam file to check, but I was wondering if there's a more empirical way of checking how good the realignment was (I realise there's "no formal measure" as per the presentation but I'm finding it hard to make a judgement call!).

My feeling from looking at the intervals or realigned reads is that the low coverage is a major issue in terms of identifying "true" indels, so preferably we'd go for specificity over sensitivity.

Thanks for any advice/suggestions in advance!

Created 2014-04-17 11:34:39 | Updated 2014-04-17 16:09:45 | Tags: indelrealigner bqsr knownsites mouse

Comments (5)

Hello again,

More fun with mouse known site data! I'm using the Sanger MGP v3 known indel/known SNP sites for the IndelRealigner and BQSR steps.

I'm working with whole-genome sequence; however, the known sites have been filtered for the following contigs (example from the SNP vcf):

##source_20130026.2=vcf-annotate(r813) -f +/D=200/d=5/q=20/w=2/a=5 (AJ,AKR,CASTEiJ,CBAJ,DBA2J,FVBNJ,LPJ,PWKPhJ,WSBEiJ)
##source_20130026.2=vcf-annotate(r813) -f +/D=250/d=5/q=20/w=2/a=5 (129S1,BALBcJ,C3HHeJ,C57BL6NJ,NODShiLtJ,NZO,Spretus)
##source_20130305.2=vcf-annotate(r818) -f +/D=155/d=5/q=20/w=2/a=5 (129P2)
##source_20130304.2=vcf-annotate(r818) -f +/D=100/d=5/q=20/w=2/a=5 (129S5)
##FILTER=<ID=BaseQualBias,Description="Min P-value for baseQ bias (INFO/PV4) [0]">
##FILTER=<ID=EndDistBias,Description="Min P-value for end distance bias (INFO/PV4) [0.0001]">
##FILTER=<ID=GapWin,Description="Window size for filtering adjacent gaps [3]">
##FILTER=<ID=Het,Description="Genotype call is heterozygous (low quality) []">
##FILTER=<ID=MapQualBias,Description="Min P-value for mapQ bias (INFO/PV4) [0]">
##FILTER=<ID=MaxDP,Description="Maximum read depth (INFO/DP or INFO/DP4) [200]">
##FILTER=<ID=MinAB,Description="Minimum number of alternate bases (INFO/DP4) [5]">
##FILTER=<ID=MinDP,Description="Minimum read depth (INFO/DP or INFO/DP4) [5]">
##FILTER=<ID=MinMQ,Description="Minimum RMS mapping quality for SNPs (INFO/MQ) [20]">
##FILTER=<ID=Qual,Description="Minimum value of the QUAL field [10]">
##FILTER=<ID=RefN,Description="Reference base is N []">
##FILTER=<ID=SnpGap,Description="SNP within INT bp around a gap to be filtered [2]">
##FILTER=<ID=StrandBias,Description="Min P-value for strand bias (INFO/PV4) [0.0001]">
##FILTER=<ID=VDB,Description="Minimum Variant Distance Bias (INFO/VDB) [0]">

When I was trying to use these known sites at the VariantRecalibration step, I got a lot of walker errors saying that (I paraphrase) "it's dangerous to use this known site data on your VCF because the contigs of your references do not match".

However, if you look at the GRCm38_68.fai it DOES include the smaller scaffolds which are present in my data.

So, my question is: how should I filter my bam files for the IndelRealigner and downstream steps? I feel like the best option is to filter on the contigs present in the known site vcfs, but obviously that would throw out a proportion of my data.
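Before deciding how to filter, it may help to list exactly which reference contigs have no records in the known-sites VCF. The sketch below compares the first column of a `.fai` index with the CHROM column of a VCF; the function names and the miniature inputs are hypothetical.

```python
def fai_contigs(fai_lines):
    """Contig names from a FASTA index: the first tab-separated column."""
    return [line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()]


def vcf_contigs(vcf_lines):
    """Distinct CHROM values from VCF records, in order of first appearance."""
    seen = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)


def contigs_missing_from_vcf(fai_lines, vcf_lines):
    """Reference contigs that have no record in the known-sites VCF."""
    known = set(vcf_contigs(vcf_lines))
    return [name for name in fai_contigs(fai_lines) if name not in known]


# Hypothetical miniature inputs standing in for the .fai and the known-sites VCF.
example_fai = [
    "1\t1000\t3\t60\t61\n",
    "2\t800\t1023\t60\t61\n",
    "scaffold_A\t50\t1840\t60\t61\n",
]
example_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t10\t.\tA\tG\t50\tPASS\t.\n",
    "2\t20\t.\tC\tT\t50\tPASS\t.\n",
]
print(contigs_missing_from_vcf(example_fai, example_vcf))
```

For real data, open file objects for the `.fai` and the decompressed VCF can be passed in place of the example lists. The fraction of reads mapped to the missing contigs then indicates how much data a contig-based filter would discard.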

Thanks very much!

Created 2014-04-10 16:24:48 | Updated | Tags: indelrealigner realignertargetcreator bqsr knownsites mouse indel-realignment

Comments (5)


I was wondering about the format of the known site vcfs used by the RealignerTargetCreator and BaseRecalibrator walkers.

I'm working with mouse whole genome sequence data, so I've been using the Sanger Mouse Genome project known sites from the Keane et al. 2011 Nature paper. From the output, it seems that the RealignerTargetCreator walker is able to recognise and use the gzipped vcf fine:

INFO 15:12:09,747 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:12:09,751 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02
INFO 15:12:09,751 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 15:12:09,752 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 15:12:09,758 HelpFormatter - Program Args: -T RealignerTargetCreator -R mm10.fa -I DUK01M.sorted.dedup.bam -known /tmp/mgp.v3.SNPs.indels/ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/mgp.v3.indels.rsIDdbSNPv137.vcf.gz -o DUK01M.indel.intervals.list
INFO 15:12:09,758 HelpFormatter - Date/Time: 2014/03/25 15:12:09
INFO 15:12:09,758 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:12:09,759 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:12:09,918 ArgumentTypeDescriptor - Dynamically determined type of /fml/chones/tmp/mgp.v3.SNPs.indels/ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/mgp.v3.indels.rsIDdbSNPv137.vcf.gz to be VCF
INFO 15:12:10,010 GenomeAnalysisEngine - Strictness is SILENT
INFO 15:12:10,367 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 15:12:10,377 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 15:12:10,439 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
INFO 15:12:10,468 RMDTrackBuilder - Attempting to blindly load /fml/chones/tmp/mgp.v3.SNPs.indels/ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/mgp.v3.indels.rsIDdbSNPv137.vcf.gz as a tabix indexed file
INFO 15:12:11,066 IndexDictionaryUtils - Track known doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 15:12:11,201 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
INFO 15:12:12,333 GenomeAnalysisEngine - Done creating shard strategy
INFO 15:12:12,334 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]

I've checked the indel interval lists for my samples and they do all appear to contain different intervals.

However, when I use the equivalent SNP vcf in the following BQSR step, GATK errors as follows:

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.5-2-gf57256b):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Invalid command line: This calculation is critically dependent on being able to skip over known variant sites. Please provide a VCF file containing known sites of genetic variation.
ERROR ------------------------------------------------------------------------------------------

This suggests that the SNP vcf (which has the same format as the indel vcf) is not being used by BQSR.

My question is: given that the BQSR step failed, should I be worried that there are no errors from the Indel Realignment step? As the known SNP/indel vcfs are in the same format, I don't know whether I can trust the realigned .bams.

Thanks very much!

Created 2013-10-31 15:41:54 | Updated | Tags: mutect mouse cosmic

Comments (0)

According to the documentation, "there is no cosmic VCF available for mouse, this entire parameter can be eliminated". Is that still the official recommendation? Is there now perhaps some other comparable resource that one could use?

Created 2013-03-25 22:34:33 | Updated | Tags: variantrecalibrator mouse

Comments (9)

Hello, I am just trying VariantRecalibrator on my 4 samples:

java -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R gatk.ucsc.mm10.fa -input UnifiedGenotyper.output.snps.raw.vcf -nt 14 -recalFile file_for_ApplyRecalibration.recal -tranchesFile file_for_ApplyRecalibration.tranches -resource:sanger,known=false,training=true,truth=true mgp.v2.snps.annot.reformat.vcf -resource:dbnsp,known=true,training=false,truth=false,prior=6.0 mm10_dbsnp.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff -mG 4 -percentBad 0.05

which starts running, then gives me this error:

INFO 08:26:57,741 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NumberFormatException: For input string: "."
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:481)
    at java.lang.Integer.valueOf(Integer.java:582)
    at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.decodeInts(AbstractVCFCodec.java:680)
    at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:641)
    at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:92)
    at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:130)
    at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:120)
    at org.broadinstitute.sting.utils.variantcontext.GenotypesContext.iterator(GenotypesContext.java:461)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.getCalledChrCount(VariantContext.java:922)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.getCalledChrCount(VariantContext.java:908)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.isMonomorphicInSamples(VariantContext.java:937)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.isPolymorphicInSamples(VariantContext.java:948)
    at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.isValidVariant(VariantDataManager.java:278)
    at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.parseTrainingSets(VariantDataManager.java:263)
    at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.map(VariantRecalibrator.java:259)
    at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.map(VariantRecalibrator.java:107)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:243)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:231)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:248)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:120)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:67)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:23)
    at org.broadinstitute.sting.gatk.executive.ShardTraverser.call(ShardTraverser.java:73)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: For input string: "."
ERROR ------------------------------------------------------------------------------------------

I've used all three VCFs in other GATK tools without issues. Any help greatly appreciated! Many thanks, Lavinia.
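The stack trace shows an integer parse failing on "." while genotype fields are being decoded. A sketch for locating candidate records in a VCF is given below. Which FORMAT keys the parser treats as integer lists is an assumption here (`AD` and `PL` are used as examples), and the function name and sample records are hypothetical. The scan flags every "." entry in those fields, including fully missing values, which may be broader than what the parser actually rejects.

```python
INTEGER_LIST_KEYS = frozenset({"AD", "PL"})  # assumption: keys parsed as integer lists


def find_dot_in_integer_fields(vcf_lines, keys=INTEGER_LIST_KEYS):
    """Return (chrom, pos, sample_index, key, value) for each '.' found
    inside a per-sample field whose FORMAT key is in `keys`."""
    problems = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        format_keys = fields[8].split(":")
        for sample_index, sample in enumerate(fields[9:]):
            for key, value in zip(format_keys, sample.split(":")):
                if key in keys and "." in value.split(","):
                    problems.append((fields[0], int(fields[1]), sample_index, key, value))
    return problems


# Hypothetical records: the second one has a '.' inside its PL list.
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:PL\t0/1:10,5:30,0,60\n",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD:PL\t1/1:0,8:.,0,0\n",
]
for problem in find_dot_in_integer_fields(example_vcf):
    print(problem)
```

Running such a scan over each of the three VCFs on the command line would show which file, if any, contains the offending entries.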

Created 2013-03-25 22:10:53 | Updated | Tags: vqsr mouse

Comments (4)

I was wondering if anyone has used VQSR for a mouse-related genome project. I am working with mm10 dbSNP and DNA-seq short-insert data for multiple homozygous mouse samples. I have obtained decent results so far using the mm10 dbSNP as the training set, but was curious to see if anyone had any recommendations as to what settings to use. Any input is appreciated. I also have a lot of RNA-seq data, but that will come at a much later point in time. Thanks!

Created 2012-08-29 20:00:22 | Updated 2013-01-07 19:59:32 | Tags: non-human variantrecalibration mouse

Comments (3)

Dear all,

I am calling SNPs from mouse samples using GATK and have reached the "Variant quality score recalibration" step. The VariantRecalibrator walker asks me to provide training sets for mouse SNPs. The only SNP data (for the UCSC mm9 assembly) I can find right now is dbSNP. So I tried to run VariantRecalibrator like this:

java -Xmx4g -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R Refseq.fa -input snps.raw.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 snp128.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff -mode BOTH -recalFile output.recal -tranchesFile output.tranches -rscriptFile output.plots.R

However, the program asked for more:

ERROR MESSAGE: Invalid command line: No training set found! Please provide sets of known polymorphic loci marked with the training=true ROD binding tag. For example, -resource:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmapFile.vcf
ERROR ------------------------------------------------------------------------------------------

I was wondering if I can change the parameters by setting both the training/truth to true:

-resource:dbsnp,known=true,training=true,truth=true,prior=6.0 snp128.vcf

or should I ignore the "-resource" option at the cost of being less accurate?
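The error message above is about the tags on the `-resource` arguments: none of the resources was marked `training=true`. The sketch below mimics that check on the tag strings alone. It is not GATK code; the function names are hypothetical, and it says nothing about whether dbSNP is a suitable training or truth set.

```python
def parse_resource(spec: str) -> dict:
    """Parse the part after '-resource:' into a dict, e.g.
    'dbsnp,known=true,training=false' -> {'name': 'dbsnp', 'known': 'true', ...}.
    Bare entries without '=' (such as a file type) are ignored."""
    name, *pairs = spec.split(",")
    tags = {"name": name}
    for pair in pairs:
        if "=" in pair:
            key, value = pair.split("=", 1)
            tags[key] = value
    return tags


def has_training_set(specs) -> bool:
    """True if at least one resource is tagged training=true."""
    return any(parse_resource(spec).get("training") == "true" for spec in specs)


original = ["dbsnp,known=true,training=false,truth=false,prior=6.0"]
proposed = ["dbsnp,known=true,training=true,truth=true,prior=6.0"]
print(has_training_set(original), has_training_set(proposed))
```

The original tag set fails the check and the proposed one passes it, which matches the error message. Whether a training set drawn from dbSNP gives a reliable model is a separate question.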

Any suggestions would be greatly appreciated.