Tagged with #merged bams
0 documentation articles | 0 announcements | 3 forum discussions


No posts found with the requested search criteria.

Created 2013-03-29 20:27:26 | Updated | Tags: readbackedphasing bam
Comments (4)

I have applied PhaseByTransmission to a trio with a PED file and now want to run ReadBackedPhasing. However, the variant calls for each member of the trio were made from a different BAM file (as each was from a different individual). The ReadBackedPhasing documentation only mentions using the program with a single BAM. Does this mean that I need to merge the BAMs for the three individuals into a single BAM? If so, can you suggest any programs that work well with GATK?


Created 2012-10-01 22:56:18 | Updated 2012-10-01 22:56:18 | Tags:
Comments (1)

Dear developers, we are puzzled by a strange outcome of Picard MergeSamFiles and are wondering whether and how it will affect the subsequent GATK analysis pipeline. We have groups of individuals with the same genetic diseases and decided to pre-process and call mutations after merging all the samples of each group together. Each sample has been indexed, and some samples have been sequenced twice, so two Read Group IDs are assigned to those samples.

We merged the BAM alignment files of the different samples, each carrying Read Group information, using the Picard MergeSamFiles program, then pre-processed the merged file and called mutations with UnifiedGenotyper. The UG results seemed fine: a genotype was assigned to each sample, so Read Group identity is preserved. However, after visualizing the alignment in IGV we noticed that in the merged files the reads appear to be duplicated and the pair information corrupted. Both the forward read and the reverse read are present twice in the file, each identifies itself as its own mate mapped at the same position, and all the "pairs" have an insert size of either 1 or -1. Please find an example below.

Apparently no one has reported this behaviour of MergeSamFiles, so my question is: have you seen the same thing? How do you merge files when processing cohorts of samples in the same BAM? Will this affect the mutation calls and, for example, the read counts reported in the VCF (the AD and DP fields in each genotype)? Is the read pairing lost in the "fragment based" model?

In summary, should we repeat the analyses on the merged files, and if so, how do you suggest we pre-process them?

Thank you in advance for your help and for all your work on the great tools that you are providing the scientific community.

Best,

Margherita

Here is the example:


Before merging, the read information in the SAM file looks like this:

[Forward Read]
HWI-ARTY:52:IDC100:5:1101:10002:79055 99 chr2 152506978 60 75M = 152507022 119 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT CBCFFFFFHHHHHJJJJJJJJJJJJJJJIJJJJJIJJJJIJIJIJJIIEIIIIIIIJJIJJJJGIFIIIIJIIII X0:i:1 X1:i:0 MD:Z:62A12 RG:Z:S72 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U

[Reverse Read]
HWI-ARTY:52:IDC100:5:1101:10002:79055 147 chr2 152507022 60 75M = 152506978 -119 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG HGGJJJJJJGIHGIGGJHIIIJIIGIGJJJIIJJJJIIGIIHGJIIJJJJJJJJIJJJJJIIGHHHHFEDFFCCB X0:i:1 X1:i:0 MD:Z:18A40A15 RG:Z:S72 XG:i:0 AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 XT:A:U


After merging, it looks like this:

[Forward Read]
HWI-ARTY:52:IDC100:5:1101:10002:79055 67 chr2 152506978 60 75M = 152506978 -1 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT 8>=BAA=:;>BBB:@A@=A@>B@B::?>B@BB:??;?B:?:?BB??;?;BB?>:;?>;?:@@?==?BA>;=:8>9 X0:i:1 X1:i:0 BD:Z:KKQNJIUTETPCDRSOTRJSRJTPQNNMISOGQQNLOLRRNRNEESGQMICKPRMRNNTOVTSRQI==MQTPQOR MD:Z:62A12 RG:Z:S72 XG:i:0 BI:Z:FFGFVUZYRRTRGVROUORVOQTOOKOHLRLIPQOGPNQRMGNKLKODIPIRNQONKQONCJOLIDEBHGVGDHK AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 MQ:i:60 XT:A:U

[Forward Read Repeated]
HWI-ARTY:52:IDC100:5:1101:10002:79055 67 chr2 152506978 60 75M = 152506978 1 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT 8>=BAA=:;>BBB:@A@=A@>B@B::?>B@BB:??;?B:?:?BB??;?;BB?>:;?>;?:@@?==?BA>;=:8>9 X0:i:1 X1:i:0 BD:Z:KKQNJIUTETPCDRSOTRJSRJTPQNNMISOGQQNLOLRRNRNEESGQMICKPRMRNNTOVTSRQI==MQTPQOR MD:Z:62A12 RG:Z:S72 XG:i:0 BI:Z:FFGFVUZYRRTRGVROUORVOQTOOKOHLRLIPQOGPNQRMGNKLKODIPIRNQONKQONCJOLIDEBHGVGDHK AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 MQ:i:60 XT:A:U

[Reverse Read]
HWI-ARTY:52:IDC100:5:1101:10002:79055 179 chr2 152507022 60 75M = 152507022 -1 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG 5:6;@A@>>;:=;:=?@?:>ABB@=??:B@B=B?=@>?=BA=9=?B@::?AA=A?=>9?:>:?;>:::A?:@AB2 X0:i:1 X1:i:0 BD:Z:FRRRJHCLROROSOUSTKTRHFQOQRSNOTKNOLNLPMRDOQLPMHRMRSDERMLLQNQGSRNGQOQSNTSCNKK MD:Z:18A40A15 RG:Z:S72 XG:i:0 BI:Z:HIIH;?I9>6HNUGNRQSRJLLOSHLKTPTQQNOJRJJOGKQMBQKJHKQIMOHJEMIMMOGHKQNLMTVTCEFF AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 MQ:i:60 XT:A:U

[Reverse Read Repeated]
HWI-ARTY:52:IDC100:5:1101:10002:79055 179 chr2 152507022 60 75M = 152507022 1 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG 5:6;@A@>>;:=;:=?@?:>ABB@=??:B@B=B?=@>?=BA=9=?B@::?AA=A?=>9?:>:?;>:::A?:@AB2 X0:i:1 X1:i:0 BD:Z:FRRRJHCLROROSOUSTKTRHFQOQRSNOTKNOLNLPMRDOQLPMHRMRSDERMLLQNQGSRNGQOQSNTSCNKK MD:Z:18A40A15 RG:Z:S72 XG:i:0 BI:Z:HIIH;?I9>6HNUGNRQSRJLLOSHLKTPTQQNOJRJJOGKQMBQKJHKQIMOHJEMIMMOGHKQNLMTVTCEFF AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 MQ:i:60 XT:A:U
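
For anyone comparing the records above, it may help to decode the FLAG column (99 and 147 before merging, 67 and 179 after). The following is only a minimal sketch: the bit values are those defined in the SAM format specification, while the function names `decode_flag` and `mate_points_to_self` are made up for this illustration and are not part of Picard or GATK.

```python
# Bit values as defined in the SAM format specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def mate_points_to_self(pos, rnext, pnext):
    """Hypothetical helper: True when the mate fields (RNEXT, PNEXT)
    point back at the record's own position on the same reference."""
    return rnext == "=" and pos == pnext


# FLAG values taken from the records quoted in this post.
for flag in (99, 147, 67, 179):
    print(flag, decode_flag(flag))

# POS / RNEXT / PNEXT of the forward read before and after merging.
print(mate_points_to_self(152506978, "=", 152507022))  # before merging
print(mate_points_to_self(152506978, "=", 152506978))  # after merging
```

Decoded this way, 99 and 147 describe an ordinary forward/reverse pair. After merging, 67 no longer has the mate-reverse bit set and 179 marks both the read and its mate as reverse-strand, which would be consistent with the mate fields having been filled in from the read itself, as the identical POS and PNEXT values also suggest.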


Created 2012-09-06 20:17:20 | Updated 2012-09-07 01:55:51 | Tags: readgroup
Comments (1)

Hi, I have two BAM files, one from a case and one from a control. Both were mapped with bwa. In the BAM files, the read group tags have different sample IDs (SM) but the same library ID (LB).

I then fed the two BAM files into GATK together for realignment, and the output is a merged BAM for the two samples. I am a bit unsure whether two samples sharing the same LB ID will affect the downstream variant calls. Can the caller distinguish them by their different sample IDs?

Thank you very much for your kind guidance!
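
One way to see exactly which read groups share a library in a merged file is to inspect the `@RG` lines of its header. The sketch below is only an illustration under stated assumptions: the header text is a made-up example (the IDs `case1`/`ctrl1`, the sample names, and `lib1` are hypothetical), and the function names are invented here. It lists the libraries that appear under more than one SM value and says nothing about how any particular tool treats LB.

```python
from collections import defaultdict


def parse_read_groups(header_text):
    """Parse the @RG lines of a SAM header into a list of tag dictionaries."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields are tab-separated TAG:VALUE pairs; split on the first colon only.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        groups.append(fields)
    return groups


def libraries_shared_across_samples(groups):
    """Map each LB value to its SM values, keeping only LBs seen in more than one sample."""
    samples_by_library = defaultdict(set)
    for rg in groups:
        samples_by_library[rg.get("LB", "")].add(rg.get("SM", ""))
    return {lb: sorted(sm) for lb, sm in samples_by_library.items() if len(sm) > 1}


# Hypothetical header resembling the situation described above:
# two read groups with different SM but the same LB.
example_header = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:case1\tSM:case\tLB:lib1\tPL:ILLUMINA",
    "@RG\tID:ctrl1\tSM:control\tLB:lib1\tPL:ILLUMINA",
])

shared = libraries_shared_across_samples(parse_read_groups(example_header))
print(shared)
```

For a real file, the header text could be obtained with something like `samtools view -H merged.bam` and passed to `parse_read_groups`.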