About the contig ordering in the reference(b37/hg19)
Posted in Ask the GATK team | Last updated on


Comments (1)

I have read this on GATK's documents:

Human sequence

If you are using human data, your reads must be aligned to one of the official b3x (e.g. b36, b37) or hg1x (e.g. hg18, hg19) references. The contig ordering in the reference you used must exactly match that of one of the official references canonical orderings. These are defined by historical karotyping of largest to smallest chromosomes, followed by the X, Y, and MT for the b3x references; the order is thus 1, 2, 3, ..., 10, 11, 12, ... 20, 21, 22, X, Y, MT. The hg1x references differ in that the chromosome names are prefixed with "chr" and chrM appears first instead of last. The GATK will detect misordered contigs (for example, lexicographically sorted) and throw an error.

that said the reference order must be: chr1, chr2, chr3, ... chr22, chrX, chrY, chrM. but after I download all the bundle in GATK's ftp, I check's reference, it's with a order of :

>chrM
>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr20
>chr21
>chr22
>chrX
>chrY
...

so, is it contradictory?


Return to top Comment on this article in the forum