MAF (Mutation Annotation Format)
A Mutation Annotation Format (MAF) file (.maf) is a tab-delimited text file that lists mutations. The format originates from The Cancer Genome Atlas (TCGA) project and is described in detail here. As such, the format pertains to human genomes.
In the context of human cancer, MAF files come in two types: protected and somatic. These two types extend conceptually to (1) mutation files that contain all sequenced mutations, however mutations are defined (e.g. against matched normal tissue, against a reference as in the VCF format, or against another tumor stage), and (2) mutation files that list only mutations meeting set criteria, e.g. only somatic mutations.
Additionally, depending on the mutation calling technique, such as exome sequencing or whole genome sequencing, coverage may be limited to portions of a genome.
In somatic MAFs, to minimize germline contamination, unvalidated mutations are included only if they fall within coding regions or splice sites. Thus, mutations from UTRs, intergenic regions, introns, and other noncoding regions are excluded. Additionally, mutation types are filtered as detailed here. Finally, SNPs that are present in dbSNP, COSMIC, or OMIM and annotated there as other than somatic are removed.
A MAF's Mutation_Status field, if positively verified, follows the validation status (VLS) assignments of the Variant Call Format (VCF). Briefly, a mutation is labeled relative to its non-adjacent normal as 0 for wildtype, 1 for germline, 2 for somatic, 3 for loss of heterozygosity (LOH), and 4 for post-transcriptional modification. A status of 5 for unknown implies that a secondary verification was not performed.
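The code assignments above can be sketched as a small Python lookup table. This is only an illustration: the names VLS_LABELS and describe_vls are hypothetical, and the label spellings are taken from the paragraph above rather than from any particular MAF file.

```python
# VLS codes as listed in the paragraph above.
# The dictionary name and label spellings are illustrative assumptions.
VLS_LABELS = {
    0: "wildtype",
    1: "germline",
    2: "somatic",
    3: "loss of heterozygosity (LOH)",
    4: "post-transcriptional modification",
    5: "unknown",  # implies no secondary verification was performed
}


def describe_vls(code):
    """Return the label for a VLS code; raise ValueError for anything outside 0-5."""
    try:
        return VLS_LABELS[int(code)]
    except (KeyError, ValueError, TypeError):
        raise ValueError(f"unrecognized VLS code: {code!r}") from None
```

For example, describe_vls(2) returns "somatic", and describe_vls("5") returns "unknown" because the code is converted to an integer before the lookup.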
IGV is indifferent to the data type within a MAF, e.g. whether it is protected or somatic. Furthermore, for TCGA data, IGV matches only to the patient identifier portion of the TCGA barcode and is indifferent to tissue type and sample type. This has the following implications.
For protected MAFs, IGV ignores Mutation_Status field values when overlaying mutations onto tumor versus matched normal sample tracks. IGV neither filters these mutations nor assigns them to the samples to which they may be relevant.
In addition, whether protected or somatic, when overlaying TCGA data tracks, IGV matches to the patient identifier portion of the TCGA barcode. This feature is convenient for matching mutation data or attributes to the various sample types that arise from a single patient and are distinguished by the latter half of the TCGA barcode. However, an unintended consequence is that the mutation track will overlay any data sample from the same patient, regardless of tissue origin. Thus mutation tracks can erroneously overlay matched normal sample tracks. If this is undesirable, please use a linking ID and a Sample Information file as outlined in Using linking identifiers to overlay tracks. For help deciphering TCGA barcodes, see the NCI wiki for TCGA barcodes and the code table for sample types. To request new IGV features, post to the IGV-Help forum.
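To see why tumor and matched normal samples end up sharing a mutation track, it can help to split a barcode by hand. The following Python sketch assumes the usual TCGA layout, in which the first three hyphen-separated fields identify the patient and the first two digits of the fourth field give the sample-type code (10 through 19 for normal samples in the code table). The function names are hypothetical, and this is not IGV's own matching code.

```python
def split_tcga_barcode(barcode):
    """Split a TCGA barcode into (patient identifier, sample-type code).

    Assumes the layout TCGA-<site>-<participant>-<sample><vial>-...;
    the sample-type code is None when the barcode stops at the patient level.
    """
    fields = barcode.strip().split("-")
    if len(fields) < 3 or fields[0].upper() != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    patient_id = "-".join(fields[:3])
    # The first two characters of the fourth field encode the sample type.
    sample_type = fields[3][:2] if len(fields) > 3 else None
    return patient_id, sample_type


def is_normal_sample(sample_type):
    """Return True for sample-type codes 10-19 (normal samples)."""
    return (
        sample_type is not None
        and sample_type.isdigit()
        and 10 <= int(sample_type) <= 19
    )
```

With this sketch, the barcodes TCGA-02-0001-01C-01D-0182-01 and TCGA-02-0001-10A-01D-0182-01 both yield the patient identifier TCGA-02-0001, although the first has sample type 01 (tumor) and the second has sample type 10 (normal). Matching on the patient identifier alone therefore cannot tell the two apart.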
As of IGV version 2.3.3, both .maf and .maf.annotated extensions are valid for this format, so long as the files are text-based. This means (1) a file with the extension .maf.txt will be recognized as a MAF file, and (2) IGV will not visualize binary MAF files, e.g. from TumorPortal.org. Instead, download and visualize the equivalent .maf.txt files from Firebrowse.org.
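The extension rule above can be approximated with a short Python check, for instance to screen a download folder before loading files. This is a sketch under the assumption that a trailing .txt is ignored and the comparison is case-insensitive. The function name looks_like_maf is hypothetical, the logic is not taken from the IGV source, and it does not test whether a file is text or binary.

```python
def looks_like_maf(filename):
    """Return True if the file name carries one of the MAF extensions named above."""
    name = filename.lower()
    # Assumption: a trailing ".txt" is ignored, so "x.maf.txt" counts as "x.maf".
    if name.endswith(".txt"):
        name = name[: -len(".txt")]
    return name.endswith((".maf", ".maf.annotated"))
```

Under these assumptions, looks_like_maf("BLCA.maf.txt") and looks_like_maf("sample.maf.annotated") are both true, while looks_like_maf("calls.vcf") is false.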
MAF files from Firebrowse.org download as a folder containing a manifest and an individual maf.txt file for each patient. For example, at the time of this writing the bladder cancer (BLCA) cohort gives 130 individual MAF files, one per TCGA patient.
There are two ways to open multiple MAF files at once in IGV:
For instructions on merging multiple text files, converting a MAF file to the MUT format, or displaying multiple mutation tracks in collapsed form, see How to concatenate multiple text files.
IGV will visualize each individual sample's mutation data as a single track.
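As one way to merge per-patient MAF text files outside of IGV, the Python sketch below concatenates several files while writing the column header only once. It assumes each file's header row begins with Hugo_Symbol and that lines starting with # are comments; the function name merge_maf_files and the file names in the usage note are hypothetical.

```python
def merge_maf_files(paths, out_path):
    """Concatenate tab-delimited MAF text files into out_path.

    The header row is written once, from the first file that has one.
    Returns the number of mutation rows written.
    """
    header_written = False
    rows = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    # Assumption: "#" marks comment or version lines to skip.
                    if not line.strip() or line.startswith("#"):
                        continue
                    # Assumption: the header row starts with Hugo_Symbol.
                    if line.startswith("Hugo_Symbol"):
                        if not header_written:
                            out.write(line + "\n")
                            header_written = True
                        continue
                    out.write(line + "\n")
                    rows += 1
    return rows
```

A call such as merge_maf_files(sorted(glob.glob("BLCA/*.maf.txt")), "BLCA_merged.maf.txt"), after importing glob, would collect every per-patient file in a hypothetical BLCA folder into one file. The sketch does not check that all input files share the same columns, so inspect the merged output before loading it.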