Tagger

Introduction

We have developed a tagging strategy that combines the simplicity of pairwise methods with the potential efficiency of multimarker approaches. We avoid overfitting and unbounded haplotype tests in the association phase by (a) using only those multiallelic combinations in which the alleles are themselves in strong LD, and (b) explicitly recording the allelic hypotheses that are to be tested in the subsequent association analysis. Attractive practical features include the ability to force in or exclude sets of tags.

Haploview's Tagger is based on Paul de Bakker's Tagger. The original Tagger and more information about it are available at the Tagger website. The two implementations are built around the same concept, but there are a number of differences between them. Tagger currently searches a much broader space of available multi-marker tests (up to 6-mers), whereas Haploview allows only 2- or 3-marker tests in the interest of computational efficiency.

Features

Haploview's Tagger operates in either pairwise or aggressive mode. In either case it begins by selecting a minimal set of markers such that every allele to be captured is correlated, at an r2 greater than a user-editable threshold, with a marker in that set. Certain markers can be forced into the tag list or explicitly prohibited from being chosen as tags. You can also specify which markers in the dataset you want to be captured.
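As a rough illustration of the pairwise step, the selection can be sketched as a greedy set cover in Python. This is a minimal sketch and not Haploview's actual implementation: the function name, the dictionary-based r2 lookup, the use of >= when comparing against the threshold, the tie-breaking order, and the example marker names and r2 values are all assumptions made for illustration.

```python
def greedy_pairwise_tags(markers, r2, threshold, include=(), exclude=(), capture=None):
    """Pick tags greedily until every marker to be captured is covered.

    markers   -- ordered list of marker names (hypothetical names in the example)
    r2        -- dict mapping frozenset({a, b}) to the pairwise r2 of a and b
    threshold -- minimum r2 for a tag to count as capturing a marker
    include   -- markers forced into the tag list
    exclude   -- markers that may never be chosen as tags
    capture   -- markers that must be captured (defaults to all markers)
    """
    def pair_r2(a, b):
        # A marker is perfectly correlated with itself; unknown pairs count as 0.
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    def captured_by(tag, pool):
        return {m for m in pool if pair_r2(tag, m) >= threshold}

    excluded = set(exclude)
    candidates = [m for m in markers if m not in excluded]
    tags = list(include)
    uncaptured = set(markers if capture is None else capture)

    # Forced-in tags capture whatever they can before any selection happens.
    for tag in tags:
        uncaptured -= captured_by(tag, uncaptured)

    while uncaptured and candidates:
        # Choose the candidate that captures the most still-uncaptured markers.
        best = max(candidates, key=lambda c: len(captured_by(c, uncaptured)))
        gained = captured_by(best, uncaptured)
        if not gained:
            break  # the remaining markers cannot be captured by any allowed tag
        tags.append(best)
        uncaptured -= gained

    return tags, uncaptured


# Hypothetical example data.
markers = ["rs1", "rs2", "rs3", "rs4"]
r2 = {
    frozenset(("rs1", "rs2")): 0.90,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs3", "rs4")): 0.30,
}

print(greedy_pairwise_tags(markers, r2, 0.8))                   # (['rs2', 'rs4'], set())
print(greedy_pairwise_tags(markers, r2, 0.8, exclude=["rs4"]))  # (['rs2'], {'rs4'})
```

In the second call, rs4 is excluded and no other marker reaches the threshold with it, so it is left uncaptured by the pairwise step.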

Aggressive tagging introduces two additional steps. The first is to try to capture SNPs which could not be captured in the pairwise step (N.B. these must have been "excluded", since otherwise they would simply be chosen to capture themselves) using multi-marker tests constructed from the set of markers chosen as pairwise tags. After this, it tries to "peel back" the tag list by replacing certain tags with multi-marker tests. Tagger avoids overfitting by only constructing multi-marker tests from SNPs which are in strong LD with each other, as measured by a pairwise LOD score. The LOD cutoff can be adjusted to loosen or tighten this requirement; in general, the default cutoff of 3.0 is appropriate for selecting tags from a HapMap-sized reference panel of 120 chromosomes.
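The LOD requirement on multi-marker tests can be pictured with a short Python sketch. It assumes that "in strong LD with each other" means every pair of SNPs in a candidate test meets the cutoff, and that the comparison is >=; both are assumptions, as are the function name, the dictionary layout, and the example LOD values. Only the default cutoff of 3.0 and the 2- or 3-marker limit come from the text above.

```python
from itertools import combinations


def candidate_multimarker_tests(tags, lod, lod_cutoff=3.0, max_size=3):
    """List 2- to max_size-marker combinations of tags whose SNPs are all in strong LD.

    tags -- markers already chosen as pairwise tags
    lod  -- dict mapping frozenset({a, b}) to the pairwise LOD score of a and b
    """
    def pair_lod(a, b):
        # Pairs without a recorded score are treated as having no LD evidence.
        return lod.get(frozenset((a, b)), 0.0)

    tests = []
    for size in range(2, max_size + 1):
        for combo in combinations(tags, size):
            # Keep the combination only if every pair inside it passes the cutoff.
            if all(pair_lod(a, b) >= lod_cutoff for a, b in combinations(combo, 2)):
                tests.append(combo)
    return tests


# Hypothetical LOD scores between three pairwise tags.
lod = {
    frozenset(("rs1", "rs2")): 5.2,
    frozenset(("rs2", "rs3")): 4.1,
    frozenset(("rs1", "rs3")): 1.0,
}

print(candidate_multimarker_tests(["rs1", "rs2", "rs3"], lod))
# [('rs1', 'rs2'), ('rs2', 'rs3')]
```

The 3-marker combination is rejected here because rs1 and rs3 fall below the cutoff; raising or lowering lod_cutoff tightens or loosens the set of allowed tests.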

Much more information about the development of this algorithm is available at the Tagger website.

Tagger Configuration Panel

N.B. Haploview's Tagger requires either an info file or a HapMap-style input file, because it references the marker names specified in those files. If you load a pedigree or phased haplotypes input file without an info file, the Tagger panels will not be available.

This panel shows all SNPs available for tag selection. SNPs which are deselected in the Check Markers tab will not be in this list. There are three checkboxes for each SNP:

 

  1. Force Include

    Checking this box will force this SNP to be chosen as a tag SNP.

  2. Force Exclude

    Checking this box will prohibit this SNP from being chosen as a tag SNP.

  3. Capture this Allele?

    If this box is checked, Haploview will include this SNP in the list of alleles to be captured by the chosen tag set.

N.B. The include and exclude checkboxes are mutually exclusive, and "Capture this Allele?" must be checked in order to either include or exclude a marker.

 

Directly below the marker list are buttons to quickly manipulate the table above. Use "Include All" to check all of the "Force Include" boxes, and "Exclude All" to check all of the "Force Exclude" boxes. "Uncapture All" will uncheck the "Capture this Allele?" column for all markers, "Exclude A/T and C/G SNPs" will check the "Force Exclude" boxes for SNPs with strand issues, and "Reset Table" will return the table to its initial state.

Beneath these buttons are several additional tagging options. You can choose among the pairwise strategy and two aggressive tagging strategies, as discussed above. You can also set the r2 and LOD thresholds as previously mentioned. Additionally, you can specify the maximum number of tags to pick, as well as the minimum distance (in base pairs) between picked tags.

You can load a set of SNPs to include or exclude using the "Load Includes" and "Load Excludes" buttons. These buttons take in a file with a single column of SNPs to include or exclude. The "Alleles to Capture" button also takes in a file with a single column of SNPs to be captured. Design scores can be loaded using the "Design Scores" button. Design score files should contain two columns: the SNP and the design score to assign to that SNP. A minimum design score threshold can also be specified.

All of the Tagger thresholds can be reset to their default values using the "Reset Thresholds" button. Clicking "Run Tagger" will run the tagging algorithm. When it finishes, Haploview switches from the Configuration Panel to the Results Panel.
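To make the two input layouts concrete, here is a small Python sketch that reads a single-column SNP list and a two-column design score file. The text above only specifies the column counts, so the whitespace delimiter, the absence of a header line, the function names, and the SNP names and scores are all assumptions. The final filter is just one plausible way to apply a minimum design score, not a description of what Haploview does internally.

```python
import io


def read_snp_list(handle):
    """Read one SNP name per line, skipping blank lines."""
    return [line.strip() for line in handle if line.strip()]


def read_design_scores(handle):
    """Read 'SNP score' pairs, one per line, into a dict."""
    scores = {}
    for line in handle:
        fields = line.split()  # assumed delimiter: any whitespace
        if not fields:
            continue
        scores[fields[0]] = float(fields[1])
    return scores


# Hypothetical file contents, wrapped in StringIO so the example is self-contained.
includes_file = io.StringIO("rs1001\nrs1002\n\nrs1003\n")
design_file = io.StringIO("rs1001 0.95\nrs1002 0.40\nrs1003 1.10\n")

includes = read_snp_list(includes_file)
scores = read_design_scores(design_file)

# Assumed use of the threshold: keep only SNPs scoring at or above it.
min_design_score = 0.6
usable = [snp for snp in includes if scores.get(snp, 0.0) >= min_design_score]

print(includes)  # ['rs1001', 'rs1002', 'rs1003']
print(usable)    # ['rs1001', 'rs1003']
```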

Tagger Results Panel

This panel is split into a "Tests" section on the left and a marker-by-marker report on the right. The marker report lists all SNPs, the test which best captures them, and their r2 with that test. SNPs which were unchecked in the "Capture this Allele?" column on the Configuration Panel are greyed out. SNPs which could not be successfully tagged are shown in red. The first list in the "Tests" section shows all the tests (both single-marker and multi-marker alleles) chosen by Haploview. Selecting tests in this list will show which alleles are captured by those tests in the second list in the panel. Beneath these lists is a summary of the tagging results.

  • Captured N alleles with mean r2 of X.

    This shows how many of the SNPs in the dataset have been successfully tagged by the set of chosen tests. The mean r2 represents the mean for only those SNPs successfully captured.

  • Captured N percent of alleles with r2 >0.8

    This shows what fraction of the alleles captured by the tests have an r2 >= 0.8. Of course, if your tagging r2 threshold is >= 0.8 this value will always be 100%.

  • Using N SNPs in M tests.

    This shows that N unique SNPs have been chosen to create M tests, each of which is either one of the N SNPs or some combination of those SNPs.
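The first two summary lines can be reproduced from the per-marker report with a few lines of Python. This is a sketch under stated assumptions: the function name and input layout are hypothetical, a marker is treated as captured when its best r2 is at or above the tagging threshold, and the example values are made up.

```python
def summarize_tagging(best_r2, tagging_threshold, report_cutoff=0.8):
    """Return (captured count, mean r2 of captured, percent of captured at or above cutoff).

    best_r2 -- dict mapping each marker to the r2 with its best test
    """
    captured = [value for value in best_r2.values() if value >= tagging_threshold]
    count = len(captured)
    if count == 0:
        return 0, 0.0, 0.0
    # The mean is taken over captured markers only, as described above.
    mean_r2 = sum(captured) / count
    percent_high = 100.0 * sum(value >= report_cutoff for value in captured) / count
    return count, mean_r2, percent_high


# Hypothetical per-marker results with a tagging threshold of 0.5.
best_r2 = {"rs1": 1.0, "rs2": 0.9, "rs3": 0.6, "rs4": 0.2}
count, mean_r2, percent_high = summarize_tagging(best_r2, tagging_threshold=0.5)
print(f"Captured {count} alleles with mean r2 of {mean_r2:.3f}.")
print(f"Captured {percent_high:.1f} percent of alleles with r2 >= 0.8")
```

With these made-up values, three markers are captured and two of them reach 0.8. If the tagging threshold itself were 0.8 or higher, every captured marker would pass the cutoff and the percentage would be 100.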

The "Dump Tests File" button exports the list of tests in the same format that Haploview uses for custom association test files and that Tagger exports. This file contains all tests (single SNPs and multi-marker tests) selected by Tagger for subsequent association analysis. In pairwise-only tagging this file will be identical to the "Tags" file, below.

The "Dump Tags File" button exports a file with the list of Tag SNPs in the format used by Haploview's custom association test file and Tagger's export. It is the concise list of SNPs selected by Tagger for genotyping. In pairwise-only tagging this file will be identical to the "Tests" file, above.

The Tagger text output begins with several pieces of summary information. More details on this can be found in the Tagger section. The rest of the output is divided into two sections. The first lists each marker, with the following columns:

  1. Marker is the marker name.
  2. Best Test is the test with the highest r2 to this marker.
  3. r^2 w/test is the r2 between this marker and its test.

The second part consists of a list of the tests and the alleles they capture best.

The "Export Tab to Text" option in the File menu will export a summary file showing the best tag for each marker and the list of tests along with the alleles tagged by each test.