Documentation

The SNAP search service allows you to find proxy SNPs based on linkage disequilibrium, physical distance and membership in selected commercial genotyping arrays. SNAP can also generate plots showing regional linkage disequilibrium or regional association.

SNAP Data Sources

Linkage disequilibrium data is calculated using Haploview 4.0, based on phased genotype data from the International HapMap Project and the 1000 Genomes Project. See SNP Data Set below for more information.

Genome coordinates, genetic mapping data and recombination rates are from the International HapMap Project or from the 1000 Genomes Project.

Information on the composition of the listed commercial arrays is derived from data published by the array vendors.

SNP and gene annotations are from GeneCruiser. The GeneCruiser annotations are based on the Ensembl database. Ensembl automatically annotates and indexes data from several sources such as Entrez, RefSeq, EMBL, Uniprot/SWISS-PROT, Affymetrix and Gene Ontology (GO).

Data for the gene tracks on SNAP plots are taken from the RefSeq track of the UCSC Genome Browser.

Proxy Search

The proxy search function allows you to find proxy SNPs based on a variety of criteria, including linkage disequilibrium, physical distance and whether the query and/or proxy SNP is present on selected commercial genotyping arrays.

Proxy Search Inputs

The proxy search function has the following inputs.

Input SNPs

Choose the method you would like to use to select query SNPs.

File

Allows you to upload a file containing a list of SNP names (rs numbers).

By default, a file is expected that contains a list of SNP names, one per line. If the file is formatted into columns (delimited by white space), you can specify the column that contains the SNP names and also whether there is a header line in the file that should be ignored.
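As an illustration of the file layout described above, here is a minimal Python sketch that extracts SNP names from whitespace-delimited lines. The function name `read_snp_names`, its parameters and the rs numbers are hypothetical and are not part of SNAP; the sketch only mirrors the column and header options described here.

```python
def read_snp_names(lines, column=1, has_header=False):
    """Return SNP names from whitespace-delimited lines.

    column is 1-based; has_header skips the first non-empty line.
    Both options are illustrative stand-ins for the upload form settings.
    """
    rows = [line.split() for line in lines if line.strip()]
    if has_header:
        rows = rows[1:]
    # Keep only rows that actually have the requested column.
    return [fields[column - 1] for fields in rows if len(fields) >= column]


# Placeholder rs numbers, not real query results.
example = ["SNP P", "rs123 0.01", "", "rs456 0.5"]
snps = read_snp_names(example, column=1, has_header=True)
print(snps)
```

Blank lines are skipped, so a trailing newline at the end of the file does not produce an empty SNP name.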

Text Input

Allows you to type in SNP names or paste them into a box, one SNP name per line.

Genomic Locus

Allows you to select query SNPs by selecting a chromosome and start/end coordinates.

If you omit the start or end coordinate, it defaults to the corresponding end of the chromosome. There is a 1 megabase limit to queries by genomic locus.

SNP Data Set

The linkage disequilibrium data is based on the SNP data set and panel chosen.

HapMap Release 21

HapMap Release 21 uses SNP data from HapMap phase II and has phased genotype data for CEU, YRI and JPT+CHB HapMap panels. When using this data set, inter-SNP distances are measured in hg17 coordinates.

HapMap Release 22

HapMap Release 22 uses SNP data from HapMap phase II and has phased genotype data for CEU, YRI and JPT+CHB HapMap panels. When using this data set, inter-SNP distances are measured in hg18 coordinates.

Warning: HapMap Release 22 does not include any phased genotype data for chrX.
As a result, no proxies will be returned for SNPs on chrX when using this data set.

HapMap 3 (release 2)

HapMap 3 has phased genotype data for all HapMap panels listed below. HapMap3 (release 2) includes only SNPs genotyped in HapMap phase III release 2. It does not include SNPs genotyped in HapMap phases I or II. When using this data set, inter-SNP distances are measured in hg18 coordinates. When using this data set, the recombination rate data used is from HapMap Release 22 and so is missing on some HapMap 3 SNPs.

Warning: HapMap 3 (release 2) does not include any phased genotype data for chrX.
As a result, no proxies will be returned for SNPs on chrX when using this data set.

1000 Genomes Pilot 1

This data set uses sequence-based SNP genotype data from the low-coverage sequencing pilot (Pilot 1) of the 1000 Genomes Project. This data set uses phased genotypes for 179 individuals from the HapMap CEU, YRI and JPT+CHB panels. Inter-SNP distances are measured in hg18 coordinates.

SNAP does not calculate LD data for chrY, so proxy searches for SNPs on the Y chromosome will never return any results.

Population Panel

The linkage disequilibrium between SNPs is calculated separately for each of the following panels in each SNP data set:

ASW African ancestry in Southwest USA
CEU Utah residents with Northern and Western European ancestry from the CEPH collection
CEU+TSI Combined panel of Utah residents with Northern and Western European ancestry from the CEPH collection and Toscans in Italy
CHD Chinese in Metropolitan Denver, Colorado
GIH Gujarati Indians in Houston, Texas
JPT+CHB Combined panel of Japanese in Tokyo, Japan and Han Chinese in Beijing, China
JPT+CHB+CHD Combined panel of Japanese in Tokyo, Japan, Han Chinese in Beijing, China and Chinese in Metropolitan Denver, Colorado
LWK Luhya in Webuye, Kenya
MEX Mexican ancestry in Los Angeles, California
MKK Maasai in Kinyawa, Kenya
TSI Toscans in Italy
YRI Yoruba in Ibadan, Nigeria

r2 Threshold

Sets a lower bound on the linkage disequilibrium between the query SNP and each corresponding proxy SNP. Proxies with a computed r2 value less than the given threshold will not be returned.

Selecting No limit will return data records for all available proxies as reported by Haploview.

Distance Limit

The maximum distance in kilobases between the query SNP and the proxy SNP. SNAP only calculates and stores linkage disequilibrium data for markers within 500Kb of each other, so no proxies farther than 500Kb from the query SNP will be returned.

Distances are based on the HapMap annotations for each SNP and may be different in different SNP data sets. HapMap Release 21 uses hg17 coordinates; the other three data sets (HapMap Release 22, HapMap 3 release 2 and 1000 Genomes Pilot 1) use hg18 coordinates.

If you select a distance of zero and also check Include each query SNP as a proxy for itself then data records will be returned only for the query SNPs (if they pass other filters). This can be used, for example, to determine which arrays contain a given set of SNPs.
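The combined effect of the r2 threshold and the distance limit can be pictured as a simple filter. The sketch below is an illustration only: the function name `passes_filters` and the default values are assumptions, and whether SNAP treats the limits as inclusive is not stated in this document. It assumes the distance is given in bases, as in the Distance output column, while the limit is given in kilobases.

```python
def passes_filters(r2, distance_bases, r2_threshold=0.8, distance_limit_kb=500):
    """Return True if a proxy would pass both filters.

    r2_threshold=None models the "No limit" choice.
    Defaults are illustrative, not SNAP's documented defaults.
    """
    if r2_threshold is not None and r2 < r2_threshold:
        return False
    # The limit is entered in kilobases; distances are compared in bases.
    return abs(distance_bases) <= distance_limit_kb * 1000


print(passes_filters(0.9, 250_000))   # within both limits
print(passes_filters(1.0, 500_001))   # beyond the 500 kb limit
```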

Download To

If you select File, then your browser should offer to save the search results to a file on your local computer. If you select Browser, then the search results will be displayed in your browser window. Currently, the output in both cases is tab delimited text, so columns may not line up when displayed in your browser.

Include each query SNP as a proxy for itself

If checked, then an extra data record is returned for each query SNP listing itself as proxy. This data record is distinguished by a distance field of zero.

In this data record, the SNP field will list the query SNP while the Proxy field will list the HapMap rs# that corresponds to the query SNP in the specified SNP data set. These values may be different due to merging of rs numbers in different dbSNP releases.

Suppress warning messages in output

By default, the search results will contain at least one output record for every query SNP. If no proxies are found that meet the specified criteria, a warning message is included in the output. The SNP field of the warning record lists the query SNP and the rest of the line is free text describing the problem.

Checking this box suppresses these warning messages. In this case, if no proxy is found then the query SNP may not appear in the output at all.

Filter By Array

Use the check boxes to select one or more commercial arrays. Only SNPs that are present on one of the selected arrays will be returned.

The check boxes at the top of the columns can be used to quickly check or uncheck all arrays by that vendor (Illumina or Affymetrix). The select all and unselect all links can similarly be used to quickly check or uncheck all of the boxes.

The array filter can be applied to both the query SNPs and the proxy SNPs, or just to the proxy SNPs. If the filter is applied to both, then query SNPs not on the selected arrays will be skipped and no proxies will be returned for these query SNPs. If the filter is applied to proxy SNPs only, then proxy SNPs that are not on one of the selected arrays will not be returned.
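The two modes of the array filter can be sketched as follows. This is a toy model under stated assumptions: the function `apply_array_filter`, the `filter_query` flag and the sample data are hypothetical, and SNAP's internal implementation is not described in this document.

```python
def apply_array_filter(pairs, arrays_by_snp, selected, filter_query=False):
    """Filter (query, proxy) pairs by array membership.

    pairs         -- list of (query_snp, proxy_snp) tuples
    arrays_by_snp -- dict mapping SNP name to a collection of array codes
    selected      -- array codes ticked in the form
    filter_query  -- True applies the filter to query SNPs as well
    """
    selected = set(selected)

    def on_selected(snp):
        return bool(selected & set(arrays_by_snp.get(snp, ())))

    kept = []
    for query, proxy in pairs:
        if filter_query and not on_selected(query):
            continue  # query SNP skipped entirely, no proxies returned
        if on_selected(proxy):
            kept.append((query, proxy))
    return kept


# Placeholder SNP names and memberships.
arrays = {"rsQ1": ["A6"], "rsP1": ["I5"], "rsP2": ["A6"], "rsQ2": []}
pairs = [("rsQ1", "rsP1"), ("rsQ1", "rsP2"), ("rsQ2", "rsP2")]
print(apply_array_filter(pairs, arrays, ["A6"]))
print(apply_array_filter(pairs, arrays, ["A6"], filter_query=True))
```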

Output Columns

Use the check boxes to select additional columns that can be displayed. The values for each column are described in detail under Search Output.

Checking Associated Gene Annotations from GeneCruiser will add columns indicating any genes associated with this SNP and the type of the SNP (e.g. non-synonymous, intronic).

Gene annotations are retrieved in real time from the GeneCruiser server. Queries that retrieve gene annotations run slower than queries that do not.

Search Output

Proxy search results are returned as a tab delimited file, one data record per line. The output fields are described below.

Each search result contains the query SNP in the first column. The second column is normally an associated proxy SNP. When problems arise, however, the second column may be the text WARNING or ERROR, in which case the rest of the line is free text that describes the problem.
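When post-processing a downloaded result file, the WARNING and ERROR records need to be separated from the data records. The following Python sketch shows one way to do that. The function name `split_records`, the sample rows and the warning text are placeholders, and any header line is assumed to have been removed already.

```python
def split_records(lines):
    """Split tab-delimited result lines into data rows and problem rows."""
    data, problems = [], []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        if fields[1] in ("WARNING", "ERROR"):
            # The rest of the line is free text describing the problem.
            problems.append((fields[0], fields[1], " ".join(fields[2:])))
        else:
            data.append(fields)
    return data, problems


# Placeholder rows; the message text is invented for illustration.
sample = [
    "rs0000001\trs0000002\t1200\t0.95\t1.0",
    "rs0000003\tWARNING\tNo matching proxy found",
]
data, problems = split_records(sample)
print(len(data), len(problems))
```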

The search results are always returned in the same order as the query SNPs. For a given query SNP, the returned proxy SNPs are sorted in descending order of r2, then in order of increasing distance from the query SNP.
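The ordering of proxies for a single query SNP corresponds to a two-part sort key, shown here as a small Python sketch. The tuples and SNP names are placeholders.

```python
# Each tuple is (proxy name, r2, distance in bases); values are made up.
proxies = [("rsB", 0.8, 1200), ("rsC", 1.0, 5000), ("rsD", 1.0, 300)]

# Descending r2 first, then increasing distance from the query SNP.
ordered = sorted(proxies, key=lambda p: (-p[1], p[2]))
print([name for name, _, _ in ordered])
```

Here rsD and rsC tie on r2, so the closer SNP (rsD) comes first, followed by rsC and then rsB.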

Columns

SNP

The query SNP exactly as input.

Proxy

The name of the proxy SNP.

The name used for the proxy SNP is always the rs number used in the specified SNP data set. This may be different from the most current rs number for the SNP, and the rs number used for a particular SNP could be different in different SNP data sets.

This column may also contain the text WARNING or ERROR if this is an error record.

Distance

The distance (in bases) between the query SNP and the proxy SNP.

Distances are derived from the HapMap metadata for the specified SNP data set. Release 21 uses hg17 coordinates. Release 22, HapMap 3 release 2 and 1000 Genomes Pilot 1 use hg18 coordinates.

RSquared

(r2) Measure of linkage disequilibrium (correlation coefficient).

DPrime

(D') Measure of linkage disequilibrium (normalized for allele frequency).
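For readers unfamiliar with the two measures, the sketch below computes r2 and |D'| from haplotype and allele frequencies using the standard textbook definitions. It is not a description of Haploview's estimation procedure, and the function name `ld_stats` and its arguments are assumptions made for illustration.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, |D'|) for two biallelic SNPs.

    p_ab -- frequency of the haplotype carrying allele A and allele B
    p_a  -- frequency of allele A at the first SNP
    p_b  -- frequency of allele B at the second SNP
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


r2, d_prime = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.25)
print(round(r2, 4), round(d_prime, 4))
```

In this example the B allele only ever occurs with A, so |D'| is 1, but r2 is only about 0.33 because the allele frequencies differ. This is why a high D' does not by itself make a SNP a good proxy.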

Arrays

Comma-separated list of the commercial arrays that contain this SNP.

The array abbreviations are listed in the table below.

Chromosome

The chromosome that contains the SNP, using mapping data from HapMap.

Coordinate

The base position of the SNP on the chromosome, using mapping data from HapMap.

The column heading also specifies the genome assembly used for the selected SNP data set.

RecombinationRate

Recombination rate in centimorgans per million bases.

GeneticMapDistance

Distance from the query SNP to the proxy SNP in centimorgans.

GeneticMapPosition

Position of the SNP on the genetic map for this chromosome in centimorgans.

GeneVariant

Classifies the effects of this SNP on related genes.

If the SNP may have multiple effects (either due to relationships to different genes or to different isoforms of the same gene), then all annotated classifications are listed, separated by commas.

GeneName

The name of any associated gene.

HUGO gene identifiers are used, if known, otherwise Ensembl stable identifiers are used.

If the SNP is related to multiple genes, the gene names are all listed, separated by commas.

GeneDescription

Free text description of the associated gene.

If the SNP is related to multiple genes, all descriptions are listed, separated by commas.

Sources for each gene description are included with the description text (typically Uniprot/SWISSPROT).

Array Codes

The following table lists all of the commercial arrays currently known by SNAP, along with the two- or three-character array code used by SNAP for each array.

Code  Vendor      Description
AX    Affymetrix  Mapping50K XbaI
AH    Affymetrix  Mapping50K HindIII
AG    Affymetrix  50K Human Gene Focused chip
AN    Affymetrix  Mapping250K NspI
AS    Affymetrix  Mapping250K StyI
A5    Affymetrix  Affy 5.0
A6    Affymetrix  Affy 6.0
AD    Affymetrix  Affymetrix DMET plus
AxM   Affymetrix  Affymetrix Axiom
I1    Illumina    Human-1
I2    Illumina    HumanHap240
I3    Illumina    HumanHap300
IC    Illumina    HumanHap370CNV (single)
ICQ   Illumina    HumanHap370CNV (quad)
I5    Illumina    HumanHap550
I6    Illumina    HumanHap650
I6Q   Illumina    HumanHap610 (quad)
IM    Illumina    Human1M
IMD   Illumina    Human1M (duo)
IBC   Illumina    CARe iSelect Array
CYT   Illumina    Cyto12
OQ    Illumina    OmniQuad
CM    Illumina    Cardio-Metabochip
IWQ   Illumina    Illumina 660W-Quad
OE    Illumina    Illumina OmniExpress
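To expand the Arrays output column into readable names, the codes listed above can be held in a lookup table. The sketch below is one possible way to do this in Python; the names `ARRAY_CODES` and `describe_arrays` are illustrative and not part of SNAP.

```python
# Code -> (vendor, description), copied from the array code table.
ARRAY_CODES = {
    "AX": ("Affymetrix", "Mapping50K XbaI"),
    "AH": ("Affymetrix", "Mapping50K HindIII"),
    "AG": ("Affymetrix", "50K Human Gene Focused chip"),
    "AN": ("Affymetrix", "Mapping250K NspI"),
    "AS": ("Affymetrix", "Mapping250K StyI"),
    "A5": ("Affymetrix", "Affy 5.0"),
    "A6": ("Affymetrix", "Affy 6.0"),
    "AD": ("Affymetrix", "Affymetrix DMET plus"),
    "AxM": ("Affymetrix", "Affymetrix Axiom"),
    "I1": ("Illumina", "Human-1"),
    "I2": ("Illumina", "HumanHap240"),
    "I3": ("Illumina", "HumanHap300"),
    "IC": ("Illumina", "HumanHap370CNV (single)"),
    "ICQ": ("Illumina", "HumanHap370CNV (quad)"),
    "I5": ("Illumina", "HumanHap550"),
    "I6": ("Illumina", "HumanHap650"),
    "I6Q": ("Illumina", "HumanHap610 (quad)"),
    "IM": ("Illumina", "Human1M"),
    "IMD": ("Illumina", "Human1M (duo)"),
    "IBC": ("Illumina", "CARe iSelect Array"),
    "CYT": ("Illumina", "Cyto12"),
    "OQ": ("Illumina", "OmniQuad"),
    "CM": ("Illumina", "Cardio-Metabochip"),
    "IWQ": ("Illumina", "Illumina 660W-Quad"),
    "OE": ("Illumina", "Illumina OmniExpress"),
}


def describe_arrays(arrays_field):
    """Expand a comma-separated Arrays value into (code, vendor, description)."""
    codes = [code.strip() for code in arrays_field.split(",") if code.strip()]
    return [(code,) + ARRAY_CODES.get(code, ("unknown", "unknown")) for code in codes]


print(describe_arrays("A6,I5"))
```

Codes that are not in the table are reported as unknown rather than raising an error, which keeps the sketch usable if SNAP adds arrays later.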

Pairwise LD

The Pairwise LD function in SNAP allows you to quickly ascertain the linkage disequilibrium between two SNPs or to find which SNPs in an input set are in significant LD with each other.

To find the computed LD between two SNPs, enter them in the text box or upload a file containing the names of the two SNPs.

If you enter more than two SNPs, then SNAP will perform an "all-vs-all" search and produce an output record for every pair of SNPs in the query set that have detectable LD and meet the other search criteria (r2 threshold, distance, array membership, etc.).
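To get a feel for how quickly an all-vs-all comparison grows, the candidate pairs can be enumerated as below. This only counts the pairs that would be considered (n(n-1)/2 for n SNPs); how many of them SNAP actually reports depends on the LD data and the search criteria. The SNP names are placeholders.

```python
from itertools import combinations

snps = ["rsA", "rsB", "rsC", "rsD"]

# Every unordered pair of distinct query SNPs.
pairs = list(combinations(snps, 2))
print(len(pairs), pairs[0])
```

Four SNPs give six candidate pairs; one hundred SNPs would give 4950.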

The search options and outputs for the Pairwise LD search are otherwise the same as the SNAP Proxy Search, with two exceptions:

  • By default, no row is returned for the query SNP itself.
  • Warning messages are suppressed by default.

These settings reduce the output when comparing many SNPs, but you can override these defaults.

Plots

SNAP can generate two kinds of plots: regional LD plots and regional association plots.

Regional LD plots give a visual representation of the same kinds of information provided by the SNAP Proxy Search function. The left-hand y-axis shows values for r2.

Regional association plots allow you to upload and plot p-values for SNPs, overlaid with data from SNAP such as r2 and recombination rate. For association plots, the left-hand y-axis shows -log10P.
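The -log10P transformation used on the y-axis of association plots is shown below as a short Python sketch. The function name `neg_log10` is illustrative.

```python
import math


def neg_log10(p_value):
    """Return -log10(p) for a p-value in the interval (0, 1]."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be greater than 0 and at most 1")
    return -math.log10(p_value)


print(neg_log10(1e-8))
```

Smaller p-values map to larger plotted values, so the most strongly associated SNPs appear highest on the plot.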

SNAP plots can be displayed in the browser and can also be downloaded as high quality, publication-ready PDF files.

Regional LD Plots

To generate regional LD plots, enter a set of query SNPs. You will then be able to view plots for each of the query SNPs and to adjust the plot parameters for each SNP on the preview page.

Input SNPs

Choose the method you would like to use to select query SNPs.
Plot preview is currently limited to 100 query SNPs at a time.

File

Allows you to upload a file containing a list of SNP names (rs numbers).

By default, a file is expected that contains a list of SNP names, one per line. If the file is formatted into columns (delimited by white space), you can specify the column that contains the SNP names and also whether there is a header line in the file that should be ignored.

Text Input

Allows you to type in SNP names or paste them into a box, one SNP name per line.

Genomic Locus

Allows you to select query SNPs by selecting a chromosome and start/end coordinates.

If you omit the start or end coordinate, it defaults to the corresponding end of the chromosome. There is a 1 megabase limit to queries by genomic locus.

SNP Data Set

The linkage disequilibrium data is based on the SNP data set and panel chosen. See SNP data set for more details.

Population Panel

The panel selected affects the pairwise linkage disequilibrium calculated between SNPs. Pairwise LD is calculated separately for each panel. See Population Panels for more details. For the HapMap data sets, the background recombination rate is the same across all panels. For the HapMap 3 data set, the HapMap Release 22 recombination rate data is used. The 1000 Genomes Pilot 1 data set uses background recombination rates that are calculated separately for each panel.

r2 Threshold

Controls the display of dashed lines on the plot that indicate the extent of proxy SNPs within the specified r2 value. This can be changed when previewing each individual plot.

Distance Limit

The maximum distance in kilobases between the query SNP and the proxy SNPs that will be shown on the plot. This also controls how much of the genome is shown on the plot (the "zoom").

SNAP only calculates and stores linkage disequilibrium data for markers within 500Kb of each other.

Distances are based on the HapMap annotations for each SNP. They may be different in different SNP data sets and may be based on different genome assemblies in different SNP data sets.

Plot Preview

You can preview LD plots for up to 100 query SNPs at a time. The list on the left allows you to move from SNP to SNP. The controls below the plot allow you to change the plot options for the currently displayed SNP.

After you change plot options, you need to click on Update Plot to regenerate the plot. It may take a few seconds to regenerate the plot.

For high quality, publication-ready plots, click Download PDF to view or save a version of the current plot in Adobe Portable Document Format. You can download a free viewer for PDF files from the Adobe Systems web site.

Regional Association Plots

To generate a regional association plot, you need to prepare a file of association data containing, at a minimum, the names of the SNPs you want to plot and the associated p-value for each SNP.

The data file must contain whitespace-delimited columns and must contain a header line identifying the data in each column.

The columns can be in any order. Some columns are optional: The data in these columns will be used if supplied and will be defaulted otherwise. The data file can contain extra unrecognized columns which will be ignored.

The following input columns are recognized. There is some flexibility in the text allowed for each column title (different capitalization or abbreviations), but the column title needs to be recognizable by SNAP for the data to be used.

Note that column titles in the data file may not contain spaces. Spaces in column titles will cause them to be treated as multiple columns.

SNP

The dbSNP rs number for the SNP (required).

The input SNPs must be HapMap or 1000 Genomes Pilot 1 SNPs unless you also supply position data for the SNP.

PValue

The p-value for the SNP as a floating point number (required).

Alternative column names: PVAL, p-value

Position

The position (coordinate) of this SNP on the chromosome (optional).

If you supply position data for a SNP, then SNAP will use that position. If you do not supply position data, SNAP will use the position data from HapMap for that SNP if it is a HapMap SNP. If no position can be determined, then the SNP will not be plotted.

All SNPs in the input file must reside on the same chromosome and must be within the same locus (currently defined as being within 1Mb).
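The position rules above can be sketched as a small pre-check to run before uploading a file. The function names, the `hapmap_positions` lookup and the sample data are hypothetical; the 1 Mb figure is taken from the text, and treating it as an inclusive limit is an assumption.

```python
def resolve_positions(records, hapmap_positions):
    """Return {snp: position}, preferring supplied positions.

    SNPs with no supplied position and no known HapMap position are
    dropped, mirroring the rule that they will not be plotted.
    """
    plotted = {}
    for record in records:
        position = record.get("Position")
        if position is None:
            position = hapmap_positions.get(record["SNP"])
        if position is not None:
            plotted[record["SNP"]] = int(position)
    return plotted


def span_ok(positions, max_span=1_000_000):
    """Check that all positions fall within one locus of max_span bases."""
    positions = list(positions)
    return max(positions) - min(positions) <= max_span


# Placeholder records and coordinates.
records = [{"SNP": "rs1", "Position": "1000"}, {"SNP": "rs2"}, {"SNP": "rs3"}]
positions = resolve_positions(records, {"rs2": 500_000})
print(positions, span_ok(positions.values()))
```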

Alternative column names: POS

RSquared

Measure of the LD between this SNP and the target SNP (optional).

If you supply an r2 value for a SNP, then SNAP will use that value in the plot. If you do not supply an r2 value, SNAP will use its internally calculated value based on HapMap if both SNPs are HapMap SNPs. Otherwise, SNAP assumes an r2 value of zero.

Alternative column names: RSQR, R-squared, r2

SnpType

Whether this SNP was genotyped or imputed (optional).

Values in this column should be one of the strings typed or imputed.

Alternative column names: TYPE
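To make the file format concrete, the sketch below parses a small association file using the column titles and alternative names listed above, matched case-insensitively. It is an approximation for checking a file before upload: the exact matching rules SNAP applies are not documented here, and the names `ALIASES` and `parse_association` and the sample data are assumptions.

```python
# Lower-cased column title -> canonical column name, from the lists above.
ALIASES = {
    "snp": "SNP",
    "pvalue": "PValue", "pval": "PValue", "p-value": "PValue",
    "position": "Position", "pos": "Position",
    "rsquared": "RSquared", "rsqr": "RSquared",
    "r-squared": "RSquared", "r2": "RSquared",
    "snptype": "SnpType", "type": "SnpType",
}


def parse_association(lines):
    """Parse whitespace-delimited association data with a header line."""
    rows = [line.split() for line in lines if line.strip()]
    header = [ALIASES.get(title.lower()) for title in rows[0]]
    if "SNP" not in header or "PValue" not in header:
        raise ValueError("SNP and PValue columns are required")
    records = []
    for fields in rows[1:]:
        # Unrecognized columns map to None and are ignored.
        record = {name: value for name, value in zip(header, fields) if name}
        record["PValue"] = float(record["PValue"])
        records.append(record)
    return records


# Placeholder data; "extra" is an unrecognized column.
sample = ["snp pval POS extra", "rs1 1e-5 100 x", "rs2 0.5 200 y"]
records = parse_association(sample)
print(records[0])
```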

Target SNP

This SNP is the focus of the association plot. The r2 values are computed from this SNP to all of the other plotted SNPs. The default is to use the input SNP with the lowest p-value, but you can use this to force the focus of the plot to be a different SNP.
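The default choice of target SNP amounts to taking the minimum p-value, with an optional override. A minimal Python sketch, in which the function name `pick_target` and the sample values are illustrative:

```python
def pick_target(pvalues, forced=None):
    """Return the target SNP for the plot.

    pvalues -- dict mapping SNP name to p-value
    forced  -- optional SNP name that overrides the default choice
    """
    if forced is not None:
        if forced not in pvalues:
            raise KeyError(forced)
        return forced
    # Default: the input SNP with the lowest p-value.
    return min(pvalues, key=pvalues.get)


pvalues = {"rs1": 0.02, "rs2": 3e-9, "rs3": 0.4}
print(pick_target(pvalues), pick_target(pvalues, forced="rs3"))
```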

SNP Data Set

Several features on the plots depend on the SNP data set chosen, including the SNP coordinates, the r2 values and the background recombination rates. See SNP Data Set for more details.

Population Panel

Several features on the plots depend on the population panel chosen, including the r2 values. See Population Panels for more details.

Plot Preview

The controls below the plot allow you to adjust the plot options. After you change plot options, you need to click on Update Plot to regenerate the plot. It may take a few seconds to regenerate the plot.

The Download Plot Data link generates a tab-delimited file containing the SNP data used in the plot. This is a combination of data supplied by you in the uploaded association data file and data filled in by SNAP (such as r2 values and SNP coordinates).

For high quality, publication-ready plots, click Download PDF to view or save a version of the current plot in Adobe Portable Document Format. You can download a free viewer for PDF files from the Adobe Systems web site.

Map SNP IDs

A recurring problem is the need to determine whether two lists of SNPs overlap (for example, hits from two different studies). This problem is nettlesome because dbSNP IDs (rs numbers) are often merged in later versions of dbSNP, resulting in many different aliases for the same SNP and different preferred rs numbers for the same SNP over time.

SNAP provides a utility to map SNP IDs between different versions of dbSNP and to report on all of the known rs number aliases.

Mapping Methodology

Given an rs number and a target build of dbSNP, the Map SNP IDs utility provides two pieces of data:

  • The preferred rs number for the input SNP in the target dbSNP build.
  • All other aliases for the input SNP in the target dbSNP build.

The default behavior is to map the input SNPs to the most recent dbSNP build. This allows you to easily determine whether two SNPs are the same or to ascertain the overlap between two lists of SNPs.
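Once each input SNP has been mapped to its preferred rs number, finding the overlap between two lists reduces to a set intersection. The sketch below assumes a `preferred` dictionary built from the Map SNP IDs output; the function name `overlap` and the rs numbers are placeholders.

```python
def overlap(list_a, list_b, preferred):
    """Return the preferred rs numbers shared by two SNP lists.

    preferred -- dict mapping an alias to its preferred rs number;
                 SNPs missing from the dict are assumed to be preferred already.
    """
    def canonical(rs):
        return preferred.get(rs, rs)

    return {canonical(rs) for rs in list_a} & {canonical(rs) for rs in list_b}


# rs11 is an alias of rs10 in this made-up example.
preferred = {"rs11": "rs10"}
shared = overlap(["rs11", "rs50"], ["rs10", "rs60"], preferred)
print(shared)
```

A plain string comparison of the two lists would miss this match, because the same SNP appears under different rs numbers.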

Mapping input SNPs to earlier versions of dbSNP can be useful in other cases. For example, if you are querying a database that is based on an earlier build of dbSNP, your query may fail if you are not using the preferred rs number (or perhaps an alias) from that particular build of dbSNP.

Mapping Newer SNP IDs

If you want to map an rs number to a non-current build of dbSNP, it is possible that the rs number did not exist at the time of the earlier dbSNP build. In this case, SNAP will try to determine whether there are rs numbers in the earlier build that are now aliases for the input rs number. The algorithm used is as follows:

  1. Map the input rsID to the latest dbSNP build.
  2. Find every rsID that is an alias of the input rsID in the latest build and that also existed in the target build. The target build alias set is the union of the target build aliases of all of these rsIDs.
  3. The target build preferred aliases are that subset of the target build alias set that are preferred aliases (for some SNP) in the target build.

Note that when a newer SNP is mapped to an earlier build, there may be more than one preferred alias in the earlier build. This can happen when two rs numbers are later merged into a cluster that contains the input rs number. In this case, the preferred alias in the earlier build is ambiguous and SNAP will report more than one preferred alias.
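The three steps above, and the ambiguity just described, can be traced on a toy example. The sketch below models each dbSNP build as a dictionary from rs number to its preferred rs number in that build. This data model, the function name `map_to_target` and the rs numbers are assumptions for illustration and say nothing about how SNAP stores the data.

```python
def map_to_target(rs, current, target):
    """Map rs to a target build; return (preferred aliases, other aliases).

    current -- {rsID: preferred rsID} for the latest build
    target  -- {rsID: preferred rsID} for the target build
               (contains only rsIDs that existed in that build)
    """
    # Step 1: find the cluster of the input rsID in the latest build.
    preferred_now = current[rs]
    current_aliases = {r for r, p in current.items() if p == preferred_now}

    # Step 2: union of target-build aliases of every current alias
    # that already existed in the target build.
    alias_set = set()
    for alias in current_aliases:
        if alias in target:
            alias_set |= {r for r, p in target.items() if p == target[alias]}

    # Step 3: keep the aliases that are preferred in the target build.
    preferred = {a for a in alias_set if target[a] == a}
    return preferred, alias_set - preferred


# Target build: rs11 is an alias of rs10; rs20 is a separate SNP.
target = {"rs10": "rs10", "rs11": "rs10", "rs20": "rs20"}
# Latest build: everything, including the new rs30, has merged into rs10.
current = {"rs10": "rs10", "rs11": "rs10", "rs20": "rs10", "rs30": "rs10"}

preferred, others = map_to_target("rs30", current, target)
print(sorted(preferred), sorted(others))
```

Because rs10 and rs20 were separate SNPs in the target build but later merged into the cluster containing rs30, both are reported as preferred aliases, with rs11 as the remaining alias.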

Input SNPs

You can choose one of two methods to provide the list of input SNPs.

File

Allows you to upload a file containing a list of SNP names (rs numbers).

By default, a file is expected that contains a list of SNP names, one per line. If the file is formatted into columns (delimited by white space), you can specify the column that contains the SNP names and also whether there is a header line in the file that should be ignored.

Text Input

Allows you to type in SNP names or paste them into a box, one SNP name per line.

Map To Build

Select the dbSNP build that you want to map the input SNPs to. The default is the most recent build of dbSNP.

If you check this option, then no output is produced for any input SNPs where the preferred rs number is the same as the input rs number. This reduces the amount of output in the common case where the input SNP has no aliases or you are already using the preferred alias.

By default, if the target dbSNP build is not the latest, then for rs numbers that did not yet exist at the time of the target build SNAP will determine all rs numbers from the target build that cluster with the input rs number in the latest build of dbSNP. See Mapping Newer SNP IDs for details.

If you do not want this behavior, then you can uncheck this option. If you uncheck this option, then an input SNP that is newer than the target build will always report no aliases and will list None for the preferred rs number.

Download To

If you select File, then your browser should offer to save the search results to a file on your local computer. If you select Browser, then the search results will be displayed in your browser window. Currently, the output in both cases is tab delimited text, so columns may not line up when displayed in your browser.