IGV for iPad: Hosted Genomes & Public Data Tracks

This page explains how to select a hosted genome and how to load and view the public datasets available through the Tracks menu. It also gives brief descriptions of each dataset and links to more information.

Select a Reference Genome

IGV displays data in the context of a reference genome. Before loading any data tracks, you must select the reference genome. To switch between genomes:

  1. Tap on the Genomes button in the upper left corner of the screen.
  2. Select a genome from the list.
  3. Select a chromosome from the list. This will be the initial viewing locus.

  • Selecting a reference genome will automatically load the corresponding Genes annotation track.
  • Switching genomes removes all previously loaded tracks. Switching chromosomes within the same genome, using the menu, preserves the displayed tracks.
  • For preparing and using other genomes, refer to User-defined Reference Genomes.

Load Data

Tap the Tracks button on the toolbar, then select Public Tracks. The primary Public Tracks menu options differ by genome, as summarized under Public Data Tracks below, because each dataset is mapped to the coordinates of a specific genome release. Each primary menu item slides out a nested menu or, for hg19 and mm9 ENCODE data, a full-page view of options. Drill down through the lists and toggle the sliders to green to display data tracks, either one at a time or several sets at once, depending on the dataset.

  • You may need to scroll down to see all items on a list. 
  • Depending on the size of the dataset and the network speed it may take time for a selected track to display.
  • For some track types, IGV for iPad does not load the data unless the view is zoomed to a sufficiently small region. When zoomed out to a larger region, the track will display a message to zoom in to see the data. 

If you have trouble loading and viewing datasets, try the following:

  • Reduce the number of tracks.
  • For sequence alignments, adjust data-loading parameters as outlined in Manage Data Tracks.

Hosted Genomes

The following genome assemblies are currently provided by default: Human hg19, Human hg18, and Mouse mm9. IGV for iPad hosted genomes are accessed through Broad Institute servers and originate from assemblies at the University of California, Santa Cruz (UCSC), as summarized below.

Human hg19
  • Assembly and original source: UCSC hg19 (GCA_000001405.1), February 2009; UCSC Genome Bioinformatics
  • Release name: Genome Reference Consortium GRCh37
  • Notes: Starting with hg19, the human genome sequence is provided by the Genome Reference Consortium (GRC). The GRC's goals are to correct the small number of regions in the reference that are currently misrepresented, to close as many remaining gaps as possible, and to produce alternative assemblies of structurally variant loci as needed. Sequencing credits are listed by chromosome here. For updated information on the human genome, go to the National Human Genome Research Institute (NHGRI) website.

Human hg18
  • Assembly and original source: UCSC hg18 (NCBI Build 36.1), March 2006; UCSC Genome Bioinformatics
  • Release name: NCBI Build 36.1
  • Notes: Prior to hg19, the human genome sequence data were generated by laboratories belonging to the Human Genome Sequencing Consortium. The credits are the same as for hg19.

Mouse mm9
  • Assembly and original source: UCSC mm9 (NCBI Build 37), July 2007; UCSC Genome Bioinformatics
  • Release name: NCBI Build 37
  • Notes: The initial 3X coverage of the mouse genome was generated by the Mouse Sequencing Consortium, now known as the Mouse Genome Sequencing Consortium (MGSC), and is acknowledged here. The publication Initial sequencing and comparative analysis of the mouse genome is featured in a special 2002 mouse genome issue of Nature.

Public Data Tracks

Default genes track. When you select a hosted reference genome, IGV automatically loads the corresponding Genes track, which is based on RefSeq annotations.

  • Exons are drawn as boxes and introns as lines, with arrows indicating the direction of transcription. Each gene is labeled with its symbol.
  • Newly loaded tracks are added at the top, shifting the Genes track downward.

Summary of public data tracks

Annotations
  • hg19:
    • CpG, %GC
    • dbSNP 131 (variation & repeats)
    • Conservation:
      • siphy rate
      • Phastcons
      • phyloP
  • hg18:
    • CpG, %GC
    • dbSNP 129 & 130
    • Conservation:
      • siphy rate
      • siphy pi
      • Phastcons
  • mm9:
    • %GC
    • dbSNP 128

Human Body Map
  • hg19:
    • Transcript Assemblies:
      • 24 Scripture
      • 23 Cufflinks (excludes adipose)
    • Coverage:
      • 24 tissues
    • Alignments:
      • 16 Merged 50bp & 75bp
      • 16 50bp (paired-end)
      • 16 75bp (single-end)
      • 8 GAII 75bp (paired-end)
  • hg18:
    • Merged 50bp & 75bp:
      • 16 Reads (alignments)
      • 16 Coverage

Chromatin Regulators
  • H1 embryonic stem (ES) cells:
    • 11 Histone Modifications
    • 16 Chromatin Regulators
  • K562 myelogenous leukemia cells:
    • 12 Histone Modifications
    • 29 Chromatin Regulators

ENCODE
  • hg19: 19,954 sets
  • hg18: 93 sets (Broad Histone)
  • mm9: 3,258 sets

1000 Genomes
  • Link on IGV for iPad coming soon

Additional information on public data tracks

Annotations

Each genome assembly includes a percent guanine-cytosine content (%GC) track and at least one release of single-nucleotide polymorphisms from the Database of Short Genetic Variation (dbSNP handbook, dbSNP NCBI site). Human assemblies also include CpG islands and several conservation tracks (for example, siphy rate and Vertebrate 46-way).
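The %GC and CpG island tracks are derived from simple sequence statistics. The Python sketch below illustrates the general idea under stated assumptions: it computes percent G+C in non-overlapping windows (the 5-base default window is an assumption, not necessarily what the hosted track uses) and the observed/expected CpG ratio commonly used when defining CpG islands. The function names are hypothetical, and this is not how the hosted tracks were generated.

```python
from typing import List


def gc_percent_windows(seq: str, window: int = 5) -> List[float]:
    """Percent G+C in consecutive, non-overlapping windows of `window` bases.

    Assumptions: the window size is illustrative only; a trailing partial
    window is scored over the bases it actually contains; N bases count
    toward the denominator (a simplification).
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / len(chunk))
    return values


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, to avoid dividing by zero.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives every CpG dinucleotide.
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)
```

For example, `gc_percent_windows("GCGCAATTAT")` returns `[80.0, 0.0]`, and `cpg_obs_exp("CGCG")` returns `2.0`.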

Human Body Map

The alignments, coverage tracks, and transcript assemblies for the 24 deep-coverage RNA-Seq samples were generated by Cabili et al. In the linked publication, the authors focused their analysis on a subset of the data and provide the Human Body Map lincRNAs website, where the data files are publicly available. The core of the dataset is 16 tissue samples from the Illumina Body Map 2.0 project, each sequenced in a full lane of an Illumina HiSeq 2000 (~236 million reads per sample; EMBL-EBI E-MTAB-513, NCBI GEO series GSE30611). These are complemented by 8 additional samples, each sequenced in a full lane of an Illumina Genome Analyzer II, or GAII (~55 million reads per sample; NCBI GEO series GSE30554).

All RNA libraries were poly-A selected and randomly primed, yielding non-stranded data. The HiSeq 50bp paired-end and 75bp single-end reads came from the same library preparations, so they were combined to form the Merged 50bp & 75bp dataset, from which the assemblies and coverage tracks were built. The hg19 coordinates of the Merged 50bp & 75bp data were converted to hg18 using the UCSC Genome Bioinformatics Batch Coordinate Conversion tool (liftOver).

  • The 16 Illumina Body Map 2.0 tissues are from 16 different individuals, male & female and ranging in age from 19 to 86, who died of tissue-unrelated causes. The tissues are adipose, adrenal, brain*, breast, colon, heart, kidney, liver*, lung, lymph node, ovary, prostate, skeletal muscle, testes*, thyroid, and white blood cells. Explore data further on the Expression Atlas website.
  • Cabili et al. added human lung fibroblasts (HLF; two biological replicates, marked 1 and 2), foreskin fibroblasts, HeLa cells, placenta, and replicate tissue samples of brain*, liver*, and testes*. Each asterisked replicate tissue is from a different individual. Cell lines were cultured by Cabili et al., and their sample labels carry an R for the Rinn laboratory.

Coverage. Coverage tracks for the hosted alignments can be displayed separately in IGV for iPad. While the alignment view shows reads only to a limited depth of 50, the coverage tracks summarize all alignments.
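To illustrate why a coverage track can report more reads than the alignment view shows, here is a minimal Python sketch; it is not IGV's implementation. It assumes reads are given as hypothetical half-open (start, end) intervals and models the display cap of 50 by simply keeping the first 50 reads. IGV's actual downsampling and row-packing logic is not described in this guide, and the names `coverage`, `displayed_reads`, and `DISPLAY_DEPTH_LIMIT` are illustrative.

```python
from typing import List, Tuple

# Hypothetical cap on displayed alignment depth, taken from the note above.
DISPLAY_DEPTH_LIMIT = 50

Read = Tuple[int, int]  # half-open interval [start, end) on the reference


def coverage(reads: List[Read], region_start: int, region_end: int) -> List[int]:
    """Per-base depth over [region_start, region_end), counting every read."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region; reads outside it contribute nothing.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


def displayed_reads(reads: List[Read], limit: int = DISPLAY_DEPTH_LIMIT) -> List[Read]:
    """Simplified stand-in for a capped alignment view: keep at most `limit` reads."""
    return reads[:limit]


if __name__ == "__main__":
    # 80 reads stacked over the same 50-base window.
    reads = [(100, 150)] * 80
    print(coverage(reads, 100, 150)[0])  # 80: coverage counts every read
    print(len(displayed_reads(reads)))   # 50: the capped view shows fewer
```

The point of the sketch is the asymmetry: the coverage function never consults the display cap, so its depth can exceed what the capped view renders.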

Transcript Assemblies. The Cufflinks and Scripture transcript assemblies provide two alternative transcriptome reconstructions to compare with the Genes track. Both are ab initio, splice-aware assembly algorithms for short reads that work from TopHat alignments but assemble transcripts using different statistical methods. Cufflinks favors precision, while Scripture maximizes the sensitivity of isoform detection. A 2010 News and Views article in Nature Biotechnology compares the two methods.
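For readers less familiar with the trade-off between precision and sensitivity, the standard definitions are given below. Here TP counts assembled transcripts that match the reference annotation, FP counts assembled transcripts with no match, and FN counts reference transcripts the assembler missed. How the cited comparison scored matches is not specified in this guide; these are only the general definitions.

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{sensitivity} = \frac{TP}{TP + FN}
```

An assembler that reports fewer, well-supported isoforms tends to raise precision, while one that reports more candidate isoforms tends to raise sensitivity at the cost of more false positives.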

Chromatin Regulators

The genome-wide ChIP-Seq binding maps for 29 chromatin regulators, and their relationships to histone modification states, are published in a 2011 issue of Cell, in a paper titled Combinatorial Patterning of Chromatin Regulators Uncovered by Genome-wide Location Analysis in Human Cells. The authors conclude that chromatin regulators bind in characteristic modular combinations, even across different cell types, where chromatin regulators often redistribute to different genomic regions. Data can be accessed on the CRome web portal.

ENCODE

The Encyclopedia of DNA Elements (ENCODE) is a project funded by the National Human Genome Research Institute to identify all regions of transcription, transcription factor association, chromatin structure, and histone modification in the human genome sequence. Identification of these DNA elements has led to the assignment of at least one biochemical function to 80% of the human genome. These functional annotations provide insight into the organization and regulation of genes and the genome.

The data are described in 30 papers published simultaneously in three journals (Nature, Genome Research, and Genome Biology). The papers can be explored collectively through thematic threads, which include their figures and tables, using the Nature ENCODE Explorer. Alternatively, to learn more about specific datasets, go to the ENCODE search page.

Within IGV for iPad, narrow down the list by typing in the search box provided in the full-page view.

  • Omit separating commas.
  • Because of the iPad's limited column width, some datasets may appear identical. To tell them apart, refer to desktop IGV, which lists the ENCODE data in the same order.

1000 Genomes

The 1000 Genomes Project, an international public-private consortium, aims to catalog human genetic variation using data from over 2,500 people from 26 populations at 4X coverage. IGV for iPad hosts public data from the pilot phase, described in a 2010 Nature paper titled A map of human genome variation from population-scale sequencing. The pilot phase, which informed the design of subsequent phases (the project had reached Phase 3 as of 2014), consisted of three projects:

  • low-coverage (2X) whole-genome sequencing of 179 individuals from 4 populations
  • high-coverage (20X per genome) sequencing of 2 trios (mother-father-child)
  • exon-targeted sequencing of 697 individuals from 7 populations with deep coverage (20X)

The pilot found that, on average, each person carries around 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. Based on the two trios, the rate of de novo germline base substitution is estimated at approximately 10⁻⁸ per base per generation.
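To put the estimated rate μ ≈ 10⁻⁸ per base per generation in perspective, the worked estimate below multiplies it by the size of a diploid human genome, G, assumed here to be roughly 6 × 10⁹ bases (an approximation not taken from this guide). The result is an order-of-magnitude figure, not a value reported by the 1000 Genomes Project.

```latex
\mathbb{E}[\text{new base substitutions per child}]
  \approx \mu \, G
  \approx \left(10^{-8}\ \tfrac{\text{substitutions}}{\text{base} \cdot \text{generation}}\right)
  \times \left(6 \times 10^{9}\ \text{bases}\right)
  = 60
```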
