# Tagged with #igv

2 documentation articles | 0 announcements | 8 forum discussions

Created 2015-11-24 22:15:30 | Updated 2015-12-16 22:43:04 | Tags: bam igv

Visualize sequence read alignment data (BAM or SAM) on IGV using this quick-start tutorial. The Integrative Genomics Viewer is a non-GATK tool developed at the Broad Institute that allows for interactive exploration of large genomic datasets.

#### Prerequisites

• Coordinate-sorted and aligned BAM or SAM file
• Corresponding BAI index
• Matching reference genome to which the reads align. See IGV hosted genomes to check if IGV hosts a reference genome or this page for instructions on loading a .genome or FASTA file genome.

• tutorial_6491.tar.gz contains a coordinate-sorted BAM and corresponding BAI. Most reads align to a 1 Mbp genomic interval on chromosome 10 (10:96,000,000–97,000,000) of the human GRCh37 reference assembly. Specifically, reads align to GATK bundle's human_g1k_v37_decoy.fasta that corresponds to the Human (1kg, b37+decoy) reference hosted by IGV.

## View aligned reads using IGV

To view aligned reads using the Integrative Genomics Viewer (IGV), the SAM or BAM file must be coordinate-sorted and indexed.

1. Always load the reference genome first. Go to Genomes>Load Genome From Server or load from the drop-down menu in the upper left corner. Select Human (1kg, b37+decoy).
2. Load the data file. Go to File>Load from File and select 6491_snippet.bam. IGV automatically uses the corresponding 6491_snippet.bai index in the same folder.
3. Zoom in to see alignments. For our tutorial data, copy and paste 10:96,867,400-96,869,400 into the textbox at the top and press Go. A 2 kbp region of chromosome 10 comes into view as shown in the screenshot above.

Alongside read data, IGV automatically generates a coverage track that sums the depth of reads for each genomic position.
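As a rough illustration of what a coverage track represents, the sketch below counts how many aligned reads span each position. The read intervals are made-up values and the function name `coverage_depth` is hypothetical; IGV's own computation may differ in details such as which reads it filters out.

```python
from collections import Counter

def coverage_depth(reads):
    """Return {position: depth} for reads given as (start, end) pairs.

    Coordinates are treated as 1-based and inclusive. This is an
    illustrative convention, not necessarily what IGV uses internally.
    """
    depth = Counter()
    for start, end in reads:
        for pos in range(start, end + 1):
            depth[pos] += 1
    return dict(depth)

# Three made-up reads that overlap around positions 100-110.
example_reads = [(100, 105), (103, 108), (105, 110)]
depth = coverage_depth(example_reads)
for pos in sorted(depth):
    print(pos, depth[pos])
```

Position 105 is covered by all three example reads, so it has the highest depth in this toy data.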

## Find a specific read and view as pairs

1. Right-click on the alignment track and choose Select by name. Copy and paste H0164ALXX140820:2:2107:7323:30703 into the read name textbox and press OK. IGV will highlight two reads corresponding to this query name in bold red.
2. Right-click on the alignment track and select View as pairs. The two highlighted reads will display in the same row connected by a line as shown in the screenshot.

Because IGV holds in memory a limited set of data overlapping with the genomic interval in view (this is what makes IGV fast), the select by name feature also applies only to the data that you call into view. For example, we know this read has a secondary alignment on contig hs37d5 (hs37d5:10,198,000-10,200,000); to find that alignment, you must first navigate to that region.

### Some tips

If you find IGV sluggish, download a Java Web Start jnlp version of IGV that allows more memory. The highest memory setting as of this writing is 10 GB (RAM) for machines with 64-bit Java. For the tutorial example data, the typical 2 GB allocation is sufficient.

• To run the jnlp version of IGV, you may need to adjust your system's Java Control Panel settings, e.g. enable Java content in the browser. Also, when first opening the jnlp, overcome Mac OS X's gatekeeper function by right-clicking the saved jnlp and selecting Open with Java Web Start.

To change display settings, check out either the Alignment Preferences panel or the Alignment track Pop-up menu. For persistent changes to your IGV display settings, use the Preferences panel. For track-by-track changes, use the Pop-up menus.

Default Alignment Preferences settings are tuned to genomic sequence libraries. Go to View>Preferences and make sure the settings under the Alignments tab allow you to view reads of interest, e.g. duplicate reads.

• IGV saves any changes you make to these settings and applies them to future sessions.
• Some changes apply only to new sessions started after the change.
• To restore default preferences, delete or rename the prefs.properties file within your system's igv folder. IGV automatically generates a new prefs.properties file with default settings. See IGV's user guide for details.
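The rename step for restoring defaults can be scripted. The following is a minimal sketch, assuming you already know where your igv folder is; the function name `reset_igv_prefs` and the `.bak` suffix are hypothetical choices, not part of IGV.

```python
from pathlib import Path

def reset_igv_prefs(igv_dir):
    """Rename prefs.properties so IGV regenerates a default file.

    igv_dir is the location of the igv folder. Where it lives depends on
    your system, so it is passed in rather than guessed.
    Returns the backup path, or None if no prefs file was found.
    """
    prefs = Path(igv_dir) / "prefs.properties"
    if not prefs.exists():
        return None
    # Keep the old settings under a different name instead of deleting them.
    backup = prefs.with_name("prefs.properties.bak")
    prefs.rename(backup)
    return backup
```

For example, `reset_igv_prefs(Path.home() / "igv")` would apply if the igv folder sits in your home directory, which is an assumption to verify on your machine. Renaming rather than deleting keeps the old settings available if you want them back.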

After loading data, adjust viewing modes specific to track type by right-clicking on a track to pop up a menu of options. For alignment tracks, these options are described here.

Created 2013-07-02 00:16:14 | Updated 2015-09-24 12:12:04 | Tags: install rscript igv picard gsalib samtools r ggplot2 rstudio

#### Objective

Install all software packages required to follow the GATK Best Practices.

#### Prerequisites

To follow these instructions, you will need to have a basic understanding of the meaning of the following words and command-line operations. If you are unfamiliar with any of the following, you should consult a more experienced colleague or your systems administrator if you have one. There are also many good online tutorials you can use to learn the necessary notions.

• Basic Unix environment commands
• Binary / Executable
• Compiling a binary
• Command-line shell, terminal or console
• Software library

You will also need to have access to an ANSI compliant C++ compiler and the tools needed for normal compilations (make, shell, the standard library, tar, gunzip). These tools are usually pre-installed on Linux/Unix systems. On Mac OS X, you may need to install the Xcode tools. See https://developer.apple.com/xcode/ for relevant information and software downloads. The Xcode tools are free but an Apple ID may be required to download them.

Starting with version 2.6, the GATK requires Java Runtime Environment version 1.7. All Linux/Unix and MacOS X systems should have a JRE pre-installed, but the version may vary. To test your Java version, run the following command in the shell:

java -version 

This should return a message along the lines of "java version 1.7.0_25" as well as some details on the Runtime Environment (JRE) and Virtual Machine (VM). If you have a version other than 1.7.x, be aware that you may run into trouble with some of the more advanced features of the Picard and GATK tools. The simplest solution is to install an additional JRE and specify which you want to use at the command-line. To find out how to do so, you should seek help from your systems administrator.
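If you want to check the version from a script rather than by eye, something like the sketch below could work. The function names are hypothetical, and the regular expression assumes the banner contains `version "X.Y..."`, as in the message quoted above; other Java distributions may format it differently.

```python
import re
import subprocess

def parse_java_version(text):
    """Extract (major, minor) from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', text)
    if match is None:
        return None
    # A banner such as version "17" has no minor part; treat it as 0.
    return int(match.group(1)), int(match.group(2) or 0)

def installed_java_version():
    """Run `java -version`; the JVM prints its version banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr)

print(parse_java_version('java version "1.7.0_25"'))
```

A result of `(1, 7)` corresponds to the 1.7.x series this tutorial expects.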

#### Software packages

1. BWA
2. SAMtools
3. Picard
4. Genome Analysis Toolkit (GATK)
5. IGV
6. RStudio IDE and R libraries ggplot2 and gsalib

Note that the version numbers of packages you download may be different than shown in the instructions below. If so, please adapt the number accordingly in the commands.

### 1. BWA

• Installation

Unpack the tar file using:

tar xvjf bwa-0.7.12.tar.bz2

This will produce a directory called bwa-0.7.12 containing the files necessary to compile the BWA binary. Move to this directory and compile using:

cd bwa-0.7.12
make

The compiled binary is called bwa. You should find it within the same folder (bwa-0.7.12 in this example). You may also find other compiled binaries; at time of writing, a second binary called bwamem-lite is also included. You can disregard this file for now. Finally, just add the BWA binary to your path to make it available on the command line. This completes the installation process.

• Testing

Open a shell and run:

bwa 

This should print out some version and author information as well as a list of commands. As the Usage line states, to use BWA you will always build your command lines like this:

bwa <command> [options] 

This means you first make the call to the binary (bwa), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command.

### 2. SAMtools

• Installation

Unpack the tar file using:

tar xvjf samtools-0.1.2.tar.bz2 

This will produce a directory called samtools-0.1.2 containing the files necessary to compile the SAMtools binary. Move to this directory and compile using:

cd samtools-0.1.2
make 

The compiled binary is called samtools. You should find it within the same folder (samtools-0.1.2 in this example). Finally, add the SAMtools binary to your path to make it available on the command line. This completes the installation process.

• Testing

Open a shell and run:

samtools 

This should print out some version information as well as a list of commands. As the Usage line states, to use SAMtools you will always build your command lines like this:

samtools <command> [options] 

This means you first make the call to the binary (samtools), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is the same convention used by BWA.

### 3. Picard

• Installation

Unpack the zip file using:

unzip picard-tools-1.139.zip

This will produce a directory called picard-tools-1.139 containing the Picard jar files. Picard tools are distributed as a pre-compiled Java executable (jar file) so there is no need to compile them.

Note that it is not possible to add jar files to your path to make the tools available on the command line; you have to specify the full path to the jar file in your java command, which would look like this:

java -jar ~/my_tools/jars/picard.jar <Toolname> [options]

This syntax will be explained in a little more detail further below.

However, you can set up a shortcut called an "environment variable" in your shell profile configuration to make this easier. The idea is that you create a variable that tells your system where to find a given jar, like this:

PICARD="$HOME/my_tools/jars/picard.jar"

So then when you want to run a Picard tool, you just need to call the jar by its shortcut, like this:

java -jar $PICARD <Toolname> [options]

## pysam

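import pysam

def bedtools(filename):
    """Simulate the behaviour of bedtools: print depth at the first 10 covered positions."""
    with pysam.AlignmentFile(filename, 'rb') as bamfile:
        # Name of the first reference sequence (the SN field of the first @SQ header line)
        name = bamfile.references[0]

        # pileup() yields one column per covered position; the BAM must be indexed
        pileup = bamfile.pileup(name)
        for count, column in enumerate(pileup, 1):
            pos = column.reference_pos + 1  # reference_pos is 0-based
            depth = column.nsegments
            print(name, pos, depth)
            if count >= 10:
                break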

## IGV

Manually, using mouse-over of the depth graph in the default view to see the exact read depth on the tooltip

# Results

| Position | IGV | pysam | genomeCoverageBed | gatk |
|---|---|---|---|---|
| 1 | 127 | 89 | 128 | 89 |
| 2 | 130 | 92 | 131 | 92 |
| 3 | 130 | 92 | 131 | 92 |
| 4 | 133 | 95 | 134 | 95 |
| 5 | 136 | 98 | 137 | 98 |
| 6 | 137 | 99 | 138 | 99 |
| 7 | 140 | 102 | 141 | 102 |
| 8 | 141 | 103 | 142 | 103 |
| 9 | 142 | 104 | 143 | 104 |
| 10 | 146 | 108 | 147 | 108 |

## Summary of results

• Only pysam and gatk agree
• IGV seems to count 38 reads more than pysam/gatk
• genomeCoverageBed counts 39 more than pysam/gatk
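To double-check the summary, the per-position offsets can be recomputed from the depths listed in the results. This is only a quick sketch with the numbers typed in by hand; the helper name `offsets` is made up for this check.

```python
# Depths copied from the results: position -> (IGV, pysam, genomeCoverageBed, gatk)
depths = {
    1: (127, 89, 128, 89),
    2: (130, 92, 131, 92),
    3: (130, 92, 131, 92),
    4: (133, 95, 134, 95),
    5: (136, 98, 137, 98),
    6: (137, 99, 138, 99),
    7: (140, 102, 141, 102),
    8: (141, 103, 142, 103),
    9: (142, 104, 143, 104),
    10: (146, 108, 147, 108),
}

# Column indices into each tuple above.
IGV, PYSAM, BEDTOOLS, GATK = range(4)

def offsets(column, baseline):
    """Set of per-position differences between two tool columns."""
    return {row[column] - row[baseline] for row in depths.values()}

print("IGV - pysam:", offsets(IGV, PYSAM))
print("genomeCoverageBed - pysam:", offsets(BEDTOOLS, PYSAM))
print("gatk - pysam:", offsets(GATK, PYSAM))
```

Each set contains a single value, so the offset between any two tools is constant across these ten positions.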

Created 2015-08-07 14:25:49 | Updated | Tags: depthofcoverage haplotypecaller dp solid igv

Hello Everyone!

I'm using the whole GATK workflow to analyze Target Resequencing data coming from SOLID platforms. I followed the Best Practices for analysis and used the proper SOLID flags when using BaseRecalibrator (--solid_recal_mode SET_Q_ZERO_BASE_N --solid_nocall_strategy PURGE_READ). However, when looking at the VCF files after HaplotypeCaller, something does not add up.

I checked some of the variants in some of my samples and I found that the DP field does not report the same per-base coverage value as the one reported by the BAM (using --bamOutput to produce a BAM for HaplotypeCaller) when looking at them in IGV. As far as I understand, there is downsampling at each position, but I'm seeing a lower DP value compared to the ones stored in the BAM. I'm attaching an IGV screenshot of one of the variants in which I'm encountering this problem. I deactivated all alignment filtering options in IGV, as well as downsampling. Here's the line reported in the VCF for this variant:

As you can see from the screenshot, not only does the coverage differ, but a lot of reads that map according to the reference are missing. Does somebody have an idea of what happened to the coverage inside the VCF?

Thanks a lot for your time!

Daniele

Created 2015-03-31 20:22:27 | Updated | Tags: igv snps

Hi All,

I have a question regarding the SNP calls made by GATK 3.2 versus what I observe by eye in IGV; both use hg19. We have three samples. In IGV, I see the following genotypes at chr10:17659149 in the BAM files (after realignment and recalibration; before HaplotypeCaller):

• Sample 1: 9 Gs and 11 Ts
• Sample 2: 6 Gs and 6 Ts
• Sample 3: 18 Gs

But when I check the VCF produced by GATK, it shows:

chr10 17659149 rs7895850 C G,T 1509.20 PASS AC=4,2;AF=0.667,0.333;AN=6;DB;DP=41;FS=0.000;MLEAC=4,2;MLEAF=0.667,0.333;MQ=60.00;MQ0=0;POSITIVE_TRAIN_SITE;QD=25.48;VQSLOD=6.71;culprit=MQ GT:AD:DP:GQ:PL 1/1:0,14,0:14:41:493,41,0,493,41,493 2/2:0,11,0:11:33:429,429,429,33,33,0 1/1:0,16,0:16:48:704,48,0,704,48,704

If you look at the GT field, the corresponding genotypes are G for sample 1, T for sample 2, and G for sample 3. These are quite different from what I see in IGV for samples 1 and 2. Do you have any idea why?
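For reference, this is how the GT indices map onto the REF and ALT alleles of the record, written as a small sketch. The function name `decode_genotype` is made up for illustration; the indexing rule (0 is REF, 1 and up are the ALT alleles in order) comes from the VCF format.

```python
def decode_genotype(gt, ref, alts):
    """Translate a VCF GT string such as '1/1' into allele bases.

    Index 0 is REF; indices 1..n refer to the comma-separated ALT alleles.
    A '.' index (no call) is passed through unchanged.
    """
    alleles = [ref] + alts.split(",")
    separator = "|" if "|" in gt else "/"
    return separator.join(
        "." if index == "." else alleles[int(index)]
        for index in gt.split(separator)
    )

# REF, ALT and the three GT values from the record in this post.
ref, alts = "C", "G,T"
for sample, gt in enumerate(["1/1", "2/2", "1/1"], start=1):
    print(f"Sample {sample}: {gt} -> {decode_genotype(gt, ref, alts)}")
```

With REF C and ALT G,T, index 1 is G and index 2 is T, which matches the reading of the GT field given here.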

Created 2014-04-22 17:53:55 | Updated | Tags: snp vcf igv

Hi, I have started working with IGV, but I have some doubts about how to identify a good SNP in this program. First I downloaded the new Soybean Genome from Phytozome (Gmax_275_v2.0.fa and Gmax_275_Wm82.a2.v1.gene.gff3 files), and then I loaded my files (sample.vcf, sample.bam and sample.bam.bai) into the program. I indexed the files that the program needed, so that's OK! But my doubt is: which parameters should I consider for a good SNP? For example, what do I need to look at under Alleles, Genotypes and Variant Attributes? See the example below.

Chr: Chr06
Position: 35170948
ID: .
Reference: C*
Alternate: T
Qual: 160
Type: SNP
Is Filtered Out: No

Alleles:
No Call: 0
Allele Num: 2
Allele Count: 4
Allele Frequency: 1
Minor Allele Fraction: 1

Genotypes:
Non Variant: 0
• No Call: 0
• Hom Ref: 0
Variant: 1
• Het: 0
• Hom Var: 1

Variant Attributes:
AF1: 1
RPB: 5.557190e-01
VDB: 1.587578e-01
Depth: 18
FQ: -54
DP4: [1, 1, 6, 8]
AC1: 2
Mapping Quality: 25
PV4: [1, 0.22, 1, 0.24]
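One attribute that can be read directly is DP4. The sketch below assumes the samtools/bcftools meaning of DP4 (counts of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases), which seems likely given the other attributes shown but is not confirmed for this file. The function name `summarize_dp4` is made up for illustration.

```python
def summarize_dp4(dp4):
    """Summarize a DP4 list: [ref-forward, ref-reverse, alt-forward, alt-reverse]."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    ref_total = ref_fwd + ref_rev
    alt_total = alt_fwd + alt_rev
    total = ref_total + alt_total
    return {
        "ref_reads": ref_total,
        "alt_reads": alt_total,
        # Share of counted bases that support the alternate allele.
        "alt_fraction": alt_total / total if total else float("nan"),
        # Share of alternate-supporting bases that come from the forward strand.
        "alt_forward_fraction": alt_fwd / alt_total if alt_total else float("nan"),
    }

print(summarize_dp4([1, 1, 6, 8]))
```

Under that assumption, 14 of the 16 counted bases support the alternate allele, and they come from both strands.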

Created 2013-04-25 07:36:27 | Updated 2013-04-25 07:38:09 | Tags: unifiedgenotyper igv

Greetings!

First of all, thank you for a truly great toolkit! It is no doubt the best one out there.

Now, I have a question regarding visualization of a SNP that is not called by UG but looks convincing in IGV. Yes, I've looked at the FAQ page gatkforums.broadinstitute.org/discussion/1235/why-didnt-the-unified-genotyper-call-my-snp-i-can-see-it-right-there-in-igv but I'm still not completely convinced that this is a false positive.

The BAM files have gone through the Best Practices workflow prior to SNP calling. Calling was done using UG with subsequent recalibration steps, where I followed the guidelines under gatkforums.broadinstitute.org/discussion/1259/what-vqsr-training-sets-arguments-should-i-use-for-my-specific-project. SNP calling was done using GATK 2.4-9.

Below is a screenshot from IGV showing the SNP call:

Fullsize here: s24.postimg.org/sepow851v/igv_snp.png

The average mapping quality for the reads that include the SNP is 50 and the average base quality at the locus of the SNP is 28.7 (not including 4 positions where base quality is below 10). These values are calculated from the values shown by IGV.

Are these values really too low to confidently call this SNP? I mean, a base quality of 28.7 means a probability of 99.87% that the base call is correct. Isn't that good enough?
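For anyone who wants to check the arithmetic, the 99.87% figure follows from the Phred scale, Q = -10 * log10(P_error). A minimal sketch of the conversion, with a function name chosen just for this example:

```python
def phred_to_error_probability(quality):
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)

q = 28.7
p_error = phred_to_error_probability(q)
print(f"Q{q}: error probability {p_error:.5f}, accuracy {100 * (1 - p_error):.2f}%")
```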