Dog Data at Broad
To sign up for updates to this site, please send an email to email@example.com. We will add you to our Google group for the dog data.
The Vertebrate Genome Biology group at the Broad Institute is dedicated to supporting and advocating the use of the domesticated dog (Canis lupus familiaris) for disease gene mapping. Starting with the original dog genome paper, a number of data sets have been generated by the Broad Institute and close collaborators for use by the large canine genomics community.
This page lists all the data that pertain to the "normal" genome, including a genome reference assembly, an improved annotation for the reference, and RNA-Seq and variation (SNP) data sets. For disease-specific data sets, please see our Canine Disease Genomic Resource page.
The Broad Institute works in conjunction with the University of California, Santa Cruz (UCSC) Genome Browser to host and visualize much of the data below. Via the UCSC "Track Hub" utility, researchers can go to the link below and activate the Broad Improved Canine Annotation v1 Track Hub.
The tracks will then be viewable in the UCSC Genome Browser, mapped to the latest dog genome assembly (CanFam3.1). Most of the data listed below is also downloadable via the UCSC Table Browser tool.
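For scripted access, the following is a minimal sketch of building request URLs for the public UCSC REST API, as an alternative to the interactive Table Browser. It is not part of the Broad resources described here. It assumes the UCSC database name for CanFam3.1 is canFam3, and the helper names (sequence_url, tracks_url, fetch_json) are hypothetical, chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

UCSC_API = "https://api.genome.ucsc.edu"
GENOME = "canFam3"  # assumption: UCSC database name for the CanFam3.1 assembly


def sequence_url(chrom, start, end, genome=GENOME):
    """Build a getData/sequence URL for a 0-based start and exclusive end."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    query = urlencode({"genome": genome, "chrom": chrom, "start": start, "end": end})
    return f"{UCSC_API}/getData/sequence?{query}"


def tracks_url(genome=GENOME):
    """Build a list/tracks URL that enumerates the tracks for an assembly."""
    return f"{UCSC_API}/list/tracks?{urlencode({'genome': genome})}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_json(sequence_url("chr1", 0, 100)) would request the first 100 bases of chr1. The URL builders are kept separate from the network call so that they can be checked without an internet connection.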
As part of the Dog Genome Project, the Broad Institute has been the steward of the high-quality dog genome reference assembly. The latest version improves on the original Sanger 7x build through targeted resequencing of a variety of features in the genome.
The genome itself is available via NCBI. NCBI also provides a wide range of search functions on the assembly sequence itself via an online Blast page. The genome assembly also provides the backbone for viewing genomic data on the UCSC Genome Browser.
- CanFam3.1 Reference Assembly
- CanFam3.1 Reference Assembly - FTP
- NCBI CanFam3.1 Blast Utility
- CanFam2.0 (older Sanger 7x)
In addition to a high-quality reference, annotations are key to understanding and utilizing the dog as a model organism. Starting from the initial Ensembl annotation, the Broad Institute, along with collaborators, has generated an improved annotation for the dog.
- Improved Annotation - Manuscript (Hoeppner et al., 2014)
- Improved Annotation - Data Files
- UCSC Improved Dog Annotation Track
- Ensembl Annotation (input for improved annotation)
The advent of RNA-Seq has provided a huge amount of data, which is used for annotation. The Broad Institute has generated RNA-Seq data from multiple tissues for the dog, and links to the data can be found below. This data was used in the improved annotation of the dog reference, and the transcript assemblies (via Cufflinks) can be viewed at the UCSC Genome Browser.
We utilized three separate protocols:
- a strand-specific dUTP protocol with polyA selection
- a duplex-specific nuclease (DSN) normalization protocol
- a small RNA protocol
All of the samples below were commercially obtained RNA and were derived from a beagle.
To map the variation within the dog reference genome, we have generated several data sets and resources capitalizing on these data sets. More than 2 million SNPs were identified in the original dog genome paper and then supplemented with a targeted resequencing project to create the Illumina HD Array.
- Survey SNPset #1 - Manuscript (Lindblad-Toh et al., 2005)
- Survey SNPset #1 - Data Files
- Survey SNPset #2 - Manuscript (Vaysse et al., 2011)
- Survey SNPset #2 - Data Files
- Illumina CanineHD Whole-Genome Genotyping Array (170K SNPs)
Variation Data - Whole Genome Resequencing SNP Sets
These SNP and variation data sets were generated from a number of different breeds for different projects.
WGS SNPset #1 - Domestication SNPs
Led by collaborators at Uppsala University in Sweden, a large set of SNPs has been identified in a project aimed at finding selective signatures of domestication in the dog.