Human lincRNA Catalog

Large intergenic non-coding RNAs (lincRNAs) are emerging as key regulators of diverse cellular processes, yet determining the function of individual lincRNAs remains a challenge. Recent advances in RNA sequencing (RNA-Seq) and computational methods allow for an unprecedented analysis of such transcripts. In this work, we present an integrative approach to define a reference catalog of over 8,000 human lincRNAs. Our catalog unifies previously existing annotation sources with transcripts we assembled from ~4 billion RNA-Seq reads collected across 24 tissues and cell types. We characterize each lincRNA by a panorama of more than 30 properties, including sequence, structural, transcriptional, and orthology features. We find that lincRNA expression is strikingly tissue-specific compared with that of coding genes, and that lincRNAs are typically co-expressed with their neighboring genes, albeit to an extent similar to that of pairs of neighboring protein-coding genes. We distinguish an additional subset of transcripts (termed transcripts of uncertain coding potential; TUCPs) that have high evolutionary conservation but may include short open reading frames, and may function either as lincRNAs or by encoding small peptides. Our integrated, comprehensive, yet conservative reference catalog of human lincRNAs reveals the global properties of lincRNAs and will facilitate experimental studies and further functional classification of these genes.

This site provides the catalog of Human Body Map lincRNAs and TUCP transcripts, as well as all RNA-Seq alignment files and raw de novo transcriptome assemblies. Our work is further described in the manuscript linked from this website.

If you have any questions, please address them to Moran (nmcabili at fas.harvard.edu).

Important Note:

If you are trying to open an account on the website, please email nmcabili at fas.harvard.edu and note that you applied for an account. Thank you!

Data visualization:

Alignments, transcriptome assemblies and transcript annotation files (BED, GTF) can be viewed by loading them into the Integrative Genomics Viewer (IGV). Details are provided in the IGV user guide.

GTF and BED files can also be uploaded to the UCSC Genome Browser using the “upload user custom tracks” selection box on the “tables” page: http://genome.ucsc.edu/cgi-bin/hgTables?command=start. Instructions for managing user custom tracks can be found at http://genome.ucsc.edu/goldenPath/help/customTrack.html#MANAGE_CT.
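
Before uploading a BED file as a UCSC custom track, it can help to prepend a "track" header line so the track gets a readable name and display mode in the browser. The following is a minimal Python sketch of that step, not part of the catalog's own tooling. The function name add_track_line, the track name and description, and the file names are all hypothetical placeholders; substitute the BED file you downloaded from this site.

```python
from pathlib import Path


def add_track_line(bed_text: str, name: str, description: str) -> str:
    """Return BED text with a UCSC custom-track header line prepended.

    Hypothetical helper for illustration; not part of the catalog's tooling.
    """
    # Drop empty lines so the header is followed directly by BED records.
    lines = [ln for ln in bed_text.splitlines() if ln.strip()]
    # Leave the text unchanged if it already starts with a track line.
    if lines and lines[0].startswith("track "):
        return "\n".join(lines) + "\n"
    header = f'track name="{name}" description="{description}" visibility=pack'
    return "\n".join([header, *lines]) + "\n"


if __name__ == "__main__":
    # Hypothetical file names (assumption): replace with your downloaded file.
    src = Path("lincRNAs.bed")
    if src.exists():
        out = add_track_line(src.read_text(), "lincRNAs", "Human lincRNA catalog")
        Path("lincRNAs.ucsc.bed").write_text(out)
```

The resulting file can then be uploaded through the custom tracks box described above. The header is optional, since the browser also accepts BED files without one and assigns a default track name.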