Low dim embeddings of words and docs; Density-aware visualization

Leland McInnes
Tutte Institute for Mathematics and Computing
MIA Meeting: Low dimensional embeddings of words and documents (and how they might apply to single-cell data)

Over the last decade the field of Natural Language Processing (NLP) has been overtaken by neural networks and deep learning. The latest models and algorithms, from word2vec to Google’s Universal Sentence Encoder and BERT, can perform seemingly magical feats and provide powerful tools to understand and analyze text documents.
This talk will pick apart word and document embedding techniques from NLP, removing the neural networks and instead working with linear algebra and dimension reduction on large sparse matrices of counts, a kind of data that closely resembles much single-cell data. Such an approach can achieve results on par with neural network approaches, but is both simpler to understand and generalizable to much more diverse domains than NLP, including, hopefully, single-cell research. In particular, all we really need to apply these kinds of techniques are large sparse matrices of counts and some notion of locality or co-occurrence of the feature columns of that matrix. The goal of this talk is to open up some of the ideas that have been so effective in NLP, make them usable and accessible to a wider audience working with more diverse types of data, and initiate a conversation about what is required to make such techniques useful for the single-cell community.
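As a rough illustration of the kind of pipeline described above (counts, a notion of co-occurrence, then linear algebra), here is a minimal sketch. It is not the speaker's method: the positive pointwise mutual information (PPMI) reweighting, the window size, the toy corpus, and the function names `cooccurrence_counts`, `ppmi`, and `embed` are all assumptions chosen for illustration, and the toy matrix is dense for brevity where real data would be stored as a sparse matrix.

```python
import numpy as np


def cooccurrence_counts(docs, window=2):
    """Count how often each pair of words appears within `window` tokens."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for i, w in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[doc[j]]] += 1.0
    return vocab, counts


def ppmi(counts):
    """Reweight raw counts by positive pointwise mutual information."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)


def embed(matrix, dim=2):
    """Reduce dimension with a truncated SVD (dense here, for a toy example)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical toy corpus; each document is a list of tokens.
docs = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "mat"],
    ["cat", "chased", "dog"],
]
vocab, counts = cooccurrence_counts(docs, window=2)
weights = ppmi(counts)
vectors = embed(weights, dim=2)  # one low-dimensional vector per word
```

The same three steps would apply to any count matrix once a notion of feature co-occurrence is chosen; only `cooccurrence_counts` is specific to text.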

Hoon Cho
Broad Institute
Primer: Density-aware visualization and sketching of single-cell transcriptomic data

Single-cell transcriptomic datasets have enabled the study of gene expression at an unprecedented resolution and scale. The high-dimensional and large-scale nature of single-cell transcriptomic landscapes necessitates efficient and accurate computational tools for extracting biological insights from these data. Unfortunately, standard analysis workflows often neglect information about local density in the original transcriptomic space, resulting in misleading representations of the transcriptomic variability of individual cell states in downstream analyses. In this talk, I will introduce our recent algorithms for single-cell analysis that expressly account for density differences in the underlying dataset: densMAP and den-SNE for data visualization, which respectively augment the widely used methods UMAP and t-SNE, and GeoSketch for sketching (downsampling) massive datasets. Our methods facilitate more accurate and unbiased exploration of single-cell transcriptomic landscapes.