MIA Talks

Primer: Generative models from NLP for sequence data

May 22, 2019
Marks Lab, Harvard Medical School, Broad Institute

Generative models are powerful tools for capturing functional constraints within families of biological sequences. Autoregressive models, developed in natural language processing and related fields, provide a useful approach to modeling sequence data without imposing a rigid alignment structure on the data. In this primer, we will review the math and intuition behind these models, survey advancements in model parameterization, and compare strategies for sampling from the models to generate new sequences. Finally, we will discuss important considerations when applying these models to biological data.
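The autoregressive idea in the abstract can be illustrated with a minimal sketch. The model scores a sequence with the chain rule, log p(x) = sum_i log p(x_i | x_<i), and generates new sequences one token at a time. Everything below is a hypothetical toy: the alphabet, the start/end tokens, the probability table `COND`, and the helpers `log_likelihood` and `sample` are illustrative assumptions. The conditional is also simplified to depend only on the previous token, whereas neural autoregressive models condition on the full prefix x_<i.

```python
import math
import random

# Hypothetical start/end tokens bracketing each sequence.
START, END = "^", "$"

# Hypothetical toy conditionals p(next | previous) over a few amino acids.
# Simplification (assumption): first-order model that looks only at the
# previous token instead of the full history x_<i.
COND = {
    START: {"M": 0.9, "A": 0.1},
    "M": {"A": 0.5, "K": 0.3, END: 0.2},
    "A": {"K": 0.6, "A": 0.2, END: 0.2},
    "K": {"A": 0.3, END: 0.7},
}

def log_likelihood(seq):
    """Chain rule: log p(x) = sum_i log p(x_i | x_<i), truncated to the previous token."""
    prev, total = START, 0.0
    for tok in list(seq) + [END]:
        p = COND[prev].get(tok, 0.0)
        if p == 0.0:
            return float("-inf")  # the toy model assigns this sequence zero probability
        total += math.log(p)
        prev = tok
    return total

def sample(temperature=1.0, max_len=20, rng=None):
    """Generate a sequence token by token; temperature=0 means greedy decoding."""
    rng = rng or random.Random()
    prev, out = START, []
    for _ in range(max_len):
        dist = COND[prev]
        toks = list(dist)
        if temperature == 0:
            tok = max(toks, key=dist.get)  # greedy: always take the most likely token
        else:
            # Scale log-probabilities by 1/T; random.choices renormalizes the weights.
            weights = [math.exp(math.log(dist[t]) / temperature) for t in toks]
            tok = rng.choices(toks, weights=weights, k=1)[0]
        if tok == END:
            break
        out.append(tok)
        prev = tok
    return "".join(out)

print(sample(temperature=0))                        # greedy decoding
print(math.exp(log_likelihood("MAK")))              # probability of that sequence
print(sample(temperature=1.5, rng=random.Random(0)))  # a more diverse draw
```

With temperature 0 the sampler collapses to greedy decoding, which returns the single most likely continuation at each step. Temperatures above 1 flatten each conditional distribution and trade likelihood for diversity, while temperatures below 1 sharpen it. This is one of several sampling strategies one might compare, not necessarily the set covered in the talk.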