Geometric deep learning and generative models for protein target discovery
Marinka Zitnik
Assistant Professor of Biomedical Informatics, Harvard Medical School
Computational therapeutic target discovery requires deciphering the cell types in which proteins act and how those proteins interact. We present PINNACLE, a contextual AI model for single-cell protein biology. Using a multi-organ single-cell atlas, PINNACLE learns from contextualized protein interaction networks and produces 394,760 protein representations spanning 156 cell types across 24 tissues. PINNACLE’s embedding space reflects cellular and tissue organization, which enables zero-shot retrieval of the tissue hierarchy. Pretrained PINNACLE representations can power a range of tasks: enhancing 3D structure-based representations in immuno-oncology, predicting drug effects across cell types and states, and identifying therapeutic targets in a cell-type-specific manner. We used PINNACLE to nominate protein targets for rheumatoid arthritis and inflammatory bowel disease, identifying the cell type contexts that are most predictive.

Drugs bind to protein pockets, the regions where proteins interact with ligand molecules. Designing pockets that bind a given ligand is challenging because of complex biomolecular interactions and the dependencies between protein sequence and structure. We developed PocketGen, a sequence-structure generative model that optimizes protein pockets to bind ligand molecules. PocketGen iteratively refines the sequence and structure of a pocket to maximize both its binding affinity with the ligand and its sequence-structure consistency. By combining a graph transformer for all-atom structure modeling with a protein language model for sequence prediction, PocketGen can help design protein pockets with high binding affinity and strong structural validity while generating them efficiently.