Machine learning guides researchers to new synthetic genetic switches
A new method allows precise activation or repression of genes in specific cells and tissues.
Using a form of artificial intelligence (AI) called deep learning, the group trained a model on hundreds of thousands of DNA sequences from the human genome, each measured in the laboratory for cis-regulatory element (CRE) activity in three types of cells: blood, liver and brain. The AI model allowed the researchers to predict the activity of any sequence from the almost infinite number of possible combinations. By analyzing these predictions, the researchers discovered new patterns in the DNA, learning how the grammar of CRE sequences impacts how much RNA is made, a proxy for how much a gene is activated.
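To make the idea of feeding DNA to a model concrete, the sketch below shows one-hot encoding, a common way to turn a sequence into numbers for deep learning, and counts how many sequences of a given length exist. The article does not describe the study's actual encoding or sequence length, so the function names and the length of 200 bases are illustrative assumptions.

```python
# Illustrative sketch only: the encoding and the length of 200 are assumptions,
# not details taken from the study.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as one row per base, with a 1 in that base's column."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def sequence_space(length):
    """Number of distinct DNA sequences of the given length (4 choices per base)."""
    return 4 ** length

encoded = one_hot("ACGT")        # one row each for A, C, G, T
n_possible = sequence_space(200) # far too many sequences to test in a lab
```

Even at this modest length, the number of possible sequences is far larger than any experiment could measure, which is why a predictive model is needed to explore the space.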
The team then developed a platform called CODA (Computational Optimization of DNA Activity), which used their AI model to efficiently design thousands of completely new CREs with requested characteristics, like activating a particular gene in human liver cells but not activating the same gene in human blood or brain cells. Through an iterative combination of ‘wet’ and ‘dry’ investigation, using experimental data to first build and then validate computational models, the researchers refined and improved the program’s ability to predict the biological impact of each CRE and enabled the design of specific CREs never before seen in nature.
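The design goal described above, high activity in one cell type and low activity in the others, can be sketched as an optimization loop. The example below is a toy under stated assumptions: `toy_predict` is a made-up stand-in for a trained model that simply counts invented motifs, and the greedy single-base search is swapped in for illustration. The article does not describe CODA's actual scoring or search procedure.

```python
import random

# Hypothetical motifs standing in for a trained model; not from the study.
TOY_MOTIFS = {"blood": "GATA", "liver": "TTGC", "brain": "CAGG"}

def toy_predict(seq):
    """Placeholder predictor: counts one made-up motif per cell type."""
    return {cell: float(seq.count(motif)) for cell, motif in TOY_MOTIFS.items()}

def specificity(seq, target):
    """Predicted target activity minus the highest off-target activity."""
    activity = toy_predict(seq)
    off_target = max(v for cell, v in activity.items() if cell != target)
    return activity[target] - off_target

def design(target, length=40, steps=2000, seed=0):
    """Greedy search: keep a random single-base change unless it lowers the score."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGT") for _ in range(length)]
    best = specificity("".join(seq), target)
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice("ACGT")
        score = specificity("".join(seq), target)
        if score >= best:
            best = score      # accept the mutation
        else:
            seq[pos] = old    # revert the mutation
    return "".join(seq), best

liver_seq, liver_score = design("liver")
```

In a real setting the placeholder predictor would be replaced by the trained model, and the designed sequences would go back to the laboratory for measurement, mirroring the 'wet' and 'dry' loop described in the article.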
"Natural CREs, while plentiful, represent a tiny fraction of possible genetic elements and are constrained in their function by natural selection," said study co-first author Sager Gosai, a postdoctoral fellow in Sabeti's lab. "These AI tools have immense potential for designing genetic switches that precisely tune gene expression for novel applications, such as biomanufacturing and therapeutics, that lie outside the scope of evolutionary pressures."
Pick-and-choose your organ
Castro, Gosai, Reilly, Sabeti, Tewhey, and their team tested the new, AI-designed synthetic CREs by adding them into cells and measuring how well they activated genes in the desired cell type and how well they avoided gene expression in other cells. The new CREs, they discovered, were even more cell-type-specific than naturally occurring CREs known to be associated with those cell types.
"The synthetic CREs semantically diverged so far from natural elements that predictions for their effectiveness seemed implausible," said Gosai. "We initially expected many of the sequences would misbehave inside living cells."
"It was a thrilling surprise to us just how good CODA was at designing these elements," said Castro.
Tewhey and his collaborators studied why the synthetic CREs were able to outperform naturally occurring CREs and discovered that the cell-specific synthetic CREs contained combinations of sequences responsible for expressing genes in the target cell types, as well as sequences that repressed or turned off the gene in the other cell types.
Finally, the group tested several of the synthetic CRE sequences in zebrafish and mice, with good results. One CRE, for instance, was able to activate a fluorescent protein in developing zebrafish livers but not in any other areas of the fish.
"This technology paves the way toward the writing of new regulatory elements with pre-defined functions," said Tewhey. "Such tools will be valuable for basic research but also could have significant biomedical implications where you could use these elements to control gene expression in very specific cell types for therapeutic purposes."
Adapted from a press release issued jointly with The Jackson Laboratory.
Funding
Support for this study came from the National Human Genome Research Institute. Pardis Sabeti is an Investigator with the Howard Hughes Medical Institute.
Paper cited
Gosai SJ, Castro RI, et al. Machine-guided design of cell type-specific cis-regulatory elements. Nature. Online October 23, 2024. DOI: 10.1038/s41586-024-08070-z.