#WhyIScience Q&A: A computer scientist marvels at the beauty behind good code
Victoria Popic discusses her journey through industry and academia and how machine learning can enhance our understanding of the human genome.
By Alex Viveros
September 8, 2022
Credit: Allison Dougherty, Broad Communications
Victoria Popic is a Schmidt Fellow at the Broad.
As a child growing up in Moldova, Victoria Popic had a profound love for learning. The unique problems in math, science, and art – and their intersection – deeply intrigued her.
When Popic eventually started writing her first lines of code, she was immediately hooked. She loved the theoretical and engineering aspects of building large-scale software systems and designing new algorithms. She also discovered that by pursuing a career in computer science, she could tackle problems in genomics and healthcare by developing tools that could further our understanding and interpretation of the human genome and disease.
After graduating from MIT with degrees in computer science and mathematics, Popic moved to Silicon Valley, where she earned a PhD in computer science from Stanford University in 2017 and worked as an engineer in tech and biotech.
Popic started her lab at the Broad as a Schmidt Fellow in 2020, and now uses deep neural networks — a type of machine learning tool — to build better ways to understand complex genomic data.
In this #WhyIScience Q&A, we spoke with Popic about her path through computer science, how the field has affected her day-to-day life, and how she believes machine learning will pave the way to new genomic discoveries.
Why did you leave industry to work in academia?
I worked at several different tech and biotech companies. The work was really interesting, but I felt like I was missing true freedom. For me, that is very important: defining your own research agenda and focusing on things that you really feel passionate about. I feel like you have more room for that in an academic setting.
Why did you fall in love with computer science?
I really liked everything I was studying when I was a kid, and I loved the process of learning and exploring different things. I loved physics, mathematics, film, languages, and drawing, so I was considering a variety of different paths.
Once I started coding, though, it engrossed me entirely, and I quickly realized that computer science was my true passion. One big appeal was that computer science is a field where you get to do a lot of different things: you get to work in biology, or chemistry, or physics, and build tools that combine domain-specific insights with novel algorithms and low-level system optimization and design. And since it’s such a dynamic field, you have to keep learning, which is another thing that drew me to it.
Computer science also enriches my life outside of research. Day to day, whatever I do, I’m constantly thinking of algorithms. When I cook or drive, I ask myself, “What’s the optimal way to do this?” So it comes up all the time, and it enriches my appreciation and understanding of certain tasks, since many are actually known algorithms with known optimal solutions.
It’s a field that opens the door to a vast diversity of problems to tackle and ways to innovate and contribute to society.
And it also involves some art and creativity — writing beautiful code is an art in itself.
What is the focus of your work now?
One of the bigger goals of my research right now is to develop methods that can reconstruct the genome sequence, or mixture of sequences, from massive sets of short DNA readouts. Due to certain limitations of sequencing technologies, we only have a fragmented view of what the underlying genome sequence really is. Answering the question of what genomic input resulted in the data that we observe has been fascinating to me.
The problems that fall into this category, including read mapping, assembly, phasing, and variation discovery, have to do with this primary question: what is the data we’re looking at, and what is the underlying genome sequence? Once you have that answer, you can do interpretation and extremely exciting analysis. But this initial question in itself presents a very rich set of puzzles and challenges.
Currently I’m especially focused on developing data-driven solutions to this question that are scalable, sustainable, and generalizable. Deep learning techniques lead to methods that can keep pace with the rapidly changing field of DNA sequencing, for example by quickly supporting a new sequencing technology or adjusting to changes in key characteristics of existing technologies.
How are deep neural networks being used to learn more about the human genome?
While I love developing algorithms, there are certain categories of problems, those involving really complex data patterns, where deep neural networks simply do a better job.
Structural variant detection in the genome is a big project in my lab right now. Structural variants are a major driver of genetic diversity and disease in the human genome and have been linked to numerous disorders such as cancer, Huntington’s, Alzheimer’s, autism, and schizophrenia.
Our approach is to shift the problem from sequence analysis to image analysis, and formulate structural variant detection as a learning task on images that are engineered to capture patterns induced by different structural variants across the genome. By juxtaposing any two given intervals of the genome along the two image axes, and encoding certain properties of genomic sequences aligned to each interval as pixel values, we capture both the local and long-range context around potentially distant structural variant breakpoints on the genome, enabling their detection in this space. We then train a deep neural network to detect the location of these breakpoints, and ultimately map the image-based predictions back to genome space coordinates.
Since deep neural networks learn complex abstractions automatically from the data, they are better suited to handle the often complex patterns of rearrangements of the genome, as compared to hand-engineered models. For such categories of problems, deep learning can really make a huge difference.
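Editor's illustration: the image encoding described above can be sketched in a few lines of Python. This is a toy example under stated assumptions, not the lab's actual pipeline. The function names (`pair_image`, `pixel_to_genome`, `strongest_pixel`), the choice of read-pair counts as the pixel value, and the 32-bin resolution are all hypothetical, and a simple "brightest pixel" lookup stands in for the trained deep neural network.

```python
import numpy as np

def pair_image(pairs, interval_x, interval_y, bins=32):
    """Count read-pair alignments into a bins x bins image.

    pairs: iterable of (pos_a, pos_b) integer genome coordinates for the
           two ends of a read pair (a hypothetical, simplified input).
    interval_x, interval_y: half-open (start, end) genome intervals placed
           along the horizontal and vertical image axes.
    """
    (x0, x1), (y0, y1) = interval_x, interval_y
    image = np.zeros((bins, bins), dtype=np.float32)
    for pos_a, pos_b in pairs:
        if x0 <= pos_a < x1 and y0 <= pos_b < y1:
            col = (pos_a - x0) * bins // (x1 - x0)
            row = (pos_b - y0) * bins // (y1 - y0)
            image[row, col] += 1.0  # pixel value = number of supporting pairs
    return image

def strongest_pixel(image):
    """Stand-in for the neural network: return the brightest (row, col)."""
    row, col = np.unravel_index(np.argmax(image), image.shape)
    return int(row), int(col)

def pixel_to_genome(row, col, interval_x, interval_y, bins=32):
    """Map a pixel back to the genome coordinates at the start of its bin."""
    (x0, x1), (y0, y1) = interval_x, interval_y
    return x0 + col * (x1 - x0) // bins, y0 + row * (y1 - y0) // bins

# Toy data: five read pairs linking two distant loci, plus one stray pair.
interval_x, interval_y = (0, 3200), (10000, 13200)
pairs = [(1510 + i, 11720 + i) for i in range(5)] + [(200, 10300)]

image = pair_image(pairs, interval_x, interval_y)
row, col = strongest_pixel(image)
print(pixel_to_genome(row, col, interval_x, interval_y))
```

In this toy setup, a cluster of read pairs connecting two distant positions shows up as a bright pixel away from the stray pair, and the pixel's row and column translate directly back to a pair of candidate breakpoint coordinates.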
What advice would you give to someone who is thinking about going into computer science?
Go for it! You just need a laptop to get started. There’s so much free educational content online that can be helpful. Also, the classic textbooks on algorithms, programming languages, computer architecture, artificial intelligence, and so on can provide a great foundation along the way.
As a result, you’ll gain the ability to contribute to and interact with the world in interesting ways, and to engage with different areas of science and parts of our society, from healthcare to environmental science and more.