In a #WhyIScience Q&A, a software product manager discusses how she moved from the lab to data science
The life sciences are undergoing a data revolution. Advances in high-throughput sequencing have enabled scientists to generate and store massive amounts of information, but to make sense of all the data, researchers need new software. At the Broad Institute of MIT and Harvard, Ruchi Munshi and her colleagues in the Data Sciences Platform are creating such tools and tailoring them to the needs of different groups.
Munshi began her career in the lab. After studying biochemistry at the University of Massachusetts Amherst, she did research on diagnostic biomarkers for skin cancer at Boston University’s School of Medicine — work she particularly enjoyed because of its connection to patients. Today, as a software product manager at the Broad, she emphasizes the importance of a similar connection to the end users of her team’s tools.
Munshi spoke with us about her career path and how software empowers scientists in a #WhyIScience Q&A:
Q: What do you do at the Broad?
A: I’m a software product manager on the Cromwell/WDL team at the Data Sciences Platform. This means that I talk to Broad scientists about the type of research they do and what software tools would enhance that experience. Then I take this feedback and bring it to a dedicated team of software engineers who work on solving those problems. We think about what one person’s day is like, what they currently spend most of their time on, and what they wish they didn’t have to spend time on. Those are considered pain points. We categorize the pain points into different bins and think about how we can solve several of them together by building a tool to automate the things the scientists don’t want to spend their time on. That’s how software products are formed. I assist this process of building solutions by evaluating which features are high value and building a roadmap of the features needed to solve pain points for a specific type of role, such as a computational biologist.
Q: How did you decide to pursue your particular line of work?
A: I think what sparked my interest was when I worked in the Clinical Research Sequencing Platform at the Broad and we were exposed to the analytical outputs of our experiments. To understand the link between the chemistry and the results, I started to learn more about the outputs and more about the software tools that are used to generate results. That led to wanting to know more about what algorithms these software programs use when making sense of genomics data, and so I took computer science/bioinformatics classes, which became my gateway into software engineering.
Q: What inspires you to do what you do?
A: The far-reaching impact software can have on research. Today, scientists are really dependent on tools and cloud resources to do groundbreaking work, and I feel it’s my responsibility to build a bridge between tech and science, since I know a bit of both and I can visualize how and where they can be connected.
I think a good example of such a tool is WDL, the Workflow Description Language. It’s a way for computational biologists to write a complex analytical pipeline without having to know how to write code. For a lot of existing pipelining languages, you’re required to know things like Java, Scala, or Python. What WDL aims to do is lower that barrier to entry for writing pipelines by providing a human-readable/writable interface. WDL started off as a very light domain-specific language, but over the years we’ve seen a trend: even analysts who are comfortable writing code have started using WDL. This means there has been an increasing demand for more complex features. So it’s been interesting to see this language evolve over the years, and to learn what the right balance is between complexity and ease of use for your average computational biologist.
Q: What have you found most rewarding about your work?
A: The most rewarding part of my work has been learning about the scientific impact of the research at the Broad. It feels great to know that my team helped enable some of those things, such as processing the genomic variation across the world’s largest collection of human exome data.
Q: What trend in your field is most intriguing to you?
A: The migration of scientific tools to the cloud. At many academic institutes, cloud computing is important because the amount of data generated cannot fit on local systems and there isn’t enough local computing power to process it. It’s the future of scientific computing, but the adoption of cloud computing can be a really painful experience for scientists because it’s such a big change in mentality. It is also an expensive change, on top of the learning curve and the need to reorganize data. But it’s essential to assist in this shift in order to raise the bar on research, and it’s something I want to help support as much as I can.
Q: What, in your opinion, has been your biggest professional accomplishment to date?
A: It’s not a single event, but nothing gives me more satisfaction than when a researcher reaches out about a project they want to run at a higher scale than ever before, and I’m able to help them get to that next level. Every such event strengthens my attachment to the analysts at the Broad.