Power up: Jose Soto’s code gives scientific research a boost
Much has been said about the “data deluge” in genomics and biomedicine: as technology improves, genome sequencing information and other biological data are being generated at a dizzying rate. Managing and analyzing that data remains a challenge—one that Broad takes pride in meeting. The institute is committed to sharing software, tools, and pipelines to enable researchers worldwide to make sense of what might otherwise be an overwhelming surge of information.
Jose Soto, a principal software engineer in the Broad Data Sciences Platform (DSP), is one of the resourceful computer science experts tasked by the institute with developing, maintaining, and improving such resources. Collectively, DSP dedicates itself to integrating information science into the life sciences. In Soto’s case, that means troubleshooting solutions to emerging challenges by creating and adapting software that can help scientists conduct their research more efficiently and effectively.
In this #WhyIScience Q&A, Soto talks about his role at the intersection of computer science and biomedicine:
Q: What do you do at the Broad?
A: I am a part of what we call the “Special Ops” team in the Broad’s Data Sciences Platform. We are a very small team that connects our methods-development teams with our software engineering teams. Because we cross over both groups, we have the tools in our toolbox to tackle problems that maybe wouldn’t be in any other team’s wheelhouse. The goal behind all of the projects we choose is very simple: make science easier to do and make the tools and resources that researchers and data scientists need more accessible. My day-to-day varies from writing tools/scripts/pipelines, to running analyses, to debugging some random issue that someone may be having. It’s a great environment for me because every day is interesting and slightly different than the previous day, which keeps me challenged and engaged.
Q: Who or what inspired you to do the work that you do?
A: It definitely had to do with my older brother. He has always been a huge role model and father figure in my life. He showed me cool things you could do with computer science (like making video games, which I love and are still one of my favorite things to do outside of work). He also showed me how programming is really a problem-solving tool, and how its effectiveness is not just about the quality of the tool, but also about how much further it gets an end-user to their end goal.
Q: How does your work fit into the grand scheme of the Broad’s scientific mission?
A: First, let me say that I can’t “do” science... at all... but what I can do is make scientific research faster, cheaper, and more reliable, and that is what I focus on. A lot of my work centers on optimizing different parts of the infrastructure that supports the science being done by people at the Broad and elsewhere. Once we make an analysis pipeline cheaper, or speed up a tool or pipeline, we can then pass those gains on to the science community as a whole. What we really do in our corner of DSP is advance science by enabling scientists to work more efficiently.
Q: What do you find are the biggest challenges in your field right now?
A: Getting people to think in a different paradigm. We are in the midst of this shift toward cloud-based computing, which includes moving access to tools, data, and pipelines to the cloud. The idea is to bring people to the data instead of bringing data to the people. But this can sometimes be met with resistance from researchers and other science folks who are used to doing everything on their laptops or whatever local infrastructure their place of employment may have set up. The best way to overcome this reluctance is to build good tools, which, in my experience, is not the easiest thing to do. We need to make this transition as painless as possible for people and show them all the benefits everyone stands to gain from this important shift taking place in the genomics world.
Q: What trend in your field is most intriguing to you?
A: Contending with the rate of sequencing data generation is for me the most interesting thing we face as software engineers in this field. This trend constantly provides new problems to solve, in addition to dredging up familiar problems that we have to solve in new and unique ways. This forces us to try out new technologies and partner up with other leaders in the field to come up with novel solutions—which always keeps us on our toes. I don’t particularly like being bored and I love solving problems and being challenged so the work I get to do is pretty much right up my alley.
Q: What have you found most rewarding about your work?
A: Being at the Broad is great because I am able to talk directly with researchers and scientists on-site who are using the software that I have worked on, so I get regular, in-person feedback on what’s working and what’s not. I accrue that knowledge and can then use it to improve any current and future software I work on. I even had a long conversation about a month ago with a friend of a friend who did not know I had worked on some of the software she was using in her lab. We had a great back-and-forth about the software itself, how it was being used, and what it was allowing her to accomplish at her institution that she couldn’t before (or at least, couldn’t accomplish as easily). Knowing that I actually am helping people is the best part for sure.