#WhyIScience Q&A: Why I switched from experimental to computational biology

Diabetes genetics researcher Josep Mercader talks about the benefits and challenges of a career analyzing large and diverse datasets.

Josep Mercader
Credit: Allison Dougherty

While studying the genetics of anorexia nervosa for his Ph.D., Josep Mercader became intrigued by computational biology. He taught himself basic statistics, scripting, and programming as he continued his graduate research at the Center for Genomic Regulation in Barcelona. After completing his degree, he was a postdoc at the Barcelona Supercomputing Center, earned a master’s degree in biostatistics, and began using computational methods and tools to analyze biological data, including some from type 2 diabetes patients.

When he encountered a challenge in this project, he reached out to Jose Florez, director of the diabetes research group at the Broad Institute of MIT and Harvard, co-director of Broad’s Metabolism Program, and chief of the endocrine division and diabetes unit at Massachusetts General Hospital (MGH). That connection led to a collaboration with Florez’s lab and, ultimately, Mercader joining the Broad full time as a research scientist in September 2016.

Today, Mercader is analyzing datasets with new computational techniques to search for novel genetic variation associated with type 2 diabetes. He is also following up on previously identified genetic variants to determine how they increase a person’s risk of the disease. And in collaboration with Aaron Leong, an instructor in medicine and an endocrinologist at MGH, Mercader is investigating the physiologic and pharmacologic responses of people who carry genetic variants associated with type 2 diabetes. The findings from these studies could potentially be applied to the development of new and more precise ways of treating and predicting the disease.

In this #WhyIScience Q&A, Mercader spoke with us about the benefits and challenges of computational biology, as well as the importance of diversity in genomics research:

Q: What motivated you to move from experimental biology to computational biology?

A: I think the main motivation was that there are currently more data out there than we can understand and digest. It’s important to generate more data and larger datasets, but it’s also important to make better use of the data we already have. Most research projects just publish the low-hanging fruit, but there’s a lot of fruit they’re missing, and data-sharing allows computational biologists to do more comprehensive analyses and to try to find other fruits that are just as juicy as the low-hanging fruit. 

For example, by re-analyzing already published genetic datasets, we discovered a novel genetic variant that doubles the risk for type 2 diabetes in men and is present in 1 percent of the population. This was possible by using only about 10 percent of the existing data that has been shared publicly. This shows how much more could be done if all the data was made publicly available. 

The UK Biobank is another great demonstration of data-sharing that really works and results in lots of discoveries. However, it’s mostly of European ancestry, and efforts should be made to study other ancestries too.

Q: How difficult was this career move?

A: The move was challenging because I don’t have a computer science background, but it was also a lot of fun to learn new things and to see that you can actually change fields. I think the basic difference between the two is that computational biology gives you a lot of freedom because you can test the hypothesis you want just by analyzing the data. The fact that you only need a computer (or many computers) to do it makes it easier to test a hypothesis, whereas in a wet lab, you have to really consider the cost of reagents, cells, animal models, etc. before deciding to test a new hypothesis.

Q: What are the biggest challenges in computational biology right now? 

A: One of the challenges is to improve genomic data-sharing in general. A lot of genetic data is already available, but much of it is only accessible to a privileged group of people and that’s a big limitation. Another challenge is the fact that we have been producing genetic and genomic data only on European populations for a long time. There are efforts now to study other populations, but genomic resources, such as GTEx or phenome-wide association data, are still mostly based on European populations. So every time you discover something in non-European populations, the amount of resources you have available to do follow-up work is very limited. 

Q: What has been your biggest scientific accomplishment to date?

A: I want to say that none of my accomplishments are just mine because everything I’ve done is a team accomplishment. Most of it would have been impossible without the excellent mentorship, the great resources, and the students and others who help with the analysis of data or with the heavy lifting of the lab work. 

That being said, one of the interesting accomplishments lately was the discovery of a loss-of-function protective variant for type 2 diabetes in individuals from Mexico and Latin America. The interesting thing is that we discovered this by analyzing only around 9,000 individuals. For comparison, that is only 1 percent of the size of the biggest European dataset that has been studied for type 2 diabetes.

Another interesting aspect is that because what we found is a protective variant and a loss-of-function variant, it has the potential to become a therapeutic target. It’s an example of one benefit of studying other populations; sometimes important discoveries can be made with smaller sample sizes. 

Q: What advice would you give budding scientists who want to pursue a career in biology?

A: I would encourage them to try to do research both in a wet lab and in computational biology, not only to try to understand what they’re better at, but also because it provides a very good perspective. Even if you end up doing computational biology, it’s very important to understand how the experiments are performed and the underlying biology. There are a lot of advantages to having experience in both.