Brandon Franco, a senior software engineering major at St. Mary’s University, developed an annotation database for organizing, identifying, and retrieving a curated collection of published variant annotations, which can be used in both clinical genetics and basic genomic research.
Life is not a linear path, especially in science. The Broad Institute offers an unmatched environment where collaboration and diversity are encouraged when pursuing the next big scientific discovery. I learned here that science is not limited to just one path, but rather branches out and encompasses all disciplines. Having the opportunity to interact with many passionate and brilliant scientists and to learn about their career paths was especially eye-opening and inspiring. The Broad has been instrumental in helping me grow as a person and finally understand what it truly means to be both a scientist and an engineer.
Reproducibility in research, more specifically computational reproducibility, is an enormous challenge facing the scientific community. Although many studies use sufficiently rigorous statistical standards for reproducible science, the software side of reproducibility continues to lag behind. While researchers generally use open-source software tools and published scientific databases in their work, it is rarely possible to replicate the specific software versions and installations used. Some tools and databases are surprisingly difficult to install, so the most popular ones tend to be those that are most easily accessible, which may not be the most scientifically appropriate. As part of an open-source software framework to make genomic data analysis accessible and reproducible, we present the following annotation database for organizing, identifying, and retrieving a curated collection of published variant annotations, which can be used in both clinical genetics and basic genomic research. These variant annotations are documented in an interactive webpage and stored in an easily accessible format, where additional resources can be added as they are published.
To further encourage reproducibility, dataset provenance is encoded into the source code of this tool, including both input data and software environment specifications. This tool will help future researchers overcome the technical challenges of reproducibility and make their own findings reproducible, transparent, and open to the public.
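As a rough illustration of what encoding provenance in source code could look like, the minimal Python sketch below records an input dataset together with a checksum and a description of the software environment. It is not the tool's actual implementation: the names `AnnotationSource`, `verify`, `environment_spec`, and `provenance_record` are hypothetical, and the URL used with them would be a placeholder.

```python
import hashlib
import platform
import sys
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnnotationSource:
    """Hypothetical provenance record for one published annotation dataset."""
    name: str
    version: str
    url: str      # where the file was obtained (placeholder in this sketch)
    sha256: str   # checksum of the exact input file that was used


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


def verify(source: AnnotationSource, data: bytes) -> bool:
    """Check that the bytes in hand match the recorded checksum."""
    return sha256_of(data) == source.sha256


def environment_spec() -> dict:
    """Describe the interpreter and platform the analysis ran on."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": sys.platform,
    }


def provenance_record(source: AnnotationSource, data: bytes) -> dict:
    """Bundle input-data and environment provenance into one record."""
    if not verify(source, data):
        raise ValueError(f"checksum mismatch for {source.name} {source.version}")
    return {"input": asdict(source), "environment": environment_spec()}
```

A record built this way could be stored alongside analysis results, so that a later reader can confirm they are working from the same input file and a comparable environment.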
Project: Facilitating reproducible genomic science with a user-friendly catalog of genetic annotations
Mentors: Daniel King and Cotton Seed, Hail Team, Neale Lab