Yours and MINE
Brothers David Reshef (second from right) and Yakir Reshef (right)
developed a new tool for detecting patterns in large data sets under the
guidance of advisers Pardis Sabeti (second from left) of the Broad Institute
and Michael Mitzenmacher (left) of the Harvard University School of
Engineering and Applied Sciences.
Photo by ChieYu Lin
David and Yakir Reshef can’t help but finish each other’s sentences. As we talk about the project that they have been working on together for the past several years, the discussion shifts easily back and forth as one and then the other takes the lead in describing their work. Their conversation is so seamless that when I go back to review the tape of our interview, it’s tricky to figure out where Yakir’s quotes end and David’s begin. They have the sort of mental rapport unique to close siblings or co-authors. This is, of course, because David and Yakir are both.
“David and I are only a year apart in age, so we grew up together and influenced each other intellectually,” says Yakir, who is a remote associated researcher at the Broad Institute and a Fulbright scholar at the Weizmann Institute of Science. “One of us would think that the math team was cool, and then the other would join him. Or one of us would get into computer programming, and the other would say, ‘I want to do that too.’”
“It’s been really great to keep that up and incorporate that relationship into our research,” says David, a graduate research assistant at the Broad Institute and a graduate student in the Harvard-MIT Health Sciences and Technology program.
Over the past few years, David and Yakir have worked closely with Pardis Sabeti, Michael Mitzenmacher, and others to develop a tool that can find unanticipated patterns in large amounts of data. The project has brought together researchers from many different disciplines because it requires expertise in statistics, computation, and more. The tool is designed to tackle a major problem affecting fields as diverse as physics, economics, global health, and genomics, all of which are churning out ever-larger data sets faster and more cheaply. The toolkit, called MINE (Maximal Information-based Nonparametric Exploration), can find multiple patterns hidden in data and can score and compare the strengths of these relationships, helping researchers home in on the most promising and potentially important associations. A paper about MINE appears in this week’s issue of Science.
Michael Mitzenmacher, a senior author of the paper and professor of computer science at Harvard University, says, “What’s been amazing, particularly over the last ten years, is our ability to both store and collect really large data sets. Now the question is, what can we do with the data? It’s too much for us to get a handle around. We need to find ways to deal with it, to break it into pieces we can understand.” Michael became involved in the project when David took one of his classes, and he has since worked on its mathematical and algorithmic side.
Pardis Sabeti, an assistant professor at Harvard and an associate member of the Broad Institute, says that collaborating with Michael and the Reshef brothers on this work has been extraordinary. “The Broad is all about collaboration and this has been one of the most fun collaborations I’ve ever had,” she says. “We’re very much a family.”
Pardis is passionate about genomics and public health, so one of the data sets on which the team tested the toolkit came from the World Health Organization and included statistics such as infant mortality rate, income per capita, internet use, and carbon dioxide emissions. Using MINE, the researchers could see patterns and connections emerge.
In addition to using MINE to look at data related to biology and medicine, researchers can use the toolkit to look at large data sets from just about any field. David and Yakir, who grew up rooting for the Baltimore Orioles, even tested the toolkit on data from Major League Baseball.
“We think there’s tremendous potential in being able to look in this way at large data sets,” says Yakir. “This is a project that has the potential to impact lots of different fields,” David adds.
You can read more about how the researchers tested their toolkit on WHO data, baseball statistics, and data collected on gut-dwelling bacteria in a Broad press release. You can also watch David, Yakir, Michael, and Pardis talk about their work in a video.