Algorithm seeks meaningful relationships
Networks are ubiquitous these days: we use the internet to surf through a seemingly endless network of linked sites; we rely on social media to network with friends and acquaintances across the globe; and we’ve come to look at the human body as an interconnected system of biological processes.
This explosion of connectivity has been both a boon and a challenge for network scientists – the brave souls who are attempting to make sense of the data being mined from these systems. They are working in the biological, social, and information sciences (and beyond), searching networks for meaningful connections that might reveal, for instance, how genes, proteins, and drugs interact inside the body, or how people influence and work with each other.
In the August issue of Nature Biotechnology, a team led by Broad associate member Manolis Kellis introduced a new algorithm that may make this work easier for network researchers in a variety of fields. The team developed a method called “network deconvolution” that helps researchers identify the most direct connections within complex networks by filtering out the background noise caused by indirect or incidental connections. The algorithm was designed to address a persistent challenge in network science: how to distinguish between meaningful relationships among interacting variables (or “nodes”) in a network, and more spurious, potentially misleading connections.
“In complex systems where you may have tens of thousands of variables, you’ll see many connections, but some of those connections are the result of indirect relationships – they only exist because the variables share a common connection with another, separate variable,” explains Soheil Feizi, a graduate student in Kellis’s lab and first author of the paper. “Those indirect connections are not informative and they obscure the true network of direct relationships that researchers want to observe.”
It’s a complication that may be best understood in terms of Kevin Bacon, arguably the world’s most popular network node. In the complex, IMDB-fueled, “Six Degrees of Kevin Bacon” network, Bacon would be directly connected to John Belushi and Ryan Gosling because he worked with each of the actors. But, looking objectively at the network, it might also appear that Belushi and Gosling are connected because they both worked with Bacon. In reality, the two surely never met and probably share no direct connection. Network deconvolution filters out such meaningless, indirect relationships so that researchers can focus on the important and informative connections at the heart of the network.
“Indirect relationships can cause echoes that reverberate infinitely in the data,” explains Kellis, who is also an associate professor in the Computer Science and Electrical Engineering Department at MIT. “These create feedback within the network, make spurious connections seem substantial, and sometimes make direct connections seem disproportionately significant.”
The team saw that, by transforming the network matrix, they could use a mathematical shortcut to contain the amplifying effects of those echoes. From there, they could reverse the process, getting rid of the echoes and revealing an underlying network consisting of only the most direct, informative relationships.
Feizi and Kellis liken the process to cleaning up a dozen audio tracks from a dozen conversations at a crowded cocktail party. Each conversation may spill into the others, but the ability to single out the voices makes it possible to strip out the ambient sound so that one clear voice can be heard.
“Network deconvolution is a powerful new operation for matrices and we hope it will be widely adopted,” Feizi says. “It’s intuitive and simple to program, and it allows us to essentially eliminate the noise in the network.”
For their Nature Biotechnology paper, the scientists applied their algorithm to three distinct types of networks. They looked at gene regulatory networks to determine which genes interacted with each other within a cell. They looked at protein structures to see which amino acids from those proteins truly interacted with each other (as opposed to being related because they’re, for instance, simply folded next to each other). The team also examined co-authorship of papers within social networks of scientists to distinguish strong collaborations from weak ones. In all three cases, the network deconvolution algorithm enabled them to identify the meaningful, direct ties in the network more effectively than previous methods.
The success of the method with these three disparate networks suggests that the new algorithm can be applied to an even wider range of real-world networks, as a fundamental network science tool to study a variety of complex systems.
Kellis says that versatility is something his lab values greatly. “Sitting in a computer science department, working on biology, we are constantly asking, how can we think of this specific biological problem in more general terms? Can we uncover the fundamental principles hiding behind the specific problems that we set out to solve and, if we can, what are the broader implications across fields?”
He says that his lab will continue to search for foundational computational methods that can be added to the toolkit scientists use to analyze networks. As more systems-level information becomes available to the researchers investigating such fields as gene regulation, social networks, or the interaction between drugs, genes, and disease, there will be a growing need for methods like network deconvolution that bring order to the complex worlds within and around us.
Feizi S, et al. “Network deconvolution as a general method to distinguish direct dependencies in networks.” Nature Biotechnology. Online July 14, 2013. DOI: 10.1038/nbt.2635