Beyond base pairs: Two Broad cancer researchers on the study of structural variation in cancers
Why are researchers like yourselves so interested in structural variations?
Rameen Beroukhim: First, structural variants or rearrangements have a long history as important therapeutic targets. The first three targeted treatments in cancer (all-trans retinoic acid, trastuzumab, and imatinib) were against oncogenes that are activated by rearrangements.
Second, they can have massive, long-range effects in cancer cells. A single rearrangement in one spot can cause a DNA duplication covering 1,000 genes. Or, if it occurs in a non-coding regulatory region, it can affect a gene many megabases away.
Matthew Meyerson: When you compare a cancer cell and a normal cell, you find thousands of structural rearrangements. Nearly a third of the cancer cell’s genome is altered by copy number variations (CNVs) alone.
Beroukhim: Those combinations of rearrangements could be very important, because they can likely teach us a great deal about genome function. If two structural changes tend to co-occur, for instance, they’re probably synergistic. If two events never co-occur, they probably have the same functional outcome, just by different mechanisms.
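A common way to put numbers on "tend to co-occur" versus "never co-occur" is a 2×2 table of tumors with and without each event, tested with a one-sided Fisher's exact test. The minimal sketch below assumes every tumor is equally likely to carry either event; real analyses also have to correct for tumors that simply carry many more alterations than others, which this sketch ignores. The function names and cohort counts are hypothetical.

```python
from math import comb

def overlap_pmf(k: int, n_a: int, n_b: int, n: int) -> float:
    """Hypergeometric probability that exactly k of n tumors carry both
    events, if event A (in n_a tumors) and event B (in n_b tumors) are
    placed independently at random."""
    return comb(n_a, k) * comb(n - n_a, n_b - k) / comb(n, n_b)

def co_occurrence_tests(both: int, a_only: int, b_only: int, neither: int):
    """One-sided Fisher's exact tests on a 2x2 table of tumors.

    Returns (p_co_occurrence, p_mutual_exclusivity):
      p_co_occurrence:      P(overlap >= observed); small => events co-occur
      p_mutual_exclusivity: P(overlap <= observed); small => events avoid each other
    """
    n = both + a_only + b_only + neither
    n_a = both + a_only              # tumors with event A
    n_b = both + b_only              # tumors with event B
    k_min = max(0, n_a + n_b - n)    # smallest possible overlap
    k_max = min(n_a, n_b)            # largest possible overlap
    pmf = {k: overlap_pmf(k, n_a, n_b, n) for k in range(k_min, k_max + 1)}
    p_co = sum(p for k, p in pmf.items() if k >= both)
    p_excl = sum(p for k, p in pmf.items() if k <= both)
    return p_co, p_excl

# Hypothetical cohort of 100 tumors in which events A and B never co-occur.
p_co, p_excl = co_occurrence_tests(both=0, a_only=20, b_only=20, neither=60)
print(f"co-occurrence p = {p_co:.3g}, mutual exclusivity p = {p_excl:.3g}")
```

A small mutual-exclusivity p-value flags a pair worth following up as possibly redundant routes to the same outcome; on its own it does not establish the mechanism.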
What kinds of structural variants are important in cancers?
Meyerson: The first wave of variants that researchers noticed consisted of those that affect proteins in some clear way, such as translocations that create fusion genes or that bring strong promoters in front of oncogenes. Leukemias are in large part driven by fusion genes, as are many prostate cancers. The classic examples of promoter translocations are those that place antibody or T cell receptor gene promoters in front of oncogenes, as seen in many B and T cell leukemias and lymphomas.
The next wave consisted of variants that modulate gene regulation. These are new classes of alterations, such as enhancer hijacking or enhancer modulation, in which duplications or rearrangements of enhancers, or deletions of repressive elements, lead to gene activation.
Then there are the rearrangements whose cancer impact we don’t yet understand, including some very complex ones. There’s genome doubling, for instance, which frees up opportunities for other rearrangements to take place. There are complex rearrangements involving many small pieces of DNA, novel insertions, and chromothripsis. All of these can lead to gene activation or inactivation.
Are certain kinds of structural variation unique to cancer?
Meyerson: All of the types of structural events we see in cancer are also seen in inherited disorders, and they may have important but as yet unappreciated roles in evolutionary biology. I think cancers are a great model for finding and characterizing them, because they occur at a higher density there.
Beroukhim: What’s interesting is that some structural rearrangements seem to be unique to particular tumor types or subtypes. MYC amplifications, for example, occur across many tumor types, but specific rearrangements that activate MYC seem to occur only in single cancer types. My suspicion is that it has to do with whether a rearrangement brings a gene together with an enhancer that’s active in that cell type.
The upshot, though, is that to be able to detect a recurrent rearrangement you need to sequence whole genomes from many samples of a single cancer type or subtype. It’s not enough to look across many cancers.
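The sample-count side of this argument can be sketched with a binomial model: for a fixed number of sequenced genomes, a rearrangement that is common in one subtype becomes rare once that subtype is pooled with all the others. Everything here is a hypothetical simplification: the 10% subtype frequency, the 5% subtype share, and the rule that an event must be seen in at least five genomes to stand out. Real recurrence tests also have to model background breakpoints, and pooling mixes the signal with background from other tumor types, which this sketch leaves out.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial upper tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical scenario: a rearrangement present in 10% of one subtype,
# where that subtype makes up 5% of a pan-cancer cohort.
FREQ_IN_SUBTYPE = 0.10
SUBTYPE_SHARE = 0.05
POOLED_FREQ = FREQ_IN_SUBTYPE * SUBTYPE_SHARE  # 0.5% across all cancers
MIN_HITS = 5  # hypothetical rule: seen in >= 5 genomes before it stands out

for n_genomes in (50, 200, 1000):
    focused = prob_at_least(MIN_HITS, n_genomes, FREQ_IN_SUBTYPE)
    pooled = prob_at_least(MIN_HITS, n_genomes, POOLED_FREQ)
    print(f"{n_genomes:5d} genomes  one subtype: {focused:.2f}  pan-cancer: {pooled:.2f}")
```

Under these toy assumptions, 1,000 pan-cancer genomes give roughly the same chance of seeing the event five times as 50 genomes of the right subtype, which is one way to read "it's not enough to look across many cancers."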
How has technology evolved for studying structural variations in cancers?
Meyerson: Starting in the mid-20th century, cytogenetics and karyotyping allowed the detection of specific translocations in leukemias, which have relatively simple genomes. Expression profiling, RNA sequencing, and early efforts at whole genome sequencing in the 1990s and 2000s made it possible to study variation in breast, prostate, lung, and other epithelial cancers, which have much more complex genomes. Now the answer to everything is nucleotide sequencing.
Are there particular challenges to studying structural variations in cancer using sequence?
Beroukhim: The datasets we have available are, to a large extent, insufficient for interpreting rearrangements. A lot of effort has been put into sequencing cancer exomes, which let you significantly reduce costs while still getting a lot of information. But sequencing only exons misses functionally important rearrangements in introns and intergenic regions, which is where rearrangement breakpoints often fall.
Even if we look in the right parts of the genome, rearrangements screw up the genome in ways that can make it difficult to identify them from short sequence reads. We need to have long, reliable reads that span the borders of a rearrangement to detect and align them well.
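Short-read structural variant callers commonly look for two kinds of evidence at a rearrangement border: discordant read pairs and split reads. The minimal sketch below shows the first kind, labeling an aligned read pair by the rearrangement its geometry suggests. It assumes a standard paired-end library in which normal pairs face inward (+ then −) with a modest insert size; the ReadPair type, its fields, the 1,000-base cutoff, and the coordinates are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    chrom1: str
    pos1: int       # leftmost aligned position of mate 1
    strand1: str    # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str

def classify_pair(p: ReadPair, max_insert: int = 1000) -> str:
    """Label a paired-end alignment with the rearrangement type it suggests."""
    if p.chrom1 != p.chrom2:
        return "translocation"       # mates land on different chromosomes
    # Order mates by position so the orientation checks are symmetric.
    (l_pos, l_str), (r_pos, r_str) = sorted(
        [(p.pos1, p.strand1), (p.pos2, p.strand2)]
    )
    if l_str == r_str:
        return "inversion"           # both mates on the same strand
    if l_str == "-" and r_str == "+":
        return "duplication"         # outward-facing pair: tandem duplication
    if r_pos - l_pos > max_insert:
        return "deletion"            # inward-facing but too far apart
    return "concordant"

pairs = [
    ReadPair("chr1", 10_000, "+", "chr1", 10_350, "-"),
    ReadPair("chr1", 10_000, "+", "chr1", 250_000, "-"),
    ReadPair("chr1", 10_000, "+", "chr1", 10_400, "+"),
    ReadPair("chr1", 10_000, "-", "chr1", 10_400, "+"),
    ReadPair("chr9", 1_000_000, "+", "chr22", 2_000_000, "-"),
]
for pair in pairs:
    print(classify_pair(pair))
```

A single discordant pair is weak evidence, since misalignment in repeats produces the same patterns, so callers typically cluster many such pairs near the same border and then look for reads that cross the junction itself.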
Meyerson: That’s right. The main limitations of whole genome or exome sequencing to date have been imperfect mapping of complex rearrangements, and of rearrangements that cross repetitive elements. And those limitations point back to the challenges of assembling genomes and aligning sequence reads to a reference genome.
Beroukhim: Then there’s the question of interpretation. Remember, structural variations can have long-range effects, which can be incredibly difficult to interpret. If you have a 30-megabase variation covering hundreds of genes, which genes are the important ones? And how do you link an inversion to a gene millions of bases away whose regulation it affects?
And as Matthew pointed out, a third of the cancer cell’s genome is structurally altered. Which parts matter? Which affected genes are the drivers, and which are the passengers? Even if we had 100,000 perfectly described cancer samples, I anticipate it would be decades of work to tease apart what’s going on.
What’s being done to overcome these challenges?
Beroukhim: I think the best analytic approach to detecting rearrangements is to use algorithms that assemble short reads into longer contiguous sequences, or contigs, and ask whether those contigs contain rearrangements. You still have a mapping challenge, but that challenge gets easier the longer your contigs are.
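A toy illustration of the idea: reads are merged into a contig by greedy suffix-prefix overlap, and the contig is then checked for a junction joining two non-adjacent stretches of the reference. This is a deliberately naive stand-in; real assemblers (string-graph or de Bruijn based) and real aligners are far more sophisticated, and nothing here describes how any specific tool works. The greedy merge, exact-match lookup, minimum anchor length, and sequences are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of a equal to a prefix of b, or 0."""
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 4) -> str:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left to merge
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return max(reads, key=len)

def find_junction(contig: str, reference: str, min_anchor: int = 6):
    """Return (left_end, right_start) if the contig's prefix and remainder
    match the reference exactly at non-adjacent places, else None.
    Uses only the first exact match, so repeats are ignored here."""
    k = len(contig)
    while k >= min_anchor and contig[:k] not in reference:
        k -= 1
    if k < min_anchor or k == len(contig):
        return None  # unanchored, or the whole contig matches: no junction
    left_end = reference.find(contig[:k]) + k
    rest = contig[k:]
    if len(rest) < min_anchor or rest not in reference:
        return None
    right_start = reference.find(rest)
    if right_start == left_end:
        return None  # the two parts are adjacent: no junction
    return left_end, right_start

seg_a = "ACGTTGCAAGGCTTACCGAT"
seg_b = "CATTGACCTAGTCAAGTCCA"
reference = seg_a + "G" * 20 + seg_b  # toy reference
tumor = seg_a + seg_b                 # hypothetical allele with the G-run deleted
reads = [tumor[i:i + 16] for i in range(0, len(tumor) - 15, 8)]
contig = greedy_assemble(reads)
print(contig == tumor, find_junction(contig, reference))  # True (20, 40)
```

In this toy case, neither 16-base read that crosses the junction has enough sequence on both sides to anchor it (find_junction returns None for each), but the assembled contig does, which is the intuition behind longer contigs making the mapping easier.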
Meyerson: Read length is really important. Approaches that give longer reads, whether they be synthetic linked reads or truly longer reads, would definitely increase our ability to uncover structural rearrangements.
But we also need improved read depth and improved computational or analytic methods. I think moving from alignment- to assembly-based methods for variation detection is going to be extremely important. Because they assume no reference genome, assembly methods are really good for complex rearrangements.
I’m particularly excited about a method called Snowman being developed by Jeremiah Wala, a student in Rameen’s lab, that looks for cancer rearrangements based on local assembly.
Beroukhim: Interpreting variations is another thing. GISTIC, which Gaddy Getz and I developed, was one of the first methods for identifying oncogenic CNVs. We’re trying to improve on it so we can start to understand how the genes directly affected by a duplication or deletion contribute, or not, to a cancer.
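The core intuition behind GISTIC-style analysis is to score each locus by both how often and how strongly it is altered across tumors, then compare that score with a null model in which alterations land in random places. The sketch below is a heavily simplified illustration of that intuition, not GISTIC itself: real GISTIC differs in its null model, multiple-testing correction, and definition of peaks. The gain threshold, the circular-shift null, the max-statistic correction, and the toy cohort are assumptions made for illustration.

```python
import random

def g_scores(profiles: list[list[float]], threshold: float = 0.1) -> list[float]:
    """Score each locus by frequency x mean amplitude of gains above threshold.

    profiles[s][i] is a log2 copy-number ratio for tumor s at locus i.
    Summing the above-threshold amplitudes and dividing by the number of
    tumors equals (fraction of tumors gained) x (mean gain amplitude).
    """
    n_samples = len(profiles)
    return [
        sum(p[i] for p in profiles if p[i] > threshold) / n_samples
        for i in range(len(profiles[0]))
    ]

def permutation_pvalues(profiles, n_perm=1000, threshold=0.1, seed=0):
    """Empirical p-values against a null in which each tumor's profile is
    circularly shifted by a random offset, scrambling where its alterations
    fall while keeping its overall alteration burden. Comparing each locus
    with the genome-wide maximum of every null replicate is a simple,
    conservative correction for testing many loci."""
    rng = random.Random(seed)
    observed = g_scores(profiles, threshold)
    n_loci = len(observed)
    exceed = [0] * n_loci
    for _ in range(n_perm):
        shifted = []
        for p in profiles:
            k = rng.randrange(n_loci)
            shifted.append(p[k:] + p[:k])
        null_max = max(g_scores(shifted, threshold))
        for i, score in enumerate(observed):
            if null_max >= score:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]

# Hypothetical cohort: 20 tumors x 50 loci. Each tumor carries one broad
# passenger gain at a random position; 12 tumors share a focal gain at 30-32.
rng = random.Random(1)
profiles = []
for s in range(20):
    prof = [0.0] * 50
    start = rng.randrange(45)
    for i in range(start, start + 5):
        prof[i] = 0.5
    if s < 12:
        for i in range(30, 33):
            prof[i] = 1.0
    profiles.append(prof)

scores = g_scores(profiles)
pvals = permutation_pvalues(profiles, n_perm=200)
print([i for i, p in enumerate(pvals) if p < 0.05])
```

Note that a significant locus still spans three genes in this toy example; the harder question raised above, which of the genes in a significant region actually contributes to the cancer, is not something this score can answer.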
But there haven’t been good algorithms for finding recurrent rearrangements such as inversions or translocations. We’re now developing algorithms that do this by scanning for genomic regions that are broken or fused more often than would be expected by chance.
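One simple way to frame "broken more often than expected by chance" is to divide the genome into bins, count how many tumors have a breakpoint in each bin, and compare each count with a Poisson background. The sketch below assumes a uniform background rate, which is a strong simplification: real breakpoint density varies along the genome, and methods built for this purpose model that variation. The bin size, the Bonferroni cutoff, and the simulated breakpoints are hypothetical, and this is not the authors' method.

```python
import random
from collections import Counter
from math import exp, factorial

def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(exp(-lam) * lam**i / factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def recurrent_bins(breakpoints, genome_length, bin_size, alpha=0.05):
    """Return (bin, n_tumors, p) for bins broken in more tumors than a
    uniform Poisson background predicts, after Bonferroni correction.

    breakpoints: iterable of (sample_id, position). Each tumor counts at
    most once per bin, so one heavily rearranged genome cannot make a bin
    look recurrent on its own.
    """
    n_bins = -(-genome_length // bin_size)       # ceiling division
    hits = {(sample, pos // bin_size) for sample, pos in breakpoints}
    counts = Counter(b for _, b in hits)
    lam = len(hits) / n_bins                     # expected tumor-hits per bin
    cutoff = alpha / n_bins                      # Bonferroni correction
    flagged = []
    for b, c in sorted(counts.items()):
        p = poisson_sf(c, lam)
        if p < cutoff:
            flagged.append((b, c, p))
    return flagged

# Hypothetical data: 40 tumors with 5 random breakpoints each, plus a
# planted hotspot in bin 37 shared by 10 tumors.
rng = random.Random(0)
genome_length, bin_size = 10_000_000, 100_000
breakpoints = [(s, rng.randrange(genome_length)) for s in range(40) for _ in range(5)]
breakpoints += [(s, 3_750_000 + rng.randrange(50_000)) for s in range(10)]
flagged = recurrent_bins(breakpoints, genome_length, bin_size)
for b, c, p in flagged:
    print(f"bin {b}: broken in {c} tumors, p = {p:.2g}")
```

The same counting idea extends to fusions by counting pairs of bins joined in the same tumor, at the cost of a much larger number of tests.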
In the long run, the best algorithms should take into account all of the possible effects of all of these alterations in combination. That’s an incredibly complex problem to solve.