Cancer Program Publications

Craig Mermel, Steven Schumacher, Barbara Hill, Matthew Meyerson, Rameen Beroukhim, and Gad Getz

Mermel, C. H., Schumacher, S. E., Hill, B., Meyerson, M. L., Beroukhim, R., and Getz, G. (2011). GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol, 12(4), R41. doi: 10.1186/gb-2011-12-4-r41

We describe methods with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, we improve the estimation of background rates for each category. We additionally describe a probabilistic method for defining the boundaries of selected-for SCNA regions with user-defined confidence. Here we detail this revised computational approach, GISTIC2.0, and validate its performance in real and simulated datasets.

Yujin Hoshida

PLoS ONE 5(11): e15543

Gene-expression signature-based disease classification and clinical outcome prediction has not been widely introduced in clinical medicine as initially expected, mainly due to the lack of extensive validation needed for its clinical deployment. Obstacles include variable measurement in microarray assay, inconsistent assay platform, analytical requirement for comparable pair of training and test datasets, etc. Furthermore, as medical device helping clinical decision making, the prediction needs to be made for each single patient with a measure of its reliability. To address these issues, there is a need for flexible prediction method less sensitive to difference in experimental and analytical conditions, applicable to each single patient, and providing measure of prediction confidence. The nearest template prediction (NTP) method provides a convenient way to make class prediction with assessment of prediction confidence computed in each single patient's gene-expression data using only a list of signature genes and a test dataset. We demonstrate that the method can be flexibly applied to cross-platform, cross-species, and multiclass predictions without any optimization of analysis parameters.
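
A minimal sketch of a template-based, single-sample class prediction is given here for illustration. The abstract does not spell out the internals, so the cosine distance, the 0/1 class templates, the random-gene resampling used for the nominal p-value, and all function and parameter names are assumptions rather than the published NTP implementation.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as uncorrelated
    return 1.0 - dot / (norm_u * norm_v)


def predict_with_templates(sample, signature, n_perm=1000, seed=0):
    """Predict a class for one sample and estimate a nominal p-value.

    sample:    dict gene -> expression value (assumed already standardized)
    signature: dict gene -> class label that the gene marks (hypothetical format)
    """
    rng = random.Random(seed)
    genes = [g for g in signature if g in sample]
    classes = sorted(set(signature[g] for g in genes))
    values = [sample[g] for g in genes]

    # One template per class: 1 for that class's marker genes, 0 otherwise.
    templates = {
        c: [1.0 if signature[g] == c else 0.0 for g in genes] for c in classes
    }
    distances = {c: cosine_distance(values, templates[c]) for c in classes}
    best = min(classes, key=lambda c: distances[c])

    # Null distribution (assumption): distance to the winning template when
    # the same number of values is drawn at random from the whole sample.
    all_values = list(sample.values())
    hits = 0
    for _ in range(n_perm):
        random_values = rng.sample(all_values, len(genes))
        if cosine_distance(random_values, templates[best]) <= distances[best]:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return best, distances[best], p_value
```

Only one sample and a list of signature genes are needed as input, which mirrors the property the abstract emphasizes; no training dataset is used.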

Bjorn Nilsson, Mikael Johansson, Fatima Al-Shahrour, Anne E. Carpenter, Benjamin L. Ebert

Bioinformatics (2009), in press.

Motivation: Multimillion-probe microarrays allow detection of gains and losses of chromosomal material at unprecedented resolution. However, the data generated by these arrays are several-fold larger than data from earlier platforms, creating a need for efficient analysis tools that scale robustly with data size. Results: We developed a new aberration caller, Ultrasome, that delineates genomic changes-of-interest with dramatically improved efficiency. Ultrasome shows near-linear computational complexity and processes latest-generation copy number arrays about 10,000 times faster than standard methods with preserved analytic accuracy. Availability: www.broad.mit.edu/ultrasome.

Derek Y. Chiang, Gad Getz, David B. Jaffe, Michael J.T. O'Kelly, Xiaojun Zhao, Scott L. Carter, Carsten Russ, Chad Nusbaum, Matthew Meyerson, Eric S. Lander

Nature Methods 6:99-103

Cancer results from somatic alterations in key genes, including point mutations, copy number alterations and structural rearrangements. A powerful way to discover cancer-causing genes is to identify genomic regions that show recurrent copy-number alterations (gains and losses) in tumor genomes. Recent advances in sequencing technologies suggest that massively parallel sequencing may provide a feasible alternative to DNA microarrays for detecting copy-number alterations. Here, we present: (i) a statistical analysis of the power to detect copy-number alterations of a given size; (ii) SegSeq, an algorithm to identify chromosomal breakpoints using massively parallel sequence data; and (iii) analysis of experimental data from three matched pairs of tumor and normal cell lines. We show that a collection of ~14 million aligned sequence reads from human cell lines has comparable power to detect events as the current generation of DNA microarrays and has over two-fold better precision for localizing breakpoints (typically, to within ~1 kb).

Pravin J. Mishra, Prasun J. Mishra, Rita Humeniuk, Daniel J. Medina, Gabriela Alexe, Jill P. Mesirov, Sridhar Ganesan, John W. Glod and Debabrata Banerjee

Cancer Res 2008;68(11):4331-9

Carcinoma-associated fibroblasts (CAF) have recently been implicated in important aspects of epithelial solid tumor biology, such as neoplastic progression, tumor growth, angiogenesis, and metastasis. However, neither the source of CAFs nor the differences between CAFs and fibroblasts from nonneoplastic tissue have been well defined. In this study, we show that human bone marrow-derived mesenchymal stem cells (hMSCs) exposed to tumor-conditioned medium (TCM) over a prolonged period of time assume a CAF-like myofibroblastic phenotype. More importantly, these cells exhibit functional properties of CAFs, including sustained expression of stromal-derived factor-1 (SDF-1) and the ability to promote tumor cell growth both in vitro and in an in vivo coimplantation model, and expression of myofibroblast markers, including alpha-smooth muscle actin and fibroblast surface protein. hMSCs induced to differentiate to a myofibroblast-like phenotype using 5-azacytidine do not promote tumor cell growth as efficiently as hMSCs cultured in TCM nor do they show increased SDF-1 expression. Furthermore, gene expression profiling revealed similarities between TCM-exposed hMSCs and CAFs. Taken together, these data suggest that hMSCs are a source of CAFs and can be used in the modeling of tumor-stroma interactions. To our knowledge, this is the first report showing that hMSCs become activated and resemble carcinoma-associated myofibroblasts on prolonged exposure to conditioned medium from MDAMB231 human breast cancer cells.

Yujin Hoshida, Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, Jill P. Mesirov

PLoS ONE 2(11): e1195, 2007

Whole genome expression profiles are widely used to discover molecular subtypes of diseases. A remaining challenge is to identify the correspondence or commonality of subtypes found in multiple, independent data sets generated on various platforms. While model-based supervised learning is often used to make these connections, the models can be biased to the training data set and thus miss inherent, relevant substructure in the test data. Here we describe an unsupervised subclass mapping method (SubMap), which reveals common subtypes between independent data sets. The subtypes within a data set can be determined by unsupervised clustering or given by predetermined phenotypes before applying SubMap. We define a measure of correspondence for subtypes and evaluate its significance building on our previous work on gene set enrichment analysis. The strength of the SubMap method is that it does not impose the structure of one data set upon another, but rather uses a bi-directional approach to highlight the common substructures in both. We show how this method can reveal the correspondence between several cancer-related data sets. Notably, it identifies common subtypes of breast cancer associated with estrogen receptor status, and a subgroup of lymphoma patients who share similar survival patterns, thus improving the accuracy of a clinical outcome predictor.

Pablo Tamayo, Daniel Scanfeld, Benjamin L. Ebert, Michael A. Gillette, Charles W. M. Roberts, and Jill P. Mesirov.

Proc. Natl. Acad. Sci. USA, 104: 5959-5964

The high dimensionality of global transcription profiles, the expression level of 20,000 genes in a much smaller number of samples, presents challenges that affect the sensitivity and general applicability of analysis results. In principle, it would be better to describe the data in terms of a small number of metagenes, positive linear combinations of genes, which could reduce noise while still capturing the invariant biological features of the data. Here we describe how to accomplish such a reduction in dimension by a metagene projection methodology, which can greatly reduce the number of features used to characterize microarray data. We show, in applications to the analysis of leukemia, lung cancer, and central nervous system tumor data sets, how this approach can help assess and interpret similarities and differences between independent data sets, enable cross-platform and cross-species analysis, improve clustering and class prediction, and provide a computational means for detecting and removing sample contamination.

Joshua Gould, Gad Getz, Stefano Monti, Michael Reich, Jill P. Mesirov

Bioinformatics 22(15): 1924-1925, 2006.

An important step in analyzing expression profiles from microarray data is to identify genes that can discriminate between distinct classes of samples. Many statistical approaches for assigning significance values to genes have been developed. The Comparative Marker Selection suite consists of three modules that allow users to apply and compare different methods of computing significance for each marker gene, a viewer to assess the results, and a tool to create derivative datasets and marker lists based on user-defined significance criteria.

2006.04.30

Michael Reich, Ted Liefeld, Joshua Gould, Jim Lerner, Pablo Tamayo & Jill P Mesirov

Nat. Genet. 38 no. 5 (2006): pp500-501 doi:10.1038/ng0506-500

Whole-genome expression profiling has created a revolution in the way we study disease and basic biology. Since 1997, the number of published results based on an analysis of gene expression microarray data has grown from 30 to over 5,000 publications per year. Sophisticated mathematical methods have been developed for use in patient diagnosis and prognosis, identification of new drug targets and understanding biological mechanisms. However, these tools are often out of the direct reach of the biomedical researchers who can so critically benefit from them because they can be difficult to understand and use correctly. This challenge is even more relevant in the context of "integrative" approaches, where a multitude of data sources and methods are combined in the analysis of a single problem.

To address this challenge in genomics research we have developed a software package called GenePattern, which provides a comprehensive environment that can support (i) a broad community of users at all levels of computational experience and sophistication, (ii) access to a repository of analytic and visualization tools and the easy creation of complex analytic methods from them, and (iii) the rapid development and dissemination of new methods. Perhaps the most important feature of GenePattern is that it supports a mechanism to guarantee the capture and independent replication of published computational methods and in silico results.

Martin, S., Hohmann, MM, Liefeld, T.

Drug Discovery Today. 2005 Nov 15;10(22):1566-72

Since the Life Science Identifier (LSID) data identification and access standard made its official debut in late 2004, several organizations have begun to use LSIDs to simplify the methods used to uniquely name, reference and retrieve distributed data objects and concepts. In this review, the authors build on introductory work that describes the LSID standard by documenting how five early adopters have incorporated the standard into their technology infrastructure and by outlining several common misconceptions and difficulties related to LSID use, including the impact of the byte identity requirement for LSID-identified objects and the opacity recommendation for use of the LSID syntax. The review describes several shortcomings of the LSID standard, such as the lack of a specific metadata standard, along with solutions that could be addressed in future revisions of the specification.

Aravind Subramanian, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, Scott L. Pomeroy, Todd R. Golub, Eric S. Lander, and Jill P. Mesirov.

No citation available

Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
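
To make the gene-set idea concrete, here is a minimal sketch of a running-sum enrichment score over a ranked gene list. It uses an unweighted, Kolmogorov-Smirnov-style statistic as a simplifying assumption; the function name and inputs are hypothetical, and this is not the released GSEA software, which also handles weighting, permutation testing, and multiple-hypothesis correction.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a ranked gene list.

    ranked_genes: list of unique gene names, most up-regulated first
    gene_set:     iterable of gene names forming the set of interest
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    up = 1.0 / n_hit      # step taken when a set member is encountered
    down = 1.0 / n_miss   # step taken otherwise
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += up if gene in members else -down
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best
```

A set whose members cluster at the top of the ranking scores close to +1, and one whose members cluster at the bottom scores close to -1.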

Ted Liefeld, Michael Reich, Josh Gould, Peili Zhang, Pablo Tamayo, and Jill P. Mesirov

Bioinformatics, doi:10.1093/bioinformatics/bti587

GeneCruiser is a web service allowing users to annotate their genomic data by mapping microarray feature identifiers to gene identifiers from databases such as UniGene, while providing links to web resources such as the UCSC Genome Browser. It relies on a regularly updated database that retrieves and indexes the mappings between microarray probes and genomic databases. Genes are identified using the Life Sciences Identifier standard.

Jean-Philippe Brunet, Pablo Tamayo, Todd Golub, Jill Mesirov

Proc. Natl. Acad. Sci. USA 2004 101: 4164-4169

The ability to generate large amounts of genomic information using DNA microarrays provides an opportunity to extract from these data previously unrecognized biological structure and meaning. The challenge, however, is that existing unsupervised clustering methods are often non-robust, and lack the ability to discover subtle, context-dependent biological patterns. We describe here the use of Non-negative Matrix Factorization (NMF), an algorithm based on decomposition-by-parts, and we demonstrate its ability to recover meaningful biological information from cancer-related microarray data without supervision. Coupled with a novel model selection mechanism, NMF is an efficient method for identification of distinct molecular patterns and provides a powerful method for class discovery. NMF appears to have higher resolution than other methods such as hierarchical clustering or self-organizing maps, and to be less sensitive to a priori selection of genes. Rather than separating gene clusters based on distance computation, NMF detects context-dependent patterns of gene expression in complex biological systems. This ability, similar to semantic polysemy in text, provides a general method for robust molecular pattern discovery.
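
The factorization can be sketched in a few lines. The version below uses the standard multiplicative updates for the squared-error objective and assigns each sample to the metagene with the largest coefficient; the choice of objective, the iteration count, and all names are assumptions for illustration, and the model selection step mentioned in the abstract is not shown.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (genes x samples) as W (genes x k) times H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def assign_clusters(h):
    """Assign each sample (column of H) to its largest metagene coefficient."""
    return [max(range(len(h)), key=lambda i: h[i][j]) for j in range(len(h[0]))]
```

Because every entry of W and H stays non-negative, each metagene can only add expression to a sample, which is what gives the factors their parts-based reading.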

Clark T., Martin S., Liefeld T.

Briefings in Bioinformatics 5.1:59-70

The World-Wide Web provides a globally distributed communication framework that is essential for almost all scientific collaboration, including bioinformatics. However, several limits and inadequacies have become apparent, one of which is the inability to programmatically identify locally named objects that may be widely distributed over the network. This shortcoming limits our ability to integrate multiple knowledgebases, each of which gives partial information of a shared domain, as is commonly seen in bioinformatics. The Life Science Identifier (LSID) and LSID Resolution System (LSRS) provide simple and elegant solutions to this problem, based on the extension of existing internet technologies. LSID and LSRS are consistent with next-generation semantic web and semantic grid approaches. This article describes the syntax, operations, infrastructure compatibility considerations, use cases and potential future applications of LSID and LSRS. We see the adoption of these methods as important steps toward simpler, more elegant and more reliable integration of the world's biological knowledgebases, and as facilitating stronger global collaboration in biology.

M. Reich, K. Ohm, P. Tamayo, M. Angelo, J.P. Mesirov

Bioinformatics 2004; doi:10.1093/bioinformatics/bth138

GeneCluster 2.0 is a software package for analyzing gene expression and other bioarray data, giving users a variety of methods to build and evaluate class predictors, visualize marker lists, cluster data, and validate results. GeneCluster 2.0 greatly expands the data analysis capabilities of GeneCluster 1.0 by adding classification, class discovery, and permutation test methods. It includes algorithms for building and testing supervised models using weighted voting (WV) and k-nearest neighbors (kNN) algorithms, a module for systematically finding and evaluating clustering via self-organizing maps (SOM), and modules for marker gene selection and heat map visualization that allow users to view and sort samples and genes by many criteria. GeneCluster 2.0 is a standalone Java application and runs on any platform that supports the Java Runtime Environment version 1.3.1 or greater.

Sayan Mukherjee, Pablo Tamayo, Simon Rogers, Ryan Rifkin, Anna Engle, Colin Campbell, Todd R. Golub and Jill P. Mesirov.

J. of Comp. Biol. vol 10, n2, p119-142 (2003)

A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced. The goal is to use existing classification results to estimate dataset size requirements for future classification experiments and to evaluate the gain in accuracy and significance of classifiers built with additional data. The method is based on fitting inverse power-law models to construct empirical learning curves. It also includes a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. This procedure is applied to several molecular classification problems representing a broad spectrum of levels of complexity.
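
As an illustration of fitting an inverse power-law learning curve, the sketch below assumes the form error(n) = a * n^(-alpha) + b and, to keep the fit a simple linear regression in log-log space, treats the asymptotic error b as known. That simplification and the function names are assumptions; the abstract does not describe the fitting procedure or the permutation test in detail.

```python
import math


def fit_inverse_power_law(sizes, errors, asymptote=0.0):
    """Fit error(n) = a * n**(-alpha) + b with b fixed to `asymptote`.

    Uses ordinary least squares on log(error - b) versus log(n).
    Returns the pair (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e - asymptote) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_error(n, a, alpha, asymptote=0.0):
    """Extrapolate the fitted curve to a training set of size n."""
    return a * n ** (-alpha) + asymptote
```

Once a and alpha are estimated from existing experiments, predict_error can be evaluated at a larger n to judge how much accuracy additional samples might buy.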

Stefano Monti, Pablo Tamayo, Jill Mesirov, and Todd Golub

Machine Learning Journal, 52(1-2):91-118, 2003.

In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides for a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.
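
The consensus across runs can be summarized as a pairwise matrix. The sketch below assumes each run clusters a random subsample of the items and records, for every pair, the fraction of runs containing both items in which they landed in the same cluster. The data layout and names are hypothetical, and the clustering algorithm itself is left out because any of the ones listed above could supply the labels.

```python
from itertools import combinations


def consensus_matrix(items, runs):
    """Build a consensus matrix from clustering runs on resampled data.

    items: list of item identifiers
    runs:  list of dicts mapping each sampled item to its cluster label
    Returns a dict (i, j) -> fraction of runs, among those that sampled
    both i and j, in which the two items shared a cluster.
    """
    pairs = list(combinations(items, 2))
    together = {pair: 0 for pair in pairs}
    sampled = {pair: 0 for pair in pairs}
    for labels in runs:
        for i, j in pairs:
            if i in labels and j in labels:
                sampled[(i, j)] += 1
                if labels[i] == labels[j]:
                    together[(i, j)] += 1
    return {
        pair: together[pair] / sampled[pair] if sampled[pair] else float("nan")
        for pair in pairs
    }
```

Entries near 0 or 1 indicate stable assignments; many intermediate values suggest that the chosen number of clusters is not well supported by the data.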

Ryan Rifkin, Sayan Mukherjee, Pablo Tamayo, Sridhar Ramaswamy, Chen-Hsiang Yeang, Michael Angelo, Michael Reich, Tomaso Poggio, Eric S. Lander, Todd R. Golub and Jill P. Mesirov.

SIAM Review, vol. 45, number 4, pp. 706-723 (2003).
Copyright © 2003 by Society for Industrial and Applied Mathematics

Modern cancer treatment relies upon clinical judgment and microscopic tissue examination to classify tumors according to anatomical site of origin. This approach is effective but subjective and variable even among experienced clinicians and pathologists. Recently, DNA microarray-generated gene expression data has been used to build molecular cancer classifiers. Previous work from our group and others demonstrated methods for solving pair-wise classification problems using such global gene expression patterns. However, classification across multiple primary tumor classes poses new methodological and computational challenges. In this paper we describe a computational methodology for multi-class prediction that combines class specific (one vs. all) binary Support Vector Machines. We apply this methodology to the diagnosis of multiple common adult malignancies using DNA microarray data from a collection of 198 tumor samples, spanning 14 of the most common tumor types. Overall classification accuracy is 78%, far exceeding the expected accuracy for random classification. In a large subset of the samples (80%), the algorithm attains 90% accuracy. The methodology described in this paper both demonstrates that accurate gene expression-based multi-class cancer diagnosis is possible and highlights some of the analytic challenges inherent in applying such strategies to biomedical research.
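
The one-vs-all combination step can be sketched as follows: each class has its own binary scorer, and the predicted class is the one whose scorer returns the largest value. The linear scorers here are stand-ins for trained Support Vector Machines, and the use of the gap between the top two scores as a rough confidence is an assumption for illustration; all names are hypothetical.

```python
def linear_scorer(weights, bias):
    """Stand-in for a trained binary SVM: returns w.x + b for an input x."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


def one_vs_all_predict(x, scorers):
    """Return (label, margin) from class-specific binary scorers.

    scorers: dict label -> function mapping a sample to a real-valued score,
             where larger means "more likely this class than all others"
    margin:  gap between the best and second-best scores
    """
    scores = {label: score(x) for label, score in scorers.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    if len(ranked) > 1:
        margin = scores[best] - scores[ranked[1]]
    else:
        margin = float("inf")
    return best, margin
```

Restricting predictions to samples with a large margin is one way to trade coverage for accuracy, in the spirit of the higher accuracy reported on a subset of the samples.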

Alena A. Antipova, Pablo Tamayo, and Todd R. Golub

Genome Biology 2002, 3(12):research0073.1-0073.4

Background:
One of the factors limiting the number of genes analyzable on high density oligonucleotide arrays is that each transcript is probed by multiple oligonucleotide probes of distinct sequence in order to magnify the sensitivity and specificity of detection. Over the years, the number of probes per gene has decreased, but still no single array for the entire human genome has been reported. To reduce the number of probes required for each gene, a robust systematic approach for choosing the most representative probes is needed. Here, we introduce a generalizable empiric method for reducing the number of probes per gene while maximizing the fidelity to the original array design.

Results:
The methodology has been tested on a dataset comprised of 317 Affymetrix HuGeneFL GeneChips. The performance of the original and reduced probe sets was compared in four cancer classification problems. The results of these comparisons demonstrate that the reduction of the probe set by 95% does not dramatically affect performance, and thus illustrate the feasibility of substantially reducing probe numbers without significantly compromising sensitivity and specificity of detection.

Conclusions:
The strategy described here is potentially useful for designing small, limited-probe genome-wide arrays for screening applications.

Sridhar Ramaswamy, Pablo Tamayo, Ryan Rifkin, Sayan Mukherjee, Chen-Hsiang Yeang, Michael Angelo, Christine Ladd, Michael Reich, Eva Latulippe, Jill P. Mesirov, Tomaso Poggio, William Gerald, Massimo Loda, Eric S. Lander, Todd R. Golub

PNAS 98: 15149-15154

The optimal treatment of cancer patients depends on establishing accurate diagnoses using a complex combination of clinical and histopathologic data. In some instances this is difficult or impossible due to atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multi-class classifier based on a Support Vector Machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared to their well-differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multi-class molecular cancer classification, and suggest a strategy for future clinical implementation of molecular cancer diagnostics.

Chen-Hsiang Yeang, Sridhar Ramaswamy, Pablo Tamayo, Sayan Mukherjee, Ryan M. Rifkin, Michael Angelo, Michael Reich, Eric Lander, Jill P. Mesirov, and Todd Golub

Bioinformatics 17(Suppl. 1):S316-S322. 2001

Using gene expression data to classify tumor types is a very promising tool in cancer diagnosis. Previous works show several pairs of tumor types can be successfully distinguished by their gene expression patterns. However, the simultaneous classification across a heterogeneous set of tumor types has not been well studied yet. We obtained 190 samples from 14 tumor classes and generated a combined expression dataset containing 16063 genes for each of those samples. We performed multi-class classification by combining the outputs of binary classifiers. Three binary classifiers (k-nearest neighbors, weighted voting, and support vector machines) were applied in conjunction with three combination scenarios (one-vs-all, all-pairs, hierarchical partitioning). We achieved the best cross validation error rate of 18.75% and the best test error rate of 21.74% by using the one-vs-all support vector machine algorithm. The results demonstrate the feasibility of performing clinically useful classification from samples of multiple tumor types.

Pablo Tamayo, Donna Slonim, Jill Mesirov, Qing Zhu, Sutisak Kitareewan, Ethan Dmitrovsky, Eric S. Lander, and Todd R. Golub.

Proc. Natl. Acad. Sci. USA 96:2907-2912.

Array technologies have made it straightforward to monitor simultaneously the expression pattern of thousands of genes. The challenge now is to interpret such massive data sets. The first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of self-organizing maps, a type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data. The method has been implemented in a publicly available computer package, GENECLUSTER, that performs the analytical calculations and provides easy data visualization. To illustrate the value of such analysis, the approach is applied to hematopoietic differentiation in four well studied models (HL-60, U937, Jurkat, and NB4 cells). Expression patterns of some 6,000 human genes were assayed, and an online database was created. GENECLUSTER was used to organize the genes into biologically relevant clusters that suggest novel hypotheses about hematopoietic differentiation, for example, highlighting certain genes and pathways involved in "differentiation therapy" used in the treatment of acute promyelocytic leukemia.

Rameen Beroukhim, Gad Getz, Leia Nghiemphu, Jordi Barretina, Teli Hsueh, David Linhart, Igor Vivanco, Jeffrey C. Lee, Julie H. Huang, Sethu Alexander, Jinyan Du, Tweeny Kau, Roman K. Thomas, Kinjal Shah, Horacio Soto, Sven Perner, John Prensner, Ralph M. Debiasi, Francesca Demichelis, Charlie Hatton, Mark A. Rubin, Levi A. Garraway, Stan F. Nelson, Linda Liau, Paul Mischel, Tim F. Cloughesy, Matthew Meyerson, Todd R. Golub, Eric S. Lander, Ingo K. Mellinghoff, William R. Sellers

Proc Natl Acad Sci U S A. 2007 Dec 11;104(50):20007-12

Comprehensive knowledge of the genomic alterations that underlie cancer is a critical foundation for diagnostics, prognostics and targeted therapeutics. Analyses of chromosomal aberrations are hampered by the lack of a statistical framework to distinguish meaningful events from random background aberrations. Here, we describe a systematic method called Genomic Identification of Significant Targets in Cancer (GISTIC). We use it to study chromosomal aberrations in 141 gliomas and compare the results with two prior studies. Traditional methods show little concordance between these studies and highlight hundreds of altered regions. The new approach reveals a highly concordant picture involving ~35 significant events, including 16-18 broad events near chromosome-arm size and 16-21 focal events. About half of these events correspond to known cancer-related genes, only some of which have been previously tied to glioma. We also show that superimposed broad and focal events need not have the same target. Specifically, gliomas with broad amplification of chromosome 7 have different properties than those with overlapping focal EGFR amplification: the broad events act in part through effects on MET and its ligand HGF and correlate with MET dependence in vitro. Our results support the feasibility and utility of systematic characterization of the cancer genome.

Michael S. Isakoff, Courtney G. Sansam, Pablo Tamayo, Aravind Subramanian, Julia A. Evans, Christine M. Fillmore, Xi Wang, Jaclyn A. Biegel, Scott L. Pomeroy, Jill P. Mesirov, Charles W. M. Roberts

Published online before print November 21, 2005, 10.1073/pnas.0509014102

Snf5 (Ini1/Baf47/Smarcb1), a core member of the Swi/Snf chromatin remodeling complex, is a potent tumor suppressor whose mechanism of action is largely unknown. Biallelic loss of Snf5 leads to the onset of aggressive cancers in both humans and mice. We have developed an innovative and widely applicable analytical technique for cross-species validation of cancer models and show that the gene expression profiles of our Snf5 murine models closely resemble those of human Snf5-deficient rhabdoid tumors. We exploit this system to produce what we believe to be the first report documenting the effects on gene expression of inactivating a Swi/Snf subunit in normal mammalian cells and to identify the transcriptional pathways regulated by Snf5. We demonstrate that the tumor suppressor activity of Snf5 depends on its regulation of cell cycle progression; Snf5 inactivation leads to aberrant up-regulation of E2F targets and increased levels of p53 that are accompanied by apoptosis, polyploidy, and growth arrest. Further, conditional mouse models demonstrate that inactivation of p16Ink4a or Rb (retinoblastoma) does not accelerate tumor formation in Snf5 conditional mice, whereas mutation of p53 leads to a dramatic acceleration of tumor formation.

Catherine L. Nutt, D. R. Mani, Rebecca A. Betensky, Pablo Tamayo, J. Gregory Cairncross, Christine Ladd, Ute Pohl, Christian Hartmann, Margaret E. McLaughlin, Tracy T. Batchelor, Peter M. Black, Andreas von Deimling, Scott L. Pomeroy, Todd R. Golub, and David N. Louis

Cancer Research 63(7):1602-1607

In modern clinical neuro-oncology, histopathological diagnosis affects therapeutic decisions and prognostic estimation more than any other variable. Among high grade gliomas, for example, histologically classic glioblastomas and anaplastic oligodendrogliomas follow markedly different clinical courses. Unfortunately, many malignant gliomas are diagnostically challenging; these non-classic lesions are difficult to classify by histological features, generating considerable interobserver variability and limited diagnostic reproducibility. The resulting tentative pathological diagnoses create significant clinical confusion. We investigated whether gene expression profiling, coupled with class prediction methodology, could be used to classify high grade gliomas in a manner more objective, explicit and consistent than standard pathology. Microarray analysis was used to determine the expression of approximately 12,000 genes in a set of 50 gliomas: 28 glioblastomas and 22 anaplastic oligodendrogliomas. Supervised learning approaches were used to build a two-class prediction model based on a subset of 14 glioblastomas and 7 anaplastic oligodendrogliomas with classic histology. A 20-feature k-nearest neighbor model correctly classified 18 out of the 21 classic cases in leave-one-out cross validation when compared to pathological diagnoses. This model was then used to predict the classification of clinically common, histologically non-classic samples. When tumors were classified according to pathology, the survival of patients with non-classic glioblastoma and non-classic anaplastic oligodendroglioma was not significantly different (p=0.19). However, class distinctions according to the model were significantly associated with survival outcome (p=0.05). This class prediction model was capable of classifying high grade, non-classic glial tumors objectively and reproducibly. Moreover, the model provided a more accurate predictor of prognosis in these non-classic lesions than did pathological classification. These data suggest that class prediction models, based on defined molecular profiles, classify diagnostically challenging malignant gliomas in a manner that better correlates with clinical outcome than does standard pathology.

Scott L. Pomeroy, Pablo Tamayo, Michelle Gaasenbeek, Lisa M. Sturla, Michael Angelo, Margaret E. McLaughlin, John Y.H. Kim, Liliana C. Goumnerova, Peter McL. Black, Ching Lau, Jeffrey C. Allen, David Zagzag, James M. Olson, Tom Curran, Cynthia Wetmore, Jaclyn A. Biegel, Tomaso Poggio, Sayan Mukherjee, Ryan Rifkin, Andrea Califano, Gustavo Stolovitzky, David N. Louis, Jill P. Mesirov, Eric S. Lander and Todd R. Golub

Nature, Vol 415, 24

Embryonal tumors of the central nervous system (CNS) represent a heterogeneous group of tumors about which little is known biologically, and whose diagnosis, based on morphologic appearance alone, is controversial. Medulloblastomas, for example, are the most common malignant brain tumor of childhood, but their pathogenesis is unknown, their relationship to other embryonal CNS tumors is debated, and patients' response to therapy is difficult to predict. We approached these problems by developing a classification system based on DNA microarray gene expression data derived from 99 patient samples. We demonstrate that medulloblastomas are molecularly distinct from other brain tumors including primitive neuroectodermal tumors (PNET), atypical teratoid/rhabdoid tumors (AT/RT) and malignant gliomas. Previously unrecognized evidence supporting the derivation of medulloblastomas from cerebellar granule cells through activation of the Sonic Hedgehog (Shh) pathway was also revealed. We further show that the clinical outcome of children with medulloblastomas is highly predictable based on the gene expression profiles of their tumors at diagnosis.

Gabriela Alexe, Gul S. Dalgin, Daniel Scanfeld, Pablo Tamayo, Jill P. Mesirov, Charles DeLisi, Lyndsay Harris, Nicola Barnard, Maritza Martel, Arnold J. Levine, Shridar Ganesan and Gyan Bhanot

Cancer Res. 2007;67(22):10669-76.

Gene expression analysis has identified biologically relevant subclasses of breast cancer. However, most classification schemes do not robustly cluster all HER2+ breast cancers, in part due to limitations and bias of clustering techniques used. In this article, we propose an alternative approach that first separates the HER2+ tumors using a gene amplification signal for Her2/neu amplicon genes and then applies consensus ensemble clustering separately to the HER2+ and HER2- clusters to look for further substructure. We applied this procedure to a microarray data set of 286 early-stage breast cancers treated only with surgery and radiation and identified two basal and four luminal subtypes in the HER2- tumors, as well as two novel and robust HER2+ subtypes. HER2+ subtypes had median distant metastasis-free survival of 99 months [95% confidence interval (95% CI), 83-118 months] and 33 months (95% CI, 11-54 months), respectively, and recurrence rates of 11% and 58%, respectively. The low recurrence subtype had a strong relative overexpression of lymphocyte-associated genes and was also associated with a prominent lymphocytic infiltration on histologic analysis. These data suggest that early-stage HER2+ cancers associated with lymphocytic infiltration are a biologically distinct subtype with an improved natural history.

Gul S Dalgin*, Gabriela Alexe*, Daniel Scanfeld, Pablo Tamayo, Jill P Mesirov, Shridar Ganesan, Charles DeLisi and Gyan Bhanot * Contributed equally

BMC Bioinformatics 2007, 8:291. doi:10.1186/1471-2105-8-291

Background: Clustering analysis of microarray data is often criticized for giving ambiguous results because of sensitivity to data perturbation or clustering techniques used. In this paper, we describe a new method based on principal component analysis and ensemble consensus clustering that avoids these problems. Results: We illustrate the method on a public microarray dataset from 36 breast cancer patients of whom 31 were diagnosed with at least two of three pathological stages of disease (atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC)). Our method identifies an optimum set of genes and divides the samples into stable clusters which correlate with clinical classification into Luminal, Basal-like and Her2+ subtypes. Our analysis reveals a hierarchical portrait of breast cancer progression and identifies genes and pathways for each stage, grade and subtype. An intriguing observation is that the disease phenotype is distinguishable in ADH and progresses along distinct pathways for each subtype. The genetic signature for disease heterogeneity across subtypes is greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes have distinct progression pathways. Our method identifies six disease subtype clusters and one normal cluster. The first split separates the normal samples from the cancer samples. Next, the cancer cluster splits into low grade (pathological grades 1 and 2) and high grade (pathological grades 2 and 3) while the normal cluster is unchanged. Further, the low grade cluster splits into two subclusters and the high grade cluster into four. The final six disease clusters are mapped into one Luminal A, three Luminal B, one Basal-like and one Her2+. Conclusion: We confirm that the cancer phenotype can be identified at an early stage because the genes altered in this stage progressively alter further as the disease progresses through DCIS into IDC. We identify six subtypes of disease which have distinct genetic signatures and remain separated in the clustering hierarchy. Our findings suggest that the heterogeneity of disease across subtypes is higher than the heterogeneity of the disease progression within a subtype, indicating that the subtypes are in fact distinct diseases.

Cory M. Johannessen, Jesse S. Boehm, So Young Kim, Sapana R. Thomas, Leslie Wardwell, Laura A. Johnson, Caroline M. Emery, Nicolas Stransky, Alexandria P. Cogdill, Jordi Barretina, Giordano Caponigro, Haley Hieronymus, Ryan R. Murray, Kourosh Salehi-Ashtiani, David E. Hill, Marc Vidal, Jean J. Zhao, Xiaoping Yang, Ozan Alkan, Sungjoon Kim, Jennifer L. Harris, Christopher J. Wilson, Vic E. Myer, Peter M. Finan, David E. Root, Thomas M. Roberts, Todd Golub, Keith T. Flaherty, Reinhard Dummer, Barbara Weber, William R. Sellers, Robert Schlegel, Jennifer A. Wargo, William C. Hahn, Levi A. Garraway

No citation available

Oncogenic mutations in the serine/threonine kinase B-RAF are found in 50-70% of malignant melanomas. Pre-clinical studies have demonstrated that the B-RAF(V600E) mutation predicts a dependency on the mitogen-activated protein kinase (MAPK) signaling cascade in melanoma, an observation that has been validated by the success of RAF and MEK inhibitors in clinical trials. However, clinical responses to targeted anticancer therapeutics are frequently confounded by de novo or acquired resistance. Identification of resistance mechanisms in a manner that elucidates alternative "druggable" targets may inform effective long-term treatment strategies. Here, we expressed ~600 kinase and kinase-related open reading frames (ORFs) in parallel to functionally interrogate resistance to a selective RAF kinase inhibitor. We identified MAP3K8 (COT/TPL2) as a MAPK pathway agonist that drives resistance to RAF inhibition in B-RAF(V600E) cell lines. COT activates ERK primarily through MEK-dependent mechanisms that do not require RAF signaling. Moreover, COT expression is associated with de novo resistance in B-RAF(V600E) cultured cell lines and acquired resistance in melanoma cells and tissue obtained from relapsing patients following treatment with MEK or RAF inhibition. We further identify combinatorial MAPK pathway inhibition or targeting of COT kinase activity as possible therapeutic strategies for reducing MAPK pathway activation in this setting. Together, these results provide new insights into resistance mechanisms involving the MAPK pathway and articulate an integrative approach through which high-throughput functional screens may inform the development of novel therapeutic strategies.

Jennifer L. Shepard*, James F. Amatruda*, Howard M. Stern, Aravind Subramanian, David Finkelstein, James Ziai, K. Rose Finley, Kathleen L. Pfaff, Candace Hersey, Yi Zhou, Bruce Barut, Matthew Freedman, Charles Lee, Jan Spitsbergen, Donna Neuberg, Gerhard Weber, Todd R. Golub, Jonathan N. Glickman, Jeffery L. Kutok, Jon C. Aster, and Leonard I. Zon

No citation available

A major goal of cancer research has been to identify new genes that contribute to cancer formation. The similar pathology between zebrafish and human tumors, as well as the past success of large-scale genetic screens in uncovering human disease genes, makes zebrafish an ideal system in which to find such new genes. Here we show that a zebrafish forward genetic screen uncovered multiple cell proliferation mutants including one mutant, crash&burn (crb), which represents a loss of function mutation in bmyb, a transcriptional regulator and member of a putative proto-oncogene family. crb mutant embryos have defects in mitotic progression and mitotic spindle formation, and exhibit genomic instability. Regulation of cyclin B levels by bmyb appears to be the mechanism of mitotic accumulation in crb. Carcinogenesis studies reveal increased cancer susceptibility in adult crb heterozygotes. Gene expression signatures associated with loss of bmyb in zebrafish are also correlated with conserved signatures in human tumor samples and loss of the B-myb gene signature is associated with retention of p53 function. Our findings show that zebrafish screens can uncover cancer pathways, and surprisingly demonstrate that loss of function of bmyb is associated with cancer.

Alejandro Sweet-Cordero, Sayan Mukherjee, Aravind Subramanian, Han You, Jeffrey J Roix, Christine Ladd-Acosta, Jill Mesirov, Todd R Golub, & Tyler Jacks

No citation available

Using advanced gene targeting methods, generating mouse models of cancer that accurately reproduce the genetic alterations present in human tumors is now relatively straightforward. The challenge is to determine to what extent such models faithfully mimic human disease with respect to the underlying molecular mechanisms that accompany tumor progression. Here we describe a method for comparing mouse models of cancer with human tumors using gene-expression profiling. We applied this method to the analysis of a model of Kras2-mediated lung cancer and found a good relationship to human lung adenocarcinoma, thereby validating the model. Furthermore, we found that whereas a gene-expression signature of KRAS2 activation was not identifiable when analyzing human tumors with known KRAS2 mutation status alone, integrating mouse and human data uncovered a gene-expression signature of KRAS2 mutation in human lung cancer. We confirmed the importance of this signature by gene-expression analysis of short hairpin RNA-mediated inhibition of oncogenic Kras2. These experiments identified both a pattern of gene expression indicative of KRAS2 mutation and potential effectors of oncogenic KRAS2 activity in human cancer. This approach provides a strategy for using genomic analysis of animal models to probe human disease.

Steven M. Corsello, Giovanni Roti, Kenneth N. Ross, Kwan T. Chow, Ilene Galinsky, Richard M. Stone, Daniel J. DeAngelo, Andrew L. Kung, Todd R. Golub, and Kimberly Stegmaier

Blood. 2009 Jun 11;113(24):6193-205. Epub 2009 Apr 17.

Somatic rearrangements of transcription factors are common abnormalities in the acute leukemias. With rare exception, however, the resultant protein products have remained largely intractable as pharmacological targets. One example is AML1-ETO, the most common translocation reported in AML. In order to identify AML1-ETO modulators, we screened a small molecule library using a chemical genomic approach. Gene expression signatures were used as surrogates for the expression versus loss of the translocation in AML1-ETO-expressing cells. The top classes of compounds that scored in this screen were corticosteroids and dihydrofolate reductase (DHFR) inhibitors. In addition to modulating the AML1-ETO signature, both classes induced evidence of differentiation, dramatically inhibited cell viability, and ultimately induced apoptosis via on-target activity. Furthermore, AML1-ETO-expressing cell lines were exquisitely sensitive to the effects of corticosteroids on cellular viability compared to non-expressers. The corticosteroids diminished AML1-ETO protein in both cell lines and primary patient cells, which was rescued via proteasome inhibition and glucocorticoid receptor antagonism. Moreover, these molecule classes demonstrated synergy in combination with standard AML chemotherapy agents and activity in an orthotopic model of AML1-ETO-positive AML. This work suggests a possible role for DHFR inhibitors and glucocorticoids in treating patients with AML1-ETO-positive disease.

Cynthia K. Hahn, Kenneth N. Ross, Ian Warrington, Ralph Mazitschek, Cindy M. Kanegai, Renee D. Wright, Andrew L. Kung, Todd R. Golub, and Kimberly Stegmaier

No citation available

The discovery of new small molecules and their testing in rational combination poses an ongoing problem for rare diseases, particularly for pediatric cancers such as neuroblastoma. Despite maximal cytotoxic therapy with double autologous stem cell transplantation, outcome remains poor for children with high-stage disease. Because differentiation is aberrant in this malignancy, compounds that modulate transcription, such as histone deacetylase (HDAC) inhibitors, are of particular interest. However, as single agents, HDAC inhibitors have had limited efficacy. In the present study, we use an HDAC inhibitor as an enhancer to screen a bioactive small molecule library for compounds inducing neuroblastoma maturation. In order to quantify differentiation, we use an enabling gene expression-based screening strategy. The top hit identified in the screen was all-trans retinoic acid. Secondary assays confirmed greater neuroblastoma differentiation with the combination of an HDAC inhibitor and a retinoid versus either compound alone. Furthermore, effects of combination therapy were synergistic with respect to inhibition of cellular viability and induction of apoptosis. In a xenograft model of neuroblastoma, animals treated with combination therapy had the longest survival. This work suggests that testing of an HDAC inhibitor and retinoid in combination is warranted for children with neuroblastoma and demonstrates the success of a signature-based screening approach to prioritize compound combinations for testing in rare diseases.

Kimberly Stegmaier, Jenny S. Wong, Kenneth N. Ross, Kwan T. Chow, David Peck, Renee D. Wright, Stephen L. Lessnick, Andrew L. Kung, Todd R. Golub

Stegmaier K, Wong JS, Ross KN, Chow KT, Peck D, et al. (2007) Signature-based small molecule screening identifies cytosine arabinoside as an EWS/FLI modulator in Ewing sarcoma. PLoS Med, Vol. 4, No. 4, e122 doi:10.1371/journal.pmed.0040122

The presence of tumor-specific mutations in the cancer genome represents a potential opportunity for pharmacologic intervention to therapeutic benefit. Unfortunately, many classes of oncoproteins (e.g., transcription factors) are not amenable to conventional small-molecule screening. Despite the identification of tumor-specific somatic mutations, most cancer therapy still utilizes nonspecific, cytotoxic drugs. One illustrative example is the treatment of Ewing sarcoma. Although the EWS/FLI oncoprotein, present in the vast majority of Ewing tumors, was characterized over ten years ago, it has never been exploited as a target of therapy. Previously, this target has been intractable to modulation with traditional small-molecule library screening approaches. Here we describe a gene expression-based approach to identify compounds that induce a signature of EWS/FLI attenuation. We hypothesize that screening small-molecule libraries highly enriched for FDA-approved drugs will provide a more rapid path to clinical application. A gene expression signature for the EWS/FLI off state was determined with microarray expression profiling of Ewing sarcoma cell lines with EWS/FLI-directed RNA interference. A small-molecule library enriched for FDA-approved drugs was screened with a high-throughput, ligation-mediated amplification assay with a fluorescent, bead-based detection. Screening identified cytosine arabinoside (ARA-C) as a modulator of EWS/FLI. ARA-C reduced EWS/FLI protein abundance and accordingly diminished cell viability and transformation and abrogated tumor growth in a xenograft model. Given the poor outcomes of many patients with Ewing sarcoma and the well-established ARA-C safety profile, clinical trials testing ARA-C are warranted. We demonstrate that a gene expression-based approach to small-molecule library screening can identify, for rapid clinical testing, candidate drugs that modulate previously intractable targets. Furthermore, this is a generic approach that can, in principle, be applied to the identification of modulators of any tumor-associated oncoprotein in the rare pediatric malignancies, but also in the more common adult cancers.

Justin Lamb, Emily D. Crawford, David Peck, Joshua W. Modell, Irene C. Blat, Matthew J. Wrobel, Jim Lerner, Jean-Philippe Brunet, Aravind Subramanian, Kenneth N. Ross, Michael Reich, Haley Hieronymus, Guo Wei, Scott A. Armstrong, Stephen J. Haggarty, Paul A. Clemons, Ru Wei, Steven A. Carr, Eric S. Lander and Todd R. Golub

Science 313: 1929-1935 (2006)

To pursue a systematic approach to the discovery of functional connections among diseases, genetic perturbation, and drug action, we have created the first installment of a reference collection of gene-expression profiles from cultured human cells treated with bioactive small molecules, together with pattern-matching software to mine these data. We demonstrate that this 'Connectivity Map' resource can be used to find connections among small molecules sharing a mechanism of action, chemicals and physiological processes, and diseases and drugs. These results indicate the feasibility of the approach and suggest the value of a large-scale community Connectivity Map project.

Kimberly Stegmaier, Steven M. Corsello, Kenneth N. Ross, Jenny S. Wong, Daniel J. DeAngelo, Todd R. Golub

Blood. 2005 Oct 15;106(8):2841-8.

Cure rates for patients with acute myeloid leukemia (AML) remain low despite ever-increasing dose-intensity of cytotoxic therapy. In an effort to identify novel approaches to AML therapy, we recently reported a new method of chemical screening based on the modulation of a gene expression signature of interest. We applied this approach to the discovery of AML differentiation-promoting compounds. Among the compounds inducing neutrophilic differentiation was 4,5-dianilinophthalimide (DAPH1), previously reported to inhibit epidermal growth factor receptor (EGFR) kinase activity. Here, we report that the FDA-approved EGFR inhibitor gefitinib (Iressa) similarly promotes the differentiation of AML cell lines and primary patient-derived AML blasts in vitro. Gefitinib induced differentiation based on morphological assessment, nitro-blue tetrazolium reduction, cell surface markers, genome-wide patterns of gene expression, and inhibition of proliferation at clinically achievable doses. Importantly, EGFR expression was not detected in AML cells, indicating that gefitinib functions through a previously unrecognized, EGFR-independent mechanism. These studies indicate that clinical trials testing the efficacy of gefitinib in patients with AML are warranted.

Kimberly Stegmaier, Kenneth N. Ross, Sierra A. Colavito, Shawn O'Malley, Brent R. Stockwell, and Todd R. Golub

Nature Genetics, Vol. 36, No. 3, 257-263, March 2004.

Chemical genomics involves generating large collections of small molecules and using them to modulate cellular states. Although there has been recent progress in the systematic synthesis of structurally diverse compounds, their use in screens of cellular circuitry has remained an ad hoc process. Here, we outline a general, efficient approach called GE-HTS (Gene Expression-Based High Throughput Screening) in which a gene expression signature is used as a surrogate for cellular states, and we describe its application in a particular setting -- the identification of compounds inducing the differentiation of acute myeloid leukemia cells. In screening 1,739 compounds, we identified 8 that reliably induced the differentiation signature, and furthermore yielded functional evidence of bona fide differentiation. The results indicate that GE-HTS may provide a powerful, general approach for chemical screening.

Jane E. Staunton, Donna K. Slonim, Hilary A. Coller, Pablo Tamayo, Michael J. Angelo, Johnny Park, Uwe Scherf, Jae K. Lee, William O. Reinhold, John N. Weinstein, Jill P. Mesirov, Eric S. Lander and Todd R. Golub

Proc. Natl. Acad. Sci. USA 98:10787-10792

In an effort to develop a genomics-based approach to the prediction of drug response, we have developed an algorithm for classification of cell line chemosensitivity based on gene expression profiles alone. Using oligonucleotide microarrays, the expression levels of 6817 genes were measured in a panel of 60 human cancer cell lines (the NCI-60) for which the chemosensitivity profiles of thousands of chemical compounds have been determined. We sought to determine whether the gene expression signatures of untreated cells were sufficient for the prediction of chemosensitivity. Gene expression-based classifiers of sensitivity or resistance for 232 compounds were generated and then evaluated on independent sets of data. The classifiers were designed to be independent of the cells' tissue of origin. The accuracy of chemosensitivity prediction was considerably better than would be expected by chance. Eighty-eight of 232 expression-based classifiers performed accurately (with p < 0.05) on an independent test set, whereas only 12 of the 232 would be expected to do so by chance. These results suggest that at least for a subset of compounds, genomic approaches to chemosensitivity prediction are feasible.

Firestein R, Bass AJ, Kim SY, Dunn IF, Silver SJ, Guney I, Freed E, Ligon AH, Vena N, Ogino S, Chheda MG, Tamayo P, Finn S, Shrestha Y, Boehm JS, Jain S, Bojarski E, Mermel C, Barretina J, Chan JA, Baselga J, Tabernero J, Root DE, Fuchs CS, Loda M, Shivdasani RA, Meyerson M, Hahn WC.

Firestein R, Bass AJ, Kim SY, Dunn IF, Silver SJ, Guney I, Freed E, Ligon AH, Vena N, Ogino S, Chheda MG, Tamayo P, Finn S, Shrestha Y, Boehm JS, Jain S, Bojarski E, Mermel C, Barretina J, Chan JA, Baselga J, Tabernero J, Root DE, Fuchs CS, Loda M, Shivdasani RA, Meyerson M, Hahn WC. CDK8 is a colorectal cancer oncogene that regulates beta-catenin activity. Nature. 2008 Sep 25;455(7212):547-51. Epub 2008 Sep 14.

Aberrant activation of the canonical WNT/beta-catenin pathway occurs in almost all colorectal cancers and contributes to their growth, invasion and survival. Although dysregulated beta-catenin activity drives colon tumorigenesis, further genetic perturbations are required to elaborate full malignant transformation. To identify genes that both modulate beta-catenin activity and are essential for colon cancer cell proliferation, we conducted two loss-of-function screens in human colon cancer cells and compared genes identified in these screens with an analysis of copy number alterations in colon cancer specimens. One of these genes, CDK8, which encodes a member of the mediator complex, is located at 13q12.13, a region of recurrent copy number gain in a substantial fraction of colon cancers. Here we show that the suppression of CDK8 expression inhibits proliferation in colon cancer cells characterized by high levels of CDK8 and beta-catenin hyperactivity. CDK8 kinase activity was necessary for beta-catenin-driven transformation and for expression of several beta-catenin transcriptional targets. Together these observations suggest that therapeutic interventions targeting CDK8 may confer a clinical benefit in beta-catenin-driven malignancies.

Salvesen HB, Carter SL, Mannelqvist M, Dutt A, Getz G, Stefansson IM, Raeder MB, Sos ML, Engelsen IB, Trovik J, Wik E, Greulich H, Bø TH, Jonassen I, Thomas RK, Zander T, Garraway LA, Oyan AM, Sellers WR, Kalland KH, Meyerson M, Akslen LA, Beroukhim R.

Proc. Natl. Acad. Sci. USA 106:4834-4839

Although 75% of endometrial cancers are treated at an early stage, 15% to 20% of these recur. We performed an integrated analysis of genome-wide expression and copy-number data for primary endometrial carcinomas with extensive clinical and histopathological data to detect features predictive of recurrent disease. Unsupervised analysis of the expression data distinguished 2 major clusters with strikingly different phenotypes, including significant differences in disease-free survival. To identify possible mechanisms for these differences, we performed a global genomic survey of amplifications, deletions, and loss of heterozygosity, which identified 11 significantly amplified and 13 significantly deleted regions. Amplifications of 3q26.32 harboring the oncogene PIK3CA were associated with poor prognosis and segregated with the aggressive transcriptional cluster. Moreover, samples with PIK3CA amplification carried signatures associated with in vitro activation of PI3 kinase (PI3K), a signature that was shared by aggressive tumors without PIK3CA amplification. Tumors with loss of PTEN expression or PIK3CA overexpression that did not have PIK3CA amplification also shared the PI3K activation signature, high protein expression of the PI3K pathway member STMN1, and an aggressive phenotype in test and validation datasets. However, mutations of PTEN or PIK3CA were not associated with the same expression profile or aggressive phenotype. STMN1 expression had independent prognostic value. The results affirm the utility of systematic characterization of the cancer genome in clinically annotated specimens and suggest the particular importance of the PI3K pathway in patients who have aggressive endometrial cancer.

Dutt A, Salvesen HB, Chen TH, Ramos AH, Onofrio RC, Hatton C, Nicoletti R, Winckler W, Grewal R, Hanna M, Wyhs N, Ziaugra L, Richter DJ, Trovik J, Engelsen IB, Stefansson IM, Fennell T, Cibulskis K, Zody MC, Akslen LA, Gabriel S, Wong KK, Sellers WR, Meyerson M, Greulich H.

Proc. Natl. Acad. Sci. USA 105:8713-8717

Oncogenic activation of tyrosine kinases is a common mechanism of carcinogenesis and, given the druggable nature of these enzymes, an attractive target for anticancer therapy. Here, we show that somatic mutations of the fibroblast growth factor receptor 2 (FGFR2) tyrosine kinase gene, FGFR2, are present in 12% of endometrial carcinomas, with additional instances found in lung squamous cell carcinoma and cervical carcinoma. These FGFR2 mutations, many of which are identical to mutations associated with congenital craniofacial developmental disorders, are constitutively activated and oncogenic when ectopically expressed in NIH 3T3 cells. Inhibition of FGFR2 kinase activity in endometrial carcinoma cell lines bearing such FGFR2 mutations inhibits transformation and survival, implicating FGFR2 as a novel therapeutic target in endometrial carcinoma.

Benjamin L. Ebert, Naomi Galili, Pablo Tamayo, Jocelyn Bosco, Raymond Mak, Jennifer Pretz, Christine Ladd-Acosta, Richard Stone, Todd R. Golub and Azra Raza.

No citation available

No abstract available

Benjamin L. Ebert, Jennifer Pretz, Jocelyn Bosco, Cindy Y. Chang, Pablo Tamayo, Naomi Galili, Azra Raza, David E. Root, Eyal Attar, Steven R. Ellis, and Todd R. Golub.

Nature 451, 335-339 (17 January 2008) | doi:10.1038/nature06494

Somatic chromosomal deletions in cancer are thought to indicate the location of tumor suppressor genes, whereby complete loss of gene function occurs through biallelic deletion, point mutation, or epigenetic silencing, thus fulfilling Knudson's two-hit hypothesis. In many recurrent deletions, however, such biallelic inactivation has not been found. One prominent example is the 5q- syndrome, a subtype of myelodysplastic syndrome (MDS) characterized by a defect in erythroid differentiation. Here, we describe an RNA interference (RNAi)-based approach to discovery of the 5q- disease gene. We find that partial loss of function of the ribosomal protein RPS14 phenocopies the disease in normal hematopoietic progenitor cells, and moreover that forced expression of RPS14 rescues the disease phenotype in patient-derived bone marrow cells. In addition, we identified a block in the processing of pre-rRNA in RPS14-deficient cells that is highly analogous to the functional defect in Diamond Blackfan Anemia, linking the molecular pathophysiology of the 5q- syndrome to a congenital bone marrow failure syndrome. These results indicate that the 5q- syndrome is caused by a defect in ribosomal protein function, and suggest that RNAi screening is an effective strategy for identifying causal haploinsufficiency disease genes.

Benjamin L. Ebert, Michele M. Lee, Jennifer L. Pretz, Aravind Subramanian, Raymond Mak, Todd R. Golub, Colin A. Sieff

No citation available

Diamond Blackfan Anemia (DBA), a congenital erythroblastopenia, is a model disease for the study of erythroid differentiation, but is poorly understood. RPS19 is the only gene yet to have been associated with DBA, but its relevance to erythroid differentiation is unclear. The molecular basis for the stimulation of erythropoiesis by glucocorticoids in patients with DBA has not been identified. We demonstrate that targeted degradation of the RPS19 gene, through retroviral expression of short hairpin RNAs (shRNAs), in cultured human CD34+ cells blocks the proliferation and differentiation of erythroid progenitor cells. Treatment of RPS19 deficient cells with dexamethasone restores erythroid differentiation to normal levels. We investigated the molecular basis of pharmacologic therapies for DBA using oligonucleotide microarrays to survey gene expression in CD34+ cells treated with combinations of dexamethasone, erythropoietin, stem cell factor, and interleukin-3. Dexamethasone did not alter expression of RPS19, but activated a genetic program that includes a set of key hematopoietic regulatory genes. Genes specific to erythroid progenitor cells were up-regulated by dexamethasone, while genes specific to non-erythroid lineages were down-regulated. Deficiency of RPS19 therefore blocks proliferation of immature erythroid progenitor cells, and dexamethasone activates proliferation of the same cell population through mechanisms independent of RPS19.

Chang-Zheng Chen, Min Li, David de Graaf, Stefano Monti, Berthold Gottgens, Maria-Jose Sanchez, Eric S. Lander, Todd R. Golub, Anthony R. Green and Harvey F. Lodish

PNAS 2002 99(24):15468-15473

We describe a strategy to obtain highly enriched long-term repopulating (LTR) hematopoietic stem cells (HSCs) from bone marrow side-population (SP) cells by using a transgenic reporter gene driven by a stem cell enhancer. To analyze the gene-expression profile of the rare HSC population, we developed an amplification protocol termed "constant-ratio PCR," in which sample and control cDNAs are amplified in the same PCR. This protocol allowed us to identify genes differentially expressed in the enriched LTR-HSC population by oligonucleotide microarray analysis using as little as 1 ng of total RNA. Endoglin, an ancillary transforming growth factor beta receptor, was differentially expressed by the enriched HSCs. Importantly, endoglin-positive cells, which account for 20% of total SP cells, contain all the LTR-HSC activity within bone marrow SP. Our results demonstrate that endoglin, which plays important roles in angiogenesis and hematopoiesis, is a functional marker that defines LTR HSCs. Our overall strategy may be applicable for the identification of markers for other tissue-specific stem cells.

Toffanin S, Hoshida Y, Lachenmayer A, Villanueva A, Cabellos L, Minguez B, Savic R, Ward SC, Thung S, Chiang DY, Alsinet C, Tovar V, Roayaie S, Schwartz M, Bruix J, Waxman S, Friedman SL, Golub T, Mazzaferro V, Llovet JM

Gastroenterology. 2011

BACKGROUND & AIMS: Hepatocellular carcinoma (HCC) is a heterogeneous tumor that develops via activation of multiple pathways and molecular alterations. It has been a challenge to identify molecular classes of HCC and design treatment strategies for each specific subtype. MicroRNAs (miRNAs) are involved in HCC pathogenesis and their expression profiles have been used to classify cancers. We analyzed miRNA expression in human HCC samples to identify molecular subclasses and oncogenic miRNAs. METHODS: We performed miRNA profiling of 89 HCC samples using a ligation-mediated amplification method. Subclasses were identified by unsupervised clustering analysis. We identified molecular features specific for each subclass using expression pattern (Affymetrix U133 2.0), DNA change (Affymetrix STY Mapping Array), mutation (CTNNB1), and immunohistochemical (phospho[p]-Akt, p-IGF-IR, p-S6, p-EGFR, β-catenin) analyses. The roles of selected miRNAs were investigated in cell lines and in an orthotopic model of HCC. RESULTS: We identified 3 main clusters of HCCs: the Wnt (32 of 89; 36%), interferon-related (29 of 89; 33%), and proliferation (28 of 89; 31%) subclasses. A subset of patients with tumors in the proliferation subclass (8 of 89; 9%) overexpressed a family of poorly characterized miRNAs from chr19q13.42. Expression of miR-517a and miR-520c (from chr19q13.42) increased proliferation, migration, and invasion of HCC cells in vitro. MiR-517a promoted tumorigenesis and metastatic dissemination in vivo. CONCLUSIONS: We propose miRNA-based classification of 3 subclasses of HCC. Among the proliferation class, miR-517a is an oncogenic miRNA that promotes tumor progression. There is rationale for developing therapies that target miRNA 517 for patients with HCC.

Villanueva A, Hoshida Y, Battiston C, Tovar V, Sia D, Alsinet C, Cornella H, Liberzon A, Kobayashi M, Kumada H, Thung SN, Bruix J, Newell P, April C, Fan JB, Roayaie S, Mazzaferro V, Schwartz ME, Llovet JM

Gastroenterology. 2011

BACKGROUND & AIMS: In approximately 70% of patients with hepatocellular carcinoma (HCC) treated by resection or ablation, disease recurs within 5 years. Although gene expression signatures have been associated with outcome, there is no method to predict recurrence based on combined clinical, pathology, and genomic data (from tumor and cirrhotic tissue). We evaluated gene expression signatures associated with outcome in a large cohort of patients with early stage (Barcelona-Clinic Liver Cancer 0/A), single-nodule HCC and heterogeneity of signatures within tumor tissues. METHODS: We assessed 287 HCC patients undergoing resection and tested genome-wide expression platforms using tumor (n = 287) and adjacent nontumor, cirrhotic tissue (n = 226). We evaluated gene expression signatures with reported prognostic ability generated from tumor or cirrhotic tissue in 18 and 4 reports, respectively. In 15 additional patients, we profiled samples from the center and periphery of the tumor, to determine stability of signatures. Data analysis included Cox modeling and random survival forests to identify independent predictors of tumor recurrence. RESULTS: Gene expression signatures that were associated with aggressive HCC were clustered, as well as those associated with tumors of progenitor cell origin and those from nontumor, adjacent, cirrhotic tissues. On multivariate analysis, the tumor-associated signature G3-proliferation (hazard ratio [HR], 1.75; P = .003) and an adjacent poor-survival signature (HR, 1.74; P = .004) were independent predictors of HCC recurrence, along with satellites (HR, 1.66; P = .04). Samples from different sites in the same tumor nodule were reproducibly classified. CONCLUSIONS: We developed a composite prognostic model for HCC recurrence, based on gene expression patterns in tumor and adjacent tissues. These signatures predict early and overall recurrence in patients with HCC, and complement findings from clinical and pathology analyses.

Victoria Tovar, Clara Alsinet, Augusto Villanueva, Yujin Hoshida, Derek Y. Chiang, Manel Sole, Swan Thung, Susana Moyano, Sara Toffanin, Beatriz Mínguez, Laia Cabellos, Judit Peix, Myron Schwartz, Vincenzo Mazzaferro, Jordi Bruix, Josep M. Llovet

Journal of Hepatology, 2010 in press

No abstract available

Yujin Hoshida, Sara Toffanin, Anja Lachenmayer, Augusto Villanueva, Beatriz Minguez, Josep M. Llovet

Seminars in Liver Disease, 30(1); 35, 2010

No abstract available

Yujin Hoshida

Journal of Hepatology, 2009 Nov;51(5):842-4

No abstract available

Yujin Hoshida, Sebastian M.B. Nijman, Masahiro Kobayashi, Jennifer A. Chan, Jean-Philippe Brunet, Derek Y. Chiang, Augusto Villanueva, Philippa Newell, Kenji Ikeda, Masaji Hashimoto, Goro Watanabe, Stacey Gabriel, Scott L. Friedman, Hiromitsu Kumada, Josep M. Llovet, Todd R. Golub

Cancer Research 2009;69(18):7385-92

Hepatocellular carcinoma (HCC) is a highly heterogeneous disease, and prior attempts to develop genomics-based classification for HCC have yielded highly divergent results, indicating the difficulty of identifying a unified molecular anatomy. We performed a meta-analysis of gene expression profiles in datasets from 8 independent patient cohorts across the world. In addition, aiming to establish the real-world applicability of a classification system, we profiled 118 formalin-fixed, paraffin-embedded tissues from an additional patient cohort. A total of 603 patients were analyzed, representing the major etiologies of HCC (hepatitis B and C) collected from Western and Eastern countries. We observed 3 robust HCC subclasses (termed S1, S2, and S3), each correlated with clinical parameters such as tumor size, extent of cellular differentiation, and serum alpha-fetoprotein levels. An analysis of the components of the signatures indicated that S1 reflected aberrant activation of the WNT signaling pathway, S2 was characterized by proliferation as well as MYC and AKT activation, and S3 was associated with hepatocyte differentiation. Functional studies indicated that the WNT pathway activation signature characteristic of S1 tumors was not simply the result of beta-catenin mutation, but rather was the result of TGF-beta activation, thus representing a new mechanism of WNT pathway activation in HCC. These experiments establish the first consensus classification framework for HCC based on gene-expression profiles, and highlight the power of integrating multiple datasets to define a robust molecular taxonomy of the disease.

Pippa Newell, Sara Toffanin, Augusto Villanueva, Derek Y. Chiang, Beatriz Minguez, Laia Cabellos, Radoslav Savic, Yujin Hoshida, Kiat Hon Lim, Pedro Melgar-Lesmes, Steven Yea, Judit Peix, Kemal Deniz, M. Isabel Fiel, Swan Thung, Clara Alsinet, Victoria Tovar, Vincenzo Mazzaferro, Jordi Bruix, Sasan Roayaie, Myron Schwartz, Scott L. Friedman, Josep M. Llovet

Journal of Hepatology, 2009;51(4): 725-733

Background/Aims: The success of sorafenib in the treatment of advanced hepatocellular carcinoma (HCC) has focused interest on the role of Ras signaling in this malignancy. We investigated the molecular alterations of the Ras pathway in HCC and the antineoplastic effects of sorafenib in combination with rapamycin, an inhibitor of the mTOR pathway, in experimental models. Methods: Gene expression (qRT-PCR, oligonucleotide microarray), DNA copy number changes (SNP-array), methylation of tumor suppressor genes (methylation-specific PCR) and protein activation (immunohistochemistry) were analysed in 351 samples. Anti-tumoral effects of combined therapy targeting the Ras and mTOR pathways were evaluated in cell lines and HCC xenografts. Results: Different mechanisms accounted for Ras pathway activation in HCC. H-ras was up-regulated during different steps of hepatocarcinogenesis. B-raf was overexpressed in advanced tumors and its expression was associated with genomic amplification. Partial methylation of RASSF1A and NORE1A was detected in 89% and 44% of tumors, respectively, and complete methylation was found in 11% and 4% of HCCs. Activation of the pathway (pERK immunostaining) was identified in 10.3% of HCC. Blockade of Ras and mTOR pathways with sorafenib and rapamycin reduced cell proliferation and induced apoptosis in cell lines. In vivo, the combination of both compounds enhanced tumor necrosis and ulceration when compared with sorafenib alone. Conclusions: Ras activation results from several molecular alterations, such as methylation of tumor suppressors and amplification of oncogenes (B-raf). Sorafenib blocks signaling and synergizes with rapamycin in vivo, preventing tumor progression. These data provide the rationale for testing this combination in clinical studies.

Srinivas R Viswanathan, John T Powers, William Einhorn, Yujin Hoshida, Tony L Ng, Sara Toffanin, Maureen O'Sullivan, Jun Lu, Letha A Phillips, Victoria L Lockhart, Samar P Shah, Pradeep S Tanwar, Craig H Mermel, Rameen Beroukhim, Mohammad Azam, Jose Teixeira, Matthew Meyerson, Timothy P Hughes, Josep M Llovet, Jerald Radich, Charles G Mullighan, Todd R Golub, Poul H Sorensen, George Q Daley

Nature Genetics, 2009 Jul;41(7):843-8

Multiple members of the let-7 family of miRNAs are often repressed in human cancers [1, 2], thereby promoting oncogenesis by derepressing targets such as HMGA2, K-Ras and c-Myc [3, 4]. However, the mechanism by which let-7 miRNAs are coordinately repressed is unclear. The RNA-binding proteins LIN28 and LIN28B block let-7 precursors from being processed to mature miRNAs [5, 6, 7, 8], suggesting that their overexpression might promote malignancy through repression of let-7. Here we show that LIN28 and LIN28B are overexpressed in primary human tumors and human cancer cell lines (overall frequency approximately 15%), and that overexpression is linked to repression of let-7 family miRNAs and derepression of let-7 targets. LIN28 and LIN28B facilitate cellular transformation in vitro, and overexpression is associated with advanced disease across multiple tumor types. Our work provides a mechanism for the coordinate repression of let-7 miRNAs observed in a subset of human cancers, and associates activation of LIN28 and LIN28B with poor clinical prognosis.

Yujin Hoshida

Journal of Hepatology, 2009 Sep;51(3):595-6

No abstract available

Yujin Hoshida, Augusto Villanueva, Josep M. Llovet

Expert Rev Gastroenterol Hepatol. 2009 Apr;3(2):101-3

No abstract available

Yujin Hoshida, Todd R. Golub

N Engl J Med. 2009 Mar 12;360(11):1152

No abstract available

Yujin Hoshida, Augusto Villanueva, Masahiro Kobayashi, Judit Peix, Derek Y. Chiang, Amy Camargo, Supriya Gupta, Jamie Moore, Matthew J. Wrobel, Jim Lerner, Michael Reich, Jennifer A. Chan, Jonathan N. Glickman, Kenji Ikeda, Masaji Hashimoto, Goro Watanabe, Maria G. Daidone, Sasan Roayaie, Myron Schwartz, Swan Thung, Helga B. Salvesen, Stacey Gabriel, Vincenzo Mazzaferro, Jordi Bruix, Scott L. Friedman, Hiromitsu Kumada, Josep M. Llovet, Todd R. Golub

N Engl J Med 2008;359:1995-2004

Background: It is a challenge to identify those patients who, after undergoing potentially curative treatments for hepatocellular carcinoma, are at greatest risk of recurrence. Such high-risk patients could receive novel interventional measures. An obstacle to the development of genome-based predictors of outcome in patients with hepatocellular carcinoma has been the lack of a means to carry out genomewide expression profiling of fixed, as opposed to frozen, tissues. Methods: We aimed to demonstrate the feasibility of gene-expression profiling of more than 6000 human genes in formalin-fixed, paraffin-embedded tissues. We applied the method to tissues from 307 patients with hepatocellular carcinoma, from four series of patients, to discover and validate a gene-expression signature associated with survival. Results: The expression-profiling method for formalin-fixed, paraffin-embedded tissue was highly effective: samples from 90% of the patients yielded data of high quality, including samples that had been archived for more than 24 years. Gene-expression profiles of tumor tissue failed to yield a significant association with survival. In contrast, profiles of the surrounding nontumoral liver tissue were highly correlated with survival in a training set of 82 Japanese patients, and the signature was validated in tissues from an independent group of 225 patients from the United States and Europe (p = 0.04). Conclusions: We have demonstrated the feasibility of genomewide expression profiling of formalin-fixed, paraffin-embedded tissues and have shown that a reproducible gene-expression signature correlating with survival is present in liver tissue adjacent to the tumor in patients with hepatocellular carcinoma.

Augusto Villanueva, Derek Y. Chiang, Pippa Newell, Judit Peix, Swan Thung, Clara Alsinet, Victoria Tovar, Sasan Roayaie, Beatriz Minguez, Manel Sole, Carlo Battiston, Stijn van Laarhoven, Maria I Fiel, Analisa Di Feo, Yujin Hoshida, Steven Yea, Sara Toffanin, Alex Ramos, John A. Martignetti, Vincenzo Mazzaferro, Jordi Bruix, Samuel Waxman, Myron Schwartz, Matthew Meyerson, Scott L. Friedman, Josep M. Llovet

Gastroenterology, 2008 Dec;135(6):1972-83

BACKGROUND: The advent of targeted therapies in hepatocellular carcinoma (HCC) has underscored the importance of pathway characterization to identify novel molecular targets for treatment. Based on its role in cell growth and differentiation, we evaluated mTOR signaling activation in human HCC, as well as the anti-tumoral effect of a dual-level blockade of the mTOR pathway. METHODS: The mTOR pathway was assessed using integrated data from mutation analysis (direct sequencing), DNA copy number changes (SNP-array), mRNA levels (qRT-PCR and gene expression microarray), and protein activation (immunostaining) in 351 human samples, including HCC (n=314), and non-tumoral tissue (n=37). Effects of dual blockade of mTOR signaling using a rapamycin analog (everolimus) and an EGFR/VEGFR inhibitor (AEE788) were evaluated in liver cancer cell lines, and in a tumor xenograft model. RESULTS: Aberrant mTOR signaling (phosphorylated-RPS6) was present in half of the cases, associated with IGF pathway activation, EGF upregulation, and PTEN dysregulation. PTEN and PI3KCA-B mutations were rare events. Chromosomal gains in RICTOR (25% of patients) and positive pRPS6 staining correlated with recurrence. RICTOR-specific siRNA downregulation reduced tumor cell viability in vitro. Blockage of mTOR signaling with everolimus in vitro and in a xenograft model decelerated tumor growth and increased survival. This effect was enhanced in vivo after EGFR blockade. CONCLUSIONS: mTOR signaling has a critical role in the pathogenesis of HCC, with evidence for the role of RICTOR in tumor oncogenesis. mTOR blockade with everolimus is effective in vivo. These findings establish a rationale for targeting the mTOR pathway in clinical trials in HCC.

Derek Y. Chiang, Augusto Villanueva, Yujin Hoshida, Judit Peix, Philippa Newell, Beatriz Minguez, Amanda LeBlanc, Diana Donovan, Swan Thung, Manel Sole, Victoria Tovar, Clara Alsinet, Alex Ramos, Jordi Barretina, Sasan Roayaie, Myron Schwartz, Samuel Waxman, Jordi Bruix, Vincenzo Mazzaferro, Azra Ligon, Vesna Najfeld, Scott Friedman, William Sellers, Matthew Meyerson, Josep Llovet

Cancer Res. 68(16):6779-88

Hepatocellular carcinomas represent the third leading cause of cancer-related deaths worldwide. The vast majority of cases arise in the context of chronic liver injury due to hepatitis B virus or hepatitis C virus infection. To identify genetic mechanisms of hepatocarcinogenesis, we characterized copy number alterations and gene expression profiles from the same set of tumors associated with hepatitis C virus. Most tumors harbored 1q gain, 8q gain, or 8p loss, with occasional alterations in 13 additional chromosome arms. In addition to amplifications at 11q13 in 6 of 103 tumors, 4 tumors harbored focal gains at 6p21 incorporating vascular endothelial growth factor A (VEGFA). Fluorescence in situ hybridization on an independent validation set of 210 tumors found 6p21 high-level gains in 14 tumors, as well as 2 tumors with 6p21 amplifications. Strikingly, this locus overlapped with copy gains in 4 of 371 lung adenocarcinomas. Overexpression of VEGFA via 6p21 gain in hepatocellular carcinomas suggested a novel, non-cell-autonomous mechanism of oncogene activation. Hierarchical clustering of gene expression among 91 of these tumors identified five classes, including "CTNNB1", "proliferation", "IFN-related", a novel class defined by polysomy of chromosome 7, and an unannotated class. These class labels were further supported by molecular data; mutations in CTNNB1 were enriched in the "CTNNB1" class, whereas insulin-like growth factor I receptor and RPS6 phosphorylation were enriched in the "proliferation" class. The enrichment of signaling pathway alterations in gene expression classes provides insights on hepatocellular carcinoma pathogenesis. Furthermore, the prevalence of VEGFA high-level gains in multiple tumor types suggests indications for clinical trials of antiangiogenic therapies.

Augusto Villanueva, Philippa Newell, Derek Y. Chiang, Scott Friedman, Josep Llovet

Semin. Liver Dis. 27(1): 55-76

Hepatocellular carcinoma (HCC) is a leading cause of death among cirrhotic patients and has become a major health problem in developed countries. There is an elemental understanding of the genes and signaling pathways involved in the initiation and progression of this neoplasm. The current hypothesis of the HCC cell origin includes both somatic cells (hepatocytes) and stem cells/progenitor cells. Unlike that in other malignancies such as breast, brain, or hematopoietic cancers, the implication of cancer stem cells in HCC pathogenesis is not yet supported by consistent data. Analysis of somatic genetic alterations and gene expression profiles in HCC samples has provided relevant information on the genes involved in hepatocarcinogenesis, pinpointing a seminal molecular classification of the disease. Nonetheless, a comprehensive genomic analysis of HCC samples using high-resolution platforms in precisely annotated HCCs is clearly needed. Recent data have identified different signaling pathways in liver carcinogenesis (e.g., Wnt-betaCatenin, Hedgehog, tyrosine kinase receptor-related pathways), providing an important potential source of novel molecular targets for new therapies. This review summarizes the most relevant information regarding structural and functional alterations in HCC and describes some of the key signaling pathways implicated in hepatocarcinogenesis.

Rameen Beroukhim*, Craig H. Mermel*, Dale Porter, Guo Wei, Soumya Raychaudhuri, Jerry Donovan, Jordi Barretina, Jesse S. Boehm, Jennifer Dobson, Mitsuyoshi Urashima, Kevin T. McHenry, Reid M. Pinchback, Azra H. Ligon, Yoon-Jae Cho, Leila Haery, Heidi Greulich, Michael Reich, Wendy Winckler, Michael S. Lawrence, Barbara A. Weir, Kumiko E. Tanaka, Derek Y. Chiang, Adam J. Bass, Alice Loo, Carter Hoffman, John Prensner, Ted Liefeld, Qing Gao, Derek Yecies, Sabina Signoretti, Elizabeth Maher, Frederic J. Kaye, Hidefumi Sasaki, Joel E. Tepper, Jonathan A. Fletcher, Josep Tabernero, Jose Baselga, Ming-Sound Tsao, Francesca DeMichelis, Mark A. Rubin, Pasi A. Janne, Mark J. Daly, Carmelo Nucera, Ross L. Levine, Benjamin L. Ebert, Stacey Gabriel, Anil K. Rustgi, Cristina R. Antonescu, Marc Ladanyi, Anthony Letai, Levi A. Garraway, Massimo Loda, David G. Beer, Lawrence D. True, Aikou Okamoto, Scott L. Pomeroy, Samuel Singer, Todd R. Golub, Eric S. Lander, Gad Getz, William R. Sellers & Matthew Meyerson

Nature 463, 899-905 (18 February 2010)

A powerful way to discover key genes with causal roles in oncogenesis is to identify genomic regions that undergo frequent alteration in human cancers. Here we present high-resolution analyses of somatic copy-number alterations (SCNAs) from 3,131 cancer specimens, belonging largely to 26 histological types. We identify 158 regions of focal SCNA that are altered at significant frequency across several cancer types, of which 122 cannot be explained by the presence of a known cancer target gene located within these regions. Several gene families are enriched among these regions of focal SCNA, including the BCL2 family of apoptosis regulators and the NF-κB pathway. We show that cancer cells containing amplifications surrounding the MCL1 and BCL2L1 anti-apoptotic genes depend on the expression of these genes for survival. Finally, we demonstrate that a large majority of SCNAs identified in individual cancer types are present in several cancer types.

Guo Wei, David Twomey, Justin Lamb, Krysta Schlis, Jyoti Agarwal, Ronald Stam, Joseph T. Opferman, Stephen E. Sallan, Monique L. den Boer, Rob Pieters, Todd R. Golub, Scott A. Armstrong.

Cancer Cell, October 2006. doi: 10.1016/j.ccr.2006.09.006

Drug resistance remains a major obstacle to successful cancer treatment. Here we use a novel approach to identify rapamycin as a glucocorticoid resistance reversal agent. A database of drug-associated gene expression profiles was screened for molecules whose profile overlapped with a gene expression signature of glucocorticoid (GC) sensitivity/resistance in Acute Lymphoblastic Leukemia (ALL) cells. The screen indicated that the profile of the mTOR inhibitor rapamycin matched the signature of GC sensitivity. We thus tested the hypothesis that rapamycin would induce GC sensitivity in GC-resistant lymphoid cells, and found that it sensitized cells to glucocorticoid-induced apoptosis via modulation of antiapoptotic MCL1. These data indicate that MCL1 is an important regulator of GC-induced apoptosis, and that rapamycin is a potential therapeutic for GC-resistant ALL. Furthermore, this approach represents a novel strategy for identification of promising combination therapies for cancer.

Andrei Krivtsov, David Twomey, Zhaohui Feng, Matthew C. Stubbs, Yingzi Wang, Joerg Faber, Jason E. Levine, Jing Wang, William C. Hahn, D. Gary Gilliland, Todd R. Golub, Scott A. Armstrong

doi:10.1038/nature04980; Nature. 2006 Aug 17;442(7104):818-22. Epub 2006 Jul 16

Leukaemias and other cancers possess a rare population of cells capable of the limitless self-renewal necessary for cancer initiation and maintenance [1-7]. Eradication of these cancer stem cells is probably a critical part of any successful anti-cancer therapy, and may explain why conventional cancer therapies are often effective in reducing tumour burden, but are only rarely curative. Given that both normal and cancer stem cells are capable of self-renewal, the extent to which cancer stem cells resemble normal tissue stem cells is a critical issue if targeted therapies are to be developed. However, it remains unclear whether cancer stem cells must be phenotypically similar to normal tissue stem cells or whether they can retain the identity of committed progenitors. Here we show that leukaemia stem cells (LSC) can maintain the global identity of the progenitor from which they arose while activating a limited stem-cell- or self-renewal-associated programme. We isolated LSC from leukaemias initiated in committed granulocyte macrophage progenitors through introduction of the MLL-AF9 fusion protein encoded by the t(9;11)(p22;q23). The LSC were capable of transferring leukaemia to secondary recipient mice when only four cells were transferred, and possessed an immunophenotype and global gene expression profile very similar to that of normal granulocyte macrophage progenitors. However, a subset of genes highly expressed in normal haematopoietic stem cells was re-activated in LSC. LSC can thus be generated from committed progenitors without widespread reprogramming of gene expression, and a leukaemia self-renewal-associated signature is activated in the process. Our findings define progression from normal progenitor to cancer stem cell, and suggest that targeting a self-renewal programme expressed in an abnormal context may be possible.

Jean-Pierre Bourquin, Aravind Subramanian, Claudia Langebrake, Dirk Reinhardt, Olivier Bernard, Paola Ballerini, André Baruchel, Hélène Cavé, Nicole Dastugue, Henrik Hasle, Gertjan L Kaspers, Michel Lessard, Lucienne Michaux, Elisabeth van Wering, Christian M Zwaan, Todd R. Golub and Stuart H. Orkin

No citation available

Individuals with Down Syndrome (DS) are predisposed to develop acute megakaryoblastic leukemia (AMKL), characterized by consistent somatic mutation of the transcription factor GATA1. As a result, DS-AMKL cells express an N-terminally truncated GATA1 protein, GATA1s. The treatment outcome for DS-AMKL is more favorable than for AMKL in non-DS patients. To gain insight into gene expression differences in AMKL, we compared 24 DS and 39 non-DS AMKL samples. We found that non-DS-AMKL samples cluster in two groups, characterized by differences in expression of HOX/TALE family members. Both of these groups are distinct from DS-AMKL, independent of chromosome 21 gene expression. To explore alterations of the GATA1 transcriptome, we used cross-species comparison to genes regulated by GATA1 expression in murine erythroid precursors. Interestingly, genes that are repressed following GATA1 induction in the murine system, most notably GATA-2, MYC and KIT, show increased expression in DS-AMKL, suggesting that GATA1s fails to repress this class of genes. In contrast, only a subset of genes that are upregulated upon GATA1 induction in the murine system show increased expression in DS-AMKL, including GATA1 and BACH1, a probable negative regulator of megakaryocytic differentiation located on chromosome 21. Surprisingly, expression of the chromosome 21 gene RUNX1, a known regulator of megakaryopoiesis, was not elevated in DS-AMKL. Collectively, our results identify relevant signatures for distinct AMKL entities and provide insight into gene expression changes associated with these related leukemias.

Scott A. Armstrong, Jane E. Staunton, Lewis B. Silverman, Rob Pieters, Monique L. den Boer, Mark D. Minden, Stephen E. Sallan, Eric S. Lander, Todd R. Golub, and Stanley J. Korsmeyer

Nature Genetics 30, pp 41 - 47 (2002)

Acute lymphoblastic leukemias carrying a chromosomal translocation involving the mixed-lineage leukemia gene (MLL, ALL1, HRX) have a particularly poor prognosis. Here we show that they have a characteristic, highly distinct gene expression profile that is consistent with an early hematopoietic progenitor expressing select multilineage markers and individual HOX genes. Clustering algorithms reveal that lymphoblastic leukemias with MLL translocations can clearly be separated from conventional acute lymphoblastic and acute myelogenous leukemias. We propose that they constitute a distinct disease, denoted here as MLL, and show that the differences in gene expression are robust enough to classify leukemias correctly as MLL, acute lymphoblastic leukemia or acute myelogenous leukemia. Establishing that MLL is a unique entity is essential, as it mandates the examination of selectively expressed genes for urgently needed molecular targets.

Donna K. Slonim, Pablo Tamayo, Jill P. Mesirov, Todd R. Golub and Eric S. Lander

RECOMB 2000, p263-272, 2000

Classification of patient samples is a crucial aspect of cancer diagnosis and treatment. We present a method for classifying samples by computational analysis of gene expression data. We consider the classification problem in two parts: class discovery and class prediction. Class discovery refers to the process of dividing samples into reproducible classes that have similar behavior or properties, while class prediction places new samples into already known classes. We describe a method for performing class prediction and illustrate its strength by correctly classifying bone marrow and blood samples from acute leukemia patients. We also describe how to use our predictor to validate newly discovered classes, and we demonstrate how this technique could have discovered the key distinctions among leukemias if they were not already known. This proof-of-concept experiment paves the way for a wealth of future work on the molecular classification and understanding of disease.

T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander

Science 286:531-537. (1999).

Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

DN Hayes, S Monti, G Parmigiani, CB Gilks, K Naoki, A Bhattacharjee, MA Socinski, C Perou and M Meyerson

J Clin Oncol, 24(31): 5079-5090, 2006.

PURPOSE: Published reports suggest that DNA microarrays identify clinically meaningful subtypes of lung adenocarcinomas not recognizable by other routine tests. This report is an investigation of the reproducibility of the reported tumor subtypes. METHODS: Three independent cohorts of patients with lung cancer were evaluated using a variety of DNA microarray assays. Using the integrative correlations method, a subset of genes was selected, the reliability of which was acceptable across the different DNA microarray platforms. Tumor subtypes were selected using consensus clustering and genes distinguishing subtypes were identified using the weighted difference statistic. Gene lists were compared across cohorts using centroids and gene set enrichment analysis. RESULTS: Cohorts of 31, 72, and 128 adenocarcinomas were generated for a total of 231 microarrays, each with 2,553 reliable genes. Three adenocarcinoma subtypes were identified in each cohort. These were named bronchioid, squamoid, and magnoid according to their respective correlations with gene expression patterns from histologically defined bronchioalveolar carcinoma, squamous cell carcinoma, and large-cell carcinoma. Tumor subtypes were distinguishable by many hundreds of genes, and lists generated in one cohort were predictive of tumor subtypes in the two other cohorts. Tumor subtypes correlated with clinically relevant covariates, including stage-specific survival and metastatic pattern. Most notably, bronchioid tumors were correlated with improved survival in early-stage disease, whereas squamoid tumors were associated with better survival in advanced disease. CONCLUSION: DNA microarray analysis of lung adenocarcinomas identified reproducible tumor subtypes which differ significantly in clinically important behaviors such as stage-specific survival.

Greulich H, Chen TH, Feng W, Janne PA, Alvarez JV, Zappaterra M, Bulmer SE, Frank DA, Hahn WC, Sellers WR, Meyerson M.

PLoS Medicine 2:e313

BACKGROUND: Somatic mutations in the kinase domain of the epidermal growth factor receptor tyrosine kinase gene EGFR are common in lung adenocarcinoma. The presence of mutations correlates with tumor sensitivity to the EGFR inhibitors erlotinib and gefitinib, but the transforming potential of specific mutations and their relationship to drug sensitivity have not been described. METHODS AND FINDINGS: Here, we demonstrate that EGFR active site mutants are oncogenic. Mutant EGFR can transform both fibroblasts and lung epithelial cells in the absence of exogenous epidermal growth factor, as evidenced by anchorage-independent growth, focus formation, and tumor formation in immunocompromised mice. Transformation is associated with constitutive autophosphorylation of EGFR, Shc phosphorylation, and STAT pathway activation. Whereas transformation by most EGFR mutants confers on cells sensitivity to erlotinib and gefitinib, transformation by an exon 20 insertion makes cells resistant to these inhibitors but more sensitive to the irreversible inhibitor CL-387,785. CONCLUSION: Oncogenic transformation of cells by different EGFR mutants causes differential sensitivity to gefitinib and erlotinib. Treatment of lung cancers harboring EGFR exon 20 insertions may therefore require the development of alternative kinase inhibition strategies.

Balazs Halmos, Daniela S. Basseres, Stefano Monti, Francesco D'Alo, Tajhal Dayaram, Katalin Ferenczi, Bas J. Wouters, Claudia S. Huettner, Todd R. Golub and Daniel G. Tenen

Cancer Research 64, 4137-4147, June 15, 2004

We showed previously that CCAAT/enhancer binding protein (C/EBP), a tissue-specific transcription factor, is a candidate tumor suppressor in lung cancer. In the present study, we have performed a transcriptional profiling study of C/EBP target genes using an inducible cell line system. This study led to the identification of hepatocyte nuclear factor 3beta (HNF3beta), a transcription factor known to play a role in airway differentiation, as a downstream target of C/EBP. We found down-regulation of HNF3beta expression in a large proportion of lung cancer cell lines examined and identified two novel mutants of HNF3beta, as well as hypermethylation of the HNF3beta promoter. We also developed a tetracycline-inducible cell line model to study the cellular consequences of HNF3beta expression. Conditional expression of HNF3beta led to significant growth reduction, proliferation arrest, apoptosis, and loss of clonogenic ability, suggesting additionally that HNF3beta is a novel tumor suppressor in lung cancer. This is the first study to show genetic abnormalities of lung-specific differentiation pathways in the development of lung cancer.

Arindam Bhattacharjee, William G. Richards, Jane Staunton, Cheng Li, Stefano Monti, Priya Vasa, Christine Ladd, Javad Beheshti, Raphael Bueno, Michael Gillette, Massimo Loda, Griffin Weber, Eugene J. Mark, Eric S. Lander, Wing Wong, Bruce E. Johnson, Todd R. Golub, David J. Sugarbaker, and Matthew Meyerson

Proc. Natl. Acad. Sci. USA, Vol. 98, Issue 24, 13790-13795, November 20, 2001

We have generated a molecular taxonomy of lung carcinoma, the leading cause of cancer death in the United States and worldwide. Using oligonucleotide microarrays, we analyzed mRNA expression levels corresponding to 12,600 transcript sequences in 186 lung tumor samples, including 139 adenocarcinomas resected from the lung. Hierarchical and probabilistic clustering of expression data defined distinct sub-classes of lung adenocarcinoma. Among these were tumors with high relative expression of neuroendocrine genes and of type II pneumocyte genes, respectively. Retrospective analysis revealed a less favorable outcome for the adenocarcinomas with neuroendocrine gene expression. The diagnostic potential of expression profiling is emphasized by its ability to discriminate primary lung adenocarcinomas from metastases of extra-pulmonary origin. These results suggest that integration of expression profile data with clinical parameters could aid in diagnosis of lung cancer patients.

Linfeng Chen, Stefano Monti, Przemyslaw Juszczynski, Yasumichi Hitoshi, Wen Chen, Jeffery L. Kutok, Margaret A. Shipp

Blood, 111(4): 2230-2237, 2008

The role of B-cell receptor (BCR)-mediated survival signals in diffuse large B-cell lymphoma (DLBCL) remains undefined. BCR signaling induces receptor oligomerization and Ig α/β ITAM phosphorylation; thereafter, the protein tyrosine kinase SYK is recruited and activated, initiating downstream events and amplifying the original BCR signal. BCRs also transmit low-level tonic survival signals in the absence of receptor engagement. We previously found that protein tyrosine phosphatase-mediated SYK inactivation induced the apoptosis of DLBCLs, highlighting the potential role of SYK-dependent tonic BCR signaling as a survival mechanism. For these reasons, we assessed the efficacy of a novel ATP-competitive inhibitor of SYK, R406, in an extensive panel of DLBCL cell lines. R406 inhibited the proliferation and induced apoptosis of the majority of these DLBCLs. In all R406-sensitive lines, BCR crosslinking increased the autophosphorylation of SYK525/526 and SYK-dependent phosphorylation of the B-cell linker protein, BLNK, and R406 specifically inhibited these events. Furthermore, the R406-sensitive cell lines exhibited tonic phosphorylation of SYK and BLNK in the absence of BCR crosslinking. In addition, the DLBCL cell lines with an intact BCR signaling pathway and sensitivity to the SYK inhibitor were independently identified as BCR tumors by transcriptional profiling. These data suggest that tonic BCR signaling is an important and potentially targetable survival pathway in some, but not all, DLBCLs and that R406-sensitive DLBCLs can be identified by their transcriptional profiles.

K Takeyama, S Monti, J P Manis, P Dal Cin, G Getz, R Beroukhim, S Dutt, J C Aster, F W Alt, T R Golub, and M A Shipp

Oncogene 27(3):318-322, 2007

p53-Binding protein 1 (53BP1) encodes a critical checkpoint protein that localizes to sites of DNA double-strand breaks (DSBs) and participates in DSB repair. Mice that are 53bp1 deficient or hemizygous have an increased incidence of lymphoid malignancies. However, 53BP1 abnormalities in primary human tumors have not been described. By combining high-density single nucleotide polymorphism (HD SNP) array data and gene expression profiles, we found 9 of 63 newly diagnosed human diffuse large B-cell lymphomas (DLBCLs) with single copy loss of the chromosome 15q15 region including the 53BP1 locus; these nine tumors also had significantly lower levels of 53BP1 transcripts. 53BP1 single copy loss found with the HD SNP array platform was subsequently confirmed by fluorescence in situ hybridization. These studies highlight the role of 53BP1 copy loss in primary human DLBCLs and the value of integrative analyses in detecting this genetic lesion in human tumors.

Jessica Shin, Stefano Monti, Daniel J Aires, Madeleine Duvic, Todd Golub, David A Jones, and Thomas S Kupper

Blood, 110(8): 3015-3027, 2007

Cutaneous T cell lymphoma (CTCL) is defined by infiltration of activated and malignant T cells in skin. The clinical manifestations and prognosis in CTCL are highly variable. In this study, we hypothesized that gene expression analysis in lesional skin biopsies can improve understanding of the disease and its management. Based on 63 skin samples, we performed consensus clustering, revealing three patient clusters. Two clusters tended to differentiate limited CTCL (stages IA and IB) from more extensive CTCL (stages IB and III). Stage IB subjects appeared in both clusters, but those in the limited CTCL cluster were more responsive to treatment than those in the more extensive CTCL cluster. The third cluster was enriched in lymphocyte activation genes and was associated with a high proportion of tumor (stage IIB) lesions. Survival analysis revealed significant differences in event-free survival between clusters, with poorest survival seen in the activated lymphocyte cluster. Using supervised analysis, we further characterized genes significantly associated with lower stage/treatment responsive versus higher stage/treatment resistant CTCL. We conclude that transcriptional profiling of CTCL skin lesions reveals clinically relevant signatures, correlating with differences in survival and response to treatment. Additional prospective long-term studies to validate and refine these findings appear warranted.

Przemyslaw Juszczynski, Jing Ouyang, Stefano Monti, Scott J. Rodig, Kunihiko Takeyama, Jeremy Abramson, Wen Chen, Jeffery L. Kutok, Gabriel A. Rabinovich, and Margaret A. Shipp

Proc. Natl. Acad. Sci. USA 104(32):13134-13139, 2007

Classical Hodgkin lymphomas (cHLs) contain small numbers of neoplastic Reed-Sternberg (RS) cells within an extensive inflammatory infiltrate that includes abundant T helper (Th)-2 and T regulatory (Treg) cells. The skewed nature of the T cell infiltrate and the lack of an effective host antitumor immune response suggest that RS cells use potent mechanisms to evade immune attack. In a screen for T cell-inhibitory molecules in cHL, we found that RS cells selectively overexpressed the immunoregulatory glycan-binding protein, galectin-1 (Gal1), through an AP1-dependent enhancer. In cocultures of activated T cells and Hodgkin cell lines, RNAi-mediated blockade of RS cell Gal1 increased T cell viability and restored the Th1/Th2 balance. In contrast, Gal1 treatment of activated T cells favored the secretion of Th2 cytokines and the expansion of CD4+CD25high FOXP3+ Treg cells. These data directly implicate RS cell Gal1 in the development and maintenance of an immunosuppressive Th2/Treg-skewed microenvironment in cHL and provide the molecular basis for selective Gal1 expression in RS cells. Thus, Gal1 represents a potential therapeutic target for restoring immune surveillance in cHL.

Jose M. Polo, Przemyslaw Juszczynski, Stefano Monti, Leandro Cerchietti, Kenny Ye, John M. Greally, Margaret Shipp, Ari Melnick

PNAS 104(9): 3207-3212, 2007

Diffuse large B cell lymphomas (DLBCLs) often express BCL6, a transcriptional repressor required for the formation of normal germinal centers. In a subset of DLBCLs, BCL6 is deregulated by chromosomal translocations or aberrant somatic hypermutation; in other tumors, BCL6 expression may simply reflect germinal center lineage. DLBCLs dependent on BCL6-regulated pathways should exhibit differential regulation of BCL6 target genes. Genomic array ChIP-on-chip was used to identify the cohort of direct BCL6 target genes. This set of genes was enriched in modulators of transcription, chromatin structure, protein ubiquitylation, cell cycle, and DNA damage responses. In primary DLBCLs classified on the basis of gene expression profiles, these BCL6 target genes were clearly differentially regulated in "BCR" tumors, a subset of DLBCLs with increased BCL6 expression and more frequent BCL6 translocations. In a panel of DLBCL cell lines analyzed by expression arrays and classified according to their gene expression profiles, only BCR tumors were highly sensitive to the BCL6 peptide inhibitor, BPI. These studies identify a discrete subset of DLBCLs that are reliant on BCL6 signaling and uniquely sensitive to BCL6 inhibitors. More broadly, these data show how genome-wide identification of direct target genes can identify tumors dependent on oncogenic transcription factors and amenable to targeted therapeutics.

Friedrich Feuerhake, Jeffery L. Kutok, Stefano Monti, Wen Chen, Ann S. LaCasce, Giorgio Cattoretti, Paul Kurtin, Geraldine S. Pinkus, Laurence de Leval, Nancy L. Harris, Kerry J. Savage, Donna Neuberg, Thomas M. Habermann, Riccardo Dalla-Favera, Todd R. Golub, Jon C. Aster, Margaret A. Shipp

Blood, 2005. 106(4): p.1392-1399.

Primary mediastinal large B-cell lymphoma (MLBCL) shares important clinical and molecular features with classical Hodgkin lymphoma, including nuclear localization of the c-REL NFkB subunit in a pilot series. Herein, we analyzed c-REL subcellular localization in additional primary MLBCLs and characterized NFkB activity and function in a MLBCL cell line. The new primary MLBCLs had near-uniform c-REL nuclear staining and the MLBCL cell line exhibited high levels of NFkB binding activity. MLBCL cells expressing a super-repressor form of IkBa had a markedly higher rate of apoptosis, implicating constitutive NFkB activity in MLBCL cell survival. The transcriptional profiles of newly diagnosed primary MLBCLs and DLBCLs were then used to characterize the NFkB target gene signatures of MLBCL and specific DLBCL subtypes. MLBCLs expressed increased levels of NFkB targets that promote cell survival and favor anti-apoptotic TNFa signaling. In contrast, "ABC-like" DLBCLs had a more restricted, potentially developmentally regulated, NFkB target gene signature. Of interest, the newly characterized "Host Response" DLBCL subtype had a robust NFkB target gene signature which partially overlapped that of primary MLBCL. In this large series of primary MLBCLs and DLBCLs, NFkB activation was not associated with amplification of the c-REL locus, suggesting alternative pathogenetic mechanisms.

Stefano Monti, Kerry J. Savage, Jeffery L. Kutok, Friedrich Feuerhake, Paul Kurtin, Martin Mihm, Bingyan Wu, Laura Pasqualucci, Donna Neuberg, Ricardo C.T. Aguiar, Paola Dal Cin, Christine Ladd, Geraldine S. Pinkus, Gilles Salles, Nancy L. Harris, Riccardo Dalla-Favera, Thomas Habermann, Jon C. Aster, Todd R. Golub, Margaret A. Shipp

Blood, 1 March 2005, Vol. 105, No. 5, pp. 1851-1861

Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous disease with recognized variability in clinical outcome, genetic features, and cells of origin. To date, transcriptional profiling has been used to highlight similarities between DLBCL tumor cells and normal B-cell subtypes and associate genes and pathways with unfavorable outcome. To identify robust and highly reproducible DLBCL subtypes with comprehensive transcriptional signatures, we utilized a large series of newly diagnosed DLBCLs, whole genome arrays and multiple clustering methods. Tumors were also analyzed for known common genetic abnormalities in DLBCL. Three discrete subsets of DLBCLs -- Oxidative Phosphorylation, B-cell Receptor/Proliferation, and Host Response (HR) -- were identified, characterized using gene set enrichment analysis and confirmed in an independent series. HR tumors had increased expression of T/NK-cell receptor and activation pathway components, complement cascade members, macrophage/dendritic cell markers and inflammatory mediators. HR DLBCLs also contained significantly higher numbers of morphologically distinct CD2+/CD3+ tumor-infiltrating lymphocytes and interdigitating S100+/GILT+/CD1a-/CD123- dendritic cells. The HR cluster shared features of histologically defined T-cell/histiocyte-rich BCL, including fewer genetic abnormalities, younger age at presentation and frequent splenic and bone marrow involvement. These studies identify tumor microenvironment and host inflammatory response as defining features in DLBCL and suggest rational treatment targets in specific DLBCL subsets.

Kerry J. Savage, Stefano Monti, Jeffery L. Kutok, Giorgio Cattoretti, Donna Neuberg, Laurence de Leval, Paul Kurtin, Paola Dal Cin, Christine Ladd, Friedrich Feuerhake, Ricardo C. T. Aguiar, Sigui Li, Gilles Salles, Francoise Berger, Wen Jing, Geraldine S. Pinkus, Thomas Habermann, Riccardo Dalla-Favera, Nancy Lee Harris, Jon C. Aster, Todd R. Golub and Margaret A. Shipp

Blood, 1 December 2003, Vol. 102, No. 12, pp. 3871-3879.

Mediastinal large B-cell lymphoma (MLBCL) is a recently identified subtype of diffuse large B-cell lymphoma (DLBCL) that characteristically presents as localized tumors in young female patients. Although MLBCL has distinctive pathologic features, it clinically resembles the nodular sclerosis subtype of classical Hodgkin lymphoma (cHL). To elucidate the molecular features of MLBCL, we compared the gene expression profiles of newly diagnosed MLBCL and DLBCL and developed a classifier of these diseases. MLBCLs had low levels of expression of multiple components of the B-cell receptor signaling cascade, a profile resembling that of Reed-Sternberg cells of cHL. Like cHLs, MLBCLs also had high levels of expression of the interleukin-13 (IL-13) receptor and downstream effectors of IL-13 signaling (Janus kinase-2 [JAK2] and signal transducer and activator of transcription-1 [STAT1]), tumor necrosis factor (TNF) family members, and TNF receptor-associated factor-1 (TRAF1). Increased expression of STAT1 and TRAF1 in MLBCL was confirmed by immunohistochemistry. Given the TRAF1 expression and known link to nuclear factor-κB (NF-κB), MLBCLs were also evaluated for nuclear translocation of c-REL protein. In almost all cases, c-REL was localized to the nucleus, consistent with activation of the NF-κB pathway. These studies identify a molecular link between MLBCL and cHL and a shared survival pathway.

M A Shipp, K N Ross, P Tamayo, A P Weng, J L Kutok, R C T Aguiar, M Gaasenbeek, M Angelo, M Reich, G S Pinkus, T S Ray, M A Koval, K W Last, A Norton, T A Lister, J Mesirov, D S Neuberg, E S Lander, J C Aster & T R Golub

Nature Medicine January 2002 Volume 8 Number 1 pp 68 - 74

Diffuse large B-cell lymphoma (DLBCL), the most common lymphoid malignancy in adults, is curable in less than 50% of patients. Prognostic models based on pre-treatment characteristics, such as the International Prognostic Index (IPI), are currently used to predict outcome in DLBCL. However, clinical outcome models identify neither the molecular basis of clinical heterogeneity, nor specific therapeutic targets. We have analyzed the expression of 6817 genes in diagnostic tumor specimens from DLBCL patients who received CHOP-based chemotherapy and have applied a supervised learning prediction method to delineate cured vs. fatal/refractory disease. The algorithm identified 2 categories of patients with dramatically different 5-yr overall survivals (70% vs. 12%). The model also effectively delineated patients within specific IPI risk categories who were likely to be cured or die of their disease. Features associated with outcome included differences in genes involved in responses to B-cell receptor signaling as well as serine/threonine phosphorylation pathways and downstream regulators of apoptosis. These data indicate that supervised learning classification techniques can predict outcome in DLBCL and identify rational targets for intervention.

Lei Xu, Steven S. Shen, Yujin Hoshida, Aravind Subramanian, Ken Ross, Jean-Philippe Brunet, Stephan N. Wagner, Sridhar Ramaswamy, Jill P. Mesirov, and Richard O. Hynes

Molecular Cancer Research 6, 760-769, May 1, 2008

Metastasis is the deadliest phase of cancer progression. Experimental models using immunodeficient mice have been used to gain insights into the mechanisms of metastasis. We report here the identification of a "metastasis aggressiveness gene expression signature" derived using human melanoma cells selected based on their metastatic potentials in a xenotransplant metastasis model. Comparison with expression data from human melanoma patients shows that this metastasis gene signature correlates with the aggressiveness of melanoma metastases in human patients. Many genes encoding secreted and membrane proteins are included in the signature, suggesting the importance of tumor-microenvironment interactions during metastasis.

Edwin A. Clark, Todd R. Golub, Eric S. Lander and Richard O. Hynes.

Nature 406:532-535

The most damaging change during cancer progression is the switch from a locally growing tumour to a metastatic killer. This switch is believed to involve numerous alterations that allow tumour cells to complete the complex series of events needed for metastasis. Relatively few genes have been implicated in these events. Here we use an in vivo selection scheme to select highly metastatic melanoma cells. By analysing these cells on DNA arrays, we define a pattern of gene expression that correlates with progression to a metastatic phenotype. In particular, we show enhanced expression of several genes involved in extracellular matrix assembly and of a second set of genes that regulate, either directly or indirectly, the actin-based cytoskeleton. One of these, the small GTPase RhoC, enhances metastasis when overexpressed, whereas a dominant-negative Rho inhibits metastasis. Analysis of the phenotype of cells expressing dominant-negative Rho or RhoC indicates that RhoC is important in tumour cell invasion. The genomic approach allows us to identify families of genes involved in a process, not just single genes, and can indicate which molecular and cellular events might be important in complex biological processes such as metastasis.

Vamsi K. Mootha, Christoph Handschin, Dan Arlow, Xiaohui Xie, Julie St. Pierre, Smita Sihag, Wenli Yang, David Altshuler, Pere Puigserver, Nick Patterson, Patricia J. Willy, Ira G. Shulman, Richard A. Heyman, Eric S. Lander, and Bruce M. Spiegelman

Proc. Natl. Acad. Sci. USA

Recent studies have shown that genes involved in oxidative phosphorylation (OXPHOS) exhibit reduced expression in skeletal muscle of diabetic and prediabetic humans. Moreover, these changes may be mediated by the transcriptional co-activator peroxisome proliferator-activated receptor-g co-activator-1a (PGC-1a). By combining PGC-1a-induced genomewide transcriptional profiles with a computational strategy to detect cis-regulatory motifs, we identified estrogen receptor-related receptor-a (Erra) and GA-binding protein a (Gabpa) as key transcription factors regulating the OXPHOS pathway. Interestingly, the genes encoding these two transcription factors are themselves PGC-1a-inducible and contain variants of both motifs near their promoters. Cellular assays confirmed that Erra and Gabpa partner with PGC-1a in muscle to form a double-positive feedback loop that drives the expression of many OXPHOS genes. By using a synthetic inhibitor of Erra, we demonstrated its key role in PGC-1a mediated effects on gene regulation and cellular respiration. These results illustrate the dissection of gene regulatory networks in a complex mammalian system, elucidate the mechanism of PGC-1a action in the OXPHOS pathway, and suggest that Erra agonists may ameliorate insulin-resistance in individuals with type 2 diabetes mellitus.

Vamsi K. Mootha, Jakob Bunkenborg, Jesper V. Olsen, Majbrit Hjerrild, Jacek R. Wisniewski, Erich Stahl, Marjan S. Bolouri, Heta N. Ray, Smita Sihag, Michael Kamal, Nick Patterson, Eric S. Lander, and Matthias Mann

Cell 115: 629-640

Mitochondria are tailored to meet the metabolic and signaling needs of each cell. To explore its molecular composition, we performed a proteomic survey of mitochondria from mouse brain, heart, kidney, and liver and combined the results with existing gene annotations to produce a list of 591 mitochondrial proteins, including 163 proteins not previously associated with the organelle. The protein expression data were largely concordant with large-scale surveys of RNA abundance and both measures indicate tissue specific differences in organelle composition. RNA expression profiles across a wide battery of tissues reveal sub-networks of mitochondrial genes that share function and regulatory mechanisms. We also determined a larger "neighborhood" of genes whose expression is closely correlated to the mitochondrial genes. The combined analysis identifies specific genes of biological interest, such as candidates for mtDNA repair enzymes, offers new insights into the biogenesis and ancestry of mammalian mitochondria, and provides a framework for understanding the organelle's contribution to human disease.

Vamsi K. Mootha, Cecilia M. Lindgren, Karl-Fredrik Eriksson, Aravind Subramanian, Smita Sihag, Joseph Lehar, Pere Puigserver, Emma Carlsson, Martin Ridderstråle, Esa Laurila, Nicholas Houstis, Mark J. Daly, Nick Patterson, Jill P. Mesirov, Todd R. Golub, Pablo Tamayo, Bruce Spiegelman, Eric S. Lander, Joel N. Hirschhorn, David Altshuler, and Leif C. Groop

Nature Genet. 15 June 2003, vol. 34 no. 3 pp 267-273.

DNA microarrays can be used to discover gene expression changes characteristic of human disease. This is challenging, however, when relevant differences are subtle at the level of individual genes. We introduce an analytical strategy, Gene Set Enrichment Analysis, designed to detect modest but coordinate changes in the expression of groups of functionally related genes. Using this approach, we identify a set of genes involved in oxidative phosphorylation whose expression is coordinately decreased in human diabetic muscle. Expression of these genes is high at sites of insulin-mediated glucose disposal, activated by PGC-1a, and correlated with total-body aerobic capacity. Our results associate this gene set with clinically important variation in human metabolism, and illustrate the value of pathway relationships in the analysis of genomic profiling experiments.

Dirk M. Pegtel, Aravind Subramanian, Tzung-Shiahn Sheen, Ching-Hwa Tsai, Todd R. Golub and David A. Thorley-Lawson

No citation available

Non-keratinizing nasopharyngeal carcinomas (NPC) are >95% associated with the expression of the Epstein-Barr virus (EBV) LMP2A latent protein. However, the role of EBV, in particular LMP2A, in tumor progression is not well understood. Using Affymetrix chips and a pattern matching computational technique (neighborhood analysis), we show that the level of LMP2A expression in NPC biopsies correlates with that of a cellular protein, Integrin-alpha-6 (ITGa6), that is associated with cellular migration in vitro and metastasis in vivo. We have recently developed a primary epithelial model from tonsil tissue to study EBV infection in epithelial cells. Here we report that LMP2A expression in primary tonsil epithelial cells causes them to become migratory and invasive, that ITGa6 RNA levels are upregulated in epithelial cells expressing LMP2 and ITGa6 protein levels are increased in the migrating cells. Blocking antibodies against ITGa6 abrogated LMP2 induced invasion through Matrigel by primary epithelial cells. Our results provide a link between LMP2A expression, ITGa6 expression, epithelial cell migration and NPC metastasis and suggest that EBV infection may contribute to the high incidence of metastasis in NPC progression.

Sridhar Ramaswamy, Ken N. Ross, Eric S. Lander, & Todd R. Golub

Nature Genetics, vol. 33, January 2003, pp. 49-54.

Metastasis is the principal cause of death in cancer patients, yet its genetic basis remains poorly understood. To explore the molecular differences between human primary tumors and metastases, we compared the gene expression profiles of adenocarcinoma metastases of multiple tumor types to unmatched primary adenocarcinomas. First, we were able to find a gene expression signature that distinguished primary from metastatic adenocarcinomas. Second, and more surprisingly, we found that a subset of primary tumors resembled metastatic tumors with respect to this gene expression signature. We confirmed this finding by applying this expression signature to data on 279 primary solid tumors of diverse types. We found that solid tumors carrying the metastasis gene expression signature were most likely to be associated with metastasis and poor clinical outcome (P < 0.03). These results suggest that the metastatic potential of human tumors is encoded in the bulk of a primary tumor, thus challenging the notion that metastases arise from rare cells within a primary tumor that have the ability to metastasize.

Ding Z, Wu CJ, Chu GC, Xiao Y, Ho D, Zhang J, Perry SR, Labrot ES, Wu X, Lis R, Hoshida Y, Hiller D, Hu B, Jiang S, Zheng H, Stegh AH, Scott KL, Signoretti S, Bardeesy N, Wang YA, Hill DE, Golub TR, Stampfer MJ, Wong WH, Loda M, Mucci L, Chin L, DePinho RA

Nature. 2011 Feb 10;470(7333):269-73

Effective clinical management of prostate cancer (PCA) has been challenged by significant intratumoural heterogeneity on the genomic and pathological levels and limited understanding of the genetic elements governing disease progression. Here, we exploited the experimental merits of the mouse to test the hypothesis that pathways constraining progression might be activated in indolent Pten-null mouse prostate tumours and that inactivation of such progression barriers in mice would engender a metastasis-prone condition. Comparative transcriptomic and canonical pathway analyses, followed by biochemical confirmation, of normal prostate epithelium versus poorly progressive Pten-null prostate cancers revealed robust activation of the TGFβ/BMP-SMAD4 signalling axis. The functional relevance of SMAD4 was further supported by emergence of invasive, metastatic and lethal prostate cancers with 100% penetrance upon genetic deletion of Smad4 in the Pten-null mouse prostate. Pathological and molecular analysis as well as transcriptomic knowledge-based pathway profiling of emerging tumours identified cell proliferation and invasion as two cardinal tumour biological features in the metastatic Smad4/Pten-null PCA model. Follow-on pathological and functional assessment confirmed cyclin D1 and SPP1 as key mediators of these biological processes, which together with PTEN and SMAD4, form a four-gene signature that is prognostic of prostate-specific antigen (PSA) biochemical recurrence and lethal metastasis in human PCA. This model-informed progression analysis, together with genetic, functional and translational studies, establishes SMAD4 as a key regulator of PCA progression in mice and humans.

Andrea Sboner, Francesca Demichelis, Stefano Calza, Yudi Pawitan, Sunita R Setlur, Yujin Hoshida, Sven Perner, Hans-Olov Adami, Katja Fall, Lorelei A Mucci, Philip W Kantoff, Meir Stampfer, Swen-Olof Andersson, Eberhard Varenhorst, Jan-Erik Johansson, Mark B Gerstein, Todd R Golub, Mark A Rubin and Ove Andren

BMC Medical Genomics. 2010 Mar 16;3(1):8

BACKGROUND: Current prostate cancer prognostic models are based on pre-treatment prostate specific antigen (PSA) levels, biopsy Gleason score, and clinical staging but in practice are inadequate to accurately predict disease progression. Hence, we sought to develop a molecular panel for prostate cancer progression by reasoning that molecular profiles might further improve current clinical models. METHODS: We analyzed a Swedish Watchful Waiting cohort with up to 30 years of clinical follow up using a novel method for gene expression profiling. This cDNA-mediated annealing, selection, ligation, and extension (DASL) method enabled the use of formalin-fixed paraffin-embedded transurethral resection of prostate (TURP) samples taken at the time of the initial diagnosis. We determined the expression profiles of 6100 genes for 281 men divided in two extreme groups: men who died of prostate cancer and men who survived more than 10 years without metastases (lethals and indolents, respectively). Several statistical and machine learning models using clinical and molecular features were evaluated for their ability to distinguish lethal from indolent cases. RESULTS: Surprisingly, none of the predictive models using molecular profiles significantly improved over models using clinical variables only. Additional computational analysis confirmed that molecular heterogeneity within both the lethal and indolent classes is widespread in prostate cancer as compared to other types of tumors. CONCLUSIONS: The determination of the molecularly dominant tumor nodule may be limited by sampling at time of initial diagnosis, may not be present at time of initial diagnosis, or may occur as the disease progresses making the development of molecular biomarkers for prostate cancer progression challenging.

Sunita R. Setlur, Kirsten D. Mertz, Yujin Hoshida, Francesca Demichelis, Mathieu Lupien, Sven Perner, Andrea Sboner, Yudi Pawitan, Ove Andrén, Laura A. Johnson, Jeff Tang, Hans-Olov Adami, Stefano Calza, Arul M. Chinnaiyan, Daniel Rhodes, Scott Tomlins, Katja Fall, Lorelei A. Mucci, Philip W. Kantoff, Meir J. Stampfer, Swen-Olof Andersson, Eberhard Varenhorst, Jan-Erik Johansson, Myles Brown, Todd R. Golub, Mark A. Rubin

J Natl Cancer Inst 2008;100:815-825

No abstract available

F Demichelis, K Fall, S Perner, O Andrén, F Schmidt, SR Setlur, Y Hoshida, J-M Mosquera, Y Pawitan, C Lee, H-O Adami, LA Mucci, PW Kantoff, S-O Andersson, AM Chinnaiyan, J-E Johansson, MA Rubin

Oncogene. 2007 Jul 5;26(31):4596-9

The identification of the TMPRSS2:ERG fusion in prostate cancer suggests that distinct molecular subtypes may define risk for disease progression. In surgical series, TMPRSS2:ERG fusion was identified in 50% of the tumors. Here, we report on a population-based cohort of men with localized prostate cancers followed by expectant (watchful waiting) therapy with 15% (17/111) TMPRSS2:ERG fusion. We identified a statistically significant association between TMPRSS2:ERG fusion and prostate cancer specific death (cumulative incidence ratio=2.7, P<0.01, 95% confidence interval=1.3-5.8). Quantitative reverse-transcription-polymerase chain reaction demonstrated high ets-related gene (ERG) expression to be associated with TMPRSS2:ERG fusion (P<0.005). These data suggest that TMPRSS2:ERG fusion prostate cancers may have a more aggressive phenotype, possibly mediated through increased ERG expression.

Dinesh Singh, Phillip G. Febbo, Kenneth Ross, Donald G. Jackson, Judith Manola, Christine Ladd, Pablo Tamayo, Andrew A. Renshaw, Anthony V. D'Amico, Jerome P. Richie, Eric S. Lander, Massimo Loda, Philip W. Kantoff, Todd R. Golub, William R. Sellers.

Cancer Cell: March 2002, Vol. 1.

Prostate tumors are among the most heterogeneous of cancers, both histologically and with respect to highly divergent clinical outcomes. We used oligonucleotide array-based expression analysis to determine whether global biological differences underlie common pathological features of prostate cancer and to identify genes that might prove useful in anticipating the clinical behavior of this common disease. Robust expression differences between tumor and normal samples allowed for the development of accurate class prediction models that were validated in an independent data set. While no expression correlates of age, serum PSA, and measures of local invasion were found, a set of 29 genes including multiple TGF-beta targets was identified that strongly correlated with the state of tumor differentiation (Gleason Score). Finally, a model built using the expression of 5 genes accurately predicted patient outcome following prostatectomy. These results, taken together, support the notion that the clinical behavior of prostate cancer is genetically determined, and that these genetic determinants, or markers thereof, are detectable at the time of diagnosis.

David A. Barbie, Pablo Tamayo, Jesse S. Boehm, So Young Kim, Susan E. Moody, Ian F. Dunn, Anna C. Schinzel, Peter Sandy, Etienne Meylan, Claudia Scholl, Stefan Frohling, Edmond M. Chan, Martin L. Sos, Kathrin Michel, Craig Mermel, Serena J. Silver, Barbara A. Weir, Jan H. Reiling, Qing Sheng, Piyush B. Gupta, Raymond C. Wadlow, Hanh Le, Sebastian Hoersch, Ben S. Wittner, Sridhar Ramaswamy, David M. Livingston, David M. Sabatini, Matthew Meyerson, Roman K. Thomas, Eric S. Lander, Jill P. Mesirov, David E. Root, D. Gary Gilliland, Tyler Jacks & William C. Hahn

Nature 462(7269),108-112

The proto-oncogene KRAS is mutated in a wide array of human cancers, most of which are aggressive and respond poorly to standard therapies. Although the identification of specific oncogenes has led to the development of clinically effective, molecularly targeted therapies in some cases, KRAS has remained refractory to this approach. A complementary strategy for targeting KRAS is to identify gene products that, when inhibited, result in cell death only in the presence of an oncogenic allele. Here we have used systematic RNA interference to detect synthetic lethal partners of oncogenic KRAS and found that the non-canonical IkappaB kinase TBK1 was selectively essential in cells that contain mutant KRAS. Suppression of TBK1 induced apoptosis specifically in human cancer cell lines that depend on oncogenic KRAS expression. In these cells, TBK1 activated NF-kappaB anti-apoptotic signals involving c-Rel and BCL-XL (also known as BCL2L1) that were essential for survival, providing mechanistic insights into this synthetic lethal interaction. These observations indicate that TBK1 and NF-kappaB signalling are essential in KRAS mutant tumours, and establish a general approach for the rational identification of co-dependent pathways in cancer.

Benjamin L. Ebert and Todd R. Golub

Blood. 2004; 104: 923-932

Over the past several years, experiments using DNA microarrays have contributed to an increasingly refined molecular taxonomy of hematologic malignancies. In addition to the characterization of molecular profiles for known diagnostic classifications, studies have defined patterns of gene expression corresponding to specific molecular abnormalities, oncologic phenotypes, and clinical outcomes. Furthermore, novel subclasses with distinct molecular profiles and clinical behaviors have been identified. In some cases, specific cellular pathways have been highlighted that can be therapeutically targeted. The findings of microarray studies are beginning to enter clinical practice as novel diagnostic tests, and clinical trials are ongoing in which therapeutic agents are being used to target pathways that were identified by gene expression profiling. While the technology of DNA microarrays is becoming well established, genome-wide surveys of gene expression generate large datasets that can easily lead to spurious conclusions. Many challenges remain in the statistical interpretation of gene expression data and the biological validation of findings. As data accumulates and analyses become more sophisticated, genomic technologies offer the potential to generate increasingly sophisticated insights into the complex molecular circuitry of hematologic malignancies. This review summarizes the current state of discovery, and addresses key areas for future research.

Gregory Piatetsky-Shapiro and Pablo Tamayo

SIGKDD Explorations December 2003. Volume 5, Issue 2

This is an introductory article for a special issue of the Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining dedicated to Microarray Data Mining.

Sridhar Ramaswamy, Charles M. Perou

Lancet 361:1576-1577

No abstract available

Pablo Tamayo and Sridhar Ramaswamy

Expression profiling of human tumors: diagnostic and research applications. Marc Ladanyi and William Gerald eds. Humana Press (2003).

Cancer is a genetic malady, mostly resulting from acquired mutations and epigenetic changes that influence gene expression. Accordingly, a major focus in cancer research is identifying genetic markers that can be used for precise diagnosis or therapy. Over the last half-century, investigators have used reductionism to discover such markers through the study of simple genetic changes like balanced chromosomal translocations. For example, fundamental insights into the nature of the bcr-abl gene translocation product resulted in the precise molecular classification of chronic myelogenous leukemia and recently led to the development of the molecularly targeted tyrosine kinase inhibitor STI571 (Gleevec; Novartis, East Hanover, NJ) for the treatment of this disease. Ninety percent of human cancers, however, are epithelial in origin and display marked aneuploidy, multiple gene amplifications and deletions, and genetic instability, making resulting downstream effects difficult to study with traditional methods. Because this complexity probably explains the clinical diversity of histologically similar tumors, a comprehensive understanding of the genetic alterations present in all tumors is required. The initial sequencing of the human genome, coupled with technologic advances, now makes it possible to embrace the genetic complexity of common human cancers in a global fashion. Tools are currently available, or are being developed, for the identification of all changes that take place in cancer at the DNA, RNA, and protein levels. In particular, the use of DNA microarrays for the comprehensive analysis of RNA expression (expression profiling) in human tumor samples holds much promise (see review articles in the Chipping Forecast 1999). A major challenge with this approach, however, remains the interpretation of complex and biologically "noisy" data in a way that yields new knowledge.
We have therefore focused on developing first-generation approaches to gene expression data analysis that are suitable for this purpose. Without such analytic tools, DNA microarray data are useless. This chapter is meant to serve as an introduction to fundamental concepts and techniques that have been developed in gene expression data mining over the last three years. It is not meant to be a comprehensive review of this rapidly expanding field, nor is it a step-by-step set of recipes. Most of the examples described come from our experience in cancer gene expression data analysis at the Whitehead / MIT Center for Genome Research over the last five years, but references to other works are also given when relevant to the discussion.

Sridhar Ramaswamy, Todd R. Golub

Journal of Clinical Oncology 20, 1932-1941

Aberrant gene expression is critical for tumor initiation and progression. However, we lack a comprehensive understanding of all genes that are aberrantly expressed in human cancer. Recently, DNA microarrays have been used to obtain global views of human cancer gene expression and to identify genetic markers that might be important for diagnosis and therapy. We review clinical applications of these novel tools, discuss some important recent studies, identify promising avenues of research in this emerging field of study, and discuss the likely impact that expression profiling will have on clinical oncology.

Todd Golub

N Engl J Med 2001; 344:601-602

No abstract available

Eric S. Lander and Robert A. Weinberg

Science 2000 March 10; 287: 1777-1782

Without doubt, the greatest achievement in biology over the past millennium has been the elucidation of the mechanism of heredity. Heredity is surely the strangest of physiological processes: Organisms encapsulate instructions for creating a member of their species in their gametes, these instructions are passed on to a fertilized egg, and then they unfold spontaneously to give rise to offspring. The ancient Greeks puzzled over these remarkable phenomena. Hippocrates imagined that instructional particles were gathered together from throughout the adult body, having been shaped by experience, while Aristotle believed that the instructions were constant and inherent in the gametes. But philosophers could do no more than speculate for the ensuing 2000 years, because there was no way to probe the physical nature of these instructions.

1998.12.31

Eric S Lander

The Chipping Forecast (1999) Special Supplement. Nature Genet. 21:January 1999.

Genomics aims to provide biologists with the equivalent of chemistry's Periodic Table: an inventory of all genes used to assemble a living creature, together with an insightful system for classifying these building blocks. A short decade ago, the task of enumeration alone appeared to many to be a quixotic quest. Whereas chemical matter is composed of a mere hundred or so elements, organismal parts lists are huge, running into the thousands for bacteria and hundreds of thousands for mammals. Genomic mapping and sequencing, however, has steadily extended its dominion: it has domesticated the Megabase and will tame the Gigabase in the not-too-distant future.

Barbara A. Weir, Michele S. Woo, Gad Getz, Sven Perner, Li Ding, Rameen Beroukhim, William M. Lin, Michael A. Province, Aldi Kraja, Laura A. Johnson, Kinjal Shah, Mitsuo Sato, Roman K. Thomas, Justine A. Barletta, Ingrid B. Borecki, Stephen Broderick, Andrew C. Chang, Derek Y. Chiang, Lucian R. Chirieac, Jeonghee Cho, Yoshitaka Fujii, Adi F. Gazdar, Thomas Giordano, Heidi Greulich, Megan Hanna, Bruce E. Johnson, Mark G. Kris, Alex Lash, Ling Lin, Neal Lindeman, Elaine R. Mardis, John D. McPherson, John Minna, Margaret B. Morgan, Mark Nadel, Mark B. Orringer, John R. Osborne, Brad Ozenberger, Alex H. Ramos, James Robinson, Jack A. Roth, Valerie Rusch, Hidefumi Sasaki, Frances Shepherd, Carrie Sougnez, Margaret R. Spitz, Ming-Sound Tsao, David Twomey, Roel Verhaak, George M. Weinstock, David A. Wheeler, Wendy Winckler, Akihiko Yoshizawa, Soyoung Yu, Maureen F. Zakowski, Qunyuan Zhang, David G. Beer, Ignacio I. Wistuba, Mark A. Watson, Levi A. Garraway, Marc Ladanyi, William D. Travis, William Pao, Mark A. Rubin, Stacey B. Gabriel, Richard A. Gibbs, Harold E. Varmus, Richard K. Wilson, Eric S. Lander & Matthew Meyerson

Nature 450, 893-898 (06 December 2007)

Somatic alterations in cellular DNA underlie almost all human cancers. The prospect of targeted therapies and the development of high-resolution, genome-wide approaches are now spurring systematic efforts to characterize cancer genomes. Here, we report a large-scale project to characterize copy-number alterations in primary lung adenocarcinomas. By analysis of a large collection of tumors (n=371) using dense single nucleotide polymorphism (SNP) arrays, we identify a total of 57 significantly recurrent events. We find that 26 of 39 autosomal chromosome arms show consistent gain or loss, of which only a handful have been linked to a specific gene. We also identify 31 recurrent focal events, including 24 amplifications and 7 homozygous deletions. Only 6 of these focal events are currently associated with known mutations in lung carcinomas. The most common event, amplification of chromosome 14q13.3, is found in ~12% of samples. Based on genomic and functional analysis, we identify NKX2-1 (TITF1), which lies in the interval and encodes a lineage-specific transcription factor, as a novel candidate proto-oncogene involved in a significant fraction of lung adenocarcinomas. More generally, our results indicate that many of the genes that play a role in lung adenocarcinoma remain to be discovered.

Thomas LaFramboise, Barbara A. Weir, Xiaojun Zhao, Rameen Beroukhim, Cheng Li, David Harrington, William R. Sellers, Matthew Meyerson

PLoS Comput Biol. 2005 Nov;1(6):e65.

Amplification, deletion, and loss of heterozygosity of genomic DNA are hallmarks of cancer. In recent years a variety of studies have emerged measuring total chromosomal copy number at increasingly high resolution. Similarly, loss-of-heterozygosity events have been finely mapped using high-throughput genotyping technologies. We have developed a probe-level allele-specific quantitation procedure that extracts both copy number and allelotype information from single nucleotide polymorphism (SNP) array data to arrive at allele-specific copy number across the genome. Our approach applies an expectation-maximization algorithm to a model derived from a novel classification of SNP array probes. This method is the first to our knowledge that is able to (a) determine the generalized genotype of aberrant samples at each SNP site (e.g., CCCCT at an amplified site), and (b) infer the copy number of each parental chromosome across the genome. With this method, we are able to determine not just where amplifications and deletions occur, but also the haplotype of the region being amplified or deleted. The merit of our model and general approach is demonstrated by very precise genotyping of normal samples, and our allele-specific copy number inferences are validated using PCR experiments. Applying our method to a collection of lung cancer samples, we are able to conclude that amplification is essentially monoallelic, as would be expected under the mechanisms currently believed responsible for gene amplification. This suggests that a specific parental chromosome may be targeted for amplification, whether because of germ line or somatic variation. An R software package containing the methods described in this paper is freely available at http://genome.dfci.harvard.edu/~tlaframb/PLASQ.

Levi A. Garraway, Hans R. Widlund, Mark A. Rubin, Gad Getz, Aaron J. Berger, Sridhar Ramaswamy, Rameen Beroukhim, Danny A. Milner, Scott R. Granter, Jinyan Du, Charles Lee, Stephan N. Wagner, Cheng Li, Todd R. Golub, David L. Rimm, Matthew L. Meyerson, David E. Fisher and William R. Sellers

Nature. 2005 Jul 7;436(7047):117-22.

Systematic analyses of cancer genomes promise to unveil patterns of genetic alterations linked to the genesis and spread of human cancers. High-density single-nucleotide polymorphism (SNP) arrays enable detailed and genome-wide identification of both loss-of-heterozygosity events and copy-number alterations in cancer. Here, by integrating SNP array-based genetic maps with gene expression signatures derived from NCI60 cell lines, we identified the melanocyte master regulator MITF (microphthalmia-associated transcription factor) as the target of a novel melanoma amplification. We found that MITF amplification was more prevalent in metastatic disease and correlated with decreased overall patient survival. BRAF mutation and p16 inactivation accompanied MITF amplification in melanoma cell lines. Ectopic MITF expression in conjunction with the BRAF(V600E) mutant transformed primary human melanocytes, and thus MITF can function as a melanoma oncogene. Reduction of MITF activity sensitizes melanoma cells to chemotherapeutic agents. Targeting MITF in combination with BRAF or cyclin-dependent kinase inhibitors may offer a rational therapeutic avenue into melanoma, a highly chemotherapy-resistant neoplasm. Together, these data suggest that MITF represents a distinct class of 'lineage survival' or 'lineage addiction' oncogenes required for both tissue-specific cancer development and tumour progression.

Xiaojun Zhao, Barbara A. Weir, Thomas LaFramboise, Ming Lin, Rameen Beroukhim, Levi Garraway, Javad Beheshti, Jeffrey C. Lee, Katsuhiko Naoki, William G. Richards, David Sugarbaker, Fei Chen, Mark A. Rubin, Pasi A. Janne, Luc Girard, John Minna, David Christiani, Cheng Li, William R. Sellers and Matthew Meyerson

Cancer Res. 2005 Jul 1;65(13):5561-70.

Genome-wide copy number changes were analyzed in 70 primary human lung carcinoma specimens and 31 cell lines derived from human lung carcinomas, with high-density arrays representing approximately 115,000 single nucleotide polymorphism loci. In addition to previously characterized loci, two regions of homozygous deletion were found, one near the PTPRD locus on chromosome segment 9p23 in four samples representing both small cell lung carcinoma (SCLC) and non-small cell lung carcinoma (NSCLC) and the second on chromosome segment 3q25 in one sample each of NSCLC and SCLC. High-level amplifications were identified within chromosome segment 8q12-13 in two SCLC specimens, 12p11 in two NSCLC specimens and 22q11 in four NSCLC specimens. Systematic copy number analysis of tyrosine kinase genes identified high-level amplification of EGFR in three NSCLC specimens, FGFR1 in two specimens and ERBB2 and MET in one specimen each. EGFR amplification was shown to be independent of kinase domain mutational status.

Minna Allinen, Rameen Beroukhim, Li Cai, Cameron Brennan, Jaana Lahti-Domenici, Haiyan Huang, Dale Porter, Min Hu, Lynda Chin, Andrea Richardson, Stuart Schnitt, William R. Sellers and Kornelia Polyak

Cancer Cell. 2004 Jul;6(1):17-32.

Here we describe the comprehensive gene expression profiles of each cell type composing normal breast tissue and in situ and invasive breast carcinomas using serial analysis of gene expression. Based on these data, we determined that extensive gene expression changes occur in all cell types during cancer progression and that a significant fraction of altered genes encode secreted proteins and receptors. Despite the dramatic gene expression changes in all cell types, genetic alterations were detected only in cancer epithelial cells. The CXCL14 and CXCL12 chemokines overexpressed in tumor myoepithelial cells and myofibroblasts, respectively, bind to receptors on epithelial cells and enhance their proliferation, migration, and invasion. Thus, chemokines may play a role in breast tumorigenesis by acting as paracrine factors.

J. Guillermo Paez, Ming Lin, Rameen Beroukhim, Jeffrey C. Lee, Xiaojun Zhao, Daniel J. Richter, Stacey Gabriel, Paula Herman, Hidefumi Sasaki, David Altshuler, Cheng Li, Matthew Meyerson, and William R. Sellers

Nucleic Acids Res. 2004 May 18;32(9):e71.

Major efforts are underway to systematically define the somatic and germline genetic variations causally associated with disease. Genome-wide genetic analysis of actual clinical samples is, however, limited by the paucity of genomic DNA available. Here we have tested the fidelity and genome representation of phi29 polymerase-based genome amplification (phi29MDA) using direct sequencing and high density oligonucleotide arrays probing >10,000 SNP alleles. Genome representation was comprehensive and estimated to be 99.82% complete, although six regions encompassing a maximum of 5.62 Mb failed to amplify. There was no degradation in the accuracy of SNP genotyping and, in direct sequencing experiments sampling 500,000 bp, the estimated error rate (9.5 x 10(-6)) was the same as in paired unamplified samples. The detection of cancer-associated loss of heterozygosity and copy number changes, including homozygous deletion and gene amplification, were similarly robust. These results suggest that phi29MDA yields high fidelity, near-complete genome representation suitable for high resolution genetic analysis.

Xiaojun Zhao, Cheng Li, J. Guillermo Paez, Koei Chin, Pasi A. Janne, Tzu-Hsiu Chen, Luc Girard, John Minna, David Christiani, Chris Leo, Joe W. Gray, William R. Sellers and Matthew Meyerson

Cancer Res. 2004 May 1;64(9):3060-71.

Changes in DNA copy number contribute to cancer pathogenesis. We now show that high-density single nucleotide polymorphism (SNP) arrays can detect copy number alterations. By hybridizing genomic representations of breast and lung carcinoma cell line and lung tumor DNA to SNP arrays, and measuring locus-specific hybridization intensity, we detected both known and novel genomic amplifications and homozygous deletions in these cancer samples. Moreover, by combining genotyping with SNP quantitation, we could distinguish loss of heterozygosity events caused by hemizygous deletion from those that occur by copy-neutral events. The simultaneous measurement of DNA copy number changes and loss of heterozygosity events by SNP arrays should strengthen our ability to discover cancer-causing genes and to refine cancer diagnosis.

Pasi A Janne, Cheng Li, Xiaojun Zhao, Luc Girard, Tzu-Hsiu Chen, John Minna, David C Christiani, Bruce E Johnson, and Matthew Meyerson

Oncogene (2004) 23, 2716-2726

Chromosomal loss of heterozygosity (LOH) is a common mechanism for the inactivation of tumor suppressor genes in human epithelial cancers. Hybridization to single-nucleotide polymorphism (SNP) arrays is an efficient method to detect genome-wide cancer LOH. Here, we survey LOH patterns in a panel of 33 human lung cancer cell lines using SNP array hybridization containing 1500 SNPs. We compared the LOH patterns generated by SNP array hybridization to those previously obtained by 399 microsatellite markers and find a high degree of concordance between the two methods. A novel informatics platform, dChipSNP, was used to perform hierarchical tumor clustering based on genome-wide LOH patterns. We demonstrate that this method can separate non-small-cell and small-cell lung cancer samples based on their shared LOH. Furthermore, we analysed seven human lung cancer cell lines using a novel 10 000 SNP array and demonstrate that this is an efficient and reliable method of high-density allelotyping. Using this array, we identified small regions of LOH that were not detected by lower density SNP arrays or by standard microsatellite marker panels.

Ming Lin, Lee-Jen Wei, William R. Sellers, Marshall Lieberfarb, Wing Hung Wong, and Cheng Li

Bioinformatics. 2004 May 22;20(8):1233-40.

MOTIVATION: Oligonucleotide microarrays allow genotyping of thousands of single-nucleotide polymorphisms (SNPs) in parallel. Recently, this technology has been applied to loss-of-heterozygosity (LOH) analysis of paired normal and tumor samples. However, methods and software for analyzing such data are not fully developed. RESULT: Here, we report automated methods for pooling SNP array replicates to make LOH calls, visualizing SNP and LOH data along chromosomes in the context of genes and cytobands, making statistical inference to identify shared LOH regions, clustering samples based on LOH profiles and correlating the clustering results to clinical variables. Application of these methods to prostate and breast cancer datasets generates biologically important results. AVAILABILITY: The software module dChipSNP implementing these methods is available at http://biosun1.harvard.edu/complab/dchip/snp/ SUPPLEMENTARY INFORMATION: The breast cancer data are provided by Andrea L. Richardson, Zhigang C. Wang and James D. Iglehart.

Zhigang C. Wang, Ming Lin, Lee-Jen Wei, Cheng Li, Alexander Miron, Gabriella Lodeiro, Lyndsay Harris, Sridhar Ramaswamy, David M. Tanenbaum, Matthew Meyerson, James D. Iglehart and Andrea Richardson

Cancer Res. 2004 Jan 1;64(1):64-71.

Gene expression array profiles identify subclasses of breast cancers with different clinical outcomes and different molecular features. The present study attempted to correlate genomic alterations (loss of heterozygosity; LOH) with subclasses of breast cancers having distinct gene expression signatures. Hierarchical clustering of expression array data from 89 invasive breast cancers identified four major expression subclasses. Thirty-four of these cases representative of the four subclasses were microdissected and allelotyped using genome-wide single nucleotide polymorphism detection arrays (Affymetrix, Inc.). LOH was determined by comparing tumor and normal single nucleotide polymorphism allelotypes. A newly developed statistical tool was used to determine the chromosomal regions of frequent LOH. We found that breast cancers were highly heterogeneous, with the proportion of LOH ranging widely from 0.3% to >60% of heterozygous markers. The most common sites of LOH were on 17p, 17q, 16q, 11q, and 14q, sites reported in previous LOH studies. Signature LOH events were discovered in certain expression subclasses. Unique regions of LOH on 5q and 4p marked a subclass of breast cancers with "basal-like" expression profiles, distinct from other subclasses. LOH on 1p and 16q occurred preferentially in a subclass of estrogen receptor-positive breast cancers. Finding unique LOH patterns in different groups of breast cancer, in part defined by expression signatures, adds confidence to newer schemes of molecular classification. Furthermore, exclusive association between biological subclasses and restricted LOH events provides rationale to search for targeted genes.

Marshall E. Lieberfarb, Ming Lin, Mirna Lechpammer, Cheng Li, David M. Tanenbaum, Phillip G. Febbo, Renee L. Wright, Judy Shim, Philip W. Kantoff, Massimo Loda, Matthew Meyerson and William R. Sellers

Cancer Res. 2003 Aug 15;63(16):4781-5.

Oligonucleotide arrays that detect single nucleotide polymorphisms were used to generate genome-wide loss of heterozygosity (LOH) maps from laser capture microdissected paraffin-embedded samples using as little as 5 ng of DNA. The allele detection rate from such samples was comparable with that obtained with standard amounts of DNA prepared from frozen tissues. A novel informatics platform, dChipSNP, was used to automate the definition of statistically valid regions of LOH, assign LOH genotypes to prostate cancer samples, and organize by hierarchical clustering prostate cancers based on the pattern of LOH. This organizational strategy revealed apparently distinct genetic subsets of prostate cancer.

Kerstin Lindblad-Toh, David M. Tanenbaum, Mark J. Daly, Ellen Winchester, Weng-Onn Lui, Anuradha Villapakkam, Sasha E. Stanton, Catharina Larsson, Thomas J. Hudson, Bruce E. Johnson, Eric S. Lander & Matthew Meyerson

Nat Biotechnol. 2000 Sep;18(9):1001-5.

Human cancers arise by a combination of discrete mutations and chromosomal alterations. Loss of heterozygosity (LOH) of chromosomal regions bearing mutated tumor suppressor genes is a key event in the evolution of epithelial and mesenchymal tumors. Global patterns of LOH can be understood through allelotyping of tumors with polymorphic genetic markers. Simple sequence length polymorphisms (SSLPs, or microsatellites) are reliable genetic markers for studying LOH, but only a modest number of SSLPs are used in LOH studies because the genotyping procedure is rather tedious. Here, we report the use of a highly parallel approach to genotype large numbers of single-nucleotide polymorphisms (SNPs) for LOH, in which samples are genotyped for nearly 1,500 loci by performing 24 polymerase chain reactions (PCR), pooling the resulting amplification products and hybridizing the mixture to a high-density oligonucleotide array. We characterize the results of LOH analyses on human small-cell lung cancer (SCLC) and control DNA samples by hybridization. We show that the patterns of LOH are consistent with those obtained by analysis with both SSLPs and comparative genomic hybridization (CGH), whereas amplifications rarely are detected by the SNP array. The results validate the use of SNP array hybridization for tumor studies.

Richard Smith, Leah A. Owen, Deborah J. Trem, Jenny S. Wong, Jennifer S. Whangbo, Todd R. Golub, and Stephen L. Lessnick

Cancer Cell (2006) 9:405-416

Our understanding of Ewing's sarcoma development mediated by the EWS/FLI fusion protein has been limited by a lack of knowledge regarding the tumor cell of origin. To circumvent this, we analyzed the function of EWS/FLI in Ewing's sarcoma itself. By combining retroviral-mediated RNA interference with reexpression studies, we show that ongoing EWS/FLI expression is required for the tumorigenic phenotype of Ewing's sarcoma. We used this system to define the full complement of EWS/FLI-regulated genes in Ewing's sarcoma. Functional analysis revealed that NKX2.2 is an EWS/FLI-regulated gene that is necessary for oncogenic transformation in this tumor. Thus, we developed a highly validated transcriptional profile for the EWS/FLI fusion protein and identified a critical target gene in Ewing's sarcoma development.

Stephen L. Lessnick, Caroline S. Dacwag, and Todd R. Golub

Cancer Cell 1, 393-401

Ewing's sarcoma is associated with a fusion between the EWS and FLI1 genes, forming an EWS/FLI fusion protein. We developed a system for the identification of cooperative mutations in this tumor through expression of EWS/FLI in primary human fibroblasts. Gene expression profiling demonstrated that this system recapitulates many features of Ewing's sarcoma. EWS/FLI-expressing cells underwent growth arrest, suggesting that growth arrest-abrogating collaborative mutations may be required for tumorigenesis. Expression profiling identified transcriptional upregulation of p53, and the growth arrest was rescued by inhibition of p53. These data support a role for p53 as a tumor suppressor in Ewing's sarcoma, and demonstrate the use of transcriptional profiling of model systems in the identification of cooperating mutations in human cancer.

Douglas Fambrough, Kimberly McClure, Andrius Kazlauskas, and Eric S. Lander.

Cell, Vol. 97, 727-741, June 1999

We sought to explore the relationship between receptor tyrosine kinase (RTK) activated signaling pathways and the transcriptional induction of immediate early genes (IEGs). Using global expression monitoring, we identified 66 fibroblast IEGs induced by platelet-derived growth factor receptor (PDGFR) signaling. Mutant receptors lacking binding sites for activation of the PLC, PI3K, SHP2, and RasGAP pathways still retain partial ability to induce 64 of these IEGs. Removal of the Grb2-binding site further broadly reduces induction. These results suggest that the diverse pathways exert broadly overlapping effects on IEG induction. Interestingly, a mutant receptor that restores the RasGAP-binding site promotes induction of an independent group of genes, normally induced by interferons. Finally, we compare the PDGFR and fibroblast growth factor receptor 1; each induces essentially identical IEGs in fibroblasts.

Ricardo D. Coletta, Kimberly Christensen, Kelly Jansky Reichenberger, Justin Lamb, Damian Micomonaco, Lili Huang, Douglas M. Wolf, Carsten Müller-Tidow, Todd R. Golub, Kiyoshi Kawakami and Heide L. Ford

Proc. Natl. Acad. Sci. USA 101: 6478-6483

Homeobox genes comprise a large family of transcription factors that are essential during normal development and are often dysregulated in cancer. However, the molecular mechanisms by which homeobox genes influence cancer remain largely unknown. Here we show that the tissue-restricted cyclin A1 is a transcriptional target of the Six1 homeoprotein. Both genes are expressed in the embryonic but not the terminally differentiated mammary gland, and Six1 knockout mice show a dramatic reduction of cyclin A1 in the embryonic mammary gland. In addition, both genes are re-expressed in breast cancers. Six1 overexpression increases cyclin A1 mRNA levels and activity, cell proliferation and tumor volume, whereas Six1 downregulation decreases cyclin A1 mRNA levels and proliferation. Overexpression of Six1 in wild type mouse embryonic fibroblasts, but not in knockout variants lacking the cyclin A1 gene, induces cell proliferation. Furthermore, inhibition of cyclin A1 in Six1 overexpressing mammary carcinoma cells decreases proliferation. Together these results demonstrate that cyclin A1 is required for the proliferative effect of Six1. We conclude that Six1 overexpression re-instates an embryonic pathway of proliferation in breast cancer by upregulating cyclin A1.

Justin Lamb, Sridhar Ramaswamy, Heide L. Ford, Bernardo Contreras, Robert V. Martinez, Frances S. Kittrell, Cynthia A. Zahnow, Nick Patterson, Todd R. Golub and Mark E. Ewen

Cell 114: 323-334 2003

Here we describe how patterns of gene expression in human tumors have been deconvoluted to reveal a previously unappreciated mechanism of action for the cyclin D1 oncogene. Computational analysis of the expression patterns of thousands of genes across hundreds of tumor specimens suggested that a transcription factor, C/EBPβ/Nf-Il6, participates in the consequences of cyclin D1 overexpression. Functional analyses confirmed the involvement of C/EBPβ in the regulation of genes affected by cyclin D1 and established this protein as an indispensable effector of a potentially important facet of cyclin D1 biology. This work demonstrates that tumor gene expression databases can be used to study the function of a human oncogene in situ.

Johansen LM, Iwama A, Lodie TA, Sasaki K, Felsher DW, Golub TR, Tenen DG

Molecular and Cellular Biology, June 2001, p. 3789-3806, Vol. 21, No. 11

CCAAT/enhancer binding protein alpha (C/EBPalpha) is an integral factor in the granulocytic developmental pathway, as myeloblasts from C/EBPalpha-null mice exhibit an early block in differentiation. Since mice deficient for known C/EBPalpha target genes do not exhibit the same block in granulocyte maturation, we sought to identify additional C/EBPalpha target genes essential for myeloid cell development. To identify such genes, we used both representational difference analysis and oligonucleotide array analysis with RNA derived from a C/EBPalpha-inducible myeloid cell line. From each of these independent screens, we identified c-Myc as a C/EBPalpha negatively regulated gene. We mapped an E2F binding site in the c-Myc promoter as the cis-acting element critical for C/EBPalpha negative regulation. The identification of c-Myc as a C/EBPalpha target gene is intriguing, as it has been previously shown that down-regulation of c-Myc can induce myeloid differentiation. Here we show that stable expression of c-Myc from an exogenous promoter not responsive to C/EBPalpha-mediated down-regulation forces myeloblasts to remain in an undifferentiated state. Therefore, C/EBPalpha negative regulation of c-Myc is critical for allowing early myeloid precursors to enter a differentiation pathway. This is the first report to demonstrate that C/EBPalpha directly affects the level of c-Myc expression and, thus, the decision of myeloid blasts to enter into the granulocytic differentiation pathway.

Hilary A. Coller, Carla Grandori, Pablo Tamayo, Trent Colbert, Eric S. Lander, Robert N. Eisenman and Todd R. Golub

Proc. Natl. Acad. Sci. USA, Vol. 97, Issue 7, 3260-3265, March 28, 2000

MYC affects normal and neoplastic cell proliferation by altering gene expression, but the precise pathways remain unclear. We used oligonucleotide microarray analysis of 6,416 genes and expressed sequence tags to determine changes in gene expression caused by activation of c-MYC in primary human fibroblasts. In these experiments, 27 genes were consistently induced, and 9 genes were repressed. The identity of the genes revealed that MYC may affect many aspects of cell physiology altered in transformed cells: cell growth, cell cycle, adhesion, and cytoskeletal organization. Identified targets possibly linked to MYC's effects on cell growth include the nucleolar proteins nucleolin and fibrillarin, as well as the eukaryotic initiation factor 5A. Among the cell cycle genes identified as targets, the G1 cyclin D2 and the cyclin-dependent kinase binding protein CksHs2 were induced whereas the cyclin-dependent kinase inhibitor p21Cip1 was repressed. A role for MYC in regulating cell adhesion and structure is suggested by repression of genes encoding the extracellular matrix proteins fibronectin and collagen, and the cytoskeletal protein tropomyosin. A possible mechanism for MYC-mediated apoptosis was revealed by identification of the tumor necrosis factor receptor associated protein TRAP1 as a MYC target. Finally, two immunophilins, peptidyl-prolyl cis-trans isomerase F and FKBP52, the latter of which plays a role in cell division in Arabidopsis, were up-regulated by MYC. We also explored pattern-matching methods as an alternative approach for identifying MYC target genes. The genes that displayed an expression profile most similar to endogenous Myc in microarray-based expression profiling of myeloid differentiation models were highly enriched for MYC target genes.

Jun Lu, Shangqin Guo, Benjamin L. Ebert, Hao Zhang, Xiao Peng, Jocelyn Bosco, Jennifer Pretz, Rita Schlanger, Judy Y. Wang, Raymond H. Mak, David M. Dombkowski, Frederic I. Preffer, David T. Scadden and Todd R. Golub

Developmental Cell (2008), doi:10.1016/j.devcel.2008.03.012

Lineage specification is a critical issue in developmental and regenerative biology. We hypothesized that microRNAs (miRNAs) are important participants in that process and used the poorly-understood regulation of megakaryocyte-erythrocyte progenitors (MEPs) in hematopoiesis as a model system. We report here that miR-150 modulates lineage fate in MEPs. Using a novel methodology capable of profiling miRNA expression in limiting numbers of primary cells, we identify miR-150 as preferentially expressed in the megakaryocytic lineage. Through gain- and loss-of-function experiments, we demonstrate that miR-150 drives MEP differentiation toward megakaryocytes at the expense of erythroid cells in vitro and in vivo. Moreover, we identify the transcription factor MYB as a critical target of miR-150 in this regulation. These experiments show that miR-150 regulates MEP fate, and thus establish a role for miRNAs in lineage specification of mammalian multi-potent cells.

Madhu S. Kumar, Jun Lu, Kim L. Mercer, Todd R. Golub, and Tyler Jacks

Nat Genet. 2007 May;39(5):673-7. Epub 2007 Apr 1

MicroRNAs (miRNAs) are a new class of small noncoding RNAs that post-transcriptionally regulate the expression of target mRNA transcripts. Many of these target mRNA transcripts are involved in proliferation, differentiation and apoptosis, processes commonly altered during tumorigenesis. Recent work has shown a global decrease of mature miRNA expression in human cancers. However, it is unclear whether this global repression of miRNAs reflects the undifferentiated state of tumors or causally contributes to the transformed phenotype. Here we show that global repression of miRNA maturation promotes cellular transformation and tumorigenesis. Cancer cells expressing short hairpin RNAs (shRNAs) targeting three different components of the miRNA processing machinery showed a substantial decrease in steady-state miRNA levels and a more pronounced transformed phenotype. In animals, miRNA processing-impaired cells formed tumors with accelerated kinetics. These tumors were more invasive than control tumors, suggesting that global miRNA loss enhances tumorigenesis. Furthermore, conditional deletion of Dicer1 enhanced tumor development in a K-Ras-induced mouse model of lung cancer. Overall, these studies indicate that abrogation of global miRNA processing promotes tumorigenesis.

Jun Lu, Gad Getz, Eric A. Miska, Ezequiel A. Alvarez-Saavedra, Justin Lamb, David Peck, Alejandro Sweet-Cordero, Benjamin L. Ebert, Raymond H. Mak, Adolfo A. Ferrando, James R. Downing, Tyler Jacks, H. Robert Horvitz and Todd R. Golub

Nature 435, 834-838 (9 June 2005)

Recent work has revealed the existence of a class of small noncoding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

Xiaohui Xie, Jun Lu, E. J. Kulbokas, Todd R. Golub, Vamsi Mootha, Kerstin Lindblad-Toh, Eric S. Lander and Manolis Kellis

Nature. 2005 Mar 17;434(7031):338-45

Comprehensive identification of all functional elements encoded in the human genome is a fundamental need in biomedical research. Here, we present a comparative analysis of the human, mouse, rat and dog genomes to create a systematic catalogue of common regulatory motifs in promoters and 3' untranslated regions (3' UTRs). The promoter analysis yields 174 candidate motifs, including most previously known transcription-factor binding sites and 105 new motifs. The 3' UTR analysis yields 106 motifs likely to be involved in post-transcriptional regulation. Nearly one-half are associated with microRNAs (miRNAs), leading to the discovery of many new miRNA genes and their likely target genes. Our results suggest that previous estimates of the number of human miRNA genes were low, and that miRNAs regulate at least 20% of human genes. The overall results provide a systematic view of gene regulation in the human, which will be refined as additional mammalian genomes become available.