Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression

Science 286:531-537 (1999). Published: 1999.10.14

T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander

Abstract

Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

Keywords: Leukemia, ALL, AML, gene expression, prediction, class discovery, gene marker, molecular classification, supervised, unsupervised.

Supplemental Data

Description                                     Link/Filename
Paper (PDF)                                     Golub_et_al_1999.pdf
Files description                               Files_descriptions.txt
Experimental protocol                           protocol.html
Rescaling factors                               table_ALL_AML_rfactors.txt
Samples table (Word)                            table_ALL_AML_samples.rtf
Samples table (text)                            table_ALL_AML_samples.txt
Train dataset (Excel)                           data_set_ALL_AML_train.tsv
Train dataset (text)                            data_set_ALL_AML_train.txt
Test dataset (Excel)                            data_set_ALL_AML_independent.tsv
Test dataset (text)                             data_set_ALL_AML_independent.txt
Prediction results (Word)                       table_ALL_AML_predic.rtf
Prediction results (text)                       table_ALL_AML_predic.txt
Original and supplemental figures (PowerPoint)  Figures_original_plus_suppl.ppt
Train dataset in WI format                      ALL_vs_AML_train_set_38_sorted.res
Train dataset class vector in WI format         ALL_vs_AML_train_set_38_sorted.cls
Test dataset in WI format                       Leuk_ALL_AML.test.res
Test dataset class vector in WI format          Leuk_ALL_AML.test.cls