|Publication Type||Journal Article|
|Year of Publication||2004|
|Authors||Brunet, J-P, Tamayo, P, Golub, TR, Mesirov, JP|
|Journal||Proceedings of the National Academy of Sciences of the United States of America|
|Pages||4164-4169|
|Keywords||Algorithms; Cancer; Central Nervous System Neoplasms; Computational Biology; Data Interpretation, Statistical; Leukemia; Medulloblastoma; Models, Genetic; Neoplasms|
We describe here the use of nonnegative matrix factorization (NMF), an algorithm based on decomposition by parts that can reduce the dimension of expression data from thousands of genes to a handful of metagenes. Coupled with a model selection mechanism, adapted to work for any stochastic clustering algorithm, NMF is an efficient method for identification of distinct molecular patterns and provides a powerful method for class discovery. We demonstrate the ability of NMF to recover meaningful biological information from cancer-related microarray data. NMF appears to have advantages over other methods such as hierarchical clustering or self-organizing maps. We found it less sensitive to a priori selection of genes or initial conditions and able to detect alternative or context-dependent patterns of gene expression in complex biological systems. This ability, similar to semantic polysemy in text, provides a general method for robust molecular pattern discovery.
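As a rough illustration of the two ideas in the abstract, the sketch below factors a nonnegative expression matrix A (genes x samples) into W (genes x k metagenes) and H (k metagenes x samples), then repeats the factorization from different random starts to build a consensus matrix for model selection. It is a minimal sketch, not the authors' implementation: the abstract does not state the update rule, so the Lee and Seung multiplicative updates for the squared-error objective are assumed here, and the function names, parameters, and synthetic data are all hypothetical.

```python
import numpy as np

def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix A (genes x samples) as W @ H.

    Assumption: multiplicative updates for the squared-error objective;
    the abstract does not specify which update rule was used.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = A.shape
    W = rng.random((n_genes, k)) + eps    # gene loadings for each metagene
    H = rng.random((k, n_samples)) + eps  # metagene expression per sample
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def consensus_matrix(A, k, n_runs=20):
    """Fraction of runs in which each pair of samples shares a cluster."""
    n_samples = A.shape[1]
    C = np.zeros((n_samples, n_samples))
    for seed in range(n_runs):
        _, H = nmf(A, k, seed=seed)
        labels = H.argmax(axis=0)  # assign each sample to its dominant metagene
        C += labels[:, None] == labels[None, :]
    return C / n_runs

# Synthetic stand-in for expression data: 100 genes, 12 samples, two groups.
rng = np.random.default_rng(42)
A = rng.random((100, 12)) * 0.1
A[:50, :6] += 1.0   # genes 0-49 are high in samples 0-5
A[50:, 6:] += 1.0   # genes 50-99 are high in samples 6-11

W, H = nmf(A, k=2)
C = consensus_matrix(A, k=2)
```

Entries of C near 0 or 1 indicate sample pairs that are assigned consistently across random restarts. Comparing how crisp C is for several values of k is one way to choose the number of metagenes, in the spirit of the model selection mechanism the abstract mentions.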