|Publication Type||Journal Article|
|Year of Publication||2007|
|Authors||Alexe, G, Dalgin, GS, Scanfeld, D, Tamayo, P, Mesirov, JP, Ganesan, S, Delisi, C, Bhanot, G|
|Journal||Genome informatics. International Conference on Genome Informatics|
We describe a new method based on principal component analysis and robust consensus ensemble clustering to identify and elucidate the subtypes of breast cancer disease. The method was applied to microarray gene expression data using micro-dissection of samples from 36 breast cancer patients with at least two of three pathological stages of disease. Controls were normal breast epithelial cells from 3 disease free patients. Our method identified an optimum set of genes and strong, stable clusters which correlated well with clinical classification into Luminal, Basal and Her2+ subtypes based on ER, PR and Her2 status. It also revealed a hierarchical portrait of disease progression through various grades and stages and identified genes and functional pathways for each stage, grade and disease subtype. We found that gene expression heterogeneity across subtypes is much greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes are distinct disease processes. The averaging over data perturbations and clustering methods is critical in the robust identification of subtypes and gene markers for grade and progression.