Journal of Computational Biology: A Journal of Computational Molecular Cell Biology DOI:

Estimating dataset size requirements for classifying DNA microarray data

Publication Type: Journal Article
Year of Publication: 2003
Authors: Mukherjee, S, Tamayo, P, Rogers, S, Rifkin, R, Engle, A, Campbell, C, Golub, TR, Mesirov, JP
Journal: Journal of Computational Biology: A Journal of Computational Molecular Cell Biology
Volume: 10
Issue: 2
Pages: 119-142
Date Published: 2003
ISSN: 1066-5277
Keywords: Algorithms, Cancer, Computational Biology, Computer Simulation, Gene Expression Profiling, Humans, Models, Molecular, Neoplasms, Oligonucleotide Array Sequence Analysis
Abstract

A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced. The goal is to use existing classification results to estimate dataset size requirements for future classification experiments and to evaluate the gain in accuracy and significance of classifiers built with additional data. The method is based on fitting inverse power-law models to construct empirical learning curves. It also includes a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. This procedure is applied to several molecular classification problems representing a broad spectrum of levels of complexity.
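
The learning-curve idea in the abstract can be illustrated with a minimal sketch. It assumes the inverse power law takes the form e(n) = a * n^(-alpha) + b, where e(n) is the expected error rate at training-set size n and b is the asymptotic error; this form, the grid-search fitting procedure, the function names, and the synthetic data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def fit_inverse_power_law(sizes, errors, n_grid=200):
    """Fit e(n) = a * n**(-alpha) + b to observed (size, error) pairs.

    Illustrative procedure (not taken from the paper): try a grid of
    candidate asymptotic errors b, fit a and alpha by linear least squares
    in log-log space for each, and keep the candidate with the smallest
    squared error in the original space. Assumes all errors are positive.
    """
    sizes = np.asarray(sizes, dtype=float)
    errors = np.asarray(errors, dtype=float)
    best = None
    # b must stay strictly below every observed error so the log is defined.
    for b in np.linspace(0.0, errors.min() * 0.999, n_grid):
        slope, intercept = np.polyfit(np.log(sizes), np.log(errors - b), 1)
        a, alpha = np.exp(intercept), -slope
        sse = np.sum((a * sizes ** (-alpha) + b - errors) ** 2)
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    return best[1:]

def samples_needed(target_error, a, alpha, b):
    """Invert the fitted curve: smallest n with predicted error <= target."""
    if target_error <= b:
        return float("inf")  # target lies below the fitted asymptote
    return ((target_error - b) / a) ** (-1.0 / alpha)

# Synthetic, noise-free error rates generated from a known curve.
sizes = np.array([10, 20, 40, 80, 160])
errors = 0.5 * sizes ** (-0.5) + 0.05

a, alpha, b = fit_inverse_power_law(sizes, errors)
n_required = samples_needed(0.10, a, alpha, b)
print(f"a={a:.3f}, alpha={alpha:.3f}, b={b:.3f}, n for 10% error ~ {n_required:.0f}")
```

In practice the error rates at each size would come from repeated train/test splits of an existing dataset, and the fitted curve would be extrapolated to sizes not yet collected. The permutation test mentioned in the abstract is a separate step and is not covered by this sketch.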

URL: http://www.ncbi.nlm.nih.gov/pubmed/12804087