Biomarker discovery using statistically significant gene sets

  • Hoon Kim
  • John Watkinson
  • Dimitris Anastassiou

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Analysis of large gene expression data sets in the presence and absence of a phenotype can lead to the selection of a group of genes serving as biomarkers jointly predicting the phenotype. Among gene selection methods, filter methods derived from ranked individual genes have been widely used in existing products for diagnosis and prognosis. Univariate filter approaches selecting genes individually, although computationally efficient, often ignore gene interactions inherent in the biological data. On the other hand, multivariate approaches selecting gene subsets are known to have a higher risk of selecting spurious gene subsets due to the overfitting of the vast number of gene subsets evaluated. Here we propose a framework of statistical significance tests for multivariate feature selection that can reduce the risk of selecting spurious gene subsets. Using three existing data sets, we show that our proposed approach is an essential step to identify such a gene set that is generated by a significant interaction of its members, even improving classification performance when compared to established approaches. This technique can be applied for the discovery of robust biomarkers for medical diagnosis.
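
The abstract does not spell out the significance tests themselves. As a rough illustration of the general idea only, the sketch below uses a max-statistic permutation test, a standard technique swapped in here and not necessarily the authors' procedure. It scores every gene pair, keeps the best one, and then asks how often a best pair at least that good turns up when the phenotype labels are shuffled, so the p-value accounts for the number of subsets searched. The function names, the toy score (gap between class means of the product of two genes) and the synthetic ±1 data are all assumptions made for this example.

```python
import itertools
import random


def pair_score(x, y, labels):
    """Toy joint score (an assumption): absolute gap between the class
    means of the element-wise product x*y, which responds to interaction."""
    prod = [a * b for a, b in zip(x, y)]
    pos = [p for p, lab in zip(prod, labels) if lab == 1]
    neg = [p for p, lab in zip(prod, labels) if lab == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def best_pair(genes, labels):
    """Exhaustively score every gene pair; return (best_score, (i, j))."""
    return max(
        (pair_score(genes[i], genes[j], labels), (i, j))
        for i, j in itertools.combinations(range(len(genes)), 2)
    )


def max_stat_p_value(genes, labels, n_perm=200, seed=0):
    """Permutation p-value for the best pair, corrected for the search:
    each shuffle of the labels repeats the full search over all pairs."""
    rng = random.Random(seed)
    observed, pair = best_pair(genes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any gene-phenotype association
        if best_pair(genes, shuffled)[0] >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return pair, observed, (hits + 1) / (n_perm + 1)


# Synthetic data (hypothetical): 40 samples, 6 genes with values in {-1, +1}.
# Genes 0 and 1 are individually uninformative, but their product fixes the label.
rng = random.Random(1)
g0 = [1, 1, -1, -1] * 10
g1 = [1, -1, 1, -1] * 10
labels = [1, 0, 0, 1] * 10  # 1 exactly when g0 * g1 == +1
noise = [[rng.choice([-1, 1]) for _ in range(40)] for _ in range(4)]
genes = [g0, g1] + noise

pair, score, p = max_stat_p_value(genes, labels)
print(pair, score, p)
```

Rerunning the whole search inside each permutation is the design choice that matters here: comparing the winning pair only against its own shuffled score would ignore how many pairs were tried and would understate the risk of a spurious subset.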

Original language: English
Pages (from-to): 1329-1338
Number of pages: 10
Journal: Journal of Computational Biology
Volume: 18
Issue number: 10
State: Published - 1 Oct 2011
Externally published: Yes

Keywords

  • cancer classification
  • gene expression
  • gene interaction
  • gene selection
  • microarray