Feature selection for high dimensional data using monte carlo tree search

Muhammad Umar Chaudhry, Jee Hyong Lee

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Feature selection is a preliminary step in machine learning and data mining. It identifies the most important and relevant features within a dataset by eliminating redundant or irrelevant features. The benefits may include improved prediction accuracy, reduced computational complexity, and more easily interpretable models. In this paper, we present a novel framework to investigate and understand the importance of Monte Carlo tree search (MCTS) in feature selection for very high-dimensional datasets. We construct a binary feature selection tree where each node represents one of two feature states: a feature is selected or not. The search starts with an empty root node, reflecting that no feature is selected. The search tree is then expanded incrementally by adding nodes through MCTS-based simulations. Following the tree and default policies, every iteration generates an initial feature subset, to which a filter is applied to select the top k features that form the candidate feature subset. The classification accuracy of the candidate feature subset is used as its goodness, or reward, and is propagated backward to the root node along the active path. Finally, the candidate subset with the highest reward is selected as the best feature subset. Experiments are performed on 30 real-world datasets, including 14 very high-dimensional microarray datasets, and the results are compared with state-of-the-art methods in the literature, demonstrating the efficacy, validity, and significance of the proposed method.
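The search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the tree policy, default policy, filter, or classifier, so the sketch assumes UCT as the tree policy, a uniform random rollout as the default policy, precomputed per-feature filter scores, and a caller-supplied reward function standing in for classification accuracy. The names Node, uct_child, mcts_feature_selection, and toy_reward are hypothetical.

```python
import math
import random


class Node:
    """One node of the binary feature selection tree."""

    def __init__(self, depth, selected, parent=None):
        self.depth = depth          # index of the next feature to decide
        self.selected = selected    # tuple of feature indices selected so far
        self.parent = parent
        self.children = {}          # action (True = select, False = skip) -> Node
        self.visits = 0
        self.total_reward = 0.0


def uct_child(node, c):
    """Assumed tree policy: pick the child with the highest UCT score."""
    def score(child):
        mean = child.total_reward / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def mcts_feature_selection(n_features, filter_scores, reward_fn, k,
                           iterations=200, c=0.5, seed=0):
    rng = random.Random(seed)
    root = Node(depth=0, selected=())   # empty root: no feature selected
    best_subset, best_reward = (), float("-inf")

    for _ in range(iterations):
        node = root
        # Selection: descend while both children of a node already exist.
        while node.depth < n_features and len(node.children) == 2:
            node = uct_child(node, c)
        # Expansion: add one untried child (select or skip the next feature).
        if node.depth < n_features:
            untried = [a for a in (True, False) if a not in node.children]
            action = rng.choice(untried)
            selected = node.selected + ((node.depth,) if action else ())
            child = Node(node.depth + 1, selected, parent=node)
            node.children[action] = child
            node = child
        # Simulation (assumed default policy): decide remaining features at random.
        subset = list(node.selected)
        subset += [f for f in range(node.depth, n_features) if rng.random() < 0.5]
        # Filter: keep the top-k features of the initial subset by filter score.
        ranked = sorted(subset, key=lambda f: filter_scores[f], reverse=True)
        candidate = tuple(sorted(ranked[:k]))
        reward = reward_fn(candidate)
        if reward > best_reward:
            best_subset, best_reward = candidate, reward
        # Backpropagation: update statistics along the active path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_subset, best_reward


# Toy usage: features 1, 4 and 7 are "relevant"; feature 0 has a misleadingly
# high filter score. The reward is a stand-in for classification accuracy.
RELEVANT = {1, 4, 7}


def toy_reward(candidate):
    return len(RELEVANT & set(candidate)) / len(RELEVANT)


scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.1, 0.3, 0.6, 0.2, 0.1]
best, reward = mcts_feature_selection(10, scores, toy_reward, k=3)
print(best, reward)
```

In practice the reward function would train and evaluate a classifier on the candidate columns, and the filter scores would come from a ranking criterion computed on the training data; both are left abstract here because the abstract does not name them.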

Original language: English
Article number: 8548538
Pages (from-to): 76036-76048
Number of pages: 13
Journal: IEEE Access
Volume: 6
State: Published - 2018
Externally published: Yes

Keywords

  • Dimensionality reduction
  • feature selection
  • filter-wrapper
  • H-MOTiFS
  • hybrid
  • Monte Carlo tree search (MCTS)
