On comparison of feature selection algorithms

Payam Refaeilzadeh, Lei Tang, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

40 Scopus citations

Abstract

Feature selection (FS) is extensively studied in machine learning. We often need to compare two FS algorithms (A1, A2). Without knowing the true relevant features, a conventional way of evaluating A1 and A2 is to evaluate the effect of the selected features on classification accuracy in two steps: selecting features from dataset D using each Ai to form D′i, and obtaining an accuracy estimate from each D′i, respectively. The superiority of A1 or A2 can then be measured statistically by their accuracy difference. To obtain reliable accuracy estimates, k-fold cross-validation (CV) is commonly used: one fold of data is reserved in turn for testing. FS may be performed only once at the beginning, with the results of the two algorithms then compared using CV; or FS can be performed k times inside the CV loop. At first glance, the latter is the obvious choice for accuracy estimation. In this work we investigate whether the two setups really differ when comparing two FS algorithms, and present findings from a bias analysis.
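To make the two evaluation setups concrete, the following is a minimal sketch, not the authors' implementation: it contrasts FS performed once before cross-validation with FS repeated inside the CV loop, using scikit-learn's SelectKBest as a stand-in feature selector and a naive Bayes classifier; the synthetic dataset, k = 10, and the number of selected features are assumptions chosen only for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Hypothetical dataset; in the paper's setting this would be dataset D.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)
k = 10  # number of CV folds (assumed)

def cv_accuracy_fs_outside(X, y, n_features=5):
    """Protocol 1: FS performed once on the full dataset, then k-fold CV on the reduced data."""
    X_sel = SelectKBest(f_classif, k=n_features).fit_transform(X, y)
    accs = []
    for train, test in KFold(n_splits=k, shuffle=True, random_state=0).split(X_sel):
        clf = GaussianNB().fit(X_sel[train], y[train])
        accs.append(accuracy_score(y[test], clf.predict(X_sel[test])))
    return np.mean(accs)

def cv_accuracy_fs_inside(X, y, n_features=5):
    """Protocol 2: FS repeated inside the CV loop, fitted on the training fold only."""
    accs = []
    for train, test in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        fs = SelectKBest(f_classif, k=n_features).fit(X[train], y[train])
        clf = GaussianNB().fit(fs.transform(X[train]), y[train])
        accs.append(accuracy_score(y[test], clf.predict(fs.transform(X[test]))))
    return np.mean(accs)

print("FS outside CV:", cv_accuracy_fs_outside(X, y))
print("FS inside CV: ", cv_accuracy_fs_inside(X, y))

Running both functions with two different feature selectors and taking the accuracy differences would reproduce, under these assumptions, the kind of comparison whose bias the paper analyzes.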

Original language: English (US)
Title of host publication: Evaluation Methods for Machine Learning II - Papers from the 2007 AAAI Workshop, Technical Report
Pages: 34-39
Number of pages: 6
State: Published - 2007
Event: 2007 AAAI Workshop - Vancouver, BC, Canada
Duration: Jul 22 2007 - Jul 22 2007

Publication series

Name: AAAI Workshop - Technical Report
Volume: WS-07-05

Other

Other: 2007 AAAI Workshop
Country/Territory: Canada
City: Vancouver, BC
Period: 7/22/07 - 7/22/07

ASJC Scopus subject areas

  • General Engineering
