TY - GEN
T1 - On comparison of feature selection algorithms
AU - Refaeilzadeh, Payam
AU - Tang, Lei
AU - Liu, Huan
PY - 2007
Y1 - 2007
N2 - Feature selection (FS) is extensively studied in machine learning. We often need to compare two FS algorithms (A1, A2). Without knowing true relevant features, a conventional way of evaluating A1 and A2 is to evaluate the effect of selected features on classification accuracy in two steps: selecting features from dataset D using Ai to form D′i, and obtaining accuracy using each D′i, respectively. The superiority of A1 or A2 can be statistically measured by their accuracy difference. To obtain reliable accuracy estimation, k-fold cross-validation (CV) is commonly used: one fold of data is reserved in turn for test. FS may be performed only once at the beginning and subsequently the results of the two algorithms can be compared using CV; or FS can be performed k times inside the CV loop. At first glance, the latter is the obvious choice for accuracy estimation. We investigate in this work if the two really differ when comparing two FS algorithms and provide findings of bias analysis.
AB - Feature selection (FS) is extensively studied in machine learning. We often need to compare two FS algorithms (A1, A2). Without knowing true relevant features, a conventional way of evaluating A1 and A2 is to evaluate the effect of selected features on classification accuracy in two steps: selecting features from dataset D using Ai to form D′i, and obtaining accuracy using each D′i, respectively. The superiority of A1 or A2 can be statistically measured by their accuracy difference. To obtain reliable accuracy estimation, k-fold cross-validation (CV) is commonly used: one fold of data is reserved in turn for test. FS may be performed only once at the beginning and subsequently the results of the two algorithms can be compared using CV; or FS can be performed k times inside the CV loop. At first glance, the latter is the obvious choice for accuracy estimation. We investigate in this work if the two really differ when comparing two FS algorithms and provide findings of bias analysis.
UR - http://www.scopus.com/inward/record.url?scp=52049116945&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=52049116945&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781577353324
T3 - AAAI Workshop - Technical Report
SP - 34
EP - 39
BT - Evaluation Methods for Machine Learning II - Papers from the 2007 AAAI Workshop, Technical Report
T2 - 2007 AAAI Workshop
Y2 - 22 July 2007
ER -