TY - GEN
T1 - Maximum Distance Minimum Error (MDME)
T2 - 2017 Intelligent Systems Conference, IntelliSys 2017
AU - Trevino, Robert P.
AU - Lamkin, Thomas J.
AU - Smith, Ross
AU - Kawamoto, Steve A.
AU - Liu, Huan
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2018/3/23
Y1 - 2018/3/23
N2 - Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume a normal distribution for the data. Often, however, real-world data do not follow a normal distribution, instead following a log-normal distribution. This is especially true in biology, where latent factors often dictate distribution patterns. Parametric approaches are not well suited for this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and log-normal datasets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately test whether two samples are drawn from the same distribution without assuming normality. We test our MDME method on multiple datasets and demonstrate that our approach performs comparably to, and often better than, traditional parametric approaches.
AB - Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume a normal distribution for the data. Often, however, real-world data do not follow a normal distribution, instead following a log-normal distribution. This is especially true in biology, where latent factors often dictate distribution patterns. Parametric approaches are not well suited for this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and log-normal datasets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately test whether two samples are drawn from the same distribution without assuming normality. We test our MDME method on multiple datasets and demonstrate that our approach performs comparably to, and often better than, traditional parametric approaches.
KW - Feature Selection
KW - High Content Screening
KW - Kolmogorov-Smirnov Test
UR - http://www.scopus.com/inward/record.url?scp=85051074530&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051074530&partnerID=8YFLogxK
U2 - 10.1109/IntelliSys.2017.8324366
DO - 10.1109/IntelliSys.2017.8324366
M3 - Conference contribution
T3 - 2017 Intelligent Systems Conference, IntelliSys 2017
SP - 670
EP - 677
BT - 2017 Intelligent Systems Conference, IntelliSys 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 September 2017 through 8 September 2017
ER -