TY - JOUR
T1 - Performance comparison of machine learning platforms
AU - Roy, Asim
AU - Qureshi, Shiban
AU - Pande, Kartikeya
AU - Nair, Divitha
AU - Gairola, Kartik
AU - Jain, Pooja
AU - Singh, Suraj
AU - Sharma, Kirti
AU - Jagadale, Akshay
AU - Lin, Yi Yang
AU - Sharma, Shashank
AU - Gotety, Ramya
AU - Zhang, Yuexin
AU - Tang, Ji
AU - Mehta, Tejas
AU - Sindhanuru, Hemanth
AU - Okafor, Nonso
AU - Das, Santak
AU - Gopal, Chidambara N.
AU - Rudraraju, Srinivasa B.
AU - Kakarlapudi, Avinash V.
N1 - Publisher Copyright: © 2019 INFORMS.
PY - 2019
Y1 - 2019
AB - In this paper, we present a method for comparing and evaluating different collections of machine learning algorithms on the basis of a given performance measure (e.g., accuracy, area under the curve (AUC), F-score). Such a method can be used to compare standard machine learning platforms such as SAS, IBM SPSS, and Microsoft Azure ML. A recent trend in automation of machine learning is to exercise a collection of machine learning algorithms on a particular problem and then use the best performing algorithm. Thus, the proposed method can also be used to compare and evaluate different collections of algorithms for automation on a certain problem type and find the best collection. In the study reported here, we applied the method to compare six machine learning platforms – R, Python, SAS, IBM SPSS Modeler, Microsoft Azure ML, and Apache Spark ML. We compared the platforms on the basis of predictive performance on classification problems because a significant majority of the problems in machine learning are of that type. The general question that we addressed is the following: Are there platforms that are superior to others on some particular performance measure? For each platform, we used a collection of six classification algorithms from the following six families of algorithms – support vector machines, multilayer perceptrons, random forest (or variant), decision trees/gradient boosted trees, Naive Bayes/Bayesian networks, and logistic regression. We compared their performance on the basis of classification accuracy, F-score, and AUC. We used F-score and AUC measures to compare platforms on two-class problems only. For testing the platforms, we used a mix of data sets from (1) the University of California, Irvine (UCI) library, (2) the Kaggle competition library, and (3) high-dimensional gene expression problems. We performed some hyperparameter tuning on algorithms wherever possible.
KW - Classification algorithms
KW - Comparison of algorithms
KW - Comparison of platforms
KW - Machine learning platforms
UR - http://www.scopus.com/inward/record.url?scp=85070370695&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070370695&partnerID=8YFLogxK
U2 - 10.1287/ijoc.2018.0825
DO - 10.1287/ijoc.2018.0825
M3 - Article
SN - 1091-9856
VL - 31
SP - 207
EP - 225
JO - INFORMS Journal on Computing
JF - INFORMS Journal on Computing
IS - 2
ER -