TY - GEN
T1 - Tell Me Who Are You Talking to and I Will Tell You What Issues Need Your Skills
AU - Santos, Fabio
AU - Penney, Jacob
AU - Pimentel, Joao Felipe
AU - Wiese, Igor
AU - Steinmacher, Igor
AU - Gerosa, Marco A.
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Selecting an appropriate task is challenging for newcomers to Open Source Software (OSS) projects. To facilitate task selection, researchers and OSS projects have leveraged machine learning techniques, historical information, and textual analysis to label tasks (a.k.a. issues) with information such as the issue type and domain. These approaches are still far from mainstream adoption, possibly because of a lack of good predictors. Inspired by previous research, we advocate that label prediction might benefit from leveraging metrics derived from communication data and social network analysis (SNA) for issues in which social interaction occurs. Thus, we study how these "social metrics" can improve the automatic labeling of open issues with API domains - categories of APIs used in the source code that solves the issue - which the literature shows that newcomers to the project consider relevant for task selection. We mined data from OSS projects' repositories and organized it in periods to reflect the seasonality of the contributors' project participation. We replicated metrics from previous work and added social metrics to the corpus to predict API-domain labels. Social metrics improved the performance of the classifiers compared to using only the issue description text in terms of precision, recall, and F-measure. Precision (0.922) increased by 15.82% and F-measure (0.942) by 15.89% for a project with high social activity. These results indicate that social metrics can help capture the patterns of social interactions in a software project and improve the labeling of issues in an issue tracker.
AB - Selecting an appropriate task is challenging for newcomers to Open Source Software (OSS) projects. To facilitate task selection, researchers and OSS projects have leveraged machine learning techniques, historical information, and textual analysis to label tasks (a.k.a. issues) with information such as the issue type and domain. These approaches are still far from mainstream adoption, possibly because of a lack of good predictors. Inspired by previous research, we advocate that label prediction might benefit from leveraging metrics derived from communication data and social network analysis (SNA) for issues in which social interaction occurs. Thus, we study how these "social metrics" can improve the automatic labeling of open issues with API domains - categories of APIs used in the source code that solves the issue - which the literature shows that newcomers to the project consider relevant for task selection. We mined data from OSS projects' repositories and organized it in periods to reflect the seasonality of the contributors' project participation. We replicated metrics from previous work and added social metrics to the corpus to predict API-domain labels. Social metrics improved the performance of the classifiers compared to using only the issue description text in terms of precision, recall, and F-measure. Precision (0.922) increased by 15.82% and F-measure (0.942) by 15.89% for a project with high social activity. These results indicate that social metrics can help capture the patterns of social interactions in a software project and improve the labeling of issues in an issue tracker.
KW - Human Factors
KW - Labels
KW - Machine Learning
KW - Mining Software Repositories
KW - Open Source Software
KW - Skills
KW - Social Network Analysis
KW - Tags
UR - http://www.scopus.com/inward/record.url?scp=85151379625&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85151379625&partnerID=8YFLogxK
U2 - 10.1109/MSR59073.2023.00087
DO - 10.1109/MSR59073.2023.00087
M3 - Conference contribution
T3 - Proceedings - 2023 IEEE/ACM 20th International Conference on Mining Software Repositories, MSR 2023
SP - 611
EP - 623
BT - Proceedings - 2023 IEEE/ACM 20th International Conference on Mining Software Repositories, MSR 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE/ACM International Conference on Mining Software Repositories, MSR 2023
Y2 - 15 May 2023 through 16 May 2023
ER -