TY - GEN
T1 - Do CONTRIBUTING Files Provide Information about OSS Newcomers' Onboarding Barriers?
AU - Fronchetti, Felipe
AU - Shepherd, David C.
AU - Wiese, Igor
AU - Treude, Christoph
AU - Gerosa, Marco Aurélio
AU - Steinmacher, Igor
N1 - Publisher Copyright: © 2023 Owner/Author.
PY - 2023/11/30
Y1 - 2023/11/30
AB - Effectively onboarding newcomers is essential for the success of open source projects. These projects often provide onboarding guidelines in their 'CONTRIBUTING' files (e.g., CONTRIBUTING.md on GitHub). These files explain, for example, how to find open tasks, implement solutions, and submit code for review. However, these files often do not follow a standard structure, can be too large, and miss barriers commonly found by newcomers. In this paper, we propose an automated approach to parse these CONTRIBUTING files and assess how they address onboarding barriers. We manually classified a sample of files according to a model of onboarding barriers from the literature, trained a machine learning classifier that automatically predicts the categories of each paragraph (precision: 0.655, recall: 0.662), and surveyed developers to investigate their perspective of the predictions' adequacy (75% of the predictions were considered adequate). We found that CONTRIBUTING files typically do not cover the barriers newcomers face (52% of the analyzed projects missed at least 3 out of the 6 barriers faced by newcomers; 84% missed at least 2). Our analysis also revealed that information about choosing a task and talking with the community, two of the most recurrent barriers newcomers face, is neglected in more than 75% of the projects. We made our classifier available as an online service that analyzes the content of a given CONTRIBUTING file. Our approach may help community builders identify missing information in the project ecosystem they maintain and help newcomers understand what to expect in CONTRIBUTING files.
KW - FLOSS
KW - novices
KW - onboarding
KW - open source
KW - software engineering
UR - http://www.scopus.com/inward/record.url?scp=85180551146&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85180551146&partnerID=8YFLogxK
U2 - 10.1145/3611643.3616288
DO - 10.1145/3611643.3616288
M3 - Conference contribution
T3 - ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 16
EP - 28
BT - ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Chandra, Satish
A2 - Blincoe, Kelly
A2 - Tonella, Paolo
PB - Association for Computing Machinery, Inc
T2 - 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023
Y2 - 3 December 2023 through 9 December 2023
ER -