From Specialized Mechanics to Project Butlers: The Usage of Bots in Open Source Software Development

FOCUS: BOTS IN SOFTWARE ENGINEERING

Zhendong Wang, University of California, Irvine
Yi Wang, Beijing University of Posts and Telecommunications
David Redmiles, University of California, Irvine

// We seek to identify how open source software (OSS) projects adopt bot services from a diverse set of selections. Our empirical research examines bot usage in the most popular OSS repositories in GitHub. //

Digital Object Identifier 10.1109/MS.2022.3180297
Date of current version: 22 August 2022
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/deed.ast.
38 IEEE SOFTWARE | PUBLISHED BY THE IEEE COMPUTER SOCIETY

TODAY, MAINSTREAM OPEN source software (OSS) development often involves a huge base of contributors, users, and other stakeholders, which is afforded by various online social coding platforms, e.g., GitHub [1]. At the same time, the fast, ongoing penetration of the continuous integration, delivery, and deployment (CI/CD) model makes urgent demands on rapid iterations. Facing increasing pressures from managing a large number of contributors and their contributions and the ever-faster pace in CI/CD, some small-scale automation techniques have found their way into OSS projects’ engineering practices.

Software agents for software engineering (SE) tasks, i.e., developer bots, or bots, for short, are exactly such automation techniques that have the potential to significantly automate many aspects of building software. Various bots have been proposed for improving developers’ workflow through better productivity and work–life balance [2]. These software agents and bots automate repetitive and routine tasks, such as managing issues and pull requests, building and executing test suites, and so on [3], [4]. Thus, they can often substitute for human labor. While these bots have saved substantial human effort, they also have exhibited limitations, for instance, not being designed for smart tasks and lacking interactivity during human–bot communication [3], [5].

Bots will inevitably be adopted by more and more OSS projects in the foreseeable future of automated SE. Thanks to the continuous efforts of practitioners and researchers, the current bot market offers a wide range of development services [6] for OSS maintainers to shop around. However, we have little knowledge about how practitioners select and adopt these bots.

Here, we start to bridge the gap by empirically examining the bots adopted by popular OSS projects on GitHub, the largest online social coding platform. Combining automated bot detection and manual review, we compile a list of bots used in 1,000 popular projects. Analyses of these bot services show a remarkable adoption rate (61%) among popular GitHub OSS projects when compared with prior studies [4]. However, the
application scenarios of these bots are still limited: many of them are combinations of simple automation features, which are triggered by specific project events, and they perform designated tasks through a rule-based mechanism. In addition, over 69% of bots remain private, without transparently sharing their implementation or being accessible outside their projects and organizational ecosystems.

Bots in OSS Development
In the past decade, the emerging CI/CD model and the large-scale collaborative software development powered by online social coding platforms have driven the rapid evolution of OSS development practices. Fast iterations drastically complicate the development process as projects evolve. Therefore, practitioners turn to automation techniques that handle specific routine development and social tasks so that they can focus on innovative activities.

These automation techniques, i.e., bots, may support developers through many service mediums. Some bots have user profiles and are authorized as their projects’ contributors [4]. Besides these account-based bots, others may employ platform applications, user settings, and various external services [3]. Practitioners often adopt desired bot services with tradeoffs among performance, privacy, and security.

SE practitioners and researchers have put significant effort into bot development [7]. Several bot creation frameworks have been developed to separate bot operation logic from specific platform mechanisms, such as GitHub’s WebHook event subscription. For example, Probot allows bot developers to interact with GitHub via an internal event emitter, and therefore, developers may focus on designing bot logic. There are other bot creation frameworks that bring specific engineering benefits, such as OpenBot, Fabric Bot, and Windows Community Tool.

Meanwhile, researchers have collected practitioners’ feedback for future improvement. For instance, they find that practitioners expect more smart features in bots’ communication and functionality [4]. Erlenhov et al. also emphasize the importance for a bot to be socially competent and bring technical value, especially when it is designed for smarter and more complex tasks [5]. Liu et al.’s survey [3] created seven principles for designing bots that demonstrate the critical challenges in interactions.

Methods
This study extracted activity traces of bot accounts and apps from GitHub repositories and identified bots through automatic bot detection and manual review. The data we sought included the following:

• a list of popular software development repositories that have employed bots in CI/CD
• descriptions of bot accounts, apps, and services employed by these repositories.

Sampled Repositories
The number of GitHub stars is an indicator of popularity and is also associated with project evolution [8]. Sampling the most starred repositories helps to avoid many perils in mining repositories, e.g., projects that received minimal participation [9]. Moreover, we ensured that the sample included only software development projects using the pull request model, which excluded non-SE repositories. Finally, due to the limitations of manual reviewing, e.g., the substantial time required, the sample size was set as 1,000.

Data Collection and Analysis
First, to automatically identify bot activities from GitHub repositories, we adopted a bot identification method called BODEGHA [10], a machine learning classifier using language patterns and features of comments to detect bot activities. Second, we manually examined the automatically classified bot accounts/services and reviewed their activities for data reliability.

We collected activity data from the most recent list of event actors in commits, issues, and pull requests. We also reviewed repository information documents, e.g., readme.md. When we found an event actor that was suspiciously repeating similar activities or demonstrated that the content was automatically generated, we first reviewed its GitHub profile or GitHub Marketplace/app listings for its naming and description. Second, we reviewed the content of this GitHub account’s recent comments and activities. Finally, for some accounts that did not display their contributions, we collected their activity records through the GitHub application programming interface.

Two researchers generated the preceding review protocol for examining and identifying bot services and tested it on a random subsample of 100 repositories. The process achieved a 0.645 interrater reliability (Cohen’s kappa), suggesting substantial agreement. Any differences were resolved during a discussion session and led to updating the protocol. Finally, three researchers followed the updated protocol and labeled the entire sample.

For each identified bot, we extracted several data items, including the bot’s service-providing medium, main functionality, service and source code availability, and design/creation framework, if indicated. In addition, we collected the external reference of
SEPTEMBER/OCTOBER 2022 | IEEE SOFTWARE 39

the bot, including its documentation site, GitHub profile page, and main service medium site. Sets of bots that provide the same service, often under one application, such as Codecov IO and Codecov Commenter, or Dependabot Preview and Dependabot, were merged into one service. A replication package of the manual review protocol and bot data is available at https://tinyurl.com/33392bp3.

Results
Among the 1,000 sampled repositories, BODEGHA identified bot activities in 462. Following the steps described in the “Data Collection and Analysis” section, our manual review found that 613 sampled repositories employed 201 distinctive bot services, though three of them were reported as deprecated or stopped services.

Based on our data, bots’ functions fall into six categories:

• CI task assistance: The service automates the various tasks for CI/CD, including managing branches and releases (Release Drafter and Semantic Release Bot), pushing commits and merging approved/passed pull requests (Bors, LGTM, and Repo Ranger), and automating building, testing, and deployment (Vercel, Buildbot, and Bors).
• Issue and pull request management: The service automates the management of issue and pull request tracking systems, including labeling issues and pull requests, cleaning inactive/invalid requests (Mary Poppins, Carsonbot, and Support), and requesting and formatting issue and pull request content through checklists (Issue Check, Request Info, and Vue Bot).
• Code review assistance: The service provides assistance in the code review process, including performing static and dynamic analysis (Sourcery AI, SourceLevel Bot, and Codecov), change-set summaries and difference visualization (Changeset Bot, ECMA262 Compare Bot, and Sizebot), and code reviewer assignment (Googlebot, PullApprove, and VS Code Triage Bot).
• Dependency and security: The service periodically checks and ensures that a repository’s dependencies are up to date, and it detects and reports security issues (Dependabot, Depfu, and Renovate).
• Developer and user community support: The service assists with the management of a community through collecting contributor license agreements, acknowledging contributors (All Contributors), managing contributor permissions (MeeseeksDev), and automatically commenting for welcoming contributors and providing feedback (Mirai Bot and Welcome).
• Documentation generation: The service assists in generating end user and developer documentation (DecDocs Bot, Weekly Digest, and RC Publisher).

These application scenarios and functions of adopted bot services are similar to the findings in prior research [4], [11]. Additionally, in our research, we examine combinations of bot functions. Most prevalent is the sole function for automating CI tasks. Combinations of functions are less prevalent in terms of distinctive bot services (refer to Figure 1 for more details).
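The repository counts reported in the Results can be turned into rates with simple arithmetic. The short sketch below only restates the three counts from the text; the constant and function names are chosen for this example, and the "net difference" is just a subtraction of the two totals, not a claim that one set of repositories contains the other.

```python
# Counts reported in the Results section.
SAMPLED = 1000             # repositories in the sample
DETECTED_AUTOMATIC = 462   # repositories where BODEGHA identified bot activity
CONFIRMED_MANUAL = 613     # repositories found to employ bots after manual review


def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


automatic_rate = share(DETECTED_AUTOMATIC, SAMPLED)   # share flagged automatically
manual_rate = share(CONFIRMED_MANUAL, SAMPLED)        # the adoption rate, about 61%
net_difference = CONFIRMED_MANUAL - DETECTED_AUTOMATIC

print(f"automatic detection: {automatic_rate}% of sampled repositories")
print(f"after manual review: {manual_rate}% of sampled repositories")
print(f"net difference: {net_difference} repositories")
```

The second rate is the adoption figure quoted as 61% in the introduction and as "over 60%" in the Discussion.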
[Figure 1: bar charts. Panel (a): frequency of each bot function combination. Panel (b): overall frequency by function for CI, Code Review, Community, Dependency, Documentation, and Issue and Pull Request.]
FIGURE 1. The (a) frequency of bot function combinations and the (b) overall bot function frequency (in sampled projects).
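As a rough illustration of how counts like those plotted in Figure 1 can be tallied, the sketch below counts how often each set of function categories occurs and how often each category occurs overall. The bot names and labels in `sample` are hypothetical and are not the study's data; only the six category names come from the text, and `tally` is a name chosen for this example.

```python
from collections import Counter

# The six function categories named in the Results section.
CATEGORIES = {
    "CI",
    "Issue and Pull Request",
    "Code Review",
    "Dependency",
    "Community",
    "Documentation",
}


def tally(bots):
    """Count function combinations and per-category totals.

    `bots` maps a bot name to the set of categories it covers.
    Returns (combination counts, per-category counts).
    """
    combos = Counter()
    overall = Counter()
    for name, functions in bots.items():
        unknown = set(functions) - CATEGORIES
        if unknown:
            raise ValueError(f"{name}: unknown categories {sorted(unknown)}")
        combos[frozenset(functions)] += 1   # one count per distinct combination
        overall.update(functions)           # one count per category covered
    return combos, overall


# Hypothetical records, for illustration only.
sample = {
    "bot-a": {"CI"},
    "bot-b": {"CI"},
    "bot-c": {"CI", "Code Review"},
    "bot-d": {"Dependency"},
}
combos, overall = tally(sample)

for combo, count in combos.most_common():
    print(" + ".join(sorted(combo)), count)
```

Using a `frozenset` as the key treats a combination as unordered, so a bot labeled with CI and Code Review is counted the same way regardless of label order.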
40 IEEE SOFTWARE | WWW.COMPUTER.ORG/SOFTWARE | @IEEESOFTWARE
Table 1. The top 10 most prevalent bot services among popular OSS development projects.

| Bot service | Service medium | Interactions | Cost for developers | Functionalities | Code availability | Service availability | Creation framework | Number of projects |
|---|---|---|---|---|---|---|---|---|
| Dependabot | GitHub repository security settings | Pull request and pull request comment | Free | Update dependencies of a repository from automatically generated pull requests | Yes | Public | NA | 171 |
| Stale | GitHub apps | Issue and pull request comments | Free | Label inactive issues and pull requests as “stale” and close them if they remain inactive | Yes | Public | ProBot | 100 |
| Codecov | GitHub apps and actions | Pull request comments | Free and paid | Report the test coverage of a pull request | No | Public | NA | 89 |
| Googlebot | GitHub apps, authorized account | Pull request comments | Private | Check and collect contributor license agreements (CLAs) and label an issue or pull request with its CLA status; assign developers to issues and pull requests; collect user feedback after an issue closes | Yes | Organizational | ProBot | 51 |
| CLA Assistant | GitHub apps and actions | Pull request comments | Free | Comment on pull requests to ask contributors to sign CLAs | Yes | Public | NA | 46 |
| Renovate | GitHub apps | Pull requests and pull request comments | Free | Update dependencies of a repository from automatically generated pull requests | Yes | Public | NA | 34 |
| Facebook GitHub Bot | Authorized account | Pull request comments | Private | Comment on a pull request to ask the contributor to sign a CLA; make commits and push changes; summarize changes to a pull request and execute automatic tests; label the CI status on a pull request | No | Organizational | NA | 30 |
| Coveralls | GitHub apps and actions | Pull request comments | Free and paid | Report the test coverage of a pull request | No | Public | NA | 26 |
| Vercel | GitHub apps and actions | Pull request comments | Free and paid | Automate CI for building, testing, and deployment for front-end applications; comments on pull requests include the URL of a preview of deployed changes | No | Public | NA | 20 |

We do not include GitHub Action as a typical bot service in this study because it is usually powered by existing as well as customized automation.
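The rule-based behavior summarized for Stale in Table 1 (label inactive issues and pull requests as "stale", then close them if they remain inactive) can be sketched in a few lines. This is an illustrative sketch, not Stale's actual implementation: the `Issue` type, the `next_action` function, the "unlabel" step, and the 60-day and 7-day thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Set

STALE_AFTER = timedelta(days=60)  # assumed inactivity period before labeling
CLOSE_AFTER = timedelta(days=7)   # assumed extra inactivity before closing


@dataclass
class Issue:
    """Hypothetical, minimal view of an issue or pull request."""
    last_activity: datetime               # last human activity, not bot activity
    labels: Set[str] = field(default_factory=set)
    closed: bool = False


def next_action(issue: Issue, now: datetime) -> str:
    """Return the rule's decision: 'label', 'unlabel', 'close', or 'none'."""
    if issue.closed:
        return "none"
    idle = now - issue.last_activity
    if "stale" in issue.labels:
        if idle < STALE_AFTER:
            return "unlabel"              # someone responded after the label
        return "close" if idle >= STALE_AFTER + CLOSE_AFTER else "none"
    return "label" if idle >= STALE_AFTER else "none"
```

A run over a repository would call `next_action` for each open issue on a schedule and apply the returned action through the platform's API. The decision depends only on predefined inputs (timestamps and labels), which is the kind of fixed trigger-and-act rule the article describes.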
Many bots have specialized in multiple tasks and become a butler for a repository. For example, RepoKitteh performs multiple tasks, including checking the format of a pull request, automating tests on a pull request, assigning users to issues and labeling problems, and merging pull requests that passed automatic tests. Some organizations have even customized a private butler bot overseeing their CI/CD practices, e.g., Googlebot and Facebook GitHub Bot in Table 1.

Finally, we list the most prevalent bot services, based on how many projects they appear in, in Table 1, and we spot a few common patterns for their services. Noticeably, these services, except Facebook GitHub Bot, provide a quick setup option with GitHub apps and actions. Pull request comments are their major communication channel; Stale additionally works through issue comments. Further, these services provide a free installation option to the public, though two services are subject only to their organizational ecosystems.

Discussion
Over 60% of the sampled repositories employed at least one bot to automate routine workflows. The high adoption rate suggests that SE bots have become prevalent in OSS projects. Although bots have not been sophisticated enough to handle advanced tasks in practice, the convenience they provide has outweighed many of their drawbacks, e.g., disruptive notifications [11]. The current practice of OSS development on GitHub has largely become a semiautomated procedure heavily assisted by bots.

However, the popular bots we identified employed similar design mechanisms and provided limited interactivity. For the limited number of open source bots, the implementation was a rule-based system that subscribed to and acted on certain repository events and event payloads. For example, when offering automatic issue labeling, Carsonbot required a strictly formatted issue with content in its template. Moreover, the interactivity of the bots has not been improved, for various reasons [7]. The basic mechanism is still to respond and act under predefined input for completing simple, tedious tasks. However, we observe the prevalence of multitasking butlers in popular GitHub repositories, and we anticipate a growing percentage of bots that automate multiple tasks instead of specializing in just one [5].

Finally, for over two-thirds (69%) of the identified bots, their implementations remained opaque to the public. Moreover, more than half (53%) were available and applicable only to specific repositories and organizations. We argue that there are two major reasons for bots to be kept private and closed source, according to our observations. First, there are security and business concerns about bots’ functions: bots often have a high level of permission (above write access) in the repositories. Open sourcing a bot may expose the vulnerabilities in a repository’s automating workflow, which creates a risk of malicious behavior. Second, many customized butler bots integrate multiple existing services and adapt to the needs of a specific project or ecosystem. Thus, developers have little motivation to share source code, as many components of such bots are already publicly available. However, without a framework that integrates various bot services, the current situation is far from optimal for the development of bots since many repositories choose to withhold their bots.

This study leveraged automated and manual reviews to investigate the adoption of bots in OSS development practice on GitHub. We found that bots had become more prevalent in OSS development, as expected [6]. However, somewhat unexpectedly, SE bots had not evolved much beyond atomic application scenarios but simply integrated more functions and multiple tasks. This study challenged our ideas about the future of bots. Namely, one line of future work could be to provide an interactive interface for butler bots while communicating with developers during CI/CD practices. Another line of research could be to develop a mechanism or framework to enable a bot-sharing community and address the concerns of security.

Acknowledgments
We would like to thank the editor and anonymous reviewers. Zhendong Wang and David Redmiles are supported, in part, by the Donald Bren School of Information and Computer Sciences, University of California, Irvine. Yi Wang is partially supported by the National Natural Science Foundation of China, under grant 62172049.

References
1. M. Gerosa et al., “The shifting sands of motivation: Revisiting what drives contributors in open source,” in Proc. IEEE 43rd Int. Conf. Softw. Eng., 2021, pp. 1046–1058, doi: 10.1109/ICSE43902.2021.00098.
2. A. N. Meyer, L. E. Barton, G. C. Murphy, T. Zimmermann, and T. Fritz, “The work life of developers: Activities, switches and perceived productivity,” IEEE Trans. Softw. Eng., vol. 43, no. 12, pp. 1178–1193, 2017, doi: 10.1109/TSE.2017.2656886.
3. D. Liu, M. J. Smith, and K. Veeramachaneni, “Understanding user-bot interactions for small-scale automation in open-source development,” in Proc. CHI Conf. Hum. Factors Comput. Syst., 2020, pp. 1–8, doi: 10.1145/3334480.3382998.
4. M. Wessel et al., “The power of bots: Characterizing and understanding bots in OSS projects,” Proc. ACM Hum.-Comput. Interact., vol. 2, no. CSCW, pp. 1–19, 2018, doi: 10.1145/3274451.
5. L. Erlenhov, F. G. de Oliveira Neto, R. Scandariato, and P. Leitner, “Current and future bots in software development,” in Proc. IEEE/ACM 1st Int. Workshop Bots Softw. Eng. (BotSE), 2019, pp. 7–11, doi: 10.1109/BotSE.2019.00009.
6. C. Lebeuf, M.-A. Storey, and A. Zagalsky, “Software bots,” IEEE Softw., vol. 35, no. 1, pp. 18–23, 2018, doi: 10.1109/MS.2017.4541027.
7. A. Abdellatif, D. Costa, K. Badran, R. Abdalkareem, and E. Shihab, “Challenges in chatbot development: A study of stack overflow posts,” in Proc. 17th Int. Conf. Mining Softw. Repositories, 2020, pp. 174–185, doi: 10.1145/3379597.3387472.
8. H. Borges and M. T. Valente, “What’s in a GitHub star? Understanding repository starring practices in a social coding platform,” J. Syst. Softw., vol. 146, pp. 112–129, Dec. 2018, doi: 10.1016/j.jss.2018.09.016.
9. E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, “The promises and perils of mining GitHub,” in Proc. 11th Working Conf. Mining Softw. Repositories, 2014, pp. 92–101, doi: 10.1145/2597073.2597074.
10. M. Golzadeh, A. Decan, D. Legay, and T. Mens, “A ground-truth dataset and classification model for detecting bots in GitHub Issue and PR comments,” J. Syst. Softw., vol. 175, p. 110911, May 2021, doi: 10.1016/j.jss.2021.110911.
11. M.-A. Storey and A. Zagalsky, “Disrupting developer productivity one bot at a time,” in Proc. 24th ACM SIGSOFT Int. Symp. Found. Softw. Eng., 2016, pp. 928–931, doi: 10.1145/2950290.2983989.

ABOUT THE AUTHORS
ZHENDONG WANG is working toward his Ph.D. in software engineering at the University of California, Irvine, Irvine, California, 92697, USA. His research interests include leveraging and supporting expertise in software development, particularly in distributed software development. Wang received his M.S. in software engineering from the University of California, Irvine. Contact him at zhendow@uci.edu.

YI WANG is a professor with the School of Computer Science (National Pilot Software Engineering School) and the Key Laboratory of Trustworthy Distributed Computing and Service, Beijing University of Posts and Telecommunications, Beijing, 100876, China. His research interests include human and social factors in software engineering, computer-supported cooperative work, and social computing. Wang received his Ph.D. from the University of California, Irvine. Contact him at wang@cocolabs.org.

DAVID REDMILES is a professor in the Department of Informatics, Donald Bren School of Information and Computer Sciences, University of California, Irvine, Irvine, California, 92697, USA. His research interests include software engineering, human–computer interaction, and computer-supported cooperative work. Redmiles received his Ph.D. in computer science from the University of Colorado, Boulder. He is a member of the Association for Computing Machinery and the IEEE Computer Society. Contact him at redmiles@ics.uci.edu.
