
The Usage of Bots in Open Source Software Development
1st Sankalp
2021-IMT-084
ABV-Indian Institute of Technology and Management
Gwalior, India
sankalp25103@gmail.com

2nd Lata
2021-IMT-059
ABV-Indian Institute of Technology and Management
Gwalior, India
imlata1111@gmail.com

3rd Gaurang Sondur
2021-IMT-036
ABV-Indian Institute of Technology and Management
Gwalior, India
gssondur28@gmail.com

4th Pranav Pawar
2021-IMT-074
ABV-Indian Institute of Technology and Management
Gwalior, India
pranav@example.com

I. BACKGROUND

In the realm of open-source software development, the reliance on automated tools and bots has become commonplace. These automated entities play a pivotal role in managing various aspects of open-source projects, including tasks such as issue management, code review, and continuous integration. Their contributions significantly enhance project efficiency and code quality. However, amid this synergy between human contributors and automated systems, it is imperative to distinguish between human and bot-generated contributions. This distinction is the linchpin of maintaining the integrity of open-source collaboration. The research paper under discussion offers a detailed methodology for identifying and reviewing bot activities within GitHub repositories in the context of open-source software development. This methodology blends automated techniques with manual review processes designed to ensure the reliability and precision of the data. The approach draws on various technologies, including machine learning, natural language processing, web scraping, and the GitHub API. Moreover, it employs statistical metrics like Cohen’s kappa to gauge interrater reliability, further refining the review protocol through iterative discussions.

II. PROCESS AND TECHNOLOGIES USED

A. Automated Bot Identification with BODEGHA

Central to the methodology is the deployment of an automated bot identification method known as BODEGHA. BODEGHA operates as a machine learning classifier, driven by the application of Natural Language Processing (NLP) and machine learning technologies. NLP, a field at the intersection of linguistics and computer science, empowers the algorithm to decipher language patterns and unearth pertinent insights from comments. Simultaneously, machine learning algorithms come into play, classifying these patterns as originating from either human or bot interactions.

B. Data Collection through the GitHub API

For the collection of activity data, the researchers used the GitHub API. This programmatic interface serves as the gateway to accessing and retrieving data from GitHub repositories. By interfacing with the API, researchers can systematically extract information concerning repositories, commits, issues, pull requests, and user interactions. The API is a foundational tool for fast data aggregation and analysis, and it supports the accuracy and comprehensiveness of the data.

C. Review of GitHub Profiles and Marketplace/App Listings

To validate the bot activities flagged by BODEGHA, a meticulous manual review of GitHub profiles and examination of GitHub Marketplace/app listings is undertaken. This validation process necessitates the use of web scraping techniques. Web scraping is an automation technology that facilitates the systematic extraction of data from web pages. In this context, it enables the gathering of critical information from user profiles, such as usernames, descriptions, and affiliations with marketplace apps or services, fostering a comprehensive understanding of the roles of specific bots within the open-source ecosystem.

D. Manual Review of Comments and Activities

Another vital facet of the methodology is a hands-on manual review of the content found in recent comments and activities associated with GitHub accounts. This qualitative inspection leverages text analysis and sentiment analysis techniques.
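As a rough illustration of the API-based data collection described in Section II-B, the following Python sketch counts recent activity per account using the public repository events endpoint of the GitHub REST API. It is not the authors' tooling: the function names (events_url, fetch_events, activity_per_actor) and the sample data are assumptions made for this example, and the sample is hand-written so that the sketch runs without network access.

```python
import json
import urllib.request
from collections import Counter

API_ROOT = "https://api.github.com"


def events_url(owner: str, repo: str, per_page: int = 100) -> str:
    """Build the URL of the repository events endpoint."""
    return f"{API_ROOT}/repos/{owner}/{repo}/events?per_page={per_page}"


def fetch_events(owner: str, repo: str) -> list:
    """Download one page of recent public events (needs network access)."""
    request = urllib.request.Request(
        events_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def activity_per_actor(events: list) -> Counter:
    """Count events per (actor login, event type) pair."""
    counts = Counter()
    for event in events:
        login = event.get("actor", {}).get("login")
        if login:
            counts[(login, event.get("type", "unknown"))] += 1
    return counts


# Hand-written sample shaped like the API response (hypothetical data).
sample_events = [
    {"type": "IssueCommentEvent", "actor": {"login": "dependabot[bot]"}},
    {"type": "IssueCommentEvent", "actor": {"login": "dependabot[bot]"}},
    {"type": "PullRequestEvent", "actor": {"login": "alice"}},
]
counts = activity_per_actor(sample_events)
for (login, event_type), n in sorted(counts.items()):
    print(login, event_type, n)
```

In a real run, fetch_events would replace the sample list. Unauthenticated requests are rate limited, so a larger study would likely need an access token and pagination, both of which this sketch leaves out.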
E. Data Reliability and Updating of Review Protocol

To ensure the reliability of the data and the robustness of the methodology, the researchers applied statistical measures. This includes the assessment of interrater reliability, using Cohen’s kappa as a metric to gauge the agreement between human raters. Statistical software or libraries designed for data analysis and statistical computations are instrumental in this process.

III. PROBLEM STATEMENT

Open Source Software (OSS) development has become a very large domain in software engineering, involving a huge base of contributors, users, and other stakeholders. Social coding platforms like GitHub, GitLab and Slack host a huge base of such contributors. At the same time, the fast-growing CI/CD (continuous integration, delivery, and deployment) model places urgent demands on rapid iteration. Managing these contributors and meeting the requirements of CI/CD creates a need for automation in OSS development. Large company projects hosted in GitHub repositories also require automation, as it is infeasible for employees to respond to every contributor query.

IV. SOLUTION

The report addresses this problem by analyzing the use of bots in OSS development. Bots are software agents that perform software engineering tasks and help automate the process of building software. They are seen as a promising approach for dealing with complex OSS projects. Such bots automate repetitive tasks such as managing issues and pull requests, building and executing test suites, providing chatbot services to contributors, and handling their small queries. The use of bots saves human labor.

The report analyzed 1,000 popular GitHub repositories and found that 61% of the most popular projects have adopted bots. It can be predicted that in the near future more and more projects will start adopting bots to perform repetitive tasks. Over the past decade, technology has advanced enormously: the use of CI/CD tools and large-scale collaboration between contributors has driven rapid development of the automation techniques used in bots to handle social tasks, so that people can focus on innovative tasks.

Software engineers put significant effort into developing bots. Bots are created with bot frameworks. Different types of bots need to be built to perform various automation tasks depending on a project’s demands, so these frameworks are used to separate the bot creation logic for different platforms.

V. ANALYSIS OF THE PAPER

This paper investigates the proliferation of bots in the open-source community. Over the past decade, the open-source community has experienced significant growth, whether through the adoption of open-source practices by new companies or the increasing number of developers joining this community. A major contributing factor to this growth has been the presence of various open-source programs, such as Google Summer of Code, Outreachy, or MLH. Concurrently, the rapid and ongoing adoption of the continuous integration, delivery, and deployment (CI/CD) model places urgent demands on rapid iterations, imposing significant pressure on repository maintainers. However, the rise of bots has provided substantial relief to these maintainers.

This paper offers an in-depth investigation of developer bots, and it is well-written and meticulously presented. The authors provide a clear and concise overview of the topic, discussing their study’s findings in detail. To accomplish this, they conducted thorough research on 1,000 popular repositories on GitHub, revealing that 61% of these repositories utilize bots for various tasks.

A. Data Collection

Data collection was a meticulous task. The first step was to select only repositories in the sample that contained software development projects, while excluding non-software engineering (SE) projects. Next, the BODEGHA bot identification method was used to detect bot activities. The selected repositories were then manually checked for data reliability, achieving an interrater reliability of 0.645 (Cohen’s kappa), which supports the reliability of their data collection method.

B. Analysis of Data

For each identified bot, several pieces of data were extracted, including the bot’s service-providing medium, main functionality, service and source code availability, design/creation, and framework.

Based on the collected data, the authors categorized bots into six categories:
• CI task assistance
• Issue and pull request management
• Code review assistance
• Dependency and security
• Developer and user community support
• Documentation generation

The most common of these categories is CI task assistance. Many of the bots they found can perform a wide range of tasks and can even be customized. It is therefore fair to say that they have become like butlers for maintainers.

C. Bot Development

For most of the bots, however, the source code is not available. The authors suggest two major reasons for this. First, there could be security and business concerns, as bots often have high-level permissions in repositories. Making their code public could expose vulnerabilities that attackers could easily exploit. Second, many customized bots use existing services whose code is already available, which gives developers little motivation to release their own code. These reasons are well justified.
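To put the interrater reliability of 0.645 reported under Data Collection in context, Cohen’s kappa can be computed as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance. The Python sketch below is a minimal implementation under stated assumptions: the function name cohens_kappa and the ten example labels are hypothetical and are not the authors’ data.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' label frequencies, per label.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten accounts reviewed by two raters.
rater_1 = ["bot", "bot", "bot", "human", "human",
           "human", "human", "human", "bot", "human"]
rater_2 = ["bot", "bot", "human", "human", "human",
           "human", "human", "bot", "bot", "human"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Here the raters agree on 8 of 10 accounts (p_o = 0.8) and chance agreement is p_e = 0.52, giving a kappa of about 0.583. On the commonly used Landis and Koch scale, values between 0.61 and 0.80, such as the reported 0.645, are read as substantial agreement.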
It has also been observed that the emergence of bots has led to the development of various frameworks for building them, including OpenBot, Fabric Bot, and Windows Community Tool. These frameworks have empowered developers to concentrate on logic rather than writing extensive code. We believe that the continued development of such frameworks will significantly contribute to the proliferation of bots within the open-source community.

D. Conclusion

In conclusion, the authors have conducted a comprehensive analysis, and their paper is well-organized and structured. In their study, they found that over 60% of repositories are utilizing bots, indicating that bots have become prevalent in open-source software (OSS). However, these bots are not yet sophisticated enough to handle complex tasks. Multitasking bots are also rapidly on the rise and are likely to dominate the field in the near future. Additionally, they discovered that, despite the increased use of bots, more than 65% of bot source code remains unpublished.

VI. PROS AND CONS

A. Pros

• The paper asserts that software agents and bots automate repetitive and routine tasks, such as managing issues and pull requests and building and executing test suites, thereby often substituting for human labor.
• Many bots handle multiple tasks, serving as butlers for repositories. For example, RepoKit-teh performs various tasks, including checking the format of a pull request, automating tests, assigning users to issues, and merging pull requests. Some organizations have even customized private butler bots for CI/CD practices, like Googlebot and Facebook GitHub Bot, enhancing flexibility.
• Open source software (OSS) projects typically have few full-time employees, making it challenging to address queries. Trained bots capable of answering frequently asked queries save significant time.
• During specific events like GSoC, Outreachy, and Hacktoberfest, where contributors surge, bots prove helpful in managing issues and reviewing PRs because maintainers are limited.
• These automation techniques, i.e., bots, support developers through various service mediums, with options for user profiles and platform applications. Practitioners adopt bot services while weighing performance, privacy, and security tradeoffs.

B. Cons

• While these bots have saved substantial human effort, they also have limitations, such as not being designed for complex tasks and lacking interactivity during human-bot communication.
• The paper focuses on GitHub, although alternatives like GitLab are also a popular choice and offer more security for OSS projects.
• Bots primarily perform simple, rule-based automation, limiting their application scenarios.
• Over 69% of bots in OSS remain private, without transparently sharing their implementation or being accessible outside their projects and organizational ecosystems. This raises security and business concerns due to the high level of permissions bots often possess (above write access) in repositories.
• For data collection, the authors gathered activity data from the most recent list of event actors in commits, issues, and pull requests, and reviewed these activities themselves. However, a broader analysis of bot usage in OSS could have been achieved by surveying maintainers and contributors, as they are the primary users of the bots.

VII. IMPROVEMENT

A. Scalability and Resource Optimization

As the number of bots and their complexity increase, it is crucial to research ways to optimize resource usage. This includes strategies to minimize resource-intensive operations and to ensure that bots do not inadvertently contribute to system overloads. Research into resource-efficient bot design and scaling mechanisms can be valuable.

B. Enhanced Bot Communication and Functionality

The authors mentioned that practitioners expect more intelligent features in bots’ communication and functionality. To address this, future research could delve into specific improvements, such as natural language processing capabilities for more interactive and context-aware communication with users. Bots that can understand and respond to user queries more effectively can greatly enhance their usability.

C. Bot Security Audits

Given the concerns about bot security, conducting regular security audits of bot implementations could be beneficial. This includes vulnerability assessments, code reviews, and testing for potential exploits. Research efforts should focus on developing best practices and tools for ensuring the security of bots, especially those with elevated permissions in repositories.

CONTRIBUTION

• Pranav Pawar: Conducted a background check and reviewed the history of bots in open source. Also, explained the methodology used by the authors to extract data.
• Gaurang Sondur: Described the problem statement addressed by the authors and the reasons behind it. Additionally, outlined the proposed solution.
• Sankalp: Performed a comprehensive analysis of the paper and provided a summary of its key points.
• Lata: Identified the pros and cons of the paper and the bots, and offered suggestions for improving both.

REFERENCES

[1] Z. Wang, Y. Wang and D. Redmiles, "From Specialized Mechanics to Project Butlers: The Usage of Bots in Open Source Software Development," in IEEE Software, vol. 39, no. 5, pp. 38-43, Sept.-Oct. 2022.
[2] Proceedings of the 13th Asia-Pacific Symposium on Internetware, Association for Computing Machinery, New York, NY, USA, 2022.
[3] https://marutitech.com/complete-guide-bot-frameworks/
[4] https://github.com/googleapis/repo-automation-bots
