Information and Software Technology 145 (2022) 106849

Contents lists available at ScienceDirect

Information and Software Technology


journal homepage: www.elsevier.com/locate/infsof

Collaboration in software ecosystems: A study of work groups in open environment
Zhifei Chen a,∗, Wanwangying Ma b, Lin Chen b, Wei Song a
a School of Computer Science and Engineering, Nanjing University of Science and Technology, China
b State Key Laboratory for Novel Software Technology, Nanjing University, China

ARTICLE INFO

Keywords: Open source software; Software ecosystem; Collaboration; Work groups; Cross-project bugs

ABSTRACT

Context: As a particular type of software ecosystem, an open source software ecosystem (OSSECO) is a collection of interdependent open source software (OSS) projects which are developed and evolve together. Events happening within an OSSECO inherently involve the collaboration of participants from multiple OSS projects, forming a temporary work group. However, it is still unclear how different members of a work group collaborate to fix cross-project bugs, a typical event in the maintenance of OSSECOs.
Objective: This study aims to investigate the characteristics of collaboration within a work group when fixing cross-project bugs in an OSSECO. It involves the participants from the upstream (which caused the bugs) and the downstream (which were affected by the bugs) OSS projects.
Method: We conducted our study on 236 cross-project bugs from the scientific Python ecosystem, involving 571 participants and 91 OSS projects, to understand open collaboration within a work group. We established a quantitative analysis to investigate the members of a work group, along with a qualitative analysis to understand the roles of the members from different OSS communities.
Results: The results show that: (1) A typical work group consists of four to eight members from the core development teams of the two OSS communities. More members are concerned with the upstream OSS projects, and few can make active contributions to both sides; (2) Distinct responsibilities are taken by the two OSS communities, with the downstream members as the problem-finders and the upstream members as the decision-makers or gatekeepers.
Conclusions: Our findings reveal the collaborative mechanism and the responsibility allocation between the upstream and downstream OSS communities in the ecosystems.

1. Introduction

Software projects are seldom developed in isolation. On the contrary, software projects generally rely on the infrastructure or functional components provided by other software projects, leading to complex inter-project dependencies. The resulting dependency network in turn facilitates the co-evolution of complementary applications and services. In this context, a software ecosystem (SECO)¹ is defined as a collection of interdependent software projects which are developed and evolve together in the same environment [1]. One particular type of SECO is the open-source software ecosystem (OSSECO or OSS ecosystem), where the common environment is an open-source community. For example, the scientific Python ecosystem is one of the most prominent OSSECOs on GitHub, with complicated inter-project dependencies [2].

OSSECOs tend to be large, containing from tens to hundreds of OSS projects, with even an order of magnitude more dependencies between them [3]. The complicated dependency networks in OSSECOs are a burden for many developers, as a maintenance event triggered by any OSS project may affect the whole ecosystem. Thanks to the support of social coding platforms such as GitHub [4], the transparent and open development environment promotes collaboration among software developers. It shifts communication and coordination from the project level to the network level of the OSSECO. Individuals from any OSS community can take part in public software engineering events across the whole ecosystem. However, collaboration is complicated on the network level of the OSSECO. For instance, improvements that a developer makes to an open-source library can affect all the users of that library. Any action may require rework from developers whose software projects depend on that library. They should

∗ Corresponding author.
E-mail addresses: chenzhifei@njust.edu.cn (Z. Chen), wwyma@smail.nju.edu.cn (W. Ma), lchen@nju.edu.cn (L. Chen), wsong@njust.edu.cn (W. Song).
¹ The definition of SECO varies in the existing literature.

https://doi.org/10.1016/j.infsof.2022.106849
Received 7 June 2021; Received in revised form 18 December 2021; Accepted 11 January 2022
Available online 22 January 2022
0950-5849/© 2022 Elsevier B.V. All rights reserved.
Z. Chen et al. Information and Software Technology 145 (2022) 106849

invest in regular rework to keep up with changes and collaborate with upstream OSS projects (i.e., the libraries that their OSS projects depend on) to minimize the impact of those changes. The maintainers of the library project, in turn, work toward reducing the burden on their users. Therefore, compared with the collaboration in a single OSS community, the collaboration in an OSSECO presents a richer diversity of patterns and is enlarged in width and depth.

During the individuals' collaboration in OSSECO development, there is an emerging type of bug specific to OSSECOs to fix: the cross-project bug. A cross-project bug is rooted in an upstream OSS project, but affects not only that project but also the downstream ones depending on it. Tracking and fixing cross-project bugs inherently involves the communication and coordination of multiple OSS communities, including the upstream library, the affected downstream OSS project(s), and perhaps external users. In this context, a temporary work group forms which lasts for a short term, emerging and disappearing with the assignment and completion of the tasks. The members of a work group are not constant, but vary based on their responsibilities, abilities, and interests. Informally, a work group as referred to in this paper consists of all the members who participate in the reporting, tracking, and fixing process of a cross-project bug in the OSSECO.

Although developers' collaborative practices in a single OSS project have been widely studied [5–8], research on collaboration on the network level of OSSECOs is still limited, focusing only on general activities [9–11] and lacking studies of specific tasks such as fixing cross-project bugs. The work groups for fixing cross-project bugs are representative of those in OSSECOs. The members of a work group from different OSS communities are differentiated by the directed dependencies between OSS projects. Thus, the different identities of members lead to different roles within a work group. Yet, it is not clear how they interact and coordinate with each other when they encounter a cross-project bug. The uniqueness of the work groups formed to fix cross-project bugs in OSSECOs makes a study of collaboration in such an environment valuable.

The goal of this study is to understand the characteristics of collaboration in the OSSECOs hosted on GitHub. We choose to study the work groups that form in the cross-project bug fixing process. Specifically, with 236 cross-project bugs in the scientific Python ecosystem under analysis, we address two main research questions with regard to such work groups: (1) what does a work group look like? (2) what is the role of each member in a work group? In order to answer these questions, we established a quantitative analysis to investigate the members of a work group, along with a qualitative analysis to understand the roles of the participants from different OSS communities by manually inspecting the cross-project bug reports.

By combining qualitative and quantitative analysis, we find that a typical work group consists of four to eight members who come from the core development teams of the upstream and downstream sides. More group members are concerned with the upstream OSS projects than with the downstream OSS projects, and few of them can make active contributions to both sides. In addition, distinct responsibilities are taken by the two OSS communities. In most cases, downstream OSS communities reported cross-project bugs first. However, they faced difficulties in fixing them, and thus they communicated actively with upstream participants in order to have a quick fix. Upstream OSS communities take responsibility for the entire process of bug-fix work.

In summary, this article makes the following contributions:

• We conduct a study on 236 cross-project bugs which involve 91 OSS projects and 571 participants to understand open collaboration in the scientific Python ecosystem on GitHub by combining qualitative and quantitative analysis.
• The study provides insights on the members of a work group in an OSSECO and finds that most of the group members come from the core development teams of the upstream and downstream sides.
• We explore the roles that group members play in fixing cross-project bugs, which sheds light on the entire interactive and collaborative process among work group members during OSSECO maintenance.

With regard to the contribution of our findings to research, the results from our study on collaboration shed some light on the building of work groups in fixing cross-project bugs in the scientific Python ecosystem on GitHub. Further, results and discussion from our study could help in bringing recommendations to researchers, developers, and practitioners. Essentially, this work can serve as a starting point towards understanding open collaboration in OSSECOs and can also be useful for decision-making when allocating people to a GitHub work group.

The remainder of the paper is organized as follows. Section 2 gives an overview of the related work. Section 3 describes the methodology of our research. Sections 4 and 5 report the results of our two research questions. Section 6 further discusses the findings from this research. Finally, Section 7 concludes this paper.

2. Related work

Since this study investigates group collaboration in an OSSECO when fixing cross-project bugs, this section first introduces the definition of SECOs and OSSECOs, and then summarizes the works on collaboration in OSS development and in OSSECOs. Finally, we present the existing literature on bug tracking practices.

2.1. SECOs & OSSECOs

SECO is an area that has been gaining in popularity over the last decade. Manikas and Hansen [12] provided a systematic overview of the research done on SECOs from a software engineering perspective. They pointed out that there is little consensus on what constitutes a SECO in the existing literature. The four main groups of quoted definitions are from Messerschmitt and Szyperski [13], Jansen et al. [14], Bosch [15], and Lungu et al. [1].

Our study is focused on SECOs in the context of OSS, namely OSSECOs. OSS initiatives typically create an adequate environment for making a SECO emerge from their projects and communities. The OSSECO, which is a freely available SECO, has been widely studied in recent years. Unfortunately, as with SECO, there is no common definition of OSSECO either. Most relevant studies based their work on definitions related to SECOs [16]. In this paper, the type of OSSECO studied is consistent with the definition by Lungu et al. [1], which concerns the co-evolution of interdependent software projects where multiple participants collaborate on development and maintenance on the network level. In particular, our study is conducted on OSSECOs hosted on the GitHub platform.

Franco-Bedoya et al. [16] reported a systematic mapping of the field of OSSECOs. The results showed that existing research on several topics related to OSSECOs is still preliminary and one-sided, but that there is a consensus on the great value of OSSECOs in the development of engineering applications. This shows that it is urgent to study targeted theories and methods for the healthy development of OSSECOs.

2.2. Collaboration in OSS development

A number of research works on OSS studied its social media features, such as “starring” and “forking” a repository on GitHub, which facilitate the interactions between OSS projects and developers [17–21]. As a further study, Thung et al. [22] investigated the network structure on GitHub and concluded that social coding enabled substantially more collaborations among developers. Based on this finding, previous studies have discussed diverse aspects and perspectives of collaboration in OSS development.

On the one hand, multiple studies proposed different methods to promote collaboration in OSS development. Wang and Redmiles [5] proposed an intelligent system called IIAG to advise its users about strategies for initial interactions with new remote collaborators in OSS development teams. In order to support better decisions during collaborative OSS development, Yang et al. [6] proposed a multi-dimensional developer portrait model to characterize developers.

On the other hand, some studies aim to understand the complexities and barriers for collaboration in OSS development. For instance, Zhou et al. [7] showed that there are significant inefficiencies in the fork-based development process, including redundant development and fragmented communities. Constantino et al. [8] presented an interview study to understand how collaboration happens in fork-based development and concluded that the main barriers for collaboration are related to non-technical issues. More importantly, the result of their interview study also showed that bug fixing is one of the most recurring collaborative practices. In this study, we analyze collaborative bug fixing practices in an OSSECO, as opposed to a single OSS project.

2.3. Collaboration in OSSECOs

Unlike the development of a single OSS project, coordination is a major challenge in OSSECOs, since the OSS projects in an OSSECO tend to be highly interdependent yet independently maintained. However, the relevant research is still limited and preliminary.

Farias et al. [9] presented a survey on the developers' sense of influence in ecosystems based on GitHub repositories. The influencers are those who lead development and dictate how the software project evolves. They found that active participation and long-time interaction with a project are drivers for collaboration. Lyulina et al. [10] attempted to provide an interactive collaboration graph of an extremely large OSSECO in order to facilitate the understanding of the developer collaboration structure and relationships among OSS projects. Hou et al. [11] evaluated developer cooperation intensity to identify clear community structure for the developer collaboration network in the ecosystems on GitHub.

Our study is also conducted on OSSECOs, focusing on the collaborative activities of participants. Different from the existing research, which analyzed influencers or collaborators in general activities, our work analyzes the characteristics of collaboration within a specific group of individuals gathering to complete a certain task (i.e., fixing cross-project bugs) on the network level.

2.4. Bug tracking practices

The practices of collaborative teams have long been a focus of the software engineering community, including open data analysis [23], scientific problem solving [24], and peer production [25]. Bug tracking is an important social part of software development that has attracted the interest of many researchers.

Oliveira et al. [26] studied the effectiveness of collaborative identification of code smells. Their statistical testing suggested higher precision and recall through collaborative smell identification. For identified bugs, Breu et al. [27] concluded that the users' active participation in the discussion was important to accelerate the fixing process. Bertram et al. [28] examined the design and use of issue tracking systems. They revealed the nature of issue trackers as the central hub of communication that significantly supported the collaboration within collocated software development teams. Crowston and Scozzi [29] investigated the coordination mechanism of distributed teams when fixing bugs. Kumar et al. [30] investigated the multifaceted nature of collaboration among the developers while they fix bugs and collaborate in OSS systems.

Recently, the cross-project bug, an emerging type of bug, has caught the attention of a few researchers. This type of bug affects more than one software project and thus needs the collaboration of different software communities to fix. Canfora et al. [31] proposed an approach to identify Cross-System-Bug-Fixings (CSBFs) in FreeBSD and OpenBSD kernels. They also employed social network analysis to associate the occurrences of CSBFs with the social characteristics of contributors. Ma et al. [32] estimated the impact of a cross-project bug within its ecosystem by identifying the affected downstream modules. Ma et al. [33] also used qualitative methods to investigate how developers track the root causes of cross-project bugs and how the downstream developers deal with upstream bugs. They focused on the aspect of root causes and only studied the behaviors of downstream developers (especially reporters and linkers). Based on their work, we further study the open collaboration between the upstream and downstream OSS communities, concentrating more on the structure of the work groups and the various roles of all the members.

3. Research methodology

This section presents our main research methodology. First, we explain the research setting and research questions, and then we describe the objects and methods used to conduct this research. Finally, we discuss the threats to validity.

3.1. Research setting and questions

We conduct our study on GitHub, a well-known social coding platform. We first use a real case to introduce the GitHub issue tracker and the fixing process of a cross-project bug. Fig. 1 displays a snip of a bug report with id numpy/numpy#4225² in the OSS project Numpy. Numpy is a fundamental library in the scientific Python ecosystem with a great many OSS projects depending on it, including Scipy and Nengo. A developer named pbrod submitted the bug report to Numpy, and then several GitHub users took part in the discussion of the bug by leaving comments on the bug page. For simplicity, we only show one comment from WarrenWeckesser in the snip. Later, contributors of Scipy and Nengo (named ev-br and tcstewar, respectively) separately found that this bug would affect the two OSS projects and then referenced it in their own bug reports (scipy/scipy#3242³ and nengo/nengo#260⁴). The Numpy bug was finally closed by a Numpy core developer after confirming that it had been fixed by the contributor juliantaylar.

² https://github.com/numpy/numpy/issues/4225.
³ https://github.com/scipy/scipy/issues/3242.
⁴ https://github.com/nengo/nengo/pull/260.

In this case, numpy/numpy#4225 is a cross-project bug causing harm to other OSS projects. Numpy is the upstream OSS project in which the bug is rooted, while Scipy and Nengo are the downstream OSS projects which were affected. When fixing such cross-project bugs, three types of users play key roles: (1) reporter: the one who first reported this bug in the issue tracking system; (2) linker: the one who indicated that the bug affects another OSS project; (3) restorer: the one who finally fixed the bug by submitting a fix commit. For numpy/numpy#4225, pbrod is the reporter who submitted the bug, juliantaylar is the restorer who finally fixed the bug, and ev-br and tcstewar are the linkers who linked the bug to two other OSS projects. The number of linkers ranges from one to several, depending on the number of affected OSS projects. All the people participating in the discussion of this bug form a work group for resolving the bug. Note that the participants who commented on scipy/scipy#3242 or nengo/nengo#260 but not on numpy/numpy#4225 are also considered members of the work group as long as their comments were made before the cross-project bug was closed, because their comments also contributed, directly or indirectly, to the fixing of the bug.

In this study, we aim to explore the characteristics of collaboration in a work group when fixing a cross-project bug. Specifically, we seek to answer the following two research questions:

Fig. 1. A snip of numpy/numpy#4225 bug report.

• RQ1 (Composition of a Work Group): What does a work group look like?
  RQ1.1 How many members are there in a work group?
  RQ1.2 What are the members' identities with respect to the upstream/downstream OSS projects?
  RQ1.3 What are the members' contributions to the upstream/downstream OSS projects?
  This research question focuses on the members of a work group by investigating the individuals. A picture of the work groups can thus be obtained: how many people are in a work group? Of them, how many members are from the upstream or downstream OSS projects? How many members are from other OSS communities?
• RQ2 (Roles of Group Members): What is the role of each member in a work group?
  RQ2.1 Which OSS communities are the reporter, restorer, and linker from in a work group?
  RQ2.2 What work do the members of different OSS communities undertake?
  The goal of RQ2 is to find out how the group members collaborate to fix a bug. On the one hand, we pay attention to three important types of participants in a work group, i.e., reporter, restorer, and linker. We analyze the characteristics of these participants. On the other hand, we seek to find out whether the members of different OSS communities (i.e., the upstream OSS project and the downstream OSS project(s)) undertake different work in fixing cross-project bugs.

In the following subsections, we will introduce the data (Section 3.2) and the methods (Section 3.3) used to answer these questions.

3.2. Research objects

Cross-project Bugs. To answer the research questions, we collected the cross-project bugs based on the dataset provided by Ma et al. [33]. They identified 271 pairs of cross-project correlated bugs in the scientific Python ecosystem, one of the most prominent ecosystems on GitHub [2]. A pair of cross-project correlated bugs, as they defined it, consists of an upstream bug and an affected downstream one. Therefore, there are two pairs in the aforementioned case: one is numpy/numpy#4225 and scipy/scipy#3242, and the other is numpy/numpy#4225 and nengo/nengo#260. In our study, we considered each upstream bug (called a cross-project bug in this paper) as a data point, and 240 distinct ones were found. Two of the involved OSS projects (i.e., jaraco/setuptools and Homebrew/homebrew-python) have been deleted from GitHub, and thus we filtered out the cross-project bugs which affect them. Finally, we selected 236 distinct cross-project bugs which come from 27 libraries and affect 91 OSS projects in total.

Work Groups. For each cross-project bug, we identified the work group by picking the users who left a comment in the bug or in its related downstream one(s) before the cross-project bug was closed. Although some users just placed an arbitrary comment, or just reported the bug and did not follow up on it, we still take them as members of this work group, because they did participate in the bug fixing activities by expressing their opinions, even though they made little contribution or took an inactive role.

In order to identify the work group for each cross-project bug, we parsed the web pages of its upstream bug report and all downstream bug reports, and collected the closing time and all the comment information. All the users who commented on these bug reports before the closing time were recorded as the work group of the cross-project bug. In total, 571 distinct GitHub users were included in our study.

Generally, the work group of a cross-project bug contains the reporter who first reported the bug, the restorer who fixed it, and the linker who found the bug's connection between the upstream library and the downstream OSS project. The reporters and linkers are stored in the dataset provided by Ma et al. [33], and we identified the restorers of the cross-project bugs by observing GitHub commit records in the bug reports.

Finally, our research objects consist of 236 cross-project bugs, the work group for fixing each cross-project bug, and the reporter, linker, and restorer in each work group. In our dataset, the cross-project bugs along with the reporters and linkers come from the existing study [33].

3.3. Research methods

We combined quantitative and qualitative analysis to address the research questions.

3.3.1. Quantitative analysis

In order to investigate the members of a work group, we first identified each member's relationship with the involved OSS projects. As one of the largest code hosts in the open source world, GitHub has attracted millions of developers, who can participate in the development and maintenance of any public project on GitHub that interests them. Therefore, the development teams of popular GitHub projects, just as other OSS projects, often show a hierarchical structure [34]. The core developers build the codebase and oversee the OSS project design, peripheral participants make more efforts to submit patches, and active
users undertake supporting activities (such as providing use-cases and 3.3.2. Qualitative analysis
testing new releases) rather than contribute code. According to the To understand the roles of the members from different OSS commu-
characteristics of the influencers identified by Farias et al. [9], we nities for answering RQ2, we manually inspected the comments of the
can infer the relationship between participants and OSS projects from cross-project bug reports.
the social media provided by GitHub. Specifically, we considered the
The studied cross-project bugs involve 4737 comments in total and
relationship between a participant and a specific OSS project from the
following two dimensions. three of the authors of this paper inspected the roles of group members
(1) The participant’s identity with respect to the OSS project played in these comments. The authors focused on two main aspects for
GitHub provides three built-in features that allow participants to each comment: (1) the identity of the commenter, i.e. whether he/she
build contact with OSS projects, that is, watching, forking, and star- is an upstream participant, a downstream participant, or neither; and
ring. ‘‘Watching’’ an OSS project allows a participant to receive no- (2) whether and how this comment contributes to the fixing of the bug.
tifications of events that takes place in the OSS project (e.g., report- In order to identify and categorize the roles of group members in the
ing, commenting, and closing an issue) via the email and GitHub discussion of cross-project bugs, we followed a manual coding process
social media.‘‘Watching’’ implies an interest in the activities of the OSS for performing a qualitative analysis inspired in the grounded theory
project [35] and also a likelihood to contribute to it in the future [17].
procedures [39]. We coded the roles of group members shown in the
‘‘Forking’’ an OSS project means making a copy in the participant’s
comments according to the following steps.
account, which allows him to freely change the code without affecting
the original OSS project. A fork is usually used to propose modifications Firstly, we randomly selected three sets of 30 cross-project bug
to the OSS project or used as a starting point of a new idea [36]. reports and assigned each set to one of the three authors. The authors
‘‘Starring’’ an OSS project indicates to put the OSS project in the first independently reviewed all the comments in the bug reports
participant’s bookmark so that he/she can keep track of it. For a specific assigned to him/her. For each comment, they analyzed the contribution
OSS project, the participants who watch, fork, and star it are called of the commenter made to the bug fixing. During this process, similar
watchers, forkers, and stargazers, respectively. contributions were grouped together to form categories. For example,
Apart from the three types of participants, we also consider the the author might note several comments which appeared to be directed
owner of an OSS project who establishes and is in charge of the OSS at providing individuals’ suggestions on fix commits, then the author
project. The owner may be an organization with a number of members
labeled these as ‘‘Reviewing the bug fix’’. After that, each author got
or an individual.
the individual categorization related to comment contributions.
Generally, for a participant under our consideration, watching an
OSS project indicates his/her attention to the maintenance of the OSS The authors then got together to compare, discuss, and integrate
project, forking an OSS project implies the possibility of making a code their individual categorizations to settle on a basic set of codes. They in-
change, starring an OSS project shows his/her interest and praise to it, tegrated three individual categorizations by merging similar categories
and owning an OSS project signifies his/her rights and responsibilities they identified or splitting when it was the case. For example, two
to manage it. In this study, in order to identify the users’ identities for authors proposed a category of ‘‘Giving suggestions on the fix design’’
answering RQ1.2, we collected all the watchers, forkers, stargazers, and while the other author proposed a category of ‘‘Suggesting fix location’’
owners of the investigated OSS projects by using GitHub API [37]. which is its subcategory, so they integrated these categories as ‘‘Giving
suggestions on the fix design’’. The output of this phase was the basic categories related to the roles of the commenters during the discussion of the cross-project bugs. An overview of the basic codes is presented in Table 2, which lists part of the categories.

After that, the three authors read through all the bug reports and grouped the roles of the commenters according to the basic codes. In particular, we distributed the upstream and downstream bug reports of the 236 cross-project bugs among three of the authors. The reports of each cross-project bug were inspected by two authors. Each of the involved authors independently categorized the participants’ roles by understanding their comments and set aside the comments that could not be categorized for a later discussion.

Then, the authors double-checked the consistency of their individual categorizations. The categorizations resulted in a Cohen’s Kappa score of 0.9, which indicates strong agreement. They then discussed their disagreements to reach a common decision. The remaining differences in six comments were resolved by tie-breaking votes from the third author. They also identified the categories for the unlabeled comments. In particular, the basic codes were refined by allowing new codes to be added to reveal additional categories. The codes were iteratively discussed until no new categories of roles in the comments were found. During the discussion, eight new categories were created for the remaining 20 comments that could not be grouped into the basic categories.

Finally, according to the coding result, we can summarize the general work undertaken by the participants from different OSS communities in bug fixing activities, which reveals the collaboration between participants of the upstream library and the affected downstream OSS project(s).

(2) The participant’s contribution to the OSS project
Apart from the identity of a participant, we also considered his/her contribution, i.e., the code changes, to the OSS project. In an OSS project, the numbers of watchers, forkers, and stargazers may be large. For example, Numpy has 559 watchers, 16.4 k forkers, and 5.3 k stargazers. In addition, its owner, the Numpy organization, has 31 members. Of all these people, 1063 have committed code changes (including documentation changes) to Numpy. However, more than half of the contributors (572) committed code changes to the repository only once, and only 179 contributors have committed more than five times. This means that most contributors only occasionally contributed to Numpy.

We analyzed the participants’ contributions to the investigated OSS projects. Due to limited space, we only list the data of seven central OSS projects in the scientific Python ecosystem (i.e., Astropy, Ipython, Matplotlib, Numpy, Pandas, Scikit-learn, and Scipy) in Table 1. The table shows that only seven to 19 contributors have pushed more than one percent of the total commits to each OSS project, accounting for 62.6% to 85.2% of the total commit contributions. This finding also confirms that a small percentage of participants have made most of the code changes in an OSS project [38]. In this study, those who have contributed more than one percent of an OSS project’s total commits are identified as its activists, since they participate in the OSS project’s maintenance activities frequently. To answer RQ1.3, we identified the activists of an investigated OSS project by mining its commit history through the GitHub API [37].

In summary, we inferred the relationship between a participant and an OSS project based on his/her identity and contribution with respect to the OSS project: we identified watchers, forkers, stargazers, and owners through the former dimension, and activists through the latter dimension. This information is analyzed in order to study the members of the work group for RQ1.
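As an illustration only, the one-percent rule used above to identify activists can be sketched in a few lines. The function name `find_activists`, the author names, and the commit counts are hypothetical; the study mined real commit histories through the GitHub API, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def find_activists(commit_authors, threshold=0.01):
    """Return {author: share of commits} for authors above `threshold`.

    `commit_authors` holds one author name per commit. Matching by name is
    an assumption of this sketch and misses developers who were renamed.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    return {
        author: count / total
        for author, count in counts.items()
        if count / total > threshold
    }


# Hypothetical commit history of a small project (not data from the study).
history = ["alice"] * 150 + ["bob"] * 49 + ["carol"] * 1
activists = find_activists(history)
activist_share = sum(activists.values())

print(sorted(activists))         # authors above the one-percent threshold
print(round(activist_share, 3))  # share of all commits made by activists
```

The summed share corresponds to the kind of figure reported in the last column of Table 1, under the assumption that each commit is attributed to exactly one author.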


Table 1
Statistics of developer contributions to the OSS projects.

               #commits of contributors                                  #contributors   #commits of contributors
OSS project    Min.  1st Qu.  Med.  Avg.  3rd Qu.  Max.   Total          all     >1%     committing >1%
Astropy        1     1        4     79    10       4316   29,672         377     17      24,685 (85.2%)
Ipython        1     1        2     33    5        5617   22,901         699     7       18,226 (79.6%)
Matplotlib     1     1        2     36    5        5430   36,937         1031    17      28,898 (78.2%)
Numpy          1     1        1     23    3        4886   24,097         1063    17      18,081 (75.0%)
Pandas         1     1        1     11    2        4730   24,123         2244    15      16,081 (66.7%)
Scikit-learn   1     1        1     13    2        2331   24,148         1900    17      15,122 (62.6%)
Scipy          1     1        2     23    5        3093   22,271         952     19      14,541 (65.2%)

Min. and Max. columns show the minimal and maximal number of commits that the contributors have proposed to the OSS project, respectively. 1st Qu., Med., and 3rd Qu. columns
show the quantiles of the numbers of the commits that its contributors have proposed. Avg. means the average number of commits that a contributor has proposed. Total means
the total number of the commits in the OSS project. all and >1% columns show the number of the total contributors in the OSS project and the number of the contributors who
have contributed more than 1% of the total commits in the OSS project, respectively. The last column shows the number of commits of active contributors who have contributed
more than 1% of the total commits in the OSS project.
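The agreement check reported for the qualitative coding (a Cohen's Kappa of 0.9) can be illustrated with a minimal sketch of the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the two label lists below are hypothetical and are not the authors' categorizations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten comments, using two categories from Table 2.
rater_1 = ["Confirming the bug"] * 5 + ["Rejecting the bug"] * 5
rater_2 = ["Confirming the bug"] * 4 + ["Rejecting the bug"] * 6
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

In this toy case the raters disagree on one of ten comments, so the observed agreement is 0.9 while the chance agreement is 0.5.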

Table 2
An overview of the basic codes relating to the roles of the participants.

Comment phase               # Categories   Part of the categories
Discussion on the bug       16             ‘‘Describing the error/exception’’
                                           ‘‘Describing the context’’
                                           ‘‘Asking the details of the bug’’
                                           ‘‘Confirming the bug’’
                                           ‘‘Rejecting the bug’’
Discussion on fix design    18             ‘‘Providing test cases’’
                                           ‘‘Linking to another project’’
                                           ‘‘Linking to another bug’’
                                           ‘‘Giving suggestions on the fix design’’
                                           ‘‘Providing temporary solutions’’
Discussion on fix release   9              ‘‘Reviewing the bug fix’’
                                           ‘‘Confirming the bug fix’’
                                           ‘‘Updating the bug fix’’
                                           ‘‘Planning the fix release’’
                                           ‘‘Informing the fix release’’
Communication               11             ‘‘Closing the discussion’’
                                           ‘‘Agreeing/Disagreeing with others’’
                                           ‘‘Informing other developers’’
                                           ‘‘Failing to follow up’’
                                           ‘‘Showing interests or concerns’’

3.4. Threats to validity

This subsection discusses some of the threats that could affect the validity of our study results.

Threats to external validity. In this study, the main external threat is related to dataset selection. The dataset we used comes from the cross-project bugs provided by the previous research [33]. Moreover, these bugs are all from the well-known scientific Python ecosystem. The study results may be biased by the selection of these subjects. For example, the cross-project bug fixing process in other OSSECOs may differ in the observed work group activities. In this study, the analyzed OSSECO involves 91 OSS projects. Generally speaking, our subject projects are popular (4055 GitHub users starred the OSS project on average), are still actively maintained (the OSS project has 343 commits over the past year on average), and have attracted a large number of contributors (the OSS project has 194 contributors and eight activists on average). In future work, the study could be repeated across different subjects so that the understanding of work group collaboration in fixing cross-project bugs can be calibrated.

Threats to construct validity. The threats to construct validity mainly concern measurement errors. First, in order to identify a participant’s identity with respect to one OSS project, we checked the current lists of owners, watchers, forkers, and stargazers recorded on GitHub. However, the lists change with the evolution of the OSS project. For example, a participant may withdraw from the owner organization of one OSS project and start to fork another OSS project to become its contributor. The assumption is that participants rarely stop owning, watching, forking, or starring an OSS project. Second, when mining the evolution history of an OSS project to determine the contributors of commits, we matched on the participants’ names. If a participant changed his/her name in the code evolution history, our matching process might cause false negatives. One such case was found in this study, i.e., a developer named ‘‘pv’’ was renamed.

Threats to internal validity. In our qualitative analysis, we manually inspected all of the cross-project bug reports. Understanding a given comment in these bug reports is essentially subjective, and this subjectivity is unavoidable. To alleviate this threat, three of the authors individually inspected the bug report comments and summarized their observations, and then got together to integrate their findings and ideas. However, it is not possible to completely eliminate the influence of their preconceptions. We intend to enlarge the dataset and to invite skilled experts to correct or confirm our conclusions in the future.

An additional threat lies in the identification of work groups. In our study, the members of a work group consist of the users who left a comment in the cross-project bug report or in its related downstream one(s). During our qualitative analysis, we found that a few members only left a random message in the bug report and made little contribution to the bug fixing. This finding motivates future work on the statistical evaluation of a member’s contribution in the work group.

4. Composition of a work group

In this section, we report the result of the first research question: what does a work group look like?

4.1. (RQ1.1) How many members are there in a work group?

Table 3 shows the distribution of the sizes of work groups, that is, the number of members in a group. For the 236 work groups in the scientific Python ecosystem, the sizes range from two to 31, with an average of six members. Fifty percent of the investigated bugs were collaboratively fixed by four to eight participants.

In particular, the work group for fixing ipython/ipython#62⁵ involved the largest number of participants. It is a complex bug that was submitted on May 10th, 2010 and has undergone a repeated process of closing and reopening. The bug remains open, with more downstream OSS projects claiming to be affected and an increasing number of participants engaging in the fixing. There are five other bugs with a work group of 15 or more members. These bugs either have severe impact on the downstream OSS projects or are difficult to fix, so that a relatively larger number of participants took part in the discussion of repairing them.

⁵ https://github.com/ipython/ipython/issues/62.

Finding 1 Fifty percent of the investigated cross-project bugs were fixed by the cooperation of four to eight participants.

Finding 2 Fixing cross-project bugs that have severe impact on the downstream OSS projects needs many more participants.
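For illustration, the summary statistics of the kind reported for group sizes in Table 3 (minimum, quartiles, maximum, average, and standard deviation) can be computed as sketched below. The list of sizes is hypothetical, and both the inclusive quartile method and the sample standard deviation are assumptions of this sketch; the paper does not state which definitions were used.

```python
import statistics


def summarize_group_sizes(sizes):
    """Summarize work group sizes with the columns used in Table 3."""
    # Quartiles via the inclusive method (an assumption; other definitions
    # of quantiles can give slightly different values).
    q1, median, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
    return {
        "min": min(sizes),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(sizes),
        "mean": statistics.mean(sizes),
        "sd": statistics.stdev(sizes),  # sample standard deviation
    }


# Hypothetical group sizes, one entry per work group.
sizes = [2, 4, 6, 8, 10]
summary = summarize_group_sizes(sizes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The interquartile range (`q1` to `q3`) is what underlies a statement such as "fifty percent of the bugs were fixed by four to eight participants".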


Fig. 2. The members of work groups. • The boxplots describe the percentage distribution of owners/watchers/forkers/stargazers/others and the percentage distribution of the
activists, with respect to the upstream OSS project, downstream OSS project(s), and both OSS projects within a work group. • A boxplot presents the median (the horizontal line
within the box), the 25th, and the 75th percentiles (the lower and upper sides of the box), as well as the mean (the × within the box) of the distribution. The figure beside a
boxplot indicates the mean value.

Table 3
The distribution of group sizes.

             Min.   1st Qu.   Med.   3rd Qu.   Max.   Avg.   Sd.
Group size   2      4         6      8         31     6      3.7

Min. and Max. columns show the minimal and maximal sizes of work groups, respectively. 1st Qu., Med., and 3rd Qu. columns show the quantiles of the group sizes. Avg. and Sd. columns show the average group size and the standard deviation, respectively.

4.2. (RQ1.2) What are the members’ identities?

Fig. 2(a) illustrates the distribution of owners, watchers, forkers, stargazers, and others with respect to the upstream OSS project, at least one downstream OSS project, and both the upstream and downstream OSS projects in a work group.

In general, work group members are more concerned with the upstream OSS projects. On average, more members in the group watched, forked, and starred the upstream OSS project than the downstream OSS projects, accounting for 30%, 69%, and 51% of the members, respectively. In addition, there is a similar number of members who are the owners of the upstream OSS project and the owners of the downstream OSS projects in a work group.

Notably, most group members forked the upstream OSS projects or the downstream OSS projects. In 25% of the investigated work groups, all their members forked the upstream OSS projects, and most of them are also downstream forkers. Specifically, 31% of group members forked both types of OSS projects on average. Since most participants fork an OSS project in order to contribute to it, it is reasonable to infer that a large percentage of the group members have submitted pull requests to modify the upstream code.

In addition, on average one participant of the work group, accounting for 9% of group members, has never established a connection with any of the OSS projects; we consider such a participant an outer user from an outer OSS community. However, in our investigation, 148 work groups (accounting for 63% of all work groups) do not include an outer user. In most cases, all the members in a work group are concerned with at least one involved OSS project.

Finding 3 More group members are concerned with the upstream OSS projects than with the downstream OSS projects. Specifically, most of these group members fork the upstream OSS projects and downstream OSS projects to make contributions to them.

Finding 4 There are few outer users from other OSS communities in a work group.

4.3. (RQ1.3) What are the members’ contributions?

Fig. 2(b) shows the distribution of the activists (the core developers) from the upstream OSS project, at least one downstream OSS project, and both the upstream and downstream OSS projects in a work group.

On average, two activists of the upstream OSS project and two activists of the downstream OSS project(s) collaborate to fix the cross-project bug, accounting for 39% and 38% of a work group, respectively. Since the core development team of each side has the common goal of being relieved from the bug, it is their responsibility to keep an eye on the progress of the fixing process.

Meanwhile, it is not necessary to include an activist of both the upstream and downstream OSS projects in a work group, as on average only 7% of the group members are core contributors of both sides. In the 236 work groups under investigation, we find that each work group contains fewer than four bilateral activists. This reveals that few participants can make active contributions to both of the involved OSS projects at the same time.

Finding 5 In a work group, approximately 2 core contributors of the upstream OSS project and 2 core contributors of the downstream OSS project(s) collaborate to fix the cross-project bug.

Finding 6 There are very few activists of both the upstream and downstream OSS projects in a work group.

Summary: A typical work group consists of four to eight members, 39% and 38% of whom come from the core development teams of the upstream and downstream sides, respectively. More group members are concerned with the upstream OSS projects than with the downstream OSS projects, and few of them can make active contributions to both sides.
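The per-group shares of upstream, downstream, and bilateral activists discussed for RQ1.3 amount to simple set operations. The sketch below is illustrative only: the function name, the member names, and the choice to count bilateral activists in both the upstream and the downstream share are assumptions, not details confirmed by the paper.

```python
def activist_shares(members, upstream_activists, downstream_activists):
    """Return the share of a work group that are activists of each side."""
    group = set(members)
    if not group:
        raise ValueError("a work group needs at least one member")
    upstream = group & set(upstream_activists)
    downstream = group & set(downstream_activists)
    bilateral = upstream & downstream  # activists of both sides
    size = len(group)
    return {
        "upstream": len(upstream) / size,
        "downstream": len(downstream) / size,
        "bilateral": len(bilateral) / size,
    }


# Hypothetical work group of five commenters on one cross-project bug.
group = ["ann", "ben", "cai", "dee", "eli"]
shares = activist_shares(
    group,
    upstream_activists={"ann", "ben", "zoe"},   # zoe did not comment
    downstream_activists={"ben", "cai"},
)
print(shares)
```

Averaging such shares over all work groups would give figures comparable in kind to the 39%, 38%, and 7% reported above.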


Fig. 3. The identities and contributions of the reporters, restorers, and linkers. • (a), (b), and (c) show the percentages of owners/watchers/forkers/stargazers/others with respect to
the upstream OSS project, downstream OSS project(s), and both OSS projects among the reporters, restorers, and linkers, respectively. (d) shows the upstream/downstream/bilateral
activists among the reporters, restorers, and linkers. (e) shows the distributions of contributions of the reporters, restorers, and linkers to the upstream and downstream OSS projects.
• A boxplot presents the median (the horizontal line within the box), the 25th, and the 75th percentiles (the lower and upper sides of the box), as well as the mean (the × within
the box) of the percentage of commits a participant has contributed to the total commits of the OSS project.

5. Roles of group members

This section reports the result of our second research question: what is the role of each member in a work group?

5.1. (RQ2.1) Which OSS communities are the reporter, restorer, and linker from?

Figs. 3(a), 3(b), and 3(c) show the identities of the reporters, restorers, and linkers within the work groups, respectively. Fig. 3(d) displays the percentage of the upstream/downstream/bilateral activists among the reporters, restorers, and linkers. Fig. 3(e) describes the distribution of contributions of the reporters, restorers, and linkers to either OSS project.

Reporters. From Fig. 3(a), we see that among the 236 cross-project bug reporters, the upstream forkers slightly outnumber the downstream forkers, but more of the reporters own, watch, or star the downstream OSS projects than the upstream OSS projects. Fig. 3(d) shows that nearly half of the reporters are downstream activists, many more than the upstream activists, who account for 27% of the reporters. The result indicates that more cross-project bugs were first found by the downstream OSS communities than by the upstream OSS community.

Restorers. As seen in Fig. 3(b), more restorers come from the upstream side, since significantly more of them own, watch, fork, or star the upstream OSS projects than the downstream OSS projects. Specifically, a high percentage (90%) of the restorers had forked the upstream OSS projects before the reporting of the bugs that they repaired. Fig. 3(d) shows that 65% of the cross-project bugs were repaired by the upstream activists. This is not surprising, since they are the main maintainers of their OSS projects and the experts who know best how to fix the bugs.

Linkers. Fig. 3(c) shows a distinct result from Fig. 3(b). The linkers have a closer relationship with the downstream OSS projects. Specifically, half of the linkers are downstream owners. Fig. 3(d) shows that 63% of the linkers are downstream activists. Linkers are those who first found that the bug affected another OSS project. The result indicates that in most cases the cross-project impact of a bug is found by the downstream OSS communities, especially by the core development team.

Furthermore, we used the Wilcoxon signed rank test [40] to compare the contributions that the reporters/restorers/linkers make to the upstream and downstream OSS projects. The Wilcoxon signed rank test is a non-parametric statistical hypothesis test that compares two paired samples to assess whether their population mean ranks differ. In this study, we test at the significance level of 0.001 to investigate whether the number of commits contributed by the reporters/restorers/linkers significantly differs between the upstream and downstream OSS projects. Table 4 reports the result of the Wilcoxon signed rank test. It shows that the reporters, restorers, and linkers of cross-project bugs all make significantly different contributions to the upstream and downstream OSS projects. Combined with Fig. 3(e), we conclude that reporters and linkers contribute significantly more to the downstream OSS projects, while the restorers contribute more to the upstream OSS projects.

Finding 7 More cross-project bugs are first reported by the downstream OSS communities but are finally repaired by the upstream activists.

Finding 8 About half of the studied cross-project bugs are linked by the downstream owners and activists.
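To make the statistical comparison concrete, the following is a minimal pure-Python sketch of a two-sided Wilcoxon signed rank test on paired observations (for example, one participant's commit counts in the upstream and in the downstream project). It uses the normal approximation without tie or continuity corrections, which is an assumption of this sketch; the paper does not say which implementation was used, and the sample values are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed rank test (normal approximation).

    Returns (W, p_value), where W is the smaller of the positive and
    negative rank sums. Zero differences are discarded.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis W is approximately normal for larger n.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, min(p_value, 1.0)


# Hypothetical commit counts of twelve reporters (not data from the study).
upstream = [0, 1, 0, 2, 0, 1, 3, 0, 1, 0, 2, 1]
downstream = [12, 30, 8, 25, 40, 5, 18, 22, 9, 14, 27, 11]
w_stat, p_value = wilcoxon_signed_rank(upstream, downstream)
print(w_stat, p_value)
```

For small samples or heavily tied data, an exact or corrected implementation from a statistics library would be the safer choice.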


Table 4
Result of Wilcoxon test which compares members’ contributions to the upstream and downstream OSS projects.

                                    Reporters   Restorers   Linkers
Wilcoxon signed rank test p-value   1.94e-4     4.75e-11    1.86e-11

5.2. (RQ2.2) What work do the members of different OSS communities undertake?

Obviously, fixing cross-project bugs needs the collaboration between participants of multiple OSS communities, especially the upstream library and the affected downstream OSS project(s). However, do they undertake the same work during the process? If not, what specific contribution do they make to fixing bugs? We rely on the manual inspection of cross-project bug reports to address this question. In the following, we report the main roles of the downstream participants and the upstream participants in fixing cross-project bugs, respectively, according to our manual inspection.

(1) Roles of the downstream participants

Narrowing down the bug to the upstream side (found in 372 bug reports). During our inspection, we found that multiple cross-project bugs were first reported to the downstream OSS projects, either by the end-users because they encountered problems when they were using the downstream OSS projects, or by the downstream developers when they built their own OSS projects. The participants tracked the root causes of the failures and finally located the error in the OSS projects that they were depending on. They then told the upstream side by submitting a bug report. During the tracking process, the downstream code to some extent acted as the testing field of the upstream library. An Astropy activist said in the bug report of its upstream OSS project Numpy (numpy/numpy#5962⁶), ‘‘Bug found by testing astropy – see astropy/astropy#3848. It seems our test suite checks even the dark corners of numpy!’’

However, the tracking process was not so easy even though the downstream members had already known that the problem might be related to a specific upstream OSS project. As Ma et al. [33] suggested, an important and difficult issue during this stage is to determine whether the unintended behavior is a feature of the upstream OSS project or indeed a bug. An example was shown in a downstream bug report astropy/astropy#1031,⁷ which recorded that something was wrong when depending on the OSS project SOFA. The reporter, one of the owners of the downstream OSS project Astropy, commented ‘‘This needs to be brought up to the SOFA developers to understand if this is a bug or expected feature of a non-uniform time scale.’’ Another similar case is astropy/astropy#1156,⁸ where the developers discussed an unexpected behavior related to the upstream OSS project Pandas. They could not decide whether they themselves or Pandas should be responsible for the problem. The discussion lasted for 680 days before they reported it to Pandas, which certainly delayed the repair of the bug. These examples conversely confirm the importance of prompt communication with the suspected upstream OSS projects during the tracking process [33].

Providing a minimum bug case (found in 256 bug reports). While inspecting the bug comments, we found that nearly all the affected downstream OSS projects would provide a reproducible case to demonstrate the bug. A bug case is the most direct and trustworthy way to tell the upstream members how the bug occurred. Furthermore, a minimum case independent of the other OSS projects is preferable for the downstream participants to illustrate that the bug was definitely caused by an upstream error. They would not even bother the upstream participants until they could provide a case. As the owner of the OSS project Chaco said in enthought/chaco#213,⁹ ‘‘I have found it difficult to extract a case that reliably crashes in all circumstances, so I find it difficult to report it upstream.’’ Similarly, the upstream participants highlighted the significance of a reproducible case, especially when, as a participant said, ‘‘A regression is only visible by running code from the downstream project’’. Therefore, a bug case or a bash script that does the same thing would be greatly appreciated. It helps the upstream participants to ensure that their workflows are exactly the same so that they do not waste a lot of time.

A case is at least supposed to include the buggy version, the usage triggering the bug, and the wrong and expected outputs. Sometimes the information on the platform, such as Windows, Linux, or Mac OS, is also needed. In short, the case is required to be concise: it should clearly describe and reproduce the bug and omit irrelevant information. A makeshift case was provided by the reporter of Pytables/Pytables#319.¹⁰ The upstream participants felt confused by the reported bug, and thus asked ‘‘Is there any way you can provide a simple script which generates an error that fails?’’ However, it is challenging to extract a satisfactory case to reproduce a cross-project bug, especially a complex one, which also indicates a need for supporting tools that automatically extract minimum test cases from the downstream code for the upstream OSS projects.

Giving suggestions on the fix design (found in 238 bug reports). As the direct victims of the cross-project bugs, the downstream participants are in a position to understand when bugs arise and the implementation concerns when fixing them. Benefiting from the detailed knowledge about their OSS projects, the downstream participants could explain how things should work. Therefore, they are able to provide valuable and practical suggestions on how to fix the bug or even to propose a suitable solution. As shown in Section 5.1, 26.9% of the cross-project bugs were repaired by the core developers of the affected downstream OSS projects.

Additionally, a counterexample in scipy/scipy#1547¹¹ also suggests the importance of the opinions from the downstream side. A bug in a Scipy method caused inconsistent results with different forms of input variables. Fixing the bug would change the behavior of the method and thus break the downstream code. Though the bug was finally repaired in a cautious way, the participants from the affected downstream project Nilearn were not satisfied because the upstream participants did not ask them for suggestions. The example indicates that fixing cross-project bugs, especially those involving breaking changes or backwards incompatibility, needs the opinions from the downstream OSS communities.

However, in spite of the willingness to fix cross-project bugs, it is difficult for the downstream members to give advice or submit a patch if they lack the expertise of the upstream OSS project [33]. Therefore, we are not surprised to find that a large percentage of the downstream restorers are also keeping an eye on or even contributing much to the upstream OSS projects.

Urging the upstream side to make progress (found in 73 bug reports). Downstream members commented on the bug reports to express their requirement for a quick fix, especially when there was no effective feedback for a long time. For example, in Pytables/Pytables#319,¹² a downstream participant asked ‘‘Has there been any progress with this? We have a couple of reproducible examples and nasty workarounds are required to avoid using the index which is seemingly unreliable’’.

⁶ https://github.com/numpy/numpy/pull/5962.
⁷ https://github.com/astropy/astropy/issues/1031.
⁸ https://github.com/astropy/astropy/issues/1156.
⁹ https://github.com/enthought/chaco/pull/213.
¹⁰ https://github.com/PyTables/PyTables/issues/319.
¹¹ https://github.com/scipy/scipy/issues/1547.
¹² https://github.com/PyTables/PyTables/issues/319#issuecomment-55531662.

Confirming and informing the fix (found in 368 bug reports). After the fix was committed and merged into the upstream code, the


downstream members usually checked whether their own OSS projects became immune to the bug with the patched upstream branch. If so, they closed the issue report submitted to their side to inform their users and developers. Examples are shown in astropy/astropy#3848¹³ and pandas-dev/pandas#12390.¹⁴

Finding 9 In order to fix cross-project bugs quickly, downstream participants communicate with upstream participants actively by providing test cases, giving suggestions on the fix design, and urging them to make progress.

Finding 10 Downstream participants face many difficulties in fixing a cross-project bug, including tracking the root cause of the failure, providing a concise and brief test case, and a lack of expertise in the upstream OSS project.

(2) Roles of the upstream participants

Debugging the bug (found in 252 bug reports). For a cross-project bug which is caused by an error in the upstream OSS project, the upstream participants have the responsibility to find the root cause and decide how to fix it. With the information provided by the reporter, such as the test triggering the bug, the upstream developers reproduce and debug it. During the process, they first diagnose the bug to find the design flaw resulting in the bug, sometimes even pinpoint which commit introduced it (e.g., numpy/numpy#6467¹⁵), and check which versions are affected. After finding the root cause, they discuss and decide the best way to fix the bug.

Reviewing the bug fix (found in 354 bug reports). GitHub supports pull-based development. To fix a bug of an OSS project, the restorer usually first forks the OSS project and makes the changes. A pull request is then submitted to the OSS project to indicate the local branch to be merged into the original OSS project. Finally, the core team of the OSS project decides whether to integrate the bug fix into their repository. Before the integration, a critical step is to review the fix, which, as we find from the studied cross-project bugs, is primarily done by the core developers of the upstream OSS projects.

This finding is unsurprising and to some extent consistent with the result from Gousios et al. [41]. In their study, half of their surveyed integrators suggested that the OSS project’s community actively reviews the code changes. That is possibly because the upstream participants are more familiar and concerned with their own projects. Therefore, they are more responsible and able to check whether the fix is suitable.

During the review, the upstream participants may leave comments on the pull request to propose suggestions or requirements, such as numpy/numpy#7258.¹⁶ They may also comment on specific lines of code that the restorers have changed, as in numpy/numpy#4804.¹⁷ The focus of the reviewers mainly falls on three aspects. First and most important, they examine whether the changes to the code repair the bug. An example is astropy/astropy#2783,¹⁸ where the reviewer did not think the submitted patch fixed the bug, so he/she provided some suggestions. Second, they check whether the proposed patch interferes with other parts of their code or blocks other issues. For example, in numpy/numpy#5078,¹⁹ the OSS project was not built successfully with the pull request. The reviewer analyzed the cause and said ‘‘Wonder if all those pointers should have been char * to be consistent with the rest of numpy.’’ Third, they care about whether the pull request includes a test case. In astropy/astropy#4117,²⁰ the reviewer said ‘‘Bug fixes should generally include a regression test that demonstrates the issue.’’

Usually, the bug fix is reviewed by only one developer, except for complex bugs whose fixes involve heavy changes. The pull request numpy/numpy#5086²¹ is one such example. Two files were modified to fix the bug with 54 lines added and 2 lines deleted, and three Numpy owners participated in reviewing the patch. In this way, they not only checked the fix but also discussed the best way to repair the bug with the restorer.

Planning the bug-fix release (found in 299 bug reports). Usually, merging a fix into the upstream source code is not the end of the bug’s impact on the downstream OSS projects. Only when the upstream side publishes a bug-fix release (i.e., a new version including the patch) can the downstream OSS projects and the end-users get rid of the bug. The upstream developers often choose to include the patch in the next version as long as there is enough time to fix the bug. A common practice to indicate the bug-fix version is to set the ‘‘Milestone’’ field in the bug report. For example, one of the IPython owners tagged ipython/ipython#6262²² with the 3.0 milestone to tell others that they planned to include the fix of the bug in version 3.0. Similar examples are pandas-dev/pandas#5652²³ and matplotlib/matplotlib#3574.²⁴ Some other developers may show the release information in a comment. For example, in numpy/numpy#4063,²⁵ an activist of Numpy said ‘‘I also think we accumulated enough fixes to warrant a 1.8.1 release if we add this and the C99 windows fix.’’

Undoubtedly, it is the upstream’s responsibility to decide which release should include the bug fix and when to publish the release, which, however, may sometimes cause friction between the upstream and downstream sides. One example is shown in numpy/numpy#2969.²⁶ A Numpy bug that affected the OSS project Scikit-learn was fixed long before, but the fix was not included in the upcoming version (1.7.0) of Numpy. The reason is that no one had reminded the upstream participants to include the fix in the last release candidate (RC) and they would not ‘‘put non-RC stuff into 1.7.0’’ because of their principle of ‘‘keeping the final identical to the last RC’’. Release candidates are used as the biggest test to collect feedback from users and to avoid introducing new bugs before releasing a new version. With no more time to publish a new RC due to the nine-month delay of the 1.7.0 release, the upstream developers promised to include the patch in 1.7.1.

Providing temporary solutions (found in 86 bug reports). For some cross-project bugs which need a long time to fix, the upstream participants might provide temporary solutions for the downstream OSS projects to deal with the bugs. They often offered, as an alternative, another method in their OSS project which fulfills a similar functionality.

Finding 11 Upstream members take responsibility for bug-fix work. They debug the bug, review the bug fix, and publish a fix release.

¹³ https://github.com/astropy/astropy/issues/3848#issuecomment-113811036.
¹⁴ https://github.com/pandas-dev/pandas/issues/12390#issuecomment-186841876.
¹⁵ https://github.com/numpy/numpy/issues/6467#issuecomment-147768314.
¹⁶ https://github.com/numpy/numpy/pull/7285.
¹⁷ https://github.com/numpy/numpy/pull/4804.
¹⁸ https://github.com/astropy/astropy/pull/2783.
¹⁹ https://github.com/numpy/numpy/pull/5087.
²⁰ https://github.com/astropy/astropy/pull/4117#issuecomment-137806225.
²¹ https://github.com/numpy/numpy/pull/5086.
²² https://github.com/ipython/ipython/issues/6262.
²³ https://github.com/pandas-dev/pandas/issues/5652.
²⁴ https://github.com/matplotlib/matplotlib/pull/3574.
²⁵ https://github.com/numpy/numpy/issues/4063#issuecomment-34088966.
²⁶ https://github.com/numpy/numpy/issues/2969.

Summary: The upstream and downstream participants play

distinct roles within the work groups. In most cases, downstream OSS communities reported cross-project bugs first. However, they faced difficulties in fixing them and thus communicated actively with upstream participants in order to get a quick fix. Upstream OSS communities take responsibility for the entire process of bug-fix work.

6. Discussions

The research presented in this paper has a number of implications for both research and practice. From our results we formulate some recommendations that could help researchers, developers, practitioners, and anyone else involved in the development and evolution (especially fixing cross-project bugs) of OSSECOs such as the scientific Python ecosystem.

6.1. Recommendations to researchers

In our study, we have discovered that the members of a work group play various roles in fixing cross-project bugs, including the reporter/linker/restorer of the bug and the watchers/forkers/stargazers/owners/activists of the upstream or downstream OSS projects. More research should be conducted to further explore the behaviors of certain roles of participants during the evolution of OSSECOs. For example, Ma et al. [33] monitored the practices of reporters and linkers and found that tracking the root cause of cross-project bugs is difficult. As one explanation, our study results showed that a work group contains very few activists of both the upstream and downstream OSS projects, so tracking the root cause requires collaboration among several members of the work group. In addition, they inspected the practices of downstream OSS developers, while our study complements theirs with the practices of upstream OSS developers to understand their collaboration. The practices of other roles of participants should also be analyzed to better understand the evolution of OSSECOs.

Based on the practices of different roles of participants in OSSECOs, one future research direction is evaluating the influence or efforts of each role of participants in a work group to find who plays the most important role in cross-project bug fixing activities. Farias et al. [9] investigated the sense of influence from two SECO actors’ roles and from multiple characteristics. Based on the influencers’ characteristics summarized from their study, a new study can be conducted to explore which roles of participants in a work group have the most influence on the fixing process, which helps contributors understand how an OSSECO evolves based on influencers’ actions.

Furthermore, since our study results show that fixing a cross-project bug requires complex collaboration among different members of a work group, more research is needed to propose effective techniques for automatically detecting bugs in OSSECOs. It is more challenging for developers to detect bugs in OSSECOs than in an isolated project. On the one hand, testing a module in a single project only needs to consider the dependencies within the project, while in the context of OSSECOs, developers also need to generate test cases for the usage of this module in the downstream OSS projects. On the other hand, once an error occurs when testing the OSSECO, it is difficult for developers to determine the root cause of the bug, including which OSS project the bug lies in, in which contexts of the upstream and downstream OSS projects the error occurs, and how many OSS projects would be affected by the bug. All these challenges motivate new studies on techniques for automatically detecting cross-project bugs in OSSECOs.

6.2. Recommendations to developers

Our study of open collaboration in fixing cross-project bugs in OSSECOs benefits the maintenance activities in both the upstream and downstream OSS projects. In particular, the results of this study provide practical recommendations to the developers of both sides.

Developers of upstream OSS projects should pay close attention to the reports from downstream OSS projects, as cross-project bugs could also be found in downstream OSS projects. Our study results reveal that more cross-project bugs were first found by the downstream OSS communities than by the upstream OSS communities themselves. After receiving the report of a cross-project bug, the developers of upstream OSS projects should publish the fix release as soon as possible. Generally, the upstream would decide which release should include the bug fix and when to publish the release. However, according to our qualitative analysis, this causes friction between the upstream and downstream sides in many cases.

We suggest that developers of downstream OSS projects communicate with upstream developers actively and efficiently. Firstly, once they find a cross-project bug, they could provide a minimal reproducible case to demonstrate the bug when reporting it to the upstream. Secondly, they could actively offer suggestions for the fix design. However, our study reveals that it is difficult for them to submit a patch, because they lack expertise in the upstream OSS projects. Therefore, as we discovered in this study, many downstream restorers keep an eye on or even directly contribute to the upstream OSS projects. Thirdly, once upstream developers have put forward a fix plan, downstream developers could review it by checking whether the patch would repair the bug in their downstream OSS projects. Throughout these communication stages, the key is that downstream developers should know the upstream OSS projects very well. As suggested by our study results, it is difficult for developers to determine whether an unintended behavior is a feature of the upstream OSS project or indeed a bug, which delays effective communication with upstream developers.

6.3. Recommendations to practitioners

There are a number of challenges that the community may face with regard to OSSECOs. We believe that practitioners need to improve the following tools to support the evolution of OSSECOs.

Test Case Generation Tools. The results of RQ2 reveal that the upstream OSS participants require the bug case from the downstream side to be concise so that they do not waste time understanding complex contexts, but downstream OSS participants often find it difficult to provide a minimal bug case. Although general test case generation tools are already available, few can produce a concise test case that reproduces the cross-project bug and avoids irrelevant information (e.g., the effects of dependencies on other OSS projects or the effects of the runtime platform). This indicates a need for supporting tools that automatically extract minimal test cases from the downstream code for the upstream OSS projects. Such tools can help upstream developers locate defects in their OSS projects quickly.

Instant Messaging Tools. We have discovered that it is generally a long process for the two sides of participants to solve a cross-project bug. A work group analyzes the bug and proceeds to work on a fix only by commenting on the bug report, which is time-consuming. To facilitate effective communication among the members of a work group, instant messaging tools are needed to provide a faster way for them to communicate. In particular, some additional functionality can be developed specifically for cross-project bug fixing work groups. As a new feature, the tools can automatically send notification messages to the involved participants at certain key points. For example, once the fix commit is released in the upstream OSS project, the tool automatically
informs the original reporter of the cross-project bug and the activists of the downstream OSS projects.

Ecosystem Management Tools. Though both sides of participants value communication with each other, the upstream participants lack knowledge of the effects of a bug on its downstream OSS projects, while the downstream participants have little idea about the upstream release management. This causes conflicts between the upstream and downstream OSS projects. This shortcoming demonstrates the demand for supporting tools to manage maintenance activities in OSSECOs. The main focus of such tools is not the analysis of dependencies between OSS projects, but the impact analysis of a particular cross-project bug. On the one hand, once a cross-project bug is found in the upstream OSS project, the tool analyzes the context of the bug in the upstream code and locates the OSS projects that are affected by the bug (not all the OSS projects that depend on this project). On the other hand, the tool monitors the fix release of the cross-project bug and connects it to the co-changes of code in affected OSS projects.

Task Management Tools. As we have shown in Section 5, participants from different OSS communities play distinct roles within a GitHub work group. The upstream participants are more similar to decision-makers in repairing cross-project bugs, since they decide how to fix the bug and review the patch to determine whether to merge it into their code. The downstream participants act as support staff who provide information, suggestions, and even the final fix. It is reasonable to conclude that the essence of the tasks in OSSECOs determines the roles of the group members. Furthermore, the considerable responsibilities laid on the decision-makers result in a heavy workload. In the studied OSSECO, the core development team of an upstream OSS project, usually fewer than ten participants, has to review a large number of pull requests, which reduces the efficiency of bug fixing. This shows the promise of and need for supporting tools that can automatically assign different tasks, including fix design and review, to the most suitable members of a work group. The development of such tools is challenging as the members of a work group change over time. One alternative solution is assigning a task to more than one participant.

7. Conclusions

Software ecosystems on GitHub offer an open environment for software developers from various OSS communities to collaborate and coordinate with one another. Developers participate in any public event that they are interested in, such as bug fixing, forming temporary work groups. This study aims to investigate the characteristics of collaboration within a work group when fixing cross-project bugs in the ecosystem, focusing on the composition of a work group and the roles of group members. By investigating 236 work groups in cross-project bug fixing, we find that: (1) A typical work group consists of four to eight members from the core development teams of the upstream and downstream sides. More group members are concerned with the upstream OSS projects and few of them can make active contributions to both sides; (2) During the bug reporting, linking, and fixing process, distinct responsibilities are taken by the two OSS communities, with the downstream members as the problem-finders and the upstream members as the decision-makers or gatekeepers.

The study results lead to a better understanding of open collaboration. From our results we highlight the implications of the findings and formulate some recommendations to researchers, developers, and practitioners. In summary, we believe that future work needs to address the following: (1) further research to investigate the impact of collaborative practices; (2) more effective communication between members within a work group; and (3) practical tools that can support participants in the development of software ecosystems hosted on GitHub.

CRediT authorship contribution statement

Zhifei Chen: Conceptualization, Methodology, Writing – review & editing. Wanwangying Ma: Software, Data curation, Writing – original draft. Lin Chen: Supervision, Project administration. Wei Song: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to thank the anonymous reviewers for their insightful feedback and comments. This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61872177, 62172202, and 61761136003).

References

[1] M. Lungu, M. Lanza, T. Gîrba, R. Robbes, The small project observatory: Visualizing software ecosystems, Sci. Comput. Program. 75 (4) (2010) 264–275.
[2] K. Blincoe, F. Harrison, D. Damian, Ecosystems in GitHub and a method for ecosystem identification using reference coupling, in: Proceedings of the 12th Working Conference on Mining Software Repositories, 2015, pp. 202–207.
[3] A. Decan, T. Mens, P. Grosjean, An empirical comparison of dependency network evolution in seven software packaging ecosystems, Empir. Softw. Eng. 24 (1) (2019) 381–416.
[4] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D.M. German, D. Damian, An in-depth study of the promises and perils of mining GitHub, Empir. Softw. Eng. 21 (5) (2015) 2035–2071.
[5] Y. Wang, D. Redmiles, IIAG: A data-driven and theory-inspired approach for advising how to interact with new remote collaborators in OSS teams, Automated Softw. Eng. 28 (2) (2021) 5.
[6] W. Yang, M. Pan, Y. Zhou, Z. Huang, Developer portraying: A quick approach to understanding developers on OSS platforms, Inf. Softw. Technol. 125 (2020) 106336.
[7] S. Zhou, B. Vasilescu, C. Kästner, What the fork: A study of inefficient and efficient forking practices in social coding, in: Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 350–361.
[8] K. Constantino, S. Zhou, M.R. de A. Souza, E. Figueiredo, C. Kästner, Understanding collaborative software development: An interview study, in: Proceedings of the 15th IEEE/ACM International Conference on Global Software Engineering, 2020, pp. 55–65.
[9] V. Farias, I. Wiese, R. Santos, What characterizes an influencer in software ecosystems? IEEE Softw. 36 (1) (2019) 42–47.
[10] E. Lyulina, M. Jahanshahi, Building the collaboration graph of open-source software ecosystem, in: Proceedings of the 18th IEEE/ACM International Conference on Mining Software Repositories, 2021, pp. 618–620.
[11] T. Hou, X. Yao, D. Gong, Community detection in software ecosystem by comprehensively evaluating developer cooperation intensity, Inf. Softw. Technol. 130 (9) (2021) 106451.
[12] K. Manikas, K.M. Hansen, Software ecosystems - A systematic literature review, J. Syst. Softw. 86 (2013) 1294–1306.
[13] D.G. Messerschmitt, C. Szyperski, Software ecosystem: Understanding an indispensable technology and industry, MIT Press Books, 2005, p. 1.
[14] S. Jansen, A. Finkelstein, S. Brinkkemper, A sense of community: A research agenda for software ecosystems, in: Proceedings of the 31st International Conference on Software Engineering - Companion Volume, 2009, pp. 187–190.
[15] J. Bosch, From software product lines to software ecosystems, in: Proceedings of the 13th International Software Product Line Conference, 2009, pp. 111–119.
[16] O. Franco-Bedoya, D. Ameller, D. Costal, X. Franch, Open source software ecosystems: A systematic mapping, Inf. Softw. Technol. 91 (2017) 160–185.
[17] J. Sheoran, K. Blincoe, E. Kalliamvakou, D. Damian, J. Ell, Understanding “watchers” on GitHub, in: Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 336–339.
[18] M. Biazzini, B. Baudry, “May the fork be with you”: Novel metrics to analyze collaboration on GitHub, in: Proceedings of the 5th International Workshop on Emerging Trends In Software Metrics, 2014, pp. 37–43.
[19] Y. Wu, J. Kropczynski, P.C. Shih, J.M. Carroll, Exploring the ecosystem of software developers on GitHub and other platforms, in: Proceedings of the Companion Publication of the 17th ACM Conference on Computer Supported Cooperative Work, 2014, pp. 265–268.
[20] A. Lima, L. Rossi, M. Musolesi, Coding together at scale: GitHub as a collaborative social network, in: Proceedings of the 8th International Conference on Weblogs and Social Media, 2014, pp. 295–304.
[21] H. Borges, M.T. Valente, What’s in a GitHub star? Understanding repository starring practices in a social coding platform, J. Syst. Softw. 146 (2018) 112–129.
[22] F. Thung, T.F. Bissyandé, D. Lo, L. Jiang, Network structure of social coding in GitHub, in: Proceedings of the 17th European Conference on Software Maintenance and Reengineering, 2013, pp. 323–326.
[23] J. Choi, Y. Tausczik, Characteristics of collaboration in the emerging practice of open data analysis, in: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2017, pp. 835–846.
[24] Y.R. Tausczik, A. Kittur, R.E. Kraut, Collaborative problem solving: A study of MathOverflow, in: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work and Social Computing, 2014, pp. 355–367.
[25] A. Kittur, R.E. Kraut, Beyond Wikipedia: Coordination and conflict in online production groups, in: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, 2010, pp. 215–224.
[26] R.F. Oliveira, R.M. de Mello, E. Fernandes, A. Garcia, C. Lucena, Collaborative or individual identification of code smells? On the effectiveness of novice and professional developers, Inf. Softw. Technol. 120 (2020) 106242.
[27] S. Breu, R. Premraj, J. Sillito, T. Zimmermann, Information needs in bug reports: Improving cooperation between developers and users, in: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, 2010, pp. 301–310.
[28] D. Bertram, A. Voida, S. Greenberg, R. Walker, Communication, collaboration, and bugs: The social nature of issue tracking in small, collocated teams, in: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, 2010, pp. 291–300.
[29] K. Crowston, B. Scozzi, Bug fixing practices within free/libre open source software development teams, J. Database Manage. 19 (2) (2008) 1–30.
[30] M. Gandhi, A. Kumar, Y. Desai, S. Agarwal, Studying multifaceted collaboration of OSS developers and its impact on their bug fixing performance, in: Proceedings of the 7th International Workshop on Quantitative Approaches to Software Quality Co-Located With 26th Asia-Pacific Software Engineering Conference, 2019, pp. 37–44.
[31] G. Canfora, L. Cerulo, M. Cimitile, M.D. Penta, Social interactions around cross-system bug fixings: The case of FreeBSD and OpenBSD, in: Proceedings of the 8th Working Conference on Mining Software Repositories, 2011, pp. 143–152.
[32] W. Ma, L. Chen, X. Zhang, Y. Feng, Z. Xu, Z. Chen, Y. Zhou, B. Xu, Impact analysis of cross-project bugs on software ecosystems, in: Proceedings of IEEE/ACM 42nd International Conference on Software Engineering, 2020, pp. 100–111.
[33] W. Ma, L. Chen, X. Zhang, Y. Zhou, B. Xu, How do developers fix cross-project correlated bugs? A case study on the GitHub scientific Python ecosystem, in: Proceedings of 2017 IEEE/ACM 39th International Conference on Software Engineering, 2017, pp. 381–392.
[34] K. Crowston, J. Howison, The social structure of free and open source software development, First Monday 10 (2005) 2.
[35] L. Dabbish, C. Stuart, J. Tsay, J. Herbsleb, Social coding in GitHub: Transparency and collaboration in an open software repository, in: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, 2012, pp. 1277–1286.
[36] J. Jiang, D. Lo, J. He, X. Xia, P.S. Kochhar, L. Zhang, Why and how developers fork what from whom in GitHub, Empir. Softw. Eng. 22 (1) (2017) 547–578.
[37] GitHub Developer – GitHub Developer Guide, Retrieved in March, 2021 from https://developer.github.com/.
[38] A. Mockus, R.T. Fielding, J.D. Herbsleb, Two case studies of open source software development: Apache and Mozilla, ACM Trans. Softw. Eng. Methodol. 11 (3) (2002) 309–346.
[39] J.M. Corbin, A. Strauss, Grounded theory research: Procedures, canons, and evaluative criteria, Qualit. Sociol. 13 (1) (1990) 3–21.
[40] F. Wilcoxon, Individual Comparisons by Ranking Methods, Methodology and distribution, Breakthroughs in statistics, 1992.
[41] G. Gousios, A. Zaidman, M.-A. Storey, A.V. Deursen, Work practices and challenges in pull-based development: The integrator’s perspective, in: Proceedings of the 37th International Conference on Software Engineering, 2015, pp. 358–368.
