
Information and Software Technology 164 (2023) 107319


Warnings: Violation symptoms indicating architecture erosion


Ruiyin Li a,b, Peng Liang a,∗, Paris Avgeriou b

a School of Computer Science, Wuhan University, 430072 Wuhan, China
b Department of Mathematics and Computing Science, University of Groningen, 9747AG Groningen, The Netherlands

ARTICLE INFO

Keywords:
Architecture erosion symptom
Architecture violation
Code review
Code commit

ABSTRACT

Context: As a software system evolves, its architecture tends to degrade, gradually impeding software maintenance and evolution activities and negatively impacting the quality attributes of the system. The main root cause behind the architecture erosion phenomenon derives from violation symptoms (i.e., various architecturally-relevant violations, such as violations of architecture patterns). Previous studies focus on detecting violations in software systems using architecture conformance checking approaches. However, code review comments are also rich sources that may contain extensive discussions regarding architecture violations, while there is a limited understanding of violation symptoms from the viewpoint of developers.
Objective: In this work, we investigated the characteristics of architecture violation symptoms in code review comments from the developers' perspective.
Methods: We employed a set of keywords related to violation symptoms to collect 606 (out of 21,583) code review comments from four popular OSS projects in the OpenStack and Qt communities. We manually analyzed the collected 606 review comments to provide the categories and linguistic patterns of violation symptoms, as well as the ways developers reacted to and addressed them.
Results: Our findings show that: (1) three main categories of violation symptoms are discussed by developers during the code review process; (2) the most frequently used terms for expressing violation symptoms are "inconsistent" and "violate", and the most common linguistic pattern is Problem Discovery; (3) refactoring and removing code are the major measures (90%) taken to tackle violation symptoms, while a few violation symptoms were ignored by developers.
Conclusions: Our findings suggest that the investigation of violation symptoms can help researchers better understand the characteristics of architecture erosion and facilitate development and maintenance activities, and that developers should explicitly manage violation symptoms, not only to address existing architecture violations but also to prevent future violations.

1. Introduction

During software evolution, the implemented architecture tends to increasingly diverge from the intended architecture. The resulting gap between the intended and implemented architectures is defined as architecture erosion [1,2], and has been described using different terms in the literature and practice, such as architectural decay, degradation, and deterioration [1,3]. Architecture erosion can negatively affect quality attributes of software systems, such as maintainability, performance, and modularity [3,4].

Architecture erosion can manifest in a variety of symptoms during the software life cycle; such symptoms indicate that the implemented architecture is moving away from the intended one. In our recent systematic mapping study [1], four categories of architecture erosion symptoms were reported: structural symptoms (e.g., cyclic dependencies), violation symptoms (e.g., violation of the layered pattern), quality symptoms (e.g., high defect rate), and evolution symptoms (e.g., rigidity and brittleness of the software system). Of these four types, violation symptoms are deemed the most critical for practitioners to address, because the accumulation of violation symptoms can render the architecture completely untenable [5]. Violation symptoms of architecture erosion include various types of architecturally-relevant violations in software systems, such as violations of design principles, architecture patterns, decisions, requirements, and modularity [1]. For the sake of brevity, we refer to violation symptoms of architecture erosion as violation symptoms in the rest of this paper.

∗ Corresponding author.
E-mail addresses: ryli_cs@whu.edu.cn (R. Li), liangp@whu.edu.cn (P. Liang), p.avgeriou@rug.nl (P. Avgeriou).

https://doi.org/10.1016/j.infsof.2023.107319
Received 20 September 2022; Received in revised form 19 July 2023; Accepted 3 August 2023
Available online 9 August 2023
0950-5849/© 2023 Elsevier B.V. All rights reserved.

While a handful of temporary violation symptoms might be innocuous for a software system, the accumulation of architecture violations can lead to architecture erosion [2,4,5], severely impacting run-time and design-time qualities. Therefore, identifying and monitoring violation symptoms is crucial to reveal inconsistencies between the implementation and the intended architecture; eventually, this can help to, at least partially, repair architecture erosion [1,5].

Prior studies that focus on erosion symptoms by inspecting source code might miss implicit semantic information, while our work dives into violation symptoms by analyzing textual artifacts, such as code review comments, from the developers' perspective. In contrast to identifying violation symptoms in source code using predefined abstract models [6,7] and rules [8–11], violation symptoms can also be detected by analyzing textual artifacts that contain information related to the system architecture and its design. Violation symptoms can occur at different stages of development and be self-admitted or pointed out by developers [12]. In our previous study [12], we investigated two types of symptoms of architecture erosion in code reviews (i.e., structural and violation symptoms), and found that violation symptoms are the most frequently discussed symptoms in architecturally-relevant issues during code review. Therefore, in this work, we focus on violation symptoms and attempt to categorize them and understand how developers express and address them.

Code review comments include rich textual information about the architecture changes that developers identified and discussed during development [13]. Code reviews are usually used to inspect defects in source code and help improve code quality [14]. Compared to pull requests, which might not provide specific advice for development practice [15], code review comments provide a finer granularity of information for investigating architecture changes and violations from the developers' perspective.

Although valuable information on architecture violations is discussed during code review, there are no studies regarding the categories of violation symptoms that are discussed and admitted by developers, how developers express violation symptoms, and whether and how these symptoms are addressed during development. To this end, we aim at understanding how developers discuss violation symptoms and provide an in-depth investigation into the categories of violation symptoms, as well as the practical measures used to address them. We identified 606 (out of 21,583) code review comments related to architecture violations from four OSS projects in the OpenStack and Qt communities. The main contributions of this work are the following:

• We created a dataset containing violation symptoms of architecture erosion from code review comments, which can be used by the research community for the study of architecture erosion.
• We identified 606 violation symptoms and classified them into three categories with ten subcategories, as well as the ways that developers addressed these categories.
• This is the first study that investigates violation symptoms in textual artifacts (specifically, code review comments) from the perspective of practitioners.
• We identified the linguistic patterns used to express architecture violation symptoms in code review comments.

The paper is organized as follows: Section 2 introduces the background of this study. Section 3 elaborates on the study design. The results of the research questions are presented in Section 4, while their implications are further discussed in Section 5. Section 6 elaborates on the threats to validity. Section 7 reviews the work related to this study. Finally, Section 8 summarizes this work and outlines the directions for future research.

2. Background

In this section, we overview the background of our study regarding code review and architecture erosion with the corresponding erosion symptoms.

2.1. Code review

Code review is the process of inspecting assigned code to identify defects. A methodical code review process can continuously improve the quality of software systems, share development knowledge, and prevent releasing products with unstable and defective code. Code review has become a crucial development activity that is broadly adopted and has largely converged to tool-based code review, which is widely used in both industry and open source communities. In recent years, many code review tools have been provided, such as Meta's Phabricator1, VMware's Review Board2, and Gerrit3. Gerrit is a popular code review platform designed for code review workflows and is used in our selected projects (see Section 3.2). Once a developer submits new code changes (e.g., patches) to Gerrit, the tool creates a page to record all the changes; meanwhile, the developer should write a message to describe the code changes, namely a "commit message". Gerrit conducts a sanity check to verify that the patch is compliant and that the code has no obvious compilation errors. After the submitted patch passes the sanity check, code reviewers manually examine the patch, provide feedback to correct any potential errors, and then give a voting score. Note that code reviewers can comment not only on source code but also on code commits. The review and vote process iterates with the purpose of improving the patch. Finally, the submitted patch is merged into the code repository after passing the integration tests (i.e., without any issues and conflicts).

2.2. Architecture erosion

The sustainability of an architecture depends on its architectural design to ensure the long-term use, efficient maintenance, and appropriate evolution of the architecture in a dynamically changing environment [16]. However, architecture erosion and drift are two essential phenomena threatening architecture sustainability. Architecture erosion happens due to direct violations of the intended architecture, whereas architecture drift occurs due to extensive modifications that are not direct violations but introduce design decisions not included in the intended architecture [2,16].

The architecture erosion phenomenon has been extensively discussed in the past decades and has been described by various terms [1,3], such as architecture decay [17,18], degradation [19], and degeneration [20]. Architecture erosion manifests in a variety of symptoms during development and maintenance. A symptom is a (partial) sign or indicator of the emergence of architecture erosion. According to our recent systematic mapping study [1], erosion symptoms can be classified into four categories: structural symptoms (e.g., cyclic dependencies), violation symptoms (e.g., layering violations), quality symptoms (e.g., high defect rate), and evolution symptoms (e.g., rigidity and brittleness of systems). Previous studies have investigated different symptoms of architecture erosion. Mair et al. [21] proposed a formalization method for the process of repairing eroded architecture by detecting violation symptoms and recommending optimal repair sequences. Le et al. [18,22] regarded architectural smells as structural symptoms and provided metrics to detect instances of architecture erosion by analyzing the detected smells. Bhattacharya et al. [23] developed a model for tracking software evolution by measuring the loss of functionality (as evolution symptoms). Regarding the scope of our work, we focus on the nature of architecture erosion (i.e., violation symptoms) through code review comments, which paves the way towards shedding light on architecture violations from the developers' perspective.

1 https://www.phacility.com/
2 https://www.reviewboard.org/
3 https://www.gerritcodereview.com/


3. Methodology

The goal of this study is formulated by following the Goal-Question-Metric approach [24]: analyze code review comments for the purpose of identification and analysis with respect to violation symptoms of architecture erosion from the point of view of software developers in the context of open source software development.

3.1. Research questions

To achieve our goal, we define three Research Questions (RQs):

RQ1: What categories of violation symptoms do developers discuss?

Rationale: This RQ aims at investigating the categories of violation symptoms that frequently occur during the development process; an example of such a category is violations of architecture patterns. The proposed categories of violation symptoms derived from code review comments can be used by practitioners as guidelines to avoid such violations in practice. For example, certain categories of violation symptoms may be associated with high erosion risks [1] and be regarded as important warnings to developers.

RQ2: How do developers express violation symptoms?

Rationale: Violation symptoms in code review comments are described in natural language, but there is a lack of evidence regarding how developers describe these violation symptoms. Specifically, we are interested in the terms and linguistic patterns4 that developers use to denote violation symptoms. Establishing a list of the terms and linguistic patterns used by practitioners can subsequently provide a basis for the automatic identification of violation symptoms through natural language processing techniques.

RQ3: What practices are used by developers to deal with violation symptoms?

Rationale: We aim at exploring what developers do when they encounter violation symptoms during the development process; this includes whether developers address the violation symptoms and how they do that. The answers to this RQ can help uncover best practices to cope with violation symptoms, and facilitate the development of methods and tools that promote such practices.

3.2. Project selection

To understand the violation symptoms that developers face in practice, we selected four OSS projects from two communities, namely OpenStack and Qt; these projects have been commonly used in previous studies (e.g., [26]) due to their long development history and rich textual artifacts. OpenStack5 is a widely-used open source cloud software platform, on which many organizations (e.g., IBM and Cisco) collaboratively develop applications for cloud computing. Qt6 is a toolkit and cross-platform framework for developing GUIs, and is used by around one million developers to develop products for desktop, embedded, and mobile operating systems.

Both OpenStack and Qt contain a large number of sub-projects, thus we selected two sub-projects from each community: Neutron and Nova from the OpenStack community, and Qt Base and Qt Creator from the Qt community (see Table 1). Neutron (providing networking as a service for interface devices) and Nova (a controller for providing cloud virtual servers) are mainly written in Python; Qt Base (offering the core UI functionality) and Qt Creator (the Qt IDE) are mainly developed in C++. The four selected projects are among the most active projects in the OpenStack and Qt communities, respectively, and they are widely known for the plethora of code review data recorded in the Gerrit code review system [27,28].

3.3. Data collection

An overview of our data collection, labeling, and analysis is shown in Fig. 1. Starting with the process of data collection (the top part of Fig. 1), we first employed Python scripts to mine the code review comments (concerning source code and commits) of the four projects through the REST API7 supported by the Gerrit tool. Then, we organized and stored the collected data in MongoDB. Our goal, as stated at the beginning of this section, is to analyze the violation symptoms that exist in code review comments from developers. Therefore, we removed the review comments that were generated by bots in the Qt community; we noticed that there were no code review comments generated by bots in the OpenStack community. However, manually analyzing the entire history of code review comments of the four projects is prohibitive in terms of both effort and time. Thus, we decided to collect the code reviews of the four projects over the seven years between Jan 2014 and Dec 2020 to guarantee sufficient revisions for long-lived software systems. Finally, we obtained 518,743 code review comments concerning code and 48,113 review comments concerning commit messages from the four projects over these seven years. Each item in our dataset contains the review ID and patch information, including change_id, patch, file_url, line, and message; the message variable includes code review comments concerning source code and commits. All the scripts and the dataset of this work have been made available in the replication package [29].

The collected review comments contain a large number of entries, such as "Done" and "Ditto", which are not related to any discussion of violation symptoms. To effectively collect and locate the code review comments associated with violation symptoms of architecture erosion, we decided to employ a keyword-based search approach. We employed the keywords presented in our previous work [12] (see the coarse-grained keywords in Table 2) and improved the keyword set (see the fine-grained keywords in Table 2) as described below. Specifically, to derive possible and associated synonyms of the keywords in software engineering practice, we adopted the pre-trained word2vec model proposed by Efstathiou et al. [30] for querying semantically similar terms. The authors of [30] trained this model with over 15 GB of textual data from Stack Overflow posts, which contain a plethora of textual expressions and words in the software engineering domain. We utilized this pre-trained word embedding model to query terms similar to the coarse-grained keyword set. For example, we obtained "discrepancy" and "deviation", which are terms similar to "divergence"; the first two authors then discussed together to manually check and remove unrelated and duplicate words, such as "oo", which is related to programming languages rather than architecture violations. The keyword set used to search the code review comments includes both the coarse-grained and the fine-grained keywords listed in Table 2.

In addition, given that the effectiveness of the keyword-based approach highly depends on the set of keywords, we chose the iterative approach proposed by Bosu et al. [31] to further improve the keyword set by adding possible keywords related to the keywords in Table 2. This approach has already been employed in previous studies (e.g., [12,32]) that used keyword-based search in code review data. We implemented this approach in the following steps:

4 Grammatical rules that allow their users to speak properly in a common language [25].
5 https://www.openstack.org/
6 https://www.qt.io/
7 https://gerrit-review.googlesource.com/Documentation/rest-api.html


Fig. 1. An overview of the data collection and analysis process.
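As a concrete illustration of the keyword-based filtering step shown in Fig. 1 and described in Section 3.3, the sketch below matches review comments against a heavily abbreviated version of the Table 2 keyword set. The study itself used NLTK's SnowballStemmer; to keep this sketch dependency-free we substitute a crude suffix-stripping function, so treat the stemming here as a stand-in for, not a reproduction of, the actual procedure.

```python
import re

# Abbreviated keyword set from Table 2; the full study uses many more terms.
KEYWORDS = {"architecture", "architectural", "violate", "violation",
            "inconsistent", "inconsistency", "deviate", "deviation"}

def crude_stem(token: str) -> str:
    # Stand-in for NLTK's SnowballStemmer: strip a few common English
    # suffixes so "architectural" and "architecture" collapse to one stem.
    for suffix in ("ural", "ure", "ency", "ent", "ion", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token

STEMMED_KEYWORDS = {crude_stem(k) for k in KEYWORDS}

def mentions_violation(comment: str) -> bool:
    """True if any token in the comment shares a stem with a keyword."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(crude_stem(t) in STEMMED_KEYWORDS for t in tokens)

assert mentions_violation("This violates the layered architecture.")
assert not mentions_violation("Done")
```

Stemming is what lets a single keyword such as "violation" also catch "violates" and "violated"; the real pipeline additionally removed stopwords, punctuation, code snippets, and numbers before matching.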

Table 1
An overview of the selected projects.

Project    | Domain                    | Repository                                         | Language | #Review comments (code) | #Review comments (commits)
Nova       | Virtual server management | https://opendev.org/openstack/nova                 | Python   | 152,107                 | 15,164
Neutron    | Network connectivity      | https://opendev.org/openstack/neutron              | Python   | 181,839                 | 16,719
Qt Base    | Providing UI functionality| https://code.qt.io/cgit/qt/qtbase.git/             | C++      | 123,546                 | 13,369
Qt Creator | A cross-platform IDE      | https://code.qt.io/cgit/qt-creator/qt-creator.git  | C++      | 61,251                  | 2,861
Total      |                           |                                                    |          | 518,743                 | 48,113
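The mining step in Section 3.3 pulled these comments through Gerrit's REST API. One practical detail when scripting against that API: Gerrit prefixes every JSON response body with the magic characters )]}' as a guard against cross-site script inclusion, and clients must strip that prefix before parsing. A minimal sketch follows; the sample payload and its field values are fabricated for illustration, while the /changes/{id}/comments endpoint and the prefix itself are documented Gerrit behavior.

```python
import json

MAGIC_PREFIX = ")]}'"

def parse_gerrit_json(raw: str):
    """Strip Gerrit's XSSI guard prefix (if present) and parse the JSON body."""
    if raw.startswith(MAGIC_PREFIX):
        raw = raw[len(MAGIC_PREFIX):]
    return json.loads(raw)

# Fabricated example of a response from GET /changes/{id}/comments,
# which maps file paths to lists of inline review comments.
raw_response = ")]}'\n" + json.dumps({
    "src/manager.py": [
        {"id": "c1", "line": 42,
         "message": "This violates the layered architecture."}
    ]
})

comments = parse_gerrit_json(raw_response)
assert comments["src/manager.py"][0]["line"] == 42
```

Each parsed comment carries the fields the dataset in Section 3.3 records (file, line, and message), which is what makes keyword filtering over the message text possible.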

1. Search the collected review comments using the keyword set in Table 2, and then establish a corpus by collecting the relevant comments that contain at least one keyword from the keyword set (e.g., "violation").
2. Process the retrieved code review comments that contain at least one keyword from our keyword set and remove English stopwords, punctuation, code snippets, and numbers. This results in a list of tokens.
3. Conduct a stemming process (using SnowballStemmer from the NLTK toolkit [33]) to obtain the stem of each token (e.g., "architecture" and "architectural" have the same stem "architectur").
4. Build a document-term matrix from the corpus, and find additional words that frequently co-occur with each of our keywords (co-occurrence probability of 0.05 in the same document).
5. Manually check and discuss the list of frequently co-occurring additional words to determine whether the newly discovered words should be added to the keyword set.

After executing this approach, we did not find any additional keywords that co-occur with the existing keywords at a co-occurrence probability of 0.05 in the same document. Therefore, we believe that we have minimized the possibility of missing potentially co-occurring and associated words, and that the keyword set is relatively adequate and comprehensive for the search in this study. The keywords used in this work are presented in Table 2. In total, we collected 21,583 code review comments from the four OSS projects that contain at least one keyword.

Table 2
Keywords related to violation symptoms of architecture erosion.

Coarse-grained keywords: architecture, architectural, structure, structural, layer, design, violate, violation, deviate, deviation, inconsistency, inconsistent, consistent, mismatch, diverge, divergence, divergent

Fine-grained keywords: layering, layered, designed, violates, violating, violated, diverges, designing, diverged, diverging, deviates, deviated, deviating, inconsistencies, non-consistent, discrepancy, deviations, modular, module, modularity, encapsulation, encapsulate, encapsulating, encapsulated, intend, intends, intended, intent, intents, implemented, implement, implementation, as-planned, as-implemented, blueprint, blueprints, mis-match, mismatched, mismatches, mismatching

3.4. Data labeling and analysis

We filtered out a large number of irrelevant code review comments in Section 3.3. Still, the retrieved code review comments that contain at least one keyword might be unrelated to violation symptoms. Thus, we needed to manually check and further remove these semantically unrelated review comments. We conducted the data labeling in two phases, as illustrated in Fig. 1.

Phase 1. We decided to conduct a pilot data labeling to reach a consensus and to ensure that we had the same understanding of violation symptoms. Four researchers (the first author and three master students) had an online meeting to discuss the characteristics of violation symptoms. We randomly selected 50 review comments from the collected data. The four researchers independently labeled the


violation symptoms from the review comments via MS Excel sheets, and provided reasons for their labeling results. Then the four researchers had another meeting to check the similarities and differences between their labeling results in order to reach an agreement, and any disagreements were discussed with the second author to reach a consensus. Note that, during the data labeling process, the researchers not only read the textual content of the code review comments per se, but also read the corresponding code snippets, documentation, and commit messages. This helped us further mitigate the threat of wrong labels, such as simple violations at the code level (e.g., PEP 8 coding style violations8). Finally, to measure the inter-rater agreement between the researchers, we calculated Cohen's Kappa coefficient [34] for the pilot data labeling and obtained an agreement value of 0.857, which indicates a high level of agreement between them.

Phase 2. After the pilot data labeling, the four researchers started the formal data labeling by dividing the retrieved 21,583 code review comments into four parts; each researcher manually labeled one fourth of this dataset (almost 5,400 review comments). The first author created the MS Excel sheets and shared them with the other three researchers. The four researchers were asked to label the textual information associated with violation symptoms. After the formal data labeling, the first author checked the labeling results of the other three researchers to make sure that there were no false positives. To mitigate potential bias, we discussed all the conflicts in the labeling results until we reached an agreement; in other words, the labeling results were checked by at least two researchers. The researchers followed the same process as in Phase 1 to conduct the data labeling.

Finally, for data analysis, we employed Constant Comparison [35,36] to analyze and categorize the identified textual information. Constant Comparison can be used to yield concepts, categories, and theories through a systematic analysis of qualitative data. The Constant Comparison process according to Charmaz et al. [35] includes three steps. The first step is initial coding, executed by the four researchers, who examined the review comments by identifying violation symptoms in the retrieved textual information. Second, we applied focused coding, executed by the first author and reviewed by the second author, by selecting categories from the most frequent codes and using them to categorize the data. For example, "feels like this DB work violates the level of abstraction we are expecting here" was initially coded as violation of abstraction, and we merged this code into violation of design principles, as we considered that the violation of "the expected level of abstraction" belongs to violation of design principles. Third, we applied theoretical coding to specify the relationships between codes. The four researchers checked the disagreements on the coding results to reduce personal bias, and discussed the disagreements with the second author to reach a consensus. The whole manual labeling and analysis process took the researchers around one and a half months.

During the data analysis process, if the violation symptoms were specifically stated, we assigned them to specific groups. Conversely, when the symptoms were described more broadly or lacked specificity, we classified them into general categories. We relied solely on the explicit textual content of the comments during data analysis, without subjective interpretation. Besides, we note that we did not find multiple violation symptoms discussed in a single review comment; in other words, each identified review comment has one single label. In addition, unlike our previous study [12], which focused on both structural and violation symptoms, we conducted the data collection and labeling processes in this study from scratch by following the aforementioned steps; consequently, we have established a more comprehensive and larger dataset on violation symptoms [29].

8 https://peps.python.org/pep-0008/

Fig. 2. Overview of the retrieved review comments containing the violation symptom keywords and the identified review comments related to violation symptoms in the four projects.

4. Results

4.1. Overview

Before delving into the findings of the three RQs, we briefly report the descriptive statistics of the identified violation symptoms in the four selected projects.

Fig. 2 shows the retrieved review comments containing the violation symptom keywords in Table 2 and the identified review comments related to violation symptoms in the four projects. We observed that: (1) the proportion of retrieved review comments across the four projects aligns closely with the results presented in Table 1; specifically, the percentages are: Nova at 2.94%, Neutron at 2.42%, Qt Base at 2.22%, and Qt Creator at 1.15%; (2) the identified review comments account for a similar percentage of the retrieved review comments across the four projects; the respective percentages are: Nova at 4.77%, Neutron at 3.28%, Qt Base at 5.10%, and Qt Creator at 7.84%.

In terms of the identified code review comments related to violation symptoms in our dataset, only a small portion of these comments (59 out of 606, 9.7%) pertained to the content of commit messages. In contrast, the vast majority of the identified review comments related to violation symptoms (547 out of 606, 90.3%) were associated with source code. Considering that the number of review comments on source code is 10 times higher than the number of review comments on commit messages, as shown in Table 1, this proportion (10:1) is similar to the proportion of the identified review comments related to violation symptoms from the two sources (547:59).

4.2. RQ1 - Categories of violation symptoms

To answer RQ1, we identified 606 (out of 21,583) code review comments from the four projects that contain a discussion of violation symptoms of architecture erosion. As shown in Table 3, the collected code review comments can be classified into three categories of violation symptoms, with ten subcategories, as follows.

• Design-related violation: six types of violation symptoms pertaining to design were identified: structural inconsistencies, violation of design decisions, violation of design principles, violation of rules, violation of architecture patterns, and violation of database design.
• Specification-related violation: two types of violation symptoms related to specifications were identified: violation of documentation and violation of API specification.
• Requirement-related violation: two types of violation symptoms relevant to requirements were identified: violation of architecture requirements and violation of constraints.
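Section 3.4 reports an inter-rater agreement of 0.857 measured with Cohen's Kappa on the pilot labeling. For readers unfamiliar with the coefficient, a minimal two-rater sketch follows; the pilot labels below are fabricated for illustration and are not the study's data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two raters labeling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Fabricated pilot labels: 1 = violation symptom, 0 = unrelated comment.
a = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
kappa = cohens_kappa(a, b)   # raters agree on 9 of 10 items
assert 0.7 < kappa < 0.85
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percentage when label distributions are skewed, as they are here (most retrieved comments are unrelated to violation symptoms).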


Table 3
Categories of violation symptoms in code review comments (subcategory counts in parentheses).

Design-related violation
• Structural inconsistencies (205): Violations of the consistency of structural design in architecture (e.g., architectural mismatch) that exist in various architectural elements (e.g., components, ports, modules, and interfaces).
• Violation of design decisions (92): Violations of selected design decisions, including design rationale, intents, or goals, which may cause implementation errors and give rise to ever-increasing maintenance costs.
• Violation of design principles (91): Violations of common design principles (e.g., the SOLID principles) or divergences from object-oriented development guidelines.
• Violation of rules (46): Violations of predefined architecture rules or policies when the implementation does not actually follow the rules.
• Violation of architecture patterns (33): Violations of architecture patterns (e.g., violations of the layered pattern) when the pattern implementations do not conform to their definitions.
• Violation of database design (25): Violations of the principles or constraints in designing the database of a system.

Specification-related violation
• Violation of documentation (56): Violations of the specification in development documents that hinder architecture improvements and modifications.
• Violation of API specification (36): Violations or inconsistencies of API claims or specifications.

Requirement-related violation
• Violation of architecture requirements (11): Violations of the intended requirements (e.g., user requirements) during the development and maintenance process.
• Violation of constraints (11): Violations of specific constraints imposed by the intended architecture that may have an impact on the architecture design.
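To illustrate what an automated check for the "Violation of rules" subcategory in Table 3 could look like in practice, the sketch below uses Python's standard ast module to flag imports that cross a forbidden module boundary. The rule and the module names ("storage", "web") are hypothetical examples of ours, not rules taken from the studied projects:

```python
import ast

# Hypothetical architecture rule: modules in the "storage" layer must
# not import from the "web" layer (lower layers may not depend on
# upper layers).
FORBIDDEN = {"storage": {"web"}}

def check_rule(module_layer: str, source: str) -> list[str]:
    """Return one violation message per import that breaks the rule."""
    violations = []
    banned = FORBIDDEN.get(module_layer, set())
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # top-level package of the import
            if top in banned:
                violations.append(
                    f"line {node.lineno}: '{module_layer}' must not import '{name}'")
    return violations

code = "import web.handlers\nimport os\n"
print(check_rule("storage", code))  # flags the cross-layer import only
```

Real conformance checkers work on whole dependency graphs rather than single files, but the core idea of comparing actual dependencies against allowed ones is the same.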

In the following subsections, we present the detailed descriptions of the ten subcategories, accompanied by a range of real-world examples. The ten subcategories of violation symptoms are presented according to their frequencies in Table 3 within the dataset.

4.2.1. Structural inconsistencies
Structural inconsistencies occur due to various reasons; for example, they might be associated with a change of software specifications, where the related components that implement the specifications cannot be automatically updated to keep consistency [37]. Structural inconsistencies can delay system development, increase the cost of the development process, and further jeopardise the properties of system quality (e.g., reliability and compatibility). For example, one developer mentioned how inconsistency leads to an increased number of collaborating modules (i.e., module dependencies), and consequently to higher complexity of the system:

‘‘Consistency is a nice thing, especially as in this case by not having consistency we are increasing the number of collaborating modules this module has.’’

Another example is related to inconsistent implementation between extension classes:

‘‘I think it could, but that would make it inconsistent with every other extension class’s implementation of this method (I gripped). I think its best to keep them consistent. Feel free to file a low-hanging-fruit bug to fix this across all extensions.’’

In addition, architectural mismatch is another typical inconsistency issue in this category, which denotes the inability to successfully integrate component-based systems [38]. Architectural mismatch happens when there are conflicting assumptions among architecture elements (e.g., connectors and components) during development [38,39]. One reviewer mentioned an architectural mismatch between the server and agent side due to port design issues:

‘‘The port security extension adds functionality to disable port security, which is on by default. I don’t think we should be changing the default behavior when port security is not present. User can explicitly disable port security if needed with port security extension. Enabling it here also makes a mismatch between server and agent side code.’’

4.2.2. Violation of design decisions
This category contains violations of design decisions that have been made, especially violations of design rationale. Such violations may lead to implementation errors and consequently aggravate maintenance costs. For example, one developer stated:

‘‘I chose olso because the intent is for this file to eventually not use anything in nova, so adding a nova import seemed like a step in the wrong direction.’’

In addition, another reviewer pointed out a violation of design intent:

‘‘But if _get_provider_traits is being updated in this otherwise aggregate-specific change, it seems like we’re splitting in two different directions which defeats the purpose of trying to be consistent.’’

Such violations imperil the actual implementation, lead to high error-proneness, and make software systems harder to implement, comprehend, maintain, and evolve. Here is an implementation error due to violating a certain design decision, as one developer mentioned:

‘‘Originally, QMT_CHECK was Q_ASSERT (before I added this code to QtCreator). It is a serious implementation error ... If you are not happy with a crash you can add a check against 0. This will avoid the crash here but I am pretty sure that it will crash sooner or later on a different location.’’

4.2.3. Violation of design principles
The identified violation symptoms in this category denote violations of common design principles in object-oriented design and development, such as encapsulation, abstraction, and hierarchy [40]. Common examples of object-oriented design principles include the SOLID principles proposed by Martin [41], i.e., Single Responsibility Principle, Open Closed Principle, Liskov Substitution Principle, Interface Segregation Principle, and Dependency Inversion Principle. As an example, one reviewer pointed out a violation of the Interface Segregation Principle:

‘‘But here don’t we have to make a upcall from compute to api db, which will violate api/cell isolation rules. Is there any workaround in this case?’’

Another two examples are violations of abstraction and encapsulation:

‘‘Having said all of that: I get that I’m violating an abstraction layer in LinkLocalAddressPair and that this is surprising (and therefore bad).’’

‘‘It would seem to violate encapsulation to have to know to set the default value for an attribute outside of the object.’’
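To make the last quotation concrete, the following minimal Python sketch (the class and attribute names are ours, not taken from the reviewed projects) contrasts code that forces callers to set an attribute's default from outside the object with a version that keeps the default encapsulated:

```python
class Port:
    """Hypothetical class that leaks its defaults to its callers."""

    def __init__(self, security_enabled=None):
        # Encapsulation violation: no sensible default is chosen here,
        # so every caller must know to supply one from the outside.
        self.security_enabled = security_enabled


class EncapsulatedPort:
    """Encapsulation preserved: the class owns its own default."""

    def __init__(self, security_enabled=True):
        self.security_enabled = security_enabled


# Client code compensating for the missing default -- the very
# "set the default value for an attribute outside of the object"
# situation the reviewer objected to.
port = Port()
if port.security_enabled is None:
    port.security_enabled = True  # knowledge that belongs inside Port

assert port.security_enabled is True
assert EncapsulatedPort().security_enabled is True
```

Each caller of `Port` must duplicate the defaulting logic, and any change to the default has to be propagated to all of them; `EncapsulatedPort` keeps that decision in one place.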


4.2.4. Violation of documentation
The identified violation symptoms in this category encompass violations of instructions in documentation, e.g., not following the instructions on how to implement an interface. Such violations of documentation can hinder subsequent architecture improvements and modifications [42]. Two examples regarding violations of documentation are presented below:

‘‘The case of physical or service VM routers managed by L3 plugins doesn’t appear to be supported here when the gateway IP is not or cannot be set to the router LLA. By supporting those cases in this way though, we’re breaking the reference implementation as documented.’’

‘‘The reimplementation in QStandardItemModel does *not* match the documentation as it replaces the entire set of roles with the new ones. It also violates the documentation with the silly EditRole->DisplayRole thing (touching the value for a role not passed in input; although one could say that QSIM ‘aliases’ EditRole with the DisplayRole in all cases).’’

4.2.5. Violation of rules
Violations of architecture rules occur when the implementation does not actually follow the rules predefined by architects. Different systems have their own defined architecture rules or policies. During development, rules provide a way to specify the allowed and specific relationships between architecture elements (e.g., components and modules). As mentioned by a developer:

‘‘... this implies that that caller is able to put constraints on the driver which may violate the rules built into the driver.’’

Another violation of predefined rules pointed out by a reviewer is:

‘‘We would now be limiting new spawns to only be allowed to the host that was the cause of the violation, thus causing the violation to be made worse. But maybe this is okay, since there isn’t much we can do if the policy has been violated prior to this.’’

4.2.6. Violation of API specification
All API-related violations and inconsistencies belong to this category. API documentation describes the explicit specification of interfaces and dependencies. Due to changing business requirements and continuous demands to upgrade functionalities, API evolution is inevitable. Violations of API specification encompass improper API usages and inconsistent API calls. Improper applications of APIs can give rise to unexpected system behaviors and eventually cause architectural inconsistencies; for example, developers do not adhere to the contract or specification of the required APIs. As one reviewer stated:

‘‘the point is that with neither a category backend nor completely disabled output, the macro should be unusable (even if it would print something, the category would be missing, i.e., the implementation would violate the api).’’

Besides, inconsistent API calls can also cause inconsistencies, such as getting different responses from different versions of APIs (e.g., distinct parameters and returns). As one developer mentioned:

‘‘On v2 API, there is a lot of API inconsistencies and we need to fix them on v3 API. So we can change API parameters on v3 API.’’

4.2.7. Violation of architecture patterns
Architecture patterns provide general and reusable solutions for particular problems, such as the layered pattern and the client–server pattern. Violating architecture patterns undermines the sustainability and reliability of software systems and increases the risk of architecture erosion. For example, we found that violations of the layered pattern are one of the most common types in this category. Modern software systems often contain millions of lines of code across many modules; therefore, employing hierarchical layers is a common practice to organize the relationships between modules. Violations of layered systems can negatively impact the quality attributes (e.g., reusability, maintainability, and portability), eventually leading to architecture erosion. For instance, a developer commented on a violation of the layered pattern:

‘‘That all said, I’m definitely −2 (even if not core ;-)) on that patch, because I think it’s a layer isolation violation to just make the call here. It should be fixed at the Compute API level rather IMHO.’’

Another example regarding the layered pattern violation is:

‘‘We could get some race conditions when starting the scheduler where it would not know the allocation ratios and would have to call the computes, which is a layer isolation violation to me.’’

4.2.8. Violation of database design
Databases are one of the key architectural elements [43], and can negatively impact the system quality attributes when their design is violated. This category includes the problems caused by code changes that violate the constraints of database design, such as primary and foreign key constraints. For example, as mentioned by one reviewer, the network port no longer exists when a foreign key violation is generated:

‘‘If subnet was fetched in reference IPAM driver, port got deleted and foreign key violation was generated on ipallocation insert (because port no longer exists).’’

Another example is about a unique key violation:

‘‘In the bug report, the randomly generated index happens to be 2, which violates the already existing (router, 1) unique key constraints.’’

4.2.9. Violation of constraints
Constraints are pre-determined special design decisions with zero degrees of freedom [44], which can be regarded as special requirements that are non-negotiable and impact certain aspects of architecture implementation. Violations of constraints often denote concrete statements, expressions, and declarations in source code that do not comply with the constraints imposed by the intended architecture [45], such as inter-module communication constraints. For example, one reviewer mentioned a constraint violation in the system:

‘‘If a specific subnet is passed in, then the IPAM system would try to give that subnet, but if it’s already use or it violates the constraints in the IPAM system, it would refuse.’’

Another example concerning constraint violation is:

‘‘This should probably be HTTPBadRequest: the provided allocation has a form that violates Inventory constraints, so if the allocation (the request body) changes, it could work.’’

4.2.10. Violation of architecture requirements
Requirements, and especially quality attribute requirements, are closely related to the architecture of a system. Architecturally significant requirements drive the architecture [43], but are, unfortunately, commonly violated [1]. Moreover, architecturally significant requirements specify the major features and functionalities that a particular product should include, and convey the expectations of the stakeholders for the software product. As an example, a developer mentioned that the existing code had violated the requirements:

‘‘We don’t have this in our requirements ... I guess this was already violated by existing code so I don’t really want to block on it, we can handle it separately.’’

Another example of a violation of requirements is:

‘‘This is global/module level data so it will get loaded once and stay loaded for the life time of the wsgi process even if the application is restrated in the interpreter. we are the ortinial code violated the requirement in pont3 of ...’’
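The database-design violations discussed in Section 4.2.8 can be reproduced in a few lines. The following self-contained sqlite3 sketch (table and column names are ours, loosely modeled on the quoted ipallocation case, not Neutron's actual schema) shows a foreign-key violation raised when an insert references a port that was deleted in the meantime:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE ports (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE ipallocations (
    ip TEXT PRIMARY KEY,
    port_id INTEGER REFERENCES ports(id))""")

conn.execute("INSERT INTO ports (id) VALUES (1)")
conn.execute("DELETE FROM ports WHERE id = 1")  # the port got deleted meanwhile

try:
    # The insert now references a port that no longer exists,
    # so the foreign-key constraint rejects it.
    conn.execute("INSERT INTO ipallocations VALUES ('10.0.0.5', 1)")
except sqlite3.IntegrityError as exc:
    print("foreign key violation:", exc)
```

The same race is harder to see in production code, where the fetch and the insert are separated by driver layers, which is precisely why such violations surface in review discussions rather than in the schema itself.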


Table 4
Categories and percentages of linguistic patterns used to express violation symptoms in code review comments.

• Problem Discovery (90.4%): patterns related to unexpected or unintended behaviors. Example: ‘‘Well, so this patch actually makes the API worse: It is as unclear as before when a temporary file is created in the temporary directory and when not, *and* it deviates from the behavior of QTemporaryFile as well, so anyone knowing that behavior will get unexpected results here.’’

• Solution Proposal (10.4%): patterns used to describe possible solutions for the problems found. Example: ‘‘I’m not convinced that it is wise to deviate from that with this one, other than to use QPlainTestLogger::outputMessage(), which has side-effects on WinCE, Windows and Android. ... Please consider inheriting QAbstractTestLogger instead and calling QAbstractTestLogger::outputString() instead of QPlainTestLogger::outputMessage().’’

• Opinion Asking (6.8%): patterns used for inquiring about someone’s viewpoints and thoughts. Example: ‘‘Why diverge from the pattern established for toFoo_helper below?’’

• Information Giving (4.8%): patterns for informing someone about something. Example: ‘‘I didn’t want to optimize in this way, because the code is wrong: The parent implementation should only be called if the sub-class does not handle the line itself. Several implementations got this wrong, and it is indeed not self-explanatory. I will fix all of those as part of moving away from the chaining approach.’’

• Feature Request (2.0%): patterns for providing suggestions, recommendations, or ideas. Example: ‘‘Recently there were some changes in API, virtual getters were replaced with protected setter. Please be consistent with it and provide protected setter and public getter, both non-virtual. ... I saw those changes came from them. And I strongly agree with it.’’

• Information Seeking (1.8%): patterns related to asking for help or information from others. Example: ‘‘I’m a bit confused as to how this violates the open/closed principle. ... I’m a bit unclear as to what your are suggestion as an alternative. Are you suggesting having a separate config option for each traffic type as an alternative?’’
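As a rough illustration of how the patterns in Table 4 (and the concrete templates in Table 5) could be operationalized, the sketch below matches a review comment against a few hand-written regular expressions per category. The expressions are our own simplifications for illustration, not the instrument used in this study:

```python
import re

# Hypothetical, simplified regexes per linguistic-pattern category,
# loosely inspired by Tables 4 and 5.
PATTERNS = {
    "Problem Discovery": [
        r"\bthis (seems to )?(violates?|breaks?)\b",
        r"\b(is|are|seems) (not consistent|inconsistent) with\b",
        r"\b(diverges?|deviates?) from\b",
    ],
    "Solution Proposal": [
        r"\bwe (should|need to) (remove|modify|revise|keep consistent)\b",
        r"\bto fix it,? we need to\b",
    ],
    "Opinion Asking": [
        r"\bwhat do you think\b",
        r"\bwhy are you diverging from\b",
    ],
}

def match_categories(comment: str) -> list[str]:
    """Return all categories whose regexes match the comment."""
    text = comment.lower()
    return [category for category, regexes in PATTERNS.items()
            if any(re.search(rx, text) for rx in regexes)]

print(match_categories("I think it's a layer isolation violation; this breaks the API."))
```

A comment can match several categories at once, which mirrors the observation above that some review comments contain more than one linguistic pattern.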

4.3. RQ2 - Expression of violation symptoms

For answering RQ2, we inspected the content of the identified violation symptoms in code review comments to identify the frequently used terms and associated linguistic patterns. Fig. 3 presents the distribution of the most frequently used terms related to the discussion of violation symptoms in review comments. We use typical terms to represent words that have the same meaning; for example, ‘‘inconsistent’’ covers all the terms with the same meaning, such as not consistent, inconsistent, inconsistency, and inconsistencies. The most frequently used term is ‘‘inconsistent’’ (37%, 225 out of 606) and is related to the notion of ‘‘consistency’’. ‘‘Violate’’ comes second with 140 comments (23% out of 606), followed by ‘‘design’’ (9%, 52 out of 606) and ‘‘layer’’ (6%, 35 out of 606). We put the less frequently used terms, such as ‘‘module’’ and ‘‘architecture’’, into ‘‘other terms’’.

Fig. 3. Distribution of the frequently used terms related to the description of violation symptoms.

In addition, to further analyze the characteristics of the comments related to violation symptoms, we summarized and categorized the linguistic patterns used to express violation symptoms. We discovered the linguistic patterns by reviewing words or phrases that either frequently appear or are relatively unique to a particular category. Subsequently, we manually checked and categorized the comments into six categories based on the linguistic patterns proposed by Di Sorbo et al. [46], namely, Feature Request, Opinion Asking, Problem Discovery, Solution Proposal, Information Seeking, and Information Giving, which have been employed to identify the linguistic patterns used in various textual artifacts (e.g., app reviews [47,48] and issue reports [49]). Table 4 presents the statistical results of the review comments related to violation symptoms, including the categories, descriptions, examples, and percentages. A few code review comments contain more than one linguistic pattern, so overall the percentages add up to more than 100%. Our results show that most (90.4%) of the comments related to violation symptoms are about Problem Discovery, followed by Solution Proposal (10.4%) and Opinion Asking (6.8%). Moreover, we list the linguistic patterns (frequency ≥ 3) used to express violation symptoms in Table 5.

4.4. RQ3 - Dealing with violation symptoms

To answer RQ3 and gain a better understanding of how developers deal with violation symptoms, we plot a tree map (see Fig. 4) of the distribution of the status (i.e., ‘‘Merged’’, ‘‘Abandoned’’, and ‘‘Deferred’’) of the patches containing violation symptoms. We further analyzed the developers’ reactions (including refactored, removed, and ignored) in response to violation symptoms from code review comments.

Fig. 4. Distribution of the developers’ reactions in response to violation symptoms in code review comments.

We found that most (76.1%, 461 out of 606) of the violation symptoms are in ‘‘Merged’’ status, which means that developers agreed to merge the submitted code into the code repository. 23.1% (140 out of


606) of the patches are in ‘‘Abandoned’’ status, which means that the submitted code was rejected and not integrated into the code repository. Only a few patches (0.8%, 5 out of 606) stayed in ‘‘Deferred’’ status, which denotes a pending status and only exists in the code review of Qt (the reviewers and developers consider that the raised issues are not of very high priority and can be fixed in the following releases).

For the patches that contain violation symptoms, 77.7% (358 out of 461) of the merged patches and 82.9% (116 out of 140) of the abandoned patches were addressed by refactoring. Moreover, 12.8% (59 out of 461) and 9.5% (44 out of 461) of the violation symptoms were removed (i.e., the code was deleted) and ignored (i.e., no changes), respectively, in the merged patches. Similar percentages of violation symptoms were removed (7.1%) and ignored (10.0%) in the abandoned patches. In the five deferred patches, developers refactored four submitted code snippets to cope with the violation symptoms and ignored one of them, while the remaining issues would be addressed in future releases.

Table 5
Linguistic patterns (frequency ≥ 3) used to express violation symptoms in code review comments.

# Problem Discovery
1 This is breaking [something]
2 This seems to violate [something]
3 It seems like a (layer) violation of [something]
4 It seems inconsistent/not consistent with [something]
5 [someone] probably have [an issue]
6 This violates/breaks [something]
7 [someone] is/are violating the rules
8 This is a violation of [something]
9 This is not consistent with [something]
10 [something] is/are inconsistent with [something]
11 There will be inconsistencies in [something]
12 There are inconsistencies [between something]/[of/in something]
13 [something] leads to inconsistency
14 [something] is a (poor/awful/terrible) design mistake/flaw/choice
15 [something] diverges/deviates from [something]

# Solution Proposal
1 I think [someone] should/need to [verb + subject]
2 [someone] should modify/preserve/revise [something]
3 We should remove [something]
4 I think [doing something] would be better if [something]
5 [someone] should/need to stay/keep consistent with [something]
6 To fix it, we need to [do something]
7 I think what we should do is [something]
8 Did you consider the approach of [something]?

# Opinion Asking
1 Why are you diverging from [something]?
2 Would it be possible to [do something]?
3 Maybe we should [do something], what do you think about it?
4 Should/would we [do something]?

# Feature Request
1 We should/need to keep consistent with [something]
2 It is better to [do something]
3 I wonder if we can [do something]

# Information Giving
1 I will modify/fix [something] to address/keep consistent with [something]
2 That is why I [doing something]

# Information Seeking
1 Is there a different [something]?
2 Are you suggesting [something]?
3 Are we planning to [do something]?

5. Discussion

In this section, we first interpret the study results, and we then discuss the implications of the results for practitioners and researchers.

5.1. Interpretation of results

The percentage of the identified violation symptoms in code review comments is rather low (2.8%, 606 out of 21,583) in the selected four OSS projects (i.e., Nova, Neutron, Qt Base, and Qt Creator). Prior studies [12,13] show that the percentage of architecturally-relevant information in code reviews is comparatively lower than that of code-level issues (e.g., code smells), and our results comply with those findings. Despite the low percentage of architecturally-relevant code review comments, the identified architectural issues (especially architecture violation symptoms) can have a seriously negative impact on software maintenance and evolution.

RQ1: Categories of violation symptoms. We classified the collected violation symptoms into three categories with ten subcategories that developers often discuss during development. Design-related violations are the main category of violation symptoms. Specifically, we observed that structural inconsistency is the most common subcategory of violation symptoms. Structural inconsistency might be triggered by classes with many methods, which make systems tend to be complex, overloaded, and prone to architectural smells; such classes have a greater possibility of being reused and becoming the source of architectural inconsistencies [19]. Moreover, structural inconsistencies might be hard to detect with tools. For instance, architecture mismatch is a kind of structural inconsistency that is relatively difficult to detect with off-the-shelf tools for various reasons (e.g., standard architectural description languages are used to document architecture, but they generally do not support tool-assisted detection of architecture mismatches) [39].

Additionally, our results show that certain design-related factors are also common sources of violation symptoms that we cannot ignore. For instance, violations of the layered pattern (i.e., violation of architecture patterns in Section 4.2) undermine the sustainability and reliability of systems and may gradually lead to architecture erosion due to their accumulation. In many cases, such violations require considerable effort to repair, to the extent that such a repair may not be financially feasible [1,50]. Besides, our results indicate that certain design-related violations (e.g., violations of design decisions, design principles, and rules) are prevalent during development; one possible reason is that the architectural knowledge whose absence leads to these violations might exist only within certain small groups, such as architects or team leaders. Therefore, our results suggest that disseminating and sharing the architectural knowledge related to violation symptoms across the development team is necessary.

RQ2: Linguistic patterns expressing violation symptoms. The results of RQ2 show that most (60%) of the violation symptoms contain terms about ‘‘inconsistent’’ (e.g., not consistent and inconsistency) and ‘‘violate’’ (e.g., violation and violating) (see Fig. 3). One possible explanation is that such terms are more in line with the idiomatic expressions used by developers and reviewers, and they are commonly used to discuss issues related to violation symptoms in system design. Regarding the linguistic patterns, the results show that the linguistic patterns of expressing violation symptoms in code review comments can be mapped to the categories of linguistic patterns identified in development emails [46]. The six linguistic patterns can also be used to analyze textual artifacts from other sources (e.g., app reviews [47,48] and issue reports [49]), and our findings indicate that code review discussions also encompass these categories of linguistic patterns. Besides, the major type of linguistic pattern for expressing violation symptoms is Problem Discovery (see Table 4). We conjecture that developers incline to use the linguistic patterns regarding Problem Discovery (see Table 5) to specify violation symptoms, as one of the aims of code review is to identify issues during development.

RQ3: Reactions to violation symptoms from developers. We found that most of the identified violation symptoms were merged into the code repositories after refactoring or removing the smelly code.


To a large extent, these review comments can help to mitigate the risk of architecture erosion caused by violation symptoms. In addition, developers’ reactions (i.e., 77.7% of the patches were merged into the code base) also indicate that the review comments raising violation symptoms are crucial, especially for large-scale and long-term projects [51], as these review comments help reduce violation symptoms and improve the quality of software systems.

In some sense, removing the code that contains violation symptoms can also be considered as code improvement; for example, removing duplicated code or redundant dependencies decreases code complexity and increases system maintainability. Therefore, the percentage of improvement (refactored + removed) regarding addressing violation symptoms accounts for around 90%, no matter whether the code is merged into the repositories or not. Only a small percentage (9.6%) of the identified violation symptoms were ignored and remained in the systems. One possible reason is that developers have different opinions about how to address the remaining violation symptoms without reaching an agreement. Another possible reason is that the submitted code containing violation symptoms is not urgent to fix and has a lower priority, as the priority of issues depends on the severity and degree of the impact on different quality attributes [1]. In general, the study results show that developers are inclined to repair the issues related to violation symptoms when these are pointed out or discussed during code review.

5.2. Implications

5.2.1. Implications for researchers
We identified three categories of violation symptoms from code review comments. Researchers are encouraged to investigate violation symptoms in other artifacts (such as pull requests, issues, and developer mailing lists) to provide more comprehensive empirical evidence, in order to further validate and consolidate the observations of this study. For example, the categories of violation symptoms can be further explored, because only four OSS projects (written in Python and C++) from two communities (OpenStack and Qt) were used in our study. It would be worthwhile to explore violation symptoms in both industrial and OSS projects written in other programming languages (e.g., Java) and from other communities (e.g., Apache). Besides, researchers can further compare the violation symptoms identified by certain off-the-shelf architecture conformance checking techniques (e.g., reflexion models [52]) with those in our dataset (i.e., manually collected violation symptoms). More specifically, they can perform quantitative comparisons of the identified violation symptoms between code and textual artifacts (e.g., code review comments and commit messages) with the purpose of evaluating the performance of the techniques. In addition, researchers can also try to map the violation symptoms from textual artifacts to code in order to improve and complement the existing architecture conformance checking techniques.

Moreover, we have created a dataset [29] containing violation symptoms of architecture erosion from code review comments. This dataset can act as a foundation for future studies on architecture erosion [1], especially on architecture violation symptoms. For example, researchers can further explore the possibility of automatically identifying violation symptoms from textual artifacts by employing natural language processing techniques based on machine learning and deep learning algorithms. Automatically identifying violation symptoms would be of great value to developers, as manual identification can be extremely tedious, effort-intensive, and error-prone. Specifically, based on models trained with machine learning and deep learning algorithms, researchers can devise auxiliary plugins for existing code review tools that provide warnings of violation symptoms to developers during development and maintenance.

5.2.2. Implications for practitioners
The results in Section 4 and their explanations in Section 5.1 can be used by practitioners to guide their refactoring and maintenance activities. For example, having the categories of violation symptoms might help practitioners to be aware of possible violations in the system architecture and then consider avoiding or repairing such issues in their daily development work. Moreover, the frequently used terms and linguistic patterns related to descriptions of violation symptoms from the viewpoint of developers can help developers pay more attention to violation symptoms during maintenance and evolution. For example, such terms and linguistic patterns could be a clear signal that developers should be wary of architecture erosion risks and avoid the appearance of violation symptoms.

Furthermore, we encourage practitioners to manage violation symptoms with the purpose of facilitating the refactoring and repair of architecture violations. As reported by Schultis et al. at Siemens, architecture violations must be explicitly managed, which includes addressing existing architecture violations and preventing future ones [53]. Therefore, architecture teams (especially architects) should take the responsibility for collecting and monitoring violation symptoms, and then equip developers with the knowledge to repair or minimize architecture violations during development. Thus, practitioners can work with researchers and put effort into developing dedicated tools for managing violation symptoms and improving the productivity of maintenance activities.

6. Threats to validity

The threats to the validity of this study are discussed by following the guidelines proposed by Wohlin et al. [54]. Internal validity is not considered, because this study does not address any causal relationships between variables.

Construct validity pertains to whether the theoretical and conceptual constructs are correctly interpreted and measured. In this work, one potential threat concerns the construction of the keyword set. To mitigate this threat, we first built the keyword set based on previous studies, and then we employed a pre-trained word embedding model to query and select similar keywords in the software domain. Besides, we constructed and used the co-occurrence matrix approach proposed by Bosu et al. [31] to check for possibly missing co-occurring words. In this way, the potential threat can be, at least partly, mitigated.

External validity concerns the extent to which we can generalize the findings to other studies. First, a potential threat to external validity is whether the selected projects are representative enough. Our work chose the two largest and most popular OSS projects (i.e., Nova and Neutron) from the OpenStack community, both written in Python; we further selected another two major OSS projects (i.e., Qt Base and Qt Creator) from the Qt community, which are written in C++. Second, another threat is that only Python and C++ OSS projects were selected, which may reduce the generalizability of the study results. Our findings may not generalize to or represent all open source and closed source projects. It would be interesting to select more projects from different sources and programming languages to increase the external validity of the study results. Besides, it is worth exploring the generalizability of the findings regarding the frequent terms and linguistic patterns related to the identified violation symptoms in this work, for example, to investigate whether these findings are also applicable to other artifacts (e.g., pull requests and issues).

Reliability refers to the replicability of a study regarding yielding the same or similar results when other researchers reproduce this study. The potential threat mainly comes from the activities of data collection and data analysis. For data collection, we presented the detailed data collection steps in Section 3.3 and provided a replication package [29] for reproducing the data collection and filtering process, which can help to enhance the reliability of the results. Regarding data labeling, our observations show that developers generally discussed individual


violation symptoms within one single review comment; as such, the threat of multiple symptoms discussed in one review comment with multiple labels is not present in this work. As for the data analysis, to mitigate personal bias, we conducted a pilot labeling and classification (see Phase I in Section 3.4) before the formal data labeling and classification process, and we obtained a Cohen's Kappa value of 0.857, which indicates a substantial inter-rater agreement. Likewise, we executed a similar process (see Phase II in Section 3.4) when we conducted the formal data labeling and analysis. Any disagreements were discussed among the four researchers to reach an agreement, and at least two researchers participated in the data labeling and classification process.

7. Related work

In this section, we discuss the work related to our study, which involves architecture violations and their detection approaches (i.e., architecture conformance checking), as well as the data sources (i.e., code review comments) used in this study.

7.1. Architecture violations

Over the past decades, there has been extensive investigation of architecture violations. Brunet et al. [55] performed a longitudinal study to explore the evolution of architecture violations in 19 bi-weekly versions of four open source systems. They investigated the life cycle and location of architecture violations over time by comparing the intended and recovered architectures of a system. They found that architecture violations tend to intensify as software evolves and that a few design entities are responsible for the majority of violations. More interestingly, some violations seem to recur after being eliminated. Mendoza et al. [4] proposed a tool, ArchVID, based on model-driven engineering techniques for identifying architecture violations; the tool supports recovering and visualizing the implemented architecture.

Moreover, Terra et al. [45] reported their experience in fixing architecture violations. They proposed a recommendation system that provides refactoring guidelines for developers and maintainers to repair architecture violations in the module architecture view of object-oriented systems. The results show that their approach can trigger correct recommendations for 79% of architecture violations, which were accepted by architects. Maffort et al. [56] proposed an approach to check architecture conformance for detecting architecture violations based on defined heuristics. They claimed that their approach relies on the defined heuristic rules and can rapidly raise architectural violation warnings. Different from the abovementioned studies focusing on detecting architecture violations in source code, our work investigates the architectural violation symptoms in code review comments from the perspective of developers, including the categories and linguistic patterns of expressing violation symptoms, as well as the reactions developers take to deal with violation symptoms.

7.2. Architecture conformance checking

Architecture conformance checking techniques are the most commonly-used approaches to detect architecture violations [1]. Conformance can be checked statically or dynamically, and it is usually performed by comparing the structure of the intended architecture (provided by the architects) with the architecture information extracted from the source code that implements the architecture. For example, Pruijt et al. [7] proposed a metamodel for extensive support of semantically rich modular architectures in the context of architecture conformance checking. Miranda et al. [6] presented an architectural conformance and visualization approach based on static code analysis techniques and a lightweight type propagation heuristic. They evaluated their approach on three real-world systems and 28 OSS systems to identify architecture violations.

Besides, rule-based conformance checking approaches are also employed to identify architecture violations. For example, previous studies detected architecture violations by checking explicitly defined architectural rules [4,8,57]. Moreover, it is viable to check architecture conformance and identify architecture violations by defining and describing the systems through Architecture Description Languages (ADLs) [8,9] or Domain-Specific Languages (DSLs) [10,11]. However, the aforementioned approaches have obvious limitations; for example, much effort is required to address the challenges of understanding the architecture design (e.g., concepts and relations), defining architectural rules (or description languages) in advance, and establishing a mapping between architectural elements and source code. Moreover, other limitations, such as a lack of generalizability, limited visualization of architecture views, and insufficient tooling support, hinder the above approaches from being widely used in practice. Additionally, to the best of our knowledge, prior studies regarding architecture violations focus on checking architecture conformance with source code using predefined abstract models and rules, and there is no evidence-based knowledge on identifying architecture violations from textual artifacts, such as code review comments.

7.3. Code review comments

Code review comments contain a wealth of knowledge related to software development, and a variety of studies have analyzed software defects and evolution by mining review comments and commit records. Zhou and Sharma [58] designed an automated vulnerability identification system based on a large number of commits and bug reports (containing rich contextual information for security research); their approach can identify a wide range of vulnerabilities and significantly reduce false positives, by more than 90% compared to manual effort. Besides, Uchôa et al. [59] investigated the impact of code review on the evolution of design degradation by mining and analyzing a plethora of code reviews from seven OSS projects. They found that there is a wide fluctuation of design degradation during the revisions of certain code reviews. Paixão et al. [60] explored how developers perform refactorings in code review, and they found that refactoring operations are most often used in code reviews that implement new features. Besides, they observed that refactoring operations were rarely refined or undone along the code review, and that such refactorings often contribute to new code smells and bugs. Given that the previous studies discussed above investigated various aspects of code review regarding development and maintenance (e.g., decisions and design degradation), while no studies have investigated architecture violations through code review comments, we decided to explore violation-related issues (i.e., violation symptoms) from code review comments.

8. Conclusions

As software systems evolve, changes in the systems could lead to cascading violations, and consequently the architecture will exhibit an eroding tendency. In this work, we conducted an empirical study to investigate the discussions on violation symptoms of architecture erosion in code review comments. We collected a large number of code review comments from four popular OSS projects in the OpenStack (i.e., Nova and Neutron) and Qt (i.e., Qt Base and Qt Creator) communities. Our results show that ten subcategories of violation symptoms in three main categories are discussed by developers during the code review process. Besides, we found that the most frequently-used terms related to the description of violation symptoms concern structural inconsistencies and design-related violations, such as violations of design decisions, design principles, and architecture patterns; the most common linguistic pattern (90.4%) used to express violation symptoms is Problem Discovery. Refactoring is the major measure that developers used to address violation symptoms, no matter whether the smelly code is integrated (i.e., 77.7% of refactorings happened in the merged patches) or


not (i.e., 82.9% of refactorings happened in the abandoned patches). This finding indicates that code review can help reduce violation symptoms and increase system quality.

Our findings encourage researchers to investigate violation symptoms in various artifacts (e.g., pull requests, issues, and developer mailing lists) in order to provide more comprehensive evidence for validating and consolidating the findings. The most frequently-used terms and the recurring linguistic patterns used to express violation symptoms can help researchers and practitioners better understand and be aware of the natural language commonly used by developers to describe violation symptoms of architecture erosion. Besides, explicitly managing violation symptoms can, to some extent, help reduce the occurrence of architecture violations and prevent future violations during development and maintenance.

Developers usually discuss and address design-related issues in artifacts such as commits, issues, and pull requests [61]. In this context, we plan to construct classification models based on textual artifacts with machine learning and deep learning techniques for the purpose of automatically notifying developers about potential violation symptoms of architecture erosion during development, for example, as a plugin to the Gerrit tool during the code review process. We also plan to invite practitioners to evaluate the effectiveness and efficiency of the proposed classification models and the tool in assisting developers with detecting violation symptoms.

CRediT authorship contribution statement

Ruiyin Li: Conceptualization, Investigation, Writing – original draft preparation, Visualization. Peng Liang: Conceptualization, Investigation, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition. Paris Avgeriou: Conceptualization, Investigation, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

We have shared the link to our dataset in the reference [29].

Acknowledgments

This work has been partially supported by the National Natural Science Foundation of China (NSFC) with Grant No. 62172311 and the Special Fund of Hubei Luojia Laboratory.

References

[1] R. Li, P. Liang, M. Soliman, P. Avgeriou, Understanding software architecture erosion: A systematic mapping study, J. Softw. Evol. Process. 34 (3) (2022) e2423.
[2] D.E. Perry, A.L. Wolf, Foundations for the study of software architecture, ACM SIGSOFT Softw. Eng. Notes 17 (4) (1992) 40–52.
[3] R. Li, P. Liang, M. Soliman, P. Avgeriou, Understanding architecture erosion: The practitioners' perceptive, in: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension (ICPC), IEEE, Madrid, Spain, 2021, pp. 311–322.
[4] C. Mendoza, J. Bocanegra, K. Garcés, R. Casallas, Architecture violations detection and visualization in the continuous integration pipeline, Softw. - Pract. Exp. 51 (8) (2021) 1822–1845.
[5] L. De Silva, D. Balasubramaniam, Controlling software architecture erosion: A survey, J. Syst. Softw. 85 (1) (2012) 132–151.
[6] S. Miranda, E. Rodrigues Jr., M.T. Valente, R. Terra, Architecture conformance checking in dynamically typed languages, J. Object Technol. 15 (3) (2016) 1–34.
[7] L. Pruijt, S. Brinkkemper, A metamodel for the support of semantically rich modular architectures in the context of static architecture compliance checking, in: Proceedings of the 11th Working IEEE/IFIP Conference on Software Architecture (WICSA) Companion, ACM, Sydney, NSW, Australia, 2014, pp. 1–8.
[8] R. Terra, M.T. Valente, A dependency constraint language to manage object-oriented software architectures, Softw. - Pract. Exp. 39 (12) (2009) 1073–1094.
[9] H. Rocha, R.S. Durelli, R. Terra, S. Bessa, M.T. Valente, DCL 2.0: modular and reusable specification of architectural constraints, J. Braz. Comput. Soc. 23 (1) (2017) 1–25.
[10] A. Caracciolo, M.F. Lungu, O. Nierstrasz, A unified approach to architecture conformance checking, in: Proceedings of the 12th Working IEEE/IFIP Conference on Software Architecture (WICSA), IEEE, Montreal, QC, Canada, 2015, pp. 41–50.
[11] L. Juarez Filho, L. Rocha, R. Andrade, R. Britto, Preventing erosion in exception handling design using static-architecture conformance checking, in: Proceedings of the 11th European Conference on Software Architecture (ECSA), Springer, Canterbury, UK, 2017, pp. 67–83.
[12] R. Li, M. Soliman, P. Liang, P. Avgeriou, Symptoms of architecture erosion in code reviews: A study of two OpenStack projects, in: Proceedings of the 19th IEEE International Conference on Software Architecture (ICSA), IEEE, Honolulu, Hawaii, USA, 2022, pp. 24–35.
[13] M. Paixao, J. Krinke, D. Han, C. Ragkhitwetsagul, M. Harman, The impact of code review on architectural changes, IEEE Trans. Softw. Eng. 47 (5) (2021) 1041–1059.
[14] A. Bacchelli, C. Bird, Expectations, outcomes, and challenges of modern code review, in: Proceedings of the 35th International Conference on Software Engineering (ICSE), IEEE, San Francisco, CA, USA, 2013, pp. 712–721.
[15] Z. Li, X. Qi, Q. Yu, P. Liang, R. Mo, C. Yang, Multi-programming-language commits in OSS: An empirical study on Apache projects, in: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension (ICPC), IEEE, Madrid, Spain, 2021, pp. 219–229.
[16] C.C. Venters, R. Capilla, S. Betz, B. Penzenstadler, T. Crick, S. Crouch, E.Y. Nakagawa, C. Becker, C. Carrillo, Software sustainability: Research and practice from a software architecture viewpoint, J. Syst. Softw. 138 (2018) 174–188.
[17] S. Hassaine, Y.-G. Guéhéneuc, S. Hamel, G. Antoniol, ADvISE: Architectural decay in software evolution, in: Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR), IEEE, Szeged, Hungary, 2012, pp. 267–276.
[18] D.M. Le, D. Link, A. Shahbazian, N. Medvidovic, An empirical study of architectural decay in open-source software, in: Proceedings of the 15th IEEE International Conference on Software Architecture (ICSA), IEEE, Seattle, WA, USA, 2018, pp. 176–185.
[19] J. Lenhard, M. Blom, S. Herold, Exploring the suitability of source code metrics for indicating architectural inconsistencies, Softw. Qual. J. 27 (1) (2019) 241–274.
[20] L. Hochstein, M. Lindvall, Combating architectural degeneration: A survey, Inf. Softw. Technol. 47 (10) (2005) 643–656.
[21] M. Mair, S. Herold, A. Rausch, Towards flexible automated software architecture erosion diagnosis and treatment, in: Proceedings of the 11th Working IEEE/IFIP Conference on Software Architecture (WICSA) Companion, ACM, Sydney, NSW, Australia, 2014, pp. 1–6.
[22] D.M. Le, C. Carrillo, R. Capilla, N. Medvidovic, Relating architectural decay and sustainability of software systems, in: Proceedings of the 13th Working IEEE/IFIP Conference on Software Architecture (WICSA), IEEE, Venice, Italy, 2016, pp. 178–181.
[23] S. Bhattacharya, D.E. Perry, Architecture assessment model for system evolution, in: Proceedings of the 6th Working IEEE/IFIP Conference on Software Architecture (WICSA), IEEE, Mumbai, Maharashtra, India, 2007, pp. 44–53.
[24] V.R. Basili, G. Caldiera, H.D. Rombach, The goal question metric approach, Encycl. Softw. Eng. 1 (1994) 528–532.
[25] A.R. Da Silva, Linguistic patterns and linguistic styles for requirements specification (i): an application case with the rigorous RSL/Business-level language, in: Proceedings of the 22nd European Conference on Pattern Languages of Programs (EuroPLoP), ACM, Irsee, Germany, 2017, pp. 1–27.
[26] Y. Kashiwa, R. Nishikawa, Y. Kamei, M. Kondo, E. Shihab, R. Sato, N. Ubayashi, An empirical study on self-admitted technical debt in modern code review, Inf. Softw. Technol. 146 (2022) 106855.
[27] P. Thongtanunam, S. McIntosh, A.E. Hassan, H. Iida, Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects, Empir. Softw. Eng. 22 (2) (2017) 768–817.
[28] T. Hirao, S. McIntosh, A. Ihara, K. Matsumoto, Code reviews with divergent review scores: An empirical study of the OpenStack and Qt communities, IEEE Trans. Softw. Eng. 48 (2) (2022) 69–81.
[29] R. Li, P. Liang, P. Avgeriou, Replication package for the paper: Warnings: Violation symptoms indicating architecture erosion, 2022, http://dx.doi.org/10.5281/zenodo.7054370.
[30] V. Efstathiou, C. Chatzilenas, D. Spinellis, Word embeddings for the software engineering domain, in: Proceedings of the 15th International Conference on Mining Software Repositories (MSR), ACM, Gothenburg, Sweden, 2018, pp. 38–41.


[31] A. Bosu, J.C. Carver, M. Hafiz, P. Hilley, D. Janni, Identifying the characteristics of vulnerable code changes: An empirical study, in: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), ACM, Hong Kong, China, 2014, pp. 257–268.
[32] X. Han, A. Tahir, P. Liang, S. Counsell, Y. Luo, Understanding code smell detection via code review: A study of the OpenStack community, in: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension (ICPC), IEEE, Madrid, Spain, 2021, pp. 323–334.
[33] S. Bird, E. Klein, E. Loper, Natural language processing with Python: Analyzing text with the natural language toolkit, Lang. Resour. Eval. 44 (2010) 421–424.
[34] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20 (1) (1960) 37–46.
[35] K. Charmaz, Constructing Grounded Theory, Sage, 2014.
[36] K.-J. Stol, P. Ralph, B. Fitzgerald, Grounded theory in software engineering research: A critical review and guidelines, in: Proceedings of the 38th International Conference on Software Engineering (ICSE), ACM, Austin, TX, USA, 2016, pp. 120–131.
[37] J. Grundy, J. Hosking, W.B. Mugridge, Inconsistency management for multiple-view software development environments, IEEE Trans. Softw. Eng. 24 (11) (1998) 960–981.
[38] D. Garlan, R. Allen, J. Ockerbloom, Architectural mismatch: Why reuse is so hard, IEEE Softw. 12 (6) (1995) 17–26.
[39] D. Garlan, R. Allen, J. Ockerbloom, Architectural mismatch: Why reuse is still so hard, IEEE Softw. 26 (4) (2009) 66–69.
[40] R.C. Martin, M. Martin, Agile Principles, Patterns, and Practices in C#, first ed., Pearson, 2006.
[41] R.C. Martin, Agile Software Development: Principles, Patterns, and Practices, Prentice Hall PTR, 2003.
[42] I. Macia, R. Arcoverde, A. Garcia, C. Chavez, A. von Staa, On the relevance of code anomalies for identifying architecture degradation symptoms, in: Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR), IEEE, Szeged, Hungary, 2012, pp. 277–286.
[43] L. Bass, P. Clements, R. Kazman, Software Architecture in Practice, fourth ed., Addison-Wesley Professional, 2021.
[44] L. Bass, P. Clements, R. Kazman, Software Architecture in Practice, third ed., Addison-Wesley Professional, 2012.
[45] R. Terra, M.T. Valente, K. Czarnecki, R.S. Bigonha, A recommendation system for repairing violations detected by static architecture conformance checking, Softw. - Pract. Exp. 45 (3) (2015) 315–342.
[46] A. Di Sorbo, S. Panichella, C.A. Visaggio, M. Di Penta, G. Canfora, H. Gall, DECA: Development emails content analyzer, in: Proceedings of the 38th IEEE/ACM International Conference on Software Engineering (ICSE) Companion, ACM, Austin, TX, USA, 2016, pp. 641–644.
[47] A. Di Sorbo, S. Panichella, C.V. Alexandru, J. Shimagaki, C.A. Visaggio, G. Canfora, H.C. Gall, What would users change in my app? Summarizing app reviews for recommending software changes, in: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), ACM, Seattle, WA, USA, 2016, pp. 499–510.
[48] A. Di Sorbo, S. Panichella, C.V. Alexandru, C.A. Visaggio, G. Canfora, SURF: Summarizer of user reviews feedback, in: Proceedings of the 39th IEEE/ACM International Conference on Software Engineering (ICSE) Companion, IEEE, Buenos Aires, Argentina, 2017, pp. 55–58.
[49] Q. Huang, X. Xia, D. Lo, G.C. Murphy, Automating intention mining, IEEE Trans. Softw. Eng. 46 (10) (2018) 1098–1119.
[50] S. Sarkar, S. Ramachandran, G.S. Kumar, M.K. Iyengar, K. Rangarajan, S. Sivagnanam, Modularization of a large-scale business application: A case study, IEEE Softw. 26 (2) (2009) 28–35.
[51] A. Bosu, J.C. Carver, C. Bird, J. Orbeck, C. Chockley, Process aspects and social dynamics of contemporary code review: Insights from open source development and industrial practice at Microsoft, IEEE Trans. Softw. Eng. 43 (1) (2016) 56–75.
[52] G.C. Murphy, D. Notkin, K. Sullivan, Software reflexion models: Bridging the gap between source and high-level models, in: Proceedings of the 3rd ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE), ACM, Washington, DC, USA, 1995, pp. 18–28.
[53] K.-B. Schultis, C. Elsner, D. Lohmann, Architecture-violation management for internal software ecosystems, in: Proceedings of the 13th Working IEEE/IFIP Conference on Software Architecture (WICSA), IEEE, Venice, Italy, 2016, pp. 241–246.
[54] C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in Software Engineering, Springer Science & Business Media, 2012.
[55] J. Brunet, R.A. Bittencourt, D. Serey, J. Figueiredo, On the evolutionary nature of architectural violations, in: Proceedings of the 19th Working Conference on Reverse Engineering (WCRE), IEEE, Kingston, ON, Canada, 2012, pp. 257–266.
[56] C. Maffort, M.T. Valente, R. Terra, M. Bigonha, N. Anquetil, A. Hora, Mining architectural violations from version history, Empir. Softw. Eng. 21 (3) (2016) 854–895.
[57] S. Schröder, M. Riebisch, Architecture conformance checking with description logics, in: Proceedings of the 11th European Conference on Software Architecture (ECSA) Companion, ACM, Canterbury, United Kingdom, 2017, pp. 166–172.
[58] Y. Zhou, A. Sharma, Automated identification of security issues from commit messages and bug reports, in: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering (FSE), ACM, Paderborn, Germany, 2017, pp. 914–919.
[59] A. Uchôa, C. Barbosa, W. Oizumi, P. Blenilio, R. Lima, A. Garcia, C. Bezerra, How does modern code review impact software design degradation? An in-depth empirical study, in: Proceedings of the 36th IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE, Adelaide, Australia, 2020, pp. 511–522.
[60] M. Paixão, A. Uchôa, A.C. Bibiano, D. Oliveira, A. Garcia, J. Krinke, E. Arvonio, Behind the intents: An in-depth empirical study on software refactoring in modern code review, in: Proceedings of the 17th International Conference on Mining Software Repositories (MSR), ACM, Seoul, South Korea, 2020, pp. 125–136.
[61] J. Brunet, G.C. Murphy, R. Terra, J. Figueiredo, D. Serey, Do developers discuss design? in: Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), IEEE, Hyderabad, India, 2014, pp. 340–343.
