
An extension for Transactional Memory in OpenMP

Andre D. Jardim, andre.jardim@inf.ufpel.edu.br
Kevin Oliveira, kodoliveira@inf.ufpel.edu.br
Diogo J. Cardoso, diogo.jcardoso@inf.ufpel.edu.br
Daniel Di Domenico, ddomenico@inf.ufpel.edu.br
Andre R. Du Bois, dubois@inf.ufpel.edu.br
Gerson G. H. Cavalheiro, gerson.cavalheiro@inf.ufpel.edu.br

All authors: Programa de Pós-Graduação em Computação, Federal University of Pelotas, Pelotas, Rio Grande do Sul, Brazil

ABSTRACT
The Transactional Memory model was proposed as a mechanism offering a higher-level programming interface to abstract some of the complexities associated with simultaneous access to shared data. Although modern tools for multithreaded programming offer resources, such as programming interfaces and scheduling facilities, for efficient hardware exploitation, their support for shared data synchronization still reflects classic critical section-based models. This work proposes an extension to OpenMP, a de facto standard for multithreaded programming, that offers the Transactional Memory model. Unlike other approaches found in the literature to extend OpenMP with Transactional Memory, we propose an interface that not only provides access to a transactional memory but also reflects the OpenMP programming style. A specification of the OpenMP extension is presented, and a prototype implementation is evaluated with the help of software transactional memory tools: the TinySTM library and the TM support offered by the GNU C Compiler (GCC). The proposed interface and its prototype are presented in the form of an intermediate language, Vanilla-TM, and the interface was validated based on the analysis of the results obtained. These results point to the viability of incorporating the proposed extension in an OpenMP dialect, and the analysis of the experiments allowed us to conclude that the policies applied for TM management are decisive for good program performance.

CCS CONCEPTS
• Software and its engineering → Application specific development environments.

KEYWORDS
Multithreaded Programming, Transactional Memory, OpenMP

ACM Reference Format:
Andre D. Jardim, Kevin Oliveira, Diogo J. Cardoso, Daniel Di Domenico, Andre R. Du Bois, and Gerson G. H. Cavalheiro. 2021. An extension for Transactional Memory in OpenMP. In 25th Brazilian Symposium on Programming Languages (SBLP’21), September 27-October 1, 2021, Joinville, Brazil. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3475061.3475089

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SBLP’21, September 27-October 1, 2021, Joinville, Brazil
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-9062-0/21/09. . . $15.00
https://doi.org/10.1145/3475061.3475089

1 INTRODUCTION
The Transactional Memory (TM) model has emerged as a promising alternative to synchronization mechanisms based on mutual exclusion. Transactional memory, which can be implemented in hardware or software, offers an alternative with a higher level of abstraction to classic synchronization mechanisms, providing a new synchronization control construct that avoids common blocking problems and significantly simplifies the programming effort to produce correct software [7]. A strong argument for the employment of TM is based on its advantages in comparison with the use of mutexes: scalability, composability, robustness and contention reduction [8, 14, 17].

Transactional memory offers, as one of its main objectives, an easier way to implement atomic updates of multiple independent data, avoiding problems related to locks (liveness problems, priority inversion and convoying). The use of TM provides the programmer with a more abstract parallel programming model, with compositional capacity and high performance for multiprocessor systems, which makes it a promising programming model [14]. Transactional memories, although they have existed since the 1990s, have not been effectively absorbed into the new generation of tools for multithreaded programming.

Modern APIs (Application Programming Interfaces), such as OpenMP, TBB and even C++, considering the concurrent programming resources introduced in recent standards, do not natively include a TM programming model. Some works in the literature, such as [1, 10, 19, 20], show efforts to achieve this inclusion through extensions. Commonly, these proposals present a transaction-handling interface that extends the one offered by OpenMP by introducing a new specific directive to handle transactions.
This approach can be seen as a great starting point for studying transactions within OpenMP.

OpenMP offers different mechanisms for data sharing between tasks in the same parallel region. However, there is a gap that can be covered by the adoption of transactional memory support. For that, this support must respect the programming practices of the tool, not introducing a resource totally foreign to it, like those supported by a third-party library, and must extend its expressive power. Furthermore, the TM support must reduce the need to include in algorithms execution control mechanisms based on mutual exclusion. New solutions for applying TM in programming tools have emerged in recent years, such as its association with the garbage collector [13] or futures [15]. We also found approaches extending the application of TM to programming languages, such as C++ [21] and Python [16]. The present proposal also distances itself from the classic model of STM library implementations [5], like TinySTM and SwissTM, where the programmer is required to indicate explicitly, through a service provided by the library, the addresses of the variables to be considered in a transaction.

This article proposes to extend OpenMP by incorporating a new memory layer, the Transactional Memory (TM), shared among tasks. This is done by adding a new feature to its programming interface that allows variables to be bound to this memory layer. The gain is that the semantics of operations on transactional memory are associated with those of the other operations already available in OpenMP. The objective of the work is to extend the state of the art in interfaces for multithreaded concurrent programming by introducing resources for handling transactional memories in well-established programming tools. The introduction of transactional memory mechanisms in multithreaded programming tools is studied with regard to programming interface issues. This work assumes that effective adoption can occur when the use of TM is in line with the design decisions of the native programming interface. The proposed approach addresses this issue by proposing programming features compatible with those already offered in OpenMP.

The remainder of this paper is organized as follows. Sections 2 and 3 provide a theoretical basis for the topics covered in the article, describing Transactional Memories and OpenMP, respectively. Section 4 discusses works in the literature with similar proposals for integrating TM with OpenMP. In Section 5, the interface proposed in the article is presented and an OpenMP memory model with transactions is contemplated. Section 5 also presents the prototyping of the proposed interface in the form of an intermediate language. Section 6 presents an analysis comparing the source code of our interface with other approaches. Section 7 presents a performance evaluation regarding the proposed extension to OpenMP. Finally, Section 8 concludes the work and enumerates the possibilities for future work.

2 TRANSACTIONAL MEMORY
Transactional Memory, or TM, is a model built over a mechanism that promotes synchronization between competing threads. This work focuses on STM (Software Transactional Memory). STM is a software-level concurrency control mechanism that enables programmers to partition code into transactions and ensure that they run atomically and in isolation from each other. STM can simplify concurrent programming, facilitating data protection, allowing for sequential reasoning between transactions and eliminating part of the concurrency bugs [7], such as deadlocks or data races.

The synchronization model offered by Transactional Memory presents itself as an alternative to methods based on mutexes, offering higher levels of abstraction in programming with atomic operations. Also, due to the characteristics of atomicity and isolation, transactional systems can be conceived [7], and consequently implemented, to explore more parallelism, increasing their scalability and performance.

Unlike the lock-based mechanism imposed by the use of mutexes, the transactional memory model explores the concept of atomic transactions. A programmer defines a transaction by placing a set of programming language instructions inside an atomic block. This block represents a critical section and should only contain statements with reversible effects. A runtime system allows threads to execute atomic blocks concurrently, making it appear that only one thread at a time executes within an atomic block. If a transaction running concurrently conflicts with another transaction, the runtime aborts it (that is, undoes its effects) and tries again later; otherwise, it commits the transaction and makes its effects visible to all other threads. The runtime basically enforces the well-known atomicity, consistency and isolation properties of database transactions, applied to programming language instructions [12]. It is worth noting that the concurrency control of transactional memory may differ from lock-based mechanisms. TM often uses optimistic concurrency control, where ownership acquisition and validation of shared objects only occur at commit time, whereas concurrency control for locks is pessimistic: ownership acquisition is immediate, preventing other threads from accessing the shared object.

3 OPENMP
The OpenMP (Open Multi-Processing) specification [4, 11] defines an API to explore concurrency in C/C++ and Fortran programs through a multithreaded programming tool with a high degree of portability. OpenMP focuses on shared memory environments and its API includes the specification of environment variables, a service library and a set of directives. These directives make it possible to describe the concurrency of a program and define the synchronization between concurrent activities. As it was conceived, the indication of concurrent sections in the code is carried out explicitly, annotating an original (sequential) code with the specified directives.

The OpenMP programming interface applies directives as the basic element to describe the concurrency of a program and introduce synchronization points. These directives must be used in a parallel region delimited by parallel. Directives have the form: #pragma omp <directive> [clauses] (in this work, we limit the discussion of the syntax to the OpenMP standard for the C/C++ language). Regarding this syntax, #pragma omp instructs the preprocessor that an OpenMP directive will be expanded, so that its corresponding code is generated. A directive is made up of one or more commands, which may be followed by clauses.
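The directive form #pragma omp <directive> [clauses] can be illustrated with a minimal sketch in standard OpenMP for C. The function names sum_reduction and count_even_critical are hypothetical and chosen only for this illustration. The first function relies on the reduction clause; the second protects a shared counter with the critical directive, the lock-style synchronization that the paper contrasts with transactions.

```c
#include <stddef.h>

/* Hypothetical helper: sums an array with a work-sharing loop.
 * shared() and reduction() are standard OpenMP data-sharing clauses. */
long sum_reduction(const int *v, size_t n) {
    long total = 0;
    #pragma omp parallel for shared(v, n) reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += v[i];   /* each thread accumulates a private copy */
    }
    return total;        /* private copies are combined at the end */
}

/* Hypothetical helper: counts even values. The shared counter is
 * updated inside a critical section, i.e. under mutual exclusion. */
long count_even_critical(const int *v, size_t n) {
    long count = 0;
    #pragma omp parallel for shared(v, n, count)
    for (long i = 0; i < (long)n; i++) {
        if (v[i] % 2 == 0) {
            #pragma omp critical
            count++;     /* only one thread at a time runs this line */
        }
    }
    return count;
}
```

Compiled without OpenMP support, the pragmas are ignored and both functions run sequentially with the same result; with GCC, the -fopenmp flag enables the parallel version.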

OpenMP directives support several clauses, optional additions that provide a simple and powerful way to control the behavior of the construct to which they apply. In fact, some of these clauses are almost indispensable in practice. They may include, e.g., the syntax required to specify which variables are shared and which are private in the code for each task. An identifier in a program refers to the memory address where the shared variable is stored; the clauses define how the variables will be accessed, via identifiers, in the concurrent code units.

In an OpenMP program, threads can communicate by regular read/write operations on variables in the shared address space. Although communication in an OpenMP program is implicit, it is necessary to coordinate access to variables shared by multiple threads in order to ensure correct execution. This is done by using the available synchronization mechanisms and, above all, the task parameterization clauses.

4 RELATED WORK
This section discusses the works in the literature that proposed programming interface extensions for multithreaded programming tools to contemplate the use of transactional memory.

Milovanović et al. [9, 10] presented the Nebelung framework for handling transactional memory and its use combined with OpenMP. Nebelung provides a transactional memory mechanism in software. Its implementation has a multiprocessed execution core and a compiler, Mercurium. Both papers discuss language design issues in the association of OpenMP with transactional memories.

The work of Baek et al. [1] introduces OpenTM, an OpenMP extension including directives to express non-blocking synchronization and speculative parallelization based on memory transactions. OpenTM inherits the OpenMP execution model, memory semantics, language syntax and runtime constructs. Therefore, any OpenMP program is a legitimate OpenTM program. Non-transactional parallel or sequential code behaves exactly as described in the OpenMP specification.

In [20], the authors explore the potential of transactional memory for OpenMP applications. In the proposal, a software transactional memory system provides a transaction primitive. This primitive is combined with an OpenMP implementation that offers all other shared memory functionalities. The presented system uses OpenMP to generate threads and parallelize programs. The transactions indicated through the transactional memory handling interface serve as an alternative to OpenMP synchronization.

In [2] and [19], the authors continue the work previously described in [20]. The paper presents results using hardware transactional memory in the Blue Gene/Q system from IBM. The article also shows how this transactional memory system can significantly reduce the complexity of shared memory programming while maintaining efficiency. The authors thus expand the results presented in [20], specifying a programming interface for transactional memory in OpenMP.

5 PROPOSED INTERFACE
In the tools discussed as related works, the use of transactions in OpenMP is enabled by a new directive, transaction. This aspect can be considered positive, since the approaches aim at extending the expressive capacity of OpenMP without introducing a library orthogonal to this language.

However, the OpenMP standard defines the semantics of accessing shared data through clauses associated with parallelization directives, such as shared, private and reduction. Thus, it is understood that the semantics of access to data handled by transactions should also be defined in the same way, that is, by clauses. In this work, transaction is a clause, not a directive, since it defines a data-sharing semantics. A transaction clause, with CRCW (Concurrent Read, Concurrent Write) semantics for data access, should provide consistency in accessing shared data in both read and write operations.

5.1 The transaction Clause
The transaction clause allows TM to be integrated into the way parallel code is written in OpenMP. With this, it is possible to use TM without changing the form of expression of the programming model provided by the OpenMP API. The proposed new clause is called transaction. The (simplified) grammar of this extension is shown in Figure 1.

Figure 1: Proposed Interface.

The keyword transaction makes explicit the use of transactional memory in an OpenMP program with the proposed extension. The transaction clause associated with the parallel directive indicates that, in this parallel region, tasks will be able to access the transactional memory. In practice, the construction parallel transaction creates a transactional memory in the memory hierarchy and binds the threads in the pool to it. In other directives, the transaction clause is parameterized with a list of identifiers <id>. In this case, all tasks access any variable belonging to <id> in a transactional way. Regarding the identifiers, it is emphasized that they must belong to the scope of the created tasks. If an identifier corresponds to a pointer, the transactional variable is considered to be the one referenced by the pointer.

The operator designated by <op> identifies the preferential mode of access, as discussed below, to the transactional variable. This access can be characterized as preferential in reading, writing or reading and writing, using the respective operators: R, W or RW. In the absence of the operator's specification, it is assumed that the transaction operates on the data in reading and writing. Another access modifier can be given by the options A (adopt, default option) and D (defer). This modifier indicates whether the task being created will perform the transaction on the data itself, or whether a nested task will perform the transaction.
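To make the intent of the clause concrete, the sketch below shows a shared histogram update. The grammar of Figure 1 is not reproduced in this text, so the spelling transaction(RW: hist) in the comment is an assumption, as is the function name histogram. Because the clause is not part of standard OpenMP, the compilable body uses the critical directive, which is the construct the clause aims to replace.

```c
#include <stddef.h>

#define NBINS 4

/* Proposed form (hypothetical spelling, assumed from the description of
 * <op> and <id>; not accepted by current OpenMP compilers):
 *
 *   #pragma omp parallel for transaction(RW: hist)
 *   for (long i = 0; i < (long)n; i++)
 *       hist[data[i] % NBINS]++;
 *
 * Standard OpenMP equivalent used below: the update of the shared
 * array is serialized with a critical section. */
void histogram(const unsigned *data, size_t n, unsigned hist[NBINS]) {
    for (int b = 0; b < NBINS; b++) {
        hist[b] = 0;
    }

    #pragma omp parallel for shared(data, n, hist)
    for (long i = 0; i < (long)n; i++) {
        #pragma omp critical
        hist[data[i] % NBINS]++;   /* read-modify-write on shared data */
    }
}
```

In the proposed form, the binding of hist to the transactional memory would be declared once in the clause, in the same place where shared, private or reduction would appear, rather than by wrapping each update in a separate directive.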

The transaction defined by a transaction clause corresponds to the command block that follows it. If other tasks are created in this block, access to the variables monitored by the transaction follows the default mode of the respective directive or the mode given explicitly, such as private, reduction or even transaction again.

When applied to a sections directive, transactions are bound by the scope of the tasks created in the different section blocks. Applied to the for directive, the clause implies that each created task, for each generated chunk, will be a transaction. Used with the task directive, the transaction corresponds to the code associated with the created task.

The transaction clause aims, in addition to simplifying the parallel code, to avoid the use of critical sections or blocks, such as those supported by the critical directive. Another proposed feature is the possibility of specifying the preferred access mode (<op>) to the transactional variables.

5.2 Prototyping
The prototyping of the proposed extension exploits the Vanilla-TM interface. This interface consists of an intermediate language designed to allow the proposed extension to be supported by different tools (libraries) that provide transactional memory in software. The architectural model is shown in Figure 2. The application program is developed in OpenMP, using all the resources available in the OpenMP specification, with the new transaction clause. The addition of the transaction clause to the OpenMP set of services, supported by Vanilla-TM, is backed by the selected STM tool, without interfering with the OpenMP operation mode and without using any facilities offered by this standard.

Figure 2: Architecture of the proposed OpenMP environment.

Figure 3 shows the step inserted in the process of generating the executable that is responsible for handling the transaction clause. A code translation procedure identifies, in the application's source program, the use of the proposed clause and transforms the code to obtain a new OpenMP program with service calls to the tool adopted to support TM.

Figure 3: Process for obtaining the code.

It is important to note that Vanilla-TM is not effectively a new library for handling TM in C programs. It is an intermediate representation that allows the portability of implementations between different programming tools with TM support. Among the tools that support the TM model in software, this article points out TinySTM [18] and GCC-TM [6] as alternatives for the implementation of Vanilla-TM. The first is an implementation of TM in the form of a library, compatible with C/C++ programs. The second is the support in the GNU compiler for specifying transactional memories in the C/C++ language.

The Vanilla-TM interface consists of an intermediate language. Its specification aims to meet the needs of different tools offering resources for exploring transactional memory in software. The data types handled in this intermediate language aim to allow the communication of the data effectively handled by these libraries. These data types are manipulated in an opaque way, that is, they are manipulated only through primitives and their fields are never accessed directly.

6 SOURCE CODE ANALYSIS
As a way to analyse our proposal of a new transaction clause for the OpenMP specification to support TM, we define some metrics and analyse the source code of different approaches to transactional memory in OpenMP. These metrics were applied to programs in the Cowichan benchmark, which are meant to evaluate the expressiveness of concurrent and parallel programming interfaces. We compared the source code of the original OpenMP implementations of Cowichan with the extensions proposed in [10] and [19], in addition to OpenMP versions that use TM support such as TinySTM and GCC-TM. Of the 14 problems present in Cowichan, we considered 6 that are fit to use transactions.

The metrics we use to measure the difference in source code are based on the Goal Question Metric (GQM) approach [3]. The GQM approach assumes that some properties of projects can be measured. The data provided by these measures can help many administrative decisions, such as selecting a programming tool. Following this approach, we selected five questions that can be seen in Table 1, and our goal is to “characterize the use of the different interfaces for TM in OpenMP”.

In Table 2, we present the resulting data of the source code analysis. We can observe that the transaction clause we propose allows for a reduction in lines of code (NCL) and invocations of parallelization directives (NIPD) compared to the original OpenMP source code. The number of directives and clause invocations (ARU) is very close to that of the other implementations, but still lower than Nebelung [10], which had the highest value in this metric.

A smaller number of code lines in all Cowichan problems evaluated can be observed with the use of the transaction clause, which had similar results as Nebelung [10] with Wong [19] very
close but with a higher number. Regarding the use of resources, our proposal had results similar to [19]. It is also noticeable that, in general, every implementation with TM support had better results than the original OpenMP.

In terms of how the transactional blocks are generated, the transaction clause differs from the other implementations because of how the identifiers can be used. In the implementation by Wong [19], either the entire parallel section is used as a transaction or a specific directive is used to dictate which parts of the code may have access to the TM. In contrast, in Nebelung [10], it is possible to dictate which variables are transactional variables; however, that has to be done with an additional clause. Furthermore, none of the other implementations allow for the specification of the type of access (read or write) in the transactional identifiers.

The transaction clause we propose makes it possible to define identifiers for transactional memory directly in the clause itself, as is already possible with other OpenMP clauses that focus on shared data access control. Thus, specifying and controlling the transaction in the parallel section becomes simpler and more compatible with other OpenMP clauses.

When analysing the implementations with TinySTM and GCC-TM, we identified a clear difference in the level of abstraction these implementations have compared to our interface. The programming interface provided by TinySTM and GCC-TM requires the programmer to provide the memory address that will be manipulated in a transactional manner. This results in a situation where a given variable in the TM is not dependent on the scope of its identifier. In our interface, in contrast to TinySTM or GCC-TM, transactional access to a variable is allowed only in the structured block associated with the current OpenMP directive. If the variable is passed as a parameter by reference to a function, any access through the reference is not observed by the TM manager.

Table 1: Goal Question Metric table.

Q1. What is the size of the source code?
    Metric:     NCL - Number of Code Lines
    Definition: Total number of lines of code in the source code, including instructions and API calls.
    Comment:    This count does not include blank lines and comments in the source file.
Q2. How many resources of the OpenMP API are used?
    Metric:     ARU - Amount of Resources Used
    Definition: Shows how many resources of the OpenMP API are necessary for the implementation.
    Comment:    This counts all unique directives, clauses and function calls defined in OpenMP.
Q3. How many parallelization directives are used?
    Metric:     NIPD - Number of Invocations to Parallelization Directives
    Definition: Total number of occurrences of OpenMP directives that create tasks.
    Comment:    Characterizes the amount of parallel blocks necessary for the implementation.
Q4. What is the ratio of resources to the source code size?
    Metric:     Rel1 - Relation between Number of Code Lines and Resources Used
    Definition: Rel1 = ARU / NCL (the ratio reported in Table 2).
    Comment:    The bigger this number is, the more resources are used.
Q5. What is the ratio of parallel zones to source code size?
    Metric:     Rel2 - Relation between Number of Code Lines and Number of Parallelization Directives
    Definition: Rel2 = NIPD / NCL (the ratio reported in Table 2).
    Comment:    This shows a relative amount of parallel blocks compared to the size of the code.

Table 2: Comparison between OpenMP code of the Cowichan problems.

Metric      Problem   transaction  original  Nebelung  Wong   GCC-TM  TinySTM
NCL         hull      152          208       152       155    167     180
NCL         norm      60           95        60        60     68      78
NCL         outer     31           53        31        34     38      45
NCL         sor       41           61        41        44     48      55
NCL         thresh    59           102       59        59     70      86
NCL         vecdiff   20           42        20        23     28      35
ARU         hull      3            2         4         3      3       3
ARU         norm      3            2         4         3      2       2
ARU         outer     4            3         5         4      3       3
ARU         sor       4            4         5         4      3       3
ARU         thresh    3            3         4         3      2       3
ARU         vecdiff   3            3         4         3      3       3
NIPD        hull      3            5         3         4      5       5
NIPD        norm      2            3         2         2      3       3
NIPD        outer     3            4         3         4      4       4
NIPD        sor       1            2         1         2      2       2
NIPD        thresh    7            9         7         7      9       9
NIPD        vecdiff   1            2         1         2      2       2
ARU / NCL   hull      0.02         0.01      0.03      0.02   0.02    0.02
ARU / NCL   norm      0.05         0.02      0.07      0.05   0.03    0.03
ARU / NCL   outer     0.13         0.06      0.16      0.12   0.08    0.07
ARU / NCL   sor       0.10         0.07      0.12      0.09   0.06    0.05
ARU / NCL   thresh    0.05         0.03      0.07      0.05   0.03    0.03
ARU / NCL   vecdiff   0.15         0.07      0.20      0.13   0.11    0.09
NIPD / NCL  hull      0.02         0.02      0.02      0.03   0.03    0.03
NIPD / NCL  norm      0.03         0.03      0.03      0.03   0.04    0.04
NIPD / NCL  outer     0.10         0.08      0.10      0.12   0.11    0.09
NIPD / NCL  sor       0.02         0.03      0.02      0.05   0.04    0.04
NIPD / NCL  thresh    0.12         0.09      0.12      0.12   0.13    0.10
NIPD / NCL  vecdiff   0.05         0.05      0.05      0.09   0.07    0.03

7 EXPERIMENTATION AND PERFORMANCE ANALYSIS
The experimentation was carried out considering two sets of programs. The first is represented by the Cowichan [2] benchmark applications. Programs from this benchmark were implemented to validate the interface. The second set of programs consists of an application of the STAMP [18] benchmark (bayes) and a program developed as a benchmark for OpenMP (kmeans). The experiments in steps 1 and 2 were executed using two NUMA platforms: Hydra (Opteron architecture with 64 cores, 4 nodes, 120 GB RAM) and Tekoha (Xeon architecture with 192 cores, 8 nodes, 120 GB RAM).

The performance index collected is the execution time, presented in seconds, obtained as an average of at least 30 executions in each case. In the presentation of these averages, it is highlighted whether the samples of each case adhere to a normal distribution or not. For this, the Kolmogorov-Smirnov test was performed with 95% confidence. The comparisons between cases use appropriate statistical tests to rank the performance of two cases. When samples adhere to a normal curve, Student's t-test was used; otherwise the Mann-Whitney U test was used, both also with 95% confidence.

The case studies reflect a combination of the versions of the programs to be compared, different execution supports, different architectures (in steps 1 and 2) and different numbers of threads in the OpenMP runtime support.
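The measurement procedure described for Section 7 (execution time in seconds, averaged over at least 30 executions) could be sketched as follows. This is only an illustration under stated assumptions: the names time_once, mean and mean_time are hypothetical, the timer is the C11 timespec_get wall clock rather than whatever the authors used, and the normality and hypothesis tests mentioned in the text are not implemented here.

```c
#include <stddef.h>
#include <time.h>

#define RUNS 30  /* the text reports averages of at least 30 executions */

/* Wall-clock seconds spent in one call of fn (C11 timespec_get).
 * TIME_UTC is a real-time clock, so system clock adjustments during
 * a run can distort an individual sample. */
double time_once(void (*fn)(void)) {
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    fn();
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Arithmetic mean of n samples; returns 0 for an empty sample set. */
double mean(const double *x, size_t n) {
    if (n == 0) {
        return 0.0;
    }
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Runs fn RUNS times, stores every sample and returns the mean time.
 * The stored samples can then be passed to a statistics tool for the
 * normality and comparison tests. */
double mean_time(void (*fn)(void), double samples[RUNS]) {
    for (int r = 0; r < RUNS; r++) {
        samples[r] = time_once(fn);
    }
    return mean(samples, RUNS);
}
```

Keeping the individual samples, rather than only the mean, is what makes the later per-case tests possible.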

7.1 Experimentation Step 1: Prototype Validation
The proposed interface was validated by implementing six Cowichan benchmark programs in which the applicability of the transactional memory model was possible: hull, norm, outer, sor, thresh and vecdiff. The Cowichan benchmark was not designed for performance analysis, but to characterize parallel programming tools in terms of their ability to express programming resources.

7.1.1 Performance Analysis. Three implementations were evaluated for the selected Cowichan problems. One in OpenMP (see footnote 2) and the other two obtained from this first version by adapting it through a prototype of the proposed interface, with Vanilla-TM versions supporting the transactional memory offered by TinySTM and GCC-TM. The experiments considered three input sizes of the problem, defined according to the nature of each application, identified as small, medium and large. Execution times for 2, 4, 8, 16, 32 and 64 threads in the OpenMP execution team were collected.

Tables 3, 4 and 5 exemplify the results obtained by presenting the average execution times obtained by running the programs on small, medium and large inputs with 32 threads in the OpenMP execution team.

Table 3: Case performance: small input, 32 threads (execution time in seconds).

          Hydra                        Tekoha
Program   OpenMP  TinySTM  GCC-TM     OpenMP  TinySTM  GCC-TM
hull      0.22    0.25     0.21       0.24    0.26     0.23
norm      0.06    0.15     4.76       0.04    0.08     7.07
outer     0.29    0.28     8.34       0.09    0.14     14.04
sor       4.25    0.28     0.26       0.07    0.09     0.07
thresh    1.13    16.13    34.60      0.14    30.93    51.19
vecdiff   0.1     0.25     22.36      0.03    0.17     39.2

Table 4: Case performance: medium input, 32 threads (execution time in seconds).

          Hydra                        Tekoha
Program   OpenMP  TinySTM  GCC-TM     OpenMP  TinySTM  GCC-TM
hull      0.8     0.94     0.81       0.74    0.73     0.74
norm      0.32    0.65     23.78      0.08    0.33     39.82
outer     0.47    0.68     33         0.39    0.54     59.38
sor       0.9     0.95     0.88       0.24    0.25     0.25
thresh    0.45    66.27    126.08     0.33    122.26   209.01
vecdiff   0.19    0.47     44.73      0.06    0.33     79.2

Table 5: Case performance: large input, 32 threads (execution time in seconds).

          Hydra                        Tekoha
Program   OpenMP  TinySTM  GCC-TM     OpenMP  TinySTM  GCC-TM
hull      2.42    9.15     149.65     2.42    13.27    267.01
norm      0.61    1.28     47.54      0.15    0.63     80.79
outer     0.81    4.53     131.83     1.92    2.24     241.51
sor       4.01    4.14     4.51       0.79    0.9      1.06
thresh    1.55    265.14   502.84     0.87    490.6    1221.75
vecdiff   0.86    2.5      221.55     0.19    1.43     399.66

In the tables presented, the cells marked in red correspond to [...] with red cells in Table 5. This case illustrates a situation where it is not possible to assert the superiority of OpenMP over TinySTM, or TinySTM over GCC-TM. In this program, the test applied to the results was unable to confirm, with 95% confidence, that the averages obtained by executions with the three tools belonged to different populations.

The overall analysis of the results allows inferring that pure OpenMP naturally produces better performance by using lower level resources. It is also observed that for different cases, sor and norm highlighted at this point, the use of TM (in particular with TinySTM) can be considered. For others, like thresh and hull, TM tools are not appropriate, at least not using the same algorithm implemented in OpenMP.

7.2 Experimentation Step 2: Analysis of the Prototype Behavior
In the second stage of experimentation, two programs originally developed for performance analysis were implemented with the proposed interface and their performance results were evaluated. One of these programs, bayes, belongs to the benchmark STAMP [18], developed to evaluate the performance of tools for programming with transactional memory. The second program was developed as a benchmark for OpenMP (see footnote 3). The objective of this second stage of experimentation is to position the proposed solution in relation to its applicability in problems conceived within the context of programming with transactional memory and problems designed to be applied over pure OpenMP.

7.2.1 Bayes. This program, in the STAMP benchmark, implements an algorithm for learning Bayesian networks. The complexity of the program is associated with the number of variables considered and the connection structure between these variables. These connections represent, in terms of probability, the conditional dependence between two variables. The launch of the program allows the parameterization of the size of the problem indicating the number of
the experiments whose samples did not adhere to a normal curve. variables to be treated and the degree of connectivity between the
That is, these averages do not represent samples whose distribution variables.
has been shown to adhere to a normal distribution. In this case, the In the original implementation of this problem4 , the likelihood
average times are presented in an illustrative way, but were not function is calculated for each variable and then the values obtained
used in the performance comparison between the tools. for each variable are accumulated. The calculation operation is
The result in which executions with pure OpenMP would per- performed concurrently and the accumulation operation to obtain
form better, due to the extra cost of adding TM management mech- the global value uses some synchronization mechanism.
anisms, was proven in almost all cases. The exception is in the 3 The code used as the basis for the implementation is available at https://github.com/
sor program when submitted to a large entry. This is highlighted manshi10/kmeans, accessed on November, 12, 2020.
4 The implementation of this benchmark used in this work was taken from the reposi-
2 Retrieved from https://code.google.com/archive/p/cowichan, accessed November 9, tory of one of its authors: https://github.com/kozyraki/stamp, accessed on September
2020. 7, 2020.
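The calculate-then-accumulate structure described for bayes in Section 7.2.1 can be sketched in plain OpenMP as follows. This is only an illustration of the pattern, not the STAMP code: the function names `variable_score` and `accumulate_scores` and the scoring formula are hypothetical, and the critical section stands in for "some synchronization mechanism". It marks the point where a transactional construct would be used instead.

```c
#include <stddef.h>

/* Hypothetical per-variable score. It stands in for the likelihood
   computation of one variable and is NOT the STAMP formula. */
static double variable_score(double x)
{
    return x * x;
}

/* Computes one score per variable concurrently and accumulates the
   results into a single global value. */
double accumulate_scores(const double *values, size_t n)
{
    double total = 0.0;

    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        /* Concurrent part: each iteration works on its own variable. */
        double local = variable_score(values[i]);

        /* Synchronized part: only one thread at a time updates the
           shared accumulator. */
        #pragma omp critical
        {
            total += local;
        }
    }
    return total;
}
```

Built with an OpenMP-enabled compiler (for example `gcc -fopenmp`), the loop iterations are distributed among the threads of the team. Without OpenMP support the pragmas are ignored and the function runs sequentially. The order of the floating-point additions is not fixed in the parallel case, so results may differ in the last digits when the scores are not exactly representable.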

Table 6: Comparison: bayes and kmeans

Hydra Tekoha
2 4 8 16 32 64 2 4 8 16 32 64
Small
kmeans 19,40 10,10 5,29 2,92 1,86 1,62 10,78 7,40 5,05 3,97 3,96 3,89
kmeans-VAN 32,91 38,65 83,84 167,36 401,66 534,04 37,24 50,40 117,27 223,94 459,53 918,57
bayes 10,60 10,84 12,91 11,04 11,86 11,42 5,45 5,76 6,73 5,97 5,29 5,71
bayes-VAN 6,76 3,66 5,70 6,41 3,77 4,10 3,55 2,24 2,81 3,40 2,96 2,17
Medium
kmeans 37,12 19,34 10,05 5,56 3,51 3,03 20,70 14,38 10,31 8,16 7,76 7,54
kmeans-VAN 23,92 88,25 152,68 372,36 846,79 1011,78 56,19 113,27 224,95 480,41 930,51 1842,23
bayes 14,12 18,21 12,36 15,67 19,99 16,46 7,33 8,73 9,46 7,04 7,55 8,46
bayes-VAN 8,57 5,05 16,38 9,98 9,12 8,93 3,60 4,77 7,66 4,72 3,23 7,02
Large
kmeans 75,16 39,33 20,67 11,10 6,89 6,00 40,86 29,29 21,72 17,30 15,97 15,12
kmeans-VAN 147,07 221,28 335,41 761,71 1677,89 2152,70 126,21 247,09 485,72 1031,31 1872,23 3570,21
bayes 14,50 15,99 13,34 14,18 14,35 13,68 8,30 6,63 8,80 8,80 7,43 7,55
bayes-VAN 7,41 7,64 4,64 8,34 4,50 5,03 5,28 4,46 2,18 3,93 2,70 2,90
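
The times in Table 6 are easier to compare as ratios. The sketch below, with the hypothetical helper names `time_ratio` and `relative_speedup`, computes the speedup of each thread count relative to the first column of a row, and the ratio between two versions of the same program. The baseline is the 2-thread run, because the table does not report sequential times, and the decimal commas of the table have to be written as decimal points in C.

```c
#include <stddef.h>

/* Ratio between a reference time and a measured time, both in seconds.
   A value above 1.0 means the measured run was faster. */
double time_ratio(double reference_s, double measured_s)
{
    return reference_s / measured_s;
}

/* Fills out[i] with times[0] / times[i], that is, the speedup of each
   configuration relative to the first entry of the row. */
void relative_speedup(const double *times, size_t n, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = time_ratio(times[0], times[i]);
}
```

For the kmeans row with the small input on Hydra (19,40 s with 2 threads and 1,62 s with 64 threads), the last entry is roughly 12. For bayes with the large input on Hydra at 32 threads, `time_ratio(14.35, 4.50)` gives roughly 3.2 in favor of bayes-VAN.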

7.2.2 Kmeans. The implementation used belongs to a benchmark (footnote 5) developed to evaluate different implementations of the kmeans algorithm in different programming tools. Among the variations presented in the original version of this benchmark, there is an implementation in OpenMP that uses atomic operations for manipulating shared data. This implementation was used for evaluation purposes in the present work.

7.2.3 Performance Analysis. The programs labelled with -VAN are those implemented with the proposed interface. The kmeans program is the original benchmark implemented in OpenMP. The bayes program, on the other hand, is the version available in the STAMP benchmark, also implemented in OpenMP, with TinySTM support in its default settings. The average times for runs using OpenMP with 2, 4, 8, 16, 32 and 64 threads on each architecture are presented in Table 6. The cells with a red background indicate samples that did not adhere to a normal curve.

The experiment methodology involved sampling the execution times of each of the implementations. The version of the code implementing the proposed interface was supported by the TinySTM library, configured with its default options, for manipulating the data in transactional memory.

In this experiment, it was observed that the samples adhered to a normal distribution in a greater number of cases. On the Tekoha machine, perhaps because this machine offers greater processing power, the management cost was not compensated by the size of the problems explored. This statement refers to the cases where the processing cost of the problem is smaller than the exploited parallelism capacity, for which the execution time on Tekoha is higher. From the analysis of the samples, there is an indication that the bayes-VAN version shows a performance gain compared to its original version.

5 The original implementation of the benchmark is available at http://users.eecs.northwestem.edu/wkliao/Kmeans/index.html, accessed in January 2021.

8 CONCLUSION

This work proposed an extension for OpenMP to support TM. The main result was the incorporation of the transactional memory mechanism in OpenMP and its validation, carried out with the support of a prototype developed on tools offering support to TM, TinySTM and GCC-TM.

The data resulting from the analysis of the codes showed that, thanks to the expressiveness of the proposed resource, the use of transactions required a small number of lines of code, directives and directive invocations in comparison with the original OpenMP code and with the versions employing the OpenMP extensions proposed in the related works. In the experimentation and performance analysis, it was possible to verify that the prototype was operational with both TinySTM and GCC-TM, although the performance comparison did not show advantages in relation to pure OpenMP.

The prototyping of the proposed extension explored Vanilla-TM, an interface that consists of an intermediate representation designed to allow the proposed extension to be supported by different STM tools. Among these tools, this article pointed out TinySTM and GCC-TM as alternatives for the implementation of Vanilla-TM. The addition of the transaction clause to the set of OpenMP services, with the support of Vanilla-TM, is handled by the selected STM tools without interference in the OpenMP operation mode and without using any facilities offered by this standard.

The proposed interface was evaluated employing a code comparison metric. Codes implemented for 6 different programs from the Cowichan benchmark were considered in 6 different programming interfaces: OpenMP, OpenMP extended with the proposed interface, two STM libraries (TinySTM and GCC-TM) and two other proposals for extending the use of TM over OpenMP found in the literature ([10, 19]). As a general result, the extension proposed by this paper and the one proposed by Nebelung [10] required the shortest codes. In terms of the number of different programming resources, the numbers obtained with our proposed interface are comparable to

the other interfaces. However, regarding this metric, Nebelung presented higher values.

A performance evaluation was carried out by implementing benchmark programs for TM (bayes) and OpenMP (kmeans) with the proposed interface. The results made it possible to analyse the cost added by the proposed interface in comparison with pure OpenMP and TinySTM. They suggest that the overhead remains stable across different numbers of threads, regardless of the problem size employed. They also indicated that the proposed interface presents a low overhead in relation to the direct use of TinySTM over OpenMP.

For future work, we intend to specify the Vanilla-TM interface so that it is representative of a wider spectrum of STM tools, also leaving room for its extensibility. Features such as transaction nesting, definition of read and write sets, library calls and use of I/O can also influence the definition of the extension for TM, so they will also be taken into consideration. In addition, the effective introduction of the extension in OpenMP and a performance evaluation considering a benchmark specific to transactional memory tools will also be contemplated.

ACKNOWLEDGMENTS

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. This work has been partially supported by the project “GREEN-CLOUD: Computação em Cloud com Computação Sustentável” (#16/2551-0000 488-9), from FAPERGS and CNPq Brazil, program PRONEX 12/2014.

REFERENCES

[1] Woongki Baek, Chi Cao Minh, Martin Trautmann, Christos Kozyrakis, and Kunle Olukotun. 2007. The OpenTM Transactional Application Programming Interface. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT ’07). IEEE Computer Society, Washington, DC, USA, 376–387. https://doi.org/10.1109/PACT.2007.74
[2] Barna L. Bihari, Michael Wong, Amy Wang, Bronis R. de Supinski, and Wang Chen. 2012. A Case for Including Transactions in OpenMP II: Hardware Transactional Memory. In OpenMP in a Heterogeneous World, Barbara M. Chapman, Federico Massaioli, Matthias S. Müller, and Marco Rorro (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 44–58.
[3] Victor R. Basili, Gianluigi Caldiera, and H. Dieter Rombach. 1994. The goal question metric approach. Encyclopedia of Software Engineering (1994), 528–532.
[4] Leonardo Dagum and Ramesh Menon. 1998. OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Comput. Sci. Eng. 5, 1 (Jan. 1998), 46–55. https://doi.org/10.1109/99.660313
[5] Aleksandar Dragojević, Rachid Guerraoui, and Michal Kapalka. 2009. Stretching Transactional Memory. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, Ireland) (PLDI ’09). Association for Computing Machinery, New York, NY, USA, 155–165. https://doi.org/10.1145/1542476.1542494
[6] GCC. [n.d.]. The GNU Compiler Collection. https://gcc.gnu.org/
[7] Tim Harris, James Larus, and Ravi Rajwar. 2010. Transactional Memory, 2nd Edition (2nd ed.). Morgan and Claypool Publishers.
[8] Tim Harris, Simon Marlow, Simon Peyton-Jones, and Maurice Herlihy. 2005. Composable Memory Transactions. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA) (PPoPP ’05). Association for Computing Machinery, New York, NY, USA, 48–60. https://doi.org/10.1145/1065944.1065952
[9] Miloš Milovanović, Roger Ferrer, Vladimir Gajinov, Osman S. Unsal, Adrian Cristal, Eduard Ayguadé, and Mateo Valero. 2007. Multithreaded Software Transactional Memory and OpenMP. In Proceedings of the 2007 Workshop on Memory Performance: Dealing with Applications, Systems and Architecture (Brasov, Romania) (MEDEA ’07). ACM, New York, NY, USA, 81–88. https://doi.org/10.1145/1327171.1327181
[10] Miloš Milovanović, Roger Ferrer, Vladimir Gajinov, Osman S. Unsal, Adrian Cristal, Eduard Ayguadé, and Mateo Valero. 2008. Nebelung: Execution Environment for Transactional OpenMP. International Journal of Parallel Programming 36, 3 (01 Jun 2008), 326–346. https://doi.org/10.1007/s10766-008-0073-6
[11] OpenMP Specification. 2018. Version 5.0. The OpenMP Architecture Review Board.
[12] Victor Pankratius and Ali-Reza Adl-Tabatabai. 2014. Software Engineering with Transactional Memory Versus Locks in Practice. Theor. Comp. Sys. 55, 3 (Oct. 2014), 555–590. https://doi.org/10.1007/s00224-013-9452-5
[13] V. M. Dhivya Shri and K. Reshma. 2019. The Transactional Memory. International Journal of Scientific Research in Computer Science, Engineering and Information Technology (Feb 2019), 13–20. https://doi.org/10.32628/cseit1951117
[14] S. K. Srivatsa and Ch. R. Kumar. 2012. Reconfigurable Frame Work for Chip-multiprocessors and its Application in Multithreaded Environment. International Journal on Information Sciences and Computing 6, 1 (Jan 2012), 41–48. https://doi.org/10.18000/ijisac.50111
[15] Janwillem Swalens, Joeri De Koster, and Wolfgang De Meuter. 2018. Chocola: Integrating Futures, Actors, and Transactions. In Proceedings of the 8th ACM SIGPLAN International Workshop on Programming Based on Actors, Agents, and Decentralized Control (Boston, MA, USA) (AGERE 2018). Association for Computing Machinery, New York, NY, USA, 33–43. https://doi.org/10.1145/3281366.3281373
[16] Tabassum and Meenu. 2020. Transactional Memory: A Review. In 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). 370–375. https://doi.org/10.1109/ICACCS48705.2020.9074423
[17] Jons-Tobias Wamhoff, Torvald Riegel, Christof Fetzer, and Pascal Felber. 2010. RobuSTM: A robust software transactional memory. In Symp on Self-Stabilizing Systems. Springer, 388–404.
[18] Gregory V. Wilson and R. Bruce Irvin. 1995. Assessing and Comparing the Usability of Parallel Programming Systems. Technical Report. University of Toronto. Computer Systems Research Institute.
[19] Michael Wong, Eduard Ayguadé, Justin Gottschlich, Victor Luchangco, Bronis R. de Supinski, and Barna Bihari. 2014. Towards Transactional Memory for OpenMP. In Using and Improving OpenMP for Devices, Tasks, and More, Luiz DeRose, Bronis R. de Supinski, Stephen L. Olivier, Barbara M. Chapman, and Matthias S. Müller (Eds.). Springer International Publishing, Cham, 130–145.
[20] Michael Wong, Barna L. Bihari, Bronis R. de Supinski, Peng Wu, Maged Michael, Yan Liu, and Wang Chen. 2010. A Case for Including Transactions in OpenMP. In Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, Mitsuhisa Sato, Toshihiro Hanawa, Matthias S. Müller, Barbara M. Chapman, and Bronis R. de Supinski (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 149–160.
[21] Pantea Zardoshti, Tingzhe Zhou, Pavithra Balaji, Michael L. Scott, and Michael Spear. 2019. Simplifying Transactional Memory Support in C++. ACM Trans. Archit. Code Optim. 16, 3, Article 25 (July 2019), 24 pages. https://doi.org/10.1145/3328796
