KevinJARDIM SBLP 2021
This approach can be seen as a great starting point for studying transactions within OpenMP.
OpenMP offers different mechanisms for sharing data between tasks in the same parallel region. However, there is a gap that can be covered by adopting transactional memory support. For that, this support must respect the programming practices of the tool, not introducing a resource totally foreign to it, like those supported by a third-party library, while extending its expressive power. Furthermore, the TM support must reduce the need to include in algorithms execution control mechanisms based on mutual exclusion. New solutions for applying TM in programming tools have emerged in recent years, such as its association with the garbage collector [13] or with futures [15]. We also found approaches extending the application of TM to programming languages, such as C++ [21] and Python [16]. In the present proposal, the use of TM also distances itself from the classic model of STM library implementations [5], like TinySTM and SwissTM, where the programmer is required to explicitly indicate, through a service provided by the library, the addresses of the variables to be considered in a transaction.
This article proposes to extend OpenMP by incorporating a new memory layer, the Transactional Memory (TM), shared among tasks. This is done by incorporating a new feature in its programming interface that allows variables to be bound to this memory layer. The resulting gain is that the semantics of operations on transactional memory are aligned with those of the other operations already available in OpenMP. The objective of the work is to extend the state of the art in interfaces for multithreaded concurrent programming by introducing resources for handling transactional memories in well-established programming tools. The introduction of transactional memory mechanisms in multithreaded programming tools is studied with regard to programming interface issues. This work holds that their effective adoption can occur when their use is in line with the design decisions of the native programming interface. The proposed approach considers this issue, proposing programming features compatible with those already offered in OpenMP.
The remainder of this paper is organized as follows. Sections 2 and 3 provide a theoretical basis for the topics covered in the article, describing Transactional Memory and OpenMP, respectively. Section 4 discusses works in the literature with similar proposals for integrating TM with OpenMP. In Section 5, the interface proposed in the article is presented and an OpenMP memory model with transactions is contemplated. Section 5 also presents the prototyping of the proposed interface in the form of an intermediate language. Section 6 presents an analysis comparing the source code of our interface with other approaches. Section 7 presents a performance evaluation of the proposed extension to OpenMP. Finally, Section 8 concludes the work and enumerates possibilities for future work.

2 TRANSACTIONAL MEMORY
Transactional Memory, or TM, is a model built over a mechanism that promotes synchronization between concurrent threads. This work focuses on STM (Software Transactional Memory). STM is a software-level concurrency control mechanism that enables programmers to partition code into transactions and ensure that they run atomically and in isolation from each other. STM can simplify concurrent programming, facilitating data protection, allowing for sequential reasoning between transactions and avoiding some concurrency bugs [7], such as deadlocks or data races.
The synchronization model offered by Transactional Memory presents itself as an alternative to methods based on mutexes, offering a higher level of abstraction through atomic operations. Also, due to the characteristics of atomicity and isolation, transactional systems can be conceived [7], and consequently implemented, to exploit more parallelism, increasing their scalability and performance.
Unlike the lock-based mechanism imposed by the use of mutexes, the transactional memory model explores the concept of atomic transactions. A programmer defines a transaction by placing a set of programming language instructions inside an atomic block. This block represents a critical section and should only contain statements with reversible effects. A runtime system allows threads to execute atomic blocks concurrently, making it appear that only one thread at a time executes within an atomic block. If a transaction running concurrently conflicts with another transaction, the runtime aborts it (that is, undoes its effects) and tries again later; otherwise, it confirms it (commits) and makes its effects visible to all other threads. The runtime basically enforces the well-known atomicity, consistency and isolation properties of database transactions, applied to programming language instructions [12]. It is worth noting that the concurrency control of transactional memory may differ from that of lock-based mechanisms. TM often adopts an optimistic concurrency control, in which ownership acquisition and validation of shared objects only occur at commit time, whereas the concurrency control of locks is pessimistic: ownership is acquired immediately, preventing other threads from accessing the shared object.

3 OPENMP
The OpenMP (Open Multi-Processing) specification [4, 11] defines an API to exploit concurrency in C/C++ and Fortran programs through a multithreaded programming tool with a high degree of portability. OpenMP focuses on shared-memory environments, and its API includes the specification of environment variables, a service library and a set of directives. These directives describe the concurrency of a program and define the synchronization between concurrent activities. As originally conceived, concurrent sections of the code are indicated explicitly, by annotating an original (sequential) code with the specified directives.
The OpenMP programming interface uses directives as the basic element to describe the concurrency of a program and to introduce synchronization points. These directives are used within a parallel region delimited by the parallel directive. Directives have the form¹: #pragma omp <directive> [clauses]. Regarding this syntax, #pragma omp instructs the compiler that an OpenMP directive follows, so that its corresponding code is generated. A directive is made up of one or more commands, which may be followed by clauses.

¹ In this work, we limit the discussion of the syntax to the OpenMP standard for the C/C++ language.
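Section 2 contrasts the optimistic concurrency control typical of TM with the pessimistic control of locks, and Section 3 describes the #pragma omp <directive> [clauses] form. The C code below is a minimal sketch that assumes only standard OpenMP directives and C11 atomics. The function names are hypothetical, and this is not the proposed interface. It sums an array into one shared variable in both styles. The pessimistic version enters a named critical section before touching the shared word. The optimistic version computes its update privately and publishes it with a compare-and-swap, retrying when another thread has committed in between. It is only a single-word analogy: an STM runtime such as TinySTM or GCC-TM tracks read and write sets over many addresses and rolls back whole blocks of reversible statements.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Pessimistic control: a thread acquires exclusive access (the named
 * critical section) before it touches the shared accumulator. */
long sum_pessimistic(const long *v, int n)
{
    assert(n >= 0);
    long total = 0;                       /* shared by default */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical(total_update)
        total += v[i];
    }
    return total;
}

/* Optimistic control on one shared word: read the current value, compute
 * the new one privately, and publish it only if no other thread changed the
 * word in the meantime. On a conflict the private work is discarded (the
 * "abort") and the update is retried with the freshly observed value. */
long sum_optimistic(const long *v, int n, long *aborts)
{
    assert(n >= 0);
    atomic_long total = 0;
    atomic_long failed = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        long seen = atomic_load(&total);      /* the "read set": one word */
        for (;;) {
            long desired = seen + v[i];       /* speculative, no side effects */
            /* "Commit": succeeds only if total still equals seen. On failure
             * (a conflict, or a spurious failure of the weak form), seen is
             * reloaded with the current value and the loop retries. */
            if (atomic_compare_exchange_weak(&total, &seen, desired))
                break;
            atomic_fetch_add(&failed, 1);
        }
    }
    if (aborts != NULL)
        *aborts = atomic_load(&failed);
    return atomic_load(&total);
}
```

When compiled with OpenMP enabled (for example, -fopenmp in GCC), both functions run with a team of threads and return the same total. Without OpenMP the pragmas are ignored and the loops run sequentially.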
An extension for Transactional Memory in OpenMP SBLP’21, September 27-October 1, 2021, Joinville, Brazil
Table 1: Goal Question Metric table.

Q1. What is the size of the source code?
Metric      NCL - Number of Code Lines
Definition  Total number of lines of code in the source code, including instructions and API calls.
Comment     This count does not include blank lines and comments in the source file.

Q2. How many resources of the OpenMP API are used?
Metric      ARU - Amount of Resources Used
Definition  Shows how many resources of the OpenMP API are necessary for the implementation.
Comment     This counts all unique directives, clauses and function calls defined in OpenMP.

Q3. How many parallelization directives are used?
Metric      NIPD - Number of Invocations to Parallelization Directives
Definition  Total number of occurrences of OpenMP directives that create tasks.
Comment     Characterizes the number of parallel blocks necessary for the implementation.

Q4. What is the ratio between resources and source code size?
Metric      Rel1 - Relation between Number of Code Lines and Resources Used
Definition  Rel1 = ARU / NCL.
Comment     The bigger this number is, the more resources are used.

Q5. What is the ratio between parallel zones and source code size?
Metric      Rel2 - Relation between Number of Code Lines and Number of Parallelization Directives
Definition  Rel2 = NIPD / NCL.
Comment     This shows the relative amount of parallel blocks compared to the size of the code.

Table 2: Comparison between OpenMP code of the Cowichan problems.

Metric      Problem   OpenMP        OpenMP     Nebelung   Wong    GCC-TM   TinySTM
                      transaction   original
NCL         hull      152           208        152        155     167      180
            norm      60            95         60         60      68       78
            outer     31            53         31         34      38       45
            sor       41            61         41         44      48       55
            thresh    59            102        59         59      70       86
            vecdiff   20            42         20         23      28       35
ARU         hull      3             2          4          3       3        3
            norm      3             2          4          3       2        2
            outer     4             3          5          4       3        3
            sor       4             4          5          4       3        3
            thresh    3             3          4          3       2        3
            vecdiff   3             3          4          3       3        3
NIPD        hull      3             5          3          4       5        5
            norm      2             3          2          2       3        3
            outer     3             4          3          4       4        4
            sor       1             2          1          2       2        2
            thresh    7             9          7          7       9        9
            vecdiff   1             2          1          2       2        2
ARU / NCL   hull      0,02          0,01       0,03       0,02    0,02     0,02
            norm      0,05          0,02       0,07       0,05    0,03     0,03
            outer     0,13          0,06       0,16       0,12    0,08     0,07
            sor       0,10          0,07       0,12       0,09    0,06     0,05
            thresh    0,05          0,03       0,07       0,05    0,03     0,03
            vecdiff   0,15          0,07       0,20       0,13    0,11     0,09
NIPD / NCL  hull      0,02          0,02       0,02       0,03    0,03     0,03
            norm      0,03          0,03       0,03       0,03    0,04     0,04
            outer     0,10          0,08       0,10       0,12    0,11     0,09
            sor       0,02          0,03       0,02       0,05    0,04     0,04
            thresh    0,12          0,09       0,12       0,12    0,13     0,10
            vecdiff   0,05          0,05       0,05       0,09    0,07     0,03

close but with a higher number. Regarding the use of resources, our proposal had results similar to [19]. It is also noticeable that, in general, every implementation with TM support had better results than the original OpenMP.
In terms of how the transactional blocks are generated, the transaction clause differs from the other implementations because of how the identifiers can be used. In the implementation by Wong [19], either the entire parallel section is used as a transaction or a specific directive is used to dictate which parts of the code may access the TM. In contrast, in Nebelung [10], it is possible to dictate which variables are transactional variables; however, that has to be done with an additional clause. Furthermore, none of the other implementations allows the type of access (read or write) to be specified for the transactional identifiers.
The transaction clause we propose makes it possible to define identifiers for transactional memory directly in the clause itself, something already possible with other OpenMP clauses that focus on shared data access control. Thus, specifying and controlling the transaction in the parallel section becomes simpler and more compatible with other OpenMP clauses.
When analysing the implementations of TinySTM and GCC-TM, we identified a clear difference between their level of abstraction and that of our interface. The programming interfaces provided by TinySTM and GCC-TM require the programmer to inform the memory address that will be manipulated transactionally. As a result, whether a given variable is handled by the TM does not depend on the scope of its identifier. In our interface, in contrast to TinySTM or GCC-TM, transactional access to a variable is allowed only in the structured block associated with the current OpenMP directive. If the variable is passed by reference as a parameter to a function, any access through the reference is not observed by the TM manager.

7 EXPERIMENTATION AND PERFORMANCE ANALYSIS
The experimentation was carried out considering two sets of programs. The first is represented by the Cowichan [2] benchmark applications. Programs from this benchmark were implemented to validate the interface. The second set of programs consists of an application from the STAMP [18] benchmark (bayes) and a program developed as a benchmark for OpenMP (kmeans). The experiments in steps 1 and 2 were executed on two NUMA platforms: Hydra (Opteron architecture with 64 cores, 4 nodes, 120 GB RAM) and Tekoha (Xeon architecture with 192 cores, 8 nodes, 120 GB RAM). The performance index collected is the execution time, presented in seconds, obtained as the average of at least 30 executions in each case. In the presentation of these averages, we indicate whether the samples of each case adhere to a normal distribution. For this, the Kolmogorov-Smirnov test was performed with 95% confidence. The comparisons between cases use appropriate statistical tests to rank the performance of two cases. When samples adhere to a normal curve, Student's t-test was used; otherwise, the Mann-Whitney U test was used, both also with 95% confidence.
The case studies reflect combinations of the program versions to be compared, different execution supports, different architectures (in steps 1 and 2) and different numbers of threads in the OpenMP runtime support.
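For non-normal samples, the test selection described above ends in the Mann-Whitney U test. The C code below is a minimal sketch of its statistic, assuming two arrays of execution times. The function name is hypothetical, and this is not the authors' tooling. Both samples are pooled and ranked, and tied values receive the mean of the ranks they span. U for the first sample is its rank sum minus na(na+1)/2. This equals the number of pairs in which the first sample's time is larger, with ties counted as one half.

```c
#include <assert.h>
#include <stdlib.h>

/* One observation tagged with the sample it came from (0 = a, 1 = b). */
typedef struct { double value; int group; } obs_t;

static int cmp_obs(const void *p, const void *q)
{
    double x = ((const obs_t *)p)->value;
    double y = ((const obs_t *)q)->value;
    return (x > y) - (x < y);
}

/* Mann-Whitney U statistic of sample a (size na) against sample b (size nb).
 * Returns U_a = R_a - na(na+1)/2, where R_a is the rank sum of a in the
 * pooled, sorted data (ties get their average rank). U_b = na*nb - U_a.
 * Returns -1.0 if memory cannot be allocated. */
double mann_whitney_u(const double *a, int na, const double *b, int nb)
{
    assert(na > 0 && nb > 0);
    int n = na + nb;
    obs_t *all = malloc((size_t)n * sizeof *all);
    if (all == NULL)
        return -1.0;
    for (int i = 0; i < na; i++) all[i] = (obs_t){ a[i], 0 };
    for (int j = 0; j < nb; j++) all[na + j] = (obs_t){ b[j], 1 };
    qsort(all, (size_t)n, sizeof *all, cmp_obs);

    double rank_sum_a = 0.0;
    int i = 0;
    while (i < n) {
        int j = i;                                   /* find the tie run [i, j] */
        while (j + 1 < n && all[j + 1].value == all[i].value)
            j++;
        double avg_rank = ((i + 1) + (j + 1)) / 2.0; /* ranks are 1-based */
        for (int k = i; k <= j; k++)
            if (all[k].group == 0)
                rank_sum_a += avg_rank;
        i = j + 1;
    }
    free(all);
    return rank_sum_a - na * (na + 1) / 2.0;
}
```

Turning U into a decision at 95% confidence is left to a statistics package. With 30 or more runs per case, as in these experiments, U is commonly assessed through a normal approximation with mean na·nb/2 and variance na·nb(na+nb+1)/12, adjusted when there are ties.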
SBLP’21, September 27-October 1, 2021, Joinville, Brazil Andre D. Jardim, Kevin Oliveira, Diogo J. Cardoso, Daniel Di Domenico, Andre R. Du Bois, and Gerson G. H. Cavalheiro
7.1 Experimentation Step 1: Prototype Validation
The proposed interface was validated by implementing six Cowichan benchmark programs to which the transactional memory model could be applied: hull, norm, outer, sor, thresh and vecdiff. The Cowichan benchmark was not designed for performance analysis, but to characterize parallel programming tools in terms of their ability to represent programming resources.

7.1.1 Performance Analysis. Three implementations were evaluated for the selected Cowichan problems. One was written in OpenMP², and the other two were obtained from this first version by adapting it through a prototype of the proposed interface, Vanilla-TM, which supports the transactional memory offered by TinySTM and GCC-TM. The experiments considered three input sizes, defined according to the nature of each application and identified as small, medium and large. Execution times were collected for 2, 4, 8, 16, 32 and 64 threads in the OpenMP execution team.
Tables 3, 4 and 5 exemplify the results, presenting the average execution times obtained by running the programs on small, medium and large inputs with 32 threads in the OpenMP execution team.

Table 3: Case performance with small input and 32 threads: execution time (in seconds).

          Hydra                         Tekoha
          OpenMP   TinySTM   GCC-TM     OpenMP   TinySTM   GCC-TM
hull      0,22     0,25      0,21       0,24     0,26      0,23
norm      0,06     0,15      4,76       0,04     0,08      7,07
outer     0,29     0,28      8,34       0,09     0,14      14,04
sor       4,25     0,28      0,26       0,07     0,09      0,07
thresh    1,13     16,13     34,60      0,14     30,93     51,19
vecdiff   0,1      0,25      22,36      0,03     0,17      39,2

Table 4: Case performance with medium input and 32 threads: execution time (in seconds).

          Hydra                         Tekoha
          OpenMP   TinySTM   GCC-TM     OpenMP   TinySTM   GCC-TM
hull      0,8      0,94      0,81       0,74     0,73      0,74
norm      0,32     0,65      23,78      0,08     0,33      39,82
outer     0,47     0,68      33         0,39     0,54      59,38
sor       0,9      0,95      0,88       0,24     0,25      0,25
thresh    0,45     66,27     126,08     0,33     122,26    209,01
vecdiff   0,19     0,47      44,73      0,06     0,33      79,2

Table 5: Case performance with large input and 32 threads: execution time (in seconds).

          Hydra                         Tekoha
          OpenMP   TinySTM   GCC-TM     OpenMP   TinySTM   GCC-TM
hull      2,42     9,15      149,65     2,42     13,27     267,01
norm      0,61     1,28      47,54      0,15     0,63      80,79
outer     0,81     4,53      131,83     1,92     2,24      241,51
sor       4,01     4,14      4,51       0,79     0,9       1,06
thresh    1,55     265,14    502,84     0,87     490,6     1221,75
vecdiff   0,86     2,5       221,55     0,19     1,43      399,66

In the tables presented, the cells marked in red correspond to the experiments whose samples did not adhere to a normal curve. In these cases, the average times are presented for illustration only and were not used in the performance comparison between the tools.
The expected result, that pure OpenMP executions perform better because of the extra cost of the TM management mechanisms, was confirmed in almost all cases. The exception is the sor program with the large input. This is highlighted with red cells in Table 5. This case illustrates a situation where it is not possible to assert the superiority of OpenMP over TinySTM, or of TinySTM over GCC-TM. For this program, the statistical test could not confirm, with 95% confidence, that the averages obtained with the three tools belonged to different populations.
The overall analysis of the results suggests that pure OpenMP naturally produces better performance by using lower-level resources. It is also observed that in some cases, notably sor and norm, the use of TM (in particular with TinySTM) can be considered. For others, like thresh and hull, TM tools are not appropriate, at least not with the same algorithm implemented in OpenMP.

7.2 Experimentation Step 2: Analysis of the Prototype Behavior
In the second stage of experimentation, two programs originally developed for performance analysis were implemented with the proposed interface, and their performance was evaluated. One of these programs, bayes, belongs to the STAMP benchmark [18], developed to evaluate the performance of tools for programming with transactional memory. The second program was developed as a benchmark for OpenMP³. The objective of this second stage of experimentation is to position the proposed solution with respect to its applicability to problems conceived in the context of programming with transactional memory and to problems designed for pure OpenMP.

7.2.1 Bayes. This program, in the STAMP benchmark, implements an algorithm for learning Bayesian networks. The complexity of the program is associated with the number of variables considered and the connection structure between these variables. These connections represent, in terms of probability, the conditional dependence between two variables. When the program is launched, the problem size can be parameterized by indicating the number of variables to be treated and the degree of connectivity between them.
In the original implementation of this problem⁴, the likelihood function is calculated for each variable, and then the values obtained for each variable are accumulated. The calculation is performed concurrently, and the accumulation that produces the global value uses some synchronization mechanism.

² Retrieved from https://code.google.com/archive/p/cowichan, accessed November 9, 2020.
³ The code used as the basis for the implementation is available at https://github.com/manshi10/kmeans, accessed on November 12, 2020.
⁴ The implementation of this benchmark used in this work was taken from the repository of one of its authors: https://github.com/kozyraki/stamp, accessed on September 7, 2020.
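Section 7.2.1 leaves open which synchronization mechanism the accumulation step of bayes uses. The C code below is a minimal sketch of two standard OpenMP choices for such a compute-then-accumulate loop. Here local_score is a hypothetical placeholder for the per-variable likelihood, not code from STAMP. The first version serializes each addition with a named critical section. The second uses a reduction clause, so each thread sums privately and the partial sums are combined once at the end of the loop.

```c
#include <assert.h>

/* Hypothetical placeholder for the per-variable likelihood term. It returns
 * small integers so that the sum is exact in double precision regardless
 * of the order in which threads add the terms. */
static double local_score(int var)
{
    return (double)(var % 10);
}

/* Accumulation serialized by a named critical section. */
double total_score_critical(int nvars)
{
    assert(nvars >= 0);
    double total = 0.0;
    #pragma omp parallel for
    for (int v = 0; v < nvars; v++) {
        double s = local_score(v);        /* computed concurrently */
        #pragma omp critical(score_acc)
        total += s;                       /* one thread at a time */
    }
    return total;
}

/* Same accumulation with a reduction clause: each thread keeps a private
 * partial sum, and OpenMP combines the partial sums at the end of the loop. */
double total_score_reduction(int nvars)
{
    assert(nvars >= 0);
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int v = 0; v < nvars; v++)
        total += local_score(v);
    return total;
}
```

With real likelihood values the two versions may differ in the last bits. Floating-point addition is not associative, and the reduction combines partial sums in an unspecified order.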
            Hydra                                                  Tekoha
Threads     2        4        8        16       32       64       2        4        8        16       32       64
Small
kmeans      19,40    10,10    5,29     2,92     1,86     1,62     10,78    7,40     5,05     3,97     3,96     3,89
kmeans-VAN  32,91    38,65    83,84    167,36   401,66   534,04   37,24    50,40    117,27   223,94   459,53   918,57
bayes       10,60    10,84    12,91    11,04    11,86    11,42    5,45     5,76     6,73     5,97     5,29     5,71
bayes-VAN   6,76     3,66     5,70     6,41     3,77     4,10     3,55     2,24     2,81     3,40     2,96     2,17
Medium
kmeans      37,12    19,34    10,05    5,56     3,51     3,03     20,70    14,38    10,31    8,16     7,76     7,54
kmeans-VAN  23,92    88,25    152,68   372,36   846,79   1011,78  56,19    113,27   224,95   480,41   930,51   1842,23
bayes       14,12    18,21    12,36    15,67    19,99    16,46    7,33     8,73     9,46     7,04     7,55     8,46
bayes-VAN   8,57     5,05     16,38    9,98     9,12     8,93     3,60     4,77     7,66     4,72     3,23     7,02
Large
kmeans      75,16    39,33    20,67    11,10    6,89     6,00     40,86    29,29    21,72    17,30    15,97    15,12
kmeans-VAN  147,07   221,28   335,41   761,71   1677,89  2152,70  126,21   247,09   485,72   1031,31  1872,23  3570,21
bayes       14,50    15,99    13,34    14,18    14,35    13,68    8,30     6,63     8,80     8,80     7,43     7,55
bayes-VAN   7,41     7,64     4,64     8,34     4,50     5,03     5,28     4,46     2,18     3,93     2,70     2,90
the other interfaces. However, regarding this metric, Nebelung presented higher values.
A performance evaluation was carried out by implementing benchmark programs for TM (bayes) and OpenMP (kmeans) with the proposed interface. The results allowed us to analyse the cost added by the proposed interface in comparison with pure OpenMP and TinySTM. They suggest that the overhead remains stable across different numbers of threads, regardless of the problem size. They also indicate that the proposed interface presents a low overhead relative to the direct use of TinySTM over OpenMP.
For future work, we intend to specify the Vanilla-TM interface so that it is representative of a wider spectrum of STM tools, also leaving room for its extensibility. Features such as transaction nesting, definition of read and write sets, library calls and use of I/O can also influence the definition of the extension for TM, so they will also be taken into consideration. In addition, the effective introduction of the extension in OpenMP and a performance evaluation using a benchmark specific to transactional memory tools will also be contemplated.

ACKNOWLEDGMENTS
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. This work has been partially supported by the project “GREEN-CLOUD: Computação em Cloud com Computação Sustentável” (#16/2551-0000 488-9), from FAPERGS and CNPq Brazil, program PRONEX 12/2014.

REFERENCES
[1] Woongki Baek, Chi Cao Minh, Martin Trautmann, Christos Kozyrakis, and Kunle Olukotun. 2007. The OpenTM Transactional Application Programming Interface. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT ’07). IEEE Computer Society, Washington, DC, USA, 376–387. https://doi.org/10.1109/PACT.2007.74
[2] Barna L. Bihari, Michael Wong, Amy Wang, Bronis R. de Supinski, and Wang Chen. 2012. A Case for Including Transactions in OpenMP II: Hardware Transactional Memory. In OpenMP in a Heterogeneous World, Barbara M. Chapman, Federico Massaioli, Matthias S. Müller, and Marco Rorro (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 44–58.
[3] Victor R. Basili, Gianluigi Caldiera, and H. Dieter Rombach. 1994. The Goal Question Metric Approach. Encyclopedia of Software Engineering (1994), 528–532.
[4] Leonardo Dagum and Ramesh Menon. 1998. OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Comput. Sci. Eng. 5, 1 (Jan. 1998), 46–55. https://doi.org/10.1109/99.660313
[5] Aleksandar Dragojević, Rachid Guerraoui, and Michal Kapalka. 2009. Stretching Transactional Memory. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, Ireland) (PLDI ’09). Association for Computing Machinery, New York, NY, USA, 155–165. https://doi.org/10.1145/1542476.1542494
[6] GCC. [n.d.]. The GNU Compiler Collection. https://gcc.gnu.org/
[7] Tim Harris, James Larus, and Ravi Rajwar. 2010. Transactional Memory, 2nd Edition (2nd ed.). Morgan and Claypool Publishers.
[8] Tim Harris, Simon Marlow, Simon Peyton-Jones, and Maurice Herlihy. 2005. Composable Memory Transactions. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA) (PPoPP ’05). Association for Computing Machinery, New York, NY, USA, 48–60. https://doi.org/10.1145/1065944.1065952
[9] Miloš Milovanović, Roger Ferrer, Vladimir Gajinov, Osman S. Unsal, Adrian Cristal, Eduard Ayguadé, and Mateo Valero. 2007. Multithreaded Software Transactional Memory and OpenMP. In Proceedings of the 2007 Workshop on Memory Performance: Dealing with Applications, Systems and Architecture (Brasov, Romania) (MEDEA ’07). ACM, New York, NY, USA, 81–88. https://doi.org/10.1145/1327171.1327181
[10] Miloš Milovanović, Roger Ferrer, Vladimir Gajinov, Osman S. Unsal, Adrian Cristal, Eduard Ayguadé, and Mateo Valero. 2008. Nebelung: Execution Environment for Transactional OpenMP. International Journal of Parallel Programming 36, 3 (01 Jun 2008), 326–346. https://doi.org/10.1007/s10766-008-0073-6
[11] OpenMP Specification. 2018. Version 5.0. The OpenMP Architecture Review Board.
[12] Victor Pankratius and Ali-Reza Adl-Tabatabai. 2014. Software Engineering with Transactional Memory Versus Locks in Practice. Theor. Comp. Sys. 55, 3 (Oct. 2014), 555–590. https://doi.org/10.1007/s00224-013-9452-5
[13] V. M. Dhivya Shri and K. Reshma. 2019. The Transactional Memory. International Journal of Scientific Research in Computer Science, Engineering and Information Technology (Feb 2019), 13–20. https://doi.org/10.32628/cseit1951117
[14] S. K. Srivatsa and Ch. R. Kumar. 2012. Reconfigurable Frame Work for Chip-multiprocessors and its Application in Multithreaded Environment. International Journal on Information Sciences and Computing 6, 1 (Jan 2012), 41–48. https://doi.org/10.18000/ijisac.50111
[15] Janwillem Swalens, Joeri De Koster, and Wolfgang De Meuter. 2018. Chocola: Integrating Futures, Actors, and Transactions. In Proceedings of the 8th ACM SIGPLAN International Workshop on Programming Based on Actors, Agents, and Decentralized Control (Boston, MA, USA) (AGERE 2018). Association for Computing Machinery, New York, NY, USA, 33–43. https://doi.org/10.1145/3281366.3281373
[16] Tabassum and Meenu. 2020. Transactional Memory: A Review. In 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). 370–375. https://doi.org/10.1109/ICACCS48705.2020.9074423
[17] Jons-Tobias Wamhoff, Torvald Riegel, Christof Fetzer, and Pascal Felber. 2010. RobuSTM: A Robust Software Transactional Memory. In Symposium on Self-Stabilizing Systems. Springer, 388–404.
[18] Gregory V. Wilson and R. Bruce Irvin. 1995. Assessing and Comparing the Usability of Parallel Programming Systems. Technical Report. University of Toronto, Computer Systems Research Institute.
[19] Michael Wong, Eduard Ayguadé, Justin Gottschlich, Victor Luchangco, Bronis R. de Supinski, and Barna Bihari. 2014. Towards Transactional Memory for OpenMP. In Using and Improving OpenMP for Devices, Tasks, and More, Luiz DeRose, Bronis R. de Supinski, Stephen L. Olivier, Barbara M. Chapman, and Matthias S. Müller (Eds.). Springer International Publishing, Cham, 130–145.
[20] Michael Wong, Barna L. Bihari, Bronis R. de Supinski, Peng Wu, Maged Michael, Yan Liu, and Wang Chen. 2010. A Case for Including Transactions in OpenMP. In Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, Mitsuhisa Sato, Toshihiro Hanawa, Matthias S. Müller, Barbara M. Chapman, and Bronis R. de Supinski (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 149–160.
[21] Pantea Zardoshti, Tingzhe Zhou, Pavithra Balaji, Michael L. Scott, and Michael Spear. 2019. Simplifying Transactional Memory Support in C++. ACM Trans. Archit. Code Optim. 16, 3, Article 25 (July 2019), 24 pages. https://doi.org/10.1145/3328796