
Green-CM: Energy-efficient contention management for Transactional Memory
Shady Alaa
Paolo Romano – INESC-ID/IST
Mats Brorsson - KTH
Agenda
• Introduction
• Related work
• Architecture
• Green-CM
• Evaluation
• Conclusion

ICPP 2015 - Green-CM 2


Introduction
• Multicores are everywhere
– Complex programming
• Locks
• Deadlocks
– Transactional memory
• Atomic blocks
• Transparent to the programmer

[Figure: four cores (Core 1 to Core 4) sharing main memory]

atomic {
    if (bal > amount)
        withdraw(amount);
}

Introduction
• Energy efficiency
– First-order design choice
– Battery-based devices
– Data centers

• Goal
– Transactional memory that is efficient in terms of both energy and performance



Introduction
• Contention manager
– Minimizes contention
– Decides which transaction to abort
– Decides when to restart an aborted transaction
• Energy efficiency:
– Wait implementation
– DVFS (dynamic voltage and frequency scaling)
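The "when to restart" decision can be illustrated with a retry loop. This is a minimal sketch, not the Green-CM implementation: `Abort`, `BackoffManager` and `run_transaction` are hypothetical names, and randomized exponential backoff is assumed only as a common baseline policy.

```python
import random
import time


class Abort(Exception):
    """Raised by a transaction body to signal a conflict (hypothetical)."""


class BackoffManager:
    """Hypothetical contention manager: decides how long to wait before a retry."""

    def __init__(self, base_delay=1e-6, max_delay=1e-3):
        self.base_delay = base_delay
        self.max_delay = max_delay

    def backoff_duration(self, retries):
        # Randomized exponential backoff, capped at max_delay seconds.
        limit = min(self.max_delay, self.base_delay * (2 ** retries))
        return random.uniform(0, limit)


def run_transaction(body, manager, max_retries=20):
    """Run `body` until it commits; wait between attempts as the manager says."""
    for retries in range(max_retries + 1):
        try:
            return body()
        except Abort:
            time.sleep(manager.backoff_duration(retries))
    raise RuntimeError("transaction did not commit")
```

How the wait itself is carried out (here a plain `time.sleep`) is exactly the part that affects energy consumption.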



Related work
• Little prior work in the literature
– Mainly HTM (hardware transactional memory)
• Clock gating processors upon abort
– Lowering frequency upon abort
• Using simulators
• Studies
– HTM consumes less energy
• Does not fit all workloads
– Need for adaptability
• Using DVFS in TM
– Fastlane
• Designed for a low number of threads

Architecture

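[Figure: Green-CM architecture]
• Tx abort (no. of retries, core on which tx is executing) → Asymmetric Contention Manager
• Asymmetric Contention Manager → back-off duration → Hybrid Wait Implementation
• Hybrid Wait Implementation → end back-off → Restart Tx
• Controller (inputs: throughput, energy)
– Tuning of β (Asymmetric Contention Manager)
– Tuning of α, Τ (Hybrid Wait Implementation)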



Implementing waits
• Building block for contention managers
• Drastic effect on energy consumption
• Can be implemented in two ways:
– Busy waiting
– Sleeping



Implementing waits
• Busy waiting
– Fine granularity
– Similar to real work
– Expensive
• Sleeping
– Coarse granularity
– Low energy consumption



Implementing waits
• Hybrid approach
– Either busy wait or sleep
• Adaptive fashion
– How to determine the threshold
• Cost of sleep
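The hybrid approach can be sketched as a single function that picks the mechanism from a threshold. This is a minimal sketch under assumptions: `hybrid_wait` is a hypothetical name, durations are expressed in seconds, and the threshold is passed in by the caller rather than tuned adaptively.

```python
import time


def hybrid_wait(duration_s, threshold_s):
    """Wait for duration_s seconds; return which mechanism was used."""
    if duration_s < threshold_s:
        # Short waits: spin, which is fine-grained but keeps the core busy.
        deadline = time.perf_counter() + duration_s
        while time.perf_counter() < deadline:
            pass
        return "spin"
    # Long waits: sleep, which is coarse-grained but lets the core idle.
    time.sleep(duration_s)
    return "sleep"
```

A sensible threshold is one below which the cost of going to sleep outweighs the energy saved, which is why it depends on the platform and the workload.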



Implementing waits
• No one size fits all
[Figure: "Static Thresholds". y-axis: EDP / best EDP; x-axis: Threshold (100 to 1x10^7); series: Intruder, Kmeans]
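The "EDP / best EDP" metric used on the y-axis can be computed as below. This sketch assumes the standard definition of the energy-delay product (energy multiplied by execution time); the function names and the measurements are made up for illustration and are not results from the talk.

```python
def edp(energy_joules, time_seconds):
    """Energy-delay product: energy multiplied by execution time."""
    return energy_joules * time_seconds


def normalize_to_best(edp_by_config):
    """Divide every EDP by the smallest one, so the best configuration maps to 1.0."""
    best = min(edp_by_config.values())
    return {name: value / best for name, value in edp_by_config.items()}


# Made-up measurements (energy in joules, time in seconds), for illustration only.
runs = {"threshold=100": (50.0, 2.0), "threshold=10000": (40.0, 2.0)}
scores = normalize_to_best({k: edp(e, t) for k, (e, t) in runs.items()})
```

With this normalization, lower is better and a value of 1.0 marks the best configuration for a given benchmark.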



Asymmetric CM
• DVFS
– Variable operating frequency
• Exploiting DVFS
– Boosting active threads
– Reducing freq. of backing off threads
• Enabling DVFS
– Manual control is expensive
– How to favor automatic boosting

P-states:
P0 3.0 GHz
P1 2.4 GHz
P2 2.2 GHz
P3 2.0 GHz
P4 1.8 GHz
P5 1.6 GHz
P6 1.4 GHz



Asymmetric CM
• Linear backoff cores:
– Shorter backoff periods
– Mainly busy waiting backoffs
• Exp. Backoff cores:
– Longer backoff periods
– Mainly sleep waiting
• Favor boosting
– When enough cores are in sleep states

[Figure: 8 core processor; two cores use linear backoff with busy waiting and are boosted, six cores use exp. backoff with sleep]

Asymmetric CM
• Increased contention?
– Cores not backing off exponentially

• Control number of cores to be boosted
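The asymmetric policy can be sketched as a back-off function that depends on the core and the retry count. This is a sketch under assumptions: the function name, the rule that cores with the lowest IDs are the ones to be boosted, and the `base` and `cap` values are all hypothetical; `boosted_cores` plays the role of the tunable number of boosted cores.

```python
def backoff_duration(core_id, retries, boosted_cores, base=1.0, cap=1024.0):
    """Back-off length (arbitrary time units) for a transaction aborted `retries` times."""
    if core_id < boosted_cores:
        # Cores selected for boosting: linear growth keeps back-offs short.
        return min(cap, base * (retries + 1))
    # Remaining cores: exponential growth makes back-offs long enough to sleep.
    return min(cap, base * (2 ** retries))
```

Raising `boosted_cores` lets more cores retry quickly, which is where the risk of increased contention comes from.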



Asymmetric CM
[Figure: "Static No. of Boosted Threads". y-axis: EDP / best EDP; x-axis: No. of Boosted Threads (2, 4, 8, 16); series: Intruder, Genome, STM7, Kmeans, Memcached]




Controller
• Online, lightweight
• Hill climbing
• Challenges:
– Collection of energy measurements
– Multi-dimensional
• Different exploration strategies
– Stabilization
– Random jumps
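A one-dimensional hill climber with stabilization and random jumps could look like the sketch below. The interpretation is an assumption, not taken from the slides: "stabilization" is read as stopping exploration after `stable_after` rounds without improvement, and "random jumps" as occasionally probing a random point with probability `jump_prob`. All names and parameters are hypothetical, and `measure` stands for a cost such as EDP observed at run time.

```python
import random


def hill_climb(measure, start, step, lo, hi, rounds=50, stable_after=3,
               jump_prob=0.0, rng=None):
    """Minimize measure(x) over [lo, hi] by trying neighbours of the current point."""
    rng = rng or random.Random(0)
    current, best_cost = start, measure(start)
    stable = 0
    for _ in range(rounds):
        if stable >= stable_after:
            # Stabilization: stop exploring once no neighbour has helped for a while,
            # except for an occasional random jump to escape a local optimum.
            if rng.random() >= jump_prob:
                continue
            candidates = [rng.uniform(lo, hi)]
        else:
            candidates = [max(lo, current - step), min(hi, current + step)]
        cost, point = min((measure(c), c) for c in candidates)
        if cost < best_cost:
            current, best_cost, stable = point, cost, 0
        else:
            stable += 1
    return current
```

For example, `hill_climb(lambda x: (x - 7) ** 2, start=0, step=1, lo=0, hi=20)` walks from 0 towards the minimum at 7 and then stops exploring. Tuning two parameters at once would need either one such learner per parameter or a search over both dimensions.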



Controller
• Tuning α (threshold for hybrid)
[Figure: "Threshold Tuning Strategies". y-axis: EDP / best EDP; x-axis: Benchmark (Intruder, Kmeans, STM7, Memcached, Average); series: no stab, stab, stab jmp 1, stab jmp 10]

Controller
• Tuning β (no. of boosted threads)
[Figure: "No. of Boosted Threads Tuning Strategies". y-axis: EDP / best EDP; x-axis: Benchmark (Intruder, Kmeans, STM7, Memcached, Average)]

Controller
• Merging the learners
[Figure: "Coupling the Tuners". y-axis: EDP / best EDP; x-axis: Benchmark (Intruder, Kmeans, STM7, Memcached, Average); series: independent stab jmp 1, stab jmp 1 – stab, bidim stab jmp 1, stab – stab, stab jmp 10 – stab]

Evaluation
[Figure: Intruder. y-axis: EDP-GreenCM / EDP; x-axis: Threads (4, 8, 16, 32, 48, 64)]


Evaluation
[Figure: STM7. y-axis: EDP-GreenCM / EDP; x-axis: Threads (4, 8, 16, 32, 48, 64)]


Evaluation
[Figure: Memcached. y-axis: EDP-GreenCM / EDP; x-axis: Threads (4, 8, 16, 32, 48, 64)]


Evaluation
[Figure: Intruder, 64 threads. y-axis: % of total cores; x-axis: no-asym, asym; legend: p0 to p6, spin]


Conclusion
• Implementation of waits has a significant impact on energy efficiency
• Experimental results (obtained on a real system) contradict previously published ones based on simulation
• Exploiting DVFS enhances energy efficiency
• Self-tuning is needed to adapt to different workloads

THANK YOU



Evaluation
[Figure: Intruder. Two plots over Threads (4, 8, 16, 32, 48, 64): EDP-GreenCM / EDP and Energy-GreenCM / Energy]

Evaluation
[Figure: Intruder. Two plots over Threads (4, 8, 16, 32, 48, 64): EDP-GreenCM / EDP and Time-GreenCM / Time]
