
A Coordinated Multi-Agent Reinforcement Learning Approach to Multi-Level Cache Co-partitioning

Rahul Jain, Sreenivas Subramoney, Preeti Ranjan Panda

Presented by Preeti Ranjan Panda
Department of Computer Science and Engineering
Indian Institute of Technology Delhi, India

Introduction
• Dynamic Cache Way Partitioning
• Cache Set-Ways assigned to individual Cores
• Dynamic: requirement changes during program execution
• Isolation: Cores do not evict other Cores' data (sketched after this list)
• Simultaneous Multi-Threaded (SMT) Processor
• Parallel execution of multiple threads on the same physical core
• Hyper-Threading on Intel
• Improves Functional Unit utilization
• Private Caches (L1 and L2) are shared among the SMT threads on a core
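
A minimal sketch (not the authors' hardware mechanism; all names are hypothetical) of what way partitioning means: each core owns a subset of the ways in every set, and a core's fills may only evict lines from the ways it owns, which provides the isolation noted above.

    # Hypothetical sketch of way partitioning in one set of a set-associative cache.
    from typing import List, Optional

    class PartitionedSet:
        def __init__(self, num_ways: int, way_owner: List[int]):
            # way_owner[w] = id of the core that owns way w in this set.
            assert len(way_owner) == num_ways
            self.tags: List[Optional[int]] = [None] * num_ways
            self.way_owner = way_owner

        def fill(self, core: int, tag: int) -> None:
            """Insert a line for `core`, evicting only from ways that core owns."""
            owned = [w for w, c in enumerate(self.way_owner) if c == core]
            for w in owned:
                if self.tags[w] is None:       # prefer an empty owned way
                    self.tags[w] = tag
                    return
            self.tags[owned[0]] = tag          # else evict an owned way (a real cache would use LRU)

    # Example: 8 ways, core 0 owns 5 ways and core 1 owns 3; core 1 can only
    # displace data held in its own 3 ways, so it never evicts core 0's data.
    s = PartitionedSet(8, way_owner=[0, 0, 0, 0, 0, 1, 1, 1])
    s.fill(1, tag=0xBEEF)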

DCCP Problem
• Dynamic Cache Co-partitioning
• Simultaneously perform cache partitioning at multiple levels
• Current Work: L2+L3 DCCP
• Optimize for System Throughput (STP)
• First Attempt at this problem

[Figure: App1 with 100 KB WSS and App2 with 1000 KB WSS sharing a 128 KB L2 and a 1024 KB L3; with demand-based allocation, data for both Apps fits in the cache. WSS: Working Set Size]

Application Sensitivity to Cache Levels
[Plot: latency normalized to the base case vs. available cache ways, per application and cache level]
App      L1  L2  L3
bzip2    L   L   H
gobmk    H   H   H
povray   H   H   L
astar    L   L   L
(H: high sensitivity to that cache level, L: low sensitivity)

Cache Co-partitioning Motivation
•Static Cache Partitioning (SCP) based on App Cache Sensitivity
• bzip2 (LLH), povray(HHL), gobmk(HHH), astar(LLL)
• SCP-L2 (SCP-L3) : SCP only at L2 (L3)
• SCP-L2L3: SCP at both L2 and L3

[Chart: STP of SCP-L2, SCP-L3, and SCP-L2L3, normalized to the no-opt baseline system. STP: System Throughput]

Cache Partitioning
• Cache Partitioning Technique
• Step 1: Cache Allocation
• Based on Utility Computation
• Step 2: Partitioning Enforcement (both steps are sketched after this list)
• Cache Allocation Utility depends on:
• Working Set Size (WSS)
• Reuse Distance of Data
• Application Sensitivity to Latency
• A high-ILP app would be less sensitive
• Application MLP
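
A minimal sketch of the two steps, not the authors' mechanism: the allocation step below uses a simple greedy marginal-utility rule over per-way hit counts (in the spirit of the UCP work cited on the next slide), and enforcement is just handing each core its computed number of ways. All function and variable names are illustrative.

    # Hypothetical sketch: Step 1 (allocation) grants ways greedily by the
    # marginal hits an extra way would give each core; Step 2 (enforcement)
    # would program the resulting per-core way counts into the cache.
    from typing import List

    def allocate_ways(hits_per_way: List[List[int]], total_ways: int) -> List[int]:
        """hits_per_way[c][w] = expected hits for core c if it owns w+1 ways."""
        num_cores = len(hits_per_way)
        alloc = [0] * num_cores
        for _ in range(total_ways):
            def gain(c: int) -> int:
                # Marginal hits from giving core c one more way.
                cur = hits_per_way[c][alloc[c] - 1] if alloc[c] > 0 else 0
                return hits_per_way[c][alloc[c]] - cur
            alloc[max(range(num_cores), key=gain)] += 1
        return alloc

    # Example: core 0 saturates after one way, core 1 keeps benefiting.
    print(allocate_ways([[50, 60, 61, 61], [30, 55, 75, 90]], total_ways=4))   # -> [1, 3]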

Related Work
• [UCP-LA] M. K. Qureshi et al., MICRO-2006
• Used by most state-of-the-art DCP schemes for Cache Allocation (Step 1)
• Utility Monitors (UMON) track cache hits per way
• Look-ahead algorithm for partitioning
• Cache miss utility
• O(W·C²); W: cache ways, C: number of cores sharing the cache
• [UCP-LU] I. Guney et al., ISC-2015
• Reduces partitioning algorithm overhead but still requires ATD profilers
• Requires offline training per application
• [MCFQ] D. Kaseridis et al., IEEE TC-2014
• Utility model based on:
• MLP
• Cache friendliness (WSS and reuse distance)
• ATD and partitioning algorithm overheads
• [MLM] R. Jain et al., DATE-2016
• RL-based DCP of the LLC
• A single RL agent scales poorly

DCCP using State-of-the-art DCP
• Extending Single-level Cache Techniques to Multi-level Caches
• Apply single-level techniques simultaneously at L2 and L3
• No interaction between the L2 and L3 partition controllers
• UCP-based Co-partitioning
• 1 UMON per App per Cache Level
• 4-SMT requires 8 UMON instances
• Requests reaching L3 depend on the L2 allocation
• L2 and L3 allocators do not interact
• L2 (L3) ATDs result in significant power (latency) overheads
• Varying performance sensitivity to cache misses
• Our Proposal: Machine Learned Caches
• Use model-free Reinforcement Learning
• No special hardware profilers
• Learn Cache Utility online

Reinforcement Learning (RL)
• Learn by interaction with the Environment (Architecture)
• Cache Allocation Utility Model NOT REQUIRED
• No Models for WSS, Reuse distance, Latency Sensitivity, MLP, etc. and their
complex interactions
• Markov Decision Process (MDP)
• Framework to implement RL

MDP Agent Model
State: Agent's View of the Architecture
Actions: How to interact with the Architecture (Cache Reconfiguration), expecting a state change
Reward: Measure of the utility of the performed cache reconfiguration; helps identify good actions from a state
Q-Learning Algorithm
• One of the most popular RL algorithms
• Model-Free RL Technique
• Does not require an Environment (Architecture) Model
• Finds a good action-selection policy for an MDP
• Learns an Action-Value Function (Cache Allocation Utility)
• Expected Utility of an Action (Cache Reconfiguration) in a State
• One-Step Q-Learning:

  Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]

• s_t, a_t: current state and action
• \alpha: Learning rate (importance of new data)
• \gamma: Reward Discount (importance of long-term reward)
• s_{t+1}: new state after performing a_t in s_t
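
As a concrete illustration, a minimal tabular version of this update (generic one-step Q-learning, with illustrative values for alpha and gamma rather than the ones used in this work) could look like:

    # One-step tabular Q-learning update, following the equation above.
    from collections import defaultdict

    ALPHA = 0.1   # learning rate (illustrative value, not from the paper)
    GAMMA = 0.9   # reward discount (illustrative value, not from the paper)

    Q = defaultdict(float)          # Q[(state, action)] -> expected utility

    def q_update(state, action, reward, next_state, actions):
        """Q(s_t,a_t) += alpha * (r_{t+1} + gamma * max_a Q(s_{t+1},a) - Q(s_t,a_t))."""
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # Here the reward would be the IPC measured over the last interval.
    q_update(state=2, action=(0, +1), reward=1.7, next_state=3,
             actions=[(0, 0), (0, +1), (+1, 0)])
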
Coordinated Learning

• Uncoordinated Agents
• Agents' actions are independent
• Selfish Agent Actions: what is best for me is best for the system
• Maximum Utility to the Agent
• Can the agents work together?
• Pick actions jointly for better global optimization
• Maximum Utility to the System

Coordinated Joint Action
• The Q-Table represents the Utility of Actions (Cache Reconfigurations)
• A Central Controller searches for the Joint Action with Maximum Utility to the System
• Search only feasible joint actions
• Total allocation and deallocation requests must match
• Hill Climbing Search: low overhead (see the sketch below)
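
A sketch of how such a low-overhead search could look (all names here are hypothetical; the feasibility test only checks that requested way allocations and deallocations cancel out at each level, as stated above):

    # Hypothetical sketch: hill climbing over joint actions.  A joint action is
    # one (L2Request, L3Request) pair per agent; it is feasible only if the
    # requested way changes sum to zero at each cache level.
    import random
    from collections import defaultdict
    from typing import List, Tuple

    Action = Tuple[int, int]                      # (L2Request, L3Request)

    def feasible(joint: List[Action]) -> bool:
        return sum(a[0] for a in joint) == 0 and sum(a[1] for a in joint) == 0

    def joint_utility(joint: List[Action], q_tables, states) -> float:
        # System utility = sum of each agent's Q-value for its own action.
        return sum(q[(s, a)] for q, s, a in zip(q_tables, states, joint))

    def hill_climb(actions: List[Action], q_tables, states, steps: int = 50) -> List[Action]:
        n = len(q_tables)
        best = [(0, 0)] * n                       # "do nothing" is always feasible
        best_u = joint_utility(best, q_tables, states)
        for _ in range(steps):
            cand = list(best)
            i, j = random.sample(range(n), 2)     # perturb two agents so requests can still cancel
            cand[i], cand[j] = random.choice(actions), random.choice(actions)
            if feasible(cand):
                u = joint_utility(cand, q_tables, states)
                if u > best_u:
                    best, best_u = cand, u
        return best

    # Example with two agents whose Q-tables default to zero utility.
    qs = [defaultdict(float), defaultdict(float)]
    qs[0][(1, (+1, 0))] = 0.4                     # agent 0 benefits from one more L2 way
    qs[1][(2, (-1, 0))] = 0.1                     # agent 1 loses little by giving one up
    print(hill_climb([(-1, 0), (0, 0), (+1, 0)], qs, states=[1, 2]))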

Machine Learned Caches (MLC)
• 1 MDP Agent per SMT Core
• Hill Climbing Search for the Best Joint Action
• Agent Action Space
• Resize only one cache level at a time
• Request an increase (+) / decrease (-) in cache ways
• L2Request = [-2, -1, 0, +1, +2]
• L3Request = [-4, -2, -1, 0, +1, +2, +4]
• < L2Request, L3Request > with at most one nonzero request: 11 Actions

MDP Agent Model (sketched in code below)
State: Quantized IPC values
Actions: Resize Cache: < L2Request, L3Request >
Reward: Instructions Per Cycle (IPC)
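
The action space and state above can be written down directly. The sketch below (hypothetical names; the IPC quantization step is an illustrative choice, not from the paper) enumerates the 11 actions, i.e. every single-level resize plus the no-op:

    # Hypothetical sketch of one MLC agent's MDP model.
    L2_REQUESTS = [-2, -1, 0, +1, +2]
    L3_REQUESTS = [-4, -2, -1, 0, +1, +2, +4]

    # Only one cache level is resized at a time, so an action is either
    # <l2, 0>, <0, l3>, or the no-op <0, 0>.
    ACTIONS = sorted({(l2, 0) for l2 in L2_REQUESTS} | {(0, l3) for l3 in L3_REQUESTS})
    assert len(ACTIONS) == 11

    IPC_BUCKET = 0.25                  # illustrative quantization step

    def state_from_ipc(ipc: float) -> int:
        """State = quantized IPC of the SMT core this agent manages."""
        return int(ipc / IPC_BUCKET)

    def reward_from_ipc(ipc: float) -> float:
        """Reward = IPC measured over the last interval."""
        return ipc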

Smart Updates
• Exploit the IPC vs. cache size relation
• IPC cannot decrease with increasing cache size
• Infer Multiple Learnings from a single observation
• Better adaptation to Application Phase Changes
[Figure: grid of actions, L2 resize requests (+2, +1, -1, -2) vs. L3 resize requests (-4, -2, -1, 0, +1, +2, +4). The Q-Update for Action <0, -4> also yields Smart Updates: a lower bound on IPC utility for actions with the same L3 size and for actions with more L3 size.]
Smart Updates (contd.)
[Figure: for the Q-Update of Action <0, 0>, the observed IPC also yields Smart Updates: a lower bound on IPC utility for actions with more cache (L2/L3) and an upper bound on IPC utility for actions with less cache. A code sketch follows.]
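
A sketch of the idea (hypothetical helper names; treating the observed IPC as a soft bound is one reasonable reading of the slides, not necessarily the exact rule used): after the normal Q-update for the taken action, the same observation is pushed as a lower bound into actions that end up with at least as much cache, and as an upper bound into actions that end up with less.

    # Hypothetical sketch of Smart Updates: reuse one observed IPC to bound the
    # utility of related actions, since IPC cannot drop when the cache grows.
    from collections import defaultdict

    def more_cache(a, b):
        """True if action a requests at least as much cache as b at both levels
        (requests are comparable because both start from the current allocation)."""
        return a[0] >= b[0] and a[1] >= b[1]

    def smart_update(Q, state, taken, reward, actions, alpha=0.1):
        for a in actions:
            if a == taken:
                continue
            if more_cache(a, taken) and Q[(state, a)] < reward:
                # Observed IPC is a lower bound for actions with more cache.
                Q[(state, a)] += alpha * (reward - Q[(state, a)])
            elif more_cache(taken, a) and Q[(state, a)] > reward:
                # Observed IPC is an upper bound for actions with less cache.
                Q[(state, a)] += alpha * (reward - Q[(state, a)])

    # Example: the observation for the no-op <0, 0> bounds its neighbours.
    Q = defaultdict(float)
    smart_update(Q, state=3, taken=(0, 0), reward=1.5,
                 actions=[(0, -1), (0, 0), (0, +1)])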

Multi-Agent Coordinated Working
[Flow: repeating control cycle]
Agents get their State
→ Perform Hill Climbing Search to find the Best Joint Action
→ Each Agent executes its action of Cache Reconfiguration
→ System performs the Cache Reconfigurations and executes
→ Agents receive the reward (IPC) for their last Action and learn
→ Agents perform Smart Updates
→ (repeat)


Experimental Setup and Results
Simulator: SniperSim + McPAT
Architecture: Intel x86 Nehalem
Cores: 3.2 GHz, 8-SMT cores (2-SMT per physical core)
L1 Cache: split, private, 3-cycle access, 32 KB L1I + 32 KB L1D
L2 Cache: private, 256 KB, 8-way, 10-cycle access
L3 Cache: 8 MB, 32-way, 30-cycle access, shared by the 8 SMT cores
Uncore: 3.2 GHz (NoC + LLC)
DRAM: 60 ns (~200 cycles) access
Workloads: 15 8-benchmark workloads using SPEC CPU2006

DCCP        STP     EDP
coMCFQ      4.9%    6.8%
coMLM       4.6%    7.14%
coUCP-LU    7.7%    9.4%
MLC         9.35%   13.5%

Conclusion
• Dynamic Cache Co-partitioning (DCCP)
• New problem proposed
• Perform cache allocation across cache levels
• DCCP outperforms DCP
• Extending state-of-the-art DCP techniques not efficient
• Multi-Agent Reinforcement Learning
• No Cache Utility Model required, learns the model online
• No special data profilers required
• Low implementation overhead
• Machine Learned Caches
• Coordinated Multi-Agent RL model
• Smart Updates for faster adaptability to phase changes
• Avg 9.35% (13.5%) STP (EDP) improvement evaluated on 8-SMT system

Thank You
• This work was partially supported by SERB and CII, Govt. of
India under the Prime Minister’s Doctoral Fellowship with Intel
as the industry partner
• References
• [UCP-LA] M. K. Qureshi et al., "Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches," in MICRO, 2006.
• [UCP-LU] I. Guney et al., "A machine learning approach for a scalable, energy-efficient utility-based cache partitioning," in ISC, 2015.
• [MCFQ] D. Kaseridis et al., "Cache friendliness-aware management of shared last-level caches for high performance multicore systems," IEEE TC, 2014.
• [MLM] R. Jain et al., "Machine Learned Machines: Adaptive co-optimization of caches, cores, and on-chip network," in DATE, 2016.

Reinforcement Learning
• Learning takes place as a result of interaction between an
agent and the world
• Adaptiveness of RL can handle the complexity of applying
multiple optimizations simultaneously
• RL is useful for Sequential Decision Making Problems
• Agent interacts with Architecture by taking actions and
reaching next state
• Agent learns good/bad states and optimal actions based on
the received rewards
Markov Decision Process
• Markov Decision Process (MDP)
• Framework to implement RL
• Set of States
• Set of Actions
• Reward Function
• Computational Model
• Agent interacts with the Environment
• Learns Optimal Actions from a State by Trial and Error
• s_0 -> a_0 -> r_1 -> s_1 -> a_1 -> ...

DCCP Results
