Cache Coherence: Write-Invalidate Snooping Protocol For Write-Back


When a block is first loaded in the cache it is marked "valid".


On a read miss in the local cache, the read request is broadcast on the bus. If another cache has that address in the "dirty" state, it changes the state to "valid" and sends the copy to the requesting node. The "valid" state means that the cache line is current.
When writing a block in state "valid" its state is changed
to "dirty" and a broadcast is sent out to all cache
controllers to invalidate their copies.
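Purely as an illustration of the protocol just described (not code from the presentation), a minimal C sketch of the per-line state machine might look like the following; the three-state model (invalid/valid/dirty) and the bus helper names mentioned in the comments are assumptions made for the example:

    /* Hypothetical sketch of a write-invalidate, write-back snooping
     * protocol; the names are illustrative, not a real API. */
    enum line_state { LINE_INVALID, LINE_VALID, LINE_DIRTY };

    struct cache_line {
        enum line_state state;
        unsigned long   tag;
    };

    /* Local CPU reads and misses: the read request is broadcast on the bus. */
    void on_local_read_miss(struct cache_line *line, unsigned long tag)
    {
        /* bus_broadcast_read(tag) would be snooped by every other controller;
         * whoever holds the line dirty supplies the data (see below). */
        line->tag   = tag;
        line->state = LINE_VALID;       /* the filled line is current */
    }

    /* Another controller snoops the read request for a line it holds dirty. */
    void on_snooped_read(struct cache_line *line)
    {
        if (line->state == LINE_DIRTY) {
            /* ...send the copy to the requesting node... */
            line->state = LINE_VALID;   /* dirty -> valid after sharing */
        }
    }

    /* Local CPU writes a line it holds in state "valid". */
    void on_local_write(struct cache_line *line)
    {
        if (line->state == LINE_VALID) {
            /* bus_broadcast_invalidate(line->tag) tells all other controllers */
            line->state = LINE_DIRTY;
        }
    }

    /* Another controller snoops an invalidate for a line it caches. */
    void on_snooped_invalidate(struct cache_line *line)
    {
        line->state = LINE_INVALID;
    }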

Presentation on theme: "Cache & SpinLocks Udi & Haim. Agenda Caching background Why do we need caching? Caching in modern desktop. Cache writing. Cache coherence. Cache." Presentation transcript:

1

Cache & SpinLocks Udi & Haim

2

Agenda Caching background Why do we need caching? Caching in modern desktop. Cache writing. Cache coherence.

Cache & Spinlocks Agenda Concurrent Systems Synchronization Types Spinlock Semaphore Mutex Seqlocks RCU Spinlock in linux kernel Caching and locking

Cache Why caching? Accessing the main memory is expensive, and is becoming the PC performance bottleneck.

Caching in modern desktop What is caching? "A computer memory with very short access time used for storage of frequently used instructions or data" (webster.com). A modern desktop has at least three caches: TLB (translation lookaside buffer), I-Cache (instruction cache), D-Cache (data cache).

Caching in modern desktop Locality: temporal locality, spatial locality. Cache coloring. Replacement policies: LRU, MRU, direct-mapped cache. Cache performance = the proportion of accesses that result in a cache hit.

8

Cache writing There are two basic approaches:
Write-through: Write is done synchronously both to the cache and to the backing store.
Write-back (or Write-behind): Initially, writing is done only to the cache. The write to the backing store is postponed until the cache blocks containing the data are about to be modified/replaced by new content.
We think you have liked this presentation. If you wish to
Cache writing Two approaches for situations of write-misses:
No-write allocate (aka Write around): The missed-write location is not loaded to cache, and is written directly to the backing store. In this approach, only reads are being cached.
Write allocate (aka Fetch on write): The missed-write location is loaded to cache, followed by a write-hit operation. In this approach, write misses are similar to read-misses.
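Only as an illustration of the two policy pairs above (not from the slides), a toy single-line cache in C might handle a store like this; backing_store, the line layout, and the function names are invented for the sketch:

    #include <stdbool.h>
    #include <string.h>

    #define LINE_SIZE 64

    static unsigned char backing_store[1 << 20];   /* pretend main memory */

    struct toy_line {
        bool          valid;
        bool          dirty;                       /* used only by write-back */
        unsigned long tag;
        unsigned char data[LINE_SIZE];
    };

    /* Write-through + no-write-allocate: on a miss, bypass the cache. */
    void store_write_through(struct toy_line *l, unsigned long addr, unsigned char v)
    {
        unsigned long tag = addr / LINE_SIZE;

        backing_store[addr] = v;                   /* always goes to memory    */
        if (l->valid && l->tag == tag)             /* update cache only on hit */
            l->data[addr % LINE_SIZE] = v;
    }

    /* Write-back + write-allocate: on a miss, fetch the line, then write-hit. */
    void store_write_back(struct toy_line *l, unsigned long addr, unsigned char v)
    {
        unsigned long tag = addr / LINE_SIZE;

        if (!l->valid || l->tag != tag) {
            if (l->valid && l->dirty)              /* evict: the postponed write */
                memcpy(&backing_store[l->tag * LINE_SIZE], l->data, LINE_SIZE);
            memcpy(l->data, &backing_store[tag * LINE_SIZE], LINE_SIZE);
            l->tag = tag;
            l->valid = true;
            l->dirty = false;
        }
        l->data[addr % LINE_SIZE] = v;             /* the write-hit            */
        l->dirty = true;                           /* memory updated later     */
    }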

10

Cache coherence Coherence defines the behavior of reads and writes to the same memory location.

11

Cache coherence The coherence of caches is obtained if the following conditions are met:
In a read made by a processor P to a location X that follows a write by the same processor P to X, with no writes of X by another processor occurring between the write and the read instructions made by P, X must always return the value written by P. This condition is related to program order preservation, and this must be achieved even in monoprocessed architectures.
A read made by a processor P1 to location X that follows a write by another processor P2 to X must return the written value made by P2 if no other writes to X made by any processor occur between the two accesses. This condition defines the concept of a coherent view of memory. If processors can read the same old value after the write made by P2, we can say that the memory is incoherent.
Writes to the same location must be sequenced. In other words, if location X received two different values A and B, in this order, from any two processors, the processors can never read location X as B and then read it as A. The location X must be seen with values A and B in that order.

12

Cache coherence Cache coherence mechanisms: Directory-based, Snooping (BUS-based), and many more.

13

Cache coherence Directory-based In a directory-based system, the data being shared is placed in a common directory that maintains the coherence between caches. The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory to its cache. When an entry is changed, the directory either updates or invalidates the other caches with that entry.

14

Cache coherence Snooping (BUS-based) Snooping is the process where the individual caches monitor address lines for accesses to memory locations that they have cached. It is called a write invalidate protocol when a write operation is observed to a location that a cache has a copy of. There are two implementations for the invalidate protocol:
Write-update: When a local cache block is updated, the new data block is broadcast to all caches containing a copy of the block for updating them.
Write-invalidate: Invalidate all remote copies of the cache block when a local cache block is updated.

15

Cache coherence Coherence protocol example: Write-invalidate Snooping Protocol For Write-through. Writes invalidate all other caches.

16

Cache coherence Write-invalidate Snooping Protocol For Write-back When a block is first loaded in the cache it is marked "valid". On a read miss to the local cache, the read request is broadcast on the bus. If another cache has that address in the state "dirty", it changes the state to "valid" and sends the copy to the requesting node. The "valid" state means that the cache line is current. When writing a block in state "valid", its state is changed to "dirty" and a broadcast is sent out to all cache controllers to invalidate their copies.

17

Cache coherence - MESI MESI: Modified, Exclusive, Shared, Invalid

18

Cache coherence - MESI Every cache line is marked with one of the four following states:
Modified - The cache line is present only in the current cache, and is dirty; it has been modified from the value in main memory. The cache is required to write the data back to main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state. The write-back changes the line to the Exclusive state.
Exclusive - The cache line is present only in the current cache, but is clean; it matches main memory. It may be changed to the Shared state at any time, in response to a read request. Alternatively, it may be changed to the Modified state when writing to it.
Shared - Indicates that this cache line may be stored in other caches of the machine and is "clean"; it matches the main memory. The line may be discarded (changed to the Invalid state) at any time.
Invalid - Indicates that this cache line is invalid (unused).
To summarize, MESI is an extension of the MSI algorithm. MESI adds a distinction between modifying a cache line that exists only in my cache and modifying a cache line that also exists in other caches.
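As a hedged illustration of these four states (not taken from the slides), the C sketch below encodes two representative transitions, the local write and the snooped read; the bus helper names in the comments are invented, and real MESI controllers handle more events than this:

    /* Illustrative MESI sketch; event handling is simplified. */
    enum mesi_state { MESI_MODIFIED, MESI_EXCLUSIVE, MESI_SHARED, MESI_INVALID };

    /* Local CPU writes the line: E goes to M silently, S must invalidate the
     * other copies first, I must first fetch the line for ownership. */
    enum mesi_state mesi_on_local_write(enum mesi_state s)
    {
        switch (s) {
        case MESI_MODIFIED:
            return MESI_MODIFIED;       /* already exclusive and dirty */
        case MESI_EXCLUSIVE:
            return MESI_MODIFIED;       /* no bus transaction needed   */
        case MESI_SHARED:
            /* bus_invalidate(line);       invalidate all other copies */
            return MESI_MODIFIED;
        case MESI_INVALID:
        default:
            /* bus_read_for_ownership(line); fetch + invalidate others */
            return MESI_MODIFIED;
        }
    }

    /* Another CPU announces a read: M supplies the dirty data and drops to S,
     * E drops to S, S and I are unchanged. */
    enum mesi_state mesi_on_snooped_read(enum mesi_state s)
    {
        if (s == MESI_MODIFIED) {
            /* write back / supply the dirty data to the requester */
            return MESI_SHARED;
        }
        if (s == MESI_EXCLUSIVE)
            return MESI_SHARED;
        return s;
    }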

19

Cache coherence - MESI For any given pair of caches, the permitted states of a given cache line are as follows: The Exclusive state is an opportunistic optimization: if the CPU wants to modify a cache line that is in state S, a bus transaction is necessary to invalidate all other cached copies. State E enables modifying a cache line with no bus transaction.
20

Cache coherence

21

Cache What is done by the OS and what is done by the hardware? In the Intel X86 series, caching is implemented in hardware; all you need, and can do, is change the configuration through a register interface called Control registers. The control registers are set into 7 groups: CR0, CR1, CR2, CR3, CR4, and another 2 groups called EFER and CR8 (added to support the X64 series). Our main interest in the presentation revolves around caching, but bear in mind that this interface contains every parameter you can set on the Intel architecture. CR0 CD (bit 30) globally enables/disables the memory cache. CR0 NW (bit 29) globally enables/disables write-back caching (or write-through). Flushing of TLB entries can be done in Linux using an API called vpid_sync_context. The implementation is done by using vpid_sync_vcpu_single or vpid_sync_vcpu_global for a single or all CPUs.
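For illustration only (not from the slides), reading CR0 and testing the CD/NW bits could look like the hypothetical kernel-mode sketch below; it assumes x86-64, GCC inline assembly, and ring 0, since CR0 cannot be read from user space:

    /* Hypothetical ring-0 sketch: inspect the CR0.CD and CR0.NW bits. */
    #include <linux/printk.h>

    static inline unsigned long read_cr0_raw(void)
    {
        unsigned long cr0;

        /* "mov %cr0, reg" is a privileged instruction. */
        asm volatile("mov %%cr0, %0" : "=r"(cr0));
        return cr0;
    }

    static void report_cache_bits(void)
    {
        unsigned long cr0 = read_cr0_raw();
        int cd = (cr0 >> 30) & 1;   /* 1 = memory cache globally disabled */
        int nw = (cr0 >> 29) & 1;   /* 1 = write-back caching disabled    */

        printk(KERN_INFO "CR0.CD=%d CR0.NW=%d\n", cd, nw);
    }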

22

Caching & Spinlock

23

Caching and spin lock

    spin_lock:
        mov eax, 1
        xchg eax, [locked]
        test eax, eax
        jnz spin_lock
        ret
    spin_unlock:
        mov eax, 0
        xchg eax, [locked]
        ret
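The slides express the lock in x86 assembly; the following C rendering is offered only as an illustrative equivalent (the use of C11 atomics is my substitution, not part of the presentation):

    #include <stdatomic.h>

    static atomic_int locked = 0;   /* 0 = free, 1 = held */

    static void spin_lock_c(void)
    {
        /* atomic exchange plays the role of "xchg eax, [locked]" */
        while (atomic_exchange_explicit(&locked, 1, memory_order_acquire) == 1)
            ;                        /* spin until the old value was 0 */
    }

    static void spin_unlock_c(void)
    {
        atomic_store_explicit(&locked, 0, memory_order_release);
    }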


30

Caching and spin lock

    spin_lock:
        mov eax, 1
        xchg eax, [locked]
        test eax, eax
        jnz spin_lock
        ret
    spin_unlock:
        mov eax, 0
        xchg eax, [locked]
        ret

The other CPU action


37

Caching and spin lock

    spin_lock:
        mov eax, [locked]
        test eax, eax
        jnz spin_lock
        mov eax, 1
        xchg eax, [locked]
        test eax, eax
        jnz spin_lock
        ret
    spin_unlock:
        mov eax, 0
        xchg eax, [locked]
        ret

38

Caching and ticket lock

    struct spinlock_t { int current_ticket; int next_ticket; };

    void spin_lock(spinlock_t *lock) {
        int t = atomic_inc(lock->next_ticket);  /* fetch-and-increment */
        while (t != lock->current_ticket)
            ;                                   /* spin until my turn  */
    }

    void spin_unlock(spinlock_t *lock) {
        lock->current_ticket++;
    }
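For a self-contained version of the same idea, one could write the slide's pseudocode with C11 atomics; this is only a sketch under that assumption, not kernel code, and pause/relax hints are omitted:

    #include <stdatomic.h>

    struct ticket_lock {
        atomic_uint next_ticket;      /* queue ticket   */
        atomic_uint current_ticket;   /* dequeue ticket */
    };

    static void ticket_lock_acquire(struct ticket_lock *l)
    {
        /* take a ticket: atomic fetch-and-increment */
        unsigned int t = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                                   memory_order_relaxed);
        /* wait until the lock holder hands the turn to us */
        while (atomic_load_explicit(&l->current_ticket,
                                    memory_order_acquire) != t)
            ;
    }

    static void ticket_lock_release(struct ticket_lock *l)
    {
        atomic_fetch_add_explicit(&l->current_ticket, 1, memory_order_release);
    }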


60

Interrupt

61

An interrupt is simply a signal that the hardware can send when it wants the processor's attention. A driver need only register a handler for its device's interrupts, and handle them properly when they arrive.

62

Interrupt Cont Register API:

    int request_irq(unsigned int irq,
                    irqreturn_t (*handler)(int, void *, struct pt_regs *),
                    unsigned long flags, const char *dev_name, void *dev_id);

Unregister API:

    void free_irq(unsigned int irq, void *dev_id);
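A hedged usage sketch with the (older, pt_regs-era) handler prototype shown above; the IRQ number, device name, and handler body are invented for illustration:

    /* Illustrative only: registering and releasing an interrupt line
     * with the legacy three-argument handler prototype. */
    #include <linux/interrupt.h>
    #include <linux/module.h>

    #define MY_IRQ 11                       /* hypothetical IRQ number */

    static irqreturn_t my_handler(int irq, void *dev_id, struct pt_regs *regs)
    {
        /* acknowledge the device, read its status registers, etc. */
        return IRQ_HANDLED;
    }

    static int __init my_init(void)
    {
        /* SA_SHIRQ marks the line shareable; dev_id must then be unique. */
        return request_irq(MY_IRQ, my_handler, SA_SHIRQ, "my_dev", &my_handler);
    }

    static void __exit my_exit(void)
    {
        free_irq(MY_IRQ, &my_handler);      /* same dev_id used to register */
    }

    module_init(my_init);
    module_exit(my_exit);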

63

Interrupt Cont unsigned int irq The interrupt number being requested. irqreturn_t (*handler)(int, void *, struct pt_regs *) The pointer to the handling function being installed.

64

Interrupt Cont unsigned long flags A bit mask of options related to the interrupt. SA_INTERRUPT - when set, this indicates a fast interrupt handler; fast handlers are executed with interrupts disabled on the current processor. SA_SHIRQ - this bit signals that the interrupt can be shared between devices. const char *dev_name The string passed to request_irq is used in /proc/interrupts.

65

Interrupt Cont void *dev_id Pointer used for shared interrupt lines. It is a unique identifier that is used when the interrupt line is freed and that may also be used by the driver to point to its own private data area.

66

Top & Bottom Half How to perform lengthy tasks within a handler? By splitting the interrupt handler into two halves. The so-called top half is the routine that actually responds to the interrupt. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time. All interrupts are enabled during execution of the bottom half.

67

Top & Bottom Half Cont Two different mechanisms may be used to implement bottom-half processing: Tasklet - fast and must be atomic (SW interrupt); Workqueue - higher latency but allowed to sleep.

68

Tasklet Fast & atomic (cannot sleep). Guaranteed to run on the same CPU as the function that first schedules them. The interrupt handler can be sure that a tasklet does not begin executing before the handler has completed.

69

Tasklet Cont Another interrupt can certainly be delivered while the tasklet is running, so locking between the tasklet and the interrupt handler may still be required. They may be scheduled to run multiple times, but tasklet scheduling is not cumulative; the tasklet runs only once, even if it is requested repeatedly before it is launched.

70

Tasklet Cont No tasklet ever runs in parallel with itself, since they run only once, but tasklets can run in parallel with other tasklets on SMP systems, so locking between tasklets is required.

71

Tasklet Example

    void short_do_tasklet(unsigned long);
    DECLARE_TASKLET(short_tasklet, short_do_tasklet, 0);

    irqreturn_t short_tl_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
        /* Handle fast path IRQ */
        tasklet_schedule(&short_tasklet);   /* Schedule tasklet */
        return IRQ_HANDLED;
    }

72

Workqueue Higher latency but allowed to sleep. Invoke a function at some future time in the context of a special worker process. A workqueue function runs in process context, so it can sleep if need be. You cannot, however, copy data into user space from a workqueue process. A workqueue process does not have access to any other process's address space.

73

Workqueue Example

    static struct work_struct short_wq;

    /* this line is in short_init() */
    INIT_WORK(&short_wq, (void (*)(void *)) short_do_tasklet, NULL);

    irqreturn_t short_wq_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
        /* Handle fast path IRQ */
        schedule_work(&short_wq);   /* Schedule workqueue */
        return IRQ_HANDLED;
    }

74

Locks

75

Concurrent Systems Concurrency - what happens when the system tries to do more than one thing at once! In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. The computations may be executing on: 1. Multiple cores in the same chip. 2. Preemptively time-shared threads on the same processor. 3. Physically separated processors. (wiki)

76

Concurrent Systems Management The management of

concurrency is one of the core problems in operating systems


programming and the resulting outcome can be indeterminate

77

Concurrent Systems Management Concurrency faults can lead to: Race condition - uncontrolled access to shared data. Starvation - where a process is perpetually denied necessary resources; without those resources, the program can never finish its task. Deadlock - a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does.

78

Race Condition Example for race condition:

    Lock;
    if (!dptr->data[s_pos]) {
        dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
        if (!dptr->data[s_pos])
            goto __cleanup;
    }
    UnLock;

Leads to memory leak !!!

79

Deadlock Example for deadlock:

    Lock;
    if (!dptr->data[s_pos]) {
        dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
        if (!dptr->data[s_pos])
            return -1;
    }
    UnLock;

Leads to deadlock!!!
goto __cleanup;

80

Solution Avoid shared resources whenever possible. In many situations it is possible to design data structures that do not require locking, e.g. by using per-thread or per-CPU data and disabling interrupts. Problem - such sharing is often required; hardware resources are, by their nature, shared, and software resources also must often be available to more than one thread.

81

Solution Cont Mutual exclusion - making sure that only one thread of execution can manipulate a shared resource at any time. Not all critical sections are the same, so the kernel provides different primitives for different needs: process context can sleep, interrupt context cannot sleep.

82

Spinlock A spinlock is a mutual exclusion device that can have only two values: locked and unlocked. If the kernel control path finds the spin lock unlocked, it acquires the lock and continues its execution.

83

Spinlock Cont If the kernel control path finds the lock locked, it spins around, repeatedly executing a tight instruction loop, until the lock is released. Spinlocks may be used in code that cannot sleep, such as interrupt handlers.

84

Spinlock Scenario #1 1. Driver acquires a spinlock 2. Driver loses the processor because: the driver calls a function which puts the process to sleep (e.g. copy_from_user), or kernel preemption kicks in - a higher priority process pushes the driver code aside.

85

Spinlock Scenario #1 Result: Driver holds a spinlock which will not be freed in the near future. In the best case, if another thread tries to acquire the lock it will spin for a long time. In the worst case a deadlock can occur.

86

Spinlock Scenario #1 Conclusion Conclusion: Code holding a spinlock must be atomic and cannot go to sleep (and sometimes cannot even handle interrupts). Preemption is disabled on a processor which holds a spinlock.

87

Spinlock Scenario #2 1. Driver acquires a spinlock 2. Async - the device issues an interrupt. The interrupt handler of the device tries to acquire the spinlock.

88

Spinlock Scenario #2 Result: What happens if the interrupt routine executes on the same processor as the code that took out the lock originally? Deadlock !!!

89

Spinlock Scenario #2 Conclusion Conclusion: In this case acquiring the spinlock must disable interrupts. Spinlock critical sections must be as short as possible (the longer you hold a lock, the longer another processor may have to spin waiting for you to release it).

90

Spinlock Scenario #2 Conclusion Cont Long lock-hold times also keep the current processor from scheduling, meaning that a higher priority process, which really should be able to get the CPU, may have to wait.

91

Spinlock API Initialize APIs:

    spinlock_t lock = SPIN_LOCK_UNLOCKED;
    spin_lock_init(spinlock_t *lock)

92

Spinlock API Lock APIs and, possibly, disabling interrupts:

    void spin_lock(spinlock_t *lock)
    void spin_lock_irqsave(spinlock_t *lock, unsigned long flags)

Interrupts can execute in nested fashion; the previous interrupt state is stored in flags (safe).

93

Spinlock API

    void spin_lock_irq(spinlock_t *lock)

If you are absolutely sure nothing else might have already disabled interrupts on your processor, and you are sure that you should enable interrupts when you release your spinlock.

    void spin_lock_bh(spinlock_t *lock)

Disables software interrupts before taking the lock.

94

Spinlock API Cont Try lock APIs:

    int spin_trylock(spinlock_t *lock)
    int spin_trylock_bh(spinlock_t *lock)

Non-spinning versions of the above functions.

95

Spinlock API Cont Unlock APIs:

    void spin_unlock(spinlock_t *lock)
    void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
    void spin_unlock_irq(spinlock_t *lock)
    void spin_unlock_bh(spinlock_t *lock)
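A hedged usage sketch of the irqsave variants listed above; the data structure and critical section are invented for the example, and DEFINE_SPINLOCK is the newer initializer rather than the SPIN_LOCK_UNLOCKED form shown earlier:

    /* Illustrative only: protecting a driver-private list with a spinlock
     * that may also be taken from an interrupt handler. */
    #include <linux/spinlock.h>
    #include <linux/list.h>

    static LIST_HEAD(my_queue);
    static DEFINE_SPINLOCK(my_lock);

    void enqueue_item(struct list_head *item)
    {
        unsigned long flags;

        /* Save and disable interrupts so the IRQ handler cannot
         * deadlock against us on the same CPU. */
        spin_lock_irqsave(&my_lock, flags);
        list_add_tail(item, &my_queue);
        spin_unlock_irqrestore(&my_lock, flags);
    }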

96

Semaphore A single integer value combined with a pair of functions: one which acquires the semaphore, one which releases the semaphore. If the value of the semaphore is greater than zero: the value is decremented by one and the process continues.

97

Semaphore Cont Otherwise the process goes to sleep until another process releases the semaphore (incrementing the semaphore value by one) and, if necessary, wakes up processes that are waiting.

98

Semaphore Struct The semaphore struct can be found at include/linux/semaphore.h:

    struct semaphore {
        raw_spinlock_t   lock;
        unsigned int     count;
        struct list_head wait_list;
    };

99

Semaphore Struct Cont The ->lock (spinlock) controls access to the other members of the semaphore. The ->count variable represents how many more tasks can acquire this semaphore; if it's zero, there may be tasks waiting on the wait_list. The ->wait_list is a list of tasks waiting for the semaphore (FIFO).

100

Semaphore Lock Procedure Semaphore Lock:
1. acquire spinlock
2. if (count > 0): i. count--
3. else: i. insert calling task at the tail of wait_list; ii. set wakeup flag to 0; iii. repeat: release spinlock, put task to sleep, acquire spinlock, if (wakeup flag == 1) exit repeat section
4. release spinlock

101

Semaphore Unlock Procedure Semaphore Unlock:
1. acquire spinlock
2. if (wait_list is empty): i. count++
3. else: i. node = get wait_list head; ii. remove node from wait_list; iii. set wakeup flag to 1; iv. wake up process
4. release spinlock

102

Semaphore APIs Create a semaphore with an initial counter value of val:

    void sema_init(struct semaphore *sem, int val)
    #define DEFINE_SEMAPHORE(name)

Lock a semaphore:

    void down(struct semaphore *sem)
103

Semaphore APIs Interruptible lock (process can be interrupted by a signal):

    int down_interruptible(struct semaphore *sem)

Try lock (never sleeps):

    int down_trylock(struct semaphore *sem)

Unlock semaphore:

    void up(struct semaphore *sem)
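A hedged usage sketch of the API above; the protected resource and the error handling are invented for illustration:

    /* Illustrative only: serializing access to a device with a semaphore. */
    #include <linux/semaphore.h>
    #include <linux/errno.h>

    static struct semaphore my_sem;

    void my_setup(void)
    {
        sema_init(&my_sem, 1);              /* binary semaphore (mutex-like) */
    }

    int my_write_path(void)
    {
        if (down_interruptible(&my_sem))    /* sleep until available */
            return -ERESTARTSYS;            /* interrupted by a signal */

        /* ... touch the shared device state ... */

        up(&my_sem);                        /* release, waking a waiter */
        return 0;
    }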

104

RW Semaphore Semaphores perform mutual exclusion for

all callers Many tasks break down into two distinct types of work:
Readers Writers

105

RW Semaphore Allow multiple concurrent readers. Optimize performance. An RW semaphore allows either one writer or an unlimited number of readers to hold the semaphore.

106

RW Semaphore Cont Since multiple readers may hold the lock at once, a writer may keep waiting for the lock while new reader threads are able to acquire it: write starvation.

107

RW Semaphore API RW Semaphore struct:

    struct rw_semaphore {
        long             count;
        raw_spinlock_t   wait_lock;
        struct list_head wait_list;
    };

108

RW Semaphore API Initialize RW semaphore:

    init_rwsem(struct rw_semaphore *sem)

Obtaining and releasing read access to a reader/writer semaphore:

    void down_read(struct rw_semaphore *sem)
    int down_read_trylock(struct rw_semaphore *sem)
    void up_read(struct rw_semaphore *sem)

109

RW Semaphore API Cont Obtaining and releasing write access to a reader/writer semaphore:

    void down_write(struct rw_semaphore *sem)
    int down_write_trylock(struct rw_semaphore *sem)
    void up_write(struct rw_semaphore *sem)
    void downgrade_write(struct rw_semaphore *sem)
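A hedged usage sketch of these calls; the protected configuration value is invented for the example:

    /* Illustrative only: many concurrent readers, one exclusive writer. */
    #include <linux/rwsem.h>

    static DECLARE_RWSEM(cfg_sem);
    static int cfg_value;

    int read_cfg(void)
    {
        int v;

        down_read(&cfg_sem);        /* shared: other readers may enter  */
        v = cfg_value;
        up_read(&cfg_sem);
        return v;
    }

    void update_cfg(int v)
    {
        down_write(&cfg_sem);       /* exclusive: waits for all readers */
        cfg_value = v;
        up_write(&cfg_sem);
    }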


111

Mutex Similar to semaphore. Mutex struct:

    struct mutex {
        /* 1: unlocked, 0: locked, negative: locked, possible waiters */
        atomic_t           count;
        spinlock_t         wait_lock;
        struct list_head   wait_list;
        struct task_struct *owner;
    };

112

Mutex Vs Semaphore Only one task can hold the mutex at a time (binary semaphore). Only the owner of the mutex can unlock the mutex. Recursive locks. Improvement: try to spin for acquisition when we find that there are no pending waiters and the lock owner is currently running on a (different) CPU (it is likely to release the lock soon).

113

Seqlocks In read-write locks: readers must wait until the writer has finished; the writer must wait until all readers have finished. Seqlocks give a much higher priority to writers: a writer is allowed to proceed even when readers are active.

114

Seqlocks Cont The seqlock struct can be found at include/linux/seqlock.h:

    typedef struct {
        seqcount_t seqcount;
        spinlock_t lock;
    } seqlock_t;

115

Seqlocks Read Access Read access works by: obtaining an (unsigned) integer sequence value on entry into the critical section; doing the reading operations; comparing the current sequence # with the one obtained. If there is a mismatch, the read access must be retried.

116

Seqlocks Write Access The write lock is implemented with a spinlock, so all the usual constraints apply. Gives a much higher priority to writers: a writer is allowed to proceed even when readers are active. Increment the sequence #.

117

Seqlocks Read Example A typical code example will look like:

    unsigned int seq;
    do {
        seq = read_seqbegin(&the_lock);
        /* Do what you need to do */
    } while (read_seqretry(&the_lock, seq));
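For symmetry, here is a hedged writer-side sketch using the write API listed on the later slides; the_lock and the protected fields are assumptions made for the example:

    /* Illustrative only: the writer takes the underlying spinlock and
     * bumps the sequence number around the update. */
    #include <linux/seqlock.h>

    static seqlock_t the_lock;      /* assume seqlock_init(&the_lock) ran */
    static unsigned long a, b;      /* data protected by the_lock         */

    void update_pair(unsigned long new_a, unsigned long new_b)
    {
        write_seqlock(&the_lock);   /* readers overlapping this will retry */
        a = new_a;
        b = new_b;
        write_sequnlock(&the_lock);
    }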

118

Seqlocks Summary Pros: the writer never waits (unless another writer is active); free access for readers. Cons: a reader may sometimes be forced to read the same data several times until it gets a valid copy; seqlocks generally cannot be used to protect data structures involving pointers, because the reader may be following a pointer that is invalid while the writer is changing the data structure.

119

Seqlocks API Initialize seqlocks:

    seqlock_t lock = SEQLOCK_UNLOCKED;
    seqlock_init(seqlock_t *lock);

Obtaining read access:

    unsigned int read_seqbegin(seqlock_t *lock);
    unsigned int read_seqbegin_irqsave(seqlock_t *lock, unsigned long flags);

120

Seqlocks API Cont

    int read_seqretry(seqlock_t *lock, unsigned int seq);
    int read_seqretry_irqrestore(seqlock_t *lock, unsigned int seq, unsigned long flags);

121

Seqlocks API Cont Obtaining write access:

    void write_seqlock(seqlock_t *lock);
    void write_seqlock_irqsave(seqlock_t *lock, unsigned long flags);
    void write_seqlock_irq(seqlock_t *lock);
    void write_seqlock_bh(seqlock_t *lock);
    int write_tryseqlock(seqlock_t *lock);

122

Seqlocks API Cont Releasing write access:

    void write_sequnlock(seqlock_t *lock);
    void write_sequnlock_irqrestore(seqlock_t *lock, unsigned long flags);
    void write_sequnlock_irq(seqlock_t *lock);
    void write_sequnlock_bh(seqlock_t *lock);

123

RCU Read Copy Update An improvement over seqlocks. RCU allows many readers and many writers to proceed concurrently (an improvement over seqlocks, which allow only one writer to proceed). Optimized for situations where reads are common and writes are rare.

124

RCU Constraints Constraints: resources being protected should be accessed via pointers; all references to those resources must be held only by atomic code (a process cannot sleep inside a critical region protected by RCU).

125

RCU How? On the reader side, code using an RCU-protected data structure should disable/enable preemption:

    struct my_stuff *stuff;

    rcu_read_lock();                  /* disable preemption */
    stuff = find_the_stuff(args...);
    /* Do what you need to do */
    rcu_read_unlock();                /* enable preemption  */

126

RCU How Cont? On the writer side: allocate a new structure; copy data from the old one; replace the pointer that is seen by the read code. At this point, from the reader's perspective, the change is complete: any code entering the critical section sees the new version of the data.

127

RCU Cleanup The only problem is when to free the old pointer (a reader might still hold a reference to it). Since all code holding references to this data structure must (by the rules) be atomic, we know that once every processor on the system has been scheduled at least once, all references must be gone. RCU sets aside a callback that waits until all processors have scheduled; that callback is then run to perform the cleanup work.
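As a hedged sketch of the writer-side sequence just described (allocate, copy, publish, then defer the free), the code below uses the standard kernel RCU helpers; the structure and globals are invented, and synchronize_rcu() stands in for the "wait until every CPU has scheduled" step:

    /* Illustrative only: replace an RCU-protected structure. */
    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    struct my_stuff {
        int a, b;
    };

    static struct my_stuff __rcu *global_stuff;

    int update_stuff(int a, int b)
    {
        struct my_stuff *new, *old;

        new = kmalloc(sizeof(*new), GFP_KERNEL);    /* 1. allocate          */
        if (!new)
            return -ENOMEM;

        old = rcu_dereference_protected(global_stuff, 1);
        if (old)
            *new = *old;                            /* 2. copy the old data */
        new->a = a;
        new->b = b;

        rcu_assign_pointer(global_stuff, new);      /* 3. publish           */

        synchronize_rcu();                          /* wait for all readers */
        kfree(old);                                 /* 4. now safe to free  */
        return 0;
    }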

128

Simple Spinlock Using a test-and-set atomic function. A test-and-set instruction is an instruction used to write to a memory location and return its old value as a single atomic operation:

    int test_and_set(int *lock)

129

Simple Spinlock

    #define LOCKED 1

    int test_and_set(int *lockPtr)
    {
        int oldValue;

        oldValue = SwapAtomic(lockPtr, LOCKED);
        return (oldValue == LOCKED);
    }

130

Test and set mutex Implementation Lock:

    int lock(int *lockPtr)
    {
        while (test_and_set(lockPtr) == LOCKED)
            ;   /* wait a bit */
    }

Unlock:

    int un_lock(int *lockPtr) { *lockPtr = 0; }

131

Problems Grants requests in unpredictable order. Accelerates inter-CPU bus traffic (cache).

132

Ticket Spinlocks A ticket lock works as follows: two integer values which initialize to 0, a queue ticket and a dequeue ticket.

133

Ticket Spinlocks Cont Acquire lock procedure: Obtain & increment the queue ticket. Compare the ticket's value (before the increment) with the dequeue ticket's value. If they are the same, the thread is permitted to enter the critical section; else, another thread must already be in the critical section and this thread must busy-wait or yield.

134

Ticket Spinlocks Cont Release lock procedure: Increment the dequeue ticket. This permits the next waiting thread to enter the critical section.

135

Ticket Spinlock Summary Grants requests in FIFO order. Problems: accelerates inter-CPU bus traffic (cache).

136

Linux Scalability What is scalability? Application does N

times as much work on N cores as it could on 1 core. Scalability may


be limited by Amdahl's Law: Locks, shared data structures,... Shared
hardware (DRAM, NIC,...)

137

Linux Scalability

138

Linux Scalability Cont

142

Test-and-Set Lock Repeatedly test-and-set a Boolean flag indicating whether the lock is held. Problem: contention for the flag (read-modify-write instructions are expensive). Causes lots of network traffic, especially on cache-coherent architectures (because of cache invalidations). Variation: test-and-test-and-set - less traffic.

143

Ticket Lock 2 counters (nr_requests and nr_releases). Lock acquire: fetch-and-increment on the nr_requests counter, then wait until the ticket is equal to the value of the nr_releases counter. Lock release: increment the nr_releases counter.

144

Ticket Lock Cont Advantage over T&S: polls with read operations only. BUT - still generates lots of traffic and contention: all threads spin on the same shared location, causing cache-coherence traffic on every successful lock access.

145

The Problem Busy-waiting techniques are heavily used in synchronization on shared memory. Busy-waiting synchronization constructs tend to: have significant impact on network traffic due to cache invalidations; contention leads to poor scalability.

146

The Problem Cont Significant impact on network traffic due to cache invalidations: Even in the case of two CPUs repeatedly acquiring a spinlock, the memory location representing that lock will bounce back and forth between those CPUs' caches. Even if neither CPU ever has to wait for the lock, the process of moving it between caches will slow things down considerably.

147

The Problem Cont Contention leads to poor scalability: The simple act of spinning for a lock clearly is not going to be good for performance. Cache contention would appear to be less of an issue (a CPU spinning on a lock will cache its contents in a shared mode). No cache bouncing should occur until the CPU owning the lock releases it (releasing the lock, and its acquisition by another CPU, requires writing to the lock, and that requires exclusive cache access).

148

The Problem Cont Contention leads to poor scalability: Case 2: the lock is contended; there will be one or more other CPUs constantly querying its value, obtaining shared access to that same cache line and depriving the lock holder of the exclusive access it needs. A subsequent modification of data within the affected cache line will thus incur a cache miss. So CPUs querying a contended lock can slow the lock owner considerably, even though that owner is not accessing the lock directly.

149

The Problem Cont Contention leads to poor scalability: Kernel code will acquire a lock to work with (and, usually, modify) a structure's contents. Often, changing a field within the protected structure will require access to the same cache line that holds the structure's spinlock. Case 1: the lock is uncontended; that access is not a problem, the CPU owning the lock probably owns the cache line as well. Case 2: the lock is contended; there will be one or more other CPUs constantly querying its value, obtaining shared access to that same cache line and depriving the lock holder of the exclusive access it needs. A subsequent modification of data within the affected cache line will thus incur a cache miss. So CPUs querying a contended lock can slow the lock owner considerably, even though that owner is not accessing the lock directly.

150

The Source of the Problem Spinning on remote variables.

151

The Proposed Solution Insert delay (backoff). Minimize access to remote variables - spin on local variables instead.

152

Spin Lock With Backoff Rather than spinning tightly and querying a contended lock's status, a waiting CPU should wait a bit more patiently, only querying the lock occasionally. Cause a waiting CPU to loop a number of times doing nothing at all before it gets impatient and checks the lock again.

153

Spin Lock With Backoff Pros Pros: While a CPU is looping without querying the lock it cannot be bouncing cache lines around, so the lock holder should be able to make faster progress. Calculate proportional backoff using the value of the ticket minus the number of the ticket which is currently served, multiplied by the static backoff loop. Cons: too much looping will cause the lock to sit idle before the owner of the next ticket notices that its turn has come; that, too, will hurt performance. All threads spin on the same shared location, causing cache-coherence traffic on every successful lock access.

154

Spin Lock With Backoff Cons Cons: too much looping will cause the lock to sit idle before the owner of the next ticket notices that its turn has come; that, too, will hurt performance. All threads spin on the same shared location, causing cache-coherence traffic on every successful lock access.
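A hedged C sketch of the proportional backoff idea described above, layered on the earlier C11 ticket-lock rendering; BACKOFF_BASE is an invented tuning constant standing in for the "static backoff loop":

    #include <stdatomic.h>

    #define BACKOFF_BASE 100        /* illustrative static backoff loop count */

    struct ticket_lock {
        atomic_uint next_ticket;
        atomic_uint current_ticket;
    };

    static void ticket_lock_backoff(struct ticket_lock *l)
    {
        unsigned int t = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                                   memory_order_relaxed);
        for (;;) {
            unsigned int cur = atomic_load_explicit(&l->current_ticket,
                                                    memory_order_acquire);
            if (cur == t)
                return;             /* our turn */

            /* proportional backoff: (my ticket - ticket being served)
             * times the static loop count, without touching the lock. */
            for (volatile unsigned long i = 0;
                 i < (unsigned long)(t - cur) * BACKOFF_BASE; i++)
                ;
        }
    }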

155

Array Lock

    #define NUM_OF_PROC 100
    #define HAS_LOCK    1
    #define MUST_WAIT   0

    struct arrLock {
        int slot[NUM_OF_PROC];
        int next_slot;
    };

156

Array Lock Cont

    #define INIT_ARR_LOCK(name) \
        struct arrLock name;    \
        name.slot = [0 = HAS_LOCK, 1 .. NUM_OF_PROC-1 = MUST_WAIT]; \
        name.next_slot = 0;

157

Array Lock Acquire

    int arr_lock_lock(struct arrLock *arr_lock_p, int *my_slot)
    {
        *my_slot = fetch_and_increment(arr_lock_p->next_slot); /* returns old value */
        *my_slot %= NUM_OF_PROC;                 /* get the slot inside the array */
        while (arr_lock_p->slot[*my_slot] == MUST_WAIT)
            ;                                    /* spin */
        arr_lock_p->slot[*my_slot] = MUST_WAIT;  /* init for next time */
        return 0;
    }

158

Array Lock Release

    int arr_lock_unlock(struct arrLock *arr_lock_p, int my_slot)
    {
        arr_lock_p->slot[(my_slot + 1) % NUM_OF_PROC] = HAS_LOCK;
        return 0;
    }

Each CPU clears the lock for its successor (sets it from must-wait to has-lock). Lock-acquire: while (slot[my_place] == MUST_WAIT); Lock-release: slot[(my_place + 1) % NUM_OF_PROC] = HAS_LOCK;

159

Array Lock Cons Adjacent data items share a single cache line. A write to one item invalidates that item's cache line.

160

Array Lock Cons How to solve it: Pad array elements so that distinct elements are mapped to distinct cache lines.

161

Array Lock Cons The ALock is not space-efficient. What if we don't know the NUM_OF_PROC value?

162

Array Lock Pros Spin on local variables, no cache jumps.

163

Ticket Lock Improvements 3 counters (nr_requests and 2 nr_releases); each counter is in a different cache line (padding with zeroes). Counter init values: nr_requests = 1; array of nr_releases: nr_releases[0] = 1, nr_releases[1] = 0.

164

Ticket Lock Improvements Algo The algorithm:

    Lock:
        ticket = fetch_and_increment(nr_requests)       // get ticket
        while (ticket != nr_releases[(ticket+1) % 2])   // wait for my turn
            ;
    Lock release:
        nr_releases[ticket % 2] += 2                    // increment by 2

165

Ticket Lock Improvements Summary Advantage: divides cache misses by half (linear in the array size of nr_releases); can be generalized for n release counters. Disadvantage: the lock is not space-efficient - each counter is in a distinct cache line.

166

MCS Lock List Based Queue Lock Goals: reduce bus traffic on cc machines (by spinning on local variables); space efficient. Requires atomic instructions available on some CPUs: ATOMIC_COMPARE_AND_SWAP: CAS(mem, old, new) - if *mem == old, then set *mem = new and return true.

167

MCS Lock List Based Queue Lock

    typedef struct qnode {
        struct qnode *next;
        bool locked;
    } mcs_lock_qnode;

A lock is just a pointer to a qnode:

    typedef mcs_lock_qnode *mcs_lock;

168

MCS Lock Acquire

    acquire(mcs_lock *L, mcs_lock_qnode *I) {
        I->next = NULL;
        qnode *predecessor = I;
        ATOMIC_SWAP(predecessor, *L);   /* predecessor <- old *L, *L <- I */
        if (predecessor != NULL) {      /* queue behind the predecessor   */
            I->locked = true;
            predecessor->next = I;
            while (I->locked) ;         /* spin on our own node           */
        }
    }

169

MCS Lock Acquire If unlocked, *L is NULL. If locked with no waiters, *L is the owner's qnode. If there are waiters, *L is the tail of the waiter list.

170

MCS Lock Release

    release(mcs_lock *L, mcs_lock_qnode *I) {
        if (!I->next) {
            if (ATOMIC_COMPARE_AND_SWAP(*L, I, NULL))
                return;                 /* no waiters: lock is now free */
        }
        while (!I->next) ;              /* a waiter is mid-acquire      */
        I->next->locked = false;        /* hand the lock to the next    */
    }

171

MCS Lock Release If I->next is NULL and *L == I: no one else is waiting for the lock, OK to set *L = NULL. If I->next is NULL and *L != I: another thread is in the middle of acquire; just wait for I->next to become non-NULL. If I->next is non-NULL: I->next is the oldest waiter; wake it up with I->next->locked = false.

172

Exclusive Cache Line The technique given below is used to force alignment of data structures on cache boundaries. Dynamic:

    #define ALIGN 64

    void *aligned_malloc(int size)
    {
        void *mem = kmalloc(size + ALIGN + sizeof(void *), GFP_KERNEL);
        void **ptr = (void **)((long)(mem + ALIGN + sizeof(void *)) & ~(ALIGN - 1));
        ptr[-1] = mem;
        return ptr;
    }

173

Exclusive Cache Line

    void aligned_free(void *ptr)
    {
        kfree(((void **)ptr)[-1]);
    }

Static:

    int __attribute__((aligned(64))) lock;

174

Memory pool Lookaside Caches Allocating many objects of the same size, over and over, in the kernel. API:

    kmem_cache_t *kmem_cache_create(const char *name, size_t size, size_t offset,
                                    unsigned long flags,
                                    void (*constructor)(void *, kmem_cache_t *, unsigned long flags),
                                    void (*destructor)(void *, kmem_cache_t *, unsigned long flags));
    void *kmem_cache_alloc(kmem_cache_t *cache, int flags);
    void kmem_cache_free(kmem_cache_t *cache, const void *obj);
    int kmem_cache_destroy(kmem_cache_t *cache);

flags = SLAB_HWCACHE_ALIGN This flag requires each data object to be aligned to a cache line.

175

Memory pool Memory Pools There are places in the kernel where memory allocations cannot be allowed to fail. A memory pool is really just a form of a lookaside cache that tries to always keep a list of free memory around for use in emergencies. API:

    mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
                              mempool_free_t *free_fn, void *pool_data);
    typedef void *(mempool_alloc_t)(int gfp_mask, void *pool_data);
    typedef void (mempool_free_t)(void *element, void *pool_data);
    void *mempool_alloc(mempool_t *pool, int gfp_mask);
    void mempool_free(void *element, mempool_t *pool);
    int mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask);
    void mempool_destroy(mempool_t *pool);

176

Memory pool Code example

    #define ALIGN 64
    const char *cachName = "slots";

    union slot {
        char spaceKeeper[ALIGN];
        int  val;
    };

    cache = kmem_cache_create(cachName, sizeof(union slot), 0,
                              SLAB_HWCACHE_ALIGN, ...);
    pool = mempool_create(MY_POOL_MINIMUM, mempool_alloc_slab,
                          mempool_free_slab, cache);
    union slot *obj = (union slot *)mempool_alloc(pool, ...);

177

Testing

178

Testing Application The test application is divided into 2 parts: User mode - a performance test application; Kernel mode - a char device.

179

Testing Application - kernel Create a char device driver. Via ioctl, control the following: creation of a spinlock (ticket lock, Array lock, MCS lock); acquire the spinlock (the one which was created); release the spinlock (the one which was created).

180

Testing Application - User The test application will generate on each run a different type of spinlock. The test application will run several forks/threads (one for each CPU core). Each thread will run on a separate (unique) core (sched_setaffinity).

181

Testing Application User Cont Pseudo Code:

    fopen spinlock device
    Create a spinlock type
    for i = 0; i < MAX_OF_CORES; i++
        sched_setaffinity(i)
Buttons:
i++ sched_setanity(i)

182

Testing Application User Cont Each thread will do the

following in a loop of x iterators: acquire a lock suspend himself


(sched_yield) and let other threads to run release the lock Measure
the time that all threads nished

183

Cancel
Testing Application User Cont Pseudo Code:
fopen Download

spinlock device create a spinlock type start_tick= Get Tick for i = 0;


i < MAX_OF_CORES; i++ Run thread (i) Make sure all threads nished
working Time = Get Tick start_tick

184

Testing

Application

User

Cont

Inside

thread

sched_setanity(i) Loop: lock acquire suspend (sched_yield) lock


release

185

Thank you

Download "Cache & SpinLocks Udi & Haim. Agenda Caching


background Why do we need caching? Caching in modern desktop.
Cache writing. Cache coherence. Cache."
