Download as pdf or txt
Download as pdf or txt
You are on page 1of 31

LinuxCon Japan 2013

Memory Hotplug
May 29th, 2013
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
FUJITSU LIMITED

Copyright 2013 FUJITSU LIMITED


Agenda
 Motivation of memory hotplug
 What is memory hotplug?
 Development of memory hotplug
 To-do Lists

1 Copyright 2013 FUJITSU LIMITED


MOTIVATION OF MEMORY HOTPLUG

2 Copyright 2013 FUJITSU LIMITED


Hardware partitioning & Dynamic reconfiguration
 Hardware partitioning
 System can have several servers in a box with flexibility
Partition 1 Partition 2 Partition 3 Partition 1 Partition 2
Application Application Application Application Application
Middleware Middleware Middleware Users can change Middleware Middleware
Linux Linux Linux partition’s configuration Linux Linux
SB
while a partition is power off
SB SB SB SB SB

 Dynamic reconfiguration
 System can change partition’s configuration at runtime
Partition 1 Partition 2
Application Application
Middleware Middleware
Linux Linux
SB SB SB Free SB
User can add
Free SB

3 Copyright 2013 FUJITSU LIMITED


Purpose of memory hotplug (1)
 Dynamic reconfiguration enhance RAS
SB

Partition 1 Partition 1 Partition 1


Application Application Application
Middleware Middleware Middleware
Linux hot remove Linux hot add Linux
SB SB SB SB SB SB SB SB

 Dynamic reconfiguration is used for resize

Partition 1 Partition 1
Application Application
Middleware Middleware
Linux hot add Linux
SB
SB SB Free SB SB SB SB

4 Copyright 2013 FUJITSU LIMITED


Purpose of memory hotplug (2)
 Memory hotplug will be supported by KVM
 “ACPI memory hotplug” by Vasilis Liaskovitis
http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html

Guest machine
Application
Middleware
Linux
Node0 Node1

Add memory

⇨ Memory hotplug has been supported by several OSs. But Linux has
not copmletely support it yet.

5 Copyright 2013 FUJITSU LIMITED


WHAT IS MEMORY HOTPLUG

6 Copyright 2013 FUJITSU LIMITED


Memory hotplug
 Memory hotplug allows to users to increase/decrease the amount
of memory
 Physical memory hotplug phase (For hot adding/removing DIMMs physically)
•memory hot add
•memory hot remove
 Logical memory hotplug phase (For changing the amount of memory)
•memory online
•memory offline

7 Copyright 2013 FUJITSU LIMITED


Memory management
 Kernel manages physical memory by using
 page table (direct mapping)
•calculate virtual address to physical address virtual address
 virtual memory map
•manage page structures
virtual memory map

physical address

page table

node0

8 Copyright 2013 FUJITSU LIMITED


Memory hot add
 Prepare following items
physical address
for managing added
memory node1
virtual address
 page table (direct mapping)
 virtual memory map
virtual memory map
add virtual memory map

node0

page table
add page table

9 Copyright 2013 FUJITSU LIMITED


Memory hot remove
 Delete following items
physical address
for managing removed
memory node1
virtual address
 page table (direct mapping)
 virtual memory map
virtual memory map
delete virtual memory map

node0

page table
delete page table

10 Copyright 2013 FUJITSU LIMITED


Online memory & Offline memory
 All pages are managed physical zone

section section
by each zone structures address zone
page ….
page
node1 page
 User can online/offline page

memory by echoing zone


sysfs file NORMAL section section

 echo online/offline > page


page
….
page
/sys/devices/system/mem page

ory/memoryX/state
 Online memory node0
zone

section section
 change page state to page ….
usable NORMAL page
page
page

 Offline memory
zone
 change page stat to DMA section section
unusable page ….
page
page
DMA32 page

11 Copyright 2013 FUJITSU LIMITED


DEVELOPMENT OF MEMORY HOTPLUG

12 Copyright 2013 FUJITSU LIMITED


Status of memory hotplug on linux-3.2
Hot add memory
 Support
Hot remove memory
 Not supoprt
Online memory
Support
Offline memory
 Support with limitation

13 Copyright 2013 FUJITSU LIMITED


Required items for memory hot remove
 Delete following items
physical address
for managing removed
memory node1
virtual address
 page table (direct mapping)
 virtual memory map
 /sys/firmware/memmap sysfs virtual memory map
 /sys/devices/syste/node/memo delete virtual memory map
ryX sysfs
 Improve ACPI memory
hotplug frame work node0

 Update zone information page table


delete page table

All of them are merged into


linux3.9

14 Copyright 2013 FUJITSU LIMITED


Limitation of memory offline
allocate a new page
 When offlining memory has data, zone
and migrate date to
section section
the data is migrated to other page zone the page
page ….
node1 page
page
 But migratable pages are only page

page cache and anonymous page, zone


called movable memory NORMAL section section

 So if memory is used as other page


page
page
….
page
purpose except for movable
memory, called kernel memory, zone
the memory cannot be offlined node0 section section

page ….
NORMAL page
page
page

zone
DMA section section

page ….
page
page
DMA32 page

15 Copyright 2013 FUJITSU LIMITED


Comparison of 2 ways of offlining
 Support to offline kernel memory
Pros.
•All memory can be offlined
•No performance impact caused by NUMA
Cons.
•Kernel address (P<->V relationship) should change completely
 Make a node only of movable memory
Pros.
•We can make use of existing feature
Cons.
•The node can use as only page cache and anonymous page
•Performance impact caused by NUMA

16 Copyright 2013 FUJITSU LIMITED


Comparison of 2 ways of offlining
 Support to offline kernel memory
Pros.
•All memory can be offlined
•No performance impact caused by NUMA
Cons.
•Kernel address (P<->V relation ship) should change completely

For supporting kernel memory offline, it will take several years.

17 Copyright 2013 FUJITSU LIMITED


Comparison of 2 ways of offlining
 Support to offline kernel memory
Pros.
•All memory can be offlined
As first step, we develop following solution
•No performance impact caused by NUMA
Cons.
•Kernel address (P<->V relation ship) should change completely
 Make a node consists only of movable memory (movable node)
Pros.
•We can make use of existing feature
Cons.
•The node can use as only page cache and anonymous page
•Performance impact by NUMA

18 Copyright 2013 FUJITSU LIMITED


Make a node consists only of movable memory
all pages’ date can be
 For making a memory range of zone
migrated to other
section sectionpages
movable memory, Linux has zone
page ….
ZONE_MOVABLE, this is not node1 page
page
page

created automatically
zone
 If a node is constructed by only MOVABLE section section

ZONE_MOVABLE, the whole page


page
….

memory on the node can be page


page

migrated to other memory and


offlined node0
zone

section section

page ….
NORMAL page
page
page

zone
DMA section section

page ….
page
page
DMA32 page

19 Copyright 2013 FUJITSU LIMITED


ZONE_MOVABLE configuration
 Enhance/Add features for creating ZONE_MOVABLE
1. New boot option, movablecore=acpi
2. New “online” feature, online_movable

20 Copyright 2013 FUJITSU LIMITED


Original boot option of creating MOVABLE zone
 Linux has a boot option, called
movablecore=, for creating
MOVABLE zone node1 node1
5G MOVABLE
 The boot option can specifies
amount of movable memory in NORMAL
a system
NORMAL
 But movable memory is evenly
distributed to all nodes. movablecore=10G
node0 node0
5G MOVABLE
NORMAL
NORMAL

DMA DMA

DMA32 DMA32

21 Copyright 2013 FUJITSU LIMITED


New boot option
 We are proposing a new boot
option “movablecore=acpi”
node1 node1
 use memory affinity structure in ACPI SRAT table
SRAT table Node Hotpluggable bit
 if hotpluggable bit is enable, the 0 Disable
NORMAL MOVABLE
memory range is managed to 1 Enable
MOVABLE zone

The feature is under movablecore=acpi


developing node0 node0

NORMAL NORMAL

DMA DMA

DMA32 DMA32

22 Copyright 2013 FUJITSU LIMITED


Lack of feature at onlining memory
 Hot added memory is always physical
managed by NORMAL zone zone memory
node1 node1
 When onlining memory, the free

memory may contains kernel movable memory

memory and movable memory NORMAL free

echo online > kernel memory

/sys/devices/system/node/nodeX/m movable memory


kernel memory
emoryY/state free
online
node0 memory node0

NORMAL

DMA

DMA32

23 Copyright 2013 FUJITSU LIMITED


New interface at memory onlining
 Online memory into physical
ZONE_MOVABLE zone zone memory
node1 node1 node1
echo online_movable > free
/sys/devices/system/node/nodeX/me movable memory
moryY/state
NORMAL MOVALBE free

The feature is merged into movable memory

linux3.8 free
online_movable
memory
node0 node0 node0

NORMAL NORMAL

DMA DMA

DMA32 DMA32

24 Copyright 2013 FUJITSU LIMITED


Current status of memory hotplug
Hot add memory
 Support
Hot remove memory
 supoprt
Online memory
Support
Offline memory
 Support with limitation
•added online_movable option
•under developing movablecore=acpi option

25 Copyright 2013 FUJITSU LIMITED


TO-DO LISTS

26 Copyright 2013 FUJITSU LIMITED


Switch of changing hot added memory’s zone
 Hot added memory is always
managed by NORMAL zone
node1 node1
 If user want to hot remove
memory, user need to use:
 echo online_movable > NORMAL MOVABLE
/sys/devices/system/nodeX/mem
oryY/state
 Prepare use SRAT information
 SRAT information or sysctl
node0 node0
•If movablecore=acpi is defined,
check hotpluggable bit of SRAT NORMAL NORMAL
information
 sysctl
•vm.hotadd_memory_treat_as_mov DMA DMA
able
DMA32 DMA32

27 Copyright 2013 FUJITSU LIMITED


Kernel object in a hot added memory
 When hot adding memory, kernel objects like page table and
virtual memory map are allocated into other memory.
physical physical
zone memoy zone memoy
node1 node1 virtual
memory map

NORMAL/ NORMAL/ page table


MOVABLE MOVABLE

move kernel
objects in a hot
node0 virtual added memory node0
memory map
NORMAL NORMAL
page table

DMA DMA

DMA32 DMA32

28 Copyright 2013 FUJITSU LIMITED


Migrate vmalloc area
 Vmalloc area is not continuous physically
 When offlining vmalloc’s region, the data on the range is migrated
to other memory
vmalloc physical
area memoy
node1

Migrate vmalloc’s data


node0

29 Copyright 2013 FUJITSU LIMITED


30 Copyright 2010 FUJITSU LIMITED

You might also like