
Introducing Exadata X8M

In-Memory Performance with All the Benefits of Shared Storage for both OLTP and Analytics

New RDMA Fabric
Persistent Memory

Martien Ouwens
Oracle Enterprise Architect
EMEA Systems Platform Architecture Group
7 May 2020

Safe harbor statement

The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not
a commitment to deliver any material, code, or functionality, and should not be relied
upon in making purchasing decisions.

The development, release, timing, and pricing of any features or functionality described for
Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

Exadata X8M (changes from X8 in red)
• Scale-Out 2 or 8 Socket Database Servers
• Latest 24 core Intel Cascade Lake
• 100Gb RDMA over Converged Ethernet (RoCE) Internal Fabric
• Scale-Out Intelligent 2-Socket Storage Servers
• 1.5 TB Persistent Memory per storage server
• Three tiers of storage: PMEM, NVMe, HDD
• Enhanced consolidation using Linux KVM

[Diagram: rack components: Database Server, High-Capacity (HC) Storage, Extreme Flash (EF) Storage, Extended (XT) Storage]

Copyright © 2019 Oracle and/or its affiliates.


Language lesson: RoCE (pronounced "Rocky")

https://rocky.fandom.com/wiki/Rocky_Balboa?file=BalboaTitle1982.jpg

World's Fastest Database Machine – Exadata X8M
Exadata X8M storage performance is comparable to in-memory
• With capacity, sharing, and cost benefits of shared storage
• 16 Million OLTP Read IOPS (8K IOs)
• 2.5x faster than Exadata X8
• <19 microsecond OLTP IO latency
• 10x faster than Exadata X8
• Ultra fast log file writes to accelerate transactions
• 560GB/sec Analytic Scan throughput
• Over 1 TB/sec analytic scans with columnar data in flash

Each rack has up to:
• 3.0 PB Raw Disk
• 920 TB NVMe Flash
• 27 TB PMEM

Performance scales as more racks are added
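
The headline figures can be cross-checked with a little arithmetic. The sketch below is not from the slides: it assumes every storage server contributes the same 1.5 TB of PMEM, and it treats "2.5x faster" and "10x faster" as exact multipliers, which they almost certainly are not. The derived X8 baselines are therefore rough implications, not published numbers.

```python
# Figures quoted on the slides.
X8M_READ_IOPS = 16_000_000      # 8K OLTP read IOPS per rack
X8M_IOPS_SPEEDUP = 2.5          # "2.5x faster than Exadata X8"
X8M_LATENCY_US = 19.0           # "<19 microsecond", used as an upper bound
X8M_LATENCY_SPEEDUP = 10.0      # "10x faster than Exadata X8"
PMEM_PER_RACK_TB = 27.0         # "27 TB PMEM" per rack
PMEM_PER_SERVER_TB = 1.5        # "1.5 TB Persistent Memory per storage server"

# Assumption: the multipliers are exact, so the X8 baseline follows by division.
implied_x8_iops = X8M_READ_IOPS / X8M_IOPS_SPEEDUP
implied_x8_latency_us = X8M_LATENCY_US * X8M_LATENCY_SPEEDUP

# Assumption: every storage server carries the same amount of PMEM.
implied_storage_servers = PMEM_PER_RACK_TB / PMEM_PER_SERVER_TB

print(f"Implied X8 read IOPS:         {implied_x8_iops:,.0f}")
print(f"Implied X8 read latency:      about {implied_x8_latency_us:.0f} µs")
print(f"Implied storage servers/rack: {implied_storage_servers:.0f}")
```

The implied X8 latency of about 190 µs agrees with the roughly 200 µs end-to-end flash read latency quoted later in the deck.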


Exadata X8M
RoCE Networking



New RoCE Internal Network Fabric
• InfiniBand was the only viable RDMA-capable
network at the inception of Exadata, but
Ethernet has now caught up
• Exadata RoCE provides RDMA speed and
reliability on Ethernet fabric
• 100Gb throughout
• Zero packet loss messaging
• Prioritization of critical database messages
• Latest KVM based virtualization
• Cisco Nexus 9336C-FX QSFP28 (100GbE) Ethernet Switches

World’s First and Only RoCE-based Database Machine
RoCE is Industry Standard

• Defined by an Open Consortium


• InfiniBand Trade Association (IBTA)
• Developed in open-source and maintained in upstream Linux
• Supported by major network card vendors: Broadcom, Intel, Mellanox
• Supported by major switch vendors: Arista, Cisco, Juniper, Mellanox
• Exadata uses Mellanox network cards and Cisco switches
• RoCE also used by new storage products implementing remote access to
flash drives on the network - NVMe (flash IO) over Fabrics

InfiniBand vs RoCE
Feature                             InfiniBand                          RoCE
Fabric Management                   Centralized using Subnet Manager    Decentralized Autonomous Fabric Management
Speed                               40Gb/s                              100Gb/s
Lossless Network                    ✅                                  ✅
Multi-rack*                         ✅                                  ✅
All Exadata Performance Features    ✅                                  ✅
Kernel Support                      UEK2, UEK4, UEK5                    UEK5 only
Exadata Virtualization              Xen                                 KVM
Instant Failure Detection           Via Subnet Manager Query            Via RDMA Queries

* Multi-racking between InfiniBand and RoCE is not possible



Exadata Uses RDMA for Extreme Performance
• Remote Direct Memory Access (RDMA) is the ability for one computer
to access data from a remote computer without any OS or CPU involvement
• Network card directly reads/writes memory with no extra copying or buffering
and very low latency

• RDMA is an integral part of the Exadata high-performance architecture enabling:


• High throughput and low-CPU usage for large data transfers
• Unique Direct-to-Wire Protocol to deliver 3x faster inter-node OLTP cluster messaging
• Unique Smart Fusion Block Transfer that eliminates log write on inter-node block move
• Unique RDMA protocol to coordinate transactions between nodes
• Direct ultra low-latency access to persistent memory in storage servers (X8M+19c only)
• Ultra low-latency writes of database logs to persistent memory in storage servers (X8M+19c only)

• RDMA is enabled with all supported database versions


• PMEM optimizations only available with X8M and 19c RDBMS
Exadata X8M Interoperability

• Exadata X8M RoCE fabric is an internal cluster and storage network


• Client connectivity remains the same as X8 – connect directly from database nodes via 10/25Gbps
• Other Engineered Systems such as PCA, ZDLRA cannot take advantage of RDMA to connect to
the database, therefore access to this network fabric is not required
• Exadata X8M can work together with Exadata X8 by utilizing Data Guard and
GoldenGate
• X8 (IB) and X8M (RoCE) CANNOT be directly connected (i.e. multirack)
• X8 and previous Storage Servers (IB) CANNOT be used with X8M (RoCE)
• PCA and ZDLRA X8 or ZDLRA X8M can interoperate with Exadata X8M using the
client network access
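
The interoperability rules above reduce to one check: two racks can share an internal fabric only when both use the same fabric type, while replication over the client network works either way. The sketch below is only illustrative; the `Rack` class and the function names are made up for this example and are not part of any Oracle tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rack:
    """Hypothetical description of an Exadata rack for this example."""
    name: str
    fabric: str  # "IB" (InfiniBand) or "RoCE"


def can_multirack(a: Rack, b: Rack) -> bool:
    """Multi-racking requires both racks to use the same internal fabric."""
    return a.fabric == b.fabric


def interop_options(a: Rack, b: Rack) -> List[str]:
    """List the ways two racks can work together, per the slide."""
    # Data Guard and GoldenGate run over the client network, so they do not
    # depend on the internal fabric type.
    options = ["Data Guard", "GoldenGate"]
    if can_multirack(a, b):
        options.insert(0, "multi-rack")
    return options


x8 = Rack("X8", "IB")
x8m = Rack("X8M", "RoCE")
print(interop_options(x8, x8m))
print(interop_options(x8m, Rack("X8M-2", "RoCE")))
```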

New RoCE Internal Network Fabric

• New Exadata 100 Gb RoCE provides RDMA speed and reliability on Ethernet fabric
• But RDMA has been a long-time friend on IB
• High throughput and low-CPU usage for large data transfers
• Unique Direct-to-Wire Protocol to deliver 3x faster inter-node OLTP cluster messaging
• ...so what is the new trick?
• RDMA to PMEM in Storage Servers

Exadata X8M
Persistent Memory boosts OLTP performance to the next level



New Persistent Memory
• Persistent memory is a new silicon technology
• Capacity, performance, and price are between DRAM and flash
• Intel® Optane™ DC Persistent Memory:
• Reads at memory speed – much faster than flash
• Writes survive power failure unlike DRAM
• Optimizations to fully leverage PMEM speed and maintain integrity of data on PMEM during failures:
• RDMA
• Call special instructions to flush data from CPU cache to PMEM
• Complete or back out sequence of writes interrupted by a crash

[Diagram: memory hierarchy with DRAM at the top, Persistent Memory in the middle, and Flash at the bottom; speed and cost per GB increase toward the top]
Persistent Memory in Conventional Storage
• Persistent Memory usage with conventional storage:
• Database issues read I/O call to OS
• OS sends message to storage
• Storage CPU issues read to Persistent Memory
• Storage CPU sends reply to Server OS
• Server OS wakes up Database
• Speed of Persistent Memory read is overwhelmed by high cost of network and I/O software, interrupts, and context switches
• Performance benefit from PMEM is wasted

[Diagram: Compute Server connected over a SAN to a Storage Server holding hot data in PMEM; examples: EMC, vSphere PMEM, etc.]
Dissect the Exadata Flash I/O Read Latency
[Diagram: read path from Database Server Software through Kernel/OS (Database Server) to the Storage Server, then through Kernel/OS (Storage Server) and Storage Server Software to Flash]
• Context switch on the database server: 10s of µs
• Context switch on the storage server: 10s of µs
• Flash read raw latency: <100 µs
• Database 8K read end-to-end latency: ~200 µs

Exadata X8: before X8M, the world’s fastest DB for $$/OLTP transaction

What if we drop in PMEM as is?
[Diagram: same read path as for flash, with PMEM in place of flash in the Storage Server]
• Context switch on the database server: 10s of µs
• Context switch on the storage server: 10s of µs
• PMEM read raw latency: ~1 µs (90x better than flash)
• Database 8K read end-to-end latency: ~100 µs
• 2x lower latency than X8, but the 90x PMEM advantage is wasted

Exadata X8M + pre-19c: somewhat faster than X8

A Radical Approach – RDMA Read to PMEM
[Diagram: Database Server Software reads PMEM in the Storage Server directly over RDMA, bypassing Kernel/OS on both servers, the Storage Server Software, and their context switches (10s of µs each)]
• Database 8K read end-to-end latency: <19 µs

Exadata X8M + 19c: 10x faster latency than Exadata X8, 2.5x faster than X8 on OLTP
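
The three read paths can be compared with simple arithmetic. The sketch below uses only the latencies quoted on the slides and treats the "<" and "~" figures as point values, which is an assumption. Subtracting the raw media latency from the end-to-end latency shows that the software and network overhead is about the same for flash and for drop-in PMEM, which is why only bypassing that overhead with RDMA gives the large gain.

```python
# Latencies in microseconds, as quoted on the slides (bounds used as point values).
FLASH_RAW_US = 100.0   # "Flash Read Raw Latency: <100 µs"
FLASH_E2E_US = 200.0   # "Database 8K Read End-to-end Latency: ~200 µsec"
PMEM_RAW_US = 1.0      # "PMEM Read Raw Latency: ~1 µs"
PMEM_E2E_US = 100.0    # PMEM dropped in behind the conventional I/O path
RDMA_E2E_US = 19.0     # "<19 µsec" with RDMA reads to PMEM


def overhead_us(end_to_end_us: float, raw_us: float) -> float:
    """Latency added by network, I/O software, interrupts and context switches."""
    return end_to_end_us - raw_us


flash_overhead = overhead_us(FLASH_E2E_US, FLASH_RAW_US)
pmem_overhead = overhead_us(PMEM_E2E_US, PMEM_RAW_US)
dropin_speedup = FLASH_E2E_US / PMEM_E2E_US
rdma_speedup = FLASH_E2E_US / RDMA_E2E_US

print(f"Overhead on the flash path:        {flash_overhead:.0f} µs")
print(f"Overhead on the drop-in PMEM path: {pmem_overhead:.0f} µs")
print(f"Drop-in PMEM speedup over flash:   {dropin_speedup:.1f}x")
print(f"RDMA-to-PMEM speedup over flash:   {rdma_speedup:.1f}x")
```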
Exadata X8M With Persistent Memory Data Accelerator
World’s First and Only Shared Persistent Memory Optimized for Database

• Exadata Storage Servers transparently add Persistent Memory Accelerator in front of Flash memory
• 2.5X higher IOs per second than Exadata X8 – 16 Million IOPS
• Database uses RDMA instead of IO to read remote PMEM
• Bypasses network and IO software, interrupts, context switches
• 10X better latency - <19 μsec for 8K database read
• PMEM automatically tiered and shared across DBs
• Using PMEM as a cache for the hottest data increases effective capacity 10x
• Persistent Memory mirrored automatically across storage servers for fault-tolerance

[Diagram: Compute Server connected via RoCE RDMA to a Storage Server with tiers Hot (Persistent Memory), Warm (Flash), and Cold]



Enabled with Exadata System Software 19.3 and Database Software 19c
Exadata X8M Persistent Memory Commit Accelerator
• Log Write latency is critical for OLTP performance
• Faster log writes mean faster commit times
• Any log write slowdown stalls the whole database
• Automatic Commit Accelerator
• Database issues one-way RDMA writes to PMEM on multiple Storage Servers
• Bypasses network and IO software, interrupts, context switches, etc.
• Up to 8x Faster Log Writes
• AKA PMEM Log

[Diagram: Compute Server sends a Log Write via RoCE RDMA to Persistent Memory (Hot) in the Storage Server, flushed later to Flash/Disk; tiers Hot, Warm (Flash), and Cold]
Enabled with Exadata System Software 19.3 and Database Software 19c
Exadata Smart Software 19.3.0
The Secret to Breakthrough Performance, Availability and Cost



ESS support - X8 and X8M
• “Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)”

• X8M needs ESS 19.3 or later
- ESS 19.3 supports RDBMS versions 11.2.0.4 to 19c
- 11.2.0.3.28 desupported
- PMEM Cache and PMEM Log RDMA optimizations only supported with 19c
- Virtualization with KVM – Trusted Partitions approved by LMS
• X8 needs ESS 19.2 or later
- Latest ESS 19.2.6 update supports RDBMS versions 11.2.0.3.28 to 19c
- Virtualization with OVM – Trusted Partitions approved by LMS
- X8 can run with ESS 19.3
- New features but no PMEM Cache/Log RDMA optimizations (even with 19c)
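
The support rules above can be written as two small predicates. This is a sketch of the logic on the slide only; the function names and the tuple encoding of ESS versions are assumptions made for illustration, and Doc ID 888828.1 remains the authoritative source.

```python
from typing import Tuple

# Minimum Exadata System Software (ESS) release per hardware generation,
# as stated on the slide. Versions are encoded as (major, minor) tuples.
MIN_ESS = {"X8M": (19, 3), "X8": (19, 2)}


def ess_supported(hardware: str, ess: Tuple[int, int]) -> bool:
    """True if the ESS release meets the minimum for this hardware."""
    return ess >= MIN_ESS[hardware]


def pmem_rdma_optimizations(hardware: str, ess: Tuple[int, int], db_major: int) -> bool:
    """PMEM Cache and PMEM Log RDMA optimizations need X8M, ESS 19.3+, and a 19c database."""
    return hardware == "X8M" and ess_supported(hardware, ess) and db_major >= 19


print(pmem_rdma_optimizations("X8M", (19, 3), 19))  # X8M with 19c
print(pmem_rdma_optimizations("X8M", (19, 3), 18))  # X8M with a pre-19c database
print(pmem_rdma_optimizations("X8", (19, 3), 19))   # X8 running ESS 19.3 with 19c
```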

Exadata X8M – DB support
• Support for 11.2.0.4 to 19c RDBMS

• To get the full 2.5x/10x performance benefit, a 19c database is needed; minimum version supported:
• 19.4.0.0.0.190716

• Pre-19c databases also benefit from X8M, but with a smaller performance improvement; minimum versions
supported on X8M:
• 18.7.0.0.0.190716
• 12.2.0.1.0.180831
• 12.1.0.2.0.180831
• 11.2.0.4.0.180717

• 11.2.0.3 officially desupported

• All DB options are supported on X8M (RAC, DG, etc.)
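
A version check against these minimums can be sketched as a tuple comparison. The sketch assumes that version strings use the dotted format shown on the slide, that the last component is a release date that sorts numerically, and that a release family is identified by the major number for 18c and later and by the first two numbers for earlier releases. The helper names are hypothetical.

```python
from typing import Tuple

# Minimum database versions supported on X8M, copied from the slide.
MIN_VERSIONS = [
    "19.4.0.0.0.190716",
    "18.7.0.0.0.190716",
    "12.2.0.1.0.180831",
    "12.1.0.2.0.180831",
    "11.2.0.4.0.180717",
]


def parse(version: str) -> Tuple[int, ...]:
    """Turn '19.4.0.0.0.190716' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def family(version: Tuple[int, ...]) -> Tuple[int, ...]:
    """Assumed release family: (major,) from 18c on, (major, minor) before."""
    return version[:1] if version[0] >= 18 else version[:2]


def supported_on_x8m(version: str) -> bool:
    """True if the version meets the slide's minimum for its release family."""
    candidate = parse(version)
    for minimum in map(parse, MIN_VERSIONS):
        if family(minimum) == family(candidate):
            return candidate >= minimum
    return False  # no minimum listed for this family, for example 11.2.0.3


print(supported_on_x8m("19.4.0.0.0.190716"))
print(supported_on_x8m("11.2.0.3.28"))
```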


Exadata Virtualization
• Exadata X8M uses KVM based Virtualization
• Exadata X8 and previous InfiniBand based Exadata continue using Xen
• RoCE and Persistent Memory only supported on KVM

• KVM Virtualization gives:
• 2X more guest VM memory – 1.5 TB/server (1376GB; InfiniBand was 720GB)
• Faster client network latency (hypervisor integrated with the kernel)
• 50% more guest VMs per server
• 8 VMs per node on InfiniBand vs. 12 VMs per node with KVM
• Faster installation procedure
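
The consolidation claims follow from the quoted numbers, as this short check shows. It uses only the figures on the slide; the "2X" memory claim works out to a little over 1.9x.

```python
# Guest VM limits per database server, as quoted on the slide.
XEN_VMS_PER_NODE = 8      # InfiniBand-based systems
KVM_VMS_PER_NODE = 12     # X8M
XEN_GUEST_MEM_GB = 720
KVM_GUEST_MEM_GB = 1376

vm_increase_pct = (KVM_VMS_PER_NODE - XEN_VMS_PER_NODE) / XEN_VMS_PER_NODE * 100
guest_mem_ratio = KVM_GUEST_MEM_GB / XEN_GUEST_MEM_GB

print(f"More guest VMs per server: {vm_increase_pct:.0f}%")
print(f"Guest memory ratio:        {guest_mem_ratio:.2f}x")
```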
Exadata Software Overview
• Exadata Database Machine and Exadata Storage Server Supported Versions
(Doc ID 888828.1)

• Exadata System Software Certification (Doc ID 2075007.1)

• Exadata What’s new

• https://docs.oracle.com/en/engineered-systems/exadata-database-machine/dbmso/new-features.html

Exadata Software 19.3 Enhancements
• Features only available for X8M

• Persistent Memory Data Accelerator


• Persistent Memory Commit Accelerator
• Support for Oracle Exadata X8M Systems
• Instant Failure Detection for X8M Systems
• KVM for Virtual Environments
• Default XFS File System for X8M Servers

Exadata Software 19.3 Enhancements
• Features also available for other Exadata generations

• Faster Encrypted Table Smart Scans


• Smart Aggregation
• Smart In-Memory Columnar Cache with Row IDs
• Smart In-Memory Columnar Cache with Chained Rows
• Fast Smart In-Memory Columnar Cache Creation
• Encryption of System Logs to Remote Destinations
• Update a Single SNMP User Definition
• Securing Storage Server Software Processes with Memory Protection Keys
• Software Certification Ends for Exadata Database Machine X2 Servers
Frequently asked questions
• Can X8M components be used to expand an X8/X7…?
• No: RoCE vs InfiniBand
• How can I expand my existing Exadata?
• With X8 components
• Can I connect an X8M to an X8/X7…?
• Multi-racking: No. Data Guard: Yes
• How can I back up my Exadata X8M?
• Over 10/25GbE
• What does an X8M cost compared with an X8?
• Currently the price is identical (“Promotional Price”)
