Lundhild-Understanding RAC Internals
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Agenda
1. What are the major components of Oracle Clusterware and how do they interact?
2. Why does Oracle reboot nodes?
3. How does Oracle handle private interconnect failure and scalability?
4. When my public network fails, why do ASM and the db instance get shut down?
5. What exactly is the VIP, its purpose, and how does it work?
6. What is the purpose of ONS, and is it required for anything other than FAN?
7. How does Oracle do load balancing across RAC instances?
What are the major components of Oracle Clusterware and how do they interact?
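RAC 10 Architecture
[Figure: Node1 through Node n, each running an operating system and Oracle Clusterware with a VIP, connected to the public network and to shared storage. Shared storage holds the redo / archive logs of all instances, the database / control files, and the OCR and voting disks.]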
Oracle Clusterware
If a node does not send a network heartbeat for <MissCount> (time in seconds), then the node is evicted from the cluster
If the disk heartbeat (voting disk) is not updated in <I/O timeout>, then the node is evicted from the cluster
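The two eviction rules above can be sketched as a simple check. This is only an illustration of the logic on the slide, not how CSSD is implemented; the `NodeHeartbeat` type, the function name, and the timeout values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeHeartbeat:
    """Hypothetical record of when a node's heartbeats were last seen (seconds)."""
    last_network_heartbeat: float
    last_disk_heartbeat: float


def eviction_reason(hb: NodeHeartbeat, now: float,
                    misscount: float, io_timeout: float) -> Optional[str]:
    """Return why the node would be evicted, or None if both heartbeats are fresh."""
    # Rule 1: no network heartbeat for longer than MissCount seconds.
    if now - hb.last_network_heartbeat > misscount:
        return "network heartbeat missed"
    # Rule 2: voting disk heartbeat not updated within the I/O timeout.
    if now - hb.last_disk_heartbeat > io_timeout:
        return "disk heartbeat missed"
    return None
```

For example, with an illustrative MissCount of 30 and an illustrative I/O timeout of 200, a node whose last network heartbeat is 61 seconds old is reported as "network heartbeat missed" even if its disk heartbeat is current.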
Heartbeat Failures
Network Heartbeat
node(4) missed(59) checkin(s)
>2005-06-18 08:14:37.858 [3002575792] >WARNING: clssnmPollingThread: Eviction started for node 4,flags 0x000d, state 3, wt4c 0
>2005-06-18 08:14:41.985 [3047074736] >TRACE: clssnmHandleSync:
Disk Heartbeat
CSSD]2005-10-11 15:56:23.668 [93645744] >WARNING: clssnmDiskPMT: long disk latency (45940 ms) to voting disk (0//dev/raw/raw1)
Oracle Clusterware
Split Brain Resolution
Determine surviving subcluster
  Sub-cluster with largest number of nodes
  Sub-cluster with lowest node number
I/O fencing via STONITH algorithm (remote power reset)
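The survivor selection above can be sketched as follows. The sketch assumes the two criteria are applied in order, so that the lowest node number only breaks a tie in size. The function names are hypothetical, and the fencing itself (the remote power reset) is not modeled.

```python
from typing import Iterable, List, Set


def surviving_subcluster(subclusters: Iterable[Set[int]]) -> Set[int]:
    """Pick the sub-cluster that survives a split.

    Assumed ordering: the largest sub-cluster wins; on a tie, the
    sub-cluster containing the lowest node number wins.
    Each sub-cluster must be a non-empty set of node numbers.
    """
    return min(subclusters, key=lambda nodes: (-len(nodes), min(nodes)))


def nodes_to_fence(subclusters: List[Set[int]]) -> List[int]:
    """Nodes outside the surviving sub-cluster, i.e. the ones to be reset."""
    survivor = surviving_subcluster(subclusters)
    return sorted(n for nodes in subclusters if nodes is not survivor for n in nodes)
```

With a four-node cluster split into {1} and {2, 3, 4}, node 1 is fenced; with an even split into {1, 2} and {3, 4}, nodes 3 and 4 are fenced.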
Voting disk is used to detect and resolve network problems that could lead to a split-brain
Final arbiter of the status of configured nodes, either up or down, and delivers eviction notices
Recommended to have at least 3 voting disks
Multiple voting disks supported in RAC 10g Release 2
Dynamic addition of voting disks in RAC 11g
Changing MissCount
IT IS NOT SUPPORTED TO REDUCE MISSCOUNT BELOW THE DEFAULT
Default varies somewhat by platform (30s or 60s)
Default = 600s if vendor clusterware is installed
Private Interconnect
[Figure: Node1, Node 2, through Node n, each running an operating system and Oracle Clusterware with a VIP, service, listener, instance, and ASM. The nodes attach to the public network and are joined by the cluster interconnect through Switch 1 and Switch 2.]
Private Interconnect
Network between the nodes of a RAC cluster MUST be private
Supported links: GbE, IB (IPoIB: 10.2)
Supported transport protocols:
  Oracle Clusterware uses TCP
  RAC: UDP, RDS (10.2.0.3)
Use multiple or dual-ported NICs for redundancy and increase bandwidth with NIC bonding
Large (Jumbo) Frames for GbE recommended
Interconnect Bandwidth
Bandwidth requirements depend on
  CPU power per cluster node
  Application-driven data access frequency
  Number of nodes and size of the working set
  Data distribution between PQ slaves
10000-12000 8K blocks per sec to saturate 1 x Gb Ethernet (75-80% of theoretical bandwidth)
Typical utilization approx. 10-30% in OLTP
Multiple NICs generally not required for performance and scalability
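As a rough check of the saturation figure above, the block payload alone can be converted to a share of the link rate. The sketch assumes 8K means 8192 bytes and 1 Gb Ethernet means 10^9 bits per second, and it ignores protocol headers, which add to the payload on the wire.

```python
BLOCK_BYTES = 8 * 1024               # assumed: 8K block = 8192 bytes
LINK_BITS_PER_SEC = 1_000_000_000    # assumed: 1 Gb Ethernet = 10^9 bit/s


def payload_utilization(blocks_per_sec: int,
                        block_bytes: int = BLOCK_BYTES,
                        link_bps: int = LINK_BITS_PER_SEC) -> float:
    """Fraction of the link consumed by block payload only (no headers)."""
    return blocks_per_sec * block_bytes * 8 / link_bps


for rate in (10_000, 12_000):
    print(f"{rate} blocks/s -> {payload_utilization(rate):.1%} of the link (payload only)")
```

Payload alone comes to about 66% at 10000 blocks per second and about 79% at 12000, which is in the neighborhood of the 75-80% quoted on the slide once header overhead is added.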
IPC configuration
Settings:
Socket receive buffers (256 KB - 1 MB)
Negotiated top bit rate and full duplex mode
NIC ring buffers
Ethernet flow control settings
CPU(s) receiving network interrupts
Interconnect Bonding
Terminology: NIC bonding, link aggregation, port trunking, NIC teaming
Multiple physical links combined into a single logical link
Provides redundancy and/or scalability
Logical link is provided to Oracle Clusterware and RAC
Most operate at OSI Layer 2
Different implementations on different platforms
  Read the fine print
Generally recommend failover only (active/passive) configuration
Interconnect Bonding
Some cluster managers provide support for multiple interconnects
Not required with Oracle Clusterware
OS-Specific bonding
  Solaris: IPMP, Sun Trunking
  AIX: etherchannel
  HP-UX: APA
  Linux: NIC Bonding
  Windows: NIC Teaming
IB drivers inherently support failover and load balancing.
Interconnect Configuration
OCR
[SYSTEM.css.interfaces.global.bond0.192|d168|d12|d0.1]
ORATEXT : cluster_interconnect
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_READ, USER_NAME : oracle, GROUP_NAME : odba}
RDBMS
SQL> select * from x$ksxpia;

ADDR           INDX    INST_ID P PICK NAME_KSXPIA     IP_KSXPIA
-------- ---------- ---------- - ---- --------------- ------------
58EC8340          0          1 Y OCR  bond0           192.168.12.1
Db_block_size = 8K
ifconfig -a:
eth0  Link encap:Ethernet  HWaddr 00:0B:DB:4B:A2:04
      inet addr:130.35.25.110  Bcast:130.35.27.255  Mask:255.255.252.0
      MTU:1500  Metric:1
      RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
      TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0
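The slide pairs an 8K block size with an MTU of 1500, which suggests the point is IP fragmentation. A minimal sketch of that arithmetic follows. It assumes one block travels as a single UDP datagram over IPv4 with no options and no additional message header (real interconnect messages carry extra header bytes); the function name is hypothetical.

```python
import math

IP_HEADER = 20    # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8    # bytes


def ip_fragments(payload_bytes: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram."""
    datagram = payload_bytes + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


print(ip_fragments(8192, 1500))   # standard Ethernet MTU
print(ip_fragments(8192, 9000))   # jumbo frames
```

Under these assumptions an 8K block needs 6 packets at MTU 1500 and fits in 1 packet at MTU 9000, and losing any one fragment means the whole datagram has to be sent again.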
Oracle Dependencies
Prior to 10.2.0.3
[Figure: Node1 and Node2, each running an operating system, connected to the public network and to shared storage holding the redo / archive logs of all instances, the database / control files, and the OCR and voting disks.]
Oracle Dependencies
[Figure: Node1 and Node 2, each running an operating system, connected to the public network and to shared storage holding the redo / archive logs of all instances, the database / control files, and the OCR and voting disks.]
What exactly is the VIP, its purpose, and how does it work?
VIP
Listener.ora
SID_LIST_LISTENER_PMRAC1 =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/oracle/product/10gR2/asm)
      (PROGRAM = extproc)
    )
  )

LISTENER_PMRAC1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1))
      # VIP
    )
  )
Application VIPs
New resource in Oracle RAC 10g Release 2
Created as functional VIPs which can be used to connect to an application regardless of the node it is running on
VIP is a dependent resource of the user registered application
There can be many VIPs, one per User Application
What is the purpose of ONS, and is it required for anything other than FAN?
What is FAN?
Fast Application Notification (FAN) is a RAC notification mechanism
FAN HA Events: Notification of Up/Down for service, instance & node
Load Balancing Advisory Events: Advise clients of current load for service and where to send connection requests
Enable it, and Forget it.
FAN Clients
HA Events: JDBC Implicit Connection Cache, OCI, ODP.NET Connection Pools, Listener, Server Side Callouts, CMAN
Load Balancing Advisory Events: JDBC Implicit Connection Cache, ODP.NET Connection Pools, Listener, CMAN
New in RAC 11g
  OCI Session Pools subscribe to Load Balancing Advisory Events to provide Runtime Connection Load Balancing
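How a FAN client might use the two event types can be sketched with a toy router. This is a hypothetical illustration, not an Oracle API: the `ServicePool` class, its method names, and the event arguments are all assumptions, and real clients receive these events through their own subscription mechanisms. HA events add or remove instances; load balancing advisory events set the weights used to place the next connection request.

```python
import random
from typing import Dict, Iterable, Optional


class ServicePool:
    """Toy connection router for one service (hypothetical, not an Oracle API)."""

    def __init__(self, instances: Iterable[str], rng: Optional[random.Random] = None):
        names = list(instances)
        self.up = set(names)
        self.weights: Dict[str, float] = {name: 1.0 for name in names}
        self.rng = rng or random.Random()

    def on_ha_event(self, instance: str, status: str) -> None:
        """HA event: stop or resume routing to an instance."""
        if status == "down":
            self.up.discard(instance)
        elif status == "up":
            self.up.add(instance)
            self.weights.setdefault(instance, 1.0)

    def on_load_advisory(self, percentages: Dict[str, float]) -> None:
        """Advisory event: share of new work each instance should receive."""
        self.weights.update(percentages)

    def pick_instance(self) -> str:
        """Choose the instance for the next connection request."""
        candidates = sorted(self.up)
        if not candidates:
            raise RuntimeError("no instance is up for this service")
        weights = [self.weights[name] for name in candidates]
        if sum(weights) <= 0:
            return self.rng.choice(candidates)  # no usable advice: pick uniformly
        return self.rng.choices(candidates, weights=weights, k=1)[0]
```

After `on_ha_event("OLTP2", "down")` the pool never returns OLTP2, and after an advisory such as `{"OLTP1": 100.0, "OLTP3": 0.0}` every new request goes to OLTP1 until the next advisory arrives.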
LISTENER
[Figure: an application server asks the listener over the network which instance offers service OLTP. The RAC database offers it as OLTP1 on N1, OLTP2 on N2, and OLTP3 on N3.]
LISTENER
[Figure: connection made to OLTP1. Clients reach the application's connection pools, which connect over the network through the listeners to the RAC database (Real Application Clusters).]
Q & A
QUESTIONS ANSWERS
Appendix
http://search.oracle.com (search for REAL APPLICATION CLUSTERS) or otn.oracle.com/rac
OTN.ORACLE.COM/RAC
Workload Management with Oracle Real Application Clusters (FAN, FCF, Load Balancing)
Using standard NFS to support a third voting disk on a stretch cluster configuration on Linux
Using Oracle Clusterware to Protect 3rd Party Applications
RAC Sample Code Page: http://www.oracle.com/technology/sample_code/products/rac/index.html