RAC introduction
By Riyaj Shamsudeen
Basic RAC
  One database, multiple instances. Shared-everything architecture.
  All of the following must be in a cluster file system (CFS) or ASM storage:
    Database files
    Online redo log files
    Undo tablespace
    Temp tablespace
  It is good practice to keep archived log files and the spfile in the CFS or ASM, but it is not technically necessary.
ATT June 2010

UNDO and REDO
  Every instance masters its own undo tablespace.
  Each instance has its own redo thread, that is, its own set of online redo log files.
  Undo segment blocks are accessed by other instances during normal operations.
  Redo log files of other threads are accessed only during instance recovery.

DB Startup
  Use srvctl commands to perform instance and nodeapps operations.
  While these operations can also be performed at the CRS level, that should probably not be your first measure.
  Startup
srvctl start database -d PERF
  Syntax is
srvctl start database -d db_unique_name [-o start_options] \
[-c connect_str | -q]

Instance Startup
  Instances can also be started individually.
  Instance Startup
srvctl start instance -d PERF -i PERF1,PERF2
  Syntax is
srvctl start instance -d db_unique_name -i inst_name_list \
[-o start_options] [-c connect_str | -q]

Config
  You can see all instances in the cluster using the config option.
  Instances
$ srvctl config database -d PERF
wsqfinc1a PERF1 /opt/app/dtperf/perfdb/10.2
wsqfinc3e PERF4 /opt/app/dtperf/perfdb/10.2
  Syntax is
srvctl config database [-d db_unique_name [-a] [-t]]

Listeners
  You can see configured listeners using the config option.
  Listeners. The output shows the node where the listener is currently located and the listener name. PERF is the listener name.
$ srvctl config listener -n wsqfinc1a
wsqfinc1a PERF
  Starting listeners. All configured listeners will be started with this command.
srvctl start listener -n wsqfinc1a
  Individual listeners
srvctl start listener -n wsqfinc1a -l PERF,PERF_99

What is VIP?
  VIP stands for Virtual IP address.
  It is an IP address that can be floated to a different server, if needed.
  Listeners need to listen on the virtual IP address. In the output below, IP address 10.85.24.38 is plumbed to the device ce11 (as logical interface ce11:2).
  ce11 also has another NOFAILOVER address, which is physical (not virtual).
/sbin/ifconfig -a
…
ce11: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
inet 10.85.24.24 netmask fffffe00 broadcast 10.85.25.255
groupname wsqvcs81_multinicB
…
ce11:2: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
inet 10.85.24.38 netmask fffffe00 broadcast 10.85.25.255

Why do we need VIP?
  Let’s say that the listener is listening on a physical IP address. If the server goes down, that IP address stops responding.
  New connections from the application then send packets to an address that never answers, and must wait for the TCP timeout before trying the next instance.
  This means that new connection requests can wait for 6 minutes and may time out.
  If we use a virtual IP address for the listener, CRS relocates the virtual IP address to a surviving server. A new connection to that IP address immediately gets a response, and the client tries the next entry in the connect string.

Shutdown
  Use srvctl commands to shut down the database (all instances).
srvctl stop database -d PERF -o immediate
  Shutting down individual instances
srvctl stop instance -d PERF -i PERF1 -o immediate
  Stopping listeners
srvctl stop listener -n wsqfinc1a
srvctl stop listener -n wsqfinc1a -l PERF

CRS/CSS
  CRS and CSS daemons are clusterware daemons. They monitor the health of the cluster and its resources.
  How to check if clusterware is running? The following daemons need to be running.
ps -ef |grep d.bin
root 21955 1 0 May 22 ? 0:00 /opt/app/dtperf/oracrs/product/crs/bin/oclskd.bin
oracrs 21409 21264 0 May 22 ? 418:50 /opt/app/dtperf/oracrs/product/crs/bin/ocssd.bin
oracrs 21073 4384 0 May 22 ? 11:07 /opt/app/dtperf/oracrs/product/crs/bin/evmd.bin
root 21556 4386 0 May 22 ? 587:56 /opt/app/dtperf/oracrs/product/crs/bin/crsd.bin reboot
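The daemon check above can be scripted. This is a minimal sketch, not a definitive check: it assumes crsd.bin, ocssd.bin and evmd.bin are the daemons of interest (leaving out oclskd.bin is an assumption), and the function name missing_crs_daemons is made up for illustration.

```shell
# missing_crs_daemons: read `ps -ef` output on stdin and print the name of
# each expected clusterware daemon that does not appear in it.
# Prints nothing when all three daemons are present.
missing_crs_daemons() {
    ps_out=$(cat)
    # Assumed daemon list; adjust for your clusterware version.
    for d in crsd.bin ocssd.bin evmd.bin; do
        printf '%s\n' "$ps_out" | grep -qF "/$d" || echo "$d"
    done
}

# Typical use on a cluster node (not executed here):
#   ps -ef | missing_crs_daemons
```

An empty result only says the processes exist; it does not say they are healthy.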
  To check the health of CRS, run the following as the CRS owner userid:
crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
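For monitoring scripts, the output above can be reduced to a pass/fail result. The sketch below assumes the output format shown on this slide, where every line ends in "appears healthy"; other clusterware versions may word it differently. The function name crs_healthy is hypothetical.

```shell
# crs_healthy: read `crsctl check crs` output on stdin.
# Succeeds only if there is at least one non-empty line and every
# non-empty line contains "appears healthy".
crs_healthy() {
    awk '
        NF { n++; if ($0 !~ /appears healthy/) bad = 1 }
        END { if (n == 0 || bad) exit 1 }
    '
}

# Typical use, as the CRS owner (not executed here):
#   crsctl check crs | crs_healthy || echo "CRS stack needs attention"
```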

Resources
  CRS manages resources such as databases, listeners, and VIPs.
  crs_stat shows the resources.
crs_stat -t -v
Name           Type        R/RA F/FT Target State  Host
----------------------------------------------------------------------
ora....F1.inst application 0/5  0/0  ONLINE ONLINE wsqfinc1a
ora....F4.inst application 0/5  0/0  ONLINE ONLINE wsqfinc3e
ora.PERF.db    application 0/0  0/1  ONLINE ONLINE wsqfinc1a
ora....RF.lsnr application 1/5  0/0  ONLINE ONLINE wsqfinc1a
ora....c1a.gsd application 4/5  0/0  ONLINE ONLINE wsqfinc1a
...

Resource details
  You can look at the resources in detail too.
$ crs_stat ora.wsqfinc2a.vip -v
NAME=ora.wsqfinc2a.vip
TYPE=application
RESTART_ATTEMPTS=0
RESTART_COUNT=0
FAILURE_THRESHOLD=0
FAILURE_COUNT=0
TARGET=ONLINE
STATE=ONLINE on wsqfinc2a
  Target specifies what the state should be.
  State indicates the current state and the server on which the resource is currently located.
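Because TARGET is the desired state and STATE is the actual one, a resource in trouble is one where the two differ. A rough sketch of that comparison follows. It assumes the NAME=/TARGET=/STATE= layout shown above, with TARGET printed before STATE; the function name crs_mismatch is made up for illustration.

```shell
# crs_mismatch: read crs_stat NAME=/TARGET=/STATE= output on stdin and
# print each resource whose current state differs from its target.
crs_mismatch() {
    awk -F= '
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  {
            # STATE looks like "ONLINE on wsqfinc2a"; keep the first word only.
            split($2, s, " ")
            if (s[1] != target)
                print name ": target=" target " state=" s[1]
        }
    '
}

# Typical use (not executed here):
#   crs_stat | crs_mismatch
```

No output means every resource is in its target state.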

crs_stop/crs_start
  All resources can be started from CRS. But you generally do not want to do this:
$ crs_start -all
  Individual resources can be started:
crs_start ora.wsqfinc2a.PERF.lsnr
  Individual or all resources can be stopped too.
crs_stop ora.wsqfinc2a.PERF.lsnr
crs_stop -all

CRS stack
  The whole CRS stack can be stopped.
crsctl stop crs
  But, generally, this needs root permissions.
sudo -u root crsctl stop crs
  All resources can be stopped by crsctl too.
crsctl stop resources
  If you are not sure of the syntax, just type crsctl and press Enter. The complete syntax is displayed.

Voting disks
  Voting disks are used for the disk heartbeat between the nodes. You can see the voting disks with the crsctl command.
$ crsctl query css votedisk
0. 0 /opt/app/wsqvcs81/VOTE/votedisk.dbf
Located 1 voting disk(s).
  You should probably create multiple voting disks.
  If CRS detects that other nodes have stopped updating the voting disks, it can kill the failed nodes and remove them from the cluster.
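A quick way to spot a cluster with a single voting disk is to read the count from the summary line. This is only a sketch, assuming the "Located N voting disk(s)." line shown above; the function name votedisk_count is hypothetical.

```shell
# votedisk_count: read `crsctl query css votedisk` output on stdin and
# print the number from the "Located N voting disk(s)." summary line.
votedisk_count() {
    awk '/^Located [0-9]+ voting disk/ { print $2 }'
}

# Typical use (not executed here):
#   n=$(crsctl query css votedisk | votedisk_count)
#   [ "$n" -gt 1 ] || echo "only $n voting disk configured"
```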

OCR
  OCR is the Oracle Cluster Registry, almost like the Windows registry.
  Various details about the cluster are stored in the OCR.
  Unfortunately, the OCR gets corrupted quite easily. So, have a good backup strategy for the OCR.
  CRS backs up the OCR every 4 hours.
ocrconfig -showbackup
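wsqfinc3e 2010/06/10 10:27:19 /opt/app/dtperf/oracrs/product/crs/cdata/wsqvcs81/backup00.ocr
wsqfinc3e 2010/06/10 06:27:18 /opt/app/dtperf/oracrs/product/crs/cdata/wsqvcs81/backup01.ocr
wsqfinc3e 2010/06/10 02:27:18 /opt/app/dtperf/oracrs/product/crs/cdata/wsqvcs81/backup02.ocr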

CRS log files
  In case of CRS issues, read the CRS log files.
  Log file locations are:
$ORA_CRS_HOME/log/<nodename>
/opt/app/dtperf/oracrs/product/crs/log/wsqfinc1a/
  The alert log is important. It is the CRS equivalent of the DB alert log.
alertwsqfinc1a.log
  Other logs are in their respective directories. For example, the crsd log file in PERF is in
/opt/app/dtperf/oracrs/product/crs/log/wsqfinc1a/crsd/crsd.log

Resolving hung issues
  One or more RAC instances can go into a hung state.
  Always identify first whether one instance is in trouble or all instances are in trouble.
select inst_id, instance_name from gv$instance order by inst_id;
INST_ID INSTANCE_NAME
------- ----------------
      1 PERF1
      2 PERF2
..
  If the above statement hangs, then it is quite possible that we lost communication to one instance.
  Avoid one-instance myopia. Read the alert logs from all nodes.

Use AWR reports
  To resolve issues, AWR reports are quite handy. Use my scripts to create AWR reports from all nodes.
@awrrpt_all_gen.sql -- To create a recent text AWR report for all nodes
@awrrpt_all_range_gen.sql -- To create a text AWR report for a range you can specify
  If there were no issues 30 minutes ago (assuming a 30-minute AWR snapshot interval), AWR might not show any issues.

Use ASH too
  The following query will tell you where the problem might be if there is an instance-wide hang. This is just an indicator of the problem, not necessarily the problem itself.
select inst_id, event, count(*) from gv$active_session_history
where sample_time > sysdate -( 5/60/24 )
group by inst_id, event
order by 3 desc
/
INST_ID EVENT                            COUNT(*)
------- ------------------------------ ----------
      3                                      4442
      1                                      3956
      3 db file sequential read              3646
      2                                      3091
      4                                      2934
      3 gc buffer busy                       1503

Interconnects
  RAC instances communicate with each other through the private interconnect.
  The gv$cluster_interconnects view shows all interconnects.
select * from gv$cluster_interconnects;
INST_ID NAME            IP_ADDRESS       IS_ SOURCE
------- --------------- ---------------- --- -------------------------------
      1 ce2             172.29.1.11      NO  cluster_interconnects parameter
...
  To check whether an interconnect is reachable or not, use ping. The first one is the source, in this case ce2; the target is the node 4 IP address.
/usr/sbin/ping -s -U -i ce2 172.29.1.41 1 10
