RAC Interview Questions
1) What are Oracle Clusterware processes for 10g on UNIX and Linux?
Cluster Synchronization Services (ocssd):
Manages cluster node membership and runs as the oracle user; failure of this process
results in a cluster restart.
CSS provides basic Group Services support; it is a distributed group membership
system that allows applications to coordinate activities to achieve a common result.
Group services use vendor clusterware group services when available.
Lock services provide the basic cluster-wide serialization locking functions; they use
a First In, First Out (FIFO) mechanism to manage locking.
Node services use the OCR to store data and update the information during
reconfiguration; they also manage the OCR data, which is otherwise static.
The CRSd process manages resources such as starting and stopping services and
failing over application resources; it also spawns separate processes to manage
application resources. CRS manages the OCR and stores the current known state of the
cluster; it requires a public, private, and VIP interface in order to run. OCSSd provides
synchronization services among nodes, provides access to node membership, and
enables basic cluster services, including cluster group services and locking. Failure of this
daemon causes the node to be rebooted to avoid split-brain situations.
The Event Management daemon (EVMd) generates events when things happen. It spawns
a child process called evmlogger, which in turn spawns new child processes on demand
and scans the callout directory to invoke callouts. Death of the EVMd daemon does not
halt the instance; the daemon is simply restarted.
In 10g, CRS consisted of three major components, as shown in Figure 1. These components
manifested themselves as daemons, which ran out of inittab on Linux/Unix, or as services
on Windows. The three daemons were:
LMS (Lock Manager Server Process) — Global Cache Service Process (GCS):
This is the cache fusion part and the most active process; it handles the consistent copies of
blocks that are transferred between instances. It receives requests from LMD to perform
lock requests, and it rolls back any uncommitted transactions. There can be up to ten LMS
processes running, and more can be started dynamically if demand requires it.
They manage lock manager service requests for GCS resources and send them to a service
queue to be handled by the LMSn processes. LMS also handles global deadlock detection and
monitors for lock conversion timeouts.
As a performance gain, you can increase this process's priority to make sure CPU starvation
does not occur.
You can see the statistics of this daemon in the view X$KJMSDP.
The Global Cache Service Processes (LMSx) are the processes that handle remote Global
Cache Service (GCS) messages.
This process maintains the status of datafiles and of each cached block by recording
information in the Global Resource Directory (GRD). It also controls the flow of messages to
remote instances, manages global data block access, and transmits block images
between the buffer caches of different instances. This processing is part of the cache
fusion feature.
Note: Real Application Clusters software provides for up to 10 Global Cache Service
Processes. The number of LMSx varies depending on the amount of messaging traffic
among nodes in the cluster.
Its primary job is to transport blocks across the nodes for cache fusion requests.
The Lock Manager Server Process is used in Cache Fusion. It enables consistent copies of
blocks to be transferred from a holding instance's buffer cache to a requesting
instance's buffer cache.
It rolls back any uncommitted transactions for any blocks that are being requested for
a consistent read by the remote instance.
You can see the statistics of this daemon by looking at the view X$KJMDDP
A detailed log file is created that tracks any reconfigurations that have happened.
Hardware failure: a failure of any of the major hardware components (CPU, RAM,
network interconnect) can cause a node eviction.
Server overload: a server that is experiencing RAM swapping might trigger a node
eviction. It's important that each node be properly configured.
Voting disk communication: this can happen when communication to the voting disk
is interrupted, causing the disconnected node to be evicted and rebooted.
Database issues: if the database (or the ASM instance) is not responding (a database
"hang" condition), then a node eviction may occur.
Troubleshooting:
1. Look at the cssd.log files on both nodes; usually we get more information from the
second node if the first node was evicted. Check the crsd.log file as well.
Analysis:
If you see "Polling" keywords with decreasing percentage values in the cssd.log file, the
eviction is probably due to the network.
If you see "Diskpingout" or anything related to -DISK-, the eviction is because of a
disk timeout.
If the network was the issue, check whether any NIC cards were down or any link
switching happened, and check that the private interconnect is working between both nodes.
Check the OS-level health: /var/log/messages
2. Collect NMON/OS Watcher/RDA reports to establish whether it was a disk or network
issue. If the reports show heavy memory contention/paging, collect an AWR report to see
what load/SQL was running during that period.
3. The evicted node will have core dump file generated and system reboot info.
4. Find out if there was node reboot, is it because of CRS or others, check system reboot
time.
5. Sometimes an eviction can also be due to an OS error where the system is halted for a
while, memory overcommitment, or 100% CPU usage; check the OS/system log files for
more information.
6. What changed recently? Ask your coworker to open a ticket with Oracle and upload
the logs.
7. Check the health of clusterware, db instances, asm instances, uptime of all hosts and all
the logs – ASM logs, Grid logs, CRS and ocssd.log, HAS logs, EVM logs, DB instances logs,
OS logs, SAN logs for that particular timestamp.
8. Run TFA and check OS Watcher, netstat, and ifconfig output, etc., based on the error
messages, during your RCA.
9. Verify user equivalence between cluster nodes
10. A major reason for node evictions in our cluster was the patch levels not being equal
across the two nodes.
Nodes sometimes died completely, without any error whatsoever. It turned out to be a bug
in the installer of the 11.1.0.7.1 PSU.
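The keyword analysis from step 1 above can be sketched as a rough log classifier. This is an illustrative helper, not an Oracle tool; the keywords are the ones discussed above, and the sample log lines are made up.

```shell
# classify_eviction: rough, illustrative classifier for the likely eviction
# cause, based on the keywords discussed above. Reads log lines from stdin.
classify_eviction() {
  log=$(cat)
  if printf '%s' "$log" | grep -qi 'polling'; then
    echo "likely network (check NICs and private interconnect)"
  elif printf '%s' "$log" | grep -qiE 'diskpingout|disk'; then
    echo "likely disk timeout (check voting disk I/O)"
  else
    echo "cause unclear; collect OS Watcher/AWR data"
  fi
}
```

A hypothetical usage would be filtering the relevant lines from ocssd.log and piping them into the function.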
Location of Logs:
$ORA_CRS_HOME/crs/log: Contains trace files for the CRS resources.
$ORA_CRS_HOME/crs/init: Contains trace files of the CRS daemon during startup. This is a
good place to start with any CRS startup problems.
$ORA_CRS_HOME/css/log : The Cluster Synchronization (CSS) logs indicate all actions such
as reconfigurations, missed check-ins, connects, and disconnects from the client CSS
listener. In some cases, the logger logs messages with the category of auth.crit for the
reboots done by Oracle. This could be used for checking the exact time when the reboot
occurred.
$ORA_CRS_HOME/css/init : Contains core dumps from the Oracle Cluster Synchronization
Service daemon (OCSSd) and the process ID (PID) for the CSS daemon whose death is
treated as fatal. If abnormal restarts for CSS exist, the core files will have the format of
core.
$ORA_CRS_HOME/evm/log: Log files for the Event Manager (EVM) and evmlogger
daemons. Not used as often for debugging as the CRS and CSS directories.
$ORA_CRS_HOME/evm/init: PID and lock files for EVM. Core files for EVM should also be
written here.
$ORA_CRS_HOME/srvm/log: Log files for Oracle Cluster Registry (OCR), which contains the
details at the Oracle cluster level.
$ORA_CRS_HOME/log: Log files for Oracle Clusterware (known as the cluster alert log),
which contains diagnostic messages at the Oracle cluster level. This is available from Oracle
database 10g R2.
Automatic backups:
To display backups:
#ocrconfig -showbackup
To restore a backup:
#ocrconfig -restore file_name
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export file_name -s online
With Oracle RAC 11g Release 1 and later, you can take a manual backup of the OCR with:
# ocrconfig -manualbackup
11g:
In 11g Release 2 you no longer have to back up the voting disks. In fact, according to the
Oracle documentation, restoring voting disks that were copied using the "dd" or "cp"
command may prevent your clusterware from starting up.
So, in 11g Release 2 your voting disk data is automatically backed up in the OCR whenever
there is a configuration change.
The CKPT process updates the control file every 3 seconds, in an operation known as the
heartbeat.
CKPT writes to a single block that is local to each instance, so inter-instance
coordination is not required. This block is called the checkpoint progress record.
All members of the cluster attempt to lock the control file record for updating.
The instance that obtains the lock tallies the votes from all members. The group
membership must then conform to the decided (voted) membership before GCS/GES
reconfiguration is allowed to proceed. The result is stored in the same block as the
heartbeat, in the control file checkpoint progress record.
What are NETWORK and DISK HEARTBEATS and how are they registered in the VOTING DISKS/FILES?
All nodes in the RAC cluster register their heartbeat information in the voting
disks/files. The RAC heartbeat is the polling mechanism sent over the cluster
interconnect to ensure all RAC nodes are available.
Voting disks/files are just like an attendance register where the nodes
mark their attendance (heartbeats).
The CSSD process on every node makes entries in the voting disk to ascertain the
membership of the node. While marking their own presence, all the nodes also
register information about their communicability with the other nodes in the voting
disk. This is called the NETWORK HEARTBEAT.
The CSSD process on each node maintains its heartbeat in a block of one OS block in
size, at a specific offset in the voting disk. The written block has a header area
with the node name, and the heartbeat counter increments on every write call, once
per second. Thus the heartbeats of the various nodes are recorded at different offsets
in the voting disk. This is called the DISK HEARTBEAT.
In addition to maintaining its own disk block, each CSSD process also monitors the disk
blocks maintained by the CSSD processes of the other nodes in the cluster. Healthy nodes
have continuous network and disk heartbeats exchanged between them; a break in
heartbeats indicates a possible error scenario.
If the disk is not updated within a short timeout period, the node is considered unhealthy
and may be rebooted to protect the database. In this case, a message to that effect
is written to the node's KILL BLOCK. Each node reads its KILL BLOCK once per
second; if the kill block has been written, the node terminates itself.
During a reconfiguration (a node leaving or joining), CSSD monitors the heartbeat
information of all nodes and determines which nodes have a disk heartbeat, including
those with no network heartbeat. If no disk heartbeat is detected, the node is
considered dead.
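The timeout decision described above can be modeled as a tiny sketch. The function name is illustrative, and the timeout value (the CSS "misscount", commonly 30 seconds on Linux) is only a typical default, not a universal constant.

```shell
# Toy model of the disk-heartbeat timeout check: a node whose last heartbeat
# write is older than the timeout (CSS misscount) is an eviction candidate.
# Names and values are illustrative, not CSS internals.
heartbeat_status() {  # $1 = seconds since last heartbeat, $2 = misscount
  if [ "$1" -lt "$2" ]; then
    echo "healthy"
  else
    echo "eviction candidate"
  fi
}
```

For example, a node 5 seconds behind with a 30-second misscount is healthy, while one 45 seconds behind is an eviction candidate.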
As we now know, the voting disk is used by CSSD. It contains both the network and disk
heartbeats from all nodes, and any break in heartbeats results in eviction of the node
from the cluster. The possible scenarios with missing heartbeats are:
Network heartbeat is successful, but disk heartbeat is missed.
Disk heartbeat is successful, but network heartbeat is missed.
Both heartbeats failing.
When a cluster involves many nodes, a few more scenarios are possible:
The nodes split into N sets, communicating within each set but not with members
of the other sets.
Just one node going unhealthy. The nodes with quorum (the minimum number of nodes
needed to make the cluster valid) maintain active membership of the cluster, and the
other node(s) are fenced/rebooted.
A node must be able to access more than half of the voting disks at any time.
Example:
Consider a 2-node cluster with an even number of voting disks, say 2.
Suppose node 1 can access only voting disk 1 and node 2 can access only voting disk 2.
In that case there is no common file where the clusterware can check the heartbeat
of both nodes.
If we have 3 voting disks and both nodes can access more than half, i.e., 2 voting
disks, there is at least one disk that both nodes access. The clusterware can use
this disk to check the heartbeat of the nodes.
A node not able to access more than half of the voting disks will be evicted from the
cluster by another node that can, to maintain the integrity of the cluster.
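The majority rule above ("more than half of the voting disks") can be sketched as a one-line check; the function name is illustrative.

```shell
# Quorum check: a node survives only if it can access MORE than half of the
# configured voting disks (a strict majority). Illustrative sketch.
has_votedisk_quorum() {  # $1 = disks this node can access, $2 = total disks
  if [ $(( 2 * $1 )) -gt "$2" ]; then
    echo "stays in cluster"
  else
    echo "evicted"
  fi
}
```

With 2 disks, a node seeing only 1 is evicted (1 is not more than half of 2), which is exactly why an odd number such as 3 is recommended: a node seeing 2 of 3 survives.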
Voting disks can be stored in:
Raw devices
A cluster file system supported by Oracle RAC such as OCFS, Sun Cluster, or VERITAS
Cluster File System
ASM disks (in 11gR2).
When the voting disk is stored in ASM, the question arises of how the voting file on ASM
can be accessed when we want to add a new node to the cluster.
Oracle ASM reserves several blocks at a fixed location on every Oracle ASM disk used for
storing voting files. As a result, Oracle Clusterware can access the voting disks present
in ASM even if the ASM instance is down, and CSS can continue to maintain the Oracle
cluster even if ASM has failed. The physical location of the voting files on the ASM disks
is fixed, i.e., the cluster stack does not rely on a running ASM instance to access the files.
If the voting disk is stored in ASM, the multiplexing of the voting disk is decided by the
redundancy of the disk group.
10) What is SCAN? How SCAN works? Benefits of SCAN? How to configure SCAN? How
many SCAN Listeners and why?
What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g
Release 2 feature that provides a single name for clients to access an Oracle
Database running in a cluster. The benefit is that clients using SCAN do not need to change
if you add or remove nodes in the cluster.
SCAN needs to resolve to one to three IP addresses under the same name, and Oracle
recommends using three IP addresses for SCAN in DNS. There are only three SCAN
listeners, even if the cluster has dozens of nodes. The SCAN listeners are started
from the Grid Oracle home, not the database/RDBMS home. Since SCAN is part of the grid,
it can be used for all the databases in the cluster, so we no longer need to run netca to
create listeners in the database homes. If the default port, 1521, is used, Oracle instances
(PMON) automatically register with the SCAN listeners. Here is a quick look at the Oracle
documentation's load-balancing flow with SCAN:
The PMON process of each instance registers the database services with the default
listener on the local node and with each SCAN listener, as specified by the
REMOTE_LISTENER database parameter.
The Oracle client connects using the SCAN name: myscan:1521/sales.example.com
The client queries DNS to resolve the SCAN name.
A SCAN listener selects the least-loaded node (node2 in this example).
The client connects to the local listener on node2. The local listener starts a
dedicated server process for the connection to the database.
The client connects directly to the dedicated server process on node2 and accesses
the sales2 database instance.
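For illustration, a client-side tnsnames.ora entry pointing at the SCAN from the flow above might look like the fragment below; the host and service names are the example values used above, not real addresses.

```
SALES =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = myscan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = sales.example.com)
    )
  )
```

Because the address uses the SCAN name rather than individual node VIPs, this entry does not change when nodes are added to or removed from the cluster.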
After installation in a two-node cluster, two SCAN listeners are started on one node and
the third SCAN listener on the other node.
A SCAN listener normally cycles through the following states during normal operation
(there might be more that I am not able to guess at this time):
Ready for service –> Busy (received request from client) –> Busy (identifying the least-loaded
node) –> Busy (redirecting the connection to the local listener of the least-loaded node) –>
Busy (handing off the address of that local listener to the client) –> Ready for service
-> Having fewer than 2 SCAN listeners would be a concern for HA.
-> Having 2 SCAN listeners would be good for HA.
-> For round-robin processing we want to be sure that whenever we approach a listener it
is ready; with only 2 SCAN listeners I could end up approaching a SCAN listener that is not
yet ready to take my request.
-> So 2 SCANs would work, but the usual N+1 formula applies: take one more than what
you need for your system. It is the same as with control files: though you should be fine
with two control files, it is better to have 2+1.
-> So having 2+1 is good for both HA and scalability.
Finally, I would like to share that, per Oracle, 3 SCAN listeners are sufficient to handle
peak connection-request load even on the largest clusters. If your environment still faces
a bottleneck, you have the option to add more SCAN listeners.
http://www.freeoraclehelp.com/2011/12/scan-setup-for-oracle-11g-release211gr2.html
https://saruamit4.wordpress.com/2013/09/27/how-many-scan-listeners/
http://www.oracle.com/technetwork/products/clustering/overview/scan-129069.pdf
13) What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?
This is most likely due to a fault in the interconnect network.
Check netstat -s.
If you see "fragments dropped" or "packet reassemblies failed", work with your system
administrator to find the network fault.
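The netstat check can be scripted. The grep patterns are the two strings called out above; the helper reads `netstat -s` style output from stdin so captured output can be reviewed offline. The function name is illustrative.

```shell
# Flag possible interconnect faults in `netstat -s` output (read from stdin)
# by looking for the two indicators discussed above.
check_ip_reassembly() {
  if grep -qiE 'fragments dropped|packet reassemblies failed'; then
    echo "possible interconnect fault: involve the sysadmin"
  else
    echo "no fragment/reassembly errors found"
  fi
}
```

A hypothetical usage would be `netstat -s | check_ip_reassembly` on each cluster node.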
15) srvctl cannot start an instance; I get the error PRKP-1001 / CRS-0215, yet sqlplus can
start it on both nodes. How do you identify the problem?
Set the environment variable SRVM_TRACE to TRUE and start the instance with srvctl. You
will then get a detailed error stack.
Voting disks record node membership information. Oracle Clusterware uses the voting disk
to determine which instances are members of a cluster. The voting disk must reside on
shared disk. For high availability, Oracle recommends that you have a minimum of three
voting disks. If you configure a single voting disk, then you should use external mirroring
to provide redundancy. You can have up to 32 voting disks in your cluster.
19) How would you find the interconnect IP address from any node within an Oracle 10g
RAC configuration?
Using the oifcfg command (for example, oifcfg getif lists the configured interfaces).
20) How many OCR and voting disks should one have?
For redundancy, one should have at least two OCR disks and three voting disks (raw disk
partitions). These disk partitions should be spread across different physical disks.
23) What is dynamic remastering? When will the dynamic remastering happens?
Dynamic remastering is ability to move the ownership of resource from one instance
to another instance in RAC.
Dynamic resource remastering is used to implement for resource affinity for
increased performance.
Resource affinity optimized the system in situation where update transactions are
being executed in one instance.
When activity shift to another instance the resource affinity correspondingly move to
another instance.
If activity is not localized then resource ownership is hashed to the instance.
26) If there is some issue with the virtual IP, how will you troubleshoot it? How will you
change the virtual IP?
$ srvctl modify nodeapps -A new_address
27) How will you back up your RAC database?
A RAC database consists of:
OCR
Voting disk
Database files, control files, redo log files, and archive log files
28) Do you have any idea of load balancing in application? How load balancing is done?
http://practicalappsdba.wordpress.com/category/for-master-apps-dbas/
GC CR request: the time it takes to retrieve the data from the remote cache.
Reason: RAC traffic using a slow connection, or inefficient queries (poorly tuned queries
increase the number of data blocks requested by an Oracle session; the more blocks
requested, the more often a block must be read from a remote instance via the
interconnect).
GC BUFFER BUSY: the time the remote instance spends locally accessing the requested
data block.
40) How do you export and import CRS resources while migrating Oracle RAC to a new
server?
The script below generates 'srvctl add' commands for the database, instances, services,
and 11g listeners from the OCR of the current RAC.
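The referenced script is not included in the source. As a hypothetical sketch of the idea, a generator could read database names (one per line, as `srvctl config database` prints them) and emit the matching `srvctl add database` commands; the Oracle home path below is illustrative, not a real location.

```shell
# Hypothetical sketch: turn a list of database names (one per line, as
# `srvctl config database` prints them) into `srvctl add database` commands
# that can be replayed on the new cluster. The -o path is illustrative.
gen_srvctl_add() {
  while IFS= read -r db; do
    [ -n "$db" ] && echo "srvctl add database -d $db -o /u01/app/oracle/product/11.2.0/dbhome_1"
  done
}
```

A hypothetical usage would be `srvctl config database | gen_srvctl_add > add_databases.sh` on the old cluster, then reviewing and running the generated file on the new one.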
42) What is the significance of using cluster-aware shared storage in an Oracle RAC
environment?
All instances of an Oracle RAC database can access all the datafiles, control files, SPFILEs,
and redo log files when these files are hosted on cluster-aware shared storage, which is a
group of shared disks.
43) Give a few examples of solutions that support cluster storage.
ASM (Automatic Storage Management), raw disk devices, network file system (NFS), OCFS2
and OCFS (Oracle Cluster File System)
54) What is rolling upgrade? And how to apply rolling patch in RAC?
Rolling upgrade is an ASM feature introduced in Database 11g. ASM instances in Oracle
Database 11g (from 11.1) can be upgraded or patched using the rolling upgrade feature.
This enables us to patch or upgrade ASM nodes in a clustered environment without
affecting database availability. During a rolling upgrade the cluster remains functional
while one or more of the nodes in the cluster run different software versions.
55) Can rolling upgrade be used to upgrade from 10g to 11g database?
No, it can be used only for Oracle database 11g releases (from 11.1).
56) State the initialization parameters that must have same value for every instance in an
Oracle RAC database?
Some initialization parameters are critical at database creation time and must have the
same values. Their values must be specified in the SPFILE or PFILE for every instance. The
parameters that must be identical on every instance are listed below:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCES
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE (RDBMS or ASM)
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_PASSWORD_FILE
UNDO_MANAGEMENT
57) What causes ORA-00603 "ORACLE server session terminated by fatal error" or
ORA-29702 "error occurred in Cluster Group Service operation"?
The RAC node name was listed against the loopback address (in /etc/hosts).
59) What two parameters must be set at the time of starting up an ASM instance in a RAC
environment?
The parameters CLUSTER_DATABASE and INSTANCE_TYPE must be set.
64) How do we verify that an instance has been removed from OCR after deleting an
instance?
srvctl config database -d database_name
cd CRS_HOME/bin
./crs_stat
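A quick way to script the verification is to search the crs_stat output for the instance resource. The `ora.<db>.<instance>.inst` naming pattern is the conventional 10g/11g resource name; treat it as an assumption, and the sample output in the usage is made up.

```shell
# Check whether an instance resource still appears in `crs_stat` style
# output read from stdin. The ora.<db>.<instance>.inst pattern is the
# conventional resource naming; adjust if your resources differ.
instance_in_ocr() {  # $1 = instance name, e.g. orcl1
  if grep -q "\.$1\.inst"; then
    echo "still registered"
  else
    echo "removed"
  fi
}
```

A hypothetical usage would be `./crs_stat | instance_in_ocr orcl2`, expecting "removed" after a successful instance deletion.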
68) Write a sample RMAN script for recovery when all the instances are down (first explain
the procedure you would use to restore).
Bring all nodes down.
Start one node.
Restore all datafiles and archive logs.
Recover that node.
Open the database.
Bring the other nodes up.
Confirm that all nodes are operational.
69) Clients are performing some operations and suddenly one of the datafiles experiences
a problem. What do you do if the cluster has two nodes?
Take the datafile offline, recover it, and bring it back online.
70) How to move OCR and Voting disk to new storage device?
Moving OCR
==========
You must be logged in as the root user, because root owns the OCR files.
An OCR mirror must also be in place before trying to replace the OCR device.
Make sure there is a recent backup of the OCR file before making any changes:
ocrconfig -showbackup
If there is no recent backup copy of the OCR file, an export can be taken of the current
OCR file. Use the following command to generate an export of the online OCR file:
In 10.2: # ocrconfig -export file_name -s online
In 11g: # ocrconfig -manualbackup
The new OCR disk must be owned by root, must be in the oinstall group, and must have
permissions set to 640. Provide at least 100 MB disk space for the OCR.
Now run ocrcheck to verify that the OCR is pointing to the new file.
Moving the Voting Disk
==========
Shut down Oracle Clusterware (crsctl stop crs as root) on all nodes before making any
modification to the voting disk. Determine the current voting disk location using:
crsctl query css votedisk
Take a backup of the voting disk:
dd if=voting_disk_name of=backup_file_name
To move a voting disk, provide the full path including the file name.
After modifying the voting disk, start the Oracle Clusterware stack on all nodes.
72) When exactly during the installation process are the clusterware components created?
After fulfilling the pre-installation requirements, the basic installation steps are:
3. After the Summary screen, OUI starts copying the libraries and executables under the
$CRS_HOME (this is the $ORACLE_HOME for Oracle Clusterware) on the local node.
- Here the daemons and the init.* scripts are created and configured properly.
Oracle Clusterware is formed of several daemons, each of which has a special function
inside the stack. The daemons are executed via the init.* scripts (init.cssd, init.crsd, and
init.evmd).
- Note that for CRS only some client libraries are recreated, but not all the executables (as
for the RDBMS).
4. Later the software is propagated to the rest of the nodes in the cluster and the
oraInventory is updated.
5. The installer asks you to execute root.sh on each node. Up to this step, the software for
Oracle Clusterware resides only inside the $CRS_HOME.
- Control files (or SCLS_SRC files) are created with the correct contents to start Oracle
Clusterware.
These files are used to control some aspects of Oracle Clusterware, such as:
- enabling/disabling processes from the CSSD family (e.g., oprocd, oslsvmon)
- stopping the daemons (ocssd.bin, crsd.bin, etc.)
- preventing Oracle Clusterware from being started when the machine boots
- etc.
To start the Oracle Clusterware daemons, the init.* scripts must first be run. These
scripts are executed by the init daemon; to accomplish this, entries are created in the
file /etc/inittab.
- The various init.* processes (init.cssd, init.crsd, etc.) start the daemons (ocssd.bin,
crsd.bin, etc.). When all the daemons are running, the installation was successful.
- On 10.2 and later, running root.sh on the last node in the cluster also creates the
nodeapps (VIP, GSD, and ONS). On 10.1, VIPCA is executed as part of the RAC installation.
6. After running root.sh on each node, continue with the OUI session. After you press the
'OK' button, OUI records the information for the public and cluster_interconnect
interfaces, and the CVU (Cluster Verification Utility) is executed.
Oracle recommends that you back up your voting disk after the initial cluster creation and
after you complete any node addition or deletion procedures.
First, as the root user, stop Oracle Clusterware (with the crsctl stop crs command) on all
nodes. Then, determine the current voting disks by issuing the following command:
crsctl query css votedisk
Then, issue the dd or ocopy command to back up each voting disk, as appropriate.
The syntax for backing up voting disks:
On Linux or UNIX systems:
dd if=voting_disk_name of=backup_file_name
where voting_disk_name is the name of the active voting disk and backup_file_name is the
name of the file to which we want to back up the voting disk contents.
On Windows systems, use the ocopy command:
ocopy voting_disk_name backup_file_name
80) How can we add, remove, and move multiple voting disks?
If we have multiple voting disks, then we can remove the voting disks and add them back
into our environment using the following commands, where path is the complete path of the
location where the voting disk resides:
crsctl delete css votedisk path
(or)
crsctl add css votedisk path
You can also use the V$GES_RESOURCE view to identify the master node.
v$cache_transfer: this view shows the types and classes of blocks that Oracle transfers
over the cluster interconnect on a per-object basis.
The forced_reads and forced_writes columns can be used to determine the types of objects
the RAC instances are sharing.
Values in the forced_writes column show how often a certain block type is transferred out
of a local buffer cache because the current version was requested by another instance.
91) What is the local OCR (OLR)? And how do you identify the location of the OLR?
/etc/oracle/local.ocr
/var/opt/oracle/local.ocr
You can also verify the OLR with ocrcheck -local.
92) How do you back up the OLR file? And how do you check the OLR backup location?
Use ocrconfig -local -manualbackup to back up the OLR, and ocrconfig -local -showbackup
to list the backup location.
93) If the voting disk/OCR files are corrupted and we don't have backups, how do we get
them back?
We have to reinstall the Clusterware.
97) How many SCAN listeners will be running? Why three SCAN?
Three SCAN listeners only
From 10g Release 2, a service can be set up to use the load balancing advisory. This means
connections can be routed using SERVICE TIME and THROUGHPUT. Connection load
balancing means the goal of a service can be changed to reflect the type of connections
using the service.
With session failover, only the session fails over to the next available node; with SELECT,
the in-flight select query is also resumed.
TAF can be configured with just server-side service settings by using the dbms_service
package.
With fast connection failover, when a down event is received, cached connections affected
by the event are immediately marked invalid and cleaned up.
102) What are the uses of services? How do you find the services in a cluster?
Applications should use services to connect to the Oracle database.
Services define rules and characteristics (unique name, workload balancing, failover
options, and high availability) that control how users and applications connect to database
instances. You can list them with srvctl config service -d db_name.
103) How do you find the nodes in a cluster, and how do you find the master node?
# olsnodes -- whichever node is displayed first is the master node of the cluster.
To find out which node is the master, you can also check the ocssd.log file and search for
"master node number".
103) How to know the public IPs, private IPs, VIPs in RAC?
# olsnodes -n -p -i
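For scripting, the columns of `olsnodes -n -p -i` (node name, node number, private IP, VIP) can be picked apart with awk. The sample output used below is made up for illustration; the column order follows the 10g/11g format.

```shell
# Extract "node -> VIP" pairs from `olsnodes -n -p -i` output on stdin.
# Columns are: node name, node number, private IP, VIP.
list_vips() {
  awk '{ print $1 " -> " $4 }'
}
```

A hypothetical usage would be `olsnodes -n -p -i | list_vips` to get a quick node-to-VIP map.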
gc buffer busy
gc buffer busy acquire
gc current request
gc cr request
gc cr failure
gc current block lost
gc cr block lost
gc current block corrupt
gc cr block corrupt
gc current block busy
gc cr block busy
gc current block congested
gc cr block congested.
gc current block 2-way
gc cr block 2-way
gc current block 3-way
gc cr block 3-way
(gc current/cr block n-way, n is number of nodes)
gc current grant 2-way
gc cr grant 2-way
gc current grant busy
gc current grant congested
gc cr grant congested
gc cr multi block read
gc current multi block request
gc cr multi block request
gc cr block build time
gc current block flush time
gc cr block flush time
gc current block send time
gc cr block send time
gc current block pin time
gc domain validation
gc current retry
ges inquiry response
gcs log flush sync
108) what are the initialization parameters that must have same value for every instance in
an Oracle RAC database?
http://satya-racdba.blogspot.com/2012/09/init-parameters-in-oracle-rac.html
(Same parameter list as under question 56 above.)
109) What is the difference between a CR block and a current (cur) block?
A current block is the latest version of the block and is the copy that can be modified; a
CR (consistent read) block is a read-consistent snapshot of the block as of a particular
SCN, built to satisfy a query without blocking the writer.
ASM Disk Scrubbing - From RAC 12c, ASM comes with disk scrubbing feature so that
logical corruptions can be discovered.
Also Oracle 12c ASM can automatically correct this in normal or high redundancy
diskgroups.
IPv6 Support - Oracle RAC 12c now supports IPv6 for Client connectivity, Interconnect is
still on IPv4.
Per Subnet multiple SCAN - RAC 12c, per-Subnet multiple SCAN can be configured per
cluster.
Each RAC instance opens the Container Database (CDB) as a whole, so the version is the
same for the CDB as well as for all of the Pluggable Databases (PDBs). PDBs are also fully
compatible with RAC.
The Oracle installer can run the root.sh script across the nodes; we don't have to run the
script manually on all RAC nodes.
Oracle 9i RAC:
---------------------
OPS (Oracle Parallel Server) was renamed as RAC
CFS (Cluster File System) was supported
OCFS (Oracle Cluster File System) for Linux and Windows
watchdog timer replaced by hangcheck timer
You should create redo log groups only if you are using administrator-managed databases.
For policy-managed databases, increase the cardinality and when the instance starts, if you
are using Oracle Managed Files and Oracle ASM, then Oracle automatically allocates the
thread, redo, and undo.
If you remove an instance from your Oracle RAC database, then you should disable the
instance’s thread of redo so that Oracle does not have to check the thread during database
recovery.
For policy-managed databases, Oracle automatically allocates the undo tablespace when the
instance starts if you have OMF enabled.
118) What are the different types of server-side connection load balancing?
With server-side load balancing, the SCAN listener directs a connection request to the best
instance currently providing the service, using the load balancing advisory. The two
connection load balancing goals are:
· SHORT—Connections are distributed across instances based on the amount of time that
the service is used. Use the SHORT connection load balancing goal for applications that have
connections of brief duration. When using connection pools that are integrated with FAN, set
the connection load balancing goal to SHORT. SHORT tells the listener to use CPU-based
statistics.
· LONG—Connections are distributed across instances based on the number of sessions in
each instance, for each instance that supports the service. Use the LONG connection load
balancing goal for applications that have connections of long duration. This is typical for
connection pools and SQL*Forms sessions. LONG is the default connection load balancing
goal, and tells the listener to use session-based statistics.
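The goal is set per service. A hedged srvctl sketch — the database and service names below are placeholders, and `-j` is the 11g-style flag for the connection load balancing goal (verify the syntax for your version):

```shell
# Illustrative only: PROD, OLTP and BATCH are placeholder names.
srvctl modify service -d PROD -s OLTP  -j SHORT   # pooled, short-lived connections
srvctl modify service -d PROD -s BATCH -j LONG    # long-lived sessions (the default)
srvctl config service -d PROD -s OLTP             # verify the goal took effect
```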
121) Can I configure both connection failure notification and Fast Connection Failover (FCF)
with the Universal Connection Pool (UCP)?
Connection failure notification is redundant with FCF as implemented by the UCP. You
should not configure both within the same application.
123) Can I use Fast Connection Failover (FCF) and Transparent Application Failover (TAF)
together?
No. Only one of them should be used at a time.
124) What is the status of Fast Connection Failover (FCF) with the Universal Connection Pool
(UCP)?
FCF as implemented by the Implicit Connection Cache is deprecated, along with the cache
itself, in favor of the Universal Connection Pool (UCP) for JDBC, which provides its own FCF
implementation.
Use the following query against the internal queue table for load balancing advisory FAN
events to monitor load balancing advisory events generated for an instance:-
SET PAGES 60 COLSEP '|' LINES 132 NUM 8 VERIFY OFF FEEDBACK OFF
COLUMN user_data HEADING "AQ Service Metrics" FORMAT A60 WRAP
BREAK ON service_name SKIP 1
SELECT
TO_CHAR (enq_time, 'HH:MI:SS') Enq_time, user_data
FROM sys.sys$service_metrics_tab
ORDER BY 1;
129) What types of affinity does Universal Connection Pool (UCP) support?
UCP JDBC connection pools support two types of connection affinity: transaction-based
affinity and Web session affinity.
133) Do I still need to back up my Oracle Cluster Registry (OCR) and Voting Disks?
You no longer have to back up the voting disk. The voting disk data is automatically backed
up in OCR as part of any configuration change and is automatically restored to any voting
disk that is added. If all voting disks are corrupted, however, you can restore them from the
OCR data.
Oracle Clusterware automatically creates OCR backups every four hours. At any one time,
Oracle Database always retains the last three backup copies of OCR. The CRSD process that
creates the backups also creates and retains an OCR backup for each full day and at the end
of each week. You cannot customize the backup frequencies or the number of files that
Oracle Database retains.
137) Why does my user appear across all nodes when querying GV$SESSION, even though
my service does not span all nodes?
The problem is that you are querying GV$SESSION as the ABC user, and this produces the
"strange" behavior. When you select from GV$SESSION, two parallel servers are spawned to
query V$SESSION on each node, and they run as the same user. Hence, when you query
GV$SESSION as ABC you see three sessions (one real session and two parallel slaves
querying V$SESSION on each instance). The reason you see one session on one node and
three on the other is the order in which the parallel processes query V$SESSION. Query the
sessions of ABC as SYS (or any other user) and you will not see this behavior.
138) How does Clusterware start up with OCR and Voting Disks in ASM?
The startup sequence has been replaced with a two-phased, optimized approach:
Phase I
· OHASD starts up "local" resources first.
· CSSD uses the GPnP profile, which stores the location of the voting disks, so there is no
need to access the ASM instance (voting files are stored differently within ASM than other
files, so their location is known). Simultaneously,
· ORAAGENT starts up and the ASM instance is started (a subset of the OCR information is
stored in the OLR, enough to start local resources), and ORAROOTAGENT starts CRSD.
So the first phase of Clusterware startup is essentially to start up local resources.
Phase II
· At this point ASM and the full OCR information are available and the node is "joined" to the
cluster.
The above describes the startup sequence of the Grid Infrastructure daemons and their
resources in 11gR2 RAC. In 11g RAC, aka Grid Infrastructure, there are additional
background daemons and agents, and the Oracle documentation is not entirely clear on
their ordering.
#> cd GI_HOME/crs/install
#> perl rootcrs.pl -unlock
As root again:
#> cd GI_HOME/crs/install
141) How do I determine the “Master” node?
For the cluster synchronization service (CSS), the master can be found by searching
$GI_HOME/log/cssd/ocssd.log. For the master of an enqueue resource with Oracle RAC, you
can select from V$GES_RESOURCE; the MASTER_NODE column identifies the master.
144) What is the major difference between 10g and 11g RAC?
There is not much difference between 10g and 11gR1 RAC, but there is a significant
difference in 11gR2.
146) what are Oracle Kernel Components (nothing but how does Oracle RAC database
differs than Normal single instance database in terms of Binaries and process)
Basically, the Oracle kernel needs to be relinked with the RAC option on when you convert
to RAC; that is the difference, as it enables the RAC background processes such as LMON,
LCK, LMD and LMS.
To turn on RAC
# link the oracle libraries
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on
# rebuild oracle
$ cd $ORACLE_HOME/bin
$ relink oracle
Oracle RAC is composed of two or more database instances. They are composed of Memory
structures and background processes same as the single instance database. Oracle RAC
instances use two processes GES(Global Enqueue Service), GCS(Global Cache Service) that
enable cache fusion. Oracle RAC instances are composed of following background
processes:
ACMS—Atomic Controlfile to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor
149) As you said Voting & OCR Disk resides in ASM Diskgroups, but as per startup sequence
OCSSD starts first before than ASM, how is it possible?
How does OCSSD start if the voting disk & OCR reside in ASM disk groups?
You might wonder how CSSD, which is required to start the clustered ASM instance, can be
started if the voting disks are stored in ASM. This sounds like a chicken-and-egg problem:
without access to the voting disks there is no CSS, hence the node cannot join the cluster;
but without being part of the cluster, CSSD cannot start the ASM instance. To solve this
problem, the ASM disk headers have new metadata in 11.2: you can use kfed to read the
header of an ASM disk containing a voting disk. The kfdhdb.vfstart and kfdhdb.vfend fields
tell CSS where to find the voting file. This does not require the ASM instance to be up. Once
the voting disks are located, CSS can access them and join the cluster.
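For example, the voting-file location can be read straight off the disk header with kfed. A sketch — the device path is a placeholder:

```shell
# Illustrative: locate the voting file without a running ASM instance.
kfed read /dev/oracleasm/disks/DISK1 | grep -E 'kfdhdb\.vfstart|kfdhdb\.vfend'
# Non-zero vfstart/vfend values mark the allocation-unit range holding the
# voting file, which CSS reads directly at startup.
```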
A client connects through the SCAN name of the cluster (all three SCAN IP addresses
resolve, round robin, to the same host name; in this case our SCAN name is
cluster01-scan.cluster01.example.com).
The request reaches the DNS server in your corporate network and resolves to one of the
three addresses. If GNS (Grid Naming Service) is configured, that is, a subdomain is
configured in the DNS entry to resolve cluster addresses, the request is handed over to
GNS (gnsd).
Assume here there is no GNS: the SCAN listener, whose endpoints are registered with the
database listeners, redirects the request, and the database listener receives it and
processes it further.
In the case of node addition (a fourth listener), clients need not know about or change
anything in their TNS entries (the address of the 4th node/instance), as they are just using
the SCAN. The same holds for node deletion.
In Oracle Database 11g Release 2, GPnP allows each node to perform the following tasks
dynamically:
To add a node, simply connect the server to the cluster and allow the cluster to configure
the node.
This profile is read locally, or from the remote machine when the node is plugged into the
cluster and dynamically added to it.
153) What are the file types that ASM supports and keeps in disk groups?
Control files, datafiles, temporary files, online redo logs, archived redo logs, SPFILEs, RMAN
backup sets, flashback logs, block change tracking files, and Data Pump dump sets.
162) What happens during failure events? How do SCAN connections still work when a node
dies, when the local listener dies, or when a SCAN listener dies?
The main components that make the solution agnostic to node failures are the three SCAN
listeners and their related IP addresses. Those listeners are cluster-wide services, meaning
that none of the three listeners is bound to any particular node and each can run on any of
the nodes. If one of the nodes crashes, the following happens:
During the crash itself there is a small timeframe in which client connections may get errors
like the following; however, within 1-2 minutes everything returns to normal:
"ORA-12514: TNS:listener does not currently know of service"
The exact SCAN information refresh process remains to be investigated; the Oracle
Notification Service may be part of the process, as each of the listeners keeps a connection
to the "127.0.0.1:6100" socket.
What methods are available to keep the time synchronized on all nodes in the cluster?
Either the Network Time Protocol (NTP) can be configured or, in 11gR2, the Cluster Time
Synchronization Service (CTSS) can be used.
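Synchronization can be verified cluster-wide with cluvfy. A sketch:

```shell
# Illustrative: check clock synchronization across all cluster nodes.
cluvfy comp clocksync -n all -verbose
# CTSS runs in observer mode when NTP is configured, active mode otherwise.
```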
Spfiles, ControlFiles, Datafiles and Redolog files should be created on shared storage.
Where does the Clusterware write when there is a network or storage missed heartbeat?
Missed network and disk heartbeats are recorded in the ocssd.log and in the Clusterware
alert log.
The ocrconfig -showbackup can be run to find out the automatic and manually run backups.
You can use either the logical or the physical OCR backup copy to restore the Repository.
How do you find out which object has its blocks shipped across instances the most?
Query GV$SEGMENT_STATISTICS for the 'gc cr blocks received' and 'gc current blocks
received' statistics; the segments with the highest values are transferred across the
interconnect the most. (Separately, you can query the V$ACTIVE_INSTANCES view to
determine the member instances of the RAC cluster.)
The Cluster Health Monitor (CHM) stores operating system metrics in the CHM repository for
all nodes in a RAC cluster. It stores information on CPU, memory, process, network and
other OS data. This information can later be retrieved and used to troubleshoot and identify
any cluster-related issues. It is a default component of the 11gR2 grid install. The data is
stored in the master repository and replicated to a standby repository on a different node.
What would be the possible performance impact in a cluster if a less powerful node (e.g.
one with slower CPUs) is added to the cluster?
All processing will slow down to the CPU speed of the slowest server.
The Oracle Local Registry (OLR) contains the information that allows the cluster processes
to be started up while the OCR is in ASM storage. Since the ASM file system is unavailable
until the Grid processes are started, a local copy of the contents of the OCR is required;
this is what is stored in the OLR.
In 10g the default SGA size is 1G in 11g it is set to 256M and in 12c ASM it is set back to
1G.
You can use md_backup to back up the ASM disk group metadata, and md_restore to
re-create the disk group configuration in case of ASM disk group storage loss.
Datafiles
Redo logfiles
Spfiles
In 12c the following file can also now be stored in an ASM disk group:
Password file
ASM_POWER_LIMIT is the parameter which controls the number of allocation units the ASM
instance will try to rebalance at any given time. Its default value is 1; in ASM versions
before 11.2.0.2 the maximum is 11, and it was raised to 1024 in later versions.
A patch is considered rolling if it can be applied to the cluster binaries without having to
shut down the database in a RAC environment. All nodes in the cluster are patched in a
rolling manner, one by one, with only the node being patched unavailable while all other
instances remain open.
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCE
ACTIVE_INSTANCE_COUNT
UNDO_MANAGEMENT
The Grid software is becoming more and more capable of not just supporting HA for Oracle
Databases but also other applications including Oracle’s applications. With 12c there are
more features and functionality built-in and it is easier to deploy these pre-built solutions,
available for common Oracle applications.
What components of the Grid should I back up?
The OCR (via ocrconfig backups or exports), the voting disks (automatically backed up in
the OCR from 11.2 onward), and the OLR.
Is there an easy way to verify the inventory for all remote nodes?
You can run the opatch lsinventory -all_nodes command from a single node to look at the
inventory details for all nodes in the cluster.
Q What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g
Release 2 feature that provides a single name for clients to access an Oracle Database
running in a cluster. The benefit is that clients using SCAN do not need to change if you add
or remove nodes in the cluster.
Dynamic remastering is the ability to move the ownership of a resource from one instance
to another instance in RAC. Dynamic resource remastering is used to implement resource
affinity for increased performance. Resource affinity optimizes the system in situations
where update transactions are being executed on one instance; when activity shifts to
another instance, the resource affinity correspondingly moves to that instance. If activity is
not localized, resource ownership is hashed across the instances.
We can check or start services with the srvctl command, for example to bring a
load-balanced/TAF service named RAC online.
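A hedged sketch of the corresponding commands — the database name PROD is a placeholder:

```shell
# Illustrative only: check and start the load-balanced/TAF service "RAC".
srvctl status service -d PROD -s RAC
srvctl start  service -d PROD -s RAC
srvctl config service -d PROD -s RAC   # shows TAF policy and preferred instances
```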
Q If there is some issue with the virtual IP, how will you troubleshoot it? How will you
change the virtual IP?
To change the VIP (virtual IP) on a RAC node, use the srvctl modify nodeapps command.
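A sketch of the usual sequence, run as root — the node name, address, netmask and interface are placeholders, and the exact syntax should be verified for your version:

```shell
# Illustrative only: change the VIP on node racnode1.
srvctl stop instance -d PROD -i PROD1        # stop the instance on the node
srvctl stop vip -n racnode1 -f               # stop the VIP (and its listener)
srvctl modify nodeapps -n racnode1 -A 192.168.1.50/255.255.255.0/eth0
srvctl start vip -n racnode1                 # bring the new VIP online
srvctl start instance -d PROD -i PROD1
```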
Q Do you have any idea of load balancing in application?How load balancing is done?
http://practicalappsdba.wordpress.com/category/for-master-apps-dbas/
Q What is RAC?
RAC stands for Real Application Clusters. It is a clustering solution from Oracle Corporation
that ensures high availability of databases by providing instance failover and media failover
features.
RAC stands for Real Application Clusters: you have n instances running on their own
separate nodes, based on shared storage. The cluster is the key component: a collection of
servers operating as one unit. RAC is the best solution for high performance and high
availability. Non-RAC databases have a single point of failure in case of hardware failure or
server crash.
Q What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintain records of the
statuses of each datafile and each cached block using the Global Resource Directory. This
process is referred to as cache fusion and helps ensure data integrity.
The private interface is for inter-node communication. The VIP is all about availability of the
application: when a node fails, the VIP component fails over to some other node. This is the
reason all applications should be based on VIP components, meaning TNS entries should
have the VIP entry in the host list.
ACMS stands for Atomic Controlfile to Memory Service. In an Oracle RAC environment,
ACMS is an agent that ensures distributed SGA memory updates, i.e. SGA updates are
globally committed on success or globally aborted in the event of a failure.
GC CR request :the time it takes to retrieve the data from the remote cache
Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned queries will
increase the amount of data blocks requested by an Oracle session. The more blocks
requested typically means the more often a block will need to be read from a remote
instance via the interconnect.)
GC BUFFER BUSY: It is the time the remote instance locally spends accessing the requested
data block.
LMON monitors global enqueues and resources across the cluster and performs global
enqueue recovery operations; it is called the Global Enqueue Service Monitor.
LMD is called the global enqueue service daemon; this process manages incoming remote
resource requests within each instance.
LCK0 is called the instance enqueue process; it manages non-cache-fusion resource
requests such as library and row cache requests.
Q How to export and import CRS resources while migrating Oracle RAC to a new server?
The script below generates srvctl add commands for the database, instances, services and
11g listeners from the OCR of the current RAC. The original listing was incomplete; the core
loop, reconstructed:
for dbname in $(srvctl config database)
do
  # Generate the DB resource
  srvctl config database -d $dbname | awk -v dbname="$dbname" \
    'BEGIN { FS=":" }
     $1~/Oracle home/ || $1~/ORACLE_HOME/ { dbhome = "-o" $2 }
     $1~/Spfile/ { spfile = "-p" $2 }
     END { printf "%s %s %s %s\n", "srvctl add database -d ", dbname, dbhome, spfile }'
  # Generate an instance resource for each running instance
  srvctl status database -d $dbname | awk -v dbname="$dbname" \
    '$4~/running/ { printf "%s %s %s %s %s %s\n", "srvctl add instance -d ", dbname, " -i ", $2, " -n ", $7 }
     $5~/running/ { printf "%s %s %s %s %s %s\n", "srvctl add instance -d ", dbname, " -i ", $2, " -n ", $8 }'
  # When ASM is in use, map each instance to its local ASM instance
  srvctl status database -d $dbname | awk -v dbname="$dbname" \
    '$2~/1$/ { printf "%s %s %s %s %s\n", "srvctl modify instance -d ", dbname, " -i ", $2, " -s +ASM1" }
     $2~/2$/ { printf "%s %s %s %s %s\n", "srvctl modify instance -d ", dbname, " -i ", $2, " -s +ASM2" }
     $2~/3$/ { printf "%s %s %s %s %s\n", "srvctl modify instance -d ", dbname, " -i ", $2, " -s +ASM3" }
     $2~/4$/ { printf "%s %s %s %s %s\n", "srvctl modify instance -d ", dbname, " -i ", $2, " -s +ASM4" }'
done
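To see the awk step in isolation, it can be run against captured srvctl output. A self-contained sketch — the sample output format and file paths are assumptions modeled on 11g srvctl, so verify the field labels against your version:

```shell
# Capture a sample of "srvctl config database -d PROD" output (format assumed).
cat > /tmp/srvctl_config_PROD.txt <<'EOF'
Database unique name: PROD
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Spfile: +DATA/PROD/spfilePROD.ora
EOF

# Turn the captured configuration into a "srvctl add database" command.
awk 'BEGIN { FS=": *" }
  $1 ~ /Oracle home/ { dbhome = "-o " $2 }
  $1 ~ /Spfile/      { spfile = "-p " $2 }
  END { printf "srvctl add database -d PROD %s %s\n", dbhome, spfile }' \
  /tmp/srvctl_config_PROD.txt > /tmp/add_PROD.sh

cat /tmp/add_PROD.sh
```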
RSMN is called the Remote Slave Monitor. This process manages background slave process
creation and communication on remote instances. It is a background slave process that
performs tasks on behalf of a coordinating process running in another instance.
All datafiles, control files, SPFILEs and redo log files must reside on cluster-aware shared
storage. All instances of an Oracle RAC database can access the datafiles, control files,
SPFILEs and redo log files when these files are hosted on cluster-aware shared storage, i.e.
a group of shared disks.
An interconnect network is a private network that connects all of the servers in a cluster.
The interconnect network uses a switch/multiple switches that only the nodes in the cluster
can access.
Q How can we configure the cluster interconnect?
Configure User Datagram Protocol (UDP) on Gigabit Ethernet for the cluster interconnect.
On Unix and Linux systems, UDP and RDS (Reliable Datagram Sockets) are the protocols
used by Oracle Clusterware; Windows clusters use the TCP protocol.
No, crossover cables are not supported with Oracle Clusterware interconnects.
Cluster interconnect is used by the Cache fusion for inter instance communication.
Users can access a RAC database using a client/server configuration or through one or more
middle tiers, with or without connection pooling. Users can use the Oracle services feature
to connect to the database.
Applications should use the services feature to connect to the Oracle database. Services
enable us to define rules and characteristics to control how users and applications connect
to database instances. The characteristics include a unique name, workload balancing and
failover options, and high availability characteristics.
Q What enables the load balancing of applications in RAC?
Oracle Net Services enable the load balancing of application connections across all of the
instances in an Oracle RAC database.
A virtual IP address, or VIP, is an alternate IP address that client connections use instead of
the standard public IP address. To configure a VIP address, we need to reserve a spare IP
address for each node, and the IP addresses must use the same subnet as the public
network.
If a node fails, then the node's VIP address fails over to another node on which the VIP
address can accept TCP connections but it cannot accept Oracle connections.
A VIP address fails over when the node on which the VIP address runs fails, when all
interfaces for the VIP address fail, or when all interfaces for the VIP address are
disconnected from the network.
When a VIP address failover happens, clients that attempt to connect to the VIP address
receive a rapid connection refused error; they don't have to wait for TCP connection
timeout messages.
Q What are the administrative tools used for Oracle RAC environments?
Oracle RAC environments are administered with SRVCTL, CRSCTL, Oracle Enterprise
Manager, DBCA and NETCA.
Issue the following query from any one node connecting through SQL*PLUS.
Q What is FAN?
Fast Application Notification (FAN) is the RAC notification mechanism that publishes cluster
state changes to clients. FAN UP and FAN DOWN events can be applied to instances,
services and nodes.
ASM is the Oracle-recommended storage option for RAC databases, as ASM maximizes
performance by managing the storage configuration across the disks. ASM does this by
distributing the database files across all of the available storage within our cluster database
environment.
ASM rolling upgrade is a new feature from Database 11g. ASM instances in Oracle Database
11g (from 11.1) can be upgraded or patched using the rolling upgrade feature. This enables
us to patch or upgrade ASM nodes in a clustered environment without affecting database
availability. During a rolling upgrade we can maintain a functional cluster while one or more
of the nodes in the cluster run different software versions.
No, it can be used only for Oracle Database 11g releases (from 11.1).
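The rolling upgrade is initiated from SQL on the clustered ASM instance. A sketch — the target version string is a placeholder:

```shell
# Illustrative: put clustered ASM into rolling migration mode, upgrade each
# node in turn, then end the migration. Run against an ASM instance as SYSASM.
sqlplus / as sysasm <<'EOF'
ALTER SYSTEM START ROLLING MIGRATION TO '11.2.0.3.0';
-- ...patch/upgrade each ASM node one at a time, then:
ALTER SYSTEM STOP ROLLING MIGRATION;
EOF
```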
Q State the initialization parameters that must have the same value for every instance in
an Oracle RAC database
Some initialization parameters are critical at database creation time and must have the
same values. Their value must be specified in the SPFILE or PFILE for every instance. The
parameters that must be identical on every instance are listed below:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCE
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_PASSWORD_FILE
UNDO_MANAGEMENT
Certain other parameters (such as DML_LOCKS and RESULT_CACHE_MAX_SIZE) must be
identical on all instances only if their value is set to zero.
What two parameters must be set at the time of starting up an ASM instance in a RAC
environment?
The parameters CLUSTER_DATABASE and INSTANCE_TYPE must be set.
Oracle clusterware is made up of components like voting disk and Oracle Cluster
Registry(OCR).
Oracle clusterware manages CRS resources based on the configuration information of CRS
resources stored in OCR(Oracle Cluster Registry).
Q What are the modes of deleting instances from Oracle Real Application Clusters
databases?
We can delete instances in silent mode or interactive mode using DBCA (Database
Configuration Assistant).
We need to stop and delete the instance on the node first, in interactive or silent mode.
After that, ASM can be removed using the srvctl tool.
We can verify that ASM has been removed by issuing the srvctl config asm command.
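A hedged sketch of the ASM removal step for 10g/11gR1-style installs, where ASM is managed separately from the Grid home (the node name racnode2 is a placeholder):

```shell
# Illustrative only: remove ASM from a node after deleting its instance.
srvctl stop asm -n racnode2
srvctl remove asm -n racnode2
srvctl config asm -n racnode2   # should return nothing once removed
```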
Q How do we verify that an instance has been removed from OCR after deleting an
instance?
cd CRS_HOME/bin
./crs_stat
We can verify the current backup of the OCR using the following command:
ocrconfig -showbackup
We have V$ views that are instance-specific. In addition we have GV$ views, called global
views, which have an INST_ID column of numeric data type. GV$ views obtain information
from the individual V$ views.
There are two types of connection load balancing: server-side load balancing and
client-side load balancing.
Q What is the difference between server-side and client-side connection load balancing?
Client-side load balancing happens on the client side, where connections are distributed
across the listener addresses in the connect descriptor. In the case of server-side load
balancing, the listener uses a load balancing advisory to redirect connections to the
instance providing the best service.
In a RAC environment, if a node in the cluster fails, the application continues to run on the
surviving nodes contained in the cluster. If your application is configured correctly, most
users won't even know that the node they were running on became unavailable.
In a RAC environment the buffer cache is global across all instances in the cluster, and
hence the processing differs. The most common wait events related to this are gc cr
request and gc buffer busy.
We have public, private, and VIP components. The private interface is for inter-node
communication. The VIP is all about availability of the application. When a node fails, the
VIP component will fail over to some other node; this is the reason all applications should
be based on VIP components, meaning TNS entries should have the VIP entry in the host
list.
Q Tune the following RAC database (DBNAME=PROD), which is a 3-node RAC.
What are you looking for here? What tuning information do you expect?
I would allocate 20% of each node's memory to Oracle, which means the SGA would differ
between the nodes. Also, since the CPUs are different, PROD2 can have a higher maximum
number of processes than the rest. But as I said, this is just configuration, not tuning; the
question is not clear.
Q Write a sample script for RMAN recovery if all the instances are down. (First explain the
procedure for how you will restore.)
Recover one node.
Clients are performing some operation and suddenly one of the datafiles experiences a
problem: what do you do? The cluster is a two-node one.
Moving OCR
==========
You must be logged in as the root user, because root owns the OCR files. Also, an ocrmirror
must be in place before trying to replace the OCR device.
Make sure there is a recent backup of the OCR file before making any changes:
ocrconfig -showbackup
If there is not a recent backup copy of the OCR file, an export can be taken of the current
OCR file. Use the following command to generate an export of the online OCR file:
In 10.2
# ocrconfig -export <backup_file_name> -s online
In 11g
# ocrconfig -manualbackup
The new OCR disk must be owned by root, must be in the oinstall group, and must have
permissions set to 640. Provide at least 100 MB disk space for the OCR.
Now run ocrcheck to verify if the OCR is pointing to the new file
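The replacement itself is done with ocrconfig. A sketch, run as root — the new path is a placeholder:

```shell
# Illustrative only: move the OCR to a new location (a mirror must exist).
ocrconfig -showbackup                        # confirm a recent backup first
ocrconfig -replace ocr /u02/crs/ocr_new.dbf  # root:oinstall, 640, >=100 MB
ocrcheck                                     # verify the new location is in use
```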
Moving the Voting Disk
==================
Shutdown the Oracle Clusterware (crsctl stop crs as root) on all nodes before making any
modification to the voting disk. Determine the current voting disk location using:
dd if=voting_disk_name of=backup_file_name
To move a Voting Disk, provide the full path including file name:
After modifying the voting disk, start the Oracle Clusterware stack on all nodes
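With the pre-11.2 stack down, the move itself uses crsctl. A sketch — the device paths are placeholders, and -force is needed while the stack is stopped in 10g:

```shell
# Illustrative only: relocate a voting disk (run as root, clusterware down).
crsctl query css votedisk                             # list current locations
dd if=/dev/raw/raw2 of=/backup/votedisk.bak           # back up the old disk
crsctl add css votedisk /u02/crs/votedisk_new -force  # add the new location
crsctl delete css votedisk /dev/raw/raw2 -force       # remove the old one
```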
Q When exactly during the installation process are clusterware components created?
After fulfilling the pre-installation requirements, the basic installation steps to follow are:
-etc.
3. After the Summary screen, OUI will start copying under the $CRS_HOME (this is the
$ORACLE_HOME for Oracle Clusterware) in the local node the libraries and executables.
- here we will have the daemons and scripts init.* created and configured properly.
Oracle Clusterware is formed of several daemons, each of which has a special function
inside the stack. Daemons are executed via the init.* scripts (init.cssd, init.crsd and
init.evmd).
- note that for CRS only some client libraries are recreated, but not all the executables (as
for the RDBMS).
4. Later the software is propagated to the rest of the nodes in the cluster and the
oraInventory is updated.
5. The installer will ask to execute root.sh on each node. Until this step the software for
Oracle Clusterware is inside the $CRS_HOME.
- control files (or SCLS_SRC files ) will be created with the correct contents to start Oracle
Clusterware.
These files are used to control some aspects of Oracle Clusterware like:
- prevent Oracle Clusterware from being started when the machine boots.
- etc.
In order to start the Oracle Clusterware daemons, the init.* scripts first need to be run.
These scripts are executed by the daemon init. To accomplish this some entries must be
created in the file /etc/inittab.
- the different processes init.* (init.cssd, init.crsd, etc.) will start the daemons (ocssd.bin,
crsd.bin, etc.). When all the daemons are running, we can say that the installation was
successful.
- On 10.2 and later, running root.sh on the last node in the cluster also will create the
nodeapps (VIP, GSD and ONS). On 10.1, VIPCA is executed as part of the RAC installation.
6. After running root.sh on each node, we need to continue with the OUI session. After
pressing the 'OK' button OUI will include the information for the public and
cluster_interconnect interfaces. Also CVU (Cluster Verification Utility) will be executed.
Q What are Oracle Clusterware processes for 10g on Unix and Linux?
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as
the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be
a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application
process, and so on) based on the resource's configuration information that is stored in the
OCR. This includes start, stop, monitor and failover operations. This process runs as the root
user.
Event manager daemon (evmd) —A background process that publishes events that crs
creates.
Process Monitor Daemon (OPROCD) — This process monitors the cluster and provides I/O
fencing. OPROCD performs its check, stops running, and if the wake-up is beyond the
expected time, OPROCD resets the processor and reboots the node. An OPROCD failure
results in Oracle Clusterware restarting the node. On Linux platforms, the hangcheck timer
performs this function in place of OPROCD.
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy
a query or transaction, Oracle RAC instances use two processes, the Global Cache Service
(GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the
statuses of each data file and each cached block using a Global Resource Directory (GRD).
The GRD contents are distributed across all of the active instances.
Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a
health check and arbitrates cluster ownership among the instances in case of network
failures. The voting disk must reside on shared disk.
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle
Clusterware Node evictions.
To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can take a manual backup of the OCR with the
command:
# ocrconfig -manualbackup
or
#ocrcheck
Q What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g
Release 2 feature that provides a single name for clients to access an Oracle Database
running in a cluster. The benefit is that clients using SCAN do not need to change if you add
or remove nodes in the cluster.
Clusterware uses the private interconnect for cluster synchronization (network heartbeat)
and daemon communication between the clustered nodes. This communication is based
on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP).
Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches
of participating nodes in the cluster.
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP
timeout period (which can be up to 10 min) before getting an error. As a result, you don't
really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node
and new node re-arps the world indicating a new MAC address for the IP. Subsequent
packets sent to the VIP go to the new node, which will send error RST packets back to the
clients. This results in the clients getting errors immediately
10g Release 2 supports 100 nodes in a cluster using Oracle Clusterware, and 100 instances
in a RAC database.
Q Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however
sqlplus can start it on both nodes? How do you identify the problem?
Set the environment variable SRVM_TRACE to true and start the instance with srvctl.
You will then get a detailed error stack.
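A minimal sketch of that debugging flow (the database and instance names are placeholders, not from the original question):

```shell
# Enable verbose srvctl/CRS tracing for this shell session
export SRVM_TRACE=true

# Retry the failing operation; the output now includes a detailed
# error stack that pinpoints why PRKP-1001 / CRS-0215 was raised
srvctl start instance -d RACDB -i RACDB1

# Turn tracing off again once you have the stack
unset SRVM_TRACE
```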
The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware
as part of the nodeapps; one ons daemon is started per clustered node.
The Oracle Notification Service daemon receives a subset of published clusterware events
via the local evmd and racgimon clusterware daemons and forwards those events to
application subscribers and to the local listeners.
b. The 10gR2 Load Balancing Advisory, the feature that permits load balancing across
different RAC nodes depending on the load on each node. The RDBMS MMON process
creates an advisory for the distribution of work every 30 seconds and forwards it via
racgimon and ONS to listeners and applications.
Users can access a RAC database using a client/server configuration or through one or more
middle tiers, with or without connection pooling. Users can use the Oracle Services feature
to connect to the database.
Applications should use the Services feature to connect to the Oracle database. Services
enable us to define rules and characteristics to control how users and applications connect
to database instances.
1) Oracle recommends that you back up your voting disk after the initial cluster creation
and after we complete any node addition or deletion procedures.
2) First, as root user, stop Oracle Clusterware (with the crsctl stop crs command) on all
nodes. Then, determine the current voting disk by issuing the following command:
crsctl query css votedisk
dd if=voting_disk_name of=backup_file_name
where,
backup_file_name is the name of the file to which we want to back up the voting disk
contents
Oracle recommends using the dd command to back up the voting disk with a minimum
block size of 4KB.
To restore the backup of your voting disk, use the dd command on Linux and UNIX systems
or the ocopy command on Windows systems.
On Linux or UNIX systems:
dd if=backup_file_name of=voting_disk_name
where,
If we have multiple voting disks, then we can remove the voting disks and add them back
into our environment using the following commands, where path is the complete path of the
location where the voting disk resides:
Before making any modification to the voting disk, as root user, stop Oracle Clusterware
using the crsctl stop crs command on all nodes.
Q How do we add voting disk?
To add a voting disk, issue the following command as the root user, replacing the path
variable with the fully qualified path name for the voting disk we want to add:
To move a voting disk, issue the following commands as the root user, replacing the path
variable with the fully qualified path name for the voting disk we want to move:
To remove a voting disk, issue the following command as the root user, replacing the path
variable with the fully qualified path name for the voting disk we want to remove:
After modifying the voting disk, restart Oracle Clusterware using the crsctl start crs
command on all nodes, and verify the voting disk location using the following command:
If our cluster is down, then we can include the -force option to modify the voting disk
configuration, without interacting with active Oracle Clusterware daemons. However, using
the -force option while any cluster node is active may corrupt our configuration
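The add/move/remove/verify commands referenced above can be sketched as follows (the paths are placeholders; this applies to releases where the voting disk lives on raw or cluster file system storage):

```shell
# Stop Oracle Clusterware on all nodes first (run as root on each node)
crsctl stop crs

# Add a voting disk
crsctl add css votedisk /u02/ocfs2/vote/VDFile_1

# "Move" a voting disk = add the new location, then remove the old one
crsctl add css votedisk /u03/ocfs2/vote/VDFile_1
crsctl delete css votedisk /u02/ocfs2/vote/VDFile_1

# If the cluster is down, append -force (never while any node is active,
# as that may corrupt the configuration)
crsctl add css votedisk /u03/ocfs2/vote/VDFile_1 -force

# Restart Clusterware and verify the voting disk locations
crsctl start crs
crsctl query css votedisk
```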
Q What are the special background processes for RAC? (Or: how do the background
processes of a stand-alone database and a RAC database differ?)
Ans:
http://satya-racdba.blogspot.com/2010/07/new-features-in-9i-10g-11g-rac.html
SCAN,
By using srvctl, we can manage diskgroups, home, ons, eons, filesystem, srvpool, server,
scan, scan_listener, gns, vip, oc4j, GSD.
7. What is cache fusion?
Ans:
Transferring of data between RAC instances by using private network. Cache Fusion is the
remote memory mapping of Oracle buffers, shared between the caches of participating
nodes in the cluster. When a block of data is read from datafile by an instance within the
cluster and another instance is in need of the same block, it is easy to get the block image
from the instance which has the block in its SGA rather than reading from the disk.
Ans:
Clusterware uses the private interconnect for cluster synchronization (network heartbeat)
and daemon communication between the clustered nodes. This communication is based on
the TCP protocol. RAC uses the interconnect for cache fusion (UDP) and inter-process
communication (TCP).
Ans:
Voting Disk - Oracle RAC uses the voting disk to manage cluster membership by way of a
health check and arbitrates cluster ownership among the instances in case of network
failures. The voting disk must reside on shared disk.
Virtual IP (VIP) - When a node fails, the VIP associated with it is automatically failed over to
some other node and new node re-arps the world indicating a new MAC address for the IP.
Subsequent packets sent to the VIP go to the new node, which will send error RST packets
back to the clients. This results in the clients getting errors immediately.
Ans:
RAC configuration information repository that manages information about the cluster node
list and instance-to-node mapping information. The OCR also manages information about
Oracle Clusterware resource profiles for customized applications. Maintains cluster
configuration information as well as configuration information about any cluster database
within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes
in your cluster. The daemon OCSSd manages the configuration info in OCR and maintains
the changes to cluster in the registry.
11. What is Voting file/disk and how many files should be there?
Ans:
Voting Disk File is a file on the shared cluster system or a shared raw device file. Oracle
Clusterware uses the voting disk to determine which instances are members of a cluster.
Voting disk is akin to the quorum disk, which helps to avoid the split-brain syndrome. Oracle
RAC uses the voting disk to manage cluster membership by way of a health check and
arbitrates cluster ownership among the instances in case of network failures. The voting
disk must reside on shared disk.
Ans:
#ocrconfig -manualbackup
Ans:
/etc/oracle/local.ocr
/var/opt/oracle/local.ocr
Ans:
#ocrconfig -showbackup
Ans:
dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
Ans:
Ans:
# ocrcheck
19. If voting disk/OCR file got corrupted and don’t have backups, how to get them?
Ans:
Ans:
Cache Fusion is the mapping of remote memory (Oracle buffers) shared between the caches
of the participating nodes in the cluster. When a block of data has been read from a data
file by one instance in the cluster and another instance needs the same block, it is faster to
obtain the block image from the instance that holds it in its SGA than to read it from disk.
Oracle Cluster Registry (OCR): It contains all information about instances, services, state
information, cluster configuration, nodes and ASM storage if used. The OCR must reside on
a shared disk that is accessible by all the nodes in your cluster. The OCSSd daemon
manages the configuration information in the OCR and maintains the changes to the cluster
in the registry.
Voting Disk: It helps verify whether a node has failed, that is, become separated from the
majority; such a node is forcibly rebooted and, after rebooting, added back to the surviving
nodes of the cluster. Oracle RAC uses it to maintain cluster membership.
Oracle Local Registry (OLR) contains the information that allows the cluster processes to
start when the OCR is stored in ASM. Because the ASM files are unavailable until the grid
processes are started, a local copy of the OCR data is required, and that copy is stored in
the OLR.
4.What is FAN?
FAN stands for Fast Application Notification. It is a notification mechanism, tied to events
concerning services, nodes and instances, that Oracle RAC uses to inform other processes
about service-level and configuration changes, including service status changes such as UP
or DOWN events. Using FAN events, applications can respond and take immediate action.
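One way to consume FAN events on the server side is a callout: an executable script placed in the Grid home's racg/usrco directory, which Clusterware invokes with the event details as arguments. A minimal sketch (the log path is a placeholder of my choosing):

```shell
#!/bin/sh
# Save as $GRID_HOME/racg/usrco/fan_logger.sh and make it executable;
# Clusterware runs every executable in racg/usrco for each FAN event,
# passing the event payload (e.g. service/instance UP or DOWN) as arguments.
echo "$(date '+%Y-%m-%d %H:%M:%S') FAN event: $*" >> /tmp/fan_events.log
```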
5.What is SCAN?
SCAN (Single Client Access Name) is a feature new in Oracle RAC 11g Release 2 that
provides one name for clients to access an Oracle Database cluster. The benefit is that
client configuration does not need to change when you add or remove nodes in the cluster.
When an instance of a single-node database crashes, crash recovery takes place during
startup. In a RAC environment the same recovery is performed by the surviving nodes; this
is called instance recovery.
The hangcheck timer checks the health of the system regularly. If the system hangs or
stops, the node is restarted automatically.
Hangcheck margin: how much delay is tolerated before the hangcheck timer resets the RAC
node.
Hangcheck tick: the time period between system health checks. The default is 60 seconds,
but Oracle recommends 30 seconds.
When nodes of the database in a cluster can’t communicate with each other, they modify
the data blocks and may continue to process independently. If more than one instance
modify the same block, locking or synchronization of the blocks of the data does not occur
and it may happen that the blocks get overwritten by others in the cluster. This process is
called split brain.
9.What is GRD?
GRD is the Global Resource Directory. The GES and GCS use the GRD to maintain records of
the status of each data file and each cached block. This bookkeeping is what makes cache
fusion possible while preserving data integrity.
It is a repository of RAC configuration information that maintains the cluster node list and
instance-to-node mapping. It also maintains information about Oracle Clusterware resource
profiles for customized applications, and manages configuration information about any
cluster database within the cluster. The OCR must reside on a shared disk that is accessible
by all of the cluster nodes. The OCSSd daemon maintains the configuration information in
the OCR and manages the changes to the cluster within the registry.
It is a private network, which connects all the servers in a cluster. It uses the multiple
switches which are accessed by only the nodes in the cluster.
It is a portion of a physical disk that is accessed at the lowest possible level. A raw partition
is created when an extended partition is created and logical partitions are assigned to it
without any formatting. Once formatting is complete, it is called a cooked partition.
When a node fails, the VIP address of that node fails over to another node, where it can
accept TCP connections but not Oracle connections, so clients receive an error immediately
instead of waiting for a TCP timeout.
Either the physical or the logical OCR backup copy can be used to restore the repository.
The Clusterware is installed on each node (on an Oracle Home) and on the shared disks (the
voting disks and the CSR file)
The base software is installed on each node of the cluster, and the database files reside on
the shared storage.
3. What kind of storage we can use for the shared Clusterware files?
- OCFS (Release 1 or 2)
- raw devices
4. What kind of storage we can use for the RAC database storage?
- OCFS (Release 1 or 2)
- ASM
- raw devices
5. What is a CFS?
A cluster File System (CFS) is a file system that may be accessed (read and write) by all
members in a cluster at the same time. This implies that all members of a cluster have the
same view.
6. What is an OCFS2?
The OCFS2 is the Oracle (version 2) Cluster File System which can be used for the Oracle
Real Application Cluster.
A raw device is a disk drive that does not yet have a file system set up. Raw devices are
used for Real Application Clusters since they enable the sharing of disks.
A raw partition is a portion of a physical disk that is accessed at the lowest possible level. A
raw partition is created when an extended partition is created and logical partitions are
assigned to it without any formatting. Once formatting is complete, it is called cooked
partition.
A CFS offers:
- Simpler management
- With Oracle_Home on CFS, when you apply Oracle patches CFS guarantees that the
updated Oracle_Home is visible to all nodes in the cluster.
Note: This option is very dependent on the availability of a CFS on your platform.
- When performance is critical: raw devices offer the best performance, with no
intermediate layer between Oracle and the disk.
Note: Autoextend fails on raw devices if the space is exhausted. However the space could
be added online if needed.
Oracle RAC 10g Release 1 introduced Oracle Cluster Ready Services (CRS), a platform-
independent set of system services for cluster environments. In Release 2, Oracle has
renamed this product to Oracle Clusterware.
The VIP returns a dead connection IMMEDIATELY when its node fails. Without the VIP,
clients have to wait around 10 minutes before receiving ORA-3113: "end of file on
communications channel". However, using Transparent Application Failover (TAF) can
avoid ORA-3113.
16. Why we need to have configured SSH or RSH on the RAC nodes?
SSH (Secure Shell, 10g+) or RSH (Remote Shell, 9i+) allows the "oracle" UNIX account to
connect to another RAC node and copy files or run commands as the local "oracle" UNIX
account.
No. SSH or RSH is needed only for RAC installation, patch set installation, and clustered
database creation.
Each node of a cluster that is being used for a clustered database will typically have the
RDBMS and RAC software loaded on it, but not actual data files (these need to be available
via shared disk).
19. What are the restrictions on the SID with a RAC database? Is it limited to 5 characters?
The SID prefix in 10g Release 1 and prior versions was restricted to five characters by
install/ config tools so that an ORACLE_SID of up to max of 5+3=8 characters can be
supported in a RAC environment. The SID prefix is relaxed up to 8 characters in 10g
Release 2, see bug 4024251 for more information.
The Real Application Clusters do not support heterogeneous platforms in the same cluster.
21. Are there any issues for the interconnect when sharing the same switch as the public
network by using VLAN to separate the network?
RAC and Clusterware deployment best practices suggest that the interconnect (private
connection) be deployed on a stand-alone, physically separate, dedicated switch. On a
large shared network the connection can be unstable.
With 10g Release 2, we support 100 nodes in a cluster using Oracle Clusterware, and 100
instances in a RAC database. Currently DBCA has a bug where it will not go beyond 63
instances. There is also a documentation bug for the max-instances parameter. With 10g
Release 1 the Maximum is 63.
The Cluster Verification Utility (CVU) is a validation tool that you can use to check all the
important components that need to be verified at different stages of deployment in a RAC
environment.
25. What versions of the database can I use the cluster verification utility (cluvfy) with?
The cluster verification utility is release with Oracle Database 10g Release 2 but can also be
used with Oracle Database 10g Release 1.
26. If I am using Vendor Clusterware such as Veritas, IBM, Sun or HP, do I still need Oracle
Clusterware to run Oracle RAC 10g?
Yes. When certified, you can use Vendor Clusterware however you must still install and use
Oracle Clusterware for RAC. Best Practice is to leave Oracle Clusterware to manage RAC. For
details see Metalink Note 332257.1 and for Veritas SFRAC see 397460.1.
Yes.
The hangcheck timer regularly checks the health of the system. If the system hangs or
stops, the node is restarted automatically.
-> hangcheck-tick: this parameter defines the period of time between checks of system
health. The default value is 60 seconds; Oracle recommends setting it to 30 seconds.
-> hangcheck-margin: this defines the maximum hang delay that should be tolerated before
hangcheck-timer resets the RAC node.
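On Linux these parameters are set when loading the kernel module. A hedged example (the margin value shown is a commonly cited recommendation, not the only valid choice):

```shell
# Load the hangcheck-timer kernel module with explicit parameters
modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180

# Persist the settings so they survive a reboot (path valid on
# modprobe.conf-era Linux distributions)
echo 'options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180' \
  >> /etc/modprobe.conf
```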
29. Is the hangcheck timer still needed with Oracle RAC 10g?
Yes.
30. What files can I put on Linux OCFS2?
For optimal performance, you should only put the following files on Linux OCFS2:
- Datafiles
- Control Files
- Redo Logs
- Archive Logs
- Voting File
- SPFILE
31. Is it possible to use ASM for the OCR and voting disk?
No, the OCR and voting disk must be on raw or CFS (cluster file system).
32. Can I change the name of my cluster after I have created it when I am using Oracle
Clusterware?
No, you must properly uninstall Oracle Clusterware and then re-install.
The O2CB is the OCFS2 cluster stack. OCFS2 includes some services. These services must
be started before using OCFS2 (mount/ format the file systems).
The voting disk is nothing but a file that contains and manages information of all the node
memberships.
37. What command would you use to check the availability of the RAC system?
38. What is the minimum number of instances you need to have in order to create a RAC?
Yes, but the Clusterware version must be equal to or greater than the highest database version.
41. What was RAC's previous name before it was called RAC?
OPS: Oracle Parallel Server
43. What is the difference between normal views and RAC views?
A RAC view has the prefix 'G'. For example, GV$SESSION instead of V$SESSION.
44. Which command will we use to manage (stop, start) RAC services in command-line
mode?
srvctl
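A few typical srvctl invocations, with placeholder database, instance, and node names:

```shell
# Start/stop the whole clustered database
srvctl start database -d RACDB
srvctl stop database -d RACDB

# Manage a single instance
srvctl stop instance -d RACDB -i RACDB2

# Node applications (VIP, ONS, GSD, listener) on one node
srvctl status nodeapps -n node1
```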
What is the main purpose of Oracle Real Application Clusters (RAC)?
Oracle Real Application Clusters (RAC) allows a single Oracle database to be accessed by
multiple instances running on a pool of servers.
It allows any packaged or custom-built application to run against an Oracle database on a
server pool, and it provides a very high level of availability, flexibility and scalability for
running the application and storing its data.
If a server in the pool fails, the database continues to run from the remaining servers and
the load is redistributed. It also makes it easier for administrators to maintain many servers
at the same time, through load-balancing techniques and the ability to add more servers as
the load increases.
When an instance crashes in a single-node database, crash recovery takes place on startup.
In a RAC environment the same recovery for a failed instance is performed by the surviving
nodes; this is called instance recovery.
It is a private network which is used to ship data blocks from one instance to another for
cache fusion. The physical data blocks as well as data dictionary blocks are shared across
this interconnect.
How do you determine what protocol is being used for Interconnect traffic?
One of the ways is to look at the database alert log for the time period when the database
was started up.
What methods are available to keep the time synchronized on all nodes in the cluster?
Either the Network Time Protocol(NTP) can be configured or in 11gr2, Cluster Time
Synchronization Service (CTSS) can be used.
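To see which mechanism is in effect on 11gR2, the cluster time service can be queried; a brief sketch:

```shell
# Reports whether CTSS is running in active mode (no NTP configured,
# CTSS adjusts the clocks itself) or observer mode (NTP is present,
# CTSS only monitors time offsets between nodes)
crsctl check ctss
```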
Spfiles, ControlFiles, Datafiles and Redolog files should be created on shared storage.
Where does the Clusterware write when there is a network or storage missed heartbeat?
Missed network and disk heartbeats are recorded in the voting disk.
The ocrconfig -showbackup can be run to find out the automatic and manually run backups.
You can use either the logical or the physical OCR backup copy to restore the Repository.
How do you find out what object has its blocks being shipped across the instance the most?
The VIP is an alternate Virtual IP address assigned to each node in a cluster. During a node
failure the VIP of the failed node moves to the surviving node and relays to the application
that the node has gone down. Without VIP, the application will wait for TCP timeout and
then find out that the session is no longer live due to the failure.
How do we know which database instances are part of a RAC cluster?
You can query the V$ACTIVE_INSTANCES view to determine the member instances of the
RAC cluster.
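For example (run from any instance; the output depends on your cluster):

```shell
# List the member instances of the RAC cluster from SQL*Plus
sqlplus -s / as sysdba <<'EOF'
SELECT inst_number, inst_name FROM v$active_instances;
EOF
```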
The Cluster Health Monitor (CHM) stores operating system metrics in the CHM repository for
all nodes in a RAC cluster. It stores information on CPU, memory, process, network and
other OS data. This information can later be retrieved and used to troubleshoot and identify
any cluster-related issues. It is a default component of the 11gR2 grid install. The data is
stored in the master repository and replicated to a standby repository on a different node.
What would be the possible performance impact in a cluster if a less powerful node (e.g.
slower CPU’s) is added to the cluster?
All processing will slow down to the speed of the slowest server's CPUs.
The Oracle Local Registry contains information that allows the cluster processes to be
started up when the OCR is in the ASM storage system. Since the ASM file system is
unavailable until the Grid processes are started, a local copy of the contents of the OCR is
required; it is stored in the OLR.
In 10g the default ASM SGA size is 1G; in 11g it is set to 256M; and in 12c ASM it is set
back to 1G.
– Datafiles
– Redo logfiles
– Spfiles
In 12c the files below can now also be stored in an ASM diskgroup:
– Password file
This is the parameter that controls the number of allocation units the ASM instance will try
to rebalance at any given time. In ASM versions below 11.2.0.3 the default value is 11;
it has been changed to effectively unlimited in later versions.
A patch is considered rolling if it can be applied to the cluster binaries without shutting
down the whole database in a RAC environment. All nodes in the cluster are patched in a
rolling manner, one by one, with only the node being patched unavailable while all other
instances remain open.
– CLUSTER_DATABASE
– CLUSTER_DATABASE_INSTANCES
– ACTIVE_INSTANCE_COUNT
– UNDO_MANAGEMENT
The Grid software is becoming more and more capable of not just supporting HA for Oracle
Databases but also other applications including Oracle’s applications. With 12c there are
more features and functionality built-in and it is easier to deploy these pre-built solutions,
available for common Oracle applications.
Is there an easy way to verify the inventory for all remote nodes?
You can run the opatch lsinventory -all_nodes command from a single node to look at the
inventory details for all nodes in the cluster.
When database nodes in a cluster are unable to communicate with each other, they may
continue to process and modify the data blocks independently. If the
same block is modified by more than one instance, synchronization/locking of the data
blocks does not take place and blocks may be overwritten by others in the cluster. This
state is called split brain.
Load Balancing Advisory is a process through which the load of applications and resources
can be managed across the servers.
It monitors the workload of the current activity across all clusters and instances on the
server, and the service remains active at all times to track the workload of applications on
the servers.
In short, it provides a percentage value showing the total workload of each instance and
flags each instance according to service quality.
Load Balancing Advisory thus helps balance the load from busy servers by distributing it
among other servers that are currently less loaded.
– Cluster Verification Utility is a tool in the Oracle Grid stack used to catch errors during the
validation steps.
– It verifies changes made to the configuration of files or of the system.
– The tool is driven from the command line and also validates configuration input, so
problems can be found during installation.
– The tool verifies the system prerequisites related to Oracle Clusterware, ASM and the
databases.
– Fix-up scripts are available; if the verification tool reports failures, these scripts can be
used to fix the errors automatically.
What are the components required to manage Oracle Real Application Clusters Database?
Oracle RAC uses a single-system image to configure and manage the servers in an easy
way. It provides one location from which the installed and configured applications and the
database can be managed.
– Oracle Universal Installer (OUI) is used to manage the database that is related to the
cluster and provide enterprise level configuration.
– Database configuration assistant (DBCA) that manages the database and its related
functionality and services.
– Database upgrade assistant (DBUA) is the tool that allows the database to be upgraded
when it is required on the server.
Well, there is not much difference between 10g and 11gR (1) RAC.
But there is a significant difference in 11gR2.
In 10g/11gR1, Clusterware manages:
- Databases
- Instances
- Applications
- Node Monitoring
- Event Services
- High Availability
In 11gR2, Grid Infrastructure manages:
- Databases
- Instances
- Applications
- Cluster Management
- Node Management
- Event Services
- High Availability
- Network Management (provides DNS/GNS/MDNSD services on behalf of other traditional
services), SCAN (Single Client Access Name) and HAIP
- Storage Management (with the help of ASM and the new ACFS filesystem)
- Time synchronization (rather than depending on traditional NTP)
- The OS-dependent hang checker is removed; the stack manages this with its own
additional monitor process
Clusterware software
4. What are Oracle Kernel Components? (How does an Oracle RAC database differ from a
normal single-instance database in terms of binaries and processes?)
Basically the Oracle kernel needs to be relinked with the RAC ON option when you convert
to RAC; that is the difference, as it enables the RAC background processes such as LMON,
LCK, LMD and LMS.
To turn on RAC
# link the oracle libraries
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on
# rebuild oracle
$ cd $ORACLE_HOME/bin
$ relink oracle
Oracle RAC is composed of two or more database instances. They are composed of memory
structures and background processes, the same as a single-instance database. Oracle RAC
instances use two services, GES (Global Enqueue Service) and GCS (Global Cache Service),
that enable cache fusion, and they include additional RAC-specific background processes
such as LMON, LMS, LMD and LCK.
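A quick way to see these RAC-specific background processes on a node (the instance name RACDB1 is a placeholder):

```shell
# RAC background processes (LMON, LMS, LMD, LCK, ...) appear as
# ora_<name>_<SID> operating-system processes
ps -ef | egrep 'ora_(lmon|lms|lmd|lck|diag)[0-9]*_RACDB1' | grep -v grep
```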
5. What is Clusterware?
Software that provides various interfaces and services for a cluster. Typically, this includes
capabilities that:
6. What are the background process that exists in 11gr2 and functionality?
Process name and functionality:
- Multicast domain name service (mDNS), process mdnsd: allows DNS requests. The mdnsd
process is a background process on Linux and UNIX, and a service on Windows.
- Oracle Grid Naming Service (GNS), process gnsd: a gateway between the cluster mDNS
and external DNS servers. The GNS process performs name resolution within the cluster.
- Cluster Time Synchronization Service (CTSS), process octssd: runs as root.
8. What is startup sequence in Oracle 11g RAC? 11g RAC startup sequence?
9. As you said Voting & OCR Disk resides in ASM Diskgroups, but as per startup
sequence OCSSD starts first before than ASM, how is it possible?
How does OCSSD starts if voting disk & OCR resides in ASM Diskgroups?
You might wonder how CSSD, which is required to start the clustered ASM instance, can be
started if voting disks are stored in ASM? This sounds like a chicken-and-egg problem:
without access to the voting disks there is no CSS, hence the node cannot join the cluster.
But without being part of the cluster, CSSD cannot start the ASM instance. To solve this
problem the ASM disk headers have new metadata in 11.2: you can use kfed to read the
header of an ASM disk containing a voting disk. The kfdhdb.vfstart and kfdhdb.vfend fields
tell CSS where to find the voting file. This does not require the ASM instance to be up. Once
the voting disks are located, CSS can access them and join the cluster.
Source: Pro Oracle Database 11g RAC on Linux- Martin Bach ... - Amazon.com
Grid Naming Service (GNS) is an alternative to DNS; it acts as a subdomain in your DNS but
is managed by Oracle, and with GNS the connection is routed to the cluster IP and managed
internally.
13. What are the file types that ASM support and keep in disk groups?
- Change tracking files
- Temporary data files
- RMAN backup sets
- Bitmaps (disk group metadata)
In 11gr2 the listeners will run from Grid Infrastructure software home
The node listener is a process that helps establish network connections from ASM
clients to the ASM instance.
Runs by default from the Grid $ORACLE_HOME/bin directory
Listens on port 1521 by default
Is the same as a database instance listener
Is capable of listening for all database instances on the same machine in addition to
the ASM instance
Can run concurrently with separate database listeners or be replaced by a separate
database listener
Is named tnslsnr on the Linux platform
A SCAN listener is additional to the node listeners. It listens for incoming database
connection requests that arrive through the SCAN IP, and it has endpoints configured to the
node listeners, to which it routes each database connection request for the chosen node.
cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
Disk group type: Normal redundancy
- Supported mirroring levels: two-way, three-way, unprotected (none)
- Default mirroring level: two-way
ASM can use variable size data extents to support larger files, reduce memory
requirements, and improve performance.
26. How many ASM Diskgroups can be created under one ASM Instance?
Up to 63 disk groups.
$ oifcfg iflist -p -n
To determine the public and private interfaces that have been configured:
$ oifcfg getif
To determine the Virtual IP (VIP) host name, VIP address, VIP subnet mask, and VIP
interface name:
VIP exists.:host01
...
Assign the network address to the new network adapters on all nodes using ifconfig.
On a single node in the cluster, add the new global interface specification:
# oifcfg setif -global <interface_name>/<subnet>:<adapter_type>
32. Can I stop all nodes in one command, meaning stopping the whole cluster?
33. What is OLR? Which of the following statements regarding the Oracle Local
Registry (OLR) is true?
2. The OLR should be manually created after installing Grid Infrastructure on each node in
the cluster.
3. One of its functions is to facilitate Clusterware startup in situations where the ASM stores
the OCR and voting disks.
Statement 3 is true: the OLR is created automatically during Grid Infrastructure installation,
and it lets each node start Clusterware before the OCR stored in ASM becomes accessible.
crsctl stop cluster (possible only from 11gR2 onwards) stops the Clusterware stack. Note
that the command acts on the local node by default; use -all to run it cluster-wide or -n to
target a specific node, for example:
crsctl stop cluster -all (stops the cluster stack on all nodes)
crsctl stop cluster -n <nodename> (stops it only on the specified node)
crsctl stop crs stops the entire stack, including OHASD, but only on the local node.
36. CRS is not starting automatically after a node reboot; what do you do to make it
happen?
crsctl enable crs (to enable automatic startup)
crsctl disable crs (to disable it)
41. What is the difference between TAF and FAN & FCF? Under what conditions do you
use them?
ONS is part of the Clusterware and is used to propagate messages both between nodes and
to application tiers.
ONS is the foundation for FAN, upon which FCF is built.
RAC uses FAN to publish configuration changes and LBA events. Applications can react to
those published events in two ways:
- by using the ONS API (you need to program it)
- by using FCF (automatic when using the JDBC implicit connection cache on the application
server)
You can also respond to FAN events by using server-side callouts, but these, as the name
suggests, run on the server side.
42. Can you add a voting disk online? Do you need a voting disk backup?
Yes, as per the documentation, if you have multiple voting disks you can add one online;
but if you have only one voting disk, the cluster will be down once it is lost, and you need
to start CRS in exclusive mode (crsctl start crs -excl) and add the voting disk using
crsctl add css votedisk <path of voting disk>
43. You have lost OCR disk, what is your next step?
The cluster stack will be down due to the fact that cssd is unable to maintain the integrity,
this is true in 10g, From 11gR2 onwards, the crsd stack will be down, the hasd still up and
running. You can add the ocr back by restoring the automatic backup or import the manual
backup,
44. What happens when ocssd fails? What is node eviction? How does node eviction
happen? (The answer is the same for all three.)
47. Can you modify VIP address after your cluster installation?
48. How do you interpret an AWR report for RAC instances? Which sections of the AWR
report are most important for RAC?
a. Case 1: Migrating a disk group from one storage to another with the same name
1. Consider the disk group DATA.
2. Create new disks for DATA pointing to the new storage (EMC):
a) Partitioning/provisioning is done by the storage team, who give you the device
name or mapper path such as /dev/mapper/asakljdlas
3. Add the new disk to disk group DATA:
a) alter diskgroup data add disk '/dev/mapper/asakljdlas';
4. Drop the old disks from DATA, after which rebalancing is done automatically:
alter diskgroup data drop disk 'path to hitachi storage';
If you want, you can speed up the rebalance with alter system set asm_power_limit=12
for full throttle.
Note: you can get the device name from the PATH column of v$asm_disk.
5. Request the SAN team to detach the old storage (HITACHI).
b. Case 2: Migrating a disk group from one storage to another with a different diskgroup name
1) Create the disk group with the new name on the new storage.
2) Create the spfile in the new diskgroup and change the relevant parameters (control
files etc.) with scope=spfile.
3) Take a control file backup with format '+newdiskgroup'.
4) Shut down the database, then startup nomount.
5) Restore the control file from the backup (the control file is now restored to the new
diskgroup) and mount the database.
6) Take an RMAN image-copy backup of the whole database with the new format:
RMAN> backup database as copy format '+newdiskgroup';
7) RMAN> switch database to copy;
8) Verify in dba_data_files, dba_temp_files and v$log that all files point to the new
diskgroup name.
c. Case 3: Migrating disk group to new storage but no additional diskgroup given
1) Take the RMAN backup as copy of all the databases with new format and place it in
the disk.
2) Prepare rename commands from v$log ,v$datafile etc (dynamic queries)
3) Take a backup of the pfile and modify the following to refer to the new diskgroup name:
control_files
db_create_file_dest
db_create_online_log_dest_1
db_create_online_log_dest_2
db_recovery_file_dest
4) stop the database
5) Unmount the diskgroup
asmcmd umount ORA_DATA
6) use asmcmd renamedg (11gr2 only) command to rename to new
diskgroup
renamedg phase=both dgname=ORA_DATA newdgname=NEW_DATA
verbose=true
7) mount the diskgroup
asmcmd mount NEW_DATA
8) Start the database in mount state with the new pfile backed up in step 3.
9) Run the rename-file scripts generated in step 2.
10) Add the diskgroup to the cluster (if using RAC):
srvctl modify database -d orcl -p +NEW_FRA/orcl/spfileorcl.ora
srvctl modify database -d orcl -a "NEW_DATA"
srvctl config database -d orcl
srvctl start database -d orcl
11) Delete the old diskgroup from the cluster:
crsctl delete resource ora.ORA_DATA.dg
12) Open the database.
a. Take the outputs of all the services that are running on the databases.
b. set cluster_database=FALSE
c. Drop all the services associated with the database.
d. Stop the database
e. Startup mount
f. Use nid to change the DB Name.
Generic question: if using ASM, the usual location for the datafile would
be +DATA/datafile/OLDDBNAME/system01.dbf.
Does NID change this path too, to reflect the new DB name?
Yes it will: with the proper directory structure it creates links/aliases to the
original directory structure, e.g. +DATA/datafile/NEWDBNAME/system01.dbf.
This has to be tested; we don't have a test bed, but thanks to Anji, who
confirmed it will.
8. How do you find which database a particular service is attached to when you have a
large number of databases running on the server and cannot check them one by one
manually?
Write a shell script that reads the database names from oratab and iterates over them,
passing each DB name to srvctl to get the result.
#!/bin/ksh
ORACLE_HOME=<crs_home>
PATH=$ORACLE_HOME/bin:$PATH
LD_LIBRARY_PATH=${SAVE_LLP}:${ORACLE_HOME}/lib
# loop over the database names recorded in oratab
for INSTANCE in $(grep -v '^#' /etc/oratab | cut -d: -f1)
do
export ORACLE_SID=$INSTANCE
srvctl status service -d $ORACLE_SID
done
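A runnable sketch of the same loop, driven by a mock oratab file so it can be tried without a cluster (the file path and entries are illustrative; on a real system the echo would be replaced by the srvctl call):

```shell
# Build a mock oratab (illustrative entries)
cat > /tmp/oratab.demo <<'EOF'
# sid:oracle_home:autostart
ORCL:/u01/app/oracle/product/19.0.0/dbhome_1:Y
TESTDB:/u01/app/oracle/product/19.0.0/dbhome_1:N
EOF
# Iterate over DB names exactly as the ksh script does; on a real
# cluster each iteration would run: srvctl status service -d $DB
for DB in $(grep -v '^#' /tmp/oratab.demo | cut -d: -f1)
do
  echo "checking services of $DB"
done
```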
OHAS is the complete cluster stack, which includes some kernel-level tasks such as
managing the network, time synchronization, disks, etc., whereas CRS has the ability to
manage resources such as databases, listeners, applications, etc. With both of these,
Oracle provides general high-availability clustering services rather than an affinity only
to databases.
ocrcheck
oifcfg getif
cd $GRID_home/CRSCONFIG/INSTALL
On the other node:
a) RPMs install
b) Kernel parameters
c) File limits
d) NTP setup
4. On the new node:
Remove unwanted log directories in the grid home you just copied.
5. Run clone.pl on the new node:
cd $GRID_HOME/oui/bin
perl clone.pl ORACLE_HOME=/u01/oracle/12.1.0/grid
6. Run addnode.sh on an existing node (remember this needs to be run on an existing
node, not on the new node).
7. Copy the gpnp profile and CRS parameters to the new node from an existing node:
scp $GRID_HOME/crsconfig/install/crs_configparams
newnode:$GRID_HOME/gpnp/peer/profile
8. Run root.sh on the new node:
./root.sh
Oracle Clusterware
Cluster Interconnects
5) Mention what are the file storage options provided by Oracle Database for Oracle RAC?
The file storage options provided by Oracle Database for Oracle RAC are:
Automatic Storage Management (ASM)
A certified cluster file system (such as OCFS2)
Certified NFS
Raw devices
6) Mention what volume management technique is used in Oracle RAC?
Oracle RAC provides a dynamic volume manager together with a cluster file system that
holds the cluster-wide file system information.
The cluster file system in Oracle is known as OCFS. It connects to the databases and
provides raw-device access and command-line features.
The new feature added in Oracle ASM 12c is Oracle Flex ASM. It is a new ASM deployment
model which increases database instance availability and reduces Oracle ASM related
resource consumption.
When an Oracle Flex ASM instance fails on a particular node, the Oracle Flex ASM instance
is failed over to another node in the cluster.
9) Mention what are the key characteristics of RAC or why to use RAC?
Reliability: Eliminates the database server from a single point of failure. If an instance fails,
the remaining instances in the cluster remain active and open.
Cache Fusion is used to share cached block information across the clustered network in an
Oracle database. It involves two nodes: one writes a data block to the shared disk while the
other reads the same block, and instead of re-reading it from disk the block image is
shipped between the instances. RAC uses a dedicated private network for this traffic, and
Cache Fusion is an internal part of the cluster.
11) Mention what is the difference between single instance environment and RAC
environment?
Online redo logfile only one instance can write, but other instances can read during recovery
and archiving.
Alert log and trace files are private to each instance; other instances never write to or
read from those files.
In Oracle RAC, all the instances/servers communicate with each other using a private
network. When the instance members in a RAC fail to ping/connect to each other via this
private network but continue to process data blocks independently, the situation is
referred to as split-brain syndrome.
13) What happens if you keep split brain syndrome in RAC unresolved? How it can be
resolved?
If you keep split-brain syndrome unresolved, there will be data integrity issues: the
blocks changed in one instance will not be locked and could be overwritten by another
instance. It is resolved by using the voting disk, which decides which node(s) will survive
and which node(s) will be evicted.
14) Mention how can you determine what protocol is being used for Interconnect traffic?
To determine what protocol is being used for Interconnect traffic you can look at the
database alert log for the time period when the database was started up.
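For example, the interconnect protocol shows up in a startup line like the one below; the sketch greps a mocked alert-log fragment (the exact wording varies by version, so treat the sample line as an assumption):

```shell
# Mock alert-log fragment; on a real system you would grep the actual
# alert_<SID>.log written at instance startup
cat > /tmp/alert_demo.log <<'EOF'
cluster interconnect IPC version: Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
EOF
# The matching line names the protocol (UDP here)
grep -i 'cluster interconnect' /tmp/alert_demo.log
```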
In RAC ControlFiles, Spfiles, Redolog files, and Datafiles should be created on shared
storage.
16) Mention where does the Clusterware write when there is a network or storage issue?
When there is a network or storage issue the network ping failure is written in
$CRS_HOME/log
17) Mention what are the tools provided in Oracle Enterprise Manager?
Grid Control-
It is used to deliver the centralized management system and provides configuration and
administration capabilities.
It provides the cost reduction plans and provides higher efficiency
Database Control-
It is related to the Oracle Clusterware. It is used to maintain the services of the Oracle RAC.
It also manages the server pools that are being created with the Oracle Clusterware and
provision to manage it from a single place.
18) Mention what is the difference between Instance recovery and Crash recovery?
A crash recovery takes place when an instance crashes in a single node database on
startup. When the same recovery for an instance is performed in RAC environment by the
surviving nodes then it is called Instance recovery.
If your OCR is corrupted, you can use either the logical or the physical OCR backup copy
to restore the repository.
OLR stands for Oracle Local Registry. It consists of information that enables the cluster
programs to start up while the OCR is in ASM storage. Until the grid processes are started,
the ASM files are unavailable; in that case a local copy of the OCR data is required, and
that is what is stored in the OLR.
What is RAC?
RAC stands for Real Application cluster. It is a clustering solution from Oracle Corporation
that ensures high availability of databases by providing instance failover, media failover
features.
10g Release 2 supports 100 nodes in a cluster using Oracle Clusterware, and 100 instances
in a RAC database.
What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g
Release 2 feature that provides a single name for clients to access an Oracle Database
running in a cluster. The benefit is that clients using SCAN do not need to change if you
add or remove nodes in the cluster.
What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintain records of the status
of each datafile and each cached block using global resource directory. This process is
referred to as cache fusion and helps in data integrity.
Oracle RAC is composed of two or more instances. When a block of data is read from
datafile by an instance within the cluster and another instance is in need of the same block,
it is easy to get the block image from the instance which has the block in its SGA rather
than reading from the disk. To enable inter-instance communication Oracle RAC makes use
of interconnects. The Global Enqueue Service (GES) monitors the global enqueues, and the
instance enqueue process manages the non-Cache-Fusion resource requests.
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy
a query or transaction, Oracle RAC instances use two processes, the Global Cache Service
(GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the
statuses of each data file and each cached block using a Global Resource Directory (GRD).
The GRD contents are distributed across all of the active instances.
ACMS stands for Atomic Controlfile Memory Service. In an Oracle RAC environment ACMS is
an agent that ensures a distributed SGA memory update (ie) SGA updates are globally
committed on success or globally aborted in event of a failure.
GTX0-j in Detail:-
The process provides transparent support for XA global transactions in a RAC environment.
The database auto tunes the number of these processes based on the workload of XA global
transactions.
LMON in Detail:-
This process monitors global enqueues and resources across the cluster and performs
global enqueue recovery operations. It is called the Global Enqueue Service Monitor.
LMD in Detail:-
This process is called as global enqueue service daemon. This process manages incoming
remote resource requests within each instance.
LMS in Detail:-
This process is called as Global Cache service process. This process maintains status of
datafiles and each cached block by recording information in a Global Resource Directory
(GRD). This process also controls the flow of messages to remote instances and manages
global data block access and transmits block images between the buffer caches of different
instances. This processing is a part of cache fusion feature.
LCK0 in Detail:-
This process is called as Instance enqueue process. This process manages non-cache fusion
resource requests such as library and row cache requests.
RMSn in Detail:-
This process is called as Oracle RAC management process. These processes perform
manageability tasks for Oracle RAC, including the creation of resources related to
Oracle RAC when new instances are added to the cluster.
RSMN in Detail:-
This process is called as Remote Slave Monitor. This process manages background slave
process creation and communication on remote instances. This is a background slave
process. This process performs tasks on behalf of a coordinating process running in another
instance.
What are Oracle Clusterware processes for 10g on Unix and Linux
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as
the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be
a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application
process, and so on) based on the resource's configuration information that is stored in
the OCR. This includes start, stop, monitor and failover operations. This process runs as the
root user
Event manager daemon (evmd) —A background process that publishes events that crs
creates.
Process Monitor Daemon (OPROCD) — This process monitors the cluster and provides I/O
fencing. OPROCD performs its check, stops running, and if the wake-up is beyond the
expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure
results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on
Linux platforms.
Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a
health check and arbitrates cluster ownership among the instances in case of network
failures. The voting disk must reside on shared disk.
All datafiles, controlfiles, SPFILEs and redo log files must reside on cluster-aware shared
storage.
All instances of an Oracle RAC database can access the datafiles, controlfiles, SPFILEs and
redo log files when these files are hosted on cluster-aware shared storage, which is a
group of shared disks.
An interconnect network is a private network that connects all of the servers in a cluster.
The interconnect network uses a switch/multiple switches that only the nodes in the cluster
can access.
Configure User Datagram Protocol (UDP) on Gigabit Ethernet for cluster interconnects.
On UNIX and Linux systems, UDP and RDS (Reliable Datagram Sockets) are the protocols
used by Oracle Clusterware.
Windows clusters use the TCP protocol.
No, crossover cables are not supported with Oracle Clusterware interconnects.
Cluster interconnect is used by the Cache fusion for inter instance communication.
Clusterware uses the private interconnect for cluster synchronization (network heartbeat)
and daemon communication between the clustered nodes. This communication is based
on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP).
Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches
of participating nodes in the cluster.
Users can access a RAC database using a client/server configuration or through one or more
middle tiers, with or without connection pooling. Users can use oracle services feature to
connect to database.
Applications should use the services feature to connect to the Oracle database. Services
enable us to define rules and characteristics to control how users and applications connect
to database instances.
The characteristics include a unique name, workload balancing, failover options, and high
availability.
Oracle Net Services enable the load balancing of application connections across all of the
instances in an Oracle RAC database.
A virtual IP address or VIP is an alternate IP address that the client connections use instead
of the standard public IP address. To configure VIP address, we need to reserve a spare IP
address for each node, and the IP addresses must use the same subnet as the public
network.
If a node fails, then the node's VIP address fails over to another node on which
the VIP address can accept TCP connections but it cannot accept Oracle connections.
Without using VIPs or FAN, clients connected to a node that died will often wait for
a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you
don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node
and new node re-arps the world indicating a new MAC address for the IP. Subsequent
packets sent to the VIP go to the new node, which will send error RST packets back to the
clients. This results in the clients getting errors immediately.
VIP address failover happens when the node on which the VIP address runs fails, when all
interfaces for the VIP address fail, or when all interfaces for the VIP address are
disconnected from the network.
When a VIP address failover happens, clients that attempt to connect to the VIP address
receive a rapid connection-refused error; they don't have to wait for TCP connection
timeout messages.
What are the administrative tools used for Oracle RAC environments?
Oracle RAC cluster can be administered as a single image using the below
OEM (Enterprise Manager),
SQL*PLUS,
Server control (SRVCTL),
Cluster Verification Utility (CLUVFY),
DBCA,
NETCA
Issue the following query from any one node connecting through SQL*PLUS.
$connect sys/sys as sysdba
SQL>select * from V$ACTIVE_INSTANCES;
The query gives the instance number under INST_NUMBER column, host instance name
under INST_NAME column.
What is FAN?
FAN (Fast Application Notification) is the mechanism RAC uses to notify applications of
cluster state changes. FAN UP and FAN DOWN events can be applied to instances, services
and nodes.
During times of cluster configuration changes, Oracle RAC high availability framework
publishes a FAN event immediately when a state change occurs in the cluster. So
applications can receive FAN events and react immediately. This prevents applications from
polling database and detecting a problem after such a state change.
Why should we have separate homes for ASM instance?
It is a good practice to have ASM home separate from the database home (ORACLE_HOME).
This helps in upgrading and patching ASM and the Oracle database software independent of
each other. Also, we can deinstall the Oracle database software independent of the ASM
instance.
Having ASM is the Oracle recommended storage option for RAC databases as the ASM
maximizes performance by managing the storage configuration across the disks. ASM does
this by distributing the database file across all of the available storage within our cluster
database environment.
It is a new ASM feature from Database 11g. ASM instances in Oracle database 11g
release(from 11.1) can be upgraded or patched using rolling upgrade feature. This enables
us to patch or upgrade ASM nodes in a clustered environment without affecting database
availability. During a rolling upgrade we can maintain a functional cluster while one or more
of the nodes in the cluster are running in different software versions.
No, it can be used only for Oracle database 11g releases (from 11.1).
State the initialization parameters that must have same value for every instance in
an Oracle RAC database:-
Some initialization parameters are critical at the database creation time and must have
same values. Their value must be specified in SPFILE or PFILE for every instance. The list of
parameters that must be identical on every instance are given below:
These parameters can be identical on all instances only if these parameter values are set to
zero.
What two parameters must be set at the time of starting up an ASM instance in
a RAC environment?
CLUSTER_DATABASE=TRUE and INSTANCE_TYPE=ASM.
Oracle Clusterware is made up of components like voting disk and Oracle Cluster Registry
(OCR).
What are the modes of deleting instances from Oracle Real Application cluster
Databases?
We can delete instances using silent mode or interactive mode using DBCA (Database
Configuration Assistant).
We need to stop and delete the instance in the node first in interactive or silent mode. After
that ASM can be removed using srvctl tool as follows:
srvctl stop asm -n node_name
srvctl remove asm -n node_name
We can verify if ASM has been removed by issuing the following command:
srvctl config asm -n node_name
How do we verify that an instance has been removed from OCR after deleting an
instance?
We can verify it with srvctl config database -d <db_unique_name>; the deleted instance
should no longer be listed. (We can view the current backups of the OCR using:
ocrconfig -showbackup)
We have V$ views that are instance specific. In addition we have GV$ views, called global
views, that have an INST_ID column of numeric data type. GV$ views obtain information
from the individual V$ views.
There are two types of connection load-balancing: server-side load balancing and client-side
load balancing.
What is the difference between server-side and client-side connection load
balancing?
Client-side load balancing happens at the client, which selects at random from the list of
addresses in its tnsnames.ora entry. In server-side load balancing, the listener uses the
load-balancing advisory to redirect connections to the instance providing the best service.
To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manual backup of the OCR with the
command:
# ocrconfig -manualbackup
The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware
as part of the nodeapps. There is one ONS daemon started per clustered node.
The Oracle Notification Service daemon receives a subset of published clusterware events
via the local evmd and racgimon Clusterware daemons and forwards those events to
application subscribers and to the local listeners.
Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215,
however sqlplus can start it on both nodes? How do you identify the problem?
Set the environment variable SRVM_TRACE to true and start the instance with srvctl; now
you will get a detailed error stack.
What is (use of) Virtual IP (VIP) in Oracle Real Application Clusters (RAC)?
When installing Oracle 10g/11g R1 RAC, three network interfaces (IPs) are required for each
node in the RAC cluster, they are:
Public Interface: Used for normal network communications to the node
Private Interface: Used as the cluster interconnect
Virtual (Public) Interface: Used for failover and RAC management
When installing Oracle 11g R2 RAC, one more network interface (IP) is required for
each node in the RAC cluster.
SCAN Interface (IP): Single Client Access Name (SCAN) is a new Oracle Real
Application Clusters (RAC) 11g Release 2 feature, which provides a single name for clients
to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not
need to change if you add or remove nodes in the cluster.
When a client connects to a tns-alias, it uses a TCP connection to an IP address, defined in
the tnsnames.ora file. When using RAC, we define multiple addresses in our tns-alias, to be
able to failover when an IP address, listener or instance is unavailable. TCP timeouts can
differ from platform to platform or implementation to implementation. This makes it difficult
to predict the failover time.
Oracle 10g Cluster Ready Services enables databases to use a virtual IP address to
configure the listener on. This feature assures that Oracle clients quickly fail over when a
node fails. In Oracle Database 10g RAC, the use of a virtual IP address to mask the
individual IP addresses of the clustered nodes is required. The virtual IP addresses are
used to simplify failover and are automatically managed by CRS.
To create a Virtual IP (VIP) address, the Virtual IP Configuration Assistant (VIPCA) is called
from the root.sh script of a RAC install, which then configures the virtual IP addresses for
each node specified during the installation process. In order to be able to run VIPCA, there
must be unused public IP addresses available for each node that has been configured in the
/etc/hosts file.
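The unused-public-IP requirement can be illustrated with a mock /etc/hosts fragment (the host names and addresses below are invented):

```shell
# Mock /etc/hosts entries for two nodes: each node has its OS-managed
# public address plus one spare address on the same subnet reserved
# for its VIP
cat > /tmp/hosts.demo <<'EOF'
192.168.2.1    host01
192.168.2.101  host01-vip
192.168.2.2    host02
192.168.2.102  host02-vip
EOF
# Count the VIP entries: one per node
grep -c 'vip' /tmp/hosts.demo
```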
One public IP address for each node to use for its Virtual IP address for client connections
and for connection failover. This IP address is in addition to the operating system managed
public host IP address that is already assigned to the node by the operating system. This
public Virtual IP must be associated with the same interface name on every node that is a
part of the cluster. The IP addresses that are used for all of the nodes that are part of a
cluster must be from the same subnet. The host names for the VIP addresses must be
registered with the domain name server (DNS). The Virtual IP address should not be in use
at the time of the installation because this is a Virtual IP address that Oracle manages
internally to the RAC processes. This virtual IP address does not require a separate NIC. The
VIPs should be registered in the DNS. The VIP addresses must be on the same subnet as
the public host network addresses. Each Virtual IP (VIP) configured requires an unused and
resolvable IP address.
Using virtual IPs we avoid the TCP/IP timeout problem, because the Oracle Notification
Service (ONS) maintains communication between the nodes and listeners. Once ONS finds
any listener or node down, it notifies the other nodes and listeners. While a new connection
is trying to reach the failed node or listener, the virtual IP of the failed node is
automatically diverted to a surviving node and the session is established there. This
process doesn't wait for a TCP/IP timeout event, so new connections achieve faster session
establishment on the surviving nodes/listeners.
Virtual IP (VIP) is for fast connection establishment in a failover situation. We can still use
the physical IP address in the Oracle 10g listener if failover timing is of no concern; the
default TCP/IP timeout can be reduced using operating-system utilities/commands. But
taking advantage of the VIP (virtual IP address) in an Oracle 10g RAC database is advisable.
What is RAC? What is the benefit of RAC over single instance database?
Benefits:
Improve throughput
High availability
Transparency
Oracle RAC One Node is a single instance running on one node of the cluster while the 2nd
node is in cold standby mode. If the instance fails for some reason, RAC One Node detects
it and restarts the instance on the same node, or the instance is relocated to the 2nd node
in case there is a failure or fault in the 1st node.
The benefit of this feature is that it provides a cold failover solution and automates
instance relocation without downtime and without manual intervention. Oracle introduced
this feature with the release of 11gR2 (available with Enterprise Edition).
Availability – nodes can be added or replaced without having to shutdown the database
Scalability – more nodes can be added to the cluster as the workload increases
The Clusterware is installed on each node (in an Oracle home) and on the shared disks
(the voting disks and the OCR file)
The base software is installed on each node of the cluster and the database storage on the
shared disks.
What kind of storage we can use for the shared Clusterware files?
OCFS (Release 1 or 2)
Raw devices
Voting Disk is a file that sits in the shared storage area and must be accessible by all nodes
in the cluster. All nodes in the cluster register their heartbeat information in the voting
disk, so as to confirm that they are all operational. If the heartbeat information of any node
is not present in the voting disk, that node will be evicted from the cluster.
The CSS (Cluster Synchronization Service) daemon in the clusterware maintains the heart
beat of all nodes to the voting disk. When any node is not able to send heartbeat to voting
disk, then it will reboot itself, thus help avoiding the split-brain syndrome.
For high availability, Oracle recommends that you have a minimum of three voting disks,
or any odd number (3 or greater) of voting disks.
Voting Disk – is file that resides on shared storage and Manages cluster members. Voting
disk reassigns cluster ownership between the nodes in case of failure.
The Voting Disk Files are used by Oracle Clusterware to determine which nodes are
currently members of the cluster. The voting disk files are also used in concert with other
cluster components such as CRS to maintain the cluster's integrity.
Oracle Database 11g Release 2 provides the ability to store the voting disks in ASM along
with the OCR. Oracle Clusterware can access the OCR and the voting disks present in ASM
even if the ASM instance is down. As a result CSS can continue to maintain the Oracle
cluster even if the ASM instance has failed.
What kind of storage can we use for the RAC database storage?
OCFS (Release 1 or 2)
ASM
Raw devices
What is a CFS?
A cluster file system (CFS) is a file system that may be accessed (read and write) by all
members of a cluster at the same time. This implies that all members of the cluster see the
same view of the file system.
What is an OCFS2?
OCFS2 is the Oracle Cluster File System (version 2), which can be used for Oracle Real
Application Clusters. It can hold Oracle files: control files, datafiles, redo logs, and files
described by the BFILE datatype.
A raw device is a disk drive that does not yet have a file system set up. Raw devices are
used for Real Application Clusters since they enable the sharing of disks.
Oracle expects you to configure at least 3 voting disks for redundancy. You should always
configure an odd number of voting disks (3 or more), because loss of more than half of your
voting disks will cause the entire cluster to fail.
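The "more than half" rule is simple quorum arithmetic: with n voting disks a node must still reach a strict majority of them, i.e. n/2 + 1, to stay in the cluster. A minimal sketch of the calculation (the numbers are illustrative, not tied to any real configuration):

```shell
# A node needs a strict majority of the n voting disks to survive:
# majority = n/2 + 1, so n - majority disk failures can be tolerated.
majority() { n=$1; echo $(( n / 2 + 1 )); }

majority 3   # -> 2 (one of three disks may be lost)
majority 5   # -> 3 (two of five disks may be lost)
```

This is also why an even count buys nothing: 4 disks tolerate only one failure, the same as 3.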
A raw partition is a portion of a physical disk that is accessed at the lowest possible level. A
raw partition is created when an extended partition is created and logical partitions are
assigned to it without any formatting. Once a partition is formatted, it is called a cooked
partition.
A CFS offers:
– Simpler management
– With Oracle_Home on CFS, when you apply Oracle patches CFS guarantees that the
updated Oracle_Home is visible to all nodes in the cluster.
The VIP returns a dead-connection error IMMEDIATELY when its home node fails. Without
the VIP, clients would have to wait around 10 minutes to receive ORA-03113: "end-of-file on
communication channel". Using Transparent Application Failover (TAF) can also avoid
ORA-03113.
SSH (Secure Shell, 10g+) or RSH (Remote Shell, 9i+) allows the "oracle" UNIX account on
one RAC node to connect to another RAC node and copy files or run commands as the local
"oracle" UNIX account.
No. SSH or RSH is needed only for RAC installation, patch set installation, and clustered
database creation.
Each node of a cluster that is being used for a clustered database will typically have the
RDBMS and RAC software loaded on it, but not the actual data files (these need to be
available via shared disk).
What are the restrictions on the SID with a RAC database? Is it limited to 5 characters?
The SID prefix in 10g Release 1 and prior versions was restricted to five characters by the
install/config tools, so that an ORACLE_SID of at most 5+3=8 characters could be supported
in a RAC environment. The SID prefix restriction is relaxed to 8 characters in 10g Release 2;
see bug 4024251 for more information.
Does Real Application Clusters support heterogeneous platforms?
Real Application Clusters does not support heterogeneous platforms in the same cluster.
Are there any issues for the interconnect when sharing the same switch as the public
network by using VLAN to separate the network?
RAC and Clusterware deployment best practices suggest that the interconnect (private
network) be deployed on a standalone, physically separate, dedicated switch. On a large
shared network the connections can become unstable.
The Cluster Verification Utility (CVU) is a validation tool that you can use to check all the
important components that need to be verified at different stages of deployment in a RAC
environment.
No; the OCR and voting disk must be on raw devices or a CFS (cluster file system).
The voting disk is nothing but a file that contains and manages information about all node
memberships.
Use RMAN to make backups of the database, dd to back up your voting disks, and logical
exports (ocrconfig -export) plus the automatic physical backups for the OCR file.
What command would you use to check the availability of the RAC system?
crs_stat -t (or, from 11gR2 onwards, crsctl stat res -t)
What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g
Release 2 feature that provides a single name for clients to access an Oracle Database
running in a cluster. The benefit is that clients using SCAN do not need to change if you add
or remove nodes in the cluster.
In a RAC environment, it is the shipping of data blocks across the interconnect from remote
database caches (SGA) to the local node in order to fulfill the requirements of a transaction
(DML, queries of the data dictionary).
When an instance crashes in a single-node database, crash recovery takes place on the next
startup. In a RAC environment the same recovery for a failed instance is performed by the
surviving nodes; this is called instance recovery.
It is a private network used to ship data blocks from one instance to another for cache
fusion. Both physical data blocks and data dictionary blocks are shipped across this
interconnect.
How do you determine what protocol is being used for Interconnect traffic?
One of the ways is to look at the database alert log for the time period when the database
was started up.
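A quick way to check is to grep the alert log for the interconnect line written at instance startup. The sketch below fabricates a sample log line so the pattern can be demonstrated; on a real system you would grep the actual alert log (its exact wording and location vary by version):

```shell
# Simulated alert-log line (illustrative); a real alert log records something
# similar at instance startup under the diagnostic/bdump directory.
echo "cluster interconnect IPC version: Oracle UDP/IP (generic)" > /tmp/alert_sample.log

# The same grep works against a real alert_<SID>.log:
grep -i "cluster interconnect" /tmp/alert_sample.log
```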
What methods are available to keep the time synchronized on all nodes in the cluster?
Either the Network Time Protocol (NTP) can be configured, or, in 11gR2, the Cluster Time
Synchronization Service (CTSS) can be used.
SPFILEs, control files, datafiles, and redo log files should be created on shared storage.
Where does the Clusterware write when there is a network or storage missed heartbeat?
Network heartbeat (ping) failures are written to the Clusterware logs under $CRS_HOME/log
(for example, the ocssd log for the affected node).
The ocrconfig -showbackup command can be run to find the automatic and manually taken backups.
You can use either the logical or the physical OCR backup copy to restore the Repository.
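As a sketch, the usual command sequence looks like the following; it must be run as root on a cluster node (with the Clusterware stack down for a restore), the paths and file names are illustrative, and the commands require a live Grid Infrastructure installation, so they are shown for reference only:

```shell
# List the automatic and manual OCR backups known to this node.
ocrconfig -showbackup

# Restore from a physical (automatic) backup copy (illustrative path):
ocrconfig -restore /u01/app/11.2.0/grid/cdata/mycluster/backup00.ocr

# Or import a logical backup previously taken with "ocrconfig -export":
ocrconfig -import /backups/ocr_export.dmp

# Verify the repository afterwards.
ocrcheck
```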
How do you find out what object has its blocks being shipped across the instance the most?
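One approach is to query the segment statistics for global-cache transfer counts. A sketch (the view and statistic name are from the standard V$SEGMENT_STATISTICS dictionary view; this requires a running RAC instance, so it is shown for reference):

```shell
sqlplus -s / as sysdba <<'EOF'
-- Top 10 segments by CR blocks received over the interconnect.
SELECT * FROM (
  SELECT owner, object_name, value
  FROM   v$segment_statistics
  WHERE  statistic_name = 'gc cr blocks received'
  ORDER  BY value DESC
) WHERE ROWNUM <= 10;
EOF
```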
The VIP is an alternate virtual IP address assigned to each node in a cluster. During a node
failure, the VIP of the failed node moves to a surviving node and signals to the application
that the node has gone down. Without the VIP, the application would wait for a TCP timeout
before finding out that the session is no longer alive.
You can query the V$ACTIVE_INSTANCES view to determine the member instances of the
RAC cluster.
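For example (the query needs a running instance, so it is shown for reference; instance names in the output are whatever your cluster uses):

```shell
sqlplus -s / as sysdba <<'EOF'
-- Lists every instance currently open against the RAC database.
SELECT inst_number, inst_name FROM v$active_instances;
EOF
```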
The Cluster Health Monitor (CHM) stores operating system metrics in the CHM repository for
all nodes in a RAC cluster. It stores information on CPU, memory, processes, the network,
and other OS data. This information can later be retrieved and used to troubleshoot and
identify cluster-related issues.
It is a default component of the 11gR2 Grid install. The data is stored in a master repository
and replicated to a standby repository on a different node.
What would be the possible performance impact in a cluster if a less powerful node (e.g.
with slower CPUs) is added to the cluster?
All processing can slow down to the speed of the slowest server.
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCES
ACTIVE_INSTANCE_COUNT
UNDO_MANAGEMENT
The Grid software is becoming capable of supporting HA not just for Oracle Databases but
also for other applications, including Oracle's own applications. With 12c more features and
functionality are built in, and it is easier to deploy these pre-built solutions for common
Oracle applications.
You can run the opatch lsinventory -all_nodes command from a single node to look at the
inventory details for all nodes in the cluster.
Oracle RAC is composed of two or more database instances. They are composed of the same
memory structures and background processes as a single-instance database. In addition,
Oracle RAC instances use two services, GES (Global Enqueue Service) and GCS (Global
Cache Service), that enable cache fusion.
The transfer of data across instances through the private interconnect is called cache
fusion. Oracle RAC is composed of two or more instances. When a block of data has been
read from a datafile by one instance in the cluster and another instance needs the same
block, it is cheaper to ship the block image from the instance that holds it in its SGA than to
read it again from disk. To enable this inter-instance communication, Oracle RAC uses the
interconnect. The Global Enqueue Service (GES) monitors, and the instance enqueue
process manages, cache fusion.
What is FAN?
Fast Application Notification (FAN) is the mechanism by which Oracle RAC publishes up/down
events about nodes, instances, and services (via ONS), so that clients and connection pools
are notified immediately instead of waiting for TCP timeouts.
If you need to find the location of the OCR (Oracle Cluster Registry) while CRS is down, look
in the "ocr.loc" file; its location depends on the OS:
On Linux: /etc/oracle/ocr.loc
On Solaris: /var/opt/oracle/ocr.loc
Set ASM environment or CRS environment then run the below command:
ocrcheck
Network card 2 (with IP address set 2) is for the private network (inter-node communication
between RAC nodes, used by the Clusterware and the RAC database).
The public IP address is the normal IP address, typically used by DBAs and system
administrators to manage storage, the system, and the database. Public IP addresses are
routable on the public network.
The private IP address is used only for internal cluster processing (cache fusion), also known
as the interconnect. Private IP addresses are reserved for private networks.
The VIP is used by database applications to enable failover when a cluster node fails. The
purpose of the VIP is that client connections can fail over to surviving nodes in case of a
failure.
Can an application developer access the private IP?
No. The private IP address is used only for internal cluster processing (cache fusion).
Voting Disk —> Oracle RAC uses the voting disk to manage cluster membership by way of a
health check, and it arbitrates cluster ownership among the instances in case of network
failure.
Oracle Cluster Registry (OCR) —> Maintains cluster configuration information as well as
configuration information about any cluster database within the cluster. The OCR must
reside on shared disk that is accessible by all of the nodes in your cluster
Clusterware uses the private interconnect for cluster synchronization (network heartbeat)
and daemon communication between the clustered nodes. This communication is based on
the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP).
Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches
of participating nodes in the cluster.
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP
timeout period (which can be up to 10 minutes) before getting an error; as a result, sessions
can appear to hang.
When a node fails, the VIP associated with it automatically fails over to another node, and
the new node re-ARPs to the network, advertising the new MAC address for the IP.
Subsequent packets sent to the VIP go to the new node, which sends RST packets back to
the clients. This results in the clients getting errors immediately.
What is dynamic remastering? When does dynamic remastering happen?
Dynamic remastering is the ability to move ownership of a resource from one instance to
another in RAC. Dynamic resource remastering is used to implement resource affinity for
increased performance. Resource affinity optimizes the system in situations where update
transactions are being executed mostly on one instance; when the activity shifts to another
instance, the resource affinity correspondingly moves to that instance.
You have n instances, each running on its own node, based on the shared storage.
The cluster is the key component: a collection of servers operating as one unit.
RAC is the best solution for high performance and high availability.
Non-RAC databases have a single point of failure in case of hardware failure or server crash.
What is GRD?
The GES and GCS maintain records of the status of each datafile and each cached block
using the Global Resource Directory (GRD). This process is referred to as cache fusion and
helps ensure data integrity.
GC CR request: the time it takes to retrieve the data from the remote cache.
Reason: RAC traffic using a slow connection, or inefficient queries (poorly tuned queries
increase the number of data blocks requested by an Oracle session; the more blocks
requested, the more often a block must be read from a remote instance via the
interconnect).
GC BUFFER BUSY: the time the remote instance spends locally accessing the requested
data block.
The cluster interconnect is used by cache fusion for inter-instance communication.
Applications should use the services feature to connect to the Oracle database. Services
enable us to define rules and characteristics to control how users and applications connect
to database instances.
Issue the following query from any one node connecting through SQL*PLUS.
Oracle clusterware manages CRS resources based on the configuration information of CRS
resources stored in OCR(Oracle Cluster Registry).
We need to stop and delete the instance on the node first, in interactive or silent mode.
After that, ASM can be removed using the srvctl tool as follows:
We can verify if ASM has been removed by issuing the following command:
There are two types of connection load balancing: server-side load balancing and
client-side load balancing.
What is the difference between server-side and client-side connection load balancing?
Client-side load balancing happens on the client side, where the connection is balanced
across listeners by the client's TNS configuration. With server-side load balancing, the
listener uses a load-balancing advisory to redirect connections to the instance providing the
best service.
Oracle recommends using the dd command to back up the voting disk, with a minimum
block size of 4 KB:
dd if=voting_disk_name of=backup_file_name bs=4k
where voting_disk_name is the device or file holding the voting disk (the input) and
backup_file_name is the backup copy (the output); reverse the two to restore.
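The same copy-and-compare pattern can be demonstrated on a scratch file (the paths below are illustrative stand-ins, not a real voting disk):

```shell
# Create a small scratch file standing in for a voting disk.
dd if=/dev/zero of=/tmp/fake_votedisk bs=4k count=4 2>/dev/null

# Back it up, then restore it, using the recommended 4 KB block size.
dd if=/tmp/fake_votedisk of=/tmp/votedisk.bak bs=4k 2>/dev/null   # backup
dd if=/tmp/votedisk.bak of=/tmp/fake_votedisk bs=4k 2>/dev/null   # restore

# Verify the copy is byte-identical.
cmp /tmp/fake_votedisk /tmp/votedisk.bak && echo "copies match"
```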
1. What is RAC?
RAC stands for Real Application Clusters.
It is a clustering solution from Oracle Corporation that ensures high availability of databases
by providing instance failover and media failover features.
Oracle RAC is a cluster database with a shared cache architecture that overcomes the
limitations of traditional shared-nothing and shared-disk approaches to provide a highly
scalable and available database solution for all the business applications.
Oracle RAC provides the foundation for enterprise grid computing.
7. What command would you use to check the availability of the RAC system?
crs_stat -t -v (-t -v are optional)
12. Give a few examples of solutions that support cluster storage?
·ASM (automatic storage management),
·Raw disk devices,
·Network file system (NFS),
·OCFS2 and
·OCFS (Oracle Cluster File System).
Oracle’s Real Application Clusters (RAC) option supports the transparent deployment of a
single database across a cluster of servers, providing fault tolerance from hardware failures
or planned outages. Oracle RAC running on clusters provides Oracle’s highest level of
capability in terms of availability, scalability, and low-cost computing.
Oracle Clusterware has two key components: the Cluster Registry (OCR) and the Voting Disk.
The cluster registry holds all information about nodes, instances, services, and ASM
storage if used; it also contains state information, i.e., whether they are available and up.
The voting disk is used to determine whether a node has failed, i.e., become separated from
the majority. If a node is deemed to no longer belong to the majority, it is forcibly rebooted
and will, after the reboot, add itself again to the surviving cluster nodes.
22. What are the administrative tasks involved with voting disk?
Following administrative tasks are performed with the voting disk :
1) Backing up voting disks
2) Recovering Voting disks
3) Adding voting disks
4) Deleting voting disks
5) Moving voting disks
23. Can you add voting disk online? Do you need voting disk backup?
Yes; as per the documentation, if you have multiple voting disks you can add one online.
But if you have only one voting disk, the cluster will be down once it is lost; you then need
to start CRS in exclusive mode and add the voting disk using
crsctl add css votedisk <path>
27. You have lost OCR disk, what is your next step?
The cluster stack will be down because CSSD is unable to maintain the cluster's integrity;
this is true in 10g. From 11gR2 onwards only the CRSD stack goes down, while OHASD stays
up and running. You can add the OCR back by restoring an automatic backup or importing a
manual backup.
GC CR request: the time it takes to retrieve the data from the remote cache.
Reason: RAC traffic using a slow connection, or inefficient queries (poorly tuned queries
increase the number of data blocks requested by an Oracle session; the more blocks
requested, the more often a block must be read from a remote instance via the
interconnect).
GC BUFFER BUSY: the time the remote instance spends locally accessing the requested
data block.
30. How do OCSSD starts first if voting disk & OCR resides in ASM Disk groups?
You might wonder how CSSD, which is required to start the clustered ASM instance, can
start at all when the voting disks are stored in ASM. Without access to the voting disks there
is no CSS, hence the node cannot join the cluster; but without being part of the cluster,
CSSD cannot start the ASM instance.
To solve this chicken-and-egg problem, the ASM disk headers carry new metadata in 11.2:
you can use kfed to read the header of an ASM disk containing a voting disk.
The kfdhdb.vfstart and kfdhdb.vfend fields tell CSS where to find the voting file, and reading
them does not require the ASM instance to be up.
Once the voting disks are located, CSS can access them and join the cluster.
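As a sketch (the device path is illustrative, and kfed ships in the Grid home, so this only works on a node with Grid Infrastructure installed):

```shell
# Dump the ASM disk header and pick out the voting-file extent pointers
# that CSS reads before any ASM instance is up.
$GRID_HOME/bin/kfed read /dev/asm-disk1 | grep -E 'kfdhdb.vfstart|kfdhdb.vfend'
```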
GSDCTL stands for Global Service Daemon Control, we can use gsdctl commands to start,
stop, and obtain the status of the GSD service on any platform.
34. Srvctl cannot start the instance; I get errors PRKP-1001 and CRS-0215. However,
SQL*Plus can start it on both nodes. How do you identify the problem?
Set the environment variable SRVM_TRACE to TRUE and start the instance with srvctl; you
will then get a detailed error stack.
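For example (the database and instance names are placeholders, and the commands require a live cluster, so this is shown for reference):

```shell
# Turn on srvctl/SRVM tracing, then retry the failing operation;
# a detailed trace is printed instead of the terse PRKP/CRS error.
export SRVM_TRACE=TRUE
srvctl start instance -d MYDB -i MYDB1
```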
35. What are Oracle Clusterware processes for 10g on UNIX and Linux?
Cluster Synchronization Services (ocssd) — Manages cluster node membership and
runs as the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The CRS process manages cluster resources (which could
be a database, an instance, a service, a listener, a virtual IP (VIP) address, an application
process, and so on) based on each resource's configuration information stored in the OCR.
This includes start, stop, monitor, and failover operations. This process runs as the root
user.
Event manager daemon (evmd) — a background process that publishes events that crs
creates.
Process Monitor Daemon (OPROCD) — this process monitors the cluster and provides I/O
fencing. OPROCD performs its check, stops running, and if the wakeup takes longer than
expected, OPROCD resets the processor and reboots the node. An OPROCD failure results in
Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux
platforms.
SCAN IP can be disabled if not required. However SCAN IP is mandatory during the RAC
installation. Enabling/disabling SCAN IP is mostly used in oracle apps environment by the
concurrent manager (kind of job scheduler in oracle apps).
40. What are the different network components in 10g RAC?
Public, private, and VIP components.
The private interface is for inter-node communication.
The VIP is all about application availability: when a node fails, its VIP component fails over
to another node. This is why all applications should be based on the VIP components,
meaning TNS entries should have the VIP entries in the host list.
54. What is the difference between server-side and client-side connection load
balancing?
Client-side load balancing happens on the client side, where the connection is balanced
across listeners by the client's TNS configuration. With server-side load balancing, the
listener uses a load-balancing advisory to redirect connections to the instance providing the
best service.
Client-side load balancing: Oracle's client-side load balancing feature enables clients to
randomize connection requests among all the available listeners.
A TNS entry that contains all node entries and uses LOAD_BALANCE=ON (the default) will
use connect-time, i.e., client-side, load balancing.
finance =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = myrac2-vip)(PORT = 2042))
(ADDRESS = (PROTOCOL = TCP)(HOST = myrac1-vip)(PORT = 2042))
(ADDRESS = (PROTOCOL = TCP)(HOST = myrac3-vip)(PORT = 2042))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = FINANCE) (FAILOVER=ON)
(FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY =
5))
)
)
Server-side load balancing: this improves connection performance by balancing the number
of active connections among multiple instances and dispatchers. In a single-instance
shared-server environment, the listener selects the least-loaded dispatcher to handle the
incoming client requests. In RAC environments, PMON is aware of the load on all instances
and dispatchers, and depending on this load information PMON redirects the connection to
the least-loaded node.
local_listener=LISTENER_MYRAC1
remote_listener = LISTENERS_MYRACDB
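These parameters can be set per instance; for example (the listener alias names follow the document's MYRAC example and are placeholders, and the commands require a running RAC instance):

```shell
sqlplus -s / as sysdba <<'EOF'
-- Each instance registers with its own local listener...
ALTER SYSTEM SET local_listener='LISTENER_MYRAC1' SID='MYRAC1';
-- ...and with the listeners of all nodes, so PMON can load-balance.
ALTER SYSTEM SET remote_listener='LISTENERS_MYRACDB' SID='*';
EOF
```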
55. What are the administrative tools used for Oracle RAC environments?
An Oracle RAC cluster can be administered as a single image using the tools below:
· OEM (Enterprise Manager),
· SQL*PLUS,
· Server control (SRVCTL),
· Cluster Verification Utility (CLUVFY),
· DBCA,
· NETCA
59. How do we verify that an instance has been removed from OCR after deleting
an instance?
Issue the following srvctl command:
srvctl config database -d database_name
cd $CRS_HOME/bin
./crs_stat
60. What are the modes of deleting instances from Oracle Real Application cluster
Databases?
We can delete instances using silent mode or interactive mode using DBCA (Database
Configuration Assistant).
61. What are the background process that exists in 11gr2 and functionality?
Process Name Functionality
crsd •The CRS daemon (crsd) manages cluster resources based on configuration
information that is stored in Oracle Cluster Registry (OCR) for each resource. This includes
start, stop, monitor, and failover operations. The crsd process generates events when the
status of a resource changes.
cssd •Cluster Synchronization Service (CSS): Manages the cluster configuration by
controlling which nodes are members of the cluster and by notifying members when a node
joins or leaves the cluster. If you are using certified third-party clusterware, then CSS
processes interface with your clusterware to manage node membership information. CSS
has three separate processes: the CSS daemon (ocssd), the CSS Agent (cssdagent), and
the CSS Monitor (cssdmonitor). The cssdagent process monitors the cluster and provides
input/output fencing. This service formerly was provided by Oracle Process Monitor daemon
(oprocd), also known as Ora Fence Service on Windows. A cssdagent failure results in
Oracle Clusterware restarting the node.
diskmon •Disk Monitor daemon (diskmon): Monitors and performs input/output fencing
for Oracle Exadata Storage Server. As Exadata storage can be added to any Oracle RAC
node at any point in time, the diskmon daemon is always started when ocssd is started.
evmd •Event Manager (EVM): is a background process that publishes Oracle Clusterware
events.
mdnsd •Multicast domain name service (mDNS): Allows DNS requests. The mDNS
process is a background process on Linux and UNIX, and a service on Windows.
gnsd •Oracle Grid Naming Service (GNS): Is a gateway between the cluster mDNS and
external DNS servers. The GNS process performs name resolution within the cluster.
ons •Oracle Notification Service (ONS): Is a publish-and-subscribe service for
communicating Fast Application Notification (FAN) events
oraagent •oraagent: extends the clusterware to support Oracle-specific requirements and
complex resources. It runs server callout scripts when FAN events occur. This process was
known as RACG in Oracle Clusterware 11g Release 1 (11.1).
orarootagent •Oracle root agent (orarootagent): Is a specialized oraagent process that
helps CRSD manage resources owned by root, such as the network, and the Grid virtual IP
address
oclskd •Cluster kill daemon (oclskd): Handles instance/node evictions requests that have
been escalated to CSS
gipcd •Grid IPC daemon (gipcd): Is a helper daemon for the communications
infrastructure
ctssd •Cluster Time Synchronization Service daemon (ctssd): manages time
synchronization between nodes rather than depending on NTP.
63. What is the major difference between 10g and 11g RAC?
There is not much difference between 10g and 11gR1 RAC, but there is a significant
difference in 11gR2.
From 11gR2 onwards it is a complete HA stack, managing and providing the following
resources, like other cluster software such as VCS:
Databases
Instances
Applications
Cluster Management
Node Management
Event Services
High Availability
Network Management (provides DNS/GNS/mDNS services on behalf of other traditional
services), plus SCAN (Single Client Access Name) and HAIP
Storage Management (with the help of ASM and the new ACFS filesystem)
Time synchronization (rather than depending on traditional NTP)
Removal of the OS-dependent hang checker; the stack manages this with its own
additional monitor process
65. State the initialization parameters that must have same value for every
instance in an Oracle RAC database?
Some initialization parameters are critical at database creation time and must have the
same values. Their values must be specified in the SPFILE or PFILE for every instance. The
list of parameters that must be identical on every instance is given below:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCES
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE (RDBMS or ASM)
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_PASSWORDFILE
UNDO_MANAGEMENT
-------------------------------------------------------------------------------------------------------
66. What is RAC? What is the benefit of RAC over single instance database?
In Real Application Clusters environments, all nodes concurrently execute transactions
against the same database. Real Application Clusters coordinates each node's access to the
shared data to provide consistency and integrity.
Benefits:
Improve response time
Improve throughput
High availability
Transparency
A virtual IP address or VIP is an alternate IP address that the client connections use instead
of the standard public IP address. To configure VIP address, we need to reserve a spare IP
address for each node, and the IP addresses must use the same subnet as the public
network.
For high availability, Oracle recommends that you have a minimum of three or odd number
(3 or greater) of voting disks.
Voting Disk - a file that resides on shared storage and manages cluster membership. The
voting disk reassigns cluster ownership between the nodes in case of failure.
The voting disk files are used by Oracle Clusterware to determine which nodes are
currently members of the cluster. They are also used in concert with other cluster
components, such as CRS, to maintain the cluster's integrity.
Oracle Database 11g Release 2 provides the ability to store the voting disks in ASM along
with the OCR. Oracle Clusterware can access the OCR and the voting disks present in ASM
even if the ASM instance is down. As a result, CSS can continue to maintain the Oracle
cluster even if the ASM instance has failed.
Oracle expects that you will configure at least 3 voting disks for redundancy purposes. You
should always configure an odd number of voting disks >= 3. This is because loss of more
than half your voting disks will cause the entire cluster to fail.
You should plan on allocating 280MB for each voting disk file. For example, if you are using
ASM and external redundancy then you will need to allocate 280MB of disk for the voting
disk. If you are using ASM and normal redundancy you will need 560MB.
SCAN provides a single domain name (via DNS), allowing end-users to address a RAC
cluster as if it were a single IP address. SCAN works by replacing a hostname or IP list with
virtual IP addresses (VIPs).
Single Client Access Name (SCAN) is meant to provide a single name for all Oracle clients to
connect to the cluster database, irrespective of the number of nodes and node locations.
Previously, we had to keep adding multiple address records to every client's tnsnames.ora
whenever a node was added to or deleted from the cluster.
Single Client Access Name (SCAN) eliminates the need to change the TNSNAMES entry when
nodes are added to or removed from the cluster. RAC instances register with the SCAN
listeners as remote listeners. Oracle recommends assigning 3 addresses to SCAN, which will
create 3 SCAN listeners, even if the cluster has dozens of nodes. SCAN is a domain name
registered to at least one and up to three IP addresses, either in DNS (Domain Name
Service) or GNS (Grid Naming Service). The SCAN must resolve to at least one address on
the public network. For high availability and scalability, Oracle recommends configuring the
SCAN to resolve to three addresses.
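You can check the resolution with nslookup; the SCAN name below is a placeholder, and a correctly configured SCAN returns up to three addresses, typically in round-robin order:

```shell
# Each invocation may return the three SCAN IPs in a different order.
nslookup myrac-scan.example.com
```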
After an Oracle RAC node crashes—usually from a hardware failure—all new application
transactions are automatically rerouted to a specified backup node. The challenge in
rerouting is to not lose transactions that were "in flight" at the exact moment of the crash.
One of the requirements of continuous availability is the ability to restart in-flight application
transactions, allowing a failed node to resume processing on another server without
interruption. Oracle's answer to application failover is a new Oracle Net mechanism dubbed
Transparent Application Failover. TAF allows the DBA to configure the type and method of
failover for each Oracle Net client.
TAF architecture offers the ability to restart transactions at either the transaction (SELECT)
or session level.
Databases
Instances
Applications
Node Monitoring
Event Services
High Availability
From 11gR2 onwards it is a complete HA stack, managing and providing the following
resources, like other cluster software such as VCS:
Databases
Instances
Applications
Cluster Management
Node Management
Event Services
High Availability
Network Management (provides DNS/GNS/mDNS services on behalf of other traditional
services), plus SCAN (Single Client Access Name) and HAIP
Storage Management (with the help of ASM and the new ACFS filesystem)
Time synchronization (rather than depending on traditional NTP)
Removal of the OS-dependent hang checker; the stack manages this with its own
additional monitor process
2. What are Oracle Cluster Components?
Cluster Interconnect (HAIP)
Shared Storage (OCR/Voting Disk)
Clusterware software
3. What are Oracle RAC Components?
VIP, Node apps etc.
4. What are Oracle kernel components (i.e., how does an Oracle RAC database differ from
a normal single-instance database in terms of binaries and processes)?
Basically, the Oracle kernel needs to be relinked with the RAC ON option when you convert
to RAC; that is the difference, as it enables the RAC background processes such as LMON,
LCK, LMD, and LMS.
To turn on RAC
# link the oracle libraries
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on
# rebuild oracle
$ cd $ORACLE_HOME/bin
$ relink oracle
Oracle RAC is composed of two or more database instances. They are composed of the same
memory structures and background processes as a single-instance database. Oracle RAC
instances use two services, GES (Global Enqueue Service) and GCS (Global Cache Service),
that enable cache fusion. Oracle RAC instances also run the following background processes:
ACMS—Atomic Controlfile to Memory Service
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes
RSMN—Remote Slave Monitor
5. What is Clusterware?
Software that provides various interfaces and services for a cluster. Typically, this includes
capabilities that:
13. What are the file types that ASM support and keep in disk groups?
Control files
Data files
Temporary data files
Online redo logs
Archive logs
Flashback logs
DB SPFILE
RMAN backup sets
RMAN data file copies
Transport data files
Data Pump dump sets
Data Guard configuration
Change tracking bitmaps
OCR files
ASM SPFILE
14. List Key benefits of ASM?
The node listener is a process that helps establish network connections from ASM
clients to the ASM instance. It:
Runs by default from the Grid $ORACLE_HOME/bin directory
Listens on port 1521 by default
Is the same as a database instance listener
Is capable of listening for all database instances on the same machine, in addition to
the ASM instance
Can run concurrently with separate database listeners or be replaced by a separate
database listener
Is named tnslsnr on the Linux platform
15. What is SCAN listener?
A SCAN listener is an additional listener, beyond the node listeners, that accepts
incoming database connection requests arriving through the SCAN IPs. It has endpoints
configured for the node listeners, and it routes each connection request to the
appropriate node listener.
16. What is the difference between CRSCTL and SRVCTL?
crsctl manages clusterware-related operations (starting and stopping the Clusterware
stack and managing cluster resources), whereas srvctl manages Oracle resources such as
databases, instances, services, and listeners. The OCR location is recorded in ocr.loc:
cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
The SCAN IPs can be disabled if not required; however, they are mandatory during
RAC installation. Enabling/disabling the SCAN is mostly done in Oracle Apps environments
for the concurrent manager (a kind of job scheduler in Oracle Apps).
To disable the SCAN:
i. Stop using the SCAN name at the client end.
ii. Stop the SCAN listener
srvctl stop scan_listener
iii. Stop the SCAN
srvctl stop scan (this stops the SCAN VIPs)
iv. Disable the SCAN and the SCAN listener
srvctl disable scan
srvctl disable scan_listener
5. Migrating to a new disk group: scenarios
a. Case 1: Migrating a disk group from one storage array to another, keeping the same name
1. Consider the disk group DATA.
2. Create new disks in DATA pointing to the new storage (EMC):
a) Partition provisioning is done by the storage team, who give you the device
or multipath name, e.g. /dev/mapper/asakljdlas
3. Add the new disk to disk group DATA:
a) ALTER DISKGROUP data ADD DISK '/dev/mapper/asakljdlas';
4. Drop the old disks from DATA, whereupon rebalancing runs automatically.
If you want, you can speed up the rebalance with ALTER SYSTEM SET
asm_power_limit = 12 for full throttle.
ALTER DISKGROUP data DROP DISK <old_disk_name>;
Note: you can get the device path from the PATH column (and the disk name from the
NAME column) of V$ASM_DISK.
5. Request the SAN team to detach the old storage (HITACHI).
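The add/drop steps in Case 1 amount to generating two kinds of ALTER DISKGROUP statements. The illustrative Python sketch below simply builds those statement strings; the disk group, device paths, and disk names are placeholders, not values from any real system:

```python
# Hypothetical helper: build the ALTER DISKGROUP statements for a same-name
# storage migration (add new disks by path, drop old disks by ASM disk name).
# All names below are made up for illustration.

def migration_ddl(diskgroup, new_disk_paths, old_disk_names):
    """Return the DDL for adding new disks and dropping old ones."""
    ddl = [f"ALTER DISKGROUP {diskgroup} ADD DISK '{p}';" for p in new_disk_paths]
    ddl += [f"ALTER DISKGROUP {diskgroup} DROP DISK {n};" for n in old_disk_names]
    return ddl

for stmt in migration_ddl("data",
                          ["/dev/mapper/new_emc_disk1"],
                          ["OLD_HITACHI_0001"]):
    print(stmt)
```

Note that ADD DISK takes a device path while DROP DISK takes the ASM disk name (the NAME column of V$ASM_DISK).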
b. Case 2: Migrating a disk group from one storage array to another with a different disk group name.
1) Create the disk group with the new name on the new storage.
2) Create the spfile in the new disk group and change parameters such as
control_files with scope=spfile.
3) Take a control file backup with format '+newdiskgroup'.
4) Shut down the database, then STARTUP NOMOUNT.
5) Restore the control file from the backup (the control file is now restored to the new disk group).
6) Take an RMAN image-copy backup of the whole database with the new format:
RMAN> BACKUP DATABASE AS COPY FORMAT '+newdiskgroup';
7) RMAN> SWITCH DATABASE TO COPY;
8) Verify in dba_data_files, dba_temp_files, and v$log that all files point to the new
disk group name.
c. Case 3: Migrating a disk group to new storage when no additional disk group is given
1) Take an RMAN image-copy backup of the whole database with the new format and place it
on disk.
2) Prepare rename commands from v$log, v$datafile, etc. (dynamic queries).
3) Take a backup of the pfile and modify the following parameters to refer to the new
disk group name:
control_files
db_create_file_dest
db_create_online_log_dest_1
db_create_online_log_dest_2
db_recovery_file_dest
4) Stop the database.
5) Unmount the disk group:
asmcmd umount ORA_DATA
6) Use the asmcmd renamedg command (11gR2 only) to rename it to the new
disk group name:
renamedg phase=both dgname=ORA_DATA newdgname=NEW_DATA
verbose=true
7) Mount the disk group:
asmcmd mount NEW_DATA
8) Start the database in mount state with the new pfile backed up in step 3.
9) Run the rename-file scripts generated at step 2.
10) Add the disk group to the cluster (if using RAC):
srvctl modify database -d orcl -p +NEW_FRA/orcl/spfileorcl.ora
srvctl modify database -d orcl -a "NEW_DATA"
srvctl config database -d orcl
srvctl start database -d orcl
11) Delete the old disk group from the cluster:
crsctl delete resource ora.ORA_DATA.dg
12) Open the database.
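Step 2 of Case 3 (preparing rename commands from v$log and v$datafile) is just a matter of rewriting the disk-group prefix in each file path. A minimal illustrative sketch, with made-up paths and disk-group names:

```python
# Illustrative sketch only: generate ALTER DATABASE RENAME FILE statements
# for a disk-group rename (ORA_DATA -> NEW_DATA). In practice the old paths
# come from v$datafile, v$tempfile, and v$log; these are placeholders.

def rename_statements(paths, old_dg="+ORA_DATA", new_dg="+NEW_DATA"):
    """Build one RENAME FILE statement per file path."""
    stmts = []
    for old in paths:
        new = old.replace(old_dg, new_dg, 1)  # swap only the leading prefix
        stmts.append(f"ALTER DATABASE RENAME FILE '{old}' TO '{new}';")
    return stmts

files = ["+ORA_DATA/orcl/datafile/system01.dbf",
         "+ORA_DATA/orcl/onlinelog/redo01.log"]
for stmt in rename_statements(files):
    print(stmt)
```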
7. Database rename in RAC: what would your checklist be?
a. Take the outputs of all the services that are running on the database.
b. Set cluster_database=FALSE.
c. Drop all the services associated with the database.
d. Stop the database.
e. STARTUP MOUNT.
f. Use NID to change the DB name.
Generic question: if using ASM, the usual location for a datafile would be
'+DATA/datafile/OLDDBNAME/system01.dbf'.
Does NID change this path too, to reflect the new DB name?
Yes it will; by using the proper directory structure it creates links to the original
directory structure: '+DATA/datafile/NEWDBNAME/system01.dbf'.
This has to be tested; we don't have a test bed, but thanks to Anji who confirmed
it will.
OHAS is the complete cluster stack, which includes some kernel-level tasks such as managing
the network, time synchronization, disks, etc., whereas CRS has the ability to manage
resources such as databases, listeners, and applications. With both of these, Oracle provides
general high-availability clustering services rather than an affinity only to databases.
The Oracle Clusterware is designed to perform a node eviction by removing one or more
nodes from the cluster if some critical problem is detected. A critical problem could be a
node not responding via a network heartbeat, a node not responding via a disk heartbeat, a
hung or severely degraded machine, or a hung ocssd.bin process. The purpose of this node
eviction is to maintain the overall health of the cluster by removing bad members.
Starting in 11.2.0.2 RAC (or if you are on Exadata), a node eviction may not actually reboot
the machine. This is called a rebootless restart. In this case we restart most of the
clusterware stack to see if that fixes the unhealthy node.
OCSSD (aka CSS daemon) - This process is spawned by the cssdagent process. It runs in
both vendor clusterware and non-vendor clusterware environments. OCSSD's primary job
is internode health monitoring and RDBMS instance endpoint discovery. The health
monitoring includes a network heartbeat and a disk heartbeat (to the voting files). OCSSD
can also evict a node after escalation of a member kill from a client (such as a database
LMON process). This is a multi-threaded process that runs at an elevated priority and runs
as the Oracle user.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent --> ocssd -->
ocssd.bin
CSSDAGENT - This process is spawned by OHASD and is responsible for spawning the
OCSSD process, monitoring for node hangs (via oprocd functionality), monitoring the
OCSSD process for hangs (via oclsomon functionality), and monitoring vendor clusterware
(via vmon functionality). This is a multi-threaded process that runs at an elevated priority
and runs as the root user.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent
CSSDMONITOR - This process also monitors for node hangs (via oprocd functionality),
monitors the OCSSD process for hangs (via oclsomon functionality), and monitors vendor
clusterware (via vmon functionality). This is a multi-threaded process that runs at an
elevated priority and runs as the root user.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdmonitor
*Messages files:
Linux: /var/log/messages
Sun: /var/adm/messages
HP-UX: /var/adm/syslog/syslog.log
Please refer to the following document which provides information on collecting together
most of the above files:
11.2 Clusterware evictions should, in most cases, have some kind of meaningful error in the
clusterware alert log. This can be used to determine which process is responsible for the
reboot. Example message from a clusterware alert log:
This particular eviction happened when we had hit the network timeout. CSSD exited and
the cssdagent took action to evict. The cssdagent knows the information in the error
message from local heartbeats made from CSSD.
If no message is in the evicted node's clusterware alert log, check the lastgasp logs on the
local node and/or the clusterware alert logs of other nodes.
If you have encountered an OCSSD eviction review common causes in section 3.1 below.
Network failure or latency between nodes. It would take 30 consecutive missed checkins (by
default - determined by the CSS misscount) to cause a node eviction.
Problems writing to or reading from the CSS voting disk. If the node cannot perform a disk
heartbeat to the majority of its voting files, then the node will be evicted.
A member kill escalation. For example, database LMON process may request CSS to
remove an instance from the cluster via the instance eviction mechanism. If this times out
it could escalate to a node kill.
An unexpected failure or hang of the OCSSD process; this can be caused by any of the
above issues or something else.
An Oracle bug.
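The misscount rule quoted above (30 consecutive missed checkins by default) can be sketched in a couple of lines. This is purely an illustration of the arithmetic, not Oracle code:

```python
# Sketch of the CSS misscount rule described above: a node is considered
# failed once it misses `misscount` consecutive network heartbeats.
# Illustrative only.

def should_evict(consecutive_missed_heartbeats: int, misscount: int = 30) -> bool:
    """True when the missed-heartbeat count reaches the CSS misscount."""
    return consecutive_missed_heartbeats >= misscount

# A node that missed 29 checkins is still in the cluster; at 30 it is evicted.
print(should_evict(29))  # False
print(should_evict(30))  # True
```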
All files from section 2.0 from all cluster nodes. More data may be required.
2012-03-27 22:05:48.693: [
CSSD][1100548416]###################################
OS messages:
Mar 27 22:03:58 choldbr132p kernel: Error:Mpx:All paths to Symm 000190104720 vol 0c71
are dead.
Mar 27 22:03:58 choldbr132p kernel: Buffer I/O error on device sdbig, logical block 0
...
Importance of the master node in a cluster:
- The master node has the lowest node id in the cluster. Node ids are assigned to the nodes in
the same order as the nodes join the cluster; hence, normally the node which joins the
cluster first is the master node.
- The CRSd process on the master node is responsible for initiating the OCR backup as per the
backup policy.
- The master node is also responsible for syncing the OCR cache across the nodes.
- The CRSd process on the master node reads from and writes to the OCR on disk.
- In case of node eviction, the cluster is divided into two sub-clusters. The sub-cluster
containing the fewer number of nodes is evicted. But if both sub-clusters have the same
number of nodes, the sub-cluster having the master node survives and the other sub-cluster
is evicted.
- When the OCR master (crsd.bin process) stops or restarts for whatever reason, the crsd.bin
on the surviving node with the lowest node number becomes the new OCR master.
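The eviction tie-break just described can be sketched as a small decision function: the larger sub-cluster survives, and on a tie the sub-cluster containing the master node (lowest node id) wins. The node ids below are invented for illustration:

```python
# Sketch of the split-brain survival rule described above (illustrative only):
# the bigger sub-cluster survives; on equal size, the sub-cluster holding the
# master node (the lowest node id in the whole cluster) survives.

def surviving_subcluster(sub_a, sub_b):
    """Each sub-cluster is a set of node ids; return the one that survives."""
    if len(sub_a) != len(sub_b):
        return sub_a if len(sub_a) > len(sub_b) else sub_b
    master = min(sub_a | sub_b)          # lowest node id = master node
    return sub_a if master in sub_a else sub_b

print(surviving_subcluster({1, 2}, {3}))     # the bigger sub-cluster wins
print(surviving_subcluster({2, 4}, {1, 3}))  # tie: the one with node 1 wins
```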
The v$cache_transfer and v$file_cache_transfer views are used to examine RAC statistics.
The types of blocks that use the cluster interconnect in a RAC environment are monitored
with the v$cache_transfer series of views:
v$cache_transfer: This view shows the types and classes of blocks that Oracle transfers
over the cluster interconnect on a per-object basis.
The forced_reads and forced_writes columns can be used to determine the types of objects
the RAC instances are sharing.
Values in the forced_writes column show how often a certain block type is transferred out of
a local buffer cache due to the current version being requested by another instance.
INST_ID IP_ADDRESS
---------- ----------------
1 192.168.261.1
2 192.168.261.2
With Oracle Grid Infrastructure 11g release 2 (11.2), Oracle Automatic Storage
Management (Oracle ASM) and Oracle Clusterware are installed into a single home
directory, which is referred to as the Grid Infrastructure home. Configuration assistants
start after the installer interview process that configures Oracle ASM and Oracle
Clusterware. The installation of the combined products is called Oracle Grid Infrastructure.
However, Oracle Clusterware and Oracle Automatic Storage Management remain
separate products.
With this release, Oracle Cluster Registry (OCR) and voting disks can be placed on Oracle
Automatic Storage Management (Oracle ASM).
This feature enables Oracle ASM to provide a unified storage solution, storing all the data
for the clusterware and the database, without the need for third-party volume managers or
cluster filesystems. For new installations, OCR and voting disk files can be placed either on
Oracle ASM or on a cluster file system or NFS system. Installing Oracle Clusterware files on
raw or block devices is no longer supported, unless an existing system is being upgraded.
The fixup script is generated during installation. You are prompted to run the script as root
in a separate terminal session. When you run the script, it raises kernel values to required
minimums, if necessary, and completes other operating system configuration tasks. You
also can have Cluster Verification Utility (CVU) generate fixup scripts before installation.
In the past, adding or removing servers in a cluster required extensive manual preparation.
With this release, you can continue to configure server nodes manually or use Grid Plug and
Play to configure them dynamically as nodes are added or removed from the cluster.
Grid Plug and Play reduces the costs of installing, configuring, and managing server nodes
by starting a grid naming service within the cluster to allow each node to perform the
following tasks dynamically:
■ Configuring or reconfiguring itself using profile data, making host names and addresses
resolvable on the network
Because servers perform these tasks dynamically, the number of steps required to add or
delete nodes is minimized.
Oracle Clusterware 11g release 2 (11.2) replaces the oprocd and hangcheck processes with
the Cluster Synchronization Service daemon agent and monitor to provide more accurate
recognition of hangs and to avoid false terminations.
If IPMI is configured, then Oracle Clusterware uses IPMI when node fencing is required and
the server is not responding.
With this release, the Single Client Access Name (SCAN) is the host name to provide for all
clients connecting to the cluster. The SCAN is a domain name registered to at
least one and up to three IP addresses, either in the domain name service (DNS) or the Grid
Naming Service (GNS).
The primary benefit of a Single Client Access Name (SCAN) is not having to update client
connection information (such as TNSNAMES.ora) every time you add or remove nodes from
an existing RAC cluster.
Clients use a simple EZconnect string and JDBC connections can use a JDBC thin URL to
access the database, which is done independently of the physical hosts that the database
instances are running on. Additionally, SCAN automatically provides both failover and load
balancing of connects, where the new connection will be directed to the least busy instance
in the cluster by default.
It should be noted here that because EZconnect is used with SCAN, the SQLNET.ora file
should include EZconnect as one of the naming methods, for example:
NAMES.DIRECTORY_PATH=(tnsnames,ezconnect,ldap)
sqlplus user/pass@mydb-scan:1521/myservice
jdbc:oracle:thin:@mydb-scan:1521/myservice
It's highly recommended that the clients are Oracle 11g R2 clients, to allow them to fully
take advantage of the failover with the SCAN settings.
The TNSNAMES.ora file now references the SCAN rather than the VIPs, as was done in
previous versions. This is what a TNSNAMES entry would look like:
MYDB =
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=mydb-scan.ORACLE.COM)(PORT=1521))
    (CONNECT_DATA=(SERVICE_NAME=myservice.ORACLE.COM)))
There are two methods available for defining the SCAN. These are to use your corporate
DNS to define the SCAN; the second option is to use Grid Naming Service.
To use the DNS method for defining your SCAN, the network administrator must create a
single name that resolves to three separate IP addresses using a round-robin algorithm.
Regardless of how many systems are part of your cluster, Oracle recommends that 3 IP
addresses are configured to allow for failover and load balancing.
It is important that the IP addresses are on the same subnet as the public network for the
server. The other two requirements are that the name (not including the domain suffix) is
15 characters or less in length and that the name can be resolved without using the domain
suffix. Also, the IP addresses should not be specifically assigned to any of the nodes in the
cluster.
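The naming requirement above (15 characters or fewer, excluding the domain suffix) is easy to check mechanically. An illustrative sketch, with made-up SCAN names:

```python
# Quick check of the SCAN naming rule quoted above: the short name
# (without the domain suffix) must be 1 to 15 characters long.
# Names below are placeholders.

def scan_name_ok(fqdn: str) -> bool:
    short = fqdn.split(".", 1)[0]   # strip the domain suffix
    return 0 < len(short) <= 15

print(scan_name_ok("mydb-scan.example.com"))              # True  (9 chars)
print(scan_name_ok("a-very-long-scan-name.example.com"))  # False (21 chars)
```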
You can test the DNS setup by running an nslookup on the scan name two or more times.
Each time, the IP addresses should be returned in a different order:
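The round-robin behaviour that the nslookup test checks for can be mimicked with a rotating list. A minimal sketch, using placeholder documentation-range IPs rather than real addresses:

```python
from itertools import islice

# Mimic DNS round-robin for a SCAN name (placeholder IPs): each lookup
# returns the same three addresses, rotated by one position, which is the
# behaviour the repeated nslookup test above is meant to confirm.

def round_robin_resolver(addresses):
    order = list(addresses)
    while True:
        yield list(order)
        order = order[1:] + order[:1]    # rotate for the next lookup

resolver = round_robin_resolver(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for answer in islice(resolver, 3):
    print(answer)   # the same three IPs, in a different order each time
```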
Using GNS assumes that a DHCP server is running on the public network with enough
available addresses to assign the required IP addresses and the SCAN VIP. Only one static
IP address is required to be configured and it should be in the DNS domain.
The database will register each instance to the scan listener using the REMOTE_LISTENER
parameter in the spfile. Oracle 11g R2 RAC databases will only register with the SCAN
listeners. Upgraded databases, however, will continue to register with the local listener as
well as the SCAN listener via the REMOTE_LISTENER parameter. The LOCAL_LISTENER
parameter would be set to the node VIP for upgraded systems.
The REMOTE_LISTENER parameter, rather than being set to an alias that would be in a
server-side TNSNAMES file (as it has been in previous versions), is set simply to the
SCAN entry. The ALTER command would be:
SOURCE:
http://docs.oracle.com/cd/E11882_01/install.112/e48195/undrstnd.htm#RIWIN610
An Oracle Database 11g release 2 (11.2) database service automatically registers with the
listeners specified in the LOCAL_LISTENER and REMOTE_LISTENER parameters. During
registration, PMON sends information such as the service name, instance names, and
workload information to the listeners. This feature is called service registration.
Services coordinate their sessions by registering their workload, or the amount of work they
are currently handling, with the local listener and the SCAN listeners. Clients are redirected
by the SCAN listener to a local listener on the least-loaded node that is running the instance
for a particular service. This feature is called load balancing. The local listener either directs
the client to a dispatcher process (if the database was configured for shared server), or
directs the client to a dedicated server process.
When a listener starts after the Oracle instance starts, and the listener is available for
service registration, registration does not occur until the next time the Oracle Database
process monitor (PMON) starts its discovery routine. By default, the PMON discovery routine
is started every 60 seconds. To override the 60-second delay, use the SQL statement ALTER
SYSTEM REGISTER. This statement forces PMON to register the service immediately.
Local Listeners
- Starting with Oracle Database 11g release 2 (11.2), the local listener, or default
listener, is located in the Grid home when you have Oracle Grid Infrastructure installed,
in the Grid_home\network\admin directory.
- With Oracle Clusterware 11g release 2 and later, the listener association no longer requires
tnsnames.ora file entries. The listener associations are configured as follows:
- DBCA no longer sets the LOCAL_LISTENER parameter. The Oracle Clusterware agent
that starts the database sets the LOCAL_LISTENER parameter dynamically, and it sets it to
the actual value, not an alias. So listener_alias entries are no longer needed in the
tnsnames.ora file.
Three SCAN addresses are configured for the cluster, and allocated to servers. When a
client issues a connection request using SCAN, the three SCAN addresses are returned to
the client. If the first address fails, then the connection request to the SCAN name fails over
to the next address. Using multiple addresses allows a client to connect to an instance of
the database even if the initial instance has failed.
The net service name does not need to know the physical address of the server on which
the database, database instance, or listener runs. SCAN is resolved by DNS, which returns
three IP addresses to the client. The client then tries each address in succession until a
connection is made.
Understanding SCAN
SCAN is a fully qualified name (host name.domain name) that is configured to resolve to all
the addresses allocated for the SCAN listeners.
SCAN is configured in DNS to resolve to three IP addresses, and DNS should return the
addresses using a round-robin algorithm. This means that when SCAN is resolved by DNS,
the IP addresses are returned to the client in a different order each time.
- Based on the environment, the following actions occur when you use SCAN to connect to
an Oracle RAC database using a service name.
1. The PMON process of each instance registers the database services with the default
listener on the local node and with each SCAN listener, which is specified by the
REMOTE_LISTENER database parameter.
2. The listeners are dynamically updated with the amount of work being handled by the
instances and dispatchers.
The client issues a database connection request using a connect descriptor of the
form:
orausr/@scan_name:1521/sales.example.com
Note:
If you use the Easy Connect naming method, then ensure the sqlnet.ora file on the client
contains EZCONNECT in the list of naming methods specified by
the NAMES.DIRECTORY_PATH parameter.
3. The client uses DNS to resolve scan_name. After DNS returns the three addresses
assigned to SCAN, the client sends a connect request to the first IP address. If the connect
request fails, then the client attempts to connect using the next IP address.
4. When the connect request is successful, the client connects to a SCAN listener for the
cluster which hosts the sales database. The SCAN listener compares the workload of the
instances sales1 and sales2 and the workload of the nodes on which they are running.
Because node2 is less loaded than node1, the SCAN listener selects node2 and sends the
address for the listener on that node back to the client.
The client connects to the local listener on node2. The local listener starts a dedicated
server process for the connection to the database. The client connects directly to the
dedicated server process on node2 and accesses the sales2 database instance.
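The load-balancing choice in step 4 boils down to picking the node with the smallest reported workload. An illustrative sketch; the node names and load numbers are invented:

```python
# Sketch of the SCAN listener's decision in step 4 (illustrative only):
# pick the registered node with the lowest reported workload metric.

def pick_least_loaded(node_loads: dict) -> str:
    """node_loads maps node name -> current workload metric."""
    return min(node_loads, key=node_loads.get)

# node2 is less loaded than node1, so the SCAN listener hands the client
# the address of the local listener on node2.
print(pick_least_loaded({"node1": 75, "node2": 30}))  # node2
```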
With this release, you can use the server control utility SRVCTL to shut down all Oracle
software running within an Oracle home, in preparation for patching. Oracle Grid
Infrastructure patching is automated across all nodes, and patches can be applied in a
multi-node, multi-patch fashion.
To streamline cluster installations, especially for those customers who are new to clustering,
Oracle introduces the Typical Installation path. Typical installation defaults as many options
as possible to those recommended as best practices.
The GCS and GES processes, and the GRD collaborate to enable Cache Fusion. The Oracle
RAC processes and their identifiers are as follows:
In an Oracle RAC environment, the ACMS per-instance process is an agent that contributes
to ensuring that a distributed SGA memory update is either globally committed on success
or globally aborted if a failure occurs.
The GTX0-j process provides transparent support for XA global transactions in an Oracle
RAC environment. The database autotunes the number of these processes based on the
workload of XA global transactions.
The LMD process manages incoming remote resource requests within each instance.
The LMON process manages the GES, it maintains consistency of GCS memory structure in
case of process death. It is also responsible for cluster reconfiguration and locks
reconfiguration (node joining or leaving), it checks for instance deaths and listens for local
messaging. A detailed log file is created that tracks any reconfigurations that have
happened.
This is the Cache Fusion part and the most active process; it handles the consistent copies of
blocks that are transferred between instances.
It receives requests from LMD to perform lock requests. It rolls back any uncommitted
transactions.
There can be up to ten LMS processes running, and they can be started dynamically if demand
requires it.
They manage lock manager service requests for GCS resources and send them to a service
queue to be handled by the LMSn process.
It also handles global deadlock detection and monitors for lock conversion timeouts.
As a performance gain, you can increase this process's priority to make sure CPU starvation
does not occur.
You can see the statistics of this daemon by looking at the view X$KJMSDP
The LCK0 process manages non-Cache Fusion resource requests such as library and row
cache requests.
The RMSn processes perform manageability tasks for Oracle RAC. Tasks accomplished by an
RMSn process include creation of resources related to Oracle RAC when new instances are
added to the cluster.
RSMN: Remote Slave Monitor manages background slave process creation and
communication on remote instances. These background slave processes perform tasks on
behalf of a coordinating process running in another instance.
This is a lightweight process; it uses the DIAG framework to monitor the health of the
cluster. It captures information for later diagnosis in the event of failures, and it performs
any necessary recovery.
============================================
Cluster Verification Utility (CVU) command to verify OCR integrity of all of the nodes in your
cluster database:
cluvfy comp ocr -n all -verbose
List the nodes in your cluster by running the following command on one node:
olsnodes
$ocrconfig -showbackup
Synopsis:
ocrconfig [option]
option:
[-local] -export <filename>
Run the following command to inspect the contents and verify the integrity of the backup
file:
ocrcheck
Run ocrcheck and if the command returns a failure message, then both the primary OCR
and the OCR mirror have failed.
The OCRCHECK utility displays the version of the OCR's block format, total space available
and used space, OCRID, and the OCR locations that you have configured. OCRCHECK
performs a block-by-block checksum operation for all of the blocks in all of the OCRs that
you have configured. It also returns an individual status for each file as well as a result for
the overall OCR integrity check.
$./ocrcheck -local
Version : 3
ID : 814444380
Run the following command to inspect the contents and verify the integrity of the backup
file:
$ ocrdump -backupfile file_name
The number of voting files you can store in a particular Oracle ASM disk group depends
upon the redundancy of the disk group.
· External redundancy: A disk group with external redundancy can store only one
voting disk
· Normal redundancy: A disk group with normal redundancy stores three voting disks
· High redundancy: A disk group with high redundancy stores five voting disks
To migrate voting disks to Oracle ASM, specify the Oracle ASM disk group name in the
following command:
$ crsctl replace votedisk +asm_disk_group
In Oracle Clusterware 11g release 2 (11.2), you no longer have to back up the voting disk.
The voting disk data is automatically backed up in OCR as part of any configuration change
and is automatically restored to any voting disk that is added.
If all of the voting disks are corrupted, however, you can restore them as follows:
I. Restore OCR: This step is necessary only if OCR is also corrupted or otherwise
unavailable,
such as if OCR is on Oracle ASM and the disk group is no longer available.
If a resource fails, then before attempting to restore OCR, restart the resource.
As a definitive verification that OCR failed, run ocrcheck; if the command returns a
failure message, then both the primary OCR and the OCR mirror have failed. Attempt to
correct the problem using the OCRCONFIG utility.
$ olsnodes
2. Stop Oracle Clusterware by running the following command as root on all of the nodes:
# crsctl stop crs
If the preceding command returns any error due to OCR corruption, stop Oracle Clusterware
forcibly by running the following command as root on all of the nodes:
# crsctl stop crs -f
3. If you are restoring OCR to a cluster file system or network file system, then run the
following command as root to restore OCR with an OCR backup that you can identify in
"Listing Backup Files":
# ocrconfig -restore file_name
4. Start the Oracle Clusterware stack on one node in exclusive mode by running the
following command as root:
# crsctl start crs -excl -nocrs
The -nocrs option ensures that the crsd process and OCR do not start with the rest of the
Oracle Clusterware stack.
Check whether crsd is running. If it is, then stop it by running the following command as
root:
# crsctl stop resource ora.crsd -init
Caution:
Do not use the -init flag with any other command.
5. If you want to restore OCR to an Oracle ASM disk group, then you must first create a disk
group using SQL*Plus that has the same name as the disk group you want to restore and
mount it on the local node.
If you cannot mount the disk group locally, then run the following SQL*Plus command:
Optionally, if you want to restore OCR to a raw device, then you must run the ocrconfig -
repair -replace command as root, assuming that you have all the necessary permissions on
all nodes to do so and that OCR was not previously on Oracle ASM.
6. Restore OCR with an OCR backup that you can identify in "Listing Backup Files" by
running the following command as root:
# ocrconfig -restore file_name
Notes:
Ensure that the OCR devices that you specify in the OCR configuration exist and that these
OCR devices are valid.
If you configured OCR in an Oracle ASM disk group, then ensure that the Oracle ASM disk
group exists and is mounted.
See Also:
Oracle Grid Infrastructure Installation Guide for information about creating OCRs
Oracle Automatic Storage Management Administrator's Guide for more information about
Oracle ASM disk group management
# ocrcheck
8. Stop Oracle Clusterware on the node where it is running in exclusive mode:
# crsctl stop crs -f
9. Run the ocrconfig -repair -replace command as root on all the nodes in the cluster where
you did not run the ocrconfig -restore command. For example, if you ran the ocrconfig -restore
command on node 1 of a four-node cluster, then you must run the ocrconfig -repair -replace
command on nodes 2, 3, and 4.
10. Begin to start Oracle Clusterware by running the following command as root on all of
the nodes:
# crsctl start crs
11. Verify OCR integrity of all of the cluster nodes that are configured as part of your cluster
by running the following CVU command:
$ cluvfy comp ocr -n all -verbose
II. Run the following command as root from only one node to start the Oracle Clusterware
stack in exclusive mode:
# crsctl start crs -excl
III. Run the crsctl query css votedisk command to retrieve the list of voting files currently
defined, similar to the following:
This list may be empty if all voting disks were corrupted, or may have entries that are
marked as status 3 or OFF.
IV. Depending on where you store your voting files, do one of the following:
If the voting disks are stored in Oracle ASM, then run the following command to migrate the
voting disks to the Oracle ASM disk group you specify:
$ crsctl replace votedisk +asm_disk_group
The Oracle ASM disk group to which you migrate the voting files must exist in Oracle ASM.
You can use this
command whether the voting disks were stored in Oracle ASM or some other storage
device.
If you did not store voting disks in Oracle ASM, then run the following command using the
File Universal Identifier (FUID) obtained in the previous step:
$ crsctl delete css votedisk FUID
Note:
If the Oracle Clusterware stack is running in exclusive mode, then use the -f option to force
the shutdown of the stack.
Voting Files stored in ASM - How many disks per disk group do I need?
If Voting Files are stored in ASM, the ASM disk group that hosts the Voting Files will place
the appropriate number of Voting Files in accordance with the redundancy level. Once Voting
Files are managed in ASM, a manual addition, deletion, or replacement of Voting Files will
fail, since users are not allowed to manually manage Voting Files in ASM.
If the redundancy level of the disk group is set to "external", 1 Voting File is used.
If the redundancy level of the disk group is set to "normal", 3 Voting Files are used.
If the redundancy level of the disk group is set to "high", 5 Voting Files are used.
Note that Oracle Clusterware stores the Voting Files on individual disks within the disk
group. Oracle Clusterware does not rely on ASM to access the Voting Files.
In addition, note that there can be only one Voting File per failure group. In the above list of
rules, it is assumed that each disk that is supposed to hold a Voting File resides in its own,
dedicated failure group.
In other words, a disk group that is supposed to hold the above mentioned number of
Voting Files needs to have the respective number of failure groups with at least one disk. (1
/ 3 / 5 failure groups with at least one disk)
Consequently, a normal redundancy ASM disk group, which is supposed to hold Voting Files,
requires 3 disks in separate failure groups, while a normal redundancy ASM disk group that
is not used to store Voting Files requires only 2 disks in separate failure groups.
If you lose half or more of your voting disks, then nodes get evicted from the cluster,
or nodes kick themselves out of the cluster. This does not cause database corruption.
Alternatively you can use external redundancy which means you are providing redundancy
at the storage level using RAID.
For this reason, when using Oracle-provided redundancy for your voting disks, Oracle
recommends that customers use 3 or more voting disks in Oracle RAC 10g Release 2. Note:
for best availability, the 3 voting files should be on physically separate disks. An odd
number is recommended because 4 disks are no more highly available than 3: a strict
majority of 3 disks is 2, and a strict majority of 4 disks is 3, so once we lose 2 disks the
cluster fails whether we started with 3 voting disks or 4.
Restoring corrupted voting disks is easy since there isn't any significant persistent data
stored in the voting disk. See the Oracle Clusterware Admin and Deployment Guide for
information on backup and restore of voting disks.
An odd number of voting disks is required for proper clusterware configuration. A node must
be able to strictly access more than half of the voting disks at any time. So, in order to
tolerate a failure of n voting disks, there must be at least 2n+1 configured. (n=1 means 3
voting disks).
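The 2n+1 rule above is simple arithmetic; a minimal sketch of it in shell (the function names are mine, purely for illustration):

```shell
# Minimum number of voting disks needed to tolerate n simultaneous
# voting-disk failures: a strict majority must survive, hence 2n+1.
required_disks() {
  echo $(( 2 * $1 + 1 ))
}

# Strict majority among d voting disks: floor(d/2) + 1 disks.
majority() {
  echo $(( $1 / 2 + 1 ))
}
```

For example, `required_disks 1` gives 3, and `majority 4` gives 3: with 4 disks, losing 2 already breaks the majority, exactly as with 3 disks, so the fourth disk buys nothing.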
The odd number of voting disks should be configured to provide a method to determine who
in the cluster should survive.
A node must be able to access more than half of the voting disks at any time. For example,
consider a two-node cluster with an even number of voting disks, say 2. Suppose Node1 can
access only voting disk 1 and Node2 can access only voting disk 2. Then there is no common
file where the clusterware can check the heartbeat of both nodes. With 3 voting disks, if
both nodes can access more than half, i.e. 2, of the voting disks, there is at least one disk
accessible by both nodes, and the clusterware can use that disk to check the heartbeat of
both nodes. Hence each node must be able to access more than half the number of voting
disks. A node that cannot do so is evicted from the cluster by a node that can, to
maintain the integrity of the cluster. After the cause of the failure has been corrected and
access to the voting disks has been restored, you can instruct Oracle Clusterware to recover
the failed node and restore it to the cluster.
Loss of more than half your voting disks will cause the entire cluster to fail!!
Node-ids are assigned to the nodes in the same order as the nodes join the cluster.
Hence, normally the node which joins the cluster first is the master node.
- The CRSd process on the master node is responsible for initiating the OCR backup as per
the backup policy
- The master node is also responsible for syncing the OCR cache across the nodes
- The CRSd process on the master node reads from and writes to the OCR on disk
- In case of node eviction, the cluster is divided into two sub-clusters. The sub-cluster
containing the fewer number of nodes is evicted. If both sub-clusters have the same number
of nodes, the sub-cluster containing the master node survives and the other sub-cluster is
evicted.
In RAC, every data block is mastered by an instance. Mastering a block simply means that
the master instance keeps track of the state of the block until the next reconfiguration
event.
– Manually
– Resource affinity
– Instance crash
- Method – I gets info about master node from v$gcspfmaster_info using data_object_id
- Method – II gets info about master node from v$dlm_ress and v$ges_enqueue using
resource name in hexadecimal format
- Method – III gets info about master node from x$kjbl with x$le using resource name in
hexadecimal format
– CURRENT SCENARIO -
- 3 node setup
— SETUP –
select owner, data_object_id, object_name from dba_objects where owner = 'SCOTT' and
object_name = 'EMP1';
For Method-II and Method-III, we need to find out file_id and block_id and hence GCS
resource name in hexadecimal format
select dbms_rowid.rowid_relative_fno(rowid) FILE_ID,
       min(dbms_rowid.rowid_block_number(rowid)) MIN_BLOCK_ID,
       max(dbms_rowid.rowid_block_number(rowid)) MAX_BLOCK_ID
from scott.emp1
group by dbms_rowid.rowid_relative_fno(rowid);
4 523 523
– Find the GCS resource name to be used in the query using the block_id and data_object_id
retrieved above.
The hex name will be used to query the resource master using v$dlm_ress, v$ges_enqueue,
x$kjbl and x$le:
select b.kjblname HEXNAME, b.kjblname2 RESOURCE_NAME
from x$le a, x$kjbl b
where a.le_kjbl = b.kjbllockp
and a.le_addr = (select le_addr
                 from x$bh
                 where dbablk = 523
                 and obj = 74652
                 and class = 1
                 and state <> 3);
HEXNAME RESOURCE_NAME
————————- —————
[0x20b][0x4],[BL] 523,4,BL
Method – I gets info about master node from v$gcspfmaster_info using data_object_id
where o.data_object_id=74652
OBJECT_NAM CURRENT_MASTER
———- ————–
EMP1 0
Method – II gets info about master node from v$dlm_ress and v$ges_enqueue
RESOURCE_NAME MASTER_NODE
———————- ———–
[0x20b][0x4],[BL] 0
Method – III gets info about master node from x$kjbl with x$le
– This SQL joins x$kjbl with x$le to retrieve resource master for a block
from x$kjbl
) kj, x$le le
order by le.le_addr;
[0x20b][0x4],[BL] 523,4,BL 0
For the OCR tools (OCRDUMP, OCRCHECK, OCRCONFIG) record log information in the
following location:
CRS_Home/log/hostname/client
To change the amount of logging, edit the path in the CRS_home/srvm/admin/ocrlog.ini file.
CRS_home/log/hostname/crsd
The Oracle RAC high availability trace files are located in the following two locations:
CRS_home/log/hostname/racg
$ORACLE_HOME/log/hostname/racg
Each RACG executable has a subdirectory assigned exclusively for that executable.
The name of the RACG executable subdirectory is the same as the name of the executable.
cd /oracrs/oracle/product/112/bin
./crsctl start crs
SQL>startup mount;
2. Review the ConvertToRAC.xml file, and modify the parameters as required for your
system. The XML sample file contains comment lines that provide instructions for how to
configure the file.
When you have completed making changes, save the file with the syntax filename.xml.
Make a note of the name you select.
3. Navigate to the directory $ORACLE_HOME/bin, and use the following syntax to run the
command rconfig: rconfig input.xml
Where input.xml is the name of the XML input file you configured in step 2.
For example, if you create an input XML file called convert.xml, then enter the following
command
$./rconfig convert.xml
Note:
The Convert verify option in the ConvertToRAC.xml file has three options:
Convert verify="YES": rconfig performs checks to ensure that the prerequisites for single-
instance to Oracle RAC conversion have been met before it starts conversion
Convert verify="NO": rconfig does not perform prerequisite checks, and starts conversion
Convert verify="ONLY": rconfig only performs prerequisite checks; it does not start
conversion after completing the prerequisite checks
If the conversion fails, then use the following procedure to recover and reattempt the
conversion:
· Attempt to delete the database using the DBCA delete database option.
· Review the conversion log, and fix any problems it reports that may have caused the
conversion failure. The rconfig log files are under the rconfig directory in
$ORACLE_BASE/cfgtoollogs.
root@hrdevdb2 /oracrs/app/11.2.0.3/grid/network/admin
=>/oracrs/app/11.2.0.3/grid/bin/crsctl query css votedisk
Version : 3
ID : 38393080
root@hrdevdb2 /oracrs/app/11.2.0.3/grid/network/admin
=>/oracrs/app/11.2.0.3/grid/bin/ocrcheck
Version : 3
ID : 38393080
A current read is one where a session reads the current value of the data block from
another instance’s Data Buffer Cache. This current value contains the most up-to-date
committed data. The current read would happen when a second instance needs a data block
that has not been changed. This is often thought of as a read/read situation. The current
read will be seen as any wait event that starts with gc current.
Consistent Read
3. Then checks the UNDO segment to see if the transaction has been committed or not
4. If the transaction has been committed, it creates the REDO records and reads the
block
5. If the transaction has not been committed, it creates a CR block for itself using the
UNDO/ROLLBACK information.
6. Creating a CR image in RAC is a bit different and can come with some I/O overhead.
This is because the UNDO could be spread across instances, so to build a CR copy of
the block the instance might have to visit UNDO segments on other instances and hence
perform extra I/O.
As you said, the voting disk and OCR reside in ASM disk groups, but as per the startup
sequence OCSSD starts before ASM. How is this possible?
How does OCSSD start if the voting disk and OCR reside in ASM disk groups?
You might wonder how CSSD, which is required to start the clustered ASM instance, can be
started if voting disks are stored in ASM? This sounds like a chicken-and-egg problem:
without access to the voting disks there is no CSS, hence the node cannot join the cluster.
But without being part of the cluster, CSSD cannot start the ASM instance. To solve this
problem the ASM disk headers have new metadata in 11.2: you can use kfed to read the
header of an ASM disk containing a voting disk. The kfdhdb.vfstart and kfdhdb.vfend fields
tell CSS where to find the voting file. This does not require the ASM instance to be up. Once
the voting disks are located, CSS can access them and the node joins the cluster.
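A hedged illustration of this (the device path is a placeholder for a real ASM disk): on a node with Grid Infrastructure installed, kfed can read the ASM disk header directly, without the ASM instance being up, and show where the voting file lives on that disk.

```shell
# Read the ASM disk header directly from the device;
# kfdhdb.vfstart / kfdhdb.vfend bracket the voting file's location.
kfed read /dev/asmdisk1 | grep -E 'kfdhdb\.vfstart|kfdhdb\.vfend'
```

A nonzero vfstart indicates that this particular disk holds a voting file.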
In a RAC environment, it is the shipping of data blocks across the interconnect from remote
database caches (SGA) to the local node, in order to fulfill the requirements of a
transaction (DML, query of the data dictionary).
When database nodes in a cluster are unable to communicate with each other, they may
continue to process and modify the data blocks independently. If the
same block is modified by more than one instance, synchronization/locking of the data
blocks does not take place and blocks may be overwritten by others in the cluster. This
state is called split brain.
When an instance crashes in a single-node database, crash recovery takes place on startup.
In a RAC environment the same recovery for a failed instance is performed by the surviving
nodes; this is called instance recovery.
It is a private network which is used to ship data blocks from one instance to another for
cache fusion. The physical data blocks as well as data dictionary blocks are shared across
this interconnect.
How do you determine what protocol is being used for Interconnect traffic?
One of the ways is to look at the database alert log for the time period when the database
was started up.
What methods are available to keep the time synchronized on all nodes in the cluster?
Either the Network Time Protocol(NTP) can be configured or in 11gr2, Cluster Time
Synchronization Service (CTSS) can be used.
Spfiles, ControlFiles, Datafiles and Redolog files should be created on shared storage.
Where does the Clusterware write when there is a network or Storage missed heartbeat?
The ocrconfig -showbackup can be run to find out the automatic and manually run backups.
You can use either the logical or the physical OCR backup copy to restore the Repository.
How do you find out what object has its blocks being shipped across the instances the most?
You can query v$segment_statistics (or gv$segment_statistics) for statistics such as 'gc cr
blocks received' and 'gc current blocks received' to identify the busiest objects on the
interconnect.
The VIP is an alternate Virtual IP address assigned to each node in a cluster. During a node
failure the VIP of the failed node moves to the surviving node and relays to the application
that the node has gone down. Without VIP, the application will wait for TCP timeout and
then find out that the session is no longer live due to the failure.
You can query the V$ACTIVE_INSTANCES view to determine the member instances of the
RAC cluster.
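A hedged example of that query (column names as per the documented view):

```sql
-- Lists the instances currently active in the cluster
SELECT inst_number, inst_name
FROM   v$active_instances;
```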
What is OCLUMON used for in a cluster environment?
The Cluster Health Monitor (CHM) stores operating system metrics in the CHM repository for
all nodes in a RAC cluster. It stores information on CPU, memory, process, network and
other OS data. This information can later be retrieved and used to troubleshoot and identify
any cluster-related issues. It is a default component of the 11gR2 grid install. The data is
stored in the master repository and replicated to a standby repository on a different node.
What would be the possible performance impact in a cluster if a less powerful node (e.g.
slower CPU’s) is added to the cluster?
All processing will slow down to the CPU speed of the slowest server.
The Oracle Local Registry (OLR) contains the information that allows the cluster processes
to be started up while the OCR is in the ASM storage system. Since the ASM file system is
unavailable until the Grid processes are started, a local copy of the contents of the OCR is
required; this is stored in the OLR.
In 10g the default SGA size is 1G in 11g it is set to 256M and in 12c ASM it is set back to
1G.
Datafiles
Redo logfiles
Spfiles
In 12c the files below can also now be stored in the ASM Diskgroup
Password file
ASM_POWER_LIMIT is the parameter that controls the number of allocation units the ASM
instance will try to rebalance at any given time. In ASM versions before 11.2.0.2 the
maximum value is 11; in later versions it can be raised as high as 1024.
A patch is considered rolling if it can be applied to the cluster binaries without having to
shut down the database across the whole RAC environment. All nodes in the cluster are
patched one by one, with only the node currently being patched unavailable while all
other instances remain open.
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCES
ACTIVE_INSTANCE_COUNT
UNDO_MANAGEMENT
The Grid software is becoming more and more capable of not just supporting HA for Oracle
Databases but also other applications including Oracle’s applications. With 12c there are
more features and functionality built-in and it is easier to deploy these pre-built solutions,
available for common Oracle applications.
Is there an easy way to verify the inventory for all remote nodes
You can run the opatch lsinventory -all_nodes command from a single node to look at the
inventory details for all nodes in the cluster.
GSDCTL stands for Global Service Daemon Control, we can use gsdctl commands to start,
stop, and obtain the status of the GSD service on any platform.
$ ORACLE_HOME/srvm/log/gsdaemon_node_name.log
What is RAC?
It is a clustering solution from Oracle Corporation that ensures high availability of databases
by providing instance failover, media failover features.
Oracle RAC is a cluster database with a shared cache architecture that overcomes the
limitations of traditional shared-nothing and shared-disk approaches to provide a highly
scalable and available database solution for all the business applications.
Oracle RAC One Node is a single instance running on one node of the cluster while the
second node is in cold standby mode. If the instance fails for some reason, RAC One Node
detects it and restarts the instance on the same node, or relocates the instance to the
second node in case there is a failure or fault on the first node. The benefit of this feature
is that it provides a cold failover solution and automates instance relocation without any
downtime or manual intervention. Oracle introduced this feature with the release of
11gR2 (available with Enterprise Edition).
Oracle Real Application clusters allows multiple instances to access a single database, the
instances will be running on multiple nodes.
Real Application Clusters coordinates each node's access to the shared data to provide
consistency and integrity.
Availability - nodes can be added or replaced without having to shutdown the database
Scalability - more nodes can be added to the cluster as the workload increases
Oracle RAC is composed of two or more instances. When a block of data is read from a
datafile by an instance within the cluster and another instance needs the same block, it is
cheaper to get the block image from the instance that has it in its SGA than to read it from
disk. To enable inter-instance communication, Oracle RAC makes use of the interconnect.
The Global Enqueue Service (GES) monitors the resources, and the Global Cache Service
(GCS) manages cache fusion.
What command would you use to check the availability of the RAC system?
crsctl status resource -t (or crs_stat -t in older releases) shows the state of all the cluster
resources, including the database, instances, listeners and VIPs.
The node with the lowest node number will become master node and dynamic remastering
of the resources will take place.
To find out the master node for particular resource, you can query v$ges_resource for
MASTER_NODE column.
To find out which is the master node, you can see ocssd.log file and search for "master node
number".
When the current master node fails, the surviving node with the lowest node number
becomes the new master node.
All datafiles, controlfiles, SPFILEs and redo log files must reside on cluster-aware shared
storage.
·OCFS2 and
3.Clusterware software
Basically the Oracle kernel needs to be relinked with the RAC option switched on when you
convert to RAC; that is the difference, as it enables the RAC background processes such as
LMON, LCK, LMD and LMS.
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on ioracle    # rebuild oracle with RAC enabled
or simply relink:
$ cd $ORACLE_HOME/bin
$ relink oracle
SAN (Storage Area Networks) - generally using fibre to connect to the SAN
NAS (Network Attached Storage) - generally using a network to connect to the NAS using
either NFS, ISCSI
The Clusterware software allows nodes to communicate with each other and forms the
cluster that makes the nodes work as a single logical server.
The software is run by the Cluster Ready Services (CRS) using the Oracle Cluster Registry
(OCR) that records and maintains the cluster and node membership information and the
voting disk which acts as a tiebreaker during communication failures. Consistent heartbeat
information travels across the interconnect to the voting disk when the cluster is running.
Real Application Clusters
Oracle RAC is a cluster database with a shared cache architecture that overcomes the
limitations of traditional shared-nothing and shared-disk approaches to provide a highly
scalable and available database solution for all your business applications. Oracle RAC
provides the foundation for enterprise grid computing.
Oracle’s Real Application Clusters (RAC) option supports the transparent deployment of a
single database across a cluster of servers, providing fault tolerance from hardware failures
or planned outages. Oracle RAC running on clusters provides Oracle’s highest level of
capability in terms of availability, scalability, and low-cost computing.
Cluster software. Oracle Clusterware or products like Veritas Volume Manager are required
to provide the cluster support, allowing each node to know which nodes belong to the
cluster and are available, and, with Oracle Clusterware, to know which nodes have failed
and to eject them from the cluster so that errors on those nodes can be cleared.
Oracle Clusterware has two key components Cluster Registry OCR and Voting Disk.
The cluster registry holds all information about nodes, instances, services and, if used, ASM
storage; it also contains state information, i.e. whether they are available and up.
The voting disk is used to determine if a node has failed, i.e. become separated from the
majority. If a node is deemed to no longer belong to the majority, it is forcibly rebooted
and will, after the reboot, add itself again to the surviving cluster nodes.
Oracle RAC uses the voting disk to manage cluster membership by way of a health check
and arbitrates cluster ownership among the instances in case of network failures. The voting
disk must reside on shared disk.
A node must be able to access more than half of the voting disks at any time.
For example, if you have 3 voting disks configured, then a node must be able to access at
least two of the voting disks at any time. If a node cannot access the minimum required
number of voting disks it is evicted, or removed, from the cluster.
The cluster registry holds all information about nodes, instances, services and, if used, ASM
storage; it also contains state information, i.e. whether they are available and up.
The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.
Can you add voting disk online? Do you need voting disk backup?
Yes, as per the documentation, if you have multiple voting disks you can add one online.
But if you have only one voting disk and it is lost, the cluster will be down; you then need
to start CRS in exclusive mode and add the voting disk using crsctl add css votedisk.
Oracle recommends using the dd command to back up the voting disk, with a minimum
block size of 4KB.
How do we backup voting disks?
1) Oracle recommends that you back up your voting disk after the initial cluster creation
and after we complete any node addition or deletion procedures.
2) First, as the root user, stop Oracle Clusterware (with the crsctl stop crs command) on all
nodes. Then determine the current voting disks with crsctl query css votedisk, and back
each one up:
dd if=voting_disk_name of=backup_file_name bs=4k
where,
backup_file_name is the name of the file to which we want to back up the voting disk
contents
We can verify the current backup of OCR using the following command : ocrconfig -
showbackup
The cluster stack will be down, due to the fact that cssd is unable to maintain integrity;
this is true in 10g. From 11gR2 onwards only the crsd stack will be down, while ohasd is
still up and running. You can add the OCR back by restoring the automatic backup or
importing a manual backup.
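A hedged sketch of the restore commands (the backup paths are placeholders; run as root with the stack down on all nodes):

```shell
# List the automatic and manual OCR backups known to the cluster
ocrconfig -showbackup

# Restore from an automatic physical backup...
ocrconfig -restore /u01/app/11.2.0/grid/cdata/mycluster/backup00.ocr

# ...or import a logical export taken earlier with 'ocrconfig -export'
ocrconfig -import /backups/ocr_export.dmp
```

Afterwards restart the stack with crsctl start crs and verify the repository with ocrcheck.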
In a RAC environment the buffer cache is global across all instances in the cluster and
hence the processing differs. The most common wait events related to this are gc cr
request and gc buffer busy.
GC CR request: the time it takes to retrieve the data from the remote cache.
Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned queries will
increase the amount of data blocks requested by an Oracle session. The more blocks
requested typically means the more often a block will need to be read from a remote
instance via the interconnect.)
GC BUFFER BUSY: It is the time the remote instance locally spends accessing the requested
data block.
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle
Clusterware Node evictions.
Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however sqlplus
can start it on both nodes? How do you identify the problem?
Set the environment variable SRVM_TRACE to TRUE and start the instance with srvctl.
You will then get a detailed error stack.
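A hedged sketch of that workflow (database and instance names are placeholders):

```shell
# Enable verbose tracing for the srvctl/SRVM tools
export SRVM_TRACE=TRUE

# Retry the failing operation; the output now includes the full error stack
srvctl start instance -d RACDB -i RACDB1
```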
What are Oracle Clusterware processes for 10g on Unix and Linux?
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as
the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be
a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application
process, and so on) based on the resource's configuration information that is stored in the
OCR. This includes start, stop, monitor and failover operations. This process runs as the root
user
Event manager daemon (evmd) —A background process that publishes events that crs
creates.
Process Monitor Daemon (OPROCD) — This process monitors the cluster and provides I/O
fencing. OPROCD performs its check, stops running, and if the wake-up is beyond the
expected time, OPROCD resets the processor and reboots the node. An OPROCD failure
results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on
Linux platforms.
Oracle RAC is composed of two or more database instances. They are composed of memory
structures and background processes, the same as a single-instance database. Oracle RAC
instances use two services, GES (Global Enqueue Service) and GCS (Global Cache Service),
that enable cache fusion. Oracle RAC instances are composed of the following background
processes:
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy
a query or transaction, Oracle RAC instances use two processes, the Global Cache Service
(GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the
statuses of each data file and each cached block using a Global Resource Directory (GRD).
The GRD contents are distributed across all of the active instances.
What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintain records of the
statuses of each datafile and each cached block using the Global Resource Directory. This
process is referred to as cache fusion and helps in data integrity.
What is ACMS?
ACMS stands for Atomic Controlfile to Memory Service. In an Oracle RAC environment ACMS
is an agent that ensures distributed SGA memory updates, i.e. SGA updates are globally
committed on success or globally aborted in the event of a failure.
What is SCAN listener?
A SCAN listener is additional to the node listeners; it listens for incoming database
connection requests from clients that arrive through the SCAN IP. It has endpoints
configured for the node listeners, and it routes each database connection request to a
particular node listener.
SCAN IP can be disabled if not required; however, SCAN IP is mandatory during the RAC
installation. Enabling/disabling SCAN IP is mostly used in Oracle Apps environments by the
Concurrent Manager (a kind of job scheduler in Oracle Apps). To disable it:
i. Stop the SCAN listeners (srvctl stop scan_listener)
ii. Stop SCAN (srvctl stop scan)
iii. Disable both (srvctl disable scan_listener, then srvctl disable scan)
VIP is all about availability of the application. When a node fails, the VIP fails over to
another node; this is the reason all applications should be based on VIPs, meaning the TNS
entries should use the VIP names in the host list.
An interconnect network is a private network that connects all of the servers in a cluster.
The interconnect network uses a switch/multiple switches that only the nodes in the cluster
can access.
· Configure User Datagram Protocol (UDP) on Gigabit Ethernet for cluster interconnects.
· On UNIX and Linux systems we use UDP and RDS (Reliable data socket) protocols to be
used by Oracle Clusterware.
Clusterware uses the private interconnect for cluster synchronization (network heartbeat)
and daemon communication between the clustered nodes. This communication is based
on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP).
Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches
of participating nodes in the cluster.
A virtual IP address or VIP is an alternate IP address that the client connections use instead
of the standard public IP address. To configure VIP address, we need to reserve a spare IP
address for each node, and the IP addresses must use the same subnet as the public
network.
If a node fails, then the node's VIP address fails over to another node on which the VIP
address can accept TCP connections but it cannot accept Oracle connections.
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP
timeout period (which can be up to 10 min) before getting an error. As a result, you don't
really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node
and new node re-arps the world indicating a new MAC address for the IP. Subsequent
packets sent to the VIP go to the new node, which will send error RST packets back to the
clients. This results in the clients getting errors immediately.
Give situations under which VIP address failover happens?
VIP address failover happens when the node on which the VIP address runs fails, when all
interfaces for the VIP address fail, or when all interfaces for the VIP address are
disconnected from the network.
When a VIP address failover happens, clients that attempt to connect to the VIP address
receive a rapid connection-refused error; they don't have to wait for TCP connection
timeout messages.
Applications should use the services feature to connect to the Oracle database. Services
enable us to define rules and characteristics to control how users and applications connect
to database instances.
The characteristics include a unique name, workload balancing, failover options, and high
availability.
Oracle Net Services enable the load balancing of application connections across all of the
instances in an Oracle RAC database.
Connection Workload management is one of the key aspects when you have RAC instances
as you want to distribute the connections to specific nodes/instance or those have less load.
1.Client Side load balancing (also called as connect time load balancing)
2.Server side load balancing (also called as Listener connection load balancing)
What is the difference between server-side and client-side connection load balancing?
Client-side load balancing happens at the client: the client randomizes connection attempts
across the listeners in its address list. In server-side load balancing, the listener uses the
load-balancing advisory to redirect connections to the instance providing the best service.
Client Side load balancing:- Oracle client side load balancing feature enables clients to
randomize the connection requests among all the available listeners based on their load.
A TNS entry that contains all node entries and uses LOAD_BALANCE=ON (on by default) will
use connect-time load balancing, i.e. client-side load balancing. For example (host names
are illustrative):
finance =
  (DESCRIPTION =
    (LOAD_BALANCE = yes)
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = finance)
    )
  )
Server-side load balancing: this improves connection performance by balancing the
number of active connections among multiple instances and dispatchers. In a single-
instance shared-server environment, the listener selects the least-loaded dispatcher to
handle the incoming client requests. In a RAC environment, PMON is aware of the load on
all instances and dispatchers, and depending on the load information PMON redirects the
connection to the least loaded node.
In a RAC environment, *.remote_listener parameter which is a tns entry containing all
nodes addresses need to set to enable the load balance advisory updates to PMON.
local_listener=LISTENER_MYRAC1
remote_listener = LISTENERS_MYRACDB
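A hedged sketch of the matching tnsnames.ora entries on each node (host names are placeholders):

```
LISTENER_MYRAC1 =
  (ADDRESS = (PROTOCOL = TCP)(HOST = myrac1-vip)(PORT = 1521))

LISTENERS_MYRACDB =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = myrac1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = myrac2-vip)(PORT = 1521)))
```

With remote_listener pointing at this alias, PMON on each instance registers with every listener in the cluster and can feed them load information.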
What are the administrative tools used for Oracle RAC environments?
Oracle RAC cluster can be administered as a single image using the below
· SQL*PLUS,
· SRVCTL,
· DBCA,
· NETCA
Also, from 11gR2 onwards, SRVCTL manages cluster resources like networks, VIPs, disks etc.
We need to stop and delete the instance on the node first, in interactive or silent mode.
After that, ASM can be removed using the srvctl tool as follows:
We can verify if ASM has been removed by issuing the following command:
How do we verify that an instance has been removed from OCR after deleting an instance?
cd CRS_HOME/bin
./crs_stat
What are the modes of deleting instances from Oracle Real Application Clusters databases?
We can delete instances using silent mode or interactive mode using DBCA (Database Configuration Assistant).
What are the background processes that exist in 11gR2, and what is their functionality?
crsd •The CRS daemon (crsd) manages cluster resources based on configuration
information that is stored in Oracle Cluster Registry (OCR) for each resource. This includes
start, stop, monitor, and failover operations. The crsd process generates events when the
status of a resource changes.
diskmon •Disk Monitor daemon (diskmon): Monitors and performs input/output fencing
for Oracle Exadata Storage Server. As Exadata storage can be added to any Oracle RAC
node at any point in time, the diskmon daemon is always started when ocssd is started.
evmd •Event Manager (EVM): Is a background process that publishes Oracle Clusterware
events
mdnsd •Multicast domain name service (mDNS): Allows DNS requests. The mDNS
process is a background process on Linux and UNIX, and a service on Windows.
gnsd •Oracle Grid Naming Service (GNS): Is a gateway between the cluster mDNS and
external DNS servers. The GNS process performs name resolution within the cluster.
oclskd •Cluster kill daemon (oclskd): Handles instance/node eviction requests that have
been escalated to CSS
gipcd •Grid IPC daemon (gipcd): Is a helper daemon for the communications
infrastructure
There is not much difference between 10g and 11gR1 RAC, but there is a significant difference in 11gR2.
Up to 11gR1, Clusterware mainly managed:
Databases
Instances
Applications
Node Monitoring
Event Services
High Availability
From 11gR2 onwards it is a complete HA stack, managing and providing the following
resources, like other cluster software such as VCS:
Databases
Instances
Applications
Cluster Management
Node Management
Event Services
High Availability
11gR2 removed the dependency on OS-level hang checkers such as the hangcheck timer; Clusterware manages this with its own additional monitor processes.
The hangcheck timer regularly checks the health of the system. If the system hangs or stops, the node is restarted automatically.
-> hangcheck_tick: this parameter defines the period of time between checks of system health. The default value is 60 seconds; Oracle recommends setting it to 30 seconds.
-> hangcheck_margin: this defines the maximum hang delay that should be tolerated before hangcheck-timer resets the RAC node.
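Loading the kernel module with these parameters is typically done at boot time; a sketch, using the commonly recommended values (the margin value is an assumption here, adjust for your platform and Oracle version):

```shell
# Load the hangcheck-timer kernel module with a 30s check interval
# and a 180s tolerated hang margin (illustrative values).
/sbin/modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
```

To make this persistent, the same line is usually added to the system's module configuration (e.g., /etc/modprobe.conf or an rc script, depending on the distribution).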
State the initialization parameters that must have same value for every instance in an
Oracle RAC database?
Some initialization parameters are critical at database creation time and must have the same values. Their value must be specified in the SPFILE or PFILE for every instance. The parameters that must be identical on every instance are listed below:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCES
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE (RDBMS or ASM)
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_PASSWORDFILE
UNDO_MANAGEMENT
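One way to confirm that a parameter really is identical across all instances is to query GV$PARAMETER; a sketch, shown here for db_block_size (any parameter name from the list above works the same way):

```sql
-- Returns one row per running instance; all VALUE entries should match.
SELECT inst_id, name, value
FROM   gv$parameter
WHERE  name = 'db_block_size'
ORDER  BY inst_id;
```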
-------------------------------------------------------------------------------------------------------
--------
What is RAC? What is the benefit of RAC over single instance database?
Benefits:
Improve throughput
High availability
Transparency
Availability - nodes can be added or replaced without having to shutdown the database
Scalability - more nodes can be added to the cluster as the workload increases
A virtual IP address or VIP is an alternate IP address that the client connections use instead
of the standard public IP address. To configure VIP address, we need to reserve a spare IP
address for each node, and the IP addresses must use the same subnet as the public
network.
If a node fails, then the node's VIP address fails over to another node, on which the VIP address can accept TCP connections but cannot accept Oracle connections.
VIP address failover happens when the node on which the VIP address runs fails, when all interfaces for the VIP address fail, or when all interfaces for the VIP address are disconnected from the network.
Using virtual IPs we avoid the TCP/IP timeout problem, because the Oracle Notification Service maintains communication between the nodes and listeners. When a VIP address failover happens, clients that attempt to connect to the VIP address receive a rapid connection-refused error; they do not have to wait for TCP connection timeout messages.
The Voting Disk is a file that sits in the shared storage area and must be accessible by all nodes in the cluster. All nodes in the cluster register their heartbeat information in the voting disk to confirm that they are all operational. If a node's heartbeat information is not available in the voting disk, that node will be evicted from the cluster. The CSS (Cluster Synchronization Services) daemon in the clusterware maintains the heartbeat of all nodes to the voting disk. When a node is not able to send its heartbeat to the voting disk, it will reboot itself, thus helping avoid the split-brain syndrome.
For high availability, Oracle recommends a minimum of three voting disks, and always an odd number (3 or greater).
Voting Disk - a file that resides on shared storage and manages cluster membership. The voting disk reassigns cluster ownership between the nodes in case of failure.
The voting disk files are used by Oracle Clusterware to determine which nodes are currently members of the cluster. They are also used in concert with other cluster components, such as CRS, to maintain the cluster's integrity.
Oracle Database 11g Release 2 provides the ability to store the voting disks in ASM along
with the OCR. Oracle Clusterware can access the OCR and the voting disks present in ASM
even if the ASM instance is down. As a result CSS can continue to maintain the Oracle
cluster even if the ASM instance has failed.
http://www.toadworld.com/KNOWLEDGE/KnowledgeXpertforOracle/tabid/648/TopicID/RACR2ARC6/Default.aspx
By default Oracle will create 3 voting disk files in ASM. Oracle expects that you will configure at least 3 voting disks for redundancy purposes. You should always configure an odd number of voting disks (3 or more), because loss of more than half of your voting disks will cause the entire cluster to fail.
You should plan on allocating 280MB for each voting disk file. For example, if you are using
ASM and external redundancy then you will need to allocate 280MB of disk for the voting
disk. If you are using ASM and normal redundancy you will need 560MB.
Oracle RAC is composed of two or more database instances. They are composed of memory structures and background processes, the same as a single-instance database. Oracle RAC instances use two processes, GES (Global Enqueue Service) and GCS (Global Cache Service), that enable Cache Fusion. Oracle RAC instances are composed of the following background processes:
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as
the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be
a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application
process, and so on) based on the resource's configuration information that is stored in the
OCR. This includes start, stop, monitor and failover operations. This process runs as the root
user
Event manager daemon (evmd) —A background process that publishes events that crs
creates.
Process Monitor Daemon (OPROCD) — This process monitors the cluster and provides I/O fencing. OPROCD performs its check, stops running, and if the wake-up is beyond the expected time, OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. On Linux platforms this role was historically filled by the hangcheck timer.
Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global
Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file
and each cached block using a Global Resource Directory (GRD). The GRD contents are
distributed across all of the active instances.
What is Cache Fusion?
Cache Fusion is the mechanism by which Oracle RAC ships data blocks directly between the buffer caches of different instances over the private interconnect, so an instance that needs a block cached elsewhere does not have to read it from disk. The GCS and GES coordinate this using the Global Resource Directory (GRD).
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to change if you add or remove nodes in the cluster.
SCAN provides a single domain name via DNS, allowing end-users to address a RAC cluster as if it were a single IP address. SCAN works by replacing a hostname or IP list with virtual IP addresses (VIPs).
SCAN provides a single name for all Oracle clients to connect to the cluster database, irrespective of the number of nodes and node locations. Previously, we had to keep adding address records to every client's tnsnames.ora whenever a node was added to or deleted from the cluster.
SCAN therefore eliminates the need to change the TNSNAMES entry when nodes are added to or removed from the cluster. RAC instances register with the SCAN listeners as remote listeners. Oracle recommends assigning 3 addresses to SCAN, which will create 3 SCAN listeners, even if the cluster has dozens of nodes. SCAN is a domain name registered to at least one and up to three IP addresses, either in DNS (Domain Name Service) or GNS (Grid Naming Service). The SCAN must resolve to at least one address on the public network. For high availability and scalability, Oracle recommends configuring the SCAN to resolve to three addresses.
http://www.freeoraclehelp.com/2011/12/scan-setup-for-oracle-11g-release211gr2.html
What are the SCAN components in a cluster?
1. SCAN name
2. SCAN VIPs (one per SCAN address, up to three)
3. SCAN listeners (one per SCAN VIP)
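With SCAN, a client connect string needs only the single SCAN name instead of one address per node. A sketch (the SCAN name and service name below are placeholders):

```
# myrac-scan.example.com and myservice are placeholder names.
myracdb =
 (DESCRIPTION =
  (ADDRESS = (PROTOCOL = TCP)(HOST = myrac-scan.example.com)(PORT = 1521))
  (CONNECT_DATA =
   (SERVICE_NAME = myservice)
  )
 )
```

This entry does not change when nodes are added to or removed from the cluster.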
What is TAF?
After an Oracle RAC node crashes (usually from a hardware failure), all new application transactions are automatically rerouted to a specified backup node. The challenge in rerouting is to not lose transactions that were "in flight" at the exact moment of the crash. One of the requirements of continuous availability is the ability to restart in-flight application work, allowing a failed node's sessions to resume processing on another server without interruption. Oracle's answer to application failover is an Oracle Net mechanism dubbed Transparent Application Failover (TAF). TAF allows the DBA to configure the type and method of failover for each Oracle Net client.
The TAF architecture offers the ability to resume work at either the query (SELECT) or session level.
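TAF is configured in the client's TNS entry via FAILOVER_MODE. A sketch with SELECT-type failover and the BASIC method (host names, service name, and the retry/delay values are placeholders, not taken from this document):

```
# node1-vip, node2-vip, and finance are placeholder names.
finance_taf =
 (DESCRIPTION =
  (LOAD_BALANCE = yes)
  (FAILOVER = on)
  (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
  (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
  (CONNECT_DATA =
   (SERVICE_NAME = finance)
   (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 5))
  )
 )
```

TYPE=SELECT lets in-progress queries resume on the surviving instance; TYPE=SESSION would re-establish only the session.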
1. External shared disk to store the Oracle Clusterware files (Voting Disk and Oracle Cluster Registry - OCR)
2. Two network cards on each clusterware node (and three sets of IP addresses):
Network Card 1 (with IP address set 1) for the public network
Network Card 2 (with IP address set 2) for the private network (for inter-node communication between RAC nodes, used by the clusterware and RAC database)
IP address set 3 for the Virtual IP (VIP) (used as a virtual IP address for client connections and for connection failover)
3. Storage options for OCR and Voting Disk - RAW, OCFS2 (Oracle Cluster File System), NFS, ...
What enables the load balancing of applications in RAC?
Oracle Net Services enable the load balancing of application connections across all of the
instances in an Oracle RAC database.
If you need to find the location of the OCR (Oracle Cluster Registry) but your CRS is down, look into the "ocr.loc" file; the location of this file depends on the OS:
On Linux: /etc/oracle/ocr.loc
On Solaris: /var/opt/oracle/ocr.loc
Set ASM environment or CRS environment then run the below command:
ocrcheck
How many IP addresses are required for a two-node RAC cluster?
Six - three sets of IP addresses:
## Public (e.g., eth1): 2
## Private (e.g., eth0): 2
## Virtual IPs: 2
The public IP address is the normal IP address, typically used by the DBA and SA to manage storage, the system, and the database. Public IP addresses are reserved for the Internet.
The private IP address is used only for internal cluster processing (Cache Fusion, also known as the interconnect). Private IP addresses are reserved for private networks.
The VIP is used by database applications to enable failover when one cluster node fails. The purpose of having a VIP is so that client connections can fail over to surviving nodes in case of failure.
Can an application developer access the private IP?
No. The private IP address is used only for internal cluster processing (Cache Fusion, also known as the interconnect).
Oracle RAC is a cluster database with a shared cache architecture that overcomes the
limitations of traditional shared-nothing and shared-disk approaches to provide a highly
scalable and available database solution for all your business applications. Oracle RAC
provides the foundation for enterprise grid computing.
Oracle’s Real Application Clusters (RAC) option supports the transparent deployment of a
single database across a cluster of servers, providing fault tolerance from hardware failures
or planned outages. Oracle RAC running on clusters provides Oracle’s highest level of
capability in terms of availability, scalability, and low-cost computing.
Cluster software. Oracle's Clusterware, or products like Veritas Volume Manager, is required to provide cluster support, allowing each node to know which nodes belong to the cluster and are available; with Oracle Clusterware it also detects which nodes have failed and ejects them from the cluster, so that errors on those nodes can be cleared.
Oracle Clusterware has two key components Cluster Registry OCR and Voting Disk.
The cluster registry holds all information about nodes, instances, services, and ASM storage (if used); it also contains state information, i.e., whether they are available and up.
The voting disk is used to determine whether a node has failed, i.e., become separated from the majority. If a node is deemed to no longer belong to the majority, it is forcibly rebooted and will, after the reboot, add itself again to the surviving cluster nodes.
What is FAN?
FAN (Fast Application Notification) is the RAC mechanism that publishes UP and DOWN events for instances, services, and nodes, so that clients and connection pools are notified of cluster state changes immediately instead of waiting for TCP timeouts.
GRD stands for Global Resource Directory. The GES and GCS maintain records of the statuses of each datafile and each cached block using the Global Resource Directory. This process is referred to as Cache Fusion and helps ensure data integrity.
Oracle RAC is composed of two or more instances. When a block of data has been read from a datafile by one instance within the cluster and another instance needs the same block, it is faster to ship the block image from the instance that has it in its SGA than to read it from disk. To enable this inter-instance communication, Oracle RAC makes use of the interconnect. The Global Enqueue Service (GES) monitors, and the instance enqueue process manages, Cache Fusion.
ACMS stands for Atomic Controlfile to Memory Service. In an Oracle RAC environment, ACMS is an agent that ensures distributed SGA memory updates, i.e., SGA updates are globally committed on success or globally aborted in the event of a failure.
8) What is clustering?
Clustering is the use of a group of independent servers (nodes) that work together and present themselves to applications as a single system, providing scalability and high availability.
LMD is the Global Enqueue Service daemon process (LMON monitors the global enqueue services). This process manages incoming remote resource requests within each instance; LMD0 in particular processes incoming enqueue request messages and controls access to global enqueues.
LMS is the Global Cache Service process. It maintains the statuses of datafiles and of each cached block by recording information in the Global Resource Directory (GRD). It also controls the flow of messages to remote instances, manages global data block access, and transmits block images between the buffer caches of different instances. This processing is part of the Cache Fusion feature.
LCK0 is the instance enqueue process. It manages non-Cache Fusion resource requests such as library cache and row cache requests.
RSMN is the Remote Slave Monitor process. It manages background slave process creation and communication on remote instances.
This is a background slave process. It performs tasks on behalf of a coordinating process running in another instance.
All datafiles, controlfiles, SPFILEs, and redo log files must reside on cluster-aware shared storage.
17) What is the significance of using cluster-aware shared storage in an Oracle RAC
environment?
All instances of an Oracle RAC database can access all the datafiles, control files, SPFILEs, and redo log files when these files are hosted on cluster-aware shared storage, i.e., a group of shared disks.
18) Give a few examples of solutions that support cluster storage:
ASM (Automatic Storage Management), OCFS2 (Oracle Cluster File System), raw devices, certified NAS/NFS, and third-party cluster file systems.
LKDEBUG is used to obtain information about the current state of the GCS and GES structures in the instance.
An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses a switch (or multiple switches) that only the nodes in the cluster can access.
Configure User Datagram Protocol (UDP) on Gigabit Ethernet for the cluster interconnect. On UNIX and Linux systems, UDP and RDS (Reliable Datagram Sockets) are the protocols used by Oracle Clusterware; Windows clusters use the TCP protocol.
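The interface designated for the interconnect can be checked with the oifcfg utility from the Clusterware home. A sketch (the interface names and subnets in the sample output are illustrative, not from this document):

```shell
# List the configured network interfaces and their roles;
# expect one 'public' interface and one 'cluster_interconnect'.
oifcfg getif
# Example output (illustrative):
#   eth0  10.0.0.0     global  cluster_interconnect
#   eth1  192.168.1.0  global  public
```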
No, crossover cables are not supported with Oracle Clusterware interconnects.
The cluster interconnect is used by Cache Fusion for inter-instance communication.
Users can access a RAC database using a client/server configuration or through one or more middle tiers, with or without connection pooling. Users can use the Oracle services feature to connect to the database.
Applications should use the services feature to connect to the Oracle database. Services enable us to define rules and characteristics to control how users and applications connect to database instances. The characteristics include a unique name, workload balancing and failover options, and high availability characteristics.
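Creating such a service with preferred and available instances can be sketched with srvctl (the database, service, and instance names below are placeholders; the syntax shown is the pre-11.2 style):

```shell
# Create service 'oltp' preferring instance rac1, with rac2 available for failover
srvctl add service -d racdb -s oltp -r rac1 -a rac2
# Start the service and check where it is running
srvctl start service -d racdb -s oltp
srvctl status service -d racdb -s oltp
```

Clients then connect using SERVICE_NAME=oltp, and the clusterware relocates the service to an available instance if a preferred instance fails.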
Issue the following query from any one node, connecting through SQL*Plus:
SELECT * FROM V$ACTIVE_INSTANCES;
The query gives the instance number under the INST_NUMBER column and host_name:instance_name under the INST_NAME column.
FAN UP and FAN DOWN events can be applied to instances, services, and nodes.
It is good practice to have the ASM home separate from the database home (ORACLE_HOME). This helps in upgrading and patching ASM and the Oracle database software independently of each other. Also, we can deinstall the Oracle database software independently of the ASM instance.
37) What is the advantage of using ASM?
ASM is the Oracle-recommended storage option for RAC databases, as ASM maximizes performance by managing the storage configuration across the disks. ASM does this by distributing the database files across all of the available storage within our cluster database environment.
It is an ASM feature new in Database 11g. ASM instances in Oracle Database 11g (from 11.1) can be upgraded or patched using the rolling upgrade feature. This enables us to patch or upgrade ASM nodes in a clustered environment without affecting database availability. During a rolling upgrade we can maintain a functional cluster while one or more of the nodes in the cluster are running different software versions.
39) Can rolling upgrade be used to upgrade from 10g to 11g database?
No, it can be used only for Oracle Database 11g releases (from 11.1) onwards.
These parameters can be identical on all instances only if these parameter values are set to
zero.
42) What two parameters must be set at the time of starting up an ASM instance in a RAC
environment?
Oracle Clusterware is made up of components like the voting disk and the Oracle Cluster Registry (OCR).
Oracle Clusterware manages CRS resources based on the configuration information of CRS resources stored in the OCR (Oracle Cluster Registry).
We can verify the current backup of the OCR using the following command: ocrconfig -showbackup
We have V$ views that are instance-specific. In addition, we have GV$ views, called global views, that have an INST_ID column of numeric data type. GV$ views obtain information from the individual V$ views.
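For example, the same query that V$SESSION answers for one instance can be run cluster-wide via GV$SESSION (the view and column names are standard; the session-type filter is just illustrative):

```sql
-- Count user sessions per instance across the whole cluster.
SELECT inst_id, COUNT(*) AS sessions
FROM   gv$session
WHERE  type = 'USER'
GROUP  BY inst_id
ORDER  BY inst_id;
```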
There are two types of connection load balancing: server-side load balancing and client-side load balancing.
54) What is the difference between server-side and client-side connection load balancing?
Client-side load balancing is configured in the client's connect descriptor (LOAD_BALANCE=ON),
which makes the client choose an address from the address list at random, spreading new
connection requests across the available listeners. With server-side load balancing, the
listener uses the load balancing advisory to redirect each connection to the instance
providing the best service.
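A minimal sketch of a client-side load-balancing entry in tnsnames.ora (the net service name, VIP host names, port, and service name below are hypothetical examples, not values from this document):

```
# Client-side load balancing: the client picks one ADDRESS at random.
RACDB =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = racdb))
  )
```

Server-side load balancing, by contrast, needs no client change: the listener itself redirects the connection based on the load balancing advisory.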
SRVM_TRACE is an Oracle RAC (Real Application Cluster) environment variable.
It is used to debug the Oracle RAC utility srvctl: when SRVM_TRACE is set to TRUE, srvctl
produces detailed trace output.
The ocrcheck utility is a diagnostic tool used for diagnosing Oracle Cluster Registry (OCR)
problems. It is used to verify the integrity of the OCR.
The OCRCHECK utility displays the version of the OCR’s block format, total space available
and used space, OCRID, and the OCR locations that we have configured.
OCRCHECK performs a block-by-block checksum operation for all of the blocks in all of the
OCRs that we have configured. It also returns an individual status for each file as well as a
result for the overall OCR integrity check.
Sample (partial) ocrcheck output:
Version : 2
ID : 1918913332
An odd number of voting disks is used to avoid split-brain. When the nodes in a cluster
cannot talk to each other, each node races to lock the voting disks, and whichever node locks
the majority of the disks survives. If the number of disks were even, two nodes might each
lock exactly half (for example, 2 out of 4), leaving no way to decide which node to evict.
With an odd number, one node always holds more disks than the other, and the cluster evicts
the node holding the minority.
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle
Clusterware Node evictions.
To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can take a manual backup of the OCR with the
command:
# ocrconfig -manualbackup
How do you back up the voting disk?
In Oracle RAC 10g and 11g Release 1, the voting disk can be backed up with the dd utility,
for example:
# dd if=voting_disk_path of=backup_file_name
From 11g Release 2 onward, the voting disk contents are backed up automatically into the
OCR whenever the cluster configuration changes.
What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g
Release 2 feature that provides a single name for clients to access an Oracle Database
running in a cluster. The benefit is that clients using SCAN do not need to change their
configuration if you add or remove nodes in the cluster.
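A minimal sketch of how a client might connect through SCAN (the SCAN name and service name below are hypothetical examples):

```
# tnsnames.ora entry: one SCAN address replaces the per-node VIP address list.
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = cluster1-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = racdb))
  )
```

The same connection can also be expressed with EZConnect, for example:
sqlplus user/password@cluster1-scan:1521/racdb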
Clusterware uses the private interconnect for cluster synchronization (network heartbeat)
and daemon communication between the clustered nodes. This communication is based on
the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP).
Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches
of participating nodes in the cluster.
Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP
timeout period (which can be up to 10 min) before getting an error. As a result, you don't
really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it automatically fails over to another node, and
the new node re-ARPs the network, advertising the new MAC address for the IP. Subsequent
packets sent to the VIP go to the new node, which sends error RST packets back to the
clients. This results in the clients getting errors immediately.
Oracle 10g Release 2 supports 100 nodes in a cluster using Oracle Clusterware, and 100
instances in a RAC database.
Srvctl cannot start the instance; I get the error PRKP-1001 CRS-0215, but sqlplus can start
it on both nodes. How do you identify the problem?
Set the environment variable SRVM_TRACE to TRUE and start the instance with srvctl. You
will then get a detailed error stack.
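A minimal shell sketch of this debugging workflow. The database and instance names are hypothetical, and the srvctl call itself is shown commented out since it requires a live RAC installation:

```shell
# Enable srvctl tracing for this shell session.
export SRVM_TRACE=TRUE

# Reproduce the failure with tracing enabled (requires a RAC install):
# srvctl start instance -d RACDB -i RACDB1

# Confirm the variable is set so srvctl will emit the detailed trace.
echo "SRVM_TRACE=$SRVM_TRACE"
```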
The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware
as part of the nodeapps. One ONS daemon is started per clustered node.
The ONS daemon receives a subset of published clusterware events via the local evmd and
racgimon clusterware daemons and forwards those events to application subscribers and to
the local listeners.