
Data Guard Cheatsheet

currently being updated, this statement will be removed when I have completed this section

Terminology

Primary database A production database


Standby database A database that can become the primary database, should the primary fail
EOR End Of Redo
LGWR Log Writer process
LNS Log Network Server
ORL Online Redo Log
RFS Remote File Server
SRL Standby Redo Log file
SYNC and ASYNC Synchronous and Asynchronous

Log Files

DG alert Log drc<db_unique_name>.log


Alert Log alert_<SID>.log
Logfile locations
# change the instance name to reflect the one you have chosen and the path where you installed Oracle
prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log
prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.log
prod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log
prod1dr (DG log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/drcPROD1DR.log

identify log files
## You can get the log locations from the below view
col name for a25
col value for a65;
select name, value from v$diag_info;
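
The example paths above follow a regular pattern, so a small helper can build them for any instance. This is a minimal Python sketch, not part of Oracle's tooling: the function name `dg_log_paths` is hypothetical, and it assumes the layout shown in the example paths (lowercase db_unique_name directory, SID directory, and `alert_<SID>.log` / `drc<SID>.log` file names). Check the real locations with v$diag_info before relying on it.

```python
from pathlib import PurePosixPath


def dg_log_paths(diag_base: str, db_unique_name: str, sid: str) -> dict:
    """Build alert log and broker log paths (hypothetical helper).

    Assumes the ADR layout seen in the cheat sheet examples:
    <diag_base>/rdbms/<db_unique_name>/<SID>/trace/
    """
    trace_dir = (
        PurePosixPath(diag_base) / "rdbms" / db_unique_name.lower() / sid / "trace"
    )
    return {
        "alert": str(trace_dir / f"alert_{sid}.log"),
        "broker": str(trace_dir / f"drc{sid}.log"),
    }


paths = dg_log_paths("/u01/app/oracle/diag", "prod1dr", "PROD1DR")
print(paths["alert"])
print(paths["broker"])
```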

Data Guard Broker

Create base configuration
# Primary Database server
DGMGRL> create configuration prod1 as
> primary database is prod1
> connect identifier is prod1;

Configuration "prod1" created with primary database "prod1"

Add the standby database
# Primary Database server - if you have set up db_unique_name, tnsname and log_archive_dest_n
DGMGRL> add database prod1dr;

# Primary Database server - the full command set
DGMGRL> connect sys/password
DGMGRL> add database prod1dr
> as connect identifier is prod1dr
> maintained as physical;

Database "prod1dr" added

Display configuration
DGMGRL> show configuration

Display Database
DGMGRL> show database verbose prod1

Enabling the configuration
# Primary Database server
DGMGRL> enable configuration
Enabled.

Edit configuration
EDIT CONFIGURATION SET PROPERTY <name>=<value>
EDIT DATABASE <db_name> SET PROPERTY <name>=<value>
EDIT INSTANCE <in_name> SET PROPERTY <name>=<value>

There are many options; see the broker section for more information

Troubleshooting (Monitoring commands and log files)


configuration DGMGRL> show configuration;
database DGMGRL> show database prod1;
DGMGRL> show database prod1dr;

# There are a number of specific information commands, here are the most used
DGMGRL> show database prod1 statusreport;
DGMGRL> show database prod1 inconsistentProperties;
DGMGRL> show database prod1 inconsistentlogxptProps;
DGMGRL> show database prod1 logxptstatus;
DGMGRL> show database prod1 latestlog;
Logfiles
# change the instance name to reflect the one you have chosen
prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log
prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.log
prod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log
prod1dr (DG log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/drcPROD1DR.log

There are a number of commands that you can use to change the state of the database

turn off/on the redo transport service for all standby databases (Primary)
DGMGRL> edit database prod1 set state=transport-off;
DGMGRL> edit database prod1 set state=transport-on;

turn off/on the apply state (Standby)
DGMGRL> edit database prod1dr set state=apply-off;
DGMGRL> edit database prod1dr set state=apply-on;

put a database into a real-time query mode (Standby)
DGMGRL> edit database prod1dr set state=apply-off;
sql> alter database open read only;
DGMGRL> edit database prod1dr set state=apply-on;

change the protection mode (Primary)
# Choose what level of protection you require
sql> alter database set standby to maximize performance;
sql> alter database set standby to maximize availability;
sql> alter database set standby to maximize protection;

# display the configuration
DGMGRL> show configuration

Redo Processing

Redo Processes (Primary and Standby Databases)

Processes

There are a number of Oracle background processes that play a key role. First, the primary database:

LGWR - Log Writer process, flushes redo from the SGA to the ORL files
LNS - LogWriter Network Service, reads redo being flushed from the redo buffers by the LGWR and performs a network send of the redo to the standby
ARCH - archives the ORL files to archive logs, which are also used to fulfill gap resolution requests; one ARCH process is dedicated to local redo log activity only and never communicates with a standby database

The standby database also has key processes:

RFS - Remote File Server process, performs a network receive of redo transmitted from the primary and writes the network redo to the standby redo log (SRL) files
ARCH - performs the same role as on the primary, but on the standby
MRP - Managed Recovery Process, coordinates media recovery management; recall that a physical standby is in perpetual recovery mode
LSP - Logical Standby Process, coordinates SQL apply; this process only runs in a logical standby
PR0x - recovery server process, reads redo from the SRL or archive log files and applies this redo to the standby database

Real-Time Apply

Enable real-time apply
sql> alter database recover managed standby database using current logfile disconnect;

Determine if real-time apply is enabled
sql> select recovery_mode from v$archive_dest_status where dest_id = 2;

RECOVERY_MODE
--------------------------
MANAGED REAL-TIME APPLY
Tools and views to monitor redo
Background processes
select process, client_process, thread#, sequence#, status from v$managed_standby;

## primary (example)

PROCESS   CLIENT_P THREAD#    SEQUENCE#  STATUS


--------- -------- ---------- ---------- ------------
ARCH      ARCH     1          58         CLOSING
ARCH      ARCH     0          0          CONNECTED
ARCH      ARCH     1          59         CLOSING
ARCH      ARCH     1          56         CLOSING
LNS       LNS      1          60         WRITING
LNS       LNS      1          60         WRITING
## physical standby (example)

PROCESS   CLIENT_P THREAD#    SEQUENCE#  STATUS


--------- -------- ---------- ---------- ------------
ARCH      ARCH     0          0          CONNECTED
ARCH      ARCH     1          55         CLOSING
ARCH      ARCH     0          0          CONNECTED
ARCH      ARCH     1          59         CLOSING
RFS       N/A      0          0          IDLE
RFS       UNKNOWN  0          0          IDLE
RFS       UNKNOWN  0          0          IDLE
RFS       LGWR     1          60         IDLE
MRP0      N/A      1          60         APPLYING_LOG

## Logical standby (example)

PROCESS   CLIENT_P THREAD#    SEQUENCE#  STATUS


--------- -------- ---------- ---------- ------------
ARCH      ARCH     1          55         CLOSING
ARCH      ARCH     1          10         CLOSING
ARCH      ARCH     0          0          CONNECTED
ARCH      ARCH     0          0          CONNECTED
RFS       UNKNOWN  0          0          IDLE
RFS       LGWR     1          60         IDLE
RFS       UNKNOWN  0          0          IDLE
RFS       UNKNOWN  0          0          IDLE

Information on Redo Data
select * from v$dataguard_stats;

Note: this indirectly shows how much redo data could be lost if the primary db crashes

Redo apply rate
select to_char(snapshot_time, 'dd-mon-rr hh24:mi:ss') snapshot_time,
       thread#, sequence#, applied_scn, apply_rate
       from v$standby_apply_snapshot;

Note: this command can only run when the database is open

Recovery operations
select to_char(start_time, 'dd-mon-rr hh24:mi:ss') start_time,
       item, round(sofar/1024,2) "MB/Sec"
       from v$recovery_progress
       where (item='Active Apply Rate' or item='Average Apply Rate');

Logical Standby

schema that are not maintained by SQL apply
select owner from dba_logstdby_skip where statement_opt = 'INTERNAL SCHEMA' order by owner;

Note: the system and sys schemas are not replicated, so don't go creating tables in these schemas. The above command should return about 17 schemas (Oracle 11g) that are not replicated.

Check tables with unsupported data types
select distinct owner, table_name from dba_logstdby_unsupported;
select owner, table_name from logstdby_unsupported_tables;

skip replication of tables
## Syntax
dbms_logstdby.skip (
  stmt in varchar2,
  schema_name in varchar2 default null,
  object_name in varchar2 default null,
  proc_name in varchar2 default null,
  use_like in boolean default true,
  esc in char1 default null
);

## Examples
execute dbms_logstdby.skip(stmt => 'DML', schema_name => 'HR', object_name => 'EMPLOYEE');
execute dbms_logstdby.skip(stmt => 'SCHEMA_DDL', schema_name => 'HR', object_name => 'EMPLOYEE');

# skip all DML operations
execute dbms_logstdby.skip(stmt => 'DML', schema_name => 'HR', object_name => '%');

revoke a skipped table
## stop SQL apply
alter database stop logical standby apply;

execute dbms_logstdby.instantiate_table(schema_name => 'HR', table_name => 'EMPLOYEE', DBLINK => 'INSTANTIATE_TABLE_LINK');
execute dbms_logstdby.unskip(stmt => 'DML', schema_name => 'HR', object_name => 'EMPLOYEE');

## start SQL apply
alter database start logical standby apply immediate;

Note: the dblink should point to the primary database. We have to stop SQL apply because the instantiate table
procedure uses Oracle's data pump network interface to lock the source table and obtain the SCN at the primary
database. It then releases the lock, gets a consistent snapshot of the table from the primary database, and
remembers the SCN associated with that snapshot.

display what tables are being skipped
select owner, name, use_like, esc from dba_logstdby_skip where statement_opt = 'DML';

setting the guard on a database
alter database guard standby;
Inside SQL Apply
List the above processes
select * from v$logstdby_process;

Increase the LCR cache size
# Set the cache size to 200MB
execute dbms_logstdby.apply_set('MAX_SGA', 200);

How much LCR cache is being used
select used_memory_size from v$logmnr_session where session_id = (select value from v$logstdby_stats where name = 'LOGMINER SESSION ID');

setting SQL apply mode for the application
execute dbms_logstdby.apply_set (name => 'PRESERVE_COMMIT_ORDER', value => FALSE);

Determine the number of DDL statements since the last restart
select name, value from v$logstdby_stats where name = 'DDL TXNS DELIVERED';

NAME                  VALUE
------------------------------------------------------------------------
DDL TXNS DELIVERED    510

displaying the barrier
select status_code as sc, status from v$logstdby_process where type = 'BUILDER';

sc     status
-------------------------------------------------------------------------------------
44604  BARRIER SYNCHRONIZATION ON DDL WITH XID 1.15.256 (WAITING ON 17 TRANSACTIONS)
Tuning SQL Apply
# Set the MAX_SERVERS to 8 x the number of cores
MAX_SERVERS execute dbms_logstdby.apply_set ('MAX_SERVERS', 64);
# Set the MAX_SGA to 200MB
MAX_SGA execute dbms_logstdby.apply_set ('MAX_SGA', 200);
# Set the Hash table size to 10 million
_HASH_TABLE_SIZE execute dbms_logstdby.apply_set ('_HASH_TABLE_SIZE', 10000000);
DDL defer DDLs to off-peak hours
# Set the PRESERVE_COMMIT_ORDER to false
Preserve commit order execute dbms_logstdby.apply_set (name => 'PRESERVE_COMMIT_ORDER', value => FALSE);

lagging SQL Apply
# apply lag: indicates how current the replicated data at the logical standby is
# transport lag: indicates how much redo data that has already been generated is missing at the logical
#                standby in terms of redo records
select name, value, unit from v$dataguard_stats;

SQL Apply component bottleneck
select name, value from v$logstdby_stats where name like 'TRANSACTIONS%';

Name                              Value
-----------------------------------------------------------------------------------------------------
TRANSACTIONS APPLIED     3764
TRANSACTIONS MINED       4985

The mined transactions should be about twice the applied transactions; if this ratio decreases or stays at a low
value you need to start looking at the mining engine.
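
The mined-to-applied rule of thumb is easy to check once the two counters are in hand. The Python sketch below is only an illustration: the function names and the `low_water` cut-off of 1.5 are assumptions chosen for the example, not values from Oracle or from this cheat sheet, which only says the ratio should be about 2.

```python
def mined_to_applied_ratio(mined: int, applied: int) -> float:
    """Ratio of TRANSACTIONS MINED to TRANSACTIONS APPLIED."""
    if applied <= 0:
        raise ValueError("applied must be a positive count")
    return mined / applied


def needs_mining_review(mined: int, applied: int, low_water: float = 1.5) -> bool:
    """True when the ratio sits below a chosen cut-off.

    low_water is a hypothetical threshold; tune it to your own baseline.
    """
    return mined_to_applied_ratio(mined, applied) < low_water


# Counters taken from the sample output above
print(f"{mined_to_applied_ratio(4985, 3764):.2f}")
print(needs_mining_review(4985, 3764))
```

With the sample counters the ratio is well under 2, so under this assumed cut-off the mining engine would be the first place to look.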

Make sure all preparers are busy
select count(1) as idle_preparers from v$logstdby_process where type = 'PREPARER' and status_code = 16166;

IDLE_PREPARERS
----------------------------
0

Make sure the peak size is well below the amount allocated
select used_memory_size from v$logmnr_session where session_id = (select value from v$logstdby_stats where
name = 'LOGMINER SESSION ID');

USED_MEMORY_SIZE
----------------------------
32522244

verify that the preparer does not have enough work for the applier processes
select (available_txn - pinned_txn) as pipeline_depth from v$logmnr_session where session_id = (select value
from v$logstdby_stats where name = 'LOGMINER SESSION ID');

PIPELINE_DEPTH
----------------------------
8

select count(*) as applier_count from v$logstdby_process where type = 'APPLIER';

APPLIER_COUNT
----------------------------
20

Setting max_servers and preparers
execute dbms_logstdby.apply_set('MAX_SERVERS', 36);
execute dbms_logstdby.apply_set('PREPARE_SERVERS', 3);
display the pageout activity
## Run this first
select name, value from v$logstdby_stats where name like '%PAGE%' or name like '%UPTIME%' or name like '%IDLE%';

## Run the second time about 10 mins later
select name, value from v$logstdby_stats where name like '%PAGE%' or name like '%UPTIME%' or name like '%IDLE%';

Now subtract one from the other and work out the pageout time as a percentage of the elapsed uptime; if pageout
has increased above 5% then increase MAX_SGA
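
The subtraction described above can be sketched in a few lines of Python. The dictionary keys `uptime` and `pageout` are hypothetical labels for the uptime and pageout-time counters (both in seconds) read from the two snapshots, and the sample numbers are made up for illustration.

```python
def pageout_percent(first: dict, second: dict) -> float:
    """Pageout time as a percentage of uptime elapsed between two snapshots.

    Each snapshot is a dict with hypothetical keys:
      "uptime"  - uptime counter in seconds
      "pageout" - pageout time counter in seconds
    """
    elapsed = second["uptime"] - first["uptime"]
    paged = second["pageout"] - first["pageout"]
    if elapsed <= 0:
        raise ValueError("second snapshot must be taken after the first")
    return 100.0 * paged / elapsed


# Illustrative snapshots taken 10 minutes (600 seconds) apart
snap1 = {"uptime": 894856, "pageout": 20}
snap2 = {"uptime": 895456, "pageout": 68}

pct = pageout_percent(snap1, snap2)
print(f"pageout is {pct:.1f}% of uptime")
print("above 5% threshold" if pct > 5 else "within 5% threshold")
```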
unassigned large transactions
## By default SQL apply should be one-sixth of the number of applier processes

select (available_txn - pinned_txn) as pipeline_depth from v$logmnr_session where session_id = (select value
from v$logstdby_stats where name = 'LOGMINER SESSION ID');

PIPELINE_DEPTH
----------------------------
256

select count(1) as idle_applier from v$logstdby_process where type = 'APPLIER' and status_code = 16166;

IDLE_APPLIER
---------------------------
12

## Now look for the unassigned large transactions

select value from v$logstdby_stats where name = 'LARGE TXNS WAITING TO BE ASSIGNED';

VALUE
---------------------------
12

Monitoring

archive gap logs
# Use the thread# when using RAC and detect missing sequences
select thread#, low_sequence#, high_sequence# from v$archive_gap;

select max(sequence#), thread# from v$archived_log group by thread#;

delays in redo transport
## you can use the dg_archivelog_monitor.sh script, which accepts three parameters: primary, physical standby
## and the archive log threshold (# of archive logs)

dg_archivelog_monitor.sh <primary> <standby> <threshold>

Identify the missing logs on the primary
## On the primary run the below
select L.thread#, L.sequence#
  from
  (select thread#, sequence# from v$archived_log where dest_id=1) L
    where L.sequence# not in
       (select sequence# from v$archived_log where dest_id=2 and thread# = L.thread#);
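
The query above is a per-thread set difference: logs archived locally (dest_id=1) that have no matching entry for the standby destination (dest_id=2). The Python sketch below shows the same logic on plain (thread#, sequence#) pairs; the function name and the sample data are made up for illustration.

```python
def missing_on_standby(local, remote):
    """Return (thread, sequence) pairs archived locally but not shipped.

    local  - pairs archived to the local destination (dest_id=1)
    remote - pairs archived to the standby destination (dest_id=2)
    """
    shipped = set(remote)
    return sorted(pair for pair in set(local) if pair not in shipped)


# Illustrative data for a two-thread (RAC) primary
local_logs = [(1, 56), (1, 57), (1, 58), (2, 40), (2, 41)]
remote_logs = [(1, 56), (1, 58), (2, 40)]

for thread, sequence in missing_on_standby(local_logs, remote_logs):
    print(f"thread {thread} sequence {sequence} is missing on the standby")
```

Comparing pairs rather than bare sequence numbers matters on RAC, where the same sequence# can exist in more than one thread.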

apply rate and active monitoring
select to_char(start_time, 'DD-MON-RR HH24:MI:SS') start_time, item , sofar from v$recovery_progress
  where item in ('Active Apply Rate', 'Average Apply Rate', 'Redo Applied');

Note: the redo applied is measured in megabytes, while the average apply rate and the active apply rate is measur

transport and apply lag
col name for a13
col value for a13
col unit for a30
set lines 132
select name, value, unit, time_computed from v$dataguard_stats where name in ('transport lag', 'apply lag');

## use the dg_time_lag.ksh script
dg_time_lag.ksh
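
For alerting it helps to turn the lag value into a plain number of seconds. The Python sketch below assumes the value column holds a day-to-second interval string such as `+00 00:01:05`; check the unit column on your own system before relying on that format. The function name `lag_seconds` is hypothetical.

```python
import re

# Assumed format: "+DD HH:MI:SS" (day-to-second interval without fractions)
_LAG_PATTERN = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})$")


def lag_seconds(value: str) -> int:
    """Convert a lag string like '+00 00:01:05' to whole seconds."""
    match = _LAG_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected lag format: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(lag_seconds("+00 00:01:05"))
print(lag_seconds("+01 02:00:00"))
```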

Viewing the status of the managed recovery process
col client_pid for a10;
select pid, process, status, client_process, client_pid, thread#, sequence#, block#, blocks from v$managed_standb

Switchover, Failover and FSFO

Quick Switchover and Failover (no checking)


Complete Switchover
## Start the switchover on the original primary
alter database commit to switchover to standby;

## On the new primary complete the switchover
alter database commit to switchover to primary;

## Now open the database on the new primary
alter database open;

Complete Failover
## Start the failover
alter database commit to switchover to primary;

# Change the level of protection that you require
sql> alter database set standby to maximize performance;
sql> alter database set standby to maximize availability;
sql> alter database set standby to maximize protection;

Broker switchover
DGMGRL> switchover to prod1dr

Complete Physical Switchover with checks


Action Step Commands

check redo has been received 1
## check the sync status, it should say yes (run on the standby)
sql> select db_unique_name, protection_mode, synchronization_status, synchronized from v$archive_
## if it says NO then let's make further checks (run on the standby)
sql> select client_process, process, sequence#, status from v$managed_standby;

## now check on the primary, we should be one in front (run on the primary)
sql> select thread#, sequence#, status from v$log;

Note: if using a RAC environment make sure you check each instance

check that redo has been applied (physical) 2
## check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you nee
## start the apply process, you also may see the status of wait_for_gap so wait until the gap hav
## resolved first
sql> select client_process, process, sequence#, status from v$managed_standby;

check that redo has been applied (logical) 3
## if you are using a logical standby then you need to check the following to confirm the redo ha
## applied
sql> select applied_scn, latest_scn, mining_scn from v$logstdby_progress;

## if the mining scn is behind you may have a gap, check this by using the following
sql> select status from v$logstdby_process where type = 'READER';

show any running jobs or backups 4
sql> select process, operation, r.status, mbytes_processed pct, s.status from v$rman_status r, v$

increase logging level (if required) 5
sql> alter system set log_archive_trace=8129;
## to turn it off again
sql> alter system set log_archive_trace=0;

check for active sessions 6
## Display the active sessions
sql> select program, type from v$session where type='USER';

check the switchover status 7
## make sure the status is "to standby", if you get "sessions active", then stop those sessions (
## sessions)
sql> select switchover_status from v$database;

tail the alert log file 8
tail alert??.log

switchover (primary) 9
## on the primary, after this command completes you will have two physical standbys
sql> alter database commit to switchover to physical standby with session shutdown;

Note: at this point if you want to rollback this switchover see my troubleshooting section to get

check the switchover status 10
sql> select switchover_status from v$database;

complete the switchover (physical) 11
sql> alter database commit to switchover to primary with session shutdown;

open the new primary 12
sql> alter database open;

finish off the old primary 13
sql> shutdown immediate;
sql> startup mount;
sql> alter database recover managed standby database using current logfile disconnect;
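
The switchover status check in the steps above decides whether the commit command can be issued. A minimal Python sketch of that decision follows; the function name and the returned strings are hypothetical, and only the two statuses mentioned in the steps ("to standby" and "sessions active") are handled explicitly. Any other value is treated as a reason to stop and investigate.

```python
def next_step(switchover_status: str) -> str:
    """Map a switchover_status value from v$database to a suggested action."""
    status = switchover_status.strip().upper()
    if status == "TO STANDBY":
        return "proceed with the switchover"
    if status == "SESSIONS ACTIVE":
        return "stop the sessions or use WITH SESSION SHUTDOWN"
    return "do not switch over yet, investigate"


print(next_step("TO STANDBY"))
print(next_step("sessions active"))
print(next_step("NOT ALLOWED"))
```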

Complete Logical Switchover with checks


Action Step Commands

check redo has been received 1
## check the sync status, it should say yes (run on the standby)
sql> select db_unique_name, protection_mode, synchronization_status, synchronized from v$archive_

## if it says NO then let's make further checks (run on the standby)
sql> select client_process, process, sequence#, status from v$managed_standby;

## now check on the primary, we should be one in front (run on the primary)
sql> select thread#, sequence#, status from v$log;

Note: if using a RAC environment make sure you check each instance

check that redo has been applied (physical) 2
## check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you nee
## start the apply process, you also may see the status of wait_for_gap so wait until the gap hav
## resolved first
sql> select client_process, process, sequence#, status from v$managed_standby;

check that redo has been applied (logical) 3
## if you are using a logical standby then you need to check the following to confirm the redo ha
## applied
sql> select applied_scn, latest_scn, mining_scn from v$logstdby_progress;

## if the mining scn is behind you may have a gap, check this by using the following
sql> select status from v$logstdby_process where type = 'READER';

show any running jobs or backups 4
sql> select process, operation, r.status, mbytes_processed pct, s.status from v$rman_status r, v$

increase logging level (if required) 5
sql> alter system set log_archive_trace=8129;
## to turn it off again
sql> alter system set log_archive_trace=0;

check for active sessions 6
## Display the active sessions
sql> select program, type from v$session where type='USER';

check the switchover status 7
## make sure the status is "to standby", if you get "sessions active", then stop those sessions (
## sessions)
sql> select switchover_status from v$database;

tail the alert log file 8
tail alert??.log

Prepare the primary standby 9
sql> alter database prepare to switchover to logical standby;
## confirm that the prepare has started to happen, you should now see "preparing switchover"
sql> select switchover_status from v$database;

Prepare the logical standby 10
sql> alter database prepare to switchover to primary;

## confirm that the prepare has started to happen, you should see "preparing dictionary"
sql> select switchover_status from v$database;

## wait a while until the dictionary is built and sent and you should see "preparing switchover"
sql> select switchover_status from v$database;

Check primary database state 11
## you should now see it is in the state of "to logical standby"
sql> select switchover_status from v$database;

the last chance to CANCEL the switchover (no going back after this) 12
## On the primary
sql> alter database prepare to switchover cancel;

## on the logical
sql> alter database prepare to switchover cancel;

switchover the primary to a logical standby 13
sql> alter database commit to switchover to logical standby;

switchover the logical standby to a primary 14
## check that it is ready to become the primary, you should see "to primary"
sql> select switchover_status from v$database;

## Complete the switchover
sql> alter database commit to switchover to primary;

start the apply process 15
sql> alter database start logical standby apply immediate;

Complete Physical/Logical failover with checks


Action Step Commands

Check redo applied 1
## This will tell you the lag time
select name, value, time_computed from v$dataguard_stats where name like '%lag%';

## You can also use the SCN number
select thread#, sequence#, last_change#, last_time from v$standby_log;

the failover process (physical standby) 2
## Start by telling the apply process that this standby is going to be the new primary, and to ap
## the redo that it has
alter database recover managed standby database cancel;
alter database recover managed standby database finish;

## At this point the protection mode is lowered
select protection_mode from v$database;

## Now issue the switchover command and then open the database
alter database commit to switchover to primary with session shutdown;
alter database open;

## Startup the other RAC instances if using RAC

## You can then raise the protection mode (if desired)
alter database set standby to maximize protection;

the failover process (logical standby) 2
alter database activate logical standby database finish apply;

Bringing back the old Primary


Action Step Commands

bring back the old primary (physical standby) 1
## Since redo is applied by SCN we need the failover SCN from the new primary
select to_char(standby_became_primary_scn) failover_scn from v$database;

FAILOVER_SCN
-----------------------------------------------
7658841

## Now flashback the old primary to this SCN and start in mount mode
startup mount;
flashback database to scn 7658841;
alter database convert to physical standby;
shutdown immediate;
startup mount;

## hopefully the old primary will start to resolve any gap issues at the next log switch, which m
## process to get this standby going to catchup as fast as possible
alter database recover managed standby database using current logfile disconnect;

## eventually the missing redos will be sent to the standby and applied, bring us back to synchro

bring back the old primary (logical standby) 2
## again we need to obtain the SCN
select merge_change# as flashback_scn, processed_change# as recovery_scn from dba_logstdby_histor
max(stream_sequence#)-1 from dba_logstdby_history);

flashback_scn      recovery_scn
---------------------------------------------------------
         7658941              7659568

## Now flashback the old primary to this SCN and start in mount mode
startup mount;
flashback database to scn 7658941;
alter database convert to physical standby;
shutdown immediate;
startup mount;

## Now we need to hand feed the archive logs from the primary to the standby (old primary) into t
## process, so lets get those logs (run on the primary)
select file_name from dba_logstdby_log where first_change# <= recovery_scn and next_change# > fl

## Now you will hopefully have a short list of the files you need, now you need to register them
## the standby database (old primary)
alter database register logfile '<files from above list>';

## Now you can recover up to the SCN but not including the one you specify
recover managed standby database until change 7659568;

## Now the standby database becomes a logical standby as up to this point it has been a physical
alter database activate standby database;

## Lastly you need tell your new logical standby to ask the primary for a new copy of the diction
## all the redo in between. The SQL Apply will connect to the new primary using the database link
## retrieve the LogMiner dictionary, once the dictionary has been built, SQL Apply will apply all
## redo sent from the new primary and get itself synchronized
create public database link reinstatelogical connect to system identified by password using 'serv

alter database start logical standby apply new primary reinstatelogical;

Use the Broker to bring back the old Primary


Use the broker to do it all for you n/a
DGMGRL> failover to prod1dr;
DGMGRL> reinstate database prod1;

Fast Start Failover (FSFO)


Monitor a specific condition via the Broker
DGMGRL> enable fast_start failover condition "Corrupted Controlfile";
DGMGRL> enable fast_start failover condition "Datafile Offline";

Display conditions that are being monitored
DGMGRL> show fast_start failover;

Select the standby to become the primary
DGMGRL> edit database prod1 set property FastStartFailoverTarget = 'prod1dr';
DGMGRL> edit database prod1dr set property FastStartFailoverTarget = 'prod1';

change threshold
DGMGRL> edit configuration set property FastStartFailoverThreshold = 45;

lag limit
DGMGRL> edit configuration set property FastStartFailoverLagLimit = 60;

abort primary if in a hung state
DGMGRL> edit configuration set property FastStartFailoverPmyShutdown = true;

reinstate primary after a failover
DGMGRL> edit configuration set property FastStartFailoverAutoReinstate = true;

Enable FSFO
DGMGRL> enable fast_start failover;

## Display the configuration
DGMGRL> show fast_start failover;

Other sections of interest

Active Data Guard - see active data guard

Backups and Recovery - see backups and recovery

Troubleshooting - see troubleshooting

My complete setup guide - see complete setup guide
