Callmanager Database Replication
Presentation_ID
Cisco Confidential
Agenda
CallManager Database Architecture
DB Replication Flow Diagram
DB Architecture: Install/Upgrade
In 5.0 and 5.1, the publisher upgrade migrates data to the new version prior to reboot. The subscriber starts replication setup after it is upgraded and rebooted.
Replication setup pushes data from the publisher to the subscriber. The subscriber's local database is ready for failover only after replication is complete.
DB Architecture: Install/Upgrade
In 6.x and later, the publisher upgrade migrates data and performs an ontape (Informix utility) backup prior to rebooting to the new version.
The subscriber upgrade gets the publisher ontape backup via SFTP and restores that data to the subscriber. (This brings the data close in content, which is imperative for services that read data locally.) The subscriber starts replication setup after the upgrade and reboot.
Replication setup audits the data and pushes the differences between the publisher and the subscriber to the subscriber. A change notification is sent to the local services for each change. The local database is ready before replication is complete. The replication setup timeout can be set from the CLI, for example utils dbreplication setrepltimeout 900 (15 minutes). User Facing Features (listed on a later slide) are backed up locally on all servers prior to upgrade and reboot and are restored after reboot, so that any changes made by users during the upgrade are not lost.
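The audit-and-push step can be pictured as a keyed comparison of two copies of a table. The Python sketch below is a conceptual illustration only, not how Informix replication actually computes differences; modeling a table as a {primary key: row} dictionary, the sample device rows, and the diff_rows and apply_changes helpers are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, ...]
# (operation, primary key, new row or None for deletes)
Change = Tuple[str, str, Optional[Row]]


def diff_rows(publisher: Dict[str, Row], subscriber: Dict[str, Row]) -> List[Change]:
    """Audit two copies of a table and return only the differences.

    Tables are modeled as {primary_key: row} dictionaries (an assumption
    made for illustration).
    """
    changes: List[Change] = []
    for key in sorted(publisher.keys() | subscriber.keys()):
        pub_row = publisher.get(key)
        sub_row = subscriber.get(key)
        if sub_row is None:
            changes.append(("insert", key, pub_row))  # missing on the subscriber
        elif pub_row is None:
            changes.append(("delete", key, None))  # extra row on the subscriber
        elif pub_row != sub_row:
            changes.append(("update", key, pub_row))  # publisher copy wins
    return changes


def apply_changes(subscriber: Dict[str, Row], changes: List[Change]) -> None:
    """Push the differences to the subscriber copy, one change at a time."""
    for op, key, row in changes:
        if op == "delete":
            del subscriber[key]
        elif row is not None:
            subscriber[key] = row
        # The slide notes that each change also triggers a change notification
        # to local services; this sketch only updates the dictionary.


pub = {"dev1": ("SEP001", "DND on"), "dev2": ("SEP002", "DND off")}
sub = {"dev1": ("SEP001", "DND off"), "dev3": ("SEP003", "DND off")}
changes = diff_rows(pub, sub)
print([(op, key) for op, key, _ in changes])  # [('update', 'dev1'), ('insert', 'dev2'), ('delete', 'dev3')]
apply_changes(sub, changes)
print(sub == pub)  # True
```

Only rows that differ are pushed, which is why a subscriber restored from the publisher's ontape backup has little left to replicate.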
User Facing Features
Privacy Enable/Disable
Do Not Disturb Enable/Disable (DND)
Extension Mobility Login (EM)
Monitor (for future use; currently no updates at the user level)
Hunt Group Logout
Device Mobility
CTI CAPF status for end users and application users
Steps to DB Replication
These steps are performed automatically by the replication scripts when the system is installed. When you run utils dbreplication reset all, these steps are performed again.
1. Tear down replication (cdr delete server). This step can corrupt the syscdr database.
2. Define the publisher. This sets the publisher up to start replicating.
3. Define the template on the publisher and realize it. This tells the publisher which tables to replicate.
DB Replication Troubleshooting
How do we verify whether replication is broken?
Which commands diagnose and fix replication?
If we cannot fix it, which trace files do we collect?
If the customer needs an RCA, the special ercollect script must be run on the server.
What the replication state counter means:
0 = Initialization
1 = Number of replicates is not correct (old sys)
2 = Replication is good
3 = Replication is bad
4 = Replication setup did not succeed (this meaning applies to 5.1.3 and all 6.x versions)
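For scripted health checks, the counter values can be kept in a lookup table. A minimal Python sketch, assuming you already have the counter value (for example, as displayed in RTMT); describe_state and is_replication_good are hypothetical helper names.

```python
# Meanings of the replication state counter, as listed above.
# State 4 has this meaning on 5.1.3 and all 6.x versions.
REPLICATION_STATES = {
    0: "Initialization",
    1: "Number of replicates is not correct (old sys)",
    2: "Replication is good",
    3: "Replication is bad",
    4: "Replication setup did not succeed",
}


def describe_state(counter: int) -> str:
    """Translate a counter value into its meaning."""
    return REPLICATION_STATES.get(counter, f"Unknown state {counter}")


def is_replication_good(counter: int) -> bool:
    """Only state 2 means replication is good."""
    return counter == 2


for value in (2, 3):
    print(value, describe_state(value), is_replication_good(value))
```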
                 REPLICATION  REPL.  DBver&  REPL.  REPLICATION SETUP
SERVER IP        STATUS       QUEUE  TABLES  LOOP?  (RTMT)
---------------  -----------  -----  ------  -----  -----------------------
14.128.62.72     Connected    0      match   N/A    (2) PUB Setup Completed
14.128.62.73     Connected    0      match   N/A    (2) Setup Completed
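When checking many nodes, the rows of this table can be parsed and compared with the healthy values shown above (Connected, queue 0, tables match, setup state 2). A minimal Python sketch; it assumes each row has exactly the six columns shown in this table (output on other versions may include additional columns), and ReplRow, parse_row and row_problems are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

# "(2) PUB Setup Completed" -> state 2, text "PUB Setup Completed"
SETUP_RE = re.compile(r"\((\d+)\)\s*(.*)")


@dataclass
class ReplRow:
    server: str
    status: str
    queue: int
    tables: str
    loop: str
    setup_state: int
    setup_text: str


def parse_row(line: str) -> ReplRow:
    """Split one data row into the six columns of the table above."""
    server, status, queue, tables, loop, setup = line.split(None, 5)
    match = SETUP_RE.match(setup)
    if match is None:
        raise ValueError(f"unexpected REPLICATION SETUP column: {setup!r}")
    return ReplRow(server, status, int(queue), tables, loop,
                   int(match.group(1)), match.group(2).strip())


def row_problems(row: ReplRow) -> List[str]:
    """Compare a row against the healthy values shown in the table above."""
    problems = []
    if row.status != "Connected":
        problems.append(f"{row.server}: replication status {row.status}")
    if row.queue != 0:
        problems.append(f"{row.server}: {row.queue} items in the replication queue")
    if row.tables != "match":
        problems.append(f"{row.server}: tables {row.tables}")
    if row.setup_state != 2:
        problems.append(f"{row.server}: setup state {row.setup_state}")
    return problems


row = parse_row("14.128.62.72     Connected    0      match   N/A    (2) PUB Setup Completed")
print(row.setup_text, row_problems(row))  # PUB Setup Completed []
```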
Troubleshooting Steps
Verify connectivity.
Verify that the hosts files are in sync.
/home/informix/.rhosts
/usr/local/cm/db/informix/etc/sqlhosts
127.0.0.1 localhost
14.128.62.3 CM613
14.128.62.6 CM613SUB
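The "hosts files are in sync" check can be automated by comparing the entries from two nodes. A minimal Python sketch, assuming you already have the text of each node's hosts file; parse_hosts and hosts_differences are hypothetical helpers, host names are compared case-insensitively, and the second sample file is deliberately mismatched.

```python
from typing import Dict, List


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each host name (lower-cased) to its IP; comments and blank lines are ignored."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries[name.lower()] = ip
    return entries


def hosts_differences(node_a: str, node_b: str) -> List[str]:
    """List host names that are missing on one node or resolve to different IPs."""
    left, right = parse_hosts(node_a), parse_hosts(node_b)
    return [
        f"{name}: {left.get(name)} != {right.get(name)}"
        for name in sorted(left.keys() | right.keys())
        if left.get(name) != right.get(name)
    ]


pub_hosts = "127.0.0.1 localhost\n14.128.62.3 CM613\n14.128.62.6 CM613SUB\n"
# Hypothetical subscriber copy with a wrong address for CM613SUB.
sub_hosts = "127.0.0.1 localhost\n14.128.62.3 CM613\n14.128.62.7 CM613SUB\n"
print(hosts_differences(pub_hosts, sub_hosts))  # ['cm613sub: 14.128.62.6 != 14.128.62.7']
```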
g_cm613_ccm6_1_3_1000_16
cm613_ccm6_1_3_1000_16 onsoctcp CM613 cm613_ccm6_1_3_1000_16 g=g_cm613_ccm6_1_3_1000_16 b=32767
g_cm613sub_ccm6_1_3_1000_16 group i=3
cm613sub_ccm6_1_3_1000_16
ACCEPT  tcp  --  CM613SUB  anywhere  tcp dpt:1501
ACCEPT  udp  --  CM613SUB  anywhere  udp dpt:1501
The example above is from a publisher (CM613) where CM613SUB is the subscriber. The subscriber should have similar entries for the publisher; if it does not, the cause is probably a network issue. TCP port 1501 is used by the CallManager database during migration (upgrade). Ensure that all servers in the cluster have good status (ACCEPT entries for TCP on port 1500, listed by server name). Otherwise, check the Cluster Manager logs:
- file list activelog platform/log/clustermgr*
- file view activelog platform/log/clustermgr00000002.log
Example:
06/14/2010 23:22:03.009 clm|HMAC_SHA1 match failed IP(14.128.62.6)| (Failed)
03/25/2010 06:52:39.864 clm|hostname: CM613SUB state POLICY_INJECTED| (Success)
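When the Cluster Manager logs are long, the failure lines can be filtered out programmatically. A minimal Python sketch, assuming every line has the shape of the two examples above (timestamp, then clm|message|); that layout is inferred from those two lines only, and ClmEvent and failed_events are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# Layout inferred from the examples: "<MM/DD/YYYY HH:MM:SS.mmm> clm|<message>|"
LINE_RE = re.compile(r"^(?P<ts>\d{2}/\d{2}/\d{4} [\d:.]+) clm\|(?P<msg>[^|]*)\|")
IP_RE = re.compile(r"IP\((?P<ip>[\d.]+)\)")


class ClmEvent(NamedTuple):
    timestamp: str
    message: str
    peer_ip: Optional[str]


def failed_events(lines: List[str]) -> List[ClmEvent]:
    """Return the lines whose message mentions a failure, with the peer IP if present."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or "failed" not in m.group("msg").lower():
            continue
        ip = IP_RE.search(m.group("msg"))
        events.append(ClmEvent(m.group("ts"), m.group("msg"), ip.group("ip") if ip else None))
    return events


sample = [
    "06/14/2010 23:22:03.009 clm|HMAC_SHA1 match failed IP(14.128.62.6)|",
    "03/25/2010 06:52:39.864 clm|hostname: CM613SUB state POLICY_INJECTED|",
]
for event in failed_events(sample):
    print(event.timestamp, event.peer_ip)  # 06/14/2010 23:22:03.009 14.128.62.6
```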
src=
ip=
dest=
port=8500
22:09:10.479943 CM613.8500 > CM613SUB.8500: isakmp: phase 2/others ? #71[C] (DF)
22:09:10.481232 CM613SUB.8500 > CM613.8500: isakmp: phase 2/others ? #71[C] (DF)
Starting diagnostic test(s)
===========================
test - disk_space        : Passed (available: 849 MB, used: 4998 MB)
skip - disk_files        : This module must be run directly and off hours
test - ...               : Passed
test - ...               : Passed
test - ntp_reachability  : Passed
test - ntp_clock_drift   : Passed
test - ntp_stratum       : Passed
Diagnostics Completed
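A minimal Python sketch for scanning this output for modules that did not pass, assuming the "test - name : result" and "skip - name : reason" layout shown above; skipped modules are ignored, any tested module whose result does not begin with "Passed" is reported, and parse_diagnose and failing_tests are hypothetical helper names.

```python
import re
from typing import Dict, List

RESULT_RE = re.compile(r"^(?P<kind>test|skip)\s*-\s*(?P<name>\S+)\s*:\s*(?P<result>.*)$")


def parse_diagnose(output: str) -> Dict[str, str]:
    """Map each tested module to its result text; skipped modules are left out."""
    results: Dict[str, str] = {}
    for line in output.splitlines():
        m = RESULT_RE.match(line.strip())
        if m and m.group("kind") == "test":
            results[m.group("name")] = m.group("result")
    return results


def failing_tests(output: str) -> List[str]:
    """Names of tested modules whose result does not start with "Passed"."""
    return [name for name, result in parse_diagnose(output).items()
            if not result.startswith("Passed")]


sample = """Starting diagnostic test(s)
===========================
test - disk_space        : Passed (available: 849 MB, used: 4998 MB)
skip - disk_files        : This module must be run directly and off hours
test - ntp_reachability  : Passed
Diagnostics Completed"""
print(failing_tests(sample))  # []
```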
Presentation_ID 2006 Cisco Systems, Inc. All rights reserved. Cisco Confidential
Troubleshooting: Is the publisher failing to define or realize the template?
Check the logs to see at what point replication is failing.
admin:file list activelog /cm/trace/dbl date det
15 Jun,2010 10:45:17     <dir>    dblj
15 Jun,2010 10:45:17     <dir>    ncsj
15 Jun,2010 10:45:17     <dir>    sdi
19 Nov,2009 18:53:44     1,847    2010_09_15_11_14_58_ne042_ccm_164_ccm8_6_0_96000_16_dbl_repl_cdr_define.log
19 Nov,2009 18:59:57     299,786  2010_09_15_13_10_20_dbl_repl_output_Broadcast.log
DB Replication Commands
utils dbreplication status
This command displays the status of database replication by comparing the database content of the subscribers with that of the publisher. It indicates whether the servers in the cluster are connected and whether the data is in sync.
utils dbreplication stop
This command stops automatic replication setup on the local server, waits for the replication timeout, and then stops automatic replication setup again.
Let it run to completion before running the reset commands that follow.
This command is typically run before a reset. Stop replication on the subscribers first and then on the publisher. After replication is stopped on the publisher, the publisher waits for the replication timeout before starting replication. Because all automatic setup processes are stopped, a reset is required to initiate replication.
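The ordering above (subscribers first, then the publisher, then a reset) can be captured in a small runbook generator. A minimal Python sketch; stop_then_reset_plan is a hypothetical helper that only builds a list of steps and does not connect to any server, and it assumes the reset is issued from the publisher with utils dbreplication reset all, as described for clusterreset later in this deck.

```python
from typing import List


def stop_then_reset_plan(publisher: str, subscribers: List[str],
                         repl_timeout: int = 300) -> List[str]:
    """Return the order of steps described above, as "host: action" strings."""
    # 1. Stop automatic replication setup on every subscriber first.
    plan = [f"{sub}: utils dbreplication stop" for sub in subscribers]
    # 2. Then stop it on the publisher and let the stop finish,
    #    which includes waiting out the replication timeout.
    plan.append(f"{publisher}: utils dbreplication stop")
    plan.append(f"{publisher}: wait for stop to complete ({repl_timeout} s replication timeout)")
    # 3. Automatic setup is now stopped everywhere, so a reset is needed
    #    to initiate replication again.
    plan.append(f"{publisher}: utils dbreplication reset all")
    return plan


for step in stop_then_reset_plan("CM613", ["CM613SUB"]):
    print(step)
```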
DB Replication Commands
utils dbreplication repair
This command repairs data that is out of sync. Run it when utils dbreplication status shows the servers as connected but a few tables are out of sync.
Syntax: utils dbreplication repair {all | hostname}
utils dbreplication reset
This command tears down and rebuilds replication when the system has not set it up properly.
Syntax: utils dbreplication reset {all | hostname}
DB Replication Commands
utils dbreplication setrepltimeout
Syntax: utils dbreplication setrepltimeout timeout
timeout: The new database replication timeout, in seconds. The valid range is 300 to 7200.
The default database replication timeout is 5 minutes (value of 300). This timer applies to both the replication stop and reset commands. For reset, the system waits for the timer after defining the servers and then realizes the template. The timer is set when the first subscriber requests replication with the publisher. When the timer expires, the first subscriber plus any other subscribers that requested replication within that period begin data replication with the publisher in a "batch". For large clusters, you can use this command to increase the default timeout so that more subscribers are included in the batch.
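The batching behavior can be simulated to see how a larger timeout pulls more subscribers into one batch. A minimal Python sketch under stated assumptions: request times are given in seconds, a subscriber that requests replication at or before the timer expiry joins the current batch, and a request after expiry starts a new timer and batch (the slide does not say what happens to late requests). validate_timeout mirrors the documented 300 to 7200 second range; both helpers are hypothetical.

```python
from typing import List, Optional, Tuple

MIN_TIMEOUT = 300   # documented lower bound, in seconds
MAX_TIMEOUT = 7200  # documented upper bound, in seconds


def validate_timeout(seconds: int) -> int:
    """Reject values outside the documented setrepltimeout range."""
    if not MIN_TIMEOUT <= seconds <= MAX_TIMEOUT:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT} and {MAX_TIMEOUT} seconds, got {seconds}"
        )
    return seconds


def batch_requests(requests: List[Tuple[str, float]], timeout: int) -> List[List[str]]:
    """Group subscribers into replication batches.

    requests: (subscriber name, request time in seconds) pairs, in any order.
    Assumption: a request after the timer expires starts a new timer and batch.
    """
    validate_timeout(timeout)
    batches: List[List[str]] = []
    deadline: Optional[float] = None
    for name, requested_at in sorted(requests, key=lambda r: r[1]):
        if deadline is None or requested_at > deadline:
            # The first request (or the first late request) sets the timer.
            batches.append([])
            deadline = requested_at + timeout
        batches[-1].append(name)
    return batches


arrivals = [("sub1", 0), ("sub2", 120), ("sub3", 400)]
print(batch_requests(arrivals, 300))  # [['sub1', 'sub2'], ['sub3']]
print(batch_requests(arrivals, 900))  # [['sub1', 'sub2', 'sub3']]
```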
Set this timer on the publisher after the publisher has been upgraded and booted on the upgraded partition, but before the first subscriber is switched over to the new release. Then, when the first subscriber requests replication, the publisher sets the timer based on the new value.
Note: It is recommended that you restore this value to the default of 300 (5 minutes) once the entire cluster is upgraded successfully and the subscribers have successfully set up replication.
DB Replication Commands
admin:show tech repltimeout
The Replication timeout is set to 300 seconds
This command shows the replication timeout that is set on the cluster.
DB Replication Commands
utils dbreplication runtimestate
This command verifies that the publisher can communicate with the DBLRPC service (also known as the Database Replicator) on every subscriber. Check the RPC column. It is typically run before the reset command.
admin:utils dbreplication runtimestate
DB and Replication Services: ALL RUNNING
Cluster Replication State: Replication status command started at: 2010-05-13-15-53
Replication status command COMPLETED 427 tables checked out of 427
No Errors or Mismatches found.
                 REPLICATION  REPL.  DBver&  REPL.  REPLICATION SETUP
SERVER IP        STATUS       QUEUE  TABLES  LOOP?  (RTMT) & details
---------------  -----------  -----  ------  -----  -----------------------
14.128.62.72     Connected    0      match   N/A    (2) PUB Setup Completed
14.128.62.73     Connected    0      match   N/A    (2) Setup Completed
utils dbreplication clusterreset
This command tears down and rebuilds replication for the entire cluster. After using this command, reboot each subscriber. Once the subscribers have been rebooted, go to the publisher and issue the CLI command "utils dbreplication reset all". RCA cannot be determined once you run this command.
Syntax: utils dbreplication clusterreset
utils dbreplication dropadmindb
This command drops the Informix syscdr database on any server in the cluster. Run it only if utils dbreplication reset or clusterreset fails to define a particular node in the replication process. RCA cannot be determined after you run it.
Syntax: utils dbreplication dropadmindb
Good Status
1. Check the output to make sure that each server is connected and that no tables are suspect.
2. The status should list all the subscribers as connected at the top of the file.
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_bldr_ccm4_ccm         2 Active   Local          0
g_bldr_ccm5_ccm         3 Active   Connected      0  Sep 6 16:27:15
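To check many nodes quickly, the server rows of this output can be parsed and any row that is not Active, not Local or Connected, or has a non-zero queue can be flagged. A minimal Python sketch; it assumes data rows begin with a replication group name prefixed g_ (as in the examples in this deck) and that the first five columns are SERVER, ID, STATE, STATUS and QUEUE. Treating a non-zero queue as a problem is a heuristic, and the helper names are hypothetical.

```python
from typing import Dict, List


def parse_server_list(text: str) -> List[Dict[str, str]]:
    """Parse SERVER / ID / STATE / STATUS / QUEUE rows; header and ruler lines are skipped."""
    servers = []
    for line in text.splitlines():
        parts = line.split()
        # Assumption: data rows start with a group name such as g_bldr_ccm4_ccm.
        if len(parts) < 5 or not parts[0].startswith("g_"):
            continue
        servers.append({
            "server": parts[0],
            "id": parts[1],
            "state": parts[2],
            "status": parts[3],
            "queue": parts[4],
            "changed": " ".join(parts[5:]),  # empty for the local server
        })
    return servers


def unhealthy_servers(text: str) -> List[str]:
    """Flag rows that are not Active, not Local/Connected, or have a non-zero queue."""
    return [
        f'{s["server"]}: {s["state"]}/{s["status"]}, queue {s["queue"]}'
        for s in parse_server_list(text)
        if s["state"] != "Active"
        or s["status"] not in ("Local", "Connected")
        or s["queue"] != "0"
    ]


good = """SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_bldr_ccm4_ccm         2 Active   Local          0
g_bldr_ccm5_ccm         3 Active   Connected      0  Sep 6 16:27:15"""
print(unhealthy_servers(good))  # []
```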
Replication state 3: a few tables are out of sync.
Run utils dbreplication repair (described above under DB Replication Commands) to clear this issue.
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_bldr_ccm4_ccm         2 Active   Local          0
g_bldr_ccm5_ccm         3 Active   Connected      0  Sep 6 16:27:15
---------- Suspect Replication Summary ----------
For table: ccmdbtemplate_bldr_ccm4_ccm_1_27_processnode
replication is suspect for node(s):
g_bldr_ccm5_ccm
For table: ccmdbtemplate_bldr_ccm4_ccm_1_34_replicationdynamic
replication is suspect for node(s):
g_bldr_ccm5_ccm
------------------------------------------------
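The Suspect Replication Summary can be reduced to a table-to-nodes mapping, which makes it easier to see which tables and nodes are affected. A minimal Python sketch; the regular expression is written against the summary lines shown above and may need adjusting for other versions, and suspect_tables is a hypothetical name.

```python
import re
from typing import Dict, List

# One finding: "For table: <name> replication is suspect for node(s): <nodes>",
# where the node list runs until the next finding, a dashed ruler, or the end.
FINDING_RE = re.compile(
    r"For table:\s*(?P<table>\S+)\s+replication is suspect for node\(s\):\s*"
    r"(?P<nodes>.*?)(?=For table:|-{3,}|\Z)",
    re.DOTALL,
)


def suspect_tables(summary: str) -> Dict[str, List[str]]:
    """Map each suspect table to the nodes reported for it."""
    found: Dict[str, List[str]] = {}
    for match in FINDING_RE.finditer(summary):
        nodes = match.group("nodes").replace(",", " ").split()
        found.setdefault(match.group("table"), []).extend(nodes)
    return found


summary = """---------- Suspect Replication Summary ----------
For table: ccmdbtemplate_bldr_ccm4_ccm_1_27_processnode
replication is suspect for node(s):
g_bldr_ccm5_ccm
For table: ccmdbtemplate_bldr_ccm4_ccm_1_34_replicationdynamic
replication is suspect for node(s):
g_bldr_ccm5_ccm
------------------------------------------------"""
for table, nodes in suspect_tables(summary).items():
    print(table, nodes)
```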
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_bldr_ccm4_ccm         2 Active   Local          0
g_bldr_ccm5_ccm         3 Active   Dropped      636  Sep 10 14:01:20
Possible causes:
utils dbreplication dropadmindbforce
This command drops the Informix syscdr database on the server on which it is run.
Syntax: utils dbreplication dropadmindbforce
utils dbreplication repairtable
This command repairs mismatched data between cluster nodes by changing the node's data to match the publisher's data. It does not repair replication setup.
Syntax: utils dbreplication repairtable tablename [nodename]|all
Replication Logs
From the Publisher
1. file get activelog cm/log/informix/*dbl_repl*.log
2. file get activelog cm/trace/dbl/*dbl_repl*.log
9. file get activelog cm/trace/dbl/sdi/ReplicationRepair*
10. file get activelog cm/trace/dbl/sdi/replication_scripts_output.log
11. utils diagnose test output (file get activelog /platform/log/diag2.log)
Replication Logs
From the Subscribers
1. file get activelog cm/log/informix/ccm.log*
Download the following unified reports:
1. Database Status
2. Cluster Overview
3. Replication Debug
Replication Logs
admin:file list activelog /cm/trace/dbl date det
15 Jun,2010 10:45:17     <dir>    dblj
15 Jun,2010 10:45:17     <dir>    ncsj
15 Jun,2010 10:45:17     <dir>    sdi
19 Nov,2009 18:53:44     1,847    dbl_repl_cdr_define_subscriber_ccm7_1_3_10000_112009_11_19_18_53_21.log
19 Nov,2009 18:59:57     299,786  dbl_repl_cdr_Broadcast_2009_11_19_18_58_44.log
19 Nov,2009 18:59:57     1,261    dbl_repl_output_Broadcast_2009_11_19_18_58_44.log