Determining Whether An Avamar System Is Experiencing A Time Synchronization (NTP) Issue. - Dell US


Article Number: 333423


Version: 3
Article Type: Break Fix
Last Published: 29 Mar 2018
Summary: How to determine whether an Avamar system is experiencing a time synchronization (NTP) issue.

Issue

How to determine whether an Avamar system is experiencing a time synchronization (NTP) issue.

Time synchronization among all nodes is essential for the healthy operation of an Avamar system.
If the nodes within an Avamar system are not time synchronized, the following behavior can be expected:

The GSAN will be unable to start
Nodes may go offline
HFScheck may fail with MSG_ERR_CGSAN_FAILED
HFScheck may fail with MSG_ERR_HFSCHECKERRORS
Checkpoints may fail
Garbage Collection may fail
Data consistency issues may occur (if the time changes during garbage collection)

Examples of error messages commonly reported as a result of loss of time synchronization:

samconn::checkallsucceed request failed DPNTIMECHECK=230
FATAL ERROR: <0001> dpn time mismatch: synchronize clocks and retry
ERROR: <0001> dpncheckmanager::verifyStartup cgsan died unexpectedly. terminating
not enough valid responses received in time

An Avamar system may experience NTP time synchronization problems for various reasons, but before such an issue can be diagnosed, it must first be recognized.
The scope of this KB article is to help the reader determine whether an Avamar system is experiencing a time synchronization issue.

Resolving the time synchronization issue is beyond the scope of this article. Many websites cover NTP troubleshooting, and the reader is encouraged to consult them. Helpful URLs available at the time of writing are listed in the 'Additional Information' section.

Cause

Time (NTP) related issues may have one of several possible causes:
Problems with the time synchronization (ntpd) server.
Problems with the time synchronization client.
Network problems.

Note: This KB article was written with reference to Avamar v5.x, but the principles of time synchronization apply to all current versions of Avamar.

Resolution

Workaround:
1. Log in to the Avamar server as admin (see KB 95614).

2. To determine whether Avamar nodes are time synchronized, check the current time and date of each node on the Avamar system.
See APPENDIX A for sample output.
mapall --all --parallel '/bin/date'

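If all nodes report the same date and time (allowing for the time zone difference on the utility node shown in APPENDIX A), the nodes are synchronized to within the one-second resolution of the date command. For a finer measure, use ntpq as described in step 3.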
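The comparison in step 2 can also be scripted. The following minimal Python sketch is not an Avamar tool: it assumes the date output of each node has been copied into a list, and it uses a hypothetical TZ_OFFSETS table (only UTC and BST, the zones seen in APPENDIX A) to normalize every line to UTC before comparing. An unknown zone abbreviation raises KeyError.

```python
from datetime import datetime, timedelta

# Hypothetical table of time zone abbreviations -> UTC offsets. Abbreviations
# are ambiguous in general, so list only the zones your nodes actually use.
TZ_OFFSETS = {
    "UTC": timedelta(0),
    "BST": timedelta(hours=1),  # British Summer Time, as on the utility node in APPENDIX A
}


def to_utc(date_line: str) -> datetime:
    """Convert default 'date' output such as 'Mon Jun 20 12:01:12 BST 2011' to naive UTC."""
    parts = date_line.split()  # weekday, month, day, time, zone, year
    zone = parts[4]
    # Drop the zone token so strptime only has to parse fields it handles reliably.
    local = datetime.strptime(" ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y")
    return local - TZ_OFFSETS[zone]


def spread_seconds(date_lines: list[str]) -> float:
    """Seconds between the earliest and the latest node clock, after UTC normalization."""
    times = [to_utc(line) for line in date_lines]
    return (max(times) - min(times)).total_seconds()


# Lines copied from 'mapall --all --parallel date' output (APPENDIX A).
appendix_a = [
    "Mon Jun 20 12:01:12 BST 2011",
    "Mon Jun 20 11:01:12 UTC 2011",
    "Mon Jun 20 11:01:12 UTC 2011",
    "Mon Jun 20 11:01:12 UTC 2011",
]
print(spread_seconds(appendix_a))  # 0.0
```

Because date prints whole seconds and the parallel ssh sessions do not start at exactly the same instant, a spread of about one second is not by itself proof of a problem; the ntpq offsets give a much finer measure.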

3. To keep time synchronized on the nodes, Avamar uses NTP (Network Time Protocol). The Linux command "ntpq -p" (or "ntpq -pn", which skips name resolution) can be used to assess the state of NTP time synchronization on an Avamar node. See APPENDIX B for sample output.

mapall --all --noerror '/usr/sbin/ntpq -p'
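When reviewing ntpq output from many nodes, it can help to parse it. The following minimal Python sketch assumes raw "ntpq -p" output for a single node, where column 0 holds the tally code ('*' for the server the node is synchronizing with, '+' for a candidate) and the remote name starts in column 1. The Peer class and the function names are hypothetical, not part of Avamar or ntpq.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    tally: str        # column 0: '*' = current sync source, '+' = candidate, ' ' = not selected
    remote: str
    refid: str
    stratum: int
    kind: str         # the 't' column (u = unicast, l = local, ...)
    when: str         # may be '-' before the first response
    poll: int         # seconds
    reach: int        # reach register, converted from its octal display form
    delay_ms: float
    offset_ms: float
    jitter_ms: float


def parse_ntpq(output: str) -> list[Peer]:
    """Parse 'ntpq -p' output for one node into Peer records."""
    peers = []
    for line in output.splitlines():
        # Skip blank lines, the column header and the '=====' separator.
        if not line.strip() or line.lstrip().startswith("remote") or line.startswith("="):
            continue
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row (for example a mapall "(0.2) ssh ..." banner)
        remote, refid, st, kind, when, poll, reach, delay, offset, jitter = fields
        peers.append(Peer(tally, remote, refid, int(st), kind, when, int(poll),
                          int(reach, 8), float(delay), float(offset), float(jitter)))
    return peers


def system_peer(peers: list[Peer]) -> Optional[Peer]:
    """Return the peer marked '*', i.e. the one the node is synchronizing with."""
    return next((p for p in peers if p.tally == "*"), None)


SAMPLE_0_2 = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.xxx.xxx.xx  .INIT.          16 u    - 1024    0    0.000    0.000 4000.00
*avmtest1.emcvmw LOCAL(0)         9 u   54  256  377    0.085   -0.116   0.002
+avmtest2.emcvmw xx.xx.xx.xxx    10 u   56  256  377    0.090    0.073   0.012
"""

peer = system_peer(parse_ntpq(SAMPLE_0_2))
print(peer.remote if peer else "no system peer")  # avmtest1.emcvmw
```

Feeding one node's output at a time keeps the tally column intact; rows whose field count is not 10 are silently skipped.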


4. General Avamar server observations (from the APPENDIX B output):

All nodes in the grid are set to prefer 128.xxx.xxx.xx as the primary time source.
The secondary time source for all nodes in the grid is "avmtest1" (node 0.s, the utility node), which itself uses its own local clock (LOCAL(0) in the ntpq output) as its reference.
The tertiary time source is avmtest2 (node 0.0), which itself synchronizes with avmtest1.
All data nodes are synchronizing with avmtest1. In ntpq output, the time server marked with an asterisk (*) is the one the node is currently synchronizing with.
In this case, 128.xxx.xxx.xx is located remotely from the grid and has a 'reach' value of 0 (currently unreachable), so it is useless as a time source.
avmtest1 and avmtest2 both have a reachability register of octal 377, the highest value attainable. Because the primary source is unreachable, the nodes are all synchronizing with the secondary source.

Note: The 'reach' field: A full discussion of reachability is beyond the scope of this article. However, the 'reach' value is essentially a report on the status of the previous eight transactions between the NTP client and NTP server. A value of 377 means that the last eight transactions were all successful. Refer to the external references given below for more information on how the 'reach' value works.
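The reach value is an 8-bit register displayed in octal: NTP shifts it left on every poll and sets the lowest bit when a reply arrives. The following Python sketch decodes it into the last eight poll results and adds a rough usability check. Both function names are hypothetical, and looks_unusable is a heuristic assumption that simply mirrors the 128.xxx.xxx.xx example (stratum 16, reach 0, refid .INIT.), not an Avamar rule.

```python
def reach_history(reach_octal: str) -> list[bool]:
    """Decode the octal 'reach' field from ntpq into the last eight poll results.

    The register is shifted left on every poll and bit 0 is set when a reply
    arrives, so index 0 of the returned list is the most recent poll.
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError(f"reach must fit in 8 bits, got {reach_octal!r}")
    return [bool((value >> bit) & 1) for bit in range(8)]


def looks_unusable(stratum: int, reach_octal: str, refid: str) -> bool:
    """Rough heuristic (an assumption) for a source that cannot currently supply time.

    Stratum 16 means unsynchronized, reach 0 means no replies in the last eight
    polls, and a refid of .INIT. means the association has not finished starting up.
    """
    return stratum >= 16 or int(reach_octal, 8) == 0 or refid == ".INIT."


print(reach_history("377"))  # eight True values: the last eight polls were all answered
print(reach_history("376"))  # index 0 is False: only the most recent poll failed
print(looks_unusable(16, "0", ".INIT."))     # True  (128.xxx.xxx.xx in the example)
print(looks_unusable(9, "377", "LOCAL(0)"))  # False (avmtest1)
```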
5. Looking specifically at the ntpq output for node 0.2:

(0.2) ssh -x admin@10.64.18.164 '/usr/sbin/ntpq -p'

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.xxx.xxx.xx  .INIT.          16 u    - 1024    0    0.000    0.000 4000.00
*avmtest1.emcvmw LOCAL(0)         9 u   54  256  377    0.085   -0.116   0.002
+avmtest2.emcvmw xx.xx.xx.xxx    10 u   56  256  377    0.090    0.073   0.012

From this output, the following can be determined:

Node 0.2 is currently synchronizing with avmtest1 (marked with *).
avmtest1 is at stratum 9, implying that node 0.2 is at stratum 10.
Node 0.2 is polling avmtest1 once every 256 seconds.
The reachability register for avmtest1 is octal 377 (the last eight polls all succeeded).
The clock on avmtest1 is 0.116 milliseconds (116 microseconds) behind the clock on node 0.2.
The round-trip delay to avmtest1 is 0.085 milliseconds (85 microseconds).
The jitter (the measured variation between successive samples) between node 0.2 and avmtest1 is 0.002 milliseconds (2 microseconds).
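The readings for node 0.2 can be reproduced mechanically. In this minimal Python sketch, interpret_peer is a hypothetical helper and its inputs are typed in from node 0.2's avmtest1 row. It relies on two ntpq conventions: delay, offset and jitter are reported in milliseconds, and a negative offset means the server's clock is behind the local clock. The stratum statement only holds for the peer marked '*'.

```python
def interpret_peer(remote: str, stratum: int, poll_s: int, reach_octal: str,
                   delay_ms: float, offset_ms: float, jitter_ms: float,
                   local_name: str = "this node") -> list[str]:
    """Turn one ntpq -p row (for the '*' peer) into plain-language statements.

    ntpq reports delay, offset and jitter in milliseconds; they are converted
    to microseconds here because LAN values are usually well under 1 ms.
    """
    direction = "behind" if offset_ms < 0 else "ahead of"
    return [
        f"{local_name} polls {remote} every {poll_s} seconds",
        f"{remote} is at stratum {stratum}, so {local_name} is at stratum {stratum + 1}",
        f"reach register is octal {reach_octal} ({int(reach_octal, 8):08b} in binary)",
        f"{remote} is {abs(offset_ms) * 1000:.0f} microseconds {direction} {local_name}",
        f"round-trip delay is {delay_ms * 1000:.0f} microseconds",
        f"jitter is {jitter_ms * 1000:.0f} microseconds",
    ]


# Values from the avmtest1 row in node 0.2's ntpq output.
for statement in interpret_peer("avmtest1", 9, 256, "377", 0.085, -0.116, 0.002, "node 0.2"):
    print(statement)
```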

NTP configuration (/etc/ntp.conf):

The /etc/ntp.conf file on node 0.2 corresponds to the ntpq output above:

#Customer premises / external time servers.
#
server xxx.xxx.xxx.xx <-- Primary time source (this is an external server located remote to the Avamar grid)
# - - - - -
# DPN time servers here and in the other module(s).
#
server xx.xx.xx.xxx <-- Secondary time source (this is the utility node)
server xx.xx.xx.xxx <-- Tertiary time source (this is node 0.0)

Logging:
NTP logging is directed to the /var/log/messages file.
To view NTP-related log entries, search /var/log/messages* for 'ntp', for example: grep ntp /var/log/messages*

Resolving Time Synchronization Issues:

If an Avamar system is found to be experiencing time synchronization issues, the problem must be fixed at its source. Resolving time synchronization issues is beyond the scope of this article.
If an external time server is unreliable, as in the example given above, it is acceptable to use an internal time server. The time on the Avamar nodes may then drift slowly from UTC, but the most important consideration is that the data nodes are time synchronized with one another.
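Because every data node in APPENDIX B reports an offset to the same server (avmtest1), those offsets can be used to estimate how closely the data nodes agree with one another. The Python sketch below is an approximation under stated assumptions: each offset is taken to be "server clock minus node clock" in milliseconds, so the difference between two node clocks equals the difference between their offsets, and the measurement error in each reading is ignored. The function name is hypothetical.

```python
def inter_node_spread_ms(offsets_ms: dict[str, float]) -> float:
    """Approximate worst-case clock difference between nodes, in milliseconds.

    Each value is the ntpq 'offset' (in ms) of the SAME reference server as seen
    from one node. Assuming offset = server clock - node clock, the difference
    between two node clocks equals the difference between their offsets, so the
    largest gap is max(offset) - min(offset). Measurement error is ignored.
    """
    if not offsets_ms:
        raise ValueError("need at least one node offset")
    return max(offsets_ms.values()) - min(offsets_ms.values())


# Offsets to avmtest1 reported by the data nodes in APPENDIX B.
appendix_b = {"0.0": -0.197, "0.1": -0.139, "0.2": -0.116}
print(f"{inter_node_spread_ms(appendix_b):.3f} ms")  # 0.081 ms
```

This comparison is only meaningful when all nodes report an offset to the same server; the utility node, which uses LOCAL(0), is therefore left out.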

The Avamar 'asktime' utility can be used to select new, preferred time sources for NTP. Refer to KB 108926, How to Change NTP on an Avamar Server Using asktime.

Additional Information:
http://support.microsoft.com/kb/939322 - Windows domain controllers should not be relied on for accurate timekeeping.

Notes

APPENDIX A:
Example of all nodes showing synchronized time.
Note: The '--parallel' flag executes the command on each node simultaneously. On a system where time is synchronized, you will see output similar to the following:
Note: The utility node (0.s) is set to the local time zone, in this example 'BST', whereas the data nodes are set to the 'UTC' time zone. This is normal and expected behavior.

mapall --all --parallel 'date'

Using /usr/local/avamar/var/probe.xml
(0.s) ssh -x admin@xx.xx.xx.xxx 'date'
(0.0) ssh -x admin@xx.xx.xx.xxx 'date'
(0.1) ssh -x admin@xx.xx.xx.xxx 'date'
(0.2) ssh -x admin@xx.xx.xx.xxx 'date'
Mon Jun 20 12:01:12 BST 2011
Mon Jun 20 11:01:12 UTC 2011
Mon Jun 20 11:01:12 UTC 2011
Mon Jun 20 11:01:12 UTC 2011

APPENDIX B:
Example of ntpq output from an Avamar system (1 utility node and 3 data nodes):
Note: Adding the 'n' flag to the command below (ntpq -pn) disables name resolution. Output is returned more quickly, but IP addresses are shown instead of hostnames, which may reduce readability.

mapall --all --noerror '/usr/sbin/ntpq -p'


(0.s) ssh -x admin@10.xx.xx.xxx '/usr/sbin/ntpq -p'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.xxx.xxx.xx  .INIT.          16 u    - 1024    0    0.000    0.000 4000.00
*LOCAL(0)        LOCAL(0)         8 l    8   64  377    0.000    0.000   0.001

(0.0) ssh -x admin@10.xx.xx.xxx '/usr/sbin/ntpq -p'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.xxx.xxx.xx  .INIT.          16 u    - 1024    0    0.000    0.000 4000.00
*avmtest1.emcvmw LOCAL(0)         9 u  750 1024  377    0.126   -0.197   0.001

(0.1) ssh -x admin@10.xx.xx.xxx '/usr/sbin/ntpq -p'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.xxx.xxx.xx  .INIT.          16 u    - 1024    0    0.000    0.000 4000.00
*avmtest1.emcvmw LOCAL(0)         9 u  194  256  377    0.095   -0.139   0.004
+avmtest2.emcvmw xx.xx.xx.xxx    10 u  189  256  377    0.097    0.062   0.005

(0.2) ssh -x admin@10.xx.xx.xxx '/usr/sbin/ntpq -p'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.xxx.xxx.xx  .INIT.          16 u    - 1024    0    0.000    0.000 4000.00
*avmtest1.emcvmw LOCAL(0)         9 u   54  256  377    0.085   -0.116   0.002
+avmtest2.emcvmw xx.xx.xx.xxx    10 u   56  256  377    0.090    0.073   0.012

Article Properties

First Published Fri Feb 05 2016 18:25:37 GMT

Primary Product Avamar

Product Avamar Server 5, Avamar Server 6, Avamar Server 7, Avamar
