
Flexent/AUTOPLEX Wireless Networks Executive Cellular Processor (ECP) Release 16.0 Common Network Interface (CNI) Ring Maintenance
Lucent Technologies - Proprietary
This document contains proprietary information of Lucent Technologies and is not to be disclosed or used except in accordance with applicable agreements.
Copyright 2000 Lucent Technologies. Unpublished and Not for Publication. All Rights Reserved.
Issue 16.0 December 2000 401-661-045
Copyright 2000 Lucent Technologies All Rights Reserved

This material is protected by the copyright laws of the United States and other countries. It may not be reproduced, distributed, or altered in any fashion by any entity including other Lucent Technologies business units or divisions without the expressed written consent of the Customer Training and Information Products Department.
Notice
Every effort was made to ensure that the information in this document was complete and accurate at the time of printing. However, information is subject to change.
Federal Communications Commission Statement (FCC) Notification and Repair Information

NOTE: This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio frequency energy, and if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference in which case the user will be required to correct the interference at his/her own expense.
Security Statement
In rare instances, unauthorized individuals make connections to the telecommunications network through the use of remote access features. In such event, applicable tariffs require that the customer pay all network charges for traffic. Lucent Technologies cannot be responsible for such charges and will not make any allowance or give any credit for charges that result from unauthorized access.
Trademarks
5ESS is a registered trademark of Lucent Technologies.
AUTOPLEX is a registered trademark of Lucent Technologies.
AutoPACE is a registered trademark of Lucent Technologies.
BILLDATS is a registered trademark of Lucent Technologies.
DEFINITY is a registered trademark of Lucent Technologies.
DOS Windows is a trademark of Sun Microsystems, Inc.
Informix is a registered trademark of Informix Software, Inc.
Intel is a registered trademark of the Intel Corporation.
Motorola is a registered trademark of the Motorola Corporation.
Paradyne is a trademark of Paradyne Corporation.
Sun is a trademark of Sun Microsystems, Inc.
Solaris is a trademark of Sun Microsystems, Inc.
SPARC is a trademark of Sun Microsystems, Inc.
UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Ltd.
Other trademarks may appear in this document as well. They are marked on first usage.
Lucent Technologies - Proprietary. See notice on first page.
Contents

About This Document
    Purpose
    Reasons for Reissue
    Intended Audience
    How to Use This Document
    Conventions Used
    Product Safety Labels
    How to Order Documentation
    How to Comment on This Document

Overview of the CNI Ring
    DSN/CSN/ICN Hardware Descriptions
    CDN Hardware Description
        CDN
        CDN-I
        CDN-II
        CDN-IIx
        CDN-III
    RPCN Hardware Description
    Direct Link Node Hardware Description
    SS7 Node Hardware Description
    EIN Ethernet Interface Node
    CNI Integrity Process Descriptions
    Error Analysis and Recovery Process
    Automatic Ring Recovery Process
    Node Audit Capability
    Ring Audit Capability
    RPCN Token Audit
    CNI Safety Net Capability
        Inhibiting CNI Safety Net
        Allowing CNI Safety Net Feature
    General Maintenance
        Daily Activity Recommendation
        Faulty Node Recovery Strategy
        Routine Diagnostics
    Fault Descriptions
        RAC Parity/Format Error
        Unexplained Loss of Token
        SRC Match
        RAC Output Parity Error
        General RAC Error Detected
        Node Audit Failure
        Interframe Buffer Parity Error
        Read Format Error
        Write Format Error
    Emergency Maintenance
        Ring Down Recovery
        Rolling CNI Initializations
        Global CDN Recovery
        Single CDN Recovery

Description of the Ring Subsystem

General Operation of the Ring Ring Nodes Ring Peripheral Controller Nodes Basic IMS User Nodes Direct Link Nodes (DLN) Call Processor/Data Base Nodes (CDN) Interframe Buffers Node Names and Addresses Ring Message Format Reconfigurations Node Quarantine Node Isolation The Ring Config Module Initializations Level-3 IMS Initializations (FPI and Boot) Level-4 IMS Initializations (FPI and Boot) Audits Central Node Control Audit (AUD CNC) Node State Audit (AUD NODEST) Node Audit
3 Ring Maintenance
    Overview
    Automatic Ring Maintenance
    EAR or Ring Recovery
    ARR or Deferrable Node Recovery
    Manual Ring Maintenance
    Ring Maintenance Interfaces
    Ring Diagnostics
    Guide to Critical Ring Maintenance
    Examples of Ring Maintenance
    Responses to Single, Ring-Related Faults
    Responses to Multiple, Ring-Related Faults

Ring and Ring Node Maintenance Procedures

Introduction Ring Fault Conditions and Maintenance Approach Ring Node Out-of-Service Single-Ring Node Isolation Multiple-Ring Node Isolation Ring Down Ring Generic Access Package (RGRASP) Feature Definition Feature Description Software Impact Software Description User Profile Description of Feature Operation Equipment Configuration Data (ECD) Recent Change Procedures Measurement Network Management Impact Maintenance/Troubleshooting Impact Recording Output Messages Audits
Critical Events Support Tools Related Documentation Cross-References
Ring Critical Events

    Introduction
    Critical Event Message Output
    Logging Critical Events
    Short Form CNCE Message
    Long Form CNCE Message
    Using the CHG:CEPARM Command
    CNCE Descriptions

Diagnostic Users Guide

Introduction Overview Diagnostics Hardware and Interfaces System Maintenance Interfaces Performing Diagnostics Diagnostic Message Structure System Diagnostics Denied Diagnostic Requests Inhibiting Diagnostic Requests Diagnostic Aborts and Audits Operating System Diagnostics
Equipment Handling Procedures

Introduction Equipment Description and Handling Precautions Power Packs and Fusing Descriptions Fan and Filter Maintenance Ring Node Circuit Pack Handling Precautions
Ring Node Equipment Visual Indicators Removing Affected Equipment From Service UN122C and UN123B Combination Circuit Pack Installation Voice Frequency Link Hardware Equipment Replacement Procedures
Ring Error Analysis and Recovery

    Introduction
    Data Structures
    General Information
    Blockage Error
    Hard Ring Parity Errors
    Orphan Byte Error
    Soft Ring Parity Error
    Interframe Buffer Parity Error
    RAC Output Parity Error
    Write Format Error
    Read Format Error
    Received Too Short Error
    Read Inhibit Error
    Excessive Ring Command Interrupts
    Token Removed from Ring
    Source Match Error
    Miscellaneous RAC Problem
    Unexpected Loss of Token
    Checksum Audit Failure
    Node Processor Parity Failure

Ring Maintenance Reference Material

    Ring Transport Errors
    Ring-Related Errors
    Node-Related Errors
    Errors Without Consequences
    Unexplained Loss of Token
    Some IMS Input Messages
    Setting the ECD Flag for Manual Ring Mode
    ECD Values for Interframe Buffers

Figures

1-1. RAC Parity/Format Error
1-2. Unexplained Loss of Token
1-3. SRC Match
1-4. RAC Output Parity Error
1-5. General RAC Error
1-6. NAUD Failure
1-7. Interframe Buffer Error
1-8. Ring Down

2-1. Conceptual Illustration of an IMS Ring
2-2. A Ring Access Circuit on the IMS Ring
2-3. Interframe Buffers
2-4. IMS Message Format
2-5. Illustration of an Isolated Ring
2-6. Before (top) and After (bottom) Becoming a BISO or EISO Node

Ring Maintenance
3-1. A 1105 Display Page
3-2. An 1106 Display Page
3-3. Isolated RACs of BISO and EISO Nodes
3-4. Manual Recovery - Method One
3-5. Manual Recovery - Method Two

4-1. 4-2. Ring OOS Normal Single Node Isolation
4-3. New BISO Established
4-4. Diagnosing EISO Node
4-5. Two or More Faulty Nodes
4-6. New BISO Node
4-7. More Than One Faulty Node

5-1. CNCE Messages

6-1. General Format for Input/Output Messages
Tables
Ring Maintenance
3-1. Node Problems Mapped to Maintenance States and EAR Actions
3-2. ARR Responses to Maintenance-States
3-3. Output Messages that Report ARR Actions
3-4. Alarms Associated with IMS Output Messages
3-5. 1105-Page Symbols of Node Major States
3-6. Circuit Pack LED States

5-1. CNCE Descriptions

6-1. Discontinued Availability CP Listings
6-2. DGN Message Input Variations
6-3. OP:RING Input Message Variations
6-4. IRN and IRN2 RPCN Node Diagnostic Phases
6-5. IRN LN (LIN - E/SS7) Node Diagnostic Phases
6-6. IRN LN (LI4S/SS7) Node Diagnostic Phases
6-7. IRN DLNE Node Diagnostic Phases
6-8. IRN2 DLN30 Node Diagnostic Phases
6-9. IRN2 DLN60 Node Diagnostic Phases
6-10. IRN CDN-I Diagnostic Phases
6-11. IRN2 CDN-II/CDN-IIx Diagnostic Phases
6-12. IRN2 CDN-III Diagnostic Phases
6-13. IRN2 EIN Node Diagnostic Phases
6-14. IRN MDL (SCN, DSN, ICN) Diagnostic Phases
6-15. Discontinued Availability CP Listings
6-16. IRN and IRN2 RPC Trouble Location CP List
6-17. IRN LN (LIN-E/SS7) Trouble Location CP List
6-18. IRN LN (LI4S/SS7) Trouble Location CP List
6-19. IRN DLNE Trouble Location CP List
6-20. IRN2 DLN30 Trouble Location CP List
6-21. IRN2 DLN60 Trouble Location CP List
6-22. IRN CDN-I Manual Trouble Location CP List
6-23. IRN2 CDN-II/CDN-IIx Manual Trouble Location CP List
6-24. IRN2 CDN-III Trouble Location CP List
6-25. IRN2 EIN Node Trouble Location CP List
6-26. IRN MDL (CSN, DSN, ICN) Trouble Location CP List
6-27. Physical Node ID (Decimal Representation)
6-28. Physical Node ID (Hexadecimal Representation)
6-29. Physical Node Addresses (Decimal Representation)
6-30. Physical Node Addresses (Hexadecimal Representation)

7-1. Power Unit Index
7-2. Ring Node Power Supply Index
7-3. Hardware Version Values (with IFB)
7-4. Hardware Version Values (No IFB)

B-1. Some Versions of the RST Input Message
About This Document
This chapter gives an overview of the contents, intended audience, and use of the Flexent/AUTOPLEX Wireless Network Systems Common Network Interface (CNI) Ring Maintenance manual.
Purpose
This guide gives you the instructions to maintain and troubleshoot the CNI Ring as used in a Flexent/AUTOPLEX wireless network. NOTE: This document is not intended for use with the 5ESS Digital Cellular Switch (DCS) component of a Flexent/AUTOPLEX wireless network. The 5ESS DCS documentation should be used for ring maintenance.
Reasons for Reissue

Issue 16 is reissued for the following reasons:
- To correct erroneous information
- To revise any technical errors
- To make quality improvements
Intended Audience
The audience for this guide includes users who maintain the CNI ring. This may be the Lucent Technologies support personnel (CTSO) or the cellular provider's technicians.
How to Use This Document

This guide is organized as follows:
- Chapter 1, Overview of the CNI Ring: Describes the components of a CNI ring.
- Chapter 2, Description of the Ring Subsystem: Describes the ring subsystem.
- Chapter 3, Ring Maintenance: Explains the maintenance philosophy behind the CNI ring.
- Chapter 4, Ring and Ring Node Maintenance Procedures: Explains how to run the maintenance procedures for both the ring and the ring nodes.
- Chapter 5, Ring Critical Events: Explains events that indicate abnormal behavior in the ring.
- Chapter 6, Diagnostic Users Guide: Explains how to perform diagnostics on ring nodes for a CNI ring-based office.
- Chapter 7, Equipment Handling Procedures: Describes how to handle equipment when replacing hardware on the CNI ring.
- Appendix A, Ring Error Analysis and Recovery: Describes the ring error analysis and recovery procedures and mechanisms.
- Appendix B, Ring Maintenance Reference Material: Contains material in reference to maintaining the CNI ring.
- Glossary and Acronyms
- Index
Conventions Used
Specific typography is used in this guide to show actions or results.
- Commands you enter on the keyboard are shown in bold.
- Data screens or responses from the system are shown in constant width.
- Options for commands are shown in italics.
- Keys that must be pressed on your keyboard are shown in this style: ENTER
Product Safety Labels

Admonishments are strategically-placed reminders that assure safety of personnel, minimize service interruptions or loss of data, and minimize damage to equipment, products, or software. The types of admonishments used in this guide are listed below.
DANGER:
Indicates the presence of a hazard that will cause death or severe personal injury if the hazard is not avoided.
WARNING:
Indicates the presence of a hazard that can cause death or severe personal injury if the hazard is not avoided.
CAUTION:
Indicates the presence of a hazard that will or can cause minor personal injury or property damage if the hazard is not avoided.
NOTE:
Notifies you that something needs special attention or consideration.
How to Order Documentation

The FLEXENT/AUTOPLEX Wireless Network Systems Customer Documentation Catalog (401-610-000) is a guide to all FLEXENT/AUTOPLEX Wireless Network Systems customer documents and includes document descriptions and ordering information.

To order FLEXENT/AUTOPLEX Wireless Network Systems documents, including documents on CD-ROM, and all other Lucent Technologies product documentation by phone, please use the following numbers:

Within the United States:
Voice: 1-888-LUCENT8 or 1-888-582-3688, prompt 1
FAX: 1-800-566-9568

Locations outside of the United States:
Australia and all European countries: (317) 322-6416
Asia Pacific and China: (317) 322-6411
North America (excluding U.S.) and all other countries: (317) 322-6646
FAX for all international customers: (317) 322-6699

Product documentation can be ordered by mail using this address:
Lucent Technologies
Customer Information Center
Attention: Order Entry Section
2855 N. Franklin Road
P.O. Box 19901
Indianapolis, Indiana 46219
U.S.A.

To order documentation electronically, visit the Lucent Technologies Customer Information Center web site at:
http://www.cic.lucent.com
How to Comment on This Document

Lucent Technologies has endeavored to ensure that this document meets your needs. We are interested in your suggestions for improving the document. At the back of this document is a postage-paid comment card. Please complete the comment card and mail it to us at the preprinted address. If your copy of the document has no comment card, please specify the title of the document and mail your comments to:

Lucent Technologies
1000 E. Warrenville Road
P.O. Box 3013
Naperville, Illinois 60566-7013
U.S.A.
Attn: Customer Training and Information Products Manager, Room 2V-120

or e-mail your comments to wirelessdocs@lucent.com
1 Overview of the CNI Ring

Contents
    DSN/CSN/ICN Hardware Descriptions
    CDN Hardware Description
        CDN
        CDN-I
            Double Plate CDN-I
            Single Plate CDN-I
        CDN-II
        CDN-IIx
        CDN-III
    RPCN Hardware Description
    Direct Link Node Hardware Description
    SS7 Node Hardware Description
    CNI Integrity Process Descriptions
    Error Analysis and Recovery Process
    Automatic Ring Recovery Process
    Node Audit Capability
    Ring Audit Capability
    RPCN Token Audit
    CNI Safety Net Capability
        Inhibiting CNI Safety Net
        Allowing CNI Safety Net Feature
    General Maintenance
        Daily Activity Recommendation
        Faulty Node Recovery Strategy
        Routine Diagnostics
    Fault Descriptions
        RAC Parity/Format Error
            Cause
            Effect
            Craft Recovery Action
        Unexplained Loss of Token
            Effect
            Craft Recovery Action
        SRC Match
            Cause
            Effect
            Craft Recovery Action
        RAC Output Parity Error
            Cause
            Effect
            Craft Recovery Action
        General RAC Error Detected
            Cause
            Effect
            Craft Recovery Action
        Node Audit Failure
            Cause
            Effect
            Craft Recovery Action
        Interframe Buffer Parity Error
            Cause
            Effect
            Craft Recovery Action
        Read Format Error
            Cause
            Effect
            Craft Recovery Action
        Write Format Error
            Cause
            Effect
            Craft Recovery Action
    Emergency Maintenance
        Ring Down Recovery
        Rolling CNI Initializations
        Global CDN Recovery
        Single CDN Recovery

The Common Network Interface (CNI) ring serves as the medium that connects the various cellular processors together. The following sections describe the basic hardware configuration of each type of processor.
DSN/CSN/ICN Hardware Descriptions

A Digital Switch Node (DSN) is the CNI node that is used to connect the Digital Cellular Switch (DCS) to the rest of the system via data links to the DSN. A Cell Site Node (CSN) is the CNI node that is used to connect the cell sites to the rest of the system via data links to the CSN. An Inter-Cellular Node (ICN) is the CNI node that is used to connect cellular systems together via data links to the ICN. The basic difference among these three node types is the software that resides in each node. The hardware configuration for these nodes is identical.

In the Flexent/AUTOPLEX environment, each of these nodes is equipped with an Integrated Ring Node (IRN) circuit pack. This IRN board comes in several different microcode versions:

Microcode version    Circuit pack
MC3F014A1            UN303
MC3F018A1            UN303B
MC3F026A1            UN303B
MC3F026A1B           UN303C
MC3F026A1C           UN304

All of these versions can be used in a CSN, DSN or ICN. The IRN board can be found in the Node Processor (NP) slot of each node.

A new circuit pack, the UN304/UN304B, has replaced the UN303 in many applications. When the UN304 is used, the node is called an IRN2. When the UN304B is used, the node is called the IRN2B. Unless specifically stated, the term IRN can apply to any of these circuit packs. When an IRN2B is used in a CSN, it is known as a CSN Enhanced (CSNE). Unless specified otherwise, all references to CSN can include the CSNE.

The memory data link (MDL) circuit pack handles the transfer of information between the data links and the node processor. A CSN can be equipped with two MDL boards (MDL0 and MDL1), with each MDL capable of handling four data links. DSNs and ICNs should be equipped with only one MDL board. There are two types of MDL circuit packs: a TN1317 version and a TN1640 version. Either type can be used in a CSN, DSN or ICN. The TN1640 version provides additional message throughput and should be used in CSNs containing heavily loaded cell sites. See the System Capacity Monitoring and Engineering Guidelines, 401-610-009, for recommendations on how to assign CSN, DSN or ICN data links.

The data links coming into each of these node types connect to an 11A, 12A, 13A, or 13B adaptor board. The 11A adaptor board is used for RS232 connections, the 12A adaptor board is used for RS449 connections, and the 13A and 13B adaptor boards are used for V.35 connections. These adaptor boards are attached to the backplane of the CSN/DSN/ICN on the vertical slot location occupied by the MDL boards. Each adaptor board holds up to four data links and there is one adaptor board for each equipped MDL board.
CDN Hardware Description

A Call Processor/Data Base Node (CDN) is the CNI node which handles the call processing functions of the FLEXENT/AUTOPLEX Wireless Network Systems. A CDN is basically a two-part unit consisting of a node and a Ring Application Processor (RAP) unit. The following versions of CDNs may be found in existing systems:
- CDN
- CDN-I [sometimes referred to as a Standard Multi-Application Real Time (SMART) Node (SN)]
- CDN-II [sometimes referred to as a Turbo CDN (TCDN)]
- CDN-IIx
- CDN-III

Unless specified otherwise, references to CDN in this document apply to any of these versions.
CDN
The original CDN used a double-plate RAP with 2-Mbyte memory boards. A double plate CDN occupies two horizontal mounting plate locations in a CNI frame. The CCC and CCS pair can be either a UN237 and UN236 pair or a UN625 and UN626 pair. They must be a matched pair. That is, a UN2XX series CCC/CCS board is not compatible with a UN6XX series CCC/CCS board. The MASC board can be either a UN95 board or a UN295 board. There can be up to four MASC boards in the FLEXENT/AUTOPLEX environment (MASC0 through MASC3). The MASA boards are always TN56 boards. Each TN56 board provides 2 Mbytes of memory, and there can be up to eight MASA boards per MASC memory group. The NPI board is always a TN1349 board.
CDN-I
In the FLEXENT/AUTOPLEX environment, the node is always equipped with an IRN circuit pack. Only two of the three possible microcode versions are approved for use in a CDN-I. The approved versions are:

Microcode version    Circuit pack
MC3F018A1            UN303B
MC3F026A1            UN303B

The RAP portion of a CDN-I is a 3B15-based computer. The basic functional components that make up this unit are a central controller cache (CCC) board, a central controller support (CCS) board, a main store controller (MASC) board, the main store array (MASA) memory boards, and a node processor interface (NPI) board. A CDN-I comes in two different versions commonly referred to as double plate or single plate CDN-I.
Double Plate CDN-I

A double plate CDN-I occupies two horizontal mounting plate locations in a CNI frame. The CCC and CCS pair can be either a UN237 and UN236 pair or a UN625 and UN626 pair. They must be a matched pair. That is, a UN2XX series CCC/CCS board is not compatible with a UN6XX series CCC/CCS board. The MASC board can be either a UN95 board or a UN295 board. There can be up to four MASC boards in the FLEXENT/AUTOPLEX environment (MASC0 through MASC3). The MASA boards are always TN56 boards. Each TN56 board provides 2 Mbytes of memory, and there can be up to eight MASA boards per MASC memory group. The NPI board is always a TN1349 board.
Single Plate CDN-I

A single plate CDN-I occupies only one horizontal mounting plate location in a CNI frame. This space reduction is due to the replacement of the 2-Mbyte TN56 MASA boards with TN1398 MASA boards. The TN1398 boards provide 16 Mbytes of memory per board, and there can be up to eight MASA boards in the unit. The CCC and CCS pair must be a UN625 and UN626 pair. The MASC board must be a UN507 board. The same NPI board (TN1349) is used in the single plate CDN-I as in the double plate CDN-I.
CDN-II
The CDN-II is a Turbo CDN node type. The CDN-II is composed of an IRN2, an 80386-based NP, and an AP30 (prime) attached processor (AP). The AP30 is a 68030-based processor board with 80 Mbytes of local memory (16 Mbytes on the base board and an additional 64 Mbytes of zig-zag in-line package (ZIP) memory on a mezzanine board).
CDN-IIx
The CDN-IIx is a modified Turbo CDN node type. The CDN-IIx is composed of an IRN2, an 80386-based NP, and a modified AP30 attached processor. The modified AP30 is a 68030-based processor board with 16 Mbytes of local memory on the base board and from 64 to 256 Mbytes on a mezzanine board. The additional memory comes from two to eight 32-Mbyte single in-line memory modules (SIMM). Unless otherwise specified, any reference to CDN-II applies to both the CDN-II and CDN-IIx.
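The memory figures quoted for the CDN-II and CDN-IIx can be checked with simple arithmetic. The following Python sketch is an illustration only; the function and constant names are hypothetical and do not correspond to any Lucent software.

```python
# Illustrative arithmetic only. All names here are hypothetical.
BASE_BOARD_MB = 16   # memory on the AP30 base board (CDN-II and CDN-IIx)
ZIP_MB = 64          # ZIP memory on the CDN-II mezzanine board
SIMM_MB = 32         # size of one SIMM on the CDN-IIx mezzanine board


def cdn2_total_mb():
    """CDN-II: base board memory plus the 64-Mbyte ZIP mezzanine."""
    return BASE_BOARD_MB + ZIP_MB


def cdn2x_total_mb(simm_count):
    """CDN-IIx: base board memory plus two to eight 32-Mbyte SIMMs."""
    if not 2 <= simm_count <= 8:
        raise ValueError("the text describes two to eight SIMMs")
    return BASE_BOARD_MB + simm_count * SIMM_MB


print(cdn2_total_mb())     # 80
print(cdn2x_total_mb(2))   # 80  (64 Mbytes on the mezzanine)
print(cdn2x_total_mb(8))   # 272 (256 Mbytes on the mezzanine)
```

With the minimum of two SIMMs, a CDN-IIx has the same 80 Mbytes of local memory as a CDN-II.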
CDN-III
The CDN-III is an improved CDN that may be used to upgrade CDN-II or CDN-IIx type nodes. The CDN-III consists of an IRN2 node core and AP60 attached processor (TN2523), providing greater processing and memory capacity than previous CDNs. The AP60 uses an MC68LC060 processor.
RPCN Hardware Description

The Ring Peripheral Controller Node (RPCN) is the unit which provides the interface between the ring and the ECP. In the FLEXENT/AUTOPLEX environment, the ring is always equipped with two RPCNs. Each RPCN is equipped with an IRN board, which is located in the NP slot of the RPCN. The microcode versions approved for use in an RPCN are:

Microcode version    Circuit pack
MC3F026A1            UN303B
MC3F026A1            UN304

CAUTION:
Never use MC3F014A1 or MC3F018A1 microcode versions in an RPCN. Doing so could seriously hinder the ring's ability to perform automatic fault recovery tasks.

The RPCN can also be equipped with an IRN2 or IRN2B board, the UN304 or UN304B. This board is also located in the NP slot of the RPCN.

The RPCN has a duplex dual serial bus selector (DDSBS) which terminates the ECP's connection to the ring. This board is a TN69B and has a connection from the RPCN to each Control Unit (CU) of the ECP (CU0, CU1).

The RPCN also contains a 3B Interface (3BI) board which serves as the interface between the DDSBS and the NP of the RPCN. This board is a TN914.
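The microcode approvals stated so far in this chapter can be collected into a small lookup. The Python sketch below is a convenience illustration, not a Lucent tool; the dictionary layout and the function name are hypothetical, and only node types whose approved versions are listed explicitly in the text are included.

```python
# Approved IRN microcode versions by node type, as listed in this chapter.
# The structure and names are hypothetical and for illustration only.
APPROVED_MICROCODE = {
    "CSN/DSN/ICN": {"MC3F014A1", "MC3F018A1", "MC3F026A1",
                    "MC3F026A1B", "MC3F026A1C"},
    "CDN-I": {"MC3F018A1", "MC3F026A1"},
    "RPCN": {"MC3F026A1"},
}


def is_approved(node_type, microcode):
    """Return True if the text lists this microcode for the node type."""
    return microcode in APPROVED_MICROCODE[node_type]


print(is_approved("RPCN", "MC3F026A1"))   # True
print(is_approved("RPCN", "MC3F014A1"))   # False (see the CAUTION above)
```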
Direct Link Node Hardware Description

With respect to its hardware configuration, a Direct Link Node (DLN) is basically an RPCN equipped with an attached processor (AP), but it has a different task to perform in the FLEXENT/AUTOPLEX environment. The function performed by a DLN is to route the data link message traffic between cellular systems. The DLN is used to route messages into and out of the FLEXENT/AUTOPLEX systems, and for both X.25 and SS7 types of intersystem networking. FLEXENT/AUTOPLEX currently supports three types of DLNs: the DLNE, the DLN30, and the DLN60.
- The DLNE has IRNB, AP30, 3BI, and DDSBS boards.
- The DLN30 replaces the IRNB board with an IRN2B to provide increased performance and higher reliability.
- The DLN60 provides more processing power and memory than previous types of DLNs. The DLN60 uses an IRN2 node core with an AP60 attached processor. The DLN60 does not have a 3B21D computer interface.
SS7 Node Hardware Description

The SS7 nodes are used to interface with the Signal Transfer Points (STP). In the FLEXENT/AUTOPLEX environment, SS7 nodes are always equipped with an IRN circuit pack. All three IRN microcode versions are approved for use in an SS7 node. An SS7 node is also equipped with a Link Interface (LI) board. This board handles one data link from the FLEXENT/AUTOPLEX system to the STP. The LI board can be either a TN916 (MC3F003A1) or a TN1316.
EIN Ethernet Interface Node

The Ethernet Interface Node (EIN) is an Interprocess Message Switch (IMS) user node on the Common Network Interface (CNI) ring. The EIN provides access through the Ethernet from the ring to the Application Processor (AP). CNI provides the capability to transport data from the EIN to the AP and vice versa over the Ethernet. The EIN hardware consists of the following:
- Integrated Ring Node (IRN) 2 (IRN2) circuit pack (CP), UN304B (MC3F024AIB)
- EIN Link Interface (ELI) CP, TN4016
- Paddleboard, 9822EB
- Cable ED3F064-37 G80
CNI Integrity Process Descriptions

This section describes the various software processes responsible for monitoring the CNI ring to verify that it is functioning properly.
Error Analysis and Recovery Process

CNI provides an Error Analysis and Recovery process (EAR) which is responsible for analyzing error reports from the ring and determining the probable cause of the fault. Once the cause of the fault is determined, automatic corrective action is taken. This corrective action could be as simple as restoring the ring to its original configuration (no recovery action was necessary) or could result in nodes being removed from service and left in the isolated state.
Automatic Ring Recovery Process

CNI provides an Automatic Ring Recovery (ARR) process which is responsible for automatically restoring nodes which have been removed from service by the EAR process. CNI also provides an Application Specified Unconditional Restore (ASUR) process that allows the application to specify the manner in which ARR is to restore an out-of-service node (conditional or unconditional restore).

In the FLEXENT/AUTOPLEX environment, a node that is removed from service will be unconditionally restored (no diagnostics performed) if this is the first time the node has been removed in the last hour. The only exception to this rule is in the event that EAR suspects the ring interface circuitry of the IRN board may be faulty. In this case, the node will be left in the isolated state until diagnostics are performed and the node passes phase 1 and phase 2. This is necessary to ensure the stability of the ring. Unconditionally restoring a node that is in the ring interface faulty state could result in faults being generated which seriously threaten the performance of the CNI ring.

If this is the second time a node has been removed from service by EAR in the past hour, ARR will diagnose the node and only restore the unit if it passes all diagnostic phases. If this is the third time a node has been removed from service by EAR in the past hour, the node will be left in the out-of-service state. The node will remain in this state until craft takes the appropriate recovery action to restore the node to service.
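The restore rules above can be summarized as a small decision function. The following Python sketch is an illustration of the described policy only, not the ARR implementation; the function name, argument names, and returned strings are hypothetical. It assumes that the ring interface exception applies to a first removal, as the text states, and that any removal beyond the third is treated like the third.

```python
def arr_action(removals_in_past_hour, ring_interface_suspect=False):
    """Return the restore action described in the text.

    removals_in_past_hour counts EAR removals of this node in the last
    hour, including the current one (hypothetical bookkeeping).
    """
    if removals_in_past_hour < 1:
        raise ValueError("the node has not been removed from service")
    if removals_in_past_hour == 1:
        if ring_interface_suspect:
            # Exception: stay isolated until diagnostic phases 1 and 2 pass.
            return "isolate_until_phase_1_and_2_pass"
        return "unconditional_restore"
    if removals_in_past_hour == 2:
        return "diagnose_then_restore_if_all_phases_pass"
    # Third removal (assumption: also any later one) needs craft action.
    return "leave_out_of_service"


for count in (1, 2, 3):
    print(count, arr_action(count))
```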
Node Audit Capability

The Node Audit feature is a CNI process responsible for ensuring that nodes which are in the active state are functioning properly and are capable of communicating with the ring. The Node Audit does this by periodically sending a message from the ECP destined for a node, followed by a chaser message. This chaser message is not destined for any particular node. Its purpose is to circulate around the ring undisturbed and return to the node audit process. When the link node receives this audit request, it should respond by sending a reply message back to the ECP. If the ECP receives the reply message, all is well. If the reply is lost, but the chaser message arrives at the ECP as expected, then another audit message is sent to the node. If this reply is also lost, the node is assumed to be in an insane state and will be removed from service. If the first reply message was lost and the chaser message did not arrive at the ECP as expected, this implies a possible RPCN or ring problem. This is discussed in the Ring Audit Capability section of this chapter.
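The outcomes of a node audit can be sketched as follows. This Python fragment is a simplified illustration of the decision described above, not the Node Audit code; the function name, arguments, and result strings are hypothetical. It assumes that a successful second reply means the node is healthy, which the text implies but does not state.

```python
def node_audit_verdict(first_reply, first_chaser, second_reply=None):
    """Classify one node audit cycle.

    first_reply   -- True if the node's reply reached the ECP
    first_chaser  -- True if the chaser message returned to the ECP
    second_reply  -- result of the repeated audit, or None if not yet sent
    """
    if first_reply:
        return "node_ok"
    if not first_chaser:
        # Both the reply and the chaser were lost: possible RPCN or ring
        # problem, handled by the Ring Audit.
        return "suspect_rpcn_or_ring"
    if second_reply is None:
        # Reply lost but chaser returned: audit the node once more.
        return "send_second_audit"
    # Assumption: a good second reply clears the node.
    return "node_ok" if second_reply else "remove_node_from_service"


print(node_audit_verdict(True, True))
print(node_audit_verdict(False, True, second_reply=False))
print(node_audit_verdict(False, False))
```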
Ring Audit Capability

The Ring Audit feature is a CNI process based on the Node Audit process. The Ring Audit verifies the message communication path from the ECP to the ring. This task is performed by monitoring the results of the chaser message sent out by the Node Audit Capability. If a chaser message is lost, another chaser will be sent through the other RPCN. If this test is successful, then the RPCN which was first tested is assumed to be faulty and is removed from service. If the second chaser message is also lost, or the other RPCN is already out of service, a Level 3 EAR is invoked in an attempt to isolate and correct the possible ring/RPCN trouble.
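The Ring Audit decision can be sketched in the same way. The function name, argument names, and return strings are illustrative only and are not taken from the CNI software.

```python
def ring_audit_action(first_chaser_returned: bool,
                      other_rpcn_in_service: bool,
                      second_chaser_returned: bool = False) -> str:
    """Return the Ring Audit action for one lost-chaser scenario."""
    if first_chaser_returned:
        return "no action"
    if not other_rpcn_in_service:
        # No second RPCN is available to retry through.
        return "invoke Level 3 EAR"
    if second_chaser_returned:
        # The retry through the other RPCN worked, so the first RPCN is suspect.
        return "remove first RPCN from service"
    return "invoke Level 3 EAR"
```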
RPCN Token Audit

The RPCN Token Audit Capability is a CNI process that ensures a token message is circulating around the ring at all times. Since a node must possess the token message in order to write to the ring, it is critical that this message be present. The audit is performed by periodically forcing the RPCN to exercise its ring write circuitry, thus forcing it to read the token message. If a special timer fires within the RPCN before the token is detected, the token is assumed to be lost and the RPCN sends a lost token report to the EAR process in the ECP. The EAR process then reports an unexplained loss of token. A token tracking audit is then run in an attempt to discover where the token was lost. The EAR process then initiates a Level 0 restart in an attempt to return the ring to service. If this restart is unsuccessful, EAR escalates to a Level 3 ring recovery.
CNI Safety Net Capability

The CNI Safety Net Capability is a FLEXENT/AUTOPLEX process whose sole purpose is to verify that the CNI ring is up and functional. When Safety Net detects a problem with the ring, it responds by requesting a CNI Level 3 initialization or a CNI Level 4 initialization, depending on the severity of the problem. Safety Net checks the integrity of the ring every 60 seconds. It does so by sending a message from the ECP to a different node every 60 seconds. If the message is returned to the ECP by the node, then all is well. If the message is not returned to the ECP, Safety Net increments a counter and begins repeating this process, cutting the interval from 60 seconds to 10. If the failed message counter reaches its maximum error threshold (eight at the present time), a Level 3 CNI initialization will be requested to restore the communication path to the CNI ring. Safety Net also ensures that the system has a minimum of one active CDN. If Safety Net detects that all CDNs are out of service, an SI24 Defensive Check Failure Assert message is printed on the ROP. This will repeat every minute for four additional minutes (five total messages). On the sixth SI24, a CNI Level 4 Initialization will be initiated. The Safety Net will then turn itself off for 90 minutes. It should be noted that if Safety Net detects all CDNs are out of service, it will first check to see if a CDN is in the process of being restored. If so, it will allow that CDN to come up rather than begin a CNI initialization.
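The probe timing described above can be illustrated with a short simulation. This is a sketch under stated assumptions: the function and constant names are hypothetical, and resetting the counter and interval after a successful probe is an assumption, since the text does not say what happens when a message is returned after earlier failures.

```python
NORMAL_INTERVAL_S = 60   # normal probe interval from the text
FAST_INTERVAL_S = 10     # interval after a probe message is not returned
MAX_FAILURES = 8         # "eight at the present time"


def safety_net_schedule(results):
    """Walk a sequence of probe results (True = message returned).

    Returns (elapsed_seconds, level3_requested).
    """
    elapsed = 0
    failures = 0
    interval = NORMAL_INTERVAL_S
    for returned in results:
        elapsed += interval
        if returned:
            # Assumption: a returned message resets the counter and interval.
            failures = 0
            interval = NORMAL_INTERVAL_S
        else:
            failures += 1
            interval = FAST_INTERVAL_S
            if failures >= MAX_FAILURES:
                return elapsed, True
    return elapsed, False
```

Under these assumptions, eight consecutive lost messages take 60 + 7 x 10 = 130 seconds from the previous probe before the Level 3 initialization is requested.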
Inhibiting CNI Safety Net

At times, it may be necessary to inhibit (turn off) the CNI Safety Net feature. This need may arise when a fault in the ring prevents the system from being recovered via a CNI Level 4 initialization. Safety Net would continue to request CNI Level 4 initializations, getting in the way of craft attempts to clear the fault from the ring. The Safety Net feature can be easily inhibited from the Emergency Action Interface (EAI) page on the MCRT. Once on this page:
1. Enter a 42 poke command.
2. Enter i (inhibit) for the parameter value.
3. Next, a 50 initialization is required to set the flag in ECP memory.
Once Safety Net has been inhibited, it will remain in this state until a 54 initialization occurs or the inhibit flag is cleared from the EAI page (see following section). Whenever Safety Net is inhibited, it is critical that craft personnel remember to turn the feature back on once the source of the fault has been cleared. Failure to do so could result in an extended outage which Safety Net may have avoided.
Allowing CNI Safety Net Feature

The CNI Safety Net feature is always turned on at boot (54) time and remains this way unless inhibited from the EAI page. Once the feature is inhibited, it will remain in this state until craft resets the inhibit flag. To turn the Safety Net feature back on, once again go to the EAI page and:
1. Enter a 42 poke command.
2. Enter a to allow the feature to function.
3. Next, a 50 initialization is required to clear the inhibit flag in ECP memory.
General Maintenance
This section provides craft with information which could assist in identifying potentially faulty hardware before the problem is serious enough to cause a ring outage. Also included in this section are descriptions of common CNI ring faults and the steps necessary to correct the situation.
Daily Activity Recommendation

The most important tool available to craft to prevent a serious ring event is the daily history of ring maintenance activity. This information is critical given the FLEXENT/AUTOPLEX strategy for recovering faulty nodes. Quite often, a faulty node will be removed from service and restored so quickly that craft is unaware the fault ever occurred. This recovery strategy is briefly discussed in the next section. The history of recent ring maintenance activity is kept in the RPTERR1 log file located in the /etc/log directory. When this log file reaches its maximum allowable size, it is moved to RPTERR0 and a fresh RPTERR1 log file is started. The RPTERR1 file should be inspected daily for the occurrence of ring faults. The UNIX command ls -l RPTERR1 will provide the date and time of the last entry to this log file. If this time stamp indicates recent ring activity, the log file should be examined to determine the nature of the activity. This activity could be the result of routine RPCN midnight diagnostics or the result of a ring fault. If the activity is determined to be a ring fault, locate the ring fault in the Fault Descriptions section of this chapter for assistance in correcting the situation.
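Where a scripted check is preferred over reading the ls -l output by eye, the same time stamp comparison can be sketched as follows. The function name and the 24-hour window are assumptions chosen for illustration; only the /etc/log/RPTERR1 path comes from the text.

```python
import os
import time


def log_recently_updated(path="/etc/log/RPTERR1", window_hours=24.0, now=None):
    """Return True if the log file was modified within the last window_hours.

    The modification time is the same time stamp that ls -l reports.
    """
    now = time.time() if now is None else now
    mtime = os.stat(path).st_mtime
    return (now - mtime) <= window_hours * 3600
```

A True result only indicates recent activity; the log still has to be read to tell routine RPCN midnight diagnostics from a ring fault.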
Faulty Node Recovery Strategy

Usually when a node is automatically removed from service, it is due to a transient fault. This fault could be either a hardware glitch, or a software fault which causes the node to basically shut down operation. Many of these transient faults can be corrected by reinitializing the node. The only way for the node to request this is to refuse to accept messages from the ring. Once this happens, messages destined for the node will be returned to the sender. When the sending node receives this message, it reports this to the ECP and the ECP removes the node from service. Once the node is removed, it is up to ARR to restore the node to service. As mentioned in the Automatic Ring Recovery Process section, the first time a node is removed from service within a 60-minute interval, it will be restored unconditionally (no diagnostics performed). This is due to the transient nature of most faults. If it was a one-time event, the node will probably be ATP if diagnostics are performed. Given this, it is more important to get the node back into service as quickly as possible rather than take the additional time to diagnose the node on the first fault. If a second fault occurs within an hour, the node will be diagnosed. However, at times a node may contain questionable hardware which may only result in the node being faulted a couple of times a day or even less frequently. It is this borderline hardware that makes it imperative for craft to understand the importance of monitoring the daily activity in the RPTERR1 log file mentioned earlier. If a persistent fault is detected, craft intervention may be necessary to isolate the source of the problem.
Routine Diagnostics
Given the ring's ability to detect and report suspected faulty hardware, it is not recommended that diagnostics be performed on every node around the ring. However, it is recommended that RPCNs, CDNs and DLNs be taken down at least once a month (weekly if possible) and diagnosed. These nodes have been selected for preventive maintenance due to both their importance to system performance, and the extended amount of time it takes to diagnose and restore these nodes should a fault occur. While CSNs, DSNs, ICNs and SS7 are certainly important to the system, their loss does not seriously threaten system performance. Also, in the event one of these nodes is lost, the recovery time is minimal if this is the first fault.
NOTE: On the subject of performing routine diagnostics, it should be noted that there is a critical difference between a single plate and double plate (TN1398 or TN56 memory boards) CDN-I unit. Requesting diagnostics on a double plate CDN-I will result in the entire CDN-I being diagnosed. The same cannot be said of a single plate CDN-I. For a single plate CDN-I, craft MUST specify that demand phases 54 through 61 be executed. These phases are responsible for diagnosing the 16-Mbyte memory boards (one phase for each MASA board equipped). These memory diagnostics are done on a demand basis only due to the time required to complete memory diagnostics on the TN1398 circuit packs.
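As a small worked example of the "one phase for each MASA board equipped" rule, the sketch below lists the demand phases for a given number of boards. It assumes the phases are assigned consecutively starting at phase 54; the function name is hypothetical.

```python
FIRST_MEMORY_PHASE = 54
LAST_MEMORY_PHASE = 61


def memory_demand_phases(masa_boards_equipped: int) -> list:
    """Return the demand phases to request for a single plate CDN-I."""
    max_boards = LAST_MEMORY_PHASE - FIRST_MEMORY_PHASE + 1  # eight phases in total
    if not 1 <= masa_boards_equipped <= max_boards:
        raise ValueError(f"expected between 1 and {max_boards} MASA boards")
    # Assumption: board N is covered by phase 54 + (N - 1).
    return list(range(FIRST_MEMORY_PHASE, FIRST_MEMORY_PHASE + masa_boards_equipped))
```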
Fault Descriptions
This section describes various CNI ring faults. The output message associated with the fault is presented, followed by the cause of the fault, the effect the fault has on the ring, and the recovery action to clear the fault. For a more detailed description of possible faults, see Appendix A, Ring Error Analysis and Recovery. In the following descriptions, the terms upstream node and downstream node will be used. These terms describe the relative position of nodes and are based on the direction of data flow on the rings. Basically, any particular node will RECEIVE data from its upstream neighbor and will SEND data to its downstream neighbor. Since the data flows in opposite directions on the two rings, a node's upstream neighbor on ring 1 is the downstream neighbor on ring 0 and its upstream neighbor on ring 0 is the downstream neighbor on ring 1. For example, with respect to ring 0, LN00-7's upstream neighbor is LN00-6 and its downstream neighbor is LN00-8.
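The upstream/downstream relationship can be sketched for nodes inside a single group. This is a simplification: the function name is hypothetical, and it ignores group boundaries, RPCN positions, and unequipped node positions, all of which affect the real neighbor on an installed ring.

```python
def neighbors(group: int, member: int, ring: int):
    """Return (upstream, downstream) node names for a node within one group.

    On ring 0 the lower-numbered node is upstream; on ring 1 the direction
    of data flow is reversed, so the higher-numbered node is upstream.
    """
    if ring not in (0, 1):
        raise ValueError("ring must be 0 or 1")
    lower = f"LN{group:02d}-{member - 1}"
    higher = f"LN{group:02d}-{member + 1}"
    return (lower, higher) if ring == 0 else (higher, lower)
```

For example, `neighbors(0, 7, 0)` gives LN00-6 as the upstream neighbor and LN00-8 as the downstream neighbor, matching the example in the text.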
RAC Parity/Format Error

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR RAC PARITY/FORMAT ERROR DETECTED, LN00 7 RAC 0. X00000000 XFFFFFFFF X03000008 X00000380 X00004000 X00000300 (3121083924)
Cause
The reporting node, LN00-7 in this example, is reporting that its upstream neighbor on RAC 0 (LN00-6) tried to pass a bad message to it. This message is used to report both bad parity and an orphan byte failure. The effect and recovery action are the same regardless of the error type, so craft does not need to determine which of the two occurred.
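When many of these reports have to be reviewed, the reporting node and RAC can be pulled out of the message text mechanically. The sketch below is based only on the sample message shown in this section; the pattern and function name are hypothetical, and other message variants may need a different pattern.

```python
import re

# Matches reports of the form "... <ERROR NAME> DETECTED, LNgg m RAC r."
_TRANSPORT_ERR = re.compile(
    r"REPT RING TRANSPORT ERR\s+(?P<error>.+?) DETECTED,\s+"
    r"LN(?P<group>\d{2}) (?P<member>\d+) RAC (?P<rac>[01])\."
)


def parse_transport_error(text: str):
    """Return the error name, reporting node, and RAC, or None if no match."""
    match = _TRANSPORT_ERR.search(text)
    if match is None:
        return None
    return {
        "error": match["error"],
        "node": f"LN{match['group']}-{match['member']}",
        "rac": int(match["rac"]),
    }
```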
Effect
The node which had the bad message presented to it will refuse to accept the message. This will force the node offering the bad message to report ring blockage to EAR. EAR will attempt to reestablish normal ring communication by performing a Level 0 ring recovery. If this fails to correct the error condition, EAR will escalate to a Level 1 ring recovery which could result in nodes being removed and isolated.
Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the node reporting the fault and the upstream neighbor node should be taken down and diagnosed. If diagnostics do not find a problem with either node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect nodes using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
If the fault persists, replace packs in the following order:
1. If there is a pair of interframe buffer boards (IFB) between the node reporting the fault and the upstream neighbor, replace the IFB associated with the node reporting the problem.
2. If the fault persists, and IFBs are involved, replace the IFB in the node upstream of the node reporting the fault.
3. If the fault persists, replace the IRN board in the node upstream of the node reporting the problem.
4. If the fault persists, replace the IRN board in the node reporting the problem.
5. If the fault persists, and there are IFBs involved, there could be a cable problem. Call for assistance to isolate the source of the fault.
See Figure 1-1 on page 1-15.
Figure 1-1. RAC Parity/Format Error (flowchart, Charts 1, 1A, and 1B)
The flowchart follows the recovery steps listed above. Examine the UNIX file /etc/log/RPTERR1. A first occurrence is treated as a transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if the fault returns. For a recurring fault, run diagnostics on the faulted node and both neighbors, and replace and diagnose packs as per the TLP list where diagnostics are not ATP. If the fault remains after the IFB and IRN replacements and IFBs are involved, there is a possible cable problem: call for assistance in swapping cables between rings. If the fault moves, the cable is bad; configure the cables so that the faulty cable is in RAC 1 and obtain a new cable as soon as possible. If the fault does not move, call for assistance.
Note 1: If RAC 0 is implicated in the output message, the upstream neighbor is the lower node number (LN32-4 is upstream of LN32-5). If RAC 1 is implicated, the upstream neighbor is the higher node number (LN32-6 is upstream of LN32-5).
Note 2: RPCN32 is upstream of the last node in group 00 (or group 31 if equipped) on RAC 1 and downstream on RAC 0. RPCN00 is upstream of the last node in group 32 (or group 63 if equipped) on RAC 1 and downstream on RAC 0.
Note 3: If the node is an RPCN and it has no IRN, replace the R0 board if RAC 0 is implicated or the R1 board if RAC 1 is implicated.
Unexplained Loss of Token

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR UNEXPLAINED LOSS OF TOKEN REPORTED ON RING 0.
Cause
This message occurs when an RPCN detects that the token is no longer circulating around the ring.
Effect
EAR will initiate a token tracking procedure in an attempt to determine where the token was last seen. If the procedure is successful, the following message will result:
REPT TOKEN TRACK TOKEN WAS LOST BETWEEN LN63 1 AND LN63 6 ON RING: 0 X00000000 X3F63F104 X00300001 X40040001
There are several other versions of this message that could result, depending on the outcome of the token tracking procedure. Refer to the FLEXENT/AUTOPLEX Output Message Manual for these other versions. EAR will attempt to reestablish normal ring communication by performing a Level 0 ring recovery. If this fails to correct the error condition, EAR will escalate the ring recovery to a Level 1, which could result in nodes being removed and isolated.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, and the token tracking report was successful, remove and diagnose the two nodes mentioned in the report. If the token tracking report was not successful, call for assistance. If diagnostics do not find a problem with either node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect nodes using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
If the fault persists, start replacing circuit packs in the following order:
1. If there is a pair of interframe buffer boards (IFB) between the two nodes identified in the token tracking report, replace the IFB in one of the nodes.
2. If the fault persists, and IFBs are involved, replace the IFB in the other node identified in the token tracking report.
3. If the fault persists, replace the IRN board in one of the two nodes identified in the token tracking report.
4. If the fault persists, replace the IRN board in the other node identified in the token tracking report.
5. If the fault persists, call for assistance.
See Figure 1-2 on page 1-20.
Figure 1-2. Unexplained Loss of Token (flowchart, Charts 2 and 2A)
The flowchart begins by examining the ROP and the UNIX file /etc/log/RPTERR1 for the token tracking report and for other occurrences of the fault. A first occurrence is treated as a transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if the fault returns. If the token tracking report was successful, diagnose both nodes, and replace and diagnose packs as per the TLP list where diagnostics are not ATP. The chart then steps through replacement of the IRN and IFB boards in the two suspect nodes. If the fault remains, there is a possible cable problem: call for assistance in swapping cables between rings. If the fault moves, the cable is bad; configure the cables so that the faulty cable is in RAC 1 and obtain a new cable as soon as possible. Otherwise, call for assistance.
Note: If the node is an RPCN and it has no IRN, replace the R0 board if ring 0 is implicated or the R1 board if ring 1 is implicated.
SRC Match
The output message present on the ROP and the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR RMV LN33 7 RQSTD; SRC MATCH RPTD BY LN31 6 X6FB015F4 X352070B8 (2834204595)
Cause
An SRC match failure results when a node does not take a message from the CNI ring that was addressed to it. This message will eventually return to the source node, which will remove the message from the ring and report an SRC match to the ECP against the destination node.
Effect
As stated above, the message will eventually return to the source node. The source node will remove the message from the ring and report the SRC match to the EAR. This will always result in the destination node being removed from service. ARR will then restore the node to service either conditionally or unconditionally, depending on the frequency of the faults against this node.
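Because the handling of a node depends on how often it is faulted, it can help to count SRC match reports per removed node. The sketch below assumes the log text contains messages in the form of the sample shown in this section; the pattern and function name are hypothetical.

```python
import re
from collections import Counter

# "RMV LNgg m RQSTD; SRC MATCH RPTD BY LNgg m": removed node, then reporting node.
_SRC_MATCH = re.compile(
    r"RMV LN(\d{2}) (\d+) RQSTD; SRC MATCH RPTD BY LN(\d{2}) (\d+)"
)


def src_match_counts(log_text: str) -> Counter:
    """Count SRC match removals per destination (removed) node."""
    counts = Counter()
    for group, member, _rep_group, _rep_member in _SRC_MATCH.findall(log_text):
        counts[f"LN{group}-{member}"] += 1
    return counts
```

A node that appears repeatedly in the result is a candidate for the persistent-fault handling described in this section.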

Craft Recovery Action

An occasional SRC match, in itself, is normally not cause for concern. CNI integrity software running in the nodes at times detects situations that require the node to be reinitialized to clear the fault. The only means available for a node to request that it be reinitialized is to force itself to quit taking its messages from the ring, commonly referred to as panicking the node. By refusing to read its messages from the ring, the node is assured of being removed from service via the SRC match mechanism and restored via ARR. When SRC matches are detected, the RPTERR1 log file should be examined to determine the frequency of the fault. If the fault is persistent, then there could be a hardware problem and the node should be diagnosed. If the node is a single plate CDN-I, demand phases 54 through 61 must be performed to completely test the main store memory. If diagnostics do not find a problem with the node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect node using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
If the fault persists, replace circuit packs in the following order:
1. If the faults are occurring immediately after the node is restored to service, check the ECD (rcvecd) and the application database (apxrcv, iun form) to verify they are in sync with respect to the node type.
2. If the fault persists, replace the IRN circuit pack.
3. If the fault persists, replace the MDL boards one at a time, or replace the LLI board if the node is an SS7 node.
4. If the node is a CDN, check the RPTERR1 log file for the existence of a CDN panic message in the form of:
REPT COM100 TBL LN00 07 NADR: XC07 Panic : Hardware Local Bus Parity Error: CCS0(lba=0x0): CSRs=0x61100028,0x0 MASC0(lba=0x100000): CSRs=0x422054,0x4c00b500
CCS 61100028 MASC 00422054 NPI 00000000
5. If a message similar to this appears, it is not necessarily a local bus parity error. Go directly to page 3 of Figure 1-3 for CDN assistance.
6. If the fault persists, or the panic message is not present for a CDN, call for assistance in clearing the fault.
See Figure 1-3 on page 1-24.
Figure 1-3. SRC Match (flowchart, Charts 3, 3A, 3B, and 3C)
Chart 3: Run diagnostics on the faulted node, and replace and diagnose packs as per the TLP list where diagnostics are not ATP. Examine the UNIX file /etc/log/RPTERR1. A first occurrence is treated as a transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if the fault returns. Otherwise, determine the fault frequency by examining the ROP or the RPTERR1 log file. If the fault occurs immediately after restoral, check the ECD to verify the node type and check the APXRCV database to verify that it agrees with the ECD entry; correct any discrepancies and restore the node. If the fault is not cleared, go to Chart 3A.
Chart 3A: Replace the IRN board in the faulty node. If the fault is not cleared and the node is a CDN, go to Chart 3B. If the node is not a CDN, replace the MDL 0 board, then the MDL 1 board if equipped, then the adaptor boards on the node backplane. If the fault is still not cleared, call for assistance.
Chart 3B: Check the RPTERR1 error log for a PANIC: HARDWARE message for this CDN. If no such message is present, call for assistance. If one is present, the chart branches on the panic type (NPI USEC timer change, cache error, unidentified SYS error, local bus parity error, or double bit error) and steps through replacement of the NPI, CCC, and CCS boards, ending at Done, at a call for assistance, or at Chart 3C.
Chart 3C: The chart branches on whether TN56 memory boards are equipped. Starting at demand Phase 54, run one phase for each MASA board equipped (54-61), and replace boards and diagnose as per the TLP list where diagnostics are not ATP. The chart refers to instructions on converting the address in the panic message to a MASA board location; if a valid board number results, replace the suspected MASA board. For TN56 boards, insert two new TN56 boards in the first two MASA slots. If the fault still exists, return the original boards and slide the new boards to the next slot. Continue until the two new boards have been tried in each MASA position. For TN1398 boards, insert a new TN1398 board in the first MASA slot. If the fault still exists, return the original board and slide the new board to the next slot. Continue until the new board has been tried in each MASA slot. Finally, insert a new MASC board. If the fault still exists, return the original board and slide the new board to the next MASC until the new board has been tried in each MASC.
RAC Output Parity Error

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR RAC OUTPUT PARITY ERROR DETECTED, LN31 2 RAC 1. X00000000 X00000000 X03020002 X00002280 X00014000 X00000300 (2923885816)
Cause
The node reporting the fault detected that it had attempted to write a message with bad parity to the ring.
Effect
The node which had the bad message presented to it will refuse to accept the message. This will force the node offering the bad message to report ring blockage to EAR. EAR will attempt to reestablish normal ring communication by performing a Level 0 ring recovery. As part of this recovery process, each node rereads the message that it had presented to its downstream neighbor. In doing so, the node reporting the fault detects that it had presented a message containing bad parity to its downstream neighbor. If the Level 0 recovery fails to correct the error condition, EAR will escalate the ring recovery to a Level 1, which could result in nodes being removed and isolated.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the node reporting the fault should be removed and diagnosed. If diagnostics do not find a problem with the node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect node using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
If the fault persists, start replacing circuit packs in the following order:
1. Replace the IRN board in the node reporting the fault.
2. If the fault persists, call for assistance.
See Figure 1-4 on page 1-30.
Figure 1-4. RAC Output Parity Error (flowchart, Chart 4)
The flowchart follows the recovery steps listed above. Examine the ROP and the UNIX file /etc/log/RPTERR1 for other occurrences. A first occurrence is treated as a transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if the fault returns. For a recurring fault, run diagnostics on the node reporting the fault, and replace and diagnose packs as per the TLP list where diagnostics are not ATP. Then replace the IRN board in the node reporting the problem. If the fault is not cleared, call for assistance.
Note: If the node is an RPCN and it has no IRN, replace the R0 board if ring 0 is implicated or the R1 board if ring 1 is implicated.
General RAC Error Detected

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR GENERAL RAC ERROR DETECTED, LN63 1 RAC 0. X00000000 X00000000 X03018010 X00000380 X00000000 X00000300 (2834204091)
Cause
This is a catch-all error type used to report unexpected node hardware or software conditions.
Effect
The node reporting the problem will not accept any data from the upstream neighbor node, thus forcing that node to report blockage.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the node reporting the fault and its upstream neighbor should be removed from service and diagnosed. If diagnostics do not find a problem with either node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect nodes using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
If the fault persists, start replacing circuit packs in the following order:
1. Replace the IRN board in the node reporting the fault.
2. If the fault persists, replace the IRN board in the upstream neighbor.
3. If the fault persists, call for assistance.
See Figure 1-5 on page 1-32.
Figure 1-5. General RAC Error (flowchart, Chart 5)
The flowchart follows the recovery steps listed above. A first occurrence is treated as a transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if the fault returns. For a recurring fault, run diagnostics on the node reporting the fault, and replace and diagnose packs as per the TLP list where diagnostics are not ATP. Then replace the IRN board in the node reporting the problem, followed by the IRN in the upstream neighbor. If the fault is not cleared, call for assistance.
Note: If the node is an RPCN and it has no IRN, replace the R0 board if ring 0 is implicated or the R1 board if ring 1 is implicated.
Note: If RAC 0 is implicated, the upstream neighbor is the lower node number (LN32-4 is upstream of LN32-5). If RAC 1 is implicated, the upstream neighbor is the higher node number (LN32-6 is upstream of LN32-5).
Node Audit Failure

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR RMV LN32 4 RQSTD; NAUD FAILURE RPTD X6FB015F4 X352070B8 (2834204595)
Cause
The Node Audit process has detected a node that is not responding to the node audit requests, but the rest of the ring seems to be functioning normally.
Effect
The node at fault will be removed from service.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the faulted node should be removed and diagnosed. If diagnostics do not find a problem with the node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect node using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
NAUD failures can be caused by noisy data links on the node being removed from service. Before proceeding to replace circuit packs, first use the CMpfcnts tool to determine if there are questionable data links on the node being removed from service. If the fault persists, start replacing circuit packs in the following order:
1. Replace the IRN board in the faulted node.
2. Replace one of the two MDL boards.
3. Replace the other MDL board, if equipped.
4. If the fault persists, call for assistance.
See Figure 1-6 on page 1-34.
Figure 1-6. NAUD Failure (flowchart, Charts 6 and 6A)
Chart 6: Diagnose the faulty node, and replace and diagnose packs as per the TLP list where diagnostics are not ATP. Examine the UNIX file /etc/log/RPTERR1. A first occurrence is treated as a transient fault: monitor RPTERR1 for several weeks to see if the fault returns. For a recurring fault on a node that is not a CDN, the fault could be the result of noisy data links. If familiar with the CMpfcnts tool, run CMpfcnts to identify possible problem links; if noisy links are found, correct the link problem and monitor the node for several weeks. The chart also includes replacing the IRN board, continuing to Chart 6A, and calling for assistance.
Chart 6A: Replace the IRN board in the faulty node. If the fault is not cleared, replace the MDL 0 board, then the MDL 1 board, then the adaptor boards on the node backplane. If the fault is still not cleared, call for assistance.
Interframe Buffer Parity Error

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR INTERFRAME BUFFER PARITY ERROR DETECTED, LN63 1 RAC 0. X00000000 XFFFFFFFF X03008010 X00000380 X00004000 X00000300 (2834204086)
Cause
The IFB board upstream of the node reporting the fault detected that a message with bad parity has been presented to it.
Effect
The IFB will set a bit and pass the message on to the downstream node. This node will refuse to accept the bad message, thus forcing the node which presented the bad message to the IFB to report ring blockage to EAR. EAR will attempt to reestablish normal ring communication by performing a Level 0 ring recovery. If this fails to correct the error condition, EAR will escalate the ring recovery to a Level 1 which could result in nodes being removed and isolated.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the node reporting the fault and its upstream neighbor should be removed and diagnosed. If diagnostics do not find a problem with either node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect nodes using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
If the fault persists, replace circuit packs in the following order:
1. Replace the IRN board in the node upstream of the node reporting the fault.
2. If the fault persists, replace the IFB in the node upstream of the node reporting the fault.
3. If the fault persists, replace the IFB in the node reporting the fault.
4. If the fault persists, replace the IRN board in the node reporting the problem.
5. If the fault persists, call for assistance.
See Figure 1-7 on page 1-38.
Chart 7
Interframe buffer parity error
1. Run diagnostics on the node reporting the problem. If the diagnostics do not return ATP, replace packs and diagnose as per the TLP list until they do.
2. If this is the first occurrence, treat it as a transient fault. Monitor the RPTERR1 log file for several weeks to see if the fault returns.
3. If this is not the first occurrence, replace the IRN in the upstream neighbor (see Note 1; if the neighbor is an RPCN, see Note 2). If the fault is cleared, done.
4. If not cleared, replace the IFB in the upstream node. If the fault is cleared, done.
5. If not cleared, replace the IFB in the node reporting the problem. If the fault is cleared, done.
6. If not cleared, replace the IRN board in the node reporting the error (if it is an RPCN, see Note 2). If the fault is cleared, done.
7. If not cleared, call for assistance.
Note 1: If RAC 0 is implicated, the upstream neighbor is the lower node number (LN32-4 is upstream of LN32-5). If RAC 1 is implicated, the upstream neighbor is the higher node number (LN32-6 is upstream of LN32-5).
Note 2: If the node is an RPCN and it has no IRN, replace the R0 board if ring 0 is implicated or the R1 board if ring 1 is implicated.
Figure 1-7. Interframe Buffer Error
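The upstream-neighbor rule in Note 1 can be written down as a small lookup. The following Python sketch is illustrative only: the function name and the (group, member) tuple are assumptions made here, and the sketch deliberately does not handle group boundaries or RPCNs, which Note 1 does not cover.

```python
def upstream_neighbor(group, member, rac):
    """Return (group, member) of the upstream neighbor per Note 1.

    RAC 0 implicated: the upstream neighbor is the lower node number.
    RAC 1 implicated: the upstream neighbor is the higher node number.
    Group boundaries and RPCNs are outside Note 1, so this sketch
    raises an error instead of guessing below member 0.
    """
    if rac not in (0, 1):
        raise ValueError("rac must be 0 or 1")
    neighbor = member - 1 if rac == 0 else member + 1
    if neighbor < 0:
        raise ValueError("neighbor lies outside this group")
    return (group, neighbor)


# Example from Note 1: LN32-5 reporting on each RAC.
print(upstream_neighbor("LN32", 5, 0))
print(upstream_neighbor("LN32", 5, 1))
```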
Read Format Error

The output message present on the ROP and in the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR READ FORMAT ERROR DETECTED, LN00 7 RAC 0.
MSG SRC: LN00 3, msg type: zzzzz
X00000000 XFFFFFFFF X03000008 X00000380 X00004000 X00000300 (3121083924)
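When many reports must be compared, it can help to pull the error type, reporting node, RAC, and (if present) MSG SRC out of each message. The Python sketch below is a minimal illustration, not a supported tool: the regular expression is inferred only from the sample messages shown in this chapter, and the function and field names are assumptions.

```python
import re

# Pattern inferred from the sample messages in this chapter only.
REPORT = re.compile(
    r"REPT RING TRANSPORT ERR\s+(?P<error>.+?) DETECTED,\s+"
    r"(?P<node>[A-Z]+\d+\s+\d+)\s+RAC\s+(?P<rac>[01])\."
    r"(?:\s+MSG SRC:\s+(?P<src>[A-Z]+\d+\s+\d+))?"
)


def parse_report(text):
    """Return a dict for the first ring transport error in text, or None."""
    match = REPORT.search(text)
    if match is None:
        return None
    return {
        "error": match.group("error"),
        "node": match.group("node"),
        "rac": int(match.group("rac")),
        "src": match.group("src"),  # None when MSG SRC data is absent
    }


sample = (
    "REPT RING TRANSPORT ERR READ FORMAT ERROR DETECTED, LN00 7 RAC 0. "
    "MSG SRC: LN00 3, msg type: zzzzz"
)
print(parse_report(sample))
```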
Cause
The reporting node, LN00-7 in this example, is reporting that its upstream neighbor on RAC 0 (LN00 6) tried to pass a message that had a bad message length. This error usually indicates that a node on the ring is clipping or mutilating messages as they pass through it. This fault type requires immediate attention. A clipped message, if undetected, could take on the appearance of a valid maintenance message, possibly one that would force all nodes into a set quarantine state, thus removing them from service and resulting in a system outage.
Effect
The node which had the bad message presented to it will refuse to accept the message and will send an error report to the home RPCN. This will force the node offering the bad message to report ring blockage to EAR. EAR will attempt to reestablish normal ring communication by performing a level 0 ring recovery. If this fails to correct the error condition, EAR will escalate to a level 1 ring recovery, which could result in nodes being removed and isolated.

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, all reports must be examined in an effort to determine the ring segment that most likely contains the faulty node. If MSG SRC data is present in the output message, the suspected faulty node should be one of the nodes between the SRC node and the node reporting the fault (LN00 4 -> LN00 6 in the example above). If the MSG SRC data is not present, several reports must be examined to determine which area of the ring most likely contains the faulty node. For example, if reports are present from both LN00 7 and LN32 7, all nodes between LN00 7 and LN32 7 (LN00 8 -> LN32 6) are probably not the source of the problem for RAC 0 reports.
NOTE: WRITE FORMAT ERROR messages may also be present and can be used to assist in locating the faulty segment.
All nodes in the suspected ring segment should be diagnosed. If diagnostics do not find a problem with any node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspected segment using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the recommended solvent-lubricant (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
If the fault persists, replace packs in the following order:
1. Select the first node in the suspected segment and replace the UN303 board. Monitor the RPTERR data daily to determine if the fault has been cleared.
2. If the fault persists, examine the additional faults reported. If the node reporting the fault is in the suspected segment, all nodes from the node reporting this new fault to the previous nodes reporting the fault can be removed from the suspected faulty list.
3. Repeat Step 1 for the next logical link node in the suspected faulty ring segment. If any node contains IFBs, replace these as well once the UN303 has been eliminated as a suspected pack.
4. If the fault persists and all packs in the suspected segment have been replaced, call for assistance.
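The MSG SRC rule above (the suspects are the nodes between the SRC node and the reporting node) can be sketched as follows. This Python fragment is illustrative only. It assumes the caller supplies the ring as a list of node names in downstream order for the RAC in question; that list, the node naming, and the function name are assumptions and are not produced by any CNI command.

```python
def suspect_segment(ring_order, src, reporter):
    """Return the nodes strictly between src and reporter.

    ring_order is a hypothetical list of node names in downstream order
    for the implicated RAC. The walk wraps around the end of the list
    because the ring is closed.
    """
    start = ring_order.index(src)
    stop = ring_order.index(reporter)
    count = len(ring_order)
    segment = []
    position = (start + 1) % count
    while position != stop:
        segment.append(ring_order[position])
        position = (position + 1) % count
    return segment


# Hypothetical eight-node group; SRC is LN00 3 and the reporter is LN00 7.
ring = ["LN00 %d" % n for n in range(1, 9)]
print(suspect_segment(ring, "LN00 3", "LN00 7"))
```

With the sample report in this section (MSG SRC LN00 3, reporting node LN00 7), the result is the LN00 4 -> LN00 6 segment named in the text.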
Write Format Error

The output message present on the ROP and in the RPTERR1 log file for this fault is as follows:
REPT RING TRANSPORT ERR WRITE FORMAT ERROR DETECTED, LN00 7 RAC 0.
X00000000 XFFFFFFFF X03000008 X00000380 X00004000 X00000300 (3121083924)
Cause
The reporting node, LN00-7 in this example, is reporting that a message it was attempting to write to the ring failed a validation check. This message is similar to the READ FORMAT ERROR type in that it usually indicates that a node on the ring is clipping or mutilating messages as they pass through it. This fault type requires immediate attention. A clipped message, if undetected, could take on the appearance of a valid maintenance message, possibly one that would force all nodes into a set quarantine state, thus removing them from service and resulting in a system outage.
Effect
The node which was trying to write the message will not do so, nor will it accept the message being offered to it, and an error report is sent to the home RPCN. The nodes previous to the reporting node will report ring blockage to EAR. EAR will attempt to reestablish normal ring communication by performing a level 0 ring recovery. If this fails to correct the error condition, EAR will escalate to a level 1 ring recovery, which could result in nodes being removed and isolated.

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, all reports must be examined in an effort to determine the ring segment that most likely contains the faulty node. For example, if reports are present from both LN00 7 and LN32 7, all nodes between LN00 7 and LN32 7 (LN00 8 -> LN32 6) are probably not the source of the problem for RAC 0 reports.
NOTE: READ FORMAT ERROR messages may also be present and can be used to assist in locating the faulty segment.
All nodes in the suspected ring segment should be diagnosed. If diagnostics do not find a problem with any node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspected segment using the recommended contact cleaner.
NOTE: Miller Stevenson Company markets an aerosol form of the recommended solvent-lubricant (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.
If the fault persists, replace packs in the following order:
1. Select the first node in the suspected segment and replace the UN303 board. Monitor the RPTERR data daily to determine if the fault has been cleared.
2. If the fault persists, examine the additional faults reported. If the node reporting the fault is in the suspected segment, all nodes from the node reporting this new fault to the previous nodes reporting the fault can be removed from the suspected faulty list.
3. Repeat Step 1 for the next logical link node in the suspected faulty ring segment. If any node contains IFBs, replace these as well once the UN303 has been eliminated as a suspect pack.
4. If the fault persists and all packs in the suspected segment have been replaced, call for assistance.
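The exclusion rule in the example above (nodes between two reporting nodes are probably not the source) can be sketched as a filter over the ring. This Python fragment is a rough illustration under stated assumptions: the ring is supplied by the caller as a list in downstream order for the implicated RAC, the arc that is cleared runs from the first reporter to the second as in the LN00 7 / LN32 7 example, and all names are hypothetical.

```python
def remaining_suspects(ring_order, first_reporter, second_reporter):
    """Drop the nodes strictly between two reporting nodes.

    ring_order is a hypothetical list of node names in downstream order.
    The reporting nodes themselves stay on the list; only the nodes
    between them are treated as probably not the source.
    """
    start = ring_order.index(first_reporter)
    stop = ring_order.index(second_reporter)
    if start == stop:
        return list(ring_order)  # a single reporter clears nothing
    count = len(ring_order)
    cleared = set()
    position = (start + 1) % count
    while position != stop:
        cleared.add(ring_order[position])
        position = (position + 1) % count
    return [node for node in ring_order if node not in cleared]


# Deliberately shortened, hypothetical ring for illustration.
ring = ["LN00 6", "LN00 7", "LN00 8", "LN32 6", "LN32 7", "LN32 8"]
print(remaining_suspects(ring, "LN00 7", "LN32 7"))
```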
Emergency Maintenance
This section is intended to assist craft in those instances where the CNI ring appears to be flat on its back and requires craft intervention to get the system operational. While this data provides useful information, it should not be used as a replacement for calling for immediate assistance when such a situation occurs. Lucent Technologies personnel should be contacted whenever system recovery is involved rather than waiting until the Ring Down Recovery section of this chapter has exhausted its helpful hints.
Ring Down Recovery

A ring down situation can take several forms. One of these is the case where the CNI ring is repeatedly rolling into either a CNI Level 3 or CNI Level 4 initialization. The second form a ring down situation can take is where EAR is repeatedly performing various levels of ring recovery in an attempt to isolate the cause of the problem. The third scenario is one that should never happen, but given this document has just mentioned that it should never happen, it will be discussed. This is a case where all communication to the ring has been lost, but no integrity process appears to be doing anything about it. No section will be dedicated to discuss this scenario, but in the event it does occur, start the recovery process by requesting a CNI Level 3 initialization, and call for assistance immediately.
Rolling CNI Initializations

If the ring is in a state of repeated CNI initializations, perform the following steps:
1. Determine if CNI Safety Net is requesting the CNI initializations. Do this by checking the ROP for the existence of SI15, SI22 or SI24 Defensive Check failures. If present, go to Step 2, else go to Step 5.
2. Disable CNI Safety Net by going to the Emergency Action Interface page and entering a 42 poke command. When the parameter field appears, enter i to inhibit Safety Net. Next, perform a 50 initialization to set the inhibit flag in memory. This should stop the rolling initializations so that the problem can be investigated. If so, go to Step 3, else go to Step 5.
3. If Safety Net was requesting the initializations due to no CDNs being active (SI24 asserts), determine if the rest of the ring appears to be up. If so, go to Step 4; for anything else, go to Step 5.
4. No CDNs are active, but the rest of the ring seems to be up. Go to the Global CDN Recovery and Single CDN Recovery sections in this chapter for assistance in recovering from this fault.
5. Either the ring is in a rolling initialization due to CNI not being able to get an RPCN up, or SI15/SI22 asserts were present due to CNI Safety Net firing.
6. Verify that there are no power interruptions to the ring.
7. If the problem persists, examine the ROP closely to determine if CNI software is flagging any node, or group of nodes, as being a possible source of the problem. If so, pull the IRN board out of those nodes to force isolation around that segment.
8. If the RPCNs are equipped with IRN boards, verify that they have the proper microcode versions. Again, only MC3F026A1 is approved for use in an RPCN.
9. If the problem persists, power down RPCN32 to force the ring to come up on RPCN00.
10. If the problem persists, restore power to RPCN32. The problem may be related to a bad CU in the ECP. Force the ECP to do a CU switch and attempt a CNI Level 4 initialization.
11. If the problem persists, force isolated segments by removing power from one mounting plate at a time (group of three nodes). After power is removed from a group of nodes, request a CNI Level 4. If the problem persists, restore power to the previous group and remove power from the next group. Repeat this step until every node has been tried in an isolated segment.
12. Again, it is assumed that you have already called for assistance, but if not, do so immediately.
See Figure 1-8 on page 1-44.
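The one-mounting-plate-at-a-time search in Step 11 is a simple linear sweep over groups of three nodes. The Python sketch below only illustrates the bookkeeping. The node list and the ring_comes_up callback are hypothetical stand-ins for the manual actions (remove power from the group, request a CNI Level 4, observe the result, restore power); nothing here issues real commands.

```python
def mounting_plate_groups(nodes, size=3):
    """Split the node list into consecutive groups of `size` nodes."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


def find_faulty_group(nodes, ring_comes_up, size=3):
    """Return the first group whose isolation lets the ring come up.

    ring_comes_up(group) is a hypothetical callback reporting whether the
    ring recovered while that group was powered down. Returns None when
    every group has been tried without success.
    """
    for group in mounting_plate_groups(nodes, size):
        if ring_comes_up(group):
            return group
    return None


# Hypothetical seven-node ring in which LN00 5 is the faulty node.
nodes = ["LN00 %d" % n for n in range(1, 8)]
print(find_faulty_group(nodes, lambda group: "LN00 5" in group))
```

A None result corresponds to the case in which every node has been tried in an isolated segment without success, at which point the procedure calls for assistance.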
Chart 8 Ring down Ring is down but taking no recovery action Request a CNI Level 3 INIT to restart the driver Y Ring up? N Request a CNI Level 4 INIT to repump the ring Y Power down RPCN32 and request a CNI INIT 4 N Rolling INITS stop? Y Go to Chart 8A N
Rolling CNI INITS? Y Check the ROP for the presence of SI15, SI22 or SI24 asserts Present? N Y Inhibit safety net from the EAI page. Use pokes 42, I for inhibit and 50 boot to set new value.
Rolling INITS stop? N Power RPCN32 back up & power down RPCN00. Request a CNI INIT 4
Rolling INITS stop? N Verify that the RPCNs have the correct IRN micro code. Only MC3F026A1 can be used in an RPCN
Ring up? N Go to Chart 8A Y
Correct?
Correct and request a CNI INIT 4 N
Done
Go to Chart 8B Go to Chart 8A
Rolling INITS stop? Y Ring up? Y Done
Figure 1-8. Ring Down
Chart 8A
All CDNs OOS? N
Are all link nodes OOS? Y Check ROP/RPTERR1 for clues Y
Call for assistance
Mention missing files? N Rolling ring reconfigurations? Y Lost token Report? N Repeated RAC parity errors on both rings? Y
Token tracking information? Y Pull the IRN from the two nodes mentioned in the token tracking report & request CNI INIT 4 Ring up? Y Follow normal maintenance procedures to correct faulty nodes Done
If each RPCN is reporting a fault, or one RPCN and the upstream neighbor of the other (last node in groups 31 or 63), then there could be two IFB problems. Power down one RPCN to force that segment out of the ring. Place a new IFB in the other RPCN. If the problem is still present, place a new IFB in the neighbor node. If the problem still exists, try a new IRN. If the RPCN is not IRN type, replace the R0 or R1 board based on which ring the fault is reported on, if the fault does not involve both pairs. If 1506, 1509, or 1803 IFBs, then pull the IRNs from the two nodes reporting the fault to force an isolated segment.
Go to Chart 8C
Go to Chart 8B
Call for assistance
Figure 1-8. Ring Down (contd)
Chart 8B If you are here, either your ring is in a rolling boot state or the ring is thrashing trying to find a usable segment Power OK? Y N Correct problem and INIT CNI 4 N Ring up? Y
It may be a 3B problem. Force a CU switch and do a CNI INIT 4 Ring up? N Power down RPCN 00 & LN00-6. Do a CNI INIT 4 Ring up? Y Y
N Restore RPCN power. Remove power from last LN00 node, and do a CNI INIT 4. Ring up? N Continue this process of forcing small (three-six nodes) isolated segments until every node has been part of an isolated segment N Y
N Every segment tried? Y Call for assistance
Ring up? Y Correct faulty segment
Done
Figure 1-8. Ring Down (contd)
Chart 8C
Ring seems to be up, but we have no CDNs.
Applied a BWM recently? N Check ROP for memory exception messages Y
Did it touch CNI products? N
Present? N Maybe the CDN data bases need to be repumped. Power cycle one CDN and perform a manual restore. Note: Wait about 5 minutes before trying the restore so memory can initialize
Repeat process on remaining CDNs
CDN up? N
Done
Call for assistance
Back BWM out and call for assistance
Figure 1-8. Ring Down (contd)
Global CDN Recovery

This section is intended to provide assistance when all CDNs are out of service and fail to recover after a CNI Level 4 initialization. When this event does occur, execute the following steps in an attempt to clear the fault AND immediately call for assistance.
1. If you have not already done so, inhibit CNI Safety Net by going to the EAI page and entering a 42 command. When asked for the parameter value, enter i. Next, do a 50 boot to set the flag in memory.
2. Was a BWM just applied that required the CDNs to be repumped? If so, back the BWM out.
3. Check the ROP closely to see if there are any error messages present that indicate files may be missing.
4. CDN memory could be scrambled. Inhibit ARR via the inh:dmq:src arr input command. Next, power cycle each CDN, and allow the CDN to initialize its memory (approximately 5 minutes). Once the initialization is completed (red light on the MASA boards should be extinguished), request an unconditional restoral of each CDN.
5. Perform an ECP stable clear to reinitialize the CNI integrity processes using init:ecp:sc. Attempt to restore the CDNs unconditionally.
6. If the nodes are being removed during the database download portion of recovery (page 2160 shows them in the init state), use UXprint to determine if the nodes are always removed while downloading a specific database.
7. Examine the ROP closely for the existence of either of these messages:
REPT:CDN x, y (CDN-I)
REPT:CDN x, FAULT (CDN-II and CDN-III)
where y is either STACK, MEMORY or UNKNOWN. If present, contact CTS personnel.
8. Check the application database (apxrcv) iun form to verify that the CDNs are defined properly.
9. Check the ECD (apxrcv) ucb form to verify that the CDNs are defined properly.
10. It is assumed you have called for assistance already, but if not, do so immediately.
Single CDN Recovery

This section is intended to provide assistance when a single CDN will not restore to service. When this occurs, execute the following steps in an attempt to clear the fault:
1. Perform manual diagnostics on the suspect link node. If the CDN-I is a single plate RAP, demand phases 54-61 must be requested to test the MASA boards. One phase is required for each MASA board equipped.
2. If the restore fails during the pumping phase (that is, ABORTED PUMP OF IUN LN00 7), check the file /1apx10/ims/cdn/OFC.cdn.lv.x to verify that it is a contiguous file. If it is not, use the fmove command to make it contiguous. If the node is a CDN-II, check the file /1apx10/ims/cdn2/OFCcdn2 to verify that it is a contiguous file. If it is not, use the fmove command to make it contiguous.
3. If the node is a CDN-I and the fault persists, refer to CDN-I Fault Isolation in Chapter 6, Diagnostic Users Guide, for assistance in running the onboard firmware diagnostics. If the node is a CDN-II node, try replacing the AP board (TN1630B). If the fault persists, contact the CTS for assistance.
4. If the node is a CDN-I and the fault persists, inspect the RPTERR1 log file for the presence of the Hardware Panic message:
REPT COM100 TBL LN00 07 NADR: XC07
Panic : Hardware Local Bus Parity Error:
CCS0(lba=0x0): CSRs=0x61100028,0x0
MASC0(lba=0x100000): CSRs=0x422054,0x4c00b500
CCS 61100028 MASC 00422054 NPI 00000000
5. If a message similar to this appears, it is not necessarily a local bus parity error. Go directly to Chart 3B of Figure 1-3 for CDN assistance.
6. If the flowchart fails to clear the fault, call for assistance.
2
Contents
General
Operation of the Ring
Ring Nodes
- Ring Peripheral Controller Nodes
- Basic IMS User Nodes
- Direct Link Nodes (DLN)
- Call Processor/Data Base Nodes (CDN)
  - CDN-I
  - CDN-II
  - CDN-IIx
  - CDN-III
Interframe Buffers
Node Names and Addresses
Ring Message Format
Reconfigurations
- Node Quarantine
- Node Isolation
- The Ring Config Module
Initializations
- Level-3 IMS Initializations (FPI and Boot)
- Level-4 IMS Initializations (FPI and Boot)
Audits
- Central Node Control Audit (AUD CNC)
- Node State Audit (AUD NODEST)
- Node Audit
General
The Interprocess Message Switch (IMS) is a packet switch composed of ring-based communication nodes centered upon a 3B21D computer. Each ring node is controlled by a microcomputer called the node processor. The nodes are distributed around dual, parallel communication rings that propagate data in opposite directions. Ring 0, the outer ring in the illustration below, propagates data clockwise; and ring 1, the inner ring, propagates data counter-clockwise. Ordinarily, of the two ring paths, ring 0 is actively involved in transmitting user messages, while ring 1 performs as a path for internal IMS communications. Each ring node contains one interface to each of the two rings and one interface either to the 3B21D or to a user's external system. Thus, IMS has two types of nodes: nodes interconnecting the ring and the 3B21D, the most important of which are called ring peripheral controller nodes (RPCNs), and nodes interconnecting the ring with the user's external system, most of which are called basic IMS user nodes (basic IUNs). As a processing resource, the centralized 3B21D is also available to users, but its principal purpose is to provide operational, administrative and maintenance control of the switch.
The following graphic illustrates the concept of the ring.
(Figure labels: 3B21D. Legend: RPCN, BASIC IUN.)
Figure 2-1. Conceptual Illustration of an IMS Ring
The real situation is somewhat more complicated than this description, because IMS has other types of nodes and because users are represented not only by an external communication system but also by internal hardware and software residing in certain nodes. A full discussion of all classes of IMS nodes appears shortly below.
IMS may be used either as a local area network or as a switching system. More commonly it is used as a switch to transfer user messages from incoming transmission facilities to user-specified outgoing transmission facilities. A user message typically enters IMS through the external or user interface of an IUN, is formatted and addressed to a destination IUN by the resident node processor, and is inserted on the ring by the resident ring interface. It then passes around the ring to the destination IUN where it is recognized and extracted by the ring interface, reformatted by the node processor, delivered to the user interface and, then, returned to the user.
In this typical transmission the 3B21D is not directly involved, though it can be involved, depending on user requirements. When access to the 3B21D is needed, a user message enters the ring as described above but is first removed by an RPCN or similarly functioning node, which delivers it to the 3B21D, which processes it. The 3B21D then returns the processed message to an RPCN, which inserts it on the ring, from which it is removed by the destination IUN, which further processes and returns it to the user.
In this illustration of IMS switching, a user message is transferred between processes residing in different processors. By itself the illustration is misleading, because IMS is not an interprocessor message switch but an interprocess message switch. It is capable of transmitting messages between any two processes, whether user- or IMS-owned, residing in the same or in different processors. This capability is provided by a major IMS software module called the message switch.
Operation of the Ring

All ring nodes contain a ring interface. Each ring interface is equipped with a pair of ring access circuits (RACs), one connected to each ring. Each RAC consists of three elements: a first-in first-out buffer (FIFO) that is 10 bits wide, circuitry providing receive logic, and circuitry providing transmit logic.
The FIFO is actually a component of the ring, which is a mixed medium composed alternately of storage devices and transmission leads. The storage devices are the FIFOs. The transmission leads are a 12-bit ring bus that interconnects the FIFOs (and therefore the RACs). The ring bus contains eight data leads, two formatting leads, and two control leads. A data-available control lead permits the upstream RAC to signal to the downstream RAC that it is offering a byte of data. A data-taken control lead allows the downstream RAC to acknowledge to the upstream RAC that it has accepted the offered byte. Data thus advances between adjacent RACs asynchronously, one byte at a time, by means of continuous handshakes.
Upstream and downstream are relative terms. Each RAC is upstream of the RAC to which it offers data and downstream of the RAC from which it receives data. A byte of data may be offered to a RAC either by the upstream RAC or by the resident node processor, which connects with the RAC through an 18-bit DMA channel composed of 16 data leads and two formatting leads.
The first 8 bytes of a message from either source consist of header information. Each header byte is examined as it is offered by the second element of the RAC, the receive logic. The receive logic checks for parity and formatting errors and determines message disposition. It also controls the loading of each data byte into the FIFO. The third RAC element, the transmit logic, disposes of the data in the FIFO according to instructions from the receive logic.
If the message was addressed to the resident node or was a broadcast message (see the note on broadcast messages below), the bytes composing it are offered by means of handshakes to the node processor via the 18-bit DMA channel. If the message was not addressed to the resident node, the bytes composing it are offered by means of handshakes to the downstream node via the next segment of the ring bus.
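The byte-at-a-time handshake and the disposition rule described above can be modeled in a few lines. The Python sketch below is a simplified illustration, not a description of the actual hardware: the class and method names, the FIFO depth, and the string addresses are assumptions made for the example, and parity and formatting leads are not modeled.

```python
from collections import deque


class Rac:
    """Toy model of one ring access circuit (RAC)."""

    def __init__(self, address, depth=4):
        self.address = address
        self.depth = depth      # hypothetical FIFO depth
        self.fifo = deque()

    def offer(self, byte):
        """Upstream asserts data-available with one byte.

        Returns True (data-taken) if the byte was loaded into the FIFO,
        or False if the FIFO is full and the byte was not accepted.
        """
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(byte & 0xFF)
        return True

    def disposition(self, destination, broadcast=False):
        """Decide where the transmit logic sends a message."""
        if broadcast or destination == self.address:
            return "node processor"
        return "downstream"


def handshake(upstream, downstream):
    """Advance at most one byte from upstream to downstream."""
    if not upstream.fifo:
        return False
    if downstream.offer(upstream.fifo[0]):
        upstream.fifo.popleft()
        return True
    return False


a = Rac("LN00 1")
b = Rac("LN00 2", depth=1)
a.fifo.extend([0x11, 0x22])
print(handshake(a, b), handshake(a, b))
print(b.disposition("LN00 2"), b.disposition("LN00 9"))
```

In the usage lines, the second handshake is refused because the downstream FIFO (depth 1 here) is full, which is how a node that stops taking data holds back its upstream neighbor.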
(Figure labels: 12-bit ring bus, FIFO, RCV logic, XMIT logic, RAC, 18-bit DMA channel.)
Figure 2-2. A Ring Access Circuit on the IMS Ring
IMS employs a token message on each ring to ensure that only one node at a time writes messages to the ring. A token continuously traverses a ring. When a node is ready to insert a message or a block of messages on a ring, it waits for the upstream node to offer a data byte that its receive logic recognizes as the first byte of the token header. It delays accepting this byte (does not assert the data-taken lead) until it can insert its message or messages, byte by byte, on the ring. Then it accepts and transmits the token message downstream, making it available to the next node that has messages to write.
NOTE: IMS has two types of broadcast messages: general broadcasts, which are read by every node, and selective broadcasts, which are read by previously defined groups of nodes. Selective broadcasting, achieved by virtual addressing, allows such practices as parallel downloading of data or code into similar node types.
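The token rule described above (a node holds off the token, writes its queued messages, then passes the token downstream) can be illustrated with a short simulation. This Python sketch is a simplified model under stated assumptions: nodes are list indices in downstream order, each message is written in a single step rather than byte by byte, and the function name is made up for the example.

```python
def circulate_token(pending, start=0, laps=1):
    """Return the order in which queued messages are written to the ring.

    pending[i] is the list of messages queued at node i; nodes are
    numbered in downstream order. The token starts at node `start` and
    makes `laps` full trips around the ring.
    """
    written = []
    count = len(pending)
    holder = start
    for _ in range(laps * count):
        # The node delays taking the token until its messages are inserted.
        while pending[holder]:
            written.append((holder, pending[holder].pop(0)))
        # It then accepts the token and transmits it downstream.
        holder = (holder + 1) % count
    return written


queues = [["a"], [], ["b", "c"]]
print(circulate_token(queues, start=1))
```

Because only the current token holder writes, messages from different nodes never interleave on the ring in this model.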
Ring Nodes
IMS has two classes of ring nodes: RPCNs and IUNs. RPCNs are nodes that contain no user software and that interconnect the ring and the 3B21D. IUNs, which contain both IMS and user software, perform a variety of functions. The class of IUNs has two subclasses: unextended IUNs, in which the node processor provides the only processing resource, and extended IUNs, in which the processing function is supplemented by an attached processor. At present, all unextended IUNs contain external user interfaces, but no extended IUNs do. This condition, however, is arbitrary and therefore subject to change. Currently there is one type of unextended IUN, the basic IUN. There are two types of extended IUNs: direct link nodes (DLNs) and call processor/database nodes (CDN-I).
All ring nodes of either class have a ring interface and a node processor. In this document the units of a node other than the ring interface and the node processor are called auxiliary components. Ring node hardware utilizes very large scale integration hardware, housing the ring-interface and the node-processor functions in a single integrated circuit pack. These are called integrated ring nodes (IRNs). There are two versions of IRNs: the IRN/IRNB (UN303/UN303B) and the IRN2/IRN2B (UN304/UN304B).
Node processors are microcomputers composed of a CPU, memory, interrupt logic, I/O ports, and DMA circuitry. They are supplemented in DLNs by an additional microcomputer called the attached processor and in CDNs by an additional minicomputer called the ring application processor. In unextended IUNs, the node processor contains both IMS and user code. In extended IUNs, user code resides only in the attached processor, whereas both node and attached processors contain IMS code.
The content of user code is determined by user needs. Typically it provides or contributes to such functions as controlling user hardware resident in the node, managing the user's network, and providing real-time user services such as protocol conversion and message addressing. The code provided by IMS manages the ring-interface and node-processor hardware. It includes code for initialization and automatic maintenance and for such switching functions as message formatting and temporary message storage. It provides an operating system, boot monitor, memory, timers, and measurements. Except for the boot monitor, all code residing in node processors and attached processors is downloaded from the 3B21D.
Ring Peripheral Controller Nodes

RPCNs allow messages to be passively exchanged between the ring and the 3B21D. The exchange is passive because the RPCNs contain no user code that could provide processing of message substance. By contrast, direct link nodes (discussed below) provide active exchange of messages between the ring and the 3B21D by supplementing certain real-time user functions housed in the 3B21D. To minimize the consequences of a wide failure, RPCNs are distributed about the ring with approximately equal numbers of IUNs between them. A minimum requirement exists of two RPCNs per ring. Typically, large rings will have more. In addition to a ring interface and a node processor, RPCNs contain the following circuit packs:
- A duplex dual serial bus selector (DDSBS) serves as a termination point between the ring and the dual serial channels of the 3B21D. It converts the parallel output of the ring to the serial format of the dual serial channels and vice versa. The DDSBS is duplexed, with one DDSBS function connected to the dual serial channel of the on-line 3B21D control unit and one to the off-line control unit.
- A 3B21D computer interface (3BI) circuit pack serves as a buffer between the node processor and the DDSBS. It also provides data conversion between the node processor's 16-bit data bus and the DDSBS's 36-bit data bus. The 3BI communication occurs either via a DMA channel or a program I/O utility of the 3B21D operating system. The DMA channel is ordinarily used for standard message interchange. The program I/O is initiated and used by the 3B21D to issue urgent commands to the RPCN or to synchronize data transfers.
Basic IMS User Nodes

Basic IUNs interconnect the ring and the user's external system. In addition to a ring interface and a node processor, a basic IUN contains an external user interface. The external user interface and node processor communicate with one another via a shared memory in the external user interface. The MDL circuit pack described below is available as an external user interface for these nodes; or, as with Common Network Interface (CNI) link nodes, users may supply their own interface.
Direct Link Nodes (DLN)

DLNs are designed to supplement real-time processing of user data in the 3B21D. Like RPCNs, DLNs provide message transmission between the ring and the 3B21D. But unlike RPCNs, DLNs contain user code, the presence of which enables them to reduce the processing demands upon the 3B21D by assuming some user processing functions that cannot be performed by basic IUNs. In addition to a ring interface and a node processor that contains only IMS code, DLNs are composed of the following circuit packs:
- An attached processor that resides on the node-processor bus and communicates with the node processor via a dual-ported memory and hardware interrupts. The attached processor contains both IMS and user code.
- A 3B21D computer interface (3BI) and a duplex dual serial bus selector (DDSBS) that perform in the same way and serve the same functions as they do for RPCNs, as described above.
Call Processor/Data Base Nodes (CDN)

The CDN handles the call processing functions of the FLEXENT/AUTOPLEX Wireless Network Systems. There are several versions of the CDN: CDN-I, CDN-II, CDN-IIx, and CDN-III.
CDN-I
IMS offers an extended node for users who require more processing power in the nodes than can be supplied by basic IUNs. The node is called a CDN-I [sometimes referred to as a standard multi-application real time node (SMART node or SN)]. It serves as an alternative to the 3B21D for the substantive processing of user data. Currently, the CDN-I has only an interface to the ring. It is capable, however, of having an external user interface, and it may have one in the future. In addition to a ring interface and a node processor that contains only IMS code, a CDN-I is composed of the following elements:
- An attached processor called a ring application processor (RAP). The RAP is a 3B15 computer mounted on an IMS backplane that has been redesigned to conform with the design of IMS ring-node frames/cabinets and the 3B15. The older version has 2 megabytes of memory and is capable of growing an additional 94 megabytes. The newer version has 16 megabytes of memory and is capable of growing an additional 112 megabytes. The following circuit packs compose the RAP:
  - Central controller cache (CCC)
  - Central controller support (CCS)
  - Main store controller(s) (MASC)
  - Main store arrays (MASAs)
- A power control interface and display (PCID) that provides manual-power, reset, and diagnostics controls and LEDs that indicate power and diagnostic failures.
- A node-processor interface (NPI) that provides message exchange between the node processor and the RAP.
CDN-II
The CDN-II (sometimes referred to as the Turbo CDN) is a new node that replaces the CDN-I. The CDN-II requires only two boards and fits in a standard 3-node shelf or the new 5-node shelf. It is a newer-technology, higher-performance CDN: its performance is about four times that of the CDN-I. The CDN-II has a fixed 80 Mbytes of memory and consists of the IRN2B (UN304B) and an AP (TN1630B).
CDN-IIx
The CDN-IIx has identical features to the CDN-II, but different hardware. It uses the IRN2B (UN304B) and an AP (TN1720x) but can have up to 272 Mbytes of memory using multiple AP boards. A CDN-II can be upgraded to a CDN-IIx by ordering a memory growth upgrade kit.
CDN-III
The CDN-III is an improved CDN that may be used to upgrade CDN-II or CDN-IIx type nodes. The CDN-III consists of an IRN2 node core and AP60 attached processor, providing greater processing and memory capacity than previous CDNs. The AP60 uses an MC68LC060 processor.
Interframe Buffers
Interframe buffers (IFBs) are required to extend the parallel ring buses where the distance between adjacent ring nodes is greater than a few inches. In an IRN ring, the distance is 24 inches or more. Such internodal distances occur at the boundaries of frames or cabinets where the two rings must be extended by two lengths of cable. At times they may also occur within frames/cabinets. At these boundaries, an interframe-buffer circuit pack must be inserted at each end of the parallel cables, between the cables and the nodes that are separated by the cables. Interframe-buffer circuit packs are always employed in pairs. Each member of a pair contains both send and receive circuitry. Therefore, the paired packs are mutually dependent, with each providing half of the buffering function for each parallel ring bus. The following graphic illustrates the pairing of the interframe buffers.
[Figure 2-3. Interframe Buffers: two ring interfaces (RAC 0, RI, RAC 1) joined on ring 0 and ring 1 by cables, with a paired IFB (send and receive) at each cable end.]

Thus, if either member of a pair fails, the pair fails.

In addition to providing necessary drive capability without slowing down the internodal byte transfer rate, interframe buffers in padded form may be used to increase the effective lengths of small rings, thereby permitting them to employ longer messages. For this purpose, two pairs of 4104-byte buffers may be inserted in small IRN rings. The pairs should be placed diametrically on the ring to minimize the possibility that both would be included in an isolation. If additional interframe buffers are needed, they should be of the standard 16-byte capacity. The 16-byte capacity is adequate for use on large rings where employment of long messages requires no buffer padding.

Technicians should ensure that the actual sizes of their interframe buffers correspond to the sizes entered in equipment configuration data (ECD). See "ECD Values for Interframe Buffers" in Appendix B, Ring Maintenance Reference Material.
Node Names and Addresses

Ring nodes are named as members of the group in which they reside. A group is composed of a maximum of 16 member nodes numbered 00 through 15. Node 00 is always reserved for an RPCN. Nodes 01 through 15 are reserved for other node-types. If a node position is unequipped, the member number is nevertheless reserved for the position.

Node names consist of a node-type identification followed by a 2-digit group number followed by a 2-digit member number. IUN32 10, for example, is an IUN, and it is member 10 (or the 11th node or node position) in group 32. RPCN00 0 is an RPCN, and it is member 0 (or the first node or node position) in group 00. Member numbers and group numbers are assigned so that they increase in the direction of traffic flow on ring 0. Unlike member numbers, however, group numbers do not necessarily increase by consecutive integers. Thus, a ring might consist of groups 00, 01, 02, 32, 33, and 34, for example. In IMS usage, nodes are identified by the formula RPCNa b or IUNa b, where a is the 2-digit group number and b is the 2-digit member number.

In addition to names, nodes have identifications and physical addresses. (Nodes may also have virtual addresses, but technicians will not encounter or use them.) The identification, a number between 0 and 1023, represents the physical location of the node on the ring. The identification is calculated with the formula 16(a) + b, where a is the group number and b is the member number. The identification appears in decimal or hexadecimal form in various IMS output messages. It is also the address that is strapped on the back of each node by grounding the node ID pins. The pins, which are numbered 0 through 9, represent sequential binary weights (ID 0 = 1, ID 1 = 2, ID 2 = 4, ID 3 = 8, and so on). The sum of the binary weights of all grounded pins is the node identification.

The physical node address, a number between 3072 and 4095, is used in IMS message headers to identify the source and destination addresses of messages. The physical address is calculated by adding 3072 (or in hexadecimal notation, C00) to the node identification. The number 3072 corresponds to the two most significant bits in the 12-bit source- and destination-address fields of message headers, the lower 10 bits being the node identification. Tables in the reference chapter of this document provide translations of both identifications and physical addresses into node names. Technicians will encounter the hexadecimal form of the physical node address in messages output in response to phase 1 and 2 diagnostic failures.
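The identification and address arithmetic described above can be checked by hand with a short sketch. The Python below is illustrative only and is not part of IMS; the function names are hypothetical, and the range checks simply restate the limits given in the text (members 0 through 15, identifications 0 through 1023).

```python
ID_PINS = 10           # ID 0 through ID 9, binary weights 1, 2, 4, ... 512
ADDRESS_BASE = 0xC00   # 3072: the two most significant bits of the 12-bit field


def node_identification(group: int, member: int) -> int:
    """Return 16(a) + b for group number a and member number b."""
    if not 0 <= member <= 15:
        raise ValueError("member number must be 0 through 15")
    ident = 16 * group + member
    if not 0 <= ident <= 1023:
        raise ValueError("identification must be 0 through 1023")
    return ident


def physical_address(group: int, member: int) -> int:
    """Return the physical node address: 3072 plus the identification."""
    return ADDRESS_BASE + node_identification(group, member)


def grounded_pins(ident: int) -> list:
    """Return the ID pin numbers whose binary weights sum to ident."""
    return [pin for pin in range(ID_PINS) if (ident >> pin) & 1]


if __name__ == "__main__":
    ident = node_identification(32, 10)          # the IUN32 10 example
    print(ident, hex(physical_address(32, 10)), grounded_pins(ident))
```

For IUN32 10 the sketch gives identification 522, physical address 3594 (hexadecimal E0A), and grounded pins ID 1, ID 3, and ID 9.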
Ring Message Format

The figure below illustrates the format of IMS messages as they appear on the 12-bit ring bus (the two control leads are not shown).

[Figure 2-4. IMS Message Format. The first byte carries the C-bit and the control fields DC, RR, CF, and CC; it is followed by the word count, the source address (with SR), the destination address (with DR), and the data bytes through the last data byte.]

LEGEND: CC = Control Code; CF = Control Flag; RR = RAC Reset; DC = Destination Control; SR = Source Ring ID; DR = Destination R
The illustration leaves blank fill bits and bits that are not examined by ring-interface hardware. The first 8 bytes constitute the message header. The first byte contains a 7-bit control field from which the RAC learns how to respond to the message. Within the first byte, the control code (CC) defines the message function. Functions are token, software, destroy, set/clear quarantine, set/clear isolation, and processor reset. The destination control (DC) identifies the address-type. Types are normal address match, general broadcast, selective broadcast, and take message.

In addition to the 8 data-bits, there is a ninth bit, called the control or C-bit, which is always set to logic-one to identify the beginning byte of every message. From association with this feature, the entire first message byte is often referred to in documentation as the control or C-byte. The tenth bit is a parity bit which provides odd parity over the data byte and C-bit. When a RAC writes a message to the ring, it generates the C-bit and modifies the parity bit from node-processor memory to include the C-bit. When a RAC reads a message from the ring, the C-bit is removed and parity is changed back to its original form before being written to node-processor memory.

The word count in the second message byte informs the RAC of the total number of 32-bit words in the message. Each message contains 4N bytes, where N is the value of this 7-bit word count. All messages are padded out to contain an integral number of 32-bit words. The longest possible message that can be placed on the ring is limited to the maximum value of this word count, which is 127 32-bit words (508 bytes) for rings that allow the short message and 543 32-bit words (2172 bytes) for rings that allow the long message. For explanations of conditions that permit short and long messages, see the discussion of interframe buffers above.
The third and fourth header bytes contain the source address, and the fifth and sixth header bytes contain the destination address. The ring-interface hardware performs address matching on the 12-bit node address and the 1-bit ring ID (that identifies which of the two rings is used for the message). The lower 10 bits of the ring address are referred to as the node identification. Each node is assigned a unique 10-bit node identification via the ID0-ID9 backplane straps. This header information enables the RAC to determine message disposition and the source and destination addresses, to check for errors in parity, format, and message length, and to perform hardware control functions required for ring maintenance.
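The parity and word-count rules above lend themselves to a small worked example. The following Python is a sketch under stated assumptions, not RAC logic: it takes "odd parity over the data byte and C-bit" in the conventional sense (data bits, C-bit, and parity bit together hold an odd number of ones), and the function names and the lower bound of one word are hypothetical.

```python
SHORT_MAX_WORDS = 127   # short-message limit from the text (508 bytes)
LONG_MAX_WORDS = 543    # long-message limit from the text (2172 bytes)


def odd_parity_bit(data_byte: int, c_bit: int) -> int:
    """Return the parity bit that makes data byte + C-bit + parity odd."""
    if not 0 <= data_byte <= 0xFF:
        raise ValueError("data byte must be 0 through 255")
    if c_bit not in (0, 1):
        raise ValueError("C-bit must be 0 or 1")
    ones = bin(data_byte).count("1") + c_bit
    return 0 if ones % 2 == 1 else 1


def message_length_bytes(word_count: int, long_messages: bool = False) -> int:
    """Return 4N bytes for a word count N, enforcing the ring's limit."""
    limit = LONG_MAX_WORDS if long_messages else SHORT_MAX_WORDS
    if not 1 <= word_count <= limit:   # lower bound of 1 is an assumption
        raise ValueError("word count outside the range allowed on this ring")
    return 4 * word_count


if __name__ == "__main__":
    # The same data byte needs a different parity bit once the C-bit is set,
    # which is why the RAC adjusts parity when it adds or removes the C-bit.
    print(odd_parity_bit(0x01, 0), odd_parity_bit(0x01, 1))
    print(message_length_bytes(127), message_length_bytes(543, long_messages=True))
```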
Reconfigurations
The types and number of nodes composing any ring are selected to meet the requirements of a specific user. Thus, only a ring whose components are fully in service may be thought of as properly configured. Yet rings must sometimes be temporarily reconfigured for such reasons as the need to repair or replace equipment. IMS reconfigures a ring by removing one or more nodes from service. Nodes that have been removed from service are ordinarily in one of two states. They may be quarantined or they may be isolated.
Node Quarantine
Quarantining a node consists of electrically severing the node processor from its associated ring interface, an action that prevents the node processor from communicating through or to the ring interface. However, the action does not prevent the 3B21D or other nodes from limited communications with the node processor, which they accomplish by setting registers in the ring interface. When a node is placed in quarantine, both RACs are set to forced-propagate mode, which allows them to continue propagating messages on the rings but prevents them from reading messages from or writing messages to the rings.

Quarantining is the appropriate response to a fault that occurs in a node processor or in any of the auxiliary components of a node. Quarantining has the advantage over isolation that it disturbs the ring subsystem only slightly.

Throughout this document the term "quarantine" is used solely to represent a node that is in the state described above and that is in the active ring. Nodes in isolation or nodes during initialization or recovery sequences may have their node processors electrically severed from their ring interfaces, which are in forced-propagate mode. Such nodes will not be called "quarantined" since they are not in the active ring.
Node Isolation
Quarantining a node insulates the active ring from faults or activities in the node processor and in auxiliary components. Isolating a node insulates the active ring from the entire node. It is achieved by converting the ring subsystem from one dual-ring structure to two single-ring structures. Of the two single-ring structures, one is the active segment that continues to transmit user messages, and the other is the isolated segment that contains the isolated node or nodes. Isolated segments do not have a token message. The following figure schematically represents an isolated ring.
[Figure 2-5. Illustration of an Isolated Ring. The figure shows the 3B21D and a ring of RPCNs and basic IUNs, with a legend distinguishing the two node types.]

In this illustration, the active segment is composed of the unlettered nodes and of basic IUNs A and C, and the isolated segment is composed of RPCN B only. Basic IUNs A and C are called, respectively, the Beginning-of-Isolation (BISO) node and the End-of-Isolation (EISO) node. They are participants in the active ring that have the special function of altering the dual rings to form the isolated ring. They achieve this alteration by means of internal data selectors that can shunt traffic from one parallel ring to the other. This phenomenon is represented in the following illustration of a node before and after it becomes a BISO or EISO node.
[Figure 2-6. Before (top) and After (bottom) Becoming a BISO or EISO Node. Each panel shows ring 0 and ring 1 passing through RAC 0 and RAC 1, each with a data selector (DS); the legend distinguishes the selected ring path from the unselected ring path.]

Because all nodes have this shunting capability, any node of any class can perform as a BISO or an EISO node. The nodes actually selected to perform these functions are determined by the location of the node(s)-to-be-isolated. The node selected to be the BISO node is ordinarily the first node upstream on ring 0 of the node(s)-to-be-isolated (and therefore the next lower-numbered node), and the node selected to be the EISO node is ordinarily the first node downstream on ring 0 of the node(s)-to-be-isolated (and therefore the next higher-numbered node). If more than one node must be isolated (a phenomenon called a multiple isolation), IMS software chooses to reconfigure the ring in such a way as to include the smallest number of nodes possible. Nodes included in a multiple isolation, not because they contain faults, but because they lie between faulty nodes, are called innocent victim nodes.

The BISO and EISO nodes also provide the means by which maintenance messages are transmitted between the active and the isolated segments of an isolated ring. BISO and EISO nodes have one RAC participating in the active segment and one RAC participating in the isolated segment. Messages destined for either ring segment may be read from the sending segment by the EISO or BISO RAC participating in it, transmitted via the node processor to the RAC participating in the receiving segment, and then written to the receiving segment. It is by this means that diagnostic code is downloaded by the 3B21D into isolated nodes and diagnostic results are returned to the 3B21D.

Isolation is a more drastic means than quarantine for removing a faulty node from service. It is an appropriate response to a fault in the ring interface or in the medium between ring interfaces (this may be a fault that prevents messages from being propagated on the ring).
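The selection of BISO and EISO nodes and the notion of innocent victim nodes can be illustrated with a short sketch. The Python below is not IMS code: the function name is hypothetical, the ring is modeled simply as a list of node names in the direction of traffic flow on ring 0, and the requirement that the BISO and EISO nodes be two distinct nodes outside the isolated segment is an assumption made for the example.

```python
from typing import List, Sequence, Set, Tuple


def plan_isolation(ring: Sequence[str],
                   faulty: Set[str]) -> Tuple[str, str, List[str], List[str]]:
    """Return (BISO, EISO, isolated segment, innocent victims).

    The isolated segment is the shortest run of consecutive nodes that
    contains every faulty node.
    """
    n = len(ring)
    positions = sorted(ring.index(name) for name in faulty)
    if not positions:
        raise ValueError("at least one faulty node is required")

    # Try each faulty node as the first node of the run. The run then ends
    # at the faulty node that precedes it cyclically, so it covers them all.
    best_start, best_length = positions[0], n + 1
    for k, start in enumerate(positions):
        end = positions[k - 1]
        length = (end - start) % n + 1
        if length < best_length:
            best_start, best_length = start, length

    if best_length > n - 2:  # assumption: BISO and EISO are distinct nodes
        raise ValueError("no room left for separate BISO and EISO nodes")

    segment = [ring[(best_start + i) % n] for i in range(best_length)]
    biso = ring[(best_start - 1) % n]            # first node upstream
    eiso = ring[(best_start + best_length) % n]  # first node downstream
    victims = [name for name in segment if name not in faulty]
    return biso, eiso, segment, victims


if __name__ == "__main__":
    ring = ["RPCN00 00", "IUN00 01", "IUN00 02", "IUN00 03", "IUN00 04", "IUN00 05"]
    print(plan_isolation(ring, {"IUN00 02", "IUN00 04"}))
```

In the usage example, IUN00 03 lies between the two faulty nodes and is therefore reported as an innocent victim.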
The Ring Config Module

When the ring is restarted or when an isolation is imposed or dissolved, the action is performed by the IMS ring config module, whose principal acts are:
1. to inhibit the services provided by the message switch, thus preventing the nodes from writing to the ring, a condition known as ring silence
2. to set the data selectors of every node to positions that provide the desired ring structure
3. to test ring continuity, and
   - if continuity is good, to issue one token message, when the ring contains an isolation, or two token messages, when it does not, and to restart the message switch; or
   - if continuity is bad, to abort and return control to the process that initiated ring config.

The ring config module may be executed by IMS initialization software, by Error Analysis and Recovery (EAR) software, by Automatic Ring Restoral (ARR) software, or by manual commands to change the structure of the ring. The processes mentioned here are described at length later in this document.
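The order of the principal acts listed above can be summarized in a few lines of Python. This is only a schematic sketch: the function name, the dictionary of selector positions, and the continuity_test callable are hypothetical stand-ins, and the function records the steps rather than touching any hardware.

```python
from typing import Callable, Dict, List, Tuple


def run_ring_config(selector_positions: Dict[str, str],
                    has_isolation: bool,
                    continuity_test: Callable[[Dict[str, str]], bool]
                    ) -> Tuple[bool, List[str]]:
    """Walk through the ring config steps and return (success, step log)."""
    log = ["inhibit message switch (ring silence)"]
    log.append(f"set data selectors on {len(selector_positions)} node(s)")
    if not continuity_test(selector_positions):
        log.append("continuity bad: abort and return control to caller")
        return False, log
    # One token for a ring that contains an isolation, two for a ring that
    # does not.
    tokens = 1 if has_isolation else 2
    log.append(f"continuity good: issue {tokens} token message(s)")
    log.append("restart message switch")
    return True, log


if __name__ == "__main__":
    ok, steps = run_ring_config({"IUN00 01": "normal"}, False, lambda s: True)
    print(ok)
    for step in steps:
        print(" ", step)
```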
Initializations
IMS offers seven levels of System Initialization - 0, 1A, 1B, 3(FPI), 3(BOOT), 4(FPI), and 4(BOOT) - with each higher level providing more complete initialization and greater impact on the user.
Levels 0, 1A, and 1B reinitialize certain data in the 3B21D; they are usually run in response to program faults in the 3B21D or in response to 3B21D operating system initializations that affect 3B21D-to-node DMA interfaces. An escalation strategy ensures that repeated problems with these lower-level initializations will result in one or more of the higher-level initializations being attempted.

Full Process Initializations (FPI) occur without a preceding IMS abort and, therefore, require little initialization of IMS software in the 3B21D. Instead of copying all IMS code and data resident in the 3B21D from disk, FPI initializations restart the principal body of IMS code, the driver. The FPI feature has the advantage of saving initialization time, particularly in level 3(FPI) initializations, and of greatly simplifying the initialization sequence. The BOOT initializations are preceded by abort and boot sequences of IMS in the 3B21D. Thus, the two FPI levels provide partial initialization of IMS in the 3B21D, and the two BOOT levels provide full initialization of IMS in the 3B21D.
In the ring, completeness of initialization increases with the numbers. The level 3 initializations (FPI and BOOT) attempt to conserve system usability by reinstating the ring structure that existed prior to initialization. They resort to establishing a new structure only if tests indicate that the existing structure is not viable. By contrast, the level-4s make no attempt to reinstate the previous ring but immediately set about testing all nodes to determine the optimum ring structure. Thus, the level-3s provide partial initialization of the ring, and the level-4s provide complete initialization of the ring.

IMS software can request the three lower levels of initialization but not the four higher levels. Instead, it responds to internally-detected problems requiring higher-level initializations in one of two ways. It can request that the user choose one of the four higher levels, or it can abort, thereby forcing the user to choose either level 3(BOOT) or level 4(BOOT). In general, it responds to an indication of software mutilation by aborting. Otherwise, it allows the user to decide how to respond. The user can also independently request any of the four higher levels.
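The seven levels and their properties can be tabulated for quick reference. The Python below is an illustrative summary of the text only; the type and function names are hypothetical, and the ring scope is left as None for levels 0, 1A, and 1B because the text describes them only in terms of 3B21D data.

```python
from typing import Dict, List, NamedTuple, Optional


class InitLevel(NamedTuple):
    ims_in_3b21d: str            # scope of IMS initialization in the 3B21D
    ring: Optional[str]          # scope of ring initialization, if stated
    software_may_request: bool   # True if IMS software can request the level


LEVELS: Dict[str, InitLevel] = {
    "0":       InitLevel("certain data", None, True),
    "1A":      InitLevel("certain data", None, True),
    "1B":      InitLevel("certain data", None, True),
    "3(FPI)":  InitLevel("partial", "partial", False),
    "3(BOOT)": InitLevel("full", "partial", False),
    "4(FPI)":  InitLevel("partial", "complete", False),
    "4(BOOT)": InitLevel("full", "complete", False),
}


def requestable_by_software() -> List[str]:
    """Levels that IMS software itself may request."""
    return [name for name, lvl in LEVELS.items() if lvl.software_may_request]


def choices_after_abort() -> List[str]:
    """Levels left to the user once IMS has aborted (the BOOT levels)."""
    return [name for name in LEVELS if name.endswith("(BOOT)")]


if __name__ == "__main__":
    print(requestable_by_software())
    print(choices_after_abort())
```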
Level-3 IMS Initializations (FPI and Boot)

Level-3(FPI) initializations begin with a limited initialization of IMS in the 3B21D as described above. Level-3(BOOT) initializations begin with a full initialization of IMS in the 3B21D as described above. Both level-3s then proceed to initialize the ring with the following sequence of events:
1. An attempt is made to restart all RPCNs. That is, depending on generics, either their current operational code is placed in execution, or they are downloaded with new operational code which is placed in execution. If any RPCN restarts, this sequence proceeds to Step 2. If all RPCNs fail to restart, all are downloaded with new operational code which is placed in execution, and the sequence then proceeds to Step 2.
2. The ring is audited to
   - confirm that the token is or the tokens are present
   - identify the current ring structure by examining the positions of node data selectors
   - check for inconsistencies between the actual ring structure and ECD data, and
   - verify that all RACs can propagate data on the ring.
   During the audit, message traffic between nodes on the ring is permitted to continue, though message traffic to the 3B21D is denied.
3. If all audit tests pass, the ring config module is called to establish a ring in conformity with the prior ring-structure. But if any audit test fails or reveals an inconsistency, a new strategy of empirically testing for ring continuity begins by sending test messages on the ring. If the tests reveal ring continuity, the ring config module is called to establish the normal two-ring structure; but if the tests reveal discontinuity, ring config is called to establish an isolated ring that excludes the problem node or nodes. In either case, the IMS initialization process exits when ring config has established a viable ring or aborts when it is unable to. (An exception to this statement occurs when manual ring mode is in effect. For an explanation of manual ring mode, see the "Manual Ring Maintenance" section of Chapter 3, Ring Maintenance.) Unlike the audit stage, the continuity-test stage of level-3 ring initialization requires ring silence. Thus, during continuity tests, user message traffic on the ring is halted.
4. With an active ring in place, the 3B21D now queries each IUN and quarantines those that do not respond.
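The decision made in Step 3 of the level-3 sequence reduces to a small table, sketched here in Python. The names are hypothetical and the sketch covers only the choice of ring structure; it does not model the RPCN restart, the audit itself, or the manual ring mode exception.

```python
from typing import NamedTuple


class RingPlan(NamedTuple):
    structure: str               # structure that ring config is asked to establish
    continuity_tests_run: bool   # True if continuity tests (ring silence) were needed


def level3_ring_plan(audit_passed: bool, continuity_ok: bool = True) -> RingPlan:
    """Choose the ring structure for Step 3 of a level-3 initialization."""
    if audit_passed:
        # The prior structure is reinstated; no continuity tests are needed.
        return RingPlan("prior ring structure", False)
    if continuity_ok:
        return RingPlan("normal two-ring structure", True)
    return RingPlan("isolated ring excluding problem nodes", True)


if __name__ == "__main__":
    print(level3_ring_plan(audit_passed=True))
    print(level3_ring_plan(audit_passed=False, continuity_ok=False))
```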
Level-4 IMS Initializations (FPI and Boot)

Level-4(FPI) initializations begin with a limited initialization of IMS in the 3B21D as described above. Level-4(BOOT) initializations begin with a full initialization of IMS in the 3B21D as described above. Both level-4s then proceed to initialize the ring with the following sequence of events:
1. RPCNs are downloaded with new operational code and placed in execution.
2. Each node is tested for the ability of its ring-interface hardware to propagate messages on the ring and for the functionality of its data selectors.
3. The ring config module is called to establish a ring structure based on the results of these tests.
4. With the new ring structure in place, tests are made to determine the ability of each unisolated IUN to read messages from, and write messages to, the ring. Nodes that fail the tests are quarantined.
5. All unquarantined and nonisolated nodes are downloaded with operational code and placed into execution. The downloading occurs by means of selective broadcast messages that allow parallel downloading of similar node-types. When downloading is done, the IMS initialization process is done, and the ring is up.

IMS level 4s are accompanied by ring silence. Even if no nodes are operational, IMS level 4 initialization completes so that technicians can conduct diagnostics in an attempt to manually correct the problem.

IMS initializations are reported on the ROP by the REPT IMSDRV INIT output message. This message format will report first the completion of the critical stage of initialization and then the completion of the non-critical stage. Initialization of the ring and initialization or restarting of the IMS driver compose the critical stage. The noncritical stage consists of initializing such features in the 3B21D as display pages, measurements, and certain craft state reports.
Audits
The following information about IMS audits is offered chiefly because output messages concerning audits will occasionally appear on the ROP. Technicians should rarely have occasion to use the input commands that manually initiate them.
Central Node Control Audit (AUD CNC)

This is a routine audit that runs according to a user-specified schedule. IMS recommends a 15-minute interval. It also runs during level 0 and level 1A IMS initializations and in response to manual requests. The purpose of the audit is to find and correct inconsistencies in internal records that could interfere with the actions of automatic maintenance. The errors detected by this audit indicate mutilated internal data or other software problems, which often occur as side effects of other events, such as those reported by REPT IMSDRV FLT messages.

The central node control audit attempts to correct an error by canceling the maintenance task associated with it. It does not verify that its action was successful. To verify that the error was corrected, a technician must run the audit again, using the AUD:CNC 1 input message. If the central node control audit finds an error, it reports it in an AUD CNC output message. If it does not find an error, no output message is printed, unless the audit was manually requested. Problems in running the audit are reported in a REPT IMSDRV AUD message. Once started, the audit normally takes under 10 seconds to run.
Node State Audit (AUD NODEST)

This is a routine audit that runs according to a user-specified schedule. IMS recommends a 15-minute interval. It also runs during level 0 and level 1A IMS initializations and in response to manual requests. Its purpose is to detect and correct errors in the node availability map, which is used by software modules such as node audits to identify nodes whose major state is ACT (see the discussion below of IMS maintenance states). The audit compares the data in the node availability map with state data in the IMS driver and, when it finds inconsistencies, modifies the map to conform to the state data.

The errors detected by the node state audit indicate mutilated internal data or other software problems, which often occur as side effects of other events, such as those reported by REPT IMSDRV FLT messages. The audit's attempts to correct errors should always succeed. When the audit finds an error, an AUD NODEST output message is printed. When it does not find an error, no output message is printed, unless the audit was manually requested. Problems in running the audit are reported in a REPT IMSDRV AUD message.
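The compare-and-correct behavior of the node state audit can be pictured with a brief sketch. The Python below is illustrative only: the two dictionaries are hypothetical stand-ins for the node availability map and the state data in the IMS driver, and any state name other than ACT (such as "OOS" in the usage example) is a placeholder rather than a documented IMS state.

```python
from typing import Dict, List


def audit_node_availability(availability_map: Dict[str, bool],
                            driver_major_state: Dict[str, str]) -> List[str]:
    """Make the map conform to the driver state data.

    Returns the names of nodes whose map entries were corrected; a real
    audit would report these in an output message.
    """
    corrected = []
    for node, state in driver_major_state.items():
        should_be_available = (state == "ACT")
        if availability_map.get(node) != should_be_available:
            availability_map[node] = should_be_available
            corrected.append(node)
    return corrected


if __name__ == "__main__":
    node_map = {"IUN32 10": True, "IUN32 11": False}
    states = {"IUN32 10": "OOS", "IUN32 11": "ACT"}   # "OOS" is a placeholder
    print(audit_node_availability(node_map, states), node_map)
```

Because the map is simply overwritten from the state data, a second run over the same inputs finds nothing to correct, which matches the statement that the audit's corrections should always succeed.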
Node Audit
An automatic, internal audit of nodes allows maintenance software in the 3B21D to continuously monitor the health of the ring and all ring nodes. The node audit is run routinely every few seconds. By this means, the 3B21D verifies that each active node is operating correctly, checks the communication paths of both rings, and finds nodes that have quarantined themselves or that need to be quarantined. The work of the node audit is transparent to technicians and users of IMS, unless it detects a problem that causes a node to be removed from service.
Ring Maintenance
3

Contents

Overview
Automatic Ring Maintenance
  EAR or Ring Recovery
  Error Detection
  Mechanisms Underlying Reinstatement and Reconfiguration
  Unexplained Loss of Token
  Token Track
  Reinstatement and Reconfiguration
  Ring Error Threshold
  Multiple Faults
  EAR Ring Recovery Intervals and Output Messages
  ARR or Deferrable Node Recovery
  Overview of ARR Treatment of Out-of-Service Nodes
  Maintenance States
  Ring States
  Node Major States
  Node Minor States: Ring Position
  Node Minor States: Ring Interface
  Node Minor States: Node Processor
  Node Minor States: Maintenance Mode
  Summary of EAR Actions
  Three ARR Rules
  The One-Restoral-at-a-Time Rule
  The Fourth-Time Rule
  ARR Treatment of Unstartable, Quarantined Nodes
  ARR Treatment of Isolated Nodes
  ARR Recovery Intervals and Output Messages
Manual Ring Maintenance
  Ring Maintenance Interfaces
  Alarms
  Critical Alarms
  Major Alarms
  Minor Alarms
  Special IMS Indicators
  Display Pages
  Page 1105 The Ring Status Summary Page
  Page 1106 The Ring Node Status Page
  Ring Diagnostics
  Obtaining Diagnostic Results
  Diagnostic Listings
  Using Diagnostics
  Guide to Critical Ring Maintenance
  IMS Input Messages
  Critical Maintenance Procedures for Nodes
  Critical Maintenance Procedures for Nodes in Isolation
  Low-Phase Ambiguity
  Guideline to Single-Node Isolations
  Guideline to Multiple-Node Isolations
  Responding to Ring Down
  Employing Manual Ring Mode
  Ring Application Processor Critical Maintenance Procedure
  Recognizing and Finding Intermittent Faults
  Other Suggestions for Troubleshooting
  New Circuit Pack; Old Failure
  Unconditional Restorals
  Unexplained Loss of Token
  Avoiding Trouble
  Recording Trouble
  New Installations or Ring Growth
Examples of Ring Maintenance
  Responses to Single, Ring-Related Faults
  Automatic Recovery from a Transient Fault by EAR Level 0
  Manual Recovery from a Hard Fault
  Automatic Recovery from a Transient Fault by ARR
  Manual Recovery from a Hard Fault on a Small Ring
  Responses to Multiple, Ring-Related Faults
  Manual Recovery from Multiple Hard Faults
  Automatic Recovery from Two Intermittent Faults
Ring Maintenance
Overview
The design of ring maintenance reects the need to recover rapidly from faults that disrupt the transportation of messages on the ring or that prevent the processing and transmission of messages within nodes. Ring maintenance addresses this need with three types of automatic recovery actions which are called reinstatement, reconguration, and node restoral. When ring maintenance software determines that a fault has disrupted the ring subsystem, it acts to resume operation by one of two means. It can attempt to reinstate the current ring; that is, to return the ring to service as it was constituted prior to the fault. Or it can recongure faulty nodes out of the ring, thereby, resuming operation with the surviving resources. If it recongures the ring, ring maintenance software then acts, in parallel with resumed operation, to restore to service nodes it has removed; or if it cannot restore them to service, it directs technicians to repair or replace them and then to restore them to service manually. Reinstatement may be achieved locally and unannounced by such means as Direct Memory Access (DMA) restarts or reexamining evidence for a fault, or it may be achieved globally and visibly by ring initializations or ring restarts. Ring restarts occur when the ring cong module is called with instructions to reset the data selectors to their current positions. Reconguration is achieved either by quarantining or isolating faulty nodes. The design of ring maintenance associates faults with nodes. A fault in the ring interface, the node processor, or an auxiliary component is associated with the host node. A fault in the ring bus between nodes or in an interframe buffer is associated with the node immediately downstream of the fault. Associating faults
Issue 16.0
December 2000
3-1
401-661-045
with nodes means the ring can respond to faults by removing nodes from service, either by quarantining or isolating them. The type of reconguration chosen depends on the impact of the fault. If the impact is conned to the internal operations of the node, then the node will be quarantined. But if the fault has disrupted operation of the ring, then the node associated with the fault will be isolated. Automatic node quarantine occurs in response to instructions from the node processor of the faulty node or from the 3B21D. Automatic node isolation occurs when the ring cong module is called with instructions to set the data selectors in positions that create an isolated segment. Reinstatement will succeed in response to most soft faults, while most hard faults require reconguration. Soft faults are transient hardware problems or glitches in software, either of which is likely to be temporary. Soft faults may often be corrected simply by resuming operation of the system or of the component they have disrupted. (Sometimes, however, the effects of soft faults are sufciently severe that recovery requires reconguration.) By contrast, hard faults are failures in hardware or software which, once manifested, are likely to persist until they or their causes are corrected. Both reinstatement and reconguration provide rapid recovery, with the former usually being faster but less rigorous. When confronted with a fault in the ring subsystem, ring maintenance software must always choose to resume operation by one of these two means. When its rst choice is reinstatement, and that choice fails to achieve a stable and usable ring, it next tries reconguration. When, on the other hand, its rst choice is reconguration, reinstatement will not ordinarily follow, since reconguration, being the more thorough action, should succeed in all but the rarest cases. Reconguration precipitates the third type of recovery action employed by ring maintenance, node restoral. 
Node restoral occurs after operation of the reconfigured ring has resumed. It begins with ring maintenance software testing quarantined or isolated nodes to determine how best to treat them. In some cases, it can and does return them to service by automatic means. When it cannot or does not return them to service, it alerts technicians to repair or replace them and then to return them to service manually.
Reinstatement and reconfiguration occur automatically. The work of node restoral also begins with automatic procedures, which give way to manual means only if the automatic procedures fail repeatedly or if diagnostics reveal a hard fault. Thus the usual role of technicians is to support ring maintenance by manually completing tasks software has begun. In some instances, however, manual intervention in the automatic machinery may be indicated.
The organization of the next two chapters reflects the operational division between automatic and manual ring maintenance. The next chapter describes the maintenance procedures that occur automatically, and the chapter that follows explains the related responsibilities of technicians.
Automatic Ring Maintenance

In the strategy of automatic ring maintenance described above, error analysis and recovery (EAR) software performs the nondeferrable task of reinstating or reconfiguring the ring, while automatic ring recovery (ARR) software performs the deferrable task of node restoral. The following explanation of automatic ring maintenance begins with EAR, and then proceeds to ARR.
EAR or Ring Recovery

This discussion of EAR describes events in the order of their occurrence. EAR recognizes the existence of a fault from audits or by detecting errors in message format or message delivery. The work of error detection occurs chiefly in the nodes, which report errors to EAR in the 3B21D. EAR in the 3B21D then analyzes the errors to determine the type and location of the fault. Its analysis distinguishes between ring-related faults that obstruct the transportation of messages on the ring and node-related faults that prevent the processing and transmission of messages within nodes. Based on this information, together with its knowledge of the current ring structure, it decides whether to reinstate or reconfigure the ring. Ring reinstatement and reconfiguration are achieved by overlapping mechanisms, and these mechanisms are also discussed below.
Error Detection
The ring assumes that faults will produce errors in message format or message delivery, so it searches for faults by looking for errors. Errors may occur as messages are propagated on the ring (that is, within ring interfaces or in the medium between ring interfaces), as messages are transmitted or processed by node processors or auxiliary components, or as messages are transmitted between the ring and the 3B21D.
The task of detecting and reporting errors is assigned chiefly to the ring nodes. By means of circuitry in their ring interfaces and software in their node processors, nodes are usually able to detect errors internal to themselves. Moreover, by means of failures in message delivery, nodes can often detect external errors, that is, errors occurring in association with other nodes. When a node detects an error, it will, if it can, report the error to the 3B21D for analysis.
An error associated with a fault that disrupts traffic on the ring is ordinarily first detected by the circuitry of the ring interface. Every ring interface contains circuits for checking parity on the ring path as well as for detecting format errors in the messages it reads, writes, and propagates. When a ring-interface circuit detects an error, it informs its node processor by means of an interrupt. The node
processor then interrogates the ring-interface hardware to determine the cause of the problem and reports, if it can, the identity and location of the error to the 3B21D via one or both rings.
An error associated with a fault that prevents the transmission or processing of messages within nodes will usually be detected by the node processor. Such an error is typically caused by a fault in the node processor or by a node-processor detectable fault in one of the auxiliary components. From some errors of this type, nodes can recover immediately by means of local reinstatement. They may, for example, be able to restart an attached processor that has incurred an error. Usually, however, reinstatement is not possible, and the node processor responds to the error by placing itself in quarantine, a condition that prevents it from reporting its state to the 3B21D. Instead the 3B21D usually learns of the condition from a report made by the first node that attempts to send a message to the quarantined node.
During normal operation, messages are read from the ring by the destination node. A node in quarantine, however, cannot read messages. Instead, a message addressed to it will, after traversing the entire ring, be detected and removed from the ring by the sending node, which will understand this condition as a SOURCE MATCH error and report it to the 3B21D. If a source match fails to materialize, however, or if an injured node processor is unable to quarantine itself, the condition will be detected by a node audit and reported to the 3B21D, which responds, if needed, by quarantining the disabled node.
Source-match errors are one of two means by which ring nodes detect errors external to themselves. The other is ring blockage. Blockage is the condition that exists when an upstream node cannot propagate data to its downstream neighbor. Every node has a timer on the output of each of its two ring paths.
The timer expires if a byte of data being offered by the upstream node is not taken by the downstream node within a specified interval. Expiration of the timer implies a problem in the downstream node, for a node processor ordinarily reacts to an error that implicates its ring interface by forcing blockage on its ring input path. In this context, all interconnections between nodes, including interframe buffer circuits, are considered part of the downstream node. When a node processor detects blockage, it immediately drains the ring of any remaining data, including the token message, and reports the blockage to the 3B21D via the alternate ring.[1]
[1] The node that first detects blockage drains the ring to avoid confusing the 3B21D as to which node is immediately upstream of the faulty node. If it did not drain the ring, mass congestion would ensue, causing many upstream nodes to experience and report blockage. Even so, the initial blockage condition will often trigger two or three upstream blockage reports before the ring can be drained.
Errors may also be detected during the testing phase of ring initialization. Testing, which is more extensive in level-4 than in level-3 initializations, is in neither of these levels of initialization so detailed as in diagnostics. Nevertheless, errors detected during these tests result in the same kinds of system actions as those detected during normal operations. Therefore, a ring may become active with some of its nodes newly quarantined or isolated.
Finally, errors that are transparent to the ring may be detected by a user and reported to the ring. Such errors result from faults that occur in user hardware or firmware residing in the node or in user software residing in the node processor or in an attached processor.
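The source-match condition described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the ring is modeled as a Python list, the node names are hypothetical, and a quarantined node is assumed to pass every message along without reading it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str                   # hypothetical node name, for illustration only
    quarantined: bool = False   # a quarantined node propagates but cannot read


def send(ring, src, dst):
    """Walk a message downstream from index src toward index dst."""
    n = len(ring)
    pos = (src + 1) % n
    while pos != src:
        node = ring[pos]
        if pos == dst and not node.quarantined:
            return f"delivered to {node.name}"
        pos = (pos + 1) % n
    # The message traversed the whole ring and returned to its sender.
    return f"SOURCE MATCH reported by {ring[src].name}"


ring = [Node("RPCN00"), Node("IUN01"), Node("IUN02", quarantined=True), Node("IUN03")]
print(send(ring, 0, 3))
print(send(ring, 0, 2))
```

The second call addresses the quarantined node, so the sender sees its own message come back and would report the condition to the 3B21D.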
Mechanisms Underlying Reinstatement and Reconfiguration

Two mechanisms underlie reinstatement and reconfiguration. One reconfigures the ring by quarantining nodes. The other consists of an escalative recovery strategy that includes, as possible though separate responses, both reconfiguration by node isolation and reinstatement. These mechanisms are discussed below.
A node processor responds to errors indicating node-related faults that cannot be handled by local reinstatement by quarantining itself if it can.[2] If the node processor cannot quarantine itself, it waits for the 3B21D, after detecting its disabled condition from a source match error or from an audit, to quarantine the node processor. The 3B21D also quarantines a node upon receiving a request to do so from a user, a request that is ordinarily the result of the user having detected an error that is transparent to the node processor and ring interface.
The 3B21D responds to errors indicating ring-related faults that cannot be handled by local reinstatement by waiting a 100-millisecond "listening period" for the arrival of any additional reports about the same problem.[3] It then analyzes its reports to determine the probable nature and location of the fault and responds with an escalative recovery strategy.
[2] An extended IUN with an attached-processor problem offers an important example of local reinstatement. Such a node does not quarantine itself immediately. Instead, the node processor audits the operational code of the attached processor and, if the audit passes, attempts to restart the attached processor. Only if the attached processor fails the audit or fails to restart after one attempt does the node processor report the condition to the 3B21D and then quarantine itself.
[3] Since most errors involve blockage, EAR usually receives at least two reports, one from the downstream node that detected the error directly and another from the upstream node that experienced the blockage.
The escalative recovery strategy consists of a sequence of six increasingly thorough actions designed to return an obstructed ring to service. Usually, the lowest, or 0-level, action is first tried; then, if it fails to return the ring to service, the next higher-level action is tried, and so on. In this context, failure to return the ring to service means failure to resume operation of the ring at all or to sustain operation through a confidence interval of 5 seconds. The levels of ring recovery actions are as follows:
level 0 - Unless the frequency of reported ring faults has exceeded a user-defined threshold, EAR first attempts to reinstate the current ring by restarting it. Even when the frequency has exceeded the threshold, EAR still attempts to restart the ring, if analysis of error messages indicates that isolating the fault would seriously impact service to a user.
level 1 - If restarting fails or is not attempted, EAR uses error reports to locate the node or nodes associated with the fault, and it isolates them.
level 2 - If in response to level-1 action the ring failed to recover at all, EAR expands the isolated segment one node in each direction. If in response to level-1 action the ring failed to sustain its recovery through the confidence interval, EAR bases the expansion on analysis of any additional ring transport error messages received. Level 2 is skipped on small rings.
levels 3-5 - If these attempts, based on the original ring transport error reports, fail to achieve a stable ring, EAR discards the reports and initiates a new and comprehensive recovery tactic that attempts to locate and isolate the fault by employing tests for ring continuity. The continuity tests, which may escalate through three levels of increasing thoroughness, are designed to locate faults empirically by systematically testing message traffic on the ring. The two highest-level continuity tests include soak periods for finding transient faults. If a continuity test fails to find a fault, the escalative recovery strategy is terminated and the ring reinstated in conformity with its structure prior to the first level of EAR activity. Ordinarily the lowest-level continuity test will find and successfully isolate the fault. If, however, each of the three levels of continuity testing finds a fault to isolate but fails in turn to establish or to retain through the confidence interval a usable ring, IMS in the 3B21D aborts. Or, if the user prefers, the 3B21D undertakes a full process initialization and is reinitialized by the user.
The levels of EAR escalative recovery actions are described in still greater detail in the reference chapter of this document.
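The escalation through levels 0 to 5 can be summarized as a simple loop. The following is a minimal sketch, not the EAR implementation: the `attempt` callback, its return values, and the result strings are hypothetical, and the exception that allows a restart above the threshold when isolation would seriously impact a user is deliberately left out.

```python
def escalate(attempt, over_threshold=False, small_ring=False):
    """Try EAR-style recovery levels in increasing order.

    `attempt(level)` is a hypothetical callback that performs one recovery
    attempt, waits out the confidence interval, and returns "ok", "failed",
    or (for continuity-test levels 3-5) "no_fault".
    """
    levels = [0, 1, 2, 3, 4, 5]
    if over_threshold:
        levels.remove(0)   # simplification: restart is skipped above the threshold
    if small_ring:
        levels.remove(2)   # level 2 is skipped on small rings
    for level in levels:
        result = attempt(level)
        if result == "ok":
            return f"recovered at level {level}"
        if result == "no_fault":
            # A continuity test found nothing: the prior ring structure is reinstated.
            return "reinstated with prior structure"
    return "abort"         # every level failed to yield a stable ring


tried = []


def always_fail(level):
    tried.append(level)
    return "failed"


print(escalate(lambda level: "ok" if level == 1 else "failed"))
print(escalate(always_fail, small_ring=True), tried)
```

Keeping the levels in a plain list makes the two skip rules (threshold exceeded, small ring) easy to see; a real implementation would also carry the error reports from one level to the next.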

One exception exists to the escalative rules described above. Occasionally, message transport on the ring will be obstructed with resulting loss of token, yet the 3B21D will receive no error reports. This condition may occur because some ring nodes are unable to generate error reports, or because their error reports are unable to reach an RPCN and thereby gain access to the 3B21D. It might also occur as the result of a transient software problem. In any case, the RPCNs will detect the loss of token and inform the 3B21D of the condition.[4] Having received no information except that the token has vanished, EAR software outputs a REPT RING TRANSPORT ERR/UNEXPLAINED LOSS OF TOKEN message, waits for the work of a software module called token track to complete, and then attempts an EAR recovery level-0 restart. If the restart fails to return the ring to service, then it jumps to recovery level 3.
Token Track
The token track module runs automatically when an unexplained loss of token occurs. Its purpose is to inform technicians of the probable area where the token was lost. It is not otherwise used by ring software. Its first act is to conduct a ring continuity test. If the test fails, indicating that the loss of token was caused by a hard fault, token track aborts. If the test succeeds, indicating that the loss of token was probably caused by a transient fault, token track proceeds to search for the vicinity of the ring where the token was lost, and it reports this information to technicians in a REPT TOKEN TRACK message. The message reports either that the token was lost between specific nodes or else that, owing to failure of the continuity test, the program was unable to perform the analysis necessary to determine the area of loss. In instances when EAR continuity tests cannot locate an intermittent problem, token track may guide technicians to its vicinity.
Token track operates by means of flip-flops that are toggled by the token message each time it passes a node. All IRN circuit packs are equipped with these flip-flops. Of the other pairs, 122/123 are not equipped, 122B/123B are not equipped, and 122C/123B are equipped with the flip-flops. On a ring with no token track circuit packs, token track will not work; and on a ring with a mixture of token track and non-token track circuit packs, token track may not work effectively because the area identified for token loss may be impracticably large.
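The text does not say how the software reads the flip-flops, so the following is only one plausible sketch of the idea. It assumes that all flip-flops start in the same state, that the token begins each lap at a reference node listed first, and that nodes without token track circuit packs are represented by `None`. Under those assumptions, the loss lies between the last equipped node that toggled on the final lap and the first equipped node that did not.

```python
def locate_token_loss(flipflops):
    """Return (upstream_index, downstream_index) bounding the suspected loss.

    flipflops[i] is the 0/1 flip-flop state of node i in ring order, or None
    if the node has no token-track flip-flop (an assumption of this sketch).
    Returns None when no node is equipped.
    """
    equipped = [(i, s) for i, s in enumerate(flipflops) if s is not None]
    if not equipped:
        return None                      # no token-track packs: nothing to compare
    first_index, first_state = equipped[0]
    prev = first_index
    for i, state in equipped[1:]:
        if state != first_state:
            return (prev, i)             # node prev saw the token last; node i did not
        prev = i
    # Every equipped node toggled: the loss is on the way back to the reference node.
    return (prev, first_index)


print(locate_token_loss([1, 1, 1, 0, 0, 0]))
print(locate_token_loss([1, None, None, 0, 0]))
```

The second call shows why a mixed ring gives a less useful answer: with two unequipped nodes in between, the suspected area widens to everything between the nearest equipped neighbors.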
Reinstatement and Reconfiguration

The contributions of the two mechanisms described above to EAR's methods for resuming ring operation may now be stated.
[4] RPCNs periodically check for the presence of the token by attempting to write to the ring. If they are prevented from writing by the absence of a token, they report this condition to EAR in the 3B21D.
Reinstatement of the ring by restarting occurs as level 0 in EAR's escalative recovery strategy. It also occurs after a ring continuity test fails to find a fault. Reconfiguration by node isolation occurs as levels 1 and 2 of EAR's escalative recovery strategy. It also occurs after any ring continuity test succeeds in finding a fault. Reconfiguration by node quarantine occurs in response to an instruction from the resident node processor or from the 3B21D.
The ring employs the following rules for deciding whether to respond to a fault by reinstating or reconfiguring the ring. When a fault can be corrected locally and immediately, IMS reinstates the current ring. When a fault cannot be corrected by local reinstatement but can be treated by quarantine, IMS reconfigures the ring by quarantining the faulty node. When a fault is of the type that may require node isolation, IMS first tries, subject to certain conditions described below, to reinstate the current ring by restarting it and resorts to isolating the fault only if restarting fails to achieve a stable ring.
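These three rules can be written as a small decision function. This is an illustrative sketch only: the parameter names and the action strings are hypothetical, and the single `over_threshold` flag stands in for the conditions on restarting that the text describes under the ring error threshold.

```python
def choose_response(locally_correctable, quarantine_suffices, over_threshold=False):
    """Map the three rules onto an ordered list of actions (names are illustrative)."""
    if locally_correctable:
        return ["reinstate current ring"]
    if quarantine_suffices:
        return ["quarantine faulty node"]
    if over_threshold:
        # Simplification: above the ring error threshold the restart is skipped.
        return ["isolate fault"]
    return ["restart ring", "isolate fault if restart fails"]


print(choose_response(True, False))
print(choose_response(False, True))
print(choose_response(False, False))
```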
Ring Error Threshold

The ring error threshold is set by the user to indicate the frequency of faults to be permitted by IMS before its practice of responding initially to ring-related faults with EAR level-0 (restarting the ring) is discontinued and replaced by EAR level-1 (isolating the fault), or after an unexplained loss of token, replaced by EAR level-3 (ring continuity testing).
The user sets the frequency by specifying both the number of faults to be allowed and the interval of time over which they are allowed. After the threshold is exceeded, an error-free period the length of the threshold interval is required before IMS returns to its normal practice concerning ring restarts.
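The threshold behaves like a sliding-window counter with a required quiet period. The sketch below is an assumption-laden illustration, not the IMS implementation: the class name, method name, and the use of seconds as the time unit are all hypothetical, and "exceeded" is taken to mean more faults than the allowed number within the interval.

```python
from collections import deque


class RingErrorThreshold:
    """Sliding-window fault counter (hypothetical model of the user-set threshold)."""

    def __init__(self, max_faults, interval_s):
        self.max_faults = max_faults    # number of faults allowed
        self.interval_s = interval_s    # interval over which they are allowed
        self.window = deque()           # timestamps of recent faults
        self.exceeded = False

    def restart_allowed(self, now):
        """Record a fault at time `now`; return True if a level-0 restart is tried first."""
        # An error-free period as long as the interval clears the exceeded state.
        if self.exceeded and self.window and now - self.window[-1] >= self.interval_s:
            self.exceeded = False
            self.window.clear()
        # Drop faults that have aged out of the window.
        while self.window and now - self.window[0] > self.interval_s:
            self.window.popleft()
        self.window.append(now)
        if len(self.window) > self.max_faults:
            self.exceeded = True
        return not self.exceeded


threshold = RingErrorThreshold(max_faults=2, interval_s=60)
print([threshold.restart_allowed(t) for t in (0, 10, 20, 50, 200)])
```

In this example the third fault exceeds the threshold, the fourth arrives before an error-free interval has elapsed, and the fifth arrives long enough afterward that the normal restart practice resumes.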
Multiple Faults
If a fault occurs in the active segment of a ring that currently contains a fault-generated isolation, a multiple-fault condition exists. In this case the 3B21D determines the relative size of two ring segments as measured in each direction from the beginning of the current isolation to the point of new failure. It then directs that the larger segment become the active ring and places all nodes that comprise the smaller segment in isolation. Often when multiple faults occur, the isolated segment that results will contain innocent victim nodes, nodes that are isolated, not because they are defective, but because they are surrounded by defective nodes. Multiple faults are statistically rare but have the potential for causing many nodes to be out-of-service.[5]
EAR Ring Recovery Intervals and Output Messages

In this document error messages have been classified according to whether they indicate a ring-related fault (a fault that obstructs the transportation of messages on the ring) or a node-related fault (a fault that prevents the processing or transmission of messages within nodes). A message of the first class is usually followed by ring restarts and, if restarts fail, by node isolation. A message of the second class is usually followed by node quarantine. A third class of messages exists that result in no change in ring or node connectivity. All three message types (including the third class) are reported, usually by nodes to the 3B21D, which in turn formats them and sends them to the MCRT and ROP as REPT RING TRANSPORT ERR messages. A descriptive list of these messages is included in Appendix B, Ring Maintenance Reference Material. The most common ring transport errors, the error types that technicians should probably know well, are:
- Blockage
- RAC Parity/Format Error
- Interframe Buffer Parity Error
- Source Match and SRC Match
- NAUD Failure, and
- Unexplained Loss of Token.
The outages that occur during ring recovery actions are chiefly the result of ring silence. Ring silence is a condition imposed upon the nodes while the ring is restarting, initializing, or reconfiguring to achieve an isolation. During ring silence the nodes are not permitted to write to the ring. Although the actions of the IMS ring config module to restart the ring or to achieve an isolation require only a brief period of ring silence, the periods of silence required by continuity tests are significantly longer.
Nevertheless, most EAR ring recovery attempts will be completed very rapidly. The lower levels of EAR escalative recovery actions are brief. A level 0, 1, or 2 recovery attempt may take up to 1 second to complete, while a level 3 attempt will usually take from 1.3 to 2 seconds. The soak periods of levels 4 and 5 make them somewhat more expensive. Typically, a level 4 attempt consumes 11 to 14 seconds and a level 5 attempt 90 seconds to 3 minutes, depending on ring size.
[5] Overall system tolerance to these partial ring outages depends on the application. Where applications require very high availability of a particular user-node function, that function can be replicated on two or more nodes. By spacing these nodes equally around the ring, at least one member of the set should remain in the active ring segment for most cases of multiple ring faults.
The brevity of all but the longest of these ring recovery attempts means that technicians will ordinarily learn of them after they have completed. Moreover, with one exception, it is the practice of the 3B21D to queue error messages and send them to the MCRT only after the recovery level to which they apply has completed its attempt to return the ring to service. Technicians may infer, however, that a high-level recovery attempt is underway from previous output messages indicating failed recovery attempts at lower levels, as well as from the blinking of the "no token" lights on the circuit packs of all ring nodes, indicating that tests are occurring.
The output messages concerning each ring recovery attempt will usually consist of the following items of information in the order shown:
1. A REPT RING CFR message announcing a specific level of EAR recovery attempt.
2. If the attempt was successful, a REPT RING CFR message indicating that the ring has been configured and identifying the new ring structure.
3. If the attempt was unsuccessful, a REPT RING CFR message indicating the reason for failure.
4. Separate REPT RING TRANSPORT ERR messages identifying each error that was received by the 3B21D in response to the fault that gave rise to the recovery attempt.
Notice that REPT RING TRANSPORT ERR messages ordinarily appear on the MCRT and ROP following the REPT RING CFR messages to which they apply. Yet, because each of these message types is stamped in milliseconds by the real-time clock, it is possible to confirm their relations. The real-time stamp on a REPT RING CFR message indicates the completion time of the attempt being reported. The real-time stamp on a REPT RING TRANSPORT ERR message indicates the time the report arrived at the 3B21D from a ring node.
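The time-stamp relations above suggest a simple way to put printed messages back into the order of events. The sketch below is an illustration under stated assumptions: the tuple layout, the message texts, and the millisecond values are invented for the example and are not the real output format. It sorts messages by stamp and attaches each error report to the first recovery attempt that completed after the report arrived.

```python
def reconstruct(printed):
    """Group error reports under the recovery attempt that followed them.

    `printed` is a list of (kind, stamp_ms, text) tuples in printed order;
    kind is "CFR" (stamp = completion time of the attempt) or "ERR"
    (stamp = arrival time of the report at the 3B21D). Layout is hypothetical.
    """
    attempts, pending = [], []
    for kind, stamp_ms, text in sorted(printed, key=lambda m: m[1]):
        if kind == "ERR":
            pending.append((stamp_ms, text))
        else:
            attempts.append({"completed_ms": stamp_ms, "attempt": text, "errors": pending})
            pending = []
    return attempts


printed = [
    ("CFR", 1900, "level 0 attempt"),
    ("ERR", 1000, "blockage"),
    ("ERR", 1040, "parity/format error"),
]
for a in reconstruct(printed):
    print(a["attempt"], "completed at", a["completed_ms"], "ms")
    for stamp_ms, text in a["errors"]:
        print("  report:", text, "arrived at", stamp_ms, "ms")
```

Because the 3B21D waits out its 100-millisecond listening period after a report that may lead to isolation, the earliest report's stamp plus 100 milliseconds gives a lower bound on when analysis could have begun.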
Remembering that, after receiving a ring transport error report that may lead to node isolation, the 3B21D observes a listening period of 100 milliseconds before analyzing its reports and acting upon them, technicians can reconstruct system events.
One exception exists to the rule that the 3B21D queues error messages until the completion of the recovery attempt to which they give rise. If the 3B21D receives a loss-of-token report, then waits the 100-millisecond listening period without receiving another error report, it immediately reports REPT RING TRANSPORT ERR/UNEXPLAINED LOSS OF TOKEN to the MCRT and ROP before jumping to a level-3 recovery attempt. Therefore, in this single case the 3B21D reports events in the order of their occurrence. There is no time stamp on messages announcing loss of token.
Though quarantining a node reconfigures the ring, it is not accomplished by the ring config module and, therefore, produces no REPT RING CFR output message. Instead, technicians learn that a node has become quarantined from
RMV RPCN or RMV IUN output messages and from indicators on display pages. Also, when a node experiences a fault that leads to quarantine, it attempts to send a message to the 3B21D identifying the type of error that occurred. Currently EAR does not use the message for fault analysis. It does, however, report the error on the MCRT and ROP in the second line of a REPT ERROR output message. In the event of an intractable problem, technicians should record and report this line. The line will indicate, among other matters, whether the error was soft (requiring no system action), firm (requiring a restart), or hard (requiring a repump of the node software).
ARR or Deferrable Node Recovery

Fundamental to the recovery strategy of automatic ring maintenance is the complementary action of ARR to EAR software. When EAR reconfigures a suspected fault out of the ring, either by quarantining or isolating a node, ARR assumes its responsibility of either returning the node to service or, if it determines that the node should not be returned to service, of directing technicians to repair or replace its faulty equipment and then to return it to service manually. ARR determines not to return a node to service when it has failed diagnostics or when it has become a chronic problem. After either of these events, ARR immediately surrenders control of the node to technicians, whose responsibility it becomes to perform maintenance on it manually.
Overview of ARR Treatment of Out-of-Service Nodes

ARR can return nodes to service by restarting or restoring them. The two methods are achieved under different circumstances and according to different rules.
Node restarts can occur only when a node has quarantined itself. Upon detecting an error in its node processor or in an auxiliary component, a node in the active ring attempts to quarantine itself. It then, in response to most error-types, runs an internal audit to test the integrity of its node-processor operational code and, if the audit passes, attempts, with the assistance of the 3B21D, to restart itself. (If the node is an extended IUN, it will audit the operational code of the attached processor as well.) A restart is done without downloading code. Rather, the node finds a safe place in its current code and places it in execution. A successful restart results in the node being returned to service almost immediately.[6]
[6] In response to a few error-types, however, a self-quarantined node does not attempt to restart itself but waits for the 3B21D to detect its state and to return it to service by restoring it in the manner described below.
On the other hand, if a node with a faulty node processor or auxiliary component is unable to detect internal faults, unable to quarantine itself, unable to pass an internal audit, or unable to restart after one attempt, the 3B21D will detect its disabled condition, and if it is not already quarantined, quarantine it. Then ARR in the 3B21D will restore the node to service. ARR restores a node by downloading it with new operational code and placing the code into execution. Nodes may be restored either unconditionally without being previously diagnosed or conditionally by having their return to service depend on their passing all automatically-run diagnostic tests.
Maintenance States
ARR is driven to do its work by system indicators called IMS maintenance states. Maintenance states identify the operational mode of the ring and the operational mode, functionality, and condition of each ring node. They are determined and announced by programs in the 3B21D, mainly by EAR software. In addition to driving ARR to do its work, maintenance states serve as a primary source of system information for IMS users and for technicians who should always consult them before taking any manual action. Technicians may learn of current maintenance states from the IMS 1106 display page or from the OP:RING command. They should keep in mind that because maintenance states represent the central processor's knowledge of a distributed system, this knowledge under certain conditions may be temporarily incorrect. A node processor, for example, is allowed to quarantine itself if it detects certain irregularities in its software, but the 3B21D may not learn of this change of state until it has conducted a node audit or received a source match error. The following are the different classes of maintenance states:
- Ring state
- Node major state
- Node minor state: ring position
- Node minor state: ring interface
- Node minor state: node processor
- Node minor state: maintenance mode.
These states are explained below.
Ring States
The ring state identifies the current operational mode of the ring. The following states are possible:
Ring Normal - This state represents the two-ring configuration, with one ring serving as the active path that chiefly transmits user messages and the other serving as a standby path that may also transmit administrative and maintenance messages. A normal ring contains no isolated segment, but it may contain quarantined nodes.
Ring Isolated - In this state the ring contains an isolated segment. The nodes that bound the isolation are active and are identified as the beginning-of-isolation (BISO) and the end-of-isolation (EISO) nodes. Any node, including an RPCN, may act as a BISO or an EISO node. The ring cannot contain more than one isolated segment.
Ring Restoring - When Ring Restoring appears as a transitory state, it indicates a condition that occurs very briefly during ring reconfiguration. When Ring Restoring appears as an extended state, it indicates the responses of automatic maintenance to a failed BISO or EISO node. When a BISO or EISO node experiences a node-processor failure, critical node recovery (CNR) software first attempts to conditionally restore it. (Restoral software knows to run only those diagnostic phases that do not require isolation.) If the conditional restoral fails, ring config extends the isolated segment to include the faulty node. Attending to a failed BISO or EISO node is the highest priority activity of ARR/CNR.
Ring Configuring - In this state the ring is initializing, restarting, being reconfigured to isolate or unisolate one or more nodes, or engaged in one or more levels of EAR escalative recovery action.
Ring Down - The chief conditions that cause the ring to go down are that the 3B21D cannot communicate with it through any RPCN, or that the ring is so fragmented by faults that EAR cannot define an active segment long enough to satisfy the criterion for minimum length. The first condition is most likely to occur when, in a two-RPCN environment, one RPCN has been manually taken out of service, after which the other experiences a failure in its 3B interface or duplex dual serial bus selector. During the time the ring is down, it is possible in some applications of IMS that all IUNs will continue to receive and transmit messages on the ring.[7] For a fuller discussion of this matter, see the section "Responding to Ring Down" in this chapter.
[7] Technicians probably have no way of confirming this to be the case.
Node Major States

The node major state identifies the current operational mode of each node. The following states are possible:
ACT - Active. An active node is on-line and capable, unless the ring is silenced or configuring, of performing all required functions. An active node is neither quarantined nor isolated. In this document, the expression "to return a node to service" means to give it ACT status.
OOS - Out of service. An out-of-service node is unavailable for certain uses. The uses depend upon whether the node is quarantined or isolated. If the ring position (see below) of an out-of-service node is NORM, then the node is quarantined and can propagate messages on the ring, although it cannot read, write, or otherwise process messages. If the ring position of an out-of-service node is isolated, the node is entirely excluded from the active ring. Nodes in either OOS state are ordinarily able to receive and transmit only maintenance information and instructions.
STBY - Standby. This designation is used for RPCNs only. It indicates that a healthy RPCN is prevented from doing its work by the circumstance that the ring is down or configuring. It also appears as a transitional condition when an RPCN is being grown and during system-wide initializations.
INIT - Initializing. The attached processor of an extended node is being restarted or restored. The INIT state occurs as the second stage of restarting or restoring extended nodes. In the first stage, the node processor is restarted or, in the case of restorals, downloaded with operational code and set to executing. In the second or INIT stage, the attached processor is treated similarly. For DLNs the second stage also includes tests of the DMA channel.
OFL - Off-line. The node is quarantined out-of-service preliminary to being assigned a role in the active ring. Nodes should not be allowed to remain long in this condition, because their quarantined state prevents their node processor from fulfilling its important and unassignable role of error detection and reporting.
GROW - Grow. The node is physically being added to or removed from the ring. During growth or degrowth, the node must always be isolated.
UNEQ - Unequipped. Either the unequipped node has no hardware, or ring connections physically bypass it. Still, a place holder for the node exists in IMS software.
Node Minor States: Ring Position

The ring position of each node indicates its function within the current structure of the ring. The following are the four possible ring positions.
NORM - Normal. The node is included in the active ring and is neither a BISO nor an EISO node. A node in the NORM state may be quarantined; if it is quarantined, its node major state will be OOS or OFL.
BISO - The node is included in the active segment of an isolated ring and bounds the beginning of the isolated segment.
EISO - The node is included in the active segment of an isolated ring and bounds the ending of the isolated segment.
ISOL - Isolated. The node is contained in the isolated segment of an isolated ring. Its node major state will be OOS or OFL.
Node Minor States: Ring Interface

This state characterizes for each node the current condition of its ring interface.
USBL - Usable. This is the default state. In other words, IMS regards ring-interface hardware as usable unless it has received an error message or a diagnostic result, or has detected a ring condition, indicating otherwise.
QUSBL - Quarantine-usable, that is, usable by the ring to propagate data but not usable by the node processor, which is insulated from the ring as in the quarantine (OOS NORM) state.
IMS sets the ring-interface hardware of any node to QUSBL when diagnostics find or suspect a fault in the ring interface that does not prevent it from propagating messages on the ring. A node that fails only diagnostic phase 10, for example, would be set to QUSBL. When, under these circumstances, a ring interface is set to QUSBL, IMS unisolates the node if possible, quarantines it, and changes its maintenance mode (see below) to manual. Before diagnostics or other maintenance functions are performed on the ring interface of the node, however, the node must be isolated.
IMS sets the ring interface of an IUN to QUSBL and the node processor to FLTY when, during a level-4 initialization, the node fails a communication test of its ability to receive downloaded code. If this occurs, the ring will return to service with the node in question quarantined and in the automatic maintenance mode.
IMS also sets the ring interface of a node to QUSBL as a way of unisolating a node that is suspected of being faulty but that, as a member of an isolated segment, has passed phases 1 and 2 diagnostics without being subjected to further diagnostic phases.
FLTY - Faulty. The 3B21D has received information indicating that the ring-interface hardware is faulty. Thus the node is, or is about to be, isolated.
UNTSTD - Untested. The minor states of nodes are maintained in core memory only, not on disk or in ECD. Therefore, during a level-3 or level-4 initialization, the system loses knowledge of the ring-interface states of out-of-service nodes and must retest them. The testing is done during initialization, during which time their ring-interface states will briefly be UNTSTD.
Node Minor States: Node Processor

This state characterizes for each node the condition of the node processor and/or of the auxiliary components.
USBL - Usable. This is the default state. In other words, IMS regards node processors and auxiliary components as usable unless it has received an error message or a diagnostic result, or has detected a ring condition, indicating otherwise.
FLTY - Faulty. The node processor and/or one or more auxiliary components is known or suspected to be faulty. The 3B21D sets the node-processor state to FLTY when it receives error messages implicating the node processor or an auxiliary component. It also sets the state to FLTY when it learns that a node has quarantined itself. Nodes ordinarily quarantine themselves when they detect a problem in their node processors or in an auxiliary component. Thus the node-processor FLTY state does not necessarily mean that a problem is in the node processor. It could be in the node processor or in any of the auxiliary components of the node.
UNTSTD - Untested. Node minor states are maintained in current memory only, not on disk or in ECD. Therefore, during a level-3 or level-4 ring initialization, the system loses knowledge of the node-processor states of out-of-service nodes and must retest them. The testing is done during initialization, during which time their node-processor states will briefly be untested.8
Node Minor States: Maintenance Mode

The maintenance mode of a node is always either automatic or manual.
AUTO - Automatic. In this mode a node is under control of IMS software. Nodes in the ACT state are always under automatic control. Nodes in the OOS state are under automatic control as long as ARR software is acting upon them.
MAN - Manual. This mode indicates that an out-of-service node is under the control of technicians. Control will change to manual because of the following:
  A technician has entered a form of the RMV, DGN, or RST command or has entered a command with similar consequences from the 1106 display page
  ARR has determined from diagnostics that a hard fault exists and is directing technicians to correct it
  ARR finds itself being asked for the fourth time within an hour to return the same node to service (explained below), or
  The application has requested that the node be placed in the manual mode. When such a request occurs, the ring-interface and node-processor states will remain set at USBL.

8 If, during ring initialization, a fault occurs requiring an isolation that includes innocent victim nodes, the node-processor hardware of the innocent victims might not have been tested before the isolation occurred and could not be tested during the isolation. In this case, the innocent victims would be quarantined, their ring-interface states set to usable, and their node-processor states set to untested. Then, when the isolation is dissolved, ARR, assuming that UNTSTD equals USBL, returns the nodes to service in accordance with its standard algorithm, which is explained below.
Summary of EAR Actions

When EAR detects problems in the ring, it changes maintenance states to communicate its actions and its knowledge. The following table maps node problems to state changes and EAR actions:

Table 3-1. Node Problems Mapped to Maintenance States and EAR Actions

NODE PROBLEM | EAR ACTION | NODE STATE | RING POSITION | RI STATE | NP STATE | MAINT. MODE
None | None | ACT | NORM/BISO/EISO | USBL | USBL | AUTO
Local restart of an attached processor | None | INIT | NORM/BISO/EISO | USBL | USBL | AUTO
Faulty NP or auxiliary component | Quarantine the node | OOS | NORM/BISO/EISO | USBL | FLTY | AUTO
User request to test user interface | Quarantine the node | OOS | NORM/BISO/EISO | USBL | USBL | AUTO
Faulty RI hardware | Isolate the node | OOS | ISOL | FLTY | USBL | AUTO
Faulty RI hardware that does not interfere with propagating messages on the ring | Quarantine the node | OOS | NORM | QUSBL | USBL | AUTO
Innocent victim | Isolate the node | OOS | ISOL | USBL | USBL | AUTO
Faulty NP or auxiliary component and faulty RI | Isolate the node | OOS | ISOL | FLTY | FLTY | AUTO
Needed to begin an isolation | Configure as BISO node | ACT | BISO | USBL | USBL | AUTO
Needed to end an isolation | Configure as EISO node | ACT | EISO | USBL | USBL | AUTO
Untested NP | Quarantine the node | OOS | NORM | USBL | UNTSTD | AUTO
Three ARR Rules

In attempting to restore out-of-service nodes, ARR observes the following three rules:
Restoral priorities rule
One-restoral-at-a-time rule
Fourth-time rule
Procedure 3-1. Restoral Priorities Rule

If several nodes are simultaneously out-of-service and still under automatic control, ARR acts to restore them in the order shown below:
1. Inactive BISO and EISO nodes
2. Nodes whose ring-interface state is FLTY (isolated) (In 3.4 and later generics, application-nominated critical nodes with faulty ring interfaces are restored before other nodes with faulty ring interfaces.)
3. Innocent victim RPCNs (isolated)
4. Application-nominated critical nodes with high priority (quarantined)
5. Other RPCNs (quarantined)
6. Application-nominated critical nodes with low priority (quarantined)
7. Innocent victim IUNs (isolated)
8. Other IUNs (quarantined)
Nodes awaiting ARR restoral efforts may be contained in the active ring segment; or they may be contained in, or as BISO and EISO nodes associated with, the isolated segment. Because ARR's highest priority is to dissolve isolations, it deals first with nodes contained in or associated with an isolated segment. First, it attempts to return to service any node that has become inactive after being designated a BISO or EISO node.9 Next, it attempts to restore nodes that, by virtue of having faulty ring interfaces, are responsible for the isolation. Then, it restores healthy nodes that were victims of the isolation. Finally, having dissolved the isolation by restoring all isolated nodes, ARR turns to restore any quarantined nodes. The restoral priority list does not apply to node restarts, however, which occur independent of, and may occur in parallel with, node restorals.
The One-Restoral-at-a-Time Rule

When ARR undertakes to restore a node, whether conditionally or unconditionally, it cannot begin to restore another until any current restoral effort is completed or terminated. To conditionally restore a node, ARR must request that the RTR Maintenance Input Request Administrator (MIRA) do the job.10 To unconditionally restore a node, ARR does not use MIRA but performs the work itself.
Application-Nominated Critical Nodes. The rule that ARR cannot begin to restore a node until its previous restoral attempt completes has one exception. When an application-nominated critical node requires restoral, ARR aborts an ongoing restoral attempt in favor of the critical node, provided that the critical node is higher on the restoral priority list than the node currently being restored. Application-nominated critical nodes occupy the fourth and sixth positions on the list.
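The restoral priority list and the critical-node exception to the one-restoral-at-a-time rule can be sketched as follows. This is an illustrative model only, not IMS code: the category labels, the PendingNode class, and both function names are hypothetical, and the sketch ignores the sub-ordering of critical nodes within priority 2.

```python
from dataclasses import dataclass

# Priority classes from Procedure 3-1; a lower number is restored first.
# The category labels are hypothetical names chosen for this sketch.
PRIORITY = {
    "inactive_biso_eiso": 1,
    "faulty_ri": 2,
    "innocent_victim_rpcn": 3,
    "critical_high": 4,
    "other_rpcn": 5,
    "critical_low": 6,
    "innocent_victim_iun": 7,
    "other_iun": 8,
}


@dataclass(frozen=True)
class PendingNode:
    name: str      # for example "RPCN00 00"
    category: str  # one of the PRIORITY keys


def restoral_order(nodes):
    """Return nodes sorted by restoral priority; ties keep their input order."""
    return sorted(nodes, key=lambda n: PRIORITY[n.category])


def should_preempt(current, candidate):
    """True when a critical node outranks the node currently being restored."""
    is_critical = candidate.category in ("critical_high", "critical_low")
    return is_critical and PRIORITY[candidate.category] < PRIORITY[current.category]
```

In this model only the two application-nominated categories can interrupt a restoral in progress; every other candidate waits for the current effort to complete or terminate.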
The Fourth-Time Rule

To prevent a transient problem from repeatedly disrupting the ring, ARR keeps a leaky-bucket count of the number of times it has restored a node to service. If, within a 60-minute interval, ARR has restored a node to service three times and is then called upon to restore it a fourth, it refuses to do so. Instead, it leaves it out-of-service and changes its maintenance mode from automatic to manual, thus delegating the problem to technicians. In this document this practice is called the fourth-time rule. Self-initiated node restarts are not counted in the fourth-time rule, nor are unconditional and conditional restorals distinguished. Thus any combination of four restorals during a 60-minute interval violates the rule.

9 These are termed IMS critical nodes. Their recovery efforts go by the special title critical node recovery (CNR), a title that may appear on IMS display pages.
10 Technicians may learn of the status of IMS requests at MIRA from the RTR OP:DMQ command, as well as from IMS 1105 and 1106 display pages, which are discussed in this chapter.
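As a rough illustration of the fourth-time rule, the sketch below counts restorals per node inside a 60-minute window and refuses the fourth. It uses a simple sliding window as a stand-in for the leaky-bucket count; how IMS actually ages its count is not described here, so the window behavior, the class name, and the return values are all assumptions.

```python
from collections import deque

WINDOW_SECONDS = 60 * 60  # the 60-minute interval named in the text
MAX_RESTORALS = 3         # a fourth request inside the window is refused


class RestoralCounter:
    """Per-node restoral history (hypothetical helper, not an IMS interface)."""

    def __init__(self):
        self._times = deque()

    def request_restoral(self, now):
        """Return "restore" or "manual" for a restoral request at time now (seconds)."""
        # Forget restorals that have aged out of the 60-minute window.
        while self._times and now - self._times[0] >= WINDOW_SECONDS:
            self._times.popleft()
        if len(self._times) >= MAX_RESTORALS:
            return "manual"  # fourth-time rule: leave the node OOS, switch to MAN
        self._times.append(now)
        return "restore"
```

Because conditional and unconditional restorals are not distinguished by the rule, the counter takes no argument describing the type of restoral.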
ARR Treatment of Unstartable, Quarantined Nodes

Upon quarantining a ring node or recognizing that a self-quarantined node has failed to restart itself, EAR changes the node-processor state to FLTY, while leaving the ring-interface state USBL. ARR responds to such a node by taking two actions: it initiates a count of times that it has restored the node, and it restores the node unconditionally. Then, if ARR finds the node quarantined a second time, it increments the counter and again restores the node unconditionally. If, however, within the same 60-minute interval, ARR finds the same node quarantined a third time, it increments the counter, isolates the node, and runs full diagnostics on it. If the node passes diagnostics, ARR restores it to service. If the node fails diagnostics, ARR leaves it out of service (unisolating it if possible) and changes its maintenance mode to manual. Finally if, within the same 60-minute interval, ARR is called upon to restore the node a fourth time, it does not do so but instead obeys the fourth-time rule; that is, it leaves the node out-of-service and changes its maintenance mode to manual. Thus ARR's algorithm for handling, within an hour, restorals of a quarantined node whose ring-interface state is USBL and whose node-processor state is FLTY, may be summarized as pump-pump-diagnose-quit.
There is one exception to this rule. ARR's practice for handling a newly-quarantined node in automatic maintenance mode with ring-interface QUSBL is to isolate the node and run diagnostics on all its components. ARR does not attempt to restore these nodes unconditionally.11
Extended IUNs are treated in the same manner as basic IUNs with this exception: when ARR restores an extended node, it does so in two or three steps. First, the node processor is downloaded with operational code and placed in execution. Second, the attached processor is treated in the same manner. Third, if the node is a DLN, the DMA channel is initialized. An extended node is in the INIT state during the second and third steps. Initialization of out-of-service nodes will usually take from 30 seconds to 2 minutes.
Nodes may also be quarantined at the request of the user. This will usually occur when the user believes that a hardware problem exists in the external user interface and wants the interface to be diagnosed. ARR responds to this request by keeping the node in quarantine, leaving both the ring-interface and node-processor states USBL, and running diagnostic tests on only the application interface. If the external user interface passes diagnostics, ARR automatically returns the node to service. If it fails diagnostics, ARR changes the maintenance mode of the node to manual.

11 A node whose ring-interface state is QUSBL would be in automatic maintenance mode only if its condition had been detected during level-4 ring initialization. Any node whose ring interface was set to QUSBL as a consequence of diagnostic failure would be in the manual maintenance mode. See the discussion of the QUSBL ring-interface state above.
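The pump-pump-diagnose-quit sequence for a quarantined node with ring-interface state USBL and node-processor state FLTY can be written out as a small decision function. This is a sketch for illustration; the function name, its arguments, and the returned strings are hypothetical and do not correspond to any IMS interface.

```python
def arr_action(times_quarantined, diagnostics_pass=None):
    """Action for a quarantined node with RI state USBL and NP state FLTY.

    times_quarantined counts occurrences within the same 60-minute interval.
    diagnostics_pass is consulted only on the third occurrence; leave it as
    None to ask what ARR does before the diagnostic result is known.
    """
    if times_quarantined < 1:
        raise ValueError("times_quarantined starts at 1")
    if times_quarantined in (1, 2):
        return "restore unconditionally"               # pump, pump
    if times_quarantined == 3:
        if diagnostics_pass is None:
            return "isolate and run full diagnostics"  # diagnose
        if diagnostics_pass:
            return "restore to service"
        return "leave out of service, set MAN"
    return "leave out of service, set MAN"             # quit (fourth-time rule)
```

The QUSBL exception described above is outside this function: such nodes go straight to isolation and diagnostics and are never restored unconditionally.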
ARR Treatment of Isolated Nodes

ARR treats an isolated node according to the states of its ring interface and node processor. When both states are USBL, indicating that the node is an innocent victim, ARR restores the node unconditionally as soon as the isolation can be dissolved.
When the ring-interface state of an isolated node is FLTY, ARR runs diagnostics on all its components and, if they pass, attempts to eliminate or reduce the isolated segment so the node can be returned to service or, if they fail, leaves the node isolated and changes its maintenance mode to manual. When diagnostic tests of the ring interface indicate a fault but one that would not prevent it from propagating messages on the ring, diagnostic software changes the ring-interface state from FLTY to QUSBL and the maintenance mode to manual and then quarantines the node as soon as the isolation can be dissolved. Finally, when the ring interface of an isolated node passes diagnostics but the node processor or an auxiliary component fails, diagnostic software changes its ring-interface state from FLTY to USBL, its node-processor state to FLTY, and the maintenance mode to manual and then quarantines the node as soon as the isolation can be dissolved. Thus ARR responds to any diagnostic failure in a node by changing its maintenance mode to manual.
The fourth-time rule applies to restoring isolated nodes and to restoring any combination of quarantined and isolated nodes. Table 3-2 summarizes possible ARR responses to maintenance states.

Table 3-2. ARR Responses to Maintenance States

NODE STATE | POSITION | RI STATE | NP STATE | CIRCUMSTANCE | ARR ACTION 1 | ARR ACTION 2
ACT | NORM | USBL | USBL | n/a | None |
OOS | NORM | USBL | FLTY | 1st or 2nd time in hour | pump & return to service |
OOS | NORM | USBL | FLTY | 3rd time in hour | isolate & diagnose (pass) | pump & return to service
OOS | NORM | USBL | FLTY | 3rd time in hour | isolate & diagnose (fail) | manual maintenance
OOS | NORM | USBL | FLTY | 4th time in hour | manual maintenance |
OOS | NORM | USBL | UNTSTD | n/a | pump & return to service |
OOS | NORM | QUSBL | FLTY | n/a | isolate & diagnose (pass) | pump & return to service
OOS | NORM | QUSBL | FLTY | n/a | isolate & diagnose (fail) | manual maintenance
OOS | NORM | USBL | FLTY | extended node | isolate & diagnose (pass) | pump & return to service
OOS | NORM | USBL | FLTY | extended node | isolate & diagnose (fail) | manual maintenance
OOS | ISOL | FLTY | USBL | 1st, 2nd or 3rd time in hour | isolate & diagnose (pass) | pump & return to service
OOS | ISOL | FLTY | USBL | 1st, 2nd or 3rd time in hour | isolate & diagnose (fail) | manual maintenance
OOS | ISOL | FLTY | USBL | 4th time in hour | manual maintenance |
OOS | ISOL | FLTY | FLTY | 1st, 2nd or 3rd time in hour | isolate & diagnose (pass) | pump & return to service
OOS | ISOL | FLTY | FLTY | 1st, 2nd or 3rd time in hour | isolate & diagnose (fail) | manual maintenance
OOS | ISOL | FLTY | FLTY | 4th time in hour | manual maintenance |
OOS | ISOL | USBL | FLTY | n/a | quarantine | manual maintenance
OOS | ISOL | USBL | USBL | isolation ends | pump & return to service |
ACT | BISO | USBL | USBL | isolation ends | chg. BISO to NORM |
ACT | EISO | USBL | USBL | isolation ends | chg. EISO to NORM |
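The treatment of isolated nodes described in this section can be summarized as a lookup keyed on the ring-interface state, the node-processor state, and the diagnostic outcome. The sketch below is an interpretation of the prose, not IMS code: the outcome labels, dictionary keys, and function name are hypothetical, and only the state changes that the text states explicitly are recorded.

```python
# State changes and next step after diagnosing an isolated node whose
# ring-interface state was FLTY. Outcome labels are hypothetical.
OUTCOMES = {
    "pass": {"next": "reduce the isolated segment and return to service"},
    "ri_fault_blocks_ring": {"mode": "MAN", "next": "leave isolated"},
    "ri_fault_propagates": {
        "ri": "QUSBL", "mode": "MAN",
        "next": "quarantine when the isolation dissolves",
    },
    "np_or_aux_fault": {
        "ri": "USBL", "np": "FLTY", "mode": "MAN",
        "next": "quarantine when the isolation dissolves",
    },
}


def isolated_node_response(ri_state, np_state, outcome=None):
    """Describe how ARR handles an isolated node in the given minor states."""
    if ri_state == "USBL" and np_state == "USBL":
        # Innocent victim: no diagnostics are needed.
        return {"next": "restore unconditionally when the isolation dissolves"}
    if ri_state == "FLTY":
        if outcome is None:
            return {"next": "run diagnostics on all components"}
        return OUTCOMES[outcome]
    raise ValueError("combination not covered by this sketch")
```

Every failing outcome sets the maintenance mode to MAN, which mirrors the statement that ARR responds to any diagnostic failure by handing the node to technicians.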
ARR Recovery Intervals and Output Messages

ARR activities are reflected in the status information provided by the IMS 1105 and 1106 display pages, which are described in the next chapter of this document. In addition, results of ARR actions are reported by the following output messages.
Table 3-3. Output Messages that Report ARR Actions

OUTPUT MESSAGE | ARR ACTION OR RESULT
RMV RPCN... | Request to quarantine an RPCN
RMV IUN... | Request to quarantine an IUN
DGN RPCN... | Request to diagnose an RPCN
DGN IUN... | Request to diagnose an IUN
RST IUN... | Request to diagnose and restore an IUN to service
RST RPCN... | Request to diagnose and restore an RPCN to service
DGN:AUDIT:RING... | Abortion of a diagnostics request because of an error
REPT RING CFR | Outcome of a request to reconfigure the ring
REPT IUN PUMP... | Abortion of an IUN pump
REPT IUN RST... | Failure of an IUN restore
REPT RPC INIT... | Failure of RPCN initialization during a restore or restart
REPT ARR AUTORST a b FOR c STARTED | Start of an ARR recovery attempt
REPT ARR AUTORST a b FOR c SUCCEEDED | Success of an ARR recovery attempt
REPT ARR AUTORST a b FOR c FAILED | Failure of a diagnostic phase
REPT ARR AUTORST a b FOR c ABORTED | Abortion of a diagnostic request
REPT ARR AUTORST RECOVERY THRESHOLD EXCEEDED FOR c | Violation of the fourth-time rule
REPT ARR AUTORST TIMEOUT AWAITING MIRA FOR c | Time out of a restoral request
REPT ARR AUTORST a b FOR c STOPPED <INHIBITED> | Inhibition of a restoral request
The time taken by ARR to return a node to service varies considerably, depending on such factors as the type of restoral and the number of jobs waiting in MIRA's queue. An unconditional restoral usually takes 30 to 90 seconds. A full and successful diagnosis of a basic IUN or RPCN may take 5 to 8 minutes, while a failing diagnosis usually takes somewhat longer. Diagnosis of an extended node takes longer still, perhaps as much as 15 minutes.
Manual Ring Maintenance

This chapter explains tools and procedures used in manual ring maintenance and offers suggestions to technicians for solving hard problems and avoiding easy mistakes.
Ring Maintenance Interfaces

Technicians who maintain the ring are supported in their responsibilities by various maintenance interfaces. The maintenance CRT terminal (MCRT) provides an interactive interface that outputs IMS and other system messages and status information while accepting as inputs IMS and other system commands. IMS input and output messages will be recorded on the maintenance read only printer (ROP), if it is turned on. In addition, various audible and visual alarms act to alert technicians to important IMS events. These maintenance interfaces as they pertain to IMS are explained below.
Alarms
The following alarms indicate trouble that may affect IMS equipment:
Critical Alarms
A critical condition or fault in or associated with the IMS ring will be indicated by an asterisk C (*C) preceding the ROP output message that identifies the problem. It may also be indicated by an audible alarm and a red CRITICAL indicator on each MCRT display-page header.
Major Alarms
A major condition or fault in the IMS ring is indicated by two asterisks (**) preceding the ROP output message that identifies the problem. It may also be indicated by the following:
An audible alarm
A red MAJOR indicator on each MCRT display-page header, and
A red lamp on the aisle containing the frame/cabinet where the fault or failure occurred.
See the ``Special IMS Indicators'' section in this chapter for descriptions of other indicators that may appear with a major alarm.
If a major alarm is caused by a power failure, the POWER indicator on each MCRT display-page header will show red, and display page 1111 will identify the type and location of the problem. If the problem is a failed power converter circuit pack in an IMS frame/cabinet, the lamp at the aisle containing the disabled frame/cabinet will show red, and inside the frame/cabinet the power alarm light at the top-left will show red also.
Minor Alarms
A minor condition or fault in the IMS ring is indicated by one asterisk (*) preceding the ROP output message that identifies the problem. It may also be indicated by the following:
An audible alarm
A red MINOR indicator on each MCRT display-page header, and
A yellow lamp on the aisle containing the frame/cabinet where the fault or failure occurred.
See ``Special IMS Indicators'' below for descriptions of other indicators that may appear with a minor alarm.
If a minor alarm is caused by a power failure, the POWER indicator on each MCRT display-page header will show red, and display page 1111 will identify the type and location of the problem. If the problem is a single failed fan in an IMS frame/cabinet, the lamp at the aisle containing the disabled frame/cabinet will show yellow, and inside the frame/cabinet the power alarm light at the top-left will show red.
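For technicians who post-process captured ROP output, the alarm markers described above (*C, **, and *) can be classified with a few string checks. The sketch assumes that the marker is the first non-blank text on the line; the exact ROP line layout is not specified here, and the function name is hypothetical.

```python
def alarm_level(rop_line):
    """Classify a ROP output line by its leading alarm marker.

    Returns "critical", "major", "minor", or None when no marker is present.
    Assumes the marker is the first non-blank text on the line.
    """
    text = rop_line.lstrip()
    # Test "*C" and "**" before the single asterisk, which both begin with.
    if text.startswith("*C"):
        return "critical"
    if text.startswith("**"):
        return "major"
    if text.startswith("*"):
        return "minor"
    return None
```

The order of the checks matters: testing for the single asterisk first would report every critical and major alarm as minor.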
Special IMS Indicators

A ring-quarantine (RQ) LED is located on IRN circuit packs. When the RQ LED shows red, it indicates that the node containing the circuit pack is quarantined from the ring.
A no-token (NT) LED is located on IRN circuit packs. The chief purpose of the NT LED is to indicate, by lighting red, when the node is isolated. The NT LED mechanism works by detecting the absence of token messages. The ring interfaces in IRNs, however, cannot make this distinction; so, during periods when diagnostics are occurring, their NT LEDs will blink off and on as test messages pass. At other times, however, IRN NT LEDs on isolated nodes will show constant red. In addition, when all NT LEDs, of whatever type, in the ring are lighted, the ring is down.
Each circuit pack in the ring application processors (RAPs) of CDN-1 is equipped with an LED that indicates when the pack has failed a diagnostic phase. Some of these LEDs also turn on when the RAP is initializing and then turn off when initialization tests confirm that the firmware within the pack is executing. The nature and uses of these LEDs are explained in the section ``Ring Application Processor Critical Maintenance Procedure.''
The application-processor circuit pack in a direct link node (DLN) is equipped with green, red, and yellow LEDs. The green stays on during normal operation and goes off when the node is taken out-of-service, when a hard panic occurs in the node processor, or when diagnostic code begins to be downloaded, whichever occurs first. The red and yellow LEDs come into play as either diagnostic or operational code is downloaded. Diagnostic phase 41 begins with a firmware test. During the test the red and yellow LEDs come on and stay on permanently if the test fails. If the test passes, the red goes off briefly, then joins the yellow back on again as the diagnostic proper begins. If the diagnostic fails, the yellow goes off and the red stays on. If the diagnostic passes, the red goes off and the yellow stays on until the node processor receives the diagnostic results, at which time it goes off. Then red and yellow come on and go off again as operational code is downloaded, and the green comes on as the attached processor is placed in execution. If technicians wish to consult support about the performance of a DLN, they might first observe the behavior of these LEDs so they can report it.
Output messages on the ROP are preceded, when appropriate, by an M or an A, indicating that the action described in the message is the result of a manual or an automatic IMS request. Table 3-4 shows the IMS output messages accompanied by the types of alarms.

Table 3-4. Alarms Associated with IMS Output Messages
MESSAGE (severity columns CRT, MAJ, and MIN; the per-message X marks are not recoverable from this copy)
REPT DB INIT
REPT ERROR
REPT IMSDRV AUD
REPT IMSDRV FLT
REPT IMSDRV INIT
REPT IUN
REPT MSDC FLT
REPT OP_RTM FLT
REPT PSDO_UMS>P FLT
REPT RING GROWTH
REPT RING INIT
REPT RING TRANSPORT ERR
REPT TDTP FLT
AUD CNC
AUD NODEST
Other IMS output messages are not accompanied by audible or visual alarms.
Display Pages
IMS provides technicians with two MCRT display pages: page 1105, the Ring Status Summary Page, and page 1106, the Ring Node Status Page. These pages are similar in appearance and function to RTR display pages, and the procedure used to access them is also the same. The first three lines of the IMS pages, consisting of the standard header information that appears on all RTR display pages, are omitted from the illustrations that follow. For more information on Status Display Page(s), see 410-610-160, The FLEXENT/AUTOPLEX Wireless Networks, Executive Cellular Processor (ECP) Operations, Administration, and Maintenance Guide.
To access a particular display page, perform the following actions in the order indicated.
1. Type the NORM/DISP key.
2. Place the MCRT in the command mode by typing the CMD/MSG key.
3. Type and enter 1105 or 1106 on the numeric key pad.
During ring initialization and configuration, indicators or data shown on display pages may be invalid or out of date; and during disk independent operation, the display page process is terminated.
Page 1105 - The Ring Status Summary Page

The 1105 display page provides status information about the entire IMS ring. Figure 3-1 is typical of an 1105 page for small IMS offices.
CMD>                                                   -- 1105 RING STATUS SUMMARY --
[Ring Major State]  [Ring Error Threshold State]       CMD Function
[ARR Restore; System Indicator; IMSRTS.P indicator]    400 OP Ring Detailed
[ARR Restart]  [ACNR Restore or Restart]
00AAAOAAAiigAOO...
01.AAAAOOAA...AAAA
02.AAAAAAAAA...AAA
32AAAAAAAAOOOAAA..
33.AAAAAAAAAAAAAAA
34.AOOOOOAAAAAAAAA
Figure 3-1. An 1105 Display Page

The 1105 page, as exemplified in the above figure, offers the following information and capabilities:
The first line contains, on the left, the CMD> prompt for command entries and, on the right, the page title. To enter display commands, move the cursor to the CMD> prompt by typing the CMD/MSG key, then enter the command.
The next three lines identify, in square brackets, locations on the page where the types of information, shown within the square brackets, will appear, when appropriate. The brackets themselves will not appear on display pages.
[Ring Major State] appears at the location where the current ring state will be displayed. One of the following states should always be present:
RING STATE ACTIVE
RING STAT ISOLATED SEGMENT
RING STAT CONFIGURING
RING STAT DOWN
RING STAT RESTORE
[Ring Error Threshold State] is the location where a message will appear when the Ring Error Threshold has been exceeded. The threshold is set by the user to indicate the number of faults per interval of time to be permitted before the IMS practice of responding initially to ring-related faults with EAR level-0 (restarting the ring) is discontinued and replaced by EAR level-1 (isolating the fault) or, in response to unexplained loss of token, by EAR level-3 (ring continuity testing). After the threshold is exceeded, an error-free period of time the length of the threshold interval is required before IMS returns to its normal practice concerning ring restarts. When IMS returns to its normal practice, the Ring Error Threshold Exceeded tag will disappear from the 1105 page, and the location will be blank.
The information CMD Function/400 OP Ring Detailed appears permanently on the 1105 page to remind technicians that the page also allows entry, at the CMD> prompt, of the 400 command, which produces the same output as the input message OP:RING;DETD.
[ARR Restore; System Indicator; imsrts.p Indicator] appears at the location where a, b, or c, below, will appear:
a. A node that ARR is currently attempting to restore, conditionally or unconditionally. The identification will read ARR followed by the method of restoral (UCL for unconditional, COND for conditional) followed by the node name in the form NODEa b. If ARR is attempting to restore an EISO or BISO node (see ``Three ARR Rules'' above), CNR will appear in place of ARR.
b. One of the following system states of IMS:
   IMS FPI PROLOGUE (appears during the initial stage of an FPI initialization)
   IMS SYS BOOT (appears during the initial stage of level-3 or -4 BOOT initialization)
   IMS LVL3 INIT (appears during subsequent stages of a level-3 initialization)
   IMS LVL4 INIT (appears during subsequent stages of a level-4 initialization)
   IMS SYS CRIT SEQ CMPL (appears at the conclusion of a level-3 or -4 FPI or BOOT initialization)
   IMS SYS ABORT (appears prior to a level-3 or level-4 BOOT initialization)
   IMSRTS.P CREATED (see below)
c. One of the following states of the imsrts.p process, which creates the IMS display pages:
   IMSRTS.P DIED
   IMSRTS.P CREATED
If ARR is not currently attempting to restore a node and none of the system or IMSRTS.P conditions exist, the location will be blank.
[ARR Restart] appears at the location where any node (other than an application-nominated critical node) that ARR is currently attempting to restart will be identified. Node restarts that are initiated locally by the node processor are neither recognized nor recorded by this indicator.
[ACNR Restore or Restart] appears at the location where any application-nominated critical node (see ``Three ARR Rules'' above) that ARR is currently attempting to restore or restart will be identified. Because one ARR restart and one ACNR restart may occur in parallel and because one or both restarts may occur in parallel with a single restore, it is possible to have all three node-activity indicators lighted simultaneously. It is not, however, possible to have two restorals occurring simultaneously, since IMS can restore only one node at a time (see ``Three ARR Rules'' above).
The next section of the display page, beginning in the above example with the fifth line, identifies all frames/cabinets in the IMS system, each node within each frame/cabinet, and the major state of each node. The nodes that occupy a frame/cabinet are called a group. The example shows six groups identified by their group numbers as 00, 01, 02, 32, 33, and 34. To the right of the group numbers are characters representing the sixteen nodes or node positions within each group. Thus the first character represents the RPCN, and the next fifteen characters represent IUNs.
In the IMS numbering scheme, nodes are identified by the formula RPCNa b or IUNa b, where a is the two-digit group number and b is a number between 00 and 15 that corresponds to the sequential location of the node within its group on the downstream path of ring 0. Thus RPCNs are always numbered 00 and IUNs are always numbered 01 to 15.
The characters also identify, in accordance with the following formulas, the current major state of each of the sixteen nodes. See Table 3-5.

Table 3-5. 1105-Page Symbols of Node Major States

NODE MAJOR STATE | SYMBOL
Active | A
Standby | s or S
Out of service, quarantined | O
Out of service, isolated | i
Grow | g or G
Offline | f or F
Unequipped | . or blank space
Initializing | b or B
In the instances that provide an alternative of an upper- or a lower-case letter, the lower-case signifies that the node is isolated, and the upper-case signifies that the node is in the active ring. In the example of an 1105 page above:
- RPCN00 00 is in the active node major state
- LN00 01 and LN00 02 are also active
- LN00 03 is out-of-service quarantined
- LN00 04, LN00 05, and LN00 06 are active
- LN00 07 and LN00 08 are out-of-service isolated
- LN00 09 is in the grow state and is isolated
- LN00 10 is active
- LN00 11 and LN00 12 are out-of-service quarantined, and
- LN00 13, LN00 14, and LN00 15 are unequipped (see footnote 12)
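The symbol rules in Table 3-5 can be sketched as a small decoder. This is only an illustration, not IMS software: the function name `decode_group`, the tuple layout, and the use of the LN prefix for IUNs (taken from the example above) are assumptions, and the sample row is written out from the group 00 description rather than copied from a real 1105 page.

```python
# Illustrative decoder for one group row of the 1105 page (Table 3-5).
# Each symbol maps to (major state, isolated?). Lower case means isolated
# wherever the table offers an upper/lower-case alternative.
SYMBOLS = {
    "A": ("active", False),
    "S": ("standby", False), "s": ("standby", True),
    "O": ("out of service, quarantined", False),
    "i": ("out of service, isolated", True),
    "G": ("grow", False), "g": ("grow", True),
    "F": ("offline", False), "f": ("offline", True),
    "B": ("initializing", False), "b": ("initializing", True),
    ".": ("unequipped", False), " ": ("unequipped", False),
}


def decode_group(group, symbols):
    """Return a (node name, major state, isolated) tuple per node position.

    `group` is the group number; `symbols` is the 16-character row, where
    position 00 is the RPCN and positions 01-15 are IUNs (named LN here,
    which is an assumption based on the example in the text).
    """
    symbols = symbols.ljust(16)  # trailing blanks stand for unequipped nodes
    if len(symbols) != 16:
        raise ValueError("a group row holds exactly 16 node positions")
    nodes = []
    for position, symbol in enumerate(symbols):
        if symbol not in SYMBOLS:
            raise ValueError(f"unknown major-state symbol: {symbol!r}")
        state, isolated = SYMBOLS[symbol]
        prefix = "RPCN" if position == 0 else "LN"
        nodes.append((f"{prefix}{group:02d} {position:02d}", state, isolated))
    return nodes


# Group 00 as described in the example above.
for name, state, isolated in decode_group(0, "AAAOAAAiigAOO..."):
    print(name, state, "(isolated)" if isolated else "")
```

Keeping the exact symbols as dictionary keys, rather than upper-casing the input, means an unlisted symbol such as a lower-case "a" is rejected instead of being silently read as active.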
Page 1106 The Ring Node Status Page

The 1106 display page provides status information about, and a command interface for, a technician-specified group of nodes. Figure 3-2 is typical of an 1106 page.
Footnote 12: When a group contains any out-of-service nodes, IMS color-codes the entire group with white lettering on a red background. For additional information on the node and ring maintenance states, refer to the "ARR or Deferrable Node Recovery" section of this chapter.
                        -- 1106 - RING NODE STATUS --
CMD>              NODE>
                                                  RING  MAJOR  RI     NP     MAINT
[Ring Status]                          NODE NAME  POS   STATE  STATE  STATE  MODE
[ARR Restore, etc.]                 01 RPCN00 00  NORM  ACT    USBL   USBL   AUTO
[ARR Restart]                       02 LN00 01    NORM  ACT    USBL   USBL   AUTO
[ACNR Restore or Restart]           03 LN00 02    BISO  ACT    USBL   USBL   AUTO
CMS      FUNCTION                   04 LN00 03    ISO   OOS    FLTY   USBL   MAN
2xx      RMV node (line xx)         05 LN00 04    ISO   OOS    FLTY   USBL   MAN
3xx      RST node (line xx)(UCL)    06 LN00 09    EISO  ACT    USBL   USBL   AUTO
400      BISO-EISO                  07 LN00 14    NORM  OOS    USBL   FLTY   AUTO
401/402  all non-ACT(next/prev)     08 LN00 15    NORM  ACT    USBL   USBL   AUTO
403/404  all Equipped(next/prev)    09
500      DGN Isolated Segment       10
5xx      DGN node (line xx)         11
6nn      Group nn                   12
7xx      RST node (line xx)(COND)   13
                                    14
TOTAL                               15
                                    16
Figure 3-2. An 1106 Display Page
The 1106 page is composed of three areas. The area to the right, beginning with and including the column of line numbers 01 through 16, displays the major and minor states of a group of up to sixteen technician-specified nodes. In this document, this is called the display area. The area at the top left, beginning with CMD> and ending with ACNR Restore or Restart, is the command-interface and system-status area. In this document, this is called the command area. The area below the command area and to the left of the column of line numbers is a nonselectable command menu. In this document, this is called the menu area.
The Menu Area. Entries in the CMS column of the menu area list the input forms for commands identified under the FUNCTION column. These commands may be typed and entered at the CMD> prompt. The xx in the first, second, seventh, and ninth commands represents a line number, not a node number, from the column of numbers, beginning 01 and ending 16, at the center of the page. Each line number is associated with the node to its right. In the above example, line 02 represents IUN00 01; and to quarantine IUN00 01, a technician would enter 202 at the CMD> prompt. By contrast, the nn in the next-to-the-last command represents not a line number but a group number. In the above example, to have the nodes contained in group 32 displayed, a technician would enter 632. Below is a listing of the results obtained from entering these 3-digit commands:
2xx  Quarantines the node identified on line xx.
3xx  Unconditionally restores the node identified on line xx.
400  Displays, if the ring has an isolated segment, the currently isolated nodes preceded by the BISO node and followed by the EISO node. If the isolated segment is greater than 14 nodes, the display will list first the BISO node, then the first seven isolated nodes downstream of the BISO node, then the last seven isolated nodes upstream of the EISO node, then the EISO node. After the 400 command is entered, the Total line below the menu area displays a number that includes all currently isolated nodes plus the BISO and EISO nodes; from this number it can be recognized that a portion of an isolated segment is missing from the display (because the isolation contains more than 14 nodes). The count on the Total line updates interactively.
401  Initially provides in the display area a list of nodes in the ring that are neither active nor unequipped. Thus it lists any nodes that are in the out-of-service, standby, initializing, and grow states. After the 401 command is entered, the total number of nonactive nodes will be given on the Total line below the menu area and updated interactively. If this number is greater than 16, technicians may page forward and backward in the list by reentering 401 and 402, respectively.
403  Entered the first time, provides a list of nodes in the ring that are equipped. Thus it lists all nodes that are in the active, out-of-service, standby, initializing, and grow states. After the 403 command is entered, the total number of equipped nodes will be given on the Total line below the menu area and updated interactively. If this number is greater than 16, technicians may page forward and backward in the list by reentering 403 and 404, respectively.
500  Runs diagnostic phases 1 and 2 on all RACs in the isolated ring segment.
5xx  Runs all automatic diagnostic phases on the node identified at line xx.
6nn  Displays all equipped nodes in group nn, where nn is not the line number but the group number. After the 6nn command is entered, the total number of equipped nodes within the group will be given on the Total line below the menu area and updated interactively.
7xx  Conditionally restores the node identified on line xx.
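As a rough illustration of how the 3-digit menu commands divide into fixed codes, line-number commands (xx), and the group command (nn), the following sketch classifies an entry typed at the CMD> prompt. The function name, the returned labels, and the choice to raise an error are assumptions made for the example; the real page answers OK or NG and also accepts display-page numbers, which this sketch does not handle.

```python
# Illustrative classifier for 1106-page menu commands (not IMS software).
FIXED = {
    "400": "display isolated segment (BISO to EISO)",
    "401": "list non-active nodes, next page",
    "402": "list non-active nodes, previous page",
    "403": "list equipped nodes, next page",
    "404": "list equipped nodes, previous page",
    "500": "diagnose isolated segment (phases 1 and 2)",
}
LINE_COMMANDS = {
    "2": "quarantine node on line",
    "3": "restore node on line unconditionally",
    "5": "run automatic diagnostics on node on line",
    "7": "restore node on line conditionally",
}


def parse_1106_command(text):
    """Return (action, argument) for a 3-digit menu command.

    The argument is a line number (01-16) for 2xx/3xx/5xx/7xx, a group
    number for 6nn, and None for the fixed codes.
    """
    if len(text) != 3 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a 3-digit command: {text!r}")
    if text in FIXED:
        return FIXED[text], None
    prefix, number = text[0], int(text[1:])
    if prefix == "6":
        return "display equipped nodes in group", number
    if prefix in LINE_COMMANDS and 1 <= number <= 16:
        return LINE_COMMANDS[prefix], number
    raise ValueError(f"invalid command: {text!r}")


print(parse_1106_command("202"))  # quarantine the node shown on line 02
print(parse_1106_command("632"))  # display the nodes in group 32
```

Checking the fixed codes first matters because 500 would otherwise look like a 5xx command aimed at a nonexistent line 00.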
The Command Area. CMD> is the prompt for any of the 3-character commands listed in the command menu. Entering a valid command here evokes an OK response. Entering an invalid command evokes an NG response. To enter a command, manipulate the cursor with the CMD/MSG key until it is at the prompt.
Then type and enter a 3-character command from the CMS column of the menu area. The prompt also accepts as input display-page numbers to which the technician wishes to turn.
Node> is the prompt for a command that allows technicians to select the sequence of nodes displayed, after having entered a 401 or 403 command. To employ this feature, enter 401 or 403, manipulate the cursor with the arrow keys to the Node> prompt, and then type and enter the identification, in the form IUNa b or RPCNa b, of the node you wish to form the starting point of the sequence. The display will be redrawn with the specified node as the last entry in the 401 display and as the first entry in the 403 display. This feature is not available for the 400 and 6nn commands, where its reordering might be confusing.
[Ring Status] appears at the location where the current ring state will be displayed. One of the following states should always be present:
RING STATE ACTIVE
RING STAT ISOLATED SEGMENT
RING STAT RESTORING
RING STAT CONFIGURING
RING STAT DOWN
[ARR Restore, etc] [ARR Restart] [ACNR Restore or Restart] provide the same information as they do for the 1105 display page, as explained above. Because one ARR restart and one ACNR restart may occur in parallel, and because one or both restarts may occur in parallel with a single restore, it is possible to have all three node-activity indicators appear simultaneously. It is not possible, however, to have two restorals appear simultaneously, since IMS can restore only one node at a time (see "Three ARR Rules" above).
The Display Area. The display area lists up to 16 nodes and identifies their major and minor maintenance states. Node major and minor states are explained above in the "ARR or Deferrable Node Recovery" section of this chapter. A listing of the maintenance states follows:
Node Major States
  ACT - Active
  OOS - Out of service
  STBY - Standby
  INIT - Initializing
  OFL - Off-line
  GROW - Grow
  UNEQ - Unequipped
Node Minor States: Ring Position
  NORM - Normal
  BISO - Beginning of Isolation
  EISO - End of Isolation
  ISOL - Isolated
Node Minor States: Ring Interface
  USBL - Usable
  QUSBL - Quarantine-usable
  FLTY - Faulty
  UNTSTD - Untested
Node Minor States: Node Processor
  USBL - Usable
  FLTY - Faulty
  UNTSTD - Untested
Node Minor States: Maintenance Mode
  AUTO - Automatic
  MAN - Manual
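The abbreviations listed above can be held in lookup tables, one per display column, so that a row of the display area expands into plain words. This is a minimal sketch, not part of IMS: the function and table names are invented for the example, and the extra ISO key is an assumption added only because Figure 3-2 prints the isolated ring position that way.

```python
# Illustrative lookup tables for the five state columns of the 1106 page.
RING_POSITION = {
    "NORM": "normal",
    "BISO": "beginning of isolation",
    "EISO": "end of isolation",
    "ISOL": "isolated",
    "ISO": "isolated",  # assumed alias, as printed in Figure 3-2
}
MAJOR_STATE = {
    "ACT": "active", "OOS": "out of service", "STBY": "standby",
    "INIT": "initializing", "OFL": "off-line", "GROW": "grow",
    "UNEQ": "unequipped",
}
RING_INTERFACE = {
    "USBL": "usable", "QUSBL": "quarantine-usable",
    "FLTY": "faulty", "UNTSTD": "untested",
}
NODE_PROCESSOR = {"USBL": "usable", "FLTY": "faulty", "UNTSTD": "untested"}
MAINT_MODE = {"AUTO": "automatic", "MAN": "manual"}


def expand_row(ring_pos, major, ring_if, node_proc, mode):
    """Expand one display-area row into words; raises KeyError on bad input."""
    return {
        "ring position": RING_POSITION[ring_pos],
        "major state": MAJOR_STATE[major],
        "ring interface": RING_INTERFACE[ring_if],
        "node processor": NODE_PROCESSOR[node_proc],
        "maintenance mode": MAINT_MODE[mode],
    }


# The LN00 03 row of Figure 3-2.
print(expand_row("ISO", "OOS", "FLTY", "USBL", "MAN"))
```

Separate tables per column keep a value such as QUSBL valid for the ring interface but rejected for the node processor, matching the lists above.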
Nodes may be added to 401 and 403 displays by moving the cursor to any vacant line in the display and typing and entering a node name in the form LNa b or RPCNa b. The display will provide status information for the node and also display the line number in reverse video, indicating its special status. The special-status node will disappear when a new command is entered at the CMD> prompt. Prior to that time, the node may be deleted manually by moving the cursor to the line and then pressing only the RETURN key.
Ring Diagnostics
IMS provides diagnostic tests for all circuit packs that reside in the ring node frames/cabinets except power supplies. These tests are submitted as requests to MIRA and performed in a manner similar to standard RTR diagnostics. They may be initiated automatically by ARR or manually by technicians through input messages or display-page commands.
Each IMS node-type is tested by a distinct diagnostic routine; each diagnostic routine is composed of units of sequential execution called phases; and each phase tests functionally-related hardware. Phases are automatic or optional (available on demand). Automatic phases are executed when a diagnostic is run at the request of ARR or in response to a manual request without the PH option. Optional phases are executed only in response to manual requests in which they are specified in the PH option.
Phases are identified by the node-type on which they are executed and by phase numbers. Node-types are further distinguished by their hardware composition. The currently available node-types are IRN RPCNs, IRN2 RPCNs, IRN LNs (LIN-E/SS7), IRN LNs (LI4S/SS7), IRN DLNEs, IRN DLN30s, IRN CDN-Is, IRN CDN-IIs, IRN CDN-IIxs, CDN-IIIs, SS7NEs, DLN60s, and IRN MDLs. Phase numbers reflect the relative order in which phases are run within a routine.
Diagnostic phases 1 and 2 are special in two ways. They are common to all node-types; and when full, automatic diagnostics are requested, whether manually or by ARR, on any node (thus requiring that the node be isolated), phases 1 and 2 test the entire path within the isolation as a preliminary step to testing the specified node. Testing the isolated path requires partial tests of all nodes and interframe buffers within the isolated segment as well as tests of the isolated RACs of the EISO and BISO nodes. Running phases 1 and 2 also has the effect of clearing RAC status registers. RAC status registers may become improperly set as a consequence of a fault, of the node being powered down, or of the RAC circuit pack being removed or reset.
Phase 40 is a critical juncture in IMS diagnostics. When a diagnostic request includes only phases above 39, IMS quarantines the node before running the diagnostic phases on it. When, on the other hand, a diagnostic request includes any phases below 40, IMS attempts to isolate the node prior to running diagnostics on it. If, however, ring conditions do not permit the node to be isolated, IMS runs all requested phases that do not require the node be isolated while the node is quarantined. These will include all requested phases above 39 and some requested phases below 40.
Most IMS diagnostic routines terminate at the end of a phase in which a test fails. A few terminate at the end of a failing test. Important exceptions to this statement are as follows:
- If phase 1 or 2 fails in any node-type, all of phases 1 and 2 are still run.
- If either or both phases 1 or 2 fails in RPCNs, phases 10 through 27 are still run unless a test fails in these upper phases, in which case diagnostics terminate at the end of the failing upper phase.
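The phase 40 rule described above can be sketched as a small planning function. This is a hedged illustration, not the IMS implementation: `plan_diagnostic`, `can_isolate`, and `needs_isolation` are hypothetical names, and because the text does not say which phases below 40 can run on a quarantined node, that knowledge is passed in by the caller as a predicate.

```python
# Illustrative planner for the phase 40 rule (not IMS software).
ISOLATION_BOUNDARY = 40  # requests with any phase below this try to isolate


def plan_diagnostic(phases, can_isolate, needs_isolation):
    """Return (node_action, phases_to_run) for a diagnostic request.

    phases          -- requested phase numbers
    can_isolate     -- True if ring conditions permit isolating the node
    needs_isolation -- hypothetical predicate: does a phase below 40
                       require the node to be isolated?
    """
    phases = sorted(set(phases))
    if not phases:
        raise ValueError("at least one phase must be requested")
    if all(p >= ISOLATION_BOUNDARY for p in phases):
        # Only phases above 39: quarantine the node, then run them.
        return "quarantine", phases
    if can_isolate:
        # Some phase below 40: isolate first, then run everything requested.
        return "isolate", phases
    # Isolation not possible: run what can run on a quarantined node.
    runnable = [p for p in phases
                if p >= ISOLATION_BOUNDARY or not needs_isolation(p)]
    return "quarantine", runnable


# Made-up predicate for the example: pretend phases below 10 need isolation.
print(plan_diagnostic([1, 20, 45], can_isolate=False,
                      needs_isolation=lambda p: p < 10))
```

The predicate in the usage line is invented purely to exercise the fallback branch; the actual set of phases that require isolation depends on the node-type and is not given in this section.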
Obtaining Diagnostic Results

Included in Appendix B, Ring Maintenance Reference Material, are two groups of tables that provide IMS diagnostic information. Diagnostic Phase Tables, available for each node type, identify and superficially describe the phases in each routine.
Diagnostic Fault Tables, also available for each node type, associate phases with the circuit packs they test, thereby providing a list of suspect circuit packs for any failing phase.
Whether diagnostics are initiated automatically or manually, their results appear as output messages on the ROP. The DGN output message identifies failing phases and failing tests for a faulty node. The ANALY TLPFILE output message provides a list of suspect circuit packs in the faulty node. The ANALY TLPFILE message, invoked by the TLP option of the RST command, is always included by ARR requests to restore a node. In the ANALY TLPFILE message, each circuit pack associated with a diagnostic failure is assigned a number between one and ten. The number represents the probability, as calculated by IMS software, that the location of the fault is in the pack; the higher the number, the greater the probability. The DGN and ANALY TLPFILE output messages are primary sources of diagnostic information for technicians.
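Since a higher number in the ANALY TLPFILE message means a more likely fault location, a technician's replacement order follows from sorting the suspects by that number. The sketch below assumes the numbers have already been read off the ROP by hand; it does not parse the message, and the pack labels are placeholders rather than real circuit-pack codes.

```python
# Illustrative ranking of suspect circuit packs by their 1-10 weighting.
def rank_suspects(weights):
    """Return pack labels ordered from most to least likely to be faulty.

    `weights` maps a pack label to the number (1-10) reported for it.
    Ties are broken alphabetically so the order is repeatable.
    """
    for pack, weight in weights.items():
        if not 1 <= weight <= 10:
            raise ValueError(f"{pack}: weight {weight} is outside 1-10")
    return sorted(weights, key=lambda pack: (-weights[pack], pack))


# Placeholder labels, not real pack codes.
print(rank_suspects({"pack-A": 3, "pack-B": 9, "pack-C": 3}))
```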
Diagnostic Listings
If the information provided by ROP output messages fails to identify faulty equipment, further scrutiny of the diagnostic results is possible using diagnostic listings. A diagnostic listing is a document that describes a particular diagnostic phase. Common Network Interface has available the diagnostic listings that pertain to the CNI configuration of the ring. They consist of the listings for ring peripheral controller nodes, link nodes, attached processors, and ring application processors.
A diagnostic listing is composed of a prologue and a statement sequence. The prologue introduces the subject phase by explaining what it tests, how the testing is done, and what hardware is involved. All lines in the prologue begin with the character C, indicating they are comments. The statement sequence consists of information, arranged into numbered statements, about each command within the series of commands that constitutes the phase. Each statement contains a statement number, a source-file version of the command, and an ASCII representation of the executable version of the command. The ASCII representation is on a line that begins with the string * adr, unless the command generates a test, in which case the line begins with * test followed by the test number. Most statements are preceded by one or more comment lines that explain the purpose of the command that follows.
Statement numbers correspond to numbers that appear in early termination output messages and in DGN AUDIT RING output messages. They are also used in the EX input message. Test numbers correspond to the test numbers that appear in DGN output messages. For technicians, test numbers are the most important information in diagnostic listings.
Some long diagnostic listings subdivide the statement sequence into program units. Program units correspond to divisions of phases that serve explanatory rather than programming functions. Each program unit is preceded by a prologue that provides introductory information about the commands within the unit.
Using Diagnostics
IMS ring diagnostics serve three principal purposes: to confirm faults, to locate faults, and to verify repairs. When IMS software removes a node suspected of being faulty from service, it sometimes employs diagnostics to confirm and to locate the fault. After replacing or repairing equipment indicated as faulty, technicians employ diagnostics manually to verify that the fault has been corrected before returning the node to service. Because conditional restoral requests of ARR always include the TLP option, technicians usually have no need to manually diagnose a node in order to confirm or locate its fault. Instead, they should consult the diagnostic results on the ROP that were generated by ARR's restoral attempt. If, however, a restoral attempt fails for nondiagnostic reasons, technicians will ordinarily need to run diagnostics on the node before performing maintenance on it.
Guide to Critical Ring Maintenance

This document uses the term "critical maintenance" for manual actions undertaken to correct faults and to recover the ring. The faults are of the kind that obstruct the transportation of messages on the ring (ring-related faults) or the kind that prevent the processing or transmission of messages within nodes (node-related faults). As applied to nodes and their components, the principles of critical maintenance are essentially the same for all except the ring application processors (RAPs) of CDN-Is which require unique treatment. Therefore, among the maintenance procedures set forth below, there is a special one for RAPs. Critical maintenance most often occurs with the ring subsystem in operation, however fragmented the total ring might be by out-of-service nodes. Occasionally, however, critical maintenance is required when, because of ring conditions, the ring subsystem fails and cannot be recovered by automatic means. This state, known as ring down, is also discussed in this chapter and addressed with its own procedure. The section begins with a discussion of the IMS commands technicians will most often employ in performing critical ring maintenance. The discussion is intended to amplify information contained in the IMS Output Manual; it is not to be used as reference material.
IMS Input Messages

IMS input messages allow technicians to practice critical maintenance by manually controlling various maintenance functions associated with the IMS ring (see footnote 13). A descriptive list of frequently-used IMS input messages follows. Where the word NODE appears in the list, substitute RPCN or the user's name for an IUN (LN, for example).
RMV:NODE
Quarantines the specified node. If the command is executed for a node that has been automatically quarantined, the maintenance mode of the node will change to manual, and the node will remain quarantined until it is manually returned to service by a version of the RST:NODE command. Before entering RMV:NODE for an active node with an active external user interface, remove from service the communication link or links that terminate in the node.
DGN:NODE
Executes diagnostic phases on the specified node. If no phases are specified, DGN:NODE attempts to isolate the node and runs all automatic phases on it, with the exceptions described in a and b below.
a. If a node is in the active segment of an isolated ring but is not a BISO or EISO node, DGN:NODE with no phases specified quarantines the node (if it was not already quarantined) and runs all diagnostic phases that do not require the node be isolated.
b. If the node is a BISO or EISO node, DGN:NODE with no phases specified extends the isolation to include the node and runs all automatic phases on it. If, however, the extended isolation would create an active ring that is too short to support message transport, the extension is not allowed and the subsequent action is that described in a. above.
Footnote 13: These commands may conform either to the Program Documentation Standards (PDS), except that terminal exclamation marks are supplied automatically by software, or to the Man-Machine Interface Language (MML). Technicians should select one or the other of these message conventions by setting the RTR ECD spooler flag to PDS or MML. For an explanation of the PDS input-message format, consult 3B21D Computer, UNIX RTR Operating System, Input Message Manual, PDS, "Section 2, User Guidelines." For a complete description of PDS, consult the Bell Laboratories Program Documentation Standards Reference Manual. For an explanation of the MML input-message format, consult 3B21D Computer, UNIX RTR Operating System, Input Message Manual, MML, "Section 2, User Guidelines." For a complete description of MML, consult the CCITT MML Recommendations (Z.301-Z.341), which are available from OMNICOM, Inc., Vienna, Virginia. To set the spooler flag, see the layout for the ECD splrinfo form in the RTR Operating System, Recent Change and Verify Manual for the 3B21D Computer.
If any phases below 40 are specified, DGN:NODE behaves as above except that it attempts to run only the specified phases. If only phases above 39 are specified, DGN:NODE runs the phases on the node after quarantining it (if it was not already quarantined). If a node was active or quarantined prior to the request for diagnostics, DGN:NODE attempts to quarantine it after diagnostics have completed. If a node was in another state, DGN:NODE leaves the node in the state in which it found it, provided that diagnostic results do not require a different state. (Technicians would ordinarily return a quarantined node that had passed diagnostics to service by unconditionally restoring it.) Before entering DGN:NODE for an active node with an active external user interface, remove from service the communication link or links that terminate in the node.
RST:NODE
Entered unconditionally for an out-of-service node that is not sandwiched in isolation between nodes with faulty ring interfaces, unisolates and/or unquarantines the node (thus placing it in the active ring), downloads operational code into it, places the code in execution, then changes the major state of the node to active. If the node is sandwiched in isolation, RST:NODE entered unconditionally leaves the node isolated, while placing it under ARR control so that it will be automatically restored when ring conditions permit.
Entered conditionally, RST:NODE completes the same actions as DGN:NODE with no phases specified, then restores the node, provided that it passes diagnostics and is not sandwiched in isolation. If it is sandwiched in isolation, RST:NODE leaves it isolated while placing it under ARR control so that it will be automatically restored when ring conditions permit. If a node fails diagnostics, RST:NODE leaves it isolated, if its ring-interface state is FLTY, or quarantines it, if its ring-interface state is USBL or QUSBL and it is not sandwiched in an isolation.
If the RST:NODE command is followed by a resource failure that prevents downloading or executing code, a REPT IUN RST output message with failure code 43 will appear on the ROP. When this occurs, technicians should wait a few minutes and try the restoral again. Before entering RST:NODE conditionally for an active node with an active external user interface, remove from service the communication link or links that terminate in the node.
After entering RST:NODE for a node whose communication link has been manually removed from service, it may be necessary to manually return the communication link to service.
OP:RING
Produces an OP RING output message concerning the status or generic identity of specified nodes, groups of nodes, or of the ring.
CFR:RING
1. Isolates or attempts to end the isolation of specified nodes, or
2. Initializes the ring if it is down.
Because the DGN and RST commands provide automatic isolation and unisolation of nodes under most conditions, this command is rarely used. The command is intended primarily for use in the first sense when growing and degrowing nodes and in the second sense when a new ring is being installed under Manual Ring Mode, which is explained below. In daily operations, the first version of the command might be used with the exclude option to isolate a node whose ring-interface state is quarantine-usable prior to changing the ring-interface or IRN circuit pack. With the MOVFLT option, the first version of the command can be used to shift an isolation on a ring that is too small for the isolation to be extended. Before the Exclude version of the CFR command is entered for an active node, the node must be removed from service with the RMV:NODE command.
Tables providing brief descriptions of commonly used versions of IMS output messages appear in Chapter 5, Ring Critical Events.
Critical Maintenance Procedures for Nodes

Because of the automatic actions of IMS maintenance software, technicians ordinarily perform critical maintenance on nodes that ARR has attempted unsuccessfully to restore. Most restoral attempts that fail do so because of diagnostic failure. A few fail either because the attempt timed out waiting for a reply from MIRA or because a recurrent error condition caused a node to violate the fourth-time rule, which prevents ARR from restoring the same node for a fourth time within a 60-minute interval. When any restoral attempt fails, ARR announces the event with a version of the REPT ARR AUTORST message on the ROP and changes the maintenance mode of the node to manual, thereby directing technicians to perform maintenance on it.
This section contains three procedures for clearing faults in individual nodes and three procedures for dissolving isolations. Of the procedures for clearing faults, one is to be used when ARR has failed to restore a node, one when critical maintenance is manually initiated, and one when, these procedures failing to clear a problem, it becomes necessary to consult diagnostic listings. The information provided by these three procedures is entirely sufficient for the maintenance of nodes that are quarantined. Maintenance of isolated nodes, however, involves these issues and others as well. The section ends with procedures for dissolving isolations. One is concerned with single-node isolations; one is concerned with multiple-node isolations; and one, to be used in conjunction with the other two, is concerned with the problems associated with a fault in a BISO or EISO node.
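The fourth-time rule mentioned above is, in effect, a per-node rate limit. The sketch below shows one way such a limit could be modeled. It is an assumption-laden illustration, not a description of ARR internals: the class name, the use of a sliding 60-minute window (rather than fixed intervals), and the handling of the exact 60-minute boundary are all choices made for the example.

```python
# Illustrative per-node limiter modeled on the fourth-time rule.
from collections import defaultdict, deque


class RestoreLimiter:
    WINDOW_MINUTES = 60  # interval named by the rule
    MAX_RESTORES = 3     # a fourth restore inside the window is refused

    def __init__(self):
        # node name -> times (in minutes) of restores still in the window
        self._history = defaultdict(deque)

    def may_restore(self, node, now_minutes):
        """Record and allow a restore, or refuse it under the rule."""
        history = self._history[node]
        # Forget restores that have aged out of the window (assumed sliding).
        while history and now_minutes - history[0] >= self.WINDOW_MINUTES:
            history.popleft()
        if len(history) >= self.MAX_RESTORES:
            return False
        history.append(now_minutes)
        return True


limiter = RestoreLimiter()
for minute in (0, 10, 20, 30):
    print(minute, limiter.may_restore("LN00 03", minute))
```

In this model a refused attempt is not recorded, so the node becomes eligible again as soon as the oldest of its three recorded restores leaves the window.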
Procedure 3-2. Clearing Faults in Response to ARR Action

ARR turns a faulty node over to technicians isolated when diagnostics or error messages indicate a ring-interface problem that prevents the node from propagating messages on the ring. Otherwise, it turns a faulty node over to technicians quarantined. Thus technicians sometimes do and sometimes do not receive a node from ARR in the proper state for replacing the circuit packs that diagnostics have indicated as possibly faulty. Quarantined nodes with ring-interface problems (ring interface QUSBL) and IRN nodes with node processor problems are turned over to technicians quarantined yet must be isolated before their ring-interface circuit packs are replaced. Nodes requiring backplane repairs must also be isolated.
IMS circuit packs are designed to be replaced while the power supply to the node is on.
1. Learn of the failure of an ARR restoral attempt from a REPT ARR AUTORST RST RQST FOR a FAILED output message, where a is the node that failed. Confirm with the OP:RING command or from the 1106 display page that the failed node is in the manual mode.
2. Note the failing phases and tests from the DGN output message.
3. From the information concerning failing phases, compose a list of suspect circuit packs using the ANALY TLPFILE output message, and obtain from the supply of spare circuit packs one of each pack on your list. Observing the circuit pack LEDs, ensure that the node containing the listed pack or packs is in the proper state for having the pack(s) replaced.
The following table describes the various LED indications. Nodes should be isolated before having any part of their backplanes repaired.
Table 3-6. Circuit Pack LED States
  Node Type   Circuit-Pack Type   State                                                      Indication
  any         auxiliary           quarantined or isolated                                    RQ LED red
  VLSI        IRN                 isolated                                                   NT LED red
  any         IFB                 isolate the adjacent node in the same unit as the IFB CP   NT LED red
NOTE: Before pulling any circuit pack in units not equipped with a connector assembly, isolate all nodes serviced by the power supply associated with the connector assembly. In 3-node units, the connector assembly is located at the rear of the backplane at the RI 1 position in the two external nodes and is associated with the nearest power supply. In two-node units, the connector assembly is located at the rear of the backplane at the RI 1 position in both nodes and is associated with the nearest power supply. In eight-node units, the connector assembly is located at the back of each power supply and is associated with that power supply.
4. Replace the first circuit pack on the list, then proceed as follows:
- If you replaced a ring-interface, a node-processor, or an IRN circuit pack in any node-type other than an RPCN, restore the node conditionally with the RST:NODEa,b command.
- If you replaced any circuit pack in an RPCN other than the DDSBS circuit pack, restore the node conditionally with the RST:RPCNa,b command.
- If you replaced the DDSBS circuit pack of an RPCN, first run all automatic diagnostic phases with the DGN:RPCN command. If the automatic phases pass, next run optional diagnostic phase 14 with the command DGN:RPCNa,b:PH 14,CU c where c is 0 or 1, indicating the off-line control unit of the 3B21D. If the DDSBS circuit pack passed both optional and automatic diagnostic phases, restore the node to service unconditionally using the RST:RPCNa,b;UCL command.
- If you replaced an auxiliary circuit pack of any node other than an RPCN or CDN-I, enter the command DGN:NODEa,b:PHc where c is the range of phases that test the circuit pack you replaced. If the unit passes all specified diagnostic phases, restore the node unconditionally with the RST:NODEa,b;UCL command.
- If you replaced the DDSBS circuit pack of a DLN, first run all automatic diagnostic phases with the DGN:NODEa,b command. If the automatic phases pass, next run optional diagnostic phase 34 with the command DGN:NODEa,b:PH 34,CU c where c is 0 or 1, indicating the off-line control unit. If the DDSBS circuit pack passed both optional and automatic diagnostic phases, restore the node to service unconditionally using the RST:NODEa,b;UCL command.
- Consult the section "Ring Application Processor Critical Maintenance Procedure" for instructions on diagnosing and changing auxiliary circuit packs on a CDN-I.
- If to replace an interframe buffer you isolated an RPCN, restore the node conditionally with the RST:RPCNa,b command.
- If to replace an interframe buffer you isolated any other node-type, run diagnostic phases 1 through 13 with the DGN:NODEa,b:PH 1-13 command and, if the phases pass, restore the node unconditionally.
- If you permanently removed an interframe buffer or substituted a buffer with different capacity, change the ECD HV field to reflect the change before restoring the node.
5. If the list of suspect circuit packs contained more than one entry and the node failed to pass diagnostics after the first listed pack was replaced, reinstall the original pack, replace the next pack on the list, then repeat the applicable portion of steps 4 and 5 above. Continue in this fashion until either the node passes the specified diagnostic tests or all circuit packs on the list have been replaced and tested. (If the node you are troubleshooting is critically important or contributing to a multiple isolation, you may wish to replace simultaneously all its circuit packs and then, at another time, reinstall the original packs and test them individually to determine which pack was at fault.)
6. If you replaced all circuit packs without the node passing diagnostics, visually inspect the node and its housing. Look for unseated circuit packs, backplane damage, poor grounding connections, and unseated cable connections. Before repairing the backplane, isolate the node.
7. If the backplane is not at fault, consult the sections below on isolations and troubleshooting.
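The choices in step 4 of this procedure amount to a decision table keyed on node-type and on the kind of circuit pack replaced. The sketch below encodes that table as a lookup. It is an illustration only: the function name and the pack labels ("ring-interface", "auxiliary", "IFB", and so on) are invented for the example, and the command strings are returned as the templates printed in the procedure, with a, b, and c left for the technician to fill in.

```python
# Illustrative lookup of the follow-up commands from step 4 (not IMS software).
def follow_up_commands(node_type, pack):
    """Return the command templates to enter, in order, after replacing `pack`.

    Each later command applies only if the one before it passes.
    """
    if node_type == "RPCN":
        if pack == "DDSBS":
            return ["DGN:RPCN", "DGN:RPCNa,b:PH 14,CU c", "RST:RPCNa,b;UCL"]
        # Any other RPCN pack, including an interframe buffer replacement.
        return ["RST:RPCNa,b"]
    if pack == "DDSBS" and node_type.startswith("DLN"):
        return ["DGN:NODEa,b", "DGN:NODEa,b:PH 34,CU c", "RST:NODEa,b;UCL"]
    if pack in ("ring-interface", "node-processor", "IRN"):
        return ["RST:NODEa,b"]
    if pack == "auxiliary":
        if node_type == "CDN-I":
            raise ValueError("CDN-I auxiliary packs: see the RAP procedure")
        return ["DGN:NODEa,b:PHc", "RST:NODEa,b;UCL"]
    if pack == "IFB":
        return ["DGN:NODEa,b:PH 1-13", "RST:NODEa,b;UCL"]
    raise ValueError(f"no rule for {pack!r} in node-type {node_type!r}")


print(follow_up_commands("RPCN", "DDSBS"))
print(follow_up_commands("LN", "ring-interface"))
```

The RPCN branch is tested first because the procedure treats every RPCN pack other than the DDSBS the same way, regardless of the pack's kind.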
Procedure 3-3. Manually Initiated Maintenance of Nodes

In general, technicians should avoid manual intervention of any kind while EAR is attempting to recover the ring and should avoid manually intervening with a node that ARR is attempting to restore.
IMS circuit packs are designed to be replaced while the power supply to the node is on.
1. Before entering an RMV, DGN, conditional RST, or CFR:RING,NODExx yy;EXCLUDE command for an active node with an active external user interface, remove from service the communication link or links that terminate in the node. After entering an RST command for a node whose communication link was manually removed from service, it may be necessary to manually return the communication link to service.
2. Before manually initiating maintenance on a circuit pack or interframe buffer, remove the resident or associated node from service. See Table 3-6. Before replacing a power supply circuit pack in a 3-node unit, isolate the two nodes adjacent to the power supply. In a 2-node unit, isolate the node adjacent to the power supply. In an 8-node unit, isolate the four nodes adjacent to the power supply. In a 5-node unit, learn from the unit horizontal designation strip next to the power supply in question the nodes serviced by the power supply, and isolate either three or two nodes. Nodes should be isolated before having any part of their backplanes repaired.
3. To quarantine a node, remove it from service with the RMV:NODEa b command. This action has the effect of changing the maintenance mode of the node to manual, thus preventing ARR from attempting to restore it.
4. To isolate a node, first remove it from service with the RMV:NODEa b command, and then isolate it with the CFR:RING,NODExx yy;EXCLUDE command. This also has the effect of changing the maintenance mode to manual.
5. If a quarantined or isolated node has not had a circuit pack replaced or reset, it may be restored to service unconditionally.
6. If an isolated node has not had a circuit pack replaced but has been powered down or had a circuit pack reset, run diagnostic phases 1 and 2 on it with the DGN:NODEa,b:PH 1-2 command. If it passes, it may be restored to service unconditionally.
7. If a node has had a circuit pack replaced, observe the guidelines set forth in the fifth step of the procedure ``Clearing Faults in Response to ARR Actions.''
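The restoral rules in steps 5 through 7 can be read as a small decision table. The sketch below is one possible reading, not part of the procedure: the names `Action` and `next_action` are hypothetical, step 6 is checked before step 5 on the assumption that the more specific rule wins, and the case of a quarantined node with a reset pack is reported as not covered because the steps do not address it.

```python
from enum import Enum


class Action(Enum):
    RESTORE_UNCONDITIONALLY = "restore the node unconditionally"
    DIAGNOSE_PH_1_2 = "run DGN phases 1-2; if they pass, restore unconditionally"
    FOLLOW_ARR_PROCEDURE = "follow the pack-replacement guidelines"
    NOT_COVERED = "case not addressed by steps 5-7"


def next_action(isolated: bool, pack_replaced: bool,
                pack_reset: bool, powered_down: bool) -> Action:
    """Map the state of a quarantined (isolated=False) or isolated node
    to the action suggested by steps 5-7 (an interpretation, see lead-in)."""
    if pack_replaced:
        return Action.FOLLOW_ARR_PROCEDURE      # step 7
    if isolated and (pack_reset or powered_down):
        return Action.DIAGNOSE_PH_1_2           # step 6
    if not pack_reset:
        return Action.RESTORE_UNCONDITIONALLY   # step 5
    return Action.NOT_COVERED                   # quarantined node, pack reset
```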
Procedure 3-4. Using Diagnostic Listings

If the information provided by ROP output messages fails to identify faulty equipment, further scrutiny of the diagnostic results is possible using diagnostic listings as explained below:
1. Note the failing phase and failing tests in the DGN output message.
2. Obtain the diagnostic listing(s) for the phase(s) that failed.
3. Read the prologue(s) to the failing phase(s) and, if one exists, the prologue to the program unit in which failing tests appear. Pay particular attention to any troubleshooting hints.
4. Read the individual comments on statements that contain failed tests.
5. If this information does not provide guidance on how to clear the fault, consult the ``Recognizing and Finding Intermittent Faults'' and the ``Other Suggestions for Troubleshooting'' sections below for possible solutions.
6. If these sections provide no leads, seek assistance from the CTS.
Critical Maintenance Procedures for Nodes in Isolation

Under circumstances described previously in this document, EAR may respond to conditions on the ring by creating an isolated segment that ARR cannot dissolve. In these cases, dissolving the isolation becomes the responsibility of technicians. Generally, technicians should respond promptly to an isolation, since even a singly-isolated node creates the potential of a massive isolation, in the event that another node must also be isolated. Dissolving isolations sometimes requires that they be extended to include the BISO or EISO node. There are two reasons why this may need to be done. The first involves the ambiguity IMS experiences in detecting certain types of ring-related faults. The second involves the way in which diagnostic code is transmitted into an isolated segment. The second can be stated simply. Messages, including messages containing diagnostic code, are sent from the 3B21D to an isolated segment of the ring through the BISO or the EISO node. BISO and EISO nodes have one RAC participating in the active-ring segment and one RAC participating in the
isolated-ring segment. Messages destined for the isolated segment are read from the active ring by the active-ring RAC, then transmitted by the node processor to the isolated-ring RAC, which writes them to the isolated segment of the ring. A fault in the isolated-ring RAC of either the BISO or the EISO node might go undetected, since it would not affect the transportation of messages on the active ring and could show up misleadingly as a diagnostic failure in the isolated node. Therefore, technicians who find that they cannot clear a fault that appears to reside in the isolated node should extend the isolation to include the current BISO and EISO nodes and run diagnostics again.
Low-Phase Ambiguity
The other reason for extending isolations concerns the ambiguity that IMS experiences in detecting certain ring-related faults. Faults that prevent the propagation of messages on the ring usually produce phase-1 and phase-2 diagnostic failures. In the case of such failures, IMS often has the problem of being unable to decide in which of two adjacent RACs a fault resides. Because this problem is associated entirely with the parts of node hardware tested by diagnostic phases 1 and 2, this document calls it ``low-phase ambiguity.'' Low-phase ambiguity does not usually result in the isolation of two nodes because, while one suspect RAC is isolated, the other suspect RAC may be included in the isolated segment as the isolated RAC of the BISO or EISO node. The following figure illustrates the ring structure that permits this practice:
[Figure 3-3. Isolated RACs of BISO and EISO Nodes. Labels: BISO Node Ring Interface, Isolated Node Ring Interface, EISO Node Ring Interface; RAC 0 and RAC 1 in each ring interface shown.]
Notice that either RAC 1 of the BISO node or RAC 0 of the EISO node could be included in the isolated segment as a suspect RAC. IMS has difficulty acknowledging by customary means the fact that it has included possibly faulty RACs in BISO or EISO nodes. A BISO or EISO node, being in the active ring, cannot have its ring interface marked faulty. Therefore, if a RAC of such a node is suspect, this fact will not be indicated in the minor state of the node nor in the TLP information. It will, however, be reflected in tests 5 and 10 of the ROP failure data for diagnostic phases 1 or 2, provided that the RAW option of the
DGN command has been specified. (ARR does not specify the RAW option, so the automatically output DGN failure data does not contain this information in full. It does, however, contain failing test 5, which is a sure indication that low-phase ambiguity exists.) The maintenance principle dictated by low-phase ambiguity is represented in the following procedure:
Procedure 3-5. Determining the Nodes Involved in Low-Phase Ambiguity

1. After attempting to clear a fault in an isolated node that has failed test 5 of diagnostic phases 1 or 2, run verification diagnostics on the node with the RAW option using the command DGN:NODEa,b;RAW, where NODEa,b is the isolated node.
2. If the node passes all diagnostic phases, restore it to service unconditionally.
3. If the node still fails phases 1 or 2, consult the output message generated by the DGN command with the RAW option, and determine whether it is the BISO or EISO node that is suspected of being faulty. This is an example of an output message when the RAW option of the DGN command has been specified:

    DGN LN32 1 PH 1 STF (14 X'00000000 x'00000000)
    TEST  MISMATCH     ACTUAL MASK EXPECTED
    001   X'00010000   N/A
    004   X'FF012242   N/A
    005   X'00000E01   N/A
    006   X'00000044   N/A
    007   X'0000002E   N/A
    008   X'00000E00   N/A
    009   X'00000E04   N/A
    010   X'00000E02   N/A
    011   X'FF012242   N/A

Ignore everything except the mismatch data for tests 005 and 010. If either test 005 or test 010 appears in the DGN output message, the other will appear also, provided that the RAW option to the DGN command has been specified. These tests will always identify two nodes as possibly faulty.
4. Using the physical node-address table in the reference chapter of this document, translate the hexadecimal mismatch data for test numbers 005 and 010 into the node names of two nodes. For example, in the above DGN output message, 00000E01 translates into IUN32 1 and 00000E02 translates into IUN32 2. These are the nodes suspected by IMS of being faulty. In the case of single-node isolations, one of the suspect nodes will be the isolated node and the other will be the BISO or EISO node, the suspect component of which will be RAC 1 of the former or RAC 0 of the latter.
5. When one suspect node is an EISO or BISO node, manually remove its communication link (if it has an active one) from service, then remove the node from service with the RMV:NODEa b command, thus extending the isolation to include the suspect node in the isolated segment.
6. Perform maintenance on the newly isolated node.
Low-phase ambiguity has bearing on the procedures for treating single- and multiple-node isolations. The procedures concerning isolations that follow are merely recommended. When circumstances, reason, or user practices dictate acting differently, do so. The procedures are not self-sufficient but build upon the three procedures discussed above for clearing faults in nodes. The order of battle in these procedures is this: first perform maintenance on suspect nodes within the isolated segment. If this fails to dissolve the isolation, next check to see if the isolated RAC of an EISO or BISO node is suspected of being faulty. If so, perform maintenance on it after including it in the isolation.
Finally, if no isolated RAC in the EISO or BISO node is suspected of being faulty, extend the isolation to include the BISO and EISO nodes, one at a time, and run diagnostics again on the chance that a fault in one of their isolated RACs is being misread by diagnostic code.
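The extraction of the test 005 and test 010 mismatch data in Procedure 3-5 can be sketched in a few lines. This is an illustration under stated assumptions: the sample text assumes the ROP output lists one test per row, the function names are hypothetical, and `EXAMPLE_ADDRESSES` contains only the two translations quoted in the procedure, not the physical node-address table from the reference chapter.

```python
import re
from typing import Dict, List

# Sample DGN RAW output, assuming one test per row (layout is an assumption).
SAMPLE = """\
DGN LN32 1 PH 1 STF (14 X'00000000 x'00000000)
TEST MISMATCH ACTUAL MASK EXPECTED
001 X'00010000 N/A
004 X'FF012242 N/A
005 X'00000E01 N/A
006 X'00000044 N/A
007 X'0000002E N/A
008 X'00000E00 N/A
009 X'00000E04 N/A
010 X'00000E02 N/A
011 X'FF012242 N/A
"""

# Only the two translations given in the text; the full table is not shown here.
EXAMPLE_ADDRESSES = {"00000E01": "IUN32 1", "00000E02": "IUN32 2"}

# A data row: three-digit test number followed by X'hhhhhhhh mismatch data.
ROW = re.compile(r"^\s*(\d{3})\s+X'([0-9A-Fa-f]{8})\b")


def mismatch_by_test(text: str) -> Dict[int, str]:
    """Collect {test number: mismatch hex string} from the data rows."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            result[int(match.group(1))] = match.group(2).upper()
    return result


def suspect_nodes(text: str, table: Dict[str, str]) -> List[str]:
    """Translate the mismatch data of tests 5 and 10 into node names."""
    data = mismatch_by_test(text)
    return [table.get(data[t], "unknown address " + data[t])
            for t in (5, 10) if t in data]
```

With the sample above, `suspect_nodes(SAMPLE, EXAMPLE_ADDRESSES)` yields the two node names quoted in step 4; all other rows are ignored, as the procedure directs.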
Guideline to Single-Node Isolations
Procedure 3-6. Responding to Single-Node Isolations

1. Recognize the existence of an isolated segment from output messages or from information on 1105 or 1106 display pages. In some cases technicians will themselves create an isolation, as for example when ARR turns over to technicians a quarantined node that must be isolated before manual maintenance can be performed on it. 2. If you are on-site, confirm that the node is isolated by checking its NT LED. 3. Follow the appropriate procedure for the isolated node from the procedures listed below:
- Clearing Faults in Response to ARR Actions
- Manually Initiated Maintenance of Nodes
If test 5 of a phase-1 or phase-2 failure is indicated, verify your repair using the DGN command with the RAW option specified. If the isolated node still fails diagnostics, the output will then show whether the isolated RAC of the BISO or EISO node is also suspected by IMS of being faulty.
[Diagram, in ring order: BISO Node, Isolated Node, EISO Node]
4. If the procedure that you employed on the isolated node in step 3 failed to end the isolation and test 5 and test 10 of a phase-1 and/or phase-2 failure are indicated, extend the isolation to include the BISO or EISO node identified by the mismatch data for test 10. Use the command RMV:NODEa, b, where NODE is the node name of the node identified by test 10 mismatch data. On small rings you may have to shift, rather than extend, the isolation by employing the MOVFLT option of the CFR:RING command. (If the BISO or EISO node has an active communication link, remove the link from service before removing the node.)
5. Follow the procedure ``Clearing Faults in Response to ARR Actions'' for the newly isolated node.
6. If:
a. the procedure that you employed on the isolated node in 3 failed to end the isolation b. and test 5 of a phase-1 and/or phase-2 failure is not indicated, extend the isolation to include the BISO node with the command RMV:NODEa, b, where NODE is the BISO node. On small rings you may have to shift, rather than extend, the isolation by employing the MOVFLT option of the CFR:RING command. (If the BISO node has an active communication link, remove the link from service before removing the node.)
[Diagram, in ring order: BISO Node, Former BISO Node, Originally Isolated Node, EISO Node]
7. With the former BISO node now in the isolated segment, again diagnose the originally isolated node. 8. If the originally isolated node now passes diagnostics, a. diagnose the former BISO node and, if it fails, perform maintenance on it following the TLP instructions b. but if it passes, change its ring-interface and node-processor circuit pack(s), then conditionally restore it to service.
- If the former BISO node now enters the active ring (thereby dissolving the isolation), unconditionally restore the originally isolated node (which should now have become quarantined) to service, and end this procedure.
9. But if the originally isolated node still fails diagnostics after the former BISO node has been included in the isolated segment, reduce the isolation by unconditionally restoring the former BISO node, thereby making it once again the BISO node. (You may have to manually return its communication link to service.) 10. Extend the isolation in the other direction to include the EISO node, and treat the former EISO node as you did the former BISO node above.
[Diagram, in ring order: BISO Node, Originally Isolated Node, Former BISO Node, EISO Node]
11. If the originally isolated node still fails diagnostics after the isolation has been extended in both directions, or if the isolation repeatedly dissolves and returns, attempt any appropriate procedures described in the section below on troubleshooting. Then, if the isolation still persists, call the CTS.
Guideline to Multiple-Node Isolations

Isolations of more than two nodes will often contain innocent victims, that is, nodes that are included in the isolation, not because they are faulty, but because they reside between faulty nodes. The ring interfaces and node processors of such nodes will be classified as usable. Unless technicians manually remove innocent victim nodes from service, they will remain in automatic maintenance mode, and ARR will automatically return them to service when the isolation is dissolved.
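The notion of an innocent victim can be made concrete with a short sketch. It assumes the isolated segment is given as a list in ring order with a faulty flag per node; the function name and the node names in the usage note are illustrative only.

```python
from typing import List, Tuple


def innocent_victims(segment: List[Tuple[str, bool]]) -> List[str]:
    """Return the non-faulty nodes lying between the outermost faulty nodes.

    segment lists (node_name, is_faulty) pairs for the isolated nodes,
    in ring order from the BISO side to the EISO side.
    """
    faulty_positions = [i for i, (_, faulty) in enumerate(segment) if faulty]
    if len(faulty_positions) < 2:
        return []  # nothing can sit between fewer than two faulty nodes
    first, last = faulty_positions[0], faulty_positions[-1]
    return [name for name, faulty in segment[first + 1:last] if not faulty]
```

For example, a segment of `[("IUN32 1", True), ("IUN32 2", False), ("IUN32 3", True)]` gives `["IUN32 2"]`: the middle node is usable but trapped between two faulty neighbors.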
Procedure 3-7. Responding to Multiple-Node Isolations

1. Recognize the existence and extent of an isolated segment from output messages or from information on 1105 or 1106 display pages. 2. Identify from DGN output messages the nodes within the isolation regarded by IMS software as faulty. In nearly all cases the faulty nodes should be the isolated nodes next to the BISO and EISO nodes. If an interior node is also indicated faulty, ignore it until partial success in this procedure transforms it into a node next to an EISO or BISO node.
[Diagram, in ring order: BISO Node, Isolated Node next to the BISO Node, Innocent Victim Node, Isolated Node next to the EISO Node, EISO Node]
3. If you are on-site, confirm that the nodes in question are indeed isolated by checking their NT LEDs. 4. Choose to begin working on either the isolated node next to the BISO node or the isolated node next to the EISO node. Base your choice on the following considerations in the order shown: a. If diagnostic failure data is given for only one of the two nodes, begin with the node for which you have failure data.
b. If failure data is given for both nodes, begin at the end of the isolation that includes the nodes most important to your operation. 5. For the node you have chosen, follow the procedure ``Clearing Faults in Response to ARR Actions.'' If test 5 of a phase-1 or phase-2 failure is indicated for this node, verify your repair of the node using the DGN command with the RAW option specified, thereby learning when the isolated node still fails diagnostics if the isolated RAC of the adjacent BISO or EISO node is also suspected by IMS of being faulty. 6. If the procedure clears the fault of the isolated node next to the BISO or EISO node, the ring should now contain only a singly-isolated node, since both the repaired node and the innocent victim nodes will have returned to the active ring. (An exception to this statement occurs when the isolated segment contains three faulty nodes. In this case, restoring one of the external faulty nodes will result in a smaller multiple isolation. If this occurs, return to the beginning of this procedure and repeat the steps up to here, then continue on.) Treat the singly-isolated node according to the procedure for ``Responding to Single-Node Isolations,'' and end this procedure. 7. If, however, the procedure that you employed failed to reduce the isolation and test 5 and test 10 of a phase-1 and/or phase-2 diagnostic failure are indicated, extend the isolation to include the BISO or EISO node identified by the mismatch data for test 10. Use the command RMV:NODEa, b, where NODE is the name of the node identified by test 10 mismatch data. On small rings you may have to shift, rather than extend, the isolation by employing the MOVFLT option of the CFR:RING command. (If the BISO or EISO node has an active communication link, remove the link from service before removing the node.) 8. Follow for the newly isolated node the procedure ``Clearing Faults in Response to ARR Actions.'' 9. 
If the procedure clears the fault of the newly isolated node, the ring should now contain only a singly isolated node, since the repaired node, the isolated node next to the original BISO or EISO node, and the innocent victim nodes will have returned to the active ring. (An exception to this statement occurs when the isolated segment contains three faulty nodes. In this case, restoring one of the external faulty nodes will result in a smaller multiple isolation. If this occurs, return to the beginning of this procedure and repeat the steps.) Treat the singly-isolated node according to the procedure for ``Responding to Single-Node Isolations,'' and end this procedure. 10. If the previous step of this procedure fails to reduce the isolation or test 5 and test 10 of a phase-1 and/or phase-2 diagnostic failure were not indicated after failure in Step 5 above, go to the other end of the isolated segment and repeat Steps 5 through 9 there.
11. If these steps fail to reduce the isolation, extend the isolation to include either the EISO or BISO node (if one has already been extended, choose the other; if neither has been extended, choose either) with the command RMV:NODEa, b, where NODE is the EISO or BISO node. (If the EISO or BISO node has an active communication link, remove the link from service before removing the node.)
12. With the former EISO or BISO node now in the isolated segment, diagnose the isolated node next to the former EISO or BISO node. If that isolated node now passes diagnostics, change the ring-interface and node-processor circuit pack(s) of the former EISO or BISO node, then conditionally restore the former EISO or BISO node to service.
[Diagram, in ring order: BISO Node, Isolated Node next to the BISO Node, Isolated Node next to the Former EISO Node, Former EISO Node, EISO Node]
[Diagram, in ring order: BISO Node, Former BISO Node, Isolated Node next to the Former BISO Node, Isolated Node next to the EISO Node, EISO Node]
13. If the former EISO or BISO node enters the active ring (thereby reducing the isolation), treat the remaining isolation according to the procedure for single-node isolations. 14. If, however, the isolated node next to the former EISO or BISO node still fails diagnostics, unconditionally restore the former EISO or BISO node to the active ring. (If you manually removed its communication link from service, you may have to manually return it to service.) Then extend the isolation at the other end of the isolated segment (unless you have done so previously), and treat that end in the same way you have treated this end. 15. If both originally faulty nodes still fail diagnostics after the isolation has been extended in both directions, or if the isolation returns after nodes have been restored, follow any appropriate procedures described below in the section on troubleshooting. Then if the problem still persists, call the CTS.
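The choice made in Step 4 of the procedure above (which end of a multiple-node isolation to work on first) can be sketched as follows. The function and parameter names are hypothetical, and the case in which neither end node has failure data is returned as `None` because the procedure does not address it.

```python
from typing import Optional


def choose_starting_end(biso_side_has_failure_data: bool,
                        eiso_side_has_failure_data: bool,
                        more_important_end: str) -> Optional[str]:
    """Return "BISO" or "EISO" for the end of the isolation to start at.

    more_important_end names the end whose nodes matter most to the
    operation; it is a judgment supplied by the technician.
    """
    if biso_side_has_failure_data and not eiso_side_has_failure_data:
        return "BISO"                 # Step 4a: only one node has failure data
    if eiso_side_has_failure_data and not biso_side_has_failure_data:
        return "EISO"                 # Step 4a
    if biso_side_has_failure_data and eiso_side_has_failure_data:
        return more_important_end     # Step 4b: both have failure data
    return None                       # neither has data: not covered by Step 4
```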
Responding to Ring Down

IMS in the 3B21D and IMS in the ring are independent of one another to the extent that either can fail while the other remains in operation. This section is concerned with the problems that confront technicians when the ring subsystem fails because of ring conditions and cannot be recovered by automatic means. The ring subsystem will fail when the 3B21D cannot communicate with the active ring through any RPCN. This condition is most likely to occur in a two-RPCN environment when both RPCNs fail or when the active RPCN fails after the other RPCN had been manually removed from service. In a multiple-RPCN environment, the condition is most likely to occur because of a condition in the 3B21D that would simultaneously disable all RPCNs. The ring subsystem will also fail if the data length of the active ring becomes shorter than the maximum message length for which the system was engineered. Small rings are susceptible to this problem. The problem is brought about by the ring fragmentation associated with an isolation. An isolation that includes padded interframe buffers may shorten the active ring severely. Padded interframe buffers are redundantly employed in pairs at opposite sides of the ring. Thus a single-node isolation would not usually include both pairs. Still, interframe buffers exist under a kind of quadruple jeopardy, because if either member of a pair fails, the pair fails and must be isolated, and because a pair must also be isolated if either of the nodes adjacent to it fails. Thus while it is unlikely that both pairs will become isolated, they have. Finally, a ring may go down and stay down because of an intermittent fault that confuses initialization tests, or a ring may repeatedly go down because of a fault that is transparent during initialization tests but not during normal operations. The following procedure for recovering a ring that is down is intended as an instructional paradigm only. 
Technicians should freely depart from it as circumstances, reason, or user practices suggest. In particular, technicians should not manually intervene until they are certain that IMS software has exhausted all its efforts to recover a down ring. Such recovery efforts are ordinarily directed by user software. Therefore, technicians should consult user documentation to learn how to know when automatic recovery efforts have ended.
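The two ring-down conditions described in this section (no RPCN through which the 3B21D can reach the active ring, or an active ring whose data length is shorter than the engineered maximum message length) can be summarized in a single predicate. This is a sketch only: the function and parameter names are hypothetical, and the unit of length is left unspecified because the text does not give one.

```python
def ring_subsystem_fails(usable_rpcns: int,
                         active_ring_data_length: int,
                         max_message_length: int) -> bool:
    """Return True when either ring-down condition from the text holds.

    usable_rpcns: number of RPCNs through which the 3B21D can still
        communicate with the active ring.
    active_ring_data_length, max_message_length: lengths in the same
        (unspecified) unit.
    """
    no_path_to_ring = usable_rpcns == 0
    ring_too_short = active_ring_data_length < max_message_length
    return no_path_to_ring or ring_too_short
```

On a small ring, an isolation that takes in padded interframe buffers lowers `active_ring_data_length`, which is why such rings are described as susceptible to the second condition.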
Procedure 3-8. Ringdown Response Procedure

1. Following the termination of automatic recovery efforts, immediately attempt to bring the ring up by submitting it to a level-3 and, if that fails, to a level-4 IMS initialization. If it is important to the user that IMS in the 3B21D not abort itself should ring initialization fail, initialize the ring at level 4 using manual ring mode, as explained below. 2. If in response to level-4 initialization the ring fails to come up (as indicated by a REPT RING INIT output message) or to stay up (as indicated by a REPT RING CFR output message), determine the cause of its failure by examining the output messages. The REPT RING INIT messages in question are of two types. One type indicates the reason the ring failed to come up. These reasons include no standby RPC nodes available and no ring segment acceptable for active ring use, with the latter indicating either that no candidate for the active ring-segment contains an RPCN or that no candidate is long enough to satisfy the requirement of minimum length. In the absence of the first message, the second message may be understood to indicate that the problem is length. The second type REPT RING INIT message identifies nodes that tests conducted during initialization have determined to be faulty. 3. If RPCN failure is the apparent cause, replace all circuit packs with known good packs in an RPCN that was not isolated before the ring went down. Then initialize IMS at level 4. If this attempt fails, replace all circuit packs with known good packs in another RPCN. 4. If ring length is the apparent cause, identify faulty nodes by examining the second type REPT RING INIT message. Mentally construct the population and distribution of nodes within the portion of the ring that is likely to become the isolated segment. Ask yourself the following questions:
- Are any nodes adjacent to padded interframe buffers listed as faulty?
- If so, are they all external nodes (adjacent to the BISO or EISO nodes) within the portion of the ring likely to become the isolated segment, or is one of them an internal node within that portion? If not, are they innocent victim nodes within the candidate for the isolated segment?
5. If nodes adjacent to padded interframe buffers are faulty and one of them is likely to be an external node in an isolated segment, replace (if you are in an emergency situation) the ring-interface and node-processor circuit pack(s) on both nodes adjacent to the interframe buffers and replace both interframe buffers. Then initialize the ring at level 4. 6. If nodes adjacent to padded interframe buffers are internal nodes (either faulty or innocent-victim) in the candidate for the isolated segment, approach the problem following the procedure described above for responding to multiple isolations (though of course under ring down conditions you will not be able to conduct diagnostics). Then, if a node adjacent to padded interframe buffers becomes a probable external node in a candidate for the isolated segment, treat it as in 5 above. 7. Study the MOVFLT option of the CFR:RING command. It may be useful in resolving an isolation on a very small ring. 8. If none of the above approaches succeeds in recovering the ring, force faults by unseating various ring circuit packs and initializing at level 4. This is a desperate attempt by trial and error to force an isolation in the hope of getting the ring up. Once the ring is up, diagnostics can be run on the isolated portion.
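The reasoning in Step 2 of the procedure above, which infers the apparent cause of a failed initialization from the reasons reported in REPT RING INIT messages, can be sketched as below. The two reason strings are taken from the wording of Step 2; matching them exactly against real ROP output is an assumption, and the function name is hypothetical.

```python
from typing import Iterable

NO_STANDBY_RPCN = "no standby RPC nodes available"
NO_ACCEPTABLE_SEGMENT = "no ring segment acceptable for active ring use"


def apparent_cause(reasons: Iterable[str]) -> str:
    """Classify the apparent cause of a failed ring initialization."""
    seen = set(reasons)
    if NO_STANDBY_RPCN in seen:
        return "RPCN failure"   # handled in Step 3
    if NO_ACCEPTABLE_SEGMENT in seen:
        # Without the first message, the second is read as a length problem.
        return "ring length"    # handled in Steps 4 through 6
    return "undetermined"
```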
Employing Manual Ring Mode

Manual ring mode allows the ring to be fully initialized without an accompanying initialization of IMS in the 3B21D. Ordinarily, full ring initialization occurs as a stage in level-4 (BOOT) IMS initialization. Under certain circumstances and for certain users, however, the disruption that IMS initialization entails in the operation of the 3B21D may be unacceptable as, for example, when the ring is down or when ring hardware is being retrofitted to a system that has IMS as a subsystem. In these cases, the ring may be initialized manually.
Procedure 3-9. Manual Initialization of the Ring

Before manual initialization, the ring must be down and enough hardware must be in place to satisfy the requirement for minimum ring size. To initialize the ring manually, 1. Consult ``Setting the ECD Flag for Manual Ring Mode'' in Appendix B, Ring Maintenance Reference Material.
2. Set the ECD Manual Ring Mode flag as described in the above reference. IMS is programmed to abort if, during initialization, the ring fails to come up. The ECD manual ring mode flag inhibits this response.
3. If you are employing manual ring mode for a new installation, or if you are experiencing ring down and no RPCNs are in the standby state, restore as many RPCNs as possible. When RPCNs are restored with the ring down, they will be in the STBY, not the ACT, state. This state is expected and sufficient for moving on to Step 4.
4. Enter the command CFR:RING.
5. Expect to receive a form of the REPT RING INIT message indicating that the initialization was or was not successful and a CFR RING COMP message indicating that the program has completed. Forms of the REPT RING FLT message may also appear to identify nodes that failed to participate in the initialization.
6. If the initialization was successful, reset the manual ring mode flag to null.
7. If the initialization was not successful, leave the ECD flag set for manual ring mode and use the information you gained in Step 5 to troubleshoot the ring in the manner described in ``Responding to Ring Down.''
Ring Application Processor Critical Maintenance Procedure

The ring application processors (RAPs) of the CDN-I must be manually diagnosed and maintained using special procedures. Automatically initiated diagnostics of the RAP sometimes produce deceptive results. If RAP firmware is not executing, diagnostics run on RAP circuit packs (phases 42 through 53) will provide erroneous data about phase and circuit pack failures; yet technicians cannot know from ROP output that the data they are receiving is incorrect. They can, however, receive correct data if, during diagnostics, they are present at the RAP housing and observe the RAP LEDs. Each RAP circuit pack is equipped with an LED that turns on to indicate that the pack has failed a diagnostic phase. In addition, each of the LEDs on certain packs turns on when the RAP is initializing and then turns off when initialization tests confirm that the firmware is executing. The LEDs, thus, supply a means by which technicians can observe the progress of RAP diagnostics and of RAP initialization, provided they are present at the RAP housing as these actions occur. And they can be present, because power and diagnostic switches located
on each RAP power control interface and display (PCID) board allow them to control these functions locally. Thus RAP initialization and diagnostics may be run centrally by the host or locally by means of PCID-board switches. A RAP failure will usually be tested initially by central diagnostics at the request of ARR, and ROP output will indicate the phases that failed and the circuit pack(s) suspected of being faulty. The procedure described below for fully diagnosing a RAP fault begins by tentatively accepting the results of the automatic diagnostics and then proceeds to confirm them. (Notice in the procedure the requirement that a CDN be quarantined when its RAP circuit packs are diagnosed or replaced.)
Procedure 3-10. Manually Confirming RAP Diagnostic Results

1. Remove the CDN from service by quarantining it. 2. Turn off RAP power by toggling the top switch on the PCID board. 3. Replace the first circuit pack listed in the TLP. 4. Test as follows to determine that RAP firmware is capable of initializing the RAP: Turn on RAP power, observing the LEDs on the following non-MASA circuit packs.
- The node processor interface (NPI) circuit pack.
- The central controller support (CCS) circuit pack.
- The central controller cache (CCC) circuit pack.
- All equipped main store controller (MASC) circuit packs.
When power is restored, the LED of each pack should come on, go off, come back on, and finally go off; and this sequence of LED blinks should be completed for all packs within [18 + (2 × the number of MASA boards) ± 2] seconds for systems with the 2-Mbyte memory and within [18 + (20 × the number of MASA boards) ± 2] seconds for systems with the 16-Mbyte memory. If an LED fails to come on initially, turn off RAP power, replace the circuit pack, and repeat this step. If any LED fails to follow the full sequence of blinks, or if all LEDs fail to complete the sequence of blinks within the allotted time, go to Step 7 of this procedure.
5. This step manually diagnoses the node. The following information is helpful in understanding it: When diagnostics begin, the LED on each non-MASA circuit pack turns on and stays on until the pack has passed diagnostics. Moreover, diagnostics run on non-MASA packs early-terminate. Therefore, when a non-MASA pack fails
diagnostics, the diagnostic routine ends and the LEDs on the failed pack and on all non-MASA packs that have not yet been diagnosed stay on. MASA LEDs, on the other hand, may or may not come on when diagnostics begin, but they will come on if their circuit packs fail diagnostics. Moreover, MASA diagnostics do not early-terminate. Therefore, it is possible during a single diagnostic routine for a MASA pack to fail and for another pack, perhaps a non-MASA pack further downstream, to fail as well.
   Depress the DIAG switch on the PCID board. All non-MASA LEDs should come on, then go off within 6 minutes for systems with the 2-Mbyte memory and within 4 minutes for systems with the 16-Mbyte memory. (If more than one MASC memory group is present, add 2 minutes and 40 seconds for each additional group.) If any LED fails to come on initially, turn off RAP power, replace the circuit pack, and repeat this step. If any LED fails to go off in the time indicated, turn off RAP power, replace the circuit pack, and repeat this step. If more than one LED fails to go off in the time indicated, turn off RAP power, replace the first circuit pack in the following list whose LED is on, and then repeat this step.
   a. CCS
   b. Memory group 0, that is, MASC_0 and all MASA packs associated with it. (MASC diagnostics depend upon memory from the first memory board, MASA_0, so a fault in one pack may under some circumstances cause the other to fail diagnostics. Therefore, if the situation here or elsewhere indicates that either of these related packs should be replaced but replacing it does not solve the problem, try reinstalling the original pack and replacing the other pack.)
   c. CCC
   d. Each additional equipped memory group in numerical order.
   e. NPI
   If, upon repetition, a replaced circuit pack fails to pass diagnostics, leave RAP power off, quarantine the node, and contact the CTS.
6. If Step 5 succeeded, unconditionally restore the node to service and end this procedure.
7. Systematically search for the fault that is preventing initialization by following Steps 7 through 23. Turn off RAP power. Reinstall the original circuit pack removed in Step 3.
8. Unplug the following circuit packs by opening their latches and pulling them out about one inch:
   - All MASC packs except MASC_0.
   - The NPI pack.
   - All MASA packs in memory group 0 except MASA_0.
9. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, go to Step 24.
10. Turn off RAP power and replace the CCS pack.
11. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, go to Step 24.
12. Turn off RAP power. Reinstall the original CCS pack. Replace the CCC pack.
13. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, go to Step 24.
14. Turn off RAP power. Reinstall the original CCC pack. Replace the MASC pack.
15. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, go to Step 24.
16. Turn off RAP power. Reinstall the original MASC pack. Replace the MASA_0 pack.
17. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, go to Step 24.
18. Measure the voltage at each power converter (PWRB on the main unit and PWRC on the growth unit) from + pin 056 to gnd pin 032. If the voltage is below the +5.1 to +5.3 volt range, turn RAP power off and replace the appropriate converter.
19. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, go to Step 24.
20. Steps 20-23 attempt to identify a problem that is not associated with the failure of a circuit pack.
   a. Turn off RAP power.
   b. Reinstall the original MASA_0 pack.
   c. Check backplane for shorted pins.
   d. Check growth unit cables and bus terminators for proper installation, adjusting as needed.
   e. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, go to Step 24.
21. If the RAP is not equipped with a growth unit, go to Step 23. Otherwise, turn off RAP power and remove the basic-unit ends of the six growth cables, leaving them hanging free. Remove the six terminator resistors from the growth unit and place them in the positions formerly occupied by the basic-unit ends of the six growth cables.
22. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, the problem is in the growth-unit backplane. Go to Step 24.
23. Leave the node quarantined, call the CTS, and end this procedure.
24. Manually diagnose the node as follows:
   a. Depress the PCID DIAG switch.
   b. Check that the CCS, CCC, and MASC_0 LEDs come on.
   c. Check that the CCS LED goes off in 25 to 35 seconds for systems with the 2-Mbyte memory and in 35 to 45 seconds for systems with the 16-Mbyte memory.
   d. Check that the LEDs on the following circuit packs all go off in the order listed within 2 minutes for systems with the 2-Mbyte memory and within 75 seconds for systems with the 16-Mbyte memory.
      1. MASA_0
      2. MASC_0
      3. CCC
      Check that the yellow fail light on the PCID has gone out.
   e. If the LED on any of the four circuit packs fails to go off on time or in the indicated sequence, or if the PCID fail light fails to go off, turn off RAP power, replace the faulted pack, turn on RAP power, and repeat this step. If the repetition is unsuccessful, leave the node quarantined and call the CTS.
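The time limits quoted in Steps 4 and 5 can be worked out in advance for a particular RAP. The following is a minimal sketch for that arithmetic only. It is not part of IMS, the function and argument names are hypothetical, and only the constants are taken from the procedure above.

```python
# Illustrative helpers only. Names are hypothetical; the constants are the
# ones quoted in Steps 4 and 5 of Procedure 3-10.

# Seconds added per MASA board to the Step 4 LED blink sequence,
# keyed by memory size in Mbytes.
SECONDS_PER_MASA_BOARD = {2: 2, 16: 20}

# Step 5 base allowance, in seconds, for non-MASA LEDs to go off.
DIAG_BASE_SECONDS = {2: 6 * 60, 16: 4 * 60}

# Step 5 extra allowance for each MASC memory group beyond the first
# (2 minutes and 40 seconds).
EXTRA_GROUP_SECONDS = 2 * 60 + 40


def init_blink_window(masa_boards, memory_mbytes):
    """Return (earliest, latest) seconds for the Step 4 blink sequence."""
    nominal = 18 + SECONDS_PER_MASA_BOARD[memory_mbytes] * masa_boards
    return nominal - 2, nominal + 2


def diag_allowance_seconds(memory_mbytes, memory_groups=1):
    """Return the Step 5 time limit, in seconds, for the manual diagnostic."""
    if memory_groups < 1:
        raise ValueError("at least one MASC memory group is required")
    extra_groups = memory_groups - 1
    return DIAG_BASE_SECONDS[memory_mbytes] + EXTRA_GROUP_SECONDS * extra_groups


if __name__ == "__main__":
    print(init_blink_window(masa_boards=3, memory_mbytes=2))
    print(diag_allowance_seconds(memory_mbytes=16, memory_groups=2))
```

Passing an unsupported memory size raises a KeyError, which is intentional here: the procedure gives figures only for the 2-Mbyte and 16-Mbyte memories.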
Recognizing and Finding Intermittent Faults

Faults that occur in IMS hardware may be hard, transient, or intermittent. Hard faults permanently disable a component and are easy to find. IMS automatic maintenance software dependably locates hard faults, removes them from the system, and directs technicians to repair them. One-time transient faults, if not easy to find, are easy to deal with. They are caused by temporary hardware problems or glitches in software. Usually they are corrected by the IMS practice of reinstating the ring or a component after a first failure. By contrast, intermittent or recurring transient faults are often easy neither to find nor to deal with. If they recur at fairly short and fairly regular intervals, IMS software can
usually locate them. But if they recur at long or very irregular intervals, they may escape the IMS net. In such cases, manual records kept by technicians are the indispensable tool for identifying, finding, and correcting them. How will an intermittent fault show up? In a ring interface or IRN node processor, an intermittent fault may appear in several guises: as repeated losses of token, as successful ring restarts following instances of blockage, as a node that EAR isolates but ARR returns to service because it passes diagnostics, as a node that ARR turns over to technicians because it has violated the fourth-time rule, or as a combination of these automatic responses. It could also appear as a repeated failure of EAR recovery level 3 to find a fault that levels 1 and 2 had attempted unsuccessfully to isolate. Again, the existences and histories of faults of this kind are likely to be caught only in the manual records of technicians. On nodes suspected of having intermittent faults, perform the following checks:
- Visually inspect the node and its housing. Look for poorly seated circuit packs, backplane damage or improper grounding, and poorly seated cable connections.
- Run diagnostics on the node in the repeat mode. Tap on the front of the circuit packs and apply pressure to the backplane with your thumb in an effort to stress cracks and stimulate an intermittent fault to recur.
- Move the circuit packs of a suspected node one by one to another location to see whether the intermittent failure follows any of them. (Make sure you keep careful records of each move.)
IMS attempts to recover automatically from software faults. Thus no regular software maintenance is required of the Craft. Intermittent faults are more likely to be in hardware than in software. Nevertheless, when a troubled component consistently passes diagnostics, the fault could be in software.
Other Suggestions for Troubleshooting

The following are hints and advice based upon developer experience.
New Circuit Pack; Old Failure

Technicians are sometimes faced with the following anomaly. A node continues to fail diagnostics after its circuit packs have been replaced, yet no problem is visible in the backplane or ring bus wiring. Faced with this problem, technicians should consider that the fault might lie in the isolated RAC of the BISO or EISO node. An explanation follows:
Messages, including messages containing diagnostic code, are sent from the 3B21D to an isolated segment of the ring through the BISO or the EISO node. BISO and EISO nodes have one RAC participating in the active-ring segment and one RAC participating in the isolated-ring segment. Messages destined for the isolated segment are read from the active ring by the active-ring RAC, then transmitted by the node processor to the isolated-ring RAC, which writes them to the isolated segment of the ring. A fault in the isolated-ring RAC of either the BISO or EISO node might go undetected, since it would not affect the transportation of messages on the active ring, and it could show up misleadingly as a diagnostic failure in the isolated node, thereby creating the maintenance anomaly described above. Therefore, technicians who face this problem should consider extending the isolation to include the current BISO and EISO nodes and running diagnostics on them.
Unconditional Restorals
Do not unconditionally restore a node unless you are certain it is without faults. Even when you are certain, first run diagnostic phases 1 and 2 on the node before unconditionally restoring it if the node has been powered down, if it contains a ring-interface circuit pack that has been reset, or if it exists in isolation with a node that has had a ring-interface circuit pack reset. When a node or a circuit pack has been powered down, the status registers of its ring-interface hardware may become improperly set, and an unconditional restoral of the node will likely result in a ring transport error and an isolation. Diagnostic phases 1 and 2 reset all ring-interface status registers to their proper positions.
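The rule above reduces to a short checklist. The sketch below expresses it as a predicate a technician's own tooling might apply before an unconditional restoral is attempted. The class and field names are hypothetical and do not correspond to any IMS data structure.

```python
from dataclasses import dataclass


@dataclass
class NodeHistory:
    """What the technician knows about a node (hypothetical field names)."""
    known_fault_free: bool
    powered_down: bool = False
    ring_interface_pack_reset: bool = False
    isolated_with_reset_node: bool = False
    phases_1_and_2_run: bool = False


def may_restore_unconditionally(node):
    """Apply the unconditional-restoral rule described in the text."""
    if not node.known_fault_free:
        return False
    # Any of these events may leave ring-interface status registers
    # improperly set, so diagnostic phases 1 and 2 must be run first.
    registers_suspect = (
        node.powered_down
        or node.ring_interface_pack_reset
        or node.isolated_with_reset_node
    )
    return (not registers_suspect) or node.phases_1_and_2_run


if __name__ == "__main__":
    print(may_restore_unconditionally(
        NodeHistory(known_fault_free=True, powered_down=True)))
```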

Be aware that some correlation exists between unexplained losses of token and the number of out-of-service nodes, because the node processors of quarantined and isolated nodes cannot fulfill their important and unassignable role in error detection and reporting.
Avoiding Trouble
Be careful not to leave the system unattended with ARR or CNR inhibited.
Recording Trouble
When troubleshooting a ring-related problem, frequently enter the OP:RING;DETD command as a way of providing, on the ROP output, sequential records of ring status. Such records may be useful during postmortems. If a problem is likely to be referred to developers at Bell Laboratories, save the current RPTERR0 and RPTERR1 log files in /etc/log. Keep records on all circuit pack replacements and failures.
Keep records on all indications of transient and intermittent faults, identifying, if possible, the locations where they occur. Remember that a transient fault may be an intermittent fault in its infancy.
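Manual records are most useful when they can be tallied by location. The sketch below assumes a technician keeps a simple list of (location, kind) entries and wants to see which locations have shown repeated transient or intermittent indications. The record layout, the kind labels, and the default threshold of two occurrences are all assumptions made for illustration.

```python
from collections import Counter


def recurring_fault_locations(records, min_occurrences=2):
    """Count transient/intermittent indications per location.

    records: iterable of (location, kind) pairs, where kind is a label
    such as "transient", "intermittent", or "hard" (hypothetical labels).
    Returns a dict of location -> count for locations seen at least
    min_occurrences times.
    """
    counts = Counter(
        location
        for location, kind in records
        if kind in ("transient", "intermittent")
    )
    return {loc: n for loc, n in counts.items() if n >= min_occurrences}


if __name__ == "__main__":
    log = [
        ("IUN31 11", "transient"),
        ("IUN31 9", "transient"),
        ("IUN31 11", "transient"),
        ("IUN31 11", "hard"),
    ]
    print(recurring_fault_locations(log))
```

A location that appears here more than once is a candidate for the checks listed under "Recognizing and Finding Intermittent Faults".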
New Installations or Ring Growth

New installations may wish to utilize the manual ring mode, which is explained above. Avoid growing nodes on a live system that is experiencing unexplained transient failures. When installing a new IMS ring or growing a new node, verify that the hardware specified in the ECD UCB hv field matches the hardware that is physically present. Also execute full diagnostics (automatic and optional) on every new ring node, resolving problems until diagnostics indicate ATP. If you encounter troubles, be suspicious of cables. Look for poor or open connectors, for cables connected to the wrong place, and for improper backplane grounding.
Examples of Ring Maintenance

This chapter exemplifies some of the maintenance principles and practices that were formulated in the previous two chapters. Its purposes are to familiarize technicians with the IMS ROP output, to suggest ways for technicians to monitor and interact with automatic maintenance, and to provide technicians with realistic examples of both manual and automatic maintenance activities. Most of the examples represent common scenarios. A few are special cases. Together they compose an IMS tutorial. Each example is preceded by an introduction. The examples themselves are composed of two elements. A literal reproduction of ROP output in the left column of the page records maintenance-related events occurring in the ring subsystem. A commentary in the right column of the page provides a gloss on the adjacent ROP output. The gloss is selective and cumulative. It usually avoids explaining features that previous entries have explained. The examples composing this chapter incorporate two recently developed features, ring restart and automatic TLP output. Readers whose systems do not have ring restart should ignore the level-0 recovery efforts in the examples and begin with the level-1s. Readers without the TLP feature may use the DGN output messages to identify probable faulty equipment. A convention of this chapter is that data in ROP output messages that is not ordinarily used by technicians will be omitted and replaced by rows of periods.
Responses to Single, Ring-Related Faults

The following four examples of ring recovery occur in response to single faults of the kind that disrupt the transportation of messages on the ring.
Automatic Recovery from a Transient Fault by EAR Level 0

IMS software responds to faults that disrupt the transportation of messages on the ring with the EAR escalative recovery strategy. The first, or 0, level of this strategy consists of restarting the ring in conformity with its structure prior to the fault. Such a response will usually recover the ring subsystem from a transient fault, as it does in this example. Technicians should record the occurrence and, if possible, identify the location of transient faults.
This example occurs on the following ring:

CMD> -- 1105 RING STATUS SUMMARY --
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
CMD FUNCTION
400 OP RING DETAILED
REPT RING CFR LEVEL 0 RING CONFIGURATION INITIATED BY EAR
NORMAL CONFIGURATION REQUESTED
0 1 4 3600000..........................................(4030614766)

Announces the onset of a level-0 recovery attempt, stimulated by EAR's receipt of one or more error messages indicating a ring-related fault. The onset time of the attempt appears in milliseconds in parentheses on the bottom line. Other numbers on the bottom line pertain to the ring error threshold. The first digit indicates EAR's mode, where 0 = ``threshold not exceeded'' and 1 = ``threshold exceeded.'' The second digit identifies the number of ring errors that have occurred within the current threshold interval. The third digit is the user-specified number of errors per threshold interval that causes the threshold to be exceeded. And 3600000 is the user-specified threshold interval in milliseconds. When the second number equals the third, the threshold has been exceeded.

REPT RING CFR RING CONFIGURATION ESTABLISHED (455 ms)
NORMAL CONFIGURATION, NO NODES ISOLATED
.................................(4030614777)(4030615120)

Announces a successful restart of the ring. Thus no manual response is required. 455 ms is the duration of ring silence resulting from the configuration attempt, and in parentheses are the times when the ring configuration job started and was completed.
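For technicians who capture ROP output to a file, the threshold figures on the bottom line of the level-0 message can be pulled apart mechanically. The sketch below assumes the line has the shape shown in this example (four numbers, a run of periods, and a time in parentheses). The function name and the dictionary keys are hypothetical, and the field meanings are those given in the commentary above.

```python
import re

# Bottom line of a "REPT RING CFR LEVEL n" message, for example:
#   0 1 4 3600000..........(4030614766)
THRESHOLD_LINE = re.compile(
    r"(?P<mode>[01]) (?P<errors>\d+) (?P<limit>\d+) (?P<interval_ms>\d+)"
    r"\.*\((?P<onset_ms>\d+)\)"
)


def parse_threshold_line(line):
    """Return the threshold fields as integers, or None if not found."""
    match = THRESHOLD_LINE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Mode 1 means EAR regards the error threshold as exceeded.
    fields["threshold_exceeded"] = fields["mode"] == 1
    return fields


if __name__ == "__main__":
    print(parse_threshold_line(
        "0 1 4 3600000..........................................(4030614766)"))
```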
REPT RING TRANSPORT ERR RAC PARITY/FORMAT ERROR DETECTED, IUN31 11 RAC 0
.......................................................................
....................................................(4030614653)

IMS in the 3B21D received this and the following two ring transport error messages (at the times in parentheses) as a result of the fault that stimulated the above recovery attempt. This message (the first to arrive) identifies the error type and the node and RAC associated with the error. Notice that ring transport error messages appear on the ROP following the messages announcing the system response to the error.

REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 9 RAC 0
.......................................................................
.....................................................(4030614663)

The fault spawned two instances of blockage, one from this, the second node upstream of the faulty node...

REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 10 RAC 0
.......................................................................
.....................................................(4030614667)

and one from this, the first node upstream of the faulty node. IUN31 9 detected blockage before IUN31 10 could drain the ring. IUN31 10 must have detected blockage prior to IUN31 9, but IUN31 9's ring transport error report reached the 3B21D first.
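Because transport error reports are printed after the messages announcing the system's response, it can help to put captured reports back into time order. The sketch below assumes each report has the shape shown in this example and sorts the reports by the time in parentheses. The function name and dictionary keys are hypothetical.

```python
import re

# One ring transport error report as it appears in these examples, e.g.
#   REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 9 RAC 0 ....(4030614663)
TRANSPORT_ERR = re.compile(
    r"REPT RING TRANSPORT ERR (?P<error>.+?) DETECTED[,.] "
    r"(?P<node>[A-Z]+\d+ \d+) RAC (?P<rac>\d)\D*\((?P<time_ms>\d+)\)"
)


def transport_errors_by_time(rop_text):
    """Return the transport error reports found in rop_text, oldest first."""
    reports = [
        {
            "error": m["error"],
            "node": m["node"],
            "rac": int(m["rac"]),
            "time_ms": int(m["time_ms"]),
        }
        for m in TRANSPORT_ERR.finditer(rop_text)
    ]
    return sorted(reports, key=lambda report: report["time_ms"])


if __name__ == "__main__":
    captured = (
        "REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 10 RAC 0 "
        "....(4030614667) "
        "REPT RING TRANSPORT ERR RAC PARITY/FORMAT ERROR DETECTED, "
        "IUN31 11 RAC 0 ....(4030614653) "
        "REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 9 RAC 0 "
        "....(4030614663)"
    )
    for report in transport_errors_by_time(captured):
        print(report["time_ms"], report["node"], report["error"])
```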
Manual Recovery from a Hard Fault

After a hard fault, EAR level-0 will ordinarily try unsuccessfully to restart the ring. Then, based upon its analysis of ring transport error messages, EAR level-1 will attempt to locate and isolate the fault. If EAR succeeds, ARR will then attempt to restore the isolated node conditionally and, if it fails, will change the node maintenance mode to manual, thereby directing technicians to perform maintenance on it. This example follows the scenario just described.
OP:RING;DETD
RING STAT: ACTIVE
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
REPT RING CFR LEVEL 0 RING CONFIGURATION INITIATED BY EAR
NORMAL CONFIGURATION REQUESTED
.....................................................(4030772385)

Prompted by a ring transport error report, EAR level-0 requests that the ring config module restart the ring.

REPT RING CFR RING CONFIGURATION ATTEMPT FAILED 17
COULD NOT ESTABLISH A NORMAL RING CONFIGURATION
.......................................................................
(4030772397)(4030772536)

The continuity test run by the ring config module failed, an indication that the fault is probably hard.

REPT RING CFR LEVEL 1 RING CONFIGURATION INITIATED BY EAR
ISOLATION FROM IUN31 11 TO IUN31 11 REQUESTED
0 2 4 3600000..................................(4030772561)

EAR level-1 requests that the ring config module isolate the node indicated as faulty by the ring transport error messages.
REPT RING CFR RING CONFIGURATION ESTABLISHED (658 MS)
BISO NODE = IUN31 10, EISO NODE = IUN31 12
(4030772580)(4030772942)

IUN31 11 is isolated with IUN31 10 acting as BISO node and IUN31 12 acting as EISO node.

REPT RING TRANSPORT ERR RAC PARITY/FORMAT ERROR DETECTED, IUN31 11 RAC 0.
................................................(4030772270)
REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 10 RAC 0.
................................................(4030772278)
REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 9 RAC 0.
................................................(4030772282)
REPT ARR AUTORST ARR COND RST FOR IUN31 11 STARTED

ARR requests that MIRA conditionally restore the isolated node. This is ARR's check that the removal and isolation of the node was necessary. The attempt will generate diagnostic data that the technician should use if called upon to perform maintenance on the node.

RST TERM LN31 11 TASK 3 MSG STARTED

RTR message announcing that ARR's restoral request is on the active queue and being processed.
The 1105 display page now looks as follows:

CMD> RING STAT ISOLATED SEGMENT ARR RESORE COND IUN31 11 -- 1105 RING STATUS SUMMARY --
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
CMD FUNCTION
400 OP RING DETAILED
RMV IUN31 11 STOPPED 5

RTR message announcing that it could not remove IUN31 11 from service (because EAR had done so previously).

DGN IUN31 11 PH 1 STF (9 X00000000 X00000000)
TEST
004...........................................................
005 X00000dfb................................................
006...........................................................
008...........................................................
009...........................................................

Indicates that during phase 1 diagnostics, some tests (nine in all) failed and none (X00000000 X00000000) were skipped. IUN31 11 is not necessarily the node in which phase 1 failed, but the node specified in ARR's diagnostic request. Since phases 1 and 2 test all RACs in the isolated segment, the fault that produces a phase 1 or 2 failure may not reside in the specified node. The failure of test 005 indicates that, in this instance, low-phase ambiguity exists; in other words, that both a RAC of the isolated node and a RAC of either the EISO or BISO node are suspected of being faulty. See the ``Low-Phase Ambiguity'' section in this chapter.
DGN IUN31 11 PH 2 STF (10 X00000000 X00000000)
TEST
002...........................................................
004...........................................................
005 X00000dfb................................................
006...........................................................
007...........................................................

Phase-1 diagnostics test the isolated segment beginning at the BISO node, and phase-2 diagnostics test it beginning at the EISO node. In the case of single-node isolations, the two phases should report failure data for the same node(s), but in the case of multiple-node isolations they usually report failure data for different nodes.

DGN IUN31 11 terminated at ph 2 stmnt 36 after test 17

Indicates the point in the diagnostic routine at which execution terminated.

ANALY:TLPFILE: IUN31 11 SUMMARY DATA MSG STARTED
TLP: IUN31 11 PH=1....................................................
TLP: IUN31 11 PH=2....................................................
TLPFILE COMPLETED

Summarizes diagnostic failure data. Phases cited are those that failed; but because phases 1 and 2 are at issue, IUN31 11 is not necessarily the location of the failure.

DGN IUN 31 11 COMPLETED STF (19........................)
ANALY TLPFILE IUN31 11 TLPSRCH MSG IP TLPFILE #983090

Short form of this message. The longer form is next.

ANALY TLPFILE IUN31 11 SUSPECT FLTY EQUIPMENT
CODE    GRP  MEM  CONT  POS  WT  NOTE
UN303   31   11   --    --   10  -
CABLE   -----                10  3

This data is printed only after a test fails and only if the TLP option was specified in the DGN command (as it always is by ARR). The entry lists in weighted (WT) order equipment suspected of being faulty. The WT is a number between 1 and 10. The higher the WT, the greater the likelihood of the equipment being faulty. Because ARR does not specify the RAW option of the DGN command, failure data for test 010 is not given. (See the ``Low-Phase Ambiguity'' section of this chapter.)

RST IUN31 11 STOPPED 1

Because of diagnostic failure (error code 1).

DGN IUN31 11 STF..............................................MSG COMPL
REPT ARR AUTORST ARR COND RST FOR IUN 31 11 FAILED

Confirms that ARR's restoral request has failed. Many IMS processes write to the ROP, at times resulting in some redundancy.
OP:RING;DETD

Manual input message.

RING STAT: ISOLATED SEGMENT BISO: IUN31 10 EISO: IUN31 12
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAiAAAA
             4
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
The subnumber 4 under the i in the above output message indicates that the ring interface of IUN31 11 is faulty. The numbers used in this way have the following meanings:
1 = manual mode
2 = RI QUSBL or NP faulty or untested
3 = combination of 1 and 2
4 = RI faulty or untested
5 = combination of 1 and 4
6 = combination of 2 and 4
7 = combination of 1, 2, and 4
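Since 3, 5, 6, and 7 are defined as combinations of 1, 2, and 4, the subnumber can be read as three independent flags added together. The sketch below decodes it on that basis. The function name is hypothetical, and the labels are copied from the list above.

```python
# The three basic conditions; every other subnumber is a sum of these.
SUBNUMBER_FLAGS = {
    1: "manual mode",
    2: "RI QUSBL or NP faulty or untested",
    4: "RI faulty or untested",
}


def decode_subnumber(value):
    """Return the list of basic conditions that make up a subnumber (1-7)."""
    if not 1 <= value <= 7:
        raise ValueError("subnumber must be between 1 and 7")
    return [label for flag, label in SUBNUMBER_FLAGS.items() if value & flag]


if __name__ == "__main__":
    for number in (4, 5):
        print(number, decode_subnumber(number))
```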
OP:RING, IUN31 11

Manual input message.

OP:RING IUN31 11 COMPL
IUN31 11: MJ = OOS; NM = MAN; RI = FLTY ; NP = USBL
IN ISOL SEG

Like the TLP and OP:RING;DETD outputs above, this data does not reflect the low-phase ambiguity.

Following the procedures ``Responding to Single Node Isolations'' and ``Clearing Faults in Response to ARR Actions,'' a technician replaces circuit pack UN303 in IUN31 11 and conditionally restores the node.

RST:IUN31 11
RST IUN31 11 TASK 4 MSG STARTED
RMV IUN31 11 STOPPED 5
DGN IUN31 11 COMPLETED ATP MESSAGE IN PROGRESS

Repaired IUN31 11 now passes diagnostics.

REPT RING CFR RING CONFIGURATION ESTABLISHED (338 ms)
NORMAL CONFIGURATION, NO NODES ISOLATED
(4031118365)(4031118740)

The isolation is dissolved automatically as IUN31 11 is restored.

RST IUN31 11 COMPLETED

IUN31 11 has been returned to the active ring, pumped with operational code, and placed in execution.

DGN IUN31 11 ATP MESSAGE COMPLETE
OP:RING;DETD
RING STAT: ACTIVE
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
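The ring status rows in these examples can be read mechanically: in every row shown, a two-digit group number is followed by sixteen member positions, and the isolated node IUN31 11 appears as an i in position 11 of row 31. The sketch below parses rows on that basis. The function names are hypothetical, and treating a period as an unequipped position and A as an active node are assumptions inferred from the examples, not statements from this manual.

```python
def parse_ring_row(row):
    """Map (group, member) -> status character for one summary row.

    Assumes the layout seen in the examples: a two-digit group number
    followed by 16 member positions (members 0 through 15).
    """
    group = int(row[:2])
    return {
        (group, member): status
        for member, status in enumerate(row[2:18])
    }


def isolated_nodes(rows):
    """Return (group, member) for every position marked with 'i'."""
    found = []
    for row in rows:
        for position, status in parse_ring_row(row).items():
            if status == "i":
                found.append(position)
    return found


if __name__ == "__main__":
    summary = [
        "00" + "A" * 12 + "....",
        "31." + "A" * 10 + "i" + "A" * 4,
        "32" + "A" * 12 + "....",
    ]
    print(isolated_nodes(summary))
```

Only the status rows themselves should be passed in. Header lines and the subnumber line printed beneath a row do not begin with a group number and would need to be filtered out first.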
Automatic Recovery from a Transient Fault by ARR

In this example a fault triggers a level-0 recovery attempt that fails; EAR level 1 then isolates the apparently faulty node; and ARR's attempt to restore the node succeeds. Though the fault triggers two levels of EAR responses, no manual action is required other than to record the occurrence and location of the problem as a probable transient fault. This example occurs on the following ring:
CMD> -- 1105 RING STATUS SUMMARY --
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
CMD FUNCTION
400 OP RING DETAILED

REPT RING CFR LEVEL 0 RING CONFIGURATION INITIATED BY EAR
NORMAL CONFIGURATION REQUESTED.
0 3 4 3600000................(4031349825)
REPT RING CFR RING CONFIGURATION ATTEMPT FAILED 17
COULD NOT ESTABLISH A NORMAL RING CONFIGURATION
.....................................................
(4031349837)(4031350005)
REPT RING CFR LEVEL 1 RING CONFIGURATION INITIATED BY EAR
ISOLATION FROM IUN31 11 TO IUN31 11 REQUESTED.
0 3 4 3600000.................(4031350030)
REPT RING CFR RING CONFIGURATION ESTABLISHED (695 ms)
BISO NODE = IUN31 10, EISO NODE = IUN31 12
(4031350049)(4031350422)
REPT RING TRANSPORT ERR RAC PARITY/FORMAT ERROR DETECTED. IUN31 11 RAC 0.
........................................(4031349712)
REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 9 RAC 0.
........................................(4031349722)
REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 10 RAC 0.
........................................(4031349727)
RST IUN31 11 TASK 5 MSG STARTED
RMV IUN31 11 STOPPED 5
OP:RING;DETD
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
DGN IUN31 11 COMPLETED ATP MESSAGE IN PROGRESS
REPT RING CFR RING CONFIGURATION ESTABLISHED (338 ms)
NORMAL CONFIGURATION, NO NODES ISOLATED
(4031519404)(4031519780)
RST IUN31 11 COMPLETED
DGN IUN31 11 ATP MESSAGE COMPLETE
REPT ARR AUTORST ARR COND RST FOR IUN31 11 SUCCEEDED
OP:RING;DETD
RING STAT: ACTIVE
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
Manual Recovery from a Hard Fault on a Small Ring

Small rings with padded interframe buffers are subject to ring fragmentation, a condition that causes the ring to go down. Ring fragmentation will occur when an isolation that includes padded buffers shortens an active ring below its minimum data length. Padded buffers are employed redundantly in pairs at opposite sides of the ring. Thus a single-node isolation on a small ring will never include both pairs, while in many cases a two-node isolation will. Nevertheless, a single-node isolation on small rings can pose problems because of the common need, arising from low-phase ambiguity, to extend isolations to include the BISO or EISO node. (For a discussion of this issue, see the section ``Low-Phase Ambiguity'' in this chapter.) Isolations on small rings often include one pair of padded buffers, and extending the isolation would often include the other pair as well. The conditions that give rise to this problem are illustrated in the following two figures.
[Figure: ring diagram. Labels shown: RPCN00 0, IUN32 1, RPCN32 0; RAC 0 and RAC 1 on each node; BISO Node; EISO Node; Isolated Node; Active Ring; Isolated Ring; Padded Interframe Buffers.]
Figure 3-4. Manual Recovery - Method One
In response to an ambiguous ring-interface failure associated with either RAC 0 in RPCN32 0 or RAC 0 in IUN32 1, IMS would configure the ring as in the structure illustrated. If, in such a ring, performing maintenance on RAC 0 in RPCN32 0 failed to clear the fault, the next procedural step (extending the isolation to include IUN32 1 in order to perform maintenance on it; see ``Guideline to Single-Node Isolations'' in this chapter) would bring the ring down, since both pairs of padded interframe buffers would then be included in the isolated segment. A version of the CFR command is designed especially for handling this dilemma. CFR:RING,NODEa,b;MOVFLT moves the indication of a faulty ring interface from the currently isolated node to the node identified as NODEa,b in the command. It also causes the isolation to shift so that NODEa,b becomes the newly isolated node and the formerly isolated node becomes the BISO or EISO node, as in the following illustration, which was created by the command CFR:RING,LN32,1;MOVFLT:
[Figure: ring diagram. Labels shown: RPCN00 0, IUN32 1, RPCN32 0; RAC 0 and RAC 1 on each node; EISO Node; Isolated Node; BISO Node; Active Ring; Isolated Ring; Padded Interframe Buffers.]
Figure 3-5. Manual Recovery - Method Two
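The hazard described above, an isolation that takes in both pairs of padded interframe buffers, can be checked on paper before an isolation is extended. The sketch below is one hedged way to express that check. It assumes the technician supplies, from site records, which node hosts each buffer pair; the function name, the pair labels, and the node names in the example are hypothetical. It does not model how the RACs of the BISO and EISO nodes contribute, so it should be read as an illustration of the rule rather than a substitute for it.

```python
def isolation_includes_all_buffer_pairs(isolated_nodes, buffer_pair_hosts):
    """Return True if every padded-buffer pair lies in the isolated segment.

    isolated_nodes: names of the nodes that would be isolated.
    buffer_pair_hosts: dict of pair label -> name of the node hosting that
    pair (site-specific data supplied by the technician).
    """
    isolated = set(isolated_nodes)
    return all(host in isolated for host in buffer_pair_hosts.values())


if __name__ == "__main__":
    # Hypothetical four-node ring with buffer pairs hosted on N1 and N3.
    hosts = {"pair A": "N1", "pair B": "N3"}
    print(isolation_includes_all_buffer_pairs(["N1"], hosts))
    print(isolation_includes_all_buffer_pairs(["N1", "N3"], hosts))
```

When the check returns True for a proposed extension, the MOVFLT form of the CFR command described above is the alternative to consider.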
The following example occurs on the four-node ring just illustrated:
REPT RING CFR LEVEL 0 RING CONFIGURATION INITIATED BY EAR
NORMAL CONFIGURATION REQUESTED
0 1 4 3600000.............................(242674464)
REPT RING CFR RING CONFIGURATION ATTEMPT FAILED 17
COULD NOT ESTABLISH A NORMAL RING CONFIGURATION
.......................................................................
(242674474)(242674649)
REPT RING CFR LEVEL 1 RING CONFIGURATION INITIATED BY EAR
ISOLATION FROM RPCN32 0 TO RPCN32 0 REQUESTED
0 1 3 3600000.............................(242674676)
REPT RING CFR RING CONFIGURATION ESTABLISHED (610 MS)
BISO NODE = IUN00 1, EISO NODE = IUN32 1
(242674689)(242674963)
REPT RING TRANSPORT ERR RAC PARITY/FORMAT ERROR DETECTED, IUN32 1 RAC 0.
......................................................................
............................................(242674346)

In this instance EAR did not receive or did not report blockage.

REPT ARR AUTORST ATT COND RST FOR RPCN32 0 STARTED
RMV RPCN32 0 STOPPED 5
DGN RPCN32 0 PH 1 STF (11 X00000000 X00000000) TEST.................................................................. 002................................................................... 004................................................................... 005 (X00000e00)...................................................... 006................................................................... 007................................................................... DGN RPCN32 0 PH 2 STF (11 X00000000 X00000000) TEST.................................................................. 002................................................................... 004................................................................... 005 (X00000e00)......................................................... 006................................................................... 007................................................................... RPCN32 0 TERMINATED AT PH 27 STMNT 15 AFTER TEST 8 ANALY:TLPFILE: RPCN32 0 SUMMARY DATA TLP: RPCN32 0 PH=1.................................................... TLP: RPCN32 0 PH=2.................................................... T.PFILE COMPLETED DGN RPCN32 0 COMPLETED STF (21 X00000000 X00000000) ANALY TLPFILE RPCN32 0 TLPSRCH TLPFILE #917573 ANALY TLPFILE RPCN32 0 SUSPECT FLTY EQUIPMENT CODE GRP MEM CONT POS WT NOTE UN122C 32 0 -- -- 10 -UN123B 32 0 -- -- 10 -CABLE ----- 10 3
The failure of test 5 means that low-phase ambiguity exists in this case; in other words, the IMS regards RAC 1 in the BISO node, RAC 0 in the EISO node, or both as possibly faulty.
The extended TLP output message does not identify equipment in the BISO or EISO node as faulty, because the ring interfaces of these nodes are necessarily classified as usable.
RST RPCN32 0 STOPPED 1 DGN RPCN32 0 STF (21X00000000 X00000000)
REPT ARR AUTORST ARR COND RST FOR RPCN32 0 FAILED
OP:RING;DETD
Failure of the ARR restoral attempt results in the maintenance mode of the node being changed to manual.
RING STAT: ISOLATED SEGMENT BISO: IUN00 1 EISO: IUN32 1
00AA..............
01................
02................
30................
31................
32iA.............. 5
The isolation in this small ring during a time of heavy traffic creates an emergency condition. Following the procedures for ``Clearing Faults in Response to ARR Actions'' and ``Responding to Single-Node Isolations,'' the technician elects to change both UN122C and UN123B in RPCN32 0 but does not troubleshoot the cable. It is possible, of course, that the fault is in the cable of RPCN32 0, but because this situation involves low-phase ambiguity, it is far more likely that the fault, if it is not in the circuit packs of RPCN32 0, is in the isolated RAC of either the EISO or BISO node. Then, because both phase 1 and phase 2 failed, the technician diagnoses the node using the RAW option so that if phase 1 or 2 still fails, an indication will be given as to whether the isolated RAC of the BISO or EISO node is suspected of being faulty.
DGN RPCN32 0;RAW!
DGN RPCN32 0 TASK 5 MSG STARTED
RMV RPCN32 0 STOPPED 5
DGN RPCN32 0 PH 1 STF (11X00000000 X00000000)
TEST MISMATCH....................
002....................
004....................
005 X00000e00....................
006....................
007....................
008....................
009....................
010 X00000e01....................
011....................
016....................
017....................
DGN RPCN32 0 PH 2 STF (10X00000000 X00000000)
TEST MISMATCH
002....................
004....................
005 X00000e00....................
006....................
007....................
008....................
009....................
010 X00000c01....................
011....................
016....................
017....................
DGN RPCN32 0 PH 10 ATP....................
DGN RPCN32 0 PH 11 ATP....................
DGN RPCN32 0 PH 12 ATP....................
DGN RPCN32 0 PH 13 ATP....................
DGN RPCN32 0 PH 20 ATP....................

The mismatch data for failing test 10 identifies both IUN32 1 and IUN00 1 as suspect nodes. (Hexadecimal e01 is translated by the ``Physical Node Address Hexadecimal Representation'' table in the reference chapter of this document as node 32 1, and hexadecimal c01 is translated as node 00 1.) In this situation, the standard procedure calls for technicians to extend the isolation to include IUN32 1 or IUN00 1 in order to perform maintenance on it. In this instance, however, extending the isolation to include IUN32 1 would bring the ring down, because it would result in the isolation of both pairs of padded interframe buffers. (See the illustration of the ring that appears at the beginning of this section.) Therefore, the first action (which, to conserve space, is not shown here) was to extend the isolation to include IUN00 1 and to perform maintenance on it. This action, however, did not find a fault in IUN00 1, and so the isolation was reduced to include once again only RPCN32 0, and the MOVFLT option of the CFR command was employed to shift the isolation from RPCN32 0 to IUN32 1, as played out below.
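The hexadecimal translations cited above (e01 as node 32 1, c01 as node 00 1) can be sketched in code. The arithmetic used here, address = 0xC00 + 16 x group + member, is only an assumption inferred from the examples printed in this section; the ``Physical Node Address Hexadecimal Representation'' table remains the authoritative source, and the function name is hypothetical.

```python
def decode_node_address(hex_word: str) -> tuple[int, int]:
    """Decode a mismatch word such as 'X00000e01' into (group, member).

    ASSUMPTION: address = 0xC00 + 16 * group + member. This rule is inferred
    from the examples in this section only; consult the reference-chapter
    table for the authoritative translation.
    """
    value = int(hex_word.lstrip("Xx"), 16) & 0xFFF  # keep the low 12 bits
    if value < 0xC00:
        raise ValueError(f"{hex_word!r} is outside the assumed address range")
    offset = value - 0xC00
    return offset // 16, offset % 16  # (group, member)


for word in ("X00000e01", "X00000c01"):
    group, member = decode_node_address(word)
    print(f"{word} -> node {group:02d} {member}")
```

Under the stated assumption, the two words from failing test 10 decode to the same nodes the text names as suspects.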
DGN RPCN32 0 PH 23 ATP....................
DGN RPCN32 0 PH 24 ATP....................
DGN RPCN32 0 PH 26 ATP....................
DGN RPCN32 0 PH 27 ATP....................

Unneeded output generated by the DGN RAW option could have been stopped by terminating DGN with the STOP:DMQ command.

DGN RPCN32 0 TERMINATED AT PH 27 STMNT 15 AFTER TEST 3
DGN RPCN32 0 STF (21 X00000000 X0000000).........
RMV:LN32 1

In preparation for entering the CFR command, the node specified in the command must be removed from service.

RMV IUN32 1 TASK 0
RMV IUN32 1 COMPLETED
OP:RING;DETD
RING STAT: RESTORING BISO: IUN00 1 EISO: IUN32 1
00AA..............
01................
02................
30................
31................
32iO.............. 51
REPT RING CFR WARNING: BISO AND/OR EISO NODE OOS
BISO NODE = IUN00 1, EISO NODE = IUN32 1
ACTIVE RING SEGMENT NOT LONG ENOUGH
Removing a BISO or EISO node from service would ordinarily cause the isolation to extend to include the out-of-service node. In this case it does not, however, because IMS calculates that doing so would shorten the ring below its minimum data length.
CFR:RING,IUN32 1;MOVFLT!
With the suspect IUN32 1 quarantined out-of-service, the technician enters the MOVFLT version of the CFR command to shift the isolation to include IUN32 1.
REPT RING CFR RING CONFIGURATION ESTABLISHED (290 ms)
BISO NODE = RPCN32 0, EISO NODE = RPCN00 0
(243506608) (243506934)
REPT ARR AUTORST CNR UCL REST FOR RPCN32 0 STARTED
CFR RING IUN32 1 COMPL

ARR undertakes its highest-priority task, the restoral of a node designated as a BISO or EISO node. With the isolation shifted, the ring now has the structure of the second illustration at the beginning of this section, and the probable fault in IUN32 1 may now be corrected.
RING STAT: ISOLATED SEGMENT BISO: RPCN32 0 EISO: RPCN00 0
00AA..............
01................
02................
30................
31................
32Ai.............. 5
Responses to Multiple, Ring-Related Faults

The following two examples of ring-recovery actions occur in response to multiple faults of the kind that disrupt the transportation of messages on the ring.
Manual Recovery from Multiple Hard Faults

Multiple faults have the potential of creating massive isolations. Because they usually develop as extensions of single faults, they are best avoided by prompt and effective attention to single faults. The history of the following massive isolation is typical. In the first stage, a single node is isolated, diagnosed at the request of ARR as RI faulty, and has its maintenance mode changed to manual. Then, before the technician can repair the node and return it to service, another ring-related fault occurs on a distant part of the ring, with the result that the many nodes lying between the two faulty nodes must be removed from service as victims of the expanded isolation. The first stage of this example is identical to the example recorded above in ``Manual Recovery from a Hard Fault,'' except that the massive isolation intervenes before the first fault can be repaired. This example occurs on the following ring:
CMD>
-- 1105 RING STATUS SUMMARY --
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
REPT RING CFR LEVEL 0 RING CONFIGURATION INITIATED BY EAR NORMAL CONFIGURATION REQUESTED .....................................................(4030772385) REPT RING CFR RING CONFIGURATION ATTEMPT FAILED 17 COULD NOT ESTABLISH A NORMAL RING CONFIGURATION ....................................................................... (4030772397)(4030772536)
Prompted by a ring transport error report, EAR level-0 requests that the ring config module restart the ring.
The continuity test run by the ring config module failed, an indication that the fault is probably hard.
REPT RING CFR LEVEL 1 RING CONFIGURATION INITIATED BY EAR ISOLATION FROM IUN31 11 TO IUN31 11 REQUESTED 0 2 4 3600000..................................(4030772561) REPT RING CFR RING CONFIGURATION ESTABLISHED (658 MS) BISO NODE = IUN31 10, EISO NODE = IUN31 12 (4030772580)(4030772942) REPT RING TRANSPORT ERR RAC PARITY/FORMAT ERROR DETECTED, IUN31 11 RAC 0. ....................................................................... ................................................(4030772270) REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 10 RAC 0. ....................................................................... ................................................(4030772278) REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN31 9 RAC 0. ....................................................................... ................................................(4030772282) REPT ARR AUTORST ARR COND RST FOR IUN31 11 STARTED
EAR level-1 requests that the ring config module isolate the node indicated as faulty by the accompanying ring transport error messages.
IUN31 11 is isolated with IUN31 10 acting as BISO node and IUN31 12 acting as EISO node.
ARR requests that MIRA conditionally restore the isolated node. This is ARR's check that the removal and isolation of the node was necessary. The attempt will generate diagnostic data that the technician should use if called upon to perform maintenance on the node.
RTR message announcing that ARR's restoral request is on the active queue and being processed.
RST IUN31 11 TASK 3 MSG STARTED
The 1105 display page now looks as follows:

CMD>
RING STAT ISOLATED SEGMENT
ARR RESTORE COND IUN31 11
-- 1105 RING STATUS SUMMARY --
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
RMV IUN31 11 STOPPED 5
RTR message announcing that it could not remove IUN31 11 from service (because EAR had done so previously).
The DGN PH 1 message indicates that during phase 1 diagnostics, some tests (nine in all) failed and none (X00000000 X00000000) were skipped. IUN31 11 is not necessarily the node in which phase 1 failed, but the node specified in ARR's diagnostic request. Since phases 1 and 2 test all RACs in the isolated segment, the fault that produces a phase 1 or 2 failure may not reside in the specified node.
The failure of test 005 indicates that, in this instance, low-phase ambiguity exists; in other words, both a RAC of the isolated node and a RAC of either the EISO or BISO node are suspected of being faulty. See the ``Low-Phase Ambiguity'' section in this chapter.
DGN IUN31 11 PH 1 STF (9 X00000000 X00000000) TEST 004........................................................... 005 X00000dfb................................................ 006........................................................... 008........................................................... 009...........................................................
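As a sketch of how the DGN phase header described above might be read by a log-scanning script, the following fragment extracts the node, the phase, the count of failed tests, and whether any tests were skipped. The field meanings follow the explanation in the text (count of failed tests, then two skipped-test words); the regular expression and the function name are hypothetical and are tolerant of the spacing variations that appear in the printed examples.

```python
import re

# Matches headers such as "DGN IUN31 11 PH 1 STF (9 X00000000 X00000000)".
# The optional "(" before STF and the optional "`" after X tolerate variations
# seen in printed examples; this pattern is an illustration, not a format spec.
DGN_STF_HEADER = re.compile(
    r"DGN\s+(?P<node>[A-Z]+\d+\s+\d+)\s+PH\s+(?P<phase>\d+)\s+\(?STF\s*"
    r"\((?P<failed>\d+)\s*X`?(?P<skip1>[0-9A-Fa-f]{8})\s+X`?(?P<skip2>[0-9A-Fa-f]{8})\)"
)


def parse_dgn_stf_header(line: str) -> dict:
    """Return node, phase, failed-test count, and a skipped-tests flag."""
    match = DGN_STF_HEADER.search(line)
    if match is None:
        raise ValueError(f"not a DGN STF phase header: {line!r}")
    skipped_bits = int(match.group("skip1"), 16) | int(match.group("skip2"), 16)
    return {
        "node": " ".join(match.group("node").split()),
        "phase": int(match.group("phase")),
        "failed_tests": int(match.group("failed")),
        "tests_skipped": skipped_bits != 0,  # all-zero words mean none skipped
    }
```

Applied to the phase 1 header above, the sketch reports nine failed tests and no skipped tests, which matches the annotation.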
DGN IUN31 11 PH 2 STF (10 X00000000 X00000000) TEST 002........................................................... 004........................................................... 005 X00000dfb................................................ 006........................................................... 007...........................................................
Phase-1 diagnostics test the isolated segment beginning at the BISO node, and phase-2 diagnostics test it beginning at the EISO node. In the case of single-node isolations, the two phases should report failure data for the same node(s), but in the case of multiple-node isolations they usually report failure data for different nodes.
The next message indicates the point in the diagnostic routine at which execution terminated.
DGN IUN31 11 TERMINATED AT PH 2 STMNT 36 AFTER TEST 17
ANALY:TLPFILE: IUN31 11 SUMMARY DATA MSG STARTED
TLP: IUN31 11 PH=1....................
TLP: IUN31 11 PH=2....................
TLPFILE COMPLETED

Summarizes diagnostic failure data. The phases cited are those that failed; but because phases 1 and 2 are at issue, IUN31 11 is not necessarily the location of the failure.

DGN IUN 31 11 COMPLETED STF (19....................)
ANALY TLPFILE IUN31 11 TLPSRCH MSG IP TLPFILE #983090
ANALY TLPFILE IUN31 11 SUSPECT FLTY EQUIPMENT
CODE    GRP  MEM  CONT  POS  WT  NOTE
UN303   31   11   --    --   10  -
CABLE   --   --   --    --   10  3

Short form of this message. The longer form is next. This data is printed only after a test fails and only if the TLP option was specified in the DGN command (as it always is by ARR). The entry lists, in weighted (WT) order, the equipment suspected of being faulty. The WT is a number between 1 and 10; the higher the WT, the greater the likelihood that the equipment is faulty. Because ARR does not specify the RAW option of the DGN command, failure data for test 010 is not given. (See the ``Low-Phase Ambiguity'' section of this chapter.)

RST IUN31 11 STOPPED 1
DGN IUN31 11 STF....................MSG COMPL

The restoral stopped because of diagnostic failure (error code 1).
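The weighted ordering of the suspect-equipment list can be illustrated with a short sketch. The record layout and function name are hypothetical; the only facts taken from the text are that WT ranges from 1 to 10 and that a higher WT means a greater likelihood of a fault. Entries with equal weights keep their printed order, which is an assumption about how ties should be handled.

```python
from typing import NamedTuple


class Suspect(NamedTuple):
    code: str    # CODE column (equipment code)
    group: str   # GRP column
    member: str  # MEM column
    weight: int  # WT column, 1..10; higher means more likely faulty


def rank_suspects(suspects: list[Suspect]) -> list[Suspect]:
    """Order suspects from most to least likely, keeping printed order on ties."""
    for suspect in suspects:
        if not 1 <= suspect.weight <= 10:
            raise ValueError(f"WT out of range for {suspect.code}: {suspect.weight}")
    # sorted() is stable, so equal weights retain their original order.
    return sorted(suspects, key=lambda s: s.weight, reverse=True)


# Entries as printed for IUN31 11 in this example.
printed = [Suspect("UN303", "31", "11", 10), Suspect("CABLE", "--", "--", 10)]
ranked = rank_suspects(printed)
```

Because both entries in this example carry the maximum weight, the ranking alone does not separate them; the procedures cited in the text govern which to replace first.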
REPT ARR AUTORST ARR COND RST FOR IUN 31 11 FAILED
Confirms that ARR's restoral request has failed. Many IMS processes write to the ROP, at times resulting in some redundancy.
Manual input message.
OP:RING;DETD
RING STAT: ISOLATED SEGMENT BISO: IUN31 10 EISO: IUN31 12
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
OP:RING, IUN31 11
OP:RING IUN31 11 COMPL
IUN31 11: MJ = OOS; NM = MAN; RI = FLTY; NP = USBL IN ISOL SEG

The OP:RING request is a manual input message. Like the TLP output above, this data does not reflect the low-phase ambiguity.

REPT RING CFR LEVEL 0 RING CONFIGURATION INITIATED BY EAR
ISOLATION FROM IUN31 11 TO IUN31 11 REQUESTED.
0 1 4 3600000....................(403082426)

Before the technician can respond to the single isolation, another fault occurs. EAR level-0 attempts to restart the ring in conformity with its isolated structure prior to the occurrence of the second fault.
REPT RING CFR RING CONFIGURATION ATTEMPT FAILED 17 COULD NOT ESTABLISH BISO NODE = IUN31 10, EISO NODE = IUN31 12 ...................................................................... (403082441)(403082625) REPT RING CFR LEVEL 1 RING CONFIGURATION INITIATED BY EAR ISOLATION FROM IUN31 11 TO IUN32 6 REQUESTED. 0 2 4 3600000.................(403082654) REPT RING TRANSPORT ERR RMV RPCN 32 0 RQSTD; RPC ISOLATION RPTD ...................................(403082796) REPT RING CFR RING CONFIGURATION ESTABLISHED (703 ms) BISO NODE = IUN31 10, EISO NODE = IUN32 7 (403082671)(403082031) REPT RING TRANSPORT ERR RAC PARITY/FORMAT ERROR DETECTED, IUN32 6 RAC 0. ........................................(403082306) REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN32 5 RAC 0. ...................................................................... ........................................(403082316) REPT RING TRANSPORT ERR BLOCKAGE DETECTED, IUN32 4 RAC 0. ...................................................................... ........................................(403082322) REPT ARR AUTORST ARR COND RST FOR IUN32 6 STARTED
The ring config module's continuity test failed, so the isolation must be extended to include both nodes suspected of having faulty ring interfaces.
The RMV RPCN 32 0 RQSTD message notifies the technician that an innocent-victim RPCN is being included in the extended isolation. The multiple-node isolation is now established.
Having failed previously (during the single-isolation stage) to restore IUN31 11, ARR now selects IUN32 6 for a conditional restoral attempt.
RST IUN32 6 TASK 6 MSG STARTED
RMV IUN32 6 STOPPED 5
DGN IUN32 6 PH 1 STF (9 X00000000 X00000000)
TEST....................
004....................
005 X00000dfb....................
006....................
008....................
009....................
DGN IUN32 6 PH 2 STF (11 X00000000 X00000000)
TEST....................
002....................
004....................
005 X00000e06....................
006....................
007....................
DGN IUN32 6 TERMINATED AT PH 2 STMNT 36 AFTER TEST 17
ANALY:TLPFILE: IUN32 6 SUMMARY DATA
TLP: IUN32 6 PH=1....................
TLP: IUN32 6 PH=2....................
TLPFILE COMPLETED
DGN IUN32 6 COMPLETED STF (20....................)
ANALY TLPFILE IUN 32 6 TLPSRCH TLPFILE # 1179716
ANALY TLPFILE IUN32 6 SUSPECT FLTY EQUIPMENT
CODE    GRP  MEM  CONT  POS  WT  NOTE
UN303   31   12   --    --   10  -
UN303   31   11   --    --   10  -
CABLE   --   --   --    --   10  3
RST IUN32 6 STOPPED 1
DGN IUN32 6 STF (20 X00000000 X00000000)

Phase-1 diagnostic tests begin running from the BISO node. Therefore, they identify IUN31 11 as faulty.
Phase-2 diagnostic tests begin running from the EISO node. Therefore, they identify IUN32 6 (e06) as faulty.
The failure of test 005 of phase 2 indicates that low-phase ambiguity exists surrounding IUN32 6. Probably, though not certainly, IUN32 5, whose ring interface is suspected to be faulty, is the node involved in this instance of low-phase ambiguity.
Contrast the TLP output with the TLP output when IUN31 11 was singly isolated. Both then and now the ring interface of IUN31 12 was suspect. The difference is that when the suspect RAC of IUN31 12 was part of an EISO node, its ring interface could not be set to FLTY. IUN32 6 is not included because the TLP output reflects only the first failing phase.
REPT ARR AUTORST ARR COND RST FOR IUN32 6 FAILED
OP:RING;DETD
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAiiiii 54
32iiiiiiiAAAAA.... 45
63.AAAAAAAAAAAAAAA
Notice that the subnumbers produced by the OP:RING;DETD command indicate that, as a result of low-phase ambiguity, four nodes are suspected of having faults in their ring interfaces. Because none of the four is now in the active ring as an EISO or BISO node, each can have its ring interface minor state marked FLTY.

DGN:IUN31 11;RAW!

In accordance with the procedures ``Responding to Multiple-Node Isolations'' and ``Clearing Faults in Response to ARR Actions,'' a technician replaces circuit pack UN303 in IUN31 11 and submits the node to automatic diagnostics with the RAW option.
DGN IUN31 11 TASK 8 MSG STARTED
RMV IUN31 11 STOPPED 5
DGN IUN31 11 PH 1 STF (10X00000000 X00000000)
TEST....................
004....................
005 X00000e05....................
006....................
007....................
008....................
009....................
010 X00000e06....................
011....................
016....................
017....................
REPT ARR AUTORST ARR COND RST FOR IUN31 12 STARTED

Having failed to restore IUN31 11 and IUN32 6, ARR now attempts to restore IUN31 12. This automatic action occurs at nearly the same time as the manual diagnostic procedure.
The output from the manual diagnostic request with the RAW option shows IUN32 5 and IUN32 6 as suspected of having faulty ring interfaces, implying that IUN31 11 and IUN31 12 have passed phase 1, a condition that should cause their ring interface states to change to QUSBL.
RST IUN31 12 QUEUED TASK 0 DGN IUN31 11 PH 2 STF (11 X00000000 X00000000) TEST.................................................................... 002..................................................................... 004..................................................................... 005 X00000e06.......................................................... 006..................................................................... 007..................................................................... 008..................................................................... 009..................................................................... 010 X00000e05........................................................ 011..................................................................... 016..................................................................... 017..................................................................... DGN IUN31 11 TERMINATED AT PH 2 STMNT 36 AFTER TEST 17
DGN IUN31 11 COMPLETED STF (21...........)
RST LN31 12 TASK 9
RMV IUN31 12 STOPPED 5
DGN IUN31 12 PH 1 STF (10X00000000 X00000000)
TEST....................
004....................
005 X00000e05....................
006....................
007....................
008....................
DGN IUN31 12 PH 2 STF (11X00000000 X00000000)
TEST....................
004....................
005 X00000e06....................
006....................
007....................
008....................
DGN IUN31 12 TERMINATED AT PH 2 STMNT 36 AFTER TEST 17
ANALY:TLPFILE: IUN31 12 SUMMARY DATA
TLP: IUN31 12 PH=1....................
TLP: IUN31 12 PH=2....................
ANALY TLPFILE IUN31 12 SUSPECT FLTY EQUIPMENT
CODE    GRP  MEM  CONT  POS  WT  NOTE
UN303   32   6    --    --   10  -
UN303   32   5    --    --   10  -
CABLE   --   --   --    --   10  3

ARR restoral request on IUN31 12 started. This is output from ARR's restoral request.
Only the extended TLP message explicitly identifies the node(s) within the isolation that may have failed diagnostic phases 1 and 2.
REPT RING CFR RING CONFIGURATION ESTABLISHED (358 ms) BISO NODE = IUN32 4, EISO NODE = IUN32 7 (403041870)(403042272)
This action was triggered by the automatic RST command, which concludes with a request that as much as possible of an isolated segment be included in the active ring. The isolated segment is now reduced to the two nodes whose ring interfaces are still suspected of being faulty.
DGN IUN 31 12 STF....................
REPT ARR AUTORST ARR COND RST FOR IUN31 12 FAILED
REPT ARR AUTORST CNR UCL RST FOR IUN32 4 STARTED

The new BISO node, having been an innocent victim of the isolation, was out-of-service. Restoring a BISO or EISO node is the highest priority of ARR.

REPT ARR AUTORST CNR UCL RST FOR IUN32 4 SUCCEEDED
RST IUN32 4 COMPLETED
REPT ARR AUTORST ARR COND RST FOR IUN32 5 STARTED

Having previously attempted and failed to restore IUN32 6, ARR now attempts to restore IUN32 5. Consult the section ``Restoral Priorities Rule'' in this chapter for an explanation of ARR's behavior in the remainder of this example.

RST IUN32 5 TASK 0 MSG STARTED
RMV IUN32 5 STOPPED 5
DGN IUN32 5 PH 1 STF (10X00000000 X00000000)
TEST....................
004....................
005 X00000e05....................
006....................
007....................
008....................

This is output from ARR's restoral request for IUN32 5.
DGN IUN32 5 PH 2 STF (11X00000000 X00000000)
TEST....................
004....................
005 X00000e06....................
006....................
007....................
008....................
DGN IUN32 5 TERMINATED AT PH 2 STMNT 36 AFTER TEST 17
ANALY:TLPFILE: IUN32 5 SUMMARY DATA
TLP: IUN32 5 PH=1....................
TLP: IUN32 5 PH=2....................
ANALY TLPFILE IUN31 12 / SUSPECT FLTY EQUIPMENT
CODE    GRP  MEM  CONT  POS  WT  NOTE
UN303   32   6    --    --   10  -
UN303   32   5    --    --   10  -
CABLE   --   --   --    --   10  3
RST IUN32 5 STOPPED 10
DGN IUN32 5 STOPPED COMPLETED
REPT ARR AUTORST ARR COND RST FOR IUN32 5 FAILED
REPT ARR AUTORST ARR UCL RST FOR RPCN32 0 STARTED

Having attempted to restore all nodes whose ring interfaces are possibly faulty, ARR now unconditionally restores the innocent victim RPCN...

RST RPC32 0 COMPLETED
REPT ARR AUTORST ARR UCL RST FOR IUN31 13 STARTED

...and then the innocent victim IUNs. (The ROP output concerning restoral of the innocent victim IUNs is omitted from this example.)

REPT ARR AUTORST ARR UCL RST FOR IUN31 13 SUCCEEDED
RST IUN31 13 COMPLETED
OP:RING;DETD
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAOOAAA 33
32AAAAAiiAAAAA.... 55
63.AAAAAAAAAAAAAAA
OP:RING, IUN31 11
OP:RING IUN31 11 COMPL
IUN31 11: MJ = OOS; NM = MAN; RI = QUSBL; NP = USBL IN ACT RING
OP:RING, IUN31 12
OP:RING IUN31 12 COMPL
IUN31 12: MJ = OOS; NM = MAN; RI = QUSBL; NP = USBL IN ACT RING

Notice that IUN31 11 and IUN31 12 are now quarantined and in the manual mode. They are in the manual mode because ARR previously failed to restore them. They are quarantined (classified as QUSBL) because no diagnostic phases higher than 2 have been run on them and, therefore, IMS cannot know that their ring-interface hardware (except for the hardware tested by phases 1 and 2, that is, the hardware that propagates messages on the ring) is usable.
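A technician's script could pick out the nodes that ARR will no longer handle by reading the status line printed by OP:RING. The sketch below parses that line as it appears in this example; the pattern, the function names, and the rule "out of service and in manual mode" are illustrative assumptions drawn from this section, not a definition of the output format.

```python
import re

# Matches "IUN31 11: MJ = OOS; NM = MAN; RI = QUSBL; NP = USBL IN ACT RING".
# Only the two location phrases seen in this section are recognized.
NODE_STATUS = re.compile(
    r"(?P<node>[A-Z]+\d+ \d+):\s*MJ\s*=\s*(?P<mj>\w+)\s*;\s*NM\s*=\s*(?P<nm>\w+)\s*;"
    r"\s*RI\s*=\s*(?P<ri>\w+)\s*;\s*NP\s*=\s*(?P<np>\w+)\s+IN\s+(?P<where>ACT RING|ISOL SEG)"
)


def parse_node_status(line: str) -> dict:
    """Return the node name and its MJ, NM, RI, NP fields plus ring location."""
    match = NODE_STATUS.search(line)
    if match is None:
        raise ValueError(f"not an OP:RING node status line: {line!r}")
    return match.groupdict()


def needs_manual_restoral(status: dict) -> bool:
    """ASSUMPTION: an out-of-service node in manual mode is left to the technician."""
    return status["mj"] == "OOS" and status["nm"] == "MAN"
```

For both status lines shown above, the sketch reports RI as QUSBL and flags the node for manual restoral, consistent with the note that ARR previously failed to restore them.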
RST:IUN32 6:TLP
Following standard procedures, the technician now assigns priority to performing maintenance on the remaining isolated segment. Choosing IUN32 6 because it was an external isolated node in the massive isolation, the technician changes the circuit pack indicated in the original TLP message and then conditionally restores the node to service. (Although manual restoral requests take priority over automatically requested conditional restorals, the former can occur in parallel with automatically requested unconditional restorals, such as those now occurring. Therefore, the technician felt free to conditionally restore IUN32 6. If a conflict had existed, allowing the rapid recovery of the many innocent victim nodes to proceed without interruption would usually make sense. The decision to conditionally restore IUN32 6 rather than to follow the somewhat slower procedure of running diagnostics on it with the RAW option was dictated by the high probability that IUN32 5 is the other node involved in this instance of low-phase ambiguity.)
REPT ARR AUTORST ARR UCL RST FOR IUN31 14 STARTED
RST:IUN31 11 TASK 1
REPT ARR AUTORST ARR UCL RST FOR IUN31 14 SUCCEEDED
RST IUN31 14 COMPLETED
REPT ARR AUTORST ARR UCL RST FOR IUN31 15 STARTED
RMV IUN31 11 STOPPED 5
REPT ARR AUTORST ARR UCL RST FOR IUN31 15 SUCCEEDED
DGN IUN31 11 COMPL CATP (X00000000 X40000000)

See the OM under DGN IUN, Bit 30, which indicates that not all phases ran because the node under test was not the only isolated node.
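The bit-30 indication mentioned above can be checked mechanically. The sketch below assumes that the bit is counted from the least significant bit (bit 0) of the second hexadecimal word, which is consistent with the printed value X40000000 but should be confirmed against the output message manual; the function names are hypothetical.

```python
import re

CATP_WORDS = re.compile(r"CATP\s*\(X([0-9A-Fa-f]{8})\s+X([0-9A-Fa-f]{8})\)")


def set_bits(word: int) -> list[int]:
    """List the bit positions (0 = least significant) that are set in a 32-bit word."""
    return [bit for bit in range(32) if word & (1 << bit)]


def not_all_phases_ran(line: str) -> bool:
    """Return True when bit 30 of the second CATP word is set.

    ASSUMPTION: bit numbering starts at the least significant bit of the
    second word; verify against the DGN IUN output message description.
    """
    match = CATP_WORDS.search(line)
    if match is None:
        raise ValueError(f"no CATP data words in: {line!r}")
    second_word = int(match.group(2), 16)
    return bool(second_word & (1 << 30))
```

Under that numbering, X40000000 has only bit 30 set, so the completion message above would be flagged as a run in which some phases were not executed.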
RST IUN31 15 COMPLETED
ROP output concerning ARR's unconditional restorals of the remaining innocent victims is omitted from this example.
RST IUN32 6 TASK 2 MSG STARTED
RMV IUN32 6 STOPPED 5
DGN IUN32 6 COMPL CATP (X00000000 X40000000)
REPT RING CFR RING CONFIGURATION ESTABLISHED (338 ms)
NORMAL CONFIGURATION, NO NODES ISOLATED
(403431319)(403431699)
RST IUN32 6 COMPLETED

That IMS is dissolving the remaining isolation, returning the ring subsystem to a two-ring structure, indicates that the fault was located in IUN32 6.

OP:RING;DETD!
OP RING COMP
RING STAT: ACTIVE
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAOOAAA 33
32AAAAAOAAAAAA.... 3
63.AAAAAAAAAAAAAAA
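The rows of the ring status display lend themselves to a simple parse: a two-digit group number, sixteen member positions (members 0 through 15), and optional trailing subnumbers. The sketch below is illustrative only. The legend it uses (A for active, i for isolated, O for out of service, and a period for a position with no node shown) is inferred from how the displays in this section line up with the surrounding text, and the subnumbers are returned uninterpreted.

```python
# ASSUMPTION: legend inferred from the displays and text in this section.
STATE_LEGEND = {
    "A": "active",
    "i": "isolated",
    "O": "out of service",
    ".": "no node shown",
}


def parse_ring_row(row: str) -> dict:
    """Split a row such as '32iA.............. 5' into group, member states, subnumbers."""
    parts = row.split()
    body = parts[0]
    group, states = int(body[:2]), body[2:]
    if len(states) != 16:
        raise ValueError(f"expected 16 member positions, found {len(states)}")
    members = {member: STATE_LEGEND[ch] for member, ch in enumerate(states)}
    subnumbers = parts[1] if len(parts) > 1 else ""
    return {"group": group, "members": members, "subnumbers": subnumbers}


def members_in_state(row: str, state: str) -> list[int]:
    """Return the member numbers in a row whose state matches the given legend value."""
    parsed = parse_ring_row(row)
    return [member for member, value in parsed["members"].items() if value == state]
```

Calling members_in_state on the group 31 and group 32 rows of the display above with the state "out of service" is one way to list the nodes that still await manual restoral.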
RST:IUN31 12!
Now the only task remaining for the technician is to conditionally restore the remaining out-of-service nodes, none of which will be handled by ARR, since they are all in the manual mode. Probably none of the out-of-service nodes will contain faults, since one has had its ring-interface circuit pack replaced and the other two were designated as possibly faulty as a result of low-phase ambiguity. Nevertheless, the technician restores them conditionally to be certain that a fault undetected in one of them does not lead to another massive isolation. If, while diagnostics are run on these nodes, a fault were to appear elsewhere in the ring, IMS would avoid a massive isolation by immediately returning the node being diagnosed to the active ring.
RST IUN31 12 TASK 2 MSG STARTED
RMV IUN31 12 STOPPED 5
RST IUN31 11 COMPLETED
REPT RING CFR RING CONFIGURATION ESTABLISHED (308 ms)
BISO NODE = IUN31 10, EISO NODE = IUN31 13
(403490173)(403490559)

The predictable action that concludes this example is not reproduced.
Automatic Recovery from Two Intermittent Faults

In the following example of ring maintenance, two staggered intermittent faults occur at intervals that frustrate successive EAR recovery attempts by repeatedly violating the 5-second confidence intervals. In this manner the faults drive EAR to level 4 before it can establish a stable, usable ring. The sequence of automatic actions culminates in a restored system. It therefore requires the technician only to record the occurrences and locations of the two intermittent faults. This episode occurs in the following ring:
CMD>
-- 1105 RING STATUS SUMMARY --
00AAAAAAAAAAAA....
01................
02................
30................
31.AAAAAAAAAAAAAAA
32AAAAAAAAAAAA....
63.AAAAAAAAAAAAAAA
CMD 400
FUNCTION OP RING DETAILED
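The 1105 summary rows can also be read mechanically. The following is a minimal sketch, not a Lucent tool: it assumes that each row is a two-digit group number followed by one state character per member, that members are numbered from 0, and that `A`, `O`, and `.` mean active, out of service, and unequipped. None of these meanings is stated on the display itself, and the function names are hypothetical. The sample rows are transcribed from the earlier display in this example, the one that shows out-of-service nodes.

```python
# Sketch only: the symbol meanings and member numbering are assumptions.
ASSUMED_MEANING = {"A": "active", "O": "out of service", ".": "unequipped"}


def parse_summary_row(row):
    """Split one summary row into (group, states); states[i] is member i."""
    group = int(row[:2])
    parts = row[2:].split()          # drop any trailing annotation after a space
    states = parts[0] if parts else ""
    return group, states


def nodes_in_state(rows, symbol):
    """Return (group, member) pairs whose state character equals symbol."""
    found = []
    for row in rows:
        group, states = parse_summary_row(row)
        found.extend((group, member)
                     for member, ch in enumerate(states) if ch == symbol)
    return found


rows = [
    "00AAAAAAAAAAAA....",
    "31.AAAAAAAAAAOOAAA 33",
    "32AAAAAOAAAAAA.... 3",
    "63.AAAAAAAAAAAAAAA",
]

for group, member in nodes_in_state(rows, "O"):
    print(f"group {group:02d} member {member}: {ASSUMED_MEANING['O']}")
```

Under these assumptions, the members flagged in groups 31 and 32 are the nodes that a technician would look up on the detailed display before entering a restore request.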
REPT RING CFR
LEVEL 0 RING CONFIGURATION INITIATED BY EAR
NORMAL CONFIGURATION REQUESTED
0 1 4 3600000.......................(4034364845)

A ring-related fault stimulates EAR to a level-0 attempt (restart) to recover the ring.

REPT RING CFR
RING CONFIGURATION ESTABLISHED (468 ms)
NORMAL CONFIGURATION, NO NODES ISOLATED (4034364857)(4034365210)

The restart succeeds initially, but...

REPT RING TRANSPORT ERR
RAC PARITY/FORMAT ERROR DETECTED, IUN31 11 RAC 0
.......................................................................
............................................(4034364730)
REPT RING TRANSPORT ERR
BLOCKAGE DETECTED, IUN31 09 RAC 0
.......................................................................
............................................(4034364740)
REPT RING TRANSPORT ERR
BLOCKAGE DETECTED, IUN31 10 RAC 0
.......................................................................
............................................(4034364745)
REPT RING CFR
LEVEL 1 RING CONFIGURATION INITIATED BY EAR
ISOLATION FROM IUN31 11 TO IUN31 12 REQUESTED
0 1 4 3600000.......................... (4034368158)

...another fault occurs less than 3 seconds into the recovery, thereby driving EAR to escalate to a level-1 attempt to isolate the faulty node.

REPT RING CFR
RING CONFIGURATION ESTABLISHED (437 MS)
BISO NODE = IUN31 10, EISO NODE = IUN31 12 (4034368175)(4034368492)

The isolation succeeds momentarily, but...

REPT RING TRANSPORT ERR
RAC PARITY/FORMAT ERROR DETECTED, IUN31 11 RAC 0
.......................................................................
............................................(4034368041)
REPT RING TRANSPORT ERR
BLOCKAGE DETECTED, IUN31 09 RAC 0
.......................................................................
............................................(4034368051)
REPT RING TRANSPORT ERR
BLOCKAGE DETECTED, IUN31 10 RAC 0
.......................................................................
............................................(4034368056)
REPT RING TRANSPORT ERR
UNEXPLAINED LOSS OF TOKEN REPORTED ON BOTH RINGS.
REPT TOKEN TRACK
TOKEN WAS LOST BETWEEN IUN32 5 AND IUN32 6 ON RING: 0
REPT RING CFR
LEVEL 3 RING CONFIGURATION INITIATED BY EAR
0 1 4 3600000.............................(4034373503)

...within the confidence interval the 3B21D receives notice that the token is lost without receiving other error reports. The token-track module reports the probable location where the token left the ring. When unexplained loss of token occurs during the confidence interval of level 0 or 1, EAR jumps to level 3.
REPT RING CFR
RING CONFIGURATION ESTABLISHED (1302 MS)
NORMAL CONFIGURATION, NO NODES ISOLATED (4034374032)(4034374330)

EAR level 3 tests for continuity in the rings. Because the tests succeed, EAR directs ring configuration to establish the normal, two-ring structure. The success of the ring continuity tests is the first clear indication that the recent faults are transient in nature. But again the recovery does not outlast the confidence interval, so EAR escalates to level 4.

REPT RING CFR
LEVEL 4 RING CONFIGURATION INITIATED BY EAR
0 1 4 3600000..............................(4034376599)
REPT RING CFR
RING CONFIGURATION ESTABLISHED (8169 MS)
NORMAL CONFIGURATION, NO NODES ISOLATED (4034384478)(4034384790)

Level 4 also finds continuity in the rings and directs ring configuration to establish the normal, two-ring structure. In this instance the recovery outlasts the confidence interval, thereby ending this episode of EAR escalation. Evidently the episode was triggered by two transient faults. The location of one fault is suggested by the short-lived, level-1 isolation of IUN31 11. The location of the other was identified by token track as between IUN32 5 and IUN32 6. The technician who witnesses these events should record the occurrences and locations of the two intermittent faults and perhaps should retain the ROP output of this unusual episode.
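The escalation seen in this episode can be summarized as a small decision function. This is a minimal sketch, not the EAR implementation, and it is based only on the behavior visible in this example. The function name, the event labels, and the elapsed times in the replay are illustrative. Two rules are assumptions, and are marked as such in the code: that any other fault inside the confidence interval raises EAR by one level, and that escalation is capped at level 4.

```python
CONFIDENCE_INTERVAL_S = 5.0   # the 5-second confidence interval from the text


def next_ear_level(level, event, seconds_since_config):
    """Return the next EAR level, or None if the escalation episode ends.

    event is "fault", "token_loss", or None (nothing happened).
    """
    if event is None or seconds_since_config >= CONFIDENCE_INTERVAL_S:
        return None               # recovery outlasted the confidence interval
    if event == "token_loss" and level in (0, 1):
        return 3                  # unexplained loss of token at level 0 or 1
    return min(level + 1, 4)      # ASSUMPTION: one level up, capped at 4


# Replay of the episode; the elapsed times are illustrative, not measured.
level = 0
history = [level]
for event, elapsed in [("fault", 2.9), ("token_loss", 2.0),
                       ("fault", 2.0), (None, 5.0)]:
    following = next_ear_level(level, event, elapsed)
    if following is None:
        break
    level = following
    history.append(level)

print(history)   # [0, 1, 3, 4]
```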
4

Contents

Introduction
Ring Fault Conditions and Maintenance Approach
- Ring Node Out-of-Service
- Ring Node OOS Maintenance Approach
- Single-Ring Node Isolation
- Single Node Isolation Maintenance Approach
- Multiple-Ring Node Isolation
- Multiple Node Isolation Maintenance Approach
- Ring Down
- Ring Down Maintenance Approach
Ring Generic Access Package (RGRASP)
- Feature Definition
- Purpose
- Incompatibilities
- Interactions
- Changes
- Feature Description
- Release Availability
- Provisioning
- Special Planning Considerations
- Hardware
- Software Impact
- Software Description
- User Profile
- Description of Feature Operation
- Initial Setup
- Setting a Breakpoint
- Loading Memory
- Reading Memory
- Loading and Dumping RGRASP Utility Variables (UVARs)
- Feature Activation
- Feature Deactivation
- Equipment Configuration Data (ECD)
- Recent Change Procedures
- Measurement
- Network Management Impact
- Maintenance/Troubleshooting Impact
- Recording
- Output Messages
- Audits
- Critical Events
- Support Tools
- Related Documentation
- Cross-References
Introduction
This guide serves as an aid in performing ring and ring hardware maintenance functions. It contains procedures used in detecting, troubleshooting, and clearing faults associated with the ring and ring hardware. The procedures detailed in this guide are only guidelines for resolving ring-associated maintenance problems; they are not the only methods that may be used in performing ring maintenance.

A system called trace provides a formal mechanism for embedding tracepoints within application code for use in testing and debugging. The system collects the trace messages produced by individual tracepoints and forwards them to one or more destinations, including log files, ROPs, and MCRTs. The tracepoints are controlled, so a related group scattered throughout the software can be turned on or off at will. The parameters can also be set and changed using craft commands. The trace system is created automatically during initialization; the user may also create it manually. The tracepoints are designed to generate little overhead when disabled, but when used improperly, the trace system can consume large amounts of system resources while yielding little useful information. Craft commands allow one either to inhibit all tracepoints, so that no trace messages are generated and the trace system uses little overhead, or to enable subsets of the tracepoints, thus restricting trace output to only that dealing with selected portions of application code.

ALW:TRACE and INH:TRACE provide the basic on/off switch for trace. Until ALW:TRACE is invoked, no trace messages can be generated and logged under any circumstances. Similarly, once INH:TRACE is invoked, trace becomes totally dormant except for a certain amount of fixed overhead. If trace is inhibited, the SET:TRACE command allows one to specify which tracepoints are active once trace is again enabled; if trace is active, the command allows one to control the tracepoints during operation. The OP:TRACE command presents a summary of the current status of trace. The output message REPT TRACE reports a tracepoint from a 3B21D computer process or a node processor. The output message REPT TDTP indicates that the trace process has encountered a hardware or software fault. It should also be noted that the trace process is terminated when the system enters disk independent operation; see the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.

Ring maintenance functions for an office serve to detect, troubleshoot, and clear all fault conditions associated with the ring and ring hardware. The most common fault conditions associated with the ring are the following:

- Ring node out-of-service (OOS)
- Single ring node isolation
- Multiple ring node (RN) isolation
- Ring down.

Another less common fault condition on the ring is unexplained loss of token. These fault conditions are discussed in the remainder of this section. For additional information on ring maintenance. Direct link nodes (DLNs) follow the same guidelines as link nodes (s) in this section. CDN-I nodes also follow these guidelines, except when removing ring application processor (RAP) circuit packs, which require that the power be turned off before circuit pack (CP) extraction.
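The relationship among the trace controls described in this Introduction can be modeled in a few lines. This is a minimal sketch of the gating logic only, under the assumption that a tracepoint produces output exactly when trace is allowed and the tracepoint is in the selected set. The class, its method names, and the tracepoint names are hypothetical, and the sketch does not reproduce the syntax or parameters of the real craft commands.

```python
class TraceControl:
    """Toy model of the trace on/off switch and tracepoint selection."""

    def __init__(self):
        self.allowed = False        # no output until ALW:TRACE is invoked
        self.active_points = set()  # tracepoints selected with SET:TRACE

    def alw_trace(self):
        """Models ALW:TRACE: turn the master switch on."""
        self.allowed = True

    def inh_trace(self):
        """Models INH:TRACE: trace becomes dormant."""
        self.allowed = False

    def set_trace(self, points):
        """Models SET:TRACE: accepted whether or not trace is inhibited."""
        self.active_points = set(points)

    def emits(self, point):
        """True if a tracepoint would produce a trace message right now."""
        return self.allowed and point in self.active_points


trace = TraceControl()
trace.set_trace({"ring_cfg"})        # selected while trace is still inhibited
print(trace.emits("ring_cfg"))       # False: ALW:TRACE not yet invoked
trace.alw_trace()
print(trace.emits("ring_cfg"))       # True
print(trace.emits("other_point"))    # False: not in the selected subset
```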
Ring Fault Conditions and Maintenance Approach

The information contained in this guide provides a maintenance approach for each ring fault condition listed above. These guidelines should be used only after the automatic ring recovery (ARR) has completed its attempt or has restored faulty ring nodes. For additional information concerning the use of ARR, refer to the Maintenance Description section in this manual.
Ring Node Out-of-Service

A ring node can be removed from active service and placed in the out-of-service (OOS) state for many reasons. An RN may be placed in either of the OOS maintenance states: OOS-NORMAL or OOS-ISOLATED. When a node is placed in the OOS-ISOLATED state, the node is first removed from service (OOS-NORMAL) and then isolated from the active ring (OOS-ISOLATED). When a node is removed from service for maintenance, or because of a detected fault that does not interfere with the operation of system functions, the node may be taken OOS-NORMAL. The isolated node is not able to communicate or perform normal node functions with the ring, but is capable of performing and handling maintenance functions. In the OOS-NORMAL state, the node is said to be quarantined. The OOS maintenance states may be observed from the maintenance CRT (MCRT) on the 1106 display page. For additional information concerning OOS nodes in the quarantine state, refer to the Maintenance Description section in this manual.
Ring Node OOS Maintenance Approach

This maintenance approach provides information that aids in diagnosing nodes, correcting faults, and restoring nodes to active service. When a node is quarantined, it is not allowed to communicate with either the 3B21D computer or the ring, and the state of the ring interface is quarantine usable (QUSBL). To verify this state, refer to the OP:RING command in the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual. When a node is in the OOS (quarantined) state, the most likely cause of the failure is the node processor (NP) or link interface. Listed below are guidelines to be used in troubleshooting, correcting, and restoring quarantined nodes to service.

Assumption: An equipment malfunction has been detected, and the fault recovery software has removed the node from service and placed it in the OOS-NORMAL maintenance state, where xx and yy are active nodes. The ARR has attempted to restore the node to service and has failed (manual action is required).
Figure 4-1. Ring OOS Normal
(Node order shown: xx, OOS-NORM, yy)
Procedure 4-1. Ring Node OOS Maintenance Guidelines

1. Determine the reason(s) the node has been taken OOS and placed in the quarantine state. Diagnose the faulty (OOS-NORM) node. Use the guidelines presented in Chapter 6, Diagnostic User's Guide.
Does the node remain OOS-NORMAL?
No: DONE.
Yes: Proceed to the next step.

2. If the node remains OOS-NORMAL, then starting with the OOS-NORMAL node, isolate and replace all RN CPs in the order of the NP, the link interface, ring interface 0 (RI0), and RI1, and then perform a conditional restore. For very large scale integration (VLSI) RNs, replace the integrated ring node (IRN) circuit pack and then the link interface. If the trouble clears after replacing the CPs in the order listed, then when office traffic is minimal, the original CP(s) should be reinserted one at a time in the node, and diagnostics should be run to determine the faulty CP(s). If the diagnostics fail to detect the faulty CP(s), but the previous CP replacements cleared the trouble, then the CP(s) should be saved, noting the failure conditions. Inform the CTS of this condition.

3. After replacing the CP(s), if the node still remains OOS, then check the equipment for shorts, loose wiring, bent or broken pins, etc., and correct any problems discovered. Also, check to see if proper equipment has been used with the long message option.

4. Diagnose node (xx) adjacent to the faulty node using the guidelines in Chapter 6, Diagnostic User's Guide. If problems are located, correct and restore node (xx) to service.

NOTE: Perform an unconditional restore on the OOS-NORMAL node using the command
RST:nodexx y;UCL
where:
For LN
node = LN
xx = group number
y = node member number
UCL = restores the node without performing diagnostics.
For RPCN
node = RPCN
xx = group number
y = 0
UCL = restores the node without performing diagnostics.

CAUTION:
Do not perform an unconditional restore unless one of the following has occurred:
- A complete diagnostic has produced an all-tests-passed (ATP) response.
- A complete diagnostic has produced a conditional all-tests-passed (CATP) response, and the RI and the NP minor states are both usable (USBL).

Does the faulty node remain OOS-NORMAL?
No: DONE.
Yes: Proceed to the next step.

5. Diagnose node (yy) adjacent to the faulty node. If problems are located, correct and restore node (yy) to service.

NOTE: Perform an unconditional restore on the OOS-NORMAL node using the command
RST:nodexx y;UCL
where:
For LN
node = LN
xx = group number
y = node member number
UCL = restores the node without performing diagnostics.
For RPCN
node = RPCN
xx = group number
y = 0
UCL = restores the node without performing diagnostics.

CAUTION:
Do not perform an unconditional restore unless one of the following has occurred:
- A complete diagnostic has produced an ATP response.
- A complete diagnostic has produced a CATP response, and the RI and the NP minor states are both USBL.

Does the faulty node remain OOS-NORMAL?
No: DONE.
Yes: Proceed to the next step.

6. If all attempts fail to clear the OOS node, then detailed testing is required. Call the CTS.
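The CAUTION in Procedure 4-1 amounts to a precondition on the unconditional restore, which can be sketched as a guard in front of the command text. This is an illustrative sketch, not a Lucent tool: the function names are hypothetical, and the zero-padded two-digit group number is an assumption based on the "xx" placeholder in the command template.

```python
def may_restore_unconditionally(diag_result, ri_state=None, np_state=None):
    """Apply the CAUTION: ATP alone, or CATP with RI and NP both USBL."""
    if diag_result == "ATP":
        return True
    if diag_result == "CATP":
        return ri_state == "USBL" and np_state == "USBL"
    return False


def rst_ucl_command(node_type, group, member):
    """Fill in the RST:nodexx y;UCL template (padding is an assumption)."""
    if node_type == "RPCN" and member != 0:
        raise ValueError("for an RPCN the member number y is 0")
    return f"RST:{node_type}{group:02d} {member};UCL"


# Example: diagnostics returned CATP and both minor states are usable.
if may_restore_unconditionally("CATP", ri_state="USBL", np_state="USBL"):
    print(rst_ucl_command("LN", 31, 12))   # RST:LN31 12;UCL
```

Keeping the check separate from the command builder means the same guard can be reused wherever the procedures call for an unconditional restore.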
Single-Ring Node Isolation

A single node isolation is a condition on the ring in which one node is in the out-of-service isolated (OOS-ISOLATED) maintenance state and has been configured out of the active ring. The isolated node is enclosed by a beginning of isolation (BISO) node and an end of isolation (EISO) node. Single node isolation can be caused by faulty node processors (NPs), ring interfaces (RIs), link interfaces, interframe buffers (IFBs), IRNs, or cabling and/or backplane faults. An isolation on the ring is a serious problem, and immediate steps must be taken to correct it.
Single Node Isolation Maintenance Approach

This maintenance approach provides information that aids in diagnosing nodes, correcting faults, and restoring single-node isolations on the ring, while minimizing interference with service. When a single node is isolated on the ring, maintenance functions (diagnostics, CP replacement, etc.) must be performed. The node in isolation should first be diagnosed, its faults corrected, and the node restored to service and included back into the active ring. If the isolation still exists after diagnosing the node, changing CPs as required, and correcting any known problems, check the associated IFBs (if applicable), cabling, backplane, etc., for possible causes of the isolation. If the fault is not corrected by troubleshooting and working with the node in isolation, then perform maintenance on the BISO node. Should this also fail to clear the isolation, then proceed to the EISO node.

Performing maintenance actions on the BISO or EISO node would normally mean extending the isolated segment to contain the BISO or EISO node in addition to the original faulty node. In the small ring case this cannot be done, since it would result in a ring with insufficient storage capacity. Special attention must be given to a small ring: the isolation must be moved to include only the faulty node. Reconfigure the ring by entering the following input message:
CFR:RING,a;MOVFLT
where:
a = either an RPCN (RPCNxx y) or an (xx y)
MOVFLT = command to move the isolation to the faulty node

See the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual for further information and for an explanation of the response to this message.

Assumption: An equipment malfunction has been detected, and the fault recovery software has removed a single node from service, reconfigured the ring, and formed an isolation around the faulty node. The ARR has attempted to restore the node to service and has failed (manual action is required). The following diagram depicts a single node isolation.
Figure 4-2. Single Node Isolation
(Node order shown: BISO, isolated, EISO)
Procedure 4-2. Single-Ring Node Isolation Maintenance Guidelines

1. Diagnose the isolated and faulty node using the diagnostic guidelines listed in Chapter 6, Diagnostic User's Guide. If the isolation still exists after using these guidelines, proceed to the next step.

2. If, after diagnosing and troubleshooting the isolated node, the node does not restore to active service (thereby eliminating the isolated segment), diagnose the BISO node using the guidelines listed in Chapter 6, Diagnostic User's Guide. If the ring is too small to allow the adjacent nodes to be isolated, the isolation must be moved.

To diagnose the BISO node, the node must be excluded from the active ring. To accomplish this, use the RMV command. See the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual. When the BISO node is removed from service (OOS-NORM), it is automatically included in the isolated segment (OOS-ISOLATED).

The application may restrict the RMV request. If the request is accepted, proceed with diagnostics as usual. If the request is denied, it may be necessary to input the command to remove the application's node from service and to diagnose the node. Put the signaling link (SLK) in the AVAILABLE-Manual Out-of-Service (MOOS) state by typing the following message into the MCRT, and proceed with diagnostics as usual:
CHG:SLK (a, b, [c, d]); MOOS
where:
a = group number (00 - 63)
b = member number (01 - 15)

The following message should appear on the MCRT:
CHG SLK a b [c d] NEW REQUESTED MINOR STATE = MOOS
where:
a = group number (00 - 63)
b = member number (01 - 15)
c = LI4 circuit pack (0 - 1)
d = LI4 port (0 - 3)

When the BISO node is configured into the isolated segment of the ring, a new BISO node is established. Once this occurs, the old BISO node can be diagnosed as any other node in an isolated segment.

Figure 4-3. New BISO Established
(Node order shown: BISO, OLD BISO isolated, isolated, EISO)

NOTE: After diagnosing and clearing problems associated with the old BISO node, restore it to service using the guidelines for restoring all other nodes. If the problem with the isolation was associated with the BISO node and has been corrected, then the node is included back into the active ring, which also restores the isolated segment and includes it in the active ring. If the node is OOS-NORMAL, and the isolation has cleared, then unconditionally restore the OOS-NORMAL node to service. Refer to ``Ring Node OOS Maintenance Approach'' in this chapter.

CAUTION:
If the SLK was manually removed from service, put it back in the AVAILABLE-In Service (IS) or AVAILABLE-Standby (STBY) state by typing the following message into the MCRT:
CHG:SLK (a, b, [c, d]);{IS | ARST}
where:
a = group number (00 - 63)
b = member number (01 - 15)

The following message should appear on the MCRT:
CHG SLK a b [c d] NEW REQUESTED MINOR STATE = e
where:
a = group number (00 - 63)
b = member number (01 - 15)
c = LI4 circuit pack (0 - 1)
d = LI4 port (0 - 3)

If the isolation still exists, proceed to the next step.

3. If the isolation on the ring still exists after diagnosing and troubleshooting the BISO node, restore the old BISO node, and then diagnose and troubleshoot the EISO node using the guidelines used in diagnosing the BISO node (Step 2 above).

Figure 4-4. Diagnosing EISO Node
(Node order shown: BISO, isolated, OLD EISO isolated, NEW EISO)

NOTE: After diagnosing and clearing any problems associated with the old EISO node, restore it to service if an ATP response is received for all phases. If the fault was found in the EISO node, then the isolation should clear, leaving the original faulty node in the OOS-NORMAL state. See Figure 4-4. If the node is OOS-NORMAL, and the isolation has cleared, then refer to ``Ring Node OOS Maintenance Approach,'' and unconditionally restore the node to service.

If the isolation still exists, proceed to the next step.

4. If the ring isolation is not cleared, then starting with the single isolated node, replace all RN CPs in the order of ring interface 0 (RI0), RI1, the NP, and the link interface, and perform a conditional restore. For VLSI RNs, replace the IRN circuit pack and then the link interface. If the trouble clears after replacing the CPs in the order listed, then when office traffic is minimal, the original CPs should be reinserted one at a time in the node and diagnostics run to determine the faulty CP(s). If the diagnostics fail to detect the faulty CP, but the previous CP replacements cleared the trouble, then the CP(s) should be saved, noting the failure conditions. Inform the CTS of the condition.

If the trouble is located and corrected, leaving the original isolated node in the OOS-NORMAL maintenance state, then refer to ``Ring Node OOS Maintenance Approach'' in this chapter to complete this approach. If the isolation still exists, proceed to the next step.

5. Visually inspect the affected equipment for shorts, bent or broken backplane pins, etc. Correct any problems that are uncovered. Diagnose the equipment, and unconditionally restore it to service if an ATP response is received for all phases run. If the isolation clears, and the node is OOS-NORMAL, refer to ``Ring Node OOS Maintenance Approach'' in this chapter. If the isolation still exists, proceed to the next step.

6. Contact the CTS.
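The parameter ranges given for the CHG:SLK message in Procedure 4-2 can be checked before a request is typed at the MCRT. The following is a minimal sketch of such a check. The class name and field names are hypothetical, and the rule that the LI4 circuit pack and port are supplied together or not at all is an assumption drawn from the bracketed "[c, d]" notation. The sketch validates values only; it does not attempt to reproduce the exact input syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlkAddress:
    """Signaling link parameters with the ranges listed in the procedure."""
    group: int                      # a = group number (00 - 63)
    member: int                     # b = member number (01 - 15)
    li4_pack: Optional[int] = None  # c = LI4 circuit pack (0 - 1)
    li4_port: Optional[int] = None  # d = LI4 port (0 - 3)

    def __post_init__(self):
        if not 0 <= self.group <= 63:
            raise ValueError("group number must be 00 - 63")
        if not 1 <= self.member <= 15:
            raise ValueError("member number must be 01 - 15")
        # ASSUMPTION: c and d form an optional pair.
        if (self.li4_pack is None) != (self.li4_port is None):
            raise ValueError("LI4 pack and port must be given together")
        if self.li4_pack is not None and not 0 <= self.li4_pack <= 1:
            raise ValueError("LI4 circuit pack must be 0 - 1")
        if self.li4_port is not None and not 0 <= self.li4_port <= 3:
            raise ValueError("LI4 port must be 0 - 3")


link = SlkAddress(group=31, member=12, li4_pack=0, li4_port=3)
print(link)
```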
Multiple-Ring Node Isolation

The steps listed here are guidelines used to clear faults associated with multiple node failures on the ring. These guidelines are not the only approach to correcting this problem, but they provide basic steps that may be effective in clearing the trouble(s). The multiple isolation guidelines provide two maintenance approaches that may be used in correcting multiple faults on the ring. The first approach (A) provides guidelines that aid in reducing the time required to replace CPs and to restore the ring to an operational state. The second approach (B) details guidelines that should be used when the load on the CNI is minimal. The first approach is not intended to be used as the total maintenance approach, and should be used only when time does not allow for diagnostic testing. Otherwise, approach B should be used whenever possible.
Multiple Node Isolation Maintenance Approach

A multiple node isolation occurs when two or more failures occur on the ring, causing a potentially large isolated segment. This maintenance approach provides information that aids in testing, repairing, and restoring nodes in isolation so as to minimize the effect on service. When there is an isolated segment of multiple nodes, with an established BISO and EISO node, the most probable faulty nodes are the isolated nodes adjacent to the BISO and EISO nodes. This is assumed because, when the system attempts to recover from ring error conditions, the BISO and EISO nodes of a multiple node isolation are most likely to be established adjacent to the faulty nodes. Therefore, by troubleshooting the nodes adjacent to the BISO and EISO nodes, faults are corrected with the least amount of time and service interruption. For a more complete explanation of BISO and EISO node information, refer to the Maintenance Description section in this manual.

Assumption: An equipment malfunction has been detected, and the fault recovery software has removed multiple nodes from service, reconfigured the ring, and formed an isolated ring segment around the faulty nodes. The ARR has attempted to restore the nodes to service and has failed.

NOTE: If multiple nodes are isolated within a segment, the test approach is to diagnose the isolated node adjacent to the BISO node first, and then the isolated node adjacent to the EISO node. See Figure 4-5. Next, the nodes (xx and yy) must be diagnosed. After these nodes are diagnosed, the BISO and then the EISO nodes are diagnosed. Nodes are diagnosed in this manner because the most probable trouble nodes are next to, or close to, the BISO and EISO nodes. There may be other nodes within the isolated segment that are not faulty but are included in the isolated segment because they lie between the two faulty nodes.
When performing maintenance on a multiple node isolation, one should attempt to clear problems associated with either the BISO or the EISO end of the segment to form a single node isolation. Once the single node isolation has been established, follow the single-node isolation test approach. It has been determined that there are two or more faulty nodes in an isolated segment, and all faulty nodes have been removed from service and isolated from the active ring.
Figure 4-5. Two or More Faulty Nodes
(Node order shown: BISO, iso 0, xx, yy, zz, iso 1)

The xx, yy, and zz represent nodes that are in the isolated segment and may or may not be faulty.
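The diagnosis order described in the NOTE for a multiple node isolation can be expressed as a short function. This is a minimal sketch under stated assumptions: the segment is given as a list of node names running from the BISO node to the EISO node inclusive, the function name is hypothetical, and every interior node is treated like "xx and yy" in the NOTE. Procedure 4-4 walks the segment in a somewhat different order, so this sketch reflects the NOTE only.

```python
def diagnosis_order(segment):
    """Order nodes for diagnosis: the isolated node next to BISO, the isolated
    node next to EISO, the interior nodes, then BISO and finally EISO."""
    if len(segment) < 4:
        raise ValueError("need BISO, at least two isolated nodes, and EISO")
    biso, *isolated, eiso = segment
    next_to_biso, next_to_eiso = isolated[0], isolated[-1]
    interior = isolated[1:-1]
    return [next_to_biso, next_to_eiso, *interior, biso, eiso]


segment = ["BISO", "iso 0", "xx", "yy", "zz", "iso 1", "EISO"]
print(diagnosis_order(segment))
# ['iso 0', 'iso 1', 'xx', 'yy', 'zz', 'BISO', 'EISO']
```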
Procedure 4-3. Multiple-Ring Node Isolation Maintenance Guidelines - A

This maintenance approach does not detail direct procedures, but instead gives the user an understanding of what may be done differently from Approach B to reduce the time consumed in restoring the ring and ring hardware.

1. Have ``tested good'' link node CPs available.

2. When a multiple fault occurs that isolates two or more nodes, causing innocent nodes to become OOS and included in an isolated segment as depicted in the diagram above (xx, yy, zz), perform the following:
a. Replace all CPs within the node at either end of the isolated segment, and perform a conditional restore on the node. Be certain to place all replaced CPs in protective static packaging.
b. After problems are cleared at either end, and the isolation clears or is reduced in size, the innocent OOS nodes should restore to active service automatically, possibly leaving only a single isolated node at the other end.

3. Diagnose and correct all problems associated with the node left isolated. Troubleshoot the node in this manner to avoid including innocent nodes in the isolated segment.

4. When office traffic is minimal, reinsert the original CPs in the faulty node where the CPs were originally replaced, and diagnose (troubleshoot) it until the faulty CP(s) are located.

5. Place all other CPs in the original static wrapping, and store them (the ``tested good'' CPs) for possible future faults.
Procedure 4-4. Multiple-Ring Node Isolation Maintenance Guidelines - B

1. Diagnose iso 0 using the guidelines listed in Chapter 6, Diagnostic User's Guide.

NOTE: If the fault in iso 0 is corrected and the node is restored to service, then the isolated segment of the ring is shortened. This creates a new BISO node and changes the condition from a multiple node isolation to a single node isolation, restoring all the innocent OOS nodes.

Does the original isolation still exist, or is iso 0 OOS-NORMAL? If an isolation still exists, but has been shortened, and iso 0 is OOS-NORMAL and known to be usable, unconditionally restore iso 0 to service, and then proceed to Step 6. Use one of the following commands to restore the node:
- For s, enter RST:xx y;UCL!
- For RPCN, enter RST:RPCNxx yy;UCL
where:
xx = group number
y = node member number
UCL = restores the node without performing diagnostics.

CAUTION:
If iso 0 remains OOS-NORMAL, refer to ``Ring Node OOS Maintenance Approach'' in this chapter.
If the original isolation still exists, proceed to the next step.

2. Diagnose node xx using the guidelines detailed in Chapter 6, Diagnostic User's Guide.
If node iso 0 is in the OOS-NORMAL state, and the original BISO node no longer exists after diagnosing and repairing node xx, then refer to ``Ring Node OOS Maintenance Approach.'' If the above statement is true, and all problems are corrected concerning these nodes, then a single node isolation may be formed, including a new BISO node, iso 1, and the EISO node. If this occurs, then refer to ``Single Node Isolation Maintenance Approach'' for the remainder of these guidelines.

If the original isolation still exists after diagnosing node xx and correcting any problems, then repeat Steps 1 and 2 using nodes iso 1 and yy. If the original isolation still exists, then proceed to the next step.

3. Diagnose the BISO node.

NOTE: The BISO node is an active node on the ring. To diagnose the BISO node, the node must be excluded from the active ring. See Figure 4-6. To accomplish this, use the RMV command. See the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual. When the BISO node is removed from service (OOS-NORM), it is automatically included in the isolated segment (OOS-ISOLATED).
Figure 4-6. New BISO Node
(Node order shown: NEW BISO, OLD BISO iso, iso 0, xx, yy, zz, iso 1, EISO)
The RMV request may or may not be accepted. If the request is accepted, proceed with diagnostics as usual, using the guidelines listed in Chapter 6, Diagnostic User's Guide. If the request is denied, it may be necessary to remove the node and SLK from service, and then diagnose the node. To put the SLK in the AVAILABLE-MOOS state, type the following message into the MCRT, and proceed with diagnostics as usual:
CHG:SLK (a, b, [c, d]); MOOS
where:
a = group number (00 - 63)
b = member number (01 - 15)

The following message should appear on the MCRT:
CHG SLK a b [c d] NEW REQUESTED MINOR STATE = MOOS
where:
a = group number (00 - 63)
b = member number (01 - 15)
c = LI4 circuit pack (0 - 1)
d = LI4 port (0 - 3)

NOTE: After diagnosing the BISO node and clearing any problems located, restore the node to service using the guidelines for restoring all other nodes. After diagnosing the BISO node, if problems are found and corrected, and if an ATP response is received, the BISO node may be deleted, leaving the iso 0 node in the OOS-NORMAL state. If this occurs, restore iso 0 to service. Refer to ``Ring Node OOS Maintenance Approach'' in this chapter.
CAUTION:
If problems are corrected with the BISO, iso 0, and xx nodes, then the isolated segment of the ring should shorten, leaving only a single isolated node. If this occurs, refer to ``Single Node Isolation Maintenance Approach'' in this chapter for the remainder of this test.

If the SLK was manually removed from service, put it back in the AVAILABLE-IS or AVAILABLE-STBY state by entering the following message at the MCRT:
CHG:SLK (a, b, [c, d]);{ IS | ARST}
where:
a = group number (00 - 63)
b = member number (01 - 15)

The following message should appear on the MCRT:
CHG SLK a b [c d] NEW REQUESTED MINOR STATE = e
where:
a = group number (00 - 63)
b = member number (01 - 15)
c = LI4 circuit pack (0 - 1)
d = LI4 port (0 - 3)

4. If the original ring isolation still exists, then starting with node iso 0, then xx, and finally the BISO node, replace all RN CPs in this order: ring interface 0 (RI0), RI1, the NP, and the link interface. Perform a conditional restore. For VLSI RNs, replace the IRN circuit pack and then the link interface. If the trouble clears after replacing the CPs in the order listed, the original CPs should be reinserted one at a time in the node and diagnostics run to determine the faulty CP(s). If the diagnostics fail to detect the faulty CP(s), but the previous CP replacement cleared the trouble, then the CP(s) should be saved, noting the failure conditions. Inform the CTS of the condition.

5. If the original ring isolation still exists, visually inspect the affected equipment for shorts, bent or broken pins, backplane faults, etc. Also ensure that proper equipment has been used with the long message option. If problems are located, correct the problems and perform a conditional restore on the affected equipment.

6. If the isolation still exists, or if all problems with the original BISO node, the iso 0 node, and node xx have been cleared, diagnose and attempt to correct problems associated with nodes iso 1, yy, and the EISO node, using Steps 3 through 5 of these guidelines. See Figure 4-7.
Figure 4-7. More Than One Faulty Node (ring diagram; nodes labeled BISO, iso 0, xx, yy, zz, iso 1, OLD EISO, and NEW EISO)
NOTE: After correcting and restoring this portion of the isolated segment of the ring, attempt to restore iso 0, xx, and the BISO nodes if problems were not corrected in previous steps.
7. If all attempts fail to clear the isolated segment, then detailed testing is required. Contact the CTS.
Ring Down
The ring down maintenance state is a state where the ring is unable to handle traffic. In this state, communication with the 3B21D computer (except for maintenance purposes) and other nodes on the ring is lost. All s are in the OOS state and all ring peripheral controller nodes (RPCNs) are in the standby state. The RPCNs are left in this standby state to eliminate any need to restore them if service can be restored. This state totally affects system operation; therefore, the problem must be corrected as soon as possible. If the ring is down for more than one second, the CCS network is affected and it results in a critical alarm. Expect the REPT CSLM output message.
Ring Down Maintenance Approach

When the ring is down, total system operation is affected. The guidelines below present steps that may be used in attempting to restore the ring.
Procedure 4-5. Ring Down Maintenance Guidelines

1. Confirm that the total ring is down. Use the 1105 display page, OP:RING;DETD, or check the maintenance receive-only printer (MROP) for printouts. This should confirm that the ring is down; for additional verification, check the ring quarantine (RQ) lamps on each ring node frame/cabinet (RNF/C).
NOTE: The MROP may or may not confirm or print any information on the Ring Down state.
2. After confirmation that the ring is down, attempt to restore the ring by reinitializing the system. Perform a level-3 initialization (see the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual).
NOTE: For additional information on the initialization levels, refer to ``Initialization,'' Part 4 of this manual.
Does the ring initialize?
Yes: Proceed to next step.
No: Proceed to Step 5.
3. Are all nodes that were not previously OOS (except quarantined nodes) before the ring down state restored to service?
Yes: Proceed to Step 8.
No: Proceed to next step.
4. For all nodes that were not previously OOS before the ring failure, perform an unconditional RST. See Chapter 6, Diagnostic User's Guide, or the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.
Did all nodes that were not OOS prior to the ring failure restore?
Yes: Proceed to Step 8.
No: Proceed to next step.
5. Attempt to reinitialize the ring. Perform a level-4 initialization (see the proper application in the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual).
NOTE: For additional information on the initialization levels, refer to ``Initialization,'' Part 4 of Chapter 6, Diagnostic User's Guide.
Does the ring initialize?
Yes: Proceed to next step.
No: Proceed to Step 9.
6. Are all nodes that were not previously OOS prior to the ring failure restored to service?
Yes: Proceed to Step 8.
No: Proceed to next step.
7. For all nodes that were not previously OOS before the ring failure, perform an unconditional RST. See Chapter 6, Diagnostic User's Guide.
8. Are there any other nodes OOS left on the ring?
No: DONE.
Yes: Determine the ring condition (single node isolation, multiple node isolation, etc.) and proceed to that condition's maintenance approach presented in this chapter.
9. If the system still doesn't initialize after the level-3 and level-4 initialization attempts, call the CTS.
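The branching in Procedure 4-5 can be easier to follow as a small decision function. The Python sketch below is illustrative only: the function name, argument names, and outcome strings are hypothetical, and each argument simply stands for the Yes/No answer a technician would record at the corresponding step. It does not issue any commands.

```python
def ring_down_steps(level3_ok, restored_after_l3, rst_ok_after_l3,
                    level4_ok, restored_after_l4, other_oos_left):
    """Return (steps visited, outcome) for Procedure 4-5.

    Every argument is the Yes/No answer to a question in the procedure;
    arguments for steps that are never reached are ignored.
    """
    steps = [1, 2]                      # confirm ring down, level-3 init
    need_level4 = not level3_ok         # Step 2 "No" jumps to Step 5
    if level3_ok:
        steps.append(3)                 # were all nodes restored?
        if not restored_after_l3:
            steps.append(4)             # unconditional RST
            if not rst_ok_after_l3:
                need_level4 = True      # Step 4 "No" falls into Step 5
    if need_level4:
        steps.append(5)                 # level-4 init
        if not level4_ok:
            steps.append(9)             # both init levels failed
            return steps, "call CTS"
        steps.append(6)                 # were all nodes restored?
        if not restored_after_l4:
            steps.append(7)             # unconditional RST
    steps.append(8)                     # any other OOS nodes left?
    if other_oos_left:
        return steps, "follow the matching isolation procedure"
    return steps, "done"
```

For example, a ring that recovers fully on the level-3 initialization visits Steps 1, 2, 3, and 8 only, while a ring that fails both initialization attempts ends at Step 9.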
Ring Generic Access Package (RGRASP)

Feature Definition
RGRASP is a single-user utility system for the CNI ring nodes.
Interactions
CAUTION: Care must be exercised when using the RGRASP tool. Improper use of RGRASP can result in program mutilation or excessive utilization of system resources. Both of these consequences of improper use of the tool can lead to call processing downtime and therefore interrupt the operation of a node on the ring or the whole ring.
Feature Description
The RGRASP tool can:
- Set (allow) breakpoints (a breakpoint corresponds to the address of the first byte of a target process instruction).
- Clear breakpoints.
- Report on current status for specified breakpoints.
- Inhibit breakpoints.
- Load a specified RGRASP utility variable (UVAR).
- Dump a specified RGRASP UVAR.
- Load a specified node with data.
- Dump the contents of a specified address in a given node.
- Direct the loading of an address.
- Dump the contents of a specified Application Processor or Node Processor register.
Software Impact
This feature does not impact customer engineerable software resources on APs. This feature could impact customer engineerable software resources on NPs, dependent on memory size.
Software Description
The software consists of the following processes:

RGP_KER
    This is a UNIX process kernel for the feature. It acts as the interface between the AM (RGP_CFT and RGP_PRT) and the ring node (monitor) processes.
RGP_CFT
    This UNIX process handles input commands from the craft shell. It parses and performs some preliminary checking on the input command. Then it relays the command to the RGP_KER process for further processing.
RGP_PRT
    This UNIX process handles printing of output.
monitor
    This system process performs the actual operations required to handle breakpoints, memory dumping, and memory loading. It communicates with the RGP_KER.
User Profile
This feature and its associated input commands are intended for use by technicians in conjunction with the CTS.
Description of Feature Operation

The following paragraphs describe how this feature can be used.
Initial Setup
First, determine the address in memory that requires investigation. This can be done by using the latest PR/PK listings provided. This address may be provided by the CTS.
Determine which processor should be looked at. In the case of the DLN, there is an active and a standby processor. Use the OP:SLK or poke the 118 page to determine this. As a precaution, it is a good idea to set breakpoints in only one processor at a time.
Setting a Breakpoint
You can set a breakpoint in a program using the WHEN:RUTIL input command. Before this can be done, the opcode (OPC) must be known. To verify the OPC, use the DUMP:RUTIL command to dump the memory at the breakpoint address. If the expected OPC does not match the dump output, then the listings do not match the memory. This discrepancy should be cleared up before continuing the procedure.

One possible explanation is that the node software is out of date. To eliminate this possibility, you can remove and restore the target node (the node in which the breakpoint is to be set). Doing this ensures that the newest version of code has been pumped from disk. You can use the RMV:LN and RST:LN commands or a 118 poke to achieve this. After the node has been pumped, try dumping the breakpoint address again. If it still does not match, the listings are out of date. In this case, you should stop and get a current listing before proceeding.

The WHEN:RUTIL command allows you to specify actions (commands) to be executed when the breakpoint you set fires. The input message manual page for WHEN:RUTIL defines the actions. Up to 24 actions may be specified in the action list for a single breakpoint. The action list must be terminated by an END:WHEN command. The action list can contain only the END:WHEN command, in which case you will simply know whether a piece of code is being executed. Only five breakpoints can be set in any one ring node processor.
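The limits on a breakpoint action list can be checked before anything is typed at the terminal. The following Python sketch is an offline aid under stated assumptions, not part of RGRASP: the function and constant names are hypothetical, only the leading command name of each line is examined (parameter syntax is left to the input message manual), and END:WHEN is assumed not to count toward the 24-action limit.

```python
import re

# Action-list commands named in this chapter for WHEN:RUTIL.
ALLOWED_ACTIONS = {
    "ALW:RUTIL", "ALW:RUTILFLAG", "DUMP:ADDR", "DUMP:REG", "DUMP:UVAR",
    "INH:RUTIL", "INH:RUTILFLAG", "LOAD:ADDR", "LOAD:BYTE", "LOAD:REG",
    "LOAD:SHORT", "LOAD:UVAR", "LOAD:WORD",
}
MAX_ACTIONS = 24      # actions per breakpoint
MAX_BREAKPOINTS = 5   # breakpoints per ring node processor


def command_name(line):
    """Return the leading VERB:OBJECT token of a line, or None."""
    match = re.match(r"\s*([A-Z]+:[A-Z]+)", line.upper())
    return match.group(1) if match else None


def check_action_list(lines, breakpoints_in_processor=0):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if breakpoints_in_processor >= MAX_BREAKPOINTS:
        problems.append("processor already has five breakpoints set")
    if lines and command_name(lines[-1]) == "END:WHEN":
        actions = lines[:-1]
    else:
        actions = list(lines)
        problems.append("action list must end with END:WHEN")
    if len(actions) > MAX_ACTIONS:
        problems.append(f"{len(actions)} actions exceed the limit of {MAX_ACTIONS}")
    for line in actions:
        if command_name(line) not in ALLOWED_ACTIONS:
            problems.append(f"not an action-list command: {line.strip()!r}")
    return problems
```

A list holding only END:WHEN passes the check, which matches the case where the breakpoint is used just to learn whether a piece of code is executed.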
Loading Memory
You can load memory with the LOAD:ADDR, LOAD:WORD, LOAD:SHORT or LOAD:BYTE commands within the WHEN:RUTIL command or with the LOAD:RUTIL command. Details on the use of these commands are provided under ``Input Messages.''
CAUTION: Loading memory may drastically change program execution. If not done properly, this can interrupt or degrade service; for example, calls may be lost.
The RGRASP tool has WRITE permissions to all parts of available memory. This makes the tool powerful but dangerous. No OPC checking is performed; it is possible to specify the wrong address and overwrite the wrong data. If you should overwrite the wrong data and the original contents cannot be loaded, the ring node should be removed and restored (pumped) to get an original disk copy back. To perform the remove and restore, the RMV:LN and RST:LN commands should be used. After a load, you should use the DUMP:RUTIL command to verify the new contents in memory. Registers can be loaded only during breakpoint action lists (WHEN:RUTIL command).
Reading Memory
Dumping memory is a fairly straightforward and safe operation. You need only the address to dump. You can dump memory with the DUMP:ADDR or DUMP:REG commands within the WHEN:RUTIL command or with the DUMP:RUTIL command. RGRASP allows 468 bytes to be dumped in one operation. The output is hexadecimal. You can dump memory either higher or lower than the starting address with the DUMP:RUTIL command. A range of addresses may also be specified with DUMP:RUTIL. Registers can be read only during breakpoint action lists (WHEN:RUTIL command).
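Because a single dump operation is limited to 468 bytes, a larger region has to be read in several operations. The Python sketch below only does the arithmetic for planning those operations; the function name is hypothetical and it does not build or send DUMP:RUTIL commands.

```python
MAX_DUMP_BYTES = 468  # per-operation limit stated for RGRASP dumps


def dump_chunks(start_address, length):
    """Split a byte range into (address, byte_count) pieces of at most 468 bytes."""
    if length < 0:
        raise ValueError("length must not be negative")
    chunks = []
    offset = 0
    while offset < length:
        size = min(MAX_DUMP_BYTES, length - offset)
        chunks.append((start_address + offset, size))
        offset += size
    return chunks
```

Reading 1000 bytes, for instance, takes three operations: two of 468 bytes and one of 64 bytes.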
Loading and Dumping RGRASP Utility Variables (UVARs)

Within a WHEN:RUTIL breakpoint action list, you can load and dump RGRASP UVARs with the LOAD:UVAR and DUMP:UVAR commands, respectively.
Feature Activation
You can activate the feature; that is, execute one or more of its functions by using any of the following input commands:
- ALW:RUTIL or ALW:RUTILFLAG
- DUMP:RUTIL
- LOAD:RUTIL
- OP:RUTIL or OP:RUTILFLAG
- WHEN:RUTIL
Feature Deactivation
You can deactivate the feature; that is, clear all breakpoints in a specified node with the CLR:RUTIL command. You can clear a specific breakpoint in a specified node with the CLR:RUTILFLAG command. You can temporarily disable or inhibit all breakpoints in a specified node with the INH:RUTIL command. You can temporarily disable or inhibit a specific breakpoint in a specified node with the INH:RUTILFLAG command.
Equipment Configuration Data (ECD)

ECD are not affected by the RGRASP feature.
Recent Change Procedures

Recent change procedures are not associated with the use of the RGRASP tool.
Measurement
No measurements are provided as part of the RGRASP tool.
Network Management Impact

If the RGRASP tool is used improperly, service interruption or degradation can occur.
Maintenance/Troubleshooting Impact
The RGRASP tool is a debugging tool for CNI ring nodes. It is usable only at nodes that are active from an IMS viewpoint, such as the IMS ACT state. Nodes that are quarantined or isolated cannot be accessed with RGRASP. There are no new diagnostics related to this tool.
RGRASP breakpoints are affected by CNI initialization levels as follows:

Level               Effect
0, 1, FPI, 2, 3     None
4                   Clears all breakpoints
Recording
This tool has no impact on recording.
Procedure 4-6. Input Messages

The following input messages/commands are associated with the RGRASP tool. For more information about each of these messages, refer to the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.
CAUTION: Incorrect use of these commands may interrupt operation of a node on the ring or the whole ring. READ EACH PURPOSE CAREFULLY.
1. ALW:RUTIL or ALW:RUTILFLAG
   The first command allows all breakpoints in the specified node; the second allows a specific breakpoint in the specified node.
2. CLR:RUTIL or CLR:RUTILFLAG
   The first command clears all breakpoints in the specified node; the second clears specific breakpoints in the specified node.
3. DUMP:ADDR
   Dumps the contents of the specified address in the given node. This command is allowed only within a WHEN:RUTIL command <action-list>.
4. DUMP:REG
   Dumps the contents of the specified Application or Node Processor register in the given node. This command is allowed only within a WHEN:RUTIL command <action-list>.
5. DUMP:RUTIL
   Dumps the contents of memory at the address range given at the specified node. It can also dump the contents of memory starting at the given address for the specified number of bytes. Currently a maximum length of 468 bytes is allowed for a single dump operation. A formatted output of the node's memory contents will follow this input command.
6. DUMP:UVAR
   Dumps the contents of the specified RGRASP UVAR. This command is allowed only within a WHEN:RUTIL command <action-list>.
7. INH:RUTIL or INH:RUTILFLAG
   The first command inhibits all breakpoints in the specified node; the second inhibits specific breakpoint(s) in the specified node.
8. LOAD:ADDR
   Loads the specified address with the specified data. This command is allowed only within a WHEN:RUTIL command <action-list>.
9. LOAD:BYTE
   Loads the address in the given node with the specified data. This command is allowed only within a WHEN:RUTIL command <action-list>.
10. LOAD:REG
   Loads an Application or Node Processor register with the specified data in the given node. This command is allowed only within a WHEN:RUTIL command <action-list>.
11. LOAD:RUTIL
   Loads the address at the given node with the specified data. The maximum number of data items allowed for loading is 128 bytes or 32 4-byte words.
   There must be a one-to-one correspondence between the length of the data to be written and the data provided. If there are 3 bytes of data to be written, three data entries must be specified. Similarly, if there are five words to be written, five data entries must be specified.
12. LOAD:SHORT
   Loads the address in the given node with the specified data. This command is allowed only within a WHEN:RUTIL command <action-list>. The address provided is expected to be on a 2-byte boundary. The data provided is expected to be a 2-byte value.
13. LOAD:UVAR
   Loads the specified RGRASP UVAR with the specified data. This command is allowed only within a WHEN:RUTIL command <action-list>.
14. LOAD:WORD
   Loads the address in the given node with the specified data. This command is allowed only within a WHEN:RUTIL command <action-list>. The address provided is expected to be on a 4-byte boundary for an AP or a 2-byte boundary for an NP. The data provided is expected to be a 4-byte value.
15. OP:RUTIL or OP:RUTILFLAG
   The first command outputs the status of all breakpoints in the specified node; the second outputs the status of a specific breakpoint in the specified node.
16. WHEN:RUTIL <action list> END:WHEN!
   Sets an RGRASP breakpoint in the specified node along with a specified action-list to be performed by the node when the breakpoint fires. Current <action-list> items available are:
   ALW:RUTIL
   ALW:RUTILFLAG
   DUMP:ADDR
   DUMP:REG
   DUMP:UVAR
   INH:RUTIL
   INH:RUTILFLAG
   LOAD:ADDR
   LOAD:BYTE
   LOAD:REG
   LOAD:SHORT
   LOAD:UVAR
   LOAD:WORD
   For more specific instructions on these items, see preceding listings for specific commands, or refer to the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.
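The size and alignment rules given above for LOAD:RUTIL, LOAD:SHORT, and LOAD:WORD lend themselves to a quick pre-check, which matters because RGRASP itself performs no checking on what is written. The Python sketch below is a hypothetical helper, not an RGRASP interface: the function names and the "byte"/"word" and "AP"/"NP" labels are assumptions made for illustration, and data entries are assumed to be unsigned integers.

```python
MAX_LOAD_BYTES = 128                  # LOAD:RUTIL limit (128 bytes or 32 words)
UNIT_SIZE = {"byte": 1, "word": 4}    # bytes per data entry


def check_load_rutil(unit, count, data):
    """Return problems with a planned LOAD:RUTIL of `count` entries of `unit`."""
    size = UNIT_SIZE[unit]
    problems = []
    if count * size > MAX_LOAD_BYTES:
        problems.append(f"{count * size} bytes exceed the {MAX_LOAD_BYTES}-byte limit")
    if len(data) != count:
        problems.append(f"{count} entries to write but {len(data)} supplied")
    limit = (1 << (8 * size)) - 1     # largest value that fits in one entry
    for value in data:
        if not 0 <= value <= limit:
            problems.append(f"value {value:#x} does not fit in a {unit}")
    return problems


def is_aligned(address, command, processor):
    """Check the boundary stated for LOAD:SHORT and LOAD:WORD addresses."""
    if command == "LOAD:SHORT":
        boundary = 2
    elif command == "LOAD:WORD":
        boundary = 4 if processor == "AP" else 2   # NP words sit on 2-byte boundaries
    else:
        raise ValueError("no boundary is stated here for " + command)
    return address % boundary == 0
```

With these rules, address 0x1002 is acceptable for LOAD:WORD on an NP but not on an AP, and a 33-word LOAD:RUTIL is rejected because it would write 132 bytes.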
Output Messages
The following output messages are associated with the RGRASP tool. For more information about each of these messages, refer to the 401-610-057 Output Message Manual.
1. ALW RUTIL or ALW RUTILFLAG
   Prints in response to an ALW:RUTIL or ALW:RUTILFLAG command. Indicates the action that has occurred as a result of the command.
2. CLR RUTIL or CLR RUTILFLAG
   Prints in response to a CLR:RUTIL or CLR:RUTILFLAG command. Indicates the action that has occurred as a result of the command.
3. DUMP RUTIL
   Prints in response to a DUMP:RUTIL command. Indicates the action that has occurred as a result of the command.
4. INH RUTIL or INH RUTILFLAG
   Prints in response to an INH:RUTIL or INH:RUTILFLAG command. Indicates the action that has occurred as a result of the command.
5. LOAD RUTIL
   Prints in response to a LOAD:RUTIL command. Indicates the action that has occurred as a result of the command.
6. OP RUTIL or OP RUTILFLAG
   Prints in response to an OP:RUTIL or OP:RUTILFLAG command. Indicates the action that has occurred as a result of the command.
7. REPT RGP PRT
   Prints when anomalies occur within the print process of the RGRASP tool. Indicates the kind of anomaly that has occurred.
8. REPT RUTIL
   This message has 40 formats. Formats [1] through [15] report an error condition encountered by the RGRASP RGP_KER process. Formats [16] through [40] print in response to the firing of a breakpoint.
9. WHEN RUTIL
   Prints in response to a WHEN:RUTIL command.
Audits
The RGRASP tool does not affect any audits.
Critical Events
The RGRASP tool does not affect any critical events.
Support Tools
The RGRASP tool is a new support tool.
Related Documentation Cross-References

For more details about the use of each input command associated with RGRASP, refer to the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual. For more details about the use of each output message associated with RGRASP, refer to the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.
Chapter 5

Contents
Introduction
Critical Event Message Output
- Logging Critical Events
- Short Form CNCE Message
- Long Form CNCE Message
- Using the CHG:CEPARM Command
CNCE Descriptions
Introduction
CCS Network Critical Events (CNCE) are predefined events that are considered indicators of abnormal network operation. They are of importance to network operation and to the proper functioning of the office. Both on-site and support system personnel must be immediately aware of events affecting the CCS network. CNCE messages are output as these critical events occur and are referred to as on-occurrence autonomous messages. CNCE messages are output as critical events occur in the office or as network events are recognized and acted upon. There are approximately 70 critical events in a system. Some critical events pertain to the CCS network in general, while others have significance to the. A CNCE could represent an occurrence, the beginning of some state, or the ending of some state. Events indicating the beginning or ending of a state should occur in pairs. A critical event never represents a length of time. The naming convention used for critical events is similar to the naming convention used for measurements. It is as follows:
- The mnemonic represents as closely as possible the actual event.
- The mnemonic is derived from a set of abbreviations representing typical signaling events. These abbreviations are combined to describe the event.
- The suffix E means the state indicated by the mnemonic has ended.
- Names may include letters, digits, or special characters. Names are unique and contain no more than 12 characters.
The names given to critical events are used by the Measurement Output Control Table (MOCT), which is described in the ``Measurement Output Control Table'' section in the. At the end of this section are tables providing explanations of each critical event by name.
Critical Event Message Output

The Critical Event Table (CET) in the MOCT controls the reporting of critical events. The critical event handler is responsible for sending the message to the users specified in the CET. This table includes information indicating which users are to be informed of which particular critical events. The CET also specifies that messages should be recorded in a log file and designates what form of the message the users receive: long or short. Each of these forms is discussed later. Automatic reporting of critical events is in real time.
Logging Critical Events

The recognition of critical events (the occurrences to be reported) takes place in the central processor. The following information is provided to the central processor:
- Identification of the event that occurred (the CNCE name)
- When the event occurred (may be set to network or local time)
- Identification of the peripheral units involved, if required.
The critical event handler immediately generates a CNCE message. The CNCE message is generated in two forms: short form and long form (see the REPT CNCE message in the ). The CNCE message is automatically recorded in the CNCE log file, first, using the long form. Then, it is output to the appropriate users in the forms specified in the CET. The CNCEs are output at the MROP locally and are sent to various support system centers over BX.25 links. For more information on the CNCE message forms, see the REPT CNCE message in the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.

The CNCE log file is a circular file stored on disk (/etc/log/CNCELOG). The file contains a minimum of 90 minutes of the most recent CNCE messages. The messages in the log file can be retrieved. The file can be output using the OP:LOG:CNCELOG UNIX system Real Time Reliable (RTR) command (see the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual). Support system users cannot use this command over BX.25 sessions.
Short Form CNCE Message

The short form (shown below) provides the critical event name, local or network time, and identification of the associated hardware (by point code, link set, or group-member number). The short form is intended mainly for support systems that have a reference database containing details on the hardware identified. Figure 5-1 shows examples of long and short CNCE messages. Refer to the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual for a description of the fields in a CNCE message. A CNCE message cannot be generated by an input command.
Short form:  REPT CNCE C6EMRPO 14:00:36:59 32-00
Long form:   REPT CNCE C7LCABMIS 14:00:36:59 7 02-0 ATLN_GA_TL_MS2_06 56. A

Figure 5-1. CNCE Messages
For CNCE messages related to PBX links, both long and short forms may contain circuit pack and port identification and diagnostic code.
Long Form CNCE Message

The long form (shown above) includes all the information specified for the short form message. Since the long form is used by the maintenance work force, more detailed information must be provided. In particular, it includes the office identification (CLLI code), the speed, the link type, and the protocol of the link. If applicable, it also includes the VFL identification, function number, or subsystem number. Refer to the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual for a description of the fields in a CNCE message. A CNCE message cannot be generated by an input command.
Using the CHG:CEPARM Command

The CHG:CEPARM command allows users to change the parameters that control the reporting of certain critical events. It is primarily intended for use by support system users but may be entered by on-site users through the MCRT. The C7NOTRNS and C7MTPERR events currently are controlled in this manner. Both events have ``cycle time'' and ``number of occurrences'' parameters. For a description of these events, refer to the table in the following part. The command is input as follows:

CHG:CEPARM:REPT a, EVENT b, CYCLE c!

Where:
a = Name of the autonomous message event: NOTRNS or MTPERR.
b = Number of occurrences, or messages, per cycle: 0 to 100.
c = Duration of the cycle in seconds: 0 to 60.

Upon execution of the command, an output message is generated and the specified parameter values are stored. The values are first written to the /cmp/stp/odata/miscparm disk file and then are used to update the appropriate main memory tables. Any future occurrences of the specified event are reported as indicated by the new parameter values. The above-mentioned file also contains the default values for the parameters.
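The parameter ranges above can be checked before the command is entered. The following Python sketch composes the command text from the template shown in this section; the function name is hypothetical, and the exact spacing and punctuation are taken from that template as an assumption, so the input message manual remains the authority on syntax.

```python
CEPARM_EVENTS = ("NOTRNS", "MTPERR")  # events controllable with CHG:CEPARM


def build_chg_ceparm(event, occurrences, cycle_seconds):
    """Return the CHG:CEPARM command text after range-checking its parameters."""
    if event not in CEPARM_EVENTS:
        raise ValueError("event must be NOTRNS or MTPERR")
    if not 0 <= occurrences <= 100:
        raise ValueError("occurrences per cycle must be 0 to 100")
    if not 0 <= cycle_seconds <= 60:
        raise ValueError("cycle duration must be 0 to 60 seconds")
    return f"CHG:CEPARM:REPT {event}, EVENT {occurrences}, CYCLE {cycle_seconds}!"
```

For example, limiting C7NOTRNS reports to 10 messages per 30-second cycle would be composed as `build_chg_ceparm("NOTRNS", 10, 30)`.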
CNCE Descriptions
The event names appearing in CNCE output messages are derived from the MOCT and are defined as shown in Table 5-1. The descriptions are presented alphabetically by event name. The table shows the information provided by the CNCE message. The field, shown in parentheses after the event name, is the group-member number, the point code, or the link set. Often, an occurrence not only causes a CNCE message but is also counted as a measurement. Some of the critical events in the table can be better understood by referring to corresponding measurements in the first part of this chapter. That part contains a table with more detailed descriptions of certain events. Some measurement names should be similar to the critical event names.
NOTE: The ``C6'' or ``C7'' at the beginning of a CNCE name identifies the event as either CCIS6 or CCS7 link related. The ``CP'' or ``CT'' at the beginning of a CNCE name identifies it as PBX node/link related. Others are per office events.
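The prefix rule in the note above, together with the 12-character limit from the naming convention, can be expressed as a short lookup. The Python sketch below is illustrative: the function name and the category strings are hypothetical labels, not values defined by the system.

```python
def classify_cnce(name):
    """Classify a CNCE event name by its two-character prefix."""
    if not name or len(name) > 12:
        raise ValueError("CNCE names contain 1 to 12 characters")
    prefix = name[:2]
    if prefix == "C6":
        return "CCIS6 link"
    if prefix == "C7":
        return "CCS7 link"
    if prefix in ("CP", "CT"):
        return "PBX node/link"
    return "per office"
```

Applied to the names in Figure 5-1, C6EMRPO classifies as a CCIS6 link event and C7LCABMIS as a CCS7 link event.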
Table 5-1. CNCE Descriptions (Page 1 of 14)

Name (data)    DESCRIPTION

C6ACB (gg-mm)
    Change back from a failure that is not a declared failure. This is an automatic change back to a link that previously did an automatic changeover and then restored. The change back must normally occur within 3 minutes of the changeover. If the LI reports a long key exchange is taking place, this time period is extended to 6 minutes. This event occurs for all automatic change backs exclusive of the C6ACBFLD event. Refer to the L6ACO_ measurements for a description of the changeover/change back sequence. This event is usually preceded by a C6ACO_ event.

C6ACBFLD (gg-mm)
    Automatic change back from declared failure. This event indicates that the link is declared failed, has recovered, and traffic has been routed back to the link. This event is preceded by one of the C6FLD_ events (see those descriptions for more information on declared failure). Note that if a link is in the MOOS state and an emergency condition automatically forces the link back into service (called preemption), the C6MCB event occurs rather than this event.

C6ACOCOV (gg-mm)
    Automatic changeover initiated by the far end. A changeover involves transferring signaling messages from the unavailable link to some other link. For example, in the case of a B-link, the changeover results in messages being routed to the mate link, and in the case of an A-link, the changeover results in messages being routed to a C-link. When the changeover message is received from the far end, the following occurs:
    1. The link is removed from service.
    2. No new messages are given to the link. New messages are diverted to the mate link or C-link.
    3. Messages remaining in the transmit buffers are retrieved and an attempt is made to transmit these messages on some other link.
Table 5-1. CNCE Descriptions (Page 2 of 14)

Name (data)    DESCRIPTION

C6ACOCOV (gg-mm) (Cont.)
    4. Only synchronization messages are sent to the far end.
    5. The link switches VFLs and attempts to synchronize.
    6. If acceptable, the link is proven in (from 3 to 15 seconds required) and restored. Messages are routed back (referred to as change back).
    Both VFLs are tested alternately until one syncs. If the link cannot change back within 3 minutes (or 6 minutes if a long key exchange is involved), it is declared failed. Refer to the L6ACO_ measurements for more information.

C6ACOER (gg-mm)
    Automatic changeover error threshold has been exceeded. The error rate monitor in the LI maintains a "leaky bucket" count of the number of SUs received in error during normal operation and also a linear count of SUs received in error during prove-in. If either count exceeds some threshold, the error is reported to the node. The node then reports this event, and alternate synchronization and changeover messages are sent to the far end (the far end recognizes this as a changeover request). Similar actions to those described for the C6ACOCOV event are taken.

C6BOFX (gg-mm)
    Transmit buffer overflow begins (this occurs only for the telephone message transmit buffer). This event indicates that message(s) have been discarded because the buffer is full. The message is discarded and this event is reported on the first attempt to transmit a message with the buffer full. As long as the buffer is full, messages may be discarded. This event is not reported again at least until buffer overload ends (indicated by the C6BOLXE event). This event should be preceded by the C6BOLX event.
Table 5-1. CNCE Descriptions (Page 3 of 14)

Name (data)    DESCRIPTION

C6BOLX (gg-mm)
    Transmit buffer overload begins (only the telephone message transmit buffer). The number of signal units in the buffer has reached the threshold for congestion controls to be activated. This event is reported only once when the threshold has been reached and not again at least until the overload ends. When the overload occurs, the node returns selected outgoing messages to their originations. The originators of these messages in turn control their traffic towards the node experiencing buffer overload. This mechanism is called selective return, and consists of the following:
    - Return some direct signaling messages.
    - Discard all IAMs and COTs and return message refusal to the sending office.
    - Send a group signaling congestion message to all offices that send messages on this link.
    Every second, the node checks to see if buffer occupancy has dropped to an abatement threshold (see the C6BOLXE event description). When that occurs, the overload has ended. Should the link remain overloaded for one minute, it is declared failed.

C6BOLXE (gg-mm)
    Transmit buffer overload ends. This event indicates that the number of signal units in the transmit buffer has dropped to the abatement threshold after an overload. The node checks the buffer occupancy once each second. When occupancy has reached the abatement threshold, selective message return is ended and this event is reported. Both overload and overflow are considered ended when this event occurs.

C6DOC0 (gg-mm)
    Broadcast the remove dynamic overload controls message. These messages are in response to messages from end offices requesting the application or removal of a particular DOC state. The corresponding C6DOC_ event occurs when the message is received. The request results in a DOCx message being transmitted backwards for all bands that can send messages to the congested office. The messages are sent on each "trigger" band to the far end offices. The request may be received on a CCS7 link if virtual links are assigned. Those far end offices then apply the controls to all bands associated with the trigger bands. All DOCx messages are one signal unit in length. Two minutes after receiving the last message, an end office automatically removes the controls. The DOC0 broadcast is an explicit request for the end office to remove the controls.
Table 5-1. CNCE Descriptions (Page 4 of 14)

Name (data)    DESCRIPTION

C6DOC1 (gg-mm)
    Broadcast the dynamic overload control 1 message. The least severe control. DOC1 and DOC2 are progressive controls used when the congested office is only slightly overloaded or is recovering from a failure. They allow CCS messages to be slowly restored to (or removed from) the affected office. For a description of the broadcast mechanism, refer to the C6DOC0 event.

C6DOC2 (gg-mm)
    Broadcast the dynamic overload control 2 message. Refer to the C6DOC1 description.

C6DOC3 (gg-mm)
    Broadcast dynamic overload control message to a far end office. The most severe control. Caused by an emergency restart due to a received processor outage. This DOC message is broadcast every minute until congestion is relieved. It stops all CCS messages to the congested office. See the C6DOC0 event for a description of the broadcast mechanism.

C6EMR (gg-mm)
    Emergency restart (EMR) begins. The specified link failed at the near end causing a complete failure of banded signaling between this office and the other office. This affects banded signaling, but if a particular office contains only one link, other types of signaling may be affected. If another path is available, the signaling load is transferred to the other link and an EMR condition is not triggered. When the last link in the C-link pool or set fails, emergency restarts are triggered on many A, B, and D-links. Refer to the EMR_ measurement descriptions. Since signaling messages cannot be routed over the affected link, alternate link messages may be lost (such as banded messages). Selective return is used so some direct signaling messages are returned to their originators. The end of the EMR condition is indicated by the C6EMRE event.

C6EMRE (gg-mm)
    Emergency restart ends. The link restoral causes an automatic status update for the affected link, bands, and routes. This event indicates the end of the EMR condition on the specified link (regardless of what triggered the EMR).

C6EMRPO (gg-mm)
    Emergency restart due to processor outage begins. The specified link receives a processor outage message from the far end while its mate is unavailable. This results in DOC3 messages being broadcast to all offices that could send messages to this link. See the C6EMR event for further description.

C6FLDCOL (gg-mm)
    Declared link failure due to a 1-minute continuous receive buffer overload. If there is not an EMR, a changeover is initiated. The link is removed from service and is diagnosed.
Table 5-1. CNCE Descriptions (Page 5 of 14)

Name (data)
    DESCRIPTION

C6FLDCOV (gg-mm)
    Declared link failure due to an automatic changeover initiated by the far end. The changeover lasted more than 3 minutes (or 6 minutes if a long key exchange is involved). Actions are taken as described under the C6FLDCOL event except no diagnostics are attempted and the changeover (the C6ACOCOV event) precedes this event.

C6FLDER (gg-mm)
    Declared link failure due to error threshold exceeded. This is caused by an excessive number of received SUs in error. Actions are taken as described under the C6FLDCOV event except the changeover (the C6ACOER event) precedes this event.

C6FLDPCR (gg-mm)
    Declared link failure due to continuous (lasting 30 seconds) far end processor congestion. This event occurs only on A-links. Actions are taken as described under the C6FLDCOL event. The C6PCR description (that event precedes this event) shows how processor congestion is detected.

C6FLDSNT (gg-mm)
    Declared link failure due to a sanity check failure. This failure is due to either software or hardware problems causing abnormal node operation. Automatic diagnostics then attempt to determine the problem. Actions are taken as described under the C6FLDCOL event.

C6MCB (gg-mm)
    Manual change back from manual changeover. This event occurs either due to manually restoring the link or due to preemption of the MOOS state by an emergency condition. In the latter case, this event may be preceded by a C6EMR_ event on the mate link. Refer to the L6MCO_ measurements for a description of the changeover/change back sequence.
Table 5-1. CNCE Descriptions (Page 6 of 14)

Name (data)
    DESCRIPTION

C6MCOF (gg-mm)
    Far end manual changeover request has been received. A changeover involves transferring signaling messages from the unavailable link to some other link, usually due to a need for link changes or maintenance. For example, in the case of a B-link, the changeover results in messages being routed to the mate link; in the case of an A-link, the changeover results in messages being routed to a C-link; and in the case of a C-link, it results in messages being load balanced over the other available C-links. The changeover request may be denied if the mate link is out-of-service or the C-link pool is unable to handle the additional load. When the request is received, the following occurs (if the request is accepted):
    1. A manual changeover acknowledgment is sent to the far end, and the link is removed from service.
    2. No new messages are given to the link. New messages are diverted to the mate link or C-link.
    3. Messages remaining in the transmit buffers are retrieved, and an attempt is made to transmit these messages on some other link.
    Refer to the L6MCO_ measurements for more information.

C6MCON (gg-mm)
    Near end manual changeover due to local maintenance action. The maintenance and routing actions taken when this event occurs are similar to those taken for the C6MCOF event, except that, before diverting messages to the other link, a manual changeover request is sent to the far end (not an acknowledgment). Upon receipt of an acknowledgment from the far end, the link is removed from service and the diversion is done. Refer to the L6MCO_ measurements for more information.

C6PCR (gg-mm)
    Far end 1STP processor congestion event begins. This event occurs only on A-links. It indicates that the base call-processing cycle of the congested office exceeded a specified value for three consecutive cycles. The node uses selective message return to limit traffic to the congested office (described under the C6BOLX event). If a congestion message is received at least every 8 to 10 seconds for 30 seconds, the link is declared failed. The event occurs once when the message is first received and not again at least until congestion ends (indicated by the C6PCRE event).

C6PCRE (gg-mm)
    End of received processor congestion. If more than 10 seconds elapse between congestion messages, the event is considered ended.
Table 5-1. CNCE Descriptions (Page 7 of 14)

Name (data)
    DESCRIPTION

C6POR (gg-mm)
    Adjacent processor outage begins (a PRO has been received). This indicates that the far end office is undergoing initialization or is overloaded. The far end LI goes into the processor outage send mode. In this mode, processor outage (PRO) signal units are transmitted in a continuous stream. This end treats the problem as a link failure (causes a changeover). DOC3 is broadcast every 60 seconds on links to connected offices that go into EMR due to the PROs being received on this link. The DOC message continues until synchronism is restored on this link. This is indicated by no more PROs. This event occurs once when the PRO is first received, and not again until the outage ends. This is indicated by the C6PORE event. The C6DOC3 event occurs every 60 seconds as shown above.

C6PORE (gg-mm)
    Adjacent processor outage ends. This event occurs when the far end stops sending PRO, synchronism is regained, and the link is restored.

C7ALCIF (gg-mm)
    Automatic link check (ALC) failure. When a link is declared failed (a C7FLD_ event), the ALC is initiated. If the ALC is not successful within 15 seconds from the link failure, this event occurs.

C7ACB00 (gg-mm)
    Change back from a failure that is not a declared failure. This is an automatic change back to a link that previously did an automatic changeover and then was restored. The change back must normally occur within 3 minutes of the changeover. If the LI reports a long key exchange is taking place, this time period is extended to 10 minutes. This event occurs for all automatic change backs exclusive of the C7ACBFLD event. Refer to the L7ACO_ measurements for a description of the changeover/change back sequence. This event is usually preceded by a C7ACO_ event.

C7ACBFLD (gg-mm)
    Automatic change back from declared failure. This event indicates that the link is declared failed, has recovered, and traffic has been routed back to the link. This event is preceded by one of the C7FLD_ events (see those descriptions for more information on declared failure). Note that if a link is in the MOOS state and an emergency condition automatically forces the link back into service (called preemption), the C7MCB event occurs rather than this event.
Table 5-1. CNCE Descriptions (Page 8 of 14)

Name (data)
    DESCRIPTION

C7ACOCOV (gg-mm)
    Automatic changeover initiated by the far end. A changeover involves transferring signaling messages from the unavailable link to other links. These could be any links in the combined link set or C-links. In the case of a C-link failing, the changeover results in messages being load balanced over the other available C-links. The changeover message and the acknowledgment are both sent on some other link in the specified link set. When the changeover order is received from the far end, this event occurs and either a changeover or emergency changeover is initiated. An emergency changeover is done when the far end indicates that messages were received out of sequence or when the link node is out-of-service. The following is the changeover sequence:
    1. The link is removed from service and no new messages are given to the link node (message handling pauses).
    2. A changeover acknowledgment is sent to the far end on some other link in the set. Messages remaining in the transmit and retransmit buffers are retrieved and are transmitted in sequence on other links. An emergency changeover does not attempt the retrieval from the retransmit buffer (if the link node is out-of-service or the link failed due to a near end PRO, no retrieval is done).
    3. Message handling resumes with new messages to the other links.
    4. Only synchronization messages are sent on this link.
    In the case of an automatic changeover, the link changes back when sync is regained. Then it is proven in (from 3 to 15 seconds required) and restored. CCS messages are routed back to the restored link. If the link cannot sync and change back within 3 minutes (or 10 minutes if a long key exchange is involved), it is declared failed.

C7ACOER (gg-mm)
    Automatic changeover error threshold has been exceeded. The error rate monitor in the LI has reported excessive signal unit errors. The monitor is described in more detail under the C6ACOER event. Similar actions to those described for the C7ACOCOV event are taken.
Table 5-1. CNCE Descriptions (Page 9 of 14)

Name (data)
    DESCRIPTION

C7FLDCOL (gg-mm)
    Declared link failure due to a 1-minute continuous receive buffer overload. This event is followed by a changeover (assuming it is not denied due to a blocked path). The link is removed from service and is diagnosed.

C7FLDCOV (gg-mm)
    Declared link failure due to an automatic changeover initiated by the far end. The changeover lasted more than 3 minutes (or 10 minutes if a long key exchange is involved). Actions are taken as described under the C7FLDCOL event except no diagnostics are attempted and the changeover (the C7ACOCOV event) precedes this event.

C7FLDER (gg-mm)
    Declared link failure due to error threshold exceeded. This is caused by an excessive number of received SUs in error. Actions are taken as described under the C7FLDCOV event except the changeover (the C7ACOER event) precedes this event.

C7FLDSNT (gg-mm)
    Declared link failure due to a sanity check failure. This failure is due to either software or hardware problems causing abnormal node operation. Automatic diagnostics attempt to determine the problem. Actions are taken as described under the C7FLDCOL event.

C7LCABM1X (gg-mm)
    Transmit buffer level 1 congestion ends. Buffer occupancy has dropped below the threshold for level 1 abatement after transmit buffer congestion. Messages are not being discarded.

C7LCABM2X (gg-mm)
    Transmit buffer level 2 congestion ends. Buffer occupancy has dropped below the threshold for level 2 abatement after transmit buffer congestion. The node reverts to level 1 discard.

C7LCABM3X (gg-mm)
    Transmit buffer level 3 congestion ends. Buffer occupancy has dropped below the threshold for level 3 abatement after transmit buffer congestion. The node reverts to level 2 discard.
Table 5-1. CNCE Descriptions (Page 10 of 14)

Name (data)
    DESCRIPTION

C7LCDIS1X (gg-mm)
    Transmit buffer level 1 congestion discard begins. Buffer occupancy has reached the threshold for level 1 discard to be initiated. The SS7 discard strategy (for levels 1, 2, or 3) is as described below:
    The node first checks the priority of a message before transmitting it. The priority is contained in the service information octet field and is compared with the congestion state of the transmit buffer. If the priority is less than the congestion level, the message is removed and a return message may be sent. The return message is sent only if the return indicator in the received message is set. If the message to be transmitted is a unit data type SCCP message, a UDS message is created and returned to the originator. If the priority of the message is equal to or greater than the congestion level, it is transmitted.
    This event does not occur again at least until buffer occupancy drops below the level 1 abatement threshold (signaled by the C7LCABM1X event).

C7LCDIS2X (gg-mm)
    Transmit buffer level 2 congestion discard begins. Buffer occupancy has reached the threshold for level 2 discard to be initiated. The C7LCDIS1X event describes the discard strategy.

C7LCDIS3X (gg-mm)
    Transmit buffer level 3 congestion discard begins. Buffer occupancy has reached the threshold for level 3 discard to be initiated. At this point, all messages are being discarded. The C7LCDIS1X event describes the discard strategy.

C7LCON1X (gg-mm)
    Transmit buffer level 1 congestion onset begins. The congestion onset thresholds (levels 1, 2, or 3) are higher than the corresponding abatement levels but lower than the corresponding discard levels. At each onset level, the node reports the congestion state to the central processor. Network management messages (transfer controlled) are then broadcast to adjacent signaling points to limit messages to the affected node. To avoid further congestion of the transmit buffer, the far end initiates the discard strategy used by nodes at the discard level (described under the C7LCDIS1X event). If the node remains in the same congestion level (1, 2, or 3) for 60 seconds, it is taken OOS and diagnosed.

C7LCON2X (gg-mm)
    Transmit buffer level 2 congestion onset begins. Messages are being discarded according to the level 1 strategy. The node reports the level 2 congestion state to the central processor. Actions are taken as described under the C7LCON1X event.
Table 5-1. CNCE Descriptions (Page 11 of 14)

Name (data)
    DESCRIPTION

C7LCON3X (gg-mm)
    Transmit buffer level 3 congestion onset begins. Messages are being discarded according to the level 2 strategy. The node reports the level 3 congestion state to the central processor. Actions are taken as described under the C7LCON1X event.

C7LSF (linkset)
    Link set failure begins. When the last available link in the set fails, this event occurs. If the failure of the link set results in failure of the associated combined link set, another C7LSF CNCE message is output with the combined link set identification. The end of this event is signaled by the C7LSFE event. The CLF_ measurements describe the various link set failure scenarios. If this failure causes some destination to become isolated from this office (for example, all signaling paths to a signaling point have failed), this event is accompanied by a C7SPI event.

C7LSFE (linkset)
    Link set failure ends. When any link in the set restores, this event occurs.

C7MCB (gg-mm)
    Manual change back from manual changeover. This event occurs either due to manually restoring the link (at the near end or far end) or due to preemption of the MOOS state by an emergency condition. When the link regains sync, a change back declaration is sent to the far end. The link state is changed to OOS and new messages are diverted back to the link. Until all acknowledgments are received, these messages are not transmitted; messages are diverted to other links if the link fails to return to service. Note that this event occurs before the link is made available.

C7MCOF (gg-mm)
    Far end manual changeover request has been received, usually due to a need for link changes or maintenance. The far end has requested and permission has been granted to initiate a changeover. Either a changeover or emergency changeover is initiated. The sequence is described under the C7ACOCOV event.

C7MCON (gg-mm)
    Near end manual changeover due to local maintenance action. The changeover could be denied if removing the link from service would cause the far end to become inaccessible. This end requests permission from the far end to initiate a changeover (the far end recognizes a C7MCOF event). If the far end grants permission, either a changeover or emergency changeover is initiated. The sequence is described under the C7ACOCOV event.

C7POR (gg-mm)
    Adjacent processor outage event begins (the end of this event is signaled by the C7PORE event). Refer to the C6POR description.
Table 5-1. CNCE Descriptions (Page 12 of 14)

Name (data)
    DESCRIPTION

C7PORE (gg-mm)
    Adjacent processor outage event ends. Refer to the C6PORE description.

C7SPI (pointcode)
    An adjacent signaling point isolation begins due to local failure. A link failed, causing a complete failure of all signaling paths to the indicated destination from this office. This condition is usually accompanied by a C7LSF event. The end is indicated by the C7SPIE event. See the SPI_ measurements for more detail.

C7SPIE (pointcode)
    Adjacent signaling point isolation ends. Some failed path to the indicated destination has restored due to a local link set recovery. This event indicates that the destination is no longer isolated from this office.

C7SPIPO (pointcode)
    An adjacent signaling point isolation begins due to a far end processor outage. A link failed due to receiving PROs from the far end, causing a complete failure of all signaling paths to the indicated destination from this office. See also the C7SPI description. The end of this condition is indicated by the C7SPIE event.

C7SSAF (subsystem)
    Received a subsystem allowed message. Receiving an SSA message indicates that the subsystem (either local or nonlocal) has become allowed. SSA messages sent by the far end are in response to subsystem status test messages. This event (and the C7SSPF event described below) occurs only if both of the following two conditions are met:
    - Indicated subsystem is in the same region, and
    - It is simplex, or duplex with the mate subsystem prohibited.

C7SSPF (subsystem)
    Received a subsystem prohibited message. SSP messages sent by the far end are in response to signaling messages destined for the indicated prohibited subsystem. Receiving an SSP message indicates that the subsystem (either local or nonlocal) has become prohibited, causing it to be blocked. The C7SSAF description details certain conditions for the generation of this event.

CPARSFLD (PBX Link)
    Automatic return to service from a declared failure.

CPALCIF (PBX Link)
    Automatic link check (ALC) failure on the specified link. When a link is declared failed (the CPFLD or CPFLDNS event), the ALC is initiated. If the ALC is not successful within 15 seconds from the link failure, this event occurs.
Table 5-1. CNCE Descriptions (Page 13 of 14)

Name (data)
    DESCRIPTION

CPDSERVF (PBX Link)
    A SERV message exchange has failed on the specified D-channel link. The SERV message is sent several times and, if no acknowledgment is received (the T321 timer expires), this event occurs. This indicates either a layer 3 protocol problem, a provisioning problem, or a hardware failure other than facility failure. This event occurs when a link attempts to transition to the IS state. Note that since the SERV message exchange is not done for standby links, a standby link could have latent layer 3 problems.

CPDSTBY (PBX Link)
    A duplex D-channel link has transitioned to the standby state. If the link was in declared failure, this event indicates that it has recovered.

CPDUMOOS (PBX Link)
    The mate D-channel link fails while the indicated link is in the manual out-of-service (MOOS) state. No switchover occurs until manual action removes the MOOS state. If the link remains in MOOS, the system attempts to recover the mate link normally. This event is a warning of possible service outage.

CPFLD (PBX Link)
    Declared link failure (this only applies to PBX links with diagnostic). The link state is changed to OOS and the central processor is informed. For a D-channel link failure, this event indicates a signaling path failure; therefore, any associated B-channels are removed from service. There are various reasons for the failure, including:
    - Layer 1 protocol down (probably failure of DS0 or DS1, no explicit indication of L1 failure)
    - Layer 2 protocol down (protocol exceptions and inability to establish link within 90 sec.)
    - DDS code received
    - Disconnect message received from far end
    - Level 2 error threshold exceeded (usually facility problems).

CPFLDNS (PBX Link)
    Nonsignaling declared link failure of a mated link. The signaling path is still available on the backup link. The link state is changed to OOS. For the reasons for this event, see the CPFLD description.

CPMOOS (PBX Link)
    Manual out-of-service (MOOS) begins.

CPMOOSE (PBX Link)
    Manual out-of-service ends.
Table 5-1. CNCE Descriptions (Page 14 of 14)

Name (data)
    DESCRIPTION

CTREDAL (PBX Node)
    Red alarm declared (near end DS1 facility failure). This is the second most severe trouble condition for a PBX node. This event obstructs sensing of the yellow alarm condition. Note that this means that there may be no explicit clearing of any yellow alarm in progress (normally indicated by the CTYELALC event).

CTREDALC (PBX Node)
    Red alarm cleared. Any yellow alarm in progress is also cleared.

CTYELAL (PBX Node)
    Yellow alarm declared.

CTYELALC (PBX Node)
    Yellow alarm cleared.
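
The SS7 transmit buffer discard strategy described under the C7LCDIS1X event can be illustrated with a short Python sketch. This is not the node's actual implementation: the Message class, its field names, and the returned action strings are hypothetical, and the sketch assumes that message priority and congestion level are small integers that can be compared directly. It also assumes one reading of the description, namely that a return is generated only when the return indicator is set and that the return takes the form of a UDS message when the discarded message is a unit data type SCCP message.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical message model; field names are illustrative only."""
    priority: int            # priority carried in the service information octet
    return_indicator: bool   # True if the return indicator is set
    is_sccp_unitdata: bool   # True for a unit data type SCCP message


def handle_message(msg: Message, congestion_level: int) -> str:
    """Decide what to do with a message at a given transmit congestion level.

    Returns one of the hypothetical action strings "transmit", "discard",
    "discard+return", or "discard+return-uds".
    """
    # Priority equal to or greater than the congestion level: transmit.
    if msg.priority >= congestion_level:
        return "transmit"
    # Lower priority: the message is removed. A return is sent only
    # when the return indicator in the received message is set.
    if not msg.return_indicator:
        return "discard"
    # Assumed reading: unit data SCCP messages are returned as UDS.
    if msg.is_sccp_unitdata:
        return "discard+return-uds"
    return "discard+return"
```

For example, under these assumptions a priority 0 message without the return indicator is simply discarded at congestion level 1, while a priority 2 message is still transmitted at congestion level 2.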
6

Contents

Introduction
Overview
    Diagnostics
    Hardware and Interfaces
    System Maintenance Interfaces
Performing Diagnostics
    Diagnostic Message Structure
    System Diagnostics
    Use of DGN Commands
    Obtaining the Status of Diagnostics
    Node Diagnostic Phase Descriptions
    Circuit Pack Trouble Location Guide
    Diagnostic Listings
    Clearing Troubles Using the Diagnostic Listings
    LNs with Unequipped LI Boards - MV Updates
    Ring Node Addressing
    Automatic Diagnostics and Restorals
    Manual (Unit) Diagnostics
    Manual Diagnostics Using the 1106 Display Page
    Manual Diagnostics Using the DGN Command
    Manual Diagnostics Procedure Using the RST Command
    CDN-I Fault Isolation
    Panic Messages
    RAP Diagnostic Firmware
    Interactive Diagnostics
    Denied Diagnostic Requests
    Inhibiting Diagnostic Requests
    Diagnostic Aborts and Audits
    Aborts
    Audits
    Audit Failures
Operating System Diagnostics
Introduction
This chapter serves as an aid for performing diagnostics on ring nodes (RNs) in a Common Network Interface ring-based office. When diagnostics are performed, the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual should also be used. Diagnostics are performed both automatically and manually. Automatic diagnostics are performed by automatic ring recovery (ARR). For more information concerning ARR, refer to the "Maintenance Description" section in this manual. Manual diagnostics are performed with the aid of input messages at the Maintenance CRT (MCRT).
Overview
Diagnostics
Diagnostics serve two major purposes. First, diagnostics are run for fault detection and resolution. They are invoked by manual requests, and also by error analysis programs as part of the automatic ring recovery (ARR) of a node that has been removed due to a fault condition. Second, diagnostics are invoked for the purpose of repair verification.
The CNI diagnostics provide diagnostic testing for the system. These diagnostics are performed in a manner similar to those of the 3B21D computer system, but diagnose totally different equipment. For a complete list and details on 3B21D computer diagnostics and UNIX system RTR, refer to the UNIX System RTR 3B20/3B21 Operators System Maintenance Manual, 254-303-106.
Hardware and Interfaces

The CNI utilizes the 3B21D computer as the central processing unit. The function of the CNI is to receive messages from incoming applications and route them to an outgoing application. It utilizes a ring communication bus to totally interconnect all application terminations and the 3B21D computer. The ring is a dual bus configuration, and is designed such that faulty circuits can be eliminated from the active system for an indefinite period of time. The CNI diagnostics primarily test ring node hardware that is contained in the ring node frame/cabinet (RNF/C). The types of RNs are:
- Ring Peripheral Controller Nodes (RPCNs)
- Link Nodes (both LIN-E/SS7 and LI4S/SS7 nodes)
- Direct Link Nodes (DLNs)
- DLN30 nodes
- DLN60 nodes
- CDN-I, CDN-II, CDN-IIx, and CDN-III nodes
- MDL nodes
- Ethernet Interface Node(s) (EINs)
Very large scale integration (VLSI) is used for RNs. The VLSI ring node combines the two RIs and the NP of the ring node into one circuit pack called the IRN. The CNI utilizes a link interface to provide an interface between the ring and any office in the network, thus the name Common Network Interface. The CNI diagnostics primarily test this link interface. The following is a description of the ring nodes and their contents.

NOTE: Parentheses () have been used throughout these circuit pack listings to designate that more than one type of circuit pack may exist for a particular ring node, depending upon which generic is being used (although it is preferred that the most current circuit packs be in operation). For more information, refer to SD 3F019-02 (Application Schematic for CNI) for features provided by each circuit pack.

Table 6-1. Discontinued Availability CP Listings

UNIT NAME    MD CIRCUIT PACK    UPDATE CIRCUIT PACK
RI0          UN122, UN122B      UN122C
RI1          UN123              UN123B
NP           TN913              TN922
IRN          UN303              UN303B
LI-E         TN917              TN917B
LI-E         TN1506             TN1803

- IRN RPC node
    Integrated ring node (IRN) UN303() (VLSI)
    Dual duplex serial bus selector (DDSBS) TN69B
    3B computer interface (3BI) TN914
- IRN2 RPC node
    Integrated ring node (IRN2) UN304()
    Dual duplex serial bus selector (DDSBS) TN69B
    3B computer interface (3BI) TN914
- IRN link (LIN-E/SS7) node
    Node processor (NP) TN922
    Integrated ring node (IRN) UN303() (VLSI)
    not encrypted TN916 or encrypted TN917() or memory data link (MDL) TN1317
- IRN link (LI4S/SS7) node
    Integrated ring node (IRN) UN303() (VLSI)
    4-Port Link Interface 0 (LI4 0) TN1316 (LI4S) (the TN1316 has an APA 12A CP, rear mount)
- IRN DLNE node
    Integrated ring node (IRNB) UN303B (VLSI)
    Dual duplex serial bus selector (DDSBS) TN69B
    3B computer interface (3BI) TN914
    Attached processor (AP) TN1630
- IRN2 DLN30 node
    Integrated ring node (IRN2B) UN304B
    Dual duplex serial bus selector (DDSBS) TN69B
    3B computer interface (3BI) TN914
    Attached processor (AP) TN1630
- IRN2 DLN60 node
    Integrated ring node (IRN2B) UN304B
    TN918
    TN1803
    TN1508
    Attached processor (AP) TN2522
- IRN CDN-I node
    Integrated ring node (IRN) UN303()
    Node processor interface (NPI) TN1349
    3B15 computer line of boards:
        - Central controller cache (CCC) UN237(1) or UN626 for the 16-Mbyte memory board option
        - Central controller support (CCS) UN236(1) or UN625 for the 16-Mbyte memory board option
        - Main store controller (MASC) UN95(1-6) or UN507(1) for 16-Mbyte memory board option
        - Main store array (MASA) TN56(1-48) or TN1398(1-8) for 16-Mbyte memory board option
        - Power control interface and display (PCID) TN1128
- IRN2 CDN-II node
    Integrated ring node (IRN2B) UN304B
    Attached processor (AP) TN1630B
- IRN2 CDN-IIx node
    Integrated ring node (IRN2B) UN304B
    Attached processor (AP) TN1720x
    NOTE: The x represents boards lettered TN1720A through TN1720H depending upon the amount of memory installed. Each board has 32 Mbytes of memory.
- IRN2 CDN-III node
    Integrated ring node (IRN2B) UN304
    TN918
    TN1803
    TN1508
    Attached processor (AP) TN2523
- IRN MDL node (includes CSN, DSN, and ICN)
    Integrated ring node (IRN) UN303()/UN304
    MDL TN1640
- IRN2 EIN node
    Integrated Ring Node (IRN) 2 UN304B
    TN4016 Paddleboard, 9822EB ED3F064-37 G80 cable
An RPCN is a node where packetized information is removed from the ring and transferred to the 3B21D computer for processing, or reenters the ring after processing. It is the node on the ring where packetized information enters or exits a transmission facility. Both the RPCN and the DLNs are located in the RNF/C. DLNs function like RPCNs but have DMA capability. They contain the same circuit packs as an RPCN plus an attached processor (AP). CDN-I nodes are also located in the RNF/C. They are basically a VLSI ring node with a modified 3B15 computer as the user apparatus circuit. The Underwriters Laboratories (UL) listed RNF/C provides ring bus connections between the RNs, access to analog and digital facilities, and access to the 3B21D computer via the RPCNs.
System Maintenance Interfaces

Local maintenance access and status information for the 3B21D computer is obtained through video terminals and receive-only printers (ROPs). The Maintenance Terminal (MCRT) provides the primary interface and communications for system control and display (C&D), input and output messages, and the 3B21D computer emergency action interface (EAI) control and display. Inputs entered at the MCRT are monitored via the CTS. The ROP provides hard copies of the MCRT input and output messages, report status information, fault conditions, audits, and diagnostic results. If remote maintenance is provided, it has the same terminal access and terminal capabilities as the on-site user. Because both the remote and local users have simultaneous access to the 3B21D computer, it is advised that diagnostic input requests be coordinated through the on-site MCRT user.
Performing Diagnostics
When performing manual RN diagnostics, input and output messages are entered and interpreted from the maintenance terminal. For this reason, basic terminal familiarization and operating knowledge is required. An understanding of input messages and knowledge of the message data fields and formats are also important. UNIX system Real Time Reliable (RTR) or UNIX system RTR Very Large Main Memory (VLMM) provides assistance to users for entering input messages. It can be used to complete a message or to correct errors made by the user. Invalid values are rejected and accompanied by an appropriate error acknowledgment. Further help can be obtained by entering a question mark (?). A prompting mode can be used to lead the user through the input message. When a complete input message has been constructed, the user may either execute it or cancel it. The help session is then completed; that is, help is provided for only one input message at a time.
Diagnostic Message Structure

Listed within the following paragraphs are basic guidelines for understanding the PDS input message format. For a detailed explanation of the message structure, also refer to the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Manual. An input message can contain 96 characters, separated by colons (:) into fields. The fields of an input message are identified as the action, identification, and data fields, with each field being variable in length. These fields are briefly explained below:
- Action Field: An action verb (keyword) identifies the action the system should perform. This is a verb such as diagnose (DGN), inhibit (INH), remove (RMV), or restore (RST).
- Identification Field: Consists of one, two, or three fields called subfields. These subfields are separated by semicolons (;) with each containing one or more keywords. The identification field aids in structuring the message to permit a complete specification, or provides other information further identifying the object of the action.
- Data Field: This field is either null or composed of additional variable information pertaining to the message. This information is in keyword format with keywords separated by commas.
A general format for input messages and some output messages is shown in Figure 6-1.

ACTION : IDENTIFICATION                                                : DATA
(verb) : subfield (object) ; subfield (object) ; subfield (action option) :

Figure 6-1. General Format for Input/Output Messages

A typical diagnostic input message varies in length and field identifiers. The sample message below shows field separation and identification. Each field is separated by a colon (:) and square brackets [ ] indicate optional information.

DGN:NODExx y[;[RPT n][,RAW][,UCL]][:PH n [,TLP] | :TLP]

where:
DGN: = the action field
NODE = LN or RPCN
xx y[;[RPT n][,RAW][,UCL]][: = the identification field
PH n [,TLP] | :TLP] = the data field.
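
The field structure described above (colon-separated action, identification, and data fields, semicolon-separated identification subfields, and comma-separated data keywords) can be illustrated with a small Python sketch. This is only an illustration of the message layout, not part of the system software: the function name, the returned dictionary keys, and the sample node value LN00 1 are hypothetical, and treating 96 characters as an upper limit is an assumption.

```python
MAX_LENGTH = 96  # assumed upper limit, based on the 96-character message size


def split_input_message(message: str) -> dict:
    """Split an input message into action, identification, and data parts."""
    if len(message) > MAX_LENGTH:
        raise ValueError("input message is longer than 96 characters")

    # Fields are separated by colons: ACTION:IDENTIFICATION[:DATA]
    parts = message.split(":")
    if not 2 <= len(parts) <= 3 or not parts[0].strip():
        raise ValueError("expected ACTION:IDENTIFICATION[:DATA]")

    # The identification field holds one to three subfields,
    # separated by semicolons.
    subfields = [s.strip() for s in parts[1].split(";")]
    if not 1 <= len(subfields) <= 3 or not subfields[0]:
        raise ValueError("identification field needs one to three subfields")

    # The data field is either null or a comma-separated keyword list.
    data = []
    if len(parts) == 3 and parts[2].strip():
        data = [k.strip() for k in parts[2].split(",")]

    return {
        "action": parts[0].strip(),
        "identification": subfields,
        "data": data,
    }
```

With the hypothetical message DGN:LN00 1;RPT 3,RAW:PH 2,TLP, the sketch yields the action DGN, the identification subfields "LN00 1" and "RPT 3,RAW", and the data keywords "PH 2" and "TLP".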
System Diagnostics
Diagnostics may be performed manually. However, when the system detects one or more faults, diagnostics are performed automatically (ARR). This section covers only the manual portions of system diagnostics, and presents information to familiarize the user with the various diagnostic (DGN) input commands, phase descriptions, message interpretation, and other diagnostic information. For more information concerning ARR, refer to the "Maintenance Description" section in this manual. DLNs and CDN-I use the same commands as LNs for diagnostics.
Use of DGN Commands

The manual command, DGN, is used to perform diagnostics on ring nodes. The DGN command has several formats; some of them are detailed in Table 6-2, DGN Message Input Variations. The term nodexx y, used with the DGN commands in the following table and throughout this document, identifies any node by its group and member number. Substitute the appropriate node type before using commands from this manual. DLNs and CDN-I are treated like LNs for diagnostics.

Table 6-2. DGN Message Input Variations

COMMAND                 FUNCTION
DGN:nodexx y            Runs all automatic phases on nodexx y.
DGN:nodexx y:PH a       Runs only the specified phase (a) on nodexx y.
DGN:nodexx y:PH a-b     Runs all automatic phases within the specified range (a through b) on nodexx y.
DGN:nodexx y;RPT n      Runs all automatic phases on nodexx y and repeats execution "n" times, where n ≤ 255.
DGN:nodexx y;RAW        Runs all automatic phases on nodexx y and prints the diagnostic results of every phase at the MROP.
DGN:nodexx y;UCL        Runs all automatic phases on nodexx y. Early terminations built into data tables are ignored.
DGN:nodexx y:TLP        Runs all automatic phases on nodexx y and executes the trouble-locating process at the conclusion of the diagnostics. This process prints a list of possible faulty equipment at the MROP and MCRT.
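The DGN variations can be summarized in a small Python sketch that assembles the message text from the chosen options. It follows the sample syntax DGN:NODExx y[;[RPT n][,RAW][,UCL]][:PH n [,TLP] | :TLP], but the function name build_dgn and its parameters are hypothetical, the lower bound of 1 on the repeat count is an assumption (the manual states only n ≤ 255), and the node name used in the usage note is made up. How phase numbers are written on the terminal (for example 1 or 01) is not stated here, so the sketch passes them through unchanged.

```python
from typing import List, Optional, Tuple, Union

Phase = Union[int, Tuple[int, int]]  # a single phase, or a (first, last) range


def build_dgn(node: str,
              phase: Optional[Phase] = None,
              repeat: Optional[int] = None,
              raw: bool = False,
              ucl: bool = False,
              tlp: bool = False) -> str:
    """Assemble a DGN input message string (illustrative sketch only)."""
    options: List[str] = []
    if repeat is not None:
        # n <= 255 is from the manual; the lower bound of 1 is an assumption.
        if not 1 <= repeat <= 255:
            raise ValueError("repeat count must be between 1 and 255")
        options.append(f"RPT {repeat}")
    if raw:
        options.append("RAW")  # print the results of every phase
    if ucl:
        options.append("UCL")  # ignore early terminations
    data: List[str] = []
    if phase is not None:
        if isinstance(phase, tuple):
            first, last = phase
            data.append(f"PH {first}-{last}")
        else:
            data.append(f"PH {phase}")
    if tlp:
        data.append("TLP")     # run the trouble-locating process
    message = f"DGN:{node}"
    if options:
        message += ";" + ",".join(options)
    if data:
        message += ":" + ",".join(data)
    return message
```

For example, build_dgn("LN00 1", phase=(10, 20)) produces DGN:LN00 1:PH 10-20, and build_dgn("LN00 1", repeat=3, raw=True) produces DGN:LN00 1;RPT 3,RAW.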
Obtaining the Status of Diagnostics

When performing ring node diagnostics, it may be necessary to obtain the status of the system, of the ring itself, or of a particular node. One way to obtain a status report is with the OP input message. Table 6-3, OP:RING Input Message Variations, lists the formats of most of the OP commands that produce status reports. If other information or formats pertaining to the OP command are desired, refer to the 401-610-055 Input Message Manual.

Table 6-3. OP:RING Input Message Variations

INPUT MESSAGE               FUNCTION
OP:RING,nodexx y            Provides status information for the specified node (RPCN or LN).
OP:RING,GRP xx              Provides status information for all nodes on a specified frame/cabinet (GRP xx).
OP:RING or OP:RING;SUM      Provides summary information for the ring.
OP:RING;DETD                Provides detailed status of the ring.
OP:OOS                      Provides status information for all equipment which is out-of-service.
OP:RING,nodexx y;GEN        Requests generic information for the specified node (RPCN or LN).
Another way to obtain a status report of the system is to call up display page 1105 or 1106 from the MCRT. See "Trouble Indicators, Error Analysis, and Display Pages" in this manual.
Node Diagnostic Phase Descriptions

The diagnostic routines for ring nodes are broken down into phases. These phases are described in the Diagnostic Phases tables for each type of node. Phases are arranged to test functionally related groups of hardware. Each phase may test all or part of the hardware on a single CP, or several CPs. Also, each ring node is diagnosed by its own set of diagnostic phases. Certain hardware components, such as the NP, are used by every type of ring node. Therefore, the phases that correspond to these hardware components are also used by every type of ring node. DLNs use the IRN LN phases plus the DLN phases. The CDNs use the IRN LN phases plus the CDN phases.
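The way phase numbers are shared between node types can be checked with a few lines of Python. The sketch below takes the phase numbers listed in Table 6-5 (IRN LN, LIN-E/SS7) and Table 6-7 (IRN DLNE) and compares them as sets. The constant and function names are hypothetical, and a matching phase number is taken here only as a matching table entry, not as proof that the tests are identical.

```python
from typing import List, Set

# Phase numbers as listed in Table 6-5 (IRN LN, LIN-E/SS7).
LN_LIN_E_PHASES: Set[int] = {1, 2, 10, 12, 13, 20, 21, 39, 40, 41, 47, 48}

# Phase numbers as listed in Table 6-7 (IRN DLNE).
DLNE_PHASES: Set[int] = {1, 2, 10, 12, 13, 20, 21,
                         30, 31, 32, 33, 34, 35, 40, 41, 42}


def shared_phases(first: Set[int], second: Set[int]) -> List[int]:
    """Phase numbers that appear in both tables."""
    return sorted(first & second)


def extra_phases(node: Set[int], base: Set[int]) -> List[int]:
    """Phase numbers listed for `node` but not for `base`."""
    return sorted(node - base)


if __name__ == "__main__":
    print("shared:", shared_phases(LN_LIN_E_PHASES, DLNE_PHASES))
    print("DLNE only:", extra_phases(DLNE_PHASES, LN_LIN_E_PHASES))
```

With these two tables, the shared phase numbers are 1, 2, 10, 12, 13, 20, 21, 40, and 41, and the phases listed only for the DLNE are 30 through 35 and 42.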
Table 6-4.
IRN and IRN2 RPCN Node Diagnostic Phases PHASE DESCRIPTION Tests that a message can be relayed from the BISO node to the EISO node via the isolated segment over ring 0. Phase 1 also tests that any interframe buffers and all IRN boards in the isolated segment are equipped in accordance with ECD data, and that any interframe buffers in the isolated segment exhibit the proper data storage capacity. Tests that a message can be relayed from the EISO node to the BISO node via the isolated segment over ring 1. Phase 2 also tests that any interframe buffers and all IRN boards in the isolated segment are equipped in accordance with ECD data, and that any interframe buffers in the isolated segment exhibit the proper data storage capacity. Tests the interface between the Dual Serial Channel (DSCH) and the DDSBS. Tests interface between the DDSBS and the 3BI. Veries that RAC0 can detect bad parity in a ring message. Veries that RAC1 can detect bad parity in a ring message. Runs off-line CU to DDSBS tests (Demand phase only). Tests the NP RAM memory, NP parity checker, and generator circuitry. Tests everything but the memory in the node-processor function.THIS PHASE IS NOT VALID FOR IRN2. Tests part of both RAC circuits, and the RAC to the NP interface. Partially tests interface between both RACs and the ring bus. Veries that RAC0 can detect bad parity in a ring message. Veries that RAC1 can detect bad parity in a ring message.
PHASE 01
02
10 11 12 13 14 20 21 (IRN only) 30 32 33
Table 6-5. IRN LN (LIN-E/SS7) Node Diagnostic Phases

PHASE   PHASE DESCRIPTION
01      Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment over ring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
02      Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment over ring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
10      Tests part of both RACs, the RAC to the NP interface, and the interface between both RACs and the ring bus. Checks the capacity of the interframe buffers associated with the node under test.
12      Verifies that RAC0 can detect bad parity in a ring message.
13      Verifies that RAC1 can detect bad parity in a ring message.
20      Tests the NP RAM memory, NP parity checker and generator circuitry.
21      Tests the NP programmable master and slave interrupt controllers and associated circuitry. It also tests the NP programmable interval timer circuitry.
39      Verifies the ability of the node to read, write and propagate a maximum-length long message (demand only phases for transition load).
40      Tests hardware in the LI board or the LI-NP interface.
41      Tests the sanity of the microprocessor and the ROM.
47      Tests the 2.4 and 4.8 data service units, along with their respective VFLA or DSA units. CCS7 will ATP by default.
48      Ensures that the firmware and the hardware on the LI board will function as a whole.
Table 6-6. PHASE 01
IRN LN (LI4S/SS7) Node Diagnostic Phases (Page 1 of 2) PHASE DESCRIPTION Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment overring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity. Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment overring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity. Tests part of both RACs, the RAC to the NP interface, and the interface between both RACs and the ring bus. Checks the capacity of the interframe buffers associated with the node under test. Verifies that RAC0 can detect bad parity in a ring message. Verifies that RAC1 can detect bad parity in a ring message. Tests the NP RAM memory, NP parity checker and generator circuitry. THIS PHASE IS NOT VALID FOR IRN2 Tests the NP programmable master and slave interrupt controllers and associated circuitry .It also tests the NP programmable interval timer circuitry. Verifies the ability of the node to read, write and propagate a maximum length long message (demand only phases for transition load). Tests the LI4 0 local RAM and the Dual Port RAM from the Node Processor. The LI4 is held reset. Tests the NP-LI4 0 interface and DPRAM from the NP view while the microprocessor on the Link Interface board is running. This phase is downloaded to the LI4 0 via the NP. Tests the 8086 microprocessor on theLI4 0 board. A subset of the instruction set of the 8086 is exercised to verify that the microprocessor operates properly. 
This phase is downloaded to the LI4 0 via the NP. Tests the DPRAM and the parity check circuit. This phase is downloaded to the LI40 RAM via the NP.
02
10
12 13 20 21 (IRN Only) 39 50 51
52
53
Table 6-6. PHASE 54 55
IRN LN (LI4S/SS7) Node Diagnostic Phases (Page 2 of 2) PHASE DESCRIPTION Tests the Programmable Interrupt Controllers and the Programmable Interval Timers.This phase is downloaded to the LI4 0 RAM via the NP. Tests the DMA, Serial Communications Chip (SCC), part of the Programmable Interrupt Controller, timers, and the formatting chips ofLI4 0 when the LI4D is tested (TN1315). No tests are run; ATPs are by default. If TLP is run, the APA13 and the DSA (Z2556L1A/2) are noted but no tests are run. Thus, when link maintenance is performed, this equipment must be taken into consideration.
56
Table 6-7. IRN DLNE Node Diagnostic Phases

PHASE   PHASE DESCRIPTION
01      Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment over ring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
02      Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment over ring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
10      Tests part of both RACs, the RAC to the NP interface, and the interface between both RACs and the ring bus. Checks the capacity of the interframe buffers associated with the node under test.
12      Verifies that RAC0 can detect bad parity in a ring message.
13      Verifies that RAC1 can detect bad parity in a ring message.
20      Tests the NP RAM memory, NP parity checker and generator circuitry.
21      Tests the NP programmable master and slave interrupt controllers and associated circuitry. It also tests the NP programmable interval timer circuitry.
30      Tests the interface between the DSCH and the DDSBS.
31      Tests the interface between the DDSBS and the 3BI.
32      Tests the ability of NP to go insane and set the Interrupt Request Flag when the 3BI has an error.
33      Tests the interface between the 3BI and the NP.
34      Runs off-line CU to DDSBS tests. (Demand phase only.)
35      Cooperates with the 3B21D driver to test the DMA capability via the 3BI.
40      Tests the hardware in the LI board or the LI-NP interface.
41      Tests the sanity of the microprocessor and the ROM.
42      Tests the interface between DMA and 3BI.
Table 6-8. PHASE 01*
IRN2 DLN30 Node Diagnostic Phases (Page 1 of 2) PHASE DESCRIPTION Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment overring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity. Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment overring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity. Tests part of both RACs, the RAC to the IRN2 interface, and the interface between both RACs and the ring bus. Veries that RAC0 can detect bad parity in a ring message. Veries that RAC1 can detect bad parity in a ring message. Tests the IRN2 RAM memory, IRN2 parity checker and generator circuitry. Tests the interface between the DSCH and the DDSBS. Tests the interface between the DDSBS and the 3BI. Tests the ability of NP to go insane and set the Interrupt Request Flag when the 3BI has an error. Tests the interface between the 3BI and the NP. Runs off-line CU to DDSBS tests. (Demand phase only) Cooperates with the 3B21D driver to test the DMA capability via the 3BI. Tests the shared static memory in the AP30 from theIRN2 side.
02*
10* 12* 13* 20* 30 31 32 33 34 35 40*
IRN2 DLN30 Node Diagnostic Phases (Page 2 of 2) PHASE DESCRIPTION Tests the shared static memory from the AP30 side, the local parity error snapshot register, and the main 16 Megabytes of DRAM on the AP30. Tests the DMA capability via the 3BI.The DMA is from the 3B21D to/ from the AP Dual Port Memory (DPM). Tests the 4 D-channel data links on the AP30.
42* 43
* Automatic Demand-Only
Table 6-9. IRN2 DLN60 Node Diagnostic Phases

PHASE   PHASE DESCRIPTION
01*     Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment over ring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
02      Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment over ring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
10      Tests part of both RACs, the RAC to the IRN2 interface, and the interface between both RACs and the ring bus.
12      Verifies that RAC0 can detect bad parity in a ring message.
13      Verifies that RAC1 can detect bad parity in a ring message.
20      Tests the IRN2 RAM memory, IRN2 parity checker and generator circuitry.
40      Tests the shared static memory in the AP60 from the IRN2 side.
41      Tests the shared static memory from the AP60 side, the local parity error snapshot register, and the main 32 Megabytes of DRAM on the AP60.

* Demand-only
Table 6-10. PHASE 01
IRN CDN-I Diagnostic Phases (Page 1 of 2) PHASE DESCRIPTION Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment overring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity. Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment overring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity. Tests part of both RACs, the RAC to the NP interface, and the interface between both RACs and the ring bus. Checks the capacity of the interframe buffers associated with node under test. Veries that RAC0 can detect bad parity in a ring message. Veries that RAC1 can detect bad parity in a ring message. Tests the NP RAM memory, NP parity checker and generator circuitry. Tests the NP programmable master and slave interrupt controllers and associated circuitry .It also tests the NP programmable interval timer circuitry. Tests the NPI from the IRN side. Tests the CCS board. Tests the MASC 0 memory group. Tests the MASC 16 memory group. Tests the CCC board. Tests the NPI from the RAP side. Tests the MASC 1 memory group.
02
10
12 13 20 21
40 42 43 43 (16 meg) 44 45 46
Table 6-10. PHASE 47 48 49 50 51 52 53
IRN CDN-I Diagnostic Phases (Page 2 of 2) PHASE DESCRIPTION Tests the MASC 2 memory group. Tests the MASC 3 memory group. Tests the MASC 4 memory group. Tests the MASC 5 memory group. Tests the MASC 6 memory group. Tests the MASC 7 memory group. Tests a comprehensive end-to-end test. Tests the MASA 0. Tests the MASA 1. Tests the MASA 2. Tests the MASA 3. Tests the MASA 4. Tests the MASA 5. Tests the MASA 6. Tests the MASA 7.
54 (16 meg) 55 (16 meg) 56* (16 meg) 57* (16 meg) 58* (16 meg) 59* (16 meg) 60* (16 meg) 61* (16 meg) * Demand-only
IRN2 CDN-II/CDN-IIx Diagnostic Phases (Page 1 of 2) PHASE DESCRIPTION Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment overring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment overring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity. Tests part of both RACs, the RAC to the IRN2interface, and the interface between both RACs and the ring bus. Veries that RAC0 can detect bad parity in a ring message. Veries that RAC1 can detect bad parity in a ring message. Tests the IRN2 RAM memory, IRN2 parity checker and generator circuitry. Tests the shared static memory in the AP30 from the IRN2 side. Tests the shared static memory from the AP30 side, the local parity error snapshot register, and the main 16 Megabytes of DRAM on the AP30. Tests the 4 D-channel data links on the AP30. Tests the overall functionality of the mezzanine memory. For CDN-II, tests the 1st 32 Mbytes of the mezzanine memory.For CDN-IIx, tests the 1st 32-Mbyte block of the mezzanine. For CDN-II, tests the 2nd 32 Mbytes of the mezzaninememory.For CDN-IIx, tests the 2nd 32-Mbyte block of the mezzanine. For CDN-IIx only, tests the 3rd 32-Mbyte block of the mezzanine.
02*
10* 12* 13* 20* 40* 41*
43 44 45
46
47
Table 6-11. PHASE 48 49 50 51 52
IRN2 CDN-II/CDN-IIx Diagnostic Phases (Page 2 of 2) PHASE DESCRIPTION For CDN-IIx only, tests the 4th 32-Mbyte block of the mezzanine. For CDN-IIx only, tests the 5th 32-Mbyte block of the mezzanine. For CDN-IIx only, tests the 6th 32-Mbyte block of the mezzanine. For CDN-IIx only, tests the 7th 32-Mbyte block of the mezzanine. For CDN-IIx only, tests the 8th 32-Mbyte block of the mezzanine.
Automatic. NOTE: For APX6.1 prior to Software Update that includes diagnostics for CDN-IIx, Phases 43 and 45 through 52 are demand-only phases; Phase 44 is an automatic phase. For APX6.1 with the Software Update that includes diagnostics for CDN-IIx and for APX7.0, Phase 43 does not apply; and Phases 44 through 52 are automatic phases.
Table 6-12. IRN2 CDN-III Diagnostic Phases

PHASE   PHASE DESCRIPTION
01      Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment over ring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
02      Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment over ring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
10      Tests part of both RACs, the RAC to the IRN2 interface, and the interface between both RACs and the ring bus.
12      Verifies that RAC0 can detect bad parity in a ring message.
13      Verifies that RAC1 can detect bad parity in a ring message.
20      Tests the IRN2 RAM memory, IRN2 parity checker and generator circuitry.
40      Tests the shared static memory in the AP60 from the IRN2 side.
41      Tests the shared static memory from the AP60 side, the local parity error snapshot register, and the main 32 Megabytes of DRAM on the AP60.
44      Tests the database memory control circuits.
45*     Tests the 1st 128 Mbytes of the AP60 0.5 Gbyte database memory array.
46*     Tests the 2nd 128 Mbytes of the AP60 0.5 Gbyte database memory array.
47*     Tests the 3rd 128 Mbytes of the AP60 0.5 Gbyte database memory array.
48*     Tests the 4th 128 Mbytes of the AP60 0.5 Gbyte database memory array.

* Demand-only.
IRN2 EIN Node Diagnostic Phases PHASE DESCRIPTION Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment overring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment overring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity. Tests part of both RACs, the RAC to the IRN2 interface, and the interface between both RACs and the ring bus. Veries that RAC0 can detect bad parity in a ring message. Veries that RAC1 can detect bad parity in a ring message. Tests the IRN2 RAM memory, IRN2 parity checker and generator circuitry. Tests the shared static memory in the AP30 from the IRN2 side. Tests the shared static memory from the AP30 side, the local parity error snapshot register, and the main 16 Megabytes of DRAM on the AP30. Tests the 4 D-channel data links on the AP30. Tests the overall functionality of the mezzanine memory. For CDN-II, tests the 1st 32 Mbytes of the mezzaninememory. For CDN-IIx, tests the 1st 32-Mbyte block of the mezzanine. For CDN-II, tests the 2nd 32 Mbytes of the mezzaninememory.For CDN-IIx, tests the 2nd 32-Mbyte block of the mezzanine. For CDN-IIx only, tests the 3rd 32-Mbyte block of the mezzanine.
02*
10* 12* 13* 20* 40* 41*
43 44 45 46
47
Automatic.
IRN MDL (SCN, DSN, ICN) Diagnostic Phases

PHASE   PHASE DESCRIPTION
01      Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC0. Phase 1 also tests that a message can be relayed from the BISO node to the EISO node via the isolated segment over ring 0, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
02*     Tests that each node in the isolated segment is able to set and clear its data selector via hardware commands at RAC1. Phase 2 also tests that a message can be relayed from the EISO node to the BISO node via the isolated segment over ring 1, and that any interframe buffers in the isolated segment are equipped in accordance with ECD data and exhibit the proper data storage capacity.
10*     Tests part of both RACs, the RAC to the NP interface, and the interface between both RACs and the ring bus. Checks the capacity of the interframe buffers associated with the node under test.
12*     Verifies that RAC0 can detect bad parity in a ring message.
13*     Verifies that RAC1 can detect bad parity in a ring message.
20*     Tests the IRN2 RAM memory, IRN2 parity checker and generator circuitry.
40*     Requests download of diagnostic driver code to the IRN2 and initiates its execution to diagnose the Ethernet interface hardware. Testing ends at the loopback relay on the ELI circuit pack, CP TN4016.

* Automatic

Circuit Pack Trouble Location Guide

The following pages contain checklists of probable or suspected faulty circuit packs, for use when a diagnostic phase has failed for a particular ring node. Each listing is ordered from the most probable to the least probable cause of failure. When diagnosing ring nodes, if the diagnostic result returned is some-tests-failed (STF), refer to the Trouble Location CP List tables for the location of the faulty or suspected faulty CP(s).
The TLP option delivers the same information as these tables and can also be used to identify faulty or suspected faulty CPs. The TLP output is valid only for the first failing phase and only when all phases are run.
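The manual lookup described above (find the first failing phase, then read its circuit pack list in order of probability) can be sketched in Python. The data below is an excerpt of Table 6-17 (IRN LN, LIN-E/SS7); the function names, the dictionary layout, and the use of an integer to stand for a "Same as Phase nn" entry are illustrative assumptions. The sketch does not reproduce the TLP software.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

# Excerpt of Table 6-17. A list gives the packs from most to least probable;
# an integer stands for a "Same as Phase nn" entry (an assumed encoding).
SUSPECTS: Dict[int, Union[int, List[str]]] = {
    1: ["UN303()", "TN915/TN918", "TN1506/TN1508/TN1509", "Ring Bus Cable"],
    2: 1,
    10: ["UN303()"],
    12: ["UN303()", "TN915/TN918", "TN1506/TN1508/TN1509"],
    13: 12,
    40: ["TN916", "TN917()", "UN303()"],
    41: 40,
}


def suspects_for_phase(phase: int) -> List[str]:
    """Return the pack list for a phase, following 'Same as' entries."""
    entry = SUSPECTS[phase]  # raises KeyError for phases not in this excerpt
    while isinstance(entry, int):
        entry = SUSPECTS[entry]
    return entry


def first_failing_phase(results: Sequence[Tuple[int, str]]) -> Optional[int]:
    """Return the first phase whose result is STF, or None if none failed."""
    for phase, result in results:
        if result == "STF":
            return phase
    return None


def locate_trouble(results: Sequence[Tuple[int, str]]) -> List[str]:
    """Suspected packs for the first failing phase; empty if all passed."""
    phase = first_failing_phase(results)
    return [] if phase is None else suspects_for_phase(phase)
```

For example, if phases 01 through 12 return ATP and phase 13 returns STF, locate_trouble returns the phase 12 list: UN303(), TN915/TN918, and TN1506/TN1508/TN1509. Only the first failing phase is used, which mirrors the restriction stated for the TLP output.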
The TLP capability has been enhanced to provide more extensive on-line interpretation of isolated segment diagnostic failures (phases 1 and 2). This assists in the direct localization of ring faults to nodes (or circuit packs) within a multinode isolated segment other than the node being diagnosed. Visual indicators in the form of LEDs located on the CPs can also be used to locate faulty CPs. More information on visual indicators is given elsewhere in this manual.
NOTE: Parentheses () are used throughout these circuit pack listings to designate that more than one type of circuit pack may exist for a particular ring node, depending upon which generic is being used (although it is preferred that the most current circuit packs be in operation). For more information on the features provided by each circuit pack, refer to SD 3F019-02, the Application Schematic for CNI.

Table 6-15. Discontinued Availability CP Listings

UNIT NAME   MD CIRCUIT PACK     UPDATED CIRCUIT PACK
RI0         UN122, UN122B       UN122C
RI1         UN123               UN123B
NP          TN913               TN922
IRN         UN303               UN303B
LI-E        TN917               TN917B
LI-E        TN1506              TN1803
Table 6-16.
IRN and IRN2 RPC Trouble Location CP List (Page 1 of 2) PROBABLE/SUSPECTED FAULTY PACK UN303()/UN304B TN915/TN918 TN1508/TN1803 Ring Bus Cable UNIT NAME IRN/IRN2 IFB IFB RNF/C Same as Phase 01
DIAGNOSTIC PHASE PHASE 01 TABLE
02
rpc02.I
Same as Phase 01
Table 6-16.
IRN and IRN2 RPC Trouble Location CP List (Page 2 of 2) PROBABLE/SUSPECTED FAULTY PACK TN69B KBN15 (3B21D) UNIT NAME DDSBS DSCH 3BI DDSBS 3BI IRN/IRN2 3BI IRN/IRN2 DDSBS (Demand only Phase) Off-Line DSCH IRN/IRN2 IRN IRN/IRN2 IRN/IRN2 IFB IFB Same as Phase 32
DIAGNOSTIC PHASE PHASE 10 TABLE rpc10.I
11
rpc11.I
TN914 TN69B
12
rpc12.I
TN914 UN303()/UN304B
13
rpc13.I
TN914 UN303()/UN304B
14
rpc14.I
TN69B
KBN15 (3B21D) 20 21 30 32 rpci20.I rpci21.I rpci30.I rpc32.I UN303()/UN304B UN303()/UN304B UN303()/UN304B UN303()/UN304B TN915/TN918 TN1508/TN1803 33 rpc33.I Same as Phase 32
Table 6-17. IRN LN (LIN-E/SS7) Trouble Location CP List (Page 1 of 2)

DIAGNOSTIC PHASE   PHASE TABLE    PROBABLE/SUSPECTED FAULTY PACK   UNIT NAME
01                 iuin01.I       UN303()                          IRN
                                  TN915/TN918                      IFB
                                  TN1506/TN1508/TN1509             IFB
                                  Ring Bus Cable                   RNF/C
02                 iun02.I        Same as Phase 01                 Same as Phase 01
10                 iuni10.I       UN303()                          IRN
12                 iun12.I        UN303()                          IRN
                                  TN915/TN918                      IFB
                                  TN1506/TN1508/TN1509             IFB
13                 iun13.I        Same as Phase 12                 Same as Phase 12
20                 iuni20.I       UN303()                          IRN
21                 iiuni21.I      UN303()                          IRN
39                 iun39.I        UN303()                          IRN
40                 cBph0.40.I     TN916                            LI-NE
                                  TN917()                          LI-E
                                  UN303()                          IRN
41                 cBph1.41.I     Same as Phase 40                 Same as Phase 40
Table 6-17.
IRN LN (LIN-E/SS7) Trouble Location CP List (Page 2 of 2) PROBABLE/SUSPECTED FAULTY PACK TN916 TN917() TN919 2024-A, 2048-A TN922 LINK Cabling UNIT NAME LI-NE LI-E VFLA Data Sets NP
DIAGNOSTIC PHASE PHASE 47 TABLE cBph7.47.I*
48
cBph8.48.I
TN919 (CCS6) 2024-A, 2048-A (CCS6) TF9 (CCS7)
VFLA
Facility Int. Data Sets LI-NE LI-E NP
Z2466L1A/2 (CCS7) TN916 TN917() TN922 Link Cabling * Phase 47 - CCS7 will ATP by default. Phase 48 - test 47 will fail if Z24556L1A/2 is in Local Loop (LL).
Table 6-18.
IRN LN (LI4S/SS7) Trouble Location CP List (Page 1 of 2) PROBABLE/SUSPECTED FAULTY PACK UN303()/UN304B TN915/TN918 TN1506/TN1508/TN1509 Ring Bus Cable IRN/IRN2 IFB IFB RNF/C UNIT NAME
DIAGNOSTIC PHASE PHASE 01 TABLE iun01.l
Table 6-18.
IRN LN (LI4S/SS7) Trouble Location CP List (Page 2 of 2) PROBABLE/SUSPECTED FAULTY PACK Same as Phase 01 UN303()/UN304B UN303()/UN304B TN915/TN918 TN1508/TN1803 Same as Phase 01 IRN/IRN2 IRN/IRN2 IFB IFB Same as Phase 12 IRN/IRN2 IRN IRN LI4S 0 Same as Phase 50 LI4S 0 LI4S 0 LI4S 0 LI4S 0 UNIT NAME
DIAGNOSTIC PHASE PHASE 02 10 12 TABLE iun02.l iuni10.l iun12.l
13 20 21 50
iun13.l iuni20.l iuni21.l LI4ph0.50.l
Same as Phase 12 UN303()/UN304B UN303() UN303() TN1316
51 52 53 54 55 56
LI4ph1.5i1.l LI4ph2.52.l LI4ph3.53.l LI4ph4.54.l LI4ph5.55. LI4ph6.56.l
Same as Phase 50 TN1316 TN1316 TN1316 TN1316 ATPs are by default (APA13 and the DSA (Z2556L1A/2) are noted but no tests are run.
Table 6-19.
IRN DLNE Trouble Location CP List (Page 1 of 2) PROBABLE/SUSPECTED FAULTY PACK UN303()/UN304B TN915/TN918 TN1508/TN1803 Ring Bus Cable IRN/IRN2 IFB IFB RNF/C Same as Phase 01 IRN/IRN2 IRN/IRN2 IFB IFB Same as Phase 12 IRN/IRN2 IRN/IRN2 DDSBS DSCH 3BI DDSBS 3BI IRN/IRN2 3BI IRN/IRN2 UNIT NAME
02 10 12
iun02.l iuni10. iun12.l
Same as Phase 01 UN303()/UN304B UN303()/UN304B TN915/TN918 TN1508/TN1803
13 20 21 30
iun13.l iuni20.l iuni21.l iun30.l
Same as Phase 12 UN303()/UN304B UN303()/UN304B TN69B KBN15 (3B21D)
31
iun31.l
TN914 TN69B
32
iun32.l
TN914 UN303()/UN304B
33
iun33.l
TN914 UN303()/UN304B
Table 6-19.
IRN DLNE Trouble Location CP List (Page 2 of 2) PROBABLE/SUSPECTED FAULTY PACK TN69B DDSBS (Demand only phase) Off-line DSCH Same as Phase 33 AP AP LI4E AP AP LI4E IRN AP AP LI4E UNIT NAME
DIAGNOSTIC PHASE PHASE 34 TABLE iun34.I
KNB15 (3B21D) 35 40 iun35.I ap68.40.I Same as Phase 33 TN1340 (2 Meg) TN1641 (8 Meg) TN1630 (4ESS Only) 41 ap68.41.I TN1340 (2 Meg) TN1641 (8 Meg) TN1630 (4ESS Only) 42 ap68.42.I UN1340 (2 Meg) TN1340 (2 Meg) TN1641 (8 Meg) TN1630 (4ESS Only)
Table 6-20.
IRN2 DLN30 Trouble Location CP List (Page 1 of 2) PROBABLE/SUSPECTED FAULTY PACK UN304 TN918 TN1803 TN1508 Ring Bus Cable IRN2 IFB-U IFB-4K/8 IFB-16/8 RNF/C Same as Phase 01 IRN2 IRN2 IFB-U IFB-4K/8 IFB-16/8 Same as Phase 12 IRN/IRN2 DDSBS DSCH 3BI DDSBS 3BI IRN2 3BI IRN2 UNIT NAME
DIAGNOSTIC PHASE PHASE 01* TABLE iun01.l
02* 10* 12*
Same as Phase 01 UN304 UN304 TN918 TN1803 TN1508
13* 20* 30
iun13.l iuni20.l iun30.l
Same as Phase 12 UN303()/UN304B TN69B KBN15 (3B21D)
31
iun31.l
TN914 TN69B
32
iun32.l
TN914 UN304
33
iun33.l
TN914 UN304
Table 6-20.
IRN2 DLN30 Trouble Location CP List (Page 2 of 2) PROBABLE/SUSPECTED FAULTY PACK TN69B DDSBS (Demand only phase) Off-line DSCH Same as Phase 33 AP30 AP30 AP30 AP30 UNIT NAME
KNB15 (3B21D) 35 40* 41* 42* 43 iun35.I ap68.40.I ap60.41.I ap68.42.I Ii4e.43.I Same as Phase 33 TN1630B TN1630B TN1630B TN1630B
* Automatic Demand-Only
Table 6-21.
IRN2 DLN60 Trouble Location CP List (Page 1 of 2) PROBABLE/SUSPECTED FAULTY PACK UN304 TN918 TN1803 TN1508 Ring Bus Cable IRN2 IFB-U IFB-4K/8 IFB-16/8 RNF/C Same as Phase 01 IRN2 UNIT NAME
02 10
iun02.l iuni10.l
Same as Phase 01 UN304B
Table 6-21.
IRN2 DLN60 Trouble Location CP List (Page 2 of 2) PROBABLE/SUSPECTED FAULTY PACK UN304B TN918 TN1803 TN1508 IRN2 IFB-U IFB-4K/8 IFB-16/8 Same as Phase 12 IRN2 AP60 AP60 UNIT NAME
13 20 40 41
iun13.l iuni20.l ap68.40.I ap68.41.I
Same as Phase 12 UN304B TN2522 TN2522
Table 6-22.
IRN CDN-I Manual Trouble Location CP List (Page 1 of 3) PROBABLE/SUSPECTED FAULTY PACK UN303 UN303B TN918 TN1803 TN1508 Ring Bus Cable IRN IRNB IFB-U IFB-4K/8 IFB-16/8 RNF/C Same as Phase 01 IRN IRNB UNIT NAME
02 10
iun02.I iuni10.I
Same as Phase 01 UN303 UN303B
Table 6-22.
IRN CDN-I Manual Trouble Location CP List (Page 2 of 3) PROBABLE/SUSPECTED FAULTY PACK UN303 UN303B TN918 TN1803 TN1508 IRN IRNB IFB-U IFB-4K/8 IFB-16/8 Same as Phase 12 IRN IRNB Same as Phase 20 NPI CCS CCS16 MASA (0-7) MASC 0 MASA16 (0-7) MASC16 CCC CCC16 NPI MASA (0-7) MASC1 Same as Phase 46 Same as Phase 46 UNIT NAME
13 20
iun13.I iuni20.I
Same as Phase 12 UN303 UN303B
21 40 42
iuni21.I irap40.I irap42.I
Same as Phase 20 TN1349 UN236 UN625
43
irap43.I
TN56 UN95
43 (16meg)
irap43_16.I
TN1398 UN507
44
irap44.I
UN237 UN626
45 46
irap45.I irap46.I
TN1349 TN56 UN95/UN295
47 48
irap47.I irap48.I
Same as Phase 46 Same as Phase 46
Table 6-22.
IRN CDN-I Manual Trouble Location CP List (Page 3 of 3)

DIAGNOSTIC PHASE  TABLE      PROBABLE/SUSPECTED FAULTY PACK  UNIT NAME
49                irap49.l   Same as Phase 46                Same as Phase 46
50                irap50.l   Same as Phase 46                Same as Phase 46
51                irap51.l   Same as Phase 46                Same as Phase 46
52                irap52.l   Same as Phase 46                Same as Phase 46
53                irap53.l   all                             all
54                irap54.l   TN1398                          MASA16 (0)
55                irap55.l   TN1398                          MASA16 (1)
56*               irap56.l   TN1398                          MASA16 (2)
57*               irap57.l   TN1398                          MASA16 (3)
58*               irap58.l   TN1398                          MASA16 (4)
59*               irap59.l   TN1398                          MASA16 (5)
60*               irap60.l   TN1398                          MASA16 (6)
61*               irap61.l   TN1398                          MASA16 (7)
* Demand-only
Table 6-23.
IRN2 CDN-II/CDN-IIx Manual Trouble Location CP List (Page 1 of 2) PROBABLE/SUSPECTED FAULTY PACK UN304 TN918 TN1803 TN1508 Ring Bus Cable IRN2 IFB-U IFB-4K/8 IFB-16/8 RNF/C Same as Phase 01 IRN2 IRN2 IFB-U IFB-4K/8 IFB-16/8 Same as Phase 12 IRN2 AP30 AP30 AP30 AP30 AP30 AP30 AP30 UNIT NAME
02* 10* 12*
iun02.l iuni10.I iun12.l
Same as Phase 01 UN304 UN304 TN918 TN1803 TN1508
13* 20* 40* 41* 43 44 45 46 47
iun13.l iuni20.l ap68.40.I Ii4e.41.I Ii4e.43.I ap30.44.I ap30.45.I ap30.46.I ap30.47.I
Same as Phase 12 UN304 TN1630B(CDN-II) TN1720()(CDN-IIx) TN1630B(CDN-II) TN1720()(CDN-IIx) TN1630B TN1630B(CDN-II) TN1720()(CDN-IIx) TN1630B(CDN-II) TN1720()(CDN-IIx) TN1630B(CDN-II) TN1720()(CDN-IIx) TN1720() CDN-IIx
Table 6-23.
IRN2 CDN-II/CDN-IIx Manual Trouble Location CP List (Page 2 of 2)

DIAGNOSTIC PHASE  TABLE       PROBABLE/SUSPECTED FAULTY PACK  UNIT NAME
48                ap30.48.l   TN1720() CDN-IIx                AP30
49                ap30.49.l   TN1720() CDN-IIx                AP30
50                ap30.50.l   TN1720() CDN-IIx                AP30
51                ap30.51.l   TN1720() CDN-IIx                AP30
52                ap30.52.l   TN1720() CDN-IIx                AP30
* Automatic
NOTE: For APX6.1 prior to Software Update that includes diagnostics for CDN-IIx, Phases 43 and 45 through 52 are demand-only phases; Phase 44 is an automatic phase. For APX6.1 with the Software Update that includes diagnostics for CDN-IIx and for APX7.0, Phase 43 does not apply; and Phases 44 through 52 are automatic phases.
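The rule in the NOTE above can be written out as a small lookup. The following Python sketch is illustrative only: the function name and the boolean flag are assumptions introduced here, and the flag stands for "APX6.1 with the Software Update that includes diagnostics for CDN-IIx, or APX7.0".

```python
def cdn2_phase_mode(phase: int, cdn2x_diagnostics_loaded: bool) -> str:
    """Classify CDN-II/CDN-IIx phases 43 through 52 as described in the NOTE.

    cdn2x_diagnostics_loaded is a hypothetical flag: True for APX6.1 with the
    Software Update that includes CDN-IIx diagnostics, and for APX7.0.
    """
    if not 43 <= phase <= 52:
        raise ValueError("the NOTE only covers phases 43 through 52")
    if cdn2x_diagnostics_loaded:
        # Phase 43 does not apply; phases 44 through 52 are automatic.
        return "not applicable" if phase == 43 else "automatic"
    # Before the update: phase 44 is automatic, 43 and 45-52 are demand-only.
    return "automatic" if phase == 44 else "demand-only"
```

For example, phase 47 is reported as demand-only before the update and as automatic after it.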
Table 6-24.
IRN2 CDN-III Trouble Location CP List (Page 1 of 2) PROBABLE/SUSPECTED FAULTY PACK UN304 TN918 TN1803 TN1508 Ring Bus Cable IRN2 IFB-U IFB-4K/8 IFB-16/8 RNF/C Same as Phase 01 IRN2 UNIT NAME
02 10
iun02.l iuni10.l
Same as Phase 01 UN304B
Table 6-24.
IRN2 CDN-III Trouble Location CP List (Page 2 of 2) PROBABLE/SUSPECTED FAULTY PACK UN304B TN918 TN1803 TN1508 IRN2 IFB-U IFB-4K/8 IFB-16/8 Same as Phase 12 IRN2 AP60 AP60 AP60 AP60 AP60 AP60 AP60 UNIT NAME
13 20 40 41 44 45 46 47 48
iun13.l iuni20.l ap60.40I ap60.41I ap60.44I ap60.45I ap60.46I ap60.47I ap60.48I * Automatic
Same as Phase 12 UN304 TN2523 TN2523 TN2523 TN2523 TN2523 TN2523 TN2523
Table 6-25.
IRN2 EIN Node Trouble Location CP List (Page 1 of 2) PROBABLE/SUSPECTED FAULTY PACK UN304 TN918 TN1803 TN1508 Ring Bus Cable IRN2 IFB-U IFB-4K/8 IFB-16/8 RNF/C UNIT NAME
Table 6-25.
IRN2 EIN Node Trouble Location CP List (Page 2 of 2)

DIAGNOSTIC PHASE  TABLE      PROBABLE/SUSPECTED FAULTY PACK  UNIT NAME
02*               iun02.l    Same as Phase 01                Same as Phase 01
10*               iuni10.l   UN304B                          IRN2
12*               iun12.l    UN304B                          IRN2
                             TN918                           IFB-U
                             TN1803                          IFB-4K/8
                             TN1508                          IFB-16/8
13*               iun13.l    Same as Phase 12                Same as Phase 12
20*               iuni20.l   UN304                           IRN2
40*               ein40.l    TN4016                          ELI
* Automatic
Table 6-26.
IRN MDL (CSN, DSN, ICN) Trouble Location CP List PROBABLE/SUSPECTED FAULTY PACK UNIT NAME
02 10 12
Same as Phase 01 UN303()/UN304() UN303()/UN304() TN915/TN918 TN1508/TN1803
Same as Phase 01 IRN/IRN2 IRN/IRN2 IFB IFB
Table 6-26.
IRN MDL (CSN, DSN, ICN) Trouble Location CP List

DIAGNOSTIC PHASE             TABLE       PROBABLE/SUSPECTED FAULTY PACK  UNIT NAME
13                           iun13.l     Same as Phase 12                Same as Phase 12
20                           iuni20.l    UN303()/UN304()                 IRN/IRN2
21 (IRN only)                iuni21.l    UN303()                         IRN
40 (IRN only)                iun40.l     TN1640                          MDL_0
40 (IRN2 only)               i2mdl40.l   TN1640                          MDL_0
41 (IRN only)                iun41.l     Same as Phase 40                Same as Phase 40
41 (IRN2 only)               i2un41.l    Same as Phase 40                Same as Phase 40
50 (IRN only)                iun50.l     TN1640                          MDL_1
50 (IRN2 only)               i2mdl50.l   TN1640                          MDL_1
51 (IRN only) Demand Phase   iun51.l     Same as Phase 50                Same as Phase 50
51 (IRN2 only) Demand Phase  i2mdl51.l   Same as Phase 50                Same as Phase 50
Diagnostic Listings
When diagnostic failures persist after hardware has been replaced as recommended in the Manual Trouble Location Circuit Pack List tables, the diagnostic test results must be analyzed. This is done using the diagnostic output message and the diagnostic listings (.l files), if available. The diagnostic listings are files that end with a .l suffix (such as iun01.l or rpc01.l); see the manual trouble location circuit pack list tables. Generally, the first failing phase and the first few failing tests within that phase are the most useful for analysis. If this data is not on hand, run diagnostics using the RAW option to print all test failures at the ROP.

A diagnostic listing consists of a prologue, followed by one or more program units. Each program unit has its own prologue, which gives information about what is tested, how the testing is done, and the hardware involved. The remainder of the program unit consists of the diagnostic command lines, comment lines, and lines that are the ASCII equivalent of the data found in the corresponding object file. The command lines direct the sequence of diagnostic test execution.
Each diagnostic command begins with a statement number. This is the statement number referred to in the interactive diagnostics (EX) input and output messages (see Performing Diagnostics in this chapter), in early termination output messages, or in the DGN AUDIT RING output message.

Some diagnostic command lines are preceded by one or more comment lines. These lines begin with the character C and give the purpose of the command line that follows them.

Each diagnostic command line is followed by a line that shows, in ASCII format, the data corresponding to the command that is contained in the associated executable object file. This line begins with the string * adr unless the command generates a test, in which case it begins with the string * test. The test numbers in the diagnostic listings correspond to the test numbers in the diagnostic output messages. The only data on this line of importance to on-site users are the test numbers.

NOTE: For the rdgnrsl diagnostic command, a separate line is shown to illustrate that all failed test numbers returned from the NP are reported by adding 20 to the failed test number that is actually returned.
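The line types described above can be told apart by their leading characters. The following Python sketch shows one way to do this and to apply the rdgnrsl offset from the NOTE. It is a minimal illustration, not a tool supplied with the system: the function names are hypothetical, and the exact spacing of the "* adr" and "* test" prefixes in a real .l file is an assumption.

```python
RDGNRSL_OFFSET = 20  # from the NOTE: reported number = NP number + 20


def classify_listing_line(line: str) -> str:
    """Return the kind of a diagnostic listing line based on its prefix."""
    text = line.strip()
    if text.startswith("* test"):
        return "test-data"   # data line for a command that generates a test
    if text.startswith("* adr"):
        return "data"        # data line for any other command
    if text.startswith("C"):
        return "comment"     # comment lines begin with the character C
    if text[:1].isdigit():
        return "command"     # command lines begin with a statement number
    return "other"


def reported_test_number(np_test_number: int) -> int:
    """Test number as reported for rdgnrsl, given the number the NP returned."""
    return np_test_number + RDGNRSL_OFFSET


def np_test_number(reported: int) -> int:
    """Test number returned by the NP, given the reported rdgnrsl number."""
    return reported - RDGNRSL_OFFSET
```

The sample lines used to exercise this sketch would have to be made up, since the manual does not reproduce a listing; check the prefixes against an actual .l file before relying on it.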
Clearing Troubles Using the Diagnostic Listings

If a trouble is not cleared after replacing the hardware as listed in the manual trouble locating procedures tables, the following procedure is recommended:
1. From the ROP, examine the diagnostic output message to determine which phases failed.
2. Obtain the listing files (if available) and read the prologues for the phase and program unit in which the failing test occurs.
3. Find the diagnostic commands associated with the failed tests by checking the test numbers.
4. Read the comments (lines beginning with a C) that precede each of those commands to understand where the problem is located.
5. If unable to determine how to proceed in clearing the trouble, seek assistance from the CTS.
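Steps 3 and 4 of the procedure above amount to a lookup from failed test numbers to the comment lines that precede the corresponding commands. The following Python sketch illustrates that lookup. The function name is hypothetical, and because the manual does not give the layout of a "* test" line, the caller supplies the function that extracts test numbers from it.

```python
def comments_for_failed_tests(lines, failed_tests, extract_test_numbers):
    """Map each failed test number to the comments preceding its command.

    lines: the text lines of one program unit from a .l listing.
    failed_tests: set of test numbers taken from the diagnostic output message.
    extract_test_numbers: caller-supplied function returning the test numbers
        found on a "* test" line (the line layout is not assumed here).
    """
    found = {}
    pending_comments = []
    for line in lines:
        text = line.strip()
        if text.startswith("* test"):
            for number in extract_test_numbers(text):
                if number in failed_tests:
                    found[number] = list(pending_comments)
            pending_comments = []   # the data line closes this command
        elif text.startswith("* adr"):
            pending_comments = []   # command without a test, discard comments
        elif text.startswith("C"):
            pending_comments.append(text)
        # Command lines (statement numbers) keep the pending comments.
    return found
```

A made-up listing fragment, with a made-up extractor that takes the third field of the line as the test number, shows the intended use:

```python
sample = [
    "C verify ring interface loopback",
    "10 sometest",
    "* test 7 0x00",
    "C read node status",
    "20 othercmd",
    "* adr 0x10",
]
result = comments_for_failed_tests(sample, {7}, lambda t: [int(t.split()[2])])
```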
LNs with Unequipped LI Boards - MV Updates

When an LN is equipped in an active ring but does not contain a link interface (LI4) circuit pack, diagnostic phases 50 through 56 (LI diagnostics) should not be run on that link node. In this situation, the unit control block (UCB) of the equipment configuration data base (ECD) must be modified to reflect the unequipped LI4: the member version (MV) field on the UCB form for the LN must be changed. If the LN is not equipped with an LI4 circuit pack, enter 0x3 in the MV field. If the LN is equipped with an LI4 circuit pack, enter 0x3d in the MV field.
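The two MV values and the phase restriction can be summarized in a few lines of Python. This is only a sketch for checking a planned change: the constant and function names are hypothetical, and the sketch does not read or write the ECD.

```python
# MV values quoted in the text above.
MV_LI4_EQUIPPED = 0x3D
MV_LI4_UNEQUIPPED = 0x3

# LI diagnostics: phases 50 through 56.
LI_DIAGNOSTIC_PHASES = range(50, 57)


def mv_field_value(li4_equipped: bool) -> str:
    """Return the MV field entry for an LN, as a hexadecimal string."""
    value = MV_LI4_EQUIPPED if li4_equipped else MV_LI4_UNEQUIPPED
    return hex(value)


def phases_to_run(requested, li4_equipped: bool):
    """Drop the LI diagnostic phases when the LN has no LI4 circuit pack."""
    return [
        phase for phase in requested
        if li4_equipped or phase not in LI_DIAGNOSTIC_PHASES
    ]
```

For an LN without an LI4 pack, a request for phases 1, 2, 12, 50, 53, and 56 reduces to phases 1, 2, and 12.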
Ring Node Addressing

The addressing of ring nodes and the manner in which frames/cabinets are identified are for maintenance purposes (see Tables 6-27 through 6-30). An address is identified by an integer sequence number, which may be written in decimal or hexadecimal notation. One decimal notation represents the physical node identification, ranging from 0 to 1023, where 1023 is the highest node ID that can be assigned at a location. A second decimal notation, ranging from 3072 to 4095, represents the physical node addresses in machine logic. These decimal notations are not usually seen by users.

The other type of node address is in hexadecimal notation. Hexadecimal addresses are important in analyzing the mismatch data produced when Phase 1 or 2 at an RN fails. The suspected faulty node(s), as well as the beginning of isolation (BISO) and the end of isolation (EISO) nodes, are identified by hexadecimal physical node addresses. The following tables contain these addresses. Additional information on node addressing can be found in the "Maintenance Description" section in the CNI Maintenance Manual, 256-090-202.
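The relationship between the notations can be read off Tables 6-27 through 6-30: each group holds 16 members, the physical node ID equals 16 times the group number plus the member number, and the physical node address is the ID plus 3072 (C00 hexadecimal). The following Python sketch reproduces the table entries under that reading. The formulas are inferred from the tables rather than stated in the text, and the function names are illustrative only.

```python
MEMBERS_PER_GROUP = 16   # member 0 is the RPCN, members 1-15 are IUNs
MAX_GROUP = 63
ADDRESS_OFFSET = 0xC00   # 3072 decimal, first entry of Tables 6-29 and 6-30


def physical_node_id(group: int, member: int) -> int:
    """Physical node ID (Table 6-27) for a group and member number."""
    if not 0 <= group <= MAX_GROUP:
        raise ValueError("group must be 0 through 63")
    if not 0 <= member < MEMBERS_PER_GROUP:
        raise ValueError("member must be 0 through 15")
    return group * MEMBERS_PER_GROUP + member


def physical_node_address(group: int, member: int) -> int:
    """Physical node address (Table 6-29) for a group and member number."""
    return ADDRESS_OFFSET + physical_node_id(group, member)


def as_hex(value: int) -> str:
    """Three-digit uppercase hexadecimal form used in Tables 6-28 and 6-30."""
    return format(value, "03X")
```

For group 28, member 15, this gives ID 463 (1CF hexadecimal) and address 3535 (DCF hexadecimal), matching the last row of the first page of each table.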
Table 6-27.
GRP # 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 0 0 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448
Physical Node ID (Decimal Representation) (Page 1 of 3)

MEMBER NUMBER (0 is RPCN, 1 - 15 is IUN) 1 1 17 33 49 65 81 97 113 129 145 161 177 193 209 225 241 257 273 289 305 321 337 353 369 385 401 417 433 449 2 2 18 34 50 66 82 98 114 130 146 162 178 194 210 226 242 258 274 290 306 322 338 354 370 386 402 418 434 450 3 3 19 35 51 67 83 99 115 131 147 163 179 195 211 227 243 259 275 291 307 323 339 355 371 387 403 419 435 451 4 4 20 36 52 68 84 100 116 132 148 164 180 196 212 228 244 260 276 292 308 324 340 356 372 388 404 420 436 452 5 5 21 37 53 69 85 101 117 133 149 165 181 197 213 229 245 261 277 293 309 325 341 357 373 389 405 421 437 453 6 6 22 38 54 70 86 102 118 134 150 166 182 198 214 230 246 262 278 294 310 326 342 358 374 390 406 422 438 454 7 7 23 39 55 71 87 103 119 135 151 167 183 199 215 231 247 263 279 295 311 327 343 359 375 391 407 423 439 455 8 8 24 40 56 72 88 104 120 136 152 168 184 200 216 232 248 264 280 296 312 328 344 360 376 392 408 424 440 456 9 9 25 41 57 73 89 105 121 137 153 169 185 201 217 233 249 265 281 297 313 329 345 361 377 393 409 425 441 457 10 10 26 42 58 74 90 106 122 138 154 170 186 202 218 234 250 266 282 298 314 330 346 362 378 394 410 426 442 458 11 11 27 43 59 75 91 107 123 139 155 171 187 203 219 235 251 267 283 299 315 331 347 363 379 395 411 427 443 459 12 12 28 44 60 76 92 108 124 140 156 172 188 204 220 236 252 268 284 300 316 332 348 364 380 396 412 428 444 460 13 13 29 45 61 77 93 109 125 141 157 173 189 205 221 237 253 269 285 301 317 333 349 365 381 397 413 429 445 461 14 14 30 46 62 78 94 110 126 142 158 174 190 206 222 238 254 270 286 302 318 334 350 366 382 398 414 430 446 462 15 15 31 47 63 79 95 111 127 143 159 175 191 207 223 239 255 271 287 303 319 335 351 367 383 399 415 431 447 463
Table 6-27.
GRP # 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 0 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928

MEMBER NUMBER (0 is RPCN, 1 - 15 is IUN) 1 465 481 497 513 529 545 561 577 593 609 625 641 657 673 689 705 721 737 753 769 785 801 817 833 849 865 881 897 913 929 2 466 482 498 514 530 546 562 578 594 610 626 642 658 674 690 706 722 738 754 770 786 802 818 834 850 866 882 898 914 930 3 467 483 499 515 531 547 563 579 595 611 627 643 659 675 691 707 723 739 755 771 787 803 819 835 851 867 883 899 915 931 4 468 484 500 516 532 548 564 580 596 612 628 644 660 676 692 708 724 740 756 772 788 804 820 836 852 868 884 900 916 932 5 469 485 501 517 533 549 565 581 597 613 629 645 661 677 693 709 725 741 757 773 789 805 821 837 853 869 885 901 917 933 6 470 486 502 518 534 550 566 582 598 614 630 646 662 678 694 710 726 742 758 774 790 806 822 838 854 870 886 902 918 934 7 471 487 503 519 535 551 567 583 599 615 631 647 663 679 695 711 727 743 759 775 791 807 823 839 855 871 887 903 919 935 8 472 488 504 520 536 552 568 584 600 616 632 648 664 680 696 712 728 744 760 776 792 808 824 840 856 872 888 904 920 936 9 473 489 505 521 537 553 569 585 601 617 633 649 665 681 697 713 729 745 761 777 793 809 825 841 857 873 889 905 921 937 10 474 490 506 522 538 554 570 586 602 618 634 650 666 682 698 714 730 746 762 778 794 810 826 842 858 874 890 906 922 938 11 475 491 507 523 539 555 571 587 603 619 635 651 667 683 699 715 731 747 763 779 795 811 827 843 859 875 891 907 923 939 12 476 492 508 524 540 556 572 588 604 620 636 652 668 684 700 716 732 748 764 780 796 812 828 844 860 876 892 908 924 940 13 477 493 509 525 541 557 573 589 605 621 637 653 669 685 701 717 733 749 765 781 797 813 829 845 861 877 893 909 925 941 14 478 494 510 526 542 558 574 590 606 622 638 654 670 686 702 718 734 750 766 782 798 814 830 846 862 878 894 910 926 942 15 479 495 511 527 543 559 575 591 607 623 639 655 671 687 703 719 735 751 767 783 799 815 831 847 863 879 895 911 927 943
Table 6-27.
GRP # 59 60 61 62 63 0 944 960 976 992 1008

MEMBER NUMBER (0 is RPCN, 1 - 15 is IUN) 1 945 961 977 993 1009 2 946 962 978 994 1010 3 947 963 979 995 1011 4 948 964 980 996 1012 5 949 965 981 997 1013 6 950 966 982 998 1014 7 951 967 983 999 1015 8 952 968 984 1000 1016 9 953 969 985 1001 1017 10 954 970 986 1002 1018 11 955 971 987 1003 1019 12 956 972 988 1004 1020 13 957 973 989 1005 1021 14 958 974 990 1006 1022 15 959 975 991 1007 1023
Table 6-28.
GRP # 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 0 000 010 020 030 040 050 060 070 080 090 0A0 0B0 0C0 0D0 0E0 0F0 100 110 120 130 140 150 160 170 180 190 1A0 1B0 1C0
Physical Node ID (Hexadecimal Representation) (Page 1 of 3)

MEMBER NUMBER (0 is RPCN, 1 - 15 is IUN) 1 001 011 021 031 041 051 061 071 081 091 0A1 0B1 0C1 0D1 0E1 0F1 101 111 121 131 141 151 161 171 181 191 1A1 1B1 1C1 2 002 012 022 032 042 052 062 072 082 092 0A2 0B2 0C2 0D2 0E2 0F2 102 112 122 132 142 152 162 172 182 192 1A2 1B2 1C2 3 003 013 023 033 043 053 063 073 083 093 0A3 0B3 0C3 0D3 0E3 0F3 103 113 123 133 143 153 163 173 183 193 1A3 1B3 1C3 4 004 014 024 034 044 054 064 074 084 094 0A4 0B4 0C4 0D4 0E4 0F4 104 114 124 134 144 154 164 174 184 194 1A4 1B4 1C4 5 005 015 025 035 045 055 065 075 085 095 0A5 0B5 0C5 0D5 0E5 0F5 105 115 125 135 145 155 165 175 185 195 1A5 1B5 1C5 6 006 016 026 036 046 056 066 076 086 096 0A6 0B6 0C6 0D6 0E6 0F6 106 116 126 136 146 156 166 176 186 196 1A6 1B6 1C6 7 007 017 027 037 047 057 067 077 087 097 0A7 0B7 0C7 0D7 0E7 0F7 107 117 127 137 147 157 167 177 187 197 1A7 1B7 1C7 8 008 018 028 038 048 058 068 078 088 098 0A8 0B8 0C8 0D8 0E8 0F8 108 118 128 138 148 158 168 178 188 198 1A8 1B8 1C8 9 009 019 029 039 049 059 069 079 089 099 0A9 0B9 0C9 0D9 0E9 0F9 109 119 129 139 149 159 169 179 189 199 1A9 1B9 1C9 10 00A 01A 02A 03A 04A 05A 06A 07A 08A 09A 0AA 0BA 0CA 0DA 0EA 0FA 10A 11A 12A 13A 14A 15A 16A 17A 18A 19A 1AA 1BA 1CA 11 00B 01B 02B 03B 04B 05B 06B 07B 08B 09B 0AB 0BB 0CB 0DB 0EB 0FB 10B 11B 12B 13B 14B 15B 16B 17B 18B 19B 1AB 1BB 1CB 12 00C 01C 02C 03C 04C 05C 06C 07C 08C 09C 0AC 0BC 0CC 0DC 0EC 0FC 10C 11C 12C 13C 14C 15C 16C 17C 18C 19C 1AC 1BC 1CC 13 00D 01D 02D 03D 04D 05D 06D 07D 08D 09D 0AD 0BD 0CD 0DD 0ED 0FD 10D 11D 12D 13D 14D 15D 16D 17D 18D 19D 1AD 1BD 1CD 14 00E 01E 02E 03E 04E 05E 06E 07E 08E 09E 0AE 0BE 0CE 0DE 0EE 0FE 10E 11E 12E 13E 14E 15E 16E 17E 18E 19E 1AE 1BE 1CE 15 00F 01F 02F 03F 04F 05F 06F 07F 08F 09F 0AF 0BF 0CF 0DF 0EF 0FF 10F 11F 12F 13F 14F 15F 16F 17F 18F 19F 1AF 1BF 1CF
Table 6-28.
GRP # 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 0 1D0 1E0 1F0 200 210 220 230 240 250 260 270 280 290 2A0 2B0 2C0 2D0 2E0 2F0 300 310 320 330 340 350 360 370 380 390 3A0

MEMBER NUMBER (0 is RPCN, 1 - 15 is IUN) 1 1D1 1E1 1F1 201 211 221 231 241 251 261 271 281 291 2A1 2B1 2C1 2D1 2E1 2F1 301 311 321 331 341 351 361 371 381 391 3A1 2 1D2 1E2 1F2 202 212 222 232 242 252 262 272 282 292 2A2 2B2 2C2 2D2 2E2 2F2 302 312 322 332 342 352 362 372 382 392 3A2 3 1D3 1E3 1F3 203 213 223 233 243 253 263 273 283 293 2A3 2B3 2C3 2D3 2E3 2F3 303 313 323 333 343 353 363 373 383 393 3A3 4 1D4 1E4 1F4 204 214 224 234 244 254 264 274 284 294 2A4 2B4 2C4 2D4 2E4 2F4 304 314 324 334 344 354 364 374 384 394 3A4 5 1D5 1E5 1F5 205 215 225 235 245 255 265 275 285 295 2A5 2B5 2C5 2D5 2E5 2F5 305 315 325 335 345 355 365 375 385 395 3A5 6 1D6 1E6 1F6 206 216 226 236 246 256 266 276 286 296 2A6 2B6 2C6 2D6 2E6 2F6 306 316 326 336 346 356 366 376 386 396 3A6 7 1D7 1E7 1F7 207 217 227 237 247 257 267 277 287 297 2A7 2B7 2C7 2D7 2E7 2F7 307 317 327 337 347 357 367 377 387 397 3A7 8 1D8 1E8 1F8 208 218 228 238 248 258 268 278 288 298 2A8 2B8 2C8 2D8 2E8 2F8 308 318 328 338 348 358 368 378 388 398 3A8 9 1D9 1E9 1F9 209 219 229 239 249 259 269 279 289 299 2A9 2B9 2C9 2D9 2E9 2F9 309 319 329 339 349 359 369 379 389 399 3A9 10 1DA 1EA 1FA 20A 21A 22A 23A 24A 25A 26A 27A 28A 29A 2AA 2BA 2CA 2DA 2EA 2FA 30A 31A 32A 33A 34A 35A 36A 37A 38A 39A 3AA 11 1DB 1EB 1FB 20B 21B 22B 23B 24B 25B 26B 27B 28B 29B 2AB 2BB 2CB 2DB 2EB 2FB 30B 31B 32B 33B 34B 35B 36B 37B 38B 39B 3AB 12 1DC 1EC 1FC 20C 21C 22C 23C 24C 25C 26C 27C 28C 29C 2AC 2BC 2CC 2DC 2EC 2FC 30C 31C 32C 33C 34C 35C 36C 37C 38C 39C 3AC 13 1DD 1ED 1FD 20D 21D 22D 23D 24D 25D 26D 27D 28D 29D 2AD 2BD 2CD 2DD 2ED 2FD 30D 31D 32D 33D 34D 35D 36D 37D 38D 39D 3AD 14 1DE 1EE 1FE 20E 21E 22E 23E 24E 25E 26E 27E 28E 29E 2AE 2BE 2CE 2DE 2EE 2FE 30E 31E 32E 33E 34E 35E 36E 37E 38E 39E 3AE 15 1DF 1EF 1FF 20F 21F 22F 23F 24F 25F 26F 27F 28F 29F 2AF 2BF 2CF 2DF 2EF 2FF 30F 31F 32F 33F 34F 35F 36F 37F 38F 39F 3AF
Table 6-28.
GRP # 59 60 61 62 63 0 3B0 3C0 3D0 3E0 3F0

MEMBER NUMBER (0 is RPCN, 1 - 15 is IUN) 1 3B1 3C1 3D1 3E1 3F1 2 3B2 3C2 3D2 3E2 3F2 3 3B3 3C3 3D3 3E3 3F3 4 3B4 3C4 3D4 3E4 3F4 5 3B5 3C5 3D5 3E5 3F5 6 3B6 3C6 3D6 3E6 3F6 7 3B7 3C7 3D7 3E7 3F7 8 3B8 3C8 3D8 3E8 3F8 9 3B9 3C9 3D9 3E9 3F9 10 3BA 3CA 3DA 3EA 3FA 11 3BB 3CB 3DB 3EB 3FB 12 3BC 3CC 3DC 3EC 3FC 13 3BD 3CD 3DD 3ED 3FD 14 3BE 3CE 3DE 3EE 3FE 15 3BF 3CF 3DF 3EF 3FF
Table 6-29.
GRP # 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 0 3072 3088 3104 3120 3136 3152 3168 3184 3200 3216 3232 3248 3264 3280 3296 3312 3328 3344 3360 3376 3392 3408 3424 3440 3456 3472 3488 3504 3520
Physical Node Addresses (Decimal Representation) (Page 1 of 3)

MEMBER NUMBER (0 is RPCN, 1 - 15 IUN) 1 3073 3089 3105 3121 3137 3153 3169 3185 3201 3217 3233 3249 3265 3281 3297 3313 3329 3345 3361 3377 3393 3409 3425 3441 3457 3473 3489 3505 3521 2 3074 3090 3106 3122 3138 3154 3170 3186 3202 3218 3234 3250 3266 3282 3298 3314 3330 3346 3362 3378 3394 3410 3426 3442 3458 3474 3490 3506 3522 3 3075 3091 3107 3123 3139 3155 3171 3187 3203 3219 3235 3251 3267 3283 3299 3315 3331 3347 3363 3379 3395 3411 3427 3443 3459 3475 3491 3507 3523 4 3076 3092 3108 3124 3140 3156 3172 3188 3204 3220 3236 3252 3268 3284 3300 3316 3332 3348 3364 3380 3396 3412 3428 3444 3460 3476 3492 3508 3524 5 3077 3093 3109 3125 3141 3157 3173 3189 3205 3221 3237 3253 3269 3285 3301 3317 3333 3349 3365 3381 3397 3413 3429 3445 3461 3477 3493 3509 3525 6 3078 3094 3110 3126 3142 3158 3174 3190 3206 3222 3238 3254 3270 3286 3302 3318 3334 3350 3366 3382 3398 3414 3430 3446 3462 3478 3494 3510 3526 7 3079 3095 3111 3127 3143 3159 3175 3191 3207 3223 3239 3255 3271 3287 3303 3319 3335 3351 3367 3383 3399 3415 3431 3447 3463 3479 3495 3511 3527 8 3080 3096 3112 3128 3144 3160 3176 3192 3208 3224 3240 3256 3272 3288 3304 3320 3336 3352 3368 3384 3400 3416 3432 3448 3464 3480 3496 3512 3528 9 3081 3097 3113 3129 3145 3161 3177 3193 3209 3225 3241 3257 3273 3289 3305 3321 3337 3353 3369 3385 3401 3417 3433 3449 3465 3481 3497 3513 3529 10 3082 3098 3114 3130 3146 3162 3178 3194 3210 3226 3242 3258 3274 3290 3306 3322 3338 3354 3370 3386 3402 3418 3434 3450 3466 3482 3498 3514 3530 11 3083 3099 3115 3131 3147 3163 3179 3195 3211 3227 3243 3259 3275 3291 3307 3323 3339 3355 3371 3387 3403 3419 3435 3451 3467 3483 3499 3515 3531 12 3084 3100 3116 3132 3148 3164 3180 3196 3212 3228 3244 3260 3276 3292 3308 3324 3340 3356 3372 3388 3404 3420 3436 3452 3468 3484 3500 3516 3532 13 3085 3101 3117 3133 3149 3165 3181 3197 3213 3229 3245 3261 3277 3293 3309 3325 3341 3357 3373 3389 3405 3421 3437 3453 3469 3485 3501 3517 3533 14 3086 3102 3118 3134 3150 3166 3182 3198 
3214 3230 3246 3262 3278 3294 3310 3326 3342 3358 3374 3390 3406 3422 3438 3454 3470 3486 3502 3518 3534 15 3087 3103 3119 3135 3151 3167 3183 3199 3215 3231 3247 3263 3279 3295 3311 3327 3343 3359 3375 3391 3407 3423 3439 3455 3471 3487 3503 3519 3535
Table 6-29.
GRP # 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 0 3536 3552 3568 3584 3600 3616 3632 3648 3664 3680 3696 3712 3728 3744 3760 3776 3792 3808 3824 3840 3856 3872 3888 3904 3920 3936 3952 3968 3984 4000

MEMBER NUMBER (0 is RPCN, 1 - 15 IUN) 1 3537 3553 3569 3585 3601 3617 3633 3649 3665 3681 3697 3713 3729 3745 3761 3777 3793 3809 3825 3841 3857 3873 3889 3905 3921 3937 3953 3969 3985 4001 2 3538 3554 3570 3586 3602 3618 3634 3650 3666 3682 3698 3714 3730 3746 3762 3778 3794 3810 3826 3842 3858 3874 3890 3906 3922 3938 3954 3970 3986 4002 3 3539 3555 3571 3587 3603 3619 3635 3651 3667 3683 3699 3715 3731 3747 3763 3779 3795 3811 3827 3843 3859 3875 3891 3907 3923 3939 3955 3971 3987 4003 4 3540 3556 3572 3588 3604 3620 3636 3652 3668 3684 3700 3716 3732 3748 3764 3780 3796 3812 3828 3844 3860 3876 3892 3908 3924 3940 3956 3972 3988 4004 5 3541 3557 3573 3589 3605 3621 3637 3653 3669 3685 3701 3717 3733 3749 3765 3781 3797 3813 3829 3845 3861 3877 3893 3909 3925 3941 3957 3973 3989 4005 6 3542 3558 3574 3590 3606 3622 3638 3654 3670 3686 3702 3718 3734 3750 3766 3782 3798 3814 3830 3846 3862 3878 3894 3910 3926 3942 3958 3974 3990 4006 7 3543 3559 3575 3591 3607 3623 3639 3655 3671 3687 3703 3719 3735 3751 3767 3783 3799 3815 3831 3847 3863 3879 3895 3911 3927 3943 3959 3975 3991 4007 8 3544 3560 3576 3592 3608 3624 3640 3656 3672 3688 3704 3720 3736 3752 3768 3784 3800 3816 3832 3848 3864 3880 3896 3912 3928 3944 3960 3976 3992 4008 9 3545 3561 3577 3593 3609 3625 3641 3657 3673 3689 3705 3721 3737 3753 3769 3785 3801 3817 3833 3849 3865 3881 3897 3913 3929 3945 3961 3977 3993 4009 10 3546 3562 3578 3594 3610 3626 3642 3658 3674 3690 3706 3722 3738 3754 3770 3786 3802 3818 3834 3850 3866 3882 3898 3914 3930 3946 3962 3978 3994 4010 11 3547 3563 3579 3595 3611 3627 3643 3659 3675 3691 3707 3723 3739 3755 3771 3787 3803 3819 3835 3851 3867 3883 3899 3915 3931 3947 3963 3979 3995 4011 12 3548 3564 3580 3596 3612 3628 3644 3660 3676 3692 3708 3724 3740 3756 3772 3788 3804 3820 3836 3852 3868 3884 3900 3916 3932 3948 3964 3980 3996 4012 13 3549 3565 3581 3597 3613 3629 3645 3661 3677 3693 3709 3725 3741 3757 3773 3789 3805 3821 3837 3853 3869 3885 3901 3917 3933 3949 
3965 3981 3997 4013 14 3550 3566 3582 3598 3614 3630 3646 3662 3678 3694 3710 3726 3742 3758 3774 3790 3806 3822 3838 3854 3870 3886 3902 3918 3934 3950 3966 3982 3998 4014 15 3551 3567 3583 3599 3615 3631 3647 3663 3679 3695 3711 3727 3743 3759 3775 3791 3807 3823 3839 3855 3871 3887 3903 3919 3935 3951 3967 3983 3999 4015
Table 6-29.
GRP # 59 60 61 62 63 0 4016 4032 4048 4064 4080

MEMBER NUMBER (0 is RPCN, 1 - 15 IUN) 1 4017 4033 4049 4065 4081 2 4018 4034 4050 4066 4082 3 4019 4035 4051 4067 4083 4 4020 4036 4052 4068 4084 5 4021 4037 4053 4069 4085 6 4022 4038 4054 4070 4086 7 4023 4039 4055 4071 4087 8 4024 4040 4056 4072 4088 9 4025 4041 4057 4073 4089 10 4026 4042 4058 4074 4090 11 4027 4043 4059 4075 4091 12 4028 4044 4060 4076 4092 13 4029 4045 4061 4077 4093 14 4030 4046 4062 4078 4094 15 4031 4047 4063 4079 4095
Table 6-30.
GRP # 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 0 C00 C10 C20 C30 C40 C50 C60 C70 C80 C90 CA0 CB0 CC0 CD0 CE0 CF0 D00 D10 D20 D30 D40 D50 D60 D70 D80 D90 DA0 DB0 DC0
Physical Node Addresses (Hexadecimal Representation) (Page 1 of 3)

Member Number (0 is RPCN, 1 - 15 is IUN) 1 C01 C11 C21 C31 C41 C51 C61 C71 C81 C91 CA1 CB1 CC1 CD1 CE1 CF1 D01 D11 D21 D31 D41 D51 D61 D71 D81 D91 DA1 DB1 DC1 2 C02 C12 C22 C32 C42 C52 C62 C72 C82 C92 CA2 CB2 CC2 CD2 CE2 CF2 D02 D12 D22 D32 D42 D52 D62 D72 D82 D92 DA2 DB2 DC2 3 C03 C13 C23 C33 C43 C53 C63 C73 C83 C93 CA3 CB3 CC3 CD3 CE3 CF3 D03 D13 D23 D33 D43 D53 D63 D73 D83 D93 DA3 DB3 DC3 4 C04 C14 C24 C34 C44 C54 C64 C74 C84 C94 CA4 CB4 CC4 CD4 CE4 CF4 D04 D14 D24 D34 D44 D54 D64 D74 D84 D94 DA4 DB4 DC4 5 C05 C15 C25 C35 C45 C55 C65 C75 C85 C95 CA5 CB5 CC5 CD5 CE5 CF5 D05 D15 D25 D35 D45 D55 D65 D75 D85 D95 DA5 DB5 DC5 6 C06 C16 C26 C36 C46 C56 C66 C76 C86 C96 CA6 CB6 CC6 CD6 CE6 CF6 D06 D16 D26 D36 D46 D56 D66 D76 D86 D96 DA6 DB6 DC6 7 C07 C17 C27 C37 C47 C57 C67 C77 C87 C97 CA7 CB7 CC7 CD7 CE7 CF7 D07 D17 D27 D37 D47 D57 D67 D77 D87 D97 DA7 DB7 DC7 8 C08 C18 C28 C38 C48 C58 C68 C78 C88 C98 CA8 CB8 CC8 CD8 CE8 CF8 D08 D18 D28 D38 D48 D58 D68 D78 D88 D98 DA8 DB8 DC8 9 C09 C19 C29 C39 C49 C59 C69 C79 C89 C99 CA9 CB9 CC9 CD9 CE9 CF9 D09 D19 D29 D39 D49 D59 D69 D79 D89 D99 DA9 DB9 DC9 10 C0A C1A C2A C3A C4A C5A C6A C7A C8A C9A CAA CBA CCA CDA CEA CFA D0A D1A D2A D3A D4A D5A D6A D7A D8A D9A DAA DBA DCA 11 C0B C1B C2B C3B C4B C5B C6B C7B C8B C9B CAB CBB CCB CDB CEB CFB D0B D1B D2B D3B D4B D5B D6B D7B D8B D9B DAB DBB DCB 12 C0C C1C C2C C3C C4C C5C C6C C7C C8C C9C CAC CBC CCC CDC CEC CFC D0C D1C D2C D3C D4C D5C D6C D7C D8C D9C DAC DBC DCC 13 C0D C1D C2D C3D C4D C5D C6D C7D C8D C9D CAD CBD CCD CDD CED CFD D0D D1D D2D D3D D4D D5D D6D D7D D8D D9D DAD DBD DCD 14 C0E C1E C2E C3E C4E C5E C6E C7E C8E C9E CAE CBE CCE CDE CEE CFE D0E D1E D2E D3E D4E D5E D6E D7E D8E D9E DAE DBE DCE 15 C0F C1F C2F C3F C4F C5F C6F C7F C8F C9F CAF CBF CCF CDF CEF CFF D0F D1F D2F D3F D4F D5F D6F D7F D8F D9F DAF DBF DCF
Table 6-30.
GRP # 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 0 DD0 DE0 DF0 E00 E10 E20 E30 E40 E50 E60 E70 E80 E90 EA0 EB0 EC0 ED0 EE0 EF0 F00 F10 F20 F30 F40 F50 F60 F70 F80 F90 FA0

Member Number (0 is RPCN, 1 - 15 is IUN) 1 DD1 DE1 DF1 E01 E11 E21 E31 E41 E51 E61 E71 E81 E91 EA1 EB1 EC1 ED1 EE1 EF1 F01 F11 F21 F31 F41 F51 F61 F71 F81 F91 FA1 2 DD2 DE2 DF2 E02 E12 E22 E32 E42 E52 E62 E72 E82 E92 EA2 EB2 EC2 ED2 EE2 EF2 F02 F12 F22 F32 F42 F52 F62 F72 F82 F92 FA2 3 DD3 DE3 DF3 E03 E13 E23 E33 E43 E53 E63 E73 E83 E93 EA3 EB3 EC3 ED3 EE3 EF3 F03 F13 F23 F33 F43 F53 F63 F73 F83 F93 FA3 4 DD4 DE4 DF4 E04 E14 E24 E34 E44 E54 E64 E74 E84 E94 EA4 EB4 EC4 ED4 EE4 EF4 F04 F14 F24 F34 F44 F54 F64 F74 F84 F94 FA4 5 DD5 DE5 DF5 E05 E15 E25 E35 E45 E55 E65 E75 E85 E95 EA5 EB5 EC5 ED5 EE5 EF5 F05 F15 F25 F35 F45 F55 F65 F75 F85 F95 FA5 6 DD6 DE6 DF6 E06 E16 E26 E36 E46 E56 E66 E76 E86 E96 EA6 EB6 EC6 ED6 EE6 EF6 F06 F16 F26 F36 F46 F56 F66 F76 F86 F96 FA6 7 DD7 DE7 DF7 E07 E17 E27 E37 E47 E57 E67 E77 E87 E97 EA7 EB7 EC7 ED7 EE7 EF7 F07 F17 F27 F37 F47 F57 F67 F77 F87 F97 FA7 8 DD8 DE8 DF8 E08 E18 E28 E38 E48 E58 E68 E78 E88 E98 EA8 EB8 EC8 ED8 EE8 EF8 F08 F18 F28 F38 F48 F58 F68 F78 F88 F98 FA8 9 DD9 DE9 DF9 E09 E19 E29 E39 E49 E59 E69 E79 E89 E99 EA9 EB9 EC9 ED9 EE9 EF9 F09 F19 F29 F39 F49 F59 F69 F79 F89 F99 FA9 10 DDA DEA DFA E0A E1A E2A E3A E4A E5A E6A E7A E8A E9A EAA EBA ECA EDA EEA EFA F0A F1A F2A F3A F4A F5A F6A F7A F8A F9A FAA 11 DDB DEB DFB E0B E1B E2B E3B E4B E5B E6B E7B E8B E9B EAB EBB ECB EDB EEB EFB F0B F1B F2B F3B F4B F5B F6B F7B F8B F9B FAB 12 DDC DEC DFC E0C E1C E2C E3C E4C E5C E6C E7C E8C E9C EAC EBC ECC EDC EEC EFC F0C F1C F2C F3C F4C F5C F6C F7C F8C F9C FAC 13 DDD DED DFD E0D E1D E2D E3D E4D E5D E6D E7D E8D E9D EAD EBD ECD EDD EED EFD F0D F1D F2D F3D F4D F5D F6D F7D F8D F9D FAD 14 DDE DEE DFE E0E E1E E2E E3E E4E E5E E6E E7E E8E E9E EAE EBE ECE EDE EEE EFE F0E F1E F2E F3E F4E F5E F6E F7E F8E F9E FAE 15 DDF DEF DFF E0F E1F E2F E3F E4F E5F E6F E7F E8F E9F EAF EBF ECF EDF EEF EFF F0F F1F F2F F3F F4F F5F F6F F7F F8F F9F FAF
Table 6-30.
GRP # 59 60 61 62 63 0 FB0 FC0 FD0 FE0 FF0

Member Number (0 is RPCN, 1 - 15 is IUN) 1 FB1 FC1 FD1 FE1 FF1 2 FB2 FC2 FD2 FE2 FF2 3 FB3 FC3 FD3 FE3 FF3 4 FB4 FC4 FD4 FE4 FF4 5 FB5 FC5 FD5 FE5 FF5 6 FB6 FC6 FD6 FE6 FF6 7 FB7 FC7 FD7 FE7 FF7 8 FB8 FC8 FD8 FE8 FF8 9 FB9 FC9 FD9 FE9 FF9 10 FBA FCA FDA FEA FFA 11 FBB FCB FDB FEB FFB 12 FBC FCC FDC FEC FFC 13 FBD FCD FDD FED FFD 14 FBE FCE FDE FEE FFE 15 FBF FCF FDF FEF FFF
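When mismatch data identifies a suspected, BISO, or EISO node by its hexadecimal physical node address, the group and member numbers can be recovered by reversing the layout of Table 6-30. The Python sketch below assumes the layout visible in the table (addresses C00 through FFF, 16 members per group); the function name is hypothetical.

```python
def decode_node_address(hex_address: str):
    """Return (group, member, kind) for a hexadecimal physical node address."""
    value = int(hex_address, 16)
    if not 0xC00 <= value <= 0xFFF:
        raise ValueError("address outside the range covered by Table 6-30")
    node_id = value - 0xC00               # physical node ID, 0 through 1023
    group, member = divmod(node_id, 16)   # 16 members per group
    kind = "RPCN" if member == 0 else "IUN"
    return group, member, kind
```

For example, address D2A decodes to group 18, member 10 (an IUN), and FF0 decodes to group 63, member 0 (an RPCN).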
Automatic Diagnostics and Restorals

Automatic restoral of nodes is a feature provided by the node recovery monitor (NRM). Only nodes in the OOS major state are considered for restoral by the NRM. Depending on the minor states, a conditional, unconditional, or no restoral request is issued. The NRM ensures that any node entering a state which indicates that it is eligible to be restored to service is the object of an appropriate restoral attempt within a few minutes, unless other work takes precedence.

The NRM must perform the following tasks:
1. Attempt recovery of faulted nodes, including any associated ring isolations. A node can be faulted when a problem is detected during operation or when it fails to become active during system-wide initialization.
2. Recover usable nodes which become available due to removal of a ring isolation.
3. Detect, and make ineligible for automatic recovery, those nodes which are too frequently faulted and recovered.
4. Inhibit the automatic starting of node restorals:
   - During a system-wide initialization.
   - When the ring maintenance state indicates that the ring is undergoing reconfiguration or is down.
5. Submit all conditional restorals under software known as ARR.

When a requested restoral is not successful, or the internal timer awaiting job completion expires, the following message is generated:
REPT ARR AUTORST FAILURE FOR aaaa b
where: aaaa b = identifying name of the node.
If the ECD restoral threshold is exceeded, the following output message is generated:
   REPT ARR AUTORST THRESHOLD EXCEEDED FOR aaaa b
where:
   aaaa b = identifying name of the node.
If a time-out occurs while waiting for a reply message, this output message is generated:
   REPT ARR AUTORST TIMEOUT AWAITING MIRA FOR aaaa b
where:
   aaaa b = identifying name of the node.
For additional information regarding the REPT ARR AUTORST messages, refer to the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual.
The following priorities determine the order in which nodes eligible for automatic restoral are served:
1. A nominated critical node (typically the BISO or EISO node)
2. Nodes with faulty ring interfaces
3. RPCNs eligible for unconditional restorals
4. RPCNs eligible for conditional restorals
5. IUNs eligible for unconditional restorals
6. IUNs eligible for conditional restorals.
For a more detailed description of automatic node restorals and ARR, refer to the "Maintenance Description" section in the 401-610-055 Input Message Manual.
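The six-level priority order described above can be sketched as a sort key. This is an illustration only, not the NRM implementation: the Candidate fields and function names are hypothetical, and the sketch assumes that priorities 5 and 6 apply to every ring node that is not an RPCN.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A node eligible for automatic restoral (hypothetical record layout)."""
    name: str
    kind: str                 # "RPCN" or any other ring node type
    unconditional: bool       # True if eligible for an unconditional restoral
    critical: bool = False    # nominated critical node
    ri_faulty: bool = False   # ring interface is faulty


def restoral_priority(c: Candidate) -> int:
    """Map a candidate to priority 1 (served first) through 6 (served last)."""
    if c.critical:
        return 1
    if c.ri_faulty:
        return 2
    if c.kind == "RPCN":
        return 3 if c.unconditional else 4
    # Assumption: all non-RPCN nodes fall under priorities 5 and 6.
    return 5 if c.unconditional else 6


def service_order(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates in serving order; ties keep their input order."""
    return sorted(candidates, key=restoral_priority)
```

Because sorted() is stable, nodes that share a priority are served in the order in which they were queued, which is one reasonable tie-breaking choice among several.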
Manual (Unit) Diagnostics

Presented on the following pages are variations of procedures that are used in performing RN diagnostics. Each procedure completely performs the diagnostic tasks. The procedures are presented to illustrate that there is no one defined procedure for performing RN diagnostics. The user may determine which procedure to use, depending upon the extent of the diagnostic task, but in general, use of the 1106 page will provide adequate results.
NOTE: Replace the term nodexx y within each input command with the appropriate node being diagnosed (LN or RPCN). Also, before any manual diagnostics begin, ARR should be inhibited to prevent automatic diagnostics (ARR) from attempting to diagnose and restore nodes scheduled for manual diagnostics. See INH:DMQ in the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual.
Before any node associated with an active link can be removed from service for diagnostic purposes, the appropriate link must be removed from service. To put the signaling link (SLK) in the AVAILABLE-Manual Out-of-Service (MOOS) state, enter the following message at the MCRT, and proceed with diagnostics as usual.
   CHG:SLK (a, b, [c, d]); MOOS
where:
   a = group number (00 - 63)
   b = member number (01 - 15)
The following message should appear on the MCRT:
   CHG SLK a b [ c d ] NEW REQUESTED MINOR STATE = MOOS
where:
   a = group number (00 - 63)
   b = member number (01 - 15)
   c = LI4 circuit pack (0 - 1)
   d = LI4 port (0 - 3)
If the SLK was manually removed from service, after diagnostics put it back in the AVAILABLE-In Service (IS) or Standby (STBY) state by entering the following message at the MCRT:
   CHG:SLK (a, b, [c, d]); {IS | ARST}
where:
   a = group number (00 - 63)
   b = member number (01 - 15)
   c = LI4 circuit pack (0 - 1)
   d = LI4 port (0 - 3)
The following message should appear on the MCRT:
   CHG SLK a b [ c d ] NEW REQUESTED MINOR STATE = IS
where:
   a = group number (00 - 63)
   b = member number (01 - 15)
   c = LI4 circuit pack (0 - 1)
   d = LI4 port (0 - 3)
Refer back to these procedures as required when performing manual diagnostics.
There are basic events that must be accomplished when performing RN diagnostics. Input messages and formats can vary. As indicated in earlier paragraphs of this guide, some input messages cause the system to perform all diagnostic activities, such as removing the node from service, isolating the node, diagnosing the node, unisolating the node, and restoring the node to service. With other input messages, each individual event is acted upon according to the diagnostic message used. When performing RN diagnostics with the use of a conditional restore (RST) or with the DGN command, a basic sequence of events (excluding obtaining a status report) occurs autonomously in the manner listed below:
1. The node under test (NUT) must first be removed from service. This is done by changing its state to out-of-service normal (OOS-NORMAL), if it was in the ACT state prior to performing the diagnostics. For additional information on node state changes, see the "Maintenance Description" section in this manual.
2. The NUT is changed to the OOS-ISOLATED state to route incoming and outgoing traffic around the NUT. The request to isolate the NUT may be denied for reasons not listed here.
3. The node under test is diagnosed.
4. If the NUT was in the active ring prior to Step 2, after all diagnostic phases have run, the NUT is configured back into the active ring (OOS-NORMAL). The configuration can be denied if the diagnostics determined that the ring interface (RI) minor state is faulty (FLTY).
5. Finally, after successfully configuring the node back into the active ring, the NUT is restored to service. It is automatically pumped with operational code, placed into execution, and changed to the active (ACT) state.
NOTE: If the request was a DGN rather than an RST, the node is not restored to service.
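The sequence of major-state changes listed above can be traced with a small sketch. It is a simplified model under stated assumptions: the isolation request is granted, the diagnostics pass, and the node was in the active ring before Step 2. The function name and the ri_faulty flag are hypothetical.

```python
def diagnostic_sequence(command: str, start_state: str = "ACT",
                        ri_faulty: bool = False) -> list[str]:
    """Return the major-state trace for a DGN or conditional RST request.

    Simplified model: isolation is assumed to be granted and all
    diagnostic phases are assumed to pass.
    """
    if command not in ("DGN", "RST"):
        raise ValueError("command must be 'DGN' or 'RST'")
    trace = [start_state]
    if start_state == "ACT":
        trace.append("OOS-NORMAL")      # Step 1: remove from service
    trace.append("OOS-ISOLATED")        # Step 2: isolate the node under test
    # Step 3: diagnostics run while the node is isolated (no state change).
    if ri_faulty:
        return trace                    # Step 4 denied: RI minor state is FLTY
    trace.append("OOS-NORMAL")          # Step 4: back into the active ring
    if command == "RST":
        trace.append("ACT")             # Step 5: restore to service (RST only)
    return trace
```

A DGN request therefore ends in OOS-NORMAL, while a conditional RST continues to ACT.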
When a diagnostic failure cannot be corrected by CP replacement using the manual trouble locating process (see the trouble location circuit pack list tables in this chapter), check:
   - Interframe buffering cables
   - Backplane and pins
   - Wiring.
Before replacing any cables or changing any connections or pins, refer to the appropriate maintenance manuals.
The following pages provide procedures used in performing RN diagnostics. Any of the following procedures can perform a diagnostic task. The procedures are used for diagnosing either RPCNs or LNs. Each procedure is totally independent, and the procedures should not be combined.
Manual Diagnostics Using the 1106 Display Page

The 1106 display page, sometimes called the ring node status page (RNSP), allows you to perform diagnostics and to remove or unconditionally restore any node in the office. These functions act on the frame/cabinet that is displayed on the MCRT. To obtain proper MCRT operation and page display instructions, see "Trouble Indicators, Error Analysis, and Display Pages" in this manual. When the Index Page display has been obtained, enter 1106 on the command line at the top of the MCRT.
Before any node supporting an active link is taken out of service, the associated link must first be removed from service. The link should also be placed in its previous state after diagnostics are completed. Refer to "Manual (Unit) Diagnostics" in this chapter for procedures to add and remove links. From this point, the following may be performed to diagnose, remove, restore, or display a particular frame/cabinet group:
Procedure 6-1. The 1106 Page Diagnostic Procedure

NOTE: Before any manual diagnostics begin, ARR should be inhibited to prevent automatic diagnostics (ARR) from attempting to diagnose and restore nodes that are queued for, or actively undergoing, manual diagnostics. See the INH:DMQ message in the CNI Input Message Manual, 256-090-204.
1. From the MCRT: Display the frame/cabinet group to be diagnosed by entering the following command:
   6xx
   where:
   xx = group number.
2. If a node is to be removed from service (OOS-NORMAL) for any reason, the following input command is used:
   2xx
   where:
   xx = display line number of the node to be removed from service.
   The node state changes to OOS-NORMAL.
3. From the MCRT: To diagnose a node from this frame/cabinet group, enter the following command:
   5xx
   where:
   xx = display line number of the node to be diagnosed.
   See the DGN command in the 401-610-057 Output Message Manual for the response to the completion of the diagnostics. If the diagnostic result is:
   STF: Determine which phase(s) failed, and record the CP number(s) for that phase. See the trouble location circuit pack list tables in this chapter for additional information.
   Conditional all-tests-passed (CATP): Determine the reason for the CATP response.
      If the reason is the node was not singly isolated, go to Step 4. Conditionally restore (RST) the adjacent nodes. When these nodes have been restored, conditionally restore this node, the first failing node.
      If the reason is the node was not isolated, correct all problems so that a duplex ring exists and conditionally restore this node.
      If the reason is the ring is down, correct all problems so that an active ring exists and conditionally restore this node.
      For additional information on ring configuration and maintenance, see the "Maintenance Description" section in this manual.
   No-tests-run (NTR): If an NTR response is received, go to Step 3. If the problem persists, seek technical assistance.
   ABT: If an ABORT is received, determine the reason(s) for the ABORT. After determining the reason(s) for the ABORT, go to Step 3, and/or seek technical assistance.
4. From the MCRT: Unconditionally restore the node to service by entering the following input command:
   3xx
   where:
   xx = display line number of the node to be unconditionally restored.
CAUTION:
   - A complete diagnostic has produced an all-tests-passed (ATP) response.
   - A complete diagnostic has produced a CATP response, and the RI and the NP minor states are both USBL.
The node which was being diagnosed should return to the system ACT state, and this should complete the diagnostic tests.
Procedure 6-2. Manual Diagnostics Using the DGN Command

This procedure uses the DGN command. When this command is entered at the MCRT, the following sequence of events normally occurs. For exceptions, see the DGN:LN or DGN:RPCN command in the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual.
1. If the node is active or handling traffic, the node is removed from service (OOS-NORMAL).
2. The node under test is isolated (OOS-ISOLATED).
3. Diagnostics are performed on the NUT.
4. The node is unisolated (OOS-NORMAL) and configured back into the active ring.
5. The node is not restored to service.
Procedure 6-3. The DGN Command Diagnostic Procedure

When using the DGN command, the following procedure should be used to restore a node to service:
NOTE: Before any manual diagnostics begin, ARR should be inhibited to prevent automatic diagnostics (ARR) from attempting to diagnose and restore nodes that are queued for, or actively undergoing, manual diagnostics. See the INH:DMQ message in the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual.
1. At the MCRT: Obtain a report on the status of a node in a particular group, or the status of the ring, by entering the following input message, or a variation thereof, as shown in the OP: Ring Input Message Variations table, or refer to the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual.
   OP:RING,nodexx y
   For LN:
      node = LN
      xx = group number
      y = node member number.
   For RPCN:
      node = RPCN
      xx = group number
      y = node member number.
   NOTE: The input message provided above provides the status information for a specified RN. For the message completion response, observe the MCRT or the ROP. To determine what response message to expect and for an explanation of such, see the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.
2. At the MCRT: If there is an active link supported by this node, remove it from service using the procedures listed previously in this section. Request diagnostics of the node by entering the following input message, or a variation thereof, as listed in the DGN Message Input Variations table. For a complete listing of all DGN input command variations, see the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual.
   DGN:nodexx y
   For LN:
      node = LN
      xx = group number
      y = node member number.
   For RPCN:
      node = RPCN
      xx = group number
      y = node member number.
   NOTE: The input message listed above runs all automatic phases on the specified RN. To determine what response message to expect and for an explanation of this message, see the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.
3. At the ROP: Examine the copy of the DGN printout to determine the status of the diagnostic tests (determine which phases failed or passed). If an ATP response is received at the ROP, proceed to Step 4. If an STF, NTR, or CATP response is received at the ROP, go to Step 5.
4. At the MCRT: If a link associated with this node was removed from service prior to diagnostics, put the link back in service using the procedures listed previously in this section. Unconditionally restore the node to service by entering the following input message:
   RST:nodexx y;UCL
   For LN:
      node = LN
      xx = group number
      y = node member number
      UCL = restores the node without diagnostics.
   For RPCN:
      node = RPCN
      xx = group number
      y = node member number
      UCL = restores the node without diagnostics.
CAUTION:
NOTE: If the major state of the node is OOS-ISOLATED, this input message requests that the node be included back into the active ring. If configuring the node back into the active ring is successful, the node major state is changed to ACT and the node is pumped with the required operational code. If the node is unable to be configured back into the active ring, the restore is stopped and the node is left in the OOS-NORMAL state. If the node was not originally OOS, the restore is stopped and the node is left in the state it was in prior to the restoral request. The node's major state must be changed to OOS via a recent change and verify (RCV) command before it can be restored. For additional information concerning a node state change, refer to the "Maintenance Description" section in this manual.
NOTE: If the major state is changed to ACT, the DGN diagnostics are complete. Omit the remainder of this test procedure.
NOTE: Perform Steps 5 through 8 only if an ATP response is not received in Step 3.
5. From the ROP: If the diagnostic result is:
   STF: Determine which phase(s) failed, and record the CP number(s) for that phase. See the trouble location circuit pack list tables in this chapter for additional information on RNs. Proceed to Step 6.
   CATP: Determine the reason for the CATP response.
      If the reason is the node was not singly isolated, go to Step 4. Conditionally restore (RST) the adjacent nodes. When these nodes have been restored, conditionally restore this node, the first failing node.
      If the reason is the node was not isolated, correct all problems so that a duplex ring exists and conditionally restore this node.
      If the reason is the ring is down, correct all problems so that an active ring exists and conditionally restore this node.
      For additional information on ring configuration and maintenance, see the "Maintenance Description" section in this manual.
   NTR: If an NTR response is received, go to Step 1 or Step 2. If the problem persists, seek technical assistance.
   ABT: If an ABORT is received, determine the reason(s) for the ABORT. See the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual. After determining the reason(s) for the ABORT, go to Step 1 or Step 2, and/or seek technical assistance.
6. At the ring node frame/cabinet (RNF/C): Use the trouble location circuit pack list tables in this chapter to determine the equipment location for each suspected or faulty CP.
7. At the RNF/C: Replace the faulty CP(s) using the procedure described in Chapter 7, Equipment Handling Procedures.
8. If time permits and there is uncertainty about node operation, repeat diagnostics to confirm proper system operations. Go to Step 2.
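The branching in Steps 3 through 5 of this procedure can be summarized as a lookup from the diagnostic result to the next step. The sketch below is a reading aid, not a system interface. The function name and the wording of the returned strings are hypothetical, and the CATP entry is condensed, so the full text of Step 5 still applies.

```python
# Next action for each DGN completion result, condensed from Steps 3-5.
NEXT_ACTION = {
    "ATP": "Step 4: return any removed link to service, then RST ... ;UCL",
    "STF": "Step 5, then Step 6: record failed phases and CP numbers",
    "CATP": "Step 5: determine the reason for the CATP response",
    "NTR": "Step 5: go to Step 1 or Step 2; seek assistance if it persists",
    "ABT": "Step 5: determine the abort reason, then go to Step 1 or Step 2",
}


def next_action(result: str) -> str:
    """Return the condensed next action for a diagnostic result code."""
    try:
        return NEXT_ACTION[result.strip().upper()]
    except KeyError:
        raise ValueError(f"unrecognized diagnostic result: {result!r}") from None
```

Raising an error for an unknown result code, rather than guessing, keeps an unexpected printout from being silently treated as a pass.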
Procedure 6-4. Manual Diagnostics Procedure Using the RST Command

This procedure uses the RST input command. This command provides the same functions as the DGN command, with the addition of an automatic restoral at the completion of running the diagnostic phases. The restoral is conditional upon an ATP or CATP diagnostic result, with the RI and NP minor states both being usable (USBL). This command normally performs the following sequence of events. For exceptions, see the RST:LN/RST:RPCN input command in the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual.
1. Conditionally removes the node from service (OOS-NORMAL).
2. Isolates (OOS-ISOLATED) the node.
3. Runs all automatic phases on the node.
4. Unisolates the node (OOS-NORMAL).
5. Restores the node to service (ACT).
For additional information on the normal sequence of events when using the RST command, see the 401-610-055 Input Message Manual.
Procedure 6-5. The RST Command Diagnostic Procedure

When using the RST command, the following procedure can be used:
1. At the MCRT: Obtain a report on the status of a node in a particular group, or the status of the ring, by entering the following input message, or a variation thereof, as shown in the OP: Ring Input Message Variations table.
   OP:RING,nodexx y
   For LN:
      node = LN
      xx = group number
      y = node member number.
   For RPCN:
      node = RPCN
      xx = group number
      y = node member number.
   NOTE: The input message listed provides the status information for a specified RN. To determine what response message to expect and for an explanation of such, see the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.
2. At the MCRT: If there is an active link supported by this node, remove it from service using the procedures listed previously in this section. Request node test by entering the following input message:
   RST:nodexx y
   For LN:
      node = LN
      xx = group number
      y = node member number.
   For RPCN:
      node = RPCN
      xx = group number
      y = node member number.
   NOTE: Upon entering the RST command at the MCRT, the following events normally occur:
      1. The node is conditionally removed from service (OOS-NORMAL). The ring quarantine (RQ) LED on the node processor or IRN lights if the removal was successful.
      2. The node is isolated from the active ring (OOS-ISOLATED). The no token (NT) LED lights at the node under test if the node is successfully configured out of the active ring.
      3. All diagnostic phases are run on the specified node under test.
      4. If the diagnostic result is an ATP response, the node is configured back into the active ring. When the node is successfully configured back into the active ring, it is restored to service. If the node is unable to configure back into the active ring, it is left in the OOS state.
   To determine what completion response message to expect and for an explanation of such, see the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual. If a link associated with this node was removed from service prior to diagnostics, put the link back in service using the procedures listed previously in this section.
   NOTE: If the node is left in the OOS state, and the response STF, CATP, or NTR is received at the ROP, further diagnostics are required. Depending upon the severity of the failure(s), that is, if a particular phase or range of phases failed, choose a DGN input message as listed in the DGN Message Input Variations table or from the CNI Input Message Manual, 256-090-204, which matches the circumstances of the failed phase(s), and perform Steps 3 through 9.
3. At the ROP: From the printout received at the ROP (this step), determine which phase(s) failed. If an ATP response is received at the ROP, all diagnostics are complete and the rest of this test procedure should be omitted. If only a particular phase failed, proceed to Step 4, and enter the message as listed in the instructions. If a range of phases failed, enter the appropriate input message from the DGN Message Input Variations table in Step 4, and proceed with the test.
   NOTE: Perform Steps 4 through 9 only if a CATP, NTR, or STF response is received in Steps 2 and 3.
4. At the MCRT: Request diagnostics for the failing phase by entering the following input message, or a variation thereof, as listed in the DGN Message Input Variations table:
   DGN:nodexx y:PH a
   For LN:
      node = LN
      xx = group number
      y = node member number
      PH = phase
      a = number of the particular phase to run.
   For RPCN:
      node = RPCN
      xx = group number
      y = node member number
      PH = phase
      a = number of the particular phase to run.
   NOTE: To determine what completion response message to expect and for an explanation of the message, see the 401-610-055 Input Message Manual or the 401-610-057 Output Message Manual.
5. At the ROP: Examine the printout and ascertain the failed phase(s), record the CP number(s), and use the trouble location circuit pack list tables in this chapter to determine the equipment location of the failed or faulty CP(s). The TLP option can also be used to determine the location of suspected faulty equipment.
6. At the RNF/C: Replace the faulty CP using the procedure described in Chapter 7, Equipment Handling Procedures.
7. If time permits and there is uncertainty about node operation, repeat diagnostics to confirm proper system operations. Go to Step 2.
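The input messages used in these procedures share the nodexx y form, so a small helper can assemble them consistently before they are typed at the MCRT. This is a sketch only: the function names are hypothetical, and the two-digit zero padding of the group number and the exact spacing are assumptions taken from the message templates shown in this chapter, not from the Input Message Manual.

```python
def node_id(node: str, group: int, member: int) -> str:
    """Format the 'nodexx y' term: node type, group number, member number."""
    if node not in ("LN", "RPCN"):
        raise ValueError("node must be 'LN' or 'RPCN'")
    if not 0 <= group <= 63:
        raise ValueError("group number must be in the range 0-63")
    if not 0 <= member <= 15:
        raise ValueError("member number must be in the range 0-15")
    # Assumption: the group number is written as two digits (xx).
    return f"{node}{group:02d} {member}"


def op_ring(node: str, group: int, member: int) -> str:
    """Status report request for one ring node."""
    return f"OP:RING,{node_id(node, group, member)}"


def dgn(node: str, group: int, member: int, phase=None) -> str:
    """Diagnose a node; optionally limit the run to one phase."""
    msg = f"DGN:{node_id(node, group, member)}"
    return msg if phase is None else f"{msg}:PH {phase}"


def rst(node: str, group: int, member: int, unconditional: bool = False) -> str:
    """Conditional restore, or unconditional restore when UCL is requested."""
    msg = f"RST:{node_id(node, group, member)}"
    return f"{msg};UCL" if unconditional else msg
```

For example, dgn("LN", 5, 3, phase=20) produces a single-phase request in the DGN:nodexx y:PH a form.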
CDN-I Fault Isolation Panic Messages

Panic messages are intended for use in analyzing software problems. They are, for the most part, not useful for hardware fault isolation. Recurring panic messages should be reported to the CTS. One valuable exception is the hardware panic message which indicates that the microsecond timer on the NPI board is malfunctioning. This timer is not tested by the diagnostic but is tested in the background of the operational software. If this message is received, the NPI board should be replaced. If the panic persists, replace the CCS board.
Formerly, when a CDN-I crashed because of hardware problems, diagnostics were relied on to recover the node. Each RAP circuit pack is diagnosed by a particular diagnostic phase. A failing diagnostic phase is supposed to isolate the fault to the pack associated with that phase number. The diagnostics rely on RAP firmware being operational. The diagnostic is a diagnostics driver which is pumped to the IRN. The driver sends commands to the RAP firmware, allowing the diagnostics to be executed for a given board. A large percentage of the circuitry on every pack on the RAP local bus must be operational for this to work, and even more circuitry must be operational for firmware execution of the power-up initialization sequence. If the RAP cannot initialize, diagnostics are impossible. Diagnostic responses received at the host fall into one of three categories. They are:
   - A normal response containing failure data.
   - A response without failure data because the RAP is hung in a diagnostic phase (the board being diagnosed is at fault).
   - A response without failure data because the RAP firmware is not executing.
The first two faults can be isolated using standard diagnostic procedures. More than likely, however, the RAP firmware is not executing (a category 3 failure). In the automatic recovery procedure, diagnostics are run on a particular sequence of boards. The first board (on the RAP local bus) of this sequence always fails regardless of which board is bad.
RAP Diagnostic Firmware

Each circuit pack on the RAP bus in a CDN-I is equipped with a diagnostic fail LED. The system initializes with all LEDs on, and if all diagnostics are successful, the LEDs turn off. The diagnostics can be run locally by pressing the DIAG button on the PCID. The LEDs can also be used to mark the progress of the initialization when power is applied to the RAP. When the RAP appears as though it is not initializing, it is very difficult to isolate the faulty pack because many packs can affect the bus. Fortunately, the minimum number of packs on the local bus required for firmware operation is just three (CCS, CCC, MASC_0). Utilizing RAP firmware greatly reduces RAP downtime as compared with running the diagnostics from the host. Refer to the section "Ring Application Processor Critical Maintenance Procedure" in Chapter 3, Ring Maintenance.
Interactive Diagnostics
Interactive diagnostics (EX) are used to exercise a node in the interactive mode. In this mode of operation, diagnostic execution is controlled so that any particular phase or portion of diagnostic execution can be exercised. Interactive diagnostics can be used to replace regular diagnostic execution when the following is to be performed:
1. To run diagnostics up to a particular point of execution and stop
2. To perform a specific group of tasks repeatedly
3. To start and to stop a loop of diagnostic executions
4. To step through a set of diagnostic commands
5. To suspend diagnostic execution for a specific time period.
NOTE: This capability is limited to data table statements; that is, downloaded diagnostic code when executed cannot be controlled interactively.
When EX is begun, the following sequence of events occurs:
1. The LN or RPCN is first removed from service following the rules of the RMV:LN or RMV:RPCN input messages.
2. The node is isolated if the node's major state is OOS, GROW, OFFLINE, or UNAV. Otherwise, the diagnostic request is aborted.
3. The EX demand executions are performed.
4. Upon successful completion of the EX routine, an attempt is made to include the node back into the active ring if it was in the active ring prior to entering the EX command. Otherwise, the node is left in the isolated segment. In all cases, the node is left in the OOS state.
Procedure 6-6. Interactive (EX) Diagnostic Procedures

To perform interactive diagnostics, use the following procedure:
1. To start the interactive diagnostic mode: From the MCRT: If there is an active link supported by this node, remove it from service using the procedures listed previously in this section. Enter the EX command for the desired node. This command returns a slot number.
   For an LN:
      EX:LNxx y :PH b
   For RPCN:
      EX:RPCNxx y :PH b [,c]
   where:
      xx = group number
      y = node member number
      b = phase(s) to be executed
      c = statement number
2. From the MCRT or ROP: Wait for the display of EX:STARTED AT STATEMENT a, which indicates that the interactive mode has started.
3. From the MCRT: Execute the diagnostics by entering the EX commands as listed or in the order that the diagnostics are to be performed:
   To pause or suspend diagnostic execution at a specified statement number within a diagnostic phase for an RN, enter the following command:
      EX:PAUSE;nodexx y :ST e
   where:
      node = LN or RPCN
      xx = group number
      y = node member number
      b = phase(s) to be executed
      e = statement number
   See the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual for the system response to the message.
   To put the diagnostics in a loop between the specified statement numbers for any RN, enter the following command:
      EX:LOOP;nodexx y :ST f - g
   See the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual for the system response to the message.
   To step through the diagnostics and to suspend at a specified statement number for any RN, enter the following command:
      EX:STEP;nodexx y :ST e
   See the 401-610-057 Output Message Manual for the system response to the message.
   To stop the looping started by the EX:LOOP command for any RN, enter the following command:
      EX:STOP;nodexx y
   See the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual for the system response to the message.
   To exit from the interactive mode for any RN, enter the following input command:
      STOP:DMQ;nodexx y
   If a link associated with this node was removed from service prior to diagnostics, put the link back in service using the procedures listed previously in this section.
Denied Diagnostic Requests

When a manual request is denied, the following message is printed at the ROP:
   <type> : NO node AVAILABLE _ RETRY LATER
where:
   <type> = Type of request:
      DGN - Manual diagnostic
      EX - Interactive diagnostic
      RMV - Remove node
      RST - Restore node.
Reenter the request at a later time. When an automatic request is denied, the user does not receive any notification, and no action on the user's part is required. For additional information concerning denied diagnostic requests.
Inhibiting Diagnostic Requests

A diagnostic inhibit (INH) is used to inhibit (stop) automatic diagnostic requests. Any process that sends a restore, remove, or diagnostic request to the system for processing can be prevented from being activated for any amount of time specified. A reminder that a specific inhibit is in effect is output at the display terminal at specified intervals. The message format for inhibiting a diagnostic request is as follows:
   INH:DMQ;SRC a, TINH b, AINH c
where:
   INH = inhibit
   DMQ = diagnostics
   SRC a = identity of process to be inhibited
   TINH b = time in minutes that inhibit lasts
   AINH c = alarm intervals in minutes.
For more details and an explanation of the INH:DMQ command, refer to the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.
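As an illustration of the INH:DMQ message format, the following sketch assembles the message text from its three parameters. The function name is hypothetical, and the use of ARR as a source identity in the usage note is an assumption based on the ARR references in this chapter. Consult the message manuals for the accepted values and limits.

```python
def inh_dmq(source: str, inhibit_minutes: int, alarm_interval_minutes: int) -> str:
    """Build an INH:DMQ input message in the 'SRC a, TINH b, AINH c' form."""
    if not source:
        raise ValueError("a source process identity is required")
    if inhibit_minutes <= 0 or alarm_interval_minutes <= 0:
        raise ValueError("times must be positive numbers of minutes")
    return (f"INH:DMQ;SRC {source}, TINH {inhibit_minutes}, "
            f"AINH {alarm_interval_minutes}")
```

Calling inh_dmq("ARR", 60, 15) would build a request to inhibit ARR for 60 minutes with a reminder every 15 minutes.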
Diagnostic Aborts and Audits
Aborts

At times when performing diagnostics, it may be necessary to abort or cancel a request in the active queue if:
   - The request was entered by mistake.
   - A request of higher importance is in the waiting queue, and an active queue must be cleared to allow room for another.
   - An interactive diagnostic is to be exited.
   - The active and waiting queues of all requests must be cleared for the field update of diagnostic files.
When it is necessary to abort or cancel a diagnostic request, the following procedure should be used:
1. At the MCRT: Enter the following input command:
   OP:DMQ
   The output from this command tells the user the slot number and queue assigned to a particular job. The source in the output message may be (but is not limited to) one of the following:
   - ARR - Automatic ring recovery
   - ADP - Automatic diagnostic process
   - MAN - Manual requests input by the user
   - PSM - Power switch monitor
   - REX - Routine exercise.
2. At the maintenance terminal: Enter the following command to abort a diagnostic request in the active queue or cancel it from the waiting queue.
   STOP:DMQ;nodexx y
Audits
At various points in the diagnostic execution process, checks are performed to verify that the diagnostic system is functioning properly. These verifications are:
   - Called functions give correct return codes
   - Needed system resources are available
   - Necessary files can be opened or read, and executed
   - Hardware errors have not occurred
   - Illegal operations are not attempted
Audit Failures
If an audit fails, a report is printed at the MCRT. The user should respond to the audit report in the following manner:
1. If a diagnostic test or phase fails prior to an audit failure, clear the problem indicated by the test failure. This may also clear the audit failure.
2. Save the printout pertaining to the audit failure, and refer to the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual:
   - to determine the reason for the audit failure,
   - to determine whether or not the CTS should be contacted, and
   - to see if any additional data should be collected.
When a diagnostic is aborted, one of two messages is printed at the MTTY and the ROP. Only one format is listed and explained here. For details and an explanation of the second format, refer to the 401-610-057 Output Message Manual.

DGN AUDIT RING R = b SYSTEM DATA D = n T = i A = j S = k I = l PH = p

where:
b = reason for the audit (in hexadecimal notation)
n = error code returned on a failing system call or a failing function call (in decimal notation)
i = last test executed (in decimal notation)
j = data table address (in hexadecimal notation)
k = data table statement number (in decimal notation)
l = task routine index (in hexadecimal notation)
p = phase number being executed when the DGN was aborted (in decimal notation).

For additional information concerning audits, refer to the Audits section of this manual.
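As an illustration only, the field list above can be turned into a small decoder for a logged message. This is a minimal Python sketch, not part of the ECP software: it assumes the message arrives as a single text string laid out as shown, and the function name `parse_dgn_audit` and the sample values are hypothetical.

```python
import re

# Number base for each field, taken from the "where" list above:
# R, A, and I are hexadecimal; D, T, S, and PH are decimal.
FIELD_BASES = {"R": 16, "D": 10, "T": 10, "A": 16, "S": 10, "I": 16, "PH": 10}

# A field is a letter code, an equals sign, and a value. The word boundary
# keeps "RING", "AUDIT", "SYSTEM", and "DATA" from matching as fields.
FIELD_PATTERN = re.compile(r"\b(PH|[RDTASI])\s*=\s*([0-9A-Fa-f]+)")


def parse_dgn_audit(message):
    """Return {field: integer value} for one DGN AUDIT RING message."""
    if "DGN AUDIT" not in message:
        raise ValueError("not a DGN AUDIT message")
    fields = {}
    for name, raw in FIELD_PATTERN.findall(message):
        fields[name] = int(raw, FIELD_BASES[name])
    return fields


if __name__ == "__main__":
    # Hypothetical sample text; the values are invented for the example.
    sample = "DGN AUDIT RING R = 1A SYSTEM DATA D = 5 T = 7 A = 2F00 S = 12 I = 3 PH = 2"
    print(parse_dgn_audit(sample))
```

Decoding the values this way makes it easier to compare the audit reason and failing phase against the Output Message Manual, which remains the authority on what each value means.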
Operating System Diagnostics

The procedures and information needed for performing 3B21D-2 and UNIX system RTR or UNIX system RTR VLMM diagnostics are provided in the UNIX System RTR 3B20/3B21 Operators System Maintenance Manual, 304-046.
7

Contents

Introduction
Equipment Description and Handling Precautions
Power Packs and Fusing Descriptions
Power Pack Description and Replacement Procedures
Fuse Description and Replacement Procedures
Fan and Filter Maintenance
Ring Node Frame Fan Unit Description
Ring Node Cabinet Fan Unit Description
Analog Facility Access Frame Fan Unit Description
Filter Maintenance
Ring Node Circuit Pack Handling Precautions
Ring Node Equipment Visual Indicators
Removing Affected Equipment From Service
UN122C and UN123B Combination Circuit Pack Installation
Voice Frequency Link Hardware Equipment Replacement Procedures
Introduction
This chapter contains guidelines and precautions to be followed when working with equipment in a Common Network Interface (CNI) office. These guidelines and precautions must be followed closely before and during the handling of all circuit packs (CPs). They are of extreme importance, since improper handling may cause isolation of the ring or total system failure. Use them in conjunction with Chapter 4, Ring and Ring Node Maintenance Procedures and Chapter 6, Diagnostic Users Guide.
Equipment Description and Handling Precautions

The following precautions are for ring maintenance functions. Failure to follow these procedures could result in damage to highly integrated CPs or in loss of service caused by isolating or totally interrupting the ring. These procedures cover the handling and the replacement of equipment only. The equipment has been Underwriters Laboratories (UL) approved and consists of the following components:

• Integrated ring circuit packs (described for each ring node type in the Overview of Chapter 6, Diagnostic Users Guide)
• Power converter packs
• Ring node frame/cabinet (RNF/C) fan units.
NOTE: When handling ring and ring node (RN) equipment, the appropriate light emitting diodes (LEDs) must be illuminated to prevent severe system interruption or failure.
Power Packs and Fusing Descriptions

The power packs and fuses associated with the RNF/C and power distribution frame/cabinet provide the necessary power for equipment located on each RNF/C. A power island (PI) supplies backup power in the event of primary power loss. The PI provides from 5 to 30 minutes of battery holdover, depending on the load and the number of battery strings used, and is contained in two to four 3B21D computer-type cabinets. The fan units provide the necessary equipment cooling.

NOTE: DLNs and CDNs use the same procedures as RNF/C(s). The term LN is used in these procedures to represent all of these nodes.
Power Pack Description and Replacement Procedures

Each unit on the RN frame/cabinet uses two 495FA or 410AA power converters to supply power to the three LNs associated with that particular unit. Therefore, one power converter supplies power to one and a half LNs. The loss of either converter affects the operation of two of the three LNs in that unit. CDN-I uses 410AA power converters, one for the node, one for the RAP and Link Node unit, and two for each additional memory growth unit. Likewise, each RPCNU uses two 495FA power converters. Loss of either converter affects the operation of that RPCNU.

Before replacing a power supply circuit pack in a 3-node unit, isolate the two nodes adjacent to the power supply. In a 2-node unit, isolate the node adjacent to the power supply. In an 8-node unit, isolate the four nodes adjacent to the power supply. In a 5-node unit, use the unit horizontal designation strip next to the power supply in question to learn which nodes are serviced by that power supply, and isolate either three or two nodes.

No power pack should be removed without first removing the associated LNs or RPCN from service. Power may be affected due to a faulty power converter, a short in one of the associated circuit packs, or an incorrect or missing current programming resistor on an LN circuit pack. Use Table 7-1 to determine which nodes must be removed from service when replacing power supplies in the RNF.
Table 7-1. Power Unit Index

REPLACE POWER UNIT    REMOVE NODES
1                     1, 2
2                     2, 3
3                     4, 5
4                     5, 6
5                     7, 8
6                     8, 9
7                     10, 11
8                     11, 12
9                     13, 14
10                    14, 15
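Table 7-1 can be expressed as a simple lookup. The following Python sketch is illustrative only; the names `REMOVE_NODES`, `nodes_to_remove`, and `nodes_by_rule` are hypothetical. The arithmetic rule in `nodes_by_rule` is an observation about the pattern in the table (two power units per 3-node unit), not a rule stated in this manual, so the table itself stays authoritative.

```python
# Table 7-1 transcribed as: power unit number -> nodes to remove from service.
REMOVE_NODES = {
    1: (1, 2), 2: (2, 3),
    3: (4, 5), 4: (5, 6),
    5: (7, 8), 6: (8, 9),
    7: (10, 11), 8: (11, 12),
    9: (13, 14), 10: (14, 15),
}


def nodes_to_remove(power_unit):
    """Look up the nodes to remove before replacing the given power unit."""
    if power_unit not in REMOVE_NODES:
        raise ValueError(f"power unit {power_unit} is not listed in Table 7-1")
    return REMOVE_NODES[power_unit]


def nodes_by_rule(power_unit):
    """Reproduce the table pattern arithmetically (an observed pattern only)."""
    first_node_of_unit = 3 * ((power_unit - 1) // 2) + 1
    if power_unit % 2 == 1:  # odd power unit: first and second node of the unit
        return (first_node_of_unit, first_node_of_unit + 1)
    return (first_node_of_unit + 1, first_node_of_unit + 2)  # even power unit


if __name__ == "__main__":
    for unit in sorted(REMOVE_NODES):
        print(unit, nodes_to_remove(unit))
```

The middle node of each 3-node unit appears in both rows for that unit, which matches the statement that each converter supplies one and a half nodes.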
Procedure 7-1. Replacing Ring Node Frame/Cabinet Power Packs

1. At the maintenance cathode ray tube (MCRT), determine the affected equipment location.
2. Press the alarm release (ALM-RLS) key to silence the audible alarm.

   NOTE: The audible alarm may also be silenced by pressing the alarm cutoff (ACO) key at the alarm frame.

3. Remove either the two associated LNs or the affected RPCN from service. Enter:

   RMV:nodexx y

   where:
   node = LN or RPCN
   xx = Ring node group number
   y = Node position in the ring node group (member number).

4. Isolate the associated RPCN or LNs from the active ring by entering:

   CFR:RING a, b;EXCLUDE

   where:
   a = Ring node. If b is present, a is the first of a range of RNs (in the direction of flow of Ring 0). In the form of {RPCNx y | x y}
   b = Last node in the range begun by a, in the same form.
   EXCLUDE = Request to isolate specified node(s) from the active ring.

5. At the affected RNF/C, locate the faulty converter.
6. Obtain the proper replacement power pack, using the precautions for handling RN equipment CPs.

   CAUTION: Before removing the affected power pack, ensure that the associated RPCN or LN(s) has been removed from service and isolated. Refer to Table 7-1 to determine the proper nodes to remove from service.

7. At the faulty equipment location, replace the faulty power pack (observe all equipment handling precautions).
8. At the RN control panel, press the PWR ALM RESET button to restore the frame/cabinet to normal operation.
9. At the 410AA or 495FA power converter, verify that the power alarm lamp and the LEDs are illuminated.
10. Place the faulty power pack in protective static wrapping, and return it to storage for later repair.
11. Before returning the node(s) to service, diagnose the node by entering the following at the MCRT:

    DGN:nodexx y

    where:
    DGN = Requests the run of all diagnostic phases
    node = LN or RPCN
    xx = Ring node group number
    y = Node position in the ring node group (member number).

    NOTE: Before unconditionally restoring the node to the ring, it is strongly recommended that at least Phase 1 and Phase 2 diagnostics are run on the node. The above command executes full diagnostics.

12. After diagnostics returns an ATP message, restore the node(s) removed from service by entering the following at the MCRT:

    RST:nodexx y; UCL

    where:
    node = LN or RPCN
    xx = Ring node group number
    y = Node position in the ring node group (member number).

For further reference see Chapter 6, Diagnostic Users Guide.

If the power failure is not corrected after replacing the power converter, there may be a short in the LN. If a short on an LN circuit pack is the cause of a power failure, the following procedure should be used to correct the malfunction:
Procedure 7-2. Fixing Power Failures Caused by a Shorted Link Node Circuit Pack
1. At the MCRT, determine the affected equipment location.
2. Press the ALM-RLS key to silence the audible alarm.

   NOTE: The audible alarm may also be silenced by pressing the ACO key on the control panel of the affected RNF/C.

3. At the affected equipment location, locate the nodes affected by the power loss.
4. At the MCRT, remove either the two associated LNs or the affected RPCN from service. Enter the following command:

   RMV:nodexx y

   where:
   node = LN or RPCN
   xx = Ring node group number
   y = Node position in the ring node group (member number)
   UCL = Restore node unconditionally.

5. Isolate the associated RPCNs or LNs from the active ring. Enter:

   CFR:RING a, b;EXCLUDE

   where:
   a = Ring node. If b is present, a is the first of a range of RNs (in the direction of flow on Ring 0). In the form of {RPCNx y | x y}
   b = Last node in the range begun by a, in the same form.
   EXCLUDE = Request to isolate specified node(s) from the active ring.

6. At the faulty equipment location, unplug all circuit packs affected by the power loss. This includes either the affected RPCN or the two associated LNs.

   CAUTION: Before removing the affected power pack, ensure that the associated RPCN or LN(s) has been removed from service and isolated. Refer to Table 7-1 to determine the proper nodes to remove from service.

7. At the faulty power pack, recycle power to the affected power converter.
8. If the converter does not turn on with no load on it, then replace the CP. Place the faulty power pack in protective static wrapping and return it to storage for later repair.
9. If the converter powers up, try each suspect CP one at a time. At the faulty equipment location, plug in each circuit pack removed in Step 6. The CP with the short will power down the power converter.
10. Replace the faulty circuit pack with a new one.
11. If the problem is corrected after replacing the faulty CP, place the faulty CP in protective static wrapping and return it to storage for later repair.
12. At the RN control panel, press the PWR ALM RESET key to restore the frame/cabinet to normal operation.
13. Before returning the node(s) to service, diagnose the node by entering the following at the MCRT:

    DGN:nodexx y

    where:
    DGN = Requests the run of all diagnostic phases
    node = LN or RPCN
    xx = Ring node group number
    y = Node position in the ring node group (member number).

    NOTE: Before unconditionally restoring the node to the ring, it is strongly recommended that at least Phase 1 and Phase 2 diagnostics are run on the node. The above command executes full diagnostics.

14. After diagnostics returns an ATP message, restore the node(s) removed from service by entering the following at the MCRT:

    RST:nodexx y; UCL

    where:
    node = LN or RPCN
    xx = Ring node group number
    y = Node position in the ring node group (member number).

For further reference see Chapter 6, Diagnostic Users Guide.
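Procedures 7-1 and 7-2 share the same command order: remove from service (RMV), isolate (CFR), then diagnose (DGN) and restore (RST) after the hardware work. The Python sketch below only assembles those command strings for a job sheet; it does not send anything to the MCRT. The helper names are hypothetical. Two points are assumptions rather than statements from this manual: that the group number is entered without padding, and that the CFR range argument uses the same node designation as the other commands.

```python
def node_id(node, group, member):
    """Build the 'nodexx y' designation, e.g. node type, group, member."""
    # Assumption: the group number is written as given, with no zero padding.
    return f"{node}{group} {member}"


def rmv(node, group, member):
    return "RMV:" + node_id(node, group, member)


def cfr_exclude(first, last=None):
    # 'b' is optional in the manual; when absent only 'a' is named.
    ring_range = first if last is None else f"{first}, {last}"
    return f"CFR:RING {ring_range};EXCLUDE"


def dgn(node, group, member):
    return "DGN:" + node_id(node, group, member)


def rst_ucl(node, group, member):
    return "RST:" + node_id(node, group, member) + "; UCL"


def replacement_sequence(node, group, member):
    """Return (commands before the hardware work, commands after it)."""
    target = node_id(node, group, member)  # assumed form of the CFR argument
    before = [rmv(node, group, member), cfr_exclude(target)]
    after = [dgn(node, group, member), rst_ucl(node, group, member)]
    return before, after


if __name__ == "__main__":
    before, after = replacement_sequence("LN", 3, 2)
    print("Before replacing hardware:", *before, sep="\n  ")
    print("After replacing hardware:", *after, sep="\n  ")
```

Keeping the two halves separate mirrors the procedures: the RST command is entered only after diagnostics return an ATP message.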
Fuse Description and Replacement Procedures

System interruption and/or the loss of other LNs or RPCNs may be caused by the loss of a 10-amp fuse on the RNF/C. Also, the loss of a 20-amp fuse on the power distribution frame (PDF), the 20-amp fuse on the DC power distribution cabinet (DCPD), or the 25-amp fuse on the Global Power Distribution Frame (GPDF) may cause failure of either one RPCNU or one LN unit. The loss of a 250-amp fuse at the battery plant could affect a total of four RNFs or RNCs. This causes the failure of sixty LNs and the possible failure of two RPCNs. When this fuse is lost, a major alarm is triggered in the office, and the fault must be corrected as soon as possible.
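The sixty-node figure is consistent with four frames of fifteen nodes each. The short Python sketch below shows that arithmetic; the value of 15 nodes per frame is an assumption taken from the node numbering in Table 7-1, and the function name is hypothetical.

```python
# Assumption: one RNF/RNC holds nodes 1 through 15, as numbered in Table 7-1.
NODES_PER_FRAME = 15


def nodes_lost(frames_affected, nodes_per_frame=NODES_PER_FRAME):
    """Number of nodes that lose power when whole frames lose their feed."""
    return frames_affected * nodes_per_frame


if __name__ == "__main__":
    # A lost 250-amp battery plant fuse could affect four RNFs or RNCs.
    print(nodes_lost(4))
```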
Procedure 7-3. Fuse Replacement for Ring Node Frame/Cabinet Failures

1. At the MCRT or the affected equipment, determine the blown fuse location.
2. At the MCRT, press the ALM-RLS key to silence the audible alarm.

   NOTE: The audible alarm may also be silenced by pressing the ACO key on the control panel of the affected RN frame/cabinet.

3. To avoid ring interruption, the affected ring nodes should be taken out of service and isolated from the active ring before the power converter is removed. If the RNs are not already OOS and isolated, enter the following commands:

   RMV:nodexx y
   CFR:RING a, b;EXCLUDE

   where:
   node = LN or RPCN
   xx = Ring node group number
   y = Node position in the ring node group (member number)
   a = Ring node. If b is present, a is the first of a range of RNs (in the direction of flow on Ring 0). In the form of {RPCNx y | x y}
   b = Last node in the range begun by a, in the same form.
   EXCLUDE = Request to isolate specified node(s) from the active ring.

4. At the faulty equipment location, unseat the affected power converter (the one associated with the blown fuse and OOS nodes).
5. Replace the faulty fuse.
6. Reseat the power converter. If the fuse does not blow again, proceed to Step 8.
7. Otherwise, the power converter must be replaced:

   • unseat the affected power converter,
   • insert a new fuse,
   • replace the power converter,
   • place the faulty power converter in protective static wrapping, and return it to storage for later repair.

8. At the RN control panel, press the PWR ALM RESET key to restore the frame/cabinet to normal operation.
9. The lamp test key can be used to test the power alarm (PA) and fuse alarm (FA) lamps.
10. Before returning the node(s) to service, diagnose the node by entering the following at the MCRT:

    DGN:nodexx y

    where:
    DGN = Requests the run of all diagnostic phases
    node = LN or RPCN
    xx = Ring node group number
    y = Node position in the ring node group (member number).

    NOTE: Before unconditionally restoring the node to the ring, it is strongly recommended that at least Phase 1 and Phase 2 diagnostics are run on the node. The above command executes full diagnostics.

11. After diagnostics returns an ATP message, restore the node(s) removed from service by entering the following at the MCRT:

    RST:nodexx y; UCL

    where:
    node = LN or RPCN
    xx = Ring node group number
    y = Node position in the ring node group (member number).

For further reference see Chapter 6, Diagnostic Users Guide.

Disruption of either one LN unit or one RPCNU may be caused by a blown 20-amp fuse on the PDF or DCPD. Loss of the fuse also affects the two power converters on the LN or RPCN unit.
Procedure 7-4. Fuse Replacement for Power Distribution Frame/Cabinet Failures

1. At the MCRT, determine the affected equipment location. Locate the PDF or DCPD blown fuse and the RN equipment affected by it.
2. At the affected RN control panel, press the ALM-RLS key to silence the audible alarm.

   NOTE: The audible alarm may also be silenced by pressing the ACO key at the affected RNF/C.

3. To avoid ring interruption, the affected ring nodes should be taken out of service and isolated from the active ring before the power converter is removed. If the RNs are not already OOS and isolated, enter the following commands:

   RMV:nodexx y
   CFR:RING,a, b;EXCLUDE

   where:
   node = LN or RPCN
   xx = Ring node group number
   y = Node position in the ring node group (member number)
   a = Ring node. If b is present, a is the first of a range of RNs (in the direction of flow of Ring 0).
   b = Last node in the range begun by a.
   EXCLUDE = Request to exclude specified node(s) from the active ring.

4. At the faulty equipment location, unseat the affected power converters and circuit packs. Remove the fan fuse(s).
5. At the PD frame/cabinet, remove the blown fuses (both the main and indicator fuses). The GPDF does not have indicator fuses.
6. Insert the charging tool into the indicator fuse slot, and press the charge key on the PD control panel. When this key is pressed, the charge indicator LED illuminates and slowly decays to off as the fuse location becomes fully charged. The GPDF does not have a charging probe.
7. Insert a new 20A main fuse and remove the charging tool. The GPDF uses a 25-amp fuse.
8. Reinsert the indicator fuse.
9. At the affected RNF/C, reseat the power converters and replace the fan fuse.
10. Reseat all circuit packs. If all fuses hold (on both the RNF/C and the PD frame/cabinet), proceed to the next step. Otherwise, correct the problem using guidelines for the appropriate condition.
11. At the RN control panel, press the PWR ALM RESET key to restore the frame/cabinet to normal operation.
12. Before returning the node(s) to service, diagnose the node by entering the following at the MCRT:

    DGN:nodexx y

    where:
    DGN = Requests the run of all diagnostic phases
    node = LN or RPCN
    xx = Ring node group number
    y = Node position in the ring node group (member number).

    NOTE: Before unconditionally restoring the node to the ring, it is strongly recommended that at least Phase 1 and Phase 2 diagnostics are run on the node. The above command executes full diagnostics.

13. After diagnostics returns an ATP message, restore the node(s) removed from service by entering the following at the MCRT:

    RST:nodexx y; UCL

    where:
    node = LN or RPCN
    xx = Ring node group number
    y = Node position in the ring node group (member number).

For further reference see Chapter 6, Diagnostic Users Guide.
Procedure 7-5. Fixing Blown Fuse or Power Failures of the Digital Facility Access Frame/Cabinet
Blown fuses and power failures may also occur on the digital facility access (DFA) frame/cabinet or the analog facility access frame (AFAF).

1. At the affected equipment control panel, press the ACO key to silence the alarm.
2. At the affected equipment location, locate the blown fuse(s).
3. Unseat the appropriate 495H1 and 393A power converters (those associated with the blown fuse or fuses).
4. At the fuse location, replace the blown fuse(s).
5. Reseat both the 495H1 and the 393A power converters.
6. When powering up the DFA frame/cabinet, a major alarm may be activated before the power converters stabilize. If a major alarm sounds, continue; otherwise, the problem is corrected.
7. At the DFA control panel, press the POWER ALARM RESET key to restore the frame/cabinet to normal operation.
8. Press the ACO key to silence the alarm.
Procedure 7-6. Fixing Blown Fuse or Power Failures of the Analog Facility Access Frame
1. At the affected equipment control panel, press the ACO key to silence the alarm.
2. At the affected equipment location, locate the blown fuse(s).
3. Unseat the associated 133K and 130D power converters.

   NOTE: Ensure the correct power converters are removed (those associated with the blown fuse or fuses).

4. At the fuse location, replace the blown fuse(s).
5. Reseat both the 133K and the 130D power converters.
6. At the AFAF control panel, press the POWER ALARM RESET key to restore the frame to normal operation.
7. If the alarm is due to a power failure in the fan system, do the following:
   a. At the affected AFAF, replace the blown fuse. If the fuse blows again, proceed to Step b; otherwise, the problem is corrected.
   b. Replace the fan or restore it to an operational state.
   c. On the 64C2 data mounting unit, press the alarm reset key.
   d. Press the alarm reset (ARS) key.
Fan and Filter Maintenance

Two frames/cabinets are equipped with fan units: the Ring Node Frame/Cabinet (RNF/C) and the Analog Facility Access Frame (AFAF). Each fan unit has a removable wire mesh air filter. When a fault is detected in one of the fans, the fan alarm (ALM) lamp on the unit and the power alarm (PWR ALM) lamp at the control panel both illuminate. Since the fans are used for cooling, corrective action must be taken as soon as possible. The fans should be checked for proper operation every 6 months. The filters should also be cleaned and, if necessary, replaced every 6 months.
Ring Node Frame Fan Unit Description

The Ring Node Frame (RNF) fan unit contains three fans (1, 2, and 3) and a fan failure detector, with each fan being powered through individual fuses. These fuses are in a panel at the base of the RNF. The fans are located just above the fuse panel to force cooled air up through the entire frame and thus maintain the proper operating temperature. An RNF should be able to function properly with the loss of one fan, but with the loss of two fans, the equipment rapidly overheats. If there is only one operational fan in an RNF and there are no office spares, then a fan must be taken from another RNF and placed in the faulty unit. It is imperative that each RNF have at least two operational fans. It is also recommended that the office have two spare fans.
Ring Node Cabinet Fan Unit Description

The Ring Node Cabinet (RNC) fan unit contains four fans (1, 2, 3, and 4) and a fan failure detector, with each fan being powered through individual fuses. These fuses are in a panel at the base of the RNF. The fans are located at the bottom of the cabinet to force cooled air up through the entire cabinet and thus maintain the proper operating temperature. An RNC should be able to function properly with the loss of two fans, but with the loss of three fans, the equipment rapidly overheats. If there is only one operational fan in an RNC and there are no office spares, then a fan must be taken from another RNC and placed in the faulty unit. It is imperative that each RNC have at least two operational fans. It is also recommended that the office have two spare fans.
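The fan tolerances described for the RNF and RNC can be summarized as a small rule table. This Python sketch is illustrative only: the status labels and the names `FAN_RULES` and `cooling_status` are hypothetical, while the fan counts and tolerated failures come from the two descriptions above.

```python
# Fan count and number of failed fans each unit should tolerate,
# as stated in the RNF and RNC fan unit descriptions.
FAN_RULES = {
    "RNF": {"fans": 3, "tolerated_failures": 1},
    "RNC": {"fans": 4, "tolerated_failures": 2},
}


def cooling_status(unit_type, failed_fans):
    """Classify cooling for an RNF or RNC given the number of failed fans."""
    rule = FAN_RULES[unit_type]
    if not 0 <= failed_fans <= rule["fans"]:
        raise ValueError("failed fan count is outside the range for this unit")
    if failed_fans == 0:
        return "ok"
    if failed_fans <= rule["tolerated_failures"]:
        # Still expected to function, but corrective action is needed soon.
        return "degraded"
    # Fewer than two operational fans: the equipment rapidly overheats.
    return "overheating risk"


if __name__ == "__main__":
    for unit, failed in [("RNF", 1), ("RNF", 2), ("RNC", 2), ("RNC", 3)]:
        print(unit, failed, cooling_status(unit, failed))
```

In both unit types the "degraded" limit leaves exactly two operational fans, which matches the requirement that each RNF or RNC keep at least two fans running.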
Analog Facility Access Frame Fan Unit Description

In the AFAF, there is one fan unit for each equipped data set unit. Thus, each frame can have up to two fan units. An AFAF fan unit contains three fans, but is replaceable only as a unit. Power for each unit is supplied through individual fuses located in the fuse panel at the base of the frame. The data set unit power converter provides fan failure detection. The fan unit forces cooled air through the data set mounting to maintain the proper operating temperature. Although the data sets can function properly with a fan unit failure, corrective action should be taken as soon as possible.

Standard and K-cabinets have six fans in the middle of the cabinet: three fans in front and three fans in back. The three fans in front cool the upper half of the cabinet, and the three fans in back cool the lower half of the cabinet. These fans vary in speed from 1700 RPM to 3400 RPM. The LEDs and toggle switch for the fans are located on the back of the cabinet.

When a fan failure is detected (as indicated by the ALM and PWR ALM lamps illuminating), one of the following procedures should be used to correct the fault.
Procedure 7-7. Ring Node Frame/Cabinet Fan Replacement Guidelines

1. At the control panel of the affected RNF, retire any audible alarm by pressing the ALARM CUTOFF key.
2. At the fuse panel, ensure there are no loose or blown fuses. If replacing a fuse corrects the problem, do not replace the fan, but proceed to Step 8.
3. Power down the faulty fan by releasing the associated fuse (BF0, DF1, or FF2).
4. At the front of the unit, disconnect the faulty fan from the unit by unplugging the 48 V DC power cabling to the fan.
5. Remove the fan by loosening the two screws on the face of the fan and sliding the fan out the front of the unit.
6. Secure the new fan in place with the two screws, and plug in the power cable.
7. At the fuse panel, reinsert the associated fuse.
8. At the fan unit, press the black FAN ALM RST key. This should extinguish the FAN ALM lamp.
9. At the control panel, press the PWR ALM RESET key to restore the frame to normal operation.
Procedure 7-8. AFAF Fan Replacement Guidelines

1. At the control panel of the affected AFAF, retire any audible alarm by pressing the ALARM CUTOFF key.
2. At the fuse panel, ensure there are no loose or blown fuses. If replacing a fuse corrects the problem, do not replace the fans, but proceed to Step 8.
3. Power down the faulty fan unit by releasing the associated fuse (AF0 or BF1).
4. At the rear of the unit, disconnect the unit by unplugging the 48 V DC power cabling.
5. At the front of the unit, remove the fan unit by loosening the two screws on either side of the unit (just above the filter) and sliding it out the front.
6. Secure the new unit in place with the two screws and plug in the power cable.
7. At the fuse panel, reinsert the associated fuse.
8. At the right of the data unit, set the ON/RST toggle switch to the RST position and then back to the ON position. This should extinguish the FAN ALM lamp.
9. At the control panel, press the PWR ALM RESET key to restore the frame to normal operation.
Filter Maintenance
The air filters are intended to eliminate dust from the cooling air. Dust buildup on frame circuitry could lead to improper system operation. Although no alarms are associated with the fan filters, they must be properly maintained by periodic replacement.

The RNF/C filters are positioned horizontally just above the fan unit. To replace the RNF/C fan filter, simply slide it out the front of the frame/cabinet. On frame installations, remove the handle from the old filter and attach it to the new filter. On cabinet installations, simply replace the old filter.

The AFAF filters are positioned horizontally just below the fan unit(s). To replace the AFAF data unit fan filter, the data unit cover must first be opened. The filter then simply slides out the front of the frame.

In the newer cabinets, the filters are above and below the front fan unit. To replace the filter, slide the filter out of the cabinet and replace it with a new filter.
Ring Node Circuit Pack Handling Precautions

Before any RN equipment is replaced on a functional ring, certain handling precautions must be observed. This section presents some of those precautions.

Before removing, installing, or handling any ring node CP, a proper ground must be made to avoid damaging or further damaging the CP. If a proper ground is not made before handling the CP, static electricity may damage it. To avoid this discharge of electricity, a static control wrist strap (3M-2200 series) must be worn at all times when handling RN CPs. Before touching the CP, connect the wrist strap lead to a nonelectrical metallic portion of a frame/cabinet, or to any appropriate location where CPs are repaired or handled. The wristband portion of the strap must be placed around the wrist.

The 3M-2200 series wrist strap must also be worn when handling new or repaired CPs. New CPs are always wrapped in a static protective wrapper to avoid static discharge damage. Therefore, when handling a new CP, keep it in the static-proof wrapper until the appropriate ground connections are made and the pack is ready to be inserted.

Static precautions must also be observed when handling old or defective CPs, just as with a new CP. Static discharge can cause further damage to a CP, thereby affecting repair procedures. The old or defective CP should be wrapped in the protective wrapping, labeled with diagnostic failure information, and returned for repair.

When a ring node CP is pulled for inspection or replacement, the pack and the connections must be checked to ensure that:

• Backplane pins do not come out with the pack
• No pins are bent when the replacement CP is inserted.

Extreme care must be used when handling the ring interface CPs. These CPs require considerable force to insert and remove. Therefore, whenever replacing or inspecting these CPs, check them carefully and use care in applying pressure to them.
Ring Node Equipment Visual Indicators

Most ring CPs carry visual indicators that show faulty or out-of-service (OOS) states. They indicate when particular maintenance functions may or may not be performed. These indicators, the ring quarantine (RQ), no token (NT), error, and diagnostic fail lamps, are found on the NP, RI1, IRN, RAP, AP, and circuit packs.

The RQ visual indicator is located on the NP, IRN, and the LI4 CPs, and indicates that the circuit is presently in the OOS maintenance state but is still part of the active ring.

The NT lamp is located on the RI1 and IRN circuit packs and indicates that there is no token message traversing the ring. This is an indication that the node (RPCN or LN) is in the OOS maintenance state and is isolated from the active ring. When a node's NT lamp is illuminated, any CP may be removed from that node without affecting system operation.

The attached processor uses a red LED to indicate an error. A red LED also indicates diagnostic failures on a RAP board.

The PWR ALM lamp illuminates on the RN control panel for:

• Fuse failure
• Unplugged power converter
• Fan unit failure. If more than one fan fails, a major alarm sounds. If the problem is not corrected, a total RNF/RNC failure may occur.
The NT lamps are also adjacent to nodes equipped with IFBs. Before any IFB circuit pack can be replaced, the NT lamps of both adjacent nodes must be illuminated. There are only two IFBs per frame/cabinet. These are located at the RPCN node if equipped, or at the first and last nodes of the RNF/C. Since the IFB is adjacent to one node within its own RNF/C and another in the next RNF/C in line, the NT lamp adjacent to the suspected IFB on the associated frame/cabinet and the NT lamp on the frame/cabinet next in line must both be illuminated before the IFB circuit pack can be extracted.
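The lamp rules above amount to a pre-removal check. The following Python sketch restates them for illustration; the function names are hypothetical, the state names OOS-NORMAL and OOS-ISOLATED are the maintenance states used later in this chapter, and treating a node with neither lamp lit as in service is an assumption made for the example.

```python
def node_state(rq_lit, nt_lit):
    """Infer a node's maintenance state from its RQ and NT lamps."""
    if nt_lit:
        return "OOS-ISOLATED"  # no token: node is isolated from the active ring
    if rq_lit:
        return "OOS-NORMAL"  # out of service but still part of the active ring
    return "IN SERVICE"  # assumption: neither lamp lit means the node is active


def cp_removal_allowed(nt_lit):
    """A CP may be pulled from a node only when that node's NT lamp is lit."""
    return bool(nt_lit)


def ifb_removal_allowed(nt_lit_this_frame, nt_lit_next_frame):
    """An IFB may be pulled only when the NT lamps of both adjacent nodes are lit."""
    return bool(nt_lit_this_frame) and bool(nt_lit_next_frame)


if __name__ == "__main__":
    print(node_state(rq_lit=True, nt_lit=False))
    print(cp_removal_allowed(nt_lit=False))
    print(ifb_removal_allowed(True, True))
```

A node showing only the RQ lamp is still on the active ring, so the check refuses removal until the node has also been isolated.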
Removing Affected Equipment From Service

When service has been interrupted because of faulty equipment, or when system maintenance requires replacing CPs, the node associated with the equipment must be removed from service. All affected nodes and equipment must first be removed from service before any equipment can be replaced.

Caution must be exercised if there is already an isolated segment on the ring, because removing a node in this case could create a larger isolated segment. Therefore, all isolated segments on the ring should be corrected before other maintenance functions are performed on the ring. For example, if there is an isolated segment on the ring and another trouble is detected 50 nodes away, the original isolation should be corrected before attempting to correct the new problem. This eliminates the possibility of expanding the isolated segment over the additional 50 nodes.

System software puts faulty equipment OOS in one of two ways: normal or isolated. When the system takes equipment OOS normally, it leaves the equipment in the OOS-NORMAL maintenance state. In this state, the equipment is still part of the active ring. However, when the system removes the equipment from service and isolates it from the active ring, it is in the OOS-ISOLATED maintenance state. In this state, the node is a functional part of the ring for maintenance purposes only.

Equipment Replacement Procedures

Before any ring node equipment involving CPs is replaced or handled, all precautions and illuminated LEDs must be observed. When performing diagnostics, faulty CPs are listed in the manual trouble locating process. Therefore, all precautions must be followed before replacing these CPs. Following is a summary of the sequence of events that must take place when replacing equipment. When a malfunction or faulty equipment is detected:

1. Press the alarm cutoff (ACO) button at the affected equipment, or the ALM-RLS key at the MCRT, to silence the audible alarm.
2. Before attempting to change, inspect, or handle any CP, ground yourself using the static control wrist strap (3M-2066).
3. At the faulty equipment location, determine which CP is faulty. On the RNFs or RNCs, nodes are grouped closely together. Individual CPs are distinguishable by a color-coded bar above and across each ring node unit. To ensure that the proper pack is removed, examine each color-coded bar before any pack is extracted. Using the identification numbers on the faulty CP (be sure to check microcode, version, and issue), obtain the proper replacement CP.
4. Make sure the wrist strap is grounded and remove the suspect CP.
5. Insert the replacement CP from the storage cabinet.
6. Wrap up the old CP and place it in a carton for return.
7. Perform diagnostics on any affected equipment, and if all goes well, restore it to service.
8. If diagnostics fail, the faulty CP may not have been removed.

At the replacement CP, ensure that the proper LEDs are illuminated for the type of CP replaced:
RI1 RI0 NP Link
The NT lamp on this CP is illuminated and both RQ lamps are illuminated on the NP and circuit packs. The RQ lamp on the adjacent pack and the NT lamp on the adjacent RI1 CP is illuminated. The RQ lamp on this CP is illuminated, the adjacent RI1 NT lamp is illuminated, and the adjacent RQ lamp is illuminated. The RQ lamp on this CP is illuminated, the RQ lamp on the adjacent NP pack is illuminated, and the RI1 NT lamp is illuminated. The MDL boards are not equipped with LEDs. The NT lamps adjacent to the IFB are illuminated. Both RQ lamps on the adjacent NP and the CPs are illuminated, along with the NT lamp on RI1. The RQ lamp on the adjacent IRN is illuminated, and the PCID and power converter for the RAP are turned off. The RQ and NT lamps on this CP are illuminated, and the RQ lamp on the adjacent circuit pack is illuminated.
IFB AP RAP IRN
The CP names and associated identification numbers are as follows:

   IRN/IRNB       UN303 or UN303B (VLSI only)
   IRN2/IRN2B     UN304 or UN304B
   IFB-U          TN918
   IFB-P          TN915
   IFB-4K         TN1506
   IFB-F          TN1508
   IFB-F          TN1509
   IFB-F          TN1803
   IFB-F          TN4016
   3BI            TN914
   DDSBS          TN69B
   LI             TN916 or TN1317
   LI4S           TN1316
   LI4D           TN1315
   T1FA           UN291
   LI4S           TN1316
   12A Applique   APA12
   AP:
      AP68   TN1340 (2 meg) or TN1641 (8 meg) for DLN
      AP30   TN1630 for DLNE or DLN30
      AP30   TN1630B with 64-Mbyte mezzanine memory for DLNE-AP30 or CDN-II
      AP30   TN1630B with 64- to 256-Mbyte mezzanine memory for CDN-IIx
   NPI            TN1349
   RAP 3B15 computer boards:
      CCC    UN237 (1) for 2-mbyte, UN626 for 16-mbyte
      CCS    UN236 (1) for 2-mbyte, UN625 for 16-mbyte
      MASC   UN95 (1-6) or UN507 (1) for 16-mbyte memory board option
      MASA   TN56 (1-48) or TN1398 (1-8) for 16-mbyte memory board option
      PCID   TN1128
As stated earlier, all faulty equipment must be OOS before maintenance is performed. If the equipment has not been automatically made OOS, it must be manually removed from service before any CPs are handled. Ring node CPs must be isolated before they can be removed. Caution is again stressed when isolating nodes in a ring that already contains isolated nodes. To avoid increasing the size of the original ring isolation, problems associated with the previous ring isolation should be corrected before isolating any other nodes. Otherwise, the new isolation may isolate too large a segment of the ring and leave too few active nodes for a sufficiently operational ring.
Procedure 7-9. Ring Hardware Circuit Pack Replacement Procedures

The following are guidelines for removing, inspecting, or handling CPs located in an IUN, RPCN, DLN, or CDN unit. These are the RI0, RI1, NP, 3BI, DDSBS, IRN, NPI, AP, CCS, CCC, MASC, MASA, PCID, LI4D, LI4S, APA12 and IFB circuit packs.
When replacing circuit packs in ring nodes, it is important that the proper node and associated nodes are removed and isolated. There are two power supplies for each shelf, each power supply feeding 1 1/2 ring nodes. Table 7-2 displays the additional nodes that must be isolated and removed when replacing a circuit pack in a node.

Table 7-2. Ring Node Power Supply Index

   REPLACE CIRCUIT PACK     REMOVE AND
   IN RING NODE:            ISOLATE NODES:
    1                        1, 2
    2                        1, 2, 3
    3                        2, 3
    4                        4, 5
    5                        4, 5, 6
    6                        5, 6
    7                        7, 8
    8                        7, 8, 9
    9                        8, 9
   10                       10, 11
   11                       10, 11, 12
   12                       11, 12
   13                       13, 14
   14                       13, 14, 15
   15                       14, 15
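The pattern in Table 7-2 can be sketched in C as follows. This is an illustration only: the function name nodes_to_isolate and its interface are hypothetical and are not part of the ECP software. It assumes node positions 1 through 15 grouped three to a unit, as the table shows.

```c
#include <stddef.h>

/* Hypothetical helper: given the position (1-15) of the node whose circuit
 * pack is being replaced, fill out[] with the node positions that Table 7-2
 * lists for removal and isolation. Returns the number of positions written
 * (2 or 3), or 0 if the position is out of range. */
size_t nodes_to_isolate(int position, int out[3])
{
    if (position < 1 || position > 15)
        return 0;

    int first  = ((position - 1) / 3) * 3 + 1;  /* first node of the 3-node unit */
    int middle = first + 1;
    int last   = first + 2;

    if (position == middle) {                   /* middle node: whole unit */
        out[0] = first;
        out[1] = middle;
        out[2] = last;
        return 3;
    }
    if (position == first) {                    /* end node: itself and the middle */
        out[0] = first;
        out[1] = middle;
    } else {
        out[0] = middle;
        out[1] = last;
    }
    return 2;
}
```

For example, a call with position 8 yields nodes 7, 8, and 9, matching the table row for node 8. The table remains the authority if the two ever disagree.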
Assumption: Diagnostics have determined that there are faulty CPs in a node(s) on the ring.

1. At the MCRT, press the ALM-RLS key if necessary to silence alarms.
   NOTE: An audible alarm may also be silenced by pressing the ACO key at the affected RNF/C.
2. If the node with the faulty CP and its associated nodes have not been removed from service, remove them. Refer to Table 7-2 to determine which nodes to remove and isolate. At the MCRT, enter:
      RMV:nodexx y
   where:
      node = RPCN or LN
      xx   = Ring node group number (00-63)
      y    = Node position in the ring node group (0 for RPCN, 1-15 for LN)
3. At the MCRT, isolate the associated node from the active ring. Enter:
      CFR:RING a, b;EXCLUDE
   where:
      a       = Ring node. If b is present, a is the first of a range of RNs (in the direction of flow on Ring 0). In the form {RPCNx y | x y}.
      b       = Last node in the range begun by a, in the same form.
      EXCLUDE = Request to isolate the specified node(s) from the active ring.
4. At the faulty equipment location, obtain CP identification for the faulty pack. Get the proper replacement CP (use caution handling the new pack).
5. Ensure that the appropriate node is OOS, that the proper LEDs are illuminated, and that you are properly grounded to avoid static discharge.
6. Replace the faulty/suspected CP.
   NOTE: Ensure that the adjacent NP and (LI4 and APA12) CP RQ lamps are illuminated before removing either of these affected CPs.
   NOTE: Ensure that the adjacent RI1 NT and the adjacent RQ lamps are both illuminated before removing either of these CPs.
   NOTE: Since most CPs require considerable force to insert or remove, extreme caution must be exercised. Carefully inspect the CP edge connector and the backplane connector for bent or missing pins.
7. Place the old (or faulty) CP in the protective static wrapping, and return it to the storage cabinet for later repair.
8. At the affected RN control panel, press the PWR ALM RESET button to restore the frame/cabinet to normal operation.
9. Diagnose the node by entering the following at the MCRT:
      DGN:nodexx y
   where:
      DGN  = Requests the run of all diagnostics phases
      node = LN or RPCN
      xx   = Ring node group number
      y    = Node position in the ring node group (member number)
   NOTE: Before unconditionally restoring the node to the ring, it is strongly recommended that at least Phase 1 and Phase 2 diagnostics are run on the node. The above procedure will execute full diagnostics.
10. After diagnostics returns an ATP message, restore the node(s) removed from service by entering the following at the MCRT:
      RST:nodexx y; UCL
   where:
      node = LN or RPCN
      xx   = Ring node group number
      y    = Node position in the ring node group (member number)

For further reference see Chapter 6, Diagnostic Users Guide.
UN122C and UN123B Combination Circuit Pack Installation

The UN122C and UN123B CPs are used for the token tracking feature. Each frame must contain at least one UN122C and UN123B in a node to allow for token tracking capability.

1. Determine which CPs are to be used for token tracking.
2. The selected node and all nodes sharing the same FA495 converter must be isolated from the active ring. If the UN122C and UN123B candidate node is at the end of the unit, the middle node must also be removed from service. If the node is in the middle of the unit, all three nodes on the unit must be removed.
3. To be sure the minor link state of the token tracking node is in the MOOS state, enter:
      CHG:SLK=a-b:MOOS
4. To request full diagnostics on the token tracking node, enter:
      DGN:LNa=b
5. Resolve all troubles if the diagnostics fail.
6. To be sure the minor state of the neighbor node(s) is in the MOOS state, enter:
      CHG:SLK=a-b:MOOS
7. To remove appropriate neighbor nodes from ring service, enter:
      RMV:LNa=b
8. Isolate the token tracking node and the neighbor nodes from the active ring. Enter this command for each of the nodes:
      CFR:RING,LNa=b:EXCLUDE
9. Replace the existing CPs with the new UN122C and UN123B CPs. Be sure to use a wrist strap to protect from electrostatic discharge.
10. Update the in-core ECD for the token tracking node. First, change the UCB major state from OOS to GROW. Update the hv values. Now change the major state from GROW to OOS. See Table 7-3 for the appropriate hv values.
11. To request full diagnostics on the token tracking node, enter:
      DGN:LNa=b
12. Wait for the diagnostics on the token tracking node to return all tests passed (ATP). From the maintenance terminal, go to the 199 page and execute the activate RC/V form to copy the in-core copy of the ECD to disk.
13. To restore the neighbor nodes, enter:
      RST:LNa=b
14. If the token tracking node is an IUN node, run diagnostic phases 12 and 13 on the token tracking node. If the token tracking node is an RPC node, run diagnostic phases 32 and 33. After these diagnostics run ATP, enter the following to restore the token node:
      RST:LNa=b
Table 7-3. Hardware Version Values (with IFB)

                         POSITION   HV VALUE FOR IFB TYPE
NP, RI0, & RI1 CPs*      IN RNF/C   TN918    TN915    TN1506   TN1508   TN1509/TN1803
TN913 UN122 UN123        Lowest     0x0001   0x0002   0x0004   0x0005   0x0006
                         Highest    0x0010   0x0020   0x0040   0x0050   0x0060
TN913 UN122B UN123B      Lowest     0x0801   0x0802   0x0804   0x0805   0x0806
                         Highest    0x0810   0x0820   0x0840   0x0850   0x0860
TN913 UN122B UN123B      Lowest     0x1001   0x1002   0x1004   0x1005   0x1006
                         Highest    0x1010   0x1020   0x1040   0x1050   0x1060
TN913 UN122C UN123B      Lowest     0x1801   0x1802   0x1804   0x1805   0x1806
                         Highest    0x1810   0x1820   0x1840   0x1850   0x1860
TN913 UN122C UN123B      Lowest     0x2001   0x2002   0x2004   0x2005   0x2006
                         Highest    0x2010   0x2020   0x2040   0x2050   0x2060
TN922 UN122 UN123        Lowest     0x0101   0x0102   0x0104   0x0105   0x0106
                         Highest    0x0110   0x0120   0x0140   0x0150   0x0160
TN922 UN122B UN123B      Lowest     0x0901   0x0902   0x0904   0x0905   0x0906
                         Highest    0x0910   0x0920   0x0940   0x0950   0x0960
TN922 UN122B UN123B      Lowest     0x1101   0x1102   0x1104   0x1105   0x1106
                         Highest    0x1110   0x1120   0x1140   0x1150   0x1160
TN922 UN122C UN123B      Lowest     0x1901   0x1902   0x1904   0x1905   0x1906
                         Highest    0x1910   0x1920   0x1940   0x1950   0x1960
TN922 UN122C UN123B      Lowest     0x2101   0x2102   0x2104   0x2105   0x2106
                         Highest    0x2110   0x2120   0x2140   0x2150   0x2160
UN303 (IRN)              Lowest     0x8001   0x8002   0x8004   0x8005   0x8006
                         Highest    0x8010   0x8020   0x8040   0x8050   0x8060
UN303 (IRN)              Lowest     0x8801   0x8802   0x8804   0x8805   0x8806
                         Highest    0x8810   0x8820   0x8840   0x8850   0x8860
UN303B (IRNB)            Lowest     0x9001   0x9002   0x9004   0x9005   0x9006
                         Highest    0x9010   0x9020   0x9040   0x9050   0x9060
UN303B (IRNB)            Lowest     0x9801   0x9802   0x9804   0x9805   0x9806
                         Highest    0x9810   0x9820   0x9840   0x9850   0x9860
UN304B (IRNB)            Lowest     0xc001   0xc002   0xc004   0xc005   0xc006
                         Highest    0xc010   0xc020   0xc040   0xc050   0xc060
UN304B (IRNB)            Lowest     0xc801   0xc802   0xc804   0xc805   0xc806
                         Highest    0xc810   0xc820   0xc840   0xc850   0xc860

* The RI CPs may be equipped with the Long Message Strap (LMS). This option is indicated in these tables with the symbol next to the CP number. Otherwise, the RI is not equipped with the LMS option.
Table 7-4. Hardware Version Values (No IFB)

NP, RI0, & RI1 CPs*        HV VALUE
TN913, UN122, UN123        0x0000
TN913, UN122B, UN123B      0x0800
TN913, UN122B, UN123B      0x1000
TN913, UN122C, UN123B      0x1800
TN913, UN122C, UN123B      0x2000
TN922, UN122, UN123        0x0100
TN922, UN122B, UN123B      0x0900
TN922, UN122B, UN123B      0x1100
TN922, UN122C, UN123B      0x1900
TN922, UN122C, UN123B      0x2100
UN303 (IRN)                0x8000
UN303 (IRN)                0x8800
UN303B (IRNB)              0x9000
UN303B (IRNB)              0x9800
UN304B (IRNB)              0xc000
UN304B (IRNB)              0xc800
* The RI CPs may be equipped with the Long Message Strap (LMS). This option is indicated in these tables with the symbol next to the CP number. Otherwise, the RI is not equipped with the LMS option.
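Comparing Tables 7-3 and 7-4 suggests that each hv value in Table 7-3 is the no-IFB value from Table 7-4 with an IFB digit placed in the lowest hex digit (Lowest position) or in the next digit up (Highest position). The C sketch below reproduces table entries under that reading. The pattern is inferred from the tables and is not stated in this document, and the names ifb_code and hv_with_ifb are hypothetical. The tables remain the authority.

```c
#include <stdint.h>
#include <string.h>

/* IFB digit as it appears in Table 7-3 (inferred from the table values).
 * Returns 0 for a circuit pack code that the table does not list. */
unsigned ifb_code(const char *tn)
{
    if (strcmp(tn, "TN918") == 0)  return 0x1;
    if (strcmp(tn, "TN915") == 0)  return 0x2;
    if (strcmp(tn, "TN1506") == 0) return 0x4;
    if (strcmp(tn, "TN1508") == 0) return 0x5;
    if (strcmp(tn, "TN1509") == 0 || strcmp(tn, "TN1803") == 0) return 0x6;
    return 0;
}

/* base_hv: the Table 7-4 value for the NP/RI combination.
 * highest: nonzero for the "Highest" position in the RNF/C, zero for "Lowest". */
uint16_t hv_with_ifb(uint16_t base_hv, const char *ifb_tn, int highest)
{
    unsigned code = ifb_code(ifb_tn);
    return (uint16_t)(base_hv | (highest ? code << 4 : code));
}
```

For instance, the Table 7-4 value 0x2100 combined with a TN1508 in the Highest position gives 0x2150, which agrees with the corresponding Table 7-3 entry.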
Example: RI types UN122/UN123B

Remove the letter suffix (B) from the UN122/UN123B RI board code. Then look for the UN122/UN123 RI type, your node processor (NP) type, and the interframe buffer (IFB) type required, to locate the hardware version value.

The IFB unit name indicates the buffer capacity and the ring speed. In cases where it is necessary to identify a specific IFB, the following terminology and convention should be used:

Example: IFB-4K/6

This is an IFB with 4K bytes of buffer running at the ring speed of 6 MHz.

The following information is a summary of current IFBs:

EXISTING CONVENTION          CODE      NEW CONVENTION
IFB                          TN918     IFB (IFB-16)
PIFB, padded IFB (IFB-P)     TN915     IFB-P (IFB-512)
                             TN1506    IFB-4k/6
                             TN1508    IFB-16/8
                             TN1509    IFB-4k/8
                             TN1803    IFB-4k/8
The plain term IFB should be used whenever it is not necessary to refer to a particular vintage of this circuit.
Voice Frequency Link Hardware Equipment Replacement Procedures

The voice frequency link (VFL) is composed of a VFL access CP (TN919) and a 2024A or a 2048A data set (the latter is used for 4.8 Kbps applications). The following are guidelines and precautions for replacing a VFL access CP or a data set.
Procedure 7-10. Voice Frequency Link Access Circuit Pack Replacement Procedures
1. At the affected equipment location or the MCRT, silence any audible alarm by pressing the ACO key or the ALM-RLS key.
2. Before attempting to change, inspect, or handle any CP, ground yourself using the static control wrist strap (3M-2066).
3. Obtain the replacement VFL access CP.
   CAUTION: Keep the CP in the protective wrapping until it is ready to be inserted in the frame/cabinet.
4. At the MCRT, put the SLK in the UNAV-TEST state. Use the Change Analog SLK VFL Access Circuit Board Procedures in the section referred to above.
   NOTE: If the SLK is already in the AVL-OOS state, it can be moved directly to the UNAV-TEST state without first being moved to the AVL-MOOS state.
5. At the affected equipment location, remove the suspect VFL access CP and insert the new CP.
6. Wrap the suspect CP, and place it in a carton to be returned for repair.
7. Restore the SLK to service. Use the Change Analog SLK VFL Access Circuit Board Procedures in the section referred to above.
Procedure 7-11. Data Set Replacement Procedures

1. At the affected equipment location or the MCRT, silence any audible alarm by pressing the ACO key or the ALM-RLS key.
2. Before attempting to change, inspect, or handle any CP, ground yourself using the static control wrist strap (3M-2066).
3. Obtain the replacement data set.
   CAUTION: Keep the data set in the protective wrapping until it is ready to be inserted in the frame/cabinet.
4. At the MCRT, put the SLK in the UNAV-TEST state. Use the Change Analog SLK Data Speed Procedures in the section referred to above.
   NOTE: If the SLK is already in the AVL-OOS state, it can be moved directly to the UNAV-TEST state without first being moved to the AVL-MOOS state.
5. At the back of the data set unit, remove the appropriate data set cables and the suspect data set.
6. On the data set unit, verify that the rise time option switches are set correctly:
      In the open position, the rise time is set for fast.
      In the closed position (toward numbers), the rise time is set for slow.
7. Insert the new data set and connect the data set cables.
8. Wrap the suspect data set, and place it in a carton to be returned for repair.
9. Set the data set options and restore the SLK to service. Use the Change Analog SLK Data Speed Procedures in the section referred to above.
Introduction
This appendix provides information about the ring node portion of the ring error analysis and recovery mechanisms. The error handling for ring errors is split between the node and the 3B21D. When an error is detected by a node, that node performs some recovery action and then reports the error by sending a message to the 3B21D. The 3B21D then takes some corrective action and notifies the craft via a message printed on the ROP. This document describes all errors reported to the 3B21D by the node. Each entry includes a description of the error, the recovery action taken by the node, and the state of the node after the recovery is complete.
Data Structures
The following structures define the error message the node sends to the 3B21D. Throughout this document, this message will be referred to as the error message when discussing data that will be sent from the node to the 3B21D. Normally when an error occurs, the node will send error messages on both rings to the 3B21D. This ensures that a message will reach the 3B21D. In some cases this is not possible, and the exception is noted in the error description. This is the 3B21D view of the error message layout. See header file ims/com/head/ims_emsgs.h for the NP view.

struct immemsg {
    struct immsg_hd immh;                /* IMS mtce. message header */
    NODE_PADD       node;                /* phys. addr. of ring node */
    unsigned char   imm_etype;           /* IMS error message type */
    unsigned char   erring;              /* faulty ring */
    union vardata {
        struct {
            union {
                struct header dhead;     /* header from failing msg */
                struct {
                    short tokblk;        /* Blockage occurred on the token */
                    short sint;          /* False interrupt indicator */
                    short spare2;
                    short spare3;
                } misc;
            } un;
            struct _riracstat ports;     /* the rac error ports */
            struct _riracstat opports;   /* opposite rac ports */
        } specific;
        unsigned char  dchar[24];        /* general information */
        unsigned short dshrt[12];
        long           dlong[6];
    } data;
};
General Information
In the following descriptions, the terms upstream node and downstream node will be used. These terms describe relative positions of nodes and are based on the direction of data flow on the rings. Basically, any particular node will RECEIVE data from its upstream neighbor and will SEND data to its downstream neighbor. Since the data flows in opposite directions on the two rings, a node's upstream neighbor on ring 1 is the downstream neighbor on ring 0 and its upstream neighbor on ring 0 is the downstream neighbor on ring 1.
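The relationship between the two rings can be sketched in C as follows. The model is hypothetical: it numbers n nodes 0 through n-1 in the direction of data flow on ring 0, which is an indexing chosen for illustration and not how the ring software identifies nodes.

```c
/* Hypothetical model: n nodes indexed 0..n-1 in the direction of data flow
 * on ring 0. Ring 1 carries data in the opposite direction. */

/* The node that receives data from `node` on the given ring. */
int downstream_neighbor(int node, int ring, int n)
{
    return ring == 0 ? (node + 1) % n : (node + n - 1) % n;
}

/* The node that sends data to `node` on the given ring. */
int upstream_neighbor(int node, int ring, int n)
{
    return ring == 0 ? (node + n - 1) % n : (node + 1) % n;
}
```

With this model, upstream_neighbor(k, 1, n) always equals downstream_neighbor(k, 0, n), which is the property the paragraph above describes.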
The following pages contain several headings. The error code is the defined symbol for the particular error and is placed in the immemsg.imm_etype field in the error message. The faulty ring is indicated in the erring field in the error message. The description is a detailed description of the error and the node recovery action is a description of the node recovery process. The variable data is a description of the variable data in the error message. This data is intended to be used by the 3B21D when analyzing the error and will differ depending on the error type. There may be other data in the error message that is provided to be printed at the ROP. The ROP data is a description of the data that is printed on the ROP. This data is taken from the error message.

The error message will be in the following general form:

   REPT RING TRANSPORT ERR

See the output manual page for the complete description of the ROP output message. When this message is printed, various data fields will be included in the printout, and it is assumed that data taken from the error message from the node will be printed in the following order:

   0xAAAAAAAA 0xBBBBBBBB 0xCCCCCCCC 0xDDDDDDDD 0xEEEEEEEE 0xFFFFFFFF(TTTTTTTTTT)

   AAAAAAAA   - immemsg.data.dlong[0]
   BBBBBBBB   - immemsg.data.dlong[1]
   CCCCCCCC   - immemsg.data.dlong[2]
   DDDDDDDD   - immemsg.data.dlong[3]
   EEEEEEEE   - immemsg.data.dlong[4]
   FFFFFFFF   - immemsg.data.dlong[5]
   TTTTTTTTTT - The value of the real time clock.
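As a rough illustration of that print order, the C sketch below renders six data words followed by the clock value. It is not the 3B21D output routine. The function name format_rop_data is hypothetical, the data words are assumed to be 32 bits wide, and printing the clock as a zero-padded 10-digit decimal number is a guess based only on the ten T characters shown above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write dlong[0]..dlong[5] as 8-digit hex words and the
 * real time clock value in parentheses. Returns the snprintf() result, that
 * is, the number of characters the full line needs. */
int format_rop_data(char *buf, size_t len, const uint32_t dlong[6],
                    unsigned long clock)
{
    return snprintf(buf, len,
                    "0x%08lx 0x%08lx 0x%08lx 0x%08lx 0x%08lx 0x%08lx(%010lu)",
                    (unsigned long)dlong[0], (unsigned long)dlong[1],
                    (unsigned long)dlong[2], (unsigned long)dlong[3],
                    (unsigned long)dlong[4], (unsigned long)dlong[5],
                    clock);
}
```

A buffer of 80 bytes is enough for the line as long as the clock value fits in ten decimal digits.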
Blockage Error
Error Code
_RG_BLKG, _RG_RDBLK
Description
The blockage timer has timed out waiting for transfer of data. The following table contains the error flags that are used to determine this error. At the present time, only _RG_BLKG is reported to the 3B21D, regardless of the type of node. The _RG_RDBLK code is provided for future use with the IRN.

RAC ERROR FLAGS

IRN                    IRN ERROR
PRPBLK (_RG_BLKG)      Propagate Blockage
RDBLK (_RG_RDBLK)      Read Blockage

The IRN nodes report blockage in two situations: the downstream node does not take the data, or the read FIFO does not take the data. The first is called propagate blockage and the latter is called read blockage. Propagate blockage means the downstream node is the cause of the fault, whereas read blockage indicates that the reporting node is at fault.
Node Recovery Action

IRN: The node will be put in force read to clear the ring. This action will remove the token from the ring. After error recovery, the node will be in total silence.

When the blockage is detected, the error message cannot be sent on the faulty ring, so an error message must be sent on the opposite ring. Consider a case of blockage on a ring that has an isolated segment. If the error message is sent on the opposite ring to the home RPC, it may go through the EISO or BISO node and return to the faulty RAC before it reaches the home RPC. If an error message is sent to each RPC, it has a better chance of arriving at an RPC before it reaches the EISO or BISO node and is looped back. Therefore, the node will try to send error messages to each RPC on the opposite ring.

If a blockage is detected by an EISO or BISO node, the node cannot send error messages because of the blockage, but it will still perform the recovery action described above with one additional step: inhibit input will be set. The total effect is that the blockage is not reported and the ring has no token. The first indication of trouble in the 3B21D is that it will receive an unexpected loss of token error message. See the _RG_NOTOKEN error description.

If the blockage is a read blockage, the hardware will destroy the message and switch the RAC to the force propagate mode. The token will remain on the ring and the ring will continue to operate normally. The read blockage is reported to the 3B21D with the _RG_RINH error code. This code is used to indicate that the blockage was the fault of the reporting node and not the downstream node. The error is reported by sending error messages to each RPC on the opposite ring.

If a blockage occurs on a broadcast message, the error flags will indicate both propagate and read blockages. This case will be handled as a propagate blockage.
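The way the two blockage flags are interpreted can be sketched in C as follows. The bit values assigned to PRPBLK and RDBLK here are hypothetical, since this document does not give the RAC flag layout, and classify_blockage is an illustrative name, not a function from the node software.

```c
/* Hypothetical bit assignments; the real RAC error flag layout is not given
 * in this document. */
#define PRPBLK 0x01u   /* propagate blockage flag */
#define RDBLK  0x02u   /* read blockage flag */

enum blockage_kind { BLK_NONE, BLK_PROPAGATE, BLK_READ };

/* Classify a blockage from the RAC error flags. When both flags are set
 * (the broadcast message case), the blockage is treated as propagate. */
enum blockage_kind classify_blockage(unsigned flags)
{
    if (flags & PRPBLK)
        return BLK_PROPAGATE;   /* downstream node is the cause */
    if (flags & RDBLK)
        return BLK_READ;        /* reporting node is at fault */
    return BLK_NONE;
}
```

Under the text above, BLK_PROPAGATE corresponds to a report with _RG_BLKG and BLK_READ to a report with _RG_RINH.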
Variable Data
immemsg.data.specific.ports - RAC status ports from the faulty ring.
immemsg.data.specific.opports - RAC status ports from the opposite ring. See Notes.
immemsg.data.specific.un.misc.tokblk - Block on token code, which indicates whether the token was being held by the node when the blockage timeout occurred. Nonzero values indicate that the node found evidence that it was holding the token. See file ims_emsgs.h for details.
ROP Data
BLOCKAGE DETECTED (LN/RPCN)XX YY RAC (0/1)
0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

   aabb - Block on token code (see description above).
   ccdd - not used.
   ee   - The node's home RPC overflow state (IRN only).
   ff   - The node's overload state (IRN only).
   gg   - The node's overflow state (IRN only).
   hh   - The node's silence state (IRN only).
   jj   - node type, 3 = IRN.
   kk   - port C, faulty ring.
   ll   - port B, faulty ring.
   mm   - port A, faulty ring.
   nnpp - not used.
   qq   - port E, faulty ring (IRN only).
   rr   - port D, faulty ring (IRN only).
   ss   - not used.
   tt   - port C, opposite ring. See Notes.
   uu   - port B, opposite ring. See Notes.
   vv   - port A, opposite ring. See Notes.
   wwxx - not used.
   yy   - port E, opposite ring (IRN only). See Notes.
   zz   - port D, opposite ring (IRN only). See Notes.
NOTE: This status port information from the RAC is used to transmit the error report. For this particular error type, the status is always from the RAC opposite to that on which the error occurred. If the error report was sent by an RPC node, this status information is meaningless.
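A craft tool that reads the six printed words of a BLOCKAGE DETECTED report could unpack them along the lines of the C sketch below. The structure and function names are hypothetical. The sketch assumes each printed word is a 32-bit value whose leftmost printed byte is the most significant, and it follows the byte assignments of the blockage report only; other error types use a different layout for the first two words.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a blockage ROP report. */
struct blockage_rop {
    uint16_t tokblk;             /* aabb: block on token code */
    uint8_t  home_rpc_overflow;  /* ee */
    uint8_t  overload;           /* ff */
    uint8_t  overflow;           /* gg */
    uint8_t  silence;            /* hh */
    uint8_t  node_type;          /* jj: 3 = IRN */
    uint8_t  faulty[5];          /* ports A, B, C, D, E on the faulty ring */
    uint8_t  opposite[5];        /* ports A, B, C, D, E on the opposite ring */
};

/* Byte i of a word, counting from the most significant byte (i = 0). */
static uint8_t byte_of(uint32_t w, int i)
{
    return (uint8_t)(w >> (24 - 8 * i));
}

void decode_blockage_rop(const uint32_t w[6], struct blockage_rop *r)
{
    r->tokblk            = (uint16_t)(w[0] >> 16);  /* aabb; ccdd not used */
    r->home_rpc_overflow = byte_of(w[1], 0);
    r->overload          = byte_of(w[1], 1);
    r->overflow          = byte_of(w[1], 2);
    r->silence           = byte_of(w[1], 3);
    r->node_type         = byte_of(w[2], 0);
    r->faulty[2]         = byte_of(w[2], 1);        /* kk: port C */
    r->faulty[1]         = byte_of(w[2], 2);        /* ll: port B */
    r->faulty[0]         = byte_of(w[2], 3);        /* mm: port A */
    r->faulty[4]         = byte_of(w[3], 2);        /* qq: port E */
    r->faulty[3]         = byte_of(w[3], 3);        /* rr: port D */
    r->opposite[2]       = byte_of(w[4], 1);        /* tt: port C */
    r->opposite[1]       = byte_of(w[4], 2);        /* uu: port B */
    r->opposite[0]       = byte_of(w[4], 3);        /* vv: port A */
    r->opposite[4]       = byte_of(w[5], 2);        /* yy: port E */
    r->opposite[3]       = byte_of(w[5], 3);        /* zz: port D */
}
```

As the NOTE above says, the opposite-ring port values carry no meaning when the report was sent by an RPC node, so a caller should check the reporting node before using them.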
Hard Ring Parity Errors

Error Code
_RG_HPTY
Description
This error indicates a byte with bad parity has been presented to the input of the RAC. A hard parity error is a parity error that cannot be cleared by the node.

RAC ERROR FLAGS

IRN        ERROR
PTYERR     Parity error
The faulty byte will not be accepted by the node and the upstream node will eventually detect blockage.
Node Recovery Action

Since it has been determined that this is a hard error, inhibit input is set to prevent the faulty byte from producing recurring error interrupts, and then the RAC error latches are cleared. Because the reporting node will not accept data from the upstream node, that node will report a blockage condition. Error messages are sent on both rings to the home RPC.
Variable Data
msg->specific.ports - RAC status ports from the faulty ring.
msg->specific.opports - RAC status ports from the ring that was used to write the error message to the 3B21D. This information was taken just before the error message was written.
ROP Data
RAC PARITY/FORMAT ERROR DETECTED (LN/RPCN)XX YY RAC (0/1)
0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

   aa       - The node's home RPC overflow state (IRN only).
   bb       - The node's overload state (IRN only).
   cc       - The node's overflow state (IRN only).
   dd       - The node's silence state (IRN only).
   eeffgghh - not used.
   jj       - node type, 3 = IRN.
   kk       - port C, faulty ring.
   ll       - port B, faulty ring.
   mm       - port A, faulty ring.
   nnpp     - not used.
   qq       - port E, faulty ring (IRN only).
   rr       - port D, faulty ring (IRN only).
   ss       - not used.
   tt       - port C, opposite ring. See Notes.
   uu       - port B, opposite ring. See Notes.
   vv       - port A, opposite ring. See Notes.
   wwxx     - not used.
   yy       - port E, opposite ring (IRN only). See Notes.
   zz       - port D, opposite ring (IRN only). See Notes.
NOTE: This status port information from the RAC is used to transmit the error report. In most cases, this is the RAC opposite to that on which the error occurred. If the error report was sent by an RPC node, this status information is meaningless.
Orphan Byte Error

Error Code
_RG_ORBYTE
Description
An orphan byte has been presented to the input of the RAC. An orphan byte condition occurs when the RAC is expecting a C byte but the byte received is not a C byte. At the present time, the orphan byte is reported to the 3B21D using the _RG_HPTY error code. The _RG_ORBYTE code is provided for future IRN application.
RAC ERROR FLAGS

IRN        IRN ERROR
ORBYTE     Orphan byte
In the case of the orphan byte, 2 bytes are accepted into the input FIFO of the IRN. The bytes are not read into memory and will be held until the error condition is cleared.
Node Recovery Action

IRN: The error interrupt is disabled to prevent recurring interrupts. The error latches are cleared and the 3B21D is notified of the message.
Two bytes may have been accepted by the input FIFO. A processor RAC reset must be issued to clear the orphan byte(s) from the input FIFO. The input is inhibited to prevent the input FIFO from accepting more bytes. Because the reporting node will not accept data from the upstream node, that node will report a blockage condition. The orphan byte error is reported by sending error messages to each RPC only on the opposite ring.
Variable Data
ROP Data
RAC PARITY/FORMAT ERROR DETECTED (LN/RPCN)XX YY RAC (0/1)
0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

   aa       - The node's home RPC overflow state (IRN only).
   bb       - The node's overload state (IRN only).
   cc       - The node's overflow state (IRN only).
   dd       - The node's silence state (IRN only).
   eeffgghh - not used.
   jj       - node type, 3 = IRN.
   kk       - port C, faulty ring.
   ll       - port B, faulty ring.
   mm       - port A, faulty ring.
   nnpp     - not used.
   qq       - port E, faulty ring (IRN only).
   rr       - port D, faulty ring (IRN only).
   ss       - not used.
   tt       - port C, opposite ring. See Notes.
   uu       - port B, opposite ring. See Notes.
   vv       - port A, opposite ring. See Notes.
   wwxx     - not used.
   yy       - port E, opposite ring (IRN only). See Notes.
   zz       - port D, opposite ring (IRN only). See Notes.
Soft Ring Parity Error

Error Code
_RG_SPTY
Description
This error indicates a ring parity error occurred but was subsequently cleared by the recovery routine.

RAC ERROR FLAGS

IRN        IRN ERROR
PTYERR     Parity Error
Because of the difference in the recovery action, orphan byte errors will not be included in this error class. All orphan byte errors will be hard errors.
Node Recovery Action
The node was able to clear the parity error latch so the parity error is considered to be a transient error. Error messages are sent to the home RPC via both rings and the node will be in its normal operating condition. The RAC port information in the error message will show the RAC status of the faulty ring before the error was cleared.
Variable Data
ROP Data
TRANSIENT RAC ERROR DETECTED (LN/RPCN)XX YY RAC (0/1)
0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

   aa       - The node's home RPC overflow state (IRN only).
   bb       - The node's overload state (IRN only).
   cc       - The node's overflow state (IRN only).
   dd       - The node's silence state (IRN only).
   eeffgghh - not used.
   jj       - node type, 3 = IRN.
   kk       - port C, faulty ring.
   ll       - port B, faulty ring.
   mm       - port A, faulty ring.
   nnpp     - not used.
   qq       - port E, faulty ring (IRN only).
   rr       - port D, faulty ring (IRN only).
   ss       - not used.
   tt       - port C, opposite ring. See Notes.
   uu       - port B, opposite ring. See Notes.
   vv       - port A, opposite ring. See Notes.
   wwxx     - not used.
   yy       - port E, opposite ring (IRN only). See Notes.
   zz       - port D, opposite ring (IRN only). See Notes.
Interframe Buffer Parity Error

Error Code
_RG_IFBP
Description
The upstream interframe buffer has detected a parity error.

RAC ERROR FLAGS

IRN        ERROR
IFBPF      IFB parity error

Node Recovery Action

Inhibit input will be set on the faulty ring and an error message will be sent on both rings to the home RPC. The inhibit input is effective at the input of the interframe buffer. This will cause the node upstream of the interframe buffer to report a blockage.
Variable Data
msg->specific.ports - RAC status ports from the faulty ring.
msg->specific.opports - RAC status ports from the ring that was used to write the error message to the 3B21D. This information was taken just before the error message was written.
ROP Data
INTERFRAME BUFFER PARITY ERROR DETECTED (LN/RPCN)XX YY RAC (0/1)
0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

   aa       - The node's home RPC overflow state (IRN only).
   bb       - The node's overload state (IRN only).
   cc       - The node's overflow state (IRN only).
   dd       - The node's silence state (IRN only).
   eeffgghh - not used.
   jj       - node type, 3 = IRN.
   kk       - port C, faulty ring.
   ll       - port B, faulty ring.
   mm       - port A, faulty ring.
   nnpp     - not used.
   qq       - port E, faulty ring (IRN only).
   rr       - port D, faulty ring (IRN only).
   ss       - not used.
   tt       - port C, opposite ring. See Notes.
   uu       - port B, opposite ring. See Notes.
   vv       - port A, opposite ring. See Notes.
   wwxx     - not used.
   yy       - port E, opposite ring (IRN only). See Notes.
   zz       - port D, opposite ring (IRN only). See Notes.
RAC Output Parity Error

Error Code
_RG_ROPF
Description
An explanation of the RAC hardware is needed to understand this error code, which is another form of blockage. When a node detects blockage while propagating a message, the hardware will be set to force read the remainder of the message that was being propagated and will then stop the ring. If the blockage occurred while the node was writing data to the ring, the write is stopped and the contents of the RAC FIFO are read into memory. As part of the recovery procedure, the data that was read into memory is checked for valid parity. Bad parity would explain the blockage because the downstream node will not accept data with bad parity. To get this error, the RAC must have received good data either from the upstream node or the node processor, but it tried to transmit bad parity to the downstream node. This implies the RAC hardware is faulty. If a node reports this error, the downstream node should have reported a hard parity error. If this error occurs during a write, a partial message may have been written to the ring and this will cause one or more downstream nodes to report a read format error.
RAC ERROR FLAGS

IRN        ERROR
PRPBLK     Propagate Blockage

IRN: This error code will not be reported from an IRN if the blockage is a read blockage. In that case, no data will be read into the NP memory.

Node Recovery Action

The node recovery action will be the same as in the blockage error (_RG_BLKG).
Variable Data
immemsg.data.specific.ports - RAC status ports from the faulty ring.
immemsg.data.specific.opports - RAC status ports from the opposite ring. See Notes.
ROP Data
RAC OUTPUT PARITY ERROR DETECTED (LN/RPCN)XX YY RAC (0/1)
0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

   aabb - not used.
   ccdd - not used.
   ee   - The node's home RPC overflow state (IRN only).
   ff   - The node's overload state (IRN only).
   gg   - The node's overflow state (IRN only).
   hh   - The node's silence state (IRN only).
   jj   - node type, 3 = IRN.
   kk   - port C, faulty ring.
   ll   - port B, faulty ring.
   mm   - port A, faulty ring.
   nnpp - not used.
   qq   - port E, faulty ring (IRN only).
   rr   - port D, faulty ring (IRN only).
   ss   - not used.
   tt   - port C, opposite ring. See Notes.
   uu   - port B, opposite ring. See Notes.
   vv   - port A, opposite ring. See Notes.
   wwxx - not used.
   yy   - port E, opposite ring (IRN only). See Notes.
   zz   - port D, opposite ring (IRN only). See Notes.
Write Format Error

Error Code
_RG_WFMT, _RG_WRSMM, _RG_WRTOSHRT, _RG_WRLEN
Description
These error codes indicate some error occurred while a node was attempting to write a message to the ring. At the present time, all write errors are reported with the _RG_WFMT error code, regardless of the type of node reporting the error. The other error codes are provided for future use with the IRN.
RAC ERROR FLAGS

IRN                      ERROR
WRSMERR (_RG_WRSMM)      Write source match
W2SHRT (_RG_WRTOSHRT)    Write too short
WRLEN (_RG_WRLEN)        Write length error
This error code may indicate one of the following:
a. Write source match error. The node tried to write a message to the ring, but the source address did not match the node's address or the source ring in the message did not match the ring being used.
b. Write too short. A C byte was presented to the header FIFO before the FIFO had received enough of the header to determine the disposition of the message.
c. Write length error. When a write is performed, a counter is loaded with the length value from the message. If the write FIFO becomes empty and the write DMA channel asserts the end of DMA signal (EOD) before the counter reaches zero, a write length error is indicated. This error means the RAC saw at least the first 6 bytes of the message and was able to determine the disposition of the message. If this error occurs, a partial message was sent on the ring and downstream node(s) may report read format errors (_RG_RFMT).

The write in progress is removed from the write queue and a _RETRY code is returned to the writer. Inhibit input is set and an error message is sent to the home RPC on both rings.
Variable Data
msg->specific.ports - RAC status ports from the faulty ring.
msg->specific.opports - RAC status ports from the ring that was used to write the error message to the 3B21D. This information was taken just before the error message was written.
msg->specific.dhead - The header of the message that was being written to the ring.
ROP Data
WRITE FORMAT ERROR DETECTED (LN/RPCN)XX YY RAC (0/1) 0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

aabbccdd eeffgghh - Header of the message that was being written to the ring.
jj   - node type, 3 = IRN.
kk   - port C, faulty ring.
ll   - port B, faulty ring.
mm   - port A, faulty ring.
nnpp - not used.
qq   - port E, faulty ring (IRN only).
rr   - port D, faulty ring (IRN only).
ss   - not used.
tt   - port C, opposite ring. See Notes.
uu   - port B, opposite ring. See Notes.
vv   - port A, opposite ring. See Notes.
wwxx - not used.
yy   - port E, opposite ring (IRN only). See Notes.
zz   - port D, opposite ring (IRN only). See Notes.
Read Format Error

Error Code
_RG_RFMT, _RG_RDTO, _RG_RDLEN
Description
.
RAC ERROR FLAGS

IRN                  ERROR
                     Read timeout
RDLEN (_RG_RDLEN)    Read length error

IRN: Read length error. A C byte was received before the end of the message was reached.

The error latch is cleared, and the received message is discarded. An error message is sent to the home RPC on both rings. Note that IUNs will not report a read format error if it occurs on a broadcast message. Only RPCs will report read format errors on broadcast messages.
Variable Data
msg->specific.ports - RAC status ports from the faulty ring.
msg->specific.opports - RAC status ports from the ring that was used to write the error message to the 3B21D. This information was taken just before the error message was written.
msg->specific.dhead - The header of the message that was being read from the ring.
ROP Data
READ FORMAT ERROR DETECTED (LN/RPCN)XX YY RAC (0/1) MSG SRC: (LN/RPCN)GG MM, MSG TYPE: (NORMAL/BROADCAST/SEL BROADCAST/TAKE) 0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

MSG SRC, MSG TYPE - The source node and message type, respectively, extracted from the first word of the message header: 0xaabbccdd. When the node is unsuccessful in recovering the message involved in the READ FORMAT ERROR, 0xaabbccdd is set to 0xffffffff.
aabbccdd eeffgghh - Header of the message that was being read from the ring. If the node could not recover the message that was read from the ring, these fields will be set to 0xffffffff.
jj   - node type, 3 = IRN.
kk   - port C, faulty ring.
ll   - port B, faulty ring.
mm   - port A, faulty ring.
nnpp - not used.
qq   - port E, faulty ring (IRN only).
rr   - port D, faulty ring (IRN only).
ss   - not used.
tt   - port C, opposite ring. See Notes.
uu   - port B, opposite ring. See Notes.
vv   - port A, opposite ring. See Notes.
wwxx - not used.
yy   - port E, opposite ring (IRN only). See Notes.
zz   - port D, opposite ring (IRN only). See Notes.
Received Too Short Error

Error Code
_RG_RDTOSHRT
Description
.
RAC ERROR FLAGS

IRN       ERROR
R2SHRT    Read too short

IRN: Read too short. A second C byte was received before a complete IMS header had been received.

The error latch is cleared and the partial header is discarded. The node will return to its normal operating mode. It is assumed that an upstream node mutilated the message. Error messages are sent to the home RPC on both rings.
Variable Data
ROP Data
READ TOO SHORT DETECTED (LN/RPCN)XX YY RAC (0/1) 0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

aabbccdd eeffgghh - The partial header that was read into memory. If the node could not recover the message that was read from the ring, these fields will be set to 0xffffffff.
jj   - node type, 3 = IRN.
kk   - port C, faulty ring.
ll   - port B, faulty ring.
mm   - port A, faulty ring.
nnpp - not used.
qq   - port E, faulty ring (IRN only).
rr   - port D, faulty ring (IRN only).
ss   - not used.
tt   - port C, opposite ring. See Notes.
uu   - port B, opposite ring. See Notes.
vv   - port A, opposite ring. See Notes.
wwxx - not used.
yy   - port E, opposite ring (IRN only). See Notes.
zz   - port D, opposite ring (IRN only). See Notes.
Read Inhibit Error

Error Code
_RG_RINH
Description
When a blockage occurs during a write, the data in the FIFO should be transferred to the NP memory. If a blockage occurs during a read or while propagating a message, the data up to the next C byte should be read into memory. This error code is set if it appears that no data was put into memory. Either problem indicates that the RAC hardware is faulty or that there is a problem with the DMAC. This error code indicates that the reporting node caused the blockage, not the downstream node.
RAC ERROR FLAGS

IRN       ERROR
PRPBLK    Propagate Blockage
At the present time, this error code is used in the IRN to report a read blockage.

The recovery action will be the same as for the blockage error (_RG_BLKG).
Variable Data
immemsg.data.specific.ports - RAC status ports from the faulty ring.
immemsg.data.specific.opports - RAC status ports from the opposite ring. See Notes.
ROP Data
READ INHIBIT ERROR DETECTED (LN/RPCN)XX YY RAC (0/1) 0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

aabb - not used.
ccdd - not used.
ee   - The node's home RPC overflow state (IRN only).
ff   - The node's overload state (IRN only).
gg   - The node's overflow state (IRN only).
hh   - The node's silence state (IRN only).
jj   - node type, 3 = IRN.
kk   - port C, faulty ring.
ll   - port B, faulty ring.
mm   - port A, faulty ring.
nnpp - not used.
qq   - port E, faulty ring (IRN only).
rr   - port D, faulty ring (IRN only).
ss   - not used.
tt   - port C, opposite ring. See Notes.
uu   - port B, opposite ring. See Notes.
vv   - port A, opposite ring. See Notes.
wwxx - not used.
yy   - port E, opposite ring (IRN only). See Notes.
zz   - port D, opposite ring (IRN only). See Notes.
Excessive Ring Command Interrupts

Error Code
_RG_XHCMD
Description
In order to detect and recover from problems caused by certain types of circulating hardware control messages, ring error interrupts generated by hardware control message execution are counted and thresholded at IRN RPCs. This report indicates that the number of these ring command interrupts generated at the reporting IRN RPC has exceeded a threshold. A leaky bucket thresholding technique is used to determine when the number of interrupts is excessive; a count of ring command events is incremented during processing of ring error interrupts, and decremented on each 10 ms clock interrupt. After incrementing the leaky bucket count, the ring error interrupt handler compares the count against a predefined threshold; if the count has exceeded the threshold, a circulating hardware control message is assumed to be the cause. The leaky bucket count increment, decrement, and threshold are parameters defined in header file rg.ear.h. Two separate thresholds are defined: one for use during the normal RPC operational state (RPCS4), and one for use during the RPC initialization and ring maintenance states (RPCS2 and RPCS3).

This error condition is most likely an indication of a circulating broadcast type hardware control message - one of the nonlethal control types that do not quarantine or NP reset the affected nodes. A circulating nonbroadcast RAC reset message will also generate ring command interrupts in this way. A less likely cause is faulty ring interface hardware that generates an unclearable ring command interrupt. Refer to the contents of RAC status port D for an indication of the type of hardware control command that generated the excessive interrupt activity.
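The leaky bucket technique described above can be sketched as follows. This is a minimal illustration only, not the code behind rg.ear.h: the class and function names are hypothetical, the parameter values used in any example are invented, and flooring the count at zero on a clock tick is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Counts ring command events and leaks on each clock tick."""
    increment: int   # added per ring command event (hypothetical value)
    decrement: int   # removed per 10 ms clock tick (hypothetical value)
    threshold: int   # count above which the condition is declared
    count: int = 0

    def on_ring_command_event(self) -> bool:
        """Add one event; return True if the count now exceeds the threshold."""
        self.count += self.increment
        return self.count > self.threshold

    def on_clock_tick(self) -> None:
        """Leak the bucket on a 10 ms tick (assumed never to go below zero)."""
        self.count = max(0, self.count - self.decrement)


def first_excessive_tick(bucket, events_per_tick, ticks):
    """Return the tick index at which the threshold is first exceeded, or None."""
    for tick in range(ticks):
        for _ in range(events_per_tick):
            if bucket.on_ring_command_event():
                return tick
        bucket.on_clock_tick()
    return None
```

With an increment larger than the decrement, a steady stream of events eventually crosses the threshold, while events arriving no faster than the bucket leaks never do. Separate threshold values would be supplied for the operational and maintenance states.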

As indicated above, only IRN RPC nodes detect and report this error condition. When the condition is detected, the RPC takes a recovery action designed to halt and destroy circulating hardware control messages: propagate inhibit is set on both rings, and after a time delay to allow the circulating message to traverse the ring and return, inhibit input is set on both RACs and both RACs are reset. If these recovery actions do not clear all errors on the interrupting ring, the error interrupt on that ring is disabled. After this recovery action has been completed, the problem is reported to the 3B21D.
Variable Data
immemsg.data.specific.ports - RAC status ports from the interrupting ring, prior to the node recovery actions.
immemsg.data.specific.opports - RAC status ports from the interrupting ring, after the node recovery actions have been completed.
ROP Data
EXCESSIVE RING CMD INTERRUPTS DETECTED, RPCNXX YY RAC (0/1) 0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

aa   - RPC node state.
bb   - a flag to indicate whether the leaky bucket counter's value incremented past the threshold over a span of multiple ring error interrupts (flag = 01) or entirely during the processing of one ring error interrupt (flag = 00).
ccdd - value of ring cmd interrupt leaky bucket counter, after it was incremented and found to exceed the counter threshold.
ee   - leaky bucket counter increment, on each ring command event.
ff   - leaky bucket counter decrement, on each 10 ms clock tick.
gghh - leaky bucket counter threshold.
jj   - node type, 3 = IRN (should always indicate IRN).
kk   - port C, interrupting ring (prior to recovery actions).
ll   - port B, interrupting ring (prior to recovery actions).
mm   - port A, interrupting ring (prior to recovery actions).
nnpp - not used.
qq   - port E, interrupting ring (prior to recovery actions).
rr   - port D, interrupting ring (prior to recovery actions).
ss   - not used.
tt   - port C, interrupting ring (after recovery actions).
uu   - port B, interrupting ring (after recovery actions).
vv   - port A, interrupting ring (after recovery actions).
wwxx - not used.
yy   - port E, interrupting ring (after recovery actions).
zz   - port D, interrupting ring (after recovery actions).
Token Removed from Ring

Error Code
_RG_RDTOKEN
Description
The opns module has determined this node removed the token from the ring. The token was taken from the ring as if there was a legitimate destination address match. The node message switch was delivering messages from the ring buffers and a message destined for the _TOKEN channel was encountered. There are no ring status ports to check; this is purely a software decision. However, if the token was actually removed from the ring, the INACT bit in the RAC status information may be set.

The node takes no recovery action; it only reports the error by sending an error message on both rings to the home RPC.
Variable Data
msg->specific.dhead - The header of the suspected token.
ROP Data
DEQUEUED TOKEN DETECTED (LN/RPCN)XX YY RAC (0/1) 0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

aabbccdd eeffgghh - The header of the suspected token.
jj       - node type, 3 = IRN.
kkllmm   - not used.
nnppqqrr - not used.
ssttuuvv - not used.
wwxxyyzz - not used.
Source Match Error

Error Code
_RG_SRCM
Description
The node placed a message on the ring, but the destination node was not able to remove the message from the ring. The message traveled completely around the ring and returned to the source node.
RAC ERROR FLAGS

IRN       ERROR
          Ring source match

The ring hardware generates a ring error interrupt when a source match occurs. However, the ring error interrupt handler simply clears the error, letting the node message switch software determine which messages arriving from the ring are source matches. The node message switch software declares a source match when a message arrives from the ring for which all of the following are true:
(a) the 12-bit source address field contains an address matching the node's physical address,
(b) the source ring ID bit field matches the ring the message was received on, and
(c) the 12-bit destination address field does not contain an address matching the physical address of the node.
This software definition of source match lets a node send a message to itself.

The hardware generates a source match interrupt under the following conditions:
IRN: A source match error interrupt is generated when a message arrives with a source address that matches the node's physical address, and a destination address that does not match the node's physical or virtual address.
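The three software conditions (a), (b), and (c) can be expressed as a single predicate. The sketch below is illustrative only: the function name and argument names are hypothetical, and it assumes the address and ring ID fields have already been extracted from the message header, since the header bit layout is not given here.

```python
def is_software_source_match(src_addr, src_ring, dest_addr, node_addr, rx_ring):
    """Apply the three software source-match conditions (a), (b), and (c)."""
    mask = 0xFFF  # the source and destination address fields are 12 bits wide
    return (
        (src_addr & mask) == (node_addr & mask)       # (a) source is this node
        and src_ring == rx_ring                       # (b) ring ID matches receive ring
        and (dest_addr & mask) != (node_addr & mask)  # (c) not addressed to this node
    )
```

Because of condition (c), a message that a node addresses to itself is not declared a source match when it arrives back at that node.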

IRN: The source match error latch is cleared.
Variable Data
msg->specific.dhead - The header from the message that caused the source match.
ROP Data
RMV (LN/RPCN)XX YY; SRC MATCH RPTD BY (LN/RPCN)AA BB 0xaabbccdd 0xeeffgghh (TTTTTTTTTT)

aabb - The source address of the source match message.
ccdd - The control word of the source match message.
ee   - The destination function of the source match message.
ff   - The source function of the source match message.
gghh - The destination address of the source match message.
Miscellaneous RAC Problem

Error Code
_RG_RACPROB
Description
An error interrupt is generated and the error cannot be cleared. This error is a catch-all to handle some unexpected hardware or software condition. When any error interrupt is generated, the status ports are saved, the errors are cleared, and some recovery action is taken. After the recovery has completed, the status ports are checked again. If errors still exist, the cycle of clearing the error and performing the recovery is repeated. If the number of times the cycle is repeated exceeds a predefined threshold, it is assumed that the error is permanent and the RAC problem is reported to the 3B21D. It is possible for this error to be caused by a circulating message on the ring.
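The clear-and-recover cycle described above can be sketched as a bounded retry loop. This is a simplified illustration, not the node software: the function name, the callback parameters, and the convention that a status value of 0 means no error bits are set are all assumptions made for the example.

```python
def handle_error_interrupt(read_status, clear_errors, recover, max_cycles):
    """Repeat clear-and-recover until the status is clean or the limit is passed.

    Returns ("cleared", saved) or ("report_rac_problem", saved), where saved is
    the status captured when the interrupt was first handled.
    """
    saved = read_status()          # the status ports are saved first
    cycles = 0
    while True:
        clear_errors()
        recover()
        if read_status() == 0:     # 0 stands for "no error bits set" here
            return "cleared", saved
        cycles += 1
        if cycles > max_cycles:    # repeat count exceeded the threshold
            return "report_rac_problem", saved
```

A transient error ends the loop with the status clean, while an error that never clears ends it with a report once the repeat count passes the hypothetical `max_cycles` value.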

The error is reported and inhibit input is set to prevent the recurring interrupt if it is caused by ring messages. The error interrupt is also disabled.
Variable Data
immemsg.data.specific.ports - RAC status ports from the faulty ring.
immemsg.data.specific.opports - RAC status ports from the opposite ring. See Notes.
immemsg.data.specific.un.misc.sint - Will be 1 if the problem is a false interrupt; otherwise, it will be 0.
ROP Data
GENERAL RAC ERROR DETECTED (LN/RPCN)XX YY RAC (0/1) 0xaabbccdd 0xeeffgghh 0xjjkkllmm 0xnnppqqrr 0xssttuuvv 0xwwxxyyzz (TTTTTTTTTT)

aabb - False interrupt indicator. If this field is 0x1, the problem was a false interrupt generated by a RAC.
ccdd - not used.
ee   - The node's home RPC overflow state (IRN only).
ff   - The node's overload state (IRN only).
gg   - The node's overflow state (IRN only).
hh   - The node's silence state (IRN only).
jj   - node type, 3 = IRN.
kk   - port C, faulty ring.
ll   - port B, faulty ring.
mm   - port A, faulty ring.
nnpp - not used.
qq   - port E, faulty ring (IRN only).
rr   - port D, faulty ring (IRN only).
ss   - not used.
tt   - port C, opposite ring. See Notes.
uu   - port B, opposite ring. See Notes.
vv   - port A, opposite ring. See Notes.
wwxx - not used.
yy   - port E, opposite ring (IRN only). See Notes.
zz   - port D, opposite ring (IRN only). See Notes.
Unexpected Loss of Token

Error Code
_RG_NOTOKEN
Description
The node is trying to write to the ring, and a timer expired while waiting for the token to arrive at this node. This timing interval is 60 msec. This error is reported only by an RPC and then only when it has attempted to write to the ring.

The node reports the error and the pending write is removed from the write queue.
Variable Data
None.
ROP Data
UNEXPLAINED LOSS OF TOKEN ON aa

aa - RING 0, RING 1, or BOTH RINGS.
Checksum Audit Failure

Error Code
_BADTXTCS
Description
The node checksum audit on a text or data section has failed.

The node reports the error but takes no recovery action.
ROP Data
RMV (LN/RPCN)XX YY; NODE CKSUM ERROR 0xaabbccdd 0xeeffgghh (TTTTTTTTTT)

aa   - Current audit number.
bb   - Accumulated sum.
cc   - Not used.
dd   - Reference sum.
eeff - Segment that the audit was running in.
gghh - Offset to the beginning of the section that was being audited.
Node Processor Parity Failure

Error Code
_RG_NPPF
Description
This error code should never be reported to the 3B21D, but it is included here for reference. If a node processor parity failure occurs, the node will panic but it will not send an error message. If there is bad parity and an attempt is made to send a message, it may create parity errors at the downstream node and cause that node to be removed from service. If an NP parity error occurs while writing a message to the ring, the write will be terminated. This will chop off the end of the message and cause the downstream node(s) to report a read format error. The 3B21D will be unaware of the problem until a message destined for the node is returned as a source match.
Design Issues
Some new error codes were created, but they were mapped to existing error codes. These new codes were provided for future use in the 3B21D.

1. Presently, the indication of which ring is at fault is the upper bit of the error code in the error message. Would it be any simpler to dedicate a separate field in the error message for this purpose?
   Yes. A spare field in the error message will be assigned to use as the faulty ring indicator. The bit will still be set until the 3B21D code is changed to use the new field in the message.

2. The error messages will contain an indication of the type of node that sent the message.

3. In the current configuration, the error message may contain the RAC ports of both rings. Is this necessary?
   The information from the opposite port is not usually used by the 3B21D, but in some cases the additional information in the ROP printout is helpful in the analysis of the problem. For that reason, the opposite port information will be retained whenever possible. This status information is really data that is obtained from the RAC on which the error message was transmitted. When an error message is sent on both rings, it is not possible to tell which RAC this status belongs to. Should something be added to the error message to indicate which RAC the message was transmitted from? Should this status information be provided when the reporting node is an RPC?

4. Should the orphan byte error be handled separately from the parity error?
   Yes. Previously, these errors were grouped together because the recovery action was the same in either case. That is no longer true, so a new error code will be assigned for orphan byte errors. Also, the orphan byte error requires that error messages be sent to all RPCs on the opposite ring.

5. There are three error codes that indicate blockage: _RG_BLKG, _RG_ROPF, and _RG_RINH. Is it necessary to have all of the error codes?
   The ring error analysis in the 3B21D relies on the different error codes to determine how to recover from the error.

6. What if there is a source match and the destination address of the source match message is a virtual address? How does the 3B21D know which node to remove? Is it going to have to wait until the neighbor audit runs to discover which node is in error?
   This is a known hazard associated with using virtual addresses. The source match will not be reported if the destination is a virtual address. The faulty node will not be removed until the neighbor audit runs.

7. At the present time, the recovery strategy for a write format error (_RG_WFMT) sets inhibit input, which will cause the upstream node to see a blockage. Is this overkill? There seem to be a couple of things to consider. Why should we block the ring because one node cannot write? The problem with causing blockage is that all traffic on the ring is lost, and this seems a harsh penalty to pay for a write format error. It seems logical that we could clean up after the error and try to continue normal operation. The 3B21D could then make the decision whether to remove the node from service. This is also the case with the input format error (_RG_IFMT). This is really a read too short error. Inhibit input is also set when this error occurs.
   The final decision was to set inhibit input in the IRN to make it look like an older node.

8. In some error cases in older nodes, inhibit input is set to prevent recurring error interrupts. Will that work in the IRN? Or would it be better to disable the ring interrupt?
   The IRN will continue to use inhibit input to prevent recurring interrupts. If it disabled interrupts, it would be difficult for the node to determine when to reenable the interrupt.

9. The error report printed at the ROP presently contains data taken from the error message. This is provided to help analyze the problem. The amount of data printed may change and is subject to the time required to print the message. The time used to print the report affects the total error recovery time.

10. The loss of token error message is sent to the 3B21D if the timer times out during a token write or if it times out during a priority write. Should there be separate codes for the different write failures?
    The final decision was not to create a new error code.

11. The input format error is really a read too short error, so the error code is changed from _RG_IFMT to _RG_RDTOSHRT.
Ring Transport Errors

This section provides brief descriptions of the circumstances that are associated with each type of REPT RING TRANSPORT ERROR message. The messages are classified according to the consequences of the errors that the messages report. The REPT RING TRANSPORT ERR/UNEXPLAINED LOSS OF TOKEN message is listed separately as belonging to a class by itself.
Ring-Related Errors
The following ring transport errors indicate faults that obstruct the transportation of messages on the ring. Such faults usually lead to ring restarts and/or node isolations.

BLOCKAGE
A node's blockage timer timed out waiting for the downstream node or interframe buffer board (IFB) to accept an offered data byte. The blocked node will clear the ring by reading all data from the ring, including the token message. It then reports the condition to the 3B20D/3B21D by sending on the opposite ring a BLOCKAGE Ring Transport Error Message to each RPCN.

RAC OUTPUT PARITY ERROR
A node attempted to transmit bad parity to the downstream node or IFB. Since bad parity is not accepted by the downstream node or IFB, the transmitting node eventually detects blockage and reads the data with bad parity into memory as part of the blockage recovery process. Upon recognizing the bad parity, the transmitting node will take the same recovery action as with BLOCKAGE, except that this error is reported instead of BLOCKAGE.

READ INHIBIT ERROR
Blockage occurred during a read or while propagating a message and no data was read into NP memory as part of the blockage recovery process. The node will take the same recovery action as with BLOCKAGE.

RAC PARITY/FORMAT ERROR
A node reporting this error will not accept data from its upstream neighbor, thereby forcing the upstream node to detect ring blockage. The following two conditions cause this error.
(1) A ring data byte with bad parity has been offered to the node, and the node recovery action of resampling the data could not clear the error. (If bad parity were due to a transient error, resampling should clear it.)
(2) An orphan byte has been offered to the node. An orphan byte condition occurs when a node expects to receive a control byte but is offered another byte instead. The control byte is the first byte of data in an IMS message. A special signal lead on the ring bus is asserted only during the control byte, thereby allowing the receiving node to distinguish the control byte from all other message bytes.

INTERFRAME BUFFER PARITY ERROR
The upstream interframe buffer has detected a ring parity error. The IFB will not accept any more data, thereby forcing blockage in the node upstream from the IFB.

WRITE FORMAT ERROR
Some error occurred while a node was attempting to write a message to the ring. For example, the message may have had a source address that does not match that of the writing node, or the message specified an improper message length. A node reporting this error will not accept ring data from its upstream node, thereby forcing the upstream node to detect blockage.

GENERAL RAC ERROR
A catch-all error type used to report an unexpected node hardware or software condition. A node reporting this error will not accept ring data from its upstream node, thereby forcing the upstream node to detect blockage.

DEQUEUED TOKEN
A ring node reports this error when it finds that it has read the token message from the ring. This error is intended to detect failures that cause a node to inadvertently read data from the ring.

RING INTERFACE FAILURE
During a boot, ring maintenance activity found an RPC's ring interface to be faulty.

PIO FAILURE
A Programmed IO operation at an RPCN from the 3B20D/3B21D failed.

RPCN ISOLATION
An RPCN was removed from service due to isolation. The RPCN may or may not be an innocent victim. This condition is reported as a ring transport error but is actually a status message, since it is a condition imposed upon an RPCN by the 3B20D/3B21D as a result of ring transport error messages it has previously received.
Node-Related Errors
The following ring transport errors indicate faults that prevent the processing and transmission of messages in nodes. They usually lead to node quarantine.

SOURCE MATCH
A ring message returned to the sending node because the destination node did not remove the message from the ring.

SRC MATCH
This is the same as the SOURCE MATCH error, except the detection was made by the node audit (NAUD) operation.

NAUD FAILURE
The node audit operation failed in a communication test with a node.

RPCN PANIC
This is a failure condition in RPCN software.

RPCN STATE CHANGE FAILURE
The RPCN failed to confirm that it has followed a 3B20D/3B21D directive to change into a particular software state during ring maintenance activity.

UNXPCTD STATE CHNG MSG
This is similar to the RPCN STATE CHANGE FAILURE. Without having been sent a 3B20D/3B21D directive, an RPCN reported that it has changed into a particular software state.

RING WRITE FAILURE
An RPCN reported that it failed to write a message to the active ring.

MSG RELAY FAILURE
This is similar to the RING WRITE FAILURE. An RPCN failed in relaying a message from the 3B20D/3B21D onto one of the rings during ring maintenance activity.

RING READ FAILURE
An RPCN reported that it failed to read a message from the active ring.

UNXPCTD SET QUA
The 3B20D/3B21D received an unprovoked confirmation from an RPCN that it has been directed to quarantine itself.

RAC CONTROL FAILURE
During ring maintenance activity, the ability of the 3B20D/3B21D to control an RPC's ring access circuit (RAC) failed.
Errors Without Consequences

The following ring transport errors cause no system action other than a report.

TRANSIENT RAC ERROR
A ring data byte with bad parity was offered to the node and the node recovery action of resampling the data cleared the error. Had the error not been cleared, a RAC PARITY/FORMAT ERROR would have been reported. If occurrences of this error exceed a specified rate, a RAC PARITY/FORMAT ERROR will be reported and the node will be isolated.

READ FORMAT ERROR
A node read a message that was shorter than the length indicated in the message header, but at least the length of an IMS header (8 bytes). The received message is discarded. IUNs will not report this error if it occurs on a broadcast message, but RPCNs will.

READ TOO SHORT ERROR
A node read a message that was shorter than an IMS header (8 bytes). The partial message header is discarded.
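The distinction between the two read errors comes down to how much of the message arrived. The sketch below illustrates that rule under stated assumptions: the function name is hypothetical, and it assumes the length value in the header and the received byte count are expressed in the same units.

```python
IMS_HEADER_BYTES = 8  # IMS header length stated in the text


def classify_received_length(received_bytes, header_length_field):
    """Name the report implied by how much of a message was received."""
    if received_bytes < IMS_HEADER_BYTES:
        return "READ TOO SHORT ERROR"   # less than a complete IMS header
    if received_bytes < header_length_field:
        return "READ FORMAT ERROR"      # header complete, message cut short
    return None                         # full message, no length error
```

In the too-short case the header itself is incomplete, so `header_length_field` may not be meaningful; the check on the header size therefore comes first.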

UNEXPLAINED LOSS OF TOKEN
RPCNs have reported loss of token to the 3B20D/3B21D, but no node has reported another ring transport error type to identify the cause or location of the ring problem.
Some IMS Input Messages

The following tables identify commonly used versions of some IMS input messages. In these tables, as elsewhere in this document, the following conventions are used: in the expression NODEa b, substitute RPCN, IUN, or LN for NODE, substitute the 2-digit group number for a, and substitute the 2-digit member number for b. For a complete listing of all IMS input messages and their variations, consult the 401-610-055 FLEXENT/AUTOPLEX Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT/AUTOPLEX Wireless Networks OUTPUT MESSAGES Manual.

Table B-1. Some Versions of the RST Input Message

Message                                 Result
RST:NODEa b                             Restores the specified node conditionally.
RST:NODEa b:TLP                         Restores the specified node conditionally and executes the Trouble Locating Procedure, thus generating at the conclusion of a failed diagnostic a list of circuit packs suspected of being faulty.
RST:NODEa b;UCL                         Restores the specified node unconditionally.
CFR:RING                                Returns all eligible, isolated nodes to the active ring.
CFR:RING,NODEa b                        Returns the specified, isolated node, if it is eligible, to the active ring.
CFR:RING,NODEa b,NODEa b;INCLUDE        Returns the specified range of nodes, if they are eligible, to the active ring.
CFR:RING,NODEa b;EXCLUDE                Isolates the specified node, if it is eligible.
CFR:RING,NODEa b,NODEa b;EXCLUDE        Isolates the specified range of nodes, if they are eligible.
CFR:RING,NODEa,b;MOVFLT                 Moves the indication of a faulty ring interface from the currently isolated node to the node identified as NODEa,b and causes the isolation to shift so that NODEa,b becomes the newly isolated node and the currently isolated node becomes the EISO or BISO node. See Manual Recovery from a Hard Fault on a Small Ring in Chapter 3, Ring Maintenance.
Setting the ECD Flag for Manual Ring Mode

The manual ring mode flag is field 22 of the UNODS 0 UCB form. The following is the procedure to set/reset this flag:
1. After executing the trbegin form, enter the form name ucb on the forms selection page.
2. ODIN will display the database operation page and request the action desired. Enter u to indicate a form update is required.
3. ODIN will display page 1 of the UCB form and position the cursor at field 1.
4. Advance the cursor to field 3 by depressing the <CR> key twice.
5. Enter UNODS in field 3. Advance the cursor to field 4 by depressing the <CR> key. Enter 0 (zero) in field 4.
6. If the form name is found, ODIN will display the current values in the ECD for each field for page 1 of the form. A prompt for the next operation desired will appear at the lower portion of the screen.
7. Enter 2 in response to move to page 2 of the UCB form. ODIN will display page 2 of the form, and another operation prompt will appear.
8. Enter c in response to indicate that a field is to be changed. ODIN will then prompt for the field number.
9. Enter 22 in response to specify field 22 (the equippage field). ODIN will position the cursor at field 22.
10. Change the value of field 22 as follows:
    - 0x8 is used at the beginning of the manual ring initialization to set the flag, and at the completion of the manual ring initialization, after the ring is stable, to reset the flag.
    ODIN will prompt for the next field to be changed.
11. Depress the <CR> key to indicate that no other changes are desired on the page. ODIN will again display the operations prompt at the lower portion of the screen.
12. Enter u in response to update the form and inform ODIN that no other changes are required for this session.
13. The message FORM UPDATED will flash once at the upper right of the screen when the form is updated. ODIN will then return to page 1 of the form.
14. Return to the forms selection page by depressing the < key, and execute the TREND Form.
ECD Values for Interframe Buffers

For interframe buffers that are upstream of RAC 0, set bits 0-3 of the ECD UCB HV field to the following values:

VALUE   BUFFER TYPE
0       no IFB
1       TN918 (unpadded)
2       TN915 (padded, 512-byte capacity)
3       TN1507 (fiber, 256-byte capacity)
4       TN1506 (padded, 4104-byte capacity)
5       TN1508 (fast unpadded, 16-byte capacity)
6       TN1509 (fast, 4104-byte capacity)
For interframe buffers that are upstream of RAC 1, set bits 4-7 of the ECD UCB HV field to the following values:

VALUE   BUFFER TYPE
0       no IFB
1       TN918 (unpadded)
2       TN915 (padded, 512-byte capacity)
3       TN1507 (fiber, 256-byte capacity)
4       TN1506 (padded, 4104-byte capacity)
5       TN1508 (fast unpadded, 16-byte capacity)
6       TN1509 (fast, 4104-byte capacity)
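As a worked illustration of how the two 4-bit values share one field, the following Python sketch packs and unpacks them. It is a sketch under stated assumptions, not a description of ODIN or the ECD: the function names are hypothetical, bit 0 is assumed to be the least significant bit, and any bits of the HV field above bit 7 are ignored because this section does not describe them.

```python
# Buffer-type codes taken from the two tables above.
IFB_TYPES = {
    0: "no IFB",
    1: "TN918 (unpadded)",
    2: "TN915 (padded, 512-byte capacity)",
    3: "TN1507 (fiber, 256-byte capacity)",
    4: "TN1506 (padded, 4104-byte capacity)",
    5: "TN1508 (fast unpadded, 16-byte capacity)",
    6: "TN1509 (fast, 4104-byte capacity)",
}


def encode_hv_ifb(rac0_value, rac1_value):
    """Pack the RAC 0 code into bits 0-3 and the RAC 1 code into bits 4-7.

    Assumption: bit 0 is the least significant bit of the field.
    """
    for label, value in (("RAC 0", rac0_value), ("RAC 1", rac1_value)):
        if value not in IFB_TYPES:
            raise ValueError(label + " value must be one of 0 through 6")
    return (rac1_value << 4) | rac0_value


def decode_hv_ifb(hv_value):
    """Return (RAC 0 buffer type, RAC 1 buffer type) for the low 8 bits."""
    rac0_value = hv_value & 0x0F          # bits 0-3
    rac1_value = (hv_value >> 4) & 0x0F   # bits 4-7
    return (
        IFB_TYPES.get(rac0_value, "unlisted value"),
        IFB_TYPES.get(rac1_value, "unlisted value"),
    )


if __name__ == "__main__":
    hv = encode_hv_ifb(2, 4)   # TN915 upstream of RAC 0, TN1506 upstream of RAC 1
    print(hex(hv))             # 0x42
    print(decode_hv_ifb(hv))
```

Under these assumptions, a TN915 upstream of RAC 0 and a TN1506 upstream of RAC 1 give a low byte of 0x42.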
Abbreviations
For definitions of terms used in this acronym list, see the Glossary or consult the Index for text references.
Numerics
3B20D AT&T 3B20 Duplex Real Time Reliable computer
3B21D A new version of the existing 3B20D processor
5ESS Registered trademark of Lucent Technologies for its premier electronic switching system
A
ACCH Associated control channel
ACDN Administrative Call Processing/Database Node
ACT Active state
ACTS Automated Cellular Test System
ACU Analog conversion unit
AIF Antenna Interface Frame (Series II Cell)
AMA Automatic Message Accounting
AMASE Automatic Message Accounting Standard Entries
AMPS Advanced Mobile Phone Service
AP Attached Processor - Another name for the Ring Application/Attached Processor.
ATP All Tests Passed
AUTOPLEX AT&T Registered Trademark for its Cellular Switching Systems
AutoPACE Performance Analysis and Cellular Engineering
B
BBA Bus Interface Unit + Baseband Combiner & Radio + Analog Conversion Unit (BIU+BCR+ACU)
BCR Baseband Combiner & Radio
BER Bit Error Rate
BIU Bus Interface Unit
BWM Broadcast Warning Message
C
CCC CDMA Cluster Controller
CCCEQ CDMA Cluster equipage form
CCFDB Custom Calling Features Database
CCU CDMA Channel Unit
CDMA Code Division Multiple Access
CDN Call Processing/Database Node
CDN-II Call Processing/Database Node - II
CDN-IIX Call processing/database node - IIX
CE Channel Element
CELLDB Cell Site Database
CEQCOM1 Series I Cell Equipage Common form
CEQCOM2 The Series II Cell Equipage RC/V Form
CEQFACE Cell Equipage Face
CGSADB Cellular Geographic Service Area Database
CNI Common Network Interface
CNI/IMS Common Network Interface/Interprocess Message Switch
CPI Communication processor interface
CPU Core processor unit
CSC Cell Site Controller
CU Control unit
D
DAT Digital Audio Tape
DCCH Digital Control Channel
DCI Dual-Serial Channel (DSCH) Computer Interconnect
DCS Digital Cellular Switch
DCSDB Digital Cellular Switch Database
DFI Digital Facility Interface
DRTU Digital Radio Test Unit
DRU Digital Radio Unit
DS-1 Digital Signal level 1
DS0 Digital Signal-0
DSN Digital Switch Node
E
EA Emergency Action Page
EA/NORM Emergency Action/Normal Display Key on MCRT
ECD Equipment Configuration Database
ECP Executive Cellular Processor
ECPC ECP Complex
ECPDB Executive Cellular Processor Database
F
FAF Feature Activation File
FDMA Frequency Division Multiple Access
FER Frame Error Rate
G
GPS Global Positioning System
H
HO Handoff
Hz Hertz
I
IMS Interprocess Message Switch
IRN Integrated Ring Node
IRN2 Integrated ring node version 2
L
LAF Linear Amplifier Frame
LAN Local Area Network
M
MAHO Mobile Assisted Handoff
MB Megabyte
MCRT Maintenance Cathode Ray Tube/Terminal
MHD Moving Head Disk
MHz Megahertz
MSC Mobile Switching Center (formerly MTSO)
MSO Multiple Size Option for Subscriber Database
MUFDB Mobile Unit Features Data Base
N
N/A Not Applicable
NVM Non-Volatile Memory
O
OA&M Operations, Administration & Maintenance
ODA Office Data Assembler
ODD Office Dependent Data
OMP Operations Management Platform, previously Operations and Maintenance Processor
OOS Out-Of-Service
P
PC Personal Computer
PM Plant Measurements
PSTN Public switched telephone network
PSU Packet Switching Unit
R
RAM Random Access Memory
RCC Radio Control Complex
RCU Radio Channel Unit
RCV Recent Change & Verify
RF Radio Frequency
RFTG Reference frequency and timing generator
RN Ring Node
ROP Read/Receive-Only Printer
RPC Ring Peripheral Controller (node)
RPCN Ring Peripheral Controller Node
RTR Real Time Reliable
RTU Radio Test Unit
S
SC Stable Clear
SCSI Small Computer System Interface
SCT Synchronous Clock and Tone
SH Speech Handler
SII Series II Cell Site
SM Service Measurements
SMS Short Message Service
SS7 Signaling System 7
STBY Standby
SU Software Update
T
TDMA Time Division Multiple Access
TEA Translations Entry Assistant
TRKGRP Trunk group
TRTU TDMA Radio Test Unit
V
VCSA Voice Channel Selection Activity
W
WTSC Wireless Technical Support Center (formerly CTSO)
Glossary
A
Attached Processor (AP)
A circuit pack used with the direct link node (DLN) that provides expanded storage for added processing capacity on the ring.
B
Basic Error Correction (BEC)
BEC or Basic is an algorithm for Level 2 error correction on signaling links with short one-way propagation delay. In normal operation, BEC ensures correct transfer of message signal units over CCS7 and CCITT7 signaling links, in sequence and with no double delivery. Positive acknowledgments indicate correct transfer of message signal units. Negative acknowledgments request a retransmission of those signal units because they were received in a corrupt form.
C
Call Processor/Database Node (CDN)
A CNI node that handles the call processing functions of the FLEXENT/AUTOPLEX Wireless Network Systems. A CDN is a two-part unit consisting of a node and a ring application processor (RAP). There are several versions of CDNs: CDN, CDN-I, CDN-II, and CDN-IIx.

CCITT
International Telegraph and Telephone Consultative Committee (Comité Consultatif International Télégraphique et Téléphonique). An international body that controls the standards of communications protocols.

CDN
Call Processor/Database Node

CDN-I
A CDN that is comprised of an IRN and a 3B15-based computer. This is sometimes referred to as a SMART Node (SN).

CDN-II
A CDN that is comprised of an IRN2 and an AP30. This is sometimes referred to as a Turbo CDN.

CDN-IIx
A CDN that is comprised of an IRN2B and a modified AP30.

CNCE
CCS Network Critical Events

Common Network Interface (CNI)
A common subsystem software component supplied to various network components whose primary function is providing CCS network access and CCS message routing.

Computer Congestion Control
The 3B20D/3B21D computer congestion control feature enables a craft to reduce real-time congestion by reducing CNI's activity on the 3B20D/3B21D computer. If not used by a craft, it remains inactive.

Critical Node Restore/Monitor
CNI's critical node monitor looks for configurations of out-of-service link nodes and direct link nodes (DLNs) that have cut its ring off from the outside world. To restore these nodes quickly, it tells the Interprocess Message Switch (IMS) to give them a user critical priority on its automatic ring recovery (ARR) priority list. The monitor also permits its ring's application to nominate nodes to this priority.

CSN
Cell Site Node
D
DCS
Digital Cellular Switch

Destination Point Code (DPC)
A unique value associated with every network component that is used for routing.

Direct Link Node (DLN)
A DLN is basically an RPCN equipped with an AP. A DLN routes the data link message traffic between cellular systems for both X.25 and SS7 messaging.

Direct Link Node 30 (DLN30)
The DLN30 has IRN2B, AP30, 3BI, and DDSBS boards. The IRN2B board provides increased performance and higher reliability.

Direct Link Node Enhanced (DLNE)
The DLNE has IRNB, AP30, 3BI, and DDSBS boards.

DSN
Digital Switch Node
E
EAI
Emergency Action Interface

EAR
Error Analysis and Recovery

Extended Access Links (E-Links) and Full Point Code Routing (FPCR)
The ELINKS/FPCR features give LECs the following benefits in their networks: additional routes to destinations, which further minimizes signaling end point (SEP) isolation; traffic that is forced to be directly routed (thus using fewer intermediate STPs) over more efficient and less problematic routes, which improves network performance; and switching of traffic between Access Links (A-Links) and E-Links, which makes network reconfiguration easier.
F
Full Process Initialization (FPI)
FPI will reduce failed and abandoned initializations. It is a faster and more reliable initialization response than the abort and boot initialization.
I
ICN
Inter-Cellular Node

IFB
Interframe Buffer Board

IMS User Node (IUN)
An IMS-provided node on the ring that, with the addition of CNI hardware, provides an interface between the ring and the transmission facility. This includes all non-RPCNs.

Integrated Ring Node (IRN)
A ring node that uses very large scale integration to combine the node processor and both ring interfaces into one circuit pack. There are several versions of the IRN referenced in this document: the IRN (UN303), the IRNB (UN303B), the IRN2 (UN304), and the IRN2B (UN304B). Functionally, they all serve the same purpose, but different IRN versions are used in different node types.

Interprocess Message Switch (IMS)
A common subsystem software component that provides a ring-based interfunction, interprocessor transport mechanism.

IUN Init with Optional Pump
This restores the node without repumping the node. It increases the system's availability through reduced down time.
L
LI
Link Interface

LIN-E
Link Interface Node - Encrypted

LIN-NE
Link Interface Node - Nonencrypted

Link Node (LN)
A node on the ring where digital information enters from or exits to the transmission facility.
M
MCRT
Maintenance Cathode Ray Tube

MDL
Memory Data Link

Message Switch
The portion of the IMS software that handles the sending and receiving of internal messages. There are portions of the message switch in all ring nodes and in the central processor.

Message Transfer Part (MTP)
The functional part of CCS7 that transfers signaling messages as required by all the users and also performs the necessary subsidiary functions (for example, error control and signaling security).
N
Network Interconnect (NI)
NI is used to interconnect signaling points in different North American networks that adhere to the ANSI standard specifications for the CCS7 protocol. It provides: MTP and SCCP routing to PCs in nonlocal networks, SNM and SCMG for nonlocal network PCs, administration of the associated nonlocal network routing data, new routing types to support routing to small networks and cluster-level-only routing to populated clusters, and NID-only routing.

Node Processor (NP)
The NP is the central processing unit (CPU) portion of a ring node. It controls and schedules the processes in the ring node.

Nonlocal Point Code
Any signaling point code that has a network identifier value different from the network identifier value of the local point code.

NRM
Node Recovery Monitor
O
Offline Boot (OFLBOOT)
The OFLBOOT feature allows the 3B20D/3B21D duplex processor of a 5ESS-2000 switch to be logically separated into two simplex machines: the ONLINE side and the OFFLINE side. This allows personnel at a 5ESS-2000 switch to cut over to a new software release with a minimum of downtime.
P
Peripheral Routing
Provides the capability to do CCS7 MTP and SCCP routing in a node on the ring.

Preventive Cyclic Retransmission (PCR)
PCR is an algorithm for Level 2 error correction on CCS7 or CCITT7 signaling links with a long one-way propagation delay. Each message signal unit must be retained at the transmitting signaling link terminal until a positive acknowledgement arrives from the receiving signaling link terminal. During the period when there are no new signal units to be transmitted, all the signal units that have not yet been positively acknowledged are retransmitted cyclically.

Protected Applications Segment (PAS)
CNI data that rarely changes is referred to as static data and is preserved in the protected applications segment area of 3B memory. CNI can reuse this data from PAS during CNI init level 2, saving the time that would otherwise be spent downloading the data from disk. To ensure PAS data is safe, it must be protected from accidental writes. For this purpose, CNI has improved protection of the PAS area.
R
Ring
Refers collectively to the RPCNs and IUNs that are serially connected to one of two circular busses. The ring provides 4 megabyte data paths in both directions between adjacent nodes and can uniquely address up to 1,024 nodes.

Ring Application Processor
A modified 3B15 computer used in the standard multiapplication real time node that performs processing on the ring.

Ring Configuration
For various reasons, the ring is reconfigured under control of the 3B20D/3B21D computer to isolate the faulty segment.

Ring Generic Access Package (RGRASP)
RGRASP is a debugging tool for CNI ring nodes.

Ring Interface (RI)
An RI is one of two circuits in a ring node that interfaces the node processor to the ring. Each RI can access either ring 0 or ring 1 to insert messages onto, or remove messages from, the active ring. The heart of the circuit is a first-in first-out (FIFO) buffer that provides access to the ring yet allows messages to circulate in the ring independent of the node.

Ring Isolation
A ring configuration where ring nodes are isolated from the active ring.

Ring Peripheral Controller Node (RPCN)
A node on the ring where digital information is removed from the ring for transferral to the 3B20D/3B21D computer for processing or, after processing, reenters the ring.
S
Signaling Connection Control Part (SCCP)
An adjunct to the MTP layer of CCS7 which performs interpoint code subsystem status.

Signaling End Point (SEP) Dual Point Code (DUALPC)
The DUALPC feature allows Signaling End Points (SEPs) to support a two point code assignment to facilitate the change of the point code for resectoring of the SEP with minimal Signaling System Number 7 (SS7) service disruption.

SMART Node (SN)
Standard Multi-Application Real Time node. See CDN-I.

SS7
Signaling System 7
T
Turbo CDN
See CDN-II.
W
WTSC
Wireless Technical Support Center
Index

A
About this document, xv
    comments, xix
Automatic Diagnostics and Restorals, 6-55

C
Circuit Pack Trouble Location, 6-24

D
Diagnostic Listings, 6-41
Diagnostic Message Structure, 6-6

E
Equipment Description, 7-1

G
Global Positioning System, AC-5

H
Handling Precautions, 7-1
Hardware and Interfaces, 6-2

I
Interactive Diagnostics, 6-70
IRN CDN-I Diagnostic Phases, 6-18
IRN DLNE Node Diagnostic Phases, 6-14
IRN LN (LI4S/SS7) Node Diagnostic Phases, 6-12
IRN LN (LIN-E/SS7) Node Diagnostic Phases, 6-11
IRN2 CDN-II/CDN-IIx Diagnostic Phases, 6-20
IRN2 CDN-III Diagnostic Phases, 6-22
IRN2 DLN30 Node Diagnostic Phases, 6-15
IRN2 DLN60 Node Diagnostic Phases, 6-17

L
LNs with Unequipped LI Boards - MV Updates, 6-42

M
Manual (Unit) Diagnostics, 6-56
Manual Diagnostics Using the 1106 Display Page, 6-59
Manual Diagnostics Using the DGN Command, 6-61

N
Node Diagnostic Phases
    IRN CDN-I, 6-18
    IRN DLNE, 6-14
    IRN LN (LI4S/SS7), 6-12
    IRN LN (LIN-E/SS7), 6-11
    IRN2 CDN-II/CDN-IIx, 6-20
    IRN2 CDN-III, 6-22
    IRN2 DLN30, 6-15
    IRN2 DLN60, 6-17
Node Phase Descriptions, 6-9

O
Operating System Diagnostics, 6-75

P
Performing Diagnostics, 6-6
Power Packs and Fusing, 7-2

R
RAP Diagnostic Firmware, 6-69
Ring Node Addressing, 6-43

S
System Diagnostics, 6-8
System Maintenance Interfaces, 6-5

U
Unexplained Loss of Token, B-5
