White Paper On Avaya Aura™ Application Enablement
This white paper is intended for a software application developer or a systems engineer
who is responsible for deploying an application or an AE server in an HA configuration.
Uninterrupted telephony is important for many enterprises, especially for mission-critical
applications. Avaya Aura™ Application Enablement (AE) Services on System Platform
(SP) Release 5.2 supports a high availability (HA) cluster of two nodes. The active server
node automatically fails over to the standby node in the event of a hardware failure.
Client applications are able to re-establish communication with the AE Services cluster
when the failover is complete. This failover feature is not supported on the AE Services
5.2 software-only and bundled offerings.
Avaya Aura™ Communication Manager (CM) provides the Processor Ethernet (PE)
interface for direct connection to the main media server. This feature reduces cost by not
requiring a CLAN for communications. However, unless the Time To Service feature is
used, a DMCC client application must re-establish any H.323 registrations that are
terminated when an interchange occurs between a duplicated pair of CMs that are
communicating with an AE server over PE. Furthermore, an AE Server 5.2 that
communicates over PE does not support the ESS and LSP configurations: only a single IP
address can be administered on the AE Server 5.2 for a PE connection, whereas ESS and
LSP servers have their own unique IP addresses, which always differ from that of the
main media server.
Avaya recommends the following:
In this CM ESS configuration, the applications and associated AE Server at the remote
sites are always active and are supplying functionality for the local resources at the
remote site. As described in later sections, this type of configuration ensures the shortest
outage.
[Figure 1a: Headquarters with an active AE Server and a primary S8720. Figure 1b: Headquarters with an AE Server and a primary S8720.]
The AE Services on System Platform 5.2 release provides higher availability relative to
the software-only, bundled and earlier releases. This configuration monitors the server
nodes for loss of network connectivity and hardware failure events. This information is
used to detect faults and decide when to fail over from the active node to the standby node
in the server cluster. The AE Services on the standby node are restarted when a failover
event occurs. This feature enables AE Services to continue to provide service to client
applications with reduced downtime when a hardware failure event occurs. In addition,
the System Platform will restart the AE Services virtual machine if it does not
maintain its sanity keep-alive because of a software fault condition.
In addition to the SP failover feature, DMCC provides recovery from a software fault or a
shutdown that does not allow the DMCC Java Virtual Machine (JVM) process to exit
normally. The DMCC Service Recovery feature is available on all AE Services
configurations: software-only, bundled and on System Platform. When the DMCC JVM
process is restarted after an abnormal exit, the DMCC service is initialized from persisted
state information on the hard disk. This persisted state information is saved during normal
operation and represents the last known state of the DMCC service prior to a JVM
abnormal exit. The state information includes session, device, device/call monitor and
H.323 registration data.
From a client application’s point of view, the DMCC recovery appears as a temporary
network interruption that requires the client to re-establish any disconnected sessions.
When the client application re-establishes the session, the DMCC service will send
events for any resources that could not be recovered. These include monitor stopped
and unregistered event messages, which enable the client to determine what needs to be
restored through new service requests. Otherwise, the client continues to operate as
usual.
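The client-side bookkeeping described above can be sketched as follows. This is a minimal illustration, not the DMCC API: the event names, the `ClientState` class and the `resources_to_restore` function are hypothetical, and the sketch assumes the application tracks its own monitors and H.323 registrations by device ID.

```python
from dataclasses import dataclass, field

# Hypothetical event labels; the actual DMCC event types are not given in this paper.
MONITOR_STOPPED = "monitor_stopped"
UNREGISTERED = "unregistered"


@dataclass
class ClientState:
    """What the application believed it had before the interruption."""
    monitors: set = field(default_factory=set)       # device IDs with active monitors
    registrations: set = field(default_factory=set)  # device IDs with H.323 registrations


def resources_to_restore(state, events):
    """Return (monitors, registrations) that must be requested again.

    `events` is an iterable of (event_kind, device_id) pairs received after
    the session has been re-established. Resources with no event are assumed
    to have been recovered by the DMCC service.
    """
    lost_monitors = set()
    lost_registrations = set()
    for kind, device_id in events:
        if kind == MONITOR_STOPPED and device_id in state.monitors:
            lost_monitors.add(device_id)
        elif kind == UNREGISTERED and device_id in state.registrations:
            lost_registrations.add(device_id)
    return lost_monitors, lost_registrations
```

The application would then issue new monitor and registration requests only for the returned device IDs and leave everything else untouched.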
[Figure 2: Normal operation. Labels: Headquarters, Remote site A, G650 Gateways, Active AE Server, Application, Primary S8700, WAN.]
In case of a WAN outage (as shown in Figure 3 below), each remote site becomes
independent and provides service without major interruption to endpoints and
applications. At remote site A, which has a G700 media gateway, the LSP goes online
and the G700 media gateway connects to that local LSP. It is recommended that the
primary search list of the G700 media gateway be configured to contain CLANs of only
one site (i.e., headquarters in this case). The secondary search list should contain the LSP
at the local site (site A in this case).
The AE server will detect connectivity failure with the main site (headquarters) and will
notify its applications. The applications will have to direct the AE server to move the
connectivity over to the LSP (described in detail further below).
The G650 media gateways at the remote sites (sites B and C) will connect to the local
ESS server in case of a WAN outage. The AE server will automatically reconnect
to the ESS server through the G650 media gateways. This will be transparent to the
AE server and its applications, except for what will appear to be a brief network outage
(described in detail further below).
The site at the headquarters will continue to function as it did previously in case of a
WAN outage.
Note: The remote sites and the headquarters site will not be able to access each
other’s resources during a WAN outage.
[Figure 3: WAN outage. Labels: Headquarters, Remote site A, Remote site B, Remote site C, G650 Gateways, AE Server, Application, Primary S8700, WAN.]
If the main headquarters site is completely down but the WAN is functional (as shown in
Figure 4 below), the remote sites will behave similarly to the WAN outage scenario
described above, but with one important exception. With the ESS feature, the system
will attempt to stay as “whole” as possible. Since the WAN is still intact, all of the G650
gateways end up being controlled by the same ESS server at Remote Site B. Since the
application and AE Server were configured to support only the local resources at the
remote sites, the application continues to function the same whether the sites operate
independently (WAN failure) or jointly (normal operation or site destruction at
headquarters).
[Figure 4: Site destruction. Labels: Headquarters, Remote site A, G650 Gateways, AE Server, Application, Primary S8700, WAN.]
b. CallInformation Services within DMCC, Call Control Services within
DMCC, and all other CTI services
The CallInformation and Call Control services within DMCC and all other
CTI Services (TSAPI, CVLAN, DLG and JTAPI) use the Transport (AEP)
link to communicate with Communication Manager. The transport links
(Switch Connections) on each AE Server should be administered to
communicate only with CLANs in gateways that are local to the AE Server’s
site. If the system is configured in this fashion, the application / AE Server
will not have to take any unusual action to recover in the event that a gateway
loses connectivity to the primary S8700 and transitions to an ESS server.
The AE Server will then automatically attempt to reestablish the AEP links.
Note that it takes a little over 3 minutes for the media gateway (like G600 or
G650) to connect to an ESS server. Once the media gateway has registered
with the ESS server, the AE Server will succeed in establishing its AEP links
very soon thereafter (after around 30 seconds). As soon as an AEP link is
established, the application will be notified that the CTI link is back up, and
the application can begin to resume normal operations. Since no run-time state is
preserved on a transition to an ESS server (as there is with an interchange on an
S8700), all application state must be reestablished. Note that, from the AE server’s
and application’s perspectives, the failure scenario and recovery actions appear
exactly the same as a long network outage between the AE Server and the gateways.
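The timing above suggests that an application should wait for several minutes before treating the CTI link as lost for good. The following is a minimal sketch of such a wait loop. The function name, the `is_link_up` callback and the polling approach are assumptions made for illustration; a real application would normally react to the link up notification from AE Services rather than poll.

```python
import time

# Figures taken from the text; both are approximate.
GATEWAY_TO_ESS_SECONDS = 180  # "a little over 3 minutes", treated here as a lower bound
AEP_LINK_SECONDS = 30         # "around 30 seconds"


def wait_for_cti_link(is_link_up, timeout_s, poll_s=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll `is_link_up()` until it returns True or `timeout_s` has elapsed.

    `is_link_up` is a hypothetical callback supplied by the application.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    Returns True if the link came up in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if is_link_up():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A timeout of roughly twice the expected recovery time, for example `2 * (GATEWAY_TO_ESS_SECONDS + AEP_LINK_SECONDS)`, leaves some margin. Once the function returns True, the application must re-create all of its state, because none is preserved on a transition to an ESS server.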
There is one important difference between the ESS behavior of current versions
of AE Services (i.e., AE Services 3.1 and above) and that of AE Services 3.0.
If an AE 3.0 Server ends up with AEP links to gateways that
are controlled by different ESS or primary call servers (i.e. a fragmented
system), the system will not behave in a sane fashion. Some messages will be
sent to one call server, and others will be sent to other call servers, with no
deterministic behavior with respect to where messages are being sent. Recall,
however, that the ESS feature attempts to keep as many gateways as possible
under the control of a single call server. Given that this is the case, it is
possible to configure the system such that it is extremely unlikely that a 3.0
AE Server will have AEP links to different fragments of a survivable system.
The safest configuration is to have the 3.0 AE Server talk only to CLANs
resident in a single gateway. Avaya recommends that, wherever possible, all
gateways through which an AE Server connects be on the same LAN,
preferably even on the same Ethernet switch, to avoid fragmentation. In such a
configuration, it is virtually certain that the gateways will all be controlled by
the same controller at all times, and the system will therefore always operate
in a sane fashion.
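The fragmentation condition described above can be stated as a simple check. This is an illustrative sketch only: the function name and the input mapping are hypothetical, and the paper does not say how an application would learn which call server controls each gateway.

```python
def is_fragmented(link_controllers):
    """Return True if the AEP links terminate on more than one call server.

    `link_controllers` maps each AEP link (for example, a CLAN address) to an
    identifier of the call server currently controlling that CLAN's gateway.
    """
    return len(set(link_controllers.values())) > 1
```

With all CLANs resident in a single gateway, the mapping can only ever contain one controller, which is why that configuration is the safest for a 3.0 AE Server.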
Starting with Communication Manager 3.1, new administration forms have been
created to control the behavior of survivable processors (i.e., LSPs and ESSs).
In particular, the Enabled field on the add/change survivable-processor forms
can be set to one of the following three values:
• "n" or no: This means that this processor channel will be disabled on the LSP or
ESS.
• "i" or inherit: This means that this link is to be inherited by the LSP or ESS
exactly as administered on the main. When set to "i" the remaining data on the
line is recopied from the translations from the main and may not be edited. Note
that this does not mean that the link will work. For example, if the link is
administered to a CLAN and an attempt is made to inherit this link on an LSP, the
link won’t work because the LSP has no CLAN. It is most appropriate to use "i"
for a link administered via procr or for an ESS.
• "o" or overwrite: This entry will cause the link field to change to "p" and be
uneditable. The data entered on this line will overwrite the processor channel
shown on this line when the data is file-synchronized to an LSP or ESS.
Avaya recommends different administration settings for the Enabled field depending
on the configuration of a system (as shown below).
Configuration          Administration
Only LSPs (no ESSs)    Set Enabled to “o”
Both LSPs and ESSs     Set Enabled to “n”
Table 1
For configurations with LSPs and no ESSs, Avaya recommends setting the Enabled
field to “o” (overwrite). This will allow automatic transition to a local LSP after
detecting connectivity failure to the main site.
If both LSPs and ESS servers are configured, Avaya recommends setting the Enabled
field to “n” (disabled) for LSPs. Setting the Enabled field to “o” (overwrite) for LSPs
will most likely result in undesired behavior. Consider the scenario in Figure 4 where
the main headquarters site is completely down but the WAN is still functional. If the
Enabled field is set to “o” for the LSPs, then the AE Server will always connect to an
LSP first since it would be available for connections before any of the ESS servers.
Remember, it takes a little over 3 minutes for a media gateway (like a G600 or G650)
to connect to an ESS server. Additionally, while connected to the LSP, the AE Server
will deny (i.e. immediately drop) subsequent connections to any ESS servers.
However, in this scenario, it would have been preferable to connect to one of the ESS
servers first since it’s possible that the ESS server had connectivity and full control of
the system.
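The recommendation in Table 1 can be captured in a small helper, for example in a provisioning checklist script. The function name is hypothetical, and the sketch covers only the two configurations that Table 1 addresses.

```python
def recommended_enabled(has_lsp: bool, has_ess: bool) -> str:
    """Return the recommended value of the Enabled field for LSPs per Table 1.

    "o" (overwrite) when only LSPs are configured.
    "n" (disabled) when both LSPs and ESSs are configured.
    """
    if has_lsp and has_ess:
        return "n"
    if has_lsp:
        return "o"
    # Table 1 makes no recommendation for configurations without LSPs.
    raise ValueError("Table 1 gives no recommendation for this configuration")
```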
Note that if the Enabled field is set to “n” (disabled), the AE server will detect
connectivity failure to the main site, but it will not automatically transition to a local
LSP. Depending on the link type, the applications using the AE server need to
perform different actions, as described below.
When the connectivity to the main server is back up, the LSP would need to
be put in offline mode either manually or automatically (if configured
properly). The DMCC service will detect connectivity failure to the LSP and
will send an unregistered event to the application for each DMCC extension.
Avaya recommends that the application then retry connecting to the main
Communication Manager through the DMCC service on the AE server.
AE Services has a feature in 3.0 that allows the use of a symbolic name
for a list of IP addresses (i.e., a Gatekeeper list). Once administered through
the AE Services OAM web-page, the application can then use the
symbolic name to get a DeviceID (i.e. DMCC softphone extension) for a
particular Communication Manager. This feature allows the application
to easily switch over the DMCC softphones to the LSP using the symbolic
name.
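The switch between the main Communication Manager and the LSP by symbolic name can be sketched as a small piece of application state. The class, its methods and the symbolic names in the usage note are hypothetical; the sketch only decides which symbolic name to use for the next registration attempt and does not call the DMCC API.

```python
class SoftphoneFailover:
    """Tracks which symbolic name each DMCC softphone extension should use."""

    def __init__(self, main_name, lsp_name):
        self.main_name = main_name  # symbolic name administered for the main CM
        self.lsp_name = lsp_name    # symbolic name administered for the local LSP
        self._current = {}          # extension -> symbolic name currently in use

    def register(self, extension):
        """Initial registration always targets the main Communication Manager."""
        self._current[extension] = self.main_name
        return self.main_name

    def on_unregistered(self, extension):
        """Return the symbolic name to retry with after an unregistered event.

        An extension that was on the main CM moves to the LSP; an extension
        that was on the LSP retries the main CM.
        """
        current = self._current.get(extension, self.main_name)
        target = self.lsp_name if current == self.main_name else self.main_name
        self._current[extension] = target
        return target
```

For example, with `SoftphoneFailover("CM_MAIN", "CM_LSP_A")`, the first unregistered event for an extension yields `"CM_LSP_A"`, and the next one (after the LSP goes offline) yields `"CM_MAIN"` again.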
sends a link down event to the application. Avaya recommends that the AE
server be pre-configured to have the LSP administered under the main site
switch name through the AE Services OAM web-page. This connection will
not be active as long as the LSP is not up. The application will have to use
System Management Services to dynamically configure the Transport (AEP)
link (using the change ip-services command) on Communication Manager
running on the LSP once it receives the Call Information link down event. The
application should use the WSDL defined in:
http://<ae-svcs-server-name>/sms/SystemManagementService.php?wsdl
with the IPService Model defined in:
http://<machine-name>/sms/ModelSchema.php?model=IPServices
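As a small convenience, the two URLs above can be built from the host names. This sketch only assembles the strings shown in the text; the function names are hypothetical, and it does not attempt to call the System Management Service itself.

```python
from urllib.parse import urlencode


def sms_wsdl_url(ae_server: str) -> str:
    """URL of the System Management Service WSDL on the given AE Services host."""
    return f"http://{ae_server}/sms/SystemManagementService.php?wsdl"


def sms_model_schema_url(machine: str, model: str = "IPServices") -> str:
    """URL of the model schema; the text uses the IPServices model."""
    return f"http://{machine}/sms/ModelSchema.php?" + urlencode({"model": model})
```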
When the connectivity to the main server is back up, the LSP would need to
be put in offline mode either manually (by giving a “reset system 4”
command in Communication Manager) or automatically (if configured
properly through the “change system-parameters mg-recovery-rule” form
in Communication Manager). In either case, the Call Information service will
detect transport link connectivity failure to the LSP and will send a link down
event to the application. The transport link to the main Communication
Manager will also come back up, for which the application will receive a link up
event from Call Information services.
Note: a) If the Transport (AEP) link has multiple CLAN addresses configured,
the application will not receive a Call Information link down event unless
connectivity to all CLANs is lost.
b) If the Transport AEP link is connected to one CLAN and the DMCC H.323
link is connected to another CLAN, it is possible that one of the connections
could be down. In this case, if LSPs are being used, then one of the links
could be on the main server and the other could be on an LSP. This will cause
undesirable behavior.
c) Avaya recommends that for remote sites with G700 gateways and LSPs
(and without a G600/G650/MCC/SCC on the same site), the transport (AEP)
link from AE Server at that remote site be configured to link to a CLAN(s)
(on a G600/G650/MCC/SCC) at the main headquarters site.
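Notes a) and b) describe two conditions that are easy to misread, so a short sketch may help. Both functions and their inputs are hypothetical; they model the stated rules and are not part of any AE Services interface.

```python
def transport_link_is_down(clan_reachable):
    """Note a): a link down event is expected only when every CLAN is unreachable.

    `clan_reachable` maps each CLAN address configured on the Transport (AEP)
    link to True (reachable) or False (unreachable).
    """
    return not any(clan_reachable.values())


def links_are_split(aep_call_server, h323_call_server):
    """Note b): True if the AEP link and the DMCC H.323 link ended up on
    different call servers (for example, one on the main server and one on an LSP)."""
    return aep_call_server != h323_call_server
```

With two CLANs configured and only one unreachable, `transport_link_is_down` returns False, so the application should not expect a link down event in that case.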
link (using the change ip-services command) on Communication Manager
running on the LSP once it receives the link down event. The application
should use the WSDL defined in:
http://<ae-svcs-server-name>/sms/SystemManagementService.php?wsdl
with the IPService Model defined in:
http://<machine-name>/sms/ModelSchema.php?model=IPServices
When the connectivity to the main server is back up, the LSP would need to
be put in offline mode either manually (by giving a “reset system 4”
command in Communication Manager) or automatically (if configured
properly through the “change system-parameters mg-recovery-rule” form
in Communication Manager). The transport link to the LSP will be down and
the application will receive a link down event. The transport link to the
main Communication Manager will also come back up, for which the application
will receive a link up event.
Note: a) If the Transport (AEP) link has multiple CLAN addresses configured,
the application will not receive a link down event unless connectivity to all
CLANs is lost.
b) Avaya recommends that for remote sites with G700 gateways and LSPs
(and without a G600/G650/MCC/SCC on the same site), the transport (AEP)
link from AE Server at that remote site be configured to link to a CLAN(s)
(on a G600/G650/MCC/SCC) at the main headquarters site.