DS Real Time FAQ Final

FAQ: Data Services Real Time Set Up

Assumptions: The setup has at least two Web Servers and two Job Servers/Access Servers.
How do I load balance a Real-Time job?
The Access Server acts as a software load balancer, distributing real-time message requests across
1-to-many service providers (job server instances). To provide failover for the Access Server
and/or Web Server (when using web services), you need an external load balancer solution.
One example using an external load balancer (see the more detailed diagrams below):
Load balancer -> Web Services -> Access Server
              -> Web Services -> Access Server
To optimize Real-Time jobs or use multiple job servers, you can designate multiple service
providers as noted below. Note: when the Job Server service/daemon is not running, the Access Server
will be down. In addition, when web services are used there is a connection pool between
the web service layer and the Access Server(s). The connection pool settings can be adjusted on the
web server to improve scalability and performance. For further information, see the Data Services
Integrator's Guide.
When more than one environment is set up for failover and/or load balancing, you would typically
replicate the environments. There is currently no automated way to do this; longer term, engineering
plans to improve the product to make it easier to replicate environments. For now, there are detailed
steps on how to manually replicate the management console and Access Server configuration by
copying and editing the XML configuration files.

How does a Web Service Real-Time job utilize multiple job servers?
When a web service request is made, it is sent to the DS Web Service layer via HTTP/SOAP,
and the Web Service routes it to the Access Server. When using the Message Client
APIs, requests are sent directly to the Access Server, bypassing the web service layer and thus
improving performance. You need to set up Real-Time Services and, for each service, define
which job servers will handle the requests and how many service providers (AL_ENGINE processes)
will be instantiated on each job server to service the requests. You MUST manage the minimum and
maximum instances properly in order to scale the service on each job server. DOP (degree of
parallelism) does NOT scale real-time jobs, but the number of service providers (processes) does.
You need to balance load against memory consumption per physical deployment and the number of
services a job server is supporting.
Recommendation: Use separate job servers (physical/virtual) for batch processing and for
real-time processing to help ensure that SLAs for real-time requests are met. For further performance
optimization, see the SAP BusinessObjects Data Services Performance Optimization Guide and the SAP
BusinessObjects Data Services Integrator's Guide.
Can web service requests be sent directly to the Access Server or go through the web tier from the Load Balancer?
For web service requests, the load balancer goes in front of the web server, because the web service
layer is implemented as a Java application running on the web server. The Java/C++ message
client, by contrast, works directly with the DS Access Server. Keep in mind that this API's TCP/IP
connection is persistent. The load balancer must keep routing the connection to the same Access
Server each time, using the load balancer's persistent connection capabilities. When using a load
balancer with message clients, set up the message clients to automatically reconnect and retry if a
failure occurs. At that point the load balancer will automatically forward the request to an active
node to establish a new connection. In this case the load balancer balances connections across
1-to-many Access Servers, but each invoke() call has to go to the same Access Server that holds the
connection, which in turn balances the load across the 1-to-many job servers set up as service
providers for the service being invoked.
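The reconnect-and-retry behavior recommended for message clients can be sketched as follows. This is a minimal illustration, not the Data Services message client API: the `connect` callable, the `ConnectionLost` exception and the `invoke(service, xml)` method are hypothetical stand-ins for whatever the real Java/C++ client library provides.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical error raised when the persistent TCP connection drops."""


def invoke_with_retry(connect, service, request_xml, retries=3, delay=0.0):
    """Send one request, reconnecting through the load balancer on failure.

    connect: zero-argument callable returning an object with
             invoke(service, xml); a stand-in for the real client library.
    """
    client = None
    last_error = None
    for attempt in range(retries + 1):
        try:
            if client is None:
                # A fresh connection lets the load balancer pick an active node.
                client = connect()
            return client.invoke(service, request_xml)
        except ConnectionLost as exc:
            last_error = exc
            client = None  # drop the broken connection before retrying
            if attempt < retries:
                time.sleep(delay)
    raise last_error
```

Because the connection is persistent, the client holds on to it between calls in real use; the sketch only shows the point at which a new connection is requested.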
Does Job Server return the result set to the calling Web Services?
Request and response are a continuous (synchronous) flow in and back out through the same processes.
It is not asynchronous. However, most web service toolkits provide an asynchronous interface that
provides a callback and spawns a separate thread to manage the synchronous request to Data
Services. Data Services is set up to handle connection failures between application components.
The availability of the Job Server and Access Server is ensured by the Job Service, which pings the
servers every 30 seconds. If the Access Server stops for some unknown reason, the Job Service
restarts it. There is a parameter in DSConfig.txt to reduce this interval, but 30 seconds is generally ideal.
However, customers should set up their client to properly handle failures, whether from a
connection failure within Data Services that results in an exception on the client, or from a failed
connection between the client and the web service or Access Server.
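The callback-plus-thread pattern that web service toolkits offer around a synchronous request can be sketched like this. It is an illustration only: `sync_call` is a hypothetical placeholder for the blocking request to Data Services, and the callback receives either the result or the error so the client can handle connection failures as described above.

```python
from concurrent.futures import ThreadPoolExecutor


def call_async(executor, sync_call, request_xml, on_done):
    """Run a blocking request on a worker thread.

    on_done(result, error) is called once the request finishes;
    exactly one of the two arguments is None.
    """
    def _finished(future):
        error = future.exception()
        on_done(None if error else future.result(), error)

    future = executor.submit(sync_call, request_xml)
    future.add_done_callback(_finished)
    return future
```

A caller would typically create one `ThreadPoolExecutor` for the application and pass it in, so the number of concurrent synchronous requests stays bounded.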

What would be the optimal placement of an Access Server?
Place one Access Server on any one of the Job Servers. As noted above, each Real-Time Job (RTJ) is
associated with a list of Job Servers where it can run. All clients send their XML messages to the
Access Server, which looks for an idle service provider and sends the XML to that one. For failover
you might want a second Access Server, so in the end you will probably have one Access Server per
Job Server and make sure that 1/3rd of the requests are sent to one of the Access Servers. As you
scale up, you can have 1-to-many job servers providing services through 1 Access Server. At some
point the Access Server/Web Server could become the bottleneck, and you may need additional ones
for performance or purely for failover.
As for the SAP connectivity, I would use the term "RFC": we talk to SAP via the RFC protocol.
How can an external load balancer know that a DS environment is running properly?
The Access Server acts as a software load balancer within the application, across the 1-to-many job
servers set up to handle real-time requests. However, the Access Server would be your single point of
failure, so most customers use a hardware-based load balancer. Each load balancer technology has
different methods for monitoring a particular node (Data Services instance) in a pool to ensure it is
accessible. Sometimes it is a simple TCP/IP connection test; other times it is an HTTP request whose
returned content is analyzed. Most allow an HTTP request with a given HTTP body and the ability to
compare the response against a truth string to ensure availability; others can use SNMP, a special
configuration for calling web services (sending SOAP messages), and so on. The customer will need to
select the monitoring method that best suits their needs. In general, the simple TCP/IP connection
test only checks the web server for availability, NOT the Access/Job Servers. So the best way to gain
more confidence that a node is up is to create a test real-time job with minimal overhead and have
the load balancer send a test SOAP document as the request. A tool like soapUI will generate a test
input SOAP document and give you the return document, which makes it easy to configure the load
balancer. This works well for the web services interface but will NOT work for the message client
libraries: the message is a proprietary binary format and cannot easily be generated by a load
balancer to test node availability when connecting directly to the Access Server. So for now, a
simple TCP/IP connection test to the Access Server should suffice.

One of the most common devices used by our customers is the F5 Big-IP solution. More details on its
monitoring capabilities can be found here:

https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ltm_configuration_guide_10_0_0.
pdf.html/Configuration_Guide_for_BIG-IP_Local_Traffic_Management.pdf
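The "send a test SOAP document and compare the response against a truth string" check described above can be sketched in a few lines. This is only an illustration of what a load balancer monitor does: the URL, the request body and the truth string are placeholders you would take from soapUI for your own test service, and a real Data Services endpoint may require additional headers (for example SOAPAction) that are not shown here.

```python
import urllib.request

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(body_xml):
    """Wrap a request body in a SOAP 1.1 envelope."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<soapenv:Envelope xmlns:soapenv="%s">'
        "<soapenv:Body>%s</soapenv:Body>"
        "</soapenv:Envelope>" % (SOAP_ENV, body_xml)
    )


def response_is_healthy(status, body, truth):
    """A node counts as up only if it answers 200 and returns the truth string."""
    return status == 200 and truth in body


def check_node(url, body_xml, truth, timeout=5.0):
    """POST the test document to one node; any network error means 'down'."""
    data = build_soap_request(body_xml).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            text = response.read().decode("utf-8", "replace")
            return response_is_healthy(response.status, text, truth)
    except OSError:
        return False
```

Because the request passes through the web server, the Access Server and a job server, a positive result says more about the node than a bare TCP connection test.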

Can I run DS Web Services on its own server, separate from the job server?
Yes. In fact, we typically recommend that the web server tier be installed on a separate server from
the job server tier. We currently do not charge for deploying the web server to different servers;
customers pay only for the deployment of job servers.
Where is the Web Admin configuration saved?
In admin.xml, which should be kept behind the firewall.
How should the DS Web Admin server be configured for Real-Time job failover?
The management console web application and the web services/SAP interfaces are a single
intertwined application from an installation perspective and cannot be installed separately. Thus, to
set up the management console for high availability/load balancing today, you have two options:
1. Install 2 or more environments and pair each web server instance with its own Access
Server and 1-to-many job servers. Set up each environment manually by logging into the
management consoles using the direct host name (not through a load balancer). Set up
replicated environments in terms of repositories and services.
2. Install the environments and set up 1 environment. Then follow the steps to replicate the
configuration of the management console AND the Access Server (if using real-time services)
by copying and modifying the necessary XML files (see the detailed steps at the
end of this document).
Customers commonly ask how they should handle the repository when it comes to ensuring high
availability. You have two options:
1. Use the RDBMS vendor's failover support, such as Oracle RAC if using Oracle. Many
customers don't own these features of their RDBMS and are hesitant to invest in them.
From an engineering perspective, we have only tested with Oracle's RAC capabilities
and have not done official testing with any other vendor's failover capabilities.
2. If the customer doesn't want to invest in the RDBMS vendor's solution, the second most
common approach is to use 2 or more copies of the production repository, 1 copy for each
replicated environment.

Describe the steps to test a Data Services Real-Time job
You can first test your real-time job in the Designer. Open the reader and specify the
location of an input XML file that matches your defined schema, then open the loader and define the
location and file name of the output XML file you want it to create. Make sure you specify in the loader
that the file is deleted and recreated, or each run of the job will append to the existing file. Ensure
that your output looks correct when running from the Designer first.
Next, test the real-time service in one of two ways, depending on whether the customer will be using
the message client libraries OR the web service interface:
1. Message client libraries: In the "SDK" folder where the message client libraries are installed
with Data Services, you will find a samples folder with basic sample code for both
the Java and C++ APIs. Use the samples to test a real-time service if you will be using the
message client library.
Prerequisites:
1. Create a Real-Time job and test it through the Designer.
2. Configure a Real-Time service in the DS Management Console.
3. Modify the sample code to call your real-time service as you named it in the Management
Console, and change the input XML to match the schema you defined in Data Services.

2. Web services: If the customer will be using the web service interface, there are many tools
for performing simple testing without generating web service clients through a software
development tool. soapUI is a free and fairly reliable tool for
ensuring your web service is working correctly.
Prerequisites:
1. Create a Real-Time job and test it through the Designer.
2. Configure a Real-Time service in the DS Management Console.
3. Add the real-time service as a web service.
4. Install the free tool soapUI.
Start soapUI and add the WSDL URL for the real-time service defined in the Management Console.
Expand the job and display the default XML. Replace the default XML with the test XML content and run.
You will see the result set in the right pane. You will also see the request count for the real-time
service increase in the DS Management Console with each request.
Some other very useful tools: XMLSpy lets you graphically create XSDs, validate them,
and generate sample XML files. SOAPScope is a great tool for testing web services, and we've used
it internally many times for testing purposes. It isn't all that expensive, and development teams
can also use it to automate the testing of their services. Well worth a look.
Ensure that you've properly defined a namespace in the XSDs that you use as a source and target.
Over many past releases we've run into a number of issues with the handling of namespaces in
web service tools, plus a few bugs in Data Services, that we've addressed to ensure this works properly
whether or not a customer defines a namespace in their XSDs. But the best practice is to always
define a namespace for your XSDs. Make sure you do the following:
1. Define a default namespace.
2. Define a target namespace (typically the same as the default namespace).
3. Set the "elementFormDefault" attribute of the schema element to "qualified".
Here is an example:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://sap.com/im/dssugg"
targetNamespace="http://sap.com/im/dssugg"
elementFormDefault="qualified">

NOTES:
1. Ensure that you are defining unique namespaces for your reader and your loader
2. Ensure that your namespaces are unique across real-time jobs.
3. Your namespace doesn’t need to be a URL but that’s typically standard as it’s an easy way to
ensure that you have a unique namespace. It can be any string you want to use. Typically
organizations define a standard such as the company domain and different paths after the
domain for various groups/applications.
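A quick way to check a schema file against the rules above before importing it is sketched below. This is a small helper written for illustration, not part of Data Services; it checks the target namespace and elementFormDefault only, because the standard-library parser does not expose the default namespace declaration (xmlns="...") as an attribute.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def check_schema_namespace(xsd_text):
    """Return a list of problems found on the root xsd:schema element."""
    root = ET.fromstring(xsd_text)
    problems = []
    if root.tag != "{%s}schema" % XSD_NS:
        problems.append("root element is not xsd:schema")
    if not root.get("targetNamespace"):
        problems.append("targetNamespace is missing")
    if root.get("elementFormDefault") != "qualified":
        problems.append('elementFormDefault is not "qualified"')
    return problems
```

An empty list means the schema element follows the recommendations; uniqueness of namespaces across readers, loaders and jobs still has to be checked by hand.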
Performance Tuning
Some main things to watch when it comes to performance:
1. When you create your real-time services, the Min/Max instance settings determine how
the real-time service will scale up. For each instance, 1 AL_ENGINE process is created on
the server. Each AL_ENGINE process can handle 1 real-time message request at a time. So
you'll want to change these from the default of 1 based on the number of
service providers you're setting up on each job server and the number of CPUs/cores.
2. Connection pool: there is a connection pool for the web service connections to the Access
Server. The documentation covers the options you'll need to adjust in order to properly
manage this pool. We've seen a few customers that have had to change these settings from
the defaults in order to further scale up the solution for better throughput.
3. Watch the statistics in the Management Console, such as average response times and queue
lengths, so you can see whether requests are piling up in the queues and response times are
out of line with what you need to meet an SLA. NOTE: The first message request
through a given AL_ENGINE process will always take longer than each subsequent request
due to a small amount of initialization that can only happen when the first request comes
through. That may skew the request times you see in the Management Console.
4. Each AL_ENGINE process for real-time jobs can take a good amount of memory, even for
simple jobs, so monitor memory use on the server and ensure you have enough for all the
services and the number of service providers (Min/Max settings) you intend to have on a given
job server.
5. We typically recommend that customers deploy at least 2 standalone web/access/job servers
to ensure high availability. We typically expect customers to deploy those on servers that are
NOT shared with a job server that performs batch operations. This recommendation
helps ensure that real-time services meet their performance SLAs, because a running batch job
can consume most if not all of the system resources. However, if a
customer has few or no real-time requests at night and can schedule their batch jobs then,
there is no reason they couldn't share the same job server for batch and real-time operations.
NOTE: If a customer has more than 1 environment for HA/LB, they need to pay for the
CPUs ONLY for those job servers that are truly active. If they have
an Active/Passive configuration where a particular environment would ONLY be used when another
environment fails, they do not have to pay for the passive setup.
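Points 1 and 4 above come down to simple arithmetic: every service provider is one AL_ENGINE process, so the memory a job server needs grows with the sum of the Min (always running) and Max (under load) settings of the services it hosts. The sketch below illustrates that calculation; the service names and the per-process memory figure are made-up numbers, and the real figure has to be measured on your own server.

```python
def memory_budget_mb(services, per_engine_mb):
    """Estimate memory needed on one job server.

    services: mapping of service name -> (min_instances, max_instances)
    per_engine_mb: measured memory of one AL_ENGINE process (assumption)
    Returns (memory at idle, memory at peak) in MB.
    """
    idle = sum(lo for lo, _hi in services.values()) * per_engine_mb
    peak = sum(hi for _lo, hi in services.values()) * per_engine_mb
    return idle, peak


# Hypothetical example: two services on one job server, 300 MB per process.
example = {"address_cleanse": (2, 4), "customer_match": (1, 3)}
idle_mb, peak_mb = memory_budget_mb(example, 300)
print(idle_mb, peak_mb)  # 900 2100
```

If the peak figure exceeds what the server can spare, lower the Max settings or move some services to another job server.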
How do I update my data quality reference data for real-time services?
Typically customers will have 2 or more replicated environments (web/access/job servers), and each
of those environments has a separate file system containing the address directories and potentially a
separate cleansing package repository. This way they can bring down 1 environment at a time to
update the reference data, and the load balancer will automatically send requests to the still-active
environments.
How should a customer set up a load balancer to determine which environments to send requests to?
There are many ways to tackle this, and each administrator will probably have their own
preference, but they commonly ask for features in the software to make this easy. Some
common mechanisms we've seen:
1. Simply open and close a TCP/IP connection to each web server or Access Server (message client
libraries) instance. If the connection succeeds, the environment is considered
available.
2. Make a web service request (SOAP/HTTP) to the web server to call a simple real-time service.
3. Call the "ping" real-time service method from the load balancer.
4. Some load balancers also have software daemons that run on the servers where our
applications are housed and that monitor and report system usage back to the load balancer.
Typically we'd recommend #2, with the customer creating a very simple job that responds quickly. This
is more foolproof than options #1 and #3, since those really only test the availability of the web
service and not the Access Server or Job Server instances.
NOTE: You can configure the Windows service and UNIX/Linux daemon for DS (which controls BOTH
the Access and Job Server) to restart automatically if they fail for some reason. Also, if a service
provider (AL_ENGINE instance for a real-time service) fails, DS will always start a new AL_ENGINE
process to maintain at least the minimum number you've defined for each service.
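Mechanism #1 in the list above, the plain TCP/IP connection test, is the only option available when clients connect directly to the Access Server with the message client libraries. A minimal sketch of what such a monitor does is shown below; the host and port you pass in are whatever your own web server or Access Server listens on.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection can be opened and closed cleanly."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the node as unavailable.
        return False
```

Keep in mind that this only proves the port is listening; it says nothing about whether the service providers behind it can process a request.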
How are RDBMS connections handled when a failure occurs?
We have a solution to address the loss of repository connections when trying to access the repository
during job execution.
This covers the following scenarios:
1) Access to the repository by DQ transforms (i.e. a DQ transform writing report data to the
repository)
2) General DS access to the repository for writing operational metadata to AL_HISTORY and
AL_STATISTICS at the end of dataflow/job execution.
The fix applies to both real-time and batch jobs for the above scenarios.
In addition, the DB reader for the candidate selector (Match transform) is also made connection
tolerant in real-time mode. It automatically reconnects a broken connection before processing a
real-time message.
The supported databases are Oracle, SQL Server, DB2 and MySQL. Due to a technical issue with the
client, we cannot support this functionality for Sybase as a repository RDBMS.
When connections are lost and reconnecting is taking place, the trace will show:

Database connection to <sj-asamudra> broken. Retrying with retry count <6> and sleep interval <10>
seconds
Database connection to <sj-asamudra> broken. Successfully obtained connection after <2> attempts.

If the database reconnect fails, you will see the following message (this time it is an error, because
the job could not access the repository to update AL_HISTORY) and the engine aborts execution:
Database connection to <sj-asamudra> broken. Failed to obtain connection after <6> attempts.
Please verify that the database server is accessible.
Note: You will still see a green icon in the admin console in this case.
These database retries are configurable within the DSConfig.txt file:
Recycle_DB_Connection_Retry_Count=6
Recycle_DB_Connection_Retry_Sleep_Interval_In_Seconds=10
Setting Recycle_DB_Connection_Retry_Count=0 turns this feature off, which is useful while testing.
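To get a feel for how long a job may wait before the engine gives up, you can multiply the two settings. The sketch below reads the two keys from DSConfig-style text and computes that upper bound; it assumes the engine sleeps the full interval before each retry, which the trace messages suggest but which is not spelled out here.

```python
PREFIX = "Recycle_DB_Connection_Retry_"


def parse_retry_settings(config_text):
    """Collect the Recycle_DB_Connection_Retry_* keys as integers."""
    settings = {}
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith(PREFIX):
            settings[key.strip()] = int(value.strip())
    return settings


def max_retry_wait_seconds(settings):
    """Upper bound on time spent sleeping between reconnect attempts."""
    return (settings[PREFIX + "Count"]
            * settings[PREFIX + "Sleep_Interval_In_Seconds"])


defaults = parse_retry_settings(
    "Recycle_DB_Connection_Retry_Count=6\n"
    "Recycle_DB_Connection_Retry_Sleep_Interval_In_Seconds=10\n"
)
print(max_retry_wait_seconds(defaults))  # 60
```

With the values shown above, a real-time request could therefore be held for up to about a minute, which is worth comparing against the client's own timeout.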

Identifying and debugging XML Schema or other web service issues at design time
Sometimes there are issues with the XML schema definitions being used, or other problems, that
prevent the real-time service from being properly added to the WSDL. This is not common but
has come up a few times. When that happens, the usual sign is that you won't see your real-time
service in the list of real-time services after adding it in the Management Console. If this happens,
view the WSDL in the Management Console; you will usually see an error message at the
very top about your real-time service and what the issue was. If you see this, the best thing is to
make sure you've tested your job in the Designer and, if need be, re-add the service. You can also
find additional detail in the log files on the web service tier. Depending on
where the error occurs, the logs should be in the $LINK_DIR/log folder. Take a look at webservice.log,
webadmin.log, error.log and so on. Submit these to customer support if you're not able
to address the issue yourself.

How is the number of real-time service providers managed with the Min/Max settings?
When a real-time service is started, the "min" number of service providers is started automatically.
If the number of service requests in the queue is larger than the number of service providers, new
provider instances are created until the max instances setting is reached.
If a service provider instance is idle for 10 minutes AND the number of instances is greater than the
minimum, it shuts down.
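The scale-up and scale-down rule just described can be written as a small decision function. This is a sketch of the rule as stated here, not the Access Server's actual implementation; in particular, adding or removing one instance per evaluation is an assumption.

```python
IDLE_LIMIT_MINUTES = 10


def next_provider_count(current, queued, min_instances, max_instances,
                        idle_minutes=0):
    """Return the provider count after applying the Min/Max rule once.

    current: service providers running now
    queued: requests waiting in the service queue
    idle_minutes: how long the longest-idle provider has been idle
    """
    if queued > current and current < max_instances:
        return current + 1  # queue is longer than the provider pool
    if idle_minutes >= IDLE_LIMIT_MINUTES and current > min_instances:
        return current - 1  # idle provider above the minimum shuts down
    return current
```

For example, with Min=1 and Max=4, one running provider and three queued requests yields two providers, while four running providers stay at four no matter how long the queue grows.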
How to replicate the management console and, if you are using Real-Time services, how to replicate the configuration stored with the Access Server
Scenario: We tested replicating the Data Services Web Server configuration using 2 servers. The steps follow; the same steps can be used for more than 2 servers.
Sample failover/load balancing architecture Visio diagrams are listed at the end of this document.

How to replicate the management console and how to replicate the configuration stored with the Access Server if you're using real-time services:
1 Install Data Services Web Server on the 2 different servers (machines)
2 Configure repositories, Access Server, CMS connection and other parameters on Web Server 1 (access the Management Console by connecting directly to this server; don't connect through the F5 switch)
3 Stop both Web Servers
4 Copy the following files from $LINK_DIR/conf on Web Server 1 to the $LINK_DIR/conf directory on Web Server 2:
admin.xml
mdreport.ini (if it exists)
dqreport.xml (if it exists; this file is created only if function areas and rules are configured in Data Validation)
sapconnections.xml (if it exists; this file is created only if the SAP RFC Server Interface is configured)
5 After the copy is completed, open admin.xml on Web Server 2 and update the Access Server details. The Access Server name and port are stored in the server-name and server-port tags in the <accessservers> XML node:
<server-name>VANPGWIN195</server-name>
<server-port>4000</server-port>
6 Restart both Web Servers
Note: Do not change any configuration by accessing the Management Console URL via the F5 switch, which can result in the configuration files getting out of sync. The other limitation is that you will not be able to monitor the real-time services from the DS Management Console via F5: the Access Server information for different HTTP sessions can come from any of the web servers, so depending on which web server processes the request a different Access Server is displayed, and the page may give errors when you try to access the information for real-time services.
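Step 5 above can be scripted if you replicate environments often. The sketch below rewrites the server-name and server-port values inside the <accessservers> node. Only those three tag names come from this document; the surrounding layout of admin.xml used in the example is an assumption, and the function changes every Access Server entry it finds, so review the result by hand if more than one is registered. Run it only while the web servers are stopped, and keep a backup of the file.

```python
import xml.etree.ElementTree as ET


def retarget_access_server(admin_xml_text, host, port):
    """Point every Access Server entry in admin.xml at host:port."""
    root = ET.fromstring(admin_xml_text)
    for node in root.iter("accessservers"):
        for name in node.iter("server-name"):
            name.text = host
        for server_port in node.iter("server-port"):
            server_port.text = str(port)
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment; the real file contains many more settings.
sample = (
    "<admin><accessservers><accessserver>"
    "<server-name>VANPGWIN195</server-name>"
    "<server-port>4000</server-port>"
    "</accessserver></accessservers></admin>"
)
updated = retarget_access_server(sample, "WEBSERVER2-AS", 4000)
```

Writing `updated` back to $LINK_DIR/conf/admin.xml on Web Server 2 completes the step; the host name used here is a placeholder.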
The Access Server and real-time service configuration is stored in the AS.xml file in the Access Server directory.
1 Start the Access Server on Machine 1
2 Configure Real-Time Services by opening the Management Console directly on the web server machine with which this Access Server is associated
3 Stop the Access Server on both machines
4 Copy the AS.xml of the Access Server configured on Machine 1 to the Access Server directory on Machine 2
5 Change the job server information of each service. It is stored in the host and port attributes of the <JobServer> XML tag. Replace these attributes with the job server host and port of this machine (Machine 2):
<JobServer connection=" -SVANPGWIN195 -NOracle -UDS122 -P;994973A404D5BDB1552BC53D9587EAFDF3101D2A01633F444501B944F039A864" enabled="1" host="SJ-W-01-TEST" max="1" min="1" port="3595"/>
6 Start both Access Servers
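Step 5 of the Access Server procedure can be scripted in the same spirit. The sketch below sets the host and port attributes on every <JobServer> element in AS.xml. The element and attribute names are taken from the example above; the wrapper elements in the sample are assumptions, and the function treats all services alike, so it only fits the case where every service on Machine 2 should use the same job server. Run it while the Access Server is stopped.

```python
import xml.etree.ElementTree as ET


def retarget_job_servers(as_xml_text, host, port):
    """Set host/port on every JobServer element; return (xml, count)."""
    root = ET.fromstring(as_xml_text)
    changed = 0
    for job_server in root.iter("JobServer"):
        job_server.set("host", host)
        job_server.set("port", str(port))
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical fragment; connection string shortened for readability.
sample = (
    '<AccessServer><Service name="demo">'
    '<JobServer connection="..." enabled="1" host="SJ-W-01-TEST" '
    'max="1" min="1" port="3595"/>'
    "</Service></AccessServer>"
)
updated, count = retarget_job_servers(sample, "MACHINE2", 3500)
```

The returned count is a quick sanity check that the number of updated entries matches the number of job server assignments you expect.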
Useful Diagrams to depict failover/load balancing setup
