ABC Cook Book
===============================================================================
PURPOSE
The objective of this book is to make it possible to use the ABC framework in the
best possible way for PMDB. It is extremely important to clearly understand the
ABC framework and the best practices associated with it. Most problems
encountered with the ABC framework are due to a lack of clear
understanding of the framework or improper usage of it. This
book aims to provide all the details required to make the best use of
the ABC framework for PMDB. It is not an objective of this book to explain
all the features of the ABC framework itself.
Stream
Step
ABC OF ABC
A for Audit, B for Balance, C for Control.
The only functionality of ABC that is used in PMDB is the C part, the
Control part. We will not spend any time discussing the Audit and Balance
functionality, which is not used in PMDB.
A unique identifier
An alias name.
Always set to BSMRExec
Set to BSMRLauncher or BSMRAuditLauncher
Maximum execution time in minutes
Maximum number of retries on failure
Resource DwId:
Resource Type:
Pool Count:
Score:
A SAMPLE STREAM
Given below is a sample ABC stream. It is provided to illustrate the
different Step statuses and how a stream behaves. In practice most job
Streams are much simpler than this one, often consisting of a single sequence
of Steps and sometimes a couple of parallel paths.
The main points to remember here are these:
When a Step fails, no further child Steps get executed. For
example, in the diagram, Steps M and P will not get executed because
Step J failed.
A child Step gets executed only after its parent Step has finished
successfully. For example, in the diagram, Step K will get executed only
after Step H has completed successfully.
Some Stream paths can complete successfully even if
other Stream paths fail. For example, the stream path that ends
with Step O will run to completion even if the stream path that ends with
Step P fails to complete successfully.
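The three rules above can be sketched in a few lines of Python. The diagram is not reproduced here, so the parent/child edges below are a hypothetical layout chosen only to agree with the examples in the text (J is an ancestor of M and P, H is the parent of K, and O sits on a separate path). The function name simulate is also illustrative and is not part of ABC.

```python
# Hypothetical stream layout: each Step maps to the list of its parent Steps.
# The real diagram is not available; these edges are an assumption.
PARENTS = {
    "H": [],
    "J": ["H"],
    "K": ["H"],
    "M": ["J"],
    "P": ["M"],
    "O": ["K"],
}

def simulate(parents, failing):
    """Return the outcome of every Step, given the set of Steps that fail.

    A Step runs only after all of its parents have finished successfully.
    """
    outcome = {}
    pending = dict(parents)
    progressed = True
    while pending and progressed:
        progressed = False
        for step, deps in list(pending.items()):
            if any(d not in outcome for d in deps):
                continue  # a parent has not been decided yet
            if all(outcome[d] == "success" for d in deps):
                outcome[step] = "error" if step in failing else "success"
            else:
                outcome[step] = "not executed"
            del pending[step]
            progressed = True
    return outcome

result = simulate(PARENTS, failing={"J"})
for step in sorted(result):
    print(step, result[step])
```

With Step J failing, M and P are never executed, while the path through K to O still runs to completion.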
STATE TRANSITION
This section builds on the previous section and provides some additional
details on the different States a Step can take and the state transitions
allowed.
In fact there are two things here:
1. Status:
This is based on the exit code for a Step. It indicates how the task
associated with a Step completed its execution.
2. State:
These are the transition states a Step goes through during its lifetime.
The picture below shows the different statuses a Step can move to during
the course of its execution. Some points to note are these:
I.
Once you have finished writing the stream definition XML file, you
are ready to import the definition into the ABC dictionary tables. This step is
typically performed at the time of content pack installation, so it is a one-time
task. The command to execute to do this is below.
II.
Unlike importing the definition, this is not a one-time task. The import
definition task places the stream definition into certain dictionary tables
internal to ABC. However, to execute a stream, these definitions must be
loaded into a set of ABC run time tables. This task should be executed
periodically, using a scheduler, if you want the Steps in the Stream to get
executed periodically. The commands to load a Stream definition into the ABC
run time tables are the ones below.
To load a specific stream:
abcLoadNRun -loadBatch streamId <Stream Id>
To load all the streams that are eligible: (Recommended option)
abcLoadNRun -loadBatch allStreams
Some points to Note:
A new instance of a stream cannot be loaded if an Active instance of the
stream is present in the run time tables.
The abcLoadNRun -loadBatch allStreams command will load all the
streams that are eligible (i.e. those without an Active instance in the run
time tables) into the run time tables. This means that it is not
possible to control the loading of streams individually. Each stream will
be loaded for execution again as soon as it finishes execution, depending on the
frequency at which the load batch command is invoked for all streams.
The command is called loadBatch because it loads a batch of Steps,
which is nothing but a Stream.
Debugging:
The errors and log messages generated during the load batch operation are
logged in dw_abclauncher.log in the PMDB_HOME/log folder.
III.
The load batch operation takes care of getting the Stream definitions into the run
time tables. However, this by itself does not ensure that the Steps will get executed.
The Steps get executed only when the ABC Run Steps command is invoked.
The command line is as below:
abcLoadNRun runSteps
The run steps command checks all the streams in the run time tables,
looking for Steps that are in the Waiting state. It executes all the eligible
Steps, in no specific order.
One other point to remember here is that if a Step
under execution has requested exclusive access to a resource required by
another Step that is eligible for running, the latter Step will not get invoked by
the framework.
Debugging:
The errors and log messages generated during the load batch operation are
logged in dw_abclauncher.log in the PMDB_HOME/log folder.
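A scheduler typically has to issue both commands in turn: load the eligible streams, then run the waiting Steps. The sketch below only assembles the two command lines as they are written in this document and hands them to a runner. The function names, the runner parameter and the idea of wrapping the commands in Python are assumptions made for illustration and are not part of ABC.

```python
import subprocess

# Command lines copied from the text above; nothing else is assumed about abcLoadNRun.
LOAD_ALL = ["abcLoadNRun", "-loadBatch", "allStreams"]
RUN_STEPS = ["abcLoadNRun", "runSteps"]

def load_and_run(runner=subprocess.call):
    """Run one scheduler cycle: load eligible streams, then run waiting Steps.

    `runner` takes an argument list and returns an exit code. It defaults to
    subprocess.call and can be replaced by a stub for a dry run.
    """
    results = []
    for command in (LOAD_ALL, RUN_STEPS):
        exit_code = runner(command)
        results.append((" ".join(command), exit_code))
    return results

# Dry run: record the commands instead of executing them.
issued = []

def dry_run(command):
    issued.append(list(command))
    return 0

cycle = load_and_run(runner=dry_run)
for line, code in cycle:
    print(line, "->", code)
```

Calling load_and_run() without arguments would execute the real commands, which requires abcLoadNRun to be on the PATH of the scheduling host.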
IV.
This is an optional step. It is possible to specify the list of resources to which a
Step needs exclusive access. The resource definitions can be imported after the
stream that contains the Steps has been imported. Once the resource
definition for a Step is imported, any future RUN STEPS command will take the
resource into consideration. If a resource required by a Step is already held by
another Step, i.e. if another Step with the same resource requirement is
running, then this particular Step will not get executed. It will remain
in the Waiting state.
The command to import a resource definition is as below:
abcResourceLoader file <Resource file name>
Note: The Resource definition framework is very powerful. Though the only resource
type supported is Table, no check is actually performed to see whether the
resource really is a table. It is just a namespace with a string
value. Using this feature it is possible to achieve two different objectives:
1. Provide exclusive access to resources and help resolve locking issues with
critical resources. If the pool count, explained earlier, is set to 1, then
access to the resource becomes exclusive.
2. Control the number of processes launched. This feature is not currently
used but it is possible. For example, if you set a pool count of 5 for a
resource named Summary and attach the resource to all the trend_sum
jobs, then a maximum of 5 instances of trend_sum can run at any
point of time. This way the number of processes running can be
controlled.
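The pool count behaves much like a counting semaphore: a Step may start only while fewer Steps than the pool count hold the resource. The following is a minimal sketch of that idea and not the ABC implementation. The class name ResourcePool and its methods are hypothetical.

```python
class ResourcePool:
    """Toy model of a named resource with a pool count (illustrative only)."""

    def __init__(self, name, pool_count):
        self.name = name
        self.pool_count = pool_count
        self.holders = set()

    def try_acquire(self, step):
        """Return True if the Step may run now, False if it must keep waiting."""
        if len(self.holders) >= self.pool_count:
            return False
        self.holders.add(step)
        return True

    def release(self, step):
        """Called when the Step finishes, freeing one slot in the pool."""
        self.holders.discard(step)

# Example from the text: a pool count of 5 on a resource named Summary.
summary = ResourcePool("Summary", pool_count=5)
started = [summary.try_acquire(f"trend_sum_{i}") for i in range(1, 8)]
print(started)  # the sixth and seventh jobs are held back

summary.release("trend_sum_1")
retry_ok = summary.try_acquire("trend_sum_6")
print(retry_ok)  # a slot is free again
```

In this model a pool count of 1 means that only one Step at a time can hold the resource.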
I.
The command that allows you to manage the Steps is abcStatusSetter. The name,
however, is a misnomer: it is actually a State setter and not a Status setter.
This command allows you to change the state of a Step as allowed by the
supported state transitions discussed in an earlier section. The state can
only be moved in the forward direction, not in the reverse direction.
The state of the Steps is normally managed by the framework itself and you
will rarely have to use the status setter. It becomes useful only in unusual
circumstances when you want to manage the state of a Step yourself.
The usage is as below:
abcStatusSetter -processId -running | -waiting | -success | -warning
| -error | -final [-info "status info"]
options:
error                     update DWABC DB with error status for this job
final                     update DWABC DB with worst status of all audit
                          metrics for this job
help                      print this help message
info <"status info">      Optional status info
processId <"processId">   Stream step processId
running                   update DWABC DB with running status for this job
success                   update DWABC DB with success status for this job
waiting                   update DWABC DB with waiting status for this job
warning                   update DWABC DB with warning status for this job
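The forward-only rule for state changes can be illustrated with a small check. The state transition diagram is not reproduced in this text, so the ordering used below (waiting, then running, then one of success, warning or error) is an assumption made only for this sketch. The names STATE_RANK and can_transition are hypothetical as well.

```python
# Assumed ordering of Step states; the real transition diagram is not reproduced here.
STATE_RANK = {"waiting": 0, "running": 1, "success": 2, "warning": 2, "error": 2}

def can_transition(current, target):
    """Allow a change only if it moves the Step strictly forward."""
    return STATE_RANK[target] > STATE_RANK[current]

print(can_transition("waiting", "running"))  # forward move
print(can_transition("running", "error"))    # forward move
print(can_transition("success", "waiting"))  # backward move
```

Under this assumed ordering the first two calls are accepted and the third is rejected.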
II.
The command that allows you to manage the Stream is abcBatchControl. The
command is called batch control because a Stream is nothing but a batch of
Steps. The abcBatchControl command allows you to change the status of a
Stream, as per the allowed state transitions. This command will most often be used
to manually Abort a stream, so that a new instance of the Stream gets
loaded during the next load batch operation. The syntax of the command is as
below.
abcBatchControl <options> <streamId>
<streamId>: the id of the stream on which to operate.
options:
abort              abort the specified stream (kill all currently running
                   constituent processes)
all                operate on all loaded streams
help               print this help message
resume             resume the specified stream
streamId <arg>     perform the action on the named stream
suspend            suspend the specified stream (allow any running process to
                   complete)
Note: Please do not use the Suspend and Resume functionality; it has not been
tested.
Debugging:
The errors and log messages generated during the load batch operation are
logged in dw_abclauncher.log in the PMDB_HOME/log folder.
UI
This is the first and most popular means of monitoring ABC
streams. The internal monitoring UI for ABC Streams, available in the
PMDB Admin UI, displays the status of the last run of a Stream
graphically (similar to the sample stream in a section above). The UI
currently does not have the capability to display historical stream status
(status information for streams is preserved for 30 days in the ABC run
time tables); however, this will be added going forward. The Stream
monitoring UI provides the capability to navigate to and view the Stream of
choice.
The second mechanism the UI provides to know the Stream status is
Alerts. ABC has an execution_log which stores the status of all the Steps
executed. The Alert window picks up and displays any Fatal errors, so
that they are immediately brought to the attention of the PMDB administrator.
Log File
There are two log files of interest, (a) importdefs.log and
(b) dw_abclauncher.log, both found in the PMDB_HOME/log folder. These
have already been discussed in earlier sections.
Another important log mechanism is the execution_log table. Here the exit
log associated with every Step is logged, and the log messages are tied to
the unique identifier of the Step in the run table. Currently there is no
mechanism to display these logs in the UI; however, this will be added in
this release itself.
FOLLOW RECOMMENDATIONS
As with any framework, not using the ABC framework correctly can result in a
lot of frustration and undesirable functionality. Here is a set of
recommendations that, if followed, will enable you to use ABC in the PMDB context
in the best possible way. This section provides specific recommendations for
certain PMDB processes and also some general guidelines on how to use ABC
effectively.
There are two parameters that you should know exactly how to handle. These
directly impact the ABC Stream execution functionality.
1. Maxexectime: This is the maximum execution time for a
command, in minutes. Setting a very high value here will result in a
delay in identifying hung processes. Setting too low a value here will
result in the Step getting marked as Error before it has had time to finish.
2. Maxretries: This parameter tells ABC the number of times
to retry until execution succeeds. A Step will be retried only if it
failed in the previous execution. The Step will be retried
the next time the RUN STEPS command runs, as scheduled.
The Maxretries option should be set to a positive value only if a
retry could result in successful execution. This option is most
commonly used to overcome resource unavailability issues. Its value
should be set only after careful examination of the use
case. This can be edited through the PMDB Administration UI.
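The effect of the two parameters can be sketched as a pair of small decision functions. This is an illustration of the behaviour described above and not ABC code. The function and argument names are hypothetical, the comparison operators at the boundaries are assumptions, and the numbers are sample values only.

```python
def exceeded_max_exec_time(elapsed_minutes, maxexectime):
    """A Step running longer than Maxexectime (in minutes) gets marked as Error."""
    return elapsed_minutes > maxexectime

def should_retry(last_run_failed, retries_so_far, maxretries):
    """Retry on the next RUN STEPS only after a failure and while retries remain."""
    return last_run_failed and retries_so_far < maxretries

# Sample values: Maxexectime = 240 minutes, Maxretries = 2.
print(exceeded_max_exec_time(elapsed_minutes=300, maxexectime=240))
print(should_retry(last_run_failed=True, retries_so_far=1, maxretries=2))
print(should_retry(last_run_failed=True, retries_so_far=2, maxretries=2))
print(should_retry(last_run_failed=False, retries_so_far=0, maxretries=2))
```

A Step that has not failed is never retried, and a Maxretries of 0 disables retries altogether in this model.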
Given below are some specific recommendations that you SHOULD follow. The
best way to arrive at the value of a configuration parameter is to understand
the use case, test, and arrive at an optimum value. This should be done by the
Stream developer. If this is not possible, the next best thing is to
follow the recommendations given below.
TREND SUM:
Re-try recommendation:
Does re-try make sense? Mostly no. There is almost never a scenario where a re-try
will make an already errored out trend sum task succeed. The scenario of
trend sum succeeding on retry due to system resource availability is rare. The
IQ concurrency issues with trend sum should be handled through the ABC
Resource Loader feature. Two trend sums will typically never run in parallel
for the same set of destination tables. Since only key_id based summarization is
supported, we never insert into the dimension tables.
Recommended value of retry: 2
Max execution time recommendation:
This depends on two things: 1) the data in the table and 2) the type of summarization
(for example, forecast summarization takes a long time). There is no single
recommendation possible here. If it is not possible to arrive at an appropriate
value, it is recommended that you set a value of 4 hours for all levels of
summarization.
Recommended value of maxexec time: 4 x 60 = 240 minutes
Resource locking recommendations:
The only available resource type as of now is Table. This is typically used to
avoid the Sybase multi-writer problem. However, with trend sum in BSMR there is
typically no scenario where the multi-writer problem exists, so there is no need
to configure any resource locking for trend sum.
Recommendation: No resource locking required
LOADER:
Re-try recommendation:
When a number of loader processes for different facts but with common
dimensions are launched concurrently, there could be locking issues. This can be
resolved to an extent using the re-try option. Re-try for system resources is
another use case. The number of re-tries to be configured will be a function of the
number of loader jobs planned to be launched concurrently which share
common dimension tables. Retry could be required for dimension tables as
well, for snowflake and conformed dimensions. An important point to
remember here is that Loader has its own retry logic as well. Considering all
the above, the recommended re-try value is as below.
AUDIT STEPS:
Re-try recommendation:
The re-try makes sense for audit steps for resource unavailability only.
Recommended value = 3
Max execution time recommendation:
This is an audit step that only involves a JMX lookup. This should complete
really fast.
Recommended value = 5 minutes
Resource locking recommendations:
There are no resource locking requirements for the audit steps.
Recommendation = No resource locking required.
NOTE: An important point to note is that there is no need to run the audit
steps in sequence. The auditing happens for an event in the past, so all the
audit steps can be executed in parallel. This is recommended but not
mandatory.
FAQ
1. Is there a way to know if max re-tries are exceeded for a Step in error?
Currently No. The user interface will be enhanced to support this going
forward. If you are interested you can query the ABC dictionary tables
and figure this out.
2. More to be added..
A BIT OF ARCHITECTURE
This is at the end because you really do not need to read it. Read on only if
you are curious.
This section is yet to be filled in.