TIBCO BE Sol Best Practices v0.4
This document represents a guide to best practices for architecting, designing, and implementing solutions based on TIBCO BusinessEvents. The guide covers generic patterns as well as specific solution patterns such as the Transaction Analysis Solution. For specific solution examples it also covers models for physical architecture and capacity planning. This document will evolve over time as new insight is gained from real-life projects and requirements/constraints.
http://www.tibco.com Global Headquarters 3303 Hillview Avenue Palo Alto, CA 94304 Tel: +1 650-846-1000 Toll Free: 1 800-420-8450 Fax: +1 650-846-1005
Copyright 2004, TIBCO Software Inc. All rights reserved. TIBCO, the TIBCO logo, The Power of Now, and TIBCO Software are trademarks or registered trademarks of TIBCO Software Inc. in the United States and/or other countries. All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only. 0204
1 Document Revisions
Version  Date            Author          Notes
0.1      Oct 6th, 2004   R. GomezUlmke   Document created.
0.2      May 23rd, 2005  H. Karmarkar    Document updated for BusinessEvents version 1.1
0.3      Nov 9th, 2005   H. Karmarkar    Document updated for BusinessEvents version 1.2
0.4      Dec 1st, 2005   H. Karmarkar    Document updated to include monitoring and fault tolerance discussion
Table of Contents
1 Document Revisions
2 Purpose of Document
3 Generic Design & Implementation Patterns
  3.1 Designing the Ontology
  3.2 Designing Rules
  3.3 Designing State Machines
  3.4 BE Archive Configuration
    3.4.1 Selecting Rulesets
    3.4.2 Enabling Input Destinations
    3.4.3 Designing the Engine Persistence
  3.5 Configuring the Deployment
2 Purpose of Document
The purpose of this document is to provide a guide for architecting, designing, and implementing solutions based on TIBCO BusinessEvents(TM). This document assumes familiarity with the product. It covers a collection of best practices in the following areas:

1. Design and implementation of the ontology, concepts, rules, state models, etc., based on performance optimization criteria as well as operational maintainability and flexibility of design.
2. Architecture patterns for fault tolerance and scalability.
3. Specific solution patterns, such as Transaction Analysis and others.
4. Physical architecture and capacity-planning patterns, both for general solutions and for specific solutions.
This guide will evolve over time and will be enhanced with lessons learnt from specific real-life projects. The target audience is architects, designers, and developers.
Throughout this document Courier New will be used for rules engine code samples, and Verdana will be used for deployment variable names as well as property file entries.
3 Generic Design & Implementation Patterns

3.1 Designing the Ontology

Apply the following practices when designing the ontology for optimal performance and throughput.
3.2 Designing Rules

1. Use rulesets to group together rules with similar logical functions. This facilitates deployment design and makes it easier to activate and deactivate certain functionality at runtime using TIBCO Hawk methods.
2. Always comment rules to make the goal of each rule clear to other developers who may have to maintain the project.
3. Use different types of concepts or events as the identifiers in different rules, and avoid using generic parent types in rules that would be better addressed by sub-types. BE first filters out unwanted instances/events based on their type before passing them to rule evaluation.
4. Use filter conditions (conditions that use only a single identifier) wherever possible, e.g. Order.amount > 1000, or Order.customer == "Fred". BE evaluates all instances/events against filter conditions before matching them in subsequent join conditions.
5. Minimize the number of identifiers in a rule. Each additional identifier requires an additional join, and joining objects is expensive because the rule has to match/test N x M combinations. Where joining or matching multiple objects cannot be avoided, use simple join conditions based on single keys rather than multiple property evaluations. BE has optimizations for certain join conditions, such as equivalence joins (e.g. Order.customerId == Customer.Id, or Function(X) == Y.property); these types of join condition provide constant-time performance.
6. Avoid using I/O functions in conditions or actions, e.g. updating a database in an action. Do this asynchronously whenever possible; for example, send an event to trigger a BW process to update the database. BE will support asynchronous functions in future releases; such functions will be executed independently, outside the working memory.
7. Avoid using functions that take a concept as a parameter in a rule condition, e.g. function(ConceptA instance). The engine cannot trace the dependency inside such a condition and will make the rule dependent on any change to the instance, so the rule will be re-evaluated regardless of which property changed.
8. Minimize the use of the XSLT mapper and XPath builder for creating and modifying instances/events within rules; executing XSLT and mapping/creating XML is expensive. Use the factory methods from the Ontology functions tab to create new instances or events whenever possible, and use event properties instead of the event payload if possible. Take extra care to avoid using the XPath builder in a condition when the argument is a concept; this creates the "any change" dependency described in the previous point.
9. Perform a check for null when evaluating the length of a string, e.g. cu.ATTRIBUTE13 != null && String.length(cu.ATTRIBUTE13) > 0
10. Perform a check of the array length before accessing an array property by index, e.g. cu.array@length > 4 && cu.array[4] == "xx"
11. Delete instances and consume events that are no longer needed. Keeping them in working memory consumes memory and may result in unnecessary matching evaluation, both slowing the engine and wasting memory.
12. When identifying containment relationships between concepts, matching the container and contained instance by contained@parent == container is much more efficient than Instance.PropertyArray.indexOfContainedConcept(container.containedConceptProperty, contained) != -1
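To make practices 4, 5, 8, and 11 concrete, the following sketch shows a rule whose conditions start with cheap single-identifier filter conditions and end with an optimized equivalence join, and whose action uses a factory method and consumes the event. The ontology names (Events.OrderEvent, Concepts.Customer, Concepts.Alert, and their properties) are hypothetical and used only for illustration; the exact syntax depends on the BE version.

```
Rule: FlagLargeOrders
Declaration:  Events.OrderEvent order, Concepts.Customer customer

Conditions:
    order.amount > 1000                 // filter condition: single identifier, evaluated first
    customer.status == "ACTIVE"         // filter condition on the second identifier
    order.customerId == customer.id     // equivalence join: optimized, constant-time matching

Actions:
    // create the alert through a factory method rather than the XSLT mapper (practice 8)
    Concepts.Alert.createAlert(customer.id, order.amount);
    Event.consume(order);               // consume the event once it is no longer needed (practice 11)
```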
3.3 Designing State Machines
3.4 BE Archive Configuration
3.4.1 Selecting Rulesets
When configuring the archive you have the option of setting a number of parameters.

By selecting only the rulesets that are crucial to your engine, you can minimize the footprint of the engine at runtime and maintain maximum performance. However, any ruleset that is not selected here will not be compiled into the engine and cannot later be activated through Hawk.
3.4.2 Enabling Input Destinations
If you select the default listener set, every destination that is the default destination for an event within a rule declaration will be enabled as an input destination. If you select custom, you can designate which destinations the engine will receive input messages from.
3.4.3 Designing the Engine Persistence
The following general recommendations should be considered when configuring BE engine persistence. They are only guidelines, as persistence tuning is highly dependent on the use case, the manner in which events are received and processed, and the runtime state of the engine/working memory (average event rate, burst rates, number of active objects).

1. Enabling persistence adds overhead and will slightly reduce the speed of the engine, but it also enables the property cache, allowing the least-used properties of objects to be swapped out to disk. In cases where the number of objects in memory would exceed the available heap, persistence should be enabled (with the truncate deployment option if persistence is not explicitly required by the use case, but is enabled only to use the property cache).

2. The property cache size is the number of object properties that BusinessEvents will keep in memory. This setting should be tuned per use case, based on the number of object properties that should remain in memory when persistence is enabled. Based on the LRU (least recently used) implementation, the most actively used properties will remain in the property cache, up to the number defined by the user in this field. A property cache size that is too low will lead to thrashing; one that is too high will lead to excessive memory consumption. It is important to test this setting for best performance, as it is highly dependent on the use case.

3. The checkpoint interval for persistence is another configuration parameter that depends heavily on the use case, specifically the rate of change of the facts in the engine. Each checkpoint writes only the modified objects to disk. With many events coming in and altering facts, a lower checkpoint interval makes sense, whereas with few events a larger interval can make sense. We recommend a range between 10-50 seconds, with a default of 30.

4. Unless you have specific analysis requirements for all historical instances, use the "Delete Retracted Objects from Database" option in the persistence configuration. Checking this option physically removes deleted objects from the persistence-layer database, as opposed to keeping them in the database and marking them deleted internally. This helps prevent unchecked growth of the persistence database and, under certain conditions, may improve persistence-layer performance.

5. Refer to the deployment configuration for information on the Berkeley DB cache size.
3.5 Configuring the Deployment
The following general rules should be applied to BusinessEvents deployment configurations, but optimal settings are highly dependent on the actual use case and the runtime character of the system.

1. The Berkeley DB cache percentage is initially set to 20%. This means that when persistence is enabled, 20% of the entire JVM heap will be used as the cache for the embedded DB. The adjustment of this parameter depends heavily on the way objects are used by the engine, and the best approach is testing; we recommend keeping values between 15-30%. This property can be set at deploy time through the Administrator using the deployment variable be.engine.om.berkeleydb.internalcachepercent, but it should not be adjusted unless the trade-off of persistence-layer performance versus memory consumption has been carefully observed through testing of the specific use case with realistic volumes of data.

2. On a machine with multiple CPUs, the following properties should be set as follows:
be.engine.wm.poolsize=1
be.engine.wm.queuesize=<some big number, e.g. 10000>

3. To run BusinessEvents, the Administrator is recommended but not required. During development and testing it is often easier to run the deployment from the BE tester within Designer, or to run the ear file from a command line by invoking the executable and passing the ear file as an argument:
.\be-engine.exe -propFile <filename>.ear
(Note that the executable will pick up whatever .tra file shares the same name as the executable in the current working directory.)
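As a consolidated reference, the deployment-time properties mentioned in this document can be collected in the engine's .tra file (or set as deployment variables in the Administrator). Only property names that appear in this document are used below; the values are illustrative starting points, not recommendations for any particular use case.

```
# Berkeley DB cache, as a percentage of the JVM heap (recommended range 15-30)
be.engine.om.berkeleydb.internalcachepercent=20

# Working-memory settings for machines with multiple CPUs
be.engine.wm.poolsize=1
be.engine.wm.queuesize=10000
```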
4.2 Scalability
BE is designed to handle high message volumes, but in rare cases where the production scenario exceeds the capability of a single engine, there are multiple options for scaling depending on the scenario.
4.2.1 Stateless Scenarios
For a stateless use case, where rules are evaluated against single events and no cross-event correlation is required, scaling and load balancing can be handled by using the transport to split input messages across multiple engines, for example using EMS with round-robin delivery on queues, or using Rendezvous distributed queues (RVCMQ).
4.2.2 Stateful Scenarios
When the use case requires stateful scaling, some form of content-based partitioning between multiple engines is recommended. The two recommended approaches are using message selectors in EMS, or configuring a BE engine for pre-correlation, thereby ensuring that all messages for any given flow are processed by a single engine. When using EMS selectors, one must be able to partition the set of incoming messages by a single field, such as a transaction id, a region, or some other key that is consistent within all messages of a transaction, yet provides enough separation of the total set of events to yield performance gains from partitioning between engines. Pre-correlation methods are discussed further in the BE design patterns document.
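For example, assuming each incoming message carries a region header property (a hypothetical partitioning key; any field that is consistent across all messages of a flow will do), each engine's queue receiver could be configured with a standard JMS message selector so that the engines divide the stream between them:

```
Engine A selector:  region IN ('NORTH', 'EAST')
Engine B selector:  region IN ('SOUTH', 'WEST')
```

The selectors should be mutually exclusive and together cover every possible key value: a message matching no selector would remain unconsumed on the queue, and overlapping selectors would break the guarantee that all messages of a flow reach the same engine.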
4.3
4.4 Handling Duplicates
BE provides a 100% fail-safe environment for event processing by using a mechanism similar to a database transaction log, combined with the persistence of the JMS layer. An internal Object Manager logs all activity of the engine to persistent storage for potential later recovery. Recovery at this time is essentially a replay, in the correct order, of everything that has happened. To also satisfy performance requirements, this transaction log is flushed to disk in the background by a separate thread at a configurable interval (for this example, let us assume an unusually large window of 10 minutes). This potentially leaves a window of 10 minutes in which internal state can be lost due to a fatal crash of the engine. In this case, all input events are still stored in the guaranteed transport, i.e. they have not been acknowledged at this point. The recovery mechanism will then:

1. replay the internal BE transaction log
2. start receiving the 10 minutes' worth of guaranteed-transport input events
At this point exactly the same state as before the crash has been re-established. As with all asynchronous, real-time systems, the possibility of duplicates needs to be handled properly. This is not specific to BE, but applies in general to systems that are not based on two-phase commit. There is no generic mechanism for duplicate detection, since duplicates can arise within any layer of the solution where stateful processing is applied; in this solution these are the BE engines, the databases, the guaranteed transport, etc. All mechanisms for duplicate detection are tied to specific application and state logic and must be dealt with explicitly. (BE does, however, ensure that it does not consume guaranteed-transport messages twice, by comparing message IDs.)

For example, suppose the engine crashes 9 minutes after the last flush. During those 9 minutes a number of alerts, KPIs, etc. have been generated and sent out to the presentation layer or database. In this case duplicate alerts, events, etc. need to be avoided: the logic that inserts events, alerts, or KPIs into a database needs to be duplicate-aware and handle them appropriately. Another level of potential duplicates can be caused by faults within the collection layer, the data source/agent layer, etc., i.e. any component in the layers feeding BE. Therefore, explicit duplicate detection must be modeled as part of the BE models (e.g. state models) based on event IDs, state IDs, etc.
Example Scenario

1. Events are received by BE via JMS, which stores them persistently until they are explicitly acknowledged.
2. BE receives the event but does NOT acknowledge it to JMS at this point.
3. BE evaluates all rules that are triggered by this event.
4. Rules that are triggered call the Event.consume() function to mark this event as to be consumed and acknowledged.
5. Once all rules have been evaluated and the consume flag has been set, BE acknowledges the event to JMS and deletes it from memory; but not before its internal transaction log has been flushed to disk.
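The scenario above, combined with the duplicate-detection advice earlier in this section, can be sketched in rule form. This is an illustrative sketch only: the event type Events.TransactionEvent, the Concepts.ProcessedIds lookup concept, and its containsId/addId helper functions are hypothetical names, and the exact rule syntax depends on the BE version.

```
Rule: HandleTransactionEvent
Declaration:  Events.TransactionEvent evt

Conditions:
    evt.transactionId != null

Actions:
    if (Concepts.ProcessedIds.containsId(evt.transactionId)) {
        // Duplicate re-delivered after a crash/recovery:
        // consume (and thereby acknowledge) it without re-processing.
        Event.consume(evt);
    } else {
        Concepts.ProcessedIds.addId(evt.transactionId);
        // ...apply the business logic, generate alerts/KPIs, etc...
        // Consuming the event marks it for acknowledgement to JMS,
        // which happens only after the transaction log has been flushed.
        Event.consume(evt);
    }
```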
4.5
and future updates, as well as allowing the functionality to be disabled independently of the application logic and of the statistics collection. Time intervals for publishing statistics events should be stored in scorecards and initialized from global variables. This allows them to be changed later by rules and incoming events, but also gives the deployment administrator the ability to check the initial value and manage it as she would any other deployment parameter.
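As a sketch of this pattern: the scorecard name StatsScorecard, its publishInterval property, the Events.AdminEvent type, and the global-variable lookup shown below are all hypothetical names, and the exact catalog function for reading a global variable depends on the BE version.

```
// At engine startup, initialize the scorecard from a global variable so that the
// deployment administrator can manage the initial value like any other parameter.
StatsScorecard.publishInterval = <value of global variable StatsPublishInterval>;

// Later, a rule can adjust the interval at runtime from an incoming admin event.
Rule: AdjustStatsInterval
Declaration:  Events.AdminEvent adm
Conditions:   adm.command == "setStatsInterval"
Actions:      StatsScorecard.publishInterval = adm.newValue;
              Event.consume(adm);
```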
4.5.2 Hawk Microagent Methods
Like most TIBCO products, BE exposes certain monitoring and management functionality through a Hawk Microagent. The methods exposed are detailed in the BusinessEvents documentation; aside from the standard stop functionality, some BE-specific methods are available for adjusting the behavior of the engine at runtime and for getting information about the objects contained in the working memory. By and large, the object-inspection methods should be used sparingly to minimize expensive calls to the Object Manager, but when combined with scorecards that track a limited set of statistics, these methods can provide access to data within a Hawk display that would otherwise require a separate GUI.