BODS Starter Guide
1. Understanding Data Integrator
BODS ARCHITECTURE
Job Server
Access Server
Designer is a GUI-based client/development tool that allows you to create, test, and execute jobs that populate a data warehouse.
The Administrator lets you: schedule, monitor, and execute batch jobs; configure, start, and stop real-time services; configure Access Servers and repositories; configure and manage adapters; and manage users for secure central repositories and Profiler repositories.
DI Services
When you install DI on a Windows platform, two services are installed. DI Service: starts the Job Server and Access Server automatically when the system starts. DI Web Server: supports web-based applications such as DI Admin and the metadata reporting tool.
DI Access Server
It is a real-time request/reply message broker that collects message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame. The Access Server maintains the queue of messages and sends them to the next available real-time service across any number of computing resources.
DI Web Server
It supports web-based applications like DI Admin and Metadata reporting tool.
Auto Documentation
It captures the repository information of all the objects, such as projects, jobs, work flows, and data flows, and can prepare a report that you can also print out.
These reports provide a graphical representation of DI job execution statistics, job execution duration, and execution duration history.
Data Quality
These reports provide a graphical representation of the DI validation rules that you created in batch jobs, so you can identify inconsistencies or errors in source data.
2. Defining Source and Target Metadata
i. Using datastores
ii. Importing metadata
iii. Defining a file format
Understanding datastores
A datastore provides a connection or multiple connections to data sources such as a database. Through the datastore connection, Data Integrator can import metadata from the data source, such as descriptions of fields. Data Integrator uses datastores to read data from source tables or load data to target tables. When you specify tables as sources or targets in a data flow, Data Integrator uses the datastore to determine how to read data from or load data to those tables.
Data Integrator reads and writes data stored in flat files through flat file formats, and reads and writes data stored in XML documents through DTDs and XML Schemas.
The specific information that a datastore contains depends on the connection. When your database or application changes, you must make the corresponding changes in the datastore information in Data Integrator.
1. From the Datastores tab of the object library, right-click in the blank area and click New.
2. In the Datastore name box, type ODS_DS. This datastore name labels the connection to the database you will use as a source. The datastore name will appear in the local repository. When you create your own projects/applications, remember to give your objects meaningful names.
3. In the Datastore type box, click Database.
4. In the Database type box, click the option that corresponds to the database software being used to store the source data. The remainder of the boxes on the Create New Datastore window depend on the Database type you selected.
6. Click OK. Data Integrator saves a datastore for your source in the repository.
Importing Metadata
Metadata consists of:
Table name
Column names
Column data types
Primary key columns
Table attributes
RDBMS functions
Application-specific data structures
File formats:
Data Integrator can use data stored in files for data sources and targets. A file format defines a connection to a file. Therefore, you use a file format to connect Data Integrator to source or target data when the data is stored in a file rather than a database table. When working with file formats, you must create a file format template that defines the structure for a file.
The file format editor has three work areas:
Properties-Values: used to edit the values for file format properties. Expand and collapse the property groups by clicking the leading plus or minus.
Column Attributes: used to edit and define the columns or fields in the file. Field-specific formats override the default format set in the Properties-Values area.
Data Preview: used to view how the settings affect sample data.
Scripts
Scripts can contain these statements:
Function calls
If statements
While statements
Assignment statements
Operators
Basic syntax rules for scripts:
Each line ends with a semicolon (;).
Variable names start with a dollar sign ($).
String values are enclosed in single quotation marks (').
Comments start with a pound sign (#).
Function calls always specify parameters, even if the function uses no parameters.
Template tables
During the initial design of an application, you might find it convenient to use template tables to represent database tables. With template tables, you do not have to initially create a new table in your DBMS and import the metadata into Data Integrator. Instead, Data Integrator automatically creates the table in the database with the schema defined by the data flow when you execute a job. After creating a template table as a target in one data flow, you can use it as a source in other data flows. Though a template table can be used as a source table in multiple data flows, it can only be used as a target in one data flow. Template tables are particularly useful in early application development when you are designing and testing a project.
When the job is executed, Data Integrator uses the template table to create a new table in the database you specified when you created the template table. Once a template table is created in the database, you can convert the template table in the repository to a regular table.
Once a template table is converted, you can no longer alter the schema.
3. Creating a batch job
i. Creating a batch job
ii. Creating a simple data flow
iii. Adding source and target objects to a data flow
iv. Using the Query transform
v. Executing the job
Object Hierarchy
Jobs:
A job is the only object you can execute. You can manually execute and test jobs in development. In production, you can schedule batch jobs and set up real-time jobs as services that execute a process when Data Integrator receives a message request.
We can include any of the following objects in a job definition:
Data flows
Sources
Targets
Transforms
Work flows
Scripts
Conditionals
While loops
Try/catch blocks
Work flows
A work flow defines the decision-making process for executing data flows (a work flow may contain a data flow). For example, elements in a work flow can determine the path of execution based on a value set by a previous job or can indicate an alternative path if something goes wrong in the primary path. Ultimately, the purpose of a work flow is to prepare for executing data flows and to set the state of the system after the data flows are complete.
Data flows
Data flows extract, transform, and load data. Everything having to do with data, including reading sources, transforming data, and loading targets, occurs inside a data flow. The lines connecting objects in a data flow represent the flow of data through data transformation steps.
1. With JOB_SalesOrg selected in the project area, click the work flow button on the tool palette.
2. Click the blank workspace area. A work flow icon appears in the workspace. The work flow also appears in the project area on the left under the job name (expand the job to view).
3. Change the name of the work flow to WF_SalesOrg.
4. Click the name of the work flow. An empty view for the work flow appears in the workspace. You will use this view to define the elements of the work flow. Notice the title bar changes to display the name of the work flow.
1. Click the square on the right edge of the source file and drag it to the triangle on the left edge of the query transform.
2. Use the same drag technique to connect the query transform to the target table.
QUERY Transform
The query editor, a graphical interface for performing query operations, contains these areas:
Input schema area
Output schema area
Parameters area
The i icon indicates tabs containing user-defined entries.
1. Select the job name in the project area, in this case JOB_SalesOrg.
2. Right-click and click Execute.
4. Validating, Tracing, and Debugging Batch Jobs
I. Validating and tracing jobs
II. Debugging jobs
All Objects in View Validates the object definition open in the workspace and all of the objects that it calls.
Also note the Validate Current and Validate All buttons on the toolbar. These buttons perform the same validation as Current View and All Objects in View, respectively.
Trace objects
Monitor Tab
Log tab
Use trace properties to select the information that Data Integrator monitors and writes to the trace log file during a job. Data Integrator writes trace messages to the trace log associated with the current Job Server and writes error messages to the error log associated with the current Job Server.
The statistics log (also known as the monitor log) quantifies the activities of the components of the job. It lists the time spent in a given component of a job and the number of data rows that streamed through the component.
Data Integrator produces an error log for every job execution. Use the error logs to determine how an execution failed. If the execution completed without error, the error log is blank.
If your Designer is running when job execution begins, the execution window opens automatically, displaying the trace log information.
5. Using Built-in Transforms
I. Describing built-in transforms
Transforms
A transform is a step in a data flow that acts on a data set.
Transforms manipulate input data sets and produce one or more output data sets.
Some transforms, such as Date_Generation and the SQL transform, can also be used as source objects. Use operation codes with transforms to indicate how each row in the data set is applied to the target table. The most commonly used transforms are the Query and SQL transforms.
Case Transform
Specifies multiple paths in a single transform (different rows are processed in different ways).
The Case transform simplifies branch logic in data flows by consolidating case or decision-making logic into one transform. Paths are defined in an expression table.
The Case transform editor includes an expression table and a smart editor.
The connections between the Case transform and objects used for a particular case must be labeled. Each output label in the Case transform must be used at least once.
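Viewed as plain data routing, the Case transform's behavior can be sketched in Python (an illustration only, not the tool's own engine; the labels, columns, and predicates below are invented for the example):

```python
# Sketch of Case-transform routing: each row goes to the output whose labeled
# expression it satisfies (first match wins in this simplified version).
def case_route(rows, cases):
    """cases: list of (label, predicate) pairs, one per output connection."""
    outputs = {label: [] for label, _ in cases}
    for row in rows:
        for label, predicate in cases:
            if predicate(row):
                outputs[label].append(row)
                break  # route the row to a single matching output
    return outputs

rows = [{"region": "East", "amount": 10},
        {"region": "West", "amount": 20}]
outputs = case_route(rows, [("east_path", lambda r: r["region"] == "East"),
                            ("west_path", lambda r: r["region"] == "West")])
```

Each label plays the role of a labeled connection from the Case transform to a downstream object.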
Merge Transform
Combines incoming data sets, producing a single output data set with the same schema as the input data sets. All sources must have the same schema, including:
The same number of columns
The same column names
The same column data types
If the input data set contains hierarchical data, the names and data types must match at every level of the hierarchy.
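Conceptually this is a UNION ALL; a hedged Python sketch (the customer rows are invented, and only column names are checked here, not data types):

```python
# Sketch of Merge-transform logic: a UNION ALL of inputs that share one schema.
def merge(*datasets):
    schemas = {tuple(sorted(ds[0])) for ds in datasets if ds}
    if len(schemas) > 1:
        raise ValueError("Merge inputs must have the same column names")
    return [row for ds in datasets for row in ds]

east = [{"cust": "Freds Coffee", "region": "East"}]
west = [{"cust": "Sandys Candy", "region": "West"}]
merged = merge(east, west)
```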
SQL transform
Performs the indicated SQL query operation. Use this transform to perform standard SQL operations when other built-in transforms cannot perform them. The options for the SQL transform include specifying a datastore, join rank, cache, array fetch size, and the SQL text.
Array fetch size: Indicates the number of rows retrieved in a single request to a source database. The default value is 1000. Higher numbers reduce requests, lowering network traffic and possibly improving performance. The maximum value is 5000.
Cache: Select this check box to hold output from the transform in memory for use in subsequent transforms. Select Cache only if the resulting data set is small enough to fit in memory.
SQL text: The text of the SQL query. This string is passed to the database server. You do not need to put enclosing quotes around the SQL text. You can put enclosing quotes around the table and column names, if required by the syntax rules of the DBMS involved.
Update schema: Click this option to automatically calculate and populate the output schema for the SQL SELECT statement.
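Conceptually, the transform runs the configured SQL text against the datastore connection and emits the result set as rows. A hedged Python sketch using sqlite3 as a stand-in for the datastore (the customer table and its rows are invented for illustration):

```python
# Sketch only: run 'SQL text' against a connection and fetch rows in batches.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Freds Coffee"), (2, "Janes Donuts")])

sql_text = "SELECT id, name FROM customer ORDER BY id"  # the 'SQL text' option
cursor = conn.execute(sql_text)
rows = cursor.fetchmany(1000)  # akin to an array fetch size of 1000
```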
Map Operation
Allows conversions between data manipulation operations. The Map_Operation transform allows you to change operation codes on data sets to produce the desired output. For example, if a row in the input data set has been updated in some previous operation in the data flow, we can use this transform to map the UPDATE operation to an INSERT; the result could be to add the changed row to the target rather than overwrite the existing one.
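The opcode-rewriting idea can be sketched in Python (an illustration, not the product's implementation; the opcode names mirror DI's row flags and the rows are invented):

```python
# Sketch of Map_Operation logic: rewrite each row's operation code according to
# a configured mapping; rows mapped to DISCARD are dropped.
def map_operation(rows, opcode_map):
    """rows: list of (opcode, row) pairs, e.g. NORMAL/INSERT/UPDATE/DELETE."""
    result = []
    for opcode, row in rows:
        new_opcode = opcode_map.get(opcode, opcode)
        if new_opcode != "DISCARD":
            result.append((new_opcode, row))
    return result

mapped = map_operation([("UPDATE", {"id": 1}), ("NORMAL", {"id": 2})],
                       {"UPDATE": "INSERT"})
```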
Date Generation Transform
Use this transform to produce the key values for a time dimension target.
From this generated sequence you can populate other fields in the time dimension (such as day_of_week) using functions in a query.
Data outputs: a data set with a single column named DI_GENERATED_DATE containing the date sequence. The rows generated are flagged as INSERT. The Date_Generation transform does not generate hierarchical data. Generated dates can range from 1900.01.01 through 9999.12.31.
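The generated output can be sketched in Python (a hedged illustration; the start, end, and increment values are configuration you would set in the transform):

```python
# Sketch of Date_Generation output: one DI_GENERATED_DATE column, rows flagged
# INSERT, over a configured start date, end date, and increment.
from datetime import date, timedelta

def date_generation(start, end, increment_days=1):
    rows, current = [], start
    while current <= end:
        rows.append(("INSERT", {"DI_GENERATED_DATE": current}))
        current += timedelta(days=increment_days)
    return rows

dates = date_generation(date(2024, 1, 1), date(2024, 1, 5))
```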
Validation Transform
The Validation transform provides the ability to compare your incoming data against a set of pre-defined business rules and, if needed, take corrective actions. The Data Profiler and View Data features can identify anomalies in the incoming data to help you better define corrective actions in the Validation transform.
Qualifies a data set based on rules for input schema columns. Allows one validation rule per column. Filters out or replaces data that fails your criteria. Outputs two schemas: Pass and Fail.
The Match Pattern option allows you to enter a pattern that Data Integrator supports in its match_pattern function.
Action on Failure tab: on this tab you specify what Data Integrator should do with a row of data when a column value fails to meet a condition. You can send the row to the Fail target, to the Pass target, or to both. You can also choose to substitute the failing value with a value or expression using the smart editor provided (for example, the value INVALID).
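The pass/fail routing and substitution can be sketched in Python (an illustration only; the phone column, rule, and INVALID substitute are invented for the example):

```python
# Sketch of Validation-transform routing: one rule per column; a failing row can
# go to the Fail output, to the Pass output (optionally with a substitute
# value), or to both.
def validate(rows, column, rule, action="FAIL", substitute=None):
    pass_out, fail_out = [], []
    for row in rows:
        if rule(row[column]):
            pass_out.append(row)
            continue
        if action in ("FAIL", "BOTH"):
            fail_out.append(row)
        if action in ("PASS", "BOTH"):
            fixed = dict(row)
            if substitute is not None:
                fixed[column] = substitute  # e.g. replace bad data with INVALID
            pass_out.append(fixed)
    return pass_out, fail_out

pass_out, fail_out = validate([{"phone": "555-0100"}, {"phone": ""}],
                              "phone", lambda v: v != "",
                              action="BOTH", substitute="INVALID")
```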
Pivot Transform
Creates a new row for each value in a column that you identify as a pivot column. The Pivot transform allows you to change how the relationship between rows is displayed. For each value in each pivot column, Data Integrator produces a row in the output data set. You can create pivot sets to specify more than one pivot column.
Suppose we have a table containing rows for an individual's expenses, broken down by expense type. This source table has expense numbers in several columns, so we might have difficulty calculating expense summaries. The Pivot transform can rearrange the data into a more manageable form, with all expenses in a single column, without losing category information.
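The expense example above can be sketched in Python (a hedged illustration; the column names PIVOT_HDR/PIVOT_DATA and the expense data are invented for the example):

```python
# Sketch of Pivot logic: each configured pivot column becomes its own output
# row, carrying the column name and value; non-pivot columns are repeated.
def pivot(rows, pivot_columns, header_col="PIVOT_HDR", data_col="PIVOT_DATA"):
    out = []
    for row in rows:
        keep = {k: v for k, v in row.items() if k not in pivot_columns}
        for col in pivot_columns:
            out.append({**keep, header_col: col, data_col: row[col]})
    return out

expenses = [{"emp": "Fred", "travel": 100, "meals": 40}]
pivoted = pivot(expenses, ["travel", "meals"])
```

After pivoting, all expense amounts sit in one column, so summaries become a simple aggregation.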
Table Comparison Transform
Examines and compares the source and target tables.
Generates an INSERT for any new row not in the target table.
Generates an UPDATE for any row in the target table that has changed.
Ignores any row that is in the target table and has not changed.
Fills in the generated key for the updated rows.
[Data flow diagram: Source → Query → Table_Comparison → Key_Generation → CUSTOMER target]
Table_Comparison Transform
Allows you to detect and forward changes that have occurred since the last time a target was updated
Compares two data sets and produces the difference between them as a data set with rows flagged as INSERT or UPDATE
Allows you to identify changes to a target table for incremental updates
Data input: the input data set must be flagged as NORMAL. If the input data set contains hierarchical data, only the top-level data is included in the comparison; nested schemas are not passed through to the output.
Options:
Table name: the name of the comparison table in the datastore.
Generated key column (optional): if the same row is updated multiple times, the transform compares the input row with the row that has the largest generated key value and ignores the other rows.
Comparison method
Input primary key columns: the primary key columns of the input data set; each key must be present in the comparison table with the same name and data type.
Compare columns: compares only the selected columns from the input data set; if no columns are listed, all columns in the input data set that are also in the comparison table are used as compare columns.
Data output: a data set containing rows flagged as INSERT or UPDATE. It contains only the rows that make up the difference between the two input sources. There are three possible outcomes from this transform for each row: an INSERT row is added, an UPDATE row is added, or the row is ignored.
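The diffing behavior can be sketched in Python (an illustration, not the product's implementation; the customer rows, key, and compare column are invented):

```python
# Sketch of Table_Comparison logic: diff input rows against the comparison
# table on the primary key; new keys become INSERT, changed rows become
# UPDATE, and unchanged rows are ignored.
def table_comparison(input_rows, comparison_rows, key, compare_cols):
    existing = {r[key]: r for r in comparison_rows}
    out = []
    for row in input_rows:
        current = existing.get(row[key])
        if current is None:
            out.append(("INSERT", row))
        elif any(row[c] != current[c] for c in compare_cols):
            out.append(("UPDATE", row))
        # otherwise: row unchanged, ignored
    return out

diff = table_comparison(
    [{"id": 1, "region": "West"}, {"id": 3, "region": "East"}],
    [{"id": 1, "region": "East"}, {"id": 2, "region": "Central"}],
    key="id", compare_cols=["region"])
```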
Example: a change in a customer's region would give you an incorrect result if you simply updated the customer record. A typical flow uses:
A source to extract the rows from the source table
A Query to map columns from the source
A Table_Comparison transform to generate INSERT and UPDATE rows and to fill in existing keys
A History_Preserving transform to convert certain UPDATE rows to INSERT rows
A Key_Generation transform to generate new keys for the updated rows that are now flagged as INSERT
[Example tables: a source customer table (Data, Region), a target customer table (GKey, Data, Region), and a target customer history table, showing customers such as Freds Coffee, Janes Donuts, and Sandys Canada across the East, West, and Central regions.]
History Preserving Transform
Date columns - Valid from: a date or datetime column from the source schema.
Date columns - Valid to: a date or datetime column from the source schema.
Compare columns: the column or columns in the input data set for which this transform compares the before- and after-images to determine whether there are changes. If the values in each image of the data match, the transform flags the row as UPDATE; the result updates the warehouse row with values from the new row. If the values in each image do not match, the row from the before-image is included in the output as UPDATE, to effectively update the date and flag information, and the row from the after-image is included in the output flagged as INSERT; the result adds a new row to the warehouse with the values from the new row.
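That compare-columns behavior can be sketched in Python (a hedged illustration; the valid_to column name, far-future end date, and customer rows are invented for the example):

```python
# Sketch: when a tracked column changes, end-date the before-image (UPDATE)
# and add the after-image as a new row (INSERT); otherwise, plain UPDATE.
from datetime import date

def preserve_history(before, after, compare_cols, change_date,
                     valid_to="valid_to", far_future=date(9999, 12, 31)):
    if all(before[c] == after[c] for c in compare_cols):
        return [("UPDATE", after)]            # no tracked change
    closed = dict(before)
    closed[valid_to] = change_date            # close out the old image
    opened = dict(after)
    opened[valid_to] = far_future             # new current image
    return [("UPDATE", closed), ("INSERT", opened)]

result = preserve_history(
    {"id": 1, "region": "East", "valid_to": date(9999, 12, 31)},
    {"id": 1, "region": "West", "valid_to": None},
    compare_cols=["region"], change_date=date(2024, 6, 1))
```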
Log descriptions
Trace log: Itemizes the steps executed in the job and the time execution began and ended.
Monitor log (statistics log): Displays each step of each data flow in the job, the number of rows streamed through each step, and the duration of each step.
Error log: Displays the name of the object being executed when a Data Integrator error occurred and the text of the resulting error message. If the job ran against SAP data, some of the ABAP errors are also available in the Data Integrator error log.
6. Built-in Functions
I. Describing built-in functions
Understanding variables
Describing the Variables and Parameters window
Differences between global and local variables
Creating global variables
Setting global variable values
Explaining language syntax
Using strings and variables in the Data Integrator scripting language
Scripting a custom function
Understanding variables
Variables are symbolic placeholders for values. The data type of a variable can be any type supported by Data Integrator, such as integer, decimal, date, or text string.
A script is a single-use object used to call functions and assign values to variables in a work flow.
A catch is part of a serial sequence called a try/catch block.
A conditional is a single-use object, available in work flows, that allows you to branch the execution logic based on the results of an expression.
Example: a work flow defines two variables, and a script (here, inside a catch block) uses them.
Variables defined: $AA int, $BB int
Catch body:
if ($BB < 0) $BB = 0;
$AA = $AA + $BB;
Describing variables and parameters
Data Integrator displays the variables and parameters defined for an object in the Variables and Parameters window. The Variables and Parameters window contains two tabs. The Definitions tab allows you to create and view variables (name and data type) and parameters (name, data type, and parameter type) for an object type. The Calls tab allows you to view the name of each parameter defined for all objects in a parent object's definition.
Language Syntax
Supports ANSI SQL-92 varchar behavior:
Treats an empty string as a zero-length varchar value (instead of NULL).
Evaluates comparisons to NULL as FALSE.
Uses the IS NULL and IS NOT NULL operators in the Data Integrator scripting language to test for NULL values.
Treats trailing blanks as regular characters when reading from all sources, instead of trimming them.
Ignores trailing blanks in comparisons in transforms (Query and Table_Comparison) and functions (decode, ifthenelse, lookup, lookup_ext, lookup_seq).
Recommendations
Do not compare values without explicitly testing for NULLs. Business Objects does not recommend relying on implicit comparison logic, because any relational comparison to a NULL value returns FALSE. An expression with explicit NULL tests executes the TRUE branch if both $var1 and $var2 are NULL, or if neither is NULL but they are equal to each other.
Creating Repository
Repository Manager window: choose the Database type and enter the Database name (as per the TNS entry), User ID, and Password. The default option is Local. Then click the Create button.
Once repository creation finishes, a message indicates that the local repository was created successfully. Do not work on any other action items while repository creation is in progress; leave the system/server idle.
Job Server Creation
Enter a name without spaces. The port number should be unique. Click Apply.
If you get an error because the same port number already exists, change the port (for example, to 3800).
Migrating projects
Export: exports a project from one schema (repository) to another. Generally used for project backups or production moves.
Similarly, we can import a repository or BODI jobs from another schema or from an .atl file.
Note: an .atl file is a BODI-generated file created when you export all project components.