

White Paper

The Next Generation of Data Integration for Data Warehousing

Kalido White Paper: The Next Generation of Data Integration for Data Warehousing

Introduction
Data integration has long been the most fundamental challenge in delivering a data warehouse or, for that matter, any data integration initiative. The industry has seen three generations of ETL technology, from code generators to proprietary engines to Extract-Load-Transform architectures, each offering progressively more reuse, performance and manageability. In spite of this progress, however, data integration has remained the most time-consuming and labor-intensive deliverable in any data warehouse project. This has led to significant delays in putting data warehouses into production, often as long as 12 to 18 months, and has kept IT departments from responding to the changes in requirements and the business environment that necessitate changing the data warehouse.

FIGURE 1: BY FAR THE MOST SIGNIFICANT EFFORT IN BUILDING A TRADITIONALLY-DEVELOPED DATA WAREHOUSE IS SPENT ON DATA INTEGRATION

The ETL Vendor Response


In response to this reality, there is a new wave of messaging from leading ETL vendors proposing data virtualization as the answer to delivering agility in data warehousing. The proposed approach is to use data virtualization to validate what customers are about to build with their ETL code. With this approach, customers may see a relative acceleration in the time it takes to first get visibility of the data, but this does not really constitute sustained business value. Without appropriate processes in place, the data samples produced from virtualization are no more accurate than system exports brought into a maze of uncontrolled spreadsheets. If one applies this same virtualization solution to more complex, cross-system integration initiatives, such as a data warehouse, then one might argue that the validation and integrity rules can be implemented in the virtualized solution. This may be possible in some cases, but as soon as we go down this path, we're back to the traditional set of time-consuming ETL job development activities.

Automation: The Missing Piece


Companies that have implemented data warehouses are well aware that the data integration tasks they must undertake follow a series of repeatable patterns. Why can't these patterns simply be automated and these time-consuming, labor-intensive tasks vastly reduced? The answer is simple. Traditional ETL tools have left the database model up to the modeling experts (and tools) to design and develop. This creates a dependency between two specialist disciplines in the IT space: the data modelers and the ETL developers. This dependency forces serialization and hand-offs between the two teams, which results in more documentation, not more automation. Another by-product of the traditional approach is that the ETL tool lacks semantic knowledge of the data and how it relates to other data, which forces ETL developers to repeat the same manual tasks over and over again. For example, if your model specifies that your sales invoices reference product, time, customer account and sales representative entities, the system could automatically infer that lookups against these tables are required. But without this semantic knowledge, the ETL tool has no way to automate these tasks. At Kalido, we believe there is a better way, and our 300+ customer deployments have proven it.
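
As a rough illustration of the inference described above (and not Kalido's actual implementation), the sketch below holds a model as a plain Python dictionary and derives the lookups a load would need from the references it declares. The entity names come from the example in the text; the MODEL structure, the "_key" column naming and the infer_lookups function are assumptions made for illustration only.

```python
# Assumption: a model is just a mapping from each entity to the entities it
# references. Entity names follow the sales invoice example in the text.
MODEL = {
    "sales_invoice": ["product", "time", "customer_account", "sales_representative"],
    "product": [],
    "time": [],
    "customer_account": [],
    "sales_representative": [],
}


def infer_lookups(model, entity):
    """Return the lookup steps implied by the entity's declared references."""
    lookups = []
    for target in model[entity]:
        if target not in model:
            raise KeyError(f"{entity} references unknown entity {target}")
        # Hypothetical naming convention: <target>_key holds the business key.
        lookups.append(f"lookup {entity}.{target}_key in {target}")
    return lookups


for step in infer_lookups(MODEL, "sales_invoice"):
    print(step)
```

The point of the sketch is only that once references are declared in one place, the list of lookups no longer has to be written by hand for each job.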

Kalido: Reducing Data Integration for Data Warehousing


The Kalido Information Engine is driven by a business information model that captures the information the business requires to meet its analytical needs. This same business model drives the most common tasks required to build and sustain an operational data integration job. The data warehouse developers map the inbound data to the business model, and the Kalido Information Engine automates the detailed activities required to stage, validate and integrate this data.
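
To make the stage, validate and integrate sequence more concrete, here is a minimal sketch of a model-driven load. It is not the Kalido Information Engine's API: the MODEL and MAPPING dictionaries, the load function and the in-memory warehouse dictionary are all hypothetical stand-ins. The developer supplies only the column mapping; the checks are derived from the model.

```python
# Assumption: each entity declares its business key, required attributes and
# the attributes that reference other entities.
MODEL = {
    "product": {"key": "product_code", "required": [], "references": {}},
    "sales_invoice": {
        "key": "invoice_no",
        "required": ["amount"],
        "references": {"product_code": "product"},
    },
}

# Hypothetical mapping from inbound column names to model attribute names.
MAPPING = {"INV_NO": "invoice_no", "AMT": "amount", "PROD": "product_code"}


def load(entity, rows, mapping, warehouse):
    """Stage, validate and integrate rows for one entity; return the rejects."""
    spec = MODEL[entity]
    rejects = []
    for raw in rows:
        # Stage: rename inbound columns to model attributes.
        staged = {mapping[col]: val for col, val in raw.items() if col in mapping}
        # Validate: the key and required attributes must be present.
        errors = [
            f"missing {attr}"
            for attr in [spec["key"], *spec["required"]]
            if staged.get(attr) in (None, "")
        ]
        # Validate: every referenced member must already exist.
        for attr, target in spec["references"].items():
            if staged.get(attr) not in warehouse.get(target, {}):
                errors.append(f"unknown {target} {staged.get(attr)!r}")
        if errors:
            rejects.append((raw, errors))
        else:
            # Integrate: store the row under its business key.
            warehouse.setdefault(entity, {})[staged[spec["key"]]] = staged
    return rejects


warehouse = {"product": {"P1": {"product_code": "P1"}}}
rows = [
    {"INV_NO": "A-1", "AMT": 100, "PROD": "P1"},
    {"INV_NO": "A-2", "AMT": 50, "PROD": "P9"},
]
rejects = load("sales_invoice", rows, MAPPING, warehouse)
print(sorted(warehouse["sales_invoice"]))
print(rejects)
```

In this sketch the second row is rejected because product P9 is not a known member, without anyone having written a product lookup for this particular job.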

FIGURE 2: A SIMPLE EXAMPLE OF A KALIDO BUSINESS INFORMATION MODEL

Let's explore an example data integration job from a popular ETL vendor and compare it with a load definition in Kalido that achieves the same objective.

FIGURE 3: AN IBM INFOSPHERE DATASTAGE ETL JOB, TYPICAL OF AN ETL ROUTINE USED IN DATA WAREHOUSING (SOURCE: IBM INFOSPHERE DATASTAGE PERFORMANCE AND SCALABILITY BENCHMARK WHITEPAPER, FEBRUARY 2010)

If we examine the above data warehouse ETL job, we see a series of common tasks. On the far left and far right of this job, the input source and the output target (in this case, a physical table) are defined. Each of the links and stages between these two icons represents an area where ETL developers are required to specify detailed transformations, lookups, table mappings and error conditions, as well as to stage the rejected records and aggregate the error messages. While there can be no doubt that this approach enables very complex and powerful data integration jobs to be defined, the process is time-consuming and error-prone, which results not only in significant development effort but also in an equally large investment in testing and in debugging the issues that inevitably arise.


FIGURE 4: KALIDO AUTOMATES ALL THE TASKS IN THE ETL MAPPING JOB BECAUSE IT KNOWS ALL THE RULES AND RELATIONSHIPS THAT ARE EXPRESSED IN THE KALIDO BUSINESS MODEL

By contrast, Kalido also requires that the source file or table be selected. Similarly, we map this incoming data definition to the target. But in Kalido, the target is not a physical table but one or more of the target entities defined in the Business Model in Figure 2. Kalido both creates the physical tables to house this data (inside the chosen database platform) and performs the physical integration of the data into these tables. All the tasks specified in the ETL mapping job above are automated by the Kalido Information Engine, since it knows all the relationships and rules that have been expressed in the Business Model. Another important difference is that because the target of the data integration activity is a logical entity and not a physical table, any changes to the underlying physical table structures are managed through Kalido. When any of the tables involved in this job are changed, Kalido can automatically reflect the impact of those changes. In fact, in many cases, Kalido will automatically stage and recast the data into the new physical tables when this occurs.
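
The idea of targeting a logical entity rather than a physical table can be sketched as follows. This is an assumption-laden illustration, not how Kalido generates its schema: the ENTITIES structure, the surrogate "_id" column convention and the create_table_sql function are hypothetical. The table definition is derived from the model, so a model change is made in one place and the physical definition is regenerated.

```python
# Assumption: a logical entity is a set of typed attributes plus a list of
# referenced entities. Names and SQL types are illustrative only.
ENTITIES = {
    "product": {
        "attributes": {"product_code": "VARCHAR(20)", "name": "VARCHAR(100)"},
        "references": [],
    },
    "sales_invoice": {
        "attributes": {"invoice_no": "VARCHAR(20)", "amount": "DECIMAL(12,2)"},
        "references": ["product"],
    },
}


def create_table_sql(name, entities):
    """Derive a CREATE TABLE statement from the logical entity definition."""
    entity = entities[name]
    # Hypothetical convention: every table gets a surrogate <name>_id key.
    cols = [f"{name}_id INTEGER PRIMARY KEY"]
    cols += [f"{attr} {sql_type}" for attr, sql_type in entity["attributes"].items()]
    # Each declared reference becomes a foreign key column.
    cols += [
        f"{ref}_id INTEGER REFERENCES {ref}({ref}_id)"
        for ref in entity["references"]
    ]
    return f"CREATE TABLE {name} (\n  " + ",\n  ".join(cols) + "\n)"


before = create_table_sql("sales_invoice", ENTITIES)

# A model change: the business now needs the invoice currency.
ENTITIES["sales_invoice"]["attributes"]["currency"] = "CHAR(3)"
after = create_table_sql("sales_invoice", ENTITIES)

print(before)
print(after)
```

The sketch stops at regenerating the definition; recasting existing data into the changed table, which the text attributes to Kalido, is not shown.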

Does Kalido Displace ETL in the Enterprise?


While you may find ETL tools useful for other integration projects, they are not necessarily required for a data warehouse project. ETL technology has earned its place at an infrastructure level in many enterprises today. Operational data integration and data sharing infrastructure is applicable to ERP integration and to establishing an operational data bus. The Kalido Information Engine is dedicated only to the integration of data to deliver a data warehouse or data mart. Further, Kalido's focus is not on automating every possible transformation or function required to deliver a data warehouse, but rather on automating the most common functions required in every data integration job. Kalido provides rich facilities to support the automated transformations directly in the database, thus leveraging the power of the DBMS to enhance scalability, performance and resilience.
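
The general technique of transforming data in the database, rather than row by row in a separate engine, can be illustrated with Python's built-in sqlite3 module. This is a generic sketch of set-based, in-database integration and does not represent the SQL Kalido generates; the table and column names are assumptions. The product lookup and the routing of rejects are both expressed as single SQL statements that the DBMS executes.

```python
import sqlite3

# Assumption: a tiny in-memory schema with a staging table, a target table
# and a reject table. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_code TEXT UNIQUE);
    CREATE TABLE stg_invoice (invoice_no TEXT, product_code TEXT, amount REAL);
    CREATE TABLE sales_invoice (invoice_no TEXT, product_id INTEGER, amount REAL);
    CREATE TABLE rejected_invoice (invoice_no TEXT, reason TEXT);
    INSERT INTO product (product_code) VALUES ('P1'), ('P2');
    INSERT INTO stg_invoice VALUES ('A-1', 'P1', 100.0), ('A-2', 'P9', 50.0);
""")

# Integrate rows whose product lookup succeeds; the join runs inside the DBMS.
conn.execute("""
    INSERT INTO sales_invoice (invoice_no, product_id, amount)
    SELECT s.invoice_no, p.product_id, s.amount
    FROM stg_invoice AS s
    JOIN product AS p ON p.product_code = s.product_code
""")

# Route rows with no matching product to the reject table, also set-based.
conn.execute("""
    INSERT INTO rejected_invoice (invoice_no, reason)
    SELECT s.invoice_no, 'unknown product ' || s.product_code
    FROM stg_invoice AS s
    LEFT JOIN product AS p ON p.product_code = s.product_code
    WHERE p.product_id IS NULL
""")

loaded = conn.execute(
    "SELECT invoice_no, product_id, amount FROM sales_invoice"
).fetchall()
rejected = conn.execute(
    "SELECT invoice_no, reason FROM rejected_invoice"
).fetchall()
print(loaded)
print(rejected)
```

Because each step is one statement over the whole staging table, the database's own optimizer and transaction handling do the heavy lifting, which is the scalability and resilience argument made above.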


Conclusion
ETL tools have their place in many integration projects; however, their disadvantages outweigh their ability to help customers deliver agile data warehouses that provide business value in a timely way. Too many data warehouse-related integration tasks are not repeatable with ETL tools because of the way those tools were designed to operate. Kalido overcomes these issues by automating the data integration tasks that would otherwise be manually constructed and maintained in ETL tools. As a result, Kalido customers realize faster time to value, easier maintenance, and greater agility not only in responding to initial warehouse demands, but also in keeping those warehouses up to date as the business and its analytical requirements evolve.

About the Kalido Information Engine


The Kalido Information Engine helps customers deploy a foundation for analytics much faster than traditional hand-coding or ETL-based methods. Its ability to adapt to change delivers more accurate, consistent and reliable information to your business faster, which can lead to better analytical performance and better decision making sooner, affording significant top-line growth and bottom-line savings opportunities for your organization.

About Kalido
Kalido is the leading provider of business-driven information management software. Kalido enables companies to manage data as a shared enterprise asset by supporting the business process of data management. Kalido software has been deployed at more than 300 locations in over 100 countries, including 20 percent of the world's most profitable companies as determined by Fortune Magazine. More information about Kalido can be found at: http://www.kalido.com.

Contact Information US Tel: +1 781 202 3200 Eur Tel: +44 (0)845 224 1236 Email: info@kalido.com or visit our website at www.kalido.com

Copyright 2011 Kalido. All rights reserved. Kalido, the Kalido logo and Kalido's product names are trademarks of Kalido. References to other companies and their products use trademarks owned by the respective companies and are for reference purposes only.

WP-NGDI10114
