Kalido White Paper: The Next Generation of Data Integration for Data Warehousing
Introduction
Data integration has long been the most fundamental challenge in delivering a data warehouse or, for that matter, any data integration initiative. The industry has seen three generations of ETL technology, from code generators to proprietary engines to Extract-Load-Transform architectures, each offering progressively more reuse, performance, and manageability. In spite of this progress, data integration has remained the most time-consuming and labor-intensive deliverable in any data warehouse project. This has led to significant delays in putting data warehouses into production, often as long as 12 to 18 months, and has hamstrung IT departments in responding to the changes in requirements and the business environment that necessitate changing the data warehouse.
FIGURE 1: BY FAR THE MOST SIGNIFICANT EFFORT IN BUILDING A TRADITIONALLY-DEVELOPED DATA WAREHOUSE IS SPENT ON DATA INTEGRATION
Data integration tasks involve a series of repeatable patterns. Why can't these patterns simply be automated, and these time-consuming, labor-intensive tasks vastly reduced?
Let's explore an example data integration job from a popular ETL vendor and compare it to a load definition in Kalido that achieves the same objective.
FIGURE 3: AN IBM INFOSPHERE DATASTAGE ETL JOB, TYPICAL OF AN ETL ROUTINE USED IN DATA WAREHOUSING (SOURCE: IBM INFOSPHERE DATASTAGE PERFORMANCE AND SCALABILITY BENCHMARK WHITEPAPER, FEBRUARY 2010)
If we examine the above data warehouse ETL job, we see a series of common tasks. On the extreme left and right of the job, the input source and the output target (in this case, a physical table) are defined. Each of the links and stages between these two icons represents an area where ETL developers must specify detailed transformations, lookups, table mappings, and error conditions, stage the rejected records, and aggregate the error messages. While there can be no doubt that this approach enables very complex and powerful data integration jobs to be defined, the process is time-consuming and error-prone. The result is not only significant development effort, but an equally large investment in testing and in debugging the issues that inevitably arise.
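To make the pattern concrete, the stages described above can be sketched in a few lines of Python. This is an illustrative sketch only, not DataStage code: the function `run_job`, the lookup table `CUSTOMER_KEYS`, and the field names are all hypothetical. It shows how each lookup, transformation, mapping, and reject rule ends up written by hand for a single target table.

```python
from collections import Counter

# Hypothetical reference data: customer codes mapped to surrogate keys.
CUSTOMER_KEYS = {"C001": 1, "C002": 2}


def run_job(source_rows):
    """Hand-coded load for one target table: look up, transform, map, reject."""
    loaded, rejects = [], []
    for row in source_rows:
        # Lookup stage: resolve the business code to a key, or reject the row.
        key = CUSTOMER_KEYS.get(row.get("customer_code"))
        if key is None:
            rejects.append({**row, "error": "unknown customer"})
            continue
        # Transformation stage: cast the amount, or reject the row.
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            rejects.append({**row, "error": "bad amount"})
            continue
        # Table mapping stage: shape the row for the physical target table.
        loaded.append({"customer_key": key, "amount": amount})
    # Error aggregation stage: count rejected rows per error message.
    error_summary = Counter(r["error"] for r in rejects)
    return loaded, rejects, error_summary
```

Every rule here is tied to one source and one physical target, so a second job, or a change to the target table, means writing and testing similar logic again.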
FIGURE 4: KALIDO AUTOMATES ALL THE TASKS IN THE ETL MAPPING JOB BECAUSE IT KNOWS ALL THE RULES AND RELATIONSHIPS THAT ARE EXPRESSED IN THE KALIDO BUSINESS MODEL
By contrast, in Kalido we also require that the source file or table be selected. Similarly, we map this incoming data definition to the target. But in Kalido, the target is not a physical table; it is one or more target entities defined in the Business Model in Figure 2. Kalido both creates the physical tables to house this data (inside the chosen database platform) and performs the physical integration of the data into these tables. All the tasks specified in the ETL mapping job above are automated by the Kalido Information Engine, since it knows all the relationships and rules that have been expressed in the Business Model. Another important difference is that, because the target of the data integration activity is a logical entity and not a physical table, any changes to the underlying physical table structures are managed through Kalido. When any of the tables involved in this job change, Kalido can automatically reflect the impact of those changes. In fact, in many cases, Kalido will automatically stage and recast the data into the new physical tables when this occurs.
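The general idea of a model-driven load can be illustrated with a short Python sketch. It is not Kalido's implementation or API: the `Entity` class and the functions `create_table_sql` and `load` are hypothetical names chosen for illustration, and the sketch assumes a model that records only attribute types and references to parent entities. The point is that the physical table and the lookup and reject logic are derived from the model rather than written per job.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A logical target entity in a hypothetical business model."""
    name: str
    attributes: dict  # attribute name -> SQL type
    references: dict = field(default_factory=dict)  # attribute -> parent entity


def create_table_sql(entity):
    """Derive the physical DDL from the logical entity definition."""
    cols = ", ".join(f"{col} {typ}" for col, typ in entity.attributes.items())
    return f"CREATE TABLE {entity.name.lower()} ({cols})"


def load(entity, rows, mapping, known_keys):
    """Generic load: reference checks and rejects are driven by the model."""
    loaded, rejects = [], []
    for row in rows:
        # Apply the source-to-entity mapping supplied by the user.
        record = {target: row.get(source) for source, target in mapping.items()}
        # Check every reference declared in the model against known parent keys.
        missing = [
            attr
            for attr, parent in entity.references.items()
            if record.get(attr) not in known_keys.get(parent, set())
        ]
        if missing:
            error = "unresolved reference: " + ", ".join(missing)
            rejects.append({**row, "error": error})
        else:
            loaded.append(record)
    return loaded, rejects
```

In this sketch, the only job-specific input is the `mapping` from source columns to entity attributes. If an attribute is added to the entity, `create_table_sql` yields the new table definition without any change to the load logic.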
Conclusion
ETL tools have their place in many integration projects; however, their disadvantages outweigh their ability to help customers deliver agile data warehouses that enable business value in a timely way. Too many data warehouse-related integration tasks are not repeatable with ETL tools because of the way the tools were designed to operate. Kalido overcomes these issues by automating the data integration tasks that would otherwise be manually constructed and maintained in ETL tools. As a result, Kalido customers realize faster time to value, easier maintenance, and greater agility, not only in responding to initial warehouse demands, but also in keeping those warehouses up to date as the business and its analytical requirements evolve.
About Kalido
Kalido is the leading provider of business-driven information management software. Kalido enables companies to manage data as a shared enterprise asset by supporting the business process of data management. Kalido software has been deployed at more than 300 locations in over 100 countries, including 20 percent of the world's most profitable companies as determined by Fortune Magazine. More information about Kalido can be found at: http://www.kalido.com.
Contact Information US Tel: +1 781 202 3200 Eur Tel: +44 (0)845 224 1236 Email: info@kalido.com or visit our website at www.kalido.com
Copyright 2011 Kalido. All rights reserved. Kalido, the Kalido logo and Kalido's product names are trademarks of Kalido. References to other companies and their products use trademarks owned by the respective companies and are for reference purposes only.
WP-NGDI10114