
DWH (data warehouse) is a relational database containing a collection of current and historical data drawn from different OLTP systems so that this data can be used for reporting and analysis.

OLTP systems support 2 types of data: 1. Master data 2. Transactional data.
1. Master data is reference data that is created once and changes only occasionally.
2. Transactional data is the data that is generated by business activities and is reported and analyzed taking master data as reference.

Definitions:
Ralph Kimball: A copy of the transactional data specifically structured for reporting and analysis.
W. H. Inmon: A DWH is a subject-oriented, integrated, non-volatile and time-variant collection of data to support the decision making of the organization.

Granularity of Data: Granularity specifies at what level of detail information within the DWH has to be represented. It is determined by 3 factors. 1. Requirements of the business users. 2. Storage capacity of the DWH. 3. Existing data in the OLTP systems.

Aging process: As time passes, current data becomes historical and historical data becomes the oldest historical data. It is better to move the oldest historical data out of the OLTP system; this process is called the aging process or backup and recovery process.

Extraction: Selecting the data required from the business users' point of view is called extraction.

Staging layer: A layer used to hold the data temporarily while the transformations are applied, until the data is made consistent. (Inconsistencies can arise in: data types, coding conventions, storage structures, integrity constraints, level of detail.)

The total operations in the DWH can be classified into 3 fields. 1. Populating the DWH with data from the OLTP systems. 2. Backing up and archiving the data into secondary storage. 3. Querying the data to support two-dimensional and multidimensional reports.

The implementation of a DWH can be classified into 3 major parts: 1. Modeling the DWH 2. ETL process 3. Reporting.

Modeling the DWH: Multidimensional modeling techniques have to be used for creating data marts. The two-dimensional entity-relationship approach should be used for modeling the DWH.
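The extraction and staging steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`crm_rows`, `billing_rows`), their column names, and the code tables are hypothetical and do not come from the notes. The sketch shows rows from two OLTP systems being held in a staging list and transformed until data types and coding conventions agree.

```python
from datetime import datetime

# Hypothetical extracts from two OLTP systems that describe customers
# with different data types, gender codes and date formats.
crm_rows = [{"cust_id": "17", "gender": "M", "joined": "2023-01-15"}]
billing_rows = [{"cust_id": 42, "gender": "female", "joined": "15/02/2023"}]

# Assumed mapping of source coding conventions to one warehouse convention.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known source format and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def transform(row):
    """Make one staged row consistent in type, coding and date format."""
    return {
        "cust_id": int(row["cust_id"]),
        "gender": GENDER_CODES[row["gender"].strip().lower()],
        "joined": parse_date(row["joined"]).isoformat(),
    }


# Staging layer: hold the extracted rows temporarily, then transform them.
staging = crm_rows + billing_rows
consistent = [transform(r) for r in staging]
print(consistent)
```

Only after every staged row passes through `transform` would the data be loaded into the warehouse tables; a row with an unknown code or date format raises an error instead of being loaded silently.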
Modeling the Data Marts: Star schema is the approach used for modeling the data marts. A star schema contains two main components: 1. Dimension tables. 2. Fact tables. Dimension tables contain master data, and fact tables contain transactional data.

Star Schema: A star schema contains multiple dimension tables and a fact table. The fact table has a key section and a measure section. The key section contains the foreign keys of all the dimension tables. All the dimension tables are independent tables and the fact table is a dependent table. To establish the relationship between the dimension tables and the fact table, the primary keys of the dimension tables should be migrated as foreign keys to the key section of the dependent table (fact table).

Types of Star schema: OLAP supports two types of star schemas:
1. Basic star schema (star-flake star schema): The dimension tables are denormalized and the fact tables are normalized in nature.
2. Extended star schema (snowflake star schema): Both the dimension and fact tables are normalized in nature.

Normalization: The process of splitting the structure of a table into two or more tables so that redundancy (duplicates) can be reduced and transactions can be implemented effectively.

Data Marts: DM are subsets of the DWH that satisfy the requirements of a specific functional domain. There are two types of DM. 1. Data marts constructed directly from the OLTP systems are called independent DM. 2. Data marts constructed from the DWH are called dependent DM.

Surrogate Key: OLAP does not depend on the primary keys generated by the OLTP systems for identifying the records. It generates its own keys, called warehouse keys or surrogate keys. Characteristics: They are pure integer values. They are used only for internal identification when reporting the data. These warehouse keys are used as foreign keys in the key section of the fact table. The primary keys of the OLTP systems are called natural keys in OLAP.

Types of Dimension and fact tables: The classification of the tables within the DWH depends entirely on the type of data that is being stored (current, or current + historical data) and on the transactions that are being done within the OLTP systems.

Classification of Dimension tables: Master data, once created, might not be changed, but when changes do occur they should also be captured in the dimension tables. Since the changes are occasional, the dimension tables are classified as Slowly Changing Dimensions (SCD).

1. SCD Type 1: Maintains only current data. When the records are extracted into the OLAP tables for the first time, inserts are always treated as inserts and updates are treated as updates.

2. SCD Type 2: Maintains current data + complete history. It can be implemented in 3 ways: 1. Flag current data 2. Version number mapping 3. Effective date range.
Flag current data: Records inserted into the Type 2 table for the first time are treated as New. When an updated record is inserted, the flag of the previous version should be changed from New to Old.
Version number mapping: In the initial extract, records are inserted with version No. 1. When an updated record is inserted, the previous version number is incremented by 1.
Effective date range: Every record within Type 2 is specified with an effective starting date and ending date, stating the period in which the record is valid.
Without an effective date range, we cannot track the data against time with respect to the changes.

3. SCD Type 3: Maintains current data + one-time history. Inserts are treated as inserts and updates are treated as updates.

The data in the data marts can be created in the form of a relational DB or a multidimensional DB. Tables are the physical structures for storing the data in a relational DB. Cubes are the structures for storing multidimensional data.
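As a rough illustration of the surrogate key and SCD Type 2 ideas above, the following sketch uses Python's built-in `sqlite3` module. The table names (`dim_customer`, `fact_sales`), the columns, and the helper `apply_change` are all hypothetical choices made for this example; the sketch combines the three Type 2 variants (current flag, version number, effective date range) in one dimension table, which is one possible design, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT, -- surrogate (warehouse) key
    customer_id  TEXT NOT NULL,                     -- natural key from the OLTP system
    city         TEXT NOT NULL,
    version      INTEGER NOT NULL,                  -- version number mapping
    is_current   INTEGER NOT NULL,                  -- flag current data (1 = New, 0 = Old)
    valid_from   TEXT NOT NULL,                     -- effective date range
    valid_to     TEXT
);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key), -- key section
    amount       REAL NOT NULL                                           -- measure section
);
""")


def apply_change(conn, customer_id, city, change_date):
    """Insert a new dimension version and return its surrogate key."""
    row = conn.execute(
        "SELECT customer_key, city, version FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row and row[1] == city:
        return row[0]  # nothing changed, keep the current version
    version = 1
    if row:
        # Close the previous version: flag it as old and end its date range.
        conn.execute(
            "UPDATE dim_customer SET is_current = 0, valid_to = ? "
            "WHERE customer_key = ?",
            (change_date, row[0]),
        )
        version = row[2] + 1
    cur = conn.execute(
        "INSERT INTO dim_customer "
        "(customer_id, city, version, is_current, valid_from, valid_to) "
        "VALUES (?, ?, ?, 1, ?, NULL)",
        (customer_id, city, version, change_date),
    )
    return cur.lastrowid


# Facts reference the surrogate key that was current when the sale happened.
k1 = apply_change(conn, "C001", "Pune", "2023-01-01")
conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (k1, 100.0))
k2 = apply_change(conn, "C001", "Delhi", "2024-06-01")
conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (k2, 250.0))

history = conn.execute(
    "SELECT version, city, is_current, valid_from, valid_to "
    "FROM dim_customer WHERE customer_id = 'C001' ORDER BY version"
).fetchall()
sales_by_city = conn.execute(
    "SELECT d.city, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_customer d ON d.customer_key = f.customer_key "
    "GROUP BY d.city ORDER BY d.city"
).fetchall()
print(history)
print(sales_by_city)
```

Because each fact row stores the surrogate key rather than the natural key `customer_id`, the sale made while the customer lived in the first city stays attributed to that city after the move. A Type 1 dimension would instead overwrite `city` in place and lose that history.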
Cubes can be handled in two ways. Either the cubes physically exist and already contain the data, so that reporting is performed directly on these cubes, or the cubes are created only at the time of reporting in order to generate multidimensional reports. Based on these two approaches, OLAP can be classified into 4 types.

ROLAP: Relational OLAP: The components of the star schema are created in a vendor-specific relational DB. Cubes are created only at the time of reporting, so the process is somewhat slow. Since the data is present in the EDW, both current and historical data can participate, and the data can be drilled down to the lowest possible level of detail.

MOLAP: Multidimensional OLAP: In the case of MOLAP there are specially structured databases which store the data directly in multidimensional format. Since the data is already present in multidimensional format, reporting is comparatively fast. The data has to be migrated twice: from the OLTP system to the DWH, and from the DWH to the data marts. The data can be drilled down only to a limited extent.

DOLAP: Desktop OLAP: Whenever it is not possible to carry the complete information, a part of the information can be stored in desktop databases such as Access or Excel sheets to create the reports. OLAP information present in the form of desktop databases is called a DOLAP structure.

HOLAP: Hybrid OLAP: The concept of HOLAP has been introduced in order to take advantage of both ROLAP and MOLAP.

Conformed Dimension: There might be cases where some dimensions can be used as common dimensions across multiple star schemas. Dimensions created once and reused across multiple star schemas are called conformed dimensions.
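The difference between building a cube at reporting time and storing it in advance can be sketched with plain Python dictionaries. This is a deliberately simplified model: the sample rows, the dimension names, and the `build_cube` helper are hypothetical, and real ROLAP and MOLAP engines use relational queries and specialised storage rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical fact rows already joined to their dimensions:
# (year, region, product, sales amount)
fact_rows = [
    (2023, "North", "Laptop", 1200.0),
    (2023, "South", "Laptop", 800.0),
    (2023, "North", "Phone", 500.0),
    (2024, "North", "Laptop", 1500.0),
]
DIMENSIONS = ("year", "region", "product")


def build_cube(rows, dims):
    """Aggregate the measure over the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    cube = defaultdict(float)
    for row in rows:
        cell = tuple(row[i] for i in positions)
        cube[cell] += row[-1]
    return dict(cube)


# ROLAP style: the cube is computed from the detail rows when the report runs,
# so any level of detail is available but every report pays the aggregation cost.
by_year_region = build_cube(fact_rows, ("year", "region"))
by_year = build_cube(fact_rows, ("year",))

# MOLAP style: the cube is computed once and stored; reports only look up cells,
# which is fast but limited to the levels that were stored.
stored_cube = build_cube(fact_rows, DIMENSIONS)
laptop_north_2023 = stored_cube[(2023, "North", "Laptop")]

print(by_year_region)
print(by_year)
print(laptop_north_2023)
```

In this model a HOLAP arrangement would keep `stored_cube` for the common summary reports and fall back to `build_cube` on the detail rows when a report needs a level that was not stored.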
