DW Basics
An enterprise data warehouse is a relational database that is specially designed for analyzing the
business, making decisions to achieve business goals, and responding to business
problems; it is not designed for business transaction processing.
A data warehouse is a concept of consolidating data from multiple OLTP databases.
1.Low range
2.Mid range
3.High range
Example: MS Access
Can organize and manage terabytes and petabytes of information
Example: Teradata, Netezza, Greenplum, Hadoop.
There are two types of data storage patterns supported by relational databases
7. Recommended for data warehousing in small and medium-scale enterprises with storage
capacity in gigabytes
2. Shared-nothing architecture (every processor has dedicated memory and disk that are not shared
with any other processor)
4. Unlimited scalability
5. Designed only for building an enterprise data warehouse, not for OLTP
Example: Teradata, Netezza, Hadoop, Greenplum
1. Database that supports enormous storage capacity (billions of rows and terabytes)
2. Database that supports a distributed file storage pattern
6. Database that supports mature optimizers to handle complex SQL queries (run the queries
faster with less system resource usage)
8. 100% data availability without data loss even when software or hardware components are down
10. Database that supports low TCO (total cost of ownership): easy to set up, administer, and manage
11. Single database server that can provide access to hundreds of users concurrently
Data Acquisition:
It is the process of extracting data from multiple source systems, transforming it into a
consistent format, and loading it into a target system. To implement the ETL process we need ETL
tools.
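As a rough illustration of the extract, transform, and load steps described above, the sketch below uses only the Python standard library. The source rows, the clean_row helper, and the customers target table are hypothetical names chosen for this example; a real project would normally rely on an ETL tool.

```python
import sqlite3

# Hypothetical rows extracted from two source systems with inconsistent formats.
source_a = [{"id": "1", "name": " alice ", "country": "in"}]
source_b = [{"id": "2", "name": "BOB", "country": "IN "}]


def clean_row(row):
    """Transform one source row into the consistent target format."""
    return (
        int(row["id"]),
        row["name"].strip().title(),
        row["country"].strip().upper(),
    )


def run_etl(conn, sources):
    """Extract rows from every source, transform them, and load the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, country TEXT)"
    )
    rows = [clean_row(r) for src in sources for r in src]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database standing in for the target
loaded = run_etl(conn, [source_a, source_b])
result = conn.execute(
    "SELECT id, name, country FROM customers ORDER BY id"
).fetchall()
```

After the run, both sources end up in one table with trimmed names and upper-case country codes.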
Data Cleansing:
Data Merging:
1.Join
2.Union
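The two merging styles listed above can be sketched in plain Python: a union stacks rows that have the same columns (vertical merge), while a join combines columns of rows that share a key (horizontal merge). The sample rows and the helper names union and inner_join are assumptions made for this illustration.

```python
# Hypothetical data: customers from two systems, plus their orders.
customers_a = [{"cid": 11, "name": "C1"}, {"cid": 12, "name": "C2"}]
customers_b = [{"cid": 12, "name": "C2"}, {"cid": 13, "name": "C3"}]
orders = [{"cid": 11, "amount": 600}, {"cid": 12, "amount": 400}]


def union(left, right):
    """Stack rows with the same columns and drop duplicates, like SQL UNION."""
    seen, out = set(), []
    for row in left + right:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def inner_join(left, right, key):
    """Combine the columns of rows that share the same key value."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


all_customers = union(customers_a, customers_b)
customer_orders = inner_join(all_customers, orders, "cid")
```

Customer 12 appears in both sources but only once after the union, and customer 13 drops out of the join because it has no matching order.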
Data warehouse:
1. A data warehouse is a relational database that is used to store historical data for query and analysis
SOR-->Source of records
A computer system that stores time-sensitive, transaction-related data that is processed
immediately and is always kept current.
1. Dimension Table
2. Fact Table
1. Dimension Table:
Customer, Product, Stores, Employees, Promotions, Time
Fact Table:
Sales, Purchase, Inventory
1. SA_LoanTransaction Fact
2. CC_Transaction Fact
3. CC_Statement Fact
A fact table consists of keys and measures, and it has a composite primary key
3. A factless fact table acts as a bridge between the dimension tables
Example of a factless fact table: Employee Attendance factless fact
Dimension Tables
Auditorium    Sponsors       Time        Participant        Events
Aud Id        Sponsor Id     Date Key    Participant Id     Event Id
Aud Name      Sponsor Name   Month Key   Participant Name   Event Name
Aud Type      Contribution   Qtr         Gender             Event Type
Aud Mgr       Address        Year        Address            Event Desc
Aud Address
Fact Table
Aud Id   Sponsor Id   Participant Id   Event Id
A1       S1           P1               E1
A1       S1           P2               E1
A2       S1           P3               E1
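Because a factless fact table holds only dimension keys, it is typically analyzed by counting rows. A minimal sketch, assuming the three fact rows shown above are held as Python tuples:

```python
from collections import Counter

# Rows of the factless fact table above: dimension keys only, no measures.
# Column order: Aud Id, Sponsor Id, Participant Id, Event Id.
fact_rows = [
    ("A1", "S1", "P1", "E1"),
    ("A1", "S1", "P2", "E1"),
    ("A2", "S1", "P3", "E1"),
]

# With no measure to sum, the analysis is a count of rows per key.
participants_per_auditorium = Counter(aud_id for aud_id, _, _, _ in fact_rows)
participants_per_event = Counter(event_id for *_, event_id in fact_rows)
```

Here the counts give the number of participant registrations per auditorium and per event.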
It consists of semi-additive and non-additive facts; it describes the state of things at a
particular instant of time
Types of Facts:
1. Additive Facts
1. Additive Fact: Business measurements in a fact table that can be summed up across all of
the dimension keys
Fact Table
Store Key Prod Key Date Key Revenue
S1 P1 12-Jan-15 600
S1 P2 12-Jan-15 400
S2 P2 12-Jan-15 800
S2 P3 13-Jan-15 500
S3 P1 13-Jan-15 700
S3 P3 14-Jan-15 900
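To see why Revenue is additive, the sketch below sums it along each of the three dimension keys in the fact table above. The helper name total_by is an assumption made for this example.

```python
from collections import defaultdict

# (store_key, prod_key, date_key, revenue) rows from the fact table above.
sales = [
    ("S1", "P1", "12-Jan-15", 600),
    ("S1", "P2", "12-Jan-15", 400),
    ("S2", "P2", "12-Jan-15", 800),
    ("S2", "P3", "13-Jan-15", 500),
    ("S3", "P1", "13-Jan-15", 700),
    ("S3", "P3", "14-Jan-15", 900),
]


def total_by(rows, position):
    """Sum revenue grouped by the dimension key at the given tuple position."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[position]] += row[3]
    return dict(totals)


by_store = total_by(sales, 0)
by_product = total_by(sales, 1)
by_date = total_by(sales, 2)
```

Each grouping adds up to the same grand total of 3900, which is what makes the measure additive across every dimension.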
Semi Additive Fact: Business measurements in a fact table that can be summed up across only
a few dimension keys
Balance By Acct Id
Acct Id   Balance   Balance
21653     1600000   900000
21654     1000000   600000
Balance By Date
Date Key    Balance
12-Jan-15   1100000
13-Jan-15   1500000
SEM1 80%
SEM2 60%
TOTAL 140% (wrong: percentages cannot be summed)
Types of Dimensions:
1. Conformed Dimension
2. Degenerated Dimension
3. Shrunken Dimension
4. Junk Dimension
5. Dirty Dimension
Types of Dimensions:
Conformed Dimension: A dimension that is shared across multiple fact tables is called a
conformed dimension; it is also the dimension used to join data marts
Banking Domain:
Degenerated Dimension:
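If a fact table acts as a dimension and is shared with another fact table (or maintains a foreign key
in another fact table), such a table is called a degenerated dimension. In standard dimensional-modeling
usage, a degenerate dimension is a dimension key (for example, an order or invoice number) stored in
the fact table with no dimension table of its own.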
Shrunken Dimension:
Or
Cardinality is the number of unique values in a column, or cardinality expresses the minimum and
maximum number of instances of an entity ‘B’ that can be associated with an instance of entity ‘A’
Dirty Dimension:
If a record occurs more than once in a table, differing only in a non-key attribute, such a
table is called a dirty dimension
Orders:
Or
1. SCD TYPE-I
2. SCD TYPE-II
3. SCD TYPE-III
1. SCD TYPE-I:
SCD TYPE-II:
Change is inserted as a new record
PRODUCTS
PID   PNAME   PRICE   EFF_DATE
11    ABC     300     12-JAN-10
12    PQR     270     15-JAN-10
PRICE OF PRODUCT 12 CHANGED TO 199 ON 27-AUG-11
PKEY   PID   PNAME   PRICE   EFF_DATE    END_DATE
100    11    ABC     300     12-JAN-10
101    12    PQR     270     15-JAN-10   26-AUG-11
102    12    PQR     199     27-AUG-11
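The PRODUCTS example above can be reproduced with a small sketch of the SCD Type-II idea: the current row is closed with an end date and the change is inserted as a new row with a new surrogate key. The function name apply_scd2 and the choice to end the old row one day before the change are assumptions taken from the example data, not a fixed rule.

```python
from datetime import date, timedelta

# Dimension rows before the change, mirroring the PRODUCTS example above.
products = [
    {"pkey": 100, "pid": 11, "pname": "ABC", "price": 300,
     "eff_date": date(2010, 1, 12), "end_date": None},
    {"pkey": 101, "pid": 12, "pname": "PQR", "price": 270,
     "eff_date": date(2010, 1, 15), "end_date": None},
]


def apply_scd2(rows, pid, new_price, change_date):
    """Close the current row for pid and append a new row holding the new price."""
    current = next(r for r in rows if r["pid"] == pid and r["end_date"] is None)
    # Assumption: the old version ends the day before the new one takes effect.
    current["end_date"] = change_date - timedelta(days=1)
    new_row = {
        **current,
        "pkey": max(r["pkey"] for r in rows) + 1,  # next surrogate key
        "price": new_price,
        "eff_date": change_date,
        "end_date": None,  # open-ended: this is now the current version
    }
    rows.append(new_row)
    return new_row


apply_scd2(products, pid=12, new_price=199, change_date=date(2011, 8, 27))
```

The full price history of product 12 stays queryable, and the current version is the row whose end date is empty.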
CKEY   CID   CNAME   CURRLOC   PREVLOC
101    11    BEN     HYD
102    12    TOM     CHE
CKEY   CID   CNAME   CURRLOC   PREVLOC
101    11    BEN     KER       HYD
102    12    TOM     BNG       CHE
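The two customer tables above follow the pattern usually called SCD Type-III: the row is updated in place and only the immediately previous value is kept in a separate column. A minimal sketch, with apply_scd3 and the lower-case column names as assumed names for this example:

```python
# Customer dimension before the change, mirroring the first table above.
customers = [
    {"ckey": 101, "cid": 11, "cname": "BEN", "currloc": "HYD", "prevloc": None},
    {"ckey": 102, "cid": 12, "cname": "TOM", "currloc": "CHE", "prevloc": None},
]


def apply_scd3(rows, cid, new_loc):
    """Move the current location to prevloc and store the new location in place."""
    for row in rows:
        if row["cid"] == cid and row["currloc"] != new_loc:
            row["prevloc"] = row["currloc"]
            row["currloc"] = new_loc


apply_scd3(customers, 11, "KER")
apply_scd3(customers, 12, "BNG")
```

No new row is created, so only one level of history survives; a second change would overwrite the stored previous location.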
Role Play Dimension: A dimension that is reused in multiple applications within the database
Data Modeling:
Model: A business representation of the structure of the data in one or more databases
OLTP: ER model is used
Model is normalized
1. Star Schema
3. Galaxy Schema
1. Star Schema: In a star schema, the centre of the star is the fact table and the corners are the dimension
tables
Customer
Cid   Cname   Gender   Geoid
11    C1      1        111
12    C2      1        111
13    C3      0        112
14    C4      1        111
Geography
Geoid   City   State   Country   Region
111     Hyd    Ts      India     Asia
112     VSP    Ap      India     Asia
Cid   Cname   Gender   Geoid   City   State   Country   Region
11    C1      1        111     Hyd    Ts      India     Asia
12    C2      1        111     Hyd    Ts      India     Asia
13    C3      0        112     VSP    Ap      India     Asia
14    C4      1        111     Hyd    Ts      India     Asia
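The combined table above is the result of joining the Customer rows to the Geography rows on Geoid. A short sketch of that denormalization, assuming the rows are held as Python tuples:

```python
# Normalized tables from the example above.
# Customer columns: Cid, Cname, Gender, Geoid.
customer = [
    (11, "C1", 1, 111),
    (12, "C2", 1, 111),
    (13, "C3", 0, 112),
    (14, "C4", 1, 111),
]
# Geography keyed by Geoid; values are City, State, Country, Region.
geography = {
    111: ("Hyd", "Ts", "India", "Asia"),
    112: ("VSP", "Ap", "India", "Asia"),
}

# Denormalize: append the geography attributes to every customer row.
denormalized = [row + geography[row[3]] for row in customer]
```

The geography attributes are repeated on every customer row, which trades extra storage for fewer joins at query time.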
1.B*Tree Index
2.BitMap Index
STEP1:
=============================================================================
STATE_ID,STATE_NAME,COUNTRY_ID
250.00,Rio Negro,111
251.00,Buenos Aires,111
252.00,Victoria,115
253.00,South Australia,115
254.00,Queensland,115
255.00,Northern Territory,115
258.00,Sao Paulo,110
259.00,Santa Catarina,110
260.00,Rio de Janeiro,110
=============================================================================
C:\SOURCE\States.txt
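Before loading, it can help to check that the flat file parses cleanly. The sketch below reads the same sample rows from an in-memory string with the standard csv module; converting STATE_ID values such as 250.00 to integers is an assumption about the intended target column type.

```python
import csv
import io

# Same content as the sample States.txt shown above, held in memory for the sketch.
sample = """STATE_ID,STATE_NAME,COUNTRY_ID
250.00,Rio Negro,111
251.00,Buenos Aires,111
252.00,Victoria,115
253.00,South Australia,115
254.00,Queensland,115
255.00,Northern Territory,115
258.00,Sao Paulo,110
259.00,Santa Catarina,110
260.00,Rio de Janeiro,110
"""

reader = csv.DictReader(io.StringIO(sample))
# Assumption: STATE_ID is meant to be an integer key, so "250.00" becomes 250.
states = [
    (int(float(r["STATE_ID"])), r["STATE_NAME"], int(r["COUNTRY_ID"]))
    for r in reader
]
```

A row with a missing or non-numeric field would raise an error here, before any database load is attempted.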
STEP2:
Create a table in the Oracle database using the script given below
LOAD DATA
INFILE 'C:\SOURCE\States.txt'
STEP4:
C:\oracle\product\10.2.0\db_1\BIN>exit
STEP5:
KICK-OFF MEETINGS
ANALYSIS PHASE
DESIGN PHASE
CODING PHASE
REVIEWS
TESTING PHASE
GO LIVE PHASE
SUPPORT
1.Analysis Phase:
Business Analyst:
Outcome:
2. DB Tool to be used
2.Design Phase:
The data warehouse architect/ETL architect provides the solution to build the DW or data marts
Outcome:
1.Summary Information
2.Project Architecture
3.System Architecture
4.Source
5.DB
12.Mapping Details
Outcome: Low Level Design Document. It consists of source and target object details
ETL Team:
3.Coding Phase:
Mappings are created based on the design document
Code Review: A code review checks the business logic and whether naming standards are
followed
Peer Review:
A team member reviews the same points mentioned above; if everything is OK, testing begins
4.Testing Phase:
1. Unit testing: Mappings are tested by individual developers, using the debugger or enabling test load to run the
mapping with limited test data
2. SIT (System Integration Testing): Mappings are tested according to their dependencies
3. UAT (User Acceptance Testing): Mappings are tested in the presence of onsite users
5. Production Phase:
Jobs are scheduled and monitored using scheduling tools such as UC4, DAC, AutoSys, Control-M, and Tivoli
Workload Scheduler
Project Architecture: