DW Lecture 05
Architectural Components
3 Major Areas
Data Acquisition: Extraction, Transformation, Cleansing, Integration, Staging
Data Storage: Loading, Archiving, Management
Information Delivery: Reports, Query Processing, Complex Analysis
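The three areas can be pictured as stages of a small pipeline. The sketch below is only an illustration under stated assumptions: the record fields (`customer_id`, `name`), the function names, and the in-memory lists standing in for source systems and the warehouse are all hypothetical and do not come from the lecture.

```python
from typing import Dict, List

Record = Dict[str, str]


def extract(sources: List[List[Record]]) -> List[Record]:
    """Data acquisition: copy raw records out of every source system."""
    return [dict(rec) for source in sources for rec in source]


def transform_and_cleanse(records: List[Record]) -> List[Record]:
    """Standardize names and drop records that have no customer id."""
    cleaned = []
    for rec in records:
        cust_id = rec.get("customer_id", "").strip()
        if not cust_id:
            continue  # cleansing rule assumed for this example
        name = rec.get("name", "").strip().title()
        cleaned.append({"customer_id": cust_id, "name": name})
    return cleaned


def integrate(records: List[Record]) -> List[Record]:
    """Merge records from different sources: one record per customer id."""
    merged: Dict[str, Record] = {}
    for rec in records:
        merged.setdefault(rec["customer_id"], rec)  # first occurrence wins
    return list(merged.values())


def load(staged: List[Record], warehouse: List[Record]) -> None:
    """Data storage: move staged records into the warehouse."""
    warehouse.extend(staged)


def report(warehouse: List[Record]) -> List[str]:
    """Information delivery: a trivial query over the stored data."""
    return sorted(rec["name"] for rec in warehouse)


# Hypothetical source systems (production and external data).
production = [
    {"customer_id": "C1", "name": " jane brown "},
    {"customer_id": "", "name": "unknown"},
]
external = [
    {"customer_id": "C1", "name": "JANE BROWN"},
    {"customer_id": "C2", "name": "acme corp"},
]

warehouse: List[Record] = []
staged = integrate(transform_and_cleanse(extract([production, external])))
load(staged, warehouse)
names = report(warehouse)
print(names)
```

Keeping each stage a separate function mirrors the separation of acquisition, storage, and delivery; a real warehouse would use a staging area on disk and a DBMS rather than Python lists.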
Building Blocks of the Data Warehouse
Source Data
Data Staging
Data Storage
Information Delivery
Metadata
Management and Control
Architectural Components
[Figure: Source Data (Production, Internal, Archived, External) passes through Data Staging (DATA ACQUISITION) into Data Storage holding the Data Warehouse (DATA STORAGE), and on to Information Delivery (Reports / Queries, OLAP, Data Mining) (INFORMATION DELIVERY), with Management & Control over all areas.]
Management Software
Physical: Hardware, Operating System, DBMS, Network Software
Platform Options
Single Platform Option
Hybrid Option: Source Data Platforms, Staging Area Platforms
Options for Staging Area: Source Data Platforms, Data Storage Platforms, Separate Platforms
Data Movement Options: Shared Disk, Mass Data Transmission, Real Time Connection, Manual Methods
Server Hardware
SMP (Symmetric Multiprocessing)
Clusters
MPP (Massively Parallel Processing)
ccNUMA or NUMA (Cache-coherent Nonuniform Memory Architecture)
Symmetric Multiprocessing
Features: Shared everything architecture, Simplest parallel processing
Benefits: Proven technology since 1970, Workload balance, Scalable performance, Easy administration
Limitations: Limited available memory, Limited bandwidth, Limited availability
Consideration: Suited to a data warehouse of two to three hundred gigabytes, with concurrency requirements taken into account
[Figure: Symmetric Multiprocessing. Four processors connected by a common bus to shared memory and shared disks.]
Clusters
Features: Each node has one or more processors and associated memory, Memory is shared within each node only, High speed bus communication, Shared disks, Cluster of nodes
Benefits: High availability, Preserves the concept of one database, Incremental growth
Limitations: Bus bandwidth, High O/S overhead, Cache consistency maintenance for inter-node synchronization
Consideration: Suitable if the data warehouse is expected to grow in well-defined increments
[Figure: Clusters. Four processors grouped into nodes, each node with its own shared memory.]
Massively Parallel Processing
Features: Shared nothing architecture, Focus on disk access rather than memory access, Works well with an O/S that supports transparent disk access, Inter-node communication through processor-to-processor connection
Benefits: Highly scalable, Fast access between nodes, Improved system availability, Cost per node is low
Limitations: Requires rigid data partitioning, Restricted data access, Limited workload balance, Cache consistency must be maintained
Considerations: Medium to large data warehouse of four to five hundred gigabytes
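The "rigid data partitioning" that a shared nothing architecture requires can be illustrated with hash partitioning. This is a minimal sketch, not a description of any particular MPP product: the row layout, the function names, and the choice of CRC32 as the hash are assumptions made for the example.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def node_for(key: str, node_count: int) -> int:
    """Map a partitioning key to a node with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def partition(rows: List[Row], key_field: str, node_count: int) -> Dict[int, List[Row]]:
    """Assign every row to exactly one node; nodes share no memory or disk."""
    nodes: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        nodes[node_for(str(row[key_field]), node_count)].append(row)
    return dict(nodes)


def local_totals(partitions: Dict[int, List[Row]], amount_field: str) -> Dict[int, int]:
    """Each node aggregates only the rows it owns."""
    return {
        node: sum(int(row[amount_field]) for row in node_rows)
        for node, node_rows in partitions.items()
    }


# Hypothetical sales rows.
rows = [
    {"customer_id": "C1", "amount": 100},
    {"customer_id": "C2", "amount": 250},
    {"customer_id": "C1", "amount": 50},
    {"customer_id": "C3", "amount": 200},
]

parts = partition(rows, "customer_id", node_count=4)
per_node = local_totals(parts, "amount")
grand_total = sum(per_node.values())  # combine the per-node partial results
print(per_node, grand_total)
```

Because the node is fixed by the key, all rows for one customer land on the same node. That is what makes local aggregation possible, and it is also why access is restricted and the workload can become unbalanced when keys are skewed.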
[Figure: MPP architecture, showing four nodes, each with its own memory and disk.]
ccNUMA or NUMA
Benefits: Maximum flexibility, Overcome memory limitations of SMP, Better scalability than SMP, Partitioning with centralized approach
Limitations: Complex programming, Limited software support, Still maturing
Consideration: For experienced technology users
[Figure: NUMA architecture, showing two shared memory blocks and two sets of disks.]
Software Tools
Architecture First, Then Tools
Data Modeling
Data Extraction
Data Transformation
Data Loading
Data Quality
Queries and Reports
OLAP
Alert Systems
Middleware and Connectivity
Data Warehouse Management
Metadata
Definitions
Data about data
Table of contents for the data
Catalog for the data
Data warehouse atlas
Data warehouse roadmap
Data warehouse directory
The nerve center
Metadata Example
Entity Name: Customer
Alias Names: Account, Client
Definition: A person or an organization that purchases goods or services from the company
Remarks: It includes regular, current and past customers
Source Systems: Finished Goods Orders, Maintenance Contracts, Online Sales
Created Date: January 15, 1999
Last Update Date: January 21, 2001
Update Cycle: Weekly
Last Full Refresh: December 29, 2000
Full Refresh Cycle: Every Six Months
Data Quality Reviewed: January 25, 2001
Last Deduplication: January 10, 2001
Planned Archival: Every Six Months
Responsible User: Jane Brown
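The Customer entry above can be held as a structured record so that tools can read it programmatically. The sketch below is one possible representation: the class name, the field names, the `update_overdue` helper, and the rule that "Weekly" means seven days are assumptions for illustration, while the values are taken from the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class EntityMetadata:
    entity_name: str
    alias_names: List[str]
    definition: str
    remarks: str
    source_systems: List[str]
    created_date: date
    last_update_date: date
    update_cycle: str
    last_full_refresh: date
    full_refresh_cycle: str
    data_quality_reviewed: date
    last_deduplication: date
    planned_archival: str
    responsible_user: str

    def update_overdue(self, as_of: date) -> bool:
        """Hypothetical helper: has the update cycle elapsed since the last update?"""
        cycle_days = {"Weekly": 7}  # assumed mapping, not part of the slide
        due = self.last_update_date + timedelta(days=cycle_days[self.update_cycle])
        return as_of > due


customer = EntityMetadata(
    entity_name="Customer",
    alias_names=["Account", "Client"],
    definition=("A person or an organization that purchases goods "
                "or services from the company"),
    remarks="It includes regular, current and past customers",
    source_systems=["Finished Goods Orders", "Maintenance Contracts", "Online Sales"],
    created_date=date(1999, 1, 15),
    last_update_date=date(2001, 1, 21),
    update_cycle="Weekly",
    last_full_refresh=date(2000, 12, 29),
    full_refresh_cycle="Every Six Months",
    data_quality_reviewed=date(2001, 1, 25),
    last_deduplication=date(2001, 1, 10),
    planned_archival="Every Six Months",
    responsible_user="Jane Brown",
)

print(customer.update_overdue(date(2001, 2, 1)))
```

Storing dates as `date` values rather than strings is a design choice that lets an administrator check refresh and update cycles automatically.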
Need of Metadata
For Using Data Warehouse
For Building Data Warehouse
For Administering Data Warehouse
Who needs it? IT Professionals, Power Users, Casual Users
A Nerve Center
[Figure: Metadata as the nerve center, connected to Source Systems, Extraction Tools, Cleansing Tools, Query Tool, Reporting Tool, and OLAP Tool.]
Metadata Requirements
Capturing and Storing Data
Variety of Metadata Sources
Metadata Integration
Metadata Standardization
Rippling through Revisions
Keeping Metadata Synchronized
Metadata Exchange
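Two of these requirements, rippling through revisions and keeping metadata synchronized, can be illustrated with a small versioned repository. This is a sketch under assumptions only: the `MetadataRepository` class, its methods, and the tool names are hypothetical and do not describe any real metadata product or standard.

```python
from typing import Dict, List


class MetadataRepository:
    """Hypothetical central store that versions each entity definition."""

    def __init__(self) -> None:
        self._definitions: Dict[str, str] = {}
        self._versions: Dict[str, int] = {}
        self._seen: Dict[str, Dict[str, int]] = {}  # tool -> entity -> version

    def publish(self, entity: str, definition: str) -> int:
        """Record a new revision of an entity definition."""
        self._definitions[entity] = definition
        self._versions[entity] = self._versions.get(entity, 0) + 1
        return self._versions[entity]

    def sync(self, tool: str, entity: str) -> str:
        """A tool pulls the current definition and is marked up to date."""
        self._seen.setdefault(tool, {})[entity] = self._versions[entity]
        return self._definitions[entity]

    def stale_tools(self, entity: str) -> List[str]:
        """Tools that synced this entity before but hold an older revision."""
        current = self._versions[entity]
        return sorted(
            tool
            for tool, seen in self._seen.items()
            if entity in seen and seen[entity] < current
        )


repo = MetadataRepository()
repo.publish("Customer", "A person or an organization that buys from the company")
repo.sync("extraction_tool", "Customer")
repo.sync("olap_tool", "Customer")

# A revision must ripple through to every tool that uses the definition.
repo.publish("Customer", "Includes regular, current and past customers")
repo.sync("olap_tool", "Customer")

stale = repo.stale_tools("Customer")
print(stale)
```

Tracking which revision each tool last saw makes it possible to list the tools that still need the update, which is the core of keeping metadata synchronized.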
Metadata Sources
Source Systems
Data Extraction
Data Transformation and Cleansing
Data Loading
Data Storage
Information Delivery