DWM 3
1. "top-down" approach
2. "bottom-up" approach
1. "Top-down" approach
-In this approach, the data warehouse is built first, and the data marts are then built on top of the data warehouse.
Process-
-ETL tools are used to check the accuracy and correctness of the data.
-Techniques such as summarization and aggregation can be applied to the data before it is loaded into the data warehouse.
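As a rough illustration of the top-down flow, the Python sketch below runs a tiny extract, check, summarize, and load pipeline and then carves a data mart out of the finished warehouse. The source rows, the "no negative sales" validity rule, and the function names (`extract`, `transform`, `load`, `build_data_mart`) are hypothetical stand-ins, not the API of any real ETL tool.

```python
from collections import defaultdict

# Hypothetical operational sources: each row is (city, item, year, dollars_sold).
SOURCE_A = [("Vancouver", "TV", 2023, 400.0), ("Toronto", "TV", 2023, 250.0)]
SOURCE_B = [("Vancouver", "phone", 2023, 150.0), ("Toronto", "TV", 2023, -5.0)]


def extract(*sources):
    """Extract: pull rows from every source system into one staging list."""
    staging = []
    for source in sources:
        staging.extend(source)
    return staging


def transform(rows):
    """Transform: reject invalid rows, then summarize (aggregate) by city, item, year."""
    valid = [row for row in rows if row[3] >= 0]  # hypothetical accuracy check
    summary = defaultdict(float)
    for city, item, year, dollars in valid:
        summary[(city, item, year)] += dollars
    return dict(summary)


def load(warehouse, summary):
    """Load: write the cleaned, summarized data into the central warehouse."""
    warehouse.update(summary)
    return warehouse


def build_data_mart(warehouse, city):
    """Top-down: a data mart is derived from the warehouse, here one per city."""
    return {key: value for key, value in warehouse.items() if key[0] == city}


warehouse = load({}, transform(extract(SOURCE_A, SOURCE_B)))
toronto_mart = build_data_mart(warehouse, "Toronto")
print(toronto_mart)  # {('Toronto', 'TV', 2023): 250.0}
```

Because every mart is just a filtered view of the warehouse, adding another mart is a single call to `build_data_mart`, which mirrors the advantage listed below.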
Samarth
Advantages
4. Developing a new data mart from the data warehouse is very easy.
Disadvantages
2. "Bottom-up" approach
Process-
-In this approach, data marts are created first. Data marts provide reporting and analytics capabilities for specific business processes.
The data flow starts with the extraction of data from the various source systems into the staging area.
Advantages
Disadvantages
-The positions of the data warehouse and the data marts are reversed in the bottom-up design.
-It saves IT organizations time and money in their business analysis process.
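A minimal sketch of the reversed order in the bottom-up approach, assuming a hypothetical staging area in which each row is tagged with its department: each mart is built independently from staging, and the warehouse is only assembled afterwards by integrating the marts. Real integration would also conform shared dimensions across marts, which this sketch omits.

```python
# Hypothetical staging area: rows already extracted from the source systems.
STAGING = [
    {"dept": "sales", "region": "East", "amount": 120},
    {"dept": "sales", "region": "West", "amount": 80},
    {"dept": "inventory", "region": "East", "amount": 30},
]


def build_data_mart(staging, dept):
    """Bottom-up step 1: build a mart for one business process directly from staging."""
    return [row for row in staging if row["dept"] == dept]


def integrate_marts(marts):
    """Bottom-up step 2: form the warehouse later by integrating the existing marts."""
    warehouse = []
    for name, rows in marts.items():
        for row in rows:
            warehouse.append({**row, "source_mart": name})
    return warehouse


# Each department gets its own mart first, so reporting can start early.
marts = {dept: build_data_mart(STAGING, dept) for dept in ("sales", "inventory")}
warehouse = integrate_marts(marts)
print(len(marts["sales"]), len(warehouse))  # 2 3
```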
The top-down view − This view allows the selection of relevant information needed for a data
warehouse.
The data source view − This view presents the information being captured, stored, and
managed by the operational system.
The data warehouse view − This view includes the fact tables and dimension tables. It
represents the information stored inside the data warehouse.
The business query view − It is the view of the data from the viewpoint of the end-user.
-The top-down approach starts with the overall design and planning.
-This approach is useful where the technology is mature and well known, and where the business problem is clear and well understood.
-In the bottom-up approach, the process starts with experiments and prototypes. This is useful in the early stages of business modeling and technology development.
-From a software engineering point of view, the design and construction of a data warehouse consists of the following steps:
*Planning
*Requirements study
*Problem analysis
*Warehouse design
-Two well-known methods in software development are the waterfall model and the spiral model.
-In the waterfall model, a structured and systematic analysis is performed at each step before proceeding to the next.
-In the spiral model, increasingly functional systems are developed rapidly, with short intervals between successive releases.
-If the business process is organizational and involves multiple complex object collections, a data warehouse model should be used.
3. Choosing the dimensions that apply to each fact table record, e.g., customer, account, item.
4. Choosing the measures
-Typical measures are numeric additive quantities, e.g., dollars_sold, units_sold.
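To make the dimension and measure choices concrete, here is a small Python sketch of one fact table record type built from the dimensions and measures named above. The sample values and the `roll_up` helper are hypothetical. Because dollars_sold and units_sold are additive, rolling up to any single dimension is simply a sum.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    """One fact table record: dimension keys plus additive measures."""
    customer: str        # dimension (step 3)
    account: str         # dimension (step 3)
    item: str            # dimension (step 3)
    dollars_sold: float  # measure (step 4)
    units_sold: int      # measure (step 4)


# Hypothetical fact rows.
facts = [
    SalesFact("C1", "A1", "TV", 500.0, 1),
    SalesFact("C1", "A1", "phone", 300.0, 2),
    SalesFact("C2", "A2", "TV", 450.0, 1),
]


def roll_up(facts, dimension):
    """Group facts by one dimension and sum the additive measures within each group."""
    totals = defaultdict(lambda: [0.0, 0])
    for fact in facts:
        group = totals[getattr(fact, dimension)]
        group[0] += fact.dollars_sold
        group[1] += fact.units_sold
    return {key: tuple(values) for key, values in totals.items()}


print(roll_up(facts, "item"))      # {'TV': (950.0, 2), 'phone': (300.0, 2)}
print(roll_up(facts, "customer"))  # {'C1': (800.0, 3), 'C2': (450.0, 1)}
```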
-After the data warehouse has been designed and constructed, deployment begins, which includes:
-Installation
-Training
-Roll out Planning
-Querying
-Statistical analysis
1. High quality of data in data warehouses: A data warehouse constructed with preprocessing techniques serves as a valuable source of high-quality data for OLAP.
6. Online selection of data mining functions: OLAP integrated with data mining gives users the flexibility to select the desired data mining functions.
For example, the AllElectronics sales data cube contains city, item, year, and sales in dollars, as shown in the figure below. The three attributes city, item, and year are the dimensions of the data cube, and sales in dollars is the measure. The total number of cuboids, or group-by's, that can be computed for this data cube is 2^3 = 8. The possible group-by's are
{(city, item, year),
(city, item),
(city, year),
(item, year),
(city),
(item),
(year),
()}
where () means that the group-by is empty (i.e., the dimensions are not grouped). These group-by's form a lattice of cuboids for the data cube, as shown in the figure below.
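The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimension set, so n dimensions give 2^n cuboids. A minimal Python sketch using the standard `itertools.combinations` function (the `cuboids` helper name is our own):

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every group-by (cuboid) of a data cube: one per subset of the dimensions."""
    result = []
    # From the base cuboid (all dimensions) down to the apex cuboid ().
    for size in range(len(dimensions), -1, -1):
        result.extend(combinations(dimensions, size))
    return result


lattice = cuboids(("city", "item", "year"))
print(len(lattice))  # 8, i.e. 2 ** 3
for group_by in lattice:
    print(group_by)
```

The loop prints the cuboids in the same order as the list above, from the base cuboid (city, item, year) down to the apex cuboid ().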
2. Full Materialization:
-This is referred to as a full cube, because all of the cuboids are precomputed in advance.
-It requires a large amount of storage space, since every cuboid must be stored.
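The following sketch shows what full materialization means in practice, assuming a hypothetical three-row fact table: every one of the 2^3 = 8 cuboids is computed up front and stored. Even this tiny example stores more aggregate cells than there are base rows, which illustrates why full materialization is expensive in space.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "item", "year")
# Hypothetical base facts: (city, item, year, sales_in_dollars)
FACTS = [
    ("Vancouver", "TV", 2023, 400.0),
    ("Vancouver", "phone", 2024, 150.0),
    ("Toronto", "TV", 2023, 250.0),
]


def full_materialization(facts, dimensions):
    """Precompute and store the aggregate for every cuboid in the lattice (a full cube)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for positions in combinations(range(len(dimensions)), size):
            totals = defaultdict(float)
            for row in facts:
                key = tuple(row[p] for p in positions)
                totals[key] += row[-1]  # the measure is the last column
            cube[tuple(dimensions[p] for p in positions)] = dict(totals)
    return cube


cube = full_materialization(FACTS, DIMENSIONS)
print(len(cube))                                   # 8 cuboids
print(cube[()])                                    # {(): 800.0}
print(cube[("city",)])                             # {('Vancouver',): 550.0, ('Toronto',): 250.0}
print(sum(len(cells) for cells in cube.values()))  # 18 stored cells for 3 base rows
```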
3. Partial Materialization:
1. Bitmap Indexing
Advantages
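-It represents each attribute value as a bit vector, using a single bit per row.
-It saves processing time, because comparison, join, and aggregation operations reduce to bitwise arithmetic.
-It reduces storage space.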
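A minimal bitmap index sketch, assuming a small hypothetical sales table: each distinct value of a column gets a bit vector (here a Python integer) with bit i set when row i has that value. A multi-attribute selection then reduces to bitwise AND/OR on those integers.

```python
def build_bitmap_index(column):
    """Map each distinct value to an integer used as a bit vector (bit i = row i)."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_of(bitmap):
    """Decode a bit vector back into the list of matching row IDs."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical columns of a small sales table (row IDs 0..4).
city = ["Vancouver", "Toronto", "Vancouver", "Chicago", "Toronto"]
item = ["TV", "TV", "phone", "TV", "phone"]

city_index = build_bitmap_index(city)
item_index = build_bitmap_index(item)

# city = 'Vancouver' AND item = 'TV' becomes a single bitwise AND.
print(rows_of(city_index["Vancouver"] & item_index["TV"]))       # [0]
# city = 'Vancouver' OR city = 'Toronto' becomes a single bitwise OR.
print(rows_of(city_index["Vancouver"] | city_index["Toronto"]))  # [0, 1, 2, 4]
```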
-It maintains relationships between the attribute values of a dimension and the corresponding rows of a table.
-In data warehouses, a join index relates the values of the dimensions of a star schema to rows in the fact table.
For example, consider a star schema with a fact table, sales, and two dimensions, city and product. A join index on city maintains, for each distinct city, a list of the R-IDs of the tuples recording the sales in that city, as shown in the figure below.
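The join index in this example can be sketched as a mapping from each city to the R-IDs of the matching fact tuples. The table contents, the city keys `T57` and `V23`, and the use of list positions as R-IDs are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical star schema: dimension table `city` and fact table `sales`.
city_dim = {"T57": "Toronto", "V23": "Vancouver"}  # city key -> city name
sales_fact = [                                     # R-ID = position in this list
    {"city_key": "T57", "product": "TV", "dollars": 250},
    {"city_key": "V23", "product": "TV", "dollars": 400},
    {"city_key": "T57", "product": "phone", "dollars": 90},
]


def build_join_index(fact_rows, dim_table, fk):
    """For each dimension value, keep the R-IDs of the fact tuples that join to it."""
    index = defaultdict(list)
    for rid, row in enumerate(fact_rows):
        index[dim_table[row[fk]]].append(rid)
    return dict(index)


join_index = build_join_index(sales_fact, city_dim, "city_key")
print(join_index)  # {'Toronto': [0, 2], 'Vancouver': [1]}
# Total Toronto sales, read straight from the R-ID list without re-joining the tables:
print(sum(sales_fact[rid]["dollars"] for rid in join_index["Toronto"]))  # 340
```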
-A ROLAP server sits between the relational back-end servers and the client front-end tools.
-ROLAP uses a relational or extended-relational DBMS to store and manage the data in the data warehouse.
○ Database server.
○ ROLAP server.
○ Front-end tool.
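As a sketch of how a ROLAP server works, the code below keeps the fact table in an ordinary relational database (Python's built-in `sqlite3`, in memory) and answers a request for a cuboid by generating a SQL `GROUP BY` query. The table layout and the `rolap_cuboid` helper are hypothetical; a real ROLAP server adds query optimization and aggregate navigation on top of this idea.

```python
import sqlite3

# Relational back end: the fact table lives in an ordinary SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, item TEXT, year INTEGER, dollars REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Vancouver", "TV", 2023, 400.0),
     ("Vancouver", "phone", 2024, 150.0),
     ("Toronto", "TV", 2023, 250.0)],
)

ALLOWED_DIMS = {"city", "item", "year"}


def rolap_cuboid(conn, dims):
    """Translate a request for one cuboid into a SQL GROUP BY over the fact table."""
    if not dims:  # apex cuboid: grand total
        return conn.execute("SELECT SUM(dollars) FROM sales").fetchall()
    # Column names cannot be bound as parameters, so check them against a whitelist.
    if not set(dims) <= ALLOWED_DIMS:
        raise ValueError(f"unknown dimension in {dims}")
    cols = ", ".join(dims)
    sql = f"SELECT {cols}, SUM(dollars) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


print(rolap_cuboid(conn, ["city"]))  # [('Toronto', 250.0), ('Vancouver', 550.0)]
print(rolap_cuboid(conn, []))        # [(800.0,)]
```

Nothing is precomputed here: each cuboid is computed on demand by the relational engine, so the data volume is bounded only by the underlying database.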
Advantages
Can handle large amounts of data: the only limit is the data size of the underlying relational database; ROLAP itself places no limit on the amount of data.
Disadvantages
-MOLAP uses array-based multidimensional storage engines to provide multidimensional views of the data.
○ Database server.
○ MOLAP server.
○ Front-end tool.
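A minimal sketch of array-based multidimensional storage, assuming small hypothetical dimension lists: each cell of the cube lives at a position computed arithmetically from its coordinates (row-major order), so slicing and dicing need no key lookups. The dense array also reserves space for empty cells, which hints at why MOLAP storage is limited in how much data it can hold.

```python
from itertools import product

# Hypothetical dimension members; a member's list position is its array coordinate.
CITIES = ["Toronto", "Vancouver"]
ITEMS = ["TV", "phone"]
YEARS = [2023, 2024]
SHAPE = (len(CITIES), len(ITEMS), len(YEARS))

# Dense, array-based store: one flat list with a cell for every (city, item, year).
cells = [0.0] * (SHAPE[0] * SHAPE[1] * SHAPE[2])


def offset(c, i, y):
    """Row-major position of cell (c, i, y): plain arithmetic, no key lookup."""
    return (c * SHAPE[1] + i) * SHAPE[2] + y


def put(city, item, year, dollars):
    """Add a sales amount into the cell addressed by the three dimension members."""
    cells[offset(CITIES.index(city), ITEMS.index(item), YEARS.index(year))] += dollars


def dice(cities, items, years):
    """Return the sub-cube selected on each dimension as {(city, item, year): value}."""
    return {
        (CITIES[c], ITEMS[i], YEARS[y]): cells[offset(c, i, y)]
        for c, i, y in product(range(SHAPE[0]), range(SHAPE[1]), range(SHAPE[2]))
        if CITIES[c] in cities and ITEMS[i] in items and YEARS[y] in years
    }


put("Vancouver", "TV", 2023, 400.0)
put("Toronto", "TV", 2023, 250.0)
put("Vancouver", "phone", 2024, 150.0)

# A slice is a dice that fixes one dimension: year = 2023, all cities and items.
slice_2023 = dice(CITIES, ITEMS, [2023])
print(slice_2023)
```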
Advantages
Excellent Performance: A MOLAP cube is built for fast information retrieval, and is optimal for
slicing and dicing operations.
Can perform complex calculations: All calculations are pre-generated when the cube is created. Hence, complex calculations are not only possible, but they also return quickly.
Disadvantages
Limited in the amount of information it can handle: Because all calculations are performed when the cube is built, the cube itself cannot hold a large amount of data.
Requires additional investment: Cube technology is often proprietary and may not already exist in the organization. Therefore, adopting MOLAP technology is likely to require additional investment in human and capital resources.
-HOLAP combines features of ROLAP and MOLAP, gaining the greater scalability of ROLAP and the faster computation of MOLAP.
-It allows large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store.
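The split between the two stores can be sketched as follows, assuming a hypothetical sales table: detail rows stay in a relational database (`sqlite3`), while only per-city aggregates are precomputed and kept in a separate in-memory dictionary standing in for the MOLAP store. Summary queries are served from the aggregate store; drilling down to detail goes to SQL. The `query` helper is illustrative only.

```python
import sqlite3
from collections import defaultdict

# Relational side: detail records stay in the SQL database only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, item TEXT, dollars REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Vancouver", "TV", 400.0), ("Vancouver", "phone", 150.0), ("Toronto", "TV", 250.0),
])

# Multidimensional side: only the aggregates are precomputed and stored separately.
totals = defaultdict(float)
for city, dollars in conn.execute("SELECT city, dollars FROM sales"):
    totals[city] += dollars
city_totals = dict(totals)


def query(city, item=None):
    """Answer summary queries from the aggregate store; drill down to detail via SQL."""
    if item is None:
        return city_totals.get(city, 0.0)  # fast path: precomputed aggregate
    row = conn.execute(
        "SELECT SUM(dollars) FROM sales WHERE city = ? AND item = ?", (city, item)
    ).fetchone()
    return row[0]  # detail path: computed from the relational store


print(query("Vancouver"))        # 550.0
print(query("Vancouver", "TV"))  # 400.0
```

The detail rows exist only in the relational table, so no duplicate copy of them is kept alongside the aggregates.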
Advantages of HOLAP
3. HOLAP balances the disk space requirement, as it stores only the aggregate information on the OLAP server while the detail records remain in the relational database. Hence, no duplicate copy of the detail records is maintained.
Disadvantages of HOLAP
1. HOLAP architecture is very complicated because it supports both MOLAP and ROLAP
servers.