
ASSIGNMENT OF

DATA WAREHOUSE AND DATA MINING

CSE-501

SUBMITTED BY: ABHINAV MAHAJAN (ROLL NO: RB27T2A22, REG. NO: 3070070116)
SUBMITTED TO: Ms. NEHA BHATEJA
Q1: A data warehouse tends to be as much as 4 times as large as an operational database. How can you manage this amount of data?
Ans:
The Data Warehouse manages the data in the SQL Server and OLAP databases. These
databases are then used to produce reports and to analyze and view population segments.
The Data Warehouse is designed to support robust query and analysis. You use the Analysis
modules in Commerce Server Business Desk to analyze the data in the Data Warehouse, for
example, to identify user trends or to analyze the effectiveness of a campaign, and then
update your site to target content to specific user groups or to sell specific products.

There are many different ways by which we can manage a large amount of data:

o Ad hoc query access grows over time and must be carefully monitored, as new, inexperienced users tend to run requests against base tables rather than against summary or aggregate tables to produce totals.
o Advances in network, hardware and software technology require release changes to be applied more rapidly.
o An ongoing training program for business analysts, executives and decision-support tool programmers keeps everyone informed about how to use the current version of the data warehouse or mart and how to find the information they need.
o By compression: Redundant or extra data present in the data warehouse is compressed using various techniques and tools so that only relevant, useful data is kept in the data warehouse.

o Follow parallel processing: When the server on which the data is loaded runs short of memory or storage, the data is distributed across multiple servers so that total system throughput is increased through parallel processing of the data on those servers (see the sketch after this list).

Example:
Organizations need actionable and timely business insights from rapidly growing
data. So they take advantage of Microsoft SQL Server 2008 R2 Parallel Data
Warehouse and its massively parallel processing (MPP) architecture to gain scalable
performance, flexibility, and hardware choices with the most comprehensive data
warehouse solution available.
Parallelism can be performed in two ways:
• Horizontal Parallelism
• Vertical Parallelism
o Distribution on the basis of probability of access: The data on the server is physically separated on the basis of how often it is likely to be accessed. Data that is accessed most often and is most important sits at the top tier, followed by moderately used data, and finally by data that is used rarely.
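
As a minimal sketch of horizontal parallelism (an illustration, not any specific product's implementation), the Python fragment below splits fact data into partitions, as it would be split across servers, aggregates each partition in a separate worker process, and merges the partial totals. The partition contents are made-up sample values.

from multiprocessing import Pool

def aggregate_partition(partition):
    # Each worker computes a partial total for its own slice of the data.
    return sum(partition)

if __name__ == "__main__":
    partitions = [
        [120, 340, 560],  # rows held by server 1
        [210, 430],       # rows held by server 2
        [150, 260, 370],  # rows held by server 3
    ]
    with Pool(processes=3) as pool:
        partial_totals = pool.map(aggregate_partition, partitions)
    print(sum(partial_totals))  # merged total across all servers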

Q2: How is multidimensional OLAP different from multirelational OLAP?


Ans:
Multidimensional OLAP

• The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis.
• Query response time is very fast because the data is mostly pre-calculated.
• There is a practical limit on size because of the time taken to calculate the database and the space required to hold the pre-calculated values.
• When the requirement is only to access summarized data, it fulfills the request efficiently, because the data is stored in multidimensional arrays.
• MOLAP databases and data warehouse databases have no linkage between them that can be used for query purposes.
• It requires a large amount of disk space.
• Storage is the main problem faced by multidimensional OLAP.

Multirelational OLAP

• It stores all the aggregations and data within the relational database.
• The database is a standard relational database, and the database model is a multidimensional model, often referred to as a star or snowflake model or schema.
• It is a more scalable solution.
• Query performance is largely governed by the complexity of the SQL and by the number and size of the tables being joined in the query.
• There can be a linkage between ROLAP databases and data warehouse databases.
• ROLAP takes more time to retrieve data.
• It is suitable for the implementation of standalone databases and small data warehouses.
• Maintenance is the main problem faced by multirelational OLAP.
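
A minimal sketch of the storage difference, using made-up sales figures: a MOLAP query is a direct lookup into a pre-calculated multidimensional array, while the equivalent ROLAP query aggregates relational rows at request time.

products = ["pen", "book"]    # dimension 1
regions = ["north", "south"]  # dimension 2

# MOLAP style: measures live in a pre-calculated array indexed by dimension.
cube = [[110, 90],
        [60, 40]]
print(cube[products.index("pen")][regions.index("south")])  # direct lookup: 90

# ROLAP style: measures live in relational rows (product, region, sales),
# so the answer is computed by scanning and aggregating at query time.
rows = [("pen", "north", 110), ("pen", "south", 90),
        ("book", "north", 60), ("book", "south", 40)]
print(sum(s for p, r, s in rows if p == "pen" and r == "south"))  # computed: 90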
Q3: How can we map the DW to a multiprocessor architecture? Elaborate.
Ans: We can map the DW to a multiprocessor architecture with the help of three DBMS software architectures:
Shared-memory architecture: In this architecture all the major components, such as the processors, the memory and the entire database, are utilized by a single RDBMS. All the data distributed across the multiple local disks can be accessed equally by all the processors.

Shared-disk architecture: In this architecture the entire database is accessed by multiple RDBMS servers. Each of the RDBMS servers has the authority or permission to make changes to the shared database. Performance can decrease when the data is unevenly distributed.

Shared-nothing architecture: In a shared-nothing RDBMS, query execution is parallelized across multiple processing nodes. Each processor in the system has its own memory and its own data, and the processors communicate over an interconnection network (a sketch follows).
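
A minimal sketch of shared-nothing execution, with a hypothetical Node class and illustrative rows: each node runs the query against only its own private partition, and a coordinator merges the partial aggregates, so no memory or disk is shared between nodes.

class Node:
    def __init__(self, rows):
        self.rows = rows  # this node's private partition: (region, amount)

    def local_query(self, region):
        # Runs independently on each node, touching local data only.
        return sum(amount for r, amount in self.rows if r == region)

nodes = [
    Node([("north", 100), ("south", 50)]),
    Node([("north", 70), ("south", 30)]),
]
# The coordinator fans the query out and merges the partial results.
print(sum(node.local_query("north") for node in nodes))  # 170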
Part B:

Q4: Suppose that a data warehouse consists of 4 dimensions: date, spectator, location and game, and two measures: count and charge, where charge is the fare that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate.
Draw a star schema diagram for the data warehouse.
Ans:

[Star schema: the Fact table sits at the centre, linked to the four dimension tables by their id keys.]

Fact table:
Date id, Spectator id, Location id, Game id, Count, Charge

Date dimension:
Date id, Day, Month, Year

Spectator dimension:
Spectator id, Spectator category, Spectator name, Spectator address

Game dimension:
Game id, Game name, Game description, No. of players

Location dimension:
Location id, Colony, City, Country
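
A minimal sketch of this schema in Python, using hypothetical sample rows: each dimension table is keyed by its id, and the fact table holds only the dimension keys plus the two measures, count and charge.

date_dim = {1: {"day": 5, "month": "June", "year": 2010}}
spectator_dim = {1: {"category": "student", "name": "A. Kumar",
                     "address": "Model Town"}}
location_dim = {1: {"colony": "Civil Lines", "city": "Delhi",
                    "country": "India"}}
game_dim = {1: {"name": "cricket", "description": "T20 match", "players": 22}}

fact_table = [
    # (date_id, spectator_id, location_id, game_id, count, charge)
    (1, 1, 1, 1, 1, 150.0),
]

# "Joining" a fact row to its dimensions is a key lookup, as in a star query.
for d_id, s_id, l_id, g_id, count, charge in fact_table:
    print(spectator_dim[s_id]["category"], game_dim[g_id]["name"],
          date_dim[d_id]["year"], count, charge)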
Q5: Discuss the components of the metadata interchange standard framework.
Ans: The metadata interchange standard framework was designed to handle issues such as exchanging, sharing and managing metadata. It defines two metamodels:

1. Application metamodel: It contains the tables that hold the metadata in tabular form.
2. Metadata metamodel: It consists of the set of objects that the metadata interchange standard can be used to describe.
The main components of the metadata interchange standard framework are listed below (a sketch of how these pieces interact follows the diagram):

 The Standard Metadata Model: It describes the ASCII file format used to represent the metadata being exchanged among different data sources.
 The Standard Access Framework: It specifies the minimum number of Application Programming Interfaces (APIs) that a vendor must support for proper exchange of metadata.
 Tool Profile: It details the aspects of the interchange standard metamodel that each tool supports.
 The User Configuration: It is a file that lets the customer constrain the metadata propagated from one tool to another, and it also helps determine whether the metadata model file is being imported by any of the tools.

[Diagram: Tools 1 to 5, each with its own Tool Profile, exchanging metadata with one another through the interchange standard.]
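
A minimal sketch of how these components could interact, with purely hypothetical names and structures (the standard's actual file formats are not reproduced here): each tool profile declares which parts of the metamodel the tool supports, and the user configuration constrains what may propagate between tools.

tool_profiles = {
    # Each tool declares the parts of the standard metamodel it supports.
    "Tool 1": {"supports": ["tables", "columns"]},
    "Tool 2": {"supports": ["tables", "columns", "transformations"]},
}

user_configuration = {
    # Constrains the metadata allowed to propagate from one tool to another.
    "allow": ["tables", "columns"],
}

def exchange(metadata, source, target):
    # Pass only objects that both tools support and the user config allows.
    allowed = (set(tool_profiles[source]["supports"])
               & set(tool_profiles[target]["supports"])
               & set(user_configuration["allow"]))
    return {kind: objs for kind, objs in metadata.items() if kind in allowed}

print(exchange({"tables": ["sales"], "transformations": ["agg"]},
               "Tool 2", "Tool 1"))  # only "tables" passes through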

Q6: Discuss the benefits of a metadata repository.

Ans: A metadata repository manages metadata. It consists of tools that increase the business and technical understanding of data through the capture, maintenance and presentation of information that describes the organization's data and processes.
The benefits are:

• Improve Quality:
 Metadata is the input needed to build the data profiles that are the foundation of data quality.
 It provides the architecture and infrastructure to maintain and present the data quality expectations, definitions, structures, processes, flow through the enterprise, and business uses of our data.
• Cost Reduction
 Fewer project hours are required for legacy discovery.
 Increased data quality reduces data scrap and rework.
 Reduced dependency on undocumented associate knowledge.

• Shortened Delivery
 Project legacy discovery can be automated or accessed electronically.
 Reduced rework due to poor or inaccessible information.
 Reduced time spent verifying the quality of information.

• Competitive Advantage
 Deeper insight into our customer (and ourselves) through improved data
quality.
 Maximize development efficiencies to create a more nimble and responsive organization.
 Improved decision making through better understanding of information and
processes.

• You can customize it to your requirements because your technicians have full control
over design and functionality.
• You can build and deploy the metadata repository capabilities in increments over
time. This will allow you to start quickly without too much complexity and spending
too much time on development.
• Ability to customize your reports, end user interfaces, meta tutorials and any other
usage of the metadata repository.
• It increases the flexibility, control and reliability of the application development process.
