Assignment On Chapter 1 Data Warehousing and Management

You might also like

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 11

DATA WAREHOUSING AND MANAGEMENT MODULE 1

CHAPTER I: FUNDAMENTAL CONCEPTS: INTRODUCTION TO DATA WAREHOUSE


SYSTEMS

I. OBJECTIVES
At the end of this chapter, the students should be able to:
 Explain the fundamental concepts to data warehouse systems.
 Discuss the historical overview of data warehousing, starting from the early achievements.
 Understand the various ways of organizing and managing information for decision making
use.
 Review the history of decision support systems.
 Learn briefly what is data warehouse and see why data warehousing is the viable solution
 Describe the field of spatial and spatiotemporal data warehouses
 Describe new domains and challenges that are being explored in order to answer the
requirements of today’s analytical applications.

II. SUBJECT MATTER


Topic: Introduction to Data Warehousing Systems
Subtopic: - A Historical Overview of Data Warehousing
- Spatial and Spatiotemporal Data Warehouses
- New Domains and Challenges

III. PROCEDURE
A. Preliminaries
Pre- Assessment
1. Explain and discuss the historical overview of data warehousing including traditional
operational or transactional databases, the design of operational databases, and
multidimensional modelling.
2. Identify the field of spatial and spatiotemporal data warehouses, which has been
increasingly used in many application domains.
3. Examine and discuss new domains and challenges that are being explored in order to
answer the requirements of today’s analytical applications.

B. Lesson Proper
Organizations today are facing increasingly complex challenges in terms of management
and problem solving in order to achieve their operational goals. This situation compels people
in those organizations to utilize analysis tools that can better support their decisions.

1
DATA WAREHOUSING AND MANAGEMENT MODULE 1
In this chapter, we present an overview of the data warehousing field, covering both
established topics and new developments, and indicate the chapters in the book where these
subjects are covered. We give a historical overview of data warehousing and On-Line Analytical
Processing (OLAP), starting from the early achievements. Then, we describe the field of spatial
and spatiotemporal data warehouses, which has been increasingly used in many application
domains. Finally, we describe new domains and challenges that are being explored in order to
answer the requirements of today’s analytical applications.

1. Introduction

Business intelligence comprises a collection of methodologies, processes, architectures,


and technologies that transform raw data into meaningful and useful information for decision
making. Business intelligence and decision-support systems provide assistance to managers at
various organizational levels for analyzing strategic information. These systems collect vast
amounts of data and reduce them to a form that can be used to analyze organizational behavior.

This data transformation comprises a set of tasks that take the data from the sources and,
through extraction, transformation, integration, and cleansing processes, store the data in a
common repository called a data warehouse. Data warehouses have been developed and
deployed as an integral part of decision-support systems to provide an infrastructure that enables
users to obtain efficient and accurate responses to complex queries.

The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile


database that provides support for decision making.

Integrated: The Data Warehouse is a centralized, consolidated database that integrates data
retrieved from the entire organization.

Subject-Oriented: The Data Warehouse data is arranged and optimized to provide answers to
questions coming from diverse functional areas within a company.

Time Variant: The Warehouse data represent the flow of data through time. It can even contain
projected data.

Non-Volatile: Once data enter the Data Warehouse, they are never removed. The Data
Warehouse is always growing.

2
DATA WAREHOUSING AND MANAGEMENT MODULE 1

Figure 1. Comparison of Data Warehouse and Operational Database Characteristics

A data warehouse system has the following characteristics:

• It provides a centralized utility of corporate data or information assets.

• It is contained in a well-managed environment.

• It has consistent and repeatable processes defined for loading operational data.

• It is built on an open and scalable architecture that will handle future expansion of data.

• It provides tools that allow its users to effectively process the data into information without a
high degree of technical support.

A wide variety of systems and tools can be used for accessing, analyzing, and exploiting
the data contained in data warehouses. From the early days of data warehousing, the typical

3
DATA WAREHOUSING AND MANAGEMENT MODULE 1
mechanism for those tasks has been online analytical processing (OLAP). OLAP systems allow
users to interactively query and automatically aggregate the data contained in a data warehouse.
In this way, decision makers can easily access the required information and analyze it at various
levels of detail.

On-Line Analytical Processing (OLAP) is an advanced data analysis environment that


supports decision making, business modeling, and operations research activities. OLAP systems
are designed to use both operational and Data Warehouse data.

Four Main Characteristics of OLAP:

 Use multidimensional data analysis techniques


 Provide advanced database support
 Provide easy-to-use end user interfaces
 Support client/server architecture

Figure 2. OLAP Server Arrangement

4
DATA WAREHOUSING AND MANAGEMENT MODULE 1
The design of operational databases is typically performed in four phases: requirements
specification, conceptual design, logical design, and physical design. During the requirements
specification process, the needs of users at various levels of the organization are collected. The
specification obtained serves as a basis for creating a database schema capable of responding to
user queries.

Databases are designed using a conceptual model, such as the entity-relationship (ER)
model, which aims at describing an application without taking into account implementation
considerations. The resulting design is then translated into a logical model, which is an
implementation paradigm for database applications. Finally, physical design particularizes the
logical model for a specific implementation platform in order to produce a physical model.

Multidimensional modeling views data as consisting of facts linked to several dimensions.

Multidimensional Data Analysis Techniques

 The processing of data in which data are viewed as part of a multidimensional structure.
 Multidimensional view allows end users to consolidate or aggregate data at different
levels.
 Multidimensional view allows a business analyst to easily switch business perspectives.

A fact represents the focus of analysis (e.g., analysis of sales in stores) and typically includes
attributes called measures.

Measures are usually numeric values that allow a quantitative evaluation of various aspects of an
organization. For example, measures such as the amount or number of sales might help to
analyze sales activities in various stores.

Dimensions are used to see the measures from several perspectives. For example, a time
dimension can be used to analyze changes in sales over various periods of time, whereas a
location dimension can be used to analyze sales according to the geographical distribution of
stores.

Hierarchies allow users to explore measures at various levels of detail. Examples of hierarchies
are month–quarter–year in the time dimension and city–state–country in the location dimension.

5
DATA WAREHOUSING AND MANAGEMENT MODULE 1

Figure 3. OLAP Server with Multidimensional Data Store Arrangement

Conceptual Modelling of Data Warehouses

Star schema: A fact table in the middle connected to a set of dimension tables

Snowflake schema: A refinement of star schema where some dimensional hierarchy is


normalized into a set of smaller dimension tables, forming a shape similar to snowflake

Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation

6
DATA WAREHOUSING AND MANAGEMENT MODULE 1

Although data warehouses are, in the end, a particular kind of databases, there are
significant differences between the development of operational databases and data warehouses.
A key one is the fact that data in a warehouse are extracted from several source systems. Thus,
data must be taken from these sources, transformed to fit the data warehouse model, and
loaded into the data warehouse. This process is called extraction, transformation, and loading
(ETL), and it has been proven crucial for the success of a data warehousing project.
Data analytics is the process of exploiting the contents of a data warehouse in order to
provide essential information to the decision-making process. Three main tools can be used for
this:
Data mining consists in a series of statistical techniques that analyze the data in a warehouse in
order to discover useful knowledge that is not easy to obtain from the original data.
Key performance indicators (KPIs) are measurable organizational objectives that are used for
characterizing how an organization is performing.
Dashboards are interactive reports that present the data in a warehouse, including the KPIs, in a
visual way, providing an overview of the performance of an organization for decision-support
purposes.

7
DATA WAREHOUSING AND MANAGEMENT MODULE 1

2. Spatial and Spatiotemporal Data Warehouse

Over the years, spatial data has been increasingly used in various areas, like public
administration, transportation networks, environmental systems, and public health, among
others. Spatial data can represent either objects located on the Earth’s surface, such as
mountains, cities, and rivers, or geographic phenomena, such as temperature, precipitation, and
altitude. Spatial data can also represent nongeographic data, that is, data located in other spatial
frames such as a human body, a house, or an engine.
Management of spatial data is carried out by spatial databases or geographic information
systems (GISs). Since the latter are used for storing and manipulating geographic objects and
phenomena, we shall use the more general term spatial databases in the following. Spatial
databases are used to store spatial data located in a two- or three-dimensional space.
These systems provide a set of functions and operators for querying and manipulating
spatial data. Queries may refer to spatial characteristics of individual objects, such as their area or
perimeter, or may require complex operations on two or more spatial objects.
Topological relationships between spatial objects, such as intersection, touches, and
crosses, are essential in spatial applications. For example, two roads may intersect, two countries
may touch because they have a common border, or a road may cross a dessert. An important
characteristic of topological relationships is that they do not change when the underlying space is
distorted through rotation, scaling, and similar operations.
Spatial data warehouses emerged as a combination of the spatial database and data
warehouse technologies. Spatial data warehouses provide improved data analysis, visualization,
and manipulation. This kind of analysis is called spatial OLAP (SOLAP), conveying a reference to
the ability of exploring spatial data through map navigation and aggregation, as it is performed in
OLAP with tables and charts.

3. New Domains and Challenges

Nowadays, the availability of enormous amounts of data is calling for a shift in the way data
warehouse and business intelligence practices have been carried out since the 1990s. It is
becoming clear that for certain kinds of business intelligence applications, the traditional
approach, where day-to-day business data produced in an organization are collected in a huge
common repository for data analysis, needs to be revised, to account for efficiently handling
large-scale data. In many emerging domains where business intelligence practices are gaining

8
DATA WAREHOUSING AND MANAGEMENT MODULE 1
acceptance, such as social networks or geospatial data analytics, massive-scale data sources are
becoming common, posing new challenges to the data warehouse research community.
We already commented that the typical method of loading data into a data warehouse is
through an ETL process. This process pulls data from source systems periodically (e.g., daily,
weekly, or monthly), obtaining snapshot of the business data at a given moment in time. These
data are then used for refreshing the contents of the data warehouse.
For some data analysis tasks (like worldwide price evolution of some product), the data
contained in a conventional data warehouse may not suffice. External data sources, like the web,
can provide useful multidimensional information, although usually too volatile to be permanently
stored. The semantic web aims at representing web content in a machine-processable way. The
basic layer of the data representation for the semantic web recommended by the World Wide
Web Consortium (W3C) is the Resource Description Framework (RDF), on top of which the Web
Ontology Language (OWL) is based.

9
DATA WAREHOUSING AND MANAGEMENT MODULE 1

ACTIVITY 1: REVIEW QUESTIONS


Based on the discussion, answer each questions and / or statements briefly. Write on the spaces
provided below.

1. Why are traditional databases called operational or transactional? Why are these
databases inappropriate for data analysis?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

2. Discuss four main characteristics of data warehouses.


________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
3. What is the purpose of online analytical processing (OLAP) systems and how are they
related to data warehouses?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

4. What is the extraction, transformation, and loading (ETL) process?


________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

5. Explain the differences between spatial databases and spatial data warehouses.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

10
DATA WAREHOUSING AND MANAGEMENT MODULE 1
ACTIVITY 2: EXCERSICE
An information package of a promotional analysis is shown below. To evaluate the
effectiveness of various promotions, brand managers are interested in analyzing data for the
products represented, the promotional offers, and the locations where the promotions ran.
Construct a star schema based on the information package diagram, and discuss how the brand
manager or other analysts can use the model to evaluate the promotions.

All time periods All products All Locations All promotions


Years Category Region Type
Quarters Sub-category District Sub-type
Months Brand Store Name
Package Size
Measures/Facts: Units, revenue, cost, margin
(Calculated)

11

You might also like