Professional Documents
Culture Documents
Assignment On Chapter 1 Data Warehousing and Management
Assignment On Chapter 1 Data Warehousing and Management
Assignment On Chapter 1 Data Warehousing and Management
I. OBJECTIVES
At the end of this chapter, the students should be able to:
Explain the fundamental concepts to data warehouse systems.
Discuss the historical overview of data warehousing, starting from the early achievements.
Understand the various ways of organizing and managing information for decision making
use.
Review the history of decision support systems.
Learn briefly what is data warehouse and see why data warehousing is the viable solution
Describe the field of spatial and spatiotemporal data warehouses
Describe new domains and challenges that are being explored in order to answer the
requirements of today’s analytical applications.
III. PROCEDURE
A. Preliminaries
Pre- Assessment
1. Explain and discuss the historical overview of data warehousing including traditional
operational or transactional databases, the design of operational databases, and
multidimensional modelling.
2. Identify the field of spatial and spatiotemporal data warehouses, which has been
increasingly used in many application domains.
3. Examine and discuss new domains and challenges that are being explored in order to
answer the requirements of today’s analytical applications.
B. Lesson Proper
Organizations today are facing increasingly complex challenges in terms of management
and problem solving in order to achieve their operational goals. This situation compels people
in those organizations to utilize analysis tools that can better support their decisions.
1
DATA WAREHOUSING AND MANAGEMENT MODULE 1
In this chapter, we present an overview of the data warehousing field, covering both
established topics and new developments, and indicate the chapters in the book where these
subjects are covered. We give a historical overview of data warehousing and On-Line Analytical
Processing (OLAP), starting from the early achievements. Then, we describe the field of spatial
and spatiotemporal data warehouses, which has been increasingly used in many application
domains. Finally, we describe new domains and challenges that are being explored in order to
answer the requirements of today’s analytical applications.
1. Introduction
This data transformation comprises a set of tasks that take the data from the sources and,
through extraction, transformation, integration, and cleansing processes, store the data in a
common repository called a data warehouse. Data warehouses have been developed and
deployed as an integral part of decision-support systems to provide an infrastructure that enables
users to obtain efficient and accurate responses to complex queries.
Integrated: The Data Warehouse is a centralized, consolidated database that integrates data
retrieved from the entire organization.
Subject-Oriented: The Data Warehouse data is arranged and optimized to provide answers to
questions coming from diverse functional areas within a company.
Time Variant: The Warehouse data represent the flow of data through time. It can even contain
projected data.
Non-Volatile: Once data enter the Data Warehouse, they are never removed. The Data
Warehouse is always growing.
2
DATA WAREHOUSING AND MANAGEMENT MODULE 1
• It has consistent and repeatable processes defined for loading operational data.
• It is built on an open and scalable architecture that will handle future expansion of data.
• It provides tools that allow its users to effectively process the data into information without a
high degree of technical support.
A wide variety of systems and tools can be used for accessing, analyzing, and exploiting
the data contained in data warehouses. From the early days of data warehousing, the typical
3
DATA WAREHOUSING AND MANAGEMENT MODULE 1
mechanism for those tasks has been online analytical processing (OLAP). OLAP systems allow
users to interactively query and automatically aggregate the data contained in a data warehouse.
In this way, decision makers can easily access the required information and analyze it at various
levels of detail.
4
DATA WAREHOUSING AND MANAGEMENT MODULE 1
The design of operational databases is typically performed in four phases: requirements
specification, conceptual design, logical design, and physical design. During the requirements
specification process, the needs of users at various levels of the organization are collected. The
specification obtained serves as a basis for creating a database schema capable of responding to
user queries.
Databases are designed using a conceptual model, such as the entity-relationship (ER)
model, which aims at describing an application without taking into account implementation
considerations. The resulting design is then translated into a logical model, which is an
implementation paradigm for database applications. Finally, physical design particularizes the
logical model for a specific implementation platform in order to produce a physical model.
The processing of data in which data are viewed as part of a multidimensional structure.
Multidimensional view allows end users to consolidate or aggregate data at different
levels.
Multidimensional view allows a business analyst to easily switch business perspectives.
A fact represents the focus of analysis (e.g., analysis of sales in stores) and typically includes
attributes called measures.
Measures are usually numeric values that allow a quantitative evaluation of various aspects of an
organization. For example, measures such as the amount or number of sales might help to
analyze sales activities in various stores.
Dimensions are used to see the measures from several perspectives. For example, a time
dimension can be used to analyze changes in sales over various periods of time, whereas a
location dimension can be used to analyze sales according to the geographical distribution of
stores.
Hierarchies allow users to explore measures at various levels of detail. Examples of hierarchies
are month–quarter–year in the time dimension and city–state–country in the location dimension.
5
DATA WAREHOUSING AND MANAGEMENT MODULE 1
Star schema: A fact table in the middle connected to a set of dimension tables
Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
6
DATA WAREHOUSING AND MANAGEMENT MODULE 1
Although data warehouses are, in the end, a particular kind of databases, there are
significant differences between the development of operational databases and data warehouses.
A key one is the fact that data in a warehouse are extracted from several source systems. Thus,
data must be taken from these sources, transformed to fit the data warehouse model, and
loaded into the data warehouse. This process is called extraction, transformation, and loading
(ETL), and it has been proven crucial for the success of a data warehousing project.
Data analytics is the process of exploiting the contents of a data warehouse in order to
provide essential information to the decision-making process. Three main tools can be used for
this:
Data mining consists in a series of statistical techniques that analyze the data in a warehouse in
order to discover useful knowledge that is not easy to obtain from the original data.
Key performance indicators (KPIs) are measurable organizational objectives that are used for
characterizing how an organization is performing.
Dashboards are interactive reports that present the data in a warehouse, including the KPIs, in a
visual way, providing an overview of the performance of an organization for decision-support
purposes.
7
DATA WAREHOUSING AND MANAGEMENT MODULE 1
Over the years, spatial data has been increasingly used in various areas, like public
administration, transportation networks, environmental systems, and public health, among
others. Spatial data can represent either objects located on the Earth’s surface, such as
mountains, cities, and rivers, or geographic phenomena, such as temperature, precipitation, and
altitude. Spatial data can also represent nongeographic data, that is, data located in other spatial
frames such as a human body, a house, or an engine.
Management of spatial data is carried out by spatial databases or geographic information
systems (GISs). Since the latter are used for storing and manipulating geographic objects and
phenomena, we shall use the more general term spatial databases in the following. Spatial
databases are used to store spatial data located in a two- or three-dimensional space.
These systems provide a set of functions and operators for querying and manipulating
spatial data. Queries may refer to spatial characteristics of individual objects, such as their area or
perimeter, or may require complex operations on two or more spatial objects.
Topological relationships between spatial objects, such as intersection, touches, and
crosses, are essential in spatial applications. For example, two roads may intersect, two countries
may touch because they have a common border, or a road may cross a dessert. An important
characteristic of topological relationships is that they do not change when the underlying space is
distorted through rotation, scaling, and similar operations.
Spatial data warehouses emerged as a combination of the spatial database and data
warehouse technologies. Spatial data warehouses provide improved data analysis, visualization,
and manipulation. This kind of analysis is called spatial OLAP (SOLAP), conveying a reference to
the ability of exploring spatial data through map navigation and aggregation, as it is performed in
OLAP with tables and charts.
Nowadays, the availability of enormous amounts of data is calling for a shift in the way data
warehouse and business intelligence practices have been carried out since the 1990s. It is
becoming clear that for certain kinds of business intelligence applications, the traditional
approach, where day-to-day business data produced in an organization are collected in a huge
common repository for data analysis, needs to be revised, to account for efficiently handling
large-scale data. In many emerging domains where business intelligence practices are gaining
8
DATA WAREHOUSING AND MANAGEMENT MODULE 1
acceptance, such as social networks or geospatial data analytics, massive-scale data sources are
becoming common, posing new challenges to the data warehouse research community.
We already commented that the typical method of loading data into a data warehouse is
through an ETL process. This process pulls data from source systems periodically (e.g., daily,
weekly, or monthly), obtaining snapshot of the business data at a given moment in time. These
data are then used for refreshing the contents of the data warehouse.
For some data analysis tasks (like worldwide price evolution of some product), the data
contained in a conventional data warehouse may not suffice. External data sources, like the web,
can provide useful multidimensional information, although usually too volatile to be permanently
stored. The semantic web aims at representing web content in a machine-processable way. The
basic layer of the data representation for the semantic web recommended by the World Wide
Web Consortium (W3C) is the Resource Description Framework (RDF), on top of which the Web
Ontology Language (OWL) is based.
9
DATA WAREHOUSING AND MANAGEMENT MODULE 1
1. Why are traditional databases called operational or transactional? Why are these
databases inappropriate for data analysis?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
5. Explain the differences between spatial databases and spatial data warehouses.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
10
DATA WAREHOUSING AND MANAGEMENT MODULE 1
ACTIVITY 2: EXCERSICE
An information package of a promotional analysis is shown below. To evaluate the
effectiveness of various promotions, brand managers are interested in analyzing data for the
products represented, the promotional offers, and the locations where the promotions ran.
Construct a star schema based on the information package diagram, and discuss how the brand
manager or other analysts can use the model to evaluate the promotions.
11