Data Warehouse
A data warehouse is a repository for large sets of transactional data. It is a system for storing
and delivering massive quantities of data. The data in the warehouse varies widely, depending
on the discipline and the focus of the organization.
One of the benefits of data warehouses is that they allow organizations to store information that
can be used to improve their marketing strategies. Organizations can also make strategic
decisions based on the information they have compiled and organized.
With techniques such as data mining and data visualization, organizations can discover
important patterns that they did not know existed. These patterns can help organizations make
decisions that translate into efficiency and, ultimately, higher profits.
Extraction of Data: The extract function identifies and retrieves data from the various sources,
applications or databases, which may reside in different systems. The data is then converted into
a single format.
Transformation of Data: The transformation process is where the data is manipulated into
the desired form. It can range from a simple conversion to complex data scrubbing
techniques. This process requires the user to have good knowledge of both the source and
target databases, and it usually requires specialized tools.
Loading of Data: This involves loading the data into the reporting area or the target database.
There are several techniques for transporting data into the data warehouse. The main
challenge is delivering a huge amount of data into the target database within a short
time.
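The three steps above can be sketched in a few lines of Python. This is only an illustrative
outline, not a production ETL tool: the two source systems (crm_rows, pos_rows), their record
layouts, and the target table name sales are all assumptions made up for the example, and an
in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source records from two systems that use different formats.
crm_rows = [{"customer": "Ada", "amount": "120.50", "date": "2023-01-15"}]
pos_rows = [("Bob", 80, "15/02/2023")]  # (name, amount, DD/MM/YYYY)

def extract():
    """Gather raw records from each source, tagged with their origin."""
    return [("crm", r) for r in crm_rows] + [("pos", r) for r in pos_rows]

def transform(tagged):
    """Convert every record to one format: (customer, float amount, ISO date)."""
    out = []
    for source, rec in tagged:
        if source == "crm":
            out.append((rec["customer"], float(rec["amount"]), rec["date"]))
        else:
            name, amount, dmy = rec
            day, month, year = dmy.split("/")
            out.append((name, float(amount), f"{year}-{month}-{day}"))
    return out

def load(rows, conn):
    """Bulk-insert the converted rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the target database
load(transform(extract()), conn)
loaded = conn.execute("SELECT * FROM sales ORDER BY customer").fetchall()
print(loaded)
```

The load step uses executemany so that all rows go in as one batch, which is the simplest
answer to the loading-time concern; real warehouses usually rely on dedicated bulk loaders.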
Load Manager: The load manager performs all the operations associated with extraction and
loading data into the data warehouse. These operations include simple transformations of the
data to prepare the data for entry into the warehouse.
Warehouse Manager: The warehouse manager performs all the operations associated with
the management of data in the warehouse. These operations include analyzing data to ensure
consistency, transforming and merging the source data from temporary storage into data
warehouse tables, creating indexes and views on the base tables, and backing up and archiving
data.
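The warehouse manager's duties listed above map fairly directly onto ordinary database
operations. The sketch below walks through them with Python's sqlite3 module. The table names
(staging_sales, fact_sales), the view name and the sample rows are hypothetical, and the
consistency rule (no missing or negative amounts) is just one assumed example of such a check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_sales (product TEXT, region TEXT, amount REAL);
    CREATE TABLE fact_sales (product TEXT, region TEXT, amount REAL);
    INSERT INTO staging_sales VALUES
        ('blade', 'north', 10.0), ('blade', 'south', 15.0), ('frame', 'north', 40.0);
""")

# Consistency check: reject the batch if any staged amount is missing or negative.
bad = conn.execute(
    "SELECT COUNT(*) FROM staging_sales WHERE amount IS NULL OR amount < 0"
).fetchone()[0]
if bad:
    raise ValueError(f"{bad} inconsistent rows in staging")

# Merge from temporary storage into the warehouse table, then clear staging.
conn.execute("INSERT INTO fact_sales SELECT * FROM staging_sales")
conn.execute("DELETE FROM staging_sales")

# Create an index and a view on the base table.
conn.execute("CREATE INDEX idx_fact_sales_product ON fact_sales (product)")
conn.execute("""
    CREATE VIEW sales_by_product AS
    SELECT product, SUM(amount) AS total FROM fact_sales GROUP BY product
""")
conn.commit()

# Back up the warehouse into a second database.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)

totals = conn.execute("SELECT * FROM sales_by_product ORDER BY product").fetchall()
backed_up = backup_conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(totals, backed_up)
```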
Query Manager: The query manager performs all operations associated with management of
user queries.
End-User Access Tools: The main purpose of a data warehouse is to provide information to
business managers for strategic decision-making. These users interact with the warehouse
using end-user access tools, which include Reporting and Query Tools, Application
Development Tools, Executive Information Systems Tools, Online Analytical Processing Tools
and Data Mining Tools.
DATA MARTS
A data mart is a subset of the data warehouse and is usually concerned with a specific
business unit. In some organizations, each department or business unit is made the owner of its
data mart, including all the hardware, software and data. This enables each unit to develop,
manipulate and use its data in any way it sees fit, without altering information in other
data marts or the data warehouse.
Therefore, a data mart is a storehouse of a business organization's data implemented to meet
the demands of a specific group of workers. It has a comparatively narrow subject area, such as
sales and marketing, where it is used to gain a competitive edge.
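One simple way to picture a data mart is as a separate copy of the warehouse rows that belong
to one business unit. The sketch below assumes a hypothetical warehouse table with a
department column and builds a sales_mart from it; the names and figures are invented for
illustration. It also shows the point made above: the owning unit can change its mart without
touching the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (department TEXT, item TEXT, amount REAL);
    INSERT INTO warehouse VALUES
        ('sales', 'blade', 10.0), ('sales', 'frame', 40.0), ('hr', 'training', 5.0);
    -- The mart is a separate copy holding only the sales department's rows.
    CREATE TABLE sales_mart AS
        SELECT item, amount FROM warehouse WHERE department = 'sales';
""")

# The sales unit manipulates its own mart; the warehouse rows are unaffected.
conn.execute("UPDATE sales_mart SET amount = amount * 2")
conn.commit()

mart = conn.execute("SELECT item, amount FROM sales_mart ORDER BY item").fetchall()
warehouse_sales = conn.execute(
    "SELECT item, amount FROM warehouse WHERE department = 'sales' ORDER BY item"
).fetchall()
print(mart)
print(warehouse_sales)
```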
Some areas where data mining has been applied in business are as follows: advertising,
customer relationship management, e-Commerce, fraud detection, health care, investments,
manufacturing, sports/entertainment, targeted/direct marketing, market analysis and trend
analysis.
The steps involved in data mining are iterative, with the processes moving to and fro whenever
needed. These are:
a. Extract, transform and load transaction data onto the data warehouse system.
b. Clean and pre-process the data.
c. Store and manage the data in a multidimensional database system.
d. Query and analyze the data by application software.
e. Present the data in a useful format and interpret the findings.
Clustering: This is similar to classification, except that the groups are not defined in
advance. Data items are grouped according to their similar attributes. Clustering tries to group
a set of objects and find whether there is some relationship between the objects. For example,
data can be mined to identify market segments or consumer preferences, and this in turn can be
used to drive marketing and promotion strategies that target specific types of customers.
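As a rough illustration of grouping by similar attributes, the sketch below uses k-means, one
common clustering technique (the text does not name a specific algorithm, so this choice is an
assumption). The customer attributes and the two starting centroids are invented for the
example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical customers: (annual spend in thousands, visits per month).
customers = [(1, 2), (2, 1), (1.5, 1.5), (9, 10), (10, 9), (9.5, 9.5)]
segments, centers = kmeans(customers, centroids=[(0, 0), (10, 10)])

for center, members in zip(centers, segments):
    print(center, members)
```

Each resulting segment could then be treated as a market segment and targeted with its own
promotion strategy.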
Associations: Associations are data mining functions that discover the co-occurrence of
transactions, that is, cases where one transaction is correlated with or leads to another. The
relationships between the co-occurring transactions or items are expressed as association rules.
Associations are useful for analysing and predicting the behaviour of customers. The
beer-diaper example is an example of associative mining.
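An association rule such as "diapers -> beer" is usually judged by its support (how often
both items appear together) and its confidence (how often the second item appears given the
first). The sketch below computes both for a handful of made-up shopping baskets; the basket
contents are purely illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]

# Count how often each item and each pair of items occurs across baskets.
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

def rule(antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / len(baskets), both / item_counts[antecedent]

support, confidence = rule("diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

With these baskets, three of the five contain both items and three of the four diaper
baskets also contain beer, so the rule has support 0.60 and confidence 0.75.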
Sequences: Sequences occur where events or transactions are connected or related over time,
that is, one transaction leads to another transaction later. For example, a hardware dealer may
find that 80% of the time, the purchase of louver frames is followed by the purchase of
louver blades. Identifying those customers who purchase louver frames and sending them mail
with an offer for louver blades can increase sales of louver blades. Sequence mining is therefore
where data is mined to anticipate behaviour patterns and trends.
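The louver example can be checked against purchase histories with a short script. The sketch
below assumes each customer's purchases are already ordered by time; the customer IDs and
histories are invented, so the resulting rate is not the 80% quoted above.

```python
# Hypothetical purchase histories, each ordered by time.
histories = {
    "c1": ["frames", "paint", "blades"],
    "c2": ["frames", "blades"],
    "c3": ["blades", "frames"],  # blades came first, so this does not count
    "c4": ["frames", "nails"],
    "c5": ["paint"],
}

def followed_by(history, first, then):
    """True if `then` occurs somewhere after the first occurrence of `first`."""
    if first not in history:
        return False
    return then in history[history.index(first) + 1:]

frame_buyers = [c for c, h in histories.items() if "frames" in h]
converted = [c for c in frame_buyers if followed_by(histories[c], "frames", "blades")]
rate = len(converted) / len(frame_buyers)

# Frame buyers who have not yet bought blades are the ones to send the offer to.
targets = [c for c in frame_buyers if c not in converted]
print(rate, targets)
```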
Forecasting: Data is mined to discover patterns that can lead to predictions or projections.
Forecasting extrapolates current trends and patterns to estimate future values. An example is
forecasting sales for the next year based on sales figures for the past 10 years.
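The simplest form of such extrapolation is to fit a straight line to past figures and read
off the next point. The sketch below does this with ordinary least squares; the ten yearly
sales figures are invented, and a linear trend is only one assumed model among many that a
real forecast might use.

```python
def linear_trend(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, intercept)."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical sales figures for the past 10 years.
sales = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
slope, intercept = linear_trend(sales)

# Extrapolate one step past the last observed year.
next_year = slope * len(sales) + intercept
print(round(next_year, 2))
```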
For example, a sales manager can analyze data to show sales of a particular product in
one market zone for a particular month and compare the figures with sales of
the same product in a different month, and possibly even compare other products in the same
market zone for the same period. This kind of analysis is possible when the data is
stored in a multidimensional database.
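The comparison described above amounts to looking up and summing cells of a cube whose
dimensions are product, market zone and month. The sketch below models that cube as a Python
dictionary; the fact rows and the helper name total are hypothetical, and a real
multidimensional database would of course provide this through its own query language.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, zone, month, sales).
facts = [
    ("blade", "north", "Jan", 100), ("blade", "north", "Feb", 120),
    ("frame", "north", "Jan", 300), ("blade", "south", "Jan", 80),
    ("blade", "north", "Jan", 50),
]

# Build the cube: one cell per (product, zone, month) combination.
cube = defaultdict(float)
for product, zone, month, amount in facts:
    cube[(product, zone, month)] += amount

def total(product=None, zone=None, month=None):
    """Sum sales over every cell matching the given dimensions; None means 'all'."""
    return sum(
        v for (p, z, m), v in cube.items()
        if product in (None, p) and zone in (None, z) and month in (None, m)
    )

blade_jan = total("blade", "north", "Jan")   # one product, one zone, one month
blade_feb = total("blade", "north", "Feb")   # same product, different month
frame_jan = total("frame", "north", "Jan")   # other product, same zone and period
north_jan = total(zone="north", month="Jan") # all products in the zone
print(blade_jan, blade_feb, frame_jan, north_jan)
```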