
DATA WAREHOUSE

A data warehouse is a repository for large sets of transactional data. It is a system for storing
and delivering massive quantities of data. The data in the warehouse varies widely, depending
on the discipline and the focus of the organization.

One of the benefits of data warehouses is that they allow organizations to store information that
can be used to improve their marketing strategies. Organizations can also make strategic
decisions based on the information they have compiled and organized. With techniques such as
data mining and data visualization, organizations can discover important patterns that they did
not know existed. These patterns can help organizations take decisions that translate into
efficiency and ultimately into higher profits.

ETL in Data Warehouses


Data needs to be loaded regularly into the data warehouse so that it can support business
analysis. This is done by ETL. ETL is short for extract, transform and load, and it is the
process of extracting data from the source systems and bringing it into the data warehouse.
These three separate functions are combined into one tool that pulls data out of one database
and places it into another. ETL is thus used to migrate data from one database to another to
form data marts and data warehouses.

Extraction of Data: The extract function identifies and retrieves data from the various sources,
applications or databases, which may be in different systems. The data is then converted into a
single format.

Transformation of Data: The transformation process is where the data is manipulated into
the desired form. It can range from a simple conversion to complex data scrubbing
techniques. This process requires the user to have good knowledge of both the source and
target databases, and it usually requires specialized tools.

Loading of Data: This involves loading the data into the reporting area or the target database.
There are several techniques for transporting data into the data warehouse. The main
challenge is how to deliver huge amounts of data into the target database within a short
time.
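
The three ETL steps described above can be sketched in a few lines of Python. This is only a minimal illustration: the two source lists, the sales table and the function names are hypothetical, and the conversion to a single format is folded into the transform step for brevity. An in-memory SQLite database stands in for the target warehouse.

```python
import sqlite3

# Hypothetical source systems that store the same kind of data differently.
SOURCE_A = [("2024-01-05", "widget", "12.50"), ("2024-01-06", "gadget", "7.00")]
SOURCE_B = [{"day": "06/01/2024", "item": "WIDGET ", "amount": 3.25}]

def extract():
    """Pull raw records from each source and tag them with their origin."""
    rows = [("A", record) for record in SOURCE_A]
    rows += [("B", record) for record in SOURCE_B]
    return rows

def transform(rows):
    """Convert every record to one format: (ISO date, lowercase item, float amount)."""
    cleaned = []
    for origin, record in rows:
        if origin == "A":
            date, item, amount = record[0], record[1], float(record[2])
        else:
            day, month, year = record["day"].split("/")
            date, item, amount = f"{year}-{month}-{day}", record["item"], float(record["amount"])
        cleaned.append((date, item.strip().lower(), amount))
    return cleaned

def load(conn, rows):
    """Insert the cleaned rows into the target table in one batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, item TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the target database
load(conn, transform(extract()))
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
widget_total = conn.execute("SELECT SUM(amount) FROM sales WHERE item = 'widget'").fetchone()[0]
```

Loading all rows with a single batched insert, rather than one statement per row, is one simple way to approach the loading-time challenge mentioned above.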

Data Warehouse Tools


The following are some data warehousing tools included in a standard software package.

Load Manager: The load manager performs all the operations associated with extracting and
loading data into the data warehouse. These operations include simple transformations that
prepare the data for entry into the warehouse.

Warehouse Manager: The warehouse manager performs all the operations associated with
the management of data in the warehouse. These operations include analyzing data to ensure
consistency, transforming and merging the source data from temporary storage into data
warehouse tables, creating indexes and views on the base tables, and backing up and
archiving data.

Query Manager: The query manager performs all operations associated with the management of
user queries.

End-User Access Tools: The main purpose of a data warehouse is to provide information to
business managers for strategic decision-making. These users interact with the warehouse
using end-user access tools. These include reporting and query tools, application
development tools, executive information systems tools, online analytical processing tools
and data mining tools.

Advantages of Data Warehouse

a. It enables an organization to have access to a large amount of consolidated information.
This information can be used to solve a number of problems.
b. A data warehouse provides access to querying and reporting systems.
c. Data from different locations is integrated into one location.
d. It improves data quality.

Disadvantages of Data Warehouse

a. Data warehouses are difficult to build and can be costly to maintain.
b. Before data can be stored within the warehouse, it must be extracted, cleaned and
loaded. These processes can take a long time.
c. There may also be issues with compatibility. A new transaction system may not work
with systems that are already being used.
d. Data warehouses that are accessed via the Internet could be vulnerable to security
threats.

DATA MARTS

A data mart is a subset of the data warehouse. It is usually concerned with a specific
business unit. In some organizations, each department or business unit is made the owner of its
data mart, including all the hardware, software and data. This enables each unit to develop,
manipulate and use its data in any way it deems fit, without altering information in other
data marts or the data warehouse.

A data mart is therefore a storehouse of an organization's data implemented to meet
the demands of a specific group of workers. It has a comparatively narrow or specific subject
area, such as sales and marketing.
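
As a rough sketch of this idea, the snippet below copies the sales subset of a warehouse table into its own data mart table and shows that changing the mart leaves the warehouse untouched. The table names, columns and figures are hypothetical, and an in-memory SQLite database stands in for both stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for every department.
conn.execute("CREATE TABLE warehouse_facts (department TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [("sales", "widget", 100.0), ("sales", "gadget", 50.0), ("hr", "training", 20.0)],
)

# The data mart is a separate copy of only the rows the sales unit needs.
conn.execute(
    "CREATE TABLE sales_mart AS "
    "SELECT item, amount FROM warehouse_facts WHERE department = 'sales'"
)

# The sales unit manipulates its own mart; the warehouse row is not affected.
conn.execute("UPDATE sales_mart SET amount = amount * 2 WHERE item = 'widget'")

mart_rows = conn.execute("SELECT item, amount FROM sales_mart ORDER BY item").fetchall()
warehouse_widget = conn.execute(
    "SELECT amount FROM warehouse_facts WHERE item = 'widget'"
).fetchone()[0]
```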

Benefits of Creating Data Marts


a. It facilitates easy and faster access to frequently needed data by a unit.
b. It affords improved end-user response time since the data is a subset of the data
warehouse that focuses on restricted tasks.
c. As a subset of the data warehouse, a data mart is comparatively simple to create or
set up.
d. It is less costly to implement than a complete data warehouse.
e. Since data marts are created on a departmental basis, users are more clearly defined
than in a full data warehouse.
f. The contents of data marts are more orderly since they contain only essential
business data for a specific purpose.
g. It focuses on user needs within a limited scope.

DATA MINING

Information in a data warehouse needs to be mined (unearthed) to facilitate decision-making.
Data mining is a powerful technology that helps companies focus on the most important aspects
of the data they have collected about their customers. Data mining discovers information
within data that ordinary queries may not easily reveal.

Data mining is defined as a computer-assisted process of digging through and analyzing
enormous sets of data in order to discover meaningful patterns. Data mining tools predict
behaviour and trends, and they allow businesses to make proactive, knowledge-driven
decisions. Data mining tools answer questions that are difficult to answer with SQL queries
alone. The driving force behind the need to mine an organization's data is to enable the
organization to focus on its customers.

Some areas where data mining has been applied in business are as follows: advertising,
customer relationship management, e-Commerce, fraud detection, health care, investments,
manufacturing, sports/entertainment, targeted/direct marketing, market analysis and trend
analysis.

The steps involved in data mining are iterative, with the processes moving to and fro whenever
needed. These are:

a. Extract, transform and load transaction data into the data warehouse system.
b. Clean and pre-process the data.
c. Store and manage the data in a multidimensional database system.
d. Query and analyze the data by application software.
e. Present the data in a useful format and interpret the findings.

Results of Data Mining


The result of data mining is one of the following:

Classification: In classification, an organization has a set of predefined classes and wants to
know which class a new object belongs to. For example, a restaurant could mine customers'
purchase data to determine when customers visit and what they typically order. This
information could be used to increase patronage by having daily specials.
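
To make the idea of assigning a new object to a predefined class concrete, here is a minimal nearest-centroid sketch. It is one simple classification technique chosen for illustration, not a method named in the text. The two customer classes, the training figures and the function names are all hypothetical.

```python
import math

# Hypothetical labelled customers: (visits per month, average spend) per class.
TRAINING = {
    "regular": [(8, 15.0), (10, 12.0), (9, 18.0)],
    "occasional": [(1, 40.0), (2, 35.0), (1, 45.0)],
}

def centroid(points):
    """Return the mean point of a list of equal-length tuples."""
    count = len(points)
    return tuple(sum(p[i] for p in points) / count for i in range(len(points[0])))

# One representative point per predefined class.
CENTROIDS = {label: centroid(points) for label, points in TRAINING.items()}

def classify(customer):
    """Assign a new customer to the class whose centroid is closest."""
    return min(CENTROIDS, key=lambda label: math.dist(customer, CENTROIDS[label]))

new_frequent_customer = classify((7, 20.0))
new_rare_customer = classify((2, 38.0))
```

The classes exist before the new customer arrives; the code only decides which of them the customer falls into.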

Clustering: This is similar to classification, except that the groups are not predefined.
Data items are grouped according to similar attributes. Clustering tries to group a set of
objects and find whether there is some relationship between them. For example, data can be
mined to identify market segments or consumer preferences, and this in turn can be used to
drive marketing and promotion strategies that target specific types of customers.
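
A small sketch of grouping by similar attributes is shown below, using k-means on a single attribute. K-means is one common clustering technique picked here for illustration; the spending figures and the choice of starting centers are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center to its group mean."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical average spend per customer; no segment labels are given in advance.
spend = [5, 7, 6, 48, 52, 50]
centers, segments = kmeans_1d(spend, centers=(min(spend), max(spend)))
```

Here the two market segments (low spenders and high spenders) emerge from the data itself rather than from labels supplied beforehand.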

Associations: Associations are data mining functions that discover the co-occurrence of
transactions, that is, cases where one transaction is correlated with or leads to another. The
relationships between the co-occurring transactions or items are expressed as association rules.
Associations are useful for analysing and predicting the behaviour of customers. The well-known
beer-diaper example, in which shoppers who buy diapers are found to often buy beer as well, is
an example of associative mining.
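
Association rules are commonly scored with two measures, support and confidence. These measures are not named in the text above, and the baskets below are made-up data, so treat this as an illustrative sketch of how a rule such as "diapers implies beer" could be evaluated.

```python
# Hypothetical market baskets, one set of items per transaction.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"diapers", "beer"}, BASKETS)
rule_confidence = confidence({"diapers"}, {"beer"}, BASKETS)
```

With this data, diapers and beer appear together in 2 of 5 baskets, and 2 of the 3 baskets containing diapers also contain beer.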

Sequences: Sequences occur where events or transactions are connected or related over time,
that is, one transaction leads to another transaction later. For example, a hardware dealer may
find that 80% of the time, the purchase of louver frames is followed by the purchase of
louver blades. Identifying those customers who purchase louver frames and sending them mail
with an offer for louver blades could increase sales of louver blades. Sequence mining is
therefore where data is mined to anticipate behaviour patterns and trends.
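
The louver example can be sketched as a simple count over ordered purchase histories. The customer histories and the function name below are hypothetical, and the data was chosen so that the rate comes out at the 80% figure used in the example.

```python
# Hypothetical purchase histories, each listed in the order the items were bought.
HISTORIES = {
    "c1": ["louver frames", "louver blades"],
    "c2": ["louver frames", "paint", "louver blades"],
    "c3": ["louver frames", "louver blades"],
    "c4": ["louver frames", "nails", "louver blades"],
    "c5": ["louver frames", "nails"],
    "c6": ["louver blades", "hammer"],
}

def followed_by(histories, first, then):
    """Return the share of buyers of `first` who later bought `then`, plus those who have not yet."""
    buyers = 0
    followed = 0
    prospects = []
    for customer, items in histories.items():
        if first not in items:
            continue
        buyers += 1
        later_items = items[items.index(first) + 1:]
        if then in later_items:
            followed += 1
        else:
            prospects.append(customer)
    rate = followed / buyers if buyers else 0.0
    return rate, prospects

rate, prospects = followed_by(HISTORIES, "louver frames", "louver blades")
```

The `prospects` list holds the customers who bought frames but no blades afterwards, which is the group the dealer would target with an offer.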

Forecasting: Data is mined to discover patterns that can lead to predictions or projections.
Forecasting means extrapolating current trends and patterns to estimate a future value. For
example, sales for the next year can be forecast from the sales figures for the past 10 years.
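
One simple way to extrapolate a trend is to fit a straight line to past figures by least squares and read off the next point. The sketch below assumes a linear trend and uses made-up sales figures; real forecasting would usually need a richer model. It needs at least two data points.

```python
def linear_forecast(values, steps_ahead=1):
    """Fit y = intercept + slope * x by least squares and extrapolate beyond the last point."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)

# Hypothetical sales for the past 10 years, growing by 10 units a year.
past_sales = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
next_year = linear_forecast(past_sales)
```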

ONLINE ANALYTICAL PROCESSING (OLAP)


OLAP (online analytical processing) is a computer processing approach that enables a user to
easily and selectively extract and view data from multiple perspectives. OLAP is used to store
and deliver data warehouse information and supports the discovery of previously undiscerned
relationships between data items. It thus facilitates access to multidimensional databases and
provides management with useful display techniques. OLAP is applied in areas such as sales,
marketing, management reporting, budgeting, forecasting and financial reporting.

For example, a sales manager can analyze data to show sales of a particular product in
one market zone for a particular month, compare the figures with sales of the same product
in a different month, and possibly even compare other products in the same market zone for
the same period. This kind of analysis is possible when the data is stored in a
multidimensional database.
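
The sales manager's comparison can be sketched with a tiny in-memory "cube" keyed by product, market zone and month. The fact rows, dimension values and function names are hypothetical; a real OLAP system would use a dedicated multidimensional store rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, zone, month, sales amount).
FACTS = [
    ("soap", "north", "Jan", 120.0),
    ("soap", "north", "Feb", 150.0),
    ("soap", "south", "Jan", 80.0),
    ("rice", "north", "Jan", 200.0),
    ("rice", "north", "Jan", 50.0),
]

def build_cube(facts):
    """Aggregate sales into cells addressed by (product, zone, month)."""
    cube = defaultdict(float)
    for product, zone, month, amount in facts:
        cube[(product, zone, month)] += amount
    return cube

def total(cube, product=None, zone=None, month=None):
    """Sum the cells matching the given dimension values; None means all values."""
    wanted = (product, zone, month)
    return sum(
        amount
        for key, amount in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )

cube = build_cube(FACTS)
soap_north_jan = total(cube, product="soap", zone="north", month="Jan")
soap_north_feb = total(cube, product="soap", zone="north", month="Feb")
rice_north_jan = total(cube, product="rice", zone="north", month="Jan")
all_north_jan = total(cube, zone="north", month="Jan")
```

Fixing some dimensions and leaving others open is what lets the same data be viewed from different perspectives: one product across months, or all products in one zone for one month.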
