Difference Between Data Warehousing and Data Mining


A data warehouse is built to support management functions whereas data mining is used to extract
useful information and patterns from data. Data warehousing is the process of compiling information
into a data warehouse.

Data Warehousing:

Data warehousing is a technology that aggregates structured data from one or more sources so that it can
be compared and analyzed, rather than used for transaction processing. A data warehouse is designed to
support the management decision-making process by providing a platform for data cleaning, data
integration, and data consolidation. It contains subject-oriented, integrated, time-variant, and
non-volatile data, and it consolidates data from many sources while ensuring data quality, consistency,
and accuracy. A data warehouse also improves system performance by separating analytical processing
from transactional databases. Data flows into the warehouse from the various operational databases.
The warehouse organizes that data into a schema that describes the layout and type of the data, and
query tools use the schema to analyze the data tables.
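
The idea of organizing data into a schema and querying it can be sketched as follows. This is a minimal illustration only: it uses an in-memory SQLite database in place of a real warehouse, and the table names (dim_product, fact_sales), columns, and sample rows are assumptions made for the example, not part of the original text.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")

# The schema describes the layout and type of the data.
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        sale_month TEXT,
        amount     REAL
    );
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "books"), (2, "games")])
conn.executemany(
    "INSERT INTO fact_sales (product_id, sale_month, amount) VALUES (?, ?, ?)",
    [(1, "2024-01", 20.0), (1, "2024-02", 30.0), (2, "2024-01", 50.0)],
)

# An analytical query: total sales per category, resolved through the schema.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS p USING (product_id)
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)
```

The query never touches an operational system; it reads only the warehouse tables, which is the separation of analytical from transactional processing described above.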

Figure: Data Warehousing process

Advantages of Data Warehousing:

• A data warehouse makes corporate data of any form easier to understand. Most of the user’s
work consists of supplying the raw data.

• Its key benefit is the capacity to be updated continuously and frequently. As a result, data
warehouses suit organizations and entrepreneurs who want to stay current with their target
audience and customers.

• It makes data more accessible to businesses and organizations.

• A data warehouse holds a large volume of historical data that users can use to evaluate
different periods and trends in order to make predictions about the future.

Disadvantages of Data Warehousing:

• There is a considerable risk of accumulating irrelevant and useless data. Data loss and erasure
are other potential issues.

• Data in a data warehouse is gathered from various sources, so it must be cleansed and
transformed before use, which can be a difficult task.

Data Mining:

Data mining is the process of finding patterns and correlations within large data sets in order to
identify relationships in the data. Data mining tools allow a business organization to predict customer
behavior, build risk models, and detect fraud. Typical applications are market analysis and management,
fraud detection, corporate analysis, and risk management.
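
As a rough illustration of finding relationships in data, the sketch below computes two common association measures, support and confidence, over a handful of market baskets. The basket contents and the helper name confidence are hypothetical and chosen only for this example; real data mining tools use far larger data sets and more elaborate algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; real input would come from transaction data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Count how often each item and each (sorted) pair of items occurs.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

# Support: share of all baskets that contain both bread and milk.
support_bread_milk = pair_counts[("bread", "milk")] / len(baskets)
print(support_bread_milk, confidence("butter", "bread"))
```

A high confidence for "butter implies bread" is the kind of pattern a business might use to predict customer purchasing behavior.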

Figure: Data Mining process

Advantages of Data Mining:

• Data mining aids a variety of data analysis and sorting procedures. One of its best applications
is identifying and detecting undesired faults in a system, which allows risks to be eliminated
sooner.

• Compared with other statistical data applications, data mining methods are both cost-effective
and efficient.

• Companies can take advantage of this analytical tool because it provides appropriate and easily
accessible knowledge-based data.

Disadvantages of Data Mining:

• Data mining isn’t always 100 percent accurate, and if done incorrectly, it can lead to data
breaches.

• Organizations must devote a significant amount of resources to training and implementation.

• Furthermore, data mining tools work in different ways because of the different algorithms used
to build them.

Comparison between Data Mining and Data Warehousing:

| S. No. | Basis of Comparison | Data Warehousing | Data Mining |
|--------|---------------------|------------------|-------------|
| 1. | Definition | A data warehouse is a database system that is designed for analytical analysis instead of transactional work. | Data mining is the process of analyzing data patterns. |
| 2. | Process | Data is stored periodically. | Data is analyzed regularly. |
| 3. | Purpose | Data warehousing is the process of extracting and storing data to allow easier reporting. | Data mining is the use of pattern recognition logic to identify patterns. |
| 4. | Managing Authorities | Data warehousing is solely carried out by engineers. | Data mining is carried out by business users with the help of engineers. |
| 5. | Data Handling | Data warehousing is the process of pooling all relevant data together. | Data mining is considered as a process of extracting data from large data sets. |
| 6. | Functionality | Subject-oriented, integrated, time-varying and non-volatile constitute data warehouses. | AI, statistics, databases, and machine learning systems are all used in data mining technologies. |
| 7. | Task | Data warehousing is the process of extracting and storing data in order to make reporting more efficient. | Pattern recognition logic is used in data mining to find patterns. |
| 8. | Uses | It extracts data and stores it in an orderly format, making reporting easier and faster. | This procedure employs pattern recognition tools to aid in the identification of access patterns. |
| 9. | Examples | When a data warehouse is connected with operational business systems like CRM (Customer Relationship Management) systems, it adds value. | Data mining aids in the creation of suggestive patterns of key parameters. Customer purchasing behavior, items, and sales are examples. As a result, businesses will be able to make the required adjustments to their operations and production. |

Characteristics and Functions of Data warehouse

Introduction:
A data warehouse is a centralized repository for storing and managing large amounts of data from
various sources for analysis and reporting. It is optimized for fast querying and analysis, enabling
organizations to make informed decisions by providing a single source of truth for data. Data
warehousing typically involves transforming and integrating data from multiple sources into a unified,
organized, and consistent format.

Prerequisite – Data Warehousing

A data warehouse is manageable when its users share a common way of describing the trends recorded
under each specific subject. The major characteristics of a data warehouse are:

1. Subject-oriented – A data warehouse is always subject-oriented: it delivers information about a
theme rather than about the organization’s current operations. The data warehousing process is
designed around a specific, well-defined theme, such as sales, distribution, or marketing.
A data warehouse does not put its emphasis on current operations. Instead, it focuses on
modeling and analyzing data for decision-making. It also gives a simple and precise view of a
particular theme by excluding data that is not needed to make the decisions.

2. Integrated – Integration is closely related to subject orientation: the data is kept in a
consistent format. Integration means establishing a shared measure for all similar data from the
different databases. The data must also be stored in the data warehouse in a shared and
generally accepted manner.
A data warehouse is built by integrating data from various sources, such as a
mainframe and a relational database. It must therefore have consistent naming conventions,
formats, and codes. Consistency in naming conventions, column scaling, encoding structure, and so
on should be verified, because integration is what makes effective analysis of the data possible.
An integrated data warehouse can serve various subject-related warehouses.

3. Time-Variant – Data is maintained over different intervals of time, such as weekly, monthly,
or annually. The time horizon of a data warehouse is much wider than that of operational
(online transaction processing, OLTP) systems. The data in a data warehouse is identified with a
specific interval of time and delivers information from a historical perspective. It contains
an element of time, explicitly or implicitly. Another feature of time-variance is that once data is
stored in the data warehouse, it cannot be modified, altered, or updated. Data is stored with a
time dimension, allowing for analysis of data over time.

4. Non-Volatile – As the name implies, the data in a data warehouse is permanent: it is
not erased or deleted when new data is inserted. The warehouse therefore accumulates a very large
quantity of data. Data is not updated once it is stored in the data warehouse, so that the
historical data is preserved.
The data is read-only and refreshed at particular intervals, which helps in analyzing
historical data and in understanding what happened and when. The warehouse does not need
transaction processing, recovery, or concurrency control mechanisms. Operations such as delete,
update, and insert, which are performed in an operational application, are absent from the data
warehouse environment. The two types of data operations performed in the data warehouse are:

• Data Loading

• Data Access
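
The time-variant and non-volatile properties, together with the two operations above, can be sketched roughly as follows. The table customer_snapshot, the function load_snapshot, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the warehouse; the point is only that loading appends dated rows and access reads them without modifying history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_snapshot (
        customer_id INTEGER,
        city        TEXT,
        load_date   TEXT,          -- explicit element of time
        PRIMARY KEY (customer_id, load_date)
    )
""")

def load_snapshot(rows, load_date):
    """Data loading: append a new dated snapshot; earlier rows are never touched."""
    conn.executemany(
        "INSERT INTO customer_snapshot VALUES (?, ?, ?)",
        [(customer_id, city, load_date) for customer_id, city in rows],
    )

load_snapshot([(1, "Pune")], "2024-01-31")
load_snapshot([(1, "Delhi")], "2024-02-29")  # the customer moved; history is kept

# Data access: a read-only query over the full history.
history = conn.execute(
    "SELECT load_date, city FROM customer_snapshot "
    "WHERE customer_id = 1 ORDER BY load_date"
).fetchall()
print(history)
```

Because no UPDATE or DELETE is ever issued, both cities remain visible and the change can be analyzed over time.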

In summary, the key features of a data warehouse are:

1. Subject Oriented: Focuses on a specific area or subject such as sales, customers, or inventory.

2. Integrated: Integrates data from multiple sources into a single, consistent format.

3. Read-Optimized: Designed for fast querying and analysis, with indexing and aggregations to
support reporting.

4. Summary Data: Data is summarized and aggregated for faster querying and analysis.

5. Historical Data: Stores large amounts of historical data, making it possible to analyze trends and
patterns over time.

6. Schema-on-Write: Data is transformed and structured according to a predefined schema before
it is loaded into the data warehouse.

7. Query-Driven: Supports ad-hoc querying and reporting by business users, without the need for
technical support.
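
Two of the features above, schema-on-write and summary data, can be illustrated with a small sketch. The schema, the helper conforms, and the sample rows are assumptions made for this example; a real warehouse would enforce its schema in the load pipeline or the database itself.

```python
from collections import defaultdict

# Hypothetical predefined schema: column name -> required Python type.
SCHEMA = {"region": str, "month": str, "amount": float}

def conforms(row):
    """True if the row has exactly the schema's columns with the expected types."""
    return row.keys() == SCHEMA.keys() and all(
        isinstance(row[column], expected) for column, expected in SCHEMA.items()
    )

incoming = [
    {"region": "north", "month": "2024-01", "amount": 10.0},
    {"region": "north", "month": "2024-01", "amount": 5.0},
    {"region": "south", "month": "2024-01", "amount": "n/a"},  # wrong type, rejected
]

# Schema-on-write: only rows that match the schema are loaded.
loaded = [row for row in incoming if conforms(row)]

# Summary data: pre-aggregate totals per (region, month) for faster reporting.
summary = defaultdict(float)
for row in loaded:
    summary[(row["region"], row["month"])] += row["amount"]
print(dict(summary))
```

Rejecting the malformed row before loading is what distinguishes schema-on-write from approaches that store raw data first and interpret it at query time.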

Functions of Data warehouse: A data warehouse acts as an organized collection of data that supports
data retrieval. It also keeps facts about tables with high transaction levels, which are monitored to
guide the choice of data warehousing techniques. The major functions involved are listed below:

1. Data Consolidation: The process of combining multiple data sources into a single data repository
in a data warehouse. This ensures a consistent and accurate view of the data.

2. Data Cleaning: The process of identifying and removing errors, inconsistencies, and irrelevant
data from the data sources before they are integrated into the data warehouse. This helps
ensure the data is accurate and trustworthy.

3. Data Integration: The process of combining data from multiple sources into a single, unified data
repository in a data warehouse. This involves transforming the data into a consistent format and
resolving any conflicts or discrepancies between the data sources. Data integration is an
essential step in the data warehousing process to ensure that the data is accurate and usable for
analysis.

4. Data Storage: A data warehouse can store large amounts of historical data and make it easily
accessible for analysis.

5. Data Transformation: Data can be transformed and cleaned to remove inconsistencies, duplicate
data, or irrelevant information.

6. Data Analysis: Data can be analyzed and visualized in various ways to gain insights and make
informed decisions.

7. Data Reporting: A data warehouse can provide various reports and dashboards for different
departments and stakeholders.

8. Data Mining: Data can be mined for patterns and trends to support decision-making and
strategic planning.

9. Performance Optimization: Data warehouse systems are optimized for fast querying and analysis.
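
The consolidation, cleaning, and integration functions listed above can be sketched together. Everything named here (the two sources, their column names, the GENDER_CODES mapping, and the helper functions) is a hypothetical example, not a description of any particular tool.

```python
# Two hypothetical sources that encode the same facts differently.
crm_rows = [{"cust_id": "7", "gender": "F"}, {"cust_id": "8", "gender": "M"}]
billing_rows = [
    {"customer": 8, "sex": 1},
    {"customer": 9, "sex": 0},
    {"customer": None, "sex": 1},  # unusable: no customer key
]

# Assumed code mapping for this sketch.
GENDER_CODES = {"F": "female", "M": "male", 0: "female", 1: "male"}

def from_crm(row):
    """Transform a CRM row into the unified format."""
    return {"customer_id": int(row["cust_id"]), "gender": GENDER_CODES[row["gender"]]}

def from_billing(row):
    """Transform a billing row into the unified format."""
    return {"customer_id": row["customer"], "gender": GENDER_CODES[row["sex"]]}

# Integration: both sources are converted to one consistent format.
unified = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]

# Cleaning and consolidation: drop rows without a key, keep one row per customer.
warehouse = {}
for row in unified:
    if row["customer_id"] is not None:
        warehouse.setdefault(row["customer_id"], row)

print(sorted(warehouse))
```

Keeping the first row seen per customer is one simple conflict rule; a real pipeline would need an explicit policy for sources that disagree.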

Prerequisites – Data Warehousing, Data Warehouse Architecture, Characteristics and
Functions of Data warehouse

Here are some of the difficulties of implementing data warehouses:
1. Implementing a data warehouse is generally a massive effort that must be
planned and executed according to established methods.
2. Construction, administration, and quality control are the significant
operational issues that arise with data warehousing.
3. Some of the important and challenging considerations in implementing a
data warehouse are the design, construction, and implementation of the
warehouse.
4. Building an enterprise-wide warehouse in a large organization is a
major undertaking.
5. Manual data processing can put the correctness of the entered data at risk.
6. Administering a data warehouse is an intensive undertaking whose effort is
proportional to the complexity and size of the warehouse.
7. An organization that attempts to administer a data warehouse should
understand the complex nature of that administration.
8. There must be flexibility to accept and integrate analytics to streamline the
business intelligence process.
9. To handle evolution, the acquisition component and the warehouse’s schema
should be updated.
10. A significant issue in data warehousing is the quality control of data. The
major concerns are the quality and consistency of data.
11. Consistency remains a significant issue for the database administrator.
12. Melding data from heterogeneous and disparate sources is a major challenge
because of differences in naming, domain definitions, and identification
numbers.
13. Every time a source database changes, the data warehouse administrator must
consider the possible interactions with other elements of the warehouse.
14. The data must be accurate. A warehouse is only as good as the data that
supports its operation.
15. Usage projections should be estimated conservatively prior to construction of
the data warehouse and should be revised continually to reflect current
requirements.
16. The warehouse should be designed to accommodate the addition and attrition
of data sources. This also avoids a major redesign.
17. Sources and source data will evolve, and the warehouse must
accommodate such changes.
18. Another continual challenge is fitting the available source data into the
data model of the warehouse, because the requirements and capabilities of
the warehouse will change over time as technology changes rapidly.
19. Administering a data warehouse requires far broader skills than
traditional database administration.
20. Managing the data warehouse in a large organization, designing the
management function, and selecting the management team for a data
warehouse are some of the major tasks.
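
Several of the difficulties above concern source databases that change over time. One simple safeguard, sketched here under assumed names (EXPECTED_COLUMNS, schema_drift, and the example column list are all hypothetical), is to compare the columns a load expects with the columns a source currently provides before loading.

```python
# Columns the (hypothetical) warehouse load expects from a source table.
EXPECTED_COLUMNS = {"order_id", "customer_id", "amount"}

def schema_drift(source_columns):
    """Return the columns that were added to or removed from the source."""
    actual = set(source_columns)
    return {
        "added": sorted(actual - EXPECTED_COLUMNS),
        "removed": sorted(EXPECTED_COLUMNS - actual),
    }

# In this example the source renamed `amount` to `total` and added `currency`.
report = schema_drift(["order_id", "customer_id", "total", "currency"])
print(report)
```

A non-empty report signals that the acquisition component or the warehouse schema may need updating before the next load.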
