Data Mining 14


Knowledge Discovery Process (KDP)

Data mining is the core step of the knowledge discovery process (KDP). KDP is the process of
finding knowledge in data: it applies data mining methods (algorithms) to extract the required
knowledge from large amounts of data.

The term KDD stands for Knowledge Discovery in Databases. It refers to the broad process
of discovering knowledge in data and emphasizes the high-level application of specific data
mining techniques. It is of interest to researchers in various fields, including artificial
intelligence, machine learning, pattern recognition, databases, statistics, knowledge acquisition
for expert systems, and data visualization. The main objective of the KDD process is to extract
information from data in the context of large databases (data warehouses). It does this by using
data mining algorithms (techniques) to identify what is deemed knowledge (the required
information).

The Knowledge Discovery Process may consist of the following steps:


Fig: Knowledge Discovery Process (KDP)

1. Data cleaning - The first step in the Knowledge Discovery Process is data cleaning, in
which noisy, inconsistent, and irrelevant data are removed from the collection.

• Cleaning in case of missing values. Ex: 1 2 4 5 6
• Cleaning noisy data, where noise is a random error or variance in a measured variable.
Ex: ggjggyghu
• Cleaning with data discrepancy detection and data transformation tools.
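
The first two cleaning cases can be sketched in Python. This is only an illustration, not a prescribed method: it assumes the missing-value example above is a sequence with one absent reading (stored here as None), and the helper names fill_missing and smooth_by_bin_means are hypothetical. Mean imputation and bin-mean smoothing are just one common choice for each case.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Reduce noise: sort the values, split them into bins of `bin_size`,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical data: one reading is missing (assumed to sit between 2 and 4).
readings = [1, 2, None, 4, 5, 6]
cleaned = fill_missing(readings)

# Hypothetical noisy measurements, smoothed in bins of three.
smoothed = smooth_by_bin_means([4, 8, 15, 21, 21, 24], bin_size=3)
```

Dropping the incomplete record instead of imputing a value is an equally valid choice when few records are affected.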

2. Data Integration - The second step is data integration, in which heterogeneous data from
multiple sources (different databases) are combined into a common source, i.e. a data
warehouse.

• Data integration using data migration tools.
• Data integration using data synchronization tools.
• Data integration using the ETL (Extract-Transform-Load) process.
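
As a rough sketch of the ETL idea in Python: two sources with different schemas are extracted, transformed to one schema, and loaded into a single store. The source names, field names, and the list standing in for a warehouse are all assumptions made for illustration.

```python
# Two hypothetical sources describing the same customers with different schemas.
crm_rows = [
    {"cust_id": 1, "full_name": "Ana Ruiz"},
    {"cust_id": 2, "full_name": "Li Wei"},
]
billing_rows = [
    {"customer": 1, "amount_cents": 1250},
    {"customer": 1, "amount_cents": 300},
    {"customer": 2, "amount_cents": 990},
]

def extract():
    """Extract: read the raw rows from each source."""
    return crm_rows, billing_rows

def transform(customers, payments):
    """Transform: bring both sources to one schema keyed by customer id."""
    unified = {
        c["cust_id"]: {"id": c["cust_id"], "name": c["full_name"], "total_cents": 0}
        for c in customers
    }
    for p in payments:
        unified[p["customer"]]["total_cents"] += p["amount_cents"]
    return list(unified.values())

def load(rows, target):
    """Load: append the unified rows to the common store."""
    target.extend(rows)

warehouse = []  # a plain list stands in for the data warehouse
load(transform(*extract()), warehouse)
```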

3. Data Selection - The next step is data selection, in which the data relevant to the
analysis are identified and retrieved from the data collection, i.e. the data warehouse.

• Data selection using neural networks.
• Data selection using decision trees.
• Data selection using Naive Bayes.
• Data selection using clustering, regression, etc.
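
Whichever technique decides what is relevant, the retrieval itself amounts to keeping the relevant records and attributes. A minimal Python sketch, assuming relevance is already known and expressed as a simple condition; the select helper and the sample records are hypothetical.

```python
def select(records, attributes, predicate):
    """Keep the records that satisfy `predicate`, projected onto `attributes`."""
    return [{a: r[a] for a in attributes} for r in records if predicate(r)]

# Hypothetical warehouse contents.
warehouse = [
    {"id": 1, "age": 34, "city": "Pune", "spend": 120},
    {"id": 2, "age": 51, "city": "Delhi", "spend": 80},
    {"id": 3, "age": 27, "city": "Pune", "spend": 45},
]

# Assumed analysis task: spending by age for customers in one city.
task_data = select(warehouse, ["age", "spend"], lambda r: r["city"] == "Pune")
```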

4. Data Transformation - In data transformation, data are transformed into the forms
required by the mining procedure, for example by performing summary or aggregation
operations.

Data transformation is a two-step process:

• Data Mapping: Assigning elements from the source to the destination in order to capture
transformations; that is, matching fields from one database to another.
• Data Translation: Converting data from the formats used in one system to formats
appropriate for a different system.

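
The two steps can be sketched in Python as follows. The field names, the FIELD_MAP table, and the date formats are assumptions chosen for illustration; a real mapping depends on the systems involved.

```python
from datetime import datetime

# Data mapping: hypothetical source field -> destination field.
FIELD_MAP = {"cust_nm": "name", "dob": "birth_date"}

def map_fields(record, field_map):
    """Rename mapped fields and drop fields that have no destination."""
    return {field_map[k]: v for k, v in record.items() if k in field_map}

def translate_date(text):
    """Data translation: convert DD/MM/YYYY text to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, "%d/%m/%Y").date().isoformat()

source_record = {"cust_nm": "Ana Ruiz", "dob": "05/11/1990", "legacy_flag": "X"}
mapped = map_fields(source_record, FIELD_MAP)
mapped["birth_date"] = translate_date(mapped["birth_date"])
```
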
5. Data Mining - In data mining, intelligent methods (algorithms) are applied to the data in
order to extract potentially useful patterns.

• Transforms task-relevant data into patterns.
• Decides the purpose of the model using classification or characterization.
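
As one small example of extracting patterns, the Python sketch below counts item pairs that occur together in at least a given fraction of transactions. Frequent-pair counting is only one of many mining methods; the frequent_pairs helper, the sample transactions, and the 0.5 threshold are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions
    containing both items) is at least `min_support`."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always counted under the same key.
        counts.update(combinations(sorted(set(items)), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical task-relevant data: four shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk"],
]
patterns = frequent_pairs(transactions, min_support=0.5)
```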

6. Pattern Evaluation - In pattern evaluation, the truly interesting patterns representing
knowledge are identified based on given interestingness measures.

• Finds the interestingness score of each pattern.
• Uses summarization and visualization to make the data understandable to the user.
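
The text does not name specific measures; support, confidence, and lift are common choices for rules of the form "A implies B", and the Python sketch below computes them. The rule_measures helper and the sample transactions are hypothetical, and the function assumes both items occur at least once.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.
    Assumes both items appear in at least one transaction."""
    n = len(transactions)
    has_a = sum(1 for t in transactions if antecedent in t)
    has_c = sum(1 for t in transactions if consequent in t)
    has_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = has_both / n              # how often the rule applies at all
    confidence = has_both / has_a       # how often it holds when A is present
    lift = confidence / (has_c / n)     # how much better than chance
    return support, confidence, lift

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk"},
]
bread_milk = rule_measures(transactions, "bread", "milk")
butter_bread = rule_measures(transactions, "butter", "bread")
```

In this sample, bread -> milk has a confidence of 1.0 but a lift of only 1.0, because milk is in every basket anyway. That is why a single measure is rarely enough to judge a pattern.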

7. Knowledge Presentation - In knowledge presentation, the mined knowledge is presented to
the user using visualization and knowledge representation techniques (data mining ->
patterns -> knowledge -> presented to the user).

• Generate reports.
• Generate tables.
• Generate discriminant rules, association rules, classification rules, characterization
rules, etc.
• Generate charts, histograms, etc.
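
Generating a table can be as simple as the plain-text sketch below. The format_table helper and the rule values shown are hypothetical; in practice a reporting or plotting tool would usually do this job.

```python
def format_table(headers, rows):
    """Render headers and rows as an aligned plain-text table."""
    # Width of each column = longest cell in that column, header included.
    widths = [max(len(str(cell)) for cell in column) for column in zip(headers, *rows)]

    def line(cells):
        return "  ".join(str(c).ljust(w) for c, w in zip(cells, widths)).rstrip()

    separator = ["-" * w for w in widths]
    return "\n".join([line(headers), line(separator)] + [line(r) for r in rows])

# Hypothetical mining results to present.
report = format_table(
    ["rule", "support", "confidence"],
    [["bread -> milk", 0.75, 1.0], ["butter -> milk", 0.5, 1.0]],
)
print(report)
```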

Note:
• KDD is an iterative process: evaluation measures can be enhanced, mining can be
refined, and new data can be integrated and transformed in order to get different and more
appropriate results.

• Preprocessing of databases consists of data cleaning and data integration.
