Data Mining 14
Data mining is the core step of the knowledge discovery process (KDP). KDP is the process of
finding knowledge in data; it applies data mining methods (algorithms) to extract the required
knowledge from large amounts of data.
The term KDD stands for Knowledge Discovery in Databases. It refers to the broad procedure
of discovering knowledge in data and emphasizes the high-level applications of specific Data
Mining techniques. It is a field of interest to researchers in various fields, including artificial
intelligence, machine learning, pattern recognition, databases, statistics, knowledge acquisition
for expert systems, and data visualization. The main objective of the KDD process is to extract
information from data held in large databases, such as a data warehouse. It does this by using
Data Mining algorithms (techniques) to identify what is deemed knowledge (the required
information).
Fig: - Knowledge Discovery Process (KDP)
1. Data cleaning - The first step in the Knowledge Discovery Process is data cleaning, in which
noisy, inconsistent, and irrelevant data is removed from the collection.
2. Data Integration - The second step is data integration, in which heterogeneous data from
multiple sources (different databases) is combined into a common source, i.e. a Data Warehouse.
3. Data Selection - The next step is data selection, in which the data relevant to the analysis
is identified and retrieved from the data collection, i.e. the Data Warehouse.
6. Pattern Evaluation - In pattern evaluation, the truly interesting patterns that represent
knowledge are identified based on given interestingness measures.
Generate reports.
Generate tables.
Generate discriminant rules, association rules, classification rules, characterization
rules, etc.
Generate charts, histograms, etc.
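The cleaning, integration, and selection steps listed above can be illustrated with a minimal
Python sketch. Everything in it is an assumption made for illustration: the two sources
(sales_db, crm_db), their field names, and the rule that a missing or negative amount counts as
noisy data are hypothetical and do not come from any particular system.

```python
# Hypothetical records from two separate sources; all names and values are
# illustrative assumptions.
sales_db = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": None},   # missing value (noise)
    {"customer_id": 3, "amount": -5.0},   # inconsistent: negative amount
    {"customer_id": 4, "amount": 80.0},
]
crm_db = [
    {"customer_id": 1, "region": "North", "age": 34},
    {"customer_id": 3, "region": "South", "age": 51},
    {"customer_id": 4, "region": "North", "age": 27},
]


def clean(records):
    """Step 1: drop records whose amount is missing or negative."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]


def integrate(sales, crm):
    """Step 2: combine both sources into one collection keyed by customer_id."""
    crm_by_id = {r["customer_id"]: r for r in crm}
    merged = []
    for sale in sales:
        match = crm_by_id.get(sale["customer_id"])
        if match is not None:
            # Merge the attributes of both sources into a single record.
            merged.append({**match, **sale})
    return merged


def select(records, fields):
    """Step 3: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


cleaned = clean(sales_db)
warehouse = integrate(cleaned, crm_db)
relevant = select(warehouse, ["region", "amount"])
print(relevant)
```

In practice each step is usually far more involved (for example, imputing missing values
instead of dropping records), but the order of the steps is the same.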
Note:
KDD is an iterative process where evaluation measures can be enhanced, mining can be
refined, new data can be integrated and transformed in order to get different and more
appropriate results.
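As a rough illustration of pattern evaluation and of the iterative refinement described in the
note, the sketch below scores candidate rules with two commonly used measures, support and
confidence, and then re-runs the evaluation with a relaxed threshold. The transactions, the
candidate rules, and the threshold values are hypothetical; the source does not prescribe any
particular measure.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Candidate rules as (antecedent, consequent) pairs, assumed to come from an
# earlier mining step.
candidate_rules = [
    ({"bread"}, {"milk"}),
    ({"milk"}, {"bread"}),
    ({"butter"}, {"bread"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of the antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


def evaluate(rules, transactions, min_support, min_confidence):
    """Keep only rules that meet both thresholds."""
    kept = []
    for antecedent, consequent in rules:
        s = support(antecedent | consequent, transactions)
        c = confidence(antecedent, consequent, transactions)
        if s >= min_support and c >= min_confidence:
            kept.append((antecedent, consequent))
    return kept


# First pass with a strict confidence threshold.
strict = evaluate(candidate_rules, transactions, 0.5, 0.8)
# Second pass: the threshold is relaxed and the evaluation is repeated.
relaxed = evaluate(candidate_rules, transactions, 0.5, 0.6)
print(len(strict), len(relaxed))
```

Changing the thresholds, or swapping in a different measure, and repeating the evaluation is
one simple form of the iteration that KDD allows.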