Welcome: Knowledge Is Power. Pass It On
www.openreferencetech.com
DATA MINING
Data mining is the extraction of hidden predictive information from large databases.
The overall goal of the data mining process is to extract information from a dataset and transform it into an understandable structure for further use.
Its goal is the extraction of patterns and knowledge from large amounts of data, not the extraction of the data itself.
Data mining is the analysis step of the "Knowledge Discovery and Data Mining" process, or KDD.
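To make the difference between extracting patterns and extracting the data itself concrete, here is a minimal sketch, assuming a small hypothetical list of shopping-basket transactions. Instead of returning records, it returns item pairs that appear together in at least a given fraction of transactions (their "support"). The data and the `frequent_pairs` helper are illustrative only; frequent pairs are just one of many kinds of pattern data mining can produce.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (illustrative data, not from any real dataset)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions that
    contain both items) is at least min_support."""
    counts = Counter()
    for t in transactions:
        # sorted() gives every pair a canonical order, e.g. ("bread", "milk")
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(transactions, min_support=0.4))
```

The output is a handful of co-occurrence patterns (for example, bread and milk appear together in 60% of baskets), not a copy of the transactions.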
Why Data Mining?
The Explosive Growth of Data: from terabytes to petabytes
E.g., global backbone telecommunication networks carry tens of petabytes of data every day
(1 terabyte = 1024 gigabytes; 1 petabyte = 1024 terabytes)
Data collection and data availability
Automated data collection tools, database systems, Web, computerized
society
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks, …
Science: Remote sensing, bioinformatics, scientific simulation, …
Society and everyone: news, digital cameras, …
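The unit conversion factors given for the growth of data can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the base-1024 convention used on the slide (under IEC naming these binary multiples are strictly tebibytes and pebibytes, whereas SI terabytes and petabytes use factors of 1000). The figure of 10 PB per day is a hypothetical stand-in for "tens of petabytes", and `petabytes_to_gigabytes` is an illustrative helper.

```python
# Base-1024 units, as used on the slide:
# 1 terabyte = 1024 gigabytes, 1 petabyte = 1024 terabytes.
GB_PER_TB = 1024
TB_PER_PB = 1024

def petabytes_to_gigabytes(pb):
    """Convert petabytes to gigabytes using the base-1024 convention."""
    return pb * TB_PER_PB * GB_PER_TB

# Hypothetical illustrative volume: 10 PB per day
daily_pb = 10
print(f"{daily_pb} PB = {petabytes_to_gigabytes(daily_pb):,} GB")  # 10 PB = 10,485,760 GB
```

So even the low end of "tens of petabytes" per day is over ten million gigabytes.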
Knowledge Discovery (KDD) Process
1. Data cleaning – to remove noise and inconsistent data
2. Data integration – to combine multiple data sources
3. Data selection – to retrieve relevant data for analysis
4. Data transformation – to transform data into a form appropriate for data mining
5. Data mining – an essential process where intelligent methods are applied to extract data patterns
6. Pattern evaluation – to identify the truly interesting patterns representing knowledge, based on interestingness measures
7. Knowledge presentation – visualization and knowledge representation techniques are used to present the mined knowledge to users
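The seven steps of the knowledge discovery process can be sketched end to end in a few dozen lines of Python. This is a minimal, standard-library-only illustration under stated assumptions: the two data sources, the cleaning rule, the region filter and the support/confidence thresholds are all hypothetical. Association-rule mining stands in for step 5 (it is only one of many possible mining methods), and support and confidence serve as the interestingness measures for step 6.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical source 1: sales records (basket_id, customer_id, item, quantity)
sales = [
    (1, "c1", "bread", 1), (1, "c1", "milk", 2),
    (2, "c2", "bread", 1), (2, "c2", "butter", 1), (2, "c2", "milk", 1),
    (3, "c3", "beer", 6),
    (3, "c3", "bread", None),   # noisy record: missing quantity
    (4, "c1", "butter", 1),
    (4, "c1", "milk", -1),      # inconsistent record: negative quantity
    (5, "c2", "bread", 2), (5, "c2", "butter", 1), (5, "c2", "milk", 1),
]
# Hypothetical source 2: customer table (customer_id -> region)
customers = {"c1": "north", "c2": "north", "c3": "south"}

# 1. Data cleaning: drop records with missing or non-positive quantities.
cleaned = [r for r in sales if r[3] is not None and r[3] > 0]

# 2. Data integration: attach each customer's region from the second source.
integrated = [(b, c, item, q, customers[c]) for b, c, item, q in cleaned]

# 3. Data selection: keep only the relevant attributes and records
#    (basket and item, for customers in the "north" region).
selected = [(b, item) for b, c, item, q, region in integrated if region == "north"]

# 4. Data transformation: group items into one set per basket (transactions).
baskets = defaultdict(set)
for b, item in selected:
    baskets[b].add(item)
transactions = list(baskets.values())

# 5. Data mining: count single items and item pairs across transactions.
item_counts = Counter(i for t in transactions for i in t)
pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))

# 6. Pattern evaluation: keep rules lhs -> rhs whose support and confidence
#    meet the (hypothetical) thresholds used as interestingness measures.
MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.7
n = len(transactions)
rules = []
for (a, b), count in pair_counts.items():
    support = count / n
    if support < MIN_SUPPORT:
        continue
    for lhs, rhs in ((a, b), (b, a)):
        confidence = count / item_counts[lhs]
        if confidence >= MIN_CONFIDENCE:
            rules.append((lhs, rhs, support, confidence))

# 7. Knowledge presentation: print the surviving rules as a small text table.
for lhs, rhs, s, c in sorted(rules):
    print(f"{lhs:>7} -> {rhs:<7} support={s:.2f} confidence={c:.2f}")
```

With these inputs, baskets 1, 2, 4 and 5 survive cleaning and selection, and only the rules bread -> milk and milk -> bread (support 0.75, confidence 1.00) pass both thresholds. Keeping each step as a separate, named stage makes it easy to swap in a different cleaning rule or mining method without touching the rest.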