
Data Mining

Lecture 5
Instructor: Dr. Dhani Bux


In the last lecture you learned
• Data mining tasks
After this lecture you will be able to
• Understand major issues in data mining
• Understand distributed data mining
• Understand parallel data mining
Contents
• Mining issues
• User interaction
• Applications
• Distributed data mining
• Parallel data mining
Major issues in data mining

Mining different kinds of knowledge from diverse data types
• Mining information from heterogeneous
databases and global information systems
• The data is spread across different data sources
on a LAN or WAN.
• These data sources may be structured,
semi-structured, or unstructured.
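To make the three kinds of source concrete, the following minimal sketch brings one record from each kind into a common shape before mining. The sample records, the field names (id, name, city), and the helper functions are all hypothetical and chosen only for illustration; real sources would need far more robust parsing.

```python
import csv
import io
import json
import re

# Hypothetical sample data, one record per kind of source.
structured = "id,name,city\n1,Ali,Karachi\n"  # CSV: fixed schema
semi_structured = '{"id": 2, "name": "Sara", "address": {"city": "Lahore"}}'  # JSON: nested, flexible
unstructured = "Customer 3, named Omar, lives in Quetta."  # free text


def from_csv(text):
    """Read rows that already follow a fixed schema."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "name": r["name"], "city": r["city"]} for r in rows]


def from_json(text):
    """Flatten a nested JSON document into the common shape."""
    doc = json.loads(text)
    return [{"id": doc["id"], "name": doc["name"], "city": doc["address"]["city"]}]


def from_text(text):
    """Pull fields out of free text with a pattern (assumed sentence layout)."""
    match = re.search(r"Customer (\d+), named (\w+), lives in (\w+)", text)
    if match is None:
        return []
    return [{"id": int(match.group(1)), "name": match.group(2), "city": match.group(3)}]


# All three sources now share one record layout and can be mined together.
records = from_csv(structured) + from_json(semi_structured) + from_text(unstructured)
print(records)
```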
Conti:
Performance: efficiency and scalability
• To extract information effectively from the
huge amounts of data held in databases, data
mining algorithms must be efficient and
scalable.
Conti:
• Summarization and visualization are used to make
data understandable to the user.
• High cost of buying and maintaining the powerful
software, servers, and storage hardware needed to
handle large amounts of data.
• Unavailability of data or difficult access to data.
• Poor data quality, such as noisy data, dirty data,
and missing values.
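As a small illustration of the data quality issue, the sketch below repairs one numeric column by replacing missing and implausible entries with the median of the valid ones. The function name clean_column, the sample ages, and the plausible range of 0 to 120 are assumptions made for this example; median imputation is only one of several possible strategies.

```python
from statistics import median


def clean_column(values, low, high):
    """Replace missing (None) or out-of-range entries with the median of the valid ones."""
    def is_valid(v):
        return v is not None and low <= v <= high

    valid = [v for v in values if is_valid(v)]
    if not valid:
        raise ValueError("no valid values to impute from")
    fill = median(valid)
    return [v if is_valid(v) else fill for v in values]


# Hypothetical ages: None is a missing value, 250 and -4 are noisy entries.
ages = [25, None, 31, 250, 28, -4, 40]
cleaned = clean_column(ages, low=0, high=120)
print(cleaned)
```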
Conti:
• Distributed mining methods
• Dealing with huge datasets that require
distributed approaches
• Integration of the discovered knowledge with
existing knowledge
Conti:
• User interaction
• Data mining query languages and ad-hoc
mining
• Expression and visualization of data mining
results
• Applications
• Invisible data mining
• Protection of data security, integrity, and
privacy
Distributed data mining
• Distributed data mining (DDM) is a field that deals with
analyzing distributed data. It proposes algorithmic solutions
for performing data analysis and mining operations in a
distributed way while taking resource constraints into account.
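One common way to respect resource constraints such as limited bandwidth is to mine locally at each site and send only small summaries to a coordinator. The sketch below is a minimal illustration of that idea for frequent single items, not a specific published DDM algorithm. The two sites, their transactions, and the function names are hypothetical.

```python
from collections import Counter


def local_item_counts(transactions):
    """Runs at one site: count in how many local transactions each item occurs."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count an item once per transaction
    return counts


def merge_site_counts(site_counts, total_transactions, min_support):
    """Runs at the coordinator: add up per-site counts and keep the frequent items."""
    global_counts = Counter()
    for counts in site_counts:
        global_counts.update(counts)
    threshold = min_support * total_transactions
    return {item: n for item, n in global_counts.items() if n >= threshold}


# Hypothetical transactions held at two separate sites; raw data never leaves a site.
site_a = [["milk", "bread"], ["milk", "eggs"], ["bread"]]
site_b = [["milk", "bread", "eggs"], ["tea"]]

summaries = [local_item_counts(site_a), local_item_counts(site_b)]
total = len(site_a) + len(site_b)
frequent = merge_site_counts(summaries, total, min_support=0.5)
print(frequent)
```

Only the per-site counters cross the network here. For single items the summed local counts equal the exact global counts, so the result matches what a centralized scan would find.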
Parallel data mining
• Data mining is the automated analysis of large volumes of
data, looking for the 'interesting' relationships and knowledge
that are implicit in that data.
• Research and development work in the area of parallel data
mining concerns the study and definition of parallel
algorithms, methods, and tools for the extraction of novel,
useful, and implicit patterns from data using high-performance
architectures. 
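A simple form of parallel data mining is data parallelism: split the dataset into partitions, mine each partition on its own worker, and merge the partial results. The sketch below counts co-occurring item pairs this way with Python's standard concurrent.futures module. The function names, the round-robin split, and the choice of pair counting as the mining task are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import combinations


def count_pairs(transactions):
    """Work done by one worker: count item pairs that occur together."""
    counts = Counter()
    for transaction in transactions:
        counts.update(combinations(sorted(set(transaction)), 2))
    return counts


def split(data, n_parts):
    """Round-robin split of the data into n_parts partitions."""
    return [data[i::n_parts] for i in range(n_parts)]


def parallel_pair_counts(transactions, n_workers=2, executor_cls=ProcessPoolExecutor):
    """Mine each partition on its own worker, then merge the partial counts.

    executor_cls defaults to processes for CPU-bound work; ThreadPoolExecutor
    can be passed instead for quick checks.
    """
    partitions = split(transactions, n_workers)
    total = Counter()
    with executor_cls(max_workers=n_workers) as pool:
        for partial in pool.map(count_pairs, partitions):
            total.update(partial)
    return total
```

When the default process pool is used in a script, the call to parallel_pair_counts should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.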
