In the last lecture you learned:
• Data mining tasks

After this lecture you will be able to:
• Understand major issues in data mining
• Understand distributed data mining
• Understand parallel data mining

Contents
• Mining issues
• User interaction
• Applications
• Distributed data mining
• Parallel data mining

Major issues in data mining
Mining different kinds of knowledge from diverse data types
• Mining information from heterogeneous databases and global information systems
• The data may be spread across different data sources connected by a LAN or WAN.
• These data sources may be structured, semi-structured, or unstructured.

Performance: efficiency and scalability
• To extract information effectively from huge amounts of data in databases, data mining algorithms must be efficient and scalable.

Major issues in data mining (continued)
• Summarization and visualization are used to make the data understandable to the user.
• Buying and maintaining powerful software, servers, and storage hardware that can handle large amounts of data is expensive.
• Data may be unavailable or difficult to access.
• Data quality may be poor: noisy or dirty data and missing values.
• Distributed mining methods
• Dealing with huge datasets that require distributed approaches
• Integrating the discovered knowledge with existing knowledge
• User interaction
  • Data mining query languages and ad hoc mining
  • Expression and visualization of data mining results
• Applications
  • Invisible data mining
  • Protection of data security, integrity, and privacy

Distributed data mining
• Distributed Data Mining (DDM) is a field that deals with analyzing distributed data and proposes algorithmic solutions for performing data analysis and mining operations in a distributed way while taking resource constraints into account.

Parallel data mining
• Data mining is the automated analysis of large volumes of data in search of "interesting" relationships and knowledge that are implicit in the data.
• Research and development in parallel data mining concerns the study and design of parallel algorithms, methods, and tools for extracting novel, useful, and implicit patterns from data using high-performance architectures.
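Parallel (and distributed) data mining algorithms often follow a partition, mine-locally, merge pattern. The sketch below illustrates that pattern for support counting in frequent-itemset mining, an example chosen here for illustration and not taken from the lecture. The transactions are split into partitions, each worker counts itemsets in its own partition, and the local counts are summed into global supports. This is similar in spirit to the Count Distribution approach to parallel association-rule mining, but simplified: it counts every k-itemset in each transaction instead of Apriori-generated candidates. The function names `count_partition` and `parallel_frequent_itemsets` and the sample baskets are hypothetical. Python threads are used for brevity; real CPU-bound parallel mining would more likely use processes or cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_partition(transactions, k):
    """Count every k-itemset occurring in one partition of the transactions."""
    counts = Counter()
    for t in transactions:
        # set() drops duplicate items; sorted() gives each itemset one canonical order
        for itemset in combinations(sorted(set(t)), k):
            counts[itemset] += 1
    return counts


def parallel_frequent_itemsets(transactions, k, min_support, n_workers=4):
    """Hypothetical helper: partition the data, count locally in parallel, merge."""
    # Round-robin split: partition i gets transactions i, i + n_workers, ...
    partitions = [transactions[i::n_workers] for i in range(n_workers)]
    # Each worker counts its own partition independently (no shared state)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        local_counts = list(pool.map(count_partition, partitions, [k] * n_workers))
    # Support counts are additive, so the global count is the sum of local counts
    total = Counter()
    for counts in local_counts:
        total.update(counts)
    # Keep only itemsets that meet the minimum support threshold
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical market-basket data, for illustration only
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
    ["bread", "milk", "butter"],
]
frequent_pairs = parallel_frequent_itemsets(baskets, k=2, min_support=2)
# ("bread", "milk") and ("butter", "milk") occur 3 times, ("bread", "butter") twice
print(frequent_pairs)
```

Because support counts are additive, merging the local counts gives exactly the same result as counting over the whole dataset at once. The same merge step carries over to distributed data mining when each partition lives at a different site: only the counts, not the raw transactions, need to be sent over the network.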
References
• Mehmed Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, 3rd edition, 2019.
• Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, 2012.