
Data Mining

IS314

Dr. Ayman Alhelbawy 27 Feb 2022


Course Structure

• Lectures: Introduction to the main topics
• Lab work: Hands-on use of some big data tools, assessed by a project
• Year Work: Midterm assessment
• Oral: Presentation and oral test

Grading

• Final Exam: 60%
• Lab Work: 10%
• Year Work: 20%
• Oral Presentation: 10%

Textbooks
• Data Mining: Concepts and Techniques, 3rd Edition
  Jiawei Han, Micheline Kamber, and Jian Pei
• Introduction to Data Mining, 2nd Edition
  Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar

Introduction
Agenda

- The need for Data Mining;
- What Data Mining is and what it is not;
- The Knowledge Discovery in Data (KDD) Process;
- Kinds of data that can be mined;
- Data Mining Tasks;
- Data Mining Applications;
- Challenges of Data Mining.

The need for data mining

• The explosive growth of data: from terabytes to petabytes
• Data collection and data availability
  - Automated data collection tools, database systems, the Web, computerized society
• Major sources of abundant data
  - Business: Web, e-commerce, transactions, stocks, …
  - Science: remote sensing, bioinformatics, scientific simulation, …
  - Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!

What Data Mining is


It is often used as a synonym for Knowledge Discovery in Data (KDD), although it is also viewed as one essential step of the KDD process.

It is the process of extracting interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.
Examples:
- Predicting the weather based on historical data;
- Predicting stock market prices based on people's sentiment on social media;
- Classifying people as trusted or untrusted, as banks do when issuing credit cards and setting credit limits.

Example of Discovered Patterns

Association rules (single-dimensional association):
“80% of customers who buy cheese and milk also buy bread, and 5% of customers buy all of them together”
Cheese, Milk → Bread [support = 5%, confidence = 80%]
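The two numbers attached to the rule can be computed directly from a list of transactions: support is the share of all transactions that contain every item in the rule, and confidence is support(antecedent ∪ consequent) / support(antecedent). A minimal sketch, assuming each transaction is a Python set of item names; the five transactions are hypothetical toy data, so the resulting percentages differ from the 5% / 80% on the slide.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule never fires
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical toy transactions (not from the lecture).
transactions = [
    {"cheese", "milk", "bread"},
    {"cheese", "milk", "bread", "eggs"},
    {"cheese", "milk"},
    {"milk", "bread"},
    {"cheese", "eggs"},
]

s = support(transactions, {"cheese", "milk", "bread"})
c = confidence(transactions, {"cheese", "milk"}, {"bread"})
print(f"Cheese, Milk -> Bread [support = {s:.0%}, confidence = {c:.0%}]")
# prints: Cheese, Milk -> Bread [support = 40%, confidence = 67%]
```

Here 2 of 5 baskets contain all three items (support 40%), and 2 of the 3 baskets with cheese and milk also contain bread (confidence about 67%).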

Classification:
Classifying the customers of an insurance company based on some criteria. Trust in customers is very important to insurance companies, so classifying people by their previous claims may help in classifying future customers.
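One very simple way to learn such a classifier from previous claims is a decision stump: a single threshold rule chosen to make the fewest mistakes on past customers. This is only an illustrative stand-in, not a method from the lecture; the feature (number of previous claims), the labels, and the customer history below are all hypothetical.

```python
def predict(threshold, claims):
    """Rule: 'untrusted' if the customer has at least `threshold` past claims."""
    return "untrusted" if claims >= threshold else "trusted"


def fit_stump(history):
    """Choose the threshold with the fewest training mistakes (a decision stump)."""
    best_threshold, best_errors = None, None
    for threshold in sorted({claims for claims, _ in history}):
        errors = sum(
            1 for claims, label in history
            if predict(threshold, claims) != label
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Hypothetical past customers: (number of previous claims, label assigned by the insurer).
history = [
    (0, "trusted"), (1, "trusted"), (0, "trusted"), (2, "trusted"),
    (3, "untrusted"), (4, "untrusted"), (5, "untrusted"), (6, "untrusted"),
]

threshold = fit_stump(history)
print(threshold)                                     # 3
print(predict(threshold, 1), predict(threshold, 5))  # trusted untrusted
```

The threshold is learned from labelled history and then applied to new customers, which is what separates classification from a fixed, hand-written query.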

What is not Data Mining

Any information that can be extracted by a simple query or calculation is not considered data mining.
Examples:
- Computing the GPA of students;
- Dividing students into two groups based on gender;
- Searching the Web for certain keywords.
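The first two examples reduce to a single formula and a single group-by: the answer is fully determined by the query, and no previously unknown pattern is discovered. A minimal sketch, assuming hypothetical student records and grade points on a 4.0 scale; the names `gpa` and `split_by` are illustrative only.

```python
from collections import defaultdict


def gpa(courses):
    """Credit-weighted average of grade points: one formula, nothing to discover."""
    total_points = sum(points * credits for points, credits in courses)
    total_credits = sum(credits for _, credits in courses)
    return total_points / total_credits


def split_by(students, field):
    """Group student names by the value of one field: a plain group-by query."""
    groups = defaultdict(list)
    for student in students:
        groups[student[field]].append(student["name"])
    return dict(groups)


# Hypothetical data, for illustration only.
transcript = [(4.0, 3), (3.0, 3), (3.5, 2)]  # (grade points, credit hours)
students = [
    {"name": "Ali", "gender": "M"},
    {"name": "Mona", "gender": "F"},
    {"name": "Omar", "gender": "M"},
]

print(gpa(transcript))               # 3.5
print(split_by(students, "gender"))  # {'M': ['Ali', 'Omar'], 'F': ['Mona']}
```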

Thank You.
Questions?
