L0_Overview


Data Mining

Flavius Frasincar

Contents

• Your Teacher
• This Course
• Evaluation
• Book

Your Teacher

• Flavius Frasincar, frasincar@ese.eur.nl


• PhD completed at the Eindhoven University of Technology
(TU/e) in June 2005:
– Title of the thesis: “Hypermedia Presentation Generation for
Semantic Web Information Systems”
(thesis available from http://alexandria.tue.nl/extra2/200511530.pdf)
– 2004-2005 assistant professor at TU/e

• Since August 2005: assistant professor at Erasmus University
Rotterdam (EUR)
• Originally from Romania

Romania

Computer Science Minor

• I am the coordinator of the (Advanced) Computer Science Minor
• Courses:
– Introduction to Programming [broadening minor] or Advanced
Programming (minor) [deepening minor] (4 ECTS)
– Databases (4 ECTS)
– Data Mining (4 ECTS)
– Topics in Business Intelligence (3 ECTS): compulsory only for
students following the 15 ECTS variant of the minor (non-ESE)

Computer Science Minor

• “Successful participation in this minor requires a significant ability to
deal with abstract concepts. In addition, a good mathematical
background (algebra, calculus and statistics) is desired.” (from
https://www.eur.nl/en/minor/computer-science)
• This minor is a lot of hard work, but:
– You will learn many Computer Science topics
– You will learn many useful Computer Science skills (e.g., programming,
querying, designing, and modeling), which are highly valued by future
employers (especially in industry)
– If you like mathematics, the minor is a lot of fun!

This Course

• The course Data Mining (FEB53020) covers the major principles and
techniques used in Data Mining
• At the end of the course you should know:
– What is data mining?
– What are data types, data quality, and data preprocessing?
– What are data similarity and data dissimilarity?
– What are data classification techniques?
– How to evaluate data classification techniques?
– What are data clustering techniques?
– How to evaluate data clustering techniques?

Topics

• Data Types, Data Quality, and Data Preprocessing
• Data Similarity and Data Dissimilarity
• Data Classification Techniques
• Data Classification Evaluation
• Data Clustering Techniques
• Data Clustering Evaluation

Lectures

• Week 1 (Tuesday, 30 August 2022)
• Week 2 (Tuesday, 06 September 2022)
• Week 3 (Tuesday, 13 September 2022)
• Week 4 (Tuesday, 20 September 2022)
• Week 5 (Tuesday, 27 September 2022)
• Week 6 (Tuesday, 04 October 2022)
• Week 7 (Tuesday, 11 October 2022)

Group Meetings

• Take place at my office, ET-44, after the lectures (10 minutes per
group, starting at 15:10, ordered by group id)
• Evaluation of the progress of the work
• Questions regarding lectures
• Workload: 4 ECTS x 28 hours = 112 hours (112 hours/(4
hours/day) = 28 days!)
• Presentations (PowerPoint) each week (on your laptops)
• Presentations are compulsory!
• 30 August 2022 (today) there are no group meetings
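
The workload arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the slide's own numbers; the hours-per-week figure is an added assumption that the workload is spread evenly over the seven lecture weeks, which the slides do not state.

```python
# Workload figures taken from the slide.
ECTS = 4               # credits for the course
HOURS_PER_ECTS = 28    # study hours per credit
HOURS_PER_DAY = 4      # study hours per day, as used on the slide

# Assumption (not on the slide): the work is spread evenly over 7 lecture weeks.
LECTURE_WEEKS = 7

total_hours = ECTS * HOURS_PER_ECTS            # 4 x 28 = 112 hours
days = total_hours / HOURS_PER_DAY             # 112 / 4 = 28 days
hours_per_week = total_hours / LECTURE_WEEKS   # 112 / 7 = 16 hours per week

print(f"Total workload: {total_hours} hours")
print(f"At {HOURS_PER_DAY} hours/day: {days:.0f} days")
print(f"Spread over {LECTURE_WEEKS} weeks: {hours_per_week:.0f} hours/week")
```

Under that even-spread assumption the course asks for roughly two full working days per week.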

Evaluation

• Assignments:
– Report (assignments after each lecture)
• Form groups of 5 people for the assignments:
– Every team member should contribute equally to the assignment
• Groups must be formed on 30 August 2022 (today)
• You can sign up using
https://docs.google.com/spreadsheets/d/1OSuB-H2aSfKZigW_-AKN3xHuX_0HivP45htDTIY3mx0/edit#gid=0
(if you do not have a group, sign up for the first free slot and
coordinate with the rest of the group members by email)
• Evaluation based on:
– Report
– Written examination

Evaluation

• Tuesday 11 October 2022: final presentation
• Thursday 13 October 2022: reports sent to me by email as PDF
• Tuesday 18 October 2022, 09:30-11:30: written examination
• Mark = individual input in the work resulting in the report
+ quality of the report [2 points] + written examination [8
points]
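
One way to read the mark formula is as a simple sum of two components. The sketch below assumes that the bracketed values are maxima (up to 2 points for the report component, up to 8 for the written examination) and that the result is a mark out of 10; the slide does not say how individual input and report quality are weighed within the 2 points, so that split is not modeled. The function name `final_mark` is hypothetical.

```python
def final_mark(report_points: float, exam_points: float) -> float:
    """Sum the report and exam components (assumed maxima: 2 and 8 points)."""
    if not 0 <= report_points <= 2:
        raise ValueError("report component must be between 0 and 2 points")
    if not 0 <= exam_points <= 8:
        raise ValueError("exam component must be between 0 and 8 points")
    return report_points + exam_points


# Example: 1.5 points for the report and 6.0 points for the examination.
print(final_mark(1.5, 6.0))
```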

Report

• Group ID
• Authors
• Assignment 1:
– Exercise 1: formulation + solution
– Exercise 2: formulation + solution
– …

• Assignment 2:
– …

Book

• Title: Introduction to Data Mining
• Authors: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar
• Publisher: Pearson Education
• Year: 2005
• ISBN: 978-0321420527
• 1st Edition

Tools

• WEKA: GUI
• MATLAB: Statistics and Machine Learning Toolbox
• R: caret, nnet, randomForest, etc.
• Python: scikit-learn, TensorFlow, PyTorch, etc.
• Java: WEKA, deeplearning4j, Mahout, MALLET, etc.

• Many videos and tutorials online!
