
PIR MEHR ALI SHAH ARID AGRICULTURE UNIVERSITY

University Institute of Information Technology

CS-775 Advanced Data Mining


Credit Hours: 3(3-0) Prerequisites: Data Mining
Course Learning Outcomes (CLOs)
At the end of the course the students will be able to:              Domain   BT Level*
1. Explain fundamental database concepts.                            C        2
2. Design conceptual, logical and physical database schemas
   using different data models.                                      C        5
3. Identify functional dependencies and resolve database
   anomalies by normalizing database tables.                         C        2
4. Use Structured Query Language (SQL) for database definition
   and manipulation in any DBMS.                                     C        4
*BT = Bloom’s Taxonomy, C = Cognitive domain, P = Psychomotor domain, A = Affective domain

Course Contents:
Topics to be covered include:
 Basic statistical ideas - populations, distributions, samples and random samples
 Classification models and methods - including: linear discriminant analysis; trees;
random forests; neural nets; boosting and bagging approaches; support vector machines.
 Linear regression approaches to classification, compared with linear discriminant
analysis.
 The training/test approach to assessing accuracy, and cross-validation.
 Strategies in the (common) situation where the source and target populations differ,
typically in time but in other respects also.
 Unsupervised models - k-means, association rules, hierarchical clustering, model-based
clusters.
 Low-dimensional views of classification results - distance methods and ordination.
 Strategies for working with large data sets.
 Practical approaches to classification with real life data sets, using different methods to
gain different insights into presentation.
 Privacy and security.
 Use of the R system for handling the calculations.

Course Objective:
The main focus of the course will be supervised learning, primarily for classification. The
emphasis will be on practical applications of the methodologies that are described, with the R
system used for the computations. Attention will be given to
1) Generalizability and predictive accuracy, in the practical contexts in which methods are
applied.

2) Low-dimensional visual representation of results, as an aid to diagnosis and insight.

3) Interpretability of model parameters, including potential for misinterpretation.


Lectures, Written Assignments, Practical labs, Semester Project, Presentations
Course Assessment:
Exams, Assignments, Quizzes. The course will be assessed using a combination of written
examinations.
Week Contents Theory
1 M1: Introduction: Machine Learning and Data Mining
 Data Flood
 Data Mining Application Examples
 Data Mining and Knowledge Discovery
 Data Mining Tasks

2 M2: Machine Learning and Classification


 Machine Learning and Classification
 Examples
 Learning as Search
 Bias
 Weka

3 M3: Input: Concepts, instances, attributes


 What is a concept?
 What is an example?
 What is an attribute?
 Preparing the data

4 M4: Output: Knowledge Representation


 Decision tables
 Decision trees
 Decision rules
 Rules involving relations
 Instance-based representation

5 M5: Classification - Basic methods


 OneR
 NaiveBayes
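
As an illustration of the OneR idea listed above (one rule on a single attribute, chosen for the fewest training errors), here is a minimal sketch. It is written in plain Python purely for illustration; the course itself names R and Weka. The function name `one_r` and the toy weather-style rows are hypothetical and not taken from the course material.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Pick the single attribute whose value -> majority-class rule
    makes the fewest errors on the training data.

    rows   : list of dicts mapping attribute name -> nominal value
    labels : list of class labels, one per row
    Returns (best_attribute, rule_dict, training_errors).
    """
    best = None
    for attr in rows[0]:
        # Count class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[attr]][y] += 1
        # Each attribute value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]
attribute, rule, errors = one_r(rows, labels)
print(attribute, rule, errors)
```

The sketch assumes nominal attributes only; numeric attributes would first need to be discretized.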

6 M6: Classification: Decision Trees


 Top-Down Decision Trees
 Choosing the Splitting Attribute
 Information Gain and Gain ratio
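
The splitting measures named above can be sketched as follows. This is a minimal illustration in plain Python (an assumption; the course itself names R and Weka), using the standard definitions: information gain is the entropy of the class labels minus the weighted entropy after the split, and gain ratio divides that gain by the entropy of the attribute's own value distribution. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy, in bits, of a list of nominal values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())

def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain normalized by the split information of the attribute."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(values, labels) / split_info

# Hypothetical toy data, for illustration only.
labels = ["x", "x", "y", "y"]
informative = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]  # tells us nothing about the class
print(information_gain(informative, labels), gain_ratio(informative, labels))
print(information_gain(uninformative, labels))
```

A top-down tree learner would evaluate such a measure for every candidate attribute at a node and split on the highest-scoring one.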

7 M7: Classification: C4.5


 Handling Numeric Attributes
  Finding Best Split
 Dealing with Missing Values
 Pruning
  Pre-pruning, Post-Pruning, Estimating Error Rates
 From Trees to Rules

8 M8: Classification: CART


 CART Overview and Gymtutor Tutorial Example
 Splitting Criteria
 Handling Missing Values
 Pruning
  Finding Optimal Tree

MID TERM
9 M9: Classification: more methods
 Rules
 Regression
 Instance-based (Nearest neighbor)

10 M10: Evaluation and Credibility


 Introduction
 Classification with Train, Test, and Validation sets
  Handling Unbalanced Data; Parameter Tuning
 *Predicting Performance
 Evaluation on "small data": Cross-validation
 *Bootstrap
 Comparing Data Mining Schemes
 *Choosing a Loss Function
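
To make the cross-validation item above concrete, here is a minimal sketch of k-fold cross-validation in plain Python (an assumption; the course itself names R and Weka). The helper names `k_fold_indices` and `cross_validate`, the 1-nearest-neighbour stand-in classifier, and the toy data are all hypothetical. The sketch assumes k is no larger than the number of examples.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k folds.

    fit(train_xs, train_ys) -> model
    predict(model, x)       -> predicted label
    """
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Train only on the examples outside the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k

# Stand-in classifier: 1-nearest neighbour on a single numeric feature.
def fit_1nn(train_xs, train_ys):
    return list(zip(train_xs, train_ys))

def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]

# Hypothetical, well-separated toy data, for illustration only.
xs = list(range(10)) + list(range(100, 110))
ys = ["low"] * 10 + ["high"] * 10
accuracy = cross_validate(xs, ys, fit_1nn, predict_1nn, k=5)
print(accuracy)
```

Each example is used for testing exactly once and for training k-1 times, which is why this scheme suits small data sets where a single train/test split would waste data.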

11 M11: Evaluation - Lift and Costs


 Lift and Gains charts
 *ROC
 Cost-sensitive learning
 Evaluating numeric predictions
 MDL principle and Occam's razor

12 M12: Data Preparation for Knowledge Discovery


 Data understanding
 Data cleaning
 Data transformation
 Discretization
 False "predictors" (information leakers)
 Feature reduction, leaker detection
 Randomization
 Learning with unbalanced data

13 M13: Clustering
 Introduction
 K-means
 Hierarchical

14 M14: Associations
 Transactions
 Frequent itemsets
 Association rules
 Applications
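
The notions of frequent itemsets and association rules listed above can be sketched with the usual support and confidence definitions. This is a minimal illustration in plain Python (an assumption; the course itself names R and Weka). It enumerates candidate itemsets by brute force, which is only practical for tiny examples and is not the Apriori algorithm. All function names and the toy transactions are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(transactions, candidate)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result

# Hypothetical market-basket data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
print(frequent)
print(confidence(transactions, {"bread"}, {"milk"}))
```

A rule is usually reported only when both its support and its confidence exceed user-chosen thresholds.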

15 M15: Visualization
 Graphical excellence and lie factor
 Representing data in 1, 2, and 3-D
 Representing data in 4+ dimensions
o Parallel coordinates
o Scatterplots
o Stick figures

16 M19: Data Mining and Society; Future Directions


 Data Mining and Society: Ethics, Privacy, and Security issues
 Future Directions for Data Mining: web mining, text mining, multimedia data
 Course Summary

Final Exam
