Data Mining Outline

PIR MEHR ALI SHAH ARID AGRICULTURE UNIVERSITY
University Institute of Information Technology
CS-775 Advanced Data Mining

Credit Hours: 3(3-0)
Prerequisites: Data Mining
Course Learning Outcomes (CLOs)
At the end of the course, students will be able to:         Domain  BT Level*
1. Explain fundamental database concepts.                   C       2
2. Design conceptual, logical and physical database         C       5
   schemas using different data models.
3. Identify functional dependencies and resolve database    C       2
   anomalies by normalizing database tables.
4. Use Structured Query Language (SQL) for database         C       4
   definition and manipulation in any DBMS.
*BT = Bloom’s Taxonomy; C = Cognitive domain, P = Psychomotor domain, A = Affective domain
Course Contents:
Topics to be covered include:
• Basic statistical ideas: populations, distributions, samples, and random samples.
• Classification models and methods, including linear discriminant analysis, trees, random forests, neural nets, boosting and bagging approaches, and support vector machines.
• Linear regression approaches to classification, compared with linear discriminant analysis.
• The training/test approach to assessing accuracy, and cross-validation.
• Strategies for the common situation in which the source and target populations differ, typically in time but also in other respects.
• Unsupervised models: k-means, association rules, hierarchical clustering, and model-based clusters.
• Low-dimensional views of classification results: distance methods and ordination.
• Strategies for working with large data sets.
• Practical approaches to classification with real-life data sets, using different methods to gain different insights into presentation.
• Privacy and security.
• Use of the R system for handling the calculations.
Course Objective:
The main focus of the course will be supervised learning, primarily for classification. The
emphasis will be on practical applications of the methodologies that are described, with the R
system used for the computations. Attention will be given to:
1) Generalizability and predictive accuracy, in the practical contexts in which methods are
applied.
2) Low-dimensional visual representation of results, as an aid to diagnosis and insight.
3) Interpretability of model parameters, including potential for misinterpretation.

Teaching-Learning Strategies:
Lectures, written assignments, practical labs, semester project, presentations.
Course Assessment:
The course will be assessed through a combination of written examinations, assignments, and quizzes.
Week   Contents (Theory)
1 M1: Introduction: Machine Learning and Data Mining
• Data Flood
• Data Mining Application Examples
• Data Mining and Knowledge Discovery
• Data Mining Tasks
2 M2: Machine Learning and Classification
• Machine Learning and Classification
• Examples
• Learning as Search
• Bias
• Weka
3 M3: Input: Concepts, instances, attributes
• What is a concept?
• What is an example?
• What is an attribute?
• Preparing the data
4 M4: Output: Knowledge Representation
• Decision tables
• Decision trees
• Decision rules
• Rules involving relations
• Instance-based representation
5 M5: Classification: Basic Methods
• OneR
• NaiveBayes
6 M6: Classification: Decision Trees
• Top-Down Decision Trees
• Choosing the Splitting Attribute
• Information Gain and Gain Ratio
7 M7: Classification: C4.5
• Handling Numeric Attributes
  o Finding Best Split
• Dealing with Missing Values
• Pruning
  o Pre-pruning, Post-pruning, Estimating Error Rates
• From Trees to Rules
8 M8: Classification: CART
• CART Overview and Gymtutor Tutorial Example
• Splitting Criteria
• Handling Missing Values
• Pruning
  o Finding Optimal Tree
Midterm Exam
9 M9: Classification: More Methods
• Rules
• Regression
• Instance-based (Nearest Neighbor)
10 M10: Evaluation and Credibility
• Introduction
• Classification with Train, Test, and Validation sets
  o Handling Unbalanced Data; Parameter Tuning
• *Predicting Performance
• Evaluation on "small data": Cross-validation
• *Bootstrap
• Comparing Data Mining Schemes
• *Choosing a Loss Function
11 M11: Evaluation: Lift and Costs
• Lift and Gains charts
• *ROC
• Cost-sensitive learning
• Evaluating numeric predictions
• MDL principle and Occam's razor
12 M12: Data Preparation for Knowledge Discovery
• Data understanding
• Data cleaning
• Data transformation
• Discretization
• False "predictors" (information leakers)
• Feature reduction, leaker detection
• Randomization
• Learning with unbalanced data
13 M13: Clustering
• Introduction
• K-means
• Hierarchical
14 M14: Associations
• Transactions
• Frequent itemsets
• Association rules
• Applications
15 M15: Visualization
• Graphical excellence and lie factor
• Representing data in 1-D, 2-D, and 3-D
• Representing data in 4+ dimensions
  o Parallel coordinates
  o Scatterplots
  o Stick figures
16 M19: Data Mining and Society; Future Directions
• Data Mining and Society: Ethics, Privacy, and Security issues
• Future Directions for Data Mining
  o Web mining, text mining, multimedia data
• Course Summary
Final Exam