INSY 5339 - Data Mining Exam #2 Review

• Know the following methods of estimation: holdout, random subsampling, cross-validation, and
stratified sampling (see the lecture notes on Model Evaluation).
• Know what the following methods for treating the class imbalance problem do: over-sampling,
under-sampling, and SMOTE.
• Know the different metrics for performance evaluation: accuracy, cost, precision, recall,
F-measure, etc. (see the lecture notes on Model Evaluation).
• Know the following data preparation methods: sampling and discretization (equal-frequency vs.
equal-width binning).
• Know the following data representation formats: decision tables, decision trees, classification
rules, and regression equations.
• Know methods for estimating generalization errors: optimistic vs. pessimistic estimates (see the
example in the lecture notes on Decision Trees).
• Know the following data mining methods: ZeroR, OneR (see the example in the lecture notes on
Basic Classification Methods), Prism (see the example in lecture 7 on Classification Methods II),
Instance-based learning (k-NN; see the sample 1-NN problem in the separate attachment), Decision
Tables, Linear Regression, Association Rules (you should be able to find the association rules
that meet a given confidence level; see the example in the lecture notes on Intro to Machine
Learning), and Decision Trees (using either the entropy measure or the Gini index; see the sample
DTL problem in the separate attachment).
• Know how to construct the ROC curve given the posterior probabilities of predicting the class
(see the example in the lecture notes on Model Evaluation).
