INSY 5339 - Data Mining Exam #2 Review
Exam #2 Review
Know the following methods of estimation: holdout, random subsampling, cross-validation, and
stratified sampling (see the lecture notes on Model Evaluation).
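As a study aid, the estimation methods above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names, the default test fraction of 1/3, and the default of 10 folds are assumptions made for the example. Random subsampling is simply the holdout split repeated with different seeds, with the error estimates averaged.

```python
import random
from collections import defaultdict

def holdout_split(n, test_fraction=1/3, seed=0):
    """Holdout: shuffle the record indices once and reserve a fraction for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)

def k_fold_splits(n, k=10, seed=0):
    """Cross-validation: each record is used for testing exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]

def stratified_holdout(labels, test_fraction=1/3, seed=0):
    """Stratified sampling: split each class separately so that the
    class proportions are roughly preserved in both partitions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = int(round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

For example, with six records of class "a" and three of class "b", stratified_holdout places two "a" records and one "b" record in the test set, whereas a plain holdout split could by chance leave "b" out of the test set entirely.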
Know what the following methods for treating the class imbalance problem do: over-sampling,
under-sampling, and SMOTE.
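The three techniques can be contrasted with a rough sketch. The code below assumes numeric features stored as tuples, a majority class at least as large as the minority class, and at least two minority examples; the function names and parameters are hypothetical, and the SMOTE function is a simplified version of the idea (interpolating between a minority example and one of its k nearest minority neighbours), not a reference implementation.

```python
import random

def random_over_sample(minority, majority, rng):
    """Over-sampling: duplicate randomly chosen minority examples
    until both classes have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority

def random_under_sample(minority, majority, rng):
    """Under-sampling: keep only a random subset of the majority class."""
    return minority, rng.sample(majority, len(minority))

def smote(minority, n_new, k, rng):
    """SMOTE (simplified): create synthetic minority examples on the line
    segment between a minority example and one of its k nearest minority
    neighbours, instead of duplicating existing examples."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

The key contrast: over-sampling repeats existing minority records, under-sampling discards majority records, and SMOTE adds new minority records that did not exist in the data.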
Know the different metrics for performance evaluation: accuracy, cost, precision, recall,
F-measure, etc. (see the lecture notes on Model Evaluation).
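These metrics all follow from the four cells of a two-class confusion matrix. The sketch below is one way to compute them; the function names are illustrative, and the cost calculation assumes a simple cost matrix in which correct predictions cost nothing and each false positive or false negative has a fixed cost (the lecture notes may use a more general matrix).

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, TN for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn

def evaluate(tp, fp, fn, tn, cost_fp=1.0, cost_fn=1.0):
    """Compute the standard metrics from the confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,   # of the predicted positives, how many are right
        "recall": recall,         # of the actual positives, how many were found
        "f_measure": f_measure,   # harmonic mean of precision and recall
        "cost": fp * cost_fp + fn * cost_fn,  # assumed: correct predictions cost 0
    }
```

With 2 true positives, 1 false positive, 1 false negative, and 6 true negatives, accuracy is 0.8 while precision, recall, and F-measure are each 2/3, which shows how accuracy can look better than the positive-class metrics on imbalanced data.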
Know the following data preparation methods: sampling and discretization (equal frequency vs.
equal width).
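The difference between the two discretization schemes is easiest to see in code. The sketch below assumes a list of numeric values that are not all identical; the function names are illustrative, and the equal-frequency version is a naive rank-based one that may split tied values across neighbouring bins.

```python
def equal_width_bins(values, k):
    """Equal width: divide the range [min, max] into k intervals of the
    same width; bins may end up holding very different numbers of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Equal frequency: sort the values and give each bin (roughly) the
    same number of values; bin widths may differ a lot."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins
```

On the made-up values [0, 4, 12, 16, 16, 18, 24, 26, 28] with k = 3, equal width gives bins of sizes 2, 4, and 3, while equal frequency gives three bins of size 3.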
Know the following data representation formats: decision tables, decision trees, classification
rules, and regression equations.
Know the methods for estimating generalization errors: optimistic vs. pessimistic (see the example
in the lecture notes on Decision Trees).
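The two estimates can be written out as follows. This sketch assumes the common textbook formulation, in which the optimistic estimate equals the training error rate and the pessimistic estimate adds a fixed penalty (often 0.5) for every leaf node; the lecture notes may define the penalty differently, so treat the default value as an assumption.

```python
def optimistic_error(training_errors, n_records):
    """Optimistic estimate: assume the training error rate is
    representative of the generalization error."""
    return training_errors / n_records

def pessimistic_error(training_errors, n_leaves, n_records, penalty=0.5):
    """Pessimistic estimate: add a complexity penalty per leaf node
    (assumed default of 0.5) to the training error count."""
    return (training_errors + penalty * n_leaves) / n_records
```

For a tree with 30 leaves that misclassifies 10 of 1000 training records, the optimistic estimate is 10/1000 = 1%, and the pessimistic estimate is (10 + 0.5 × 30)/1000 = 2.5%. The pessimistic estimate favours smaller trees when two trees have similar training error.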
Know the following data mining methods: ZeroR; OneR (see the example in the lecture notes on
Basic Classification Methods); Prism (see the example in lecture 7 on Classification Methods II);
instance-based learning (k-NN; see the sample 1-NN problem provided as a separate attachment);
Decision Tables; Linear Regression; Association Rules (you should be able to find the association
rules that meet a given confidence level; see the example in the lecture notes on Intro to
Machine Learning); and Decision Trees (using either the entropy measure or the Gini index; see
the sample DTL problem provided as a separate attachment).
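Several of the methods above reduce to small calculations that are worth practising by hand. The sketch below covers ZeroR, OneR, 1-NN, rule confidence, and the entropy and Gini impurity measures used to compare decision tree splits; Prism, decision tables, and linear regression are not covered. All function names and data layouts (records as dictionaries, transactions as sets) are assumptions made for illustration, not the lecture's notation.

```python
from collections import Counter, defaultdict
from math import log2

def zero_r(labels):
    """ZeroR: always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR: for each attribute, predict the majority class per value and
    keep the attribute whose rules make the fewest training errors."""
    best = None
    for attr in rows[0]:
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[attr]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best  # (attribute, value -> class rules, training errors)

def one_nn(train_points, train_labels, query):
    """1-NN: predict the class of the closest training point
    (squared Euclidean distance on numeric features)."""
    dists = [sum((a - b) ** 2 for a, b in zip(p, query)) for p in train_points]
    return train_labels[dists.index(min(dists))]

def rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: among transactions
    containing the antecedent, the fraction that also contain the consequent."""
    covers = [t for t in transactions if antecedent <= t]
    return sum(1 for t in covers if consequent <= t) / len(covers)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(rows, labels, attr, measure):
    """Impurity after splitting on attr: the size-weighted average of the
    child nodes' impurity. The best split minimises this value."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    return sum(len(g) / len(labels) * measure(g) for g in groups.values())
```

To choose a decision tree split, compute weighted_impurity for every candidate attribute with either entropy or gini as the measure and pick the attribute with the lowest result; the information gain is the parent's entropy minus that weighted value.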
Know how to construct the ROC curve given the posterior probabilities predicted for the class
(see the example in the lecture notes on Model Evaluation).
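The construction can be sketched as a threshold sweep: sort the records by the predicted probability of the positive class, lower the threshold one distinct score at a time, and record the false positive rate and true positive rate at each step. The function names below are illustrative, and the sketch assumes the data contains at least one positive and one negative record.

```python
def roc_points(scores, labels, positive=1):
    """Return the (FPR, TPR) points of the ROC curve, starting at (0, 0)
    and ending at (1, 1), by sweeping the threshold from high to low."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Records with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

For the made-up scores [0.9, 0.8, 0.7, 0.6] with true classes [1, 0, 1, 0], the curve passes through (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), and (1, 1), giving an area of 0.75.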