
CRISP-DM Framework

Data Mining Tasks:


Description
Estimation
Prediction
Classification
Clustering
Association

TOPICS
Business Objectives and Data Mining Problem
Data Preprocessing
Exploratory Data Analysis
Simple Linear Regression
Multiple Linear Regression
Logistic Regression
Model Evaluation Techniques
Integration of Predictive and Prescriptive Analytics
Decision Trees
Ensemble Methods: Bagging and Boosting
Model Development

Explore the data
    Derive and analyze descriptive statistics
    Conduct exploratory data analysis
Define the functional form of the relationship
Estimate regression parameters
Perform diagnostic tests
Does the model satisfy the diagnostic tests?
    NO: go back and revise the model
    YES: validate the model(s), then STOP
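
The model development loop above can be sketched in R roughly as follows. This is only an illustration: the built-in mtcars data, the predictors wt and hp, and the 8-row hold-out set are assumptions chosen for the example, not part of the course material.

```r
# Illustrative walk-through of the workflow on R's built-in mtcars data.
# Dataset, predictors, and hold-out size are assumptions for demonstration.
data(mtcars)

# Explore the data: descriptive statistics and a quick scatter plot
summary(mtcars[, c("mpg", "wt", "hp")])
cor(mtcars$mpg, mtcars$wt)
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Define the functional form (linear in wt and hp) and estimate the parameters
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

# Diagnostic plots: residuals vs fitted, Q-Q, scale-location, leverage
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Validate: refit on a training subset and measure error on held-out rows
set.seed(1)
test_idx  <- sample(nrow(mtcars), 8)
fit_train <- lm(mpg ~ wt + hp, data = mtcars[-test_idx, ])
pred      <- predict(fit_train, newdata = mtcars[test_idx, ])
test_rmse <- sqrt(mean((mtcars$mpg[test_idx] - pred)^2))
test_rmse
```

If the diagnostic plots show clear patterns in the residuals, the usual next move is to go back and change the functional form (for example, transform a variable) before validating.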
R

Data structures      class(), indexing with [ ], c(), list(), factor(), matrix(), array(),
                     data.frame(), length(), levels(), dim(), str(),
                     head(), tail(), names(), rownames(), colnames(), cbind(),
                     rbind(), nrow(), ncol()
Operators            +, -, *
                     |, & (element-wise); &&, || (single logical values)
                     >, <, >=, ==, !=, <=
                     : and rep()
Control statements   for loop, if / else, ifelse()
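
A short sketch of how a few of these pieces fit together, using made-up values. The variable names (x, f, df, big, small) are hypothetical and chosen only for illustration.

```r
# Vectors, factors, and data frames
x  <- c(3, 8, 15)
f  <- factor(c("low", "high", "low"))
df <- data.frame(x = x, grp = f)
class(df)      # "data.frame"
dim(df)        # 3 rows, 2 columns
levels(f)      # factor levels are sorted alphabetically

# Element-wise operators return one result per element
big   <- x > 5        # FALSE TRUE TRUE
small <- x < 10       # TRUE  TRUE FALSE
both  <- big & small  # FALSE TRUE FALSE

# && and || work on single logical values, as in if () conditions
if (length(x) == 3 && all(x > 0)) {
  msg <- "three positive values"
} else {
  msg <- "something else"
}

# ifelse() is the vectorised counterpart of if / else
label <- ifelse(x > 5, "big", "small")

# for loop over the indices of x
total <- 0
for (i in seq_along(x)) {
  total <- total + x[i]
}
```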

Graphical output         plot(), hist(), barplot(), par(mfrow = c(2, 2))
Mathematical functions   sqrt(), log(), min(), max()
Statistical functions    summary(), mean(), sd(), quantile(), cor()
Data generation          sample(), set.seed()
Missing values           na.strings = c("", ""), na.omit(df), is.na(), complete.cases(df)
Data input               read.csv() with strip.white, stringsAsFactors, header;
                         read_excel() from the readxl package
Duplicates               anyDuplicated(df), duplicated(df)
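
The missing-value and duplicate checks listed above might be used as in the sketch below. The small data frame and the inline CSV text are invented for the example, and the particular na.strings value c("", "NA") is an assumption, not a setting taken from the slides.

```r
# A small made-up data frame with one missing value and one repeated row
df <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  score = c(10, 8, 8, NA, 5)
)

is.na(df$score)              # TRUE only for the fourth row
complete.cases(df)           # FALSE only for the fourth row
clean <- na.omit(df)         # drops the row containing NA
mean(df$score, na.rm = TRUE) # ignore NA when summarising

duplicated(df)               # TRUE for the second copy of a repeated row
anyDuplicated(df)            # index of the first duplicated row, 0 if none
deduped <- df[!duplicated(df), ]

# Reading data: treat empty fields as missing (na.strings value is an assumption)
raw <- read.csv(
  text = "a,b\n1,x\n2,\n",
  header = TRUE,
  strip.white = TRUE,
  stringsAsFactors = FALSE,
  na.strings = c("", "NA")
)
```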

Models, with their Commands, Problems/Case Studies, and Concepts

Linear Regression Models
    Commands: lm(), predict(), plot(), summary(), vif(), step()
    Problems/Case Studies: Baseball Data, LaQunita Case Study, Package Pricing Case Study
    Concepts: Model Building, Assumptions Check, Transformations, Influential Point Analysis

Model Validation
    Commands: set.seed(), cv.glm()
    Problems/Case Studies: Mileage Data
    Concepts: K-fold CV, Test Error, Training Error

Logistic Regression
    Commands: sample.split(), glm(), plotROC(), optimalCutoff()
    Problems/Case Studies: Pima Data (Diabetes Example), Marketing Head Conundrum
    Concepts: Classification Table, AUROC, Optimal Cut-off, Integration of Predictive and Prescriptive Analytics

Decision Tree
    Commands: rpart(), rpart.plot(), prune()
    Problems/Case Studies: Heart Data Set, QWE Case Study
    Concepts: Gini Index, Termination Criterion, Overfitting, Tree Pruning

Ensemble Method
    Commands: randomForest(), boosting()
    Problems/Case Studies: Heart Data Set, QWE Case Study
    Concepts: Bagging, Boosting
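
To make the K-fold CV and training/test error concepts concrete, here is a minimal hand-written sketch in base R. It does not use cv.glm() (from the boot package), which automates the same idea; the mtcars data, the model mpg ~ wt + hp, and k = 5 are assumptions made for the example rather than the course's Mileage Data.

```r
# Manual 5-fold cross-validation for a linear model (illustrative only)
data(mtcars)
set.seed(42)

k <- 5
# Assign each row to one of k folds at random, with near-equal fold sizes
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

fold_mse <- numeric(k)
for (j in 1:k) {
  train <- mtcars[folds != j, ]   # fit on the other k - 1 folds
  test  <- mtcars[folds == j, ]   # evaluate on the held-out fold
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  fold_mse[j] <- mean((test$mpg - pred)^2)
}

# Cross-validated estimate of test error: average of the per-fold errors
cv_error <- mean(fold_mse)

# Training error: the same model fit and evaluated on all rows
train_error <- mean(residuals(lm(mpg ~ wt + hp, data = mtcars))^2)

cv_error
train_error
```

Training error is computed on the same rows the model was fit to, so it tends to understate the error on new data; the cross-validated figure is the more honest estimate of test error.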
