Tutorial 6

Uploaded by

Low Jia Hui

0% found this document useful (0 votes)

3 views3 pages

Original Title

Tutorial6

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

3 views3 pages

Tutorial 6

Uploaded by

Low Jia Hui

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 3

Search inside document

Tutorial 6

DSA1101
Introduction to Data Science
October 12, 2018

Exercise 1. n-fold cross validation for decision trees

Recall that we studied n-fold cross-validation for the k-nearest neighbor
classifier, in which the value of k is varied to control the complexity of the
decision surface for the classifier. For decision tree classification, a similar
complexity parameter exists, which is denoted as Cp . Heuristically, smaller
values of Cp correspond to decision trees of larger sizes, and hence more
complex decision surfaces. For this week’s tutorial, we will investigate n-fold
cross validation for a decision tree classifier.

(a) Consider the dataset ‘bank-sample.csv’ we discussed in the lectures. For

this exercise, we will fit a decision tree with subscribed as outcome
and job, marital, education, default. housing, loan, contact and
poutcome as feature variables. We want to find the best Cp value in
terms of misclassification error rate.
1. Randomly split the entire dataset into 10 mutually exclusive datasets
2. Let Cp take on the values 10k for k = −5, −4, −3, ..., 0, ..., 3, 4, 5.
3. At each Cp value, run the following loop for j = 1, 2, ..., 10 :
(a) Set the j th group to be the test set
(b) Fit a decision tree on the other 9 sets with the value of Cp
(c) Predict the class assignment of subscribed for each observation
of the test set
(d) Calculate the number of misclassification(s) by comparing pre-
dicted versus actual class labels in the test set
4. Determine the best Cp value in terms of misclassification error rate.

1
1 library ( " rpart " )
2 library ( " rpart . plot " )
3
4
5 # CV for decision tree
6
7 banktrain <- read . table ( " bank - sample . csv " , header = TRUE , sep = " ," )
8
9 # # drop a few columns to simplify the tree
10 drops <-c ( " age " , " balance " , " day " , " campaign " ,
11 " pdays " , " previous " , " month " , " duration " )
12 banktrain <- banktrain [ , ! ( names ( banktrain ) % in % drops ) ]
13
14 # # total records in dataset
15 n = dim ( banktrain ) [1]
16
17 # # Randomly split into 10 datasets
18 # # We have seen this code before
19 n _ folds =10
20 folds _ j <- sample ( rep (1: n _ folds , length . out = n ) )
21 table ( folds _ j )
22
23 cp =10^( -5:5)
24 misC = rep (0 , length ( cp ) )
25
26 for ( i in 1: length ( cp ) ) {
27
28 misclass =0
29 for ( j in 1: n _ folds ) {
30 test <- which ( folds _ j == j )
31 train = banktrain [ - c ( test ) ,]
32 fit <- rpart ( subscribed ~ job + marital +
33 education + default + housing +
34 loan + contact + poutcome ,
35 method = " class " ,
36 data = train ,
37 control = rpart . control ( cp = cp [ i ]) ,
38 parms = list ( split = ’ information ’) )
39
40 new . data = data . frame ( banktrain [ test , c (1:8) ])
41 # # predict label for test data based on fitted tree
42 prd = predict ( fit , new . data , type = ’ class ’)
43 misclass = misclass + sum ( prd ! = banktrain [ test ,9])
44 }
45 misC [ i ]= misclass / n
46 }
47

48 plot ( log ( cp , base =10) , misC , type = ’b ’)

2
(b) Plot the decision tree fitted with the best Cp value in terms of misclas-
sification rate
1 # # determine the best cp in terms of
2 # # miscl assifi cation rate
3
4 best . cp = cp [ which ( misC == min ( misC ) ) ]
5
6 # # Fit decision tree
7 fit <- rpart ( subscribed ~ job + marital +
8 education + default + housing +
9 loan + contact + poutcome ,
10 method = " class " ,
11 data = train ,
12 control = rpart . control ( cp = best . cp ) ,
13 parms = list ( split = ’ information ’) )
14
15 # # Plot the tree
16 rpart . plot ( fit , type =4 , extra =2)

Q 2
Document3 pages
Q 2
Mohit Jain
No ratings yet
Codes
Document14 pages
Codes
Arvind NANDAN SINGH
No ratings yet
A Brief Overview On Data Mining Survey PDF
Document8 pages
A Brief Overview On Data Mining Survey PDF
Ned Thai
No ratings yet
Spam Email Classification 3
Document8 pages
Spam Email Classification 3
Austin Kinion
No ratings yet
Final Project
Document9 pages
Final Project
Guillermo Aguilar
No ratings yet
R-Unit 5
Document76 pages
R-Unit 5
Lakshmi Ganesh
No ratings yet
21bme145 Exp 5
Document3 pages
21bme145 Exp 5
21bme145
No ratings yet
LabNote 3
Document3 pages
LabNote 3
CHAI TZE ANN
No ratings yet
21bme145 Exp 4
Document3 pages
21bme145 Exp 4
21bme145
No ratings yet
Exercise 02 RadonovIvan 5967988
Document1 page
Exercise 02 RadonovIvan 5967988
Erika
No ratings yet
Solution First Point ML-HW4
Document6 pages
Solution First Point ML-HW4
Juan Sebastian Otálora Montenegro
100% (1)
Grid Search For SVM
Document9 pages
Grid Search For SVM
kPrasad8
No ratings yet
21BEC505 Exp2
Document7 pages
21BEC505 Exp2
jay
No ratings yet
Tutorial 6
Document8 pages
Tutorial 6
POEASO
No ratings yet
Import Pandas As PD DF PD - Read - CSV ("Titanic - Train - CSV") DF - Head
Document20 pages
Import Pandas As PD DF PD - Read - CSV ("Titanic - Train - CSV") DF - Head
Saloni Tuli
No ratings yet
Vid 4
Document6 pages
Vid 4
diyalap01
No ratings yet
Code
Document25 pages
Code
Classy Man
No ratings yet
K Means Clustering in R Example - Learn by Marketing
Document3 pages
K Means Clustering in R Example - Learn by Marketing
Ari Clecius
No ratings yet
Machine Learning Lab
Document13 pages
Machine Learning Lab
Kumaraswamy Annam
No ratings yet
Rstudio Study Notes For PA 20181126
Document6 pages
Rstudio Study Notes For PA 20181126
Trong Nghia Vu
No ratings yet
PGM 7
Document3 pages
PGM 7
badeni
No ratings yet
Question 2.2
Document2 pages
Question 2.2
Md Shameem
No ratings yet
sectionSVM PDF
Document10 pages
sectionSVM PDF
Arif Rahman Hakim
No ratings yet
PCA Explained
Document9 pages
PCA Explained
Raj kumar
No ratings yet
Home Work
Document12 pages
Home Work
sandeepssn47
No ratings yet
LAB-4 Report
Document21 pages
LAB-4 Report
jithentar.cs21
No ratings yet
PCR and Pls Regression
Document5 pages
PCR and Pls Regression
api-285777244
No ratings yet
Subject: ML Name: Priyanshu Gandhi Date: 10/4/21 Expt. No.: 9 Roll No.: C008 Title: Clustering Implementation in Python
Document7 pages
Subject: ML Name: Priyanshu Gandhi Date: 10/4/21 Expt. No.: 9 Roll No.: C008 Title: Clustering Implementation in Python
Kartik Katekar
No ratings yet
KRAI Practical
Document14 pages
KRAI Practical
Contact Vishal
No ratings yet
Kelbie Davidson (44817015) COMP4702 - Assignment 2
Document8 pages
Kelbie Davidson (44817015) COMP4702 - Assignment 2
Kelbie Davidson
No ratings yet
ML - Practical File
Document15 pages
ML - Practical File
Jatin Mathur
No ratings yet
Programming With R Test 2
Document5 pages
Programming With R Test 2
KamranKhan
50% (2)
Python For DS Cheat Sheet
Document6 pages
Python For DS Cheat Sheet
Sebastián Emdef
100% (2)
ML Lab
Document7 pages
ML Lab
Sharan Patil
No ratings yet
Multicore Programming in Haskell
Document55 pages
Multicore Programming in Haskell
Don Stewart
No ratings yet
ConvexOpt Assessment 2 Sols
Document11 pages
ConvexOpt Assessment 2 Sols
Vaibhav Dabas
No ratings yet
Module - 4 (R Training) - Basic Stats & Modeling
Document15 pages
Module - 4 (R Training) - Basic Stats & Modeling
RohitGahlan
No ratings yet
Introduction To Machine Learning: K-Nearest Neighbors: Zhongheng Zhang
Document7 pages
Introduction To Machine Learning: K-Nearest Neighbors: Zhongheng Zhang
Ethan Manani
No ratings yet
1.1 Loading The Data: Survival by Sex
Document6 pages
1.1 Loading The Data: Survival by Sex
k767
No ratings yet
"Cps - TXT" "Education" "South" "SEX" "Experience" "Union" "WAGE" "AGE" "RACE" "Occupat Ion" "Sector" "MARR"
Document9 pages
"Cps - TXT" "Education" "South" "SEX" "Experience" "Union" "WAGE" "AGE" "RACE" "Occupat Ion" "Sector" "MARR"
Alper Tamay Arslan
No ratings yet
Python For Data Science PDF
Document30 pages
Python For Data Science PDF
pdrpatnaik
100% (5)
Lecture 4
Document13 pages
Lecture 4
立哲
No ratings yet
Pattern Recognition
Document26 pages
Pattern Recognition
Aryan Attri
No ratings yet
Assignment 4 On Clustering Techniques
Document2 pages
Assignment 4 On Clustering Techniques
06–Yash Bhusal
No ratings yet
Exp 3
Document4 pages
Exp 3
jay
No ratings yet
Statistical Learning in R
Document31 pages
Statistical Learning in R
Angela Ivanova
No ratings yet
(Unit 4-5) R 2marks
Document6 pages
(Unit 4-5) R 2marks
srimathie.21aim
No ratings yet
Machinelearning - Alisya Athirah Binti Mohd Huzzainny (Updated)
Document26 pages
Machinelearning - Alisya Athirah Binti Mohd Huzzainny (Updated)
Alisya Athirah
No ratings yet
The Bmamevt Package: Bayesian Model Averaging at Work For Multivariate Extremes
Document5 pages
The Bmamevt Package: Bayesian Model Averaging at Work For Multivariate Extremes
ajquinonesp
No ratings yet
K-Nearest Neighbor On Python Ken Ocuma
Document9 pages
K-Nearest Neighbor On Python Ken Ocuma
Aliyha Dionio
100% (2)
Ww2 Coastal Edu Kingw Statistics R Tutorials Simplelinear HT
Document15 pages
Ww2 Coastal Edu Kingw Statistics R Tutorials Simplelinear HT
Nandkishore Srinivasan
No ratings yet
Lecture Material 12
Document9 pages
Lecture Material 12
Ali Naseer
No ratings yet
Exp 4
Document10 pages
Exp 4
jay
No ratings yet
ML Practical Journal With Writeups
Document46 pages
ML Practical Journal With Writeups
sakshi mishra
No ratings yet
Ip pb1 QP Ms Agra Set A
Document17 pages
Ip pb1 QP Ms Agra Set A
Mayank Mishra
No ratings yet
Experiment No 2 ML
Document3 pages
Experiment No 2 ML
ANIKET THAKUR [UCOE- 4388]
No ratings yet
Supervidsed Algorithm
Document19 pages
Supervidsed Algorithm
Mittu Rajareddy
No ratings yet
Aggialavura - Python Linear Regression Model
Document1 page
Aggialavura - Python Linear Regression Model
himtajay
No ratings yet
GATE Syllabus For Cs
Document11 pages
GATE Syllabus For Cs
Kalpana Uma
No ratings yet
K Means
Document329 pages
K Means
yousef shaban
100% (1)
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
Alnafrah, Zeno - 2019 - Innovation and Development A New Comparative Model For National Innovation Systems Based On Machine Learning Cla-Annotated
Document23 pages
Alnafrah, Zeno - 2019 - Innovation and Development A New Comparative Model For National Innovation Systems Based On Machine Learning Cla-Annotated
rleonardo_ochoa6263
No ratings yet
IV-cse DM Viva Questions
Document10 pages
IV-cse DM Viva Questions
Imtiyaz Ali
No ratings yet
Bank Note Authentication
Document13 pages
Bank Note Authentication
Ankit Goyal
No ratings yet
Classification
Document2 pages
Classification
Raja Velu
No ratings yet
It Works As Follows:: Decision Tree ?
Document3 pages
It Works As Follows:: Decision Tree ?
cse VBIT
No ratings yet
7-Decision Trees Learning
Document51 pages
7-Decision Trees Learning
yshu1
No ratings yet
Business Intelligence cw2
Document4 pages
Business Intelligence cw2
usman
No ratings yet
Decision Tree Classification of Spatial Data Streams Using Peano Count Trees
Document5 pages
Decision Tree Classification of Spatial Data Streams Using Peano Count Trees
nobeen666
No ratings yet
SM Binning
Document12 pages
SM Binning
Merry Purba
No ratings yet
Machine Learning For Optimal Electrode Wettability in Lithium Ion Batteries
Document12 pages
Machine Learning For Optimal Electrode Wettability in Lithium Ion Batteries
daaanu
No ratings yet
12395-Article (PDF) - 25776-2-10-20210118
Document39 pages
12395-Article (PDF) - 25776-2-10-20210118
Florin Muntean
No ratings yet
Chapter 9
Document22 pages
Chapter 9
abishek
No ratings yet
Data Mining - Classification: Alternative Techniques
Document120 pages
Data Mining - Classification: Alternative Techniques
Tran Duy Quang
100% (1)
Crop Price Prediction Using Machine Learning
Document5 pages
Crop Price Prediction Using Machine Learning
Tejas hv
No ratings yet
Saspdf Veryimp
Document163 pages
Saspdf Veryimp
Nishit Goyal
No ratings yet
RUS Boost Tree Ensemble Classifiers For OD
Document7 pages
RUS Boost Tree Ensemble Classifiers For OD
Esma Özel
No ratings yet
Ai Unit5 Learning
Document62 pages
Ai Unit5 Learning
1DA19CS111 Prajath
No ratings yet
Part 2
Document165 pages
Part 2
msuresh79
No ratings yet
ML Practical File
Document43 pages
ML Practical File
Pankaj Singh
100% (1)
UNIT - 2 .DataScience 04.09.18
Document53 pages
UNIT - 2 .DataScience 04.09.18
Raghavendra Rao
No ratings yet
Prediction of Diseases in Smart Health Care System Using Machine Learning
Document5 pages
Prediction of Diseases in Smart Health Care System Using Machine Learning
Nadeem Jamadar
No ratings yet
Tutorial Rapid Miner Life Insurance Promotion 1 PDF
Document11 pages
Tutorial Rapid Miner Life Insurance Promotion 1 PDF
IamSajid Jatoi
No ratings yet
Decision Trees 2 PDF
Document39 pages
Decision Trees 2 PDF
sushanth
No ratings yet
Score Prediction and Analysis in Cricket
Document9 pages
Score Prediction and Analysis in Cricket
Nishant shukla
No ratings yet
ML Unit 3
Document40 pages
ML Unit 3
Tharun Kumar
No ratings yet
Representation of Decision Problems With Influence Diagrams
Document17 pages
Representation of Decision Problems With Influence Diagrams
Neslihan Ak
No ratings yet
CS583 Unsupervised Learning
Document95 pages
CS583 Unsupervised Learning
Jrey Kumalah
No ratings yet
Data Mining: Learning Objectives For Chapter 5
Document22 pages
Data Mining: Learning Objectives For Chapter 5
Captain Ayub
No ratings yet
Ivy - Data Science and Data Visualization Certification Course
Document10 pages
Ivy - Data Science and Data Visualization Certification Course
Rishi Mukherjee
100% (1)