Question 2.2
1. Using the support vector machine function ksvm contained in the R package kernlab, find a
good classifier for this data. Show the equation of your classifier, and how well it classifies the
data points in the full data set. (Don’t worry about test/validation data yet; we’ll cover that
topic soon.)
To find a good classifier for this data, we must first explore some values of the margin
parameter (C in ksvm). Sweeping C from 10 to 2000 in increments of 100, we find as follows.
Reviewing the two charts, the highest prediction accuracy appears to lie between C = 10 and C = 200.
Sweeping C from 10 to 200 in increments of 5, we find as follows.
Based on these findings, the initially provided value of C = 100 can be used for ksvm with the
vanilladot kernel, as there is no significant deviation in prediction accuracy between 10 and
200.
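The sweep described above, and the classifier equation the question asks for, can be sketched roughly as follows. This is a minimal illustration rather than the code actually used for the charts: the data here is synthetic (the assignment's data set is not reproduced in this write-up), and the helper names fit_svm and accuracy_pct are hypothetical. The ksvm arguments (type = "C-svc", kernel = "vanilladot", scaled = TRUE) are assumed from the question's wording.

```r
library(kernlab)

# Synthetic stand-in data (an assumption): two predictors, binary 0/1 response.
set.seed(42)
x <- matrix(rnorm(400), ncol = 2, dimnames = list(NULL, c("A1", "A2")))
y <- as.numeric(x[, "A1"] + 0.5 * x[, "A2"] + rnorm(200, sd = 0.3) > 0)

# Fit a linear C-SVM with the given C (helper name is hypothetical).
fit_svm <- function(x, y, C) {
  ksvm(x, as.factor(y), type = "C-svc", kernel = "vanilladot",
       C = C, scaled = TRUE)
}

# Percentage of points in (x, y) that the fitted model classifies correctly.
accuracy_pct <- function(model, x, y) {
  mean(as.character(predict(model, x)) == as.character(y)) * 100
}

# Coarse sweep over C, then a finer sweep over the lower range.
coarse_C <- seq(10, 2000, by = 100)
fine_C   <- seq(10, 200, by = 5)
coarse_acc <- sapply(coarse_C, function(C) accuracy_pct(fit_svm(x, y, C), x, y))
fine_acc   <- sapply(fine_C,   function(C) accuracy_pct(fit_svm(x, y, C), x, y))

# Classifier equation at C = 100: a1*x1 + ... + am*xm + a0 = 0,
# expressed in terms of the scaled predictors because scaled = TRUE.
model <- fit_svm(x, y, 100)
a  <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a0 <- -model@b
print(a)
print(a0)
```

Because the accuracy is measured on the same points used for fitting, it says how well the classifier fits the full data set, not how well it would generalize.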
3. Using the k-nearest-neighbors classification function kknn contained in the R kknn package,
suggest a good value of k, and show how well it classifies the data points in the full data set.
Don’t forget to scale the data (scale=TRUE in kknn).
 k   prediction_vect (accuracy, %)
 1   81.49847
 2   81.49847
 3   81.49847
 4   81.49847
 5   85.16820
 6   84.55657
 7   84.70948
 8   84.86239
 9   84.70948
10   85.01529
11   85.16820
12   85.32110
13   85.16820
14   85.16820
15   85.32110
16   85.16820
17   85.16820
18   85.16820
19   85.01529
20   85.01529
21   84.86239
22   84.70948
23   84.40367
24   84.55657
25   84.55657
26   84.40367
27   84.09786
28   83.79205
29   83.94495
30   84.09786
31   83.79205
32   83.63914
33   83.48624
34   83.33333
35   83.18043
36   83.18043
37   83.18043
38   83.18043
39   83.18043
40   83.18043
41   83.18043
42   83.48624
43   83.48624
44   83.63914
45   83.94495
46   84.09786
47   83.79205
48   83.94495
49   83.94495
50   83.79205
Based on these results, I recommend k = 12: it is the smallest number of neighbors that reaches
the highest accuracy I found, 85.32% (k = 15 ties it). Choosing the smaller k keeps computation
efficient without giving up any accuracy.
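One way a per-k accuracy table like the one above could be produced is sketched here. It assumes a leave-one-out scheme, in which each point is predicted from a model fitted to all the other points so that a point is never its own neighbor. The write-up does not state which scheme was actually used, so treat that as an assumption. The data is synthetic, and the helper name loo_knn_accuracy and the column names A1, A2, R1 are hypothetical.

```r
library(kknn)

# Synthetic stand-in data (an assumption): two predictors, binary 0/1 response.
set.seed(1)
data <- data.frame(A1 = rnorm(60), A2 = rnorm(60))
data$R1 <- as.numeric(data$A1 + data$A2 + rnorm(60, sd = 0.5) > 0)

# Leave-one-out accuracy (%) for a given k (helper name is hypothetical).
loo_knn_accuracy <- function(data, response, k) {
  f <- as.formula(paste(response, "~ ."))
  pred <- vapply(seq_len(nrow(data)), function(i) {
    fit <- kknn(f, train = data[-i, ], test = data[i, , drop = FALSE],
                k = k, scale = TRUE)
    # The response is numeric, so the fitted value is a weighted average
    # of neighbor labels; round it to obtain a 0/1 class.
    round(fit$fitted.values)
  }, numeric(1))
  mean(pred == data[[response]]) * 100
}

k_values <- 1:10
acc <- sapply(k_values, function(k) loo_knn_accuracy(data, "R1", k))

# which.max returns the first maximum, i.e. the smallest k with the top accuracy.
best_k <- k_values[which.max(acc)]
print(data.frame(k = k_values, accuracy = acc))
print(best_k)
```

Taking the first maximum mirrors the selection rule used above: among the values of k that tie for the best accuracy, prefer the smallest.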