TIM209 HW1 Solutions
Problem1
(a) Compute the Euclidean distance between each observation and the test point, X1 = X2 = X3 = 0.
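The table of observations (columns X1, X2, X3, Y and distance from origin) is not reproduced in this text. As a minimal sketch of the distance step, the R code below assumes the six training observations from the textbook exercise this problem appears to be based on (ISLR, Chapter 2, Exercise 7). Treat the data values as an assumption, not as part of the original solution.

```r
# Training observations ASSUMED from ISLR Chapter 2, Exercise 7;
# they are not listed in this solution text.
train <- data.frame(
  X1 = c(0, 2, 0, 0, -1, 1),
  X2 = c(3, 0, 1, 1, 0, 1),
  X3 = c(0, 0, 3, 2, 1, 1),
  Y  = c("Red", "Red", "Red", "Green", "Green", "Red")
)

# Test point X1 = X2 = X3 = 0 (from the problem statement).
test_point <- c(X1 = 0, X2 = 0, X3 = 0)

# Euclidean distance: subtract the test point from every row,
# square the differences, sum across columns, take the square root.
diffs <- sweep(as.matrix(train[, c("X1", "X2", "X3")]), 2, test_point)
train$dist <- sqrt(rowSums(diffs^2))

print(train)
```

Because the test point is the origin, each distance reduces to sqrt(X1^2 + X2^2 + X3^2) for that row.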
(b) Our prediction with K=1 is Green: we pick the single nearest neighbor, which is Green, and assign the
test point to that class.
(c) Our prediction with K=3 is Red: we pick the 3 nearest neighbors and classify the test point as
whichever color occurs most often among them.
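The K=1 and K=3 predictions can be checked with a short majority-vote sketch. The distances and labels below are assumed from the same textbook table (ISLR, Chapter 2, Exercise 7), which is not given in this text, and `knn_vote` is a hypothetical helper written for illustration, not a library function.

```r
# Distances to the origin and class labels, ASSUMED from the
# ISLR Chapter 2, Exercise 7 table (not listed in this text).
dist  <- c(3, 2, sqrt(10), sqrt(5), sqrt(2), sqrt(3))
label <- c("Red", "Red", "Red", "Green", "Green", "Red")

# Hypothetical helper: predict by majority vote among the k nearest points.
# Ties are broken by whichever class comes first alphabetically.
knn_vote <- function(dist, label, k) {
  nearest <- order(dist)[seq_len(k)]   # indices of the k smallest distances
  votes   <- table(label[nearest])     # count each class among them
  names(votes)[which.max(votes)]       # class with the most votes
}

pred_k1 <- knn_vote(dist, label, 1)   # "Green"
pred_k3 <- knn_vote(dist, label, 3)   # "Red"
print(c(K1 = pred_k1, K3 = pred_k3))
```

With K=3 the three nearest points under these assumed values are one Green and two Red, so the vote goes to Red.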
(d) If the Bayes decision boundary in this problem is highly nonlinear, then would we expect the best value for
K to be large or small? Why?
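As K becomes larger, the KNN decision boundary becomes smoother and less flexible, while a small K lets the
boundary follow local structure. Therefore, if the Bayes decision boundary is highly nonlinear, we would
expect the best K to be small.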
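To illustrate the effect of K, here is a rough simulation sketch in base R. Everything in it is an assumption made for illustration: the data are simulated with a sine-shaped class boundary, and `make_data` and `knn_predict` are hypothetical helpers. The exact error rates depend on the seed and sample size, so only the general trend should be read from it.

```r
set.seed(1)

# Hypothetical simulated data: the true class boundary is a sine curve,
# i.e. highly nonlinear. There is no label noise.
make_data <- function(n) {
  x1 <- runif(n, 0, 2 * pi)
  x2 <- runif(n, -1.5, 1.5)
  data.frame(x1 = x1, x2 = x2, y = ifelse(x2 > sin(3 * x1), "A", "B"))
}
train <- make_data(200)
test  <- make_data(500)

# Hypothetical helper: plain KNN classifier with majority vote.
knn_predict <- function(train, test, k) {
  sapply(seq_len(nrow(test)), function(i) {
    d <- sqrt((train$x1 - test$x1[i])^2 + (train$x2 - test$x2[i])^2)
    votes <- table(train$y[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

# Test error rate for a few values of K.
ks <- c(1, 5, 25, 75)
test_error <- sapply(ks, function(k) mean(knn_predict(train, test, k) != test$y))
names(test_error) <- paste0("K=", ks)
print(test_error)
```

In a setup like this one would typically expect the large-K fits to smooth over the wiggles of the sine boundary and misclassify more test points than the small-K fits.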
Problem3
(vi) #This depends on which question you want to answer with this dataset. For example, if you want to know
#which universities have the highest % of faculty with PhDs, we can start by summarizing the PhD variable.
summary(college$PhD)
#The % of faculty with PhDs ranges from 8 to 103, with a median of 75. The 103% is suspicious, since a
#percentage cannot exceed 100. It looks like a data-integrity problem (an outlier).
subset1 <- college[college$PhD == 103, ]
nrow(subset1)
#There is only 1 such university, clearly an outlier. We can either correct this value to 100% in our
#final analysis or drop this record altogether and move on with the rest of the data.
row.names(subset1)
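The two options mentioned above (capping the value at 100% or dropping the record) could look roughly like the sketch below. It uses a small hypothetical data frame, `college_demo`, with made-up names and values as a stand-in for `college`, so it runs on its own.

```r
# Hypothetical stand-in for the `college` data frame;
# row names and values are illustrative only.
college_demo <- data.frame(
  PhD = c(70, 103, 88, 56),
  row.names = c("College A", "College B", "College C", "College D")
)

# Option 1: cap impossible percentages at 100.
capped <- college_demo
capped$PhD <- pmin(capped$PhD, 100)

# Option 2: drop records whose percentage exceeds 100.
# drop = FALSE keeps the result a data frame even with a single column.
dropped <- college_demo[college_demo$PhD <= 100, , drop = FALSE]

summary(capped$PhD)
row.names(dropped)
```

Capping keeps the record for the other variables, while dropping avoids guessing what the true value was. Which one is appropriate depends on the analysis.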