Lecture 4
In the density estimation plot, the Parzen window estimates obtained with both kernels
are plotted against the true density of the triangular distribution, showing how each
kernel approaches the estimation task and where the two differ.
With n = 10000, the plot above shows the density estimates obtained with the
Epanechnikov kernel for several window widths h, compared against the true
triangular density.
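The setup above can be sketched in a few lines. This is a minimal illustration, not the lecture's exact code: the function names (`epanechnikov`, `parzen_density`), the choice h = 0.2, and the triangular distribution on [0, 2] with mode 1 are all assumptions made for the example.

```python
import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel: K(u) = 3/4 (1 - u^2) for |u| <= 1, 0 elsewhere
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def parzen_density(x, samples, h):
    # Parzen window estimate: p_hat(x) = 1/(n*h) * sum_i K((x - x_i) / h)
    u = (x[:, None] - samples[None, :]) / h
    return epanechnikov(u).sum(axis=1) / (len(samples) * h)

rng = np.random.default_rng(0)
# n = 10000 draws from a triangular density on [0, 2] with mode 1
samples = rng.triangular(0.0, 1.0, 2.0, size=10_000)

grid = np.linspace(0.0, 2.0, 201)
est = parzen_density(grid, samples, h=0.2)
# true triangular pdf, for comparison with the estimate
true_pdf = np.where(grid <= 1.0, grid, 2.0 - grid)
```

A smaller h follows the data more closely but is noisier; a larger h smooths more, which is why the peak of the estimate sits slightly below the true mode.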
2 Non-parametric classification
● In the worst case, if you choose k equal to n (the total number of samples),
the k-NN classifier will always predict the most frequent class in the dataset,
regardless of the input.
● If the dataset has c categories, the worst-case error rate would be the
proportion of the dataset that is not in the most frequent class. This could be
close to 1 - (1/c) if the classes are evenly distributed.
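The degenerate case k = n can be demonstrated directly. The following is a sketch, with an invented toy dataset and a hypothetical `knn_predict` helper; it is not the lecture's implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(x, X, y, k):
    # naive k-NN: distance from x to every training sample, then majority vote
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.array([0] * 60 + [1] * 40)   # class 0 is the most frequent

# With k = n, the vote includes every training sample, so the prediction
# is always the majority class, no matter where the query point lies.
far_away = np.array([100.0, -100.0])
pred = knn_predict(far_away, X, y, k=len(X))   # -> 0
```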
● The naive approach involves computing the distance from the sample x to
every other sample in the dataset, then selecting the k closest samples.
● The computational complexity of this approach is O(nd) per query, where n is the
number of samples and d is the dimensionality: computing the n distances costs
O(nd), and selecting the k smallest can then be done in O(n) with a selection
algorithm.
● To improve this, you can use techniques like KD-trees or ball trees for more
efficient distance computations, especially in higher dimensions.
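As a sketch of the speed-up, the brute-force search above can be compared against a KD-tree. This example assumes SciPy is available and uses `scipy.spatial.KDTree`; the data and query are invented for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 3))      # n = 5000 samples in d = 3 dimensions
query = rng.normal(size=3)

# brute force: O(n*d) distance computations for this single query
d_brute = np.linalg.norm(X - query, axis=1)
brute_idx = np.argsort(d_brute)[:5]

# KD-tree: build the tree once, then each query is roughly O(log n)
# in low dimensions (the advantage fades as d grows)
tree = KDTree(X)
tree_dists, tree_idx = tree.query(query, k=5)

# both searches return the same 5 nearest neighbours
```

The build cost of the tree is amortised over many queries, which is why tree-based search pays off when the same dataset is queried repeatedly.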