Why Do You Need To Scale Data in KNN: 3 Answers
Why do you need to scale data in KNN?

Asked 3 years, 8 months ago · Active 7 months ago · Viewed 47k times

Could someone please explain to me why you need to normalize data when using K nearest neighbors.
I've tried to look this up, but I still can't seem to understand it. There is an explanation here: https://discuss.analyticsvidhya.com/t/why-it-is-necessary-to-normalize-in-knn/2715, but I don't understand why a larger range in one of the features affects the predictions.
k-nearest-neighbour
It seems like any scaling (min-max or robust) is acceptable, not just standard
scaling. Is that correct? – skeller88 Apr 10 '20 at 20:20
[Figures omitted: two scatterplots of the same data, plotted with and without normalization.]

Notice how, without normalization, all the nearest neighbors are aligned in the direction of the axis with the smaller range, i.e. 𝑥1, leading to incorrect classification.
This answer is exactly right, but I fear the illustrations might be deceptive because of the distortions involved. The point might be better made by drawing them both so that the two axes in each are at the same scale. – whuber ♦ Jun 26 '17 at 19:30

I found it difficult to fit all data points in the same scale for both figures. Hence, I mentioned in a note that the scales of the axes are different. – kedarps Jun 26 '17 at 19:55

That difficulty actually is the point of your response! One way to overcome it is not to use such an extreme range of scales. A 5:1 difference in scales, rather than a 1000:1 difference, would still make your point nicely. Another way is to draw the picture faithfully: the top scatterplot will seem to be a vertical line of points. – whuber ♦ Jun 26 '17 at 19:57

@whuber, I misunderstood your first comment. Fixed the plots, hopefully it's better now! – kedarps Jun 26 '17 at 20:10
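The effect the answer describes can be reproduced numerically as well as in a plot. The following sketch (mine, not from the answer; the grid, query point, and 1000:1 scale ratio are illustrative assumptions) shows that with an unscaled 𝑥2 range of [0, 1000], the nearest neighbors of a query all share the same 𝑥2 while spanning most of the 𝑥1 range, i.e. they line up along the 𝑥1 axis; after min-max scaling they cluster around the query in both coordinates:

```python
import numpy as np

# Grid of points: x1 in [0, 1], x2 in [0, 1000]
x1 = np.linspace(0, 1, 11)
x2 = np.linspace(0, 1000, 11)
X = np.array([[u, v] for u in x1 for v in x2])
query = np.array([0.5, 500.0])

# Without scaling, the distance is dominated by x2: the 9 nearest
# neighbors all have x2 == 500 exactly, while their x1 values span
# most of [0, 1] -- aligned along the x1 axis, as the answer says.
d = np.linalg.norm(X - query, axis=1)
nn = X[np.argsort(d)[:9]]
print(np.ptp(nn[:, 1]))  # 0.0  (no spread at all in x2)
print(np.ptp(nn[:, 0]))  # ~0.8 (wide spread in x1)

# After min-max scaling both features to [0, 1], the neighbors
# cluster around the query in both coordinates.
scale = np.array([1.0, 1000.0])
ds = np.linalg.norm(X / scale - query / scale, axis=1)
nn_s = X[np.argsort(ds)[:9]]
print(np.ptp(nn_s[:, 0]))  # ~0.2
print(np.ptp(nn_s[:, 1]))  # ~200.0
```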
Suppose you had a dataset (m "examples" by n "features") and all but one feature dimension had values strictly between 0 and 1, while a single feature dimension had values that range from -1000000 to 1000000. When taking the Euclidean distance between pairs of "examples", the values of the feature dimensions that range between 0 and 1 become uninformative, and the algorithm essentially relies on the single dimension whose values are substantially larger. Just work out some example Euclidean distance calculations and you can see how the scale affects the nearest-neighbor computation.
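Here is one such worked calculation (the three points and feature ranges are my own illustrative assumptions, not from the answer): point b is far from a in the small feature but close in the big one, while c is close in the small feature but far in the big one. Unscaled, the big feature decides everything; after scaling each feature by its known range, the nearest neighbor flips.

```python
import numpy as np

a = np.array([0.2, 1000.0])
b = np.array([0.9, 1010.0])   # far from a in feature 1, close in feature 2
c = np.array([0.25, 1200.0])  # close to a in feature 1, far in feature 2

# Unscaled: distances are determined almost entirely by feature 2,
# so b looks like the nearest neighbor of a.
print(np.linalg.norm(a - b))  # ~10.02
print(np.linalg.norm(a - c))  # ~200.0

# Min-max scale each feature by its assumed range: [0, 1] and [0, 2000].
scale = np.array([1.0, 2000.0])
d_ab = np.linalg.norm((a - b) / scale)  # ~0.700
d_ac = np.linalg.norm((a - c) / scale)  # ~0.112
# After scaling, c (similar in the 0-1 feature) is the nearest neighbor.
```

The same flip is why libraries recommend fitting a scaler before any distance-based model: the unscaled ranking and the scaled ranking of neighbors can disagree completely.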
site design / logo © 2021 Stack Exchange Inc; user contributions licensed under CC BY-SA.