Why do you need to scale data in KNN

Asked 3 years, 8 months ago   Active 7 months ago   Viewed 47k times

24

Could someone please explain to me why you need to normalize data when using K nearest neighbors.

I've tried to look this up, but I still can't seem to understand it.

I found the following link:

https://discuss.analyticsvidhya.com/t/why-it-is-necessary-to-normalize-in-knn/2715

But in this explanation, I don't understand why a larger range in one of the features affects the predictions.

k-nearest-neighbour

asked Jun 26 '17 at 17:41 by bugsyb

I think normalization has to be justified from the subject-matter point of view. Essentially, what matters is what defines the distance between points. You have to find a convenient arithmetic definition of distance that reflects the subject-matter definition of distance. In my limited experience, I have normalized in some but not all directions based on subject-matter considerations. – Richard Hardy Jun 26 '17 at 19:00

1 For an instructive example, please see stats.stackexchange.com/questions/140711. – whuber ♦ Jun 26 '17 at 19:28

It seems like any scaling (min-max or robust) is acceptable, not just standard
scaling. Is that correct? – skeller88 Apr 10 '20 at 20:20


3 Answers

34

The k-nearest neighbor algorithm relies on majority voting based on class membership of the 'k' nearest samples for a given test point. The nearness of samples is typically based on Euclidean distance.

Consider a simple two-class classification problem, where a Class 1 sample is chosen (black) along with its 10 nearest neighbors (filled green). In the first figure, the data is not normalized, whereas in the second one it is.

Notice how, without normalization, all the nearest neighbors are aligned along the axis with the smaller range, i.e. $x_1$, leading to incorrect classification.

Normalization solves this problem!

answered Jun 26 '17 at 18:56 by kedarps, edited Jun 26 '17 at 20:09
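To see the same effect numerically, here is a minimal sketch (the feature ranges, random seed, and k = 10 are arbitrary illustration choices, not taken from the answer above): without scaling, the 10 nearest neighbours of a query point are selected almost entirely by the wide-range feature $x_2$, so they spread out along $x_1$; after standardizing both features, they are close in both coordinates.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Two features with very different ranges (assumed for illustration).
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 1, n)      # narrow-range feature
x2 = rng.uniform(0, 1000, n)   # wide-range feature
X = np.column_stack([x1, x2])
query = np.array([[0.5, 500.0]])

# Neighbours on the raw data: the distance is dominated by x2,
# so the selected points spread out along the x1 axis.
raw_nn = NearestNeighbors(n_neighbors=10).fit(X)
raw_idx = raw_nn.kneighbors(query, return_distance=False)[0]

# Neighbours after standardizing both features.
scaler = StandardScaler().fit(X)
scaled_nn = NearestNeighbors(n_neighbors=10).fit(scaler.transform(X))
scaled_idx = scaled_nn.kneighbors(scaler.transform(query), return_distance=False)[0]

print("x1 spread of the 10 neighbours, raw data:   ", np.ptp(X[raw_idx, 0]))
print("x1 spread of the 10 neighbours, scaled data:", np.ptp(X[scaled_idx, 0]))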

2 This answer is exactly right, but I fear the illustrations might be deceptive
because of the distortions involved. The point might be better made by
drawing them both so that the two axes in each are at the same scale. –
whuber ♦ Jun 26 '17 at 19:30

1 I found it difficult to fit all data points in the same scale for both figures. Hence,
I mentioned in a note that scales of axes are different. – kedarps Jun 26 '17 at
19:55

1 That difficulty actually is the point of your response! One way to overcome it is
not to use such an extreme range of scales. A 5:1 difference in scales, rather
than a 1000:1 difference, would still make your point nicely. Another way is to
draw the picture faithfully: the top scatterplot will seem to be a vertical line of
points. – whuber ♦ Jun 26 '17 at 19:57

2 @whuber, I misunderstood your first comment. Fixed the plots, hopefully it's
better now! – kedarps Jun 26 '17 at 20:10

1 @Undertherainbow That is correct! – kedarps Mar 11 '19 at 19:33


10

Suppose you had a dataset (m "examples" by n "features") and all but one feature dimension had values strictly between 0 and 1, while a single feature dimension had values that range from -1000000 to 1000000. When taking the Euclidean distance between pairs of "examples", the values of the feature dimensions that range between 0 and 1 may become uninformative, and the algorithm would essentially rely on the single dimension whose values are substantially larger. Just work out some example Euclidean distance calculations and you can understand how the scale affects the nearest-neighbor computation.

answered Jun 26 '17 at 20:08 by Derek Jones, edited Jun 26 '17 at 20:26
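Following that suggestion, a small worked calculation (with invented numbers) shows that even a near-maximal difference in a 0-to-1 feature contributes almost nothing to the Euclidean distance next to a modest difference in the large-scale feature:

import math

# Two points: a large difference in the 0-1 feature, a small one
# in the feature that ranges over +/- 1,000,000 (invented numbers).
a = (0.1, 250_000.0)
b = (0.9, 250_300.0)

# Squared contribution of each feature to the Euclidean distance.
d_small = (a[0] - b[0]) ** 2   # 0.64
d_large = (a[1] - b[1]) ** 2   # 90000.0

distance = math.sqrt(d_small + d_large)
print(distance)                       # about 300.001 -- set by the large-scale feature
print(d_large / (d_small + d_large))  # about 0.99999 -- its share of the squared distance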


0

If the scale of the features is very different, then normalization is required. This is because the distance calculation in KNN uses the feature values directly. When one feature's values are much larger than the others', that feature will dominate the distance and hence the outcome of KNN.

See the example on gist.github.com

answered Jul 4 '20 at 0:47 by Ajey
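Since the gist itself is not reproduced here, the following is a minimal sketch in the same spirit (the wine dataset, k = 5, and standard scaling are assumptions for illustration, not the content of the linked example): it compares a KNN classifier with and without standardization on data whose features have very different scales.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The wine data has features on very different scales (e.g. proline vs. hue).
X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# KNN on the raw features: large-scale features dominate the distance.
raw_knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

# The same model with standardization applied first.
scaled_knn = make_pipeline(StandardScaler(),
                           KNeighborsClassifier(n_neighbors=5)).fit(X_tr, y_tr)

print("test accuracy without scaling:", raw_knn.score(X_te, y_te))
print("test accuracy with scaling:   ", scaled_knn.score(X_te, y_te))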

