
Foundations of Data Science (CS F320)

Dr. Bharat Richhariya


Assistant Professor, Dept. of CSIS
BITS Pilani
Pilani Campus
High dimensional spaces

High dimensional space
• Our geometric intuition often fails in higher dimensions.
• Many properties of simple objects, such as higher dimensional
analogs of cubes and spheres, are very counterintuitive.

Example
• Most of the volume of a high-dimensional object is near its surface.
• The volume of a unit sphere approaches 0 as the dimension grows.
• Random vectors in high dimensions are nearly orthogonal to each other.
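The near-orthogonality claim can be checked numerically. The sketch below is a minimal Monte Carlo in plain Python (standard library only). It draws random directions by normalizing a standard Gaussian vector. The helper names `random_unit_vector` and `mean_abs_cosine`, the trial count, and the seed are all hypothetical choices. For two independent random directions, the average |cos| of the angle between them shrinks roughly like $1/\sqrt{d}$.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniformly random direction in R^d: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_abs_cosine(d, trials=200, seed=0):
    """Average |cos(angle)| between pairs of independent random unit vectors in R^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = random_unit_vector(d, rng)
        v = random_unit_vector(d, rng)
        # For unit vectors the dot product equals cos(angle).
        total += abs(sum(a * b for a, b in zip(u, v)))
    return total / trials


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(f"d={d:>4}  mean |cos| = {mean_abs_cosine(d):.3f}")
```

For $d = 2$ the average should come out near $2/\pi \approx 0.64$. For $d = 1000$ it should be only a few hundredths.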

Spheres and cubes in 2-d
• Consider a square with side length 1.
• At each corner of the square place a circle of radius 1/2, so that the circles cover the edges of the square.
• Then consider the circle centered at the center of the square that is just large enough to touch the circles at the corners of the square.

Spheres and cubes in 3-d

Spheres and cubes in 3-d
• To understand what happens in higher dimensions, we need to compute the radius of the inner sphere in terms of the dimension $d$.
• The radius of the inner sphere equals $\frac{1}{2}(\text{length of the diagonal of the cube}) - (\text{radius of the spheres at the corners})$.
• The length of the diagonal of the unit cube in $d$ dimensions is $\sqrt{d}$.
• Thus, the radius of the inner sphere is
$$r_d = \frac{\sqrt{d}}{2} - \frac{1}{2} = \frac{\sqrt{d} - 1}{2}.$$
So, the radius of the inner sphere increases with the dimension $d$.
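The inner radius is easy to tabulate. Below is a minimal sketch in Python. `inner_radius` is a hypothetical helper that computes half the diagonal, $\sqrt{d}/2$, minus the corner radius $1/2$. The sketch assumes, as in the construction above, that the unit cube is centered at the inner sphere's center, so each face lies at distance $1/2$ from it.

```python
import math


def inner_radius(d):
    """Radius of the sphere at the centre of a unit cube that just touches
    the 2^d corner spheres of radius 1/2: half the diagonal minus 1/2."""
    return math.sqrt(d) / 2 - 0.5


if __name__ == "__main__":
    for d in (2, 3, 4, 5, 9, 100):
        r = inner_radius(d)
        # Each face of the unit cube is at distance 1/2 from the centre.
        if r < 0.5:
            where = "inside the cube"
        elif r == 0.5:
            where = "touches the faces"
        else:
            where = "pokes out of the cube"
        print(f"d={d:>3}  r={r:.4f}  {where}")
```

At $d = 9$ the inner sphere already has radius 1, the same as the side length of the cube.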

Spheres and cubes in 3-d
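• In dimensions 2 and 3, the inner sphere lies inside the cube.
• In $d = 4$, the radius of the inner sphere is $\frac{\sqrt{4} - 1}{2} = \frac{1}{2}$, so it touches the faces of the cube.
• In $d = 5$, the radius of the inner sphere is $\frac{\sqrt{5} - 1}{2} \approx 0.618 > \frac{1}{2}$.
So, the inner sphere now extends outside the unit cube.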

Volume in high dimensions
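• The area of a circle of radius $r$ is $\pi r^2$.
• Intersect the sphere of radius $r$ with a plane at height $h$ above its center; the cross-section is a circle of radius $\sqrt{r^2 - h^2}$.
• Summing (integrating) the cross-sectional areas gives the volume:
$$V = \int_{-r}^{r} \pi (r^2 - h^2)\, dh = \frac{4}{3}\pi r^3.$$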
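The "sum of cross-sections" idea can be sketched numerically in three dimensions. The hypothetical helper below uses the midpoint rule: it slices the ball into thin discs along the height $h$ and adds up the disc areas $\pi(r^2 - h^2)$ times the slice thickness. The number of slices is an arbitrary choice.

```python
import math


def ball_volume_by_slices(r=1.0, n=10_000):
    """Approximate the volume of a 3-d ball of radius r by summing the areas
    of n thin circular cross-sections (midpoint rule in the height h)."""
    dh = 2 * r / n
    total = 0.0
    for i in range(n):
        h = -r + (i + 0.5) * dh          # height of this slice's midpoint
        slice_radius_sq = r * r - h * h  # cross-section is a circle of radius sqrt(r^2 - h^2)
        total += math.pi * slice_radius_sq * dh
    return total


if __name__ == "__main__":
    print(ball_volume_by_slices(), 4 * math.pi / 3)
```

The same slicing idea extends to higher dimensions. Each cross-section of a $d$-dimensional ball is a $(d-1)$-dimensional ball.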

Volume of a unit sphere

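A high dimensional unit sphere encloses almost no volume! The volume enclosed by the unit sphere in $\mathbb{R}^d$ is $V(d) = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}$, which peaks at $d = 5$ and tends to 0 as $d \to \infty$.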
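The standard closed form for the volume enclosed by the unit sphere in $\mathbb{R}^d$ is $\pi^{d/2} / \Gamma(d/2 + 1)$. A minimal Python sketch evaluates it with `math.gamma`. `unit_ball_volume` is a hypothetical helper name.

```python
import math


def unit_ball_volume(d):
    """Volume enclosed by the unit sphere in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


if __name__ == "__main__":
    for d in (1, 2, 3, 5, 10, 20, 50):
        print(f"d={d:>2}  V={unit_ball_volume(d):.6g}")
```

The values start at 2 for $d = 1$ and grow to a maximum of about 5.26 at $d = 5$. After that they fall quickly, and by $d = 20$ the volume is below 0.03.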


Concentration of measure
• Place a band around the equator of the unit sphere just wide enough that 99% of the sphere's surface area falls within it. As the dimension grows, this band becomes thinner and thinner.
• Nearly all of the surface area of a high dimensional sphere lies near the equator.
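A rough Monte Carlo sketch of this effect, assuming the "equator" is the hyperplane $x_1 = 0$ and the band is $|x_1| \le 0.1$ (both arbitrary illustrative choices). Uniform points on the sphere are again obtained by normalizing standard Gaussian vectors. `fraction_near_equator` is a hypothetical helper.

```python
import math
import random


def fraction_near_equator(d, half_width=0.1, samples=2000, seed=0):
    """Fraction of uniform random points on the unit sphere in R^d whose first
    coordinate satisfies |x_1| <= half_width (a band around the equator x_1 = 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if abs(v[0] / norm) <= half_width:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    for d in (3, 10, 100, 1000):
        print(f"d={d:>4}  fraction in band = {fraction_near_equator(d, samples=1000):.3f}")
```

In $\mathbb{R}^3$, $x_1$ is uniform on $[-1, 1]$, so the same fixed band holds only about 10% of the surface. In $\mathbb{R}^{1000}$ it holds nearly all of it.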

Concentration of measure (approach to expected value)

Concentration of measure
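• Every point on a $d$-sphere (a circle is a 1-sphere) must satisfy the equation
$$x_1^2 + x_2^2 + \cdots + x_{d+1}^2 = 1.$$
• As $d$ increases, the number of terms in the sum increases, so each coordinate gets a smaller share of the total: by symmetry, $E[x_i^2] = \frac{1}{d+1}$ for a uniformly random point.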

Plot showing 20000 random points sampled uniformly from a d-sphere.
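The shrinking share of each coordinate can be estimated directly. This sketch follows the convention that the $d$-sphere is the unit sphere in $\mathbb{R}^{d+1}$ and samples it by normalizing Gaussian vectors. The helper name, sample count, and seed are hypothetical. By symmetry the estimate should be close to $1/(d+1)$.

```python
import random


def mean_squared_first_coordinate(d, samples=5000, seed=1):
    """Monte Carlo estimate of E[x_1^2] for points drawn uniformly from the
    d-sphere, i.e. the unit sphere in R^(d+1) (so the circle is the 1-sphere)."""
    rng = random.Random(seed)
    n = d + 1  # ambient dimension
    total = 0.0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm_sq = sum(x * x for x in v)
        total += v[0] * v[0] / norm_sq  # squared first coordinate of the normalized point
    return total / samples


if __name__ == "__main__":
    for d in (1, 9, 99):
        print(f"d={d:>2}  estimate={mean_squared_first_coordinate(d):.4f}  1/(d+1)={1 / (d + 1):.4f}")
```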
Unit sphere and hypercube

Illustration of the relationship between sphere and hypercube in 2, 4, and d dimensions.
Unit sphere and hypercube in high dimensions

Homework
• Find the distribution of the points (x,y) randomly selected from the
circumference of a circle (i.e. 1-sphere).
Hint: Calculate the distributions of $x = \cos(\theta)$ and $y = \sin(\theta)$, where $\theta$ is uniform on $[0, 2\pi)$.

Gaussians in high dimension

Gaussian in 2-d
Gaussian in 3-d
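Norm in high dimensions
$$\|x\|^2 = \sum_{i=1}^{k} x_i^2$$
• If $x \sim \mathcal{N}(0, I_k)$, then $\|x\|$ is distributed according to the chi distribution with $k$ degrees of freedom.
• The mode of the chi distribution is $\sqrt{k - 1}$ (for $k \ge 1$).
• In high dimensions, $k$ is written as $d$, i.e. the number of dimensions.
• Most of the samples concentrate near a sphere of radius about $\sqrt{d}$.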
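A minimal sketch of this concentration, drawing samples from $\mathcal{N}(0, I_d)$ with the standard library and summarizing their norms. The helper name, sample sizes, and seed are hypothetical. The mean norm should track $\sqrt{d}$. The standard deviation should stay close to $1/\sqrt{2} \approx 0.71$ instead of growing with $d$, so the spread relative to the radius shrinks.

```python
import math
import random
import statistics


def gaussian_norms(d, samples=2000, seed=2):
    """Euclidean norms of samples from the standard Gaussian N(0, I_d);
    these norms follow the chi distribution with d degrees of freedom."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
            for _ in range(samples)]


if __name__ == "__main__":
    for d in (1, 10, 100, 1000):
        norms = gaussian_norms(d, samples=1000)
        print(f"d={d:>4}  sqrt(d)={math.sqrt(d):7.3f}  "
              f"mean={statistics.mean(norms):7.3f}  sd={statistics.stdev(norms):.3f}")
```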

Chi distribution with $k$ degrees of freedom
Clustering in high dimensions
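• Consider $X_1 \sim \mathcal{N}(\mu_1, I)$, $X_2 \sim \mathcal{N}(\mu_2, I)$, $X_3 \sim \mathcal{N}(\mu_3, I)$, where
$\mu_1 = [0, 0, 0, \cdots]^T$, $\mu_2 = [5, 0, 0, \cdots]^T$, $\mu_3 = [10, 0, 0, \cdots]^T$.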

Clustering in high dimensions

Distance matrix of points

Better clustering in low dimensions

• We can clearly identify the three clusters when the samples are projected onto the first two dimensions!
• This is why dimensionality reduction is important.
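This is a minimal sketch, in standard-library Python only, of why the projection helps. For brevity it compares only the two nearest clusters (means 0 and 5 on the first axis). The number of points per cluster, the seed, and the helper names are hypothetical choices. In the full space, two points from the same cluster differ by noise in all $d$ coordinates, so their squared distance averages $2d$. For points from clusters 1 and 2 it averages $2d + 25$, so the ratio of typical distances tends to 1 as $d$ grows. Keeping only the first two coordinates removes almost all of that noise.

```python
import math
import random


def sample_cluster(first_mean, d, n, rng):
    """n points from N(mu, I_d) with mu = [first_mean, 0, ..., 0]."""
    return [[first_mean + rng.gauss(0.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)]
            for _ in range(n)]


def mean_distance(A, B, k=None):
    """Mean Euclidean distance between distinct points a in A, b in B,
    using only the first k coordinates (all coordinates if k is None)."""
    dists = [math.dist(a[:k], b[:k]) for a in A for b in B if a is not b]
    return sum(dists) / len(dists)


def separation_ratio(d, k=None, n=30, seed=3):
    """(mean distance between clusters 1 and 2) / (mean distance within cluster 1)."""
    rng = random.Random(seed)
    c1 = sample_cluster(0.0, d, n, rng)
    c2 = sample_cluster(5.0, d, n, rng)
    return mean_distance(c1, c2, k) / mean_distance(c1, c1, k)


if __name__ == "__main__":
    print("all 1000 coordinates:", round(separation_ratio(1000), 3))
    print("first two coordinates:", round(separation_ratio(1000, k=2), 3))
```

A ratio near 1 means the clusters are hard to tell apart from distances alone. A ratio well above 1 means they separate cleanly.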

Spherical Gaussian
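• The $d$-dimensional spherical Gaussian (its pdf has spherical symmetry) with mean 0 and variance $\sigma^2$ in each coordinate has density function
$$p(x) = \frac{1}{(2\pi)^{d/2}\sigma^d} \exp\left(-\frac{\|x\|^2}{2\sigma^2}\right).$$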

Spherical Gaussian (diagonal covariance, equal variance)
Gaussian (diagonal covariance)
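A direct transcription of the zero-mean spherical Gaussian density, as a minimal sketch with a hypothetical function name. It uses the fact that $(2\pi)^{d/2}\sigma^d = (2\pi\sigma^2)^{d/2}$. The density depends on $x$ only through $\|x\|$, which is the spherical symmetry. It also factors into a product of $d$ one-dimensional Gaussian densities.

```python
import math


def spherical_gaussian_pdf(x, sigma=1.0):
    """Density of the zero-mean spherical Gaussian in R^d with variance sigma^2
    in every coordinate: (2*pi*sigma^2)^(-d/2) * exp(-|x|^2 / (2*sigma^2))."""
    d = len(x)
    sq_norm = sum(xi * xi for xi in x)
    return (2 * math.pi * sigma ** 2) ** (-d / 2) * math.exp(-sq_norm / (2 * sigma ** 2))


if __name__ == "__main__":
    # Points with the same norm have the same density (spherical symmetry).
    print(spherical_gaussian_pdf([3.0, 4.0]), spherical_gaussian_pdf([5.0, 0.0]))
```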
Gaussian Annulus Theorem
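• The Gaussian Annulus Theorem states that nearly all the probability of a spherical Gaussian with unit variance is concentrated in a thin annulus at radius $\sqrt{d}$. More precisely, for any $\beta \le \sqrt{d}$, all but at most $3e^{-c\beta^2}$ of the probability mass lies within $\sqrt{d} - \beta \le \|x\| \le \sqrt{d} + \beta$, where $c$ is a fixed positive constant.
• Since $E[\|x\|^2] = \sum_{i=1}^{d} E[x_i^2] = d$, the mean squared distance of a point from the center is $d$.
• The theorem states that the points are tightly concentrated around this radius: the width of the annulus does not grow with $d$, while its radius grows like $\sqrt{d}$.
• Recall that most of the volume of a high dimensional object is near its surface. This is why even a thin annulus can contain most of the probability mass of our Gaussian.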
Annulus
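A rough empirical check of the annulus statement, assuming a fixed half-width $\beta = 2$ (an arbitrary illustrative choice; the sketch does not estimate the constant $c$). It counts how many samples from $\mathcal{N}(0, I_d)$ land in $\sqrt{d} - \beta \le \|x\| \le \sqrt{d} + \beta$. `annulus_fraction` is a hypothetical helper.

```python
import math
import random


def annulus_fraction(d, beta=2.0, samples=2000, seed=4):
    """Fraction of samples from N(0, I_d) whose norm lies in the annulus
    sqrt(d) - beta <= |x| <= sqrt(d) + beta."""
    rng = random.Random(seed)
    r0 = math.sqrt(d)
    inside = 0
    for _ in range(samples):
        norm = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        if r0 - beta <= norm <= r0 + beta:
            inside += 1
    return inside / samples


if __name__ == "__main__":
    for d in (10, 100, 1000):
        relative = 2 * 2.0 / math.sqrt(d)  # annulus thickness divided by its radius, for beta = 2
        print(f"d={d:>4}  fraction={annulus_fraction(d, samples=1000):.3f}  "
              f"relative thickness={relative:.3f}")
```

The captured fraction stays near 1 as $d$ grows, while the annulus gets ever thinner relative to its radius.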
Sources for some of the material used in these slides:
• https://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/chap1-high-dim-space.pdf
• https://marckhoury.github.io/blog/counterintuitive-properties-of-high-dimensional-space/
• https://research.wmz.ninja/articles/2018/03/the-counterintuitive-behavior-of-high-dimensional-gaussian-distributions.html
• https://en.wikipedia.org/wiki/Annulus_%28mathematics%29
• https://www.inf.ed.ac.uk/teaching/courses/inf2b/learnnotes/inf2b-learn-note08-2up.pdf
