
Chair of Remote Sensing Technology

TUM School of Engineering and Design


Technical University of Munich

Prof. Dr. rer. nat. habil. Marco Körner

Practical Exercises for the Lecture

Introduction to Machine Learning


Winter Term 2023/24

Exercise Sheet 1: The Curse of Dimensionality

You will verify the claim that most data points in high-dimensional feature spaces are far away from each
other and, at the same time, at approximately the same distance from one another.

1 Experiments on Synthetic Data



For each choice of dimension $D \in \{2^0, 2^1, \ldots, 2^{10}\}$, sample 100 points $x_i \in \mathbb{R}^D$, $x_i \sim p(x)$, randomly from the
unit cube, following distribution models $p(x)$ of your choice. Compute the pair-wise distances $d_{i,j} = d(x_i, x_j)$, $i \neq j$,
and the nearest-neighbor (Euclidean) distances $d_{\mathrm{NN}}(x)$. Use these values to compute the statistics needed to
fill the following table:

Dimensionality   pair-wise distances                                          nearest-neighbor distances
$D$              $\min d(x_i, x_j)$  $\max d(x_i, x_j)$  $\mathrm{avg}\, d(x_i, x_j)$   $\min d_{\mathrm{NN}}(x)$  $\max d_{\mathrm{NN}}(x)$  $\mathrm{avg}\, d_{\mathrm{NN}}(x)$
$2^0$
$2^1$
$\vdots$
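
One possible way to fill the table is sketched below using NumPy only. The sheet leaves the distribution $p(x)$ open; this sketch assumes a uniform distribution on the unit cube and a fixed random seed, and the names `distance_stats` and `table` are illustrative, not part of the exercise.

```python
import numpy as np


def distance_stats(X):
    """Min/max/avg of pair-wise and nearest-neighbor Euclidean distances of the rows of X."""
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)            # all index pairs with i < j
    d = np.linalg.norm(X[i] - X[j], axis=1)   # pair-wise distances d(x_i, x_j), i != j

    # Symmetric distance matrix; inf on the diagonal excludes each point itself.
    full = np.full((n, n), np.inf)
    full[i, j] = d
    full[j, i] = d
    d_nn = full.min(axis=1)                   # distance of each point to its nearest neighbor

    return {
        "pair_min": d.min(), "pair_max": d.max(), "pair_avg": d.mean(),
        "nn_min": d_nn.min(), "nn_max": d_nn.max(), "nn_avg": d_nn.mean(),
    }


rng = np.random.default_rng(0)                # fixed seed (assumption) for reproducibility
table = []
for k in range(11):                           # D = 2^0, 2^1, ..., 2^10
    D = 2 ** k
    X = rng.random((100, D))                  # assumption: p(x) uniform on [0, 1]^D
    row = {"D": D, **distance_stats(X)}
    table.append(row)
    print(f"{D:5d} " + " ".join(f"{row[key]:8.4f}" for key in list(row)[1:]))
```

Each printed line corresponds to one row of the table. Other choices of $p(x)$, for example a clipped Gaussian, only require changing the line that draws `X`.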

Plot both the averages and the standard deviations of the pair-wise and nearest-neighbor distances as functions of $D$.
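
A minimal sketch of how the averages and standard deviations could be collected and plotted follows. It again assumes uniformly distributed points; `mean_and_std`, `plot_summary`, and the output file name are hypothetical names chosen for illustration. Matplotlib is imported only inside the plotting function, so the statistics can be computed without it.

```python
import numpy as np


def mean_and_std(D, n=100, seed=0):
    """Mean and std of pair-wise and nearest-neighbor distances for n points in [0, 1]^D."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, D))                    # assumption: p(x) uniform on the unit cube
    i, j = np.triu_indices(n, k=1)
    d = np.linalg.norm(X[i] - X[j], axis=1)   # pair-wise Euclidean distances
    full = np.full((n, n), np.inf)            # inf on the diagonal excludes self-distances
    full[i, j] = full[j, i] = d
    d_nn = full.min(axis=1)                   # nearest-neighbor distances
    return d.mean(), d.std(), d_nn.mean(), d_nn.std()


dims = [2 ** k for k in range(11)]
# Columns: pair-wise mean, pair-wise std, nearest-neighbor mean, nearest-neighbor std.
summary = np.array([mean_and_std(D) for D in dims])


def plot_summary(dims, summary, path="distance_stats.png"):
    """Plot averages (left) and standard deviations (right) over D and save to path."""
    import matplotlib.pyplot as plt

    fig, (ax_mean, ax_std) = plt.subplots(1, 2, figsize=(10, 4))
    for col, label in zip((0, 2), ("pair-wise", "nearest neighbor")):
        ax_mean.plot(dims, summary[:, col], marker="o", label=label)
        ax_std.plot(dims, summary[:, col + 1], marker="o", label=label)
    for ax, title in ((ax_mean, "average distance"), (ax_std, "standard deviation")):
        ax.set_xscale("log", base=2)          # D doubles from one step to the next
        ax.set_xlabel("D")
        ax.set_title(title)
        ax.legend()
    fig.savefig(path)
```

Calling `plot_summary(dims, summary)` writes the figure to disk. Plotting the ratio `summary[:, 1] / summary[:, 0]` as well may make the concentration of distances easier to see, since the average grows with $D$ while the standard deviation does not grow with it.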


You may use any functions already implemented in available Python libraries, e.g., NumPy,
scikit-learn, pandas, etc.

2 Experiments on Real-World Data


Repeat the previous experiments on real-world datasets, varying the dimensionality by omitting particular feature dimensions. Feel free to
use any dataset(s) of your choice; some of them have been introduced in our lecture.
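
The feature-omission step could be sketched as follows for any data matrix `X` of shape (samples, features). The function name `subset_distance_stats`, the random choice of which features to keep, and the min-max scaling to the unit cube are assumptions of this sketch and are not prescribed by the sheet.

```python
import numpy as np


def subset_distance_stats(X, n_features, seed=0):
    """Distance statistics after keeping a random subset of n_features columns of X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Assumption: min-max scale every feature to [0, 1] so that no single
    # feature dominates the Euclidean distance because of its units.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                     # constant columns map to 0, no division by zero
    X = (X - lo) / span

    # Omit features: keep n_features randomly chosen columns (assumption).
    keep = rng.choice(X.shape[1], size=n_features, replace=False)
    Xs = X[:, keep]

    n = Xs.shape[0]
    i, j = np.triu_indices(n, k=1)
    d = np.linalg.norm(Xs[i] - Xs[j], axis=1)  # pair-wise Euclidean distances
    full = np.full((n, n), np.inf)             # inf on the diagonal excludes self-distances
    full[i, j] = full[j, i] = d
    d_nn = full.min(axis=1)                    # nearest-neighbor distances

    return {
        "D": n_features,
        "pair_min": d.min(), "pair_max": d.max(), "pair_avg": d.mean(), "pair_std": d.std(),
        "nn_min": d_nn.min(), "nn_max": d_nn.max(), "nn_avg": d_nn.mean(), "nn_std": d_nn.std(),
    }
```

With a dataset loaded into `X`, for instance via `sklearn.datasets.load_wine().data`, calling the function for `n_features = 1, 2, ..., X.shape[1]` yields one table row per dimensionality. Duplicate samples in real data give a minimum distance of zero, which is expected behavior rather than an error.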
