
Data camp

Prof : Xavier Boute

Some quotes to begin …

« The only statistics you can trust are those you falsified yourself »

Sir Winston Churchill, Prime Minister of the UK

“I keep saying that the sexy job in the next 10 years will be …”

Hal Varian, chief economist at Google, 2009

« Prof. Boute! I just wanted to thank you for your statistics class a year and a half ago at the MBA. I'm currently working for Amazon and not a day goes by when I don't use what I learned and more! Statistical modeling and experimental design is a big part of my job. »

Jose Chapa, MBA 13, Head of POD, Kindle Analytics and TAM at …

Did you say « Big Data » ?

The 5 keys for decision
I : Data structuration

II : Data integration

III : Basic descriptive statistics

IV : Multidimensional analysis

V : Visualization

I : Data structuration

Which individual ?

Which population ? Which sample ? Which size ?

Which variable ?

Quantitative or qualitative ?

Which method ?

II : Data integration

Think of it as a basic .xls file: N individuals (rows) and p variables (columns).

              Id     Variable 1     Variable 2     …   Variable p
Individual 1  Id 1   Var1 (Ind 1)   Var2 (Ind 1)   …   VarP (Ind 1)
Individual 2  Id 2   Var1 (Ind 2)   Var2 (Ind 2)   …   VarP (Ind 2)
…
Individual N  Id N   Var1 (Ind N)   Var2 (Ind N)   …   VarP (Ind N)
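The layout above can be sketched in a few lines of Python. This is only an illustration: the column names and the three sample rows are borrowed from the student table used in this deck, and `load_table` is a hypothetical helper, not something the course provides.

```python
import csv
import io

# One row per individual, one column per variable; the first row names the variables.
# Sample rows borrowed from the student table in this deck.
RAW = """id,student,intake,gender,age
1,Mary,Sept,W,29
2,John,Sept,M,30
3,Visham,Sept,M,32
"""

def load_table(text):
    """Parse CSV text into a list of records (one dict per individual)."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_table(RAW)
n_individuals = len(rows)                                # N: number of rows
variables = [name for name in rows[0] if name != "id"]   # every column except the identifier
p_variables = len(variables)                             # p: number of variables

print(n_individuals, "individuals,", p_variables, "variables:", variables)
```

Note that `csv.DictReader` returns every value as text, so quantitative variables such as `age` still need converting before any computation.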

II : Data integration Not so natural !

Student   Gender   Age

Sept intake
Mary      W        29
John      M        30
Visham    M        32
Li        W        29
Chen      M        31
Paul      M        30
Toto      M        32

Jan intake
Carol     W        28
Kevyn     M        29
Martin    M        33
Liz       W        30
John      M        29

II : Data integration Not so natural !

ID Student Intake Gender Date of Birth

1 Mary Sept W 10/10/1995
2 John Sept M 15/03/1997
3 Visham Sept M 3/02/1998
4 Li Sept W 1/9/1992
5 Chen Sept M 3/11/1993
6 Paul Sept M 15/11/1993
7 Toto Sept M 3/1/1990
8 Carol Jan W 2/4/1992
9 Kevyn Jan M 23/2/1989
10 Martin Jan M 15/08/1995
11 Liz Jan W 2/9/1998
12 John Jan M 12/12/1992
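One plausible reading of this slide is that a date of birth stays valid over time, whereas a stored age goes stale. Under that assumption, age becomes a derived variable. The sketch below assumes day/month/year dates, as in the table; the function names and the reference date are hypothetical.

```python
from datetime import date, datetime

def parse_dob(text):
    """Parse a day/month/year string such as '3/02/1998' into a date."""
    return datetime.strptime(text, "%d/%m/%Y").date()

def age_on(dob, today):
    """Age in completed years on the reference date."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

reference = date(2025, 1, 1)   # hypothetical reference date, chosen for illustration
mary_age = age_on(parse_dob("10/10/1995"), reference)
print(mary_age)
```

Passing the reference date explicitly keeps the result reproducible; calling `date.today()` inside the function would make the same table give different ages on different days.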

II : Data integration Not so natural !
Call’s date    Name       Action         Date of dinner   Hour of dinner   Discount   Type

201812231040   Dupont     Booking        24/12/18         Evening          The fork   Family
201812231053   Durant     Booking        25/12/18         Lunch            The fork   Business
201812231332   Macron     Booking        25/12/18         Lunch            Elysée     Business
201812231445   Durant     Cancellation   25/12/18         Lunch            The fork   Business
201812231459   Philippe   Booking        26/12/18         Lunch            No         Family
201812231553   Dupont     Cancellation   24/12/18         Evening          The fork   Family

Table 1-1

Name ID Call 1’s date Action 1 Date of dinner Hour of dinner Discount Type Call 2’s date Action 2

Dupont 201812231040 Booking 24/12/18 Evening The fork Family 201812231553 Cancelling

Durant 201812231053 Booking 25/12/18 Lunch The fork Business 201812231445 Cancelling

Macron 201812231332 Booking 25/12/18 Lunch Elysée Business

Philippe 201812231459 Booking 26/12/18 Lunch No Family
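Going from one row per call to one row per reservation can be sketched as a grouping step. The data below is the call log from this slide. The grouping key (name, date of dinner, hour of dinner) is an assumption, since the slide does not say how a reservation is identified, and `to_reservations` is a hypothetical helper.

```python
# One tuple per call: (call timestamp, name, action, dinner date, hour, discount, type).
CALLS = [
    ("201812231040", "Dupont", "Booking", "24/12/18", "Evening", "The fork", "Family"),
    ("201812231053", "Durant", "Booking", "25/12/18", "Lunch", "The fork", "Business"),
    ("201812231332", "Macron", "Booking", "25/12/18", "Lunch", "Elysée", "Business"),
    ("201812231445", "Durant", "Cancellation", "25/12/18", "Lunch", "The fork", "Business"),
    ("201812231459", "Philippe", "Booking", "26/12/18", "Lunch", "No", "Family"),
    ("201812231553", "Dupont", "Cancellation", "24/12/18", "Evening", "The fork", "Family"),
]

def to_reservations(calls):
    """Group call rows into one record per reservation.

    Assumption: a reservation is identified by (name, dinner date, hour of dinner).
    """
    reservations = {}
    # Fixed-width timestamps sort chronologically as plain strings.
    for stamp, name, action, dinner_date, hour, discount, kind in sorted(calls):
        key = (name, dinner_date, hour)
        record = reservations.setdefault(key, {
            "name": name, "dinner_date": dinner_date, "hour": hour,
            "discount": discount, "type": kind, "calls": [],
        })
        record["calls"].append((stamp, action))
    return list(reservations.values())

reservations = to_reservations(CALLS)
# A reservation still stands if its most recent call is a booking.
active = [r["name"] for r in reservations if r["calls"][-1][1] == "Booking"]
print(active)
```

Keeping the calls as a list inside each record avoids fixing the number of "Call k" columns in advance, which the two-call layout of the second table would require.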

II : Data integration Not so natural !

ID Student Intake Gender Age

1 Mary Sept W 29
2 John Sept M 30
3 Visham Sept M 32
4 Li Sept W 29
5 Chen Sept M 31
6 Paul Sept M 30
7 Toto Sept M 32
8 Carol Jan W 28
9 Kevyn Jan M 29
10 Martin Jan M 33
11 Liz Jan W 30
12 John Jan M 29
13 MEAN 30,17
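The MEAN line sits in the table as if it were a 13th individual. A short check, using the twelve ages above, suggests why a summary row should stay out of the data: the mean is unchanged, but the count and the spread are not. This is a sketch of that reading of the slide, not a statement of the course's own argument.

```python
from statistics import mean, stdev

# The twelve real individuals from the table.
ages = [29, 30, 32, 29, 31, 30, 32, 28, 29, 33, 30, 29]

avg = mean(ages)
print(round(avg, 2))             # 30.17, the value shown in the MEAN row

# What software sees if the MEAN row is left inside the data.
with_mean_row = ages + [avg]

print(len(ages), len(with_mean_row))          # the sample size is inflated by one
print(stdev(ages) > stdev(with_mean_row))     # the standard deviation is understated
```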
III : Basic descriptive statistics

- Quantitative : mean, median, standard deviation, box plot, histogram, extreme values, distribution, …
- Qualitative : pie chart, proportion, …

- Quantitative x quantitative : scatter plot, pivot table, correlation, outliers, …
- Qualitative x qualitative : chi-square, …
- Quantitative x qualitative : box plot, analysis of variance, …
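A few of the tools listed above can be sketched on the student table from this deck, using only the Python standard library (the course itself uses SPSS). The helper `chi_square` is hypothetical and computes only the Pearson statistic, not a p-value. With twelve students the expected counts are very small, so the figure is purely illustrative.

```python
from collections import Counter
from statistics import mean, median, stdev

# (student, intake, gender, age), copied from the student table in this deck.
STUDENTS = [
    ("Mary", "Sept", "W", 29), ("John", "Sept", "M", 30), ("Visham", "Sept", "M", 32),
    ("Li", "Sept", "W", 29), ("Chen", "Sept", "M", 31), ("Paul", "Sept", "M", 30),
    ("Toto", "Sept", "M", 32), ("Carol", "Jan", "W", 28), ("Kevyn", "Jan", "M", 29),
    ("Martin", "Jan", "M", 33), ("Liz", "Jan", "W", 30), ("John", "Jan", "M", 29),
]

ages = [age for _, _, _, age in STUDENTS]
genders = [gender for _, _, gender, _ in STUDENTS]

# Quantitative variable: centre and spread.
age_summary = {"mean": mean(ages), "median": median(ages), "std": stdev(ages)}

# Qualitative variable: proportion of each category.
gender_share = {g: count / len(genders) for g, count in Counter(genders).items()}

# Quantitative x qualitative: mean age within each gender.
age_by_gender = {
    g: mean([age for _, _, gender, age in STUDENTS if gender == g])
    for g in sorted(set(genders))
}

def chi_square(pairs):
    """Pearson chi-square statistic for a list of (row category, column category) pairs."""
    observed = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    n = len(pairs)
    statistic = 0.0
    for row in row_totals:
        for col in col_totals:
            expected = row_totals[row] * col_totals[col] / n
            statistic += (observed[(row, col)] - expected) ** 2 / expected
    return statistic

# Qualitative x qualitative: intake against gender.
chi2 = chi_square([(intake, gender) for _, intake, gender, _ in STUDENTS])

print(age_summary)
print(gender_share)
print(age_by_gender)
print(round(chi2, 4))
```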

IV : Multidimensional analysis
              Id     Variable 1   Variable 2   …   Variable p
Individual 1  Id 1
Individual 2  Id 2
…
Individual N  Id N

N in millions, p in hundreds

Describe
Explain : Varp = f(Var1; Var2; …; Varp−1)
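The simplest instance of "explain" is a straight line with a single explanatory variable, fitted by ordinary least squares. The sketch below assumes that linear form, which the slide does not state; the function name `fit_line` and the four data points are made up for illustration and come from no course dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance, up to the same constant factor.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

With p in the hundreds, the same idea extends to many explanatory variables at once, which is where dedicated statistical software takes over.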


V : Visualization

98 % against 0.000001 %


- Attendance
- An assignment (to be provided in session 2)



Install SPSS on your laptop

Blackboard Tutorial 1
Data : apartment


« You cannot teach a man anything.

You can only help him discover what he has within himself. »

