Statistics and Data Analysis: Professor William Greene Stern School of Business IOMS Department Department of Economics
1/54 2: Descriptive Statistics
The “NYU No Action Letter”
A Cultural Revolution …
“3000 women, ages 14 to 78 describe in their own words …”
http://en.wikipedia.org/wiki/Shere_Hite
http://old.cni.org/docs/ima.ip-workshop/Massarsky.html
A Descriptive Statistic
Is … ?
Describes what?
The sample data
The population that the data came from
\[ \overline{\text{Listing}} = \frac{1}{51}\,(896{,}800 + 713{,}864 + \cdots + 164{,}326) = 369{,}687 \]
[Figure: histogram of DEFECTS, with Frequency on the vertical axis. Median = 1.8000, Mean = 1.8767.]
[Figure: Histogram of Defects, with Frequency on the vertical axis.]
We quantify the variation of the values around the mean. Note the range is from 1.05 to 2.70. This gives an idea where the data lie. The mean plus a
Variance:
\[ s_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2 \]
Standard deviation:
\[ s_y = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2} \]
Note the units of measurement. The standard deviation has the same units
as the mean. The standard deviation is the standard measure for the
dispersion (spread) of a set of values (sample of observations).
See HOG, p. 37
Computing a Standard Deviation
  Y     Deviation from Mean     Squared Deviation
  1            -2.1                   4.41
  4             0.9                   0.81
  6             2.9                   8.41
  0            -3.1                   9.61
  3            -0.1                   0.01
  2            -1.1                   1.21
  6             2.9                   8.41
  4             0.9                   0.81
  4             0.9                   0.81
  1            -2.1                   4.41
 SUM            0.0                  38.90
Sum = 31
Mean = 31/10 = 3.1
Sum of squared deviations = 38.90
Variance = 38.90/(10-1) = 4.322
Standard Deviation = 2.079
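The same calculation can be sketched in a few lines of Python, which may help when checking the arithmetic by hand. This is only an illustration using the ten Y values from the table; the variable names are chosen here and are not part of the original slides.

```python
import math

# The ten Y values from the worked table
y = [1, 4, 6, 0, 3, 2, 6, 4, 4, 1]

n = len(y)
mean = sum(y) / n                            # sum of Y divided by N
deviations = [yi - mean for yi in y]         # each value minus the mean
sum_sq = sum(d ** 2 for d in deviations)     # sum of squared deviations
variance = sum_sq / (n - 1)                  # divide by N - 1, not N
std_dev = math.sqrt(variance)                # same units as Y

print(f"Mean = {mean:.1f}")
print(f"Sum of squared deviations = {sum_sq:.2f}")
print(f"Variance = {variance:.3f}")
print(f"Standard deviation = {std_dev:.3f}")
```

The printed values should agree with the table: the deviations sum to zero (up to rounding), and dividing by N - 1 rather than N gives the sample variance.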
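\[ \text{Variance} = \frac{1}{30-1}\sum_{i=1}^{30}\left(Y_i - 1.8767\right)^2 = \frac{1}{30-1}\,(4.808667) = 0.165816 \]
\[ \text{Standard Deviation} = \sqrt{\frac{1}{30-1}\sum_{i=1}^{30}\left(Y_i - 1.8767\right)^2} = 0.407205 \]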
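As a quick check of the arithmetic for the defects data, the last two steps can be reproduced from the sum of squared deviations alone. This sketch assumes only the two figures given on the slide (30 observations and a sum of squared deviations of 4.808667); the 30 individual defect values are not listed here, so they are not used.

```python
import math

n_defects = 30                 # number of observations, from the slide
sum_sq_defects = 4.808667      # sum of squared deviations, from the slide

variance_defects = sum_sq_defects / (n_defects - 1)   # divide by N - 1
std_dev_defects = math.sqrt(variance_defects)

print(f"Variance = {variance_defects:.6f}")
print(f"Standard deviation = {std_dev_defects:.6f}")
```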
[Figure: histogram of Defects, with Frequency on the vertical axis.]
[Figure: scatterplot of Listing (vertical axis) against IncomePC (horizontal axis).]
[Figure: scatterplot of Listing (vertical axis) against IncomePC (horizontal axis), with fitted line.]
Regression Line: Listing = a + b IncomePC
The U.S. Gasoline Market. Data are yearly from 1953 to
[Figure: time series plots of Income and GasPrice against Year.]
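Standard Deviations:
\[ s_X = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2}, \qquad s_Y = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2} \]
Covariance:
\[ s_{XY} = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) \]
Correlation:
\[ r_{XY} = \frac{s_{XY}}{s_X\, s_Y}, \qquad -1 \le r_{XY} \le +1 \]
Units free. A pure number.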
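A minimal Python sketch of these three formulas follows. The function names and the small data set are hypothetical, made up purely for illustration; they are not the Listing and IncomePC data from the slides.

```python
import math

def sample_std(values):
    """Sample standard deviation, dividing by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def sample_cov(x, y):
    """Sample covariance of paired observations, dividing by N - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Correlation: covariance divided by the product of the standard deviations."""
    return sample_cov(x, y) / (sample_std(x) * sample_std(y))

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = correlation(x, y)

# An exact linear relationship gives a correlation of +1
r_linear = correlation(x, [2 * xi + 1 for xi in x])

print(f"r = {r:.3f}")
print(f"r for an exact line = {r_linear:.3f}")
```

Because the units of X and Y cancel in the ratio, rescaling either variable (for example, dollars to thousands of dollars) leaves r unchanged.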
[Figure: scatterplot of Listing (vertical axis) against IncomePC (horizontal axis).]
\( r_{\text{Income},\text{Listing}} = +0.591 \)
Correlations
[Figure: three scatterplot panels: a panel with Noise on the vertical axis, cost against Defects, and "Scatterplot of Noise vs MoreNoise". The correlations shown on the panels are r = 0.0, r = +1.0, and r = +0.5.]