2023 Statistics Fin 6
Lecture 6
Descriptive Statistics
4. ANALYSIS OF CONCENTRATION
ANALYSIS OF CONCENTRATION –
Distribution of the total value among the elementary units:
whether the total value of the variable is distributed uniformly
among the elementary units or not
ANALYSIS OF KURTOSIS (PEAKEDNESS) –
Concentration of the elementary units near the mean value:
whether the distribution is mesokurtic, leptokurtic or platykurtic
Descriptive Statistics – MEASURES OF CONCENTRATION
The Lorenz curve

no      income   cumulative income
1       0        0
2       0        0
3       0        0
4       0        0
5       0        0
6       0        0
7       0        0
8       0        0
9       5        5
10      95       100
total   100

[Figure: the Lorenz curve; horizontal axis: units 1 to 10, vertical axis: cumulative income 0 to 100]

A - the area between the diagonal and the actual income distribution (the Lorenz curve)
B - the area between the curve and the axis

The Gini coefficient:
$$G = \frac{A}{A + B}$$

G = 0 - complete equality
G closer to 0 - income is distributed more evenly
G closer to 1 - income is distributed more unevenly
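As a rough illustration of G = A / (A + B), the sketch below computes the coefficient for the ten incomes in the table. It assumes the Lorenz curve is drawn through the cumulative shares and that the area B is approximated with the trapezoid rule; the function names `lorenz_points` and `gini` are hypothetical helpers, not part of the lecture.

```python
def lorenz_points(incomes):
    """Cumulative population and income shares, starting at (0, 0)."""
    ordered = sorted(incomes)
    total = sum(ordered)
    n = len(ordered)
    pop_share, inc_share = [0.0], [0.0]
    running = 0.0
    for i, x in enumerate(ordered, start=1):
        running += x
        pop_share.append(i / n)
        inc_share.append(running / total)
    return pop_share, inc_share


def gini(incomes):
    """G = A / (A + B), with B approximated by the trapezoid rule (an assumption)."""
    pop, inc = lorenz_points(incomes)
    # B: area under the Lorenz curve
    area_b = sum(
        (pop[i] - pop[i - 1]) * (inc[i] + inc[i - 1]) / 2
        for i in range(1, len(pop))
    )
    # A + B is the triangle under the diagonal, whose area is 0.5
    area_a = 0.5 - area_b
    return area_a / (area_a + area_b)


incomes = [0, 0, 0, 0, 0, 0, 0, 0, 5, 95]  # the table above
print(round(gini(incomes), 2))
```

For this table the result is close to 1, which matches the reading that almost all income is held by one unit; ten equal incomes would give a value of 0.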
Sources:
https://www.economicsonline.co.uk/Definitions/Gini_co-efficient.html
https://towardsdatascience.com/clearly-explained-gini-coefficient-and-lorenz-curve-fe6f5dcdc07
World map of income inequality: Gini coefficients by country (as %), based on World Bank data (2021).
The lowest Gini coefficients, about 0.2, indicate a low degree of inequality.
The highest Gini coefficients, over 0.6, indicate very unequal income distributions.
[Figure panels: Europe; Vietnam]
S80/S20 ratio
The "income quintile share ratio" (also called the "S80/S20 ratio" or the "20/20 ratio") is
the ratio of the total income received by the 20% of the population with the highest income (the top quintile)
to the total income received by the 20% of the population with the lowest income (the bottom quintile),
or, equivalently,
the annual income of the top 20% of the population expressed as the number of years the bottom 20%
of the population would have to work in order to earn the same income.
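The definition above can be sketched in a few lines. The income list is invented for illustration, and the helper `s80_s20` is a hypothetical name; to keep the quintiles unambiguous, the sketch assumes the number of units is a multiple of 5.

```python
def s80_s20(incomes):
    """Total income of the top 20% divided by total income of the bottom 20%."""
    ordered = sorted(incomes)
    n = len(ordered)
    if n % 5 != 0:
        raise ValueError("this sketch assumes the number of units is a multiple of 5")
    k = n // 5                      # number of units in one quintile
    bottom = sum(ordered[:k])       # the 20% with the lowest income
    top = sum(ordered[-k:])         # the 20% with the highest income
    if bottom == 0:
        raise ValueError("the bottom quintile has no income, so the ratio is undefined")
    return top / bottom


# Hypothetical incomes of ten units (not from the lecture)
example = [10, 10, 20, 20, 30, 30, 40, 40, 50, 50]
print(s80_s20(example))  # the top quintile earns 100, the bottom quintile 20
```

Note that the ratio is undefined for the Lorenz-curve table earlier in the lecture, because the bottom 20% there has zero income.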
In a boxplot the median splits the box in two; the box covers the interquartile interval (the middle 50% of the data).
MODIFIED BOXPLOTS
An outlier may be the result of:
• a measurement error,
• an observation from a different population,
• an unusual extreme observation;
• it may instead be an indicator of skewness.
Usually we use the quartiles and the interquartile range (IQR = Q3 - Q1) to identify potential outliers.
We define the lower and upper limits (fences) as follows:
Lower limit (fence) – 1.5 IQRs below the first quartile = Q1 - 1.5 IQR
Upper limit (fence) – 1.5 IQRs above the third quartile = Q3 + 1.5 IQR
Observations below the lower limit or above the upper limit are potential outliers.
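A minimal sketch of the fence rule follows, using Python's standard `statistics.quantiles`. The data set is invented for illustration, and the quartile convention (the module's default "exclusive" method) is an assumption; other software may compute slightly different quartiles.

```python
import statistics


def fences(data):
    """Return (lower fence, upper fence) from Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(data):
    """Observations lying outside the fences."""
    lower, upper = fences(data)
    return [x for x in data if x < lower or x > upper]


# Hypothetical sample (not from the lecture)
sample = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30]
print(fences(sample))
print(potential_outliers(sample))
```

In this sample only the value 30 lies above the upper fence, so it is the single potential outlier.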
Descriptive Statistics - INFORMATION ON CENTRE, VARIATION AND SKEWNESS
MODIFIED BOXPLOTS
Measurement            Distribution
                       A      B      C
Minimum                0.00   0.11   0.14
Lower quartile (Q1)    0.02   0.37   0.69
Median (Q2)            0.11   0.48   0.88
Upper quartile (Q3)    0.32   0.58   0.95
Maximum                0.86   0.93   1.00

[Figure: boxplots of distributions A, B and C]
INFORMATION ON CENTRE
The centre of distribution A is the lowest of the three distributions (median is 0.11).
The centre of distribution C is the highest of the three distributions (median is 0.88).
INFORMATION ON VARIATION
A - the interquartile range is Q3 - Q1 = 0.32 - 0.02 = 0.30
B - the interquartile range is Q3 - Q1 = 0.58 - 0.37 = 0.21
C - the interquartile range is Q3 - Q1 = 0.95 - 0.69 = 0.26
B is the most concentrated distribution: its interquartile range is 0.21, compared to 0.30 for distribution A and 0.26 for distribution C.
A, B and C all include potential outliers.
INFORMATION ON SKEWNESS
A - the distribution is positively skewed: the whisker and half-box are longer on the right side of the median than on the left side.
B - the distribution is approximately symmetric: both half-boxes are almost the same length (0.11 on the left side and 0.10 on the right side).
C - the distribution is negatively skewed: the whisker and half-box are longer on the left side of the median than on the right side.
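The readings above can be checked directly from the five-number summaries. The sketch below is only an illustration: the helper `analyse` and the dictionary layout are hypothetical, and the outlier flag simply tests whether the minimum or maximum lies outside the 1.5 IQR fences.

```python
# Five-number summaries from the table: (min, Q1, median, Q3, max)
summaries = {
    "A": (0.00, 0.02, 0.11, 0.32, 0.86),
    "B": (0.11, 0.37, 0.48, 0.58, 0.93),
    "C": (0.14, 0.69, 0.88, 0.95, 1.00),
}


def analyse(minimum, q1, median, q3, maximum):
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "iqr": round(iqr, 2),
        "left_half_box": round(median - q1, 2),
        "right_half_box": round(q3 - median, 2),
        # True when at least one extreme value falls outside the fences
        "has_potential_outliers": minimum < lower_fence or maximum > upper_fence,
    }


results = {name: analyse(*values) for name, values in summaries.items()}
for name, info in results.items():
    print(name, info)
```

A longer right half-box (as in A) points to positive skewness, a longer left half-box (as in C) to negative skewness.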
Descriptive Statistics - BOXPLOTS - in Excel
https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch12/5214889-eng.htm
Descriptive Statistics – DESCRIPTIVE ANALYSIS in Excel
Go to Data > Analysis | Data Analysis and choose the Descriptive Statistics option
Statistic              Description
Mean                   the arithmetic mean of the sample data.
Standard Error         the standard error of the mean (the sample standard deviation divided by the square root of the number of observations).
Median                 the middle value in the data set (the value that separates the largest half of the values from the smallest half).
Mode                   the most common value in the data set.
Standard Deviation     the sample standard deviation of the data set.
Sample Variance        the sample variance of the data set (the squared standard deviation).
Kurtosis               the kurtosis of the distribution.
Skewness               the skewness of the data set's distribution.
Range                  the difference between the maximum and minimum values in the data set.
Minimum                the smallest value in the data set.
Maximum                the largest value in the data set.
Sum                    the sum of all the values in the data set.
Count                  the number of values in the data set.
Largest(X)             the X-th largest value in the data set.
Smallest(X)            the X-th smallest value in the data set.
Confidence Level(X%)   the half-width (margin of error) of the confidence interval for the mean at the given confidence level X%.
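For readers working outside Excel, the sketch below reproduces several of the statistics listed above with Python's standard library. It assumes the bias-adjusted sample formulas for skewness and excess kurtosis that Excel's SKEW and KURT functions are documented to use; the function name `describe` and the sample data are hypothetical, and at least four observations are needed.

```python
import math
import statistics


def describe(data):
    """A few Excel-style descriptive statistics (sketch; needs n >= 4)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    z3 = sum(((x - mean) / s) ** 3 for x in data)
    z4 = sum(((x - mean) / s) ** 4 for x in data)
    # Bias-adjusted sample skewness and excess kurtosis
    skewness = n / ((n - 1) * (n - 2)) * z3
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": s / math.sqrt(n),
        "median": statistics.median(data),
        "standard_deviation": s,
        "sample_variance": s ** 2,
        "kurtosis": kurtosis,
        "skewness": skewness,
        "range": max(data) - min(data),
        "minimum": min(data),
        "maximum": max(data),
        "sum": sum(data),
        "count": n,
    }


stats = describe([1, 2, 3, 4, 5])  # hypothetical data
for name, value in stats.items():
    print(f"{name}: {value}")
```

Because the kurtosis here is the excess kurtosis, it is compared with 0 rather than with 3.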
Kurtosis (Excel reports excess kurtosis, i.e. kurtosis relative to the normal distribution):
= 0 - mesokurtic (normal) distribution
> 0 - leptokurtic distribution, more peaked than the normal one
< 0 - platykurtic distribution, flatter than the normal one
Skewness:
= 0 - the distribution is symmetric
< 0 - the distribution is skewed to the left (negatively skewed)
> 0 - the distribution is skewed to the right (positively skewed)
Descriptive Statistics - MOMENTS OF THE DISTRIBUTION
Definition:
Moments of a distribution
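The kth moment of a distribution is the average of the kth powers of the deviations of the individual
observations from any value x0.

For grouped data (values $x_i$ with frequencies $n_i$, $N = \sum_i n_i$):
$$m_k = \frac{\sum_i (x_i - x_0)^k n_i}{N}$$

For data grouped into class intervals ($x'_i$ is the midpoint of the i-th interval):
$$m_k = \frac{\sum_i (x'_i - x_0)^k n_i}{N}$$

Grades - xi    Number of grades - ni
2              0
3              3
4              10
5              3
Total          16

Wages - xi     Number of employees - ni
0 - 6          3
6 - 12         4
12 - 18        13
Total          20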
We distinguish:
- the moments about the origin, $x_0 = 0$,
- the moments about the mean, $x_0 = \bar{x}$.

The kth moment about the origin for detail (ungrouped) data:
$$m_k = \frac{\sum_i x_i^k}{N}$$

In the case of grouped data the formula changes into:
$$m_k = \frac{\sum_i x_i^k n_i}{N}$$
for single values with frequencies (as in the grades table), and
$$m_k = \frac{\sum_i (x'_i)^k n_i}{N}$$
for class intervals with midpoints $x'_i$ (as in the wages table).

It should be noted that the first moment about the origin is simply equal to the mean.
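As a quick check of the statement that the first moment about the origin equals the mean, the sketch below evaluates the grouped-data formula for the grades and wages tables. The helper name `raw_moment` is hypothetical, and the wage class midpoints 3, 9 and 15 are derived from the intervals 0 - 6, 6 - 12 and 12 - 18.

```python
def raw_moment(values, freqs, k):
    """k-th moment about the origin for grouped data: sum(x^k * n) / N."""
    n_total = sum(freqs)
    return sum(x ** k * n for x, n in zip(values, freqs)) / n_total


grades = [2, 3, 4, 5]
grade_counts = [0, 3, 10, 3]

wage_midpoints = [3, 9, 15]   # midpoints of 0 - 6, 6 - 12, 12 - 18
employee_counts = [3, 4, 13]

print(raw_moment(grades, grade_counts, 1))             # mean grade
print(raw_moment(grades, grade_counts, 2))             # second moment about the origin
print(raw_moment(wage_midpoints, employee_counts, 1))  # mean wage
```

The mean grade is 64 / 16 = 4 and the mean wage is 240 / 20 = 12.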
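The moments about the mean (central moments), $x_0 = \bar{x}$

The 1st central moment
detail data:
$$\mu_1 = \frac{\sum_i (x_i - \bar{x})}{N} = 0$$
grouped data:
$$\mu_1 = \frac{\sum_i (x_i - \bar{x}) n_i}{N} = 0$$
grouped data with class intervals ($x_i \to x'_i$):
$$\mu_1 = \frac{\sum_i (x'_i - \bar{x}) n_i}{N} = 0$$
since the sum of the deviations of the observations from their mean is zero.

The 2nd central moment - the variance of X
detail data:
$$\mu_2 = \frac{\sum_i (x_i - \bar{x})^2}{N} = S_x^2$$
grouped data:
$$\mu_2 = \frac{\sum_i (x_i - \bar{x})^2 n_i}{N} = S_x^2$$
grouped data with class intervals:
$$\mu_2 = \frac{\sum_i (x'_i - \bar{x})^2 n_i}{N} = S_x^2$$

The 3rd central moment
For odd values of k the sum contains both positive and negative terms (unless every deviation is zero).
For symmetric distributions the positive and negative terms cancel out, so the moment equals zero.
detail data:
$$\mu_3 = \frac{\sum_i (x_i - \bar{x})^3}{N}$$
grouped data:
$$\mu_3 = \frac{\sum_i (x_i - \bar{x})^3 n_i}{N}$$
grouped data with class intervals:
$$\mu_3 = \frac{\sum_i (x'_i - \bar{x})^3 n_i}{N}$$

Standardised measure of skewness:
$$\alpha_3 = \frac{\mu_3}{S_x^3}$$
In practice $\alpha_3$ usually lies in the interval (-2; 2).
= 0 - the distribution is symmetric
< 0 - the distribution is skewed to the left (negatively skewed)
> 0 - the distribution is skewed to the right (positively skewed)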
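The moments about the mean, $x_0 = \bar{x}$

The 4th central moment
detail data:
$$\mu_4 = \frac{\sum_i (x_i - \bar{x})^4}{N}$$
grouped data:
$$\mu_4 = \frac{\sum_i (x_i - \bar{x})^4 n_i}{N}$$
grouped data with class intervals:
$$\mu_4 = \frac{\sum_i (x'_i - \bar{x})^4 n_i}{N}$$

Standardised measure of kurtosis:
$$\alpha_4 = \frac{\mu_4}{S_x^4}$$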
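To see the central moments at work, the sketch below applies the grouped-data formulas to the grades and wages tables from the definition slide. The helper names (`grouped_mean`, `central_moment`, `standardised`) are hypothetical, and the wage midpoints 3, 9 and 15 are derived from the class intervals.

```python
def grouped_mean(values, freqs):
    return sum(x * n for x, n in zip(values, freqs)) / sum(freqs)


def central_moment(values, freqs, k):
    """k-th moment about the mean for grouped data."""
    mean = grouped_mean(values, freqs)
    return sum((x - mean) ** k * n for x, n in zip(values, freqs)) / sum(freqs)


def standardised(values, freqs, k):
    """k-th central moment divided by S_x^k (k = 3: skewness, k = 4: kurtosis)."""
    variance = central_moment(values, freqs, 2)
    return central_moment(values, freqs, k) / variance ** (k / 2)


grades, grade_counts = [2, 3, 4, 5], [0, 3, 10, 3]
wages, wage_counts = [3, 9, 15], [3, 4, 13]   # class midpoints

print(central_moment(grades, grade_counts, 1))  # first central moment
print(central_moment(grades, grade_counts, 2))  # variance
print(standardised(grades, grade_counts, 3))    # skewness of grades
print(standardised(grades, grade_counts, 4))    # kurtosis of grades
print(standardised(wages, wage_counts, 3))      # skewness of wages
```

The grades are symmetric around 4, so their third central moment vanishes, while the wages, with most employees in the highest class, come out negatively skewed.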
Example
The Radio-Taxi company surveyed its taxis on the number of kilometers traveled per week.
Using the method of statistical moments, present complete descriptive statistics of the population.

Mileage           Percentage     Midpoint
(thousand km)     of taxis ni    x'i         x'i ni
0.8 - 1.2         2              1.0         2.0
1.2 - 1.6         15             1.4         21.0
1.6 - 2.0         41             1.8         73.8
2.0 - 2.4         33             2.2         72.6
2.4 - 2.8         6              2.6         15.6
2.8 - 3.2         3              3.0         9.0
total             N = 100                    194.0

$$\bar{x} = \frac{\sum_i x'_i n_i}{N} = \frac{194}{100} = 1.94$$

Interpretation:
the average taxi mileage is 1.94 thousand km.
Measures of dispersion

Mileage           Percentage     Midpoint
(thousand km)     of taxis ni    x'i         x'i ni     (x'i - x̄)² ni
0.8 - 1.2         2              1.0         2.0        1.77
1.2 - 1.6         15             1.4         21.0       4.37
1.6 - 2.0         41             1.8         73.8       0.80
2.0 - 2.4         33             2.2         72.6       2.23
2.4 - 2.8         6              2.6         15.6       2.61
2.8 - 3.2         3              3.0         9.0        3.37
total             N = 100                    194.0      15.16

$$\mu_2 = S_x^2 = \frac{\sum_i (x'_i - \bar{x})^2 n_i}{N} = \frac{15.16}{100} = 0.1516, \qquad S_x = \sqrt{0.1516} \approx 0.389 \approx 0.4$$
Interpretation: the taxi mileages differ from the mean by about 0.4 thousand km on average.

THE COEFFICIENT OF VARIATION
$$V_x = \frac{S_x}{\bar{x}} \cdot 100\% = \frac{0.389}{1.94} \cdot 100\% \approx 20\%$$
Interpretation: the standard deviation of the taxi mileage is about 20% of the mean.

THE TYPICAL RANGE
$$\bar{x} - S_x < x_{typ} < \bar{x} + S_x, \qquad 1.54 < x_{typ} < 2.34$$
Interpretation:
A typical taxi mileage ranges from 1.54 to 2.34 thousand km.
Or
About 68% of the taxi mileages range from 1.54 to 2.34 thousand km.
Measures of skewness
The analysis of skewness can be presented by calculating the 3rd moment about the mean.

Mileage           Percentage     Midpoint
(thousand km)     of taxis ni    x'i         x'i ni     (x'i - x̄)² ni    (x'i - x̄)³ ni
0.8 - 1.2         2              1.0         2.0        1.77              -1.66
1.2 - 1.6         15             1.4         21.0       4.37              -2.36
1.6 - 2.0         41             1.8         73.8       0.80              -0.11
2.0 - 2.4         33             2.2         72.6       2.23              0.58
2.4 - 2.8         6              2.6         15.6       2.61              1.73
2.8 - 3.2         3              3.0         9.0        3.37              3.57
total             N = 100                    194.0      15.16             1.742

$$\mu_3 = \frac{\sum_i (x'_i - \bar{x})^3 n_i}{N} = \frac{1.742}{100} = 0.01742$$
$$\alpha_3 = \frac{\mu_3}{S_x^3} = \frac{0.01742}{0.389^3} \approx 0.3$$

Interpretation ($\alpha_3$ usually lies in the interval (-2; 2)):
= 0 - the distribution is symmetric
< 0 - the distribution is skewed to the left (negatively skewed)
> 0 - the distribution is skewed to the right (positively skewed)
$\alpha_3$ = 0.3 - the distribution is slightly skewed to the right.
The mean is still a good measure of central tendency, although somewhat more units lie below the mean than above it.
Measures of kurtosis

Mileage           Percentage     Midpoint
(thousand km)     of taxis ni    x'i         x'i ni     (x'i - x̄)² ni    (x'i - x̄)³ ni    (x'i - x̄)⁴ ni
0.8 - 1.2         2              1.0         2.0        1.77              -1.66             1.56
1.2 - 1.6         15             1.4         21.0       4.37              -2.36             1.28
1.6 - 2.0         41             1.8         73.8       0.80              -0.11             0.02
2.0 - 2.4         33             2.2         72.6       2.23              0.58              0.15
2.4 - 2.8         6              2.6         15.6       2.61              1.73              1.14
2.8 - 3.2         3              3.0         9.0        3.37              3.57              3.79
total             N = 100                    194.0      15.16             1.742             7.929

$$\mu_4 = \frac{\sum_i (x'_i - \bar{x})^4 n_i}{N} = \frac{7.929}{100} = 0.0793$$
$$\alpha_4 = \frac{\mu_4}{S_x^4} = \frac{0.0793}{0.1516^2} \approx 3.45$$
Here $S_x^4 = (S_x^2)^2 = 0.1516^2 \approx 0.0230$ is taken from the unrounded variance; using the rounded $S_x = 0.4$ would give about 3.10 instead.

Interpretation:
$\alpha_4$ = 3.45 > 3 - the distribution is leptokurtic,
i.e. more peaked than the normal distribution.
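The whole taxi example can be recomputed in a few lines, which is a convenient way to check the table columns and the rounding. This is a sketch under the assumptions of the example (class midpoints stand in for the observations, percentages act as frequencies); the variable and function names are hypothetical.

```python
import math

midpoints = [1.0, 1.4, 1.8, 2.2, 2.6, 3.0]   # x'_i, thousand km
shares = [2, 15, 41, 33, 6, 3]                # n_i, percentage of taxis
N = sum(shares)

mean = sum(x * n for x, n in zip(midpoints, shares)) / N


def central_moment(k):
    """k-th moment about the mean for the grouped taxi data."""
    return sum((x - mean) ** k * n for x, n in zip(midpoints, shares)) / N


variance = central_moment(2)
std_dev = math.sqrt(variance)
coeff_variation = std_dev / mean * 100        # in percent
typical_range = (mean - std_dev, mean + std_dev)
alpha3 = central_moment(3) / std_dev ** 3     # standardised skewness
alpha4 = central_moment(4) / variance ** 2    # standardised kurtosis

print(f"mean = {mean:.2f}, variance = {variance:.4f}, S = {std_dev:.3f}")
print(f"V = {coeff_variation:.1f}%")
print(f"typical range = ({typical_range[0]:.2f}, {typical_range[1]:.2f})")
print(f"alpha3 = {alpha3:.2f}, alpha4 = {alpha4:.2f}")
```

Working with the unrounded standard deviation gives a typical range of roughly 1.55 to 2.33 rather than 1.54 to 2.34; the small difference comes only from rounding S to 0.4 in the slides.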