Descriptive Stats - PGP2022
Business Data
Sahadeb Sarkar
Operations Management Group, IIM Calcutta
Topics to be covered
1. Summarizing Data through Tables, Charts and Graphs
[Sec 2.1 – 2.6, Sec 3.1 – 3.4, 3.6]
2. Regression Based Business Forecasting: Covariance &
Correlation Coefficient (Sec 3.5, 5.2), Simple Linear
Regression (Sec 13.1-13.6), Multiple Linear Regression
(Sec 14.1, 14.2, 14.6 (Dummy Var Reg), 15.1
(Polynomial Reg), 15.2)
Types of Data
• Categorical (Qualitative): e.g., Gender of customer / Salesperson, Location of store, Preference, Bond rating
• Numerical (Quantitative)
  – Discrete: e.g., Family size, No. of credit cards, No. of footfalls in a store, No. of units in inventory
  – Continuous: e.g., Product life, Waiting time, Market share, Sales, Cost, Inventory value, Shipment weight
Bond Ratings
Bond Ratings (contd.)
Various Measurement Scales
Nonmetric Scale:
i. Nominal Scale (e.g., gender, type of stocks
(growth- or value-), internet provider)
Metric Scale:
i. Interval Scale (e.g., temperature (in °C, °F), IQ score,
standardized test scores (ACT/SAT), product ratings);
zero is relative, not absolute, so the ratio of two
values is not meaningful
Various Measurement Scales
Scale      Description                              Values
Nominal    Numbers assigned to runners              7, 8, 3
Interval   Performance rating on a 0 to 10 scale    8.2, 9.1, 9.6
Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
[Figure: the same data organized as a stem-and-leaf display, tables, and histograms]
Organizing Numerical Data
(continued)
• Sales Data in Raw Form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
• Data in Ordered Array from Smallest to Largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Stem-and-Leaf Display:
  2 | 144677
  3 | 028
  4 | 1
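The stem-and-leaf display above can be rebuilt from the raw sales data. The sketch below assumes two-digit values, so the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem) and collect the units digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Join the leaves of each stem into one string, stems in ascending order
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

sales = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]  # raw data from the slide
display = stem_and_leaf(sales)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

Sorting first means each row of leaves comes out in ascending order, as in the slide.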
Stem and Leaf Display
Stem-Leaf Diagram of Scores
Histogram of Scores
Frequency Distribution Tables
Class                        Frequency   Relative Frequency   Percentage
More than 9 and up to 19         3             .15                15
More than 19 and up to 29        6             .30                30
More than 29 and up to 39        5             .25                25
More than 39 and up to 49        4             .20                20
More than 49 and up to 59        2             .10                10
Total                           20            1                  100
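The relative frequency and percentage columns follow directly from the class frequencies. A minimal sketch, using only the frequencies shown in the table (the raw observations behind them are not given in the slides):

```python
# Class labels and frequencies as shown in the table
classes = [
    "More than 9 and up to 19",
    "More than 19 and up to 29",
    "More than 29 and up to 39",
    "More than 39 and up to 49",
    "More than 49 and up to 59",
]
frequencies = [3, 6, 5, 4, 2]

total = sum(frequencies)
# Relative frequency = class frequency / total; percentage = 100 * relative frequency
rows = [(label, f, f / total, 100 * f / total)
        for label, f in zip(classes, frequencies)]

for label, f, rel, pct in rows:
    print(f"{label:<28}{f:>4}{rel:>8.2f}{pct:>6.0f}")
print(f"{'Total':<28}{total:>4}{1:>8}{100:>6}")
```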
Histogram
[Figure: histogram of Frequency against Class Midpoints (4, 14, 24, 34, 44, 54, More), with bar heights 0, 3, 6, 5, 4, 2, 0; annotations mark the Class Boundaries and "No Gaps Between Bars"]
Annual Net Sales Data
Novartis: Frequency Distribution Table
Novartis Annual Net Sales
[Figure: histogram of Percentage (0.00 to 40.00) against Bin Upper Boundary (241.51, 354.62, 467.72, 580.82, 693.92, 807.02); bar labels shown: 36.84, 26.32, 15.79, 10.53, 10.53, 0.00]
NOVARTIS
[Figure: plot of Percentage (0 to 40) against Bin Upper Boundary (200 to 900)]
Descriptive Summary Measures
for Numerical Data
Summary Measures
• Arithmetic Mean, Median, Mode, Geometric Mean, Harmonic Mean, Weighted “Mean”, Trimmed Mean
• Range, Inter Quartile Range, (Variance), Standard Deviation, Coefficient of Variation
“Mean” or Arithmetic Mean
\[
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}
\]
where n is the sample size and X_1, ..., X_n are the observed values.
Median
[Figure: two pairs of dot plots. On the axes 11 to 20, Median = 13 in both plots. On the axes 0 to 10 and 0 to 14, Median = 5 in both plots.]
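The point of the median slide, that an extreme value moves the mean but not the median, can be checked numerically. The data values below are an assumption: the dot plots did not survive, but the sets 1, 3, 5, 7, 9 and 1, 3, 5, 7, 14 are consistent with the figures quoted in these slides (Median = 5 in both cases, Mean = 5 and Mean = 6).

```python
from statistics import mean, median

# Assumed data sets (not given explicitly in the slides)
without_extreme = [1, 3, 5, 7, 9]
with_extreme = [1, 3, 5, 7, 14]   # largest value replaced by an extreme one

mean_a, median_a = mean(without_extreme), median(without_extreme)
mean_b, median_b = mean(with_extreme), median(with_extreme)

print("Mean:  ", mean_a, "->", mean_b)      # the mean shifts with the extreme value
print("Median:", median_a, "->", median_b)  # the median stays where it was
```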
Australia batsman Hughes dies from head injury
(http://timesofindia.indiatimes.com/sports/off-the-field/Australia-batsman-
Hughes-dies-from-head-injury/articleshow/45292785.cms)
U.S. Household Incomes Surged 5.2% in 2015,
First Gain Since 2007
• http://www.wsj.com/articles/u-s-household-incomes-surged-5-2-in-2015-ending-slide-1473776295
[Figure: two dot plots. On the axis 0 to 14, Mode = 9; on the axis 0 to 6, No Mode.]
Geometric Mean
[Figure: two dot plots, on the axes 0 to 10 and 0 to 14]
• First data set: Mean = 5, Geo. Mean = 3.94
• Second data set: Mean = 6, Geo. Mean = 4.30
Geometric Mean Used for Ratio Data & by Financial Analysts
\[
\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}
\]
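A short sketch of the geometric mean as the n-th root of the product of n positive values. Both inputs are assumptions: the data sets 1, 3, 5, 7, 9 and 1, 3, 5, 7, 14 are not printed in the slides but are consistent with the quoted geometric means of 3.94 and 4.30, and the growth factors in the second part are purely hypothetical.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(values) ** (1 / len(values))

# Assumed data sets, consistent with the slide's Geo. Mean figures
gm_first = geometric_mean([1, 3, 5, 7, 9])
gm_second = geometric_mean([1, 3, 5, 7, 14])
print(round(gm_first, 2), round(gm_second, 2))

# Hypothetical growth factors: a 10% gain followed by a 10% loss
avg_factor = geometric_mean([1.10, 0.90])
print(f"average growth factor per period: {avg_factor:.4f}")
```

For rates of return the geometric mean is applied to growth factors (1 + return), which is why the hypothetical example averages 1.10 and 0.90 rather than +10% and -10%.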
Geometric Mean in Business
Harmonic Mean in Business
• The harmonic mean is appropriate when an average of rates
(e.g., productivity of machines, speed, price/earnings ratio)
is to be calculated.
• The harmonic mean of the positive values X1, X2, ..., Xn is
defined to be
\[
\bar{X}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}
\]
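To illustrate averaging rates, the sketch below uses a hypothetical trip: the same distance is covered once at 40 km/h and once at 60 km/h. The speeds are made up for illustration; the harmonic mean gives the average speed over the whole trip, which differs from the arithmetic mean of 50.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)

speeds = [40, 60]  # hypothetical speeds (km/h) over two equal distances
average_speed = harmonic_mean(speeds)
print(f"average speed: {average_speed:.1f} km/h")

# Cross-check: 120 km each way takes 3 h + 2 h, so 240 km in 5 h
direct = 240 / (120 / 40 + 120 / 60)
print(f"direct calculation: {direct:.1f} km/h")
```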
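Weighted Harmonic Mean
• If one invests a proportion w_i of a certain amount of money
in a mutual fund at price X_i in period i, for prices X1,
X2, ..., Xn over n time periods (with w_1 + ... + w_n = 1), then the average
price paid per unit is a weighted harmonic mean:
\[
\bar{X}_{WH} = \frac{1}{\sum_{i=1}^{n} \frac{w_i}{X_i}}
\]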
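The average price per unit can be checked against a direct count of units bought. The amount, proportions and prices below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def weighted_harmonic_mean(weights, values):
    """1 / sum(w_i / x_i), for weights that sum to 1 and positive values."""
    if abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return 1 / sum(w / x for w, x in zip(weights, values))

amount = 1000                    # hypothetical total amount invested
proportions = [0.5, 0.3, 0.2]    # hypothetical share invested in each period
prices = [10, 20, 25]            # hypothetical unit price in each period

average_price = weighted_harmonic_mean(proportions, prices)

# Direct check: units bought in each period, then money spent per unit
units = sum(amount * w / x for w, x in zip(proportions, prices))
direct_price = amount / units

print(f"weighted harmonic mean: {average_price:.4f}")
print(f"money / units bought:   {direct_price:.4f}")
```

Both routes give the same figure because the total amount cancels out of the ratio, leaving only the proportions and the prices.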
Weighted Mean
Weighted Mean used Daily at BSE & NSE
Measures of Variation
• Variation: Inter-quartile Range
• The “coefficient of variation” is used to compare the variation of multiple
populations/samples with significantly different mean values
[Figure: distributions with the same center but different variation]
Measures of Variation:
Range Can Be Misleading
Ignores the way in which data are distributed
[Figure: two dot plots on the axis 7 to 12, with the data distributed differently]
Range = 12 - 7 = 5 in both plots
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
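The two data sets above differ in a single value, which makes the sensitivity of the range easy to verify. The sketch also prints the median for contrast; that comparison is an addition for illustration and is not on the slide.

```python
from statistics import median

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# The 24 values the two data sets share, as listed on the slide
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = common + [5]
with_outlier = common + [120]

print("Range: ", value_range(without_outlier), "vs", value_range(with_outlier))
print("Median:", median(without_outlier), "vs", median(with_outlier))
```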
Inter Quartile Range (Midspread) = IQR = Q3 - Q1
• Q1 = (n+1)/4-ranked (interpolated) value and Q3 = 3(n+1)/4-ranked
(interpolated) value. In Excel (version 2010 and later) use the
QUARTILE.EXC function.
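The ranking rule can be sketched as follows, with linear interpolation between neighbouring ordered values when the rank is fractional. The helper names are illustrative, and the example reuses the ten sales values from the earlier slides; other quartile conventions exist and can give different answers.

```python
def ranked_value(sorted_values, rank):
    """Value at a 1-based, possibly fractional rank, by linear interpolation."""
    n = len(sorted_values)
    if not 1 <= rank <= n:
        raise ValueError("rank falls outside the data; sample too small for this rule")
    lower = int(rank)            # whole part of the rank
    fraction = rank - lower      # fractional part of the rank
    if fraction == 0:
        return float(sorted_values[lower - 1])
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)

def quartiles(values):
    """Q1 at rank (n+1)/4 and Q3 at rank 3(n+1)/4 of the ordered data."""
    data = sorted(values)
    n = len(data)
    return ranked_value(data, (n + 1) / 4), ranked_value(data, 3 * (n + 1) / 4)

sales = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
q1, q3 = quartiles(sales)
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

With n = 10 the ranks are 2.75 and 8.25, so Q1 lies between the 2nd and 3rd ordered values and Q3 a quarter of the way from the 8th to the 9th.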
Various Methods of IQR Calculation
Source: http://mathworld.wolfram.com/Quartile.html
Variance & Standard Deviation
\[
S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
\]
\[
S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}
\]
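A minimal sketch of the sample variance and standard deviation with the n - 1 divisor, applied to the ten sales values from the earlier slides. The function name is illustrative.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

sales = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
s_squared = sample_variance(sales)
s = math.sqrt(s_squared)
print(f"S^2 = {s_squared:.3f}, S = {s:.3f}")
```

Here the mean is 29 and the squared deviations sum to 366, so the variance is 366 / 9.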
Same Mean but different Standard Deviation
[Figure: three dot plots on the axis 11 to 21]
Data     Mean    s
Data A   15.5    3.338
Data B   15.5    .9258
Data C   15.5    4.57
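The three dot plots did not survive, so the data values below are an assumption: they are eight-point sets chosen because they give the mean and s quoted for Data A, B and C. They may or may not be the values originally plotted.

```python
from statistics import mean, stdev

# Assumed data sets, chosen to match the quoted mean and s
data_sets = {
    "Data A": [11, 12, 13, 16, 16, 17, 18, 21],
    "Data B": [14, 15, 15, 15, 16, 16, 16, 17],
    "Data C": [11, 11, 11, 12, 19, 20, 20, 20],
}

summary = {name: (mean(values), stdev(values)) for name, values in data_sets.items()}
for name, (m, s) in summary.items():
    print(f"{name}: mean = {m}, s = {s:.4f}")
```

The closer the values cluster around 15.5, the smaller s becomes, even though the mean is identical in all three sets.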
Coefficient of Variation
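The slide's formula did not survive extraction. The sketch below assumes the usual definition, CV = (S / mean) × 100%, and uses two hypothetical series with very different means to show why the measure is useful for comparison.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (assumed definition)."""
    return stdev(values) / mean(values) * 100

# Hypothetical series with very different mean levels
series_a = [95, 100, 105]   # mean 100, s = 5
series_b = [8, 10, 12]      # mean 10,  s = 2

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(f"Series A: s = {stdev(series_a):.1f}, CV = {cv_a:.1f}%")
print(f"Series B: s = {stdev(series_b):.1f}, CV = {cv_b:.1f}%")
```

Series A has the larger standard deviation, but relative to its mean it varies less than Series B.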
Shape of a Distribution
• Describes how data are distributed
• Two useful shape related statistics are:
– Skewness
• Measures the extent to which data values are not
symmetrical
– Kurtosis
• Measures the peakedness of the curve of the
distribution, that is, how sharply the curve rises
approaching the center of the distribution
Skewness
Skewness Statistic: < 0 (left-skewed), 0 (symmetric), > 0 (right-skewed)
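The sign of the skewness statistic can be illustrated with a simple moment-based version, the third central moment divided by the 1.5 power of the second. This is one of several definitions; spreadsheet functions such as Excel's SKEW apply a sample-size adjustment, so their values will differ somewhat. The data sets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m3 = sum((x - x_bar) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Hypothetical data sets
left_skewed = [-10, -2, -2, -1, -1]   # long tail to the left
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 10]       # long tail to the right

skew_left = skewness(left_skewed)
skew_sym = skewness(symmetric)
skew_right = skewness(right_skewed)
print(f"{skew_left:.3f}  {skew_sym:.3f}  {skew_right:.3f}")
```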
Descriptive Stats Using Microsoft Excel Functions
Normalization of Performance Ratings Coming from Diverse Sources
Normalization of Performance Ratings (out of 10) on Employees
Performance Ratings by Managers A, B, C out of 10
Normalization Method
(Ratings out of 10)
• Standardized Rating = (Raw Rating – mean)/stdev
• Grand mean
• Grand stdev
Example (Manager A)
Grand mean=5.39
Grand stdev=1.76
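A sketch of how the standardized ratings might be put back on a common scale. Two things here are assumptions rather than the slide's stated method: the raw ratings for Manager A are hypothetical, and the final step (grand mean + grand stdev × standardized rating) is one plausible reading of why the grand mean and grand stdev are listed. The sample standard deviation (n - 1 divisor) is also assumed.

```python
from statistics import mean, stdev

GRAND_MEAN = 5.39    # from the slide
GRAND_STDEV = 1.76   # from the slide

def standardize(ratings):
    """(Raw Rating - mean) / stdev, using the manager's own mean and stdev."""
    m, s = mean(ratings), stdev(ratings)
    return [(r - m) / s for r in ratings]

def rescale(z_scores, target_mean, target_stdev):
    """Assumed final step: map standardized ratings onto the grand scale."""
    return [target_mean + target_stdev * z for z in z_scores]

manager_a_raw = [6, 7, 8, 9]   # hypothetical ratings out of 10 by Manager A
z_a = standardize(manager_a_raw)
normalized_a = rescale(z_a, GRAND_MEAN, GRAND_STDEV)

for raw, z, norm in zip(manager_a_raw, z_a, normalized_a):
    print(f"raw {raw} -> standardized {z:+.3f} -> normalized {norm:.2f}")
```

After this step every manager's ratings share the same mean and spread, so a lenient and a strict rater become comparable while the ranking within each manager is preserved.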