Descriptive Stats - PGP2022

Organizing & Making Sense of Business Data

Sahadeb Sarkar
Operations Management Group, IIM Calcutta

Topics to be covered
1. Summarizing Data through Tables, Charts and Graphs
[Sec 2.1 – 2.6, Sec 3.1 – 3.4, 3.6]
2. Regression Based Business Forecasting: Covariance &
Correlation Coefficient (Sec 3.5, 5.2), Simple Linear
Regression (Sec 13.1-13.6), Multiple Linear Regression
(Sec 14.1, 14.2, 14.6 (Dummy Var Reg), 15.1
(Polynomial Reg), 15.2)

Textbook: “Statistics for Managers Using Microsoft Excel”, Levine, Stephan & Szabat, 8th ed., Pearson Education.

Types of Data
Data
• Categorical (Qualitative)
  (e.g., Gender of customer / Salesperson, Location of store, Preference, Bond rating)
• Numerical (Quantitative)
  – Discrete (e.g., Family size, No. of credit cards, No. of footfalls in a store, No. of units in inventory)
  – Continuous (e.g., Product life, Waiting time, Market share, Sales, Cost, Inventory value, Shipment weight)

Bond Ratings

Bond Ratings (contd.)

Various Measurement Scales
Nonmetric Scale:
i. Nominal Scale (e.g., gender, type of stock (growth or value), internet provider)
ii. Ordinal Scale (e.g., course grades, employee designation, bond rating)

Metric Scale:
i. Interval Scale (e.g., Temp (in C, F), IQ Score, Standardized Test Scores (ACT/SAT), Product ratings). Zero is relative, not absolute, so the ratio of two numbers is not meaningful.
ii. Ratio Scale (e.g., Age, Weight, Sales, Profit, Demand, Counts, Temp (in K)). Zero is absolute. Conversions: C = K - 273.15, F = (K - 273.15) * 9/5 + 32

IQ Scale
• Over 140 - Genius or near genius
• 120 - 140 - Very superior intelligence
• 110 - 119 - Superior intelligence
• 90 - 109 - Normal or average intelligence
• 80 - 89 - Dullness
• 70 - 79 - Borderline deficiency
• Under 70 - Definite feeble-mindedness

Here the average is set at 100; one could equally define it to be 50, say.
These numbers are on an interval scale.

Measurement Scales
• Nominal Scale: Numbers used/assigned are just labels
• Ordinal Scale: Involves the ranking of individuals,
attitudes or items
• Interval Scale: Can say that a respondent’s preference rating
scores 1 and 2 are as far apart as his/her scores 4 and
5, but not that a person with score 10 feels twice as
strongly as one with score 5. [Temperature: one cannot say
50°F is twice as hot as 25°F, since the corresponding
centigrade values, 10°C and -3.9°C, are not in the
ratio 2:1; C=(F-32)*5/9]
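The temperature argument above can be checked numerically. The following is a minimal sketch using only the conversion formulas quoted on these slides; the function names are illustrative and not part of the course material.

```python
def fahrenheit_to_celsius(f):
    """C = (F - 32) * 5/9, as given on the slide."""
    return (f - 32) * 5 / 9


def celsius_to_kelvin(c):
    """K = C + 273.15, the inverse of C = K - 273.15."""
    return c + 273.15


temps_f = [50.0, 25.0]
temps_c = [fahrenheit_to_celsius(f) for f in temps_f]
temps_k = [celsius_to_kelvin(c) for c in temps_c]

# Ratios change with the unit on interval scales (F and C) ...
ratio_f = temps_f[0] / temps_f[1]
ratio_c = temps_c[0] / temps_c[1]
# ... and only the Kelvin ratio refers to an absolute zero.
ratio_k = temps_k[0] / temps_k[1]

# Differences, by contrast, stay proportional across F and C.
diff_f = temps_f[0] - temps_f[1]
diff_c = temps_c[0] - temps_c[1]

print(f"ratio in F: {ratio_f:.3f}, in C: {ratio_c:.3f}, in K: {ratio_k:.3f}")
print(f"difference in F: {diff_f:.2f}, in C: {diff_c:.2f}")
```

The Fahrenheit ratio is 2, the Celsius ratio is negative, and the Kelvin ratio is only slightly above 1, which is why "twice as hot" is not a meaningful statement on an interval scale.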

Various Measurement Scales
Scale      What is measured                         Runners (as pictured)
Nominal    Numbers assigned to runners              7             8              3
Ordinal    Rank order of winners                    Third place   Second place   First place
Interval   Performance rating on a 0 to 10 scale    8.2           9.1            9.6
Ratio      Time to finish, in seconds               15.2          14.1           13.4

Example: Measurement Scales

Scale:  Nominal                    Ordinal      Interval               Ratio
        No.  Store                 Preference   Preference Ratings     $ spent last
                                   Rankings     1 to 7     -3 to 3     3 months
        1.   Parisian              7            5          1           0
        2.   Macy’s                2            7          3           200
        3.   Kmart                 8            4          0           0
        4.   Kohl’s                3            6          2           100
        5.   J.C. Penney           1            7          3           250
        6.   Neiman Marcus         5            5          1           35
        7.   Marshalls             9            4          0           0
        8.   Saks Fifth Avenue     6            5          1           100
        9.   Sears                 4            6          2           0
        10.  Wal-Mart              10           2          -2          10

Organizing Numerical Data

Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21

• Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Stem and Leaf Display:
  2 | 144677
  3 | 028
  4 | 1
• Frequency Distributions: Tables, Histograms

Organizing Numerical Data
(continued)
• Sales Data in Raw Form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
• Data in Ordered Array from Smallest to Largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Stem-and-Leaf Display:
2 | 144677
3 | 028
4 | 1

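The stem-and-leaf display above can be reproduced with a few lines of code. This is a minimal sketch that assumes non-negative two-digit integers (tens digit as stem, units digit as leaf); the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in sorted order."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(groups.items())}


# Sales data in raw form, as on the slide.
sales = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(sales)

for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```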
Stem and Leaf Display

• A stem-and-leaf display organizes data into groups (called
stems) so that the values within each group (the leaves)
branch out to the right on each row.

Age of Surveyed College Students
Day Students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45

Day Students        Night Students
Stem  Leaf          Stem  Leaf
1     67788899      1     8899
2     0012257       2     0138
3     28            3     23
4     2             4     15

Stem-Leaf Diagram of Exam Scores

Stem-Leaf Diagram of Scores

Histogram of Scores

Frequency Distribution Tables

• Sort sales data values in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find Range: 58 - 12 = 46
• Select Number of Classes: 5 (usually between 5 and 10)
• Compute Class Interval (width): 10 (46/5 = 9.2, then round up)
• Determine Class Boundaries (limits): 9, 19, 29, 39, 49, 59
• Compute Class Midpoints: 14, 24, 34, 44, 54
• Count Observations & Assign to Classes

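The steps above can be sketched in code as follows. The starting boundary of 9 is taken from the slide rather than computed, and the "more than lower, up to upper" convention is assumed for assigning values to classes; variable names are illustrative.

```python
import math

sales = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

num_classes = 5
data_range = max(sales) - min(sales)          # 58 - 12 = 46
width = math.ceil(data_range / num_classes)   # 46 / 5 = 9.2, rounded up to 10

first_boundary = 9  # starting boundary chosen on the slide
boundaries = [first_boundary + k * width for k in range(num_classes + 1)]
classes = list(zip(boundaries, boundaries[1:]))
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# A value belongs to a class if it is more than lo and up to hi.
frequencies = [sum(1 for x in sales if lo < x <= hi) for lo, hi in classes]
relative = [f / len(sales) for f in frequencies]

for (lo, hi), f, r in zip(classes, frequencies, relative):
    print(f"More than {lo} and up to {hi}: {f}  {r:.2f}  {r * 100:.0f}%")
```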
Frequency Distributions: Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                        Frequency   Relative Frequency   Percentage
More than 9 and up to 19     3           .15                  15
More than 19 and up to 29    6           .30                  30
More than 29 and up to 39    5           .25                  25
More than 39 and up to 49    4           .20                  20
More than 49 and up to 59    2           .10                  10
Total                        20          1                    100

Histogram

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: histogram of the data; x-axis: class midpoints 14, 24, 34, 44, 54 (class boundaries 9 to 59); y-axis: frequency; no gaps between bars]

Annual Net Sales Data

Novartis: Frequency Distribution Table

Bin Lower   Bin Upper   Bin                Relative
Boundary    Boundary    Midpoint   Freq    Freq       Percentage
241.52      354.62      298.07     3       0.1579     15.79
354.62      467.72      411.17     5       0.2632     26.32
467.72      580.82      524.27     7       0.3684     36.84
580.82      693.92      637.37     2       0.1053     10.53
693.92      807.02      750.47     2       0.1053     10.53

Novartis Annual Net Sales

[Figure: percentage histogram of Novartis annual net sales; x-axis: Bin Upper Boundary (241.51, 354.62, 467.72, 580.82, 693.92, 807.02); y-axis: Percentage; bar labels 0.00, 15.79, 26.32, 36.84, 10.53, 10.53]

NOVARTIS

[Figure: Percentage plotted against Bin Upper Boundary; x-axis from 200 to 900, y-axis from 0 to 40]
Descriptive Summary Measures for Numerical Data

Summary Measures

Central Tendency:
• Arithmetic Mean
• Median
• Mode
• Geometric Mean
• Harmonic Mean
• Weighted “Mean”
• Trimmed Mean

Variation:
• Range
• Inter Quartile Range
• (Variance), Standard Deviation
• Coefficient of Variation

“Mean” or Arithmetic Mean

• The arithmetic mean (often just called the “mean”) is the most common measure of central tendency

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

where $\bar{X}$ is pronounced “X-bar”, $X_i$ is the ith observed value, and $n$ is the sample size.

Median

In an ordered array, the median is the “middle” number (50% above, 50% below)

[Figure: two dot plots on an axis from 11 to 20; Median = 13 in each]

• Not affected by extreme values

Median
• “Robust” Measure of Central Tendency
• Not Affected by Extreme Values

[Figure: two dot plots, one on an axis from 0 to 10 and one on an axis from 0 to 14; Median = 5 in each]

• In an Ordered Array, the Median is the ‘Middle’ Number
– If n or N is odd, the median is the middle number
– If n or N is even, the median is the average of the 2 middle numbers

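The odd/even rule for the median can be written out directly. This is a minimal sketch using the sales data from the earlier slides; the extra value 1000 is a hypothetical extreme observation added only to illustrate robustness.

```python
def median(values):
    """Middle value of the ordered array (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


sales = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]   # n = 10, even
median_sales = median(sales)

# Hypothetical extreme value appended: n = 11, odd.
median_with_outlier = median(sales + [1000])
mean_sales = sum(sales) / len(sales)
mean_with_outlier = sum(sales + [1000]) / (len(sales) + 1)

print("median:", median_sales, "->", median_with_outlier)
print("mean:  ", mean_sales, "->", round(mean_with_outlier, 2))
```

The median stays at 27 while the mean moves from 29 to over 117, which is the sense in which the median is "robust".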
http://economictimes.indiatimes.com/tech/ites/tcs-ceo-n-chandrasekarans-pay-rises-20-in-fy16-to-rs-25-6-crore-459-times-companys-median-remuneration/articleshow/52450273.cms

Australia batsman Hughes dies from head injury
(http://timesofindia.indiatimes.com/sports/off-the-field/Australia-batsman-Hughes-dies-from-head-injury/articleshow/45292785.cms)

• SYDNEY: Australian cricketer Phillip Hughes died in hospital on
Thursday, two days after the batsman was struck on the head by a
bouncer during a domestic match. … …
• Questions about the response time of ambulances dispatched to the
stadium were also raised. The head of New South Wales
Ambulance was to be hauled before the state health minister Jillian
Skinner on Thursday after the ambulance authority issued conflicting
statements about their response times. The arrival of the first
ambulance took 15 minutes, NSW Ambulance clarified in a
statement on Wednesday. The state's median response time for
the highest priority "life-threatening cases" was just under
eight minutes in 2013-14, according to the authority's statistics. …

U.S. Household Incomes Surged 5.2% in 2015,
First Gain Since 2007
• http://www.wsj.com/articles/u-s-household-incomes-surged-5-2-in-2015-ending-slide-1473776295

• The median household income (the level at which half are above
and half are below) rose 5.2%, or $2,798, to $56,516, from a year
earlier, after adjusting for inflation, the Census Bureau said
Tuesday.
“Believe in the average rather than in extremes”
by Uma Shashikant
https://economictimes.indiatimes.com/wealth/invest/why-average-return-from-investments-is-what-you-should-expect-and-be-happy-with/articleshow/62296843.cms
(ET, Jan 03, 2018): If someone told us that we should be
happy making average returns on our investments, it would
probably be unacceptable to us. The lure of doing better or
beating the average is something we simply cannot let go
of. The very notion of earning above average returns on a
consistent basis is a statistical challenge. … … …
Fund managers tend to compare their performances with
the median returns made by their peer group. This is a
modified version of the average, taking into account the
possibility that larger number of funds could be
concentrated at the top and bottom of the league table. …

Hope for Sensex
https://www.telegraphindia.com/1170210/jsp/business/story_134992.jsp#.WJ1PG2997IU

• Mumbai, Feb 9, 2017: Global brokerage firm Morgan
Stanley today said it expects the BSE Sensex to touch the
39000-mark in a "bull case" scenario by December.
• The brokerage listed various factors that include macro-economic
conditions, corporate fundamentals and sentiment.
• "Domestic appetite for equities remains strong. Sentiment
is off lows though yet to hit exuberant territory," Morgan
Stanley India strategist Ridham Desai and Sheela Rathi
said in a research note.
• The brokerage has a base case (50 per cent probability)
target for BSE Sensex at 30000, a bull case (30 per cent
probability) of 39000 and a bear case (20 per cent
probability) of 24000.
Mode
• Value that Occurs Most Often
• Not Affected by Extreme Values
• There May Not be a Mode
• There May be Several Modes
• Used for Either Numerical or Categorical Data

[Figure: two dot plots; the one on an axis from 0 to 6 has No Mode, the one on an axis from 0 to 14 has Mode = 9]
Geometric Mean
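• Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$

• Both Means Affected by Extreme Values

[Figure: two dot plots, one on an axis from 0 to 10 and one on an axis from 0 to 14]

                  First data set   Second data set
Mean              5                6
Geometric Mean    3.94             4.30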
Geometric Mean Used for Ratio Data &
by Financial Analysts

$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$

• For a certain company, the ratios of prices in 2008 to those
in 2007 for four products are 0.92, 1.25, 1.75 and 0.85
• Find the average price ratio (may use the
geometric mean = 1.14)

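The price-ratio example can be verified with a short computation. This is a minimal sketch; `geometric_mean` is an illustrative helper (Python's standard library also provides `statistics.geometric_mean`), and `math.prod` assumes Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


# 2008-to-2007 price ratios for the four products on the slide.
price_ratios = [0.92, 1.25, 1.75, 0.85]

gm = geometric_mean(price_ratios)
am = sum(price_ratios) / len(price_ratios)

print(f"geometric mean:  {gm:.4f}")
print(f"arithmetic mean: {am:.4f}")
```

The geometric mean comes out at about 1.14, as stated on the slide, and is lower than the arithmetic mean of the same ratios.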
Geometric Mean in Business

• Suppose an investment of 100 units changes to 50, 85,
42.5 and 72.25 over the next 4 years (growth of -50%, +70%,
-50% and +70% in successive years). The arithmetic mean gives a
(linear) average growth of 10% ((-50% + 70% - 50% + 70%)
divided by 4).
• If an investment of 100 units grows 10% each year, the
result is 146.41 units, not 72.25, so the linear average
overstates the year-on-year growth.
• The geometric mean of 0.5 (=50/100), 1.7 (=85/50), 0.5
(=42.5/85) and 1.7 (=72.25/42.5) is 0.92195, i.e., an average
growth of -7.805% per year. If an investment of 100 units
grows at -7.805% each year, the result is 72.25 units.
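The comparison of arithmetic and geometric average growth can be reproduced as follows. This is a minimal sketch using the investment values from the slide; variable names are illustrative.

```python
import math

# Investment value at the start and at the end of each of the 4 years.
values = [100, 50, 85, 42.5, 72.25]

# Year-on-year growth factors: 0.5, 1.7, 0.5, 1.7
factors = [later / earlier for earlier, later in zip(values, values[1:])]
years = len(factors)

# Arithmetic mean of the yearly growth rates.
arith_rate = sum(f - 1 for f in factors) / years

# Geometric mean of the growth factors, expressed as a rate.
geo_factor = math.prod(factors) ** (1 / years)
geo_rate = geo_factor - 1

# Compounding each average rate for 4 years from 100 units.
end_arith = values[0] * (1 + arith_rate) ** years
end_geo = values[0] * geo_factor ** years

print(f"arithmetic average growth: {arith_rate:.3%} -> {end_arith:.2f} units")
print(f"geometric average growth:  {geo_rate:.3%} -> {end_geo:.2f} units")
```

Only the geometric rate, compounded over the four years, returns the actual end value of 72.25 units.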
Example: Geometric Mean

Harmonic Mean in Business
• The harmonic mean is appropriate when averages of rates
(e.g., productivity of machines, speed, price/earnings ratio)
are to be calculated.
• The harmonic mean of the positive values X1, X2, ..., Xn is
defined to be

$$\bar{X}_H = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_n}}$$

• Harmonic mean of 10 and 20: first take 1/10 and 1/20,
find their average, which is 3/40, and then take the
reciprocal of that, 40/3.
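The worked example for 10 and 20 can be checked with exact fractions. This is a minimal sketch; `harmonic_mean` is an illustrative helper written for integer or rational inputs (the standard library's `statistics.harmonic_mean` gives the floating-point equivalent).

```python
from fractions import Fraction


def harmonic_mean(values):
    """Reciprocal of the average of the reciprocals of positive values."""
    reciprocals = [1 / Fraction(v) for v in values]
    average_reciprocal = sum(reciprocals) / len(reciprocals)
    return 1 / average_reciprocal


reciprocal_average = (Fraction(1, 10) + Fraction(1, 20)) / 2
hm = harmonic_mean([10, 20])

print("average of reciprocals:", reciprocal_average)
print("harmonic mean:", hm, "=", float(hm))
```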
(https://economictimes.indiatimes.com/wealth/invest/mutual-funds-garner-record-high-aum-of-rs-7304-crore-via-sips-in-may/canararobecoshowsp_dp/64600837.cms)

Weighted Harmonic Mean
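• If one invests a proportion wi of a certain amount of money
in a mutual fund at price Xi in period i, for prices X1,
X2, ..., Xn over n time periods (w1 + w2 + ... + wn = 1), then
the average price per unit paid is a weighted harmonic mean:

$$\bar{X}_{WH} = \frac{1}{\frac{w_1}{X_1} + \frac{w_2}{X_2} + \cdots + \frac{w_n}{X_n}}$$
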
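The claim that the average price per unit is a weighted harmonic mean can be checked against a direct count of units bought. This is a minimal sketch; the instalment amounts and unit prices below are hypothetical figures chosen for illustration and do not come from the slides.

```python
def weighted_harmonic_mean(weights, values):
    """Sum of weights divided by the weighted sum of reciprocals."""
    return sum(weights) / sum(w / x for w, x in zip(weights, values))


# Hypothetical example: equal instalments at three different unit prices.
amounts = [1000.0, 1000.0, 1000.0]   # money invested in each period (assumed)
prices = [10.0, 20.0, 25.0]          # unit price in each period (assumed)

total_invested = sum(amounts)
weights = [a / total_invested for a in amounts]   # proportions w_i, summing to 1

avg_price = weighted_harmonic_mean(weights, prices)

# Direct check: total money paid divided by total units acquired.
units_bought = sum(a / p for a, p in zip(amounts, prices))
direct_avg_price = total_invested / units_bought

simple_avg_price = sum(prices) / len(prices)

print(f"weighted harmonic mean: {avg_price:.4f}")
print(f"money / units:          {direct_avg_price:.4f}")
print(f"simple average price:   {simple_avg_price:.4f}")
```

With fixed instalments more units are bought when the price is low, so the average price paid falls below the simple average of the prices.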
Weighted Mean

Weighted Mean used Daily at BSE & NSE

• “Closing Price” of a stock (or its “Future”) at NSE
is the weighted average of the stock prices (or
prices of Futures contracts) traded in the last
half-an-hour of the day
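A weighted average of traded prices can be sketched as below. The slide does not state the weights; this example assumes they are the quantities traded at each price, and the trade data are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical trades in the last half-hour: (price, quantity traded).
trade_prices = [100.0, 101.0, 102.0]   # assumed prices
trade_quantities = [500, 300, 200]     # assumed weights (quantities)

closing_price = weighted_mean(trade_prices, trade_quantities)
simple_average = sum(trade_prices) / len(trade_prices)

print(f"weighted average price: {closing_price:.2f}")
print(f"simple average price:   {simple_average:.2f}")
```

Because most of the hypothetical volume traded at the lowest price, the weighted average sits below the simple average.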

Measures of Variation
Variation:
• Range
• Inter-quartile Range
• Variance
• Standard Deviation
• Coefficient of Variation

The “coefficient of variation” is used to compare the variation of multiple populations/samples (with significantly different mean values).

[Figure: two distributions with the same center but different variation]

Measures of Variation:
Range Can Be Misleading
• Ignores the way in which data are distributed

[Figure: two dot plots on an axis from 7 to 12 with differently distributed points; Range = 12 - 7 = 5 in each]

• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119

Inter Quartile Range (Midspread) = IQR
• IQR = Q3 - Q1, where Q1 = the (n+1)/4-ranked (interpolated) value and
Q3 = the 3(n+1)/4-ranked (interpolated) value. In Excel (version 2010
and later) use the QUARTILE.EXC function

[Figure: two dot plots on an axis from 7 to 12]

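The (n+1)-based ranking rule for quartiles can be sketched as follows, using the sales data from the earlier slides. The helper names are illustrative, and the sketch assumes at least 3 observations so that both ranks fall inside the ordered array.

```python
def ranked_value(ordered, position):
    """Value at a 1-based, possibly fractional rank, using linear interpolation."""
    lower = int(position)        # whole part of the rank
    fraction = position - lower  # fractional part of the rank
    if fraction == 0:
        return float(ordered[lower - 1])
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def quartiles(values):
    """Q1 and Q3 at ranks (n+1)/4 and 3(n+1)/4 of the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ranked_value(ordered, (n + 1) / 4)
    q3 = ranked_value(ordered, 3 * (n + 1) / 4)
    return q1, q3


sales = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
q1, q3 = quartiles(sales)   # ranks 2.75 and 8.25 for n = 10
iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Other conventions (see the various methods listed on the next slide) can give slightly different quartiles for the same data.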
Various Methods of IQR Calculation

Source: http://mathworld.wolfram.com/Quartile.html
Variance & Standard Deviation

Shows Variation/Spread About the Mean

– Sample Variance:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

– Sample Standard Deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

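The two formulas above translate directly into code. This is a minimal sketch using the sales data from the earlier slides; the function name is illustrative (the standard library offers `statistics.variance` and `statistics.stdev` for the same quantities).

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


sales = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

mean = sum(sales) / len(sales)
s2 = sample_variance(sales)
s = math.sqrt(s2)

print(f"mean = {mean}, sample variance = {s2:.3f}, sample std dev = {s:.3f}")
```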
Same Mean but different Standard Deviation

[Figure: three dot plots on an axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57

Coefficient of Variation

• Measures relative variation
• Shows variation relative to mean
• Can be used to compare the variability of two or
more sets of data measured in different units
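As a rough illustration, the coefficient of variation can be computed with the standard definition CV = (S / mean) × 100%, which is not spelled out in the extracted slide text and is assumed here. The two data sets are the sales figures and the house prices that appear elsewhere in these slides; the function name is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


sales = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

cv_sales = coefficient_of_variation(sales)
cv_houses = coefficient_of_variation(house_prices)

print(f"CV of sales:        {cv_sales:.2f}%")
print(f"CV of house prices: {cv_houses:.2f}%")
```

Although the two data sets have very different means and units, the percentages are directly comparable: the house prices vary far more relative to their mean than the sales figures do.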
Shape of a Distribution
• Describes how data are distributed
• Two useful shape related statistics are:
– Skewness
• Measures the extent to which data values are not
symmetrical
– Kurtosis
• Kurtosis measures the peakedness of the curve of the
distribution, that is, how sharply the curve rises
approaching the center of the distribution
Skewness

• Measures the extent to which data are not symmetrical

                      Left-Skewed            Symmetric              Right-Skewed
Mean, Median, Mode    Mean < Median < Mode   Mean = Median = Mode   Mode < Median < Mean
Skewness Statistic    < 0                    0                      > 0
Descriptive Stats Using Microsoft Excel Functions

House Prices (cells A2:A6): $ 2,000,000; $ 500,000; $ 300,000; $ 100,000; $ 100,000

Descriptive Statistics
Statistic             Value              Excel formula
Mean                  $ 600,000          =AVERAGE(A2:A6)
Standard Error        $ 357,770.88       =D6/SQRT(D14)
Median                $ 300,000          =MEDIAN(A2:A6)
Mode                  $ 100,000.00       =MODE(A2:A6)
Standard Deviation    $ 800,000          =STDEV(A2:A6)
Sample Variance       640,000,000,000    =VAR(A2:A6)
Kurtosis              4.1301             =KURT(A2:A6)
Skewness              2.0068             =SKEW(A2:A6)
Range                 $ 1,900,000        =D12 - D11
Minimum               $ 100,000          =MIN(A2:A6)
Maximum               $ 2,000,000        =MAX(A2:A6)
Sum                   $ 3,000,000        =SUM(A2:A6)
Count                 5                  =COUNT(A2:A6)
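For readers working outside Excel, the same summary can be reproduced in Python. This is a sketch rather than the course's method: the skewness and kurtosis lines use the bias-adjusted sample formulas that Excel documents for SKEW and KURT, written out by hand, and the dictionary keys are illustrative names.

```python
import math
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

n = len(house_prices)
mean = statistics.mean(house_prices)
s = statistics.stdev(house_prices)             # sample standard deviation (n - 1)
z = [(x - mean) / s for x in house_prices]     # standardized values

summary = {
    "mean": mean,
    "standard_error": s / math.sqrt(n),
    "median": statistics.median(house_prices),
    "mode": statistics.mode(house_prices),
    "standard_deviation": s,
    "sample_variance": statistics.variance(house_prices),
    # Bias-adjusted sample excess kurtosis (as in Excel's KURT).
    "kurtosis": (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)),
    # Bias-adjusted sample skewness (as in Excel's SKEW).
    "skewness": n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z),
    "range": max(house_prices) - min(house_prices),
    "minimum": min(house_prices),
    "maximum": max(house_prices),
    "sum": sum(house_prices),
    "count": n,
}

for name, value in summary.items():
    print(f"{name:20s} {value:,.4f}")
```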
Normalization of Performance Ratings Coming from Diverse Sources

Normalization of Performance Ratings (out of 10) on Employees

Performance Ratings by Managers A, B, C out of 10

Normalization Method
(Ratings out of 10)
• Standardized Rating = (Raw Rating – mean)/stdev
• Grand mean
• Grand stdev
• New Rating = Grand Mean + (Grand stdev) * (Standardized Rating)

Example (Manager A)
Raw Rating = 5.9
Standardized Rating = (5.9 - 6.99)/0.8 = -1.37
Grand mean = 5.39
Grand stdev = 1.76
New Rating = 5.39 + 1.76*(-1.37) = 3.0
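The normalization steps can be sketched as a small function. The inputs below are the rounded figures shown in the Manager A example; with these rounded inputs the standardized rating comes out near -1.36, so the slide's -1.37 presumably reflects unrounded values of the mean and stdev. The function and parameter names are illustrative.

```python
def normalize_rating(raw, rater_mean, rater_stdev, grand_mean, grand_stdev):
    """Standardize a rating within its rater, then rescale to the grand mean and stdev."""
    standardized = (raw - rater_mean) / rater_stdev
    new_rating = grand_mean + grand_stdev * standardized
    return standardized, new_rating


# Manager A example, using the rounded figures from the slide.
standardized, new_rating = normalize_rating(
    raw=5.9, rater_mean=6.99, rater_stdev=0.8, grand_mean=5.39, grand_stdev=1.76
)

print(f"standardized rating: {standardized:.4f}")
print(f"new rating:          {new_rating:.2f}")
```

A raw rating equal to the rater's own mean always maps to the grand mean, which is what makes ratings from lenient and strict managers comparable.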

