
Quantitative Techniques

Shovan Chowdhury

Indian Institute of Management, Kozhikode


Data Driven Business Performance

17-12-2020 EPGP 13
Course Objective
• Familiarity with different types of data and their visualization
• Understanding presence of intrinsic uncertainty in a business
situation
• Use of appropriate statistical techniques for modelling data and
capturing uncertainty
• Applying statistical software for data analysis
• Interpreting outputs from a managerial perspective (may require
knowledge of other disciplines)
• Developing basic expertise in the subject as a foundation for
understanding other areas

“Statistical Techniques/Methods”

Formulate problem → Get some data → Visualize the data →
Do some statistical calculations → Interpret results

DATA AND SUMMARIZATION
Cola Exclusivity Agreement
A large university with a total enrollment of about 50,000
students has offered one Cola company (Soft) an exclusivity
agreement that would give the company exclusive rights to
sell its products at all university facilities for the next year
with an option for future years.

In return, the university would receive 35% of the on-campus
revenues and an additional lump sum of 5,00,000 per year.

Soft has been given 2 weeks to respond.

Cola Exclusivity Agreement
The market for soft drinks is measured in terms of 200 ml bottles.

Soft currently sells an average of 22,000 bottles per
week (over the 40 weeks of the year that the university operates).

The bottles sell for an average of Rs 10 each.

Soft is unsure of its market share but suspects it is considerably
less than 50%.

Cola Exclusivity Agreement
A quick analysis reveals that if its current market share were
25%, then, with an exclusivity agreement, Soft would sell
88,000 (22,000 is 25% of 88,000) bottles per week or
3,520,000 bottles per year.

The profit or loss can be calculated.

The only problem is that we do not know how many soft drinks
are sold weekly at the university.
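
The quick analysis on this slide can be sketched in a few lines of Python. The function name and the idea of passing the market share as a parameter are illustrative assumptions; the figures (22,000 bottles, 40 weeks, Rs 10, 35%, 5,00,000) come from the case. Production costs are not given, so the sketch stops at revenue and the payment owed to the university rather than profit.

```python
def exclusivity_projection(market_share, current_weekly_sales=22_000,
                           weeks_per_year=40, price_per_bottle=10,
                           revenue_share=0.35, lump_sum=500_000):
    """Project sales under exclusivity for an assumed current market share."""
    # If current sales are `market_share` of the campus market,
    # the whole market is current sales divided by that share.
    weekly_bottles = current_weekly_sales / market_share
    annual_bottles = weekly_bottles * weeks_per_year
    annual_revenue = annual_bottles * price_per_bottle
    # Assumption: the lump sum is in the same currency as the bottle price.
    payment_to_university = revenue_share * annual_revenue + lump_sum
    return {
        "weekly_bottles": weekly_bottles,
        "annual_bottles": annual_bottles,
        "annual_revenue": annual_revenue,
        "payment_to_university": payment_to_university,
    }

# The 25% scenario from the slide.
projection = exclusivity_projection(0.25)
print(projection)
```

Calling the function with other shares (for example 0.10 or 0.40) shows how sensitive the projection is to the unknown market size, which is why the survey is needed.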

Cola Exclusivity Agreement
Soft assigned a recent university graduate to survey the
university's students to supply the missing information.

Accordingly, she organizes a survey that asks 500 students to
keep track of the number of soft drinks they purchase in the
next 7 days.

Inferential statistics
The information we would like to acquire is an estimate of
annual profits from the exclusivity agreement. The data are
the numbers of bottles of soft drinks consumed in 7 days by
the 500 students in the sample.

We want to know the mean number of soft drinks consumed
by all 50,000 students on campus.

To accomplish this goal we need another branch of statistics:
inferential statistics.

Inferential statistics
Inferential statistics is a body of methods used to draw
conclusions or inferences about characteristics of populations
based on sample data. The population in question in this case
is the soft drink consumption of the university's 50,000
students. The cost of interviewing each student would be
prohibitive and extremely time consuming. Statistical
techniques make such endeavors unnecessary. Instead, we
can sample a much smaller number of students (the sample
size is 500) and infer from the data the number of soft drinks
consumed by all 50,000 students. We can then estimate
annual profits for the cola company.
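
A minimal sketch of this inference step follows. It assumes the survey responses are available as a list of weekly counts per student; the function name, the five sample values, and the 95% normal-approximation multiplier z = 1.96 are illustrative assumptions, not part of the case.

```python
import math
import statistics

def estimate_weekly_campus_demand(sample_counts, population_size=50_000, z=1.96):
    """Estimate total weekly consumption from a sample of per-student counts."""
    n = len(sample_counts)
    sample_mean = statistics.mean(sample_counts)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = statistics.stdev(sample_counts) / math.sqrt(n)
    low = sample_mean - z * standard_error
    high = sample_mean + z * standard_error
    return {
        "mean_per_student": sample_mean,
        "interval_per_student": (low, high),
        # Scale the per-student figures up to the whole campus.
        "total_weekly_bottles": sample_mean * population_size,
        "total_interval": (low * population_size, high * population_size),
    }

# Hypothetical responses for illustration only; the real survey has 500 of them.
hypothetical_counts = [0, 1, 2, 3, 4]
estimate = estimate_weekly_campus_demand(hypothetical_counts)
print(estimate)
```

With only five made-up values the interval is very wide; with 500 real responses the standard error shrinks and the campus-wide estimate becomes usable for the profit calculation.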

Primary Uses of Statistics

• Descriptive statistics – the collection, organization,
presentation and summary of data.

• Inferential statistics – generalizing from a sample to a
population, estimating unknown parameters, drawing
conclusions, making decisions.

Problem
A chocolate manufacturing company sells quality chocolate products at its plant
and retail stores. Two years ago, the company developed a Web site and began
selling its products over the Internet. Web site sales have exceeded the company’s
expectations, and management is now considering strategies to increase sales even
further. To learn more about the Web site customers, a sample of 50 chocolate
transactions was selected from the previous month’s sales.

The data show, for each of the 50 customers:
• the day of the week the transaction was made,
• the type of browser the customer used,
• the time spent on the Web site,
• the number of Web site pages viewed,
• the amount spent.

The Cab Case (Textbook): Demand-Supply Gap

The Car Mileage Case: Estimating Mileage

• Study of tax credit offered by the federal government for improving
fuel economy
• Automaker has introduced a new model and wishes to demonstrate it
qualifies for the tax credit
• At least 31 mpg to be attained
• Sample of 50 cars

The Car Mileage Case: The Data

Basic Vocabulary of Statistics

POPULATION
A population consists of all the items or individuals about which
you want to draw a conclusion.

SAMPLE
A sample is the portion of a population selected for analysis.

PARAMETER
A parameter is a numerical measure that describes a
characteristic of a population.

STATISTIC
A statistic is a numerical measure that describes a characteristic
of a sample.

Quantitative
• Discrete (no. of customers, no. of claims)
• Continuous (salary, price)

Qualitative (Categorical)
• Ordinal (customer satisfaction, efficiency of workers, bond rating)
• Nominal (sex, nationality, eye color)

Cross-Sectional Data

• Cross-sectional data: data collected at the same or approximately the
same point in time
• Time series data: data collected over different time periods

Data Visualization

Some quick questions
- Return on investment
- Project completion time
- Mutual fund ratings
- Political affiliation
- Demand for a product
- No. of customers waiting in a queue
- Diameter of bolts
- Number of defectives produced in a shift
- Gender
- No. of misprints per page of a book
- Marital Status
- Efficiency of employee

Excel Bar and Pie Chart of Pizza Preference Data

Histogram

Summary for WaitTime

Anderson-Darling Normality Test
A-Squared     0.24
P-Value       0.759

Mean          5.4600
StDev         2.4755
Variance      6.1279
Skewness      0.250415
Kurtosis      -0.404960
N             100

Minimum       0.4000
1st Quartile  3.8000
Median        5.2500
3rd Quartile  7.2000
Maximum       11.6000

95% Confidence Interval for Mean    4.9688  5.9512
95% Confidence Interval for Median  4.5742  5.8773
95% Confidence Interval for StDev   2.1735  2.8757

[Figure: histogram of WaitTime (horizontal axis 0 to 12) with 95% confidence intervals for the mean and median]

Rating Distribution of Sample
[Figure: bar chart of ratings 1 to 5; vertical axis 0% to 35%]

Gender Profile of Sample
[Figure: bar chart for M and F; vertical axis 0% to 80%]

Distribution of Waiting Time (mins)
[Figure: histogram; vertical axis 0 to 25]

Skewness
• Skewed to left
• Symmetric
• Skewed to right

What is the point? Why collect this data?

Data and randomness

• Three questions that good business managers ask themselves when
they look at “the numbers”:

• What is a typical or central value?

• How much variability is present in the data set?

• Are there unusual shocks/events/cases (shape of the curve)?

Dispersion
• Describes how similar a set of observations are to each other, or
the degree of deviation (spread) of a set of data from their central
value

• In general, the more spread out a distribution is, the larger the measure of
dispersion will be

Measures of Dispersion

• Which of the distributions of demand has the larger dispersion?

[Figure: two histograms of demand, one above the other; horizontal axis 1 to 10, vertical axis 0 to 125]

The upper distribution has more dispersion because the scores
are more spread out.

Measures of Dispersion

• There are four main measures of dispersion:
• Range
• Variance
• Standard Deviation
• Inter-quartile range (IQR)

Interpretation

• The larger the SD/variance is, the more the observations deviate, on
average, away from the mean
• The smaller the SD/variance is, the less the observations deviate, on
average, from the mean
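
The four measures of dispersion can be computed with Python's standard `statistics` module, as in the sketch below. The data values are hypothetical. Two conventions are assumed here rather than taken from the slides: the variance uses the sample divisor n - 1, and the quartiles use the module's default ("exclusive") method, so other software may report slightly different quartiles.

```python
import statistics

def dispersion_summary(data):
    """Return the range, sample variance, sample SD and IQR of a data set."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
    return {
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance, divisor n - 1
        "std_dev": statistics.stdev(data),      # square root of the sample variance
        "iqr": q3 - q1,
    }

# Hypothetical weekly demand figures, for illustration only.
weekly_demand = [2, 4, 4, 4, 5, 5, 7, 9]
summary = dispersion_summary(weekly_demand)
print(summary)
```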

Coefficient of Variation (CV)

• Relative measure (unit free) used for the purpose
of comparison of variability.

• Relative measure = (absolute measure / average) × 100

$$ CV = \frac{s}{\bar{x}} \times 100 $$
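
A short sketch of the CV calculation, assuming the sample standard deviation (divisor n - 1) is the intended s. The salary figures are hypothetical; they are expressed once in rupees and once in thousands of rupees to illustrate that the CV does not depend on the unit.

```python
import statistics

def coefficient_of_variation(data):
    """CV in percent: sample standard deviation relative to the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical salaries, for illustration only.
salaries_rupees = [20_000, 40_000, 40_000, 40_000, 50_000, 50_000, 70_000, 90_000]
salaries_thousands = [value / 1000 for value in salaries_rupees]

cv_rupees = coefficient_of_variation(salaries_rupees)
cv_thousands = coefficient_of_variation(salaries_thousands)
print(round(cv_rupees, 2), round(cv_thousands, 2))
```

Both calls give the same percentage, which is what makes the CV suitable for comparing variability across series measured in different units.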

Percentiles, Quartiles and IQR

• Percentiles divide ordered data into 100 groups (99 percentiles).
• For example, suppose you score in the 83rd percentile on a
standardized test. That means that 83% of the test-takers
scored below you.
• Deciles divide ordered data into 10 groups (9 deciles).
• Quartiles divide ordered data into 4 groups (3 quartiles).

Percentiles, quartiles, and the IQR

The 10th percentile (denoted by P10) is the number
such that 10% of the values are less than it and 90%
are bigger.

The median is the 50th percentile.

The 1st quartile (denoted by Q1) is the number such
that 25% of the values are less than it and 75% are
bigger. The 3rd quartile (Q3) is defined likewise, with
75% of the values below it and 25% above.

Interquartile range (IQR) = Q3 - Q1
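
The definitions above can be tried out with the sketch below. The helper names and the test scores (the integers 1 to 100) are hypothetical. `statistics.quantiles` interpolates between observations using its default method, so the cut points it returns are one reasonable convention among several and need not match other packages exactly.

```python
import statistics

def percent_below(data, value):
    """Percentage of observations strictly below `value`."""
    return 100 * sum(x < value for x in data) / len(data)

def percentile(data, p):
    """p-th percentile for p = 1..99, using the module's default method."""
    return statistics.quantiles(data, n=100)[p - 1]

# Hypothetical test scores 1, 2, ..., 100, for illustration only.
scores = list(range(1, 101))

p10 = percentile(scores, 10)
q1 = percentile(scores, 25)
median = percentile(scores, 50)
q3 = percentile(scores, 75)
iqr = q3 - q1
print(p10, q1, median, q3, iqr)
print(percent_below(scores, 84))  # share of test-takers scoring below 84
```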

Box Plot

• Describes the overall distribution of a set of
numbers but is simpler than a histogram.
• Useful when comparing several samples, because
too many histograms on one graph would be
both crowded and confusing.
• Also produces a useful display with small data
sets.
• Useful for detecting outliers / extreme values

Box Plot

S = smallest, L = largest, M = median,
Q1 = lower quartile, Q3 = upper quartile

Detection of Outliers (Box Plot)

• Calculate Q1 - 1.5*IQR and Q3 + 1.5*IQR
• Any data point lying outside this interval is an outlier

[Figure: box plots of IBM (axis 83 to 91) and EDS (axis 18.5 to 22.5)]
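
The outlier rule can be sketched as follows. The function names and the service times are hypothetical, and the quartiles again come from the default method of `statistics.quantiles`, which is an assumption about the quartile convention.

```python
import statistics

def outlier_fences(data):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for a data set."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """List the observations that fall outside the two fences."""
    lower, upper = outlier_fences(data)
    return [x for x in data if x < lower or x > upper]

# Hypothetical service times in minutes; 30 is deliberately extreme.
service_times = [2, 4, 4, 4, 5, 5, 7, 9, 30]
print(outlier_fences(service_times))
print(find_outliers(service_times))
```

For these values Q1 = 4 and Q3 = 8, so the fences are -2 and 14 and only the observation 30 is flagged.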

Many fast-food restaurants have drive-through windows,
offering drivers and their passengers the advantage of
quick service. To measure how good the service is, an
organization called QSR planned a study in which the
time taken by a sample of drive-through customers at
each of five restaurants was recorded. Compare the five
sets of data using box plots and interpret the results.

Box Plots…

Wendy’s service time is shortest and least variable.

Hardee’s has the greatest variability, while Jack-in-the-Box
has the longest service times.

Standardising Data
• Purpose: To compare each data point to the natural
range and variation of the dataset.
• Method: For each data value, subtract the sample
mean and divide by the sample standard deviation.
The resulting numbers are called z-values or z-scores. They
• measure how many standard deviations above or
below the mean a data point is
• are “unit free”
• have mean zero and SD 1

Standardising Data

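How: compare each data point to the natural range and
variation of the dataset by computing

$$ z = \frac{x - \bar{x}}{s} $$

where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.

A z-score can be either positive or negative.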
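
A minimal sketch of standardising a data set, using hypothetical values. It also checks the two properties stated earlier: the z-scores have mean zero and standard deviation 1.

```python
import statistics

def z_scores(data):
    """Standardise each value: subtract the sample mean, divide by the sample SD."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [(x - mean) / sd for x in data]

# Hypothetical observations, for illustration only.
observations = [2, 4, 4, 4, 5, 5, 7, 9]
standardised = z_scores(observations)

print([round(z, 2) for z in standardised])
print(round(statistics.mean(standardised), 10), round(statistics.stdev(standardised), 10))
```

Values below the mean get negative z-scores and values above it get positive ones.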

Capturing variation

⚫ Chebyshev’s Theorem
Applies to any distribution, regardless of shape

⚫ Empirical Rule
Applies only to roughly mound-shaped and symmetric
distributions

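Chebyshev’s Theorem
⚫ At least $1 - \frac{1}{k^2}$ of the elements of any
distribution lie within $k$ standard deviations of the
mean

k = 2: at least $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$ lie within 2 standard deviations of the mean
k = 3: at least $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 89\%$ lie within 3 standard deviations of the mean
k = 4: at least $1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} \approx 94\%$ lie within 4 standard deviations of the mean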
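
The bound can be checked numerically, as in the sketch below. The function names and the deliberately skewed data set are hypothetical. The bound is informative only for k greater than 1, and the check uses the population standard deviation of the data set, for which the inequality is guaranteed to hold.

```python
import statistics

def chebyshev_bound(k):
    """Minimum share of any distribution within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed share of the data within k population SDs of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

# Hypothetical, strongly right-skewed data, for illustration only.
skewed = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
for k in (2, 3, 4):
    print(k, chebyshev_bound(k), fraction_within(skewed, k))
```

The observed shares are at least as large as the bounds even though the data are far from symmetric, which is the point of the theorem.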

Empirical Rule
⚫ For roughly mound-shaped and symmetric
distributions, approximately:

68% lie within 1 standard deviation of the mean
95% lie within 2 standard deviations of the mean
Almost all lie within 3 standard deviations of the mean

Empirical Rule

[Figure: bell curve centred at μ, with 68.26% of the area within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]
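
For an exactly normal distribution, the percentages in the empirical rule can be computed from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` (Python 3.8 or later); the function name is an illustrative assumption. The computed shares round to about 68.27%, 95.45% and 99.73%, marginally different from the figures on the slide, which appear to reflect rounding in printed tables.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```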

A survey is conducted on 20 respondents to gather information on customer satisfaction for a product.
The data on customer satisfaction are obtained on a 3-point scale, viz. highly satisfied (HS), satisfied (S), not satisfied
(NS), and also on gender: male (M) and female (F). The data are recorded as shown below:

Respondent    1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
Gender        M  F  F  F  F  F  F  M  M  M  M  M  F  F  F  F  M  F  M  M
Satisfaction  S  S  NS S  NS NS NS NS HS HS S  NS HS S  S  HS NS NS S  S
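
One way to summarise these two categorical variables is a cross-tabulation of gender against satisfaction level. The sketch below builds it with `collections.Counter`; the data are the 20 responses listed above, while the variable names and the printed layout are illustrative choices.

```python
from collections import Counter

gender = "M F F F F F F M M M M M F F F F M F M M".split()
satisfaction = "S S NS S NS NS NS NS HS HS S NS HS S S HS NS NS S S".split()

# Count each (gender, satisfaction) combination across the 20 respondents.
table = Counter(zip(gender, satisfaction))

levels = ["HS", "S", "NS"]
print("Gender  " + "  ".join(f"{level:>3}" for level in levels) + "  Total")
for g in ("M", "F"):
    counts = [table[(g, level)] for level in levels]
    print(f"{g:<6}  " + "  ".join(f"{c:>3}" for c in counts) + f"  {sum(counts):>5}")
```

The counts show 9 male and 11 female respondents, with 4 highly satisfied, 8 satisfied and 8 not satisfied overall.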

Scatter Plots and Correlation

• A scatter plot (or scatter diagram) is used to show
the relationship between two variables
• Correlation analysis is used to measure the strength of
the linear association between two variables
• Only concerned with strength of the relationship
• No causal effect is implied

Scatter Plot Examples

[Figures: scatter plots of y against x illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]

Correlation Coefficient

• The correlation coefficient (r) is used to measure
the strength of the linear relationship in the sample
observations

Calculating sample Correlation Coefficient

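$$ r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x s_y} $$

$$ \operatorname{cov}(x, y) = \frac{1}{n} \sum (x_i - \bar{x})(y_i - \bar{y}) $$

$$ s_x = \sqrt{\frac{1}{n} \sum (x_i - \bar{x})^2}, \qquad s_y = \sqrt{\frac{1}{n} \sum (y_i - \bar{y})^2} $$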

Features of correlation coefficient
• Unit free
• Ranges between -1.00 and 1.00
• -1≤r<0 implies that as X ↑ (↓), Y ↓ (↑ )
• 0< r≤1 implies that as X ↑ (↓), Y ↑ (↓)
• The closer to -1.00, the stronger the negative linear relationship
• The closer to 1.00, the stronger the positive linear relationship
• The closer to 0.00, the weaker the linear relationship
• r=0 implies that X and Y are not linearly associated
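
The sample correlation coefficient can be computed directly from its definition: the covariance divided by the product of the two standard deviations. The sketch below uses the divisor n for both the covariance and the standard deviations, read as square roots of the averaged squared deviations; any common divisor cancels in the ratio. The function name and data are hypothetical.

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r = cov(x, y) / (s_x * s_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov / (s_x * s_y)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = correlation(x, y)
r_rescaled = correlation([100 * xi for xi in x], y)  # change the unit of x
print(round(r, 4), round(r_rescaled, 4))
```

Rescaling x leaves r unchanged, which illustrates the "unit free" property, and a perfectly increasing or decreasing straight line gives r of 1 or -1.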

Examples of Approximate r Values

[Figure: five scatter plots of y against x with approximate r values of -1.00, -0.60, 0.00, 0.20, and 1.00]
