Module 3
bschool.cms.ac.in
Introduction
Comparative Analysis
Correlation Analysis
• Consider an example:
• Typically, in the summer as the temperature increases people are
thirstier.
• Consider the two numerical variables, temperature and water
consumption.
• We would expect the higher the temperature, the more water a
given person would consume.
• Thus we would say that in the summer, temperature and water
consumption are positively correlated.
• Consider another example:
• For seven random summer days, a person recorded the
temperature and their water consumption, during a three-hour
period spent outside.
Temperature (F)   Water Consumption (Ounces)
75                16
83                20
85                25
85                27
92                32
97                48
99                48
• Plotted as a scatter diagram, these seven points show what appears to be
a roughly linear relationship between temperature and the amount of
water one drinks.
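The strength of this relationship can be checked numerically. The sketch below computes Pearson's r for the seven observations from deviations about the means; the function name `pearson_r` is an illustrative choice, not something defined in the module.

```python
from math import sqrt

# Data from the table above: temperature (F) and water consumption (ounces).
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def pearson_r(x, y):
    """Pearson's r computed from deviations about the two means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(temperature, water)
print(round(r, 3))
```

A value close to +1 supports the reading of a strong positive linear relationship for this small sample.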
Significance of Measuring Correlation
Correlation Coefficient
• Positive Correlation: As the ‘X’ variable increases, so does the ‘Y’
variable.
• Negative Correlation: As the ‘X’ variable increases, the ‘Y’ variable
decreases.
Example: As the price of an item increases, the number of items sold decreases.
• r = 0: No linear relationship.
Measures of Correlation
Scatter Diagram
• If the points lie exactly on an upward-sloping straight line, the variables
are said to be perfectly positively correlated.
• If the points are clustered around an upward-sloping line, the variables are
positively correlated.
• If the points lie exactly on a downward-sloping straight line, the variables
are said to be perfectly negatively correlated.
• If the points are clustered around a downward-sloping line, the variables are
negatively correlated.
• If the points are spread all over the graph, the variables are not correlated.
Karl Pearson’s Correlation Coefficient
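For reference, a standard computational form of the coefficient is sketched here. It is the usual textbook formula and is assumed to be the one intended, since it uses exactly the column totals (X, Y, X², Y², XY) that appear in the working tables of the practice problems; n denotes the number of paired observations.

```latex
r = \frac{n\sum XY - \sum X \sum Y}
         {\sqrt{\left[\,n\sum X^{2} - \left(\sum X\right)^{2}\right]
                \left[\,n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}
```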
Properties of Karl Pearson’s Correlation Coefficient
• The value of r does not depend upon the units of measurement.
• The value of r does not depend upon which variable is labelled ‘X’
and which is labelled ‘Y’
• Correlation coefficient lies between -1 and 1. A positive value of r
means a positive linear relationship, a negative value means a
negative linear relationship
• If r = ±1, then all the points of the scatter diagram lie exactly on a
straight line and the correlation is said to be positive perfect if r =
+1 and negative perfect if r = -1.
• ‘r’ measures only the linear relationship between ‘X’ and ‘Y’
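The first three properties can be checked on the temperature and water data from the earlier example. This is a minimal sketch: the unit conversions (Fahrenheit to Celsius, ounces to millilitres at roughly 29.5735 mL per US fluid ounce) are added purely for illustration, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the computational (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


temp_f = [75, 83, 85, 85, 92, 97, 99]
water_oz = [16, 20, 25, 27, 32, 48, 48]

# Change the units of both variables (illustrative conversions).
temp_c = [(f - 32) * 5 / 9 for f in temp_f]
water_ml = [oz * 29.5735 for oz in water_oz]

r_original = pearson_r(temp_f, water_oz)
r_converted = pearson_r(temp_c, water_ml)   # same r: units do not matter
r_swapped = pearson_r(water_oz, temp_f)     # same r: X and Y labels do not matter

print(round(r_original, 4), round(r_converted, 4), round(r_swapped, 4))
```

All three values agree up to floating-point rounding, and each lies between -1 and 1.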
PRACTICE :
Numerical Problems
• A travel and leisure magazine provides an annual list of the 500
best hotels in India. The magazine provides a rating for each
hotel along with a brief description that includes the size of the
hotel, amenities and the cost per night for a double room. A
sample of 12 of the top-rated hotels in India is as follows:
Hotel                      Location                  No. of Rooms   Cost/night (Rs. ’00)
Cubs Trail Resort          Kanha, MP                 220            499
Seasons Resort and Spa     Cochin, Kerala            727            340
Buffalo Inn                Coorg, Karnataka          285            585
Swasti Heritage Hotel      Udaipur, Rajasthan        273            495
Tiger Den                  Jim Corbett, Uttarakhand  145            495
Snowden Spa and resorts    Dharmashala, HP           213            279
Sun & Sand Beach Resort    Panjim, Goa               398            279
Sand Stone Beach Resort    Mahabalipuram, TN         343            455
Snow View Towers           Gangtok, Sikkim           250            595
Six Seasons Beach Resort   Vizag, AP                 414            367
Golden Sands               Mapusa, Goa               400            675
Chiru Towers               Hyderabad, Telangana      700            420
Problem 1
Questions:
a. Develop a scatter diagram with the number of rooms on the
horizontal axis and the cost per night on the vertical axis. Does
there appear to be a relationship between the number of rooms
and the cost per night? Discuss.
b. What is the correlation coefficient? What does it tell you about
the relationship between the number of rooms and the cost per
night for a double room? Does this appear reasonable? Discuss.
The data points on the scatter diagram do not follow any pattern. They
cluster around neither a positive slope nor a negative slope.
Hence, there appears to be no relationship between the number of rooms
and the cost per night.
X      Y      X²        Y²        XY
220    499    48400     249001    109780
727    340    528529    115600    247180
285    585    81225     342225    166725
273    495    74529     245025    135135
145    495    21025     245025    71775
213    279    45369     77841     59427
398    279    158404    77841     111042
343    455    117649    207025    156065
250    595    62500     354025    148750
414    367    171396    134689    151938
400    675    160000    455625    270000
700    420    490000    176400    294000
ΣX = 4368, ΣY = 5484, ΣX² = 1959026, ΣY² = 2680322, ΣXY = 1921817
Solution 1
Since r = -0.29, there is a weak negative correlation between the number of
rooms and the cost per night.
This appears reasonable because it agrees with the scatter diagram, which shows
no clear pattern.
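The value of r can be reproduced from the column totals of the working table. The sketch below assumes the standard computational formula for Pearson's r; the variable names are illustrative.

```python
from math import sqrt

# Column totals from the working table for the 12 hotels.
n = 12
sum_x, sum_y = 4368, 5484            # rooms, cost per night
sum_x2, sum_y2 = 1959026, 2680322
sum_xy = 1921817

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 2))  # -0.29
```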
Problem 2
A newly appointed finance secretary receives feedback from his team in a
review meeting about rising unemployment in the country. Coming from
a science background, he decides to examine various parameters to understand
the real reason behind the rise in the unemployment rate. One of the
parameters he selects is industrial production. He asks his team for data on the
industrial production index and the number of unemployed people between 2012
and 2019.
He gets the following table, which gives indices of industrial production and
the number of registered unemployed people (in lakh). He decides to use
correlation analysis to understand the relationship in the data.
Use Karl Pearson’s coefficient of correlation to find out what the
finance secretary discovers from the given data.
Index of Production   100  102  104  107  105  112  103   99
Number Unemployed      15   12   13   11   12   12   19   26
X      Y     X²       Y²     XY
100    15    10000    225    1500
102    12    10404    144    1224
104    13    10816    169    1352
107    11    11449    121    1177
105    12    11025    144    1260
112    12    12544    144    1344
103    19    10609    361    1957
99     26    9801     676    2574
ΣX = 832, ΣY = 120, ΣX² = 86648, ΣY² = 1984, ΣXY = 12388
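As a rough check on this working table, the sketch below rebuilds the column totals from the raw X and Y values and evaluates Pearson's r. The helper name `column_totals` is hypothetical, and the interpretation in the final comment is a suggested reading rather than the module's stated solution.

```python
from math import sqrt

production_index = [100, 102, 104, 107, 105, 112, 103, 99]   # X
unemployed_lakh = [15, 12, 13, 11, 12, 12, 19, 26]           # Y


def column_totals(x, y):
    """Return the totals of the X, Y, X^2, Y^2 and XY columns."""
    return {
        "x": sum(x),
        "y": sum(y),
        "x2": sum(a * a for a in x),
        "y2": sum(b * b for b in y),
        "xy": sum(a * b for a, b in zip(x, y)),
    }


n = len(production_index)
t = column_totals(production_index, unemployed_lakh)
r = (n * t["xy"] - t["x"] * t["y"]) / sqrt(
    (n * t["x2"] - t["x"] ** 2) * (n * t["y2"] - t["y"] ** 2)
)

print(t)
print(round(r, 3))
# A negative r here would suggest that higher industrial production
# tends to go with fewer registered unemployed people.
```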
Problem 3
A financial analyst wanted to find out whether inventory turnover influences
a company’s earnings per share (in percent). A random sample of 7
companies listed on a stock exchange was selected and the following data were
recorded for each. Find the strength of association between inventory turnover
and earnings per share. Interpret this finding for the analyst.
Company   Inventory Turnover (no. of times)   Earnings per share (percent)
A         4                                   11
B         5                                   9
C         7                                   13
D         8                                   7
E         6                                   13
F         3                                   8
G         5                                   8
X    Y     X²    Y²     XY
4    11    16    121    44
5    9     25    81     45
7    13    49    169    91
8    7     64    49     56
6    13    36    169    78
3    8     9     64     24
5    8     25    64     40
Solution 3
Since r ≈ 0.134, there is a weak positive correlation between inventory turnover
and earnings per share.
This means that earnings per share tend to rise only slightly, and not
significantly, as inventory turnover increases.
Problem 4
A nutritionist well-known for her nutritional prescriptions to pregnant women
wishes to estimate the association between gestational age and infant birth
weight in order to enhance her prescriptions. For this, a small study is
conducted involving 10 infants to investigate the association between
gestational age at birth, measured in weeks, and birth weight, measured in
grams. Calculate the association and give recommendations to the nutritionist.
Infant ID Gestational Age (In Weeks) Birth Weight (In Grams)
1 35 1895
2 36 2030
3 29 1440
4 40 2835
5 36 3090
6 42 3827
7 40 3260
8 37 2690
9 41 3285
10 38 2920
Problem 5
The success of a shopping center can be represented as a function of the
distance (in miles) from the center of the population and the number of clients
(in hundreds of people) who will visit. The data is given in the table below.
Calculate the linear correlation coefficient.
No. Customers 8 7 6 4 2 1
Distance 15 19 25 23 34 40
Association Analysis of Ranked Order
• At times we need to measure the strength of the linear relationship between
variables using data which can be trusted only to the extent of its rank
ordering.
• The rank correlation coefficient may be used in many situations, for which the
conventional correlation coefficient is unsuitable.
• Spearman's rank correlation coefficient is a measure of rank correlation
(statistical dependence between the rankings of two variables).
• When the given pairs of observations in the data set are not ranked, ranks
are assigned by taking either the highest or the lowest value as rank 1, using
the same convention for both variables.
• While ranking the observations in this way, we may find two or more
observations with the same value.
• In such a case, the rank assigned to each tied observation is the
average of the ranks these observations would have received had
they differed from each other.
• For example, if two observations tie for fourth place, they would otherwise
occupy ranks 4 and 5, so the rank (4 + 5)/2 = 4.5 is assigned to each of them.
The next rank would be 6, not 5.
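The averaging rule for tied observations can be sketched as a small ranking function. The name `average_ranks` and the sample scores are hypothetical and are used only to illustrate the rule.

```python
def average_ranks(values, highest_first=True):
    """Rank values (1 = highest by default); ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=highest_first)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of observations tied with order[pos].
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end correspond to ranks pos+1..end+1; average them.
        shared_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


scores = [90, 85, 80, 70, 70, 60]   # hypothetical data with a tie at fourth place
ranks = average_ranks(scores)
print(ranks)  # [1.0, 2.0, 3.0, 4.5, 4.5, 6.0]
```

The two tied scores each receive 4.5, and the next observation receives rank 6.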
Spearman’s Rank Correlation Coefficient
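For reference, the usual textbook form of the coefficient is given here. It is assumed to be the formula intended, since the solutions that follow tabulate the squared rank differences D² and their total; D is the difference between the two ranks of an observation and n is the number of paired observations.

```latex
\rho = 1 - \frac{6\sum D^{2}}{n\left(n^{2} - 1\right)}
```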
Problem 1
In one of its recruitment drives, Tata Motors Limited (TML) decided to
select a group of employees for skill-based training on the basis of
aptitude tests. On completion of training, the quality of their work was assessed
and they were ranked again, as follows, where ‘X’ denotes aptitude ranking and ‘Y’
denotes quality ranking. Calculate the rank correlation and comment on the
selection of employees.
X 2 1 3 7 6 8 4 5 10 9
Y 3 2 1 8 4 9 5 6 10 7
Solution 1
X     Y     D     D²
2     3     -1    1
1     2     -1    1
3     1     2     4
7     8     -1    1
6     4     2     4
8     9     -1    1
4     5     -1    1
5     6     -1    1
10    10    0     0
9     7     2     4
ΣD² = 18
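The coefficient for this table can be evaluated with a short sketch, assuming the standard formula ρ = 1 − 6ΣD² / (n(n² − 1)) for untied ranks. The function name `spearman_rho` is illustrative.

```python
aptitude_rank = [2, 1, 3, 7, 6, 8, 4, 5, 10, 9]   # X
quality_rank = [3, 2, 1, 8, 4, 9, 5, 6, 10, 7]    # Y


def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of ranks (no tie correction)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rho = spearman_rho(aptitude_rank, quality_rank)
print(round(rho, 3))
```

A value near +1 would indicate that the aptitude ranking agrees closely with the later quality ranking, which speaks in favour of the selection method.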
Problem 2
The following table provides data about the percentage of students who have qualified for a
scholarship offered by the state universities and their CGPA scores. Calculate the Spearman’s
Rank Correlation between the two and interpret the result.
State University % of Students qualified for scholarship % of students scoring above 8.5 CGPA
Bangalore 14 54
Delhi 7 64
Mumbai 27 44
Jaipur 33 32
Kolkata 38 37
Raipur 16 68
Vishakhapatnam 5 62
Trichy 8 43
Bhopal 29 49
Cuttack 18 52
X     Rank X   Y     Rank Y   D²
14    7        54    4        9
7     9        64    2        49
27    4        44    7        9
33    2        32    10       64
38    1        37    9        64
16    6        68    1        25
5     10       62    3        49
8     8        43    8        0
29    3        49    6        9
18    5        52    5        0
ΣD² = 278
Problem 3
Following a tradition of many years, the Department of Horticulture, Karnataka
organized its annual Republic Day flower show at Lalbagh Botanical Garden,
Bengaluru from 17th to 28th January 2020. As is the practice, the best theme would
be awarded. To judge the display of various flowers, a panel comprising three judges
was appointed. There were eight participants, who were ranked by the panel based on
mutually agreed criteria. The panel’s rankings are as follows:
Participant No. 1 2 3 4 5 6 7 8
Judge 1 4 5 2 1 6 8 7 3
Judge 2 3 2 6 8 1 5 7 4
Judge 3 1 5 3 6 8 7 4 2
Using Spearman’s rank correlation coefficient, name the two judges among the three
who have the closest views regarding the display of flowers.
Correlation between Judge 1 and Judge 2
X (Judge 1)   Y (Judge 2)   D²
4             3             1
5             2             9
2             6             16
1             8             49
6             1             25
8             5             9
7             7             0
3             4             1
ΣD² = 110
Correlation between Judge 2 and Judge 3
Y (Judge 2)   Z (Judge 3)   D²
3             1             4
2             5             9
6             3             9
8             6             4
1             8             49
5             7             4
7             4             9
4             2             4
ΣD² = 92
Correlation between Judge 3 and Judge 1
Z (Judge 3)   X (Judge 1)   D²
1             4             9
5             5             0
3             2             1
6             1             25
8             6             4
7             8             1
4             7             9
2             3             1
ΣD² = 50
Since the correlation between Judge 1 and Judge 3 is the highest, these two judges
have the closest views regarding the display of flowers.
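The three pairwise comparisons can be run in one pass. This is a minimal sketch assuming the untied-rank formula ρ = 1 − 6ΣD² / (n(n² − 1)); the dictionary layout and function name are illustrative.

```python
from itertools import combinations

# Rankings of the eight participants by each judge, as given in the problem.
rankings = {
    "Judge 1": [4, 5, 2, 1, 6, 8, 7, 3],
    "Judge 2": [3, 2, 6, 8, 1, 5, 7, 4],
    "Judge 3": [1, 5, 3, 6, 8, 7, 4, 2],
}


def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Compute rho for every pair of judges and pick the pair with the largest value.
rhos = {
    (first, second): spearman_rho(rankings[first], rankings[second])
    for first, second in combinations(rankings, 2)
}
best_pair = max(rhos, key=rhos.get)

for pair, value in rhos.items():
    print(pair, round(value, 3))
print("Closest views:", best_pair)
```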
Problem 4
From the Covid-19 data compiled by the Ministry of Health and Family Welfare, India,
the following data are selected to measure the association between the number
of active cases, the number of cured cases and the number of deaths. Using
Spearman’s rank correlation coefficient, name the two among the three variables
that have the closest association.
State No. of active cases No. of cured cases No. of deaths
X (Active Cases)   Rank X   Y (Cured Cases)   Rank Y   Z (Deaths)   Rank Z
405                5        426               5        31           4.5
364                6        377               6        6            6.5
26                 8        489               4        6            6.5
29                 7        34                8        2            8
1678               3        168               7        31           4.5
X      Y      D²
5      5      0
1      1      0
6      6      0
4      2      4
8      4      16
2      3      1
7      8      1
3      7      16
ΣD² = 38
Y      Z      D²
5      4.5    0.25
1      1      0
6      6.5    0.25
2      3      1
4      6.5    6.25
3      2      1
8      8      0
7      4.5    6.25
ΣD² = 15
Z      X      D²
4.5    5      0.25
1      1      0
6.5    6      0.25
3      4      1
6.5    8      2.25
2      2      0
8      7      1
4.5    3      2.25
ΣD² = 7
Since the correlation between the number of active cases and the number of deaths is
the highest, these two variables have the closest association.
Problem 5
The following data corresponds to the scores of a student of MBA at Jain
University in continuous assessment in 2nd Semester. His mentor wishes to
know if there is any association between the marks scored by the student in
two subjects. Use Spearman’s rank correlation analysis to measure the
association and interpret the result.
X 78 42 90 24 73 80 81 62 65 42
Y 84 51 92 43 75 54 86 54 54 43
X     Rank X   Y     Rank Y   D²
78    4        84    3        1
42    8.5      51    8        0.25
90    1        92    1        0
24    10       43    9.5      0.25
73    5        75    4        1
80    3        54    6        9
81    2        86    2        0
62    7        54    6        1
65    6        54    6        0
42    8.5      43    9.5      1
ΣD² = 13.5
Problem 6
TVS Motor Company is about to launch their new 100 CC stylish scooter targeted at
the youth. As part of the testing processes, they decide to invite the general public to
test drive the scooters in order to evaluate its mileage. For this experiment, the
company selects two youths as test drivers from two different colleges in Bengaluru
and Chennai. Each driver is supposed to travel a distance on 9 random routes and
record observations. The observations are as follows. Use Spearman’s rank correlation
analysis to measure the association between two drivers and interpret the result.
X 41 49 52 35 41 42 30 50 48
Y 51 44 44 47 49 51 28 39 22
X     Rank X   Y     Rank Y   D²
41    6.5      51    1.5      25
49    3        44    5.5      6.25
52    1        44    5.5      20.25
35    8        47    4        16
41    6.5      49    3        12.25
42    5        51    1.5      12.25
30    9        28    8        1
50    2        39    7        25
48    4        22    9        25
ΣD² = 143
Business Prediction Models
• Irrespective of sector, organizations are racing to predict their future,
whether in terms of opportunities or challenges.
• They are looking for ways to predict (forecast).
• One of the oldest and most common forecasting strategies is to extrapolate
from historical data.
• As the volume of data grew, so did the need to analyse it using relevant tools.
• Statistical tools were the result.
• Regression analysis is one of many such statistical forecasting tools.
• Others include simulation techniques, exponential smoothing, etc.
• Regression analysis is one of the most tried and tested, popular
statistical tools for forecasting.
Regression Analysis
• It was Sir Francis Galton who first used the term regression as a
statistical concept in 1877.
• He made a statistical study that showed that the height of children
born to tall parents tends to ‘regress’ towards the mean height of
population.
• Galton used the term regression as a statistical technique to
predict one variable (the height of children) from another variable
(the height of parents).
• This is called ‘regression’ or ‘simple regression’ confined to
bivariate data.
• Regression analysis tells us how one variable is related to another
by providing an equation that allows us to use the known value of
one or more variables, to estimate the unknown value of the
remaining variable.
• A statistical model is a set of mathematical formulae and
assumptions which describe a real world situation.
• Regression analysis is a mathematical measure, which helps to
determine the probable form of the relationship between variables
and it is used to predict or estimate the value of one variable,
corresponding to a given value of another variable.
• The variable being predicted is called dependent variable and
variable used to predict the value of dependent variable is called
independent variable.
• The simplest type of regression analysis involving one
independent variable and one dependent variable in which the
relationship between the variables is approximated by a straight
line is called linear regression.
• Regression analysis involving two or more independent variables
is called multiple regression analysis.
• The relationship between two variables is quantified by
representing the line of best fit as a mathematical equation known
as regression equation.
• In other words, the linear relationship between two variables can
be described by a straight line, which is known as regression line.
Regression Analysis - LEAST SQUARE METHOD
The goal is to minimize the sum of the squares of the errors of the data
points, where the error for point i is Ei = Yi - (a + bXi). This also minimizes
the Mean Square Error.
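As a worked sketch of this minimization, setting the partial derivatives of the sum of squared errors with respect to a and b to zero gives the two normal equations, which can then be solved for the slope and intercept. This is the standard textbook derivation for the line Y = a + bX, with n the number of data points; it is offered as a reference, not as the module's own working.

```latex
\begin{aligned}
S(a,b) &= \sum_{i=1}^{n} E_i^{2} = \sum_{i=1}^{n}\bigl(Y_i - a - bX_i\bigr)^{2} \\
\frac{\partial S}{\partial a} = 0 &\;\Rightarrow\; \sum Y = na + b\sum X \\
\frac{\partial S}{\partial b} = 0 &\;\Rightarrow\; \sum XY = a\sum X + b\sum X^{2} \\
b &= \frac{n\sum XY - \sum X\sum Y}{n\sum X^{2} - \left(\sum X\right)^{2}},
\qquad a = \bar{Y} - b\,\bar{X}
\end{aligned}
```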
Regression Equations
Regression Coefficients
Properties of Regression Coefficients
• The correlation coefficient is the geometric mean of the two
regression coefficients.
• The arithmetic mean of the regression coefficients is greater than or
equal to the correlation coefficient.
• Regression coefficients are independent of change of origin but
not of scale.
• If one of the regression coefficients is greater than unity, the other
must be less than unity.
• Both regression coefficients have the same sign, either
positive or negative.
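These properties can be checked numerically. The sketch below reuses the temperature and water consumption data from the correlation example and assumes the usual definitions b_yx = Sxy/Sxx (Y on X) and b_xy = Sxy/Syy (X on Y); the variable names are illustrative.

```python
from math import sqrt

temperature = [75, 83, 85, 85, 92, 97, 99]   # X
water = [16, 20, 25, 27, 32, 48, 48]         # Y


def sums_of_deviations(x, y):
    """Return (Sxy, Sxx, Syy) computed about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy, s_xx, s_yy


s_xy, s_xx, s_yy = sums_of_deviations(temperature, water)
b_yx = s_xy / s_xx                 # regression coefficient of Y on X
b_xy = s_xy / s_yy                 # regression coefficient of X on Y
r = s_xy / sqrt(s_xx * s_yy)

geometric_mean = sqrt(b_yx * b_xy)
arithmetic_mean = (b_yx + b_xy) / 2

# Change of origin: subtracting a constant from X leaves b_yx unchanged.
shifted = [t - 32 for t in temperature]
s_xy_shift, s_xx_shift, _ = sums_of_deviations(shifted, water)
b_yx_shifted = s_xy_shift / s_xx_shift

print(round(b_yx, 4), round(b_xy, 4), round(r, 4))
print(round(geometric_mean, 4), round(arithmetic_mean, 4), round(b_yx_shifted, 4))
```

Here both coefficients are positive, one exceeds 1 while the other is below 1, their geometric mean equals r, and their arithmetic mean is at least r.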
Problem 1
• The Government of India has been announcing many reforms during the
pandemic period. As part of this activity, it has asked the
Ministry of Commerce and Industry to model the relationship between the
import and export values of the electronics sector in the country. The ministry has
gathered data for 2013-14 to 2018-19 from DGCIS for the
prediction. Use regression analysis to model the relation of import on
export, and vice versa, for the electronics sector. Also predict the import for
the year 2020-21 given that the export will be USD 11 Billion.
X     Y    X²      Y²    XY
32    8    1024    64    256
36    6    1296    36    216
40    6    1600    36    240
42    7    1764    49    294
51    7    2601    49    357
55    9    3025    81    495
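A minimal sketch of the two regression lines for this table follows. It assumes, as a labelled guess, that the first column is the import value and the second the export value (both in USD billion), since the table's column headings are not given; the variable names are illustrative.

```python
# ASSUMPTION: first column = imports (X), second column = exports (Y), USD billion.
import_values = [32, 36, 40, 42, 51, 55]
export_values = [8, 6, 6, 7, 7, 9]

n = len(import_values)
mean_import = sum(import_values) / n
mean_export = sum(export_values) / n

s_xy = sum(x * y for x, y in zip(import_values, export_values)) - n * mean_import * mean_export
s_xx = sum(x * x for x in import_values) - n * mean_import ** 2
s_yy = sum(y * y for y in export_values) - n * mean_export ** 2

b_export_on_import = s_xy / s_xx   # slope of the line: export on import
b_import_on_export = s_xy / s_yy   # slope of the line: import on export


def predict_import(export_value):
    """Regression line of import on export, passing through the two means."""
    return mean_import + b_import_on_export * (export_value - mean_export)


predicted_import = predict_import(11)
print(round(b_export_on_import, 4), round(b_import_on_export, 4))
print(round(predicted_import, 2))
```

Predicting import from a given export uses the line of import on export, which is why that coefficient appears in `predict_import`. An export of 11 lies outside the observed range, so the estimate is an extrapolation.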
Problem 2
• Ralison Appliances Pvt. Ltd. manufactures different types of electrical
appliances in India. It has been using radio (FM) for advertising its products.
The following table shows the amounts of radio time and the number of
electrical appliances sold over seven weeks. Fit linear equations of radio time
on the number of electrical appliances sold and vice-versa. Also calculate the
sales when the radio time is 24 minutes.
Multiple Linear Regression
• Multiple linear regression (MLR), also known simply as multiple
regression, is a statistical technique that uses several explanatory
variables to predict the outcome of a response variable.
• The goal is to model the linear relationship between the
explanatory (independent) variables and response (dependent)
variable.
• In essence, multiple regression is the extension of ordinary
least-squares regression that involves more than one explanatory
variable.
• A simple linear regression is a function that allows an analyst or
statistician to make predictions about one variable based on the
information that is known about another variable.
• Linear regression can only be used when one has two continuous
variables—an independent variable and a dependent variable.
• The independent variable is the parameter that is used to calculate
the dependent variable or outcome.
• A multiple regression model extends to several explanatory
variables.
• For example, an analyst may want to know how the movement of
the market affects the price of Exxon Mobil (XOM).
• In this case, his linear equation will have the value of the S&P 500
index as the independent variable, or predictor, and the price of
XOM as the dependent variable.
• In reality, there are multiple factors that predict the outcome of an
event.
• The price movement of Exxon Mobil, for example, depends on
more than just the performance of the overall market.
• Other predictors such as the price of oil, interest rates, and the
price movement of oil futures can affect the price of XOM and stock
prices of other oil companies.
• To understand a relationship in which more than two variables are
present, a multiple linear regression is used.
• Multiple linear regression is used to determine a mathematical
relationship among a number of random variables.
• In other terms, it examines how multiple independent variables
are related to one dependent variable.
• Once each of the independent factors has been determined to
predict the dependent variable, the information on the multiple
variables can be used to create an accurate prediction on the level
of effect they have on the outcome variable.
• The model creates a relationship in the form of a straight line
(linear) that best approximates all the individual data points.
A two-variable multiple linear regression equation is given as:
Y = a + b₁X₁ + b₂X₂
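To illustrate how a, b₁ and b₂ can be estimated, the sketch below builds the least-squares normal equations for two explanatory variables and solves them with a small Gauss-Jordan routine. The data are hypothetical and constructed so that Y = 1 + 2X₁ + 3X₂ exactly; all function names are illustrative.

```python
def dot(u, v):
    """Sum of element-wise products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [p - factor * q for p, q in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for Y = a + b1*X1 + b2*X2."""
    n = len(y)
    normal_matrix = [
        [n, sum(x1), sum(x2)],
        [sum(x1), dot(x1, x1), dot(x1, x2)],
        [sum(x2), dot(x1, x2), dot(x2, x2)],
    ]
    normal_rhs = [sum(y), dot(x1, y), dot(x2, y)]
    return solve_linear_system(normal_matrix, normal_rhs)


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with no noise.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * p + 3 * q for p, q in zip(x1, x2)]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the data contain no noise, the fit recovers the generating coefficients; with real data the estimates would only approximate the underlying relationship.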
Problem 3
• People in the aerospace industry believe the cost of a space project is a
function of the weight of the major object being sent into space. Use the
following data to develop a regression model to predict the cost of a space
project by the weight of the space object.
Problem 4
• The editor-in-chief of Bangalore Mirror has been trying to convince the
paper’s owner to improve the working conditions in the press room. He is
convinced that the noise level, when the presses are running, creates
unhealthy levels of tension and anxiety. He recently had a psychologist
conduct a test during which pressmen were placed in rooms with varying
levels of noise and then given a test to measure mood and anxiety levels. The
following table shows the index of their degrees of nervousness and the level
of noise to which they were exposed (5 is low and 10 is high). Develop
estimating equations. Also predict the degrees of nervousness that we might
expect when the noise level is 7.5
Noise Level 7.0 6.5 5.5 6.0 8.0 8.5 6.0 6.5
Degree of Nervousness 23 38 45 36 16 18 39 41
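One of the two estimating equations, nervousness on noise level, can be sketched as follows using the least-squares formulas for the slope and intercept. The variable names are illustrative, and the prediction at 7.5 simply evaluates the fitted line.

```python
noise_level = [7.0, 6.5, 5.5, 6.0, 8.0, 8.5, 6.0, 6.5]   # X
nervousness = [23, 38, 45, 36, 16, 18, 39, 41]           # Y

n = len(noise_level)
sum_x = sum(noise_level)
sum_y = sum(nervousness)
sum_xy = sum(x * y for x, y in zip(noise_level, nervousness))
sum_x2 = sum(x * x for x in noise_level)

# Least-squares slope and intercept for Y = a + bX.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * sum_x / n

predicted_nervousness = intercept + slope * 7.5
print(intercept, slope, predicted_nervousness)
```

The negative slope indicates that, in this sample, the nervousness index falls as the recorded noise level rises.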
Problem 5
• As a part of a study on transportation safety, Karnataka State Government
collected data on number of fatal accidents per 1000 licenses and percentage
of licensed drivers under the age of 21 in 10 cities of the state. The data is
tabulated as below. Fit linear equations to the above data.
Problem 6
• A researcher at Jain University wished to investigate whether there is any
relationship between atmospheric temperature (in °C) and the number of
Covid-19 cases. In this regard, the researcher collected the following data
from 12 random states in India. Model the relation between the
temperature and the Covid-19 cases. Also estimate the number of Covid-19
cases when the temperature is 8 °C.
State Average Temperature (°C) No. of Covid-19 Cases
HP 17 38
J&K 18 540
UP 33 2055
Delhi 32.5 5894
Chhattisgarh 35 29
West Bengal 30.5 1818
Karnataka 27 535
Andhra Pradesh 32 952
Tamil Nadu 34 8423
Gujarat 38 6906
Goa 30 10
Rajasthan 37 2522
Problem 7
• Hyundai Motor India Ltd has recently held 3-day road-side exhibits on the
introduction of its new model of Creta. The number of sales personnel
employed at each of a sample of 10 exhibitions and the number of cars
booked at each one are given as follows. Using these data, regress the number
of cars booked on the number of salesmen and obtain the regression
equation. Also estimate the number of cars booked if 10 salesmen are
employed on an exhibition.
No. of Salesmen 5 8 6 8 9 3 5 4 6 6
No. of Cars booked 132 160 148 156 168 102 142 98 152 142
Problem 8
• ITI Limited recorded data showing the experience of machine operators and
their performance rating as given by the number of good parts turned out per
100 pieces. Obtain the regression equation of performance rating on
experience. Use this equation to estimate the probable performance if an
operator has 7 years of experience.
Operator 1 2 3 4 5 6 7 8
Experience (Years) 16 12 18 4 3 10 5 12
Performance Rating 87 88 89 68 78 80 75 82
Basic terminologies
•POPULATION
•SAMPLE
•CENSUS
•SAMPLING
•STATISTIC
•PARAMETER
Population
Census
Parameter
Sample
Sampling
Statistic
What is a good sample?
When is Census appropriate?
•A census is appropriate if the population size itself is
quite small
•If the cost of making an incorrect decision is high, then
a census is more appropriate
•If the sampling errors are high, then a census may be
more appropriate than a sample
Advantages of Census
Disadvantages of Census
•Costly process
•Time consuming
•High amount of manpower and effort required
•Handling huge data collection
•Maintenance of database
Sampling method is more desirable when…
•The Population is very large
•Quick results are required
•The study involves destruction of the elementary
units under study
•The cost of conducting surveys is prohibitive
•Handling large data sets is difficult
Advantages of Sampling
Reasons for Taking a Census
Reasons for Sampling
• Sampling can save money.
• Sampling can save time.
• For given resources, sampling can broaden the scope of the data set.
• Because the research process is sometimes destructive, sampling
can save product.
• If accessing the entire population is impossible, sampling is the only option.
Sampling : Design and procedures
• The sampling design process
Target population
Sampling frame
Sample size
Qualitative factors
Formula
Numericals
Maggi Omega is worried about its reduced sales
and has therefore decided to conduct a survey with
a precision level of ± 5 and a confidence
level of 95%. The standard deviation of the
population is known to be 55. Determine the
sample size.
At a confidence level of 95% and a precision
level of ± 4, determine the sample size for a survey,
given that the standard deviation of the population
is 39.
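Both sample-size numericals can be worked with a short sketch. It assumes the standard formula for estimating a mean with known population standard deviation, n = (Zσ / E)², with Z = 1.96 for 95% confidence and the result rounded up; this formula and the function name are assumptions, not taken from the slides.

```python
from math import ceil


def sample_size_for_mean(z_value, std_dev, precision):
    """Sample size n = (Z * sigma / E)^2, rounded up to the next whole unit."""
    return ceil((z_value * std_dev / precision) ** 2)


Z_95 = 1.96  # standard normal value for a 95% confidence level

n_first = sample_size_for_mean(Z_95, std_dev=55, precision=5)
n_second = sample_size_for_mean(Z_95, std_dev=39, precision=4)
print(n_first, n_second)  # 465 366
```

Rounding up rather than to the nearest integer keeps the achieved precision at least as good as the one requested.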
Sampling techniques
Probability Sampling
Non-probability sampling
1. Convenience sampling
2. Judgmental sampling
3. Quota sampling
4. Snowball sampling
Types of probability sampling
•Simple random sampling
•Systematic sampling
•Stratified random sampling
•Cluster sampling
•Multistage sampling
•Area sampling
•Multiphase sampling
Simple random sampling
A probability sampling technique in which each
element in the population has a known and equal
probability of selection.
Every element is selected independently of every other
element and the sample is drawn by a random
procedure from a sampling frame.
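A simple random sample can be sketched with the standard library. The sampling frame of 100 labelled units and the seed are hypothetical; `random.Random.sample` draws without replacement, so each unit in the frame has the same chance of ending up in the sample.

```python
import random

# Hypothetical sampling frame: a list of 100 population elements.
sampling_frame = [f"unit_{i:03d}" for i in range(1, 101)]

rng = random.Random(42)          # fixed seed so the draw is reproducible
sample = rng.sample(sampling_frame, k=10)

print(sample)
```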
Systematic sampling
Stratified sampling
Classification of stratified sampling
Stratified sampling is broadly classified as proportional stratified sampling
and disproportional stratified sampling.
Proportional stratified sampling is further classified as directly proportional
stratified sampling and inversely proportional stratified sampling.
Directly proportional stratified sampling
Assume that a researcher is evaluating customer satisfaction
for a beverage that is consumed by a total of 600 people.
Among the 600 people, 400 are brand loyal and 200 are
variety seeking.
Inversely proportional stratified sampling
Assume that among the 600 consumers in the
population, 200 are heavy drinkers and 400 are light
drinkers. If a researcher values the opinion of the heavy
drinkers more than that of the light drinkers, more
people will have to be sampled from the heavy drinkers
group. In such instances, one can use an inversely
proportional stratified sampling
If a sample size of 60 is desired, then a 10 percent inversely
proportional stratified sample is employed.
Total 600 60
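The two allocation rules can be sketched for a total sample of 60 drawn from 600 people. The inverse rule used here, allocating in proportion to 1/(stratum size), is an assumption about what the slide intends; under it the smaller stratum receives the larger share. The function names are illustrative.

```python
def directly_proportional(strata_sizes, sample_size):
    """Allocate the sample in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


def inversely_proportional(strata_sizes, sample_size):
    """Allocate the sample in proportion to 1 / (stratum size). ASSUMED rule."""
    weights = {name: 1 / size for name, size in strata_sizes.items()}
    total_weight = sum(weights.values())
    return {name: round(sample_size * weight / total_weight)
            for name, weight in weights.items()}


loyalty_strata = {"brand loyal": 400, "variety seeking": 200}
drinker_strata = {"heavy drinkers": 200, "light drinkers": 400}

direct = directly_proportional(loyalty_strata, 60)
inverse = inversely_proportional(drinker_strata, 60)
print(direct)
print(inverse)
```

With these figures the direct rule samples 40 brand-loyal and 20 variety-seeking consumers, while the inverse rule samples 40 heavy and 20 light drinkers.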
Cluster sampling
First, the target population is divided into mutually exclusive and
collectively exhaustive subpopulations called clusters.
Then, a random sample of clusters is selected based on a probability
sampling technique such as simple random sampling.
For each selected cluster, either all the elements are included in the
sample or a sample of elements is drawn probabilistically.
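The three steps can be sketched as follows. The population of 8 clusters with 5 households each, the seed, and the sample sizes are all hypothetical; the one-stage variant keeps every element of the chosen clusters, while the two-stage variant draws a simple random sample within each.

```python
import random

rng = random.Random(7)  # fixed seed so the draw is reproducible

# Step 1: hypothetical population split into mutually exclusive clusters.
clusters = {
    f"cluster_{c}": [f"c{c}_household_{h}" for h in range(1, 6)]
    for c in range(1, 9)
}

# Step 2: select a simple random sample of clusters.
chosen_clusters = rng.sample(sorted(clusters), k=3)

# Step 3a (one-stage): include every element of each chosen cluster.
one_stage_sample = [unit for name in chosen_clusters for unit in clusters[name]]

# Step 3b (two-stage): draw a random sample of elements within each chosen cluster.
two_stage_sample = [
    unit for name in chosen_clusters for unit in rng.sample(clusters[name], k=2)
]

print(chosen_clusters)
print(len(one_stage_sample), len(two_stage_sample))
```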
Stratified v/s Cluster sampling
Stratified sampling: Sampling efficiency improved by increasing accuracy at a faster rate than cost
Cluster sampling: Sampling efficiency improved by decreasing cost at a faster rate than accuracy