1 - Business Statistics
S. No Lecture Coverage
Text / References:
1. Levin, R. I., & Rubin, D. S. (2011). Statistics for Management. New Delhi: Prentice Hall India Publications.
Business Statistics
Origin
• The word "statistics" originated from the Italian word “statista” (meaning statesman). The term was first used by Gottfried Achenwall (1719-1772), a German statistician and philosopher.
• Long before the eighteenth century, people had been recording and using data.
Statistics is used in all fields and is the science of conducting studies to collect, organise, summarise, analyse and draw conclusions from data.
Statistics in Management Domain
• It is used in many management disciplines such as financial analysis, econometrics, auditing, production and operations (including services improvement) and marketing research.
Statistical Tools for Decision Making
• Forecasting techniques :
  • Economy fluctuations
  • Business predictions
• Trend estimates :
  • Costs, price, sales, demand, supply, profitability
• Statistics and Economics : Statistical tools are immensely useful for solving economic problems concerning wages, prices, demand analysis etc. The wide application of Mathematics and Statistics to economics has led to the development of Econometrics.
• Statistics and Business : The success of a business depends on the use of statistical models for prediction, such as correlation and regression. Any new startup depends on market study, business pointers for growth and the forecasted demand and supply for the product.
Importance & Scope of Statistics
• Statistics and Industry : Statistics is widely used for production planning, quality control and inventory management. Inspection plans, control charts and sampling are used regularly.
• Statistics and Finance : Statistical data, analysis and interpretations are widely used by analysts to predict stock movements, identify investment opportunities and guide consumers in managing their finances.
Statistics
There are 2 subdivisions in Statistics :
1. Descriptive Statistics – Presenting, organising and summarising data.
2. Inferential Statistics – Making inferences from our data and predicting the behaviour of the data, that is, drawing conclusions about populations from the samples drawn. Hypothesis testing and probability are some of the tools used.
CHAPTER 1
Objectives
• Data organisation
• Frequency distribution
1. Data processing- Classification – Summarization –
Data Processing
• Sources of Data – Primary and Secondary
• When the investigator collects first-hand data for a task, it is primary data.
• If the data is obtained from published or unpublished sources, it is secondary data.
Classification
The collected data is classified on the following four aspects :
1. Geographical
2. Chronological
3. Qualitative
4. Quantitative
Classification
2. Chronological Classification :
If the data is arranged according to time, i.e. years, months or weeks, then the classification is called chronological classification.
Eg. the yearly sales figures of a company; the profits of the company can be classified similarly.
Year                Sales figures (Millions)
2015                200
2016                230
2017                250
2018 (estimated)    300
Classification
3. Qualitative classification :
There are certain characteristics which cannot be measured, like intelligence, smell, taste, honesty etc. These are called qualitative characteristics. Classification on such characteristics can be simple or manifold.
Simple classification : Workers in a factory can be classified as skilled or unskilled.
Workers
Skilled Unskilled
Data Processing
Steps for Data processing cycle :
Collecting Data :
1. This first stage is crucial, since the quality of the data will affect the output. The data should represent all the groups.
2. Data can be collected from actual observations or from maintained records, files or libraries.
3. Data should be organised in simple, compact, usable forms.
4. Data should be accurate. “GIGO” (Garbage In, Garbage Out) should be avoided.
5. The main question is “Is the data worth using ?”
Population and Sample
• Data is gathered from a Sample. This is used to make inferences
about the population.
• Population includes all the elements of a data set.
• Sample consists of one or more observations from the population.
Example
• 25% of the cars sold in the US in 1996 were manufactured in Japan. Is this conclusion drawn from a sample or from the population ?
• An FMCG company with a global customer base of 20,000 would like to understand its customer satisfaction rate. Is it advisable to use a sample or the population for the study ?
Organising Data
A frequency distribution is a table that organises data into classes and shows the number of observations from the data that fall into each class.
Organising Data
16.2 15.8 15.8 15.8 16.3 15.6
15.7 16.0 16.2 16.2 16.8 16.0
16.4 15.2 15.9 15.9 15.9 16.8
15.4 15.7 15.9 16.0 16.3 16.0
16.4 16.6 15.6 15.6 16.9 16.3
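As a rough illustration, the 30 values above can be tallied into a frequency distribution with a few lines of Python. The class limits used here (six classes of width 0.3 starting at 15.2) and the helper name `tally` are assumptions made for this sketch, not something the slide prescribes.

```python
# Sketch: tally the 30 observations above into classes of width 0.3.
# Class limits (15.2-15.4, 15.5-15.7, ...) are assumed for illustration.
data = [
    16.2, 15.8, 15.8, 15.8, 16.3, 15.6,
    15.7, 16.0, 16.2, 16.2, 16.8, 16.0,
    16.4, 15.2, 15.9, 15.9, 15.9, 16.8,
    15.4, 15.7, 15.9, 16.0, 16.3, 16.0,
    16.4, 16.6, 15.6, 15.6, 16.9, 16.3,
]

START_TENTHS = 152   # lowest class limit, 15.2, expressed in tenths
WIDTH_TENTHS = 3     # each class spans three tenths, e.g. 15.2, 15.3, 15.4
N_CLASSES = 6


def tally(values):
    """Count values per class, using integer tenths to avoid float error."""
    counts = [0] * N_CLASSES
    for v in values:
        index = (round(v * 10) - START_TENTHS) // WIDTH_TENTHS
        counts[index] += 1
    return counts


counts = tally(data)
for i, count in enumerate(counts):
    lower = (START_TENTHS + WIDTH_TENTHS * i) / 10
    upper = (START_TENTHS + WIDTH_TENTHS * i + 2) / 10
    print(f"{lower:.1f}-{upper:.1f}  {count}")
```

Working in integer tenths is a design choice: comparing decimals such as 15.7 directly against class limits can misplace values that sit exactly on a limit.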
Classification and Tabulation
1. Class interval : Class intervals are numerical groups which include all
values between the minimum and maximum in a group.
Eg. In the following example class intervals are 20-25, 25-30, 30-35.
Class Interval    Frequency
20-25             4
25-30             6
30-35             9
35-40             7
40-45             4
Total             30
Classification and Tabulation
1. Class interval : Class intervals are numerical groups which include all
values between the minimum and maximum in a group.
Eg. Suppose the following are the IQ scores of 10 students in ascending order.
Classification and Tabulation
4. Exclusive method of writing class interval : In this method the lower
class limit is included but upper class limit is not included in the class
interval. Eg a boy whose age is 10 is included in the class interval 10-20
but not included in the class interval 0-10.
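The exclusive method can be sketched as a half-open interval test, lower <= value < upper. The function name `class_interval` and the assumption that classes start at 0 with a width of 10 (matching the 0-10 and 10-20 classes in the example) are illustrative only.

```python
# Sketch: exclusive class intervals are half-open, lower <= value < upper.
# Assumes classes start at 0 and have equal width (10 in the slide's example).
def class_interval(age, width=10):
    """Return the (lower, upper) exclusive class interval containing age."""
    lower = (age // width) * width
    return lower, lower + width


print(class_interval(10))  # a boy aged 10 falls in 10-20
print(class_interval(9))   # a boy aged 9 falls in 0-10
```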
Organising Data
• Decide on the number of classes for dividing the data
Organising Data
Data array of average inventory from 20 retail outlets :
2.0 3.8 4.3 4.7 5.5
3.4 4.0 4.2 4.8 5.5
• Frequency Class
• Class limits
Class        Frequency
15.5-15.7    5
15.8-16.0    11
16.1-16.3    6
16.4-16.6    3
16.7-16.9    3
Problem
Lower Limit    Upper Limit    Frequency    Relative Frequency
1              6              4            0.20
7              12             8            0.40
13             18             4            0.20
19             24             4            0.20
Total                         20
b. If the company wants to ensure that half of its deliveries are made in less than 10 days, can you determine from the table whether they have reached the goal ?
We can conclude only that between 20% and 60% of orders are completed in 10 days or less: 20% are completed in 6 days or less and 60% in 12 days or less. The distribution does not give enough information to reach a conclusion.
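The relative frequencies in the table, and the cumulative shares behind the answer to part b, can be reproduced with a short sketch like the one below. The variable names are illustrative; only the class limits and frequencies come from the table.

```python
# Sketch: relative and cumulative relative frequencies for the delivery table.
classes = [(1, 6, 4), (7, 12, 8), (13, 18, 4), (19, 24, 4)]  # (lower, upper, f)

total = sum(f for _, _, f in classes)

rows = []
running = 0
for lower, upper, f in classes:
    running += f
    rows.append((lower, upper, f, f / total, running / total))

for lower, upper, f, rel, cum in rows:
    print(f"{lower:>2}-{upper:<2}  f={f}  relative={rel:.2f}  cumulative={cum:.2f}")
```

The 10-day target falls inside the 7-12 class, which is why the grouped table can only bracket the answer between the cumulative shares at 6 days and at 12 days.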
Problem
ABC Store recorded the number of service tickets
submitted by each of its 20 stores last month as
follows :
823 648 321 634 752
669 427 555 904 586
722 360 468 847 641
217 588 349 308 766
Graphical representation : Bar Graph
Simple Bar diagram
• Following table gives the population of different states in the year
2001. Represent the data by a suitable diagram.
Multiple Bar diagram
• These diagrams are used when there are 2 or more variables.
Eg : The table below gives the export and import figures of company ABC Ltd during
the years 2004-2007. Represent the data by a suitable diagram.
Percentage Bar Diagram
If the components of the variable are expressed in percentages to the
total then the corresponding bar diagram is percentage bar diagram.
Eg Following table gives the information of persons working in a
workshop during 2001,2002 and 2003.
Pie Chart
In this diagram a circle is divided into sectors with areas proportional to the values of the components of a variable. The total angle at the center of the circle is 360 degrees. The value of each component is represented by the angle of its sector, calculated as :
Angle of any sector = (Component value / Total value) * 360 degrees
Eg. The following table gives the allocation of funds under different heads. Draw a pie diagram.
Heads               Fund allocation    Angle in degrees
Agriculture         130                65
Irrigation          145                72.5
Small industries    150                75
Transport           110                55
Social Service      125                62.5
Inventories         60                 30
Total               720                360
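The angles in the table follow directly from the sector formula. A minimal sketch, using the fund allocations from the table (the dictionary and variable names are illustrative):

```python
# Sketch: sector angle = component value / total value * 360 degrees.
funds = {
    "Agriculture": 130,
    "Irrigation": 145,
    "Small industries": 150,
    "Transport": 110,
    "Social Service": 125,
    "Inventories": 60,
}

total = sum(funds.values())

# Multiply before dividing so the results stay exact for these values.
angles = {head: value * 360 / total for head, value in funds.items()}

for head, angle in angles.items():
    print(f"{head:<17} {funds[head]:>4} {angle:>6.1f}")
print(f"{'Total':<17} {total:>4} {sum(angles.values()):>6.1f}")
```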
Graphs of Time Series
• When data is arranged according to the order of occurrence the resulting data is
time series. Eg inflation data each month, share prices for the day/month/year.
• Time is taken on X axis and the other variable is taken on Y axis
• Eg The data relating to the production of washing machines in a factory during
2001 – 2005 is given in the table. Plot the graph.
Histogram
• Histogram is a simple method of showing the frequency distribution.
• The class intervals are taken on X axis and the frequencies on Y axis.
• Draw histogram for the following data :
Class intervals    Frequency
20-40              15
40-60              28
60-80              42
80-100             30
100-120            16
Frequency Polygon
• To make the frequency polygon the class marks are taken on the X
axis and frequencies along Y axis.
• Eg
Classes Frequency
40-50 9
50-60 5
60-70 18
70-80 20
80-90 14
90-100 4
Frequency Curve
• The method of plotting points for a Frequency Curve is similar to that for a Frequency Polygon, but the points are joined by a smooth curve rather than straight lines.
Age No of employees
20-25 12
25-30 18
30-35 26
35-40 30
40-45 24
45-50 11
Measures of Central Tendency
Central tendency : is the middle point of a distribution. Measures of central
tendency are also called measures of location.
Dispersion : Dispersion is the spread of the data in a distribution that is the extent
to which data is scattered.
Measures of Central Tendency and Variation
Data Description
Central Tendency : Mean, Median, Mode, Geometric Mean
Variation : Range, Std Deviation, Interquartile range, Variance, Coefficient of Variation
Arithmetic Mean : Ungrouped data
To find the arithmetic mean we sum the values and divide by the number of observations.
Example in Excel.
Following is the time recorded for track team members in 1-Mile race
Member 1 2 3 4 5 6 7
Time in Minutes 4.2 4.3 4.7 4.8 5.0 5.1 5.0
Advantages :
1. The concept of the mean is familiar to all
2. Every data set has a mean
3. It is useful for further statistical analysis
Disadvantages :
1. The mean is affected by extreme values that are not representative of the data set.
2. When the data set is big, the mean is tedious to calculate.
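As a quick check alongside the Excel example, the mean of the seven 1-mile times in the table can be computed as the sum of the values divided by their count. This is only a sketch; the variable names are illustrative.

```python
# Sketch: arithmetic mean of ungrouped data = sum of values / number of values.
times = [4.2, 4.3, 4.7, 4.8, 5.0, 5.1, 5.0]  # minutes, from the table above

mean_time = sum(times) / len(times)
print(f"Mean time: {mean_time:.2f} minutes")
```

Python's standard library offers the same calculation as `statistics.mean(times)`.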
Arithmetic Mean : Grouped data
A frequency distribution consists of data that are grouped by classes.
Each value of an observation falls somewhere in these classes.
Class        Frequency
15.2-15.4    2
15.5-15.7    5
15.8-16.0    11
16.1-16.3    6
16.4-16.6    3
16.7-16.9    3
$\bar{x} = \frac{\sum f x}{N}$
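A minimal sketch of the grouped-data mean for the table above, assuming (as is usual for grouped data) that x is taken as the midpoint of each class. The variable names are illustrative.

```python
# Sketch: grouped mean = sum(f * x) / N, with x taken as the class midpoint.
classes = [
    (15.2, 15.4, 2),
    (15.5, 15.7, 5),
    (15.8, 16.0, 11),
    (16.1, 16.3, 6),
    (16.4, 16.6, 3),
    (16.7, 16.9, 3),
]  # (lower limit, upper limit, frequency)

n_total = sum(f for _, _, f in classes)
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = sum_fx / n_total
print(f"N = {n_total}, sum of f*x = {sum_fx:.1f}, mean = {grouped_mean:.2f}")
```

The result is an approximation of the true mean, since every observation in a class is treated as if it sat at the class midpoint.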
Arithmetic Mean : Grouped data
For grouped data, the midpoint of each class interval is taken as the value of x.
Eg. Calculate the mean of the following distribution.
Mean = (270+650+770+1530)/50
= 3220 / 50 = 64.4
Coding
Instead of using the actual midpoints in our calculations, we assign consecutive integers (codes) to the midpoints.
Zero is assigned to the midpoint of the middle class of the frequency distribution.
Eg. with 9 frequency classes, the code zero goes to the midpoint of the 5th class.
Coding : Example
x0 = value of the midpoint assigned the code zero
$\bar{x} = 19.5 + 8 \times \frac{5}{20} = 21.5$ (average annual snowfall)
Weighted Average
The arithmetic mean of a combined group can be calculated as
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
The mean weight of 40 boys is 60 kgs and that of 35 girls is 54 kgs. Find their combined mean.
$n_1 = 40$, $\bar{x}_1 = 60$ kgs, $n_2 = 35$, $\bar{x}_2 = 54$ kgs
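The combined mean for this example can be worked out with a short sketch that applies the formula above. The function name `combined_mean` is illustrative.

```python
# Sketch: combined mean of two groups = (n1*x1 + n2*x2) / (n1 + n2).
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups pooled together, weighted by group size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


result = combined_mean(40, 60, 35, 54)  # 40 boys at 60 kgs, 35 girls at 54 kgs
print(f"Combined mean weight: {result:.1f} kgs")
```

The result lies closer to 60 than to 54 because the boys form the larger group.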
Weighted averages
Eg. Marks of 2 candidates A and B in written test, group discussion and interview. If
the weights of the same are 3,2,1 then determine who is the better candidate.
Example : The mean wage of 100 laborers working in a factory running 2 shifts of 60 and 40 workers is Rs. 38. The mean wage of the 60 laborers working in the morning shift is Rs. 40. Find the mean wage of the 40 laborers working in the evening shift.
Ans : Rs. 35
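The answer to the shift example can be verified by rearranging the combined-mean relationship: the total wage bill of all workers minus that of the morning shift, divided by the number of evening workers. A sketch with illustrative variable names:

```python
# Sketch: solve the combined-mean formula for the unknown group mean.
n_all, mean_all = 100, 38          # all laborers
n_morning, mean_morning = 60, 40   # morning shift
n_evening = n_all - n_morning      # evening shift, 40 laborers

total_wages = n_all * mean_all
morning_wages = n_morning * mean_morning

mean_evening = (total_wages - morning_wages) / n_evening
print(f"Mean evening-shift wage: Rs. {mean_evening:.0f}")
```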
Median
• Median is the single value in a data set that measures the central item in the
data.
Disadvantages (of the mode) : 1. Often a data set has no modal value because every value occurs the same number of times, so every value is a mode.
Geometric Mean
• When we are dealing with quantities that change over a period of time we need to know an average rate of
change eg average growth rate over a period of several years.
Geometric Mean
$GM = \sqrt[n]{\text{product of all } x \text{ values}}$
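A small sketch of the geometric mean as the n-th root of the product of the values. The growth factors below are hypothetical numbers chosen only for illustration; they do not come from the slides.

```python
import math


# Sketch: geometric mean = n-th root of the product of all x values.
def geometric_mean(values):
    """Return the geometric mean of a list of positive numbers."""
    return math.prod(values) ** (1 / len(values))


# Hypothetical yearly growth factors (1.07 means 7% growth in that year).
growth_factors = [1.07, 1.08, 1.10, 1.12, 1.18]

gm = geometric_mean(growth_factors)
print(f"Average growth factor: {gm:.4f}")
print(f"Average growth rate:   {(gm - 1) * 100:.2f}% per year")
```

For growth rates the geometric mean is preferred because compounding the average factor over all years reproduces the overall growth, which the arithmetic mean of the factors generally does not. Python 3.8 and later also provide `statistics.geometric_mean`.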
Dispersion
• Dispersion in Statistics is a way of describing the data spread.
• Advantages are :
• It gives us additional information to judge
• We must be able to tackle data which are widely spread
• We may wish to compare dispersion of various samples
Range
• Difference between highest and lowest value
Interquartile Range :
• Divide our data in four parts each of which contains 25% of the
distribution.
• Interquartile range = Q3 – Q1
• Quartiles divide the data into four parts and percentiles divide the data into 100 parts.
Measures of Dispersion
Eg. Calculate the quartile deviation from the following figures of daily expenses (Rs.)
• Q2 = 5
• Position of Q3 = 3 (N + 1) /4 = 3 * 10/4 = 7.5th item
• Q3 = ( 7 + 11 ) /2 = 9
Measures of Dispersion
Population Variance and Standard Deviation
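Every population has a variance, which is given by
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
$\sigma^2$ – population variance
X – observation
µ – population mean
N – total number of observations
The population standard deviation $\sigma$ is the square root of the population variance.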
Measures of Dispersion
Sample variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
s – sample standard deviation
x – individual values
$\bar{x}$ – sample mean
n - 1 – number of observations in the sample minus 1
Find the variance and standard deviation for the following data : 35,45,30,35,40,25
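The exercise above can be worked through step by step as in the sketch below, which treats the six values as a sample and therefore divides by n - 1. The variable names are illustrative, and the standard library call is included only as a cross-check.

```python
import math
import statistics

# Sketch: sample variance = sum((x - mean)^2) / (n - 1); s = sqrt(variance).
data = [35, 45, 30, 35, 40, 25]

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean}, variance = {sample_variance}, sd = {sample_sd:.2f}")

# Cross-check against the standard library's sample variance.
library_variance = statistics.variance(data)
```

Dividing the same sum of squared deviations by n instead of n - 1 would give the population variance, which `statistics.pvariance` computes.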
Problem
17,21,18,27,17,21,20,22,18,23
Problem – Standard Deviation
• Suppose we want to understand how volatile the stocks of 2 companies are, that is, how risky it is to invest in each company. Following are the year by year returns.
Coefficient of Variation
CV is useful when you want to compare the results of 2 different surveys or tests that have different measures or values. If sample A has a CV of 12% and sample B a CV of 25%, sample B has more variability.
ABC Ltd is considering employing one of two training programs. Group 1 was trained by Program A and group 2 by Program B. For the first group the average time required is 32.11 hours with a variance of 68.09. For the second group the average is 19.75 hours with a variance of 71.14. Which training program has less variability ?
(Ans : CV is 25.7% for Program A and 42.7% for Program B, so Program A has less relative variability.)
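The two percentages in the answer can be reproduced with the usual definition of the coefficient of variation, CV = (standard deviation / mean) * 100, where the standard deviation is the square root of the given variance. The function name below is illustrative.

```python
import math


# Sketch: CV = standard deviation / mean * 100, with sd = sqrt(variance).
def coefficient_of_variation(mean, variance):
    """Return the coefficient of variation as a percentage."""
    return math.sqrt(variance) / mean * 100


cv_a = coefficient_of_variation(32.11, 68.09)  # Program A
cv_b = coefficient_of_variation(19.75, 71.14)  # Program B

print(f"Program A: CV = {cv_a:.1f}%")
print(f"Program B: CV = {cv_b:.1f}%")
```

Although the two variances are nearly equal, Program B has the much smaller mean, so its spread is larger relative to its average.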