1 - Business Statistics


Gen 02 Business Statistics Sem I – 40 Teaching Hours

S. No  Lecture Coverage

1  Data processing - Classification – Summarization – Tabulation of data – presentation (tabular and graphic) – Frequency distribution. Measures of central tendency - Arithmetic mean, Weighted mean, Geometric mean, Median, Mode, and Partition values – Quartiles, Deciles and Percentiles. Measures of dispersion - Range, Quartile deviation, Standard deviation, Variance and Coefficient of variation.

2  Conditions of statistical dependence and independence, Bayes' theorem and its applications. Probability distributions – Random variable, Expected value, Binomial, Poisson, Normal and Exponential distributions.

3  Types, Probability sampling, Non-probability sampling. Estimation - Point and Interval estimates – Confidence intervals – determining sample size.

4  Business forecasting - Time series analysis, components – Methods – Straight line method – Semi averages method – Least square method – Moving averages. Correlation and Regression - Correlation – interpretation and applications. Regression – meaning and uses – normal equations – model building. Correlation versus Regression.

5  Testing of Hypothesis - Concepts – types of errors – null and alternate hypothesis – level of significance. Testing of means and proportions for small and large samples - one-sample test. Testing of difference between means and proportions for small and large samples. χ² test of goodness of fit – Test of independence – ANOVA – one-way and two-way classifications.

6  Discussion on industry use cases of Business Statistics and hands-on exercise through Excel.



Text / References:
1. Levin, I. R.& Rubin, D. S (2011). Statistics for management. New Delhi: Prentice
Hall India Publications.

1
Business Statistics

• Business statistics is the science of good decision making in the face of uncertainty.
• It is used in many disciplines, such as financial analysis, econometrics, auditing, production and operations (including services improvement) and marketing research.
• This course introduces statistics and shows how different statistical methods can be applied to decision making.
2
Simple Puzzles

3
Origin
• The word statistics originated from the Italian word “statista” (meaning statesman). It was first used by Gottfried Achenwall (1719-1772), a German statistician and philosopher.

• People had been recording and using data long before the eighteenth century.

• Governments of ancient Babylonia, Egypt and Rome gathered detailed records of populations and resources.

4
Statistics is used in all fields and is the science of conducting studies to collect, organise, summarize, analyze and draw conclusions from the data.

5
Statistics in Management Domain
• It is used in many disciplines, such as financial analysis, econometrics, auditing, production and operations (including services improvement) and marketing research.

• Statistical tools are applied in the areas of production, marketing, finance, research and development, manpower planning etc. to support business decision making.

• In management, statistics is used to understand performance. This involves collecting data on current business processes, then analysing and interpreting it for management decisions.

6
Statistical Tools for Decision Making
• Forecasting techniques :
• Economy fluctuations
• Business predictions

• Trend estimates :
• Costs, price, sales, demand, supply, profitability

• Yearly Business plans

• Measuring variations in current and new processes

• Identifying relationships between 2 or more variables

• Helps in validating the assumptions


7
Importance & Scope of Statistics
• Statistics and Planning : Organisations and government entities depend heavily on planning exercises, which give insight into the economy, businesses and other development areas. To be successful, planning must be based on correct statistical models.

• Statistics and Economics : Statistical tools are immensely useful in solving economic problems involving wages, prices, demand analysis etc. The wide application of mathematics and statistics has led to the development of econometrics.

• Statistics and Business : The success of a business depends on the use of statistical models for prediction, such as correlation and regression. Any new startup will depend on a market study, business pointers for growth and the forecasted demand and supply for the product.

8
Importance & Scope of Statistics
• Statistics and Industry : Statistics is widely used for production planning, quality control and inventory management. Inspection plans, control charts and sampling are used regularly.

• Statistics and Biology, Astronomy and Medical Science : Statistical methods are closely associated with studies in biology. Clinical trials that analyse drug effectiveness make use of statistical tools.

• Statistics and Finance : Statistical data, analysis and interpretation are widely used by analysts for stock predictions and investment opportunities, and to guide consumers in managing their finances.

9
Statistics
There are 2 subdivisions in Statistics

1. Descriptive Statistics – It involves organizing and presenting data. Graphs, tables and charts are used to display the results.

2. Inferential Statistics – Making inferences from our data and predicting the behavior of the data. Hypothesis testing and probability are some of the tools used.

Statistics
Descriptive : Presenting, organising, summarising data
Inferential : Drawing conclusions about the populations from the samples drawn

10
CHAPTER 1

1. Data processing - Classification – Summarization – Tabulation of data – presentation (tabular and graphic) – Frequency distribution.

2. Measures of central tendency - Arithmetic mean, Weighted mean, Geometric mean, Median, Mode, and Partition values – Quartiles, Deciles and Percentiles.

3. Measures of dispersion - Range, Quartile deviation, Standard deviation, Variance and Coefficient of variation.

11
Objectives

• Understand Data and information

• Data organisation

• Frequency distribution

• Analyse Frequency Distribution

12
1. Data processing- Classification – Summarization –

Tabulation of data –presentation (tabular and graphic)

13
Data Processing

• What is meant by data ? – Data are raw, unorganized facts that need to be processed.
• Eg Student admission data
• Census data
• Survey data

14
Data Processing
• Sources of Data – Primary and Secondary
• When the investigator collects first hand data for a task it is Primary data.
• If the data is obtained from published or unpublished sources such data is
secondary data.

Aspect        Primary data                                   Secondary data
Method        First-hand data collected by the researcher    Data collected by someone else
Data          Real-time data                                 Past data
Cost          Expensive                                      Economical
Specificity   Specific to the researcher's needs             May not be specific
Accuracy      High                                           Low

15
Classification
The collected data is classified on the following four aspects :

1. Geographical
2. Chronological
3. Qualitative
4. Quantitative

Geographical classification : The data is arranged geographically ie


according to nations, states, districts etc.
Classification of data for the production of food grains statewise.

16
17
Classification
2. Chronological Classification :
If the data is arranged according to time ie years, months or weeks
then the classification is called chronological classification.
Eg. Sales figures of a company by year (profits can be classified similarly):
Year Sales figures (Millions)
2015 200
2016 230
2017 250
2018 (estimated) 300

18
Classification
3. Qualitative classification :
There are certain characteristics which cannot be measured, like intelligence, smell, taste, honesty etc. These are called qualitative characteristics. There can be simple and manifold classification.
Simple classification : Workers in a factory can be classified as skilled or unskilled.

Workers
Skilled   Unskilled

If more than one characteristic is considered at a time, it is called manifold classification.

Grades   Workers
         Skilled          Unskilled
         Male   Female    Male   Female
A
B

19
Classification

4. Quantitative Classification : Classification by characteristics which can be measured in well-defined units is called quantitative classification.
Eg. Height, weight, age, income, production etc.
Eg. Classification of students according to their age groups :

Age in Years No. of students


10-15 20
15-20 35
20-25 25
25-30 30

20
21
Data Processing

• Data processing is the collection and compilation of data to produce meaningful information.

• Data is gathered from various sources and entered into a computer, where it is processed to create information.

22
23
Data Processing
Steps for Data processing cycle :
Collecting Data :
1. This first stage is crucial, since the quality of the data will impact the output. The data should represent all the groups.
2. Data can be collected from actual observations or from maintained records, files or libraries.
3. Data should be organised in simple, compact, usable forms.
4. Data should be accurate. “GIGO” (Garbage In, Garbage Out) should be avoided.
5. The main question is “Is the data worth using ?”

24
Population and Sample
• Data is gathered from a Sample. This is used to make inferences
about the population.
• Population includes all the elements of a data set.
• Sample consists of one or more observations from the population.

25
Example
• 25% of the cars sold in US in 1996 were manufactured in Japan. Is this conclusion
drawn from sample or population ?

• A manufacturer needs to decide whether a batch of material from production is


of high enough quality to be released to the customer, or should be sentenced for
scrap or rework due to poor quality. In this case, what is the batch ?

• An FMCG company with a global customer base of 20,000 would like to understand its customer satisfaction rate. Is it advisable to study a sample or the whole population ?

26
Organising Data

• Information before it is arranged and analyzed is called raw


data.

• Data array is one way in which data can be presented.

• It arranges the data in ascending or descending order.

27
Organising Data

Better way of organising data is to use Frequency table or Frequency


Distribution.

Data array of average inventory from 20 retail


outlets
2.0 3.8 4.3 4.7 5.5
3.4 4.0 4.2 4.8 5.5
3.4 4.1 4.3 4.9 5.5
3.8 4.1 4.7 4.9 5.5

A frequency distribution is a table that organizes data into classes and shows the number of observations from the data that fall into each class.

28
Organising Data
16.2 15.8 15.8 15.8 16.3 15.6
15.7 16.0 16.2 16.2 16.8 16.0
16.4 15.2 15.9 15.9 15.9 16.8
15.4 15.7 15.9 16.0 16.3 16.0
16.4 16.6 15.6 15.6 16.9 16.3

15.2 15.7 15.9 16 16.2 16.4


15.4 15.7 15.9 16 16.3 16.6
15.6 15.8 15.9 16 16.3 16.8
15.6 15.8 15.9 16.1 16.3 16.8
15.6 15.8 16.0 16.2 16.4 16.9

Once the data is arranged in an array:
We can notice the highest and lowest values in the data.
We can divide the data into sections.
We can see whether any value occurs more than once.

29
Classification and Tabulation
1. Class interval : Class intervals are numerical groups which include all
values between the minimum and maximum in a group.
Eg. In the following example class intervals are 20-25, 25-30, 30-35.
Class interval   Frequency
20-25            4
25-30            6
30-35            9
35-40            7
40-45            4
Total            30

30
Classification and Tabulation
1. Class interval : Class intervals are numerical groups which include all values between the minimum and maximum in a group.
Eg. Suppose the following are the IQ scores of 11 students in ascending order:

70, 72, 75, 78, 80, 90, 91, 92, 95, 95, 96

Seeing the maximum and minimum scores, we can generate class intervals covering the range from 70 to 100:

70 - 79
80 - 89
90 - 99

Then insert the data values in the respective class intervals.
31
Classification and Tabulation
2. Class limits : The lowest value and highest values in the class
intervals are called as class limits. Eg in the class interval 10-20 , 10 is
the lower class limit and 20 is the upper class limit.

3. Class width : It is defined as the difference between the upper and lower class limits of the class interval. Eg the class width of class interval 10-50 is 50 - 10 = 40.

32
Classification and Tabulation
4. Exclusive method of writing class interval : In this method the lower
class limit is included but upper class limit is not included in the class
interval. Eg a boy whose age is 10 is included in the class interval 10-20
but not included in the class interval 0-10.

5. Inclusive method of writing class interval : In this method both the


class limits are included. Class intervals written in the form 6-10, 11-
15, 16-20 and so on are considered as inclusive type of class intervals
as observations having values 6 and 10 are included in the class 6-10.

33
Organising Data
• Decide on the number of classes for dividing the data

• The number of classes is dependent on the data

• Find out the maximum and minimum value

• Width of class interval:

$$\text{Width} = \frac{\text{next unit value after largest value in data} - \text{smallest value in data}}{\text{total number of class intervals}}$$

34
35
Organising Data
Data array of average inventory from 20 retail outlets

2.0 3.8 4.3 4.7 5.5
3.4 4.0 4.2 4.8 5.5
3.4 4.1 4.3 4.9 5.5
3.8 4.1 4.7 4.9 5.5

Terms used:
• Frequency class (group of similar values)
• Class limits
• Frequency
• Relative frequency

With 6 classes, width of class interval = (5.5 – 2.0) / 6 = 0.58, rounded up to 0.6

Class        Frequency   Relative frequency
2.0 – 2.5    1           0.05
2.6 – 3.1    0           0.00
3.2 – 3.7    2           0.10
3.8 – 4.3    8           0.40
4.4 – 4.9    5           0.25
5.0 – 5.5    4           0.20
Total        20          1.00

36
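The frequency table above can be reproduced with a short script. This is only a sketch: the function name `frequency_table` is made up for illustration, and the class limits are copied from the slide and treated as inclusive.

```python
# Average inventory figures for the 20 retail outlets, as listed on the slide.
inventory = [2.0, 3.8, 4.3, 4.7, 5.5,
             3.4, 4.0, 4.2, 4.8, 5.5,
             3.4, 4.1, 4.3, 4.9, 5.5,
             3.8, 4.1, 4.7, 4.9, 5.5]

# Inclusive class limits taken from the slide's frequency table.
classes = [(2.0, 2.5), (2.6, 3.1), (3.2, 3.7),
           (3.8, 4.3), (4.4, 4.9), (5.0, 5.5)]


def frequency_table(values, class_limits):
    """Return (lower, upper, frequency, relative frequency) for each class."""
    n = len(values)
    rows = []
    for lower, upper in class_limits:
        # Count observations that fall inside this class (both limits included).
        count = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, count, count / n))
    return rows


table = frequency_table(inventory, classes)
for lower, upper, freq, rel in table:
    print(f"{lower:.1f} - {upper:.1f}  {freq:2d}  {rel:.2f}")
```

The relative frequency of a class is simply its frequency divided by the total number of observations, so the relative frequencies always add up to 1.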
Sample of daily production in yarn

16.2 15.8 15.8 15.8 16.3 15.6
15.7 16.0 16.2 16.2 16.8 16.0
16.4 15.2 15.9 15.9 15.9 16.8
15.4 15.7 15.9 16.0 16.3 16.0
16.4 16.6 15.6 15.6 16.9 16.3

With 6 classes, width of class interval = (16.9 – 15.2) / 6 ≈ 0.3

Class        Frequency
15.2-15.4    2
15.5-15.7    5
15.8-16.0    11
16.1-16.3    6
16.4-16.6    3
16.7-16.9    3

37
Problem

38
Lower Limit   Upper Limit   Frequency   Relative Frequency
1             6             4           0.20
7             12            8           0.40
13            18            4           0.20
19            24            4           0.20
Total                       20          1.00

a. What statement can you make about the effectiveness of order processing ?
Assuming a 6-day working week, we find that 80% of the orders are delivered in 3 weeks (18 working days) or less.

b. If the company wants to ensure that half of its deliveries are made in fewer than 10 days, can you determine from the table whether it has reached the goal ?
No. We can only conclude that 20% of orders are completed in 6 days or less and 60% in 12 days or less, so between 20% and 60% are completed in fewer than 10 days. The distribution does not give enough information to make a conclusion.

39
Problem
ABC Store recorded the number of service tickets
submitted by each of its 20 stores last month as
follows :
823 648 321 634 752
669 427 555 904 586
722 360 468 847 641
217 588 349 308 766

The company believes that a store cannot break even with fewer than 475 service actions a month.
Company policy is to give a bonus to any store manager who generates more than 725 actions a month.
Arrange the data in an array and find the number of stores not breaking even and how many managers get bonuses.

40
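One way to work the ABC Store problem is to sort the ticket counts into a data array and filter on the two thresholds. The sketch below assumes "fewer than 475" and "more than 725" are strict comparisons, as worded in the problem; the variable names are illustrative.

```python
# Service tickets submitted by each of the 20 stores last month.
tickets = [823, 648, 321, 634, 752,
           669, 427, 555, 904, 586,
           722, 360, 468, 847, 641,
           217, 588, 349, 308, 766]

BREAK_EVEN = 475   # a store needs at least this many actions to break even
BONUS = 725        # managers earn a bonus above this many actions

# Data array: the raw data arranged in ascending order.
data_array = sorted(tickets)

below_break_even = [t for t in data_array if t < BREAK_EVEN]
earning_bonus = [t for t in data_array if t > BONUS]

print("Data array:", data_array)
print("Stores not breaking even:", len(below_break_even), below_break_even)
print("Managers earning a bonus:", len(earning_bonus), earning_bonus)
```

Sorting first is not strictly needed for counting, but it mirrors the instruction to arrange the data and makes the two groups easy to read off at either end of the array.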
Graphs
• Graphs are visual representations of frequency distributions and time series.
• A graph is made by plotting the values of one variable against the corresponding values of another variable.
• Horizontal line is X axis and Vertical line is Y axis
• The 2 axes together divide the plane into 4 parts called quadrants
• In the first quadrant both the values are positive, in second X is
negative and Y is positive, in third quadrant both are negative and in
fourth X is positive and Y is negative.

41
Graphical representation : Bar Graph

42
Simple Bar diagram
• Following table gives the population of different states in the year
2001. Represent the data by a suitable diagram.

States Population (CR)


Bihar 82.88
U.P. 166.05
Rajasthan 56.47
Haryana 21.08
Delhi 13.78

43
Multiple Bar diagram
• These diagrams are used when there are 2 or more variables.
Eg : The table below gives the export and import figures of company ABC Ltd during
the years 2004-2007. Represent the data by a suitable diagram.

Year Export(CR) Import(CR)


2004 210 198
2005 232 205
2006 240 212
2007 255 220

44
Percentage Bar Diagram
If the components of the variable are expressed in percentages to the
total then the corresponding bar diagram is percentage bar diagram.
Eg Following table gives the information of persons working in a
workshop during 2001,2002 and 2003.

            2001            2002             2003
            No.    %        No.    %         No.    %
Men         120    60%      105    52.50%    140    56%
Women       60     30%      70     35%       80     32%
Children    20     10%      25     12.50%    30     12%
Total       200    100%     200    100%      250    100%

45
Pie Chart
In this diagram a circle is divided into sectors with areas proportional to the values of the components of a variable. The total angle at the center of the circle is 360 degrees. The value of each component is represented by the angle of its sector, calculated as :
Angle of any sector = (Component value / Total value) × 360 degrees
Eg. The following table gives the allocation of funds under different heads. Draw a pie diagram.

Heads              Fund allocation   Angle in degrees
Agriculture        130               65
Irrigation         145               72.5
Small industries   150               75
Transport          110               55
Social Service     125               62.5
Inventories        60                30
Total              720               360
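The sector angles in the table can be checked with a few lines of code. This is a minimal sketch of the formula above; the dictionary name `funds` is only illustrative.

```python
# Fund allocation under each head, as given in the table.
funds = {
    "Agriculture": 130,
    "Irrigation": 145,
    "Small industries": 150,
    "Transport": 110,
    "Social Service": 125,
    "Inventories": 60,
}

total = sum(funds.values())

# Angle of a sector = component value / total value * 360 degrees.
angles = {head: value * 360 / total for head, value in funds.items()}

for head, angle in angles.items():
    print(f"{head:<18}{funds[head]:>5}{angle:>8.1f}")
print(f"{'Total':<18}{total:>5}{sum(angles.values()):>8.1f}")
```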

46
Graphs of Time Series
• When data is arranged according to the order of occurrence the resulting data is
time series. Eg inflation data each month, share prices for the day/month/year.
• Time is taken on X axis and the other variable is taken on Y axis
• Eg The data relating to the production of washing machines in a factory during
2001 – 2005 is given in the table. Plot the graph.

Year Production (in ‘00)


2001 200
2002 250
2003 235
2004 250
2005 270

47
Histogram
• Histogram is a simple method of showing the frequency distribution.
• The class intervals are taken on X axis and the frequencies on Y axis.
• Draw histogram for the following data :
Class intervals   Frequency
20-40 15
40-60 28
60-80 42
80-100 30
100-120 16

48
Frequency Polygon
• To make the frequency polygon the class marks are taken on the X
axis and frequencies along Y axis.
• Eg
Classes Frequency
40-50 9
50-60 5
60-70 18
70-80 20
80-90 14
90-100 4

49
Frequency Curve
• The method of plotting points on Frequency Curve is similar to
Frequency Polygon. These points are joined by smooth curve and not
line.
Age No of employees
20-25 12
25-30 18
30-35 26
35-40 30
40-45 24
45-50 11

50
Measures of Central Tendency
Central tendency : Central tendency is the middle point of a distribution. Measures of central tendency are also called measures of location.

Dispersion : Dispersion is the spread of the data in a distribution that is the extent
to which data is scattered.

Skewness : Graphs representing the data points may be symmetrical or skewed.

51
Measures of Central Tendency and Variation

Data Description

Central Tendency (Location) : Mean, Median, Mode, Geometric Mean

Variation : Range, Standard Deviation, Interquartile Range, Variance, Coefficient of Variation

52
Arithmetic Mean : Ungrouped data
To find the arithmetic mean we sum the values and divide by the number of observations.
Example in Excel.

Following is the time recorded for track team members in 1-Mile race
Member 1 2 3 4 5 6 7
Time in Minutes 4.2 4.3 4.7 4.8 5.0 5.1 5.0

Mean = 33.1 / 7 ≈ 4.73

Advantages : 1. The concept of the mean is familiar to all
2. Every data set has a mean
3. It is useful for further statistical analysis
Disadvantages : 1. The mean is affected by extreme values that are not representative of the data set.
2. When the data set is big it is difficult to calculate

53
Arithmetic Mean : Grouped data
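A frequency distribution consists of data that are grouped by classes.
Each value of an observation falls somewhere in these classes.

$$\bar{x} = \frac{\sum f x}{n}$$

where x is the midpoint of each class, f is the number of observations in the class and n is the total number of observations.

Class        Frequency
15.2-15.4    2
15.5-15.7    5
15.8-16.0    11
16.1-16.3    6
16.4-16.6    3
16.7-16.9    3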
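As a rough illustration of the grouped-data mean, the sketch below takes the midpoint of each class in the table above as the representative value x and weights it by the class frequency f. The function name `grouped_mean` is an assumption made for this example.

```python
# (lower limit, upper limit) and frequency for each class in the table.
yarn_classes = [
    ((15.2, 15.4), 2),
    ((15.5, 15.7), 5),
    ((15.8, 16.0), 11),
    ((16.1, 16.3), 6),
    ((16.4, 16.6), 3),
    ((16.7, 16.9), 3),
]


def grouped_mean(table):
    """Mean of grouped data: sum of (class midpoint * frequency) / total frequency."""
    n = sum(freq for _, freq in table)
    weighted_total = sum(((lower + upper) / 2) * freq
                         for (lower, upper), freq in table)
    return weighted_total / n


mean_production = grouped_mean(yarn_classes)
print(f"Estimated mean daily production: {mean_production:.2f}")
```

Because every observation in a class is replaced by the class midpoint, the result is an estimate of the mean of the raw data rather than its exact value.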

54
Arithmetic Mean : Grouped data
For grouped data, the midpoint of each class interval is taken to represent the values in that class.
Eg Calculate mean of the following distribution.

Mean = (270+650+770+1530)/50
= 3220 / 50 = 64.4
55
Coding
Instead of using the actual mid-points in our calculations, we assign codes (consecutive integers) to each of the midpoints.
Zero is assigned to the mid-point of the middle class of the frequency distribution.
Eg with 9 frequency classes, the middle class is the 5th class.

56
Coding

57
Coding : Example
$$\bar{x} = x_0 + w \, \frac{\sum (u \times f)}{n}$$

x0 = value of the midpoint assigned the code zero

w = width of class interval

u = code assigned to each class

f = number of observations in each class

n = total number of observations

x̄ = 19.5 + 8 × (5/20)
= 21.5 (average annual snowfall)

58
Weighted Average
The arithmetic mean of a combined group can be calculated as

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

The mean weight of 40 boys is 60 kgs and that of 35 girls is 54 kgs. Find their combined mean.
n1 = 40, x̄1 = 60 kgs, n2 = 35, x̄2 = 54 kgs

x̄ = (40 × 60 + 35 × 54) / (40 + 35) = 57.2 kgs

59
Weighted averages
Eg. Marks of 2 candidates A and B in written test, group discussion and interview. If
the weights of the same are 3,2,1 then determine who is the better candidate.

Weighted average of A = ∑wA / ∑w = 465 / 6 = 77.5


Weighted average of B = ∑wB / ∑w = 468 / 6 = 78
Thus candidate B is better.
60
Weighted Average
The arithmetic mean of a combined group can be calculated as

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

Example : The mean wage of 100 laborers working in a factory running 2 shifts of
60 and 40 workers is Rs.38. The mean wage of 60 laborers working in the morning
shift is Rs. 40. Find the mean of 40 laborers working in the evening shift.
Ans : 35
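Both combined-mean examples can be sketched in code: one applies the formula directly, the other rearranges it to recover the mean of the second group. The function names here are made up for illustration.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups pooled together, weighted by group size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def missing_group_mean(n_total, mean_total, n1, mean1):
    """Rearranged formula: mean of the second group, given the combined
    mean and the size and mean of the first group."""
    n2 = n_total - n1
    return (n_total * mean_total - n1 * mean1) / n2


# 40 boys with mean weight 60 kg and 35 girls with mean weight 54 kg.
print(combined_mean(40, 60, 35, 54))

# 100 laborers with mean wage Rs.38; 60 morning-shift laborers average Rs.40.
print(missing_group_mean(100, 38, 60, 40))
```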

61
Median
• Median is the single value that measures the central item in a data set: half of the items lie above it and half below.

Example : Number of patients treated in ICU on 8 consecutive days :

86 52 49 43 35 31 30 11
Arranging in ascending order : 11, 30, 31, 35, 43, 49, 52, 86
With an even number of items, the median is the average of the two middle items : Median = (35 + 43)/2 = 39

Advantages : 1. Median is not affected by the extreme values
2. It is easy to understand
3. We can find median for qualitative data also
Disadvantages : 1. Arranging the data array may be time consuming
2. It is complex to use for further statistical analysis

62
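A small sketch using Python's standard `statistics` module shows the median for the ICU example and illustrates the first advantage listed above. The changed value 860 is a made-up extreme figure, not part of the original data.

```python
import statistics

# Patients treated in ICU on 8 consecutive days.
patients = [86, 52, 49, 43, 35, 31, 30, 11]

# statistics.median sorts the data and, for an even count,
# averages the two middle items.
median_patients = statistics.median(patients)
mean_patients = statistics.mean(patients)

# Hypothetical extreme value: replace the largest figure, 86, with 860.
with_outlier = [860 if p == 86 else p for p in patients]
median_outlier = statistics.median(with_outlier)
mean_outlier = statistics.mean(with_outlier)

print("Median:", median_patients, "-> with extreme value:", median_outlier)
print("Mean:  ", mean_patients, "-> with extreme value:", mean_outlier)
```

The median stays the same when the largest value is inflated, while the mean shifts noticeably.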
Mode
Mode : The mode is the value which is repeated most often in the data set.
Example : Delivery trips / day in a 20-day period

0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 12, 15, 15, 15, 19

The mode is 15, which occurs three times.

Advantages : 1. Can be used for quantitative and qualitative data


2. Mode is not affected by extreme values

Disadvantages : 1. Many data sets have no single modal value; when every value occurs equally often, every value is a mode.
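The mode of the delivery-trip data, and the disadvantage noted above, can be illustrated with the standard `statistics` module. The second list is a made-up data set in which no value repeats.

```python
import statistics

# Delivery trips per day over a 20-day period.
trips = [0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 12, 15, 15, 15, 19]

# The single most frequent value.
trip_mode = statistics.mode(trips)

# multimode returns every value tied for the highest frequency.
no_repeats = [3, 8, 11, 20]          # hypothetical data, each value occurs once
all_modes = statistics.multimode(no_repeats)

print("Mode of delivery trips:", trip_mode)
print("Modes when nothing repeats:", all_modes)
```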

63
Geometric Mean
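• When we are dealing with quantities that change over a period of time, we need to know an average rate of change, eg the average growth rate over a period of several years.

• Suppose we deposit Rs.100 and it earns interest of 7%, 8%, 10%, 12% and 18% in five successive years.

• Growth factor = 1 + (interest rate)/100, eg 1.10 for an interest rate of 10%

• Arithmetic mean of the growth factors : (1.07 + 1.08 + 1.10 + 1.12 + 1.18)/5 = 1.11

• If we multiply 100 × 1.11 × 1.11 × 1.11 × 1.11 × 1.11 we get 168.51, whereas the actual balance is 100 × 1.07 × 1.08 × 1.10 × 1.12 × 1.18 = 168.00. The arithmetic mean overstates the growth.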

64
Geometric Mean
$$GM = \sqrt[n]{\text{product of all } x \text{ values}}$$

= 5th root of (1.07 × 1.08 × 1.10 × 1.12 × 1.18)

= 1.1093 (geometric mean of the growth factors)

The average interest rate is 10.93% per year.
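The comparison between the arithmetic and geometric mean of the growth factors can be sketched as follows. The helper name `geometric_mean` is defined here for illustration; the deposit of Rs.100 is the one used in the example.

```python
import math

# Yearly growth factors for the deposit.
growth_factors = [1.07, 1.08, 1.10, 1.12, 1.18]
deposit = 100


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1 / len(values))


arithmetic = sum(growth_factors) / len(growth_factors)
geometric = geometric_mean(growth_factors)

actual_balance = deposit * math.prod(growth_factors)
balance_from_am = deposit * arithmetic ** len(growth_factors)
balance_from_gm = deposit * geometric ** len(growth_factors)

print(f"Arithmetic mean factor: {arithmetic:.4f} -> balance {balance_from_am:.2f}")
print(f"Geometric mean factor:  {geometric:.4f} -> balance {balance_from_gm:.2f}")
print(f"Actual balance after 5 years: {actual_balance:.2f}")
```

Compounding at the geometric mean reproduces the actual final balance, which is why it is the appropriate average for rates of change.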

65
Dispersion
• Dispersion in Statistics is a way of describing the data spread.
• Advantages are :
• It gives us additional information to judge
• We must be able to tackle data which are widely spread
• We may wish to compare dispersion of various samples

66
67
Dispersion

68
Range
• Difference between highest and lowest value

Interquartile Range :
• Divide our data in four parts each of which contains 25% of the
distribution.

• Interquartile range = Q3 – Q1

• Quartiles divide the data into four parts, deciles into 10 parts and percentiles into 100 parts.

69
Interquartile Range

• Quartiles divide the data set into 4 equal parts.

• Q2 is the median value

70
Measures of Dispersion
Eg Calculate the quartile deviation from the following figures of daily expenses (Rs) :

45, 52, 38, 47, 60, 56, 62 (7 observations, an odd number)

Ascending order : Rs 38, 45, 47, 52, 56, 60, 62

Q1 = value of the (N+1)/4 th observation = 8/4 = 2nd observation, ie 45

Q3 = value of the 3(N+1)/4 th observation = 24/4 = 6th observation, ie 60

Interquartile range = Q3 – Q1 = 60 – 45 = 15

Quartile deviation = (Q3 – Q1)/2 = 7.5

The QUARTILE.EXC function can be used in Excel.

71


Interquartile Range
• Example : Consider data set (odd 9 obs) – 1, 3, 4, 5, 5, 6, 7, 11,15
• Position of Q1 = (N + 1)/4 = 10/4 = 2.5th item
• Q1 = (3 + 4)/2 = 3.5

• Q2 = 5
• Position of Q3 = 3(N + 1)/4 = 3 × 10/4 = 7.5th item
• Q3 = (7 + 11)/2 = 9

• Interquartile range = Q3 – Q1 = 9 – 3.5 = 5.5

• Find the interquartile range : 1, 2, 2, 4, 4, 5, 5, 6, 7, 8, 11. (Q1 = 2, Q3 = 7)

• 1, 3, 4, 5, 5, 6, 7, 11

72
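The (N + 1)/4 positioning rule used in these examples corresponds to the "exclusive" method of `statistics.quantiles` in Python's standard library, which interpolates between neighbouring items when the position is fractional. The sketch below applies it to the data sets from the slides; the wrapper name `quartiles` is only illustrative.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the (N + 1) * p positioning rule."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, q2, q3


expenses = [45, 52, 38, 47, 60, 56, 62]          # 7 observations
nine_obs = [1, 3, 4, 5, 5, 6, 7, 11, 15]         # 9 observations
eleven_obs = [1, 2, 2, 4, 4, 5, 5, 6, 7, 8, 11]  # 11 observations

for name, data in [("expenses", expenses),
                   ("nine_obs", nine_obs),
                   ("eleven_obs", eleven_obs)]:
    q1, q2, q3 = quartiles(data)
    iqr = q3 - q1
    print(f"{name}: Q1={q1}, Q2={q2}, Q3={q3}, "
          f"IQR={iqr}, quartile deviation={iqr / 2}")
```

Other software may use a different positioning rule (for example an inclusive method), so quartiles for small data sets can differ slightly between tools.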
Decile and Percentile

• Quartiles divide the data into 4 parts

• Percentiles divide the data into 100 parts

• Decile divides the data into 10 parts

73
74
Measures of Dispersion
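Population Variance and Standard Deviation
Every population has a variance, which is given by

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

σ² – population variance
X – observation
µ – population mean
N – total number of observations

Population Standard Deviation :

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$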

75
Measures of Dispersion

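Sample variance and sample standard deviation :

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}$$

s² – sample variance
s – sample standard deviation
x – individual values
x̄ – sample mean
n – 1 – number of observations in the sample minus 1

Find the variance and standard deviation for the following data : 35, 45, 30, 35, 40, 25

Ans : 41.7 and 6.5 when the data are treated as a population (dividing by N = 6); the sample formulas (dividing by n – 1 = 5) give 50 and 7.07.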
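The difference between dividing by N and dividing by n – 1 can be seen by computing both versions for the exercise data. This is a hand-rolled sketch of the standard formulas; the helper name is illustrative. The figures 41.7 and 6.5 quoted on the slide match the population version.

```python
import math

data = [35, 45, 30, 35, 40, 25]


def sum_of_squared_deviations(values):
    """Sum of (value - mean)^2 over all values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


ss = sum_of_squared_deviations(data)

# Population formulas divide by N; sample formulas divide by n - 1.
population_variance = ss / len(data)
sample_variance = ss / (len(data) - 1)

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"Population: variance {population_variance:.1f}, sd {population_sd:.1f}")
print(f"Sample:     variance {sample_variance:.1f}, sd {sample_sd:.2f}")
```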

76
Problem

• Following data is sample of the daily production rate of Fibreglass


boats from ABC Ltd,

17,21,18,27,17,21,20,22,18,23

Company Production Manager feels a standard deviation of more than


3 boats a day indicates unacceptable production rate variations.
Should she be concerned ?
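One way to check the production manager's concern is to compute the sample standard deviation and compare it with her limit of 3 boats a day. The sketch uses `statistics.stdev`, which divides by n – 1, since the data is described as a sample.

```python
import statistics

# Sample of daily production of fibreglass boats.
boats_per_day = [17, 21, 18, 27, 17, 21, 20, 22, 18, 23]
LIMIT = 3  # acceptable standard deviation, in boats per day

mean_rate = statistics.mean(boats_per_day)
sample_sd = statistics.stdev(boats_per_day)   # sample formula (n - 1)

print(f"Mean production rate: {mean_rate} boats/day")
print(f"Sample standard deviation: {sample_sd:.2f} boats/day")
print("Variation exceeds the limit" if sample_sd > LIMIT
      else "Variation is within the limit")
```

With this data the sample standard deviation comes out slightly above 3, so on this measure the variation would be a cause for concern.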

77
Problem

78
Problem – Standard Deviation
• Suppose we want to understand how volatile the stocks of 2 companies are. How risky is it to invest in each company? Following are the year-by-year returns.

79
Coefficient of Variation

• Coefficient of variation is a relative measure of variation

• It gives us a feel for the dispersion in relation to the mean of the data

• It relates standard deviation and mean by expressing the standard deviation as a percentage of the mean : CV = (standard deviation / mean) × 100

• Thus the unit of measurement is %

80
Coefficient of Variation

CV is useful when you want to compare the results of 2 different surveys or tests that have different measures or values. If sample A has a CV of 12% and sample B of 25%, sample B has more variability relative to its mean.

ABC Ltd is considering employing one of two training programs. Group 1 was trained by Program A and group 2 by Program B. For the first group the average time required is 32.11 hours and the variance is 68.09. For the second group the average is 19.75 hours and the variance is 71.14. Which training program has less variability ?
(CV = 25.7% for Program A and 42.7% for Program B, so Program A has less relative variability.)
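The training-program comparison can be reproduced with a short function. Note that the problem gives variances, so the standard deviation is taken as the square root of the variance first; the function name is illustrative.

```python
import math


def coefficient_of_variation(mean, variance):
    """Standard deviation expressed as a percentage of the mean."""
    return math.sqrt(variance) / mean * 100


cv_program_a = coefficient_of_variation(mean=32.11, variance=68.09)
cv_program_b = coefficient_of_variation(mean=19.75, variance=71.14)

print(f"Program A: CV = {cv_program_a:.1f}%")
print(f"Program B: CV = {cv_program_b:.1f}%")
print("Less relative variability:",
      "Program A" if cv_program_a < cv_program_b else "Program B")
```

Although the two variances are similar in absolute terms, Program B's much lower mean makes its relative variability considerably higher.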

81
Coefficient of Variation

82
