Business Statistics: Study Material, 2008-09
By Mr. Ravi Prasad
BUSINESS STATISTICS BBM SECOND YEAR
THEORY QUESTIONS
Question:1
Functions of Statistics:
The important functions of statistics are explained below:
1. To present facts in a precise form: An important function of statistics is to
present general information in a precise and definite form. For example,
instead of stating that the average cotton yield in India is much lower than in
China, a more convincing statement is that the average yield of 160 kg of cotton
per hectare in China is higher than the average yield of 150 kg of cotton per
hectare in India.
2. To simplify and classify complex data: Raw data are often highly complex.
Therefore, a purpose of statistics is to simplify and classify large bodies of
numerical data to make them easily understandable. For example, the results
of various university examinations become far more intelligible when presented
as pass percentages for different courses and for different years of the same
course than when the massive raw data are placed before an academic body.
3. To make comparisons: After simplifying and classifying the data, statistics
helps in making comparisons.
4. To know the relationship between two groups: The relationship between
two groups is best represented by statistical techniques such as the
coefficient of correlation, regression, etc. For example, the relationship between
the demand for a product and its price can be studied through correlation
analysis.
5. To forecast future trends: Statistics aims at forecasting future trends by
applying statistical techniques such as extrapolation and time series analysis.
6. To measure uncertainty: Statistical methods help not only in ascertaining the
chance of the occurrence of an event but also in finding out the total effect of an
uncertain event if the consequences of its various occurrences are known.
7. To test a hypothesis: Statistical techniques are highly useful in formulating and
testing hypotheses and in developing new theories, for example, in testing the
hypothesis that a new drug is effective in reducing malaria.
8. To draw valid inferences: Statistical methods help in drawing valid
inferences about the nature of the universe on the basis of a study of a
sample of that universe.
9. To formulate policies: It is possible to formulate suitable policies in different
fields with the help of statistics. Various policies in the fields of planning, taxation,
foreign trade, social security, etc., are formulated on the basis of an analysis of
statistical data.
Question:2
Limitations of statistics
One must know the limitations of statistics in order to draw reliable conclusions.
The following are the limitations of statistics.
1) Statistics does not study individuals: Statistics deals with aggregates of
facts rather than with individual items; for example, it states that the average
height of the students of a college is 5'5".
2) Qualitative phenomena are ignored: Statistics is applicable only to the quantitative
aspect of a problem. As such, statistics ignores qualitative phenomena such as
honesty, poverty, wisdom, etc., which cannot be expressed numerically.
3) Statistical results are true only on an average: W.I. King writes, "Statistics largely
deals with averages and these averages may be made up of individual items
radically different from each other". The laws of statistics are not universal like
the laws of Physics, Chemistry or Astronomy. Statistical laws are true
only on an average.
4) Statistics can be misused: Statistics can be misused by ignorant persons. Data
used by untrained people can lead to misleading results. Statistics can
be handled correctly only by those who have sufficient knowledge of the subject.
As King says, "Statistics are like clay of which one can make a God or Devil as
one pleases." If statistics is misused, wrong conclusions may be drawn from
the study.
5) Statistical relations do not necessarily bring out a cause and effect relationship
between phenomena: Statistics reveals associations amongst certain sets of
quantities which may not be of the 'cause and effect' type. It is the job of the
interpreter to determine the type of relationship existing between the various
variables.
6) Statistics is collected with a purpose: Statistics is collected with a given purpose
and cannot be applied indiscriminately to any situation. The use of secondary data
without proper care is likely to lead to fallacious conclusions.
Question:3
IMPORTANCE OR USES OF STATISTICS:
Statistics is widely used in all fields. There is hardly any field, whether Industry,
Commerce, Trade, Economics, Astronomy, Physics or Chemistry, where
statistics is not used. The importance of statistics in relation to state affairs,
followed by its use in Economics, Commerce, etc., is discussed below:
1. Statistics and State Affairs
Initially, statistics originated as a science of statecraft, but its scope has
increased gradually. World history shows that statistics was used to draw
conclusions on population and land use from very early times.
Kings depended heavily on statistics to know the manpower available for
defence purposes. In modern states, data are collected in order to identify
problems and to design suitable policies to solve them.
6) Statistics and Physical & Natural Sciences: Statistical methods are applied in the
Physical and Natural Sciences. In physical sciences like Astronomy, Geology and
Physics, and in natural sciences such as Meteorology, Zoology, Botany, etc., statistical
techniques are used. Karl Pearson used statistical methods in Biology. He held that the
doctrine of evolution and heredity rests on statistical knowledge.
Question:4
Explain the advantages and limitations of secondary data.
Ans. Data which are not originally collected by the investigator but are obtained from
published or unpublished sources are secondary data. They are collected and processed
by one agency and made use of by some other agency for its statistical work.
Advantages of Secondary data:
The following are the advantages of secondary data.
1) Secondary data constitute the chief material on the basis of which statistical
work is carried out in many investigations.
2) Secondary data may be obtained from various international, national and
local publications.
3) Secondary data are generally available in various magazines, journals, bulletins,
reports, etc.
4) Secondary data are readily available to the investigator from different sources.
5) In most studies, the investigator finds it impracticable to collect first-hand
information on all related issues and as such makes use of the data collected
by others. Hence, it is a less costly affair.
6) There is a vast amount of published information from which statistical studies
may be made, and fresh statistics are constantly being produced.
Question:5
Question:6
Explain the different methods of measuring the dispersion and highlight their merits
and demerits.
A. According to Spiegel, the degree to which numerical data tend to spread about
an average value is called the variation or dispersion of the data. The different
methods of measuring dispersion and their merits and demerits are given
below.
a) Range: The difference between the largest and smallest values is called the
range.
Merits:
i) It is easy to calculate and simple to understand.
ii) It is rigidly defined.
iii) It gives the total picture of the data at a glance.
iv) It is used to check the quality of products in quality control.
v) It is also used in forecasting the weather, by tracking the range of temperature.
Demerits:
i) It is not based on all items, hence its value is not reliable.
ii) It is affected by fluctuations of sampling.
iii) It does not tell anything about the variability of the other items.
iv) It cannot be determined for open-end intervals.
b) Quartile Deviation: Under this method, the two quartiles (Q1 and Q3) are used to
calculate the dispersion.
Merits:
i) It can be easily calculated and is simple to understand.
ii) It does not involve much mathematical difficulty.
iii) It is not affected by extreme items, as the upper 25% and the lower 25% of items
are left out.
iv) It facilitates the calculation of standard deviation and mean deviation
also.
Demerits:
i) It is not capable of further algebraic treatment.
ii) It involves tedious calculations.
iii) It is very much affected by fluctuations of sampling.
iv) It ignores the first and last 25% of items and hence is not a representative figure.
v) It affects the result badly if the values are irregular.
vi) It does not show the scatter around any average.
c) Mean Deviation: Mean deviation is the average of the deviations of the items in a
series from the mean, median or mode, taken without regard to sign.
Merits:
i) It takes all the items into account.
ii) It is highly useful in fields such as business, economics and commerce.
iii) It has the least sampling fluctuations.
iv) It helps in the comparison of two or more series.
v) It is based on measurement, not on estimate.
vi) It is rigidly defined.
vii) It is less affected by extreme items, if calculated from the median.
Demerits:
i) It is difficult to calculate if the average is a fraction.
ii) It is not capable of further algebraic treatment.
iii) It involves tedious calculations.
iv) It ignores the signs (positive and negative) of the deviations from the average.
Demerits (Standard Deviation):
I) It is difficult to compute and not easy to understand when compared with
other methods.
II) In an open-end series, it requires assuming the lower limit of the first class and
the upper limit of the last class.
III) It requires the calculation of the co-efficient of S.D. or C.V. for comparing two or
more series.
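As a rough illustration of the measures discussed in this answer, the following Python sketch computes the range, quartile deviation, mean deviation about the median and standard deviation for a small series. The data values are made up for illustration, and the quartiles follow the default convention of Python's `statistics.quantiles`, which may differ slightly from the textbook interpolation rule.

```python
import statistics

marks = [12, 15, 18, 22, 25, 30, 34, 40]  # hypothetical data, for illustration only

# Range: largest value minus smallest value
data_range = max(marks) - min(marks)

# Quartile deviation: half the distance between Q3 and Q1
q1, _, q3 = statistics.quantiles(marks, n=4)
quartile_deviation = (q3 - q1) / 2

# Mean deviation about the median: average of the absolute deviations
median = statistics.median(marks)
mean_deviation = sum(abs(x - median) for x in marks) / len(marks)

# Standard deviation (population form, dividing by N)
std_dev = statistics.pstdev(marks)

print(data_range, quartile_deviation, mean_deviation, round(std_dev, 2))
```

Note that the range uses only two items, while the mean deviation and standard deviation use every item, which is the contrast drawn in the merits and demerits above.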
Question: 7
What are the different methods of measuring the correlation? Explain.
A. In correlation, we are required to know the relationship between variables.
Following are the different methods of measuring correlation.
a) Scatter Diagram Method: This method is the simplest tool for determining the
correlation between two variables. The values of the 'X' series are plotted
against the corresponding values of the 'Y' series. The diagram so obtained is called
a scatter diagram. Depending on the scatter of the points in the diagram, we
determine whether correlation exists or not. If the points plotted are very close to
each other, it shows high correlation; otherwise poor correlation exists.
Merits:
i) The relationship between the variables can be established by inspection
only.
ii) This method is not affected by extreme values.
iii) This method helps in getting an approximate estimated line.
Demerits:
i) It is not suitable if the number of observations is large.
ii) It does not provide an exact measure of the relationship between the given
variables.
b) Graphic Method: According to this method, the values of the two series are plotted on
graph paper and two curves are drawn. By looking at the direction and closeness of the
two curves, the extent of the relationship is studied. If the two curves move in the same
direction, the correlation is said to be positive, and if they move in different directions,
the correlation is said to be negative.
Merits:
i) By graph, the relationship is studied very easily.
ii) It is desirable if data are given for a longer period.
Demerits:
I) It is not possible to measure the exact degree of correlation.
II) It does not give a rigid measure of correlation between the given variables.
Merits (Karl Pearson's coefficient of correlation):
i) It is based on arithmetic mean and standard deviation. Hence, it is
rigidly defined.
ii) It explains the direction of the relationship, i.e., positive or negative.
iii) It also establishes the size of the relationship, i.e., a value between +1 and -1.
iv) It helps to measure the co-variance.
v) It takes into consideration the positive and negative deviations calculated from
the mean.
Demerits:
I) Its calculation is very lengthy and time consuming.
II) It requires interpretation of the calculated correlation value.
III) It assumes a linear relationship between the variables.
IV) It is affected by extreme values of the data.
Demerits:
i) It is not applicable to a grouped frequency distribution.
ii) If the number of observations is large, this method cannot be used.
Demerits:
i) It does not consider whether the variation is small or big.
ii) It only gives a rough idea about the presence or absence of correlation.
Question:8
STATISTICS PROBLEMS
PROBLEM:1
Find Mean and Median from the data given below:
Marks       No. of students
15 - 25      4
25 - 35     11
35 - 45     19
45 - 55     14
55 - 65      0
65 - 75      2
Solution:
Marks      f    Mid-value (m)   dx = m - A   dx' = dx / i   f.dx'
15 - 25     4        20            -20           -2           -8
25 - 35    11        30            -10           -1          -11
35 - 45    19        40              0            0            0
45 - 55    14        50             10            1           14
55 - 65     0        60             20            2            0
65 - 75     2        70             30            3            6
         N = 50                                          ∑ f.dx' = 1
A = 40 (Assumed Mean)
N = 50 (No. of students)
i = 10 (Class interval)
Mean (X) = A + (∑ f.dx' / N) × i
= 40 + (1/50) × 10
= 40 + 10/50
Mean (X) = 40.2
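The step-deviation calculation of the mean can be checked with a short Python sketch. The class limits and frequencies are those given in the problem; the variable names are chosen here for illustration.

```python
# Class intervals and frequencies from Problem 1
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
freqs = [4, 11, 19, 14, 0, 2]

A = 40   # assumed mean (mid-value of the 35-45 class)
i = 10   # class interval (width)

mids = [(lo + hi) / 2 for lo, hi in classes]
steps = [(m - A) / i for m in mids]              # step deviations dx' = (m - A) / i
N = sum(freqs)
sum_f_dx = sum(f * d for f, d in zip(freqs, steps))

# Mean = A + (sum of f.dx' / N) x i
mean = A + (sum_f_dx / N) * i
print(N, sum_f_dx, round(mean, 2))
```

The multiplication by the class interval `i` converts the step deviations back to the original units of marks.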
Median:
N = ∑ f = 135
For Q1: N1 = (N/4)th item = 135/4 = 33.75th item
The 33.75th item lies in the class (20 - 30).
L1 = 20, f = 20, N1 = 33.75, c.f. = 25, i = 10
Q1 = L1 + ((N1 - c.f.) / f) × i
= 20 + ((33.75 - 25) / 20) × 10
= 20 + 8.75/2 = 20 + 4.37 = 24.37
For Q3: N1 = (3N/4)th item = (3 × 135)/4 = 101.25th item
The 101.25th item lies in the class (50 - 60).
L1 = 50, f = 20, N1 = 101.25, c.f. = 100, i = 10
Q3 = L1 + ((N1 - c.f.) / f) × i
= 50 + ((101.25 - 100) / 20) × 10
= 50 + 1.25/2 = 50 + 0.62 = 50.62
Q3 - Q1 = 50.62 - 24.37 = 26.25
Quartile Deviation = (Q3 - Q1) / 2 = 26.25 / 2 = 13.125
Co-efficient of Quartile Deviation = (Q3 - Q1) / (Q3 + Q1)
= (50.62 - 24.37) / (50.62 + 24.37)
= 26.25 / 74.99
= 0.350.
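The interpolation used for Q1 and Q3 can be sketched in Python as below. The helper name `quartile` is hypothetical; the inputs (lower limit, cumulative frequency, class frequency and class interval) are the figures quoted in the solution. The code keeps full precision, so it gives 24.375 and 50.625 where the worked solution shows 24.37 and 50.62.

```python
def quartile(L, position, cf, f, i):
    """Interpolate inside the quartile class: L + (position - cf) / f * i."""
    return L + (position - cf) / f * i

N = 135

# Q1: the (N/4)th item lies in class 20-30 (c.f. of preceding classes = 25, f = 20)
q1 = quartile(L=20, position=N / 4, cf=25, f=20, i=10)

# Q3: the (3N/4)th item lies in class 50-60 (c.f. of preceding classes = 100, f = 20)
q3 = quartile(L=50, position=3 * N / 4, cf=100, f=20, i=10)

# Co-efficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q3, round(coefficient, 3))
```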
∑ D² = 160, N = 8
rk = 1 - (6 ∑ D²) / (N (N² - 1))
= 1 - (6 × 160) / (8 (8² - 1))
= 1 - 960 / (8 × 63)
= 1 - 960/504 = 1 - 1.904 = -0.904.
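The rank correlation formula can be evaluated directly from ∑D² and N, as in this minimal Python sketch. The function name is an assumption made for illustration. Because the code does not truncate 960/504 to 1.904, its rounded result may differ from the worked answer in the third decimal place.

```python
def rank_correlation(sum_d2, n):
    """Spearman's rank correlation: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Figures used in the worked solution: sum of squared rank differences = 160, N = 8
rk = rank_correlation(sum_d2=160, n=8)
print(round(rk, 3))
```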
Problem:4
Compute the trend values by the method of least squares from the following data.
Year                 1998  1999  2000  2001  2002  2003  2004  2005
No. of T.V's sold      56    55    51    47    42    38    35    32
Solution:
Year   T.V's sold (Y)   Deviations from        X      XY     X²    Yc = a + bX
                        mid-year (2001.5)
1998        56               -3.5             -7    -392     49      57.52
1999        55               -2.5             -5    -275     25      53.80
2000        51               -1.5             -3    -153      9      50.08
2001        47               -0.5             -1     -47      1      46.36
2002        42               +0.5             +1     +42      1      42.64
2003        38               +1.5             +3    +114      9      38.92
2004        35               +2.5             +5    +175     25      35.20
2005        32               +3.5             +7    +224     49      31.48
         ∑ Y = 356                         ∑ X = 0  ∑ XY = -312  ∑ X² = 168
a = ∑ Y / N = 356 / 8 = 44.5
b = ∑ XY / ∑ X² = -312 / 168 = -1.857 = -1.86
∴ Yc = 44.5 + (-1.86) X
The value of Yc for the year 1998 will be 44.5 - 1.86 (-7) = 57.52.
Similarly, the other values can be calculated.
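The least-squares fit for this problem can be reproduced with a short Python sketch. It uses the same coding of X (deviations from the mid-year 2001.5, multiplied by 2). The code keeps b unrounded, so its trend values differ slightly in the second decimal from those obtained with b = -1.86.

```python
years = [1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005]
sold = [56, 55, 51, 47, 42, 38, 35, 32]

mid_year = sum(years) / len(years)            # 2001.5
X = [2 * (t - mid_year) for t in years]       # -7, -5, ..., +7, so that sum(X) == 0

# With sum(X) == 0 the normal equations reduce to:
a = sum(sold) / len(sold)                                          # a = sum(Y) / N
b = sum(x * y for x, y in zip(X, sold)) / sum(x * x for x in X)    # b = sum(XY) / sum(X^2)

trend = [a + b * x for x in X]
for year, value in zip(years, trend):
    print(year, round(value, 2))
```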
Problem:5
Compute the trend values by the method of least squares from the following
data.
Year                          1999  2000  2001  2002  2003  2004  2005
Production ('000 Quintals)      80    90    92    83    94    99    92
(A) Fit a straight line trend to these figures.
(B) Estimate the value of production for the year 2006.
Solution:
Year   Production (Y)   X = t - 2002   XY     X²   Trend values Yc = a + bX
1999        80              -3        -240     9   90 + 2(-3) = 90 - 6 = 84
2000        90              -2        -180     4   90 + 2(-2) = 90 - 4 = 86
2001        92              -1         -92     1   90 + 2(-1) = 88
2002        83               0           0     0   90
2003        94               1          94     1   90 + 2(1) = 92
2004        99               2         198     4   90 + 2(2) = 94
2005        92               3         276     9   90 + 2(3) = 96
         ∑ Y = 630       ∑ X = 0   ∑ XY = 56   ∑ X² = 28
∑ Y = N a + b ∑ X
630 = 7a + b(0)
a = 630 / 7 = 90
∑ XY = a ∑ X + b ∑ X²
56 = 90(0) + b(28)
b = 56 / 28 = 2
Yc = 90 + 2X
The value of production for the year 2006:
Yc = 90 + 2(2006 - 2002) = 90 + 2(4) = 90 + 8 = 98.
Problem:6
Fit a trend line equation Y = a + bX to the following data.
Year               2001  2002  2003  2004  2005  2006
Profits (Rs '000)    75    85    92    97    99   100
Also estimate the profit for the year 2010.
Solution:
Year   Profits (Y)   X = (t - 2003.5) / (1/2)   XY     X²   Trend values Y = a + bX
2001       75                 -5               -375    25   91.3 + 2.4(-5) = 91.3 - 12 = 79.3
2002       85                 -3               -255     9   84.1
2003       92                 -1                -92     1   88.9
2004       97                  1                 97     1   93.7
2005       99                  3                297     9   98.5
2006      100                  5                500    25   103.3
        ∑ Y = 548          ∑ X = 0        ∑ XY = 172  ∑ X² = 70
∑ Y = N a + b ∑ X
548 = 6a + b(0)
a = 548 / 6 = 91.3
∑ XY = a ∑ X + b ∑ X²
172 = 91.3(0) + b(70)
b = 172 / 70 = 2.457, taken as 2.4
Y = 91.3 + (2.4) X
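The question also asks for an estimate for 2010. A minimal Python sketch of that substitution is given below, assuming the same coding of X as in the solution (half-year units measured from 2003.5). It shows the estimate both with the rounded coefficients of the worked solution (a = 91.3, b = 2.4) and with the unrounded least-squares values, which give a slightly higher figure.

```python
years = [2001, 2002, 2003, 2004, 2005, 2006]
profits = [75, 85, 92, 97, 99, 100]

origin = 2003.5   # mid-point of the six years

def to_x(year):
    """Code a year as X in half-year units from the origin."""
    return (year - origin) / 0.5

X = [to_x(t) for t in years]                  # -5, -3, -1, 1, 3, 5

# Unrounded least-squares coefficients (sum(X) == 0)
a = sum(profits) / len(profits)
b = sum(x * y for x, y in zip(X, profits)) / sum(x * x for x in X)

x_2010 = to_x(2010)                           # 13 half-year units from the origin
estimate_exact = a + b * x_2010
estimate_rounded = 91.3 + 2.4 * x_2010        # coefficients as rounded in the solution

print(x_2010, round(estimate_rounded, 1), round(estimate_exact, 1))
```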
Problem:7
Given that the means of X and Y are 65 and 67, their S.D's are 2.5 and
3.5 respectively, and the coefficient of correlation between them is 0.8:
(a) Write down the two regression lines.
(b) Obtain the best estimate of X when Y = 70.
(c) Using the estimated value of X as the given value of X, estimate the corresponding
value of Y.
Solution:
(a) Regression line of Y on X:
Y - Ȳ = r (σy / σx) (X - X̄)
Y - 67 = 0.8 × (3.5 / 2.5) (X - 65)
Y - 67 = 1.12 (X - 65)
Y - 67 = 1.12X - 72.8
Y = 1.12X - 72.8 + 67
Y = 1.12X - 5.8
Regression line of X on Y:
X - X̄ = r (σx / σy) (Y - Ȳ)
X - 65 = 0.8 × (2.5 / 3.5) (Y - 67) = 0.571 (Y - 67)
X - 65 = 0.571Y - 38.257
X = 0.571Y - 38.257 + 65
X = 0.571Y + 26.743
(b) The best estimate of X when Y = 70 is obtained from the regression equation of X on Y.
X = 0.571 (70) + 26.743
= 39.97 + 26.743 = 66.713.
(c) When X = 66.713, the regression equation of Y on X gives
Y = 1.12 (66.713) - 5.8 = 74.72 - 5.8 = 68.92.
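The two regression lines can also be written as small Python functions built from the given means, standard deviations and correlation coefficient. This is only a sketch with illustrative names; it keeps the regression coefficients unrounded, so the estimates may differ in the third decimal from hand calculations that use 0.571.

```python
mean_x, mean_y = 65, 67
sd_x, sd_y = 2.5, 3.5
r = 0.8

b_yx = r * sd_y / sd_x    # regression coefficient of Y on X (about 1.12)
b_xy = r * sd_x / sd_y    # regression coefficient of X on Y (about 0.571)

def y_on_x(x):
    """Estimate Y for a given X from the line Y - mean_y = b_yx (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

def x_on_y(y):
    """Estimate X for a given Y from the line X - mean_x = b_xy (Y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)

x_hat = x_on_y(70)        # best estimate of X when Y = 70
y_hat = y_on_x(x_hat)     # estimate of Y using that estimated X
print(round(x_hat, 3), round(y_hat, 2))
```

Note that feeding the estimated X back into the Y-on-X line does not return Y = 70, because the two regression lines coincide only when the correlation is perfect.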
Problem:8
Calculate Pearson's correlation coefficient for the following data.
X   77  54  27  52  14  35  90  25  56  60
Y   35  38  60  40  50  40  35  56  34  42
Solution:
X     Y     dx = X - X̄     dy = Y - Ȳ     dx²     dy²    dx.dy
77    35    77 - 49 = 28    35 - 43 = -8    784      64    -224
54    38    54 - 49 = 5     38 - 43 = -5     25      25     -25
27    60    27 - 49 = -22   60 - 43 = 17    484     289    -374
52    40    52 - 49 = 3     40 - 43 = -3      9       9      -9
14    50       -35               7         1225      49    -245
35    40       -14              -3          196       9      42
90    35        41              -8         1681      64    -328
25    56       -24              13          576     169    -312
56    34         7              -9           49      81     -63
60    42        11              -1          121       1     -11
∑ X = 490   ∑ Y = 430   ∑ dx² = 5150   ∑ dy² = 760   ∑ dx.dy = -1549
X̄ = ∑ X / N = 490 / 10 = 49
Ȳ = ∑ Y / N = 430 / 10 = 43
r = ∑ dx.dy / √(∑ dx² × ∑ dy²)
= -1549 / √(5150 × 760)
= -1549 / 1978.383 = -0.783
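The column totals in a table like this are easy to get wrong by hand, so a short Python sketch that rebuilds them from the ten pairs may be useful as a check. It takes deviations from the actual means (49 and 43) and applies r = ∑dx.dy / √(∑dx² × ∑dy²); the variable names are illustrative.

```python
from math import sqrt

X = [77, 54, 27, 52, 14, 35, 90, 25, 56, 60]
Y = [35, 38, 60, 40, 50, 40, 35, 56, 34, 42]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Deviations from the actual means
dx = [x - mean_x for x in X]
dy = [y - mean_y for y in Y]

sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(p * q for p, q in zip(dx, dy))

r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)
print(mean_x, mean_y, sum_dx2, sum_dy2, sum_dxdy, round(r, 3))
```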
Problem:9
Given the following data:
Variance of X = 9
Regression Equations: 4X - 5Y + 33 = 0
                      20X - 9Y - 107 = 0
Find: (a) The mean values of X and Y,
(b) The standard deviation of Y, and
(c) The coefficient of correlation between X and Y.
Solution: Regression Equations: 4X - 5Y + 33 = 0 ... (1)
                                20X - 9Y - 107 = 0 ... (2)
Both regression lines pass through the means of X and Y, so the means are found by
solving (1) and (2). Multiply (1) by 5 and subtract (2):
20X - 25Y + 165 = 0
20X - 9Y - 107 = 0
-16Y + 272 = 0
Y = 272 / 16 = 17
Substituting in (1):
4X - 5Y + 33 = 0
4X - 5(17) = -33
4X - 85 = -33
4X = 85 - 33 ; 4X = 52
X = 52 / 4 = 13
Regression equation of Y on X:
4X - 5Y + 33 = 0
-5Y = -33 - 4X
5Y = 33 + 4X
∴ Y = (4/5) X + 33/5
Regression co-efficient of Y on X = 4/5
Regression equation of X on Y:
20X - 9Y - 107 = 0
20X = 9Y + 107
∴ X = (9/20) Y + 107/20
Regression co-efficient of X on Y = 9/20
r = √((4/5) × (9/20)) = √(9/25) = 3/5
σx² = 9
σx = 3
r (σy / σx) = 4/5
(3/5) (σy / 3) = 4/5
σy / 5 = 4/5
∴ σy = 4
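The three parts of this problem can be traced in a short Python sketch. It assumes, as the solution does, that the first equation is the line of Y on X and the second the line of X on Y; the means are obtained with Cramer's rule, and the variable names are illustrative.

```python
from math import sqrt

# Regression lines written as a*X + b*Y + c = 0
a1, b1, c1 = 4, -5, 33       # taken as the line of Y on X
a2, b2, c2 = 20, -9, -107    # taken as the line of X on Y

# (a) The lines intersect at the means: solve the 2x2 system by Cramer's rule
det = a1 * b2 - a2 * b1
mean_x = (b1 * c2 - b2 * c1) / det
mean_y = (a2 * c1 - a1 * c2) / det

# Regression coefficients read off the rearranged equations
b_yx = -a1 / b1              # Y = (4/5) X + 33/5
b_xy = -b2 / a2              # X = (9/20) Y + 107/20

# (c) r is the square root of the product, with the sign of the coefficients
r = sqrt(b_yx * b_xy)

# (b) b_yx = r * sd_y / sd_x, so sd_y = b_yx * sd_x / r
sd_x = sqrt(9)
sd_y = b_yx * sd_x / r

print(mean_x, mean_y, round(r, 2), round(sd_y, 2))
```

A quick validity check on the assumed assignment of the lines is that the product of the two regression coefficients (0.36 here) does not exceed 1.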
Problem:10
From the following data find the two regression equations.
X   65  72  75  78  82  85  96
Y   55  65  75  85  95  88  86
Solution:
X     Y     dx = X - A    dy = Y - A    dx²     dy²     dx.dy
            (A = 78)      (A = 75)
65    55      -13           -20         169     400      260
72    65       -6           -10          36     100       60
75    75       -3             0           9       0        0
78    85        0            10           0     100        0
82    95        4            20          16     400       80
85    88        7            13          49     169       91
96    86       18            11         324     121      198
∑ X = 553  ∑ Y = 549  ∑ dx = 7  ∑ dy = 24  ∑ dx² = 603  ∑ dy² = 1290  ∑ dx.dy = 689
X̄ = ∑ X / N = 553 / 7 = 79
Ȳ = ∑ Y / N = 549 / 7 = 78.43, taken as 78
Regression equation of X on Y:
bxy = (N ∑ dx.dy - ∑ dx . ∑ dy) / (N ∑ dy² - (∑ dy)²)
= (7 × 689 - 7 × 24) / (7 × 1290 - (24)²)
= (4823 - 168) / (9030 - 576)
= 4655 / 8454 = 0.55
X - X̄ = bxy (Y - Ȳ)
X - 79 = 0.55 (Y - 78)
X - 79 = 0.55Y - 42.9
X = 0.55Y + 36.1
Regression equation of Y on X:
byx = (N ∑ dx.dy - ∑ dx . ∑ dy) / (N ∑ dx² - (∑ dx)²)
= (7 × 689 - 7 × 24) / (7 × 603 - (7)²)
= (4823 - 168) / (4221 - 49)
= 4655 / 4172 = 1.1
Y - Ȳ = byx (X - X̄)
Y - 78 = 1.1 (X - 79)
Y - 78 = 1.1X - 86.9
Y = 1.1X - 86.9 + 78
Y = 1.1X - 8.9
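The two regression coefficients can be recomputed from the raw data with the assumed-mean formulas used above. The following Python sketch keeps full precision, so it gives roughly 0.551 and 1.116 where the worked solution rounds to 0.55 and 1.1, and a mean of Y of about 78.43 where the solution uses 78; the variable names are illustrative.

```python
X = [65, 72, 75, 78, 82, 85, 96]
Y = [55, 65, 75, 85, 95, 88, 86]
A_x, A_y = 78, 75            # assumed means used in the solution table
n = len(X)

dx = [x - A_x for x in X]
dy = [y - A_y for y in Y]

sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(p * q for p, q in zip(dx, dy))

# Common numerator: N * sum(dx.dy) - sum(dx) * sum(dy)
numerator = n * sum_dxdy - sum_dx * sum_dy

b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)   # regression coefficient of X on Y
b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)   # regression coefficient of Y on X

mean_x = sum(X) / n
mean_y = sum(Y) / n
print(numerator, round(b_xy, 3), round(b_yx, 3), mean_x, round(mean_y, 2))
```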
Problem:11
Find the Geometric Mean of 2, 4, 8, 12, 16 and 24.
Solution:
X       log X
2       0.3010
4       0.6021
8       0.9031
12      1.0792
16      1.2041
24      1.3802
N = 6   ∑ log X = 5.4697
G.M. = Antilog (∑ log X / N)
= Antilog (5.4697 / 6)
= Antilog (0.9116)
= 8.158
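The logarithm method for the geometric mean can be sketched in Python as follows. The code uses base-10 logarithms as in the table and cross-checks the result against `statistics.geometric_mean` from the standard library (available from Python 3.8).

```python
import math
import statistics

values = [2, 4, 8, 12, 16, 24]

# Sum the base-10 logarithms, divide by N, then take the antilog
log_sum = sum(math.log10(v) for v in values)
gm = 10 ** (log_sum / len(values))

# Cross-check with the standard-library implementation
gm_check = statistics.geometric_mean(values)

print(round(log_sum, 4), round(gm, 3), round(gm_check, 3))
```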
Problem:12
Find the Geometric Mean of 576, 57.6, 5.76, 0.576.
X log X
576 log 576= 2.7604
57.6 log 57.6=1.7604
5.76 log 5.76 = 0.7604
Problem:13
Calculate the coefficient of correlation from the following data.
X   20  25  34  43  45  56  65  70
Y   72  76  81  68  84  78  80  85
Solution:
X     Y     dx = X - 45      dy = Y - 78       dx²             dy²           dx.dy
20    72    20 - 45 = -25    72 - 78 = -6     (-25)² = 625    (-6)² = 36    -25(-6) = 150
25    76    25 - 45 = -20    76 - 78 = -2      400               4          -20(-2) = 40
34    81    34 - 45 = -11    81 - 78 = 3       121               9          -33
43    68    43 - 45 = -2     68 - 78 = -10       4             100           20
45    84    45 - 45 = 0      84 - 78 = 6         0              36            0
56    78    56 - 45 = 11     78 - 78 = 0       121               0            0
65    80    65 - 45 = 20     80 - 78 = 2       400               4           40
70    85    70 - 45 = 25     85 - 78 = 7       625              49          175
∑ X = 358  ∑ Y = 624  ∑ dx = -2  ∑ dy = 0  ∑ dx² = 2296  ∑ dy² = 238  ∑ dx.dy = 392
X̄ = ∑ X / N = 358 / 8 = 44.75, taken as 45
Ȳ = ∑ Y / N = 624 / 8 = 78
r = (∑ dx.dy - (∑ dx . ∑ dy) / N) / (√(∑ dx² - (∑ dx)² / N) × √(∑ dy² - (∑ dy)² / N))
= (392 - 0) / (√(2296 - (-2)² / 8) × √(238 - 0))
= 392 / (√2295.5 × √238)
= 392 / (47.91 × 15.43) = 392 / 739.25
= 0.530
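The assumed-mean (short-cut) formula for the correlation coefficient can be sketched in Python as below, taking 45 and 78 as the assumed means as in the table. The square roots are the step most prone to slips by hand, so the code is a convenient check on them; the variable names are illustrative.

```python
from math import sqrt

X = [20, 25, 34, 43, 45, 56, 65, 70]
Y = [72, 76, 81, 68, 84, 78, 80, 85]
A_x, A_y = 45, 78            # assumed means used in the solution table
n = len(X)

dx = [x - A_x for x in X]
dy = [y - A_y for y in Y]

sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(p * q for p, q in zip(dx, dy))

# r = (sum(dx.dy) - sum(dx).sum(dy)/N) / (sqrt(sum(dx^2) - sum(dx)^2/N) * sqrt(sum(dy^2) - sum(dy)^2/N))
numerator = sum_dxdy - sum_dx * sum_dy / n
denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
r = numerator / denominator

print(sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy, round(r, 4))
```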
Problem:14
Calculate the mean deviation from the median from the following data.
Marks less than     80   70   60   50   40   30   20   10
No. of students    100   90   80   60   32   20   13    5
Solution:
X     f     c.f.    dy = |X - M|    f.dy
80    100    100        20          2000
70     90    190        10           900
60     80    270         0             0
50     60    330        10           600
40     32    362        20           640
30     20    382        40           800
20     13    395        47           611
10      5    400        55           275
   ∑ f = 400                    ∑ f.dy = 5826
M = Size of ((400 + 1) / 2)th term
= Size of 200.5th term
= 60