
Statistics 3-2 Descriptive Statistics : Measures of Dispersion

3.15 Rank Correlation
3.16 Regression
3.17 Lines of Regression
3.18 Regression Coefficients
3.19 Binomial and Multinomial Distributions
3.20 Poisson Distribution
3.21 Uniform Distribution
3.22 Exponential Distribution
3.23 Gaussian Distribution
3.24 Log-Normal Distribution
3.25 Chi-square Distribution
3.26 Additive Property of Chi-square Distribution
3.27 Hypothesis
3.28 Null and Alternative Hypothesis
3.29 One-Sided or Two-Sided Hypothesis
3.30 Errors (Type-I and Type-II)
3.31 Some Definitions
Multiple Choice Questions

3.1 Introduction

• In the previous chapter we saw that an average condenses information into a single value. However, the average alone is not sufficient to describe a distribution completely.
• There may be two or more distributions with the same mean, yet the distributions need not be identical. The runs scored by players X, Y and Z in 5 matches are as follows :

Player X : 94, 98, 100, 102, 106
Player Y : …, 100, 125, 140, …
Player Z : …, 100, 110, 170, …

• We observe that the average runs scored by all three players are the same, but the runs differ in variation. We can say that player X is more consistent than Y, and Y is more consistent than Z. That is, the runs are scattered about the central value 100.
• The scatteredness of observations about a central value is called dispersion.

3.2 Characteristics of an Ideal Measure of Dispersion

An ideal measure of dispersion should satisfy the following characteristics :
i) It should be based on all the observations in the data set.
ii) It should be easy to understand.
iii) It should be capable of further mathematical treatment.
iv) It should not be affected by fluctuations of sampling.
v) It should not be unduly affected by extreme observations.

3.3 Measures of Dispersion

• In this chapter we will discuss the following measures of dispersion : i) Range, ii) Quartile deviation, iii) Mean deviation, iv) Standard deviation.
• These measures have the same units as the observations, e.g. cm, hours, etc., and such measures are called absolute measures of dispersion.

Absolute and Relative Measures of Dispersion :

• It can be seen that absolute measures of dispersion possess units and hence create difficulty when comparing the dispersion of two or more frequency distributions. Absolute measures of dispersion use the original units of the data and are most useful for understanding the dispersion within the context of the experiment and its measurements.

. TECHNICAL PUBLICA TIONS® - an up-thrust for knowledge -


• Relative measures of dispersion are calculated as ratios or percentages; for example, one relative measure of dispersion is the ratio of the standard deviation to the mean.
• Relative measures of dispersion are always dimensionless, and they are particularly useful for making comparisons between separate data sets or different experiments that might use different units. They are sometimes called coefficients of dispersion.

3.4 Range

• The range in statistics tells us the spread of the data. Also, for a given end point of the data, we can find the other end point with the help of the range.

Definition : The range is the difference between the highest/largest value and the lowest/smallest value of the data.

L - Largest / Highest value
S - Smallest / Lowest value

Range = L − S

Steps to find the range in statistics :
1. Arrange the given data in ascending or descending order.
2. Observe the smallest value (S) and the largest value (L) of the data.
3. Subtract the smallest value (S) from the largest value (L) to obtain the range of the given data set.

e.g. The range of the data 2, 8, 10, 11, 12, 18, 21, 26, 27, 28, 33, 35, 38 :
Smallest value (S) = 2
Largest value (L) = 38
Range = L − S = 38 − 2 = 36

The range is the most convenient measure to find, but it has the following limitations :
a. The range does not tell us the number of data points.
b. The range cannot be used to find the mean, median or mode.
c. The range is affected by extreme values.
d. The range cannot be used for open-ended distributions.

Note : Coefficient of range = (L − S)/(L + S)

3.5 Quartile Deviation

• In the range we use only the extreme observations of the data. Hence, any change in the observations between these two extremes does not affect the range. In many data sets the extreme observations lie away from the remaining observations. Thus the range does not give the actual picture of dispersion.
• To overcome this problem, the range of the middle 50 % of the items is calculated.
• The quartile deviation is convenient for assessing the spread of a distribution; it measures the spread about the distribution's central tendency.
• The interquartile range is the difference between the third quartile and the first quartile of the frequency distribution.
• When this difference is divided by two, it is called the quartile deviation or semi-interquartile range.

Quartile deviation or semi-interquartile range = (Q3 − Q1)/2

Note : Coefficient of quartile deviation = (Q3 − Q1)/(Q3 + Q1)

Formulae to calculate Q1 and Q3 :

a) For raw data or individual observations :
i) When n = number of observations is odd :
Q1 = the value of the ((n + 1)/4)th observation
Q3 = the value of the (3(n + 1)/4)th observation
ii) When n = number of observations is even, suppose n = 10 :
Q1 = the value of the ((10 + 1)/4)th observation = the value of the 2.75th observation.
So the 2.75th observation lies between the 2nd and 3rd observations in the given data set :
Q1 = 2nd observation + 0.75 (3rd observation − 2nd observation)
Q3 = the value of the (3(10 + 1)/4)th observation = the value of the 8.25th observation.
So the 8.25th observation lies between the 8th and 9th observations in the given data set :
Q3 = 8th observation + 0.25 (9th observation − 8th observation)
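The range and quartile computations above can be sketched in Python. The (n + 1)/4 positional rule with linear interpolation mirrors the text's convention; the function names are illustrative, not from the book.

```python
# Sketch of the range and quartile-deviation computations described above.
# Quartile positions use the text's (n + 1)/4 rule with linear interpolation.

def value_at_position(sorted_xs, pos):
    """Value at a fractional 1-based position, e.g. the 2.75th observation."""
    k = int(pos)                 # whole part of the position
    frac = pos - k               # fractional part used for interpolation
    v = sorted_xs[k - 1]
    if frac:
        v += frac * (sorted_xs[k] - sorted_xs[k - 1])
    return v

def dispersion_summary(data):
    xs = sorted(data)
    n = len(xs)
    rng = xs[-1] - xs[0]                         # Range = L - S
    q1 = value_at_position(xs, (n + 1) / 4)      # first quartile
    q3 = value_at_position(xs, 3 * (n + 1) / 4)  # third quartile
    qd = (q3 - q1) / 2                           # semi-interquartile range
    return rng, q1, q3, qd

# Data from the range example above: Range = 38 - 2 = 36
print(dispersion_summary([2, 8, 10, 11, 12, 18, 21, 26, 27, 28, 33, 35, 38]))
```

For an odd n such as 15, the rule lands on whole positions (the 4th and 12th observations), which matches the worked examples in this section.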

b) For grouped data :

N - Total frequency
C.F. - Cumulative frequency of the class preceding the quartile class
f - Frequency of the quartile class
l - Lower bound of the class in which the respective quartile lies
w - Class width of the class in which the respective quartile lies

Class of Q1 : the class containing the (N/4)th observation. This value provides the class where Q1 lies.

Q1 = l + (w/f)(N/4 − C.F.)

Class of Q3 : the class containing the (3N/4)th observation. This value provides the class where Q3 lies.

Q3 = l + (w/f)(3N/4 − C.F.)

Example : Find the range, coefficient of range, quartile deviation and coefficient of quartile deviation for the data
14, 19, 21, 22, 26, 29, 30, 62, 69, 74, 92, 100, 100, 104, 108.

Solution : i) To find the range and coefficient of range :
Smallest observation (S) = 14
Largest observation (L) = 108
Range = L − S = 108 − 14 = 94
Coefficient of range = (L − S)/(L + S) = 94/(108 + 14) = 94/122 = 0.7704

ii) To find the quartile deviation and coefficient of quartile deviation :
We arrange the data in ascending order :
14, 19, 21, [22], 26, 29, 30, 62, 69, 74, 92, [100], 100, 104, 108
Here n = number of observations = 15
Q1 = the value of the ((n + 1)/4)th observation = the value of the ((15 + 1)/4)th observation = the 4th observation = 22
Q3 = the value of the (3(n + 1)/4)th observation = the value of the 12th observation = 100
Quartile deviation = (Q3 − Q1)/2 = (100 − 22)/2 = 39
Coefficient of Q.D. = (Q3 − Q1)/(Q3 + Q1) = (100 − 22)/(100 + 22) = 78/122 = 0.6393
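A minimal sketch of the grouped-data formula Q = l + (w/f)(position − C.F.); the class table used below is an illustrative assumption, not the book's data.

```python
# Sketch of the grouped-data quartile formula Q = l + (w/f)(pos - C.F.),
# where pos = N/4 for Q1 and 3N/4 for Q3, C.F. is the cumulative frequency
# of the classes before the quartile class, and w is the class width.

def grouped_quartile(classes, freqs, which):
    """classes: list of (lower, upper) bounds; which = 1 for Q1, 3 for Q3."""
    N = sum(freqs)
    pos = which * N / 4
    cum = 0                                   # C.F. of preceding classes
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= pos:                    # quartile lies in this class
            return lo + ((hi - lo) / f) * (pos - cum)
        cum += f
    raise ValueError("position beyond total frequency")

# Illustrative table (not from the book): N = 20, N/4 = 5, 3N/4 = 15
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [4, 8, 6, 2]
q1 = grouped_quartile(classes, freqs, 1)   # 10 + (10/8)(5 - 4)  = 11.25
q3 = grouped_quartile(classes, freqs, 3)   # 20 + (10/6)(15 - 12) = 25
print(q1, q3, (q3 - q1) / 2)
```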
Example : For the given data, find the range, coefficient of range, quartile deviation and coefficient of quartile deviation.

Solution : i) To find the range and coefficient of range :
Smallest observation (S) = 7
Largest observation (L) = 30
Range = L − S = 30 − 7 = 23
Coefficient of range = (L − S)/(L + S) = 23/37 = 0.6216

ii) To find the quartile deviation, we arrange the observations in ascending order :
7, 12, 17, 19, 20, 21, 22, 23, 27, 30
Here n = number of observations = 10
Q1 = the value of the ((10 + 1)/4)th observation = the value of the 2.75th observation.
The 2.75th observation lies between the 2nd and 3rd observations in the given data :
Q1 = 2nd observation + 0.75 (3rd observation − 2nd observation) = 12 + 0.75 × (17 − 12) = 12 + 3.75 = 15.75
Q3 = the value of the (3(10 + 1)/4)th observation = the value of the 8.25th observation.
Q3 = 8th observation + 0.25 (9th observation − 8th observation) = 23 + 0.25 (27 − 23) = 23 + 1 = 24
Quartile deviation = (Q3 − Q1)/2 = (24 − 15.75)/2 = 8.25/2 = 4.125
Coefficient of Q.D. = (Q3 − Q1)/(Q3 + Q1) = 8.25/39.75 = 0.2075

Example : For the given frequency distribution (total frequency N = 180), find the quartile deviation and its coefficient.

Solution :
Class of Q1 : N/4 = 180/4 = 45, therefore (20 − 30) is the Q1 class.
Q1 = l + (w/f)(N/4 − C.F.) = 25
Class of Q3 : 3N/4 = 3 × 180/4 = 135, therefore (40 − 50) is the Q3 class.
Q3 = l + (w/f)(3N/4 − C.F.) = 46.25
Quartile deviation = (Q3 − Q1)/2 = (46.25 − 25)/2 = 10.625
Coefficient of Q.D. = (Q3 − Q1)/(Q3 + Q1) = (46.25 − 25)/(46.25 + 25) = 21.25/71.25 = 0.2982

3.6 Mean Deviation

• We have seen that the range is calculated from the extreme observations only, and that in the quartile deviation the extreme 25 % of the observations on either side are left out while calculating Q1 and Q3.
• In both cases only a few observations are considered in the calculation and the remaining observations are ignored, so neither measure gives a clear picture of all the observations.
• Here we discuss a measure of dispersion which takes into account all the observations of the data set and is calculated from an average. Such measures, calculated from an average, are known as "average deviations".

Definition : The arithmetic mean of the absolute deviations from any average (mean, median or mode) is called the Mean Deviation (M.D.) about the respective average.

i) Mean deviation (M.D.) about the mean (x̄) :
M.D. = (1/n) Σ |x_i − x̄|          (for raw data or individual observations)
M.D. = (1/N) Σ f_i |x_i − x̄|      (for frequency distribution), where N = Σ f_i
Coefficient of M.D. about mean = M.D. about mean / Mean

ii) Mean deviation (M.D.) about the median :
M.D. (about median) = (1/n) Σ |x_i − Median|        (for raw data or individual observations)
M.D. (about median) = (1/N) Σ f_i |x_i − Median|    (for frequency distribution)
Coefficient of M.D. about median = M.D. about median / Median

iii) Mean deviation (M.D.) about the mode :
M.D. (about mode) = (1/n) Σ |x_i − Mode|            (for raw data or individual observations)
M.D. (about mode) = (1/N) Σ f_i |x_i − Mode|        (for frequency distribution)
Note : Coefficient of M.D. about mode = M.D. about mode / Mode

Example : Find the mean deviation about the mean and about the median, with their coefficients, for the data 68, 69, 70, 70, 72, 73, 75.

Solution : i) M.D. and coefficient of M.D. about the mean :
Mean x̄ = Σ x_i / n = 497/7 = 71
Σ |x_i − x̄| = 14
M.D. about mean = (1/n) Σ |x_i − x̄| = 14/7 = 2
Coefficient of M.D. about mean = M.D. about mean / Mean = 2/71 = 0.0281

ii) M.D. and coefficient of M.D. about the median :
To obtain the median we arrange the data in ascending order :
68, 69, 70, [70], 72, 73, 75
Here n = 7
Median = the value of the ((n + 1)/2)th observation = the 4th observation = 70
Σ |x_i − Median| = 13
M.D. about median = (1/n) Σ |x_i − Median| = 13/7 = 1.8571
Coefficient of M.D. about median = M.D. about median / Median = 1.8571/70 = 0.0265
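The mean-deviation definitions can be sketched directly in Python; the check below reuses the worked data 68, 69, 70, 70, 72, 73, 75 from the example above.

```python
# Sketch of mean deviation about an average (mean, median or mode):
# the arithmetic mean of absolute deviations from that average.

def mean_deviation(data, about):
    return sum(abs(x - about) for x in data) / len(data)

data = [68, 69, 70, 70, 72, 73, 75]
mean = sum(data) / len(data)               # 497/7 = 71
median = sorted(data)[len(data) // 2]      # middle (4th) observation = 70

md_mean = mean_deviation(data, mean)       # 14/7 = 2.0
md_median = mean_deviation(data, median)   # 13/7 ≈ 1.8571
print(md_mean, md_median)
print(md_mean / mean, md_median / median)  # the two coefficients of M.D.
```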
Example : For the given frequency distribution (N = 10), find the mean deviation about the mean and about the median, with their coefficients.

Solution : i) M.D. and coefficient of M.D. about the mean :
Coefficient of M.D. about mean = M.D. about mean / Mean = 0.3894

ii) M.D. and coefficient of M.D. about the median :
Median class : the class containing the (N/2)th, i.e. 5th, observation; the median class is 5 − 7.
Median = l + (w/f)(N/2 − C.F.), with l = 5 and w = 2.
Here Σ f_i |x_i − Median| = 18
M.D. about median = (1/N) Σ f_i |x_i − Median| = 18/10 = 1.8
Coefficient of M.D. about median = M.D. about median / Median

3.7 Mean Square Deviation

• Suppose u = x − a is the deviation taken from an arbitrary reference point 'a'. To remove the sign of the deviation we square it; using u we can develop a measure of dispersion which is better than the mean deviation.
• The arithmetic mean of u² is known as the mean square deviation, in short M.S.D., and is given by :

i) For raw data or individual observations :
M.S.D. = (1/n) Σ (x_i − a)²

ii) For frequency distribution :
M.S.D. = (1/N) Σ f_i (x_i − a)², where N = Σ f_i

• For different values of 'a' we get different values of the M.S.D. This creates a problem in measuring dispersion.

3.8 Standard Deviation and Root Mean Square Deviation

a) Variance : The arithmetic mean of the squares of the deviations taken from the arithmetic mean is called the variance and is given by :

Variance or var(x) = (1/n) Σ (x_i − x̄)²        (for raw data or individual observations)
Variance or var(x) = (1/N) Σ f_i (x_i − x̄)²    (for frequency distribution)

b) Standard deviation : The positive square root of the mean of the squares of the deviations taken from the arithmetic mean is called the Standard Deviation (S.D.). In other words, the positive square root of the variance is called the standard deviation or root mean square deviation. It is denoted by σ (sigma).

σ (S.D.) = √[(1/n) Σ (x_i − x̄)²]        (for raw data or individual observations)
σ (S.D.) = √[(1/N) Σ f_i (x_i − x̄)²]    (for frequency distribution)

Note : σ (S.D.) = √var(x) is the relation between S.D. and variance.

Simplified form of the above formulae (obtained by expanding (x_i − x̄)² = x_i² − 2 x_i x̄ + x̄² and summing over all observations) :

i) For raw data or individual observations :
σ² = (1/n) Σ x_i² − x̄²

ii) For frequency distribution :
σ² = (1/N) Σ f_i x_i² − ((1/N) Σ f_i x_i)²
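A quick sketch confirming that the definitional and simplified variance formulas agree; the data are player X's runs from the introduction.

```python
# Sketch: definitional variance (1/n)Σ(x - x̄)² versus the simplified
# form (1/n)Σx² - x̄², which expands the square and sums term by term.
import math

def variance(data):
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / n       # definitional form

def variance_simplified(data):
    n = len(data)
    m = sum(data) / n
    return sum(x * x for x in data) / n - m ** 2     # simplified form

data = [94, 98, 100, 102, 106]       # player X's runs (mean 100)
v1, v2 = variance(data), variance_simplified(data)
sd = math.sqrt(v1)                   # standard deviation σ = √var(x)
print(v1, v2, sd)
```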
• Out of all the measures of dispersion, the standard deviation satisfies most of the necessary conditions of a good measure.

c) Coefficient of variation : The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage :

C.V. = (S.D./x̄) × 100 = (σ/x̄) × 100 %    ...(2)

Remarks :
1) The coefficient of variation is always expressed as a percentage.
2) In many cases the value of σ/x̄ is too small; for convenience it is multiplied by 100.

Example : Find the standard deviation of the first n natural numbers.

Solution : We know that
σ² = (1/n) Σ x_i² − x̄²    ...(1)
We have to find the standard deviation of the first n natural numbers, i.e. 1, 2, 3, …, n.
Sum of the first n natural numbers : Σ x_i = 1 + 2 + … + n = n(n + 1)/2
∴ x̄ = Σ x_i / n = (n + 1)/2
Sum of the squares of the first n natural numbers : Σ x_i² = n(n + 1)(2n + 1)/6
Equation (1) becomes
σ² = (n + 1)(2n + 1)/6 − ((n + 1)/2)²
   = ((n + 1)/2) [ (2(2n + 1) − 3(n + 1))/6 ]
   = ((n + 1)/2) [ (4n + 2 − 3n − 3)/6 ]
   = (n + 1)(n − 1)/12
∴ σ² = (n² − 1)/12 and σ = √((n² − 1)/12)

Example : For the given frequency distribution, x̄ = 18 and σ² = 56.4. Find the standard deviation and the coefficient of variation.

Solution :
From (1), σ (S.D.) = √56.4 = 7.5099
From (2), C.V. = (σ/x̄) × 100 = (7.5099/18) × 100 = 41.7222 %
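The closed form σ² = (n² − 1)/12 derived above can be checked numerically against a direct computation:

```python
# Sketch verifying σ = √((n² - 1)/12) for the first n natural numbers.
import math

def sd_direct(n):
    xs = range(1, n + 1)
    mean = (n + 1) / 2                    # x̄ = (n + 1)/2
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_formula(n):
    return math.sqrt((n * n - 1) / 12)    # σ² = (n² - 1)/12

for n in (5, 10, 100):
    assert math.isclose(sd_direct(n), sd_formula(n))
print(sd_formula(10))
```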
Example : The runs scored by two batsmen A and B are given. Find which batsman is more consistent.

Solution : i) For batsman A :
x̄ = 50 + 58/10 = 50 + 5.8 = 55.8
σ (S.D.) for batsman A = √317.76 = 17.82
Coefficient of variation for batsman A : C.V. = (σ/A.M.) × 100 = (17.82/55.8) × 100 = 31.94 %

ii) For batsman B :
x̄ = 55.4
σ (S.D.) for batsman B = 24.30
Coefficient of variation for batsman B : C.V. = (σ/A.M.) × 100 = (24.30/55.4) × 100 = 43.86 %

C.V. of batsman A is less than that of batsman B.
∴ Batsman A is more consistent.
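The consistency comparison rests only on the C.V. formula. A sketch with the means and standard deviations recovered from the batsmen example (treat the exact figures as approximate):

```python
# Sketch of the coefficient-of-variation comparison for the two batsmen:
# C.V. = (σ / mean) × 100; the smaller C.V. indicates more consistency.

def coefficient_of_variation(mean, sd):
    return sd / mean * 100

cv_a = coefficient_of_variation(55.8, 17.82)   # batsman A
cv_b = coefficient_of_variation(55.4, 24.30)   # batsman B
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)
```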


Example : Compare the variability of data A and data B.

Solution : Data A :
σ (S.D.) = 11.105
C.V. = (σ/x̄) × 100 = (11.105/532.866) × 100 = 2.0840 %

Data B : the C.V. of data B is computed in the same way.

C.V. of data A > C.V. of data B.
∴ Data B is more consistent than data A, i.e. data A has greater variability.

3.9 Moments

• We have studied several aspects of frequency distributions, such as average and dispersion. In order to study a few more aspects, such as symmetry and the shape of the frequency distribution, moments are useful.
• The quantity (1/n) Σ (x_i − a)^r is called the r-th moment (or moment of order r) about 'a'.

a) Raw Moments :
Moments about the origin, or about any point different from the mean, are known as raw moments.
The r-th moment, or moment of order r, about a point is denoted by μ'r and is given by :

i) About the origin :
μ'r = (1/n) Σ x_i^r            (for individual observations)
μ'r = (1/N) Σ f_i x_i^r        (for frequency distribution)

ii) About any observation A :
μ'r = (1/n) Σ (x_i − A)^r      (for individual observations)
μ'r = (1/N) Σ f_i (x_i − A)^r  (for frequency distribution)

Substituting r = 0, 1, 2, 3, 4 in the moments about any point A, we get :
μ'0 = (1/N) Σ f_i = 1
μ'1 = (1/N) Σ f_i (x_i − A) = x̄ − A    ...(3.9.1)
μ'2 = (1/N) Σ f_i (x_i − A)² = mean square deviation
μ'3 = (1/N) Σ f_i (x_i − A)³
μ'4 = (1/N) Σ f_i (x_i − A)⁴

b) Central Moments :
Moments about the mean are known as central moments. The central moment of order r (or the r-th order moment) is denoted by μr and is given by :
μr = (1/n) Σ (x_i − x̄)^r        (for individual observations)
μr = (1/N) Σ f_i (x_i − x̄)^r    (for frequency distribution)

Substituting r = 0, 1, 2, 3, 4 in the moment-about-mean formula for a frequency distribution, we get :
μ0 = (1/N) Σ f_i = 1
μ1 = (1/N) Σ f_i (x_i − x̄) = 0
μ2 = (1/N) Σ f_i (x_i − x̄)² = variance = σ²
μ3 = (1/N) Σ f_i (x_i − x̄)³
μ4 = (1/N) Σ f_i (x_i − x̄)⁴

Relation between Raw and Central Moments

We know that
μr = (1/N) Σ f_i (x_i − x̄)^r
From equation (3.9.1), x̄ = A + μ'1, so
x_i − x̄ = (x_i − A) + (A − x̄) = (x_i − A) − μ'1
Using the binomial expansion,
μr = μ'r − rC1 μ'(r−1) μ'1 + rC2 μ'(r−2) (μ'1)² − … + (−1)^r (μ'1)^r
Substituting r = 2, 3, 4 and simplifying, we get :
μ2 = μ'2 − (μ'1)²
μ3 = μ'3 − 3μ'2 μ'1 + 2(μ'1)³
μ4 = μ'4 − 4μ'3 μ'1 + 6μ'2 (μ'1)² − 3(μ'1)⁴
Thus :
μ2 = μ'2 − (μ'1)²
μ3 = μ'3 − 3μ'2 μ'1 + 2(μ'1)³
μ4 = μ'4 − 4μ'3 μ'1 + 6μ'2 (μ'1)² − 3(μ'1)⁴

Note :
i) The expression for μr contains r terms.
ii) The first term in the expression for μr is positive, and the terms alternate in sign.
iii) The sum of the coefficients on the R.H.S. of each of the relations is zero.
iv) The last term in the expression for μr involves (μ'1)^r.

Sheppard's Correction for Central Moments

• While computing the arithmetic mean and S.D. of the frequency distribution of a continuous variable, we assume that all the items in a class interval are concentrated at the midpoint of the class. The same assumption is made for computing moments. This enables us to compute the moments, but it introduces some amount of error.
• In the case of odd-order moments the errors tend to cancel, so their sum is small and negligible. Therefore odd-order moments need not be corrected.
• On the other hand, while computing even-order central moments the errors are raised to an even power, hence all the errors are effectively positive and their sum is considerably large. In this case W. F. Sheppard suggested the following corrections, where h is the class width :

μ2 (corrected) = μ2 − h²/12
μ3 (corrected) = μ3
μ4 (corrected) = μ4 − (h²/2) μ2 + (7/240) h⁴

Skewness

• A frequency distribution is symmetric about the value 'a' if the corresponding frequency curve is symmetric about 'a', i.e. the ordinate at x = a divides the frequency curve into two equal parts (Fig. 3.9.1(a)).
• Skewness is a lack of symmetry, or departure from symmetry.
• Depending on the direction of the departure from symmetry, it is divided into two types :
i) Positive skewness  ii) Negative skewness.

i) Positive skewness :
• If the mean lies to the right side of the mode, or the curve is elongated (or stretched) towards the right side, then the distribution is said to be positively skewed (Fig. 3.9.1(b)).
• In this case we observe that mode < median < mean.

ii) Negative skewness :
• If the mean lies to the left side of the mode, or the curve is elongated (or stretched) towards the left side, then the distribution is said to be negatively skewed (Fig. 3.9.1(c)).
• In this case we observe that mean < median < mode.

Fig. 3.9.1 : (a) Symmetric : Mode = Median = Mean, (b) Positively skewed : Mode < Median < Mean, (c) Negatively skewed : Mean < Median < Mode

Different measures of skewness are :
i) Skewness = 3(Mean − Median)/Standard deviation
ii) Coefficient of skewness β1 = μ3²/μ2³

3.10 Kurtosis

• We have studied various aspects of the comparison of frequency distributions, namely average, dispersion and symmetry. However, these aspects are not enough for comparison.
• Two frequency distributions may have the same average, dispersion and the same amount of skewness, yet they may differ in the relative height of the curve.
• Observe Fig. 3.10.1, which contains three curves C1, C2, C3 that are symmetrical about the mean and have the same range.
• Kurtosis is the property of a distribution which expresses the peakedness or flatness of its curve.

Fig. 3.10.1 : C2 - (Leptokurtic curve), C1 - (Mesokurtic curve), C3 - (Platykurtic curve)

Kurtosis is measured by the coefficient β2, given by β2 = μ4/μ2².

a) Mesokurtic curve (normal curve) : The curve which is neither flat nor peaked is called a mesokurtic curve or normal curve. In this case β2 = 3 or γ2 = 0. The curve C1 in Fig. 3.10.1 is mesokurtic.
b) Platykurtic curve : The curve which is flatter, i.e. has less of a peak than the normal curve, is called a platykurtic curve. In this case β2 < 3 or γ2 < 0. The curve C3 in Fig. 3.10.1 is platykurtic.
c) Leptokurtic curve : The curve which is more peaked than the normal curve is called a leptokurtic curve. In this case β2 > 3 or γ2 > 0. The curve C2 in Fig. 3.10.1 is leptokurtic.

Example 3.10.1 : The first four moments about the point A = 4 are μ'1 = 2, μ'2 = 20, μ'3 = 40, μ'4 = 100.
i) Find the four central moments. ii) Find the mean and standard deviation. iii) Find the coefficients of skewness and kurtosis.

Solution : Given A = 4 and the first four raw moments μ'1 = 2, μ'2 = 20, μ'3 = 40, μ'4 = 100.

i) To find the first four central moments :
μ1 = 0
μ2 = μ'2 − (μ'1)² = 20 − 2² = 16
μ3 = μ'3 − 3μ'2 μ'1 + 2(μ'1)³ = 40 − 3 × 20 × 2 + 2 × (2)³ = −64
μ4 = μ'4 − 4μ'3 μ'1 + 6μ'2 (μ'1)² − 3(μ'1)⁴ = 100 − 4 × 2 × 40 + 6 × 20 × (2)² − 3 × (2)⁴ = 212
The first four central moments are μ1 = 0, μ2 = 16, μ3 = −64, μ4 = 212.

ii) To find the mean and standard deviation :
We know that μ'1 = x̄ − A
∴ x̄ = μ'1 + A = 2 + 4 = 6
∴ Mean = 6
S.D. = √μ2 = √16 = 4

iii) To find the coefficients of skewness and kurtosis :
Coefficient of skewness β1 = μ3²/μ2³ = (−64)²/16³ = 4096/4096 = 1
Coefficient of kurtosis β2 = μ4/μ2² = 212/16² = 212/256 = 0.8281
Since β2 < 3, the distribution is platykurtic.
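The raw-to-central moment relations and the β coefficients can be sketched and checked against Example 3.10.1:

```python
# Sketch of μ2 = μ'2 - μ'1², μ3 = μ'3 - 3μ'2μ'1 + 2μ'1³,
# μ4 = μ'4 - 4μ'3μ'1 + 6μ'2μ'1² - 3μ'1⁴, plus β1 = μ3²/μ2³, β2 = μ4/μ2².

def central_moments(m1, m2, m3, m4):
    """First four central moments from raw moments about any point A."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

# Raw moments from Example 3.10.1: μ'1 = 2, μ'2 = 20, μ'3 = 40, μ'4 = 100
mu2, mu3, mu4 = central_moments(2, 20, 40, 100)
beta1 = mu3 ** 2 / mu2 ** 3     # coefficient of skewness
beta2 = mu4 / mu2 ** 2          # coefficient of kurtosis
print(mu2, mu3, mu4, beta1, beta2)   # 16 -64 212 1.0 0.828125
```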


Example : For the given frequency distribution (N = 244), taking step deviations u_i about an arbitrary point A :
i) Find the first four moments about A. ii) Find the first four central moments. iii) Find the coefficients of skewness and kurtosis.

Solution :
i) To find the four moments about the arbitrary point A :
μ'1 = Σ f_i u_i / N = 428/244 = 1.75
μ'2 = Σ f_i u_i² / N = 1426/244 = 5.84
μ'3 = Σ f_i u_i³ / N = 3554/244 = 14.56
μ'4 = Σ f_i u_i⁴ / N = 15766/244 = 64.61

ii) To find the four central moments :
μ1 = 0
μ2 = μ'2 − (μ'1)² = 5.84 − (1.75)² = 2.78
μ3 = μ'3 − 3μ'2 μ'1 + 2(μ'1)³ = 14.56 − 3 × 5.84 × 1.75 + 2 × (1.75)³ = −5.38
μ4 = μ'4 − 4μ'3 μ'1 + 6μ'2 (μ'1)² − 3(μ'1)⁴ = 64.61 − 4 × 1.75 × 14.56 + 6 × 5.84 × (1.75)² − 3 × (1.75)⁴ = 41.86

iii) To find the coefficients of skewness and kurtosis :
We know that the coefficient of skewness (β1) and the coefficient of kurtosis (β2) are given by
β1 = μ3²/μ2³ and β2 = μ4/μ2²
β2 = 41.86/(2.78)² = 5.42
β2 > 3
∴ The distribution is leptokurtic.

3.11 Bivariate Distribution

The distribution of one variate x is known as a univariate distribution. A distribution which involves more than one variable is known as a bivariate distribution.
Suppose X and Y are two variables. Whenever X and Y are measured on the same event, they are likely to be correlated, for example the rainfall (X) and the agricultural production (Y). We record the values of X and Y for each village of a district. Suppose this gives a set of pairs (x1, y1), (x2, y2), …, (xn, yn), where x_i is the rainfall and y_i the agricultural production of the i-th village. This set of n pairs is a bivariate data set.

3.12 Correlation

In real life we come across many situations where two variables are correlated, for example work and production, income and expense, price and demand, etc.
In all these cases we get an idea about the two variables; such interrelated variables are called correlated variables. A linear relation between two such variables is called correlation.

Types of correlation :
i) Positive correlation : If variable X increases, variable Y also increases, or if variable X decreases, variable Y also decreases; that type of correlation is called positive correlation.
e.g. If the work (X) in a factory increases, the production (Y) increases, and vice versa. (Fig. 3.12.1 : Positive correlation)
ii) Negative correlation : If variable X increases, variable Y decreases, or if variable X decreases, variable Y increases; that type of correlation is called negative correlation.
e.g. If the supply of a product is more, its price decreases, and if there is a shortage of the product, its price increases. Hence there is a negative correlation between the supply of a product and its price.

There are three measures of correlation : i) Scatter diagram, ii) Karl Pearson's coefficient of correlation, iii) Rank correlation.

3.13 Scatter Diagram

When we plot correlated variables on the XY plane using graph paper, taking one variable on the X axis and the other on the Y axis, the representation is called a scatter diagram.

Fig. 3.13.1 : (a) Positive perfect correlation, (b) Positive correlation, (c) Negative perfect correlation, (d) Negative correlation

3.14 Karl Pearson's Coefficient of Correlation

To measure the intensity or degree of linear relationship between two variables, Karl Pearson developed a formula called the correlation coefficient.
a) The correlation coefficient between two variables x and y is denoted by r(x, y) and is defined as

r(x, y) = cov(x, y) / (σx σy)

where cov(x, y) = covariance of (x, y) = (1/n) Σ (x_i − x̄)(y_i − ȳ)

Now cov(x, y) = (1/n) Σ (x_i − x̄)(y_i − ȳ)
= (1/n) [Σ x_i y_i − x̄ Σ y_i − ȳ Σ x_i + n x̄ ȳ]
= (1/n) Σ x_i y_i − x̄ ȳ
∴ cov(x, y) = (1/n) Σ x_i y_i − (Σ x_i / n)(Σ y_i / n)

Also σx² = (1/n) Σ x_i² − x̄², and similarly σy² = (1/n) Σ y_i² − ȳ².
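A sketch of the correlation coefficient using the simplified covariance formula derived above; the data pairs are illustrative, not from the text.

```python
# Sketch of r(x, y) = cov(x, y)/(σx σy) with
# cov(x, y) = (1/n)Σ x y - x̄ ȳ and σx² = (1/n)Σ x² - x̄².
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    sx = math.sqrt(sum(x * x for x in xs) / n - mx ** 2)
    sy = math.sqrt(sum(y * y for y in ys) / n - my ** 2)
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]        # illustrative data, not from the book
ys = [2, 4, 6, 8, 10]       # exactly linear in x, so r should be +1
print(pearson_r(xs, ys))
```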



b) Method of step deviation :
If u_i = (x_i − A)/h and v_i = (y_i − B)/k, then
cov(u, v) = (1/n) Σ u_i v_i − ū v̄
σu² = (1/n) Σ u_i² − ū², σv² = (1/n) Σ v_i² − v̄²
and r(u, v) = cov(u, v)/(σu σv)
Note that r(x, y) = r(u, v), i.e. the correlation coefficient is unchanged by a change of origin and scale.

Note :
1) If r = 0, there is a lack of (linear) relationship between x and y.
2) If r = ±1, the relationship between x and y is very strong (perfect).

c) Correlation coefficient for a bivariate frequency distribution :
When the data are presented as a bivariate frequency distribution, then also
r(x, y) = cov(x, y)/(σx σy)
where
cov(x, y) = (1/N) Σ f_ij x_i y_j − (Σ f_i x_i / N)(Σ f_j y_j / N)
σx² = (1/N) Σ f_i x_i² − x̄², σy² = (1/N) Σ f_j y_j² − ȳ²
with f_i, f_j the marginal frequencies and N the total frequency.

d) Method of step deviation :
If u_i = (x_i − A)/h and v_j = (y_j − B)/k, then
cov(u, v) = (1/N) Σ f_ij u_i v_j − ū v̄
σu² = (1/N) Σ f_i u_i² − ū², σv² = (1/N) Σ f_j v_j² − v̄², where ū = Σ f_i u_i / N and v̄ = Σ f_j v_j / N
Substituting all the above values, we can write r(x, y) = r(u, v) = cov(u, v)/(σu σv).

3.15 Rank Correlation

One of the best measures of correlation is Karl Pearson's coefficient of correlation, but there is a difficulty in measuring the correlation between qualitative characteristics. We can arrange the items in ascending or descending order according to their merit; this arrangement is called ranking.
The number indicating the position in the ranking is called the rank.
The Karl Pearson correlation coefficient between these ranks is called Spearman's rank correlation coefficient; it is denoted by R and is given by

R = 1 − 6 Σ d_i² / (n(n² − 1))

where d_i is the difference between the two ranks of the i-th item and n is the number of pairs.
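Spearman's formula can be sketched as follows, assuming no tied values; the two mark lists are illustrative, not taken from the book's table.

```python
# Sketch of Spearman's rank correlation R = 1 - 6Σd²/(n(n² - 1)),
# valid when there are no tied values in either series.

def ranks(values):
    """Rank 1 = smallest value; assumes all values are distinct."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

maths   = [35, 23, 47, 17, 10, 43, 9, 6, 28]    # illustrative marks
physics = [30, 33, 45, 23,  8, 49, 12, 4, 31]
print(spearman_r(maths, physics))   # → 0.9
```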
Example : Calculate the Karl Pearson coefficient of correlation for the given data.

Solution : Let u_i = x_i − A and v_i = y_i − B, with B = 24. From the table :
cov(u, v) = (1/n) Σ u_i v_i − ū v̄ = 1.75
σu² = (1/n) Σ u_i² − ū² = 2.9166
σv² = (1/n) Σ v_i² − v̄² = 3.1666 − 0.25 = 2.9166
∴ r(x, y) = r(u, v) = cov(u, v)/(σu σv) = 1.75/2.9166 = 0.6

Example : The marks of students in two subjects are given. Calculate the rank correlation coefficient.

Solution : Let x, y represent the marks in the two subjects. Arrange x in increasing order and write the corresponding y in front of x. Assign ranks to both series, compute the rank differences d_i, and apply
R = 1 − 6 Σ d_i² / (n(n² − 1))
Example 3.15.3 : Calculate the coefficient of correlation for the given data.

Solution : Let A = 68 and B = 70, so that u_i = x_i − 68 and v_i = y_i − 70. From the table :
cov(u, v) = (1/n) Σ u_i v_i − ū v̄ = 140 − 12.554 = 127.446
σu² = 128.512, ∴ σu = 11.336
σv² = 149.7, ∴ σv = 12.2353
∴ r(x, y) = r(u, v) = cov(u, v)/(σu σv) = 127.446/(11.336 × 12.2353) = 0.9189

3.16 Regression

As we discussed, there is a correlation between rainfall and agricultural production. If we want to estimate this year's agricultural production when we know this year's rainfall, regression helps us to calculate such a value.
"A method of estimating the value of one variable when the value of the other is known, the two variables being correlated, is called regression."

3.17 Lines of Regression

If there are n observations on two variables X and Y, then there are two regression lines :
i) the regression line of X on Y,  ii) the regression line of Y on X.
When two variables are correlated we can plot them on a graph, and when these points lie in a nearly straight strip, the line through them represents the ideal variation; this line is called the line of best fit.

i) Regression line of X on Y :
The distances of the points are measured along the X axis, and if we minimise these deviations 'd' of the points from the line, we get the line of regression of X on Y.
It is given by X = a + bY.
This line is used to find the value of X when the value of Y is given.

The equation of the regression line of X on Y is
(x − x̄) = b_xy (y − ȳ)

ii) Regression line of Y on X :
The distances of the points are measured along the Y axis, and if we minimise these deviations 'd' of the points from the line, we get the line of regression of Y on X (Fig. 3.17.1, Fig. 3.17.2).
It is given by y = a + bx.
This line is used to find the value of Y when the value of X is given.
The equation of the regression line of Y on X is
(y − ȳ) = b_yx (x − x̄)

3.18 Regression Coefficients

i) Regression coefficient of X on Y :
The coefficient of y in the equation of the regression line of x on y, i.e. x = a + by, is called the regression coefficient of x on y. It is denoted by b_xy and is given by
b_xy = r (σ_x / σ_y)

ii) Regression coefficient of Y on X :
The coefficient of x in the equation of the regression line of y on x, i.e. y = a + bx, is called the regression coefficient of y on x. It is denoted by b_yx and is given by
b_yx = r (σ_y / σ_x)

Note :
1) The regression coefficients can also be expressed as
b_yx = [Σxy/n − (Σx/n)(Σy/n)] / [Σx²/n − (Σx/n)²]
b_xy = [Σxy/n − (Σx/n)(Σy/n)] / [Σy²/n − (Σy/n)²]
2) The coefficient of correlation is the geometric mean of the regression coefficients b_xy and b_yx, i.e. r = ±√(b_xy · b_yx).
3) If one regression coefficient is greater than one, then the other must be less than one.

Example : The two lines of regression are 3x + 2y = 26 and 6x + y = 31. Find i) the means of x and y, ii) the coefficient of correlation r.
Solution : i) To find the means x̄ and ȳ :
The means x̄ and ȳ satisfy the equations of both regression lines, so solving the given lines simultaneously yields x̄ and ȳ :
3x + 2y = 26 ...(1)
6x + y = 31 ...(2)
Solving equations (1) and (2) simultaneously, we get
x̄ = 4, ȳ = 7
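Part i) can be checked mechanically: since the means satisfy both regression lines, solving the pair of equations recovers (x̄, ȳ). A minimal sketch, using Cramer's rule on the coefficients of the worked example:

```python
# The means (x_bar, y_bar) satisfy both regression lines, so they are the
# solution of the 2x2 linear system; here solved by Cramer's rule.
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2."""
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

# 3x + 2y = 26 and 6x + y = 31, as in the worked example
x_bar, y_bar = solve_2x2(3, 2, 26, 6, 1, 31)
print(x_bar, y_bar)  # 4.0 7.0
```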

ii) To find the correlation coefficient (r) :
Let the equation 3x + 2y = 26 be the regression line of x on y. We convert it to the form x = a + by :
x = 26/3 − (2/3)y
In this equation the coefficient of y gives the regression coefficient b_xy :
b_xy = −2/3
Let the equation 6x + y = 31 be the regression line of y on x. We convert it to the form y = a + bx :
y = 31 − 6x
In this equation the coefficient of x gives the regression coefficient b_yx :
b_yx = −6
We know that r² = b_xy · b_yx = (−2/3)(−6) = 4 > 1.
But −1 ≤ r ≤ 1, so this assignment of the lines is impossible.
We therefore consider 3x + 2y = 26 as the regression line of y on x :
y = 13 − (3/2)x, so b_yx = −3/2
and 6x + y = 31 as the regression line of x on y :
x = 31/6 − (1/6)y, so b_xy = −1/6
Then r² = b_yx · b_xy = (−3/2)(−1/6) = 1/4, and since both regression coefficients are negative,
r = −0.5

Example : Find the regression lines for the given data, and estimate y when x = 12 and x when y = 8.
Solution : Consider the table of the computations Σx, Σy, Σxy, Σx² and Σy² for the data.
Mean of x : x̄ = Σx/n = 5.6
Mean of y : ȳ = Σy/n = 8.4
Covariance between x and y :
cov(x, y) = Σxy/n − x̄ȳ = Σxy/n − (5.6)(8.4) = −8.04
σ_x² = Σx²/n − x̄² = 11.84
σ_y = 5.84, so σ_y² ≈ 34.11
b_yx = cov(x, y)/σ_x² = −8.04/11.84 = −0.679
b_xy = cov(x, y)/σ_y² = −8.04/34.11 = −0.236
i) Regression line of y on x :
(y − ȳ) = b_yx (x − x̄)
(y − 8.4) = −0.679(x − 5.6)
y = 12.202 − 0.679x ...(1)

TECHNICAL PHRICATIONS® . an wn-thruet for knowladee


TECHNICAL PUBLICATIONS® - an up-thrust for knowtedas
Statistics. : 3-43 Descriptive Statistics : Measures of Dispersion
Statistics _ 3-42 Descriptive Statistics : Measures of Dispersion

ii) Regression line of x on y :
(x − x̄) = b_xy (y − ȳ)
(x − 5.6) = −0.236(y − 8.4)
x = 7.582 − 0.236y ...(2)
iii) Value of y when x = 12 :
Put x = 12 in (1). We get y = 12.202 − 0.679(12) = 4.054.
iv) Value of x when y = 8 :
Put y = 8 in (2). We get x = 7.582 − 0.236(8) = 5.694.

3.19 Binomial and Multinomial Distributions

Let us consider an experiment involving n trials performed under identical conditions, each of which can result either in a success, whose probability is p, or in a failure, whose probability is q. Such a distribution is called binomial. As there are only two outcomes, p + q = 1.
Let us consider r successes out of n trials; hence (n − r) trials are failures.
∴ The probability of r successes in n trials is given by
p(r successes) = p{(ss...s) (r times)} × p{(ff...f) ((n − r) times)}
= (pp...p)(r times) × (qq...q)((n − r) times) = p^r q^(n−r)
The r successes and (n − r) failures can occur in nCr mutually exclusive ways.
∴ p[r successes] = nCr p^r q^(n−r), r = 0, 1, 2, ..., n
Consider the binomial expansion
(q + p)^n = q^n + nC1 q^(n−1) p + nC2 q^(n−2) p² + ... + p^n
The terms of the RHS give the probabilities of r = 0, 1, 2, ..., n successes. Hence the above probability distribution is called the binomial probability distribution. It is denoted by B(n, p, r) :
B(n, p, r) = nCr p^r q^(n−r)
Note : A detailed study of the binomial distribution appears in the next unit.

The multinomial distribution is a generalisation of the binomial distribution. If in an experiment there are more than two mutually exclusive outcomes, then it leads to the multinomial distribution.
We use the multinomial distribution for experiments in which the following conditions hold good :
- The experiment consists of repeated trials, e.g. rolling a die five times instead of just once.
- Each trial must be independent of the others.
- The probability of each outcome must be the same across all trials of the experiment.

3.20 Poisson Distribution

The Poisson distribution is a limiting case of the binomial distribution when n, the number of trials, is infinitely large, p, the probability of success, is very small, and np is finite.
The probability of r successes in a series of n independent trials is
B(n, p, r) = nCr p^r q^(n−r), r = 0, 1, 2, ...
Let z = np, so that p = z/n and q = 1 − z/n. Then
B(n, p, r) = [n(n − 1)(n − 2) ... (n − (r − 1)) / r!] (z/n)^r (1 − z/n)^(n−r)
As n → ∞ with np = z held fixed,
lim B(n, p, r) = (z^r e^(−z)) / r!   (since lim (1 − z/n)^n = e^(−z))
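The limit above can be seen numerically: holding z = np fixed while n grows, the binomial probabilities approach z^r e^(−z)/r!. A small sketch (z = 2 and r = 3 are arbitrary illustrative values):

```python
# Binomial pmf B(n, p, r) = nCr * p^r * q^(n-r) and its Poisson limit
# z^r * e^(-z) / r!, compared while z = np is held fixed.
from math import comb, exp, factorial

def binomial_pmf(n, p, r):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(z, r):
    return z ** r * exp(-z) / factorial(r)

z, r = 2.0, 3
for n in (10, 100, 10000):
    print(n, binomial_pmf(n, z / n, r))  # approaches the Poisson value
print(poisson_pmf(z, r))
```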

3.21 Uniform Distribution

A discrete random variable X taking the values 1, 2, 3, ..., n is said to have a uniform distribution if its pmf is given by
P(X = x) = 1/n, x = 1, 2, ..., n
where n is the parameter of the uniform distribution.
When P(X = x) = 1, this distribution is called the standard uniform distribution, shown in Fig. 3.21.1.

3.22 Exponential Distribution

A continuous random variable X taking non-negative values is said to have an exponential distribution with parameter λ > 0 if its pdf is given by
f(x) = λe^(−λx) for x > 0, and f(x) = 0 otherwise.
The exponential distribution is used to model the amount of time until some event occurs. Some examples that can be modelled using the exponential distribution are :
- the time between two customers entering a certain shop,
- the time between occurrences of earthquakes,
- the time gap between customer calls in a business.

3.23 Gaussian Distribution

The Gaussian distribution is also known as the normal distribution. It is the continuous distribution of a variable x (known as a random variable or normal variate). The Gaussian distribution can be derived from the binomial distribution when the number of trials n is very large and neither p nor q is very small.
In terms of a continuous random variable x with mean a and standard deviation σ, the Gaussian distribution is given by
y = f(x) = (1/(σ√(2π))) e^(−(x − a)²/(2σ²)) ... (3.23.1)
When represented on a graph it is a bell-shaped curve. The curve is symmetric about the ordinate x = a, i.e. symmetric about the mean, as shown in Fig. 3.23.1. In the Gaussian curve the mean, median and mode coincide, so the curve has one maximum point. The two tails of the curve extend to −∞ and +∞ in the negative and positive directions of the X axis, i.e. the curve is asymptotic to the X axis. The line x = a (the mean) divides the area under the curve above the X axis into two equal parts.
The total area under this curve is 1, i.e.
∫_{−∞}^{∞} f(x) dx = 1
When x lies between x₁ and x₂, the probability is represented by the area (shaded in Fig. 3.23.2) under the curve y = f(x), bounded by the X axis and the lines x = x₁ and x = x₂, and is given by
p(x₁ ≤ x ≤ x₂) = ∫_{x₁}^{x₂} f(x) dx = ∫_{x₁}^{x₂} (1/(σ√(2π))) e^(−(x − a)²/(2σ²)) dx ... (3.23.2)
Standard form : Put z = (x − a)/σ in equation (3.23.2). When the mean a = 0 and the standard deviation σ = 1, the random variable z is called the standard normal random variable and equation (3.23.2) becomes
p(z₁ ≤ z ≤ z₂) = (1/√(2π)) ∫_{z₁}^{z₂} e^(−z²/2) dz ... (3.23.3)
It is free from the two parameters a and σ, and it is useful for computing areas under the normal probability curve using the standard table. When the curve is divided into two equal parts by z = 0, then LHS area = RHS area = 0.5 (Fig. 3.23.3).

Table of areas :
Table 3.23.1 lists the cumulative areas under the standard normal curve for z = 0.00 to 3.49 in steps of 0.01 (for example, the entry for z = 0.60 is 0.7257 and for z = 1.50 is 0.9332); 0.5 is to be subtracted from each entry to obtain the area A(z) from 0 to z.
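The tabulated areas can be reproduced from the error function. A minimal sketch, where A(z) is the area from 0 to z used in the remarks:

```python
# Area under the standard normal curve from 0 to z:
# A(z) = Phi(z) - 0.5 = 0.5 * erf(z / sqrt(2)).
from math import erf, sqrt

def A(z):
    return 0.5 * erf(z / sqrt(2))

# spot-check against the printed table (entries minus 0.5)
for z in (0.6, 1.0, 1.5, 2.4):
    print(z, round(A(z), 4))  # 0.2257, 0.3413, 0.4332, 0.4918
```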
Remarks :
1) p(x₁ < x < x₂) = P(z₁ < z < z₂) = P(z₂) − P(z₁) = (area under the normal curve from 0 to z₂) − (area under the normal curve from 0 to z₁) = A(z₂) − A(z₁) (Fig. 3.23.4)
2) P(0 < x < x₁) = P(0 < z < z₁) = area under the normal curve from 0 to z₁ = A(z₁) (Fig. 3.23.5)
3) P(x > x₁) = P(z > z₁) = 0.5 − A(z₁) (Fig. 3.23.6)
4) P(x < −x₁) = P(z < −z₁) = 0.5 − A(z₁) (Fig. 3.23.7)
5) P(−x₁ < x < −x₂) = P(−z₁ < z < −z₂) = A(z₁) − A(z₂) (Fig. 3.23.8)
6) P(−x₁ < x < x₂) = P(−z₁ < z < z₂) = A(z₁) + A(z₂) (Fig. 3.23.9)
7) The mean deviation from the mean of the Gaussian distribution is (4/5)σ (approximately).

3.24 Log-Normal Distribution

A continuous distribution of a random variable X whose natural logarithm is normally distributed is known as the log-normal distribution. This distribution is also called the Galton distribution.
For example, if the random variable X = exp(Y) has a log-normal distribution, then Y = log X has a normal distribution.
The log-normal distribution is used mostly in finance, in the analysis of stock prices, and for quantities such as milk production by cows.
With a the mean and σ the standard deviation of ln X, the probability density function is given by
p(x) = (1/(xσ√(2π))) e^(−(ln x − a)²/(2σ²)), x > 0
The log-normal distribution is represented by the graph shown in Fig. 3.24.1. The curve is positively skewed, with a long right tail, due to low mean values and high variances in the random variable.

3.25 Chi-square Distribution

The chi-square distribution is mainly used in :
i) testing hypotheses,
ii) tests for the independence of two criteria of classification of qualitative data,
iii) tests for the goodness of fit of an observed distribution to a theoretical one.
The chi-square distribution is one of the important distributions in statistical theory. It is denoted by χ²_n, where n is the parameter of the distribution, called the "degrees of freedom". The variate χ² is the sum of squares of n independent standard normal variables N(0, 1).
Let X₁, X₂, X₃, ..., X_n be n independent N(0, 1) variables; then the chi-square variate is given by
χ²_n = X₁² + X₂² + ... + X_n²
where n is the degrees of freedom.
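The defining property (a sum of squares of n independent N(0, 1) variables) can be illustrated by simulation; the theoretical mean of χ²_n is n. A minimal sketch:

```python
# Simulate the chi-square variate as a sum of squares of n independent
# N(0, 1) variables; its sample mean should be close to n, the theoretical
# mean of the chi-square distribution with n degrees of freedom.
import random

random.seed(0)  # reproducible illustration
n, trials = 5, 20000
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(trials)]
sample_mean = sum(samples) / trials
print(sample_mean)  # close to n = 5
```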


3.26 Additive Property of Chi-square Distribution

If Y₁ and Y₂ are two independent chi-square (χ²) variates with n₁ and n₂ degrees of freedom respectively, then Y₁ + Y₂ has a chi-square (χ²) distribution with (n₁ + n₂) degrees of freedom.

3.27 Hypothesis

A hypothesis is a claim about population parameters such as the mean or variance, e.g.
1) Averages of two vehicles.
2) Proportions of passed students in two subjects.
3) Proportions of placements of students of two branches.
4) Proportions of crop production in two different fields.
These claims, stated in terms of population parameters or distributions, are called hypotheses.
Definition : A hypothesis is a statement or assertion about a statistical distribution or an unknown parameter of a statistical distribution.

3.28 Null and Alternative Hypothesis

A test begins by considering the null and alternative hypotheses. These hypotheses contain opposite viewpoints, i.e. if one is rejected, the other is accepted.
Null hypothesis (H₀) :
According to R. A. Fisher, the null hypothesis is defined as a hypothesis of no difference, and it is denoted by H₀.
e.g. H₀ : μ₁ = μ₂. This hypothesis states that there is no difference between the two population means.
Alternative hypothesis (H₁) :
The alternative hypothesis is the hypothesis accepted when the null hypothesis is rejected. In other words, the alternative hypothesis is complementary to the null hypothesis, and it is denoted by H₁.
e.g. if H₀ : μ₁ = μ₂, then the alternative hypothesis may be
H₁ : μ₁ ≠ μ₂, or
H₁ : μ₁ < μ₂, or
H₁ : μ₁ > μ₂.

3.29 One-Sided or Two-Sided Hypothesis

The nature of the hypothesis decides its type, whether it is a one-sided hypothesis or a two-sided hypothesis.
One-sided : Hypotheses such as H₁ : μ₁ > μ₂, H₁ : P < 0.5, H₁ : σ₁ < σ₂, etc. are called one-sided hypotheses.
Two-sided : Hypotheses of the type H₁ : P ≠ 0.5, H₁ : σ₁ ≠ σ₂, H₁ : μ₁ ≠ μ₂, etc. are called two-sided hypotheses.

3.30 Errors (Type-I and Type-II)

Type-I and Type-II are the two errors defined in testing hypotheses.
Type-I error : H₀ is true but it is rejected.
e.g. consider the null hypothesis H₀ : product 'X' is good. If in reality product 'X' is a good product, i.e. H₀ is true, but the quality analyst says that the product is bad (the hypothesis is rejected), then there is a mistake; such a mistake is called a Type-I error.
Type-II error : H₀ is false but it is accepted.
e.g. consider the null hypothesis H₀ : product 'Y' is bad. If in reality product 'Y' is a good product, i.e. H₀ is false, but the quality analyst says that the product is bad (the hypothesis is accepted), then there is a mistake; such a mistake is called a Type-II error.
The following table will clarify the idea :
Decision \ Reality      H₀ true             H₀ false
Accept H₀               Correct decision    Type-II error
Reject H₀               Type-I error        Correct decision

3.31 Some Definitions

i) Test of hypothesis : A rule which leads to the decision of acceptance of H₀ or rejection of H₀ on the basis of random samples is called a test of hypothesis.
ii) One-sided and two-sided tests : The tests used for testing the null hypothesis are called one-sided or two-sided tests according as the alternative hypothesis is one-sided or two-sided.
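The Type-I error of Section 3.30 can be estimated by simulation: testing H₀ : μ = 0 on data generated under H₀, and rejecting when |z| > 1.96, should reject in about 5 % of repetitions. A minimal sketch:

```python
# Monte-Carlo estimate of the Type-I error rate of a two-sided z-test at
# the 5 % level of significance: samples are drawn with H0 true, so the
# observed rejection rate should be close to alpha = 0.05.
import random

random.seed(1)  # reproducible illustration
n, reps, rejections = 30, 4000, 0
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]  # H0: mean = 0, sigma = 1
    z = (sum(sample) / n) * n ** 0.5  # z = x_bar / (sigma / sqrt(n))
    if abs(z) > 1.96:
        rejections += 1
print(rejections / reps)  # close to 0.05
```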


iii) Test statistic : A function of the random sample observations which is used to test the null hypothesis H₀ is called a test statistic.
iv) Critical region : The set of values x₁, x₂, ..., x_n for which the null hypothesis H₀ is rejected is called the critical region or rejection region. The critical region is denoted by w and the acceptance region by w', as shown in Fig. 3.31.1.
v) Level of significance : The probability of rejecting the null hypothesis H₀ when it is true is called the level of significance. That is, it is the probability of committing a Type-I error. It is denoted by α.
If we try to minimise the level of significance, the probability of a Type-II error increases, so the level of significance cannot be made zero. However, we can fix it in advance at 0.01 or 0.05, i.e. 1 % or 5 %. In most cases it is 5 %.
vi) Test for goodness of fit of the χ² distribution : For a given frequency distribution we try to fit a probability distribution. There are several probability distributions; to identify which of them fits the given data, we have to test the appropriateness of the fit.
Let H₀ : the fitting of the probability distribution to the given data is proper (good).
The test based on the χ² distribution used to test this H₀ is called the χ² test of goodness of fit.
Suppose o₁, o₂, ..., o_i, ..., o_k is the set of observed frequencies and e₁, e₂, ..., e_i, ..., e_k are the corresponding expected (theoretical) frequencies, with
Σ o_i = Σ e_i = N
Suppose P = number of parameters estimated for fitting the probability distribution.
If H₀ is true, then the statistic
χ² = Σ_{i=1}^{k} (o_i − e_i)² / e_i
has a χ² distribution with (k − P − 1) degrees of freedom, which is the parameter of the chi-square distribution. The critical region at level of significance α is
χ² ≥ χ²_{k−P−1; α}
where χ²_{k−P−1; α} is the table value (critical value) corresponding to the degrees of freedom (k − P − 1) and the level of significance α, as shown in Fig. 3.31.1(a).
Note :
1) If the total of the cell frequencies is sufficiently large (greater than 50) and the expected frequencies are greater than or equal to 5 (i.e. e_i ≥ 5), then we can apply this test.
2) While fitting a probability distribution, or in finding the expected frequencies, if no parameter is estimated, then the value of P is zero.
3) When the expected frequency of a class is less than 5, the class is merged into a neighbouring class, along with its observed and expected frequencies, until the total of the expected frequencies becomes greater than or equal to five. This procedure is called "pooling the classes". In this case k is the number of class frequencies after pooling.
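The statistic and the pooling rule of Note 3 can be sketched in code; the frequencies below are illustrative, not from a worked example:

```python
# Chi-square goodness-of-fit statistic with "pooling of classes": classes
# are merged forward until the pooled expected frequency reaches 5, and a
# small leftover tail is merged into the last pooled class.
def pool_classes(observed, expected, min_e=5.0):
    pooled, o_acc, e_acc = [], 0.0, 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_e:
            pooled.append((o_acc, e_acc))
            o_acc = e_acc = 0.0
    if e_acc > 0:  # leftover classes with small expected frequency
        lo, le = pooled.pop()
        pooled.append((lo + o_acc, le + e_acc))
    return pooled

def chi_square_stat(observed, expected, min_e=5.0):
    pooled = pool_classes(observed, expected, min_e)
    return sum((o - e) ** 2 / e for o, e in pooled), len(pooled)

obs = [121, 61, 15, 3, 0]          # illustrative observed frequencies
exp_f = [121.3, 60.7, 15.2, 2.5, 0.3]  # illustrative expected frequencies
chi2, k = chi_square_stat(obs, exp_f)  # k = number of classes after pooling
print(chi2, k)
```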

_ TECHNICAL PUBLICA Tions® - an up-thrust for knowledge


TECHNIGAL PUBLIGA Tions® - an up-thrust for knowisica
Statistics 3-54 Descriptive Statistics: Measures of Dispersion
. 3-55 Descriptive Statistics : Measures of Dispersion
Statistics
Example : Two dice are thrown 10 times; getting a doublet is considered a success.
Solution :
Here n = 10,
p = probability of getting a doublet = 6/36 = 1/6,
q = probability of not getting a doublet = 1 − 1/6 = 5/6.
By the binomial distribution,
p(r) = 10Cr (1/6)^r (5/6)^(10−r), r = 0, 1, 2, ..., 10.

Solution : A die is thrown 10 times. The odd numbers are 1, 3, 5.
∴ Probability of getting an odd number p = 3/6 = 1/2, so q = 1/2.
We know that p(r) = nCr p^r q^(n−r).
i) Probability of 8 successes :
p(8) = 10C8 (1/2)⁸ (1/2)² = (10!/(2! 8!)) (1/2)¹⁰ = ((10 × 9)/2) (1/2)¹⁰ = 45/1024
ii) Probability of getting at least 6 successes :
p(r ≥ 6) = p(6) + p(7) + p(8) + p(9) + p(10)
= 10C6 (1/2)¹⁰ + 10C7 (1/2)¹⁰ + 10C8 (1/2)¹⁰ + 10C9 (1/2)¹⁰ + 10C10 (1/2)¹⁰
= (210 + 120 + 45 + 10 + 1)/1024
= 386/1024

Solution :
Here n = 20.
p(4) = 20C4 p⁴ q¹⁶ = ((20 × 19 × 18 × 17)/(4 × 3 × 2 × 1)) p⁴ q¹⁶ = 0.05278

Hence the probability that r of the 6 numbers called are busy is
p(r) = 6Cr p^r q^(6−r)
i) For not more than 3 busy calls :
p(r ≤ 3) = p(0) + p(1) + p(2) + p(3) = 0.9997
ii) For at least 3 calls to be busy :
p(r ≥ 3) = 1 − p(r < 3) = 1 − [p(r = 0) + p(r = 1) + p(r = 2)]

= 0.0051

Example : Out of 2000 families with 4 children each, how many families would be expected to have i) at least one boy, ii) 2 boys, iii) 1 or 2 girls, iv) no girl ? Assume equal probability for boys and girls.
Solution :
Let p = probability of having a boy = 1/2,
q = probability of having a girl = 1/2.
We know that p(r) = nCr p^r q^(n−r), with n = 4.
i) Probability of at least a boy :
p(at least a boy) = 1 − p(r < 1) = 1 − p(r = 0)
= 1 − 4C0 (1/2)⁰ (1/2)⁴ = 1 − 1/16 = 15/16
Hence the expected number of families having at least a boy is
2000 p(r ≥ 1) = 2000 × (15/16) = 1875
ii) Probability of two boys :
p(having 2 boys) = p(r = 2) = (4!/(2! 2!)) (1/2)⁴ = 6/16 = 3/8
Hence the expected number of families having 2 boys is
2000 p(r = 2) = 2000 × (3/8) = 750
iii) Probability of 1 or 2 girls, which is equivalent to finding the probability of having 3 or 2 boys :
p(1 or 2 girls) = p(3 or 2 boys) = p(r = 3) + p(r = 2)
= 4C3 (1/2)⁴ + 4C2 (1/2)⁴ = 4/16 + 6/16 = 10/16 = 5/8
Hence the expected number of families having 1 or 2 girls is
2000 p[3 or 2 boys] = 2000 × (5/8) = 1250
iv) Probability of no girls, i.e. the probability that all the children are boys :
p(no girls) = p(having 4 boys) = p(r = 4) = (1/2)⁴ = 1/16
Hence the expected number of families having no girls is
2000 [p(r = 4)] = 2000 × (1/16) = 125
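The expected family counts above can be re-derived from the binomial pmf; a minimal check:

```python
# Expected numbers of families among 2000 (n = 4 children, p = 1/2 for a
# boy) computed from the binomial pmf, matching the worked example.
from math import comb

def binom(n, p, r):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p, families = 4, 0.5, 2000
at_least_a_boy = families * (1 - binom(n, p, 0))
two_boys = families * binom(n, p, 2)
one_or_two_girls = families * (binom(n, p, 3) + binom(n, p, 2))
no_girls = families * binom(n, p, 4)
print(at_least_a_boy, two_boys, one_or_two_girls, no_girls)
# 1875.0 750.0 1250.0 125.0
```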
Example : On average, 1 in 500 tyres produced is found to be defective.
Solution : Here 1 in 500 tyres is defective, hence
p = 1/500 = 0.002


By the Poisson distribution, we know that
p(r) = (e^(−z) z^r)/r!, r = 0, 1, 2, 3, 4, ...
For lots of 10 tyres, z = np = 10 × 0.002 = 0.02.
i) Probability of no defective tyre :
p(r = 0) = e^(−0.02) = 0.9802
∴ Number of lots (out of 1000) containing no defectives = 1000 × 0.9802 ≈ 980 lots
ii) Probability of one defective :
p(r = 1) = e^(−0.02) × 0.02 = 0.9802 × 0.02 = 0.0196 ≈ 0.019
∴ Number of lots containing one defective = 1000 × 0.019 = 19 lots

Example : Fit a Poisson distribution to the given frequency data (N = 200 observations).
Solution :
Let z = mean of the distribution = Σfx/Σf = 100/200 = 0.5.
By the Poisson distribution,
p(r) = (e^(−z) z^r)/r!
p(0) = e^(−0.5) = 0.6065
p(1) = 0.6065 × 0.5 = 0.3033
p(2) = 0.6065 × 0.125 = 0.0758
p(3) = 0.6065 × 0.0208 = 0.0126
p(4) = 0.6065 × 0.0026 = 0.0016
The theoretical frequencies are
p(0) × 200 = 121
p(1) × 200 = 61
p(2) × 200 = 15
p(3) × 200 = 3
p(4) × 200 = 0
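The fitted frequencies can be checked directly from the Poisson pmf (z = 0.5, N = 200, as in the example):

```python
# Poisson pmf p(r) = z^r e^(-z) / r! and the theoretical frequencies
# N * p(r) for z = 0.5 and N = 200, as in the fitting example.
from math import exp, factorial

def poisson(z, r):
    return z ** r * exp(-z) / factorial(r)

freqs = [round(200 * poisson(0.5, r)) for r in range(5)]
print(freqs)  # [121, 61, 15, 3, 0]
```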

Example : The probability that an individual suffers a bad reaction from an injection is 0.001. Out of 2000 individuals, find the probability that i) exactly 3, ii) more than 2, suffer a bad reaction.
Solution :
Here p = 0.001, n = 2000,
z = np = 0.001 × 2000 = 2.
By the Poisson distribution,
p(r) = (e^(−z) z^r)/r!
i) Probability that exactly 3 suffer a bad reaction :
p(r = 3) = (e^(−2) 2³)/3! = (0.1353 × 8)/6 = 0.1804
ii) Probability that more than two will suffer a bad reaction :
p(more than 2) = p(r = 3) + p(r = 4) + ... + p(r = 2000)
= 1 − [p(r = 0) + p(r = 1) + p(r = 2)]
= 1 − e^(−2)[1 + 2 + 2]
= 1 − 5e^(−2) = 0.32

Solution : Given that 5 phone calls per hour come into a call centre, the mean number of calls per minute is
z = 5/60 = 1/12
The probability of getting at most two phone calls is
p(r ≤ 2) = p(r = 0) + p(r = 1) + p(r = 2)
= e^(−z)[1 + z + z²/2]
= 0.9999

Example : x is a normal variate with mean 2 and standard deviation 4. Find i) p(4.43 ≤ x ≤ 7.29), ii) p(−0.43 < x ≤ 5.39).
Solution : Let z = (x − a)/σ = (x − 2)/4.
i) When 4.43 ≤ x ≤ 7.29 :
z₁ = (4.43 − 2)/4 = 0.607 ≈ 0.61
z₂ = (7.29 − 2)/4 = 1.322 ≈ 1.32
∴ p(4.43 ≤ x ≤ 7.29) = p(0.61 ≤ z ≤ 1.32) (Fig. 3.31.2)
A₁ = area corresponding to z₁ = 0.61 = value at 0.61 in the normal table (0.6 row, 0.01 column) = 0.7291 − 0.5 = 0.2291
A₂ = area corresponding to z₂ = 1.32 = 0.9066 − 0.5 = 0.4066
∴ Required probability = A₂ − A₁ = 0.4066 − 0.2291 = 0.1775
i.e. p(4.43 ≤ x ≤ 7.29) = 0.1775
ii) When −0.43 < x ≤ 5.39 :
z₁ = (−0.43 − 2)/4 = −0.6075 ≈ −0.61
z₂ = (5.39 − 2)/4 = 0.8475 ≈ 0.85
p(−0.43 ≤ x ≤ 5.39) = p(−0.61 ≤ z ≤ 0.85) (Fig. 3.31.3)
A₁ = area corresponding to z₁ = −0.61, which by symmetry equals A(0.61) = 0.2291
A₂ = area corresponding to z₂ = 0.85 = 0.3023

∴ Required probability = A₁ + A₂ = 0.2291 + 0.3023 = 0.5314
i.e. p(−0.43 ≤ x ≤ 5.39) = 0.5314

Example : In an examination 1000 candidates appeared. The marks are normally distributed with mean 45 and standard deviation 25. Find the number of candidates whose score i) exceeds 60, ii) lies between 30 and 60.
Solution : Here mean a = 45 and standard deviation σ = 25,
z = (x − a)/σ = (x − 45)/25
i) Number of candidates whose score exceeds 60 :
When x = 60, z = (60 − 45)/25 = 0.6
p(x > 60) = p(z > 0.6) = 0.5 − p(0 ≤ z ≤ 0.6) = 0.5 − 0.2257 = 0.2743 (Fig. 3.31.4)
∴ Number of candidates whose score exceeds 60 = 1000 × 0.2743 ≈ 274
ii) Number of candidates who score between 30 and 60, i.e. p(30 < x < 60) :
z₁ = (30 − 45)/25 = −0.6, z₂ = (60 − 45)/25 = 0.6
p(30 < x < 60) = p(−0.6 < z < 0.6) = p(−0.6 < z < 0) + p(0 < z < 0.6)
= 0.2257 + 0.2257 = 0.4514 (Fig. 3.31.5)
∴ Number of candidates who score between 30 and 60 = 1000 × 0.4514 ≈ 451
Thus about 274 candidates score above 60 and about 451 candidates score between 30 and 60.

Example : The life of LED bulbs is normally distributed with mean 1000 hours and standard deviation 200 hours. How many of 20000 bulbs might be expected to fail within 700 hours ?
Solution : Given mean a = 1000 and standard deviation σ = 200,
z = (x − a)/σ = (x − 1000)/200
When x = 700, z = (700 − 1000)/200 = −1.5
p(x < 700) = p(z < −1.5) = p(z > 1.5) = 0.5 − 0.4332 = 0.0668 (Fig. 3.31.6)
∴ Number of LED bulbs that might be expected to fail within 700 hours = 0.0668 × 20000 = 1336 bulbs
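The normal-distribution answers above can be reproduced from Φ computed via the error function; a minimal check:

```python
# Check of the normal worked examples using Phi(z) = 0.5*(1 + erf(z/sqrt(2))):
# 1000 candidates with mean 45, s.d. 25; 20000 bulbs with mean 1000, s.d. 200.
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

above_60 = round(1000 * (1 - phi((60 - 45) / 25)))      # candidates above 60
between_30_60 = round(1000 * (phi(0.6) - phi(-0.6)))    # candidates between 30 and 60
fail_by_700 = round(20000 * phi((700 - 1000) / 200))    # bulbs failing within 700 h
print(above_60, between_30_60, fail_by_700)  # 274 451 1336
```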

Example : In a test given to 1000 students, the marks are normally distributed with mean 14 and standard deviation 2.5. Find the number of students who score i) between 12 and 15, ii) below 8, iii) exactly 16.
Solution : Here n = 1000, mean a = 14, standard deviation σ = 2.5,
z = (x − a)/σ = (x − 14)/2.5
i) Students who score between 12 and 15 :
x₁ = 12, z₁ = (12 − 14)/2.5 = −0.8
x₂ = 15, z₂ = (15 − 14)/2.5 = 0.4
p(12 < x < 15) = p(−0.8 < z < 0.4)
A₁ = area corresponding to z₁ = −0.8, i.e. A(0.8) = 0.2881
A₂ = area corresponding to z₂ = 0.4, i.e. A(0.4) = 0.1554
p(12 < x < 15) = A₁ + A₂ = 0.2881 + 0.1554 = 0.4435 (Fig. 3.31.7)
∴ Required number of students = 1000 × 0.4435 ≈ 443
ii) Students who score below 8 :
Here x = 8, z = (8 − 14)/2.5 = −2.4
p(x < 8) = p(z < −2.4) = 0.5 − p(0 < z < 2.4) = 0.5 − 0.4918 = 0.0082 (Fig. 3.31.8)
∴ Required number of students = 1000 × 0.0082 = 8.2 ≈ 8
iii) Students who score 16, taken as 15.5 < x < 16.5 :
x₁ = 15.5, z₁ = (15.5 − 14)/2.5 = 0.6
x₂ = 16.5, z₂ = (16.5 − 14)/2.5 = 1
p(15.5 < x < 16.5) = p(0.6 < z < 1) = p(0 < z < 1) − p(0 < z < 0.6) (Fig. 3.31.9)
A₁ = area corresponding to z₁ = 0.6 = 0.2257
A₂ = area corresponding to z₂ = 1 = 0.3413
p(15.5 < x < 16.5) = A₂ − A₁ = 0.3413 − 0.2257 = 0.1156
∴ Number of students who score 16 = 1000 × 0.1156 = 115.6 ≈ 116

Solution (customer visits over the days of the week) :
N = Σoᵢ = 56
∴ E(X) = expected customers per day = Npᵢ = 56 × (1/7) = 8


Here P = 0, k = 7.
χ² = Σ (oᵢ − eᵢ)²/eᵢ = 5.25 (observed value)
Degrees of freedom = k − P − 1 = 7 − 0 − 1 = 6
χ²₆;₀.₀₅ = 12.592 (table value)
∴ χ² < χ²₆;₀.₀₅, i.e. observed value < table value.
∴ We accept H₀ : the customer visits are distributed uniformly over the days of the week.

Solution : Let H₀ : the distribution of students over the 4 windows is uniform.
As the distribution is assumed uniform, pᵢ = 1/4, and the expected frequency for each window is
E(x) = Npᵢ = 800 × (1/4) = 200
Here k = 4, P = 0.
χ² = Σ (oᵢ − eᵢ)²/eᵢ = 6800/200 = 34 (observed value)
Degrees of freedom = k − P − 1 = 4 − 0 − 1 = 3
χ²₃;₀.₀₅ = 7.815 (table value)
Calculated value > table value.
∴ We reject H₀ : the students are not uniformly distributed over the windows.

Exercise
1. Calculate first four moments about the mean for the distribution
Example = , ; oxeooners: x95

Also find coefficients of skewness and kurtosis.


[Ans. : jt, = 0, ft, = 2.49, jy, = 0.68, 1, = 18.26, Þ, = 0.029, B, = 2.95]
the
2. The first four moments of a distribution at x = 2 are 1,.2.5, 5.5 and 16. Calculate
about mean. — fAns. : py = 0, n, = 1.5; By =0, p,= 6]'
first four moments
Solution : Let, H, : The distribution of students over 4 windows is uniform .
3. Obtain the regression line y on x for the data
_As the distribution is even 1
. _1 ,
“ | - .
Also estimate value of yfor x = 10 [Ans. : y = 1.3399 x — 1.9606, y = 11.4384]
E(x) = Expected frequency
Np;

* TECHNICAL PUBLICATIONS® - an up-thrust for knowledge TECHNICAL PUBLICATH ons® - an up-thrust for knowledge
Statistics - 3-68 Descriptive Statistics : Measures of Dispersion

| 4 For the lines of regression 8x - 10y = = 66 and 4x — 18y = 214. Find i) Mean of
x and y, ii) Coefficient of correlation r. [Ans. : X=13, Y=17, r=+ 0.6}
5. The following equations represents regression lines equation 76. In a certain examination test 200 students appeared in subject of statistics. Average
marks obtained were 50 % with Standard deviation 5 %. How many Students do You
y = 0.516x + 33.73 expect to obtain more than 60 % of marks, supposing that marks are distributed
x = 0.512y + 32.52 ' normally. : {Ans. : 46 students approximately]
Find i) Mean of x and ¥, tf) Coefficient of correlation (r) 17. In a distribution exactly normal 7 % of the items are under 35 and 89 % are under 63.
= fAns. :X= 67, y = 68.6y, r = 0.514] Find the mean and standard deviation of the distribution _
” —_ [Ans. : mean a = 50.3, o = 10.33]
Problems on Binomial eae 18. A normal distribution has a mean 15.73 and a standard deviation of 2.08. Find % of
—_— cases that fall between 17.81 and 13.65. | [Ans. : 68.26]
6. An unbaised coin is thrown 10 times. Find the probability of getting i) exactly 6 heads 19. A fair coin is tossed 600 times. Using normal distribution. Find the probability of
i) at least 6 heads. - fAns. : i) 575 ii) a getting i) Number of heads less than 270, ii) Number of heads between 280 to 360.
7. The probability that a man aged 60 years will live for 70 years is 1/10. Find the probability that out of 5 men selected at random, 2 will live for 70 years. [Ans. : 0.05467]

8. Out of 800 families with 4 children each, how many families would be expected to have i) 2 boys and 2 girls ii) At least one boy iii) No girl iv) At most two girls? Assume equal probability for boys and girls. [Ans. : i) 300 ii) 750 iii) 50 iv) 550]

9. A group of 20 aeroplanes is sent on an operational flight. The chance that an aeroplane fails to return from the flight is 5 %. Determine the probability that i) No plane fails to return ii) At most 3 planes do not return. [Ans. : i) 0.3585 ii) 0.9844]

10. Team A has a probability of 2/3 of winning whenever the team plays a particular game. If team A plays 4 games, find the probability that the team wins i) Exactly two games, ii) At least two games. [Ans. : i) 0.2962 ii) 0.8889]

Problems on Poisson Distribution

11. Find the probability that at most 5 defective fuses will be found in a box of 200 fuses if 2 % of such fuses are defective. [Ans. : 0.735]

12. A car hire firm has 2 cars which it hires out day by day. The number of demands for a car on each day is distributed as a Poisson distribution with parameter 1.5. Calculate a) the proportion of days on which neither car is used and b) the proportion of days on which demand is refused. [Ans. : (a) 0.22 (b) 0.2025]

13. A manufacturer knows that the razor blades he makes contain on the average 0.5 % of defectives. He packs them in packets of 5. What is the probability that a packet picked at random will contain 3 or more faulty blades? [Ans. : 0.00000255]

14. There are 300 misprints distributed randomly throughout a book of 500 pages. Find the probability that a given page contains i) Exactly two misprints, ii) Two or more misprints. [Ans. : i) 0.1 ii) 0.122]

15. Using the Poisson distribution, find the probability that the ace of spades will be drawn from a pack of well shuffled cards at least once in 104 consecutive trials. [Ans. : 0.864]

20. The average test marks in a particular class is 59 and the S.D. is 9. If the marks are normally distributed, how many students in a class of 70 received i) marks more than 70? ii) marks below 50? [Ans. : i) 0.0074 ii) 0.4845]

Problems on Chi-square Distribution

21. The table below gives the number of accidents that occurred in a certain country on the various days of a week :

Days      | Mon | Tue | Wed | Thurs | Fri | Sat
Accidents | 126 | 130 | 110 | 115   | 135 | 170

Test at the 5 % level of significance whether the accidents are uniformly distributed over the days. [Ans. : χ² = 4.58, χ²(tab) = 14.07, H₀ accepted]

22. The demand for a particular spare part in a factory was found to vary from day to day. In a sample study the following information was obtained :

Days                  | Mon  | Tue  | Wed  | Thurs | Fri  | Sat
No. of parts demanded | 1124 | 1125 | 1170 | 1120  | 1126 | 1135

Test the hypothesis that the number of parts demanded does not depend on the day of the week. [Ans. : χ² = 3.9222, χ²(tab) = 9.488, H₀ is accepted]

Multiple Choice Questions

Q.1 There are two types of measures of dispersion :
[a] nominal measure of dispersion and real measure of dispersion
[b] nominal measure of dispersion and relative measure of dispersion
[c] absolute measure of dispersion and relative measure of dispersion
[d] real measure of dispersion and relative measure of dispersion
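Problems 21 and 22 both apply the chi-square goodness-of-fit test against a uniform (equal expected frequency) hypothesis. A minimal sketch of the computation, using made-up observed counts rather than the data from the problems:

```python
# Chi-square goodness-of-fit test for uniformity.
# The counts below are illustrative only, not the data of Problems 21-22.
observed = [20, 18, 25, 17, 22, 18]          # hypothetical daily counts
expected = sum(observed) / len(observed)      # equal frequency under H0

# chi^2 = sum of (O - E)^2 / E over all classes
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

df = len(observed) - 1                        # degrees of freedom = k - 1
critical_5pct = 11.070                        # chi^2 table value for df = 5

print(round(chi_sq, 2))                       # 2.3
print(chi_sq < critical_5pct)                 # True -> H0 accepted
```

Since the computed statistic is below the table value, the uniformity hypothesis is accepted for this illustrative data, exactly as in the book's two problems.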

TECHNICAL PUBLICATIONS® - an up-thrust for knowledge
Q.2 The _____ is the easiest measure of dispersion to calculate.
[a] range [b] mean absolute deviation [c] standard deviation [d] variance

Q.3 Quartile deviation is calculated by _____.
[a] (Q1 + Q3)/2 [b] (Q3 − Q1)/2 [c] Q1 + Q3 [d] Q3 − Q1

Q.4 The numerical value of the standard deviation can never be _____.
[a] larger than the variance [b] zero [c] negative [d] none

Q.5 Standard deviation is also called _____.
[a] mean square deviation [b] mean deviation [c] square deviation [d] root mean square deviation

Q.6 The square of the standard deviation is called _____.
[a] Mean Square Deviation [b] Mean Deviation [c] Variance [d] Root Mean Square Deviation

Q.7 Calculate the mean deviation from the median from the following data : 110, 450, 100, 90, 160, 200, 140
[a] 34.28 [b] 44.28 [c] 32.14 [d] 33.55

Q.8 The ratio of the standard deviation to the arithmetic mean expressed as a percentage is called _____.
[a] Coefficient of standard deviation [b] Coefficient of skewness [c] Coefficient of kurtosis [d] Coefficient of variation

Q.9 The moments about the mean are called _____.
[a] raw moments [b] central moments [c] moments about origin [d] all of the above

Q.10 If the distribution is negatively skewed, then the _____.
[a] mean is more than the mode [b] median is at the right of the mode
[c] mean is at the right of the median [d] mean is less than the mode

Q.11 In a frequency distribution of a variable x, mean = 34, median = 28. The distribution is _____.
[a] positively skewed [b] negatively skewed [c] mesokurtic [d] platykurtic

Q.12 In a negatively skewed distribution _____.
[a] mean > mode > median [b] mode > median > mean
[c] mean > median > mode [d] mode < median > mean

Q.13 For a distribution, the mean is 42, the median is 42.8 and the mode is 44. The distribution is _____.
[a] normal [b] positively skewed [c] negatively skewed [d] mesokurtic

Q.14 The second moment (μ2) and fourth moment (μ4) about the mean are 4 and 18 respectively. What is the value of β2?
[a] 0.875 [b] 1.125 [c] 1.25 [d] 4.5

Q.15 The first four moments of a distribution about the origin are −1.5, 17, −30 and 108. The third moment about the mean is _____.
[a] 39.75 [b] 41.75 [c] 40.75 [d] 42.75
Q.16 If μr is the r-th order central moment of a population, then μ0, μ1 and μ2 are given by _____.
[a] 1, 0, σ [b] 1, 0, σ² [c] 1, 1, σ [d] 1, 0, σ³

Q.17 The value of β2 can be _____.
[a] less than 3 [b] greater than 3 [c] equal to 3 [d] All of the above

Q.18 The method of least squares finds the best fit line that _____ the error between the observed and estimated points on the line.
[a] maximizes [b] minimizes [c] reduces to zero [d] approaches infinity

Q.19 The slope of the regression line of Y on X is also called _____.
[a] the correlation coefficient of X on Y [b] the correlation coefficient of Y on X
[c] the regression coefficient of X on Y [d] the regression coefficient of Y on X

Q.20 If the equation of the regression line is X = 4, then _____.
[a] the line passes through the origin [b] the line passes through (0, 4)
[c] the line is parallel to the Y axis [d] the line is parallel to the X axis

Q.21 If a scatter diagram is drawn and the scatter points lie on a straight line, then it indicates _____.
[a] skewness [b] perfect correlation [c] no correlation [d] none of the above

Q.22 If byx is positive, then bxy will be _____.
[a] Positive [b] Negative [c] Zero [d] 1

Q.23 The relation between r, byx and bxy is _____.
[a] r = byx + bxy [b] r = byx − bxy [c] r = ±√(byx · bxy) [d] r = byx · bxy

Q.24 If r = 0.8 and byx = 1.6, then bxy = ?
[a] 0.2 [b] 0.4 [c] 1 [d] 0

Q.25 The value of the coefficient of correlation lies _____.
[a] between 1 and 0 [b] between −1 and 0 [c] between −1 and +1 [d] Equals 1

Q.26 The line of regression of y on x is _____.
[a] y + ȳ = r(σy/σx)(x + x̄) [b] y − ȳ = r(σx/σy)(x − x̄)
[c] y − ȳ = r(σy/σx)(x − x̄) [d] x − x̄ = r(σx/σy)(y − ȳ)

Q.27 The line of regression of x on y is _____.
[a] x − x̄ = r(σx/σy)(y − ȳ) [b] x − x̄ = r(σy/σx)(y − ȳ)
[c] x + x̄ = r(σx/σy)(y + ȳ) [d] y − ȳ = r(σx/σy)(x − x̄)

Q.28 If the two regression coefficients are − … and − …, then the correlation coefficient is _____.
[a] −0.862 [b] −0.3281 [c] … [d] 0.4
Q.29 The line of regression of y on x is 8x − 10y + 66 = 0 and the line of regression of x on y is 40x − 18y = 214. Then the values of x̄ and ȳ are _____.
[a] x̄ = 12, ȳ = 15 [b] x̄ = 10, ȳ = 11 [c] x̄ = 13, ȳ = 17 [d] x̄ = 11, ȳ = 14

Q.30 The line of regression of y on x is 8x − 10y + 66 = 0, the line of regression of x on y is 40x − 18y = 214 and the variance of y is 16. The standard deviation of x is equal to _____.
[a] 3 [b] 2 [c] 6 [d] 7

Answer Keys for Multiple Choice Questions :

Q.1 c  | Q.2 a  | Q.3 b  | Q.4 c  | Q.5 d
Q.6 c  | Q.7 b  | Q.8 d  | Q.9 b  | Q.10 d
Q.11 a | Q.12 b | Q.13 c | Q.14 b | Q.15 a
Q.16 b | Q.17 d | Q.18 b | Q.19 d | Q.20 c
Q.21 b | Q.22 a | Q.23 c | Q.24 b | Q.25 c
Q.26 c | Q.27 a | Q.28 b | Q.29 c | Q.30 a

Unit IV

Random Variables and Probability Distributions

Syllabus
Random Variables and Distribution Functions : Random Variable, Distribution Function, Properties of Distribution Function, Discrete Random Variable, Probability Mass Function, Discrete Distribution Function, Continuous Random Variable, Probability Density Function.
Theoretical Discrete Distributions : Bernoulli Distribution, Binomial Distribution, Mean Deviation about Mean of Binomial Distribution, Mode of Binomial Distribution, Additive Property of Binomial Distribution, Characteristic Function of Binomial Distribution, Cumulants of Binomial Distribution, Poisson Distribution, The Poisson Process, Geometric Distribution.

Contents
4.1 Introduction
4.2 Random Variable
4.3 Probability Mass Function (p.m.f.)
4.4 Probability Density Function (p.d.f.)
4.5 Solved Examples
4.6 Introduction
4.7 Binomial (Bernoulli's) Distribution
4.8 Mean of the Binomial Distribution
4.9 Variance of the Binomial Distribution
4.10 Mode of the Binomial Distribution
4.11 Additive Property of Binomial Distribution
4.12 Characteristic Function of Binomial Distribution
4.13 Cumulants of Binomial Distribution
4.14 Solved Examples
4.15 Poisson Distribution
4.16 Poisson Process
4.17 Solved Examples
4.18 Geometric Distribution
Multiple Choice Questions
4.1 Introduction
• In the theory of probability we have already studied random experiments and their corresponding sample spaces. Now our interest is not in constructing the sample space of a random experiment but in some number obtained from the outcomes.
• For example, in the experiment of tossing a coin twice, our interest may be only in the number of heads obtained in the two tosses. So in probability problems it is convenient to think of a variable and consider the outcomes as the values the variable takes; such a variable is called a random variable.

4.2 Random Variable
• A random variable is a real valued function defined on the sample space of a random experiment. In other words, the random variable X can be considered as a function that maps all the elements in the sample space S into points on the real number line. Thus X : S → R is a random variable.
• In simple words, a variable used to denote the real value of the outcome of a random experiment is a random variable.
• A random variable X may take finitely many, countably infinitely many or uncountably infinitely many values.

Example :
Consider the experiment of tossing two fair coins.
The sample space S = {HH, HT, TH, TT}; there are 4 possible outcomes. But if we are interested in the number of heads when two coins are tossed, then there are only 3 different possible values, from 0 to 2 :
1) Getting no head, 2) Getting one head, 3) Getting two heads.

[Fig. 4.2.1 : Mapping of S = {HH, HT, TH, TT} onto the points 0, 1, 2 of the real line]

4.2.1 Types of Random Variables
There are two types of random variables.
1) Discrete random variable :
If the random variable X takes only finitely many or countably infinitely many values, then it is called a discrete random variable. Examples of discrete random variables are the number of children in a family, the number of cars sold by a dealer, the number of stars in the sky and so on.
2) Continuous random variable :
If the random variable X takes uncountably infinitely many values, then it is called a continuous random variable. Examples of continuous random variables are the height of a person selected at random and the age of a person selected at random, because x can take any value within a specified range.

4.3 Probability Mass Function (P.M.F.)
• If a discrete random variable X takes values x1, x2, ..., xn with corresponding probabilities p1, p2, ..., pn respectively, where Σ pi = 1, then the distribution is called a discrete probability distribution.
• In a discrete probability distribution the probabilities pi satisfy the conditions
i) 0 ≤ pi ≤ 1 for all i   ii) Σ pi = 1.
Then pi is called the probability mass function.

Example :
Suppose X denotes the outcome of the throw of a die. Then X takes the values 1, 2, 3, 4, 5, 6, each with probability 1/6.
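The die example can be checked mechanically: store the p.m.f. as a table, confirm the probabilities sum to 1, and compute the mean and variance from their definitions. A small sketch using exact fractions:

```python
from fractions import Fraction

# p.m.f. of a fair die: P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert sum(pmf.values()) == 1                            # condition ii) of a p.m.f.

mean = sum(x * p for x, p in pmf.items())                # E(X) = sum of x * p
var = sum(x**2 * p for x, p in pmf.items()) - mean**2    # E(X^2) - [E(X)]^2

print(mean)   # 7/2
print(var)    # 35/12
```

Exact rational arithmetic avoids any rounding noise, which makes the two p.m.f. conditions easy to verify literally.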
∴ The discrete probability distribution is

x    | 1   | 2   | 3   | 4   | 5   | 6
P(x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

Properties : In a p.m.f.
i) 0 ≤ P(xi) ≤ 1 for all i
ii) Σ P(xi) = 1
iii) P(x1 ≤ X ≤ xk) = P(x1) + P(x2) + ... + P(xk)
iv) Mean or expected value : The mean or expected value of X is denoted by μ or E(X) and is defined as
μ = E(X) = Σ pi xi / Σ pi = Σ pi xi (since Σ pi = 1)
v) The variance of X is denoted by V(X) and
V(X) = σ² = Σ pi (xi − μ)² = E(X²) − [E(X)]²,
where σ = standard deviation = √V(X).
vi) The cumulative distribution function (c.d.f.) of a discrete random variable X is denoted by F(x) and is defined as
F(x) = P[X ≤ x] = Σ over xi ≤ x of P(X = xi)

4.4 Probability Density Function (P.D.F.)
• If x is a continuous random variable and the probability that x lies in an interval (a, b) is given, then the probability distribution is called a continuous probability distribution.
• Usually a continuous probability distribution is given in the form of a function f(x) which has the following properties :
i) f(x) ≥ 0 for all x
ii) ∫ from −∞ to ∞ of f(x) dx = 1.
A function f(x) satisfying these conditions is called a probability density function.

Example : For a continuous random variable x, a probability density function is defined by
f(x) = x²/9,  0 < x ≤ 3
     = 0,     otherwise

Properties : In a p.d.f.
i) f(x) ≥ 0 for all x
ii) ∫ from −∞ to ∞ of f(x) dx = 1
iii) P(a ≤ x ≤ b) = ∫ from a to b of f(x) dx
iv) Mean or expected value :
E(x) = ∫ from −∞ to ∞ of x f(x) dx
v) Variance = V(x) = σ² = ∫ from −∞ to ∞ of (x − μ)² f(x) dx = E[x²] − [E(x)]²

Example : A random variable X has the following probability distribution :

X    | 0   | 1 | 2  | 3  | 4
P(X) | 0.1 | k | 2k | 2k | k

Find i) k ii) P(X < 2) iii) P(X ≥ 3) iv) P(1 ≤ X ≤ 3).

Solution : i) Since the table gives a probability distribution,
P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 1
0.1 + k + 2k + 2k + k = 1
0.1 + 6k = 1
6k = 0.9
∴ k = 0.15
ii) P[X < 2] = P[X = 0] + P[X = 1] = 0.1 + k = 0.1 + 0.15 = 0.25
iii) P[X ≥ 3] = P[X = 3] + P[X = 4] = 2k + k = 3k = 3(0.15) = 0.45
iv) P[1 ≤ X ≤ 3] = P[X = 1] + P[X = 2] + P[X = 3] = k + 2k + 2k = 5k = 5(0.15) = 0.75

Example : A random variable X has the following probability distribution :
[probability distribution of X over x = 1, 2, ..., 7 in terms of k]
Find i) k ii) P(X > 5) iii) P(0 ≤ X ≤ 5).

Solution : i) From the probability distribution,
P(X = 1) + P(X = 2) + ... + P(X = 7) = 1, which gives
7k + 8k² = 1
8k² + 7k − 1 = 0
8k² + 8k − k − 1 = 0
(k + 1)(8k − 1) = 0
k = −1 or k = 1/8
Since P(X = 1) = k and k > 0, the value of k = 1/8.
ii) P(X > 5) = P(X = 6) + P(X = 7) = 2k² + 4k² = 6k² = 6(1/64) = 3/32
iii) P(0 ≤ X ≤ 5) = 1 − P(X > 5) = 1 − 3/32 = 29/32

Example : A random variable X takes the values −2, −1, 0, 1, 2 such that P(X = −2) = P(X = −1) = P(X = 1) = P(X = 2) and P(X < 0) = P(X = 0). Obtain the probability distribution of X.

Solution : Given random variable X = −2, −1, 0, 1, 2.
By the definition of a probability mass function,
P(X = −2) + P(X = −1) + P(X = 0) + P(X = 1) + P(X = 2) = 1 ... (I)
Given, P(X = −2) = P(X = −1) = P(X = 1) = P(X = 2) = k (say)
Also given P(X < 0) = P(X = 0)
∴ P(X = 0) = P(X = −1) + P(X = −2) = 2k ... (II)
From (I) and (II),
k + k + 2k + k + k = 1
6k = 1
∴ k = 1/6
Hence the probability distribution (p.m.f.) is

X    | −2  | −1  | 0   | 1   | 2
P(X) | 1/6 | 1/6 | 2/6 | 1/6 | 1/6

Example : Verify that the given function f(x) (non-zero only for 0 ≤ x ≤ 1) is a probability density function.

Solution : Since f(x) ≥ 0 for all x,
∫ from −∞ to ∞ of f(x) dx = ∫ from −∞ to 0 of f(x) dx + ∫ from 0 to 1 of f(x) dx + ∫ from 1 to ∞ of f(x) dx = 0 + ∫ from 0 to 1 of f(x) dx + 0
Example : The probability density function of a continuous random variable is
f(x) = kx(2 − x),  0 < x < 2
     = 0,          otherwise
Find i) k ii) Mean iii) Variance.

Solution : Given p.d.f.
f(x) = kx(2 − x), 0 < x < 2; 0 otherwise.
i) Since f(x) ≥ 0 for all x and the total probability is 1,
∫ from 0 to 2 of kx(2 − x) dx = 1
k [x² − x³/3] from 0 to 2 = 1
k (4 − 8/3) = 1
(4/3) k = 1
∴ k = 3/4
ii) Mean = E(x) = ∫ x f(x) dx
= (3/4) ∫ from 0 to 2 of x²(2 − x) dx
= (3/4) [2x³/3 − x⁴/4] from 0 to 2
= (3/4) (16/3 − 4) = (3/4)(4/3)
∴ E(x) = 1
iii) E(x²) = ∫ x² f(x) dx
= (3/4) ∫ from 0 to 2 of x³(2 − x) dx
= (3/4) [x⁴/2 − x⁵/5] from 0 to 2
= (3/4) (8 − 32/5) = (3/4)(8/5) = 6/5
∴ Variance = V(x) = E(x²) − [E(x)]² = 6/5 − 1 = 1/5
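The integrals in the example above can be sanity-checked numerically. A sketch using a simple midpoint rule, with no assumptions beyond the standard library:

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 0.75 * x * (2 - x)            # k = 3/4 from the worked example

total = integrate(f, 0, 2)                   # should be close to 1
mean = integrate(lambda x: x * f(x), 0, 2)   # should be close to 1
ex2 = integrate(lambda x: x * x * f(x), 0, 2)
var = ex2 - mean ** 2                        # should be close to 1/5

print(round(total, 4), round(mean, 4), round(var, 4))  # 1.0 1.0 0.2
```

For low-degree polynomial densities the midpoint rule converges quickly, so the numerical values agree with k = 3/4, mean 1 and variance 1/5 to many decimal places.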


Example : The probability density function of a continuous random variable is f(x) = A x e^(−x), x > 0. Find A and P(2 < x < 5).

Solution : Given p.d.f.
f(x) = A x e^(−x), x > 0
i) Since f(x) ≥ 0 for all x and
∫ from 0 to ∞ of f(x) dx = 1,
A ∫ from 0 to ∞ of x e^(−x) dx = 1
A Γ(2) = 1
(since by definition of the Gamma function ∫ from 0 to ∞ of e^(−x) x^(n−1) dx = Γ(n) and Γ(2) = 1! = 1)
∴ A = 1
ii) P(2 < x < 5) = ∫ from 2 to 5 of x e^(−x) dx
= [−x e^(−x) − e^(−x)] from 2 to 5
= (−5e^(−5) − e^(−5)) − (−2e^(−2) − e^(−2))
= −6e^(−5) + 3e^(−2)
∴ P(2 < x < 5) = 3e^(−2) − 6e^(−5)

4.6 Introduction
• Frequency distributions can be classified as
1) Observed frequency distributions
2) Theoretical or expected frequency distributions.
• Observed frequency distributions are based on actual observations and experimentation.
• Theoretical distributions are derived mathematically with some assumptions.
• There are many types of theoretical distributions, but we shall consider only three :
1) Binomial (Bernoulli's) distribution
2) Poisson distribution
3) Geometric distribution.

4.7 Binomial (Bernoulli's) Distribution
If (i) an experiment results in only two ways, success or failure, (ii) the probability of success is p and the probability of failure is q such that p + q = 1, and (iii) the experiment is repeated n times, then the probability of r successes is given by
P(X = r) = nCr p^r q^(n − r), where r = 0, 1, 2, ..., n and nCr = n! / (r! (n − r)!)
• To find this probability we consider the Binomial expansion of (q + p)^n, so the above probability distribution is called the Binomial probability distribution. It is denoted by B(n, p, r), where n, p, q are called the parameters.

4.8 Mean of the Binomial Distribution
The mean of the Binomial distribution B(n, p, r) is denoted by x̄ or μ and defined as
μ = Σ from r = 0 to n of r · P(r) = E(X) = np

4.9 Variance of the Binomial Distribution
The variance of the Binomial distribution is denoted by V and is defined as
V = σ² = Σ from r = 0 to n of r² P(r) − [Σ from r = 0 to n of r P(r)]²
∴ Variance = σ² = npq
and the standard deviation σ = +√(npq).
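The closed forms mean = np and variance = npq can be confirmed directly from the p.m.f. A sketch with hypothetical parameters n = 10, p = 0.3:

```python
from math import comb

n, p = 10, 0.3                # hypothetical parameters for illustration
q = 1 - p

# Binomial p.m.f.: P(X = r) = nCr * p^r * q^(n - r), r = 0..n
pmf = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

mean = sum(r * pr for r, pr in enumerate(pmf))
var = sum(r**2 * pr for r, pr in enumerate(pmf)) - mean**2

print(round(mean, 6))         # 3.0  == n*p
print(round(var, 6))          # 2.1  == n*p*q
```

Summing r·P(r) and r²·P(r) over the whole support reproduces np and npq, which is exactly the derivation summarized in Sections 4.8 and 4.9.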
4.10 Mode of the Binomial Distribution
• Depending on the values of the two parameters n and p, the Binomial distribution may be unimodal or bi-modal.
• To find the mode of the Binomial distribution, first we find the value of (n + 1)p.
1) If (n + 1)p is a non-integer, then the distribution is unimodal and mode = the largest integer contained in (n + 1)p.
2) If (n + 1)p is an integer, then the distribution is bi-modal and the modes are (n + 1)p and (n + 1)p − 1.

4.11 Additive Property of Binomial Distribution
Let X and Y be two independent Binomial variables such that X follows the Binomial distribution with parameters (n1, p) and Y follows the Binomial distribution with parameters (n2, p). Then X + Y follows the Binomial distribution with parameters (n1 + n2, p).

4.12 Characteristic Function of Binomial Distribution
The characteristic function of the Binomial distribution is denoted by φX(t) and is defined as
φX(t) = E(e^(itX)) = (q + p e^(it))^n = (1 − p + p e^(it))^n

4.13 Cumulants of Binomial Distribution
The moment generating function of the Binomial distribution is denoted by MX(t) and is defined as
MX(t) = E(e^(tX)) = (q + p e^t)^n = (1 − p + p e^t)^n
The cumulant generating function of the Binomial variable X is
KX(t) = log_e (q + p e^t)^n = n log_e (q + p e^t)

4.14 Solved Examples

Example : The mean and variance of a Binomial distribution are 6 and 2 respectively. Find n, p, q and P(r ≤ 1).

Solution : Given : mean = μ = np = 6 and variance = V = npq = 2
∴ 6 × q = 2, so q = 1/3
Hence p = 1 − q = 2/3
Also np = 6 ⇒ n × (2/3) = 6 ⇒ n = 9
∴ P(r) = 9Cr (2/3)^r (1/3)^(9 − r)
P(r ≤ 1) = P(r = 0) + P(r = 1)
= 9C0 (2/3)^0 (1/3)^9 + 9C1 (2/3)^1 (1/3)^8
= (1 + 18)/3^9 = 19/19683
∴ P(r ≤ 1) = 0.00097

Example : The probability that a transistor is defective is 20/100. If 10 transistors are selected at random, find the probability that (i) all are defective (ii) all are non-defective.

Solution : Given : n = 10
Let p be the probability of a defective transistor.
p = 20/100 = 0.2, so q = 1 − p = 0.8
Since P(r defective) = 10Cr p^r q^(10 − r) = 10Cr (0.2)^r (0.8)^(10 − r),
(i) P(all defective transistors) = P(r = 10) = 10C10 (0.2)^10 (0.8)^0 ≈ 0.0000001
(ii) P(all are non-defective) = P(r = 0) = 10C0 (0.2)^0 (0.8)^10 = 0.1074

Example : An unbiased coin is tossed 10 times. Find the probability of getting (i) exactly 6 heads (ii) at least 6 heads.

Solution : Since an unbiased coin is tossed,
p = q = 1/2 = 0.5 and n = 10,
where p is the probability of getting a head. Since P(getting r heads) = 10Cr p^r q^(10 − r),
(i) P(getting exactly 6 heads)
P(r = 6) = 10C6 (0.5)^6 (0.5)^4
= (10 · 9 · 8 · 7)/(1 · 2 · 3 · 4) × (0.5)^10
= 210 (0.5)^10
= 0.205
(ii) P(getting at least 6 heads)
= P(getting 6 or 7 or 8 or 9 or 10 heads)
= P(r = 6) + P(r = 7) + P(r = 8) + P(r = 9) + P(r = 10)
= 10C6 (0.5)^6 (0.5)^4 + 10C7 (0.5)^7 (0.5)^3 + 10C8 (0.5)^8 (0.5)^2 + 10C9 (0.5)^9 (0.5) + 10C10 (0.5)^10
= (0.5)^10 (10C6 + 10C7 + 10C8 + 10C9 + 10C10)
= (0.5)^10 (210 + 120 + 45 + 10 + 1)
= (0.5)^10 (386)
= 0.3769

Example : Six dice are thrown 729 times. How many times do you expect at least three dice to show a five or a six?

Solution : In a toss of a single die,
p = probability of getting 5 or 6 = 1/6 + 1/6 = 2/6 = 1/3
∴ q = 1 − p = 2/3 and n = 6 (since 6 dice are thrown simultaneously)
Since P(X = r) = 6Cr p^r q^(6 − r),
P(at least 3 dice show 5 or 6) = 1 − [P(r = 0) + P(r = 1) + P(r = 2)]
= 1 − [6C0 (1/3)^0 (2/3)^6 + 6C1 (1/3)(2/3)^5 + 6C2 (1/3)^2 (2/3)^4]
= 1 − [64 + 192 + 240]/729
= 1 − 496/729
= 233/729
∴ Expected number = N · P(at least 3 dice show 5 or 6) = 729 × 233/729 = 233

Example : Seven unbiased coins are tossed 128 times. Fit a binomial distribution to the number of heads (find the expected frequencies).

Solution : The theoretical frequencies are given by
N(q + p)^n = 128 (1/2 + 1/2)^7
= 128 [7C0 (1/2)^7 + 7C1 (1/2)^7 + 7C2 (1/2)^7 + 7C3 (1/2)^7 + 7C4 (1/2)^7 + 7C5 (1/2)^7 + 7C6 (1/2)^7 + 7C7 (1/2)^7]
= 128 [1/128 + 7/128 + 21/128 + 35/128 + 35/128 + 21/128 + 7/128 + 1/128]
∴ The expected frequencies are 1, 7, 21, 35, 35, 21, 7, 1.

Example : A Binomial distribution has mean 12 and variance 6. Find the mode of the distribution.

Solution : Given :
Mean = np = 12 and variance = npq = 6
∴ q = 6/12 = 1/2
and p = 1 − q = 1/2
Since np = 12, n × (1/2) = 12, hence n = 24.
To find the mode, first we find the value of (n + 1)p :
(n + 1)p = (25)(1/2) = 12.5
Since (n + 1)p = 12.5 is a non-integer, the given binomial distribution is unimodal
and its mode = the largest integer contained in 12.5
∴ Mode = 12
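The (n + 1)p rule used above is easy to encode. A sketch (the function name is ours, for illustration):

```python
import math

def binomial_modes(n, p):
    """Mode(s) of B(n, p) using the (n + 1)p rule from Section 4.10."""
    m = (n + 1) * p
    if m == int(m):                 # integer -> bi-modal: (n+1)p and (n+1)p - 1
        return [int(m) - 1, int(m)]
    return [math.floor(m)]          # non-integer -> unimodal: floor((n+1)p)

print(binomial_modes(24, 0.5))      # [12]    (the mean 12, variance 6 example)
print(binomial_modes(9, 0.5))       # [4, 5]  ((n+1)p = 5 is an integer)
```

The first call reproduces the worked example; the second shows the bi-modal branch.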

4.15 Poisson Distribution
• The Poisson distribution is the limiting case of the Binomial distribution when p, the probability of success, is very small and n, the number of trials, is very large, i.e. p → 0, n → ∞ such that np = m remains finite.
• In some cases where an experiment results in two outcomes, success or failure, only the number of successes can be observed and not the number of failures; e.g. we can observe how many persons die of corona (covid-19) but we cannot observe how many do not die of corona. In such cases the Binomial distribution cannot be used, and we use the Poisson distribution.
• A random variable X is said to follow a Poisson distribution if the probability of X is given by
P(X = x) = e^(−m) m^x / x!, x = 0, 1, 2, ...

4.16 Poisson Process
• The Poisson process is one of the most widely used counting processes in probability. It is the limiting case of the Binomial distribution such that n → ∞, p → 0 and np = m = finite.
• The Poisson distribution is a uniparametric distribution as it is characterized by only one parameter m.
1) The mean of the Poisson distribution is μ = E(X) = m.
2) The variance of the Poisson distribution is V = σ² = m.
3) The Poisson distribution may be unimodal or bi-modal depending upon the value of the parameter m :
(i) If m is a non-integer, then the distribution is unimodal and mode = the largest integer contained in m.
(ii) If m is an integer, then the distribution is bi-modal and the modes are m and m − 1.
4) Additive property of the Poisson distribution : Let X and Y be two independent Poisson variables such that X follows a Poisson distribution with parameter m1 and Y follows a Poisson distribution with parameter m2. Then X + Y is also a Poisson variable, with parameter m1 + m2.

4.17 Solved Examples

Example : The probability that an individual suffers a bad reaction from an injection is 0.001. Out of 2000 individuals, find the probability that (i) exactly 3 (ii) more than 1 will suffer a bad reaction.

Solution : We are given n = 2000, p = 0.001
Since m = np = 2000 × 0.001 = 2,
for a Poisson distribution
P(x) = e^(−m) m^x / x!
∴ P(x) = e^(−2) 2^x / x!
i) P(exactly 3 will suffer a bad reaction)
= P(x = 3) = e^(−2) 2³ / 3!
= 0.1804
ii) P(more than 1 will suffer a bad reaction)
= 1 − [P(x = 0) + P(x = 1)]
= 1 − [e^(−2) + 2e^(−2)]
= 1 − [0.1353 + 0.2707]
= 1 − 0.4060
= 0.5940
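The bad-reaction example can be replayed numerically straight from the p.m.f. P(x) = e^(−m) m^x / x!:

```python
from math import exp, factorial

def poisson_pmf(m, x):
    """P(X = x) for a Poisson variable with parameter m."""
    return exp(-m) * m**x / factorial(x)

m = 2000 * 0.001                       # m = np = 2

p_exactly_3 = poisson_pmf(m, 3)
p_more_than_1 = 1 - (poisson_pmf(m, 0) + poisson_pmf(m, 1))

print(round(p_exactly_3, 4))           # 0.1804
print(round(p_more_than_1, 4))         # 0.594
```

Both values agree with the hand computation above (0.1804 and 0.5940).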
Example : Find the probability of 3 or more accidents in a day, given that the average number of accidents in a day is m = np = 50 × 0.2 = 10.

Solution : m = np = 50 × 0.2 = 10
For a Poisson distribution,
P(x) = e^(−m) m^x / x!
∴ P(3 or more accidents in a day)
= 1 − [P(x = 0) + P(x = 1) + P(x = 2)]
= 1 − e^(−10) [1 + 10 + 50]
= 1 − [0.000045 + 0.000454 + 0.002270]
= 1 − 0.002769
= 0.9972

Example : Fit a Poisson distribution to the following data :

x | 0   | 1  | 2  | 3 | 4
f | 109 | 65 | 22 | 3 | 1

Solution : The mean of the distribution m = Σxf / Σf
= (109 × 0 + 65 × 1 + 22 × 2 + 3 × 3 + 1 × 4)/200 = 122/200 = 0.61
For a Poisson distribution,
P(x) = e^(−m) m^x / x!, x = 0, 1, 2, ...
P(x = 0) = e^(−0.61) = 0.5434
P(x = 1) = e^(−0.61) (0.61) = 0.3315
P(x = 2) = e^(−0.61) (0.61)²/2 = 0.1011
Now the theoretical frequency = N · P(x).
Putting x = 0, 1, 2, 3, 4 we get the theoretical frequencies 109, 66, 20, 4, 1.

Example : A computer breaks down on an average 0.3 times a day. Find the probability that (i) the computer has no breakdown (ii) there is at most one breakdown in a day.

Solution : Given that m = 0.3
For a Poisson distribution,
P(x) = e^(−m) m^x / x!, x = 0, 1, 2, ...
i) P(the computer has no breakdown) = P(0) = e^(−0.3) = 0.7408
ii) P(there is at most one breakdown) = P(0) + P(1)
= e^(−0.3) + e^(−0.3) (0.3) = 0.7408 + 0.2222 = 0.9630

Example : Printing mistakes in a book occur at an average of 2 per page. Find the probability that a page from the book has at least one printing mistake.

Solution : Given that m = 2
For a Poisson distribution,
P(x) = e^(−m) m^x / x!, x = 0, 1, 2, ...
∴ P(a page from the book has at least one printing mistake) = P(x ≥ 1)
= 1 − P(x = 0)
= 1 − e^(−2) 2⁰ / 0!
= 1 − 0.1353
= 0.8647
4.18 Geometric Distribution
• In a series of Bernoulli trials (independent trials with constant probability p of success), let the random variable X denote the number of trials until the first success. Then X is a geometric random variable with parameter p such that 0 < p < 1, and the probability mass function (p.m.f.) of X is
P(X = x) = q^(x − 1) p, where q = 1 − p and x = 1, 2, 3, ...

4.18.1 Properties of Geometric Distribution
1) The mean of the geometric distribution is μ = E(X) = 1/p.
2) The variance of the geometric distribution is
V(X) = E(X²) − [E(X)]² = q/p²
3) The standard deviation of the geometric distribution is σ = +√V(X) = √q / p.
4) The moment generating function of the geometric distribution is
MX(t) = p e^t / (1 − (1 − p) e^t)

Example : The probability that an item produced is defective is 5/100. Items are examined one by one; find the probability that the first defective item is found on the 6th inspection.

Solution : Given : For a geometric distribution
P(X = x) = q^(x − 1) p, where q = 1 − p
Here p = 5/100 = 0.05, so q = 1 − p = 1 − 0.05 = 0.95
∴ P(X = 6) = (0.95)^(6 − 1) (0.05) = (0.95)^5 (0.05) = 0.0387

Example : For a geometric distribution with p = 0.4, find P(X = 3).

Solution : Given : For a geometric distribution
P(X = x) = q^(x − 1) p, where q = 1 − p
Here p = 0.4, so q = 1 − 0.4 = 0.6
∴ P(X = 3) = (0.6)^(3 − 1) (0.4) = (0.6)² (0.4) = 0.144

Exercise

1) A random variable X has the following probability distribution :
[probability distribution table of X]
Find : i) the value of k ii) Mean iii) Variance iv) P(X < −2) v) P(X ≤ −2)

2) The probability mass function of a random variable is defined as P(X = 0) = 3C³, P(X = 1) = 4C − 10C² and P(X = 2) = 5C − 1, where C > 0. Find the value of C.

3) If the random variable X takes the values 1, 2, 3, 4 such that 2P(X = 1) = 3P(X = 2) = P(X = 3) = 5P(X = 4), then find the probability distribution and P(2 < X < 4).

4) Find the value of k if the probability density function is f(x) = kx e^(−x), 0 < x < ∞.

5) Find the value of k if the probability density function is f(x) = k/(1 + x²), −∞ < x < ∞. Also find P(0 ≤ x ≤ 1).

6) Find the probability that at most 5 defective fuses will be found in a box of 200 fuses if 2 % of such fuses are defective. [Ans. : 0.7350]

7) In a certain factory turning out razor blades there is a small chance of 0.002 for any blade to be defective. The blades are supplied in packets of 10. Calculate the approximate number of packets containing no defective and 2 defective blades in a consignment of 10,000 packets. [Ans. : 9802, 2]

8) Fit a Poisson distribution to the following data :
[frequency table]

9) Using the Poisson distribution, find the probability that the ace of spades will be drawn from a pack of well shuffled cards at least once in 104 consecutive draws. [Ans. : 0.8647]

10) Accidents occur on a particular stretch of highway at an average rate of 3 per week. What is the probability that there will be exactly two accidents in a given week? [Ans. : 0.224]

Multiple Choice Questions

Q.1 If the discrete random variable X has the following probability distribution

x        | 1  | 2 | 3 | 4
P(X = x) | 3k | k | k | 3k

then the value of k is _____.
[a] 1/8 [b] 1/6 [c] 1/4 [d] 1/2
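The geometric p.m.f. and its mean 1/p can be checked by direct summation, truncating the infinite series at a large cutoff. A sketch:

```python
def geometric_pmf(p, x):
    """P(X = x) = q^(x - 1) * p for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

print(round(geometric_pmf(0.4, 3), 3))     # 0.144  (the second worked example)

# Truncated check of the mean E(X) = 1/p for p = 0.4; the tail beyond
# x = 200 is negligibly small.
mean = sum(x * geometric_pmf(0.4, x) for x in range(1, 200))
print(round(mean, 6))                       # 2.5 == 1/0.4
```

The truncation works because q^(x−1) decays geometrically, so the omitted tail contributes far less than the rounding precision shown.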
Q.2 If the discrete random variable X has the following probability distribution

x        | 0   | 1 | 2  | 3  | 4
P(X = x) | 0.1 | k | 2k | 2k | k

then the value of k is _____.
[a] 0.15 [b] 0.25 [c] 0.10 [d] 0.35

Q.3 Three coins are tossed simultaneously. If X is the number of heads, then P(X = 2) is _____.
[a] 1/8 [b] 3/8 [c] 1/2 [d] 1/4

Q.4 If the discrete random variable X has the following probability distribution

x        | 1    | 2   | 3    | 4
P(X = x) | 1/10 | 1/5 | 3/10 | 2/5

then the value of P(2 < X < 4) = _____.
[a] 3/5 [b] 3/10 [c] 1/5 [d] 2/5

Q.5 The discrete random variable X has the following probability distribution

x        | 0   | 1   | 2   | 3
P(X = x) | 3/8 | 3/8 | 1/8 | 1/8

Then the value of P(X ≥ 2) is _____.
[a] 0.5 [b] 0.25 [c] 0.3 [d] 0.8

Q.6 If the discrete random variable X has the following probability distribution

x        | 0    | 1   | 2    | 3    | 4    | 5
P(X = x) | 0.15 | 0.1 | 0.15 | 0.30 | 0.15 | 0.15

then the value of P(X < 2) is _____.
[a] 0.15 [b] 0.25 [c] 0.4 [d] 0.45

Q.7 If the discrete random variable X has the following probability distribution

x        | 0 | 1  | 2  | 3  | 4  | 5   | 6
P(X = x) | k | 2k | 2k | 3k | k² | 2k² | 7k² + k

then the value of k is _____.
[a] 1 [b] 1/5 [c] −1 [d] 1/10

Q.8 The mathematical expectation E(X) of the following probability distribution is _____.

x        | 0    | 1    | 2    | 3
P(X = x) | 2/10 | 3/10 | 3/10 | 2/10

[a] 13/5 [b] 3/2 [c] 9/5 [d] 2/5

Q.9 If the probability mass function of a discrete random variable X is P(X = x) = c/x³ for x = 1, 2, 3, then the value of c is _____.
[a] 216/251 [b] 251/216 [c] 294/776 [d] 294/557

Q.10 If the probability mass function of a discrete random variable X is P(X = x) = c/x³ for x = 1, 2, 3, then E(X) is _____.
[a] 565/297 [b] 294/251 [c] 297/294 [d] 294/297
Q.11 If the probability function of a discrete random variable X which assumes the values x1, x2, x3 is such that P(x1) = 2P(x2) = 3P(x3), then P(x1) = _____.
[a] 6/11 [b] 3/11 [c] 2/11 [d] 1/11

Q.12 If the probability density function of a continuous random variable x is f(x) = k(1 + x), 1 ≤ x ≤ 3, then the value of k is _____.
[a] 1/2 [b] 12 [c] 6 [d] 1/6

Q.13 If the p.d.f. of a continuous random variable X is defined by f(x) = k(1 + 2x), 0 < x < 1; 0 otherwise, then the value of k is _____.
[a] 1/2 [b] 1/4 [c] 2 [d] 1

Q.14 If the probability density function of a continuous random variable X is defined by f(x) = kx², −3 < x < 3; 0 otherwise, then k = _____.
[a] 1/27 [b] 1/28 [c] 1/18 [d] 1/9

Q.15 The probability density function of a continuous random variable X is f(x) = 6x(1 − x), 0 ≤ x ≤ 1; 0 otherwise. If P(X < k) = P(X > k), then the value of k is _____.
[a] 1/4 [b] 1/2 [c] 3/4 [d] 1

Q.16 If the probability density function of a continuous random variable X is f(x) = 1/5, −1 ≤ x ≤ 4; 0 otherwise, then P(0 < x < 1) = _____.
[a] 1/5 [b] 2/5 [c] 3/5 [d] 4/5

Q.17 The probability density function of a continuous random variable X is f(x) = kx(1 − x), 0 < x < 1; 0 otherwise. Then the value of k is _____.
[a] 12 [b] 1/6 [c] 6 [d] 1/12

Q.18 If the probability density function of a continuous random variable X is f(x) = e^(−x), 0 < x < ∞, then the value of P(0 < x < 2) is _____.
[a] 1 − e^(−2) [b] e^(−2) − 1 [c] e² [d] e^(−1) − e^(−2)

Q.19 The probability density function of X is f(x) = k/(1 + x²), −∞ < x < ∞. Then the value of k = _____.
[a] 1/π [b] π [c] 2/π [d] 1/2

Q.20 Mean and variance are equal in the _____.
[a] Binomial distribution [b] Poisson distribution [c] Normal distribution [d] Geometric distribution
Statistics \ 4-26 Random Variables and Probability Distributions Statistics 4-27 Random Variables and Probability Distributions
Q.21 In a Binomial distribution, if n is the number of trials and p is the probability of success, then the mean value is given by ___.
[a] np [b] n [c] p [d] np(1 − p)

Q.22 In a Binomial distribution, if p, q and n are the probability of success, failure and number of trials respectively, then the variance is given by ___.
[a] np [b] npq [c] np²q [d] pq

Q.23 If X is a random variable, the probability of success and failure being p and q respectively, and n is the number of trials, then P(X = x) is ___.
[a] ⁿCₓ pˣ qⁿ⁻ˣ [b] ⁿCₓ pⁿ qˣ [c] ˣCₙ pˣ qⁿ⁻ˣ [d] ⁿCₓ pˣ

Q.24 In a Binomial distribution, if p, q and n are the probability of success, failure and number of trials respectively, then the standard deviation is ___.
[a] √(np) [b] √(pq) [c] √(npq) [d] npq

Q.25 In a Binomial distribution which of the following is correct ?
[a] Mean = Variance [b] Mean < Variance [c] Mean > Variance [d] None of these

Q.26 In a Binomial distribution, n = 5; if P(X = 4) = P(X = 3) then p = ___.
[a] 4/5 [b] 2/3 [c] 9/13 [d] 6/13

Q.27 In a Binomial distribution, if p = q, then P(X = x) is ___.
[a] ⁿCₓ (0.5)ⁿ [b] ⁿCₓ (0.5)ˣ [c] ˣCₙ (0.5)ˣ [d] ⁿCₓ p

Q.28 The mean and variance of a Binomial distribution are 4 and 2 respectively. Then P(X = 2) is ___.
[a] 128/256 [b] 219/256 [c] 37/256 [d] 28/256

Q.29 The mean and variance of a Binomial distribution are 6 and 2 respectively. Then the number of trials n is ___.
[a] 14 [b] 18 [c] 10 [d] 9

Q.30 The mean and variance of a Binomial distribution are 36 and 9 respectively. Then the value of the probability of success p = ___.
[a] 1/3 [b] 3/4 [c] 1/2 [d] 1/4

Q.31 The mean and variance of a Binomial distribution are 12 and 6 respectively. Then the value of the mode is ___.
[a] 12 [b] 13 [c] 11 [d] 12.5

Q.32 The mean and variance of a Binomial distribution are 6 and 4 respectively. Then the value of the mode is ___.
[a] 6.5 [b] 6.3 [c] 6 [d] 6.6

Q.33 In Poisson's probability distribution, if m = np, where n, the number of trials, is very large and p is the probability of success at each trial, then the probability of r successes is ___.
[a] e⁻ᵐ mʳ / r! [b] e⁻ᵐ mʳ [c] mʳ / r! [d] eᵐ mʳ / r!
TECHNICAL PUBLICA TIONs® - an up-thrust for knowledge
TECHNICAL PUBLICA TIONS® - an up-thrust for knowledge
Statistics 4-28 Random Variables and Probability Distributicns Staiisio8 _
4-29 - Random Variables and Probability Distributions

Q.34 In a Poisson's distribution, if n = 100 and p = 0.01, then the value of P(X = 0) is ___.
[a] e [b] e⁻¹ [c] 1 [d] 0

Q.35 In a Poisson's probability distribution, which of the following is correct ?
[a] Mean = Variance [b] Mean > Variance
[c] Mean < Variance [d] None of these

Q.36 Poisson's probability distribution is ___.
[a] uni-modal [b] bi-modal
[c] uni-modal or bi-modal [d] None of the above

Q.37 In Poisson's probability distribution, if m = np is an integer, then the distribution is ___.
[a] uni-modal with mode m [b] uni-modal with mode m − 1
[c] bi-modal with modes m, m − 1 [d] bi-modal with modes m, m + 1

Q.38 In a Poisson's probability distribution, if P(x = 2) = P(x = 3), then the value of m is ___.
[a] 1 [b] 3 [c] 2 [d] 4

Q.39 The mean and variance of the geometric distribution are ___.
[a] q/p and q/p² [b] p/q and p/q² [c] q/p² and q/p [d] p and pq

Q.40 The probability mass function of a geometric distribution is P(X = x) = ___.
[a] p qˣ [b] q pˣ [c] pˣ q [d] q

Answer Keys for Multiple Choice Questions :
Q.2 (a), Q.6 (b), Q.7 (d), Q.12 (a), Q.17 (a), Q.22 (b), Q.27 (a), Q.37 (c)


Unit V

Inferential Statistics :
Hypothesis
Syllabus
Statistical Inference - Testing of Hypothesis, Non-parametric Methods and Sequential Analysis :
Introduction, Statistical Hypothesis (Simple and Composite), Test of a Statistical Hypothesis, Null
Hypothesis, Alternative Hypothesis, Critical Region, Two Types of Errors, Level of Significance,
Power of the Test.

Contents
5.1 Introduction
5.2 Statistical Hypothesis
5.3 Test of Statistical Hypothesis (Test of Significance)
5.4 Null and Alternate Hypothesis
5.5 Types of Error
5.6 Critical Region
5.7 Level of Significance
5.8 Power of Test
5.9 General Procedure Followed in Testing of Statistical Hypothesis
5.10 Test of Significance
5.11 Chi-Square Test
Multiple Choice Questions



5.1 Introduction
• Statistical inference is the branch of statistics which deals with applications of probability theory to decision making about some proposition. Statistical inference problems are divided into two categories :
1. Estimation
2. Hypothesis testing
• When data is collected from a sample taken from the population, the problem is either to estimate the value of a population parameter or to check the validity of a statement made about the population. Finding the value of a population parameter leads to estimation theory, while the process of making a decision about a statement is 'testing of hypothesis'.
1) Example of estimation : Estimate the population mean weight using the sample mean weight.
2) Example of hypothesis testing : Use sample evidence to test the claim that the population mean weight is 50 kg. For example, a data-based decision made by an engineer about the efficiency of a newly invented gadget over the existing gadgets.

Fig. 5.1.1 : Sample statistics (known) are used to draw an inference about population parameters (unknown, estimated from sample statistics).

Preliminaries
1) Population : A collection of individuals / objects under study is called a population or universe. It is classified into two parts :
a) Finite population : A population that contains a finite number of individuals / objects is called a finite population.
b) Infinite population : A population containing an infinite number of individuals / objects is called an infinite population.
2) Sampling : A finite subset of the universe is called a "sample". The process of selecting a sample from the population is called sampling.
3) Degrees of freedom : The number of independent values / variables that can vary in an analysis without violating any constraints. If the data contain a total of "n" values of a random variable and the number of constraints on the variables is "k", then
d.f. = n − k − 1

Parameter and statistic :
• The statistical constants of the population, such as the mean (μ) and the variance (σ²), are known as the parameters.
• A statistical quantity computed from the members of the sample to estimate the parameters of the population from which the sample has been drawn is known as a 'statistic'.

Standard Error (S.E.) :
• The standard deviation of the sampling distribution of a statistic is known as the 'Standard Error'. It forms the basis of the testing of hypothesis.
If t is any statistic, then for large samples
Z = (t − E(t)) / S.E.(t)
is normally distributed with mean 0 and variance 1, where E(t) is the expected value of t and S.E.(t) is the standard error of t.
• This Z is called the test statistic, which is used to test the significance of the difference between sample and population values.
• In other words, the test statistic is a number calculated from a statistical test of hypothesis. It shows how well the observed data agree with the distribution expected under the null hypothesis.

5.2 Statistical Hypothesis
• In many situations one may be interested in drawing certain conclusions about the population based on sample information. In taking such decisions certain assumptions are made. These assumptions (statements / claims) are called 'Statistical Hypotheses'.
• A hypothesis is mainly categorised as : 1) Simple hypothesis 2) Composite hypothesis
1) Simple hypothesis (Definition) : If the statistical hypothesis specifies the population completely then it is called a simple hypothesis.
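The large-sample relation Z = (t − E(t))/S.E.(t) can be sketched numerically. The figures below (σ = 12, n = 36, an observed sample mean of 104 against a hypothesised mean of 100) are made-up illustrations, not data from the text:

```python
import math

def z_statistic(t, expected, standard_error):
    # Z = (t - E(t)) / S.E.(t), approximately N(0, 1) for large samples
    return (t - expected) / standard_error

# Standard error of a sample mean: S.E. = sigma / sqrt(n)
sigma, n = 12.0, 36            # hypothetical population S.D. and sample size
se = sigma / math.sqrt(n)      # 12 / 6 = 2.0
z = z_statistic(104.0, 100.0, se)
print(se, z)
```

Here the observed mean lies two standard errors above the hypothesised mean, so Z = 2.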
2) Composite hypothesis (Definition) : It is composed of many simple hypotheses and hence does not specify the population completely.
For example, if x₁, x₂, ..., xₙ is a random sample of size n from a population having mean μ and variance σ², then the hypothesis
H₀ : μ = μ₀, σ² = σ₀² is a simple hypothesis,
whereas
a) H : μ = μ₀ (where σ² is unknown)
b) H : σ² = σ₀² (where μ is unknown)
c) H : μ < μ₀
d) H : μ > μ₀
e) H : μ = μ₀, σ² < σ₀²
f) H : μ = μ₀, σ² > σ₀²
g) H : μ < μ₀, σ² > σ₀²
are all composite hypotheses.

5.3 Test of Statistical Hypothesis (Test of Significance)
• The method by which one can decide whether to accept or reject a hypothesis is known as 'test of hypothesis' or 'hypothesis testing'. There are two types of hypotheses :
1) Null hypothesis 2) Alternate hypothesis

5.4 Null and Alternate Hypothesis
1) Null hypothesis :
• It is a hypothesis which is tested for possible rejection under the assumption that it is true.
For example : If we want to find out whether extra coaching has benefited the students or not, we shall set up the null hypothesis that "extra coaching has not benefited the students."
• The null hypothesis is a very useful tool in testing the significance of difference. In other words, the null hypothesis states that there is no real difference between the sample and the population in the particular characteristic under consideration. Therefore it is also defined as the "hypothesis of no difference". The null hypothesis is denoted by H₀.
2) Alternate hypothesis :
• The hypothesis that is different from the null hypothesis is called the alternate hypothesis.
For example : If we want to test the null hypothesis that "the average percentile of students in a college has specified mean 70", then we have
H₀ : μ = 70
and the alternate hypothesis, denoted by H₁ or Hₐ, is one of
H₁ : μ ≠ 70 (i.e. μ < 70 or μ > 70) (two tailed alternate hypothesis)
H₁ : μ < 70 (left / single tailed alternate hypothesis)
H₁ : μ > 70 (right / single tailed alternate hypothesis)
Note :
1) H₀ always has an "equal" sign (and possibly a "less than" or "greater than" symbol, depending on the alternative hypothesis).
2) The null and alternate hypotheses are stated together.

Solved Example
Solution : We need to test that the average birth weight is 8.6 pound.
H₀ : μ = 8.6
H₁ : μ > 8.6

5.5 Types of Error
• Acceptance or rejection of the null hypothesis depends on the sampling while making any conclusion about the null hypothesis. There may arise two types of error :
Type I error : Type I error happens when the null hypothesis (H₀) is rejected even though it is true.
Type II error : Type II error happens when the null hypothesis (H₀) is accepted when it is false.
These errors can be summarized in tabular form as :

Decision | H₀ is true | H₀ is false
Accept H₀ | Correct decision | Type II error
Reject H₀ | Type I error | Correct decision

• The aim of hypothesis testing is to minimize both types of errors. But in practice it is not possible to reduce both errors simultaneously. The reduction of one type of error usually increases the other type of error. Therefore, a compromise must be found to limit the more serious error.

5.6 Critical Region
• Hypothesis testing involves determining a region such that the hypothesis will be rejected if the sample point falls in the region and accepted if the sample point lies outside the region. This region of rejection is called the "critical region". It is denoted by W, and the acceptance region is denoted by W'.
• The value of the test statistic which separates the critical region from the acceptance region is called the "critical value" or significance value.
One tailed test :
• If the critical region lies on one side / tail of the distribution of the test statistic, then such tests are called one tailed / one sided tests.
• If the critical region lies on the left side of the distribution then it is known as a left tailed test.
• For a right tailed test the critical region lies on the right side.
Two tailed test :
• If the critical region is located in both tails of the distribution, then such tests are called two tailed tests.

Fig. 5.6.1 : Acceptance and critical regions for (i) a left tailed test, (ii) a right tailed test and (iii) a two tailed test.

Area of the critical region :
i) For a one tailed test :
Let Z be the test statistic and Zα be the critical value at level of significance α. Then for a right tailed test
P(Z > Zα) = α
and for a left tailed test
P(Z < −Zα) = α
i.e. the area of the critical region in either tail (left or right) is α.
ii) For a two tailed test :
P(Z < −Zα) + P(Z > Zα) = α   ... (5.6.1)
But the normal curve is symmetric, so
P(Z < −Zα) = P(Z > Zα)   ... (5.6.2)
From (5.6.1) and (5.6.2),
2 P(Z > Zα) = α
P(Z > Zα) = α/2

i.e. the area of the critical region in each tail is α/2. Thus the total area of the critical region for a two tailed test is α.
Note :
1) The critical value Zα for a one tailed test (left or right) at significance level α is equal to the critical value Zα for a two tailed test at significance level 2α.
2) Standard critical values at common levels of significance :
Level of significance | 1 % | 5 % | 10 %
Two tailed test |Zα| | 2.58 | 1.96 | 1.645
Right tailed test Zα | 2.33 | 1.645 | 1.28
Left tailed test −Zα | −2.33 | −1.645 | −1.28

5.7 Level of Significance
• The probability of Type I error is known as the level of significance (l.o.s.) and is denoted by "α". It is also known as the "size of the test".
• In many hypothesis testing situations, the consequences of Type I error are more serious than those of Type II error. Therefore the probability of Type I error, i.e. α, is fixed at a low level, generally at 5 % or 1 %, which means the probability of accepting a correct hypothesis is 95 % or 99 %.
• Mathematically,
α = P[Rejecting H₀ | H₀ is true] = P[Reject H₀ | H₀] = P[Type I error]

5.8 Power of Test
• The probability of Type II error is denoted by β, i.e. β is the probability of accepting a wrong null hypothesis. Therefore 1 − β gives the probability of rejecting the null hypothesis when it is false.
β = P[Accept H₀ | H₀ is false] = P[Accept H₀ | H₁] = P[Type II error]
• This probability (i.e. 1 − β) is called the power of the test.
• The power of the test helps to measure the goodness of the test (i.e. how well the test is working).
Note :
1) The smaller the value of β, the larger the value of 1 − β, i.e. the chances of rejecting a false hypothesis are greater, and thus the test works quite well.
2) A low value of 1 − β (near to zero) means that the test is working very poorly.
• The following factors primarily affect the power of a test :
I) Level of significance (α) : If the l.o.s. (α) is increased, keeping all other parameters constant, then the power of the test also increases, since a larger value of α means a larger critical region (rejection region) and thus the probability of rejecting the null hypothesis H₀ is greater.
II) Sample size (n) : The power of the test increases as the sample size increases.
III) Effect size : It is a number that measures the strength of the relationship between two variables of the population.
Effect size = Hypothetical value of parameter − Actual value of parameter
It is also known as the "magnitude of effect". A larger effect size enhances the power of the test.

Examples on α and β

Example 5.8.1 : Let X ~ P(λ). For testing H₀ : λ = 1 v/s H₁ : λ = 2 the test function is
φ(x) = 1 when x > 3, φ(x) = 0 when x ≤ 3.
Find the probability of Type I error.
Solution : Given X ~ P(λ). The hypothesis testing problem is
H₀ : λ = 1 v/s H₁ : λ = 2
We want to find the probability of Type I error. By using the definition,
P(Type I error) = E_H₀[φ(x)]
= P[x > 3 | X ~ P(λ = 1)]
= 1 − P[x ≤ 3 | X ~ P(λ = 1)]
= 1 − {P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)}
= 1 − {e⁻¹ 1⁰/0! + e⁻¹ 1¹/1! + e⁻¹ 1²/2! + e⁻¹ 1³/3!}
= 1 − (8/3) e⁻¹

Example 5.8.2 : A coin is tossed 5 times and H₀ : p = 1/2 is rejected if more than 3 heads are obtained. Find the probability of Type I error, and the probability of Type II error and the power of the test against H₁ : p = 3/4.
Solution : Given H₀ : p = 1/2, H₁ : p = 3/4, n = 5.
Let X denote the number of heads in n tosses. Then by the binomial distribution
P(X = x) = ⁿCₓ pˣ qⁿ⁻ˣ, q = 1 − p
H₀ is rejected if more than 3 heads are obtained. Therefore the critical region is given by
W = {x | x ≥ 4} and the acceptance region is W' = {x | x ≤ 3}.
Probability of Type I error = P[Reject H₀ | H₀]
α = P[X ≥ 4 | p = 1/2]
= P[X = 4 | p = 1/2] + P[X = 5 | p = 1/2]
= ⁵C₄ (1/2)⁴ (1/2) + ⁵C₅ (1/2)⁵
= 5(1/32) + 1(1/32) = 6/32
∴ α = 3/16
β = Probability of Type II error = P[Accept H₀ | H₁]
= P[X ≤ 3 | p = 3/4]
= 1 − P[X ≥ 4 | p = 3/4]
= 1 − {⁵C₄ (3/4)⁴ (1/4) + ⁵C₅ (3/4)⁵}
= 1 − 648/1024 = 376/1024
Then the power of the test = 1 − β = 1 − 376/1024 = 648/1024.

Example 5.8.3 : A die is thrown 2 times and H₀ : p = P(even number) = 1/2 is rejected if an even number is shown fewer than 2 times. Find the probability of Type I error, and the probability of Type II error against H₁ : p = 1/4.
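The α and β values of the coin example (as reconstructed here, with the alternative taken as H₁ : p = 3/4) can be checked exactly with rational arithmetic:

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, x, p):
    # Exact binomial probability with Fraction arithmetic
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n = 5
p0 = Fraction(1, 2)   # H0: fair coin
p1 = Fraction(3, 4)   # H1 value assumed for the beta calculation
# Critical region W = {x : x >= 4}
alpha = sum(binom_pmf(n, x, p0) for x in range(4, n + 1))
beta = sum(binom_pmf(n, x, p1) for x in range(0, 4))
print(alpha, beta, 1 - beta)
```

The exact fractions agree with the hand computation: α = 3/16 and β = 376/1024 = 47/128, giving power 81/128.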
Solution : H₀ : p = 1/2, H₁ : p = 1/4, n = 2.
Let X be a random variable that denotes the number of times an even number is shown. Then by the binomial distribution
P(X = x) = ⁿCₓ pˣ qⁿ⁻ˣ = ⁿCₓ pˣ (1 − p)ⁿ⁻ˣ
We reject H₀ if the number of times that an even number is shown is less than 2.
∴ The critical region is W = {x | x < 2} and the acceptance region is W' = {x | x = 2}.
α = P[Reject H₀ | H₀]
= P[x < 2 | p = 1/2]
= 1 − P[x = 2 | p = 1/2]
= 1 − ²C₂ (1/2)² (1/2)⁰
= 1 − 1/4
∴ α = 3/4
β = P[Accept H₀ | H₁]
= P[x = 2 | p = 1/4]
= ²C₂ (1/4)² (1 − 1/4)⁰ = 1/16
∴ Power of the test = 1 − β = 15/16.

5.9 General Procedure Followed in Testing of Statistical Hypothesis
Step I : Set up the null hypothesis H₀ clearly.
Step II : Specify an alternate hypothesis H₁ in an appropriate way.
Step III : Choose a level of significance (α).
Step IV : State and compute the appropriate test statistic (Z) under H₀.
Step V : Compare the test statistic value (Z) with Zα (the critical value) at level of significance α. Draw a conclusion about the rejection or acceptance of H₀.
Note : 1. If |Z| > Zα, we reject H₀. 2. If |Z| ≤ Zα, we accept H₀.

5.10 Test of Significance
Depending on the sample size, we categorise testing of significance into two types :
1. Test of significance for large samples (sample size n > 30)
2. Test of significance for small samples (sample size n ≤ 30)

5.10.1 Test of Significance for Large Samples
If the sample size n > 30, the sample is taken as a large sample. For these samples we apply normal tests, as the Binomial, Poisson, Chi-square etc. distributions are approximated by normal distributions (considering the population as normal).
Following are the important tests of significance for large samples :
1. Testing of significance for single proportion.
2. Testing of significance for difference of proportions.
3. Testing of significance for single mean.
4. Testing of significance for difference of means.

5.10.2 Test of Significance for Single Proportion
We use this test to check the significant difference between the proportion of the sample and the population.
Notations used :
X - number of successes in n independent trials
P - probability of success for each trial
Q - probability of failure, Q = 1 − P
p = X/n - observed proportion of success in the sample
Since X is binomial, E(X) = nP and V(X) = nPQ.
Then E(p) = E(X/n) = (1/n) E(X) = (1/n)(nP) = P
V(p) = V(X/n) = (1/n²) V(X) = (1/n²)(nPQ) = PQ/n
∴ S.E.(p) = √(PQ/n)
Then the test statistic is given by
Z = (p − E(p)) / S.E.(p) = (p − P) / √(PQ/n) ~ N(0, 1)
Note :
1) The probable limits for the observed proportion of success are p ± Zα √(PQ/n), where Zα is the significant value at level of significance α.
2) If P is not known, the limits for the proportion in the population are p ± Zα √(pq/n), where q = 1 − p.
3) If α is not given, we can safely take 3σ limits; hence the confidence limits for the observed proportion p are p ± 3√(pq/n).

Solved Examples
Example : A coin is tossed 400 times and it shows heads 216 times. Discuss whether the coin may be unbiased.
Solution : Probability of getting a head = 1/2 = 0.5 = P
H₀ : The coin is unbiased (i.e. P = 0.5)
H₁ : The coin is not unbiased (P ≠ 0.5)
Here n = 400, X = 216 (number of successes, i.e. getting heads)
p = proportion of success in the sample = X/n = 216/400 = 0.54
Population proportion P = 0.5, Q = 1 − P = 1 − 0.5 = 0.5
Test statistic :
Z = (p − P)/√(PQ/n) = (0.54 − 0.5)/√(0.5 × 0.5/400) = 0.04/0.025 = 1.6
Since |Z| = 1.6 < 1.96 = Zα, i.e. |Z| < Zα where Zα is the critical value at 5 % l.o.s.,
H₀ is accepted, i.e. the coin is unbiased.

Example : A die is thrown 9000 times and a throw of 3 or 4 is observed 3240 times. Test whether the die can be regarded as unbiased.
Solution : Here,
H₀ : The die is unbiased, i.e. P = P(3 or 4) = 1/3
H₁ : The die is biased, i.e. P ≠ 1/3
p = observed proportion = 3240/9000 = 0.36
Test statistic :
Z = (p − P)/√(PQ/n) = (0.36 − 1/3)/√((1/3 × 2/3)/9000) = 0.0267/0.00497 = 5.37
Since |Z| = 5.37 > 1.96 = Zα (critical value at 5 % l.o.s.),
H₀ is rejected, i.e. the die is biased.
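A minimal sketch of the single-proportion test, applied to the 400-toss coin figures above (216 heads against a hypothesised P = 0.5):

```python
import math

def proportion_z(successes, n, p0):
    # Z = (p - P) / sqrt(PQ/n) for a single observed proportion
    p = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p - p0) / se

z = proportion_z(216, 400, 0.5)  # the 400-toss coin example
print(round(z, 2))
```

This reproduces Z = 1.6, which is below the two-tailed 5 % critical value 1.96, so H₀ (an unbiased coin) is retained.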

Example : In a sample of 600 bolts, 36 were found to be defective. Can we accept the claim that 4 % of the products are defective ?
Solution : Here n = 600, X = 36
p = proportion of defective bolts = 36/600 = 0.06
P = proportion of defectives in the population = 4 % = 0.04, Q = 0.96
H₀ : 4 % of the products are defective (i.e. P = 0.04)
H₁ : P ≠ 0.04
Test statistic :
Z = (p − P)/√(PQ/n) = (0.06 − 0.04)/√(0.04 × 0.96/600) = 2.5
2.5 > 1.96 = Zα (l.o.s. at 5 %)
Since |Z| > Zα, H₀ is rejected, i.e. the proportion of defective products is more than 4 %.

Example : 40 apples out of a sample of 400 are found to be bad. Find the limits for the proportion of bad apples in the population.
Solution : Given X = 40, n = 400
p = proportion of bad apples = 40/400 = 0.1, q = 1 − p = 0.9
Since the confidence level is not given, we assume it is 95 %, so the level of significance α = 5 % and Zα = 1.96.
Also, the population proportion P is not provided, so the limits are
p ± Zα √(pq/n) = 0.1 ± 1.96 √(0.1 × 0.9/400)
= 0.1 ± 0.0294
i.e. the limits are (0.0706, 0.1294).

5.10.3 Testing of Significance for Difference of Proportions
Let X₁ and X₂ be two samples taken from two different populations, and let n₁ and n₂ be the sizes of X₁ and X₂ respectively. If p₁ and p₂ are the proportions of success in the samples X₁ and X₂ respectively, then to test the significance of the difference between p₁ and p₂ we use the test statistic
Z = (p₁ − p₂) / √( PQ (1/n₁ + 1/n₂) )
where P = (p₁n₁ + p₂n₂)/(n₁ + n₂) and Q = 1 − P.

Example : Before overhauling, a machine produced 16 defectives in a batch of 500. After overhauling it produced 3 defectives in a batch of 100. Has the machine improved ?
Solution : Given n₁ = 500, n₂ = 100
p₁ = 16/500 = 0.032
p₂ = 3/100 = 0.03
P = (p₁n₁ + p₂n₂)/(n₁ + n₂) = ((0.032)(500) + (0.03 × 100))/600 = 19/600 = 0.032 (approx.)
Q = 1 − P = 1 − 0.032 = 0.968
Since we have to test whether the machine has improved or not, we state the null hypothesis as
H₀ : The machine has not improved after overhauling, i.e. P₁ = P₂
H₁ : P₁ > P₂ (right tailed)
Under H₀,
Z = (0.032 − 0.03)/√(0.032 × 0.968 × (1/500 + 1/100)) = 0.104 < 1.645 = Zα (α = 5 %)
Conclusion : As |Z| < Zα at 5 % l.o.s., H₀ is accepted, i.e. the machine has not improved.
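The pooled two-proportion statistic can be sketched directly from the counts of the machine-overhaul example (16 of 500 before, 3 of 100 after):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    # Pooled two-proportion Z test:
    # Z = (p1 - p2) / sqrt(P*Q*(1/n1 + 1/n2)), with P = (x1 + x2)/(n1 + n2)
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(16, 500, 3, 100)  # machine overhaul example
print(round(z, 3))
```

The result matches the hand calculation, Z ≈ 0.104, well below the right-tailed 5 % critical value 1.645.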

Example : Before an increase in excise duty on coffee, 800 persons out of a sample of 1000 were coffee drinkers. After the increase in duty, 800 persons were coffee drinkers in a sample of 1200. Test whether there is a significant decrease in the consumption of coffee after the increase in excise duty.
Solution : Given n₁ = 1000, n₂ = 1200
p₁ = 800/1000 = 4/5, p₂ = 800/1200 = 2/3
P = (p₁n₁ + p₂n₂)/(n₁ + n₂) = (800 + 800)/2200 = 1600/2200 = 8/11
Q = 1 − P = 3/11
To test the difference in coffee consumption of people, we set the null hypothesis as
H₀ : P₁ = P₂, i.e. there is no difference in the consumption of coffee before and after the increase in excise duty
H₁ : P₁ > P₂ (right tailed test)
Under H₀ : Z = (p₁ − p₂)/√(PQ(1/n₁ + 1/n₂)) = (4/5 − 2/3)/√((8/11)(3/11)(1/1000 + 1/1200)) = 6.99
6.99 > 1.645 = Zα (α = 5 %)
Conclusion : As |Z| > Zα at 5 % l.o.s., H₀ is rejected, i.e. there is a significant decrease in the consumption of coffee after increasing the excise duty.

Example : In a random sample of 400 men and 600 women, 200 men and 325 women were in favour of a proposal. Test whether the proportions of men and women in favour of the proposal are the same.
Solution : Given
p₁ = 200/400 = 0.5, p₂ = 325/600 = 0.5417
P = (p₁n₁ + p₂n₂)/(n₁ + n₂) = (200 + 325)/(400 + 600) = 0.525
Q = 1 − P = 0.475
H₀ : The proportion of men and women in favour of the proposal is the same, i.e. P₁ = P₂
H₁ : P₁ ≠ P₂ (two tailed test)
Under H₀, Z = (0.5 − 0.5417)/√(0.525 × 0.475 × (1/400 + 1/600)) = −1.29
|Z| = 1.29 < 1.96 = Zα
Conclusion : As |Z| < Zα at 5 % l.o.s., H₀ is accepted.

5.10.4 Test of Significance for Single Mean
Let x₁, x₂, ..., xₙ be a random sample of size n from a large population of size N with mean μ and variance σ². The standard error of the mean of a random sample of size n from a population with variance σ² is σ/√n.
To test whether the given sample of size n has been drawn from a population with mean μ, that is, to test whether the difference between the sample mean and the population mean is significant or not, under the null hypothesis that there is no difference between the sample mean and the population mean, the test statistic is
Z = (X̄ − μ)/(σ/√n)
where σ is the standard deviation of the population.
Note : If σ is unknown, we use the test statistic
Z = (X̄ − μ)/(S/√n)
where S is the standard deviation of the sample.

Example : A sample of 400 items has mean 6.75. Can it be regarded as drawn from a population with mean 6.8 and standard deviation 1.5 ?
Solution : Given data : μ = 6.8, σ = 1.5, X̄ = 6.75 and n = 400.
Define
H₀ : There is no significant difference between X̄ (sample mean) and μ (population mean),

i.e. = As IZ| = 10.54>Z, = 1.96 at 5% level


H, : there is significant difference between X and H, isrejected at both 1 % to 5 % level of significance.
Le WE,
Test statistic : en bp 10. wp

- | X= 6.75 -6.8) _
Z| = = =|-0.
2) o/\ſn] | 1.5 /.f900 | 0.67]
Solution-: Given data :
Z = 0.67
. ASH, Is accepted }Z| = 0.67 <Z,=1.96 at5% level of significance, n= 100,X =51,S=6, n= 50
There is no significant difference between the sample mean and the population mean. o - Unknown
H, : The sample is drawn from the population with mean þp= 50.
H, 7H #50

Under H, : Oz Z| zs
$7 vn 6/100 _10_
_ |-51=50| 6 1.666
Solution : Given data, n= 400, þ = 162 5 cms., X = 160 cms, 6=4.5 cms, Since [Z] = 1.666 <Z, =1.96
. Hh is accepted
Note :
© : $:D, of population, Therefore the sample is drawn, from the population with mean p = 50
xX : Sample mean, | EXIT Test of Significance for Two Means
L. : Population mean, 0
_ : . . . 2 =
ns Sample Size}
n: Let x, be the mean of sample size n, from the population with Þ, and variance & , . Let x,
: Sample drawn from the population with mean Þ = 162.5, be the mean? of an independent sample of size n, from another popuiation with mean p, and
2 : '
by us 162. 5 variance & , . The test statistics is given by

Here
Xp 160 — 162. = | -
ere [Z] = o/\ſn = . I. > 29 11.11

As iz] = 11.11>Z, =1.96


At 5 % level of si nificance Hy is rejected.

Solution : Given data, n= 1000, o= 6, u = 160, X= 158


here, H, : The sample drawn from the population with mean = 160. ‘Under the null hypothesis that the samples are drawn from the same population there
H, : p# 160 may arise two cases. |
|
X = —158 — 160
Undernder H Hy 12} = |S]
oa = « 1xf7000 =|.[-10.54 |= 10,54
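A sketch of the two-mean Z statistic, using the figures from the boys-and-girls intelligence example worked nearby (means 49 and 47, sample variances 784 and 1600, sizes 1000 and 1500):

```python
import math

def two_mean_z(x1, var1, n1, x2, var2, n2):
    # Z = (x1 - x2) / sqrt(var1/n1 + var2/n2)
    return (x1 - x2) / math.sqrt(var1 / n1 + var2 / n2)

z = two_mean_z(49, 784, 1000, 47, 1600, 1500)  # intelligence-test example
print(round(z, 2))
```

This reproduces Z ≈ 1.47 < 1.96, so no significant difference between the two means at the 5 % level.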


Case I : σ₁ = σ₂ = σ
The test statistic is given by
Z = (x̄₁ − x̄₂)/(σ √(1/n₁ + 1/n₂))
Case II : If σ₁ = σ₂ = σ and σ is unknown, we estimate
σ² = (n₁s₁² + n₂s₂²)/(n₁ + n₂)
and use the test statistic Z = (x̄₁ − x̄₂)/(σ̂ √(1/n₁ + 1/n₂)).

Example : An intelligence test was given to 1000 boys and 1500 girls. The boys scored a total of 49000 marks with Σ(x₁ − x̄₁)² = 784000, and the girls a total of 70500 marks with Σ(x₂ − x̄₂)² = 2400000. Is there a significant difference between the mean intelligence of boys and girls ?
Solution : H₀ : There is no significant difference between the mean scores of boys and girls, i.e. μ₁ = μ₂
H₁ : μ₁ ≠ μ₂
We need to calculate the sample variances :
s₁² = Σ(x₁ − x̄₁)²/n₁ = 784000/1000 = 784
s₂² = Σ(x₂ − x̄₂)²/n₂ = 2400000/1500 = 1600
Now we calculate the sample means :
x̄₁ = Σx₁/n₁ = 49000/1000 = 49
x̄₂ = Σx₂/n₂ = 70500/1500 = 47
The test statistic
Z = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂) = (49 − 47)/√(784/1000 + 1600/1500) = 2/1.360 = 1.47
As |Z| = 1.47 < 1.96 at 5 % level of significance, H₀ is accepted.
∴ There is no significant difference between the means of the intelligence of boys and girls.
Example : For college B : n₂ = 60, x̄₂ = 79, S₂ = 7.
Define H₀ and H₁ and solve.
Ans. : H₀ : rejected

5.10.6 Test of Significance for Small Samples
When the sample size is less than 30, the sample is called a small sample. The following two tests are used for small samples to test the significance :
1) Student's t-test (t-test) 2) F-test (Snedecor's variance test)

t-test : (for the mean of one sample)
The t-test is a small sample test. It was developed by William Gosset in 1908 and is also called Student's t-test. The t-test assesses whether the means of two groups are statistically different from each other when the population standard deviation is unknown.
If x₁, x₂, ..., xₙ is a random sample from the normal population N(μ, σ²), then the t-statistic is defined as
t = (X̄ − μ)/(S/√n)
where X̄ = (1/n) Σ xᵢ and S² = Σ(xᵢ − X̄)²/(n − 1), with (n − 1) degrees of freedom.

Example : The mean height of a random sample of 16 newcomers is 1.67 m with standard deviation 0.16 m. The mean height of the student population of the previous year was 1.60 m. Test whether the mean height of the newcomers differs significantly.
Solution : H₀ : There is no significant difference between the mean height of the newcomers and the mean height of the student population of the previous year,
i.e. H₀ : μ = 1.60, H₁ : μ ≠ 1.60
Here n = 16, X̄ = 1.67, μ = 1.60, S = 0.16
t = (X̄ − μ)/(S/√n) = (1.67 − 1.60)/(0.16/√16) = 1.75
Here degrees of freedom = n − 1 = 16 − 1 = 15; from the t-table, |t| = 1.75 < t₀.₀₅,₁₅ = 2.13
∴ H₀ is accepted at 5 % l.o.s.
Therefore we conclude that the mean height of the newcomers is not significantly different from the mean height of the student population of the previous year.
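The one-sample t statistic can be sketched with the newcomers' height figures (X̄ = 1.67, μ = 1.60, S = 0.16, n = 16):

```python
import math

def t_statistic(xbar, mu, s, n):
    # t = (xbar - mu) / (S / sqrt(n)), with n - 1 degrees of freedom
    return (xbar - mu) / (s / math.sqrt(n))

t = t_statistic(1.67, 1.60, 0.16, 16)  # newcomers' height example
print(round(t, 2))
```

This reproduces t = 1.75 on 15 degrees of freedom, below the tabulated t₀.₀₅,₁₅ = 2.13.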


Example : The mean weight of the contents of 12 packages is 0.9883 kg with Σ(x − x̄)² = 0.011967. Can the average weight of the contents be taken as 1 kg ?
Solution : Here n = 12, X̄ = 0.9883, μ = 1
S² = Σ(x − x̄)²/(n − 1) = 0.011967/11 = 0.001088, so S = 0.032983
t = (X̄ − μ)/(S/√n) = (0.9883 − 1)/(0.032983/√12) = −1.228815
∴ |t| = 1.228815 < t₀.₀₅ = 2.201 for d.f. 11
∴ H₀ is accepted at 5 % level of significance.
∴ The average weight of the contents of a package can be taken as 1 kg.

t-Test for Comparison of Means of Two Samples
Let x₁, x₂, ..., x_n₁ and x′₁, x′₂, ..., x′_n₂ be the values of two random samples of sizes n₁ and n₂ respectively from the same normal population N(μ, σ²). Then the t-test statistic is
t = (x̄₁ − x̄₂)/(S √(1/n₁ + 1/n₂))
where x̄₁ and x̄₂ are the two sample means and
S² = [Σ(x₁ − x̄₁)² + Σ(x₂ − x̄₂)²]/(n₁ + n₂ − 2)
with n₁ + n₂ − 2 degrees of freedom.
Note :
1) If the two sample standard deviations S₁, S₂ are given, then we have
S² = (n₁S₁² + n₂S₂²)/(n₁ + n₂ − 2)
2) Let x₁, ..., x_n₁ and x′₁, ..., x′_n₂ be the values of two random samples from two different normal populations N(μ₁, σ²) and N(μ₂, σ²) having means μ₁ and μ₂ but the same standard deviation σ; then the t-test statistic takes the same form, with n₁ + n₂ − 2 degrees of freedom.

Example : Two yields gave the following results : n₁ = 10, x̄₁ = 8.0, Σ(x₁ − x̄₁)² = 0.66 and n₂ = 8, x̄₂ = 7.6, Σ(x₂ − x̄₂)² = 0.80. Test whether the mean yields differ significantly.
Solution : H₀ : μ₁ = μ₂ and H₁ : μ₁ ≠ μ₂
Here n₁ = 10, x̄₁ = 8.0, Σ(x₁ − x̄₁)² = 0.66; n₂ = 8, x̄₂ = 7.6, Σ(x₂ − x̄₂)² = 0.80
S² = (0.66 + 0.80)/16 = 0.09125
t = (x̄₁ − x̄₂)/(S√(1/n₁ + 1/n₂)) = (8.0 − 7.6)/(√0.09125 × √(1/10 + 1/8)) = 2.79
Degrees of freedom = n₁ + n₂ − 2 = 10 + 8 − 2 = 16
For 16 degrees of freedom, the probability of getting a value of t as high as 2.79 is less than 0.05. Therefore at the 5 % level of significance we reject the null hypothesis. There is a significant difference in the mean yields.

Example : Given n₁ = 10, x̄₁ = 20.3, S₁ = 3.5 and n₂ = 14, x̄₂ = 18.6, S₂ = 5.2, test the significance of the difference of the means.
Solution : H₀ : μ₁ = μ₂, H₁ : μ₁ ≠ μ₂
S² = (n₁S₁² + n₂S₂²)/(n₁ + n₂ − 2) = (10(3.5)² + 14(5.2)²)/(10 + 14 − 2) = 22.775
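The pooled two-sample t statistic can be sketched with the yield-comparison figures above (means 8.0 and 7.6, sums of squared deviations 0.66 and 0.80):

```python
import math

def pooled_t(x1, ss1, n1, x2, ss2, n2):
    # Two-sample t with pooled variance:
    # S^2 = (sum sq dev sample 1 + sum sq dev sample 2) / (n1 + n2 - 2)
    s_sq = (ss1 + ss2) / (n1 + n2 - 2)
    return (x1 - x2) / math.sqrt(s_sq * (1 / n1 + 1 / n2))

t = pooled_t(8.0, 0.66, 10, 7.6, 0.80, 8)  # yield comparison example
print(round(t, 2))
```

This reproduces t = 2.79 on 16 degrees of freedom, significant at the 5 % level.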


S = 4.772
t = (x̄₁ − x̄₂)/(S√(1/n₁ + 1/n₂)) = (20.3 − 18.6)/(4.772 √(1/10 + 1/14)) = 0.8604
|t| = 0.8604 < t₀.₀₅ = 2.07 for 22 degrees of freedom
∴ H₀ is accepted.

F-Test
Let x₁, x₂, ..., x_n₁ and x′₁, x′₂, ..., x′_n₂ be two independent random samples from two normal populations N(μ₁, σ₁²) and N(μ₂, σ₂²) respectively. Then the statistic of the F-distribution is
F = S₁²/S₂², where S₁² > S₂²
where
S₁² = Σ(x₁ − x̄₁)²/(n₁ − 1) and S₂² = Σ(x₂ − x̄₂)²/(n₂ − 1)
(n₁ and n₂ are the sizes of the two samples with variances S₁² and S₂².)
Note :
1. In the F-test it is very important to note that under the null hypothesis the population variances are equal, i.e. σ₁² = σ₂².
2. The degrees of freedom are (n₁ − 1, n₂ − 1).
Test result : If the calculated value of F exceeds F₀.₀₅ for (n₁ − 1, n₂ − 1) degrees of freedom from the given F-value table, we conclude that the ratio is significant at the 5 % level.
Remark : When ready-made data is given instead of long tabular data, we use
S₁² = (Σx₁² − n₁x̄₁²)/(n₁ − 1), S₂² = (Σx₂² − n₂x̄₂²)/(n₂ − 1)

Example : For two samples, Σ(x₁ − x̄₁)² = 144 with n₁ = 10 and Σ(x₂ − x̄₂)² = 264 with n₂ = 13. Compute the F-statistic.
Solution :
S₁² = 144/(10 − 1) = 16
S₂² = 264/(13 − 1) = 22
F test statistic (larger variance in the numerator) :
F = S₂²/S₁² = 22/16 = 1.375
Degrees of freedom = (n₂ − 1, n₁ − 1) = (13 − 1, 10 − 1) = (12, 9)

Example : For sample I : n₁ = 8, x̄₁ = 1.2, Σx₁² = 61.52
S₁² = (Σx₁² − n₁x̄₁²)/(n₁ − 1) = (61.52 − 8(1.2)²)/7 = 7.14
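The variance-ratio computation can be sketched, always placing the larger sample variance in the numerator, using the 144/264 example above:

```python
def f_statistic(ss1, n1, ss2, n2):
    # F = S1^2 / S2^2 with the larger sample variance in the numerator;
    # each S^2 = sum of squared deviations / (n - 1)
    v1, v2 = ss1 / (n1 - 1), ss2 / (n2 - 1)
    if v1 >= v2:
        return v1 / v2, (n1 - 1, n2 - 1)
    return v2 / v1, (n2 - 1, n1 - 1)

F, df = f_statistic(144, 10, 264, 13)  # S1^2 = 16, S2^2 = 22
print(round(F, 3), df)
```

The helper also returns the degrees-of-freedom pair in the order matching the ratio, here (12, 9).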

2 - —~ 2 | oh For Sample I .
—1(1X))
2x, n,— —11.(1
_ 73.26 11- .5) == 4.85 \ | .
2 4,

N
1:

A
=

>

ON
©
i,R

=oO
pe
~

©
8,

i
i

l
mo
-. F test statistics is. = | . I —— = 12.057

I
wn
i
o " __F= = (-6,=6, ) (Sample drawn from the same population or

from two population with same variance)


12.057
= FF = 1.0576.

ons Here the tabulated value of F at 5.1 level of significance with degrees of freedom.
;
(n, —1, 2, —1) (7, 9) is 3.29
. .
Solution :
[F| = 1.057 <Fo95 = 3.29
= 4
. ,
here
1.

gut S
2
209 _,
22-1 $1 : , = => H, is accepted.
1 n,-1 -
eo
2 _ 198, _16(3.8)
S HET 16-1 © 15
Test statistics
2
S\ 15.4
F= +=
$, 8.81
=1.75 . -

Degrees of freedom = n,—1,n,-1=16~1,22-1= 15,21


Ii

ay -djr016h

Mm
Example Ty Solution :
H, : The stability of both workers is same 0; = 9,
H, : Worker A is more stable than worker B.
n, = 6 n, =8
Solution : . .
1 2 .2
: .
6, = ©,
. . . IX 222
_ Hy: There is no significant difference between population variance Le.
H, : There is significant difference between population variance.
_ _ 2X; i _ 315 315 39.38
I ;.
For sample
,
n, = 8 = 84.4
Dx,-%,)

=_ TECHNICAL PUBLICATIONS® - an up-thrust for knowledge


TECHNICAL PUBLICATIONS® - an up-thrust for knowledge
Statistics inferential Statistics: Hypothesis
Statistics

Now we will write table value of F


WorkerA
Fi,o905.75 = 4.88
F 2.80 < Fy 0.05 = 4.88

.. H, is accepted,
i.e. The stability of both workers is same.

Case Study : Do dieters lose more fat than exercisers?
Diet only :
Sample mean : 5.5 kg
Sample standard deviation : 4.2 kg
Sample size : n = 46
Standard error for dieters = 4.2/√46 = 0.62
Exercise only :
Sample mean : 4.2 kg
Sample standard deviation : 3.8 kg
Sample size : n = 52
Standard error for exercisers = 3.8/√52 = 0.53
Measure of variability (standard error of the difference) = √((0.62)² + (0.53)²) = 0.8156

Testing the hypothesis
Step I : Determination of the null and alternative hypotheses
H₀ : There is no difference in average fat lost in the population for the two methods; the population mean difference is zero.
H₁ : There is a difference in average fat lost in the population for the two methods; the population mean difference is non-zero.
Step II : Test statistic
The sample mean difference = 5.5 − 4.2 = 1.3 kg
and the standard error of the difference is 0.8156 ≈ 0.82.
So the test statistic is
Z = (1.3 − 0)/0.82 = 1.5853 ≈ 1.59
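Steps I and II can be reproduced in a few lines (a sketch; note that the text rounds the two standard errors to 0.62 and 0.53 before combining them, which gives 0.8156, while the unrounded combination gives about 0.813 - the conclusion is the same either way):

```python
import math

# Summary data from the case study
mean_diet, sd_diet, n_diet = 5.5, 4.2, 46
mean_ex, sd_ex, n_ex = 4.2, 3.8, 52

se_diet = sd_diet / math.sqrt(n_diet)            # standard error, dieters
se_ex = sd_ex / math.sqrt(n_ex)                  # standard error, exercisers
se_diff = math.sqrt(se_diet ** 2 + se_ex ** 2)   # s.e. of the difference

z = (mean_diet - mean_ex - 0) / se_diff          # null difference is zero
accept_H0 = abs(z) < 1.96                        # two-sided test at 5 %
```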
Step III : Decision
α = 0.05, i.e. 5 %.
Since Z = 1.59 < 1.96 = Z_α, we accept H₀, the null hypothesis, and conclude that there is no significant difference between the two methods.

5.11 Chi-square Test
When any experiment is performed, it is observed that theoretical considerations and observed data are not the same. There is always some difference between the observed data and the theoretical data. The χ² (chi-square) test helps to measure the magnitude of this discrepancy between observed and expected data.
The chi-square test is the most commonly used test to check the goodness of fit of a theoretical distribution to an actual distribution.
The test statistic for the χ²-test is given by
χ² = Σᵢ₌₁ⁿ (Oᵢ − Eᵢ)²/Eᵢ   … (5.11.1)
where Oᵢ (i = 1, 2, …, n) are the observed frequencies and Eᵢ (i = 1, 2, …, n) are the expected frequencies, such that
Σ Oᵢ = Σ Eᵢ = N (total frequency)
Simplifying the R.H.S. of equation (5.11.1), we get
χ² = Σᵢ₌₁ⁿ (Oᵢ² − 2OᵢEᵢ + Eᵢ²)/Eᵢ
   = Σ Oᵢ²/Eᵢ − 2Σ Oᵢ + Σ Eᵢ
   = Σ Oᵢ²/Eᵢ − 2N + N
χ² = Σᵢ₌₁ⁿ Oᵢ²/Eᵢ − N   … (5.11.2)
which is another form to calculate the χ² value.
Remarks :
i) The χ² value ranges from 0 to ∞.
ii) If χ² = 0, it indicates that the observed and theoretical frequencies match exactly.
iii) If χ² > 0, there is a difference between the observed and theoretical frequencies. The greater the discrepancy, the greater the χ² value.
Note :
1) If expected frequencies are not provided, they can be calculated by using the formula
Expected frequency = (Row total × Column total)/Total frequency   [the formula is used for a contingency table]
2) The χ² test can be used as a
a) Test of goodness of fit
b) Test of homogeneity of independent estimates of population variance
c) Test of a hypothetical value of the population variance
d) Test of homogeneity of independent estimates of the population correlation.
3) Once the expected frequency for one cell of a 2 × 2 contingency table is calculated, the remaining frequencies can also be calculated by using either (Row total − calculated frequency) or (Column total − calculated frequency).
Conditions for applying the χ² test :
1) The total frequency (N) should be large (N > 50).
2) Expected frequencies should be greater than or equal to five, i.e. Eᵢ ≥ 5 ∀ i = 1, 2, …, n, since the χ² value is over-estimated for small values of expected frequencies.
3) When Eᵢ < 5 for some class i, that class is merged with a neighbouring class until the total value of expected frequency becomes greater than or equal to 5. [While merging the classes, merge both expected and observed frequencies.] This process of grouping the classes is known as "pooling" of classes.
4) Note that the number of degrees of freedom is to be determined from the number of classes after pooling them.
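Forms (5.11.1) and (5.11.2) always give identical values; a quick sketch, using the coin-tossing frequencies that appear in the examples and MCQs later in this section:

```python
def chi_square(observed, expected):
    # Form (5.11.1): sum of (O - E)^2 / E
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_alt(observed, expected):
    # Form (5.11.2): sum of O^2 / E minus N (valid when sum O = sum E = N)
    return sum(o * o / e for o, e in zip(observed, expected)) - sum(observed)

O = [17, 52, 54, 31, 6]    # observed counts of heads, N = 160
E = [10, 40, 60, 40, 10]   # expected (binomial) frequencies
```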


Solution :
Degrees of freedom = 5 − 1 = 4.
We have
χ² = Σᵢ₌₁ⁿ (Oᵢ − Eᵢ)²/Eᵢ
   = (10 − 17)²/17 + (40 − 52)²/52 + (60 − 54)²/54 + (40 − 31)²/31 + (10 − 6)²/6
χ² = 11.597

Solution :
χ² = Σ (Oᵢ − Eᵢ)²/Eᵢ = 25/33 + 784/66 + 2116/66 + 36/33 + … = 251
The tabular χ² value for 5 d.f. at 5 % l.o.s. is 11.07.
Conclusion :
Since the calculated χ² value is much greater than the tabular χ² value, the hypothesis that "the given data follows a binomial distribution" is rejected.

Example 5.11.3 :

[SPPU]
Solution : From the given data,
Number of coins tossed n = 5
Number of times the coins are tossed N = 210
By the binomial distribution, the expected frequency of getting r heads is
Eᵣ = N · P(X = r) = 210 · ⁵Cᵣ pʳ q⁵⁻ʳ
Probability of getting a head p = 1/2 ; q = 1 − p = 1 − 1/2 = 1/2
E₀ = 210 · ⁵C₀ (1/2)⁵
E₁ = 210 · ⁵C₁ (1/2)⁵, and so on.

Solution : Given proportion 8 : 2 : 2 : 1, so the total of the ratio terms is
8 + 2 + 2 + 1 = 13
Total frequency = 524
Then the expected frequencies Eᵢ are
(8/13) × 524 = 323
(2/13) × 524 = 81
(2/13) × 524 = 81
(1/13) × 524 = 40
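The ratio-based expected frequencies can be computed directly (a sketch; the exact first value is 322.46, which the text rounds up to 323 - rounding each cell independently can make the rounded total exceed N, so the exact values are kept here):

```python
ratio = [8, 2, 2, 1]
N = 524
total = sum(ratio)                        # 13
expected = [N * r / total for r in ratio]
# expected is approximately [322.46, 80.62, 80.62, 40.31]
```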


Thus we have
χ² = Σ (Oᵢ − Eᵢ)²/Eᵢ = (101)²/323 + (39)²/81 + (49)²/81 + (110)²/40 = 382.502
Calculated χ² value = 382.502
Tabular χ² value at 5 % l.o.s. for 3 degrees of freedom = 7.815

Example 5.11.4 :
Solution : From the given data, we observe that the expected frequencies in the first and last classes are less than 5. Hence we need to apply pooling of classes: we merge these classes with their neighbouring classes.

Solution : Let H₀ : the number of books issued is independent of the day.
Total books issued = 120 + 130 + 110 + 115 + 135 + 110 = 720
Number of days n = 6
∴ Expected frequency for each day = 720/6 = 120
Thus we have
χ² = Σ (Oᵢ − Eᵢ)²/Eᵢ = (0 + 100 + 100 + 25 + 225 + 100)/120 = 550/120 = 4.5833
Calculated χ² value = 4.5833
Tabular χ² value for 5 degrees of freedom at 5 % l.o.s. = 11.071
Conclusion :
Calculated χ² value < tabular χ² value
∴ We accept H₀, i.e. the number of books issued does not depend on the day.
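The book-issue test above, as a compact sketch (the 5 % table value 11.071 for 5 d.f. is taken from the text):

```python
issued = [120, 130, 110, 115, 135, 110]   # books issued on the six days
expected = sum(issued) / len(issued)      # 720 / 6 = 120 per day
chi2 = sum((o - expected) ** 2 / expected for o in issued)
accept_H0 = chi2 < 11.071                 # tabular chi-square, 5 d.f., 5 % l.o.s.
```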

d.f. = 7 − 1 = 6
Tabular value of χ² at 5 % l.o.s. = 12.592. As the calculated χ² value is less than the tabular value, the fit is good.

Solution : Let the null hypothesis be
H₀ : Quinine is not effective/useful in checking malaria.
H₁ : Quinine is useful in checking malaria.
From the given contingency table, degrees of freedom = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
and the expected frequencies can be calculated by using the formula
E = (Row total × Column total)/Total frequency
From the calculation of expected frequencies over the table,
χ² = 38.392
From the χ² table, for 1 d.f. at 5 % l.o.s.,
χ²₀.₀₅ = 3.84
Conclusion : Since the calculated χ² value is greater than the tabular χ² value, we reject the null hypothesis.
∴ Quinine is useful in checking malaria.

Exercise
1. An engineer hypothesises that the mean number of defects can be decreased in a manufacturing process of compact discs by using robots instead of humans for certain tasks. The mean number of defective discs per 1000 is 18. State H₀ and H₁.
2. A psychologist feels that playing soft music during a test will change the results of the test. The psychologist is not sure whether the grades will be higher or lower. In the past the mean of the scores was 74. State H₀ and H₁.
3. Let P be the probability of getting a head when a coin is tossed once. Suppose the hypothesis H₀ : P = 0.5 is rejected in favour of H₁ : P = 0.6 if 7 or more heads are obtained in 10 trials. Calculate the probability of type I error and the power of the test.
4. In a Bernoulli distribution with parameter P, H₀ : P = 1/2 against H₁ : P = 2/3 is rejected if more than 3 heads are obtained out of 6 throws of a coin. Find the probabilities of type I and type II error. [Ans. : α = 0.3437, β = 0.3196]
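Exercise 4's stated answers can be verified with a short sketch (assuming the partly garbled hypotheses read H₀ : P = 1/2 against H₁ : P = 2/3, which reproduces both printed values):

```python
from math import comb

def binom_pmf(k, n, p):
    # Binomial probability of exactly k successes in n trials
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 6
# Reject H0 when more than 3 heads occur in 6 throws
alpha = sum(binom_pmf(k, n, 1 / 2) for k in range(4, n + 1))  # type I error
beta = sum(binom_pmf(k, n, 2 / 3) for k in range(0, 4))       # type II error
```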

5. An urn contains 6 marbles of which θ are white and the others black. In order to test the null hypothesis H₀ : θ = 3 vs H₁ : θ = 4, two marbles are drawn at random (without replacement) and H₀ is rejected if both marbles are white; otherwise H₀ is accepted. Find the probabilities of committing type I and type II errors. [Ans. : α = 0.20, β = 0.60]
6. Let X₁, X₂, …, Xₙ be a random sample from N(θ, 25). If for testing H₀ : θ = 20 against H₁ : θ = 26 the critical region W is defined by W = {x : x̄ > 23.266}, then find the size of the critical region and the power. [Ans. : 0.2389]
7. 325 women out of 600 women chosen from a city were found to be working in different sectors. Test the hypothesis that the majority of women in the city are working. [Ans. : H₀ is rejected at 5 % l.o.s.]
8. A random sample of 500 bolts was taken from a large consignment and 65 were found to be defective. Find the percentage of defective bolts in the consignment. [Ans. : between 17.51 and 8.49]
9. A bag contains defective articles, the exact number of which is not known. A sample of 100 from the bag gives 10 defective articles. Find the limits for the proportion of defective articles in the bag. [Ans. : limits are (0.1588, 0.0412)]
10. In a city a sample of 1000 people was taken and out of them 540 are vegetarian and the rest are non-vegetarian. Can we say that both habits of eating are equally popular in the city at i) 1 % level of significance ii) 5 % level of significance? [Ans. : H₀ rejected at 5 % l.o.s., H₀ accepted at 1 % l.o.s.]

Test of significance for difference of proportions
11. A manufacturing firm claims that its brand A product outsells its brand B product by 8 %. It is found that 42 out of a sample of 200 persons prefer brand A and 18 out of another sample of 100 persons prefer brand B. Test whether the 8 % difference is a valid claim. [Ans. : H₀ accepted]
12. In a poll submitted to the student body at a university, 850 men and 560 women voted. 500 men and 320 women voted "yes". Does this indicate a significant difference of opinion between men and women on this matter at 1 % level of significance? [Ans. : H₀ accepted]
13. In town A, there were 956 births of which 52.5 % were males, while in towns A and B combined, this proportion in a total of 1406 births was 0.496. Is there any significant difference in the proportion of male births in the two towns? [Ans. : H₀ rejected]

Test of significance for a single mean
14. A random sample of 900 members has a mean 3.4 cms. Can it be reasonably regarded as a sample from a large population of mean 3.2 cms and standard deviation 2.3 cms? [Ans. : H₀ accepted]
15. The mean weight obtained from a random sample of size 100 is 64 gms. The S.D. of the weight distribution of the population is 3 gms. Test the statement that the mean weight of the population is 67 gms at 5 % level of significance. [Ans. : H₀ rejected]
16. Five measurements of the tar content of a certain kind of cigarette yielded 14.5, 14.2, 14.4, 14.3 and 14.6 mg/cigarette. Assuming that the data are a random sample from a normal population, show that at the 0.05 level of significance the null hypothesis μ = 14 must be rejected in favour of the alternative hypothesis μ ≠ 14. (Hint : calculate the S.D. of the sample.)
17. A sample of 1000 students from a university was taken and their average weight was found to be 112 pounds with S.D. of 26 pounds. Could the mean weight of students in the population be 120 pounds? [Ans. : H₀ rejected]
18. The average income of persons was Rs. 210 with a S.D. of Rs. 10 in a sample of 100 people of a city. For another sample of 150 persons the average income was Rs. 220 with S.D. of Rs. 12. The S.D. of incomes of the people of the city was Rs. 11. Test whether there is any significant difference between the average incomes of the localities. [Ans. : H₀ rejected]
19. Intelligence tests on two groups of boys and girls gave the following results. Examine whether the difference is significant. [Ans. : H₀ accepted]
20. A product developer is interested in reducing the drying time of a primer paint. Two formulations of the paint are tested; formulation 1 is the standard chemistry and formulation 2 has a new drying ingredient that should reduce the drying time. The standard deviation of drying time is 8 minutes and this inherent variability should be unaffected by the addition of the new ingredient. 10 specimens are painted with formulation 1 and another 10 specimens are painted with formulation 2. The two sample average drying times are x̄₁ = 121 min and x̄₂ = 112 min respectively. What conclusion can the product developer draw about the effectiveness of the new ingredient? [Ans. : H₀ accepted]
21. A study of the number of lunches that executives in the insurance and banking industries claim as deductible expenses per month was based on random samples and yielded the following results : n₁ = 40, x̄₁ = 9.8, S₁ = 1.9 ; n₂ = 50, x̄₂ = 8.0, S₂ = 2.1. Test the null hypothesis against the alternative hypothesis. [Ans. : H₀ rejected]
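Exercise 5's answers follow from simple counting of equally likely draws; a sketch:

```python
from math import comb

def p_both_white(white, total=6, draws=2):
    # P(both drawn marbles are white), drawing without replacement
    return comb(white, draws) / comb(total, draws)

alpha = p_both_white(3)       # reject H0 (theta = 3) although it is true
beta = 1 - p_both_white(4)    # accept H0 although H1 (theta = 4) is true
```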


22. To find out whether the inhabitants of two islands may be regarded as having the same racial ancestry, an anthropologist determines the cephalic indices of six adult males from each island, getting x̄₁ = 77.4 and x̄₂ = 72.2 with corresponding standard deviations s₁ = 3.3 and s₂ = 2.1. Test whether the difference between the two sample means can reasonably be attributed to chance. [Ans. : H₀ rejected]
23. A group of 10 boys fed on diet A and another group of 8 boys fed on a different diet B recorded the following increases in weight (kgs) :
Diet A : 5, 6, 8, 1, 12, 4, 3, 9, 6, 10
Diet B : 10, 1, 2, 7, 8, …
[Ans. : H₀ accepted]
24. A random sample of 10 boys had the I.Q.'s 70, 120, 110, 101, 88, 83, 95, 98, 107 and 100. Do these data support the assumption of a population mean I.Q. of 100? [Ans. : H₀ accepted]
25. A sample of 18 items has a mean of 24 units and a standard deviation of 3 units. Test the hypothesis that it is a random sample from a normal population with mean 27 units. [Ans. : H₀ rejected]
26. The average numbers of articles produced by two machines per day are 200 and 250 with standard deviations 20 and 25 respectively, on the basis of records of 25 days' production. Can you regard both machines as equally efficient at 5 % level of significance? [Ans. : H₀ rejected]
27. For filling each bottle with 170 tablets, an automatic machine was installed. From the production a sample of 9 bottles was taken; the numbers of tablets found in these 9 bottles are as follows. Test whether the machine has been installed properly or not. X : 168, 164, 166, 167, 168, 169, 170, 170, 171. [Ans. : H₀ rejected]

F-Test
28. The following data show the runs scored by 2 batsmen. Can it be said that the performance of batsman A is more consistent than the performance of batsman B? Use 1 % level of significance. [Ans. : H₀ accepted]
29. Two random samples drawn from 2 normal populations are as follows. [Ans. : H₀ accepted]
30. The two random samples reveal the following data. [Ans. : H₀ accepted]
31. Two independent samples of sizes 8 and 9 had the following values of the variable. Do the estimates of the population variance differ significantly?
32. In a sample study, the following information regarding the demand for a particular spare part on various days was obtained. From the given data, at 5 % l.o.s. test that the demand for the number of spare parts does not depend on the day. [Ans. : χ² = 0.18029, Accept H₀]
33. The table below gives the number of accidents that occurred in a certain factory on the different days of a week. Test whether the accidents are uniformly distributed over the different days. [Ans. : χ² = 5.25, Accept H₀]
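Exercise 24 above is a one-sample t-test; a sketch of the computation (the two-sided 5 % table value 2.262 for 9 d.f. is an assumption taken from standard t-tables, not from the text):

```python
import math
import statistics

iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
n = len(iq)
xbar = statistics.mean(iq)               # 97.2
s = statistics.stdev(iq)                 # sample s.d. with divisor n - 1
t = (xbar - 100) / (s / math.sqrt(n))    # one-sample t statistic
accept_H0 = abs(t) < 2.262               # two-sided t_0.05 for 9 d.f.
```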

34. Using the data given in the following contingency table, test whether the eye colour in sons is associated with the eye colour in fathers.
35. In an experiment on pea breeding, a scientist obtained the following frequencies of seeds : 316 round and yellow, 102 wrinkled and yellow, 109 round and green and 33 wrinkled and green. Theory predicts that the frequencies of seeds should be in the proportion 9 : 3 : 3 : 1 respectively. Set up a proper hypothesis and test it at 5 % l.o.s. [Ans. : χ² = 0.3555, Accept H₀]
36. From the information given below, test whether the type of occupation and the attitude towards social laws are independent. [Use 1 % l.o.s.]
37. The number of defects per unit in a sample of 330 units of a manufactured product was found as follows. Fit a Poisson distribution to the data and test for the goodness of fit. [Hint : Apply pooling of classes] [Ans. : χ² = 0.0292, Accept H₀]
38. A set of 5 coins is tossed 3200 times and the number of heads appearing each time is noted. The results are given below. Test the hypothesis that the coins are unbiased. [Ans. : χ² = 58.80, Reject H₀]

Multiple Choice Questions
Q.1 An assertion made about a population for testing is called ______.
[a] hypothesis [b] statistic [c] test-statistic [d] level of significance
Q.2 If the assumed hypothesis is tested for rejection considering it to be true, it is called ______.
[a] Null hypothesis [b] Statistical hypothesis [c] Simple hypothesis [d] Composite hypothesis
Q.3 In ______, conclusions about a statement are drawn on the basis of a sample.
[a] null hypothesis [b] statistical hypothesis [c] simple hypothesis [d] alternate hypothesis
Q.4 A hypothesis that specifies the population distribution completely is called ______.
[a] composite hypothesis [b] simple hypothesis [c] null hypothesis [d] alternate hypothesis
Q.5 The probability of rejecting the null hypothesis when it is true is called ______.
[a] level of significance [b] confidence level [c] power of test [d] critical value
Q.6 ______ is the point where the null hypothesis gets rejected.
[a] Sample size [b] Power of test [c] Critical value [d] None of these
Q.7 A region where the null hypothesis gets rejected is called ______.
[a] critical region [b] acceptance region [c] critical value [d] none of these

Q.8 If the critical region lies on one side of the distribution, then the test is referred to as ______.
[a] Zero tailed [b] One tailed [c] Two tailed [d] Three tailed
Q.9 If the critical region lies on both sides of the distribution, then the test is referred to as ______.
[a] Zero tailed [b] One tailed [c] Two tailed [d] Three tailed
Q.10 The type of test (one tailed / two tailed) depends on ______.
[a] Null hypothesis [b] Alternative hypothesis [c] Simple hypothesis [d] Composite hypothesis
Q.11 A rule / formula used to test a null hypothesis is called ______.
[a] population statistic [b] variance statistic [c] test statistic [d] null statistic
Q.12 Consider a null hypothesis H₀ : μ = 15 against H₁ : μ > 15. The test is ______.
[a] Two tailed [b] Left tailed [c] Right tailed [d] None of these
Q.13 Consider a hypothesis where H₀ : μ = 2 against H₁ : μ < 2. The test is ______.
[a] Two tailed [b] Left tailed [c] Right tailed [d] None of these
Q.14 Type I error occurs when ______.
[a] We accept H₀ if it is true [b] We reject H₀ if it is false [c] We accept H₀ if it is false [d] We reject H₀ if it is true
Q.15 Type II error occurs when ______.
[a] We accept H₀ if it is true [b] We reject H₀ if it is false [c] We accept H₀ if it is false [d] We reject H₀ if it is true
Q.16 The probability of Type I error is referred to as ______.
[a] 1 − α [b] β [c] α [d] 1 − β
Q.17 Alternative hypothesis is a type of ______.
[a] research hypothesis [b] composite hypothesis [c] null hypothesis [d] simple hypothesis
Q.18 If a null hypothesis (H₀) is accepted, then the value of the test statistic (z) lies in the ______.
[a] rejection region [b] acceptance region [c] left tail of the curve [d] right tail of the curve
Q.19 The range of the level of significance is ______.
[a] … [b] −∞ to ∞ [c] 0 to ∞ [d] 0 to 1
Q.20 ______ is known as the confidence coefficient.
[a] α [b] 1 − α [c] β [d] 1 − β
Q.21 The independent values in a set of values of a test are called ______.
[a] degrees of freedom [b] test statistic [c] level of significance [d] level of confidence
Q.22 A bank utilizes four teller windows to render service to the customers. On a particular day 600 customers were served. If the customers are uniformly distributed over the counters, the expected number of customers served on each counter is ______.
[a] 100 [b] 200 [c] 300 [d] 150
Q.23 200 digits are chosen at random from a set of tables. The frequencies of the digits are as follows :
Digit : 0 1 2 3 4 5 6 7 8 9
Frequency : 18 19 23 21 16 25 22 20 21 15
The expected frequency and the degrees of freedom are ______.
[a] 20 and 10 [b] 21 and 9 [c] 20 and 9 [d] 15 and 8
Q.24 In an experiment on pea breeding, the observed frequencies are 222, 120, 32, 150 and the expected frequencies are 323, 81, 81, 40; then χ² has the value ______.
[a] 382.502 [b] 380.50 [c] 429.59 [d] 303.82
Q.25 If the observed frequencies O₁, O₂, O₃ are 5, 10, 15 and the expected frequencies e₁, e₂, e₃ are each equal to 10, then χ² has the value ______.
[a] 20 [b] 10 [c] 15 [d] 5
Q.26 The numbers of books issued on six days of the week, excluding Sunday which is a holiday, are given as 120, 130, 110, 115, 135, 110 and the expectation is 120 books on each day; then χ² is ______.
[a] 2.58 [b] … [c] 6.56 [d] 4.58
Q.27 A coin is tossed 160 times and the following are the expected and observed frequencies for the number of heads :
No. of heads : 0 1 2 3 4
Observed frequency : 17 52 54 31 6
Expected frequency : 10 40 60 40 10
Then χ² is ______.
[a] 12.72 [b] 9.49 [c] … [d] 9.00
Q.28 Among 64 offsprings of a certain cross between guinea pigs, 34 were red, 10 were black and 20 were white. According to the genetic model, these numbers should be in the ratio 9 : 3 : 4. The expected frequencies, in order, are ______.
[a] 36, 12, 16 [b] 12, 36, 16 [c] 20, 12, 16 [d] 36, 12, 25
Q.29 A sample analysis of examination results of 500 students was made. The observed frequencies are 220, 170, 90 and 20, and the numbers are in the ratio 4 : 3 : 2 : 1 for the various categories. Then the expected frequencies are ______.
[a] 156, 150, 50, 25 [b] 200, 100, 50, 10 [c] 200, 150, 100, 50 [d] 400, 300, 200, 100
Q.30 In an experiment on pea breeding, the observed frequencies are 222, 120, 32, 150 and the theory predicts that the frequencies should be in the proportion 8 : 2 : 2 : 1. Then the expected frequencies are ______.
[a] 323, 81, 40, 81 [b] 81, 323, 40, 81 [c] 323, 81, 81, 40 [d] 433, 81, 81, 35
Q.31 A sample analysis of certain data of 1000 students was made. The observed frequencies are 220, 270, 390 and 120, and the numbers are in the ratio 5 : 2 : 2 : 1 for the various categories. Then the expected frequencies are ______.
[a] 150, 150, 50, 25 [b] 200, 100, 50, 10 [c] 500, 200, 200, 100 [d] 400, 300, 200, 100

Answer Keys for Multiple Choice Questions :
Q.1 a, Q.2 a, Q.3 b, Q.4 b, Q.5 a
Q.6 c, Q.7 a, Q.8 b, Q.9 c, Q.10 b
Q.11 c, Q.12 c, Q.13 b, Q.14 d, Q.15 c
Q.16 c, Q.17 b, Q.18 b, Q.19 d, Q.20 b
Q.21 a, Q.22 d, Q.23 c, Q.24 a, Q.25 d
Q.26 d, Q.27 a, Q.28 a, Q.29 c, Q.30 c
Q.31 c

Unit VI

Inferential Statistics :
Tests for Hypothesis
Syllabus
Steps in Solving a Testing of Hypothesis Problem, Optimum Tests Under Different Situations, Most Powerful Test (MP Test), Uniformly Most Powerful Test, Likelihood Ratio Test, Properties of Likelihood Ratio Test, Test for the Mean of a Normal Population, Test for the Equality of Means of Two Normal Populations, Test for the Equality of Means of Several Normal Populations, Test for the Variance of a Normal Population, Test for Equality of Variances of Two Normal Populations, Non-parametric Methods, Advantages and Disadvantages of Non-parametric Methods.

Contents
6.1 Pre-requisites
6.2 Optimum Test Under Different Situations
6.3 Best Critical Region (BCR) / Most Powerful Critical Region / Most Powerful Test (MP)
6.4 Uniformly Most Powerful Critical Region or UMP Test
6.5 Likelihood Ratio Test (LRT)
6.6 Properties of Likelihood Ratio Test
6.7 Test for the Mean of a Normal Population
6.8 Test for Equality of Means of Two Normal Populations
6.9 Test of Equality of Means of Several Normal Populations
6.10 Test for Variance of a Normal Population
6.11 Test for Equality of Variances of Two Normal Populations
6.12 Non-Parametric Methods
Multiple Choice Questions
6.1 Pre-requisites
Parameter space
The parameter space is the set of all values that a parameter of a population can take. It is denoted by Θ.
e.g. :
1) Consider the normal distribution N(μ, σ²). Here the parameter μ (mean) can take any value from −∞ to ∞; therefore the parameter space for μ is
Θ = {μ : −∞ < μ < ∞}
And the parameter σ² (variance) can take only positive values; therefore the parameter space for σ² is
Θ = {σ² : σ² > 0}
2) Consider a null hypothesis H₀ : μ ≤ 4. Here μ can take any value from −∞ to 4 :
Θ₀ = {μ : −∞ < μ ≤ 4}
where Θ₀ is the parameter space w.r.t. H₀. Clearly Θ₀ ⊂ Θ.

Likelihood function
It describes how likely a specific population is to fit the observed data.
Definition : Consider a population with p.d.f. f(x, θ₁, θ₂, …, θₖ), where X : x₁, x₂, …, xₙ is a random sample. Then the likelihood function of the given sample observations is
L = ∏ᵢ₌₁ⁿ f(xᵢ, θ₁, θ₂, …, θₖ)

6.2 Optimum Test Under Different Situations
In hypothesis testing, our main focus is to minimize the type II error (β), or to maximize the power (1 − β), under fixed α. This can be done by a proper choice of the test statistic and the critical region. The following methods help us to choose the best critical region :
1. Most powerful test (MP)
2. Uniformly most powerful test (UMP)
3. Likelihood ratio test

6.3 Best Critical Region (BCR) / Most Powerful Critical Region / Most Powerful Test (MP)
In testing a simple null hypothesis (H₀) vs a simple alternative hypothesis (H₁), the critical region which has maximum power (1 − β) among the set of all critical regions belonging to the sample space is called the most powerful test. The region itself is called the most powerful region.
While testing the simple null hypothesis H₀ : θ = θ₀ vs the simple alternative H₁ : θ = θ₁, the critical region W is said to be the most powerful critical region of size α if
P(x ∈ W | H₀) = ∫_W L₀ dx = α
and
P(x ∈ W | H₁) ≥ P(x ∈ W₁ | H₁)
where W₁ is any other critical region of size α.

6.4 Uniformly Most Powerful Critical Region or UMP Test
In testing a simple or composite null hypothesis vs a composite alternative hypothesis, the critical region which has maximum power among the set of all critical regions belonging to the sample space is called the UMPCR or UMP test.
A UMP region may or may not exist. However, for H₀ : θ = θ₀ vs H₁ : θ > θ₀ or H₁ : θ < θ₀, a UMP test exists.
Note : The BCR or UMPCR is found with the help of the Neyman-Pearson lemma.

Neyman-Pearson Fundamental Lemma (N-P Lemma) Statement :
Let k > 0 and W be a critical region of size α such that
W = {x ∈ S : L₁/L₀ > k}   … (6.4.1)
and
W̄ = {x ∈ S : L₁/L₀ ≤ k}   … (6.4.2)
where L₀ and L₁ are the likelihood functions of the sample observations under H₀ and H₁ respectively; then W is the most powerful critical region of size α to test the hypothesis
H₀ : θ = θ₀ vs H₁ : θ = θ₁
Proof : Let W be the critical region of size α. Then
P(x ∈ W | H₀) = α = ∫_W L₀ dx   … (6.4.3)
P(x ∈ W | H₁) = 1 − β = ∫_W L₁ dx   … (6.4.4)
Let W₁ be any other critical region of size α₁ ≤ α :
P(x ∈ W₁ | H₀) = α₁ = ∫_{W₁} L₀ dx   … (6.4.5)
P(x ∈ W₁ | H₁) = 1 − β₁ = ∫_{W₁} L₁ dx   … (6.4.6)
Write A = W ∩ W̄₁, B = W₁ ∩ W̄ and C = W ∩ W₁, so that W = A ∪ C and W₁ = B ∪ C (Fig. 6.4.1).
From (6.4.3) and (6.4.5), since α₁ ≤ α,
∫_{W₁} L₀ dx ≤ ∫_W L₀ dx
⇒ ∫_{B∪C} L₀ dx ≤ ∫_{A∪C} L₀ dx
⇒ ∫_B L₀ dx ≤ ∫_A L₀ dx
Now consider (6.4.1). On W (and hence on A), L₁/L₀ > k, i.e. L₁ > kL₀, so
∫_A L₁ dx ≥ k ∫_A L₀ dx
Next consider (6.4.2). On W̄ (and hence on B), L₁ ≤ kL₀, so
∫_B L₁ dx ≤ k ∫_B L₀ dx ≤ k ∫_A L₀ dx ≤ ∫_A L₁ dx
From (6.4.4) and (6.4.6),
1 − β₁ = ∫_{B∪C} L₁ dx = ∫_B L₁ dx + ∫_C L₁ dx ≤ ∫_A L₁ dx + ∫_C L₁ dx = ∫_W L₁ dx = 1 − β
⇒ 1 − β₁ ≤ 1 − β
⇒ Power of W₁ ≤ Power of W
⇒ W is the best critical region.

Unbiased test :
If the power of a test (1 − β) is never less than its size α, i.e. 1 − β ≥ α, then the test is known as an unbiased test.
Note : For an MPCR or BCR, the power is always greater than its size, so the test is always unbiased.
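The lemma can be illustrated by brute force on a small discrete family (the pmfs below are hypothetical, not from the text): among all critical regions of size at most α, the region collecting the sample points with the largest ratio f₁/f₀ has the greatest power.

```python
from itertools import combinations

support = [1, 2, 3, 4]
f0 = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}   # pmf under H0
f1 = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}   # pmf under H1
alpha = 0.1

# Enumerate every critical region of size <= alpha, keep the most powerful.
best_region, best_power = None, -1.0
for r in range(1, len(support) + 1):
    for W in combinations(support, r):
        size = sum(f0[x] for x in W)     # P(X in W | H0)
        power = sum(f1[x] for x in W)    # P(X in W | H1)
        if size <= alpha + 1e-12 and power > best_power:
            best_region, best_power = set(W), power

# The point x = 4 has the largest ratio f1/f0 = 4, so W = {4} wins.
```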


Solution : Given
f(x, θ) = 1, θ ≤ x ≤ θ + 1
H₀ : θ = 1 vs H₁ : θ = 1.5
The test rejects H₀ when X > 1.7. We know that
α = P[Reject H₀ | H₀] = P[X > 1.7 | H₀] = 1 − P[X ≤ 1.7 | H₀] = 1 − ∫₁^1.7 1 dx = 1 − 0.7 = 0.3
β = P[Accept H₀ | H₁] = P[X ≤ 1.7 | θ = 1.5] = ∫_1.5^1.7 1 dx = 0.2
Power of the test = 1 − β = 0.8

Solution : H₀ : f = f₀ vs H₁ : f = f₁
1] α = P[Reject H₀ | H₀] = P[X ∈ W | H₀] = 0.2, where W = {1, 2, 3, 4} is the critical region.
2] Power of the test = 1 − P[Accept H₀ | H₁] = P[Reject H₀ | H₁] = P[X = 2 | H₁] = 0.3, so 1 − β = 0.3.
3] There are two critical regions for which α ≤ 0.2 :
i) W₁ = {X = 2} : power of the test = P[Reject H₀ | H₁] = P[X = 2 | H₁] = 0.3
ii) W₂ = {X = 1} : power of the test = P[Reject H₀ | H₁] = P[X = 1 | H₁] = 0.4
4] W₂ is more powerful than W₁.
5] Yes, the C.R. {X = 1} is more powerful than the C.R. {X = 2}.
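For the uniform density in the first example, the two error sizes reduce to interval lengths; a sketch:

```python
def uniform_prob(a, b, theta):
    # P(a < X < b) when X is uniform on (theta, theta + 1)
    lo = max(a, theta)
    hi = min(b, theta + 1)
    return max(0.0, hi - lo)

alpha = uniform_prob(1.7, float("inf"), 1.0)    # P(X > 1.7  | theta = 1)
beta = uniform_prob(float("-inf"), 1.7, 1.5)    # P(X <= 1.7 | theta = 1.5)
power = 1 - beta
```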



= 1-P {xe W/Q=2}


1.5
Solution : | H, 0=1 vs H,:90=2 . = 1- ſIfx,9),.,&
I
Here C.R.=W {x:0.5<x}
B= 0.75 :
{x:x > 0.5}
Power function = 1-Þ = 0.25
ll P {x & W/H,}
N

P {x>0.5/6=1}
WU

P {0.5<x<8/Q=1}
P {0.5<x<1/6=1}
1
Jo ax W= {x:x 2 1} and W={x:x<1}
0.5 89-1 H,:0= 2 andH,:0 =1
1 & = Size of the type I error
\1ax=0.5 = PREW/H|]
0.5 = P[x>1/0=2}
Similarly, B PIxeW/H] = fe™ dx
1 :
P [x <0.5/0=2] ET
0.5
J (Gs, 99},-, x —__-
22 1
0 =e =z

0.5 1
J 5 dx B = Size of type 1] error
0
= P{xe W/H] =P{x<1/0 =1]
B = 0.25 1 Te 1
Therefore size of type I and type II errors are & = 0.5 and Þ = 0.25 respectively.
0
ii) W=CR={x:1<x<1.5}
a P {xe W/6=1}
1.5 Example 6.4.5
J f(x, @)],.., dx
1
“= 0
Solution : Pdf for norma! distribution is
Since under H,:0=1 2
f 1,2 1 (4g)
f{x, 8) =0, for l <x<1.5 P.d.f= (2n nl exp SE
%
2
B= P {xe W/0=2} Mean = 1, variance = & 0

TECHNICAL PUBLICATIONS® - an up-thrust for knowledge TECHNICAL PUBLICATIONS® - an up-thrust for knowledge
” Statistics . 6-10
inferential Statistics : Tests for Hypothesis
The likehood function is

Llp, 0) = (no) exp.


L (x)
=] p

Therefore the ratio of likelihood function is Solution: -

(x,~ 10)"
f(a; 6,y) = Bexp {-B(x—)},x>y
- (32n) ** exp = = 0, - . :
otherwise
the likelihood function is given , ; .

L(x, B,y) = B*expy-B 3 (iv) . |


isl -
on cancellation,
| { 16 16 using N-P lemma. . |
exp )— 35 © @-10y- L. (=p) [EK Ratio of likelihood function is given by,
Pl i=l n

| x |
-S(Lx —20 Ex,+ 16% (107 —Lx; + 2p Ex;—16 uw
i= J |

BM
exp
L(y)

vi

. fod 2 2 L(Y) . tT
= . exp 35 (2 Ex, (u-10) + 16 (4 —I10);<K
x
Taking natural log on both the sides
-2(y =10) £x, +16 (y*—10\) £32 in (k)

=
_— i 1} | 6) ep]-B, Lo 040 + Bon (Xn) 2
> 16% = Tee (p10) (32 Ink-16 (u’— 10°) = =_ PR 9
=k say : Taking log,
We know that . —
nin Ea Boy = ink
B, -B,) +9 8,¥,-n
= LO cit ze -
L(y)
Therefore BCR of size o for H, : # = 10 vs . > x (B,-B,) $ 1B =tBeSlnk+ln 5]
H, : p>10 | > *<3 6,5 ¥;B,-%By~a ln k + In = BL Bo
is given by
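For Example 6.4.5, the cutoff for the UMP region {x̄ > k} can be computed from the null distribution of x̄. A minimal sketch using only the standard library (the choice α = 0.05 is ours):

```python
from statistics import NormalDist

sigma, n, mu0, alpha = 4.0, 16, 10.0, 0.05
# Under H0: mu = 10, the sample mean is N(10, sigma^2 / n),
# so k solves P(sample mean > k | mu = 10) = alpha.
k = NormalDist(mu=mu0, sigma=sigma / n ** 0.5).inv_cdf(1 - alpha)
print(round(k, 3))  # 11.645 -> reject H0 whenever the sample mean exceeds this
```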
Example 6.4.7 : For

f(x, θ) = (1+θ)/(x+θ)²,  x > 1,

examine whether a best critical region exists for testing H₀ : θ = θ₀ against H₁ : θ = θ₁.

Solution : The likelihood function is defined as

L(x, θ) = Π from i = 1 to n of f(xᵢ, θ) = (1+θ)ⁿ / Π(xᵢ+θ)²

By the N-P lemma, the ratio of likelihood functions is given as

L(θ₁)/L(θ₀) = [(1+θ₁)ⁿ Π(xᵢ+θ₀)²] / [(1+θ₀)ⁿ Π(xᵢ+θ₁)²] ≥ k

Taking log,

n ln(1+θ₁) − 2 Σᵢ ln(xᵢ+θ₁) ≥ ln k + n ln(1+θ₀) − 2 Σᵢ ln(xᵢ+θ₀)

⇒ 2 Σᵢ ln((xᵢ+θ₀)/(xᵢ+θ₁)) ≥ ln k + n ln((1+θ₀)/(1+θ₁))

Here the test criterion is of the form Σᵢ ln((xᵢ+θ₀)/(xᵢ+θ₁)) ≥ constant, which cannot be put in the form of a function of the sample observations not depending on the hypotheses. Hence no BCR exists in this case.

6.5 Likelihood Ratio Test (L.R.T.)

The N-P lemma is applicable only if both H₀ and H₁ are simple hypotheses. But if either H₀ or H₁ or both are composite, then no such theorem is available. In such cases we can employ the likelihood ratio test (LRT) to find the critical region and the power of the test.

Statement : Let Θ be a parametric space, and let Θ₀ and Θ₁ be the parametric spaces w.r.t. the null hypothesis and the alternative hypothesis respectively, such that Θ = Θ₀ ∪ Θ₁. Consider a random variable X with p.d.f. f(x, θ), where θ ∈ Θ.

To test H₀ : θ ∈ Θ₀ v/s H₁ : θ ∈ Θ₁, the LRT statistic, denoted by λ or λ(x), is given by

λ(x) = [max over θ ∈ Θ₀ of L(x, θ)] / [max over θ ∈ Θ of L(x, θ)]

where L(x, θ) is the likelihood function. Then the LRT says : reject H₀ if λ < C, where C is a constant chosen such that

P[λ < C / H₀] = α

i.e. C is chosen s.t. the probability of type I error is of size α.

Remark :

i) Clearly λ ≥ 0.

ii) Since Θ₀ ⊆ Θ, we have max over Θ₀ of L(x, θ) ≤ max over Θ of L(x, θ), i.e. L(Θ̂₀) ≤ L(Θ̂), so λ ≤ 1.

Thus 0 ≤ λ ≤ 1 (using remark i). In other words, the smaller the value of λ, the more the chances of rejecting H₀.

iii) If H₀ and H₁ both are simple hypotheses, then the L.R.T. is equivalent to the N-P lemma.

iv) If either one or both hypotheses (H₀ and H₁) are composite, then the LRT statistic λ is a function of every sufficient statistic for θ.

Properties of the Likelihood Ratio Test

1) If the null and alternative hypotheses are both simple, then the LRT leads to the N-P lemma.

2) If a UMP test at all exists, then the LRT is generally UMP. In this case the following asymptotic properties of the LRT are satisfied :

a) Under particular conditions, −2 logₑ λ has an asymptotic chi-square distribution.

b) Under certain assumptions, the LRT is consistent.
6.7 Test for the Mean of a Normal Population

Mean and variance both unknown :

Let (x₁, x₂, …, xₙ) be a random sample of size n from a normal population with mean μ and variance σ², where both μ and σ² are unknown.

Consider the null hypothesis (composite)

H₀ : μ = μ₀ ; 0 < σ² < ∞

v/s the alternative hypothesis

H₁ : μ ≠ μ₀ ; 0 < σ² < ∞

For unknown μ and σ², the parameter space over the entire region is

Θ = {(μ, σ²) : −∞ < μ < ∞, 0 < σ² < ∞}

and the parameter space determined by the null hypothesis is

Θ₀ = {(μ, σ²) : μ = μ₀, σ² > 0}

Also we have, for the N(μ, σ²) distribution, the density

f(x) = (1/√(2πσ²)) exp{−(x−μ)²/(2σ²)}

Then the likelihood function of the sample observations x₁, x₂, …, xₙ is given by

L(Θ) = (1/(2πσ²))^(n/2) exp{−(1/(2σ²)) Σᵢ (xᵢ−μ)²}    … (6.7.1)

The maximum likelihood estimates (MLE) of μ and σ² are given by

μ̂ = (1/n) Σᵢ xᵢ = x̄  and  σ̂² = (1/n) Σᵢ (xᵢ−x̄)² = S² (say)    … (6.7.2)

Substituting equation (6.7.2) in equation (6.7.1), we get

L(Θ̂) = (1/(2πS²))^(n/2) e^(−n/2)    … (6.7.3)

which is the maximum of the likelihood in the parameter space Θ.

Similarly, in Θ₀ (the parameter space under H₀), as μ = μ₀ is specified, σ² is the only variable parameter. The maximum likelihood estimate of σ² under H₀ is

σ̂₀² = (1/n) Σᵢ (xᵢ−μ₀)² = S₀² (say),  where

Σᵢ (xᵢ−μ₀)² = Σᵢ (xᵢ−x̄)² + n(x̄−μ₀)²    … (6.7.4)

Substituting equation (6.7.4) in equation (6.7.1), we get

L(Θ̂₀) = (1/(2πS₀²))^(n/2) e^(−n/2)    … (6.7.5)

Thus from equations (6.7.3) and (6.7.5), the LRT statistic is given by

λ = L(Θ̂₀)/L(Θ̂) = (S²/S₀²)^(n/2)
  = [Σᵢ (xᵢ−x̄)² / (Σᵢ (xᵢ−x̄)² + n(x̄−μ₀)²)]^(n/2)
  = [1 + n(x̄−μ₀)²/Σᵢ (xᵢ−x̄)²]^(−n/2)    … (6.7.6)   (from equations (6.7.2) and (6.7.4))
Now, according to Student's t-distribution with (n−1) d.f., the statistic is

t = (x̄−μ₀)/(s/√n) = √n (x̄−μ₀)/s

where s² = (1/(n−1)) Σᵢ (xᵢ−x̄)² = nS²/(n−1)  (from equation (6.7.2)), so that

t²/(n−1) = n(x̄−μ₀)²/Σᵢ (xᵢ−x̄)²

Thus equation (6.7.6) reduces to

λ = [1 + t²/(n−1)]^(−n/2)    … (6.7.7)

The critical region of the LRT, viz. 0 < λ < C, is equivalent to

[1 + t²/(n−1)]^(−n/2) < C

⇒ 1 + t²/(n−1) > C^(−2/n)

⇒ t² > (n−1)(C^(−2/n) − 1)

⇒ |t| ≥ K (say)

where the constant K is derived such that the size of the critical region is α, i.e.

P(|t| ≥ K | H₀) = α

Since, under H₀, the statistic "t" follows Student's t-distribution with (n−1) degrees of freedom,

K = t with (n−1) d.f. at (α/2), written t_(n−1)(α/2)

Thus for a normal distribution where σ² is unknown, while testing H₀ : μ = μ₀ against H₁ : μ ≠ μ₀, the LRT converges to the two-tailed Student's t-test.

H₀ is rejected if

|t| = |√n (x̄−μ₀)/s| > t_(n−1)(α/2)

and we accept H₀ if

|t| < t_(n−1)(α/2)

Remark :

1) Consider testing H₀ : μ = μ₀, 0 < σ² < ∞ against H₁ : μ > μ₀, 0 < σ² < ∞.

The parameter space is

Θ = {(μ, σ²) : μ ≥ μ₀, σ² > 0}

and that w.r.t. H₀ is

Θ₀ = {(μ, σ²) : μ = μ₀, σ² > 0}

For the parameter space Θ, the MLEs (maximum likelihood estimates) of μ and σ² are

μ̂ = x̄  if x̄ ≥ μ₀ ;  = μ₀  if x̄ < μ₀

σ̂² = (1/n) Σᵢ (xᵢ−x̄)² = S²  if x̄ ≥ μ₀ ;  = (1/n) Σᵢ (xᵢ−μ₀)² = S₀²  if x̄ < μ₀

And for the space Θ₀, the MLE is

μ̂ = μ₀ ,  σ̂₀² = (1/n) Σᵢ (xᵢ−μ₀)² = S₀²

Then the likelihood functions under Θ and Θ₀ are

L(Θ̂) = (1/(2πS²))^(n/2) e^(−n/2)  if x̄ ≥ μ₀ ;  = (1/(2πS₀²))^(n/2) e^(−n/2)  if x̄ < μ₀

L(Θ̂₀) = (1/(2πS₀²))^(n/2) e^(−n/2)
Thus the LRT statistic is given by

λ = L(Θ̂₀)/L(Θ̂) = (S²/S₀²)^(n/2)  if x̄ ≥ μ₀ ;  = 1  if x̄ < μ₀

which, for x̄ ≥ μ₀, is the same as the LRT statistic derived in equation (6.7.6). Therefore, following the same procedure (as in the case of testing H₀ : μ = μ₀ v/s H₁ : μ ≠ μ₀), we get the critical region, viz. 0 < λ < C, equivalent to

t = √n (x̄−μ₀)/s ≥ k ;  x̄ ≥ μ₀

Here t gives a right-tailed t-test and k is calculated as

P(t > k) = α  ⇒  k = t_(n−1)(α)

Thus we reject H₀ : μ = μ₀ if

t > t_(n−1)(α)

Otherwise H₀ is accepted.

Similarly, to test

H₀ : μ = μ₀, σ² > 0  against  H₁ : μ < μ₀, σ² > 0

the critical region is given by

t < −t_(n−1)(α)

In this case we get a left-tailed t-test; thus we reject H₀ if t < −t_(n−1)(α), otherwise H₀ is accepted.

We summarize this in the following table.

Table 6.7.1 : Normal population N(μ, σ²), σ² unknown. (The table layout is not recoverable here; it lists, for the alternatives H₁ : μ ≠ μ₀, μ > μ₀ and μ < μ₀, the two-tailed, right-tailed and left-tailed t-test critical regions derived above.)
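The t statistic used in the two-tailed and one-tailed versions of this test can be computed directly. A minimal sketch with made-up data (`stdev` uses the (n−1) divisor, matching s above):

```python
import math
from statistics import mean, stdev

def t_statistic(xs, mu0):
    # t = sqrt(n) * (sample mean - mu0) / s, with s^2 = sum((x - mean)^2) / (n - 1)
    return math.sqrt(len(xs)) * (mean(xs) - mu0) / stdev(xs)

xs = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
t = t_statistic(xs, mu0=10.0)
print(round(t, 3))  # 1.342; |t| < t_5(0.025) = 2.571 from tables, so H0 is accepted
```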
6.8 Test for Equality of Means of Two Normal Populations

Let X₁ and X₂ be two independent random variables from normal populations, having means μ₁ and μ₂ respectively and corresponding variances σ₁² and σ₂².

Let μ₁, μ₂, σ₁² and σ₂² all be unknown. Consider the problem of testing the hypothesis

H₀ : μ₁ = μ₂ = μ ; σ₁², σ₂² > 0  against the alternative hypothesis

H₁ : μ₁ ≠ μ₂ ; σ₁², σ₂² > 0

Case I : σ₁² ≠ σ₂² (population variances are not equal)

Parameter spaces w.r.t. the entire region and H₀ are defined respectively as

Θ = {(μᵢ, σᵢ²) : μᵢ ∈ R, σᵢ² > 0}, i = 1, 2

Θ₀ = {(μ, σᵢ²) : −∞ < μ < ∞, σᵢ² > 0}, i = 1, 2

Let (x₁₁, x₁₂, …, x₁ₘ) and (x₂₁, x₂₂, …, x₂ₙ) be random samples from the normal populations N(μ₁, σ₁²) and N(μ₂, σ₂²), with sizes m and n respectively.

Then the likelihood function is given by

L = (1/(2πσ₁²))^(m/2) (1/(2πσ₂²))^(n/2) exp{−(1/(2σ₁²)) Σⱼ (x₁ⱼ−μ₁)² − (1/(2σ₂²)) Σⱼ (x₂ⱼ−μ₂)²}    … (6.8.1)
Taking log on both sides we get

log L = −(m/2) log(2πσ₁²) − (n/2) log(2πσ₂²) − (1/(2σ₁²)) Σⱼ (x₁ⱼ−μ₁)² − (1/(2σ₂²)) Σⱼ (x₂ⱼ−μ₂)²    … (6.8.2)

The MLEs for μᵢ and σᵢ² (i = 1, 2) are given by

∂(log L)/∂μ₁ = 0 ⇒ μ̂₁ = (1/m) Σⱼ x₁ⱼ = x̄₁
∂(log L)/∂μ₂ = 0 ⇒ μ̂₂ = (1/n) Σⱼ x₂ⱼ = x̄₂    … (6.8.3)
∂(log L)/∂σ₁² = 0 ⇒ σ̂₁² = (1/m) Σⱼ (x₁ⱼ−x̄₁)² = s₁² (say)
∂(log L)/∂σ₂² = 0 ⇒ σ̂₂² = (1/n) Σⱼ (x₂ⱼ−x̄₂)² = s₂² (say)

Substituting these values from equation (6.8.3) in equation (6.8.1), we get

L(Θ̂) = (1/(2πs₁²))^(m/2) (1/(2πs₂²))^(n/2) e^(−(m+n)/2)    … (6.8.4)

Now, in the parameter space defined under H₀ (i.e. in Θ₀), we have μ₁ = μ₂ = μ, and the likelihood function is given by

L(Θ₀) = (1/(2πσ₁²))^(m/2) (1/(2πσ₂²))^(n/2) exp{−(1/(2σ₁²)) Σⱼ (x₁ⱼ−μ)² − (1/(2σ₂²)) Σⱼ (x₂ⱼ−μ)²}

In Θ₀, the MLE of μ is obtained as the root of the equation

m(x̄₁−μ)/Σⱼ(x₁ⱼ−μ)² + n(x̄₂−μ)/Σⱼ(x₂ⱼ−μ)² = 0    … (6.8.5)

which is a cubic in μ and a complicated function of the sample observations. Consequently the LRT statistic also becomes a complex function of the sample observations and its distribution becomes tedious. Therefore the critical region for a given l.o.s. α cannot be obtained in this case.

Case II : Population variances σ₁² and σ₂² are equal, i.e. σ₁² = σ₂² = σ² (say)

The parameter spaces are

Θ = {(μ₁, μ₂, σ²) : −∞ < μᵢ < ∞, σ² > 0}, i = 1, 2
Θ₀ = {(μ, σ²) : −∞ < μ < ∞, σ² > 0}

Then the likelihood function when σ₁² = σ₂² = σ² is defined as

L = (1/(2πσ²))^((m+n)/2) exp{−(1/(2σ²)) [Σⱼ (x₁ⱼ−μ₁)² + Σⱼ (x₂ⱼ−μ₂)²]}    … (6.8.6)

The MLEs of μ₁, μ₂ and σ² under Θ are

∂(log L)/∂μ₁ = 0 ⇒ μ̂₁ = x̄₁
∂(log L)/∂μ₂ = 0 ⇒ μ̂₂ = x̄₂    … (6.8.7)
∂(log L)/∂σ² = 0 ⇒ σ̂² = (1/(m+n)) [Σⱼ (x₁ⱼ−x̄₁)² + Σⱼ (x₂ⱼ−x̄₂)²] = (ms₁² + ns₂²)/(m+n)

so that

L(Θ̂) = [(m+n)/(2π(ms₁²+ns₂²))]^((m+n)/2) e^(−(m+n)/2)    … (6.8.8)

And the MLEs of μ and σ² under Θ₀ are obtained from

∂(log L)/∂μ = 0 ⇒ Σⱼ (x₁ⱼ−μ) + Σⱼ (x₂ⱼ−μ) = 0

⇒ m x̄₁ + n x̄₂ = (m+n) μ

⇒ μ̂ = (m x̄₁ + n x̄₂)/(m+n)    … (6.8.9)

Similarly, ∂(log L)/∂σ² = 0 gives

σ̂₀² = (1/(m+n)) [Σⱼ (x₁ⱼ−μ̂)² + Σⱼ (x₂ⱼ−μ̂)²]    … (6.8.10)
Consider

Σⱼ (x₁ⱼ−μ̂)² = Σⱼ (x₁ⱼ−x̄₁+x̄₁−μ̂)² = ms₁² + m(x̄₁−μ̂)²

With μ̂ = (m x̄₁ + n x̄₂)/(m+n) we get x̄₁ − μ̂ = n(x̄₁−x̄₂)/(m+n), so that

Σⱼ (x₁ⱼ−μ̂)² = ms₁² + mn²(x̄₁−x̄₂)²/(m+n)²

Similarly,

Σⱼ (x₂ⱼ−μ̂)² = ns₂² + nm²(x̄₁−x̄₂)²/(m+n)²

Substituting these values in σ̂₀² in equation (6.8.10) we get

σ̂₀² = (1/(m+n)) [ms₁² + ns₂² + mn(x̄₁−x̄₂)²/(m+n)]    … (6.8.11)

Thus from equations (6.8.6), (6.8.9) and (6.8.11) we get

L(Θ̂₀) = [(m+n) / (2π{ms₁²+ns₂²+mn(x̄₁−x̄₂)²/(m+n)})]^((m+n)/2) e^(−(m+n)/2)    … (6.8.12)

and from equations (6.8.6) and (6.8.7), as before,

L(Θ̂) = [(m+n)/(2π(ms₁²+ns₂²))]^((m+n)/2) e^(−(m+n)/2)    … (6.8.13)

From equations (6.8.12) and (6.8.13), the LRT statistic is given by

λ = L(Θ̂₀)/L(Θ̂) = [(ms₁²+ns₂²) / (ms₁²+ns₂²+mn(x̄₁−x̄₂)²/(m+n))]^((m+n)/2)
  = [1 + mn(x̄₁−x̄₂)² / ((m+n)(ms₁²+ns₂²))]^(−(m+n)/2)    … (6.8.14)

For Student's t-distribution with (m+n−2) degrees of freedom, the test statistic is given by (see the explanation in the test for the mean of a normal population)

t = (x̄₁−x̄₂) / (s √(1/m + 1/n)),  where s² = (ms₁²+ns₂²)/(m+n−2)  [and msᵢ² = Σⱼ (xᵢⱼ−x̄ᵢ)²]

so that

t²/(m+n−2) = mn(x̄₁−x̄₂)² / ((m+n)(ms₁²+ns₂²))

∴ equation (6.8.14) becomes

λ = [1 + t²/(m+n−2)]^(−(m+n)/2)    … (6.8.15)

The critical region of the LRT is given by

0 < λ < C

⇒ [1 + t²/(m+n−2)]^(−(m+n)/2) < C

⇒ t² ≥ (m+n−2)(C^(−2/(m+n)) − 1)

⇒ |t| ≥ k (say)

where k is calculated as

P(|t| > k | H₀) = α    … (6.8.16)
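The pooled two-sample t statistic behind equation (6.8.15) can be sketched as follows (the data are invented for illustration):

```python
import math
from statistics import mean

def pooled_t(x1, x2):
    m, n = len(x1), len(x2)
    ms1 = sum((v - mean(x1)) ** 2 for v in x1)   # m * s1^2 in the text's notation
    ns2 = sum((v - mean(x2)) ** 2 for v in x2)   # n * s2^2
    s2 = (ms1 + ns2) / (m + n - 2)               # pooled variance s^2
    return (mean(x1) - mean(x2)) / math.sqrt(s2 * (1 / m + 1 / n))

x1 = [4.2, 4.8, 4.5, 4.9]
x2 = [3.9, 4.1, 4.0, 4.4, 3.8]
t = pooled_t(x1, x2)
print(round(t, 3))  # 3.087; compare |t| with t_(m+n-2)(alpha/2), e.g. t_7(0.025) = 2.365
```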
Since, under H₀, the statistic t follows Student's t-distribution with (m+n−2) degrees of freedom, from equation (6.8.16) we get

k = t_(m+n−2)(α/2)    … (6.8.17)

Thus the LRT for testing H₀ : μ₁ = μ₂, σ₁² = σ₂² = σ² v/s H₁ : μ₁ ≠ μ₂, σ₁² = σ₂² = σ² is equivalent to the two-tailed t-test, where we reject H₀ if

|t| > t_(m+n−2)(α/2)

Otherwise H₀ may be accepted.

Remarks :

1) The LRT for testing H₀ : μ₁ = μ₂, σ₁² = σ₂² = σ² against H₁ : μ₁ > μ₂, σ₁² = σ₂² = σ² is a right-tailed t-test and its critical region is given by

t > t_(m+n−2)(α)

2) To test H₀ : μ₁ = μ₂, σ₁² = σ₂² = σ² against H₁ : μ₁ < μ₂, σ₁² = σ₂² = σ², we get a left-tailed t-test whose critical region is given by

t < −t_(m+n−2)(α)

Table 6.8.1 : Normal populations with means μ₁, μ₂ respectively and variances σ₁² = σ₂² = σ² (unknown). (The table layout is not recoverable here; it summarizes the two-tailed, right-tailed and left-tailed critical regions above.)

6.9 Test of Equality of Means of Several Normal Populations

Consider k normal populations with means μᵢ (i = 1, 2, …, k) and common variance σ², which is unknown. We test the null hypothesis H₀ : μ₁ = μ₂ = … = μₖ = μ against the alternative hypothesis H₁ : μᵢ ≠ μⱼ for at least one pair i ≠ j.

Let xᵢⱼ (i = 1, 2, …, k and j = 1, 2, …, nᵢ) be the jᵗʰ observation from the iᵗʰ normal population, having sample size nᵢ (i = 1, 2, …, k), with Σᵢ nᵢ = n.

In this case the parameter spaces are defined as

Θ = {(μᵢ, σ²) : −∞ < μᵢ < ∞, σ² > 0}, i = 1, 2, …, k
Θ₀ = {(μ, σ²) : −∞ < μ < ∞, σ² > 0}

The likelihood functions w.r.t. Θ and Θ₀ are given respectively by

L(Θ) = (1/(2πσ²))^(n/2) exp{−(1/(2σ²)) Σᵢ Σⱼ (xᵢⱼ−μᵢ)²}

L(Θ₀) = (1/(2πσ²))^(n/2) exp{−(1/(2σ²)) Σᵢ Σⱼ (xᵢⱼ−μ)²}

The MLEs of μᵢ (i = 1, 2, …, k) and σ² under Θ are

μ̂ᵢ = x̄ᵢ  and  σ̂² = (1/n) Σᵢ Σⱼ (xᵢⱼ−x̄ᵢ)² = S_W²/n

where S_W² = Σᵢ Σⱼ (xᵢⱼ−x̄ᵢ)².
Under Θ₀, the MLEs are

μ̂ = (1/n) Σᵢ Σⱼ xᵢⱼ = x̄ (the grand mean)  and  σ̂₀² = (1/n) Σᵢ Σⱼ (xᵢⱼ−x̄)² = S_T²/n

Then

L(Θ̂) = (n/(2π S_W²))^(n/2) e^(−n/2)  and  L(Θ̂₀) = (n/(2π S_T²))^(n/2) e^(−n/2)

With these values, the LRT statistic is given by

λ = L(Θ̂₀)/L(Θ̂) = (S_W²/S_T²)^(n/2)    … (6.9.1)

Consider

S_T² = Σᵢ Σⱼ (xᵢⱼ−x̄)² = Σᵢ Σⱼ (xᵢⱼ−x̄ᵢ+x̄ᵢ−x̄)²
     = Σᵢ Σⱼ (xᵢⱼ−x̄ᵢ)² + Σᵢ nᵢ (x̄ᵢ−x̄)²
     = S_W² + S_B²    … (6.9.2)

where S_B² = Σᵢ nᵢ (x̄ᵢ−x̄)².

From equation (6.9.2), equation (6.9.1) becomes

λ = [1/(1 + S_B²/S_W²)]^(n/2)    … (6.9.3)

But under H₀,

F = (S_B²/(k−1)) / (S_W²/(n−k))    … (6.9.4)

follows the F-distribution with degrees of freedom (k−1, n−k).

From equations (6.9.3) and (6.9.4), the critical region λ ≤ C can be written as

F ≥ A (say),  where A is obtained as P(F > A | H₀) = α

From the F-table at 100α % l.o.s. and for (k−1, n−k) degrees of freedom, we get

A = F with (k−1, n−k) d.f. at α, written F_(k−1, n−k)(α)

Thus H₀ is rejected if

F > F_(k−1, n−k)(α)

Otherwise it is accepted.

Note : To find the MLEs and the ML function, follow the procedure discussed in the previous cases.

Remark : In analysis of variance (ANOVA) terminology,

i) S_W² is called the Within Samples Sum of Squares (W.S.S.)
ii) S_T² is called the Total Sum of Squares (T.S.S.)
iii) S_B² is called the Between Samples Sum of Squares (B.S.S.)
iv) S_B²/(k−1) is called the between samples Mean Sum of Squares (M.S.S.)
v) S_W²/(n−k) is called the within samples M.S.S.
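The F statistic of equation (6.9.4) can be checked numerically; a small sketch with invented samples:

```python
from statistics import mean

def anova_f(samples):
    # F = (S_B^2 / (k - 1)) / (S_W^2 / (n - k)) for k samples of total size n
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = mean(v for s in samples for v in s)
    s_w = sum((v - mean(s)) ** 2 for s in samples for v in s)    # within-samples SS
    s_b = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)  # between-samples SS
    return (s_b / (k - 1)) / (s_w / (n - k))

f = anova_f([[5, 6, 7], [8, 9, 10], [2, 3, 4]])
print(f)  # 27.0; F_(2,6)(0.05) = 5.14 from tables, so H0 (equal means) is rejected
```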
6.10 Test for the Variance of a Normal Population

Let x₁, x₂, …, xₙ be a random sample of size n, drawn from a normal population having mean μ and variance σ².

Assume that the hypothesised value of the variance is σ₀², i.e. we test

H₀ : σ² = σ₀²  against  H₁ : σ² ≠ σ₀²

We have the parameter spaces

Θ = {(μ, σ²) : −∞ < μ < ∞, σ² > 0}
Θ₀ = {(μ, σ²) : −∞ < μ < ∞, σ² = σ₀²}

The likelihood functions in Θ and Θ₀ are

L(Θ) = (1/(2πσ²))^(n/2) exp{−(1/(2σ²)) Σᵢ (xᵢ−μ)²}

L(Θ₀) = (1/(2πσ₀²))^(n/2) exp{−(1/(2σ₀²)) Σᵢ (xᵢ−μ)²}

The MLEs of μ and σ² in Θ are

μ̂ = x̄  and  σ̂² = (1/n) Σᵢ (xᵢ−x̄)² = S²   [Refer 6.7 for the MLE]

and in Θ₀, σ² = σ₀² is known, therefore the MLE of μ is μ̂ = x̄.

Thus the likelihood functions under Θ and Θ₀ reduce to

L(Θ̂) = (1/(2πS²))^(n/2) e^(−n/2)

L(Θ̂₀) = (1/(2πσ₀²))^(n/2) e^(−nS²/(2σ₀²))

Then the LRT criterion becomes

λ = L(Θ̂₀)/L(Θ̂) = (S²/σ₀²)^(n/2) e^(n/2) e^(−nS²/(2σ₀²))    … (6.10.1)

But under H₀ the statistic

χ² = nS²/σ₀² = Σᵢ (xᵢ−x̄)²/σ₀²    … (6.10.2)

follows the chi-square distribution with (n−1) degrees of freedom.

From equations (6.10.1) and (6.10.2) we get

λ = (χ²/n)^(n/2) e^(n/2) e^(−χ²/2)

and the critical region λ < C becomes

(χ²)^(n/2) e^(−χ²/2) < C n^(n/2) e^(−n/2) = B (say)

As the statistic χ² follows the chi-square distribution with (n−1) degrees of freedom, the critical region is obtained as a pair of intervals 0 < χ² ≤ χ₁² and χ² ≥ χ₂², where χ₁² and χ₂² are determined such that

P(χ² ≥ χ₂²) = α/2  and  P(χ² ≥ χ₁²) = 1 − α/2

(Fig. 6.10.1, not reproducible here, shows these two tails of the chi-square curve.)

In other words,

χ₂² = χ²_(n−1)(α/2)  and  χ₁² = χ²_(n−1)(1 − α/2)    … (6.10.3)

where χ²_ν(α) denotes the upper α-point of the chi-square distribution with ν degrees of freedom.

Thus the required critical region is the two-tailed region defined as

χ² ≥ χ²_(n−1)(α/2)  or  χ² ≤ χ²_(n−1)(1 − α/2)

We reject H₀ if χ² > χ²_(n−1)(α/2) or χ² < χ²_(n−1)(1 − α/2), otherwise we may accept it.
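The chi-square statistic of equation (6.10.2) in code (the sample data and σ₀² are invented):

```python
def chi_square_stat(xs, sigma0_sq):
    # chi^2 = n * S^2 / sigma0^2 = sum((x - mean)^2) / sigma0^2, with (n - 1) d.f.
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / sigma0_sq

xs = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9]
chi2 = chi_square_stat(xs, sigma0_sq=0.04)
print(round(chi2, 2))  # 10.5; compare with chi-square table values for 7 d.f.
```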

Table 6.10.1 : Normal population N(μ, σ²) where μ is unknown and σ² is hypothesised. (The table layout is not recoverable here; it summarizes the two-tailed and one-tailed chi-square critical regions derived above.)

6.11 Test for Equality of Variances of Two Normal Populations

Let N(μ₁, σ₁²) and N(μ₂, σ₂²) be two normal populations where μ₁, μ₂, σ₁² and σ₂² are all unknown.

Let x₁ᵢ (i = 1, 2, …, m) and x₂ⱼ (j = 1, 2, …, n) be independent random samples from the normal populations, of sizes m and n respectively.

Suppose we want to test

H₀ : σ₁² = σ₂² = σ²  v/s  H₁ : σ₁² ≠ σ₂²

with the parameter spaces defined as

Θ = {(μ₁, μ₂, σ₁², σ₂²) : −∞ < μᵢ < ∞, σᵢ² > 0}, i = 1, 2

Θ₀ = {(μ₁, μ₂, σ²) : −∞ < μᵢ < ∞, σ² > 0}, i = 1, 2

and the likelihood function is given by

L = (1/(2πσ₁²))^(m/2) (1/(2πσ₂²))^(n/2) exp{−(1/(2σ₁²)) Σᵢ (x₁ᵢ−μ₁)² − (1/(2σ₂²)) Σⱼ (x₂ⱼ−μ₂)²}    … (6.11.1)

Calculating the MLEs of μₖ and σₖ² (k = 1, 2) in Θ and Θ₀ (as discussed in previous topics) and substituting them in equation (6.11.1), we get the LRT statistic

λ = [(m+n)^((m+n)/2) (ms₁²)^(m/2) (ns₂²)^(n/2)] / [m^(m/2) n^(n/2) (ms₁² + ns₂²)^((m+n)/2)]    … (6.11.4)

where msₖ² = Σ (xₖᵢ−x̄ₖ)². Under H₀, the statistic

F = [Σᵢ (x₁ᵢ−x̄₁)²/(m−1)] / [Σⱼ (x₂ⱼ−x̄₂)²/(n−1)]    … (6.11.5)

follows the F-distribution with (m−1, n−1) d.f. Substituting equation (6.11.5) in equation (6.11.4) we get

λ = [(m+n)^((m+n)/2) / (m^(m/2) n^(n/2))] · [((m−1)/(n−1)) F]^(m/2) / [1 + ((m−1)/(n−1)) F]^((m+n)/2)    … (6.11.6)

But λ, as defined in (6.11.6), is a monotonic function of F. Therefore the test can be carried out using the F-distribution, with the test statistic as given in equation (6.11.5).

The critical region is given by a pair of intervals

F ≤ F₁  and  F ≥ F₂

where F₁ and F₂ are obtained such that

P(F ≥ F₁) = 1 − α/2  and  P(F ≥ F₂) = α/2

Since under H₀ the LRT is equivalent to the F-test, from the F-table,

F₁ = F_(m−1, n−1)(1 − α/2)  and  F₂ = F_(m−1, n−1)(α/2)

where F_(m, n)(α) is the upper α-point of the F-distribution with (m, n) degrees of freedom.

Thus we get a two-tailed F-test with critical region

F ≥ F_(m−1, n−1)(α/2)  or  F ≤ F_(m−1, n−1)(1 − α/2)
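The variance-ratio statistic of equation (6.11.5) maps directly onto `statistics.variance`, which already divides by (m−1); the data are invented:

```python
from statistics import variance

def f_statistic(x1, x2):
    # F = [sum((x1 - mean1)^2)/(m-1)] / [sum((x2 - mean2)^2)/(n-1)]
    return variance(x1) / variance(x2)

x1 = [20.1, 19.8, 20.6, 20.3, 19.7]
x2 = [18.9, 19.1, 19.0, 19.2]
f = f_statistic(x1, x2)
print(round(f, 2))  # 8.1; two-tailed: compare with F_(4,3)(alpha/2) and F_(4,3)(1-alpha/2)
```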
Example : Based on a random sample of size n from N(μ, σ²) with σ² known, obtain the LRT for testing H₀ : μ = 10 against H₁ : μ ≠ 10.

Solution : Given :

Θ₀ : μ = 10,  Θ₁ : μ ≠ 10

As the variance σ² is known, μ is the only parameter of the distribution. We define the parameter spaces as

Θ = {μ : −∞ < μ < ∞}  and  Θ₀ = {μ : μ = 10}

We know that the p.d.f. for N(μ, σ²) is

f(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²))

and the likelihood function becomes

L = (1/(2πσ²))^(n/2) exp{−(1/(2σ²)) Σᵢ (xᵢ−μ)²}

Replacing μ by its MLE x̄, the likelihood functions under Θ and Θ₀ are

L(Θ̂) = (1/(2πσ²))^(n/2) exp{−(1/(2σ²)) Σᵢ (xᵢ−x̄)²}

L(Θ̂₀) = (1/(2πσ²))^(n/2) exp{−(1/(2σ²)) [Σᵢ (xᵢ−x̄)² + n(x̄−10)²]}

so that

λ = L(Θ̂₀)/L(Θ̂) = e^(−n(x̄−10)²/(2σ²))

The critical region λ ≤ C becomes

n(x̄−10)²/(2σ²) ≥ −ln C  ⇒  |x̄−10|/(σ/√n) ≥ k (say)

But from the SND, under H₀,

Z = (x̄−10)/(σ/√n) ~ N(0, 1)

so we reject H₀ when |Z| ≥ k, where k is derived s.t. the size of the critical region is α = 5 %, i.e.

P(|Z| ≥ k / H₀) = α  ⇒  k = Z at α/2 = 1.96

Thus we reject H₀ if |Z| > 1.96.
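The rejection rule of this example in code (the sample numbers are our own):

```python
from statistics import NormalDist

def z_statistic(xbar, mu0, sigma, n):
    # Z = (sample mean - mu0) / (sigma / sqrt(n)) ~ N(0, 1) under H0
    return (xbar - mu0) / (sigma / n ** 0.5)

z = z_statistic(xbar=10.8, mu0=10.0, sigma=2.0, n=25)
crit = NormalDist().inv_cdf(0.975)       # about 1.96 for alpha = 5 %
print(round(z, 2), round(crit, 2))       # 2.0 1.96 -> |Z| > 1.96, reject H0
```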
Example : Let f(x, θ) = θ e^(−θx), x > 0    … (1)

Derive the LRT for testing H₀ : θ ≤ θ₀ against H₁ : θ > θ₀, based on a random sample of size n.

Solution : The parameter spaces are

Θ = {θ : θ > 0}  and  Θ₀ = {θ : 0 < θ ≤ θ₀}

Using equation (1), the likelihood function can be written as

L = Π from i = 1 to n of θ e^(−θxᵢ) = θⁿ e^(−θ Σxᵢ)

The MLE of θ can be obtained by solving the equation

∂(log L)/∂θ = n/θ − Σᵢ xᵢ = 0  ⇒  θ̂ = n/Σᵢ xᵢ = 1/x̄

With this MLE θ̂, the likelihood function under Θ becomes

L(Θ̂) = (1/x̄)ⁿ e^(−n)

Under Θ₀, the maximum is attained at θ̂ itself when θ̂ ≤ θ₀, and at the boundary θ₀ when θ̂ > θ₀, so

L(Θ̂₀) = (1/x̄)ⁿ e^(−n)  if θ₀ x̄ ≥ 1 ;  = θ₀ⁿ e^(−θ₀ Σxᵢ)  if θ₀ x̄ < 1

Then the LRT statistic becomes

λ = L(Θ̂₀)/L(Θ̂) = 1  if θ₀ x̄ ≥ 1 ;  = (θ₀ x̄)ⁿ e^(−n(θ₀ x̄ − 1))  if θ₀ x̄ < 1

Let θ₀ x̄ = t, then

λ = tⁿ e^(−n(t−1))  for t < 1

Note that tⁿ e^(−n(t−1)) attains its maximum value 1 at t = 1, i.e. when x̄ = 1/θ₀. Clearly λ ≤ 1.

Assume that 0 < C < 1. Then we reject H₀ if λ < C,

i.e. iff tⁿ e^(−n(t−1)) < C,  or iff θ₀ x̄ = t < k

where k is a constant such that 0 < k < 1.

If the l.o.s. of the critical region is α, then k is defined as

P(t ≤ k) = α
⇒ P(θ₀ x̄ ≤ k) = α
⇒ P(θ₀ Σᵢ xᵢ ≤ nk) = α

6.12 Non-Parametric Methods

All hypothesis-testing methods that we have studied so far are based on an important assumption : that the data drawn from a sample are normally distributed. These methods are concerned with either estimating or testing hypotheses about population parameters. Thus the
tests which deal with drawing conclusions about the parameters of a population are called "parametric tests".

But in practice, many situations arise where the population under consideration does not follow any specific distribution. To deal with such problems, statisticians have derived several methods and tests in which no assumptions about the distribution, or about the parameters of the population, are made. These methods are known as "non-parametric" methods.

As these methods are independent of the population distribution, they are also known as "distribution free" methods.

6.12.1 Advantages of Non-parametric Tests (N.P.)

N.P. methods have many advantages over the parametric methods.

1) N.P. methods are easy and simple to understand, and easy and simple to implement when the sample size is small.
2) N.P. methods do not require complicated computation; therefore these methods are less time consuming.
3) In N.P. methods no assumption is made regarding the distribution of the parent population from which the sample is drawn.
4) They need only nominal or ordinal data.
5) N.P. methods not only make less rigorous assumptions but also work well even when certain assumptions are violated.

6.12.2 Disadvantages of Non-parametric Tests

1) Due to the simplicity of N.P. tests, they are often used even if there exists an appropriate parametric method. In such cases (where a parametric test is available and can be more powerful), applying an N.P. test is just a waste of time and data.
2) All N.P. tests are not as simple as they are claimed to be.
3) It is not possible to determine the actual power of an N.P. test.

Exercise

1. Define i) Parameter space ii) Likelihood function iii) Likelihood ratio test statistic.
2. Consider a random sample x₁, x₂, …, xₙ of size n from a normal distribution N(μ, σ²). Develop the LRT to test H₀ : μ = μ₀ (specified) against i) H₁ : μ > μ₀ ii) H₁ : μ < μ₀ iii) H₁ : μ ≠ μ₀. [Given : σ² is known]
3. Let x₁, x₂, …, xₙ be a random sample of size n from a normal distribution N(μ, σ²), where μ and σ² are both unknown. Show that the LRT used to test H₀ : μ = μ₀ v/s H₁ : μ ≠ μ₀, 0 < σ² < ∞, is the two-tailed t-test.
4. Prove that the LRT for testing the equality of variances of two normal populations is the same as the two-tailed F-test.
5. Derive the LRT to test H₀ : θ = θ₀ against H₁ : θ > θ₀, based on a random sample of size n from a Poisson distribution.
6. What are composite and simple hypotheses ? Give examples.
7. Write short notes on :
1) Most powerful test 2) Uniformly most powerful test
3) The best critical region 4) Level of significance.
8. What is the importance of type I and type II errors in the Neyman and Pearson theory of testing ?
9. State and prove the Neyman-Pearson lemma for testing a simple hypothesis against a simple alternative hypothesis.
10. Let x₁, x₂, …, xₙ be a random sample from a p.d.f.
f(x, θ) = θ x^(θ−1), 0 < x < 1, θ > 0 ;  = 0 otherwise.
Find a U.M.P. test of size α for testing H₀ : θ = 1 against H₁ : θ > 1.

Multiple Choice Questions

Q.1 The set of all possible values of a parameter of some p.d.f. is called

[a] null space  [b] parameter space  [c] critical region  [d] none of these

Q.2 The ratio of the maximum likelihood function defined over Θ₀ (null parameter space) to that defined over Θ (complete parameter space) is called

[a] likelihood ratio statistic  [b] p-value  [c] maximum likelihood estimate  [d] t-test statistic
Q.3 The likelihood ratio test statistic λ =

[a] sup over θ ∈ Θ of L(x, θ) / sup over θ ∈ Θ₀ of L(x, θ)  [b] sup over θ ∈ Θ₀ of L(x, θ)
[c] sup over θ ∈ Θ₀ of L(x, θ) / sup over θ ∈ Θ of L(x, θ)  [d] sup over θ ∈ Θ of L(x, θ)

Q.4 According to the L.R.T., H₀ is rejected if 0 < λ < λ₀, where λ₀ is derived from

[a] P(λ < λ₀ | H₁) = α  [b] P(λ < λ₀ | H₀) = α
[c] P(λ > λ₀ | H₁) = α  [d] P(λ > λ₀ | H₀) = α

Q.5 If x₁, x₂, …, xₙ is a random sample from a population with p.d.f. f(x, θ₁, θ₂, …, θₖ), then the likelihood function is given by

[a] L = Σ from i = 1 to n of f(xᵢ, θ₁, θ₂, …, θₖ)  [b] L = Π from i = 1 to n of f(xᵢ, θ₁, θ₂, …, θₖ)
[c] L = Σ from i = 1 to n of log f(xᵢ, θ₁, θ₂, …, θₖ)  [d] none of these

Q.6 The LRT for testing H₀ : μ = μ₀ against H₁ : μ ≠ μ₀, based on a random sample of size n from a normal population with unknown μ and σ², is equivalent to

[a] two-tailed F-test  [b] one-tailed F-test
[c] one-tailed t-test  [d] two-tailed t-test

Q.7 A uniformly most powerful test among the class of unbiased tests is termed as

[a] optimum biased test  [b] uniformly most powerful unbiased test
[c] minimax unbiased test  [d] minimax test

Q.8 The Neyman-Pearson lemma provides the best critical region for testing a ______ null hypothesis against a ______ alternative hypothesis.

[a] composite, simple  [b] composite, composite
[c] simple, composite  [d] simple, simple

Q.9 Suppose a t-test is applied to test the equality of means of two random variables X and Y, where X and Y are independent. Then

[a] the distributions of both X and Y must be normal with equal variances.
[b] the distributions of both X and Y must be normal.
[c] the distributions of X and Y may not be normal.
[d] both X and Y must have the same variance.

Answer Keys for Multiple Choice Questions :

Q.1 b, Q.2 a, Q.3 c, Q.4 b, Q.5 b, Q.6 d, Q.7 b, Q.8 d, Q.9 b
SOLVED MODEL QUESTION PAPER (In Sem)

Statistics
S.E. (AI and DS) Semester - IV (As Per 2020 Pattern)

Time : 1 Hour]  [Maximum Marks : 30

N.B. :
i) Attempt Q.1 or Q.2, Q.3 or Q.4.
ii) Neat diagrams must be drawn wherever necessary.
iii) Figures to the right side indicate full marks.
iv) Assume suitable data, if necessary.

Q.1 a) What is statistics ? Give the importance and limitations of statistics.
(Refer sections 1.1, 1.3 and 1.4) [5]

b) Define sampling. Explain random sampling. (Refer sections 1.7 and 1.7.2) [4]

c) What is population and sample ? Give the difference between them.
(Refer section 1.7) [4]

OR

Q.2 a) What are the methods of estimation ? Give a brief on testing of hypothesis.
(Refer sections 1.10.1 and 1.10.2) [5]

b) Give advantages and disadvantages of statistical analysis.
(Refer sections 1.3 and 1.4) [4]

c) Explain the scope of statistics in engineering and technology.
(Refer section 1.2.2) [4]

Q.3 a) What is a histogram ? Draw the histogram for the following data - [5]

Age (in years) : 2-5, 5-11, 11-12, 12-14, 14-15, 15-16
No. of boys : 6, 6, 2, 5, 1, 3
Ans. :

Scale :
X-axis : Age (in years), 1 cm = 1 year
Y-axis : No. of boys, 1 cm = 1

(Fig. 1 : Histogram of the given data - the figure itself is not reproducible here; it plots the number of boys against age in years over the range 1 to 16.)

b) State merits and demerits of arithmetic mean (two each). (Refer section 2.10) [4]

c) Obtain the median from the following table. [4]

Class : 0-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700
Frequency : 9, 15, 18, 21, 18, 14, 5

Ans. :

C.I. | f | Less than c.f.
0-100 | 9 | 9
100-200 | 15 | 24
200-300 | 18 | 42
300-400 | 21 | 63  (median class)
400-500 | 18 | 81
500-600 | 14 | 95
600-700 | 5 | 100

Here N = 100, so N/2 = 50 and the median class is 300-400, with l = 300, h = 100, f = 21, c.f. = 42.

Median = l + ((N/2 − c.f.)/f) × h
       = 300 + ((50 − 42)/21) × 100 = 300 + 0.3809 × 100
       = 338.0952

d) Define geometric mean and harmonic mean. Compare them on the basis of merits.
(Refer sections 2.17 and 2.19) [4]

OR

Q.4 a) Daily expenditure of 100 families on transport is given below : [5]

Expenditure : 20-29, 30-39, 40-49, 50-59, 60-69
No. of families : 14, ?, 27, ?, 15

If the mode of the distribution is 43.5, find the missing frequencies.

Ans. : After making the classes continuous :

Class | f
19.5-29.5 | 14
29.5-39.5 | f₁
39.5-49.5 | 27
49.5-59.5 | f₂
59.5-69.5 | 15

N = 100, Mode = 43.5

Here the maximum frequency is 27; thus the class 39.5-49.5 is the modal class, where l = 39.5, the modal frequency fₘ = 27, f₁ = preceding frequency, f₂ = succeeding frequency, h = 10.

Mode = l + ((fₘ − f₁)/(2fₘ − f₁ − f₂)) × h

43.5 = 39.5 + ((27 − f₁)/(54 − f₁ − f₂)) × 10
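The grouped-data median formula used in Q.3 c) above, as a one-line check:

```python
def grouped_median(l, N, cf, f, h):
    # median = l + ((N/2 - cf) / f) * h for the median class (l, l + h)
    return l + ((N / 2 - cf) / f) * h

m = grouped_median(l=300, N=100, cf=42, f=21, h=100)
print(round(m, 4))  # 338.0952
```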
⇒ 43.5 (54 − f₁ − f₂) = 39.5 (54 − f₁ − f₂) + 270 − 10 f₁

⇒ 2349 − 43.5 f₁ − 43.5 f₂ = 2403 − 49.5 f₁ − 39.5 f₂

⇒ 49.5 f₁ + 39.5 f₂ − 43.5 f₁ − 43.5 f₂ = 54

⇒ 6 f₁ − 4 f₂ = 54    … (1)

As N = Σf = 100,

14 + f₁ + 27 + f₂ + 15 = 100
⇒ f₁ + f₂ = 100 − 56 = 44    … (2)

Solving equations (1) and (2),

f₁ = 23, f₂ = 21

b) Give merits and demerits of median. (Refer section 2.13) [4]

c) Calculate the harmonic mean of the following series : [4]

Values : 2, 6, 10, 14, 18
Frequency : 4, 12, 20, 9, 5

Ans. :

x | f | f/x
2 | 4 | 2.0000
6 | 12 | 2.0000
10 | 20 | 2.0000
14 | 9 | 0.6429
18 | 5 | 0.2778
Totals : Σf = 50, Σ(f/x) = 6.9206

H.M. = Σf / Σ(f/x) = 50 / 6.9206 = 7.2248

d) State the advantages and limitations of graphical representation of data.
(Refer section 2.5) [4]

SOLVED MODEL QUESTION PAPER (End Sem)

Statistics
S.E. (AI and DS) Semester - IV (As Per 2020 Pattern)

Time : 2½ Hours]  [Maximum Marks : 70

N.B. :
i) Attempt Q.1 or Q.2, Q.3 or Q.4, Q.5 or Q.6, Q.7 or Q.8.
ii) Neat diagrams must be drawn wherever necessary.
iii) Figures to the right side indicate full marks.
iv) Assume suitable data, if necessary.

Q.1 a) For the regression lines 3x + 2y − 26 = 0 and 6x + y − 31 = 0, find i) x̄ and ȳ ii) the correlation coefficient r between x and y.
(Refer example 3.18.1) [6]

b) Compute M.D. about i) Mean ii) Median, and also find the coefficient of M.D. for the following frequency distribution. (Refer example 3.6.2) [5]

Class interval : 1-3, 3-5, 5-7, 7-9
Frequency : 3, 4, 2, 1

c) An MNC company conducted an aptitude test for 1000 candidates. The average score is 45 and the standard deviation of scores is 25. Assuming a normal distribution for the result, find i) the number of candidates whose scores exceed 60 ii) the number of candidates whose scores lie between 30 and 60.
(Refer example 3.31.11) [6]
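Treating 2, 6, 10, 14, 18 as the values and 4, 12, 20, 9, 5 as their frequencies, the harmonic mean of Q.4 c) works out as (our code):

```python
values = [2, 6, 10, 14, 18]
freqs = [4, 12, 20, 9, 5]
# H.M. = (sum of frequencies) / (sum of f/x)
hm = sum(freqs) / sum(f / x for x, f in zip(values, freqs))
print(round(hm, 4))  # 7.2248
```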
OR

Q.2 a) The following marks have been obtained by a class of students in two papers of mathematics : [5]

Paper I : 45, 55, 56, 58, 60, 65, 68, 70, 75, 80, 85
Paper II : 56, 50, 48, 60, 62, 64, 65, 70, 74, 82, 90

Calculate the coefficient of correlation for the above data. (Refer example 3.15.3)

b) Find the first four moments about an arbitrary point for the following data.
(Refer example 3.10.2) [6]

Marks : 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
No. of students : 2, 1, 5, 4, 5, 10, 75, 50, 74, 62, 40

c) Out of 2000 families with 4 children each, how many would you expect to have
i) at least a boy ii) two boys iii) 1 or 2 girls iv) no girls ?
(Refer example 3.31.5) [6]

Q.3 a) An unbiased coin is thrown 10 times. Find the probability of i) getting exactly 6 heads ii) getting at least 6 heads. (Refer example 4.14.3) [6]

b) The probability distribution of X is as follows : [6]

x : 0, 1, 2, 3, 4
P(X = x) : 0.1, k, 2k, 2k, k

Given that this is a probability mass function, find i) k ii) P[X < 2] iii) P[X > 3] iv) P[1 < X < 3].
(Refer example 4.5.1)

c) In a continuous distribution the density function is f(x) = k x (2 − x), 0 < x < 2. Find,

OR

Q.4 b) Let a random variable X take values −2, −1, 0, 1, 2 such that
P(X = −2) = P(X = −1) = P(X = 1) = P(X = 2) and
P(X < 0) = P(X = 0) = P(X > 0). Determine the probability mass function of X.
(Refer example 4.5.3) [6]

c) The probability density function is f(x) = λ x e^(−x), x > 0.
Find i) the value of λ ii) P(2 < x < 5). (Refer example 4.5.6) [6]

Q.5 a) Let p be the probability that a coin will fall head in a single toss. In order to test H₀ : p = 1/2 against H₁ : p = 3/4, the coin is tossed 5 times and H₀ is rejected if more than 3 heads are obtained. Find the probability of type I error and the power of the test. (Refer example 5.8.2) [6]

b) A normal population has mean 6.8 and standard deviation 1.5. A sample of 400 members gave a mean of 6.75. Is the difference significant ?
(Refer example 5.10.8) [5]

c) Suppose that sweets are sold in packages of fixed weight of contents. The producer of the packages is interested in testing whether the average weight of contents in packages is 1 kg. Hence a random sample of 12 packages is drawn and their contents (in kg) are found to be as follows : 1.05, 1.01, 1.04, 0.98, 0.96, 1.01, 0.97, 0.99, 0.98, 0.95, 0.97, 0.95. Using the above data, what should he conclude about the average ?
(Refer example 5.10.17) [6]

OR

Q.6 a) In an experiment on pea breeding, the following frequencies of seeds were obtained :

Round and green : 222, Wrinkled and green : 120, Round and yellow : 32, Wrinkled and yellow : 150, Total : 524

Theory predicts that the frequencies should be in the proportion 8 : 2 : 2 : 1. Examine the
- OR Theory predicts that the frequencies should be in proportion 8:2:2:1. Examine the
correspendence between theory and experiment. (Refer example 5.11.3)
@.4 a) Fit a Poisson distribution to the following data and calculate theoretical frequencies.
(Refer example 4.17.4) [6] From the data given below. Intelligence tests of two groups of boys and girls gave
b)
x: i 0 1 2 3 4 Total the follewing results. Examine the difference is significant.
(Refer example 5.10.12} . [5]
f; © 109 65. 22 83 1 200

            Mean   S.D.   Size
   Girls     70     10     70
   Boys      75     11    110
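For Q.6 b), the usual large-sample comparison of two means uses a z-statistic. The sketch below takes the table values as read from the scan (the girls' sample size is partly garbled, so treat it as an assumption):

```python
import math

# Large-sample z-test for the difference of two means.
m1, s1, n1 = 70.0, 10.0, 70      # girls: mean, S.D., size (size assumed from scan)
m2, s2, n2 = 75.0, 11.0, 110     # boys:  mean, S.D., size

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = (m1 - m2) / se
print(round(z, 2))               # -3.14; |z| > 1.96 → significant at the 5% level
```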

c) In two independent samples of sizes 8 and 10, the sums of squares of deviations of
   the sample values from the respective sample means were 84.4 and 102.6. Test whether
   the difference between the variances of the populations is significant.
   (Refer example 5.10.22) [6]
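A minimal sketch of the F-ratio for Q.6 c) (the full test is example 5.10.22): divide each sum of squares by its degrees of freedom and put the larger estimate in the numerator.

```python
# F-test for equality of two variances: s² = SS / (n - 1) for each sample.
ss1, n1 = 84.4, 8
ss2, n2 = 102.6, 10

v1 = ss1 / (n1 - 1)              # 12.057...
v2 = ss2 / (n2 - 1)              # 11.4
F = max(v1, v2) / min(v1, v2)    # larger variance on top, df = (7, 9)
print(round(F, 3))               # 1.058, far below the 5% table value (≈ 3.29)
```

Since F is close to 1 and well under the tabulated 5% point, the difference between the population variances is not significant.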
Q.7 a) State and prove the Neyman-Pearson lemma. (Refer section 6.4) [8]

b) Check whether a BCR exists for testing the null hypothesis H0 : θ = θ0 against
   H1 : θ = θ1 > θ0 for the parameter θ of the distribution
   f(x, θ) = θ / x^(θ+1), 1 ≤ x < ∞. (Refer example 6.4.7) [5]

c) Let X be a random sample from a normal distribution N(μ, σ²), where μ is unknown
   and σ² = 2. Derive the likelihood ratio test for H0 : μ = 10 against H1 : μ ≠ 10
   at 5% level of significance.
   (Given that the maxima of the likelihood function under Ω and Ω0 are
   L(Ω̂)  = (1 / (2πσ²))^(n/2) e^(−(1/(2σ²)) Σ(x_i − x̄)²),
   L(Ω̂0) = (1 / (2πσ²))^(n/2) e^(−(1/(2σ²)) [Σ(x_i − x̄)² + n(x̄ − 10)²]).)
   Ans. : H0 is rejected if |Z| > 1.96.
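The rejection rule in Q.7 c) reduces to the familiar z-test with known variance: Z = (x̄ − 10) / √(σ²/n) with σ² = 2, rejecting when |Z| > 1.96. A small sketch (the sample values are illustrative only, not from the paper):

```python
import math

# Likelihood-ratio test for H0: μ = 10 vs H1: μ ≠ 10 with σ² = 2 known,
# which simplifies to the two-sided z-test at the 5% level.
def lrt_reject(sample, mu0=10.0, sigma2=2.0, z_crit=1.96):
    n = len(sample)
    x_bar = sum(sample) / n
    z = (x_bar - mu0) / math.sqrt(sigma2 / n)
    return abs(z) > z_crit, z

# Hypothetical sample, for illustration only.
reject, z = lrt_reject([10.4, 9.8, 10.9, 11.2, 10.6, 9.9, 10.8, 11.0])
print(reject, round(z, 2))       # False 1.15 → H0 is not rejected
```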
OR

Q.8 a) If x ≥ 1 is the critical region for testing H0 : θ = 2 against the alternative
   H1 : θ = 1, on the basis of a single observation from the population
   f(x, θ) = θ e^(−θx), 0 ≤ x < ∞, obtain the values of the type I and type II errors.
   (Refer example 6.4.4) [5]
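Assuming the (garbled) density in Q.8 a) is the exponential f(x, θ) = θe^(−θx), the usual form of this exercise, both error probabilities are exponential tail areas:

```python
import math

# Critical region x >= 1 for an exponential density f(x, θ) = θ·e^(−θx).
def tail_prob(theta, c=1.0):
    # P(X >= c) for an exponential with rate theta: ∫_c^∞ θ e^(−θx) dx = e^(−θc)
    return math.exp(-theta * c)

alpha = tail_prob(2.0)           # type I:  P(x >= 1 | θ = 2) = e^(−2)
beta = 1.0 - tail_prob(1.0)      # type II: P(x <  1 | θ = 1) = 1 − e^(−1)
print(round(alpha, 4), round(beta, 4))   # 0.1353 0.6321
```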
b) Show that the likelihood ratio test for testing the equality of variances of two
   normal distributions is the usual F-test. (Refer section 6.11) [8]

c) State advantages and disadvantages of non-parametric tests.
   (Refer sections 6.12.1 and 6.12.2) [5]