
https://shop.khoji.net

EEC-13: ELEMENTARY STATISTICAL METHODS AND SURVEY TECHNIQUES

Programme Code: BDP


Course Code: EEC-13
Assignment Code: EEC-13/AST/TMA/2017-18
Maximum Marks: 100

Answer all the questions.

A. Long Answer Questions 2 x 20 = 40 marks

1. (a) Explain the concepts of skewness and kurtosis. Briefly describe how these can be measured.

(b) The median, mode and Karl Pearson’s coefficient of skewness of a distribution are 17.4, 15.3
and 0.35 respectively. Calculate the coefficient of variation of the distribution.
2. (a) Explain the concepts of level of significance and rejection region using suitable diagram of a
standard normal curve.
(b) Explain the concept of standard error through a suitable example. What are its implications?
(c) How do you construct confidence interval for a statistic? What are its implications?

B. Medium Answer Questions 4 x 12= 48 marks

3) Define probability distribution function of binomial distribution. Assume that the probability of
a boat reaching its destination safely is 0.9. Under the assumption of binomial distribution, find
the mean and standard deviation of boats reaching destination safely out of a total of 500 boats.

4) Calculate index numbers using Paasche’s method and Fisher’s method from the following data.

Commodity   p1   q1   p0   q0
A            5   14    3    8
B            8   18    6   25
C            3   25    1   40
D           15   36   12   48
E            9   14    7   18
F            7   13    5   19

5) a) Define correlation coefficient. What are its properties?


b) Fit a straight line (Y = a + bX) to the following data. Compare the estimated values of the
dependent variable with its actual values.

X    5    8   10   12   13   15   17   16
Y    8   12   14   10   13   16   14   17
6) What is a life table? Explain its uses and limitations.

C. Short Answer Questions 2 x 6= 12 marks

7) Write short notes on the following:

(a) Bayes’ theorem of probability


(b) Age specific birth and death rates

8) Differentiate between the following:

(a) Simple random sampling and Stratified random sampling
(b) Type I and Type II errors in hypothesis testing


ASSIGNMENT REFERENCE MATERIAL (2017-18)

E.E.C.-13

ELEMENTARY STATISTICAL METHODS AND SURVEY TECHNIQUES

A. Long Answer Questions

Q1. (a) Explain the concepts of skewness and kurtosis. Briefly describe how these can be
measured.

Ans. Concept of skewness: The term ‘skewness’ means lack of symmetry, i.e. if the distribution of
data is not symmetrical, it is called a skewed distribution. Any measure of skewness indicates the
difference between the manner in which items are distributed in a particular distribution and the
manner in which they would be distributed in a symmetrical (or normal) distribution. If skewness is
positive, the frequencies in the distribution are spread out over a greater range of values on the
high-value end of the curve (the right-hand side) than they are on the low-value end. If the curve is
normal, the spread will be the same on both sides of the centre point, and the mean, median and mode
will all have the same value.

A simple method of finding the direction of skewness is to consider the tails of a frequency polygon.
The concept of skewness will be clear from the following three figures showing symmetrical,
positively skewed and negatively skewed distributions.

There are four measures of skewness. These measures are discussed briefly below:

Karl Pearson’s Measure: The formula for measuring skewness as given by Karl Pearson is as
follows:

Skewness = Mean – Mode

Dividing by the standard deviation σ gives the relative measure, Karl Pearson’s coefficient of
skewness:

SKp = (Mean – Mode) / σ

Bowley’s Measure: Bowley developed a measure of skewness, which is based on quartile values.
The formula for measuring skewness is:

Skewness = [(Q3 – Q2) – (Q2 – Q1)] / (Q3 – Q1)

where Q3 and Q1 are the upper and lower quartiles and Q2 is the median. The value of this measure
lies between –1 and +1. This measure is particularly useful in the case of open-ended distributions
and where extreme values are found in the series.

Kelly’s Measure: Kelly developed another measure of skewness, which is based on percentiles.

The formula for measuring skewness is as follows:

Skewness = (P90 – 2P50 + P10) / (P90 – P10) = (D9 – 2D5 + D1) / (D9 – D1)

where P and D stand for percentile and decile, respectively.

Moment’s Measure: In mechanics, the term moment is used to denote the rotating effect of a force.
In Statistics, it is used to indicate peculiarities of a frequency distribution. The utility of moments lies
in the sense that they indicate different aspects of a given distribution. Thus, by using moments, we
can measure the central tendency of a series, dispersion or variability, skewness and the peakedness of
the curve.

Concept of kurtosis: Kurtosis is the measure of the shape of a frequency curve. It is a Greek word,
which means bulginess. While skewness signifies the extent of asymmetry, kurtosis measures the
degree of peakedness of a frequency distribution.

Karl Pearson classified curves into three types on the basis of the shape of their peaks. These are
mesokurtic, leptokurtic and platykurtic. These three types of curves are shown in Fig

The coefficient of kurtosis as given by Karl Pearson is β2 = µ4 / µ2², where µ2 and µ4 are the second
and fourth moments about the mean. In the case of a normal distribution, that is, a mesokurtic curve,
the value of β2 = 3. If β2 turns out to be > 3, the curve is called a leptokurtic curve and is more
peaked than the normal curve. Again, when β2 < 3, the curve is called a platykurtic curve and is less
peaked than the normal curve.
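
As a rough illustration of the moment-based measures, the sketch below computes moments about the mean for a small data set and derives β2 = µ4 / µ2² from them. The sample values and function names are hypothetical, and the skewness coefficient µ3 / µ2^1.5 is a commonly used moment measure assumed here, since the text does not state its formula.

```python
def central_moment(data, k):
    """k-th moment about the mean: sum((x - mean)**k) / n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def moment_skewness(data):
    """Assumed moment measure of skewness: mu3 / mu2**1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def beta2(data):
    """Karl Pearson's coefficient of kurtosis: mu4 / mu2**2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


# Hypothetical observations with a long right-hand tail.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 12]

print("skewness:", round(moment_skewness(sample), 3))  # positive means right-skewed
print("beta2:", round(beta2(sample), 3))               # compare with 3 (mesokurtic)
```

A positive skewness value points to a longer right-hand tail, and β2 is read against the benchmark of 3 described above.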


(b) The median, mode and Karl Pearson’s coefficient of skewness of a distribution are 17.4, 15.3
and 0.35 respectively. Calculate the coefficient of variation of the distribution.

Ans. C.V. = (S.D. / Mean) × 100

Here the standard deviation and the mean are not given. To find the mean, use the empirical formula

Mode = 3 Median – 2 Mean

15.3 = 3(17.4) – 2 Mean

2 Mean = 52.2 – 15.3

2 Mean = 36.9

Mean = 36.9 / 2

Mean = 18.45

To find the S.D., use the skewness formula

SKp = (Mean – Mode) / σ

0.35 = (18.45 – 15.3) / σ

0.35σ = 3.15

σ = 3.15 / 0.35

σ = 9

Therefore, C.V. = (9 / 18.45) × 100

C.V. = 48.78 %
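
The arithmetic above can be checked with a short sketch. The function name is hypothetical; it simply chains the empirical relation Mode = 3 Median – 2 Mean and the skewness formula used in the answer.

```python
def coefficient_of_variation(median, mode, sk_p):
    """Return (mean, sigma, C.V. in per cent) from median, mode and SKp."""
    mean = (3 * median - mode) / 2   # from Mode = 3 Median - 2 Mean
    sigma = (mean - mode) / sk_p     # from SKp = (Mean - Mode) / sigma
    return mean, sigma, 100 * sigma / mean


mean, sigma, cv = coefficient_of_variation(median=17.4, mode=15.3, sk_p=0.35)
print(f"Mean = {mean:.2f}, S.D. = {sigma:.2f}, C.V. = {cv:.2f} %")
```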

Q2. (a) Explain the concepts of level of significance and rejection region using suitable diagram
of a standard normal curve.

Ans. The level of significance refers to the degree of significance with which we accept or reject a
particular hypothesis. Since 100 per cent accuracy is not possible in taking a decision over the
acceptance or rejection of a hypothesis, we have to take the decision at a particular level of
significance, which states the probability of being wrong in rejecting a hypothesis that is in fact
true. In most cases of hypothesis testing, this level is fixed at 5 per cent, which corresponds to a
confidence level of 95 per cent. For greater precision, the level may be fixed at 1 per cent, which
corresponds to a confidence level of 99 per cent. This level is usually denoted by the symbol α
(alpha), which represents the probability of committing the Type I error (i.e. rejecting a null
hypothesis which is true). The level of significance is always fixed in advance, before applying the
test procedures. If no level of significance is given, it is conventional to take α = 0.05.

The underlying idea behind hypothesis testing and interval estimation is the same. A confidence
interval is built around the sample mean with a certain confidence level. A confidence level of 95 per
cent implies that in 95 per cent of cases the population mean would lie within the confidence interval
estimated from the sample mean. It is implicit that in 5 per cent of cases the population mean will not
lie within the confidence interval. When the hypothesised population mean does not lie within the
confidence interval, we should reject the null hypothesis. The sampling distribution of the sample
mean (x̄) follows a normal distribution with mean µ and standard deviation σ/√n, which is the
standard error of x̄.

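In notation, x̄ ~ N(µ, σ²/n), so that z = (x̄ – µ) / (σ/√n) follows the standard normal distribution.
The rejection region consists of the values of z in the tails of this curve whose total area is α.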
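
A minimal sketch of how the rejection region follows from α for a two-tailed test on the standard normal curve. The observed z value of 2.2 and the function name are hypothetical; the cutoffs come from Python's standard-library NormalDist.

```python
from statistics import NormalDist


def two_tailed_test(z, alpha=0.05):
    """Return (critical value, True if z falls in the rejection region)."""
    # Each tail holds alpha / 2 of the area under the standard normal curve.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit, abs(z) > z_crit


z_observed = 2.2  # hypothetical test statistic
for alpha in (0.05, 0.01):
    z_crit, reject = two_tailed_test(z_observed, alpha)
    print(f"alpha = {alpha}: reject H0 if |z| > {z_crit:.2f}; reject = {reject}")
```

With these assumed numbers the same statistic is rejected at the 5 per cent level but not at the 1 per cent level, because the rejection region shrinks as α falls.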

(b) Explain the concept of standard error through a suitable example. What are its
implications?

Ans. The standard deviation of the sampling distribution of a statistic is known as the standard error
of that statistic. As there are various types of sampling distribution, we can have various types of
standard errors depending on the nature of the sampling distribution. The standard deviation of the
sampling distribution of means is called the standard error of the mean. In sampling theory, instead
of using the term standard deviation for measuring this variation, we use the term standard error
of the mean.

Utility of Standard Error


The standard error is used in a large number of problems which are discussed as follows:

• Reliability of a Sample: The standard error gives an idea about the reliability and precision of a
sample. That is, it indicates how much the estimated value differs from the observed values. The
greater the standard error, the greater is the deviation between the estimated and observed values and
lesser is the reliability of a sample. The smaller the standard error, the smaller is the deviation
between the estimated and observed values and greater is the reliability of a sample.

• Tests of Significance: The standard error is also used to test the significance of the various results
obtained from small and large samples. In the case of a large sample, if the difference between the
observed and the expected value is greater than 1.96 times the standard error, we reject the null
hypothesis at the 5% level and conclude that the sample differs significantly from the population.
Similarly, if the difference between the observed and the expected value is greater than 2.58 S.E.
(standard error), we reject the null hypothesis at the 1% level and conclude that the sample differs
significantly from the population.


• To determine the confidence limits of the unknown population mean: The standard error enables us
to determine the confidence limits within which a population parameter is expected to lie with a
certain degree of confidence. The confidence limits of the unknown population mean μ are given by:

x̄ ± 1.96 S.E. (95 per cent confidence limits)
x̄ ± 2.58 S.E. (99 per cent confidence limits)

The standard error of the sampling distribution of means is obtained as:

S.E.(x̄) = σ / √n

where σ is the population standard deviation and n is the sample size.
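
As an illustrative example, the sketch below computes the standard error of the mean as σ/√n and the ratio of the observed difference to the standard error. All numbers (σ = 12, the sample sizes, sample mean 54 and hypothesised mean 50) are hypothetical.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 12.0                                  # assumed population S.D.
sample_mean, hypothesised_mean = 54.0, 50.0   # assumed values

for n in (36, 144):
    se = standard_error(sigma, n)
    ratio = abs(sample_mean - hypothesised_mean) / se
    print(f"n = {n}: S.E. = {se:.2f}, difference / S.E. = {ratio:.2f}")
```

Quadrupling the sample size halves the standard error, so the same difference of 4 units looks far more significant with the larger sample.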

(c) How do you construct confidence interval for a statistic? What are its implications?

Ans. We assume that the random sample is taken from a normal distribution and that the population
variance is known. The latter assumption is somewhat unrealistic, because the population variance is
rarely known. Suppose a random sample of size n is taken from a population with an unknown mean µ
and known variance σ². The confidence interval uses the fact that the random variable Z, where

Z = (x̄ – µ) / (σ/√n)

has a standard normal distribution. Suppose a 100(1 – α) per cent confidence interval is set up, so that
α/2 is the area of the right tail of the normal distribution, α/2 is the area of the left tail, and 1 – α is
the area in the centre, as shown in the figure. The cutoff points on the normal distribution are z(α/2)
and –z(α/2). The confidence interval is derived as follows:

P[–z(α/2) ≤ (x̄ – µ) / (σ/√n) ≤ z(α/2)] = 1 – α

P[x̄ – z(α/2) σ/√n ≤ µ ≤ x̄ + z(α/2) σ/√n] = 1 – α    ... (i)

Equation (i) implies that confidence intervals have the following characteristics:

• As the standard deviation increases, the length of the confidence interval increases. This result is
understandable: the wider the deviation, the more uncertain the estimate of the mean.

• The bigger the sample size, the smaller the confidence interval for a given variance. This is because
more information reduces uncertainty, making a narrower interval possible.


• The confidence interval is wider for smaller values of α, i.e. for higher confidence levels. A 99 per
cent confidence interval (α = 0.01) is wider than a 95 per cent interval (α = 0.05) because greater
certainty requires a wider range.
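
The three characteristics listed above can be verified numerically. The sketch below builds the interval x̄ ± z(α/2) σ/√n for a known σ; the function names and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """100(1 - alpha)% interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # cutoff point z(alpha/2)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def width(interval):
    return interval[1] - interval[0]


base = mean_confidence_interval(x_bar=50, sigma=10, n=100, alpha=0.05)
print("95% interval:", tuple(round(v, 2) for v in base))

# Larger sigma widens the interval, larger n narrows it, smaller alpha widens it.
print(width(mean_confidence_interval(50, 20, 100)) > width(base))
print(width(mean_confidence_interval(50, 10, 400)) < width(base))
print(width(mean_confidence_interval(50, 10, 100, alpha=0.01)) > width(base))
```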

B. Medium Answer Questions

Q3) Define probability distribution function of binomial distribution. Assume that the
probability of a boat reaching its destination safely is 0.9. Under the assumption of binomial
distribution, find the mean and standard deviation of boats reaching destination safely out of a
total 500 boats.
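Ans. The mathematical function describing the possible values of a random variable and their
associated probabilities is known as a probability distribution. Let us take a concrete example of a
probability distribution. Suppose two coins are tossed and X denotes the number of heads. Assuming
that the two coins are unbiased, we can write

X:      0     1     2
P(X):   1/4   1/2   1/4

For a binomial distribution with n independent trials, each with probability of success p and
probability of failure q = 1 – p, the probability of exactly x successes is

P(X = x) = nCx p^x q^(n – x),   x = 0, 1, 2, ..., n

The mean of this distribution is np and its standard deviation is √(npq).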

Probability of safe arrival, p = 0.9 and q = 1 – p

= 1 – 0.9 = 0.1

Mean number of boats arriving safely = m = np

m = 500 × 0.9

= 450

S.D. (σ) = √(npq)

= √(500 × 0.9 × 0.1)

= √45

= 6.71

Hence, the mean and standard deviation of the number of boats reaching their destination safely are
450 and 6.71 respectively.
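
The result can be reproduced with a few lines of code. The function names are hypothetical; the probability function is the standard binomial formula nCx p^x q^(n – x), and the two-coin case is included only as a sanity check.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(npq)."""
    return n * p, math.sqrt(n * p * (1 - p))


mean, sd = binomial_mean_sd(n=500, p=0.9)
print(f"mean = {mean:.0f}, S.D. = {sd:.2f}")

# Two unbiased coins: probabilities of 0, 1 and 2 heads.
print([binomial_pmf(x, 2, 0.5) for x in range(3)])
```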

Q4) Calculate index numbers using Paasche’s method and Fisher’s method from the following
data.

Ans.

Commodity   p1   q1   p0   q0   p0q0   p1q1   p0q1   p1q0
A            5   14    3    8     24     70     42     40
B            8   18    6   25    150    144    108    200
C            3   25    1   40     40     75     25    120
D           15   36   12   48    576    540    432    720
E            9   14    7   18    126    126     98    162
F            7   13    5   19     95     91     65    133
Total                           1011   1046    770   1375

(a) Paasche’s

P01 = (Σp1q1 / Σp0q1) × 100 = (1046 / 770) × 100 = 135.8

(b) Laspeyres’

P01 = (Σp1q0 / Σp0q0) × 100 = (1375 / 1011) × 100 = 136.0

Fisher = √(L × P) = √(136.0 × 135.8) = 135.9
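
The same computation can be sketched in code. The data are those given in the question; the function name and the tuple layout (p1, q1, p0, q0) are choices made for this illustration.

```python
import math

# (p1, q1, p0, q0) for each commodity, as given in the question.
data = {
    "A": (5, 14, 3, 8),
    "B": (8, 18, 6, 25),
    "C": (3, 25, 1, 40),
    "D": (15, 36, 12, 48),
    "E": (9, 14, 7, 18),
    "F": (7, 13, 5, 19),
}


def index_numbers(rows):
    """Return (Laspeyres, Paasche, Fisher) price index numbers."""
    sum_p0q0 = sum(p0 * q0 for p1, q1, p0, q0 in rows)
    sum_p1q1 = sum(p1 * q1 for p1, q1, p0, q0 in rows)
    sum_p0q1 = sum(p0 * q1 for p1, q1, p0, q0 in rows)
    sum_p1q0 = sum(p1 * q0 for p1, q1, p0, q0 in rows)
    laspeyres = 100 * sum_p1q0 / sum_p0q0
    paasche = 100 * sum_p1q1 / sum_p0q1
    fisher = math.sqrt(laspeyres * paasche)  # geometric mean of the two
    return laspeyres, paasche, fisher


laspeyres, paasche, fisher = index_numbers(data.values())
print(f"Laspeyres = {laspeyres:.1f}, Paasche = {paasche:.1f}, Fisher = {fisher:.1f}")
```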

Q5) a) Define correlation coefficient. What are its properties?

Ans. Correlation refers to the statistical technique used to measure the closeness of the
relationship between variables. The correlation coefficient is a measure of the covariation
between two series. It is denoted by r. The value of the correlation coefficient always lies between
–1 and +1, inclusive.


The coefficient of correlation measures the degree of relationship between two sets of figures. As the
reliability of estimates depends upon the closeness of relationship it is imperative that utmost care be
taken while interpreting the value of coefficient of correlation, otherwise fallacious conclusions can
be drawn.

Properties of the coefficient of correlation

The following are the important properties of the correlation coefficient, r:

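• The value of r lies between –1 and +1.
• r is a pure number, independent of the units in which the variables are measured.
• r is unaffected by a change of origin or of (positive) scale of the variables.
• r is symmetric, i.e. the correlation between X and Y is the same as that between Y and X.
• r is the geometric mean of the two regression coefficients.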

b) Fit a straight line (Y = a + bX) to the following data. Compare the estimated values of the
dependent variable with its actual values.

Ans.
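
One way to obtain the fit, assuming the ordinary least-squares method is intended, is sketched below. The slope is b = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)² and the intercept is a = Ȳ – bX̄; the function name is hypothetical and the data are those given in the question.

```python
xs = [5, 8, 10, 12, 13, 15, 17, 16]
ys = [8, 12, 14, 10, 13, 16, 14, 17]


def fit_line(xs, ys):
    """Least-squares estimates (a, b) for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.4f} X")

# Compare the estimated values of Y with the actual values.
estimates = [a + b * x for x in xs]
for x, y, est in zip(xs, ys, estimates):
    print(f"X = {x:2d}  actual Y = {y:2d}  estimated Y = {est:5.2f}  difference = {y - est:+.2f}")
```

Under this assumption the fitted line is approximately Y = 6.30 + 0.5583X, and the differences between actual and estimated values sum to zero, as they must for a least-squares line with an intercept.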

Q6) What is a life table? Explain its uses and limitations.

Ans. A life table (also called a mortality table or actuarial table) is a table which shows, for each age,
the probability that a person of that age will die before his or her next birthday.

With the help of life tables,

• the probability of surviving any particular year of age, and
• the remaining life expectancy for people at different ages

can be found.

In other words, life tables are a tabular display of life expectancy and the probability of dying at each
age or age group for a given population, according to the age-specific death rates prevailing at that time.

Uses

The life tables have significant applications in actuarial science, especially in the field of life
assurance. Life tables form the basis for determining the rates of premium necessary for various
amounts of life assurance. Life tables provide actuarial science with a sound foundation, converting
the insurance business from mere gambling on human lives into the ability to offer a well-calculated
safeguard in the event of death.

Limitations

Life table estimates have all the disadvantages of any statistical measure based on population censuses
and vital records. Data on ages and mortality registries may be incomplete or biased. Infant mortality
weighs heavily on life expectancy, which means that under-reporting of this indicator, a habitual fact
in many countries, can have an important effect on the results of the tables. The same can be said
about the procedure used in closing the final, open interval of the mortality table (e.g. 85 and more, 90
and more) and the information inaccuracies existing in these age intervals. Also, important differences
in specific age/sex groups with high mortality may be overlooked, since this would have little effect
on the overall life expectancy.


C. Short Answer Questions

Q7) Write short notes on the following:

(a) Bayes’ theorem of probability

Ans. Bayes’ theorem (also known as Bayes’ rule) is a useful tool for calculating conditional
probabilities. Bayes’ theorem can be stated as follows:

Let A1, A2,..., An be a set of mutually exclusive events that together form the sample space S. Let B
be any event from the same sample space, such that P(B) > 0. Then,

P(Ak / B) = P(Ak ∩ B) / [P(A1 ∩ B) + P(A2 ∩ B) + ... + P(An ∩ B)]

Invoking the fact that P(Ak ∩ B) = P(Ak) P(B / Ak) (by the multiplication rule for dependent events),
Bayes’ theorem can also be expressed as:

P(Ak / B) = P(Ak) P(B / Ak) / [P(A1) P(B / A1) + P(A2) P(B / A2) + ... + P(An) P(B / An)]

Applying Bayes’ Theorem

Part of the challenge in applying Bayes’ theorem involves recognising the types of problems that
warrant its use. One should consider Bayes’ theorem when the following conditions exist:

• The sample space is partitioned into a set of mutually exclusive events {A1, A2, …, An}.

• Within the sample space, there exists an event B, for which P(B) > 0.

• The analytical goal is to compute a conditional probability of the form P(Ak / B).

• At least one of the two sets of probabilities described below is known:

  - P(Ak ∩ B) for each Ak.
  - P(Ak) and P(B / Ak) for each Ak.
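
A small numerical sketch of the second form of the theorem, using P(Ak) and P(B / Ak). The three events, their prior probabilities and the conditional probabilities are hypothetical values chosen for illustration.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(Ak / B) for each k, given P(Ak) and P(B / Ak)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Ak) P(B / Ak)
    p_b = sum(joint)                                      # total probability P(B)
    return [j / p_b for j in joint]


# Assumed: three mutually exclusive events that partition the sample space.
priors = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3)
likelihoods = [0.01, 0.02, 0.03]  # P(B / A1), P(B / A2), P(B / A3)

posteriors = bayes_posteriors(priors, likelihoods)
for k, post in enumerate(posteriors, start=1):
    print(f"P(A{k} / B) = {post:.4f}")
```

The posterior probabilities always sum to 1, since the events Ak partition the sample space.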

(b) Age specific birth and death rates

Ans. Birth rate: Birth rate is defined as the number of births in a specific community or region in a
given period, preferably a year, per thousand persons. The formula is:

Crude Birth Rate (CBR) = (Annual number of births in a community or region / Annual mid-year population of the community or region) × 1,000

Example: The mid-year population and the number of births in a tribal community in Uttar
Pradesh in 2012 are 40,000 and 1,200 respectively. Find the crude birth rate.

Here, we have 2012 mid-year population = 40,000 and the 2012 number of births = 1,200.

CBR = (1,200 / 40,000) × 1,000

= 30 per 1,000 persons per annum

Death Rate: It is defined as the number of deaths per 1,000 people per year in a specific age group,
sex group, community or region.

Crude Death Rate (CDR) = (Number of deaths during the time period / Total population at the mid-point of the time period) × 1,000

Example: The mid-year female population and the number of female deaths registered in 2011 for a
town in Maharashtra are 25,000 and 245 respectively. Find the crude death rate. Here, we have
2011 mid-year female population = 25,000 and the number of deaths in 2011 = 245.

CDR = (245 / 25,000) × 1,000

= 9.8 per 1,000 persons per annum among females.
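
Both rates follow the same pattern, events divided by mid-year population and multiplied by 1,000. A minimal sketch with a hypothetical helper name, using the figures from the two examples above:

```python
def rate_per_thousand(events, mid_year_population):
    """Number of events per 1,000 persons of the mid-year population."""
    return events * 1000 / mid_year_population


cbr = rate_per_thousand(events=1200, mid_year_population=40000)
cdr = rate_per_thousand(events=245, mid_year_population=25000)
print(f"CBR = {cbr:.1f} per 1,000, CDR = {cdr:.1f} per 1,000")
```

Restricting both the events and the population to a single age group turns the same calculation into an age-specific rate.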

Q8) Differentiate between the following:


(a) Simple random sampling and Stratified random sampling

Ans. Simple random sampling: Simple random sampling is the foundation of probability
sampling. It is a special case of probability sampling in which every unit in the population has the
same chance of being selected. If you have to select n units out of N units, every possible selection of
n units must have the same probability.

Stratified Sampling: This method is useful when the population consists of a number of
heterogeneous subpopulations and the elements within a given subpopulation are relatively
homogeneous compared to the population as a whole. Thus, the population is divided into mutually
exclusive groups called strata that are relevant, appropriate and meaningful in the context of the study.
A simple random sample, called a subsample, is then drawn from each stratum, either in proportion
to its size or not. As the name implies, a proportional sampling procedure requires that
the number of elements drawn from each stratum be in the same proportion as in the population.
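
The difference between the two designs can be sketched as follows. The population of 1,000 numbered units, the two strata ("urban" and "rural"), the sample size of 50 and the function names are all hypothetical.

```python
import random


def simple_random_sample(units, n, rng):
    """Every possible selection of n units is equally likely."""
    return rng.sample(units, n)


def proportional_stratified_sample(strata, n, rng):
    """Draw a simple random subsample from each stratum, in proportion to its size."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounded allocation; in general the parts may not add up to n exactly.
        k = round(n * len(units) / total)
        sample[name] = rng.sample(units, k)
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
strata = {"urban": list(range(600)), "rural": list(range(600, 1000))}
population = strata["urban"] + strata["rural"]

srs = simple_random_sample(population, 50, rng)
stratified = proportional_stratified_sample(strata, 50, rng)
print({name: len(units) for name, units in stratified.items()})
```

A simple random sample of 50 may over- or under-represent either group by chance, whereas the proportional design fixes the split at 30 and 20.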

(b) Type I and Type II errors in hypothesis testing


Ans. When a statistical hypothesis is tested, there are four possible results:

• The hypothesis is true but our test rejects it.
• The hypothesis is false but our test accepts it.
• The hypothesis is true and our test accepts it.
• The hypothesis is false and our test rejects it.

Obviously, the first two possibilities lead to errors. If we reject a hypothesis when it should be
accepted (possibility No. 1), we say that a Type I error has been made. On the other hand, if we accept
a hypothesis when it should be rejected (possibility No. 2), we say that a Type II error has been made.
In either case, a wrong decision or error in judgement has occurred. The sizes of the Type I error and
Type II error are denoted by α and β respectively. The usual practice in testing of hypotheses is to fix
α, the size of the Type I error, and then try to obtain a criterion which minimises β, the size of the
Type II error.
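
The trade-off between α and β can be illustrated for one simple case: a right-tailed z-test of H0: µ = µ0 against a specific alternative µ = µ1 > µ0, with known σ. This setting, the function name and all numbers are assumptions made for the sketch.

```python
import math
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """beta for a right-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    se = sigma / math.sqrt(n)
    # H0 is rejected when the sample mean exceeds this critical value.
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = probability of accepting H0 although the true mean is mu1.
    return NormalDist(mu1, se).cdf(critical_mean)


print("alpha = 0.05, n = 100:", round(type_ii_error(50, 52, 10, 100), 3))
print("alpha = 0.01, n = 100:", round(type_ii_error(50, 52, 10, 100, alpha=0.01), 3))
print("alpha = 0.05, n = 400:", round(type_ii_error(50, 52, 10, 400), 3))
```

For a fixed sample size, lowering α raises β; increasing the sample size is what allows β to fall while α stays fixed.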
