Question 1- A statistical survey is a scientific process of collection and analysis of numerical
data. Explain the stages of statistical survey. Describe the various methods for collecting data in
a statistical survey.
Answer: A statistical survey is the scientific process of collecting and analysing numerical data.
Statistical surveys are used to collect information about units in a population, typically by asking
questions of individuals. Surveys of human populations are common in the government, health, social
science and marketing sectors.
A statistical survey involves two stages:
- Planning: The relevance and accuracy of the data obtained in a survey depend upon the care taken in
planning. A properly planned investigation can lead to the best results with the least cost and time.
Steps involved in planning stage:
Step 1: Nature of the problem to be investigated should be clearly defined in an unambiguous
manner.
Step 2: Objectives of investigation should be stated at the outset. Objectives could be to
o Obtain certain estimates
o Establish a theory
o Verify an existing statement
o Find relationships between characteristics

Step 3: The scope of investigation has to be made clear. The scope of investigation refers to
the area to be covered, identification of units to be studied, nature of characteristic to be
observed, accuracy of measurement, analytical methods, time, cost and other resources
required.
Step 4: Whether the data are to be collected from a primary or a secondary source should be
determined in advance.
Step 5: The organization of the investigation is the final step in the process. It encompasses
determining the number of investigators required, their training, the supervision
needed and the funds required.
- Execution: Controlled methods should be adopted at every stage of carrying out the investigation to
check the accuracy, coverage, methods of measurement, analysis and interpretation. The collected
data should be edited, classified, tabulated and presented in the form of diagrams and graphs. The data
should be carefully and systematically analyzed and interpreted.
Methods for collecting data in a statistical survey are:
- Primary data:
Primary data is data collected by the investigator for the purpose of a specific
inquiry or study. Such data is original in character and is generated by a survey conducted by
individuals, a research institution or any other organization.
Primary data is collected by whichever of the following methods is suitable:
o Direct personal observation
o Indirect oral interview
o Information through agencies
o Information through mailed questionnaires
o Information through a schedule filled in by investigators

- Secondary data:
Any information that is used for the current investigation but was originally collected and used
by some other agency or person in a separate investigation or survey is known as secondary data.
It is available in published or unpublished form.
In published form, secondary data is available in research papers, newspapers, magazines,
government publications, international publications and websites. Secondary data was collected for
a different purpose. Therefore, care should be exercised while using it.

Question 2- Analysis of daily wages of workers in two organizations A and B yielded the
following results:
                              Organization A   Organization B
Number of workers                   10               20
Average daily wages (Rs)            30               15
Variance                            25              100

Obtain the average daily wages and the standard
deviation of wages of all workers in the two organizations taken together. Which organization is more
equitable in regard to wages?
Solution
Given that:
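$N_1 = 10$, $N_2 = 20$, $\bar{X}_1 = 30$, $\bar{X}_2 = 15$
$\sigma_1^2 = 25$, $\sigma_2^2 = 100$

Combined mean:
$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} = \frac{10(30) + 20(15)}{10 + 20} = \frac{600}{30}$
i.e. $\bar{X}_{12}$ = Rs 20
$d_1 = \bar{X}_1 - \bar{X}_{12} = 30 - 20 = 10$
$d_2 = \bar{X}_2 - \bar{X}_{12} = 15 - 20 = -5$
$d_1^2 = 100$, $d_2^2 = 25$
Now,
$\sigma_{12}^2 (N_1 + N_2) = N_1 (\sigma_1^2 + d_1^2) + N_2 (\sigma_2^2 + d_2^2)$
$\sigma_{12}^2 (30) = 10 (25 + 100) + 20 (100 + 25)$
$\sigma_{12}^2 = 3750/30 = 125$
$\sigma_{12} = \sqrt{125} = 11.180$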

Now,
coefficient of variation of A (C.V.1) $= \frac{\sigma_1}{\bar{X}_1} \times 100$
= 5 / 30 × 100
= 50 / 3
= 16.67

coefficient of variation of B (C.V.2) $= \frac{\sigma_2}{\bar{X}_2} \times 100$
= 10 / 15 × 100
= 66.67
Organization A is more equitable in wages than organization B, since the coefficient of variation (C.V.)
of organization A is lower than that of organization B.
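
The arithmetic in this solution can be checked with a short script. This is only a sketch: the function names `combined_mean_sd` and `coefficient_of_variation` are made up for illustration, and the inputs are the figures given in the question.

```python
import math

def combined_mean_sd(n1, mean1, var1, n2, mean2, var2):
    """Return (combined mean, combined standard deviation) of two groups."""
    n = n1 + n2
    mean12 = (n1 * mean1 + n2 * mean2) / n
    # Deviations of each group mean from the combined mean
    d1 = mean1 - mean12
    d2 = mean2 - mean12
    var12 = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean12, math.sqrt(var12)

def coefficient_of_variation(mean, var):
    """Standard deviation expressed as a percentage of the mean."""
    return math.sqrt(var) / mean * 100

# Figures from the question: organization A, then organization B
mean12, sd12 = combined_mean_sd(10, 30, 25, 20, 15, 100)
cv_a = coefficient_of_variation(30, 25)
cv_b = coefficient_of_variation(15, 100)

print("combined mean:", mean12)
print("combined sd:", round(sd12, 3))
print("C.V. of A:", round(cv_a, 2), "C.V. of B:", round(cv_b, 2))
```

The organization with the smaller coefficient of variation has wages that vary less relative to its average, which is the sense of "more equitable" used in the answer.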
Question 3-a. State the addition and multiplication rules of probability giving an example of
each case.
Answer: Addition rule:
The addition rule of probability states that:
i) If 'A' and 'B' are any two events, then the probability of the occurrence of either 'A' or 'B' is given
by:
P(A U B) = P(A) + P(B) - P(AB)
For example: a single card is chosen at random from a standard deck of 52 cards. What is the probability of getting a king or a club?
P(king or club) = P(king) + P(club) - P(king of clubs) = 4/52 + 13/52 - 1/52 = 16/52 = 4/13
ii) If 'A' and 'B' are any two mutually exclusive events, then the probability of the occurrence of either
'A' or 'B' is given by:
P(A U B) = P(A) + P(B)
For example: a single 6-sided die is rolled. What is the probability of rolling a 2 or a 5?
P(2 U 5) = P(2) + P(5) = 1/6 + 1/6 = 2/6 = 1/3
iii) If 'A' and 'B' and 'C' are any three events then the probability of occurrence of either 'A' or 'B' or
'C' is given by:
P(A U B U C)= P(A) +P(B)+P(C)-P(AB)-P(BC)-P(AC)+P(ABC)
iv) If A1, A2, A3, ..., An are 'n' mutually exclusive and exhaustive events, then the probability of
occurrence of at least one of them is given by:
P(A1 U A2 U A3 U ... U An) = P(A1) + P(A2) + ... + P(An) = 1

Multiplication rule:
If 'A' and 'B' are two independent events, then the probability of occurrence of both 'A' and 'B' is given by:
P(AB) = P(A)P(B)
For example: a coin is tossed and a single 6-sided die is rolled. Find the probability of the coin landing
heads up and the die showing a 3.
P(head)=1/2
P(3)=1/6
P(head and 3)=1/2x1/6=1/12
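
Both worked examples can be verified by listing every equally likely outcome and counting. The sketch below assumes a standard 52-card deck and a fair coin and die; the helper `prob` is a hypothetical name used only for this illustration.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Probability of an event over a space of equally likely outcomes."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

# Addition rule: king or club from a standard 52-card deck
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

p_king = prob(lambda c: c[0] == "K", deck)
p_club = prob(lambda c: c[1] == "clubs", deck)
p_king_and_club = prob(lambda c: c[0] == "K" and c[1] == "clubs", deck)
p_king_or_club = prob(lambda c: c[0] == "K" or c[1] == "clubs", deck)

# Direct count agrees with P(A) + P(B) - P(AB)
print(p_king_or_club, p_king + p_club - p_king_and_club)

# Multiplication rule: head on a coin and 3 on a die (independent events)
coin_die = list(product(["head", "tail"], range(1, 7)))
p_head = prob(lambda o: o[0] == "head", coin_die)
p_three = prob(lambda o: o[1] == 3, coin_die)
p_head_and_three = prob(lambda o: o == ("head", 3), coin_die)

# Direct count agrees with P(A) * P(B)
print(p_head_and_three, p_head * p_three)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand-calculated fractions.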

Question 3- b) In a bolt factory machines A, B, C manufacture 25, 35 and 40 percent of the total
output. Of their total output 5, 4 and 2 percent are defective respectively. A bolt is drawn at
random and is found to be defective. What are the probabilities that it was manufactured by
machines A, B and C?
Answer:
Let us assume the following:
Let 'A1' be the event that the bolt was produced by machine A
Let 'A2' be the event that the bolt was produced by machine B
Let 'A3' be the event that the bolt was produced by machine C
Let 'B' be the event that the bolt is defective
We are given that:
P(A1)=0.25, P(A2)= 0.35 , P(A3)= 0.4
P(B/A1)= 0.05, P(B/A2)= 0.04, P(B/A3)= 0.02
The required probabilities are calculated using Bayes' theorem and are shown in the table below:

Event Ai   Prior probability   Conditional probability   Joint probability   Posterior probability
           P(Ai)               P(B/Ai)                   P(Ai B)             P(Ai/B)
A1         0.25                0.05                      0.0125              0.0125/0.0345 = 0.3623
A2         0.35                0.04                      0.0140              0.0140/0.0345 = 0.4058
A3         0.40                0.02                      0.0080              0.0080/0.0345 = 0.2319
Total      1.00                                          0.0345              1.0000

Therefore, the required probabilities are:
P(A1/B) = 0.3623
P(A2/B) = 0.4058
P(A3/B) = 0.2319
That is, the probability that the defective bolt was manufactured by machine A is 0.3623, by machine B is
0.4058 and by machine C is 0.2319.
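
The posterior probabilities can be recomputed directly from the given shares of output and defect rates. The following is a minimal sketch of Bayes' theorem for this problem; the dictionary names are chosen only for illustration.

```python
# Share of total output per machine (prior probabilities)
priors = {"A": 0.25, "B": 0.35, "C": 0.40}
# Probability that a bolt is defective, given the machine that made it
defect_rates = {"A": 0.05, "B": 0.04, "C": 0.02}

# Joint probability: bolt comes from machine m AND is defective
joint = {m: priors[m] * defect_rates[m] for m in priors}

# Total probability that a randomly drawn bolt is defective
p_defective = sum(joint.values())

# Bayes' theorem: P(machine | defective) = joint / P(defective)
posterior = {m: joint[m] / p_defective for m in joint}

for m in posterior:
    print(m, round(joint[m], 4), round(posterior[m], 4))
print("P(defective) =", round(p_defective, 4))
```

The posteriors must sum to 1, which is a quick check on any hand-built table of this kind.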

Question 4-a. What is a Chi-square test? Point out its applications. Under what conditions is this
test applicable?
Answer: The chi-square test is a statistical test used to see whether observed experimental data are a good
fit to the theoretically expected results. It is one of the most commonly used non-parametric tests in
statistical work. The value of $\chi^2$ is calculated as:

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

where $O_i$ = the observed frequency and
$E_i$ = the expected frequency.
In inferential statistics, the chi-square test can be applied to discrete distributions. In using the
chi-square test, we need no assumptions regarding the shape of the sampling distribution. The applications
of the chi-square test include testing:
o the significance of sample variances
o the goodness of fit of a theoretical distribution
o the independence of attributes in a contingency table, or whether the observed results are consistent with
the expected segregation in breeding experiments in genetics
The first of these is a parametric test and the other two are non-parametric tests.
The conditions for applying the chi-square test are as follows:
1. The frequencies used in the chi-square test must be absolute, not relative.
2. The total number of observations collected for the test must be large.
3. The observations that make up the sample must be independent of each
other.
4. As the $\chi^2$ test is based wholly on sample data, no assumption is made concerning the population
distribution.
5. The $\chi^2$ test is wholly dependent on the degrees of freedom. As the degrees of freedom increase, the
chi-square distribution curve becomes more symmetrical.
6. The expected frequency of any item or cell must not be less than 5. If it is, the frequencies of
adjacent items or cells should be pooled together so that the pooled expected frequency is at least 5.
7. The data should be expressed in original units for convenience of comparison, and the given
distribution should not be replaced by relative frequencies or proportions.
8. The test is used only for drawing inferences through tests of hypotheses, so it cannot be
used for estimating parameter values.
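
As an illustration of the statistic and of condition 6, the sketch below runs a goodness-of-fit calculation on hypothetical data: 60 rolls of a die tested against a uniform expectation. The observed counts are invented for the example, and the critical value 11.070 is the standard table value for 5 degrees of freedom at the 5% level.

```python
# Hypothetical observed counts for faces 1..6 over 60 rolls (not from the text)
observed = [8, 12, 9, 11, 10, 10]
total = sum(observed)

# Under the null hypothesis of a fair die, each face is expected equally often
expected = [total / len(observed)] * len(observed)

# Condition 6: every expected frequency should be at least 5
assert all(e >= 5 for e in expected), "pool adjacent cells before testing"

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for a goodness-of-fit test with no estimated parameters
df = len(observed) - 1

CRITICAL_5PCT_DF5 = 11.070  # table value for df = 5 at the 5% level
reject_null = chi_sq > CRITICAL_5PCT_DF5

print("chi-square:", round(chi_sq, 3), "df:", df, "reject H0:", reject_null)
```

Note that the calculation uses absolute counts, in line with conditions 1 and 7; feeding in proportions instead would change the value of the statistic.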

Question 4-b) Discuss the types of measurement scales with examples.
Answer: Measurement scales are of four types, as follows:
1. Nominal scale: Nominal scales are used for labeling variables, without any quantitative
value. Nominal scales could simply be called labels. Notice that all of the scales in the
examples below are mutually exclusive and none of them has any numeric significance.
Examples of nominal scales:
What is your gender?
o M- male
o F- female
What is your hair color?
o Black
o Brown
o Blonde
o Grey
o Other
2. Ordinal scale: With ordinal scales, it is the order of the values that is important and
significant, but the differences between the values are not really known. Ordinal scales
are typically measures of non-numeric concepts like satisfaction, happiness, discomfort etc.
Examples of ordinal scales:
How do you feel?
o Happy
o Unhappy
How satisfied are you with our service?
o Very satisfied
o Satisfied
o Neutral
o Unsatisfied
3. Interval scale: Interval scales are numeric scales in which we know not only the order, but also the
exact differences between the values. An example is the Celsius temperature scale. Time is another
good example of an interval scale, in which the increments are known, consistent and
measurable.

4. Ratio scale: Ratio scales are the ultimate nirvana when it comes to measurement scales
because they tell us about the order, they tell us the exact value between units, and they also
have an absolute zero, which allows for a wide range of both descriptive and inferential statistics
to be applied. Good examples of ratio variables include height and weight.
Question 5- Explain the components of time series.
Answer: A time series is a collection of well-defined data items obtained through repeated
measurement over time. For example, measuring the value of retail sales each month of the year
would comprise a time series. This is because sales revenue is well defined and consistently
measured at equally spaced intervals. Data collected irregularly or only once are not time series.
The components of time series are as follows:
1. Long term trend or secular trend:
This refers to the smooth or regular long term growth or decline of the series. This movement
can be characterized by a trend curve. If this curve is a straight line, then it is called a trend line. If the
variable increases over a long period of time, then it is called an upward trend. If the variable
decreases over a long period of time, then it is called a downward trend. If the variable moves upward
or downward along a straight line then the trend is called a linear trend, otherwise it is called a non-
linear trend.
2. Seasonal variation:
Variations in a time series that are periodic in nature and occur regularly over short periods of
time during a year are called seasonal variations. These variations are precise and can be forecast.
The following are examples of seasonal variations in a time series.
I. The prices of vegetables drop down after rainy season or in winter months and they go up
during summer, every year.
II. The prices of cooking oils reduce after the harvesting of oil seeds and go up after some
time.
3. Cyclic variation:
The long-term oscillations that represent a consistent rise and decline in the values of the
variable are called cyclic variations. Since these are long-term oscillations in the time series, the
period of oscillation is usually greater than one year. The oscillations take place about a trend curve or a
trend line. The period of one cycle is the time-distance between two successive peaks or two
successive troughs.
4. Random variation:
Random variations are also called irregular movements. Movements that usually occur over brief
periods of time, without any pattern, and that are unpredictable in nature are called irregular
movements. These movements do not have any regular period of occurrence. Examples are
the effects of national strikes, floods, earthquakes, etc. It is very difficult to study the behavior of
this component of a time series.
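
One common way to separate the long-term trend from the shorter-term movements is a moving average. The sketch below is an illustration only: the sales figures are hypothetical, and a 3-period moving average is just one of several possible smoothing choices.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values in the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical sales figures at equally spaced intervals (not from the text)
sales = [10, 14, 12, 16, 14, 18, 16, 20]

# Smoothed values approximate the trend component
trend = moving_average(sales, 3)

# Each 3-period average is centred on the middle observation of its window,
# so trend[i] lines up with sales[i + 1]
remainder = [sales[i + 1] - trend[i] for i in range(len(trend))]

print("trend:", trend)
print("actual minus trend:", remainder)
```

What is left after removing the trend is a mixture of the seasonal, cyclic and random components, which need further analysis to tell apart.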

Question 6- a. What is analysis of variance? What are the assumptions of the technique?
Answer: Analysis of variance is a method of splitting the total variation in the data into constituent parts
that measure the different sources of variation.
Analysis of variance is useful in such situations as comparing the mileage achieved by five different
brands of gasoline, testing which of four different training methods produces the fastest learning
record, or comparing the first-year earnings of the graduates of half a dozen different business schools.
The assumptions of the technique are as follows:
o The samples are simple random samples.
o The samples are independent of each other.
o The parent populations from which they are drawn are normally distributed.

The assumption that all the populations are normally distributed is hardly ever satisfied in
practical cases. Hence it can be considered a limitation.
Question 6b- Three samples below have been obtained from normal populations with equal
variances. Test the hypothesis at 5% level that the population means are equal.
A B C
8 7 12
10 5 9
7 10 13
14 9 12
11 9 14
[the table value of F at 5% level of significance for v1=2 and v2=12 is 3.88]
Solution,
Let H0: there is no significant difference between the means of the three populations
A B C
8 7 12
10 5 9
7 10 13
14 9 12
11 9 14
Column totals: A = 50, B = 40, C = 60

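T = sum of all the observations = 150
Correction factor $= \frac{T^2}{n} = \frac{150^2}{15} = 1500$
SST (total sum of squares) = sum of squares of all observations $- \frac{T^2}{n}$
$= (8^2 + 10^2 + \dots + 14^2) - 1500 = 1600 - 1500 = 100$
Sum of squares between the columns (samples):
SSC $= \left[ \frac{(\sum A)^2}{n_A} + \frac{(\sum B)^2}{n_B} + \frac{(\sum C)^2}{n_C} \right] - \frac{T^2}{n}$
$= \frac{50^2}{5} + \frac{40^2}{5} + \frac{60^2}{5} - 1500 = 1540 - 1500$
= 40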
Sum of the squares of the error within column (samples):
SSE = SST - SSC = 100 - 40 = 60
Variance between samples
MSC = SSC/(k-1) = 40/(3-1) = 40/2 = 20
Variance within the samples:
MSE = SSE/(n-k) = 60/(15-3)= 5
The degree of freedom = ( k - 1, n - k) = ( 2,12)
[ k is the number of columns and n is the total number of observations]
ANOVA Table
Source of variation   Sum of squares   df   Mean square   F-value
Between samples       SSC = 40          2   MSC = 20      Fcal = 20/5 = 4
Within samples        SSE = 60         12   MSE = 5
Total                 SST = 100        14
The F table value for degrees of freedom (2, 12) [v1 = 2, v2 = 12] at the 5% level of significance is 3.88.
Since the calculated F value (4) is greater than the table value (3.88), we reject the null hypothesis
and conclude that the population means are not all equal.
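
The one-way ANOVA calculation above can be reproduced step by step in a few lines. This sketch uses the three samples from the question and the table value of F quoted there; the variable names simply mirror the symbols in the solution.

```python
# Samples from the question
samples = {
    "A": [8, 10, 7, 14, 11],
    "B": [7, 5, 10, 9, 9],
    "C": [12, 9, 13, 12, 14],
}

all_obs = [x for xs in samples.values() for x in xs]
n = len(all_obs)   # total number of observations
k = len(samples)   # number of samples (columns)

T = sum(all_obs)
correction_factor = T ** 2 / n

# Total, between-sample and within-sample sums of squares
sst = sum(x ** 2 for x in all_obs) - correction_factor
ssc = sum(sum(xs) ** 2 / len(xs) for xs in samples.values()) - correction_factor
sse = sst - ssc

# Mean squares and the F ratio
msc = ssc / (k - 1)
mse = sse / (n - k)
f_cal = msc / mse

F_TABLE = 3.88  # given in the question for (2, 12) df at the 5% level
reject_h0 = f_cal > F_TABLE

print("SST:", sst, "SSC:", ssc, "SSE:", sse)
print("F calculated:", f_cal, "reject H0:", reject_h0)
```

Because the decision rests on comparing the calculated F with the table value, the calculated value of 4 only just exceeds 3.88, so the rejection is at the 5% level and not necessarily at stricter levels.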
