Professional Documents
Culture Documents
Unit 3
Unit 3
13 TE AL QUESTIONS
1 What is a statistical survey? Describe the steps to be followed while organising a
statistical survey.
2 What preliminaries should be accomplished before the data collection work starts?
Explain.
3 Differentiate between the primary and secondary data. Explain different methods of
collecting primary data and the sources of secondary data.
4 Why sample survey is preferred compared to census survey? Explain.
5 What is sampling? Explain various methods of sampling?
6 Write short notes on the following:
i) Characteristics of a good statistical unit.
ii) Significance of reasonable degree of accuracy in a statistical survey.
iii) Law of Statistical Regularity.
iv) Law of Inertia of Large Numbers.
v) Differentiate between Reasonable Accuracy, Absolute Accuracy and
Spurious Accuracy.
Note: These questions will help you to understand the unit better. Try to write
answers for them. But do not submit your answers to the univenity. These
are for y o u practice only.
0 Objectives
1 Introduction '
2 Accuracy
3 Approximation
3.3.1 Methods of Approximation
4 Eqors in Statistics
3.4.1 Errors of Approximation
3.4.2 Measurement of Errors of Approximation
3.4.3 Computation with Rounded Numben
3.4.4 Effect of Mathematical Operations on Emom
3.4.5 Biased and Unbiased Errors
3.4.6 Estimation of Biased and Unbiased Errors
;1 3.4.7
u:pg
;y
:;
Sampling and Non-sampling Errors
,I INTRODUCTION
with selection of
of accuracy. You
12 ACCURACY
1s you know, statistical data may be obtained by counting or by measuring or by
taking estimates. The data on cars produced may be obtained by counting the cars.
he data on nlilk powder producedmay be recprded by weighing the milk powder. But
hen government requires data on production of wheat, before the crop is harvested,
atistician can only estimate the total production. Counting, if done properly, results
1 exact numbers. But measurements and estimates on the bther hand are not exact.
or example, when a truck load of rpik powder is weighed on a weigh-bridge, a
ilogram more or less does not make a difference. Here, the weight is accurate upto a
ilogram. But if a pinch of powder is weighed on chemical balanm in the laboratory,
ien a milligram will tilt the balance. In this case, the weight is accurate upto a
figram, Thus,when articlesare measured, there is a limit to the accuracy depending
the mehuriqg instrument used,
Bnsic StatisticalConcepts here are various other factors which also effect the accuracy and lead to errors. You
will read about such factors later in this unit when we discuss about the sources of
errors. As we have discussed in the previous unit, it is very difficult to attain perfect
accuracy. Even in physical sciences like Physics, Chemistry, etc., it is very difficult to
achieve complete accuracy. In statistical measurement, as discussed in previous unit,
we are content with a reasonable degree of accuracy which is decided by keeping in view
its practical value, its use, cost of attaining it, nature and purpose of the survey, etc. In
many cases a high degree of accuracy is not even necessary. For examples, it may be
more meaningful to say that the population of a country is one million instead of saying
more accurately as 1,004,601. One should not, therefore, be particular about absolute
accuracy when it is not desirable.
Absolute accuracy may not give the desired clarity. For instance, you are comparing the
annual sale of polyester cloth and cotton cloth from a retail shop. I t is better to say that
they are in the ratio of 3:2 than to state the actual sale figure of Rs. 60,340'h the case
of polyester and Rs. 40,105 in the case of cotton cloths.
The extent of accuracy required will also depend upon the situation. A blacksmith
weighing iron may not worry about grams, but a goldsmith weighing gold will try to be
very accurate to a milligram.
The accuracy is a relative term. Measurements which are accurate for one purpose may
be inaccurate for another purpose. For example, in stating the population one may not
give the exact number. But when election results are stated, the exact number of votes
polled is very important as sometimes the victory margin may be even one vote. Thus,
the desired level of accuracy is guided by the purpose of enquiry.
Spurious Accuracy: When the level of accuracy is greater than its real or desired level,
it is called spurious accuracy. In statistics claim should not be made for such an accuracy
which does not exist. Note that spurious accuracy may be misleading also. In statistical
treatment of data, spurious accuracy often results in greater accuracy than is warranted
by data. Sometimes the find answer is represented with greater accuracy while the
intermediate calculations are done with less accuracy. This also leads to spurious
accuracy. A false appearance of extreme accuracy should be avoided.
3.3 APPROXIMATION
You must be aware that in several cases only approximate figures could be arrived at.
This is especially so in the case of measurement, as it is difficult to have accurate physical
measurement, although one can achieve the reasonable degree of precision. If
numerous digits are included in a figure, it may confuse. Through approximation we
can exclude the unnecessary digits and avoid any such confusion. Approximation
enables clear grasping and facilitates calculationsand comparisons. The extent to which
approximation should be done depends upon the degree of accuracy desired in the
data. The approximation is done by rounding off the digits.
Illustration 1
................................................................................... i . . . . ..............ma..m
-..-...
3 What do you understand by significant digits?
.......................................................................................................
.............................................
..............................................................
The word 'error' has a specific meaning in statistics. It means the difference between '
the true value and the estimated or approximated value of an item.-IiKorsare bound to
be there, as the estimates are often based on the sample observations, and themethods
involved also include approximation through rounding. Suppose you are interested t
estimate the percentage (bjl weight) of nitrogen id fertiliser. The samples may be tak
from different parts of the fertiliser mkture on different days. These samples are s
to different laboratories and analysed by different analysts using different method
There will be slight variatiqns in the composition of the mixture owing to day-to-d
changes in temperature, humidity and other factors.'Some of the variations are also
to differentsamples and results in sampling errors. Differences among the analysts ma
alsp gi* rise to some errors. Moreover, there miy also be errors of measurement.
will be termed as errors of observations. Thlis, in,experimentsand surveysthere
h;~ ..e kinds of errors, namely, (i) samprig errors, (ii) an4ytical errors, and
several
-
(iii)errors of observations and measurements. One must hear in mind that errors are not ~ c c u m yApproxlmstion
,
a d Errors
mistakes arising from the compilation of data. Inaccuracy due to arithmetical
miscalculationis not an error but simply a 'mistake'. Since statistics deals with estimated
and/or approximated values, the errors are inevitable. These errors cannot be
eliminated completely but there are ways to minimise them. However, mistakes can be
eliminated completely.
Sources of Errors
,There are three main sources of errors in statistics. They are discussed below:
1 Errors of origin: It is not possible to attain precision while measuring variables such
as height, weight, distance, etc. This is due to the limitations of the measuring
instruments. Therefore, there is ?1ways scope for difference between the
measurement and the actual state. Several times the selection of unsuitable statistical
units gives incorrect measurement. Another possibility is the informants giving
incorrect answers. Biased collection of data i.e., bias of the person collecting the
data, also causes errors. The errors so generated are called erron of origin. This type
of error tends to increase with the number of observations.
2 Errors of inadequacy: In any enquiry the sample should be representative of the
population. If the size of sample is very small and the sample is not correctly
representing the population, errors creep in. Such errors are called errors of
inadequacy.
3 Errors of manipulation: There may be several errors unconsciously committed by the
- investigator in measuring, counting and classifying the objects. Such errors together
with the errors due to approximation are termed as errors of manipulation. These
errors increase with an increase in the number of observations.
In any statistical investigation all these three types of eFofs prevail.
IUustration 2
Find the relative error and percentage error when 2,234.752 is rounded to the
1 nearest two digits after decimal
2 nearest,whole number
3 nearest hundred Accuracy. Approximation
and Errors
4 nearest thousand.
Solution:
If you study the above illustration carefully, you can notice the following aspects:
1 The maximum absolute error increases as the order of rounding increases, i.e.,
increasing number of digits are left out.
2 The relative error also increases as the order of rounding increases.
Thus, precision is reduced due to the higher order of rounding.
Effect of Addition
The absolute error of a sum is equal to the sum of absolute errors of its components.
For example, add 500 (to the nearest 10) and 400 (to the nearest 100). This statement
can be presented as below:
Note that the absolute error is f 55 (i.e., 5 + 50), the relative error is +0.061(+,55/900),
and the percentage error is f6.1% ( f 0.061 x 100.)
Basic Statistical Concepts Effect of Subtraction
The absolute error of difference equals the sum of errors of its components. For
example, from500(to thenearest 10) subtract 400 (to the nearest 100). Thedifference
500 - 400 = 100 will have the error as 5 + 50 = 55. This can be expressed as follows.
Let us explain it in detail. Maximum error will occur when the greater figure is at the
greatest and lower figure is at the lowest or.vice versa. Thus, the absolute error of
differencecan be calculated as follows:
Subtractionwill have
-
Fimres Error Absolute
Error
Maximum
Value
Minimum
Value
The absolute error is +55 (i.e., 5+50), the relative error is kO.55 ( f 55/100), and the
percentage error is +55% (k0.55 X 100). If you compare the errors in addition and
subtraction, you \;rillnotice that the relative error in subtraction is much higher than in
addition (because the base is smaller) while the absolute errors are equal. You may
note that in both addition and subtraction, absolute error is the sum total of absolute
errors of the components.
Effect of Multiplication
The relative error of a product is approximately equal to the sum of the relati.ve error(
of its components. Multiply 500 (to the nearest 10) by 40 (to the nearest unit). Now
absolute error in 500is +5 and the relative error is +I%. Absolute error in40 is k0.5, /
and the relative errors is f1.25%. The multiplication of 500 and 40 is 2,000. This can!
I
be presented as follows: 1
Here relative error +2.25% is the sumof +1% and f 1.25%. Let us explain it further. i
The maximum value of the product will be: I
Normally, when the errors are small the product of the two errors i.e., the term
(5 x 0.5) in (a) and (b), is ignored as it is small. So the absolute error in 500 X 40 i.e.'
+
2,000 is (5 X 40) (0.5 x 500) which is equal to 450. This means relative esor is 450.
2,000 = 0.0225 and the percentage error is 2.25%.
Effect of Division I
The relative error of a quotient is approximately equal to the sum of the relative errors
1
of its components. To explain this, let us take the example discussed under
multiplication and divide it. We have 500 / 40 = 12.5. Taking into account the relative
errors, we will have as per statement:
/
This means relative error in the quotient 2.25% is the sum of two relative errors 1% an$
I
1.25%. To check it, we find out the smallest value and the largest value of the divisioi
and see how much it is less or more than the division of 500 by 40 i.e. 12.5%.
I[
The smallest value of the division will be obtained when the smallest value of the I
numerator I
(i.e. 500 - 1%) is divided by the largest value of the denominator (i.e., 40 + l.s%)#
I
Now500-1% =500-5=495,and40+ 1.25% =40-1- 0.5=40.5
i
So we have the smallest value of the division as 495 s 40.5 = 12.22. The difference Accuracy, Appmxlmatic '
~d Error6
between 12.50 and 12.22 is 0.28, which is the absolute error. The relative error is
0.28 + 12.5 X 100 = 2.24% or approximately 1% + 1.25% which is the sum of the
relative errors in two numbers.
Similarly, we can find out the largest value of the division and see how much it is more
than the value 500 s 40 i.e., 12.5. This will be (500 + 5) i (40 - 0.5) - 12.5 = 505 a
39.5 -39.5 - 12.5 = 0.28. This is also the same as the earlier difference. So the relative
error of a quotient is approximately equal to the sum of the relative errors of its
components.
............................................................................................................
ii) A - B ..................;.............................................
, . i... ......................
iii) AX B .....................,.,..........,.............,..,.,..........
..........................
iv) A 4 B .........................................................................................
3.4.5 Biased and Unbiased Errors
As you know we cannot avoid errors in statistics. Although we accept the inevitability
of errors, it is important to know if such errors are biased or unbiased. Let us now
discuss about these two types of errors. I
I
Unbiased Errors
When the errors tend to cancel each other, they are called unbiased or compensating
errors. For example, let us find out the total error in rounding of six numbers, 21, 22,
24,26,27 and 28 to nearest tens. The first three figures i.e. 21,22, and 24 would be
approximated to 20 each with a total error of +7. The remaining three figures i.e. 26,
27 and 28, would be approximated to 30 each with a total error of-9. The total error in
the sum of all these six figures is 7 - 9 = -2 only. Thus, when errors are unbiased, in
some cases the approximated value is less than the true value and in other cases the
approximated value is more than the true value. Therefore, errors are positive as well
as negative and as a result they nullify each other. The net error is negligible. The larger
the number of items under review, the smaller will be the unbiased errors. As the
number of observations increase, the unbiased errors have a tendency to decrease.
Thus, to minimise the unbiased error one of the methods used is to increase the number
of observations. Study lllustration 3 carefully.
Illustration 3
Similarly, we can also estimate the relative error with the following formula:
Relative Error = AAE x fis Approximated Total
= 250 fl+ 51,000
= 0.0098 or 0.98%
Sampling Errors
The errors caused by drawing inference about the population on the basis af samples
are termed as sampling errors. The sampling errors result from the bias in the selection
of sample units. These errors occur because the study is based on a portion of the
population. If the whole population is taken, sampling error can be eliminated. If two
or more sample units are taken from a population by random sampling method, their
results need not be identical and the results of both of them may be different from the
result of the population. This is due to the fact that the selected two sample items will
not be identical. Thus, sampling error means precisely the difference between the
sample result and that of the population when both the results are obtained by using the
same procedure or method of calculation. The exact amount of sampling error will
differ from sample to sample. The sampling errors are inevitable even if utmost care is
taken in selecting the sample. However, it is possible to minimise the sampling erfors
by designing the survey appropriately.
Sampling errors are of two types: (i) biased sampling errors, and (ii) unbiased sampling
errors. Let us now discuss about them in detail.
1 Biased Sampling Errors: A bias may be said to exist when the values of the statistics
obtained from the survey have a tendency to deviate only in one direction. Therefore,
this type of error does not cancel out. These errors arise due to bias in the selection
ic tati is tical Concepts of sample units, faulty collection of data, bias in analysis, etc. For example, possibility
of biased sampling errors is more when the sample units are selected through
deliberate sampling method instead of random sampling method.
Further, in case of difficulties in collecting inkmation from some of the sampling
units included in the random selection, the investigator might substitute them by
some other units of the population. This also leads to bias if the substitute units are
not selected randomly. Sometimes the respondents do not furnish all the information
and if the investigator himself supplies the remaining information, this would also
lead to bias. In some cases, the informationsupplied by the respondent itself may be
biased, specially when the informant wants to conceal some facts from the
investigator. Any consistent error in measurement will give rise to bias. Bias can also
creep in as a result of improper data collection instruments and the incompetence of
the investigator. Limitations of collection procedure, coding and methods of analysis
also give rise to bias. This kind of errors tend to grow as the number of observations
increase. Biased sampling errors are cumulative in nature.
2 Unbiased Sampling Errors: These errors arise due to chance differences between the
units ofpopulation included in the sample and those not included. Errors which arise
just on account of chance are called unbiased samplingerrors. They are not the result
. of any bias. These errors do not accumulate with an increase in the number of
observations. On the other hand these errors have a tendency to get neutralised with
the increase in the number of observatibns. Therefore, these errors are also called
compensating errors or non-cumulativeerrors.
Thus, the total sampling errors are made of biased as well as unbiased errors. The main
objective of the statistical method in any survey is to devise sampling schemes so that
biased errors are eliminated as far as possible and the unbiased errors are reduced to;
the pinimum.
Non-sampling Errors ,
These non-sampling errors can occur in any survey, whether it be a complete ,
enumeration or sampling. Non-sampling errors include biases as well as mistakes.
These are not chance1errors.
Most of the factors causing bias in complete enumeration are similar to the one
described above under sampling errors. They also include careless definition of
population, a vague conception regarding the information sought, inefficient method
of interview and so on. Mistakes arise as a result ofimproper coding, computations and
processing. More specifially, non-sampling errors may arise because of one or more of
the following reasons:
i) Improper and ambiguous data specifications which are not consistent with the
census or survey objectives.
ii) Inappropriate sampling methods, incomplete quextionnaire and incorrect way of
interviewing.
iii) Personal bias of the investigators or informants.
iv) Lack of trained and qualified investigators.
v) Errors in compilation and tabulation.
This list is not exhaustive, but it indicates some of the main possible reasons.
b
The sum of sampling errors and non-sampling errors is the total errors. In any survey,
the objective is to minimise this total error. Non-sampling errors can be easily
controlled by defining population precisely, constructing the questionnaire carefully
and pre-testing it, training the investigators, proper checking and monitoring of every
step carefully. But this is possible when the number of items is small. Otherwise, it will
be very time consuming and costly. But by keeping the sample size small, the sampling
errors increase. Hence, while planning a survey you have to be very careful in allocatini
your limited resources viz., money, time and human resources. The allocation should
be done in such that the sampling and non-sampling errors are minimised and thereby
maximum level of accuracy is achieved.
............................................................................................................
What are the causes of non-sampling errors?
Exercises
1 Using the rules for rounding to the nearest digit, round the following numbers first
to four significant digits, then to three significant digits and finally to two significant
digits.
(i) 5.6994 (ii) 47.251 (iii) 3.0009 (iv) 0.0064251 (v) 5.34651
2 The population of a town has been estimated at 49,000 and actual population is
50,000. Calculate (i) absolute error, (ii) relative error, and (iii) percentage error
Answer: i) 1,000 ii) 0.02 iii) 2%
3 The distances of two petrol pumps from the city limit were measured as225 0.5 kms+
+
and 175 0.3 kms. Find out the absolute error (AE) ,and relative error (RE) in (I)
the total distance, and in (2) the difference of two distances.
+ +
Answer: (1) A E 0.8, RE 0.002; (2) f0.8, R E 0.0166 +
4 There are two measures: A = 375 and B = 25. Each of these two measurements are
subject to an error of f 10%. Determine the range of absolute error and relative
error under the following cases:
i) The sum (A+B)
ii) The product (A B)
iii) The difference (A-B)
iv) The quality A/B
+ +
Answer: i) A E 40 R E +- 10; ii) AE rt 1969 R E 21; iii) AE rt 40 RE +- 11.4;
+
iv) AE +- 3.3 RE 22.2
5 The cost of 8 kg of sugar is Rs. 13. If the cost of 1 kgis given as Rs. 1.60 find absolute
and percentage errors.
Answer: AE 0.025, PE 1.5%.
6 Find the absolute error and relative error in computing the average speed if total
distance was rounded to 1,440 km. and the time was rounded to 70.5 hours.
Answer: AE 0.022, RE 0.0011
7 The sales made by the five regionaviices of a company are shown below:
Regions Units
North 13,434 ,
South 54,682
East , 36,482
West 17,804
Central 34,384 I
I
Approximate the above figures to the nearest thousand. Estimate the relative error
of the total sales assuming that the actual sales values are not known. How the
estimated relative error will change if the sale figures are approximated to the next
thousand above the actual figure? Compare the estimated relative error with actual
relative error. L
Answer: 0.004.
Note: These questions and exercises will help you to understand the unit better.
Try to write answers for them, But do not send your answers to the
university. These are for your practice only.