Chapter 1
Introduction to Statistics
1.1 Introduction
In this chapter, we are going to introduce the subject Statistics and the basics of data
collection.
All the notions related to the history of Statistics, its definition, importance, limitations and
functions will be discussed. In addition, the two types of Statistics and the different stages
of Statistical investigation are the major topics that are going to be addressed in the
introductory chapter.
Nowadays most executives and other decision makers make decisions based on
research findings. Research in most areas of study requires data in order to generate
valuable information that facilitates the decision-making process. Data are the raw materials of
research. Moreover, the quality of the collected data greatly affects the
precision of the results obtained from a specific investigation. Therefore, it is
extremely important to know the basics of data collection.
It is believed that Statistics originated from two main sources, namely, government
records and mathematics. In all the countries of ancient culture, like Egypt, Judea and
Rome, there is evidence that they had some system of collecting Statistics. Since
Statistical data were collected for governmental purposes such as taxation and evaluation
of military strength, Statistics was then described as the “Science of Kings” or “the science
of statecraft”. On the other hand, Statistics has been considered a branch of
mathematics because most Statistical methods are based on the mathematical theory
of probability.
Statistics, just like any other discipline, has passed through different developmental stages.
A number of individuals have contributed a lot to the development process. The
present body of Statistical theory (method) is the result of the concepts and
theories contributed by numerous scholars at different points in time. It is, therefore,
appropriate to state the contributions of some of these scholars so as to
show the growth of the science of Statistics.
In the sixteenth century, Tycho Brahe (1546-1601) collected valuable information about
the movements of planets, and Johannes Kepler made an exhaustive study of these data and
discovered the three famous laws relating to the movement of planets. It was on the basis of
these laws that Sir Isaac Newton formulated his theory of gravitation.
MESERET TADDESSE EJETA
In the seventeenth century, Captain John Graunt (1620-1674) studied statistics of births and
deaths, often called Vital Statistics, for the first time. In addition, James Dodson, Thomas
Simpson, Dr. Price and others computed mortality tables, which were the basis for the
origin of life insurance. It was in 1698 that the first life insurance institution was founded in
London. Adding to the knowledge of Vital Statistics, J.P. Sussmilch (1707-1767) stated
that, as a matter of natural law, the ratio of births to deaths remains more or less constant.
Moreover, Jacob Bernoulli (1654-1705) stated the law of large numbers for the first time.
The famous mathematician Abraham de Moivre (1667-1754) discovered the normal curve,
which forms an integral part of modern Statistical theory. Laplace (1749-1827) and Gauss
(1777-1855) independently arrived at the same results as de Moivre.
In the eighteenth and early nineteenth centuries, L.A. Jacques Quetelet (1796-1874) put forward the
fundamental principle “The Constancy of Great Numbers”, which is the basis for sampling. Apart from
the great work of Quetelet, a number of mathematicians, including De Moivre, Euler, Lagrange,
Chrystal, Bayes, Todhunter, Gauss, De Morgan, Lexis, Laplace and Charlier, added a great deal
to the development of the theory of Statistics. However, the work of Pierre-Simon
de Laplace (1749-1827) is recognized as one of the best ever done on the subject of
probability. In fact, the work of Laplace was based on that of Jacob Bernoulli and his
nephew Daniel Bernoulli (1700-1782).
In the nineteenth and twentieth centuries, much of the development in Statistical techniques
took place. To mention a few, Sir Francis Galton (1822-1911) developed the concept of
regression and other related Statistical Methods in the field of biometry, Karl Pearson
(1857-1936) developed the Chi-square goodness of fit test, and Sir Ronald Fisher (1890-1962)
made invaluable contributions in the field of experimental design.
Many authors agree that Pearson, Fisher and Galton are the real giants in the development
of Statistical Methods.
In recent years, the domain of Statistical Methods has considerably widened and today
there is almost no science which does not make use of Statistical Methods. In other words,
though the degree of association may differ, any science is directly or indirectly associated
or related to Statistics.
Although the term Statistics is defined in a number of ways, all the definitions converge to
two basic aspects. That is, Statistics may be defined as Statistical data (plural sense) or it
can also be defined as a method (singular sense). Each one of these definitions is treated
separately as follows.
1.3.1 Statistics defined as data (Plural sense)
According to this notion, Prof. Horace Secrist gives the following definition:
“Statistics refer to the aggregates of facts affected to a marked extent by multiplicity of
causes, numerically expressed, enumerated or estimated according to reasonable standards
of accuracy, collected in a systematic manner for a pre-determined purpose and placed in
relation to each other.”
This definition makes it clear that Statistics (as numeric data) should possess the following
characteristics.
v. They should be collected in a systematic manner
If data are collected in a haphazard manner, the results obtained are likely to lead to
fallacious conclusions. Therefore, it is essential that Statistics be collected in a
systematic manner so that they conform to reasonable standards of accuracy.
In general, it can be said that all statistics are numerical facts, but not all numerical facts are
statistics. In order for numerical facts to be statistics, the above seven characteristics have to be
satisfied.
Example 1.1:
i) To a manager of an industry, statistics are perhaps daily reports of inventory
levels, absenteeism and production;
ii) For a college student, statistics may be the grades on all the quizzes in a
course in a given semester.
Each of the above examples shows the meaning of statistics when used in its plural sense.
Seligman
“Statistics is the method of judging collective, natural or social phenomenon from the results
obtained on the analysis or enumeration or collection of estimates.” (King)
“Statistics is the application of the scientific method in the analysis of numerical data for the
purpose of making rational decisions.” (Berenson and Levin)
“Statistics is the collection, organization, presentation, analysis and interpretation of numerical
data.” (Croxton and Cowden)
Summing up all the above definitions, one can define Statistics preferably as:
Statistics is the study of the principles and methods used in the collection, organization,
presentation, analysis and interpretation of numerical data in any sphere of enquiry.
Data: In Statistics, all conclusions are based on facts, and the first step in any statistical
investigation is to collect a set of related observations from which conclusions may be drawn.
The related observations that form such a set are known as data. The word data is the plural of
the Latin word “datum”, meaning fact.
Population: The complete collection of individuals, objects, or measurements that have a
characteristic in common or totality of related observations in a given study is described as a
population. The population that is being studied is also called the target population.
Example 1.2
Sample: A sub group of the population that will be studied in detail is called a Sample.
Example 1.3
In studying the average marks of students in Statistics course, if you take marks of 20 students
out of all 150 students who might have taken the course, the collected data will be a sample.
Parameters: are statistical measures obtained from population data. These measures may
include the mean, variance, standard deviation, etc., and are denoted by μ, σ², σ, etc.,
respectively.
Sample Statistic: is a number computed from sample data. Sample Statistics are denoted by
lower case letters of the alphabet such as x̄ (the sample mean), s² (the sample variance), etc.
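The distinction between a parameter and a sample statistic can be illustrated with a short sketch. The marks below are randomly generated, hypothetical data (they are not taken from any real class), and the variable names are chosen only for illustration. The sketch assumes a population of 150 students and a simple random sample of 20, as in Example 1.3.

```python
import random
import statistics

# Hypothetical marks for a population of 150 students (made-up data).
random.seed(1)
population = [random.randint(30, 85) for _ in range(150)]

# Parameters: measures computed from the whole population.
mu = statistics.mean(population)           # population mean
sigma2 = statistics.pvariance(population)  # population variance (divides by N)

# Sample statistics: measures computed from a sample of 20 students.
sample = random.sample(population, 20)
x_bar = statistics.mean(sample)            # sample mean
s2 = statistics.variance(sample)           # sample variance (divides by n - 1)

print(f"Parameters:        mu = {mu:.2f}, sigma^2 = {sigma2:.2f}")
print(f"Sample statistics: x_bar = {x_bar:.2f}, s^2 = {s2:.2f}")
```

The sample statistics will generally be close to, but not equal to, the parameters; a different sample would give different statistics, while the parameters stay fixed.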
Variable: is a characteristic under study that assumes different values for different elements and
most of the time variables are denoted by the letters X, Y, Z, etc.
Example 1.4: Height, Weight, Age, Income, Expenditure, Grade, Intelligence, sex, color, etc.
Quantitative Variable: is a variable that can be expressed numerically, such as height, weight and age.
Qualitative Variable: is a variable that cannot assume a numerical value but can be classified into
two or more nonnumeric categories such as the gender of a person, the language in which a book
is written, hair color, and so on.
Discrete Variable: is a variable whose values are countable such as family size, number of
students in a class, etc. Its values are obtained by counting.
Continuous Variable: is a variable which can theoretically assume any numerical value
between two given values. The values of continuous variables are obtained by
measuring, for example the time taken to complete a certain examination, the height of a person, the age of a
person, the test scores of a student and so on.
Elementary Unit: is a specific person, business, product account, and so on with some
characteristic to be measured or categorized.
Statistical Design: is a process that involves a decision problem and the choice of an approach to
solve the problem. It is a guide that indicates how an investigation is going to be conducted.
Sample Frame: A list of the entire population from which items can be selected to form a
sample is referred to as sample frame.
In short, Descriptive Statistics describes the nature or characteristics of data without drawing
conclusions or making generalizations.
Example 1.5: The average age of the athletes who participated in the London Marathon was 25 years; 80%
of the instructors at Adama University are males; the marks of 50 students in a Statistics course
are found to range from 30 to 85. These are some examples of Descriptive Statistics.
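The kinds of descriptive summaries listed in this example (an average, a percentage and a range) can be computed as in the following sketch. The marks and the list of instructors are small hypothetical data sets invented for illustration.

```python
# Hypothetical marks of nine students (made-up data).
marks = [30, 42, 55, 61, 68, 70, 74, 79, 85]

# Hypothetical sex of ten instructors: "M" for male, "F" for female.
instructors = ["M"] * 8 + ["F"] * 2

# Descriptive measures: they only summarise the data at hand.
mean_mark = sum(marks) / len(marks)
lowest, highest = min(marks), max(marks)
percent_male = 100 * instructors.count("M") / len(instructors)

print(f"Average mark: {mean_mark:.1f}")
print(f"Marks range from {lowest} to {highest}")
print(f"{percent_male:.0f}% of the instructors are male")
```

Nothing here is generalised beyond the nine students or ten instructors observed, which is what makes the summaries descriptive rather than inferential.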
Inferential Statistics is concerned with the process of drawing conclusions (inferences) about
specific characteristics of a population based on information obtained from samples, performing
hypothesis testing, determining relationships among variables, and making predictions.
Example 1.6: The result obtained from the analysis of the income of 100 randomly selected
citizens in Ethiopia suggests that the average per capita income of a citizen in Ethiopia is 30
Birr; the average income of all families in Ethiopia can be estimated from figures obtained from
a few hundred families.
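A minimal sketch of this kind of inference is given below. The 100 incomes are simulated, hypothetical values, and the interval uses the common large-sample normal approximation (sample mean plus or minus 1.96 standard errors), which is an assumption of this sketch rather than a method stated in the text.

```python
import math
import random
import statistics

# Hypothetical incomes (in Birr) of 100 randomly selected citizens.
random.seed(7)
incomes = [random.gauss(30, 8) for _ in range(100)]

n = len(incomes)
x_bar = statistics.mean(incomes)   # sample mean: point estimate of the population mean
s = statistics.stdev(incomes)      # sample standard deviation
std_error = s / math.sqrt(n)       # standard error of the mean

# Approximate 95% confidence interval (assumes a large-sample normal approximation).
margin = 1.96 * std_error
lower, upper = x_bar - margin, x_bar + margin

print(f"Estimated population mean income: {x_bar:.2f} Birr")
print(f"Approximate 95% interval: {lower:.2f} to {upper:.2f} Birr")
```

The conclusion concerns all citizens even though only 100 were observed, so it carries uncertainty, which the interval tries to express.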
4. Analysis of Data
After collection, organization and presentation, the next step is analysis. This is the
extraction of summarized and comprehensible numerical descriptions of the data; these
measures in turn give a far better understanding of the nature of the data.
The purpose of analyzing data is to dig out information useful for decision making.
Methods used in analyzing the presented data are numerous, ranging from simple observation of
the data to complicated, sophisticated and highly mathematical techniques. The most commonly
used methods of statistical analysis are measures of central tendency, measures of variation,
correlation, regression, estimation and hypothesis testing.
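Three of the methods named above (a measure of central tendency, a measure of variation and correlation) can be sketched on a small data set. The paired values are hypothetical, and the Pearson correlation coefficient is computed directly from its definition so that the sketch does not depend on any particular library version.

```python
import math
import statistics

# Hypothetical paired observations (made-up data).
hours = [1, 2, 3, 4, 5]    # e.g. hours of study
scores = [2, 4, 5, 4, 5]   # e.g. quiz scores

# Measures of central tendency.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Measure of variation: sample standard deviation.
sd_score = statistics.stdev(scores)

# Pearson correlation coefficient, computed from its definition.
mean_hours = statistics.mean(hours)
s_xy = sum((x - mean_hours) * (y - mean_score) for x, y in zip(hours, scores))
s_xx = sum((x - mean_hours) ** 2 for x in hours)
s_yy = sum((y - mean_score) ** 2 for y in scores)
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"mean = {mean_score}, median = {median_score}, sd = {sd_score:.3f}")
print(f"correlation between hours and scores: r = {r:.3f}")
```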
5. Interpretation of data
This is the last stage in statistical investigation. It is the task of drawing conclusions from the
analysis of the data, and it usually involves making predictions or decisions about a large
collection of objects (the population) based on information gathered from a small collection of
similar objects (the sample). The interpretation of
data is a difficult task and necessitates a high degree of skill and experience.
Statistics deals only with aggregates of values. For example, the age of a single student in a given
class in a given year is not statistical data. In contrast, the ages of all students within a given
class in a given year form an aggregate and hence can be considered data. Similarly, the
semester GPAs of a single student over 4 semesters also form statistical data. In short, Statistical
methods are suited only to those problems or situations where group characteristics are
to be studied.
Another limitation of Statistics is that it deals with those subjects of inquiry that are capable of
being quantitatively measured and numerically expressed. Accordingly, such qualitative
characteristics as health, poverty, honesty and intelligence are not suitable for Statistical analysis.
However, problems involving such qualitative variables are treated in Statistics indirectly. For
example, the variable health may be studied through death rate, which is a quantitative variable.
However, these are only indirect methods.
Statistical results are true only in general and on average; statistical rules are not physical laws.
They are derived from a majority of cases and are not true for every individual. The
conclusions obtained statistically are not universally true. They are only true under certain
conditions. This is because statistics as a science is less exact than the natural sciences.
Thus, statistical inferences are uncertain.
The greatest limitation of statistics is that it is liable to be misused. The misuse of statistics may
arise for several reasons. It may result from incomplete data or from a lack of
experience in using it. Statistical data should be handled by someone
who has a sound knowledge of statistics in order to arrive at correct conclusions from the
gathered information.
Statistical interpretation requires a high degree of skill and understanding of the subject. In order
to get meaningful results, it is necessary that the data be properly and professionally collected
and critically interpreted. It requires intensive training to read and analyse statistics in their proper
context. Statistics may lead to fallacious conclusions in the hands of inexperienced persons.
Example 1.7: Among the 1995 E.C. Accounting graduates at ASTU, more than 80 percent of the
females graduated with a GPA above 2.5. Therefore, females are better in Accounting than in any
other field.
Here the given information is not sufficient to support the stated conclusion because:
1) The data are taken from 1995 E.C. only and do not include the performance of
females in the other departments.
2) It does not tell the female to male proportion, where the fact may be there were only two
female students in the Accounting department who graduated that year and all of them
Example 1.8: The argument that drinking alcoholic beverages is bad for longevity because 95
percent of the persons who take alcoholic beverages die before the age of 80 years is statistically
defective, since we are not told what percentage of persons who do not take alcoholic beverages
die before reaching that age. So, statistical conclusions that are based on incomplete information
may lead us to fallacies.
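The missing comparison can be made concrete with a small sketch. All counts below are hypothetical and chosen only to show why the 95 percent figure means little on its own: if nearly the same share of non-drinkers also die before 80, the figure says almost nothing about alcohol.

```python
# Hypothetical counts (made-up data for illustration only).
drinkers_total = 1000
drinkers_died_before_80 = 950
non_drinkers_total = 1000
non_drinkers_died_before_80 = 940

# The argument quotes only this rate.
rate_drinkers = drinkers_died_before_80 / drinkers_total

# This is the rate the argument leaves out.
rate_non_drinkers = non_drinkers_died_before_80 / non_drinkers_total

# Only the comparison of the two rates is informative.
difference = rate_drinkers - rate_non_drinkers

print(f"Drinkers dying before 80:     {rate_drinkers:.0%}")
print(f"Non-drinkers dying before 80: {rate_non_drinkers:.0%}")
print(f"Difference: {difference:.1%}")
```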
Thus, if statistics is not properly used, it is liable to be misinterpreted, and this may
create distrust of statistics in the minds of the public. Therefore, it must be kept in mind that
statistics should be used only as a tool and not as an end in itself.
1.9 LEVELS (SCALES) OF MEASUREMENT
Measurement can be defined as the assignment of numbers to objects and events according to
logically acceptable rules. The number system is highly logical and offers a multiplicity of
possibilities of further logical manipulations.
A measurement scale should possess the following attributes to allow for these logical
manipulations:
Magnitude
It is the quantity in which the attribute exists in various instances of the phenomenon. It
allows us to tell whether one instance of the attribute is greater than, less than or equal to another
instance of the attribute.
Example 1.9: If X gets a score of 20 on an aggressiveness scale and Y a score of 25, we can say
that Y is more aggressive than X.
Equal intervals
It denotes that the magnitude of the attribute represented by a unit of measurement on the scale
is equal regardless of where on the scale the unit falls.
Example 1.10: The difference in height between 60 inches and 65 inches is equal to the
difference in height between 67 inches and 72 inches.
Absolute zero point
It is a value indicating that nothing at all of the attribute being measured exists at that
point.
For example, a zero weight indicates “no weight” at all.
Keeping in mind these three characteristics of measurement, scales of measurement can be
divided into four different types.
1. Nominal Scale (Classificatory Scale)
It refers to the simple classification of objects or items into discrete groups which do not bear
any magnitude relationships to one another.
In the nominal scale of measurement, numbers are used simply as labels for groups or classes.
Example 1.11: If our data set consists of blue, green, and red items, we may designate blue as 1,
green as 2, and red as 3. In this case, the numbers 1, 2 and 3 stand only for the category to which
a data point belongs.
“Nominal” stands for “name” of category. The nominal scale of measurement is used for
qualitative rather than quantitative data: blue, green, red, male, female; marital status (married,
single, divorced, etc.); professional classification; geographic classification; and so on.
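The point that nominal codes are only labels can be checked with a short sketch. The list of items and the alternative coding below are hypothetical. Counting categories gives the same answer under any coding, whereas an "average code" changes when the labels are reassigned, so it carries no meaning.

```python
from collections import Counter

# Hypothetical data set of coloured items.
items = ["blue", "red", "green", "blue", "blue", "red"]

# Coding from Example 1.11 and an equally valid alternative coding.
codes = {"blue": 1, "green": 2, "red": 3}
alt_codes = {"blue": 3, "green": 1, "red": 2}

# Frequencies (and the mode) do not depend on the coding.
counts = Counter(items)
most_common_colour, frequency = counts.most_common(1)[0]

# The mean of the codes depends entirely on the arbitrary labels.
mean_code = sum(codes[c] for c in items) / len(items)
mean_alt_code = sum(alt_codes[c] for c in items) / len(items)

print(f"Most frequent category: {most_common_colour} ({frequency} items)")
print(f"Mean of codes: {mean_code:.2f} versus {mean_alt_code:.2f} after relabelling")
```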
2. Ordinal Scale
An ordinal scale reflects only magnitude and does not possess the attribute of equal intervals or an
absolute zero point.
Example 1.12: Let us consider the set of all diploma, B.Sc./B.A. and M.Sc./M.A. graduates of a
certain university at the end of a certain academic year. The graduates could be classified
according to their level of education. It is possible to say “the M.Sc./M.A. graduate is better than
the B.Sc./B.A. graduate” or to say “the B.Sc./B.A. graduate is better than the diploma graduate”.
Therefore, we can have meaningful inequalities.
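A minimal sketch of such inequalities is shown below. The rank numbers and the helper function `ranks_higher` are hypothetical names introduced only for illustration; the ranks record order alone, so differences between them should not be interpreted.

```python
# Levels of education listed from lowest to highest.
levels = ["Diploma", "B.Sc./B.A.", "M.Sc./M.A."]

# Assign each level its position in the ordering (0, 1, 2).
rank = {level: position for position, level in enumerate(levels)}


def ranks_higher(a, b):
    """Return True if level a is above level b in the ordering."""
    return rank[a] > rank[b]


# Hypothetical list of graduates, sorted by level of education.
graduates = ["B.Sc./B.A.", "Diploma", "M.Sc./M.A.", "Diploma"]
ordered = sorted(graduates, key=rank.get)

print(ranks_higher("M.Sc./M.A.", "B.Sc./B.A."))
print(ordered)
```

Ordering and comparison are meaningful here, but a statement such as "the gap between a diploma and a B.Sc. equals the gap between a B.Sc. and an M.Sc." is not supported by an ordinal scale.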
3. Interval Scale
The interval scale possesses two of the three important requirements of a good measurement scale,
i.e., magnitude and equal intervals, but lacks a real or absolute zero point.
An interval scale is one which provides equal intervals from an arbitrary origin. An interval
scale not only orders objects according to the amount of the attribute they represent, but also establishes
equal intervals between the units of measure. Equal differences in the numbers represent equal
differences in the attribute being measured.
Example 1.13: If we apply a meter scale to measure the heights of students, we may find their
heights to be 150cm, and so on. These measurements are on a scale with equal intervals. The
difference in height between 150cm and 160cm is exactly equal to the difference in height between
170cm and 180cm.
Example 1.14: The Fahrenheit and centigrade thermometers are examples of interval scales.
On an interval scale, both the order and distance relationships among the numbers have meaning.
We may assert that the difference between 30 and 31 degrees centigrade is equal to the difference
between 40 and 41 degrees centigrade. We could not say, however, that 50°C is twice as hot as 25°C.
This is because there is no true zero point on an interval scale. Since zero is arbitrary,
multiplication and division of numbers are not appropriate.
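The following sketch checks both claims numerically by converting the same temperatures from centigrade to Fahrenheit using the standard conversion F = C × 9/5 + 32. The function name `c_to_f` is chosen here for illustration.

```python
import math


def c_to_f(celsius):
    """Convert a temperature from degrees centigrade to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Ratios are not preserved when the (arbitrary) zero point changes.
ratio_c = 50 / 25                   # 2.0 on the centigrade scale
ratio_f = c_to_f(50) / c_to_f(25)   # 122 / 77 on the Fahrenheit scale

# Equal differences stay equal on either scale.
diff_low = c_to_f(31) - c_to_f(30)
diff_high = c_to_f(41) - c_to_f(40)

print(f"Ratio in centigrade: {ratio_c:.2f}, ratio in Fahrenheit: {ratio_f:.2f}")
print(f"Equal intervals preserved: {math.isclose(diff_low, diff_high)}")
```

Because the "twice as hot" ratio disappears as soon as the unit and origin change, it reflects the choice of scale and not the temperatures themselves.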
4. Ratio Scale
The scale of measurement which has all three attributes (magnitude, equal intervals and an
absolute zero point) is called a ratio scale. Addition, subtraction, multiplication and division of
the numbers are appropriate.
Example 1.15:
- Physical measurements such as height, weight, etc.
- Number of students in various classes.
- Number of books possessed by students of a class, etc.
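As a contrast with interval scales, the sketch below shows that a ratio of two heights is unchanged when the unit changes from centimetres to inches, because both units share the same absolute zero. The heights are hypothetical values and `cm_to_inches` is a name chosen for illustration.

```python
import math


def cm_to_inches(cm):
    """Convert a length from centimetres to inches (1 inch = 2.54 cm)."""
    return cm / 2.54


# Hypothetical heights of two objects, in centimetres.
height_a_cm, height_b_cm = 180, 90

# "A is twice as tall as B" holds in either unit on a ratio scale.
ratio_cm = height_a_cm / height_b_cm
ratio_inches = cm_to_inches(height_a_cm) / cm_to_inches(height_b_cm)

print(f"Ratio in cm: {ratio_cm:.2f}, ratio in inches: {ratio_inches:.2f}")
```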