
Chapter 1

Introduction to Statistics

In this chapter, we introduce the subject of Statistics and the basics of data collection.
The history of Statistics, its definition, importance, limitations and
functions will be discussed. In addition, the two types of Statistics and the different stages
of statistical investigation are major topics addressed in this
introductory chapter.
Nowadays most executives and other decision makers base their decisions on
research findings. Research in most areas of study requires data in order to generate
valuable information that facilitates the decision-making process. Data are the raw material of
research. Moreover, the quality of the collected data largely determines the
precision of the results obtained from a specific investigation. Therefore, it is
extremely important to know the basics of data collection.

1.2 History of Statistics

The word Statistics originated from the Italian word “Statista” or the German word
“Statistik”, each of which means a political state. The word was first used in the book entitled “Elements of Universal
Erudition”, where it designates the science that teaches us what is the political
arrangement of all the modern states of the known world.

It is believed that Statistics originated from two main sources, namely, government
records and mathematics. In all the countries of ancient culture, like Egypt, Judea and
Rome, there is evidence that some system of collecting Statistics existed. Since
statistical data were collected for governmental purposes such as taxation and evaluation
of military strength, Statistics was then described as the “Science of Kings” or “the science
of statecraft”. On the other hand, Statistics has been considered a branch of
mathematics because most statistical methods are based on the mathematical theory
of probability.

Statistics, just like any other discipline, has passed through different developmental stages.
A number of individuals have contributed a lot to this development. The
present body of statistical theory (method) is the result of the concepts and
theories contributed by numerous scholars at different points in time. It is, therefore,
appropriate to state the contributions of some of these scholars so as to
show the growth of the science of Statistics.

In the sixteenth century, Tycho Brahe (1546-1601) collected valuable information about
the movements of planets, and Johannes Kepler made an exhaustive study of these data and
discovered the three famous laws relating to the movement of planets. It was on the basis of
these laws that Sir Isaac Newton formulated his theory of gravitation.

In the seventeenth century, Captain John Graunt (1620-1674) studied the Statistics of births and
deaths, often called Vital Statistics, for the first time. In addition, James Dodson, Thomas
Simpson, Dr. Price and others computed mortality tables, which were the basis for the
origin of life insurance. It was in 1698 that the first life insurance institution was founded in
London. Adding to the knowledge of Vital Statistics, J.P. Sussmilch (1707-1767) stated
that, as a matter of natural law, the ratio of births to deaths remains more or less constant.
Moreover, Jacob Bernoulli (1654-1705) stated the law of large numbers for the first time.
The famous mathematician Abraham De Moivre (1667-1754) discovered the normal curve,
which forms an integral part of modern statistical theory. Laplace (1749-1827) and Gauss
(1777-1855) independently arrived at the same results as De Moivre.

In the eighteenth and early nineteenth centuries, L.A. Jacques Quetelet (1796-1874) put forward the fundamental
principle “The Constancy of Great Numbers”, which is the basis for sampling. Apart from
the great work of Quetelet, a number of mathematicians, including De Moivre, Euler, Lagrange,
Chrystal, Bayes, Todhunter, Gauss, Morqau, Lexis, Laplace and Charlier, added a great deal
to the development of the theory of Statistics. However, the work done by Pierre
Simon de Laplace (1749-1827) is recognized as one of the best ever done on the subject of
probability. In fact, the work of Laplace was based on that of Jacob Bernoulli and his
nephew Daniel Bernoulli (1700-1782).

In the nineteenth and twentieth centuries, much of the development in statistical techniques
took place. To mention a few, Sir Francis Galton (1822-1911) developed the concept of
regression and other related statistical methods in the field of biometry, Karl Pearson
(1857-1936) developed the Chi-square goodness of fit test, and Sir Ronald Fisher (1890-1962)
made invaluable contributions in the field of experimental design.
Many authors agree that Pearson, Fisher and Galton are the real giants in the development
of statistical methods.

In recent years, the domain of statistical methods has widened considerably, and today
there is almost no science which does not make use of statistical methods. In other words,
though the degree of association may differ, every science is directly or indirectly
related to Statistics.

1.3 Definition of Statistics

Although the term Statistics is defined in a number of ways, all the definitions converge to
two basic aspects. That is, Statistics may be defined as Statistical data (plural sense) or it
can also be defined as a method (singular sense). Each one of these definitions is treated
separately as follows.
1.3.1 Statistics defined as data (Plural sense)
According to this notion, Prof. Horace Secrist gives the following definition:
“Statistics refer to the aggregates of facts affected to a marked extent by multiplicity of
causes, numerically expressed, enumerated or estimated according to reasonable standards
of accuracy, collected in a systematic manner for a pre-determined purpose and placed in
relation to each other.”

This definition makes it clear that Statistics (as numeric data) should possess the following
characteristics:

i. Statistics should be aggregates of facts

Single and isolated figures are not Statistics for the simple reason that such figures are
unrelated and cannot be compared. To be Statistics, data must be in
aggregate (mass), and the individual elements within the aggregate should relate to a
common phenomenon so that they can be compared to one another. The statement ‘Haile is
22 years old’ is not a statistic although it is a numerical statement of fact.
For such numerical facts to be regarded as statistics, they have to be comparable,
have to be given as parts of a common phenomenon, and their relationship has to be clear.

ii. Statistics should be affected to a marked extent by multiplicity of causes

Since Statistics are most commonly used in the social sciences, it is natural that they are
affected by a large variety of factors at the same time. Put differently, Statistics are not
caused by a single factor (force); rather, they are outcomes of a number of
(multiple) factors (forces) operating together.
For example, the educational performance of a student is affected by the living conditions of
the student, by the language of instruction and her/his mastery of that language, the
school atmosphere, etc. It is, however, very difficult to study separately the effect of each of those
forces on the performance of the student.

iii. They should be expressed numerically

All Statistics are expressed in numbers. Nevertheless, the converse of this statement is not
in general true. That is, statements expressed in terms of numbers may not necessarily be
Statistics.
Qualitative statements such as ‘Ethiopia is showing a rapid economic growth’ or ‘The majority
of the rural population of Ethiopia is illiterate’ do not constitute statistics because such
statements are vague. On the other hand, statements such as ‘The economy of Ethiopia is
growing at a rate of 9% per year’ or ‘70% of the population of rural Ethiopia is illiterate’ are
statistical statements.

iv. They should be enumerated or estimated according to reasonable standards of accuracy

Information can be gathered either by counting and measuring or by estimation.
Estimates cannot be as accurate as actual counts and measurements. But even in
estimation, the information has to be accurate at least to a specified and reasonable degree.
In fact, the degree of accuracy desired largely depends on the nature of the object of
study. For example, in measuring the weight of gold and the weight of a person, the degree of
accuracy should not be the same. When measuring the weight of gold, even milligrams are
important. But when measuring the weight of a person, even hundreds of grams can be
ignored. However, it is important that reasonable standards of accuracy be attained;
otherwise, the numbers may be altogether misleading.

v. They should be collected in a systematic manner
If data are collected in a haphazard manner, the results obtained are likely to lead to
fallacious conclusions. Therefore, it is essential that Statistics be collected in a
systematic manner so that they conform to reasonable standards of accuracy.

vi. They should be collected for a predetermined purpose

Statistics collected without any predetermined purpose do not serve any useful purpose.
Therefore, the purpose of collecting Statistics should be defined clearly before they are
collected. That is, figures (Statistics) should be collected in view of some goal or target.
Moreover, the data should be collected in such a manner that they meet the predetermined
purpose.

vii. They should be placed in relation to each other (Comparable)

From a practical point of view, the data should be comparable for statistical analysis. They
may be compared with respect to some unit, generally time (period) or place.
For example, the data relating to the population of a country for different years, or the
population of different countries in some fixed year, constitute Statistics since they are
comparable. However, the data relating to the shoe size of an individual and his
intelligence quotient (I.Q.) do not constitute Statistics as they are not comparable. In order
to make valid comparisons the data should be homogeneous, i.e., they should relate to the
same phenomenon or subject.

In general, it can be said that all statistics are numerical facts, but not all numerical facts are
statistics. For numerical facts to be statistics, the above seven characteristics have to be satisfied.
Example 1.1: i) To a manager of an industry, statistics are perhaps daily reports of inventory
levels, absenteeism and production;
ii) For a college student, statistics may be the grades on all the quizzes in a
course in a given semester.
Each of the above examples shows the meaning of statistics when used in its plural sense.

1.3.2 Statistics defined as a method (Singular sense)

The second definition of Statistics refers to the science or the methods of Statistics. It is
in this second sense that we consider Statistics as a subject. In this regard,
Statistics may be defined as:
“Statistics is the science which deals with the methods of collecting, organizing, classifying,
presenting, computing (analyzing) and interpreting numerical data collected to throw some light
on any sphere of enquiry.”

“Statistics is the method of judging collective, natural or social phenomenon from the results
obtained on the analysis or enumeration or collection of estimates.”


“Statistics is the application of the scientific method in the analysis of numerical data for the
purpose of making rational decisions.”
Berenson and Levin
“Statistics is the collection, organization, presentation, analysis and interpretation of numerical
data.”
Croxton and Cowden

Summing up all the above definitions, one can define Statistics preferably as:
Statistics is the study of the principles and methods used in the collection, organization,
presentation, analysis and interpretation of numerical data in any sphere of enquiry.


As a subject (science), Statistics has its own terminology. Knowing these terms
is fundamental to understanding statistical methods and concepts. Some of the basic
terms that will be used throughout the course are defined below.

Data: In Statistics, all conclusions are based on facts, and the first step in any statistical
investigation is to collect a set of related observations from which conclusions may be drawn.
The related observations that form the set are known as data. The word data is the plural of
the Latin word “datum”, taken to mean fact.
Population: The complete collection of individuals, objects, or measurements that have a
characteristic in common or totality of related observations in a given study is described as a
population. The population that is being studied is also called the target population.

Example 1.2

i) Population of trees under specified climatic conditions.

ii) Population of animals fed a certain type of diet.
iii) Population of farms having a certain type of natural fertility.
iv) Population of households, etc.
A population can be finite (limited in size) or infinite (unrestricted).

Sample: A subgroup of the population that will be studied in detail is called a sample.

Example 1.3

In studying the average marks of students in Statistics course, if you take marks of 20 students
out of all 150 students who might have taken the course, the collected data will be a sample.

Parameters: are statistical measures obtained from population data. These measures may
include the mean, variance, standard deviation, etc. and are denoted by Greek letters such as μ, σ², σ, etc.

Sample Statistic: is a number computed from sample data. Sample statistics are denoted by
lower case Roman letters such as x̄ (the sample mean), s² (the sample variance), etc.
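
The distinction between a parameter and a sample statistic can be sketched in a few lines of Python. This is only an illustration: the marks are randomly generated, hypothetical values, and the sizes (150 students, a sample of 20) simply mirror Example 1.3. The names mu, sigma2, sigma, x_bar and s2 are chosen here to match the usual symbols.

```python
import random
import statistics

# Hypothetical population: marks of all 150 students (made-up values).
random.seed(1)
population = [random.randint(30, 85) for _ in range(150)]

# Parameters: computed from the whole population.
mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)  # population variance (divides by N)
sigma = statistics.pstdev(population)      # population standard deviation

# Sample statistics: computed from a sample of 20 students.
sample = random.sample(population, 20)
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)           # sample variance (divides by n - 1)

print(f"Parameters:        mu = {mu:.2f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")
print(f"Sample statistics: x_bar = {x_bar:.2f}, s^2 = {s2:.2f}")
```

The sample statistics will generally be close to, but not equal to, the corresponding parameters; a different sample would give different values.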

Variable: is a characteristic under study that assumes different values for different elements and
most of the time variables are denoted by the letters X, Y, Z, etc.

Example 1.4: Height, Weight, Age, Income, Expenditure, Grade, Intelligence, sex, color, etc.

Quantitative Variable: is a variable that can be expressed numerically such as height, weight, age,
income, expenditure, grade, family size, number of students in a class, etc.

Qualitative Variable: is a variable that cannot assume a numerical value but can be classified into
two or more nonnumeric categories such as the gender of a person, the language in which a book
is written, hair color, and so on.

Discrete Variable: is a variable whose values are countable such as family size, number of
students in a class, etc. Its values are obtained by counting.

Continuous Variable: is a variable which can, theoretically, assume any numerical value
between two given values. The values of continuous variables are obtained by
measuring, such as the time taken to complete a certain examination, the height of a person, the age of a
person, the test scores of a student and so on.

Elementary Unit: is a specific person, business, product, account, and so on with some
characteristic to be measured or categorized.

Observation or Measurement: The value of a variable for an element is called an
observation or measurement.

Survey: A survey or experiment is a device for obtaining the desired data.

Statistical Design: is a process that involves defining a decision problem and choosing an approach to
solve the problem. It is a guide that indicates how an investigation is going to be carried out.

Data Set: A data set is a collection of observations on one or more variables.

Sample Frame: A list of the entire population from which items can be selected to form a
sample is referred to as sample frame.


Statistical methods are classified into two groups or areas based on how data are used. These
are: Descriptive Statistics and Inferential Statistics.

Descriptive Statistics consists of the collection, organization, presentation and analysis of
numerical data. It is concerned with describing certain characteristics of a set of data (usually a
sample), that is, what it is shaped like, what value the observations tend to cluster (converge)
around, how much variation is present in the data, and so forth.

In short, Descriptive Statistics describes the nature or characteristics of data without drawing
conclusions or making generalizations.

Example 1.5: The following are some examples of Descriptive Statistics: the average age of athletes who participated in the London Marathon was 25 years; 80%
of the instructors in Adama University are males; the marks of 50 students in a Statistics course
are found to range from 30 to 85.
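
As a rough illustration of descriptive measures, the following Python sketch summarizes a small set of marks. The ten marks are hypothetical values chosen only so that they range from 30 to 85; they are not data from the text.

```python
import statistics

# Hypothetical marks of 10 students (illustrative values only).
marks = [30, 42, 55, 55, 61, 67, 70, 74, 80, 85]

mean = statistics.mean(marks)          # value the data tend to cluster around
median = statistics.median(marks)      # middle value of the ordered data
mode = statistics.mode(marks)          # most frequent value
value_range = max(marks) - min(marks)  # spread between the extremes
stdev = statistics.stdev(marks)        # sample standard deviation

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"range = {value_range}, standard deviation = {stdev:.2f}")
```

All of these numbers describe the ten marks themselves; nothing is claimed about any larger group of students.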

Inferential Statistics is concerned with the process of drawing conclusions (inferences) about
specific characteristics of a population based on information obtained from samples. It includes performing
hypothesis tests, determining relationships among variables, and making predictions.

Example 1.6: The result obtained from the analysis of the income of 100 randomly selected
citizens in Ethiopia suggests that the average per capita income in Ethiopia is 30
Birr; the average income of all families in Ethiopia can be estimated from figures obtained from
a few hundred families.
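
The idea of estimating a population average from a sample can be sketched as follows. This is a minimal illustration under stated assumptions: the population of incomes is simulated (the values 3000 and 800 are arbitrary), the sample is a simple random sample of 100, and the interval uses the large-sample normal approximation with z = 1.96, which is one common choice rather than the only one.

```python
import math
import random
import statistics

random.seed(42)
# Hypothetical population of 10,000 household incomes (simulated values).
population = [random.gauss(3000, 800) for _ in range(10_000)]

# Draw a simple random sample of 100 households.
sample = random.sample(population, 100)
n = len(sample)
x_bar = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation

# Approximate 95% confidence interval for the population mean,
# using the normal approximation (z = 1.96) for a large sample.
margin = 1.96 * s / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(f"Sample mean: {x_bar:.1f}")
print(f"Approximate 95% interval for the population mean: {ci[0]:.1f} to {ci[1]:.1f}")
print(f"Population mean (known here only because the data are simulated): "
      f"{statistics.mean(population):.1f}")
```

In a real investigation the population mean is unknown; the interval expresses the uncertainty attached to the conclusion drawn from the sample.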


There are five stages of statistical investigation of numerical data. These stages are
discussed below.
1. Collection of data
This is the process of obtaining measurements or counts and constitutes the first step in statistical
investigation. In general, information pertinent to the underlying investigation is collected. Valid conclusions
can only result from properly collected data; i.e., if data are faulty, the conclusions drawn can
never be reliable. Hence, utmost care must be exercised in collecting data because they form the
foundation of statistical analysis.
2. Organization of data
Collected data have to be organized in a suitable form so that one can have a general
understanding of the information gathered. A large mass of figures collected from
surveys frequently needs organization.
The first step in organizing a group of data is editing. The collected data must be edited very
carefully so that omissions, inconsistencies, irrelevant answers and wrong computations in the
returns from a survey may be corrected or adjusted.
After the data are edited, the next step is to classify them. The purpose of data classification is to
arrange them according to some common
characteristics possessed by the items constituting the data.
The last step in data organization is tabulation. The purpose of tabulation is to arrange the data in
columns and rows so that there is absolute clarity in the data presented.
3. Presentation
After the data have been collected and organized, they are ready for presentation. The main
purpose of data presentation is to facilitate statistical analysis. This can be done by arranging the
data using graphs and diagrams.

4. Analysis of data
After collection, organization and presentation, the next step is analysis. This is the
extraction of summarized and comprehensible numerical descriptions of the data; these
measures in turn give a far better understanding of the nature of the data.
The purpose of analyzing data is to dig out information useful for decision making.
Methods used in analyzing the presented data are numerous, ranging from simple observation of
the data to complicated, sophisticated and highly mathematical techniques. The most commonly
used methods of statistical analysis are measures of central tendency, measures of variation,
correlation, regression, estimation and hypothesis testing.

5. Interpretation of data
This is the last stage in statistical investigation. It is the task of drawing conclusions from the
analysis of the data and usually involves the formulation of predictions concerning a large
collection of objects from information available for a small collection of similar objects. That is, this
step usually involves making decisions about a large collection of objects (the population) based on
information gathered from a small collection of similar objects (the sample). The interpretation of
data is a difficult task and necessitates a high degree of skill and experience.
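
The five stages can be walked through on a toy data set. The sketch below is illustrative only: the raw marks, the validity rule (marks must lie between 0 and 100) and the class width of 20 are assumptions made for this example, and the helper name classify is hypothetical.

```python
import statistics
from collections import Counter

# 1. Collection: hypothetical raw marks as returned from a survey.
#    None is a missing answer; 130 is an impossible mark (maximum is 100).
raw = [45, 67, None, 82, 130, 58, 71, 39, 66, 90, 74, 52]

# 2. Organization: editing, classification and tabulation.
edited = [m for m in raw if m is not None and 0 <= m <= 100]

def classify(mark):
    """Assign a mark to a class interval of width 20, e.g. '60-79'."""
    lower = min(mark // 20 * 20, 80)          # a mark of 100 joins the top class
    upper = 100 if lower == 80 else lower + 19
    return f"{lower}-{upper}"

table = Counter(classify(m) for m in edited)  # frequency table

# 3. Presentation: a simple text bar chart of the frequency table.
for interval in sorted(table):
    print(f"{interval:>7} | {'*' * table[interval]}")

# 4. Analysis: summary measures of central tendency and variation.
mean = statistics.mean(edited)
stdev = statistics.stdev(edited)
print(f"n = {len(edited)}, mean = {mean:.1f}, standard deviation = {stdev:.1f}")

# 5. Interpretation: drawing conclusions from these results is a judgment
#    made by the analyst and is not something the code can supply.
```

Editing removes the missing and impossible values before anything else is computed, which is why the analysis is based on 10 rather than 12 observations.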


The study of statistics has become more popular than ever during the past three decades or so.
The increasing availability of computers and statistical software packages has enlarged the role
of statistics as a tool for empirical research. As a result, statistics is used for research in almost
all professions, from medicine to sports. Today college students in almost all disciplines are
required to take at least one statistics course.
The various tools of statistics are thus being used to solve problems in everyday life, in research, in
marketing, in planning, in production and quality control, and in other areas.


Even though statistics is growing in popularity and is being successfully employed by the
seekers of truth in numerous fields of learning, it still has limitations.
Limitations of statistics
The field of statistics, though widely used in all areas of human knowledge and widely applied in
a variety of disciplines such as business, economics and research, has its own limitations. Some
of these limitations are

1. It does not deal with individual values.

Statistics deals only with aggregates of values. For example, the age of a single student in a given
class in a given year is not statistical data. In contrast, the ages of all students within a given
class in a given year form an aggregate and hence can be considered as data. Similarly, the
semester GPAs of a single student over 4 semesters also form statistical data. In short, statistical
methods are suited only to those problems or situations where group characteristics are
to be studied.

2. Statistics deals only with quantitatively expressed items.

Another limitation of Statistics is that it deals only with those subjects of inquiry that are capable of
being quantitatively measured and numerically expressed. Accordingly, qualitative
characteristics such as health, poverty, honesty and intelligence are not directly suitable for statistical analysis.
Problems involving such qualitative variables are treated in Statistics indirectly. For
example, the variable health may be studied through the death rate, which is a quantitative variable.
These, however, are only indirect methods.

3. Statistical conclusions are not universally true.

Statistical results are true only in general and on average; statistical rules are not physical laws.
They are derived by taking a majority of cases and are not true for every individual. The
conclusions obtained statistically are therefore not universally true; they are true only under certain
conditions. This is because statistics as a science is less exact than the natural sciences.
Thus, statistical inferences are uncertain.

4. Statistics can be misused by ignorant or wrongly motivated persons.

The greatest limitation of statistics is that it is liable to be misused. The misuse of statistics may
arise for several reasons. It may result from having incomplete data or from a lack of
experience in using it. Statistical data should be handled by someone
who has a sound knowledge of statistics in order to arrive at a correct conclusion from the
gathered information.
Statistical interpretation requires a high degree of skill and understanding of the subject. In order
to get meaningful results, it is necessary that the data be properly and professionally collected
and critically interpreted. It requires intensive training to read and analyze statistics in their proper
context. Statistics may lead to fallacious conclusions in the hands of inexperienced persons.
Example 1.7: Among the 1995 E.C. graduates of Accounting at ASTU, more than 80 percent of the
females graduated with a GPA above 2.5. Therefore, females are better in Accounting than in any
other field.
Here the given information is not sufficient to support the stated conclusion because
1) The data are taken from 1995 E.C. only and do not include the performance of
females in the other departments.
2) It does not tell the female-to-male proportion; it may be that there were only two
female students in the Accounting department who graduated that year and both of them
graduated with a GPA above 2.50.

Example 1.8: The argument that drinking alcoholic beverages is bad for longevity because 95
percent of the persons who take alcoholic beverages die before the age of 80 years is statistically
defective, since we are not told what percentage of persons who do not take alcoholic beverages
die before reaching that age. So, statistical conclusions that are based on incomplete information
may lead us to fallacies.
Thus, if statistics is not properly used, it is capable of being misinterpreted and, as a result, there
may be distrust of statistics in the minds of the public. Therefore, it must be kept in mind that
statistics should be used only as a tool and not as an end in itself.
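
The gap in the argument of Example 1.8 can be made concrete with a small calculation. The counts below are entirely hypothetical, invented only to show why a comparison group is needed; they are not real mortality figures.

```python
# Hypothetical counts, invented purely to illustrate the reasoning.
drinkers = {"died_before_80": 950, "total": 1000}
non_drinkers = {"died_before_80": 940, "total": 1000}

def proportion(group):
    """Proportion of the group that died before the age of 80."""
    return group["died_before_80"] / group["total"]

p_drinkers = proportion(drinkers)
p_non_drinkers = proportion(non_drinkers)
difference = p_drinkers - p_non_drinkers

print(f"Drinkers dying before 80:     {p_drinkers:.0%}")
print(f"Non-drinkers dying before 80: {p_non_drinkers:.0%}")
print(f"Difference:                   {difference:.0%}")
```

With these made-up counts the 95 percent figure looks alarming on its own, yet the two groups differ by only one percentage point. Only the comparison between the two proportions says anything about the effect of drinking.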


In general, statistical methods and procedures are useful in almost everyone’s life. The decision-making
process of the managers of modern business and industry is governed by statistical
applications. Statistical methods are applied to virtually any kind of situation where
numerical information is gathered with the objective of making rational decisions in the face of
uncertainty. Some of the uses of statistics are:
i) To present facts in a definite form
That is, to present huge amounts of numerical information in a concise, unambiguous and easily
understandable way, which helps proper comprehension of what is stated.
ii) Statistics facilitates comparisons
Unless figures are compared with others of the same kind, they are often devoid of any
meaning. Statistics is a technique for making comparisons. Using statistical measures such as
averages, rates, ratios, coefficients and correlation, comparisons between two or more sets of
data can be made to understand their relationships and differences.
iii) Statistics gives guidance in the formulation of suitable policies
From past recorded statistics, predictions can be made about future situations, and
policy makers can accordingly formulate their plans; for instance, a production program can be formulated in the light of changes that are
anticipated to take place in the quantity of sales.
For example, demographic data about population size, its distribution by age and sex and other
socioeconomic characteristics, the rate of population growth, migration, death rate, etc. help
policy makers in determining future needs such as food, clothing, housing, education, health
facilities, recreational places, water, electricity, transportation systems, etc.
iv) Prediction
Plans and policies are usually formulated well in advance of the time of their implementation.
Knowledge of future trends is very helpful in framing suitable policies and plans. Statistical
methods provide helpful means of forecasting future events.
v) Statistical methods are very helpful in formulating and testing hypotheses and in developing
new theories.
vi) Statistics in the sciences
The ideas and methods of statistics have been useful to the sciences: in the conduct of research, in
the analysis of findings and also in the formulation of laws.

Measurement can be defined as the assignment of numbers to objects and events according to
logically acceptable rules. The number system is highly logical and offers a multiplicity of
possibilities for further logical manipulations.
A measurement scale should possess the following attributes to allow for these logical
manipulations.
Magnitude
It is the quantum or quantity in which the attribute exists in various instances of the phenomenon. It
allows us to tell whether one instance of the attribute is greater than, less than or equal to another
instance of the attribute.
Example 1.9: If X gets a score of 20 on an aggressiveness scale and Y a score of 25, we can say
that Y is more aggressive than X.
Equal intervals
It denotes that the magnitude of the attribute represented by a unit of measurement on the scale
is equal regardless of where on the scale the unit falls.
Example 1.10: The difference in height between 60 inches and 65 inches is equal to the
difference in height between 67 inches and 72 inches.
Absolute zero point
It is a value that indicates that nothing at all of the attribute being measured exists at that point.
For example, a zero weight indicates “no weight” at all.
Keeping in mind these three characteristics of measurement, scales of measurement can be
divided into four different types.
1. Nominal Scale (Classificatory Scale)
It refers to the simple classification of objects or items into discrete groups which do not bear
any magnitude relationships to one another.
In the nominal scale of measurement, numbers are used simply as labels for groups or classes.
Example 1.11: If our data set consists of blue, green, and red items, we may designate blue as 1,
green as 2, and red as 3. In this case, the numbers 1, 2 and 3 stand only for the category to which
a data point belongs.
“Nominal” stands for “name” of category. The nominal scale of measurement is used for
qualitative rather than quantitative data: color (blue, green, red); sex (male, female); marital status (married,
single, divorced, etc.); professional classification; geographic classification; and so on.

2. Ordinal Scale or Ranking Scale

An ordinal scale is first of all a nominal scale, but here the data elements may also be ordered according to their relative size or
quality. Most people would agree with the
order in which the categories are placed, so in an ordinal scale inequalities have a meaning. The
inequality signs ‘<’ or ‘>’ may assume any meaning like “stronger than”, “softer than”, “weaker
than”, etc.

An ordinal scale reflects only magnitude and does not possess the attribute of equal intervals or an
absolute zero point.
Example 1.12: Let us consider the set of all diploma, B.Sc./B.A. and M.Sc./M.A. graduates of a
certain university at the end of a certain academic year. The graduates could be classified
according to their level of education. It is possible to say “the M.Sc./M.A. graduate is better than
the B.Sc./B.A. graduate” or to say “the B.Sc./B.A. graduate is better than the diploma graduate”.
Therefore, we can have meaningful inequalities.

3. Interval Scale

The interval scale possesses two of the three important requirements of a good measurement scale,
i.e., magnitude and equal intervals, but lacks a real or absolute zero point.

An interval scale is one which provides equal intervals from an arbitrary origin. An interval
scale not only orders objects according to the amount of the attribute they represent, but also establishes
equal intervals between the units of measure. Equal differences in the numbers represent equal
differences in the attribute being measured.
Example 1.13: If we apply a meter scale to measure the heights of students, we may find their
heights to be 150cm and so on. These measurements are on a scale with equal intervals. The
difference in height between 150cm and 160cm is exactly equal to the difference in height between
170cm and 180cm.
Example 1.14: The Fahrenheit and centigrade thermometers are examples of interval scales.
On an interval scale, both the order and distance relationships among the numbers have meaning.
We may assert that the difference between 30 and 31 degrees centigrade is equal to the difference between 40 and 41 degrees
centigrade. We could not say, however, that 50°C is twice as hot as 25°C. This is because there is no
true zero point on an interval scale. Since zero is arbitrary, multiplication and division of
numbers are not appropriate.
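
The point about the arbitrary zero can be checked numerically by expressing the same temperatures on both thermometers. The short Python sketch below uses the standard centigrade-to-Fahrenheit conversion; the function name is chosen here only for illustration.

```python
import math

def celsius_to_fahrenheit(c):
    """Standard conversion between the two interval scales."""
    return c * 9 / 5 + 32

# Equal differences stay equal after a change of scale.
diff_low = celsius_to_fahrenheit(31) - celsius_to_fahrenheit(30)
diff_high = celsius_to_fahrenheit(41) - celsius_to_fahrenheit(40)
print("Differences equal on the Fahrenheit scale:", math.isclose(diff_low, diff_high))

# Ratios do not survive the change of scale.
ratio_c = 50 / 25
ratio_f = celsius_to_fahrenheit(50) / celsius_to_fahrenheit(25)
print(f"Ratio in centigrade: {ratio_c:.3f}")
print(f"Ratio in Fahrenheit: {ratio_f:.3f}")
```

The same two temperatures give a ratio of 2 on one scale and about 1.58 on the other, so the statement "twice as hot" depends on where the zero happens to be placed and has no meaning of its own.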

4. Ratio Scale

The scale of measurement which has all three attributes (magnitude, equal intervals and an
absolute zero point) is called a ratio scale. Addition, subtraction, multiplication and division of
the numbers are appropriate.
Example 1.15: - Physical measurements such as height, weight, etc.
- Number of students in various classes.
- Number of books possessed by students of a class, etc.
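
The four scales and the three attributes can be summarized in a small lookup structure. The Python sketch below is one possible way of writing the summary down; the class name Scale, the list SCALES and the function meaningful_operations are hypothetical names introduced for this illustration, and the mapping of attributes to operations is a simplified reading of the descriptions above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scale:
    name: str
    magnitude: bool        # can instances be ordered?
    equal_intervals: bool  # are differences comparable?
    absolute_zero: bool    # does zero mean "none of the attribute"?

SCALES = [
    Scale("nominal", False, False, False),
    Scale("ordinal", True, False, False),
    Scale("interval", True, True, False),
    Scale("ratio", True, True, True),
]

def meaningful_operations(scale):
    """List the comparisons and arithmetic that make sense on a scale."""
    ops = ["=, !="]              # category membership is always meaningful
    if scale.magnitude:
        ops.append("<, >")       # ordering
    if scale.equal_intervals:
        ops.append("+, -")       # differences
    if scale.absolute_zero:
        ops.append("*, /")       # ratios
    return ops

for scale in SCALES:
    print(f"{scale.name:<8} {', '.join(meaningful_operations(scale))}")
```

Each scale keeps everything permitted on the scale before it and adds one more kind of operation, which is why the ratio scale is the only one on which all four arithmetic operations are appropriate.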

