CHAPTER ONE
INTRODUCTION
Learning objectives:
After completing this chapter, the student will be able to:
1. Define Statistics and Biostatistics
2. Enumerate the importance and limitations of statistics
3. Define and identify the different types of data and understand why we need to
classify variables
1.1 Origin and Growth of Statistics
The origin of modern statistics can be traced back to the 17th and 18th centuries when
mathematicians were mainly interested in the development of the theory of probability as
applied to the theory of chance. In the modern world of computers and information
technology, the importance of statistics is very well recognized by all the disciplines.
Statistics originated as a science of statehood and found applications slowly and
steadily in Agriculture, Economics, Commerce, Biology, Medicine, Industry, Planning,
Education and so on.
Statistical thinking has nowadays become essential in many fields of study. Its
usefulness has spread to such diverse fields as agriculture, business, accounting,
marketing, economics, management, medicine, political science, psychology, sociology,
engineering, journalism, meteorology, tourism, etc. In biomedical research, meaningful
conclusions can only be drawn based on data collected from a valid scientific design using
appropriate statistical methods. Therefore, the selection of an appropriate study design is
important to provide an unbiased and scientific evaluation of the research questions. Each
design is based on a certain rationale and is applicable in certain experimental situations.
Biostatistics is the segment of statistics that deals with data arising from biological
processes or medical experiments. Thus biostatistics is the application of statistical
techniques in a health related area (application of statistical methods on biological,
medical and public health data).
Why biostatistics?
Introduction To Statistics……………………………………………………………. lecture Note
Because some statistical methods are more heavily used in health applications than
elsewhere (e.g., survival analysis and longitudinal data analysis).
The word statistics, on the other hand, has two meanings. In the more common usage,
statistics (plural sense) refers to numerical information (aggregates of facts). Examples
include statistics of births, disease cases, imports, exports, etc. In these examples,
statistics are numbers or facts. The subject of statistics (singular sense) has a much
broader meaning than just collecting and publishing numerical information. Statistics
in this sense may be defined as the science of collecting, organizing,
presenting, analyzing and interpreting data to assist in making more effective
decisions. This definition points out five stages in any statistical investigation.
Definitions of Statistics
The above definitions of statistics can be summarized by the following statements.
A) Statistics as Numerical Data (Plural Sense): in this sense, statistics are defined as
aggregates of numerically expressed facts (figures) collected in a systematic manner for a
pre-determined purpose.
Classification of Statistics
Statistics can be classified into two broad classes: descriptive statistics and inferential
statistics.
1. Descriptive statistics:
This part of statistics deals only with describing some characteristics of the data
collected without going beyond the data. In other words, it deals only with
describing the sample data without going any further, that is, without attempting to
infer (conclude) anything about the population.
Descriptive statistics deals with the collection of data, its presentation in various
forms, such as tables, graphs and diagrams, and finding averages and other
measures which describe the data.
Descriptive statistics refers only to the actual data, that is, the data at hand.
Descriptive statistics is basically the kind of statistics used to describe the
features of the data gathered by the researcher.
Examples:
Classification of students in college of computing and informatics according to
their Department.
The number of female students in this class.
2. Inferential Statistics:
According to the definition of statistics, we have the following five stages of a statistical
investigation.
1. Collection of data: This is the first stage of a statistical investigation. The data should be
collected with a specific and well defined purpose so that the conclusions drawn are
not misleading. There are two methods of data collection, primary and secondary:
the primary method of data collection refers to obtaining original, first-hand data,
and the secondary method of data collection involves obtaining data from other sources.
2. Organization of data: This is a methodology for classifying and describing the
properties of data in a summary form. Editing, coding and classification are the three
steps in the organization of data.
3. Presentation of data: In this stage the collected and organized data are presented
in some systematic order to facilitate statistical analysis. The organized data are
presented with the help of tables, diagrams and graphs.
4. Analysis of data: Analysis of data involves extracting relevant information from
the collected data using mathematical and statistical tools. In other words, it
involves extracting relevant information from the data (like mean, median, mode,
range, variance…), mainly through the use of elementary mathematical operations.
5. Interpretation of data: This stage involves drawing a valid conclusion from the
analyzed data. That is, interpretation of data involves making inferences (drawing
conclusions) based on the analysis of data.
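The analysis stage can be illustrated with a minimal Python sketch. The sample below is hypothetical (it does not come from these notes), and the standard-library `statistics` module is only one possible choice of tool.

```python
import statistics

# Hypothetical sample: number of clinic visits for five patients
# (illustrative values only, not data from the lecture note).
visits = [2, 4, 4, 5, 10]

summary = {
    "mean": statistics.mean(visits),
    "median": statistics.median(visits),
    "mode": statistics.mode(visits),
    "range": max(visits) - min(visits),
    # Sample variance: squared deviations divided by n - 1.
    "variance": statistics.variance(visits),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each entry is one of the summary measures named in stage 4; interpreting them (stage 5) is left to the investigator.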
a) Applications of statistics:
• In almost all fields of human endeavor.
• Almost all human beings encounter numerical facts in their daily
lives.
• Applicable in processes such as the development of certain drugs and assessing the extent of
environmental pollution.
• In industry, especially in quality control.
b) Uses of statistics
The main function of statistics is to enlarge our knowledge of complex phenomena. The
following are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishing a technique of comparison.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.
c) Limitations of statistics:
Statistics, with all its wide application in every sphere of human activity, has its own
limitations. Some of them are given below.
1. Statistics is not suitable for the study of qualitative phenomena: Since statistics is
basically a science that deals with sets of numerical data, it is applicable only to the study of
those subjects of enquiry which can be expressed in terms of quantitative
measurements. As a matter of fact, qualitative phenomena like honesty, poverty, beauty,
intelligence, etc. cannot be expressed numerically, and statistical analysis cannot be
directly applied to them. Nevertheless, statistical techniques
may be applied indirectly by first reducing the qualitative expressions to accurate
quantitative terms. For example, the intelligence of a group of students can be studied on
the basis of their marks in a particular examination.
2. Statistics does not study individuals: Statistics does not give any specific importance
to individual items; in fact, it deals with an aggregate of objects. Individual items,
when taken individually, do not constitute statistical data and do not serve
any purpose for a statistical enquiry.
3. Statistical laws are not exact: It is well known that mathematical and physical
sciences are exact. Statistical laws, however, are not exact; they are only
approximations. Statistical conclusions are not universally true. They are true only on
average.
4. Statistics can be easily misused: Statistics must be used only by experts; otherwise,
statistical methods are the most dangerous tools in the hands of the inexpert. The use of
statistical tools by inexperienced and untrained persons might lead to wrong
conclusions. Statistics can also be easily misused by quoting wrong figures.
1) Types of variables
Proper knowledge about the nature and type of data to be dealt with is essential in order
to specify and apply the proper statistical method for their analysis and inferences.
Measurement scale refers to the property of value assigned to the data based on the
properties of order, distance and fixed zero.
In mathematical terms, measurement is a functional mapping from the set of objects $\{O_i\}$
to the set of real numbers $\{M(O_i)\}$. The goal of measurement systems is to structure the
rule for assigning numbers to objects in such a way that the relationship between the
objects is preserved in the numbers assigned to the objects.
The different kinds of relationships preserved are called properties of the measurement
system.
Order
The property of order exists when an object that has more of the attribute than another
object, is given a bigger number by the rule system. This relationship must hold for all
objects in the "real world".
The property of order exists when, for all i, j: if $O_i > O_j$, then $M(O_i) > M(O_j)$.
Distance
The property of distance is concerned with the relationship of differences between
objects. If a measurement system possesses the property of distance it means that the unit
of measurement means the same thing throughout the scale of numbers. That is, an inch
is an inch, no matter where it falls: immediately ahead or a mile down the road.
More precisely, an equal difference between two numbers reflects an equal difference in
the "real world" between the objects that were assigned the numbers. In order to define
the property of distance in mathematical notation, four objects are required: $O_i$, $O_j$,
$O_k$, and $O_l$. The difference between objects is represented by the "-" sign; $O_i - O_j$ refers to
the actual "real world" difference between object i and object j, while $M(O_i) - M(O_j)$
refers to the difference between the numbers assigned to them.
Fixed Zero
A measurement system possesses a rational zero (fixed zero) if an object that has none of
the attribute in question is assigned the number zero by the system of rules. The object
does not need to really exist in the "real world", as it is somewhat difficult to visualize a
"man with no height". The requirement for a rational zero is this: if objects with none of
the attribute did exist, they would be given the value zero. Defining $O_0$ as the object with
none of the attribute in question, the property of fixed zero exists when $M(O_0) = 0$.
The property of fixed zero is necessary for ratios between numbers to be meaningful.
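A short Python sketch may help show why a fixed zero matters for ratios. Temperature in degrees Celsius is used here as an assumed illustration of a scale without a fixed zero (it is not an example from these notes), while height is the ratio-scale variable; the helper function names are hypothetical.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    """Convert centimetres to inches."""
    return cm / 2.54


# Temperature (no fixed zero): the ratio depends on the unit chosen.
temp_ratio_c = 20 / 10
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Height (fixed zero): the ratio is the same in every unit.
height_ratio_cm = 180 / 90
height_ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(f"temperature ratio: {temp_ratio_c:.2f} (Celsius) vs {temp_ratio_f:.2f} (Fahrenheit)")
print(f"height ratio:      {height_ratio_cm:.2f} (cm) vs {height_ratio_in:.2f} (inches)")
```

Because zero degrees Celsius does not mean "no temperature", the statement "twice as hot" changes with the unit, whereas "twice as tall" does not.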
Scale Types
4. Ratio Scales
Ratio scales are measurement systems that possess all three properties: order, distance,
and fixed zero. The added power of a fixed zero allows ratios of numbers to be
meaningfully interpreted; e.g., the ratio of Bekele's height to Martha's height is 1.32,
whereas this is not possible with interval scales.
This level of measurement classifies data that can be ranked, for which differences are
meaningful, and for which there is a true zero. True ratios exist between the different units
of measure.
All arithmetic and relational operations are applicable.
Examples:
Weight, Height, Number of students, Age, etc.
Knowing the level of measurement helps you decide how to interpret the data from the variable.
It also helps you decide what statistical analysis is appropriate for the values that were
assigned. For example, if a measurement is nominal, then you know that you would never
average the data.
Review Questions
Data may be derived from several sources. Depending on the source, data can be
classified as Primary or Secondary data.
1. Primary Data
Data measured or collected by the investigator or the user directly from the source.
Data are gathered for the first time by the researcher for a given purpose.
Example:
o An enquiry is made of each taxpayer in a city to obtain their opinion
about the tax collecting machinery.
o The data collected in a census study.
Two activities involved: planning and measuring.
A) Planning:
Identify source and elements of the data.
Decide whether to consider sample or census.
If sampling is preferred; decide on sample size, selection method etc.
Decide measurement procedure.
Set up the necessary organizational structure.
B) Measuring: there are different options.
Focus Group
Telephone Interview
Mail Questionnaires
Door-to-Door Survey
New Product Registration
Personal Interview and
Experiments are some of the sources for collecting the primary data.
2. Secondary Data
Data gathered or compiled from published and unpublished sources or files.
Usually secondary data are obtained from yearbooks, census reports, survey
reports, official records or reported experimental results.
When our source is secondary data, check that:
The type and objective of the situations.
The purpose for which the data were collected is compatible with the present
problem.
The nature and classification of the data are appropriate to our problem.
There are no biases or misreporting in the published data.
For example, let's assume a researcher is interested in studying the prevalence of family
planning utilization among women of reproductive age in a given woreda. The
researcher can either conduct a survey (primary data) or utilize the records of family
planning clinics in the woreda (secondary data).
Note: Data which are primary for one may be secondary for the other.
The questionnaire is the main data collection instrument in a formal sample survey. Before
examining the steps in designing a questionnaire, we need to review the types of questions
used in questionnaires. Depending on the amount of freedom given to the respondent in
offering responses, there are two basic types of questions that can be used in
questionnaires: open-ended questions and closed-ended questions.
The type of question will be determined by the form of response wanted, the nature of
the respondents and their ability to answer the questions.
Open-ended questions allow the respondent to answer freely in his or her own
words.
A multiple choice question offers more than two responses in the predetermined list of
alternative responses.
Advantage
Quick and inexpensive.
Responses from different respondents are comparable.
Useful in describing quantifiable characteristics of a large population.
Very large and representative samples are feasible.
Standardized questions make measurement more precise.
Disadvantage
Participants need to be able to read and write to respond.
High non-response rate.
Having collected and edited the data, the next important step is to organize it, that is, to
present it in a readily comprehensible, condensed form that helps in drawing
inferences from it. It is also necessary that like items be separated from unlike ones. The
process of arranging data into classes or categories according to similarities is technically
called classification. Classification is a preliminary step that prepares the ground for
proper presentation of data. The main purpose of classification is to divide the data into
homogeneous groups or classes.
The classification of data is generally done on a geographical, chronological, qualitative or
quantitative basis, along the following lines:
1) In geographical classification, data are arranged according to places, areas or regions.
2) In chronological classification, data are arranged according to time, i.e., weekly,
monthly, quarterly, half yearly, annually, etc.
3) In qualitative classification, the data are arranged according to attributes like sex,
marital status, educational standard, stage or intensity of diseases, etc.
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of
marital status: M, S, D, and W. These types will be used as the classes for the distribution. We
follow the procedure below to construct the frequency distribution.
Step 1: Make a table as shown.
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class by using
$\% = \dfrac{f}{n} \times 100$, where f = frequency of the class and n = total number of values.
Percentages are not normally a part of a frequency distribution, but they can be added since
they are used in certain types of diagrams, such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class (1)   Tally (2)   Frequency (3)   Percent (4)
M           //// /      6               24
S           //// //     7               28
D           //// //     7               28
W           ////        5               20
Total                   25              100
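The tallying and percentage steps can be cross-checked with a minimal Python sketch. It assumes the 25 marital-status codes exactly as listed in the data for this example and uses `collections.Counter` as one convenient way to count; the variable names are illustrative.

```python
from collections import Counter

# The 25 marital-status codes as listed in the example data.
data = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
""".split()

n = len(data)           # total number of values
counts = Counter(data)  # frequency of each class

# One row per class: (class, frequency, percent), with percent = f / n * 100.
table = [(cls, counts[cls], 100 * counts[cls] / n) for cls in ("M", "S", "D", "W")]

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for cls, freq, pct in table:
    print(f"{cls:<6}{freq:>10}{pct:>9.0f}")
print(f"{'Total':<6}{sum(f for _, f, _ in table):>10}{sum(p for _, _, p in table):>9.0f}")
```

Recounting from the raw values in this way is a simple guard against tallying slips when the table is built by hand.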
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Value   Tally   Frequency
60      //      2
62      /       1
63      /       1
65      /       1
70      ////    4
74      /       1
75      /       1
76      //      2
80      ///     3
85      ///     3
90      /       1
3) Grouped frequency Distribution:
A frequency distribution in which several numbers are grouped into one class.
When the range of the data is large, the data must be grouped into classes that
are more than one unit in width.
Definition of some common terms
5. Pick a suitable starting point less than or equal to the minimum value. The starting
point is called the lower limit of the first class. Continue to add the class width to this
lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U (the unit of measurement) from the
lower limit of the second class. Then continue to add the class width to this upper limit to
find the rest of the upper limits.
7. Find the boundaries by subtracting 0.5U units from the lower limits and adding 0.5U
units to the upper limits. The boundaries are also half-way between the upper limit
of one class and the lower limit of the next class.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it
may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies
Example 2.3: Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest values: H = 39, L = 6
Step 2: Find the range: R = H - L = 39 - 6 = 33
Step 3: Select the number of classes desired using Sturges' formula:
$k = 1 + 3.32 \log_{10}(20) = 5.32 \approx 6$ (rounding up)
Step 4: Find the class width: $w = R/k = 33/6 = 5.5 \approx 6$ (rounding up)
Step 5: Select the starting point; let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits;
e.g. the first upper class limit = 12 - U = 12 - 1 = 11
11, 17, 23, 29, 35, 41 are the upper class limits.
So, combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries;
E.g. for class 1: lower class boundary = 6 - U/2 = 5.5
Upper class boundary = 11 + U/2 = 11.5
• Then continue adding the class width to both boundaries to obtain the remaining boundaries;
one obtains the following classes.
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find the cumulative frequencies.
Step 11: Find the relative frequencies and/or relative cumulative frequencies.
The complete frequency distribution is as follows:
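The steps of Example 2.3 can also be sketched in Python as a cross-check. This is a minimal sketch, assuming the 20 observations listed in the example, U = 1 as the unit of measurement, and the minimum observation as the starting point; the variable names are illustrative.

```python
import math

# The 20 observations from Example 2.3.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
U = 1  # unit of measurement (assumed: whole numbers)

n = len(data)
low, high = min(data), max(data)
data_range = high - low                   # R = H - L
k = math.ceil(1 + 3.32 * math.log10(n))   # number of classes (Sturges), rounded up
width = math.ceil(data_range / k)         # class width, rounded up

rows = []
cumulative = 0
for i in range(k):
    lower = low + i * width               # lower class limit
    upper = lower + width - U             # upper class limit
    freq = sum(lower <= x <= upper for x in data)
    cumulative += freq
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - U / 2, upper + U / 2),
        "frequency": freq,
        "cumulative": cumulative,
        "relative": freq / n,
    })

print("Limits   Boundaries    Freq  Cum.freq  Rel.freq")
for r in rows:
    lo, up = r["limits"]
    lb, ub = r["boundaries"]
    print(f"{lo:>2}-{up:<4}  {lb:>4}-{ub:<6}  {r['frequency']:>4}  "
          f"{r['cumulative']:>8}  {r['relative']:>8.2f}")
```

The last cumulative frequency should equal the number of observations, which is a quick consistency check on the tally.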
Pie chart
A pie chart is a circular chart divided into sectors, illustrating the relative magnitudes or
frequencies of the classes of a given variable. A pie chart usually represents categorical data, but it
is also possible to use it for discrete quantitative data. The angle of each sector has to be
proportional to the relative frequency of a given class.
$\text{Angle of sector} = \dfrac{\text{value of the part}}{\text{the whole quantity}} \times 360^\circ$
Example 2.4: Draw a suitable diagram to represent the following population in a town.
Men Women Girls Boys
2500 2000 4000 1500
Solutions:
Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, graph each section and write its name and
corresponding percentage.
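Steps 1 and 2 can be sketched in Python using the population figures of Example 2.4. The only assumption is that each sector's angle is its share of the full 360° circle; the dictionary and variable names are illustrative.

```python
# Population figures from Example 2.4.
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}

total = sum(population.values())

# For each group: its percentage of the whole and its sector angle in degrees.
sectors = {
    group: {"percent": 100 * count / total, "degrees": 360 * count / total}
    for group, count in population.items()
}

for group, s in sectors.items():
    print(f"{group:<6} {s['percent']:>5.1f}%  {s['degrees']:>6.1f} degrees")
```

The percentages should add up to 100 and the angles to 360, which gives a quick check before drawing the chart.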
[Figure: pie chart of the town's population. Men 25%, Women 20%, Girls 40%, Boys 15%]
Pictogram
Data are presented with the help of pictures. Such a presentation is known as a pictorial
diagram or pictogram. Here the magnitudes of the quantities of the variable are explained
with the help of pictures which depict the variable approximately. In a pictogram, each
symbol in the picture represents a fixed quantity of the variable.
Bar Charts:
- A set of bars (thick lines or narrow rectangles) representing some magnitude over
time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts. The most common being:
Simple bar chart
Component or sub divided bar chart.
Multiple bar charts.
[Figure: simple bar chart, "Sales ($) in 1957", for products A, B and C; bar labels 24, 24 and 12]
Solutions:
[Figure: component bar chart of sales in $ for products A, B and C, by year of production (1957, 1958, 1959)]
[Figure: multiple bar chart of sales in $ for products A, B and C, by year of production (1957, 1958, 1959)]
Example 2.6: Draw a diagram presenting sales by product in 1958, assuming that there was a
product D whose sales in 1958 were $100000.
The histogram, frequency polygon and cumulative frequency graph (ogive) are the most
commonly applied graphical representations for continuous data.
Histogram
A graph which displays the data by using vertical bars of various heights to represent
frequencies. Class boundaries are placed along the horizontal axis. Class marks and class
limits are sometimes used as the quantity on the X axis. Unlike in a bar graph, in a
histogram the categories (bars) must be adjacent.
Example 2.7: The following table summarizes the Biostatistics mid exam scores of 38
students out of 35 marks.
Frequency Polygon:
Frequency Polygon depicts a frequency distribution for discrete or continuous numeric
data. Frequency polygons are a graphical device for understanding the shapes of
distributions.
A histogram can easily be changed to a frequency polygon by joining the midpoints of
the tops of the adjacent rectangles of the histogram with lines. It is also possible to draw
a frequency polygon without drawing a histogram.
Example 2.8: The following frequency distribution represents the ages (in years) of 60
patients at a psychiatric counseling center.
Finally, we plot the class midpoints (on the X axis) against the respective frequency of each class
(on the Y axis) and connect adjacent points with straight lines.