
Topic 2: Descriptive Statistics


A variable is a characteristic that can be
measured or observed

A variable is an attribute that can take
different values for different objects
 Sex : Male, female
 Weight: kg, lbs
 Height: Tall, short
 Level of knowledge: High, low
What are data?
 Data is a collection of observations expressed in
numerical figures.

 Datasets consist of variables reflecting different

characteristics of study units

 But why collect data? Why assign numbers to the

events, attributes, and characteristics of the world?
Types of variables
 Data needs to be summarised before it can be
presented and used

 The appropriate summary depends on the type of data

 Different variables produce different types of data

 Two broad categories exist:

1. Qualitative (categorical) variables

2. Quantitative (numerical) variables

1. Qualitative or categorical variables

 Do not take numerical values, but are recorded

and reported in categories.

 Possess attributes only. For example,

 Sex (male, female)

 Type of health facility (dispensary, health centre, hospital)

 Marital status (single, married, cohabiting, separated,

divorced, widowed)

 Level of knowledge: poor, average, good

2. Quantitative or numerical variables
 Numeric variables can be:

1. Discrete - taking whole numbers only, e.g.

parity, number of newly admitted patients.

2. Continuous - taking any value within

meaningful extremes, e.g. haemoglobin,

birth weight.
Continuous Data
Continuous data represent measurable
quantities that are not restricted to taking
specific values. For practicability, continuous
data are usually rounded to discrete values.
The measuring device itself is also important.

AKA: scale data (in SPSS)

Levels of measurement
 How categories or values of the variable are

arranged in relation to each other

 Each statistical procedure requires that a variable
be measured at a particular level

Levels of Measurement

 There are four types of measurements or

measurement scales used in statistics:
 Nominal

 Ordinal

 Interval

 Ratio

 The four types or levels of measurement have

different degrees of usefulness in statistical research
Scale of measurement

 Qualitative (Categorical)
 Nominal
 Ordinal

 Quantitative (Continuous/Discrete)
 ratio
 interval
1. Nominal Data
 Nominal data have no meaningful rank
order among values
 Examples
 Sex: male/female
 HIV status: +/-
 Religion:
 Race
 Occupation
 Marital status
 Other examples:

AKA: categorical variables, class variables, binary variables (if only two levels)
2. Ordinal Data
Ordinal data fall into ordered categories, but arithmetic
operations are not appropriate
- Numbers are used to reflect rank order among categories (e.g. SES: 1=low;
2=medium; 3=high)
Note: the numbers indicate rank order only.
 Education level
 Severity of rheumatoid arthritis
 Degrees of maturation in adolescents
Other examples:
 Age categories
 Meat-eating habits
 Socio-economic status

AKA: categorical variables, class variables

Subtypes of Continuous/Discrete Data
3. Interval

Interval measurements have meaningful
distances between measurements (i.e. the
numbers carry more information than rank order alone)

 addition and subtraction (“+” and “-”) can be
performed (e.g. temperature in °C or in °F)

 no meaningful zero value (i.e. the zero point is
arbitrarily defined, as in IQ measurements or
temperature measurements in degrees Celsius:
0 °C does not signify an absence of
heat)
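The effect of an arbitrary zero can be checked numerically. The sketch below uses made-up readings of 10, 20 and 30 degrees Celsius (not data from these slides) and a helper named `celsius_to_fahrenheit` chosen for illustration. Equal differences stay equal after conversion, but the ratio of two readings changes with the unit, so "twice as hot" has no meaning on an interval scale.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Hypothetical readings, chosen only to illustrate the point.
readings_c = [10.0, 20.0, 30.0]
readings_f = [celsius_to_fahrenheit(c) for c in readings_c]

# Differences: equal steps in Celsius are still equal steps in Fahrenheit.
step1_f = readings_f[1] - readings_f[0]
step2_f = readings_f[2] - readings_f[1]

# Ratios: the same two readings give different ratios in the two units.
ratio_c = readings_c[1] / readings_c[0]
ratio_f = readings_f[1] / readings_f[0]

print("Fahrenheit readings:", readings_f)
print("Steps in Fahrenheit:", step1_f, step2_f)
print("Ratio in Celsius:", ratio_c, "Ratio in Fahrenheit:", ratio_f)
```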
4. Ratio

 Ratio measurements:
 most sophisticated level

 has all characteristics of interval scale,

but has absolute zero indicating an
absence of the measured quantity (e.g.
pulse rate, height, weight etc.)

 can be converted to lower level, e.g. to

ordinal scale
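As a small illustration of both points, the sketch below takes hypothetical birth weights in grams. It checks that a ratio of two weights is the same in grams and in pounds (the zero is absolute), and it then converts the ratio-scale values to ordered categories using the common cutoff of 2500 g for low birth weight. The function and variable names are assumptions made for this example.

```python
LOW_BIRTH_WEIGHT_CUTOFF_G = 2500  # below 2500 g is classed as low birth weight
GRAMS_PER_POUND = 453.59237


def grams_to_pounds(grams):
    """Convert a weight from grams to pounds."""
    return grams / GRAMS_PER_POUND


def birth_weight_category(grams):
    """Collapse a ratio-scale weight into an ordered category."""
    return "low" if grams < LOW_BIRTH_WEIGHT_CUTOFF_G else "normal"


# Hypothetical birth weights in grams (not data from the slides).
weights_g = [1900, 2400, 2500, 3100, 3800]

# Ratio scale: 3800 g is twice 1900 g, whatever the unit.
ratio_in_grams = weights_g[4] / weights_g[0]
ratio_in_pounds = grams_to_pounds(weights_g[4]) / grams_to_pounds(weights_g[0])

# Conversion to a lower level of measurement.
categories = [birth_weight_category(w) for w in weights_g]

print(ratio_in_grams, round(ratio_in_pounds, 6))
print(categories)
```

Information is lost in the conversion: the categories cannot be turned back into the original weights.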
Differences between levels

INTERVAL/RATIO   Amount of difference

ORDINAL          Order of difference

NOMINAL          Existence of difference

Rules to use in analysis
 Any statistic that can be used for variables at a
low level of measurement can also be used for
those at a higher level

 Statistics designed for variables at a high level of
measurement should NOT be used for those at a
lower level
Rules to use in analysis

                              Statistics designed for
Statistics can be used for    NOMINAL         ORDINAL         INTERVAL
NOMINAL                       OK              Never           Never
ORDINAL                       Can, not best   OK              Never
INTERVAL                      Can, not best   Can, not best   OK
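One way to read these rules is to pair each level with a typical summary statistic: the mode for nominal data, the median for ordinal data, and the mean for interval or ratio data. The sketch below uses Python's standard `statistics` module on small made-up samples; the pairing of statistic to level is a common convention assumed here, not something the slides spell out.

```python
import statistics

# Hypothetical samples, one per level of measurement.
marital_status = ["single", "married", "married", "widowed", "married"]  # nominal
ses_codes = [1, 2, 2, 3, 1, 2, 3]      # ordinal: 1=low, 2=medium, 3=high
pulse_rates = [72, 80, 68, 90, 76]     # ratio: beats per minute

# Nominal: only the most frequent category is meaningful.
mode_marital = statistics.mode(marital_status)

# Ordinal: the mode still works ("can, not best"); the median uses the order.
mode_ses = statistics.mode(ses_codes)
median_ses = statistics.median(ses_codes)

# Ratio: the median still works; the mean uses the distances between values.
median_pulse = statistics.median(pulse_rates)
mean_pulse = statistics.mean(pulse_rates)

print(mode_marital, mode_ses, median_ses, median_pulse, mean_pulse)
```

A mean of the marital status values cannot be computed at all, and a mean of the SES codes would treat the gaps between 1, 2 and 3 as equal, which an ordinal scale does not guarantee.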
Conceptualizing Variables
 Imagine you are studying the
relationship between HIV and sexual
 Which variables to collect?
 What is the nature of these variables?
Cumulative Frequency, Relative Frequency, and
Cumulative Relative Frequency

 Cumulative Frequency
 Add the number of observations in each
category/interval to the total number of observations in
all categories/intervals above it
 Relative Frequency
 Divide the number of observations in each
category/interval by the total number of observations,
and multiply the result by 100
 Cumulative Relative Frequency
 Add the percent of observations in each
category/interval to the total percent of observations in
all categories/intervals above it. Will total 100% in the
last category.
Serum Cholesterol Levels among men aged
25-34, Mbeya, 1976-1980
Cholesterol level   Number   Cumulative   Relative        Cumulative relative
(mg/100 ml)         of men   number       frequency (%)   frequency (%)

80-119                  13        13           1.2              1.2
120-159                150       163          14.1             15.3
160-199                442       605          41.4             56.7
200-239                299       904          28.0             84.7
240-279                115      1019          10.8             95.5
280-319                 34      1053           3.2             98.7
320-359                  9      1062           0.8             99.5
360-399                  5      1067           0.5            100.0
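The three derived columns can be reproduced from the counts alone. The following sketch takes the "number of men" column from the table and computes the cumulative frequency, relative frequency and cumulative relative frequency as defined earlier; the variable names are chosen for this example, and percentages are rounded to one decimal place to match the table.

```python
from itertools import accumulate

intervals = ["80-119", "120-159", "160-199", "200-239",
             "240-279", "280-319", "320-359", "360-399"]
counts = [13, 150, 442, 299, 115, 34, 9, 5]  # number of men per interval

total = sum(counts)

# Cumulative frequency: running total of the counts.
cumulative = list(accumulate(counts))

# Relative frequency: each count as a percentage of the total.
relative = [round(100 * n / total, 1) for n in counts]

# Cumulative relative frequency: each running total as a percentage.
cumulative_relative = [round(100 * c / total, 1) for c in cumulative]

for row in zip(intervals, counts, cumulative, relative, cumulative_relative):
    print("{:<8} {:>4} {:>5} {:>5} {:>6}".format(*row))
```

Computing the cumulative percentage from the running total, rather than by adding the already rounded percentages, avoids accumulating rounding error.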


 Ratio, proportion and rate are common terms in
Epidemiology and Vital Statistics
 There is a need to distinguish them
 Any number (the numerator) divided by any other
number (the denominator) gives a ratio.
 i.e. a/b is a ratio
 a is the numerator; b is the denominator.
 A ratio is often presented as a:b
 a and b need not have the same units.
 Example:
 sex ratio at birth = (number of male births) / (number of female births)
 A proportion is a special form of a ratio in which
the numerator is part of the denominator
 Example:
 Proportion of male births
= (number of male births) / (total number of births)
 A proportion is often expressed in a
percentage form
 A rate is a proportion with an extra dimension, TIME.
 One has to study the population for a particular period, say 1
year, and count the number of times the particular event occurs.
 So a rate indicates the frequency of events occurring in a
population per unit of time.
 Example: Crude Death Rate (CDR)
 CDR = (number of deaths in one year / total population) x 1000
 Rates may be expressed per 1000, 100,000 or 1,000,000
depending on convention and convenience.
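The three measures differ only in what goes into the numerator and denominator. The sketch below expresses each as a small function and applies it to made-up figures (515 male and 485 female births; 1,200 deaths in one year in a population of 100,000). The function names and all numbers are hypothetical and serve only to illustrate the definitions above.

```python
def ratio(a, b):
    """Any number a divided by any other number b."""
    return a / b


def proportion(part, whole):
    """A ratio whose numerator is part of its denominator."""
    return part / whole


def rate(events, population, per=1000):
    """Events in a stated period, per `per` people in the population."""
    return events * per / population


# Hypothetical figures, not taken from the slides.
male_births, female_births = 515, 485
deaths_in_year, total_population = 1200, 100000

sex_ratio_at_birth = ratio(male_births, female_births)
proportion_male = proportion(male_births, male_births + female_births)
crude_death_rate = rate(deaths_in_year, total_population, per=1000)

print(round(sex_ratio_at_birth, 3))       # males per female
print(round(100 * proportion_male, 1))    # percentage of births that are male
print(crude_death_rate)                   # deaths per 1000 population per year
```

A proportion always lies between 0 and 1, whereas a ratio can exceed 1, as the sex ratio does here.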
Practical: Malaria and Birth Weight
 Between 1995 and 1997, a baseline survey to
determine the association between malaria infection
and birth weight was carried out in Dar Es Salaam.

 A total of 800 mothers were involved in the study. Of

these, 60 mothers were found to have been infected
with malaria parasites. It was also observed that 80
mothers had given birth to babies with low birth
weight (i.e. below 2500g), while the rest gave birth to
babies with normal birth weight.
Practical (contd)
 What proportion (in percentage) of mothers was
found to be infected with malaria?
 What proportion (in percentage) of mothers gave
birth to babies with low birth weight?
 What is the ratio of mothers who gave birth to
babies with low birth weight to mothers who gave
birth to babies with normal birth weight?
 Derive a two-way table showing birth weight (low
and normal) by malaria infection (+ve and -ve),
given that 40 of the mothers found to have no malaria
parasites gave birth to low birth weight babies. Do
you think malaria infection has an effect on birth
weight?
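One possible way to work through the arithmetic of the practical is sketched below, using only the figures given in the exercise (800 mothers, 60 infected, 80 low birth weight babies, 40 of them born to uninfected mothers). The variable names are chosen for this example. The final comparison describes this sample only and does not by itself establish a causal effect.

```python
from fractions import Fraction

total_mothers = 800
infected = 60
low_birth_weight = 80
low_bw_uninfected = 40

# Quantities implied by the figures above.
uninfected = total_mothers - infected
normal_birth_weight = total_mothers - low_birth_weight
low_bw_infected = low_birth_weight - low_bw_uninfected
normal_bw_infected = infected - low_bw_infected
normal_bw_uninfected = uninfected - low_bw_uninfected

# Proportions (as percentages) and the low:normal ratio.
pct_infected = 100 * infected / total_mothers
pct_low_bw = 100 * low_birth_weight / total_mothers
low_to_normal = Fraction(low_birth_weight, normal_birth_weight)

# Two-way table: rows are malaria status, columns are birth weight.
table = {
    "+ve": {"low": low_bw_infected, "normal": normal_bw_infected},
    "-ve": {"low": low_bw_uninfected, "normal": normal_bw_uninfected},
}

# Percentage of low birth weight within each malaria group.
pct_low_bw_infected = round(100 * low_bw_infected / infected, 1)
pct_low_bw_uninfected = round(100 * low_bw_uninfected / uninfected, 1)

print(pct_infected, pct_low_bw, low_to_normal)
print(table)
print(pct_low_bw_infected, pct_low_bw_uninfected)
```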
