Introduction To Statistics


INTRODUCTION TO STATISTICS

By Loyce Gonzo – 0773 390 959


gonzol@staff.msu.ac.zw
Office U4
Extension 2169

Adapted from
Dr. S. Ahamed
LECTURE OUTLINE:

• Definition of Statistics
• Types of data
• Frequency distribution of data
• Graphical representation of data
• “Statistics is the science which deals with collection, classification and
tabulation of numerical facts as the basis for explanation, description
and comparison of phenomena.”
-- Lovitt
• Statistics explores the collection, organization, analysis and
interpretation of data.
WHAT DOES STATISTICS COVER?
Planning
Design
Execution (Data collection)
Data Processing
Data analysis
Presentation
Interpretation
Publication
INVESTIGATION
Data Collection
Data Presentation: Tabulation, Diagrams, Graphs
Descriptive Statistics: Measures of Location, Measures of Dispersion, Measures of Skewness & Kurtosis
Inferential Statistics: Estimation (Point estimate, Interval estimate), Hypothesis Testing, Univariate analysis, Multivariate analysis
TYPES OF DATA

• QUALITATIVE DATA
• DISCRETE QUANTITATIVE
• CONTINUOUS QUANTITATIVE
QUALITATIVE

NOMINAL
Examples:
Sex (M, F)
Exam result (P, F)
Blood group (A, B, O or AB)
Color of eyes (blue, green, brown, black)

ORDINAL
Examples:
Response to treatment (poor, fair, good)
Severity of disease (mild, moderate, severe)
Income status (low, middle, high)
QUANTITATIVE (DISCRETE)

Examples:
The no. of family members
The no. of heart beats
The no. of admissions in a day

QUANTITATIVE (CONTINUOUS)

Examples: Height, Weight, Age, BP, Serum Cholesterol and BMI
Discrete data -- gaps between possible values
Example: Number of children

Continuous data -- theoretically, no gaps between possible values
Example: Hb
CONTINUOUS DATA GROUPED INTO DISCRETE (CATEGORICAL) DATA

Wt. (in kg): underweight, normal & overweight
Ht. (in cm): short, medium & tall
Table 1 Distribution of blunt injured patients according to hospital length of stay

Hospital length of stay    Number    Percent
1 – 3 days                   5891       43.3
4 – 7 days                   3489       25.6
2 weeks                      2449       18.0
3 weeks                       813        6.0
1 month                       417        3.1
More than 1 month             545        4.0
Total                       13604      100.0

Mean = 7.85, SE = 0.10
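The percent column of a table like Table 1 can be checked directly from the counts. The following is a minimal Python sketch; the dictionary name `stay_counts` and the helper `percent_distribution` are illustrative names, not part of the lecture. The six counts sum to 13604, and the percentages printed in the table agree with that total.

```python
# Counts copied from Table 1; the keys are the table's row labels.
stay_counts = {
    "1 - 3 days": 5891,
    "4 - 7 days": 3489,
    "2 weeks": 2449,
    "3 weeks": 813,
    "1 month": 417,
    "More than 1 month": 545,
}


def percent_distribution(counts):
    """Return the grand total and each category's share as a percentage."""
    total = sum(counts.values())
    percents = {label: round(100 * n / total, 1) for label, n in counts.items()}
    return total, percents


total, percents = percent_distribution(stay_counts)

# Print the table with a Number and a Percent column.
for label, n in stay_counts.items():
    print(f"{label:<20}{n:>7}{percents[label]:>8.1f}")
print(f"{'Total':<20}{total:>7}{sum(percents.values()):>8.1f}")
```

Recomputing the percentages this way is a quick check that the total row and the percent column agree with the counts.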
Scale of measurement
Qualitative variable: a categorical variable

Nominal (classificatory) scale
- gender, marital status, race

Ordinal (ranking) scale
- severity scale, good/better/best
Scale of measurement
Quantitative variable: a numerical variable (discrete or continuous)

Interval scale:
Data are placed in meaningful intervals and order. The units of
measurement are arbitrary.
- Temperature: equal differences are equal (37 ºC - 36 ºC is the same as 38 ºC - 37 ºC)
- No implication of ratio (30 ºC is not twice as hot as 15 ºC)

Ratio scale:
Data are presented in a frequency distribution in
logical order. A meaningful ratio exists.
- Age, weight, height, pulse rate
- A pulse rate of 120 is twice as fast as 60
- A person with a weight of 80 kg is twice as heavy
as one with a weight of 40 kg
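A short Python sketch may make the interval versus ratio distinction concrete. It uses the temperatures, pulse rates and weights from the examples above; the conversion to kelvin is an added illustration (not from the slides) of why a ratio of Celsius readings is not meaningful.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to kelvin, a temperature scale with a true zero."""
    return celsius + 273.15


# Interval scale: differences are meaningful and comparable.
diff_a = 37 - 36
diff_b = 38 - 37
print("Equal differences:", diff_a == diff_b)

# The ratio of two Celsius readings depends on the arbitrary zero point.
ratio_celsius = 30 / 15
ratio_kelvin = celsius_to_kelvin(30) / celsius_to_kelvin(15)
print(f"Celsius ratio: {ratio_celsius:.3f}, kelvin ratio: {ratio_kelvin:.3f}")

# Ratio scale: zero means "none", so ratios are meaningful.
pulse_ratio = 120 / 60   # twice as fast
weight_ratio = 80 / 40   # twice as heavy
print("Pulse ratio:", pulse_ratio, "Weight ratio:", weight_ratio)
```

The Celsius ratio is 2, but the same two temperatures in kelvin differ by only about 5%, so "twice as hot" does not describe them.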
Scales of Measurement
• Nominal - qualitative classification of equal value: gender, race, color, city
• Ordinal - qualitative classification which can be rank ordered: socioeconomic status of families
• Interval - numerical or quantitative data which can be rank ordered and sizes compared: temperature
• Ratio - quantitative interval data along with a meaningful ratio: time, age
Frequency Distributions

• data distribution – pattern of variability
• the center of a distribution
• the ranges
• the shapes
• simple frequency distributions
• grouped frequency distributions
• midpoint
Tabulate the haemoglobin values of 30 adult male patients listed below

Patient No   Hb (g/dl)   Patient No   Hb (g/dl)   Patient No   Hb (g/dl)
1            12.0        11           11.2        21           14.9
2            11.9        12           13.6        22           12.2
3            11.5        13           10.8        23           12.2
4            14.2        14           12.3        24           11.4
5            12.3        15           12.3        25           10.7
6            13.0        16           15.7        26           12.5
7            10.5        17           12.6        27           11.8
8            12.8        18           9.1         28           15.1
9            13.2        19           12.9        29           13.4
10           11.2        20           14.6        30           13.1
Steps for making a table
Step 1 Find the minimum (9.1) and maximum (15.7)
Step 2 Calculate the difference: 15.7 – 9.1 = 6.6
Step 3 Decide the number and width of the classes (7 class intervals): 9.0 – 9.9, 10.0 – 10.9, ...
Step 4 Prepare a dummy table: Hb (g/dl), Tally marks, No. patients
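The four steps can be sketched in Python using the 30 Hb values listed earlier. This is one possible way to do the tally, assuming classes of width 1 g/dl that start at whole numbers; the variable names are illustrative.

```python
import math
from collections import Counter

# Hb values (g/dl) of the 30 adult male patients, in patient order.
hb_values = [
    12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
    11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
    14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1,
]

# Steps 1 and 2: minimum, maximum and their difference.
lowest, highest = min(hb_values), max(hb_values)
value_range = round(highest - lowest, 1)
print(f"Min {lowest}, Max {highest}, Difference {value_range}")

# Step 3: classes of width 1 g/dl; a value falls in the class
# whose lower limit is the whole-number part of the value.
class_counts = Counter(math.floor(value) for value in hb_values)

# Step 4: build the table rows (class label, tally marks, count).
frequency_table = []
for lower in range(math.floor(lowest), math.floor(highest) + 1):
    label = f"{lower}.0 - {lower}.9"
    frequency_table.append((label, class_counts.get(lower, 0)))

for label, count in frequency_table:
    print(f"{label:<12} {'|' * count:<10} {count}")
print(f"{'Total':<12} {'':<10} {sum(count for _, count in frequency_table)}")
```

With other data the class width would be chosen in Step 3 before counting; width 1 is used here only because it matches the classes on the slide.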
DUMMY TABLE

Hb (g/dl)      Tally marks    No. patients
9.0 – 9.9
10.0 – 10.9
11.0 – 11.9
12.0 – 12.9
13.0 – 13.9
14.0 – 14.9
15.0 – 15.9
Total

TALLY MARKS TABLE

Hb (g/dl)      Tally marks    No. patients
9.0 – 9.9      l              1
10.0 – 10.9    lll            3
11.0 – 11.9    lllll l        6
12.0 – 12.9    lllll lllll    10
13.0 – 13.9    lllll          5
14.0 – 14.9    lll            3
15.0 – 15.9    ll             2
Total          -              30
Table Frequency distribution of 30 adult male patients by Hb

Hb (g/dl)      No. of patients
9.0 – 9.9      1
10.0 – 10.9    3
11.0 – 11.9    6
12.0 – 12.9    10
13.0 – 13.9    5
14.0 – 14.9    3
15.0 – 15.9    2
Total          30
Table Frequency distribution of adult patients by Hb and gender

               Gender
Hb (g/dl)      Male    Female    Total
<9.0           0       2         2
9.0 – 9.9      1       3         4
10.0 – 10.9    3       5         8
11.0 – 11.9    6       8         14
12.0 – 12.9    10      6         16
13.0 – 13.9    5       4         9
14.0 – 14.9    3       2         5
15.0 – 15.9    2       0         2
Total          30      30        60
Elements of a Table
An ideal table should have: Number, Title, Column headings, Foot-notes

Number - table number for identification in a report
Title, place - describe the body of the table, variables, time period (what, how classified, where and when)
Column heading - variable name, No., percentages (%), etc.
Foot-note(s) - to describe some column/row headings, special cells, source, etc.
Table II. Distribution of 120 (Madras) Corporation divisions
according to annual death rate based on registered deaths in
1975 and 1976

Death rate (/1000 per annum)    No. of divisions
7.0 - 7.9                       4 (3.3)
8.0 - 8.9                       13 (10.8)
9.0 - 9.9                       20 (16.7)
10.0 - 10.9                     27 (22.5)
11.0 - 11.9                     18 (15.0)
12.0 - 12.9                     11 (9.2)
13.0 - 13.9                     11 (9.2)
14.0 - 14.9                     6 (5.0)
15.0 - 15.9                     2 (1.7)
16.0 - 16.9                     4 (3.3)
17.0 - 18.9                     3 (2.5)
19.0 +                          1 (0.8)
Total                           120 (100.0)

Figures in parentheses indicate percentages


DIAGRAMS/GRAPHS
Discrete data
--- Bar charts (one or two groups)
--- Pie Charts
Continuous data
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot
Bar Graphs
• The heights of the bars indicate frequency
• Frequency is on the Y axis and the categories of the variable are on the X axis
• The bars should be of equal width and should not touch the other bars

[Figure: bar chart. Y axis: Number; X axis: Risk factor (Smo, Alc, Chol, DM, HTN, No Exer, F-H)]
The distribution of risk factors among cases with cardiovascular diseases
Bar chart
[Figure: grouped bar chart. Y axis: Enrollment (hundred); X axis: Year, 1986 to 1992; series: Men, Women]
HIV cases enrollment in USA by gender

Stacked bar chart
[Figure: stacked bar chart. Y axis: Enrollment (Thousands); X axis: Year, 1986 to 1992; series: Men, Women]
HIV cases enrollment in USA by gender
Pie Chart
• Circular diagram – total 100%
• Divided into segments, each representing a category
• Decide the adjacent categories
• The size of each slice of the pie is proportional to the amount for its category

[Figure: pie chart with segments of 70%, 20% and 10%; legend: Mild, Moderate, Severe]
The prevalence of different degrees of hypertension in the population
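Because the whole circle represents 100%, each slice's angle is its share of 360 degrees. A minimal Python sketch follows, using the three percentages shown on the pie chart; the function name `slice_angles` is illustrative, and the slides do not make clear which percentage belongs to which severity category, so none is assumed here.

```python
def slice_angles(percentages):
    """Return the angle in degrees of each pie slice, proportional to its share."""
    total = sum(percentages)
    return [360 * share / total for share in percentages]


# The three segment sizes shown on the chart, in percent.
shares = [70, 20, 10]
angles = slice_angles(shares)

for share, angle in zip(shares, angles):
    print(f"{share}% of the total -> {angle:.0f} degrees")
```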
Example data

68 63 42 27 30 36 28 32
79 27 22 28 24 25 44 65
43 25 74 51 36 42 28 31
28 25 45 12 57 51 12 32
49 38 42 27 31 50 38 21
16 24 64 47 23 22 43 27
49 28 23 19 11 52 46 31
30 43 49 12
Histogram
[Figure: histogram. Y axis: Frequency; X axis: Age, with tick marks at 11.5, 21.5, 31.5, 41.5, 51.5, 61.5 and 71.5]
Figure 1 Histogram of ages of 60 subjects
Histogram
• Histogram: a pictorial presentation of discrete or continuous data; the most commonly used graph in practice.
• The horizontal axis displays the true interval limits, whilst the vertical axis depicts the frequency or relative frequency of the intervals.
• Frequency is represented by the area and not the height; the bars touch because adjacent intervals share true limits.
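As a rough illustration, the 60 ages in the example data can be grouped into classes and drawn as a text histogram in Python. The class intervals used here (10 to 19, 20 to 29, and so on, with true limits 9.5 to 19.5, 19.5 to 29.5, ...) are an assumption for the sketch and may differ from the intervals used in Figure 1.

```python
# The 60 ages from the example data.
ages = [
    68, 63, 42, 27, 30, 36, 28, 32,
    79, 27, 22, 28, 24, 25, 44, 65,
    43, 25, 74, 51, 36, 42, 28, 31,
    28, 25, 45, 12, 57, 51, 12, 32,
    49, 38, 42, 27, 31, 50, 38, 21,
    16, 24, 64, 47, 23, 22, 43, 27,
    49, 28, 23, 19, 11, 52, 46, 31,
    30, 43, 49, 12,
]

class_width = 10   # assumed class width
first_lower = 10   # assumed lower limit of the first class

# Count how many ages fall in each class, keyed by the class's lower limit.
counts = {}
for age in ages:
    lower = first_lower + class_width * ((age - first_lower) // class_width)
    counts[lower] = counts.get(lower, 0) + 1

# Adjacent classes share a true limit, so the bars have no gaps between them.
for lower in sorted(counts):
    true_lower = lower - 0.5
    true_upper = lower + class_width - 0.5
    print(f"{true_lower:>5} to {true_upper:<5} {'#' * counts[lower]} ({counts[lower]})")
```

All classes here have the same width, so bar length and bar area tell the same story; with unequal widths the bar heights would need to be frequency divided by width.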
Polygon
[Figure: frequency polygon. Y axis: Frequency; X axis: Age, with tick marks at 11.5, 21.5, 31.5, 41.5, 51.5, 61.5 and 71.5]
Polygon
• Frequency polygon: similar to a histogram in that it uses the same axes, but it plots a point at the midpoint of each interval, at the height of that interval's frequency.
• Polygons can be superimposed, making comparisons of several data sets easy.
• Cumulative frequency polygons have cumulative relative frequencies on the vertical axis.
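The points of a frequency polygon and of a cumulative frequency polygon can be computed as below. This Python sketch assumes ten-year age classes for the 60 subjects; the class limits are an assumption, and the frequencies are the row counts of the stem-and-leaf display of the same data.

```python
from itertools import accumulate

# Assumed class limits for age (years) and the number of subjects in each class.
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
frequencies = [6, 19, 11, 13, 5, 4, 2]

# Frequency polygon: one point per class at (midpoint, frequency).
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Cumulative frequency polygon: running total divided by the sample size.
cumulative = list(accumulate(frequencies))
sample_size = cumulative[-1]
cumulative_relative = [running / sample_size for running in cumulative]

for mid, freq, cum_rel in zip(midpoints, frequencies, cumulative_relative):
    print(f"midpoint {mid:>5}  frequency {freq:>2}  cumulative relative {cum_rel:.3f}")
```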
Stem and leaf plot
Stem-and-leaf of Age N = 60
Leaf Unit = 1.0

6 1 122269
19 2 1223344555777788888
(11) 3 00111226688
13 4 2223334567999
5 5 01127
4 6 3458
2 7 49
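A display like this can be rebuilt from the raw ages with a few lines of Python. The sketch below is one way to do it (it is not the statistical package that produced the output above): the tens digit is the stem, the units digit is the leaf, and the first printed column is the number of leaves on each stem.

```python
from collections import defaultdict

# The 60 ages from the example data.
ages = [
    68, 63, 42, 27, 30, 36, 28, 32,
    79, 27, 22, 28, 24, 25, 44, 65,
    43, 25, 74, 51, 36, 42, 28, 31,
    28, 25, 45, 12, 57, 51, 12, 32,
    49, 38, 42, 27, 31, 50, 38, 21,
    16, 24, 64, 47, 23, 22, 43, 27,
    49, 28, 23, 19, 11, 52, 46, 31,
    30, 43, 49, 12,
]

# Split each age into stem (tens) and leaf (units), in ascending order.
stems = defaultdict(list)
for age in sorted(ages):
    stem, leaf = divmod(age, 10)
    stems[stem].append(str(leaf))

# Join the leaves of each stem into one string.
display = {stem: "".join(leaves) for stem, leaves in stems.items()}

print(f"Stem-and-leaf of Age N = {len(ages)}")
for stem in sorted(display):
    print(f"{len(display[stem]):>3} {stem} {display[stem]}")
```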
Box plot
[Figure: box plot of Age; vertical axis from 10 to 80]
Descriptive statistics report:
Boxplot
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean

- the skew of the distribution:
positive skew: mean > median & high-score whisker is longer
negative skew: mean < median & low-score whisker is longer
Box plot
• Box plot: similar to a one-way scatter plot in that it requires a single axis, but it displays a summary of the data: the 25th, 50th and 75th percentiles, and the adjacent values, which are the most extreme observations that lie no more than 1.5 times the height of the box beyond either quartile.
• Outliers are represented by circles outside the specified range.
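The quantities a box plot displays can be computed for the 60 ages in the example data. The Python sketch below uses `statistics.quantiles` from the standard library (Python 3.8 or later) with its default method; other software may use a slightly different quartile convention, so the quartiles can differ a little between packages.

```python
import statistics

# The 60 ages from the example data.
ages = [
    68, 63, 42, 27, 30, 36, 28, 32,
    79, 27, 22, 28, 24, 25, 44, 65,
    43, 25, 74, 51, 36, 42, 28, 31,
    28, 25, 45, 12, 57, 51, 12, 32,
    49, 38, 42, 27, 31, 50, 38, 21,
    16, 24, 64, 47, 23, 22, 43, 27,
    49, 28, 23, 19, 11, 52, 46, 31,
    30, 43, 49, 12,
]

# 25th, 50th and 75th percentiles (the box and the line inside it).
q1, median, q3 = statistics.quantiles(ages, n=4)
box_height = q3 - q1

# Adjacent values: most extreme observations within 1.5 box heights of the quartiles.
lower_fence = q1 - 1.5 * box_height
upper_fence = q3 + 1.5 * box_height
inside = [age for age in ages if lower_fence <= age <= upper_fence]
lower_adjacent, upper_adjacent = min(inside), max(inside)

# Outliers: observations beyond the fences (drawn as circles).
outliers = sorted(age for age in ages if age < lower_fence or age > upper_fence)

# Direction of skew from the mean and median.
mean = statistics.mean(ages)
skew = "positive" if mean > median else "negative" if mean < median else "none"

print(f"Q1 {q1}, median {median}, Q3 {q3}")
print(f"Adjacent values {lower_adjacent} and {upper_adjacent}, outliers {outliers}")
print(f"Mean {mean:.2f}, skew: {skew}")
```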
Scatter Plots
• One-way scatter plot: uses a single axis to display the relative position of each observation in the data set.
• Two-way scatter plot: depicts the relationship between 2 continuous variables, where a point represents a pair of values, one on the x-axis and another on the y-axis.
Graphic Presentation of Data
the frequency polygon (quantitative data)
the histogram (quantitative data)
the bar graph (qualitative data)
General rules for designing graphs
• A graph should have a self-explanatory legend
• A graph should help the reader to understand the data
• Axes should be labeled, with units of measurement indicated
• Scales are important: start at zero (otherwise mark a break in the axis with //)
• Avoid graphs with a three-dimensional impression; they may be misleading (the reader visualizes them less easily)
GRAPHS

• Simple to construct and clear to read;
• Produce memorable visual images;
• Can easily display complex relationships which may not be evident in the raw data.
Any Questions?
