Basic Concepts On Statistics

Descriptive Statistics

By
Umesh Raj Aryal
Lecturer
Department of Community Medicine
Kathmandu Medical College
Affiliated to Kathmandu University
What is Statistics?

Statistics is the science that deals with collecting, organizing & interpreting data using well-defined procedures in order to make decisions.
Main Types of Statistics

• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics

Descriptive Statistics involves organizing, summarizing & displaying data to make them more understandable.

• The most commonly used descriptive statistics are frequencies, percentages, measures of central tendency, summary tables, charts & figures.
Inferential Statistics

• Inferential Statistics: a set of statistical techniques that provides predictions about population characteristics based on information from a sample drawn from that population.
Tabulating and Graphing Numerical Data

Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21

Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

• Frequency Distributions and Cumulative Distributions: Tables
• Graphs: Histograms, Polygons, Ogive

[Figure: example histogram (vertical axis: Frequency) and ogive]
Tabulating Numerical Data: Frequency Distributions
• Sort Raw Data in Ascending Order:
  12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find Range: 58 - 12 = 46
• Select Number of Classes: 5 (usually between 5 and 15)
• Compute Class Interval (Width): 10 (46/5 = 9.2, then round up)
• Determine Class Boundaries (Limits): 10, 20, 30, 40, 50, 60
• Compute Class Midpoints: 15, 25, 35, 45, 55
• Count Observations & Assign to Classes
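The class set-up steps above can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical, and rounding the raw width up to the next multiple of 10 is an assumption chosen because it reproduces the width used on the slide.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

num_classes = 5                       # chosen by the analyst (usually 5 to 15)
data_range = max(data) - min(data)    # 58 - 12 = 46

# Raw width is 46 / 5 = 9.2. Assumption: round up to the next multiple of 10
# so that the class limits are tidy numbers.
raw_width = data_range / num_classes
width = math.ceil(raw_width / 10) * 10

# Start the first class at the multiple of the width at or below the minimum.
start = (min(data) // width) * width
boundaries = [start + i * width for i in range(num_classes + 1)]

# Each midpoint is the average of the two limits of its class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

print("Range:", data_range)
print("Width:", width)
print("Boundaries:", boundaries)
print("Midpoints:", midpoints)
```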
Frequency Distributions, Relative Frequency Distributions and Percentage Distributions

Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class     Frequency    Relative Frequency    Percentage
0-10          0              0.00                  0
10-20         3              0.15                 15
20-30         6              0.30                 30
30-40         5              0.25                 25
40-50         4              0.20                 20
50-60         2              0.10                 10
Total        20              1.00                100

Each class includes its lower limit and excludes its upper limit, so the value 30 is counted in the class 30-40.
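The counts, relative frequencies and percentages in this table can be reproduced with a short Python sketch. The function name `frequency_table` is hypothetical, and the code assumes each class contains its lower limit but not its upper limit, which is the convention the counts in the table imply.

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
limits = [0, 10, 20, 30, 40, 50, 60]


def frequency_table(values, class_limits):
    """Return (label, count, relative frequency, percentage) per class [lo, hi)."""
    n = len(values)
    rows = []
    for lo, hi in zip(class_limits, class_limits[1:]):
        # Count the observations that fall in the half-open class [lo, hi).
        count = sum(1 for v in values if lo <= v < hi)
        rows.append((f"{lo}-{hi}", count, count / n, 100 * count / n))
    return rows


table = frequency_table(data, limits)
for label, count, relative, percent in table:
    print(f"{label:>6} {count:>4} {relative:>6.2f} {percent:>5.0f}")
```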
Tabulating Numerical Data: Cumulative Frequency

Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Lower Limit      Cumulative Frequency    Cumulative % Frequency
Less than 10             0                        0
Less than 20             3                       15
Less than 30             9                       45
Less than 40            14                       70
Less than 50            18                       90
Less than 60            20                      100
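A cumulative frequency is a running total of the class frequencies. The sketch below, with hypothetical variable names, starts from the class counts of the frequency distribution and accumulates them with `itertools.accumulate`.

```python
from itertools import accumulate

upper_limits = [10, 20, 30, 40, 50, 60]
frequencies = [0, 3, 6, 5, 4, 2]   # counts for classes 0-10, 10-20, ..., 50-60

# Running total of the class counts.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Express each running total as a percentage of all observations.
cumulative_percent = [100 * c / total for c in cumulative]

for limit, count, percent in zip(upper_limits, cumulative, cumulative_percent):
    print(f"Less than {limit}: {count} ({percent:.0f}%)")
```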
Tabulating and Graphing Univariate Categorical Data

Categorical Data
• Tabulating Data: The Summary Table, The Contingency Table
• Graphing Data: Bar Charts, Pie Charts, Pictogram, Cartogram

[Figure: bar chart of counts by Sex (Male, Female); bar labels 158669 and 84897; vertical axis from 0 to 200000]
Summary Table
(for occupation of the population)

Occupation       Number    Percent
Management         264       26.4
Clerical           382       38.2
Services           148       14.8
Manufacturing      206       20.6
Total             1000      100.0
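Each percentage in the summary table is the category count divided by the total, times 100. A minimal Python sketch, using a plain dictionary with hypothetical names:

```python
counts = {
    "Management": 264,
    "Clerical": 382,
    "Services": 148,
    "Manufacturing": 206,
}

total = sum(counts.values())

# Percentage of the population in each occupation.
percents = {occupation: 100 * number / total
            for occupation, number in counts.items()}

for occupation, number in counts.items():
    print(f"{occupation:<14} {number:>5} {percents[occupation]:>6.1f}")
print(f"{'Total':<14} {total:>5} {sum(percents.values()):>6.1f}")
```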
Tabulating for Bivariate Categorical Data

Contingency Table: Smoking habits and lung cancer

                 Lung Cancer
Smoking     Present    Absent    Total
Yes            92         8       100
No             10        90       100
Total         102        98       200

Numerical Descriptive Measures

Summary Measures
• Central Tendency: Mean, Median, Mode
• Quartile, Percentile
• Variation: Range, Inter Quartile Range, Variance, Standard Deviation, Coefficient of Variation
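Summary Measures

Central Tendency
• Mean: $\bar{x} = \frac{\sum x_i}{n}$
• Median: the value at position $\frac{n+1}{2}$ in the ordered data
• Mode: the maximum repeated value

Location
• Quartile: $Q_i$ is the value at position $\frac{i(n+1)}{4}$ in the ordered data
• Percentile: $P_i$ is the value at position $\frac{i(n+1)}{100}$ in the ordered data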
The Shape

• Negatively Skewed: Mean < Median < Mode
• Symmetric (Not Skewed): Mean = Median = Mode
• Positively Skewed: Mode < Median < Mean
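Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21

Summary Statistics
(Based on central values)

VARIABLE
N Valid        9
N Missing      0
Mean          29.22
Median        27.00
Mode          24.00

Note: the list above has 10 values, but the output reports N = 9. The reported figures match the list with one of the two 27s left out (sum 263, mean 263/9 = 29.22).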
Selecting among mean, median and mode

Mean > median, so the data are right skewed.
Here, the median is the better measure of central tendency.
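The three central values can be checked with Python's standard `statistics` module. The nine values below are an assumption: they are the listed data with one 27 removed, which is the set that reproduces the reported N = 9 and mean of 29.22.

```python
import statistics

# Assumed nine-value data set (the listed values with one 27 removed).
values = [41, 24, 32, 26, 27, 30, 24, 38, 21]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

print("Mean:", round(mean, 2))
print("Median:", median)
print("Mode:", mode)

# A mean above the median suggests a right (positive) skew.
print("Right skewed:", mean > median)
```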
Summary Statistics
(Based on noncentral values)

VARIABLE
N Valid        9
N Missing      0
Percentiles
  25          24.00
  40          26.00
  50          27.00
  70          32.00
  75          35.00
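These percentiles follow from the position rule i(n + 1)/100 with linear interpolation between neighbouring ordered values. The function below is a minimal sketch of that rule; the name `percentile` is hypothetical, and the nine values are assumed to be the listed data with one 27 removed, since that set matches N = 9.

```python
def percentile(values, p):
    """Return the p-th percentile using the (n + 1) position rule."""
    ordered = sorted(values)
    n = len(ordered)
    # 1-based position in the ordered data, clamped to the valid range.
    position = min(max(p * (n + 1) / 100, 1), n)
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part used to interpolate
    if lower >= n:
        return float(ordered[-1])
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + fraction * (above - below)


# Assumed nine-value data set (the listed values with one 27 removed).
values = [41, 24, 32, 26, 27, 30, 24, 38, 21]

results = {p: percentile(values, p) for p in (25, 40, 50, 70, 75)}
for p, value in results.items():
    print(f"P{p}: {value:.2f}")
```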
Measures of Variation

• Range: $X_{largest} - X_{smallest}$
• Interquartile Range: $IQR = Q_3 - Q_1$
• Standard Deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
• Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$
Summary Statistics (Variation)
(Based on noncentral values)

VARIABLE
N Valid        9
N Missing      0
Range         20
Minimum       21
Maximum       41
Quartiles
  25          24.00
  50          27.00
  75          35.00

$IQR = Q_3 - Q_1 = 35 - 24 = 11$
Data: 7, 4, 9, 7, 3, 12

Summary Statistics (Variation)
(Based on central values)

VARIABLE
N Valid            6
N Missing          0
Mean               7.00
Median             7.00
Mode               7
Std. Deviation     3.29
Variance          10.8
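The variance and standard deviation above come from the sample formula, which divides the sum of squared deviations by n - 1. A step-by-step sketch with hypothetical variable names:

```python
import math

values = [7, 4, 9, 7, 3, 12]
n = len(values)

mean = sum(values) / n

# Squared distance of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in values]

# Sample variance divides by n - 1, not n.
variance = sum(squared_deviations) / (n - 1)
std_dev = math.sqrt(variance)

print("Mean:", mean)
print("Sum of squared deviations:", sum(squared_deviations))
print("Variance:", round(variance, 2))
print("Std. deviation:", round(std_dev, 2))
```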
Comparing Coefficient of Variation

College A: Average Height = 155 cm, Standard deviation = 10 cm
College B: Average Height = 160 cm, Standard deviation = 5 cm

College A: $CV = \left(\frac{s}{\bar{x}}\right) \times 100\% = \left(\frac{10}{155}\right) \times 100\% = 6.45\%$
College B: $CV = \left(\frac{s}{\bar{x}}\right) \times 100\% = \left(\frac{5}{160}\right) \times 100\% = 3.13\%$

CV for College A > CV for College B, so there is more relative variation in height in College A.
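The comparison can be expressed as a small helper. The function name `coefficient_of_variation` is hypothetical; the inputs are the means and standard deviations given for the two colleges.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(10, 155)   # College A
cv_b = coefficient_of_variation(5, 160)    # College B

print(f"College A: {cv_a:.3f}%")
print(f"College B: {cv_b:.3f}%")
print("More relative variation in:", "College A" if cv_a > cv_b else "College B")
```

Because the CV is unit-free, it allows a fair comparison of spread between groups whose means differ.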
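Population Mean and Standard deviation

• Population Mean: $\mu = \frac{\sum x}{N}$
• Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
• Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

$S^2$ = Sample Variance, which divides by $n - 1$ instead of $N$.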
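The difference between the two divisors can be seen by applying both formulas to the same numbers. The sketch below reuses the six values from the earlier variation example; treating them as a whole population is an assumption made only for illustration.

```python
import statistics

values = [7, 4, 9, 7, 3, 12]

# Population formulas divide the sum of squared deviations by N.
population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)

# Sample formulas divide by n - 1, which gives a larger result.
sample_variance = statistics.variance(values)
sample_sd = statistics.stdev(values)

print("Population variance:", population_variance)
print("Population std. deviation:", population_sd)
print("Sample variance:", round(sample_variance, 2))
print("Sample std. deviation:", round(sample_sd, 2))
```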
Thank you