
Medical Statistics

医学统计学
Yinguang Fan (范引光)

fanyinguang@163.com
Dept. of Epi & Biostatistics
1
Do you know statistics?

• Nothing
• Known
– Central limit theorem.
– Least squares method.
– Logistic regression.
– ...
2
About Statistics
• "There are three kinds of lies: lies, damned lies, and statistics."
-- Benjamin Disraeli
• "Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital."
-- Aaron Levenstein
3
Report on the Quality of Life in Chinese (2004)

• 80% of Chinese people feel happy in daily life, and people living in rural areas were happier than those living in urban areas.

– It was a web survey.
– Happiness is a subjective feeling, and it is changeable.

• How can we get an objective answer about a subjective feeling?

• How would you assess your taste for salt: heavy, medium, or light?

4
Statistical thinking
• The reports of the National Bureau of Statistics.

• Macroscopic thinking

• Sampling method

• Sample size

• Results

• Statistical description

• Statistical inference
5
Lecture 1 : Introduction

I. Something about Medical Statistics

II. Basic concepts of statistical terms

III. Steps of medical statistics

6
Vocabulary for Lecture 1
Statistics 统计学
Statistic 统计量
Medical statistics 医学统计学
Mathematical statistics 数理统计学
Homogeneity 齐性、同质
Variation 变异
Population 总体
Sample 样本
Randomization 随机化
7
Vocabulary for Lecture 1
Finite population 有限总体
Infinite population 无限总体
Systematic error 系统误差
Random error 随机误差
Sampling error 抽样误差
Random event 随机事件
Small probability event 小概率事件
8
Health Statistics / Medical Statistics

• Medical Statistics
• Health Statistics
• Biostatistics
• Statistics for health professionals
• Statistics in public health

9
Why do we have to learn statistics?

• To know what the bikini conceals.
• Upgrade our medical knowledge system.
• Data treatment.
• Statistical thinking is extremely important.
• Medical research requires a statistical background.

10
The most important things

• Concentrate on your statistical work!

• Cross-check your data!

• Do not tamper with your database!

11
Teaching and Learning
• Concepts, principles and methods
-- lectures and practice.
• Applied statistics
– Practice sessions: experiments and discussion
– Statistical software: demonstrated in class (if you are interested)

12
Commonly used statistical software
Programming & point-and-click operation
• SAS: Statistical Analysis System
• SPSS: Statistical Package for the Social Sciences
• Stata
• R
• Minitab
• ...
• DPS: Data Processing System (MIC, yeah!)
13
Statistics
• The discipline concerned with the treatment of numerical data derived from groups of individuals (P. Armitage).

• The science and art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results (J. M. Last).
14
Medical Statistics

Application of statistics in the field of medicine.

15
Characteristics of Medical Statistics
• Numerical properties of groups rather than individuals.
• A collection of methods that enable one to draw reasonable conclusions from data.
• The art of making numerical conjectures about puzzling questions.
• Using quantity to reflect quality.
• Using the known to infer the unknown.
16
Theoretical basis
• Mathematical statistics
• Probability theory
• Two kinds of events:
– Certain or impossible events (non-random): probability = 1 or 0
– Random events: 0 < probability < 1
• Object of statistics: random events
17
II. Basic concepts of statistical terms
1. Individual & Variable
2. Population & Sample
3. Homogeneity & Variation
4. Frequency & Probability
5. Parameter & Statistic
6. Error
7. Types of data
8. The steps of statistical work
18
1. Individual & Variable (1)
• Individual (observation unit): the basic unit in statistical research; what counts as an individual depends on the purpose of the study.
– Each student in our class.
– Each class or grade in our school.

• Variable (indicator): a characteristic of an individual.
– Height, weight, gender and blood type of everyone in our class
– The number of students in each class of our school

19
1. Individual & Variable (2)
• Variable value: the value a variable takes.
– Height: 1.65 m; weight: 52 kg
– Gender: female; blood type: "O"
– Laboratory test: negative, positive
– Treatment effect: recovered, improved, not recovered
• Data: a collection of many variable values.
– Data for blood glucose
– Data for HIV tests
20
2. Population and sample (1)
• Population: the whole collection of individuals that one intends to study.
– All undergraduates aged 20 in Hefei in 2018.
– All people living with HIV in China.
– All the students in our class.

21
2. Population and sample (2)
• Population
– Finite population: the population is limited in space and time.
– Infinite population: the population has no limits in space or time. It exists only in our imagination.

22
2. Population and sample (3)
• Sample: a representative part of the population.
– 20 students from our class.
– 5 ml of blood from our body, tested for hepatitis C, syphilis and HIV.
• A sample is representative when:
– It is selected by randomization.
– The sample size is big enough.

23
2. Population and sample (4)
• Randomization: selecting subjects/individuals from the population at random.
• It guarantees that each individual in the population has the same probability of being selected into the sample.
– Select 5 students from our class: the probabilities are 5/40, 4/39, 3/38, 2/37, 1/36.
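The probabilities listed for the five draws can be checked with a short sketch. It assumes a class of 40 students (implied by the denominator 40) and uses simple random sampling without replacement; the function names and the student numbering are illustrative only. Multiplying the five fractions gives the chance that one particular group of five is drawn, while each individual student ends up in the sample with probability 5/40.

```python
import random
from fractions import Fraction
from math import comb

CLASS_SIZE = 40   # assumed class size, implied by the denominator 40
SAMPLE_SIZE = 5

def prob_specific_group(n: int, k: int) -> Fraction:
    """Chance that one particular group of k students is drawn from n."""
    p = Fraction(1)
    for i in range(k):
        # On draw i, (k - i) wanted students remain among (n - i) students.
        p *= Fraction(k - i, n - i)
    return p

def inclusion_frequency(n: int, k: int, student: int,
                        repeats: int, seed: int = 0) -> float:
    """Share of simulated samples that contain the given student."""
    rng = random.Random(seed)
    hits = sum(student in rng.sample(range(n), k) for _ in range(repeats))
    return hits / repeats

group_prob = prob_specific_group(CLASS_SIZE, SAMPLE_SIZE)
freq = inclusion_frequency(CLASS_SIZE, SAMPLE_SIZE, student=7, repeats=20_000)

# The product equals 1 / C(40, 5): every group of five is equally likely.
print(group_prob == Fraction(1, comb(CLASS_SIZE, SAMPLE_SIZE)))
# The simulated inclusion frequency should be close to 5/40 = 0.125.
print(freq)
```

Because every possible group of five has the same chance, no student is favoured, which is what makes the sample representative on average.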

24
2. Population and sample (5)
Question: How can we select individuals randomly?
• Toss a coin, as before a football game.
• Draw a card from a black box (this is how we selected our dormitory monitor).
• Use welfare lottery numbers.

25
2. Population and sample (6)
Population --(randomization)--> Sample
Population <--(inference)-- Sample
Population characteristics (parameter) | Sample information (statistic)

Note: Sampling is the way to get information, but inferring the population is our ultimate aim.
26
3. Homogeneity & Variation (1)
• Homogeneity: characteristics common to the given individuals.

– All individuals are Chinese women aged 30 years who live with HIV in a rural area of Fuyang city, Anhui province.

Homogeneity in nationality, gender, age, living conditions and disease.
27
3. Homogeneity & Variation (2)
• Variation: the differences among the variable values of homogeneous individuals.

– All individuals are Chinese women aged 30 years who are infected with HIV in a rural area of Fuyang city, Anhui province.

– Their modes of HIV infection, the side effects of their antiretroviral therapy, and their outcomes differ.
28
3. Homogeneity & Variation (3)
• If there is no variation, there is no need for statistics.

• We always study variation on the basis of homogeneity.

Give an example that distinguishes homogeneity from variation.

29
3. Homogeneity & Variation (4)
• Toss a coin: the marked face may land up or down (the same holds for throwing a die).
• Treat patients suffering from pneumonia with the same antibiotic: some of them recover and others do not.

30
4. Frequency & Probability (1)
• Frequency: under the same conditions, repeat a trial n times independently (for example, drawing playing cards). If event A (drawing a king) occurs m times among the n trials, the ratio m / n is called the frequency of random event A in n trials.
Number of observations: n (large enough)
Number of occurrences of random event A: m

Then:
P(A) ≈ m / n
31
4. Frequency & Probability (2)
• Probability: the likelihood of a random event.

Under the same conditions, repeat a trial N times independently. If event A occurs f times among the N trials, the ratio f / N is called the frequency of random event A. As N increases, the frequency f / N approaches a constant. This constant is called the probability of random event A and is written P(A). It is commonly abbreviated as P.
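The idea that the frequency f / N settles near a constant as N grows can be seen in a small simulation. This is only a sketch: the event is a simulated coin landing heads, the true probability of 0.5 and the function name `relative_frequency` are assumptions made for illustration.

```python
import random

def relative_frequency(n_trials: int, p_true: float = 0.5, seed: int = 1) -> float:
    """Repeat a trial n_trials times and return f / N for event A.

    Event A occurs in a single trial with probability p_true
    (an assumed value, e.g. a fair coin landing heads).
    """
    rng = random.Random(seed)
    occurrences = sum(rng.random() < p_true for _ in range(n_trials))
    return occurrences / n_trials

# The frequency fluctuates for small N and stabilises as N grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With few trials the printed frequency can be far from 0.5; with many trials it stays close to it, which is why the frequency from a large number of trials is used as an estimate of the probability.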

32
4. Frequency & Probability (3)
• Frequency describes the sample, while probability describes the population.
• m / n is an estimate of P(A).
• As the number of trials increases, the estimate becomes more reliable.

33
4. Frequency & Probability (4)
0 ≤ P(A) ≤ 1
If P(A) = 1, A is a certain event (a non-random event).
If P(A) = 0, A is an impossible event (a non-random event).
If 0 < P(A) < 1, A is a random event.

34
4. Frequency & Probability (4)
• Random event: an event that may or may not occur in a single experiment.
• Before the experiment, nobody can be sure whether the event will occur.
• Over a large number of experiments, however, a regular pattern emerges.

Give some examples of random events.

35
4. Frequency & Probability (5)
• Small probability event: because conclusions are drawn at a certain significance level, statisticians usually take P(A) ≤ 0.05 or P(A) ≤ 0.01 as the criterion. Events with P(A) ≤ 0.05 or P(A) ≤ 0.01 are therefore called small probability events.
• Such events happen rarely and can be regarded as not occurring in a single sampling or trial.
36
5. Parameter & Statistic (1)
• Parameter: a measure of the population, or of the distribution of the population.

• A parameter is usually represented by a Greek letter, such as μ or π.

• Parameters are usually unknown.

• To estimate the parameter of a population, we need a sample.

37
5. Parameter & Statistic (2)
• Statistic: a measure of the sample, or of the distribution of the sample.
• A statistic is usually represented by a Latin letter, such as s or p.

• Question 1: Give an example that distinguishes a parameter from a statistic.
• Question 2: Does a parameter vary? Does a statistic vary?
38
5. Parameter & Statistic (3)
Population (parameter)
– Sample 1 → Statistic 1
– Sample 2 → Statistic 2
– Sample 3 → Statistic 3
– Sample 4 → Statistic 4
39
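The relationship between one population, several samples and their statistics can be sketched in code. Everything here is hypothetical: the population is 1,000 simulated heights, the sample size of 20 is arbitrary, and the mean is used as the measure of interest. The population mean plays the role of the parameter and stays fixed, while each sample gives its own sample mean.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical finite population: heights (cm) of 1,000 people.
population = [rng.gauss(165, 8) for _ in range(1000)]
parameter = statistics.mean(population)   # fixed for this population

# Draw four random samples of 20 and compute the statistic for each.
sample_means = []
for _ in range(4):
    sample = rng.sample(population, 20)
    sample_means.append(statistics.mean(sample))

print("parameter:", round(parameter, 2))
print("statistics:", [round(m, 2) for m in sample_means])
```

The four sample means differ from one another and from the population mean; that difference is the sampling error discussed under "Error".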
6. Error (1)
• Error: the difference between a measured value and the true value.
• Errors are classified differently in different disciplines.
• In statistics:
– Systematic error
– Random error
– Sampling error
40
6. Error (2)
• Systematic error: an error produced in an experiment that stays constant or changes according to certain rules.
• Usually, the causes are known and the error is controllable.
– The weight shown by an electronic scale is 2 kilograms higher than our actual weight.
41
6. Error (3)
• Random error: an unstable error that changes at random and is caused by uncontrolled factors.
• Commonly, random errors refer to the errors that appear during repeated measurements.
– My height or blood pressure differs between morning, noon and night.
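The difference between systematic and random error shows up when many repeated measurements are averaged. The sketch below reuses the scale example (a reading 2 kg too high); the true weight of 52 kg, the size of the random noise and the function name `read_scale` are assumptions chosen for illustration.

```python
import random
import statistics

TRUE_WEIGHT = 52.0   # kg, hypothetical true value
BIAS = 2.0           # kg, the scale reads 2 kg too high (systematic error)
NOISE_SD = 0.3       # kg, assumed spread of the random error

rng = random.Random(0)

def read_scale(biased: bool) -> float:
    """One reading: true value + optional constant bias + random noise."""
    noise = rng.gauss(0, NOISE_SD)
    return TRUE_WEIGHT + (BIAS if biased else 0.0) + noise

good_scale = [read_scale(biased=False) for _ in range(1000)]
bad_scale = [read_scale(biased=True) for _ in range(1000)]

mean_good = statistics.mean(good_scale)
mean_bad = statistics.mean(bad_scale)

# Random error largely cancels out in the average; the bias does not.
print(round(mean_good, 2), round(mean_bad, 2))
```

Averaging repeated readings reduces random error, but a systematic error has to be removed at its source, for example by calibrating the scale.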
42
6. Error (4)
• Sampling error: the error caused by studying only a sample rather than the whole population.
– Sampling method
– Individual differences
• In statistics, sampling error is the main object of study.
– The average score of ten randomly selected students is not equal to that of our whole class.
43
6. Error (5)
• When a teacher marked the examination papers, he used a wrong answer for a multiple-choice question. He then gave an essay answer 7 points in the morning but gave the same answer 8 points on another paper in the afternoon. In the end, the average score of the class was 87 points, while that of a group of 10 students was 80 points.
• Which types of error occurred above?

44
7. Types of data (1)
• Measurement data: quantitative/numerical data.
• Measurement data always have units of measurement.
• If a variable can take any value between its minimum and maximum, it is called a continuous variable; otherwise, it is called a discrete variable.
– Continuous variable:
• Height (meters), weight (kilograms)
– Discrete variable:
• Number of teeth, number of children
45
7. Types of data (2)
• Enumeration data: qualitative/count data.
• The observation units must first be classified into categories and then counted.
• The values represent different characteristics or categories.
– Binomial: gender, yes or no, pass or fail in an exam.
– Multiple: blood type, occupation, zodiac sign.

46
7. Types of data (3)

• Ranked data: ordinal/semi-quantitative data.

• The observation units must first be classified into classes according to degree, and then the frequency of each class is counted.
• The values represent the order of individuals.
– Degree of burn: Ⅰ, Ⅱ, Ⅲ.
– Steak: well done, medium well, medium, medium rare, rare.
47
7. Types of data (4)
Which type of data does each belong to?
• 45 000 people
• 4.5 thousand people
• WBC (9.41 × 10^9/L)
• Blood type B
• Reaction to the medicine (+++)
• Target cell death
48
7. Types of data (5)

• If we measure the blood pressure of the students in our class, what kind of data do we get?

– Reading the indicator directly: 90 mmHg
– Enlistment physical: pass / fail
– Clinical screening: low, normal or high blood pressure

49
7. Types of data (6)
Data transformation:
Quantitative data → ranked data (multiple) → binomial data
• The choice of statistical method depends to a great extent on the data type.
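The transformation from quantitative to ranked to binomial data can be sketched with the blood pressure example. The cut-off values (90 and 140 mmHg for systolic pressure), the readings and the function names are assumptions made purely for illustration, not clinical criteria.

```python
def to_ranked(systolic_mmhg: float, low: float = 90, high: float = 140) -> str:
    """Quantitative -> ranked data, using illustrative cut-offs."""
    if systolic_mmhg < low:
        return "low"
    if systolic_mmhg >= high:
        return "high"
    return "normal"

def to_binomial(systolic_mmhg: float) -> str:
    """Ranked -> binomial data: only 'normal' counts as a pass."""
    return "pass" if to_ranked(systolic_mmhg) == "normal" else "fail"

# Hypothetical readings in mmHg (quantitative data).
readings = [85, 90, 118, 139, 140, 162]

ranked = [to_ranked(x) for x in readings]
binomial = [to_binomial(x) for x in readings]

print(ranked)    # ['low', 'normal', 'normal', 'normal', 'high', 'high']
print(binomial)  # ['fail', 'pass', 'pass', 'pass', 'fail', 'fail']
```

Each step discards information: the readings 85 and 162 both become "fail". The transformation therefore runs in one direction only, and binomial data cannot be turned back into measurements.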
50
7. Types of data (7)

Data
– Measurement data
• Continuous variable
• Discrete variable
– Enumeration data
• Unordered categories: binomial, multiple
• Ordinal categories

51
III. Steps of medical statistics

• Design

• Data Collection

• Data Compilation

• Data Analysis

52
Design (1)
• A good design always covers both professional design and statistical design.
• Professional design:
– Research aims
– Subjects
– Measures
• Professional design should ensure that the research is useful and advanced.
53
Design (2)
• Statistical design involves all the arrangements for data collection, sorting and analysis.
– Sampling or allocation method
– Sample size
– Randomization
– Data processing
• Statistical design should ensure that the research is reliable and economical (in manpower, material resources and time).
54
Design (3)
Experimental study
A type of evaluation that seeks to determine whether a program or intervention had the intended causal effect on program participants.
• Three key components:
♠ a pre-post test design;
♠ a treatment group and a control group;
♠ random assignment of study participants.
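Random assignment of participants to a treatment group and a control group can be sketched as follows. The participant labels, the equal split and the function name `random_assignment` are illustrative assumptions; real trials often use more elaborate schemes such as block or stratified randomization.

```python
import random

def random_assignment(participants, seed=None):
    """Shuffle the participants and split them into two groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy, so the input order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs P01 ... P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = random_assignment(participants, seed=2024)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who receives the intervention, known and unknown characteristics tend to balance out between the two groups.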

55
Statistical Design (2)
Observational study
A study in which inferences are drawn or hypotheses are tested through observational methods, without any intervention or experimentation.

How can we assess the effect of a medicine for influenza?

56
Data Collection (1)
• Collecting data can be time-consuming and expensive. One need not collect every possible piece of data to make a decision.
• Sampling methods are designed to gain maximum information at minimum cost.

57
Data Collection (2)
• Objective : to gather accurate and reliable raw
data
• Data sources :
– statistical reporting
– routine records
– purposive surveys or experiments
– statistical yearbook and special data book
• Requirements :
– Randomization
– sufficient sample size
58
Data Collection (3)
• Steps in Data Collection
– State the purpose for collecting data
– Determine sources
– Determine data capture and presentation methods
– Train personnel
– Collect the data accurately
– Document the work

59
Data Collection (4)
• The big issues in data collection:
– The extent to which a given set of data can be relied on depends on its quality.
– Data design quality
• Are the data relevant to the problem we wish to solve?
– Data production quality
• Were the data collected with sufficient skill and care?

60
Data sorting (1)
• It is the process of cleaning and systematizing raw data.
• Data sorting should meet the requirements of the next step: data analysis.

61
Data sorting (2)
• Checking: by hand or with computer software
• Amending
• Missing data?
• Grouping
– By categorical variables (sex, occupation, disease…)
– By numerical variables (age, income, height…)
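Checking for missing data and grouping by a categorical and a numerical variable can be sketched in a few lines. The records, the field names and the age bands below are hypothetical and serve only to illustrate the steps.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
records = [
    {"id": 1, "sex": "F", "age": 23},
    {"id": 2, "sex": "M", "age": 35},
    {"id": 3, "sex": "F", "age": 61},
    {"id": 4, "sex": "M", "age": None},
    {"id": 5, "sex": "F", "age": 47},
    {"id": 6, "sex": None, "age": 29},
]

def age_group(age: int) -> str:
    """Group a numerical variable into illustrative age bands."""
    if age < 30:
        return "<30"
    if age < 50:
        return "30-49"
    return "50+"

# Checking: flag records with missing values for follow-up.
missing_ids = [r["id"] for r in records
               if r["sex"] is None or r["age"] is None]
complete = [r for r in records if r["id"] not in missing_ids]

# Grouping: by a categorical variable and by a numerical variable.
by_sex = Counter(r["sex"] for r in complete)
by_age = Counter(age_group(r["age"]) for r in complete)

print(missing_ids)
print(dict(by_sex))
print(dict(by_age))
```

Here incomplete records are simply set aside; whether to amend, impute or exclude them is a decision that should be made and documented for each study.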

62
Data Analysis
• To reveal the patterns hidden in the data.
• It includes two aspects:
– Statistical description: the process of describing the characteristics of data with statistical figures, statistical tables and statistical indicators.
– Statistical inference: the process of using a sample statistic to infer a population parameter. It consists of parameter estimation and hypothesis testing.
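The two aspects of data analysis can be illustrated on a small sample. The blood glucose values, the hypothesized population mean of 5.0 mmol/L and the use of a normal approximation are all assumptions made for this sketch; with only 12 observations a t-distribution would usually be preferred.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample: fasting blood glucose (mmol/L) of 12 people.
sample = [4.8, 5.1, 5.4, 4.9, 5.6, 5.0, 5.3, 4.7, 5.2, 5.5, 5.1, 4.9]
n = len(sample)

# Statistical description: indicators summarising the sample.
mean = statistics.mean(sample)
sd = statistics.stdev(sample)        # sample standard deviation
se = sd / n ** 0.5                   # standard error of the mean

# Statistical inference 1: parameter estimation (95% confidence interval).
z = NormalDist().inv_cdf(0.975)      # about 1.96, normal approximation
ci = (mean - z * se, mean + z * se)

# Statistical inference 2: hypothesis testing (two-sided z test).
MU0 = 5.0                            # assumed hypothesized population mean
z_stat = (mean - MU0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"mean={mean:.3f}, sd={sd:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z={z_stat:.3f}, p={p_value:.3f}")
```

The mean and standard deviation describe the sample itself, while the confidence interval and the p-value are statements about the unknown population mean.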

63
Statistical analysis
– Statistical description
• Indicator
• Table and chart
– Statistical inference
• Parameter estimation
• Hypothesis testing
64
65
