
DESCRIPTIVE STATISTICS
LECTURE 1
PM DR KHAIRIL ANUAR MD ISA
LEARNING OUTCOMES
At the end of this lecture, the student will
• understand the basic concepts and terminology of biostatistics, including types of variables,
measurement, and measurement scales.
• understand population parameters and sample statistics.
• understand how data can be appropriately organised and displayed (making an APA table,
a bar plot, a histogram for checking normality, and a box plot for assessing
outliers).
• understand how to reduce data sets into a few useful, descriptive measures.
• be able to calculate and interpret measures of central tendency, such as mean, median, and
mode.
• be able to calculate and interpret measures of dispersion, such as range, variance, and
standard deviation.
• understand the central limit theorem and when to apply it.
• be able to use basic R commands.
POPULATION vs. SAMPLE
• Suppose we have a population of size $N = 5$, consisting of the ages of five children who are outpatients in a
community health center.

• The ages are as follows: $x_1 = 6$, $x_2 = 8$, $x_3 = 10$, $x_4 = 12$, and $x_5 = 14$.


age <- c(6, 8, 10, 12, 14)

mean(age)
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{50}{5} = 10$

var(age)
$S^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N - 1} = \frac{40}{4} = 10$

Parameters ($N = 5$): $\mu$, $S^2$
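As a quick check, the population mean and variance above can be reproduced step by step in R. This is a minimal sketch; the names `N`, `mu`, and `S2` are chosen here for illustration. Note that R's `var()` divides by one less than the number of values, which matches the $N - 1$ denominator used for $S^2$ on this slide.

```r
# Ages of the five children (the whole population in this example)
age <- c(6, 8, 10, 12, 14)

N  <- length(age)                  # population size
mu <- sum(age) / N                 # population mean
S2 <- sum((age - mu)^2) / (N - 1)  # variance with the N - 1 denominator

mu        # same value as mean(age)
S2        # same value as var(age)
sqrt(S2)  # standard deviation, same value as sd(age)
```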
SAMPLE

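sample_age <- sample(age, size = 2, replace = FALSE)

Statistics (sample of size $n = 2$ drawn from the population of size $N = 5$)

mean(sample_age)
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

var(sample_age)
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$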
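Because `sample()` returns a different pair on each run, one way to see how sample statistics relate to the population values is to list every possible sample of size 2. The sketch below is an illustration added here, not part of the original slides; it uses base R's `combn()`, and the object names are assumptions.

```r
age <- c(6, 8, 10, 12, 14)

# One random sample of size 2; set.seed() only makes the draw repeatable
set.seed(1)
sample_age <- sample(age, size = 2, replace = FALSE)
mean(sample_age)
var(sample_age)

# All choose(5, 2) = 10 possible samples, one per column
all_samples  <- combn(age, 2)
sample_means <- colMeans(all_samples)
sample_vars  <- apply(all_samples, 2, var)

mean(sample_means)  # 10, the population mean
mean(sample_vars)   # 10, the value of S^2 computed with N - 1
```

Any single sample statistic can differ from the parameter, but averaged over all possible samples drawn without replacement, the sample mean and sample variance recover $\mu$ and $S^2$ in this example.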
install.packages("gtsummary")
library(gtsummary)

Children_age <- data.frame(
  Age = c(6, 8, 10, 12, 14),
  Gender = c("Male", "Female", "Male", "Female", "Male"),
  Race = c("Chinese", "Malays", "Indian", "Malays", "Chinese")
)

tbl_summary(
  data = Children_age,
  by = Gender,
  type = list(Age ~ "continuous"),
  statistic = list(Age ~ "{mean} ({sd})")
)

Characteristic    Female, N = 2¹    Male, N = 3¹
Age               10.00 (2.83)      10.00 (4.00)
Race
  Chinese         0 (0%)            2 (67%)
  Indian          0 (0%)            1 (33%)
  Malays          2 (100%)          0 (0%)
¹ Mean (SD); n (%)

Tip
For numerical variables with normally distributed data, we usually report the mean and standard deviation (SD).
However, when the numerical data are not normally distributed, the median and IQR are reported.
Categorical data are reported by their frequency and percentage.
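The reporting rules in the tip can be illustrated with base R alone. The sketch below recomputes the mean (SD) of age by gender, then shows the median and IQR and the frequency and percentage summaries. It is a rough equivalent for checking the numbers, not a description of what `tbl_summary()` does internally, and the object names are chosen for illustration.

```r
Children_age <- data.frame(
  Age    = c(6, 8, 10, 12, 14),
  Gender = c("Male", "Female", "Male", "Female", "Male"),
  Race   = c("Chinese", "Malays", "Indian", "Malays", "Chinese")
)

# Mean and SD of age within each gender (for normally distributed data)
age_mean <- tapply(Children_age$Age, Children_age$Gender, mean)
age_sd   <- tapply(Children_age$Age, Children_age$Gender, sd)
round(age_mean, 2)
round(age_sd, 2)

# Median and IQR (for data that are not normally distributed)
median(Children_age$Age)
IQR(Children_age$Age)

# Frequency and column percentage of race within each gender
race_n   <- table(Children_age$Race, Children_age$Gender)
race_pct <- round(100 * prop.table(race_n, margin = 2))
race_n
race_pct
```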
VIEWING DISTRIBUTION (ggplot2)
• How you visualize the distribution of a variable depends on whether the variable is categorical or numerical.
• Categorical variable
• Bar graph

• Numerical Variable
• Histogram (To show distribution)

• Box-plot (To visualise outliers)
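The three plot types listed above can be sketched with ggplot2 as follows. The data frame is the small `Children_age` example from the earlier slide, the plot object names are chosen for illustration, and the `binwidth` of 2 is an arbitrary choice for this tiny data set. With only five observations the plots mainly demonstrate the syntax.

```r
library(ggplot2)

Children_age <- data.frame(
  Age    = c(6, 8, 10, 12, 14),
  Gender = c("Male", "Female", "Male", "Female", "Male"),
  Race   = c("Chinese", "Malays", "Indian", "Malays", "Chinese")
)

# Categorical variable: bar graph of the number of children per race
bar_plot <- ggplot(Children_age, aes(x = Race)) +
  geom_bar()

# Numerical variable: histogram to show the distribution of age
hist_plot <- ggplot(Children_age, aes(x = Age)) +
  geom_histogram(binwidth = 2)

# Numerical variable: box plot to visualise outliers in age
box_plot <- ggplot(Children_age, aes(y = Age)) +
  geom_boxplot()
```

Typing an object's name at the console, for example `bar_plot`, draws the corresponding plot.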


STATISTICS

• Descriptive statistics
  • Measure of central tendency: Mean, Median, Mode
  • Measure of dispersion: Variance, SD, IQR, etc.
• Inferential statistics
  • Estimation: Point estimate, Confidence interval
  • Hypothesis testing: Ho/Ha, P-value calculation
