M6 M7 Biostat Final


Biostatistics and Epidemiology

Dr. Nathalie Ting, 4th yr, 1st sem

M6 - Biostatistics

Outline

Measures of central tendency
a. Mean
b. Median
c. Mode

Measures of variability
a. Range
b. Variance
c. Standard deviation
d. Mean deviation

Confidence intervals
➢ Normal distribution and skewness
➢ Sampling distribution

Measures of central tendency

• Are “typical values”
• Usually central values that best represent an entire distribution of data

Mean

• Arithmetic mean: the more technical name for what is more commonly called the mean or average
• The value that is closest to all the other values in a distribution
• Best descriptive measure for data that are normally distributed

Mean = (sum of all observations) / (number of observations)

Example: The high temperatures for a 7-day week during December in Chicago were 29°, 31°, 28°, 32°, 29°, 27° and 55°. Find the mean high temperature for the week.

Mean = (29 + 31 + 28 + 32 + 29 + 27 + 55) / 7
Mean = 231 / 7
Mean = 33

The mean, or average, high temperature for the week was 33°.

Median

• The middlemost observation in a set of data that has been put in numerical order (an array)
• A good descriptive measure, particularly for data that are skewed, because it is the central point of the distribution
• Middlemost observation, if n is odd
• Mean of the 2 middlemost observations, if n is even

Median (odd)

Ages in years for patients with a particular illness: 4, 23, 28, 31, 32

• The median age is 28 years.

Interpretation: about fifty percent of the patients are 28 years old or younger and about 50% are 28 years old or older.

Median (even)

Ages in years for patients with a particular illness: 1, 3, 4, 5, 6, 7, 7, 8, 9, 10

Median = (6 + 7) / 2 = 6.5 yrs

• The median age is 6.5 years.

Interpretation: fifty percent of the patients are 6.5 years old or younger and 50% are older than 6.5 years.

Position of the median

Middle position = (n + 1) / 2

where n is the sample size or the total number of observations in the set.

For 1, 3, 4, 5, 6, 7, 7, 8, 9, 10:

Position of the median = (10 + 1) / 2 = 5.5th observation, i.e. halfway between the 5th observation (6) and the 6th observation (7)

Median = (6 + 7) / 2 = 6.5 yrs

Mode

• The most frequently occurring value in a set of observations
• Easiest measure to understand and explain. Also the easiest to identify, and requires no calculations
• Always located at the peak of the distribution
• Preferred for addressing which value is the most popular or the most common
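The three measures of central tendency can be checked with a short Python sketch. It uses only the standard-library `statistics` module and the temperature and age data from these notes; the variable names are illustrative, not part of the lecture.

```python
from statistics import mean, median, mode

# Data taken from the worked examples in the notes
temps = [29, 31, 28, 32, 29, 27, 55]          # Chicago high temperatures
ages_odd = [4, 23, 28, 31, 32]                # n is odd
ages_even = [1, 3, 4, 5, 6, 7, 7, 8, 9, 10]   # n is even

temp_mean = mean(temps)        # sum of observations / number of observations
temp_median = median(temps)    # middle value once the data are sorted
temp_mode = mode(temps)        # most frequently occurring value

median_odd = median(ages_odd)    # the single middlemost observation
median_even = median(ages_even)  # mean of the two middlemost observations

# Position of the median, (n + 1) / 2, for the even-sized example
position = (len(ages_even) + 1) / 2

print(temp_mean, temp_median, temp_mode)
print(median_odd, median_even, position)
```

In the temperature data the single extreme value (55°) pulls the mean up to 33 while the median stays at 29, which is why the median is preferred for skewed data.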

JOANA BOLA 1
Skewed left (negatively)

- The median exceeds the mean

Skewed right (positively)

- The mean exceeds the median

Measures of Dispersion

• Or measures of spread
• The second important feature of frequency distributions
• Measures of central location describe where the peak is located; measures of spread describe the variation of values from that peak in the distribution.

The range

• The simplest measure of variability
• The difference between the highest and lowest value

Variance

• Is the average of the squared deviations from the mean
• The n deviations need to be squared because the sum of the deviations from the mean is equal to zero

The variance is expressed in squared units, so it does not express the concept in terms of the original units.

Example: A clinic director believes quality care begins from the moment the patient walks into his office. He has learned from the patient satisfaction survey that one of the areas where improvement should be made is the waiting time in the lobby. He asked his assistant to randomly select one patient a day and carefully observe the waiting time of the patient. The waiting time data for this week, Monday through Saturday, are 35, 23, 10, 40, 28, 21 (minutes). What is the variance of his patients' waiting time?

Standard deviation

• The square root of the variance
• Provides a measure of dispersion in the original units
• The measure of spread used most commonly with the arithmetic mean
• Conveys how widely or tightly the observations are distributed around the center
• Usually calculated only when the data are more-or-less "normally distributed" (bell-shaped curve)

Mean Deviation

• Is the average of the absolute deviations from the arithmetic mean. It gives a rough estimate of how far the individual scores lie from the mean score.
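As a rough illustration of these measures of dispersion, the sketch below applies them to the clinic waiting times. Two assumptions are made here and are not stated in the notes: the last two values are read as 28 and 21 (six values for Monday through Saturday), and both the population form (divide by n) and the sample form (divide by n - 1) of the variance are shown, because the notes do not say which divisor the lecturer intends.

```python
from statistics import fmean, pvariance, variance, stdev

# Assumption: "28.21" in the source is read as two values, 28 and 21
waiting_times = [35, 23, 10, 40, 28, 21]  # minutes, Monday to Saturday

mean_wait = fmean(waiting_times)
data_range = max(waiting_times) - min(waiting_times)

# Deviations from the mean always sum to (numerically) zero,
# which is why they are squared or taken in absolute value.
deviations = [x - mean_wait for x in waiting_times]
sum_of_deviations = sum(deviations)

mean_deviation = fmean(abs(d) for d in deviations)

population_variance = pvariance(waiting_times)  # divides by n
sample_variance = variance(waiting_times)       # divides by n - 1
sample_sd = stdev(waiting_times)                # square root of sample variance

print(f"mean = {mean_wait:.2f} min, range = {data_range} min")
print(f"mean deviation = {mean_deviation:.2f} min")
print(f"variance: population = {population_variance:.2f}, sample = {sample_variance:.2f} (min^2)")
print(f"sample standard deviation = {sample_sd:.2f} min")
```

The variance comes out in squared minutes, while the range, mean deviation and standard deviation are all back in minutes.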

• The mean deviation is seldom used

Exercise:

A study was performed to evaluate the strength of a certain prefabricated post for post-and-core buildups in endodontically treated teeth. Seven canines of similar length were treated endodontically and prepared with standardized drill bits to receive the respective posts. A composite base resin was used for the buildup. The length of the teeth was approximately 18 mm, and 4 mm of gutta-percha was left in the apex. The posts were sandblasted and cemented with composite cement. The following fracture load data were collected after thermocycling the tooth samples and testing them under load using an Instron machine at 45 degrees: 109.73, 44.83, 42.86, 57.43, 74.23, 95.88, 48.04 kg. What are the mean, variance and standard deviation? Give answers to 1 decimal place.

Statistical inference

• Process of generalizing or drawing conclusions about the target population on the basis of results obtained from a sample

Parameter

• Measures computed using data from the entire population
• Constant (regardless of sample); usually unknown

Statistics

• Measures computed using data from a sample
• Variable (depends on the sample)

Sampling variation

• The statistic that we can compute from the sample depends on which units were randomly included in the sample
• Repeating the sampling may result in a different statistic
• Statistics vary from sample to sample (random variables)

Statistical inference

2 main types

➢ Estimation
➢ Hypothesis testing

Estimation

• Estimation is the process by which a statistic computed for a random sample is used to approximate (“estimate”) the corresponding parameter

1. Get data from sample respondents
2. Calculate summary measure (statistic)
3. Use statistic to estimate parameter value

Hypothesis testing

• Process of deciding whether or not a hypothesis about the target population is true based on sample data

Hypothesis

• A statement about the population
• Usually something about the values of parameters

1. Get data from sample respondents
2. Calculate summary measure (statistic)
3. Apply statistical test
4. Decide whether to reject the hypothesis

Example: To estimate the average length of hospital stay of patients admitted for dengue fever in Metro Manila based on a sample:

• Target population
• Variable
• Parameter
• Statistic
• Estimation/hypothesis testing
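The difference between a parameter and a statistic, and the idea of sampling variation, can be sketched with a small simulation. Everything numeric below is hypothetical: the population of 10,000 lengths of stay is generated artificially (roughly 5 days on average) purely for illustration and is not data from the notes.

```python
import random
from statistics import fmean

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical target population: length of stay (days) of 10,000 admissions
population = [max(1, round(rng.gauss(5, 1.5))) for _ in range(10_000)]

# Parameter: computed from the entire population, constant, usually unknown
parameter = fmean(population)

# Statistics: computed from samples, so they vary from sample to sample
sample_means = [fmean(rng.sample(population, 30)) for _ in range(5)]

print(f"population mean (parameter): {parameter:.2f} days")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f} days")
```

Each sample mean is an estimate of the same parameter, yet the five values differ because different patients happened to be drawn each time.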

Example: To test if the average length of hospital stay of patients admitted for dengue fever in Metro Manila is equal to 5 days based on a sample:

• Target population
• Variable
• Hypothesis
• Statistic
• Estimation/hypothesis testing

Importance of normal distribution

Important role in statistical inference

• Statistical inference usually requires assuming normality of the distribution. The normal distribution is useful in explaining the distributions of most biological measures
• Most measurements are assumed to be normally distributed (height, weight, etc.)

Properties of the normal distribution

• Bell shaped
• Mean = median = mode at the center
• Total area under the curve (AUC) = 1 or 100%
• Area = AUC = probability
• Long tapering tails that extend infinitely in both directions, but never touch the x-axis

Shape is completely determined by 2 parameters:

• Mean: determines the curve's location on the x-axis
• Standard deviation: determines the spread of the curve

Sampling distribution of x̄ is approximately normally distributed

• If the population is normally distributed, the sampling distribution of x̄ is normally distributed
• If the population is NOT normally distributed, the sampling distribution of x̄ will approximate normality if n is large enough (central limit theorem)

Estimation

Point estimate

• Single number estimate of the parameter

Interval estimate

• Estimate of the parameter within a range of values

Point estimate

• The best estimate of the population mean µ is the sample mean x̄
• Subject to potential sampling error
• x̄ will not always be exactly equal to µ, because x̄ is a random variable that will vary depending on the sample taken

It is more informative to report the estimate within a range of values.
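The behaviour of the sampling distribution of x̄ for a non-normal population can be sketched by simulation. The population here is an assumption chosen for illustration only: a right-skewed exponential distribution with mean 5 (and therefore standard deviation 5), with samples of n = 40.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(42)  # fixed seed for reproducibility

POP_MEAN = 5.0   # hypothetical skewed population: exponential with mean 5
POP_SD = 5.0     # for an exponential distribution, sd equals the mean
N = 40           # size of each sample
REPS = 5000      # number of repeated samples

def sample_mean(n):
    """Draw one sample of size n and return its mean (one value of x-bar)."""
    return fmean(rng.expovariate(1 / POP_MEAN) for _ in range(n))

means = [sample_mean(N) for _ in range(REPS)]

center = fmean(means)                   # should be close to the population mean
spread = pstdev(means)                  # should be close to sigma / sqrt(n)
expected_spread = POP_SD / N ** 0.5

# For a normal curve, roughly 68% of values lie within 1 sd of the center
within_1sd = sum(abs(m - center) <= spread for m in means) / REPS

print(f"mean of sample means = {center:.2f} (population mean = {POP_MEAN})")
print(f"sd of sample means   = {spread:.2f} (sigma/sqrt(n) = {expected_spread:.2f})")
print(f"share within 1 sd    = {within_1sd:.1%}")
```

Even though individual lengths drawn from this population are strongly skewed, the sample means cluster symmetrically around the population mean, which is the point estimate's target.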
Interval estimate

• x̄ is an estimate of µ, but they may not be exactly equal
• We can add a certain amount to x̄, and subtract it from x̄, to create an interval that we believe contains µ
• z represents any value in the standard normal distribution
• Level of significance (α) = 100% - desired confidence coefficient

Have to decide

• How confident do we want to be that the interval contains µ?
• This is known as the confidence level or confidence coefficient
• Generally, we select either 90%, 95% or 99%
• The interval constructed is called the confidence interval

Examples:

• Confidence level = 90% or 0.90; α = 0.10 or 10%
• Confidence level = 99% or 0.99; α = 0.01 or 1%

Z values corresponding to each confidence level: 90%: 1.645, 95%: 1.96, 99%: 2.576

Interval estimate example

The study objective is to estimate the mean weight of all school-aged children. We randomly sample 25 children and record each person's weight in pounds. The mean was found to be 74.16 lbs. Based on the literature, the standard deviation of weight is 5.37 lbs. Compute the 95% confidence interval estimate.

x̄ = 74.16 lbs
σ = 5.37 lbs
n = 25

Interval estimate = x̄ ± z(σ/√n) = 74.16 ± 1.96(5.37/√25) = 74.16 ± 2.105

Interpretation:

• We are 95% confident that the PARAMETER is between LOWER LIMIT and UPPER LIMIT
• We are 95% confident that the mean weight of all school-aged children is between 72.05 and 76.26 lbs
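The interval estimate for the weight example can be reproduced with a short sketch. It assumes the usual known-σ formula x̄ ± z·σ/√n (consistent with the quantities given in the example) and uses `statistics.NormalDist` from the standard library to look up z; the function name is illustrative.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sd (sigma) is known."""
    alpha = 1 - confidence                    # level of significance
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value for the chosen confidence level
    margin = z * sigma / n ** 0.5             # amount added to and subtracted from x-bar
    return sample_mean - margin, sample_mean + margin

# z values for the three commonly used confidence levels
z_values = {
    level: NormalDist().inv_cdf(1 - (1 - level) / 2)
    for level in (0.90, 0.95, 0.99)
}

# Values from the school-aged children example
lower, upper = z_confidence_interval(sample_mean=74.16, sigma=5.37, n=25)

for level, z in z_values.items():
    print(f"{level:.0%} confidence: z = {z:.3f}")
print(f"95% CI for mean weight: {lower:.3f} to {upper:.3f} lbs")
```

The limits agree with the interval reported in the notes up to rounding of the last digit.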

Steps in hypothesis testing

1. Stating the null hypothesis Ho and alternative hypothesis HA
2. Stating the level of significance, α
3. Choosing the test statistic and determining its sampling distribution
4. Determining the critical region
5. Computing the test statistic
6. Making a statistical decision, i.e. whether or not to reject the null hypothesis
7. Drawing conclusions about the population

State the null and alternative hypothesis

Hypothesis

• Should be cast in a suitable form (testable)
• Should open the way for statistical assessment (permits computation of a test statistic)

Ho is the hypothesis of no difference, usually a statement of equality.

HA is the research hypothesis, the hypothesis that the investigator believes in.

Example: Suppose a proposed study was used to compare the performance in anatomy of 2 groups of students, those using cadavers for demonstration and those using models. If the parameter for evaluating student performance is the proportion who obtain a grade of 2.0 or better, how do we write Ho and HA?

Null hypothesis:

Ho: PM = Pc (no difference)

The proportion of students who obtain a grade of 2.0 or better among those using models for demonstration, PM, is equal to the proportion among those using cadavers, Pc.

Alternative hypothesis:

HA: PM ≠ Pc (two-tailed test)

The proportion of students who obtain a grade of 2.0 or better among those using models for demonstration, PM, is not equal to the proportion among those using cadavers, Pc.

HA: PM > Pc (one-tailed test)

The proportion of students who obtain a grade of 2.0 or better among those using models for demonstration, PM, is greater than the proportion among those using cadavers, Pc.

State the level of significance

The level of significance, α, is the probability of occurrence (of the observed and more extreme values) that is considered too low to warrant support of the hypothesis being tested.

α = 100% - desired confidence coefficient

i.e., a cut-off point for telling whether the p-value is too small or too large

• Note the definition of the p-value

The researcher sets the level of significance arbitrarily.

α (Type I error) is the probability of rejecting the null hypothesis when, in fact, the null hypothesis is true.

Example: A researcher sets α at 0.05. With this procedure, if the null hypothesis is in fact TRUE, the probability that he/she will REJECT it at the end of the hypothesis testing procedure is 5%.

β is another error probability, and it is related to α.

β (Type II error) is the probability of NOT REJECTING a FALSE NULL hypothesis.

Recall that α is set arbitrarily by the researcher. β, on the other hand, is inversely related to α, i.e. all else being equal, the lower the alpha, the higher the beta.
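The meaning of α and β, and the inverse relation between them, can be sketched with a small simulation. All numbers below are hypothetical and chosen only for illustration: a two-tailed z test of Ho: µ = 5 with known σ = 2 and n = 25, and a "false null" scenario in which the true mean is 6.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, alpha, null_mean=5.0, sigma=2.0, n=25,
                   reps=4000, seed=1):
    """Share of simulated samples in which a two-tailed z test rejects Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # boundary of the rejection region
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - null_mean) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

# Ho is TRUE (true mean = 5): every rejection is a Type I error
type1_rate = rejection_rate(true_mean=5.0, alpha=0.05)

# Ho is FALSE (true mean = 6): every non-rejection is a Type II error
beta_at_05 = 1 - rejection_rate(true_mean=6.0, alpha=0.05)
beta_at_01 = 1 - rejection_rate(true_mean=6.0, alpha=0.01)

print(f"Type I error rate at alpha = 0.05: {type1_rate:.3f}")
print(f"beta at alpha = 0.05: {beta_at_05:.3f}")
print(f"beta at alpha = 0.01: {beta_at_01:.3f}")
```

With the sample size held fixed, tightening α from 0.05 to 0.01 makes it harder to reject Ho, so a false null hypothesis is missed more often and β goes up.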

Select the appropriate test statistic

The test criterion or test statistic depends on the sampling distribution of the sample statistic which is used to test the hypothesis.

Sampling distributions of test statistics

• The various sampling distributions of the test statistic are where we get our p-values.
• Statisticians have created tables for different statistics. These tables show the probabilities associated with values of the test statistic.
• Example: the standard normal table

Criteria for test selection

1. Objectives of the study
2. Design of the study
3. Type of variables
4. Level of measurement
5. Whether the samples are related or independent
6. Assumptions about the test

Determine the critical region

The critical region or region of rejection is the set of values of the test statistic which leads to the rejection of the null hypothesis.

Its boundary is the value of the test statistic that corresponds to α (the critical value). The critical region is usually found at the tail-end of the distribution.

How to use the critical region

1. Determine whether your alternative hypothesis implies a one-tailed or two-tailed test.
2. Determine the value of the test statistic corresponding to the set alpha. This will give you the divide between the critical region (rejection region) and the non-rejection region.
3. Reject the null hypothesis if the COMPUTED test statistic falls in the region of rejection. Do not reject the null hypothesis if the COMPUTED test statistic falls in the non-rejection region.

Compute the test statistic

The following should already be determined at this step: alpha, the test statistic and the critical region.

The formula for computing the test statistic varies per type of test statistic.

Note: we already have a value of the test statistic which divides the distribution into a rejection and a non-rejection region. In this step, we are referring to the test statistic that would be computed from your data.

Make a statistical decision

The statistical decision is always stated with respect to the null hypothesis: we either reject it or do not reject it.

Ways to make a statistical decision:

1. Looking at where the computed test statistic falls. Does it fall in the rejection region?
2. Assessing whether the corresponding p-value is less than alpha or not. Confidence interval estimates can also be examined.

Statistical decisions

• If the computed test statistic falls in the region of rejection, the null hypothesis is rejected. Otherwise, the null hypothesis is not rejected.

• If the p-value is less than alpha, the null
hypothesis is rejected. If otherwise, the null
hypothesis is not rejected.
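The two ways of reaching a statistical decision (critical region and p-value) can be sketched for a one-sample z test. The notes do not give a formula or data for this, so the following is an assumption-laden illustration: the z statistic z = (x̄ - µ0)/(σ/√n) with known σ, and hypothetical numbers for the dengue length-of-stay question (x̄ = 5.6 days, σ = 2.0 days, n = 36, Ho: µ = 5).

```python
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, sigma, n, alpha=0.05, tails=2):
    """Return (z, critical value, p-value, reject?) for a one-sample z test.

    tails=2 tests HA: mu != null_mean; tails=1 tests HA: mu > null_mean.
    """
    std_normal = NormalDist()
    z = (sample_mean - null_mean) / (sigma / n ** 0.5)  # computed test statistic
    if tails == 2:
        critical = std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject = abs(z) > critical      # does z fall in either tail?
    else:
        critical = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z)
        reject = z > critical           # does z fall in the upper tail?
    return z, critical, p_value, reject

# Hypothetical sample summary, for illustration only
two_tailed = one_sample_z_test(5.6, 5.0, sigma=2.0, n=36, tails=2)
one_tailed = one_sample_z_test(5.6, 5.0, sigma=2.0, n=36, tails=1)

for label, (z, crit, p, reject) in (("two-tailed", two_tailed),
                                    ("one-tailed", one_tailed)):
    decision = "reject Ho" if reject else "do not reject Ho"
    print(f"{label}: z = {z:.2f}, critical = {crit:.3f}, p = {p:.4f} -> {decision}")
```

Both routes always give the same decision: the computed statistic falls in the rejection region exactly when the p-value is below alpha. The same data lead to different decisions for the one-tailed and two-tailed alternatives, which is why the form of HA must be fixed before the critical region is determined.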

Draw a conclusion

• Rejection of the null hypothesis leads to a conclusion stated this way: "There is sufficient evidence to say that (alternative hypothesis)"
• Non-rejection of the null hypothesis leads to a conclusion stated this way: "There is not sufficient evidence to say that (alternative hypothesis)"
