
SUMMARY MEASURES

1 3/14/2024
Learning objectives
❑ After completing this chapter, a student will be able to:

✓ List and calculate measures of central tendency

✓ List and calculate measures of dispersion

✓ List and calculate measures of shape

Measures of Central Tendency or Location
• A measure of central tendency gives information about the location of the center of the distribution of data values.
• Measures of central tendency are the various methods of determining a single value around which the data tend to concentrate.
• Hence, a measure of central tendency is a value that sums up or describes the mass of the data in a single value.

Measures of Central Tendency
• These are the most common measures of central tendency:
1. Arithmetic Mean

2. Median

3. Mode

1. The Arithmetic Mean or Simple Mean ($\bar{X}$)
• It is the sum of all observations divided by the number of observations (N) and is usually denoted by $\bar{X}$.
• It is the most widely used measure of central tendency.
- Let X1, X2, ..., XN be the N measurements obtained from N subjects.
- Then the mean of these ungrouped measurements is defined as:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

1. The Arithmetic Mean or Simple Mean cont…
• When the data are given in the form of a frequency distribution, i.e. there are k distinct values such that a value Xi has frequency fi (i = 1, 2, …, k), the arithmetic mean can be computed as follows:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where
• k = the number of distinct values
• Xi = the ith distinct value and
• fi = the frequency of the ith value

Example 1
• Consider the birth weights (in kilograms) of 10 newborn children in a certain hospital: 2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43.
• What is the average birth weight?
Solution:
$$\bar{X} = \frac{25.72}{10} = 2.572 \text{ kg}$$
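The arithmetic in Example 1 can be checked with a few lines of code. This is only a minimal sketch; the variable names `weights` and `mean_weight` are illustrative and not part of the slides.

```python
# Birth weights (kg) of the 10 newborns listed in Example 1.
weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]

# Arithmetic mean: sum of all observations divided by the number of observations.
mean_weight = sum(weights) / len(weights)

print(f"Sum = {sum(weights):.2f} kg, N = {len(weights)}")
print(f"Mean birth weight = {mean_weight:.3f} kg")
```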

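Grouped data
• In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the midpoint of that interval. Therefore, the mean for grouped data is calculated as:
$$\bar{X} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$
where mi is the midpoint and fi the frequency of the ith class.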

Example 2
Compute the mean age of 169 subjects from the grouped data.

Class interval   Mid-point (mi)   Frequency (fi)   mifi
10-19            14.5             4                58.0
20-29            24.5             66               1617.0
30-39            34.5             47               1621.5
40-49            44.5             36               1602.0
50-59            54.5             12               654.0
60-69            64.5             4                258.0
Total            __               169              5810.5

Mean = 5810.5/169 = 34.38 years
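The grouped-mean calculation (midpoint times frequency, summed and divided by the total frequency) can be sketched as below. The function name `grouped_mean` and the tuple layout are assumptions made for illustration only.

```python
# Each class from Example 2 as (lower limit, upper limit, frequency).
age_classes = [
    (10, 19, 4), (20, 29, 66), (30, 39, 47),
    (40, 49, 36), (50, 59, 12), (60, 69, 4),
]

def grouped_mean(classes):
    """Mean of grouped data, treating every value as lying at its class midpoint."""
    total_f = sum(f for _, _, f in classes)
    total_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return total_mf / total_f

print(f"Mean age = {grouped_mean(age_classes):.2f} years")
```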

Characteristics
✓ The value of the arithmetic mean is determined by every item in the series.
✓ It is greatly affected by extreme values.
✓ The sum of the deviations about it is zero.
✓ The mean can be used as a summary measure for both discrete and continuous data; in general, however, it is not appropriate for either nominal or ordinal data.
Advantages
1) It is based on all values given in the distribution.
2) It is most easily understood.
3) It is most amenable to algebraic treatment.

Disadvantages
1) It may be greatly affected by extreme items.
✓ Its usefulness as a "summary of the whole" may then be considerably reduced.
2) When the distribution has open-ended classes,
✓ its computation is based on an assumption about those classes and therefore may not be valid.

2. Median
• It is an alternative measure of central tendency, perhaps second in popularity to the arithmetic mean.
1. Ungrouped data
• Suppose that there are n observations in a sample.
• When the observations are arranged in increasing or decreasing order, the median is the middle observation of the data set:
1. the [(n+1)/2]th observation, if n is odd;
2. the average of the (n/2)th and [(n/2)+1]th observations, if n is even.

Example
The number of children with asthma seen during a specific year at each of the clinics in a certain local district is shown below.
Find the median for this data set.
253, 125, 328, 417, 201, 70, 90
Solution:
First we must arrange the data in ascending order:
70, 90, 125, 201, 253, 328, 417
Since n = 7 is odd, the median is the [(7+1)/2]th = 4th observation, i.e. the value 201 is the median.
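The odd/even rule for the median of ungrouped data can be written as a short function. This is a sketch under the definitions above; the name `median_ungrouped` is illustrative.

```python
def median_ungrouped(values):
    """Median of ungrouped data using the odd/even rule from the slides."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the [(n+1)/2]th observation (index mid when counting from 0)
        return ordered[mid]
    # n even: average of the (n/2)th and [(n/2)+1]th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

asthma_cases = [253, 125, 328, 417, 201, 70, 90]
print(median_ungrouped(asthma_cases))
```

Python's standard library offers `statistics.median`, which follows the same rule.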

Median for grouped data
2. For grouped data
- If the data are given in a continuous frequency distribution, the median is defined as:
$$\text{Median} = L_{med} + \frac{W}{f_{med}}\left(\frac{n}{2} - f_c\right)$$
where L_med = the lower class boundary of the median class, f_med = the frequency of the median class, W = the size (width) of the median class, n = the total number of observations, and f_c = the cumulative frequency (less-than type) of the class preceding the median class.
Note: The median class is the class with the smallest cumulative frequency (less-than type) that is greater than or equal to n/2.

Cont…
• Example: Find the median for the following distribution.
Solution
• We can compute the median as follows: n = 76, so n/2 = 76/2 = 38.
✓ The cumulative frequencies (less-than type) greater than or equal to n/2 = 38 are 39, 54, 66, 72 and 76.
✓ The smallest of these cumulative frequencies is 39.
✓ So the median class is the third class (49.5-54.5).

Characteristics
➢ It is an average of position.
➢ It is affected more by the number of items than by extreme values.
➢ The median is not influenced by very large (or very small) sample values.
➢ It is a better measure of centrality than the mean if the distribution is skewed.
➢ The median can be used as a summary measure for ordinal, discrete and continuous data; in general, however, it is not appropriate for nominal data.
Advantages
1) It is easily calculated and is not much disturbed by extreme values.
2) It is more typical of the series.
3) The median may be located even when the data are incomplete, e.g. when the class intervals are irregular and the final classes have open ends.

Disadvantages
1. The median is not as well suited to algebraic treatment as the mean.
2. It is not as generally familiar as the arithmetic mean.

3. Mode
• It is the most frequently occurring value.
• A data set can have no mode, one mode, or more than one mode.
• If every observation occurs only once, there is no mode.
• There may be more than one mode, such as when dealing with a bimodal (two-peaked) distribution.
o The mode is not often used in biological or medical data.

Examples
1. Find the mode of 5, 3, 5, 8, and 9 ; Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2, 5; Mode =8 and 9
3. Find the mode of 4, 12, 3, 6, and 7. No mode/ mode doesn’t
exist.
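The three mode examples can be reproduced with a small helper. This is a sketch: the function name `modes` is illustrative, and it adopts the slides' convention that a data set in which every value occurs once has no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means no mode exists."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # every value occurs only once, so no value is "most frequent"
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))        # one mode
print(modes([8, 9, 9, 7, 8, 2, 5]))  # two modes (bimodal)
print(modes([4, 12, 3, 6, 7]))       # no mode
```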

Mode for Grouped data
The mode for grouped data can be computed using the following formula:
$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$$
where
L = the lower class boundary of the modal class
Δ1 = fmod - f1, Δ2 = fmod - f2
w = the size of the modal class
f1 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
fmod = frequency of the modal class

• NB: The modal class is the class with the highest frequency.

Cont…
Example: Calculate the modal age for the age distribution of 228 patients below.
Solution
By inspection (simply looking at the frequencies), the mode lies in the fourth class, where L = 29.5, fmod = 57, f1 = 50, f2 = 48, w = 5, and
Δ1 = 57 - 50 = 7, Δ2 = 57 - 48 = 9.
Therefore, the modal age is
$$\hat{x} = 29.5 + \frac{7}{7 + 9} \times 5 = 29.5 + 2.2 = 31.7$$
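The grouped-mode formula lends itself to a one-line function. The sketch below uses the quantities named in the slides (L, fmod, f1, f2, w); the function name `grouped_mode` is an assumption for illustration.

```python
def grouped_mode(L, f_mod, f1, f2, w):
    """Mode of grouped data: L + d1 / (d1 + d2) * w."""
    d1 = f_mod - f1   # modal frequency minus frequency of the preceding class
    d2 = f_mod - f2   # modal frequency minus frequency of the succeeding class
    return L + d1 / (d1 + d2) * w

# Values read off in the worked example for the 228 patients.
modal_age = grouped_mode(L=29.5, f_mod=57, f1=50, f2=48, w=5)
print(f"Modal age = {modal_age:.1f} years")
```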

Characteristics
▪ The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
▪ It is not affected by extreme values.
▪ It can be calculated for distributions with open-ended classes.
▪ Sometimes its value is not unique.
▪ The main drawback of the mode is that it may not exist.

Advantages
• It is not affected by extreme observations.
• It is easy to calculate and simple to understand.
• It can be calculated for distributions with open-ended classes.
Disadvantages
o It is not suitable for further mathematical treatment.
o It is not based on all observations.
o It is not a stable average, because it is affected to some extent by fluctuations of sampling.
o Often its value is not unique.

Measures of non-central tendency
These are quartiles, deciles and percentiles.
1. Quartiles
- Quartiles are measures that divide the frequency distribution into four equal parts.
- The values of the variable corresponding to these divisions are denoted Q1, Q2 and Q3, often called the first, second and third quartiles respectively.
- Q1 is the value such that 25% of the items are less than or equal to it.
- Similarly, Q2 has 50% of the items with values less than or equal to it.
- Q3 has 75% of the items with values less than or equal to it.
❑ Quartiles for ungrouped data
✓ Arrange the data in ascending order.
✓ If the number of observations n is
A. Odd:
$$Q_i = \left[\frac{i(n+1)}{4}\right]^{\text{th}} \text{ item}$$
B. Even:
$$Q_i = \frac{\left(\frac{in}{4}\right)^{\text{th}} \text{ item} + \left(\frac{in}{4} + 1\right)^{\text{th}} \text{ item}}{2}$$
for i = 1, 2, 3.

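For grouped data
$$Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{in}{4} - CF\right), \quad i = 1, 2, 3$$
where L is the lower class boundary of the quartile class, w its width, f_Qi its frequency, and CF the cumulative frequency of the class preceding it.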

2. Percentiles
❑ Percentiles divide the data into 100 equal parts.
❑ A percentile shows the percentage of values that fall below a particular value in a set of data scores.

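Cont…
✓ Arrange the numbers in ascending order.

Percentiles for individual series
A. Odd:
$$P_i = \left[\frac{i(n+1)}{100}\right]^{\text{th}} \text{ item}$$
B. Even:
$$P_i = \frac{\left(\frac{in}{100}\right)^{\text{th}} \text{ item} + \left(\frac{in}{100} + 1\right)^{\text{th}} \text{ item}}{2}$$

Percentiles for grouped data
$$P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right), \quad i = 1, 2, \dots, 99$$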

Cont…
• For example, suppose that 50% of a cohort survived at least 4 years.
• This also means that 50% survived at most 4 years.
• We say that 4 years is the median.
• The median is also called the 50th percentile.
• We write P50 = 4 years.

Example
The marks of 50 students (out of 85) are given below. Based on the data, find $Q_1$ and $P_7$.

Marks   46-50   51-55   56-60   61-65   66-70   71-75   76-80
fi      4       8       15      5       9       5       4

Solution:
First, find the class boundary (CB) and cumulative frequency (CF) distributions.

Marks   46-50       51-55       56-60       61-65       66-70       71-75       76-80
CB      45.5-50.5   50.5-55.5   55.5-60.5   60.5-65.5   65.5-70.5   70.5-75.5   75.5-80.5
fi      4           8           15          5           9           5           4
CF      4           12          27          32          41          46          50

Second, determine the quartile and percentile classes.
For $Q_1$: find the smallest CF ≥ iN/4 = 1 × 50/4 = 12.5.

Cont…
• The CF values ≥ 12.5 are 27, 32, 41, 46 and 50, and the smallest of these is 27, so the quartile class is the third class (55.5-60.5).
$$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.67 \approx 55.7$$
• For percentiles:
• $P_7$ is the (7n/100)th value = 3.5th value, which lies in the class 45.5-50.5.
$$P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$$
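Both calculations above follow the same pattern: locate the first class whose cumulative frequency reaches the target position, then interpolate inside it. A minimal sketch of that pattern is given below; the function name `grouped_quantile` and its `fraction` argument (0.25 for Q1, 0.07 for P7) are illustrative assumptions, and the fraction is assumed to lie strictly between 0 and 1.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Quantile of grouped data: L + (w / f) * (fraction * n - CF).

    boundaries: list of (lower, upper) class boundaries
    freqs:      class frequencies in the same order
    fraction:   e.g. 0.25 for Q1, 0.07 for P7 (assumed 0 < fraction < 1)
    """
    n = sum(freqs)
    target = fraction * n
    cf_before = 0                       # cumulative frequency of preceding classes
    for (lower, upper), f in zip(boundaries, freqs):
        if cf_before + f >= target:     # smallest CF >= target: this is the class
            return lower + (upper - lower) / f * (target - cf_before)
        cf_before += f
    raise ValueError("fraction must be between 0 and 1")

marks_cb = [(45.5, 50.5), (50.5, 55.5), (55.5, 60.5), (60.5, 65.5),
            (65.5, 70.5), (70.5, 75.5), (75.5, 80.5)]
marks_f = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(marks_cb, marks_f, 0.25)
p7 = grouped_quantile(marks_cb, marks_f, 0.07)
print(f"Q1 = {q1:.2f}, P7 = {p7:.3f}")
```

The same function gives the median with `fraction=0.5`, which is one way to verify that the median equals the second quartile.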

Cont…
1. Calculate $Q_1$, $Q_2$, $Q_3$, $D_4$, $P_{40}$ and $P_{90}$ for the data given in the table below.

x   10   11   12   13   14   15   16   17   18
f   2    8    25   48   65   40   20   9    2

2. The following frequency distribution represents the magnitudes of earthquakes.

Magnitude   0-0.9   1-1.9   2-2.9   3-3.9   4-4.9   5-5.9   6-6.9   7-7.9
Frequency   20      50      45      30      10      8       6       1

Compute the median, verify that it is equal to the second quartile, and find the 72nd percentile.

Measures of Dispersion (Variation)
• The scatter or spread of the items of a distribution is known as dispersion or variation.
• In other words, the degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data.
The most commonly used measures of dispersion are:
1) Range and relative range
2) Quartile deviation and coefficient of quartile deviation
3) Mean deviation and coefficient of mean deviation
4) Variance
5) Standard deviation and coefficient of variation.

Range
• The range is the largest score (L) minus the smallest score (S):
$$R = L - S$$
• It is a quick and dirty measure of variability, because it is greatly affected by extreme scores and depends on only two observations.

Relative Range (RR)
• It is also sometimes called the coefficient of range and is given by:
$$RR = \frac{L - S}{L + S}$$

The Quartile Deviation (Semi-interquartile range)
• The interquartile range (IQR) is the difference between the third and the first quartiles of a set of items, and the semi-interquartile range (quartile deviation) is half of the interquartile range:
$$IQR = Q_3 - Q_1, \qquad QD = \frac{Q_3 - Q_1}{2}$$
Example: Suppose the first and third quartiles for the weights of girls 12 months of age are 8.8 kg and 10.2 kg respectively.
The IQR = 10.2 kg - 8.8 kg = 1.4 kg, so the quartile deviation is 0.7 kg.

Variance
• Variance is the "average squared deviation from the mean".
• A good measure of dispersion should make use of all the data.
• For ungrouped data, the population variance is computed as:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
• For the case of a frequency distribution it is expressed as:
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}, \quad N = \sum_{i=1}^{k} f_i$$
o The variance is limited as a descriptive statistic because it is not in the same units as the observations.

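For a sample, the variance of a frequency distribution is expressed as:
$$S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}$$
• Why do we use n - 1?
− Dividing by n - 1 rather than n gives an unbiased estimate of the population variance, so the sample variance better describes the spread of the population.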

Cont…
• A problem with the variance is that the deviations are squared, so its units are also squared. To get back to the original unit of measurement, we take the square root of the variance, which gives the standard deviation (SD).

Example 1
Consider the following three datasets:
▪ Dataset 1: 7, 7, 7, 7, 7, 7; mean = 7, sd = 0
▪ Dataset 2: 6, 7, 7, 7, 7, 8; mean = 7, sd = 0.63
▪ Dataset 3: 3, 2, 7, 8, 9, 13; mean = 7, sd = 4.05
❖ The three datasets have the same mean but different amounts of variation.
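The comparison of the three datasets can be reproduced with Python's standard `statistics` module. This sketch assumes the slides report the sample standard deviation (n - 1 divisor), which is what `statistics.stdev` computes; the dictionary name `datasets` is illustrative.

```python
from statistics import mean, stdev

datasets = {
    "Dataset 1": [7, 7, 7, 7, 7, 7],
    "Dataset 2": [6, 7, 7, 7, 7, 8],
    "Dataset 3": [3, 2, 7, 8, 9, 13],
}

for name, data in datasets.items():
    # stdev() divides the sum of squared deviations by n - 1 (sample SD)
    print(f"{name}: mean = {mean(data)}, sd = {stdev(data):.2f}")
```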

Special properties of variance
• The main drawback of the variance is that its unit is squared, so it is difficult to interpret.
• Variance gives more weight to extreme values than to those near the mean, because the differences are squared.
• Variance is zero when all observations have the same value.
• The greater the differences among the values, the greater the variance, and vice versa.

SD vs. Standard Error (SE)
• The SD describes the variability among individual values in a given data set.
• The SE describes the variability among the sample means obtained from separate samples.
• The SE of the mean is interpreted as an indication of how far a sample mean may lie from the true mean (mean ± SE).

Cont…
o The SD has the advantage of being expressed in the same units of measurement as the mean.
o The SD is considered to be the best measure of dispersion and is widely used because of the properties of the theoretical normal curve.
o However, if the units of measurement of the variables in two data sets are not the same, their variability cannot be compared using the SD.

Coefficient of Variation (C.V)
• The standard deviation is an absolute measure of the deviation of observations around their mean and is expressed in the same unit as the data.
• Because of this, the standard deviation cannot be used directly to compare variability between data sets.
• The coefficient of variation is often used for this purpose.
• It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage:
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
• The distribution having the smaller C.V is said to be less variable, or more consistent or homogeneous.

When to use coefficient of variation
• When two data sets have different units of measurement, or their means differ sufficiently in size, the CV should be used as a measure of dispersion.
• When different units of measurement are involved, e.g. the unit of group 1 is mm and the unit of group 2 is gm, the CV is suitable for comparison because it is unit-free.
• In such cases, the standard deviation should not be used for comparison.

Standard score (Z-scores)
❑ It is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set.
❑ It tells us how many standard deviations a specific value lies above (positive z-score) or below (negative z-score) the mean of the data set.

Cont…
• Z-score computed from the population:
$$Z = \frac{X - \mu}{\sigma}$$
• Z-score computed from the sample:
$$Z = \frac{X - \bar{X}}{S}$$
Example: Suppose that a student scored 66 in Biostatistics and 80 in Epidemiology. A summary of the scores in the two courses is given below.

Course         Average score   Standard deviation of the score
Biostatistics  51              12
Epidemiology   72              16

In which course did the student score better compared with his classmates?

Solution:
Z-score of the student in Biostatistics:
$$Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$$
Z-score of the student in Epidemiology:
$$Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$$
From these two standard scores, we can conclude that the student scored better, relative to his classmates, in Biostatistics than in Epidemiology.
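The comparison of the two courses can be expressed in code as follows. This is a minimal sketch; the function name `z_score` and the dictionary `courses` are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (student's score, class mean, class standard deviation) for each course
courses = {
    "Biostatistics": (66, 51, 12),
    "Epidemiology": (80, 72, 16),
}

z = {name: z_score(*values) for name, values in courses.items()}
best = max(z, key=z.get)   # course with the highest standard score
print(z, "-> relatively better in", best)
```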

Moments about the mean (central moments)
• The rth moment about the mean (the rth central moment) is defined as
$$M_r = \frac{\sum (X_i - \bar{X})^r}{n}, \quad r = 0, 1, 2, \dots$$
• For continuous grouped data
$$M_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$$
where the $X_i$ are the class marks.
Example: Find the first three central moments of the individual series 2, 3 and 7.
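One way to work this exercise is to code the definition of the rth central moment directly. The sketch below is illustrative only; the function name `central_moment` is an assumption.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean)^r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

series = [2, 3, 7]   # mean is 4, deviations are -2, -1 and 3
m1, m2, m3 = (central_moment(series, r) for r in (1, 2, 3))
print(f"M1 = {m1:.3f}, M2 = {m2:.3f}, M3 = {m3:.3f}")
```

The first central moment is always zero, because the deviations about the mean sum to zero.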
Measures of shape
• There are two types of measures of shape:
I. Skewness
II. Kurtosis

1. Skewness
o Measures of central tendency and variation do not reveal the shape of a frequency distribution.
o Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
o A skewed frequency distribution is one that is not symmetrical.
o Skewness is concerned with the shape of the curve, not its size.

Skewness…
o The skewness of a distribution is defined as its lack of symmetry.
o In a symmetrical distribution, the mean, median and mode are equal to each other.

Skewness…
• For a moderately skewed distribution, the following relation holds among the three commonly used measures of central tendency:
➢ Mean - Mode = 3 × (Mean - Median)
• There are two types of skewness, based on the shape of the distribution.
• Positively skewed distribution: Smaller observations are more frequent than larger observations, i.e. the majority of the observations have a value below the average, and the distribution has a long tail in the positive direction (Mean > Mode).

Cont…
[Figure: curve skewed to the right (positively skewed), with the mode, median and mean marked from left to right]
Cont…
• Negatively (left) skewed distribution: Smaller observations are less frequent than larger observations, i.e. the majority of the observations have a value above the average. The mean is pulled towards the low-valued items (that is, to the left), so Mean < Mode.
[Figure: negatively skewed curve with the mode, median and mean marked]

Measures of Skewness
1. Karl Pearson's Coefficient of Skewness (SK):
$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} \quad \text{or} \quad S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
If SK = 0, then the distribution is symmetrical.
If SK > 0, then the distribution is positively skewed.
If SK < 0, then the distribution is negatively skewed.

2. Moment Coefficient of Skewness
• The moment coefficient of skewness is based on moments. The formula for calculating it is:
$$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\sigma^3}$$
where $M_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r$.
α3 > 0 ➔ the distribution is positively skewed
α3 = 0 ➔ the distribution is symmetric
α3 < 0 ➔ the distribution is negatively skewed

Kurtosis
o Kurtosis is a measure of the peakedness of a distribution.
o The degree of kurtosis of a distribution is measured relative to the peakedness of a normal curve.
o The peakedness of a distribution can be classified into three types:
o Leptokurtic:
- A distribution having a relatively high peak.
- The curve is more peaked than the normal curve.

Cont…
o Mesokurtic:
- Normal peak.
- The curve is peaked like the normal curve.
o Platykurtic:
▪ Flat-topped.
▪ A large number of observations, each with low frequency, are spread across the middle interval.

Measures of kurtosis
• The moment coefficient of kurtosis $\alpha_4$:
$$\alpha_4 = \frac{M_4}{M_2^{2}}$$
where $M_2$ and $M_4$ are central moments.
• If $\alpha_4 = 3$, then the distribution is mesokurtic.
• If $\alpha_4 > 3$, then the distribution is leptokurtic.
• If $\alpha_4 < 3$, then the distribution is platykurtic.

Example:
Based on the following data:
$M_0 = 1$, $M_1 = -0.6$, $M_2 = 1.6$, $M_3 = -2.4$, $M_4 = 5.8$
a) Find the coefficient of skewness and discuss the distribution type.
b) Find the coefficient of kurtosis and discuss the distribution type.
Solution
a) $\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{-2.4}{1.6^{3/2}} = -1.19 < 0$ ➔ the distribution is negatively skewed.
b) $\alpha_4 = \frac{M_4}{M_2^{2}} = \frac{5.8}{1.6^{2}} = 2.27 < 3$ ➔ the curve is platykurtic.
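The two coefficients in this example can be computed and classified with a short sketch. The function names `moment_skewness` and `moment_kurtosis` are illustrative; the classification thresholds (0 for skewness, 3 for kurtosis) are the ones stated in the slides.

```python
def moment_skewness(m2, m3):
    """alpha_3 = M3 / M2^(3/2)."""
    return m3 / m2 ** 1.5

def moment_kurtosis(m2, m4):
    """alpha_4 = M4 / M2^2."""
    return m4 / m2 ** 2

M2, M3, M4 = 1.6, -2.4, 5.8   # moments given in the example
a3 = moment_skewness(M2, M3)
a4 = moment_kurtosis(M2, M4)

skew_type = "symmetric" if a3 == 0 else ("positively skewed" if a3 > 0 else "negatively skewed")
kurt_type = "mesokurtic" if a4 == 3 else ("leptokurtic" if a4 > 3 else "platykurtic")
print(f"alpha3 = {a3:.2f} ({skew_type}), alpha4 = {a4:.2f} ({kurt_type})")
```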

Which measures to use?
• When the distribution of the data is symmetric and unimodal (i.e. the data are approximately normally distributed), it is usual to summarize the data using the mean and standard deviation.
• However, when the data are skewed, it is preferable to use the median and quartiles as summary statistics.
• Unlike the mean and standard deviation, the median and quartiles are not easily influenced by extreme values in a skewed distribution.
• Remark:
o The mean and median of a symmetric distribution coincide.
o When the distribution is skewed to the right, its mean is larger than its mode.
o When the distribution is skewed to the left, its mean is smaller than its mode.
Probability and probability distribution

Objectives
❑ After completing this chapter, learners should be able to:

✓ Define basic terms in probability

✓ Describe set theory and probability

✓ Identify types of probability

✓ Identify types of random variable and probability distribution

✓ List common probability distributions and their properties

Introduction of probability
• Many medical decisions are made on a statistical basis, since individuals differ in their reactions to medications or surgery in an unpredictable way.
• In that case, the treatment applied is chosen to give the best outcome for as many patients as possible.
• Life as we experience it consists of a series of events.
• "Probability" is a very useful concept and is used in everyday communication.

Introduction of probability….
▪ An understanding of probability is fundamental for
▪ quantifying the uncertainty in the decision-making process;
▪ drawing conclusions about a population of patients based on known information about a sample of patients drawn from that population.
▪ Probability can be defined as the chance of an event occurring.
▪ Many people are familiar with probability from observing or playing games of chance, such as card games or lotteries.
▪ Probability theory is used in various fields, such as insurance, investments and weather forecasting.
Probability cont.….
• Conclusions and inferences in science are drawn using probability.


Basic terms of probability concepts
o Random experiment: an experiment that can be repeated any number of times under the same conditions but does not give a unique result.
✓ It is a chance process that leads to well-defined results called outcomes; the result of any single repetition cannot be predicted in advance.
✓ E.g. tossing a coin and throwing a die are random experiments.
o Trial: performing a random experiment once is called a trial.
Example: Tossing a coin and observing the face showing up is a probability experiment.
o Outcome: the result of a single trial in a probability experiment. A single outcome is also called a simple event.
Basic terms cont.…..
Example: The outcome for the sex of a newborn in a delivery room is either male or female.
• Certain event: an event which is sure to occur.
• Impossible event: an event which will never occur.
• Complement of an event: the complement of an event A consists of all the sample points in the sample space that are not in A (non-occurrence of event A).
• Equally likely outcomes: if each outcome in a sample space has the same chance of occurring, then each of the n equally likely outcomes must have probability 1/n of occurring.
• Mutually exclusive events: two events that cannot occur at the same time, i.e. they cannot occur simultaneously.

Basic Terms Cont…
• Sample space: The set of all possible outcomes of a statistical experiment is called the sample space and is represented by the symbol S.
Example: The sample space for the sex of the newborns when two mothers are in the gynecology ward to give birth is: S = {MM, MF, FM, FF}
• An event: consists of one or more outcomes and is a subset of the sample space, or a collection of sample points.
Example: From the above experiment, the event consisting of at least one female is E = {MF, FM, FF}
• Random variable: a function that associates a unique numerical value with every outcome of an experiment.

Basic terms cont.…
Two different broad classes of random variables:
1. A continuous random variable: can take any value in
an interval or collection of intervals.

2. A discrete random variable: can take one of a countable


list of distinct values.

Basic concepts cont'd…..
• Some sample spaces for various probability experiments are:
• Probability attempts to quantify an uncertain situation, i.e. the likelihood, or chance, that an outcome of a random experiment will occur.
• Probability is a number between 0 and 1 that expresses how likely the event is to occur.

Basic concepts cont'd…..
Example: Find the sample space for the gender of the children if a family has three children. Use B for boy and G for girl.
• Solution: There are two genders, male and female, and each child could be either gender. Hence, there are eight possibilities, as shown here.
S = {BBB, BBG, BGB, GBB, GGG, GGB, GBG, BGG}
Note: The possible outcomes of a probability experiment (the sample space) can be found:
• by observation and reasoning;
• by using a tree diagram (a device consisting of line segments emanating from a starting point and also from each outcome point).
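A sample space like this can also be enumerated programmatically, which mirrors what a tree diagram does branch by branch. The sketch below uses `itertools.product` from the standard library; the variable name `sample_space` is illustrative.

```python
from itertools import product

# Each of the three children is either a boy (B) or a girl (G);
# product() walks every branch of the tree: 2 * 2 * 2 = 8 outcomes.
sample_space = ["".join(outcome) for outcome in product("BG", repeat=3)]

print(len(sample_space), sample_space)
```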

Tree diagram of the above example

Types of probability
1. Classical (or theoretical) probability
▪ It is used when each outcome in a sample space is equally likely to occur.
▪ That is, if an experiment has n equally likely outcomes, then each possible outcome must have probability 1/n of occurring. Equivalently, the probability of an event E is:
$$P(E) = \frac{\text{number of outcomes in } E}{\text{total number of outcomes in } S}$$
Note: The classical approach fails under the following conditions:
1. If it is not possible to enumerate all elements of the sample space.
2. If the outcomes are not equally likely.
Example: The probability of getting at least one female birth from two pregnant mothers is: ¾ = 0.75
Types of probability cont…
2. Frequentist (empirical or statistical) probability: is based on observations obtained from experiments, from a large number of trials, or from historical data.

Example:
• A medical doctor found that, out of 100,000 patients who visited the hospital, there were 50 cancer cases. What is the probability that a patient to be examined will be positive for cancer?
P(+ve for cancer) = 50/100,000 = 0.0005

Example 2
In a sample of 50 people, 21 had type O blood, 22 had type A blood, 5 had type B blood, and 2 had type AB blood. Set up a frequency distribution and find the following probabilities:
a. A person has type O blood
b. A person has type A or type B blood
c. A person does not have type AB blood

Solution
Blood type   Frequency
A            22
B            5
AB           2
O            21
Total        50

▪ P(O) = 21/50 = 0.42
▪ P(A) = 22/50 = 0.44
▪ P(A or B) = P(A) + P(B) = 22/50 + 5/50 = 27/50 = 0.54
▪ Work out the remaining probability in the same way.
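The empirical probabilities in this example, including the remaining part (c), can be computed from the frequency table. The sketch below uses exact fractions; the helper name `prob` is illustrative, and part (c) uses the complement rule P(A') = 1 - P(A).

```python
from fractions import Fraction

# Frequency distribution of blood types in the sample of 50 people.
freq = {"A": 22, "B": 5, "AB": 2, "O": 21}
total = sum(freq.values())

def prob(*types):
    """Empirical probability that a person has one of the given (disjoint) blood types."""
    return Fraction(sum(freq[t] for t in types), total)

p_o = prob("O")               # a. type O
p_a_or_b = prob("A", "B")     # b. type A or type B
p_not_ab = 1 - prob("AB")     # c. not type AB (complement rule)

print(float(p_o), float(p_a_or_b), float(p_not_ab))
```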

3. Axioms of probability
• Let E be a random experiment and S be the sample space associated with E. With each event A we associate a real number, denoted by P(A) and called the probability of A, which satisfies the following properties (basic rules of probability):
• 0 ≤ P(A) ≤ 1
• P(S) = 1
• P(A') = 1 - P(A)
• P(∅) = 0, where ∅ is the null or empty event.

Mutually exclusive events/Disjoint events:
o Two events E1 and E2 are said to be mutually exclusive if there is no sample point common to both E1 and E2. That means E1 ∩ E2 = ∅ (empty).
E.g. One die is rolled. Sample space: S = {1, 2, 3, 4, 5, 6}
Let A = the event that an odd number turns up, A = {1, 3, 5}
Let B = the event that a 1, 2 or 3 turns up, B = {1, 2, 3}
A. Find P(A) and P(B)
P(A) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 3/6 = ½
P(B) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 3/6 = ½
B. Are A and B mutually exclusive?
o A and B are not mutually exclusive, because they have the elements 1 and 3 in common.

[Figure: Venn diagram showing two disjoint events A and B]

Union of events: The union of two events A and B, denoted by (A ∪ B), consists of all outcomes that are in A or in B or in both A and B.
❖ If A and B are any two events, then
▪ P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
❖ If A and B are mutually exclusive (disjoint), then
▪ P(A ∪ B) = P(A) + P(B)
Example: In a hospital unit there are 8 nurses and 5 physicians; 7 nurses and 3 physicians are females. If a staff member is selected at random, find the probability that the person is a nurse or a male.

Staff \ Gender   Male   Female   Total
Physician        2      3        5
Nurse            1      7        8
Total            3      10       13
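Using the table, the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B) can be applied to the events "nurse" and "male". The following is a minimal sketch with exact fractions; the dictionary `staff` and the variable names are illustrative.

```python
from fractions import Fraction

# Cell counts from the staff-by-gender table.
staff = {
    ("physician", "male"): 2, ("physician", "female"): 3,
    ("nurse", "male"): 1, ("nurse", "female"): 7,
}
total = sum(staff.values())

n_nurse = sum(c for (role, _), c in staff.items() if role == "nurse")
n_male = sum(c for (_, sex), c in staff.items() if sex == "male")
n_nurse_and_male = staff[("nurse", "male")]

# Addition rule: subtract the overlap so male nurses are not counted twice.
p_nurse_or_male = (Fraction(n_nurse, total) + Fraction(n_male, total)
                   - Fraction(n_nurse_and_male, total))
print(p_nurse_or_male, float(p_nurse_or_male))
```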

Intersection of events
• If A and B are events, then the intersection of A and B, denoted by A ∩ B, represents the event composed of all basic outcomes that are in both A and B.
• P(A ∩ B) = P(A)*P(B/A) if the two events are dependent or related
• P(A ∩ B) = P(A)*P(B) if the two events are independent

Conditional probability
• The conditional probability of A given B is the probability that A occurs when the event B has already happened, and it is defined as follows:
P(A/B) = P(A ∩ B)/P(B), P(B) ≠ 0
• Special case: when the two events are independent, then
• P(A/B) = P(A) and P(B/A) = P(B),
• P(A ∩ B) = P(A)*P(B)

Independent Events
• Two events are independent if the occurrence of one of the events does not in any way affect the probability of the other event.
• That is, A and B are independent if P(B|A) = P(B) or if P(A|B) = P(A).
Example: Let event A stand for "the sex of the first child from a mother is female", and event B stand for "the sex of the second child from the same mother is female".
• Are A and B independent?
• Solution
• P(B/A) = P(B) = 0.5
• The occurrence of A does not affect the probability of B, so the events are independent.
Example
The following data show the association between aspirin use and heart attack.
Table: Data for treatment versus myocardial infarction

                 Myocardial Infarction
Treatment        Yes     No      Total
Placebo          100     500     600
Aspirin          60      900     960
Total            160     1400    1560

Let us define A and B as "positive for myocardial infarction" and "used aspirin" respectively.

Find:
A. P(A/B)
B. P(B/A)
C. Are the characteristics A and B independent?
Solution:
A. P(A/B) = P(A ∩ B)/P(B) = (60/1560) ÷ (960/1560) = 0.0625
B. P(B/A) = P(B ∩ A)/P(A) = (60/1560) ÷ (160/1560) = 0.375
C. To test independence, check whether P(A/B) = P(A) or, equivalently, P(A ∩ B) = P(A) × P(B).
Here P(A/B) = 0.0625, whereas P(A) = 160/1560 = 0.103.
Now P(A/B) ≠ P(A), i.e. 0.0625 ≠ 0.103.
So the characteristics A and B are not independent, i.e. they are dependent.
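The conditional probabilities and the independence check can be reproduced from the 2×2 table. The sketch below keeps everything as exact fractions; the dictionary `table` and the variable names are illustrative.

```python
from fractions import Fraction

# Treatment versus myocardial infarction (MI) counts from the table.
table = {
    ("placebo", "mi"): 100, ("placebo", "no_mi"): 500,
    ("aspirin", "mi"): 60, ("aspirin", "no_mi"): 900,
}
n = sum(table.values())

# A = positive for MI, B = used aspirin
p_a = Fraction(table[("placebo", "mi")] + table[("aspirin", "mi")], n)
p_b = Fraction(table[("aspirin", "mi")] + table[("aspirin", "no_mi")], n)
p_a_and_b = Fraction(table[("aspirin", "mi")], n)

p_a_given_b = p_a_and_b / p_b      # P(A/B) = P(A ∩ B) / P(B)
p_b_given_a = p_a_and_b / p_a      # P(B/A) = P(A ∩ B) / P(A)
independent = (p_a_and_b == p_a * p_b)

print(float(p_a_given_b), float(p_b_given_a), round(float(p_a), 3), independent)
```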

Counting rules of probability
We have three different counting rules.
✓ Basic multiplication rule
✓ Permutation
✓ Combinations

Multiplication Rule (for counting techniques)
• If an operation can be described as a sequence of k steps, and
• if the number of ways of completing step 1 is n1, and
• if the number of ways of completing step 2 is n2 for each way of completing step 1, and
• if the number of ways of completing step 3 is n3 for each way of completing step 2, and so forth until k steps, then the total number of ways of completing the operation is n1*n2*…*nk

Multiplication rule cont.….
• E.g. Assume we have a coin and a die. If we toss the coin first and then roll the die, how many possible outcomes does the experiment have?
• We have n1 × n2 = 2 × 6 = 12 possibilities.

Permutations
➢ A permutation is an ordered arrangement of events. The number of possible permutations of r events taken from n possible events is

$$ {}_{n}P_{r} = \frac{n!}{(n-r)!} $$

➢ where r is the number of events in the series and n is the number of possible events.
➢ Factorial: n! denotes the factorial of n, the product of all the positive integers from 1 to n.
n! = n∙(n-1)∙(n-2)∙(n-3) … (3)∙(2)∙(1)
0! = 1! = 1, nP0 = 1, nP1 = n, nPn = n!
➢ E.g. 6P2?
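
A minimal sketch for evaluating permutations, assuming the standard formula nPr = n!/(n − r)!. The function name n_permutations is a hypothetical helper, not part of any library.

```python
import math

def n_permutations(n, r):
    """Number of ordered arrangements of r items chosen from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(n_permutations(6, 2))   # 30, since 6!/4! = 6 x 5
print(n_permutations(5, 0))   # 1   (nP0 = 1)
print(n_permutations(5, 1))   # 5   (nP1 = n)
print(n_permutations(5, 5))   # 120 (nPn = n!)
```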

Combination
The number of ways r objects can be chosen from a set of n objects without considering the order of selection is called the number of combinations of n objects taken r at a time, denoted by

$$ {}_{n}C_{r} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!} $$

E.g. 1. C(8,2)?
2. In a club containing 7 members, a committee of 3 people is to be formed. In how many ways can the committee be formed?

$$ {}_{7}C_{3} = \binom{7}{3} = \frac{7!}{3!\,(7-3)!} = 35 $$

Example:
Given the letters A, B, C, and D, list the permutations and combinations for selecting two letters.
Solution:
Permutation              Combination
AB  BA  CA  DA           AB  BC
AC  BC  CB  DB           AC  BD
AD  BD  CD  DC           AD  DC
➢ For combinations AB = BA, but not for permutations, since a permutation takes the order of arrangement into account.
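
The listing above can be reproduced with Python's itertools, and the counts with math.comb (available from Python 3.8). This is only an illustrative sketch; the variable names are not from the slides.

```python
import math
from itertools import combinations, permutations

letters = "ABCD"

# Ordered selections of two letters (AB and BA are different).
perms = ["".join(p) for p in permutations(letters, 2)]
# Unordered selections of two letters (AB and BA are the same).
combs = ["".join(c) for c in combinations(letters, 2)]

print(len(perms), perms)   # 12 permutations
print(len(combs), combs)   # 6 combinations: AB, AC, AD, BC, BD, CD

print(math.comb(8, 2))     # 28, the value of C(8,2)
print(math.comb(7, 3))     # 35, the committee example
```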

Random variable and Probability Distribution
Definition of random variables and probability distribution
A random variable is a variable whose values are determined by chance.

A probability distribution is a complete list of all possible values of a random variable and their corresponding probabilities.

Generally, random variables are denoted by capital letters X, Y, Z, … and the values of the random variables are denoted by small letters x, y, z.

1. Discrete random variables: have a finite number of possible values or an infinite number of values that can be counted.
➢ The word counted means that they can be enumerated using the numbers 1, 2, 3, etc.

2. Continuous random variables: can assume all values in the interval between any two given values, i.e. an infinite and uncountable set of values with no gaps.

Examples of discrete random variables:
• Toss a coin n times and count the number of heads.
• Number of experimental rats in a specific study.
• Number of defective items in a given company.
• Number of bacteria per two cubic centimeters of water.
Examples of continuous random variables:
• Height of students at a certain college.
• Marks of students.
• Lifetime (duration) of a certain disease.
• Length of time required to complete a given training.

The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of the random variable along with their respective probabilities.
✓ Since the values of a probability distribution are probabilities, they must be numbers in the interval from 0 to 1.
Example: Consider the experiment of tossing a coin three times. Let X be the number of heads. Construct the probability distribution of X, i.e. Pr(X = x) for x = 0, 1, 2, 3.
Pr(X = 0) = 1/8 ... TTT
Pr(X = 1) = 3/8 ... HTT, THT, TTH
Pr(X = 2) = 3/8 ... HHT, THH, HTH
Pr(X = 3) = 1/8 ... HHH
x        0    1    2    3
P(X=x)   1/8  3/8  3/8  1/8
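
The same distribution can be built by enumerating the eight equally likely outcomes. The sketch below assumes a fair coin; the names outcomes, counts and pmf are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 x 2 x 2 = 8 equally likely sequences of three tosses.
outcomes = list(product("HT", repeat=3))

# Count how many sequences give each number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value of X = number of heads.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p)   # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```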
Example 2:
Construct a probability distribution for rolling a single die.
Solution
Since the sample space is {1, 2, 3, 4, 5, 6} and each outcome has a probability of 1/6, the distribution is:

x        1    2    3    4    5    6
P(X=x)   1/6  1/6  1/6  1/6  1/6  1/6

Two requirements for a probability distribution
➢ The sum of the probabilities of all events in the sample space must be equal to 1; i.e. $\sum_i P(X = x_i) = 1$

➢ The probability of each event in the sample space must be between 0 and 1, inclusive; i.e. $0 \le P(X = x_i) \le 1$
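
The two requirements can be written as a small check. The function is_valid_pmf is a hypothetical helper; exact fractions are assumed so that the sum can be compared with 1 without rounding error.

```python
from fractions import Fraction

def is_valid_pmf(probabilities):
    """Return True if every probability is in [0, 1] and they sum to exactly 1."""
    probabilities = list(probabilities)
    in_range = all(0 <= p <= 1 for p in probabilities)
    return in_range and sum(probabilities) == 1

die = [Fraction(1, 6)] * 6                       # single fair die
not_summing = [Fraction(1, 2), Fraction(1, 3)]   # sums to 5/6
out_of_range = [Fraction(3, 2), Fraction(-1, 2)] # sums to 1, but values are invalid

print(is_valid_pmf(die))           # True
print(is_valid_pmf(not_summing))   # False
print(is_valid_pmf(out_of_range))  # False
```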

Properties of continuous probability distribution
1. The total area under the curve is one, i.e.

$$ \int_{-\infty}^{\infty} f(x)\,dx = 1 $$

2. P(a ≤ X ≤ b) = the area under the curve between the points a and b.
3. P(a ≤ X ≤ b) ≥ 0, since f(x) ≥ 0 for every x.
4. P(X = a) = 0
5. P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b), where

$$ P(a \le X \le b) = \int_a^b f(x)\,dx $$

Introduction to expectation
Definition: the expected value (also known as the mean) of a random variable is a measure of the central location of the random variable.
1. Discrete R.V

$$ E(X) = X_1 P(X_1) + X_2 P(X_2) + \dots + X_n P(X_n) = \sum_{i=1}^{n} X_i\,P(X_i) $$

2. Continuous R.V

$$ E(X) = \int_a^b x\,f(x)\,dx $$

Variance of a probability distribution
➢ The expected value of X is its mean: Mean of X = E(X)
➢ The variance of X is given by:

$$ \mathrm{Var}(X) = E(X^2) - (E(X))^2 $$

where

$$ E(X^2) = \sum_{i=1}^{n} X_i^2\,P(X_i) \quad \text{if X is discrete} $$

$$ E(X^2) = \int_x x^2 f(x)\,dx \quad \text{if X is continuous} $$

Example
Let X be a continuous R.V with distribution

$$ f(x) = \begin{cases} \tfrac{1}{2}x, & 0 \le x \le 2 \\ 0, & \text{otherwise} \end{cases} $$

Then find
a) P(1 < X < 1.5)
b) E(X)
c) Var(X)
d) E(3X^2 - 2X)
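
One way to check answers to this example is to integrate the polynomials exactly. The sketch below assumes the density f(x) = x/2 on [0, 2]; integrate_poly is a hypothetical helper written for this illustration, not a library function.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Integrate sum(coeffs[k] * x**k) exactly from a to b."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) / (k + 1) * (b ** (k + 1) - a ** (k + 1))
               for k, c in enumerate(coeffs))

half = Fraction(1, 2)
# Coefficients are listed from x**0 upward.
f = [0, half]             # f(x) = x/2
x_f = [0, 0, half]        # x * f(x)
x2_f = [0, 0, 0, half]    # x**2 * f(x)

prob = integrate_poly(f, 1, Fraction(3, 2))   # a) P(1 < X < 1.5)
mean = integrate_poly(x_f, 0, 2)              # b) E(X)
second_moment = integrate_poly(x2_f, 0, 2)    # E(X^2)
variance = second_moment - mean ** 2          # c) Var(X) = E(X^2) - (E(X))^2
e_g = 3 * second_moment - 2 * mean            # d) E(3X^2 - 2X), by linearity

print(prob, mean, variance, e_g)   # 5/16 4/3 2/9 10/3
```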

E.g. 2. Two dice are rolled. Let X be a random variable denoting the sum of the numbers on the two dice.
i) Give the probability distribution of X.
ii) Compute the expected value of X and its variance.
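
For the two-dice question, a sketch like the following enumerates the 36 equally likely pairs (assuming fair dice) and applies E(X) = Σ x·P(x) and Var(X) = E(X²) − (E(X))². The names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 ordered pairs give each sum.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# i) Probability distribution of X = sum of the two dice.
pmf = {s: Fraction(n, 36) for s, n in sorted(sums.items())}

# ii) Expected value and variance.
mean = sum(s * p for s, p in pmf.items())
variance = sum(s ** 2 * p for s, p in pmf.items()) - mean ** 2

for s, p in pmf.items():
    print(s, p)
print(mean, variance)   # 7 35/6
```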
Discrete probability distributions …….

✓ Bernoulli
✓ Binomial
✓ Poisson
✓ Negative binomial
✓ Geometric

1. Bernoulli Distribution:
• The random variable X takes two values, 1 or 0.
• Ω = {0, 1}, P(X = 1) = p, P(X = 0) = 1 − p
• Then, the probability function is:

$$ P(X = x) = p^{x}(1-p)^{1-x}, \quad x = 0, 1 $$

The prevalence of HIV infection is 11%. Let X be the HIV status of a randomly chosen person: X = 1 if HIV+; X = 0 if HIV-. Then X has a Bernoulli distribution.
• P(X = 1) = 0.11, P(X = 0) = 0.89.

2. Binomial Distribution
A binomial experiment is a probability experiment that satisfies the
following four requirements called assumptions of a binomial
distribution.

– The experiment consists of 'n' identical trials.

– Each trial has only one of the two possible mutually exclusive
outcomes, success or a failure.

– The probability of success does not change from trial to trial, and

– The trials are independent, thus we must sample with replacement

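Binomial distribution cont.
If X is the number of successes in n trials, each with success probability p and failure probability q = 1 − p, then

$$ P(X = x) = \binom{n}{x} p^{x} q^{\,n-x}, \quad x = 0, 1, \dots, n $$
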
When using the binomial formula to solve problems, we have to identify three things:
▪ The number of trials (n)
▪ The probability of a success on any one trial (p) and
▪ The number of successes desired (x)

➢ We call the distribution of the random variable X the binomial distribution and often write X ~ Binom(n, p).
➢ Remark: If X is a binomial random variable with parameters n and p, then E(X) = np and Var(X) = npq, where q = 1 − p.

Example: Suppose that an examination consists of six true-or-false questions, and assume that a student has no knowledge of the subject matter. The probability that the student will guess the correct answer to the first question is 30%. Likewise, the probability of guessing each of the remaining questions correctly is also 30%.
a) What is the probability of getting exactly three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most two correct answers?
d) What is the probability of getting fewer than five correct answers?
e) Find the expected value and the standard deviation.
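
The five parts can be worked out with the binomial probability function P(X = x) = C(n, x)·p^x·(1 − p)^(n − x), here with n = 6 and p = 0.30 as stated in the example. The sketch below is one possible way to do the arithmetic; binom_pmf is a hypothetical helper name.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.30

exactly_three = binom_pmf(3, n, p)                              # a) P(X = 3)
at_least_two = 1 - sum(binom_pmf(x, n, p) for x in range(2))    # b) 1 - P(0) - P(1)
at_most_two = sum(binom_pmf(x, n, p) for x in range(3))         # c) P(0) + P(1) + P(2)
less_than_five = sum(binom_pmf(x, n, p) for x in range(5))      # d) P(0) + ... + P(4)
mean = n * p                                                    # e) E(X) = np
std_dev = math.sqrt(n * p * (1 - p))                            #    SD = sqrt(npq)

print(round(exactly_three, 4))    # 0.1852
print(round(at_least_two, 4))     # 0.5798
print(round(at_most_two, 4))      # 0.7443
print(round(less_than_five, 4))   # 0.9891
print(round(mean, 2), round(std_dev, 4))   # 1.8 1.1225
```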

Exercise
1. Suppose 14 percent of mothers admitted to smoking one or more
cigarettes per day during pregnancy. If a random sample of size 10 is
selected from this population, what is the probability that it will contain
exactly four mothers who admitted to smoking during pregnancy?

2. Suppose that 80% of adults with allergies report symptomatic relief with a specific medication. If the medication is given to 10 new patients with allergies, what is the probability that it is effective in exactly seven? Assume that the replications are independent.

3. Poisson distribution
• The probability distribution of a Poisson random variable X, representing the number of successes occurring in a given time interval or a specified region of space, is given by the formula:

$$ P(X = k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}, \quad k = 0, 1, 2, \dots $$

Where
• k = the number of successes per unit time (0, 1, 2, …)
• e = the base of the natural logarithm
• λ = the expected number of successes per unit time

• If λ is the average number of successes occurring in a given time interval or region in the Poisson distribution, then the mean and the variance of the Poisson distribution are both equal to λ.
• It describes random events that occur rarely over a unit of time or space.

The following statements describe what is known as the Poisson process.
1. The occurrences of the events are independent. This means, the
occurrence of an event in an interval of space or time has no effect on
the probability of a second occurrence of the event in the same, or any
other, interval.
2. Theoretically, an infinite number of occurrences of the event must be
possible in the interval.
3. The probability of the single occurrence of the event in a given
interval is proportional to the length of the interval.
The Poisson distribution is used as a distribution of rare events, such as:
✓ Number of misprints.
✓ Natural disasters like earthquakes.
✓ Accidents.
✓ Hereditary.
✓ Arrivals.

➢ The Poisson distribution differs from the binomial distribution in these fundamental ways:
➢ The binomial distribution is affected by the sample size n and the probability p, whereas the Poisson distribution is affected only by the mean μ = λ.
➢ In a binomial distribution the possible values of the random variable X are 0, 1, …, n, but a Poisson distribution has possible x values of 0, 1, …, with no upper limit.
Note: The mean and variance of a Poisson distributed variable are given by μ = λ and σ² = λ, respectively.

Example:
In a study of drug-induced anaphylaxis among patients taking rocuronium
bromide as part of their anesthesia, the occurrence of anaphylaxis followed a
Poisson distribution with λ =12 incidents per year in Norway. Find the
probability that in the next year, among patients receiving rocuronium,
a. exactly three will experience anaphylaxis.
b. At least two will experience anaphylaxis

Solution
λ = 12 incidents per year

a. $P(X = 3) = \dfrac{e^{-12}\,12^{3}}{3!} = 0.00177$

b. $P(X \ge 2) = 1 - P(X < 2) = 1 - \left[\dfrac{e^{-12}\,12^{0}}{0!} + \dfrac{e^{-12}\,12^{1}}{1!}\right] = 1 - 13e^{-12} \approx 0.9999$
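
The two probabilities can be evaluated numerically from the Poisson formula P(X = k) = e^(−λ)·λ^k / k! with λ = 12. This is a minimal sketch; poisson_pmf is a hypothetical helper name.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 12   # incidents per year, from the example

p_exactly_three = poisson_pmf(3, lam)
# "At least two" is the complement of "zero or one".
p_at_least_two = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))

print(round(p_exactly_three, 5))   # 0.00177
print(round(p_at_least_two, 5))    # 0.99992
```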

Exercise
In a certain population an average of 13 new cases of esophageal cancer are
diagnosed each year. If the annual incidence of esophageal cancer follows a
Poisson distribution, find the probability that in a given year the number of
newly diagnosed cases of esophageal cancer will be:
A. Exactly 10 cases
B. At least three cases
C. No more than 3
D. Between nine and 12, inclusive
E. Fewer than two

4. Negative Binomial Distribution
➢ Consider a Bernoulli trial with two outcomes: S (success) and F (failure).
➢ Repeat this trial identically and independently.
➢ Count the number of trials until the kth success is observed.
➢ If repeated independent trials can result in a success with probability p and a failure with probability q = 1 − p, then the number of trials X needed to observe the kth success has a negative binomial distribution with discrete probability function given by:

$$ P(X = x) = \binom{x-1}{k-1} p^{k} q^{\,x-k}, \quad x = k, k+1, k+2, \dots $$

➢ where p is the success probability in each trial.
➢ We have that $E(X) = \dfrac{k}{p}$ and $\mathrm{Var}(X) = \dfrac{kq}{p^{2}}$
➢ Used widely for count data whose distribution is over-dispersed, with the variance greater than the mean.

5. Geometric probability distribution
➢ Gives the probability that the first success occurs on the xth trial.
➢ It is the special case of the negative binomial distribution with k = 1.
➢ A discrete random variable X has a geometric probability distribution if the discrete probability function is given by:

$$ P(X = x) = p\,q^{\,x-1}, \quad x = 1, 2, 3, \dots $$

We have that

$$ \mu = \frac{1}{p} \quad \text{and} \quad \sigma^{2} = \frac{q}{p^{2}} $$

Example:
 Tossing a balanced coin until Head appears
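
For the balanced-coin example (p = 0.5), the sketch below assumes the standard form of the negative binomial probability function, P(X = x) = C(x − 1, k − 1)·p^k·q^(x − k), with the geometric distribution as the case k = 1. The function names are hypothetical, and the infinite sums for the mean and variance are truncated at 199 trials, where the remaining probability is negligible.

```python
import math

def neg_binom_pmf(x, k, p):
    """P(the k-th success occurs on trial x), for x = k, k+1, ..."""
    return math.comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

def geometric_pmf(x, p):
    """Geometric distribution: negative binomial with k = 1."""
    return neg_binom_pmf(x, 1, p)

p = 0.5                 # balanced coin, success = head
xs = range(1, 200)      # truncated support

mean = sum(x * geometric_pmf(x, p) for x in xs)
variance = sum(x ** 2 * geometric_pmf(x, p) for x in xs) - mean ** 2

print(geometric_pmf(1, p))   # 0.5, head on the first toss
print(geometric_pmf(3, p))   # 0.125, first head on the third toss
print(round(mean, 6), round(variance, 6))   # 2.0 2.0, i.e. 1/p and q/p^2
```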
Continuous probability distributions
• If a random variable is a continuous variable, its probability distribution is called a continuous probability distribution.
• A continuous probability distribution differs from a discrete probability distribution in that the outcome of the random variable is not limited to categories or counts.
➢ Some common continuous probability distributions:
✓ Normal distribution
✓ Chi-square distribution
✓ Student’s t-distribution

1. Normal distribution
❑ The normal distribution refers to a family of continuous probability distributions described by the normal equation:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty $$

Where;
• X, the random variable in the normal equation, is called the normal random variable,
• e = 2.7183 and π = 3.1416,
• μ and σ are the population mean and standard deviation.

Characteristics of Normal Distribution
• It links frequency distribution to probability distribution
• Has a Bell Shape Curve and is Symmetric
• It is Symmetric around the mean: Two halves of the curve are the
same (mirror images)
• Hence Mean = Median=mode
• The total area under the curve is 1 (or 100%)
• Normal Distribution has the same shape as Standard normal
distribution

Normal Curve
➢ The graph of the normal distribution depends on two factors:
✓ the mean and the standard deviation.
➢ The mean of the distribution determines the location of the center of the graph, and the standard deviation determines the height and width of the graph.
➢ When the standard deviation is large, the curve is short and wide; when the standard deviation is small, the curve is tall and narrow.
➢ All normal distributions look like a symmetric, bell-shaped curve.

Standard Normal Distribution
• It makes life a lot easier for us if we standardize our normal curve to have a mean of zero and a standard deviation of 1 unit.
• We can transform all the observations of any normal random variable X with mean μ and variance σ² to a new set of observations of another normal random variable Z with mean 0 and variance 1 using the following transformation:

$$ Z = \frac{X - \mu}{\sigma} $$

➢ About 68% of the area under the curve falls within 1 standard deviation of the mean.
➢ About 95% of the area under the curve falls within 2 standard deviations of the mean.
➢ About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
➢ [Figure: graph of the standardized normal curve, with mean 0 and variance 1.]
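
These three percentages can be verified with the NormalDist class from Python's statistics module (Python 3.8 or later). The dictionary name within is illustrative.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))   # 1 0.6827, 2 0.9545, 3 0.9973
```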

Probability and Normal Distributions
❑ We know that the area under any normal curve is 1 unit
➢ Therefore, we can link these areas with probability
i.e. if a random variable, x, is normally distributed, the probability that x will fall in
a given interval is the area under the normal curve for that interval.
➢ Or P(a < x < b) = area under the curve between a and b.

• There is no probability attached to any single value of x


• The probability for specific value is equal to zero, that is, P(x = a) = 0
Probability and Normal Distribution
• For the solution of problems using the standard normal
distribution, a two-step process is recommended
Step 1: Draw the normal distribution curve and shade the area
Step 2: Find the appropriate figure in the Procedure Table and
follow the directions given.

Table of normal distribution
Example 1: Suppose we want to compute the area under the standard normal curve to the left of z = 1.45.
• This area is the probability P(Z < 1.45).
• The probability can be read from the standard normal table by combining the value 1.4 in the first column with 0.05 in the first row.
• The shaded area in the diagram is the area to the left of the point 1.45 standard deviations above the mean.
• The area of this shaded portion is 0.9265 (or 92.65% of the total area under the curve).

Example:
Find the area to the left of z = 2.06
Solution
Step 1: Draw the figure.
Step 2: We are looking for the area under the standard normal distribution to the left of z = 2.06. It is 0.9803. Hence, 98.03% of the area is less than z = 2.06.
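
Left-tail areas such as these can also be computed instead of read from a table. The sketch below uses statistics.NormalDist (Python 3.8 or later) for z = 1.45 and z = 2.06; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

area_145 = std_normal.cdf(1.45)   # area to the left of z = 1.45
area_206 = std_normal.cdf(2.06)   # area to the left of z = 2.06

print(round(area_145, 4))   # 0.9265
print(round(area_206, 4))   # 0.9803
```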

Find the area between z = 1.68 and z = -1.37.
Solution
Step 1: Draw the figure as shown.

Step 2: Since the area desired is between two given z values, look up the areas corresponding to the two z values and subtract the smaller area from the larger area. (Do not subtract the z values.) The area for z = 1.68 is 0.9535, and the area for z = -1.37 is 0.0853. The area between the two z values is 0.9535 - 0.0853 = 0.8682 or 86.82%.
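
The same subtraction of cumulative areas can be sketched in Python; area_between is a hypothetical helper built on statistics.NormalDist.

```python
from statistics import NormalDist

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    std_normal = NormalDist()
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

area = area_between(-1.37, 1.68)
print(round(area, 4))   # 0.8682
```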
Example:
For subject A, a 27-year-old female, the ammonia concentration in parts per billion (ppb) followed a normal distribution over 30 days with mean 491 and standard deviation 119. What is the probability that on a random day the subject’s ammonia concentration is between 292 and 649 ppb?
Solution:
We find the z value corresponding to an x of 292 by

$$ z = \frac{292 - 491}{119} = -1.67 $$

and, similarly, for an x of 649

$$ z = \frac{649 - 491}{119} = 1.33 $$

From the standard normal table, the areas to the left of z = 1.33 and z = -1.67 are 0.9082 and 0.0475. The area desired is the difference between these, 0.9082 - 0.0475 = 0.8607.
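
A sketch of the same calculation in Python follows. It standardizes both limits with z = (x − μ)/σ and also computes the probability directly from the unrounded normal distribution, so the result can differ from the table answer in the fourth decimal place because the table uses z values rounded to two decimals.

```python
from statistics import NormalDist

mu, sigma = 491, 119      # mean and standard deviation in ppb, from the example
low, high = 292, 649

# Standardize the two limits: z = (x - mu) / sigma.
z_low = (low - mu) / sigma
z_high = (high - mu) / sigma

# Probability computed directly from the N(491, 119) distribution.
ammonia = NormalDist(mu, sigma)
prob = ammonia.cdf(high) - ammonia.cdf(low)

print(round(z_low, 2), round(z_high, 2))   # -1.67 1.33
print(round(prob, 2))                      # 0.86
```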
Exercise:
1. For another subject (a 29-year-old male), the acetone levels were
normally distributed with a mean of 870 and a standard deviation of 211
ppb. Find the probability that on a given day the subject’s acetone level is:
a. Between 600 and 1000 ppb
b. Over 900 ppb
c. Under 500 ppb
d. Between 900 and 1100 ppb

2. If the total cholesterol values for a certain population are approximately normally distributed with a mean of 200 mg/100 ml and a standard deviation of 20 mg/100 ml, find the probability that an individual picked at random from this population will have a cholesterol value:
a. Between 180 and 200 mg/100 ml
b. Greater than 225 mg/100 ml
c. Less than 150 mg/100 ml
d. Between 190 and 210 mg/100 ml

2. Student t-distribution
• It is often the case that one wants to calculate the size of sample
needed to obtain a certain level of confidence in survey results
• Unfortunately, this calculation requires prior knowledge of
the population standard deviation σ.
• Realistically, σ is unknown
• Often a preliminary sample will be conducted so that a reasonable estimate of this critical population parameter can be made.
• If such a preliminary sample is not made, but confidence intervals for the population mean are to be constructed using an unknown σ, then the distribution known as the Student t distribution can be used.

Student’s t-distribution cont…
➢ Suppose we have a simple random sample of size n drawn from a Normal population with mean μ and standard deviation σ. Let us denote the sample mean by $\bar{x}$ and the sample standard deviation by s; then the quantity:

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$

has a t distribution with n-1 degrees of freedom.

The degrees of freedom are the number of values that are free to vary
after a sample statistic has been computed.

Some properties of the t-distribution are:
The t distribution shares some characteristics of the normal distribution and differs from it in others.
The t distribution is similar to the standard normal distribution in these ways:
1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. It converges to the normal distribution as the sample size gets large.
5. The curve never touches the x-axis.

The t distribution differs from the standard normal distribution in the following ways:
▪ The variance is greater than 1.
▪ The t distribution is actually a family of curves based on the concept of degrees of freedom, which is related to sample size.
▪ As the sample size increases, the t distribution approaches the standard normal distribution.

Assumptions of Student’s t-distribution
❑ The parent population from which the sample is drawn is normal.
❑ The sample observations are independent; that is, the sample is random.
❑ The population standard deviation (σ) is unknown.
❑ The sample size n is small; that is, n < 30.

Student’s t Distribution
➢ The t distribution has a (slightly) different shape for each possible sample size.

What happens as sample gets larger?
[Figure: T-distribution and Standard Normal Z distribution. Density curves of the Z distribution and of T with 60 d.f.; x-axis: Value (-5 to 5), y-axis: density (0.0 to 0.4).]

As the df gets larger, the Student’s t-distribution looks more and more like the SND with mean = 0 and variance = 1.
Student’s t Table
[Figure: Student’s t table, annotated "Look up α" and "Look up df".]
The body of the table contains t values, not probabilities.
Note: the values tabled for df = ∞ are the same values for the standard normal distribution, zα.

3. Chi-square distribution
➢ The chi-squared distribution with v degrees of freedom is the distribution of a random variable that is the sum of the squares of v independent standard normal random variables.
➢ A continuous random variable Y (Y ~ χ²(v)) has a chi-square distribution with v degrees of freedom if the density function is given by

$$ f(y) = \frac{1}{2^{v/2}\,\Gamma(v/2)}\; y^{\,v/2-1}\, e^{-y/2}, \quad y > 0 $$

where:
v = n-1 is the degree of freedom and we have that μ = v and σ² = 2v.

Properties of Chi-square distribution
➢ The exact shape of the distribution depends upon the number of degrees of freedom v.
➢ The mean and variance of the χ² distribution are v and 2v respectively.
➢ Chi-square values are never negative, and the chi-square curve is positively skewed.
➢ As v → ∞, the χ² distribution approaches a normal distribution.
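
The definition of a chi-square variable as a sum of squared standard normal values can be illustrated by simulation. The sketch below is only a rough check under assumed settings (v = 5 degrees of freedom, 20,000 draws, fixed seed); chi_square_draw is a hypothetical helper, and the sample mean and variance should come out close to v and 2v rather than exactly equal.

```python
import random
import statistics

def chi_square_draw(v, rng):
    """One draw: the sum of squares of v independent standard normal values."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(v))

rng = random.Random(0)   # fixed seed so the run is repeatable
v = 5                    # degrees of freedom (assumed for this illustration)

draws = [chi_square_draw(v, rng) for _ in range(20000)]
sample_mean = statistics.mean(draws)
sample_var = statistics.variance(draws)

print(round(sample_mean, 2))   # close to v = 5
print(round(sample_var, 2))    # close to 2v = 10
print(min(draws) >= 0)         # True: values are never negative
```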
