BPT-Probability-Binomial Distribution, Poisson Distribution, Normal Distribution and Chi Square Test


Binomial distribution

Poisson’s distribution

Ninganagouda.P
Lecturer in Biostatistics,
Department of Community Medicine,
SIMS&RH, Tumkur

Normal distribution
Definition of probability:
Probability forms the foundation of all decision making and
statistical reasoning. It is a numerical measure of the likelihood of
an event occurring. If 'S' is the sample space of a random
experiment with equally likely outcomes and E ⊆ S is an event, then
the probability of the event is defined as
$P(E) = \frac{\text{Number of outcomes favourable to } E}{\text{Number of all possible outcomes}} = \frac{n(E)}{n(S)}$
Basic Definitions of Probability:
• Experiment: An operation that produces well-defined outcomes is
known as an experiment.
Example: Tossing a coin, which gives either a head or a tail, is an
experiment.
• Random experiment: If, in each trial of an experiment conducted under
identical conditions, the outcome is not always the same but may
be any one of the possible outcomes, then the experiment is called
a random experiment. It is denoted by (r.e.).
Example: When a coin is tossed, it is not certain whether a head or a
tail will turn up, so tossing a coin is a random experiment.
• Sample Space: The set of all possible outcomes of a random
experiment is called the sample space and is denoted by 'S'.
Example: If two coins are tossed together, the possible outcomes of
the experiment are HH, HT, TH, TT, so the sample space is
S = {HH, TH, HT, TT}
• Event: Any subset of a sample space is called an event.
Example: Getting a head when a coin is tossed is an event.
• Trial: Performing an experiment once is called a trial.
Example: Tossing a coin once, throwing a die once.
• Outcome: A possible result of a single trial of an experiment.
Example: A head is one outcome of a single toss of a coin.
Problem 1. In a single throw of two dice, find the probability of
getting a total of 5 or 8.
Solution: The sample space is
S = {(x, y) : x, y ∈ {1, 2, 3, 4, 5, 6}}
and the set of favourable cases is
E = {(1,4), (4,1), (2,3), (3,2), (3,5), (5,3), (2,6), (6,2), (4,4)}
Now, we have
n(S) = 6 × 6 = 36 (or) $6^n = 6^2 = 36$, and n(E) = 9
Therefore, P(a total of 5 or 8) = $\frac{n(E)}{n(S)} = \frac{9}{36} = \frac{1}{4}$
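The counting in Problem 1 can be checked by listing every outcome. The sketch below is only an illustration; the variable names are chosen here and are not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (x, y) from a throw of two six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event E: the two faces add up to 5 or 8.
event = [pair for pair in sample_space if sum(pair) in (5, 8)]

# Classical definition: P(E) = n(E) / n(S).
probability = Fraction(len(event), len(sample_space))
print(len(sample_space), len(event), probability)
```

The enumeration finds nine favourable pairs, because (4, 4) also gives a total of 8, so the probability is 9/36 = 1/4.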
Binomial distribution
(OR)
Bernoulli distribution

The binomial distribution is due to the Swiss mathematician Jakob Bernoulli.
In a proof published posthumously in 1713, he determined that the probability of
k successes in n repetitions is equal to the kth term (counting from k = 0) in the
expansion of the binomial expression $(q + p)^n$.
The binomial distribution is a discrete probability distribution that gives the
likelihood of each possible number of successes in a fixed number of
independent trials, where every trial has only two possible outcomes. The
binomial distribution function is calculated as:
$P(r) = {}^{n}C_{r}\, q^{\,n-r}\, p^{\,r}$
Where:
'n' is the number of trials (occurrences)
'r' is the number of successful trials
'p' is the probability of success in a single trial
'q' is the probability of failure in a single trial, q = 1 − p
${}^{n}C_{r}$ is the number of combinations of n objects taken r at a time, that is,
the number of ways to choose a sample of r elements from a set of n distinct
objects where order does not matter and replacements are not allowed. Note
that ${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$, where ! is the factorial (so, 4! = 4 × 3 × 2 × 1).
Binomial Distribution Mean and Variance:
For a binomial distribution, the mean, variance and standard
deviation of the number of successes are given by
the formulas
Mean (μ) = np
Variance (σ²) = npq
Standard Deviation (σ) = $\sqrt{npq}$
where p is the probability of success and q is the probability of
failure, q = 1 − p.
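As a rough numerical check of the binomial formula and of the mean and variance formulas, the sketch below builds the whole distribution for one set of parameters. The values n = 10 and p = 0.3 are hypothetical, chosen only for illustration, and the function name `binomial_pmf` is not from the slides.

```python
from math import comb, sqrt

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(r) = nCr * q^(n-r) * p^r, with q = 1 - p."""
    q = 1.0 - p
    return comb(n, r) * q ** (n - r) * p ** r

n, p = 10, 0.3  # hypothetical values, for illustration only
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# The probabilities over r = 0..n should add up to 1.
total = sum(pmf)
# Mean and variance computed directly from the distribution.
mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.4f}, np = {n * p:.4f}")
print(f"variance = {variance:.4f}, npq = {n * p * (1 - p):.4f}")
print(f"standard deviation = {sqrt(variance):.4f}")
```

The directly computed mean and variance should agree with np and npq up to floating-point rounding.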
Application of Binomial Distribution
The binomial distribution gives the probability of each possible number
of successes in a set of trials. In real life, the concept of the binomial
distribution is used for:
• Finding the number of failures in a sample.
• Finding the quantity of raw and used materials while making a
product.
• Taking a survey of positive and negative reviews from the public for
any specific medicine, product or place.
• Analysing the results of a YES/NO survey.
• Finding the number of male and female students in an institution.
• Counting the votes collected by a candidate in an election, where
each vote is recorded as 0 or 1.
Properties of Binomial Distribution
The properties of the binomial distribution are:
• There are only two distinct possible outcomes: true/false,
success/failure, yes/no.
• There is a fixed number 'n' of repeated trials in a given
experiment.
• The probability of success (and of failure) remains constant for each
attempt/trial.
• Only the number of successes out of the 'n' independent
trials is counted.
• Every trial is independent, which means that the
outcome of one trial has no effect on the outcome of another trial.
Problem 2. A die is thrown 6 times. If "getting an odd number"
is considered a success, what is the probability of
(i) 5 successes?
(ii) At least 5 successes?
(iii) At most 5 successes?
Solution: The sample space of a single throw is
S = {1, 2, 3, 4, 5, 6} and n(S) = 6
Let E be the event of getting an odd number, then
E = {1, 3, 5} and n(E) = 3
Probability of success in a single throw = p = $\frac{n(E)}{n(S)} = \frac{3}{6} = \frac{1}{2}$
Probability of failure = q = 1 − p = $1 - \frac{1}{2} = \frac{1}{2}$
The die is thrown six times, so n = 6.
(i) We know the formula $P(r) = {}^{n}C_{r}\, q^{\,n-r}\, p^{\,r}$

Therefore, $P(x=5) = {}^{6}C_{5} \left(\frac{1}{2}\right)^{6-5} \left(\frac{1}{2}\right)^{5} = 6 \times \frac{1}{2} \times \left(\frac{1}{2}\right)^{5} = \frac{6}{64} = \frac{3}{32}$

(ii) P(At least 5 successes) = P(x=5) + P(x=6)
$= \frac{3}{32} + {}^{6}C_{6} \left(\frac{1}{2}\right)^{6-6} \left(\frac{1}{2}\right)^{6}$
$= \frac{3}{32} + 1 \times 1 \times \left(\frac{1}{2}\right)^{6}$
$= \frac{3}{32} + \frac{1}{64} = \frac{7}{64}$

(iii) P(At most 5 successes) = 1 − P(x=6)
$= 1 - \frac{1}{64} = \frac{64 - 1}{64} = \frac{63}{64}$
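The three answers to Problem 2 can be reproduced with exact fractions. This is a minimal sketch; the helper `binomial_pmf` is a name introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """Exact P(r) = nCr * q^(n-r) * p^r using rational arithmetic."""
    q = 1 - p
    return comb(n, r) * q ** (n - r) * p ** r

n = 6
p = Fraction(1, 2)  # probability of an odd number in one throw of a fair die

exactly_5 = binomial_pmf(5, n, p)
at_least_5 = binomial_pmf(5, n, p) + binomial_pmf(6, n, p)
at_most_5 = 1 - binomial_pmf(6, n, p)

print(exactly_5, at_least_5, at_most_5)
```

Working with `Fraction` avoids rounding, so the results can be compared directly with 3/32, 7/64 and 63/64.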
Problem 3. Find the binomial distribution of getting a six in three
throws of an unbiased die.
Solution: Left as homework (HW).
POISSON’S DISTRIBUTION
A Poisson distribution is a discrete probability distribution,
meaning that it gives the probability of a discrete (i.e., countable)
outcome. For Poisson distributions, the discrete outcome is the
number of times an event occurs, represented by k or x.
The Poisson distribution was developed by the French
mathematician Siméon Denis Poisson in 1837.
Poisson distribution formula:
The probability mass function (pmf) of the Poisson distribution is:

$P(X = k) = \frac{e^{-\lambda}\, \lambda^{k}}{k!}$
Where:
• X is a random variable following a Poisson distribution
• k is the number of times an event occurs (k = 0, 1, 2, ...)
• P(X = k) is the probability that an event will occur k times
• e is Euler's number (approximately 2.7183)
• λ (lambda) is the average number of times an event occurs in the given interval
• ! is the factorial function
Poisson distribution Mean, SD and Variance:
The Poisson distribution has only one parameter, called λ.
• The mean of a Poisson distribution is λ.
• The variance of a Poisson distribution is λ.
• The standard deviation (SD) of a Poisson distribution is $\sqrt{\lambda}$.

In most distributions, the mean is represented by µ (mu) and
the variance is represented by σ² (sigma squared). Because these
two quantities are equal in a Poisson distribution, we use the λ
symbol to represent both.
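The Poisson formula, and the statement that the mean and the variance both equal λ, can be checked numerically. In the sketch below, λ = 4 is a hypothetical value and the infinite range of k is cut off at 60 terms, which is an assumption that works because the remaining tail is negligible for this λ.

```python
from math import exp, factorial, sqrt

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-lambda) * lambda^k / k!"""
    return exp(-lam) * lam ** k / factorial(k)

lam = 4.0  # hypothetical average number of events per interval

# k has no upper limit, so truncate where the tail is negligible.
ks = range(60)
pmf = [poisson_pmf(k, lam) for k in ks]

total = sum(pmf)
mean = sum(k * pr for k, pr in zip(ks, pmf))
variance = sum((k - mean) ** 2 * pr for k, pr in zip(ks, pmf))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}, SD = {sqrt(variance):.4f}")
```

Both the mean and the variance should come out at about 4, and the standard deviation at about 2.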
[Figure: examples of Poisson distributions with different values of λ,
the mean number of events.]
Examples of Poisson distributions:
Poisson distributions have been used to describe many different things. For
example, a Poisson distribution could be used to explain or predict:
• Text messages per hour.
• The number of defects in an item, in statistical quality control.
• The number of bacteria, in biology.
• Machine malfunctions per year.
• The number of errors per page in typed material.
• The number of deaths at a particular crossing in a town as a result of
road accidents.
• Website visitors per month.
• Dengue cases per year.
• The number of incoming telephone calls in a town.
Assumptions of Poisson Distribution:
• The probability of an occurrence is constant over time. In other
words, the rate does not change based on time.
• Occurrences are independent. In other words, the occurrence of one
event does not affect the occurrence of a subsequent event.
• It is a discrete probability distribution.
• There is no upper limit to the number of occurrences of an event
during the specified time interval.
• The probability of a single occurrence of an event within a short
time period is proportional to the length of the time period.
• It is a positively skewed distribution.
Problem 4. It is given that 2% of the screws manufactured by a
company are defective. Use the Poisson distribution to find the
probability that a packet of 100 screws contains:
(a) No defective screw
(b) One defective screw
(c) Two or more defective screws (Given: $e^{-2} = 0.135$)
Solution: Let p = probability of a defective screw = 2% = $\frac{2}{100}$ = 0.02

n = 100
m = n·p = $100 \times \frac{2}{100}$ = 2, so λ = m = 2

(a) P(no defective screw)
$P(X = 0) = \frac{e^{-\lambda}\, \lambda^{k}}{k!} = \frac{e^{-2}\, 2^{0}}{0!} = 0.135$
(b) P(one defective screw)
$P(X = 1) = \frac{e^{-\lambda}\, \lambda^{k}}{k!} = \frac{e^{-2}\, 2^{1}}{1!} = e^{-2} \times 2 = 0.135 \times 2 = 0.27$

(c) P(two or more defective screws)
= 1 − [P(0) + P(1)]
= 1 − [0.135 + 0.27]
= 1 − 0.405
= 0.595
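The arithmetic in Problem 4 can be repeated without rounding $e^{-2}$ to 0.135. This is only a sketch; `poisson_pmf` is a helper name introduced here.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lambda) * lambda^k / k!"""
    return exp(-lam) * lam ** k / factorial(k)

n, p = 100, 0.02   # packet size and proportion of defective screws
lam = n * p        # mean number of defective screws per packet

p_none = poisson_pmf(0, lam)
p_one = poisson_pmf(1, lam)
p_two_or_more = 1 - (p_none + p_one)

print(round(p_none, 4), round(p_one, 4), round(p_two_or_more, 4))
```

With the unrounded value of $e^{-2}$ the last probability is about 0.594; the slide's 0.595 comes from using the given value 0.135.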
Problem 5. Only 6 students attended the class today. Taking this as
the average attendance, find the probability that exactly 7 students
attend the class tomorrow. (Given λ = 6, k = 7)
Solution: Left as homework (HW).
Problem 6: If new cases of dengue in a town are occurring at a rate
of about 5 per month, what is the probability that exactly 10
cases will occur in the next 3 months?

Solution: The expected number of cases in 3 months is λ = 5 × 3 = 15, so
$P(X = 10 \text{ in 3 months}) = \frac{e^{-(5 \times 3)}\, (5 \times 3)^{10}}{10!} = \,?$
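The expression in Problem 6 can be evaluated numerically as follows. This is a sketch under the problem's own assumption that cases arrive at a constant rate of 5 per month; the variable names are chosen here.

```python
from math import exp, factorial

rate_per_month = 5
months = 3
lam = rate_per_month * months  # expected number of cases in 3 months
k = 10                         # number of cases of interest

# P(X = k) = e^(-lambda) * lambda^k / k!
probability = exp(-lam) * lam ** k / factorial(k)
print(f"P(X = {k}) = {probability:.4f}")
```

The result is a little under 0.05, that is, roughly a 5% chance of exactly 10 cases.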
Topic: Z – test or Normal distribution
With Skewness and Kurtosis
&
Chi – square test
Z – test or Normal distribution
What is Normal Distribution?
Normal distribution, also known as the Gaussian
distribution, is a probability distribution that is symmetric
about the mean, showing that data near the mean are more
frequent in occurrence than data far from the mean. In
graph form, normal distribution will appear as a bell curve.
If 'X' is a continuous random variable with probability
density function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \quad -\infty < x < \infty$
then 'X' is a normal variate and the distribution of
'X' is called the normal distribution, where
μ (mu) is the mean and
σ (sigma) is the standard deviation of the distribution.
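The bell shape and the symmetry about the mean can be illustrated with the usual normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$. In the sketch below, μ = 50 and σ = 10 are hypothetical values, and the area under the curve is only approximated with a simple midpoint rule.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution with mean mu and SD sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 50.0, 10.0  # hypothetical mean and standard deviation

# Symmetry: points equally far from the mean have the same density.
left = normal_pdf(mu - 5, mu, sigma)
right = normal_pdf(mu + 5, mu, sigma)
# The density is highest at the mean.
peak = normal_pdf(mu, mu, sigma)

# Midpoint-rule approximation of the total area over mu +/- 8 sigma.
steps = 10_000
a, b = mu - 8 * sigma, mu + 8 * sigma
width = (b - a) / steps
area = sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

print(f"f(mu-5) = {left:.5f}, f(mu+5) = {right:.5f}, f(mu) = {peak:.5f}")
print(f"approximate area under the curve = {area:.6f}")
```

The two side values match, the value at the mean is the largest, and the area is close to 1, as expected for a probability density.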
Skewness:
• Skewness refers to a departure from the symmetry of a bell curve in a data set. If the curve
is shifted to the left or to the right, it is said to be skewed.
• A probability distribution can be positively skewed (or right-skewed), symmetric, or
negatively skewed (or left-skewed).
• The data on the right side of the curve may taper differently from the data on the left.
• These taperings are known as "tails." A negative skew refers to a longer tail on the left side of
the distribution, while a positive skew refers to a longer tail on the right.
• The mean of positively skewed data will be greater than the median, but in the case of
negatively skewed data, the mean will be less than the median.
• The mode of positively skewed data will be less than the median and mean, whereas for
negatively skewed data, the mode will be greater than the median and mean.
• For a symmetric distribution, mean, median and mode are all the same.

Negatively skewed: Mean < Median < Mode. Positively skewed: Mode < Median < Mean.
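The ordering of mean, median and mode can be seen on a small data set. The two samples below are hypothetical, made up only to have a long tail on one side; the second is the mirror image of the first.

```python
from statistics import mean, median, mode

# Hypothetical sample with a long tail on the right (positively skewed).
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9]
# Mirror image of the same sample: long tail on the left (negatively skewed).
left_skewed = [10 - x for x in right_skewed]

def summary(data):
    """Return (mean, median, mode) of the data."""
    return mean(data), median(data), mode(data)

r_mean, r_median, r_mode = summary(right_skewed)
l_mean, l_median, l_mode = summary(left_skewed)

print(f"right-skewed: mean={r_mean:.2f}, median={r_median}, mode={r_mode}")
print(f"left-skewed:  mean={l_mean:.2f}, median={l_median}, mode={l_mode}")
```

For the right-skewed sample the mean is the largest of the three and the mode the smallest; for the mirrored sample the order is reversed.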
Kurtosis:
• Like skewness, kurtosis is a statistical measure that is used to describe distributions.
Kurtosis is defined as a measure of 'peakedness' and is generally measured relative to
normal distributions.
• There are three categories of kurtosis that can be displayed by a set of data:
• Mesokurtic: the distribution has kurtosis statistics similar to that of a normal
distribution.
• Leptokurtic: the distribution displays greater kurtosis than a mesokurtic distribution.
• Platykurtic: the distribution displays less kurtosis than a mesokurtic distribution.
Let me summarize Skewness and Kurtosis as below:
1. Skewness gives the direction and the magnitude of the lack of symmetry whereas Kurtosis
gives the idea of flatness of distribution.
2. In Symmetric distribution with no Skewness, mean, median and mode coincide.
3. Positively skewed distribution has mean > median > mode while a negatively skewed
distribution has mean < median < mode.
4. Measures of skewness can be absolute as well as relative, but an absolute measure of
skewness cannot be used for purposes of comparison of distributions.
5. Degree of kurtosis of a distribution is measured relative to that of a normal curve. The
curves with a greater peak than the normal curve are called “Leptokurtic”. The curves
which are flatter than the normal curve are called “Platykurtic”. The normal curve is
called “Mesokurtic.”
6. In business and economics, measures of variation have larger practical applications than
measures of skewness. However, in medical and life sciences measures of skewness have
larger practical applications than the variance.
Topic: Chi square test (χ²)
Applications of chi square test:
The important applications of chi square distribution(test) are
1). Test of homogeneity.
2). To test “ goodness of fit ” of a theoretical distribution to an
observed distribution.
3). To test independence of attributes in a 2 x 2 contingency table.
1. Test of homogeneity
This test can also be used to test whether the occurrence of
events is uniform or not, e.g. whether the admission of patients to a
government hospital is uniform across all days of the week can be
tested with the help of the chi square test.
If the calculated chi square value is less than the chi square table value,
then the null hypothesis is accepted, and it can be concluded that there is
uniformity in the occurrence of the events.
2. Goodness of fit:
• This test enables us to see how well an assumed theoretical
distribution (such as the Binomial distribution or the Normal distribution)
fits the observed data.
• The chi square test formula for goodness of fit is:
$\chi^{2} = \sum_{i} \frac{(O_i - E_i)^{2}}{E_i}$, where $O_i$ = Observed frequency
and $E_i$ = Expected frequency
• If χ² (calculated) > χ² (tabulated) with (n − 1) degrees of freedom, where n is
the number of classes, then the null hypothesis is rejected, otherwise it is accepted.
(For a contingency table with r rows and c columns, df = (c − 1) × (r − 1).)
• If the null hypothesis is accepted, then it can be concluded that the
given distribution follows the theoretical distribution.
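The test of uniformity described above can be sketched with the usual statistic χ² = Σ (O − E)² / E. The daily admission counts below are hypothetical, made up for illustration, and 12.592 is the standard chi square table value for 6 degrees of freedom at the 5% level of significance.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical hospital admissions for Monday to Sunday.
observed = [18, 22, 25, 20, 19, 21, 15]

# Under the null hypothesis of uniformity, every day has the same expected count.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1          # (n - 1) degrees of freedom
CRITICAL_5_PERCENT_DF6 = 12.592  # table value for df = 6 at the 5% level

reject_null = chi2 > CRITICAL_5_PERCENT_DF6
print(f"chi square = {chi2:.3f}, df = {df}, reject null hypothesis: {reject_null}")
```

Here the calculated value is well below the table value, so for these made-up counts the hypothesis of uniform admissions would not be rejected.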
3. Chi square test for independence of attributes:
By using chi square test we can find out whether two attributes
are independent or not. Suppose, we have N observations classified
according to two attributes.

Example:
1. Whether there is any association between eye colours of parents
and children.
2. Whether there is any association between marriage and happiness.
3. Whether quinine is effective in controlling malaria or not
Yates' correction:
If, in a 2 x 2 contingency table, the expected frequencies are
small, say less than 5, then the chi square test cannot be used directly.
In that case, the direct formula of the chi square test is modified by
applying Yates' correction for continuity.
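The slides do not give the corrected formula. A commonly used form, assumed here, subtracts 0.5 from each absolute difference: χ² = Σ (|O − E| − 0.5)² / E. The sketch below compares the uncorrected and corrected statistics for a hypothetical 2 x 2 table; the counts and function names are made up for illustration.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_2x2(table, yates=False):
    """Chi square statistic for a 2 x 2 table, optionally with Yates' correction."""
    expected = expected_frequencies(table)
    correction = 0.5 if yates else 0.0
    chi2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            # Reduce each absolute difference by 0.5, never below zero.
            diff = max(abs(o - e) - correction, 0.0)
            chi2 += diff ** 2 / e
    return chi2

# Hypothetical counts: rows = treated / not treated, columns = fever / no fever.
table = [[2, 6], [6, 2]]

plain = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(f"uncorrected = {plain:.2f}, with Yates' correction = {corrected:.2f}")
```

Every expected frequency in this table is 4. The uncorrected value exceeds the 5% table value for 1 degree of freedom (3.841) while the corrected value does not, which shows how much the correction can matter when expected frequencies are small.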
