
Statistical Learning

Probability Distributions

Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Agenda
● Probability Overview
● Types of Probability
● Rules for computing probability
● Marginal probability
● Conditional probability
● Bayes Theorem and its Applications
● Normal distribution
● Further reading
Statistical Learning (TOC)
Sno | Topic | Scope
1 | Probability Overview | What is probability, its range, application, how to calculate, types of event, experiment
2 | Types of Probability | Types of probability
3 | Rules for computing probability | Ways to compute probability, joint, marginal, conditional probability
4 | Bayes Theorem and its Applications | Bayes theorem, conditional probability, application of Bayes theorem
5 | Probability Distribution | Binomial, normal, Poisson distribution and their properties
6 | Further Reading | Binomial distribution, Poisson distribution and their properties, defective vs defects

Probability Overview
● Probability refers to the chance or likelihood of a particular event taking place.
● An event is an outcome of an experiment.
● An experiment is a process that is performed to understand and observe possible
outcomes.
● Set of all possible outcomes of an experiment is called the sample space.
Example: Experiment - tossing a coin
Event - occurrence of head when a fair coin is tossed
Sample space - head, tail
Probability - likelihood of getting head when a fair coin is tossed ( 0.5)

Probability Overview
● Probability values lie in the range [0,1]
● When is P(A) zero? When event A is impossible.
○ Example: the probability of getting 0 when you roll a die.
● When is P(A) one? When event A is certain.
○ Example: the probability of getting an integer when you roll a die.

Probability Overview - Applications
● To quantify uncertainty of an outcome
● Inferential statistics is built on the foundation of probability theory.
● Helps in deriving conclusions from data
● To estimate the likelihood that an event occurs
● Used in Machine Learning algorithms to estimate or predict an outcome

Probability Overview
Probability of a single event is defined as :

P(E) = Number of favorable outcomes / Number of equally likely possible outcomes

Example - If you roll a die, there are 6 possible outcomes, and each is equally likely.

Possible outcomes (sample space) = 1,2,3,4,5,6

Event (E) - getting an even number. Favorable outcomes = 2,4,6

Probability = P(E) = 3/6 = 0.5
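
The counting definition above can be checked with a few lines of Python. This is a minimal sketch that assumes a fair six-sided die; the variable names are illustrative only.

```python
from fractions import Fraction

# Sample space of one roll of a fair six-sided die (all outcomes equally likely).
sample_space = [1, 2, 3, 4, 5, 6]

# Event E: the roll is an even number.
favorable = [outcome for outcome in sample_space if outcome % 2 == 0]

# P(E) = favorable outcomes / equally likely possible outcomes
p_even = Fraction(len(favorable), len(sample_space))
print(favorable, p_even, float(p_even))
```

Using Fraction keeps the result exact (1/2) rather than a rounded float.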


Probability Overview
Probability of two or more events :
● Case 1: probability of events A or B
● Case 2: probability of events A and B
○ Joint probability - probability of two events occurring simultaneously

● Case 3 : probability of event X=A irrespective of variable Y
○ Marginal probability - probability of an event irrespective of the outcome of another variable

● Case 4 : A given B
○ Conditional probability - probability of one event given the occurrence of another event

Probability Overview
Mutually Exclusive Events
● Two events A and B are said to be mutually exclusive if they cannot occur at the same time, i.e. the occurrence of A excludes the occurrence of B.
● Example - when you toss a coin, getting a head and getting a tail are mutually exclusive, as either head or tail will appear in an ideal scenario.
A = Head when a fair coin is tossed
B = Tail when a fair coin is tossed
A = The roll of a die is odd
B = The roll of a die is even

Independent Events
● Two events A and B are said to be independent if the occurrence of A is in no way influenced by the occurrence of B. Likewise, the occurrence of B is in no way influenced by the occurrence of A.
● Example - when you toss a fair coin and get a head, and then you toss it again and get a head
A = Getting head in the first toss of a fair coin
B = Getting head again in the second toss of a fair coin

Types of Probability
1. A Priori Classical Probability
a. When a fair coin is tossed. What is the probability of getting a tail?

2. Empirical Probability :
a. What is the probability that a man lives for 1000 years?

3. Subjective Probability
a. Chance of India winning the next world cup?

Rules for computing probability
● Addition Rule - Mutually exclusive events
P(AUB) = P(A) + P(B)
Probability of union of A and B is determined by adding probability of the
events of A and B

● Example - From a pack of well-shuffled cards, a card is picked at random.
● What is the probability that the selected card is a King or a Queen?
● Event A = getting a King
P(A) = 4/52
● Event B = getting a Queen
P(B) = 4/52
● P(AUB) = 4/52 + 4/52 = 8/52 = 2/13
Rules for computing probability
● Addition Rule - Events that are not mutually exclusive
P(AUB) = P(A) + P(B) - P(A∩B)
● Probability of union of A and B is determined by adding probability of the events of A and B and
then subtracting probability of the intersection of the events A and B
● Example - From a pack of well-shuffled cards, a card is picked at random.
● What is the probability that the selected card is a King or a Diamond?
○ Probability of getting a King, P(A) = 4/52
○ Probability of getting a Diamond, P(B) = 13/52
○ Probability of getting the King of Diamonds, P(A∩B) = 1/52
○ P(AUB) = 4/52 + 13/52 - 1/52 = 16/52 = 4/13
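
Both addition-rule examples (King or Queen, King or Diamond) can be verified by enumerating a 52-card deck. The sketch below is illustrative: the deck representation and the helper name prob are assumptions, not part of the course material.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) pairs


def prob(event):
    """Probability of an event, given as a predicate over a single card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_king(card):
    return card[0] == "K"


def is_queen(card):
    return card[0] == "Q"


def is_diamond(card):
    return card[1] == "diamonds"


# Mutually exclusive events: P(A U B) = P(A) + P(B)
p_king_or_queen = prob(lambda c: is_king(c) or is_queen(c))

# Not mutually exclusive: P(A U B) = P(A) + P(B) - P(A n B)
p_king_and_diamond = prob(lambda c: is_king(c) and is_diamond(c))
p_king_or_diamond = prob(is_king) + prob(is_diamond) - p_king_and_diamond

print(p_king_or_queen, p_king_or_diamond)
```

Counting the union directly, prob(lambda c: is_king(c) or is_diamond(c)), gives the same value as the formula, which is a quick way to see why the intersection must be subtracted once.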
Rules for computing probability
● Multiplication Rule - Independent events
P(A∩B)= P(A) * P(B)

● When two events A and B are independent, the probability of simultaneous occurrence of A and
B equals the product of probability of A and probability of event B.
● Can be extended to more than two events

● Example (assuming A and B are independent)
○ Event A = getting an A grade in a maths exam, P(A) = 0.7
○ Event B = getting an A grade in a science exam, P(B) = 0.6
○ Probability of getting an A grade in both the maths and science exams will be
= 0.7*0.6 = 0.42

Rules for computing probability
● Multiplication Rule - When events are not independent
P(A∩B)= P(A) * P(B/A)
P(A∩B)= P(B) * P(A/B)

● When two events A and B are not independent, the probability of simultaneous occurrence of A
and B equals the product of probability of A and probability of event B given that A has
happened.

● Example - from a deck of cards, two cards are drawn in succession (without replacement). What is the probability that both cards are hearts?
○ P(first card is a heart) = P(A) = 13/52
○ P(second card is also a heart, given the first was a heart) = P(B/A) = 12/51
○ P(A∩B) = 13/52 * 12/51 = 3/51 = 1/17
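
The two-hearts example can be checked both with the multiplication rule and by brute-force counting of ordered two-card draws. This is a small sketch; the deck encoding is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule for dependent events: P(A n B) = P(A) * P(B/A)
p_first_heart = Fraction(13, 52)
p_second_heart_given_first = Fraction(12, 51)  # one heart already removed
p_both_hearts = p_first_heart * p_second_heart_given_first

# Cross-check: enumerate every ordered pair of distinct cards.
deck = [(rank, suit) for suit in "HDCS" for rank in range(1, 14)]
pairs = list(permutations(deck, 2))  # 52 * 51 ordered draws without replacement
both_hearts = sum(1 for first, second in pairs if first[1] == "H" and second[1] == "H")
p_by_counting = Fraction(both_hearts, len(pairs))

print(p_both_hearts, p_by_counting)
```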
Marginal Probability
● Case 3 : probability of event X=A irrespective of variable Y
● Marginal probability - probability of an event irrespective of the outcome of another
variable
● The term marginal is used to indicate that the probabilities are calculated using a
contingency table (also called joint probability table).
● Contingency table consists of rows and columns of two attributes at different levels
with frequencies or numbers in each of the cells. It is a matrix of frequencies
assigned to rows and columns.
● The marginal probability of one variable (X) would be the sum of probabilities for
the other variable (Y rows) on the margin of the table.

Marginal Probability
Example - we have data on a group of candidates: how many hours each studied for an exam and whether they passed or failed.

Exam Status | Studied < 5 hours | Studied > 5 hours | Total
Pass | 38 | 42 | 80
Fail | 82 | 38 | 120
Total | 120 | 80 | 200

1. What is the probability that a randomly selected student passed the exam?
2. What is the probability that a randomly selected student passed the exam and studied for more than 5 hours?
Marginal Probability
1) What is the probability that a randomly selected student passed the exam?
80/200 =0.40

2) What is the probability that a randomly selected student passed the exam and studied
for more than 5 hours?
42/200 =0.21

3) A student selected at random is found to be studying for more than 5 hours. What is the
probability that he failed the exam?
38/80 =0.475
Note this is a case of conditional probability
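
The three answers above can be reproduced from the contingency table with plain Python. The dictionary layout below is just one convenient, assumed way to store the cell counts.

```python
# Cell counts from the contingency table: (exam status, hours studied) -> count
counts = {
    ("pass", "<5"): 38,
    ("pass", ">5"): 42,
    ("fail", "<5"): 82,
    ("fail", ">5"): 38,
}
total = sum(counts.values())  # 200 students

# 1) Marginal probability: sum the "pass" row over both study-hour columns.
p_pass = sum(n for (status, _), n in counts.items() if status == "pass") / total

# 2) Joint probability: a single cell divided by the grand total.
p_pass_and_gt5 = counts[("pass", ">5")] / total

# 3) Conditional probability: restrict to the ">5" column, then take the "fail" share.
n_gt5 = sum(n for (_, hours), n in counts.items() if hours == ">5")
p_fail_given_gt5 = counts[("fail", ">5")] / n_gt5

print(p_pass, p_pass_and_gt5, p_fail_given_gt5)
```

Note how the denominator changes from the grand total (200) to the column total (80) once a condition is given.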
Conditional Probability
● Case 4 : A given B
○ Conditional probability : probability of one event given the occurrence of another event
○ The conditional probability for events A given event B is calculated as follows:
○ P(A given B) = P(A and B) / P(B)
○ P(A/B) = P(A∩B)/ P(B)

● Example - Probability of getting a king (A) given that the card is a heart (B)
○ P(A) = 4/52
○ P(B) = 13/52
○ P(A∩B) = 1/52
○ P(A/B) = (1/52) / (13/52) = 1/13

Rules for computing probability
Event | Probability
A | P(A) lies in the range [0,1]
Not A | 1-P(A)
A or B (AUB) | P(AUB) = P(A) + P(B) - P(A∩B)
A or B (AUB) for mutually exclusive events | P(AUB) = P(A) + P(B)
A and B (A∩B) | P(A∩B)= P(A) * P(B/A)
A and B (A∩B) when A and B are independent | P(A∩B)= P(A) * P(B)
A given B has occurred | P(A/B)= P(A∩B)/P(B)
B given A has occurred | P(B/A) = P(A∩B)/P(A)

Bayes Theorem
● Bayes’ Theorem is used to revise previously calculated probabilities based on new
information.
● Developed by Thomas Bayes in the 18th Century.
● It is an extension of conditional probability.

Bayes Theorem
● Given a hypothesis H and evidence E, Bayes theorem relates the probability of the hypothesis before getting the evidence, P(H), to the probability of the hypothesis after getting the evidence, P(H/E):
P(H/E) = P(E/H)*P(H)/P(E)

Bayes Theorem
● Bayesian spam filtering example :
● Event A = The message is spam.
○ p(spam) = 0.8
● Event B = Message contains money as a word in it.
○ p(word/spam) = 0.7
○ p(word) = 0.6
● P(spam/word) = p(word/spam) * p(spam) / p(word)
● P(spam/word) = 0.7 * 0.8 / 0.6 = 0.933
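
The spam calculation is a direct application of P(H/E) = P(E/H)*P(H)/P(E). A minimal sketch follows; the function name posterior is hypothetical and the inputs are the figures from the slide.

```python
def posterior(p_evidence_given_h, p_h, p_evidence):
    """Bayes theorem: P(H/E) = P(E/H) * P(H) / P(E)."""
    if not 0 < p_evidence <= 1:
        raise ValueError("P(E) must be in (0, 1]")
    return p_evidence_given_h * p_h / p_evidence


# H = message is spam, E = message contains the word "money"
p_spam_given_word = posterior(p_evidence_given_h=0.7, p_h=0.8, p_evidence=0.6)
print(round(p_spam_given_word, 3))  # 0.933
```

Seeing the word raises the probability of spam from the prior 0.8 to roughly 0.93.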

Bayes Theorem - Application
● Naive Bayes algorithm
● Spam filtering use case
● Interpretation of statistical results

Bayes Theorem - Exercise
● A drilling company has estimated a 40% chance of striking oil for their new well.
● A detailed test has been scheduled for more information. Historically, 60% of
successful wells have had detailed tests, and 20% of unsuccessful wells have had
detailed tests.
● Given that this well has been scheduled for a detailed test, what is the probability that
the well will be successful?
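
One way to check an answer to this exercise is sketched below. Unlike the spam example, P(E) is not given here, so the sketch first computes it with the law of total probability. The variable names are illustrative assumptions.

```python
from fractions import Fraction

p_oil = Fraction(40, 100)              # prior: chance of striking oil
p_test_given_oil = Fraction(60, 100)   # successful wells that had detailed tests
p_test_given_dry = Fraction(20, 100)   # unsuccessful wells that had detailed tests

# Law of total probability: P(test) = P(test/oil)P(oil) + P(test/dry)P(dry)
p_test = p_test_given_oil * p_oil + p_test_given_dry * (1 - p_oil)

# Bayes theorem: P(oil/test) = P(test/oil) * P(oil) / P(test)
p_oil_given_test = p_test_given_oil * p_oil / p_test

print(p_test, p_oil_given_test, float(p_oil_given_test))
```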

Probability Distribution
● A probability distribution is a mathematical function that gives the probability of each possible outcome of a random variable
● It covers every value the random variable can take
● They are divided into two classes - discrete and continuous

● Example - discrete probability distribution
○ Experiment - rolling a die
○ Variable X takes possible outcomes - 1,2,3,4,5,6
○ The probability of each outcome is ⅙
○ The probability distribution of X takes the value ⅙ for each of X=1, X=2, ..., X=6
○ This is a discrete distribution
○ Has a probability mass function (pmf)
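
The die example can be written down as a probability mass function in a few lines. This is a sketch assuming a fair die; storing the pmf as a dictionary is simply a convenient choice.

```python
from fractions import Fraction

# pmf of a fair die: each of the six outcomes has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid pmf sums to exactly 1 over all outcomes.
total_probability = sum(pmf.values())

# Probabilities of events are sums of pmf values, e.g. P(X <= 2).
p_at_most_2 = sum(p for x, p in pmf.items() if x <= 2)

# Expected value: sum of outcome * probability.
expected_value = sum(x * p for x, p in pmf.items())

print(total_probability, p_at_most_2, expected_value)
```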
Probability Distribution
● Example - continuous probability distribution
○ Experiment: length of time taken for a bus to arrive
○ Variable X takes possible outcomes: any positive number
○ This is a continuous distribution
○ Has a probability density function (pdf)
○ Has a cumulative distribution function (cdf)

PDF Vs CDF

Image Credit - https://en.wikipedia.org/wiki/File:Arrival_pdf_and_cdf_30_minutes.svg


Normal Distribution
● A continuous probability distribution described by a bell-shaped probability density function
● Also called the Gaussian distribution (data that follow it are said to be normally distributed)

Normal Distribution - Properties
● Approximately 68% of the data in a bell-shaped distribution lies within one standard deviation of the mean, or µ ± 1σ
● Approximately 95% of the data in a bell-shaped distribution lies within two standard deviations of the mean, or µ ± 2σ
● Approximately 99.7% of the data in a bell-shaped distribution lies within three standard deviations of the mean, or µ ± 3σ
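
The 68 / 95 / 99.7 figures can be recovered from the standard normal cdf. The sketch below uses statistics.NormalDist from the Python standard library (available from Python 3.8); the dictionary name coverage is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# P(mu - k*sigma < X < mu + k*sigma) = cdf(k) - cdf(-k) for the standard normal
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```

The exact values are about 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.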

Normal Distribution - Properties
● Mean, mode, and median are the same, and all equal µ
● Standard deviation = σ
● Symmetrical around mean. Skewness = 0
● Kurtosis = 3
● Asymptotic to x-axis. If the tails are extended, they will come closer to the horizontal
axis without actually touching it
● It has two parameters, mean and standard deviation
● Standard normal distribution has mean as zero and std deviation as 1
Normal Distribution - Example
The mean weight of a morning breakfast cereal pack is 0.295 kg with a standard deviation
of 0.025 kg. The random variable weight of the pack follows a normal distribution.

a) What is the probability that the pack weighs less than 0.280 kg?

b) What is the probability that the pack weighs more than 0.350 kg?

c) What is the probability that the pack weighs between 0.260 kg and 0.340 kg?

Normal Distribution - Example
Z = (x-µ ) /σ
= (0.280-0.295)/0.025 = -0.6
Use this z value to get the cumulative probability.

Answer = 0.2743
27.43% of the packs weigh less than 0.280 kgs

Note on using the Excel function, which returns the cumulative probability to the left of z:
● For a "less than" question, the returned value is the answer
● For a "more than" question, the answer is 1 - the cumulative probability returned by Excel

Normal Distribution - Example
Z = (x-µ ) /σ
= (0.350-0.295)/0.025 = 2.2
Use this z value to get the cumulative probability.

Cumulative probability = 0.9861
Answer = 1 - 0.9861 = 0.0139
1.39% of the packs weigh more than 0.350 kg

Normal Distribution - Example
● Cumulative probability up to 0.340kg
Z = (x-µ ) /σ
= (0.340-0.295)/0.025 = 1.8
Use this z value to get the cumulative probability.
Answer = 0.9641

● Cumulative probability up to 0.260kg
Z = (x-µ ) /σ
= (0.260-0.295)/0.025 = -1.4
Answer = 0.0808
● Probability of a weight between 0.260 kg and 0.340 kg = 0.9641 - 0.0808 = 0.8833
● 88.33% of the packs weigh between 0.260 kg and 0.340 kg
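
The three parts of the cereal-pack example can be reproduced in Python instead of a z-table or Excel. This sketch uses statistics.NormalDist from the standard library (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

# Pack weight ~ Normal(mean = 0.295 kg, standard deviation = 0.025 kg)
weight = NormalDist(mu=0.295, sigma=0.025)

# a) P(weight < 0.280): the cdf gives the left-tail probability directly
p_less_than_280 = weight.cdf(0.280)

# b) P(weight > 0.350): right tail = 1 - cdf
p_more_than_350 = 1 - weight.cdf(0.350)

# c) P(0.260 < weight < 0.340): difference of two cdf values
p_between_260_340 = weight.cdf(0.340) - weight.cdf(0.260)

print(round(p_less_than_280, 4), round(p_more_than_350, 4), round(p_between_260_340, 4))
```

NormalDist works on the original scale, so the z conversion (x-µ)/σ happens implicitly; computing NormalDist().cdf(z) on the z values gives the same results.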
Hands-on
Case study

Why Sampling

Need for sampling as opposed to using the entire population for analysis

● Time factor
● Effort factor

It is usually not feasible to make a complete census of a population because of time and budget constraints. Therefore, a sample of the population is used to make inferences about the whole population. The goal of this type of sampling is to collect data that is representative of the entire population of interest.

Central Limit Theorem

“The sampling distribution of the mean of independent random variables drawn from the same distribution will be approximately Normal when the sample size is large enough”

● This applies to both discrete and continuous distributions
● The random variable should have a well-defined mean and a finite variance (standard deviation)
● Applicable even when the original variable is not normally distributed

Let’s watch central limit theorem in action. (please open the notebook titled ‘Central Limit Theorem’)
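
For readers without the notebook at hand, here is a small stand-in simulation. It is not the course notebook: the exponential population, the sample size of 50 and the 5000 repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50   # observations per sample (assumed value)
N_SAMPLES = 5000   # number of samples drawn (assumed value)


def one_sample_mean(sample_size):
    # Exponential(rate=1) is strongly right-skewed; its mean and std are both 1.
    draws = [random.expovariate(1.0) for _ in range(sample_size)]
    return statistics.fmean(draws)


sample_means = [one_sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)
expected_std = 1.0 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

# Share of sample means within one standard deviation of their centre;
# for a normal distribution this would be about 0.68.
within_one_std = sum(
    abs(m - mean_of_means) <= std_of_means for m in sample_means
) / N_SAMPLES

print(mean_of_means, std_of_means, expected_std, within_one_std)
```

Even though individual draws are far from normal, the sample means cluster around the population mean with spread close to σ/√n, and roughly 68% fall within one standard deviation.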

Any Questions?

Further Reading
OpenIntro Statistics
Think Stats
NIST Engineering Statistics Handbook

Binomial Distribution - Overview
● A discrete probability distribution
● Each trial has a binary outcome (success, failure)
● Parameters - n, p
● Here, n is the number of trials and p is the probability of a successful outcome
● The special case n = 1 is called the Bernoulli distribution
● It plays a major role in quality control and quality assurance. Manufacturing units use the binomial distribution for defective analysis.
Binomial Distribution - Conditions
Conditions for applying Binomial Distribution ( Bernoulli process)
● Trials are independent and random.
● There is a fixed number of trials (n trials).
● There are only two outcomes of each trial, designated as success or failure.
● The probability of success is the same in every one of the n trials

Example -
● Tossing a coin 100 times

Binomial Distribution - Formula
● Under the conditions of a Bernoulli process, the probability of getting x successes out of n trials is given by the binomial probability function:
$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$
● Where P(X = x) is the probability of getting x successes in n trials
● p is the probability of success in a single trial
● $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ is the number of ways x successes can take place out of n trials
● The mean of a binomial distribution is given by E(X) = np
● The standard deviation is $\sigma = \sqrt{np(1 - p)}$

Poisson Distribution
● Poisson Distribution is another discrete distribution which also plays a major role in
quality control in the context of reducing the number of defects per standard unit.
● Examples include number of defects per item, number of defects per transformer produced, number of defects per 100 m² of cloth, etc.
● Other real life examples would include 1) The number of cars arriving at a highway
check post per hour; 2) The number of customers visiting a bank per hour during
peak business period.

Poisson Distribution - Process
● The probability of getting exactly one success in a small continuous interval (of length, area, time and the like) is the same for every interval of that size.
● The probability of a success in any one interval is independent of the probability of success occurring in any other interval.
● The probability of getting more than one success in a sufficiently small interval is approximately 0.

Poisson Distribution - Formula

● $P(X) = \frac{e^{-\lambda} \lambda^{X}}{X!}$ is the probability of X successes given the value of 𝜆
● 𝜆 is the average number of successes per unit
● X is the number of successes per unit
● Mean of the Poisson distribution = 𝜆
● Standard deviation of the Poisson distribution = √𝜆

Defective Vs Defects
● In a 1000-line program, each line that has a problem is called defective
● However, each individual problem within a problematic line is called a defect
● Binomial deals with defective analysis, while Poisson deals with the number of defects

● Whether a computer is defective or not - Binomial
● The number of defects in each defective computer - Poisson

Defective Vs Defects
● Preparing a Project Report
● Number of Defective pages in the Report – Binomial (Defective/Non Defective)
● Number of defects per page – Poisson

● We could say - Without Poisson , there is no Binomial


● Binomial deals with % defective
● Poisson deals with number of defects per item
● Number of people arriving at an atm - Poisson

Defective Vs Defects
● Bank – Dissatisfied Customers percentage is Binomial
● Number of complaints made by each customer – Poisson

● Ashok Leyland chooses Leather covers


● Defective covers – Binomial
● Types of Defects in each cover – black spots, thread cuts etc - Poisson

Practice Exercises

Association of attributes - Exercise
● Of 37 men and 33 women, 36 are teetotalers (completely abstain from alcoholic
beverages). Nine of the women are non-smokers and 18 of the men smoke but do not
drink. 13 of the men and seven of the women drink but do not smoke.
● How many both drink and smoke? What is the associated probability?

Binomial Distribution - Example
● A bank issues credit cards to customers under the scheme of Mastercard. Based on
the past data, the bank has found out that 60% of all accounts pay on time following
the bill. If a sample of 7 accounts is selected at random from the current database,
construct the Binomial Probability Distribution of accounts paying on time.

● Work out using python
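
One possible way to work this out is sketched below, using only the standard library (math.comb requires Python 3.8+). The helper name binomial_pmf is hypothetical; a library such as SciPy could be used instead.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n = 7     # accounts sampled
p = 0.6   # probability that an account pays on time

# Full distribution: probability of 0, 1, ..., 7 accounts paying on time
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

mean = sum(x * prob for x, prob in distribution.items())           # equals n * p
variance = sum((x - mean) ** 2 * prob for x, prob in distribution.items())
std_dev = variance ** 0.5                                          # equals sqrt(n * p * (1 - p))

for x, prob in distribution.items():
    print(f"P(X = {x}) = {prob:.4f}")
print(f"mean = {mean:.2f}, std dev = {std_dev:.4f}")
```

The probabilities sum to 1, and the mean and standard deviation computed from the table match np and √(np(1-p)).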

Poisson Distribution - Example
● If on an average, 6 customers arrive every two minutes at a bank during the busy
hours of working,
○ a) what is the probability that exactly four customers arrive in a given minute?
○ b) What is the probability that more than three customers will arrive in a given minute?

● Work out using python
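
A possible worked sketch follows, again with the standard library only. The key step is rescaling the rate: 6 customers per two minutes is an average of 3 per minute. The helper name poisson_pmf is hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


lam = 6 / 2  # average arrivals per minute (6 per two minutes)

# a) exactly four customers in a given minute
p_exactly_4 = poisson_pmf(4, lam)

# b) more than three customers: complement of P(X <= 3)
p_more_than_3 = 1 - sum(poisson_pmf(x, lam) for x in range(4))

print(round(p_exactly_4, 4), round(p_more_than_3, 4))
```

Using the complement in part b) avoids summing an infinite tail.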

Thank you!
Happy Learning :)
