Lesson 2 - Basics of Probability Theory

STATS707 Computational Intro to Statistics Basics of Probability Theory

• Random variables
• Probability distributions
• Expectation
• Variance
• Normal distribution
• Other commonly used distributions
• Bernoulli, Binomial, Poisson, Exponential
• Simulation exercises and real data
• Laws of probability
• Conditional probability
University of Auckland
Probability Distributions
What are they?
Why do we need them??
What are they?
• Mathematical functions which determine the probability that a random variable takes a particular value or set of values
• Discrete probability distributions

• Continuous probability distributions
Why do we need them??
• Because we have to understand the uncertainty associated with ‘random variables’.
• But what are ‘random variables’? Why should we bother?

• How are they different from ‘variables’?
Variables
• Can take different values
• The value the variable will take is deterministic.
• Governed by an order
• Or determined by other factors
• e.g.
• Time (thankfully deterministic – set order!)
else, tomorrow could be exam day!
• BMI = Weight/ (Height^2)
Can determine what BMI would be for given height and weight
• any other?
Random variables
• Can take different values
• BUT the value is governed by a random mechanism
• Sampling a particular unit, e.g.
• Height/weight of a person (we are all different!)
• Weekly income of a family
• Complex combination of various uncertain factors, e.g.
• What will the weather be today?
• When will the bus arrive?
• Outcome of a match/exam?
• Return on investment
• Behaviour of a person
• … and so on….(almost everything we come across)
Random variables
• The only way to understand random variables is to assign probabilities to all their possible
outcomes
• This is precisely what probability distributions do.

• They are mathematical functions which assign these probabilities
• We will not go into the maths in this course

• But we do want to be able to ‘read’ a probability distribution plot (to understand what
probabilities it is assigning)
Types of random variables

• Discrete random variables
• Can take only finitely (or countably) many values
• Weather in Auckland (sunny, overcast, showery)
• Outcome of a test match (win, lose, drawn)
• Continuous random variables

• Can take infinitely many values (any value within a range)
• When will the bus arrive today?
• Return on investment
• Height of a person (exact measurement – if it was ever possible!)
• Income of a family (again, assuming exact measurement)
• In practice, variables which can take a large number of distinct values are considered
continuous (e.g. income in $, height in cm)
Notations - I
• Random variables are often denoted by letters, e.g. X, Y or Z, etc.
• Subscripts are used to denote multiple random variables. For e.g.

• X_i to denote (say) the income of the i-th individual,
• Z_t to denote the temperature at time t, etc.
• Sometimes, small letters are used to denote the notional observed value.
For e.g.
• Y = y, indicates that the observed value of the random variable Y is y,
• X_i = x_i, i = 1,…,n indicates that we have observed n random variables (say, heights of n
individuals) and the notional observed value of the i-th random variable X_i is x_i.
Notations - II
• A general probability distribution may be represented as f(x) or p(x)
• Typically, f(x) is used for continuous distributions and p(x) for discrete.
• For discrete, p(x) indicates the probability of X taking a value x.
• For continuous, f(x) indicates the probability density of X taking a value x.
• Specific probability distributions can also be expressed. For e.g.

• N(µ,σ) denotes a normal distribution with mean µ and standard deviation σ.
• The symbol ~ is used to denote that a certain random variable follows a certain probability distribution. For e.g.
• X ~ f(x) denotes that X follows the probability distribution f(x)
• Z ~ N(0,1) denotes that Z follows the normal distribution with mean 0 and std dev 1.
• For multiple random variables, we may say X_i ~ f(x), i = 1,…,n.
Rules of probability
• Always non-negative and between 0 and 1.
• Total probability, i.e. the sum of the probabilities of all possible values, must always be one!
• For discrete probability distributions, all bars must be between 0 and 1 and
the sum of all bars must be equal to one.
• For continuous probability distributions, the total area under the curve must
be equal to one.
• For continuous random variables, since they can take infinitely many different
values, probability of taking any particular value is always equal to zero.
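A minimal Python sketch of these rules follows. The weather probabilities are illustrative assumptions rather than course data, and the helper names (`is_valid_pmf`, `area_under`) are made up for this example. It checks that the bars of a discrete distribution sum to one, approximates the area under a standard normal density, and shows that a single point carries zero probability.

```python
import math

# Hypothetical discrete distribution for the weather (illustrative values only).
weather_pmf = {"sunny": 0.4, "overcast": 0.3, "showery": 0.3}

def is_valid_pmf(pmf, tol=1e-9):
    """Every probability lies in [0, 1] and the probabilities sum to one."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    return in_range and abs(sum(pmf.values()) - 1.0) < tol

def normal_density(x, mu=0.0, sigma=1.0):
    """Probability density of N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under(density, lower, upper, steps=10_000):
    """Approximate the area under `density` on [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    return sum(density(lower + (i + 0.5) * width) for i in range(steps)) * width

pmf_ok = is_valid_pmf(weather_pmf)
# The tails beyond +/-10 standard deviations are negligible, so this is close to 1.
total_area = area_under(normal_density, -10.0, 10.0)
# A single point is an interval of zero width, so its probability is zero.
point_probability = area_under(normal_density, 1.0, 1.0)
```

Probabilities for a continuous variable therefore only make sense over intervals, e.g. `area_under(normal_density, -1.0, 1.0)`.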
Reading a discrete probability distribution
Weather in Auckland
• Random variable: weather (only 3 possible values)
• What is the probability that a particular day will be sunny, overcast or showery?
• Probability (sunny) is read off as the height of the ‘sunny’ bar
[Figure: bar chart of the probability distribution of the weather. The horizontal axis (weather) shows the values the variable can take; the vertical axis (probability, ticks from 0 to 0.4) shows the probability for each of these values.]
Reading a continuous probability distribution
Height of a male student
• Random variable: height, measured in cm
• Can take many possible values
• Area under the curve = 1
• Prob (height < 170) = shaded area
[Figure: probability density curve for the height of a male student. The horizontal axis (height in cm, ticks at 150, 170, 190) shows the values the variable can take; the vertical axis (probability density, ticks from 0 to 0.04) shows the probability density for each of these values.]
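Reading a continuous probability distribution
Height of a male student
• Random variable: height, measured in cm
• Can take many possible values
• Probability distribution for the height of a female student??
[Figure: probability density curve for the height of a male student, with height in cm on the horizontal axis (ticks at 150, 170, 190) and probability density on the vertical axis (ticks from 0 to 0.04).]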
Expectation and Variance
• Some of the most basic statistical concepts

• Summarising data
Expectation or Expected Value
• Expected value of a random variable

• Is ‘simply’ the mean μ of the random variable
• Captures the ‘central tendency’: the data are centred around the mean.
• Also known as Expectation.
• Expectation of a random variable X is denoted by E(X).
Variance
• Variance of a random variable

• Square of the standard deviation = σ^2
• Captures the variability in the data
• Variance of a random variable X is denoted by V(X) or Var(X).
• It is the expected squared difference from the mean, Var(X) = E[(X - µ)^2]
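As a small worked sketch, expectation and variance can be computed directly for a discrete random variable, using the standard definition that E(X) is the sum of each value times its probability. The fair six-sided dice and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumed example: a fair six-sided dice, each face with probability 1/6.
dice_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expectation(pmf):
    """E(X): sum of value * probability over all possible values."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2], the expected squared difference from the mean."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mean_dice = expectation(dice_pmf)   # 7/2
var_dice = variance(dice_pmf)       # 35/12
```

Exact fractions are used so that the results are not blurred by floating-point rounding.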
Normal distribution
• Symmetric bell shaped distribution. Symmetric about µ (mean)
• Tails go down fairly fast. Usually no observations more than 3σ (standard deviations) away from the mean.
• approx 68% within 1σ of mean
• approx 95% within 2σ of mean
• approx 99.7% within 3σ of mean
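The three percentages can be checked numerically. This sketch relies on the standard identity that, for any normal distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2). The function name `prob_within` is made up for the example.

```python
import math

def prob_within(k):
    """P(|X - mean| < k standard deviations) for a normal random variable X."""
    return math.erf(k / math.sqrt(2.0))

# Probabilities of being within 1, 2 and 3 standard deviations of the mean.
within = {k: prob_within(k) for k in (1, 2, 3)}
```

Rounded to one decimal place in percent, `within` gives 68.3%, 95.4% and 99.7%.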
Normal distribution
Area between -1σ and +1σ = 0.68
68% of the observations are likely to be between -1σ and +1σ
Normal distribution
Area between -2σ and +2σ = 0.95
95% of the observations are likely to be between -2σ and +2σ
Normal distribution
Area between -3σ and +3σ = 0.997
99.7% of the observations are likely to be between -3σ and +3σ
Normal distribution
• Any Normal distribution has the same basic shape.
• It may have a different mean (be shifted)
• It may have a different standard deviation (width)
• Normal distribution is completely determined by mean µ and standard deviation σ
• Any normal distribution can be transformed back to a standard normal (µ=0, σ=1) by subtracting its mean and dividing by its standard deviation
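The transformation to a standard normal can be sketched with `NormalDist` from Python's standard library. The height distribution N(170, 10) is a hypothetical choice for illustration, not a value from the course data.

```python
from statistics import NormalDist

# Hypothetical height distribution in cm (values chosen for illustration).
height = NormalDist(mu=170.0, sigma=10.0)
standard = NormalDist(mu=0.0, sigma=1.0)

def standardise(x, dist):
    """Subtract the mean and divide by the standard deviation."""
    return (x - dist.mean) / dist.stdev

z = standardise(185.0, height)    # 1.5 standard deviations above the mean
p_original = height.cdf(185.0)    # P(height < 185) on the original scale
p_standard = standard.cdf(z)      # the same probability on the standard scale
```

Both probabilities agree, which is why a single standard normal table is enough for any normal distribution.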
Normal distribution
Two Normal distributions with different (shifted) means but the same standard deviation
Normal distribution
Two Normal distributions with different standard deviations but the same means
Non-normal distributions

• Often, normal distribution is not the answer. The data may be discrete or
the distribution may be skewed.
• Some of the commonly used distributions for modelling data are

• Bernoulli
• Binomial
• Poisson
• Exponential
• Other distributions commonly used for hypothesis tests are

• t, F and Chi-squared
[Figure: Three different Gamma distributions]
[Figure: Three different Beta distributions]
Non-normal distributions

• Normal distribution is unique in that
• It is always symmetric
• It is defined by mean and variance
• It can be defined over any (positive or negative) values of X.
• The t-distribution is also symmetric and, for large degrees of freedom, very close to N(0,1)
• But other distributions are mostly not symmetric and often only applicable
if X takes values in a certain range (>0 or between 0 and 1, etc.)
• Also, non-normal distributions are defined using different parameters such as degrees of freedom, scale, shape, etc., not by mean and variance.
Probability theory
• Learning the basics …
Probability theory
• Events – complement, intersection, union and sample space
• Laws of probability
• Probability of union and intersection of events
• Conditional probability, dependent and independent events
Probability theory
• Statistics is about decision making under uncertainty.
• One important way to quantify uncertainty is using probabilities.
• But before we can define probability, we need to define events.
Probability theory
• An event is an outcome (or a set of outcomes) of a trial or a random experiment.
• For e.g. Tossing a coin is a random experiment and each of the possible
outcomes - H or T - is an event.
• We denote an event using capital letters such as ’A’ and ’B’.
• Events A and B are said to be independent if they are unrelated, i.e. whether one occurs does not depend on whether the other occurs. Else they are said to be dependent.
Example – throw of a dice

• Let’s consider the outcome of a throw of a dice and define the
following two events:
• A: the outcome is an odd number

• A = {1,3,5}
• B: the outcome is greater than 4

• B = {5,6}
Probability theory
• Complement of an event A is denoted by A^c and is defined as the outcomes not in A.
• Intersection of events A and B is denoted by A ∩ B and is
defined as the outcomes shared by both A and B.
• Union of events A and B is denoted by A ∪ B and is defined as the
outcomes in either A or B or both.
• Null event denotes ’no outcome’ and is denoted by ϕ.

We have, A = {1,3,5} and B = {5,6}. Then,
• A^c: the outcome is an even number
• A^c = {2,4,6}
• B^c: the outcome is less than or equal to 4
• B^c = {1,2,3,4}
• A ∩ B: the outcome is an odd number AND greater than 4
• A ∩ B = {5}
• A ∪ B: the outcome is an odd number OR greater than 4
• A ∪ B = {1,3,5,6}
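The same example can be reproduced with Python's built-in sets, as a minimal sketch. The variable names are chosen for this illustration only.

```python
# Sample space for one throw of a six-sided dice.
omega = {1, 2, 3, 4, 5, 6}

A = {x for x in omega if x % 2 == 1}   # the outcome is an odd number
B = {x for x in omega if x > 4}        # the outcome is greater than 4

A_complement = omega - A   # outcomes not in A
B_complement = omega - B   # outcomes not in B
A_and_B = A & B            # intersection: outcomes in both A and B
A_or_B = A | B             # union: outcomes in A or B or both
```

Set difference against `omega` plays the role of the complement, because a complement is always taken relative to the sample space.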
Probability theory
• Sample space is denoted by Ω and is the collection of all outcomes
possible for a given experiment.
• For toss of a coin, Ω = {H, T}.
• For throw of a dice, Ω = {1,2,3,4,5,6}.
• What will be the sample space for income/height/weight of a person?
Probability – foundational definitions

• Probability is a measure. It measures the chance/likelihood of the
occurrence of events.
• Basic laws of probability

• 1 ≥ P (A) ≥ 0 for any event A (probabilities are always non-negative and between 0
and 1).
• P (Ω) = 1 for any experiment (total probability always sums to 1).
• P (ϕ) = 0 (probability of a null event is 0).
Calculating probabilities of events

• Relative frequency approach: if Ω consists of n equally likely
outcomes, k of which constitute an event A, then P(A) = k/n.
• When a dice is tossed, all six possible outcomes are equally likely and
three of those outcomes result in the event A (odd number).
Therefore, P(A) = 3/6 = ½.
• Other approaches to calculating probabilities:

• Modelling (for e.g using a probability distribution)
• Subjective probability (not covered in this course).
Probability – foundational definitions

• P (A^c) = 1 − P (A).
• For any two events A and B,

P (A ∪ B ) = P (A) + P (B) − P (A ∩ B).
• But if A and B are mutually exclusive, that is, A ∩ B = ϕ then,

P (A ∪ B ) = P (A) + P (B).

We have, P(A) = ½ and P(B) = 2/6 = 1/3. Then,
• P(A^c) = 1 – ½ = ½.
• P(B^c) = 1 – 1/3 = 2/3.
• A ∩ B = {5}, therefore, P(A ∩ B) = 1/6.
• A ∪ B = {1,3,5,6}, therefore, P(A ∪ B) = 4/6 = 2/3.
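These calculations can be verified with a short sketch that applies P(event) = k/n for equally likely outcomes. The helper `prob` is a name made up for this example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space for one throw of a dice
A = {1, 3, 5}                # odd outcome
B = {5, 6}                   # outcome greater than 4

def prob(event, sample_space=omega):
    """P(event) = k/n when all n outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_union_direct = prob(A | B)                    # count the outcomes in A ∪ B
p_union_rule = prob(A) + prob(B) - prob(A & B)  # addition rule
p_complement = 1 - prob(A)                      # complement rule
```

Subtracting P(A ∩ B) in the addition rule stops the shared outcome 5 from being counted twice.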
Conditional probability
• The conditional probability of A given B is
P (A|B) = P (A ∩ B) / P (B), provided P (B) > 0.
• If A and B are independent then, P (A ∩ B) = P (A)P (B).

• If A and B are independent then, P (A|B) = P (A)
Bayes’ theorem is essentially this conditional probability formula.

In our example, A|B will be interpreted as ‘the outcome is odd given
that it is greater than four’. Now, we have
• P(B) = 2/6 = 1/3.
• A ∩ B = {5}, therefore, P(A ∩ B) = 1/6.
Therefore, P(A|B) = P(A ∩ B) / P(B) = (1/6) / (1/3) = ½.
• We can see that P(A|B) = ½ = P(A), that is, A and B are independent
events.
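The dice example can be checked with a small sketch. The third event C (the outcome is greater than 3) is a hypothetical addition, not part of the original example, included to show what dependence looks like. The function names are assumptions.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}   # odd outcome
B = {5, 6}      # outcome greater than 4
C = {4, 5, 6}   # hypothetical extra event: outcome greater than 3

def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event), len(omega))

def conditional(event, given):
    """P(event | given) = P(event ∩ given) / P(given); requires P(given) > 0."""
    return prob(event & given) / prob(given)

def independent(e1, e2):
    """Two events are independent when P(e1 ∩ e2) = P(e1) P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

p_A_given_B = conditional(A, B)   # 1/2, the same as P(A)
p_A_given_C = conditional(A, C)   # 1/3, different from P(A)
```

Here `independent(A, B)` is true while `independent(A, C)` is false: knowing that the outcome is greater than 3 changes the chance that it is odd.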
Summary on probability theory

• Probability is a mathematical measure and must follow its basic laws.
• Probability is applied on sets, which denote the various events (possible outcomes).
• Sample space Ω is a collection of all possible outcomes of a random variable.
• For any continuous r.v., Ω is an infinite (uncountable) set.
• For any discrete r.v., Ω is a finite/countable set.
• Concepts learned in this probability primer will be needed later!
Bernoulli distribution (p)

• Used for a random variable that can only have two possible outcomes.
• For e.g.
• toss of coin (H/T),
• results of an experiment (success/failure), etc.
• It has only one parameter, p, which represents the probability of success.
• If we run successive experiments, where each one is independent and the outcome of an experiment is not governed by the outcomes of the previous ones, then the outcome of each of those experiments can be modelled using a Bernoulli distribution.
Bernoulli distribution (p)

• If X ~ Bernoulli(p), then X can take values 0 or 1.
• Sample space Ω = {0,1}
• The probability mass function is:

P(X = 1) = p
P(X = 0) = 1-p
• E[X] = p
• Var[X] = p(1-p)
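A quick simulation sketch illustrates the mean and variance formulas. The value p = 0.3, the sample size and the seed are arbitrary choices for illustration, and the function name is made up.

```python
import random

def simulate_bernoulli(p, size, seed=0):
    """Draw `size` independent Bernoulli(p) values (1 = success, 0 = failure)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(size)]

p = 0.3
draws = simulate_bernoulli(p, size=100_000)

# Sample mean and sample variance of the simulated outcomes.
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

# Theoretical values from the formulas above.
theory_mean = p
theory_var = p * (1 - p)
```

With this many draws the sample values should sit close to the theoretical ones, though they will not match exactly because the draws are random.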
Binomial distribution
• What if we have n repeated experiments?
• We could model them as n independent Bernoulli random variables OR
• We could model the distribution of the total no. of successes out of n!
• The total no. of successes out of n independent Bernoulli trials follows a Binomial distribution.
• e.g. no. of Heads/Tails out of n successive coin tosses, no. of wins out of the total no.
of matches played, etc.
• It has two parameters: n and p, where,

• n is the no. of independent repetitions
• p is the probability of success in each trial.
Binomial distribution (n,p)

• If X ~ Binomial(n,p), then X can take values 0,1,...,n.
• Sample space Ω = {0,1, … , n}

$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, x = 0,1,…,n.
• E[X] = np
• Var[X] = np(1-p)
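The binomial probabilities can be evaluated directly, as a sketch using the standard binomial probability mass function with `math.comb` for the binomial coefficient. The choice n = 10, p = 0.5 (ten tosses of a fair coin) is an assumption made for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5   # illustrative: ten tosses of a fair coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                           # should be 1
mean = sum(x * q for x, q in enumerate(pmf))               # should be n * p
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))  # should be n * p * (1 - p)
```

For these values the mean is 5 and the variance is 2.5, matching np and np(1-p).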
Poisson distribution (λ)

• What if the number of trials is VERY large (approaching infinity) and the
probability of success is VERY small (close to 0)?
• Here, the number of trials becomes immaterial and we are really modelling the
number of ‘rare’ events that can occur!
• The binomial distribution ‘converges’ to a Poisson distribution
• Poisson distribution models the number of occurrences of a ‘rare event’ and has only one parameter that denotes the rate at which such events occur.
• e.g. no. of road accidents, no. of injury claims received, no. of burglaries, etc.
• The rate is typically denoted using µ or λ.
Poisson distribution (λ)

• If X ~ Poisson(λ), then X can take values 0,1,…
• Sample space Ω = {0,1,2,3,…}

$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, x = 0,1,2,…
• E[X] = λ
• Var[X] = λ
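The link between the binomial and the Poisson can be illustrated numerically. In this sketch the rate λ = 2 and the values of n are arbitrary choices, and `max_gap` is a made-up helper. The success probability is set to λ/n so that the expected count np stays equal to λ as n grows.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

lam = 2.0   # illustrative rate

def max_gap(n):
    """Largest difference, over x = 0..10, between Binomial(n, lam/n) and Poisson(lam)."""
    return max(
        abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam)) for x in range(11)
    )

# The gap shrinks as the number of trials grows and each success becomes rarer.
gaps = {n: max_gap(n) for n in (10, 100, 10_000)}
```

The shrinking values in `gaps` are what the word ‘converges’ refers to.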
Exponential distribution (λ)

• When modelling rare events, you may also be interested in modelling the time
between two successive events.
• If the number of events follows a Poisson distribution then the time between the
events follows an Exponential distribution.
• For e.g.
• If your business was ram-raided, how long before it is targeted again?
• A large store has multiple self checkout machines and a steady stream of customers who
want to check out. If 3 self checkouts become available each minute (on an average), how
long before the next one becomes available?
• Exponential distribution is a continuous distribution and also has just one parameter, the
rate (similar to Poisson).
Exponential distribution (λ)

• If X ~ Exponential(λ), then X can take any value 0 or greater.
• Sample space Ω = [0,∞)

$f(x) = \lambda e^{-\lambda x}$, x ≥ 0 (a probability density, since X is continuous).
• E[X] = 1/λ
• Var[X] = 1/λ^2
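The self checkout example (3 machines becoming available per minute on average) can be sketched as follows. The code uses the standard result that P(X > t) = e^(-λt) for an exponential random variable. The sample size, the seed and the function names are assumptions.

```python
import math
import random

rate = 3.0   # self checkouts becoming available per minute (from the example)

def exponential_density(x, lam):
    """Probability density of Exponential(lam) at x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def prob_wait_longer_than(t, lam):
    """P(X > t) = e^(-lam * t): the area under the density to the right of t."""
    return math.exp(-lam * t)

mean_wait = 1 / rate   # E[X] = 1/lam, i.e. 20 seconds on average

# Simulated waiting times in minutes, to compare with the theoretical mean.
rng = random.Random(0)
waits = [rng.expovariate(rate) for _ in range(100_000)]
simulated_mean = sum(waits) / len(waits)

p_over_one_minute = prob_wait_longer_than(1.0, rate)   # about 0.05
```

So, under this model, waiting more than a full minute for the next free machine happens only about 5% of the time.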
Summary on probability distributions

• It is not possible to predict the outcome of a random variable/event with
certainty.
• Probability distributions quantify the uncertainty in the values a random variable can take!
• Discrete probability distributions provide the actual probability of each value, but the continuous distributions provide the probability density (relative likelihood) of each value.
• Normal distribution can be completely determined by mean and standard deviation. But other distributions have other parameters.
Simulation and real data exercises