
STATS707 Computational Intro to Statistics Basics of Probability Theory

Basics of Probability Theory


• Random variables
• Probability distributions
• Expectation
• Variance
• Normal distribution
• Other commonly used distributions
• Bernoulli, Binomial, Poisson, Exponential
• Simulation exercises and real data
• Laws of probability
• Conditional probability

University of Auckland

Probability Distributions
What are they?

Why do we need them?

Probability Distributions

What are they?

• Mathematical functions which determine the probability that a random variable takes a particular value or values

• Discrete probability distributions


• Continuous probability distributions


Probability Distributions

Why do we need them?

• Because we have to understand the uncertainty associated with ‘random variables’.

• But what are ‘random variables’? Why should we bother?


• How are they different from ‘variables’?

Variables
• Can take different values
• The value that the variable will take is deterministic.
• Governed by an order
• Or determined by other factors
• e.g.
• Time (thankfully deterministic – set order!)
else, tomorrow could be exam day!
• BMI = Weight/ (Height^2)
Can determine what BMI would be for given height and weight
• any other?


Random variables
• Can take different values
• BUT the value is governed by a random mechanism
• Sampling a particular unit, e.g.
• Height/weight of a person (we are all different!)
• Weekly income of a family
• Complex combination of various uncertain factors, e.g.
• What will the weather be today?
• When will the bus arrive?
• Outcome of a match/exam?
• Return on investment
• Behaviour of a person
• … and so on….(almost everything we come across)

Random variables
• The only way to understand random variables is to assign probabilities to all possible outcomes

• This is precisely what probability distributions do.


• They are mathematical functions which assign these probabilities

• We will not go into the maths in this course


• But we do want to be able to ‘read’ a probability distribution plot (to understand what
probabilities it is assigning)


Types of random variables


• Discrete random variables
• Can take only finitely many (or countably many) distinct values
• Weather in Auckland (sunny, overcast, showery)
• Outcome of a test match (win, lose, draw)

• Continuous random variables


• Can take any value in a range, i.e. infinitely many values
• When will bus arrive today?
• Return on investment
• Height of a person (exact measurement – if it was ever possible!)
• Income of a family (again, assuming exact measurement)
• In practice, variables which can take a large number of distinct values are considered
continuous (e.g. income in $, height in cm)

Notations - I
• Random variables are often denoted by letters, e.g. X, Y or Z, etc.

• Subscripts are used to denote multiple random variables. For example:


• Xi to denote (say) the income of the ith individual,
• Zt to denote the temperature at time t, etc.

• Sometimes, lowercase letters are used to denote the notional observed value. For example:
• Y = y, indicates that the observed value of the random variable Y is y,
• Xi = xi , i= 1,…,n indicates that we have observed n random variables (say, heights of n
individuals) and the notional observed value of the ith random variable Xi is xi.


Notations - II
• A general probability distribution may be represented as f(x) or p(x)
• Typically, f(x) is used for continuous distributions and p(x) for discrete.
• For discrete, p(x) indicates the probability of X taking a value x.
• For continuous, f(x) indicates the probability density of X at the value x.

• Specific probability distributions can also be expressed. For example:


• N(µ,σ) denotes a normal distribution with mean µ and standard deviation σ.

• The symbol ~ is used to denote that a certain random variable follows a certain probability distribution. For example:
• X ~ f(x) denotes that X follows the probability distribution f(x)
• Z ~ N(0,1) denotes that Z follows the normal distribution with mean 0 and std dev 1.
• For multiple random variables, we may say Xi ~ f(x), i= 1,…,n.


Rules of probability
• Probabilities are never negative and always lie between 0 and 1.

• Total probability, i.e. the sum of the probabilities of all possible values, must always equal one!
• For discrete probability distributions, all bars must be between 0 and 1 and the sum of all bars must equal one.
• For continuous probability distributions, the total area under the curve must equal one.
• For continuous random variables, since they can take infinitely many different values, the probability of taking any particular value is always equal to zero.


Reading a discrete probability distribution

Weather in Auckland

What is the probability that a particular day will be sunny, overcast or showery?

• Random variable: weather, with only 3 possible values
• [Figure: bar chart. The horizontal axis (weather) shows the values the variable can take; the vertical axis (probability, ticks from 0 to 0.4) shows the probability for each of these values. The height of the ‘sunny’ bar gives Probability(sunny).]


Reading a continuous probability distribution

Height of a male student, measured in cm

• Random variable: height, which can take many possible values
• [Figure: probability density curve. The horizontal axis (height in cm, ticks at 150, 170, 190) shows the values the variable can take; the vertical axis (probability density, ticks at 0, 0.02, 0.04) shows the probability density for each of these values.]
• Area under the curve = 1
• Prob (height < 170) = shaded area


Reading a continuous probability distribution

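Height of a male student, measured in cm

• Random variable: height, which can take many possible values
• [Figure: probability density curve. Horizontal axis: height in cm (ticks at 150, 170, 190); vertical axis: probability density (ticks at 0, 0.02, 0.04).]
• What would the probability distribution for the height of a female student look like?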


Expectation and Variance

• Some of the most basic statistical concepts


• Summarising data


Expectation or Expected Value

• Expected value of a random variable


• Is ‘simply’ the mean μ of the random variable
• Captures the ‘central tendency’: data is centred around mean.
• Also known as Expectation.
• Expectation of a random variable X is denoted by E(X).


Variance

• Variance of a random variable


• Square of the standard deviation = σ^2
• Captures the variability in the data
• Variance of a random variable X is denoted by V(X) or Var(X).
• It is the expected squared difference from the mean, Var(X) = E[(X - µ)^2]
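As a small worked illustration of E(X) and Var(X), the sketch below computes both for a fair six-sided dice. The dice and the name `pmf` are assumptions chosen for the example, not something defined on these slides; exact fractions are used so the results can be checked by hand.

```python
from fractions import Fraction

# Assumed example: a fair six-sided dice, each face with probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible values must sum to one.
total = sum(pmf.values())

# E(X): the probability-weighted average of the possible values.
mean = sum(x * p for x, p in pmf.items())

# Var(X) = E[(X - mu)^2]: the probability-weighted average squared
# difference from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(total, mean, variance)  # 1 7/2 35/12
```

The standard deviation is the square root of the variance, here roughly 1.71.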


Normal distribution
• Symmetric bell shaped distribution. Symmetric about µ (mean)

• Tails go down fairly fast. Usually no observations more than 3σ (standard deviations) away from the mean.

• approx 68% within 1σ of mean
• approx 95% within 2σ of mean
• approx 99.7% within 3σ of mean
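The three percentages above can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the mean of 170 cm and standard deviation of 10 cm are illustrative assumptions, and the same proportions come out for any other choice.

```python
from statistics import NormalDist

# Assumed parameters for illustration only: mean 170 cm, sd 10 cm.
mu, sigma = 170.0, 10.0
height = NormalDist(mu, sigma)

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma)."""
    return height.cdf(mu + k * sigma) - height.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Prints approximately 0.6827, 0.9545 and 0.9973.
    print(k, round(prob_within(k), 4))
```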


Normal distribution

Area between µ - 1σ and µ + 1σ = 0.68

68% of the observations are likely to be between µ - 1σ and µ + 1σ


Normal distribution

Area between µ - 2σ and µ + 2σ = 0.95

95% of the observations are likely to be between µ - 2σ and µ + 2σ


Normal distribution

Area between µ - 3σ and µ + 3σ = 0.997

99.7% of the observations are likely to be between µ - 3σ and µ + 3σ


Normal distribution

• Any Normal distribution has same basic shape.

• It may have a different mean (be shifted)

• It may have a different standard deviation (width)

• Normal distribution is completely determined by mean µ and standard deviation σ

• Any normal distribution can be transformed back to a standard normal (µ=0, σ=1) by subtracting its mean and dividing by its standard deviation
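The standardisation step can be sketched as follows. The values of µ, σ and x are assumptions picked for illustration; the point is only that a probability computed on the original scale matches the one computed on the standard normal scale after subtracting the mean and dividing by the standard deviation.

```python
from statistics import NormalDist

# Assumed values for illustration only.
mu, sigma = 170.0, 10.0
x = 185.0

# Standardise: subtract the mean, divide by the standard deviation.
z = (x - mu) / sigma  # 1.5 standard deviations above the mean

# P(X < x) on the original scale and P(Z < z) on the standard scale.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0, 1).cdf(z)

print(z, round(p_original, 4), round(p_standard, 4))
```

Both probabilities agree (about 0.933), which is why tables and software only need the standard normal.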


Normal distribution
Two Normal distributions with different (shifted) means but the same standard deviation


Normal distribution
Two Normal distributions with different standard deviations but the same means


Non-normal distributions


• Often, normal distribution is not the answer. The data may be discrete or the distribution may be skewed.

• Some of the commonly used distributions for modelling data are


• Bernoulli
• Binomial
• Poisson
• Exponential

• Other distributions commonly used for hypothesis tests are


• t, F and Chi-squared


Three different Gamma distributions


Three different Beta distributions


Non-normal distributions


• Normal distribution is special in that
• It is always symmetric
• It is defined by mean and variance
• It can be defined over any (positive or negative) values of X.

• The t-distribution is also symmetric and, for large degrees of freedom, very close to N(0,1)

• But other distributions are mostly not symmetric and often only applicable if X takes values in a certain range (>0 or between 0 and 1, etc.)

• Also, non-normal distributions are defined using different parameters such as degrees of freedom, scale, shape, etc., not by mean and variance.


Probability theory

• Learning the basics …


Probability theory
• Events – complement, intersection, union and sample space

• Laws of probability

• Probability of union and intersection of events

• Conditional probability, dependent and independent events


Probability theory
• Statistics is about decision making under uncertainty.

• One important way to quantify uncertainty is using probabilities.

• But before we can define probability, we need to define events.


Probability theory
• An event is an outcome, or a set of outcomes, of a trial or a random experiment.
• For example, tossing a coin is a random experiment and each of the possible outcomes - H or T - is an event.

• We denote an event using capital letters such as ’A’ and ’B’.

• Events A and B are said to be independent if the occurrence of one does not change the probability of the other. Otherwise they are said to be dependent.


Example – throw of a dice


• Let’s consider the outcome of a throw of a dice and define the
following two events:

• A: the outcome is an odd number


• A = {1,3,5}

• B: the outcome is greater than 4


• B = {5,6}


Probability theory
• Complement of an event A is denoted by Ac and is defined as the set of outcomes that are not in A.
• Intersection of events A and B is denoted by A ∩ B and is defined as the set of outcomes shared by both A and B.
• Union of events A and B is denoted by A ∪ B and is defined as the set of outcomes that belong to A or B or both.
• Null event denotes ’no outcome’ and is denoted by ϕ.


Example – throw of a dice


We have, A = {1,3,5} and B = {5,6}. Then,
• Ac: the outcome is an even number
• Ac = {2,4,6}
• Bc : the outcome is less than or equal to 4
• Bc = {1,2,3,4}
• A ∩ B: the outcome is an odd number AND greater than 4
• A ∩ B = {5}
• A ∪ B: the outcome is an odd number OR greater than 4
• A ∪ B = {1,3,5,6}
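The dice events above map directly onto Python's built-in set operations. This is a minimal sketch; the variable names (`omega`, `A_c`, and so on) are chosen for this example only.

```python
# All possible outcomes of one throw of a dice.
omega = set(range(1, 7))

# A: the outcome is odd; B: the outcome is greater than 4.
A = {x for x in omega if x % 2 == 1}
B = {x for x in omega if x > 4}

A_c = omega - A   # complement of A: outcomes not in A
B_c = omega - B   # complement of B: outcomes not in B
A_and_B = A & B   # intersection: outcomes in both A and B
A_or_B = A | B    # union: outcomes in A or B or both

print(sorted(A_c), sorted(B_c), sorted(A_and_B), sorted(A_or_B))
# [2, 4, 6] [1, 2, 3, 4] [5] [1, 3, 5, 6]
```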


Probability theory
• Sample space is denoted by Ω and is the set of all possible outcomes of a given experiment.
• For toss of a coin, Ω = {H, T}.
• For throw of a dice, Ω = {1,2,3,4,5,6}.
• The null event ϕ is an event (the empty subset of Ω), not one of the outcomes in Ω.

• What will be the sample space for income/height/weight of a person?


Probability – foundational definitions


• Probability is a measure. It measures the chance/likelihood of the
occurrence of events.

• Basic laws of probability


• 1 ≥ P (A) ≥ 0 for any event A (probabilities are never negative and always lie between 0 and 1).
• P (Ω) = 1 for any experiment (total probability always sums to 1).
• P (ϕ) = 0 (probability of a null event is 0).


Calculating probabilities of events


• Relative frequency approach: if Ω consists of n equally likely
outcomes, k of which constitute an event A, then P(A) = k/n.
• When a dice is tossed, all six possible outcomes are equally likely and
three of those outcomes result in the event A (odd number).
Therefore, P(A) = 3/6 = ½.

• Other approaches to calculating probabilities:


• Modelling (e.g. using a probability distribution)
• Subjective probability (not covered in this course).


Probability – foundational definitions


• P (Ac) = 1 − P (A).

• For any two events A and B,


P (A ∪ B ) = P (A) + P (B) − P (A ∩ B).

• But if A and B are mutually exclusive, that is, A ∩ B = ϕ then,


P (A ∪ B ) = P (A) + P (B).


Example – throw of a dice


We have, P(A) = ½ and P(B) = 2/6 = 1/3. Then,

• P(Ac) = 1 – ½ = ½.

• P(Bc) = 1 – 1/3 = 2/3.

• A ∩ B = {5}, therefore, P(A ∩ B) = 1/6.

• A ∪ B = {1,3,5,6}, therefore, P(A ∪ B) = 4/6 = 2/3.
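These probabilities can be reproduced with the k/n rule for equally likely outcomes. In the sketch below, `prob` is a hypothetical helper written for this example; it also confirms that the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B) gives the same answer as counting the outcomes in A ∪ B directly.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space for one throw of a dice
A = {1, 3, 5}               # the outcome is odd
B = {5, 6}                  # the outcome is greater than 4

def prob(event):
    """P(event) = k/n when all n outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

p_A_c = 1 - prob(A)                              # complement rule
p_union_direct = prob(A | B)                     # count outcomes in A ∪ B
p_union_rule = prob(A) + prob(B) - prob(A & B)   # addition rule

print(prob(A), prob(B), p_A_c, prob(A & B), p_union_direct, p_union_rule)
# 1/2 1/3 1/2 1/6 2/3 2/3
```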


Conditional probability
• The conditional probability of A given B is

P (A|B) = P (A ∩ B) / P (B), provided P (B) > 0.

• If A and B are independent then, P (A ∩ B) = P (A)P (B).


• If A and B are independent then, P (A|B) = P (A)

Bayes’ theorem follows directly from this conditional probability formula.


Example – throw of a dice


In our example, A|B will be interpreted as ‘the outcome is odd given
that it is greater than four’. Now, we have
• P(B) = 2/6 = 1/3.
• A ∩ B = {5}, therefore, P(A ∩ B) = 1/6.

Therefore, P(A|B) = P(A ∩ B) / P(B) = (1/6) / (1/3) = ½.

• We can see that P(A|B) = ½ = P(A), that is, A and B are independent events.
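The conditional probability calculation can be sketched in code as below. The helpers `prob` and `cond_prob` are hypothetical names for this example, and the event C (outcome greater than 3) is not from the slides; it is added only to show what a dependent pair looks like.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}   # the outcome is odd
B = {5, 6}      # the outcome is greater than 4
C = {4, 5, 6}   # assumed extra event for contrast: greater than 3

def prob(event):
    """P(event) = k/n for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b); only defined when P(b) > 0."""
    return prob(a & b) / prob(b)

p_A_given_B = cond_prob(A, B)   # (1/6) / (1/3) = 1/2, equal to P(A)
p_A_given_C = cond_prob(A, C)   # (1/6) / (1/2) = 1/3, not equal to P(A)

# Independence check: P(A ∩ B) = P(A) P(B).
A_B_independent = prob(A & B) == prob(A) * prob(B)
A_C_independent = prob(A & C) == prob(A) * prob(C)

print(p_A_given_B, A_B_independent, p_A_given_C, A_C_independent)
# 1/2 True 1/3 False
```

Knowing that the outcome is greater than 4 leaves the chance of an odd number unchanged, whereas knowing that it is greater than 3 lowers it.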


Summary on probability theory


• Probability is a mathematical measure and must follow its basic laws.

• Probability is applied to sets, which denote the various events (collections of possible outcomes).

• Sample space Ω is the collection of all possible outcomes of a random variable.
• For any continuous r.v., Ω is an uncountably infinite set (e.g. an interval).
• For any discrete r.v., Ω is a finite or countable set.

• Concepts learned in this probability primer will be needed later!


Bernoulli distribution (p)


• Used for a random variable that can only have two possible outcomes.

• For example:
• toss of coin (H/T),
• results of an experiment (success/failure), etc.

• It has only one parameter, p, which represents the probability of success.

• If we run successive experiments, where each one is independent and the outcome of an experiment is not governed by the outcomes of the previous ones, then the outcome of each of those experiments can be modelled using a Bernoulli distribution.


Bernoulli distribution (p)


• If X ~ Bernoulli(p), then X can take values 0 or 1.

• Sample space Ω = {0,1}

• The probability mass function is:


P(X = 1) = p
P(X = 0) = 1-p

• E[X] = p
• Var[X] = p(1-p)


Binomial distribution
• What if we have n repeated experiments?
• We could model them as n independent Bernoulli random variables OR
• We could model the distribution of the total no. of successes out of n!

• The total no. of successes out of n independent Bernoulli trials, each with the same probability of success, follows a Binomial distribution.
• e.g. no. of Heads/Tails out of n successive coin tosses, no. of wins out of the total no. of matches played, etc.

• It has two parameters: n and p, where,

• n is the no. of independent repetitions
• p is the probability of success in each trial.


Binomial distribution (n,p)


• If X ~ Binomial(n,p), then X can take values 0,1,...,n.

• Sample space Ω = {0,1, … , n}

• The probability mass function is:


$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad x = 0, 1, \dots, n.$

• E[X] = np
• Var[X] = np(1-p)
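The Binomial probability mass function, P(X = x) = (n choose x) p^x (1 − p)^(n − x), can be evaluated directly with `math.comb`. The sketch below uses n = 10 and p = 0.5 as assumed example values (ten tosses of a fair coin) and checks that the probabilities sum to one and reproduce E[X] = np and Var[X] = np(1 − p).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumed example: number of heads in 10 tosses of a fair coin.
n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                       # should be 1
mean = sum(x * q for x, q in enumerate(pmf))           # should be n*p = 5
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))  # n*p*(1-p) = 2.5

print(total, mean, variance, pmf[5])
```

Here `pmf[5]`, the probability of exactly five heads, is 252/1024, roughly 0.246.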


Poisson distribution (λ)


• What if the number of trials is VERY large (approaching infinity) and the
probability of success is VERY small (close to 0)?
• Here, the number of trials becomes immaterial and we are really modelling the
number of ‘rare’ events that can occur!
• The binomial distribution ‘converges’ to a Poisson distribution

• Poisson distribution models the number of occurrences of a ‘rare event’ and has only one parameter that denotes the rate at which such events occur.
• e.g. no. of road accidents, no. of injury claims received, no. of burglaries, etc.

• The rate is typically denoted using µ or λ.


Poisson distribution (λ)


• If X ~ Poisson(λ), then X can take values 0,1,…

• Sample space Ω = {0,1,2,3,…}

• The probability mass function is:


$P(X = x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$

• E[X] = λ
• Var[X] = λ
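To see the Poisson probabilities in action, and the sense in which a Binomial with very large n and very small p ‘converges’ to a Poisson, here is a minimal sketch. The rate λ = 2 and n = 10,000 are assumptions for illustration, with p set to λ/n so that the expected count np stays equal to λ.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * exp(-lam) / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

lam = 2.0      # assumed rate, e.g. 2 events per day on average
n = 10_000     # very many trials ...
p = lam / n    # ... each with a very small success probability

# Largest difference between the two pmfs over x = 0, ..., 10.
max_gap = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11)
)

# Mean of the Poisson, truncating the infinite sum at x = 50.
mean = sum(x * poisson_pmf(x, lam) for x in range(51))

print(max_gap, mean)
```

The gap is tiny, and the truncated mean is numerically equal to λ, matching E[X] = λ.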


Exponential distribution (λ)


• When modelling rare events, you may also be interested in modelling the time
between two successive events.
• If the number of events follows a Poisson distribution then the time between the
events follows an Exponential distribution.
• For example:
• If your business was ram-raided, how long before it is targeted again?
• A large store has multiple self checkout machines and a steady stream of customers who want to check out. If 3 self checkouts become available each minute (on average), how long before the next one becomes available?

• Exponential distribution is a continuous distribution and also has just one parameter, the rate (similar to Poisson).


Exponential distribution (λ)


• If X ~ Exponential(λ), then X can take any value 0 or greater.

• Sample space Ω = [0,∞)

• The probability density function is:


$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0.$

• E[X] = 1/λ
• Var[X] = 1/λ^2
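For the self checkout example (3 machines becoming available per minute on average), the Exponential distribution gives the waiting time until the next one. The sketch below uses the density λe^(−λx) and the standard cumulative probability P(X ≤ x) = 1 − e^(−λx); the function names and the 20-second question are assumptions made for this example.

```python
from math import exp

def exponential_pdf(x, lam):
    """Probability density of Exponential(lam) at x (zero for x < 0)."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x, lam):
    """P(X <= x) for X ~ Exponential(lam): area under the density up to x."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

lam = 3.0              # rate: 3 checkouts become available per minute
mean_wait = 1 / lam    # E[X] = 1/lam minutes, i.e. 20 seconds

# Assumed question: chance that a checkout frees up within 20 seconds.
p_within_20s = exponential_cdf(20 / 60, lam)

print(mean_wait, round(p_within_20s, 3))
```

Note that the density at a point, such as `exponential_pdf(0, lam)` = 3, can exceed 1; it is a density rather than a probability, and only areas under the curve are probabilities.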


Summary on probability distributions


• It is not possible to predict the outcome of a random variable/event with certainty.

• Probability distributions quantify the uncertainty in the values a random variable can take!

• Discrete probability distributions provide the actual probability of each value, but continuous distributions provide the probability density; probabilities are then areas under the curve.

• Normal distribution can be completely determined by mean and standard deviation. But other distributions have other parameters.


Simulation and real data exercises
