
SIT787: Mathematics for Artificial Intelligence

Topic 3: Probability
Asef Nazari

School of Information Technology, Deakin University

Asef Nazari Math AI - SIT787 Topic 3 1 / 128
Exploratory Data Analysis
Data Collection and Sampling
Statistics, the art and science of learning from data
Everything starts with a question
Collection, description, analysis, drawing conclusion
Journey from data to models and back
Data is the essence of modelling,
models are prepared to describe data
Data are obtained from experiments and are the result of
measuring some characteristics or property of objects
Each row is for a case, an observation, or an object
These characteristics or properties are called variables
Data variables or features
Numerical (quantitative): can be measured on a numerical scale or counted
A variable is numerical if its observation takes numerical values
that represent different magnitudes of the variable
Continuous or discrete
your weight is continuous and the number of cars in a
household is discrete
Categorical (qualitative): take values that are non-numerical in nature
If a variable has observations belonging to one of a set of
categories, it is called categorical.
It is meaningless to do arithmetic on categorical data (like
postcodes)
Categorical data could only be classified into categories, levels
or classes
Nominal or ordinal
“yes” or “no” for whether someone passed the driving test, gender
Level of education
Populations and samples

The set of all the objects of interest is called the population


Populations are not generally available to study, or it is very
difficult and costly to access them
In a census, we are actually trying to capture the whole population.
Instead of accessing the whole population,
we might take a sample, a smaller subset of the population
and conduct the study over the sample
Based on the information we get from the sample, we make
inferences about the population
We need to be very careful in choosing a sample.
It should be a good representative of the population
A good sample presents similar characteristics to the
population
A random sample: every individual in the population has the
same chance to be selected in the sample
Issues in data collection

Noise
there are always sources of inaccuracies in any real data
collected
Missing values
Missing variable
latent variable analysis (latent variable = hidden variable = missing variable)
Bias
systematically inaccurate estimates of population values
Outliers
These are data points that lie well beyond the bulk of samples

Sample statistics and population parameters

Numerical summaries of the population are called parameters


and generally shown by Greek letters.
A parameter is a numerical summary of the population
A statistic is a numerical summary of a sample
We are interested in learning about the parameters to have a better understanding of the population.
The parameter values are almost always unknown.
We use sample statistics to estimate the related parameter
values.
We estimate population parameters using the information
obtained from a sample from it.

Notations

We must also distinguish between the sample statistics that we have calculated so far and the corresponding population parameters
Population mean µ
Sample mean x̄
Population variance σ²
Sample variance s²
Population standard deviation σ
Sample standard deviation s

Data table

Columns are variables or features


rows are cases or observations; assume we have m observations
A column is considered a random variable with unknown parameters
we only have a sample x1 , x2 , . . . , xm
Assume that all the observations are selected randomly
if we have n columns (assume they are all numerical), then we have a system of random variables (X1 , X2 , . . . , Xn )
So, for each column j, x1j , x2j , . . . , xmj ∼iid Xj

Exploratory data analysis

Consider single variables
Detect the type
Find summary statistics
measure central tendencies (mean, median, mode) and dispersion (variance, std, range, IQR)
frequency and relative frequency tables
Visualisation: histograms, barplots, boxplots, piecharts
Consider variables two at a time
Find the correlation between them
Scatter plots, contingency tables and plots

Measures of central tendency

To summarise a categorical variable: frequency table


The category with the highest frequency is called the modal
category
Bar plots
we may use relative frequencies
To summarise a numerical variable {x1 , x2 , . . . , xn }
Average or mean x̄ = (x1 + x2 + · · · + xn )/n
Median x̃: sort the data increasingly, and choose the middle value for odd n, or the average of the two middle values for even n
It is the element such that 50% of the data is less than x̃ and 50% of the data is larger than x̃.
The median is more robust than the average, as it is not affected by outliers
Mode: the most frequent item
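As a sketch, these central-tendency measures can be computed with Python's standard library; the sample data below is made up for illustration:

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 9, 100]  # hypothetical sample; 100 is an outlier

print(mean(data))    # the outlier pulls the mean up to about 18.4
print(median(data))  # the median stays at the middle value, 5
print(mode(data))    # the most frequent item, 3
```

Comparing the three on the same data illustrates the robustness claim above: one extreme value moves the mean a lot but leaves the median untouched.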
Measures of statistical dispersion
Variance s² = Σᵢ (xi − x̄)² / (n − 1), summing over i = 1, . . . , n
Standard deviation s = √s²
IQR: consider the increasingly sorted data x1 , x2 , x3 , x4 , x5 , x6 , x7
The median x̃ = x4 , which is called the 50th percentile or second quartile Q2 .
The median of the left side of x̃ is x2 , which is called the 25th percentile, or the first quartile Q1
The median of the right side of x̃ is x6 , which is called the 75th percentile or the third quartile Q3
IQR = Q3 − Q1
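A small sketch of these dispersion measures, using the slide's median-of-halves definition of the quartiles (note that library functions such as numpy's percentile interpolate and may give slightly different quartiles):

```python
from statistics import median, stdev, variance

data = sorted([1, 2, 3, 4, 5, 6, 7])  # x1, ..., x7 in increasing order
n = len(data)

s2 = variance(data)  # sample variance, divides by n - 1
s = stdev(data)      # sample standard deviation, sqrt(s2)

# Quartiles as medians of the halves on either side of the median
q2 = median(data)
q1 = median(data[: n // 2])     # left half, excludes the median for odd n
q3 = median(data[-(n // 2):])   # right half
iqr = q3 - q1

print(s2, s, q1, q2, q3, iqr)
```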

Probability Theory
Probability: big picture

Two interpretations of uncertainty


the fraction of times an event occurs
degree of belief about an event
Sources of uncertainty
uncertainty in the data,
uncertainty in the machine learning model, and
uncertainty in the predictions produced by a model.
Quantifying uncertainty requires the idea of a random variable
Associated with the random variable there is a function called
probability distribution
Random variables and their distributions are meant to
describe populations in Statistics.

This file is meant for personal use by amangupta0141@gmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Asef Nazari Math AI - SIT787 Topic 3 14 / 128
Sample space, events, and probability

Consider a random experiment or trial


We know all the outcomes
but we do not know which one will happen
outcome is not predictable with certainty in advance
The sample space Ω
the set of all possible outcomes of the experiment
The event space A
A subset of the sample space is called an event A ⊂ Ω
An event A occurs if the outcome of the experiment is a member of A
A is the collection of all subsets of Ω
A is the power set of Ω
The probability P
With each A ∈ A we associate a number P (A)
measures the probability that the event A will occur
(Ω, A, P ) is called a probability space
The big picture

Our aim is to learn from data


Any data collection has noise (randomness)
We use probability theory (in particular random variables) to model the data and deal with the noise
Different types of data features can be modelled using different types of random variables
We first do exploratory data analysis to learn some aspects of the data
Using descriptive statistics to summarise the data
use visualisation to represent the data efficiently
We may look at features one at a time, or two together

The data

Variables or features each will be modelled using a random


variable
Machine learning is the same as statistical learning

Each column, variable, or feature is treated as a random variable
Here we need to study a system of random variables together.
Explanatory Variables vs. Response Variables
supervised and unsupervised learning
Examples of sample space

If the outcome of an experiment consists in the determination of the sex of a newborn child, then
Ω = {m, f }

If the experiment consists of the running of a race among the seven horses having post positions 1, 2, 3, 4, 5, 6, 7, then
Ω = {all orderings of (1, 2, 3, 4, 5, 6, 7)}

Suppose we are interested in determining the amount of dosage that must be given to a patient until that patient reacts positively
Ω = (0, ∞)

Random experiments

Toss a fair (or unfair) coin once, twice, and 3 times


Ω1 = {H, T }
Ω2 = {HH, HT, T H, T T }
Ω3 = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }
roll a fair die, and roll two fair dice
Ω4 = {1, 2, 3, 4, 5, 6}
Ω5 = {(i, j)|i, j ∈ {1, 2, 3, 4, 5, 6}}
toss a fair (or unfair) coin n times (n = 5 for example)
Ω6 = {HHHHH, HHHHT, . . .}, |Ω6 | = 2ⁿ
toss a coin until the first head appears
Ω7 = {H, T H, T T H, T T T H, . . .}
number of arrivals to a shop in a given period of time
Ω8 = {0, 1, 2, 3, . . .}
lottery game playing with a coin
Ω9 = {H, T }
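The finite sample spaces above can be generated programmatically; a quick sketch with itertools:

```python
from itertools import product

# Omega_3: all sequences of H/T for three coin tosses
omega3 = [''.join(t) for t in product('HT', repeat=3)]
print(len(omega3), omega3[:4])  # 8 outcomes, starting HHH, HHT, HTH, HTT

# |Omega| doubles with each extra toss: 2^n outcomes for n tosses
assert all(len(list(product('HT', repeat=n))) == 2 ** n for n in range(1, 6))

# Omega_5: ordered pairs (i, j) for two fair dice
omega_dice = list(product(range(1, 7), repeat=2))
assert len(omega_dice) == 36
```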
Random experiments

Number of emails I receive every day Ω = {0, 1, 2, . . .}


Amount of time someone spends on social media in a day
Ω = [0, 24]

The concept of probability
For a randomised experiment or trial, the probability is the
likeliness of a particular outcome
in tossing two coins, Ω = {hh, ht, th, tt}
If we are interested in the cases where the first coin lands
heads E = {hh, ht}
When all possible outcomes are equally likely
P (E) = (number of outcomes in E) / (number of outcomes in the sample space)
P (E) = 2/4 = 0.5
Two events are called disjoint or exclusive if they have no
outcome in common
Event E the first coin lands heads: E = {hh, ht}
Event F the first coin lands tails: F = {tt, th}

E ∩ F = ∅
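The equally-likely formula can be checked by brute-force enumeration; a minimal sketch for the two-coin example:

```python
from itertools import product

# Sample space for tossing two coins
omega = [''.join(t) for t in product('ht', repeat=2)]  # hh, ht, th, tt

# Event E: the first coin lands heads; event F: the first coin lands tails
E = [w for w in omega if w[0] == 'h']
F = [w for w in omega if w[0] == 't']

p_E = len(E) / len(omega)  # equally likely outcomes: |E| / |Omega|
print(p_E)                 # 0.5

assert set(E) & set(F) == set()  # E and F are disjoint
```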
Algebra of events

Suppose E1 and E2 are events (E1 , E2 ⊂ Ω)


You can make new events by any set operations that you
know
The union of events E1 ∪ E2
The intersection of events E1 ∩ E2
The difference of events E1 − E2
The complement of an event E c = Ω − E is another event; for example Ωc = ∅, ∅c = Ω
Axioms of probability (Kolmogorov)
1 P (Ω) = 1
2 For E ⊂ Ω, 0 ≤ P (E) ≤ 1
3 For two disjoint events E1 , E2 ⊂ Ω and E1 ∩ E2 = ∅

P (E1 ∪ E2 ) = P (E1 ) + P (E2 )

Some extensions
For any two events E1 , E2 ⊂ Ω
P (E1 ∪ E2 ) = P (E1 ) + P (E2 ) − P (E1 ∩ E2 )
The collection of events {E1 , E2 , . . . , En } is called mutually disjoint if Ei ∩ Ej = ∅ for every i ≠ j
For mutually disjoint events {E1 , E2 , . . . , En }
P (E1 ∪ E2 ∪ . . . ∪ En ) = P (E1 ) + P (E2 ) + . . . + P (En )
Some propositions

1 = P (Ω) = P (E ∪ E c ) = P (E) + P (E c ), which implies that
P (E c ) = 1 − P (E)
For any two events E1 , E2 ⊂ Ω
P (E1 ∪ E2 ) = P (E1 ) + P (E2 ) − P (E1 ∩ E2 )
Example

A total of 28 percent of males living in a city smoke cigarettes, 6 percent smoke cigars, and 3 percent smoke both cigars and cigarettes. What percentage of males smoke neither cigars nor cigarettes?
Let E be the event that a randomly chosen male is a cigarette smoker: P (E) = 0.28
Let F be the event that he is a cigar smoker: P (F ) = 0.06
P (E ∩ F ) = 0.03
Then, the probability this person is either a cigarette or a cigar smoker is
P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F ) = 0.28 + 0.06 − 0.03 = 0.31
Hence the probability he smokes neither is 1 − P (E ∪ F ) = 1 − 0.31 = 0.69, i.e. 69 percent.
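The inclusion-exclusion step and the final complement can be sketched in a few lines:

```python
# Smoking example: inclusion-exclusion, then the complement rule
p_cigarettes = 0.28
p_cigars = 0.06
p_both = 0.03

p_either = p_cigarettes + p_cigars - p_both  # P(E union F)
p_neither = 1 - p_either                     # complement rule

print(round(p_either, 2), round(p_neither, 2))  # 0.31 0.69
```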

This file is meant for personal use by amangupta0141@gmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Asef Nazari Math AI - SIT787 Topic 3 25 / 128
Conditional Probability
and Independent Events
Conditional probability

A and B are two events, A ⊂ Ω and B ⊂ Ω


P (A|B) = P (A ∩ B) / P (B)

P (A ∩ B) = P (A|B)P (B)
P (A ∩ B) = P (B|A)P (A)
useful to check whether events are dependent or independent
Two events with non-zero probabilities are independent iff
P (A ∩ B) = P (A)P (B)
P (A|B) = P (A)
P (B|A) = P (B)
If E and F are independent, then so are E and F c .
If we have several independent models, it is better to make an
ensemble model.
Conditional probabilities: example of die tossing

Toss a die Ω = {1, 2, 3, 4, 5, 6}


Event A: the number is even, A = {2, 4, 6}
Event B: the number is at least 4, B = {4, 5, 6}
Event C: the number is at least 5, C = {5, 6}
P (A) = 1/2, P (B) = 1/2, P (C) = 1/3
A ∩ B = {4, 6} and P (A ∩ B) = 1/3
A ∩ C = {6} and P (A ∩ C) = 1/6
B ∩ C = {5, 6} and P (B ∩ C) = 1/3
If P (E ∩ F ) = P (E)P (F ), then the events are independent. Otherwise they are dependent.
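A sketch that verifies these pairwise checks with exact fractions:

```python
from fractions import Fraction

def P(event):
    """Probability of an event for one fair die (equally likely outcomes)."""
    return Fraction(len(event), 6)

A = {2, 4, 6}   # even
B = {4, 5, 6}   # at least 4
C = {5, 6}      # at least 5

assert P(A & C) == P(A) * P(C)  # 1/6 = 1/2 * 1/3: A and C independent
assert P(A & B) != P(A) * P(B)  # 1/3 != 1/2 * 1/2: A and B dependent
assert P(B & C) != P(B) * P(C)  # 1/3 != 1/2 * 1/3: B and C dependent
print("checks pass")
```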

Conditional probabilities: example

Example 1: consider a group of 100 people, |Ω| = 100
H: the chosen person is happy, P (H) = 48/100 = 0.48
M: the chosen person is married, P (M ) = 70/100 = 0.7
P (H|M ) = 42/70 = 0.6
P (H|M c ) = 6/30 = 0.2

             Happy
             yes    no
Married yes   42    28    70
        no     6    24    30
              48    52   100

Example 2: consider a group of 100 people, |Ω| = 100
C: the chosen person likes cheese, P (C) = 80/100 = 0.8
D: the chosen person likes dogs, P (D) = 60/100 = 0.6
P (C|D) = 48/60 = 0.8
P (C|D) = P (C), so C and D are independent

            Dogs
            yes    no
Cheese yes   48    32    80
       no    12     8    20
             60    40   100
Independence of more than two events

Two events are independent if P (E ∩ F ) = P (E)P (F )
Three events are independent if
P (E ∩ F ∩ G) = P (E)P (F )P (G)
P (E ∩ F ) = P (E)P (F )
P (E ∩ G) = P (E)P (G)
P (F ∩ G) = P (F )P (G)
The events E1 , E2 , . . . , En are said to be independent if for every subset E1′ , E2′ , . . . , Er′ , r ≤ n, of these events

P (E1′ ∩ E2′ ∩ . . . ∩ Er′ ) = P (E1′ )P (E2′ ) . . . P (Er′ )

Conditional probability: example

A bin contains 5 defective (that immediately fail when put in


use), 10 partially defective (that fail after a couple of hours of
use), and 25 acceptable transistors. A transistor is chosen at
random from the bin and put into use. If it does not
immediately fail, what is the probability it is acceptable?
Solution: Since the transistor did not immediately fail, we know that it is not one of the 5 defectives, and so the desired probability is:
P {acceptable|not defective} = P {acceptable, not defective} / P {not defective}
= P {acceptable} / P {not defective} = (25/40) / (35/40) = 5/7
where the last equality follows since the transistor will be both
acceptable and not defective if it is acceptable.
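The same computation as a short sketch with exact fractions:

```python
from fractions import Fraction

# Bin: 5 defective, 10 partially defective, 25 acceptable (40 transistors)
p_acceptable = Fraction(25, 40)
p_not_defective = Fraction(35, 40)  # partially defective or acceptable

# Acceptable implies not defective, so the joint probability is p_acceptable
p = p_acceptable / p_not_defective
print(p)  # 5/7
```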

The law of total probability
Consider two events F and E
F = (F ∩ E) ∪ (F ∩ E c ), the union of two disjoint events
P (F ) = P (F ∩ E) + P (F ∩ E c ) = P (F |E)P (E) + P (F |E c )P (E c )
Consider events E1 , . . . , En s.t. Ei ∩ Ej = ∅ and E1 ∪ E2 ∪ . . . ∪ En = Ω
P (C) = Σᵢ P (Ei )P (C|Ei ), summing over i = 1, . . . , n
The Ei are hypotheses. Exactly one of them will happen
Now find P (Ei |C)
The law of total probability: example
Example: An insurance company believes that people can be
divided into two classes — those that are accident prone and
those that are not. Their statistics show that an
accident-prone person will have an accident at some time
within a fixed 1-year period with probability .4, whereas this
probability decreases to .2 for a non-accident-prone person. If
we assume that 30 percent of the population is accident prone, what is the probability that a new policy holder will have an accident within a year of purchasing a policy?
Solution: We obtain the desired probability by first conditioning on whether or not the policy holder is accident prone. Let A1 denote the event that the policy holder will have an accident within a year of purchase; and let A denote the event that the policy holder is accident prone. Hence, the desired probability P (A1 ) is given by
P (A1 ) = P (A1 |A)P (A) + P (A1 |Ac )P (Ac ) = (.4)(.3) + (.2)(.7) = .26
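The conditioning step can be sketched directly from the law of total probability:

```python
# Insurance example: condition on whether the policy holder is accident prone
p_prone = 0.3
p_acc_given_prone = 0.4
p_acc_given_not_prone = 0.2

p_accident = (p_acc_given_prone * p_prone
              + p_acc_given_not_prone * (1 - p_prone))
print(round(p_accident, 2))  # 0.26
```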
Bayes’ Theorem
Bayes’ Theorem

The essence of Bayes’ Theorem is updating probabilities when we receive further knowledge or information.
That’s why we are going to have prior and posterior probabilities here.
Consider event E; you may know P (E). This is your prior probability
Another event F occurs.
Now, you are interested to see how your prior probability is going to change.

P (E|F ) = P (F |E)P (E) / P (F ) = [P (F |E) / P (F )] P (E)

There is a factor which multiplies your prior P (E) and gives your posterior P (E|F )
Bayes’ Theorem

Prior probability P (E)
Posterior probability P (E|F )
Likelihood ratio P (F |E) / P (F ), the factor that updates the prior

Bayes’ Theorem

Let E and F be events. We can write F = (F ∩ E) ∪ (F ∩ E c )

P (F ) = P (F ∩ E) + P (F ∩ E c )
= P (F |E)P (E) + P (F |E c )P (E c )
= P (F |E)P (E) + P (F |E c )[1 − P (E)]

The probability of the event F is a weighted average of the conditional probability of F given that E has occurred and the conditional probability of F given that E has not occurred
Bayes’ Theorem

P (F ) = P (F |E)P (E) + P (F |E c )[1 − P (E)]

Bayes’ Theorem

P (E|F ) = P (E ∩ F ) / P (F )
= P (F |E)P (E) / P (F )
= P (F |E)P (E) / (P (F |E)P (E) + P (F |E c )[1 − P (E)])
Bayes’ Theorem: example

Question: A laboratory blood test is 99 percent effective in detecting a certain disease when it is, in fact, present. However, the test also yields a “false positive” result for 1 percent of the healthy persons tested. (That is, if a healthy person is tested, then, with probability .01, the test result will imply he or she has the disease.) If .5 percent of the population actually has the disease, what is the probability a person has the disease given that his test result is positive?
Solution: Let D be the event that the tested person has the disease and E the event that his test result is positive. The desired probability P (D|E) is obtained from P (E|D) = 0.99, P (D) = 0.005, P (E|Dc ) = 0.01, P (Dc ) = 0.995

P (D|E) = P (D ∩ E) / P (E) = P (E|D)P (D) / (P (E|D)P (D) + P (E|Dc )P (Dc )) = 0.3322
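A small helper (the function and parameter names are mine, not from the slide) that applies this formula:

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Bayes' theorem: P(D|E) from P(D), P(E|D) and P(E|D^c)."""
    evidence = (p_pos_given_disease * prior
                + p_pos_given_healthy * (1 - prior))
    return p_pos_given_disease * prior / evidence

p = posterior(prior=0.005, p_pos_given_disease=0.99, p_pos_given_healthy=0.01)
print(round(p, 4))  # 0.3322
```

Even with a 99 percent effective test, the low prevalence keeps the posterior at about one third.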
Bayes’ Theorem: extension

F1 , F2 , . . . , Fn are mutually exclusive events and

F1 ∪ F2 ∪ . . . ∪ Fn = Ω

In other words, exactly one of the events F1 , F2 , . . . , Fn must occur
E is another event, and can be written as E = (E ∩ F1 ) ∪ . . . ∪ (E ∩ Fn )
P (E) = Σᵢ P (E ∩ Fi ) = Σᵢ P (E|Fi )P (Fi ), summing over i = 1, . . . , n
Suppose now that E has occurred

P (Fj |E) = P (E ∩ Fj ) / P (E) = P (E|Fj )P (Fj ) / Σᵢ P (E|Fi )P (Fi )
Bayes’ Theorem: explanation

Bayes’ formula

P (Fj |E) = P (E ∩ Fj ) / P (E) = P (E|Fj )P (Fj ) / Σᵢ P (E|Fi )P (Fi )

If we think of the events Fj as being possible “hypotheses” about some subject matter,
P (Fj ) are our priors
then Bayes’ formula may be interpreted as showing us how opinions about these hypotheses held before the experiment [that is, the P (Fj )] should be modified by the evidence of the experiment.
P (Fj |E) are our posteriors
Bayes’ Theorem: simple version
P (A|B) = P (B|A)P (A) / (P (B|A)P (A) + P (B|Ac )P (Ac ))
The probabilities involved: P (A|B), P (B|A), P (A), P (Ac ), P (B|Ac )
We need to know four of these to find the fifth one.

Bayes’ theorem: example
Example: A test is 98% effective at detecting HIV. However, the test has a “false positive” rate of 1%. If 0.5% of a country’s population has HIV, what is the probability of having HIV when the test is positive?
Let E = someone’s test is positive for HIV with this test
Let F = that person actually has HIV
A test is 98% effective at detecting HIV
If someone has HIV, the success of the test in prediction is P (E|F ) = 0.98
However, the test has a “false positive” rate of 1%
P (E|F c ) = 0.01
0.5% of a country’s population has HIV
P (F ) = 0.005 and therefore P (F c ) = 1 − 0.005 = 0.995
What is P (F |E)?

P (F |E) = P (E|F )P (F ) / (P (E|F )P (F ) + P (E|F c )P (F c ))
= (0.98)(0.005) / ((0.98)(0.005) + (0.01)(1 − 0.005)) = 0.33
Bayes’ theorem: the intuition

Conditioning on a positive result changes the sample space
People who test positive: P (E|F )P (F ) + P (E|F c )P (F c )
People who test positive and have HIV: P (E|F )P (F )
Bayes’ theorem: the intuition
Say we have 1000 people
5 have HIV and test positive
985 do not have HIV and test negative
10 do not have HIV and test positive
So P (F |E) ≈ 5/(5 + 10) = 0.33
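The counting argument can be reproduced numerically (a sketch; the slide rounds the 4.9 expected true positives to 5):

```python
# Out of 1000 people, count who tests positive with and without HIV
n = 1000
have_hiv = n * 0.005                 # 5 people
true_pos = have_hiv * 0.98           # about 4.9, rounded to 5 on the slide
false_pos = (n - have_hiv) * 0.01    # about 10 false positives

p = true_pos / (true_pos + false_pos)
print(round(p, 2))  # 0.33
```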

Random Variables
Random variables

Tools to study the randomness in data


A random variable assigns numbers to the outcomes of a random experiment
types of random variables
X is the number of heads in 3 tosses of a coin; it is a discrete random variable
X is the reading of a scale in weighing people, it is a
continuous random variable

Example of a random variable

Experiment: Toss a coin n times. X is the number of heads
n = 1:
X = 1 if H
X = 0 if T
n = 2:
X = 2 if HH
X = 1 if HT or T H
X = 0 if T T
Example of a random variable

Experiment: Roll a die. X is the square of the number
X = 1 if the die shows 1
X = 4 if 2
X = 9 if 3
X = 16 if 4
X = 25 if 5
X = 36 if 6
The definition

Consider Ω to be a sample space of a random experiment.


A random variable is defined as

X:Ω→R

For all ω ∈ Ω, X(ω) is a number


All possible values that a random variable X can take form the support set of that random variable, denoted by SX
If we conduct this experiment several times, we will have a sample of this random variable:
x1 = X(ω1 ), x2 = X(ω2 ), . . . , xn = X(ωn ), or simply x1 , x2 , . . . , xn

The notion of probability distribution

To represent a random variable concisely.
n = 2: toss a fair coin twice

Outcome (ω)   X(ω)   probability of outcome, P (ω)
HH            2      1/4
HT            1      1/4
TH            1      1/4
TT            0      1/4

P (X = 0) = P ({ω ∈ Ω|X(ω) = 0}) = P ({T T }) = 1/4
P (X = 1) = P ({ω ∈ Ω|X(ω) = 1}) = P ({T H, HT }) = 2/4
P (X = 2) = P ({ω ∈ Ω|X(ω) = 2}) = P ({HH}) = 1/4

x             0     1     2
P (X = x)     1/4   2/4   1/4
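The whole construction, from sample space to PMF, can be sketched in a few lines:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = number of heads in two tosses of a fair coin
omega = [''.join(t) for t in product('HT', repeat=2)]
counts = Counter(w.count('H') for w in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in counts.items()}

print({x: str(p) for x, p in sorted(pmf.items())})  # {0: '1/4', 1: '1/2', 2: '1/4'}
assert sum(pmf.values()) == 1             # probabilities sum to one
assert pmf[1] + pmf[2] == Fraction(3, 4)  # P(X >= 1)
```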
Using the probability distribution

x             0     1     2
P (X = x)     1/4   2/4   1/4

P (X ≥ 1) = P (X = 1) + P (X = 2) = 1/2 + 1/4 = 3/4
Probability mass function (PMF): pX (x) = p(x) = P (X = x)
Let X be a random variable with SX = {x1 , x2 , . . . , xn } and probabilities {p1 , p2 , . . . , pn }, where P (X = xj ) = pj for all j.
pj ≥ 0
p1 + p2 + . . . + pn = 1

This file is meant for personal use by amangupta0141@gmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Asef Nazari Math AI - SIT787 Topic 3 52 / 128
Random variables

Discrete random variable: one whose set of possible values can be written either as a finite sequence x1, ..., xn, or as an infinite sequence x1, x2, ...
A random variable whose set of possible values is the set of nonnegative integers is a discrete random variable.
Continuous random variable: a random variable that takes on a continuum of possible values.
Example: the random variable denoting the lifetime of a car, taking values in an interval (t1, t2).
Probability mass function

For a discrete random variable X with values SX = {x1, ..., xn}, the probability mass function p is defined as

p(x) = P{X = x}

p(xi) > 0, i = 1, 2, ...
p(x) = 0 for other values of x
∑_{i=1}^{∞} p(xi) = 1
Expectation of a Random Variable
Expected value of a random variable

You know the concept of the average of x1, x2, ..., xn, which is

x̄ = (∑ xi) / n

For a random variable, the values have different probabilities and importance.
We want values with higher probability to contribute more to the average.
For a discrete random variable X with finite support set x1, x2, ..., xn and probabilities p1, p2, ..., pn:

µ = E[X] = ∑_{i=1}^{n} xi pi = ∑_i xi P{X = xi}

For a discrete random variable X with infinite support set x1, x2, ... and probabilities p1, p2, ...:

µ = E[X] = ∑_{i=1}^{∞} xi pi   (this sum may not exist)
Example of expectation of a random variable

For a random variable X:

x          x1   x2   ...   xn
P(X = x)   p1   p2   ...   pn

The expectation is

µ = E[X] = ∑_{i=1}^{n} xi pi

Lottery game: win $5 with probability 0.1, lose $1 with probability 0.9.
The related random variable:

x          5     -1
P(X = x)   0.1   0.9

µ = E[X] = 5(0.1) + (−1)(0.9) = −0.4 dollars per game
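The lottery computation is a one-line weighted sum; a minimal Python sketch (the `expectation` helper is my own name, not from the unit):

```python
# Expected value of a discrete random variable: E[X] = sum_i x_i * p_i.
def expectation(values, probs):
    assert abs(sum(probs) - 1.0) < 1e-12, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

# Lottery from the slide: win $5 w.p. 0.1, lose $1 w.p. 0.9.
mu = expectation([5, -1], [0.1, 0.9])
print(mu)  # -0.4: lose 40 cents per game on average
```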
Examples

Lottery game: win $100 with probability 0.5, lose $100 with probability 0.3, lose $50 with probability 0.2.
The related random variable:

x          100   -100   -50
P(X = x)   0.5   0.3    0.2

µ = E[X] = 100(0.5) + (−100)(0.3) + (−50)(0.2) = 10 dollars per game
Expectation example

Find E[X] where X is the outcome when we roll a fair die.
Since p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6, then
E[X] = (1)p(1) + (2)p(2) + (3)p(3) + (4)p(4) + (5)p(5) + (6)p(6) = 7/2
Note that, for this example, the expected value of X is not a value that X could possibly assume.
It is the average value of X in a large number of repetitions of the experiment.
If I is an indicator random variable for the event A, that is, if

I = 1 if A occurs, and I = 0 if A does not occur,

then E[I] = 1·P(A) + 0·P(A^c) = P(A).
The expectation of the indicator random variable for the event A is just the probability that A occurs.
Discrete Random Variables
Discrete random variables

With finite support set:

x          x1   x2   ...   xn
P(X = x)   p1   p2   ...   pn

pi ≥ 0 for all i
∑_{i=1}^{n} pi = 1
Expectation: µ = E[X] = ∑_{i=1}^{n} xi pi

With infinite support set:

x          x1   x2   ...   xn   ...
P(X = x)   p1   p2   ...   pn   ...

pi ≥ 0 for all i
∑_{i=1}^{∞} pi = 1
Expectation (may not exist): µ = E[X] = ∑_{i=1}^{∞} xi pi
Discrete random variable with infinite support set

Experiment: toss a fair coin until the first head.
Ω = {H, TH, TTH, TTTH, ...}
Define X as the number of tosses until the first head (including the last one).
P(X = 1) = P({H}) = 1/2
P(X = 2) = P({TH}) = 1/4

x          1     2      ...   k       ...
P(X = x)   1/2   1/2²   ...   1/2^k   ...

∑_{i=1}^{∞} 1/2^i = 1
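That the probabilities 1/2, 1/4, 1/8, ... sum to 1 can be checked exactly with rational arithmetic; a small sketch (illustrative only):

```python
from fractions import Fraction

# P(X = k) = (1/2)^k for the number of tosses until the first head
# with a fair coin; the probabilities over k = 1, 2, ... sum to 1.
def p(k):
    return Fraction(1, 2) ** k

partial = sum(p(k) for k in range(1, 51))

# Geometric partial sum: 1 - sum_{k=1}^{n} (1/2)^k = (1/2)^n exactly.
assert Fraction(1) - partial == Fraction(1, 2) ** 50
print(float(partial))  # approaches 1 as more terms are added
```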
Bernoulli trial

Independent repeated trials of an experiment with exactly two possible outcomes are called Bernoulli trials.
There is a coin with P(H) = p, P(T) = 1 − p, and we make a sequence of independent coin tosses.
n = 2; H1: first toss is heads, H2: second toss is heads. These two events are independent.
P(HH) = P(H1 ∩ H2) = p²
P(TT) = P(H1^c ∩ H2^c) = (1 − p)²
P(TH) = P(HT) = p(1 − p)
Arbitrary n: |Ω| = 2^n
P(HHTHT) = p³(1 − p)²
P(a specific sequence with m heads and n − m tails) = p^m (1 − p)^(n−m)
Binomial distribution

Random experiment: toss a coin with P(H) = p for n times.
X is the number of heads; SX = {0, 1, 2, ..., n}.
Consider n = 5:
P(X = 0) = P({TTTTT}) = (1 − p)^5
P(X = 1) = P({HTTTT, THTTT, TTHTT, TTTHT, TTTTH}) = 5p(1 − p)^4 = C(5,1) p(1 − p)^4
P(X = 2) = C(5,2) p²(1 − p)³

For B(n, p):  P(X = k) = C(n,k) p^k (1 − p)^(n−k)  for k ∈ SX
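The B(n, p) formula can be sketched directly with `math.comb`; for a fair coin and n = 5 it reproduces the hand counts above (illustrative code, not from the unit):

```python
from math import comb

# Binomial PMF: P(X = k) = C(n, k) p^k (1 - p)^(n - k) for X ~ B(n, p).
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fair coin, n = 5, matching the slide's hand counts.
n, p = 5, 0.5
assert binom_pmf(0, n, p) == (1 - p)**5          # P(X = 0) = (1-p)^5
assert binom_pmf(1, n, p) == 5 * p * (1 - p)**4  # 5 sequences with one head
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
```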
Geometric and Poisson distributions with infinite support set

Geometric distribution
Toss a coin with P(H) = p until the first head.
X is the number of tosses (including the last one).
P(X = k) = P({TT...TH}) = p(1 − p)^(k−1), k ≥ 1

Poisson distribution
X is the number of arrivals.
P(X = k) = (λ^k / k!) e^(−λ) for parameter λ > 0, k ≥ 0
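As a sanity check on the Poisson PMF, the masses over the infinite support sum to one; truncating the sum far into the tail gets within floating-point error (illustrative sketch):

```python
from math import exp, factorial

# Poisson PMF: P(X = k) = lambda^k e^(-lambda) / k!, for k = 0, 1, 2, ...
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0
# Terms beyond k = 99 are astronomically small for lambda = 3.
total = sum(poisson_pmf(k, lam) for k in range(100))
print(total)  # ~1.0: the PMF sums to one over the (infinite) support
```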
Variance of a Random Variable
Expectation is not enough

Given a random variable X along with its probability distribution function,
we want to summarize the essential properties of the mass function by certain suitably defined measures.
One such measure would be E[X], the expected value of X.
E[X] yields the weighted average of the possible values of X; it does not tell us anything about the variation, or spread, of these values.
Example: all of the following have the same expectation.

W = 0 with probability 1
Y = −1 with probability 1/2, and 1 with probability 1/2
Z = −100 with probability 1/2, and 100 with probability 1/2
Variance

A measure of how far a random variable deviates from its mean.
We want to measure X − E[X], but E[X − E[X]] = 0.
Let's consider E[(X − E[X])²], which is the variance.
Lottery game 1: µ = E[X] = 5(0.1) + (−1)(0.9) = −0.4 dollars per game

x              5       -1
P(X = x)       0.1     0.9
X − E[X]       5.4     −0.6
(X − E[X])²    29.16   0.36

E[(X − E[X])²] = (29.16)(0.1) + (0.36)(0.9) = 3.24

The definition:

Var(X) = E[(X − E[X])²] = E[(X − µ)²]
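The lottery variance can be computed both from the definition and from the shortcut Var(X) = E[X²] − µ² introduced on the next slide; a small sketch (helper names are mine):

```python
# Variance of a discrete random variable, computed two equivalent ways:
# Var(X) = E[(X - mu)^2] and Var(X) = E[X^2] - mu^2.
def expectation(values, probs):
    return sum(x * p for x, p in zip(values, probs))

values, probs = [5, -1], [0.1, 0.9]   # the lottery from the slide
mu = expectation(values, probs)

var_def = expectation([(x - mu) ** 2 for x in values], probs)
var_alt = expectation([x ** 2 for x in values], probs) - mu ** 2

print(var_def, var_alt)  # both 3.24, up to floating-point rounding
```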
Variance

If X is a random variable with mean µ, then the variance of X, denoted by Var(X), is defined by Var(X) = E[(X − µ)²].
Also, Var(X) = E[X²] − µ².
Example: compute Var(X) when X represents the outcome when we roll a fair die.
Since P{X = i} = 1/6, i = 1, 2, 3, 4, 5, 6, we obtain

E[X²] = ∑_{i=1}^{6} i² P{X = i} = 1²(1/6) + 2²(1/6) + 3²(1/6) + 4²(1/6) + 5²(1/6) + 6²(1/6) = 91/6

We computed the mean before, µ = E[X] = 7/2, so

Var(X) = E[X²] − µ² = 91/6 − (7/2)² = 35/12
Systems of random variables
Systems of random variables

A random experiment happened, and we have several random variables related to this experiment that we want to study.
We find Ω and compute P(ω) for every ω ∈ Ω.
We can define many real functions on Ω. Each is a random variable:
X : Ω → R and Y : Ω → R
Considering these together gives a system of random variables (X, Y).
We want to know whether we can say something about Y based on our knowledge about X, and vice versa.
We may consider many random variables together in a larger system (X1, ..., Xn).
Linear transformations

Let X be a random variable and c ∈ R: Y = X + c.
Example: Y = X + 2 and the PMFs

x          -1    3     4
P(X = x)   0.2   0.5   0.3

x          1     5     6
P(Y = x)   0.2   0.5   0.3

(PMF plots: the plot of Y is the plot of X shifted right by 2.)
Linear transformations

Let X be a random variable and c ∈ R: Y = cX.
Example: Y = 2X and the PMFs

x          -1    3     4
P(X = x)   0.2   0.5   0.3

x          -2    6     8
P(Y = x)   0.2   0.5   0.3

(PMF plots: the plot of Y is the plot of X with the x-axis scaled by 2.)
Properties of expected values

E[aX + b] = E[aX] + E[b] = aE[X] + b
E[X + Y] = E[X] + E[Y]
E[aX + bY] = aE[X] + bE[Y]
For n random variables X1, X2, ..., Xn:

E[∑_{i=1}^{n} ci Xi] = ∑_{i=1}^{n} ci E[Xi]
Transformation of a random variable

A function of a random variable X is a random variable: for g : R → R, Y = g(X).
Examples:
g(x) = ax + b, then Y = g(X) = aX + b
g(x) = x², then Y = X²
g(x) = (x − E[X])², then Y = (X − E[X])²
g(x) = sin(x), then Y = sin(X)
Y = g(X) where Y(ω) = g(X(ω)).

x          x1   x2   ...   xn
P(X = x)   p1   p2   ...   pn

x          g(x1)   g(x2)   ...   g(xn)
P(Y = x)   p1      p2      ...   pn

E[X] = ∑_{i=1}^{n} xi pi
E[Y] = E[g(X)] = ∑_{i=1}^{n} g(xi) pi
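The rule E[g(X)] = ∑ g(xi) pi means we never need the PMF of Y = g(X) itself; a sketch using the fair die (the `expect_g` name is mine):

```python
from fractions import Fraction

# E[g(X)] = sum_i g(x_i) p_i for a discrete X -- no need to first
# derive the PMF of Y = g(X).
def expect_g(g, values, probs):
    return sum(g(x) * p for x, p in zip(values, probs))

# Fair die; g(x) = x^2 gives E[X^2] = 91/6, matching the variance slide.
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

assert expect_g(lambda x: x, faces, probs) == Fraction(7, 2)
assert expect_g(lambda x: x * x, faces, probs) == Fraction(91, 6)
```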
Properties of variance

For a random variable X: Var(X) = E[(X − E[X])²].
From another perspective, with g(x) = (x − E[X])² and Y = g(X) = (X − E[X])²:

E[Y] = E[(X − E[X])²] = ∑_{i=1}^{n} g(xi) pi = ∑_{i=1}^{n} (xi − µ)² pi

Var(X) = E[Y]
Var(aX + b) = a²Var(X), a, b ∈ R
Adding a constant shifts the cloud of data to the left or right; it does not change the variability.
What about Var(X1 + X2)?
Properties of variance

Var(aX + b) = a²Var(X); if a = 0, Var(b) = 0.
The quantity √Var(X) is called the standard deviation of X: std(X) = √Var(X).

Var(X1 + X2) = E[((X1 + X2) − E[X1 + X2])²]
= E[((X1 − E[X1]) + (X2 − E[X2]))²]
= E[(X1 − E[X1])² + (X2 − E[X2])² + 2(X1 − E[X1])(X2 − E[X2])]
= E[(X1 − E[X1])²] + E[(X2 − E[X2])²] + 2E[(X1 − E[X1])(X2 − E[X2])]
= Var(X1) + Var(X2) + 2E[(X1 − E[X1])(X2 − E[X2])]
Covariance

The covariance of two random variables X and Y, written Cov(X, Y), is defined by

Cov(X, Y) = E[(X − µX)(Y − µY)]

Equivalently, Cov(X, Y) = E[XY] − E[X]E[Y].
Cov(X, Y) = Cov(Y, X)
Cov(aX, Y) = a Cov(X, Y)
Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y)
Cov(X + a, Y) = Cov(X, Y + b) = Cov(X, Y)
For Y = aX + b: Cov(X, Y) = Cov(X, aX + b) = a Cov(X, X) = a Var(X)
Covariance

Definition: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
Covariance measures association between X and Y.
It is strongly related to the dependence of X and Y.
If X and Y are independent then Cov(X, Y) = 0:

E[(X − E[X])(Y − E[Y])]
= E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
= E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
= E[XY] − E[X]E[Y] = 0,

since independence gives E[XY] = E[X]E[Y].
Back to variance

Var(X + X) = 4Var(X) ≠ Var(X) + Var(X)

Var(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} Var(Xi) + ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} Cov(Xi, Xj)

For n = 2: Var(X + Y) = Var(X) + Var(Y) + Cov(X, Y) + Cov(Y, X)
If X and Y are independent random variables, then Cov(X, Y) = 0 and

Var(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} Var(Xi)
Properties of covariance

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
If X and Y are independent, then Cov(X, Y) = 0.
If Cov(X, Y) = 0, we cannot say X and Y are independent. Counterexample:

Y\X      -1    0     1     margin
1        1/3   0     1/3   2/3
-2       0     1/3   0     1/3
margin   1/3   1/3   1/3

P(X = 0 ∩ Y = 1) = 0 ≠ P(X = 0)P(Y = 1), so they are not independent.
Yet E[X] = 0, E[Y] = 0, and E[XY] = 0, hence Cov(X, Y) = 0.
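The counterexample above can be verified mechanically from its joint PMF (an illustrative sketch using exact fractions):

```python
from fractions import Fraction

# Joint PMF from the slide: Cov(X, Y) = 0 yet X and Y are dependent.
F = Fraction
joint = {(-1, 1): F(1, 3), (0, -2): F(1, 3), (1, 1): F(1, 3)}

ex  = sum(x * p for (x, y), p in joint.items())
ey  = sum(y * p for (x, y), p in joint.items())
exy = sum(x * y * p for (x, y), p in joint.items())
cov = exy - ex * ey          # Cov(X, Y) = E[XY] - E[X]E[Y]

# Dependence: P(X = 0, Y = 1) = 0, but P(X = 0) P(Y = 1) = (1/3)(2/3).
px0 = sum(p for (x, y), p in joint.items() if x == 0)
py1 = sum(p for (x, y), p in joint.items() if y == 1)
```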
Correlation

It can be shown that a positive value of Cov(X, Y) is an indication that Y tends to increase as X does,
while a negative value indicates that Y tends to decrease as X increases.
The strength of the relationship between X and Y is indicated by the correlation between X and Y:

Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y))

This quantity always has a value between −1 and +1.
Correlation of two random variables

Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y))
If there is a perfect linear relationship between X and Y, Y = aX + b:

Corr(X, Y) = Corr(X, aX + b) = aVar(X) / √(Var(X) · a²Var(X)) = a/|a| = ±1

Corr(aX, Y) = Corr(X, Y) if a > 0
Corr(X, Y) ∈ [−1, 1]
If Corr(X, Y) = 0, the two random variables are uncorrelated.
Sum of random variables

X : Ω → R, Y : Ω → R, Z : Ω → R
Expectation: if Z = X + Y then E[Z] = E[X] + E[Y].
Variance: suppose Var(X) > 0.
If Y = −X then Var(Y) = Var(X).
For Z = X + Y: Var(Z) = Var(X − X) = Var(0) = 0, while Var(X) + Var(Y) = 2Var(X).
So, in general, Var(X + Y) ≠ Var(X) + Var(Y).
For the special case that X and Y are independent random variables, Var(X + Y) = Var(X) + Var(Y).
Joint Probability Distribution
Joint probability distribution

To express interactions between two random variables.
Experiment: tossing two fair coins, Ω = {HH, HT, TH, TT}.
Let's define three random variables over this sample space:

X = 1 if the 1st toss is H, 0 else:   P(X = 1) = 0.5, P(X = 0) = 0.5
Y = 1 − X:                            P(Y = 1) = 0.5, P(Y = 0) = 0.5
Z = 1 if the 2nd toss is H, 0 else:   P(Z = 1) = 0.5, P(Z = 0) = 0.5

P(X = 0 ∩ Y = 0) = P(X = 0 ∩ X = 1) = 0, impossible!
P(X = 0 ∩ Z = 0) = P({TT}) = 1/4
Although they look similar, they have different interactions. We use the joint distribution to express interactions between them.
Joint probability distribution

x          x1   x2   ...   xm
P(X = x)   p1   p2   ...   pm

y          y1   y2   ...   yn
P(Y = y)   q1   q2   ...   qn

P(X = xi ∩ Y = yj) = pij for i = 1, ..., m and j = 1, ..., n
pij ≥ 0 and ∑_{i=1}^{m} ∑_{j=1}^{n} pij = 1

X = 1 if the 1st toss is H, 0 else, and Y = 1 − X:

X\Y   0                          1
0     P(X = 0 ∩ Y = 0) = 0       P(X = 0 ∩ Y = 1) = 0.5
1     P(X = 1 ∩ Y = 0) = 0.5     P(X = 1 ∩ Y = 1) = 0

X = 1 if the 1st toss is H, and Z = 1 if the 2nd toss is H:

X\Z   0                          1
0     P(X = 0 ∩ Z = 0) = 0.25    P(X = 0 ∩ Z = 1) = 0.25
1     P(X = 1 ∩ Z = 0) = 0.25    P(X = 1 ∩ Z = 1) = 0.25
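Both joint tables can be built by enumerating the sample space, exactly as in the slide; an illustrative sketch (the `joint` helper is my own):

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses; X = indicator of heads on toss 1, Y = 1 - X,
# Z = indicator of heads on toss 2.  Build joint PMFs by enumeration.
outcomes = list(product('HT', repeat=2))
p = Fraction(1, 4)                     # every outcome is equally likely

def joint(f, g):
    table = {}
    for w in outcomes:
        key = (f(w), g(w))
        table[key] = table.get(key, Fraction(0)) + p
    return table

X = lambda w: 1 if w[0] == 'H' else 0
Y = lambda w: 1 - X(w)
Z = lambda w: 1 if w[1] == 'H' else 0

jxy = joint(X, Y)   # mass only on (0, 1) and (1, 0)
jxz = joint(X, Z)   # 1/4 in every cell: X and Z are independent
```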
Marginal probabilities

Consider random variable X with SX = {x1, ..., xm} and Y with SY = {y1, ..., yn}.
We know P(X = xi ∩ Y = yj) = pij.
P(X = xi) = P(X = xi ∩ Y = y1) + ... + P(X = xi ∩ Y = yn) = pi1 + ... + pin

X\Y   0     1
0     0     0.5
1     0.5   0

P(X = 0) = 0 + 0.5 = 0.5
P(X = 1) = 0.5 + 0 = 0.5
P(Y = 0) = 0 + 0.5 = 0.5
P(Y = 1) = 0.5 + 0 = 0.5

X\Z   0      1
0     0.25   0.25
1     0.25   0.25

P(X = 0) = 0.25 + 0.25 = 0.5
P(X = 1) = 0.25 + 0.25 = 0.5
P(Z = 0) = 0.25 + 0.25 = 0.5
P(Z = 1) = 0.25 + 0.25 = 0.5
Independent random variables

X\Y   0     1
0     0     0.5
1     0.5   0

X\Z   0      1
0     0.25   0.25
1     0.25   0.25

In this example, if we know X we can tell what is happening to Y. But knowledge about X does not say anything about Z.
Consider two random variables defined on the same sample space Ω: X with SX = {x1, ..., xm} and Y with SY = {y1, ..., yn}.
If P(Y = yj | X = xi) = P(Y = yj) for all i and j, then X and Y are independent.
Equivalently, P(X = xi ∩ Y = yj) = P(X = xi)P(Y = yj) for all i and j.
Example

Experiment: three tosses of a fair coin.
X is the number of heads in the 1st and 2nd tosses.
Y is the number of heads in the 2nd and 3rd tosses.
Are these random variables independent?
P(X = 0 ∩ Y = 0) = P({TTT}) = 1/8
P(X = 1 ∩ Y = 0) = P({HTT}) = 1/8
P(X = 2 ∩ Y = 0) = 0, impossible
P(X = 1 ∩ Y = 1) = P({HTH, THT}) = 1/4
P(X = 0 ∩ Y = 0) = 1/8 ≠ P(X = 0)P(Y = 0) = (1/4)(1/4), so X and Y are not independent.

X\Y        0     1     2     marginal
0          1/8   1/8   0     1/4
1          1/8   1/4   1/8   1/2
2          0     1/8   1/8   1/4
marginal   1/4   1/2   1/4
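The table above can be rebuilt by enumerating all eight outcomes, which also confirms the dependence (illustrative sketch):

```python
from fractions import Fraction
from itertools import product

# Three fair coin tosses; X = heads in tosses 1-2, Y = heads in tosses 2-3.
outcomes = list(product('HT', repeat=3))
p = Fraction(1, 8)

joint = {}
for w in outcomes:
    x = (w[0] == 'H') + (w[1] == 'H')
    y = (w[1] == 'H') + (w[2] == 'H')
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + p

assert joint[(0, 0)] == Fraction(1, 8)   # only TTT
assert joint[(1, 1)] == Fraction(1, 4)   # {HTH, THT}
assert (2, 0) not in joint               # impossible combination

# Not independent: P(X=0, Y=0) != P(X=0) P(Y=0) = (1/4)(1/4).
px0 = sum(v for (x, y), v in joint.items() if x == 0)
py0 = sum(v for (x, y), v in joint.items() if y == 0)
assert joint[(0, 0)] != px0 * py0
```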
Independent random variables

x          x1   x2   ...   xm
P(X = x)   p1   p2   ...   pm

y          y1   y2   ...   yn
P(Y = y)   q1   q2   ...   qn

If two random variables are independent, then E[XY] = E[X]E[Y]:

E[XY] = ∑_{i=1}^{m} ∑_{j=1}^{n} xi yj P(X = xi ∩ Y = yj)
= ∑_i ∑_j xi yj P(X = xi)P(Y = yj)
= ∑_i ∑_j xi yj pi qj = (∑_i xi pi)(∑_j yj qj) = E[X]E[Y]
Continuous random variables
Continuous random variables

Discrete random variables can take distinct values.
Continuous random variables can take all values in a continuum, like an interval.
Examples: temperature, height, weight, time.
The nature of this kind of variable is that it never takes exact values; even the best scales have limited accuracy.
Something weighs 10 grams ± 0.1 grams: the exact value is in [10 − 0.1, 10 + 0.1], or [a − ε, a + ε].
Therefore, for a continuous random variable X, P(X = a) = 0.
Probability density function

X is a continuous random variable.
fX(x) is its probability density function, given that
fX(x) ≥ 0
∫_{−∞}^{+∞} fX(x) dx = 1
P(X ∈ [a, b]) = P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
P(X = a) = P(X ∈ [a, a]) = ∫_a^a fX(x) dx = 0
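Both conditions can be checked numerically for a concrete density, here the exponential f(x) = e^(−x) used on the next slide; a sketch with a simple midpoint-rule integrator (my own helper, not from the unit):

```python
from math import exp

# Exponential PDF f(x) = e^(-x) for x >= 0: check numerically that it
# integrates to 1, and compute P(a <= X <= b) as an integral.
def f(x):
    return exp(-x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h  # midpoint rule

total = integrate(f, 0.0, 50.0)   # the tail beyond 50 is negligible
prob  = integrate(f, 1.0, 2.0)    # P(1 <= X <= 2) = e^-1 - e^-2
```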
Probability density function

The cumulative distribution:

F(a) = P{X ≤ a} = P{X ∈ (−∞, a]} = ∫_{−∞}^{a} f(x) dx

dF(a)/da = f(a)
P{a < X < b} = area of the region under f between a and b.
For example,

f(x) = e^(−x) for x ≥ 0, and f(x) = 0 for x < 0
Uniform continuous random variable

A continuous random variable X on an interval [a, b] with

fX(x) = 1/(b − a)

We can easily check:
fX(x) ≥ 0
∫_{−∞}^{+∞} fX(x) dx = 1
Cumulative distribution function

X is a continuous or discrete random variable.
The cumulative distribution function is defined as

FX(a) = P(X ≤ a)

For a continuous random variable: FX(a) = P(X ≤ a) = ∫_{−∞}^{a} fX(x) dx
For a discrete random variable: FX(a) = P(X ≤ a) = ∑_{x ≤ a} pX(x)
Properties of CDF

FX(x) = P(X ≤ x)
FX(x) is non-strictly increasing.
lim_{x→−∞} FX(x) = 0 and lim_{x→+∞} FX(x) = 1
FX(x) ∈ [0, 1]

Relationship between PDF and CDF: X is a random variable with CDF FX(x) and PDF fX(x).
P(X ∈ [a, b]) = P(X ≤ b) − P(X ≤ a) = FX(b) − FX(a)
P(X ∈ [a, b]) = ∫_a^b fX(x) dx
∫_a^b fX(x) dx = FX(b) − FX(a), which means FX(x) is an antiderivative of fX(x):
fX(x) = FX′(x)
Examples

Uniform distribution on [a, b], X ∼ U(a, b):

fX(x) = 1/(b − a) for x ∈ [a, b], and fX(x) = 0 otherwise
Cumulative distribution function: example

Question: suppose the random variable X has distribution function

F(x) = 0 for x ≤ 0, and F(x) = 1 − e^(−x²) for x > 0

What is the probability that X exceeds 1?
Solution: the desired probability is computed as follows:

P{X > 1} = 1 − P{X ≤ 1} = 1 − F(1) = e^(−1)
Normal (Gaussian) distribution

X is a normally distributed random variable, X ∼ N(µ, σ²).
The PDF with parameters (µ, σ²):

fX(x) = (1 / (σ√(2π))) e^(−(1/2)((x − µ)/σ)²)

Standard normal distribution when (µ = 0, σ² = 1), Z ∼ N(0, 1):

fZ(x) = (1 / √(2π)) e^(−x²/2)
Normal distribution and transformations

The standard normal distribution is obtained from a normal distribution using a linear transformation:
X ∼ N(µ, σ²)
Z = (X − µ)/σ, i.e. Z = (1/σ)X − µ/σ
Sometimes data do not have a normal distribution, and we apply a nonlinear transformation to make them look closer to normal.
For example, the log transformation Y = log(X): the new variable may have a normal distribution.
Other examples: Y = 1/X, Y = X², Y = √X.
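The standardisation Z = (X − µ)/σ can be seen empirically on simulated data: after the transformation the sample mean is near 0 and the sample standard deviation near 1. An illustrative sketch (the seed, sample size, and tolerances are my own choices):

```python
import random
import statistics

# Standardising samples: if X ~ N(mu, sigma^2), then Z = (X - mu)/sigma
# should have mean close to 0 and standard deviation close to 1.
random.seed(42)
mu, sigma = 10.0, 2.0
xs = [random.gauss(mu, sigma) for _ in range(100_000)]
zs = [(x - mu) / sigma for x in xs]

print(statistics.mean(zs), statistics.stdev(zs))  # close to 0 and 1
```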
Approximating PDFs using histograms

Consider X a random variable, of which we only have a sample.
You can think about a column in your data matrix:
x1, x2, ..., xn, an independent sample from X.
A histogram of the sample approximates the shape of the PDF of X.
Different shapes of distributions

Symmetric, skewed right, skewed left
Unimodal, bimodal, multimodal
Expected value of a continuous random variable

For a discrete random variable X:

x          x1   x2   ...   xn
P(X = x)   p1   p2   ...   pn

µ = E[X] = ∑_{i=1}^{n} xi pi

For a continuous random variable X with PDF fX(x):

µ = E[X] = ∫_{−∞}^{+∞} x fX(x) dx

Example: uniform distribution X ∼ U(0, 1) on [0, 1], fX(x) = 1:

E[X] = ∫_{−∞}^{+∞} x fX(x) dx = ∫_0^1 (x)(1) dx = 1/2
Properties of expected value

E[X + Y] = E[X] + E[Y]
E[aX + bY + c] = aE[X] + bE[Y] + c
E[g(X)] = ∫_{−∞}^{+∞} g(x) fX(x) dx

Variance of a continuous random variable:

Var(X) = E[(X − E[X])²] = ∫_{−∞}^{+∞} (x − µ)² fX(x) dx
Properties of expected value

Suppose we are given a random variable X and its probability distribution,
and we are interested in the expected value of some function of X, say g(X).
If X is a discrete random variable with probability mass function p(x), then for any real-valued function g,

E[g(X)] = ∑_x g(x) p(x)

If X is a continuous random variable with probability density function f(x), then for any real-valued function g,

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx
Properties of expected value

The expected value of a random variable X, E[X], is also referred to as the mean or the first moment of X.
The quantity E[X^n], n ≥ 1, is called the nth moment of X:

E[X^n] = ∑_x x^n p(x) if X is discrete
E[X^n] = ∫_{−∞}^{∞} x^n f(x) dx if X is continuous
Joint PDF and CDF

Let X and Y be two random variables defined on the same probability space.
Joint PMF for discrete random variables:

pX,Y(x, y) = P(X = x ∩ Y = y)

Joint PDF for continuous random variables: fX,Y(x, y)
Joint CDF:

FX,Y(x, y) = P(X ≤ x ∩ Y ≤ y)
Independent random variables
X and Y are independent if

F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x and y

equivalent to P(X ≤ x ∩ Y ≤ y) = P(X ≤ x) P(Y ≤ y)
For continuous variables, f_{X,Y}(x, y) = f_X(x) f_Y(y)
Covariance: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
Correlation: Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))
If X and Y are independent, then Cov(X, Y) = 0. The converse may not be true.
If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y)
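A quick Python illustration of why the converse fails (the distribution is chosen for the example): take X uniform on {−1, 0, 1} and Y = X². Y is a deterministic function of X, so they are clearly dependent, yet Cov(X, Y) = 0:

```python
# X uniform on {-1, 0, 1} and Y = X^2: dependent, but covariance is zero.
xs = [-1, 0, 1]
px = 1 / 3  # P(X = x) for each x in the support

ex = sum(x * px for x in xs)          # E[X] = 0 by symmetry
ey = sum(x**2 * px for x in xs)       # E[Y] = E[X^2] = 2/3
exy = sum(x * x**2 * px for x in xs)  # E[XY] = E[X^3] = 0 by symmetry
cov = exy - ex * ey                   # Cov(X, Y) = E[XY] - E[X]E[Y]
print(cov)  # 0.0, even though Y is a function of X
```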
Descriptive statistics
Descriptive statistics
Generally we have a sample from a population (modelled as a random variable).
For example, a column representing the age of all the cases in a data table is a sample from an underlying random variable that we don't know.
If the average age is 56.75, the variance is 90.2, and the standard deviation is 9.5, we can summarise the column as 56.75 ± 9.5.
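As a minimal sketch in Python (the ages below are made up), the standard library's statistics module computes these summaries directly:

```python
import statistics

# Hypothetical "age" column from a data table.
ages = [45, 52, 58, 61, 49, 67, 55, 63]

mean = statistics.mean(ages)     # sample mean
var = statistics.variance(ages)  # sample variance (n - 1 in the denominator)
std = statistics.stdev(ages)     # sample standard deviation, sqrt(variance)
print(f"{mean:.2f} ± {std:.2f}")  # summary in the "mean ± stdev" form
```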
Random variables as populations and samples
Consider a random variable X, and a sample from it: x₁, …, xₙ.
Using the properties of the sample, we can estimate the properties of the population (random variable).

Properties of X        Properties of the sample
µ = E[X]               x̄
σ² = Var(X)            s²
σ = std(X)             s
m (median)             x̃
Study two random variables
Consider two random variables X and Y.
σ_{X,Y} = Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
ρ_{X,Y} = Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))
Instead of the random variables themselves, we have samples of them, or we want to study two numerical columns in a data matrix: x₁, …, xₙ and y₁, …, yₙ.
Sample covariance:

s_{X,Y} = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Sample correlation:

r = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / ((n − 1) s_X s_Y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ₌₁ⁿ (xᵢ − x̄)² · Σᵢ₌₁ⁿ (yᵢ − ȳ)² )
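The sample correlation formula can be sketched directly in Python (the data points are made up for illustration):

```python
import math

def sample_corr(xs, ys):
    """r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2))."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xbar) ** 2 for x in xs)
                    * sum((y - ybar) ** 2 for y in ys))
    return num / den

# A perfectly linear relationship gives r = 1.
print(sample_corr([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```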
Correlation coefficient through scatter plots
−1 ≤ r ≤ 1
Approximately normal datasets
Consider X whose distribution (histogram) has a symmetric bell shape.
The empirical rule:
Approximately 68% of the observations lie within x̄ ± s
Approximately 95% of the observations lie within x̄ ± 2s
Approximately 99.7% of the observations lie within x̄ ± 3s
Using the standard normal distribution:
P(−1 ≤ Z ≤ +1) ≈ 0.68
P(−2 ≤ Z ≤ +2) ≈ 0.95
P(−3 ≤ Z ≤ +3) ≈ 0.997
If a value sits more than 3 standard deviations from the mean, it is called an outlier.
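These three probabilities come from the standard normal CDF; in Python they can be checked with the error function, since P(−k ≤ Z ≤ k) = erf(k/√2):

```python
import math

# For Z ~ N(0, 1), P(-k <= Z <= k) = erf(k / sqrt(2)).
for k in (1, 2, 3):
    p = math.erf(k / math.sqrt(2))
    print(f"P(-{k} <= Z <= {k}) = {p:.4f}")  # ≈ 0.6827, 0.9545, 0.9973
```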
Important Random Variables
Important discrete random variables
X is a Bernoulli random variable, X ∼ Br(p).
Suppose that an experiment, whose outcome can be classified either as a success or as a failure, is performed.
Let X = 1 when the outcome is a success and X = 0 when it is a failure:

P(X = 1) = p
P(X = 0) = 1 − p

where p, 0 ≤ p ≤ 1, is the probability that the trial is a success.
E[X] = p, Var(X) = p(1 − p)
Important discrete random variables
X is a Binomial random variable, X ∼ Binom(n, p):
the number of successes in n independent trials, when each trial is a success with probability p.
S_X = {0, 1, …, n}
Probability mass function, for k ∈ S_X:

P(X = k) = (n choose k) p^k (1 − p)^{n−k} = n! / (k! (n − k)!) · p^k (1 − p)^{n−k}

E[X] = np, Var(X) = np(1 − p)
X is a Poisson random variable with parameter λ > 0, X ∼ Pois(λ):
S_X = {0, 1, 2, …}
Probability mass function, for k ∈ S_X:

P(X = k) = e^{−λ} λ^k / k!

E[X] = Var(X) = λ
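A small Python check (n and p are arbitrary example values) that the binomial PMF sums to 1 and reproduces E[X] = np and Var(X) = np(1 − p):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var = sum(k**2 * binom_pmf(k, n, p) for k in range(n + 1)) - mean**2
print(total, mean, var)  # ≈ 1.0, 3.0 (= np), 2.1 (= np(1 - p))
```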
Important continuous random variables
X is a uniform random variable on the interval [a, b], X ∼ Unif(a, b).
S_X = R
Probability density function:

f(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise

X is a normal random variable, X ∼ N(µ, σ²).
S_X = R
Probability density function:

f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}

E[X] = µ, Var(X) = σ²
Important continuous random variables
X is an exponential random variable with parameter λ > 0, X ∼ Exp(λ).
S_X = R
Probability density function:

f(x) = λ e^{−λx} if x ≥ 0, and 0 if x < 0

E[X] = 1/λ, Var(X) = 1/λ²
It models the distribution of the amount of time until some specific event occurs:
the amount of time (starting from now) until an earthquake occurs,
or until a new war breaks out,
or until you receive a telephone call.
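A numerical sanity check in Python (λ = 2 is an arbitrary choice): a Riemann sum of x·f(x) over [0, 20] approximates E[X] = 1/λ, since the tail beyond 20 is negligible:

```python
import math

lam = 2.0  # rate parameter of Exp(lam)

def exp_pdf(x):
    """Density of Exp(lam): lam * e^(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Riemann-sum approximation of E[X] = integral of x * f(x) dx.
dx = 1e-4
mean = sum(i * dx * exp_pdf(i * dx) * dx for i in range(int(20 / dx)))
print(mean)  # ≈ 0.5 = 1/lam
```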
Parameter Estimation and
Sampling
Random variables as populations and samples
Consider a random variable X, and a sample from it: x₁, …, xₙ.
Using the properties of the sample, we can estimate the properties of the population (random variable).

Population parameters    Point estimates
µ = E[X]                 x̄ = (Σᵢ₌₁ⁿ xᵢ) / n
σ² = Var(X)              s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)
σ = std(X)               s = √(s²)
Statistical Inference
Sampling distribution
µ, σ², σ for a population
x̄, s², s coming from one sample
X̄, S², S when we have many samples
From a population with µ and σ²:
X̄ is the random variable of the sample means of the samples
S² is the random variable of the sample variances of the samples
Central limit theorem for the mean
Consider a population with mean µ and variance σ².
The random variable X̄ is the sample mean of randomly selected samples of size n.
The CLT says:
X̄ is approximately distributed as a normal distribution
E[X̄] = µ and std(X̄) = σ/√n
Conditions:
The original population is normal, OR
the original population is symmetric and n ≥ 10, OR
any population, with n ≥ 30
Cautions:
In practice we have only one sample
If you are interested in a better approximation of µ, use a larger sample size
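A simulation sketch in Python (the population, sample size, and number of replicates are arbitrary choices): draw many samples of size n from Uniform(0, 1), where µ = 0.5 and σ = √(1/12), and check that the sample means concentrate around µ with spread σ/√n:

```python
import math
import random
import statistics

random.seed(0)  # make the simulation reproducible

# Population: Uniform(0, 1), so mu = 0.5 and sigma = sqrt(1/12).
n, reps = 30, 2000
means = [statistics.mean(random.random() for _ in range(n)) for _ in range(reps)]

print(statistics.mean(means))   # ≈ 0.5, since E[Xbar] = mu
print(statistics.stdev(means))  # ≈ sqrt(1/12)/sqrt(30) ≈ 0.0527, since std(Xbar) = sigma/sqrt(n)
```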
Confidence Interval
Think about a population with parameters µ and σ².
You have a sample x₁, …, xₙ and you have computed the point estimate x̄.
You can make a confidence interval of the form x̄ ± z s/√n, where

x̄ = (Σᵢ₌₁ⁿ xᵢ) / n,  s = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) ),  and n is the sample size

z depends on the level of confidence:
z = 1.645 for a 90% CI for the mean
z = 1.96 for a 95% CI for the mean
z = 2.576 for a 99% CI for the mean
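A sketch of the computation in Python (the sample values and the 95% level are illustrative choices):

```python
import math
import statistics

def confidence_interval(sample, z=1.96):
    """xbar ± z * s / sqrt(n); z = 1.96 gives an approximate 95% CI for the mean."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # n - 1 in the denominator
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half

lo, hi = confidence_interval([52, 48, 55, 50, 47, 53, 49, 51])
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```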
Point estimate vs. confidence interval