Probability and Statistics
Probability & Machine Learning

What is Probability theory?
* It means you're not sure about something.
  Example: whether a person has cancer or not.
* Probability is usually expressed over a range of 0 to 1 (0: will not happen, 1: happens).
→ It enables machine learning to predict future events based on historical data.
Probability theory + Statistics → Machine learning (builds the base)

Terminologies & concepts:
* Probability is denoted by the symbol P.
* Events:
An event is an outcome, or a defined collection of outcomes, of an expt.
A subset of the sample space Ω is called an event.
* Degree of belief:
If a doctor analyzes a patient & says that the patient has a 40% chance of having flu:
1 → absolute certainty that the patient has flu.
0 → absolute certainty that the patient doesn't have flu.
[fig: Venn diagram of events A and B, showing A ∪ B]
P(not B) = 1 − P(B)
Some examples:
P(A or B) = P(A ∪ B)
For mutually exclusive events, P(A ∪ B) = P(A) + P(B), eg 0.2 + 0.4 = 0.6
Note: Review the concept of factorials & combinatorics.
The Law of Large Numbers:
Eg: Tossing a coin: P(getting a head) = 0.5 or 50%.
As we repeat the toss expt. more and more times, the observed share of heads gets closer to this probability.
Random variable:
→ The set of possible values from a random experiment.
Eg: Tossing a coin: X = {0, 1} are the possible values, where 0 stands for heads and 1 for tails.
Eg: Throwing a dice: X = {1, 2, 3, 4, 5, 6}
P(X = 1) = 1/6, P(X = 2) = 1/6, ...
P(X = 1) is the probability of the r.v. taking the value X = 1.
Random variable
→ Discrete random variable
→ Continuous random variable
* Discrete random variable:
It's a r.v. that can take finite or countably infinite values.

* Continuous random variable:
It takes values in a continuous range.
Eg: X = amount of rainfall on a given day, where X ∈ [0, ∞). Such a r.v. helps us in finding out probabilities such as P(X ≥ 2 cm).
Eg: X = height of students, where we can take out a probability such as P(X ≥ 180 cm), the probability that a student has a height greater than or equal to 180 cm.
Eg: X = time spent on a given day, where X ∈ [0, ∞), such as P(X ≥ 10 min) = 80%.
Probability mass function (PMF):
→ It gives the probabilities for the given discrete random variable.
→ It's the function that describes the probability distribution of a discrete r.v.: it provides the values and their probabilities.
$P_X(x) = P(X = x)$
* The probabilities sum up to 1: $\sum_x P_X(x) = 1$
* $P_X(x) \ge 0$: it is non-negative.
Eg: PMF of rolling a dice:
X = {1, 2, 3, 4, 5, 6}
$P_X(x) = 1/6$ for every x, and $\sum_{x=1}^{6} P_X(x) = 1$
[fig: Graph of PMF, six bars of height 1/6 at x = 1, ..., 6]
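The two PMF properties can be checked mechanically. A minimal sketch, assuming a fair die; the names `die_pmf` and `is_valid_pmf` are made up for this illustration and exact fractions are used to avoid rounding:

```python
from fractions import Fraction

# PMF of a fair six-sided die: every face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def is_valid_pmf(pmf):
    """Check the two PMF properties: non-negative values that sum to 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1


print(die_pmf[2])             # P(X = 2)
print(is_valid_pmf(die_pmf))  # both properties hold for the fair die
```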
Note: the PMF changes as well according to the example.

Probability density function (PDF):
→ It gives the probability for a continuous random variable.
→ It takes out the probability of values coming within a distinct range, as opposed to taking only one value.
Histogram:
Example: [fig: histogram of customer waiting times]
Density plots:
* It can be thought of as a plot of a smoothed histogram.
Note: We will take a detailed look at, and analysis of, histograms & density plots later.
$P(a \le X \le b) = \int_a^b p(x)\,dx$, the area under p(x) over [a, b].
Properties:
* $\int_{-\infty}^{\infty} p(x)\,dx = 1$
* Every $p(x) \ge 0$
Expected value:
→ Mean of a probability distribution. It represents the avg. value we expect before collecting any data.
→ Whereas "mean" is typically used when we calculate the mean of a sample, i.e. here we collect data.
Eg: Calculating the expected no. of goals a team will score.
k    : 0     1     2     3     4
p(k) : 0.18  0.34  0.35  0.11  0.02
$E[X] = \sum_k k\,p(k) = 0(0.18) + 1(0.34) + 2(0.35) + 3(0.11) + 4(0.02) = 1.45$
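The expected-goals sum can be reproduced in a few lines. A minimal sketch using the probabilities from the example; the names `goals_pmf` and `expected_value` are made up for this illustration:

```python
# PMF of the number of goals k, taken from the example in the notes.
goals_pmf = {0: 0.18, 1: 0.34, 2: 0.35, 3: 0.11, 4: 0.02}


def expected_value(pmf):
    """E[X] = sum over k of k * p(k)."""
    return sum(k * p for k, p in pmf.items())


print(round(expected_value(goals_pmf), 2))
```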
Mean:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
The Law of large nos. implies that as we increase the sample size n, the sample mean gets closer to the expected value.
Take a look at the practical example of this.
Example: X = [1, 2, 3, 4, 5, 6], an unbiased six-sided die.
$\mu = E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5$
So the sample mean → 3.5, the expected value.
Why? Because of the Law of large numbers: as you increase N, each possible value of X will occur with a proportion close to the equal prob. of 1/6, i.e. the avg. turns to the expected value.
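A quick simulation makes the Law of large numbers visible. This is only a sketch: the function name `average_of_rolls` and the fixed seed are assumptions made for reproducibility, not part of the notes.

```python
import random


def average_of_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls


# The average drifts toward the expected value 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, average_of_rolls(n))
```

With few rolls the average can sit far from 3.5; with many rolls it settles close to it.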
Measures of central tendency:
* A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
Mean, Median, Mode
Mean:
* Commonly / majorly used for continuous data.
* Sample mean = $\bar{x} = \frac{\sum x}{n}$
* Population mean = $\mu = \frac{\sum x}{N}$
* The mean tells us the typical value of the data.
* The mean isn't necessarily an actual value present in the data.
When does the mean not work?
→ If your data have outliers, then the mean can get influenced and may give a wrong interpretation of the data, so we have the median, which doesn't get influenced by outliers.
Median:
* It's less affected by outliers.
Eg: scores of [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]; the median is the middle (6th) score, 56.
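To see how an outlier pulls the mean but not the median, here is a small sketch with the scores above. The extra score of 1000 is a hypothetical outlier added only for this illustration.

```python
import statistics

scores = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]
with_outlier = scores + [1000]  # hypothetical outlier, not in the notes

# Without the outlier, mean and median are close to each other.
print(statistics.mean(scores), statistics.median(scores))

# With the outlier, the mean jumps while the median stays put.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```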
Mode:
* Most frequent score in our dataset.
Quantiles:
* Cut points dividing a probability distribution into areas of equal probability; the term can also refer to the groups created this way.

Quartiles:
* Quartiles cut the data / distribution into 4 equal parts.
Eg: 2, 4, 4, 5, 6, 7, 8
Q1 = 4 (lower quartile), Q2 = 5 (middle quartile, the median), Q3 = 7 (upper quartile)

Interquartile range:
* The interquartile range (IQR) is from Q1 to Q3.
Eg: 2, 4, 4, 5, 6, 7, 8
IQR = Q3 − Q1 = 7 − 4 = 3
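The quartiles in this example follow the "median of each half" convention. A minimal sketch of that convention, assuming the median itself is left out of both halves when the number of values is odd (other conventions exist and give slightly different answers); `quartiles` is a name made up for this illustration:

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd number of values the median is excluded
    from both the lower and the upper half.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return (statistics.median(lower),
            statistics.median(values),
            statistics.median(upper))


q1, q2, q3 = quartiles([2, 4, 4, 5, 6, 7, 8])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```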
Interpreting quartiles:
Eg: 59, 60, 65, 65, 68, 69, 70, 72, 75, 75, 76, 77, 81, 82, 84, 87, 90, 95, 98
* Median (Q2) = 75, the 10th of the 19 scores.
* Q1 is the central point b/w the smallest score and the median. Its position is (1/4) × 19 = 4.75 ≈ 5, so Q1 is the 5th score, 68.
* Q3 is the central point b/w the median and the largest score: Q3 = 84. If the median is included in the upper half, Q3 = (82 + 84)/2 = 83.
→ The 1st quartile Q1 represents the 25th percentile of the score set: Q1 tells that 25% of the scores are less than 68 & 75% are greater.
→ Q3 is the 75th percentile: it reveals that 25% of the scores are greater than 84 & 75% are less than 84.
→ IQR tells how far apart the 1st & 3rd quartiles are, indicating how spread out the middle 50% of our dataset is.
Eg: 2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9
Q1 = 3.5, Q2 = 6, Q3 = 8
Note: What if in data we ...
Percentiles:
* A percentile is a score below which a certain % of scores fall.
If you know that your score is in the 90th percentile, that means you scored better than 90% of the people who took the test.
Say for eg: the 70th percentile on the 2013 GRE was 156. If you scored 156 on the exam, your score was better than 70% of test takers.
→ The 25th percentile is called the first quartile.
Deciles:
Score    : 15  22  24  27  32  36  40  41  50  90
Position :  1   2   3   4   5   6   7   8   9  10
→ 1st decile = 15, 6th decile = 36
* 10% of students scored at or below 15.
* 60% of students scored at or below 36.
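The decile statements can be checked by counting. A minimal sketch, assuming "at or below" is the intended reading; `share_at_or_below` is a helper name made up for this illustration:

```python
scores = [15, 22, 24, 27, 32, 36, 40, 41, 50, 90]


def share_at_or_below(data, value):
    """Fraction of observations that are less than or equal to value."""
    return sum(1 for x in data if x <= value) / len(data)


print(share_at_or_below(scores, 15))  # share for the 1st decile
print(share_at_or_below(scores, 36))  # share for the 6th decile
```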
Variance:
* Variance is a measure of how data points differ from the mean, i.e. how far a set of data are spread out from their avg. value.
* Measure of spread of data.
Formula: Variance = $\sigma^2$
If we talk about a probability distribution, this is the variance of a r.v.
Eg: two r.v.s X and Y:
$P_X(x) = 0.5$ for $x = -100$, $0.5$ for $x = 100$, and 0 otherwise
$P_Y(y) = 1$ for $y = 0$, and 0 otherwise
$E[X] = \sum_{\text{all } x} x\,p(x) = (-100)(0.5) + (100)(0.5) = 0$
$E[Y] = \sum_{\text{all } y} y\,p(y) = (0)(1) = 0$
↳ Both have mean = 0.
Variance of X:
$\mathrm{Var}(X) = E[(X - \mu_X)^2] = \sum_{\text{all } x} (x - \mu_X)^2\,p(x)$
Why are we squaring?
$E[X - E[X]] = E[X] - \mu_X = \mu_X - \mu_X = 0$
If we don't square the difference b/w X & its mean, the result is 0.
$\mathrm{Var}(X) = E[(X - \mu_X)^2] = \sum_{i=1}^{n} (x_i - \mu_X)^2\,P_X(x_i)$
Variance of X & Y:
$\mathrm{Var}(X) = (-100 - 0)^2(0.5) + (100 - 0)^2(0.5) = 10{,}000$
$\mathrm{Var}(Y) = (0 - 0)^2(1) = 0$
So, $\sigma_X = \sqrt{10{,}000} = 100$ and $\sigma_Y = 0$.
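The same variance sums, written as a sketch in code. The helper names `pmf_mean` and `pmf_variance` are made up for this illustration; the two PMFs are the ones from the example.

```python
import math


def pmf_mean(pmf):
    """E[X] = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())


def pmf_variance(pmf):
    """Var(X) = sum of (x - mu)^2 * p(x)."""
    mu = pmf_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


pmf_x = {-100: 0.5, 100: 0.5}
pmf_y = {0: 1.0}

# Same mean, very different spread.
print(pmf_mean(pmf_x), pmf_variance(pmf_x), math.sqrt(pmf_variance(pmf_x)))
print(pmf_mean(pmf_y), pmf_variance(pmf_y), math.sqrt(pmf_variance(pmf_y)))
```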
$E[(X - \mu)^2] = E[X^2] - (E[X])^2$ (How to prove it?)
Hint: $\mathrm{Var}(X) = E[(X - \mu_X)^2]$ ↳ expand it using the $(a - b)^2$ identity:
$E[X^2 - 2\mu_X X + \mu_X^2] = E[X^2] - 2\mu_X E[X] + \mu_X^2 = E[X^2] - \mu_X^2$
∴ $\mathrm{Var}(X) = E[X^2] - (E[X])^2$
$E[X^2] = \sum_{\text{all } x} x^2\,P_X(x)$
Eg: for a die, X = {1, 2, 3, 4, 5, 6} and $P_X(x) = 1/6$:
$E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 7/2$
$E[X^2] = 1(1/6) + 4(1/6) + 9(1/6) + 16(1/6) + 25(1/6) + 36(1/6) = 91/6$
∴ $\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 91/6 - (7/2)^2 = 91/6 - 49/4 \approx 2.92$
$\sigma = \sqrt{2.92} \approx 1.71$
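The shortcut formula can be compared against the definition for the die. A minimal sketch using exact fractions; the variable names are made up for this illustration:

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

e_x = sum(x * p for x, p in die.items())        # E[X]
e_x2 = sum(x ** 2 * p for x, p in die.items())  # E[X^2]

var_shortcut = e_x2 - e_x ** 2                  # E[X^2] - (E[X])^2
var_definition = sum((x - e_x) ** 2 * p for x, p in die.items())

std = float(var_shortcut) ** 0.5

print(e_x, e_x2, var_shortcut, round(std, 2))
```

Both routes give the same variance, which is the point of the identity.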
Covariance:
* It measures the directional relationship b/w two variables: +ve cov. when they move together, −ve cov. when they move in opposite directions.
A finance example:
Say you're an investor whose portfolio tracks the performance of the S&P 500, and you want to add the stock of ABC Corp.
→ You want to assess the directional relationship b/w the stock of ABC Corp. & the S&P 500.
Data:
Year   S&P 500 (X)   ABC Corp. (Y)
2013   1692          68
2014   1978          102
2015   1884          110
2016   2151          112
2017   2519          154
* If both the stocks tend to increase together, we say it has positive covariance.
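Covariance formula:
$\mathrm{Cov}(X, Y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n}$
Sample covariance:
$\mathrm{Cov}(X, Y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
where:
$x_i$ → the values of the X var., $y_i$ → the values of the Y var.
$\bar{x}$, $\bar{y}$ → the means of X and Y, n → the no. of data points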
With $\bar{x} = 2044.80$ and $\bar{y} = 109.20$:
x_i − x̄                     y_i − ȳ                   Multiply
1692 − 2044.80 = −352.80    68 − 109.20 = −41.20      14,535.36
1978 − 2044.80 = −66.80     102 − 109.20 = −7.20      480.96
1884 − 2044.80 = −160.80    110 − 109.20 = 0.80       −128.64
2151 − 2044.80 = 106.20     112 − 109.20 = 2.80       297.36
2519 − 2044.80 = 474.20     154 − 109.20 = 44.80      21,244.16
Sum = 36,429.20
Cov(S&P 500, ABC Corp.) = 36,429.20 / (5 − 1) = 9,107.30
↑ positive covariance
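The table above can be recomputed directly from the data. A minimal sketch of the sample covariance (dividing by n − 1); `sample_covariance` is a name made up for this illustration:

```python
sp500 = [1692, 1978, 1884, 2151, 2519]
abc_corp = [68, 102, 110, 112, 154]


def sample_covariance(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y), divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


cov = sample_covariance(sp500, abc_corp)
print(round(cov, 2))  # positive, so the two series tend to move together
```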
Note: we will take a look at the covariance matrix later.
Correlation:
* A correlation coeff. of −1 describes a perfect negative or inverse correlation, in which as one variable rises the other decreases.
* If the correlation coeff. is 0, then there is no linear relationship.
→ The most common is the Pearson coefficient, or Pearson's r, which measures the strength and direction of the linear relationship b/w two variables.
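Joint probability:
Eg: drawing a card that is both a 6 and red from a deck of 52:
$P(6 \cap \text{red}) = 2/52 = 1/26$
$P(6) \times P(\text{red}) = 4/52 \times 26/52 = 1/26$
"∩" is referred to as intersection, which means both events happen at the same time.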
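The card example can be verified by enumerating a standard 52-card deck. A minimal sketch; the rank and suit labels are just illustrative strings:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}

deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Count the cards that satisfy each event, then divide by the deck size.
p_six = Fraction(sum(1 for r, s in deck if r == "6"), len(deck))
p_red = Fraction(sum(1 for r, s in deck if s in red_suits), len(deck))
p_six_and_red = Fraction(
    sum(1 for r, s in deck if r == "6" and s in red_suits), len(deck)
)

print(p_six_and_red, p_six * p_red)
```

The product matches the joint probability here because rank and colour are independent in a full deck; that equality does not hold for dependent events.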
Marginal probability:
→ It's the probability of a single event occurring, independent of other events.
Sum rule:
$P(X = x) = \sum_{y} P(X = x, Y = y)$
Example: Calculate the marginal probability of pet preference among men and women (22 people in total; the totals are in the margins of the table).
→ People who prefer cats = 7/22 = 0.32
→ People who prefer fish = 7/22 = 0.32
→ People who prefer dogs = 8/22 = 0.36
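The sum rule can be sketched with a small table of counts. The split between men and women below is hypothetical (the notes only keep the column totals 7, 7 and 8 out of 22); it is chosen so that the margins match those totals.

```python
from fractions import Fraction

# Hypothetical joint counts: only the column totals come from the notes.
counts = {
    ("men", "cats"): 2, ("men", "fish"): 4, ("men", "dogs"): 6,
    ("women", "cats"): 5, ("women", "fish"): 3, ("women", "dogs"): 2,
}
total = sum(counts.values())


def marginal_pet(pet):
    """P(pet) = sum over gender of P(gender, pet): the sum rule."""
    return sum(Fraction(c, total) for (_, p), c in counts.items() if p == pet)


for pet in ("cats", "fish", "dogs"):
    print(pet, marginal_pet(pet), round(float(marginal_pet(pet)), 2))
```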
Conditional probability:
* The likelihood of an event occurring based on the occurrence of a previous event.
P(A | B) ↳ probability of A given B
Conditional probability formula:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, with $P(B) \ne 0$
Eg: we say that 23% of the days are rainy.
• When A & B are disjoint: A ∩ B = ∅, so
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\emptyset)}{P(B)} = 0$
since A & B are disjoint, they can't occur at the same time.
• When B is a subset of A: then whenever B happens, A also happens.
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$
• When A is a subset of B, then
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)}$
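The three special cases can be tried out on a fair die, with events written as sets of faces. A minimal sketch; the specific events are illustrative choices, and `prob` and `conditional` are names made up here:

```python
from fractions import Fraction

omega = set(range(1, 7))  # sample space of one fair die roll


def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))


def conditional(a, b):
    """P(A | B) = P(A and B) / P(B); requires P(B) != 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be non-zero")
    return prob(a & b) / prob(b)


disjoint = conditional({1, 2}, {5, 6})        # A and B disjoint
b_in_a = conditional({2, 4, 6}, {2, 4})       # B is a subset of A
a_in_b = conditional({2}, {2, 4, 6})          # A is a subset of B

print(disjoint, b_in_a, a_in_b)
```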
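Continuous uniform distribution:
$f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and 0 otherwise
* The density is a rectangle of width b − a and height 1/(b − a), so Area = 1.
* mean = median = (a + b)/2
* $\sigma = \frac{b - a}{\sqrt{12}}$
↳ The distribution is a rectangle, so you don't need to integrate: probabilities are just areas of rectangles.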
Say X is a r.v. that has a uniform distribution with
$f(x) = \frac{1}{250 - 200} = \frac{1}{50}$ for $200 \le x \le 250$, and 0 otherwise
What is P(X > 230)?
It is the area under f(x) to the right of 230, up to 250; this is simply a rectangle.
P(X > 230) = (250 − 230) × 1/50 = 20/50 = 0.4
Which value a has area 0.2 to the left?
(a − 200) × 1/50 = 0.2, so a = 210
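Both uniform calculations are rectangle areas, which a short sketch can reproduce. The function names `uniform_cdf` and `uniform_quantile` are made up for this illustration:

```python
def uniform_cdf(x, a, b):
    """P(X <= x) for X uniform on [a, b]: the rectangle area left of x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def uniform_quantile(q, a, b):
    """Value with area q to its left, for 0 <= q <= 1."""
    return a + q * (b - a)


p_right_of_230 = 1 - uniform_cdf(230, 200, 250)
value_with_20_percent_left = uniform_quantile(0.2, 200, 250)

print(round(p_right_of_230, 2), value_with_20_percent_left)
```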
"
Normal distribution:
* It is symmetric about the mean, showing that data near the mean are more frequent to occur than data far from the mean.
→ There are many cases in which data tends to be around a central value with no bias left or right. Many such examples follow the N.D.
* 50% of the values are < the mean and 50% of the values are > the mean.
* Around 68% of the values are within 1 S.D. from the mean.
* Around 99.7% of the values are within 3 S.D.
Formula:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
* You only need μ & σ: $X \sim N(\mu, \sigma^2)$
* Standard normal: N(μ = 0, σ = 1)
↳ The normal distribution has mean μ and standard deviation σ.
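The 68% and 99.7% figures can be recovered numerically. A minimal sketch using the standard library's error function, relying on the identity P(|Z| ≤ k) = erf(k/√2) for a standard normal Z; the function names are made up for this illustration:

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


def prob_within(k):
    """P(|X - mu| <= k * sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2))


print(round(prob_within(1), 4))  # share within 1 S.D.
print(round(prob_within(3), 4))  # share within 3 S.D.
```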
Z-score = $\frac{x - \mu}{\sigma}$
Standardizing with Z-scores:
        SAT    ACT
Mean    1500   21
SD      300    5
$Z_{Ann} = \frac{1800 - 1500}{300} = 1$ (Ann: SAT score 1800)
$Z_{Tom} = \frac{24 - 21}{5} = 0.6$ (Tom: ACT score 24)
* The Z-score of an observation is the no. of standard deviations it falls above or below the mean. We compute the Z-score for an observation x that follows a distribution with mean μ and standard deviation σ using:
$Z = \frac{x - \mu}{\sigma}$
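The two standardizations, as a short sketch. The function name `z_score` is made up for this illustration; the numbers are the SAT and ACT figures from the table.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_ann = z_score(1800, mean=1500, sd=300)  # SAT
z_tom = z_score(24, mean=21, sd=5)        # ACT

# Ann is further above her test's mean than Tom is above his.
print(z_ann, z_tom)
```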
https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_OpenIntro_Statistics_(Diez_et_al)./03%3A_Distributions_of_Random_Variables/3.01%3A_Normal_Distribution
Geometric distribution:
* How long should we expect to flip a coin until it turns up heads?
* How many times should we expect to roll a die until we get a 1?
If the probability of a success in one trial is p and the probability of a failure is 1 − p, then the probability of finding the first success in the k-th trial is
$(1 - p)^{k-1}\,p$
Eg: Suppose you're playing a game of darts. The probability of success is 0.4. What is the probability you will hit the bullseye on the 3rd try?
p = 0.4
$P(X = n) = (1 - p)^{n-1}\,p$; $P(X = 3) = (1 - 0.4)^2 (0.4) = 0.144$
$\mu = \frac{1}{p}$; $\sigma^2 = \frac{1 - p}{p^2}$; $\sigma = \sqrt{\frac{1 - p}{p^2}}$
Here, μ & the expected value are the same. On avg. it takes 1/p trials to get a success under the geometric distribution.
→ If P(success) is high, then you don't have to wait long for a success.
Eg: P(success) = 0.8; 1/0.8 = 1.25 trials on avg.
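The darts calculation and the 1/p rule can be sketched as follows; the function names are made up for this illustration:

```python
def geometric_pmf(k, p):
    """P(first success on trial k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p


def geometric_mean(p):
    """Expected number of trials until the first success: 1 / p."""
    return 1 / p


print(round(geometric_pmf(3, 0.4), 3))  # bullseye on the 3rd try
print(geometric_mean(0.8))              # avg. trials when p = 0.8
```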
Bernoulli distribution:
→ This distribution is quite simple; left for students to explore.
Binomial distribution:
Hypothesis test:
* Say you run an expt. and find that a certain drug is effective at treating headaches.
What is a hypothesis?
* It's an educated guess about something: a guess based on knowledge and experience.
Eg:
→ Drug A works better than Drug B.
→ A way of teaching you think might be better.
→ It can be really anything at all, as long as you can put it to the test.
"If I ... (do this) ... then ... (this will happen)"
Eg:
• If I (give patients counseling in addition to medication) then (their overall depression scale will decrease).
Null hypothesis (H0):
The null hypothesis is always the accepted fact. Simple examples of null hypotheses that are generally accepted as being true are:
• Taking Vioxx can increase your risk of heart problems.
Alternate hypothesis (Ha):
* The alternate hypothesis is also called the research hypothesis. It involves the claim to be tested.
* Hypothesis testing is a way for you to test the results of a survey or expt. to see if you have meaningful results.