ML 3 & 4 Notes
Distance: d = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2 + ... up to n terms)
At a time, we can only find the distance between 2 points. If there
are 3 points, then we do all 3 distances separately.
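The pairwise idea above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance; the helper name `euclidean_distance` and the three points are made up for the example.

```python
import math
from itertools import combinations

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Three made-up points: a distance is defined per pair, so 3 points give 3 distances.
points = {"A": (0, 0, 0), "B": (3, 4, 0), "C": (3, 4, 12)}
pair_distances = {
    (m, n): euclidean_distance(points[m], points[n])
    for m, n in combinations(points, 2)
}
for pair, dist in pair_distances.items():
    print(pair, dist)
```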
* KNN (K Nearest Neighbour)
Also known as a lazy algorithm.
(We are never going to use this.)
KNN doesn't do anything during training:
it doesn't do anything with the training
data during the training stage/phase.
Supervised Machine Learning Algorithm that assigns a class label
(for classification) or predicts a value (for regression) for a new
data point by considering the K nearest data points from the training
dataset, based on a chosen distance metric. The predicted output is
determined by a majority vote (for classification) or an average
(for regression) of the K nearest neighbours' values.
In KNN you will decide the value of k.
It usually is an odd number, preferably 5 or 7.
- After finding the 5 or 7 nearest neighbours,
get the mean of all their values in regression.
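The description above can be sketched from scratch. This is only an illustration under stated assumptions, not how any particular library implements KNN: the helper `knn_predict`, the Euclidean metric, and the tiny dataset are all made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5, task="classification"):
    """Predict for `query` from its k nearest training points (illustrative helper)."""
    # Nothing is fitted in advance: every distance is computed at prediction time.
    by_distance = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    nearest_values = [value for _, value in by_distance[:k]]
    if task == "classification":
        # Majority vote among the k nearest labels.
        return Counter(nearest_values).most_common(1)[0][0]
    # Regression: mean of the k nearest target values.
    return sum(nearest_values) / len(nearest_values)

# Made-up 2-D training points: four near (1, 1) and four near (9, 9).
train_X = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
class_labels = ["neg"] * 4 + ["pos"] * 4
target_values = [10, 12, 11, 13, 50, 52, 51, 53]

predicted_class = knn_predict(train_X, class_labels, query=(2, 3), k=5)
predicted_value = knn_predict(train_X, target_values, query=(2, 3), k=3, task="regression")
print(predicted_class, predicted_value)  # neg 12.0
```

An odd k avoids tied votes when there are two classes, which is one reason values like 5 or 7 are common choices.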
- X.shape
- X.head()
- # Splitting the data.
- from sklearn.model_selection import train_test_split
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)
- X_train.shape
- from sklearn.linear_model import LogisticRegression
- logi = LogisticRegression()
- logi.fit(X_train, y_train)
- y_pred = logi.predict(X_test)
- y_test
- y_pred
- from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
- confusion_matrix(y_test, y_pred)
- accuracy_score(y_test, y_pred)
- classification_report(y_test, y_pred)
  # If we use the above code, it won't be visible properly, hence we print it.
- print(classification_report(y_test, y_pred))
Note: we wrote stratify=y to proportionally
distribute the classes of y between the training and testing data.
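To see what a stratified split means, here is a hand-rolled sketch. It is not scikit-learn's implementation; the helper `stratified_split` and the 20/80 label list are assumptions made for the illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=123):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Made-up labels: 20 positives and 80 negatives.
y = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(y)
train_counts = Counter(y[i] for i in train_idx)
test_counts = Counter(y[i] for i in test_idx)
print("train:", dict(train_counts), "test:", dict(test_counts))
```

Both parts keep the original 20:80 ratio, which is the effect stratify=y is meant to give.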
* Accuracy Metrics of Classification:
(100% asked in interviews)
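- Eg: Covid patients (y): 100 in total, 50 positive (P) and 50 negative (N).
  P: 30 predicted correctly, 20 predicted wrongly.
  N: 50 predicted correctly.
Here, the accuracy is 80%: (30 + 50) / (30 + 20 + 50) = 80 / 100 = 80%.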
Eg: Let's take covid patients data: 100 in total, P = 50 and N = 50.
  P: 20 are +ve and correctly predicted as +ve.
     30 are +ve but wrongly predicted as -ve.
  N: 50 are -ve and correctly predicted as -ve.
     0 are -ve but wrongly predicted as +ve.
Now we put this in the matrix.
Here, the main focus is the +ve class
(as used in medical conditions, for example).
Confusion Matrix
              Predicted P                Predicted N
Actual P      True Positive (TP) = 20    False Negative (FN) = 30
Actual N      False Positive (FP) = 0    True Negative (TN) = 50
Accuracy = (TP + TN) / (TP + TN + FP + FN)
         = (20 + 50) / (20 + 50 + 0 + 30)
         = 70 / 100 = 0.7 = 70%
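The same counts can be recovered from label lists. The sketch below rebuilds the example (TP = 20, FN = 30, FP = 0, TN = 50) as made-up `y_true` / `y_pred` lists; `confusion_counts` is an illustrative helper, not a library function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for a binary problem (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fn, fp, tn

# 50 actual positives (20 caught, 30 missed) followed by 50 actual negatives (all correct).
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 20 + [0] * 30 + [0] * 50

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, fn, fp, tn, accuracy)  # 20 30 0 50 0.7
```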
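Another Eg: 200 in total, 50 +ve and 150 -ve.
  +: 10 predicted correctly, 40 predicted wrongly.
  -: 140 predicted correctly, 10 predicted wrongly.
              Predicted P   Predicted N
Actual P      TP = 10       FN = 40
Actual N      FP = 10       TN = 140
Here, Accuracy = (TP + TN) / (TP + TN + FP + FN)
               = (10 + 140) / (10 + 140 + 10 + 40)
               = 150 / 200 = 0.75 = 75%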
But this accuracy is misleading!
- We were finding covid +ve. Thus the main focus
should be the +ve class. So, in the above example,
actual covid +ve were 50 and we correctly predicted
just 10. That's only 20% accuracy on the +ve class;
80% of it is wrong.
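A quick way to see why accuracy alone can mislead on this kind of data: the sketch below uses a hypothetical "model" that never predicts positive, on the same 50 / 150 split. The lists are made up for the illustration.

```python
# 200 patients as in the example: 50 actual positives, 150 actual negatives.
y_true = [1] * 50 + [0] * 150
# Hypothetical "model" that always answers negative.
y_pred = [0] * 200

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
positive_class_hit_rate = true_positives / sum(t == 1 for t in y_true)

print(accuracy, positive_class_hit_rate)  # 0.75 0.0
```

It reaches the same 75% accuracy while finding none of the covid +ve patients.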
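              Predicted P   Predicted N
Actual P      TP = 10       FN = 40
Actual N      FP = 10       TN = 140
(i) Recall (also called Sensitivity): TP / (TP + FN) = 10 / 50 = 0.2
(ii) Precision: TP / (TP + FP) = 10 / 20 = 0.5
(iii) F1-score (harmonic mean of Recall and Precision): 2PR / (P + R)
(iv) Specificity: TN / (TN + FP) = 140 / 150 = 0.93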
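The four formulas can be checked directly from the counts. This is a small sketch using the example's numbers (TP = 10, FN = 40, FP = 10, TN = 140); `classification_metrics` is an illustrative helper and does not guard against zero denominators.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the metrics above from confusion-matrix counts (no zero-division handling)."""
    recall = tp / (tp + fn)          # share of actual positives that were found
    precision = tp / (tp + fp)       # share of predicted positives that were right
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    specificity = tn / (tn + fp)     # share of actual negatives that were found
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "specificity": specificity,
    }

metrics = classification_metrics(tp=10, fn=40, fp=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts the F1-score comes out near 0.29, much closer to the weak recall than the 75% accuracy suggests.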
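Eg: + (500): 250 predicted correctly, 250 predicted wrongly.
    - (500): 200 predicted wrongly, 300 predicted correctly.
Confusion matrix:
              Predicted P   Predicted N
Actual P      TP = 250      FN = 250
Actual N      FP = 200      TN = 300
Recall = TP / (TP + FN) = 250 / 500 = 0.5
Precision = TP / (TP + FP) = 250 / 450 = 0.55
F1-score = 2PR / (P + R) = (2 x 0.5 x 0.55) / (0.5 + 0.55) = 0.55 / 1.05 = 0.52
Specificity = TN / (TN + FP) = 300 / 500 = 0.6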
ML 4
- Majority of the time, the data won't be
balanced. It will be imbalanced data.
- Any data distribution apart from 50-50 will be
imbalanced.
- Generally, a 60-40 imbalance is fine and we can
use it as it is; we won't try to balance that data.
But most of the time, the class
imbalance is too much. For example,
positive class 1%, negative class 99%.
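Checking how imbalanced a label column is takes only a count per class. A minimal sketch, assuming the labels are available as a plain list; `class_shares` and the 1% / 99% data are made up for the illustration.

```python
from collections import Counter

def class_shares(labels):
    """Return each class's share of the dataset as a fraction (illustrative helper)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Made-up labels matching the example: 1% positive, 99% negative.
labels = ["pos"] * 1 + ["neg"] * 99
shares = class_shares(labels)
print(shares)  # {'pos': 0.01, 'neg': 0.99}
```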
When your data is imbalanced, there can be
3 possible things that you can do:
i. Do not balance it.
ii. Balance it.
iii. Don't use that data.
There are 2 ways to balance the data:
i) Oversampling (SMOTE)
ii) Undersampling (Near Miss)
* OVERSAMPLING: Oversampling involves
increasing the number of instances in the
minority class to balance it with the majority
class. This is typically done by duplicating existing
instances or generating synthetic data points.
Eg: Covid data: 1000 in total, +ve = 200 and -ve = 800.
In oversampling, it will repeat these 200 until
it becomes 800 (200 + 200 + 200 + 200).
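Oversampling by duplication can be sketched as drawing extra copies of minority records at random until the target size is reached. This is an illustration only; `random_oversample` is a made-up helper and the records are stand-in integers.

```python
import random

def random_oversample(minority, target_size, seed=0):
    """Duplicate randomly chosen minority records until there are `target_size` of them."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    # Keep every original record and append the duplicates.
    return minority + extra

# Stand-in data: 200 minority (+ve) records, to be grown to match 800 majority records.
minority_records = list(range(200))
oversampled = random_oversample(minority_records, target_size=800)
print(len(minority_records), "->", len(oversampled))  # 200 -> 800
```

No new information is created this way; the same 200 records are simply seen more often.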
- SMOTE (Synthetic Minority Over-sampling
Technique) is a data augmentation technique
in machine learning used to balance imbalanced
datasets by generating synthetic examples
for the minority class through interpolation
between existing instances and their
nearest neighbours.
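The interpolation idea can be sketched as follows. This is a simplified illustration of the principle, not the reference SMOTE algorithm or any library's implementation; `smote_like_samples`, the value of k, and the four points are assumptions.

```python
import math
import random

def smote_like_samples(minority, n_new, k=3, seed=0):
    """Create points on the line segments between minority points and their near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # The k minority points closest to the chosen base point.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Made-up minority-class points with two features each.
minority_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
new_points = smote_like_samples(minority_points, n_new=6)
for point in new_points:
    print(point)
```

Unlike plain duplication, each new point is a blend of two existing minority points, so it stays inside the region the minority class already occupies.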
* UNDERSAMPLING:
- Undersampling involves reducing the number
of instances in the majority class to
balance it with the minority class. This is
typically done by randomly selecting a subset
of instances from the majority class.
- Generally, we use undersampling most.
- It reduces the majority class so it becomes
equal to the minority class.
Eg: 1000 in total, +ve = 200 and -ve = 800:
select 200 random values from the 800.
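Random undersampling can be sketched as drawing a random subset of the majority class without replacement. A minimal illustration; `random_undersample` is a made-up helper and the records are stand-in integers.

```python
import random

def random_undersample(majority, target_size, seed=0):
    """Keep a random subset of `target_size` majority records, without repeats."""
    rng = random.Random(seed)
    return rng.sample(majority, target_size)

# Stand-in data: 800 majority (-ve) records, to be cut down to match 200 minority records.
majority_records = list(range(800))
undersampled = random_undersample(majority_records, target_size=200)
print(len(majority_records), "->", len(undersampled))  # 800 -> 200
```

The trade-off is that the other 600 majority records are discarded, so any information they carried is lost.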