
Heart failure prediction using PCA as a dimension-reduction technique
What my dataset looks like!

As you can see, I can't feed this dataset into my machine learning model yet because it contains a lot of categorical variables.
Data preparation

I explored my dataset to check for missing values and, fortunately, there are none.

● So, next, I explored each column with categorical variables.

● This gave me insight into how I can handle them.
● For columns with just two unique values, I used label encoding, while I used one-hot encoding for the rest.

NB: When using one-hot encoding, remember to drop the first category (the drop-first argument), as this helps to avoid multicollinearity.
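A minimal sketch of both encoding steps, using a hypothetical mini-frame (the column names are illustrative assumptions, not the actual dataset's):

```python
import pandas as pd

# Hypothetical stand-in for the heart-failure dataset.
df = pd.DataFrame({
    "Sex": ["M", "F", "M", "F"],
    "ChestPainType": ["ATA", "NAP", "ATA", "ASY"],
})

# Binary column -> label encoding (map the two values to 0/1).
df["Sex"] = df["Sex"].map({"M": 1, "F": 0})

# Multi-category column -> one-hot encoding; drop_first=True removes
# one dummy per column to avoid the multicollinearity issue noted above.
df = pd.get_dummies(df, columns=["ChestPainType"], drop_first=True)
```

With three categories, `drop_first=True` leaves only two dummy columns; the dropped category is implied when both dummies are zero.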
Scaling of my dataset

There are two reasons why I scaled my dataset:

● PCA is sensitive to the scale of the features: unscaled variables with large variances dominate the principal components.

● The variables in my dataset are terribly lopsided. For example, my dummies are zeros and ones, while other variables are far larger; throwing them into the model without scaling might confuse it.
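The scaling step can be sketched with `StandardScaler` on a toy matrix that mimics the lopsided ranges described (a 0/1 dummy next to a large-valued column; the numbers are made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy mixed-scale matrix: a 0/1 dummy column beside a much larger one.
X = np.array([[0.0, 210.0],
              [1.0, 340.0],
              [0.0, 180.0],
              [1.0, 260.0]])

# Standardize each column to mean 0 and unit variance so no single
# feature dominates the PCA variance computation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

After scaling, both columns contribute on equal footing to the covariance matrix that PCA decomposes.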
Challenges
Before I use PCA, I want to find out how much accuracy I can get with different machine learning models, but that would mean too many iterations to repeat, especially when searching for the best parameters for each model. So I will be using GridSearchCV. In the next slide, I will show you how I used GridSearchCV to find the model with the best accuracy.
How I used
GridSearchCV

Firstly, I created a dictionary of all the models I wanted to use and their parameters.
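The dictionary-of-models approach can be sketched as follows. The model choices, parameter grids, and synthetic data here are illustrative assumptions, not the actual setup from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for the prepared heart-failure features.
X, y = make_classification(n_samples=200, n_features=16, random_state=42)

# One entry per candidate model: the estimator plus its parameter grid.
model_params = {
    "logistic_regression": {
        "model": LogisticRegression(max_iter=1000),
        "params": {"C": [0.1, 1, 10]},
    },
    "random_forest": {
        "model": RandomForestClassifier(random_state=42),
        "params": {"n_estimators": [50, 100]},
    },
}

# GridSearchCV cross-validates every parameter combination for each
# model, so one loop replaces many manual iterations.
scores = []
for name, mp in model_params.items():
    clf = GridSearchCV(mp["model"], mp["params"], cv=5)
    clf.fit(X, y)
    scores.append({"model": name,
                   "best_score": clf.best_score_,
                   "best_params": clf.best_params_})

best = max(scores, key=lambda s: s["best_score"])
```

`best` then identifies the model and parameter combination with the highest cross-validated accuracy.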
Best scores without PCA
I need to say that using PCA doesn't mean the accuracy of our model will increase; usually it decreases, but the computation is much lighter, and this is one of the trade-offs we accept in the industry.
PCA reduces the number of variables in a dataset while
maintaining as much information as possible. It transforms the
original variables into a new set of variables, which are called
principal components. These components are ordered so that
the first few retain most of the variation present in all of the
original variables.
How I implemented PCA
Model accuracy significantly improved after PCA, to about 89.67%. PCA reduced my features from the initial 16 down to 13 principal components.
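A minimal sketch of the PCA step, assuming standardized data and the 16-to-13 reduction mentioned above (the synthetic matrix stands in for the real features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for the 16 encoded features; the real dataset differs.
X = rng.normal(size=(100, 16))
X_scaled = StandardScaler().fit_transform(X)

# Keep 13 components, as in the slides; alternatively, passing a float
# such as n_components=0.95 keeps enough components for 95% of the variance.
pca = PCA(n_components=13)
X_pca = pca.fit_transform(X_scaled)
```

The transformed matrix `X_pca` feeds into the same GridSearchCV comparison, now over 13 columns instead of 16.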
Thank you
