
ASSIGNMENT 1

Syed Zaid Bin Haris


22868

INTRODUCTION
The competition is inspired by the Bank Marketing dataset available at the UCI Repository.
We had to use KNIME and Python to predict whether the client will subscribe to a term
deposit (variable y).
The evaluation metric was AUC.
SUMMARY
Model                                   Start     Best      Increase
Random Forest (KNIME)                   0.91850   0.92003   0.00153
Tree Ensemble (KNIME)                   0.92319   0.92773   0.00454
KNN (KNIME)                             0.75983   0.75983   0.00000
Naïve Bayes (KNIME)                     0.59427   0.59427   0.00000
Tree Ensemble with data
  manipulation (KNIME)                  0.78910   0.92793   0.13883
Random Forest (Python)                  0.87445   0.91968   0.04523
Stacking (Python)                       0.92625   0.92856   0.00231

RANDOM FOREST (KNIME)


• This was the first model I tried in KNIME. It initially gave an AUC of 0.91850
  (try1.csv).
• I changed many parameters to try to increase the score.
• I was initially submitting the probability of "yes" as my output variable; when
  I switched to hard 0s and 1s instead, my score dropped to 0.81191 (try9.csv),
  so I never tried that again.
• I increased the number of trees, which took the AUC to 0.92003 (something2.csv).
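The probability-versus-hard-labels point above can be illustrated with a small sketch. This is not the competition pipeline (the data here is synthetic and the model settings are assumptions); it only shows why thresholding predictions to 0s and 1s tends to lower AUC, which is a ranking metric:

```python
# Sketch (synthetic data, assumed settings): AUC rewards a good ranking of
# predictions, so collapsing probabilities to hard 0/1 labels discards
# ranking information and usually lowers the score.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]    # probability of class "yes"
hard = (proba >= 0.5).astype(int)          # "solid 0s and 1s"

auc_proba = roc_auc_score(y_te, proba)
auc_hard = roc_auc_score(y_te, hard)
print(f"AUC with probabilities: {auc_proba:.3f}")
print(f"AUC with hard labels:   {auc_hard:.3f}")
```

With hard labels the ROC curve degenerates to a single operating point, so the probability submission scores at least as well here.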

TREE ENSEMBLE (KNIME)


• After Random Forest I moved on to Tree Ensemble.
• I changed many parameters to increase the score.
• I initially got an AUC of 0.92319.
• I changed my seed value, which increased the AUC to 0.92379. This took a
  couple of tries.
• After this, I increased the number of trees, which took the AUC to 0.92773
  (tree ens.csv).
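The seed-then-trees tuning loop above can be sketched in scikit-learn. KNIME's Tree Ensemble is approximated here with RandomForestClassifier, the data is synthetic, and the seed range and tree counts are assumptions, so the numbers are illustrative only:

```python
# Rough sklearn analogue of the tuning described above: try a few seeds at a
# fixed tree count, keep the best, then grow more trees with that seed.
# (Synthetic data; KNIME's Tree Ensemble approximated with a random forest.)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

def auc_for(seed: int, n_trees: int) -> float:
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

best_seed = max(range(5), key=lambda s: auc_for(s, 100))  # seed sweep
final_auc = auc_for(best_seed, 500)                       # then more trees
print(f"best seed {best_seed}: AUC {final_auc:.3f}")
```

Note that picking the seed by leaderboard score, as described in the slides, risks fitting noise in the evaluation set; the gains it gives are typically small.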

K NEAREST NEIGHBORS(KNIME)
• I also tried K Nearest Neighbors.
• I got an AUC of 0.75983 (knearsetneighbor.csv), far lower than all the other
  methods, which was a bit disappointing.
• I tried setting k to 2, since we are doing binary classification, and my
  result dropped even further to 0.73965 (knearsetneighbor2.csv).
• This prompted me to stop using KNN.
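A small sketch of why a very low k hurts an AUC score in particular (synthetic data; the k values besides 2 are assumptions): with k = 2 each predicted probability can only be 0, 0.5, or 1, so the ranking AUC depends on becomes very coarse.

```python
# Sketch (synthetic data): with k=2 the predicted probabilities take only
# three values, which coarsens the ranking that AUC measures; larger k gives
# finer-grained probabilities.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

aucs = {}
for k in (2, 5, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    aucs[k] = roc_auc_score(y_te, knn.predict_proba(X_te)[:, 1])
    print(f"k={k}: AUC={aucs[k]:.3f}")
```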

NAÏVE BAYES (KNIME)


• After KNN I tried Naïve Bayes and got a very poor AUC of 0.59427
  (naivebayes.csv).
• This poor score, combined with the 0.92 AUC I had already achieved, prompted
  me to never use it again.

DATA MANIPULATION (KNIME)


• I tried some data manipulation.
• I worked from the idea that more data leads to better accuracy, so I tried
  several different ways to expand and clean the data.
• I couldn't find any ways to clean the data.
• I tried several methods to expand the data. Initially I was under the wrong
  assumption that we were being scored on accuracy, so after sorting the test
  rows by p(y=1) I labelled the first 49 percent as 0 and the last 49 percent
  as "yes" and added them to my training data. I ran the model using the same
  configuration that had given the best Tree Ensemble result.
• My first result was 0.78910 (try11.csv).
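The pseudo-labelling step above can be sketched as follows. The 49% cut-off and the label column name "y" follow the slides; the helper name `add_pseudo_labels` and the toy data are my own illustration, not the actual pipeline:

```python
# Sketch of the pseudo-labelling described above: sort test rows by the
# predicted p(y=1), label the most confident tails, and append them to the
# training data. Helper name and toy data are illustrative assumptions.
import numpy as np
import pandas as pd

def add_pseudo_labels(train: pd.DataFrame, test: pd.DataFrame,
                      p_yes: np.ndarray, tail: float = 0.49) -> pd.DataFrame:
    """Label the bottom `tail` fraction 0 and the top `tail` fraction 1."""
    order = np.argsort(p_yes)           # ascending by predicted p(y=1)
    cut = int(len(test) * tail)
    low = test.iloc[order[:cut]].assign(y=0)     # bottom 49% -> "no"
    high = test.iloc[order[-cut:]].assign(y=1)   # top 49% -> "yes"
    return pd.concat([train, low, high], ignore_index=True)

# Tiny illustration with made-up data:
train = pd.DataFrame({"x": [1.0, 2.0], "y": [0, 1]})
test = pd.DataFrame({"x": np.linspace(0.0, 3.0, 100)})
p = np.linspace(0.01, 0.99, 100)  # stand-in for model scores
augmented = add_pseudo_labels(train, test, p)
```

The rows in the middle of the ranking, where the model is least confident, are left out entirely.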

DATA MANIPULATION (KNIME)


• Since that was not working, I tried the same idea with the first 42 percent
  and the last 42 percent.
• I got this cut-off from my accuracy (0.92): the idea was that, with an
  accuracy of 0.92, most of the mistakes would lie in the middle of the
  ranking.
• This assumption was also wrong, but it increased my score a bit (0.84989).
• Then I took the first 10,000 rows and converted them to 0s, which increased
  my score to 0.92793.

DATA MANIPULATION (KNIME)


• Lastly, I tried to reduce the class imbalance by removing some "no" values
  from the training set.
• I deleted 10,000 "no" rows from the dataset, but my score dropped to
  0.92718, so I stopped going in this direction.
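The undersampling step above amounts to the following sketch. The count of 10,000 and the label column "y" come from the slides; the sampling seed, helper name, and toy data are assumptions:

```python
# Sketch of the undersampling attempt described above: randomly drop 10,000
# "no" (y = 0) rows from the training data to reduce class imbalance.
# Toy data and helper name are illustrative.
import pandas as pd

def drop_no_rows(train: pd.DataFrame, n_drop: int, seed: int = 0) -> pd.DataFrame:
    no_rows = train[train["y"] == 0]
    to_drop = no_rows.sample(n=min(n_drop, len(no_rows)), random_state=seed)
    return train.drop(to_drop.index)

# Made-up imbalanced training set: 25,000 "no" vs 5,000 "yes".
train = pd.DataFrame({"x": range(30000), "y": [0] * 25000 + [1] * 5000})
balanced = drop_no_rows(train, 10000)
print(balanced["y"].value_counts())
```

Undersampling changes the class prior the model learns, which can shift predicted probabilities and hence the ranking, consistent with the small drop reported above.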

RANDOM FOREST PYTHON


• Lastly, I tried Python's scikit-learn library.
• Initially I used Random Forest with n_estimators = 0 and no other
  parameters.
• This gave me an AUC of 0.87445 (hello5.csv).
• Then I increased n_estimators to 1000 to get an AUC of 0.91968.
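The step above can be sketched as below. The competition files are not shown in the slides, so synthetic data stands in for them; n_estimators = 1000 follows the slide, while everything else is an assumption:

```python
# Minimal sketch of the scikit-learn Random Forest step above: many trees,
# and probabilities (not hard labels) scored because the metric is AUC.
# Synthetic data stands in for the competition files.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

rf = RandomForestClassifier(n_estimators=1000, n_jobs=-1, random_state=3)
rf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")
```

Note that scikit-learn rejects n_estimators = 0 with an error, so the initial run reported above must have used at least one tree.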

STACKING (PYTHON)
• I then used sklearn's StackingClassifier with logistic regression as the
  final estimator.
• I started by stacking RandomForestClassifier and ExtraTreesClassifier,
  which gave me an AUC of 0.92525.
• I then added GradientBoostingClassifier to get an AUC of 0.92625.
• I then changed some parameters (bootstrap, warm_start) and got up to
  0.92856 (hello20.csv).
• This is the best AUC I got.
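The final stacking setup can be sketched as below. The estimator choices and logistic-regression final estimator follow the slides; tree counts and the synthetic data are assumptions, since the slides do not list the exact parameters:

```python
# Sketch of the stacking setup described above: RandomForest, ExtraTrees and
# GradientBoosting base models combined by a logistic-regression final
# estimator. Tree counts and data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=20, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=4)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=4)),
        ("gb", GradientBoostingClassifier(random_state=4)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_tr, y_tr)  # internally cross-validates each base model
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"stacked AUC: {auc:.3f}")
```

StackingClassifier trains the final estimator on cross-validated predictions of the base models, which is why adding a diverse model like gradient boosting can nudge the score up even when each base model alone scores similarly.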
THANK YOU
Zaid Bin Haris 22868
