
ASSIGNMENT 1

Syed Zaid Bin Haris


22868

INTRODUCTION
The competition is inspired by the Bank Marketing dataset available at the UCI Repository.
We had to use KNIME and Python to predict whether the client will subscribe to a term
deposit (variable y).
The evaluation metric was AUC.
SUMMARY
Model                                   Start     Best      Increase
Random Forest (KNIME)                   0.91850   0.92003   0.00153
Tree Ensemble (KNIME)                   0.92319   0.92773   0.00454
KNN (KNIME)                             0.75983   0.75983   0.00000
Naïve Bayes (KNIME)                     0.59427   0.59427   0.00000
Tree Ensemble with data
  manipulation (KNIME)                  0.78910   0.92793   0.13883
Random Forest (Python)                  0.87445   0.91968   0.04523
Stacking (Python)                       0.92625   0.92856   0.00231

RANDOM FOREST (KNIME)


• This was the first model I tried in KNIME. It initially gave an AUC of 0.91850
  (try1.csv).
• I changed many parameters to try to increase the score.
• I was initially submitting the probability of "yes" as my output variable; when
  I switched to hard 0s and 1s instead, my score dropped to 0.81191 (try9.csv),
  so I never tried that again.
• I increased the number of trees, which took the AUC to 0.92003 (something2.csv).
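The probability-versus-hard-labels point above can be illustrated with a small sketch. This is not the competition pipeline (the data here is synthetic and the model settings are assumptions); it only shows why thresholding predictions to 0s and 1s tends to lower AUC, which is a ranking metric:

```python
# Sketch (synthetic data, assumed settings): AUC rewards a good ranking of
# predictions, so collapsing probabilities to hard 0/1 labels discards
# ranking information and usually lowers the score.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]    # probability of class "yes"
hard = (proba >= 0.5).astype(int)          # "solid 0s and 1s"

auc_proba = roc_auc_score(y_te, proba)
auc_hard = roc_auc_score(y_te, hard)
print(f"AUC with probabilities: {auc_proba:.3f}")
print(f"AUC with hard labels:   {auc_hard:.3f}")
```

With hard labels the ROC curve degenerates to a single operating point, so the probability submission scores at least as well here.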

TREE ENSEMBLE (KNIME)


• After Random Forest I moved on to Tree Ensemble.
• I changed many parameters to increase the score.
• I initially got an AUC of 0.92319.
• I changed my seed value, which increased the AUC to 0.92379. This took a
  couple of tries.
• After this, I increased the number of trees, which took the AUC to 0.92773
  (tree ens.csv).
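The seed-then-trees tuning loop above can be sketched in scikit-learn. KNIME's Tree Ensemble is approximated here with RandomForestClassifier, the data is synthetic, and the seed range and tree counts are assumptions, so the numbers are illustrative only:

```python
# Rough sklearn analogue of the tuning described above: try a few seeds at a
# fixed tree count, keep the best, then grow more trees with that seed.
# (Synthetic data; KNIME's Tree Ensemble approximated with a random forest.)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

def auc_for(seed: int, n_trees: int) -> float:
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

best_seed = max(range(5), key=lambda s: auc_for(s, 100))  # seed sweep
final_auc = auc_for(best_seed, 500)                       # then more trees
print(f"best seed {best_seed}: AUC {final_auc:.3f}")
```

Note that picking the seed by leaderboard score, as described in the slides, risks fitting noise in the evaluation set; the gains it gives are typically small.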

K NEAREST NEIGHBORS(KNIME)
• I also tried K Nearest Neighbors.
• I got an AUC of 0.75983 (knearsetneighbor.csv), far lower than all the other
  methods, which was a bit disappointing.
• I tried setting k to 2, since we are doing binary classification, and my
  result dropped even further to 0.73965 (knearsetneighbor2.csv).
• This prompted me to stop using KNN.
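A small sketch of why a very low k hurts an AUC score in particular (synthetic data; the k values besides 2 are assumptions): with k = 2 each predicted probability can only be 0, 0.5, or 1, so the ranking AUC depends on becomes very coarse.

```python
# Sketch (synthetic data): with k=2 the predicted probabilities take only
# three values, which coarsens the ranking that AUC measures; larger k gives
# finer-grained probabilities.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

aucs = {}
for k in (2, 5, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    aucs[k] = roc_auc_score(y_te, knn.predict_proba(X_te)[:, 1])
    print(f"k={k}: AUC={aucs[k]:.3f}")
```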

NAÏVE BAYES (KNIME)


• After KNN I tried Naïve Bayes and got a very poor AUC of 0.59427
  (naivebayes.csv).
• This poor score, combined with the 0.92 AUC I had already achieved, prompted
  me to never use it again.

DATA MANIPULATION (KNIME)


• I tried some data manipulation.
• I worked from the idea that more data leads to better accuracy, so I tried
  several different ways to expand and clean the data.
• I couldn't find any ways to clean the data.
• I tried several methods to expand the data. Initially I was under the wrong
  assumption that we were being scored on accuracy, so after sorting the test
  rows by p(y=1) I labelled the first 49 percent as 0 and the last 49 percent
  as "yes" and added them to my training data. I ran the model using the same
  configuration that had given the best Tree Ensemble result.
• My first result was 0.78910 (try11.csv).
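The pseudo-labelling step above can be sketched as follows. The 49% cut-off and the label column name "y" follow the slides; the helper name `add_pseudo_labels` and the toy data are my own illustration, not the actual pipeline:

```python
# Sketch of the pseudo-labelling described above: sort test rows by the
# predicted p(y=1), label the most confident tails, and append them to the
# training data. Helper name and toy data are illustrative assumptions.
import numpy as np
import pandas as pd

def add_pseudo_labels(train: pd.DataFrame, test: pd.DataFrame,
                      p_yes: np.ndarray, tail: float = 0.49) -> pd.DataFrame:
    """Label the bottom `tail` fraction 0 and the top `tail` fraction 1."""
    order = np.argsort(p_yes)           # ascending by predicted p(y=1)
    cut = int(len(test) * tail)
    low = test.iloc[order[:cut]].assign(y=0)     # bottom 49% -> "no"
    high = test.iloc[order[-cut:]].assign(y=1)   # top 49% -> "yes"
    return pd.concat([train, low, high], ignore_index=True)

# Tiny illustration with made-up data:
train = pd.DataFrame({"x": [1.0, 2.0], "y": [0, 1]})
test = pd.DataFrame({"x": np.linspace(0.0, 3.0, 100)})
p = np.linspace(0.01, 0.99, 100)  # stand-in for model scores
augmented = add_pseudo_labels(train, test, p)
```

The rows in the middle of the ranking, where the model is least confident, are left out entirely.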

DATA MANIPULATION (KNIME)


• Since that was not working, I tried the same idea with the first 42 percent
  and the last 42 percent.
• I got this cut-off from my accuracy (0.92): the idea was that, with an
  accuracy of 0.92, most of the mistakes would lie in the middle of the
  ranking.
• This assumption was also wrong, but it increased my score a bit (0.84989).
• Then I took the first 10,000 rows and converted them to 0s, which increased
  my score to 0.92793.

DATA MANIPULATION (KNIME)


• Lastly, I tried to reduce the class imbalance by removing some "no" values
  from the training set.
• I deleted 10,000 "no" rows from the dataset, but my score dropped to
  0.92718, so I stopped going in this direction.
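The undersampling step above amounts to the following sketch. The count of 10,000 and the label column "y" come from the slides; the sampling seed, helper name, and toy data are assumptions:

```python
# Sketch of the undersampling attempt described above: randomly drop 10,000
# "no" (y = 0) rows from the training data to reduce class imbalance.
# Toy data and helper name are illustrative.
import pandas as pd

def drop_no_rows(train: pd.DataFrame, n_drop: int, seed: int = 0) -> pd.DataFrame:
    no_rows = train[train["y"] == 0]
    to_drop = no_rows.sample(n=min(n_drop, len(no_rows)), random_state=seed)
    return train.drop(to_drop.index)

# Made-up imbalanced training set: 25,000 "no" vs 5,000 "yes".
train = pd.DataFrame({"x": range(30000), "y": [0] * 25000 + [1] * 5000})
balanced = drop_no_rows(train, 10000)
print(balanced["y"].value_counts())
```

Undersampling changes the class prior the model learns, which can shift predicted probabilities and hence the ranking, consistent with the small drop reported above.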

RANDOM FOREST PYTHON


• Lastly, I tried Python's scikit-learn library.
• Initially I used Random Forest with n_estimators = 0 and no other
  parameters.
• This gave me an AUC of 0.87445 (hello5.csv).
• Then I increased n_estimators to 1000 to get an AUC of 0.91968.
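The step above can be sketched as below. The competition files are not shown in the slides, so synthetic data stands in for them; n_estimators = 1000 follows the slide, while everything else is an assumption:

```python
# Minimal sketch of the scikit-learn Random Forest step above: many trees,
# and probabilities (not hard labels) scored because the metric is AUC.
# Synthetic data stands in for the competition files.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

rf = RandomForestClassifier(n_estimators=1000, n_jobs=-1, random_state=3)
rf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")
```

Note that scikit-learn rejects n_estimators = 0 with an error, so the initial run reported above must have used at least one tree.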

STACKING (PYTHON)
• I then used sklearn's StackingClassifier with logistic regression as the
  final estimator.
• I started by stacking RandomForestClassifier and ExtraTreesClassifier,
  which gave me an AUC of 0.92525.
• I then added GradientBoostingClassifier to get an AUC of 0.92625.
• I then changed some parameters (bootstrap, warm_start) and got up to
  0.92856 (hello20.csv).
• This is the best AUC I got.
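The final stacking setup can be sketched as below. The estimator choices and logistic-regression final estimator follow the slides; tree counts and the synthetic data are assumptions, since the slides do not list the exact parameters:

```python
# Sketch of the stacking setup described above: RandomForest, ExtraTrees and
# GradientBoosting base models combined by a logistic-regression final
# estimator. Tree counts and data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=20, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=4)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=4)),
        ("gb", GradientBoostingClassifier(random_state=4)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_tr, y_tr)  # internally cross-validates each base model
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"stacked AUC: {auc:.3f}")
```

StackingClassifier trains the final estimator on cross-validated predictions of the base models, which is why adding a diverse model like gradient boosting can nudge the score up even when each base model alone scores similarly.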
THANK YOU
Zaid Bin Haris 22868
