Submission PPT 110721 - Group 2
Proprietary and confidential — do not distribute Enter title via "insert>header and footer>footer" | 8/22/21 | 3
Boston Housing Data : Summary
Task : MEDV is the dependent variable. Using all variables, build a model to predict Median
Value of owner-occupied homes.
Key findings:
SVM summary
RMSE Linear – 5.9
RMSE Polynomial – 6.8
RMSE Radial – 6.04
RMSE Sigmoid – 12.4
Regression summary
RMSE – 5.8
Based on the lowest RMSE, the preferred method for model fitting is Neural
Network
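The comparison on this slide rests on RMSE, the square root of the mean squared prediction error. A minimal Python sketch of the metric and of a lowest-RMSE selection follows. The function names and the toy values are illustrative assumptions, and the dictionary holds only the figures printed on this slide (the neural network's RMSE is not listed here, so it is absent from the comparison).

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def lowest_rmse(scores):
    """Return the (model, rmse) pair with the smallest RMSE."""
    return min(scores.items(), key=lambda item: item[1])

# Toy values for illustration only (not taken from the Boston Housing data).
toy_error = rmse([24.0, 22.0, 35.0], [25.0, 21.0, 34.0])

# RMSE figures as printed on the slide; the neural network value is not listed.
slide_scores = {
    "SVM linear": 5.9,
    "SVM polynomial": 6.8,
    "SVM radial": 6.04,
    "SVM sigmoid": 12.4,
    "Regression": 5.8,
}
best_listed = lowest_rmse(slide_scores)
```

`best_listed` only ranks the models whose RMSE appears on the slide; any model missing from the dictionary has to be added before the result can be read as an overall choice.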
Customer Profit
Customer Profit Data : Summary
Task : Profit is the dependent variable. Using all variables, build a model to predict profit
Key findings:
1. For every value of ntree tried, the lowest OOB error is achieved with an mtry value of 1. The OOB errors for ntree of 400 and 500 are also very close, so in the interest of a simpler model and the best balance between speed and accuracy we chose ntree = 400.
2. The top 2 variables that influenced Profit for Random Forest were INCOME and TENURE.
3. The best parameters for the radial kernel in SVM are epsilon = 0.6, gamma = 0.1428571, cost = 0.25.
4. For the neural network, the lowest RMSE is seen with the 'TANH' activation function.
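Finding 1 prefers the smaller forest when the OOB error barely changes. One way to make that rule explicit is sketched below in Python. The OOB values and the tolerance are hypothetical placeholders (the slide does not report the actual errors), and `smallest_ntree_within` is an illustrative helper, not part of any library.

```python
def smallest_ntree_within(oob_by_ntree, tolerance):
    """Pick the smallest ntree whose OOB error is within `tolerance` of the best."""
    best_error = min(oob_by_ntree.values())
    candidates = [
        ntree for ntree, error in oob_by_ntree.items()
        if error - best_error <= tolerance
    ]
    return min(candidates)

# Hypothetical OOB errors per ntree, for illustration only.
hypothetical_oob = {100: 0.250, 200: 0.240, 300: 0.235, 400: 0.231, 500: 0.230}
chosen = smallest_ntree_within(hypothetical_oob, tolerance=0.002)
```

With these placeholder numbers the rule selects 400 trees, because 500 trees improves the error by less than the tolerance.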
Key findings:
1. The top variables that influenced Profit for Neural Network were ONLINE and AGE
2. For the neural network, the lowest RMSE is seen with the 'TANH' activation function, applied on the training set with two hidden layers of 10 and 5 neurons
3. Original function (Rectifier, 1 layer , X neurons, 100
epochs) had an accuracy of xx%
SVM
RMSE Linear – 223.5
RMSE Polynomial – 223
RMSE Radial – 222.8
RMSE Sigmoid – 53052.4
Regression
RMSE – 217.4
Based on the lowest RMSE, the preferred method for model fitting is Neural
Network
E-sign
E-sign : Summary
Key findings: Random Forest and SVM
1. To reduce processing time, we ran two different grids and then selected the C value that gave the better accuracy. For the linear kernel, the best model has C = 1.
2. The best parameters for the radial kernel are epsilon = 0, cost = 2.
3. The most influential variables for RF were AMOUNT REQUESTED, RISK SCORE and INCOME.
4. This is a classification problem, so the number of variables tried at each split defaults to the square root of the total number of variables, and ntree defaults to 500. Printing the fitted model shows the number of trees, the number of variables considered at each split and the OOB error rate: the OOB error rate is 36.89% with NTREE = 500, MTRY = 4.
5. We tuned mtry for better accuracy. With mtry = 8 and ntree = 300 the OOB error was 36.71%; tuneRF then reduced the OOB error to 36.37%. The AUC was 64%.
6. SVM Radial, with an AUC of 61%, was the nearest to RF.
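Finding 4 refers to the default number of variables tried at each split. In R's randomForest package that default is floor(sqrt(p)) for classification and max(floor(p/3), 1) for regression, where p is the number of predictors. The Python sketch below restates those defaults and converts the reported OOB error rates to accuracies. `default_mtry` is an illustrative name, and the predictor count of 16 is an assumption chosen only because it yields the MTRY = 4 reported above.

```python
import math

def default_mtry(n_predictors, task):
    """Default variables-per-split, following R randomForest's documented defaults."""
    if n_predictors < 1:
        raise ValueError("need at least one predictor")
    if task == "classification":
        return math.isqrt(n_predictors)      # floor(sqrt(p))
    if task == "regression":
        return max(n_predictors // 3, 1)     # max(floor(p / 3), 1)
    raise ValueError("task must be 'classification' or 'regression'")

# Assumed predictor count (not stated on the slide); any p from 16 to 24 gives 4.
mtry = default_mtry(16, "classification")

# OOB accuracy implied by the OOB error rates reported in the findings.
oob_accuracy_default = 1 - 0.3689   # ntree = 500, mtry = 4
oob_accuracy_tuned = 1 - 0.3637     # after tuneRF
```

The tuned forest gains roughly half a percentage point of OOB accuracy over the default settings, which matches the small improvement described in finding 5.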
SVM
AUC Linear – 0.58
AUC Polynomial – 0.60
AUC Radial – 0.61
AUC Sigmoid – 0.50
Logistic Regression
AUC – 0.57
Based on the highest AUC, the preferred method for model fitting is Random Forest
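The classification models here are ranked by AUC. As a reminder of what that number measures, the Python sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half. The function name and the toy labels and scores are illustrative assumptions, not values from the E-sign data.

```python
def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy labels and scores for illustration only.
toy_auc = auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2])
constant_auc = auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

A model that assigns every case the same score gets an AUC of 0.5, which is why the 0.50 reported for the sigmoid kernel indicates no ranking ability at all.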
Lead scoring – Ed tech
Lead Scoring: Summary
Task : Identify predictor variables to generate a model that predicts likelihood of customer
conversion for online courses. The current conversion rate is 30% and the sales team with
this model can focus more on likely prospects.
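To illustrate how a scoring model lets the sales team focus on likely prospects, the Python sketch below ranks leads by predicted conversion probability and compares the conversion rate in the top-scored group with the overall rate. The lead scores and outcomes are hypothetical (3 of 10 convert, mirroring the 30% baseline purely for illustration), and `top_fraction_conversion` is an illustrative helper.

```python
def top_fraction_conversion(scored_leads, fraction):
    """Conversion rate among the highest-scored `fraction` of leads.

    `scored_leads` is a list of (predicted_probability, converted) pairs.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scored_leads, key=lambda lead: lead[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top = ranked[:k]
    return sum(converted for _, converted in top) / k

# Hypothetical scored leads: (predicted probability, 1 if converted else 0).
leads = [(0.92, 1), (0.81, 1), (0.77, 0), (0.64, 1), (0.52, 0),
         (0.41, 0), (0.33, 0), (0.28, 0), (0.15, 0), (0.07, 0)]
baseline = top_fraction_conversion(leads, 1.0)
top_half = top_fraction_conversion(leads, 0.5)
```

In this made-up example, contacting only the top half of the ranked leads doubles the conversion rate relative to contacting everyone.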
Key findings on Neural network
Additional packages : Boruta
Key findings on Random Forest
TUNING PARAMETERS
Abhi/Bhushan- I need your inputs on the
tuning parameters
Deriving the best model
SVM
AUC Linear – 0.81
AUC Polynomial – 0.59
AUC Radial – 0.81
AUC Sigmoid – 0.80
Logistic Regression
AUC – 0.815
Based on the highest AUC, the preferred method for model fitting is Random Forest