
Topic(s): Decision Tree & Random Forest

1.) A cloth manufacturing company wants to know which segments or attributes
contribute to high sales. Approach - A decision tree & random forest model can be
built with 'Sale' as the target variable (it is first converted into a categorical
variable) and all other variables as independent variables in the analysis.

© 2013 - 2020 360DigiTMG. All Rights Reserved.


ANSWER:

Business Problem: Identify which segments or attributes contribute to high sales.

Import the Dataset and Data Preprocessing:


- Converting Sales column into categorical variable
- Removing the 1st column
- Exploratory Data Analysis
- Graphical Representation
- Shuffling the dataset
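
The preprocessing steps above could look roughly like the following sketch. The data is synthetic and the column names (`row_id`, `Price`, `ShelveLoc`) are hypothetical; the median cut-off used to turn Sales into High/Low is also an assumption, since no threshold is stated here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the company data; column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "row_id": np.arange(1, 201),                      # index-like first column
    "Sales": rng.uniform(0, 16, size=200).round(2),   # numeric sales figure
    "Price": rng.integers(50, 200, size=200),
    "ShelveLoc": rng.choice(["Bad", "Medium", "Good"], size=200),
})

# Remove the first column (assumed here to be a row identifier).
df = df.iloc[:, 1:]

# Convert the numeric Sales column into a two-level categorical target.
# Assumption: the median is used as the cut-off between "High" and "Low".
threshold = df["Sales"].median()
df["Sales"] = np.where(df["Sales"] > threshold, "High", "Low")
df["Sales"] = df["Sales"].astype("category")

# Shuffle the rows so that any ordering in the file does not affect the split.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(df["Sales"].value_counts())
```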

Data Partition:
- Checking the class proportions of the target variable and dividing the dataset into train and test sets.
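
A minimal sketch of this partition step, on synthetic data. The 80/20 ratio and the predictor names are assumptions, and `stratify=y` is one way (not necessarily the one used here) to keep the class proportions similar in both parts.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic predictors and a two-class target; names are hypothetical.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "Price": rng.integers(50, 200, size=400),
    "Advertising": rng.integers(0, 30, size=400),
})
y = pd.Series(rng.choice(["High", "Low"], size=400, p=[0.4, 0.6]), name="Sales")

# Proportion of each class in the full dataset.
print(y.value_counts(normalize=True))

# Assumption: 80% train / 20% test; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Proportions after the split, for comparison with the full dataset.
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))
```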

Model Building:
- DECISION TREE
- Train accuracy : 1.0 and Test accuracy : 0.75
- RANDOM FOREST
- Train accuracy : 1.0 and Test accuracy : 0.8
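
The comparison above can be sketched as follows, using synthetic data from `make_classification` rather than the company dataset. A train accuracy of 1.0 with a clearly lower test accuracy is the usual sign of an unpruned tree overfitting; the hyperparameters below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the real dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and record (train accuracy, test accuracy).
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    scores[name] = (train_acc, test_acc)
    print(f"{name}: train={train_acc:.3f}, test={test_acc:.3f}")
```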


2.) Divide the data (Diabetes) into training and test datasets and create a Random
Forest Model to classify 'Class Variable'.

ANSWER:
Business Problem: Classify whether a person has diabetes or not.
Import the dataset and Data Preprocessing:
- Graphical Representation
- Label encoding is done
- Predictors and target columns are rearranged and set for partition
Data Partition:
- Checking the proportion of the output variable. Test data is taken as 30% and train
data as 70%.
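
A rough sketch of the label encoding, column rearrangement and 70/30 partition described above. The data is synthetic; the predictor names and the YES/NO labels are hypothetical, and only the 30% test share comes from the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the Diabetes data; column names are hypothetical.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "Class variable": rng.choice(["YES", "NO"], size=500, p=[0.35, 0.65]),
    "Plasma glucose": rng.normal(120, 30, size=500).round(1),
    "Body mass index": rng.normal(32, 7, size=500).round(1),
    "Age": rng.integers(21, 80, size=500),
})

# Label encoding: classes are sorted alphabetically, so NO -> 0 and YES -> 1.
encoder = LabelEncoder()
df["Class variable"] = encoder.fit_transform(df["Class variable"])

# Rearrange so the predictors come first and the target is the last column.
target = "Class variable"
predictors = [col for col in df.columns if col != target]
df = df[predictors + [target]]

# Check the proportion of the output variable before splitting.
print(df[target].value_counts(normalize=True))

# 30% test, 70% train, keeping class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    df[predictors], df[target], test_size=0.30, stratify=df[target], random_state=42
)
```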
Model Building:
- Train accuracy is 77.05% and test accuracy is 94.13%.
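
A test accuracy well above the train accuracy is unusual for a random forest, so a single split may not tell the whole story. One possible check, sketched here on synthetic data with assumed settings (5 folds, 100 trees), is stratified k-fold cross-validation, which averages accuracy over several splits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic two-class data standing in for the Diabetes dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=3
)

model = RandomForestClassifier(n_estimators=100, random_state=3)

# Assumption: 5 stratified folds, shuffled with a fixed seed.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)
cv_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", cv_scores.round(3))
print("mean:", cv_scores.mean().round(3), "std:", cv_scores.std().round(3))
```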


3.) Use decision trees & random forest algorithm to prepare a model on fraud data,
treating those who have taxable_income <= 30000 as "Risky" and others are "Good".
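
The labelling rule in this question can be written in one line. The sketch below uses a few made-up income values; the output column name `risk` is hypothetical.

```python
import numpy as np
import pandas as pd

# A handful of made-up incomes, including values right at the boundary.
df = pd.DataFrame({"taxable_income": [10003, 30000, 30001, 65000, 28500, 99000]})

# taxable_income <= 30000 is "Risky"; everything else is "Good".
df["risk"] = np.where(df["taxable_income"] <= 30000, "Risky", "Good")

print(df)
print(df["risk"].value_counts())
```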


ANSWER:
Business Problem: Classify people with taxable_income <= 30000 as "Risky" and
all others as "Good".
Import the data set and Data Preprocessing:
- Dataset is imported
- 476 Good customers and 124 risky customers.
- Prediction and confusion matrix on the training data.
- The first six predicted values match the original values.
- Train accuracy is 1.0 and test accuracy is 0.661 using decision tree
- On the graph, customers with taxable income greater than 30000 are good customers;
the rest are risky customers.
- Train accuracy is 0.99 and test accuracy is 0.71 using random forest model.
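
The training-data confusion matrix and the comparison of the first six predictions might be produced along these lines. The data is synthetic, and the predictor names (`city_population`, `work_experience`) are hypothetical. Leaving `taxable_income` out of the predictors is an assumption made here because the target is derived from it; which predictors were actually used is not stated above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the fraud data; column names are hypothetical.
rng = np.random.default_rng(4)
n = 600
df = pd.DataFrame({
    "taxable_income": rng.integers(10000, 100000, size=n),
    "city_population": rng.integers(25000, 200000, size=n),
    "work_experience": rng.integers(0, 31, size=n),
})
df["risk"] = np.where(df["taxable_income"] <= 30000, "Risky", "Good")

# Assumption: taxable_income is excluded, since the target is built from it.
X = df[["city_population", "work_experience"]]
y = df["risk"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=4
)

tree = DecisionTreeClassifier(random_state=4).fit(X_train, y_train)
train_pred = tree.predict(X_train)

# Rows are actual classes, columns are predicted classes, in the order given.
cm = confusion_matrix(y_train, train_pred, labels=["Good", "Risky"])
print(cm)

# Side-by-side view of the first six actual and predicted labels.
comparison = pd.DataFrame({
    "actual": y_train.to_numpy()[:6],
    "predicted": train_pred[:6],
})
print(comparison)
```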

Hints:
1. Business Problem
1.1. Objective
1.2. Constraints (if any)
2. Data Pre-processing
2.1 Data cleaning, Feature Engineering, EDA etc.
3. Model Building
3.1 Partition the dataset
3.2 Model(s) - Reasons to choose any algorithm
3.3 Model(s) Improvement steps
3.4 Model Evaluation
3.5 Python and R codes
4. Deployment
4.1 Deploy solutions using R shiny and Python Flask.
5. Result - Share the benefits/impact of the solution: how, or in what way, the business
(client) benefits from the solution provided.

Note:
1. For each assignment the solution should be submitted in the format
2. Research and perform all possible steps for improving the accuracy of the model(s)
Ex: Feature engineering, hyperparameter tuning, etc.
3. All the codes (executable programs) should run without errors
4. Documentation of the module should be submitted along with R & Python codes, elaborating
on every step mentioned here
