Problem Statement and Rubric - ML-2 - PGPDSBA.O.aug23.B


Problem 1

Context

You are in discussions with ABC Consulting company about providing transport for their employees. For this
purpose, you are tasked with understanding how the employees of ABC Consulting currently prefer to
commute (between home and office). Based on parameters such as age, salary and work
experience given in the data set ‘Transport.csv’, you are required to predict the preferred mode of
transport. The project requires you to build several Machine Learning models and compare them so that
one model can be finalized.

Objective

The objective is to build various Machine Learning models on this data set and, based on the accuracy
metrics, decide which model is to be finalized for predicting the mode of transport chosen by the
employee.

Data Dictionary

Age: Age of the Employee in Years

Gender: Gender of the Employee

Engineer: Engineer = 1, Non-Engineer = 0

MBA: MBA = 1, Non-MBA = 0

Work Exp: Experience in years

Salary: Salary in Lakhs per Annum

Distance: Distance in km from Home to Office

license: 1 if the Employee has a Driving Licence, 0 if not

Transport: Mode of Transport

Problem 2

Context

A dataset of Shark Tank episodes is made available. It contains 495 entrepreneurs making their pitch to
the VC sharks. You will use ONLY the “Description” column for the initial text mining exercise.
Criteria Ratings Pts

Define the problem and perform Exploratory Data Analysis (6.0 pts)
Ratings (applies to every criterion): This area will be used by the assessor to leave comments related to this criterion.
- Problem definition
- Check shape, data types, statistical summary
- Univariate analysis
- Bivariate analysis
- Use appropriate visualizations to identify the patterns and insights
- Key meaningful observations on individual variables and the relationship between variables
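
As a rough illustration of the shape, data type and summary checks listed above, the sketch below uses a handful of made-up rows that follow the data dictionary. The values, and the category labels used for Transport, are assumptions for demonstration only; the real analysis would load ‘Transport.csv’ instead.

```python
import pandas as pd

# Made-up rows for illustration only; the real data lives in Transport.csv.
# The Transport labels below are hypothetical, not taken from the data set.
df = pd.DataFrame({
    "Age": [24, 28, 31, 26, 35, 29],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "Engineer": [1, 0, 1, 1, 0, 1],
    "MBA": [0, 1, 0, 0, 1, 0],
    "Work Exp": [2, 5, 9, 3, 14, 6],
    "Salary": [8.5, 12.0, 18.3, 9.1, 36.0, 13.4],
    "Distance": [5.1, 9.8, 14.2, 7.0, 18.5, 11.3],
    "license": [0, 0, 1, 0, 1, 1],
    "Transport": ["Public Transport", "Public Transport", "Private Transport",
                  "Public Transport", "Private Transport", "Private Transport"],
})
# With the real file this would be: df = pd.read_csv("Transport.csv")

# Shape, data types and statistical summary
print(df.shape)
print(df.dtypes)
print(df.describe())

# Univariate view: share of each transport mode
class_share = df["Transport"].value_counts(normalize=True)
print(class_share)

# Bivariate view: mean salary per transport mode
salary_by_mode = df.groupby("Transport")["Salary"].mean()
print(salary_by_mode)
```

Plots such as histograms and box plots would normally accompany these tables; they are left out here to keep the sketch free of plotting dependencies.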
Data Pre-processing (2.0 pts)
Prepare the data for modelling:
- Outlier detection (treat, if needed)
- Feature engineering / drop redundant features (if needed)
- Encode the data
- Train-test split
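
The pre-processing steps above could be sketched as follows. This is one possible approach, not a prescribed one: it assumes IQR-based capping for outliers, one-hot encoding for Gender, and a stratified split. The rows and the Transport labels are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows for illustration; labels "Public"/"Private" are hypothetical.
df = pd.DataFrame({
    "Age": [24, 28, 31, 26, 35, 29, 27, 33],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male", "Female", "Male"],
    "Salary": [8.5, 12.0, 18.3, 9.1, 57.0, 13.4, 10.2, 21.0],
    "license": [0, 0, 1, 0, 1, 1, 0, 1],
    "Transport": ["Public", "Public", "Private", "Public",
                  "Private", "Private", "Public", "Private"],
})

# Outlier treatment: cap Salary at the 1.5 * IQR whiskers.
q1, q3 = df["Salary"].quantile([0.25, 0.75])
iqr = q3 - q1
df["Salary"] = df["Salary"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Encoding: one-hot encode Gender, map the target to 0/1.
X = pd.get_dummies(df.drop(columns="Transport"), columns=["Gender"], drop_first=True)
y = (df["Transport"] == "Private").astype(int)

# Train-test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)
print(X_train.shape, X_test.shape)
```

Whether capping is appropriate depends on the data; tree-based ensembles are fairly tolerant of outliers, so leaving them untreated can also be a defensible choice if justified in the report.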
Model Building - Bagging (5.0 pts)
- Build a Bagging classifier
- Build a Random Forest classifier
- Check the performance of the models across train and test sets using different metrics and comment on the same
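
A minimal sketch of this step with scikit-learn is shown below. Synthetic data from `make_classification` stands in for the encoded Transport data, and the hyperparameter values are arbitrary starting points, not recommendations from the rubric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded Transport data (not the real data set).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    # BaggingClassifier bags decision trees by default.
    "Bagging": BaggingClassifier(n_estimators=50, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

def evaluate(model, X_s, y_s):
    """Return accuracy and F1 of a fitted model on one data split."""
    pred = model.predict(X_s)
    return {"accuracy": accuracy_score(y_s, pred), "f1": f1_score(y_s, pred)}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {"train": evaluate(model, X_train, y_train),
                    "test": evaluate(model, X_test, y_test)}
    print(name, scores[name])
```

A large gap between train and test scores is the usual sign of overfitting and is worth commenting on for each model.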
Model Improvement - Bagging (7.0 pts)
- Try to improve the model performance by tuning the model (minimum 2 parameters to be tuned)
  - Bagging Classifier
  - Random Forest Classifier
- Comment on model performance after tuning the model
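
One way to tune two parameters per model is a small grid search, sketched below. The parameter grids, the F1 scoring choice and the synthetic data are all assumptions for illustration; the rubric only requires that at least two parameters be tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the encoded Transport data (not the real data set).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Two tuned parameters per model; the grids are illustrative only.
searches = {
    "Bagging": GridSearchCV(
        BaggingClassifier(random_state=0),
        {"n_estimators": [20, 50], "max_samples": [0.6, 1.0]},
        cv=3, scoring="f1"),
    "Random Forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=3, scoring="f1"),
}

tuned = {}
for name, search in searches.items():
    search.fit(X_train, y_train)  # cross-validates on the training set only
    tuned[name] = search
    # score() applies the search's scorer (F1 here) to the held-out test set
    print(name, search.best_params_, round(search.score(X_test, y_test), 3))
```

Comparing these test scores with the untuned baseline shows whether tuning actually helped or merely shifted the train/test gap.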
Model Building - Boosting (3.0 pts)
- Build a Boosting classifier
- Check the performance of the model across train and test sets using different metrics and comment on the same
Note: An AdaBoost or GradientBoosting classifier can be built
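
The sketch below picks scikit-learn's `GradientBoostingClassifier` for this step; `AdaBoostClassifier` from the same module could be swapped in. The data is synthetic and the settings are the library defaults written out explicitly, so treat the numbers it prints as illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded Transport data (not the real data set).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=7)
gbm.fit(X_train, y_train)

# Precision and recall on both splits, to complement plain accuracy.
report = {}
for split, X_s, y_s in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = gbm.predict(X_s)
    report[split] = {"precision": precision_score(y_s, pred),
                     "recall": recall_score(y_s, pred)}
print(report)

# Confusion matrix on the test set: rows are true classes, columns predictions.
cm_test = confusion_matrix(y_test, gbm.predict(X_test))
print(cm_test)
```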
Model Improvement - Boosting (4.0 pts)
- Try to improve the model performance by tuning the model (minimum 2 parameters to be tuned)
- Comment on model performance after tuning the model
Actionable Insights & Recommendations (6.0 pts)
- Compare all the models and choose the best model with proper rationale
- Conclude with the key takeaways (actionable insights and recommendations) for the business
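
A side-by-side table makes the model comparison easier to defend. The sketch below assumes test-set F1 as the ranking metric and reports the train/test gap as a rough overfitting signal; both choices, the candidate settings and the synthetic data are assumptions rather than requirements of the rubric.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded Transport data (not the real data set).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=3)

candidates = {
    "Bagging": BaggingClassifier(n_estimators=50, random_state=3),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=3),
    "Gradient Boosting": GradientBoostingClassifier(random_state=3),
}

rows = []
for name, model in candidates.items():
    model.fit(X_train, y_train)
    train_f1 = f1_score(y_train, model.predict(X_train))
    test_f1 = f1_score(y_test, model.predict(X_test))
    rows.append({"model": name, "train_f1": train_f1, "test_f1": test_f1,
                 "gap": train_f1 - test_f1})

# Rank by test F1; the gap column flags models that overfit the training set.
comparison = (pd.DataFrame(rows)
              .sort_values("test_f1", ascending=False)
              .reset_index(drop=True))
best_model = comparison.loc[0, "model"]
print(comparison)
print("Selected:", best_model)
```

The highest test score is not automatically the best choice; a slightly lower score with a much smaller gap can be the more defensible recommendation.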
Data Preparation (9.0 pts)
Data preparation and exploratory data analysis
- Pick out the Deal (Dependent Variable) and Description columns into a separate dataframe
- Create two corpora - one with those who secured a deal and the other with those who did not secure a deal
- Find the number of characters for both the corpora
Text preprocessing on the corpus of those who secured the deal
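
The corpus steps above might look like the sketch below. The pitches are invented, the column names `deal` and `description` and the boolean encoding of `deal` are assumptions about the data set, and the tiny stopword set is a placeholder for a fuller list such as the one shipped with NLTK.

```python
import string
import pandas as pd

# Invented pitches; column names and the boolean "deal" flag are assumptions.
shark = pd.DataFrame({
    "deal": [True, False, True, False],
    "description": [
        "A reusable water bottle that filters tap water on the go.",
        "An app that reminds you to call your mother!",
        "Organic dog treats baked fresh and shipped weekly.",
        "A subscription box for vintage socks.",
    ],
    "episode": [1, 1, 2, 2],
})

# Keep only the dependent variable and the text column.
text_df = shark[["deal", "description"]]

# One corpus per outcome, joined into a single string each.
deal_corpus = " ".join(text_df.loc[text_df["deal"], "description"])
no_deal_corpus = " ".join(text_df.loc[~text_df["deal"], "description"])
print(len(deal_corpus), len(no_deal_corpus))  # number of characters

# Placeholder stopword list, far smaller than a real one.
STOPWORDS = {"a", "an", "and", "the", "that", "on", "to", "for", "you", "your"}

def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]

deal_tokens = preprocess(deal_corpus)
print(deal_tokens)
```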
Insight Generation (3.0 pts)
- Create a wordcloud of common words used by companies who secure a deal
- Provide insights from the preprocessed data
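
A word cloud is driven by word frequencies, so counting tokens is a useful first look at the insights. The token list below is hypothetical; the commented lines show how the third-party `wordcloud` package could render the same counts, assuming it is installed.

```python
from collections import Counter

# Hypothetical preprocessed tokens from the "deal secured" corpus.
deal_tokens = ["water", "bottle", "filters", "water", "organic", "dog",
               "treats", "water", "dog", "fresh"]

# Word frequencies: the raw material for both the word cloud and the insights.
freqs = Counter(deal_tokens)
top_words = freqs.most_common(3)
print(top_words)

# Optional rendering with the third-party `wordcloud` package:
#   from wordcloud import WordCloud
#   cloud = WordCloud(background_color="white").generate_from_frequencies(freqs)
#   cloud.to_file("deal_wordcloud.png")
```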
Business Report Quality (6.0 pts)
- Adhere to the business report checklist

Guided Project Deduction (9.0 pts)

Total Points: 60.0
