
Predictive and Prescriptive Analysis
Project Presentation
Presented by:

Group 5
B2021041
Shiwang Agarwal
B2021039
Sarthak Kumar Bal
B2021029
Parag Agrawal
INTRODUCTION

AGENDA
THE WHOLE IDEA BEHIND CHOOSING THIS DATASET IS TO FIGURE OUT THE
IMPORTANT FACTORS THAT CAN HAVE A HUGE IMPACT ON THE AUTOMOBILE
BUSINESS, RIGHT FROM SETTING UP THE BUSINESS TO DELIVERING CARS TO
CUSTOMERS.
MOST OF THE TIME, COMPANIES FACE ISSUES IN DETERMINING THE RIGHT
PRICE FOR THEIR AUTOMOBILES ON THE BASIS OF THEIR CONSTRUCTION, AND
HENCE THE MODEL HAS BEEN CREATED TO HELP THE INDUSTRY DETERMINE THE
RIGHT PRICE FOR ITS CONSUMERS.

OBJECTIVE
THE COMPLETE AIM IS TO APPLY DATA SCIENCE TECHNIQUES FOR PREDICTING
THE PRICE OF CARS WITH THE AVAILABLE INDEPENDENT VARIABLES (THE
MATERIALS USED FOR DESIGNING THE CAR). THAT SHOULD HELP THE
MANAGEMENT UNDERSTAND HOW EXACTLY THE PRICES VARY WITH CHANGES IN
THE INDEPENDENT VARIABLES.
WITH THE HELP OF THE BUILT MODEL, THEY CAN ACCORDINGLY ADJUST THE
DESIGN OF THE CARS AND THE BUSINESS STRATEGY TO MEET CERTAIN PRICE
LEVELS.
PROBLEM STATEMENT

THROUGH THE SUPPORT OF OUR MODEL, WE ARE TRYING TO RESOLVE A PROBLEM
WHERE THE COMPANY CAN INPUT THE DIFFERENT POSSIBLE VARIABLES, WHICH
ARE BASICALLY THE DETAILS OF THEIR NEW CAR TO BE MANUFACTURED, AND
CAN GET THE BEST POSSIBLE BASE PRICE FOR THAT CAR TO BE LAUNCHED
WITH IN THE MARKET.

IN THIS MODEL:
• WE CLEANED THE DATASET
• APPLIED CORRELATION
• APPLIED DUMMY ENCODING
• EXTRACTED TARGET AND FEATURE VARIABLES
• BUILT A MODEL TO GET THE OUTPUT OF PRICE WITH THE HELP OF
  DIFFERENT SUPPORTING INPUTS

THIS MODEL WILL HELP THE COMPANY IN:
• MINIMIZING THE DECISION EFFORTS IN ORDER TO REDUCE THE OPERATING
  COST FOR THEIR AUTOMOBILES
• GETTING AN EFFECTIVE PRICE WHICH CAN LATER BE ADJUSTED BASED ON
  LOCATION, TAXES AND OTHER FACTORS
CAR PRICE PREDICTION DATASET INFORMATION

DATASET CONTAINS 25 FEATURES

DATASET IS A COMBINATION OF 10 CATEGORICAL VARIABLES AND 15
NUMERICAL VARIABLES

DATASET CONTAINS MISSING VALUES IN WHEELBASE AND CARWIDTH
PREPROCESSING OF DATASET INFORMATION
REPLACING CARNAME WITH CARBRAND TO MAKE IT MORE APPEALING, AND
DROPPING THE CARNAME FEATURE

REPLACING MISSING VALUES WITH THE MEAN OF THAT FEATURE

CONVERTING CATEGORICAL DATA TO NUMERICAL DATA

REMOVING DUPLICATES
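
The preprocessing steps above could be sketched roughly as follows with pandas. The sample rows, the column spellings and the rule "brand = first word of CarName" are assumptions made for illustration; the slides do not show the actual code.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; column spellings are assumed, not taken from the real file.
df = pd.DataFrame({
    "CarName": ["audi 100 ls", "bmw 320i", "audi 100 ls", "toyota corolla"],
    "fueltype": ["gas", "gas", "gas", "diesel"],
    "wheelbase": [99.8, np.nan, 99.8, 95.7],
    "carwidth": [66.2, 64.8, 66.2, np.nan],
    "price": [13950.0, 16430.0, 13950.0, 7898.0],
})

# Derive CarBrand (assumed here to be the first word of CarName), then drop CarName.
df["CarBrand"] = df["CarName"].str.split().str[0]
df = df.drop(columns="CarName")

# Replace missing values with the mean of that feature.
for col in ["wheelbase", "carwidth"]:
    df[col] = df[col].fillna(df[col].mean())

# Convert a categorical column to numeric codes (one simple option among several).
df["fueltype"] = df["fueltype"].map({"gas": 0, "diesel": 1})

# Remove exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

print(df)
```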
CAR PRICE PREDICTION DATASET INFORMATION

PRICE DATA IS RIGHT SKEWED

PAIR PLOT HELPS US SEE BOTH THE DISTRIBUTION OF SINGLE VARIABLES
AND THE RELATIONSHIPS BETWEEN TWO VARIABLES

IT SHOWS BOTH SKEWED AND NORMALLY DISTRIBUTED VARIABLES IN THE
DATASET
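
One way to check skewness numerically, alongside the plots, is the sample skewness of each column. This is only a sketch on synthetic stand-in data (a lognormal "price" and a normal "carwidth"), not the project dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in data: lognormal values are right skewed,
# normal values are roughly symmetric.
df = pd.DataFrame({
    "price": rng.lognormal(mean=9.3, sigma=0.5, size=500),
    "carwidth": rng.normal(loc=66.0, scale=2.0, size=500),
})

# Positive skewness means a longer right tail; values near 0 mean symmetry.
skewness = df.skew()
print(skewness)
```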
OUTLIER INFORMATION
CATEGORICAL COLUMNS DISTRIBUTION ANALYSIS THROUGH ANOVA

CARBODY AND FUELTYPE HAVE P-VALUES > 0.05; ALL OTHER CATEGORICAL
FEATURES HAVE P-VALUES LESS THAN 0.05
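
A minimal sketch of a one-way ANOVA of price against each categorical column, using `scipy.stats.f_oneway`. The six rows below are made up so that one column separates the prices and the other does not; they are not the project's data, and the helper name `anova_p_value` is hypothetical.

```python
import pandas as pd
from scipy.stats import f_oneway

# Made-up rows for illustration only.
df = pd.DataFrame({
    "cylindernumber": ["four", "four", "four", "six", "six", "six"],
    "fueltype": ["gas", "diesel", "gas", "diesel", "gas", "diesel"],
    "price": [7000.0, 7500.0, 8000.0, 20000.0, 21000.0, 22000.0],
})


def anova_p_value(frame, category, target="price"):
    """Return the one-way ANOVA p-value of `target` grouped by `category`."""
    groups = [g[target].to_numpy() for _, g in frame.groupby(category)]
    return f_oneway(*groups).pvalue


p_values = {col: anova_p_value(df, col) for col in ["cylindernumber", "fueltype"]}

# Columns whose group means differ significantly at the 0.05 level.
significant = [col for col, p in p_values.items() if p < 0.05]
print(p_values, significant)
```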
CORRELATION OF DATASET INFORMATION

CORRELATION MEASURES THE RELATIONSHIP BETWEEN PAIRS OF VARIABLES

highwaympg and citympg => 0.97
enginesize and curbweight => 0.88
carlength and curbweight => 0.87
carlength and enginesize => 0.73
wheelbase and carlength => 0.86
citympg and horsepower => -0.87
horsepower and highwaympg => -0.85
horsepower and curbweight => 0.78
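
Pairs like the ones listed above can be pulled out of a correlation matrix programmatically. The sketch below uses synthetic columns; the 0.7 threshold and the `unrelated` column are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
curbweight = rng.normal(2500, 500, n)

# Synthetic data: enginesize and citympg are built from curbweight plus noise.
df = pd.DataFrame({
    "curbweight": curbweight,
    "enginesize": 0.05 * curbweight + rng.normal(0, 10, n),
    "citympg": 60 - 0.014 * curbweight + rng.normal(0, 2, n),
    "unrelated": rng.normal(5000, 400, n),  # hypothetical independent column
})

corr = df.corr()

# Keep only the upper triangle so each pair appears once (no self-pairs).
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()

# Pairs whose absolute correlation exceeds the (assumed) 0.7 threshold.
high = pairs[pairs.abs() > 0.7].sort_values(key=np.abs, ascending=False)
print(high)
```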
DUMMY ENCODING AND EXTRACTING TARGET AND FEATURE VARIABLES

WE DROPPED FEATURES WITH HIGH CORRELATION AND WENT FOR MULTIPLE
REGRESSION

ENCODING ALL CATEGORICAL VARIABLES THROUGH ONE-HOT ENCODING

EXTRACTING THE TARGET AND FEATURE VARIABLES
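
The three steps above might look like this in pandas. The rows are made up, and `drop_first=True` is an assumption (it drops one dummy column per category to avoid redundant columns); the slides do not say which setting was used.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas", "gas"],
    "carbody": ["sedan", "hatchback", "sedan", "wagon"],
    "citympg": [24, 30, 21, 19],
    "carlength": [176.6, 157.3, 176.8, 192.7],
    "wheelbase": [99.8, 93.7, 101.2, 105.8],
    "horsepower": [102, 68, 115, 140],
    "price": [13950.0, 7898.0, 16430.0, 18920.0],
})

# Drop the highly correlated features named in the observations.
df = df.drop(columns=["citympg", "carlength", "wheelbase"])

# One-hot (dummy) encode the categorical columns.
encoded = pd.get_dummies(
    df, columns=["fueltype", "carbody"], drop_first=True, dtype=int
)

# Extract the target and the feature matrix.
y = encoded["price"]
X = encoded.drop(columns="price")
print(X.columns.tolist())
```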
TRANSFORMING TARGET VARIABLE, SCALING THE VARIABLES AND BUILDING
THE MODEL

SPLITTING DATA INTO TRAINING AND TESTING

DEALS WITH DATA IMBALANCING (LIKE COMPRESSIONRATIO)

STANDARDSCALER SCALES EACH FEATURE/VARIABLE TO UNIT VARIANCE.
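
A minimal end-to-end sketch of this slide with scikit-learn, on synthetic data. The log transform of the target, the 80/20 split and the choice of `LassoCV` here are assumptions for illustration; the slides do not state which transform or split was used.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 200

# Synthetic features; the price depends on two of them and is right skewed.
X = rng.normal(size=(n, 5))
log_price = 9.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 0.05, n)
price = np.exp(log_price)

# Split into training and testing sets (assumed 80/20).
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# Transform the target (assumed: natural log, to reduce right skew).
y_train_log = np.log(y_train)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Lasso regression with the penalty strength chosen by cross-validation.
model = LassoCV(cv=5, random_state=0).fit(X_train_s, y_train_log)

# Predict in log space, then map back to the price scale for evaluation.
pred_price = np.exp(model.predict(X_test_s))
r2 = r2_score(y_test, pred_price)
mse = mean_squared_error(y_test, pred_price)
print(f"R squared: {r2:.3f}, MSE: {mse:.1f}")
```

Fitting the scaler on the training split only keeps information from the test set out of the model.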
MODELS
OBSERVATIONS

CARBODY AND FUELTYPE HAVE P-VALUES > 0.05; ALL OTHER CATEGORICAL
FEATURES HAVE P-VALUES LESS THAN 0.05

IMPUTING CYLINDERNUMBER WITH THE RESPECTIVE NUMBER GAVE A BAD MEAN
SQUARED ERROR FOR ALL MODELS; R SQUARED ALSO CHANGED FOR ALL MODELS

CITYMPG, CARLENGTH AND WHEELBASE SHOWED HIGH CORRELATION WITH EACH
OTHER, AND REMOVING THEM FROM THE MODEL GAVE GOOD PREDICTIONS

AFTER REMOVING THE CARBODY FEATURE FROM THE CATEGORICAL DATA BASED
ON ANOVA, WE FOUND LASSOCV PERFORMED AS THE BEST MODEL (INDICATING
IT WAS RESPONSIBLE FOR MULTICOLLINEARITY)

USING STANDARDSCALER, ANOVA AND SMOTE (IMBALANCE TECHNIQUE), WE
OBSERVED VERY LITTLE OVERFITTING AND FEWER OUTLIER AND DATA
IMBALANCE ISSUES, LEADING TO BETTER MODELS
THANK YOU