PredictiveModelling Bhushanrai 03072022
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Table of Contents
Problem 1:
You are hired by CNBE, one of the leading news channels, which wants to analyze recent elections. The survey was
conducted on 1525 voters with 9 variables. You have to build a model to predict which party a voter will vote for
on the basis of the given information, in order to create an exit poll that will help predict the overall win and
the seats covered by a particular party.
Data Ingestion:
1.1 Read the dataset. Do the descriptive statistics and do the null value condition check. Write an inference on it.
1.2 Perform Univariate and Bivariate Analysis. Do exploratory data analysis. Check for Outliers.
Data Preparation:
1.3 Encode the data (having string values) for Modelling. Is Scaling necessary here or not? Data Split: Split the data
into train and test (70:30).
Modeling:
Inference:
1.8 Based on these predictions, what are the insights?
Problem 2:
In this project, we work on the inaugural corpus from nltk in Python. We will be looking
at the following speeches of the Presidents of the United States of America:
2.1 Find the number of characters, words, and sentences for the mentioned documents.
2.2 Remove all the stopwords from all three speeches.
2.3 Which word occurs most often in each president's inaugural address? Mention the top three words (after
removing the stopwords).
2.4 Plot the word cloud of each of the speeches (after removing the stopwords). Refer to the End-to-End Case
Study done in the Mentored Learning Session.
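Tasks 2.1 to 2.3 can be sketched with the standard library alone. This is only an illustration under stated assumptions: the `speech` text and the small `STOPWORDS` set are invented placeholders, not the inaugural speeches or the nltk stopword list the project itself would use, and the regular expressions are a simple stand-in for a proper tokenizer.

```python
import re
from collections import Counter

# Hypothetical text standing in for one speech (not taken from the corpus)
speech = ("The people want peace. The people want work, and the nation "
          "will work for peace. Peace is our goal!")

# Hypothetical, tiny stopword set; the project would use a full stopword list
STOPWORDS = {"the", "and", "for", "is", "our", "will"}

# 2.1 Characters, words, and sentences
n_chars = len(speech)
words = re.findall(r"[A-Za-z']+", speech)
sentences = re.split(r"(?<=[.!?])\s+", speech.strip())
n_words = len(words)
n_sentences = len(sentences)

# 2.2 Remove stopwords (case-insensitive)
filtered = [w.lower() for w in words if w.lower() not in STOPWORDS]

# 2.3 Top three words after stopword removal
top_three = Counter(filtered).most_common(3)

print(n_chars, n_words, n_sentences)
print(top_three)
```

Lowercasing before counting keeps "Peace" and "peace" from being treated as different words, which would otherwise distort the top-three ranking.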
Executive Summary
You are hired by CNBE, one of the leading news channels, which wants to analyze recent elections. The survey was
conducted on 1525 voters with 9 variables. You have to build a model to predict which party a voter will vote for
on the basis of the given information, in order to create an exit poll that will help predict the overall win and
the seats covered by a particular party.
Introduction
1.1 Read the dataset. Do the descriptive statistics and do the null value condition check. Write an inference on it.
Sample of the dataset:
Table 1. Dataset
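The steps in 1.1 can be sketched as follows. This is a minimal illustration, assuming pandas is available; the small in-memory table and its column names (`vote`, `age`, `economic.cond.national`, `Blair`, `Hague`, `gender`) are hypothetical stand-ins for the survey file, which is not reproduced here.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the survey file (values invented)
csv_text = """vote,age,economic.cond.national,Blair,Hague,gender
Labour,43,3,4,1,female
Labour,36,4,4,4,male
Conservative,35,4,5,2,male
Labour,24,4,2,1,female
Conservative,41,2,1,1,male
"""

df = pd.read_csv(io.StringIO(csv_text))

# Shape and column types
n_rows, n_cols = df.shape
print(df.dtypes)

# Descriptive statistics for the numeric columns
summary = df.describe()
print(summary)

# Null-value check: number of missing values per column
null_counts = df.isnull().sum()
print(null_counts)

# Duplicate rows are often checked alongside nulls
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)
```

With the real file, `pd.read_csv` or `pd.read_excel` would take the file path in place of the in-memory text.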
Insights:
1.2 Perform Univariate and Bivariate Analysis. Do exploratory
data analysis. Check for Outliers.
Univariate Analysis:
The minimum voter age is 24 years and the maximum is 93
years, with a mean of 53 years.
For economic.cond.national and household, the average is 3
(the minimum and maximum values are not given).
For blair, the minimum is 1, the maximum is 5, and the average is
4; the blair count has a minimum of 1, a maximum of 5, and an average of 2.
Multivariate and Bivariate Analysis:
Insights:
The variables are mostly not correlated. Blair and Hague are
negatively correlated: voters who favor Blair tend not to vote for
Hague, so the two variables move in opposite directions.
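The negative relationship between the Blair and Hague ratings noted above can be checked with a correlation matrix. The sketch below assumes pandas and uses invented ratings, so the printed value is illustrative only and is not the figure from the survey.

```python
import pandas as pd

# Hypothetical ratings invented for illustration
ratings = pd.DataFrame({
    "Blair": [5, 4, 4, 2, 1, 2],
    "Hague": [1, 2, 3, 4, 5, 3],
})

# Pearson correlation is the default method of DataFrame.corr
corr = ratings.corr()
blair_hague = corr.loc["Blair", "Hague"]

print(corr.round(2))
```

A value close to -1 means that high ratings for one leader go together with low ratings for the other; values near 0 indicate little linear relationship.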
Check for Outliers:
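One common way to check for outliers is the 1.5 x IQR rule that boxplots are based on. The sketch below assumes pandas; the `ages` values are invented, and the rule itself is an assumption about the method, since the original plots are not reproduced here.

```python
import pandas as pd

# Hypothetical ages invented for illustration; 130 is a deliberately extreme value
ages = pd.Series([24, 35, 41, 47, 53, 58, 62, 70, 93, 130], name="age")

# Quartiles and interquartile range
q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Boxplot whisker limits
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the whiskers are flagged as outliers
outliers = ages[(ages < lower) | (ages > upper)]

print(lower, upper)
print(outliers.tolist())
```

The same limits can be computed per column to count how many values each numeric variable has outside its whiskers.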
1.3 Encode the data (having string values) for Modelling. Is
Scaling necessary here or not? Data Split: Split the data into
train and test (70:30).
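The encoding and the 70:30 split can be sketched as below. This is a minimal illustration assuming pandas; the table, its values, and the column names are hypothetical, and sampling rows with pandas is only one way to split (scikit-learn's `train_test_split` is a common alternative).

```python
import pandas as pd

# Hypothetical voters (values invented for illustration)
df = pd.DataFrame({
    "vote":   ["Labour", "Conservative", "Labour", "Labour", "Conservative",
               "Labour", "Conservative", "Labour", "Labour", "Conservative"],
    "gender": ["female", "male", "male", "female", "male",
               "female", "female", "male", "female", "male"],
    "age":    [43, 36, 35, 24, 41, 47, 57, 77, 39, 70],
    "Blair":  [4, 4, 5, 2, 1, 4, 4, 4, 4, 1],
})

# Encode each string column as integer codes (categories are sorted alphabetically)
encoded = df.copy()
for col in ["vote", "gender"]:
    encoded[col] = pd.Categorical(encoded[col]).codes

# Separate predictors and target
X = encoded.drop(columns="vote")
y = encoded["vote"]

# 70:30 split: sample 70% of the rows for training, keep the rest for testing
X_train = X.sample(frac=0.70, random_state=1)
X_test = X.drop(X_train.index)
y_train = y.loc[X_train.index]
y_test = y.loc[X_test.index]

print(X_train.shape, X_test.shape)
```

Whether scaling is needed depends on the model: distance-based methods such as KNN are sensitive to age having a much wider range than ratings on a 1 to 5 scale, while tree-based models are not affected by it.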