
Machine Learning

Name: Bhushan Rai


PGP-DSBA
Date: 07/08/2022

Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Table of Contents


Problem 1:

You are hired by one of the leading news channels, CNBE, which wants to analyze recent elections. The survey was
conducted on 1525 voters with 9 variables. You have to build a model to predict which party a voter will vote for
on the basis of the given information, in order to create an exit poll that will help predict the overall winner and the seats
won by each party.

Data Ingestion:
1.1 Read the dataset. Compute descriptive statistics and check for null values. Write an inference on it.
1.2 Perform Univariate and Bivariate Analysis. Do exploratory data analysis. Check for Outliers.

Data Preparation:
1.3 Encode the data (having string values) for Modelling. Is Scaling necessary here or not? Data Split: Split the data
into train and test (70:30).

Modeling:

1.4 Apply Logistic Regression and LDA (linear discriminant analysis).


1.5 Apply KNN Model and Naïve Bayes Model. Interpret the results.
1.6 Model Tuning, Bagging (Random Forest should be applied for Bagging), and Boosting.
1.7 Performance Metrics: Check the performance of predictions on the train and test sets using Accuracy, Confusion
Matrix, ROC curve plots, and the ROC_AUC score for each model. Final Model: Compare the models and write an
inference on which model is best/optimized.
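The modelling and evaluation steps in 1.4 to 1.7 can be sketched with scikit-learn. This is a minimal sketch under stated assumptions. `make_classification` generates synthetic data that stands in for the encoded election features, so no survey data is used. Gradient Boosting is one possible choice for the boosting step, since the task does not name an algorithm. The KNN grid and other hyperparameters are illustrative and are not tuned values from this report.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# SYNTHETIC stand-in for the encoded election data (not the real survey).
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y)  # 70:30 split

# 1.6 model tuning: a small grid search over k for KNN (illustrative grid).
knn_grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]},
                        cv=5, scoring="accuracy").fit(X_train, y_train)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN (tuned)": knn_grid.best_estimator_,
    "Naive Bayes": GaussianNB(),
    "Random Forest (bagging)": RandomForestClassifier(n_estimators=100, random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {}
    for split, X_s, y_s in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = model.predict(X_s)
        proba = model.predict_proba(X_s)[:, 1]  # P(class 1), needed for ROC
        fpr, tpr, _ = roc_curve(y_s, proba)     # points for plotting the ROC curve
        results[name][split] = {
            "accuracy": accuracy_score(y_s, pred),
            "confusion": confusion_matrix(y_s, pred),
            "roc_auc": roc_auc_score(y_s, proba),
            "roc_points": (fpr, tpr),
        }

for name, r in results.items():
    print(f"{name:25s} train acc {r['train']['accuracy']:.3f}  "
          f"test acc {r['test']['accuracy']:.3f}  test AUC {r['test']['roc_auc']:.3f}")
```

Comparing train and test scores for each model helps spot overfitting. A large gap, which is common for unpruned tree ensembles, suggests the test figures are the ones to use when choosing the final model.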

Inference:
1.8 Based on these predictions, what are the insights?

Problem 2: 

In this project, we work with the inaugural corpus from NLTK in Python. We look
at the following speeches by Presidents of the United States of America:

1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973

2.1 Find the number of characters, words, and sentences in each of these speeches.
2.2 Remove all the stopwords from all three speeches.
2.3 Which word occurs the most number of times in each president's inaugural address? Mention the
top three words (after removing the stopwords).
2.4 Plot the word cloud of each speech (after removing the stopwords), referring to the End-to-End Case Study
done in the Mentored Learning Session.
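Tasks 2.1 to 2.3 reduce to counting and filtering tokens. In NLTK, the speeches can be loaded with `nltk.corpus.inaugural.raw(...)` using file ids such as `"1941-Roosevelt.txt"`, `"1961-Kennedy.txt"` and `"1973-Nixon.txt"`, after running `nltk.download("inaugural")`. The sketch below avoids any downloads so it runs offline, and it rests on two assumptions. First, it uses a simple regex tokenizer instead of NLTK's `word_tokenize`/`sent_tokenize`. Second, `STOPWORDS` is a small hypothetical set standing in for `nltk.corpus.stopwords.words("english")`. Its counts will therefore differ from NLTK's, which, for example, treats punctuation marks as tokens.

```python
import re
from collections import Counter

# HYPOTHETICAL mini stopword list; the report would use NLTK's English stopwords.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "our", "it",
             "that", "for", "be", "not", "but", "us", "this", "which", "with", "as", "are"}

WORD_RE = re.compile(r"[A-Za-z']+")


def text_stats(text):
    """Return (characters incl. spaces, words, sentences) using regex rules."""
    words = WORD_RE.findall(text)
    # A sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return len(text), len(words), len(sentences)


def top_words(text, n=3, stopwords=STOPWORDS):
    """Most frequent lowercase words after removing stopwords."""
    tokens = [w.lower() for w in WORD_RE.findall(text)]
    kept = [t for t in tokens if t not in stopwords]
    return Counter(kept).most_common(n)


sample = "We defend freedom. Freedom is our duty! We keep freedom and duty alive."
print(text_stats(sample))
print(top_words(sample))  # [('freedom', 3), ('duty', 2), ('defend', 1)]
```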

Executive Summary

You are hired by one of the leading news channels, CNBE, which wants to analyze recent elections. The survey was
conducted on 1525 voters with 9 variables. You have to build a model to predict which party a voter will vote for
on the basis of the given information, in order to create an exit poll that will help predict the overall winner and the seats
won by each party.

Introduction

1.1 Read the dataset. Compute descriptive statistics and check for null values. Write an inference on it.
Sample of the dataset:

Table 1. Dataset
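The checks in 1.1 (data types, descriptive statistics, null counts) can be sketched in pandas as follows. This assumes the survey arrives as a CSV whose first column is an unnamed index. The rows in `csv_text` are made-up placeholders that reuse the column names discussed in this report. They are not the survey data, and the real file name is not given here.

```python
import io

import pandas as pd

# MADE-UP rows with the report's column names; the real file has 1525 rows.
csv_text = """Unnamed: 0,vote,age,economic.cond.national,economic.cond.household,Blair,Hague,Europe,political.knowledge,gender
1,Labour,50,3,4,4,2,5,1,female
2,Conservative,67,2,3,2,4,9,3,male
3,Labour,31,4,3,5,1,3,2,female
"""

df = pd.read_csv(io.StringIO(csv_text))
# Drop the index-like "Unnamed" column left over from the export.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

print(df.shape)                       # (rows, variables)
print(df.dtypes)                      # categorical (object) vs numeric columns
print(df.describe(include="all").T)   # descriptive statistics for every column
print(df.isnull().sum())              # null count per column
```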

Insights:

The dataset consists of 1525 rows and 9 variables of categorical and numerical
types; we dropped the unnamed index column.

The vote variable has two parties, Labour and Conservative, and voters fall into
two gender groups, male and female.
The minimum voter age is 24 and the maximum is 93, with a mean of 53 years.
economic.cond.national and economic.cond.household both have an average
rating of 3.
The Blair assessment ranges from 1 to 5 with an average of 4, and the Hague
assessment ranges from 1 to 5 with an average of 2.

The Europe assessment ranges from 1 to 11 with an average of 6, while
political.knowledge ranges from 0 to 3 with an average of 2.

The dataset contains duplicate rows, which we can remove. The counts also
show more female voters than male voters.
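Removing duplicates and counting voters by gender takes a few pandas calls. This minimal sketch uses a made-up three-column frame, not the survey data. The second and third rows are deliberately identical so that one duplicate is found.

```python
import pandas as pd

# MADE-UP rows (subset of columns); rows 1 and 2 are exact duplicates.
df = pd.DataFrame({
    "vote": ["Labour", "Conservative", "Conservative", "Labour"],
    "age": [40, 61, 61, 29],
    "gender": ["female", "male", "male", "female"],
})

n_dupes = df.duplicated().sum()               # rows identical to an earlier row
df = df.drop_duplicates().reset_index(drop=True)
gender_counts = df["gender"].value_counts()   # voters per gender after de-duplication

print(n_dupes, len(df))   # 1 3
print(gender_counts)
```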

1.2 Perform Univariate and Bivariate Analysis. Do exploratory data analysis. Check for Outliers.

Univariate Analysis:

The minimum voter age is 24 and the maximum is 93, with a mean of 53 years.
economic.cond.national and economic.cond.household both have an average
rating of 3.
The Blair assessment ranges from 1 to 5 with an average of 4, and the Hague
assessment ranges from 1 to 5 with an average of 2.

The Europe assessment ranges from 1 to 11 with an average of 6, while
political.knowledge ranges from 0 to 3 with an average of 2.
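The per-variable minimum, maximum and mean quoted above can be produced in one table, with skewness added as a quick univariate shape check. This is a minimal pandas sketch on made-up values for three of the report's numeric columns. The numbers are not the survey's.

```python
import pandas as pd

# MADE-UP values for three numeric columns named as in the report.
df = pd.DataFrame({
    "age": [24, 35, 53, 60, 93],
    "Blair": [1, 4, 4, 5, 2],
    "Europe": [1, 6, 11, 3, 8],
})

# One row per variable with its min, max and mean.
summary = df.agg(["min", "max", "mean"]).T
summary["skew"] = df.skew()   # sign shows the direction of the longer tail
print(summary)
```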

Multivariate and Bivariate Analysis:

Insights:
Most variables are not strongly correlated with each other. The notable
exception is a negative correlation between the Blair and Hague assessments:
voters who rate Blair highly tend to rate Hague low.
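The Blair/Hague relationship can be checked numerically with a Pearson correlation, and a crosstab gives a simple bivariate view of vote against a rating. This is a minimal pandas sketch on made-up ratings chosen to move in opposite directions. It is not the survey data.

```python
import pandas as pd

# MADE-UP ratings in which Blair and Hague scores move in opposite directions.
df = pd.DataFrame({
    "Blair": [5, 4, 4, 2, 1, 5],
    "Hague": [1, 2, 2, 4, 5, 2],
    "vote": ["Labour", "Labour", "Labour", "Conservative", "Conservative", "Labour"],
})

corr = df[["Blair", "Hague"]].corr()        # Pearson correlation matrix
ct = pd.crosstab(df["vote"], df["Blair"])   # vote counts by Blair rating

print(corr)
print(ct)
```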

Check for Outliers:

Outliers are present in two variables, economic.cond.national and
economic.cond.household, and we have treated them.
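The report does not say which outlier treatment was applied. A common option is IQR capping, which clips values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR] to those bounds. The sketch below assumes that method. The helper `cap_outliers_iqr` and the sample ratings are hypothetical and made up for illustration.

```python
import pandas as pd


def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """HYPOTHETICAL helper: clip values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# MADE-UP 1-5 ratings with a single low value flagged as an outlier.
econ = pd.Series([3, 3, 4, 4, 3, 4, 3, 1])
capped = cap_outliers_iqr(econ)
print(capped.tolist())   # the 1 is raised to the lower bound, 1.5
```

On an ordinal 1 to 5 scale, capping can turn a legitimate rating into a non-integer value such as 1.5. Whether to cap, or keep the value unchanged, is therefore a judgement call for these variables.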

1.3 Encode the data (having string values) for Modelling. Is Scaling necessary here or not? Data Split: Split the data into train and test (70:30).

We used dummy (one-hot) encoding for the string-valued variables.
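Dummy encoding, the 70:30 split and optional scaling can be sketched with pandas and scikit-learn. This is a minimal sketch under several assumptions. It uses `pd.get_dummies` with `drop_first=True` (one column fewer per categorical variable). It treats `vote_Labour` as the target, fixes `random_state=1`, stratifies the split, and runs on made-up rows. Scaling mainly matters for distance-based models such as KNN, so the scaler is fitted on the training set only, which keeps test-set information out of it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# MADE-UP rows using a subset of the report's column names.
df = pd.DataFrame({
    "vote": ["Labour", "Conservative", "Labour", "Labour", "Conservative",
             "Labour", "Conservative", "Labour", "Labour", "Conservative"],
    "age": [43, 61, 35, 24, 70, 52, 48, 39, 66, 57],
    "Blair": [4, 2, 5, 4, 1, 4, 2, 5, 3, 2],
    "gender": ["female", "male", "male", "female", "male",
               "female", "female", "male", "female", "male"],
})

# Dummy encoding: drop_first=True keeps k-1 indicator columns per variable.
encoded = pd.get_dummies(df, columns=["vote", "gender"], drop_first=True, dtype=int)
X = encoded.drop(columns="vote_Labour")
y = encoded["vote_Labour"]            # 1 = Labour, 0 = Conservative

# 70:30 train/test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y)

# Fit the scaler on training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
print(X.columns.tolist(), X_train.shape, X_test.shape)
```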

