EXPLORATORY DATA ANALYSIS

TOTAL MARKS: 70    DURATION: 3 HOURS

Instructions
1. Candidates should answer all the questions in the same order provided in the question paper.
2. Any activity that compromises the integrity of the examination will not be permitted.
3. Candidates should complete the examination within the provided timeline.
4. Candidates are expected to check and ensure that the correct answer file (in .ipynb format) is
uploaded to the LMS.

SECTION A: 20 MARKS

Q1. Read the file 'Automobile_data.csv' and answer the following questions: (5 Marks)
A. For the dataset given below, write code to remove the hyphen (-) and change the datatype of the
column to numeric. (2 Marks)
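A minimal sketch of one possible approach to part A, assuming pandas. The table the question refers to is not reproduced in this text, so the column name `phone` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the column name and values are assumptions.
df = pd.DataFrame({"phone": ["98-765", "12-345-6", "400"]})

# Strip the literal hyphen, then convert the cleaned strings to numbers.
df["phone"] = pd.to_numeric(df["phone"].str.replace("-", "", regex=False))
```

Passing `regex=False` treats the hyphen as a literal character rather than a pattern.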

B. For the dataset given below, write code to convert the 'N' category to 0 and the 'P' category to 1 in the
Shortlisted column. (1 Mark)
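A minimal sketch for part B, assuming pandas and a column named `Shortlisted` that holds only 'N' and 'P'. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for the Shortlisted column.
df = pd.DataFrame({"Shortlisted": ["N", "P", "P", "N"]})

# Map each category to its numeric code; unmapped values would become NaN.
df["Shortlisted"] = df["Shortlisted"].map({"N": 0, "P": 1})
```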

C. For the dataset given below, create a calculated field 'Male Ratio' that gives the ratio of the male
population to the total population. (2 Marks)
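A minimal sketch for part C, assuming pandas. The column names `Male` and `Female` are hypothetical, and the total population is assumed to be their sum; if the dataset already has a total column, divide by that instead.

```python
import pandas as pd

# Hypothetical population counts; column names are assumptions.
df = pd.DataFrame({"Male": [60, 45], "Female": [40, 55]})

# Ratio of male population to total population, computed row by row.
df["Male Ratio"] = df["Male"] / (df["Male"] + df["Female"])
```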

Q2. Read the dataset (German Credit Data.csv) and answer the questions below. (5 Marks)
A. Draw the count plot for the 'Status' column. (1 Mark)
B. Split the dataset into train and test sets, and give the reason behind your choice of split. (2 Marks)
C. Is the data imbalanced? If so, what types of sampling methods can be used? Write the code for
any one type of sampling (no need to execute). (2 Marks)
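As an illustration of one sampling method for part C, the sketch below does random oversampling of the minority class with plain pandas. The column names and data are hypothetical; undersampling the majority class is an alternative that is not shown.

```python
import pandas as pd

# Hypothetical imbalanced data: 8 rows of class 0, 2 rows of class 1.
df = pd.DataFrame({"status": [0] * 8 + [1] * 2, "amount": range(10)})

counts = df["status"].value_counts()
minority = counts.idxmin()
n_extra = counts.max() - counts.min()

# Draw extra minority rows with replacement until both classes match.
extra = df[df["status"] == minority].sample(n=n_extra, replace=True, random_state=0)
balanced = pd.concat([df, extra], ignore_index=True)
```

Resampling is normally applied to the training split only, so that duplicated rows do not leak into the test set.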
Q3. Read the dataset (German Credit Data.csv) and answer the questions below. (5 Marks)
A. Draw the count plot for the 'Checkin_acc' column. (1 Mark)
B. What does the distribution of the 'Age' column look like? Perform a test of normality. (2 Marks)
C. How do you handle object (categorical) variables? Write the code for encoding them. (2 Marks)
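One common way to encode object columns for part C is one-hot encoding with `pd.get_dummies`. This is a sketch under assumptions: the category labels and ages below are hypothetical, and label or ordinal encoding may suit some columns better.

```python
import pandas as pd

# Hypothetical rows; only the column names echo the question.
df = pd.DataFrame({"Age": [25, 40, 33], "Checkin_acc": ["A11", "A12", "A11"]})

# One-hot encode the object column; drop_first avoids a redundant dummy.
encoded = pd.get_dummies(df, columns=["Checkin_acc"], drop_first=True)
```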
Q4. Read the dataset (bank.csv) and answer the questions below. (5 Marks)
A. Check for null values. (1 Mark)
B. Treat the null values and give the reason for the method used. (2 Marks)
C. Check the spellings in the dataframe and correct them where needed. (2 Marks)
SECTION B: 20 MARKS
Q5. Read the dataset (beer.csv) and answer the questions below. (10 Marks)
A. Check for outliers. How would you treat them? (5 Marks)
B. Check the spelling of the brands after removing the alphanumeric values. (5 Marks)
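For part A, one widely used check is the 1.5 x IQR rule, with capping as one possible treatment. The sketch below assumes pandas and uses hypothetical values; other rules (z-scores, domain limits) and other treatments (removal, transformation) are equally valid answers.

```python
import pandas as pd

# Hypothetical numeric column with one extreme value.
s = pd.Series([10, 12, 11, 13, 12, 95])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences, then cap them at the fences.
outliers = s[(s < lower) | (s > upper)]
capped = s.clip(lower, upper)
```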

Q6. Read the dataset (IPL.csv) and answer the questions below. (10 Marks)
A. Which player got the maximum premium (sold price over base price)? What is the average SOLD
PRICE for each 'age' category? (5 Marks)
B. What are the outliers in sold price? Filter out the outliers and display the name of the player, sold
price, and playing role. Who are the highest-sold players? (5 Marks)
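A minimal sketch for part A of Q6, assuming pandas and reading "premium" as sold price minus base price. The column names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical auction data; names and figures are assumptions.
df = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Age": [1, 2, 2, 3],
    "Base Price": [100, 200, 50, 400],
    "Sold Price": [300, 250, 450, 400],
})

# Premium paid over the base price, and the player with the largest one.
df["Premium"] = df["Sold Price"] - df["Base Price"]
top_player = df.loc[df["Premium"].idxmax(), "Player"]

# Mean sold price within each age category.
avg_by_age = df.groupby("Age")["Sold Price"].mean()
```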

SECTION C: 30 MARKS
Q7. Read the dataset (bollywood.csv) and answer the questions below. (15 Marks)
A. Is there any relationship between genre and release time? (5 Marks)
B. Which movie made the highest profit, and which genre has the highest budget? (5 Marks)
C. Which year has the highest box office collection? (5 Marks)

Q8. Read the dataset (GLAXO.csv) and answer the questions below. (15 Marks)
A. Create new columns by splitting the date column into Day, Month and Year. (5 Marks)
B. What was the highest daily swing in the price? Hint: Daily Swing = Price High - Price Low. (5 Marks)
C. Check the distribution of the close price. What type of transformation can be applied? (5 Marks)
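Parts A and B of Q8 can be sketched as follows, assuming pandas. The column names `Date`, `High` and `Low` and the two sample rows are hypothetical; the swing follows the hint in part B.

```python
import pandas as pd

# Hypothetical price rows; column names and values are assumptions.
df = pd.DataFrame({
    "Date": ["2010-01-04", "2010-01-05"],
    "High": [1625.0, 1639.0],
    "Low": [1616.0, 1611.0],
})

# Part A: parse the date, then split it into Day, Month and Year columns.
df["Date"] = pd.to_datetime(df["Date"])
df["Day"] = df["Date"].dt.day
df["Month"] = df["Date"].dt.month
df["Year"] = df["Date"].dt.year

# Part B: daily swing = high - low; locate the row with the largest swing.
df["Daily Swing"] = df["High"] - df["Low"]
max_swing_row = df.loc[df["Daily Swing"].idxmax()]
```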