Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 5

MODULE – 7 ASSIGNMENT

Python for data analytics

Please implement Python coding for all the problems.

1) Please take care of missing data present in the “Data.csv” file using python module
“sklearn.impute” and its methods, also collect all the data that has “Salary” less than “70,000”.

2) Subtracting dates:
Python date objects let us treat calendar dates as something similar to numbers: we can
compare them, sort them, add, and even subtract them. Do math with dates in a way that
would be a pain to do by hand. The 2007 Florida hurricane season was one of the busiest on
record, with 8 hurricanes in one year. The first one hit on May 9th, 2007, and the last one hit
on December 13th, 2007. How many days elapsed between the first and last hurricane in
2007?

Instructions:

Import date from datetime.

Create a date object for May 9th, 2007, and assign it to the start variable.

Create a date object for December 13th, 2007, and assign it to the end variable.

Subtract start from end, to print the number of days in the resulting timedelta object.

3) Representing dates in different ways

Date objects in Python have a great number of ways they can be printed out as strings. In
some cases, you want to know the date in a clear, language-agnostic format. In other cases,
you want something which can fit into a paragraph and flow naturally.

Print out the same date, August 26, 1992 (the day that Hurricane Andrew made landfall in
Florida), in a number of different ways, by using the “ .strftime() ” method. Store it in a
variable called “Andrew”.

Instructions:

Print it in the format 'YYYY-MM', 'YYYY-DDD' and 'MONTH (YYYY)'

4) For the dataset “Indian_cities”,


a) Find out top 10 states in female-male sex ratio

© 360DigiTMG. All Rights Reserved.


b) Find out top 10 cities in total number of graduates
c) Find out top 10 cities and their locations in respect of total effective_literacy_rate.

5) For the data set “Indian_cities”


a) Construct histogram on literates_total and comment about the inferences
b) Construct scatter plot between male graduates and female graduates

6) For the data set “Indian_cities”


a) Construct Boxplot on total effective literacy rate and draw inferences
b) Find out the number of null values in each column of the dataset and delete them.

1)

import pandas as pd

from sklearn.impute import SimpleImputer

from sklearn.compose import ColumnTransformer

data = pd.read_csv(r"D:\Data.csv")

print("Number of missing values in each column:\n",data.isna().sum())

numerical_imputer = SimpleImputer(strategy = "mean")

categorical_imputer = SimpleImputer(strategy="constant" , fill_value = "missing")

categorical_columns = ["Position" , "State", "Sex", "MaritalDesc", "CitizenDesc", "EmploymentStatus",


"Department", "Race"]

numerical_columns = ["Salaries" , "age"]

transformer = ColumnTransformer([

("categorical_imputer" , categorical_imputer , categorical_columns),

("numerical_imputer" , numerical_imputer , numerical_columns)])

filled = transformer.fit_transform(data)

data_filled = pd.DataFrame(filled , columns=["Position" , "State", "Sex", "MaritalDesc", "CitizenDesc",


"EmploymentStatus", "Department", "Salaries", "age", "Race"])

print("After imputing:\n",data_filled.isna().sum())

© 360DigiTMG. All Rights Reserved.


print(data_filled)

2) subtracting dates

from datetime import date

end_date = date(2007, 12, 13)

start_date = date(2007, 5, 9)

timedelta = end_date - start_date

print(f"The number of days elapsed between first and last hurricane in 2007 is {timedelta.days} days")

3) Representing dates in different ways

from datetime import date

andrew = date(1992, 8, 26)

print(andrew.strftime("%Y-%m")) #printing date in 'YYYY-MM'

print(andrew.strftime("%Y-%j")) #printing date in 'YYYY-DDD'

print(andrew.strftime("%B (%Y)")) #printing date in 'MONTH (YYYY)'

4)

a) Find out top 10 states in female-male sex ratio

import pandas as pd

df = pd.read_csv(r'D:\Indian_cities.csv')

df1 = df.sort_values(by = ['sex_ratio'], ascending = False)

df1.state_name.head(10)

b) Find out top 10 cities in total number of graduates

import pandas as pd

df = pd.read_csv(r'D:\Indian_cities.csv')

df2 = df.sort_values(by = ['total_graduates'], ascending = False)

© 360DigiTMG. All Rights Reserved.


df2.name_of_city.head(10)

c)Find out top 10 cities and their locations in respect of total effective_literacy_rate.

import pandas as pd

df = pd.read_csv(r'D:\Indian_cities.csv')

df3 = df.sort_values(by = ['effective_literacy_rate_total'], ascending = False)

df3[['name_of_city', 'location']].head(10)

5)

a) histogram

import pandas as pd

import matplotlib.pyplot as plt

df = pd.read_csv(r'D:\Indian_cities.csv')

plt.hist(data = df,x='literates_total', histtype='bar',rwidth=0.8,color='blue')

plt.xlabel('literates_total')

plt.show()

b) scatter plot

import pandas as pd

import matplotlib.pyplot as plt

df = pd.read_csv(r'D:\Indian_cities.csv')

plt.scatter(data = df,x = 'male_graduates', y = 'female_graduates')

plt.xlabel('male_graduates')

plt.ylabel('female_graduates')

© 360DigiTMG. All Rights Reserved.


plt.show()

6)

a) box plot

import pandas as pd

import seaborn as sns

df = pd.read_csv(r'D:\Indian_cities.csv')

sns.boxplot(df.effective_literacy_rate_total)

b) null values

import pandas as pd

df = pd.read_csv(r'D:\Indian_cities.csv')

df.isnull().sum()

df.dropna()

© 360DigiTMG. All Rights Reserved.

You might also like