Module 7 Python For Data Analytics Assignment

MODULE – 7 ASSIGNMENT
Python for data analytics
Please implement Python coding for all the problems.
1) Please take care of missing data present in the “Data.csv” file using python module
“sklearn.impute” and its methods, also collect all the data that has “Salary” less than “70,000”.
2) Subtracting dates:
Python date objects let us treat calendar dates as something similar to numbers: we can
compare them, sort them, add, and even subtract them. Do math with dates in a way that
would be a pain to do by hand. The 2007 Florida hurricane season was one of the busiest on
record, with 8 hurricanes in one year. The first one hit on May 9th, 2007, and the last one hit
on December 13th, 2007. How many days elapsed between the first and last hurricane in
2007?
Instructions:
Import date from datetime.
Create a date object for May 9th, 2007, and assign it to the start variable.
Create a date object for December 13th, 2007, and assign it to the end variable.
Subtract start from end, to print the number of days in the resulting timedelta object.
3) Representing dates in different ways
Date objects in Python have a great number of ways they can be printed out as strings. In
some cases, you want to know the date in a clear, language-agnostic format. In other cases,
you want something which can fit into a paragraph and flow naturally.
Print out the same date, August 26, 1992 (the day that Hurricane Andrew made landfall in
Florida), in a number of different ways, by using the “ .strftime() ” method. Store it in a
variable called “Andrew”.
Instructions:
Print it in the format 'YYYY-MM', 'YYYY-DDD' and 'MONTH (YYYY)'
4) For the dataset “Indian_cities”,

a) Find out top 10 states in female-male sex ratio
© 360DigiTMG. All Rights Reserved.

b) Find out top 10 cities in total number of graduates
c) Find out top 10 cities and their locations in respect of total effective_literacy_rate.
5) For the data set “Indian_cities”

a) Construct histogram on literates_total and comment about the inferences
b) Construct scatter plot between male graduates and female graduates
6) For the data set “Indian_cities”

a) Construct Boxplot on total effective literacy rate and draw inferences
b) Find out the number of null values in each column of the dataset and delete them.
1)
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
data = pd.read_csv(r"D:\Data.csv")
print("Number of missing values in each column:\n",data.isna().sum())
numerical_imputer = SimpleImputer(strategy = "mean")
categorical_imputer = SimpleImputer(strategy="constant" , fill_value = "missing")
categorical_columns = ["Position" , "State", "Sex", "MaritalDesc", "CitizenDesc", "EmploymentStatus",

"Department", "Race"]
numerical_columns = ["Salaries" , "age"]
transformer = ColumnTransformer([
("categorical_imputer" , categorical_imputer , categorical_columns),
("numerical_imputer" , numerical_imputer , numerical_columns)])
filled = transformer.fit_transform(data)
data_filled = pd.DataFrame(filled , columns=["Position" , "State", "Sex", "MaritalDesc", "CitizenDesc",

"EmploymentStatus", "Department", "Salaries", "age", "Race"])
print("After imputing:\n",data_filled.isna().sum())

print(data_filled)
2) subtracting dates
from datetime import date
end_date = date(2007, 12, 13)
start_date = date(2007, 5, 9)
timedelta = end_date - start_date
print(f"The number of days elapsed between first and last hurricane in 2007 is {timedelta.days} days")
3) Representing dates in different ways
from datetime import date
andrew = date(1992, 8, 26)
print(andrew.strftime("%Y-%m")) #printing date in 'YYYY-MM'
print(andrew.strftime("%Y-%j")) #printing date in 'YYYY-DDD'
print(andrew.strftime("%B (%Y)")) #printing date in 'MONTH (YYYY)'
4)
a) Find out top 10 states in female-male sex ratio
import pandas as pd
df = pd.read_csv(r'D:\Indian_cities.csv')
df1 = df.sort_values(by = ['sex_ratio'], ascending = False)
df1.state_name.head(10)
b) Find out top 10 cities in total number of graduates
import pandas as pd
df2 = df.sort_values(by = ['total_graduates'], ascending = False)

df2.name_of_city.head(10)
c)Find out top 10 cities and their locations in respect of total effective_literacy_rate.
import pandas as pd
df3 = df.sort_values(by = ['effective_literacy_rate_total'], ascending = False)
df3[['name_of_city', 'location']].head(10)
5)
a) histogram
import pandas as pd
import matplotlib.pyplot as plt
plt.hist(data = df,x='literates_total', histtype='bar',rwidth=0.8,color='blue')
plt.xlabel('literates_total')
plt.show()
b) scatter plot
import pandas as pd
import matplotlib.pyplot as plt
plt.scatter(data = df,x = 'male_graduates', y = 'female_graduates')
plt.xlabel('male_graduates')
plt.ylabel('female_graduates')

plt.show()
6)
a) box plot
import pandas as pd
import seaborn as sns
sns.boxplot(df.effective_literacy_rate_total)
b) null values
import pandas as pd
df.isnull().sum()
df.dropna()

Module 7 Python For Data Analytics Assignment

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Module 7 Python For Data Analytics Assignment

Uploaded by

Copyright:

Available Formats

MODULE – 7 ASSIGNMENT

Python for data analytics

Please implement Python coding for all the problems.

Import date from datetime.

3) Representing dates in different ways

Print it in the format 'YYYY-MM', 'YYYY-DDD' and 'MONTH (YYYY)'

4) For the dataset “Indian_cities”,

© 360DigiTMG. All Rights Reserved.

5) For the data set “Indian_cities”

6) For the data set “Indian_cities”

from sklearn.impute import SimpleImputer

from sklearn.compose import ColumnTransformer

print("Number of missing values in each column:\n",data.isna().sum())

numerical_imputer = SimpleImputer(strategy = "mean")

categorical_imputer = SimpleImputer(strategy="constant" , fill_value = "missing")

categorical_columns = ["Position" , "State", "Sex", "MaritalDesc", "CitizenDesc", "EmploymentStatus",

numerical_columns = ["Salaries" , "age"]

("categorical_imputer" , categorical_imputer , categorical_columns),

("numerical_imputer" , numerical_imputer , numerical_columns)])

data_filled = pd.DataFrame(filled , columns=["Position" , "State", "Sex", "MaritalDesc", "CitizenDesc",

© 360DigiTMG. All Rights Reserved.

from datetime import date

end_date = date(2007, 12, 13)

timedelta = end_date - start_date

3) Representing dates in different ways

from datetime import date

andrew = date(1992, 8, 26)

print(andrew.strftime("%Y-%m")) #printing date in 'YYYY-MM'

print(andrew.strftime("%Y-%j")) #printing date in 'YYYY-DDD'

print(andrew.strftime("%B (%Y)")) #printing date in 'MONTH (YYYY)'

a) Find out top 10 states in female-male sex ratio

df1 = df.sort_values(by = ['sex_ratio'], ascending = False)

b) Find out top 10 cities in total number of graduates

df2 = df.sort_values(by = ['total_graduates'], ascending = False)

© 360DigiTMG. All Rights Reserved.

df3 = df.sort_values(by = ['effective_literacy_rate_total'], ascending = False)

import matplotlib.pyplot as plt

plt.hist(data = df,x='literates_total', histtype='bar',rwidth=0.8,color='blue')

import matplotlib.pyplot as plt

plt.scatter(data = df,x = 'male_graduates', y = 'female_graduates')

© 360DigiTMG. All Rights Reserved.

import seaborn as sns

© 360DigiTMG. All Rights Reserved.

You might also like