
SVKM’s NMIMS University

Mukesh Patel School of Technology Management & Engineering


Course: Python Programming 
PROGRAMME: B.Tech/MBATech. 
First Year AY 2022-2023 Semester: II

PRACTICAL 7
Part A (To be referred by students)

Built-in Package: Pandas (dataframes)


SAVE THE FILE AND UPLOAD AS (RollNo_Name_Exp7)

Problem Statement: Write a Python program to

1. Create a data dictionary for the data below, then convert it into a data frame.
i) Print all the columns of the rows where the student's name begins with the letter A and
the percentage is higher than 85, using the “index” attribute.
ii) Print the Age column using the “loc” attribute. Print the average age.
iii) Print the columns at index 0 and 2 using the “iloc” attribute.
iv) Rescale the Percentage column to values between 0 and 1.
Name      Age  Stream    Percentage
Rima      21   Math      58
Alok      19   Commerce  92
Anandita  20   Arts      85
Priyanka  18   Biology   30

2. Based on the dataset "Olympics.csv", answer the following:

i) What is the first country in df?
ii) Which country has won the most gold medals in summer games?
iii) Which country had the biggest difference between their summer and winter gold
medal counts?
iv) Which country has the biggest difference between their summer and winter gold
medal counts relative to their total gold medal count? Only include countries that have
won at least 1 gold in both summer and winter.
v) Write a function to update the dataframe to include a new column called "Points",
which is a weighted value where each gold medal counts for 3 points, silver medals
for 2 points, and bronze medals for 1 point. The function should return only the
column (a Series object) which you created.

3. Create charts for the dataset below.

i) Scatter plot of two columns – name and num_children, num_children and num_pets
ii) Bar plot of column values – name and age, and any other two columns
of your choice.


iii) Line plot, multiple columns – name and num_children, num_pets


iv) Save plot to file
v) Bar plot with group by – Number of unique names per state
vi) Stacked bar plot with group by – hint – Learn concept of dummy variables

vii) Stacked bar plot with group by, normalized to 100%

ix) Stacked bar plot, two-level group by - Stacked bar chart showing the
number of people per state, split into males and females

x) Stacked bar plot with two-level group by, normalized to 100% - Count
grouped by state and gender, with normalized columns so that each sums up to
100%
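
The group-by chart tasks above follow one pattern: count the rows per group, reshape the counts into a table with one column per category, then plot that table as stacked bars. The following is a minimal sketch of the two-level, normalized case. It assumes a data frame with the `state` and `gender` columns named in the tasks; the rows and the output file name are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows; only the column names come from the task list.
people = pd.DataFrame({
    "name": ["john", "mary", "peter", "jeff", "bill", "lisa"],
    "state": ["california", "dc", "california", "dc", "california", "texas"],
    "gender": ["M", "F", "M", "M", "M", "F"],
})

# Two-level group by: number of people per (state, gender) pair.
counts = people.groupby(["state", "gender"]).size()

# unstack() moves the inner level (gender) into columns; missing pairs become 0.
table = counts.unstack(fill_value=0)

# Normalize each row so that the shares within one state sum to 100.
shares = table.div(table.sum(axis=1), axis=0) * 100
print(shares)

# One bar per state, split into gender segments.
ax = shares.plot(kind="bar", stacked=True)
ax.set_ylabel("share of people (%)")
ax.figure.savefig("state_gender_share.png")  # hypothetical file name
```

Plotting `table` instead of `shares` gives the plain (not normalized) stacked chart of people per state.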


Topic covered: Built-in Package: Pandas (data frames).

Learning Objective: Learner would be able to


1. Learn to create data frames using pandas package.
2. Learn data analysis using Pandas package.
3. Handle large amounts of data and create data visualizations.

Theory:
Pandas is a Python library used for working with data sets. It has functions for analyzing,
cleaning, exploring, and manipulating data. A DataFrame's loc attribute returns one or
more specified rows by label.

# Create dataset
import pandas as pd
mydataset = { 'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2] }
myvar = pd.DataFrame(mydataset)
print(myvar)

A Pandas Series is a one-dimensional array holding data of any type. It is like a column in
a table.
# create series with index value
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
print(myvar["y"]) # return value of y

# Create series from Dictionary


import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)

O/P -
day1    420
day2    380
day3    390
dtype: int64

# Create a DataFrame from a dictionary of lists (each list becomes a column)
import pandas as pd
data = { "calories": [420, 380, 390], "duration": [50, 40, 45] }
myvar = pd.DataFrame(data)
print(myvar)
print(myvar.loc[0]) # refer to the row with index label 0
myvar1 = pd.DataFrame(data, index = ["day1", "day2", "day3"]) # named index
print(myvar1)
print(myvar1.loc["day1"]) # refer to the row labelled "day1"
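
The difference between label-based and position-based selection is easiest to see on the named-index frame. The sketch below reuses the calories/duration data from the example above; the variable names are chosen here for illustration only.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# loc selects by label: a whole row, or a single row/column cell.
row_day2 = df.loc["day2"]
cal_day3 = df.loc["day3", "calories"]

# iloc selects by integer position, whatever the labels are.
first_row = df.iloc[0]        # same row as df.loc["day1"]
first_col = df.iloc[:, 0]     # the calories column

# A boolean condition inside loc keeps only the matching rows.
long_days = df.loc[df["duration"] > 40]

# df.index holds the row labels; a boolean mask picks the matching labels.
long_labels = list(df.index[df["duration"] > 40])

print(row_day2)
print(long_days)
print(long_labels)
```

Conditions can be combined with `&` and `|`, with each condition wrapped in parentheses.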

If your data sets are stored in a file, Pandas can load them into a DataFrame.
import pandas as pd
df = pd.read_csv('data.csv') # Give path if not in same directory
print(df)
print(pd.options.display.max_rows) # Maximum number of rows pandas prints before truncating
print(df.head(10)) # Print the first 10 rows of the DataFrame
print(df.tail()) # Print the last 5 rows of the DataFrame
df.info() # Print information about the data (info() prints directly and returns None)

Data Cleaning

Data cleaning means fixing bad data in your data set. Bad data could be: empty cells, data in
the wrong format, wrong data, or duplicates.
The examples below deal with empty cells.

import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna() # Returns a new DataFrame; the original is unchanged
print(new_df)

# The statements below are alternatives; apply one of them to a freshly loaded df
df.dropna(inplace = True) # Drops rows with empty cells from the original DataFrame
df.fillna(130, inplace = True) # Replace NULL values in every column with the number 130
df["Calories"] = df["Calories"].fillna(130) # Replace NULL values in the Calories column only
x = df["Calories"].mean()
df["Calories"] = df["Calories"].fillna(x) # Replace NULL values with the column mean
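
The other kinds of bad data listed above (wrong format, wrong data, duplicates) can be handled in a similar way. The following is a minimal sketch on an invented workout log; the column values, the date format, and the 120-minute limit are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical log with a bad date, an implausible duration and a repeated row.
df = pd.DataFrame({
    "Date": ["2020/12/01", "2020/12/02", "not a date", "2020/12/04", "2020/12/04"],
    "Duration": [60, 450, 45, 60, 60],
    "Calories": [409.1, 479.0, 282.4, 300.0, 300.0],
})

# Wrong format: convert text to datetimes; values that cannot be parsed become NaT.
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d", errors="coerce")

# Wrong data: cap durations above the assumed limit of 120 minutes.
df.loc[df["Duration"] > 120, "Duration"] = 120

# Duplicates: duplicated() flags rows that repeat an earlier row.
n_dupes = int(df.duplicated().sum())
print(n_dupes)

# Drop rows whose date could not be parsed, then drop the repeated rows.
clean = df.dropna(subset=["Date"]).drop_duplicates()
print(clean)
```

Whether a suspicious value should be capped, replaced, or removed depends on the data set; capping is only one option.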

Data Plotting

Pandas uses the plot() method to create diagrams. We can use Pyplot, a submodule of the
Matplotlib library, to show the diagram on the screen.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot()
df.plot(kind='scatter', x='Duration', y='Calories') # Draws a scatter plot

# To Do -> Plot between Duration and Maxpulse


plt.show()
Histogram - A histogram shows us the frequency of each interval, e.g. how many workouts
lasted between 50 and 60 minutes?

Use the kind argument to specify that you want a histogram: kind = 'hist'
df["Duration"].plot(kind = 'hist')
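
To make the idea of "frequency per interval" concrete, the sketch below counts workouts per 10-minute interval and then draws the histogram with the same interval edges. The durations are invented for illustration, and the choice of 10-minute bins is an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical workout durations in minutes.
df = pd.DataFrame({"Duration": [60, 60, 45, 45, 30, 60, 50, 55, 120, 90]})

# Interval edges 30, 40, ..., 130; right=False makes each interval [start, end).
edges = list(range(30, 131, 10))
per_interval = pd.cut(df["Duration"], bins=edges, right=False).value_counts(sort=False)
print(per_interval)

# Workouts that lasted at least 50 and less than 60 minutes.
n_50_to_60 = int(((df["Duration"] >= 50) & (df["Duration"] < 60)).sum())
print(n_50_to_60)

# Passing the same edges as bins draws the matching histogram.
ax = df["Duration"].plot(kind="hist", bins=edges)
ax.set_xlabel("Duration (minutes)")
# Call plt.show() to display the figure when running this as a script.
```

Without the bins argument, the histogram uses 10 equal-width bins spanning the data range, so the bar edges may not fall on round numbers.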


PRACTICAL 7
Part B (to be completed by students)

Built-in Package: Pandas (dataframes)


(Students must submit the soft copy as per the following segments. A soft copy containing
Part A and Part B answered must be uploaded on the platform specified by the Practical
Teacher. The filename should be RollNo_Name_Exp7)

Roll No.: A113 Name: Aayush Singh


Prog/Yr/Sem: 2 Batch: 2
Date of Experiment: Date of Submission:

1. Program Code along with Sample Output: (Paste your programs [1,2,3,4,5], input and
output screen shot for programs [1,2,3,4,5])

2. Conclusion (Learning Outcomes): Reflect on the questions you answered and jot down
your learnings about the Topic: Built-in Package: Pandas (data frames).
