Python Libraries

What is a Module?
Consider a module to be the same as a code library: a file containing a set of functions you want to include in your application.
Create a Module
To create a module, save the code you want in a file with the .py extension:
Example
Save this code in a file named mymodule.py
def greeting(name):
    print("Hello, " + name)
Use a Module
Now we can use the module we just created with the import statement:
Example
Import the module named mymodule, and call the greeting function:
import mymodule
mymodule.greeting("Jonathan")
Variables in Module
A module can contain functions, as already described, but also variables of all types (lists, dictionaries, objects, etc.):
Example
Save this code in the file mymodule.py
person1 = {
"name": "John",
"age": 36,
"country": "Norway"
}
Example
Import the module named mymodule, and access the person1 dictionary:
import mymodule
a = mymodule.person1["age"]
print(a)
Re-naming a Module
You can create an alias when you import a module, by using the as keyword:
Example
Create an alias for mymodule called mx:
import mymodule as mx
a = mx.person1["age"]
print(a)
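Because mymodule.py has to exist on disk before it can be imported, here is a minimal sketch that runs the whole create, import, and alias flow in one script. It assumes it is acceptable to write the module into a temporary directory and add that directory to sys.path; the module contents are the two examples above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for mymodule.py: the greeting() function and person1 dictionary shown above
module_source = '''
def greeting(name):
    print("Hello, " + name)

person1 = {"name": "John", "age": 36, "country": "Norway"}
'''

# Write the module into a temporary directory (an assumption made for this sketch)
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "mymodule.py").write_text(module_source)

# Make the directory importable, then import the module under the alias mx
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
import mymodule as mx

mx.greeting("Jonathan")   # prints: Hello, Jonathan
age = mx.person1["age"]
print(age)                # prints: 36
```

In a normal project you would simply keep mymodule.py in the same folder as the script that imports it.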
Introduction to Matplotlib
Matplotlib is a widely used visualization library in Python for creating 2D plots of arrays.

Matplotlib is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in 2002. One of the greatest benefits of visualization is that it gives us visual access to huge amounts of data in an easily digestible form. Matplotlib provides several kinds of plots, such as line, bar, scatter, and histogram plots.
Importing matplotlib
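After installing Matplotlib (for example with pip install matplotlib), you can import it on your system with:
import matplotlib
The examples below use its pyplot interface, imported as:
from matplotlib import pyplot as plt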
Basic plots in Matplotlib

Matplotlib comes with a wide variety of plots. Plots help reveal trends and patterns and show correlations; they are typically instruments for reasoning about quantitative information. Some sample plots are covered here.
Line plot using Matplotlib
The code below imports the pyplot module from matplotlib, defines x and y values, plots the data with the plot() function, and displays the plot with the show() function. plot() creates a line plot by connecting the points defined by the x and y values, in the order in which they are given.
from matplotlib import pyplot as plt

# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot
plt.plot(x,y)
# function to show the plot
plt.show()
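Because plot() joins the points in list order, the unsorted x values above produce a zig-zag line. The sketch below sorts the (x, y) pairs by x first and adds axis labels and a title. It assumes the non-interactive Agg backend (so it runs without a display) and saves to a hypothetical file name instead of calling plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no window is needed
from matplotlib import pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

# plot() joins the points in list order; sort the (x, y) pairs by x
# so the line runs from left to right
pairs = sorted(zip(x, y))
xs = [px for px, _ in pairs]
ys = [py for _, py in pairs]

fig, ax = plt.subplots()
(line,) = ax.plot(xs, ys, marker="o")  # plot() returns a list of Line2D objects
ax.set_xlabel("x-axis")
ax.set_ylabel("y-axis")
ax.set_title("Line plot with x sorted")

# With the Agg backend, save the figure instead of showing it
fig.savefig("line_plot.png")  # hypothetical output file name
plt.close(fig)
```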
Bar plot using Matplotlib

Importing the pyplot module from matplotlib gives us access to the functions and classes the library provides for plotting. Two lists, x and y, are defined. The bar() function takes the x-axis and y-axis values as arguments and draws one bar at each x value, with its height given by the matching y value.
# importing matplotlib module
from matplotlib import pyplot as plt

# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot the bar
plt.bar(x,y)
plt.show()
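The x values passed to bar() do not have to be numbers. A minimal sketch, assuming hypothetical category labels "A" to "E" paired with the y values from the example, shows bar() with categories and how to read back the bar heights from the BarContainer it returns.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
from matplotlib import pyplot as plt

# Hypothetical category labels paired with the y values used above
labels = ["A", "B", "C", "D", "E"]
y = [10, 5, 8, 4, 2]

fig, ax = plt.subplots()
bars = ax.bar(labels, y, color="steelblue")  # one bar per label
ax.set_ylabel("Value")

# bar() returns a BarContainer; each element is a Rectangle whose height is a y value
heights = [rect.get_height() for rect in bars]
print(heights)
plt.close(fig)
```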
Histogram using Matplotlib

The code below defines the values for a histogram, plots them with the hist() function, and displays the plot with the show() function. hist() groups the values in the y list into bins and draws one bar per bin, whose height is the number of values that fall into that bin.

# importing matplotlib module
from matplotlib import pyplot as plt
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot histogram
plt.hist(y)
# Function to show the plot
plt.show()
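hist() also returns the numbers behind the picture: the count in each bin, the bin edges, and the drawn patches. By default it uses 10 bins; the sketch below asks for 3 so the counts are easy to check by hand (values 2 and 4 fall in the first bin, 5 in the second, 8 and 10 in the last).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
from matplotlib import pyplot as plt

y = [10, 5, 8, 4, 2]

fig, ax = plt.subplots()
# bins=3 splits the range 2..10 into three equal-width intervals
counts, edges, patches = ax.hist(y, bins=3, edgecolor="black")
print(counts)  # [2. 1. 2.]
print(edges)   # the 4 bin boundaries, from 2.0 up to 10.0
plt.close(fig)
```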
Scatter Plot using Matplotlib

The code below imports the pyplot module, defines x and y values, plots the data with the scatter() function, and displays the plot with the show() function. scatter() draws each point defined by the x and y values individually, without connecting lines.

# importing matplotlib module
from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot scatter
plt.scatter(x, y)
plt.show()
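scatter() can also vary marker size and colour per point. This sketch uses hypothetical marker sizes and colours each point by its y value through the "viridis" colormap, with a colorbar as the legend.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
from matplotlib import pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
sizes = [40, 80, 120, 160, 200]  # hypothetical marker areas, in points squared

fig, ax = plt.subplots()
# s sets each marker's area; c colours each point by a value via the colormap
points = ax.scatter(x, y, s=sizes, c=y, cmap="viridis")
fig.colorbar(points, ax=ax, label="y value")

# The returned PathCollection stores the (x, y) position of every point
print(points.get_offsets().shape)  # (5, 2)
plt.close(fig)
```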
Pandas
Pandas is an open-source library built on top of the NumPy library. It is a Python package that offers various data structures and operations for manipulating numerical data and time series. It is popular mainly because it makes importing and analysing data much easier, and it is fast and productive to use.
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. A DataFrame can hold any number of rows and columns, and we can perform many operations on it, such as arithmetic operations, selecting columns/rows, and adding columns/rows.
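The operations just listed can be sketched on a small DataFrame. The values are hypothetical, and the new column name cal_per_min is made up for illustration.

```python
import pandas as pd

# Small hypothetical dataset
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# Arithmetic: divide one column by another, element by element;
# assigning the result also adds a new column
df["cal_per_min"] = df["calories"] / df["duration"]

# Column selection: a list of names returns a smaller DataFrame
subset = df[["calories", "duration"]]

# Row selection: keep only the rows where the condition is True
long_sessions = df[df["duration"] > 42]

print(df)
print(long_sessions)
```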
Creating an empty DataFrame:
The most basic DataFrame that can be created is an empty DataFrame, which is created just by calling the DataFrame constructor with no arguments.
# import pandas as pd
import pandas as pd
# Calling DataFrame constructor

df = pd.DataFrame()
print(df)
Output :
Empty DataFrame
Columns: []
Index: []
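The empty and shape attributes confirm that a DataFrame has no data. A DataFrame can also declare its columns while still having no rows; the column names below are hypothetical.

```python
import pandas as pd

empty = pd.DataFrame()
print(empty.empty, empty.shape)          # True (0, 0)

# Columns can be declared up front while the frame still has no rows
with_cols = pd.DataFrame(columns=["Name", "Age"])
print(with_cols.empty, with_cols.shape)  # True (0, 2)
```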
Creating a dataframe using List:

A DataFrame can be created from a single list or from a list of lists.
# import pandas as pd
import pandas as pd
# list of strings
lst = ['Hello', 'This', 'is',
'python', 'Class', 'BMS']
# Calling DataFrame constructor on list

df = pd.DataFrame(lst)
print(df)
Output:
        0
0   Hello
1    This
2      is
3  python
4   Class
5     BMS
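The example above uses a single list. With a list of lists, each inner list becomes one row, and the columns argument names the columns. A short sketch reusing names from this document:

```python
import pandas as pd

# Each inner list becomes one row; columns= names the columns
rows = [["Jai", 27], ["Princi", 24], ["Gaurav", 22]]
df = pd.DataFrame(rows, columns=["Name", "Age"])
print(df)
print(df.shape)  # (3, 2)
```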
Creating DataFrame from dict of ndarray/lists:

To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length.
Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:

df = pd.DataFrame(data)
print(df)
Result
   calories  duration
0       420        50
1       380        40
2       390        45
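To see the same-length rule in action, this sketch deliberately gives "duration" one value fewer than "calories"; pandas refuses to build the DataFrame and raises a ValueError (the exact message differs between pandas versions).

```python
import pandas as pd

# "duration" deliberately has one value fewer than "calories"
data = {"calories": [420, 380, 390], "duration": [50, 40]}

error = None
try:
    pd.DataFrame(data)
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```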
Locate Row
As you can see from the result above, the DataFrame is like a table with
rows and columns.
Pandas uses the loc attribute to return one or more specified row(s):
Example
Return row 0:
#refer to the row index:

print(df.loc[0])
Result
calories 420
duration 50
Name: 0, dtype: int64
Return row 0 and 1:
#use a list of indexes:

print(df.loc[[0, 1]])
Note: When a list of labels is passed inside [] (as in df.loc[[0, 1]]), the result is a Pandas DataFrame; a single label returns a Series.
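The difference between a single label and a list of labels can be checked directly. The sketch also contrasts loc with iloc (position-based selection): a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one_row = df.loc[0]          # single label -> Series
two_rows = df.loc[[0, 1]]    # list of labels -> DataFrame
print(type(one_row).__name__, type(two_rows).__name__)  # Series DataFrame

# A loc slice includes its end label; an iloc slice excludes its end position
print(len(df.loc[0:1]), len(df.iloc[0:1]))  # 2 1
```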
Named Indexes
With the index argument, you can name your own indexes.
Example
Add a list of names to give each row a name:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
Result
      calories  duration
day1       420        50
day2       380        40
day3       390        45
Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).
Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])
Result
calories 380
duration 40
Name: day2, dtype: int64
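Named indexes also work in slices and together with a column label. This sketch rebuilds the day1 to day3 DataFrame from above and shows that a label slice includes both ends, and that a row label plus a column label returns a single value.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# Label slices on a named index include both the start and the end label
print(df.loc["day1":"day2"])

# A row label plus a column label returns a single value
print(df.loc["day2", "calories"])  # 380
```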
Load Files Into a DataFrame

If your datasets are stored in a file, Pandas can load them into a DataFrame.
CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
OR (in case you have a link where the CSV file is stored)
df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")
print(df.to_string())
Use to_string() to print the entire DataFrame.
If you print a large DataFrame with many rows without to_string(), Pandas by default shows only the first 5 rows and the last 5 rows.
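read_csv() accepts any file-like object as well as a path or URL. To try it without a file on disk, this sketch wraps a small CSV string (a few rows taken from the nba.csv sample shown later) in io.StringIO; the string stands in for 'data.csv'.

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for 'data.csv'
csv_text = """Name,Team,Age
Avery Bradley,Boston Celtics,25
Jae Crowder,Boston Celtics,25
John Holland,Boston Celtics,27
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)        # (3, 3)
print(df.to_string())  # prints every row, however many there are
```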
Manipulating Data in Data Frame
1. Adding new column to existing DataFrame in Pandas

There are multiple ways we can do this task.
By declaring a new list as a column
# Import pandas package
import pandas as pd
# Define a dictionary containing Students data

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# Declare a list that is to be converted into a column

address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
# Using 'Address' as the column name

# and equating it to the list
df['Address'] = address
# Observe the result

print(df)
The length of the list must match the number of rows in the DataFrame (the length of its index); otherwise pandas raises a ValueError.
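The sketch below triggers that error on purpose by assigning three addresses to four rows, and then shows that a single scalar value, unlike a list, is broadcast to every row. The "Country" column and its value are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"]})

error = None
try:
    df["Address"] = ["Delhi", "Bangalore", "Chennai"]  # 3 values for 4 rows
except ValueError as exc:
    error = exc
    print("ValueError:", exc)

# A scalar, by contrast, is repeated for every row
df["Country"] = "India"
print(df)
```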
By using DataFrame.insert()
DataFrame.insert() lets us add a column at any position, not just at the end. Its arguments are the position, the column name, the values, and allow_duplicates, which permits a column name that already exists when set to True.

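import pandas as pd

# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Using DataFrame.insert() to add a column
# (position 2, column name "Age", values, allow_duplicates=True)
df.insert(2, "Age", [21, 23, 24, 21], True)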
print(df)
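To make the position and allow_duplicates arguments concrete, here is a small sketch on a two-row frame: the first insert places "Age" second, and a second insert with the same name only succeeds when allow_duplicates=True.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Height": [5.1, 6.2]})

# Insert "Age" as the second column (position 1)
df.insert(1, "Age", [21, 23])
print(list(df.columns))  # ['Name', 'Age', 'Height']

# Re-using an existing column name fails unless allow_duplicates=True
error = None
try:
    df.insert(0, "Age", [0, 0])
except ValueError as exc:
    error = exc

df.insert(0, "Age", [0, 0], allow_duplicates=True)
print(list(df.columns))  # ['Age', 'Name', 'Age', 'Height']
```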
Using Dataframe.assign() method

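This method creates a new DataFrame with the new column added, leaving the original DataFrame unchanged.
import pandas as pd

# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Using 'address' as the column name and equating it to the list
df2 = df.assign(address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])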

print(df2)
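assign() can also compute a column from the DataFrame itself by passing a callable, and it never modifies the original. In this sketch the column name is_msc is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                   "Qualification": ["Msc", "MA", "Msc", "Msc"]})

# assign() returns a new DataFrame; df itself is left unchanged
df2 = df.assign(
    address=["Delhi", "Bangalore", "Chennai", "Patna"],
    # a callable receives the DataFrame and computes the new column from it
    is_msc=lambda d: d["Qualification"] == "Msc",
)

print(list(df.columns))   # ['Name', 'Qualification']
print(list(df2.columns))  # ['Name', 'Qualification', 'address', 'is_msc']
```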
By using a dictionary
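We can use a Python dictionary to add a new column to a pandas DataFrame. Use the values of an existing column as the dictionary keys; the value each key maps to becomes the new column's value for that row.
import pandas as pd

# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Define a dictionary whose keys are values of an existing column (Name)
# and whose values are the entries for our new column
address = {'Jai': 'Delhi', 'Princi': 'Bangalore',
'Gaurav': 'Patna', 'Anuj': 'Chennai'}

# Provide 'Address' as the column name; map() looks up each Name in the dictionary
df['Address'] = df['Name'].map(address)
# Observe the output

print(df)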
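When a dictionary is mapped onto an existing column with Series.map(), any value that has no key in the dictionary becomes NaN. This sketch uses a hypothetical mapping with no entry for "Anuj" and fills the gap explicitly with fillna().

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"]})

# Hypothetical mapping that has no entry for "Anuj"
address = {"Jai": "Delhi", "Princi": "Bangalore", "Gaurav": "Patna"}

df["Address"] = df["Name"].map(address)          # unmatched names become NaN
df["Address"] = df["Address"].fillna("Unknown")  # replace the gaps explicitly
print(df)
```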
The pandas head() method returns the top n rows (5 by default) of a DataFrame or Series, together with the column headers.
Syntax: DataFrame.head(n=5)
Parameters:
n: integer value, the number of rows to return
import pandas as pd
# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")
# calling head() method
# storing in new variable
data_top = data.head()
# display
data_top
There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows,
starting from the bottom.
Print the last 5 rows of the DataFrame:
print(df.tail())
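Both head() and tail() take the number of rows as n. A negative n counts from the other end: head(-2) returns every row except the last 2, and tail(-5) returns every row except the first 5. A sketch on a hypothetical 8-row frame:

```python
import pandas as pd

# A hypothetical 8-row frame so the effect of n is easy to see
df = pd.DataFrame({"value": range(8)})

print(df.head(3)["value"].tolist())   # [0, 1, 2]
print(df.tail(2)["value"].tolist())   # [6, 7]
print(df.head(-2)["value"].tolist())  # [0, 1, 2, 3, 4, 5]
print(df.tail(-5)["value"].tolist())  # [5, 6, 7]
```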
Delete rows/columns from DataFrame using DataFrame.drop()
Rows or columns can be removed by index label or column name using this method.
Syntax:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Parameters:
labels: String or list of strings referring to row or column names.
axis: int or string value: 0 or 'index' for rows, 1 or 'columns' for columns.
index or columns: Single label or list. index (or columns) is an alternative to specifying axis; it cannot be used together with labels.
level: Used to specify the level when the DataFrame has a multi-level index.
inplace: Makes changes in the original DataFrame if True.
errors: When errors='ignore', labels that don't exist are ignored and the remaining ones are dropped.
Return type: DataFrame with the values dropped (None when inplace=True).
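The index=/columns= keywords and errors="ignore" can be tried on a small frame built from rows of the nba.csv sample. "Not A Player" is a hypothetical missing label, and because inplace is left at False, drop() returns new DataFrames and leaves df untouched.

```python
import pandas as pd

df = pd.DataFrame(
    {"Team": ["Boston Celtics"] * 3,
     "Age": [25.0, 25.0, 27.0],
     "Weight": [180.0, 235.0, 205.0]},
    index=["Avery Bradley", "Jae Crowder", "John Holland"],
)

# columns= is equivalent to labels=..., axis=1
no_weight = df.drop(columns=["Weight"])

# index= drops rows; errors="ignore" skips labels that do not exist
fewer_rows = df.drop(index=["Jae Crowder", "Not A Player"], errors="ignore")

print(list(no_weight.columns))  # ['Team', 'Age']
print(list(fewer_rows.index))   # ['Avery Bradley', 'John Holland']
print(df.shape)                 # (3, 3); df itself is unchanged because inplace=False
```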
Dropping Rows by index label
In this code, a list of index labels is passed and the rows corresponding to those labels are dropped using the .drop() method.
# importing pandas module
import pandas as pd
# making data frame from csv file

data = pd.read_csv("nba.csv", index_col="Name")
print(data.head(5))
Note: index_col:
index_col sets which column is used as the index of the DataFrame. The default value is None, in which case pandas adds a default integer index starting from 0.
Output: Data Frame before Dropping values
                         Team  Number  Position   Age  Height  Weight            College     Salary
Name
Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
Jae Crowder    Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
John Holland   Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
R.J. Hunter    Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0
Applying the drop function.

# dropping passed values
data.drop(["Avery Bradley", "John Holland", "R.J. Hunter"], inplace=True)
# display the first rows of the modified DataFrame
print(data.head())
Output: Data Frame after Dropping values

Compared with the earlier output, the new output doesn't contain the passed labels. Those rows were dropped, and the change was made in the original DataFrame because inplace was True.
                         Team  Number  Position   Age  Height  Weight    College      Salary
Name
Jae Crowder    Boston Celtics    99.0        SF  25.0     6-6   235.0  Marquette   6796117.0
Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0        NaN   5000000.0
Amir Johnson   Boston Celtics    90.0        PF  29.0     6-9   240.0        NaN  12000000.0
Jordan Mickey  Boston Celtics    55.0        PF  21.0     6-8   235.0        LSU   1170960.0
Kelly Olynyk   Boston Celtics    41.0         C  25.0     7-0   238.0    Gonzaga   2165160.0
Dropping columns with column name
In this code, the passed columns are dropped using their column names. The axis parameter is set to 1 because 1 refers to columns.
import pandas as pd
# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Name" )
print(data.head())
Output: Data Frame before Dropping Columns

                         Team  Number  Position   Age  Height  Weight            College     Salary
Name
Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
Jae Crowder    Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
John Holland   Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
R.J. Hunter    Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0
Applying drop function.

# dropping passed columns
data.drop(["Team", "Weight"], axis = 1, inplace = True)
# display
print(data.head())
Output: Data Frame after Dropping Columns

As the output shows, the new DataFrame doesn't contain the passed columns. They were dropped because axis was set to 1, and the change was made in the original DataFrame because inplace was True.
               Number  Position   Age  Height            College     Salary
Name
Avery Bradley     0.0        PG  25.0     6-2              Texas  7730337.0
Jae Crowder      99.0        SF  25.0     6-6          Marquette  6796117.0
John Holland     30.0        SG  27.0     6-5  Boston University        NaN
R.J. Hunter      28.0        SG  22.0     6-5      Georgia State  1148640.0
Jonas Jerebko     8.0        PF  29.0    6-10                NaN  5000000.0
Pandas Merging, Joining, and Concatenating

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns). We can join, merge, and concatenate DataFrames using different methods: pd.merge() (or DataFrame.merge()), DataFrame.join(), and pd.concat() each combine DataFrames in a different way.
Concatenating DataFrames using pd.concat():

To concatenate DataFrames, we use the pd.concat() function, which joins them along an axis (the rows, by default) and returns a new DataFrame.
import pandas as pd
# Define a dictionary containing employee data

data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

data2 = {'Name':['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'],
'Age':[17, 14, 12, 52],
'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}

df = pd.DataFrame(data1,index=[0, 1, 2, 3])

df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])
print(df, "\n\n", df1)

Now we apply pd.concat() to concatenate the two DataFrames:
frames = [df, df1]
res1 = pd.concat(frames)
res1
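Two options of pd.concat() are worth knowing: ignore_index=True renumbers the rows instead of keeping each frame's index, and axis=1 places the frames side by side. In this smaller hypothetical version of the example, the second frame has no Address column, so those cells become NaN, just as data2 above has no Address column.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Address": ["Nagpur", "Kanpur"]})
df1 = pd.DataFrame({"Name": ["Abhi", "Ayushi"]})  # no Address column

# ignore_index=True renumbers the rows 0..3 instead of keeping 0, 1, 0, 1
stacked = pd.concat([df, df1], ignore_index=True)
print(stacked)
# Address is missing from df1, so those cells are NaN
print(stacked["Address"].isna().tolist())  # [False, False, True, True]

# axis=1 places the frames side by side, aligning rows on the index
side_by_side = pd.concat([df, df1], axis=1)
print(side_by_side.shape)  # (2, 3)
```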
Concatenating DataFrame using .append()

Older versions of pandas also provided DataFrame.append(), which concatenated along axis=0, namely the index. It was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat() instead:
res = pd.concat([df, df1])
res
Merging DataFrame
Pandas has options for high-performance in-memory merging and joining. When we need to combine very large DataFrames, joins serve as a powerful way to perform these operations quickly. A join combines two DataFrames at a time, denoted as the left and right tables. The key is the common column that the two DataFrames will be joined on. It's good practice to use keys that have unique values throughout the column, to avoid unintended duplication of rows. Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.
There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.
Code #1 : Merging a dataframe with one unique key combination

import pandas as pd

data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],}

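data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}  # illustrative values

df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2)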
Now we use pd.merge() with one unique key combination:

# using .merge() function
res = pd.merge(df, df1, on='key')
res
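on= works when the key column has the same name in both frames. When the names differ, left_on and right_on name the key column on each side. A sketch with hypothetical frames whose key columns are called emp_id and key:

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
people = pd.DataFrame({"emp_id": ["K0", "K1", "K2"], "Name": ["Jai", "Princi", "Gaurav"]})
offices = pd.DataFrame({"key": ["K0", "K1", "K3"], "Address": ["Nagpur", "Kanpur", "Kannuaj"]})

# left_on / right_on name the key column on each side; the default how is "inner"
res = pd.merge(people, offices, left_on="emp_id", right_on="key")
print(res)
print(res["Name"].tolist())  # ['Jai', 'Princi']
```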
Code #2: Merging dataframe using multiple join keys.

import pandas as pd

data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
'key1': ['K0', 'K1', 'K0', 'K1'],
'Age':[27, 24, 22, 32],}

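data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
'key1': ['K0', 'K0', 'K0', 'K0'],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj']}  # illustrative values

df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2)
Now we merge the DataFrames using multiple keys: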

# merging dataframe using multiple keys
res1 = pd.merge(df, df1, on=['key', 'key1'])
res1
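If both frames have a column with the same name that is not part of the key, merge() keeps both and tells them apart with suffixes, "_x" and "_y" by default. A sketch with hypothetical Age values in both frames, first with the default suffixes and then with custom ones:

```python
import pandas as pd

# Both frames have an "Age" column that is not part of the key
df = pd.DataFrame({"key": ["K0", "K1"], "Age": [27, 24]})
df1 = pd.DataFrame({"key": ["K0", "K1"], "Age": [17, 14]})

res_default = pd.merge(df, df1, on="key")
print(list(res_default.columns))  # ['key', 'Age_x', 'Age_y']

res_custom = pd.merge(df, df1, on="key", suffixes=("_left", "_right"))
print(list(res_custom.columns))   # ['key', 'Age_left', 'Age_right']
```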
Merging DataFrames using the how argument:

The how argument to merge() specifies which keys are included in the resulting table. If a key combination does not appear in either the left or the right table, the values in the joined table will be NA. Here is a summary of the how options and their SQL equivalent names:
MERGE METHOD   JOIN NAME          DESCRIPTION
left           LEFT OUTER JOIN    Use keys from left frame only
right          RIGHT OUTER JOIN   Use keys from right frame only
outer          FULL OUTER JOIN    Use union of keys from both frames
inner          INNER JOIN         Use intersection of keys from both frames
import pandas as pd
data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
'key1': ['K0', 'K1', 'K0', 'K1'],
'Age':[27, 24, 22, 32],}

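data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
'key1': ['K0', 'K0', 'K0', 'K0'],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj']}  # illustrative values

df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2)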
Now we set how = 'left' in order to use keys from left frame only.
# using keys from left frame
res = pd.merge(df, df1, how='left', on=['key', 'key1'])
res
Now we set how = 'right' in order to use keys from right frame only.
# using keys from right frame
res1 = pd.merge(df, df1, how='right', on=['key', 'key1'])

res1
Now we set how = 'outer' in order to get union of keys from dataframes.
# getting union of keys
res2 = pd.merge(df, df1, how='outer', on=['key', 'key1'])
res2
Now we set how = 'inner' in order to get intersection of keys from dataframes.
# getting intersection of keys
res3 = pd.merge(df, df1, how='inner', on=['key', 'key1'])

res3
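The four how options can be compared side by side on two tiny hypothetical frames where K0 and K1 exist only on the left, K3 only on the right, and K2 in both. The sketch also uses indicator=True, which adds a _merge column recording whether each row came from the left frame, the right frame, or both.

```python
import pandas as pd

# Hypothetical frames: K0/K1 only on the left, K3 only on the right, K2 in both
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "Age": [27, 24, 22]})
right = pd.DataFrame({"key": ["K2", "K3"], "Address": ["Allahabad", "Kannuaj"]})

keys_by_how = {}
for how in ["left", "right", "outer", "inner"]:
    res = pd.merge(left, right, how=how, on="key")
    keys_by_how[how] = res["key"].tolist()
    print(how, keys_by_how[how])

# indicator=True adds a "_merge" column: left_only, right_only or both
flagged = pd.merge(left, right, how="outer", on="key", indicator=True)
print(flagged)
```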
Joining DataFrame
To join DataFrames, we use the .join() method, which combines the columns of two potentially differently-indexed DataFrames into a single result DataFrame.
import pandas as pd
data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32]}
data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']}
df = pd.DataFrame(data1,index=['K0', 'K1', 'K2', 'K3'])

df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4'])
Now we use the .join() method to join the DataFrames:

# joining dataframe
res = df.join(df1)
res
Now we use how='outer' to get the union of the two indexes:

# getting union
res1 = df.join(df1, how='outer')
res1
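Unlike merge(), join() has no default suffixes: if both DataFrames share a column name, it raises a ValueError unless lsuffix and/or rsuffix are given. A sketch with hypothetical frames that both carry a "Name" column:

```python
import pandas as pd

# Both frames carry a "Name" column, so the column names clash
df = pd.DataFrame({"Name": ["Jai", "Princi"]}, index=["K0", "K1"])
df1 = pd.DataFrame({"Name": ["Abhi", "Ayushi"]}, index=["K0", "K1"])

error = None
try:
    df.join(df1)  # overlapping column names and no suffixes
except ValueError as exc:
    error = exc

res = df.join(df1, lsuffix="_left", rsuffix="_right")
print(list(res.columns))  # ['Name_left', 'Name_right']
```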
Joining DataFrames using the on argument:

join() takes an optional on argument, which may be a column name or a list of column names. It specifies that the passed DataFrame is to be aligned on that column of the calling DataFrame, whose values are matched against the passed DataFrame's index.
import pandas as pd
data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Key':['K0', 'K1', 'K2', 'K3']}
data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']}
df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4'])
Now we use .join() with the on argument:

# using on argument in join
res2 = df.join(df1, on='Key')
res2
Joining a singly-indexed DataFrame with a multi-indexed DataFrame:

To join a singly-indexed DataFrame with a multi-indexed DataFrame, pandas matches the name of the single index against a level name of the MultiIndex.
import pandas as pd
data1 = {'Name':['Jai', 'Princi', 'Gaurav'],
'Age':[27, 24, 22]}
data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kanpur']}
df = pd.DataFrame(data1, index=pd.Index(['K0', 'K1', 'K2'], name='key'))

index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
('K2', 'Y2'), ('K2', 'Y3')],
names=['key', 'Y'])
df1 = pd.DataFrame(data2, index=index)
Now we join singly indexed dataframe with multi-indexed dataframe

# joining singly indexed with
# multi indexed
result = df.join(df1, how='inner')
result