Python Libraries

What is a Module?

Consider a module to be the same as a code library.

A file containing a set of functions you want to include in your application.

Create a Module
To create a module just save the code you want in a file with the file
extension .py:

Example
Save this code in a file named mymodule.py
def greeting(name):
    print("Hello, " + name)

Use a Module
Now we can use the module we just created, by using the import statement:

Example
Import the module named mymodule, and call the greeting function:

import mymodule

mymodule.greeting("Jonathan")

Variables in Module
The module can contain functions, as already described, but also variables of
all types (arrays, dictionaries, objects etc):

Example
Save this code in the file mymodule.py

person1 = {
"name": "John",
"age": 36,
"country": "Norway"
}
Example
Import the module named mymodule, and access the person1 dictionary:

import mymodule

a = mymodule.person1["age"]
print(a)

Re-naming a Module
You can create an alias when you import a module, by using the as keyword:

Example
Create an alias for mymodule called mx:

import mymodule as mx

a = mx.person1["age"]
print(a)
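The steps above (create a module, import it, use an alias) can be tried in one self-contained script. This is only a sketch: it writes a throwaway mymodule.py into a temporary directory, and, as an assumption made for easier checking, greeting() here returns the string instead of printing it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway mymodule.py so the example runs anywhere.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "mymodule.py").write_text(
    'person1 = {"name": "John", "age": 36, "country": "Norway"}\n'
    "\n"
    "def greeting(name):\n"
    '    return "Hello, " + name\n'
)

# Make the directory importable, then import the module under the alias mx.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import mymodule as mx

message = mx.greeting("Jonathan")
age = mx.person1["age"]
print(message)
print(age)
```

In a normal project the module file simply sits next to the script that imports it, so the temporary directory and the sys.path line are not needed.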

Introduction to Matplotlib

Matplotlib is an amazing visualization library in Python for 2D plots of arrays.


Matplotlib is a multi-platform data visualization library built on NumPy arrays and
designed to work with the broader SciPy stack. It was introduced by John Hunter in
the year 2002. One of the greatest benefits of visualization is that it allows us visual
access to huge amounts of data in easily digestible visuals. Matplotlib consists of
several plots like line, bar, scatter, histogram etc.
Importing matplotlib
After successfully installing matplotlib, you can import it with this statement
(Python keywords are lowercase, so it must be written import, not Import):

import matplotlib

Basic plots in Matplotlib


Matplotlib comes with a wide variety of plots. Plots help to understand trends, and
patterns, and to make correlations. They’re typically instruments for reasoning about
quantitative information. Some of the sample plots are covered here.
Line plot using Matplotlib
The example below imports pyplot from matplotlib, defines the x and y values, draws
them with the plot() function, and displays the figure with the show() function.
plot() creates a line plot by connecting the points defined by the x and y values in
the order they are given.

from matplotlib import pyplot as plt


# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot
plt.plot(x,y)
# function to show the plot
plt.show()

Output :
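Because plot() joins the points in the order given, the unsorted x values above produce a zigzag line. A minimal sketch of one way to get a left-to-right line is to sort the (x, y) pairs by x first. The Agg backend and the in-memory PNG buffer are assumptions made so the sketch runs without a display; in an interactive session plt.show() works as in the example above.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
from matplotlib import pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

# Sort the (x, y) pairs by x so the line runs left to right.
pairs = sorted(zip(x, y))
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Render the figure into memory instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

print(xs)
print(ys)
```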

Bar plot using Matplotlib


Importing the matplotlib library gives us access to the functions and classes it
provides for plotting. Two lists, x and y, are defined. The bar() function takes the
x-axis and y-axis values as arguments and generates the bar plot based on those
values.

# importing matplotlib module


from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot the bar
plt.bar(x,y)
# function to show the plot
plt.show()

Output:

Histogram using Matplotlib


The example below defines the y-axis values for a histogram, plots them with the
hist() function, and displays the plot with the show() function. hist() groups the
values in the y list into bins and draws one bar per bin, whose height is the number
of values that fall into that bin.

# importing matplotlib module


from matplotlib import pyplot as plt
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot histogram
plt.hist(y)
# Function to show the plot
plt.show()
Output:
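To see what hist() actually computes, the sketch below captures its return values (the counts per bin and the bin edges) for the same y list. The choice of bins=4 and the Agg backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
from matplotlib import pyplot as plt

y = [10, 5, 8, 4, 2]

fig, ax = plt.subplots()
# hist() returns the count in each bin, the bin edges, and the drawn bars.
counts, edges, _ = ax.hist(y, bins=4)
plt.close(fig)

print([float(e) for e in edges])
print([int(c) for c in counts])
```

With four bins between the smallest value (2) and the largest (10), each bin is 2 units wide, and the last bin includes its right edge.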

Scatter Plot using Matplotlib


The example below imports the matplotlib module, defines x and y values for a scatter
plot, plots the data using the scatter() function, and displays the plot using the
show() function. scatter() draws each (x, y) pair as an individual point and does not
connect the points.

# importing matplotlib module


from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot scatter
plt.scatter(x, y)
# function to show the plot
plt.show()

Output:
Pandas
Pandas is an open-source library that is built on top of the NumPy library. It is a
Python package that offers various data structures and operations for manipulating
numerical data and time series. It is popular mainly because it makes importing and
analysing data much easier. Pandas is fast and offers high performance and
productivity for users.
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular
fashion in rows and columns. We can perform many operations on a DataFrame, such as
arithmetic operations, column/row selection, and column/row addition.
Creating an empty dataframe :
The most basic DataFrame that can be created is an empty DataFrame. It is created
just by calling the DataFrame constructor with no arguments.
# import pandas as pd
import pandas as pd

# Calling DataFrame constructor


df = pd.DataFrame()

print(df)

Output :
Empty DataFrame
Columns: []
Index: []

Creating a dataframe using List:


DataFrame can be created using a single list or a list of lists.
# import pandas as pd
import pandas as pd

# list of strings
lst = ['Hello', 'This', 'is',
'python', 'Class', 'BMS']

# Calling DataFrame constructor on list


df = pd.DataFrame(lst)
print(df)

Output:
0
0 Hello
1 This
2 is
3 python
4 Class
5 BMS
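The text mentions a list of lists but only shows a single list. As a small sketch, each inner list becomes one row; the column names passed here are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Each inner list is one row of the DataFrame.
rows = [['Hello', 1], ['python', 2], ['Class', 3]]

# columns= names the columns; without it they would be numbered 0, 1, ...
df = pd.DataFrame(rows, columns=['word', 'number'])
print(df)
```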

Creating DataFrame from dict of ndarrays/lists:


To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length.

Example
Create a simple Pandas DataFrame:
import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:


df = pd.DataFrame(data)

print(df)

Result
calories duration
0 420 50
1 380 40
2 390 45
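The same-length requirement can be checked directly. In this sketch one list is deliberately one value short, and the constructor is expected to raise a ValueError instead of building the DataFrame.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40],  # one value short on purpose
}

try:
    pd.DataFrame(data)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a table from columns of unequal length
    error_message = str(exc)

print(error_message)
```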

Locate Row
As you can see from the result above, the DataFrame is like a table with
rows and columns.

Pandas uses the loc attribute to return one or more specified row(s).

Example
Return row 0:

#refer to the row index:


print(df.loc[0])
Result
calories 420
duration 50
Name: 0, dtype: int64

Return row 0 and 1:

#use a list of indexes:


print(df.loc[[0, 1]])

Note: When a list of indexes is passed inside [], the result is a Pandas DataFrame. A single index, as in the first example, returns a Pandas Series.
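A quick way to see the difference between the two calls is to look at the type of each result. This is a minimal sketch using the same calories/duration data.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

single = df.loc[0]        # one label: a Series holding that row
several = df.loc[[0, 1]]  # list of labels: a DataFrame with those rows

print(type(single).__name__)
print(type(several).__name__)
```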

Named Indexes
With the index argument, you can name your own indexes.

Example
Add a list of names to give each row a name:

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

Result
calories duration
day1 420 50
day2 380 40
day3 390 45

Locate Named Indexes


Use the named index in the loc attribute to return the specified row(s).

Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])

Result
calories 380
duration 40
Name: day2, dtype: int64
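Named indexes work with lists and slices as well. The sketch below reuses the day1/day2/day3 frame; note that a label slice in loc includes both end points, unlike ordinary Python slicing.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

two_rows = df.loc[["day1", "day3"]]      # list of labels
span = df.loc["day1":"day2"]             # label slice, both ends included
one_value = df.loc["day2", "duration"]   # row label and column name

print(two_rows)
print(span)
print(one_value)
```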

Load Files Into a DataFrame


If your data sets are stored in a file, Pandas can load them into a DataFrame.

CSV files contain plain text and are a well-known format that can be read by
almost any program, including Pandas.

Example
Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')
OR (in case you have a link where the CSV file is stored)

df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

print(df.to_string())

Use to_string() to print the entire DataFrame.

If you have a large DataFrame with many rows and print it without to_string(),
Pandas will by default only show the first 5 rows and the last 5 rows.
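The difference between the shortened view and to_string() can be tried without any file on disk. In this sketch a 100-row CSV is built in memory as a stand-in for data.csv; the column names id and value are made up for illustration, and pandas' default display options are assumed.

```python
import io

import pandas as pd

# Build a 100-row CSV in memory (stand-in for a real data.csv).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(100))
df = pd.read_csv(io.StringIO(csv_text))

short_view = str(df)        # the text that print(df) would show
full_view = df.to_string()  # every row, no truncation

print(len(short_view.splitlines()))
print(len(full_view.splitlines()))
```

The full view has one header line plus one line per row, while the short view stays much smaller because the middle rows are replaced by an ellipsis.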

Manipulating Data in Data Frame

1. Adding new column to existing DataFrame in Pandas


There are multiple ways we can do this task.
By declaring a new list as a column
# Import pandas package
import pandas as pd

# Define a dictionary containing Students data


data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data)

# Declare a list that is to be converted into a column


address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Using 'Address' as the column name


# and equating it to the list
df['Address'] = address

# Observe the result


print(df)
Output:

Note: The length of your list must match the length of the index (the number of rows
in the DataFrame), otherwise pandas will raise an error.
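The length rule can be demonstrated with a list that is too short. This is a sketch on the same student data; the two-item list is deliberately wrong.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)

try:
    # Only 2 values for 4 rows, so the assignment is rejected
    df['Address'] = ['Delhi', 'Bangalore']
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```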
By using DataFrame.insert()
It gives the freedom to add a column at any position we like and not just at the end. It
also provides different options for inserting the column values.

# Import pandas package


import pandas as pd

# Define a dictionary containing Students data


data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data)

# Using DataFrame.insert() to add a column


df.insert(2, "Age", [21, 23, 24, 21], True)
# Observe the result
print(df)
Output:
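Since the original output image is not reproduced here, the effect of insert() can be checked by listing the column names afterwards. A minimal sketch on the same data:

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)

# Position 2 means "third column"; insert() changes df itself.
df.insert(2, "Age", [21, 23, 24, 21])

column_order = df.columns.tolist()
print(column_order)
```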

Using Dataframe.assign() method


This method will create a new dataframe with a new column added to the old
dataframe.
# Import pandas package
import pandas as pd

# Define a dictionary containing Students data


data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data)

# Using 'address' (the keyword name) as the column name and equating it to the list


df2 = df.assign(address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])

# Observe the result


print(df2)

Output:
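The key point of assign() is that the original DataFrame is left untouched. The following sketch checks this by comparing the column names of both frames.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)

# assign() returns a new DataFrame; the keyword name becomes the column name.
df2 = df.assign(address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])

print(df.columns.tolist())
print(df2.columns.tolist())
```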
By using a dictionary
We can use a Python dictionary to add a new column in pandas DataFrame. Use an
existing column as the key values and their respective values will be the values for a
new column.
# Import pandas package
import pandas as pd

# Define a dictionary containing Students data


data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Define a dictionary whose keys are values of
# an existing column (Name) and whose values
# are the values for our new column.
address = {'Jai': 'Delhi', 'Princi': 'Bangalore',
           'Gaurav': 'Patna', 'Anuj': 'Chennai'}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Look up each Name in the dictionary and store
# the result under 'Address' as the column name
df['Address'] = df['Name'].map(address)

# Observe the output


print(df)

Pandas head() method is used to return top n (5 by default) rows of a data frame or
series. The head() method returns the headers and a specified number of rows,
starting from the top.
Syntax: Dataframe.head(n=5)
Parameters:
n: integer value, number of rows to be returned
import pandas as pd

# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# calling head() method and
# storing the result in a new variable
data_top = data.head()

# display
data_top

There is also a tail() method for viewing the last rows of the DataFrame.

The tail() method returns the headers and a specified number of rows,
starting from the bottom.

Print the last 5 rows of the DataFrame:

print(df.tail())
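head() and tail() can be tried without downloading anything. This sketch uses a made-up ten-row DataFrame with a single column n.

```python
import pandas as pd

# Ten rows numbered 1..10 (illustrative data only)
df = pd.DataFrame({"n": range(1, 11)})

first_three = df.head(3)   # top 3 rows
last_two = df.tail(2)      # bottom 2 rows
default_head = df.head()   # n defaults to 5

print(first_three)
print(last_two)
print(len(default_head))
```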

Delete rows/columns from DataFrame using Pandas.drop()

Rows or columns can be removed using an index label or column name using this
method.
Syntax:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None,
inplace=False, errors='raise')
Parameters:
labels: String or list of strings referring to row or column names.
axis: int or string value; 0 or 'index' for rows and 1 or 'columns' for columns.
index or columns: Single label or list. index and columns are an alternative to
passing labels together with axis, and cannot be combined with labels.
level: Used to specify the level in case the data frame has a multi-level index.
inplace: Makes the changes in the original data frame if True.
errors: With errors='ignore', labels from the list that do not exist are skipped and
the rest are dropped. With the default 'raise', a missing label raises an error.
Return type: DataFrame with the dropped values removed (None if inplace=True).
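The parameters listed above can be compared side by side on a small table. This is only a sketch: the player labels p1..p3 and the values are made up, and none of the calls use inplace=True, so the original frame stays unchanged.

```python
import pandas as pd

# Small stand-in table; labels and numbers are made up for illustration.
df = pd.DataFrame(
    {"Team": ["A", "B", "C"], "Age": [25, 27, 22], "Weight": [180, 205, 185]},
    index=["p1", "p2", "p3"],
)

rows_dropped = df.drop(["p1", "p3"])                    # axis=0: drop by index label
cols_dropped = df.drop(["Team", "Weight"], axis=1)      # axis=1: drop by column name
same_cols = df.drop(columns=["Team", "Weight"])         # keyword alternative to axis=1
ignored = df.drop(["p1", "missing"], errors="ignore")   # unknown label is skipped

print(rows_dropped.index.tolist())
print(cols_dropped.columns.tolist())
print(ignored.index.tolist())
```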

Dropping Rows by index label

In this code, a list of index labels is passed and the rows corresponding to those
labels are dropped using the .drop() method.
# importing pandas module
import pandas as pd

# making data frame from csv file


data = pd.read_csv("nba.csv", index_col="Name")
print(data.head(5))
Note: index_col sets which column is used as the index of the DataFrame. The default
value is None, in which case pandas creates a default integer index starting from 0.
Output: Data Frame before Dropping values
                         Team  Number  Position   Age  Height  Weight            College     Salary
Name
Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
Jae Crowder    Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
John Holland   Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
R.J. Hunter    Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0
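The effect of index_col can be seen by reading the same data twice, once with and once without it. The sketch below uses a two-row CSV held in memory as a stand-in for nba.csv, with only three of its columns.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for nba.csv (subset of columns, two rows)
csv_text = (
    "Name,Team,Age\n"
    "Avery Bradley,Boston Celtics,25\n"
    "Jae Crowder,Boston Celtics,25\n"
)

default_index = pd.read_csv(io.StringIO(csv_text))                  # index 0, 1, ...
name_index = pd.read_csv(io.StringIO(csv_text), index_col="Name")   # Name is the index

print(default_index.index.tolist())
print(name_index.index.tolist())
```

With Name as the index, rows can be selected or dropped by player name, which is what the drop example relies on.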

Applying the drop function.


# dropping passed values
data.drop(["Avery Bradley", "John Holland", "R.J. Hunter"], inplace =
True)

# display
print(data)

Output: Data Frame after Dropping values


Compared with the earlier output, the new output no longer contains the passed rows.
Those rows were dropped, and the changes were made in the original data frame since
inplace was True. The first rows of the result are:
                         Team  Number  Position   Age  Height  Weight    College      Salary
Name
Jae Crowder    Boston Celtics    99.0        SF  25.0     6-6   235.0  Marquette   6796117.0
Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0        NaN   5000000.0
Amir Johnson   Boston Celtics    90.0        PF  29.0     6-9   240.0        NaN  12000000.0
Jordan Mickey  Boston Celtics    55.0        PF  21.0     6-8   235.0        LSU   1170960.0
Kelly Olynyk   Boston Celtics    41.0         C  25.0     7-0   238.0    Gonzaga   2165160.0
Dropping columns with column name
In this code, the passed columns are dropped using their column names. The axis
parameter is set to 1 since 1 refers to columns.
# importing pandas module
import pandas as pd

# making data frame from csv file


data = pd.read_csv("nba.csv", index_col ="Name" )
print(data.head())

Output: Data Frame before Dropping Columns


                         Team  Number  Position   Age  Height  Weight            College     Salary
Name
Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
Jae Crowder    Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
John Holland   Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
R.J. Hunter    Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0

Applying drop function.


# dropping passed columns
data.drop(["Team", "Weight"], axis = 1, inplace = True)

# display
print(data.head())

Output: Data Frame after Dropping Columns


The new output no longer has the passed columns. Those columns were dropped since
axis was set equal to 1, and the changes were made in the original data frame since
inplace was True.

               Number  Position   Age  Height            College     Salary
Name
Avery Bradley     0.0        PG  25.0     6-2              Texas  7730337.0
Jae Crowder      99.0        SF  25.0     6-6          Marquette  6796117.0
John Holland     30.0        SG  27.0     6-5  Boston University        NaN
R.J. Hunter      28.0        SG  22.0     6-5      Georgia State  1148640.0
Jonas Jerebko     8.0        PF  29.0    6-10                NaN  5000000.0

Pandas Merging, Joining, and Concatenating


A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure
with labelled axes (rows and columns), i.e., data is aligned in a tabular fashion in rows and columns.
We can concatenate, merge, and join DataFrames using different tools: the pd.concat() function
(concat is a pandas function, not a DataFrame method) and the df.merge() and df.join() methods.

Concatenating DataFrame using .concat() :


To concatenate DataFrames, we use the pd.concat() function. It takes a list of DataFrames and
returns a new DataFrame.
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data


data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Define a dictionary containing employee data


data2 = {'Name':['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'],
'Age':[17, 14, 12, 52],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data1,index=[0, 1, 2, 3])

# Convert the dictionary into DataFrame


df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])

print(df, "\n\n", df1)


Now we apply the pd.concat() function to concatenate the two DataFrames
frames = [df, df1]

res1 = pd.concat(frames)
res1
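pd.concat() keeps the index labels of the inputs, which is why the example gives the second frame the labels 4 to 7. When the labels overlap, ignore_index=True renumbers the result. A small sketch with shortened, illustrative frames:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi']}, index=[0, 1])
df1 = pd.DataFrame({'Name': ['Abhi', 'Ayushi']}, index=[0, 1])  # overlapping labels on purpose

kept = pd.concat([df, df1])                            # original labels are kept
renumbered = pd.concat([df, df1], ignore_index=True)   # fresh labels 0..n-1

print(kept.index.tolist())
print(renumbered.index.tolist())
```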

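Concatenating DataFrame using .append()


In older pandas versions, the .append() method concatenates along axis=0, namely the index.
DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current
versions pd.concat() is used to get the same result.

# pandas versions before 2.0 only:
# res = df.append(df1)

# equivalent call that also works on current versions
res = pd.concat([df, df1])
res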
Merging DataFrame

Pandas has options for high-performance in-memory merging and joining. When
we need to combine very large DataFrames, joins serve as a powerful way to
perform these operations swiftly. Joins can only be done on two DataFrames at a
time, denoted as the left and right tables. The key is the common column that the two
DataFrames will be joined on. It is good practice to use keys which have unique
values throughout the column to avoid unintended duplication of row values. Pandas
provides a single function, merge(), as the entry point for all standard database join
operations between DataFrame objects.
There are four basic ways to handle the join (inner, left, right, and outer), depending
on which rows must retain their data.
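The advice about unique keys can be illustrated with a deliberately bad case. In this sketch the key K0 appears twice in both frames (made-up data), so every left K0 row is paired with every right K0 row. The validate argument of merge() is one way to have pandas check the expectation for us.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K0', 'K1'],
                     'Name': ['Jai', 'Princi', 'Gaurav']})
right = pd.DataFrame({'key': ['K0', 'K0', 'K1'],
                      'Address': ['Nagpur', 'Kanpur', 'Allahabad']})

# K0 matches 2 x 2 times and K1 once, so 3 input rows become 5 output rows.
merged = pd.merge(left, right, on='key')
print(len(merged))

try:
    # Ask pandas to verify that keys are unique on both sides.
    pd.merge(left, right, on='key', validate='one_to_one')
    validation_error = None
except pd.errors.MergeError as exc:
    validation_error = str(exc)

print(validation_error)
```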

Code #1 : Merging a dataframe with one unique key combination


# importing pandas module
import pandas as pd

# Define a dictionary containing employee data


data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],}

# Define a dictionary containing employee data


data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data1)
# Convert the dictionary into DataFrame
df1 = pd.DataFrame(data2)

print(df, "\n\n", df1)

Now we are using .merge() with one unique key combination


# using .merge() function
res = pd.merge(df, df1, on='key')

res

Output:

Code #2: Merging dataframe using multiple join keys.


# importing pandas module
import pandas as pd

# Define a dictionary containing employee data


data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
'key1': ['K0', 'K1', 'K0', 'K1'],
'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],}

# Define a dictionary containing employee data


data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
'key1': ['K0', 'K0', 'K0', 'K0'],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data1)

# Convert the dictionary into DataFrame


df1 = pd.DataFrame(data2)

print(df, "\n\n", df1)

Now we merge dataframe using multiple keys


# merging dataframe using multiple keys
res1 = pd.merge(df, df1, on=['key', 'key1'])
res1
Output :
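As the output image is not reproduced here, the result can be reasoned out: merge() defaults to an inner join, so only rows whose (key, key1) pair occurs in both frames survive. The sketch below rebuilds the two frames and checks which rows remain.

```python
import pandas as pd

data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K1', 'K0', 'K1'],
         'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32]}
data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K0', 'K0', 'K0'],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}
df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2)

# Only (K0, K0) and (K2, K0) appear in both frames.
res1 = pd.merge(df, df1, on=['key', 'key1'])
print(res1)
```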

Merging dataframe using the how argument:


The how argument of merge() specifies which keys are to be included in the
resulting table. If a key combination does not appear in either the left
or right tables, the values in the joined table will be NA. Here is a summary of the
how options and their SQL equivalent names:

MERGE METHOD    JOIN NAME           DESCRIPTION
left            LEFT OUTER JOIN     Use keys from left frame only
right           RIGHT OUTER JOIN    Use keys from right frame only
outer           FULL OUTER JOIN     Use union of keys from both frames
inner           INNER JOIN          Use intersection of keys from both frames

# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K1', 'K0', 'K1'],
         'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age':[27, 24, 22, 32],}

# Define a dictionary containing employee data
data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K0', 'K0', 'K0'],
         'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data1)

# Convert the dictionary into DataFrame
df1 = pd.DataFrame(data2)

print(df, "\n\n", df1)


Now we set how = 'left' in order to use keys from left frame only.
# using keys from left frame

res = pd.merge(df, df1, how='left', on=['key', 'key1'])

res

Output :

Now we set how = 'right' in order to use keys from right frame only.
# using keys from right frame

res1 = pd.merge(df, df1, how='right', on=['key', 'key1'])


res1

Output :

Now we set how = 'outer' in order to get union of keys from dataframes.
# getting union of keys

res2 = pd.merge(df, df1, how='outer', on=['key', 'key1'])

res2

Output :

Now we set how = 'inner' in order to get intersection of keys from dataframes.
# getting intersection of keys

res3 = pd.merge(df, df1, how='inner', on=['key', 'key1'])


res3

Output :
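The four how options can be compared in one loop by counting the rows each one produces on the same two frames. This is a sketch; the row_counts dictionary is just a helper introduced here for the comparison.

```python
import pandas as pd

data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K1', 'K0', 'K1'],
         'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32]}
data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K0', 'K0', 'K0'],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}
df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2)

row_counts = {}
for how in ['left', 'right', 'outer', 'inner']:
    res = pd.merge(df, df1, how=how, on=['key', 'key1'])
    row_counts[how] = len(res)

print(row_counts)
```

Two key pairs match in both frames, two exist only on the left and two only on the right, which explains why outer gives the most rows and inner the fewest.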

Joining DataFrame

To join DataFrames, we use the .join() method. It combines the columns of two
potentially differently-indexed DataFrames into a single result DataFrame, matching
rows by their index labels.
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age':[27, 24, 22, 32]}

# Define a dictionary containing employee data
data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
         'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data1,index=['K0', 'K1', 'K2', 'K3'])

# Convert the dictionary into DataFrame
df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4'])

print(df, "\n\n", df1)


Now we use the .join() method to join the DataFrames


# joining dataframe

res = df.join(df1)

res

Output :

Now we use how = 'outer' in order to get union


# getting union

res1 = df.join(df1, how='outer')

res1
Output :
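With the output images missing, the two joins can be checked in code. By default join() keeps the rows of the calling (left) frame, so K1 gets missing values and K4 is left out; how='outer' keeps the labels of both frames. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32]},
                  index=['K0', 'K1', 'K2', 'K3'])
df1 = pd.DataFrame({'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
                    'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons']},
                   index=['K0', 'K2', 'K3', 'K4'])

res = df.join(df1)                 # default: keep the left frame's index
res1 = df.join(df1, how='outer')   # union of both indexes

print(res)
print(res1)
```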

Joining dataframe using the on argument:


join() takes an optional on argument, which may be a column name or a list of
column names in the calling DataFrame. The values in that column are matched against
the index of the passed DataFrame.
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age':[27, 24, 22, 32],
         'Key':['K0', 'K1', 'K2', 'K3']}

# Define a dictionary containing employee data
data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
         'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data1)

# Convert the dictionary into DataFrame
df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4'])

print(df, "\n\n", df1)


Now we use .join() with the on argument


# using on argument in join
res2 = df.join(df1, on='Key')

res2

Output :
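The on='Key' join can be verified in the same way. Each value in the Key column of the left frame is looked up in the index of the right frame; K1 has no match there, so its Address is missing. A sketch rebuilding the frames:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32],
                   'Key': ['K0', 'K1', 'K2', 'K3']})
df1 = pd.DataFrame({'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
                    'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons']},
                   index=['K0', 'K2', 'K3', 'K4'])

# Match df['Key'] against the index of df1; the left frame's rows are kept.
res2 = df.join(df1, on='Key')
print(res2)
```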

Joining singly-indexed DataFrame with multi-indexed DataFrame :


When a singly-indexed DataFrame is joined with a multi-indexed DataFrame, the name
of the index of the singly-indexed frame is matched against a level name of the
multi-indexed frame.
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data1 = {'Name':['Jai', 'Princi', 'Gaurav'],
         'Age':[27, 24, 22]}

# Define a dictionary containing employee data
data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kanpur'],
         'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data1, index=pd.Index(['K0', 'K1', 'K2'], name='key'))

index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                   ('K2', 'Y2'), ('K2', 'Y3')],
                                  names=['key', 'Y'])

# Convert the dictionary into DataFrame
df1 = pd.DataFrame(data2, index= index)

print(df, "\n\n", df1)


Now we join singly indexed dataframe with multi-indexed dataframe


# joining singly indexed with

# multi indexed

result = df.join(df1, how='inner')

result

Output :
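For the multi-index case, the sketch below rebuilds both frames and inspects the result. Since K2 occurs twice in the key level of the multi-indexed frame, the Gaurav row is expected to appear twice in the joined result.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav'],
                   'Age': [27, 24, 22]},
                  index=pd.Index(['K0', 'K1', 'K2'], name='key'))

index = pd.MultiIndex.from_tuples(
    [('K0', 'Y0'), ('K1', 'Y1'), ('K2', 'Y2'), ('K2', 'Y3')],
    names=['key', 'Y'])
df1 = pd.DataFrame({'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kanpur'],
                    'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons']},
                   index=index)

# The index name 'key' of df is matched against the 'key' level of df1.
result = df.join(df1, how='inner')

print(result)
print(list(result.index.names))
```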
