Lab Manual ET Lab III

EMERGING TECHNOLOGY LAB III
PRACTICAL NO: 1
TITLE: Introduction to Data Science
AIM: To study Data Science and the use of Pandas for data science.
PRIOR CONCEPT:
Data Science is a process that combines statistics, scientific methods, and algorithms to
derive meaningful insights from large volumes of data. It is an interdisciplinary field
founded on Statistics, Mathematics, Computer Science, and Business. Because it spans so many
areas, it can be difficult to pin down exactly what Data Science is and why data scientist is
one of the most sought-after professions today.
Pandas is an open-source library that provides high-performance data manipulation in Python.
The name Pandas is derived from "panel data", an econometrics term for multidimensional data
sets. It is used for data analysis in Python, and Wes McKinney began developing it in 2008.
Data analysis requires a lot of processing, such as restructuring, cleaning, and merging.
Several tools are available for fast data processing, such as NumPy, SciPy, Cython, and
Pandas. We prefer Pandas because working with it is fast, simple, and more expressive than
with the other tools. Pandas is built on top of the NumPy package, which means NumPy is
required to use Pandas.
Before Pandas, Python was capable of data preparation but provided only limited support for
data analysis. Pandas filled this gap and extended Python's data analysis capabilities. It
supports the five significant steps required for processing and analyzing data, irrespective
of the origin of the data: load, manipulate, prepare, model, and analyze.
Key Features of Pandas:
• It has a fast and efficient DataFrame object with default and customized indexing.
• It supports reshaping and pivoting of data sets.
• It supports group-by operations for aggregations and transformations.
• It provides data alignment and integrated handling of missing data.
• It provides time series functionality.
Department of Computer Science & Engineering 1
• It processes data sets in a variety of formats, such as matrix data, heterogeneous
tabular data, and time series.
• It handles multiple operations on data sets, such as subsetting, slicing, filtering,
grouping, re-ordering, and re-shaping.
• It integrates with other libraries such as SciPy and scikit-learn.
• It provides fast performance, and you can speed it up even more by using Cython.
Python Pandas Data Structures:
Pandas provides two data structures for processing data, Series and DataFrame, which are
discussed below:
1) Series: It is a one-dimensional labeled array that is capable of storing values of
various data types. The row labels of a Series are called the index. A list, tuple, or
dictionary can easily be converted into a Series using the pd.Series() constructor. A Series
cannot contain multiple columns. Its main parameter is:
Data: It can be any list, dictionary, or scalar value.
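The conversions described above can be sketched as follows. This is a minimal illustration, and the variable names and sample values are made up for the example.

```python
import pandas as pd

# A Series built from a list gets the default integer index 0, 1, 2, ...
from_list = pd.Series([10, 20, 30])

# A tuple behaves the same way as a list.
from_tuple = pd.Series((1.5, 2.5))

# Dictionary keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# A scalar is repeated once for every label in the supplied index.
from_scalar = pd.Series(7, index=["x", "y", "z"])

print(from_list)
print(from_dict.index.tolist())   # ['a', 'b', 'c']
print(from_scalar.tolist())       # [7, 7, 7]
```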
2) DataFrame: It is the most widely used data structure of pandas. It is a two-dimensional
array with labeled axes (rows and columns). A DataFrame is the standard way to store data
and has two different indexes, i.e., a row index and a column index. It has the following
properties:
• The columns can be of heterogeneous types such as int, bool, and so on.
• It can be seen as a dictionary of Series in which both the rows and columns are
indexed. The column labels are denoted as “columns” and the row labels as “index”.
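As a small sketch of these properties, the example below builds a DataFrame with columns of different types and inspects its two indexes. The column names and values are invented for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column; the columns hold different types.
df = pd.DataFrame(
    {"name": ["Asha", "Ravi"], "age": [21, 23], "passed": [True, False]},
    index=["r1", "r2"],
)

print(df.index.tolist())     # ['r1', 'r2']
print(df.columns.tolist())   # ['name', 'age', 'passed']
print(df.dtypes)             # one dtype per column

# Selecting a single column returns a Series that shares the row index.
age_column = df["age"]
print(type(age_column).__name__)   # Series
```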
How to install pandas using pip?

Step-1
First, head over to https://www.python.org and click on Downloads in the navigation bar.
Step-2
Be sure to download the latest version of Python.
Step-3
Run the downloaded installer. In the window that opens, click on ‘Install Now’.
Step-4
After the installation finishes, it is recommended to choose the option to disable the path
length limit to avoid any problems with your Python installation.
Step-5
Now that Python is installed, open a terminal or command prompt from which you can install
Pandas. Go to the search bar on your desktop and search for cmd. An application called
Command Prompt should show up. Click to start it.
Step-6
Type in the command “pip install pandas”. Pip is the package installer for Python, and it
is installed alongside recent Python distributions.
Step-7
Wait for the downloads to finish. Once they are done, you will be able to use Pandas inside
your Python programs on Windows.
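One simple way to check that the installation worked is to import the library and print its version. This is only a quick sanity check, and the version number printed depends on what pip installed.

```python
import pandas as pd

# If the import succeeds, pandas is installed for this Python interpreter.
version = pd.__version__
print("pandas version:", version)
```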

CONCLUSION:
QUESTIONS:-
1. What is Data Science?

2. Explain the key features of Pandas.
3. Write down the major applications of data science.


PRACTICAL NO: 2
TITLE: Python Data Series
AIM: Write a pandas program to add, subtract, multiply & divide two pandas series.
PRIOR CONCEPT:
A Pandas Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, Python objects, etc.). The axis labels are collectively called the
index. A Pandas Series is comparable to a single column in an Excel sheet. Labels need not be
unique but must be of a hashable type. The object supports both integer and label-based
indexing and provides a host of methods for performing operations involving the index. In
practice, a Pandas Series is often created by loading data from existing storage, such as an
SQL database, a CSV file, or an Excel file. A Pandas Series can also be created from a list,
a dictionary, or a scalar value.
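Because a Series is labeled, arithmetic between two Series matches values by label rather than by position. The sketch below uses made-up labels to show this. The add() call with fill_value is one possible way to treat a missing label as 0.

```python
import pandas as pd

# Two Series with custom labels that only partly overlap.
s1 = pd.Series([2, 4, 6], index=["a", "b", "c"])
s2 = pd.Series([1, 3, 5], index=["b", "c", "d"])

# Values are added label by label; labels found in only one Series give NaN.
total = s1 + s2
print(total)

# Treat a missing label as 0 instead of producing NaN.
filled = s1.add(s2, fill_value=0)
print(filled)
```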
Python Code :
import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])
ds = ds1 + ds2
print("Add two Series:")
print(ds)
print("Subtract two Series:")
ds = ds1 - ds2
print(ds)
print("Multiply two Series:")
ds = ds1 * ds2
print(ds)

print("Divide Series1 by Series2:")
ds = ds1 / ds2
print(ds)
OUTPUT:

CONCLUSION:
QUESTIONS:-
1. Write code to create a simple Pandas Series from a list.

2. How to create our own labels in Pandas?

PRACTICAL NO: 3
TITLE: Python data frames
AIM: Write a Pandas program to create and display a DataFrame from a specified dictionary
data which has the index labels.
PRIOR CONCEPT :
A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a
table with rows and columns. Pandas uses the loc attribute to return one or more specified
row(s). If your data sets are stored in a file, Pandas can load them into a DataFrame.
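The two ideas above, loading data and selecting rows with loc, can be sketched together. To keep the example self-contained, the CSV content is held in a string (an assumption made for illustration). With a real file, a path would be passed to read_csv instead.

```python
import io
import pandas as pd

# CSV text standing in for a file on disk (illustrative data).
csv_text = "name,score\nAsha,12.5\nRavi,9.0\nMeera,16.5\n"
df = pd.read_csv(io.StringIO(csv_text))

# loc with a single label returns that row as a Series.
first_row = df.loc[0]

# loc with a list of labels returns a DataFrame with those rows.
two_rows = df.loc[[0, 2]]

print(first_row)
print(two_rows)
```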
PYTHON CODE:
import pandas as pd
import numpy as np
# Column data; np.nan marks a missing score
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
# Build the DataFrame with the custom row labels and display it
df = pd.DataFrame(exam_data, index=labels)
print(df)

OUTPUT:
CONCLUSION:
QUESTIONS:-
1. What is the difference between Dataset and Dataframe?

2. How to load files into a DataFrame?


PRACTICAL NO.: 4
TITLE: Insert data frame
AIM: Write a pandas program to insert a new column in existing data frame.
PRIOR CONCEPT:
There are multiple ways to insert a new column into an existing DataFrame.
1. By declaring a new list as a column.
2. By using DataFrame.insert(): It gives the freedom to add a column at any position we
like and not just at the end. It also provides different options for inserting the column
values.
3. By using DataFrame.assign(): This method creates a new DataFrame with the new column
added to the columns of the old DataFrame.
4. By using a dictionary: We can use a Python dictionary to add a new column to a
pandas DataFrame. The values of an existing column act as the keys, and the
corresponding dictionary values become the values of the new column.
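The four approaches listed above can be sketched on a small DataFrame. The column names and values here are invented for illustration. For the dictionary approach, Series.map is used as one common way to look up each value of an existing column in the dictionary.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi", "Meera"], "score": [12.5, 9.0, 16.5]})

# 1. A list assigned to a new column name is appended as the last column.
df["attempts"] = [1, 3, 2]

# 2. insert() places the new column at a chosen position (here position 1).
df.insert(1, "city", ["Pune", "Nagpur", "Mumbai"])

# 3. assign() returns a new DataFrame and leaves df itself unchanged.
df_with_flag = df.assign(passed=df["score"] >= 10)

# 4. A dictionary keyed on the values of an existing column.
color_by_name = {"Asha": "Red", "Ravi": "Blue", "Meera": "Green"}
df["color"] = df["name"].map(color_by_name)

print(df)
print(df_with_flag)
```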
PYTHON CODE:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',

'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Original rows:")
print(df)

color = ['Red','Blue','Orange','Red','White','White','Blue','Green','Green','Red']
df['color'] = color
print("\nNew DataFrame after inserting the 'color' column")
print(df)
OUTPUT:

CONCLUSION:
QUESTIONS:-
1. How to add up columns in python?

2. How to add multiple columns to a DataFrame in Pandas

PRACTICAL NO: 5
TITLE: Display Pandas Index
AIM: Write a pandas program to display the default index & set a column as an index in a
given data frame.
PRIOR CONCEPT:
To get the index of a Pandas DataFrame, use the DataFrame.index property. It returns an
Index object representing the index of the DataFrame. Individual index labels can be accessed
with any looping technique in Python; for example, the elements of an Index object can be
printed using a for loop.
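A minimal sketch of reading the index and looping over it is given below. The row labels and weights are sample values chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({"weight": [35, 32, 33]}, index=["t1", "t2", "t3"])

# The index property returns an Index object.
print(df.index)

# An Index is iterable, so a for loop visits each label in order.
labels = []
for label in df.index:
    labels.append(label)
    print(label)

# Positional access works as it does on a list.
first_label = df.index[0]
```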
PYTHON CODE:
import pandas as pd
df = pd.DataFrame({
'school_code': ['s001','s002','s003','s001','s002','s004'],
'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill',

'David Parkes'],
'date_Of_Birth':
['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'weight': [35, 32, 33, 30, 31, 32],
'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4'],
't_id':['t1', 't2', 't3', 't4', 't5', 't6']})
print("Default Index:")
print(df.head(10))
print("\nt_id as new Index:")
df1 = df.set_index('t_id')
print(df1)
print("\nReset the index:")
df2 = df1.reset_index(inplace=False)
print(df2)
OUTPUT:

CONCLUSION:
QUESTIONS:-
1. How could we get first row index in Pandas?

2. Write the syntax to select a specific index in Pandas.

PRACTICAL NO.: 6
TITLE: Create Index labels
AIM: Write a pandas program to create index labels using 64-bit integers and using
floating-point numbers in a given data frame.
PRIOR CONCEPT :
Indexing in pandas simply means selecting particular rows and columns of data from a
DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the
rows and all of the columns, or some of each of the rows and columns. Int64Index is a special
case of Index with purely integer labels. In pandas 2.0 and later, Int64Index and
Float64Index no longer exist as separate classes; a plain Index with an integer or
floating-point dtype is used instead. The parameters are:
data: array-like (1-dimensional).
dtype: NumPy dtype (default: int64).
copy: bool. Make a copy of the input ndarray.
name: object. Name to be stored in the index.
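The parameters listed above can be tried directly on pd.Index, as in the sketch below. The index name roll_no and the sample values are assumptions made for the example.

```python
import pandas as pd

# Integer labels stored as 64-bit integers, with a name for the index.
int_index = pd.Index([1, 2, 3], dtype="int64", name="roll_no")

# Floating-point labels stored as 64-bit floats.
float_index = pd.Index([0.1, 0.2, 0.3], dtype="float64")

# An Index object can be passed as the index of a DataFrame.
df = pd.DataFrame({"weight": [35, 32, 33]}, index=int_index)

print(df.index.dtype)      # int64
print(float_index.dtype)   # float64
print(df)
```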
PYTHON CODE:
import pandas as pd
print("Create an Int64Index:")
# Row labels are 64-bit integers
df_i64 = pd.DataFrame({
'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill',
'David Parkes'],
'date_Of_Birth':
['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'weight': [35, 32, 33, 30, 31, 32],
'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']},
index=[1, 2, 3, 4, 5, 6])
print(df_i64)
print("\nView the Index:")
print(df_i64.index)
print("\nFloating-point labels using Float64Index:")
# Row labels are floating-point numbers
df_f64 = pd.DataFrame({
'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill',
'David Parkes'],
'date_Of_Birth':
['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'weight': [35, 32, 33, 30, 31, 32]},
index=[.1, .2, .3, .4, .5, .6])
print(df_f64)
print("\nView the Index:")
print(df_f64.index)
OUTPUT:

CONCLUSION :
QUESTIONS:-
1. What are the index labels in Pandas?

2. Write the four types of data labels.

PRACTICAL NO: 7
TITLE: Pandas string
AIM: Write a pandas program to convert all the string values to upper and lower case in a
given pandas series. Also find the length of each string.
PRIOR CONCEPT :
Pandas provides a set of string functions which make it easy to operate on string data. Most
importantly, these functions skip missing/NaN values: a missing value stays missing in the
result instead of raising an error.
PYTHON CODE:
import pandas as pd
import numpy as np
s = pd.Series(['X', 'Y', 'Z', 'Aaba', 'Baca', np.nan, 'CABA', None, 'bird', 'horse', 'dog'])
print("Original series:")
print(s)
print("\nConvert all string values of the said Series to upper case:")
print(s.str.upper())
print("\nConvert all string values of the said Series to lower case:")
print(s.str.lower())
print("\nLength of the string values of the said Series:")
print(s.str.len())

OUTPUT:

CONCLUSION:
QUESTIONS:-
1. What is the pandas data type of string data?
2. How do you find the string value of a dataframe?

PRACTICAL NO: 8
TITLE: Pandas regular expression
AIM: Write a pandas program to remove whitespace, left-sided whitespace & right-sided
whitespace from the string values.
PRIOR CONCEPT:
Pandas provides three methods to handle whitespace (including newlines) in text data. As the
names suggest, str.lstrip() removes whitespace from the left side of a string, str.rstrip()
removes it from the right side, and str.strip() removes it from both sides. Since these
pandas methods have the same names as Python's built-in string methods, they are called
through the .str accessor, which applies them to every element of a Series or Index.
PYTHON CODE:
import pandas as pd
color1 = pd.Index([' Green', 'Black ', ' Red ', 'White', ' Pink '])
print("Original series:")
print(color1)
print("\nRemove whitespace")
print(color1.str.strip())
print("\nRemove left sided whitespace")
print(color1.str.lstrip())
print("\nRemove Right sided whitespace")
print(color1.str.rstrip())

OUTPUT:
CONCLUSION:
QUESTIONS:-
1. How to get rid of leading and trailing spaces in Pandas?
2. How to check if a string has whitespaces or not?

PRACTICAL NO.: 9
TITLE: Joining data frames

AIM: Write a pandas program to join the two given dataframes along rows & assign all data.
PRIOR CONCEPT :
Pandas provides various facilities for easily combining together Series or DataFrame with
various kinds of set logic for the indexes and relational algebra functionality in the case of
join / merge-type operations. In addition, pandas also provides utilities to compare two Series
or DataFrame and summarize their differences.
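As a brief sketch of row-wise combination, the example below stacks two small DataFrames with pd.concat. The data is invented for illustration. It also shows that the original row labels are kept unless ignore_index=True is passed.

```python
import pandas as pd

a = pd.DataFrame({"student_id": ["S1", "S2"], "marks": [200, 210]})
b = pd.DataFrame({"student_id": ["S3", "S4"], "marks": [190, 222]})

# Default behaviour: rows are stacked and each frame keeps its own labels.
stacked = pd.concat([a, b])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True renumbers the rows of the result from 0.
renumbered = pd.concat([a, b], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```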
PYTHON CODE:
import pandas as pd
student_data1 = pd.DataFrame({
'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
'marks': [200, 210, 190, 222, 199]})
# Second DataFrame with the same columns
student_data2 = pd.DataFrame({
'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William',
'Madeeha Preston'],
'marks': [201, 200, 198, 219, 201]})
print("Original DataFrames:")
print(student_data1)
print("-------------------------------------")
print(student_data2)
print("\nJoin the said two dataframes along rows:")
# concat stacks the rows of the second DataFrame below the first
result_data = pd.concat([student_data1, student_data2])
print(result_data)
OUTPUT:

CONCLUSION:
QUESTIONS:-
1. Explain the concat( ) function in Pandas.

2. What is difference between joining and merging in pandas DataFrame?

PRACTICAL NO. : 10
TITLE: Merging data frames.
AIM:- Write a pandas program to append a list of dictionaries or series to an existing
dataframe & display the combined data.
PRIOR CONCEPT :
The merge() method combines two DataFrames into a new DataFrame using the specified join
method(s); the original DataFrames are left unchanged. Pandas provides a single function,
merge, as the entry point for all standard database join operations between DataFrame objects.
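A minimal sketch of merge as a database-style join is shown below. The two DataFrames are invented for illustration, and student_id is chosen as the join key.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": ["S1", "S2", "S3"],
    "name": ["Danniella Fenton", "Ryder Storey", "Bryce Jensen"],
})
marks = pd.DataFrame({
    "student_id": ["S2", "S3", "S4"],
    "marks": [210, 190, 222],
})

# Inner join: keep only the student_id values present in both DataFrames.
inner = pd.merge(students, marks, on="student_id", how="inner")
print(inner)

# Left join: keep every row of students; unmatched marks become NaN.
left = pd.merge(students, marks, on="student_id", how="left")
print(left)
```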
PYTHON CODE:-
import pandas as pd
student_data1 = pd.DataFrame({
'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
'marks': [200, 210, 190, 222, 199]})
s6 = pd.Series(['S6', 'Scarlette Fisher', 205], index=['student_id', 'name', 'marks'])
dicts = [{'student_id': 'S6', 'name': 'Scarlette Fisher', 'marks': 203},
{'student_id': 'S7', 'name': 'Bryce Jensen', 'marks': 207}]
print("Original DataFrames:")
print(student_data1)
print("\nSeries:")
print(s6)
print("\nList of dictionaries:")
print(dicts)
# DataFrame.append() was removed in pandas 2.0, so the list of dictionaries is
# converted to a DataFrame and concatenated instead
combined_data = pd.concat([student_data1, pd.DataFrame(dicts)], ignore_index=True, sort=False)
print("\nCombined Data:")
print(combined_data)

OUTPUT:

CONCLUSION:
QUESTIONS:-
1. How to merge 3 DataFrames in pandas Python?
2. How to merge a list of DataFrames in pandas?
3. Which are the 3 main ways of combining DataFrames together?.

PRACTICAL NO. : 11
TITLE: Pandas Time Series.
AIM:- Write a pandas program to create a date from a given year, month, and day & another
date from a given string format.
PRIOR CONCEPT :
A time series is a sequence of data points that occur in sequential order over a given time
period. Values measured or observed over time form a time series structure. Pandas' time
series tools are very useful when data is timestamped. Timestamp is the pandas equivalent of
Python's datetime. It is the type used for the entries that make up a DatetimeIndex and other
time-series-oriented data structures in pandas. The simplest time series is a Series
indexed by timestamps.
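The pandas types named above can be sketched as follows. This is an illustrative example with made-up dates and values, and it uses pandas' own Timestamp, to_datetime, and date_range.

```python
import pandas as pd

# A single point in time, built from year, month, and day.
ts = pd.Timestamp(year=2020, month=12, day=25)

# to_datetime parses a date string into a Timestamp.
parsed = pd.to_datetime("2021-01-01")

# date_range builds a DatetimeIndex; a Series indexed by it is a time series.
dates = pd.date_range("2021-01-01", periods=3, freq="D")
series = pd.Series([10, 20, 30], index=dates)

# Look up the value recorded on a given day.
value = series.loc[pd.Timestamp("2021-01-02")]
print(ts, parsed, value)
```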
PYTHON CODE :-
from datetime import datetime
date1 = datetime(year=2020, month=12, day=25)
print("Date from a given year, month, day:")
print(date1)
from dateutil import parser
date2 = parser.parse("1st of January, 2021")
print("\nDate from a given string formats:")
print(date2)
OUTPUT:

CONCLUSION:
QUESTIONS:-
1. How does pandas handle time series data?
2. Which are the three data structures to work with the time series in Pandas?

PRACTICAL NO. : 12
TITLE: Pandas grouping aggregate.
AIM:- Write a Pandas program to split the following dataframe by school code and get mean,
min, and max value of age for each school.
PRIOR CONCEPT :
Aggregation in pandas applies a mathematical or logical operation to a dataset and returns a
summary of the result. Aggregation can be used to summarize columns of a dataset, for example
by getting the sum, minimum, or maximum of a particular column. The function used for
aggregation is agg(), and its parameter is the function (or functions) we want to apply.
Pandas' GroupBy is a powerful and versatile feature. It allows you to split your data into
separate groups and perform computations on each group for better analysis.
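The split-then-aggregate idea can be sketched on a tiny data set. The values below are invented for illustration. Selecting a single column before agg() gives a result with one column per aggregation function.

```python
import pandas as pd

df = pd.DataFrame({
    "school_code": ["s001", "s002", "s001", "s002"],
    "age": [12, 13, 14, 12],
})

# Split the rows by school_code, then summarize the age column of each group.
summary = df.groupby("school_code")["age"].agg(["mean", "min", "max"])
print(summary)
```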
PYTHON CODE:-
import pandas as pd
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
# school_code values are assumed to match the data set used in Practical 5
student_data = pd.DataFrame({
'school_code': ['s001','s002','s003','s001','s002','s004'],
'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill',
'David Parkes'],
'date_Of_Birth':
['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'age': [12, 12, 13, 13, 14, 12],
'height': [173, 192, 186, 167, 151, 159],
'weight': [35, 32, 33, 30, 31, 32]},
index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])
print("Original DataFrame:")
print(student_data)
print('\nMean, min, and max value of age for each value of the school:')
# Group the rows by school code and aggregate the age column
grouped_single = student_data.groupby('school_code').agg({'age': ['mean', 'min', 'max']})
print(grouped_single)
OUTPUT:

CONCLUSION:
QUESTIONS:-
1. Which functions are used in the aggregation?
2. Does pandas GroupBy return a Series? Explain.
