
EX.NO.: 1
DATA SCIENCE PACKAGES IN PYTHON
DATE:

AIM:
To download, install, and explore the features of the NumPy, SciPy,
Jupyter, statsmodels, and pandas packages.

REQUIREMENTS:
Anaconda and its related components.
o Install Anaconda (it installs all packages you need and all other tools
mentioned below).
o For writing and executing code, use notebooks in JupyterLab for
exploratory and interactive computing, and Spyder or Visual Studio
Code for writing scripts and packages.
o Use Anaconda Navigator to manage your packages and start
JupyterLab, Spyder, or Visual Studio Code.
Alternatively, if you prefer pip/PyPI:
o Install Python from python.org, Homebrew, or your Linux package
manager.
o Use Poetry as the most well-maintained tool that provides a
dependency resolver and environment management capabilities in a
similar fashion as conda does.
PYTHON PACKAGE MANAGEMENT
Managing packages is a challenging problem, and, as a result, there are lots of
tools. For web and general-purpose Python development there’s a whole host of
tools complementary with pip. For high-performance computing (HPC), Spack is
worth considering. For most NumPy users though, conda and pip are the two
most popular tools.
What is a Python package?
A Python package is a collection of modules, where each module groups functions
and classes designed for a specific task. You can import a package, module, or
submodule with the “import” keyword followed by its name, e.g. import numpy
(NumPy is a package used for scientific computing).
If you want to create your own package, you can write your modules in Python files
(implemented, for example, using OOP), bundle them as a package, and publish it on
pypi.org so that everyone can install it.
Anaconda already comes with a large set of Python packages that may be useful to
you; any that you do not need can be removed.
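As a small illustration of the import forms described above, the sketch below imports a whole package under an alias, then one submodule, then one function. It assumes NumPy is installed; the variable names are only illustrative.

```python
import numpy as np              # whole package under its conventional alias
from numpy import linalg        # one submodule of the package
from numpy.linalg import norm   # one function from that submodule

v = np.array([3.0, 4.0])

# All three spellings call the same function.
length_a = np.linalg.norm(v)
length_b = linalg.norm(v)
length_c = norm(v)
print(length_a, length_b, length_c)
```

Which form to use is mostly a readability choice: the alias form keeps the origin of every name visible, while the function import is shorter at the call site.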
Managing Python Packages:
Python packages can be managed with two utilities, pip or conda. These package
managers help with the installation, removal, and management of Python packages.
The main difference is where the packages come from: pip is the default package
management system for Python and installs packages from PyPI, while conda installs
packages from the Anaconda repositories and can also manage environments and
non-Python dependencies. You can install a required package with either of the
following commands:
$ pip install package_name
$ conda install package_name
Which one should you prefer, pip or conda?
Packages that are specific to data science and machine learning are preferably
installed using conda, while pip can be used for general package installations.
To remove a package, use the same tool that installed it:
$ pip uninstall package_name
$ conda remove package_name
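After installing or removing packages, it can be useful to confirm from Python which versions are present. The sketch below is one possible check using only the standard library (Python 3.8 or later); the helper name installed_version and the list of package names are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the packages named in the aim of this exercise.
for name in ["numpy", "scipy", "pandas", "statsmodels"]:
    found = installed_version(name)
    print(name, "->", found if found is not None else "not installed")
```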

EX.NO.: 2
WORKING WITH NUMPY ARRAYS
DATE:

AIM:
To perform Basic NumPy operations on the following:
1. To convert an array to a float type
2. To add a border (filled with 0's) around an existing array
3. To convert a list and tuple into arrays
4. To append values to the end of an array
5. To convert an array to a float type
6. To create an empty and a full array
7. To convert a list and tuple into arrays
8. To find the real and imaginary parts of an array of complex numbers

PRE-REQUISITES:
Jupyter Notebook/Spyder, necessary packages.

PROCEDURE:
1. Start Jupyter Notebook.
2. Import the NumPy package.
3. Initialize a NumPy array.
4. Use appropriate syntax to work on the NumPy array.
5. Use the necessary functions.
6. If necessary, use a text file to load and work with NumPy arrays.
7. Display the result.
8. Save the file.
9. Stop the Jupyter notebook.
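Step 6 of the procedure mentions loading NumPy arrays from a text file. A minimal sketch of that step follows, assuming whitespace-separated numeric data; an in-memory io.StringIO object stands in for a real file so the example runs anywhere, and the sample values are made up.

```python
import io

import numpy as np

# Stand-in for a small text file; a real file path can be passed instead.
text_file = io.StringIO("1 2 3\n4 5 6\n")

# np.loadtxt splits on whitespace and parses values as float by default.
data = np.loadtxt(text_file)
print(data)
print(data.shape, data.dtype)
```

For comma-separated files, np.loadtxt accepts a delimiter argument such as delimiter=",".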

2.1 Write a NumPy program to convert an array to a float type
In [2]:
import numpy as np

a = [1, 2, 3, 4]
print("Original array")
print(a)
# np.asfarray(a) gives the same result on NumPy 1.x but was removed in NumPy 2.0
x = np.asarray(a, dtype=float)
print("Array converted to a float type:")
print(x)

Original array
[1, 2, 3, 4]
Array converted to a float type:
[1. 2. 3. 4.]
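For comparison, here is a sketch of two common ways to obtain a float array: requesting the dtype when the array is built, and calling astype on an existing array. The variable names are illustrative only.

```python
import numpy as np

a = [1, 2, 3, 4]

x = np.asarray(a, dtype=float)   # build a float array directly from the list
ints = np.array(a)               # integer array built from the same list
y = ints.astype(float)           # astype returns a new float copy; ints is unchanged

print(x.dtype, y.dtype)
print(np.array_equal(x, y))
```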

2.2 Write a NumPy program to add a border (filled with 0's) around an existing array
In [5]:
import numpy as np

x = np.ones((3, 3))
print("Original array:")
print(x)
print("0 on the border and 1 inside in the array")
x = np.pad(x, pad_width=1, mode='constant', constant_values=0)
print(x)

Original array:
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
0 on the border and 1 inside in the array
[[0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0.]]
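The same border can also be built by hand, which shows what the padding does. The sketch below allocates a larger zero array and copies the original into its interior; the name bordered is an assumption for illustration.

```python
import numpy as np

x = np.ones((3, 3))

# Allocate an array two rows and two columns larger, filled with zeros,
# then copy the original into the interior.
bordered = np.zeros((x.shape[0] + 2, x.shape[1] + 2))
bordered[1:-1, 1:-1] = x

print(bordered)

# Same result as np.pad with a constant border of width 1.
same = np.array_equal(
    bordered, np.pad(x, pad_width=1, mode="constant", constant_values=0)
)
print(same)
```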

2.3 Write a NumPy program to convert a list and tuple into arrays
List to array:
[1 2 3 4 5 6 7 8]
Tuple to array:
[[8 4 6]
 [1 2 3]]

2.4 Write a NumPy program to append values to the end of an array
In [7]:
import numpy as np

x = [10, 20, 30]
print("Original array:", x)
# Without an axis argument, np.append flattens both inputs before joining them
x = np.append(x, [[40, 50, 60], [70, 80, 90]])
print("After append values to the end of the array:", x)

Original array: [10, 20, 30]
After append values to the end of the array: [10 20 30 40 50 60 70 80 90]
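The result above is one-dimensional even though the appended values were given as two rows. A short sketch of the difference the axis argument makes follows; the starting array is written as a single row here so that rows can be stacked under it.

```python
import numpy as np

x = np.array([[10, 20, 30]])
new_rows = [[40, 50, 60], [70, 80, 90]]

# Without axis, both inputs are flattened before joining.
flat = np.append(x, new_rows)

# With axis=0, the new rows are stacked under the existing one;
# the shapes must agree along the other axis.
rows = np.append(x, new_rows, axis=0)

print(flat.shape)
print(rows.shape)
```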

2.5 Write a NumPy program to convert an array to a float type
In [11]:
import numpy as np

a = [1, 2, 3, 4]
print("Original array")
print(a)
# np.asfarray(a) gives the same result on NumPy 1.x but was removed in NumPy 2.0
x = np.asarray(a, dtype=float)
print("Array converted to a float type:")
print(x)

Original array
[1, 2, 3, 4]
Array converted to a float type:
[1. 2. 3. 4.]

2.6 Write a NumPy program to create an empty and a full array

Empty Array
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
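No listing survives for this exercise, only part of its output. The sketch below is one way the task could be done, assuming np.empty and np.full were intended; the shapes and the fill value 6 are hypothetical choices, not taken from the source.

```python
import numpy as np

# np.empty only allocates memory; the values it shows are whatever was there.
empty_arr = np.empty((3, 4))
print("Empty array (contents are arbitrary):")
print(empty_arr)

# np.full sets every element to the given fill value.
full_arr = np.full((3, 3), 6)
print("Full array:")
print(full_arr)
```

Because np.empty does not initialize its contents, an "empty" array may happen to print as zeros, but that should not be relied on; np.zeros is the call that guarantees zeros.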

2.7 Write a NumPy program to convert a list and tuple into arrays

In [10]:
import numpy as np

my_list = [1, 2, 3, 4, 5, 6, 7, 8]
print("Original data:", my_list)
print("List to array: ")
print(np.asarray(my_list))
my_tuple = ([8, 4, 6], [1, 2, 3])
print("Tuple to array: ")
print(np.asarray(my_tuple))

Original data: [1, 2, 3, 4, 5, 6, 7, 8]
List to array:
[1 2 3 4 5 6 7 8]
Tuple to array:
[[8 4 6]
 [1 2 3]]

2.8 Write a NumPy program to find the real and imaginary parts of an array of complex numbers
In [17]:
import numpy as np

x = np.sqrt([2+5j])
y = np.sqrt([4+7j])
print("Original array:x ", x)
print("Original array:y ", y)
print("Real part of the array:")
print(x.real)
print(y.real)
print("Imaginary part of the array:")
print(x.imag)
print(y.imag)

Original array:x [1.92160933+1.30099285j]
Original array:y [2.45583568+1.42517679j]
Real part of the array:
[1.92160933]
[2.45583568]
Imaginary part of the array:
[1.30099285]
[1.42517679]
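As a further illustration, the sketch below uses a small array with easy values to show the attribute and function forms side by side and to check that the two parts recombine into the original numbers. The sample values are made up.

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j])

real_part = z.real          # attribute form
imag_part = np.imag(z)      # function form of the same kind of access
magnitude = np.abs(z)       # sqrt(real**2 + imag**2)

print(real_part)
print(imag_part)
print(magnitude)

# real + 1j * imag reconstructs the original values.
rebuilt = real_part + 1j * imag_part
print(np.array_equal(rebuilt, z))
```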

2.9 Write a NumPy program to convert a Python dictionary to a NumPy ndarray.

In [11
import numpy as np
from ast import literal_eval

udict = """{"column0":{"a":1,"b":0.0,"c":0.0,"d":2.0},
"column1":{"a":3.0,"b":1,"c":0.0,"d":-1.0},
"column2":{"a":4,"b":1,"c":5.0,"d":-1.0},
"column3":{"a":3.0,"b":-1.0,"c":-1.0,"d":-1.0}
}"""
t = literal_eval(udict)
print("\nOriginal dictionary:")
print(t)
print("Type: ", type(t))
# The conversion step is missing from the source listing; the lines below are
# reconstructed from the printed output (result is a hypothetical name).
result = np.array([[row[key] for key in ("a", "b", "c", "d")] for row in t.values()])
print("\nndarray:")
print(result)
print("Type: ", type(result))

Original dictionary:
{'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0}, 'column1': {'a': 3.0,
'b': 1, 'c': 0.0, 'd': -1.0}, 'column2': {'a': 4, 'b': 1, 'c': 5.0, 'd': -1.0},
'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0}}
Type: <class 'dict'>
ndarray:
[[ 1. 0. 0. 2.]
 [ 3. 1. 0. -1.]
 [ 4. 1. 5. -1.]
 [ 3. -1. -1. -1.]]
Type: <class 'numpy.ndarray'>

2.10 Write a NumPy program to search the index of a given array in another given array.

In [118…
import numpy as np

np_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
test_array = np.array([4, 5, 6])
print("Original Numpy array:")
print(np_array)
print("Searched array:")
print(test_array)
print("Index of the searched array in the original array:")
print(np.where((np_array == test_array).all(1))[0])

Original Numpy array:
[[ 1 2 3]
 [ 4 5 6]
 [ 7 8 9]
 [10 11 12]]
Searched array:
[4 5 6]
Index of the searched array in the original array:
[1]
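The one-line search expression packs three steps together. The sketch below separates them so that each intermediate result can be inspected; the intermediate names are illustrative.

```python
import numpy as np

np_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
test_array = np.array([4, 5, 6])

# Step 1: broadcasting compares test_array against every row, element by element.
elementwise = np_array == test_array        # shape (4, 3), dtype bool

# Step 2: a row matches only if all of its elements matched.
row_matches = elementwise.all(axis=1)       # shape (4,)

# Step 3: np.where on a 1-D mask returns a tuple holding one index array.
indices = np.where(row_matches)[0]

print(row_matches)
print(indices)
```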

EX.NO.: 3
WORKING WITH PANDAS DATA FRAMES
DATE:

AIM:
To create a pandas DataFrame and work with it to perform the following:
1. To create and display a DataFrame from specified dictionary data that has index labels.
2. To select the rows where the number of attempts in the examination is greater than 2.
3. To get the first 3 rows of a given DataFrame.
4. To select the rows where the score is missing.

PRE-REQUISITES:
Jupyter Notebook/Spyder, necessary packages.

PROCEDURE:
1. Start Jupyter Notebook.
2. Import the NumPy and pandas packages.
3. Create a pandas DataFrame.
4. Use appropriate syntax to work on pandas DataFrames.
5. Use the necessary functions.
6. If necessary, load datasets into pandas DataFrames and manipulate them.
7. Display the result.
8. Save the file.
9. Stop the Jupyter notebook.

3.1 Write a Pandas program to create and display a DataFrame from a specified dictionary data which has the index labels.

Sample Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

In [106…
import pandas as pd
import numpy as np

exam_data = {'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'name': ['Anastasia', 'Dima', 'Katherine',
                      'James', 'Emily', 'Michael', 'Matthew',
                      'Laura', 'Kevin', 'Jonas'],
             'qualify': ['yes', 'no', 'yes',
                         'no', 'no', 'yes',
                         'yes', 'no', 'no', 'yes'],
             'score': [12.5, 9, 16.5, np.nan,
                       9, 20, 14.5, np.nan, 8, 19]}
labels = ['a', 'b', 'c', 'd', 'e',
          'f', 'g', 'h', 'i', 'j']

sampledata = pd.DataFrame(exam_data, index=labels)
print(sampledata)

   attempts       name qualify  score
a         1  Anastasia     yes   12.5
b         3       Dima      no    9.0
c         2  Katherine     yes   16.5
d         3      James      no    NaN
e         2      Emily      no    9.0
f         3    Michael     yes   20.0
g         1    Matthew     yes   14.5
h         1      Laura      no    NaN
i         2      Kevin      no    8.0
j         1      Jonas     yes   19.0
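Once a DataFrame has index labels, rows can be addressed either by label or by position. The sketch below shows both on a three-row subset of the sample data; the subset is chosen only to keep the example short.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Anastasia", "Dima", "Katherine"],
     "score": [12.5, 9, 16.5]},
    index=["a", "b", "c"],
)

row_by_label = df.loc["b"]       # row selected by its index label
row_by_position = df.iloc[1]     # the same row selected by integer position
cell = df.loc["b", "score"]      # a single cell: row label, column name

print(row_by_label)
print(cell)
```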

3.2 Write a Pandas program to select the rows where the number of attempts in the examination is greater than 2.
Sample Python dictionary data and list labels: exam_data and labels as given in 3.1.

In [22]:
import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima',
                      'Katherine', 'James',
                      'Emily', 'Michael',
                      'Matthew', 'Laura',
                      'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan,
                       9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no',
                         'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f',
          'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data, index=labels)
print("Number of attempts in the examination is greater than 2:")
print(df[df['attempts'] > 2])

Number of attempts in the examination is greater than 2:
      name  score  attempts qualify
b     Dima    9.0         3      no
d    James    NaN         3      no
f  Michael   20.0         3     yes
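The same boolean-mask idea extends to more than one condition. A minimal sketch follows, assuming a second, hypothetical condition on the score column; the four rows are a subset of the sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Dima", "James", "Michael", "Kevin"],
     "score": [9.0, None, 20.0, 8.0],
     "attempts": [3, 3, 3, 2]},
    index=["b", "d", "f", "i"],
)

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than > and >=.
mask = (df["attempts"] > 2) & (df["score"] >= 10)
selected = df[mask]
print(selected)
```

A missing score compares as False, so the row for James is excluded by the second condition even though its attempts value passes the first.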

3.3 Write a Pandas program to get the first 3 rows of a given DataFrame.
Sample Python dictionary data and list labels: exam_data and labels as given in 3.1.

In [107…
import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima',
                      'Katherine', 'James',
                      'Emily', 'Michael',
                      'Matthew', 'Laura',
                      'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan,
                       9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes',
                         'no', 'no', 'yes',
                         'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e',
          'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data, index=labels)
print("First three rows of the data frame:")
print(df.iloc[:3])

First three rows of the data frame:
        name  score  attempts qualify
a  Anastasia   12.5         1     yes
b       Dima    9.0         3      no
c  Katherine   16.5         2     yes

3.4 Write a Pandas program to select the rows where the score is missing, i.e. is NaN.
Sample Python dictionary data and list labels: exam_data and labels as given in 3.1.

In [108…
import pandas as pd
import numpy as np

exam_data = {'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'name': ['Anastasia', 'Dima',
                      'Katherine', 'James',
                      'Emily', 'Michael',
                      'Matthew', 'Laura',
                      'Kevin', 'Jonas'],
             'qualify': ['yes', 'no', 'yes',
                         'no', 'no', 'yes',
                         'yes', 'no', 'no', 'yes'],
             'score': [12.5, 9, 16.5, np.nan,
                       9, 20, 14.5, np.nan, 8, 19]}
labels = ['a', 'b', 'c', 'd', 'e',
          'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data, index=labels)
print("Rows where score is missing:")
print(df[df['score'].isnull()])

Rows where score is missing:
   attempts   name qualify  score
d         3  James      no    NaN
h         1  Laura      no    NaN
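Selecting the rows with a missing score is often followed by counting, dropping, or filling them. The sketch below shows those three follow-up steps on a small subset of the sample data; the fill value 0.0 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["Katherine", "James", "Laura"],
     "score": [16.5, np.nan, np.nan]},
    index=["c", "d", "h"],
)

missing_count = df["score"].isna().sum()         # isna is an alias of isnull
without_missing = df.dropna(subset=["score"])    # drop rows whose score is NaN
filled = df.fillna({"score": 0.0})               # or replace NaN with a chosen value

print(missing_count)
print(without_missing)
print(filled)
```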

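3.5 Write a Pandas program to get the numeric representation of an array by identifying distinct values of a given column of a dataframe.
In [120…
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alberto Franco', 'Gino Mcneill',
             'Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill'],
    'Date_Of_Birth ': ['17/05/2002', '16/02/1999',
                       '25/09/1998', '11/05/2002', '15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})
# The rest of the listing is missing from the source; the lines below are
# reconstructed from the printed output, assuming pd.factorize was used
# (codes and uniques are hypothetical names).
print("Original DataFrame:")
print(df)
codes, uniques = pd.factorize(df['Name'])
print("\nNumeric representation of an array by identifying distinct values:")
print(codes)
print(uniques)

Original DataFrame:
             Name Date_Of_Birth    Age
0  Alberto Franco     17/05/2002  18.5
1    Gino Mcneill     16/02/1999  21.2
2     Ryan Parkes     25/09/1998  22.5
3    Eesha Hinton     11/05/2002  22.0
4    Gino Mcneill     15/09/1997  23.0

Numeric representation of an array by identifying distinct values:
[0 1 2 3 1]
Index(['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton'], dtype='object')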
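The output above has the shape of what pd.factorize returns: one integer code per row plus the distinct values. Assuming that function, the sketch below shows how the two pieces relate, using the names from the exercise.

```python
import pandas as pd

names = pd.Series(["Alberto Franco", "Gino Mcneill", "Ryan Parkes",
                   "Eesha Hinton", "Gino Mcneill"])

codes, uniques = pd.factorize(names)
print(codes)      # one integer per row
print(uniques)    # distinct values, in order of first appearance

# Indexing the distinct values with the codes recovers the original column.
restored = uniques[codes]
print(list(restored) == list(names))
```

The repeated name receives the same code both times, which is why the code 1 appears twice.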

3.6 Write a Pandas program to check for inequality of two given DataFrames.
In [20]:
import pandas as pd

df1 = pd.DataFrame({'W': [68, 75, 86, 80, None],
                    'X': [78, 85, None, 80, 86],
                    'Y': [84, 94, 89, 83, 86],
                    'Z': [86, 97, 96, 72, 83]})

df2 = pd.DataFrame({'W': [78, 75, 86, 80, None],
                    'X': [78, 85, 96, 80, 76],
                    'Y': [84, 84, 89, 83, 86],
                    'Z': [86, 97, 96, 72, 83]})

print("Original DataFrames:")
print("\ndata frame 1:\n", df1)
print("\ndata frame 2:\n", df2)
print("\nCheck for inequality of the said dataframes:")
print(df1.ne(df2))

Original DataFrames:
data frame 1:
      W     X   Y   Z
0  68.0  78.0  84  86
1  75.0  85.0  94  97
2  86.0   NaN  89  96
3  80.0  80.0  83  72
4   NaN  86.0  86  83

data frame 2:
      W   X   Y   Z
0  78.0  78  84  86
1  75.0  85  84  97
2  86.0  96  89  96
3  80.0  80  83  72
4   NaN  76  86  83

Check for inequality of the said dataframes:
       W      X      Y      Z
0   True  False  False  False
1  False  False   True  False
2  False   True  False  False
3  False  False  False  False
4   True   True  False  False
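In the result above, row 4 of column W is reported as unequal even though both frames hold NaN there, because NaN never compares equal to NaN. The sketch below illustrates this on a two-row example and shows one possible way to ignore such pairs; the names diff and real_diff are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"W": [68, np.nan], "X": [78, 85]})
df2 = pd.DataFrame({"W": [68, np.nan], "X": [78, 85]})

diff = df1.ne(df2)
print(diff)                  # the NaN position is reported as unequal

# equals() treats NaN values in the same location as equal.
same_frames = df1.equals(df2)
print(same_frames)

# Cells that still differ once NaN-vs-NaN pairs are ignored.
real_diff = diff & ~(df1.isna() & df2.isna())
print(real_diff.any().any())
```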

3.7 Write a Pandas program to get first n records of a DataFrame
In [21]:
import pandas as pd

d = {'col1': [1, 2, 3, 4, 7, 11],
     'col2': [4, 5, 6, 9, 5, 0],
     'col3': [7, 5, 8, 12, 1, 11]}

df = pd.DataFrame(data=d)
print("Original DataFrame")
print(df)
print("\nFirst 3 rows of the said DataFrame':")
df1 = df.head(3)
print(df1)

Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     5
2     3     6     8
3     4     9    12
4     7     5     1
5    11     0    11

First 3 rows of the said DataFrame':
   col1  col2  col3
0     1     4     7
1     2     5     5
2     3     6     8

3.8 Write a Pandas program to select all columns, except one given column in a DataFrame.
import pandas as pd

d = {'col1': [1, 2, 3, 4, 7],
     'col2': [4, 5, 6, 9, 5],
     'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
print("Original DataFrame")
print(df)
print("\nAll columns except 'col3':")
df = df.loc[:, df.columns != 'col3']
print(df)

Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11

All columns except 'col3':
   col1  col2
0     1     4
1     2     5
2     3     6
3     4     9
4     7     5
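Another way to leave out a column is DataFrame.drop. The sketch below compares it with the boolean-mask form on a shortened version of the same data; both return a new DataFrame and leave the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 12]})

by_mask = df.loc[:, df.columns != "col3"]   # boolean mask over the column labels
by_drop = df.drop(columns=["col3"])         # drop returns a new DataFrame

print(list(by_drop.columns))
print(by_mask.equals(by_drop))
print(list(df.columns))                     # the original still has all three columns
```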
