Fds Lab 1-3 Exp
EX.NO.: 1
DATA SCIENCE PACKAGES IN PYTHON
DATE:
AIM:
To download, install, and explore the features of the NumPy, SciPy,
Jupyter, statsmodels, and Pandas packages.
REQUIREMENTS:
Anaconda and its related components.
o Install Anaconda (it installs all packages you need and all other tools
mentioned below).
o For writing and executing code, use notebooks in JupyterLab for
exploratory and interactive computing, and Spyder or Visual Studio
Code for writing scripts and packages.
o Use Anaconda Navigator to manage your packages and start
JupyterLab, Spyder, or Visual Studio Code.
Alternatively, if you prefer pip/PyPI:
o Install Python from python.org, Homebrew, or your Linux package
manager.
o Use Poetry as the most well-maintained tool that provides a
dependency resolver and environment management capabilities in a
similar fashion as conda does.
PYTHON PACKAGE MANAGEMENT
Managing packages is a challenging problem, and, as a result, there are lots of
tools. For web and general-purpose Python development there’s a whole host of
tools complementary with pip. For high-performance computing (HPC), Spack is
worth considering. For most NumPy users though, conda and pip are the two
most popular tools.
What is a Python package?
A Python package is a collection of modules, and each module groups functions and
classes designed to solve a specific task. You can load a package, module, or
submodule using the keyword “import” followed by its name, e.g. import
numpy (NumPy is a package used for scientific computation).
If you want to create your own package, you can write your modules in one or more
Python files (for example using object-oriented programming), publish the package
on pypi.org, and then everyone will be able to install it.
Anaconda already ships with many Python packages that may be useful to you;
you can remove any that you do not need.
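As a minimal sketch of the import forms described above, assuming NumPy is already installed (the array values are illustrative and not part of the lab sheet):

```python
import numpy as np            # import a whole package under a short alias
from numpy import linalg      # import a single submodule of that package

values = np.array([3.0, 4.0])     # illustrative data only
length = linalg.norm(values)      # Euclidean length of the vector
print(length)
```

The alias np is only a convention; any valid name could be used after "as".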
Managing Python Packages:
Python package management can be done with two utilities, pip or conda. These
package managers help with the installation, removal, and management of Python
packages. The main difference between the two is that conda installs the packages
available from the Anaconda distribution and its channels, while pip is the default
package installer for Python and installs from PyPI. You can install a required
package with either of the following commands:
$ pip install package_name
$ conda install package_name
Which one should you prefer, pip or conda?
Packages that are specific to Data Science and Machine Learning are preferably
installed using conda, while pip can be used for general package installations.
To delete a package, use the same manager that installed it:
$ pip uninstall package_name
$ conda remove package_name
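After installation, it can be useful to confirm which of the packages named in the aim are present. The following is a sketch using only the standard library; the helper name installed_versions is hypothetical, and the distribution names in the list are assumptions about how each package is published:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its version string, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names are assumptions; adjust them to your environment.
packages = ["numpy", "scipy", "pandas", "statsmodels", "jupyter"]
report = installed_versions(packages)
for name, ver in report.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

This reads package metadata from the current environment, so it reports the same result whether a package was installed with pip or with conda.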
EX.NO.: 2
WORKING WITH NUMPY ARRAYS
DATE:
AIM:
To perform Basic NumPy operations on the following:
1. To convert an array to a float type
2. To add a border (filled with 0's) around an existing array
3. To convert a list and tuple into arrays
4. To append values to the end of an array
5. To convert an array to a float type
6. To create an empty and a full array
7. To convert a list and tuple into arrays
8. To find the real and imaginary parts of an array of complex numbers
PRE-REQUISITES:
Jupyter Notebook/Spyder, Necessary Packages.
PROCEDURE:
1. Start Jupyter Notebook.
2. Import the NumPy package.
3. Initialize a NumPy array.
4. Use appropriate syntax to work on the NumPy array.
5. Use necessary functions.
6. If necessary, use a text file to load and work with NumPy arrays.
7. Display the Result.
8. Save the file.
9. Stop the Jupyter notebook.
2.1 Write a NumPy program to convert an array to a float type
In [2]:
import numpy as np
a = [1, 2, 3, 4]
print("Original array")
print(a)
# np.asfarray was removed in NumPy 2.0; np.asarray with dtype=float gives the same result
x = np.asarray(a, dtype=float)
print("Array converted to a float type:")
print(x)
Original array
[1, 2, 3, 4]
Array converted to a float type:
[1. 2. 3. 4.]
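Task 2 in the aim (adding a border of 0's around an existing array) has no listing in this section. A minimal sketch using np.pad, assuming a 3x3 array of ones as the input (the input is illustrative, not from the lab sheet):

```python
import numpy as np

x = np.ones((3, 3))     # illustrative input array
# pad_width=1 adds one row/column on every side; constant_values=0 fills it with zeros
bordered = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print("Original array:")
print(x)
print("Array with a border of 0's:")
print(bordered)
```

The result has shape (5, 5): the original 3x3 block sits in the middle, surrounded by zeros.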
2.3 Write a NumPy program to convert a list and tuple into arrays
List to array:
[1 2 3 4 5 6 7 8]
Tuple to array:
[[8 4 6]
 [1 2 3]]
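The listing that produced the output above is not present in this section. One way to reproduce it is sketched below; the variable names and the use of np.asarray are assumptions, while the values are taken from the printed output:

```python
import numpy as np

my_list = [1, 2, 3, 4, 5, 6, 7, 8]
my_tuple = ([8, 4, 6], [1, 2, 3])

list_arr = np.asarray(my_list)      # 1-D array from a list
tuple_arr = np.asarray(my_tuple)    # 2-D array from a tuple of lists

print("List to array:")
print(list_arr)
print("Tuple to array:")
print(tuple_arr)
```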
Original array
[1, 2, 3, 4]
Array converted to a float type:
[1. 2. 3. 4.]
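Task 4 in the aim (appending values to the end of an array) has no listing in this section. A minimal sketch using np.append, with illustrative values that are not from the lab sheet:

```python
import numpy as np

x = np.array([10, 20, 30])          # illustrative input array
y = np.append(x, [40, 50, 60])      # returns a new array; x itself is unchanged
print("Original array:")
print(x)
print("After appending values to the end of the array:")
print(y)
```

Because np.append copies the data each time, building a large array by repeated appends is slow; collecting values in a list and converting once is usually preferable.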
Empty Array
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
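The listing for task 6 (creating an empty and a full array) is not present in this section. A minimal sketch, assuming a 3x4 integer shape to match the output above and an arbitrary fill value of 6:

```python
import numpy as np

# np.empty allocates memory without initializing it, so the printed
# values are arbitrary; they are not guaranteed to be zeros.
empty_arr = np.empty((3, 4), dtype=int)
print("Empty Array")
print(empty_arr)

# np.full creates an array of the given shape filled with one value.
full_arr = np.full((3, 3), 6)       # fill value 6 is an assumption
print("Full Array")
print(full_arr)
```

If zeros are actually required, np.zeros is the reliable choice.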
2.7 Write a NumPy program to convert a list and tuple into arrays
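In [11]:
import numpy as np
from ast import literal_eval
# Convert a dictionary of column dictionaries to a NumPy ndarray
udict = """{"column0":{"a":1,"b":0.0,"c":0.0,"d":2.0},
"column1":{"a":3.0,"b":1,"c":0.0,"d":-1.0},
"column2":{"a":4,"b":1,"c":5.0,"d":-1.0},
"column3":{"a":3.0,"b":-1.0,"c":-1.0,"d":-1.0}
}"""
t = literal_eval(udict)
print("\nOriginal dictionary:")
print(t)
print("Type: ", type(t))
# One row per column entry, with values taken in the key order a, b, c, d
result_nparra = np.array([[v[j] for j in ['a', 'b', 'c', 'd']] for k, v in t.items()])
print("\nndarray:")
print(result_nparra)
print("Type: ", type(result_nparra))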
Original dictionary:
{'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0}, 'column1': {'a': 3.0,
'b': 1, 'c': 0.0, 'd': -1.0}, 'column2': {'a': 4, 'b': 1, 'c': 5.0, 'd': -1.0},
'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0}}
Type: <class 'dict'>
ndarray:
[[ 1.  0.  0.  2.]
 [ 3.  1.  0. -1.]
 [ 4.  1.  5. -1.]
 [ 3. -1. -1. -1.]]
Type: <class 'numpy.ndarray'>
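Task 8 in the aim (finding the real and imaginary parts of an array of complex numbers) has no listing in this section. A minimal sketch using the .real and .imag attributes, with illustrative values that are not from the lab sheet:

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j])      # illustrative complex values
real_part = z.real                  # real components as a float array
imag_part = z.imag                  # imaginary components as a float array

print("Original array:", z)
print("Real part:", real_part)
print("Imaginary part:", imag_part)
```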
2.10 Write a NumPy program to search the index of a given array in another given array.
In [118…
import numpy as np
np_array = np.array([[1,2,3], [4,5,6] , [7,8,9], [10, 11, 12]])
test_array = np.array([4,5,6])
print("Original Numpy array:")
print(np_array)
print("Searched array:")
print(test_array)
print("Index of the searched array in the original array:")
print(np.where((np_array == test_array).all(1))[0])
Original Numpy array:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Searched array:
[4 5 6]
Index of the searched array in the original array:
[1]
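The one-line search expression in 2.10 combines three steps. The sketch below splits them apart using the same arrays; the intermediate variable names are introduced here only for illustration:

```python
import numpy as np

np_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
test_array = np.array([4, 5, 6])

# Step 1: broadcasting compares test_array against every row, element by element
elementwise = np_array == test_array        # boolean array of shape (4, 3)

# Step 2: a row matches only if all of its elements are equal
row_matches = elementwise.all(axis=1)       # one boolean per row

# Step 3: np.where returns a tuple of index arrays; [0] selects the row indices
indices = np.where(row_matches)[0]
print(indices)
```

If the searched array does not occur as a row, indices is an empty array rather than an error.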
EX.NO.: 3
WORKING WITH PANDAS DATA FRAMES
DATE:
AIM:
To create a Pandas DataFrame and work with it to perform the
following:
1. To create and display a DataFrame from specified dictionary data which has the index labels.
2. To select the rows where the number of attempts in the examination is greater than 2.
3. To get the first three rows of a given DataFrame.
4. To select the rows where the score is missing.
5.
PRE-REQUISITES:
Jupyter Notebook/Spyder, Necessary Packages.
PROCEDURE:
1. Start Jupyter Notebook.
2. Import the NumPy and Pandas packages.
3. Create a Pandas DataFrame.
4. Use appropriate syntax to work on Pandas DataFrames.
5. Use necessary functions.
6. If necessary, use datasets to load and manipulate Pandas DataFrames.
7. Display the Result.
8. Save the file.
9. Stop the Jupyter notebook.
3.1 Write a Pandas program to create and display a DataFrame from specified dictionary data which has the index labels
Sample Python dictionary data and list labels:
In [22]:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
sampledata = pd.DataFrame(exam_data, index=labels)
print(sampledata)
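3.2 Write a Pandas program to select the rows where the number of attempts in the examination is greater than 2
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no',
                         'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)
print("Number of attempts in the examination is greater than 2:")
# The comparison builds a boolean mask; indexing with it keeps the True rows
print(df[df['attempts'] > 2])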
Number of attempts in the examination is greater than 2:
      name  score  attempts qualify
b     Dima    9.0         3      no
d    James    NaN         3      no
f  Michael   20.0         3     yes
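The row selection above relies on boolean masking. The sketch below shows the mask as a separate step on a four-row subset of the exam data; the subset and the variable names mask, selected, and selected_loc are chosen here for illustration:

```python
import numpy as np
import pandas as pd

# First four rows of the sample data, kept small for illustration
df = pd.DataFrame(
    {'name': ['Anastasia', 'Dima', 'Katherine', 'James'],
     'score': [12.5, 9, 16.5, np.nan],
     'attempts': [1, 3, 2, 3]},
    index=['a', 'b', 'c', 'd'])

mask = df['attempts'] > 2       # boolean Series, one value per row
selected = df[mask]             # keeps only the rows where mask is True
selected_loc = df.loc[mask]     # equivalent, label-based form

print(mask)
print(selected)
```

Several conditions can be combined with & and |, with each condition wrapped in parentheses.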
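3.3 Write a Pandas program to get the first three rows of a given DataFrame
In [107…
import pandas as pd
import numpy as np
# exam_data and labels are the sample dictionary and list from 3.1
df = pd.DataFrame(exam_data, index=labels)
print("First three rows of the data frame:")
print(df.iloc[:3])
First three rows of the data frame:
        name  score  attempts qualify
a  Anastasia   12.5         1     yes
b       Dima    9.0         3      no
c  Katherine   16.5         2     yes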
3.4 Write a Pandas program to select the rows where the score is missing, i.e. is NaN.
Sample Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
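In [108…
import pandas as pd
import numpy as np
exam_data = {'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'name': ['Anastasia', 'Dima',
                      'Katherine', 'James',
                      'Emily', 'Michael',
                      'Matthew', 'Laura',
                      'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)
print("Rows where score is missing:")
# isnull() marks NaN entries as True; the mask keeps only those rows
print(df[df['score'].isnull()])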
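To show what the missing-value test returns, the sketch below applies it to the score column alone, using the scores and labels from the sample data. The variable names are chosen here for illustration, and isna() is used as the equivalent alias of isnull():

```python
import numpy as np
import pandas as pd

scores = pd.Series([12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
                   index=list('abcdefghij'))

is_missing = scores.isna()              # True where the score is NaN
missing_labels = scores[is_missing].index.tolist()
missing_count = int(is_missing.sum())   # True counts as 1 when summed

print("Labels with a missing score:", missing_labels)
print("Number of missing scores:", missing_count)
```

A comparison such as scores == np.nan would not work here, because NaN never compares equal to anything, including itself.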
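import pandas as pd
# Compare two DataFrames element-wise for inequality
df1 = pd.DataFrame({'W':[68,75,86,80,None],
                    'X':[78,85,None,80,86],
                    'Y':[84,94,89,83,86],
                    'Z':[86,97,96,72,83]})
df2 = pd.DataFrame({'W':[78,75,86,80,None],
                    'X':[78,85,96,80,76],
                    'Y':[84,84,89,83,86],
                    'Z':[86,97,96,72,83]})
print("Original DataFrames:")
print("\ndata frame 1:\n",df1)
print("\ndata frame 2:\n",df2)
print("\nCheck for inequality of the said dataframes:")
print(df1.ne(df2))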
Original DataFrames:
data frame 1:
W X Y Z
0 68.0 78.0 84 86
1 75.0 85.0 94 97
2 86.0 NaN 89 96
3 80.0 80.0 83 72
4 NaN 86.0 86 83
data frame 2:
W X Y Z
0 78.0 78 84 86
1 75.0 85 84 97
2 86.0 96 89 96
3 80.0 80 83 72
4 NaN 76 86 83
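The result of the inequality check is not shown in this section. One point worth knowing before running it: DataFrame.ne treats two NaN values in the same position as unequal. The sketch below illustrates this on a single column taken from rows 0, 1, and 4 of column W; the names left, right, diff, and strict are introduced here for illustration:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'W': [68.0, 75.0, np.nan]})
right = pd.DataFrame({'W': [78.0, 75.0, np.nan]})

diff = left.ne(right)       # element-wise "not equal"; NaN vs NaN gives True
print(diff)

# To ignore positions where both values are missing, mask them out explicitly
both_missing = left.isna() & right.isna()
strict = diff & ~both_missing
print(strict)
```

Whether matching NaN positions should count as a difference depends on the analysis, so the masking step is optional.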
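import pandas as pd
d = {'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
print("Original DataFrame")
print(df)
print("\nFirst 3 rows of the said DataFrame':")
# head(3) returns the first three rows as a new DataFrame
df1 = df.head(3)
print(df1)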
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     5
2     3     6     8
3     4     9    12
4     7     5     1
5    11     0    11
First 3 rows of the said DataFrame':
   col1  col2  col3
0     1     4     7
1     2     5     5
2     3     6     8