Lab2.2 Kritika

Experiment 2.2
AIM: Study different basic functions of the Pandas library
1. Create a series of 6 different branches of CSE
In [1]:
import pandas as pd
In [2]:
branches = [
"Artificial Intelligence and Machine Learning (AI/ML)",
"Data Science and Analytics",
"Computer Systems Engineering",
"Software Engineering",
"Cybersecurity",
"Human-Computer Interaction (HCI)"
]
branches_series = pd.Series(branches)
branches_series
Out[2]: 0 Artificial Intelligence and Machine Learning (...
1 Data Science and Analytics
2 Computer Systems Engineering
3 Software Engineering
4 Cybersecurity
5 Human-Computer Interaction (HCI)
dtype: object
2. Create a DataFrame using the following data items and complete the following tasks.
Name of the dataframe = Yourname1
Regd No Name DBE Mark DAA Mark Result Grade
S001 Rohan 68 54 Pass A
S002 Rahul 78 65 Fail C
S003 Seema 45 23 Pass E
S003 Puja 34 78 Fail C
S004 Priya 25 90 Pass O
S005 Rohan 67 65 Pass A
S006 Guduli 89 34 Fail C
i. Display the first 5 rows
ii. Display the last 5 rows
iii. Display the statistical description
iv. Display the transpose of Statistical description
v. Store the statistical description into a new dataframe for further usage
vi. Display the Regd No and Result of first 3 students.
vii. Display the Name of last 5 students
viii. Delete the feature Grade
ix. Display the number of rows
x. Display the number of features
xi. Display the dimension of the dataframe
xii. Display the name of each feature
xiii. Display the Regd No and DBE Mark of those students whose Result = Pass
xiv. Display the DBE Mark, DAA Mark, and Regd No of those students whose Name = Rohan and Result = Pass
In [3]:
import pandas as pd
# Creating the DataFrame

data = {
"Regd No": ["S001", "S002", "S003", "S003", "S004", "S005", "S006"],
"Name": ["Rohan", "Rahul", "Seema", "Puja", "Priya", "Rohan", "Guduli"],
"DBE Mark": [68, 78, 45, 34, 25, 67, 89],
"DAA Mark": [54, 65, 23, 78, 90, 65, 34],
"Result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail"],
"Grade": ["A", "C", "E", "C", "O", "A", "C"]
}
Yourname1 = pd.DataFrame(
    data,
    columns=["Regd No", "Name", "DBE Mark", "DAA Mark", "Result", "Grade"],
)
Yourname1
Out[3]: Regd No Name DBE Mark DAA Mark Result Grade
0 S001 Rohan 68 54 Pass A
1 S002 Rahul 78 65 Fail C
2 S003 Seema 45 23 Pass E
3 S003 Puja 34 78 Fail C
4 S004 Priya 25 90 Pass O
5 S005 Rohan 67 65 Pass A
6 S006 Guduli 89 34 Fail C
In [4]:
Yourname1.head()
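Out[4]: Regd No Name DBE Mark DAA Mark Result Grade
0 S001 Rohan 68 54 Pass A
1 S002 Rahul 78 65 Fail C
2 S003 Seema 45 23 Pass E
3 S003 Puja 34 78 Fail C
4 S004 Priya 25 90 Pass O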
In [5]:
Yourname1.tail()
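Out[5]: Regd No Name DBE Mark DAA Mark Result Grade
2 S003 Seema 45 23 Pass E
3 S003 Puja 34 78 Fail C
4 S004 Priya 25 90 Pass O
5 S005 Rohan 67 65 Pass A
6 S006 Guduli 89 34 Fail C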
In [6]:
Yourname1.describe()
Out[6]: DBE Mark DAA Mark
count 7.000000 7.000000
mean 58.000000 58.428571
std 23.720596 23.585710
min 25.000000 23.000000
25% 39.500000 44.000000
50% 67.000000 65.000000
75% 73.000000 71.500000
max 89.000000 90.000000
In [7]:
Yourname1.describe().transpose()
Out[7]: count mean std min 25% 50% 75% max
DBE Mark 7.0 58.000000 23.720596 25.0 39.5 67.0 73.0 89.0
DAA Mark 7.0 58.428571 23.585710 23.0 44.0 65.0 71.5 90.0
In [8]:
statistical_description=Yourname1.describe()
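As a minimal sketch of the "further usage" mentioned in task v, the stored description can be indexed like any other DataFrame. The frame below re-creates only the two mark columns from the table above so that the snippet runs on its own; the names marks, dbe_mean and daa_range are illustrative and not part of the original notebook.

```python
import pandas as pd

# Re-create the two numeric columns from the table above (self-contained copy).
marks = pd.DataFrame({
    "DBE Mark": [68, 78, 45, 34, 25, 67, 89],
    "DAA Mark": [54, 65, 23, 78, 90, 65, 34],
})

statistical_description = marks.describe()

# Look up a single statistic by row label and column label.
dbe_mean = statistical_description.loc["mean", "DBE Mark"]

# Combine two statistics: the range (max - min) of the DAA marks.
daa_range = (statistical_description.loc["max", "DAA Mark"]
             - statistical_description.loc["min", "DAA Mark"])

print(dbe_mean)   # 58.0
print(daa_range)  # 67.0
```

Because describe() returns a regular DataFrame, the same object can also be transposed, filtered, or saved without recomputing the statistics.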
In [9]:
sel_col = ["Regd No" ,"Result"]
Yourname1[sel_col].head(3)
Out[9]: Regd No Result
0 S001 Pass
1 S002 Fail
2 S003 Pass
In [10]:
Yourname1["Name"].tail()
Out[10]: 2 Seema
3 Puja
4 Priya
5 Rohan
6 Guduli
Name: Name, dtype: object
In [11]:
Yourname1 = Yourname1.drop(columns=["Grade"])
Yourname1
Out[11]: Regd No Name DBE Mark DAA Mark Result
0 S001 Rohan 68 54 Pass
1 S002 Rahul 78 65 Fail
2 S003 Seema 45 23 Pass
3 S003 Puja 34 78 Fail
4 S004 Priya 25 90 Pass
5 S005 Rohan 67 65 Pass
6 S006 Guduli 89 34 Fail
In [13]:
print("Number of rows" ,len(Yourname1))
Number of rows 7
In [14]:
print("Number of columns" ,len(Yourname1.columns))
Number of columns 5
In [15]:
print("Dimension of the dataframe:", Yourname1.shape)
Dimension of the dataframe: (7, 5)
In [16]:
print("Feature names:", Yourname1.columns.tolist())
Feature names: ['Regd No', 'Name', 'DBE Mark', 'DAA Mark', 'Result']
In [18]:
passing_students=Yourname1[Yourname1["Result"]=="Pass"]
passing_students[["Regd No", "DBE Mark"]]
Out[18]: Regd No DBE Mark
0 S001 68
2 S003 45
4 S004 25
5 S005 67
In [21]:
selected_students = Yourname1[(Yourname1["Name"] == "Rohan") & (Yourname1["Result"] == "Pass")]
print(selected_students[["DBE Mark", "DAA Mark", "Regd No"]])
DBE Mark DAA Mark Regd No
0 68 54 S001
5 67 65 S005
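The two-condition filter above can also be written with .loc, which takes the row mask and the column list in a single step. This is only a sketch of an equivalent form, not the notebook's own code; the frame is re-created here (without the Grade column) so the snippet is self-contained, and the names students, mask and selected are illustrative.

```python
import pandas as pd

# Self-contained copy of the table above, without the Grade column.
students = pd.DataFrame({
    "Regd No": ["S001", "S002", "S003", "S003", "S004", "S005", "S006"],
    "Name": ["Rohan", "Rahul", "Seema", "Puja", "Priya", "Rohan", "Guduli"],
    "DBE Mark": [68, 78, 45, 34, 25, 67, 89],
    "DAA Mark": [54, 65, 23, 78, 90, 65, 34],
    "Result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail"],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
mask = (students["Name"] == "Rohan") & (students["Result"] == "Pass")

# .loc selects the matching rows and the wanted columns together.
selected = students.loc[mask, ["DBE Mark", "DAA Mark", "Regd No"]]
print(selected)
```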
3. Implement the following questions using the student_result.csv file
i. Load the dataset and store it as yourname2
ii. Check the datatype, index range, memory usage, and number of columns and rows.
iii. Check the result distribution, i.e. count the number of students who passed and failed.
iv. Check students who scored 80 or more in Math
v. List the students who have failed in all subjects
vi. Find the correlation between attributes
vii. Add the Name column from the yourname1 dataset to the yourname2 dataset
viii. Display all the data items of the yourname2 dataset.
ix. Check whether any null value is present in the dataframe
x. More precisely, check whether a null value is present in each feature
xi. Impute "Anonymous" for each null value.
xii. Drop the first 3 rows
In [22]:
import pandas as pd
import numpy as np
In [24]:
yourname2=pd.read_csv('C:/Users/kriti/Downloads/student_result.csv')
yourname2
Out[24]: math bangla english result
0 70 80 90 1
1 30 40 50 0
2 50 20 35 0
3 80 33 33 1
4 33 35 36 1
5 32 80 35 0
6 40 50 21 0
7 33 35 35 1
8 60 23 10 0
9 33 34 35 1
10 50 40 40 1
11 35 40 30 0
12 0 0 0 0
13 10 10 10 0
14 33 33 33 1
In [25]:
print("Data Types:")
print(yourname2.dtypes)
print("\nIndex Range:")
print(yourname2.index)
print("\nMemory Usage:")
print(yourname2.memory_usage())
print("\nNumber of Rows and Columns:")
print(yourname2.shape)
Data Types:
math int64
bangla int64
english int64
result int64
dtype: object

Index Range:
RangeIndex(start=0, stop=15, step=1)

Memory Usage:
Index 128
math 120
bangla 120
english 120
result 120
dtype: int64

Number of Rows and Columns:
(15, 4)
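Most of the items in task ii (column dtypes, index range, memory usage, column count) are also reported together by DataFrame.info(). The sketch below uses a small stand-in frame built from the first three rows shown above instead of reading the CSV file, so it runs without the file; the names sample, buffer and report are illustrative.

```python
import io
import pandas as pd

# Stand-in frame with the same column names as student_result.csv
# (first three rows of the output above).
sample = pd.DataFrame({
    "math": [70, 30, 50],
    "bangla": [80, 40, 20],
    "english": [90, 50, 35],
    "result": [1, 0, 0],
})

# info() writes its report to a buffer (standard output by default).
buffer = io.StringIO()
sample.info(buf=buffer)
report = buffer.getvalue()
print(report)

# shape is ordered (rows, columns).
n_rows, n_cols = sample.shape
```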
In [27]:
# Use descriptive names instead of `filter`, which shadows a Python built-in
failed = yourname2[yourname2['result'] == 0]
print("Fail Students : ", failed["result"].count())
passed = yourname2[yourname2['result'] == 1]
print("Pass Students : ", passed["result"].count())
Fail Students : 8
Pass Students : 7
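The same distribution can be obtained in one call with value_counts(), which counts how often each distinct value occurs. A minimal sketch, using the result column copied from the output above (assuming, as the notebook does, that 1 means pass and 0 means fail):

```python
import pandas as pd

# The result column from the yourname2 output above (1 = pass, 0 = fail).
result = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1], name="result")

# value_counts() returns a Series indexed by the distinct values.
counts = result.value_counts()

print("Fail Students :", counts.loc[0])
print("Pass Students :", counts.loc[1])
```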
In [28]:
high_math = yourname2[yourname2["math"] >= 80]
high_math
Out[28]: math bangla english result
3 80 33 33 1
In [29]:
# result == 0 marks the students who failed overall
failed = yourname2[yourname2["result"] == 0]
failed
Out[29]: math bangla english result
1 30 40 50 0
2 50 20 35 0
5 32 80 35 0
6 40 50 21 0
8 60 23 10 0
11 35 40 30 0
12 0 0 0 0
13 10 10 10 0
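The cell above lists students whose overall result is 0. If "failed in all subjects" is read literally, a per-subject pass mark is needed, and the exercise does not state one. The sketch below therefore assumes a hypothetical PASS_MARK of 33, a value that happens to be consistent with the result column shown earlier but is an assumption all the same. The marks are copied from the yourname2 output.

```python
import pandas as pd

# Marks copied from the yourname2 output above.
scores = pd.DataFrame({
    "math":    [70, 30, 50, 80, 33, 32, 40, 33, 60, 33, 50, 35, 0, 10, 33],
    "bangla":  [80, 40, 20, 33, 35, 80, 50, 35, 23, 34, 40, 40, 0, 10, 33],
    "english": [90, 50, 35, 33, 36, 35, 21, 35, 10, 35, 40, 30, 0, 10, 33],
})

PASS_MARK = 33  # ASSUMPTION: the exercise does not state a pass mark

# Compare every mark with the pass mark, then keep only the rows in
# which all three subjects (axis=1) are below it.
failed_all = scores[(scores < PASS_MARK).all(axis=1)]
print(failed_all)
```

Under this assumption only the students at index 12 and 13 fail every subject; a different pass mark would change the selection.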
In [30]:
df_corr = yourname2.corr()
df_corr
Out[30]: math bangla english result
math 1.000000 0.430168 0.526313 0.382474
bangla 0.430168 1.000000 0.733799 0.204588
english 0.526313 0.733799 1.000000 0.484200
result 0.382474 0.204588 0.484200 1.000000
In [32]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df_corr)
plt.title("Graphical Correlation between Attributes")
Out[32]: Text(0.5, 1.0, 'Graphical Correlation between Attributes')

In [35]:
yourname2 = yourname2.join(Yourname1["Name"])
yourname2
Out[35]: math bangla english result Name
0 70 80 90 1 Rohan
1 30 40 50 0 Rahul
2 50 20 35 0 Seema
3 80 33 33 1 Puja
4 33 35 36 1 Priya
5 32 80 35 0 Rohan
6 40 50 21 0 Guduli
7 33 35 35 1 NaN
8 60 23 10 0 NaN
9 33 34 35 1 NaN
10 50 40 40 1 NaN
11 35 40 30 0 NaN
12 0 0 0 0 NaN
13 10 10 10 0 NaN
14 33 33 33 1 NaN
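The NaN entries above appear because join() aligns the two objects on their index labels, and Yourname1 has only 7 rows while yourname2 has 15. A small sketch with made-up frame sizes (the names left, names and joined are illustrative) shows the same behaviour:

```python
import pandas as pd

left = pd.DataFrame({"math": [70, 30, 50, 80]})      # index 0..3
names = pd.Series(["Rohan", "Rahul"], name="Name")   # index 0..1

# join() is a left join on the index by default: every row of `left`
# is kept, and index labels missing from `names` get NaN.
joined = left.join(names)
print(joined)
```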
In [36]:
print("Checks for any null value in DataFrame : ",yourname2.isnull())
Checks for any null value in DataFrame : math bangla english result Name
0 False False False False False
7 False False False False True
In [37]:
print("Checks for any null value in DataFrame (precisely) : ", yourname2.isnull().any())
Checks for any null value in DataFrame (precisely) : math False
bangla False
english False
result False
Name True
dtype: bool
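Beyond the per-column True/False shown above, isnull() can also give a count per column or a single yes/no answer for the whole frame. A minimal sketch on a small made-up frame (the names frame, has_null, null_counts and any_null_at_all are illustrative):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "math": [70, 30, 33],
    "Name": ["Rohan", np.nan, np.nan],
})

# any() answers "is there at least one null?" for each column.
has_null = frame.isnull().any()

# sum() counts the nulls per column, because True is treated as 1.
null_counts = frame.isnull().sum()
print(null_counts)

# Chaining any() twice collapses the answer to a single boolean.
any_null_at_all = frame.isnull().any().any()
print(any_null_at_all)
```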
In [38]:
yourname2 = yourname2.fillna("Anonymous")
yourname2
Out[38]: math bangla english result Name
0 70 80 90 1 Rohan
1 30 40 50 0 Rahul
2 50 20 35 0 Seema
3 80 33 33 1 Puja
4 33 35 36 1 Priya
5 32 80 35 0 Rohan
6 40 50 21 0 Guduli
7 33 35 35 1 Anonymous
8 60 23 10 0 Anonymous
9 33 34 35 1 Anonymous
10 50 40 40 1 Anonymous
11 35 40 30 0 Anonymous
12 0 0 0 0 Anonymous
13 10 10 10 0 Anonymous
14 33 33 33 1 Anonymous
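Calling fillna("Anonymous") on the whole frame works here because only the Name column contains nulls. If a numeric column also had missing values, they would receive the string as well. A hedged sketch of a column-specific fill, using a small made-up frame (the names frame and filled are illustrative):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "math": [70.0, np.nan],
    "Name": ["Rohan", np.nan],
})

# A dict limits the fill to the listed columns; "math" is left untouched.
filled = frame.fillna({"Name": "Anonymous"})
print(filled)
```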
In [39]:
yourname2 = yourname2.drop(index = yourname2.index[:3])
yourname2
Out[39]: math bangla english result Name
3 80 33 33 1 Puja
4 33 35 36 1 Priya
5 32 80 35 0 Rohan
6 40 50 21 0 Guduli
7 33 35 35 1 Anonymous
8 60 23 10 0 Anonymous
9 33 34 35 1 Anonymous
10 50 40 40 1 Anonymous
11 35 40 30 0 Anonymous
12 0 0 0 0 Anonymous
13 10 10 10 0 Anonymous
14 33 33 33 1 Anonymous
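Dropping the first three rows can also be expressed as a positional slice. The sketch below, on a small made-up frame (the names frame, trimmed and renumbered are illustrative), shows that the original index labels are kept after the drop, and how reset_index can renumber them if that is wanted:

```python
import pandas as pd

frame = pd.DataFrame({"math": [70, 30, 50, 80, 33]})

# Positional slicing keeps everything from the fourth row onward,
# which gives the same rows as drop(index=frame.index[:3]).
trimmed = frame.iloc[3:]

# The original labels (3, 4) survive; reset_index(drop=True) renumbers from 0.
renumbered = trimmed.reset_index(drop=True)
print(renumbered)
```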
Name-Kritika Das
Roll no-CSE21068
Regd no-2101020068