Lab2.2 Kritika


Experiment 2.2

AIM: Study different basic functions of the Pandas library

1. Create a series of 6 different branches of CSE

In [1]:
import pandas as pd

In [2]:
branches = [
"Artificial Intelligence and Machine Learning (AI/ML)",
"Data Science and Analytics",
"Computer Systems Engineering",
"Software Engineering",
"Cybersecurity",
"Human-Computer Interaction (HCI)"
]

branches_series = pd.Series(branches)
branches_series

Out[2]:
0    Artificial Intelligence and Machine Learning (...
1                           Data Science and Analytics
2                         Computer Systems Engineering
3                                 Software Engineering
4                                        Cybersecurity
5                     Human-Computer Interaction (HCI)
dtype: object
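
By default pd.Series labels its items 0 to n-1, as the output above shows. The following is a minimal sketch, assuming short codes such as "AIML" or "CYB" are acceptable labels (these codes are illustrative and not part of the lab sheet), of how a custom index allows label-based lookup alongside positional lookup:

```python
import pandas as pd

branches = [
    "Artificial Intelligence and Machine Learning (AI/ML)",
    "Data Science and Analytics",
    "Computer Systems Engineering",
    "Software Engineering",
    "Cybersecurity",
    "Human-Computer Interaction (HCI)",
]

# Hypothetical short codes used as index labels (illustrative only).
codes = ["AIML", "DSA", "CSE", "SE", "CYB", "HCI"]
labelled = pd.Series(branches, index=codes, name="branch")

by_label = labelled.loc["CYB"]   # label-based lookup
by_position = labelled.iloc[4]   # position-based lookup
print(by_label, "|", by_position)
```

Both lookups return the same element here; .loc follows the labels, while .iloc follows the position regardless of labels.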

2. Create a DataFrame using the following data items and complete the tasks below. Name of the
DataFrame = Yourname1

Regd No   Name     DBE Mark   DAA Mark   Result   Grade
S001      Rohan    68         54         Pass     A
S002      Rahul    78         65         Fail     C
S003      Seema    45         23         Pass     E
S003      Puja     34         78         Fail     C
S004      Priya    25         90         Pass     O
S005      Rohan    67         65         Pass     A
S006      Guduli   89         34         Fail     C

i. Display the first 5 rows

ii. Display the last 5 rows


iii. Display the statistical description

iv. Display the transpose of Statistical description

v. Store the statistical description into a new dataframe for further usage

vi. Display the Regd No and Result of first 3 students.

vii. Display the Name of last 5 students

viii. Delete the feature Grade

ix. Display the number of rows

x. Display the number of features

xi. Display the dimension of the dataframe

xii. Display the name of each feature

xiii. Display the Regd No and DBE Mark of those students whose Result = Pass

xiv. Display the DBE Mark, DAA Mark, and Regd No of those students whose Name = Rohan and
Result = Pass

In [3]:
import pandas as pd

# Creating the DataFrame


data = {
"Regd No": ["S001", "S002", "S003", "S003", "S004", "S005", "S006"],
"Name": ["Rohan", "Rahul", "Seema", "Puja", "Priya", "Rohan", "Guduli"],
"DBE Mark": [68, 78, 45, 34, 25, 67, 89],
"DAA Mark": [54, 65, 23, 78, 90, 65, 34],
"Result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail"],
"Grade": ["A", "C", "E", "C", "O", "A", "C"]
}

Yourname1 = pd.DataFrame(
    data,
    columns=["Regd No", "Name", "DBE Mark", "DAA Mark", "Result", "Grade"],
)

Yourname1

Out[3]: Regd No Name DBE Mark DAA Mark Result Grade

0 S001 Rohan 68 54 Pass A

1 S002 Rahul 78 65 Fail C

2 S003 Seema 45 23 Pass E

3 S003 Puja 34 78 Fail C

4 S004 Priya 25 90 Pass O

5 S005 Rohan 67 65 Pass A

6 S006 Guduli 89 34 Fail C

In [4]:
Yourname1.head()
Out[4]: Regd No Name DBE Mark DAA Mark Result Grade

0 S001 Rohan 68 54 Pass A

1 S002 Rahul 78 65 Fail C

2 S003 Seema 45 23 Pass E

3 S003 Puja 34 78 Fail C

4 S004 Priya 25 90 Pass O

In [5]:
Yourname1.tail()

Out[5]: Regd No Name DBE Mark DAA Mark Result Grade

2 S003 Seema 45 23 Pass E

3 S003 Puja 34 78 Fail C

4 S004 Priya 25 90 Pass O

5 S005 Rohan 67 65 Pass A

6 S006 Guduli 89 34 Fail C

In [6]:
Yourname1.describe()

Out[6]: DBE Mark DAA Mark

count 7.000000 7.000000

mean 58.000000 58.428571

std 23.720596 23.585710

min 25.000000 23.000000

25% 39.500000 44.000000

50% 67.000000 65.000000

75% 73.000000 71.500000

max 89.000000 90.000000

In [7]:
Yourname1.describe().transpose()

Out[7]: count mean std min 25% 50% 75% max

DBE Mark 7.0 58.000000 23.720596 25.0 39.5 67.0 73.0 89.0

DAA Mark 7.0 58.428571 23.585710 23.0 44.0 65.0 71.5 90.0
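
describe() summarises only the numeric columns by default, which is why the output above shows DBE Mark and DAA Mark alone. As a hedged sketch, passing include="all" also summarises the text columns (count, unique, top, freq). The frame is rebuilt here from the table in the task statement, and the name students is a stand-in for Yourname1:

```python
import pandas as pd

# Rebuilt from the task table; "students" is a stand-in name for Yourname1.
students = pd.DataFrame({
    "Regd No": ["S001", "S002", "S003", "S003", "S004", "S005", "S006"],
    "Name": ["Rohan", "Rahul", "Seema", "Puja", "Priya", "Rohan", "Guduli"],
    "DBE Mark": [68, 78, 45, 34, 25, 67, 89],
    "DAA Mark": [54, 65, 23, 78, 90, 65, 34],
    "Result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail"],
    "Grade": ["A", "C", "E", "C", "O", "A", "C"],
})

# include="all" adds categorical statistics for the non-numeric columns.
full_description = students.describe(include="all")
print(full_description.loc[["count", "unique", "top", "freq"], ["Name", "Result"]])
```

For numeric columns the categorical rows (unique, top, freq) are NaN, and for text columns the numeric rows (mean, std, and so on) are NaN.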

In [8]:
statistical_description=Yourname1.describe()
In [9]:
sel_col = ["Regd No" ,"Result"]
Yourname1[sel_col].head(3)

Out[9]: Regd No Result

0 S001 Pass

1 S002 Fail

2 S003 Pass

In [10]:
Yourname1["Name"].tail()

Out[10]: 2 Seema
3 Puja
4 Priya
5 Rohan
6 Guduli
Name: Name, dtype: object
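
The head() and tail() calls above can also be written with explicit indexers. This is a small sketch, with students standing in for Yourname1 and trimmed to the columns used, showing .loc (label slice, end label included) and .iloc (position slice, end excluded):

```python
import pandas as pd

# Trimmed stand-in for Yourname1 with only the columns needed here.
students = pd.DataFrame({
    "Regd No": ["S001", "S002", "S003", "S003", "S004", "S005", "S006"],
    "Name": ["Rohan", "Rahul", "Seema", "Puja", "Priya", "Rohan", "Guduli"],
    "Result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail"],
})

# .loc slices by label and includes the end label, so :2 yields rows 0, 1, 2.
first_three = students.loc[:2, ["Regd No", "Result"]]

# .iloc slices by position; -5: takes the last five rows.
last_five_names = students["Name"].iloc[-5:]

print(first_three)
print(last_five_names)
```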

In [11]:
Yourname1 = Yourname1.drop(columns=["Grade"])
Yourname1

Out[11]: Regd No Name DBE Mark DAA Mark Result

0 S001 Rohan 68 54 Pass

1 S002 Rahul 78 65 Fail

2 S003 Seema 45 23 Pass

3 S003 Puja 34 78 Fail

4 S004 Priya 25 90 Pass

5 S005 Rohan 67 65 Pass

6 S006 Guduli 89 34 Fail

In [13]:
print("Number of rows" ,len(Yourname1))

Number of rows 7

In [14]:
print("Number of columns" ,len(Yourname1.columns))

Number of columns 5

In [15]:
print("Dimension of the dataframe:", Yourname1.shape)

Dimension of the dataframe: (7, 5)

In [16]:
print("Feature names:", Yourname1.columns.tolist())

Feature names: ['Regd No', 'Name', 'DBE Mark', 'DAA Mark', 'Result']
In [18]:
passing_students=Yourname1[Yourname1["Result"]=="Pass"]
passing_students[["Regd No", "DBE Mark"]]

Out[18]: Regd No DBE Mark

0 S001 68

2 S003 45

4 S004 25

5 S005 67

In [21]:
selected_students = Yourname1[(Yourname1["Name"] == "Rohan") & (Yourname1["Result"] == "Pass")]
print(selected_students[["DBE Mark", "DAA Mark", "Regd No"]])

   DBE Mark  DAA Mark Regd No
0        68        54    S001
5        67        65    S005
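
The two-condition filter can also be done in a single .loc call that selects rows and columns together. In this sketch students is a stand-in for Yourname1 (after Grade was dropped). Each comparison is wrapped in parentheses because & binds more tightly than == in Python:

```python
import pandas as pd

# Stand-in for Yourname1 after the Grade column was dropped.
students = pd.DataFrame({
    "Regd No": ["S001", "S002", "S003", "S003", "S004", "S005", "S006"],
    "Name": ["Rohan", "Rahul", "Seema", "Puja", "Priya", "Rohan", "Guduli"],
    "DBE Mark": [68, 78, 45, 34, 25, 67, 89],
    "DAA Mark": [54, 65, 23, 78, 90, 65, 34],
    "Result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail"],
})

# Element-wise AND of two boolean Series; parentheses are required.
mask = (students["Name"] == "Rohan") & (students["Result"] == "Pass")

# Row filter and column selection in one step.
selected = students.loc[mask, ["DBE Mark", "DAA Mark", "Regd No"]]
print(selected)
```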

3. Implement the following tasks using the student_result.csv file

i. Load the dataset and store it as yourname2

ii. Check the datatype, index range, memory usage, number of columns and rows.

iii. Check result distribution i.e. count the number of students passed and failed.

iv. List the students who scored 80 or more in Math

v. List of students who have failed in all subjects

vi. Find the correlation between attributes

vii. Add Name column from yourname1 dataset to yourname2 dataset

viii. Display all the data items of yourname2 dataset.

ix. Check for any null value present in dataframe

x. More precisely, check whether any null value is present in each feature

xi. Impute "Anonymous" for each null value.

xii. Drop first 3 rows

In [22]:
import pandas as pd
import numpy as np

In [24]:
yourname2=pd.read_csv('C:/Users/kriti/Downloads/student_result.csv')
yourname2
Out[24]: math bangla english result

0 70 80 90 1

1 30 40 50 0

2 50 20 35 0

3 80 33 33 1

4 33 35 36 1

5 32 80 35 0

6 40 50 21 0

7 33 35 35 1

8 60 23 10 0

9 33 34 35 1

10 50 40 40 1

11 35 40 30 0

12 0 0 0 0

13 10 10 10 0

14 33 33 33 1

In [25]:
print("Data Types:")
print(yourname2.dtypes)
print("\nIndex Range:")
print(yourname2.index)
print("\nMemory Usage:")
print(yourname2.memory_usage())
print("\nNumber of Columns and Rows:")
print(yourname2.shape)

Data Types:
math int64
bangla int64
english int64
result int64
dtype: object

Index Range:
RangeIndex(start=0, stop=15, step=1)

Memory Usage:
Index 128
math 120
bangla 120
english 120
result 120
dtype: int64

Number of Columns and Rows:
(15, 4)
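
The four separate prints above can be condensed with DataFrame.info(), which reports dtypes, the index range, non-null counts and memory usage in one call. This sketch rebuilds the data from the Out[24] listing, because the CSV path is specific to the author's machine, and uses scores as a stand-in for yourname2:

```python
import io
import pandas as pd

# Rebuilt from the Out[24] listing; "scores" is a stand-in for yourname2.
scores = pd.DataFrame({
    "math":    [70, 30, 50, 80, 33, 32, 40, 33, 60, 33, 50, 35, 0, 10, 33],
    "bangla":  [80, 40, 20, 33, 35, 80, 50, 35, 23, 34, 40, 40, 0, 10, 33],
    "english": [90, 50, 35, 33, 36, 35, 21, 35, 10, 35, 40, 30, 0, 10, 33],
    "result":  [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1],
})

# info() writes to stdout by default; a buffer captures the text instead.
buffer = io.StringIO()
scores.info(buf=buffer, memory_usage="deep")
summary = buffer.getvalue()
print(summary)

# shape is (rows, columns), so it can be unpacked directly.
n_rows, n_cols = scores.shape
print("rows:", n_rows, "columns:", n_cols)
```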

In [27]:
# Descriptive names avoid shadowing Python's built-in filter().
failed = yourname2[yourname2['result'] == 0]
print("Fail Students : ", failed["result"].count())
passed = yourname2[yourname2['result'] == 1]
print("Pass Students : ", passed["result"].count())

Fail Students : 8
Pass Students : 7
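
The same distribution can be obtained in one step with value_counts(). In this sketch result is rebuilt from the Out[24] listing, and the 0 = Fail, 1 = Pass labelling is taken from the printout above:

```python
import pandas as pd

# "result" column rebuilt from the Out[24] listing.
result = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1], name="result")

# Count occurrences of each distinct value.
counts = result.value_counts()

# Optional: map the codes to readable labels before counting.
labelled_counts = result.map({0: "Fail", 1: "Pass"}).value_counts()
print(labelled_counts)
```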

In [28]:
high_math = yourname2[yourname2["math"] >= 80]
high_math

Out[28]: math bangla english result

3 80 33 33 1

In [29]:
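# Selects every student whose overall result is 0 (fail); the dataset has no
# per-subject pass/fail flag, so this lists all failed students.
failed_students = yourname2[yourname2["result"] == 0]
failed_students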

Out[29]: math bangla english result

1 30 40 50 0

2 50 20 35 0

5 32 80 35 0

6 40 50 21 0

8 60 23 10 0

11 35 40 30 0

12 0 0 0 0

13 10 10 10 0
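
The cell above lists every student with result == 0, which is broader than "failed in all subjects". A stricter reading needs a per-subject pass mark, which the dataset does not state. The sketch below assumes a pass mark of 33, a value inferred from the listing (every student with result 1 has at least 33 in each subject) and not confirmed by the lab sheet:

```python
import pandas as pd

# Rebuilt from the Out[24] listing; "scores" is a stand-in for yourname2.
scores = pd.DataFrame({
    "math":    [70, 30, 50, 80, 33, 32, 40, 33, 60, 33, 50, 35, 0, 10, 33],
    "bangla":  [80, 40, 20, 33, 35, 80, 50, 35, 23, 34, 40, 40, 0, 10, 33],
    "english": [90, 50, 35, 33, 36, 35, 21, 35, 10, 35, 40, 30, 0, 10, 33],
    "result":  [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1],
})

PASS_MARK = 33  # ASSUMPTION: inferred from the data, not given in the task.
subjects = ["math", "bangla", "english"]

# A student fails all subjects when every subject mark is below the pass mark.
failed_all = scores[(scores[subjects] < PASS_MARK).all(axis=1)]
print(failed_all)
```

Under that assumption only the rows at index 12 and 13 qualify; a different pass mark would change the selection.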

In [30]:
df_corr = yourname2.corr()
df_corr

Out[30]: math bangla english result

math 1.000000 0.430168 0.526313 0.382474

bangla 0.430168 1.000000 0.733799 0.204588

english 0.526313 0.733799 1.000000 0.484200

result 0.382474 0.204588 0.484200 1.000000
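
DataFrame.corr() uses the Pearson coefficient by default, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)). As a worked check, the sketch below computes r for math and bangla by hand and compares it with the pandas value; the data is rebuilt from the Out[24] listing:

```python
import pandas as pd

# Rebuilt from the Out[24] listing; only the two columns being compared.
scores = pd.DataFrame({
    "math":   [70, 30, 50, 80, 33, 32, 40, 33, 60, 33, 50, 35, 0, 10, 33],
    "bangla": [80, 40, 20, 33, 35, 80, 50, 35, 23, 34, 40, 40, 0, 10, 33],
})

# Deviations of each column from its own mean.
dx = scores["math"] - scores["math"].mean()
dy = scores["bangla"] - scores["bangla"].mean()

# Pearson r: co-deviation sum divided by the root of the squared-deviation sums.
manual_r = (dx * dy).sum() / ((dx ** 2).sum() * (dy ** 2).sum()) ** 0.5

pandas_r = scores.corr().loc["math", "bangla"]
print(manual_r, pandas_r)
```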

In [32]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df_corr)
plt.title("Graphical Correlation between Attributes")

Out[32]: Text(0.5, 1.0, 'Graphical Correlation between Attributes')


In [35]:
yourname2 = yourname2.join(Yourname1["Name"])
yourname2

Out[35]: math bangla english result Name

0 70 80 90 1 Rohan

1 30 40 50 0 Rahul

2 50 20 35 0 Seema

3 80 33 33 1 Puja

4 33 35 36 1 Priya

5 32 80 35 0 Rohan

6 40 50 21 0 Guduli

7 33 35 35 1 NaN

8 60 23 10 0 NaN

9 33 34 35 1 NaN

10 50 40 40 1 NaN

11 35 40 30 0 NaN

12 0 0 0 0 NaN

13 10 10 10 0 NaN

14 33 33 33 1 NaN

In [36]:
print("Checks for any null value in DataFrame : ",yourname2.isnull())

Checks for any null value in DataFrame :       math  bangla  english  result   Name
0   False   False    False   False  False
1   False   False    False   False  False
2   False   False    False   False  False
3   False   False    False   False  False
4   False   False    False   False  False
5   False   False    False   False  False
6   False   False    False   False  False
7   False   False    False   False   True
8   False   False    False   False   True
9   False   False    False   False   True
10  False   False    False   False   True
11  False   False    False   False   True
12  False   False    False   False   True
13  False   False    False   False   True
14  False   False    False   False   True

In [37]:
print("Checks for any null value in DataFrame (precisely) : ",yourname2.isnull().any())

Checks for any null value in DataFrame (precisely) :  math       False
bangla     False
english    False
result     False
Name        True
dtype: bool
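
The NaN values come from join(), which aligns rows on the index: the Name Series has 7 labels (0 to 6) and the scores have 15, so rows 7 to 14 find no match. The sketch below reproduces that with a trimmed stand-in (scores and names are illustrative names; only two score columns are kept), then reduces the null check to a single boolean and a per-column count:

```python
import pandas as pd

# Trimmed stand-in for yourname2 (two columns, values from Out[24]).
scores = pd.DataFrame({
    "math":   [70, 30, 50, 80, 33, 32, 40, 33, 60, 33, 50, 35, 0, 10, 33],
    "result": [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1],
})
# Stand-in for Yourname1["Name"]; join() needs the Series to be named.
names = pd.Series(
    ["Rohan", "Rahul", "Seema", "Puja", "Priya", "Rohan", "Guduli"], name="Name"
)

# Left join on the index: unmatched index labels 7..14 receive NaN.
joined = scores.join(names)

has_any_null = bool(joined.isnull().any().any())  # one answer for the frame
null_counts = joined.isnull().sum()               # missing values per column
print(has_any_null)
print(null_counts)
```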

In [38]:
yourname2 = yourname2.fillna("Anonymous")
yourname2

Out[38]: math bangla english result Name

0 70 80 90 1 Rohan

1 30 40 50 0 Rahul

2 50 20 35 0 Seema

3 80 33 33 1 Puja

4 33 35 36 1 Priya

5 32 80 35 0 Rohan

6 40 50 21 0 Guduli

7 33 35 35 1 Anonymous

8 60 23 10 0 Anonymous

9 33 34 35 1 Anonymous

10 50 40 40 1 Anonymous

11 35 40 30 0 Anonymous

12 0 0 0 0 Anonymous

13 10 10 10 0 Anonymous

14 33 33 33 1 Anonymous
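
fillna("Anonymous") on the whole frame works here because only Name has missing values. If a numeric column also had gaps, the same call would write the string into it. A hedged sketch of a column-specific fill follows; the three rows are taken from index 6 to 8 of the listing, and the missing english mark is hypothetical, added only to show the difference:

```python
import pandas as pd

# Rows 6-8 of the listing; the None in "english" is HYPOTHETICAL.
frame = pd.DataFrame({
    "math": [40, 33, 60],
    "english": [21, None, 10],
    "Name": ["Guduli", None, None],
})

# A dict limits the fill to the named column; other columns are untouched.
filled = frame.fillna({"Name": "Anonymous"})
print(filled)
```

The numeric gap stays as NaN and can then be handled separately, for example with a mean or median.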

In [39]:
yourname2 = yourname2.drop(index = yourname2.index[:3])
yourname2

Out[39]: math bangla english result Name

3 80 33 33 1 Puja

4 33 35 36 1 Priya

5 32 80 35 0 Rohan

6 40 50 21 0 Guduli

7 33 35 35 1 Anonymous

8 60 23 10 0 Anonymous

9 33 34 35 1 Anonymous

10 50 40 40 1 Anonymous

11 35 40 30 0 Anonymous

12 0 0 0 0 Anonymous

13 10 10 10 0 Anonymous

14 33 33 33 1 Anonymous
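
After the drop, the index starts at 3 because the original labels are kept. An equivalent positional form is iloc[3:], and reset_index(drop=True) renumbers the rows from 0 when that is preferred. A minimal sketch on a one-column stand-in for yourname2:

```python
import pandas as pd

# One-column stand-in for yourname2 (math marks from Out[24]).
scores = pd.DataFrame(
    {"math": [70, 30, 50, 80, 33, 32, 40, 33, 60, 33, 50, 35, 0, 10, 33]}
)

# Positional slice: skip the first three rows, keep the original labels.
trimmed = scores.iloc[3:]

# Renumber from 0; drop=True discards the old index instead of adding a column.
renumbered = trimmed.reset_index(drop=True)
print(trimmed.index[0], renumbered.index[0], len(renumbered))
```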

Name-Kritika Das

Roll no-CSE21068

Regd no-2101020068
