
Lab Session -04

Artificial Intelligence (Comp – 00634)



Objective: Understanding the basics of Pandas in Python

A- Outcomes:
After completing the lab session, students will be able to:
a. Understand the Series data structure
b. Understand the DataFrame
c. Understand indexing, selection & filtering in Pandas
d. Apply functions, sorting & ranking in Pandas
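
As a quick orientation for outcomes (a) and (d), the following minimal sketch builds a Series, selects from it, and ranks it. The variable names are chosen here for illustration only, and the three values are simply the first three Math scores from Task 1.

```python
import pandas as pd

# A Series is a one-dimensional labeled array: values plus an index.
# The labels and scores are the first three Math scores from Task 1.
math = pd.Series([85, 92, 78], index=['Ali', 'Fatima', 'Hassan'], name='Math Score')

# Label-based and position-based selection
fatima_score = math.loc['Fatima']   # select by index label
first_score = math.iloc[0]          # select by integer position

# Apply a function element-wise (here: express each score as a fraction of 100)
fractions = math.apply(lambda s: s / 100)

# Rank from highest (rank 1) to lowest; ties would share the average rank by default
ranks = math.rank(ascending=False)

print(math.sort_values(ascending=False))
print(ranks)
```

A DataFrame can be thought of as several such Series sharing one index, which is why the same `loc`, `apply`, `sort_values` and `rank` calls carry over to the tasks below.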


B- Lab Tasks:
1- Consider the following sample data for a DataFrame:

ID   Name    Math Score  English Score  Science Score
101  Ali     85          92             78
102  Fatima  92          88             90
103  Hassan  78          80             75
104  Aisha   95          90             92
105  Ahmed   88          85             85
106  Hira    90          95             88
107  Saad    79          82             80
108  Zara    87          91             89
109  Bilal   93          89             94
110  Sana    84          93             87

• Create a Pandas DataFrame using the given sample data and display the DataFrame.
• Set the ‘ID’ column as the index of the DataFrame.
• Access and display the details of the student with ID 105.
• Display the Math scores of students with IDs 102, 104 and 108.
• Calculate and display the average score for each subject.
• Display students who scored above the average Math score.
• Sort the DataFrame based on the ‘English Score’ column in descending order.
• Add a new column ‘Total Score’ representing the sum of Math, English, and Science scores.
• Display the DataFrame after adding the ‘Total Score’ column.
Write/copy your code here:
Code:
import pandas as pd

print("Maaz Bin Fazal_39_Sample data")


data = {
    'ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'Name': ['Ali', 'Fatima', 'Hassan', 'Aisha', 'Ahmed', 'Hira', 'Saad', 'Zara', 'Bilal', 'Sana'],
    'Math Score': [85, 92, 78, 95, 88, 90, 79, 87, 93, 84],
    'English Score': [92, 88, 80, 90, 85, 95, 82, 91, 89, 93],
    'Science Score': [78, 90, 75, 92, 85, 88, 80, 89, 94, 87]
}

# Create DataFrame and display it
df = pd.DataFrame(data)
print(df)

# Set 'ID' column as index
df.set_index('ID', inplace=True)

# Display details of student with ID 105
print("\nMaaz Bin Fazal_39_Display details of student with ID 105")
print(df.loc[105])

# Display math scores of students with IDs 102, 104, and 108
print("\nMaaz Bin Fazal_39_Display math scores of students with IDs 102, 104, and 108")
print(df.loc[[102, 104, 108], 'Math Score'])

# Calculate and display average scores for each subject
# numeric_only=True skips the text column 'Name', which cannot be averaged
print("\nMaaz Bin Fazal_39_Calculate and display average scores for each subject")
avg_scores = df.mean(numeric_only=True)
print("Average Scores:\n", avg_scores)

# Display students who scored above the average Math score
print("\nMaaz Bin Fazal_39_Display students who scored above the average Math score")
above_avg_math = df[df['Math Score'] > avg_scores['Math Score']]
print("Students with Math Score above average:\n", above_avg_math)

# Sort the DataFrame based on 'English Score' column in descending order
print("\nMaaz Bin Fazal_39_Sort the DataFrame based on 'English Score' column in descending order")
df_sorted = df.sort_values(by='English Score', ascending=False)
print("Sorted DataFrame based on English Score:\n", df_sorted)

# Add a new column 'Total Score' representing the sum of Math, English, and Science scores
print("\nMaaz Bin Fazal_39_Add a new column 'Total Score' representing the sum of Math, English, and Science scores")
df['Total Score'] = df['Math Score'] + df['English Score'] + df['Science Score']

# Display DataFrame after adding 'Total Score' column
print("\nMaaz Bin Fazal_39_Display DataFrame after adding 'Total Score' column")
print(df)

Output:
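
The key results of Task 1 can be cross-checked with a compact sketch such as the one below. It rebuilds the sample data, recomputes the subject averages and the above-average filter, and additionally ranks students by total score with `rank` (an addition for outcome d, not part of the task list). The names `score_cols` and `Rank` are illustrative choices, not taken from the lab sheet.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': list(range(101, 111)),
    'Name': ['Ali', 'Fatima', 'Hassan', 'Aisha', 'Ahmed',
             'Hira', 'Saad', 'Zara', 'Bilal', 'Sana'],
    'Math Score': [85, 92, 78, 95, 88, 90, 79, 87, 93, 84],
    'English Score': [92, 88, 80, 90, 85, 95, 82, 91, 89, 93],
    'Science Score': [78, 90, 75, 92, 85, 88, 80, 89, 94, 87],
}).set_index('ID')

score_cols = ['Math Score', 'English Score', 'Science Score']

# Column-wise mean over the three score columns only
avg_scores = df[score_cols].mean()

# Boolean mask: keep rows whose Math score exceeds the Math average
above_avg_math = df[df['Math Score'] > avg_scores['Math Score']]

# Row-wise sum gives the total, then rank 1 = highest total
# (the sample data has no tied totals, so the integer cast is safe here)
df['Total Score'] = df[score_cols].sum(axis=1)
df['Rank'] = df['Total Score'].rank(ascending=False).astype(int)

print(avg_scores)
print(above_avg_math['Name'].tolist())
print(df.sort_values('Rank')[['Name', 'Total Score', 'Rank']])
```

With the sample data, the averages work out to 87.1 (Math), 88.5 (English) and 85.8 (Science), so five students (Fatima, Aisha, Ahmed, Hira and Bilal) are above the Math average, and Aisha has the highest total at 277.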

2- Consider the following extended sample data for a DataFrame, representing sales data for a retail store:

Product ID  Product Name     Unit Price  Quantity Sold  Discount %  Customer Rating
01          Laptop           1200        50             5           4.2
02          Smartphone       800         80             10          4.5
03          Television       1500        30             8           4.1
04          Refrigerator     1000        45             6           4.3
05          Washing Machine  1200        35             7           4.4
06          Air Conditioner  1800        20             12          4.0
07          Microwave oven   400         60             4           4.6
08          Blender          50          150            2           4.8
09          Vacuum Cleaner   200         40             5           4.2
10          Coffee Maker     100         120            3           4.7

• Create a Pandas DataFrame using the given sample data and display the DataFrame.
• Set the ‘Product ID’ column as the index of the DataFrame.
• Create a column “Total Revenue”. Calculate and display the total revenue for each product (Unit Price * Quantity Sold, after applying the Discount %).
• Identify and display the top-selling product based on total revenue.
• Sort the DataFrame based on the ‘Customer Rating’ column in descending order.
• Display the products with a customer rating above 4.2.
• Calculate and display the correlation matrix for the numerical columns (Unit Price, Quantity Sold, Discount %, Customer Rating).

Write/copy your code here:


# Maaz_39
import pandas as pd

# Maaz_39_Sample data
data = {
    'Product ID': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10'],
    'Product Name': ['Laptop', 'Smartphone', 'Television', 'Refrigerator', 'Washing Machine',
                     'Air Conditioner', 'Microwave oven', 'Blender', 'Vacuum Cleaner', 'Coffee Maker'],
    'Unit Price': [1200, 800, 1500, 1000, 1200, 1800, 400, 50, 200, 100],
    'Quantity Sold': [50, 80, 30, 45, 35, 20, 60, 150, 40, 120],
    'Discount %': [5, 10, 8, 6, 7, 12, 4, 2, 5, 3],
    'Customer Rating': [4.2, 4.5, 4.1, 4.3, 4.4, 4.0, 4.6, 4.8, 4.2, 4.7]
}

# Maaz_39_Create DataFrame
df = pd.DataFrame(data)

# Maaz_39_Set 'Product ID' column as index
df.set_index('Product ID', inplace=True)

# Maaz_39_Create column "Total Revenue"
df['Total Revenue'] = df['Unit Price'] * df['Quantity Sold'] * (1 - df['Discount %'] / 100)

# Display DataFrame
print("Maaz_39_DataFrame:\n", df)

# Identify and Display the top-selling product based on total revenue
top_selling_product = df['Total Revenue'].idxmax()
print("\nTop selling product based on total revenue:", df.loc[top_selling_product])

# Maaz_39_Sort the DataFrame based on 'Customer Rating' column in descending order
df_sorted = df.sort_values(by='Customer Rating', ascending=False)
print("\nSorted DataFrame based on Customer Rating:\n", df_sorted)

# Maaz_39_ Display the products with a customer rating above 4.2
high_rated_products = df[df['Customer Rating'] > 4.2]
print("\nProducts with a customer rating above 4.2:\n", high_rated_products)


# Maaz_39_Calculate and display the correlation matrix for the numerical columns
correlation_matrix = df[['Unit Price', 'Quantity Sold', 'Discount %', 'Customer Rating']].corr()
print("\nMaaz_39_Correlation Matrix:\n", correlation_matrix)
Output:
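
The discounted-revenue formula in Task 2 is easy to verify by hand for one row: for the Smartphone, 800 × 80 × (1 − 10/100) = 57,600. The sketch below recomputes the column for all products and uses `nlargest` as one possible alternative to the `idxmax` plus `loc` combination; the variable name `top3` is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Product ID': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10'],
    'Product Name': ['Laptop', 'Smartphone', 'Television', 'Refrigerator',
                     'Washing Machine', 'Air Conditioner', 'Microwave oven',
                     'Blender', 'Vacuum Cleaner', 'Coffee Maker'],
    'Unit Price': [1200, 800, 1500, 1000, 1200, 1800, 400, 50, 200, 100],
    'Quantity Sold': [50, 80, 30, 45, 35, 20, 60, 150, 40, 120],
    'Discount %': [5, 10, 8, 6, 7, 12, 4, 2, 5, 3],
}).set_index('Product ID')

# Revenue after discount = price * quantity * (1 - discount / 100)
df['Total Revenue'] = (
    df['Unit Price'] * df['Quantity Sold'] * (1 - df['Discount %'] / 100)
)

# nlargest returns the top rows directly, already sorted in descending order
top3 = df.nlargest(3, 'Total Revenue')[['Product Name', 'Total Revenue']]
print(top3)
```

On the sample data the Smartphone (57,600) comes out just ahead of the Laptop (57,000), followed by the Refrigerator (42,300), which shows that the highest unit price does not by itself decide the top seller.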

3- Consider the Excel file named 2_pak_imm.xlsx containing year-wise immigration data from Pakistan to other countries. The data includes the following columns:
• Origin Country (constant value: ‘Pakistan’)
• Origin Country Latitude
• Origin Country Longitude
• Destination Region
• Destination Country
• Destination Country Latitude
• Destination Country Longitude
• Year
• Number of Emigrants
Perform the following tasks:
• Load the Excel file into a Pandas DataFrame.
• Display the first few rows of the DataFrame.
• Display the last few rows of the DataFrame.
• Print the concise summary of the DataFrame.
• Drop columns ‘Origin Country’, ‘Origin Country Latitude’, ‘Origin Country Longitude’, ‘Destination Country Latitude’ and ‘Destination Country Longitude’ from the DataFrame and display the updated DataFrame.
Write/copy your code here:
Code:
# Maaz_39
import io

import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# Maaz_39_Function to handle file upload and perform tasks
def handle_file_upload(change):
    # Assumes ipywidgets 7.x, where FileUpload.value is a dict keyed by filename
    uploaded_file = list(file_upload.value.values())[0]
    content = uploaded_file['content']
    # Wrap the raw bytes in a file-like object before passing them to pandas
    df = pd.read_excel(io.BytesIO(content))

    # Maaz_39_ Display the first few rows of the DataFrame
    print("First few rows of the DataFrame:")
    print(df.head())

    # Display the last few rows of the DataFrame
    print("\nLast few rows of the DataFrame:")
    print(df.tail())

    # Maaz_39_Print the concise summary of the DataFrame
    print("\nConcise summary of the DataFrame:")
    df.info()  # info() prints the summary itself and returns None

    # Drop specified columns from the DataFrame if they exist
    columns_to_drop = ['Origin Country', 'Origin Country Latitude', 'Origin Country Longitude',
                       'Destination Country Latitude', 'Destination Country Longitude']
    df.drop(columns=[col for col in columns_to_drop if col in df.columns], inplace=True)

    # Display the updated DataFrame
    print("\nUpdated DataFrame after dropping specified columns:")
    print(df)

# Maaz_39_Create a file upload widget
file_upload = widgets.FileUpload(accept='.xlsx', description='Upload Excel file')

# Display the file upload widget
display(file_upload)

# Maaz_39_ Register the callback function to handle file upload
file_upload.observe(handle_file_upload, names='value')
Output:
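
When the upload widget is not needed, the same drop step can be run directly on a DataFrame. The sketch below uses a tiny stand-in DataFrame with placeholder values (not the real contents of 2_pak_imm.xlsx) so that it runs without the file; with the real file, the commented `pd.read_excel` line would replace the stand-in, assuming the file sits in the working directory. It also shows `errors='ignore'` as an alternative to filtering the column list by hand.

```python
import pandas as pd

# With the real file available, one would load it instead (path is an assumption):
# df = pd.read_excel('2_pak_imm.xlsx')

# Stand-in rows with PLACEHOLDER values, only to make the sketch runnable
df = pd.DataFrame({
    'Origin Country': ['Pakistan', 'Pakistan'],
    'Origin Country Latitude': [0.0, 0.0],
    'Origin Country Longitude': [0.0, 0.0],
    'Destination Region': ['Region A', 'Region B'],
    'Destination Country': ['Country A', 'Country B'],
    'Destination Country Latitude': [0.0, 0.0],
    'Destination Country Longitude': [0.0, 0.0],
    'Year': [2000, 2001],
    'Number of Emigrants': [100, 200],
})

columns_to_drop = [
    'Origin Country', 'Origin Country Latitude', 'Origin Country Longitude',
    'Destination Country Latitude', 'Destination Country Longitude',
]

# errors='ignore' skips any listed column that is absent instead of raising KeyError;
# without inplace=True, drop returns a new DataFrame and leaves df unchanged
trimmed = df.drop(columns=columns_to_drop, errors='ignore')

print(trimmed.columns.tolist())
```

Returning a new DataFrame rather than dropping in place keeps the original columns available, which can be convenient if a later step still needs the coordinates.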

4- After dropping the specified columns, the DataFrame should now contain the following columns:
• Destination Region
• Destination Country
• Year
• No. of Emigrants
Perform the following tasks:
• Display the first few rows of the updated DataFrame.
• Display the last few rows of the updated DataFrame.
• Print the concise summary of the updated DataFrame.
• Calculate and display the mean, median, and standard deviation of the ‘No. of Emigrants’ column.
• Calculate and display the total number of rows in the DataFrame.
• Calculate and display the total number of cells (entries) in the DataFrame.
• Identify and display the unique values in the ‘Destination Region’ column.
• Display the count of each unique value in the ‘Destination Region’ column.
• Identify and display the data types of each column in the DataFrame.
• Inspect and display the descriptive statistics of the ‘Year’ column.
Write/copy your code here:
import pandas as pd

# Maaz_39

# Assuming df is our DataFrame

# Maaz_39_Specify the columns that should remain in the DataFrame
remaining_columns = ['Destination Region', 'Destination Country', 'Year', 'No. of Emigrants']

# Maaz_39_ Drop columns not in the remaining_columns list
df = df[remaining_columns]

# Maaz_39_Display the first few rows of the updated DataFrame
print("First few rows of the updated DataFrame:")
print(df.head())

# Display the last few rows of the updated DataFrame
print("\nLast few rows of the updated DataFrame:")
print(df.tail())

# Print the concise summary of the updated DataFrame
print("\nConcise summary of the updated DataFrame:")
df.info()  # info() prints the summary itself and returns None

# Maaz_39_Calculate and display the mean, median, and standard deviation of the ‘No. of Emigrants’ column
print("\nMean:", df['No. of Emigrants'].mean())
print("Median:", df['No. of Emigrants'].median())
print("Standard deviation:", df['No. of Emigrants'].std())

# Maaz_39_Calculate and display the total number of rows in the DataFrame
print("\nTotal number of rows:", len(df))

# Calculate and display the total number of cells (entries) in the DataFrame
print("Total number of cells:", df.size)

# Identify and display the unique values in the ‘Destination Region’ column
print("\nUnique values in the ‘Destination Region’ column:")
print(df['Destination Region'].unique())

# Display the count of each unique value in the ‘Destination Region’ column
print("\nCount of each unique value in the ‘Destination Region’ column:")
print(df['Destination Region'].value_counts())

# Identify and display the data types of each column in the DataFrame
print("\nData types of each column:")
print(df.dtypes)

# Maaz_39_Inspect and display the descriptive statistics of the ‘Year’ column
print("\nDescriptive statistics of the ‘Year’ column:")
print(df['Year'].describe())

Output:
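
To make the difference between rows, cells, and the two flavours of standard deviation concrete, here is a small sketch on a four-row stand-in DataFrame. The region and country labels and the emigrant counts are placeholder values chosen for easy mental arithmetic, not data from the Excel file.

```python
import pandas as pd

# PLACEHOLDER data with the four columns named in Task 4
df = pd.DataFrame({
    'Destination Region': ['Region A', 'Region A', 'Region B', 'Region B'],
    'Destination Country': ['Country A', 'Country B', 'Country C', 'Country D'],
    'Year': [2000, 2001, 2000, 2001],
    'No. of Emigrants': [10, 20, 30, 40],
})

n_rows = len(df)      # number of rows
n_cells = df.size     # rows * columns

emigrants = df['No. of Emigrants']
mean_val = emigrants.mean()
median_val = emigrants.median()

# pandas defaults to the sample standard deviation (ddof=1);
# pass ddof=0 for the population standard deviation
sample_std = emigrants.std()
population_std = emigrants.std(ddof=0)

# value_counts tallies how often each region appears
region_counts = df['Destination Region'].value_counts()

print(n_rows, n_cells)
print(mean_val, median_val, sample_std, population_std)
print(region_counts)
```

Here the mean and median are both 25, there are 4 rows and 16 cells, and the sample standard deviation (about 12.91) is larger than the population value (about 11.18) because it divides by n − 1 rather than n.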

5- Write a conclusion of this lab in your own words.

Write your answer here by hand:
