
Toward Data Science

Pandas

Year 2023
Outline
 What is Pandas? Why use it?
 Getting started – Data and Installation
 Data manipulation
 Question
What is Pandas?

https://www.kaggle.com/datasets/abcsds/pokemon?resource=download
A Python library

Exploring

Manipulating

Cleaning

Analyzing
Why Use Pandas?

Ease of Use                    Performance and Scalability
Visualization and Reporting    Automation and Reproducibility
Learning Curve                 Customization and Flexibility
Quick Prototyping              Integration with Ecosystem
Data Entry and Formatting      Version Control
Compatibility                  Reproducibility and Portability


Getting started
Data link for today's lesson
Getting started
Download data on Colab

Run the line below in Colab to download the dataset

!gdown 136oQPFJqG0vugwm0oAOU9IxwJN0zT54K
Read CSV
import pandas as pd
pd.read_csv(<<data path>>, **kwargs)

All rows? How about first and last rows?
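A minimal sketch of reading a CSV and looking at all rows, the first rows, and the last rows. The CSV text here is a small hand-made sample standing in for the lesson's dataset (an assumption, so the snippet runs without any download); with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the lesson's CSV file: a tiny in-memory sample.
csv_text = """Name,Type 1,HP,Attack
Bulbasaur,Grass,45,49
Charmander,Fire,39,52
Squirtle,Water,44,48
Pikachu,Electric,35,55
Mewtwo,Psychic,106,110
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data)          # all rows
print(data.head(2))  # first two rows
print(data.tail(2))  # last two rows
```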


To CSV

To CSV format

To xlsx format, without the index column

To CSV format, using a tab as the separator
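The three exports could look roughly like this. The file names and the temporary folder are assumptions made for illustration, and the Excel export is wrapped in a `try` block because it needs an extra package such as openpyxl.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({"Name": ["Bulbasaur", "Pikachu"], "HP": [45, 35]})

# Hypothetical output locations inside a temporary folder.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "modified.csv")
xlsx_path = os.path.join(out_dir, "modified.xlsx")
tsv_path = os.path.join(out_dir, "modified.txt")

# CSV format (the row index is written as the first column by default).
data.to_csv(csv_path)

# xlsx format; index=False leaves out the index column.
try:
    data.to_excel(xlsx_path, index=False)
except ImportError:
    print("No Excel writer installed; skipping the .xlsx export")

# Still to_csv, but with a tab as the separator.
data.to_csv(tsv_path, index=False, sep="\t")
```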


Describe

Get some general information about your data
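As a small illustration, `describe()` summarizes the numeric columns (count, mean, standard deviation, min, quartiles, max). The DataFrame below is a hand-made sample, not the lesson's full dataset.

```python
import pandas as pd

# Hand-made sample standing in for the lesson's data.
data = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Mewtwo"],
    "HP": [45, 39, 44, 35, 106],
    "Attack": [49, 52, 48, 55, 110],
})

# Only numeric columns are summarized by default.
summary = data.describe()
print(summary)
```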


Head - Tail
Print the first <<nums>> rows, default = 5

Wanna get specific columns?

Print the last <<8>> rows

Wonder what columns we have?

How about specific rows?

Specific coordinates perhaps?
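One way these questions might be answered in code, using a small hand-made sample (the variable names are illustrative only):

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Mewtwo"],
    "Type 1": ["Grass", "Fire", "Water", "Electric", "Psychic"],
    "HP": [45, 39, 44, 35, 106],
})

first_rows = data.head(3)                    # first 3 rows (5 if no number is given)
last_rows = data.tail(2)                     # last 2 rows
column_names = list(data.columns)            # which columns do we have?
names_and_hp = data[["Name", "HP"]].head(3)  # specific columns only
rows_1_to_3 = data.iloc[1:4]                 # specific rows: positions 1, 2, 3
cell = data.iloc[2, 0]                       # specific coordinates: row 2, column 0

print(column_names)
print(cell)
```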


Columns

We can rearrange the data to our liking

There is also

data.Name
data['Name']

But I would not recommend using these methods


Columns

Remove a column

Add new column
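A short sketch of both operations on a hand-made sample. The `Total` column computed here is just the sum of three stats chosen for the example, not necessarily how the dataset defines it.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Pikachu"],
    "HP": [45, 39, 35],
    "Attack": [49, 52, 55],
    "Defense": [49, 43, 40],
})

# Remove a column: drop returns a new DataFrame and leaves `data` unchanged.
without_defense = data.drop(columns=["Defense"])

# Add a new column computed from existing ones (illustrative definition).
data["Total"] = data["HP"] + data["Attack"] + data["Defense"]

print(without_defense.columns.tolist())
print(data)
```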


Columns

Arrange by column names

Arrange by column indices
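Both ways of rearranging might look like this; the particular orderings are arbitrary choices for the example.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Pikachu"],
    "HP": [45, 35],
    "Attack": [49, 55],
})

# By name: list the columns in the order you want.
by_name = data[["Attack", "Name", "HP"]]

# By index: build the new order from positions in the column list.
cols = list(data.columns)
by_index = data[[cols[0]] + cols[2:] + [cols[1]]]

print(by_name.columns.tolist())
print(by_index.columns.tolist())
```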


iloc - loc

Single row

Multiple rows

A range of rows

Rows that fulfill condition
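The four row selections above can be sketched as follows, on a hand-made sample. `iloc` selects by position, while `loc` selects by label or by a boolean condition.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Mewtwo"],
    "Attack": [49, 52, 48, 55, 110],
})

single_row = data.iloc[1]              # one row, returned as a Series
multiple_rows = data.iloc[[0, 2, 4]]   # several rows by position
row_range = data.iloc[1:4]             # positions 1 to 3 (the end is excluded)
strong = data.loc[data["Attack"] > 50] # rows that fulfill a condition

print(single_row)
print(strong)
```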


iloc - loc

[Figure: grid of cells with column indices 1 to 4 and row index 0, marking the cell at row = 3, col = 4]

Locations that fulfill condition
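A possible illustration of picking a single cell by its coordinates and of selecting cells through a condition. The sample data and the chosen condition are assumptions made for the example.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Mewtwo"],
    "Type 1": ["Grass", "Fire", "Water", "Electric", "Psychic"],
    "HP": [45, 39, 44, 35, 106],
    "Attack": [49, 52, 48, 55, 110],
    "Speed": [45, 65, 43, 90, 130],
})

# Positional coordinates: row = 3, col = 4 (both counted from 0).
value = data.iloc[3, 4]

# Locations that fulfill a condition: a row filter plus a column selection.
strong = data.loc[data["Attack"] > 50, ["Name", "HP"]]

print(value)
print(strong)
```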


Filtering Data
We can filter with multiple conditions at ease
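A brief sketch of combining conditions, on a hand-made sample. Each condition goes in its own parentheses, and they are joined with `&` (and), `|` (or), or negated with `~` (not).

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Mewtwo"],
    "Type 1": ["Grass", "Fire", "Water", "Electric", "Psychic"],
    "Type 2": ["Poison", None, None, None, None],
    "Speed": [45, 65, 43, 90, 130],
    "Legendary": [False, False, False, False, True],
})

grass_and_poison = data[(data["Type 1"] == "Grass") & (data["Type 2"] == "Poison")]
fire_or_fast = data[(data["Type 1"] == "Fire") | (data["Speed"] > 80)]
not_legendary = data[~data["Legendary"]]

print(grass_and_poison)
print(fire_or_fast)
```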
Regex Filtering

Explicit Filtering

Regex Filtering, case sensitive by default

Turn off case sensitivity
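The three variants might be written like this. Reading "explicit filtering" as a plain substring match (`regex=False`) is an assumption, and the names and patterns are made up for the example.

```python
import re

import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charmeleon", "Pikachu", "Raichu", "Mewtwo"],
})

# Explicit filtering: the pattern is treated as a literal substring.
literal = data[data["Name"].str.contains("chu", regex=False)]

# Regex filtering is case sensitive by default, so "^char" misses "Charmander".
regex_case = data[data["Name"].str.contains("^char", regex=True)]

# Turn off case sensitivity with a regex flag (case=False is an alternative).
regex_nocase = data[data["Name"].str.contains("^char", regex=True, flags=re.IGNORECASE)]

print(literal)
print(regex_nocase)
```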


Regex Filtering

Basic regex syntax
Sort values

Sort by one column, default is Ascending

Sort by one column, Descending

Sort by multiple columns, 1 = ascending; 0 = descending
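A small sketch of the three sorts on a hand-made sample. The booleans `True` and `False` are used here in place of 1 and 0.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Water"],
    "HP": [45, 60, 39, 58, 44],
})

by_name = data.sort_values("Name")                    # one column, ascending
by_hp_desc = data.sort_values("HP", ascending=False)  # one column, descending

# Several columns: 'Type 1' ascending, then 'HP' descending within each type.
by_type_then_hp = data.sort_values(["Type 1", "HP"], ascending=[True, False])

print(by_type_then_hp)
```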


Reset Index

The index was not in order

Here we reset the index, but the change was not made in place

The change only registers when we save it to a variable

But as you can see, the original index is still there
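The behaviour described above can be reproduced on a small hand-made sample. The last line uses `drop=True`, which discards the old index instead of keeping it as a column.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "HP": [45, 39, 44],
})

# After sorting, the index is no longer in order.
sorted_data = data.sort_values("HP", ascending=False)

# reset_index returns a new DataFrame; sorted_data itself is unchanged.
sorted_data.reset_index()

# Saving the result keeps the change; the old index becomes an 'index' column.
kept = sorted_data.reset_index()

# drop=True discards the old index entirely.
clean = sorted_data.reset_index(drop=True)

print(kept)
print(clean)
```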
Group by

What to group by, followed by the aggregate function
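As a minimal example of the two parts: the argument of `groupby` says what to group by, and the method called afterwards is the aggregate function. The sample data is hand-made.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Water"],
    "HP": [45, 60, 39, 58, 44],
})

# Group by 'Type 1', then take the mean HP of each group.
mean_hp = data.groupby("Type 1")["HP"].mean()

# Same grouping, different aggregate: how many rows per type.
counts = data.groupby("Type 1")["Name"].count()

print(mean_hp)
print(counts)
```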


Group by

Hierarchical group
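A hierarchical group can be sketched by passing a list of columns to `groupby`; the result is indexed by each combination of values. The choice of 'Type 1' and 'Generation' and the sample rows are assumptions for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Chikorita", "Charmander", "Cyndaquil", "Vulpix"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Fire"],
    "Generation": [1, 2, 1, 2, 1],
    "Attack": [49, 49, 52, 52, 41],
})

# Group first by 'Type 1', then by 'Generation' inside each type.
mean_attack = data.groupby(["Type 1", "Generation"])["Attack"].mean()

print(mean_attack)
print(mean_attack.loc[("Fire", 1)])  # one (type, generation) combination
```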
Group by

You can use a custom function

Different functions for different columns
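Both ideas could look like this. `value_range` is a hypothetical helper written for the example, and the sample data is hand-made.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire"],
    "HP": [45, 60, 39, 58],
    "Attack": [49, 62, 52, 64],
})


def value_range(series):
    """Custom aggregate (illustrative): spread between largest and smallest value."""
    return series.max() - series.min()


# A custom function is passed to agg like any built-in aggregate.
hp_range = data.groupby("Type 1")["HP"].agg(value_range)

# A dict maps each column to its own aggregate function.
summary = data.groupby("Type 1").agg({"HP": "mean", "Attack": "max"})

print(hp_range)
print(summary)
```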
QUIZZES
Use what we have learned about Pandas to solve these questions below.
1/ How can you read a CSV file named "pokemon_data.csv" into a Pandas DataFrame?
2/ How do you display the first 5 rows of the DataFrame?
3/ How can you display the last 3 rows of the DataFrame?
4/ What method is used to see the names of all columns in the DataFrame?
5/ How can you select the rows from index 10 to 15 (inclusive)?
6/ How do you select the row with index 50?
7/ How can you filter the DataFrame to only show rows where the 'Type 1' column is 'Grass'?
8/ How can you filter the DataFrame to only show rows where the 'HP' column is greater than 100?
9/ How can you filter the DataFrame to only show rows where the 'Name' column contains the
substring 'chu'?
10/ How can you sort the DataFrame based on the 'Attack' column in descending order?
11/ How do you reset the index of the DataFrame?
12/ How can you select specific columns 'Name', 'Type 1', and 'HP' from the DataFrame?
13/ Filter the DataFrame to only show rows where 'Type 1' is 'Grass' and 'Type 2' is 'Poison'?
14/ How can you find the mean HP of Legendary Pokémon?
15/ How can you create a new DataFrame containing only the rows with even-numbered indices?
16/ How can you write the filtered DataFrame to a new CSV file named "filtered_pokemon.csv"?
17/ Display the first 10 rows of the DataFrame for columns 'Name', 'Type 1', and 'Attack'?
18/ How do you find the maximum Defense value in the DataFrame?
19/ How can you create a new DataFrame with rows where 'Speed' is above the mean Speed value?
20/ How can you select the first 3 rows and the columns 'Name' and 'Attack'?
21/ How can you find the number of Pokémon of each 'Type 1' in the DataFrame?
22/ How can you display the rows where 'Generation' is 3 or 'Generation' is 5?
23/ How can you replace all occurrences of 'Fire' in the 'Type 1' column with 'Flame'?
24/ How can you display the rows where 'Name' starts with the letter 'P'?
25/ How can you find the mean 'Attack' value for each 'Type 1'?
26/ How can you sort the DataFrame based on 'Type 1' in ascending and 'Attack' in descending order?
27/ How can you find the median 'Defense' value for each 'Type 1'?
28/ How can you filter the DataFrame to show only Legendary Pokémon with 'Attack' > 100?
29/ Create a new DataFrame with only 'Name' and 'Total' columns, and rename 'Total' to 'Overall'?
30/ How can you display the rows where 'Type 1' is 'Water' and 'Attack' is at least 80?
31/ Create a new DataFrame with only the 'Name' and 'Defense' columns for Generation 4 Pokémon?
32/ How can you find the mean 'Total' value for each 'Type 1' and 'Type 2' combination?
33/ Filter the DataFrame to show only Pokémon with a 'Total' between 400 and 500 (inclusive)?
