Loki Temp PPT Pandas 2
By: Logesh R
Agenda
• Overview of Pandas
• Pandas vs NumPy
• Installing and Importing Pandas
• Pandas Data Structures
• Operating on Data in Pandas
• Merging and Concatenating DataFrames
• Handling Missing Data
Introduction to Pandas
• Definition and Importance
• Applications in Data Analysis
• Key Features
NumPy:
• Ideal for numerical operations, array manipulation, and mathematical functions.
Pandas:
• Suited for data manipulation, analysis, and cleaning, especially when dealing with
structured data in tabular form.
Use NumPy for:
• Mathematical and array operations.
Use Pandas for:
• Data cleaning, analysis, and manipulation in tabular datasets.
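As a rough illustration of this split, the sketch below runs the same calculation on a plain NumPy array and on a labeled Pandas table. The item names and prices are made-up sample values, not data from the slides.

```python
import numpy as np
import pandas as pd

# NumPy: element-wise arithmetic on a homogeneous numeric array.
prices = np.array([10.0, 20.0, 30.0])  # hypothetical sample values
discounted = prices * 0.9

# Pandas: the same numbers stored as a labeled column in a table,
# next to a text column that a numeric array would not hold naturally.
table = pd.DataFrame({"item": ["pen", "book", "bag"], "price": prices})
table["discounted"] = table["price"] * 0.9

print(discounted)
print(table)
```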
Installing Pandas
Code:
!pip install pandas
Output:
Confirmation of successful installation.
Importing Pandas
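A minimal sketch of the import step, using the conventional pd alias that the rest of the deck relies on. Printing the version is only a quick way to confirm that the import worked.

```python
import pandas as pd  # "pd" is the conventional alias

# Printing the installed version confirms the import succeeded.
print(pd.__version__)
```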
Pandas Data Structures
Introduction to Series and DataFrame
Series:
Definition:
A one-dimensional labeled array capable of holding any data type.
Characteristics:
Indexed, Homogeneous data type, Size Immutable.
Use Cases:
Often used for representing a column in a dataset or a single-dimensional dataset.
DataFrame:
Definition:
A two-dimensional labeled data structure with columns that can be of different data types.
Characteristics:
Tabular structure, Indexed, Heterogeneous data types, Size Mutable.
Use Cases:
Represents a complete dataset, similar to a spreadsheet or SQL table.
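The difference between the two structures can be sketched as follows. The names, ages, and the Member column are hypothetical sample data chosen for illustration.

```python
import pandas as pd

# Series: one-dimensional, with an index label for every value.
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"])
print(ages["Bob"])  # look up a value by its label

# DataFrame: two-dimensional, and each column can have its own data type.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Member": [True, False, True],
})
print(people.dtypes)  # one data type per column
```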
Creating a Pandas Series
Code:
import pandas as pd
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print(series)
Creating a DataFrame
Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
Reading a Dataset
Code:
import pandas as pd
df = pd.read_csv('your_dataset.csv')
print(df.head())
Output:
The first five rows of the dataset.
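To try read_csv() without a file on disk, the sketch below feeds it a small in-memory CSV through io.StringIO. The CSV text is invented sample data; with a real file, the path would be passed instead.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for 'your_dataset.csv' (hypothetical data).
csv_text = "Name,Age\nAlice,25\nBob,30\nCharlie,35\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```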
Operating on Data in
Pandas
Pandas in action! Perform basic operations,
statistical summaries, and mean calculations
effortlessly.
Basic Operations
• head(), tail(), sample()
• describe(), info(), dtypes
• min(), max(), mean()
head(), tail(), sample()
• df.head() - Displays the first rows of the DataFrame (five by default).
• df.tail() - The opposite of head(); displays the last rows of the DataFrame (five by default).
• df.sample() - Displays randomly selected rows from the dataset (one by default).
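The three methods can be compared side by side on a small table. The id and score values are made up, and random_state is set only so that the random sample is repeatable.

```python
import pandas as pd

# Hypothetical ten-row dataset.
df = pd.DataFrame({
    "id": range(1, 11),
    "score": [55, 62, 70, 48, 90, 81, 77, 66, 59, 95],
})

first_rows = df.head(3)                       # first three rows
last_rows = df.tail(3)                        # last three rows
random_rows = df.sample(n=3, random_state=0)  # three random rows, repeatable

print(first_rows)
print(last_rows)
print(random_rows)
```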
describe(), info(), dtypes
• df.describe() - Returns summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns of the DataFrame.
• df.info() - Prints a concise summary of the DataFrame: column names, non-null counts, data types, and memory usage.
• df.dtypes - An attribute, not a method, so it is written without parentheses. It returns a Series with the data type of each column.
df.describe()
df.info()
df.dtypes
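A short sketch of the three inspection tools on a small table with made-up names and ages. Note that dtypes is accessed as an attribute; calling it with parentheses raises a TypeError.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

summary = df.describe()   # statistics for the numeric column only
df.info()                 # prints its summary directly and returns None
column_types = df.dtypes  # attribute access, no parentheses

print(summary)
print(column_types)
```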
min(), max(), mean()
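These aggregations can be applied to a single column or to the whole DataFrame. A minimal sketch, assuming a small table of invented ages and scores:

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"Age": [25, 30, 35], "Score": [80.0, 90.0, 70.0]})

youngest = df["Age"].min()      # smallest value in one column
oldest = df["Age"].max()        # largest value in one column
average_age = df["Age"].mean()  # arithmetic mean of one column
column_means = df.mean()        # mean of every column, returned as a Series

print(youngest, oldest, average_age)
print(column_means)
```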
grouped_df.get_group('group_name')  # Returns the rows that belong to the named group.
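A grouped object such as grouped_df is typically produced by df.groupby(). The sketch below assumes a hypothetical Team column as the grouping key and shows get_group() next to a per-group mean.

```python
import pandas as pd

# Hypothetical data with a column to group by.
df = pd.DataFrame({
    "Team": ["A", "B", "A", "B"],
    "Points": [10, 20, 30, 40],
})

grouped_df = df.groupby("Team")

team_a = grouped_df.get_group("A")         # only the rows where Team == "A"
mean_points = grouped_df["Points"].mean()  # mean of Points within each team

print(team_a)
print(mean_points)
```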
Merging and Concatenating
DataFrames
Concatenating DataFrames
Code:
concatenated_df = pd.concat([df1, df2])
print(concatenated_df)
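The snippet above assumes df1 and df2 already exist. A self-contained sketch with two made-up tables follows; ignore_index=True is an optional extra that renumbers the rows instead of keeping each table's original index.

```python
import pandas as pd

# Two hypothetical tables with the same columns.
df1 = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
df2 = pd.DataFrame({"Name": ["Charlie"], "Age": [35]})

# Stack the rows of df2 under df1 and renumber the index from 0.
concatenated_df = pd.concat([df1, df2], ignore_index=True)
print(concatenated_df)
```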
Merging DataFrames
The merge() method combines two DataFrames by matching
rows on one or more key columns, using the join type given
by the how parameter. It returns a new DataFrame and leaves
the originals unchanged.
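A minimal sketch of merging on a shared key, comparing an inner join with a left join. The people and cities tables and the id key are hypothetical.

```python
import pandas as pd

# Two hypothetical tables that share an "id" column.
people = pd.DataFrame({"id": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
cities = pd.DataFrame({"id": [1, 2, 4], "City": ["Paris", "Delhi", "Tokyo"]})

# Inner join: keep only ids present in both tables.
inner = pd.merge(people, cities, on="id", how="inner")

# Left join: keep every row of people; unmatched cities become NaN.
left = pd.merge(people, cities, on="id", how="left")

print(inner)
print(left)
```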
Handling Missing Data
Code:
df.fillna(<fill value>)
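The placeholder can be any replacement value. The sketch below uses a made-up table with one missing age and fills it with the column mean, which is just one possible choice; it also shows isnull() and dropna() for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, np.nan, 35],
})

missing_mask = df.isnull()  # True wherever a value is missing
dropped = df.dropna()       # remove rows that contain a missing value
filled = df.fillna({"Age": df["Age"].mean()})  # fill Age with the column mean

print(missing_mask)
print(dropped)
print(filled)
```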
Conclusion
Recap of Key Concepts
Pandas Fundamentals:
Series and DataFrame are the core structures for data manipulation and analysis.
Effortless installation and import with pip install pandas and import pandas as pd.
Data Exploration:
Reading datasets using pd.read_csv() to kickstart analysis.
Basic operations, statistical summaries, and mean calculations for quick insights.
Data Manipulation:
Data selection using labels (df.loc[]) and integer positions (df.iloc[]).
Filtering data based on specific conditions.
Advanced Operations:
Grouping and aggregating data for more in-depth analysis.
Merging and concatenating DataFrames to create comprehensive datasets.
Handling Missing Data:
Identifying missing values with df.isnull().
Dropping missing values using df.dropna() and filling them with df.fillna()
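The selection and filtering steps named in the recap can be sketched as follows. The table and its a, b, c index labels are invented for illustration.

```python
import pandas as pd

# Hypothetical data with explicit row labels.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "Age"]  # row label "b", column label "Age"
by_position = df.iloc[0, 0]    # first row, first column
over_28 = df[df["Age"] > 28]   # keep rows that satisfy a condition

print(by_label)
print(by_position)
print(over_28)
```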