Pandas

Introduction to Pandas

Pandas is a powerful and flexible open-source data analysis and manipulation library for Python. It
provides the data structures and functions needed to load, clean, transform, and analyze structured
(tabular) data. Wes McKinney began developing it in 2008. Pandas is built on top of NumPy and is part
of the broader ecosystem of Python data analysis tools.

Key Features of Pandas

1. Data Structures: Pandas introduces two primary data structures: Series and DataFrame.

• Series: A one-dimensional labeled array capable of holding any data type. Labels,
known as the index, provide a way to access elements.

• DataFrame: A two-dimensional labeled data structure with columns of potentially
different types. It's similar to a spreadsheet or SQL table.

2. Data Alignment and Handling Missing Data: Pandas aligns data on index labels automatically
during arithmetic, and it handles missing data with methods like isna(), dropna(), and fillna(),
which support robust data cleaning.

3. Data Wrangling: Pandas provides powerful tools for merging, reshaping, selecting, and
filtering data. Operations like joining tables (using merge()), concatenation, and group
operations (groupby()) are straightforward and efficient.

4. Input/Output Tools: Pandas supports reading from and writing to various file formats like
CSV, Excel, SQL databases, and more, using methods such as read_csv(), read_excel(),
to_csv(), and to_sql().

5. Time Series Functionality: With robust time series capabilities, Pandas can handle frequency
conversion, moving window statistics, and date range generation.
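The label-based alignment mentioned in item 2 can be sketched as follows. The Series names and values here are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative values).
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Arithmetic aligns on index labels; a label present in only one operand gives NaN.
total = s1 + s2
print(total)

# fill_value treats the missing operand as 0 instead of producing NaN.
total_filled = s1.add(s2, fill_value=0)
print(total_filled)
```

Values are matched by label rather than by position, so the order of the rows in each Series does not affect the result.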

Basic Operations in Pandas

Importing Pandas

python

import pandas as pd

Creating Data Structures

1. Series

python

data = [1, 3, 5, 7, 9]
series = pd.Series(data)
print(series)

2. DataFrame

python

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)
print(df)
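Both structures carry labels and types that are worth inspecting after construction. The following is a small sketch; the `prices` and `people` objects are hypothetical and only serve to illustrate.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index.
prices = pd.Series([1.5, 2.25, 0.75], index=['apple', 'banana', 'cherry'])
print(prices['banana'])   # access an element by its label

# Each DataFrame column keeps its own dtype.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})
print(people.dtypes)      # one dtype per column
print(people.shape)       # (number of rows, number of columns)
```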

Reading Data

python

df = pd.read_csv('data.csv')

Writing Data

python

df.to_csv('output.csv', index=False)
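To try reading and writing without touching the file system, the same two functions can work on an in-memory text buffer. This is a minimal sketch; the small DataFrame is illustrative, and `io.StringIO` stands in for the 'data.csv' and 'output.csv' files used above.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write CSV text to an in-memory buffer rather than a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False omits the row labels
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back; column types are inferred again from the text.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```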

Data Selection and Filtering

1. Selecting Columns

python

ages = df['Age']
print(ages)

2. Selecting Rows by Label

python

# 0 is a label here: it matches the default integer index (0, 1, 2, ...).
row = df.loc[0]
print(row)

3. Selecting Rows by Position

python

# 0 is a position here: the first row, whatever its label is.
row = df.iloc[0]
print(row)

4. Filtering Rows

python

filtered_df = df[df['Age'] > 30]
print(filtered_df)
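With the default integer index, loc[0] and iloc[0] return the same row, which hides the difference between the two. The sketch below uses string labels to make it visible and also shows how filter conditions combine. The index labels are an assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['alice', 'bob', 'charlie'],   # string labels instead of 0, 1, 2
)

by_label = df.loc['bob']        # row whose index label is 'bob'
by_position = df.iloc[1]        # second row, regardless of its label
print(by_label.equals(by_position))

# loc label slices include both endpoints: rows 'alice' through 'bob', column 'Age'.
ages = df.loc['alice':'bob', 'Age']
print(ages)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
over_25_not_chicago = df[(df['Age'] > 25) & (df['City'] != 'Chicago')]
print(over_25_not_chicago)

# isin() tests membership in a list of values.
coastal = df[df['City'].isin(['New York', 'Los Angeles'])]
print(coastal)
```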

Data Manipulation

1. Adding a Column

python

df['Salary'] = [50000, 60000, 70000]
print(df)

2. Deleting a Column

python

df.drop(columns='Salary', inplace=True)
print(df)

3. Renaming Columns

python

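# rename() returns a new DataFrame. Assigning it to a new name leaves df with
# its 'Name' column, which the later examples still refer to.
renamed = df.rename(columns={'Name': 'Full Name'})
print(renamed)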

4. Handling Missing Data

python

df.fillna(0, inplace=True)
print(df)
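Filling every gap with 0 is only one option. A sketch of detecting, dropping, and filling missing values per column follows; the DataFrame with gaps is hypothetical, since the running example contains no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'City': ['New York', None, 'Chicago'],
})

missing_per_column = df.isna().sum()   # count of missing values in each column
print(missing_per_column)

complete_rows = df.dropna()            # keep only rows with no missing value
print(complete_rows)

# Fill each column with a value that suits its type; df itself is not modified.
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(filled)
```

Which strategy is appropriate depends on the data: dropping rows loses information, while filling can bias later statistics.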

Grouping and Aggregation

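Pandas provides the groupby() method for splitting a DataFrame into groups and aggregating each group.

python

# Select the numeric column before aggregating: mean() is not defined for text
# columns such as 'Name' and raises a TypeError in pandas 2.0 and later.
grouped = df.groupby('City')['Age'].mean()
print(grouped)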
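When several statistics are needed at once, agg() with named aggregation is one way to compute them in a single pass. The data below is illustrative (the running example has only one row per city), and the output column names are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Age': [25, 35, 31, 45],
    'Salary': [50000, 70000, 60000, 90000],
})

# One aggregate for one column.
mean_age = df.groupby('City')['Age'].mean()
print(mean_age)

# Named aggregation: output_name=(input_column, aggregation function).
summary = df.groupby('City').agg(
    mean_age=('Age', 'mean'),
    max_salary=('Salary', 'max'),
    people=('Age', 'count'),
)
print(summary)
```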

Merging and Joining

1. Concatenation

python

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
result = pd.concat([df1, df2])
print(result)

2. Merging

python

left = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
right = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['B0', 'B1']})
merged = pd.merge(left, right, on='key')
print(merged)
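In the merge above, both tables contain exactly the same keys, so the join type does not matter. The sketch below uses keys that only partly overlap (an assumption for illustration) to show the `how` and `indicator` parameters of merge(), and the `ignore_index` parameter of concat().

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B3']})

# Default inner join: only keys present in both tables survive.
inner = pd.merge(left, right, on='key')
print(inner)

# Outer join keeps every key; indicator=True adds a _merge column recording
# whether each row came from the left table, the right table, or both.
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
print(outer)

# concat keeps the original row labels unless ignore_index=True is passed.
stacked = pd.concat([left, left], ignore_index=True)
print(stacked.index.tolist())
```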

Advanced Features

1. Time Series

python

rng = pd.date_range('2021-01-01', periods=10, freq='D')
ts = pd.Series(range(10), index=rng)
print(ts)

2. Pivot Tables

python

pivot = df.pivot_table(values='Age', index='City', columns='Name', aggfunc='mean')
print(pivot)
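The time series item above only generates a date range. The moving window statistics and frequency conversion listed under Key Features can be sketched on the same daily series; the window size and the weekly target frequency are arbitrary choices for illustration.

```python
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
ts = pd.Series(range(10), index=rng)

# Moving-window statistic: mean over the current day and the two days before it.
# The first two entries are NaN because their windows are incomplete.
rolling_mean = ts.rolling(window=3).mean()
print(rolling_mean)

# Frequency conversion: downsample daily values to weekly sums.
# 'W' bins end on Sunday by default.
weekly = ts.resample('W').sum()
print(weekly)
```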

Visualization with Pandas

Pandas integrates with Matplotlib for easy plotting.

python

import matplotlib.pyplot as plt

df.plot(x='Name', y='Age', kind='bar')
plt.show()

Conclusion

Pandas is an essential tool for data scientists and analysts, offering powerful data manipulation
capabilities and integration with other Python libraries such as NumPy and Matplotlib. For datasets
that fit in memory, Pandas provides the flexibility and efficiency required to perform complex data
analysis tasks. As you become more familiar with its functionality, you'll find Pandas
indispensable for your data-driven projects.
