All Document Reader 1715619870900
1. Pandas Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float,
Python objects, etc.). The axis labels are collectively called the index.
A Pandas Series can be thought of as a single column in an Excel sheet. Labels need not be unique but must be of a hashable
type. The object supports both integer- and label-based indexing and provides a host of methods for performing
operations involving the index.
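The indexing behaviour described above can be sketched as follows. This is a minimal illustration, not the only way to index a Series; the values and the labels "a", "b", "b" are made up, and the labels are deliberately non-unique.

```python
import pandas as pd

# Illustrative data: three values with hashable, non-unique labels.
s = pd.Series([10, 20, 30], index=["a", "b", "b"])

labels = s.index.tolist()        # the axis labels (the index)

first_by_label = s.loc["a"]      # label-based indexing
first_by_position = s.iloc[0]    # integer (position-based) indexing

# A repeated label selects every row that carries it, so the result is a Series.
both_b = s.loc["b"]
```

Using .loc for labels and .iloc for positions keeps the two kinds of indexing unambiguous, which matters most when the labels are themselves integers.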
Creating a Series:
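Empty series
import pandas as pd
S = pd.Series(dtype="float64")  # empty Series with an explicit dtype
print(S)  # Series([], dtype: float64)
Series using array
import numpy as np
A = np.array([10, 20, 30, 40])  # np.array builds the array from the given values
S = pd.Series(A)
Series using Lists
L = [10, 11, 12, 13]
S = pd.Series(L)
Series using Dictionary
D = {"a": 10, "b": 20, "c": 30}  # example dictionary (values are illustrative)
S = pd.Series(D)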
In practice, a Pandas Series is often created by loading data from existing storage (such as a SQL database, a CSV file, or
an Excel file).
A Pandas Series can also be created from lists, dictionaries, scalar values, etc.
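As a rough sketch of these two routes, the example below builds one Series from a scalar value and another from a column of loaded CSV data. The CSV content, the column names and the index labels are all made up for illustration, and an in-memory string stands in for a real file.

```python
import io
import pandas as pd

# Scalar value: it is repeated for every label in the index (labels are illustrative).
s_scalar = pd.Series(5, index=["x", "y", "z"])

# Stand-in for a CSV file on disk; a real call would pass a file path instead.
csv_text = "name,score\nAsha,81\nRavi,74\n"
df = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column of the loaded table gives a Series.
s_scores = df["score"]
```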
Key/Value Objects as Series:
You can also use a key/value object, like a dictionary, when creating a Series. The keys of the dictionary become
the labels.
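A short sketch of the key/value case, assuming a small made-up dictionary named calories: the keys become the index labels, and passing index= selects only some of the keys.

```python
import pandas as pd

# Illustrative key/value data.
calories = {"day1": 420, "day2": 380, "day3": 390}

s = pd.Series(calories)
labels = s.index.tolist()   # the dictionary keys, now used as labels
day2 = s["day2"]            # look a value up by its label

# Passing index= keeps only the listed keys, in the given order.
subset = pd.Series(calories, index=["day1", "day2"])
```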
2. Pandas DataFrame: A Pandas DataFrame is a two-dimensional data structure with labelled axes (rows and
columns).
Creating DataFrame: In practice, a DataFrame is often created by loading data from existing storage (such as a
SQL database, a CSV file, or an Excel file).
A DataFrame can also be created from lists, dictionaries, a list of dictionaries, etc.
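The in-memory options can be sketched as below. The names and ages are made-up sample data; each call is one possible way to build the same small table.

```python
import pandas as pd

# From a dictionary of lists: each key becomes a column.
df_from_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [23, 31]})

# From a list of dictionaries: each dictionary becomes a row.
df_from_records = pd.DataFrame(
    [{"name": "Asha", "age": 23}, {"name": "Ravi", "age": 31}]
)

# From a list of lists: column names are supplied separately.
df_from_lists = pd.DataFrame([["Asha", 23], ["Ravi", 31]], columns=["name", "age"])
```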
A DataFrame stores data in a tabular fashion, aligned in rows and columns, and can hold any number of columns.
Many operations can be performed on this data, such as arithmetic operations, column/row selection, and
column/row addition.
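The operations mentioned above might look like the following sketch. The column names price, qty and total and all values are hypothetical, and pd.concat is just one way to add a row.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Arithmetic on columns, stored as a new column (column addition).
df["total"] = df["price"] * df["qty"]

# Column selection returns a Series; row selection by position returns a DataFrame.
prices = df["price"]
first_two = df.iloc[0:2]

# Row addition: concatenate a one-row DataFrame and renumber the index.
new_row = pd.DataFrame({"price": [40.0], "qty": [4], "total": [160.0]})
df_more = pd.concat([df, new_row], ignore_index=True)
```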
Locate Row:
A DataFrame is like a table with rows and columns.
Pandas uses the loc attribute to return one or more specified row(s) by label.
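A minimal sketch of loc, assuming a small made-up table whose rows are labelled day1 to day3: a single label returns the row as a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

# Illustrative data with explicit row labels.
df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

row = df.loc["day2"]               # one label: the row comes back as a Series
rows = df.loc[["day1", "day3"]]    # list of labels: the rows come back as a DataFrame
```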
Joins:
Inner: keeps only the rows whose key values appear in both DataFrames. You will lose rows that don’t have a match in the
other DataFrame’s key column.
Outer: keeps all rows from both DataFrames. No rows are lost; where there is no match, the missing values are filled with NaN.
Left: keeps all rows from the left DataFrame. Any rows from the right DataFrame that do not have a match
in the key column of the left DataFrame are discarded.
Right: the opposite of the left join. Keeps all rows from the right DataFrame. Any rows from the left
DataFrame that do not have a match in the key column of the right DataFrame are discarded.
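The four join types can be compared side by side with pd.merge and its how parameter. The two DataFrames below, and the column names key, left_val and right_val, are made up for this sketch; keys 2 and 3 are shared, key 1 exists only on the left and key 4 only on the right.

```python
import pandas as pd

# Illustrative inputs.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = pd.merge(left, right, on="key", how="inner")       # shared keys only
outer = pd.merge(left, right, on="key", how="outer")       # every key from both sides
left_join = pd.merge(left, right, on="key", how="left")    # every key from the left
right_join = pd.merge(left, right, on="key", how="right")  # every key from the right

# In the outer join, rows without a partner have NaN in the other side's columns.
missing_right = int(outer["right_val"].isna().sum())
```

Comparing the key column of each result shows which rows each join type keeps and which it drops.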
1. Data visualization: the graphical or pictorial representation of data using graphs, charts, etc. The
purpose of plotting is to visualize variation or show relationships between variables.
Data visualization is used effectively in fields such as health, finance, science, mathematics, and
engineering. With matplotlib, data can be visualized as line, bar, or scatter plots, depending on the type of
data.
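A rough sketch of the three plot types with matplotlib follows. The months, sales and ad_spend values are invented for illustration, and the choice of which plot suits which data is only a common convention: a line for a trend over ordered points, bars for comparing categories, and a scatter for the relationship between two numeric variables.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

# Made-up sample data.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [12, 18, 15, 22]
ad_spend = [3, 5, 4, 6]

fig, (ax_line, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(12, 3))

# Line plot: variation over ordered points.
ax_line.plot(months, sales, marker="o")
ax_line.set_title("Line")

# Bar chart: comparison across categories.
ax_bar.bar(months, sales)
ax_bar.set_title("Bar")

# Scatter plot: relationship between two numeric variables.
ax_scatter.scatter(ad_spend, sales)
ax_scatter.set_xlabel("ad_spend")
ax_scatter.set_ylabel("sales")
ax_scatter.set_title("Scatter")

fig.tight_layout()
```

In an interactive session the matplotlib.use("Agg") line can be dropped and plt.show() called at the end; fig.savefig with a file name of your choice writes the figure to disk instead.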