
 Pandas Definition:

Pandas is an open-source Python library that provides high-performance, easy-to-use data manipulation
and analysis tools. It is built on top of NumPy and is particularly suited for working with structured data,
such as tables or relational databases.
 Pandas and NumPy:

Pandas | NumPy
Designed for working with structured data, particularly tables or relational databases. | Primarily focused on numerical operations and manipulating homogeneous numerical arrays.
Provides a high-level interface for data manipulation and analysis with built-in data structures like DataFrame and Series. | Provides a multi-dimensional array object called ndarray, which is more suitable for numerical computations.
Offers powerful data alignment and handling of missing data. | Does not have built-in support for handling missing data.
Supports heterogeneous data types within a single data structure. | Supports homogeneous data types, allowing for efficient storage and computation.
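
The data type and missing data rows of the comparison can be illustrated with a minimal sketch. The sample values ("pen", "bag" and the prices) are made up for illustration and do not come from the notes.

```python
import numpy as np
import pandas as pd

# NumPy coerces mixed inputs to a single common dtype (here, a string dtype).
arr = np.array([1, "two", 3.0])
print(arr.dtype.kind)  # "U" stands for a Unicode string dtype

# A DataFrame stores one dtype per column, so numbers stay numeric.
df = pd.DataFrame({"item": ["pen", "bag"], "price": [10.5, 250.0]})
print(df.dtypes)

# pandas marks a missing entry as NaN and offers helpers to count or fill it.
s = pd.Series([1.0, None, 3.0])
missing_count = int(s.isna().sum())
filled = s.fillna(0.0)
print(missing_count, filled.tolist())
```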

 Data Structures in Pandas:


a. Series: A one-dimensional labeled array capable of holding data of any type (integer, float, string, etc.). It
is similar to a column in a spreadsheet or a database table.
b. DataFrame: A two-dimensional labeled data structure with columns of potentially different data types. It
is similar to a spreadsheet or a SQL table, where data is organized in rows and columns.
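
As a rough sketch of the two structures described above (the labels and marks are illustrative values only), the following creates one Series and one DataFrame and shows that a single DataFrame column is itself a Series.

```python
import pandas as pd

# A Series: one column of values with an index of labels.
marks = pd.Series([82, 91, 77], index=["a", "b", "c"])

# A DataFrame: several columns, each of which may have its own data type.
table = pd.DataFrame({"student": ["a", "b", "c"], "marks": [82, 91, 77]})

# Selecting one column of a DataFrame gives back a Series.
column = table["marks"]
print(type(column).__name__)
print(marks.ndim, table.ndim, table.shape)
```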

 i. Series and NumPy Array:


Series | NumPy Array
Contains both data and an index, which labels the data. | Consists only of homogeneous data (typically numerical) without any labels.
Supports labeled indexing, allowing for easy access to elements based on index labels. | Supports positional indexing, where elements are accessed based on their position in the array.
Can hold data of any type (numeric, string, etc.). | Typically holds numerical data of the same type (homogeneous arrays).
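
The difference between positional and labeled access might look like this in practice. This is a small sketch; the labels "x", "y" and "z" are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])
labelled = pd.Series(values, index=["x", "y", "z"])

by_position = values[1]     # NumPy: only positions are available
by_label = labelled["y"]    # Series: access through the index label
print(by_position, by_label)

# The labels travel with the data; the raw array can be recovered if needed.
print(labelled.index.tolist(), labelled.to_numpy())
```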

 ii. Series and DataFrame:


Series | DataFrame
Represents a one-dimensional array or column of data. | Represents a two-dimensional table of data with rows and columns.
Contains both data and an index, which labels the data. | Contains both data and row/column labels (indexes) for efficient data access.
Can hold data of any type (numeric, string, etc.). | Can hold different data types in different columns.
Equivalent to a single column in a spreadsheet or a database table. | Equivalent to a complete spreadsheet or a database table.

 iii. Series and List:
Series | List
Represents a one-dimensional labeled array. | Represents a one-dimensional collection of elements.
Supports labeled indexing, allowing for easy access to elements based on index labels. | Supports positional indexing, where elements are accessed based on their position in the list.
Can hold data of any type (numeric, string, etc.). | Can hold elements of different data types.
Provides various built-in methods and functionalities for data manipulation. | Provides basic functionalities like indexing, appending, and modifying elements.
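
One practical consequence of the last row is that arithmetic behaves differently on the two types. The sketch below uses made-up item names and prices to show it.

```python
import pandas as pd

prices_list = [100, 250, 400]
prices = pd.Series(prices_list, index=["pen", "bag", "shoes"])

# Multiplying a list repeats it; multiplying a Series works element by element.
repeated_list = prices_list * 2
doubled_series = prices * 2

print(repeated_list)            # six elements: the list written twice
print(doubled_series.tolist())  # three elements, each doubled
print(doubled_series["bag"])    # label-based access still works
```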

 Creation of Series from:

i. Scalar Value or Constant Value:

pd.Series(5, index=[1, 2, 3, 4]) # creates a Series of four elements, each equal to 5, labeled 1 to 4

ii. List:

pd.Series([1, 2, 3, 4]) # creates a Series from the list with the default index (0, 1, 2, 3)

iii. NumPy Array:

pd.Series(np.array([1, 2, 3])) # creates a Series from a NumPy array

iv. NumPy arange:

pd.Series(np.arange(0, 10, 2)) # creates a Series from the values 0, 2, 4, 6, 8 (start at 0, step by 2, stop before 10)

v. Dictionary:

# The dictionary keys (Name, Title, Profession) become the index of the Series,
# so a separate index is not normally given. If one is given, only the matching
# keys are kept and labels that are not in the dictionary get NaN.
pd.Series({"Name": "NISHA", "Title": "JHA", "Profession": "PGT IP"})
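
The scalar and dictionary cases are the easiest to misread, so here is a small runnable sketch of both. The label "City" is a hypothetical key added only to show what happens when an index label is not in the dictionary.

```python
import pandas as pd

# A scalar is repeated once for every index label.
constant = pd.Series(5, index=[1, 2, 3, 4])
print(constant.tolist())

# Dictionary keys become the index labels.
record = {"Name": "NISHA", "Title": "JHA", "Profession": "PGT IP"}
from_dict = pd.Series(record)
print(from_dict.index.tolist())

# Passing index= with a dictionary selects by key; "City" is not a key of
# the dictionary, so its value is missing (NaN).
subset = pd.Series(record, index=["Name", "City"])
print(subset)
```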

 Slicing of Series:
Slicing in a Series refers to extracting a portion of the Series based on its index. It allows you to select
specific elements or a range of elements from the Series.

Example:
series = pd.Series([1, 2, 3, 4, 5])
series[2] # Returns the element at index 2 (value: 3)
series[1:4] # Returns a new Series with elements from index 1 to index 3 (values: [2, 3, 4])
series[:3] # Returns a new Series with elements from the beginning up to index 2 (values: [1, 2, 3])
series[3:] # Returns a new Series with elements from index 3 to the end (values: [4, 5])
series[::-1] # Returns the Series in reverse order
series[:] # Returns all elements of the Series
series[::] # Returns all elements of the Series (same as series[:])
series[-3:] # Returns the last three elements (values: [3, 4, 5])
series[1::2] # Returns every second element from index 1 to the end (values: [2, 4])
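
A runnable version of a few of these slices, offered as a sketch, also shows that a slice keeps the original index labels and that a step is written after a second colon.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

middle = series[1:4]          # positions 1, 2 and 3; the stop position is excluded
print(middle.tolist())
print(middle.index.tolist())  # the slice keeps the original labels

every_second = series[1::2]   # start at position 1, then step by 2
print(every_second.tolist())

print(series[::-1].tolist())  # reversed
print(series[-3:].tolist())   # last three elements
```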

 iloc and loc:


Both iloc and loc are indexing techniques used in Pandas to access specific elements, rows, or columns in a
DataFrame or Series.
 iloc is primarily used for integer-based indexing. It allows you to access elements using integer
positions.

Example:
series = pd.Series([1, 2, 3, 4, 5])
series.iloc[2] # Returns the element at integer position 2 (value: 3)
series.iloc[1:4] # Returns a new Series with elements from integer position 1 to position 3 (values: [2, 3, 4])

 loc is used for label-based indexing. It allows you to access elements using labels or index values.

Example:
series = pd.Series([1, 2, 3, 4, 5], index=['A', 'B', 'C', 'D', 'E'])
series.loc['C'] # Returns the element with the label 'C' (value: 3)
series.loc['B':'D'] # Returns a new Series with elements from label 'B' to label 'D', both included (values: [2, 3, 4])
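
The two accessors are easiest to tell apart when the index holds integers that differ from the positions. The index values 10 to 50 below are an assumption chosen for this sketch.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5], index=[10, 20, 30, 40, 50])

by_label = series.loc[20]          # the element whose label is 20
by_position = series.iloc[1]       # the element at the second position
print(by_label, by_position)

label_slice = series.loc[20:40]    # labels 20, 30 and 40: the end label is included
position_slice = series.iloc[1:3]  # positions 1 and 2: the end position is excluded
print(label_slice.tolist(), position_slice.tolist())
```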

 Filtering of Series:
Filtering in a Series involves selecting specific elements based on certain conditions. It allows you to extract
a subset of data that satisfies a given criterion.

Example:
series = pd.Series([1, 2, 3, 4, 5])
filtered_series = series[series > 3] # Returns a new Series with elements greater than 3 (values: [4, 5])
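
Several conditions can be combined in one filter. The sketch below assumes the same five-element Series; each condition goes in its own parentheses, joined with & (and) or | (or).

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

# The condition itself is a Boolean Series, often called a mask.
mask = series > 3
print(mask.tolist())

between = series[(series > 1) & (series < 5)]   # both conditions must hold
extremes = series[(series < 2) | (series > 4)]  # either condition may hold
evens = series[series % 2 == 0]
print(between.tolist(), extremes.tolist(), evens.tolist())
```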
 Comparison of Two Series:
Comparing two Series involves checking the equality or inequality of corresponding elements in the Series.

Example:
series1 = pd.Series([1, 2, 3])
series2 = pd.Series([3, 2, 1])

series1 == series2 # Returns a new Series with Boolean values indicating whether the elements are equal or not (values: [False, True, False])
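
The other comparison operators work the same way, element by element. As a brief sketch with the same two Series:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3])
series2 = pd.Series([3, 2, 1])

greater = series1 > series2      # element-wise "greater than"
different = series1 != series2   # element-wise inequality
print(greater.tolist(), different.tolist())

# A single True/False answer for the whole Series.
all_equal = bool((series1 == series2).all())
same_content = series1.equals(series2)
print(all_equal, same_content)
```

Note that == between two Series expects them to carry the same index labels; pandas raises a ValueError when they are labeled differently.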

 Mathematical Operations on Series:

1. Without Fill Value:


Arithmetic between two Series aligns the elements by index label. Without a fill value, any label that appears in only one of the Series produces a missing value (NaN) in the result.

Example:
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

# addition
series1 + series2 # Returns a new Series with the sums; NaN for the mismatched labels A and D (values: [NaN, 6.0, 8.0, NaN])
# subtraction
series1 - series2 # Returns a new Series with the differences; NaN for the mismatched labels (values: [NaN, -2.0, -2.0, NaN])
# multiplication
series1 * series2 # Returns a new Series with the products; NaN for the mismatched labels (values: [NaN, 8.0, 15.0, NaN])
# division
series1 / series2 # Returns a new Series with the quotients; NaN for the mismatched labels (values: [NaN, 0.5, 0.6, NaN])
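
To see the alignment in action, the sketch below runs the addition and then keeps only the labels present in both Series. Using dropna() for that last step is one possible choice, not something the notes prescribe.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
series2 = pd.Series([4, 5, 6], index=["B", "C", "D"])

# The result covers every label found in either Series.
total = series1 + series2
print(total)

# Keep only the labels that exist in both Series.
common = total.dropna()
print(common.to_dict())
```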

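2. With Fill Value = 0:


When a fill value is given, a label that is missing from one of the Series is treated as that fill value before the operation is performed. The result therefore has no NaN for mismatched indices, unless the label is missing from both Series.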

Example:
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

# addition
series1.add(series2, fill_value=0) # Returns a new Series with the sums, treating the missing side as 0 (values: [1.0, 6.0, 8.0, 6.0])
# subtraction
series1.sub(series2, fill_value=0) # Returns a new Series with the differences, treating the missing side as 0 (values: [1.0, -2.0, -2.0, -6.0])
# multiplication
series1.mul(series2, fill_value=0) # Returns a new Series with the products, treating the missing side as 0 (values: [0.0, 8.0, 15.0, 0.0])
# division
series1.div(series2, fill_value=0) # Returns a new Series with the quotients, treating the missing side as 0 (values: [inf, 0.5, 0.6, 0.0]); label A is 1 / 0, which gives inf
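
The effect of fill_value is worth checking by running it, because the missing side is replaced before the calculation rather than the NaN result being replaced afterwards. A short sketch with the same two Series:

```python
import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
series2 = pd.Series([4, 5, 6], index=["B", "C", "D"])

added = series1.add(series2, fill_value=0)       # A: 1 + 0, D: 0 + 6
multiplied = series1.mul(series2, fill_value=0)  # A: 1 * 0, D: 0 * 6
divided = series1.div(series2, fill_value=0)     # A: 1 / 0, D: 0 / 6

print(added.tolist())
print(multiplied.tolist())
print(divided.tolist())
```

With division, a fill value of 0 on the right-hand side leads to a division by zero, which pandas reports as inf, so a different fill value may suit that case better.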

 Attributes and Methods of Series:


Attributes
Attribute | Description | Example
name | Gives a name to the Series | series.name = "NISHA"
index.name | Gives a name to the index of the Series | series.index.name = "JHA"
values | Returns the underlying NumPy array | series.values
index | Returns the index of the Series | series.index
dtype | Returns the data type of the elements | series.dtype
size | Returns the number of elements in the Series | series.size
shape | Returns the shape of the Series as a tuple | series.shape
hasnans | Returns True if the Series contains any NaN values | series.hasnans
ndim | Returns the number of dimensions of the Series (always 1) | series.ndim

Methods
Method | Description | Example
head(n) | Returns the first n elements of the Series | series.head(5)
tail(n) | Returns the last n elements of the Series | series.tail(3)
describe() | Provides summary statistics of the Series | series.describe()
unique() | Returns an array of unique values | series.unique()
nunique() | Returns the number of unique values | series.nunique()
sort_values() | Sorts the Series by values | series.sort_values()
sort_index() | Sorts the Series by index | series.sort_index()
max() | Returns the maximum value in the Series | series.max()
min() | Returns the minimum value in the Series | series.min()
mean() | Returns the mean of the Series | series.mean()
median() | Returns the median of the Series | series.median()
sum() | Returns the sum of the Series | series.sum()
std() | Returns the standard deviation of the Series | series.std()
isnull() | Returns a Boolean Series indicating null values | series.isnull()
notnull() | Returns a Boolean Series indicating non-null values | series.notnull()
dropna() | Removes null values from the Series | series.dropna()
fillna(value) | Fills null values with the specified value | series.fillna(0)
astype(dtype) | Converts the data type of the Series | series.astype('float')
value_counts() | Returns a Series with value frequencies | series.value_counts()
replace(old, new) | Replaces specified values with new values | series.replace(0, np.nan)

These are some commonly used attributes and methods of a Pandas Series. They can be used to retrieve information about the Series, manipulate the data, and perform various operations.
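
A few of these attributes and methods can be tried together on one small Series. The values and the name "marks" below are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

series = pd.Series([10, 20, np.nan, 20, 50], name="marks")

# Attributes are read without parentheses.
print(series.size, series.shape, series.ndim)
print(series.hasnans)

# Methods are called with parentheses.
print(series.nunique())        # NaN is not counted by default
print(series.dropna().mean())  # mean of the four non-null values
print(series.fillna(0).sum())  # NaN replaced by 0 before summing
print(series.value_counts())   # how often each value occurs
```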
 Here's an example of using the name attribute in a Pandas Series (two ways):

First way: pass the name when creating the Series.

import pandas as pd
series = pd.Series([10, 20, 30, 40, 50], name="NISHA")
print(series)

Output:
0    10
1    20
2    30
3    40
4    50
Name: NISHA, dtype: int64

Second way: assign the name attribute after creating the Series.

import pandas as pd
data = [10, 20, 30, 40, 50]
series = pd.Series(data, name="Numbers")
series.name = "NISHA" # replaces the name "Numbers" given above
print(series)

Output:
0    10
1    20
2    30
3    40
4    50
Name: NISHA, dtype: int64
