Python Pandas Series


The advantages of Pandas over Excel are:
● Scalability - Pandas is limited only by hardware and can manipulate larger quantities of data.
● Speed - Pandas is much faster than Excel, which is especially noticeable when working with larger quantities of data.
● Automation - Many of the tasks that can be achieved with Pandas are easy to automate, reducing the amount of tedious and repetitive work that needs to be performed daily.
● Interpretability - It is easy to interpret what happens when each task is run, and it is relatively easy to find and fix errors.
● Advanced Functions - Performing advanced statistical analysis and creating complex visualizations is straightforward.
Module: A module is a file that contains Python functions. It is a .py file containing executable Python code or statements.
Package: A package is a namespace that contains multiple packages or modules. It is a directory that contains a special file, __init__.py.
A namespace is a system that has a unique name for each and every object in Python. An object might be a variable or a method.
Library: A library is a collection of various packages. Conceptually, there is no difference between a package and a Python library.
Framework: A framework is a collection of various libraries that architects the code flow.
Data structure:
● Data organized in a structure.
● It stores the data in a specific manner.
● It is a collection of data values and the operations that can be applied to that data.
● It enables efficient storage, retrieval, and modification of the data.
Pandas:
Pandas is a software library for the Python programming language, written by Wes McKinney for data manipulation and analysis. The name Pandas is derived from the term “Panel Data”. It is open source and free to use.
Pandas provides two main data structures for analyzing data:
● Series
● DataFrame
Installation of pandas:

pip install pandas

(pip is the preferred installer program for Python packages.)
KEY FEATURES OF PANDAS
● Fast and efficient DataFrame object with default and customized indexing.
● Tools for loading data from different file formats.
● Data alignment and integrated handling of missing data.
● Reshaping and pivoting of data sets.
● Label-based slicing and indexing of large data sets.
● Insertion and deletion of columns in a data structure.
● Group by data for aggregation and transformations.
● High performance merging and joining of data.
SERIES
● A Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of integers 10, 23, 56, …
SERIES (CONTD.)
● A pandas series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).
● The axis labels are collectively called the index.
● A pandas series is comparable to a single column in an Excel sheet.
● The object supports both integer and label-based indexing and provides a host of methods for performing operations involving the index.
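The points above can be sketched in a few lines. This is a minimal illustration, not taken from the original slides; the labels 'a', 'b', 'c' are assumptions chosen for the example.

```python
import pandas as pd

# Illustrative values; the labels 'a', 'b', 'c' are assumed for this sketch.
s = pd.Series([10, 23, 56], index=['a', 'b', 'c'])

print(s.index)     # the axis labels, collectively called the index
print(s['b'])      # label-based access
print(s.iloc[1])   # position-based access to the same element
```

Both lookups return the same element, because label 'b' sits at position 1.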
Characteristics of series

A series is a one-dimensional labeled array capable of holding homogeneous data of any type (integer, string, float, etc.).

By default, the data labels in a series are integers starting from 0.

The data labels are called the index.

The data in a series is mutable, i.e. its values can be changed. The size of a series is treated as immutable: methods that remove elements, such as drop(), return a new series by default rather than resizing the original.
CREATING A SERIES
A pandas series can be created from a list, a dictionary, a scalar value, etc.
Syntax
pandas.Series(data, index, name)
Where
data: takes various forms such as an ndarray, list, constant/scalar value, dictionary, or mathematical expression.
index: the labels for the data. The values must be hashable and the index must have the same length as data; the values need not be unique. The default is np.arange(n) if no index is passed.
name: allows you to give a name to the series object.
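A minimal sketch of the syntax above, assuming small made-up values. The variable names, the labels, and the name 'readings' are hypothetical; dtype is an optional argument of pandas.Series that fixes the data type.

```python
import numpy as np
import pandas as pd

# From a list, with the default integer index 0..n-1.
s1 = pd.Series([3, 5, 7])

# From an ndarray, with an explicit index, dtype, and name (all illustrative).
arr = np.array([1.5, 2.5, 3.5])
s2 = pd.Series(arr, index=['x', 'y', 'z'], dtype='float64', name='readings')

print(s1)
print(s2)
```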
Series() with arguments

Syntax:

<Series object> = pandas.Series(data, index=idx, [dtype=<data type>])

The data supplied to Series() can be any of the following:

● A sequence (list)
● An ndarray
● A scalar value
● A Python dictionary
● A mathematical expression/function
Creating a series from a dictionary
When a series is created from a dictionary, the keys of the dictionary become the index of the series.
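A short sketch of dictionary input. The dictionary contents are made up for illustration.

```python
import pandas as pd

# Hypothetical subject marks; the keys become the index labels.
marks = {'maths': 91, 'physics': 84, 'chemistry': 78}
s = pd.Series(marks)

print(s.index)        # the dictionary keys
print(s['physics'])   # look up a value by its key
```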
Creating a series with index of string type
Strings can be used as index labels for the elements of a series.
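For instance, a sketch with assumed month labels as the index:

```python
import pandas as pd

# Days per month, labeled with string index values (illustrative data).
s = pd.Series([31, 28, 31], index=['Jan', 'Feb', 'Mar'])

print(s['Feb'])   # access an element by its string label
```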
Creating a series using two different lists
Two lists are passed as arguments to Series(): one list supplies the index and the other supplies the values.
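A sketch with two hypothetical lists. Keyword arguments are used here so that it is explicit which list is the index and which is the data, regardless of the order in which they are written.

```python
import pandas as pd

names = ['Asha', 'Ravi', 'Meena']   # hypothetical index labels
scores = [72, 65, 88]               # hypothetical values

s = pd.Series(index=names, data=scores)
print(s)
```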
Creating a series using missing values (NaN)
In certain situations, we need to create a series object whose size is defined but in which some elements are missing. Such elements are represented by NaN (Not a Number), which the NumPy library provides as np.nan (the older alias np.NaN was removed in NumPy 2.0).
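A minimal sketch, with made-up values, of a series whose size is fixed but where two elements are missing:

```python
import numpy as np
import pandas as pd

# Four slots, two of them missing (illustrative data).
s = pd.Series([7.5, np.nan, 9.0, np.nan])

print(len(s))            # the size still counts the missing entries
print(s.isna().sum())    # number of missing entries
```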
Creating a series using range()
● A series can be created from a range() object.
● The index can also be changed in place:

ser.index = ['first', 'second', 'third', 'fourth', 'fifth']
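The original code for this slide is not available, so the following is a sketch under the assumption that the series holds five elements, matching the five labels above.

```python
import pandas as pd

# Build a series from range(); range(5) is an assumed example.
ser = pd.Series(range(5))

# Replace the default 0..4 index in place with string labels.
ser.index = ['first', 'second', 'third', 'fourth', 'fifth']
print(ser)
```

The new index must have the same length as the series; otherwise pandas raises an error.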


Creating a series with range() & for loop
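One possible form of this, sketched with assumed values: range() supplies the data and a for loop (written as a list comprehension) supplies the index labels.

```python
import pandas as pd

# range(1, 15, 3) yields five values; the loop over 'abcde' yields five labels.
s = pd.Series(range(1, 15, 3), index=[ch for ch in 'abcde'])
print(s)
```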
Creating a series from scalar or constant values
A series can be created from a scalar or constant value. When data is a scalar, an index should be provided, and the constant value is repeated to match the length of the index.
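A short sketch with an assumed scalar and index:

```python
import pandas as pd

# The scalar 50 is repeated once for every index label.
s = pd.Series(50, index=['a', 'b', 'c'])
print(s)
```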
CREATING A SERIES USING MATHEMATICAL EXPRESSION/FUNCTION
A series object can be created from a function or a mathematical expression that determines the values of the data sequence, using the following syntax:
<Series object> = pd.Series(index=None, data=<expression [function]>)
CREATING A SERIES USING A MATHEMATICAL
FUNCTION
A series can be created using a mathematical exponentiation function.
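A sketch of both cases, assuming a small NumPy array as the starting point. The array and the expressions are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.arange(1, 5)   # array([1, 2, 3, 4])

# Data computed by a mathematical expression.
doubled = pd.Series(index=a, data=a * 2)

# Data computed by exponentiation (each element squared).
squared = pd.Series(index=a, data=a ** 2)

print(doubled)
print(squared)
```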
SERIES OBJECT ATTRIBUTES
Some common attributes of a series object are described below. They are accessed using the syntax:
<series object>.<attribute name>

Attribute        Description
Series.index     Returns the index of the series
Series.values    Returns the data as an ndarray
Series.dtype     Returns the dtype object of the underlying data
Series.shape     Returns a tuple of the shape of the underlying data
Series.nbytes    Returns the number of bytes in the underlying data
Series.ndim      Returns the number of dimensions
Series.size      Returns the number of elements
Series.hasnans   Returns True if there are any NaN values
Series.empty     Returns True if the series object is empty
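The attributes can be inspected on a small example. The data below is assumed for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])

print(s.index)     # index labels
print(s.values)    # underlying ndarray
print(s.dtype)     # float64
print(s.shape)     # one-element tuple: (number of elements,)
print(s.nbytes)    # 3 elements x 8 bytes each
print(s.ndim)      # a series always has one dimension
print(s.size)      # number of elements, including NaN
print(s.hasnans)   # True, because one value is NaN
print(s.empty)     # False, because the series has elements
```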
INDEXING AND SELECTING DATA IN SERIES
ACCESSING DATA FROM A SERIES WITH
POSITION
We can access data from a series by passing the position
value and even through slicing.
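A sketch with assumed data. The series uses the default integer index, so the label 0 and the position 0 refer to the same element.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

first = s[0]     # element at position 0 (also label 0 with the default index)
part = s[1:4]    # slice: positions 1, 2, and 3 (the stop is excluded)

print(first)
print(part)
```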
Accessing data through iloc & loc
● Indexing and accessing can also be done using iloc and loc.
● iloc: used for indexing and selecting based on position, i.e. by row number and column number. It refers to position-based indexing.

syntax: <object>.iloc[<row no. range>, <column no. range>]

● loc: used for indexing and selecting based on name, i.e. by row name and column name. It refers to name-based (label-based) indexing.

syntax: <object>.loc[<list of row names>, <list of column names>]

A series has only one axis, so only the row part applies to it; the column part is used with a DataFrame.
Accessing data using iloc & loc
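A minimal sketch on a series with assumed string labels. Note the difference in how the two accessors treat the end of a slice.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_pos = s.iloc[1:3]        # positions 1 and 2; the stop position is excluded
by_label = s.loc['b':'c']   # labels 'b' through 'c'; the stop label is included
single = s.loc['d']         # a single element by label

print(by_pos)
print(by_label)
print(single)
```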
Retrieving values from a series using head() and tail() functions
The Series.head() function returns the first n elements of a series. By default, n is 5, so it returns the first 5 elements.

The Series.tail() function returns the last n elements; by default, the last 5.
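A sketch on an assumed ten-element series:

```python
import pandas as pd

s = pd.Series(range(10, 20))   # values 10..19

print(s.head())    # first 5 elements (default)
print(s.head(3))   # first 3 elements
print(s.tail())    # last 5 elements (default)
print(s.tail(2))   # last 2 elements
```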


Mathematical operations on a series
Mathematical processing can be performed on a series using scalar values and functions. All the arithmetic operators, such as +, -, *, and /, can be applied to a series.

Note: When two series are combined arithmetically, values are matched by index label. A label that is not present in both series produces NaN in the result.
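The behavior described in the note can be seen in a small sketch with two assumed series whose indexes only partly overlap.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

scaled = s1 * 2    # arithmetic with a scalar applies to every element
total = s1 + s2    # values are aligned by label before adding

print(scaled)
print(total)       # 'a' and 'd' appear in only one series, so they are NaN
```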
Vector operations on a series
A vector operation applies a function or expression individually to each item of the object. Since Series objects are built upon NumPy arrays (ndarrays), they also support vectorized operations, just like ndarrays.
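A few vector operations, sketched on assumed data. Each expression is applied to every element without an explicit loop.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9, 16])

plus = s + 2          # add 2 to each element
roots = np.sqrt(s)    # NumPy function applied element-wise
flags = s > 5         # comparison applied element-wise, giving booleans

print(plus)
print(roots)
print(flags)
```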
Retrieving values using conditions
We can also retrieve the values of a series that satisfy a given condition. The condition performs a filter operation: the result contains only those values for which the Boolean expression returns True.
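A sketch of filtering with a condition; the data and the threshold 7 are assumptions for the example.

```python
import pandas as pd

s = pd.Series([5, 12, 8, 20, 3])

mask = s > 7          # Boolean series: True where the condition holds
filtered = s[mask]    # keeps only the elements where mask is True

print(filtered)
```

The filtered result keeps the original index labels of the selected elements.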
DELETING ELEMENTS FROM A SERIES
We can delete an element from a series using the drop() method, passing the index of the element to be deleted as the argument.
Example: the series with all the elements intact, and the series after deleting the item at index 3.
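The original example output is not available, so here is a sketch with assumed values that deletes the item at index 3.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# drop() returns a new series; s itself is left unchanged by default.
trimmed = s.drop(3)

print(s)         # all the elements intact
print(trimmed)   # the item at index 3 is gone
```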
Sorting on the Values and Index
To sort a Series object by its values or by its index, use sort_values() and sort_index() respectively.

Series.sort_values() sorts the values of a series in ascending or descending order. The order is set with the ascending parameter, which is True by default.

Syntax of Series.sort_values():
● Series.sort_values(axis=0, ascending=True)

# Create a series containing numeric values.
import pandas as pd
myseries = pd.Series([25000, 30000, 23000, 15000, 80000])

# Sort the series in ascending order (the default).
sortedseries = myseries.sort_values()
sortedseries = myseries.sort_values(ascending=True)

# Sort the series in descending order.
sortedseries = myseries.sort_values(ascending=False)

# Sort in place: modifies myseries itself and returns None.
myseries.sort_values(ascending=False, inplace=True)
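sort_index() works the same way but orders the series by its index labels instead of its values. A sketch, with labels assumed for illustration:

```python
import pandas as pd

s = pd.Series([25000, 30000, 23000], index=['c', 'a', 'b'])

by_index = s.sort_index()                       # labels in ascending order
by_index_desc = s.sort_index(ascending=False)   # labels in descending order

print(by_index)
print(by_index_desc)
```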
