All in One CH 1 Data Series

01

DATA HANDLING USING PANDAS-I

Python Pandas is a Python package providing fast, flexible and expressive data structures designed for data manipulation. Python Pandas was developed by Wes McKinney in 2008 and is used for data analysis in Python. Data analysis requires lots of processing, such as restructuring, cleaning, merging etc. Different tools are available for this, such as NumPy, SciPy, Cython and Pandas. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, namely load, prepare, manipulate, model and analyse.

Jupyter Notebook offers a good and effective environment for using Pandas to do data exploration and modelling. In this chapter, we will use Jupyter Notebook for programming.

CHAPTER CHECKLIST
    Features of Python Pandas
    Data Structure
    Series Data Structure
    DataFrame Data Structure
    Transferring Data between CSV Files and DataFrames

FEATURES OF PYTHON PANDAS


Some important features of Python Pandas are as follows

(i) Handling of Data: This Python library provides a fast and efficient way to manage and explore data. It provides two structures, Series and DataFrame, which help us not only to represent data efficiently but also to manipulate it in various ways.

(ii) Input and Output tools: Pandas provides a wide array of simple built-in input and output tools for reading and writing data.

(iii) Visualise: Visualising the data is an important part of data science. It makes the results of a study understandable to human eyes. Pandas has an in-built ability to plot your data so that you can see the various kinds of graphs formed.
Allinone | INFORMATICS PRACTICES Class 12th

(iv) Grouping: With the help of this feature of Pandas, you can split data into categories of your choice, according to the criteria you set. The GroupBy function splits the data, applies a function and then combines the result.

(v) Merging and joining of datasets: While analysing data, we constantly need to merge and join multiple datasets to create a final dataset that can be analysed properly. Pandas can merge various datasets efficiently, so that we do not face problems while analysing the data.

(vi) Optimised performance: Pandas has highly optimised performance, which makes it fast and suitable for data analysis. The critical code for Pandas is written in C or Cython, which makes it extremely responsive and fast.

To Work with Pandas
To work in Python Pandas, you need to import the pandas library. With the following command, either on the shell prompt or in a script file, you can import pandas:
    import pandas as pd

DATA STRUCTURE
It is a specialised format for organising, processing, retrieving and storing data. Any data structure is designed to arrange data to suit a specific purpose so that it can be accessed and worked with in appropriate ways.
Pandas provides two data structures for processing the data as
(i) Series: It is a one dimensional array with homogeneous data. All the elements of a series should be of the same data type.
(ii) DataFrame: It is a two dimensional array with heterogeneous data, usually represented in the tabular format. The data is represented in rows and columns. Each column represents an attribute and each row represents a record (for example, a person).

Difference between Series and DataFrame
    Series | DataFrame
    It is a one dimensional data structure. | It is a two dimensional data structure.
    It includes homogeneous data, i.e. all the elements must be of the same data type. | It includes heterogeneous data, i.e. the elements can have different data types.
    Its size is immutable, i.e. once the object has been created, its size cannot be changed. | Its size is mutable, i.e. once the object has been created, its size can be changed as required.

SERIES DATA STRUCTURE
Series is a one dimensional, list-like structure in Pandas which can hold integer values, string values, double values and more. The row labels of a Series are called the index.
A list, tuple or dictionary can be easily converted into a Series by the Series() method.
Series has the following parameters
    data     list, tuple, dictionary or scalar value
    index    Its values should be unique. It uses np.arange(n) as default, when we do not pass any index.
    dtype    data type of the series
    copy     copying the data

Creating a Series
In Pandas, a Series can be created in two ways as

1. Creating an Empty Series
We can create an empty series object, i.e. one having no values, using the Series() method.
Syntax
    Series_Object = pandas.Series()
In older versions of Pandas, an empty series has the default data type float64. From Pandas 2.0 onwards the default is object, so the data type is passed explicitly here.
    import pandas as pd
    s = pd.Series(dtype = 'float64')
    print(s)
Output
    Series([], dtype: float64)

2. Creating a Series Using Inputs
A Pandas series can be created in different ways, like from lists, dictionary, scalar value, ndarray etc.

(i) Create a Series from Array: In order to create a series from an array, we have to import the numpy module and use the array() function. If data is an ndarray, then the index passed must be of the same length.
e.g.
    import pandas as pd
    import numpy as np
    data = np.array(['P', 'R', 'O', 'G', 'R', 'A', 'M'])
    a = pd.Series(data)
    print(a)
Output
    0    P
    1    R
    2    O
    3    G
    4    R
    5    A
    6    M
    dtype: object
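The rule that an index passed along with an ndarray must be of the same length can be checked with a minimal sketch. The variable names ok and mismatch_rejected are illustrative and not taken from the chapter; the sketch only assumes that Pandas raises a ValueError when the two lengths differ.

```python
import numpy as np
import pandas as pd

data = np.array(['P', 'R', 'O', 'G', 'R', 'A', 'M'])

# Index with the same length as the data (7 labels): accepted.
ok = pd.Series(data, index=range(1, 8))
print(len(ok))

# Index shorter than the data (3 labels for 7 values): rejected.
try:
    pd.Series(data, index=[1, 2, 3])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```

Catching the error here is only for demonstration; in a notebook the traceback itself shows which lengths did not match.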
In the first array example, a = pd.Series(data), we did not pass any index, so it took the default indexes ranging from 0 to len(data)-1, i.e. 0 to 6.
If we pass index values, then the customised index values will be shown in the output.
    import pandas as pd
    import numpy as np
    data = np.array(['P', 'R', 'O', 'G', 'R', 'A', 'M'])
    a = pd.Series(data, index = [11, 12, 13, 14, 15, 16, 17])
    print(a)
Output
    11    P
    12    R
    13    O
    14    G
    15    R
    16    A
    17    M
    dtype: object

(ii) Create a Series from Lists: In order to create a series from a list, we first create the list and then create the series from it.
e.g.
    import pandas as pd
    data = ['P', 'R', 'O', 'G', 'R', 'A', 'M']
    a = pd.Series(data)
    print(a)
Output
    0    P
    1    R
    2    O
    3    G
    4    R
    5    A
    6    M
    dtype: object

(iii) Create a Series from Dictionary: When a dictionary object is passed as input and no index is specified, the dictionary keys are used to construct the index. Current versions of Pandas keep the keys in the order in which they were inserted (older versions sorted them), as the output below shows.
e.g.
    import pandas as pd
    data = {'One': 1564, 'Two': 1569, 'Three': 7896, 'Four': 7596}
    a = pd.Series(data)
    print(a)
Output
    One      1564
    Two      1569
    Three    7896
    Four     7596
    dtype: int64

(iv) Create a Series from Scalar Value: In order to create a series from a scalar value, an index should be provided. The scalar value will be repeated to match the length of the index.
    import pandas as pd
    data = pd.Series(50, index = [1, 2, 3, 4])
    print(data)
Output
    1    50
    2    50
    3    50
    4    50
    dtype: int64
We can also define the index using the range function.
    import pandas as pd
    data = pd.Series(50, index = range(1, 10, 3))
    print(data)
Output
    1    50
    4    50
    7    50
    dtype: int64

Creating Series Objects - Additional Functionality
We can also create a series object with additional functionality as

(i) Adding NaN values in a series object: When we create a series object of a certain size with incomplete data, the missing values are filled with NaN (Not a Number). This can be done using np.nan (written np.NaN in NumPy versions before 2.0).
    import pandas as pd
    import numpy as np
    data = ['Hello', 'How', np.nan, 'You']
    a = pd.Series(data)
    print(a)
Output
    0    Hello
    1      How
    2      NaN
    3      You
    dtype: object
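NaN values also appear without writing np.nan explicitly, whenever the index asks for a label that the data does not contain. The sketch below illustrates this under stated assumptions: the dictionary stock and its labels are hypothetical, and isna() is the Pandas method used here to count the missing entries.

```python
import pandas as pd

# Hypothetical data: only two of the three requested labels have values.
stock = {'pen': 120, 'ink': 45}
s = pd.Series(stock, index=['pen', 'ink', 'pad'])

print(s)                 # 'pad' has no value in the dictionary, so it shows NaN
print(s.isna().sum())    # number of missing values
print(s.dtype)           # integers are converted to float so that NaN can be stored
```

Because NaN is a floating-point value, a numeric series that contains it is stored as float64 even if all the supplied values were integers.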
(ii) Specify index as well as data with Series(): Along with values, we can also provide an index while creating a series.
Syntax
    Series_Object = pandas.Series(data = None, index = None)
Here, None is the default value.
e.g.
    import pandas as pd
    ind = [10, 11, 12, 13, 14]
    info = ['Hello', 'How', 'Who', 'This', 'The']
    a = pd.Series(data = info, index = ind)
    print(a)
Output
    10    Hello
    11      How
    12      Who
    13     This
    14      The
    dtype: object

(iii) Specify data type along with data and index: We can also specify the data type along with data and index.
Syntax
    Series_Object = pandas.Series(data = None, index = None, dtype = None)
e.g.
    import pandas as pd
    import numpy as np
    ind = [10, 11, 12, 13, 14]
    info = ['Hello', 'How', 'Who', 'This', 'The']
    a = pd.Series(data = ind, index = info, dtype = np.float64)
    print(a)
Output
    Hello    10.0
    How      11.0
    Who      12.0
    This     13.0
    The      14.0
    dtype: float64

Check Point 01
1. In which year was Python Pandas developed?
2. Which language or platform is used to write the critical code for Pandas?
3. Which function is used to create a series from an array?
4. What is the full form of NaN?
5. Write the syntax to specify index as well as data with Series().

Series Object Attributes
A series attribute is any information related to the series object, such as size, datatype etc. To use an attribute, the following syntax is used
    Series_Object.attribute_name
Some of the attributes that can be used to get information about the series object are as follows
    Attribute | Description
    Series_Object.index | It defines the index of the series
    Series_Object.shape | It returns the tuple of shape of the data
    Series_Object.dtype | It returns the data type of the data
    Series_Object.size | It returns the size of the data
    Series_Object.empty | It returns True if the series object is empty, otherwise False
    Series_Object.hasnans | It returns True if there are any NaN values, otherwise False
    Series_Object.nbytes | It returns the number of bytes in the data
    Series_Object.ndim | It returns the number of dimensions in the data
    Series_Object.itemsize | It returns the size of the datatype of an item (removed in Pandas 1.0; use Series_Object.dtype.itemsize instead)
    Series_Object.values | It returns the series as ndarray
Here are some examples which show the use of the above mentioned attributes.

(i) Retrieving Index Array and Data Array of a Series Object
If you want to retrieve the index array and data array of an existing series object, then the index and values attributes are used.
e.g.
    import pandas as pd
    info1 = pd.Series(data = [1, 3, 5, 7, 9])
    info2 = pd.Series(data = [12, 56, 36, 45], index = ['a', 'b', 'c', 'd'])
    print(info1.index)
    print(info1.values)
    print(info2.index)
    print(info2.values)
Output
    RangeIndex(start=0, stop=5, step=1)
    [1 3 5 7 9]
    Index(['a', 'b', 'c', 'd'], dtype='object')
    [12 56 36 45]

(ii) Retrieving Data Type and Size of Type
If you want to retrieve the data type and the size of the type of a series object, then the dtype and itemsize attributes are used respectively.

(Footnote on this page: This topic is removed from the latest CBSE syllabus.)


e.g.
    import pandas as pd
    info1 = pd.Series(data = [1.0, 3.3, 5.6, 7.4, 9.1])
    info2 = pd.Series(data = [12, 56, 36, 45], index = ['a', 'b', 'c', 'd'])
    print(info1.dtype)
    print(info1.dtype.itemsize)
    print(info2.dtype)
    print(info2.dtype.itemsize)
Output
    float64
    8
    int64
    8
Series_Object.itemsize was removed in Pandas 1.0, so the size of one item is read here from the dtype, as Series_Object.dtype.itemsize.

(iii) Retrieving Shape
The shape of any series can be retrieved using the shape attribute. It gives the number of elements, including missing or empty values (NaN).
e.g.
    import pandas as pd
    x = pd.Series(data = [10, 20, 30])
    y = pd.Series(data = [4.9, 8.2, 5.6, 2.9], index = ['a', 'b', 'c', 'd'])
    print(x.shape)
    print(y.shape)
Output
    (3,)
    (4,)

(iv) Retrieving Dimension, Size and Number of Bytes
If you want to retrieve the dimension use the ndim attribute, to retrieve the size use the size attribute and to retrieve the number of bytes use the nbytes attribute.
e.g.
    import pandas as pd
    x = pd.Series(data = [10, 20, 30])
    y = pd.Series(data = [4.9, 8.2, 5.6, 2.9], index = ['a', 'b', 'c', 'd'])
    print(x.ndim, y.ndim)
    print(x.size, y.size)
    print(x.nbytes, y.nbytes)
Output
    1 1
    3 4
    24 32

(v) Checking Emptiness and Presence of NaN
The empty attribute is used to check emptiness and the hasnans attribute is used to check whether the series object contains any NaN values or not.
e.g.
    import pandas as pd
    import numpy as np
    x = pd.Series(data = [1, 2, np.nan])
    y = pd.Series(data = [4.9, 8.2, 5.6, 2.9], index = ['a', 'b', 'c', 'd'])
    z = pd.Series(dtype = 'float64')
    print(x.empty, y.empty, z.empty)
    print(x.hasnans, y.hasnans, z.hasnans)
Output
    False False True
    True False False

Accessing Elements from Series (Indexing and Slicing)
An index number (an integer) is used to access an element of a Series.
To access an individual element, use the following syntax
    Series_Object[index_number]
To access multiple elements from a Series, use the slice operation. This operation is performed using the colon (:).
Different forms of the slice operation are
    [:Index] - To print elements from the beginning up to (but not including) Index
    [:-Index] - To print elements from the beginning, leaving out the last Index elements
    [Index:] - To print elements from a specific index till the end
    [Start index : End index] - To print elements within a range
    [:] - To print the whole series
    [::-1] - To print the whole series in reverse order
e.g.
    import pandas as pd
    import numpy as np
    data = np.array(['P', 'R', 'O', 'G', 'R', 'A', 'M', 1, 2, 3])
    a = pd.Series(data)
    print('a[0] :\n', a[0])
    print('a[:3] :\n', a[:3])
    print('a[:-3] :\n', a[:-3])
    print('a[3:] :\n', a[3:])
    print('a[3:7] :\n', a[3:7])
    print('a[:] :\n', a[:])
    print('a[::-1] :\n', a[::-1])
Output
    a[0] :
     P
    a[:3] :
    0    P
    1    R
    2    O
    dtype: object
    a[:-3] :
    0    P
    1    R
    2    O
    3    G
    4    R
    5    A
    6    M
    dtype: object
    a[3:] :
    3    G
    4    R
    5    A
    6    M
    7    1
    8    2
    9    3
    dtype: object
    a[3:7] :
    3    G
    4    R
    5    A
    6    M
    dtype: object
    a[:] :
    0    P
    1    R
    2    O
    3    G
    4    R
    5    A
    6    M
    7    1
    8    2
    9    3
    dtype: object
    a[::-1] :
    9    3
    8    2
    7    1
    6    M
    5    A
    4    R
    3    G
    2    O
    1    R
    0    P
    dtype: object

Operations on Series Object
We can perform various operations on a series object as follows

(i) Modify the Series Element
The data value of a series object can be modified using item assignment.
Syntax
    Series_Object[index] = new_data_value
To modify the data values falling in a given slice, use the following syntax
    Series_Object[start : stop] = new_data_value
e.g.
    import pandas as pd
    import numpy as np
    data = np.array([120, 150, 200, 175])
    a = pd.Series(data)
    print(a)
    a[2] = 500
    print("After modifying the single element\n", a)
    a[1:3] = 1000
    print("After modifying the multiple elements\n", a)
Output
    0    120
    1    150
    2    200
    3    175
    dtype: int32
    After modifying the single element
    0    120
    1    150
    2    500
    3    175
    dtype: int32
(The dtype is shown as int32 here; on platforms where NumPy's default integer is 64-bit, it is int64.)
    After modifying the multiple elements
    0     120
    1    1000
    2    1000
    3     175
    dtype: int32

(ii) Head and Tail Functions
The head() function is used to return the first n rows of a Series. The default value of n is 5.
Syntax
    Series.head(n)
The tail() function is used to return the last n rows of a Series. The default value of n is 5.
Syntax
    Series.tail(n)
e.g.
    import pandas as pd
    import numpy as np
    data = np.array(['World', 'Honesty', 'Loyalty', 'Charming', 'Intelligent', 'Wise'])
    a = pd.Series(data)
    print(a)
    hd = a.head(3)
    print("First 3 Rows")
    print(hd)
    tl = a.tail(4)
    print("Last 4 rows")
    print(tl)
Output
    0          World
    1        Honesty
    2        Loyalty
    3       Charming
    4    Intelligent
    5           Wise
    dtype: object
    First 3 Rows
    0      World
    1    Honesty
    2    Loyalty
    dtype: object
    Last 4 rows
    2        Loyalty
    3       Charming
    4    Intelligent
    5           Wise
    dtype: object

(iii) Mathematical Operations on Series
We can perform mathematical operations on series objects, but only on the matching indexes. If an index is present in only one of the two objects, the result for that index is NaN (Not a Number).
e.g.
    import numpy as np
    import pandas as pd
    x = pd.Series(data = [1, 2, 3, 4, 5])
    y = pd.Series(data = [4.9, 8.2, 5.6, 2.9, 8.0], index = ['a', 'b', 'c', 'd', 'e'])
    z = pd.Series(data = [1.2, 6.3, 4.5, 8.1, 4.3, 1.2, 2.2])
    p = pd.Series(data = [8, 6, 7, 6, 10], index = ['a', 'b', 'c', 'd', 'e'])
    print("Series 'x'")
    print(x)
    print("Series 'y'")
    print(y)
    print("Series 'z'")
    print(z)
    print("Series 'p'")
    print(p)
Output
    Series 'x'
    0    1
    1    2
    2    3
    3    4
    4    5
    dtype: int64
    Series 'y'
    a    4.9
    b    8.2
    c    5.6
    d    2.9
    e    8.0
    dtype: float64
    Series 'z'
    0    1.2
    1    6.3
    2    4.5
    3    8.1
    4    4.3
    5    1.2
    6    2.2
    dtype: float64
    Series 'p'
    a     8
    b     6
    c     7
    d     6
    e    10
    dtype: int64
In the above example, series objects x and z have matching indexes, i.e. 0, 1, 2 and so on, and series objects y and p have matching indexes, i.e. a, b, c, d, e.
So, objects x and z successfully carry out arithmetic operations on corresponding elements. In the same way, objects y and p carry out arithmetic operations on corresponding elements. Indexes 5 and 6 exist only in z, so every result involving x and z is NaN at those two indexes.
    print(x+z)
Output
    0     2.2
    1     8.3
    2     7.5
    3    12.1
    4     9.3
    5     NaN
    6     NaN
    dtype: float64
    print(y+p)
Output
    a    12.9
    b    14.2
    c    12.6
    d     8.9
    e    18.0
    dtype: float64
    print(x-z)
Output
    0   -0.2
    1   -4.3
    2   -1.5
    3   -4.1
    4    0.7
    5    NaN
    6    NaN
    dtype: float64
    print(y-p)
Output
    a   -3.1
    b    2.2
    c   -1.4
    d   -3.1
    e   -2.0
    dtype: float64
    print(x*z)
Output
    0     1.2
    1    12.6
    2    13.5
    3    32.4
    4    21.5
    5     NaN
    6     NaN
    dtype: float64
    print(y*p)
Output
    a    39.2
    b    49.2
    c    39.2
    d    17.4
    e    80.0
    dtype: float64
    print(z/x)
Output
    0    1.200
    1    3.150
    2    1.500
    3    2.025
    4    0.860
    5      NaN
    6      NaN
    dtype: float64
    print(y/p)
Output
    a    0.612500
    b    1.366667
    c    0.800000
    d    0.483333
    e    0.800000
    dtype: float64
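When the NaN results for non-matching indexes are not wanted, one option outside the scope of the examples above is the add() method of Series with its fill_value parameter, which treats a missing element as the given value. The sketch below is a minimal illustration; the series x and z here are small hypothetical ones, not the ones used in the chapter.

```python
import pandas as pd

# Hypothetical series: z has one more element (index 3) than x.
x = pd.Series([1, 2, 3])
z = pd.Series([10.0, 20.0, 30.0, 40.0])

plain = x + z                       # index 3 exists only in z, so the result there is NaN
filled = x.add(z, fill_value=0)     # the missing x[3] is treated as 0

print(plain)
print(filled)
```

The methods sub(), mul() and div() accept fill_value in the same way, so the choice between the operator and the method depends on whether a missing element should propagate as NaN or be given a default.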
(iv) Vector Operations on Series Object
Vector operations are performed individually on each element of the series object.
e.g.
    import pandas as pd
    y = pd.Series(data = [4, 8, 5, 2], index = ['a', 'b', 'c', 'd'])
    print(y)
    print("Add some value to element")
    print(y+3)
    print("Multiply some value to element")
    print(y*3)
    print("Use cubic power to element")
    print(y**3)
    print("Using relational operator on element")
    print(y>5)
Output
    a    4
    b    8
    c    5
    d    2
    dtype: int64
    Add some value to element
    a     7
    b    11
    c     8
    d     5
    dtype: int64
    Multiply some value to element
    a    12
    b    24
    c    15
    d     6
    dtype: int64
    Use cubic power to element
    a     64
    b    512
    c    125
    d      8
    dtype: int64
    Using relational operator on element
    a    False
    b     True
    c    False
    d    False
    dtype: bool

Sorting Series Values
We can sort the data of series objects using values or indexes. Sorting can be done either in ascending order or descending order.

Sort the Series Values using Values
To sort the series values using values, use the sort_values() function. This function sorts the values in ascending order.
Syntax
    Series_Object.sort_values()
If you want to sort the values in descending order, pass the argument ascending = False.
Syntax
    Series_Object.sort_values(ascending = False)
e.g.
    import pandas as pd
    x = pd.Series(data = [125, 360, 480, 560, 850], index = ['a', 'b', 'c', 'd', 'e'])
    print(x)
    print("Ascending Order")
    print(x.sort_values())
    print("Descending Order")
    print(x.sort_values(ascending=False))
Output
    a    125
    b    360
    c    480
    d    560
    e    850
    dtype: int64
    Ascending Order
    a    125
    b    360
    c    480
    d    560
    e    850
    dtype: int64
    Descending Order
    e    850
    d    560
    c    480
    b    360
    a    125
    dtype: int64
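In the example above the data was already in ascending order, so the effect of sort_values() is easier to see on unsorted data. The sketch below uses hypothetical values and shows two points: each index label moves together with its value, and sort_values() returns a new series while the original stays as it was.

```python
import pandas as pd

# Hypothetical unsorted data.
x = pd.Series([480, 125, 850, 360], index=['a', 'b', 'c', 'd'])

asc = x.sort_values()
desc = x.sort_values(ascending=False)

print(list(asc.index))    # ['b', 'd', 'a', 'c']
print(list(desc.index))   # ['c', 'a', 'd', 'b']
print(list(x.index))      # ['a', 'b', 'c', 'd'], the original is unchanged
```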
Sort the Series Values using Index
To sort the series values using indexes, use the sort_index() function. This function sorts the indexes in ascending order.
Syntax
    Series_Object.sort_index()
If you want to sort the indexes in descending order, pass the argument ascending = False.
Syntax
    Series_Object.sort_index(ascending = False)
e.g.
    import pandas as pd
    x = pd.Series(data = [125, 360, 480, 560, 850], index = ['d', 'b', 'a', 'c', 'e'])
    print(x)
    print("Ascending Order")
    print(x.sort_index())
    print("Descending Order")
    print(x.sort_index(ascending=False))
Output
    d    125
    b    360
    a    480
    c    560
    e    850
    dtype: int64
    Ascending Order
    a    480
    b    360
    c    560
    d    125
    e    850
    dtype: int64
    Descending Order
    e    850
    d    125
    c    560
    b    360
    a    480
    dtype: int64

Check Point 02
1. How to use an attribute of a series?
2. Which slice operation is used to print elements from a specific index till the end?
3. Write the syntax to modify a series element.
4. Differentiate between the head() function and the tail() function.
5. In which orders can you sort the series values?

DATAFRAME DATA STRUCTURE
DataFrame is a two dimensional data structure with labelled axes, i.e. rows and columns. A Pandas DataFrame consists of three principal components, i.e. data, rows and columns.
The characteristics of DataFrame are as follows
(i) There are two indexes in a DataFrame, named row index (axis = 0) and column index (axis = 1).
(ii) It looks like a spreadsheet which contains a row index and column index combination, where the row index is known as index and the column index is known as column name.
(iii) Indexes can contain numbers, letters or strings.
(iv) Columns of a DataFrame can be heterogeneous.

Creating a DataFrame
To create a DataFrame, data is passed in two dimensional format. Before creating a DataFrame, you must import the Pandas library as
    import pandas as pd
Syntax
    DataFrame_Object = pandas.DataFrame(data, index, columns, dtype, copy)
Here,
    data       Different forms like ndarray, series, list etc.
    index      For row labels; if no index is passed in DataFrame, then the default np.arange(n) is used.
    columns    For column labels
    dtype      Datatype of each column
    copy       Used for copying data

(i) Creating an Empty DataFrame
An empty DataFrame is the basic DataFrame.
Syntax
    DataFrame_Object = pandas.DataFrame()
e.g.
    import pandas as pd
    data = pd.DataFrame()
    print(data)
Output
    Empty DataFrame
    Columns: []
    Index: []

(ii) Creating a DataFrame from Dictionary
A dictionary has items as key : value pairs, in which the value part can be a data structure of any type such as ndarray, list etc.
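As a minimal sketch of this case, assume a dictionary whose values are lists of equal length. The names, marks and row labels below are hypothetical. Each key becomes a column name and each list becomes the data of that column.

```python
import pandas as pd

# Hypothetical data: every list has the same length (3).
data = {'Name': ['Asha', 'Ravi', 'Meena'],
        'Marks': [82, 74, 91]}

df = pd.DataFrame(data)                               # default row index 0, 1, 2
df2 = pd.DataFrame(data, index=['r1', 'r2', 'r3'])    # customised row index

print(df)
print(df.shape)
print(df2)
```

The two columns have different data types here (strings and integers), which is the heterogeneity described in the characteristics of DataFrame.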
