
Python Libraries - Pandas - Pandas Basics

Pandas is a library built on NumPy specifically for data analysis. You will use Pandas heavily
for data manipulation, visualization, building machine learning models, and so on.

There are two main data structures in pandas:

• series

• dataframes

DataFrames are the default way to store data, so manipulating DataFrames quickly is probably the most important skill for data analysis.

In [1]:

pip install pandas

Requirement already satisfied: pandas in c:\users\student\anaconda3\lib\site-packages (1.4.4)
Requirement already satisfied: pytz>=2020.1 in c:\users\student\anaconda3\lib\site-packages (from pandas) (2022.1)
Requirement already satisfied: numpy>=1.18.5 in c:\users\student\anaconda3\lib\site-packages (from pandas) (1.21.5)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\student\anaconda3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: six>=1.5 in c:\users\student\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

In [3]:

1 import pandas as pd

In [4]:

# The Pandas Series
# Creating a numeric pandas Series
s = pd.Series([2, 4, 5, 6, 9])
print(s)
print(type(s))
0 2
1 4
2 5
3 6
4 9
dtype: int64
<class 'pandas.core.series.Series'>
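
The left-hand column in the output above is the Series index, which defaults to 0, 1, 2, and so on. As a minimal sketch, the same values can be given explicit labels; the labels "a" to "e" are made up for this illustration.

```python
import pandas as pd

# Same values as above, with hypothetical labels instead of the default 0..4 index
s = pd.Series([2, 4, 5, 6, 9], index=["a", "b", "c", "d", "e"])

print(s["c"])      # label-based lookup
print(s.iloc[0])   # position-based lookup
print(s.index.tolist())
```

Label-based and position-based access are kept separate here (`s[...]` for labels, `.iloc` for positions) to avoid ambiguity when the labels happen to be integers.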

In [5]:

# Creating a range of dates
# Note: pd.date_range returns a DatetimeIndex, not a Series
data_series = pd.date_range(start='11-09-2017', end='12-12-2017')
data_series
# type(data_series)
Out[5]:

DatetimeIndex(['2017-11-09', '2017-11-10', '2017-11-11', '2017-11-12',
               '2017-11-13', '2017-11-14', '2017-11-15', '2017-11-16',
               '2017-11-17', '2017-11-18', '2017-11-19', '2017-11-20',
               '2017-11-21', '2017-11-22', '2017-11-23', '2017-11-24',
               '2017-11-25', '2017-11-26', '2017-11-27', '2017-11-28',
               '2017-11-29', '2017-11-30', '2017-12-01', '2017-12-02',
               '2017-12-03', '2017-12-04', '2017-12-05', '2017-12-06',
               '2017-12-07', '2017-12-08', '2017-12-09', '2017-12-10',
               '2017-12-11', '2017-12-12'],
              dtype='datetime64[ns]', freq='D')
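
The result above is a DatetimeIndex rather than a Series. If an actual Series of datetimes is wanted, one possible approach is to wrap the index in `pd.Series`. The sketch below uses ISO-style date strings (an assumption made here to avoid month/day ambiguity) and the illustrative names `dates` and `date_series`.

```python
import pandas as pd

# date_range returns a DatetimeIndex; pass it to pd.Series to get a Series
dates = pd.date_range(start="2017-11-09", end="2017-12-12")
date_series = pd.Series(dates)

print(type(dates).__name__)        # DatetimeIndex
print(type(date_series).__name__)  # Series
print(len(date_series))            # 34 daily values, both endpoints included
```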

The DataFrame
A DataFrame is the most widely used data structure in data analysis. It is a table with rows and columns, with rows identified by an index and columns holding meaningful
data.
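
To make the row/column description concrete, here is a small sketch that inspects the index, the column labels, and the shape of a DataFrame. The table contents (`item`, `qty`) are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical two-column table, for illustration only
df = pd.DataFrame({"item": ["pen", "book"], "qty": [3, 1]})

print(df.index.tolist())    # row labels: [0, 1]
print(df.columns.tolist())  # column labels: ['item', 'qty']
print(df.shape)             # (2, 2), meaning (rows, columns)
```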

Creating DataFrames from dictionaries.

EXAMPLE - 1
In [8]:

country = ['United States', 'Australia', 'India', 'Russia', 'Morrocco']
symbol = ['US', 'AU', 'IND', 'RUS', 'MOR']
dic_world = {"country": country, "symbol": symbol}

In [9]:

print(dic_world)
{'country': ['United States', 'Australia', 'India', 'Russia', 'Morrocco'], 'symbol': ['US', 'AU', 'IND', 'RUS', 'MOR']}

In [10]:

dic_world["country"]
Out[10]:

['United States', 'Australia', 'India', 'Russia', 'Morrocco']

In [11]:

dic_world["symbol"]
Out[11]:

['US', 'AU', 'IND', 'RUS', 'MOR']

In [12]:

data = pd.DataFrame(dic_world)

In [13]:

print(type(data))

<class 'pandas.core.frame.DataFrame'>

In [14]:

print(data)

         country symbol
0  United States     US
1      Australia     AU
2          India    IND
3         Russia    RUS
4       Morrocco    MOR

In [15]:

print(data["country"])

0 United States
1 Australia
2 India
3 Russia
4 Morrocco
Name: country, dtype: object

In [16]:

print(data["symbol"])

0 US
1 AU
2 IND
3 RUS
4 MOR
Name: symbol, dtype: object
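
Selecting a single column, as above, returns a Series. As a hedged sketch of two related operations, the example below uses one column as the row index and then selects a list of columns. The name `by_symbol` is introduced here for illustration.

```python
import pandas as pd

country = ['United States', 'Australia', 'India', 'Russia', 'Morrocco']
symbol = ['US', 'AU', 'IND', 'RUS', 'MOR']
data = pd.DataFrame({"country": country, "symbol": symbol})

# Use the symbol column as the row index instead of 0..4
by_symbol = data.set_index("symbol")
print(by_symbol.loc["IND", "country"])   # India

# Double brackets select a list of columns and return a DataFrame, not a Series
print(type(data[["country"]]).__name__)  # DataFrame
```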

EXAMPLE - 2
In [18]:

# Defining the lists that will become the dictionary values
cars_per_cap = [809, 731, 588, 18, 200, 70, 45]
country = ['United states', 'Australia', 'Japan', 'India', 'Russia', 'Morroco', 'Egypt']
drives_right = [False, True, True, True, False, False, False]

In [19]:

# Creating the dictionary that maps each column name (key) to its list of values
cars_dict = {"cars_per_cap": cars_per_cap, "country": country, "drives_right": drives_right}

In [20]:

print(cars_dict)

{'cars_per_cap': [809, 731, 588, 18, 200, 70, 45], 'country': ['United states', 'Australia', 'Japan', 'India', 'Russia', 'Morroco', 'Egypt'], 'drives_right': [False, True, True, True, False, False, False]}

In [21]:

print(cars_dict['cars_per_cap'])
[809, 731, 588, 18, 200, 70, 45]

In [22]:

cars = pd.DataFrame(cars_dict)

AGGREGATION FUNCTION
In [24]:

cars
Out[24]:

   cars_per_cap        country  drives_right
0           809  United states         False
1           731      Australia          True
2           588          Japan          True
3            18          India          True
4           200         Russia         False
5            70        Morroco         False
6            45          Egypt         False

In [25]:

cars.cars_per_cap

Out[25]:

0 809
1 731
2 588
3 18
4 200
5 70
6 45
Name: cars_per_cap, dtype: int64

In [26]:

print(cars.cars_per_cap.max())
809

In [27]:

print(cars.cars_per_cap.min())

18

In [28]:

print(cars.cars_per_cap.mean())
351.57142857142856

In [29]:

print(cars.cars_per_cap.std())
345.59555222005633

In [30]:

print(cars.cars_per_cap.count())

7
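
The five aggregations above can also be requested in one call. The sketch below is one possible way to do it with `Series.agg` and `Series.describe`; it rebuilds only the `cars_per_cap` column so that it runs on its own, and the name `summary` is introduced here for illustration.

```python
import pandas as pd

cars_per_cap = [809, 731, 588, 18, 200, 70, 45]
cars = pd.DataFrame({"cars_per_cap": cars_per_cap})

# agg accepts a list of aggregation names and returns one value per name
summary = cars["cars_per_cap"].agg(["max", "min", "mean", "std", "count"])
print(summary)

# describe() reports a similar set of statistics plus quartiles
print(cars["cars_per_cap"].describe())
```

The result of `agg` is itself a Series indexed by the aggregation names, so a single statistic can be read back with, for example, `summary["mean"]`.
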

In [39]:

country = ['United states', 'Australia', 'Japan', 'India', 'Russia', 'Morroco', 'Egypt']
cars_per_cap = [809, 731, 588, 18, 200, 70, 45]

In [41]:

lst = [['tom', 'reacher', 25], ['krish', 'pete', 30], ['nick', 'wilson', 26], ['julie', 'jonny', 28]]
# Passing dtype=float to the constructor tries to cast every column, including the names,
# so cast only the numeric Age column instead
df = pd.DataFrame(lst, columns=['FName', 'LName', 'Age']).astype({'Age': float})
df

Out[41]:

   FName    LName   Age
0    tom  reacher  25.0
1  krish     pete  30.0
2   nick   wilson  26.0
3  julie    jonny  28.0
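
Because this table mixes text and numeric columns, it can help to check which columns are numeric before aggregating. The following is a minimal sketch using `DataFrame.dtypes` and `DataFrame.select_dtypes`; the variable name `numeric` is an assumption made for this example.

```python
import pandas as pd

lst = [['tom', 'reacher', 25], ['krish', 'pete', 30], ['nick', 'wilson', 26], ['julie', 'jonny', 28]]
df = pd.DataFrame(lst, columns=['FName', 'LName', 'Age']).astype({'Age': float})

print(df.dtypes)  # one dtype per column

# Keep only the numeric columns; the name columns are left out
numeric = df.select_dtypes(include="number")
print(numeric.columns.tolist())  # ['Age']
```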

In [42]:

df.Age.max()
Out[42]:

30.0

In [43]:

df.Age.min()
Out[43]:

25.0

In [44]:

df.Age.mean()
Out[44]:

27.25

In [45]:

df.Age.std()

Out[45]:

2.217355782608345

In [46]:

df.Age.count()
Out[46]:

4
