
10/8/21, 12:19 AM Numpy_KickStart - Jupyter Notebook

Why Numpy?
NumPy is used to perform numerical operations on arrays, such as element-wise addition and multiplication, and to create dummy values (sequences, constant-filled arrays, random numbers).

In [1]:

import numpy as np

In [2]:

arr_1 =np.array([1,2,3,4,5])
arr_1

Out[2]:

array([1, 2, 3, 4, 5])

In [4]:

type(arr_1)

Out[4]:

numpy.ndarray

In [5]:

list_1 = [1,2,3,4,5]
list_1

Out[5]:

[1, 2, 3, 4, 5]

In [6]:

type(list_1)

Out[6]:

list

List Vs Numpy

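1. A list does not support element-wise operations directly; an array does.

2. An array is homogeneous (all elements share one dtype); a list is heterogeneous (elements may have different types).

3. A list can be converted into an array and vice versa, but a list has no ndim or shape attribute, so the dimension information is no longer directly available after converting an array to a list.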
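
The three points above can be sketched as follows. This is a minimal illustration; the variable names (values, mixed, nested) are made up for this example and do not come from the notebook.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# 1. Element-wise arithmetic: a list needs an explicit loop or comprehension,
#    while an array applies the operation to every element directly.
list_plus_one = [v + 1 for v in values]
arr_plus_one = arr + 1

# 2. A list keeps each element's own type; an array converts everything
#    to one common dtype (here: float).
mixed = [1, 2.5]
mixed_arr = np.array(mixed)

# 3. Round trip: the nesting survives, but only the array exposes ndim/shape.
nested = [[1, 2], [3, 4]]
round_trip = np.array(nested).tolist()

print(list_plus_one, arr_plus_one)
print(type(mixed[0]).__name__, mixed_arr.dtype)
print(round_trip == nested, hasattr(round_trip, 'ndim'))
```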

Create 1D array

localhost:8888/notebooks/Data science/Numpy_KickStart.ipynb 1/14



In [10]:

arr_1d = np.array([1,2,3,4,5])
print(arr_1d)
print('No of dimensions: ',arr_1d.ndim) #Attribute
print('No of elements : ',arr_1d.size) #Attribute
print('Index of max : ',arr_1d.argmax()) #Returns the index of the max value, not the value itself

[1 2 3 4 5]
No of dimensions: 1
No of elements : 5
Index of max : 4
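
Since argmax() returns a position rather than a value, it helps to see it next to max(). A small sketch on the same array (max_index and max_value are names chosen only for this example):

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5])

max_index = arr_1d.argmax()   # position of the largest value
max_value = arr_1d.max()      # the largest value itself

print('Index of max :', max_index)
print('Max value    :', max_value)
print('Same element :', arr_1d[max_index] == max_value)
```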

Differences

In [13]:

list_1.append([6,7])

In [14]:

list_1

Out[14]:

[1, 2, 3, 4, 5, [6, 7]]

In [ ]:

arr_1d #An array has a fixed size: there is no in-place append as with a list
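
An array cannot grow in place, but np.append builds a new, longer array from an existing one. A minimal sketch (the name extended is chosen only for this example):

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5])

# np.append does not modify arr_1d; it creates and returns a new array.
extended = np.append(arr_1d, [6, 7])

print(arr_1d)     # original is unchanged
print(extended)   # new array with the extra values
```

Unlike list.append, the values 6 and 7 are added as individual elements, not as a nested list.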

In [16]:

arr_1d + 3

Out[16]:

array([4, 5, 6, 7, 8])

In [17]:

list_1 + 3 #A list does not support element-wise arithmetic

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-6657cea64c08> in <module>
----> 1 list_1 + 3

TypeError: can only concatenate list (not "int") to list


In [18]:

list_2 = [1,3.5,'Vennela']
list_2

Out[18]:

[1, 3.5, 'Vennela']

In [21]:

arr_2 = np.array([1,2,3])
print(arr_2)
print(arr_2.dtype)

[1 2 3]

int32

In [22]:

arr_2 = np.array([1,2.4,3])
print(arr_2)
print(arr_2.dtype)

[1. 2.4 3. ]

float64

In [23]:

arr_2 = np.array([1,2.4,'3'])
print(arr_2)
print(arr_2.dtype)

['1' '2.4' '3']

<U32
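
The three cells above show NumPy choosing one common dtype for all elements. The sketch below repeats them in a loop and prints dtype.kind, a one-letter code for the dtype family. The exact integer width (int32 in the output above, int64 on many other setups) depends on the platform and NumPy version, so the example checks only the kind.

```python
import numpy as np

inputs = ([1, 2, 3], [1, 2.4, 3], [1, 2.4, '3'])

kinds = []
for data in inputs:
    arr = np.array(data)
    # dtype.kind: 'i' = signed integer, 'f' = float, 'U' = Unicode string
    kinds.append(arr.dtype.kind)
    print(data, '->', arr.dtype, arr.dtype.kind)
```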

Create a 2d array

In [28]:

arr_2d = np.array([[1,2,3],[4,5,6]])
print(arr_2d)
print('No of dimensions : ',arr_2d.ndim)

[[1 2 3]
 [4 5 6]]
No of dimensions : 2

Create 3d array


In [33]:

arr_3d = np.array([[[1,2,3],[4,5,6],[7,8,9]]])
print(arr_3d)
print('No of dimensions : ',arr_3d.ndim)
print('Type of elements : ',arr_3d.dtype)

[[[1 2 3]
  [4 5 6]
  [7 8 9]]]
No of dimensions : 3
Type of elements : int32

Convert int to float

In [35]:

arr_3d_converted = arr_3d.astype(dtype = 'float')


print(arr_3d_converted)
print('No of dimensions : ',arr_3d_converted.ndim)
print('Type of elements : ',arr_3d_converted.dtype)

[[[1. 2. 3.]
  [4. 5. 6.]
  [7. 8. 9.]]]
No of dimensions : 3
Type of elements : float64
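
Converting in the other direction, from float to int, is lossy: astype drops the fractional part instead of rounding. A short sketch with made-up values (floats, as_int and rounded are names chosen only for this example):

```python
import numpy as np

floats = np.array([1.9, -1.9, 2.5])

# astype('int') truncates toward zero; it does not round.
as_int = floats.astype('int')

# If rounding is intended, round first and convert afterwards.
# np.rint rounds halves to the nearest even number, so 2.5 becomes 2.
rounded = np.rint(floats).astype('int')

print(as_int)
print(rounded)
```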

List to Array Conversion

In [40]:

list_3 = [[1,2,3,4],[3,7,8,9]]
print(list_3)
print(type(list_3))

#Conversion
list_to_array = np.array(list_3)
print(list_to_array)
print(type(list_to_array))
print(list_to_array.ndim)

[[1, 2, 3, 4], [3, 7, 8, 9]]
<class 'list'>
[[1 2 3 4]
 [3 7 8 9]]
<class 'numpy.ndarray'>
2

Array to List Conversion


In [45]:

arr_4 = np.array([[1,2,3],[4,5,6]])
print(arr_4)
print(type(arr_4))
print('No of dimensions : ',arr_4.ndim)

#Conversion
arr_to_list = arr_4.tolist()
arr_to_list

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>
No of dimensions : 2

Out[45]:

[[1, 2, 3], [4, 5, 6]]

In [48]:

import pandas as pd
pd.read_csv('dummy_data.csv')

Out[48]:

       Name   Age  Salary
0       Ram  30.0   80000
1    Vinoth  32.0  120000
2  Ishwarya   NaN   70000
3    Shadab  27.0   60000

Create NaN with NumPy

In [52]:

arr_5 = np.array([[1.,2,3],[4,5,6]])
arr_5

Out[52]:

array([[1., 2., 3.],
       [4., 5., 6.]])

In [53]:

arr_5[0, 0] = np.nan #Works because arr_5 has a float dtype


In [54]:

arr_5

Out[54]:

array([[nan,  2.,  3.],
       [ 4.,  5.,  6.]])
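
Once an array contains NaN, ordinary aggregations return NaN as well. The sketch below shows how to detect NaN and how to skip it with np.nanmean, and that an integer array cannot hold NaN at all. The names int_arr and int_assignment_failed are illustrative only.

```python
import numpy as np

arr_5 = np.array([[1., 2, 3], [4, 5, 6]])
arr_5[0, 0] = np.nan

nan_mask = np.isnan(arr_5)          # True where the value is NaN
plain_mean = arr_5.mean()           # NaN propagates through the mean
nan_safe_mean = np.nanmean(arr_5)   # ignores NaN: (2+3+4+5+6)/5

print(nan_mask)
print(plain_mean, nan_safe_mean)

# NaN is a float value, so an integer array rejects it.
int_arr = np.array([1, 2, 3])
try:
    int_arr[0] = np.nan
    int_assignment_failed = False
except ValueError as exc:
    int_assignment_failed = True
    print('ValueError:', exc)
```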

Statistical Operations

In [55]:

arr_6 = np.array([1,2,3,4,5,6,7,8,9,10])
print(arr_6)

[ 1 2 3 4 5 6 7 8 9 10]

In [56]:

arr_6.sum()

Out[56]:

55

In [57]:

arr_6.prod()

Out[57]:

3628800

In [58]:

arr_6.mean()

Out[58]:

5.5

In [59]:

arr_6.std() #Standard deviation: how far the data points spread around the mean (population SD, ddof=0 by default)

Out[59]:

2.8722813232690143
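
By default std() divides by N (population standard deviation). The sketch below reproduces the value above by hand and shows the sample version, which divides by N - 1 via ddof=1. The names population_sd, sample_sd and manual_sd are chosen only for this example.

```python
import numpy as np

arr_6 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

population_sd = arr_6.std()        # ddof=0: divide by N
sample_sd = arr_6.std(ddof=1)      # ddof=1: divide by N - 1

# Same computation written out: root of the mean squared deviation.
manual_sd = np.sqrt(((arr_6 - arr_6.mean()) ** 2).mean())

print(population_sd, manual_sd)
print(sample_sd)
```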

In [60]:

arr_6.argmax()

Out[60]:

9

Reshaping

In [64]:

arr_7 = np.array([[1,2,3,4,5],[2,3,4,4,6]])
print(arr_7)
print('Dimension: ',arr_7.ndim)
print('Shape : ',arr_7.shape)

[[1 2 3 4 5]
 [2 3 4 4 6]]
Dimension: 2
Shape : (2, 5)

In [68]:

arr_7_reshape =arr_7.reshape((5,2))
print(arr_7_reshape)
print('Dimension: ',arr_7_reshape.ndim)
print('Shape : ',arr_7_reshape.shape)

[[1 2]
 [3 4]
 [5 2]
 [3 4]
 [4 6]]
Dimension: 2
Shape : (5, 2)

Reshape to 1 dimension

In [74]:

arr_7 = arr_7.reshape(1,10) #Still 2D: 1 row x 10 columns
print(arr_7)
print('Dimension: ',arr_7.ndim)
print('Shape : ',arr_7.shape)

[[1 2 3 4 5 2 3 4 4 6]]

Dimension: 2

Shape : (1, 10)

In [75]:

arr_7 = arr_7.flatten()
print(arr_7)
print('Dimension: ',arr_7.ndim)
print('Shape : ',arr_7.shape)

[1 2 3 4 5 2 3 4 4 6]

Dimension: 1

Shape : (10,)
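
Besides flatten(), reshape(-1) also produces a 1D array; the -1 asks NumPy to work out the length itself. One difference worth knowing: flatten() always copies the data, while reshape returns a view of the original when it can. A minimal sketch (flat_copy and flat_view are names chosen only for this example):

```python
import numpy as np

arr_7 = np.array([[1, 2, 3, 4, 5], [2, 3, 4, 4, 6]])

flat_copy = arr_7.flatten()     # always a new, independent array
flat_view = arr_7.reshape(-1)   # -1: length inferred; a view here

# Changing the copy leaves the original untouched.
flat_copy[0] = 99

print(flat_view.shape, flat_view.ndim)
print(arr_7[0, 0])
print(np.shares_memory(arr_7, flat_view), np.shares_memory(arr_7, flat_copy))
```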

Sequencing, Repetition and Random numbers


In [82]:

np.arange(1,21,dtype='int')

Out[82]:

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [83]:

np.arange(1,21,dtype='float')

Out[83]:

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
       14., 15., 16., 17., 18., 19., 20.])

In [84]:

np.linspace(start = 1,stop = 50,num=20) #Return evenly spaced numbers over a specified interval

Out[84]:

array([ 1.        ,  3.57894737,  6.15789474,  8.73684211, 11.31578947,
       13.89473684, 16.47368421, 19.05263158, 21.63157895, 24.21052632,
       26.78947368, 29.36842105, 31.94736842, 34.52631579, 37.10526316,
       39.68421053, 42.26315789, 44.84210526, 47.42105263, 50.        ])
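
The spacing in the output above follows from the arguments: with both endpoints included, the step is (stop - start) / (num - 1). The sketch below confirms this using the retstep option of np.linspace; expected_step is a name chosen only for this example.

```python
import numpy as np

start, stop, num = 1, 50, 20

# retstep=True also returns the spacing between consecutive values.
points, step = np.linspace(start, stop, num, retstep=True)

expected_step = (stop - start) / (num - 1)   # 49 / 19

print(step, expected_step)
print(points[0], points[-1], len(points))
```

np.arange takes a step size and excludes the stop value, whereas np.linspace takes the number of points and includes the stop value by default.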

In [86]:

np.ones((3,5),dtype = 'int')

Out[86]:

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [87]:

np.ones((3,5),dtype = 'float')

Out[87]:

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [90]:

np.zeros((5,3),dtype = 'int')

Out[90]:

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])


In [91]:

arr_3d

Out[91]:

array([[[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]])

In [93]:

arr_3d.repeat(repeats = 10, axis=0) #Repeats along axis 0: shape (1, 3, 3) becomes (10, 3, 3)

Out[93]:

array([[[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]])
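
On smaller inputs it is easier to see what repeat does: it repeats each element (or each slice along the given axis) in place. np.tile, shown for contrast, repeats the array as a whole. The arrays row and grid are made up for this example.

```python
import numpy as np

row = np.array([1, 2, 3])
repeated = np.repeat(row, 2)   # each element twice
tiled = np.tile(row, 2)        # whole array twice

grid = np.array([[1, 2], [3, 4]])
rows_repeated = np.repeat(grid, 2, axis=0)   # each row twice

print(repeated)
print(tiled)
print(rows_repeated)
```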


In [97]:

#To generate random numbers


random_numbers = np.random.rand(10,10) # 0 to 1
print(random_numbers)

[[0.7522378 0.59840277 0.98032876 0.8909582 0.53373616 0.88835145

0.24153822 0.27212979 0.1555258 0.66672105]

[0.03359001 0.98019392 0.75856936 0.55605301 0.03174803 0.84657788

0.71622908 0.91338059 0.37556562 0.63562302]

[0.10884397 0.44691789 0.42777777 0.81528865 0.49332769 0.47537318

0.16523043 0.38317708 0.89125815 0.12553124]

[0.75474447 0.78261561 0.64383317 0.508903 0.86589117 0.87565516

0.63702356 0.86827629 0.31093215 0.92112643]

[0.28122148 0.11475459 0.2543637 0.67415472 0.40711809 0.07182503

0.10851266 0.95715354 0.47222885 0.08351885]

[0.92799134 0.14695707 0.14208547 0.71562343 0.55254851 0.27853705

0.54003526 0.91133382 0.36815828 0.85215297]

[0.93988949 0.341174 0.01166787 0.61474266 0.39748557 0.10211612

0.82334904 0.40665148 0.28809701 0.24895734]

[0.01654924 0.34930347 0.66160658 0.63317519 0.75035205 0.32912402

0.2542498 0.70585709 0.44998947 0.34655589]

[0.30147321 0.73018294 0.84467288 0.51520822 0.54461626 0.86300238

0.13285876 0.24993216 0.38268974 0.75638246]

[0.87338358 0.4557205 0.79204827 0.46789719 0.29564859 0.1751014

0.70685805 0.74206353 0.06701718 0.82941239]]


In [115]:

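import matplotlib.pyplot as plt #Must be imported before plt is used

plt.hist(random_numbers) #Each column of the 2D array is treated as a separate dataset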

Out[115]:

(array([[2., 1., 2., 0., 0., 0., 0., 2., 1., 2.],

[0., 2., 0., 2., 2., 0., 1., 2., 0., 1.],

[1., 1., 1., 0., 1., 0., 2., 1., 2., 1.],

[0., 0., 0., 0., 1., 3., 3., 1., 1., 1.],

[1., 0., 1., 1., 2., 3., 0., 1., 1., 0.],

[2., 1., 1., 1., 1., 0., 0., 0., 3., 1.],

[1., 2., 2., 0., 0., 1., 1., 2., 1., 0.],

[0., 0., 2., 1., 1., 0., 0., 2., 1., 3.],

[1., 1., 1., 4., 2., 0., 0., 0., 0., 1.],

[1., 1., 1., 1., 0., 0., 2., 1., 2., 1.]]),

array([0.01166787, 0.10853396, 0.20540005, 0.30226614, 0.39913223,

0.49599832, 0.59286441, 0.68973049, 0.78659658, 0.88346267,

0.98032876]),

<a list of 10 BarContainer objects>)

In [105]:

a = np.random.randint(low = 10, high=100, size=(10,5), dtype=int)


a

Out[105]:

array([[32, 76, 55, 98, 65],

[79, 73, 15, 21, 61],

[51, 97, 49, 97, 25],

[23, 83, 89, 35, 53],

[76, 85, 51, 88, 63],

[58, 41, 21, 13, 59],

[25, 31, 19, 35, 84],

[82, 89, 24, 15, 60],

[92, 45, 83, 59, 20],

[27, 27, 21, 90, 44]])


In [106]:

ages = [12,35,67,89,55,78,55,76,89,100]
ages

Out[106]:

[12, 35, 67, 89, 55, 78, 55, 76, 89, 100]

In [109]:

np.random.choice(a = ages,size=3)

Out[109]:

array([55, 76, 35])
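
The random outputs above change on every run. As a sketch of a reproducible alternative, the same three kinds of draws can be made with NumPy's Generator interface, np.random.default_rng, using a fixed seed. This is a different API from the np.random.rand / randint / choice calls used in the notebook, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed: same numbers on every run

uniform = rng.random((10, 10))                             # floats in [0, 1)
integers = rng.integers(low=10, high=100, size=(10, 5))    # high is exclusive

ages = [12, 35, 67, 89, 55, 78, 55, 76, 89, 100]
picked = rng.choice(ages, size=3)                          # 3 draws, with replacement

print(uniform.shape, integers.shape)
print(picked)
```

As with np.random.randint, the upper bound is exclusive, so the integers fall between 10 and 99.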

In [112]:

norm_distribution_random_numbers = np.random.randn(10,10) #Return samples from the "standard normal" distribution


norm_distribution_random_numbers

Out[112]:

array([[-0.14998811, -0.82616376, 1.23162413, 1.50599222, 0.60775798,

0.82031135, 0.16314201, -0.27971942, -0.31255425, -1.41858954],

[-0.61427653, 0.53437206, 0.94536002, -0.34814053, 0.92670669,

-1.20521558, -0.84808193, 0.79223646, 2.45851022, 1.82426662],

[ 0.5661328 , -1.15224168, -0.84290388, -0.16048055, 0.61652193,

1.1043627 , -0.88178525, -1.05846469, -0.45731413, 0.20114114],

[-1.94570301, 0.49578246, -1.03705626, 0.35186015, 1.41369587,

0.85136387, 0.3640365 , 0.51675965, 0.72282229, 1.9518135 ],

[ 0.06346569, 1.12512869, 0.22062349, -1.11470712, -1.10188094,

1.86510639, 0.66377541, 1.01920725, -0.64348622, 1.09742148],

[ 0.18142665, -0.01470521, -0.33146925, 1.71768155, -1.15201099,

0.9561929 , -0.65252581, -2.86042729, -1.58878786, -0.82784187],

[-0.49608431, -1.41429201, 0.24803719, -0.07503125, 0.60806954,

-1.15681029, 0.20593078, 2.04048407, -0.38445193, -0.4233213 ],

[ 0.67697507, 0.55686488, -0.78769268, 1.23991432, 0.97276586,

0.83458431, 0.83824446, -0.38067527, -0.76783127, -0.34740588],

[-2.34481876, 0.63495551, 1.23336232, -0.81977836, -0.75801358,

-0.79793036, 1.00629451, 0.19687593, -0.48135247, -1.03210872],

[ 0.4654954 , 0.36415721, 1.51658244, 0.12111279, 0.29816271,

-0.15215329, -0.19026641, -0.47142746, 0.76256217, -0.0871029 ]])

In [113]:

import matplotlib.pyplot as plt


In [114]:

plt.hist(norm_distribution_random_numbers)

Out[114]:

(array([[1., 1., 0., 0., 2., 3., 3., 0., 0., 0.],

[0., 0., 1., 2., 0., 1., 5., 1., 0., 0.],

[0., 0., 0., 3., 1., 2., 0., 3., 1., 0.],

[0., 0., 0., 2., 1., 3., 1., 1., 2., 0.],

[0., 0., 0., 3., 0., 1., 3., 2., 1., 0.],

[0., 0., 0., 3., 0., 1., 3., 2., 1., 0.],

[0., 0., 0., 2., 1., 3., 3., 1., 0., 0.],

[1., 0., 0., 1., 3., 1., 2., 1., 0., 1.],

[0., 0., 1., 1., 5., 0., 2., 0., 0., 1.],

[0., 0., 1., 2., 2., 2., 0., 1., 1., 1.]]),

array([-2.86042729, -2.32853354, -1.79663979, -1.26474604, -0.73285229,

-0.20095854, 0.33093521, 0.86282896, 1.39472271, 1.92661647,

2.45851022]),

<a list of 10 BarContainer objects>)

OBSERVATION
NORMAL DISTRIBUTION:

It follows the empirical rule:

About 68% of the data points fall between -1 SD and +1 SD.
About 95% of the data points fall between -2 SD and +2 SD.
About 99.7% of the data points fall between -3 SD and +3 SD.
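
The empirical rule can be checked by simulation. The sketch below draws one million standard normal samples (mean 0, SD 1) and measures the share within 1, 2 and 3 SD; the sample size and seed are arbitrary choices for this example. The shares should come out near 0.68, 0.95 and 0.997.

```python
import numpy as np

rng = np.random.default_rng(seed=0)       # arbitrary seed for repeatability
samples = rng.standard_normal(1_000_000)  # mean 0, SD 1

shares = {}
for k in (1, 2, 3):
    # Fraction of samples whose distance from the mean is at most k SD.
    shares[k] = np.mean(np.abs(samples) <= k)
    print(f'within {k} SD: {shares[k]:.4f}')
```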


Explore where function
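
As a starting point, here is a minimal sketch of the two common forms of np.where, using the ages values from earlier. The threshold of 60 and the labels 'senior' and 'other' are made up for this example.

```python
import numpy as np

ages = np.array([12, 35, 67, 89, 55, 78, 55, 76, 89, 100])

# One-argument form: returns a tuple of index arrays where the condition is True.
senior_idx = np.where(ages >= 60)[0]

# Three-argument form: picks, element by element, from two alternatives.
labels = np.where(ages >= 60, 'senior', 'other')

print(senior_idx)
print(ages[senior_idx])
print(labels)
```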
