Introduction to Business Analytics

Copyright © LEARNXT
Data Wrangling and Manipulation in Python

NumPy Package and Arrays


Objectives
After completing this session, you will be able to:
 Demonstrate an understanding of the basics of the NumPy package

 Explain the fundamentals of NumPy arrays with examples

 Apply built-in functions and perform arithmetic operations on NumPy arrays

 Explain the process of saving and loading arrays with NumPy

NumPy
Scientific Python
 Extra features required:

 Fast, multidimensional arrays

 Libraries of reliable, tested scientific functions

 Plotting tools

 NumPy is at the core of nearly every scientific Python application or module

 It provides a fast N-d array datatype that can be manipulated in a vectorized form

NumPy Package
 The fundamental library needed for scientific computing with Python is called NumPy

This Open-Source library contains:

 A powerful N-dimensional array object

 Advanced array slicing methods (to select array elements)

 Convenient array reshaping methods

NumPy Package
 NumPy also contains three libraries of numerical routines:

 Basic linear algebra functions

 Basic Fourier transforms

 Sophisticated random number capabilities
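A minimal sketch of the three groups of routines listed above, assuming the submodules np.linalg, np.fft and np.random. The matrix, the signal and the seed are made-up illustration values, not taken from the slides.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform: a constant signal has all its energy in the zero-frequency bin
signal = np.ones(8)
spectrum = np.fft.fft(signal)

# Random numbers: a seeded generator gives reproducible draws
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(x)
print(spectrum.real)
print(samples.shape)
```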

Install NumPy
 Ensure that the NumPy package is installed on your laptop/computer

 To install the package, run one of the following commands in the Anaconda command prompt or a terminal (in a Jupyter notebook cell, prefix the command with %):

conda install numpy

pip install numpy

Import NumPy
 Import the NumPy package into the Python session

import numpy as np

NumPy Arrays
NumPy Arrays
 Lists are useful for storing small amounts of one-dimensional data
>>> a = [1,3,5,7,9]
>>> print(a[2:4])
[5, 7]
>>> b = [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
>>> print(b[0])
[1, 3, 5, 7, 9]
>>> print(b[1][2:4])
[6, 8]

>>> a = [1,3,5,7,9]
>>> b = [3,5,6,7,9]
>>> c = a + b
>>> print(c)
[1, 3, 5, 7, 9, 3, 5, 6, 7, 9]

 But lists do not support element-wise arithmetic: + concatenates two lists, * repeats a list, and - and / raise a TypeError

 Need efficient arrays with arithmetic and better multidimensional tools

NumPy Arrays:

 Like lists, but much more capable; the trade-off is that an array has a fixed size once it is created
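The contrast between lists and arrays can be sketched as follows, reusing the list values from the example above. The variable names are illustrative only.

```python
import numpy as np

a_list = [1, 3, 5, 7, 9]
b_list = [3, 5, 6, 7, 9]

# With lists, + joins the two lists end to end
joined = a_list + b_list

# With arrays, + adds matching elements
a_arr = np.array(a_list)
b_arr = np.array(b_list)
summed = a_arr + b_arr

print(joined)   # [1, 3, 5, 7, 9, 3, 5, 6, 7, 9]
print(summed)   # [ 4  8 11 14 18]
```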


Similarities Between Lists and Arrays
 Both are used for storing data

 Both are mutable

 Both can be indexed and iterated through

 Both can be sliced

Differences Between Lists and Arrays
 Arrays are specially optimized for arithmetic computations, so if you are going to perform such operations, consider using an array instead of a list

 For example, every element of an array can be divided by 2 without writing a loop

 Lists are containers for elements having differing data types, but arrays are used as containers
for elements of the same data type

 NumPy arrays are faster and more compact than Python lists: an array consumes less memory and is convenient to use

 NumPy also provides a mechanism for specifying the data type of the elements, which allows the code to be optimized even further
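A short sketch of the differences listed above: loop-free division, a single data type per array, and the effect of choosing a smaller data type. The values and the choice of np.int8 are assumptions made for illustration.

```python
import numpy as np

values = [2, 4, 6, 8, 10]

# List: element-wise division needs an explicit loop or comprehension
halved_list = [v / 2 for v in values]

# Array: the same operation is a single vectorized expression
arr = np.array(values)
halved_arr = arr / 2

# A list may mix types; an array converts every element to one dtype
mixed = np.array([1, 2.5, 3])      # all elements become floats

# Specifying a smaller dtype reduces the memory used per element
small = np.array(values, dtype=np.int8)

print(halved_arr)       # [1. 2. 3. 4. 5.]
print(mixed.dtype)      # float64
print(small.itemsize)   # 1 (byte per element)
```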

Arrays from Data
Demographic data:

Country Name          Birth rate  Internet users  Income Group
Aruba                 10.244      78.9            High income
Afghanistan           35.253      5.9             Low income
Angola                45.985      19.1            Upper middle income
Albania               12.877      57.2            Upper middle income
United Arab Emirates  11.044      88              High income

 Extract birth rate as a pandas Series

 Extract birth rate as a NumPy array

 Convert to a data frame

 Convert the data frame to a NumPy array

NumPy Array
 NumPy arrays are one of the most widely used data structures

 An array is a central data structure of the NumPy library

 An array is a grid of values and it contains information about the raw data, how to locate an
element, and how to interpret an element

 It has a grid of elements that can be indexed in various ways

 The elements are all of the same type, referred to as the array dtype

NumPy Array
 NumPy arrays are of two types:

 Vectors (1-dimensional arrays): a vector is an array with a single dimension; there is no difference between row and column vectors

 Matrices (2-dimensional arrays): a matrix refers to an array with two dimensions; a matrix can still possess a single row or a single column

 For 3-D or higher dimensional arrays, the term tensor is also commonly used
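The distinction between vectors, matrices and tensors can be checked through the ndim and shape attributes. This is a minimal sketch with made-up values.

```python
import numpy as np

vector = np.array([1, 2, 3])                 # 1-D: shape (3,)
row_matrix = np.array([[1, 2, 3]])           # 2-D with a single row: shape (1, 3)
col_matrix = np.array([[1], [2], [3]])       # 2-D with a single column: shape (3, 1)
tensor = np.zeros((2, 3, 4))                 # 3-D: shape (2, 3, 4)

for name, arr in [("vector", vector), ("row_matrix", row_matrix),
                  ("col_matrix", col_matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)

# Transposing a 1-D array changes nothing, which is why a vector
# has no row/column distinction
print(vector.T.shape)   # (3,)
```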

NumPy Array - Attributes
 An array is usually a fixed-size container of items of the same type and size

 The number of dimensions and items in an array is defined by its shape

 The shape of an array is a tuple of non-negative integers that specify the sizes of each
dimension
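The attributes described above can be inspected directly on an array. A small sketch, assuming a 2 by 3 example array that is not from the slides:

```python
import numpy as np

grid = np.array([[10, 11, 12],
                 [20, 21, 22]])

print(grid.ndim)    # 2  (number of dimensions)
print(grid.shape)   # (2, 3)  (rows, columns)
print(grid.size)    # 6  (total number of items)
print(grid.dtype)   # an integer dtype; the exact width depends on the platform
print(grid[1, 2])   # 22  (row 1, column 2)
```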

NumPy – Creating Arrays
There are several ways to initialize new NumPy arrays, for example from

 A Python list, list of lists, or tuples

 Using functions that are dedicated to generating NumPy arrays, such as arange(), linspace(), etc.

 Reading data from files
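As a sketch of the tuple and file routes listed above: np.loadtxt() is one of several NumPy readers, and io.StringIO is used here only as a stand-in for a real CSV file on disk. The data values are invented for illustration.

```python
import io
import numpy as np

# From a tuple
from_tuple = np.array((1, 2, 3))

# From a file: io.StringIO stands in for a real CSV file on disk
csv_text = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")
from_file = np.loadtxt(csv_text, delimiter=",")

print(from_tuple)
print(from_file.shape)   # (2, 3)
```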

Creating NumPy Arrays – Examples
simple_list = [101,102,103,104,105,106,107,108,109,110]

simple_list

[101, 102, 103, 104, 105, 106, 107, 108, 109, 110]

# NumPy array from list

array1 = np.array(simple_list)

array1

array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

# Type of the array

type(array1)

numpy.ndarray
Creating NumPy Arrays – Examples
list_of_lists = [[10,11,12],[20,21,22],[30,31,32]]

list_of_lists

[[10, 11, 12], [20, 21, 22], [30, 31, 32]]

# create an array

array2 = np.array(list_of_lists)

array2

array([[10, 11, 12],

[20, 21, 22],

[30, 31, 32]])


Creating NumPy Arrays – Built-in Functions
arange(): Returns evenly spaced values within a given interval

# Array using built-in function arange()

np.arange(0,20)

# Returns values 0 to 19. Start value is 0 (included). Stop value is 20 (not included)

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19])

# Array with arange() and including step argument

np.arange(0,21,4)

array([ 0, 4, 8, 12, 16, 20])

Generate Arrays of 0's
# Generate Array of 0's

array3 = np.zeros(50)

array3

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Generate Arrays of 1's
# Generate Array of 1's

array4 = np.ones((4,5))

array4

array([[1., 1., 1., 1., 1.],

[1., 1., 1., 1., 1.],

[1., 1., 1., 1., 1.],

[1., 1., 1., 1., 1.]])

Useful when we need a placeholder array of a known size

Example: we create the placeholder array first and then progressively overwrite its values with results from a loop
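The placeholder pattern described above can be sketched like this. The array is created at its final size with np.zeros() and filled in place, since NumPy arrays do not grow the way lists do. np.full() is shown as an assumed alternative for other constant values.

```python
import numpy as np

n = 5
# Pre-allocate a placeholder array of the final size
squares = np.zeros(n)

# Fill it in place inside the loop
for i in range(n):
    squares[i] = i ** 2

# np.full() creates a placeholder filled with any constant value
sevens = np.full((2, 3), 7)

print(squares)   # [ 0.  1.  4.  9. 16.]
print(sevens.shape)
```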
Arrays using linspace()
Evenly spaced values over the specified interval - create numeric sequences

# linspace() - create numeric sequence

array5 = np.linspace(0,20,10)

array5

array([ 0. , 2.22222222, 4.44444444, 6.66666667, 8.88888889,
11.11111111, 13.33333333, 15.55555556, 17.77777778, 20. ])
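A brief sketch contrasting linspace() with arange(). linspace() takes the number of points and includes both endpoints by default, while arange() takes a step size and excludes the stop value. The retstep argument is an optional extra not used in the slides.

```python
import numpy as np

# 5 points from 0 to 1, both endpoints included
points = np.linspace(0, 1, 5)

# retstep=True also returns the spacing between neighbouring points
sequence, step = np.linspace(0, 20, 10, retstep=True)

# arange() takes a step size and excludes the stop value
by_step = np.arange(0, 1, 0.25)

print(points)    # [0.   0.25 0.5  0.75 1.  ]
print(step)      # 20 / 9, the spacing seen in array5 above
print(by_step)   # [0.   0.25 0.5  0.75]
```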

Arrays using eye()
# Create an Identity Matrix with eye()

array6 = np.eye(5)

array6

array([[1., 0., 0., 0., 0.],

[0., 1., 0., 0., 0.],

[0., 0., 1., 0., 0.],

[0., 0., 0., 1., 0.],

[0., 0., 0., 0., 1.]])

Random Numbered Arrays
Create random number arrays using rand(), randn(), randint()

Uniform distribution:

# Array - uniform distribution with rand()

# Every run generates a new set of numbers

array7 = np.random.rand(3,2)

array7

array([[0.48341811, 0.94935455],

[0.86604955, 0.29532457],

[0.79461142, 0.28140248]])
Random Numbered Arrays
Normal distribution:

# Array - Normal distribution with randn()

array8 = np.random.randn(3,2)

array8

array([[-0.05195311, 0.14081327],

[ 0.57633652, -0.42966707],

[ 1.03544668, -0.81755038]])

Random Numbered Arrays
Integers:

# Array - Integers with randint()

array9 = np.random.randint(5,20,10)

array9

array([15, 16, 14, 15, 12, 17, 14, 11, 18, 12])
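Because every run produces new numbers, results are hard to reproduce. One possible approach, sketched here with NumPy's Generator interface (np.random.default_rng) rather than the rand(), randn() and randint() functions used in the slides, is to fix a seed. The seed value 42 is arbitrary.

```python
import numpy as np

# Two generators built from the same seed produce identical draws
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

uniform_a = rng_a.random((3, 2))            # uniform on [0, 1), like rand()
uniform_b = rng_b.random((3, 2))

normal = rng_a.standard_normal((3, 2))      # standard normal, like randn()
integers = rng_a.integers(5, 20, size=10)   # 5 included, 20 excluded, like randint()

print(np.array_equal(uniform_a, uniform_b))   # True
print(integers.min() >= 5, integers.max() < 20)
```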

Functions & Arithmetic Operations on Arrays
Functions on Arrays
Create an array and reshape into a 5 by 6 matrix

# Create an Array of 30 elements with arange()

sample_array = np.arange(30)

sample_array

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29])

Functions on Arrays
# Reshape the array into a 5 x 6 matrix using reshape()

matrix2 = sample_array.reshape(5,6)

matrix2
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
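Two details of reshape() that the example above does not show, sketched under the same 30-element array: passing -1 lets NumPy infer one dimension, and a shape whose element count does not match raises a ValueError.

```python
import numpy as np

sample_array = np.arange(30)

# -1 asks NumPy to infer that dimension from the array size
matrix = sample_array.reshape(5, -1)     # shape (5, 6)
flat = matrix.reshape(-1)                # back to shape (30,)

# The element count must match, otherwise reshape() raises ValueError
failed = False
try:
    sample_array.reshape(4, 8)           # 4 * 8 = 32, not 30
except ValueError as err:
    failed = True
    print("reshape failed:", err)

print(matrix.shape, flat.shape)
```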

Functions on Arrays
Get the min and max values in an array

# Create an array of integers using randint()

array9 = np.random.randint(5,20,10)

array9

array([ 9, 7, 11, 12, 9, 14, 18, 9, 6, 11])

# get the minimum number in the array

array9.min()

Functions on Arrays
Get the position of the minimum value and the shape of an array

# Get the position of the minimum value in the array

array9.argmin()

# Get the shape of the array

array9.shape

(10,)
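A compact sketch of the value and position functions together, using the ten integers shown in the example output above as fixed values. The axis argument on a 2-D array is an addition not covered in the slides.

```python
import numpy as np

values = np.array([9, 7, 11, 12, 9, 14, 18, 9, 6, 11])

print(values.min(), values.argmin())   # 6 8
print(values.max(), values.argmax())   # 18 6

# On a 2-D array, axis selects the direction of the reduction
table = values.reshape(2, 5)
print(table.min(axis=0))   # column minimums: [9 7 9 6 9]
print(table.max(axis=1))   # row maximums: [12 18]
```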

Universal Array Functions
# Create an array and find the variance

sample_array = np.arange(30)

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29])

# Variance

np.var(sample_array)

74.91666666666667

Universal Array Functions
# Square root

Arr = np.sqrt(sample_array)

Arr
array([0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ,
3.16227766, 3.31662479, 3.46410162, 3.60555128, 3.74165739,
3.87298335, 4. , 4.12310563, 4.24264069, 4.35889894,
4.47213595, 4.58257569, 4.69041576, 4.79583152, 4.89897949,
5. , 5.09901951, 5.19615242, 5.29150262, 5.38516481])

Universal Array Functions
# log

np.log(sample_array)

array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436,

1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458,

2.30258509, 2.39789527, 2.48490665, 2.56494936, 2.63905733,

2.7080502 , 2.77258872, 2.83321334, 2.89037176, 2.94443898,

2.99573227, 3.04452244, 3.09104245, 3.13549422, 3.17805383,

3.21887582, 3.25809654, 3.29583687, 3.33220451, 3.36729583])

# Maximum value in the array

np.max(sample_array)

29
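The -inf in the log output comes from log(0), for which NumPy also emits a RuntimeWarning. A hedged sketch of two ways to deal with it: silencing the warning locally with np.errstate, or using np.log1p, which computes log(1 + x). Neither is shown in the slides, and the 5-element array is a shortened stand-in for sample_array.

```python
import numpy as np

sample_array = np.arange(5)

# Suppress the divide-by-zero warning only inside this block
with np.errstate(divide="ignore"):
    logs = np.log(sample_array)

print(logs[0])                 # -inf
print(np.isinf(logs).sum())    # 1

# log1p computes log(1 + x), which is defined at x = 0
shifted = np.log1p(sample_array)
print(shifted[0])              # 0.0
```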
Universal Array Functions
Round the array values to 2 decimal places
# Round to 2 decimal places (rounds to the nearest value, not up)
np.round(Arr, decimals = 2)
array([0. , 1. , 1.41, 1.73, 2. , 2.24, 2.45, 2.65, 2.83, 3. , 3.16,
3.32, 3.46, 3.61, 3.74, 3.87, 4. , 4.12, 4.24, 4.36, 4.47, 4.58,
4.69, 4.8 , 4.9 , 5. , 5.1 , 5.2 , 5.29, 5.39])

# Standard deviation
np.std(Arr)
1.3683899139885065
# Mean
np.mean(Arr)
3.553520654688042
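Note that np.std() and np.var() divide by N by default (the population formulas). A small sketch of the ddof argument, which switches to the sample formula with N - 1. The data values are chosen for illustration so that the results are round numbers.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(data)            # divides by N (the default, ddof=0)
sample_std = np.std(data, ddof=1)        # divides by N - 1

print(np.mean(data))       # 5.0
print(population_std)      # 2.0
print(sample_std)          # slightly larger than the population value
```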

Universal Array Functions - Strings
# Create an array of string values

sports = np.array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'])

# Fetch unique values from the string-based array

np.unique(sports)

array(['Cric', 'cric', 'fball', 'fooseball', 'golf'], dtype='<U9')
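np.unique() is case-sensitive, which is why 'Cric' and 'cric' appear separately above. A sketch of two follow-ups: counting occurrences with return_counts, and lower-casing the values first. The lower-casing step uses a plain list comprehension, which is one of several possible approaches.

```python
import numpy as np

sports = np.array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'])

# return_counts=True also reports how often each unique value occurs
values, counts = np.unique(sports, return_counts=True)
frequency = dict(zip(values.tolist(), counts.tolist()))
print(frequency)

# Lower-casing first merges 'Cric' and 'cric'
lowered = np.array([s.lower() for s in sports])
merged = np.unique(lowered)
print(merged)
```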

Arithmetic Operations
# View the sample array

sample_array

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29])

# Addition of arrays

sample_array + sample_array

array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,34, 36, 38, 40, 42, 44, 46, 48,
50, 52, 54, 56, 58])

Arithmetic Operations
# Division of arrays (the first element is 0/0, which gives nan and a RuntimeWarning)

sample_array / sample_array

array([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

# Addition of a fixed value to the array

sample_array + 1

array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30])
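Adding a fixed value works because NumPy broadcasts the scalar to every element. The sketch below shows the same with multiplication, and one possible way to avoid the nan from 0/0 by using np.divide with its out and where arguments. The 5-element array and the fallback value 0 are assumptions for illustration.

```python
import numpy as np

sample_array = np.arange(5)

# A scalar is broadcast to every element
doubled = sample_array * 2
print(doubled)             # [0 2 4 6 8]

# np.divide with `where` skips the positions where the divisor is 0
# and keeps the value already stored in `out` there
safe = np.divide(sample_array, sample_array,
                 out=np.zeros(5), where=sample_array != 0)
print(safe)                # [0. 1. 1. 1. 1.]
```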

Saving and Loading Arrays with NumPy
Saving Arrays with NumPy
 save() function - saves an array in the working directory as a *.npy file

np.save('S2_sample_array', sample_array)

 Create a new array called simple_array

simple_array = np.array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'])

 savez() function - saves multiple arrays in a single zip archive (*.npz file)

np.savez('S2_arrays.npz', a=sample_array, b=simple_array)

Loading Arrays with NumPy
# Load the saved file S2_sample_array

np.load('S2_sample_array.npy')

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29])

# Load the saved zip file with multiple arrays

archive = np.load('S2_arrays.npz')

# Load the second array (saved under the key 'b') from the zip file

archive['b']

array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'], dtype='<U9')
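A self-contained round trip of the save and load steps, sketched with a temporary directory so that no files are left in the working directory. The use of tempfile and the shortened string array are assumptions made for this example. The file names follow the ones used above.

```python
import os
import tempfile
import numpy as np

sample_array = np.arange(30)
simple_array = np.array(['golf', 'cric'])

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "S2_sample_array.npy")
    npz_path = os.path.join(tmp, "S2_arrays.npz")

    np.save(npy_path, sample_array)
    np.savez(npz_path, a=sample_array, b=simple_array)

    restored = np.load(npy_path)
    # np.load on an .npz file returns an archive object; `with` closes it
    with np.load(npz_path) as archive:
        names = list(archive.files)
        restored_b = archive["b"]

print(np.array_equal(restored, sample_array))   # True
print(names)                                    # ['a', 'b']
print(restored_b)
```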

Summary
 NumPy provides a fast N-d array datatype that can be manipulated in a vectorized form

 This Open-Source library contains: A powerful N-dimensional array object, Advanced array
slicing methods and convenient array reshaping methods

 To import the NumPy package into the Python session, use import numpy as np

 An array is a grid of values and it contains information about the raw data, how to locate an
element, and how to interpret an element

 Two types of NumPy arrays: vectors (1-dimensional) and matrices (2-dimensional)

 Built-in functions can be applied on NumPy arrays for faster processing

Any Questions?

Thank you