8/22/22, 9:21 PM Python_Day4_MC - Jupyter Notebook

In [ ]: import numpy as np

NumPy
Introduction to NumPy
Creating Arrays
Nested Lists
Array-Generating Functions
Empty Arrays
Ranges
Random Data
Matrix Creation
Data Types
Shapes
Exercises
Exercise 1
Exercise 2
Exercise 3
Exercise 4
Exercise 5
Manipulating arrays
Indexing
Slicing
Boolean Mask
Assigning Values to Subarrays
Exercises
Exercise 1
Exercise 2
Exercise 3
Exercise 4
Exercise 5
Array Operations
Logical Operations
Arithmetic
Aggregative Functions
Vectorization
Broadcasting
Rule 1
Rule 2
Rule 3

localhost:8890/notebooks/2022/22Aug/PRJ63504 Capstone (Python)/CADS/Python for Analytics (Advanced)/Day 4/Python_Day4_MC.ipynb 1/27



Further Reading
Exercises
Exercise 1
Exercise 2
Exercise 3
Exercise 4
Exercise 5
Advanced Manipulation
Reshaping and Transposing
Adding a new dimension with newaxis
Concatenation and Splitting
Exercises
Exercise 1
Exercise 2
Exercise 3
Exercise 4
Additional Resources:

Introduction to NumPy
Datasets can include collections of documents, images, sound clips, numerical measurements, or
really anything else. Despite this heterogeneity, it helps to think of all data fundamentally as
arrays of numbers.

Data type    Arrays of numbers?
Images       Pixel brightness across different channels
Videos       Pixel brightness across different channels for each frame
Sound        Intensity over time
Numbers      No need for transformation
Tables       Mapping from strings to numbers

Therefore, the efficient storage and manipulation of large arrays of numbers is fundamental to the
process of doing data science. NumPy is a library specially designed to handle arrays of numerical
data.

NumPy (http://www.numpy.org/) is short for Numerical Python. It provides functions that are
especially useful when working with large arrays and matrices of numeric data, such as matrix
multiplication.

The array object class is the foundation of NumPy, and NumPy arrays are much like nested lists in
base Python. However, NumPy supports vectorization: many operations are implemented in
compiled C code rather than interpreted Python, which makes them much faster, as we will see.

Creating Arrays

Nested Lists

Arrays can be created from nested lists. The nesting determines the dimensions of the resulting
array.

In [ ]: # Create array from lists:
lis = [[1,2,3,4,5],[6,7,8,9,10]]
ary = np.array(lis)
print(ary)

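Note that dimensions must be consistent. If the nested lists do not have the same lengths, NumPy
cannot build a regular 2-D array. Older versions silently created a 1-D array in which the elements
are the sublists; NumPy 1.24 and later raise a ValueError unless dtype=object is requested
explicitly.

In [ ]: print(np.array([[1,2,3,4,5],[6,7,8,9]], dtype=object))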

The most important attributes of an array are its shape and the number of dimensions.

In [ ]: ary.shape

In [ ]: ary.ndim

Less important but worth mentioning is the dtype of an array indicating what kind of data it
contains.

In [ ]: ary.dtype
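To make the link between nesting depth and these attributes concrete, here is a minimal sketch with a small 3-D array. The name `cube` is only illustrative, and the exact integer dtype printed depends on the platform.

```python
import numpy as np

# Two blocks, each with three rows of four values: nesting depth 3
cube = np.array([
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]],
])

print(cube.shape)  # length of each nesting level, outermost first
print(cube.ndim)   # number of nesting levels
print(cube.size)   # total number of elements
print(cube.dtype)  # default integer type (platform dependent)
```

The shape tuple always has `ndim` entries, and the product of its entries equals `size`.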

Array-Generating Functions
For larger arrays it is impractical to initialize the data manually. Instead, we can use one of the
many functions in NumPy that generate arrays of different forms. Some of the more common are:

Empty Arrays

When the intended shape of an array is known in advance but its values are not, we can use
various functions to generate arrays filled with a placeholder value.

In [ ]: np.zeros((2, 3))

In [ ]: np.ones((3, 4), dtype=np.int8)

In [ ]: np.full((3, 5), 3.14)

A special case is the function np.empty , which does not initialize any values. It will reserve
memory for the array but use whatever values are already stored there without resetting them. This
can be a useful optimization for speed when creating extremely large arrays.

In [ ]: print(np.empty((2, 3)))
print(np.empty((7, 10)))
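Because the contents of an array created with np.empty are arbitrary, it is only safe when every entry is overwritten before it is read. The following sketch shows that allocate-then-fill pattern; the names `table`, `n_rows`, and `n_cols` are made up for the illustration.

```python
import numpy as np

n_rows, n_cols = 4, 3

# Reserve memory without initializing it; contents are arbitrary at this point
table = np.empty((n_rows, n_cols))

# Overwrite every row before reading anything back
for i in range(n_rows):
    table[i, :] = i * 10

print(table)
```

If some entries might never be written, np.zeros or np.full is the safer choice.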

Ranges

NumPy also has a number of functions to support creating number ranges, such as:

In [ ]: # Define endpoints and step size
np.arange(start=0, stop=10, step=1)

In [ ]: np.arange(start=6, stop=15, step=2)

In [ ]: # 'step' defaults to 1 and 'start' defaults to 0
np.arange(8)

In [ ]: # Define endpoints and the number of elements
np.linspace(start=1, stop=10, num=15)

In [ ]: # linspace includes the endpoint by default (non-standard Python behavior!);
# pass endpoint=False to exclude it
np.linspace(start=1, stop=10, num=15, endpoint=False)
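The difference between the two functions is easiest to see side by side. This is a small sketch using values that are exactly representable as floats; the `retstep=True` argument of np.linspace additionally returns the spacing it computed.

```python
import numpy as np

# arange: fixed step, stop is excluded
a = np.arange(start=0, stop=1, step=0.25)

# linspace: fixed number of points, stop is included by default
b, step_b = np.linspace(start=0, stop=1, num=5, retstep=True)

# endpoint=False drops the stop value and adjusts the step accordingly
c, step_c = np.linspace(start=0, stop=1, num=4, endpoint=False, retstep=True)

print(a)
print(b, step_b)
print(c, step_c)
```

With endpoint=False the spacing is (stop - start) / num, whereas with the default it is (stop - start) / (num - 1).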

Random Data

Arrays can also be initialized with random values. NumPy supports many different probability
distributions.

In [ ]: # Uniform distribution, i.e. all values equally likely,
# between low (inclusive) and high (exclusive)
np.random.uniform(low=0, high=1, size=(3, 3))

In [ ]: # Equivalent to np.random.uniform(low=0, high=1, ...)
np.random.random(size=(5, 5))

In [ ]: # Normal (Gaussian) distribution centered around 'loc' (mean)
# with a standard deviation of 'scale'
np.random.normal(loc=5, scale=2, size=(3, 3))

Besides floating-point distributions, NumPy also lets us generate random integers between low
(inclusive) and high (exclusive).

In [ ]: np.random.randint(low=1, high=100, size=(4, 4))
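The functions above draw new values on every run. When reproducible results are needed, one option is a seeded generator. The sketch below assumes NumPy 1.17 or later, which provides np.random.default_rng; the seed value 42 and all variable names are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded generator (seed value is arbitrary)

u = rng.uniform(low=0, high=1, size=(3, 3))     # floats in [0, 1)
g = rng.normal(loc=5, scale=2, size=(3, 3))     # mean 5, standard deviation 2
k = rng.integers(low=1, high=100, size=(4, 4))  # integers in [1, 100)

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
u_again = rng_again.uniform(low=0, high=1, size=(3, 3))
print(np.array_equal(u, u_again))
```

Note that the generator method for integers is called `integers`, not `randint`.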

Matrix Creation
Because 2-D arrays (matrices) are so common, NumPy also has several generating functions for
them.

In [ ]: # Create an NxM matrix with 1 along the diagonal and 0 elsewhere
# (the identity matrix if N equals M)
np.eye(N=3, M=5)

In [ ]: # Offset the diagonal
np.eye(N=4, M=4, k=1)

In [ ]: # a diagonal matrix with custom diagonal values
np.diag([1,2,3])

In [ ]: # put the values on the offset diagonal of degree k
# NumPy automatically generates a matrix of the necessary size
np.diag([1,2,3], k=2)

In [ ]: # A matrix with 1's on the diagonal and all lower offset diagonals
# Can also be offset with argument k=...
np.tri(N=5, M=4)

Data Types

Most, if not all, of these functions allow us to determine the data type with the dtype function
argument, e.g.

In [ ]: np.zeros((2, 3), dtype=np.int16)

Some of the most common supported data types are

Data Type           Description
np.bool_            Boolean (True or False) stored as a byte
np.int8             Byte (-128 to 127)
np.int16            Integer (-32768 to 32767)
np.int32            Integer (-2147483648 to 2147483647)
np.int64            Integer (-9223372036854775808 to 9223372036854775807)
np.int_             Default integer type (normally either int64 or int32)
np.uint8            Unsigned integer (0 to 255)
np.uint16           Unsigned integer (0 to 65535)
np.uint32           Unsigned integer (0 to 4294967295)
np.uint64           Unsigned integer (0 to 18446744073709551615)
np.float16          Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
np.float32          Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
np.float64          Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
float (built-in)    Default float type (same as np.float64)

The aliases np.bool, np.int, and np.float for the Python built-ins were deprecated in NumPy 1.20
and removed in NumPy 1.24, and np.float_ was removed in NumPy 2.0. Pass the built-in bool, int,
and float or one of the explicit types above instead.
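The ranges in the table do not have to be memorized. As a short sketch, np.iinfo reports the limits of an integer type, and the itemsize and nbytes attributes show how the dtype determines memory use; the array names are illustrative only.

```python
import numpy as np

# Range limits can be looked up instead of memorized
info = np.iinfo(np.int8)
print(info.min, info.max)

# The dtype fixes how many bytes each element occupies
small = np.zeros(1000, dtype=np.int8)
large = np.zeros(1000, dtype=np.int64)
print(small.itemsize, small.nbytes)
print(large.itemsize, large.nbytes)

# astype returns a converted copy; float -> int conversion truncates toward zero
x = np.array([1.9, -1.9, 3.0])
print(x.astype(np.int32))
```

Choosing the smallest dtype that safely holds the data can reduce memory use considerably for large arrays.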


Shapes

Until now, we've always created 1-D or 2-D arrays but NumPy is in no way limited to this. Any time
a shape or size parameter is used in a function, we can create an array with as many dimensions
as we like, e.g.

In [ ]: # Three 4x5 arrays stacked into a 3-D cube
np.random.randint(low=1, high=10, size=(3, 4, 5))

In [ ]: # Two sets of three 4x5 arrays stacked into 3-D cubes.
np.random.randint(low=1, high=10, size=(2, 3, 4, 5))

Exercises

Exercise 1

Create a new 2x2 array without initializing entries.

In [ ]: ### your code here

In [ ]: # MC
np.empty((2, 2))

Exercise 2

Create a new 3x2x4 array of ones and make sure they're floating point numbers.

In [ ]: ### your code here

In [ ]: # MC
# np.float was removed in NumPy 1.24; the built-in float works in all versions
np.ones((3, 2, 4), dtype=float)

Exercise 3

Create a 1-D array of 20 evenly spaced elements between 3. (inclusive) and 10. (exclusive).

In [ ]: ### your code here

In [ ]: # MC
np.linspace(start=3, stop=10, num=20, endpoint=False)

Exercise 4

Create a matrix with the values (2, 4, 9) on the third offset diagonal and 0 everywhere else.

In [ ]: ### your code here

In [ ]: # MC
np.diag((2, 4, 9), k=3)

Exercise 5

You want to simulate a coin toss with a binomial distribution ( np.random.binomial ). If this is a
fair coin, then the probability of getting heads is p=0.5 . If you toss the coin n=100 times, how
often does your simulation toss heads?

In [ ]: ### your code here

In [ ]: # MC
np.random.binomial(n=100, p=0.5)

Manipulating arrays

Indexing
We can index elements in an array using square brackets and indices:

In [ ]: # a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
print(v)
print(v[0])

In [ ]: M = np.random.randint(low=1, high=10, size=[3,3])
print(M)
# M is a matrix, or a 2 dimensional array, taking two indices
print(M[1,1])

In [ ]: M = np.random.randint(low=1, high=10, size=[2,3,3])
print(M)
print(M[0, 2, 1])

Slicing
Just as we can use square brackets to access individual array elements, we can also use them to
access subarrays with the slice notation, marked by the colon ( : ) character. The NumPy slicing
syntax follows that of the standard Python list; to access a slice of an array x , use this:

x[start:stop:step]


Slicing follows the typical Python convention of excluding the stop-index. If any of these are
unspecified, they default to the values start=0 , stop=<size_of_dimension> , step=1 .

In [ ]: v = np.arange(10)
print(v)
print(v[3:7])
print(v[5:])
print(v[:6])
print(v[1:10:3])
print(v[::2])

The second : is unnecessary if no step is specified.

In [ ]: print(v[2:5])
print(v[2:5:1])

Like before, we can index multidimensional arrays by using slices for each dimension.

In [ ]: M = np.random.randint(low=1, high=10, size=(5, 5))
print(M)
print()
print(M[0:2, 3:5])
print()
print(M[::2, 0:2])

If we omit trailing indices of a multidimensional array, NumPy assumes that all of the remaining
dimensions should be indexed fully. For example, indexing a 2-D matrix with only one index or
slice will return all columns of the specified rows.

In [ ]: print(M[3])
print(M[3, :])
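The same rule holds for more than two dimensions. The sketch below uses a small 3-D array built with np.arange and reshape (reshaping is covered in a later section); the name `T` is arbitrary.

```python
import numpy as np

T = np.arange(24).reshape(2, 3, 4)

# One index: everything in block 1, i.e. a 3x4 array
print(T[1].shape)
# Two indices: row 2 of block 1, i.e. a length-4 vector
print(T[1, 2].shape)

# Omitted trailing dimensions are equivalent to explicit full slices
print(np.array_equal(T[1], T[1, :, :]))
print(np.array_equal(T[1, 2], T[1, 2, :]))
```

Each index that is supplied removes one dimension from the result, while each slice keeps it.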

Boolean Mask
Lastly, we can use boolean masks to select specific values. Masks must have the same shape as
the array itself. Note that NumPy will automatically convert base Python sequences into NumPy
arrays. That means that a mask can be anything that can be converted into an array, e.g. a
(nested) list.

In [ ]: v = np.linspace(start=1, stop=10, num=4, endpoint=True)
print(v)
print()
print(v[[True, False, True, True]])

Indexing a multidimensional array with a boolean mask of the same shape always returns a flat
1-D array, i.e. all shape information is lost.


In [ ]: mask = np.array([
    [False, False, False, False, False],
    [False, False, False, False, False],
    [True, False, True, False, False],
    [True, True, False, False, False],
    [False, False, False, False, False]])
print(M)
print()
print(M[mask])

We can negate boolean NumPy arrays with ~ :

In [ ]: v = np.arange(5)
mask = np.array([True, True, False, False, False])
print(v[mask])
print(v[~mask])

Assigning Values to Subarrays


We can assign new values to elements in an array using any of the indexing methods shown
above.

In [ ]: # np.int was removed in NumPy 1.24; the built-in int works in all versions
M = np.zeros((5, 5), dtype=int)
M[0,0] = 1
print(M)

In [ ]: # also works for rows and columns
M[1,:] = 2
M[:,2] = 3
print(M)

In [ ]: # simultaneous assignment of subarray
M[3:5, 2:5] = 4
print(M)

Even though boolean masks flatten outputs when used for selection, they can be used to assign
values while retaining the shape.

In [ ]: mask = np.array([
    [False, False, False, False, False],
    [False, False, False, False, False],
    [True, False, True, False, False],
    [True, True, False, False, False],
    [False, False, False, False, False]])
M[mask] = 5
print(M)

Assigned values are broadcast to the necessary shape according to the rules described in the
Broadcasting section later on. This means that for assignment with indices/slices, they are
broadcast to the subarray shape:

In [ ]: M[0:2, 3:5] = np.array([[-1, -2], [-3, -4]])
print(M)

For boolean masks, the values must be either a scalar value, i.e. a 0-D array, or a 1-D array with
one entry per True value in the mask. Note that after assignment, the original shape of the array
is retained.

In [ ]: mask = np.array([
    [True, False, True, False, True ],
    [False, True, False, True, False],
    [False, False, False, False, False],
    [False, False, False, False, False],
    [False, False, False, False, False]])
M[mask] = [10, 11, 12, 13, 14]
print(M)
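When a 1-D array is assigned through a mask, the values land on the True positions in row-major order, i.e. left to right within a row and then row by row. A minimal sketch with a deliberately small array, so that the order is easy to follow:

```python
import numpy as np

M = np.zeros((2, 3), dtype=int)
mask = np.array([[True, False, True],
                 [False, True, False]])

# Values are written to the True positions in row-major order
M[mask] = [10, 11, 12]
print(M)

# A scalar is broadcast to every selected position
M[~mask] = -1
print(M)
```

The array keeps its (2, 3) shape throughout; only the selected entries change.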

Exercises
Unless otherwise stated, the following exercises are based on the following array. Keep in mind,
with regard to the phrasing, that Python begins indexing at 0, i.e. the 'first' element is the element
with index 0.

In [ ]: np.random.seed(100)
M = np.random.randint(low=-5, high=5, size=(5, 5))
print(M)

Exercise 1

Extract the third column of the matrix M

In [ ]: ### your code here

In [ ]: # MC
print(M[:,2])

Exercise 2

Extract only the odd-indexed rows and columns, i.e. those with indices 1 and 3, of M

In [ ]: ### your code here

In [ ]: # MC
print(M[1:5:2, 1:5:2])
print(M[[1,3], :][:, [1,3]])

Exercise 3

Extract the positive values of the matrix M

In [ ]: ### your code here

In [ ]: # MC
print(M[M > 0])

Exercise 4

Replace all negative values of matrix M with 0

In [ ]: ### your code here

In [ ]: # MC
M[M < 0] = 0
print(M)

Exercise 5

We can use arrays to represent images. The function create_stick_figure() returns an array
representing a grayscale image. The function show_image(arr) displays the array arr as an
image. Use the array manipulation techniques we've learned so far to perform the following tasks.

a) Remove unnecessary (black) background pixels on any side of the stick figure.

b) Subset the trimmed image into three parts: one containing only the head, one containing the
torso and arms, and one containing only the legs.

c) Remove the arms in the original image by setting all pixels (array entries) corresponding to arms
to black.


In [ ]: import matplotlib.pyplot as plt  # needed by show_image

def create_stick_figure():
    # Set grayscale values. 1 == white and 0 == black
    head_val = 1
    body_val = 0.25
    arms_val = 0.5
    legs_val = 0.75
    # Create array
    arr = np.zeros((21, 21))
    # head
    arr[(1, 5), 9:12] = head_val
    arr[2:5, (8, 12)] = head_val
    # body
    arr[6:14, 10] = body_val
    # arms
    arr[7, 9:12] = arms_val
    arr[8, (8, 12)] = arms_val
    arr[9:12, (7, 13)] = arms_val
    # legs
    arr[14, (9, 11)] = legs_val
    arr[15:20, (8, 12)] = legs_val
    return arr

def show_image(arr):
    plt.imshow(arr, cmap='gray', vmin=0, vmax=1)
    plt.xticks(np.arange(arr.shape[1], step=2))
    plt.yticks(np.arange(arr.shape[0], step=2))

# Demo code
image = create_stick_figure()
show_image(image)

In [ ]: ### your code here


# a)

In [ ]: ### your code here


# b)

In [ ]: ### your code here


# c)

In [ ]: # MC
# a)
image_trimmed = image[1:-1, 7:14]
show_image(image_trimmed)

In [ ]: # MC
# b)
head = image_trimmed[:5, :]
torso = image_trimmed[5:13, :]
legs = image_trimmed[13:, :]


In [ ]: # MC
show_image(head)

In [ ]: # MC
show_image(torso)

In [ ]: # MC
show_image(legs)

In [ ]: # MC
# c)
image[image == 0.5] = 0
show_image(image)

Array Operations
Apart from just manipulating array contents directly, we can also perform operations on them, such
as logical, arithmetic, or aggregate operations.

Logical Operations
Logical operations on NumPy arrays evaluate a condition on every individual entry and return
boolean arrays of the same shape as the original array.

In [ ]: M = np.random.randint(low=-10, high=10, size=(5, 5))
print(M)
print()
print(M >= 0)

We can, of course, use the resulting boolean array as a selection mask

In [ ]: print(M[M >= 0])

and to assign new values

In [ ]: M[M > 0] = 20
print(M)
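Several conditions can be combined into one mask with the element-wise operators `&` (and), `|` (or), and `~` (not). The following sketch uses a fixed array so that the output is predictable; note that each comparison needs its own parentheses because these operators bind more tightly than `<` and `>`.

```python
import numpy as np

M = np.array([[-4, 7, 0], [3, -9, 5]])

in_range = (M > 0) & (M < 6)   # element-wise AND
outside = (M < -5) | (M > 5)   # element-wise OR

print(in_range)
print(M[in_range])
print(M[outside])
print(M[~in_range])            # everything not selected by in_range
```

Python's `and` and `or` keywords do not work here, since they expect a single truth value rather than an array.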

When using boolean arrays in conditions, for example if statements and other boolean
expressions, one needs to use any or all , which check whether any or all elements in the array
evaluate to True :


In [ ]: M = np.array([[ 1, 4],[ 9, 16]])
print(M)
print()
print((M > 5).any())
print()
print((M > 5).all())

Python's if statement needs a single truth value, so using a boolean array with more than one
element directly as a condition raises a ValueError.

In [ ]: # Uncomment to run and see Exception
# if M > 5:
#     print("Hello World")

In [ ]: # any
if (M > 5).any():
    print("At least one element in M is larger than 5")
else:
    print("No element in M is larger than 5")

In [ ]: # all
if (M > 5).all():
    print("All elements in M are larger than 5")
else:
    print("Not all elements in M are larger than 5")
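To see the exception without crashing the notebook, the ambiguous condition can be wrapped in try/except. This is only a sketch; the variable names `message`, `any_large`, and `all_large` are illustrative.

```python
import numpy as np

M = np.array([[1, 4], [9, 16]])

try:
    if M > 5:                      # ambiguous: four truth values, one `if`
        print("unreachable")
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Reduce the boolean array to a single value first
any_large = bool((M > 5).any())
all_large = bool((M > 5).all())
print(any_large, all_large)
```

The error message itself suggests the fix, namely calling any() or all() on the array.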

Arithmetic
Arithmetic operations on NumPy arrays are performed on an element-by-element basis. We can
either perform this arithmetic between an array and a scalar, i.e. a single number, or between two
arrays.

In the case of a scalar, the identical operation is applied to every single array entry.

In [ ]: v1 = np.arange(0, 5)
v1

In [ ]: v1 * 2

In [ ]: v1 + 2

In [ ]: A = np.random.randint(low=-5, high=5, size=(3, 3))
print(A)
print()
print(A * 2)
print()
print(A ** 2)

When we add, subtract, multiply and divide arrays with each other, the default behaviour is
element-wise operations:

In [ ]: v1 = np.arange(start=5, stop=10)
v2 = np.arange(start=0, stop=5)
print(v1)
print(v2)
print(v1 + v2) # element wise
print(v1 ** v2)

Aggregative Functions
We can also aggregate over arrays using several built-in functions. For example,

In [ ]: A = np.random.randint(low=0, high=10, size=(2, 2))
print(A)
print()
print(np.sum(A))

NumPy provides many aggregation functions, but we won't discuss them in detail here.
Additionally, most aggregates have a NaN -safe counterpart that computes the result while
ignoring missing values, which are marked by the special IEEE floating-point NaN value. Some of
these NaN -safe functions were not added until NumPy 1.8, so they will not be available in older
NumPy versions.

The following table provides a list of useful aggregation functions available in NumPy:

Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true

" NaN -safe" means that the function ignores any missing values, e.g.


In [ ]: A = np.array([[1, 2], [3, np.nan]])
print(A)
print()
print(np.sum(A))
print(np.nansum(A))

We can apply these functions either to entire arrays or to individual axes. To understand how the
axis parameter works, it's best to stop thinking of arrays as rows and columns and to think of them
as nested lists instead. axis=0 performs an operation along the outer-most dimension, e.g. if

$$A = \begin{bmatrix} 1 & 5 \\ 2 & 2 \end{bmatrix}$$

then the two arrays (1, 5) and (2, 2) would be added together elementwise, resulting in (3, 7). For
axis=1 , the individual elements of each array in the next layer would be added together, i.e. (1 +
5, 2 + 2) = (6, 4).

In [ ]: A = np.array([[1, 5], [2, 2]])
print(A)
print()
print(np.sum(A, axis=0))
print()
print(np.sum(A, axis=1))
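The nested-list view extends directly to higher dimensions: aggregating over an axis removes that axis from the shape. Below is a small sketch with a 3-D array; the name `T` is arbitrary, and `keepdims=True` is an optional argument of np.sum that keeps the reduced axis with length 1.

```python
import numpy as np

T = np.arange(24).reshape(2, 3, 4)

# Summing over an axis removes that axis from the shape
print(np.sum(T, axis=0).shape)  # (3, 4)
print(np.sum(T, axis=1).shape)  # (2, 4)
print(np.sum(T, axis=2).shape)  # (2, 3)

# keepdims=True keeps the reduced axis with length 1
print(np.sum(T, axis=1, keepdims=True).shape)  # (2, 1, 4)

# axis=0 adds the two outermost sub-arrays element-wise
print(np.array_equal(np.sum(T, axis=0), T[0] + T[1]))
```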

Vectorization
Vectorization in NumPy refers to the implementation of mathematical operations in compiled C
code rather than interpreted Python code. This provides a substantial performance boost.
Furthermore, due to NumPy's more intuitive treatment of array arithmetic, as much of a program's
math as possible should be formulated in terms of NumPy operations. Many packages, like Pandas,
SciPy, and Scikit-Learn make use of this vectorization.

In [ ]: # + concatenates lists but adds arrays element-wise
lis = [1,2,3,4,5]
print(lis + lis)

ary = np.array(lis)
print(ary + ary)

Achieving the same result in base Python requires a loop:

In [ ]: [x+x for x in lis]

This takes substantially longer. When adding large lists together, NumPy is typically faster by one
to two orders of magnitude; the exact factor depends on the array size and the machine.


In [ ]: lis = range(10000)
ary = np.array(lis)
%timeit [x+x for x in lis]
%timeit ary + ary
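The %timeit magic only works inside IPython or Jupyter. As a rough equivalent for a plain Python script, the comparison can be sketched with the standard-library timeit module. The helper names and the repetition count of 200 are arbitrary choices, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

lis = list(range(10000))
ary = np.array(lis)

def add_with_loop():
    return [x + x for x in lis]

def add_with_numpy():
    return ary + ary

# Both approaches must produce the same values
same = np.array_equal(np.array(add_with_loop()), add_with_numpy())
print(same)

# Total time for 200 repetitions of each; absolute numbers vary by machine
t_loop = timeit.timeit(add_with_loop, number=200)
t_numpy = timeit.timeit(add_with_numpy, number=200)
print(f"loop:  {t_loop:.4f} s")
print(f"numpy: {t_numpy:.4f} s")
```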

We call operations on NumPy arrays vectorized. This feature is the reason NumPy sits at the base
of so many numerical and scientific Python libraries, e.g. SciPy, scikit-learn, and pandas.

Broadcasting
If we can add arrays together element-wise then we also need to make sure we have rules in place
for when their shape doesn't match. For example, we want np.array([1,2,3]) * 2 =
np.array([2,4,6]) . That means that the scalar 2 needs to be broadcast to the same shape as
the array. NumPy defines three rules for broadcasting that determine how binary functions, e.g.,
addition, subtraction, multiplication, division, or exponentiation, are performed on arrays of different
sizes.

Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer
dimensions is padded with ones on its leading (left) side.
Rule 2: If in any dimension the sizes disagree and one of the arrays has a size of 1 in that
dimension, then that array is stretched to match the other shape.
Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
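The three rules can be written down as a short function that predicts the shape of a broadcast result. This is a plain-Python sketch of the rules as stated above, not NumPy's actual implementation; the helper name `broadcast_shape` is made up for this example.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError (Rule 3)."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)   # sizes agree, or Rule 2 stretches b
        elif size_a == 1:
            result.append(size_b)   # Rule 2 stretches a
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((3,), (1, 3)))       # (1, 3)
print(broadcast_shape((3, 1), (1, 3)))     # (3, 3)
print(broadcast_shape((5, 1, 4), (3, 1)))  # (5, 3, 4)

# Cross-check against NumPy's own result
print((np.ones((5, 1, 4)) + np.ones((3, 1))).shape)
```

Calling `broadcast_shape((3, 2), (3, 3))` raises a ValueError, which corresponds to Rule 3.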

Rule 1

Let's take a moment to highlight a very important distinction between 1D arrays and 2D arrays with
a single row. These may look and behave similarly but are, in fact, quite different.


In [ ]: a1 = np.ones(5)
print(a1)
print("a1 shape: {}".format(a1.shape))
print()
a2 = np.ones((1, 5))
print(a2)
print("a2 shape: {}".format(a2.shape))

The same applies to higher-dimensional arrays with more "padded dimensions", e.g.

In [ ]: a3 = np.ones((1, 1, 1, 5))
print(a3)
print("a3 shape: {}".format(a3.shape))

We will see later in this chapter that they behave differently with regard to indexing and stacking.
For now, simply keep in mind that they have different dimensions despite containing the same data
arranged in the same way, i.e. a single row.

With this in mind, the following example highlights rule 1. Array b has a single dimension and
must first be padded with a leading dimension of size 1 before being added, element-wise, to
array a .

In [ ]: a = np.ones((1, 3), dtype=int)
b = np.arange(1, 4)

print(a)
print("")
print(b)
print("")
print(a + b)

Scalars are a special case of this rule: a scalar behaves like a 0-D array, so it is padded and
stretched to the full shape of the other operand.

In [ ]: M = np.ones((3, 3))
print(M)
print()
print(M + 5)

Rule 2

Both arrays have the same number of dimensions, but while a has 3 rows, b has only one.
Therefore, b is stretched to have 3 rows and the two resulting matrices are added together
element-wise.


In [ ]: # np.int was removed in NumPy 1.24; the built-in int works in all versions
a = np.zeros((3, 3), dtype=int)
b = np.array([[1, 2, 3]])

print(a)
print("a shape: {}".format(a.shape))
print("")
print(b)
print("b shape: {}".format(b.shape))
print("")
print(a + b)

In this example, a and b don't match in either dimension, but for each one, at least one of the
arrays has a size of 1 so that they can be stretched accordingly. The result is a 3x3 matrix.

In [ ]: a = np.array([[1], [2], [3]])
b = np.array([[1, 2, 3]])

print(a)
print("a shape: {}".format(a.shape))
print("")
print(b)
print("b shape: {}".format(b.shape))
print("")
print(a + b)

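Conceptually, NumPy expands this to:

$$\begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \end{bmatrix} + \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \\ 1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \\ 4 & 5 & 6 \end{bmatrix}$$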
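The stretched operands can be inspected explicitly with np.broadcast_to, which returns a read-only view of an array expanded to a given shape. The sketch below reuses the column a and the row b from the example; `a_full` and `b_full` are illustrative names.

```python
import numpy as np

a = np.array([[1], [2], [3]])   # shape (3, 1)
b = np.array([[1, 2, 3]])       # shape (1, 3)

# Stretch both operands to the common shape (3, 3)
a_full = np.broadcast_to(a, (3, 3))
b_full = np.broadcast_to(b, (3, 3))
print(a_full)
print(b_full)

# Adding the stretched arrays gives the same result as a + b
print(np.array_equal(a_full + b_full, a + b))
```

During ordinary arithmetic NumPy does not need to build these expanded arrays in memory, which is part of what makes broadcasting efficient.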

Rule 3

Here, the second dimension doesn't match and neither of the arrays has a size of 1 in it. NumPy
doesn't know how to resolve this and throws an exception.

In [ ]: # Rule three
a = np.ones((3, 2))
b = np.random.randint(low=1, high=10, size=(3, 3))

print(a)
print("a shape: {}".format(a.shape))
print("")
print(b)
print("b shape: {}".format(b.shape))
print("")
# Uncomment to see Exception
#print(a + b)


Further Reading
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html

Exercises

Exercise 1

Create two 8x8 arrays of random integers. The first should have only negative numbers between
-10 and -1 (inclusive) and the second should have only positive numbers between 1 and 10
(inclusive). Add them together and save the result as a variable A .

In [ ]: ### Your code here

In [ ]: # MC
A1 = np.random.randint(low=-10, high=0, size=(8, 8))
A2 = np.random.randint(low=1, high=11, size=(8, 8))
A = A1 + A2
print(A)

Exercise 2

Calculate the mean of the entire matrix A .

In [ ]: ### Your code here

In [ ]: # MC
np.mean(A)

Exercise 3

How many of the entries of the resulting matrix A are positive, negative, and zero?

In [ ]: ### Your code here


In [ ]: # MC
# Positive
A_pos = A > 0
print(A_pos)
print(np.sum(A_pos))
print()
# Negative
A_neg = A[A < 0]
print(A_neg)
print(len(A_neg))
print()
# Zero
A_zero = np.sum(A == 0)
print(A_zero)

Exercise 4

Calculate the mean of every row and column of the matrix A

In [ ]: ### Your code here

In [ ]: # MC
print(A)
print()
# Columns
col_means = np.mean(A, axis=0)
print(col_means)
# Rows
row_means = np.mean(A, axis=1)
print(row_means)

Exercise 5

Make use of broadcasting rules to multiply the first row of the following array by 2, the second row
by 3 and the third row by 4.

In [ ]: ### Your code here


A = np.array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])


In [ ]: # MC
A = np.array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
print(A)
print(A.shape)
print()
B = np.array([[2], [3], [4]])
print(B)
print(B.shape)
print()
print(A * B)

Advanced Manipulation

Reshaping and Transposing


In memory, NumPy stores an array's values in one flat buffer and its shape separately. That means
we can change the shape of an array very quickly, regardless of its actual size, because the values
themselves usually do not need to be moved. The total sizes must match, i.e. the new shape must
have space for exactly as many elements as the old shape.

Elements are reshaped in an "inside-out" fashion. That means the inner-most dimensions are filled
with values first and then combined in the outer dimensions. In the context of 2D matrices, this
means that values are set rows-first.

In [ ]: a = np.arange(12)
print(a)
print()
print(a.reshape(3, 4))
print()
print(a.reshape(6, 2))
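The rows-first fill order and the size requirement can both be checked directly. A minimal sketch, with `m` as an illustrative name:

```python
import numpy as np

a = np.arange(12)
m = a.reshape(3, 4)

# Row-first fill: element (i, j) comes from flat position i * 4 + j
print(m[2, 1], a[2 * 4 + 1])

# ravel() flattens back in the same order
print(np.array_equal(m.ravel(), a))

# 12 elements do not fit into a 5x3 shape
try:
    a.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```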

Alternatively, we can also transpose matrices. Transposing means that the order of the dimensions
is reversed, i.e. the first dimension becomes the last, the last becomes the first, etc. Consequently,
transposing 1D arrays has no effect.

In [ ]: print(a)
print(a.transpose())

Transposing 2D arrays means that rows become columns and columns become rows.

In [ ]: A = np.arange(15).reshape(3, 5)
print(A)
print()
print(A.transpose())

For higher-dimensional arrays, the order of the dimensions reverses. Within this new shape,
values are then set in the same "inside-out" fashion.

In [ ]: A = np.random.randint(low=-5, high=5, size=(2, 3, 4, 5, 6))
print(A.shape)
print()
print(A.transpose().shape)

Alternatively, we can also define how we want to reorder the dimensions.

In [ ]: print(A.shape)
print()
print(A.transpose((0, 1, 2, 4, 3)).shape)

To help you understand what is happening here, it is easiest to picture this as creating an empty
array with a specified shape and then filling it with the values of the original array, even though this
isn't actually what happens "under the hood".
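One way to make this concrete is to follow a single element: after a transpose, the element is found at the same indices, reordered in the same way as the axes. The following sketch uses a small 3-D array; `R` and `S` are arbitrary names.

```python
import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Default transpose reverses the axes: (2, 3, 4) -> (4, 3, 2)
R = A.transpose()
print(R.shape)
print(R[3, 1, 0] == A[0, 1, 3])   # indices are reversed as well

# Custom order: keep axis 0, swap axes 1 and 2 -> (2, 4, 3)
S = A.transpose((0, 2, 1))
print(S.shape)
print(S[1, 3, 2] == A[1, 2, 3])
```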

Adding a new dimension with newaxis


With np.newaxis , we can insert new dimensions in an array, for example converting a vector to a
column or row matrix.

In [ ]: v = np.arange(5)
print(v)
print(v.shape)
print()
v2 = v[np.newaxis, :]
print(v2)
print(v2.shape)
print()
v3 = v[:, np.newaxis]
print(v3)
print(v3.shape)
print()

This is essentially shorthand for reshaping an array and becomes useful when we don't want to
explicitly list the old dimensions of the array, e.g.


In [ ]: A = np.random.randint(low=-5, high=5, size=(3, 4))
print(A)
print(A.shape)
print()
A2 = A[:, np.newaxis, :]
print(A2)
print(A2.shape)
print()
# We have to explicitly list the old dimensions
A3 = A.reshape(3, 1, 4)
print(A3)
print(A3.shape)
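A common use of np.newaxis is to prepare vectors for broadcasting. As a small sketch, turning one vector into a column and another into a row lets the broadcasting rules combine every pair of elements, here into a multiplication table; the names `x`, `y`, and `table` are illustrative.

```python
import numpy as np

x = np.arange(1, 4)        # shape (3,)
y = np.arange(1, 5)        # shape (4,)

# Column (3, 1) times row (1, 4) broadcasts to a (3, 4) table
table = x[:, np.newaxis] * y[np.newaxis, :]
print(table)
print(table.shape)
```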

Concatenation and Splitting


We can concatenate arrays along given axes. In order to be concatenated, they must have the
same number of dimensions and their shapes must match in all dimensions but the one being
concatenated along

In [ ]: A = np.arange(10)
B = np.arange(20, 40).reshape((2,10))
print('A')
print(A)
print(A.shape)
print()
print('B')
print(B)
print(B.shape)
print()

In [ ]: print(np.concatenate((A, A)))
print()
print(np.concatenate((B, B)))

By default, np.concatenate will combine arrays along axis=0 . We can specify the axis along
which to concatenate, however.

In [ ]: print(np.concatenate((B, B), axis=1))

While np.concatenate joins arrays along existing axes, np.stack combines them along new
axes.


In [ ]: print(A)
print(A.shape)
print()
A2 = np.stack((A, A), axis=0) # vertically
print(A2)
print(A2.shape)
print()
A3 = np.stack((A, A), axis=1) # horizontally
print(A3)
print(A3.shape)

We can also split arrays along a certain axis into sections. We can either dictate how many equally
sized parts the array should be split into or we can determine specifically where to split the array

In [ ]: A = np.arange(12)
print(A)
print()
# Split into 3 equally sized parts
print(np.split(A, 3))
print()
# Split at specific indices
print(np.split(A, (2, 3, 8)))

Note that NumPy will throw an exception if equally sized parts cannot be created, e.g. an array
with 10 numbers cannot be split into 4 equally sized parts.

In [ ]: # Uncomment for exception
# np.split(np.arange(10), 4)

The axis argument allows us to determine along which axis to split the array

In [ ]: print(B)
print()
print(np.split(B, 2, axis=0))
print()
print(np.split(B, 2, axis=1))
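Splitting and concatenating along the same axis are inverse operations, which gives a simple way to check a split. The sketch below recreates the array B from above; `parts`, `restored`, `left`, `middle`, and `right` are illustrative names.

```python
import numpy as np

B = np.arange(20, 40).reshape(2, 10)

# Split into 5 column blocks of shape (2, 2) each
parts = np.split(B, 5, axis=1)
print(len(parts), parts[0].shape)

# Concatenating along the same axis restores the original
restored = np.concatenate(parts, axis=1)
print(np.array_equal(restored, B))

# Split indices mark where each new section starts
left, middle, right = np.split(B, (3, 7), axis=1)
print(left.shape, middle.shape, right.shape)
```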

Exercises

Exercise 1

Let x be the array

[[1, 2, 3],

[4, 5, 6]].

Convert it to


[[1 4 2 5 3 6]]

In [ ]: ### Your code here

In [ ]: # MC
x = np.array([[1, 2, 3], [4, 5, 6]])
print(x)
print()
# Transpose first so that the values are read column by column
print(x.transpose().reshape(1, 6))

Exercise 2

Let x be an array

[[1, 2, 3]

[4, 5, 6]]

and y be an array

[[ 7, 8, 9]

[10, 11, 12]]

Concatenate x and y so that a new array looks like

[[1, 2, 3, 7, 8, 9]

[4, 5, 6, 10, 11, 12]]

In [ ]: ### Your code here

In [ ]: # MC
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [10, 11, 12]])
np.concatenate((x,y), axis=1)

Exercise 3

Let x be an array [1, 2, 3, ..., 9]. Split x into 3 arrays with 4, 2, and 3 elements, respectively,
keeping the original order.

In [ ]: ### Your code here

In [ ]: # MC
x = np.arange(1, 10)
np.split(x, [4,6])

Exercise 4


Let x be an array [0, 1, 2]. Convert it to

[[0, 1, 2, 0, 1, 2]

[0, 1, 2, 0, 1, 2]]

In [ ]: ### Your code here

In [ ]: # MC
x = np.array([0, 1, 2])
x2 = np.concatenate((x, x))
print(x2)
print()
print(np.stack((x2, x2), axis=0))
print()

# Alternatively...
x3 = x2[np.newaxis, :]
print(np.concatenate((x3, x3), axis=0))
print()

Additional Resources:
NumPy Quickstart Guide (https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
Rahul Dave's CS109 lab1 content at Harvard (https://github.com/cs109/2015lab1)
The Data Incubator (https://www.thedataincubator.com)
Python Data Science Handbook (https://github.com/jakevdp/PythonDataScienceHandbook)
