L2. Numpy

Lecture 2. NumPy
2-1. Numpy basic
Numpy : efficient implementation of n-dim array
built in C : fast
1-d array, 2-d array, ..., n-d array
In [414]:
import numpy as np
In [415]:
a = [1,2,3,4,5,6,7,8]
print("a : List =", a)
b = np. array(a)
print("b : np array =", b)
a : List = [1, 2, 3, 4, 5, 6, 7, 8]
b : np array = [1 2 3 4 5 6 7 8]
NumPy standard data types

NumPy arrays contain values of a single type
data type can be specified when constructing an array
np.zeros(10, dtype=int)
np.zeros(10, dtype=float)
np.zeros(10, dtype='int16')
np.zeros(10, dtype=np.float32)
Data type    Description
bool_        Boolean (True or False) stored as a byte
int_         Default integer type (same as C long; normally either int64 or int32)
intc         Identical to C int (normally int32 or int64)
intp         Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8         Byte (-128 to 127)
int16        Integer (-32768 to 32767)
int32        Integer (-2147483648 to 2147483647)
int64        Integer (-9223372036854775808 to 9223372036854775807)
uint8        Unsigned integer (0 to 255)
uint16       Unsigned integer (0 to 65535)
uint32       Unsigned integer (0 to 4294967295)
uint64       Unsigned integer (0 to 18446744073709551615)
float_       Shorthand for float64 (alias removed in NumPy 2.0)
float16      Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32      Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64      Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_     Shorthand for complex128 (alias removed in NumPy 2.0)
complex64    Complex number, represented by two 32-bit floats
complex128   Complex number, represented by two 64-bit floats
In [416]:
np. array([1, 2, 3, 4], dtype= 'float32')
Out[416]: array([1., 2., 3., 4.], dtype=float32)
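A small sketch of the single-type rule (the variable names are invented for this illustration): mixing integers and floats upcasts every element to float64, astype converts explicitly, and itemsize follows from the chosen dtype.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])          # ints and a float -> every element becomes float64
print(mixed.dtype)                     # float64

as_int = mixed.astype(np.int32)        # explicit conversion; the fractional part is dropped
print(as_int, as_int.dtype)            # [1 2 3] int32

small = np.zeros(4, dtype=np.int16)
print(small.itemsize, small.nbytes)    # 2 bytes per element, 8 bytes in total
```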
Creating arrays from scratch

shape :
(d1) : 1-d array of size d1
(d1,d2) : 2-d array of size d1xd2
(d1,d2,d3) : 3-d array of size d1xd2xd3
...
In [417]:
# Create a length-10 integer array filled with zeros
a0 = np.zeros(10, dtype=int)
print(a0.shape)
print(a0)
# Create a 2x5 floating-point array filled with ones
a1 = np.ones((2, 5), dtype=float)
print(a1.shape)
print(a1)
# Create a 2x5 array filled with 3.14
af = np.full((2, 5), 3.14)
print(af)
(10,)
[0 0 0 0 0 0 0 0 0 0]
(2, 5)
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
[[3.14 3.14 3.14 3.14 3.14]
[3.14 3.14 3.14 3.14 3.14]]
In [418]:
# Create a 3x3 identity matrix
np.eye(3)
Out[418]: array([[1., 0., 0.],
                 [0., 1., 0.],
                 [0., 0., 1.]])
In [419]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np. arange(0, 20, 2)
Out[419]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
Creating array with random numbers

In [9]:
# setting seed for random number generator
# for reproducibility
import random
random.seed(0)
np.random.seed(0)
In [10]:
# Create a 3x3 array of uniform random numbers in [0, 1)
np.random.random(size=(3, 3))
Out[10]: array([[0.5488135 , 0.71518937, 0.60276338],
                [0.54488318, 0.4236548 , 0.64589411],
                [0.43758721, 0.891773 , 0.96366276]])
In [421]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, size=(3, 3))
Out[421]: array([[7, 1, 6],
                 [9, 9, 8],
                 [6, 3, 4]])
In [422]:
# Create a 3x3 array of standard normal N(0,1) random numbers
np.random.randn(3, 3)
Out[422]: array([[ 0.33786932, 1.39970946, 1.1298669 ],
                 [-0.07111281, -0.80368313, -1.11158007],
                 [ 1.01861985, 0.36387617, -0.30621626]])
In [423]:
# Create a 3x3 array of normal random numbers with mean 50 and standard deviation 10
np.random.normal(50, 10, size=(3, 3))
Out[423]: array([[52.7827885 , 50.90993972, 35.35756522],
                 [66.58160611, 53.7680413 , 36.89402385],
                 [53.89007158, 60.90909425, 61.22526522]])
NumPy array attributes

dtype : the data type of the array
ndim : the number of axes
shape : the size of each axis
size : the total number of elements in the array
itemsize : the size of each array element (in bytes)
nbytes : the total size of the array (in bytes)
In [424]:
np. random. seed(0) # seed for reproducibility
x3 = np. random. randint(0, 10, size= (3, 4, 5)) # Three-dimensional array
print("dtype:", x3. dtype)
print("ndim: ", x3. ndim)
print("shape:", x3. shape)
print("size: ", x3. size)
print("itemsize:", x3. itemsize, "bytes")
print("nbytes:", x3. nbytes, "bytes")
dtype: int32
ndim: 3
shape: (3, 4, 5)
size: 60
itemsize: 4 bytes
nbytes: 240 bytes
Array indexing, slicing: similar to python list

In [425]:
x1 = np. arange(10)
print (x1)
[0 1 2 3 4 5 6 7 8 9]
In [426]:
x1[1] = 1.8 # truncated to integer
print (x1[0], x1[1], x1[- 1], x1[- 2])
print (x1[:4])
print (x1[4:7])
print (x1[7:])
print (x1[7:- 1])
print (x1[1:8:2])
0 1 9 8
[0 1 2 3]
[4 5 6]
[7 8 9]
[7 8]
[1 3 5 7]
Slicing
- a view (not a copy) of the base array
- to make a copy, use copy()
In [ ]:
x1 = np. arange(10)
y = x1[4:7]
print(x1, y)
y[0] = 0
print(x1, y) # TAQ
z = x1[7:]. copy()
z[0] = 0
print(x1,z) # TAQ
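The view/copy distinction can be checked directly. The sketch below repeats the cell above and adds np.shares_memory as one way to test whether two arrays use the same data; that check is an addition for illustration, not part of the lecture code.

```python
import numpy as np

x1 = np.arange(10)
y = x1[4:7]            # slice -> a view on x1's data
z = x1[7:].copy()      # explicit copy -> independent data

y[0] = 0               # writes through to x1[4]
z[0] = 0               # x1[7] is left untouched

print(x1)                          # [0 1 2 3 0 5 6 7 8 9]
print(np.shares_memory(x1, y))     # True
print(np.shares_memory(x1, z))     # False
```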
In [428]:
x2 = np. random. randint(0, 100, size= (3,4))
print (x2)
print (x2[0]) # first row
print (x2[0,1:3])
print (x2[:,1]) # second col
[[42 58 31 1]
[65 41 57 35]
[11 46 82 91]]
[42 58 31 1]
[58 31]
[58 41 46]
Array reshaping
y = x.reshape(new_shape) : returns an array with the same data as x and a new shape
no-copy view whenever possible : y then references the data of the original array
x.reshape(-1) : flattens to 1-d array
numpy n-d array : 1-d array storage + n-d view
In [ ]:
x1 = np. arange(9)
print(x1, '\n')
x2 = x1. reshape((3, 3))
print(x2, '\n')
x2[1] = 0 # x2[1] = x2[1,:]
print(x1) # TAQ
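reshape returns a view only when the requested shape can be described on top of the existing memory layout. The following sketch (names chosen for this example) shows one case where it is a view and one where NumPy has to copy, namely flattening a transposed array.

```python
import numpy as np

x = np.arange(6)
v = x.reshape((2, 3))            # contiguous data: reshape returns a view
print(np.shares_memory(x, v))    # True

t = v.T                          # the transpose is also a view, but not C-contiguous
f = t.reshape(-1)                # flattening it in row-major order needs a copy
print(np.shares_memory(x, f))    # False
print(f)                         # [0 3 1 4 2 5]
```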
In [16]:
x2 = np. arange(12). reshape((3, 4))
print (x2, '\n')
print (x2[0], '\n')
print (x2[1]. reshape((1,4)), '\n')
print (x2[2]. reshape((4,1)), '\n')
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[0 1 2 3]
[[4 5 6 7]]
[[ 8]
[ 9]
[10]
[11]]
In [17]:
print (x2. reshape(- 1), '\n')
print (x2. reshape((4,- 1)), '\n')
print (x2. reshape((- 1,6)), '\n')
print (x2. reshape((2,2,- 1)))
[ 0 1 2 3 4 5 6 7 8 9 10 11]
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]]
[[[ 0 1 2]
[ 3 4 5]]
[[ 6 7 8]
[ 9 10 11]]]
1-d index of n-d array

x.shape = $(d_0, d_1, d_2)$
y = x.reshape(-1) : flattened (1-d) view of x

x[a, b, c] ≡ y[k], where $k = a\,(d_1 d_2) + b\,d_2 + c = (a\,d_1 + b)\,d_2 + c$
In [6]:
x3 = np. arange(2* 3* 4)
x4 = x3. reshape((2,3,4))
a,b,c = 1,1,2
print(x4[a,b,c])
k = (a* 3 + b)* 4 + c
print(x3[k])
x3[k] = 0
print (x4[a,b,c]) # TAQ
18
18
0
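NumPy provides this index arithmetic as np.ravel_multi_index, with np.unravel_index as its inverse. A short check against the manual formula, using the same shape and indices as the cell above:

```python
import numpy as np

shape = (2, 3, 4)                       # (d0, d1, d2)
a, b, c = 1, 1, 2

k_manual = (a * shape[1] + b) * shape[2] + c
k_numpy = np.ravel_multi_index((a, b, c), shape)
print(k_manual, k_numpy)                # 18 18

# the inverse mapping: flat index -> (a, b, c)
abc = np.unravel_index(k_manual, shape)
print(tuple(int(i) for i in abc))       # (1, 1, 2)
```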
Concatenating arrays
In [18]:
x = np. array([1, 2, 3])
y = np. array([3, 2, 1])
print (np. concatenate([x, y]), '\n')
z = [99, 99, 99]
print(np. concatenate([x, y, z]))
[1 2 3 3 2 1]
[ 1 2 3 3 2 1 99 99 99]
In [20]:
x = np. arange(0,8). reshape((2, 4))
y = np. arange(8,16). reshape((2, 4))
print (x, '\n')
print (y, '\n')
print (np. concatenate([x,y], axis= 0), '\n')
print (np. vstack([x,y]), '\n')
print (np. concatenate([x,y], axis= 1), '\n')
print (np. hstack([x,y]))
[[0 1 2 3]
[4 5 6 7]]
[[ 8 9 10 11]
[12 13 14 15]]
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0 1 2 3 8 9 10 11]
[ 4 5 6 7 12 13 14 15]]
[[ 0 1 2 3 8 9 10 11]
[ 4 5 6 7 12 13 14 15]]
2-2. Computation on NumPy arrays: Universal Functions
Loops are slow
In [22]:
a = np. random. random(size= 1000000)
print(a. sum()) # TAQ : any guess?
500387.3135894248
The following code is slow. What is the cause?

using a list?
using a for loop?
In [23]:
# using list and for-loop
def reciprocal_1(x):
    n = len(x)
    y = []
    s = 0.0
    for i in range(n):
        z = 1.0 / x[i]
        s += z
        y.append(z)
    return y, s
%timeit b, s = reciprocal_1(a)
381 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [24]:
# using numpy array and for-loop
def reciprocal_2(x):
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        y[i] = 1.0 / x[i]
    return y, y.sum()
%timeit b, s = reciprocal_2(a)
373 ms ± 37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [25]:
# using numpy array and no-loop
def reciprocal_3(x):
    y = 1 / x
    return y, y.sum()
%timeit b, s = reciprocal_3(a)
4.38 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
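Outside IPython the %timeit magic is not available. The sketch below uses the standard timeit module instead and first confirms that the loop and the vectorized version agree. The function names, array size, and repeat count are choices made for this example; timings depend on the machine, so none are quoted.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10_000) + 0.5         # keep the values away from zero

def reciprocal_loop(x):
    # element-by-element Python loop over a NumPy array
    y = np.zeros(len(x))
    for i in range(len(x)):
        y[i] = 1.0 / x[i]
    return y

def reciprocal_vec(x):
    # one vectorized ufunc call, the loop runs in compiled code
    return 1.0 / x

same = np.allclose(reciprocal_loop(a), reciprocal_vec(a))
print(same)                          # True

t_loop = timeit.timeit(lambda: reciprocal_loop(a), number=5)
t_vec = timeit.timeit(lambda: reciprocal_vec(a), number=5)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```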
UFuncs : vectorized operations
ufunc : universal function
very fast
element-wise operations on numpy array
unary
binary: scalar ⊙ np array
binary: np array ⊙ np array
In [438]:
x = np. arange(0,11,2)
y = np. arange(1,12,2)
print(x)
print(y)
[ 0 2 4 6 8 10]
[ 1 3 5 7 9 11]
In [439]:
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2) # floor division
x = [ 0 2 4 6 8 10]
x + 5 = [ 5 7 9 11 13 15]
x - 5 = [-5 -3 -1 1 3 5]
x * 2 = [ 0 4 8 12 16 20]
x / 2 = [0. 1. 2. 3. 4. 5.]
x // 2 = [0 1 2 3 4 5]
In [440]:
print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)
print("-(x/2+1)**2 = ", -(0.5 * x + 1) ** 2)
-x = [ 0 -2 -4 -6 -8 -10]
x ** 2 = [ 0 4 16 36 64 100]
x % 2 = [0 0 0 0 0 0]
-(x/2+1)**2 = [ -1. -4. -9. -16. -25. -36.]
In [441]:
print (x + 2)
print (np. add(x, 2))
[ 2 4 6 8 10 12]
[ 2 4 6 8 10 12]
The following table lists the arithmetic operators implemented in NumPy:
Operator  Equivalent ufunc  Description
+         np.add            Addition (e.g., 1 + 1 = 2)
-         np.subtract       Subtraction (e.g., 3 - 2 = 1)
-         np.negative       Unary negation (e.g., -2)
*         np.multiply       Multiplication (e.g., 2 * 3 = 6)
/         np.divide         Division (e.g., 3 / 2 = 1.5)
//        np.floor_divide   Floor division (e.g., 3 // 2 = 1)
**        np.power          Exponentiation (e.g., 2 ** 3 = 8)
%         np.mod            Modulus/remainder (e.g., 9 % 4 = 1)
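Floor division and modulus behave like their Python counterparts for negative numbers: the quotient is rounded toward minus infinity and the remainder takes the sign of the divisor. A brief check that the operators and the named ufuncs agree (the sample values are chosen for this illustration):

```python
import numpy as np

x = np.array([-3, -1, 1, 3])

q = x // 2                                     # same as np.floor_divide(x, 2)
r = x % 2                                      # same as np.mod(x, 2)
print(q, np.floor_divide(x, 2))                # [-2 -1  0  1] twice
print(r, np.mod(x, 2))                         # [1 1 1 1] twice

# the two results are consistent with each other: x == q * 2 + r
print(np.array_equal(x, q * 2 + r))            # True
```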
Math functions
In [442]:
x = [- 1, 2, - 3]
print("x =", x) # x: python list
y = np. abs(x) # x is converted to np array, so is y
print("y=|x| =", y)
x = [-1, 2, -3]
y=|x| = [1 2 3]
In [443]:
y = np.array([1, 2, 3])
print("e^y =", np.exp(y))
print("2^y =", np.exp2(y))
print("3^y =", np.power(3, y))
e^y = [ 2.71828183 7.3890561 20.08553692]
2^y = [2. 4. 8.]
3^y = [ 3 9 27]
In [444]:
x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np. log(x))
print("log2(x) =", np. log2(x))
print("log10(x) =", np. log10(x))
x = [1, 2, 4, 10]
ln(x) = [0. 0.69314718 1.38629436 2.30258509]
log2(x) = [0. 1. 2. 3.32192809]
log10(x) = [0. 0.30103 0.60205999 1. ]
Special function : np.expm1(x), np.log1p(x)

for computing more precisely when x is small
np.expm1(x) : high precision function for np.exp(x)-1
np.log1p(x) : high precision function for np.log(1+x)
In [445]:
x = np. array([0.001, 0.0001, 0.00001], dtype= np. float32)
y = np. array(x, dtype= np. float64)
print("exp(x) - 1 =", np. exp(x)- 1)
print("exp(y) - 1 =", np. exp(y)- 1)
print("expm1(x) =", np. expm1(x))
print("log(1 + x) =", np. log(1+ x))
print("log(1 + y) =", np. log(1+ y))
print("log1p(x) =", np. log1p(x))
exp(x) - 1 = [1.00052357e-03 1.00016594e-04 1.00135803e-05]

exp(y) - 1 = [1.00050021e-03 1.00004998e-04 1.00000497e-05]
expm1(x) = [1.0005003e-03 1.0000499e-04 1.0000050e-05]
log(1 + x) = [9.99546959e-04 1.00011595e-04 1.00135303e-05]
log(1 + y) = [9.99500381e-04 9.99949978e-05 9.99994975e-06]
log1p(x) = [9.995003e-04 9.999500e-05 9.999950e-06]
Trigonometric functions
In [446]:
theta = np.linspace(0, np.pi, 4)
print("theta = ", theta)
print("sin(theta) = ", np. sin(theta))
print("cos(theta) = ", np. cos(theta))
print("tan(theta) = ", np. tan(theta))
theta = [0. 1.04719755 2.0943951 3.14159265]

sin(theta) = [0.00000000e+00 8.66025404e-01 8.66025404e-01 1.22464680e-16]
cos(theta) = [ 1. 0.5 -0.5 -1. ]
tan(theta) = [ 0.00000000e+00 1.73205081e+00 -1.73205081e+00 -1.22464680e-16]
In [447]:
x = [- 1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np. arcsin(x))
print("arccos(x) = ", np. arccos(x))
print("arctan(x) = ", np. arctan(x))
x = [-1, 0, 1]
arcsin(x) = [-1.57079633 0. 1.57079633]
arccos(x) = [3.14159265 1.57079633 0. ]
arctan(x) = [-0.78539816 0. 0.78539816]
Specialized functions : gamma, beta, erf, ...

scipy.special : provides many special functions
In [448]:
from scipy import special
In [449]:
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x) =", special. gamma(x))
print("ln|gamma(x)| =", special. gammaln(x))
print("beta(x, 2) =", special. beta(x, 2))
gamma(x) = [1.0000e+00 2.4000e+01 3.6288e+05]

ln|gamma(x)| = [ 0. 3.17805383 12.80182748]
beta(x, 2) = [0.5 0.03333333 0.00909091]
In [450]:
# Error function (integral of Gaussian)
# its complement, and its inverse
x = np. array([0, 0.3, 0.7, 1.0])
print("erf(x) =", special. erf(x))
print("erfc(x) =", special. erfc(x))
print("erfinv(x) =", special. erfinv(x))
erf(x) = [0. 0.32862676 0.67780119 0.84270079]

erfc(x) = [1. 0.67137324 0.32219881 0.15729921]
erfinv(x) = [0. 0.27246271 0.73286908 inf]
Specifying output
In [27]:
x = np. arange(5)
y = np. arange(10)
np. multiply(x, 10, out= y[3:8]) # store x*10 to y[3:8]
print(y)
y[3:8] = np. multiply(x, 10)
print(y)
[ 0 1 2 0 10 20 30 40 8 9]
[ 0 1 2 0 10 20 30 40 8 9]
2-3. Aggregations : sum, min, max, and so on
numpy aggregation functions are much faster than standard python aggregation
In [452]:
L = np. random. random(100000)
%timeit sum(L)
%timeit np.sum(L)
15.4 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
45.2 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [453]:
%timeit max(L)
%timeit L.max()
10 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
36.3 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [28]:
M = np. random. random((3, 4))
print(M)
[[0.59753225 0.07463842 0.43803321 0.21812627]

[0.53159247 0.10793176 0.8448757 0.03346237]
[0.32140902 0.65124221 0.61473394 0.2195387 ]]
In [29]:
print(M. sum())
print(M. sum(axis= 0), '\n')
print(M. cumsum(axis= 1), '\n')
print(M. prod(axis= 0), '\n')
print(M. cumprod(axis= 1), '\n')
4.653116331422877
[1.45053374 0.83381239 1.89764286 0.47112734]
[[0.59753225 0.67217067 1.11020389 1.32833016]

[0.53159247 0.63952423 1.48439993 1.5178623 ]
[0.32140902 0.97265124 1.58738518 1.80692388]]
[0.10209353 0.00524631 0.22750296 0.00160242]
[[0.59753225 0.04459886 0.01953578 0.00426127]

[0.53159247 0.05737571 0.04847534 0.0016221 ]
[0.32140902 0.20931512 0.12867311 0.02824873]]
In [30]:
print('min =', M. min(axis= 0))
print('max =', M. max(axis= 0))
print('mean=', M. mean(axis= 0))
print('var =', M. var(axis= 0))
print('std =', M. std(axis= 0))
print('med =', np. median(M, axis= 0)) # M.median(axis=0) does not work
print('p75%=', np. percentile(M, 75, axis= 0))
min = [0.32140902 0.07463842 0.43803321 0.03346237]

max = [0.59753225 0.65124221 0.8448757 0.2195387 ]
mean= [0.48351125 0.27793746 0.63254762 0.15704245]
var = [0.01386324 0.06986296 0.02774547 0.00763635]
std = [0.11774227 0.26431602 0.1665697 0.08738621]
med = [0.53159247 0.10793176 0.61473394 0.21812627]
p75%= [0.56456236 0.37958699 0.72980482 0.21883248]
In [31]:
x = np. arange(9,- 1,- 1)
print (np. argmin(x))
print (x. argmax())
9
0
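The axis argument names the dimension that is collapsed by the aggregation. A small integer matrix (made up for this illustration) makes the results easy to verify by hand:

```python
import numpy as np

M = np.arange(6).reshape((2, 3))     # [[0 1 2]
                                     #  [3 4 5]]

col_sum = M.sum(axis=0)              # collapse the rows: one value per column
row_sum = M.sum(axis=1)              # collapse the columns: one value per row
print(col_sum)                       # [3 5 7]
print(row_sum)                       # [ 3 12]

# keepdims=True keeps the collapsed axis with length 1
print(M.sum(axis=1, keepdims=True).shape)   # (2, 1)

# index of the largest entry in each column
print(M.argmax(axis=0))              # [1 1 1]
```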
2-4. Example: What is the Average Height of US Presidents?
Aggregates available in NumPy can be extremely useful for summarizing a set of values. As a simple example, let's consider the heights of all US presidents. This data is available in the file president_heights.csv, which is a simple comma-separated list of labels and values:
president_heights.csv
order,name,height(cm)
1,George Washington,189
2,John Adams,170
3,Thomas Jefferson,189
...
We use pandas to read the file

pandas will be explored more fully later
In [458]:
import pandas as pd
data = pd. read_csv('data/president_heights.csv')
heights = np. array(data['height(cm)'])
print(heights)
[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
177 185 188 188 182 185]
summary statistics:
In [459]:
print("Mean height: ", heights. mean())
print("Standard deviation:", heights. std())
print("Minimum height: ", heights. min())
print("Maximum height: ", heights. max())
Mean height: 179.73809523809524

Standard deviation: 6.931843442745892
Minimum height: 163
Maximum height: 193
quantiles:
In [460]:
print("25th percentile: ", np. percentile(heights, 25))
print("Median: ", np. median(heights))
print("75th percentile: ", np. percentile(heights, 75))
25th percentile: 174.25

Median: 182.0
75th percentile: 183.0
In [461]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # set plot style
In [462]:
plt. hist(heights)
plt. title('Height Distribution of US Presidents')
plt. xlabel('height (cm)')
plt. ylabel('number');
2-5. Broadcasting
Motivation
ufunc : element-wise operation
A⊙B
what if A and B have different shapes?
we want to match the shapes
as long as there is a natural way to do so

broadcasting : rules for applying binary ufuncs when shapes differ
In [464]:
a = np. array([0, 1, 2])
b = np. array([5, 5, 5])
print (a + b)
[5 6 7]
In [465]:
print(a + 5)
[5 6 7]
we can view a + 5 as :
duplicate the value 5 into the array [5, 5, 5]
then add element-wise
this is only a mental model (a simple way of thinking about broadcasting)
numpy does not actually build the duplicated array; it computes the result more efficiently
We can similarly extend this to arrays of higher dimension
In [466]:
a = np. array([0, 1, 2])
M = np. ones((3, 3))
print(M+ a)
[[1. 2. 3.]
[1. 2. 3.]
[1. 2. 3.]]
M + a
a is stretched, or broadcast,
along the first dimension (down the rows)
in order to match the shape of M.
In [32]:
a = np. arange(3)
b = np. arange(3). reshape((3,1))
print(a, '\n')
print(b, '\n')
print(a+ b)
[0 1 2]
[[0]
[1]
[2]]
[[0 1 2]
[1 2 3]
[2 3 4]]
Figure: visualization of broadcasting in a + 5, M + a, and a + b.
The light boxes represent the broadcasted values: again, this extra memory is not actually allocated in the course of the operation, but it can be useful conceptually to imagine that it is.
Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
Rule 2: If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other shape.
Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
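The three rules can be tried out on shapes alone, without building any arrays. The sketch below uses np.broadcast_shapes, which is available in NumPy 1.20 and later; the shape triples are the ones from the examples in this section, and the variable names are chosen for this illustration.

```python
import numpy as np

# (3, 3) and (3,): rule 1 pads (3,) to (1, 3), rule 2 stretches it to (3, 3)
s1 = np.broadcast_shapes((3, 3), (3,))
print(s1)                                    # (3, 3)

# (3,) and (3, 1): rule 1 gives (1, 3); rule 2 stretches both operands to (3, 3)
s2 = np.broadcast_shapes((3,), (3, 1))
print(s2)                                    # (3, 3)

# (3, 2) and (3,): after padding to (1, 3) the last dimensions are 2 and 3 -> rule 3
try:
    np.broadcast_shapes((3, 2), (3,))
    compatible = True
except ValueError as err:
    compatible = False
    print("not compatible:", err)
```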
Non-compatible example
In [468]:
M = np. ones((3, 2))
a = np. arange(3)
M + a # TAQ : result?
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-468-d4adfa68cd62> in <module>
      1 M = np.ones((3, 2))
      2 a = np.arange(3)
----> 3 M + a
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
Broadcasting rules apply to any binary ufunc .

e.g. logaddexp(a, b) = log(exp(a) + exp(b))
In [33]:
a = np. array([0, 1, 2])
M = np. ones((3, 3))
print (np. logaddexp(M, a),'\n')
print (np. logaddexp(M, a. reshape((3,1))))
[[1.31326169 1.69314718 2.31326169]

[1.31326169 1.69314718 2.31326169]
[1.31326169 1.69314718 2.31326169]]
[[1.31326169 1.31326169 1.31326169]

[1.69314718 1.69314718 1.69314718]
[2.31326169 2.31326169 2.31326169]]
Broadcasting Example : Centering an array (i.e. zero mean)

Some data analysis algorithms assume zero-mean data for simplicity
PCA
How to do centering?
In [34]:
X = np. random. random((10, 3))
print(X,'\n')
Xmean = X. mean(axis= 0)
print(Xmean)
[[0.20020551 0.71990961 0.04386778]

[0.24253241 0.58362154 0.19365921]
[0.73131058 0.54673692 0.34738314]
[0.4634587 0.36476761 0.48515853]
[0.83694219 0.67311311 0.08619293]
[0.08002749 0.40544108 0.42883816]
[0.34285624 0.21887864 0.82597284]
[0.91164433 0.76492665 0.4030136 ]
[0.2637624 0.37390141 0.97775962]
[0.40045545 0.65564646 0.33258765]]
[0.44731953 0.5306943 0.41244335]

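Xmean holds the mean of each feature, computed with the mean aggregate along the first dimension. Subtracting it from X is a broadcasting operation that centers the data; the feature means of the result are zero to within floating-point precision: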
In [471]:
X_centered = X - Xmean
print (X_centered. mean(axis= 0))
[ 9.43689571e-17 -5.55111512e-17 4.44089210e-17]
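Centering each row instead of each column needs one extra step, because row means of shape (10,) do not broadcast against a (10, 3) array. One way, sketched here with freshly generated data rather than the X printed above, is keepdims=True:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 3))

# column means have shape (3,): they broadcast against (10, 3) directly
X_col = X - X.mean(axis=0)

# row means would have shape (10,), which is incompatible with (10, 3);
# keepdims=True keeps them as (10, 1) so that they can be stretched along the columns
X_row = X - X.mean(axis=1, keepdims=True)

print(np.allclose(X_col.mean(axis=0), 0))   # True
print(np.allclose(X_row.mean(axis=1), 0))   # True
```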
Broadcasting Example : Plotting a two-dimensional function

plot a function z = f (x, y)
need to evaluate f (x, y) at 50x50 grid points

use broadcasting to compute z = f (x, y)
then plot z using matplotlib, which will be covered later
In [472]:
# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50) # shape=(50,)
y = np.linspace(0, 5, 50).reshape((-1, 1)) # shape=(50,1)
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
print('shape of z = ', z.shape)
print(z)
shape of z = (50, 50)

[[-0.83907153 -0.83470697 -0.8216586 ... 0.8956708 0.68617261
0.41940746]
[-0.83907153 -0.82902677 -0.8103873 ... 0.92522407 0.75321348
0.52508175]
[-0.83907153 -0.82325668 -0.79876457 ... 0.96427357 0.84172689
0.66446403]
...
[-0.83907153 -0.48233077 -0.01646558 ... 0.96449925 0.75196531
0.41982581]
[-0.83907153 -0.47324558 0.00392612 ... 0.92542163 0.68540362
0.37440839]
[-0.83907153 -0.46410908 0.02431613 ... 0.89579384 0.65690314
0.40107702]]
In [473]:
plt. imshow(z, origin= 'lower', extent= [0, 5, 0, 5], cmap= 'viridis')
plt. colorbar();
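For comparison, the same grid can be built explicitly with np.meshgrid, which materializes two full 50x50 coordinate arrays. The sketch below (the helper f and the names XX, YY are introduced for this example) checks that both routes give the same z; broadcasting simply avoids allocating the coordinate grids.

```python
import numpy as np

x = np.linspace(0, 5, 50)                    # shape (50,)
y = np.linspace(0, 5, 50).reshape((-1, 1))   # shape (50, 1)

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

z_broadcast = f(x, y)                        # (50,) with (50, 1) -> (50, 50)

# explicit grids: XX[i, j] = x[j] and YY[i, j] = y[i]
XX, YY = np.meshgrid(x, y.ravel())
z_grid = f(XX, YY)

print(z_broadcast.shape, np.allclose(z_broadcast, z_grid))   # (50, 50) True
```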
2-6. Comparisons, Masks, and Boolean Logic

In [ ]:
x = np.array([1, 2, 3, 4, 5])
b = (x <= 3)
print("x <= 3 : ", b)
print(np.sum(x <= 3)) # TAQ
print(np.count_nonzero(x <= 3)) # TAQ
In [475]:
print(x * b) # masking
print(np.sum(x * b))
print(x[x <= 3])
print(x[b])
print(np.sum(x[x <= 3]))
[1 2 3 0 0]
6
[1 2 3]
[1 2 3]
6
In [476]:
print((3 <= x) & (x <= 4))
print(np.any((3 <= x) & (x <= 4)))
print((x < 3) | (x > 4))
print(np.all((x < 3) | (x > 4)))
[False False True True False]

True
[ True True False False True]
False
In [477]:
b = (x <= 3)
print(x * b) # masking
print(np.sum(x * b))
[1 2 3 0 0]
6
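Masks are combined with the element-wise operators &, | and ~, not with the Python keywords and, or and not. The sketch below shows that & matches np.logical_and and that the keyword form raises an error, because an array with several elements has no single truth value. The parentheses around each comparison are needed because & binds more tightly than <=.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

in_range = (3 <= x) & (x <= 4)               # element-wise AND of two masks
print(np.array_equal(in_range, np.logical_and(3 <= x, x <= 4)))   # True
print(x[~in_range])                          # [1 2 5]

try:
    (3 <= x) and (x <= 4)                    # `and` asks for one truth value per operand
    keyword_works = True
except ValueError as err:
    keyword_works = False
    print("ValueError:", err)
```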
Motivating Example: Sleepless in Seattle

Is Seattle really a rainy city?
Let's get the data first!
daily rainfall data from January 1 to December 31, 2014.
In [37]:
import pandas as pd
# use pandas to extract the precipitation column (PRCP) as a NumPy array
data = pd.read_csv('data/Seattle2014.csv')
rainfall = data['PRCP'].values
rainfall.shape
Out[37]: (365,)
In [43]:
# you may need to install seaborn to set nice plot styles
# >>> conda install seaborn
import seaborn; seaborn.set()
plt.hist(rainfall, 40);
Questions (on Seattle rainfall data in 2014)

number of rainy days
number of rainy days in non-summer
precipitation in summer
precipitation in non-summer
...
In [480]:
print("Number days without rain: ", np.sum(rainfall == 0))
print("Number days with rain: ", np.sum(rainfall > 0))
print("Days with more than 10 mm: ", np.sum(rainfall > 10))
print("Rainy days with < 5 mm: ", np.sum((rainfall > 0) & (rainfall < 5)))
Number days without rain: 215

Number days with rain: 150
Days with more than 10 mm: 120
Rainy days with < 5 mm: 10
In [481]:
# construct a mask of all rainy days
rainy = (rainfall > 0)
# construct a mask of all summer days (June 21st is the 172nd day)
days = np. arange(365)
summer = (days > 172) & (days < 262)
print("Median precip on rainy days in 2014 (mm): ",
np. median(rainfall[rainy]))
print("Median precip on summer days in 2014 (mm): ",
np. median(rainfall[summer]))
print("Maximum precip on summer days in 2014 (mm): ",
np. max(rainfall[summer]))
print("Median precip on non-summer rainy days (mm):",
np. median(rainfall[rainy & ~ summer]))
Median precip on rainy days in 2014 (mm): 49.5

Median precip on summer days in 2014 (mm): 0.0
Maximum precip on summer days in 2014 (mm): 216
Median precip on non-summer rainy days (mm): 51.0
2-7. Fancy Indexing

Indexing np array
simple index: arr[0]
slice: arr[:5]
Boolean mask: arr[arr > 0]
fancy indexing: arr[[1,3,7]]
In [482]:
x = np. random. randint(100, size= 10)
print(x)
[ 7 8 89 16 52 87 72 34 4 0]
In [ ]:
ind = [3, 7, 4]
print (x[ind]) # TAQ : result?
In [ ]:
ind = np. array([[3, 7],
[4, 5]])
x[ind] # TAQ : result?
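With fancy indexing, the shape of the result follows the shape of the index array, not the shape of the array being indexed. The sketch below re-creates the x printed above as a literal so that it runs on its own:

```python
import numpy as np

x = np.array([7, 8, 89, 16, 52, 87, 72, 34, 4, 0])

ind = [3, 7, 4]
print(x[ind])                 # [16 34 52]

ind2 = np.array([[3, 7],
                 [4, 5]])
picked = x[ind2]
print(picked)                 # [[16 34]
                              #  [52 87]]
print(picked.shape == ind2.shape)   # True
```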
In [485]:
X = np.arange(12).reshape((3, 4))
X
Out[485]: array([[ 0, 1, 2, 3],
                 [ 4, 5, 6, 7],
                 [ 8, 9, 10, 11]])
In [486]:
row = np. array([0, 1, 2])
col = np. array([2, 1, 3])
X[row, col] # TAQ : result?
Out[486]: array([ 2, 5, 11])
In [487]:
row = np. array([0, 2]). reshape((2,1))
col = np. array([2, 1, 3])
X[row, col] # TAQ : result? Hint : broadcasting is applied
Out[487]: array([[ 2, 1, 3],

[10, 9, 11]])
Combined Indexing
combining simple, slice, mask, and fancy index
In [488]:
print(X)
col = [2,0,1]
print(X[2, col])
print(X[1:, col])
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[10 8 9]
[[ 6 4 5]
[10 8 9]]
In [489]:
row = np. array([0, 2]). reshape((2,1))
col_mask = np.array([True, False, True, False])
X[row, col_mask]
Out[489]: array([[ 0, 2],
                 [ 8, 10]])
Example: Selecting Random Points

consider a set of N points in D dimensions
we generate N = 100 points in 2D
bi-variate normal
plot them using matplotlib
randomly select 20 points from them
mark selected points in different shape
In [45]:
mean = [0, 0]
cov = [[1, 2],
[2, 5]]
X = np. random. multivariate_normal(mean, cov, 100)
X. shape
Out[45]: (100, 2)
In [46]:
import seaborn; seaborn.set() # for plot styling
plt. scatter(X[:, 0], X[:, 1]);
In [47]:
indices = np.random.choice(X.shape[0], 20, replace=False)
print (indices)
selection = X[indices] # fancy indexing here
print (selection. shape)
[ 2 81 12 6 94 68 82 30 1 23 37 3 64 21 11 45 83 67 92 71]
(20, 2)
In [48]:
plt. scatter(X[:, 0], X[:, 1], alpha= 0.3)
plt. scatter(selection[:, 0], selection[:, 1],
facecolor= 'red', s= 7);
Modifying Values with Fancy Indexing

In [494]:
x = np. arange(10)
idx = np. array([2, 1, 8, 4])
x[idx] = 99
print(x)
x[idx] -= 10
print(x)
[ 0 99 99 3 99 5 6 7 99 9]
[ 0 89 89 3 89 5 6 7 89 9]
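We can use any assignment-type operator for this, as x[idx] -= 10 above shows. A plain integer index works the same way:
In [495]:
i = 9 # assumption: i is not defined in this section; i = 9 reproduces the output below
x[i] -= 10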
print(x)
[ 0 89 89 3 89 5 6 7 89 -1]
Avoid duplication in fancy index

may cause unexpected results
use at() method if duplication is unavoidable
In [496]:
# duplication in fancy index
x = np. zeros(10)
x[[0, 0]] = [4, 6]
print(x)
[6. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
In [497]:
# duplication in fancy index
idx = [2, 3, 3, 4, 4, 4]
x[idx] += 1
print(x)
[6. 0. 1. 1. 1. 0. 0. 0. 0. 0.]
In [498]:
x = np. zeros(10)
np. add. at(x, idx, 1)
print(x)
[0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]
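The difference between the buffered update and np.add.at can be seen side by side. For the special case of counting non-negative integer indices, np.bincount gives the same numbers; bringing it in here is a suggestion for illustration, not something the lecture uses.

```python
import numpy as np

idx = [2, 3, 3, 4, 4, 4]

buffered = np.zeros(10)
buffered[idx] += 1            # repeated indices are incremented only once
print(buffered)               # [0. 0. 1. 1. 1. 0. 0. 0. 0. 0.]

counts = np.zeros(10)
np.add.at(counts, idx, 1)     # unbuffered: every occurrence is accumulated
print(counts)                 # [0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]

binned = np.bincount(idx, minlength=10)
print(binned)                 # [0 0 1 2 3 0 0 0 0 0]
```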
Sorting
np.sort
np.argsort
In [499]:
x = np. random. random(5)
y = np. sort(x)
print(x) # x is not changed
print(y)
x. sort() # in-place sort
print(x)
[0.59488531 0.19874637 0.33881144 0.23509604 0.80192003]

[0.19874637 0.23509604 0.33881144 0.59488531 0.80192003]
[0.19874637 0.23509604 0.33881144 0.59488531 0.80192003]
In [500]:
height = 150 + 40 * np.random.random(5)
print('height=', height)
money = 100 * np.random.random(5)
print('money =', money)
idx = np.argsort(height) # idx in order of height
print('index =', idx)
print('h[idx]=', height[idx]) # fancy index
print('m[idx]=', money[idx]) # fancy index
height= [189.40070616 161.84410721 163.2017855 167.6655901 184.75984166]
money = [ 3.3759033 12.44325455 84.22738284 5.61546723 45.12811488]
index = [1 2 3 4 0]
h[idx]= [161.84410721 163.2017855 167.6655901 184.75984166 189.40070616]
m[idx]= [12.44325455 84.22738284 5.61546723 45.12811488 3.3759033 ]
Partitioning
a complete sort is not always needed
e.g. when we only want to find the k smallest values in the array
np.partition(x, k) :
puts the k smallest values to the left of position k
and the remaining values to the right, both sides in arbitrary order:
In [501]:
x = np. array([7, 2, 3, 1, 6, 5, 4])
y = np. partition(x, 3)
print (y)
idx = np. argpartition(x, 3)
print (idx)
print (x[idx])
[2 1 3 4 6 5 7]
[1 3 2 6 4 5 0]
[2 1 3 4 6 5 7]
In [502]:
X = np. random. randint(0, 10, (4, 6))
print (X)
print (np. partition(X, 2, axis= 1))
[[5 8 4 2 0 0]
[6 5 1 9 6 8]
[8 4 4 1 2 1]
[0 4 1 0 6 7]]
[[0 0 2 4 5 8]
[1 5 6 9 6 8]
[1 1 2 8 4 4]
[0 0 1 4 6 7]]
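A common pattern built on this is to partition first and then sort only the k values that were selected. A minimal sketch, using the same x as in the first partition cell:

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

idx = np.argpartition(x, k)[:k]      # indices of the k smallest values, in no particular order
smallest = np.sort(x[idx])           # sort only those k values
print(smallest)                      # [1 2 3]
```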
Example: k-Nearest Neighbors

Randomly create 10 points in 2D
In [509]:
N = 10
X = np. random. rand(N, 2)
In [510]:
import seaborn; seaborn.set() # Plot styling
plt.scatter(X[:, 0], X[:, 1], s=100);
In [511]:
# squared distance matrix (NxN)
dist_sq = np.sum((X.reshape(N, 1, 2) - X.reshape(1, N, -1)) ** 2, axis=-1)
dist = np.sqrt(dist_sq)
In [512]:
# the above can be done using scipy.spatial.distance
from scipy.spatial.distance import pdist, squareform
# pdist(.) : pairwise distance, metric = 'euclidean' by default
# squareform(.) : nxn matrix form
dist = squareform(pdist(X))
In [513]:
K = 2
knn0 = np. argpartition(dist, K + 1, axis= 1)
In [514]:
plt.scatter(X[:, 0], X[:, 1], s=100)
# draw lines from each point to its two nearest neighbors
# columns 0..K of knn0 hold the K+1 closest points (the point itself included)
# in arbitrary order, so keep all of them; the line from a point to itself has zero length
knn = knn0[:, :K + 1]
for i in range(N):
    for j in knn[i]:
        # plot a line from X[i] to X[j]
        # use some zip magic to make it happen:
        plt.plot(*zip(X[j], X[i]), color='black')
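If the neighbors are needed in order of distance, a full argsort is one way to get them: column 0 is then the point itself and columns 1 to K are its nearest neighbors. The sketch below, with freshly generated points and names chosen for this example, also checks that argpartition selects the same K+1 indices, only without any guaranteed order among them.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 2
X = rng.random((N, 2))

# pairwise squared distances via broadcasting: (N, 1, 2) - (1, N, 2) -> (N, N, 2)
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
dist_sq = np.sum(diff ** 2, axis=-1)

# full sort: column 0 is the point itself (distance 0), columns 1..K its neighbors
order = np.argsort(dist_sq, axis=1)
neighbors = order[:, 1:K + 1]

# argpartition picks the same K+1 closest indices, in arbitrary order
part = np.argpartition(dist_sq, K + 1, axis=1)[:, :K + 1]
same_sets = np.array_equal(np.sort(part, axis=1), np.sort(order[:, :K + 1], axis=1))
print(same_sets)            # True
print(neighbors.shape)      # (10, 2)
```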