
10 Numpy functions you should know


with data science and artificial intelligence examples

Amanda Iglesias Moreno


Jan 2 · 9 min read

Numpy is a Python package for scientific computing that provides high-performance multidimensional array objects. This library is widely used for numerical analysis, matrix computations, and mathematical operations. In this article, we present 10 useful numpy functions along with data science and artificial intelligence applications. Let’s get started! 🍀

1. numpy.linspace
The numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0) function returns evenly spaced numbers over a specified interval defined by the first two arguments of the function (start and stop, both required). The number of samples generated is specified by the third argument num. If omitted, 50 samples are generated. One important thing to bear in mind while working with this function is that the stop element is included in the returned array (by default endpoint=True), unlike in the built-in Python function range.

# Linspace function
import numpy as np

# array with 11 elements, last element included
np.linspace(0, 10, 11)
# array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

# array with 11 elements, last element not included
np.linspace(0, 10, 11, endpoint=False)
# array([0., 0.90909091, 1.81818182, 2.72727273, 3.63636364, 4.54545455,
#        5.45454545, 6.36363636, 7.27272727, 8.18181818, 9.09090909])


Example
The linspace function can be used to generate evenly spaced samples for the x-axis. For instance, if we want to plot a mathematical function, we can easily generate samples for the x-axis by using the numpy.linspace function. In reinforcement learning, we can employ this function for discretization purposes: given the lowest and highest values of a continuous space (states or actions), it generates a uniformly spaced discrete space.

The following plot shows 4 mathematical functions: (1) Sine, (2) Cosine, (3)
Exponential, and (4) Logarithmic function. To generate x-axis data, we employ the
linspace function, generating 111 data points from 0 to 100, both included. You may
notice that for generating the mathematical functions we have used Numpy again. We
can consult the documentation to observe the wide range of mathematical functions
that Numpy provides :)
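
As a minimal sketch of how the data behind such a plot could be generated, the snippet below builds the x-axis with linspace and evaluates the four functions on it. The variable names are ours, and skipping x = 0 for the logarithm is an assumption of this sketch, since log(0) is undefined.

```python
import numpy as np

# 111 evenly spaced samples from 0 to 100, both endpoints included
x = np.linspace(0, 100, 111)

y_sin = np.sin(x)
y_cos = np.cos(x)
y_exp = np.exp(x)

# log(0) is undefined, so this sketch skips the first sample (x = 0)
y_log = np.log(x[1:])
```

Each of these arrays can then be passed to a plotting library together with x (or x[1:] for the logarithm).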

2. numpy.digitize
Maybe you have never heard about this function, but it can be really useful when working with continuous spaces in reinforcement learning. The numpy.digitize(x, bins, right=False) function takes two required arguments, (1) an input array x and (2) an array of bins, and returns the indices of the bins to which each value in the input array belongs. Confusing? Let’s see an example 👌

Example
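A minimal sketch consistent with the five bins listed below, assuming the bin edges are 0, 1, 2, and 3 (the sample inputs are ours, chosen so that one value falls in each bin):

```python
import numpy as np

# Bin edges (assumed): they split the number line into 5 bins, indexed 0 to 4
bins = np.array([0, 1, 2, 3])

# Illustrative inputs, one per bin
x = np.array([-0.5, 0.5, 1.5, 2.5, 3.5])
indices = np.digitize(x, bins)
# indices -> array([0, 1, 2, 3, 4])

# A single value works too: 0.5 falls in the bin 0 <= x < 1
index = np.digitize(0.5, bins)  # 1
```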
In the code above, we have 5 bins in total:

x < 0 → Index 0
0 ≤ x < 1 → Index 1
1 ≤ x < 2 → Index 2
2 ≤ x < 3 → Index 3
3 ≤ x → Index 4

Therefore, if we provide as an input 0.5, the function returns 1, since that is the index of
the bin to which 0.5 belongs.

In reinforcement learning, we can discretize state spaces by using uniformly-spaced grids. Discretization allows us to apply algorithms designed for discrete spaces such as Sarsa, Sarsamax, or Expected Sarsa to continuous spaces.

Imagine we have the following continuous space. The agent can be in any position (x,y), where 0≤x≤5 and 0≤y≤5. We can discretize the position of the agent by providing a tuple, indicating the grid cell where the agent is located.

We can easily achieve this discretization by using the numpy.digitize function as follows:
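
A minimal sketch of this discretization, assuming bin edges at 1, 2, 3, and 4 on both axes. The helper name discretize and the sample positions are ours, for illustration only.

```python
import numpy as np

# Bin edges shared by both axes (assumed): 5 cells per axis, indexed 0 to 4
bins = np.array([1, 2, 3, 4])

def discretize(position, bins=bins):
    """Map a continuous (x, y) position to a tuple of grid indices."""
    return tuple(int(i) for i in np.digitize(position, bins))

cell_a = discretize((0.5, 3.2))  # (0, 3)
cell_b = discretize((4.7, 1.0))  # (4, 1)
```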

We consider that any value lower than 1 belongs to bin index 0 and any value larger than or equal to 4 belongs to bin index 4. And voilà! We have transformed a continuous space into a discrete one.

3. numpy.repeat
The numpy.repeat(a, repeats, axis=None) function repeats the elements of an array.
The number of repetitions is specified by the second argument repeats.
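
A short sketch of the basic behaviour (the array values are ours, for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])

# Repeat every element twice
twice = np.repeat(a, 2)              # array([1, 1, 2, 2, 3, 3])

# A different number of repetitions per element
varying = np.repeat(a, [1, 2, 3])    # array([1, 2, 2, 3, 3, 3])

# Repeat a single value to build a constant array
years = np.repeat(2017, 4)           # array([2017, 2017, 2017, 2017])
```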

Example
Let’s say we have two different data frames, containing the sales in 2017 and 2018, but
we want only one data frame, including all the information.
sales in 2017

sales in 2018

Before merging both data frames, we need to add a column, specifying the year in which
the products were sold. We can add this information by using the numpy.repeat
function. Subsequently, we concatenate both data frames by using the pandas.concat
function.
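
The steps above can be sketched as follows. The two data frames are hypothetical stand-ins (the product names, column names, and figures are made up), since the original sales tables are not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data; products and figures are made up for illustration
sales_2017 = pd.DataFrame({"product": ["A", "B", "C"], "units": [10, 25, 7]})
sales_2018 = pd.DataFrame({"product": ["A", "B", "C"], "units": [12, 30, 9]})

# Add a year column with one repeated value per row
sales_2017["year"] = np.repeat(2017, len(sales_2017))
sales_2018["year"] = np.repeat(2018, len(sales_2018))

# Stack both data frames and rebuild the index
sales = pd.concat([sales_2017, sales_2018], ignore_index=True)
```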

sales

4. numpy.random
4.1. numpy.random.randint
The numpy.random.randint(low, high=None, size=None, dtype='l') function returns random integers from the interval [low,high). If the high parameter is omitted (None), the random numbers are selected from the interval [0,low). By default, a single random integer is returned. To generate an array (ndarray) of random integers, the shape of the array is provided in the parameter size.
Example
This function can be used to simulate random events such as tossing a coin or rolling a die, as shown below.
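
A minimal sketch of both simulations. Encoding the coin as 0 and 1 is our choice for illustration.

```python
import numpy as np

# Coin toss: 0 or 1 (high is omitted, so the interval is [0, 2))
coin = np.random.randint(2)

# Die roll: an integer from 1 to 6 (7 is excluded)
die = np.random.randint(1, 7)

# Ten die rolls at once
rolls = np.random.randint(1, 7, size=10)
```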

4.2. numpy.random.choice
The numpy.random.choice(a, size=None, replace=True, p=None) function returns a random sample from a given array. By default, a single value is returned. To return more elements, the output shape can be specified in the parameter size, as we did before with the numpy.random.randint function.

Example
The random events shown above can also be simulated by using the numpy.random.choice function.
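
A minimal sketch of the same coin and die simulations with numpy.random.choice (the labels "heads" and "tails" are ours):

```python
import numpy as np

# Coin toss: pick one of two labels
coin = np.random.choice(["heads", "tails"])

# Die roll: pick one value from 1 to 6
die = np.random.choice(np.arange(1, 7))

# Ten die rolls at once
rolls = np.random.choice(np.arange(1, 7), size=10)
```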

By default, elements have equal probability of being selected. To assign different probabilities to each element, an array of probabilities p can be provided. Using this parameter p, we can simulate a biased coin flip as follows:
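
A minimal sketch of a biased coin, assuming a 70% chance of heads (the probabilities are ours, chosen only for illustration):

```python
import numpy as np

# 1000 flips of a coin that lands on heads 70% of the time (assumed bias)
flips = np.random.choice(["heads", "tails"], size=1000, p=[0.7, 0.3])

# Share of heads in the sample; it should be close to 0.7
heads_share = np.mean(flips == "heads")
```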

4.3. numpy.random.binomial
We can simulate a wide variety of statistical distributions by using numpy such as
normal, beta, binomial, uniform, gamma, or poisson distributions.

The numpy.random.binomial(n, p, size=None) function draws samples from a binomial distribution. The binomial distribution is used when there are two mutually exclusive outcomes, providing the number of successes in n trials with a probability of success p on a single trial.
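
A minimal sketch, assuming a fair coin tossed 10 times per experiment (the values of n, p, and size are ours):

```python
import numpy as np

# Number of heads in 10 tosses of a fair coin (one experiment)
heads = np.random.binomial(n=10, p=0.5)

# Repeat the experiment 1000 times
experiments = np.random.binomial(n=10, p=0.5, size=1000)

# The sample mean should be close to the theoretical mean n * p = 5
sample_mean = experiments.mean()
```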

I recommend reading the documentation to discover the wide range of functions that the numpy.random module provides.

5. numpy.polyfit
The numpy.polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False) function returns the coefficients of a polynomial of degree deg that fits the points (x,y), minimizing the squared error.

This function can be very useful in linear regression problems. Linear regression models the relationship between a dependent variable and an independent variable, obtaining a line that best fits the data.

y = a + bx

where x is the independent variable, y is the dependent variable, b is the slope, and a is the intercept. To obtain both coefficients a and b, we can use the numpy.polyfit function as follows.

Example
Let’s say we have a data frame containing the heights and weights of 5000 men.

As we can observe, both variables present a linear relation.

We obtain the best-fit linear equation with the numpy.polyfit function in the following
manner:
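
A minimal sketch of the fit. The original data set is not reproduced here, so this example uses synthetic heights and weights generated around the line weight = 5.96 * height - 224.50; the noise level, the random seed, and the variable names are assumptions.

```python
import numpy as np

# Synthetic stand-in for the height/weight data of 5000 men (assumed values)
rng = np.random.default_rng(0)
height = rng.normal(loc=69, scale=3, size=5000)
weight = 5.96 * height - 224.50 + rng.normal(scale=10, size=5000)

# Degree 1 fit: coefficients are returned from highest degree to constant term
slope, intercept = np.polyfit(height, weight, deg=1)
```

Because the coefficients come back ordered from the highest degree down, the slope is the first element and the intercept the second.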
The function returns the slope (5.96) and intercept (-224.50) of the linear model. Now, we can employ the obtained model (y=5.96x-224.50) to predict the weight of a man (unseen data). This prediction can be obtained by using the numpy.polyval function.

6. numpy.polyval
The numpy.polyval(p, x) function evaluates a polynomial at specific values. Previously, we obtained a linear model to predict the weight of a man (weight=5.96*height-224.50) by using the numpy.polyfit function. Now, we use this model to make predictions with the numpy.polyval function. Let’s say we want to predict the weight of a man 70 inches tall. As arguments, we provide the polynomial coefficients (obtained with polyfit) from highest degree to the constant term (p=[5.96,-224.50]), and a number at which to evaluate p (x=70).
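
A minimal sketch of this prediction, using the rounded coefficients quoted above:

```python
import numpy as np

# Coefficients from highest degree to constant term: slope, then intercept
p = [5.96, -224.50]

# Evaluate the polynomial at x = 70 inches: 5.96 * 70 - 224.50 = 192.7
predicted_weight = np.polyval(p, 70)

# polyval also accepts an array of values
predictions = np.polyval(p, np.array([65, 70, 75]))
```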

The following plot shows the regression line as well as the predicted weight.

7. numpy.nan
The Numpy library includes several constants such as not a number (NaN), infinity (inf), and pi. In computing, not a number is a special floating-point value that represents an undefined result. We can use not a number to represent missing or null values in Pandas. Unfortunately, dirty data sets contain null values under other names (e.g. Unknown, -, and n/a), making it difficult to detect and drop them.

Example
Let’s say we have the following data set, containing information about houses in the city of Madrid (this data set is reduced for explanatory purposes).

Data frame with non-standard missing values

We can easily analyze missing values by using the pandas.DataFrame.info method. This method prints information about the data frame including column types, number of non-null values, and memory usage.

Output of the info method

As we can observe, the info function does not detect unexpected null values (Unknown
and -). We have to convert those values into null values that Pandas can detect. We can
achieve that by using the numpy.nan constant.
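
A minimal sketch of this conversion. The data frame below is a hypothetical stand-in: only the column names num_bedrooms and num_balconies come from the article, and the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical housing data with non-standard missing values
houses = pd.DataFrame({
    "num_bedrooms": [3, "Unknown", 2, 4],
    "num_balconies": [1, 0, "-", 2],
})

# Replace the non-standard markers with NaN so Pandas treats them as missing
houses = houses.replace(["Unknown", "-"], np.nan)

# Count missing values per column
missing = houses.isna().sum()
```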
Before analysing the data, we have to handle missing values. To do so, there are different approaches: (1) assign missing values manually (in case we know the data), (2) replace missing values with the mean/median value, or (3) delete rows with missing data, among other approaches.

After replacing (Unknown and -) with standard null values, two missing values are
detected in columns num_bedrooms and num_balconies. Now, those missing values can
be easily deleted by using the pandas.DataFrame.dropna function (approach 3).
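
A minimal sketch of approach 3, again on a hypothetical data frame (the values are made up; only the column names come from the article):

```python
import numpy as np
import pandas as pd

# Hypothetical housing data with one missing value in each column
houses = pd.DataFrame({
    "num_bedrooms": [3, np.nan, 2, 4],
    "num_balconies": [1, 0, np.nan, 2],
})

# Drop every row that contains at least one missing value
clean = houses.dropna()
# Rows 1 and 2 are removed; rows 0 and 3 remain
```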

Data frame before dropping null values

Data frame after dropping null values

8. numpy.argmax
The numpy.argmax(a, axis=None, out=None) function returns the indices of the
maximum values along an axis.

In a 2d array, we can easily obtain the index of the maximum value (by default, an index into the flattened array) as follows:

We can obtain the indices of maximum values along a specified axis by providing 0 or 1 to the axis parameter.
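
A minimal sketch of both cases on a small 2d array (the values are ours, for illustration):

```python
import numpy as np

a = np.array([[1, 7, 3],
              [9, 2, 5]])

# Default (axis=None): index of the maximum in the flattened array
flat_index = np.argmax(a)                        # 3

# Convert the flat index back to (row, column)
row_col = np.unravel_index(flat_index, a.shape)  # (1, 0)

# Index of the maximum in each column (axis=0) and in each row (axis=1)
per_column = np.argmax(a, axis=0)                # array([1, 0, 1])
per_row = np.argmax(a, axis=1)                   # array([1, 0])
```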

Example
The numpy.argmax function can be very useful in reinforcement learning tasks. The Q-table is an action-value function estimation that contains the expected return for each state-action pair, assuming the agent is in state s, and takes action a, following policy π until the end of the episode.

Q table

We can easily obtain the policy by choosing the action a that provides maximum
expected return for each state s.

Policy from the Q table

In the above example, the numpy.argmax function returns the policy: state 0 → action
0, state 1 → action 2, and state 2 → action 1.
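
A minimal sketch with a hypothetical Q-table. The values are made up, chosen so that the resulting policy matches the one described above.

```python
import numpy as np

# Hypothetical Q-table: one row per state, one column per action
q_table = np.array([
    [0.9, 0.1, 0.3],   # state 0
    [0.2, 0.4, 0.8],   # state 1
    [0.1, 0.7, 0.5],   # state 2
])

# For each state (row), pick the action (column) with the highest expected return
policy = np.argmax(q_table, axis=1)  # array([0, 2, 1])
```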

9. numpy.squeeze
The numpy.squeeze(a, axis=None) function removes axes of length one from the shape of an array. The argument axis specifies the axis we want to squeeze out. If the length of the selected axis is greater than 1, a ValueError is raised. An example of how to use the numpy.squeeze function is shown below.
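
A minimal sketch, assuming an array of shape (1, 3, 1) (the values are ours):

```python
import numpy as np

a = np.array([[[1], [2], [3]]])    # shape (1, 3, 1)

all_squeezed = np.squeeze(a)       # shape (3,): every length-one axis removed
axis0 = np.squeeze(a, axis=0)      # shape (3, 1)
axis2 = np.squeeze(a, axis=2)      # shape (1, 3)

# Axis 1 has length 3, so it cannot be squeezed out
try:
    np.squeeze(a, axis=1)
    raised = False
except ValueError:
    raised = True
```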

As we can observe, only axes 0 and 2 can be removed since both have length 1. Axis 1 has 3 elements; therefore, a ValueError is raised.

Example
Pytorch is an open source machine learning library based on the Torch library. The library provides multiple data sets such as MNIST, Fashion-MNIST, or CIFAR that we can use for training neural networks. First, we download the data set (e.g. MNIST) with the torchvision.datasets module. Then, we create an iterable by using torch.utils.data.DataLoader. This iterable is passed to the iter() function, generating an iterator. Finally, we get each element of the iterator by using the next() function. Those elements are tensors of shape [N,C,H,W], where N is the batch size, C the number of channels, H the height of input planes in pixels, and W the width in pixels.

To visualize an element of the previous batch, we have to eliminate the first axis since
the matplotlib.pyplot.imshow function accepts as an input an image of shape (H,W).
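
A minimal sketch of this step that avoids the Pytorch dependency: a random Numpy array stands in for one batch of single-channel 28 by 28 images, and the batch size of 64 is an assumption.

```python
import numpy as np

# Stand-in for one batch of MNIST images, shape [N, C, H, W] = [64, 1, 28, 28]
# (random values; a real batch would come from the DataLoader)
batch = np.random.rand(64, 1, 28, 28)

first = batch[0]            # shape (1, 28, 28): channel axis still present
image = np.squeeze(first)   # shape (28, 28): ready for imshow
```

With a real Pytorch tensor, the element would first need to be converted to a Numpy array (for instance with its numpy() method) before calling numpy.squeeze.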

First image of the batch

10. numpy.histogram
The numpy.histogram(a, bins=10, range=None, normed=None, weights=None, density=None) function computes the histogram of a set of data. The function returns 2 values: (1) the frequency counts, and (2) the bin edges.
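
A minimal sketch on synthetic data (the heights are randomly generated; the mean, spread, and seed are assumptions):

```python
import numpy as np

# Synthetic heights in inches (assumed distribution, for illustration only)
rng = np.random.default_rng(0)
heights = rng.normal(loc=69, scale=3, size=5000)

counts, bin_edges = np.histogram(heights, bins=10)
# counts has one entry per bin (10); bin_edges has one more entry (11)
```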

Example
The following data frame contains the height of 5000 men. We create a histogram plot, passing kind='hist' to the plot method.

By default, the histogram method breaks up the data set into 10 bins. Notice that the x-axis labels do not match the bin edges. This can be fixed by passing in an xticks parameter, containing the list of bin edges, in the following manner:
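
A minimal sketch of this fix on synthetic data. The column name Height, the generated values, and the non-interactive Agg backend are assumptions made so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic heights in inches (assumed distribution, for illustration only)
rng = np.random.default_rng(0)
df = pd.DataFrame({"Height": rng.normal(loc=69, scale=3, size=5000)})

# Compute the bin edges with numpy.histogram and use them as x-axis ticks
counts, bin_edges = np.histogram(df["Height"], bins=10)
ax = df["Height"].plot(kind="hist", bins=10, xticks=bin_edges)
```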

Thanks for reading!! 🍀 🍀 💪 And use Numpy!
