NumPy & Pandas

Introduction: What is NumPy? Pandas?
NumPy and Pandas are Python libraries that are incredibly useful
for all data scientists. NumPy is used for scientific computing, and
its main feature is its high-performance implementation of arrays
and matrices. You’ll find NumPy extremely useful in working with
large-scale, multi-dimensional data, and you can use it in
conjunction with many popular machine learning libraries such as
scikit-learn and TensorFlow.
Pandas is a library used for data manipulation, and its main feature
is the use of DataFrame objects to work with data in an easy-to-use
table format. Pandas is built on top of the functionality provided by
NumPy.
Assuming you have Python and pip (the Python package installer), you
can easily install NumPy and Pandas from your command line.
pip install numpy
pip install pandas
And with that, let’s delve into a quick overview of NumPy and
Pandas!
Intro to NumPy
Like any regular Python package, you’ll need to import NumPy
before you do anything with it.
import numpy as np
Creating NumPy arrays

There are several ways to create an array in NumPy, such as
np.array, np.zeros, np.ones, etc. Each has its own purpose.
np.array allows you to pass in a regular Python list in order to
create a NumPy array. Note that the object you get is different from
the Python list type.
>>> a = np.array([43, 56, 35, 3])
>>> a
array([43, 56, 35,  3])
>>> b = [43, 56, 35, 3]
>>> type(a)
<class 'numpy.ndarray'>
>>> type(b)
<class 'list'>
Note that you can create multidimensional arrays as well!
>>> c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> c
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
np.ones creates a NumPy array full of ones. You can specify the
shape of the array too (shape describes the dimensions of the array
object. I’ll discuss this in just a moment). np.zeros works the same
way, but with zeros!
>>> d = np.ones((2, 2))
>>> d
array([[1., 1.],
       [1., 1.]])
>>> d = np.zeros((2, 2))
>>> d
array([[0., 0.],
       [0., 0.]])
np.linspace takes a start point, end point, and the number of
elements you want in the array. It then makes an array with evenly
spaced numbers.
>>> e = np.linspace(0, 5, 3)
>>> e
array([0. , 2.5, 5. ])  # THREE evenly spaced numbers
>>> e = np.linspace(0, 5, 4)
>>> e
array([0.        , 1.66666667, 3.33333333, 5.        ])  # FOUR of 'em
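To make the spacing rule concrete, here is a small sketch of how the gap between points follows from the arguments. The variable names (points, step, manual) are my own, and the by-hand formula assumes the default behaviour where both end points are included.

```python
import numpy as np

# np.linspace includes both end points by default, so num points
# leave (num - 1) equal gaps between start and stop.
start, stop, num = 0, 5, 4
points = np.linspace(start, stop, num)
step = (stop - start) / (num - 1)

# The same points rebuilt by hand from the step size.
manual = start + step * np.arange(num)

# retstep=True also returns the spacing that linspace used.
_, returned_step = np.linspace(start, stop, num, retstep=True)

print(points)
print(step, returned_step)
```

If you know the step size rather than the number of points, np.arange is usually the better fit.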
NumPy arrays have some attributes which are very useful to know.
These include:
1. ndim: the number of dimensions (axes) of the array
2. shape: a tuple of integers indicating the size of the array in
each dimension
3. size: the total number of elements in the array
4. dtype: the type of the elements in the array (e.g. int64,
float64, bool)
Let’s look at one of the simple arrays we created earlier.
>>> c
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> c.ndim
2
>>> c.shape
(3, 3)
>>> c.size
9
>>> c.dtype
dtype('int32')  # the default integer type depends on your platform; int64 is also common
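These attributes are related to each other, which the following sketch checks directly. The array c64 is a made-up example, and passing dtype explicitly is just one way to avoid depending on the platform default.

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# ndim is the length of the shape tuple.
assert c.ndim == len(c.shape)
# size is the product of the entries in shape.
assert c.size == np.prod(c.shape)

# The dtype can be requested explicitly instead of relying on the default.
c64 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
print(c64.shape, c64.dtype)  # (2, 3) int64
```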
Accessing array elements
NumPy array elements can be accessed using an indexing scheme
similar to good ole Python’s, including slice notation. Let’s say we
have an array A.
• A[2] will give the element at index 2. Remember indices start
at 0.
• A[2:5] will give the elements from index 2 to 4. The endpoint is not
inclusive.
• A[:3] will give all elements from the beginning up to, but not
including, index 3. Similarly, A[3:] will give all elements from
index 3 to the very end.
• You can specify multiple dimensions by using a comma (“,”) in
between the ranges you want to specify. For example, A[2:4, 1:4].
Let’s look at this in action. We’ve got an array c.
>>> c
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
We can access a single element two ways.

>>> c[2][2]
9
>>> c[2,2]
9
You can also access a full row or column. Note that putting just ‘:’
refers to the entire range of values from beginning to end.
>>> c[1]
array([4, 5, 6])
>>> c[:,1]
array([2, 5, 8])
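The multi-dimensional slice A[2:4, 1:4] mentioned earlier needs an array bigger than c, so here is a sketch on a made-up 5x5 array. The array A and the names block, last_row and last_col are only for illustration.

```python
import numpy as np

# A 5x5 array holding 0..24, filled row by row.
A = np.arange(25).reshape(5, 5)

# Rows 2-3 and columns 1-3; both end points are excluded.
block = A[2:4, 1:4]
print(block)
# [[11 12 13]
#  [16 17 18]]

# Negative indices count from the end, just like Python lists.
last_row = A[-1]
last_col = A[:, -1]
```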
For more info on indexing NumPy arrays, see the indexing section of
the NumPy documentation.
Array operations
There are a lot of things you can do with NumPy arrays, and I won’t get
the chance to cover them all. I’ll start with the basics.
Operations with scalar values apply the operation to each element
of the array.
>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])
>>> a + 1
array([2, 3, 4])
>>> a * 2
array([2, 4, 6])
Operations between two arrays will be element-wise.

>>> a
array([1, 2, 3])
>>> b
array([4, 5, 6])
>>> a * b
array([ 4, 10, 18])
Make sure their shapes are compatible though! Two arrays of different
lengths, like these, can’t be combined.
>>> x
array([1, 2, 3])
>>> y
array([1, 2])
>>> x + y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (3,) (2,)
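The error message talks about “broadcasting”, which is NumPy’s rule for combining arrays whose shapes differ but are compatible. Here is a minimal sketch of it; the arrays row, col and grid are made up for illustration.

```python
import numpy as np

row = np.array([1, 2, 3])          # shape (3,)
col = np.array([[10], [20]])       # shape (2, 1)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])       # shape (2, 3)

# (2, 3) + (3,): the 1-D array is added to every row.
print(grid + row)
# [[2 4 6]
#  [5 7 9]]

# (2, 3) + (2, 1): the single column is stretched across all columns.
print(grid + col)
# [[11 12 13]
#  [24 25 26]]

# Shapes (3,) and (2,) differ and neither has length 1, so this fails.
try:
    row + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```

Roughly speaking, two dimensions are compatible when they are equal or when one of them is 1.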
You can also check for equality (or any other comparison). You can
do it element-wise and get back an array of boolean values. You
can also check whether two arrays are equal as a whole
using np.array_equal().
>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> a == b
array([False,  True, False,  True])
>>> a > b
array([False, False,  True, False])
>>> np.array_equal(a, b)
False
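A boolean array like the one above is more than a printout: you can reduce it or index with it. This is a small sketch reusing a and b; the names mask, any_equal, all_equal, num_equal and matches are mine.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])

mask = a == b                  # array([False,  True, False,  True])

# A boolean array can be reduced to a single answer...
any_equal = mask.any()         # True: at least one position matches
all_equal = mask.all()         # False: same verdict as np.array_equal(a, b) here
num_equal = mask.sum()         # 2: each True counts as 1

# ...or used as an index to keep only the matching elements.
matches = a[mask]              # array([2, 4])
```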
You can transpose a matrix too.
>>> a = np.array([[1, 1], [0, 0]])
>>> a
array([[1, 1],
       [0, 0]])
>>> a.T
array([[1, 0],
       [1, 0]])
You can also reshape an array by specifying a tuple, which will be
the shape of the resulting array.
>>> a.reshape((4,1))
array([[1],
       [1],
       [0],
       [0]])
>>> a.reshape((1,4))
array([[1, 1, 0, 0]])
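One thing worth knowing about reshape is that the new shape has to hold exactly the same number of elements. The sketch below shows that, plus the -1 shorthand that lets NumPy work out one dimension for you; the names column and flat are only illustrative.

```python
import numpy as np

a = np.array([[1, 1],
              [0, 0]])

# -1 asks NumPy to infer that dimension from the total size (4 here).
column = a.reshape((-1, 1))    # shape (4, 1)
flat = a.reshape(-1)           # shape (4,)

# A shape whose product is not 4 cannot hold the same elements.
try:
    a.reshape((3, 2))
except ValueError as err:
    print("ValueError:", err)
```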
Let’s move on to slightly more complex operations. We’ll go over
some basic but useful reductions, which are values you calculate
from all of the elements in an array.
Let’s take a simple array. I’ll simply show you some of the reductions
you should know.
>>> x = np.array([1, 3, 2])
>>> x.sum()
6
>>> x.min()
1
>>> x.max()
3
>>> x.mean()
2.0
>>> np.median(x)
2.0
>>> x.std()
0.816496580927726
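The std() result may look odd if you expected 1.0. By default NumPy computes the population standard deviation (dividing by n), which this sketch reproduces by hand and compares with the sample version. The variable names are my own.

```python
import numpy as np

x = np.array([1, 3, 2])

# Default (ddof=0): divide the summed squared deviations by n.
population_std = x.std()
by_hand = np.sqrt(((x - x.mean()) ** 2).sum() / x.size)

# ddof=1 divides by n - 1 instead (the sample standard deviation).
sample_std = x.std(ddof=1)

print(population_std, sample_std)
```

This matters later because Pandas uses ddof=1 by default in its own std() method, so the two libraries can give different answers for the same data.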
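You can also do reductions on rows and columns of multi-dimensional
arrays by specifying the axis along which to do your operation.
axis=0 collapses the rows (one result per column) and axis=1
collapses the columns (one result per row), like so:
>>> a
array([[1, 1],
       [2, 2]])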
>>> a.sum(axis=0)
array([3, 3])
>>> a.sum(axis=1)
array([2, 4])
[Figure: How axes are labelled in a matrix]
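A square matrix makes it hard to see which axis disappears, so here is the same idea on a made-up 2x3 array m, where the shapes of the results give it away.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

# axis=0 collapses the rows: one result per column, shape (3,).
print(m.sum(axis=0))           # [5 7 9]

# axis=1 collapses the columns: one result per row, shape (2,).
print(m.sum(axis=1))           # [ 6 15]

# With no axis, every element is reduced to one scalar.
print(m.sum())                 # 21

# keepdims=True keeps the reduced axis with length 1.
print(m.max(axis=1, keepdims=True).shape)   # (2, 1)
```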
The last couple of operations that’ll be super important are your cross
product, dot product, and matrix multiplication operations.
>>> x = np.array([1, 2, 3])
>>> y = np.array([2, 3, 4])
# Dot product
>>> np.dot(x, y)
20
# Cross product
>>> np.cross(x, y)
array([-1,  2, -1])
# Matrix multiplication
>>> a
array([[1, 1],
       [2, 2]])
>>> b
array([[1, 3],
       [2, 4]])
>>> np.matmul(a,b)
array([[ 3,  7],
       [ 6, 14]])
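It’s easy to mix up matrix multiplication with the element-wise * from earlier, so here is a short sketch putting them side by side with the same a and b. The names product, elementwise and reversed_product are mine.

```python
import numpy as np

a = np.array([[1, 1],
              [2, 2]])
b = np.array([[1, 3],
              [2, 4]])

# The @ operator is shorthand for np.matmul.
product = a @ b                # [[3, 7], [6, 14]]

# * is element-wise, which is a different operation.
elementwise = a * b            # [[1, 3], [4, 8]]

# Matrix multiplication is generally not commutative.
reversed_product = b @ a       # [[7, 7], [10, 10]]
```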
Phew! That was a lot of examples with NumPy, and there’s a bunch
more you can do with it. Let’s move on to Pandas, which is very
useful for organizing data you’ll typically encounter in the real
world!
Intro to Pandas
Just like NumPy, we have to import pandas.

import pandas as pd
Creating Pandas data structures
The two main data structures you’ll come across in Pandas are
the DataFrame and the Series.
A Series can be treated as a 1D array, similar to a single column in a
spreadsheet. A DataFrame is a 2D table, analogous to an entire
spreadsheet.
A Series can be created by passing a list of values to
the pd.Series() function.
>>> s = pd.Series([1, 2, 5, np.nan, 6, 8])
>>> s
0 1.0
1 2.0
2 5.0
3 NaN # note that np.nan creates a 'not a number', or a 'NaN'
4 6.0
5 8.0
dtype: float64
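The NaN in that Series is worth a closer look, because Pandas and NumPy treat it differently in reductions. This is a sketch under the default settings; the names missing and labelled are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 5, np.nan, 6, 8])

# isna() marks the missing entries.
missing = s.isna()             # True only at index 3

# Pandas reductions skip NaN by default...
print(s.sum())                 # 22.0
print(s.mean())                # 4.4

# ...while the plain NumPy reduction on the underlying array propagates it.
print(np.sum(s.to_numpy()))    # nan

# A Series can also carry its own labels instead of 0..n-1.
labelled = pd.Series([1.5, 2.5], index=["a", "b"])
print(labelled["b"])           # 2.5
```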
You can create a DataFrame by using
the pd.DataFrame() constructor. See the pandas documentation
for pd.DataFrame for a full overview of how to create a
DataFrame.
>>> df = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'])
>>> df
A B C D
0 1.529833 -0.933167 0.728422 -0.797813
1 -0.508315 -0.952360 -0.148712 0.702790
2 -1.590158 0.376262 0.367797 -0.226617
3 1.066155 1.067526 -0.684484 1.310766
4 0.385859 0.087228 1.476244 0.511632
5 1.035326 1.011037 -0.753938 -0.285154
Here, I’ve created a DataFrame with random numbers, with
columns A, B, C, and D. Let’s see what we can do with it!
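Random numbers change on every run, so your values won’t match the ones printed here. The sketch below shows one way to make them reproducible, and also builds a DataFrame from a dict, which is probably the form you’ll use most. The people table and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values reproducible.
rng = np.random.default_rng(0)
random_df = pd.DataFrame(rng.standard_normal((6, 4)), columns=list("ABCD"))

# A dict of equal-length lists: the keys become the column labels.
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],      # made-up sample data
    "age": [31, 25, 47],
})

print(random_df.shape)   # (6, 4)
print(people.dtypes)
```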
Viewing data in a DataFrame
Firstly, we can view the head (first few rows) and tail (last
few rows) of the DataFrame. By default, each gives you 5 rows,
but you can specify how many rows you want.
>>> df.head()
A B C D
0 1.529833 -0.933167 0.728422 -0.797813
1 -0.508315 -0.952360 -0.148712 0.702790
2 -1.590158 0.376262 0.367797 -0.226617
3 1.066155 1.067526 -0.684484 1.310766
4 0.385859 0.087228 1.476244 0.511632
>>> df.tail(2)
A B C D
4 0.385859 0.087228 1.476244 0.511632
5 1.035326 1.011037 -0.753938 -0.285154
You can view the column labels of the DataFrame.

>>> df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
You can grab a single column, which yields a Series, or you can grab
rows using Python slice notation.
>>> df['B'] # Specify the column label
0 -0.933167
1 -0.952360
2 0.376262
3 1.067526
4 0.087228
5 1.011037
Name: B, dtype: float64
>>> df[0:3] # Specify the row indices using slice notation
A B C D
0 1.529833 -0.933167 0.728422 -0.797813
1 -0.508315 -0.952360 -0.148712 0.702790
2 -1.590158 0.376262 0.367797 -0.226617
EDIT: You can also select multiple columns by passing a list of the
column names you want! For example: df[['A', 'B']] .
However, you cannot select a single row using [] notation. For that,
you’ll need to use either loc or iloc. loc uses the index label,
while iloc uses the integer position.
In our example, we used the default integer indexing scheme, so the
two coincide, but your data could be indexed by date or something
like that, in which case you’d use loc to look rows up by label.
>>> df.iloc[0]
A 1.529833
B -0.933167
C 0.728422
D -0.797813
Name: 0, dtype: float64
You can also use slice notation for more powerful data accesses.
>>> df.iloc[0:3]
A B C D
0 1.529833 -0.933167 0.728422 -0.797813
1 -0.508315 -0.952360 -0.148712 0.702790
2 -1.590158 0.376262 0.367797 -0.226617
>>> df.iloc[0:3, 1:3] # Rows 0-2, columns 1-2
B C
0 -0.933167 0.728422
1 -0.952360 -0.148712
2 0.376262 0.367797
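Since our df uses the default integer index, loc and iloc look the same on it. Here is a sketch with a date index, where the difference shows up. The temps table, its values and the dates are hypothetical.

```python
import pandas as pd

# Hypothetical daily readings indexed by date rather than by 0..n-1.
dates = pd.date_range("2024-01-01", periods=5, freq="D")
temps = pd.DataFrame({"temp": [3.0, 4.5, 2.0, 5.5, 6.0]}, index=dates)

# iloc: by integer position; the end of the slice is excluded.
first_two = temps.iloc[0:2]

# loc: by label; unlike iloc, both ends of a label slice are included.
jan_2_to_4 = temps.loc["2024-01-02":"2024-01-04"]

# loc also takes a row label and a column label together.
value = temps.loc[pd.Timestamp("2024-01-03"), "temp"]   # 2.0
```

The inclusive end point of loc slices is an easy thing to trip over when switching between the two.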
Editing DataFrames
There are a number of ways we can change our DataFrames.
Firstly, we have setting. You can set by column using a Series.

>>> s1 = pd.Series([1, 2, 3, 4, 5, 6])
>>> s1
0 1
1 2
2 3
3 4
4 5
5 6
dtype: int64
>>> df['F'] = s1 # This creates a new column 'F' equal to Series s1
Or you could set a single value using labels.
>>> df.at[0, 'A'] = 0 # The first 0 is the row label, 'A' is the column label
Or you could set using an integer position.

>>> df.iat[0, 1] = np.nan
Here’s the resulting DataFrame after these changes were made:

>>> df
A B C D F
0 0.000000 NaN 0.728422 -0.797813 1
1 -0.508315 -0.952360 -0.148712 0.702790 2
2 -1.590158 0.376262 0.367797 -0.226617 3
3 1.066155 1.067526 -0.684484 1.310766 4
4 0.385859 0.087228 1.476244 0.511632 5
5 1.035326 1.011037 -0.753938 -0.285154 6
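Setting a column from a Series worked neatly above because s1 and df share the same index. The sketch below shows what I understand happens when they don’t, using a small made-up DataFrame and Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30]})          # index 0, 1, 2

# Assigning a Series matches rows by index label, not by position.
shifted = pd.Series([1, 2, 3], index=[1, 2, 3])
df["F"] = shifted
# Row 0 has no matching label and gets NaN; rows 1 and 2 get 1.0 and 2.0.
# The value at label 3 is dropped because df has no such row.

# A plain list or NumPy array is assigned by position instead.
df["G"] = [1, 2, 3]

# at uses labels; iat uses integer positions.
df.at[0, "F"] = 0.0
df.iat[2, 0] = 99
print(df)
```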
If you have multiple DataFrames and Series that you want to
combine, you can do that! This is done by concatenating them
with pd.concat().
You’ll want to add rows when you have new rows to put at the end of
an existing DataFrame. (Older code uses df.append(df1) for this, but
that method was deprecated in pandas 1.4 and removed in pandas 2.0,
so pd.concat() is used here.)
>>> df1 = pd.DataFrame(np.random.randn(3, 5), columns=['A', 'B', 'C', 'D', 'F'])
>>> df1
A B C D F
0 -0.075624 0.210857 0.215464 -0.732181 2.151847
1 -0.265325 1.323702 -0.488284 1.253780 -1.949705
2 -0.592924 -0.442635 0.601039 1.839268 -1.247409
>>> df = pd.concat([df, df1])
>>> df
A B C D F
0 0.000000 NaN 0.728422 -0.797813 1.000000
1 -0.508315 -0.952360 -0.148712 0.702790 2.000000
2 -1.590158 0.376262 0.367797 -0.226617 3.000000
3 1.066155 1.067526 -0.684484 1.310766 4.000000
4 0.385859 0.087228 1.476244 0.511632 5.000000
5 1.035326 1.011037 -0.753938 -0.285154 6.000000
0 -0.075624 0.210857 0.215464 -0.732181 2.151847
1 -0.265325 1.323702 -0.488284 1.253780 -1.949705
2 -0.592924 -0.442635 0.601039 1.839268 -1.247409
Notice that the row labels 0 to 2 now appear twice. You can renumber
the rows of the resulting DataFrame using reset_index(). The ‘drop’
setting makes sure the original indices are not saved into a new column.
>>> df = df.reset_index(drop=True)
>>> df
A B C D F
0 0.000000 NaN 0.728422 -0.797813 1.000000
1 -0.508315 -0.952360 -0.148712 0.702790 2.000000
2 -1.590158 0.376262 0.367797 -0.226617 3.000000
3 1.066155 1.067526 -0.684484 1.310766 4.000000
4 0.385859 0.087228 1.476244 0.511632 5.000000
5 1.035326 1.011037 -0.753938 -0.285154 6.000000
6 -0.075624 0.210857 0.215464 -0.732181 2.151847
7 -0.265325 1.323702 -0.488284 1.253780 -1.949705
8 -0.592924 -0.442635 0.601039 1.839268 -1.247409
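If you are combining rows with pd.concat, the renumbering can happen in the same step. This sketch compares the two routes on two tiny made-up DataFrames (top and bottom).

```python
import pandas as pd

top = pd.DataFrame({"A": [1, 2]})       # small illustrative frames
bottom = pd.DataFrame({"A": [3, 4]})

# Default: the original labels are kept, so 0 and 1 appear twice.
stacked = pd.concat([top, bottom])
print(list(stacked.index))              # [0, 1, 0, 1]

# ignore_index=True renumbers the rows during the concatenation itself.
renumbered = pd.concat([top, bottom], ignore_index=True)
print(list(renumbered.index))           # [0, 1, 2, 3]

# reset_index(drop=True) reaches the same result afterwards.
assert renumbered.equals(stacked.reset_index(drop=True))
```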
The pd.concat() function can add both rows and columns, as long as
you specify the axis along which you’re adding new data (axis=0 for
rows, which is the default, and axis=1 for columns).
>>> c1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], name='Z')
>>> c1
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
Name: Z, dtype: int64
>>> pd.concat([df, c1], axis=1) # Specify axis 1 to add a column
A B C D F Z
0 0.000000 NaN 0.728422 -0.797813 1.000000 1
1 -0.508315 -0.952360 -0.148712 0.702790 2.000000 2
2 -1.590158 0.376262 0.367797 -0.226617 3.000000 3
3 1.066155 1.067526 -0.684484 1.310766 4.000000 4
4 0.385859 0.087228 1.476244 0.511632 5.000000 5
5 1.035326 1.011037 -0.753938 -0.285154 6.000000 6
6 -0.075624 0.210857 0.215464 -0.732181 2.151847 7
7 -0.265325 1.323702 -0.488284 1.253780 -1.949705 8
8 -0.592924 -0.442635 0.601039 1.839268 -1.247409 9
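The column lined up perfectly here because c1 and df both have the index 0 to 8. As a sketch of what to expect otherwise, here is a made-up case where the new column is missing a label; the names left, extra, wide and tall are mine.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3]})                 # index 0, 1, 2
extra = pd.Series([10, 20], index=[1, 2], name="Z")   # no entry for label 0

# axis=1 adds columns; rows are lined up by index label.
wide = pd.concat([left, extra], axis=1)
# Label 0 has no Z value, so that cell is filled with NaN.

# axis=0 (the default) adds rows instead.
tall = pd.concat([left, left], axis=0, ignore_index=True)
print(wide.shape, tall.shape)   # (3, 2) (6, 1)
```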
