NumPy
Problem Definition
Data Collection
Data Preprocessing
Data Transformation
Data Mining
Data Analysis
Data Visualization
Python provides many powerful libraries that can be used to perform the tasks
listed above.
NumPy
Holds data in N-dimensional array (ndarray) objects, which can store data in
multiple dimensions.
pandas
Stores data in its primary data structures: Series and DataFrame (the older Panel
structure was removed in pandas 0.25).
matplotlib
SciPy
Jupyter
Combines code, rich text, plots, media and mathematical equations together.
Bokeh
Scientific Distributions
A data scientist would otherwise have to manually install all the Python libraries
required for the various tasks involved in the Knowledge Discovery Process.
This is a time-consuming task.
All the drawbacks of manual installation can be overcome by using any one of the
available scientific distributions.
The base version of Anaconda is open source and contains more than 100 packages
from Python, R, and Scala.
Additionally, it provides access to more than 700 packages that can be installed
and managed using conda.
Anaconda is available for 32-bit and 64-bit operating systems: Windows, Linux, and
Mac OS X.
Installing Anaconda
Steps for installing Anaconda
Choose the Python version, i.e., 3.x or 2.x, based on your requirements.
Anaconda Navigator
Home
Environment
Projects
Learning
Community
Enables launching a working environment through various modes such as Jupyter
Notebook, Jupyter QtConsole, and the Spyder IDE.
Environment Window
Anaconda Prompt
You can access Anaconda's default Python interactive interpreter using the command
'python'.
conda --version
Command for viewing available environments.
conda info --envs
Creating New Environment
By default, Anaconda comes with a root environment (named base in recent conda
versions).
A new environment testenv, with Python 2.7, can be created using the below command.
conda create -n testenv python=2.7
activate testenv
conda list
Now you can verify that numpy is available using the conda list command.
After successful installation, you can access numpy from testenv without any
errors.
IPython
IPython provides an interactive working environment, which is highly convenient and
efficient.
It also provides a Jupyter kernel that allows working with Python code in various
interactive front ends.
Features of IPython
Python statements and System commands can be executed in IPython.
Q) Define the list fruits = ['apple', 'mango', 'kiwi', 'watermelon', 'pear'] in the
first cell.
Q) Determine the length of each fruit name and save the lengths in the list
fruits_len.
Q) Find the fruit names that start with 'm' or 'p' and save them in the list
fruits_mp.
Q) Use the magic method %save to save the previous six commands in the file
sample_script.py.
- Hint: Use the expression %save sample_script.py 1-6
- View the contents of the file sample_script.py using the magic method %more.
- Try the command %more sample_script.py
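One possible way to approach the list exercises above is sketched below. It uses plain Python list comprehensions and leaves out the %save and %more magics, which only work inside IPython.

```python
# Sketch of the list exercises; plain Python, no IPython magics involved.
fruits = ['apple', 'mango', 'kiwi', 'watermelon', 'pear']

# Length of each fruit name, in the same order as `fruits`.
fruits_len = [len(name) for name in fruits]

# Names starting with 'm' or 'p'; str.startswith accepts a tuple of prefixes.
fruits_mp = [name for name in fruits if name.startswith(('m', 'p'))]

print(fruits_len)  # [5, 5, 4, 10, 4]
print(fruits_mp)   # ['mango', 'pear']
```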
Creating a Folder
A folder can be created using Folder option present under New section.
The Kernel enables the environment required for executing the code snippets.
Renaming it to MyFirstNoteBook.
A user is allowed to write either code snippets or markdown text, inside a cell.
Markdown text can be used to embed normal text, headers, unordered and ordered
lists, hyperlinks, tables, images, videos, HTML content, and other useful elements
inside the Notebook.
Markdown Basics
In this section, you will be writing the following elements in Markdown.
Unordered Lists : Any one of the symbols asterisk (*), hyphen (-), or plus (+) is
used.
Nested Unordered Lists : The nested lists are indented with a minimum of four
spaces, followed by one of the symbols.
Justifying Text of a list element : Two spaces at the end of each line force a line
break, which keeps multiple lines of text aligned within a list element.
Code snippets: A pair of triple backquotes (```) is used.
Reference Links: Text and Reference both are written in two different pairs of
square brackets.
NumPy
NumPy is a Python library, which supports efficient handling of various numerical
operations on arrays holding numeric data.
Example 1
import numpy as np
x = np.array([5, 8, 9, 10, 11])  # using the 'array' method
type(x)  # displays the class of x: numpy.ndarray
y = np.array([[6, 9, 5],
[10, 82, 34]])
print(y)
Output
[[ 6  9  5]
 [10 82 34]]
ndarray Attributes
Some of the important attributes of an ndarray are ndim, shape, size, dtype,
itemsize, and nbytes.
Example 3
2 (2, 3) 6 int32 4 24
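The code that produced the Example 3 output is not shown. A minimal sketch that would print the same values is given below, assuming a 2 x 3 array whose dtype is set to int32 explicitly (the array contents are borrowed from the earlier example and are an assumption here).

```python
import numpy as np

# Assumed input: a 2 x 3 integer array with dtype forced to int32.
y = np.array([[6, 9, 5], [10, 82, 34]], dtype='int32')

# ndim: number of dimensions, shape: size per dimension, size: element count,
# dtype: element type, itemsize: bytes per element, nbytes: total bytes.
print(y.ndim, y.shape, y.size, y.dtype, y.itemsize, y.nbytes)
# 2 (2, 3) 6 int32 4 24
```

Note that nbytes is always size multiplied by itemsize, here 6 * 4 = 24.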
NumPy dtypes
NumPy supports various data types based on the number of bytes required by the data
elements.
Example 4
y = np.array([[6, 9, 5],
[10, 82, 34]],
dtype='float64')
print(y)
print(y.dtype)
Output
Using NumPy array creation methods such as ones, ones_like, zeros, and zeros_like
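A short sketch of these four creation methods follows. The array named template is a hypothetical input used only to show how the _like variants copy shape and dtype.

```python
import numpy as np

a = np.ones(shape=(2, 3))        # 2 x 3 array of 1.0 (float64 by default)
z = np.zeros(shape=(3,), dtype='int32')  # three integer zeros

# Hypothetical existing array used as a template.
template = np.array([[1, 2], [3, 4]])
b = np.ones_like(template)       # same shape and dtype as template, all ones
c = np.zeros_like(template)      # same shape and dtype as template, all zeros

print(a.shape, a.dtype)          # (2, 3) float64
print(b.shape, c.shape)          # (2, 2) (2, 2)
```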
import numpy as np
n = [[-1, -2, -3, -4], [-2, -4, -6, -8]]
y = np.array(n)
ndim
print(y.ndim)
2
shape
print(y.shape)
(2, 4)
size
print(y.size)
8
print(y.dtype)
int64
nbytes
print(y.nbytes)
64
import numpy as np
a = [[[4.1, 2.5], [1.1, 2.3], [9.1, 2.5]],
[[8.6, 9.9],[3.6, 4.3], [6.6, 0.3]]]
x = np.array(a, dtype='float64')
Output
numpy.ndarray, 3, (2, 3, 2)
x = np.zeros(shape=(2,4))
print(x)
Output of Example 1
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
Example 2 : Using full method
y = np.full(shape=(2,3), fill_value=10.5)
print(y)
Output of Example 2
Example 1
[ 3. 6. 9. 12. 15.]
print(x)
Output of Example 1
[ 0.54340494 0.27836939]
Example 2
print(y)
Output of Example 2
[18 34 13]
np.random.seed(100)
x = np.random.randn(3) # Standard normal distribution
print(x)
Output of Example 1
np.random.seed(100)
x = 10 + 2*np.random.randn(3) # normal distribution with mean 10 and sd 2
print(x)
Output of Example 2
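The two random examples above reuse the same seed. The sketch below illustrates why that matters: reseeding restarts the same sequence, so the second example is simply the first set of draws shifted and scaled. The variable names are illustrative.

```python
import numpy as np

np.random.seed(100)
first = np.random.randn(3)    # three draws from the standard normal

np.random.seed(100)           # reseeding restarts the same sequence
second = np.random.randn(3)
print(np.array_equal(first, second))   # True

# Same draws, shifted to mean 10 and scaled to standard deviation 2.
scaled = 10 + 2 * first
print(np.allclose((scaled - 10) / 2, first))   # True
```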
d = np.loadtxt(x,delimiter=' ')
print(d)
print(d.ndim, d.shape)
Output of Example 1
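The definition of x is missing from the loadtxt example. A self-contained sketch is shown below, assuming x is an in-memory text buffer; the sample data is hypothetical and is not the data used in the original example.

```python
import io
import numpy as np

# Hypothetical sample data: two rows of space-separated numbers.
x = io.StringIO("1 2 3\n4 5 6")

# loadtxt accepts a file name or a file-like object; values load as float64.
d = np.loadtxt(x, delimiter=' ')
print(d)
print(d.ndim, d.shape)   # 2 (2, 3)
```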
Number of dimensions
Shape
Size
Define a ndarray x2, whose shape is (3, 2, 2) and contains all 1's.
Define a ndarray x3, whose shape is (4,4) and contains 1's on diagonal and 0's
elsewhere.
Reshaping ndarrays
Shape of an array can be changed using reshape.
Example
import numpy as np
np.random.seed(100)
x = np.random.randint(10, 100, 8)
print(x, end='\n\n')
y = x.reshape(2,4)
print(y, end='\n\n')
z = x.reshape(2,2,2)
print(z, '\n\n')
Output
[18 34 77 97 89 58 20 62]
[[18 34 77 97]
[89 58 20 62]]
[[[18 34]
[77 97]]
[[89 58]
[20 62]]]
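Reshaping only works when the new shape holds exactly the same number of elements. The sketch below illustrates this, and also shows that one dimension may be given as -1 so that NumPy infers it.

```python
import numpy as np

x = np.arange(8)

y = x.reshape(2, -1)     # -1 lets NumPy infer the missing dimension (4 here)
print(y.shape)           # (2, 4)

failed = False
try:
    x.reshape(3, 3)      # 9 slots cannot hold 8 elements
except ValueError as err:
    failed = True
    print('reshape failed:', err)
```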
import numpy as np
x = np.array([[-1, 1], [-3, 3]])
y = np.array([[-2, 2], [-4, 4]])
np.vstack((x,y))
Output
array([[-1, 1],
[-3, 3],
[-2, 2],
[-4, 4]])
import numpy as np
x = np.array([[-1, 1], [-3, 3]])
y = np.array([[-2, 2], [-4, 4]])
z = np.array([[-5, 5], [-6, 6]])
np.hstack((x,y,z))
Output
array([[-1,  1, -2,  2, -5,  5],
       [-3,  3, -4,  4, -6,  6]])
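To make the difference between the two stacking functions concrete, the sketch below compares the resulting shapes and cross-checks them against np.concatenate, which gives the same result for 2-D inputs.

```python
import numpy as np

x = np.array([[-1, 1], [-3, 3]])
y = np.array([[-2, 2], [-4, 4]])

v = np.vstack((x, y))    # rows appended: shape (4, 2)
h = np.hstack((x, y))    # columns appended: shape (2, 4)
print(v.shape, h.shape)  # (4, 2) (2, 4)

# For 2-D inputs these match np.concatenate along axis 0 and axis 1.
print(np.array_equal(v, np.concatenate((x, y), axis=0)))   # True
print(np.array_equal(h, np.concatenate((x, y), axis=1)))   # True
```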
import numpy as np
x = np.arange(30).reshape(6, 5)
res = np.vsplit(x, 2)
print(res[0], end='\n\n')
print(res[1])
Output
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
[[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]]
import numpy as np
x = np.arange(30).reshape(6, 5)
res = np.vsplit(x, (2, 5))
print(res[0], end='\n\n')
print(res[1], end='\n\n')
print(res[2])
Output
[[0 1 2 3 4]
[5 6 7 8 9]]
[[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[25 26 27 28 29]]
import numpy as np
x = np.arange(10).reshape(2, 5)
res = np.hsplit(x, (2,4))
print(res[0], end='\n\n')
print(res[1], end='\n\n')
print(res[2])
Output
[[0 1]
[5 6]]
[[2 3]
[7 8]]
[[4]
[9]]
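When an integer is passed to vsplit or hsplit, the array must divide into equal parts. The sketch below shows the failure case and, as one possible alternative, np.array_split, which allows unequal parts.

```python
import numpy as np

x = np.arange(30).reshape(6, 5)

split_failed = False
try:
    np.vsplit(x, 4)      # 6 rows cannot be divided into 4 equal parts
except ValueError as err:
    split_failed = True
    print('vsplit failed:', err)

# array_split tolerates an uneven division; earlier parts get the extra rows.
parts = np.array_split(x, 4, axis=0)
print([p.shape for p in parts])   # [(2, 5), (2, 5), (1, 5), (1, 5)]
```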
Create a 2-D array q, of shape (2, 3) with elements 15, 18, 21, 24, 27, 30.
import numpy as np
x = np.arange(6).reshape(2,3)
print(x + 10, end='\n\n')
print(x * 3, end='\n\n')
print(x % 2)
Output
[[10 11 12]
[13 14 15]]
[[ 0 3 6]
[ 9 12 15]]
[[0 1 0]
[1 0 1]]
import numpy as np
x = np.array([[-1, 1], [-2, 2]])
y = np.array([[4, -4], [5, -5]])
print(x + y, end='\n\n')
print(x * y)
Output
[[ 3 -3]
[ 3 -3]]
[[ -4 -4]
[-10 -10]]
import numpy as np
x = np.array([[-1, 1], [-2, 2]])
y = np.array([-10, 10])
print(x * y)
Output
[[10 10]
[20 20]]
This is due to the broadcasting feature exhibited by NumPy arrays.
Broadcasting in NumPy
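Element-wise operations between arrays are possible only when the arrays have the
same shape or shapes that are compatible for broadcasting.
The shapes are compared dimension by dimension, starting from the trailing (last)
dimension.
If the two sizes in a dimension are equal, or either of them is 1, continue the
comparison with the next dimension; otherwise, broadcasting is not possible.
Finally, the shape of the broadcast result is the maximum of the two compared
sizes in each dimension.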
Feasibility of Broadcasting
Below examples show feasibility of broadcasting between two arrays, having shape s1
and s2 respectively.
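The feasibility check can be sketched as a small function. The helper broadcast_shape below is hypothetical (it is not part of NumPy), and the shapes passed to it are illustrative values for s1 and s2; the last line cross-checks one case against NumPy's own np.broadcast.

```python
import numpy as np

def broadcast_shape(s1, s2):
    """Return the broadcast shape of s1 and s2, or None if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; missing dims count as 1.
    for i in range(1, max(len(s1), len(s2)) + 1):
        d1 = s1[-i] if i <= len(s1) else 1
        d2 = s2[-i] if i <= len(s2) else 1
        if d1 != d2 and d1 != 1 and d2 != 1:
            return None               # sizes differ and neither is 1
        result.append(d2 if d1 == 1 else d1)
    return tuple(reversed(result))

print(broadcast_shape((4, 3), (3,)))     # (4, 3)
print(broadcast_shape((4, 1), (1, 5)))   # (4, 5)
print(broadcast_shape((2, 3), (2,)))     # None

# Cross-check one feasible case against NumPy itself.
print(np.broadcast(np.empty((4, 1)), np.empty((1, 5))).shape)   # (4, 5)
```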
Examples
import numpy as np
x = np.array([[0,1], [2,3]])
print(np.square(x), end='\n\n')
print(np.sin(x))
Output
[[0 1]
[4 9]]
[[ 0. 0.84147098]
[ 0.90929743 0.14112001]]
To learn more about universal functions, refer to this link.
https://docs.scipy.org/doc/numpy/reference/ufuncs.html
Example
import numpy as np
x = np.array([[0,1], [2, 3]])
print(x.sum(), end='\n\n')
print(x.sum(axis=0), end='\n\n')
print(x.sum(axis=1))
Output
6
[2 4]
[1 5]
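The axis argument works the same way for other aggregation methods. The sketch below reuses the array from the example above; axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).

```python
import numpy as np

x = np.array([[0, 1], [2, 3]])

print(x.mean())          # 1.5, over all elements
print(x.max(axis=0))     # [2 3], one value per column
print(x.min(axis=1))     # [0 2], one value per row

# keepdims=True keeps the collapsed axis with length 1.
row_totals = x.sum(axis=1, keepdims=True)
print(row_totals.shape)  # (2, 1)
```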
Calculate mean of x
Calculate variance of x.
A single number inside the square brackets refers to the start index and selects
the element at that position.
10
[10 15 20 25 30]
[10 25]
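The code for the output above is not shown. The sketch below is one possible reconstruction: the array x and the index values are assumptions, chosen only so that the three results match the output listed above.

```python
import numpy as np

# Hypothetical array chosen so the results match the output shown above.
x = np.array([5, 10, 15, 20, 25, 30, 35])

print(x[1])        # 10, a single index selects one element
print(x[1:6])      # [10 15 20 25 30], start:stop (stop is excluded)
print(x[1:6:3])    # [10 25], start:stop:step
```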
Indexing, Slicing a 2-D ndarray
Two slice objects, one for each dimension, are required to slice a 2-D array.
They are separated by a comma (,); a single slice object inside the square brackets
applies to the first dimension.
Example
import numpy as np
y = np.array([[0, 1, 2],
[3, 4, 5]])
print(y[1:2, 1:3])
print(y[1])
print(y[:, 1])
Output
[[4 5]]
[3 4 5]
[1 4]
Example
[4 5]
[[-5 5]
[-9 9]]
[[-7 7]
[-9 9]]
Row : [-1 1]
Row : [-2 2]
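The 'Row :' lines above look like the result of iterating over a 2-D array row by row. A minimal sketch of that pattern follows; the array x is an assumption chosen to match those lines.

```python
import numpy as np

# Assumed array, chosen to match the 'Row :' lines shown above.
x = np.array([[-1, 1], [-2, 2]])

rows = []
for row in x:              # iterating a 2-D array yields one row at a time
    print('Row :', row)
    rows.append(row.tolist())
```

This differs from np.nditer, which visits every element individually instead of every row.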
import numpy as np
x = np.array([[0,1], [2, 3]])
for a in np.nditer(x):
    print(a)
Output
0
1
2
3
Boolean Indexing
Checking whether each element of an array satisfies a condition results in a
Boolean array.
This Boolean array can be used as an index to filter the elements that satisfy the
condition.
Example
import numpy as np
x = np.arange(10).reshape(2,5)
condition = x % 2 == 0
print(condition)
print(x[condition])
Output
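A slightly extended sketch of Boolean indexing follows. It reuses the array from the example above and adds two uses that are not in the original example: combining conditions, and assigning through a mask.

```python
import numpy as np

x = np.arange(10).reshape(2, 5)

even = x % 2 == 0          # Boolean array with the same shape as x
evens = x[even]            # filtering returns a flat 1-D array
print(evens)               # [0 2 4 6 8]

# Conditions combine with & and |; each one needs its own parentheses.
big_odds = x[(x > 2) & (x % 2 == 1)]
print(big_odds)            # [3 5 7 9]

# A mask can also select the positions to overwrite (done on a copy here).
z = x.copy()
z[even] = -1
print(z)
```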
Obtain elements, overlapping first two rows and last three columns.
x[b]
x[b,:,1:3]