
UNIT 4: Data Visualization
Index
CHAPTER 10: Visualizing Information:
Starting with a Graph, Defining the plot, Drawing multiple lines
and plots, Saving your work to disk, Setting the Axis, Ticks,
Grids, Getting the axes, Formatting the axes, Adding grids,
Defining the Line Appearance, Working with line style, Using
colors, Adding markers, Using Labels, Annotations, and Legends,
Adding labels, Annotating the chart, Creating a legend.
CHAPTER 11: Visualizing the Data:
Choosing the Right Graph, Showing parts of a whole with
pie charts, Creating comparisons with bar charts, Showing
distributions using histograms, Depicting groups using
boxplots, Seeing data patterns using scatterplots, Creating
Advanced Scatterplots, Depicting groups, Showing
correlations, Plotting Time Series, Representing time on axes,
Plotting trends over time, Plotting Geographical Data, Using
an environment in Notebook, Getting the Basemap toolkit,
Dealing with deprecated library issues, Using Basemap to
plot geographic data, Visualizing Graphs, Developing
undirected graphs, Developing directed graphs.
Introduction: Data Visualization
A large amount of data is generated every day. In its raw form, this data can be difficult to analyze for trends and patterns.
Data visualization addresses this problem: it provides an organized, pictorial representation of the data that makes it easier to understand, observe and analyze.
Python provides several data visualization libraries, each with its own features and supported graph types. Popular choices include:
Matplotlib
Seaborn
Bokeh
Plotly
Matplotlib - INTRODUCTION
• Matplotlib is an easy-to-use, low-level graph plotting library in Python that serves as a visualization utility. It is built on NumPy arrays and offers a lot of flexibility.
• It supports many kinds of plots, such as scatter plots, line plots and histograms.
• Matplotlib was created by John D. Hunter.
• Matplotlib is open source and free to use.
• Matplotlib is written mostly in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.
• To install it, run the following command in a terminal:
pip install matplotlib
8 Popular Types of Data Visualizations in Python
1) Scatterplot
2) Histogram
3) Bar Chart
4) Pie Chart
5) Countplot
6) Boxplot
7) Heatmap
8) Distplot
What is Plotting in Python?
• Plotting is the representation of data in graphical form using Python scripts.
• Plots can be either 2D or 3D.
• In Python, Matplotlib is commonly used for plotting graphs, lines, charts, etc.
• It is a convenient way of representing data visually.
• It makes data easily understandable.
What is Matplotlib in Python?
• Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
• Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and various graphical user interface toolkits.
• You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code.
• It can embed graphics in the user interface toolkit of your choice, and supports interactive graphics on all major desktop operating systems using the GTK+, Qt, Tk, FLTK, wxWidgets and Cocoa toolkits.
• Matplotlib can also be embedded in a headless web server to produce hardcopy in both raster formats, such as Portable Network Graphics (PNG), and vector formats, such as PostScript, Portable Document Format (PDF) and Scalable Vector Graphics (SVG).
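The headless use described above can be sketched as follows. This is a minimal example, assuming only that Matplotlib is installed: matplotlib.use("Agg") selects a non-interactive backend so that no window is opened, and savefig() infers the output format from each file extension. The file name "squares" and the temporary output folder are illustrative choices.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_title("Rendered without a window")

out_dir = tempfile.mkdtemp()  # illustrative output folder
saved = []
for ext in ("png", "pdf", "svg"):  # one raster format, two vector formats
    path = os.path.join(out_dir, "squares." + ext)
    fig.savefig(path)  # the format is taken from the extension
    saved.append(path)
plt.close(fig)  # free the figure once it has been written

for path in saved:
    print(os.path.basename(path), os.path.getsize(path), "bytes")
```

The same figure object is written three times; only the extension changes.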
Matplotlib toolkits
Several toolkits are available that extend Matplotlib's functionality. Some of them are separate downloads; others ship with the Matplotlib source code but have external dependencies.
Basemap: a map plotting toolkit with various map projections, coastlines and political boundaries.
Cartopy: a mapping library featuring object-oriented map projection definitions, and arbitrary point, line, polygon and image transformation capabilities.
Excel tools: utilities for exchanging data with Microsoft Excel.
mplot3d: used for 3-D plots.
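As a small illustration of the mplot3d toolkit, the sketch below draws a helix. It assumes a reasonably recent Matplotlib in which passing projection="3d" to add_subplot() is enough to get a 3-D Axes; the helix data and the Agg backend (used so the example runs without a display) are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Points along a helix: the angle t winds around while z rises
t = np.linspace(0, 4 * np.pi, 200)
x, y, z = np.cos(t), np.sin(t), t

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # a 3-D Axes provided by mplot3d
ax.plot(x, y, z)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")  # the third axis only exists on 3-D Axes
plt.close(fig)  # with an interactive backend, call plt.show() instead
```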
Architecture of Matplotlib
• Matplotlib has three layers:
1) Scripting layer: contains the pyplot and pylab functionality.
2) Artist layer: contains primitives such as Line2D, Rectangle, Text and Image, and containers such as Figure and Axes, i.e. the parts of the plot to be rendered.
3) Backend layer: contains the backends, including hardcopy backends such as PDF and Agg, i.e. the format in which the output is rendered.
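To see the layers separately, the sketch below builds a figure without pyplot (the scripting layer): it creates artist-layer objects directly and hands them to the Agg backend for rendering. It is a minimal illustration assuming a recent Matplotlib and the default figure resolution of 100 dpi; the shapes, sizes and text are arbitrary.

```python
from matplotlib.backends.backend_agg import FigureCanvasAgg  # backend layer
from matplotlib.figure import Figure                          # artist layer: container
from matplotlib.lines import Line2D                           # artist layer: primitive
from matplotlib.patches import Rectangle                      # artist layer: primitive

fig = Figure(figsize=(4, 3))     # 4 x 3 inches
canvas = FigureCanvasAgg(fig)    # attach the Agg renderer to the figure
ax = fig.add_subplot()           # an Axes container inside the Figure

ax.add_line(Line2D([0, 1, 2], [0, 1, 0]))                    # a line primitive
ax.add_patch(Rectangle((0.5, 0.2), 1.0, 0.3, alpha=0.4))     # a rectangle primitive
ax.text(1, 0.9, "apex")                                      # a text primitive
ax.set_xlim(-0.5, 2.5)
ax.set_ylim(-0.5, 1.5)

canvas.draw()  # the backend renders every artist into a pixel buffer
width, height = canvas.get_width_height()
print(width, height)
```

Here pyplot is never imported; with pyplot, the same objects are created for you behind calls such as plt.plot().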
Difference between Pyplot and Pylab
• The matplotlib.pyplot module provides a MATLAB-like plotting framework.
• The matplotlib.pylab module combines pyplot with NumPy in a single namespace. (The Matplotlib documentation now discourages pylab in favor of pyplot.)
Access of Matplotlib library
• Pyplot example:
import matplotlib.pyplot as plt
import numpy as np
Or
from matplotlib.pyplot import *
from numpy import *

• Pylab example:
import pylab
Or
from pylab import *
Types of Plot in Python using the PyPlot and Pylab modules
• There are several types of plots we can generate using Matplotlib, such as:
1)Line Plot
2)Scatter Plot
3)Bar and Histogram Plot
4)Image Plot
5)Contour Plot
6)Polar Plot
7)3D Plot
8)Pie-chart Plot etc.
Various Functions used for Plotting
• plot()
Given a single list or array, plot() assumes it is a vector of y-values and automatically generates an x vector of the same length, with consecutive integers beginning with 0. For example, four y-values get the x-values [0, 1, 2, 3].
To override this default behavior, supply the x data: plot(x, y), where x and y have equal lengths.

• show()
Displays the figure. In a script it is usually called once, as the last line; the GUI then takes control and renders the figure.

• savefig()
Saves the output to a file in a supported format, such as eps, pdf, png, ps, raw, rgba, svg and svgz (the exact list depends on the backend).
If no extension is specified, the default is .png.
• draw()
Redraws the current figure; this is needed mainly in interactive mode after a figure has been changed. To clear the current figure, use clf().
• close()
Closes a figure window and releases it.
• axis()
The axis() command takes a list [xmin, xmax, ymin, ymax] and specifies the viewport of the axes.
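A quick way to check the plot() and axis() behavior described above is to inspect the objects they return. This sketch assumes the Agg backend (so nothing is displayed) and arbitrary sample values.

```python
import matplotlib
matplotlib.use("Agg")  # no window needed for this check
import matplotlib.pyplot as plt
import numpy as np

lines = plt.plot([1, 4, 9, 16])  # a single list is taken as the y-values
x_auto = np.asarray(lines[0].get_xdata()).tolist()
print("generated x-values:", x_auto)

limits = plt.axis([0, 6, 0, 20])  # [xmin, xmax, ymin, ymax]
print("axis limits:", limits)
plt.close()  # close the current figure
```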
Functions for customising the Plot
• xlabel(), ylabel(), title(), text(), setp(): these functions customize the output by setting the axis labels, the title, free text on the plot, and properties of plot objects such as lines.
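A minimal sketch of these five functions working together, assuming the Agg backend and made-up data; setp() is used here to change the color and width of the line after it has been drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

lines = plt.plot(x, y)
plt.xlabel("Time (s)")                     # x-axis label
plt.ylabel("Distance (m)")                 # y-axis label
plt.title("Distance over time")            # title above the axes
plt.text(2, 22, "steady climb")            # free text at data position (2, 22)
plt.setp(lines, color="r", linewidth=2.0)  # set several line properties at once

ax = plt.gca()
print(ax.get_xlabel(), "|", ax.get_title())
plt.close()
```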
CHAPTER 10: Visualizing Information:
Starting with a Graph, Defining the plot, Drawing multiple lines and plots, Saving your work to disk, Setting the Axis, Ticks, Grids, Getting the axes, Formatting the axes, Adding grids, Defining the Line Appearance, Working with line style, Using colors, Adding markers, Using Labels, Annotations, and Legends, Adding labels, Annotating the chart, Creating a legend.
Line Drawing:
Example 1: Draw a line in a diagram from position (1, 3) to position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()

Example 2: Draw two points in the diagram, one at position (1, 3) and one at position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints, 'o')
plt.show()
Example 4: Draw a line in a diagram from position (1, 3) to (2, 8), then to (6, 1), and finally to position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()

Example 5: Plotting without x-points:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10, 5, 7])
plt.plot(ypoints)
plt.show()
Example 6: Mark each point with a circle
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o')
plt.show()

Mark each point with a star:
...
plt.plot(ypoints, marker = '*')
...
Example 7: Naming the axes and giving the graph a title
# importing the required module
import matplotlib.pyplot as plt

# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]

# plotting the points
plt.plot(x, y)

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('My first graph!')

# function to show the plot
plt.show()
Example 8: Plotting two or more lines on the same plot
import matplotlib.pyplot as plt

# line 1 points
x1 = [1,2,3]
y1 = [2,4,1]
# plotting the line 1 points
plt.plot(x1, y1, label = "line 1")

# line 2 points
x2 = [1,2,3]
y2 = [4,1,3]
# plotting the line 2 points
plt.plot(x2, y2, label = "line 2")

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('Two lines on same graph!')

# show a legend on the plot
plt.legend()

# function to show the plot
plt.show()
Example 9: Customization of Plots
import matplotlib.pyplot as plt

# x axis values
x = [1,2,3,4,5,6]
# corresponding y axis values
y = [2,4,1,5,2,6]

# plotting the points
plt.plot(x, y, color='green', linestyle='dashed', linewidth = 3,
marker='o', markerfacecolor='blue', markersize=12)

# setting y and x axis range
plt.ylim(1,8)
plt.xlim(1,8)

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('Some cool customizations!')

# function to show the plot
plt.show()
Example 10: Matplotlib Adding Grid Lines
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.plot(x, y)
plt.grid()
plt.show()
Example 11: Specify Which Grid Lines to Display
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.plot(x, y)
# draw grid lines for the x-axis only (vertical lines)
plt.grid(axis = 'x')
plt.show()
Example 12: Set Line Properties for the Grid
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.plot(x, y)
plt.grid(color = 'green', linestyle = '--', linewidth = 0.5)
plt.show()
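The index of this chapter also mentions getting and formatting the axes and setting ticks. A possible sketch, reusing the Sports Watch data: plt.gca() returns the current Axes object, whose methods set the range, the tick positions and the grid in one place. The particular limits, tick positions and colors are arbitrary choices, and the Agg backend is used so the example runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)
ax = plt.gca()                               # get the current Axes object
ax.set_xlim(75, 130)                         # set the x-axis range
ax.set_xticks([80, 100, 120])                # choose tick positions explicitly
ax.tick_params(axis="y", labelcolor="blue")  # format the y tick labels
ax.grid(axis="x", color="green", linestyle="--", linewidth=0.5)  # vertical grid lines only

x_grid = ax.get_xgridlines()
y_grid = ax.get_ygridlines()
plt.close()
```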
Different Matplotlib Linestyles / Markers in Python

Line styles:
Character   Definition
-           Solid line
--          Dashed line
-.          Dash-dot line
:           Dotted line

Markers:
Character   Definition
.           Point marker
,           Pixel marker
o           Circle marker
v           Triangle_down marker
^           Triangle_up marker
<           Triangle_left marker
>           Triangle_right marker
1           Tri_down marker
2           Tri_up marker
3           Tri_left marker
4           Tri_right marker
s           Square marker
p           Pentagon marker
*           Star marker
h           Hexagon1 marker
H           Hexagon2 marker
+           Plus marker
x           X marker
D           Diamond marker
d           Thin_diamond marker
|           Vline marker

Color codes used with Matplotlib Linestyle

b blue
g green
r red
c cyan
m magenta
y yellow
k black
w white
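The color codes combine with the line-style and marker characters above into a single format string, as in the 'g--o' strings used later in this unit. A small sketch comparing the shorthand with the equivalent keyword arguments (sample data arbitrary, Agg backend assumed):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]

# "go--": green (g), circle markers (o), dashed line (--)
(line1,) = plt.plot(ypoints, "go--")
# "r*:": red (r), star markers (*), dotted line (:)
(line2,) = plt.plot([y + 2 for y in ypoints], "r*:")
# The same style as line1, spelled out with keyword arguments
(line3,) = plt.plot([y + 4 for y in ypoints], color="g", marker="o", linestyle="--")

for ln in (line1, line2, line3):
    print(mcolors.to_hex(ln.get_color()), ln.get_marker(), ln.get_linestyle())
plt.close()
```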
Annotating Chart
import matplotlib.pyplot as plt
values=[1,5,8,9,2,0,3,10,4,7]
# annotate(text, xy): place the text at the data point (1, 1)
plt.annotate("First Entry", xy=(1, 1))
plt.plot(range(1,11),values)
plt.show()
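annotate() can also draw an arrow from the text to the point being labeled. In this sketch, which assumes the Agg backend and uses an arbitrary text offset, the highest value in the same data is labeled "Peak":

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
x = list(range(1, 11))
plt.plot(x, values)

# Locate the highest value and its x-position
peak_y = max(values)
peak_x = x[values.index(peak_y)]

ann = plt.annotate(
    "Peak",                            # the annotation text
    xy=(peak_x, peak_y),               # the data point being labeled
    xytext=(peak_x - 3, peak_y - 1),   # where the text is placed (arbitrary offset)
    arrowprops=dict(arrowstyle="->"),  # draw an arrow from the text to xy
)
plt.close()
```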
CHAPTER 11: Visualizing the Data:
Choosing the Right Graph, Showing parts of a whole with pie charts, Creating comparisons with bar charts, Showing distributions using histograms, Depicting groups using boxplots, Seeing data patterns using scatterplots, Creating Advanced Scatterplots, Depicting groups, Showing correlations, Plotting Time Series, Representing time on axes, Plotting trends over time, Plotting Geographical Data, Using an environment in Notebook, Getting the Basemap toolkit, Dealing with deprecated library issues, Using Basemap to plot geographic data, Visualizing Graphs, Developing undirected graphs, Developing directed graphs.
Bar Graph:
• A bar graph is a chart that graphically represents a comparison between categories of data.
• It displays grouped data as parallel rectangular bars of equal width but varying length.
• Each bar represents a specific category, and its length depends on the value it holds. The bars do not touch each other, which indicates that the categories are separate entities.
• A bar graph can be horizontal or vertical: a horizontal bar graph is often used to display data varying over space, whereas a vertical bar graph is often used for time series data. It has two axes: one axis represents the categories and the other shows the values of the data. See the figure given below:
Program 1: Bar chart
import matplotlib.pyplot as plt

# x-coordinates of the bars (by default each bar is centered on its x-coordinate)
left = [1, 2, 3, 4, 5]

# heights of bars
height = [10, 24, 36, 40, 5]

# labels for bars
tick_label = ['one', 'two', 'three', 'four', 'five']

# plotting a bar chart; the two colors are repeated across the five bars
plt.bar(left, height, tick_label = tick_label,
width = 0.8, color = ['red', 'green'])

# naming the x-axis
plt.xlabel('x - axis')
# naming the y-axis
plt.ylabel('y - axis')
# plot title
plt.title('My bar chart!')

# function to show the plot
plt.show()
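The horizontal variant mentioned in the bar graph description is drawn with plt.barh(), which takes a height (bar thickness) instead of a width. A sketch reusing the data of Program 1, assuming the Agg backend; the check at the end shows that a two-color list is repeated across the five bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

categories = ['one', 'two', 'three', 'four', 'five']
values = [10, 24, 36, 40, 5]

# barh(): categories along the y-axis, values as bar lengths along the x-axis
bars = plt.barh(categories, values, height=0.6, color=['red', 'green'])
plt.xlabel('value')
plt.title('My horizontal bar chart!')

widths = [bar.get_width() for bar in bars]  # bar lengths equal the values
print(widths)
plt.close()
```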
Histogram:
• In statistics, a histogram is a type of bar chart that represents the frequency distribution of continuous data. It shows the number of observations that fall within each range of values, known as a class or bin.
• The first step in constructing a histogram is to split the observations into a logical series of intervals called bins. The x-axis shows the independent variable, i.e. the classes, while the y-axis shows the dependent variable, i.e. the number of occurrences. The bars are drawn along the x-axis, and the area of each bar represents the number of observations in its class. See the figure given below:
Program 2: Histogram
import matplotlib.pyplot as plt

# raw data: the age of each person
ages = [2,5,70,40,30,45,50,45,43,40,44,
60,7,13,57,18,90,77,32,21,20,40]

# setting the range of values and the number of intervals (bins)
# (named value_range so that Python's built-in range() is not shadowed)
value_range = (0, 100)
bins = 10

# plotting a histogram
plt.hist(ages, bins, range = value_range, color = 'green',
histtype = 'bar', rwidth = 0.8)

# x-axis label
plt.xlabel('age')
# frequency label
plt.ylabel('No. of people')
# plot title
plt.title('My histogram')

# function to show the plot
plt.show()
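plt.hist() also returns the counts it plotted, which is a convenient way to check the bins. The sketch below, which assumes the Agg backend, compares them with NumPy's np.histogram() on the same ages data; each bin is 10 years wide and includes its left edge, except the last bin, which also includes 100.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

ages = [2, 5, 70, 40, 30, 45, 50, 45, 43, 40, 44,
        60, 7, 13, 57, 18, 90, 77, 32, 21, 20, 40]

# plt.hist returns (counts per bin, bin edges, bar patches)
counts, edges, patches = plt.hist(ages, bins=10, range=(0, 100))
plt.close()

# The same counting done by NumPy alone, without drawing anything
np_counts, np_edges = np.histogram(ages, bins=10, range=(0, 100))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f}: {int(n)}")
```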
Scatterplot – Showing Correlations
A scatterplot is a type of data display that shows the relationship between two numerical variables.
Each member of the dataset is plotted as a point whose (x, y) coordinates are its values for the two variables.
Program 3: Scatter Plot
import matplotlib.pyplot as plt

# x-axis values
x = [1,2,3,4,5,6,7,8,9,10]
# y-axis values
y = [2,4,5,7,6,8,9,11,12,12]

# plotting points as a scatter plot
plt.scatter(x, y, label= "stars", color= "green",
marker= "*", s=30)

# x-axis label
plt.xlabel('x - axis')
# y-axis label
plt.ylabel('y - axis')
# plot title
plt.title('My scatter plot!')
# showing legend
plt.legend()

# function to show the plot
plt.show()
What is correlation?
When the y variable tends to increase as the x variable increases, we say there is a positive correlation between the variables.
When the y variable tends to decrease as the x variable increases, we say there is a negative correlation between the variables.
When there is no clear relationship between the two variables, we say there is no correlation between the two variables.
Program: To show correlations between two variables
from numpy import mean
from numpy import std
from numpy.random import randn
from numpy.random import seed
from matplotlib import pyplot
# seed random number generator
seed(1)
# prepare data
data1 = 20 * randn(1000) + 100
data2 = data1 + (10 * randn(1000) + 50)
# summarize data
print('data1: mean=%.3f stdv=%.3f' % (mean(data1), std(data1)))
print('data2: mean=%.3f stdv=%.3f' % (mean(data2), std(data2)))
# Output:
# data1: mean=100.776 stdv=19.620
# data2: mean=151.050 stdv=22.358
# plot
pyplot.scatter(data1, data2)
pyplot.show()
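The strength and sign of a correlation can be summarized by Pearson's correlation coefficient r, which lies between -1 and +1; values near 0 indicate no linear correlation. A sketch using np.corrcoef() on the same data1 and data2; data3 is a hypothetical extra series added only to show a negative correlation.

```python
import numpy as np
from numpy.random import randn, seed

seed(1)  # same seed as the program above
data1 = 20 * randn(1000) + 100
data2 = data1 + (10 * randn(1000) + 50)     # rises with data1: positive correlation
data3 = -data1 + (10 * randn(1000) + 250)   # hypothetical: falls as data1 rises

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_pos = np.corrcoef(data1, data2)[0, 1]
r_neg = np.corrcoef(data1, data3)[0, 1]
print("data1 vs data2: r = %.3f" % r_pos)
print("data1 vs data3: r = %.3f" % r_neg)
```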
Program 4: Pie chart
import matplotlib.pyplot as plt

# defining labels
activities = ['eat', 'sleep', 'work', 'play']

# portion covered by each label
slices = [3, 7, 8, 6]

# color for each label
colors = ['r', 'y', 'g', 'b']

# plotting the pie chart
plt.pie(slices, labels = activities, colors=colors,
startangle=90, shadow = True, explode = (0, 0, 0.1, 0),
radius = 1.2, autopct = '%1.1f%%')

# plotting legend
plt.legend()

# showing the plot
plt.show()
Write a Python program to create a pie chart of the popularity of programming languages.
Sample data: (GTU paper winter-21, marks = 7)
Programming languages: Java, Python, PHP, JavaScript, C#, C++
Popularity: 22.2, 17.6, 8.8, 8, 7.7, 6.7

import matplotlib.pyplot as plt

# defining labels
languages = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
# portion covered by each label
slices = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
# color for each label (avoiding white and black so that wedges and percentages stay visible)
colors = ['r', 'y', 'g', 'b', 'c', 'm']
# plotting the pie chart
plt.pie(slices, labels = languages, colors=colors, startangle=140, shadow = True,
explode = (0, 0, 0, 0, 0, 0), radius = 1.2, autopct = '%1.1f%%')
# plotting legend
plt.legend()
# showing the plot
plt.show()
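The popularity values do not add up to 100; pie() divides each value by their sum (about 71) before drawing, and autopct formats the resulting share. A sketch that checks this, assuming the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

languages = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]

# With autopct set, pie() returns the wedges, the label texts and the percentage texts
wedges, texts, autotexts = plt.pie(popularity, labels=languages, autopct='%1.1f%%')
plt.close()

total = sum(popularity)                          # about 71: every value is divided by this
shares = [100 * p / total for p in popularity]   # the percentages shown on the chart
print([t.get_text() for t in autotexts])
```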
Depicting groups using boxplots
What are Boxplots?
• A boxplot is a graphical, standardized way to display the distribution of data based on five key numbers: the “minimum”, the 1st quartile (Q1, 25th percentile), the median (2nd quartile, 50th percentile), the 3rd quartile (Q3, 75th percentile), and the “maximum”.
• With the interquartile range IQR = Q3 - Q1, the “minimum” and “maximum” are defined as Q1 - 1.5 * IQR and Q3 + 1.5 * IQR respectively.
• Any points that fall outside these limits are referred to as outliers.
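The five-number summary and the outlier limits can be computed directly with NumPy. This sketch uses a small made-up dataset with one obvious outlier and NumPy's default (linear) percentile method. plt.boxplot() applies the same 1.5 * IQR rule by default (its whis parameter), although its whiskers stop at the most extreme data points inside the limits.

```python
import numpy as np

# Made-up data: nine ordinary values and one obvious outlier
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                    # interquartile range
lower_limit = q1 - 1.5 * iqr     # the boxplot "minimum"
upper_limit = q3 + 1.5 * iqr     # the boxplot "maximum"

# Points outside the limits are treated as outliers
outliers = data[(data < lower_limit) | (data > upper_limit)]
print("Q1 =", q1, "median =", median, "Q3 =", q3, "IQR =", iqr)
print("limits:", lower_limit, upper_limit, "outliers:", outliers.tolist())
```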
Program: To create a boxplot
# Import libraries
import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
np.random.seed(10)
data = np.random.normal(100, 20, 200)
fig = plt.figure(figsize =(10, 7))
# Creating plot
plt.boxplot(data)
# show plot
plt.show()
Program: To create more than one boxplot
# Import libraries
import matplotlib.pyplot as plt
import numpy as np

# Creating dataset
np.random.seed(10)

data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]

fig = plt.figure(figsize =(10, 7))

# Creating axes instance covering the whole figure
ax = fig.add_axes([0, 0, 1, 1])

# Creating plot: one box per dataset
bp = ax.boxplot(data)

# show plot
plt.show()
Plotting Time Series
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import numpy as np
start_date=dt.datetime(2022,7,1)
end_date=dt.datetime(2022,8,1)
daterange=pd.date_range(start_date,end_date)
sales=(np.random.rand(len(daterange))*50).astype(int)
df=pd.DataFrame(sales,index=daterange,columns=['sales'])
print(df)
df.loc['July 01 2022':"Aug 01 2022"].plot()
plt.ylim(0,50)
plt.xlabel("sales Date")
plt.ylabel("Sales value")
plt.title("Plotting times")
plt.show()

Sample output (the sales values are random, so yours will differ):
            sales
2022-07-01     37
2022-07-02     47
2022-07-03     30
2022-07-04     15
2022-07-05      5
2022-07-06      3
2022-07-07     15
2022-07-08     33
2022-07-09     34
2022-07-10     10
2022-07-11      4
2022-07-12     15
...
2022-07-30     18
2022-07-31     14
2022-08-01     22
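Representing time on axes usually means controlling where the date ticks fall and how they are labeled. The sketch below, one possible approach rather than the only one, uses matplotlib.dates to put a tick on every Monday and label it with day and month; the sales numbers come from a seeded random generator and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range("2022-07-01", "2022-08-01")  # 32 daily timestamps
rng = np.random.default_rng(0)                     # seeded: made-up but repeatable
sales = rng.integers(0, 50, size=len(dates))

fig, ax = plt.subplots()
(line,) = ax.plot(dates, sales)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))  # a tick every Monday
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))             # labels like "04 Jul"
ax.set_ylabel("Sales value")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
plt.close(fig)
```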
Plotting Trends over Time
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import numpy as np
start_date=dt.datetime(2022,7,1)
end_date=dt.datetime(2022,8,1)
daterange=pd.date_range(start_date,end_date)
sales=(np.random.rand(len(daterange))*50).astype(int)
df=pd.DataFrame(sales,index=daterange,columns=['sales'])
# NumPy polyfit() fits the data with a polynomial; degree 1 gives a straight line
lr_coef=np.polyfit(range(0,len(df)),df['sales'],1)
# NumPy poly1d() turns the fitted coefficients into a callable polynomial function
lr_func=np.poly1d(lr_coef)
trend=lr_func(range(0,len(df)))
df['trend']=trend
print(df)
df.loc['July 01 2022':"Aug 01 2022"].plot()
plt.ylim(0,50)
plt.xlabel("sales Date")
plt.ylabel("Sales value")
plt.title("Plotting times")
plt.legend(['sales','trend'])
plt.show()

Sample output (the sales values are random, so yours will differ):
            sales      trend
2022-07-01     48  27.414773
2022-07-02     23  27.210594
2022-07-03     38  27.006415
2022-07-04     24  26.802236
2022-07-05     33  26.598057
2022-07-06     11  26.393878
2022-07-07      5  26.189699
2022-07-08     48  25.985521
...
2022-07-29      5  21.697764
2022-07-30      9  21.493585
2022-07-31     38  21.289406
2022-08-01     35  21.085227
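With degree 1, np.polyfit() returns the slope and intercept of the least-squares line, highest power first. A tiny sketch on made-up, exactly linear data shows what lr_coef and lr_func contain:

```python
import numpy as np

days = np.arange(10)       # day index 0..9, like range(0, len(df)) above
sales = 3 + 2 * days       # made-up series: intercept 3, slope 2

coef = np.polyfit(days, sales, 1)  # degree 1 -> [slope, intercept]
trend_func = np.poly1d(coef)       # callable polynomial 2x + 3
trend = trend_func(days)           # fitted values on the same days
print(coef)
print(trend_func(20))              # the line can also be extended beyond the data
```

On noisy data such as the random sales above, the fitted line no longer passes through every point; its slope is the average daily change.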
Visualizing Graph – Undirected graph
import networkx as nx
import matplotlib.pyplot as plt
G=nx.Graph()
H=nx.Graph()
G.add_node(1)
G.add_nodes_from([2,3])
G.add_nodes_from(range(4,7))
H.add_node(7)
G.add_nodes_from(H)
G.add_edge(1,2)
G.add_edge(1,1)
G.add_edges_from([(2,3),(3,6),(4,6),(5,6)])
H.add_edges_from([(4,7),(5,7),(6,7)])
G.add_edges_from(H.edges())
nx.draw_networkx(G)
plt.show()
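Besides drawing, networkx can report the structure of a graph. This sketch rebuilds the final graph of the example above directly from its edge list (assuming networkx 2.x or later) and inspects it; note that the self-loop (1, 1) adds 2 to the degree of node 1.

```python
import networkx as nx

G = nx.Graph()
G.add_nodes_from(range(1, 8))  # nodes 1..7, as in the example
G.add_edges_from([(1, 2), (1, 1), (2, 3), (3, 6), (4, 6), (5, 6),
                  (4, 7), (5, 7), (6, 7)])

print("nodes:", G.number_of_nodes())
print("edges:", G.number_of_edges())
print("self-loops:", nx.number_of_selfloops(G))
print("degree of node 1:", G.degree[1])
print("neighbors of node 6:", sorted(G.neighbors(6)))
print("connected:", nx.is_connected(G))
```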
Visualizing Graph – directed graph
import networkx as nx
import matplotlib.pyplot as plt
G=nx.DiGraph()
G.add_node(1)
G.add_nodes_from([2,3])
G.add_nodes_from(range(4,6))
# add the path 6 -> 7 -> 8 (in networkx 2.4 and later, add_path is a module-level function)
nx.add_path(G, [6,7,8])

G.add_edge(1,2)
G.add_edge(1,1)
G.add_edges_from([(1,4),(4,5),(2,3),(3,6),(5,6)])
nx.draw_networkx(G)
plt.show()
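In a directed graph the edge direction matters. This sketch builds the same DiGraph as above (assuming networkx 2.4 or later, where add_path() is a module-level function) and checks edges, degrees and reachability:

```python
import networkx as nx

G = nx.DiGraph()
G.add_nodes_from(range(1, 6))  # nodes 1..5
nx.add_path(G, [6, 7, 8])      # edges 6->7 and 7->8
G.add_edges_from([(1, 2), (1, 1), (1, 4), (4, 5), (2, 3), (3, 6), (5, 6)])

print("1->2:", G.has_edge(1, 2), " 2->1:", G.has_edge(2, 1))
print("out-degree of 1:", G.out_degree(1))   # 1->1, 1->2, 1->4
print("in-degree of 6:", G.in_degree(6))     # 3->6, 5->6
print("successors of 1:", sorted(G.successors(1)))
print("path 1 to 8:", nx.has_path(G, 1, 8), " path 8 to 1:", nx.has_path(G, 8, 1))
```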
Program 5: Plotting a curve equation
# importing the required modules
import matplotlib.pyplot as plt
import numpy as np

# setting the x - coordinates
x = np.arange(0, 2*(np.pi), 0.1)
# setting the corresponding y - coordinates
y = np.sin(x)

# plotting the points
plt.plot(x, y)

# function to show the plot
plt.show()
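np.arange(0, 2*np.pi, 0.1) stops at 6.2, just short of 2π, so the sine curve above ends slightly early. A sketch of an alternative, np.linspace(), which fixes the number of points and includes both ends; the cosine curve and legend are additions for comparison, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

x_arange = np.arange(0, 2 * np.pi, 0.1)  # step-based: the stop value is excluded
x = np.linspace(0, 2 * np.pi, 100)       # count-based: both end points included
print(x_arange[-1], x[-1])

plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
plt.legend()
plt.close()
```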
Multiple Subplots in a Single Plot
• The functions axes() and subplot() are both used to create axes; subplot() is used more often.
• subplot(numRows, numCols, plotNum)
Creates axes in a regular grid of numRows by numCols axes.
plotNum selects which subplot becomes the current one.
Subplots are numbered starting at 1 in the upper-left corner and increasing to the right along each row.
• Commas can be omitted if all numbers are single digits, e.g. subplot(211) is the same as subplot(2, 1, 1).
• Subsequent subplot() calls should use the same numbers of rows and columns, so that all axes fit one grid.
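A minimal sketch of subplot() with a 2-by-1 grid, using both the comma form and the three-digit shorthand; the data are those of the subplots() examples in this section and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 1, 3, 6, 9, 12, 17]

fig = plt.figure()
top = plt.subplot(2, 1, 1)   # 2 rows, 1 column, plot number 1 (upper)
top.plot(x, y, 'g--o')
bottom = plt.subplot(212)    # same grid without commas: plot number 2 (lower)
bottom.plot(y, x, 'm--o')
plt.close(fig)
```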
# importing the matplotlib library
import matplotlib.pyplot as plt

# defining the values of X
x =[0, 1, 2, 3, 4, 5, 6]

# defining the value of Y
y =[0, 1, 3, 6, 9, 12, 17]

# creating a figure 'fig' and an array 'axes' of Axes objects
# with 1 row and 2 columns
fig, axes = plt.subplots(1, 2)

# plotting graph for 1st column
axes[0].plot(x, y, 'g--o')

# plotting graph for second column
axes[1].plot(y, x, 'm--o')

# Gives a clean look to the graphs
fig.tight_layout()
plt.show()
# importing the matplotlib library
import matplotlib.pyplot as plt

# defining the values of X
x =[0, 1, 2, 3, 4, 5, 6]

# defining the value of Y
y =[0, 1, 3, 6, 9, 12, 17]

# creating a figure 'fig' and a 2 x 2 array 'axes' of Axes objects
# with 2 rows and 2 columns
fig, axes = plt.subplots(2, 2)

# plotting graph for 1st element
axes[0, 0].plot(x, y, 'g--o')

# plotting graph for 2nd element
axes[0, 1].plot(y, x, 'm--o')

# plotting graph for 3rd element
axes[1, 0].plot(x, y, 'b--o')

# plotting graph for 4th element
axes[1, 1].plot(y, x, 'r--o')

# Gives a clean look to the graphs
fig.tight_layout()
plt.show()
