Data Visualization With Python

Data Visualization with Python
July 29, 2023
[1]: import pandas as pd

import matplotlib.pyplot as plt
# reading the database

data = pd.read_csv("data/tips.csv")
# printing the top 10 rows

display(data.head(5))
total_bill tip sex smoker day time size

0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
0.1 Scatter Plot

Scatter plots are used to observe relationships between variables and uses dots to represent the
relationship between them.
Purpose: Displaying relationships between variables.
matplotlib function: scatter(x, y)
• x, y: The values for the two variables.
[2]: import numpy as np
# value gen
x = range(20)
y = np.arange(50, 70) + (np.random.random(20) * 10)
# plot
# adding figure
plt.figure()
plt.scatter(x, y)
plt.show()
1
Commonly used parameters:
• c: Set the color of the markers.
• s: Set the size of the markers.
• marker: Set the marker style, e.g., circles, triangles, or squares.
• edgecolor: Set the color of the lines on the edges of the markers.
[3]: import matplotlib.pyplot as plt

import numpy as np
x = np.arange(20)
y = np.arange(50, 70) + (np.random.random(20) * 10.)
plt.figure()
plt.scatter(x,
y,
c='green',
s=100,
marker='s',
edgecolor='none')
2
plt.show()
plt.figure()
# Scatter plot with day against tip
plt.scatter(data['total_bill'], data['tip'])
# Adding Title to the Plot

plt.title("Scatter Plot")
# Setting the X and Y labels

plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
3
This graph can be more meaningful if we can add colors and also change the size of the points. We
can do this by using the c and s parameter respectively of the scatter function. We can also show
the color bar using the colorbar() method.

# # reading the database

# data = pd.read_csv("data/tips.csv")
# Scatter plot with day against tip

plt.scatter(data['day'],
data['tip'],
s=data['size']*10,
c=data['total_bill'])

plt.title("Scatter Plot")
4
plt.xlabel('Day')
plt.ylabel('Tip')
plt.colorbar()
plt.show()
1 ## Line Chart
Purpose: Showing trends in data – usually time series data with many time points.
matplotlib function: plot(x, y)
• x: The x-coordinates of the lines or markers.
• y: The y-coordinates of the lines or markers.

# data gen
x = np.linspace(0, 20)
5
y1 = np.sin(x)
y2 = np.cos(x)
# plot
plt.figure()
# line chart
plt.plot(x, y1)
plt.plot(x, y2)
plt.show()

• color: Set the color of the line.
• linestyle: Set the line style, e.g., solid, dashed, or none.
• linewidth: Set the line thickness.
• marker: Set the marker style, e.g., circles, triangles, or none.
• markersize: Set the marker size.
• label: Set the label for the line that will show up in the legend.
6
[7]: # data gen.
x = np.linspace(0, 20)
y1 = np.sin(x)
y2 = np.cos(x)
# plot
plt.figure()
plt.plot(x, y1,
color='black',
linestyle='-',
linewidth=2,
marker='s',
markersize=6,
label='sin values')
plt.plot(x, y2,
color='gray',
linestyle='--',
linewidth=2,
marker='^',
markersize=6,
label='cos values')
plt.legend()
plt.show()
7

# data = pd.read_csv("data/tips.csv")
plt.figure()
# Line plot with day against tip
plt.plot(data['tip'], color='gray')

plt.title("Line chart")

plt.xlabel('Index')
plt.ylabel('Tip')
plt.show()
8
1.1 Bar Chart
Purpose: Comparing categories OR showing temporal trends in data with few (< 4) time points.
matplotlib function: bar(left, height)
• left: The x coordinate(s) of the left sides of the bars.
• height: The height(s) of the bars.
[9]: years = np.arange(2012, 2015)

values = [2, 5, 9]
plt.figure()
plt.bar(years, values)
plt.show()
9
• color: Set the color of the bars.
• edgecolor: Set the color of the lines on the edges of the bars.
• width: Set the width of the bars.
• align: Set the alignment of the bars, e.g., center them on the x coordinate(s).
• label: Set the label for the bar that will show up in the legend.
[10]: years = np.arange(2012, 2015)

category1_values = [2, 5, 9]
category2_values = [7, 6, 3]
plt.figure()
plt.bar(years - 0.2,
category1_values,
color='blue',
edgecolor='none',
width=0.4,
align='center',
label='y1')
10
plt.bar(years + 0.2,
category2_values,
color='orange',
edgecolor='none',
width=0.4,
align='center',
label='y2')
plt.xticks(years, [str(year) for year in years])
plt.legend()
plt.show()
Purpose: Comparing categories.

matplotlib function: barh(bottom, width)
• bottom: The y coordinate(s) of the bars.
• width: The width(s) of the bars.
11
[11]: categories = ['A', 'B', 'C', 'D', 'E']
values = [7, 12, 4, 2, 9]
# print(np.arange(len(categories)))
plt.figure()
plt.barh(np.arange(len(categories)), values)
plt.yticks(np.arange(len(categories)),
[f'Category {x}' for x in categories])
plt.show()

• color: Set the color of the bars.
• edgecolor: Set the color of the lines on the edges of the bars.
• height: Set the height of the bars.
• align: Set the alignment of the bars, e.g., center them on the y coordinate(s).
[12]: categories = ['A', 'B', 'C', 'D', 'E']

values = [7, 12, 4, 2, 9]
plt.figure()
12
plt.barh(np.arange(len(categories)),
values,
color='green',
edgecolor='none',
height=0.6,
align='center')
['Category {}'.format(x) for x in categories])
plt.show()
categories = ['A', 'B', 'C', 'D', 'E']

values = [7, 12, 4, 2, 9]
# convert to dataframe
category_values = pd.DataFrame({'categories': categories,
'values': values})
# category values wise sorted
13
category_values.sort_values(by='values', ascending=True, inplace=True)
categories = category_values['categories'].values
values = category_values['values'].values
# plot
plt.figure()
plt.barh(np.arange(len(categories)),
values,
color='blue',
edgecolor='none',
height=0.6,
align='center')
['Category {}'.format(x) for x in categories])
plt.show()
[14]: # Bar chart with day against tip

plt.bar(data['day'], data['tip'], color='green')
14
plt.title("Bar Chart")

plt.xlabel('Day')
plt.ylabel('Tip')
plt.show()
2 Pie charts
Purpose: Displaying a simple proportion.
matplotlib function: pie(sizes)
• sizes: The size of the wedges as either a fraction or number.
[15]: counts = [17, 14]
plt.figure(figsize=(4, 4))
15
plt.pie(counts)
plt.show()

• colors: Set the colors of the wedges.
• labels: Set the labels of the wedges.
• startangle: Set the angle that the wedges start at.
• autopct: Set the percentage display format of the wedges.
[16]: counts = [17, 14]
plt.pie(counts,
colors=['blue', 'orange'],
labels=['Category A', 'Other categories'],
startangle=90,
autopct='%1.2f%%')
plt.show()
16
[17]: grp_data = data.groupby('sex').tip.mean().reset_index()
print(grp_data)
plt.pie(grp_data.tip, labels=grp_data.sex, autopct='%1.2f%%')

plt.show()
sex tip
0 Female 2.833448
1 Male 3.089618
17
2.1 Histogram
Purpose: Showing the spread of a data column.
matplotlib function: hist(x)
• x: List of values to display the distribution of.
[18]: column_data = np.random.normal(42, 3, 1000)
plt.figure()
plt.hist(column_data)
plt.show()
18
• color: Set the color of the bars in the histogram.
• bins: Set the number of bins to display in the histogram, or specify specific bins.
[19]: column_data = np.random.normal(42, 3, 1000)
plt.figure()
plt.hist(column_data,
color='red',
bins=50)
plt.show()
19

# data = pd.read_csv("tips.csv")
# hostogram of total_bills
plt.hist(data['total_bill'], bins=20)
plt.title("Histogram")
# Adding the legends

plt.show()
20
3 Subplots
Purpose: Allows you to place multiple charts in a figure.
matplotlib function: subplot(nrows, ncols, plot_number)
• nrows: The number of rows in the figure.
• ncols: The number of columns in the figure.
• plot_number: The placement of the chart (starts at 1).
[21]: plt.figure()
for i in range(1, 7):

plt.subplot(3, 2, i)
plt.title(i)
plt.xticks([])
plt.yticks([])
plt.tight_layout()
21
[22]: dist1 = np.random.normal(42, 7, 1000)
dist2 = np.random.normal(59, 3, 1000)
plt.subplot(1, 2, 1)
plt.hist(dist1)
plt.title('dist1')
plt.subplot(1, 2, 2)
plt.scatter(dist2, dist1)
plt.xlabel('dist2')
plt.ylabel('dist1')
plt.tight_layout()
22
[23]: years = np.arange(2010, 2016)
for category_num in range(1, 17):

plt.subplot(4, 4, category_num)
y_vals = np.arange(10, 16) + (np.random.random(6) * category_num / 4.)
plt.plot(years, y_vals)
plt.ylim(10, 18)
plt.xticks(years, [str(year) for year in years])
plt.title('Category {}'.format(category_num))
plt.tight_layout()
23
3.1 Styles
# data
x1_values = [2012, 2013, 2014, 2015]
y1_values = [4.3, 2.5, 3.5, 4.5]
x2_values = [2012, 2013, 2014, 2015]

y2_values = [2.4, 4.4, 1.8, 2.8]
x3_values = [2012, 2013, 2014, 2015]

y3_values = [2, 2, 3, 5]
24
# plot
plt.figure()
plt.plot(x1_values, y1_values, label='Python')
plt.plot(x2_values, y2_values, label='JavaScript')
plt.plot(x3_values, y3_values, label='R')
plt.xlim(2012, 2015)
plt.ylim(0, 6)
plt.xticks([2012, 2013, 2014, 2015], ['2012', '2013', '2014', '2015'])
plt.yticks([1, 2, 3, 4, 5])
plt.xlabel('year')
plt.ylabel('Web Searches')
plt.legend(loc='upper center', ncol=3)

plt.grid(True)
plt.savefig('web-searches.png', dpi=150)
25
[25]: plt.figure()
plt.plot(x1_values, y1_values, label='Python', lw=3, color='#1f77b4')
plt.plot(x2_values, y2_values, label='JavaScript', lw=3, color='#ff7f0e')
plt.plot(x3_values, y3_values, label='R', lw=3, color='#2ca02c')
plt.xlim(2012, 2015)
plt.ylim(0, 6)
plt.xticks([2012, 2013, 2014, 2015], ['2012', '2013', '2014', '2015'])
plt.yticks([1, 2, 3, 4, 5])
plt.xlabel('')
plt.ylabel('Web Searches')
plt.legend(loc='upper center', ncol=3)

# plt.legend(loc='lower center', ncol=3)
plt.grid(True)
plt.savefig('web-searches.png', dpi=150)
26
4 Advanced Visualization
from matplotlib import pyplot as plt
import seaborn as sns
[27]: df = pd.read_csv('data/tips.csv')
df
[27]: total_bill tip sex smoker day time size

.. … … … … … … …
239 29.03 5.92 Male No Sat Dinner 3
240 27.18 2.00 Female Yes Sat Dinner 2
241 22.67 2.00 Male Yes Sat Dinner 2
242 17.82 1.75 Male No Sat Dinner 2
243 18.78 3.00 Female No Thur Dinner 2
[244 rows x 7 columns]
sns.set(rc = {'figure.figsize':(14, 6)})

ax = sns.histplot(x = data['total_bill'], kde=True)
quant_5 = data['total_bill'].quantile(0.05)
quant_dict = {'5%': quant_5, '25%': quant_25, '50%': quant_50, '75%': quant_75,␣
↪'95%': quant_95}
kdeline = ax.lines[0]
xs = kdeline.get_xdata()
ys = kdeline.get_ydata()
for key, value in quant_dict.items():

height = np.interp(value, xs, ys)
ax.vlines(value, 0, height, ls=':')
ax.text(value , height * 0.5, key, rotation=0)
plt.show()
27
4.0.1 Pairplot
[29]: sns.pairplot(df)
plt.show()
28
[30]: df.value_counts('smoker')
[30]: smoker
No 151
Yes 93
dtype: int64
[31]: sns.pairplot(df, hue='smoker')

plt.show()
29
4.0.2 Linear regression with distributions
[32]: sns.jointplot(x="total_bill", y="tip", data=df, kind="reg")

plt.show()
30
4.0.3 Box Plot
[33]: sns.boxplot(x=df["total_bill"])
plt.show()
31
[34]: sns.boxplot(x="day", y="total_bill", data=df)
plt.show()
4.0.4 Heatmap
[35]: # Load the example flights dataset and convert to long-form

flights_long = sns.load_dataset("flights")
flights_long
[35]: year month passengers

0 1949 Jan 112
32
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. … … …
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
[144 rows x 3 columns]
[36]: flights = flights_long.pivot("month", "year", "passengers")

flights
C:\Users\88019\AppData\Local\Temp\ipykernel_7132\254108779.py:1: FutureWarning:
In a future version of pandas all arguments of DataFrame.pivot will be keyword-
only.
flights = flights_long.pivot("month", "year", "passengers")
[36]: year 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
month
Jan 112 115 145 171 196 204 242 284 315 340 360 417
Feb 118 126 150 180 196 188 233 277 301 318 342 391
Mar 132 141 178 193 236 235 267 317 356 362 406 419
Apr 129 135 163 181 235 227 269 313 348 348 396 461
May 121 125 172 183 229 234 270 318 355 363 420 472
Jun 135 149 178 218 243 264 315 374 422 435 472 535
Jul 148 170 199 230 264 302 364 413 465 491 548 622
Aug 148 170 199 242 272 293 347 405 467 505 559 606
Sep 136 158 184 209 237 259 312 355 404 404 463 508
Oct 119 133 162 191 211 229 274 306 347 359 407 461
Nov 104 114 146 172 180 203 237 271 305 310 362 390
Dec 118 140 166 194 201 229 278 306 336 337 405 432
[37]: # Draw a heatmap with the numeric values in each cell

f, ax = plt.subplots(figsize=(16, 8))
sns.heatmap(flights, annot=True, fmt=".0f")
plt.show()
33
Visit this page to learn more -> https://seaborn.pydata.org/examples/index.html
34

Data Visualization With Python

Uploaded by

Copyright:

Available Formats

You might also like

Data Visualization With Python

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Visualization With Python

Uploaded by

Copyright:

Available Formats

Data Visualization with Python

July 29, 2023

[1]: import pandas as pd

# reading the database

# printing the top 10 rows

total_bill tip sex smoker day time size

0.1 Scatter Plot

[2]: import numpy as np

[3]: import matplotlib.pyplot as plt

[4]: import matplotlib.pyplot as plt

# Adding Title to the Plot

# Setting the X and Y labels

[5]: import pandas as pd

# # reading the database

# Scatter plot with day against tip

# Adding Title to the Plot

[6]: import numpy as np

Commonly used parameters:

# # reading the database

# Adding Title to the Plot

# Setting the X and Y labels

[9]: years = np.arange(2012, 2015)

[10]: years = np.arange(2012, 2015)

plt.xticks(years, [str(year) for year in years])

Purpose: Comparing categories.

Commonly used parameters:

[12]: categories = ['A', 'B', 'C', 'D', 'E']

[13]: import pandas as pd

categories = ['A', 'B', 'C', 'D', 'E']

# category values wise sorted

[14]: # Bar chart with day against tip

# Setting the X and Y labels

[15]: counts = [17, 14]

Commonly used parameters:

[16]: counts = [17, 14]

plt.pie(grp_data.tip, labels=grp_data.sex, autopct='%1.2f%%')

[18]: column_data = np.random.normal(42, 3, 1000)

[19]: column_data = np.random.normal(42, 3, 1000)

# # reading the database

# Adding the legends

for i in range(1, 7):

for category_num in range(1, 17):

x2_values = [2012, 2013, 2014, 2015]

x3_values = [2012, 2013, 2014, 2015]

plt.legend(loc='upper center', ncol=3)

plt.legend(loc='upper center', ncol=3)

[27]: total_bill tip sex smoker day time size

[244 rows x 7 columns]

[28]: import numpy as np

sns.set(rc = {'figure.figsize':(14, 6)})

for key, value in quant_dict.items():

[31]: sns.pairplot(df, hue='smoker')

[32]: sns.jointplot(x="total_bill", y="tip", data=df, kind="reg")

[35]: # Load the example flights dataset and convert to long-form

[35]: year month passengers

[144 rows x 3 columns]

[36]: flights = flights_long.pivot("month", "year", "passengers")

[37]: # Draw a heatmap with the numeric values in each cell