
DATA VISUALIZATION WITH MATPLOTLIB & SEABORN

With Expert Python Instructor Chris Bruehl

*Copyright Maven Analytics, LLC


COURSE STRUCTURE

This is a project-based course for students looking for a practical, hands-on approach to learning data visualization with Python using the Matplotlib and Seaborn libraries

Additional resources include:
• Downloadable PDF to serve as a helpful reference when you’re offline or on the go
• Quizzes & Assignments to test and reinforce key concepts, with step-by-step solutions
• Interactive demos to keep you engaged and apply your skills throughout the course
COURSE OUTLINE

1 Intro to Data Visualization
Cover key data visualization best practices for clear communication, with tips for choosing the right chart, formatting it effectively, and using it to tell a story

2 Matplotlib Fundamentals
Introduce the Matplotlib library and use it to build & customize several chart types, including line charts, bar charts, pie charts, scatterplots, and histograms
PROJECT: Visualizing Coffee Industry Data

3 Advanced Customization
Apply advanced customization techniques in Matplotlib, including multi-chart figures, custom layouts & colors, style sheets, and more
PROJECT: Consolidating Coffee Industry Data into a Report

4 Data Viz with Seaborn
Visualize data with Seaborn, another Python library that introduces new chart types and layouts, and interacts well with Matplotlib
PROJECT: Highlighting Insights from the Automotive Auction Industry
WELCOME TO MAVEN CONSULTING GROUP

THE SITUATION
You’ve just been hired as an Associate Consultant for Maven Consulting Group (MCG), a multinational firm that provides strategic advice to companies across different industries. Your new role will see you take on projects in the hotel, coffee, automotive, and diamond industries.

THE ASSIGNMENT
Your task is to effectively visualize data from these industries to deliver key insights to MCG’s clients. This will range from analyzing hotel customer demographics to understanding the major players in the global coffee industry.

THE OBJECTIVES
• Use Pandas to read & manipulate multiple datasets
• Use Matplotlib to visualize data & communicate insights, and then build reports to consolidate your findings
• Use Seaborn to conduct advanced exploratory analysis and aid the decision-making process


SETTING EXPECTATIONS

This course covers the core functionality for Matplotlib & Seaborn
• We’ll cover chart types, common customization options, and best practices for visualizing and analyzing data
• We’ll give you the tools to use the official documentation to apply any customization option not covered in the course

We’ll focus on creating static visuals & dashboards
• Interactive data visualization with Python will be covered in a separate course

We’ll use Jupyter Notebooks as our primary coding environment
• Jupyter Notebooks are free to use, and the industry standard for conducting data analysis with Python (we’ll introduce Google Colab as an alternative, cloud-based environment as well)

You do NOT need to be a Python expert to take this course
• It is strongly recommended that you complete our Python Foundations and Data Analysis with Pandas courses, or have a solid understanding of basic Python syntax and DataFrame manipulation with the Pandas library
INSTALLING ANACONDA (MAC)

1) Go to anaconda.com/products/distribution and click to download the installer
2) Click X on the Anaconda Nucleus pop-up (no need to launch)
3) Launch the downloaded Anaconda pkg file
4) Follow the installation steps (default settings are OK)


INSTALLING ANACONDA (PC)

1) Go to anaconda.com/products/distribution and click to download the installer
2) Click X on the Anaconda Nucleus pop-up (no need to launch)
3) Launch the downloaded Anaconda exe file
4) Follow the installation steps (default settings are OK)


LAUNCHING JUPYTER

1) Launch Anaconda Navigator
2) Find Jupyter Notebook and click Launch


YOUR FIRST JUPYTER NOTEBOOK

1) Once inside the Jupyter interface, create a folder to store your notebooks for the course
NOTE: You can rename your folder by clicking “Rename” in the top left corner

2) Open your new coursework folder and launch your first Jupyter notebook!
NOTE: You can rename your notebook by clicking on the title at the top of the screen


THE NOTEBOOK SERVER

NOTE: When you launch a Jupyter notebook, a terminal window may pop up as well; this is called a notebook server, and it powers the notebook interface

If you close the server window, your notebooks will not run!

Depending on your OS and method of launching Jupyter, a server window may not open. As long as you can run your notebooks, don’t worry!


ALTERNATIVE: GOOGLE COLAB

Google Colab is Google’s cloud-based version of Jupyter Notebooks

To create a Colab notebook:
1. Log in to a Gmail account
2. Go to colab.research.google.com
3. Click “new notebook”

Colab is very similar to Jupyter Notebooks (they even share the same file extension); the main difference is that you are connecting to Google Drive rather than your machine, so files will be stored in Google’s cloud
DATA VISUALIZATION

In this section we’ll cover key data visualization best practices for clear communication, with tips for choosing the right chart, formatting it effectively, and using it to tell a story

GOALS FOR THIS SECTION:
• Understand the purpose behind visualizing data
• Learn the common chart types and their use cases
• Apply data visualization best practices to create clear and compelling charts
• Address common errors and how to avoid them


WHY VISUALIZE DATA?

Data visualization allows you to bring your data to life
• To the human brain, raw data looks like meaningless numbers and noise
• We need clear patterns and visual cues to help us quickly make sense of complex information

Prefrontal Cortex
• Located in the frontal lobe
• Responsible for cognitive functioning & problem solving
• Helps us make sense of non-visual information (like raw data)
• Slow & conscious

Visual Cortex
• Located in the occipital lobe
• Responsible for visual perception & understanding
• Helps us make sense of colors, patterns, shapes, sizes, etc.
• Instantaneous & subconscious

Data visualization puts both our prefrontal and visual cortex to work, combining the power of cognition (slow and conscious) and perception (instantaneous)
THE TEN SECOND RULE

In 10 seconds, what can you learn from the data below?

What if you were given the averages?

What if you visualize it?

This is a slight twist on Anscombe’s Quartet. Despite sharing nearly identical descriptive stats, each series tells a very different visual story


THE 3 KEY QUESTIONS

The 3 key questions are a great way to help choose the right visual

1. What type of data are you working with?
• Time-series: Data that spans across continuous time periods
• Categorical: Data that can be split up into groups or categories
• Numeric: Data with quantitative values, either discrete or continuous
• Hierarchical: Data with natural groups and sub-groups

2. What do you want to communicate?
• Comparison: Compares values over time or across categories
• Composition: Breaks down the component parts of a whole
• Distribution: Shows the frequency of values within a series
• Relationship: Shows the correlation between multiple variables

3. Who is the end user and what do they need?
• Analyst: Likes to see details and understand what’s happening at a granular level
• Manager: Wants summarized information with clear, actionable insights
• Executive: Needs high-level, clear KPIs to track business health and performance
• General Public: Requires engaging visuals and a clear story to follow


ESSENTIAL VISUALS

• KPI CARD: Sometimes simple text works best
• PIE CHART: Sort the slices, keep them under ~5, and focus on one
• TABLE: Add a color scale to highlight patterns in the data
• LINE CHART: The dates must be continuous
• BAR CHART: Baseline must start at zero
• SCATTER PLOT: Remember that correlation does not imply causation
• AREA CHART, 100% STACKED: Comparison & composition
• HISTOGRAM: Avoid using too many bins!


CHART FORMATTING

Chart formatting should be used to eliminate noise & facilitate understanding

BEFORE: Cluttered chart
This is the right chart type… so why is it so hard to understand the visual?
× The chart border and gridlines are more distracting than useful
× The vertical axis labels are hard to read and lack context – it’s using scientific notation and doesn’t start at 0
× Data labels can help add context, but they just add noise here
× It’s not clear what each line represents

PRO TIP: Be intentional about the formatting you apply – don’t just use the default settings!

AFTER: Clear chart
PRO TIPS:
✓ Remove the chart border & gridlines
✓ Format the axis labels clearly
✓ Add context with the chart title
✓ Create a visual order
✓ Make sure the story is clear

“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
Antoine de Saint-Exupéry
STORYTELLING

Descriptive titles and data labels can be used to tell a clear story within your visuals

AFTER: Compelling chart
PRO TIPS:
✓ Leverage the title to guide the audience toward specific insights
✓ Insert text & shapes directly inside the chart
✓ Use data labels and annotations to draw attention to the main data points
✓ Use color strategically


COMMON ERRORS

Choosing the wrong visual to represent the type of data

• Using a line chart, which is meant for time series data, with categorical data gives the false sense of a trend
• Bar charts are great for showing comparison with categorical data
• While a tree map can work, comparisons and compositions are harder to make than with a bar or pie chart. It’s best to use tree maps with hierarchical data

PRO TIP: Don’t prioritize variety over effectiveness; use the right chart for the job!


COMMON ERRORS

Including too many series in a single visual

• It’s hard to focus or extract any valuable information
• Try highlighting the series you want, or aggregating other categories
• You can also group the other categories into a single series


COMMON ERRORS

Providing little to no context with text and labels

• What does each line represent?
• What are these values?
• What does each period represent?

When removing elements from a chart to reduce clutter and noise, remember to keep all the elements that add understanding


COMMON ERRORS

Using inconsistent colors between related visuals

• Using different colors for the same series makes it difficult to associate them visually
• Consistency gains more importance as the number of visuals increases, making it critical for dashboards
• Using the same colors consistently makes them easier to understand, and in some cases allows you to remove the legend


KEY TAKEAWAYS

Always answer the 3 key questions to choose the right visual
• What type of data are you working with? What do you want to communicate? Who is the end user?

Do NOT prioritize variety over effectiveness
• Choose chart types based on how clearly they communicate the data underneath – you can customize later!

Eliminate noise and distractions to facilitate understanding
• “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”

Tell a story with the data to guide the user to the insights
• Use titles, strategic labels, and callouts to create a clear narrative
INTRO TO MATPLOTLIB

In this section we’ll introduce the Matplotlib library and use it to build & customize several chart types, including line charts, bar charts, pie charts, scatterplots, and histograms

GOALS FOR THIS SECTION:
• Understand the difference between the two primary Matplotlib plotting frameworks
• Identify the key components of an object-oriented plot
• Build different variations of line, bar and pie charts, as well as scatterplots and histograms
• Customize your charts by adding custom titles, labels, legends, annotations and much more!


MEET MATPLOTLIB

Matplotlib is an open-source Python library built for data visualization that lets you produce a wide variety of highly customizable charts & graphs

• ‘plt’ is the standard alias for Matplotlib’s pyplot module
• The plot() function creates a line chart by default, using the index as the x-values and the list elements as the y-values
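As a minimal sketch of the two points above (the sample values are made up for illustration), importing pyplot under the `plt` alias and passing a single list to `plot()` looks like this:

```python
import matplotlib.pyplot as plt  # "plt" is the conventional alias

values = [3, 5, 2, 8, 6]  # hypothetical sample data

# plot() draws a line chart; with a single list, the x-values are the index 0..4
lines = plt.plot(values)

# In a Jupyter notebook the chart renders when the cell runs;
# in a plain script, call plt.show() to display it.
```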


COMPATIBLE DATA TYPES

Matplotlib can plot many data types, including base Python sequences, NumPy Arrays, and Pandas Series & DataFrames

Examples shown: Python List, Pandas Series, Pandas DataFrame


PLOTTING METHODS

Matplotlib has two plotting methods, or interfaces:

PyPlot API
Charts are created with the plot() function, and modified with additional functions

Object-Oriented
Charts are created by defining a plot object, and modified using figure & axis methods
1. Create the figure object and assign it to the ‘fig’ variable
2. Add a chart, or axis, object to the figure and assign it to the ‘ax’ variable
3. Call the axis plot() method to draw the chart

We’ll mostly focus on the Object-Oriented approach, as it provides clearer control over customization
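The two interfaces can be compared side by side. This is only a sketch with made-up data; the variable names `fig` and `ax` follow the convention described in this section:

```python
import matplotlib.pyplot as plt

y = [1, 4, 9, 16]  # hypothetical sample data

# PyPlot interface: functions act on the "current" figure and chart
plt.figure()
plt.plot(y)
plt.title("PyPlot interface")

# Object-Oriented interface: create the objects, then call their methods
fig = plt.figure()        # 1. create the figure
ax = fig.add_subplot()    # 2. add a chart (axis) to the figure
ax.plot(y)                # 3. draw the chart with the axis plot() method
ax.set_title("Object-Oriented interface")
```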


OBJECT-ORIENTED PLOTTING

Object-Oriented plots are built by adding axes, or charts, to a figure
• The subplots() function lets you create the figure and axes in a single line of code
• You can then use figure & axis methods to customize the different elements in the plot

The example on this slide works in three steps:
1. Creates the figure and axis
2. Plots “y”
3. Adds a title to the figure and axis

We’ll start by adding a single subplot to each figure for now, but will dive deeper into subplots later in the course!
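A minimal sketch of an object-oriented plot built with subplots(), using made-up values for `y` and placeholder title strings:

```python
import matplotlib.pyplot as plt

y = [2, 3, 5, 7, 11]  # hypothetical sample data

fig, ax = plt.subplots()        # creates the figure and axis in one line
ax.plot(y)                      # plots "y"
fig.suptitle("Figure title")    # adds a title to the figure
ax.set_title("Axis title")      # adds a title to the axis (chart)
```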


PLOTTING DATAFRAMES

When plotting DataFrames using the Object-Oriented interface, Matplotlib will use the index as the x-axis and plot each column as a separate series by default

Plotting each series independently allows for improved customization
• ax.plot(x-axis series, y-series values)
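Both approaches can be sketched as follows. The DataFrame, its column names, and the styling choices are hypothetical stand-ins, not the course dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical revenue data with a monthly datetime index
df = pd.DataFrame(
    {"lodging": [120, 135, 150, 160], "other": [30, 28, 35, 40]},
    index=pd.date_range("2022-01-01", periods=4, freq="MS"),
)

# Option 1: pass the whole DataFrame; each column becomes its own series
fig1, ax1 = plt.subplots()
ax1.plot(df)

# Option 2: plot each series separately to control its label and style
fig2, ax2 = plt.subplots()
ax2.plot(df.index, df["lodging"], label="Lodging", color="navy")
ax2.plot(df.index, df["other"], label="Other", linestyle="--")
ax2.legend()
```

The second form takes more lines, but each call can carry its own formatting arguments.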


ASSIGNMENT: PLOTTING DATAFRAMES

NEW MESSAGE
August 29, 2022
From: Ian Intern (Summer Consultant)
Subject: Do you know Matplotlib?

Hi!

I need someone who knows Matplotlib for help with some client work.

Can you plot Lodging Revenue and Other Revenue over time for our hotel client?

Thanks!

section02_assignments.ipynb


SOLUTION: PLOTTING DATAFRAMES

Solution code (two options): plot each series, or plot the DataFrame

section02_solutions.ipynb


FORMATTING OPTIONS

Matplotlib has these formatting options for PyPlot and Object-Oriented plots:

Element            Object-Oriented        PyPlot
Figure Title       fig.suptitle()         plt.suptitle()
Chart Title        ax.set_title()         plt.title()
X-Axis Label       ax.set_xlabel()        plt.xlabel()
Y-Axis Label       ax.set_ylabel()        plt.ylabel()
Legend             ax.legend()            plt.legend()
X-Axis Limit       ax.set_xlim()          plt.xlim()
Y-Axis Limit       ax.set_ylim()          plt.ylim()
X-Axis Ticks       ax.set_xticks()        plt.xticks()
Y-Axis Ticks       ax.set_yticks()        plt.yticks()
Vertical Line      ax.axvline()           plt.axvline()
Horizontal Line    ax.axhline()           plt.axhline()
Text               ax.text()              plt.text()
Spines (borders)   ax.spines[‘side’]      plt.gca().spines[‘side’]


CHART TITLES

The set_title(), set_xlabel(), and set_ylabel() methods let you add chart titles and axis labels
• fig.suptitle() serves as an overall figure title


FONT SIZES

You can modify chart font sizes with the “fontsize” argument
• You can specify the size in points (10, 12, etc.) or relative size (“smaller”, “x-large”, etc.)
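A short sketch that combines titles, axis labels, and font sizes. The data and label text are invented for illustration:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 20, 15])  # hypothetical sample data

fig.suptitle("Hotel Revenue", fontsize=16)         # overall figure title, size in points
ax.set_title("Monthly totals", fontsize="large")   # chart title, relative size
ax.set_xlabel("Month", fontsize=12)
ax.set_ylabel("Revenue (USD)", fontsize=12)
```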


CHART LEGENDS

The legend() method lets you add a chart legend to identify each series
• The series labels are used by default, but custom values can also be passed through
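A minimal sketch of both legend behaviors, with made-up series and label names. Note that calling legend() a second time on the same chart replaces the first legend:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], label="Lodging")  # hypothetical series
ax.plot([3, 2, 1], label="Other")

# Default: the legend uses each series' label
default_legend = ax.legend()

# Custom: pass a list of names, in plotting order (this replaces the legend above)
custom_legend = ax.legend(["Rooms", "Extras"])
```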


LEGEND LOCATION

You can change the legend location with the “loc” or “bbox_to_anchor” arguments
• “loc” lets you set a predetermined location option
• “bbox_to_anchor” lets you set specific (x, y) coordinates

Options for “loc”: best (default), upper right, upper left, upper center, lower right, lower left, lower center, center right, center left, center

“bbox_to_anchor” coordinates run from 0 to 1 along each axis of the chart area. Setting coordinates beyond 1 will push the legend outside the chart area (useful when there is no whitespace!)
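The two arguments can be sketched like this; the series and the specific coordinates (1.02, 1) are illustrative choices, not values from the course:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], label="Lodging")  # hypothetical series
ax.plot([3, 2, 1], label="Other")

# Option 1: a named location inside the chart
ax.legend(loc="upper center")

# Option 2: anchor the legend's upper-left corner just past the right edge (x > 1)
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1))
```

When both arguments are given, “loc” names the corner of the legend box that is placed at the “bbox_to_anchor” coordinates.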
LINE STYLE

You can change the line style with the “linestyle”, “linewidth”, and “color” arguments
• Common line styles are “solid”, “dashed”, or “dotted” (you can also use “-”, “--”, or “:”)

We will dive into colors in depth later, including changing the default color palette and using hex color codes!
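A brief sketch of the three styling arguments on two made-up series:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]  # hypothetical sample data
fig, ax = plt.subplots()

# Named style, thicker line
ax.plot(x, [2, 4, 6, 8], linestyle="dashed", linewidth=2, color="black")

# Shorthand style (":" is the same as "dotted")
ax.plot(x, [1, 3, 5, 7], linestyle=":", linewidth=1, color="orange")
```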


AXIS LIMITS

The set_ylim() and set_xlim() methods let you modify the axis limits
• ax.set_xlim(lower limit, upper limit)

Note that changing the limits of a date x-axis may change the interval between ticks!

PRO TIP: Keeping the base of the y-axis at 0 highlights the true magnitude of change across periods and the differences between series
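A minimal sketch with invented data, keeping the y-axis baseline at 0 as the tip suggests:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021, 2022], [42, 45, 44, 48])  # hypothetical sample data

ax.set_xlim(2019, 2022)  # (lower limit, upper limit)
ax.set_ylim(0, 60)       # start the y-axis at zero
```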


FIGURE SIZE

You can adjust the figure size with the “figsize” argument
• figsize=(width, height) – the default is 6.4 x 4.8 inches

PRO TIP: Increasing figure size lets you add whitespace to your visual, which can reduce clutter and add space to crowded axes
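For instance, a wider-than-default figure could be created as below (the 10 x 4 size is an arbitrary choice for illustration):

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot([1, 2, 3], [4, 5, 6])  # hypothetical sample data
```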


CUSTOM X-TICKS

You can apply custom x-ticks with the set_xticks() method (or the xticks() function in PyPlot)
• ax.set_xticks(iterable)

The example on this slide sets the x-ticks at every 2nd date from the index and rotates them by 45 degrees
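One way to sketch "every 2nd date, rotated 45 degrees" is shown below. The Series is made up, and the rotation is applied here with tick_params(), which is one of several ways to rotate tick labels:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly series with a datetime index
dates = pd.date_range("2022-01-01", periods=6, freq="MS")
sales = pd.Series([5, 7, 6, 9, 11, 10], index=dates)

fig, ax = plt.subplots()
ax.plot(sales.index, sales)

ax.set_xticks(sales.index[::2])              # keep every 2nd date from the index
ax.tick_params(axis="x", labelrotation=45)   # rotate the tick labels by 45 degrees
```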
ADDING VERTICAL LINES

You can add vertical lines to mark key points with the axvline() method

Set the x-coordinate (on a date axis, this is the number of days since Jan 1, 1970) and an optional color and style
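A sketch of marking a date on a time-series chart. The data and the marked date are invented; `matplotlib.dates.date2num()` is used here to convert a date into the numeric x-coordinate, so the number does not have to be worked out by hand:

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical monthly data
dates = [dt.datetime(2022, month, 1) for month in range(1, 7)]
values = [5, 7, 6, 9, 11, 10]

fig, ax = plt.subplots()
ax.plot(dates, values)

# Convert the date to Matplotlib's numeric date coordinate, then draw the line
x_pos = mdates.date2num(dt.datetime(2022, 3, 1))
vline = ax.axvline(x_pos, color="black", linestyle="--")
```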


TEXT

You can add text at specific coordinates with the text() method
• ax.text(x-coordinate, y-coordinate, string, additional text formatting)
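A minimal sketch, with made-up data and label text, of placing text at data coordinates:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [10, 12, 9, 15])  # hypothetical sample data

# x-coordinate, y-coordinate, string, then optional formatting
label = ax.text(2, 9.5, "Seasonal dip", fontsize=11, color="red")
```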


PRO TIP: ANNOTATIONS

Annotations are a great way to call out and label important datapoints
• ax.annotate(string, datapoint coordinate, text coordinate, arrow style dictionary, text formatting)

Annotations have many more options that we won’t cover in depth, but the documentation has great examples worth looking into!

For more info on annotations, visit: https://matplotlib.org/stable/tutorials/text/annotations.html#sphx-glr-tutorials-text-annotations-
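The annotate() arguments listed above map onto keyword arguments roughly as follows; the data, label, and coordinates are illustrative only:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [10, 12, 9, 15])  # hypothetical sample data

note = ax.annotate(
    "Peak",                                           # string to display
    xy=(3, 15),                                       # datapoint being labeled
    xytext=(1.5, 14),                                 # where the text sits
    arrowprops={"arrowstyle": "->", "color": "gray"}, # arrow style dictionary
    fontsize=11,                                      # text formatting
)
```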
REMOVING CHART BORDERS

You can remove specific chart borders with ax.spines[].set_visible(False)

The example on this slide removes the right and top borders
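Removing the right and top borders can be sketched with a short loop over the spine names (the data is made up):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])  # hypothetical sample data

# Valid spine names are "left", "right", "top", and "bottom"
for side in ["right", "top"]:
    ax.spines[side].set_visible(False)
```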


ASSIGNMENT: CHART FORMATTING

NEW MESSAGE
August 30, 2022
From: Ian Intern (Summer Consultant)
Subject: RE: Final Charts for Client

Hi there!

The data you plotted earlier looks good, but can you clean up the chart a little bit? I want it to look polished for our client. This is my last day in my summer internship and I want to get hired back!

Thanks!

section02_assignments.ipynb


SOLUTION: CHART FORMATTING

Solution code: section02_solutions.ipynb


LINE CHARTS

Line charts are used for showing trends over time
• ax.plot(x-axis series, series values, formatting options)

Data layout: dates as the index, with a column for each series

PRO TIPS
• Pivot tabular data to turn each unique series into a DataFrame column, and set the datetime as the index
• Divide your series by the appropriate units while plotting to simplify the y-axis scale

EXAMPLE: Available Housing Units by Week
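The two pro tips can be sketched together. The long-format table, its column names, and the "thousands" unit are assumptions made for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data: one row per date and region
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-02-01", "2022-02-01"]),
    "region": ["East", "West", "East", "West"],
    "units": [12000, 8000, 15000, 9000],
})

# Pivot so each region becomes a column and the dates become the index
wide = long_df.pivot_table(index="date", columns="region", values="units", aggfunc="sum")

fig, ax = plt.subplots()
for region in wide.columns:
    # Divide by 1,000 while plotting to simplify the y-axis scale
    ax.plot(wide.index, wide[region] / 1000, label=region)
ax.set_ylabel("Units (thousands)")
ax.legend()
```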


STACKED LINE CHARTS

Use stackplot() to create a stacked line chart, which lets you visualize the overall trend over time, as well as its composition by series

PRO TIP: Use the bottom series in the stacked line chart to draw focus to its individual trend – it’s the most visible!
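A minimal stackplot() sketch with two invented series; the first series passed is drawn at the bottom of the stack:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]       # hypothetical periods
west = [3, 4, 4, 5]    # bottom series (drawn first)
east = [2, 2, 3, 3]    # stacked on top of "west"

fig, ax = plt.subplots()
areas = ax.stackplot(x, west, east, labels=["West", "East"])
ax.legend(loc="upper left")
```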


PRO TIP: DUAL AXIS CHARTS

Use twinx() to create a dual axis chart, which lets you plot series with values on significantly different scales inside a single visual

In the example, the “Inventory” values are so small compared to “Price” that they appear to be 0 when plotted on the same y-axis

• Create a second axis (ax2) with ax.twinx(), then create the desired plot on ax2
• Note that using the figure level legend picks up both series
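A sketch of a dual axis chart with a figure-level legend. The “Price” and “Inventory” names come from the slide; the numbers are invented:

```python
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]                         # hypothetical sample data
price = [410000, 415000, 420000, 418000]
inventory = [2.1, 2.4, 2.2, 2.6]

fig, ax = plt.subplots()
ax.plot(weeks, price, color="navy", label="Price")
ax.set_ylabel("Price")

ax2 = ax.twinx()                             # second y-axis sharing the same x-axis
ax2.plot(weeks, inventory, color="orange", label="Inventory")
ax2.set_ylabel("Inventory")

legend = fig.legend()                        # figure-level legend picks up both series
```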
ASSIGNMENT: LINE CHARTS

NEW MESSAGE
August 30, 2022
From: Ian Intern (Summer Consultant)
Subject: Re: Re: Final Charts for Client

Hey again,

Great work on those charts!

Final request - we want to compare room nights booked vs cancellations over time, and we might need a dual axis chart to effectively do this. I’m totally checked out, so can you do this? You’ll be put in contact with the client soon.

Thanks!

section02_assignments.ipynb


SOLUTION: LINE CHARTS

Solution code: section02_solutions.ipynb


BAR CHARTS

Bar charts are used to compare values across different categories
• ax.bar(category labels, bar heights, formatting options)

Data layout: categories as the index, with values in a single column

PRO TIPS
• Use .groupby() and .agg() to aggregate your data by category and push the labels into the index
• Use Seaborn or the Pandas plot API for grouped bar charts

EXAMPLE: Median Home Price by City
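A sketch of the groupby-then-plot pattern. The city names and prices are hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical listing-level data
homes = pd.DataFrame({
    "city": ["Austin", "Boise", "Austin", "Boise", "Reno"],
    "price": [400, 350, 420, 370, 390],
})

# Aggregate by category; the city labels become the index
median_price = homes.groupby("city").agg({"price": "median"})

fig, ax = plt.subplots()
bars = ax.bar(median_price.index, median_price["price"])  # (category labels, bar heights)
ax.set_title("Median Home Price by City")
```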


PRO TIP: HORIZONTAL LINES

Use axhline() to add a horizontal line at a specified y-value on a bar chart
• This will typically be something to benchmark against, like a mean or target
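For example, a benchmark line at the mean could be sketched as follows (made-up values):

```python
import matplotlib.pyplot as plt
import pandas as pd

prices = pd.Series({"Austin": 410, "Boise": 360, "Reno": 390})  # hypothetical data

fig, ax = plt.subplots()
ax.bar(prices.index, prices)

# Horizontal reference line at the mean of the plotted values
benchmark = ax.axhline(prices.mean(), color="black", linestyle="--")
```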


HORIZONTAL BAR CHARTS

Use barh() to create a horizontal bar chart

Note that the bars in a horizontal bar chart appear in the opposite order from a vertical bar chart: the first item is drawn at the bottom
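Because barh() draws the first item at the bottom, one common approach (shown here on invented data) is to sort ascending so the largest bar ends up on top:

```python
import matplotlib.pyplot as plt
import pandas as pd

prices = pd.Series({"Austin": 410, "Boise": 360, "Reno": 390})  # hypothetical data

# Sort ascending: the smallest value is drawn first (bottom), the largest last (top)
ordered = prices.sort_values()

fig, ax = plt.subplots()
bars = ax.barh(ordered.index, ordered)
```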


PRO TIP: HIGHLIGHTS

Use the “color” argument to highlight the series you’d like to focus on

Use a list to specify the color for each bar
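A small sketch of building the color list; the highlighted city and the color names are arbitrary choices:

```python
import matplotlib.pyplot as plt

cities = ["Austin", "Boise", "Reno"]   # hypothetical data
prices = [410, 360, 390]

# One color per bar: highlight "Reno", mute everything else
colors = ["orange" if city == "Reno" else "lightgray" for city in cities]

fig, ax = plt.subplots()
bars = ax.bar(cities, prices, color=colors)
```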


ASSIGNMENT: BAR CHARTS

NEW MESSAGE
September 1, 2022
From: Sarah Shark (Managing Director)
Subject: CHARTS NEEDED ASAP

Hello,

Our hotel client is concerned about our intern’s departure. I need YOU to step up and make sure they’re happy with us.

Start by taking a quick look at room nights and lodging by country for our top 10 countries by total nights booked.

I expect the results in my inbox by morning (more details in the notebook attached).

-S

section02_assignments.ipynb


SOLUTION: BAR CHARTS

Solution code: section02_solutions.ipynb


STACKED BAR CHARTS

You can create a stacked bar chart by setting the “bottom” argument for the second “stacked” series as the values from the bars below it
• This will use those values as the baseline for the stacked bars instead of the x-axis

In the example, the Oregon bars are plotted by using the California values as their “bottom”
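The California/Oregon example can be sketched like this, with invented values:

```python
import matplotlib.pyplot as plt

years = ["2020", "2021", "2022"]   # hypothetical data
california = [50, 55, 60]
oregon = [20, 22, 25]

fig, ax = plt.subplots()
ax.bar(years, california, label="California")

# Each Oregon bar starts where the California bar below it ends
oregon_bars = ax.bar(years, oregon, bottom=california, label="Oregon")
ax.legend()
```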


100% STACKED BAR CHARTS

To create a 100% stacked bar chart, convert your DataFrame to row-level percentages before plotting
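One way to compute row-level percentages is to divide each row by its own total. The sketch below uses a made-up DataFrame and then stacks the columns with the “bottom” argument:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical revenue by country
df = pd.DataFrame(
    {"lodging": [160, 90], "other": [40, 60]},
    index=["France", "Spain"],
)

# Divide every row by its row total so each row sums to 100
pct = df.div(df.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots()
ax.bar(pct.index, pct["lodging"], label="Lodging")
ax.bar(pct.index, pct["other"], bottom=pct["lodging"], label="Other")
ax.legend()
```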


PRO TIP: GROUPED BAR CHARTS

You can create a grouped bar chart by reducing the width of each series and shifting them evenly around their corresponding label
• The first series is shifted to the left across the x-axis by half the bar width
• The second series is shifted to the right by the same amount

Grouped bar charts are much easier to create by using Seaborn or Pandas’ Matplotlib API
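A sketch of the width-and-shift technique using numeric x positions. The data and the 0.4 bar width are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np

countries = ["France", "Spain", "Italy"]   # hypothetical data
lodging = [80, 60, 70]
other = [20, 40, 30]

x = np.arange(len(countries))  # one numeric position per label
width = 0.4                    # narrower than the default 0.8 so two bars fit per label

fig, ax = plt.subplots()
ax.bar(x - width / 2, lodging, width=width, label="Lodging")  # shifted left
ax.bar(x + width / 2, other, width=width, label="Other")      # shifted right

ax.set_xticks(x)
ax.set_xticklabels(countries)
ax.legend()
```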


PRO TIP: COMBO CHARTS

You can create a combo chart by specifying different chart types in a dual axis plot

PRO TIP: Use the “alpha” argument to modify the transparency of each plot (0 is invisible and 1 is solid)
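A combo chart can be sketched as bars on the first axis and a line on a twinx() axis. The month labels, series names, and values are invented:

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]   # hypothetical data
revenue = [120, 135, 150, 160]
occupancy = [0.62, 0.68, 0.74, 0.71]

fig, ax = plt.subplots()
ax.bar(months, revenue, alpha=0.4, label="Revenue")   # semi-transparent bars

ax2 = ax.twinx()                                      # second y-axis for the line
ax2.plot(months, occupancy, color="black", label="Occupancy")

fig.legend()
```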
ASSIGNMENT: ADVANCED BAR CHARTS

NEW MESSAGE
September 2, 2022
From: Sarah Shark (Managing Director)
Subject: RE: RE: CHARTS NEEDED ASAP

Hello,

Nice work…so far. I need some more detailed views on the breakdown of lodging revenue vs. other revenue by country.

Build a grouped bar chart with the lodging revenue and other revenue for each country. Then, build a 100% stacked bar chart showing how much each revenue category contributes to overall country revenue. Add a reference line at 80% to help illustrate which countries get less than 80% of their revenue from lodging.

-S

section02_assignments.ipynb


SOLUTION: ADVANCED BAR CHARTS

Solution code: section02_solutions.ipynb


PIE CHARTS

Pie charts are used to compare proportions totaling 100%
• ax.pie(series values, labels= , startangle= , autopct= , pctdistance= , explode= )

Data layout: labels as the index, with values in a single column

PRO TIPS
• Keep the number of slices low (<7) to enhance readability – you can group “others” into a single slice
• Use bar charts if you want to compare the categories – pies are for showing how they make up a whole
• Donut charts make great KPI progress trackers

EXAMPLE: Homes Sold by City
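A basic pie chart can be sketched as below, using made-up shares with an “Other” slice already grouped:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical homes sold by city, labels in the index
sold = pd.Series({"Austin": 40, "Boise": 25, "Reno": 20, "Other": 15})

fig, ax = plt.subplots()
wedges, texts, pct_texts = ax.pie(
    sold,
    labels=sold.index,
    startangle=90,        # rotate the starting point of the first slice
    autopct="%.0f%%",     # format string for the percentage labels
)
```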


PRO TIP: DONUT CHARTS

You can create a donut chart by adding a “hole” to a pie chart and shifting the labels

How does this code work?
• It pushes the data labels 85% of the way towards the edge of the pie chart
• Then adds a white circle that covers the center of the pie chart to the figure
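The two steps above can be sketched as follows. The data and the 0.70 hole radius are assumptions; the 0.85 label distance follows the slide’s description:

```python
import matplotlib.pyplot as plt

values = [40, 25, 20, 15]                     # hypothetical data
labels = ["Austin", "Boise", "Reno", "Other"]

fig, ax = plt.subplots()

# Step 1: push the percentage labels 85% of the way toward the edge
ax.pie(values, labels=labels, autopct="%.0f%%", pctdistance=0.85)

# Step 2: cover the center with a white circle to create the "hole"
hole = plt.Circle((0, 0), 0.70, facecolor="white")
ax.add_artist(hole)
```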
ASSIGNMENT: PIE & DONUT CHARTS

NEW MESSAGE
September 3, 2022
From: Sarah Shark (Managing Director)
Subject: UPDATED CHARTS

Hello,

Our hotel client is looking for a pie/donut chart to represent the share of revenue by country.

Create a pie chart with slices for the top 5 countries by revenue, and a single “other” slice for the rest of the countries.

Need it ASAP.

Thx

section02_assignments.ipynb


SOLUTION: PIE & DONUT CHARTS

Solution code: section02_solutions.ipynb


SCATTERPLOTS

Scatterplots are used to visualize the relationship between numerical variables
• ax.scatter(x-axis series, y-axis series, s= , alpha= )

Data layout: one row per point, with a column for the x-series and a column for the y-series

PRO TIPS
• Modify the alpha (transparency) level to make overlapping points more visible
• Bubble charts can be useful in some cases, but they often add confusion rather than clarity

EXAMPLE: Months of Supply vs. Median List Price
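A minimal scatterplot sketch with invented values, using alpha so that overlapping points remain visible:

```python
import matplotlib.pyplot as plt

supply = [1.5, 2.0, 2.2, 3.1, 3.5]   # hypothetical months of supply
price = [520, 480, 470, 410, 390]    # hypothetical median list price

fig, ax = plt.subplots()
points = ax.scatter(supply, price, alpha=0.5)
ax.set_xlabel("Months of Supply")
ax.set_ylabel("Median List Price")
```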


BUBBLE
CHARTS
To create a bubble chart, specify a third series in the “size” argument
of .scatter()
• You may need to apply some arithmetic to adjust the bubble sizes

HISTOGRAMS

Histograms are used to visualize the distribution of a numeric variable
• ax.hist(series, density= , alpha= , bins= )

The series must be numerical

PRO TIPS
Modify the alpha (transparency) level to plot multiple distributions on the same axis

Set density=True to plot a normalized density on the y-axis instead of raw counts (the bar areas sum to 1), which makes distributions with different sample sizes comparable
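
The two tips above could be combined as in this sketch. The two series are randomly generated placeholders, not the course data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: two numeric series with different sample sizes
rng = np.random.default_rng(seed=0)
growth_a = rng.normal(loc=0.15, scale=0.03, size=500)
growth_b = rng.normal(loc=0.05, scale=0.04, size=300)

fig, ax = plt.subplots()
# alpha=0.5 lets both distributions show through on the same axis
heights_a, edges_a, _ = ax.hist(growth_a, bins=20, density=True, alpha=0.5, label="Series A")
heights_b, edges_b, _ = ax.hist(growth_b, bins=20, density=True, alpha=0.5, label="Series B")
ax.legend()

# With density=True, the bar areas (height x bin width) of each histogram sum to 1
area_a = (heights_a * np.diff(edges_a)).sum()
```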

HISTOGRAMS
EXAMPLE: Distribution Y-o-Y Growth in Home Price for Calendar Weeks


ASSIGNMENT: SCATTERPLOTS & HISTOGRAMS
Results Preview

NEW MESSAGE
September 4, 2022
From: Sarah Shark (Managing Director)
Subject: Additional Customer Profiling

Not bad rookie – thanks for the quick turnaround.

I need two more charts to help finalize a marketing strategy targeting overseas guests:

1. A chart comparing average revenue per customer and average nights stayed, with average nightly revenue as the size of the bubbles (you’ll need to aggregate the data by country)
2. The distribution of customer ages in France & Germany

-sent from my yPhone

section02_assignments.ipynb


SOLUTION: SCATTERPLOTS & HISTOGRAMS
Solution Code

section02_solutions.ipynb


KEY TAKEAWAYS

Matplotlib has two methods for plotting data: PyPlot API & Object Oriented
• Both can visualize many data types (lists, DataFrames, etc.), but object-oriented plots are easier to fully customize

Object Oriented plots are built by adding axes to a figure
• You can layer on different elements to these objects to modify the chart formatting

You can create common chart types by using Matplotlib functions
• Each chart type can be customized further to create more advanced variations

Matplotlib's extreme customizability also adds complexity
• Understanding the anatomy of a Matplotlib figure helps pinpoint how to change every component
PROJECT DATA: COFFEE PRODUCTION

PROJECT DATA: COFFEE IMPORTS

PROJECT DATA: COFFEE PRICES


ASSIGNMENT: MID-COURSE PROJECT

NEW MESSAGE
September 7, 2022
From: Sarah Shark (Managing Director)
Subject: Coffee Industry Deep Dive

Hi there,

I’m starting to trust you… which is rare. We just got an inquiry from a major coffee trader looking to get an outside view on the coffee industry. They’re particularly interested in Brazil’s production relative to other nations.

We’ll also look at a comparison of importer volume vs the prices they pay to understand if we can unlock margin by diversifying into new markets.

Do well on this and you’ll be on promotion track.

Key Objectives
1. Read in data from multiple csv files
2. Reshape the data to prepare it for visualization
3. Build & customize charts to communicate the key insights to the client

section03_coffee_project_part1.ipynb
ADVANCED CUSTOMIZATION

In this section we’ll cover advanced customization techniques in Matplotlib, including multi-chart figures, custom layouts & colors, style sheets, and more

TOPICS WE’LL COVER:

GOALS FOR THIS SECTION:
• Understand how to build multi-chart figures both with subplots and GridSpec layouts
• Learn how to customize chart colors by leveraging custom colormaps and creating your own!
• Take a look at pre-built stylesheets, and dive into the settings behind them that allow for extreme chart customization


SUBPLOTS

Subplots let you create a grid of equally sized charts in a single figure
• fig, ax = plt.subplots(rows, columns) – this creates a grid with the specified rows & columns

A 2 row, 2 column grid can be populated with individual charts, each addressed by its (row, column) position:

         Column 0    Column 1
Row 0    (0, 0)      (0, 1)
Row 1    (1, 0)      (1, 1)


Specify ax[row][column] to
create and modify individual
subplots
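
A short sketch of a 2 x 2 grid, with each subplot addressed as ax[row][column]. The data is made up, and the mix of chart types is only there to show that each position is independent.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical data
months = [1, 2, 3, 4]
sales = [10, 14, 9, 17]

fig, ax = plt.subplots(2, 2, figsize=(8, 6))  # ax is a 2x2 array of Axes objects

ax[0][0].plot(months, sales)      # row 0, column 0: line chart
ax[0][1].bar(months, sales)       # row 0, column 1: bar chart
ax[1][0].scatter(months, sales)   # row 1, column 0: scatterplot
ax[1][1].hist(sales, bins=4)      # row 1, column 1: histogram

ax[0][0].set_title("Line")
fig.tight_layout()
```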


SUBPLOTS

Use the “sharex” & “sharey” arguments to set the same axis limits on all the plots
• This is set to False (“none”) by default, but can be set to True (“all”), “row”, or “col”
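
For instance, sharing the y-axis across a row might look like this sketch (the category labels and values are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# sharey="row" gives every chart in the same row identical y-axis limits
fig, ax = plt.subplots(1, 2, sharey="row")

ax[0].bar(["A", "B"], [10, 50])      # small values
ax[1].bar(["A", "B"], [400, 900])    # large values set the shared scale
```

Without sharey, each chart would autoscale on its own and the small values would look as tall as the large ones.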

SUBPLOTS

Subplots can be any chart type, and do not have to be the same type


ASSIGNMENT: SUBPLOTS
Results Preview

NEW MESSAGE
September 10, 2022
From: Wendy Whiz (Data Scientist)
Subject: Deeper Exploration

Hey there,

I want to get a quick read on the distribution of revenue by customer for our top 5 countries – I’m working on a model for a similar client and want to see if the distributions are similar.

Doesn’t need to be polished, just need the 5 histograms in a single figure.

Thanks, and looking forward to working with you more!

Wendy

Section04_assignments.ipynb


SOLUTION: SUBPLOTS
Solution Code

Section04_solutions.ipynb


GRIDSPEC

You can build layouts with charts of varying sizes by setting a gridspec object
• This creates a grid with a specified number of rows & columns
• Each axis, or chart, can then occupy a group of squares in the grid

Use a slice to specify the ranges of rows and columns for each axis

[Diagram: a grid with 4 columns (Column 0 to Column 3) and 8 rows (Row 0 to Row 7); ax1 and ax2 occupy blocks in the upper rows and ax3 occupies a block in the lower rows]
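
A minimal sketch of a GridSpec layout on an 8 row, 4 column grid. The exact row and column ranges given to ax1, ax2, and ax3 are assumptions, since the slide diagram only shows their rough positions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

fig = plt.figure(figsize=(8, 8))
gs = gridspec.GridSpec(nrows=8, ncols=4, figure=fig)

# Slices pick the block of grid squares each axis occupies: gs[rows, columns]
ax1 = fig.add_subplot(gs[0:4, 0:2])   # rows 0-3, columns 0-1 (top left)
ax2 = fig.add_subplot(gs[0:4, 2:4])   # rows 0-3, columns 2-3 (top right)
ax3 = fig.add_subplot(gs[4:8, 0:4])   # rows 4-7, all columns (full-width bottom chart)

ax1.set_title("ax1")
ax2.set_title("ax2")
ax3.set_title("ax3")
fig.tight_layout()
```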


ASSIGNMENT: GRIDSPEC
Results Preview

NEW MESSAGE
September 12, 2022
From: Sarah Shark (Managing Director)
Subject: Revenue Report Format

Hi there,

Big meeting with our hotel client coming up – we want to propose a report format that will help track their revenue, specifically with respect to their goal to get French customers to surpass German customers.

Can you create a figure with a line chart tracking revenue by category, a bar chart with revenue for the top 5 countries, and a chart indicating progress towards our French revenue goal?

Thanks!

section04_assignments.ipynb


SOLUTION: GRIDSPEC
Solution Code
GridSpec Layout (see notebook for chart code):

section04_solutions.ipynb


COLORS

You can pass colors to a plot by assigning them to a list

This assigns each color in the list to each bar in the plot
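
As a small sketch with made-up categories and values, a list of colors passed to the color argument is matched to the bars in order:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# One color per bar, applied in the same order as the data
bar_colors = ["orange", "grey", "black"]

fig, ax = plt.subplots()
bars = ax.bar(["A", "B", "C"], [5, 3, 8], color=bar_colors)
```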

COLORS

You can also loop through a list of colors to pass them to separate series in a plot

Hex codes can be used to supply specific color shades

PRO TIP: Sites like Google have helpful hexadecimal color pickers
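
Looping through a list of colors, here written as hex codes, might look like the sketch below. The DataFrame and the specific hex values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical revenue by year, one column per series
revenue = pd.DataFrame(
    {"France": [10, 12, 15], "Germany": [14, 13, 16], "Portugal": [8, 9, 11]},
    index=[2019, 2020, 2021],
)

# Illustrative hex codes, one per series
hex_colors = ["#0055A4", "#FFCE00", "#006600"]

fig, ax = plt.subplots()
for column, color in zip(revenue.columns, hex_colors):
    # Each pass through the loop plots one series in its own color
    ax.plot(revenue.index, revenue[column], color=color, label=column)
ax.legend()
```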

PRO TIP: COLOR PALETTES
You can also modify the entire color palette for the series in a plot

Default Color Map:
Series colors are applied in this sequential order (with more than 10 series, the cycle repeats)

The “Set2” color map is applied here

rcParams are the underlying settings for Matplotlib charts and can be modified to gain a high level of customization (more on these soon!)

For more on color palettes, visit:
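
One way to swap the default color cycle for “Set2” is to overwrite the axes.prop_cycle rcParam, as in this sketch. This is an assumption about how the slide applies the palette, and the plotted series are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Replace the default color cycle with the colors of the "Set2" colormap
set2_colors = plt.get_cmap("Set2").colors
plt.rcParams["axes.prop_cycle"] = plt.cycler(color=set2_colors)

fig, ax = plt.subplots()
for offset in range(3):
    # No color argument: each new series takes the next color in the cycle
    ax.plot([0, 1, 2], [offset, offset + 1, offset + 2], label=f"Series {offset}")
ax.legend()
```

Because this changes rcParams, every chart created afterwards in the same session uses the new palette.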

ASSIGNMENT: COLORS
Results Preview

NEW MESSAGE
September 13, 2022
From: Sarah Shark (Managing Director)
Subject: Re: Revenue Report Format

Hi again,

Love the layout, HATE the colors! Let’s show some polish by getting away from the defaults.

Apply the “Set2” colormap to the line chart and look up the national color hex codes for the top 5 countries to use them for the rest of the charts.

Thanks,

Sarah

section04_assignments.ipynb


SOLUTION: COLORS
Solution Code
Apply Set2 (see notebook for chart code):
Country Colors:
Donut Chart

section04_solutions.ipynb


STYLE SHEETS
Matplotlib (and Seaborn) have style sheets that can be used instead of the default

The style is set in advance

The “fivethirtyeight” style has larger font sizing, and adds gridlines and a background color
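
A minimal sketch of setting a style before building a chart (the plotted values are placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Set the style first; it applies to every figure created afterwards
plt.style.use("fivethirtyeight")

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021, 2022], [10, 12, 9, 15])

# Individual formatting options can still be overridden after the style is set
ax.set_title("Revenue by Year", fontsize=14)
```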

STYLE SHEETS
You can still customize individual formatting options after setting a style

The Seaborn library has additional styles that can be used with Matplotlib charts, like “darkgrid”

ADDITIONAL STYLES
These are some of the additional styles available in both libraries:


ASSIGNMENT: STYLE SHEETS
Results Preview

NEW MESSAGE
September 14, 2022
From: Sarah Shark (Managing Director)
Subject: Re: Re: Revenue Report Format

Hi,

Layout and colors look great now, but can we spruce up the chart styling?

Use a style sheet of your choice.

Once we’ve done that it should be ready to ship.

Thx

-S

section04_assignments.ipynb


SOLUTION: STYLE SHEETS
Solution Code
Style Setting Only (see notebook for chart code):

section04_solutions.ipynb


STYLE PARAMETERS

Viewing the parameters of a style sheet can help format charts properly and
provide inspiration for your own formatting changes
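
One way to view a style sheet's parameters is through Matplotlib's style library dictionary, sketched here. Printing only the first five settings is an arbitrary choice to keep the output short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# plt.style.library maps each style name to the rcParams that style overrides
style_settings = plt.style.library["fivethirtyeight"]

# Show a few of the parameters behind the style
for name in sorted(style_settings)[:5]:
    print(name, "=", style_settings[name])
```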

PARAMETER GROUPS
There are 300+ parameters that can be modified, which fall into parameter groups:

Group | Description | Examples
axes | Chart-level formatting | axes.spines.top = False, axes.titlesize = 'large'
date | Date formatting options | date.autoformatter.month = '%Y-%m'
figure | Figure-level formatting | figure.figsize = (8.5, 11), figure.facecolor = 'grey'
font | Font settings | font.size = 16, font.family = 'Helvetica', font.weight = 'bold'
grid | Gridline settings | grid.linestyle = ':', grid.linewidth = 2
legend | Legend settings | legend.loc = 'lower right', legend.frameon = False
savefig | Saved figure settings | savefig.dpi = 1000, savefig.format = 'png'
text | Text settings | text.color = 'grey', text.usetex = True
xtick/ytick | X and Y tick settings | xtick.labelcolor = 'green', ytick.minor.visible = True
boxplot | Settings for boxplots | boxplot.whiskerprops.color = 'orange'
hist | Settings for histograms | hist.bins = 20
lines | Settings for line charts | lines.linewidth = 2, lines.color = 'red'
scatter | Settings for scatterplots | scatter.marker = '+'

For more on rcParams, visit:


MODIFYING PARAMETERS
There are two ways to modify parameters:
1. You can change individual parameters via assignment
2. You can change multiple parameters from the same group with the rc() function

The example on this slide:
• Turns off the top and right spines
• Changes the default axes title size to 20
• Modifies the figure size to 8” x 6”

PRO TIP: Modify parameters to avoid having to repeat the same formatting options on each chart
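
Both approaches are sketched below. Which parameter is set by which approach is an assumption, as is the extra labelsize setting added to show several parameters from one group in a single rc() call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# 1. Change an individual parameter via assignment
plt.rcParams["figure.figsize"] = (8, 6)   # figure size of 8" x 6"

# 2. Change several parameters from the same group with rc()
plt.rc("axes", titlesize=20, labelsize=12)       # "axes" group
plt.rc("axes.spines", top=False, right=False)    # turn off top and right spines

# Every chart created from here on picks up these defaults
fig, ax = plt.subplots()
ax.set_title("Formatted by rcParams")
```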

SAVING FIGURES
The savefig() function will save figures as an image file
• Simply specify the desired filename and format

Screenshotting the images with your operating system’s snipping tool will often be sufficient for building plots into presentations like this course ;).

If no extension in the filename is specified, the file will be saved as a .png. Most systems support .jpg, .jpeg, .svg, and .pdf, among others. The default resolution is 100 dpi (dots per inch).
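
A short sketch of saving the same figure in two formats. The filenames are hypothetical, and the temporary folder is used only so the example does not write into your working directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 2, 5])

out_dir = tempfile.mkdtemp()

# The file extension sets the format; dpi raises the resolution above the default
png_path = os.path.join(out_dir, "revenue_chart.png")
fig.savefig(png_path, dpi=300, bbox_inches="tight")

# Vector formats such as .pdf and .svg are chosen the same way
pdf_path = os.path.join(out_dir, "revenue_chart.pdf")
fig.savefig(pdf_path)
```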

KEY TAKEAWAYS

Subplots and GridSpec allow us to create multi-chart figures
• Subplots are equally sized grids, GridSpec allows for custom layouts

Colors can be set by specifying a colormap or by assigning colors to the data of interest
• Common color names and hex codes can be used to assign colors to your data

Set a style to spruce up the default aesthetics, or use rcParams to completely customize your charts
• Pre-built styles can add some nice aesthetic polish compared to the matplotlib defaults
• Understanding how to modify rcParams will allow you full control over chart customization, and reduce the need for manual formatting
PROJECT DATA: OVERVIEW
Coffee Production

PROJECT DATA: OVERVIEW
Prices Paid To Growers


ASSIGNMENT: MID-COURSE PROJECT

NEW MESSAGE
September 18, 2022
From: Clarissa Café (Coffee Client)
Subject: Summary Report

Hi there,

Sarah told me to reach out directly to you – we loved the work you did on breaking down the industry, but we want to summarize your findings on Brazil into a single figure we can pass around.

Can you combine your findings into a single figure report? We’ll also want to modify colors. There are more details in the attached notebook.

Thanks!

Clarissa

Key Objectives
1. Read in data from multiple csv files
2. Reshape the data with Pandas to set up charts
3. Build and customize line charts, bar charts, histograms and more to communicate key insights to our client
4. Modify chart colors to represent national flags
5. Combine modified charts into a single report by leveraging GridSpec and subplots

section05_coffee_project_part2.ipynb
DATA VISUALIZATION WITH SEABORN
In this section we’ll cover data visualization with Seaborn, another Python library that introduces new chart types and layouts, and interacts well with Matplotlib

TOPICS WE’LL COVER:

GOALS FOR THIS SECTION:
• Introduce the basics of plotting data with Seaborn
• Build variations of Matplotlib charts like bar charts and histograms, as well as new visuals like boxplots, violin plots, and linear relationship plots
• Create FacetGrid layouts as an alternative to subplots
• Integrate Seaborn plots with Matplotlib objects to get the best of both worlds


MEET SEABORN

Seaborn is a Python library built for easily visualizing Pandas DataFrames, taking away some of the “drawing” required when using Matplotlib

‘sns’ is the standard alias for Seaborn

You simply need to specify a DataFrame as the “data” argument and set columns as the “x” and “y” axes

Seaborn will automatically aggregate the results!
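
A minimal sketch of that pattern with a small, made-up DataFrame. Seaborn averages the rows for each country on its own, so no groupby is needed first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns  # 'sns' is the standard alias

# Hypothetical booking-level data: several rows per country
hotels = pd.DataFrame({
    "country": ["France", "France", "Germany", "Germany"],
    "revenue": [100, 200, 250, 350],
})

# Pass the DataFrame as "data" and column names as "x" and "y";
# each bar shows the mean revenue for that country
ax = sns.barplot(data=hotels, x="country", y="revenue")
```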

MEET SEABORN

You can change the aggregation method and suppress the confidence intervals

CHART FORMATTING

You can apply chart formatting to Seaborn plots using Matplotlib arguments
• These are passed to the Matplotlib object that Seaborn creates internally

We’ll cover integration with Matplotlib later, which is where you’ll be able to leverage the chart formatting skills you’ve learned throughout the course

Seaborn still has some useful chart formatting functions like despine()
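
For example, despine() removes the top and right borders with a single call. The data here is a hypothetical placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

hotels = pd.DataFrame({
    "country": ["France", "France", "Germany", "Germany"],
    "revenue": [100, 200, 250, 350],
})

ax = sns.barplot(data=hotels, x="country", y="revenue")

# By default despine() hides the top and right spines of the current figure
sns.despine()
```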

BAR CHARTS
Bar charts can be created in Seaborn with sns.barplot()
• Simply specify the desired category labels and series values as “x” & “y” arguments

Note that Seaborn automatically aggregates the data for the plot, using unique category values as the labels for the bars, the mean of each category for the bar length, and the column headers as the axis labels

To create a horizontal bar chart, specify “x” as the data and “y” as the labels. ci=None will suppress error bars (Seaborn 0.12 and later use errorbar=None instead).

GROUPED BAR CHARTS
Grouped bar charts can be created by specifying a categorical column as “hue”

You can also sort the bars by one of the columns, and apply a different color map

HISTOGRAMS

Histograms can be created with sns.histplot() and a single “x” argument
• You can also specify the number of “bins” and add the kernel density (kde=True)

The default style for Seaborn plots can be nicer than their Matplotlib counterparts, and vice versa, so choose the library that works best for each chart!


ASSIGNMENT: BASIC CHARTS
Results Preview

NEW MESSAGE
September 20, 2022
From: Sarah Shark (Managing Director)
Subject: New Charts

Hi,

Need a few more views on the hotel data using Seaborn.

Can we look at the distribution of lodging revenue for each booking? Only plot customers with less than 1,500 dollars to weed out longer term stays.

Then, build a bar chart with the average room nights stayed for our top 5 countries.

Thanks

section06_assignments.ipynb


SOLUTION: BASIC CHARTS
Solution Code

section06_solutions.ipynb


BOXPLOTS

Boxplots can be created with sns.boxplot()
• They visualize the distribution of a variable by plotting key statistics

Boxplot statistics:
• Median (50th percentile)
• 1st & 3rd Quartiles (25th & 75th percentiles)
• Interquartile Range (IQR)
• Min & Max Values (the whiskers stop at 1.5x the IQR beyond the quartiles when there are outliers)
• Outliers

[Diagram: a boxplot labeled with Min, Q1, Median, Q3, IQR, Q3+1.5*IQR, Max, and Outliers]

Specify a second axis to create separate boxplots by category
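
The sketch below draws one boxplot per category and then recomputes the same statistics by hand for one group. The guest ages are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical guest ages for two countries
guests = pd.DataFrame({
    "country": ["France"] * 7 + ["Germany"] * 7,
    "age": [20, 25, 30, 35, 40, 45, 90, 22, 28, 33, 38, 44, 50, 55],
})

# The second axis ("x") splits the data into one boxplot per country
ax = sns.boxplot(data=guests, x="country", y="age")

# The same statistics computed by hand for the French guests
french_ages = guests.loc[guests["country"] == "France", "age"]
q1, median, q3 = french_ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1                      # interquartile range
upper_fence = q3 + 1.5 * iqr       # points above this are drawn as outliers
outliers = french_ages[french_ages > upper_fence]
```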

VIOLIN PLOTS
Violin plots can be created with sns.violinplot()
• They are boxplots with symmetrical kernel densities along their sides
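
A violin plot takes the same arguments as a boxplot, as in this sketch with invented guest ages:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical guest ages for two countries
guests = pd.DataFrame({
    "country": ["France"] * 7 + ["Germany"] * 7,
    "age": [20, 25, 30, 35, 40, 45, 60, 22, 28, 33, 38, 44, 50, 55],
})

# One violin per country; the width shows the kernel density of ages
ax = sns.violinplot(data=guests, x="country", y="age")
```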

ASSIGNMENT: BOX & VIOLIN PLOTS
Results Preview

NEW MESSAGE
September 24, 2022
From: Sarah Shark (Managing Director)
Subject: Re: New Charts

Hi,

Let’s view the distribution of lodging revenue using a boxplot instead, once again capping the revenue at 1500.

Then filter the data to the top 5 countries and build a violin plot of their lodging revenue, as well as their age distribution.

Sarah

section06_assignments.ipynb


SOLUTION: BOX & VIOLIN PLOTS
Solution Code

section06_solutions.ipynb


LINEAR RELATIONSHIP PLOTS
Seaborn has several plots to explore linear relationships:

• sns.scatterplot(x, y, data): Creates a scatterplot
• sns.regplot(x, y, data): Creates a scatterplot with a fitted regression line
• sns.lmplot(x, y, hue, row, col, data): Creates a scatterplot with a fitted regression line, and can visualize multiple categories using color, or splitting into rows & columns
• sns.jointplot(x, y, kind, data): Creates a scatterplot and adds the distribution for each variable
• sns.pairplot(cols): Creates a matrix of scatterplots comparing multiple variables, and shows the distribution for each one

REGPLOT()

sns.regplot() creates a scatterplot with a fitted regression line
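
A minimal regplot() sketch with made-up columns:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stays: nights booked and revenue earned
stays = pd.DataFrame({
    "room_nights": [1, 2, 3, 4, 5, 6],
    "lodging_revenue": [110, 190, 320, 390, 520, 590],
})

# Draws the scatterplot, the fitted regression line, and a confidence band
ax = sns.regplot(data=stays, x="room_nights", y="lodging_revenue")
```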

LMPLOT()
sns.lmplot() lets you explore the impact of other variables on the relationship

Specify the ‘hue’ to create a line for each category in the specified column and set a different color for each category

Specify the ‘row’ and ‘col’ to create regression plots for each combination of variables

PRO TIP: This type of visual is great for exploring your data, but way too complex for a presentation!
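
The ‘hue’ and ‘col’ arguments could be combined as in this sketch. The countries, segments, and revenue formula are invented, and ci=None is used only to skip the confidence bands.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Build hypothetical stays for every country/segment combination
rows = []
for country, slope in [("France", 100), ("Germany", 120)]:
    for segment, base in [("Leisure", 20), ("Business", 60)]:
        for nights in [1, 2, 3, 4]:
            rows.append({
                "country": country,
                "segment": segment,
                "room_nights": nights,
                "lodging_revenue": base + slope * nights + (nights % 2) * 10,
            })
stays = pd.DataFrame(rows)

# hue draws one colored regression line per country;
# col creates one chart per segment, side by side
g = sns.lmplot(
    data=stays,
    x="room_nights",
    y="lodging_revenue",
    hue="country",
    col="segment",
    ci=None,
)
```

Unlike regplot(), lmplot() returns a FacetGrid rather than a single Matplotlib axis.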

JOINTPLOT()

sns.jointplot() creates a scatterplot and adds the distribution of each variable

The ‘kind’ argument has several options like ‘kde’, which plots the kernel densities, and ‘reg’, which plots the regression line

PAIRPLOT()
sns.pairplot() creates a matrix of scatterplots comparing multiple variables, and shows the distribution for each one along the diagonal

This lets you see the relationship between a diamond’s weight (carat) and its length (x), width (y), and depth (z)

You can see that the weight of the diamond has a positive relationship with length, width, and depth, with the relationships being VERY strong for width and depth

ASSIGNMENT: LINEAR RELATIONSHIP PLOTS
Results Preview

NEW MESSAGE
September 26, 2022
From: Wendy Whiz (Data Scientist)
Subject: More Exploration

Hi there,

Can you produce charts to explore the relationship between room nights and lodging revenue?

First for all the data and then for each top 5 country.

Can you also produce a pairplot comparing lodging revenue to several key variables? (more details in the notebook)

Best,

Wendy

section06_assignments.ipynb


SOLUTION: LINEAR RELATIONSHIP PLOTS
Solution Code

section06_solutions.ipynb


HEATMAPS

Create a heatmap to visualize a table of data with sns.heatmap()

PRO TIP: Pandas’ pivot_table method is a great way to set up the data needed for a heat map!
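
The pivot-then-plot pattern from the tip above, sketched with hypothetical booking data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical booking-level data
bookings = pd.DataFrame({
    "country": ["France", "France", "France", "Germany", "Germany", "Germany"],
    "segment": ["Leisure", "Leisure", "Business", "Leisure", "Business", "Business"],
    "revenue": [100, 200, 300, 250, 350, 450],
})

# pivot_table builds the grid: countries as rows, segments as columns, mean revenue as values
pivot = bookings.pivot_table(
    index="country", columns="segment", values="revenue", aggfunc="mean"
)

# annot=True prints each value inside its cell
ax = sns.heatmap(pivot, annot=True, fmt=".0f", cmap="Blues")
```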

HEATMAPS

You can modify rcParams with sns.set(), but we’ll show the syntax for combining Matplotlib and Seaborn shortly!


ASSIGNMENT: HEATMAPS
Results Preview

NEW MESSAGE
September 26, 2022
From: Wendy Whiz (Data Scientist)
Subject: RE: More Exploration

Hi there,

Last piece to help me look at features for my modeling work.

Can you build a heatmap with countries as rows and market segment as columns with the mean lodging revenue for each?

Then build a heatmap for a correlation matrix.

Thanks,
Wendy

section06_assignments.ipynb


SOLUTION: HEATMAPS
Solution Code

section06_solutions.ipynb


FACETGRID

Seaborn’s FacetGrid is a convenient alternative to Matplotlib’s subplot grids
• sns.FacetGrid(DataFrame, col= , col_wrap= )

This creates 7 charts, one for each “color”, in a grid with 3 columns

This plots a histogram of “price” for each “color” in the DataFrame


MATPLOTLIB INTEGRATION
You can build Seaborn plots in Matplotlib objects, which lets you customize and integrate Seaborn charts as if they were built using Matplotlib

This creates a Matplotlib figure and axis, sets a Seaborn style, creates a Seaborn bar chart, and then adds Matplotlib labels

Passing the “ax” argument lets you specify which axes to plot the chart on

KEY TAKEAWAYS

Seaborn is a user-friendly extension of Matplotlib
• It has a simple interface, nice aesthetics, and works well with Pandas DataFrames

Seaborn adds new chart types that are useful in exploring data
• Boxplots, violin plots, and linear model plots help profile data and identify relationships between variables

Seaborn is very compatible with Matplotlib
• Seaborn charts are extensions of Matplotlib objects, so they can be placed in Matplotlib figures
• Matplotlib formatting arguments can be passed to corresponding Seaborn plotting functions
PROJECT DATA: USED CARS DATA


ASSIGNMENT: FINAL PROJECT

NEW MESSAGE
October 10, 2022
From: Aaron Auto (VP of Fleet Management)
Subject: Optimal Fleet Truck Purchase

Hello,

We need an outside analysis on auto procurement for our fleet of service vehicles. We lease trucks to contractors and other businesses, but a recent spike in demand has meant we’re unable to get cars from traditional suppliers.

I want to see an overview of the automotive auction industry, before diving into where we can get Ford F150s for the most affordable price on the market (more details in the notebook).

Thanks

Key Objectives
1. Read in and manipulate data with Pandas
2. Build summary charts with Matplotlib and Seaborn
3. Leverage Seaborn’s advanced chart types to mine insights from the data and make a decision

section07_final_project.ipynb