
14-Oct-21 549519167.

docx 1

Data Visualization: Getting Started with Plotly


Plotly is Python's browser-based graphing library, which provides users with online
graphing, analytics, and statistics tools. In this course, you'll explore how to use
Plotly's declarative APIs to build interactive graphs and visualizations. You'll start this
course by getting familiar with the components of the Plotly library. You'll identify
the role of the high-level library (plotly.express) in creating visualizations and the
low-level library (plotly.graph_objects) in creating granular customizations of your
charts. Next, you'll investigate the use of box plots in visualizing the statistical
properties of a continuous data series. You'll also discover how to represent
additional categorical data by creating separate box plots and customizing their
color. Finally, you'll examine how to implement a candlestick chart to reflect the
trend of stock price performance over a period of time and visualize sequential data
in a linear process using funnel charts.

Contents
Data Visualization: Getting Started with Plotly
1. Course Overview
2. Installing Plotly
3. Components of Plotly Graphs
4. Creating Box Plots in Plotly
5. Plotting Categorical Data with Box and Strip Plots
6. Customizing Plotly Box Plots
7. Visualizing Financial Data Using Candlestick Charts
8. Visualizing Data Using Plotly Funnel Charts
9. Course Summary


1. Course Overview
Hi, and welcome to this course on getting started with Plotly. My name is Vitthal
Srinivasan, and I will be your instructor for this course.

Your host for this session is Vitthal Srinivasan. He is a software engineer and big
data expert.

A little bit about myself first, I did my master's from Stanford University and have
worked at various companies including Google and Credit Suisse. I presently work for
Loonycorn, a studio for high quality video content. Plotly's Python graphing library
makes interactive publication quality graphs. Plotly allows anyone with basic
programming knowledge in Python to build high quality graphs and dashboards.
Plotly APIs are declarative, which means customizing graphs and charts is very
straightforward and intuitive. Plotly has great support for basic chart types, but its
support for specialized graphs is what makes Plotly special. In this course, you will
learn how to use the declarative APIs that Plotly offers to build interactive graphs
and visualizations. You will start this course off by understanding the basic
components that make up the Plotly library. These include Plotly Express, which is a
high-level library, and plotly.graph_objects, which is a lower-level library that makes
very granular customization of your charts possible. First you
will create a box and whisker plot using Plotly. This will visualize statistical properties
of a continuous data series. Next, we will see how you can use candlestick charts to
visualize the trend of stock price performance over time. Candlestick charts allow
you to represent information corresponding to the open, close, high, and low for a
particular stock over a specific period of time. The color of the candlestick is cleverly
encoded to specify whether the day represented an up or down day for that stock.
Finally, you will round this course off by creating and visualizing funnel charts. These
are a niche type of visualization which is perfect for plotting sequential steps in a
linear process, such as conversions of sales leads into sales after various steps along
the way. When you are
finished with this course, you will have a good grip on the foundations of using Plotly
to construct simple visualizations and charts.


2. Installing Plotly
Installing Plotly. Your host for this session is Vitthal Srinivasan.

Let's go ahead and get started with our exploration of the capabilities of Plotly. Here
I am at a terminal window. This is on a Macintosh machine. However, you can run
similar commands on other platforms as well. I begin by typing out python --version.
This will give us a sense of the version of Python that I'm running here, which is
Python 3.8.5. In these demos, we will be making use of Jupyter notebooks. And so
let's also do a quick version check for our Jupyter libraries. The command for this is
on screen now, it's jupyter --version. And when we hit Enter, we get the version
numbers for all of the important libraries associated with Anaconda's installation on
this platform. The next step is to actually launch the Jupyter Notebook server.

The command for this is jupyter notebook. We hit Enter and various messages
come up, culminating in a series of messages which list the URLs at which we can
access our notebooks from a browser. Jupyter runs as a server.
And by copying one of these URLs and pasting it into a browser, we are effectively
hitting that server. And that server is running on port 8888 of the local host. So let's
copy one of these URLs and switch over to a browser and paste that URL in. This
brings up the familiar user interface of Jupyter.

The Jupyter home page appears on the screen.

We can see that there are various tabs for files, running programs, and clusters. In
the Files tab, which is currently selected, we can see the contents of the present
working directory.

Here is a folder called datasets. And if we click on it, we will be taken into that
datasets folder. This includes a large number of csv files, which we will be making use
of in the upcoming demos. All of them are in this datasets folder. And this datasets
folder in turn is in the same present working directory from which we will be opening
up Jupyter notebooks. This means that we will be able to access these files using a
relative file path. Let's scroll back to the top and click on the folder icon. This is going
to take us back one level up to the main working directory. And here we can now
click on the New button over on the top right to launch a new notebook.


A drop-down menu appears showing two sections: Notebook and Other. In
Notebook there is only one option, "Python 3", whereas Other contains three
options: Text File, Folder, and Terminal.

In the drop-down which opens up, we currently see only one choice, which is Python 3.
Depending on what technologies you have installed on your system, you might see
additional entries there for Python 2, Scala, Spark, and so on. But Python 3 is what
we are interested in, so that's what we will select. This causes a brand new, untitled
and empty Jupyter Notebook to
open up. We can change the title by clicking on its name. This brings up a Rename
Notebook dialog. And there we'll give this notebook the name
InstallationOfModules, and then click on the Rename button in the bottom right.
Let's free up a little bit of screen real estate. We do this by clicking on the View menu
up top. There are menu items there for Toggle Header and Toggle Toolbar, and we'll
click on both of those to toggle the display off. The result of this is that neither
the header nor the toolbar is displayed any longer. We can always bring them back
because the toolbar ribbon and the View menu item still remain visible even after we
do so. We now have one code cell visible on screen. We can type code into the cells
and execute it by hitting Shift+Enter. You probably know this already, but in case
you're new to Jupyter notebooks, please remember that keystroke combination:
Shift+Enter. Here we hit Shift+Enter in this empty code cell, and that creates another
code cell that we can now type into. Let's run the

!pip install plotly

command, you can see this on screen now. A couple of points are worth noting here.
We have an exclamation point, often referred to as a bang, preceding the pip install
command. The bang tells Jupyter that you're trying to run a shell command. It is no
longer strictly required here, because pip install is a common enough command that
Jupyter is now able to figure it out. The second bit worth noting is that you might
have pip3 installed on your system. On a Python 3 installation like this one, pip
typically resolves to pip3 under the hood in any case. We now hit Shift+Enter, and
the Plotly installation happens if it's required. Here the requirement is already
satisfied and therefore, there's no additional need to install any libraries. Let's
quickly check the version of Plotly that


we are making use of. We'll do this by first importing Plotly and then displaying the
version special variable.

import plotly
plotly.__version__

This special variable is preceded and followed by two underscores, which is known
as dunder notation. So here we are printing out plotly.__version__. And that
displays 4.14.1. That's the version of Plotly that we will be making use of in the
upcoming demos.


3. Components of Plotly Graphs


Components of Plotly Graphs. Your host for this session is Vitthal Srinivasan.
The Figure - Jupyter Notebook appears on the screen.

In this demo, we will get started with simple Plotly visualizations. We begin by
performing a simple import statement. This is the import of plotly.express as px. A
quick word about Plotly Express. This is a high level library, a module which comes as
a part of the Plotly package, that can be used to directly create figure objects. The
advantage of Plotly Express APIs is that they allow you to create these figures really
quickly and without getting into all of the low-level nitty-gritty. This is where the
word express comes from in its name. But you should be aware that all of those
figure objects are actually instances of graph objects. And graph objects, in turn, is
the name of yet another module which we will be importing.

This is also a part of Plotly. Graph objects is a much lower-level API. Internally, all
Plotly Express functions are also working with graph objects. And each graph object,
in turn, is expressed in terms of JSON. And these JSON graph objects are rendered by
a library called plotly.js. Let's go ahead and get started. The relationship between
Plotly Express and graph objects will become a lot more clear as we go along.

fig = px.line(x=['a', 'b', 'c', 'd'], y=[1, 3, 2, 6], title='sample figure')
fig.show()

On screen now we have used a Plotly Express function to create our first figure. This
figure is a px.line. It has x and y coordinates, as well as a title. The x coordinates are
taken in, in the form of a list. These values are a, b, c, d.

The y coordinates are 1, 3, 2, and 6, and the title is sample figure. Once we have this
figure object, we can go ahead and render it by invoking the show method on it. And
that will cause that figure object to be rendered by plotly.js. This happens in line
inside our Jupyter Notebook. And here we can see our first figure. On the y-axis, we
do indeed have all of those numeric values. And on the x-axis, we have the string
values that we passed in. Now, if we place our mouse on the top right corner of this
visualization, we can see that a toolbar opens up. There are many interesting toolbar
icons here, but the one we're currently looking at is the rightmost one, which says,
Produced with Plotly. And as you can see from the tooltip, this is the Plotly logo. If we
click on it, it's going to open up plotly.com. That's the homepage for Dash Enterprise,
as well as all of the other products in the Plotly suite.


As you can see from this, Plotly and Dash will run on Azure, as well as AWS. And as
we scroll down, we can get a sense of all of the capabilities of this Enterprise suite.
You might now be thinking that most of the references on this page seem to be to
Dash rather than to Plotly. But you should be aware that they are closely related.
Dash is a very popular visualization library. And under the hood, it's making use of
Plotly. It also makes heavy use of Decorators, and that's its basic mechanism of
working. There's a menu right up top, and one of the menu items there is DOCS.

The menu bar contains four tabs: DASH ENTERPRISE, DOCS, COMPANY, and
GALLERY.

Let's open that up and click on DOCS. And here we see that there are three offerings
in the Plotly Suite, Plotly Python, Plotly R, and Plotly JavaScript.

Each of these is an Open Source Graphing Library. Let's click on the DOCS of Plotly
Python, and we'll get to the documentation, which is actually very high quality, even
relative to the documentation for other popular Python visualization libraries.

A page titled, plotly | Graphing Libraries appears on the screen.

The DOCS are broken down by category. So we have Fundamentals, Artificial
Intelligence and Machine Learning, Basic Charts, Statistical Charts, Scientific Charts,
and if you keep scrolling down, there are also use cases for Financial Charts, as well
as Maps. And that's not the end of it. There are more examples, 3D Charts, Subplots,
Transforms, Animations, and more. Once you've had your fill of navigating these
DOCS, we can switch back to our Jupyter Notebook. There we had a figure object.

That's the one which we had created using px.line. Remember that, under the hood,
every figure object is a graph object instance.

print(fig)

When we print out this figure, we see that it's actually in a JSON-like format. This
means that it has the typical look of nested dictionaries that JSON objects tend to
have. JSON, of course, is an acronym for JavaScript Object Notation. As the Plotly
docs tell us, there are many advantages to working with graph objects rather than
with figures expressed as plain Python dictionaries. We can access the properties of
graph objects using either dictionary-style key lookup or class-style property access.
We also get precise data validation, as well as descriptions of valid properties. There
are other advantages, as well.

Let's now turn our attention to the code on screen. This is the JSON representation
of our figure object. There are two keys in here, data and layout. In general, figure
objects in Plotly are going to have these two keys. They might also have a third
top-level key called frames, which comes into play when you're working with
animations. You can see here that inside data we have a list, and that list itself
contains many more keys and values. Those values include a line along with its
associated color and line type. If you look a little further below, you can see that
there are keys for the x and the y. Both of those are arrays. x is dtype object.

There isn't any dtype specified for the y. And the layout includes more information,
which includes the title, the text of that title is, sample figure. And this is how all of
the information that we passed in while creating our figure object using px.line is
internally represented. Let's now access the key data from this figure dictionary.

fig['data']

And we can see that we do indeed get all of the key-value pairs contained within it.
Here, we can see that we have a list of dictionaries. These are referred to as traces
in Plotly. Each trace has one of more than 40 possible types. On screen now is the list
of traces associated with the key data; here it holds a single trace. We can
index into this list with the typical indexing notation.

fig['data'][0]

Here we are accessing the element at index position 0, and this is the Scatter trace.
Finally, let's very quickly look at the value associated with the
other top level key, which is layout.

fig['layout']

This value is also going to be a dictionary, and it's going to contain attributes that
control positioning and configuration of non-data related parts of the figure, such as
dimensions and margins, figure-wide defaults, title and legend associations, and
color axes and color bars. We won't get to the third top-level key that's often found
inside these figure dictionary objects, and that is frames, because it has a lot to do
with animation, which we're not really going to be able to cover in these demos. That


gets us to the end of this little demo, in which we instantiated our first graph objects
figure, and we created a simple line chart using that. Along the way, we also
explored how every Plotly Express API under the hood is invoking graph objects APIs,
and creating a graph objects figure. And every instance of graph object is, actually, a
JSON object.


4. Creating Box Plots in Plotly


Creating Box Plots in Plotly. Your host for this session is Vitthal Srinivasan.

In this demo, we are going to work with box plots in Plotly. Box plots, commonly
referred to also as box and whisker plots, are a great visual way to get a quick sense
of the statistical properties of a distribution. At a glance, we can tell the median, the
25th and 75th percentiles, and also outliers if any. Let's go ahead and see how box
plots can be constructed in Plotly.

The BoxPlots-Jupyter Notebook appears on the screen.

Let's start with the required import statements. These are pretty simple.

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

We import pandas as pd, plotly.express as px, and graph_objects as go. Next, let's
read in the data that we'll be working with in this demo into a Pandas data frame.

bank_customer_data = pd.read_csv('datasets/bank.csv')
bank_customer_data.head()

The URL for this dataset is visible on screen now.

The url: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing.

It contains information about the marketing of loans and deposits to customers of a
bank. We've read this in using the pd.read_csv function. We have the bank.csv file in
the datasets folder, and so a relative file path does the trick. We save the resulting
data frame in a variable and invoke the head method on it to quickly get a sense of
the data. By default, head returns the first five rows
of a Pandas data frame. Here we can see that we have columns for the age, job,
marital status, education. This is all customer information. And then we have
attributes of the relationship of this customer with the bank. These include whether
the customer is in default, the current bank account balance, loan status, and contact
information. And then we have marketing-related information such as marketing
campaigns


that have touched this particular customer. Let's invoke the describe method on this
data frame. This will give us a quick sense of the statistical properties, organized in
the form of yet another Pandas data frame.

bank_customer_data.describe().

Here you can see that we have columns. Each column here is a column from the
original Pandas data frame.

Cell 3 displays a table with the following column headers: age, balance, day,
duration, and so on.

All non-numeric columns have been excluded. And then the rows correspond to
various statistical properties. These include the count, mean, standard deviation,
min, 25th, 50th, and 75th percentiles, and the max. Virtually all of this information is
also going to be represented in our box plots. The 50th percentile is also the median,
and the box plots that we will plot by default are going to display the median rather
than the mean.
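As a rough illustration of that overlap, the sketch below runs describe on a small series and separates the values a box plot encodes from those it leaves out. The ages are hypothetical, invented for this example rather than taken from bank.csv.

```python
import pandas as pd

# Hypothetical ages, invented for illustration (not taken from bank.csv).
ages = pd.Series([18, 22, 28, 32, 35, 37, 39, 42, 45, 49, 74, 93, 95], name='age')

summary = ages.describe()

# Statistics that a box plot also encodes: quartiles, median, and extremes.
box_stats = summary[['min', '25%', '50%', '75%', 'max']]
print(box_stats)

# Statistics that describe() reports but a default box plot does not draw.
print(summary[['mean', 'std']])
```

The 50% row is the median, so it is the value that ends up as the line inside the box.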

Box plots also do not often display the standard deviation. But other than that, these
other statistical properties will all be visually represented in a box plot. So let's go
ahead and create one. On screen now we've invoked the plotly.express function,
px.box.

fig = px.box(bank_customer_data, y='age')
fig.show()

We've passed in the Pandas data frame, which is going to hold the underlying data.
We've specified the name of a column which we wish to compute the box plot on.
That's the age column. And then we invoke the show method on this resulting figure
object. The box plot comes into view. Let's take a minute to understand it. We know
quickly that along the y-axis we have the age, and that of course is what we had
specified while instantiating the figure.

Then in the center of the plot, we have a rectangular area. We then have two fences
also known as whiskers extending out in both directions from the rectangle. And
then on top of the upper fence, we have individual points. And finally we can see
that there is a horizontal line which passes through the rectangular area. This is the
basic construction of a box plot. And in many tools and software packages, we've
then got to reverse engineer exactly what all of these mean. But this is a great


feature of Plotly box plots. If you just hover over this, you'll get a beautiful
explanation, and that's the explanation which we now see on screen. Let's start with
the horizontal line which is inside the rectangular area. That horizontal line is the
median. And we can see from the tooltip that the median age is 39. Then we have
the rectangle, which extends from q1, which is 32, up to q3, which is 49. q1 is the
25th percentile value and q3 is the 75th percentile value. The way to interpret these
two percentiles is as follows: 75% of the data points in our dataset have an age which
is less than or equal to 49, and only 25% of the data points have an age less than or
equal to 32. And the median, well, that's the 50th percentile. So 50% of our data lies
on or below the age of 39 and the other 50% lies
above. There are some statistical fine points here which hinge on how the quartiles
are computed. Plotly supports three quartile methods: linear, inclusive, and
exclusive. We won't go too deep into the differences between them, but you should
know that by default Plotly uses the linear method, which interpolates between data
points to find the 25th and 75th percentiles. The inclusive and exclusive methods
instead split the ordered data into two halves at the median. With the inclusive
method, if the sample size is odd, the median is included in both halves. The
exclusive method excludes the median from both halves if the sample size is odd. In
both of these cases, q1 is the median of the lower half, and q3 the median of the
upper half. Let's now turn
to the whiskers, or the fences as they're referred to in Plotly. The upper fence here
has the value 74, and it can extend at most 1.5 times the interquartile range away
from the box edge. The interquartile range is defined as q3 minus q1, which here is
49 minus 32, which is 17. 1.5 times that interquartile range is 25.5. When we add
25.5 to 49, we get 74.5. The fence is drawn at the largest data point which does not
exceed that limit, and because ages are whole numbers, that point is 74.

What now of the lower fence? 32, which is the lower edge of the box, minus 25.5,
which is 1.5 times the interquartile range, gives us 6.5. Why then is the lower fence
at the value of 18? Well, the answer is that the minimum value in our dataset is 18.
And because 18 is greater than 6.5, the lower fence extends only down to 18. Any
points in our underlying data which are larger than the upper fence or smaller than
the lower fence are going to be depicted with individual point markers, and these are
considered outliers. Here we do not have any small outliers, and that's because we
don't have any values lower than the minimum value of 18. But we do have a string
of outliers which are greater than the upper fence value of 74.
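The quartile and fence arithmetic above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Plotly's actual implementation: the function names quartiles and fences are hypothetical, and the ages are invented so that q1 and q3 come out as 32 and 49, as in the demo.

```python
from statistics import median

def quartiles(values, inclusive=True):
    """Return (q1, q3) by splitting the sorted data at the median."""
    data = sorted(values)
    half, is_odd = divmod(len(data), 2)
    if is_odd and inclusive:
        # Odd sample, inclusive: the median value belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    elif is_odd:
        # Odd sample, exclusive: the median value belongs to neither half.
        lower, upper = data[:half], data[half + 1:]
    else:
        # Even sample: the two halves are unambiguous.
        lower, upper = data[:half], data[half:]
    return median(lower), median(upper)

def fences(values, q1, q3, k=1.5):
    """Return the most extreme data points within k * IQR of the box."""
    iqr = q3 - q1
    low_limit, high_limit = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_limit <= v <= high_limit]
    return min(inside), max(inside)

# Hypothetical ages chosen so that q1 = 32 and q3 = 49, as in the demo.
ages = [18, 22, 28, 32, 35, 37, 39, 42, 45, 49, 74, 93, 95]

q1, q3 = quartiles(ages, inclusive=True)
lower_fence, upper_fence = fences(ages, q1, q3)
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

print(q1, q3)
print(lower_fence, upper_fence)
print(outliers)
print(quartiles(ages, inclusive=False))
```

With this sample the limits are 6.5 and 74.5, so the fences land on the data points 18 and 74, and the two ages beyond 74 show up as outliers. Switching to the exclusive split changes q1 and q3, which is one reason box plots of the same data can differ between tools.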


The max age in this dataset is 95. And so all ages which are greater than 74 are going
to appear as outliers. We can hover over those individual points and tool tips will
appear telling us the ages. Here, for instance, we're hovering over a point where age
is 93. If you move the mouse up a little, we will be hovering over the max value
where age is 95. Now there's one important little note to keep in mind. The exact
construction of a box plot varies from one visualization technology to another. So
you will find that a box plot on a data series in Plotly might look slightly different
from a box plot in another visualization technology. Either in Python or even like in
Microsoft Excel. And this comes down to all of the little choices which are made by

the box plot implementation. Such as the algorithm used for computing the median,
the definition of the interquartile range. How far the whiskers or the fences extend
out from the box and so on. So you should not be worried if box plots on a particular
data series in a different visualization packages look a little different. And secondly,
you should pay close attention to the exact docs for every box plot implementation.
So that you understand how all of these properties are determined.


5. Plotting Categorical Data with Box and Strip Plots


Plotting Categorical Data with Box and Strip Plots. Your host for this session is
Vitthal Srinivasan.

In the previous demo, we constructed a box plot for a single variable, which was the
age.

The BoxPlots-Jupyter Notebook appears on the screen.

Here you can see that on the y-axis, we have the age, but the x-axis is completely
blank. And that's because we don't actually have any variable on the x-axis at all. A
more common use of box plots is to actually plot different box plots for x and y
variables. The x variable is usually a categorical one, and we have box plots
corresponding to values of the y variable for each value of the x variable. Let's see
what this means with an example. On screen now, we've created another box plot
figure; once again, we've invoked px.box.

fig = px.box(bank_customer_data, x='job', y='age')
fig.show()

The difference this time is that we have both x and y variables: x is set to job and
y is set to age. Note that both of these values are strings which
correspond to the column names in the Pandas data frame which is passed in as the
first input argument. Once we have that figure, we can go ahead and invoke show on
it, and our box plot appears on screen down below. If we now look at the x-axis, we
see that this has the variable job. Now job is a categorical variable, and it has specific
labels or levels. These include admin, technician, services, management, retired and
so on. The y-axis continues to display age; however, we now have one box plot for
each value of the categorical x variable job. This kind of box plot gives us
valuable information about how the statistical properties of the y variable are
influenced by the values of the x variable. This is perhaps most prominent in the case
of the category retired.

There you can see that the entire box plot is shifted upwards. Somewhat expectedly,
the age of those in the retired category is quite a bit older than that of other
categories. The median age of the retired individuals is north of 60, it's around 64.
We can also see that the 25th percentile age for the retired category is at 59, the


lower fence is at 40. However, there are also a few outliers. For instance, the lowest
value of age for a retired individual is just 34. Maybe that person is an adherent of
the FIRE movement which has become popular in personal finance these days. If
you've not heard of it, FIRE is an acronym for financial independence, retire early.
The beauty of Plotly's box plots is all of these tooltips which give us such granular
information. The oldest individual in our data set is in the retired job category and
that person has age 95.

We can see that from the tooltip. And we know that this person is the oldest
because that outlier for the retired category is higher on the y-axis than any of the
other outliers for any of the other categories. Retirees tend to skew higher on age;
students tend to skew noticeably lower. We can see, by hovering over the box plot
for the student category over on the extreme right, that their median age is only
25. There are, however, quite a few positive outliers here. The oldest student here
has age 47. So this box plot represents a more typical use case where we have a
categorical x variable, and we have individual box plots of some continuous y
variable for each of its values. We've now spent quite a bit of time hovering
over the outliers in our box plots. What if we'd like to have all points in the
underlying data set displayed in the box plot? That is also easy enough to do, and
the code to do that is on screen now.

fig = px.box(bank_customer_data, x='deposit', y='duration', points='all')
fig.show()

The important difference is the keyword argument points which has been set to be
equal to all. So by specifying points = all, we are going to get all of the individual data
points in the data displayed as well. And you can see an example of this on screen
now. We have invoked px.box passing in our usual Pandas data frame. The x variable
is deposit, the y variable is duration. And again, points is set to be equal to all. Now if
we scroll down a little bit, we can see that we have a mass of the individual points
just to the left of the box plot. You might be wondering now how all of those points
seem to be spread out

along the x-axis. After all, there are only two values possible for the x-axis variable
and those are yes and no. And that's a great question, the reason that those points
do not all lie in the same vertical line is because this is a strip plot. In a strip plot,
jitter or random noise has been added to all of the values so that they do not overlap
with each other. So here we have two box plots, and then to the left of those two


box plots we have two strip plots. The strip plots give us the individual data points.
These give the relationship between duration and deposit, the box plots summarize
the statistical properties. Now that we have the strip plots, there's no need for the
outliers to be displayed on the box plot. And that's why we only have the rectangular
box along with the associated

whiskers, also known as the fences. As an aside, a strip plot is a great way to depict
relationships between a categorical and a continuous variable. However, a slightly
better representation is something known as a swarm plot. Swarm plots are also
used in such use cases, the difference is that in a strip plot, the jitter or noise that's
added to each data point is random. In the case of a swarm plot, those points are
actually going to be displayed in a bee swarm-like shape intended to avoid overlap.
So a swarm plot is actually algorithmically trying to minimize overlap, a strip plot just
adds random noise. Let's now hover over the different aspects of this plot. When we
hover over the box plot, we continue to get those helpful tool tips with the values of
the min, max and the median. Also note how here, because we have an x-axis
variable, the tooltip also gives us the specific label associated with this particular box
plot.

Right now we are hovering over the box plot for deposit equal to yes. And that's why
all of the tooltips also contain the string yes. Let's now hover over the individual
points in the strip plot, which is to the left of our box plot. And when we do this, we
see that the tooltips there contain information about both the x and the y
coordinates. And that's why we have the deposit which is the x coordinate, and the
duration which is the y coordinate. This is a handy way of zeroing in on individual
points in our data set. Next, let's hover over the box plot on the right. This one
corresponds to customers who do not have deposits with the bank. We can see that the
statistical properties of these customers in terms of duration of relationship are
significantly different. The longer the duration, the more likely deposit is to have
the value yes. We can see this from the fact that the box plot on the right is
significantly smaller and shifted downwards relative to the one on the left.
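To make the comparison between the two groups concrete, here is a minimal sketch of the summary values the tooltips report, computed per category. The records below are hypothetical and are not taken from bank_customer_data; only the idea of grouping a continuous value by a categorical label comes from the demo.

```python
from statistics import median

# Hypothetical (deposit, duration) records; not the course data set.
records = [
    ("yes", 540), ("yes", 610), ("yes", 720), ("yes", 480),
    ("no", 150), ("no", 210), ("no", 180), ("no", 300),
]

# Group the continuous values by the categorical label.
by_deposit = {}
for label, duration in records:
    by_deposit.setdefault(label, []).append(duration)

# Summarize each group with the min, median and max.
summary = {
    label: {"min": min(values), "median": median(values), "max": max(values)}
    for label, values in by_deposit.items()
}

for label, stats in summary.items():
    print(label, stats)
```

A box plot draws these same kinds of summaries (plus the quartiles) for each category, which is why the two boxes can be compared at a glance.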

Now that we have computed box plots which also include strip plots, let's see how
easy it is to compute individual strip plots.

fig = px.strip(bank_customer_data, x = 'deposit', y = 'duration')
fig.show()

The code for this is on screen now, we've just invoked px.strip. We specify the
Pandas data frame with the source columns. And then we reference the x column
which is deposit and the y column which is duration. Deposit is a categorical column
and duration is a continuous one. When we invoke the show method on this figure, we get a
strip plot. We have the two values of deposit on the x-axis, yes and no. And then we
have duration on the y-axis as before. We can see that this looks very, very similar to
the strip plots that we had inside the box plots. You can also see how even though all
of those points have x equal to either yes or no, they do not perfectly coincide on the
x-axis.

In other words, they do not appear concentrated along vertical lines. And that's
because a little bit of random jitter has been added to all of them so that their x
coordinates do not perfectly coincide. And this random jitter is what gives rise to this
particular strip appearance. As we've already discussed, in a swarm plot the jitter
would not have been random. Instead, the plotting library would have computed
positions for the points so that they do not overlap. In any case, we can examine all of
these points as we hover over them. And we do indeed have the values for both the
deposit and the duration show up in the tooltip.
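The idea of random jitter can be sketched in a few lines. This is only an illustration of the concept and is not Plotly's internal algorithm; the function name jitter_positions, the width of 0.3 and the sample labels are all assumptions made for the example.

```python
import random

def jitter_positions(categories, width=0.3, seed=0):
    """Return one jittered x position per point.

    Each category gets an integer slot (0, 1, ...). Every point is then
    shifted by uniform noise in [-width / 2, +width / 2].
    """
    rng = random.Random(seed)
    slots = {}
    for category in categories:
        slots.setdefault(category, len(slots))
    return [
        slots[category] + rng.uniform(-width / 2, width / 2)
        for category in categories
    ]

deposit = ["yes", "no", "yes", "yes", "no", "no"]  # hypothetical labels
xs = jitter_positions(deposit)
print(xs)
```

Every point stays within its own category's strip, but points that share a category no longer sit on exactly the same vertical line.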

6. Customizing Plotly Box Plots


Customizing Plotly Box Plots. Your host for this session is Vitthal Srinivasan.

In the previous demo, we took a brief segue away from box plots to strip plots, but
now we are going to come back to box plots. On screen now is the last box plot
which we visualized in the previous demo. There we had constructed a box plot
where we had the categorical variable on the x-axis and a continuous variable on the
y-axis. We'll now move on and continue to build on this, and add two features to the
next box plot that we construct. To begin with, we are going to add a color. This is
going to be one more bit of information, which will result in a doubling of the
number of box plots in our graph. And next, we will also see how we can add notches
to our box plot, and we'll see how these notches can be interpreted. So this box plot
on screen now is what we want to build. Here is the code that builds it. You can see
that we've invoked px.box as usual, so we are still making use of Plotly Express APIs.

fig = px.box(bank_customer_data, x = 'housing', y = 'duration',
             color = 'default',
             facet_col = 'deposit',
             notched = True,
             title = 'Notched Box plot of Duration (Last contact) by housing type',
             hover_data = ['default'])

fig.show()

The beauty of the Plotly Express APIs is just how high level they are. A little bit later
in this demo, we will recreate such box plots using the Graph Objects APIs. They are
also very handy, but just a little bit more low-level than the Plotly Express ones. In
any case, on screen now we've invoked px.box. We start by passing in the pandas
data frame bank_customer_data, housing is the name of the x variable, remember
this is a categorical variable, duration is the name of the y variable, this is a
continuous variable. And then we have a new input argument called facet_col. This is
set to deposit. Like housing and duration, deposit is also a column name from the
pandas data source passed in.

Deposit is also categorical and it has the acceptable values of yes and no. The facets
refer to individual subplots inside the outer box plot. And then we have one more
variable which we wish to represent, which is the color variable. This is taken from
the column titled default and it indicates whether a customer defaulted or not. So
you can see that in total we have three categorical variables and one continuous
variable. The three categorical variables are housing, default, and deposit. Each one
of them takes two values, yes and no. And that's why we are going to have a total of
eight box plots, 2 raised to 3 equals 8. Then importantly, we set notched = True.
We'll get to this in a moment, but at a high level it indicates confidence bounds
around the median.
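The count of eight can be checked by enumerating the combinations. This is a small sketch using only the three variable names and the yes/no values mentioned above.

```python
from itertools import product

# Each categorical variable from the demo takes the values yes and no.
variables = {
    "housing": ["yes", "no"],
    "default": ["yes", "no"],
    "deposit": ["yes", "no"],
}

# One box plot is drawn for every combination of the three values.
combinations = list(product(*variables.values()))

for housing, default, deposit in combinations:
    print(f"housing={housing}, default={default}, deposit={deposit}")

print(len(combinations))  # 2 * 2 * 2
```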

Finally, we specify the title as well as the hover_data. We would like to include the
value of the default column in the hover_data so that's what we specify in there.
Once we invoke the show method on this figure object, we get our nice notched box
plot. A few different points worth paying attention to. Notice first off that we now
have two subplots, you can think of these as facets. And these correspond to
deposit=yes and deposit=no. This comes from having specified the facet column
equal to deposit. Then within each facet, on the x-axis we have housing and on the
y-axis we have the duration. On the x-axis we have the values yes and no. And then for
each value of housing, we also have two box plots for the two values of the default
variable. The legend which is visible over on the top
right tells us that the blue box plots are for default equal to no. The red box plots are
for default equal to yes. And we can count and satisfy ourselves that we have eight
box plots in total. That makes sense because, as we discussed a moment ago, we
have three categorical variables, housing, deposit, and default. Each one of these is
binary so there are two possible values, 2 raised to 3 equals 8. And that's the number
of box plots that we see. Now let's turn our attention to the notches, which are next
to the medians of each of these box plots. Those notches ought to be interpreted
vertically rather than horizontally. Let's, for instance, focus on the fourth box plot.
That's the red box plot where housing is equal to no, default is equal to yes, and
deposit is equal to yes.

The notches on this one are pretty prominent. The length of the notch as measured
vertically gives us the confidence interval around the median for that particular box
plot. Hovering over that box plot tells us that the median here is 682. How confident
are we about this median? Well, the vertical extent of the notches gives us the 95%
confidence intervals. Those notches are also useful to compare the medians of
different box plots. The rule of thumb here is that if any two box plots have notches
which do overlap, then those medians might be statistically equivalent to each other.
On the other hand, if the notches of two box plots do not overlap, then it's a fair
estimate to say that the medians do differ from each other. For instance, let's
examine the first two box plots over on the left. The first one has median equal to
603.5, we can see that from the tooltip. And the second one right next to it has
median equal to 599.

These two box plots probably have the same median value. How do we know this?
Because the notches around the medians for these two box plots clearly overlap.
And we can conclude from this that the median duration for customers who do have
deposits, where deposit is equal to yes, and who also have housing, so where
housing is also equal to yes, does not vary based on whether they've defaulted or
not. But if we examine the two box plots to the immediate right, we come to a
different conclusion for that combination of the x variables. Here we can see that
when a deposit is equal to yes, and housing is equal to no, and default is equal to no,
the median is equal to 352. This is the median, which we see in the tooltip for this
blue box plot. If we just move our mouse a little bit over to the right, we can see that
the box plot there has median equal to 682.

More importantly, the notches of these two box plots do not overlap at all. And the
lesson that we can draw from this is that the median duration for the group of
customers where housing is equal to no and deposit is equal to yes does differ based
on whether default is equal to no or yes. And this important conclusion was
something that we could draw only because this is a notched box plot. Had this been
a regular box plot, we would not have been able to draw this conclusion. Let's wrap
up this demo by discussing how we can plot box plots using the Graph Objects APIs.
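Before moving to Graph Objects, the notch rule of thumb from this demo can be sketched numerically. The sketch assumes the commonly used approximation median +/- 1.57 * IQR / sqrt(n) for the notch. The helper names, the sample values and the quartile method (the default of statistics.quantiles) are assumptions, so the numbers will not exactly match what Plotly draws.

```python
from math import sqrt
from statistics import median, quantiles

def notch_interval(values):
    """Approximate interval around the median: median +/- 1.57 * IQR / sqrt(n)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile method is an assumption
    half_width = 1.57 * (q3 - q1) / sqrt(len(values))
    m = median(values)
    return (m - half_width, m + half_width)

def notches_overlap(a, b):
    """True when the two notch intervals share at least one value."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical duration samples for three groups.
group_a = [580, 590, 600, 605, 610, 620, 630, 640]
group_b = [585, 592, 598, 601, 612, 618, 625, 650]
group_c = [300, 320, 340, 350, 355, 360, 380, 400]

print(notches_overlap(group_a, group_b))  # similar medians
print(notches_overlap(group_a, group_c))  # clearly different medians
```

Overlapping intervals suggest the medians may not differ, while non-overlapping intervals suggest that they do.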

trace = go.Box(y = bank_customer_data['age'],
               boxpoints = 'all', jitter = 0.3, name = 'Age',
               pointpos = +1.8)
fig = go.Figure(data = [trace])
fig.show()

So far we were making use of the Plotly Express APIs which were high-level. We just
needed the one function, px.box. We can also achieve a similar result using the
Graph Objects APIs. Because after all, Plotly Express is making use of Graph Objects
under the hood.

The code is just ever so slightly more low-level. On screen now is the Graph Objects
code to build a box plot along with an associated strip plot. Remember that every
Graph Object is represented as a JSON object. And this in turn means that it consists
of key-value pairs. The value of data, which is one of the top-level keys inside the
Graph Objects representation, must be a list of dictionaries, and each of those
dictionaries is a trace. On screen now, we've created one trace which holds the
visualizations. And then we've assigned that trace to the data element of a Graph
Object. That trace is created by invoking go.Box. We specify the y variable which is
the age column from our bank_customer_data. Then we specify boxpoints = 'all'. This is
equivalent to asking for a box plot as well as the associated strip plot. So this is a
really important input
parameter. Note this time that we needed to manually specify the jitter, the
magnitude of the jitter is 0.3. And then we also specify the name of our graph which
is 'Age', and the pointpos at which the strip plot should appear, pointpos takes a
value between -2 and +2. Here we've specified the value +1.8. If we had passed in
the value 0, the points would have been placed over the center of the boxes. Positive
values will have the strip plot appear to the right of the box plot. Having instantiated
this trace object, we can pass it in to the figure. We instantiate the figure using
go.Figure. The only keyword argument we need to pass in is data. And then we can
invoke the show method on the figure as usual.

Doing this displays our box plot along with the associated strip plot. This little
example proves that pretty much anything that you can do with Plotly Express APIs,
you can also do with Graph Objects APIs. And of course, that's because Graph
Objects APIs are more low-level, and also because Plotly Express uses Graph Objects
under the hood. Let's build one more box plot using the go APIs. On screen now we
have not one, but two traces.

fig = go.Figure()
fig.add_trace(go.Box(
    y = bank_customer_data['duration'],
    name = 'Duration-Only Mean',
    marker_color = 'darkolivegreen',
    boxmean = True
))
fig.add_trace(go.Box(
    y = bank_customer_data['duration'],
    name = 'Duration-Mean & SD',
    marker_color = 'firebrick',
    boxmean = 'sd'
))
fig.show()

This time we've instantiated a fig and then invoked the add_trace method on that fig
twice. Inside the add_trace method we directly instantiate the box trace. We do that
by instantiating two go.Box objects. In both cases we have y taken from the duration
column of the bank_customer_data. And in both cases we omit an x variable. Well
then how are these two box plots going to be different?

It comes down to how the mean is displayed. You can see that the first one has
boxmean = True, the second one has boxmean = 'sd'. 'sd' here is short for standard
deviation. And the marker colors that we use for these box plots are different as
well. The first marker color is darkolivegreen, the second one is firebrick. Now we hit
Shift+Enter and we see that we have two box plots. Both of these represent
duration. Both of these have the same positions for the box as well as for the
whiskers. Notice how both of these box plots have a dashed line inside the rectangle.
This is a horizontal dashed line which is parallel to the median, but above it. This
dashed line represents the mean. This is the first time that we are displaying
information about the mean in a box plot but you can see that this is possible as
well. And then in the box plot over on the right, we have a little diamond. That
diamond gives the vertical extent of the standard deviation. So the top vertex of that
diamond is at the value mean plus 1
standard deviation. And the bottom vertex is at mean minus 1 standard deviation.
This is a really handy box plot representation because it combines information about
the standard deviation along with the 25th, 50th, and 75th percentiles. That gets us
to the end of our exploration of box plots in Plotly. We've made use of both Plotly
Express and the Graph Objects APIs here.
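As a recap of the mean and standard deviation markers, here is a minimal sketch of the values that the dashed line and the diamond represent. The durations are hypothetical, and the sketch uses the population standard deviation as an assumption. It does not claim to reproduce the exact formula Plotly uses.

```python
from statistics import mean, median, pstdev

durations = [100, 150, 200, 250, 800]  # hypothetical, right-skewed values

m = mean(durations)
med = median(durations)
sd = pstdev(durations)  # population SD, chosen here as an assumption

# Vertical extent of the diamond drawn when boxmean = 'sd'.
diamond_top = m + sd
diamond_bottom = m - sd

print(f"median={med}, mean={m}, sd={sd:.2f}")
print(f"diamond spans {diamond_bottom:.2f} to {diamond_top:.2f}")
```

With skewed data like this, the mean sits above the median, which matches the dashed line appearing above the median line in the demo.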

7. Visualizing Financial Data Using Candlestick Charts


Visualizing Financial Data Using Candlestick Charts. Your host for this session is
Vitthal Srinivasan.

In this demo, we're going to discuss specialized visualizations known as candlestick
charts, which are great for displaying financial data, specifically stock market
information. Let's plunge right in. We begin with the import statements. These
include pandas as pd, graph_objects as go, and then we also import datetime.

import pandas as pd
import plotly.graph_objects as go
from datetime import datetime

Notice that in this, we do not import Plotly Express because there isn't a high-level
Plotly Express API for the candlestick representation as of now. Let's move on and
read in our data into a Pandas data frame.

stock_ge = pd.read_csv('datasets/GE.csv')
stock_ge.head()

This is in a file called GE.csv. As the name would suggest, this contains stock market
information for General Electric. We've downloaded this from Yahoo Finance and
we're now reading it into a Pandas data frame. As usual, we invoke the head method
on the data frame in order to get a sense of the available columns. Here we have a
Date column, then we have the Open, High, Low, Close, Adj Close, and the
Volume. Next, let's examine the shape property of our Pandas data frame. And we
can see that this has 1259 rows and 7 columns.

The input in Cell 3 is stock_ge.shape. The output is (1259, 7).

It's worth noting here that the Date column at this point has been read in using
pd.read_csv and so it's represented as a string. But even so, we do not need to
convert it explicitly to datetime64[ns] if we are making use of the graph objects APIs.
These APIs are smart enough to figure out that these strings are dates. This is quite
an improvement over some of the other Python libraries, which require Pandas date
columns to be explicitly converted to datetime64[ns]. So at this point, we can move
on and directly visualize our candlestick chart.

trace = go.Candlestick(x = stock_ge['Date'],
                       open = stock_ge['Open'],
                       high = stock_ge['High'],
                       low = stock_ge['Low'],
                       close = stock_ge['Close'])
fig = go.Figure(data = [trace])
fig.show()

We'll do this using the graph objects APIs. We start by creating a trace, that trace is
created by invoking go.Candlestick. Then we pass that trace into a figure and then we
invoke the show method on that figure. The trace which is of type go.Candlestick
requires us to specify the x-axis variable which here is the date column from our
stock_ge Pandas data frame. Then the candlestick chart requires us to specify
keyword arguments for the Open, High, Low, and Close. And these, of course, are
taken from the corresponding columns of the Pandas data frame. Then we pass the
trace into the data keyword argument while instantiating our go.Figure, and that
really is all there is to it. We hit Shift+Enter and our first candlestick chart
appears on screen before us. Let's scroll down just a little bit.
And there we see that this candlestick chart has not one but two visualizations. Both
of them have the same x-axis, which of course is time. Here we have data for the
stock price of GE for the last five years. So this starts in 2016 and extends up to 2020.
If you're wondering what that little visualization below the main chart is, well, it's a
slider. That's a slider that we can use in order to select a subset of the data in the
main visualization. And we can use the knobs, we can see how those knobs restrict
the amount of data that we show. And as we pull the left knob towards the right, we
can see that the x-axis also seems to compress.

We now only have data for 2020. We'll have more to say about the exact markers for
each point. But that is a little hard to tell in a very dense visualization like this one.
Let's now hover on the toolbar on the top right. There are many interesting options
there. For instance, this one says, Show closest data on hover. The button just to the
left reads Toggle Spike Lines. But for now, we are actually interested in the option to
the right, which is Compare data on hover, so let's click that. And now when we
come back down into the body of the main graph and hover over the individual
points, we get information about the open, high, low, and close of whatever point
we happen to be hovering over. Also notice that the color of the tooltip varies based
on whether that was an up day or a down day.

How does this chart know whether a day was an up day or a down day? Well, it
compares the open and the close. If the open is greater than the close, this means
that the stock fell during the day, in which case the rectangle shows up in red. If the
open is less than the close, it means that the stock price rose during the day and
then the rectangle is green. The candlestick chart is a great feature of Plotly. Plotly is
less focused on interactivity than other competing visualization packages, such as say
Bokeh for instance. But even so, these sliders make the candlestick chart really quite
interactive. In any case, let's now move on and construct some more types of
candlestick charts. For our next chart, we are going to hone in on a very small subset
of the data.

stock_ge_latest = stock_ge[-15:]
stock_ge_latest.head()

This will allow us to view each of the candlestick patterns in more detail. On screen
now, we've taken a subset of the ge stock data consisting of only the last 15 rows.
We do this by indexing into the Pandas data frame called stock_ge. Inside the square
brackets for the indexing operator, we do not have any number after the colon. This
indicates that we want all of the rows up to the end, but how many rows? And the
answer is 15 rows. The negative sign tells us that we would like the 15 rows, which
start 15 rows from the end and go on to include the last row. When we invoke the
head method on this Pandas data frame, we can see that it has data for December
of 2020.

The Cell 5 displays a table with the following column header: Date, Open, High,
Low, and so on.
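The negative slice described above can be tried on a tiny stand-in data frame. The prices frame below is hypothetical. It only exists to show that [-15:] selects the last 15 rows, in the same way that tail(15) does.

```python
import pandas as pd

# Hypothetical data frame with 20 rows and a default 0..19 index.
prices = pd.DataFrame({"Close": range(100, 120)})

# Start 15 rows from the end and run through the last row.
latest = prices[-15:]

print(len(latest))
print(latest.head())
```

Note that the original row labels are kept, so the first row of the subset still carries its index from the full data frame.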

We can now go ahead and compute a candlestick chart.

trace = go.Candlestick(x = stock_ge_latest['Date'],
                       open = stock_ge_latest['Open'],
                       high = stock_ge_latest['High'],
                       low = stock_ge_latest['Low'],
                       close = stock_ge_latest['Close'])
fig = go.Figure(data = [trace])
fig.show()

This is done in the same way as we previously did. We create a trace object by
invoking go.Candlestick and pass that in while instantiating go.Figure, then we can
just show our graph object. Let's scroll down just a little bit and examine this
candlestick chart in more detail. Because there are now only 15 points, we can
understand each of these candlesticks. And if we think about it, each candlestick is
remarkably similar to a boxplot. The rectangle in the middle of each candlestick
defines the open and close. If the open was higher than the close, then that was a
down day for that particular stock, and that's why the whole candlestick is red. If the
open is less than the close, then that was an up day for that
stock, and so that particular candlestick is green. The whiskers which extend out in
both directions define the high and the low. Of course, the high is the upper whisker
and the low is the lower whisker. To confirm this, let's hover over one individual
candlestick.

This is for Dec 4, 2020. This candlestick is green because the open which is 10.67 is
less than the close, which is 10.88. If we look closely at the values on the y-axis, we
can see that the lower bound of the rectangle corresponds to the open, that's 10.67.
And the upper bound corresponds to the close, 10.88. The upper whisker ends at the
high which is 10.93. And the lower whisker ends at the low, which is 10.51. And we
can repeat this experiment by hovering over some of the other candlesticks. And we
will always see that the color encodes whether that was an up or a down day for that
particular stock. And then the structure of the candlestick is very similar to that of a
box and whisker plot. Now, if you look closely at this candlestick chart, you'll see
that this one too has a slider below the main visualization. That slider made sense in
the previous one where we had data extending five years out, that was a very dense
visual. Here however, we only have 15 days' worth of data, and so we
probably can do without the slider. The advantage of using the graph objects APIs or
Plotly Express APIs is that we can control every aspect of a plot's appearance.

fig = go.Figure(data = [go.Candlestick(x = stock_ge_latest['Date'],
                                       open = stock_ge_latest['Open'],
                                       high = stock_ge_latest['High'],
                                       low = stock_ge_latest['Low'],
                                       close = stock_ge_latest['Close'])])
fig.update_layout(xaxis_rangeslider_visible = False)
fig.show()

On screen now is another version of the same candlestick chart. The only difference
is that now we turned off the display of the slider. How did we do that? Well, we
added one line of code, we invoked the update_layout method on our figure. And
inside that update_layout method, we passed in xaxis_rangeslider_visible = False.
And now, when we show this figure, we'll see that we have pretty much the same
candlestick chart, but without the slider below the main visual. And that gets us to
the end of this little demo in which we explored the use of candlestick
charts, which are a great visualization for displaying stock market information.
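As a recap, the way a single candlestick is read can be sketched in plain Python, using the Dec 4, 2020 values quoted earlier in this demo. The function name candle_color and the "unchanged" label for a day where open equals close are assumptions made for illustration.

```python
def candle_color(open_price, close_price):
    """Green when the close is above the open, red when it is below."""
    if close_price > open_price:
        return "green"
    if close_price < open_price:
        return "red"
    return "unchanged"  # hypothetical label, this case is not covered in the demo

# Values quoted in the walkthrough for Dec 4, 2020.
day = {"open": 10.67, "high": 10.93, "low": 10.51, "close": 10.88}

color = candle_color(day["open"], day["close"])

# The rectangle spans open to close, the whiskers reach the low and the high.
body = (min(day["open"], day["close"]), max(day["open"], day["close"]))
whiskers = (day["low"], day["high"])

print(color, body, whiskers)
```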

8. Visualizing Data Using Plotly Funnel Charts


Visualizing Data Using Plotly Funnel Charts. Your host for this session is Vitthal
Srinivasan.
The "FunnelCharts - JupyterNotebook" appears on the screen.

In this demo, we are going to build a funnel chart. You can see this funnel chart on
screen now. It represents the rates of conversion at various points in a sales funnel.
So for instance, up top, we have the number of leads. Then we know how many of
those leads translated into sales calls, how many of those sales calls generated
follow up inquiries. How many of them were converted into requests for proposals,
and finally, how many sales were closed. You can see that we have these sales
funnels broken up by region. In the top right corner, we have the legend which tells
us that we have separate funnels for EMEA, Latin America, North America and the
Asia Pacific region. What's more, each of the values in each of the funnels is also
expressed as a percentage. For
instance, the leads for each of the regions started out as 100% and then subsequent
steps had smaller percentages. And in this way, we can tell at a glance that the sales
team for Latin America seems to be wildly outperforming its counterparts in the
other regions. How do we know? Because the sale percentage for Latin America is
46%, that's significantly higher than the closing percent for any of the other regions.
And Asia Pacific seems to come in last with a sales conversion ratio of just 28%. What
is the driver of this outperformance of the Latin American sales team? Well, it comes
down to how many leads are converted to sales calls, but it doesn't end there. At the
Sales Call stage, we can see that LATAM is at 86%, which is 6 percentage points
higher than any of the other regions.

So the outperformance has already emerged by this point. However, the gap only
widens by the time we get to the Follow Up stage. Here the gap between Latin
America and Asia Pacific has expanded from 6 percentage points in the Sales Call
stage to 11 percentage points in the Follow Up stage. How did we get 11? 69% minus
58%. As we can see from this little
analysis, a funnel chart is an extremely important and useful tool for managing linear
sequential processes, where we like to measure the dropouts or the rates of success
and failure at successive steps of the pipeline. Let's now go ahead and build this
funnel chart. We start with the required import statements. These are pandas,
plotly.express, and graph_objects.

import pandas as pd
import plotly.express as px
from plotly import graph_objects as go

Then we read in the data, this is done using pd.read_csv. And because this data set is
small, we examine it in its entirety.

sale_conv_data = pd.read_csv('datasets/Fun_data.csv')
sale_conv_data

We can see that there are three columns, Stages, Number and Region. And the
Stages correspond to the Stages of the sales process, the Number corresponds to how
many live prospects were left at that stage. And of course, the region is the
geographic region. This is a little dummy data set which we hand crafted and while
doing this handcrafting, we were careful to have the different regions grouped
together. So we first have all of the rows for EMEA, then come all of the rows for
Latin America, followed by those for North America and then for Asia Pacific. Within
the rows for a given region, we've also tried to stick to the stages of the process. So
we always have Leads first, then come Sales Calls, Follow Ups, Conversions and the
Sales closings. We can scroll down and verify that this is true for all of the regions.
This sorted ordering doesn't make a huge difference, but it is a little convenience
to help us in the next step of the process, which you see on screen now.

sales_stages = sale_conv_data.groupby('Stages', sort = False).sum()
sales_stages = sales_stages.reset_index()
sales_stages

Here we have performed a groupby operation on our pandas data frame. Note here
that we've got to specify which column we would like to group by, and that is the
Stages column. And here because our data was already sorted, we could specify sort
= False. The groupby operation requires an aggregation and that is the sum function.
And then we invoke the reset_index method on this pandas data frame, which
converts the stages, which at this point are the indexes of the rows, back into a
column of the pandas data frame. And when we view that data frame, we can
confirm that it has the format that we like. We have the stages along with the
associated numbers. Please note that at this point, we've lost track of the region
information. We have to add that back in, in order to build the complex funnel chart
which we saw at the start of the demo.
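The effect of sort = False and reset_index can be seen on a small stand-in data frame. The rows below are hypothetical. The sketch also selects the Number column before summing, which is a design choice of this example (so that the text column Region is not aggregated) rather than what the demo code does.

```python
import pandas as pd

# Hypothetical two-region data, ordered by stage within each region.
data = pd.DataFrame({
    "Stages": ["Leads", "Sales Call", "Sale", "Leads", "Sales Call", "Sale"],
    "Number": [100, 80, 40, 200, 150, 60],
    "Region": ["EMEA"] * 3 + ["LATAM"] * 3,
})

totals = (
    data.groupby("Stages", sort=False)["Number"]  # keep first-appearance order
    .sum()                                        # add up the regions per stage
    .reset_index()                                # turn Stages back into a column
)

print(totals)
```

With sort = False the stages come out in funnel order (Leads, Sales Call, Sale) rather than alphabetically, which is the order a funnel chart needs.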

But for now, we are going to start simple. On screen now, we have invoked a Plotly
express function px.funnel.

fig = px.funnel(sales_stages, x ='Number', y = 'Stages' )

fig.show()

This shows that we can use a high level API in order to build our first funnel chart.
This is an aggregate funnel chart which combines information across all of the
regions in our little data set. The first input argument is the pandas data frame with
the source. The second is the column name of the x variable, and the third is the
column name of the y variable. We've instantiated the figure and then invoked the
show method on it and this brings up the first aggregated funnel chart. You can see
that we have the Leads, Sales Calls, Follow Ups, Conversion and then the Sale. Now
at this point, when we hover over the steps in this funnel, we just get the basic
information, which is already visible inside the bars themselves.
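Since the tooltip only shows raw counts, it can help to see how a count relates to the rest of the funnel. Here is a minimal sketch with hypothetical counts, expressing each stage as a share of the first stage, of the previous stage, and of the total. The variable names are chosen for this example.

```python
stages = ["Leads", "Sales Call", "Follow Up", "Conversion", "Sale"]
numbers = [1000, 800, 600, 400, 300]  # hypothetical counts

# Share of the first stage in the funnel.
percent_initial = [n / numbers[0] for n in numbers]

# Share of the stage immediately before (the first stage counts as 100%).
percent_previous = [1.0] + [cur / prev for prev, cur in zip(numbers, numbers[1:])]

# Share of the sum over all stages.
percent_total = [n / sum(numbers) for n in numbers]

for stage, n, p_init, p_prev in zip(stages, numbers, percent_initial, percent_previous):
    print(f"{stage:<11} {n:>5}  {p_init:.0%} of initial, {p_prev:.0%} of previous")
```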

What would be awesome at this point is to also have the percentages. And that's
easy enough to do but we'll have to make use of graph objects and not the Plotly
express function. On screen now, we've invoked the go.Figure method. This is from
the graph objects API, so we are instantiating a graph object which is going to hold
our funnel chart.

fig = go.Figure(go.Funnel(y = ['Leads', 'Sales Call', 'Follow Up', 'Conversion', 'Sale'],
                          x = sales_stages['Number'],
                          textinfo = 'value+percent initial'))

fig.show()

This
instantiation of the figure takes in an invocation of go.Funnel, go.Funnel is going to
define the trace which will go inside this figure. go.Funnel requires the y and the x
variables. The y variables are the names of the stages, the x variables are taken from
the number column of the pandas data frame. But what's really interesting here is
the input argument called textinfo, which is set to the special string value+percent
initial. And this is directly interpreted by the Plotly APIs. And when we hit
Shift+Enter, we will see that we do indeed have the values as well as the percentages
of the initial value associated with each step in our funnel. This textinfo input
argument will accept any combination of the strings label, text, percent initial,
percent previous, percent total and value. And these words can be joined with the
plus symbol, and that's exactly what we did. The funnel chart you see on screen now
has textinfo set to value+percent initial. It's time to build on this funnel chart by
breaking up the funnels by region, and that's what we've done on screen. We've
gone back to the Plotly express API, and we've invoked px.funnel. We still pass in as
the first input argument a pandas data frame. We still have the x and the y variables,
but now the color also is the name of a column from the data frame, and that is the
Region column. That's why when we call show on this figure, we can see that each of
the horizontal bars corresponding to the steps in
the funnel has been broken up by region. At this point, we have almost achieved
the very nice funnel chart which we saw at the start of the demo. But there's one
important element missing. We don't have the percentages for each region and for
each step of the funnel. We'll get there, but first, let's just hover over these different
rectangles. And we can see from the tooltips that we get the region name and the
number of data points as well as the current stage. So this is still helpful, but we can
do a little bit better. Here on screen now is the code for the fully blown funnel chart
which we saw at the start of the demo.

fig = go.Figure()

fig.add_trace(go.Funnel(name = 'EMEA',
                        y = ['Leads', 'Sales Call', 'Follow Up', 'Conversion', 'Sale'],
                        x = sale_conv_data.query("Region == 'EMEA'")['Number'],
                        textinfo = 'value+percent initial'))

We've gone back to graph objects because we did not have the required level of
control using Plotly express.

This time, it wasn't as simple as merely specifying color equal to Region. That's what
we had done in the case of the Plotly express funnel. Instead, we have invoked
go.Funnel four times, and each of those funnels has been added in to our figure
using fig.add_trace. Now each of those funnels corresponds to a different region.
Let's just focus on the first region, that's the EMEA region, to understand how this
funnel is wired up. The name is EMEA, y corresponds to a list with the steps in the
sales process. The x is really interesting, let's come back to that, and then the
textinfo is once again value+percent initial. Let's now come back to the x value for
EMEA. We obtained this x value by querying the rows in our underlying pandas data
frame by invoking the .query method on it.
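As a rough, standalone illustration of this filtering step, the sketch below runs the same kind of .query call on a tiny made-up data frame. The Region and Number column names and the EMEA value come from the demo; the Stage column, the second region name, and every number are assumptions for the example.

```python
import pandas as pd

# Hypothetical rows: "APAC", the "Stage" column and all numbers are invented.
sale_conv_data = pd.DataFrame({
    "Region": ["EMEA", "EMEA", "APAC", "APAC"],
    "Stage": ["Leads", "Sale", "Leads", "Sale"],
    "Number": [500, 40, 300, 25],
})

# Keep only the EMEA rows, then pull out the Number column as a Series.
emea_numbers = sale_conv_data.query("Region == 'EMEA'")["Number"]

print(emea_numbers.tolist())  # prints [500, 40]
```

The result is a pandas Series holding one value per stage for that region, which is the shape go.Funnel expects for x.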

And that .query method takes in a predicate expression. What's the predicate
expression? Region == 'EMEA'. Notice how EMEA is enclosed within single quotes. So
effectively, we've now kept only the rows in the sale_conv_data pandas
data frame where Region is equal to EMEA. And then from those rows, we've
extracted one specific column. That's the column called Number and that's why we
have Number enclosed within square brackets there. This now describes the funnel
for EMEA. All of the other funnels are virtually identical. The only difference is in the
name and in the query which is passed while invoking the .query method on the
pandas data frame. Each of the other funnels has the name of the corresponding
region instead of EMEA. You might also notice that we've specified orientation = h
for the other three funnels, but we've not done so for EMEA. Does this make any
difference to the output? No, it does not. The orientation by default is equal to h,
and so whether we leave it out or explicitly specify it, it doesn't matter. We can also
specify orientation = v, which will cause funnel bars to appear vertically rather than
horizontally. And now when we hit Shift+Enter, graph objects is smart enough to
figure out that we wanted these four funnels to splice together. This output is not a
surprise to us because we discussed it in some detail at the start of the demo. The
key attraction of this particular funnel chart is that it has the percentages associated
with each step for each region. This gets us to the end of our exploration of funnel
charts in Plotly. You can see how powerful these are for linear sequential processes
where we would like to investigate the rates of success or failure at different steps in that
sequential process.


9. Course Summary

We have now come to the end of this course, Getting Started with Plotly. We began
by installing Plotly using the pip package manager, and then spent some time
understanding the components of the Plotly library. We noted that Plotly Express is
the high-level library for creating visualizations, while plotly.graph_objects is a
relatively low-level library with JSON representations of figures. The Plotly Express
API is a set of high-level wrappers which, under the hood, construct and manipulate
plotly.graph_objects figures. We then saw how we could create a basic Plotly Express figure
object to create a graph, and how we could access the JSON representation of a
graph objects chart. We then moved on to the use of box and whisker plots for
visualizing and representing different statistical properties of a continuous data
series. These plots pack a lot of information into a single visualization, including the
median, the 25th and 75th percentiles, and the min and max
values in a data set. We also saw how we could create separate box plots for
categories of a second categorical variable, and use color to further drill down into a
third categorical variable. We then also saw how we could visualize all the data
points segregated by a category using the strip plot. We then saw how we could
create box plots colored based on the value of a categorical field, and how we could
create a notched box plot to display confidence intervals around the median as well.
After that, we moved on to the use of candlestick charts to visualize the trend of
stock price performance over a period of time. Candlestick charts visualize data in a
remarkably similar fashion to box and whisker plots, but tweak the meanings of the
different plot elements to visualize the open, high, low, and close prices of a stock for
a specific time period such as a date. In addition, the color of the candlestick tells us
whether the stock rose or fell. This is cleverly inferred by comparing the open and
the close. What's more, the Plotly candlestick chart also includes a range slider that
can be used to zoom into specific time ranges in the chart, adding a touch of
interactivity to this great visualization. We then finally rounded out this course by
exploring the use of funnel charts, which are a niche type of visualization just perfect
for plotting sequential steps in a linear process, such as conversions of sales leads
into actual closed sales after various steps along the way. Sequential processes like a
sales or marketing funnel are not all that common, but in those situations where
they do arise,
funnel charts are absolutely incredible, because they allow you to attribute
performance very accurately, and dig into precisely the step of the process
where performance needs to be improved. You now have a good grip on the
foundations of using Plotly to construct visualizations, and are ready to move on to
some more advanced techniques in the course Visualizing Data Using Advanced Charts
in Plotly, coming up ahead.


10. Test
