Professional Documents
Culture Documents
PDF Machine Learning in Business An Introduction To The World of Data Science 2Nd Edition John C Hull Ebook Full Chapter
PDF Machine Learning in Business An Introduction To The World of Data Science 2Nd Edition John C Hull Ebook Full Chapter
https://textbookfull.com/product/introduction-to-machine-
learning-with-python-a-guide-for-data-scientists-andreas-c-
muller/
https://textbookfull.com/product/artificial-intelligence-with-an-
introduction-to-machine-learning-2nd-edition-richard-e-
neapolitan/
https://textbookfull.com/product/introduction-to-machine-
learning-with-python-a-guide-for-data-scientists-1st-edition-
andreas-c-muller/
https://textbookfull.com/product/social-science-an-introduction-
to-the-study-of-society-david-c-colander/
Python Real World Machine Learning: Real World Machine
Learning: Take your Python Machine learning skills to
the next level 1st Edition Joshi
https://textbookfull.com/product/python-real-world-machine-
learning-real-world-machine-learning-take-your-python-machine-
learning-skills-to-the-next-level-1st-edition-joshi/
https://textbookfull.com/product/flex-the-art-and-science-of-
leadership-in-a-changing-world-jeffrey-hull/
https://textbookfull.com/product/encyclopedia-of-machine-
learning-and-data-mining-2nd-2nd-edition-claude-sammut/
https://textbookfull.com/product/social-science-an-introduction-
to-the-study-of-society-18th-edition-david-c-colander/
https://textbookfull.com/product/encyclopedia-of-machine-
learning-and-data-mining-2nd-edition-claude-sammut/
Machine Learning in
Business:
An Introduction to the World of Data
Science
Machine Learning in
Business:
An Introduction to the World of Data
Science
Second Edition
John C. Hull
University Professor
Joseph L. Rotman School of Management
University of Toronto
Second Printing
Copyright © 2019, 2020 by John C. Hull
All Rights Reserved
ISBN: 9798644074372
To my students
Contents
Preface xi
Chapter 1 Introduction 1
1.1 This book and the ancillary material 3
1.2 Types of machine learning models 4
1.3 Validation and testing 6
1.4 Data cleaning 14
1.5 Bayes’ theorem 16
Summary 19
Short concept questions 20
Exercises 21
John Hull
Introduction
the most exciting development within AI and one that has the potential
to transform virtually all aspects of a business.2
What are the advantages for society of replacing human decision
making by machines? One advantage is speed. Machines can process
data and come to a conclusion much faster than humans. The results
produced by a machine are consistent and easily replicated on other
machines. By contrast, humans occasionally behave erratically and
training a human for a task can be quite time consuming and expensive.
To explain how machine learning differs from other AI approaches
consider the simple task of programming a computer to play tic tac toe
(also known as noughts and crosses). One approach would be to pro-
vide the computer with a look-up table listing the positions that can
arise and the move that would be made by an expert human player in
each of those positions. Another would be to create for the computer a
large number of games (e.g., by arranging for the computer to play
against itself thousands of times) and let it learn the best move. The
second approach is an application of machine learning. Either approach
can be successfully used for a simple game such as tic tac toe. Machine
learning approaches have been shown to work well for more complicat-
ed games such as chess and Go where the first approach is clearly not
possible.
A good illustration of the power of machine learning is provided by
language translation. How can a computer be programmed to translate
between two languages, say from English to French? One idea is to give
the computer an English to French dictionary and program it to trans-
late word-by-word. Unfortunately, this produces very poor results. A
natural extension of this idea is to develop a look up table for translat-
ing phrases rather than individual words. The results from this are an
improvement, but still far from perfect. Google has pioneered a better
approach using machine learning. This was announced in November
2016 and is known as “Google Neural Machine Translation” (GNMT).3 A
computer is given a large volume of material in English together with
the French translation. It learns from that material and develops its own
(quite complex) translation rules. The results from this have been a big
improvement over previous approaches.
Data science is the field that includes machine learning but is some-
times considered to be somewhat broader including such tasks as the
setting of objectives, implementing systems, and communicating with
2 Some organizations now use the terms “machine learning” and “artificial intelli-
gence” interchangeably.
3 See https://arxiv.org/pdf/1609.08144.pdf for an explanation of GNMT by the
4See, for example, H. Bowne-Anderson, “What data scientists really do, according to
35 data scientists,” Harvard Business Review, August 2018:
https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-
scientists
4 Chapter 1
inspire some readers to learn more and develop their abilities in this
area. Data science may well prove to be the most rewarding and excit-
ing profession in the 21st century.
To use machine learning effectively you have to understand how the
underlying algorithms work. It is tempting to learn a programming lan-
guage such as Python and apply various packages to your data without
really understanding what the packages are doing or even how the re-
sults should be interpreted. This would be a bit like a finance specialist
using the Black−Scholes−Merton model to value options without under-
standing where it comes from or its limitations.
The objective of this book is to explain the algorithms underlying
machine learning so that the results from using the algorithms can be
assessed knowledgeably. Anyone who is serious about using machine
learning will want to learn a language such as Python for which many
packages have been developed. This book takes the unusual approach
of using both Excel and Python to provide backup material. This is be-
cause it is anticipated that some readers will, at least initially, be much
more comfortable with Excel than with Python.
The backup material can be found on the author’s website:
www-2.rotman.utoronto.ca/~hull
Readers can start by focusing on the Excel worksheets and then move to
Python as they become more comfortable with it. Python will enable
them use machine learning packages, handle data sets that are too large
for Excel, and benefit from Python’s faster processing speeds.
Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning
Data scientists are typically not just interested in testing one model.
They typically try several different models, choose between them, and
then test the accuracy of the chosen model. For this, they need three
data sets:
a training set
a validation set
a test set
The training set is used to determine the parameters of the models
that are under consideration. The validation set is used to determine
how well each of the models generalizes to a different data set. The test
set is held back to provide a measure of the accuracy of the chosen
model.
We will illustrate this with a simple example. Suppose that we are in-
terested in predicting the salaries of people working in a particular pro-
fession in a certain part of the United States from their age. We collect
data on a random sample of 30 individuals. (This is a very small data set
created to provide a simple example. The data sets used in machine
learning are many times larger than this.) The first ten observations
(referred to in machine learning as instances) will be used to form the
training set. The next ten observations will be used for form the valida-
tion set and the final ten observations will be used to form the test set.
The training set is shown in Table 1.1 and plotted in Figure 1.1. It is
tempting to choose a model that fits the training set really well. Some
experimentation shows that a polynomial of degree five does this. This
is the model:
Y a b1 X b2 X 2 b3 X 3 b4 X 4 b5 X 5
where Y is salary and X is age. The result of fitting the polynomial to the
data is shown in Figure 1.2. Details of all analyses carried out, are in
www-2.rotman.utoronto.ca/~hull
The model provides a good fit to the data. The standard deviation of
the difference between the salary given by the model and the actual sal-
ary for the ten individuals in the training data set, which is referred to
as the root-mean-squared error (rmse), is $12,902. However, common
sense would suggest that we may have over-fitted the data. (This is be-
cause the curve in Figure 1.2 seems unrealistic. It declines, increases,
declines, and then increases again as age increases.) We need to check
the model out-of-sample. To use the language of data science, we need
8 Chapter 1
Table 1.1 The training data set: salaries for a random sample of ten
people working in a particular profession in a certain area.
Figure 1.1 Scatter plot of the training data set in Table 1.1
350,000
300,000
250,000
Salary ($)
200,000
150,000
100,000
50,000
0
20 30 40 50 60 70
Age (years)
Introduction 9
350,000
300,000
250,000
Salary ($)
200,000
150,000
100,000
50,000
0
20 30 40 50 60 70
Age (years)
The validation set is shown in Table 1.2. The scatter plot for this da-
ta is in Figure 1.3. When we use the model in Figure 1.2 for this data, we
find that the root mean square error (rmse) is about $38,794, much
higher than the $12,902 we obtained using the training data set in Table
1.1. This is a clear indication that the model in Figure 1.2 is over-fitting:
it does not generalize well to new data.
350,000
300,000
250,000
Salary ($)
200,000
150,000
100,000
50,000
0
20 30 40 50 60 70
Age (years)
The natural next step is to look for a simpler model. The scatter plot
in Figure 1.1 suggests that a quadratic model might be appropriate. This
model is:
Y a b1 X b2 X 2
Figure 1.4 Result of fitting a quadratic model to the data in Table 1.1
and Figure 1.1 (see Salary vs. Age Excel file)
350,000
300,000
250,000
Salary ($)
200,000
150,000
100,000
50,000
0
20 30 40 50 60 70
Age (years)
Visually it can be seen that this model does not capture the decline in
salaries as individuals age beyond 50. This observation is confirmed by
the standard deviation of the error for the training data set, which is
$49,731, much worse than that for the quadratic model.
Figure 1.5 Result of fitting a linear model to training data (see Sala-
ry vs. Age Excel file)
350,000
300,000
250,000
Salary ($)
200,000
150,000
100,000
50,000
0
20 30 40 50 60 70
Age (years)
12 Chapter 1
Table 1.3 summarizes the root mean square errors given by the
three models we have considered. Note that both the linear model and
the quadratic model generalize well to the validation data set, but the
quadratic model is preferred because it is more accurate. By contrast,
the five-degree polynomial model does not generalize well. It over-fits
the training set while the linear model under-fits the training set.
Table 1.4 Errors when quadratic model is applied to the test set
This rule is illustrated in Figure 1.6. The figure assumes that there is a
continuum of models that get progressively more complex. For each
model, we calculate a measure of the model’s error, such as root mean
square error, for both the training set and the validation set. When the
complexity of the model is less than X, the model generalizes well: the
error of the model for the validation set is only a little more than that
for the training set. As model complexity is increased beyond X, the er-
rors for the validation set start to increase.
Figure 1.6 Errors of a model for the training set and the vali-
dation set.
Training set
Model
Error Validation set
X Model Complexity
14 Chapter 1
The best model is the one with model complexity X. This is because
that model has the lowest error for the validation set. A further increase
in complexity lowers errors for the training set but increases them for
the validation set, which is a clear indication of over-fitting.
Finding the right balance between under-fitting and over-fitting is
referred to as the bias-variance trade-off in machine learning. The bias is
the error due the assumptions in the model that cause it to miss rele-
vant relations. The variance is the error due to the model over-fitting by
reflecting random noise in the training set.
To summarize the points we have made:
In the simple example we have looked at, the training set, validation
set, and test set had equal numbers of observations. In a typical ma-
chine learning application much more data is available and at least 60%
of it is allocated to the training set while 10% to 20% is allocated to
each of the validation set and the test set.
It is important to emphasize that the data sets in machine learning
involve many more observations that the baby data set we have used in
this section. (Ten observations are obviously insufficient to reliably
learn a relationship.) However, the baby data set does provide a simple
illustration of the bias-variance trade-off.
5 See https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-
time-consuming-least-enjoyable-data-science-task-survey-says/#2f8970aa6f63 for
a discussion of this.
Introduction 15
Inconsistent Recording
Either numerical or categorical data can be subject to inconsistent
recording. For example, numerical data for the square footage of a
house might be input manually as 3300, 3,300, 3,300 ft, or 3300+, and
so on. It is necessary to inspect the data to determine variations and
decide the best approach to cleaning. Categorical data might list the
driveway as “asphalt”, “Asphalt”, or even “aphalt.” The simplest ap-
proach here is to list the alternatives that have been input for a particu-
lar feature and merge them as appropriate.
Unwanted Observations
If you are developing a model to predict house prices in a certain ar-
ea, some of your data might refer to the prices of apartments or to the
prices of houses that are not in the area of interest. It is important to
find a way of identifying this data and removing it before any analysis is
attempted.
Duplicate Observations
When data is merged from several different sources or several dif-
ferent people have been involved in creating a data set there are liable
to be duplicate observations. These can bias results. It is therefore im-
portant to use a search algorithm to identify and remove duplicates as
far as possible.
Outliers
In the case of numerical data, outliers can be identified by either
plotting data or searching for data that is, say, six standard deviations
away from the mean. Sometimes it is clear that the outlier is a typo. For
example, if the square footage of a house with three bedrooms is input
16 Chapter 1
Missing Data
In any large data set there are likely to be missing data values. A
simple approach is to remove data with missing values for one or more
features. But this is probably undesirable because it reduces the sample
size and may create biases. In the case of categorical data, a simple solu-
tion is to create a new category titled “Missing.” In the case of numerical
data, one approach is to replace the missing data by the mean or median
of the non-missing data values. For example, if the square footage of a
house is missing and we calculate the median square footage for the
houses for which this data is available to be 3,500, we could populate all
the missing values with 3,500. More sophisticated approaches can in-
volve regressing the target against non-missing values and then using
the results to populate missing values. Sometimes it is reasonable to
assume that data is missing at random and sometimes the very fact that
data is missing is itself informative. In the latter case it can be desirable
to create a new indicator variable which is zero if the data is present
and one if it is missing.
𝑃(𝑋|𝑌)𝑃(𝑌)
𝑃(𝑌|𝑋) = (1.1)
𝑃(𝑋)
𝑃(𝑋 and 𝑌)
𝑃(𝑌|𝑋) =
𝑃(𝑋)
and
𝑃(𝑋 and 𝑌)
𝑃(𝑋|𝑌) =
𝑃(𝑌)
Substituting for 𝑃(𝑋 and 𝑌) from the second of these equations into the
first leads to the Bayes’ theorem result in equation (1.1).
For an application of Bayes’ theorem, suppose that a bank is trying
to identify customers who are attempting to do fraudulent transactions
at branches. It observes that 90% of fraudulent transactions involve
over $100,000 and occur between 4pm and 5pm. In total, only 1% of
transactions are fraudulent and 3% of all transactions involve over
$100,000 and occur between 4pm and 5pm.
In this case we define:
We know that P(Y) = 0.01, 𝑃(𝑋|𝑌) = 0.9, and P(X) = 0.03. From Bayes’
theorem:
Both Hannibal and Hasdrubal his brother smiled when this letter
was ended, and the former remarked:
“Ay, Mago, my lad, I see it, despite all the girl’s cunning. She is
indeed a clever girl, and she wants me to give her to Maharbal, and
by Melcareth! she shall have her wish, for she can be useful, ay,
indeed more than useful to me at the present juncture. I have sore
need of the close alliance of the whole of the Ilergetes and of all the
great tribes dependant upon them, for we shall not long be left alone
in Iberia, since the Romans will soon be sending their legions here.
And this girl can win us this alliance as she saith. But for all that I will
not give her her liberty, nay, nor marry her to Maharbal, for he
certainly shall not marry her if I let him not wed Elissa; further, I
would keep a hold on the girl. But for the interests of the State I will,
as she desireth, make her happy. I will therefore give her to
Maharbal, at all events for the time being. He shall leave the palace
at New Carthage, and whether he will or no shall take her to live with
him, with the understanding given to her by me that she is to be
considered as his affianced wife, to be wedded and set free when I
see fit. If that will not make her happy, then I am not named
Hannibal. I am not, alas! so sure of Maharbal himself, nor of Elissa.
But reasons of State ever are paramount, and all must bend to my
will or suffer for it.”
And Hannibal frowned deeply at the mere idea of being thwarted
in any way.
“No one dare oppose thy will, brother,” said Hasdrubal, “for thou
art king here absolutely; although thou wearest not the crown thou
couldst any day, an thou would, place it upon thy brow. Thy plan is a
good one as I see it. For it will firstly have the effect of separating
Elissa from Maharbal; secondly, it will prevent the latter from
marrying at all; thirdly, thou canst send the girl Melania under the
escort of Maharbal himself to the court of King Andobales, and she
can point to him as her affianced husband. That will more than
content these barbarians, especially when they know how highly he
stands in thy favour, and that he is, leaving his high connections on
one side, commander of all the Numidian Cavalry.”
“It shall be done without delay,” said the chief. “Call in my faithful
Greek friend and scribe, Silenus; he shall write the necessary letters
for us. He hath a cunning hand hath Silenus, and knoweth well how
to convey an order so that it is thoroughly understood, yet seemeth
but intended as a favour. But at all events, Maharbal shall not marry,
and to my mind Elissa is too young to be married yet. Further, she
may be useful to the State later.”
These reflections he added meditatively, as if sorry for the blow
that he was about to inflict upon his daughter.
Then Silenus, who was ever Hannibal’s closest friend, and who
accompanied him in all his wanderings, was called in, and three
letters were written—to Elissa, Maharbal, and Melania respectively,
all carefully worded.
In about a week the courier bearing the letters arrived at New
Carthage, where they caused considerable stir and many heart-
burnings.
That to Elissa, after conveying the warmest praise for her conduct,
intimated the speedy arrival of Hannibal himself, and then referred to
Maharbal. Of him the Commander-in-Chief said, that since he was
the only officer in the whole of the Carthaginian army of those who
had served before Saguntum who had no female slaves, and that
Elissa herself being unmarried and Maharbal residing in the palace
with her, some talk was being bandied about the camp which were
best suppressed, therefore, Hannibal considered it best that
Maharbal should leave the palace forthwith, and as he seemed not
yet wholly recovered from his wound, that he should take Melania
with him to watch him until his recovery. Further, Hannibal intimated
to his daughter that, as there were reasons of State for this
arrangement, he trusted to her duty, even if she should herself have
formed any attachment for the young man, to offer no opposition to
her father’s projects. The letter ended with instructions to send the
unfortunate Zeno and the captive guards of Adherbal to perpetual
slavery in the silver mines.
To Maharbal were conveyed the warmest thanks and praise of his
Commander-in-Chief, an intimation of his promotion, and of the
despatch to him of much gold and many horses; further, a deed of
gift conveying to him a house belonging to Hannibal, situated near
the citadel. He was also informed that, as a reward for his bravery
and devotion, Melania was appointed to be his companion, and,
although Hannibal would himself not resign his own vested rights in
her, she was to be considered in all other respects as his slave.
Finally, Hannibal enjoined upon Maharbal that, for reasons of his
own, he expected him to do all in his power to make Melania happy
in every respect; also the necessity of his impressing upon everyone
that Melania was not merely his slave but his affianced bride, to be
wedded when his commander should see fit.
In a kindly-worded note, in Hannibal’s own hand on a separate
paper, the contents of which he was enjoined to keep to himself,
Maharbal was informed that he need be in no fear of being plunged
into any immediate wedlock, for that Hannibal had no intention of
having any of his superior officers married for a long time to come,
not, at all events, before certain work of great importance that he had
in hand should be completed.
Before the arrival of these letters, Maharbal and Elissa had been
living in a state of halcyon bliss, the only disturbing element to cause
any trouble having been the foolish little Princess Cœcilia, who, with
her mania for flirtation, had been incessantly casting eyes at the
young Colossus, and indeed making love to him very openly. For she
was dying to get married again, and had conceived the idea of
marrying Maharbal himself. As for Melania, she had suffered greatly
for some days after her escape, and had, during the days that
Maharbal, sick himself, had tended her like a brother, in no wise ever
allowed her feelings to get the upper hand of her self-constraint, nor
allowed her inward devotion and passionate attachment to him to
appear outwardly. As Elissa had also been kindness itself to her, she
had, indeed, during those days of sore sickness, resolved to subdue
self entirely, and to banish from her heart the love she bore to the
gallant officer of the Numidian Horse. Thus it had been solely with
the intention of striving to make her two benefactors happy, while
removing temptation from herself, that she had secretly written as
she had done in the first part of her letter to Hannibal. The latter part
spoke for itself. But her self-abnegation had been utterly
misunderstood by the great commander and his brothers, who had
quite misjudged her, with the result that is known.
The letter that she received herself came to her as a surprise. No
mention was made of the letter that she had sent to Hannibal, but his
to her commenced by saying that he expected shortly to have need
of her services on an important matter; that he regretted to hear of
the danger she had been in, and that he rejoiced at her escape, and
at the condign punishment of her aggressor.
Then the letter continued, that Hannibal, ever mindful of the
happiness of those who had done good service to the State, had not
forgotten her or Maharbal, and was anxious to make them both
happy. Therefore, since Maharbal had not, in the usual fashion of the
army, any female slave living with him, and as he was universally
well spoken of by men and women alike, he had decreed that, for the
present, she was to remove herself from the palace, and to reside
with Maharbal in the house which he himself was going to give him
as a residence. Further, that she was not to consider that she was
being treated lightly in this matter, although she was undoubtedly at
present a slave, nor was she to consider herself merely in the same
light as any other slave-girl who might be the temporary mistress of
the home of one of the nobles in the Carthaginian army. For
Hannibal, bearing the greatest good-will to both Maharbal and
herself, and recognising that, from her birth, she was in a position to
be his wife, had decreed that, while under Maharbal’s roof, Melania
was to be considered and treated as his affianced bride. She was
informed that the actual marriage should take place at such time, as,
in the opinion of Hannibal, it conveniently might, and that, at the
same time, her freedom would be conferred upon her.
The letter ended: “Thou art to show unto Maharbal this my letter
unto thee, and show it further to my daughter, Elissa, Regent and
Governor of New Carthage.”
The terribly mixed feelings with which Melania read this letter
caused her poor fluttering heart to beat as though her bosom would
burst. There was no joy she longed for in life more than to become
all in all to Maharbal, although, alas! she well knew that he did not
love her, but only loved Elissa. Thus, despite her love, she hated the
idea of being compelled to live under his roof as his wife, for this was
very plainly the General’s intention. Again, she knew how Maharbal
himself would take the matter, and she dreaded his scorn and
neglect. She also feared the anger and revenge of which she might
be the sufferer at the hands of Elissa, whose ardent love for
Maharbal she well knew, for she had seen it indulged in openly and
unrestrainedly by the young girl before her very eyes. For Elissa,
with all the thoughtless folly of youth, had never considered her
slave’s presence when with the glorious young Apollo, her own sun
god.
Sooth to say, there was no such man as Maharbal in all the lands
of Carthagena or of Iberia. He was, indeed, a very Adonis for beauty,
with all the strength of a Hercules. It was no wonder that he was
beloved by maids and matrons alike, for in face, form, and
disposition he was in all points a man for a woman to worship.
The wretched Melania in her despair knew not what to do. When
nearly mad with thinking, she eventually sent a maiden with the letter
to Maharbal and Elissa when they were together. And then, leaving a
note in her apartment saying that she was departing for ever, and
that it would be useless to seek her, she fled from the town; walking
as one distraught, not knowing what she would do, but simply with
the idea of taking away her own life in some way. For, from whatever
aspect she looked at it, she could not face the situation. While
passing the guard house and crossing the bridge leading to the
mainland, she met many people who knew her, and who saluted her.
She looked at them vaguely without seeing them, and passed on.
They thought from her dazed expression that she had gone mad.
And so, in fact, the poor girl had in a way. Vaguely still, she
wandered on until she took a little by-road that led up into an
interminable cork and hazel forest, that covered the whole of the
mountain-side. As she was ascending the hill, she met a man whom
she had quite recently befriended, an old soldier who had had his leg
broken in an accident in the palace, and whom she had nursed. He
had gone to live on the mountain-side, where he made a living by
capturing, with the aid of his sons, the game which abounded. He
stopped her, and being a garrulous old man, forced her to speak to
him. He informed her that as evening was now coming on she must
not proceed further, for that she would be in danger of her life from
the wolves, bears, and wild boars with which the forest was filled.
“Wolves, bears, and wild boars! are there many?” she asked.
“The hill is full of them, dear lady Melania; therefore, to go further
to-night will be certain death.”
“Then, as certain death is what I seek, I shall proceed,” replied the
girl. “Take thou this piece of gold, and let me pass. Nay, here are
two, and some silver also—take them all.”
Pushing the old man aside, she passed on, and wandered away
into the recesses of the forest, until, long after having left all vestige
of a trail, she fell from sheer exhaustion beneath the shadow of a
spreading plane-tree, beside a little spring. After drinking a draught
of the cool, refreshing water, she laid herself down to await the
coming of the wild animals that were to solve the vexed problem of
her existence for her, and to terminate all her woes. But she
remained there that night, and also for the following three days,
gradually dying from starvation, and still no ferocious beast came by
to terminate her ills.