Introduction to Algorithms for Data Mining and Machine Learning

Xin-She Yang
Middlesex University
School of Science and Technology
London, United Kingdom
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2019 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system, without
permission in writing from the publisher. Details on how to seek permission, further information about the
Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center
and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other
than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using
any information, methods, compounds, or experiments described herein. In using such information or methods
they should be mindful of their own safety and the safety of others, including parties for whom they have a
professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability
for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or
from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data


A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library

ISBN: 978-0-12-817216-2

For information on all Academic Press publications


visit our website at https://www.elsevier.com/books-and-journals

Publisher: Candice Janco


Acquisition Editor: J. Scott Bentley
Editorial Project Manager: Michael Lutz
Production Project Manager: Nilesh Kumar Shah
Designer: Miles Hitchen
Typeset by VTeX
Contents

About the author
Preface
Acknowledgments

1 Introduction to optimization
  1.1 Algorithms
    1.1.1 Essence of an algorithm
    1.1.2 Issues with algorithms
    1.1.3 Types of algorithms
  1.2 Optimization
    1.2.1 A simple example
    1.2.2 General formulation of optimization
    1.2.3 Feasible solution
    1.2.4 Optimality criteria
  1.3 Unconstrained optimization
    1.3.1 Univariate functions
    1.3.2 Multivariate functions
  1.4 Nonlinear constrained optimization
    1.4.1 Penalty method
    1.4.2 Lagrange multipliers
    1.4.3 Karush–Kuhn–Tucker conditions
  1.5 Notes on software

2 Mathematical foundations
  2.1 Convexity
    2.1.1 Linear and affine functions
    2.1.2 Convex functions
    2.1.3 Mathematical operations on convex functions
  2.2 Computational complexity
    2.2.1 Time and space complexity
    2.2.2 Complexity of algorithms
  2.3 Norms and regularization
    2.3.1 Norms
    2.3.2 Regularization
  2.4 Probability distributions
    2.4.1 Random variables
    2.4.2 Probability distributions
    2.4.3 Conditional probability and Bayesian rule
    2.4.4 Gaussian process
  2.5 Bayesian network and Markov models
  2.6 Monte Carlo sampling
    2.6.1 Markov chain Monte Carlo
    2.6.2 Metropolis–Hastings algorithm
    2.6.3 Gibbs sampler
  2.7 Entropy, cross entropy, and KL divergence
    2.7.1 Entropy and cross entropy
    2.7.2 KL divergence
  2.8 Fuzzy rules
  2.9 Data mining and machine learning
    2.9.1 Data mining
    2.9.2 Machine learning
  2.10 Notes on software

3 Optimization algorithms
  3.1 Gradient-based methods
    3.1.1 Newton's method
    3.1.2 Newton's method for multivariate functions
    3.1.3 Line search
  3.2 Variants of gradient-based methods
    3.2.1 Stochastic gradient descent
    3.2.2 Subgradient method
    3.2.3 Conjugate gradient method
  3.3 Optimizers in deep learning
  3.4 Gradient-free methods
  3.5 Evolutionary algorithms and swarm intelligence
    3.5.1 Genetic algorithm
    3.5.2 Differential evolution
    3.5.3 Particle swarm optimization
    3.5.4 Bat algorithm
    3.5.5 Firefly algorithm
    3.5.6 Cuckoo search
    3.5.7 Flower pollination algorithm
  3.6 Notes on software

4 Data fitting and regression
  4.1 Sample mean and variance
  4.2 Regression analysis
    4.2.1 Maximum likelihood
    4.2.2 Linear regression
    4.2.3 Linearization
    4.2.4 Generalized linear regression
    4.2.5 Goodness of fit
  4.3 Nonlinear least squares
    4.3.1 Gauss–Newton algorithm
    4.3.2 Levenberg–Marquardt algorithm
    4.3.3 Weighted least squares
  4.4 Overfitting and information criteria
  4.5 Regularization and Lasso method
  4.6 Notes on software

5 Logistic regression, PCA, LDA, and ICA
  5.1 Logistic regression
  5.2 Softmax regression
  5.3 Principal component analysis
  5.4 Linear discriminant analysis
  5.5 Singular value decomposition
  5.6 Independent component analysis
  5.7 Notes on software

6 Data mining techniques
  6.1 Introduction
    6.1.1 Types of data
    6.1.2 Distance metric
  6.2 Hierarchy clustering
  6.3 k-Nearest-neighbor algorithm
  6.4 k-Means algorithm
  6.5 Decision trees and random forests
    6.5.1 Decision tree algorithm
    6.5.2 ID3 algorithm and C4.5 classifier
    6.5.3 Random forest
  6.6 Bayesian classifiers
    6.6.1 Naive Bayesian classifier
    6.6.2 Bayesian networks
  6.7 Data mining for big data
    6.7.1 Characteristics of big data
    6.7.2 Statistical nature of big data
    6.7.3 Mining big data
  6.8 Notes on software

7 Support vector machine and regression
  7.1 Statistical learning theory
  7.2 Linear support vector machine
  7.3 Kernel functions and nonlinear SVM
  7.4 Support vector regression
  7.5 Notes on software

8 Neural networks and deep learning
  8.1 Learning
  8.2 Artificial neural networks
    8.2.1 Neuron models
    8.2.2 Activation models
    8.2.3 Artificial neural networks
  8.3 Back propagation algorithm
  8.4 Loss functions in ANN
  8.5 Optimizers and choice of optimizers
  8.6 Network architecture
  8.7 Deep learning
    8.7.1 Convolutional neural networks
    8.7.2 Restricted Boltzmann machine
    8.7.3 Deep neural nets
    8.7.4 Trends in deep learning
  8.8 Tuning of hyperparameters
  8.9 Notes on software

Bibliography
Index
About the author

Xin-She Yang obtained his PhD in Applied Mathematics from the University of Oxford. He then worked at Cambridge University and the National Physical Laboratory (UK) as a Senior Research Scientist. He is now a Reader at Middlesex University London and an elected Bye-Fellow at Cambridge University.

He is also the IEEE Computational Intelligence Society (CIS) Chair for the Task Force on Business Intelligence and Knowledge Management, Director of the International Consortium for Optimization and Modelling in Science and Industry (iCOMSI), and an Editor of Springer's book series Springer Tracts in Nature-Inspired Computing (STNIC).

With more than 20 years of research and teaching experience, he has authored 10 books and edited more than 15 books. He has published more than 200 research papers in international peer-reviewed journals and conference proceedings, with more than 36,800 citations. He has been on the prestigious lists of Clarivate Analytics and Web of Science highly cited researchers in 2016, 2017, and 2018. He serves on the editorial boards of many international journals, including International Journal of Bio-Inspired Computation, Elsevier's Journal of Computational Science (JoCS), International Journal of Parallel, Emergent and Distributed Systems, and International Journal of Computer Mathematics. He is also the Editor-in-Chief of the International Journal of Mathematical Modelling and Numerical Optimisation.
Preface

Both data mining and machine learning are becoming popular subjects for university courses and industrial applications. This popularity is partly driven by the Internet and social media, because they generate a huge amount of data every day, and the understanding of such big data requires sophisticated data mining techniques. In addition, many applications such as facial recognition and robotics have extensively used machine learning algorithms, leading to the increasing popularity of artificial intelligence. From a more general perspective, both data mining and machine learning are closely related to optimization. After all, in many applications we have to minimize costs, errors, energy consumption, and environmental impact and to maximize sustainability, productivity, and efficiency. Many problems in data mining and machine learning are usually formulated as optimization problems so that they can be solved by optimization algorithms. Therefore, optimization techniques are closely related to many techniques in data mining and machine learning.

Courses on data mining, machine learning, and optimization are often compulsory for students studying computer science, management science, engineering design, operations research, data science, finance, and economics. All students have to develop a certain level of data modeling skills so that they can process and interpret data for classification, clustering, curve fitting, and prediction. They should also be familiar with machine learning techniques that are closely related to data mining so as to carry out problem solving in many real-world applications. This book provides an introduction to all the major topics for such courses, covering the essential ideas of all key algorithms and techniques for data mining, machine learning, and optimization.

Though there are over a dozen good books on such topics, most of these books are either too specialized, with a specific readership, or too lengthy (often over 500 pages). This book fills the gap with a compact and concise approach, focusing on the key concepts, algorithms, and techniques at an introductory level. The main approach of this book is informal, theorem-free, and practical. The informal approach covers all fundamental topics required for data mining and machine learning, so readers can gain a basic knowledge of all important algorithms with a focus on their key ideas, without worrying about tedious, rigorous mathematical proofs. In addition, the practical approach provides about 30 worked examples so that readers can see how each step of the algorithms and techniques works. Thus, readers can build their understanding and confidence gradually and in a step-by-step manner. Furthermore, with the minimal requirements of basic high-school mathematics and some basic calculus, such an informal and practical style also enables readers to learn the contents by self-study and at their own pace.

This book is suitable for undergraduates and graduates to rapidly develop all the fundamental knowledge of data mining, machine learning, and optimization. It can also be used by students and researchers as a reference to review and refresh their knowledge in data mining, machine learning, optimization, computer science, and data science.

Xin-She Yang
January 2019 in London
Acknowledgments

I would like to thank all my students and colleagues who have given valuable feedback
and comments on some of the contents and examples of this book. I also would like to
thank my editors, J. Scott Bentley and Michael Lutz, and the staff at Elsevier for their
professionalism. Last but not least, I thank my family for all the help and support.

Xin-She Yang
January 2019
1 Introduction to optimization

Contents
1.1 Algorithms
  1.1.1 Essence of an algorithm
  1.1.2 Issues with algorithms
  1.1.3 Types of algorithms
1.2 Optimization
  1.2.1 A simple example
  1.2.2 General formulation of optimization
  1.2.3 Feasible solution
  1.2.4 Optimality criteria
1.3 Unconstrained optimization
  1.3.1 Univariate functions
  1.3.2 Multivariate functions
1.4 Nonlinear constrained optimization
  1.4.1 Penalty method
  1.4.2 Lagrange multipliers
  1.4.3 Karush–Kuhn–Tucker conditions
1.5 Notes on software

This book introduces the fundamental concepts and algorithms related to optimization, data mining, and machine learning. The main requirement is some understanding of high-school mathematics and basic calculus; however, we will review and introduce some of the mathematical foundations in the first two chapters.

1.1 Algorithms
An algorithm is an iterative, step-by-step procedure for computation. The detailed procedure can be a simple description, an equation, or a series of descriptions in combination with equations. Finding the roots of a polynomial, checking if a natural number is a prime number, and generating random numbers are all algorithms.

1.1.1 Essence of an algorithm


In essence, an algorithm can be written as an iterative equation or a set of iterative equations. For example, to find the square root of $a > 0$, we can use the following iterative equation:

$$x_{k+1} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right), \qquad (1.1)$$

where $k$ is the iteration counter ($k = 0, 1, 2, \ldots$), starting with a random guess $x_0 = 1$.

Example 1
As an example, if $x_0 = 1$ and $a = 4$, then we have

$$x_1 = \frac{1}{2}\left(1 + \frac{4}{1}\right) = 2.5. \qquad (1.2)$$

Similarly, we have

$$x_2 = \frac{1}{2}\left(2.5 + \frac{4}{2.5}\right) = 2.05, \qquad x_3 = \frac{1}{2}\left(2.05 + \frac{4}{2.05}\right) \approx 2.00061, \qquad (1.3)$$

$$x_4 \approx 2.0000001, \qquad (1.4)$$

which is very close to the true value of $\sqrt{4} = 2$. This iterative formula or algorithm converges very quickly: the error is already below $10^{-7}$ after only four iterations.
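As a quick illustration, the iteration in Eq. (1.1) is easy to reproduce in a few lines of Python. This is a minimal sketch (the function name and the printed format are our own choices, not from the book):

```python
def sqrt_iter(a, x0=1.0, n=4):
    """Apply x_{k+1} = (x_k + a/x_k) / 2 (Eq. (1.1)) n times, printing each step."""
    x = x0
    for k in range(1, n + 1):
        x = 0.5 * (x + a / x)
        print(f"x{k} = {x:.8f}")
    return x

sqrt_iter(4.0)  # x1 = 2.5, x2 = 2.05, x3 = 2.00060976, x4 = 2.00000009
```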

The convergence is very quick if we start from different initial values such as $x_0 = 10$ and even $x_0 = 100$. However, for an obvious reason, we cannot start with $x_0 = 0$ due to division by zero.

Finding the square root $x = \sqrt{a}$ is equivalent to solving the equation

$$f(x) = x^2 - a = 0, \qquad (1.5)$$

which is again equivalent to finding the roots of a polynomial $f(x)$. We know that Newton's root-finding algorithm can be written as

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \qquad (1.6)$$

where $f'(x)$ is the first derivative or gradient of $f(x)$. In this case, we have $f'(x) = 2x$. Thus, Newton's formula becomes

$$x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k}, \qquad (1.7)$$

which can be written as

$$x_{k+1} = \left(x_k - \frac{x_k}{2}\right) + \frac{a}{2x_k} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right). \qquad (1.8)$$

This is exactly what we have in Eq. (1.1).


Newton’s method has rigorous mathematical foundations, which has a guaranteed
convergence under certain conditions. However, in general, Eq. (1.6) is more general,
and the gradient information f  (x) is needed. In addition, for the formula to be valid,
we must have f  (x) = 0.
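Eq. (1.6) translates directly into code. Below is a small, hedched sketch of Newton's method in Python, where the function and its derivative are passed in as callables (the names newton, f, and df are illustrative, not from the book):

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    """Newton's root-finding iteration x_{k+1} = x_k - f(x_k)/f'(x_k) (Eq. (1.6))."""
    x = x0
    for _ in range(max_iter):
        d = df(x)
        if d == 0:  # Eq. (1.6) is invalid when f'(x) = 0
            raise ZeroDivisionError("zero derivative encountered")
        x_new = x - f(x) / d
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Recover Eq. (1.1): f(x) = x^2 - 4 with f'(x) = 2x
print(newton(lambda x: x**2 - 4, lambda x: 2 * x, x0=1.0))  # -> 2.0
```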

1.1.2 Issues with algorithms


The advantage of the algorithm given in Eq. (1.1) is that it converges very quickly. However, careful readers may have asked: we know that $4$ has two square roots, $\pm 2$, so how can we find the other root $-2$ in addition to $+2$? Even if we use different initial values such as $x_0 = 10$ or $x_0 = 0.5$, we can only reach $x_* = 2$, not $-2$.

What happens if we start with $x_0 < 0$? From $x_0 = -1$ we have

$$x_1 = \frac{1}{2}\left(-1 + \frac{4}{-1}\right) = -2.5, \qquad x_2 = \frac{1}{2}\left(-2.5 + \frac{4}{-2.5}\right) = -2.05, \qquad (1.9)$$

$$x_3 \approx -2.00061, \qquad x_4 \approx -2.0000001, \qquad (1.10)$$

which approaches $-2$ very quickly. If we start from $x_0 = -10$ or $x_0 = -0.5$, then we always get $x_* = -2$, not $+2$.
This highlights a key issue here: the final solution seems to depend on the initial
starting point for this algorithm, which is true for many algorithms.
Now the relevant question is: how do we know where to start to get a particular
solution? The general short answer is “we do not know”. Thus, some knowledge of
the problem under consideration or an educated guess may be useful to find the final
solution.
In fact, most algorithms may depend on the initial configuration, and such algorithms often carry out search moves locally. Thus, this type of algorithm is often referred to as local search. A good algorithm should be able to "forget" its initial configuration, though such algorithms may not exist for most types of problems. What we need in general is a global search, which attempts to find final solutions that are less sensitive to the initial starting point(s).
Another important issue in our discussions is that the gradient information $f'(x)$ is necessary for some algorithms such as Newton's method given in Eq. (1.6). This poses certain requirements on the smoothness of the function $f(x)$. For example, we know that $|x|$ is not differentiable at $x = 0$. Thus, we cannot directly use Newton's method to find the roots of $f(x) = |x| x^2 - a = 0$ for $a > 0$. Some modifications are needed.
There are other issues related to algorithms, such as the setting of parameters, the slow rate of convergence, condition numbers, and iteration structures. All these make algorithm design and usage somewhat challenging, and we will discuss these issues in more detail later in this book.

1.1.3 Types of algorithms


An algorithm can only do a specific computational task (at most a class of computational tasks), and no algorithm can do all tasks. Thus, algorithms can be classified according to their purposes. An algorithm to find roots of a polynomial belongs to root-finding algorithms, whereas an algorithm for ranking a set of numbers belongs to sorting algorithms. There are many classes of algorithms for different purposes. Even for the same purpose, such as sorting, there are many different algorithms, such as the merge sort, bubble sort, quicksort, and others.

We can also categorize algorithms in terms of their characteristics. The root-finding algorithms we just introduced are deterministic algorithms because the final solutions are exactly the same if we start from the same initial guess: we obtain the same set of solutions every time we run the algorithm. On the other hand, we may introduce some randomization into the algorithm, for example, by using purely random initial points. Every time we run the algorithm, we use a new random initial guess. In this case, the algorithm can have some nondeterministic nature, and such algorithms are referred to as stochastic. Sometimes, using randomness may be advantageous. For example, in the example of the two square roots $\pm 2$ of $4$ using Eq. (1.1), random initial values (both positive and negative) can allow the algorithm to find both roots. In fact, a major trend in modern metaheuristics is using some randomization to suit different purposes.

For the algorithms to be introduced in this book, we are mainly concerned with algorithms for data mining, optimization, and machine learning. We use a relatively unified approach to link algorithms in data mining and machine learning to algorithms for optimization.

1.2 Optimization

Optimization is everywhere, from engineering design to business planning. After all,


time and resources are limited, and optimal use of such valuable resources is crucial.
In addition, designs of products have to maximize the performance, sustainability, and
energy efficiency and to minimize the costs. Therefore, optimization is important for
many applications.

1.2.1 A simple example


Let us start with a very simple example: design a container with volume capacity $V_0 = 10\,\mathrm{m}^3$. As the main cost is related to the cost of materials, the main aim is to minimize the total surface area $S$.

The first thing we have to decide is the shape of the container (cylinder, cube, sphere or ellipsoid, or a more complex geometry). For simplicity, let us start with a cylindrical shape with radius $r$ and height $h$ (see Fig. 1.1).
The total surface area of a cylinder is

$$S = 2(\pi r^2) + 2\pi r h, \qquad (1.11)$$

and the volume is

$$V = \pi r^2 h. \qquad (1.12)$$

Figure 1.1 Design of a cylindrical container.

There are only two design variables $r$ and $h$ and one objective function $S$ to be minimized. Obviously, if there is no capacity constraint, then we can choose not to build the container, and then the cost of materials is zero for $r = 0$ and $h = 0$. However, the constraint requirement means that we have to build a container with a fixed volume $V_0 = \pi r^2 h = 10\,\mathrm{m}^3$. Therefore, this optimization problem can be written as

$$\text{minimize } S = 2\pi r^2 + 2\pi r h, \qquad (1.13)$$

subject to the equality constraint

$$\pi r^2 h = V_0 = 10. \qquad (1.14)$$

To solve this problem, we can first use the equality constraint to reduce the number of design variables by solving for $h$. So we have

$$h = \frac{V_0}{\pi r^2}. \qquad (1.15)$$

Substituting it into (1.13), we get

$$S = 2\pi r^2 + 2\pi r h = 2\pi r^2 + 2\pi r \frac{V_0}{\pi r^2} = 2\pi r^2 + \frac{2V_0}{r}. \qquad (1.16)$$
This is a univariate function. From basic calculus we know that the minimum or maximum can occur at a stationary point, where the first derivative is zero, that is,

$$\frac{dS}{dr} = 4\pi r - \frac{2V_0}{r^2} = 0, \qquad (1.17)$$

which gives

$$r^3 = \frac{V_0}{2\pi}, \quad \text{or} \quad r = \sqrt[3]{\frac{V_0}{2\pi}}. \qquad (1.18)$$

Thus, the height satisfies

$$\frac{h}{r} = \frac{V_0/(\pi r^2)}{r} = \frac{V_0}{\pi r^3} = 2. \qquad (1.19)$$

This means that the height is twice the radius: $h = 2r$. Thus, the minimum surface area is

$$S_* = 2\pi r^2 + 2\pi r h = 2\pi r^2 + 2\pi r (2r) = 6\pi r^2 = 6\pi \left(\frac{V_0}{2\pi}\right)^{2/3} = \frac{6\pi}{\sqrt[3]{4\pi^2}}\, V_0^{2/3}. \qquad (1.20)$$

For $V_0 = 10$, we have

$$r = \sqrt[3]{\frac{V_0}{2\pi}} = \sqrt[3]{\frac{10}{2\pi}} \approx 1.1675, \qquad h = 2r \approx 2.335,$$

and the total surface area

$$S_* = 2\pi r^2 + 2\pi r h \approx 25.69.$$

It is worth pointing out that this optimal solution is based on the assumption or requirement to design a cylindrical container. If we decide to use a sphere with radius $R$, we know that its volume and surface area are

$$V_0 = \frac{4\pi}{3} R^3, \qquad S = 4\pi R^2. \qquad (1.21)$$

We can solve for $R$ directly:

$$R^3 = \frac{3V_0}{4\pi}, \quad \text{or} \quad R = \sqrt[3]{\frac{3V_0}{4\pi}}, \qquad (1.22)$$

which gives the surface area

$$S = 4\pi \left(\frac{3V_0}{4\pi}\right)^{2/3} = \frac{4\pi \sqrt[3]{9}}{\sqrt[3]{16\pi^2}}\, V_0^{2/3}. \qquad (1.23)$$

Since $6\pi/\sqrt[3]{4\pi^2} \approx 5.5358$ and $4\pi\sqrt[3]{9}/\sqrt[3]{16\pi^2} \approx 4.83598$, we have $S < S_*$; that is, the surface area of a sphere is smaller than the minimum surface area of a cylinder with the same volume. In fact, for the same $V_0 = 10$, we have

$$S(\text{sphere}) = \frac{4\pi \sqrt[3]{9}}{\sqrt[3]{16\pi^2}}\, V_0^{2/3} \approx 22.45, \qquad (1.24)$$

which is smaller than $S_* \approx 25.69$ for a cylinder.
which is smaller than S∗ = 25.69 for a cylinder.
This highlights the importance of the choice of design type (here in terms of shape) before we can do any truly useful optimization. Obviously, there are many other factors that can influence the choice of design, including the manufacturability of the design, stability of the structure, ease of installation, space availability, and so on. For a container, in most applications, a cylinder may be much easier to produce than a sphere, and thus the overall cost may be lower in practice. Though there are so many factors to be considered in engineering design, for the purpose of optimization here we will only focus on the improvement and optimization of a design with well-posed mathematical formulations.

1.2.2 General formulation of optimization


Whatever the real-world applications may be, it is usually possible to formulate an optimization problem in a generic form [49,53,160]. All optimization problems with explicit objectives can in general be expressed as a nonlinearly constrained optimization problem:

$$\text{maximize/minimize } f(\mathbf{x}), \quad \mathbf{x} = (x_1, x_2, \ldots, x_D)^T \in \mathbb{R}^D,$$
$$\text{subject to } \phi_j(\mathbf{x}) = 0 \ (j = 1, 2, \ldots, M), \qquad \psi_k(\mathbf{x}) \le 0 \ (k = 1, \ldots, N), \qquad (1.25)$$

where $f(\mathbf{x})$, $\phi_j(\mathbf{x})$, and $\psi_k(\mathbf{x})$ are scalar functions of the design vector $\mathbf{x}$. Here the components $x_i$ of $\mathbf{x} = (x_1, \ldots, x_D)^T$ are called design or decision variables, and they can be continuous, discrete, or a mixture of the two. The vector $\mathbf{x}$ is often called the decision vector, which varies in a $D$-dimensional space $\mathbb{R}^D$.

It is worth pointing out that we use a column vector here for $\mathbf{x}$ (thus with transpose $T$). We can also use a row vector $\mathbf{x} = (x_1, \ldots, x_D)$, and the results will be the same. Different textbooks may use slightly different formulations. Once we are aware of such minor variations, they should cause no difficulty or confusion.

In addition, the function $f(\mathbf{x})$ is called the objective function or cost function, $\phi_j(\mathbf{x})$ are constraints in terms of $M$ equalities, and $\psi_k(\mathbf{x})$ are constraints written as $N$ inequalities. So there are $M + N$ constraints in total. The optimization problem formulated here is a nonlinear constrained problem. Here the inequalities $\psi_k(\mathbf{x}) \le 0$ are written as "less than," and they can also be written as "greater than" via a simple transformation by multiplying both sides by $-1$.

The space spanned by the decision variables is called the search space $\mathbb{R}^D$, whereas the space formed by the values of the objective function is called the objective or response space, and sometimes the landscape. The optimization problem essentially maps the domain $\mathbb{R}^D$ of decision variables into the solution space $\mathbb{R}$ (or the real axis in general).

The objective function $f(\mathbf{x})$ can be either linear or nonlinear. If the constraints $\phi_j$ and $\psi_k$ are all linear, it becomes a linearly constrained problem. Furthermore, when $\phi_j$, $\psi_k$, and the objective function $f(\mathbf{x})$ are all linear, it becomes a linear programming problem [35]. If the objective is at most quadratic with linear constraints, then it is called a quadratic programming problem. If all the values of the decision variables can only be integers, then this type of linear programming is called integer programming or integer linear programming.

On the other hand, if no constraints are specified, so that $x_i$ can take any values on the real axis (or any integers), then the optimization problem is referred to as an unconstrained optimization problem.

As a very simple example of optimization problems without any constraints, we discuss the search for the maxima or minima of a univariate function.

Figure 1.2 A simple multimodal function $f(x) = x^2 e^{-x^2}$.

Example 2
For example, to find the maximum of the univariate function

$$f(x) = x^2 e^{-x^2}, \qquad -\infty < x < \infty, \qquad (1.26)$$

is a simple unconstrained problem, whereas the following problem is a simple constrained minimization problem:

$$f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2, \qquad (x_1, x_2) \in \mathbb{R}^2, \qquad (1.27)$$

subject to

$$x_1 \ge 1, \qquad x_2 - 2 = 0. \qquad (1.28)$$
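Problems in the general form of Eq. (1.25) can be handed to off-the-shelf solvers. As a hedged sketch (using SciPy's minimize with the SLSQP method; the starting point is an arbitrary choice of ours), the constrained problem in Example 2 can be solved numerically:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[0] * x[1] + x[1]**2  # objective (1.27)

constraints = [
    {"type": "ineq", "fun": lambda x: x[0] - 1.0},  # x1 >= 1, i.e., x1 - 1 >= 0
    {"type": "eq",   "fun": lambda x: x[1] - 2.0},  # x2 - 2 = 0
]

res = minimize(f, x0=np.array([2.0, 2.0]), method="SLSQP", constraints=constraints)
print(res.x, res.fun)  # expected: x ~ (1, 2) with f = 1 + 2 + 4 = 7
```

Here the boundary point $x_1 = 1$ is optimal because, on the feasible set, $f$ increases with $x_1$ for $x_1 \ge 1$.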

It is worth pointing out that the objectives are explicitly known in all the optimization problems to be discussed in this book. However, in reality, it is often difficult to quantify what we want to achieve, but we still try to optimize certain things, such as the degree of enjoyment or the service quality on holiday. In other cases, it may be impossible to write the objective function in any explicit mathematical form.

From basic calculus we know that, for a given curve described by $f(x)$, its gradient $f'(x)$ describes the rate of change. When $f'(x) = 0$, the curve has a horizontal tangent at that particular point. This means that it becomes a point of special interest. In fact, the maximum or minimum of a curve occurs at

$$f'(x_*) = 0, \qquad (1.29)$$

which is a critical condition or stationary condition. The solution $x_*$ to this equation corresponds to a stationary point, and there may be multiple stationary points for a given curve.

To see if it is a maximum or minimum at $x = x_*$, we have to use the information of its second derivative $f''(x)$. In fact, $f''(x_*) > 0$ corresponds to a minimum, whereas $f''(x_*) < 0$ corresponds to a maximum. Let us see a concrete example.

Example 3
To find the minimum of $f(x) = x^2 e^{-x^2}$ (see Fig. 1.2), we have the stationary condition $f'(x) = 0$, or

$$f'(x) = 2x\, e^{-x^2} + x^2 (-2x) e^{-x^2} = 2(x - x^3) e^{-x^2} = 0.$$

Figure 1.3 (a) Feasible domain with nonlinear inequality constraints $\psi_1(x)$ and $\psi_2(x)$ and a linear inequality constraint $\psi_3(x)$ (left). (b) An example with an objective $f(x) = x^2$ subject to $x \ge 2$ (right).

As $e^{-x^2} > 0$, we have

$$x(1 - x^2) = 0, \quad \text{or} \quad x = 0 \ \text{and} \ x = \pm 1.$$

The second derivative is given by

$$f''(x) = 2e^{-x^2}(1 - 5x^2 + 2x^4),$$

which is an even function with respect to $x$.

So at $x = \pm 1$, $f''(\pm 1) = 2[1 - 5(\pm 1)^2 + 2(\pm 1)^4] e^{-(\pm 1)^2} = -4e^{-1} < 0$. Thus, there are two maxima that occur at $x_* = \pm 1$ with $f_{\max} = e^{-1}$. At $x = 0$, we have $f''(0) = 2 > 0$, thus the minimum of $f(x)$ occurs at $x_* = 0$ with $f_{\min}(0) = 0$.
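A symbolic check of this example is straightforward with SymPy. This is a small sketch (our own illustration, not from the book):

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**2 * sp.exp(-x**2)

fp = sp.diff(f, x)                 # f'(x)
stationary = sp.solve(sp.Eq(fp, 0), x)
print(stationary)                  # [-1, 0, 1]

fpp = sp.diff(f, x, 2)             # f''(x)
for s in stationary:
    # sign of f'' tells maximum (< 0) or minimum (> 0)
    print(s, sp.simplify(fpp.subs(x, s)))
```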

Whatever the objective is, we have to evaluate it many times. In most cases, the
evaluations of the objective functions consume a substantial amount of computational
power (which costs money) and design time. Any efficient algorithm that can reduce
the number of objective evaluations saves both time and money.
In mathematical programming, there are many important concepts, and we will
first introduce a few related concepts: feasible solutions, optimality criteria, the strong
local optimum, and weak local optimum.

1.2.3 Feasible solution


A point x that satisfies all the constraints is called a feasible point and thus is a feasible
solution to the problem. The set of all feasible points is called the feasible region (see
Fig. 1.3).
For example, we know that the domain of $f(x) = x^2$ consists of all real numbers. If we want to minimize $f(x)$ without any constraint, all solutions such as $x = -1$, $x = 1$, and $x = 0$ are feasible. In fact, the feasible region is the whole real axis. Obviously, $x = 0$ corresponds to $f(0) = 0$ as the true minimum.

However, if we want to find the minimum of $f(x) = x^2$ subject to $x \ge 2$, then it becomes a constrained optimization problem. The points $x = 1$ and $x = 0$ are no longer feasible because they do not satisfy $x \ge 2$. In this case, the feasible solutions are all the points that satisfy $x \ge 2$. So $x = 2$, $x = 100$, and $x = 10^8$ are all feasible. It is obvious that the minimum occurs at $x = 2$ with $f(2) = 2^2 = 4$; that is, the optimal solution for this problem occurs at the boundary point $x = 2$ (see Fig. 1.3).

Figure 1.4 Local optima, weak optima, and global optimality.

1.2.4 Optimality criteria


A point $x_*$ is called a strong local maximum of the nonlinearly constrained optimization problem if $f(x)$ is defined in a $\delta$-neighborhood $N(x_*, \delta)$ and satisfies $f(x_*) > f(u)$ for all $u \in N(x_*, \delta)$, where $\delta > 0$ and $u \neq x_*$. If $x_*$ is not a strong local maximum, then the inclusion of equality in the condition $f(x_*) \ge f(u)$ for all $u \in N(x_*, \delta)$ defines the point $x_*$ as a weak local maximum (see Fig. 1.4). The local minima can be defined in a similar manner when $>$ and $\ge$ are replaced by $<$ and $\le$, respectively.

Fig. 1.4 shows various local maxima and minima. Point A is a strong local maximum, whereas point B is a weak local maximum because there are many (in fact, infinitely many) different values of $x$ that lead to the same value of $f(x_*)$. Point D is the global maximum, and point E is the global minimum. In addition, point F is a strong local minimum. Point C is also a strong local minimum, but it has a discontinuity in $f'(x_*)$, so the stationary condition $f'(x_*) = 0$ is not valid at this point. We will not deal with these types of minima or maxima in detail.

As we briefly mentioned before, for a smooth curve $f(x)$, optimal solutions usually occur at stationary points where $f'(x) = 0$. This is not always the case because optimal solutions can also occur at the boundary, as we have seen in the previous example of minimizing $f(x) = x^2$ subject to $x \ge 2$. In our present discussion, we will assume that both $f(x)$ and $f'(x)$ are always continuous, or that $f(x)$ is everywhere twice continuously differentiable. Obviously, the information of $f'(x)$ is not sufficient to determine whether a stationary point is a local maximum or minimum; higher-order derivatives such as $f''(x)$ are needed, but we do not make any assumption at this stage. We will further discuss this in detail in the next section.

1.3 Unconstrained optimization

Optimization problems can be classified as either unconstrained or constrained. Unconstrained optimization problems can in turn be subdivided into univariate and multivariate problems.

1.3.1 Univariate functions


The simplest optimization problem without any constraints is probably the search for
the maxima or minima of a univariate function f (x). For unconstrained optimization
problems, the optimality occurs at the critical points given by the stationary condition
f  (x) = 0.
However, this stationary condition is just a necessary condition, but it is not a suf-
ficient condition. If f  (x∗ ) = 0 and f  (x∗ ) > 0, it is a local minimum. Conversely, if
f  (x∗ ) = 0 and f  (x∗ ) < 0, then it is a local maximum. However, if f  (x∗ ) = 0 and
f  (x∗ ) = 0, care should be taken because f  (x) may be indefinite (both positive and
negative) when x → x∗ , then x∗ corresponds to a saddle point.
For example, for f (x) = x 3 , we have

f  (x) = 3x 2 , f  (x) = 6x. (1.30)

The stationary condition f  (x) = 3x 2 = 0 gives x∗ = 0. However, we also have

f  (x∗ ) = f  (0) = 0.

In fact, f (x) = x 3 has a saddle point x∗ = 0 because f  (0) = 0 but f  changes sign
from f  (0+) > 0 to f  (0−) < 0 as x moves from positive to negative.

Example 4
For example, to find the maximum or minimum of the univariate function

$$f(x) = 3x^4 - 4x^3 - 12x^2 + 9, \qquad -\infty < x < \infty,$$

we first have to find its stationary points $x_*$ where the first derivative $f'(x)$ is zero, that is,

$$f'(x) = 12x^3 - 12x^2 - 24x = 12(x^3 - x^2 - 2x) = 0.$$

Since $f'(x) = 12(x^3 - x^2 - 2x) = 12x(x + 1)(x - 2) = 0$, we have

$$x_* = -1, \qquad x_* = 2, \qquad x_* = 0.$$

The second derivative of $f(x)$ is simply

$$f''(x) = 36x^2 - 24x - 24.$$

From basic calculus we know that the maximum requires $f''(x_*) \le 0$, whereas the minimum requires $f''(x_*) \ge 0$.

At $x_* = -1$, we have

$$f''(-1) = 36(-1)^2 - 24(-1) - 24 = 36 > 0,$$

so this point corresponds to a local minimum

$$f(-1) = 3(-1)^4 - 4(-1)^3 - 12(-1)^2 + 9 = 4.$$



Similarly, at $x_* = 2$, $f''(x_*) = 72 > 0$, and thus we have another local minimum

$$f(x_*) = -23.$$

However, at $x_* = 0$, we have $f''(0) = -24 < 0$, which corresponds to a local maximum $f(0) = 9$. However, this maximum is not a global maximum because the global maxima of $f(x)$ occur at $x = \pm\infty$.

The global minimum occurs at $x_* = 2$ with $f(2) = -23$.
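Since $f'(x)$ here is a cubic polynomial, the stationary points can also be verified numerically. A brief sketch (our own illustration) using NumPy's polynomial root finder:

```python
import numpy as np

# f'(x) = 12x^3 - 12x^2 - 24x, coefficients in descending powers
stationary = np.sort(np.roots([12.0, -12.0, -24.0, 0.0]))
print(stationary)  # [-1.  0.  2.]

f  = lambda x: 3*x**4 - 4*x**3 - 12*x**2 + 9
d2 = lambda x: 36*x**2 - 24*x - 24  # f''(x)
for s in stationary:
    kind = "minimum" if d2(s) > 0 else "maximum"
    print(f"x* = {s: .0f}: f = {f(s): .0f} ({kind})")  # minima at -1 and 2, maximum at 0
```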

The maximization of a function $f(x)$ can be converted into the minimization of $A - f(x)$, where $A$ is usually a large positive number (though $A = 0$ will also do). For example, we know that the maximum of $f(x) = e^{-x^2}$, $x \in (-\infty, \infty)$, is 1 at $x_* = 0$. This problem can be converted to the minimization of $-f(x)$. For this reason, optimization problems can be expressed as either minimization or maximization, depending on the context and the convenience of the formulation.

In fact, in the optimization literature, some books formulate all optimization problems in terms of maximization, whereas others write them in terms of minimization, though they are in essence dealing with the same problems.

1.3.2 Multivariate functions


We can extend the optimization procedure for univariate functions to multivariate functions using partial derivatives and relevant conditions. Let us start with the example

$$\text{minimize } f(x, y) = x^2 + y^2, \qquad x, y \in \mathbb{R}. \qquad (1.31)$$

It is obvious that $x = 0$ and $y = 0$ is a minimum solution because $f(0, 0) = 0$. The question is how to solve this problem formally. We can extend the stationary condition to partial derivatives: $\partial f/\partial x = 0$ and $\partial f/\partial y = 0$. In this case, we have

$$\frac{\partial f}{\partial x} = 2x + 0 = 0, \qquad \frac{\partial f}{\partial y} = 0 + 2y = 0. \qquad (1.32)$$

The solution is obviously $x_* = 0$ and $y_* = 0$.


Now how do we know whether it corresponds to a maximum or a minimum? If we try to use the second derivatives, we have four different partial derivatives such as $f_{xx}$ and $f_{yy}$, and which one should we use? In fact, we need to define the Hessian matrix from these second partial derivatives:

$$H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}. \qquad (1.33)$$

Since

$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}, \qquad (1.34)$$

we can conclude that the Hessian matrix is always symmetric. In the case of $f(x, y) = x^2 + y^2$, it is easy to check that the Hessian matrix is

$$H = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}. \qquad (1.35)$$

Mathematically speaking, if H is positive definite, then the stationary point (x∗ , y∗ )


corresponds to a local minimum. Similarly, if H is negative definite, then the sta-
tionary point corresponds to a maximum. The definiteness of a symmetric matrix is
controlled by its eigenvalues. For this simple diagonal matrix H , its eigenvalues are
its two diagonal entries 2 and 2. As both eigenvalues are positive, this matrix is pos-
itive definite. Since the Hessian matrix here does not involve any x or y, it is always
positive definite in the whole search domain (x, y) ∈ R2 , so we can conclude that the
solution at point (0, 0) is the global minimum.
Obviously, this is a particular case. In general, the Hessian matrix depends on the
independent variables, but the definiteness test conditions still apply. That is, positive
definiteness of a stationary point means a local minimum. Alternatively, for bivariate
functions, we can define the determinant of the Hessian matrix in Eq. (1.33) as

$$\Delta = \det(H) = f_{xx} f_{yy} - (f_{xy})^2. \qquad (1.36)$$

At the stationary point $(x_*, y_*)$, if $\Delta > 0$ and $f_{xx} > 0$, then $(x_*, y_*)$ is a local minimum. If $\Delta > 0$ but $f_{xx} < 0$, then it is a local maximum. If $\Delta = 0$, then the test is inconclusive, and we have to use other information such as higher-order derivatives. However, if $\Delta < 0$, then it is a saddle point. A saddle point is a special point where a local minimum occurs along one direction, whereas a maximum occurs along another (orthogonal) direction.

Example 5
To minimize $f(x, y) = (x - 1)^2 + x^2 y^2$, we have

$$\frac{\partial f}{\partial x} = 2(x - 1) + 2x y^2 = 0, \qquad \frac{\partial f}{\partial y} = 0 + 2x^2 y = 0. \qquad (1.37)$$

The second condition gives $y = 0$ or $x = 0$. Substituting $y = 0$ into the first condition, we have $x = 1$. However, $x = 0$ does not satisfy the first condition. Therefore, we have a solution $x_* = 1$ and $y_* = 0$.

For our example with $f = (x - 1)^2 + x^2 y^2$, we have

$$\frac{\partial^2 f}{\partial x^2} = 2y^2 + 2, \qquad \frac{\partial^2 f}{\partial x \partial y} = 4xy, \qquad \frac{\partial^2 f}{\partial y \partial x} = 4xy, \qquad \frac{\partial^2 f}{\partial y^2} = 2x^2, \qquad (1.38)$$

and thus we have

$$H = \begin{pmatrix} 2y^2 + 2 & 4xy \\ 4xy & 2x^2 \end{pmatrix}. \qquad (1.39)$$

At the stationary point $(x_*, y_*) = (1, 0)$, the Hessian matrix becomes

$$H = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix},$$

which is positive definite because its double eigenvalue 2 is positive. Alternatively, we have $\Delta = 4 > 0$ and $f_{xx} = 2 > 0$. Therefore, $(1, 0)$ is a local minimum.
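The definiteness test is easy to automate. Here is a short sketch (our own illustration) that evaluates the Hessian of Example 5 at the stationary point and checks its eigenvalues with NumPy:

```python
import numpy as np

def hessian(x, y):
    """Hessian of f(x, y) = (x-1)^2 + x^2 y^2, from Eq. (1.39)."""
    return np.array([[2*y**2 + 2, 4*x*y],
                     [4*x*y,      2*x**2]])

H = hessian(1.0, 0.0)
eigvals = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix
print(eigvals)                    # [2. 2.]
print("local minimum" if np.all(eigvals > 0) else "not a minimum")
```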

In fact, for a multivariate function $f(x_1, x_2, \ldots, x_n)$ in an $n$-dimensional space, the stationary condition can be extended to

$$G = \nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right)^T = 0, \qquad (1.40)$$

where $G$ is called the gradient vector. The second derivative test becomes the definiteness of the Hessian matrix

$$H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}. \qquad (1.41)$$

At a stationary point defined by $G = \nabla f = 0$, positive definiteness of $H$ gives a local minimum, whereas negative definiteness corresponds to a local maximum. In essence, the eigenvalues of the Hessian matrix $H$ determine the local behavior of the function: as we mentioned before, if $H$ is positive definite, then the stationary point corresponds to a local minimum.

1.4 Nonlinear constrained optimization


As most real-world problems are nonlinear, nonlinear mathematical programming forms an important part of mathematical optimization methods. A broad class of nonlinear programming problems concerns the minimization or maximization of $f(\mathbf{x})$ subject to no constraints, and another important class is the minimization of a quadratic objective function subject to nonlinear constraints. There are many other nonlinear programming problems as well.

Nonlinear programming problems are often classified according to the convexity of the defining functions. An interesting property of a convex function $f$ is that the vanishing of the gradient, $\nabla f(\mathbf{x}_*) = 0$, guarantees that the point $\mathbf{x}_*$ is a global minimum or maximum of $f$. We will introduce the concept of convexity in the next chapter. If a function is not convex or concave, then it is much more difficult to find its global minima or maxima.

1.4.1 Penalty method


For simple function optimization with equality and inequality constraints, a common method is the penalty method. For the optimization problem

$$\text{minimize } f(\mathbf{x}), \quad \mathbf{x} = (x_1, \ldots, x_n)^T \in \mathbb{R}^n,$$
$$\text{subject to } \phi_i(\mathbf{x}) = 0 \ (i = 1, \ldots, M), \qquad \psi_j(\mathbf{x}) \le 0 \ (j = 1, \ldots, N), \qquad (1.42)$$

the idea is to define a penalty function so that the constrained problem is transformed into an unconstrained problem. Now we define

$$\Pi(\mathbf{x}, \mu_i, \nu_j) = f(\mathbf{x}) + \sum_{i=1}^{M} \mu_i \phi_i^2(\mathbf{x}) + \sum_{j=1}^{N} \nu_j \max\{0, \psi_j(\mathbf{x})\}^2, \qquad (1.43)$$

where $\mu_i \gg 1$ and $\nu_j \ge 0$.
For example, let us solve the following minimization problem:

$$\text{minimize } f(x) = 40(x - 1)^2, \quad x \in \mathbb{R}, \quad \text{subject to } g(x) = x - a \ge 0, \qquad (1.44)$$

where $a$ is a given value. Obviously, without this constraint, the minimum value occurs at $x = 1$ with $f_{\min} = 0$. If $a < 1$, then the constraint will not affect the result. However, if $a > 1$, then the minimum should occur at the boundary $x = a$ (which can be obtained by inspecting or visualizing the objective function and the constraint). Now we can define a penalty function $\Pi(x)$ using a penalty parameter $\mu \gg 1$. We have

$$\Pi(x, \mu) = f(x) + \mu [g(x)]^2 = 40(x - 1)^2 + \mu (x - a)^2, \qquad (1.45)$$

which converts the original constrained optimization problem into an unconstrained problem. From the stationarity condition $\Pi'(x) = 0$ we have

$$80(x - 1) + 2\mu(x - a) = 0, \quad \text{or} \quad x_* = \frac{40 + \mu a}{40 + \mu}. \qquad (1.46)$$

For the particular case $a = 1$, we have $x_* = 1$, and the result does not depend on $\mu$. However, in the case of $a > 1$ (say, $a = 5$), the result will depend on $\mu$. When $a = 5$ and $\mu = 100$, we have $x_* = (40 + 100 \times 5)/(40 + 100) \approx 3.8571$. If $\mu = 1000$, then this gives $x_* = (40 + 1000 \times 5)/(40 + 1000) \approx 4.8462$. Both values are still some way from the exact solution $x_{\text{true}} = a = 5$. If we use $\mu = 10^4$, then we have $x_* \approx 4.9841$. Similarly, for $\mu = 10^5$, we have $x_* \approx 4.9984$. This clearly demonstrates that the solution in general depends on $\mu$. However, it is very difficult to use extremely large values of $\mu$ without causing extra computational difficulties.
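To see the effect of $\mu$ numerically, here is a small sketch (our own illustration) that minimizes the penalized objective of Eq. (1.45) with SciPy for increasing penalty parameters:

```python
from scipy.optimize import minimize_scalar

a = 5.0
for mu in [1e2, 1e3, 1e4, 1e5]:
    pi = lambda x: 40.0 * (x - 1.0)**2 + mu * (x - a)**2  # Eq. (1.45)
    res = minimize_scalar(pi)
    print(f"mu = {mu:.0e}: x* = {res.x:.4f}")
# x* approaches the true constrained optimum x = a = 5 as mu grows
```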
Another random document with
no related content on Scribd:
purity and abstinence of his style, and we who speak it, for having
emboldened us to trust ourselves to take delight in simple things,
and to trust ourselves to our own instincts. And he hath his reward. It
needs not to

Bid Beaumont lie


A little farther off to make him room,

for there is no fear of crowding in that little society with whom he is


now enrolled as the fifth in the succession of the great English poets.
LECTURE XII
THE FUNCTION OF THE POET

(Friday Evening, February 16, 1855)

XII
Whether, as some philosophers here assume, we possess only the
fragments of a great cycle of knowledge, in whose center stood the
primeval man in friendly relation with the powers of the universe, and
build our hovels out of the ruins of our ancestral palace; or whether,
according to the developing theory of others, we are rising gradually
and have come up from an atom instead of descending from an
Adam, so that the proudest pedigree might run up to a barnacle or a
zoöphyte at last, are questions which will keep for a good many
centuries yet. Confining myself to what little we can learn from
History, we find tribes rising slowly out of barbarism to a higher or
lower point of culture and civility, and everywhere the poet also is
found under one name or another, changing in certain outward
respects, but essentially the same.

But however far we go back, we shall find this also—that the poet
and the priest were united originally in the same person: which
means that the poet was he who was conscious of the world of spirit
as well as that of sense, and was the ambassador of the gods to
men. This was his highest function, and hence his name of seer.

I suppose the word epic originally meant nothing more than this, that
the poet was the person who was the greatest master of speech. His
were the ἔπεα πτερόεντα, the true winged words that could fly down
the unexplored future and carry thither the names of ancestral
heroes, of the brave, and wise, and good. It was thus that the poet
could reward virtue, and, by and by, as society grew more complex,
could burn in the brand of shame. This is Homer’s character of
Demodocus in the eighth book of the “Odyssey,”

When the Muse loved and gave the good and ill,

the gift of conferring good or evil immortality.

The first histories were in verse, and, sung as they were at the feasts
and gatherings of the people, they awoke in men the desire of fame,
which is the first promoter of courage and self-trust, because it
teaches men by degrees to appeal from the present to the future.
We may fancy what the influence of the early epics was when they
were recited to men who claimed the heroes celebrated in them for
their ancestors, by what Bouchardon, the sculptor, said only two
centuries ago: “When I read Homer I feel as if I were twenty feet
high.”

Nor have poets lost their power over the future in modern times.
Dante lifts up by the hair the face of some petty traitor, the Smith and
Brown of some provincial Italian town, lets the fire of his Inferno glare
upon it for a moment, and it is printed forever on the memory of
mankind. The historians may iron out the shoulders of Richard III. as
smooth as they can; they will never get over the wrench that
Shakspeare gave them.

The peculiarity of almost all early literature is that it seems to have a


double meaning; that underneath its natural we find ourselves
continually seeing and suspecting a supernatural meaning. Even in
the older epics the characters seem to be only half-historical and
half-typical. They appear as the Pilgrim Fathers do in Twenty-second
of December speeches at Plymouth. The names may be historical,
but the attributes are ideal. The orator draws a portrait rather of what
he thinks the founders ought to have been than a likeness which
contemporaries would have recognized. Thus did early poets
endeavor to make reality out of appearances. For, except a few
typical men in whom certain ideas get embodied, the generations of
mankind are mere apparitions who come out of the dark for a
purposeless moment, and enter the dark again after they have
performed the nothing they came for.

The poet’s gift, then, is that of seer. He it is that discovers the truth
as it exists in types and images; that is the spiritual meaning, which
abides forever under the sensual. And his instinct is to express
himself also in types and images. But it was not only necessary that
he himself should be delighted with his vision, but that he should
interest his hearers with the faculty divine. Pure truth is not
acceptable to the mental palate. It must be diluted with character and
incident; it must be humanized in order to be attractive. If the bones
of a mastodon be exhumed, a crowd will gather out of curiosity; but
let the skeleton of a man be turned up, and what a difference in the
expression of the features! Every bystander then creates his little
drama, in which those whitened bones take flesh upon them and
stalk as chief actor.

The poet is he who can best see or best say what is ideal; what
belongs to the world of soul and of beauty. Whether he celebrates
the brave and good, or the gods, or the beautiful as it appears in
man or nature, something of a religious character still clings to him.
He may be unconscious of his mission; he may be false to it, but in
proportion as he is a great poet, he rises to the level of it more often.
He does not always directly rebuke what is bad or base, but
indirectly, by making us feel what delight there is in the good and fair.
If he besiege evil it is with such beautiful engines of war (as Plutarch
tells us of Demetrius) that the besieged themselves are charmed
with them. Whoever reads the great poets cannot but be made better
by it, for they always introduce him to a higher society, to a greater
style of manners and of thinking. Whoever learns to love what is
beautiful is made incapable of the mean and low and bad. It is
something to be thought of, that all the great poets have been good
men. He who translates the divine into the vulgar, the spiritual into
the sensual, is the reverse of a poet.
It seems to be thought that we have come upon the earth too late;
that there has been a feast of the imagination formerly, and all that is
left for us is to steal the scraps. We hear that there is no poetry in
railroads, steamboats, and telegraphs, and especially in Brother
Jonathan. If this be true, so much the worse for him. But, because he
is a materialist, shall there be no poets? When we have said that we
live in a materialistic age, we have said something which meant
more than we intended. If we say it in the way of blame, we have
said a foolish thing, for probably one age is as good as another; and,
at any rate, the worst is good enough company for us. The age of
Shakspeare seems richer than our own only because he was lucky
enough to have such a pair of eyes as his to see it and such a gift as
his to report it. Shakspeare did not sit down and cry for the water of
Helicon to turn the wheels of his little private mill there at the
Bankside. He appears to have gone more quietly about his business
than any playwright in London; to have drawn off what water-power
he wanted from the great prosy current of affairs that flows alike for
all, and in spite of all; to have ground for the public what grist they
want, coarse or fine; and it seems a mere piece of luck that the
smooth stream of his activity reflected with ravishing clearness every
changing mood of heaven and earth, every stick and stone, every
dog and clown and courtier that stood upon its brink. It is a curious
illustration of the friendly manner in which Shakspeare received
everything that came along, of what a present man he was, that in
the very same year that the mulberry tree was brought into England,
he got one and planted it in his garden at Stratford.

It is perfectly true that this is a materialistic age, and for this very
reason we want our poets all the more. We find that every
generation contrives to catch its singing larks without the sky’s
falling. When the poet comes he always turns out to be the man who
discovers that the passing moment is the inspired one, and that the
secret of poetry is not to have lived in Homer’s day or Dante’s, but to
be alive now. To be alive now, that is the great art and mystery. They
are dead men who live in the past, and men yet unborn who live in
the future. We are like Hans-in-Luck, forever exchanging the
burthensome good we have for something else, till at last we come
home empty-handed. The people who find their own age prosaic are
those who see only its costume. And this is what makes it prosaic:
that we have not faith enough in ourselves to think that our own
clothes are good enough to be presented to Posterity in. The artists
seem to think that the court dress of posterity is that of Vandyke’s
time or Cæsar’s. I have seen the model of a statue of Sir Robert
Peel—a statesman whose merit consisted in yielding gracefully to
the present—in which the sculptor had done his best to travesty the
real man into a make-believe Roman. At the period when England
produced its greatest poets, we find exactly the reverse of this, and
we are thankful to the man who made the monument of Lord Bacon
that he had genius enough to copy every button of his dress,
everything down to the rosettes on his shoes. These men had faith
even in their own shoe-strings. Till Dante’s time the Italian poets
thought no language good enough to put their nothings into but Latin
(and, indeed, a dead tongue was the best for dead thoughts), but
Dante found the common speech of Florence, in which men
bargained, and scolded, and made love, good enough for him, and
out of the world around him made such a poem as no Roman ever
sang.

We cannot get rid of our wonder, we who have brought down the
wild lightning from writing fiery doom upon the walls of heaven to be
our errand-boy and penny postman. In this day of newspapers and
electric telegraphs, in which common-sense and ridicule can
magnetise a whole continent between dinner and tea, we may say
that such a phenomenon as Mahomet were impossible; and behold
Joe Smith and the State of Deseret! Turning over the yellow leaves
of the same copy of Webster on “Witchcraft” which Cotton Mather
studied, I thought, Well, that goblin is laid at last! And while I mused,
the tables were dancing and the chairs beating the devil’s tattoo all
over Christendom. I have a neighbor who dug down through tough
strata of clay-slate to a spring pointed out by a witch-hazel rod in the
hands of a seventh son’s seventh son, and the water is sweeter to
him for the wonder that is mixed with it. After all, it seems that our
scientific gas, be it never so brilliant, is not equal to the dingy old
Aladdin’s lamp.
It is impossible for men to live in the world without poetry of some
sort or another. If they cannot get the best, they will get at some
substitute for it. But there is as much poetry as ever in the world if we
can ever know how to find it out; and as much imagination, perhaps,
only that it takes a more prosaic direction. Every man who meets
with misfortune, who is stripped of his material prosperity, finds that
he has a little outlying mountain-farm of imagination, which does not
appear in the schedule of his effects, on which his spirit is able to
keep alive, though he never thought of it while he was fortunate. Job
turns out to be a great poet as soon as his flocks and herds are
taken away from him.

Perhaps our continent will begin to sing by and by, as the others
have done. We have had the Practical forced upon us by our
condition. We have had a whole hemisphere to clear up and put to
rights. And we are descended from men who were hardened and
stiffened by a downright wrestle with Necessity. There was no
chance for poetry among the Puritans. And yet if any people have a
right to imagination, it should be the descendants of those very
Puritans. They had enough of it, or they could not have conceived
the great epic they did, whose books are States, and which is written
on this continent from Maine to California.

John Quincy Adams, making a speech at New Bedford many years
ago, reckoned the number of whale ships (if I remember rightly) that
sailed out of that port, and, comparing it with some former period,
took it as a type of American success. But, alas! it is with quite other
oil that those far-shining lamps of a nation’s true glory which burn
forever must be filled. It is not by any amount of material splendor or
prosperity, but only by moral greatness, by ideas, by works of the
imagination, that a race can conquer the future. No voice comes to
us from the once mighty Assyria but the hoot of the owl that nests
amid her crumbling palaces; of Carthage, whose merchant fleets
once furled their sails in every port of the known world, nothing is left
but the deeds of Hannibal. She lies dead on the shore of her once
subject sea, and the wind of the desert flings its handfuls of burial-
sand upon her corpse. A fog can blot Holland or Switzerland out of
existence. But how large is the space occupied in the maps of the
soul by little Athens or powerless Italy. They were great by the soul,
and their vital force is as indestructible as the soul.

Till America has learned to love Art, not as an amusement, not as a
mere ornament of her cities, not as a superstition of what is comme il
faut for a great nation, but for its harmonizing and ennobling energy,
for its power of making men better by arousing in them the
perception of their own instincts for what is beautiful and sacred and
religious, and an eternal rebuke of the base and worldly, she will not
have succeeded in that high sense which alone makes a nation out
of a people, and raises it from a dead name to a living power. Were
our little mother-island sunk beneath the sea; or worse, were she
conquered by Scythian barbarians, yet Shakspeare would be an
immortal England, and would conquer countries when the bones of
her last sailor had kept their ghastly watch for ages in unhallowed
ooze beside the quenched thunders of her navy.

This lesson I learn from the past: that grace and goodness, the fair,
the noble, and the true will never cease out of the world till the God
from whom they emanate ceases out of it; that the sacred duty and
noble office of the poet is to reveal and justify them to man; that as
long as the soul endures, endures also the theme of new and
unexampled song; that while there is grace in grace, love in love,
and beauty in beauty, God will still send poets to find them, and bear
witness of them, and to hang their ideal portraitures in the gallery of
memory. God with us is forever the mystical theme of the hour that is
passing. The lives of the great poets teach us that they were men of
their generation who felt most deeply the meaning of the Present.

I have been more painfully conscious than any one else could be of
the inadequacy of what I have been able to say, when compared to
the richness and variety of my theme. I shall endeavor to make my
apology in verse, and will bid you farewell in a little poem in which I
have endeavored to express the futility of all effort to speak the
loveliness of things, and also my theory of where the Muse is to be
found, if ever. It is to her that I sing my hymn.
Mr. Lowell here read an original poem of considerable length, which
concluded the lecture, and was received with bursts of applause.
*** END OF THE PROJECT GUTENBERG EBOOK LECTURES ON
ENGLISH POETS ***

Updated editions will replace the previous one—the old editions
will be renamed.

Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States copyright
in these works, so the Foundation (and you!) can copy and
distribute it in the United States without permission and without
paying copyright royalties. Special rules, set forth in the General
Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE

THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the
free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and Redistributing Project
Gutenberg™ electronic works

1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree to
abide by all the terms of this agreement, you must cease using
and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only
be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project
Gutenberg™ works in compliance with the terms of this
agreement for keeping the Project Gutenberg™ name
associated with the work. You can easily comply with the terms
of this agreement by keeping this work in the same format with
its attached full Project Gutenberg™ License when you share it
without charge with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project
Gutenberg:

1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it
away or re-use it under the terms of the Project Gutenberg
License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country where
you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is
derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of the
copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is
posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.
1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must, at
no additional cost, fee or expense to the user, provide a copy, a
means of exporting a copy, or a means of obtaining a copy upon
request, of the work in its original “Plain Vanilla ASCII” or other
form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive from
the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt that
s/he does not agree to the terms of the full Project Gutenberg™
License. You must require such a user to return or destroy all
copies of the works possessed in a physical medium and
discontinue all use of and all access to other copies of Project
Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite
these efforts, Project Gutenberg™ electronic works, and the
medium on which they may be stored, may contain “Defects,”
such as, but not limited to, incomplete, inaccurate or corrupt
data, transcription errors, a copyright or other intellectual
property infringement, a defective or damaged disk or other
medium, a computer virus, or computer codes that damage or
cannot be read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES -
Except for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU
AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE,
STRICT LIABILITY, BREACH OF WARRANTY OR BREACH
OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE
TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER
THIS AGREEMENT WILL NOT BE LIABLE TO YOU FOR
ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE
OR INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF
THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If
you discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person or
entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you do
or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.

Section 2. Information about the Mission of Project Gutenberg™

Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.

Section 3. Information about the Project Gutenberg Literary
Archive Foundation

The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status by
the Internal Revenue Service. The Foundation’s EIN or federal
tax identification number is 64-6221541. Contributions to the
Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.

The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact

Section 4. Information about Donations to the Project Gutenberg
Literary Archive Foundation

Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or
determine the status of compliance for any particular state visit
www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.

Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™
electronic works

Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network
of volunteer support.
