Chapter 1


Introduction

1.1 Naturally Learned Ability for Problem Solving

We are constantly dealing with all kinds of problems every day, and would
like to solve these problems for timely decisions and actions. We may notice
that for many of these daily-life problems, our decisions are often made spontaneously and swiftly, without much conscious thought. This is because we have been constantly learning to solve such problems since we were born, and therefore the solutions have already been encoded in the neuron cells of our brain. When facing similar problems, our decision is spontaneous.
For many complicated problems, especially in science and engineering, one would need to think harder and even conduct extensive research and study on the related issues before providing a solution. What if we want to give spontaneous, reliable solutions to these types of problems as well? Some scientists and engineers may be able to do this for some problems, but not many. Those scientists are intensively trained or educated in specially designed courses for dealing with complicated problems.
What if a layman would also like to be able to solve these challenging types of problems? One way is to go through a special learning process. The alternative may be through machine learning: to develop a special computer model with a mechanism that can be trained to extract features from experience or data, to provide a reliable and instantaneous solution for a type of problem.

1.2 Physics-Law-based Models

Problems in science and engineering are usually much more difficult to solve.
This is because we humans can only experience or observe the phenomena associated with the problem. However, many phenomena are not easily
observable and have very complicated underlying logic. Scientists have been
trying to unveil the underlying logic by developing some theories (or laws or
principles) that can help to best describe these phenomena. These theories
are then formulated in the form of algebraic, differential, or integral system
equations that govern the key variables involved in the phenomena. The
next step is then to find a method that can solve these equations for these
variables varying in space and with time. The final step is to find a way
to validate the theory by observation and/or experiments to measure the
values of these variables. The validated theory is used to build models to
solve problems that exhibit the same phenomena. This type of model is called a physics-law-based model.


The above-mentioned process is essentially what humans on earth have
been doing in trying to understand nature, and we have made tremendous
progress so far. In this process, we have established a huge number of areas of study, such as physics, mathematics, and biology, which are now referred to as the sciences.
Understanding nature is only a part of the story. Humans want to invent and build new things. A good understanding of various phenomena enables us to do so, and we have practically built everything around us: buildings, bridges, airplanes, space stations, cars, ships, computers, cell phones, the internet, communication systems, and energy systems. Such a list is endless. In this process, we humans established a huge number of areas of development, which are now referred to as engineering.
Understanding biology helped us to discover medicines, treatments for
illnesses of humans and animals, treatments for plants and the environment,
as well as proper measures and policies dealing with the relationships
between humans, animals, plants, and environments. In this process, we
humans established a huge number of areas of study, including medicine, agriculture, and ecology.
In the relentless quest by humans in history, countless theories, laws,
techniques, methods, etc., have been developed in various areas of science,
engineering, and biology. For example, in the study of a small area of compu-
tational mechanics for designing structural systems, we have developed the
finite element method (FEM) [1], smoothed finite element method (S-FEM)
[2], meshfree methods [3, 4], inverse techniques [5], etc., just to name a few
that the author has been working on. It is neither possible nor necessary to list all of these kinds of methods and techniques. Our discussion here is just to provide an overall view of how a problem can be solved based on physics laws.

Note that there are many problems in nature, engineering, and society for which it is difficult to find proper physics laws to describe and solve them accurately and effectively. Alternative means are thus needed.

1.3 Machine Learning Models, Data-based

There is a large class of complicated problems (in science, engineering, biology, and daily life) that do not yet have known governing physics laws, or for which the solutions to the governing equations are too expensive to obtain. For this type of problem, on the other hand, we often have some data obtained and accumulated through observations, measurements, or historic records. When the data are sufficiently large and of good quality, it is possible to develop computer models to learn from these data. Such a model can then be used to find a solution for this type of problem. This kind of computer model is defined as a data-based model or machine learning model in this book.
Different types of effective artificial Neural Networks (NNs) with various
configurations have been developed and widely used for practical problems
in sciences and engineering, including multilayer perceptron (MLP) [6–9],
Convolutional Neural Networks (CNNs) [10–14], and Recurrent Neural
Networks (RNNs) [15–17]. TrumpetNets [8] and TubeNets [9, 18–20] were also recently proposed by the author for creating two-way deepnets using physics-law-based models as trainers, such as the FEM [1] and S-FEM [2]. The unique feature of TrumpetNets and TubeNets is their effectiveness for both forward and inverse problems [5], with a unique net architecture. Most importantly, solutions to inverse problems can be analytically derived in explicit formulae for the first time. This implies that when a data-based model is built properly, one can find solutions very efficiently.
Machine learning essentially mimics the natural learning process occurring in biological brains, which can have a huge number of neurons. In terms of the usage of data, we may have three major categories:

1. Supervised Learning, using data with true labels (teachers).
2. Unsupervised Learning, using data without labels.
3. Reinforcement Learning, using a predefined environment.

In terms of problems to solve, there are the following:

1. Binary classification problems, answer in probability to yes or no.
2. k-classification problems, answer in probabilities to k classes.
3. k-clustering problems, answer in k clusters of data-points.
4. Regression (linear or nonlinear), answer in predictions of continuous functions.
5. Feature extraction, answer in key features in the dataset.
6. Abnormality detection, answer in abnormal data.
7. Inverse analysis, answer in prediction on features from known responses.

In terms of learning methodology or algorithms, we have the following:

1. Linear and logistic regression, supervised.
2. Decision Tree, supervised.
3. Support Vector Machine (SVM), supervised.
4. Naive Bayes, supervised.
5. Multi-Layer Perceptron (MLP) or artificial Neural Networks (NNs), supervised.
6. k-Nearest Neighbors (kNN), supervised.
7. Random Forest, supervised.
8. Gradient Boosting types of algorithms, supervised.
9. Principal Components Analysis (PCA), unsupervised.
10. K-means, Mean-Shift, unsupervised.
11. Autoencoders, unsupervised.
12. Markov Decision Process, reinforcement learning.

This book will cover most of these algorithms, but our focus will be
more on neural network-based models because rigorous theory and predictive
models can be established.
Machine learning is a very active area of research and development. New
models, including the so-called cognitive machine learning models, are being
studied. There are also techniques for manipulating various ML models. This
book, however, will not cover those topics.

1.4 General Steps for Training Machine Learning Models

General steps for training machine learning models are summarized as follows (a code sketch follows the list):

1. Obtain the dataset for the problem, by your own means of data generation, imported from other existing sources, or by computer synthesis.
2. Clean up the dataset if there are objectively known defects in it.
3. Determine the type of hypothesis for the model.
4. Develop or import a proper module for the algorithm needed for the problem. The learning ability (number of learning parameters) of the model and the size of the dataset shall be properly balanced, if possible. Otherwise, consider the use of regularization techniques.
5. Randomly initialize the learning parameters, or import known pre-trained learning parameters.
6. Perform the training with proper optimization techniques and monitoring measures.
7. Test the trained model using an independent test dataset. This can also be done during the training.
8. Deploy the trained and tested model to the same type of problems from which the training and testing datasets were collected/generated.
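As a minimal sketch of these steps (assuming NumPy and scikit-learn; the synthesized dataset and the MLP settings below are illustrative assumptions, not a prescription):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Step 1: obtain a dataset (here synthesized for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(size=(8000, 3))                          # 3 features per data-point
y = X @ np.array([[1.0, 0.5], [0.2, 1.5], [0.7, 0.3]])   # 2 labels per data-point

# Step 2: clean up (here: drop rows with non-finite values, if any).
mask = np.isfinite(X).all(axis=1) & np.isfinite(y).all(axis=1)
X, y = X[mask], y[mask]

# Steps 3-5: choose a hypothesis (an MLP) with its learning ability roughly
# balanced against the dataset size; parameters are randomly initialized.
model = MLPRegressor(hidden_layer_sizes=(16,), alpha=1e-4,   # alpha: regularization
                     max_iter=500, random_state=0)

# Steps 6-7: train, then test on an independent dataset.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model.fit(X_train, y_train)
print("test R^2 score:", model.score(X_test, y_test))

# Step 8: deploy, i.e., predict for new data of the same type.
print(model.predict(rng.uniform(size=(2, 3))))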

1.5 Some Mathematical Concepts, Variables, and Spaces

We shall define the variables and spaces often used in this book, for ease of discussion. We first state that this book deals only with real numbers, unless otherwise specified where geometrically closed operations are required. Let us introduce two toy examples.

1.5.1 Toy examples


Toy Example-1, Regression: Assume we are to build a machine learning model to predict the quality of fruits. Based on three features, size, weight, and roundness (which can easily be observed and measured), we aim to establish a machine learning regression model to predict the values of two characteristics, sweetness and vitamin-C content (which are difficult to quantify nondestructively), for any given fruit. To build such a model, we make 8,000 measurements on randomly selected fruits from the market and create a dataset with 8,000 paired data-points. Each data-point records the values of these three features and pairs with the values of these two characteristics. The values of these two characteristics are called labels (ground truth) for the data-point. The dataset is called a labeled dataset, which can be used systematically to train a machine learning model.
Toy Example-2, Classification: Assume we are to build a machine learning model to classify the type of fruits based on the same three features (size, weight, and roundness). In this case, we want a machine to predict whether any given fruit is an apple or an orange, so that it can be packaged separately in an automatic manner. To achieve this, we make 8,000 measurements on randomly selected fruits of these two types from the market, and create a dataset with 8,000 paired data-points. Each data-point records the values of these three features and pairs with two labels (ground truth) of yes-or-no for apple and yes-or-no for orange. The dataset is also called a labeled dataset for model training.
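As a minimal sketch (assuming NumPy; the numbers are synthesized stand-ins for real measurements), the two toy datasets can be stored as paired arrays:

import numpy as np

m, p = 8000, 3                        # m data-points, p features
rng = np.random.default_rng(1)

# Features: size, weight, roundness for each of the m fruits.
X = rng.uniform(size=(m, p))

# Toy Example-1 (regression): labels are sweetness and vitamin-C content.
y_reg = rng.uniform(size=(m, 2))

# Toy Example-2 (classification): yes-or-no labels for apple / orange.
is_apple = rng.integers(0, 2, size=m)
y_cls = np.stack([is_apple, 1 - is_apple], axis=1)   # one label per class

print(X.shape, y_reg.shape, y_cls.shape)   # (8000, 3) (8000, 2) (8000, 2)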
With an understanding of these two typical types of examples, it should
be easy to extend this to many other types of problems for which a machine
learning model can be effective.

1.5.2 Feature space


Feature space $\mathbb{X}^p$: Machine learning uses datasets that contain observed or measured p variables of real numbers in $\mathbb{R}$, often called features. In our two toy examples, p = 3. We may define a p-dimensional feature space $\mathbb{X}^p$, which is a vector space (https://en.wikipedia.org/wiki/Vector_space) over the real numbers in $\mathbb{R}$ with an inner product defined. A vector in $\mathbb{X}^p$ for an arbitrary point $(x_1, x_2, \ldots, x_p)$ is written as
$$\mathbf{x} = [x_1, x_2, \ldots, x_p], \quad \mathbf{x} \in \mathbb{X}^p \tag{1.1}$$
The origin of $\mathbb{X}^p$ is at $\mathbf{x} = [0, 0, \ldots, 0]$, following the standard for all vector spaces. Note that we use italic for scalar variables, boldface for all vectors and matrices, and blackboard bold for spaces (or sets of that nature), and this convention is followed throughout this book. Also, we define, in general, all vectors as row vectors by default, as we usually do in Python programming. A column vector is treated as a special case of a 2D array (matrix) with only one column.
It is clear that the feature space $\mathbb{X}^p$ is a special case (with vector operations defined) of the real space $\mathbb{R}^p$. Thus, $\mathbb{X}^p \subset \mathbb{R}^p$.
Also, the $x_i$ $(i = 1, 2, \ldots, p)$ are called linear basis functions (not to be confused with the basis vectors), because a linear combination of the $x_i$ gives a new $\mathbf{x}$ that is still in $\mathbb{X}^p$. A two-dimensional (2D) feature space $\mathbb{X}^2$ is the black plane $x_1$–$x_2$ shown in Fig. 1.1.
An observed data-point $\mathbf{x}_i$ with p features is a discrete point in the space, and the corresponding vector $\mathbf{x}_i$ is expressed as
$$\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{ip}], \quad \mathbf{x}_i \in \mathbb{X}^p, \ \forall i = 1, 2, \ldots, m \tag{1.2}$$
where m is the number of measurements or observations or data-points in the dataset. It is also often referred to as the number of samples in a dataset. For these two toy examples, m = 8,000. For the example shown in Fig. 1.1, the 4 blue vectors are for four data-points in the space $\mathbb{X}^2$, and m = 4.
Figure 1.1: Data-points in a 2D feature space $\mathbb{X}^2$ with blue vectors: $\mathbf{x}_i = [x_{i1}, x_{i2}]$; and the same data-points in the augmented feature space $\overline{\mathbb{X}}^2$, called the affine space, with red vectors: $\bar{\mathbf{x}}_i = [1, x_{i1}, x_{i2}]$; $i = 1, 2, 3, 4$.

These data-points $\mathbf{x}_i$ $(i = 1, 2, \ldots, m)$ can be stacked to form a dataset denoted as $\mathbf{X} \in \mathbb{X}^p$. This is for convenience in formulation. We do not form such a matrix in computation, because it is usually very large for big datasets with large m.
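In NumPy terms, with the row-vector convention above, each data-point is a 1D array and a stacked dataset is an m by p array; a small sketch with assumed values:

import numpy as np

# Four data-points stacked into a dataset X of shape (m, p) = (4, 2),
# as in Fig. 1.1 (the values are illustrative).
X = np.array([[0.5, 1.2],
              [1.0, 0.8],
              [0.3, 0.4],
              [1.5, 0.6]])

# The inner product defined on the feature space:
print(np.dot(X[0], X[1]))    # inner product of two feature vectors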

1.5.3 Affine space


Affine space $\overline{\mathbb{X}}^p$: It is an augmented feature space, shown as the red plane in Fig. 1.1. It has a "complete" set of linear bases (or basis functions):
$$\bar{\mathbf{x}} = [1, x_1, x_2, \ldots, x_p] \tag{1.3}$$
By complete linear bases, we mean all bases up to the 1st order of all the variables, including the 0th order. The 0th-order basis is the constant basis 1 that provides the augmentation. An affine space is not a vector space, because $\mathbf{0} \notin \overline{\mathbb{X}}^p$ and $(\bar{\mathbf{x}}_i + \bar{\mathbf{x}}_j) \notin \overline{\mathbb{X}}^p$, where $i, j = 1, 2, 3$, or $4$ in Fig. 1.1. This special and fundamentally useful space always has the constant 1 as a component, and thus it does not have an origin by definition. An operation that occurs on an affine space and stays in an affine space is called an affine transformation. It is the most essential operation in major machine learning models, and the fundamental reason for such models being predictive.
An observed data-point with p features can also be represented as an augmented discrete point in the $\overline{\mathbb{X}}^p$ space and can be expressed by
$$\bar{\mathbf{x}}_i = [1, x_{i1}, x_{i2}, \ldots, x_{ip}], \quad \bar{\mathbf{x}}_i \in \overline{\mathbb{X}}^p, \ \forall i = 1, 2, \ldots, m \tag{1.4}$$
An $\overline{\mathbb{X}}^p$ space can be created by first expanding $\mathbb{X}^p$ by one dimension to $\mathbb{X}^{p+1}$ via the introduction of a new variable $x_0$ as
$$[x_0, x_1, x_2, \ldots, x_p] \tag{1.5}$$
and then setting $x_0 = 1$. The 4 red vectors shown in Fig. 1.1 live in an affine space $\overline{\mathbb{X}}^2$.
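The augmentation amounts to prepending a column of ones to the data; a small NumPy sketch (values assumed) that also checks the non-closure noted above:

import numpy as np

X = np.array([[0.5, 1.2],
              [1.0, 0.8],
              [0.3, 0.4],
              [1.5, 0.6]])                     # 4 data-points in X^2 (values assumed)

# Prepend the constant basis x0 = 1, giving the augmented vectors of Eq. (1.4).
X_bar = np.hstack([np.ones((X.shape[0], 1)), X])
print(X_bar[0])                                # [1.  0.5 1.2], a red vector in Fig. 1.1

# The affine space is not a vector space: the sum of two augmented
# vectors has x0 = 2, leaving the hyperplane x0 = 1.
print(X_bar[0] + X_bar[1])                     # [2.  1.5 2. ]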
Note that the affine space $\overline{\mathbb{X}}^p$ is neither $\mathbb{X}^{p+1}$ nor $\mathbb{X}^p$, and is quite special. A vector in an $\overline{\mathbb{X}}^p$ is in $\mathbb{X}^{p+1}$, but the tip of the vector is confined to the "hyperplane" of $x_0 = 1$. For convenience of discussion in this book, we say that an affine space has a pseudo-dimension of $p + 1$. Its true dimension is $p$, but it is a hyperplane in an $\mathbb{X}^{p+1}$ space.


In terms of function approximation, the linear bases given in Eq. (1.3)
can be used to construct any arbitrary linear function in the feature
space. A proper linear combination of these complete linear bases is still
in the affine space. Such a combination can be used to perform an affine
transformation, which will be discussed in detail in Chapter 5.
These data-points $\bar{\mathbf{x}}_i$ $(i = 1, 2, \ldots, m)$ are stacked to form an augmented dataset $\overline{\mathbf{X}} \in \overline{\mathbb{X}}^p$, which is the well-known moment matrix in function approximation theory [1–4]. Again, this is for convenience in formulation. We may not form such a matrix in computation.

1.5.4 Label space


Label space $\mathbb{Y}^k$: Consider a labeled dataset for supervised machine learning model creation. We introduce variables $(y_1, y_2, \ldots, y_k)$ of real numbers in $\mathbb{R}$. For toy example-1, k = 2. We may define a label space $\mathbb{Y}^k$ over the real numbers. It is a vector space. A vector in the space $\mathbb{Y}^k$ can be written as
$$\mathbf{y} = [y_1, y_2, \ldots, y_k], \quad \mathbf{y} \in \mathbb{Y}^k \subset \mathbb{R}^k \tag{1.6}$$

A label in a dataset is paired with a data-point. The label for data-point $\mathbf{x}_i$, denoted as $\mathbf{y}_i$, can be expressed as
$$\mathbf{y}_i = [y_{i1}, y_{i2}, \ldots, y_{ik}], \quad \mathbf{y}_i \in \mathbb{Y}^k, \ \forall i = 1, 2, \ldots, m \tag{1.7}$$

For toy example-1, the labels $\mathbf{y}_i$ $(i = 1, 2, \ldots, 8000)$ are 8,000 points in the 2D space $\mathbb{Y}^2$. For toy example-2, each label, $y_{i1}$ or $y_{i2}$, has a value of 0 or 1 (or $-1$ or 1), but the labels can still be viewed as living in $\mathbb{Y}^2$. These labels $\mathbf{y}_i$ $(i = 1, 2, \ldots, m)$ can be stacked to form a label set $\mathbf{Y} \in \mathbb{Y}^k$, although we may not really do so in computation.
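A minimal sketch (assuming NumPy; the values are illustrative) of how stacked labels look for the two toy examples:

import numpy as np

# Toy example-1: continuous labels (sweetness, vitamin-C), k = 2;
# the stacked label set Y is an (m, k) array.
Y_reg = np.array([[0.7, 0.2],
                  [0.4, 0.9],
                  [0.6, 0.5]])

# Toy example-2: yes-or-no labels (apple?, orange?), still viewed in Y^2.
Y_cls = np.array([[1, 0],        # an apple
                  [0, 1],        # an orange
                  [1, 0]])       # another apple

print(Y_reg.shape, Y_cls.shape)  # (3, 2) (3, 2): one label vector per data-point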
Typically, affine transformations end at the output layer of a neural network, producing a vector in a label space, so that a loss function can be constructed there for "terminal control".

1.5.5 Hypothesis space


The learning parameters $\hat{\mathbf{w}}$ in a machine learning model are continuous variables that live in a hypothesis space denoted as $\mathbb{W}^P$ over the real numbers.
Learning parameters are also called training or trainable parameters. We use these terms interchangeably. The learning parameters include the weights and biases in each and every layer. The hat above $\mathbf{w}$ implies that it is a collection of all weights and biases, so that we have a single notation, in a vector, for all learning parameters. Its dimension P depends on the type of hypothesis used, including the configuration of the neural networks or ML models. These parameters always work with feature vectors, resulting in intermediate feature vectors in a new feature space or in a label space, through a properly designed architecture.
These parameters need to be updated, which involves vector operations. To ensure convergence, we would need the vector of all learning parameters to obey important vector properties, such as inner products, norms, and the Cauchy-Schwarz inequality. We will do such proofs multiple times in this book. Therefore, we require $\mathbb{W}^P$ to be a vector space, so that each update to the current learning parameters results in new parameters that are still in the same vector space, until they converge.
Note that the learning parameters, in general, are in matrix form or column vectors (which can be viewed as a special case of matrices). In a typical machine learning model, there could be multiple matrices of different sizes. These matrices form affine transformation matrices that operate on features in affine spaces. A component in a "vector" of the hypothesis space can in fact be a matrix in general, and thus it is not easy to comprehend intuitively. The easiest (and valid) way is to "flatten" all the matrices and then "concatenate" them together to form a tall vector, and then treat it as a usual vector. We do this kind of flattening and concatenation all the time in Python. Such a flattened tall vector $\hat{\mathbf{w}}$ in the hypothesis space $\mathbb{W}^P$ can be written generally as
$$\hat{\mathbf{w}} = [w_0, w_1, \ldots, w_P] \in \mathbb{W}^P \tag{1.8}$$

We will discuss in later chapters the details about $\mathbb{W}^P$ for various models, including estimation of the dimension P.
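The flatten-and-concatenate view can be sketched as follows (the layer shapes below are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Weight matrices and bias vectors of a small two-layer model.
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

# Flatten each array and concatenate into one tall vector w_hat in W^P.
w_hat = np.concatenate([a.ravel() for a in (W1, b1, W2, b2)])
print(w_hat.size)               # P = 3*4 + 4 + 4*2 + 2 = 26

# Usual vector operations (norms, inner products) now apply directly.
print(np.linalg.norm(w_hat))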
1.5.6 Definition of a typical machine learning model, a mathematical view

Finally, we can mathematically define ML models for prediction as a mapping operator:
$$\mathcal{M}(\hat{\mathbf{w}} \in \mathbb{W}^P; \overline{\mathbf{X}} \in \overline{\mathbb{X}}^p, \mathbf{Y} \in \mathbb{Y}^k): \mathbb{X}^p \to \mathbb{Y}^k \tag{1.9}$$
It reads that the ML model $\mathcal{M}$ uses a given dataset $\overline{\mathbf{X}}$ with $\mathbf{Y}$ to train its learning parameters $\hat{\mathbf{w}}$, and produces a map (or giant functions) that makes a prediction in the label space for any point in the feature space.

The ML model shown in Eq. (1.9) is in fact a data-parameter converter: it converts a given dataset to learning parameters during training, and then converts the parameters back in making a prediction for a given set of feature variables. It can also be mathematically viewed as a giant function with k components in the feature space $\mathbb{X}^p$, controlled (parameterized) by the training parameters in $\mathbb{W}^P$. When the parameters are tuned, one gets a set of k giant functions over the feature space.
On the other hand, this set of k giant functions can also be viewed as continuous (differentiable) functions of these parameters for any given data-point in the dataset, which can be used to form a loss function that is also differentiable. Such a loss function can be the error between these k giant functions and the corresponding k labels given in the dataset. It can be viewed as a functional of prediction functions that in turn are functions of $\hat{\mathbf{w}}$ in the vector space $\mathbb{W}^P$. The training is to minimize such a loss function for all the data-points in the dataset, by updating the training parameters to become minimizers. This overview picture will be made explicit in formulas in later chapters. The success factors for building a quality ML model include: (1) the type of hypothesis, (2) the number of learning parameters in $\mathbb{W}^P$, (3) the quality of the dataset in $\mathbb{X}^p$ (its representativeness of the underlying problem to be modeled, including correctness, size, data-point distribution over the feature space, and noise level), and (4) the techniques to find the minimizing learning parameters that best reproduce the labels in the dataset. We will discuss this in detail in later chapters for different machine learning models.
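As a rough sketch of this view (with a plain linear hypothesis standing in for the giant functions, and a synthesized dataset; this is an illustration, not the models developed in later chapters):

import numpy as np

rng = np.random.default_rng(0)
m, p, k = 100, 3, 2
X = rng.normal(size=(m, p))                 # synthesized features
Y = rng.normal(size=(m, k))                 # synthesized labels
X_bar = np.hstack([np.ones((m, 1)), X])     # affine augmentation

def predict(w_hat):
    # k "giant functions" over the feature space, parameterized by w_hat.
    W = w_hat.reshape(p + 1, k)             # un-flatten the parameters
    return X_bar @ W

def loss(w_hat):
    # Least-squares error between the k predictions and the k labels.
    return np.mean((predict(w_hat) - Y) ** 2)

# Training updates w_hat toward a minimizer; here a single gradient step,
# with the gradient of this quadratic loss written out analytically.
w_hat = rng.normal(size=(p + 1) * k)
grad = (2 / (m * k)) * (X_bar.T @ (predict(w_hat) - Y)).ravel()
print(loss(w_hat), loss(w_hat - 0.1 * grad))   # the loss decreases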
Concepts of spaces are helpful in our later analysis of the predictive properties of machine learning models. Readers may find it difficult to comprehend these concepts at this stage, and thus are advised to just get a rough idea for now and to revisit this section when reading the relevant chapters. Readers may jump to Section 13.1.5 and take a look at Eq. (13.13) there for a quick glance at how the spaces evolve in a deepnet.
Note also that there are ML models for discontinuous feature variables,
and the learning parameters may not need to be continuous. Such methods
are often developed based on proper intuitive rules and techniques, and
we will discuss some of those. The concepts on spaces may not be directly
applicable but can often help.
1.6 Requirements for Creating Machine Learning Models

To train a machine learning model, one would need the following:

1. A dataset, which may be obtained via observations, experiments, or physics-law-based models. The dataset is usually divided (in a random manner) into two mutually independent subsets, a training dataset and a testing dataset, typically at a ratio of 75:25 (see the sketch after this list). The independence of the testing dataset is critical, because ML models are determined largely by the training dataset, and hence their reliability depends on objective testing.
2. Labels with the dataset, if possible.
3. Prior information on the dataset, if possible, such as the quality of the data and key features of the data. This can be useful in choosing a proper algorithm for the problem, and in the application of regularization techniques in the training.
4. Proper computer software modules and/or effective algorithms.
5. A computer, preferably connected to the internet.
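The 75:25 random split in item 1 can be done, for example, with scikit-learn (the arrays here are toy assumptions):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)      # 20 data-points, 2 features (toy values)
y = np.arange(20)                     # one label per data-point

# Random 75:25 split into mutually independent training/testing subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(len(X_train), len(X_test))      # 15 5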

1.7 Types of Data

Data are the key to any data-based models. There are many types of data available for different types of problems that one may make use of, as follows:

• Images: photos from cameras (more often now cellphones), images obtained from open websites, computed tomography (CT), X-ray, ultrasound, magnetic resonance imaging (MRI), etc.
• Computer-generated data: data from proven physics-law-based models, other surrogate models, other reliable trained machine learning models, etc.
• Text: unclassified text documents, books, emails, webpages, social media records, etc.
• Audio and video: audio and video recordings.
Note that the quality and the sampling domain of the dataset play important roles in training reliable machine learning models. Use of a trained model beyond the data sampling domain requires special caution, because it can go wrong unexpectedly, and hence be very dangerous.

1.8 Relation Between Physics-Law-based and Data-based Models

Machine learning models are in general slow learners but fast predictors, while physics-law-based models do not need to learn (they use existing laws) but are slow in prediction. This is because the strategies for physics-law-based models and those for data-based models are quite different. ML models use datasets to train the parameters, but physics-law-based models use laws to determine the parameters.
However, at the detailed computational methodology level, many tech-
niques used in both models are in fact the same or quite similar. For example,
when we express a variable as a function of other variables, both models
use basis functions (polynomial, or radial basis function (RBF), or both).
In constructing objective functions, the least squares error formulation is
used in both. In addition, the regularization methods used are also quite
similar. Therefore, one should not study these models in total isolation. The
ideas and techniques may be deeply connected and mutually adaptable. This
realization can be useful in better understanding and further development
of more effective methods for both models, by exchanging the ideas and
techniques from one to another. In general, for physics-law-based computa-
tional methods, such as the general form of meshfree methods, we understand
reasonably well why and how a method works in theory [3]. Therefore, we are
quite confident about what we are going to obtain when a method is used
for a problem. For data-based methods, however, this is not always true.
Therefore, it is of importance to develop fundamental theories for data-based
methods. The author made some attempts [21] to reveal the relationship
between physics-law-based and data-based models, and to establish some
theoretical foundation for data-based models. In this book, we will try to
discuss the similarities and differences, when a computational method is
used in both models.

1.9 This Book

This book offers an introduction to general topics on machine learning. Our focus will be on the basic concepts, fundamental theories, and essential computational techniques related to the creation of various machine learning models. We decided not to provide a comprehensive document for
all the machine learning techniques, models, and algorithms. This is
because the topic of machine learning is very extensive and it is not possible
to be comprehensive in content. Also, it is really not possible for many read-
ers to learn all the content. In addition, there are in fact plenty of documents
and codes available publicly online. There is no lack of material, and there is
no need to simply reproduce these materials. In the opinion of the author, the
best learning approach is to learn the most essential basics and build a strong
foundation, which is sufficient to learn other related topics, methods, and
algorithms. Most importantly, readers with strong fundamentals can even
develop innovative and more effective machine learning models for their problems.
Based on this philosophy, the highlights of the book that cannot be found easily, or in complete form, in the open literature are listed as follows; many of them are outcomes of the author's studies in past years:

1. Detailed discussion on and demonstration of predictability for arbitrary linear functions of the basic hypothesis used in major ML models.
2. Affine transformation properties and their demonstrations, affine space, affine transformation unit, array, chained arrays, roles of the weights and biases, and roles of activation functions for deepnet construction.
3. Examination of predictability of high-order functions and a Universal Prediction Theory for deepnets.
4. A concept of data-parameter converter, parameter encoding, and uniqueness of the encoding.
5. Role of affine transformation in SVM, complete description of SVM formulation, and the kernel trick.
6. Detailed discussion on and demonstration of activation functions, Neural-Pulse-Unit (NPU), leading to the Universal Approximation Theorem for wide-nets.
7. Differentiation of a function with respect to a vector and matrix, leading to automatic differentiation and Autograd.
8. Solution Existence Theory, effects of parallel data-points, and predictability of the solution against the label.
9. Neurons-Samples Theory, which gives, for the first time, a general rule of thumb on the relationship between the number of data-points and the number of neurons in a neural network (or the total pseudo-dimensions of the affine spaces involved).
10. Detailed discussion on and demonstration of Tikhonov regularization effects.
The author has made a substantial effort to write Python codes to demonstrate the essential and difficult concepts and formulations, which allows readers to comprehend each chapter more readily. Based on the learning experience of the author, this can make the learning more effective.
The chapters of this book are written, in principle, to be readable independently, by allowing some duplication. Necessary cross-references between chapters are provided but kept to a minimum.
1.10 Who May Read This Book


The book is written for beginners interested in learning the basics of machine learning, including university students who have completed their first year, graduate students, researchers, and professionals in engineering and sciences. Engineers and practitioners who want to learn to build machine learning models may also find the book useful. Basic knowledge of college mathematics is helpful in reading this book smoothly.
This book may be used as a textbook for undergraduates (3rd year or
senior) and graduate students. If this book is adopted as a textbook, the
instructor may contact the author (liugr100@gmail.com) directly for some
homework and course projects and solutions.
Machine learning is still a fast-developing area of research. There still exist
many challenging problems, which offer ample opportunities for research to
develop new methods and algorithms. Currently, it is a hot topic of research
and applications. Different techniques are being developed every day, and
new businesses are formed constantly. It is the hope of the author that this
book can be helpful in studying existing and developing machine learning
models.

1.11 Codes Used in This Book

The book has been written using Jupyter Notebook with codes.
Readers who purchased the book may contact the author directly
(mailto:liugr100@gmail.com) to request a softcopy of the book with codes
(which may be updated), free for academic use after registration. The
conditions for use of the book and codes developed by the author, in both
hardcopy and softcopy, are as follows:

1. Users are entirely at their own risk in using any part of the codes and techniques.
2. The book and codes are only for your own use. You are not allowed to further distribute them without permission from the author of the code.
3. There will be no user support.
4. Proper reference and acknowledgment must be given for the use of the
book, codes, ideas, and techniques.

Note that the handcrafted codes provided in the book are mainly for studying and better understanding the theory and formulation of ML methods. For production runs, well-established and well-tested packages should be used, and there are plenty out there, including but not limited to scikit-learn, PyTorch, TensorFlow, and Keras. Also, the codes provided are often run with various packages/modules. Therefore, care is needed when using these codes, because the behavior of the codes often depends on the versions of Python and all these packages/modules. When the codes do not run as expected, version mismatch could be one of the problems. When this book was written, the versions of Python and some of the packages/modules were as follows:

• Python 3.6.13 :: Anaconda, Inc.
• Jupyter Notebook (web-based) 6.3.0
• TensorFlow 2.4.1
• keras 2.4.3
• gym 0.18.0

When issues are encountered in running a code, readers may need to check the versions of the packages/modules used. If Anaconda Navigator is used, the versions of all the packages/modules installed with a Python environment are listed when that environment is highlighted. One can also check the version of a package in a code cell of the Jupyter Notebook. For example, to check the version of the current environment of Python, one may use

!python -V # ! is used to execute an external command

Python 3.6.13 :: Anaconda, Inc.

To check the version of a package/module, one may use

• import package_name
• print('package_name version', package_name.__version__)
For example,

import keras
print('keras version', keras.__version__)
import tensorflow as tf
print('tensorflow version', tf.version.VERSION)

keras version 2.4.3
tensorflow version 2.4.1
If the version is indeed an issue, one would need to either modify the code to fit the version or install the correct version in the system, perhaps by creating an alternative environment. It is very useful to query the web using the error message, and solutions or leads can often be found. This is the approach the author often takes when encountering an issue in running a code. Finally, this book has used materials and information available on the web with links. These links may change over time, because of the nature of the web. The most effective way (and one often used by the author) to deal with this matter is to use keywords to search online, if a link is lost.

References

[1] G.R. Liu and S.S. Quek, The Finite Element Method: A Practical Course,
Butterworth-Heinemann, London, 2013.
[2] G.R. Liu and T.T. Nguyen, Smoothed Finite Element Methods, Taylor and Francis
Group, New York, 2010.
[3] G.R. Liu, Mesh Free Methods: Moving Beyond the Finite Element Method, Taylor
and Francis Group, New York, 2010.
[4] G.R. Liu and Gui-Yong Zhang, Smoothed Point Interpolation Methods: G Space
Theory and Weakened Weak Forms, World Scientific, New Jersey, 2013.
[5] G.R. Liu and X. Han, Computational Inverse Techniques in Nondestructive Evalua-
tion, Taylor and Francis Group, New York, 2003.
[6] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, New York, 1962. https://books.google.com/books?id=7FhRAAAAMAAJ.
[7] D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning Internal Representations
by Error Propagation, 1986.
[8] G.R. Liu, FEA-AI and AI-AI: Two-way deepnets for real-time computations for both
forward and inverse mechanics problems, International Journal of Computational
Methods, 16(08), 1950045, 2019.
[9] G.R. Liu, S.Y. Duan, Z.M. Zhang et al., TubeNet: A special trumpetnet for explicit
solutions to inverse problems, International Journal of Computational Methods,
18(01), 2050030, 2021. https://doi.org/10.1142/S0219876220500309.
[10] K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, 36(4), 193–202, Apr 1980. https://doi.org/10.1007/bf00344251.
[11] D. Ciregan, U. Meier and J. Schmidhuber, Multi-column deep neural networks
for image classification, 2012 IEEE Conference on Computer Vision and Pattern
Recognition, 2012.
[12] M.V. Valueva, N.N. Nagornov, P.A. Lyakhov et al., Application of the residue number system to reduce hardware costs of the convolutional neural network implementation, Mathematics and Computers in Simulation, 177, 232–243, 2020.
[13] Duan Shuyong, Ma Honglei, G.R. Liu et al., Development of an automatic lawnmower
with real-time computer vision for obstacle avoidance, International Journal of
Computational Methods, Accepted, 2021.
[14] Duan Shuyong, Lu Ningning, Lyu Zhongwei et al., An anchor box setting technique based on differences between categories for object detection, International Journal of Intelligent Robotics and Applications, 6, 38–51, 2021.
[15] W.S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, 5, 127–147, 1943.
[16] J. Schmidhuber, Habilitation Thesis: An Ancient Experiment with Credit Assignment Across 1200 Time Steps or Virtual Layers and Unsupervised Pre-training for a Stack of Recurrent NNs, TUM, 1993. https://people.idsia.ch/~juergen/habilitation/node114.html.
[17] Yu Yong, Si Xiaosheng, Hu Changhua et al., A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation, 31(7), 1235–1270, 2019. https://direct.mit.edu/neco/article/31/7/1235/8500/A-Review-of-Recurrent-Neural-Networks-LSTM-Cells.
[18] L. Shi, F. Wang, S. Duan et al., Two-way TubeNets uncertain inverse methods for
improving positioning accuracy of robots based on interval, The 11th International
Conference on Computational Methods (ICCM2020), 2020.
[19] Duan Shuyong, Shi Lutong, G.R. Liu et al., An uncertainty inversion technique using
two-way neural network for parameter identification of robot arms, Inverse Problems
in Science & Engineering, 29, 3279–3304, 2021.
[20] Duan Shuyong, Wang Li, G.R. Liu et al., A technique for inversely identifying joint-
stiffnesses of robot arms via two-way TubeNets, Inverse Problems in Science &
Engineering, 13, 3041–3061, 2021.
[21] G.R. Liu, A neural element method, International Journal of Computational Methods,
17(07), 2050021, 2020.
