
1 Basic Concepts of Artificial Intelligence: Primed for Clinicians


Niklas Lidströmer, Federica Aresu, and Hutan Ashrafian

Contents
Introduction
AI, Machine Learning, and Deep Learning per Definition
A Brief History of AI
Rising Demand for AI
AI Applications
AI Staging
AI Programming Languages
Machine Learning
Types of Machine Learning
Machine Learning Problem Solutions
Limitations of Machine Learning
Introduction of Deep Learning
Single-Layer Perceptrons
Natural Language Processing
Conclusions
References

N. Lidströmer (*)
Department of Women’s and Children’s Health, Karolinska
Institutet, Stockholm, Sweden
e-mail: niklas.lidstromer@ki.se
F. Aresu
KTH Royal Institute of Technology, Stockholm, Sweden
e-mail: aresu@kth.se
H. Ashrafian
Department of Surgery and Cancer, Imperial College
London NHS Trust, London, UK
Institute of Global Health Innovation, Imperial College
London, London, UK
Hamlyn Centre for Robotics and Artificial Intelligence,
Department of Surgery and Cancer, Imperial College
London, London, UK
e-mail: h.ashrafian@imperial.ac.uk

© Springer Nature Switzerland AG 2022


N. Lidströmer, H. Ashrafian (eds.), Artificial Intelligence in Medicine,
https://doi.org/10.1007/978-3-030-64573-1_1

Abstract

With the urgent need to apply automated algorithms to an ever-increasing amount of data and to further decrease the chance of human error in crucial tasks, artificial intelligence algorithms were introduced. An expansive demand for AI applications in varying fields led to the development of specifically designed ad hoc algorithms with the role of better estimating (by learning) solutions to the problems.

The boost of AI in healthcare right now is a consequence of two things – the availability of big data and better processors, able to train and execute algorithmic tasks, i.e., implementations of these algorithms with neural networks.

It will soon be vital for medical students to grasp the principles of AI. The purpose of this major reference textbook on AI in medicine, of which this chapter is the base level introduction, is to become the greatest standard reference work. No area of medicine, preclinical or clinical, will escape the profound effects of AI: the whole healthcare domain will be reshaped thoroughly.

Keywords

Artificial Intelligence · Medicine · Basic · Concepts · Introduction · Classification · Healthcare · Machine Learning · Deep Learning · Neural Networks · Programming

Introduction

Let’s look at the AI landscape. Today we dwell on the plains of AI. It will heavily influence medicine, yet still relatively few healthcare professionals have a good understanding of the concept to come. Therefore, this chapter contains the scientific fundamental principles of AI – primed for clinicians. This serves as the path leading up to the basecamp location, where the evolution of deep medicine is elaborated. Hereafter, the path leads up to the mountainous high-level plateau of section III and its presentation of AIM for the many medical specialties. These are the purposes of these first upward-sloping components of this book, whose contents should be natural parts of the understanding of AIM by medical students and healthcare professionals of the future, in general.

After these general words on AI, a brief history of AI is presented, and an explanation of why it has become so famous right now. Then we must define exactly what AI is, its different stages, what language is best fitted for AIM, and how to properly communicate with programming experts. This part also contains a short intro to machine learning, with some medical demos. It will help to better grasp the contents of this large book, and understand the limitations of machines, and why machine learning is a necessity for the upcoming revolution of deep learning – a cornerstone in deep medicine. Other concepts in this first pedagogic book part are deep neural networks, natural language processing, and the practical implementation of the latter into medicine and in particular into AIM.

AI, Machine Learning, and Deep Learning per Definition

What exactly is AI? How was it defined when it first emerged as a term in 1956? According to John McCarthy it is the science and engineering of making intelligent machines.

Hence, it is the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision making, and translation between languages.

AI is a technique of getting machines to work, act, or behave like humans. Recently we have started to realize this – robots and, in general, measurements-processing machines are used in many fields, such as healthcare, marketing, robotics, stock markets, business analytics, transportation, surveillance, etc.

Machine Learning (ML) is a subset of AI techniques and refers to a computer program that can learn how to produce behavior not explicitly programmed by the program’s author.

This behavior is learned based on the data given, by minimizing the error between the current action and the ideal one, and through a feedback mechanism (the famous “error backpropagation”) that uses this error to lead the machine to self-recognize and internally improve its performance.

Deep Learning (DL) is a subset of machine learning, which uses multilayer neural networks to solve complex problems. They exploit the processing of contextual data. Deep learning techniques are, therefore, more suitable for learning unstructured data.

A Brief History of AI

The history of AI is very old – it goes far back to antiquity and Greek mythology. Talos was a giant animated bronze warrior, created by Hephaestus and programmed to guard the island of Crete by throwing rocks at nearing enemy ships. But there are several myths about mechanical men and automata throughout human history. These were generally well thought of [1].

Modern medicine is intrinsically connected to advanced mathematical analysis, now requiring computers with fast processors. This has led to new diagnostics, pharmacovigilance, therapies, robot-assisted interventions, epidemiological data mining, and synthesis of evidence-based medicine and decision support.

Also, mathematics in medicine has a long history. Leiden professor of medicine Archibald Pitcairne (1652–1713) developed a theory of iatro-mathematics (medicine and mathematics) and can be regarded as the father of mathematical medicine [2]. During his career he stood under documented influence by mathematicians such as David Gregory (1659–1708) and Isaac Newton (1642–1727), whose Philosophiæ Naturalis Principia Mathematica heavily influenced the formation of a Newtonian medicine, and the formalized concept of iatro-mathematics, with lectures in Leiden during the early 1690s [3].

In 1920, The Lancet published an article on the topic of iatro-mathematics [4], where it was noted that the “rapprochement of medicine and mathematics is incomplete.” However, it was concluded that the two sciences would “eventually become a firm alliance.”

In 1950, Alan Turing published a landmark paper, in which he speculates about the possibility of creating machines that think. He created a test known as the Turing test, aimed at determining whether a computer can think like a human being. He concluded that thinking is hard to define. If a machine can carry out a conversation that is indistinguishable from a conversation with a human being, then it would be reasonable to say the machine is thinking. The machine would pass the Turing test. No machine until this day has achieved this result. The Turing test stands out as the first contribution to the philosophy of artificial intelligence.

This contribution was followed by the era of “game AI” during the 1950s. In 1951, using the Ferranti Mark 1 machine at the University of Manchester, the computer scientist Christopher Strachey wrote a checkers program, and Dietrich Prinz wrote one for chess at around the same time.

These were the first attempts to let computers play games such as chess, and compete with humans.

This was followed by probably the most important year in AI history. In 1956, the concept of AI was coined for the first time at the Dartmouth Conference by Professor John McCarthy. In 1959 the first AI laboratory was established, marking the next coming period as the AI research era. The lab, at MIT, is still in operation and famous.

In the 1960s the first AI robot was implemented into the GM assembly line, and the first chatbot ELIZA was invented. This is the great grandmother of Siri and Alexa. Later came the very well-known IBM Deep Blue, which in 1997 beat the world champion Garry Kasparov in a game of chess. This is regarded as maybe one of the first great accomplishments of AI.

In 2005 at the DARPA Grand Challenge, the racing team of Stanford University participated with Stanley, an autonomous car, which won the race.

In 2011 IBM’s answering system Watson defeated two of the greatest Jeopardy! champions, Brad Rutter and Ken Jennings.

AI started as a hypothetical situation, evolved, and is now the most important technology in today’s world. AI has presented an exponential growth of its potential. Wherever we look around us, AI, deep learning, or machine learning powers many things.

Today AI dominates Knowledge Base, Expert Systems, Deep Learning, Computer Vision and Image Processing, Machine Learning and Natural Language Processing, etc.

Rising Demand for AI

A common query is how AI could have existed for over half a century and now suddenly appears as a hype, attracting all attention. The main reasons for the present demand for AI are:

• First of all, we have more computational power now to train deep learning models, and one of the most important contributions to these technology improvements is graphics processing units (GPUs) for massively parallel processing at low cost. The improved computational power makes it possible to broadly and globally implement AI.
• Secondly, the enormous amount of data presently at hand. Data is generated at an immeasurable pace. The sources are vast networks of social media, IoT devices, mail, conversations, photos, medical imaging, and numerous other places. Hence, there is a demand for a method or solution to process this overload of data, in order to give us insight from it and let businesses grow as a result. This process is AI in essence. AI is trained on large datasets, big data, to help us make smart decisions, to classify objects in images, etc. These processes enable us to act more efficiently.
• Next up, we have far better algorithms. These are much more effective and based on the concept of neural networks, i.e., the deep learning architecture. All this enables quicker and more accurate computations.
• And last, but not least, governments, venture capitalists, tech giants, and start-ups are now all focused on AI and pour in investments. For instance, companies in the FAANG group (Facebook, Apple, Amazon, Netflix, Google), Microsoft, most car manufacturers, and a long list of major tech companies are deeply investing in AI. The consensus among all mentioned instances is that AI is the way of the future.

For more details on the importance of AI in medicine, please see the chapter “On the Importance of AIM,” by Dr Katarina Gospic et al.

AI Applications

Now that the definition is set, let us briefly mention a few significant applications, to highlight the importance of AI. The most famous is likely Google’s predictive search engine. It is in global use; whenever a person starts to type, Google makes immediate suggestions the user could use. This is literally AI in action. The predictions are based on data collected from individuals: browser usage, location, personal info, age, gender, and many more. Behind this guessing, there are many layers of natural language processing, deep learning, and machine learning.

Another striking application is in the financial field – J.P. Morgan Chase uses the Contract Intelligence platform (COiN), which uses AI, machine learning and image recognition software to analyze legal documents. In this way the company avoided manually reviewing ca. 12,000 documents, which took more than 36,000 h. When this monotonous task was handed over to the AI machine, it took a few hours.

Since this book is focused on AI in medicine, let us mention some applications for healthcare, for instance, the Watson computer from IBM. Watson uses natural language processing, evidence-based learning capacities, and hypothesis generation, which has contributed to clinical decision support systems and to AI in healthcare, and is today in use in an increasing number of medical specialties [5]. Medical doctors can pose questions to Watson, entering clinical facts such as symptoms, medications, and heredity, and Watson can then mine the patient data, examine available data, form a hypothesis, and finally provide a list of individualized confidence-scored suggestions [6]. Watson’s data sources encompass research articles, clinical studies, treatment guidelines, and electronic health record information [5]. It should

though be noted that Watson has not been directly involved in medical diagnosis; it has only assisted with treatment alternatives for already diagnosed patients [7].

During the last 10 years Watson has partnered with a long range of organizations, companies, and universities, e.g., Columbia University, University of Maryland [8], Memorial Sloan-Kettering Cancer Center [9], MD Anderson Cancer Center, Manipal Hospital, Cleveland Clinic, and Case Western Reserve University [10].

The FAANG group and other large corporations have started AI initiatives within health: Facebook (Preventative Health, 2019–), Microsoft (Health Vault, 2011–2019), Apple (Health, 2014), Amazon (Amazon Care, 2018–), Google (Google Health 2006–2012, and with DeepMind 2018–) [11].

For instance, DeepMind Health collaborates with Moorfields Eye Hospital to develop AI applications for healthcare, especially eye scanning [12], and with University College London Hospital, aiming to develop an algorithm to differentiate between healthy and cancerous tissues in the head and neck region [13].

In global large networks, such as Facebook, AI is used in, e.g., face verification, both as password and for auto-tagging of friends, and to personalize advertising systems using neural networks, machine learning, and deep learning concepts.

Many people are not aware of how much AI they use on a daily basis in their lives – all social media platforms, e.g., Facebook, Instagram, Twitter, LinkedIn, heavily rely on AI. During the 2016 US elections, political ads on social media were called into the spotlight, specifically the controversy over targeting users by obtaining their personal information and determining what advertisement would persuade those electors.

Twitter uses AI to identify hate speech and terroristic language in tweets. In this way they detected ca. 300,000 terror-linked accounts. The nonhuman AI machines found 95% of these [14].

Also, virtual assistants such as Alexa and Siri have entered the market, and quite recently also Google Duplex, which responds to calls and can book appointments with a human touch, making it sound realistic.

Some other examples of AI are Tesla’s and many other car manufacturers’ experiments with self-driving cars and even taxis without a driver, based on a range of AI implementations. Also, Netflix uses AI and pattern recognition to make personal film recommendations. Gmail uses a similar principle to automatically sort incoming mail into mailboxes, for spam filtering, etc. The latter uses buzzwords that are common in spam, e.g., full refund, lottery, etc., and then directs such messages to the spam compartment.

AI Staging

There are three main stages of AI:

• Narrow AI, also known as weak AI. This can only be applied to specific tasks. Most applications today belong to this group, e.g., Alexa – although sophisticated, all functions are within a narrowly defined function range. Other examples are self-driving cars, chess computers, AlphaGo.
• General AI, or strong AI. We have not reached this stage. Strong AI refers to machines being able to perform any intellectual task that a human can. Machines now have strong processing powers, but hitherto no sign of reasoning capacity; hence we’re stuck in the weak AI stage. Not even AlphaGo Zero, which learned without human intervention, could be defined as strong AI.
• Super AI refers to a stage when computers would surpass human capacities. This stage encompasses big data statistics, symbolic mathematics, number of faces, generative adversarial networks (GANs).

AI Programming Languages

There are several languages used in AI applications, and one of the most popular and well-known is Python, partly because of its simple and functional syntax, and also for the great number of libraries designed for Python (to implement Machine Learning algorithms in a straightforward manner), such as Keras and TensorFlow.

The advantages of Python are related to its simplicity and maintainability as well as the possibility to connect and integrate with files written in other programming languages. Problems of memory usage and limited multithreading are substantial Python disadvantages [15].
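To give a feel for the simple syntax mentioned above, here is a minimal sketch that uses only the Python standard library. The temperature readings and the 38.0 °C fever cut-off are assumptions made for illustration; they are not clinical guidance and do not come from a real dataset.

```python
from statistics import mean

# Hypothetical body temperatures in degrees Celsius (illustrative values only).
temperatures = [36.8, 36.9, 38.4, 35.9, 39.1]

# Assumed cut-off for this sketch; not a clinical recommendation.
FEVER_THRESHOLD = 38.0

# A list comprehension selects the readings at or above the threshold.
febrile = [t for t in temperatures if t >= FEVER_THRESHOLD]

average = mean(temperatures)
print(f"mean temperature: {average:.2f} °C")
print(f"febrile readings: {febrile}")
```

The whole task reads almost like the English description of it, which is the kind of simplicity the text refers to.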

Python was created in 1989 by Guido van Rossum, and is an interpreted, object-oriented high-level programming language with dynamic semantics. Hence, it is a high-level language (no concerns with low-level details, e.g., memory allocation), which is free and open-source. Python is portable, i.e., it is supported by many platforms, e.g., Linux, macOS, Windows PC, FreeBSD, Solaris, OS/2, Amiga, AS/400, BeOS, OS/390, PlayStation, Windows CE, etc.

Python supports various programming paradigms, i.e., both object-oriented and procedure-oriented programming, and is extensible, i.e., it can invoke C and C++ libraries, and can integrate with a multitude of other languages, such as Java and .NET products. Python is the most rapid gainer in AI, with a huge momentum. Its use is ubiquitous to create AI algorithms, machine learning, IoT projects, etc. With Python, the developer doesn’t need to code very much, because there are ready-made packages with algorithms. For instance, PyBrain (for Machine Learning), NumPy (scientific computing), Pandas, etc. can be used, along with a vast range of other libraries.

Apart from Python, another popular programming language used mainly for statistical tasks is called R. This language performs well in analysis and manipulation of incoming data for statistical purposes. R is well known for its publication-quality plots and its compatibility with other programming languages. However, R is less suitable for handling big data analysis tasks because of its memory consumption compared to Python and its speed compared to other programming languages.

R is almost as easy as Python to learn. Both languages are very similar to English in syntax and construction, hence they belong to the easiest to master. They both have an enormous number of libraries to provide all thinkable predefined algorithms, statistical models, data scientific inputs, AI, machine learning with algorithms, NLP, etc.

Moreover, Java is also used in AI, especially for artificial neural networks and genetic programming. Here Java has its benefits with, e.g., simple packaging and debugging, user interaction, and functionality for mega-project scalability and graphics. The latter is one of the outstanding assets of Java with its standard interface and graphics toolkit – the graphical presentation is, of course, a vital part of AI, and will be seen in AIM, especially when applications are directed toward patients and students. Java provides better garbage management tools and stronger multithreading support than Python.

Another alternative language is Lisp, less known, but the oldest and perhaps best-adapted language for AI development. Lisp goes all the way back to the origins of AI, and was introduced by John McCarthy in the late 1950s. It can process symbolic information, supports prototyping, dynamic object creation, and automatic garbage collection, and is deemed easy by developers. However, nearly all of its excellent features have migrated into many other languages, which are more effective, have better packaging, etc. It is the Sanskrit, Swahili, or Latin of AI languages.

SWI Prolog is a language that is relevant in AIM, since it is often used in knowledge base and expert systems – it has features such as pattern matching, tree-based data structuring, and automatic backtracking. This gives a strong and flexible framework for programming, which makes it frequently used in AIM.

Other languages worth mentioning are C++, SAS, JavaScript, MATLAB, and Julia. All of these can be used for AI.

Machine Learning

Machine Learning (ML) is one of the most important instruments in AI. It is a way of feeding data into a machine, so it can make its own decisions. The need for ML is as old as the technical revolution, which has generated immeasurable loads of data. In research we generate over 2.5 quintillion bytes of data per day. In 2020 an estimated 1.7 MB of data was created every second for every person on earth. With this vast amount, models can be created to study and analyze complex data, and insights and more precise results can hence be delivered. In the last 2 years alone, an astonishing 90% of all the world’s data has been created. At the end of 2020, 44 zettabytes (10^21) made up the entire digital universe. 2.5 quintillion (10^18) bytes are created by humanity every day. It is estimated 463 exabytes (10^18) of data will be generated every day by

humans in 2025 [16]. A fact worth remembering is also that we have decided to store this forever, costing us substantial energy.

In a worst-case scenario, computer technology could use as much as 51% of global electricity in 2030. This will happen if not enough improvement in electricity efficiency of wireless access networks and fixed access networks/data centers is possible. However, by 2030, globally generated renewable electricity is likely to exceed the electricity demand of all networks and data centers. Nevertheless, a recent investigation suggests, for the worst-case scenario, that electricity for computer technology usage could contribute up to 23% of the globally released greenhouse gas emissions in 2030 [17].

With machine learning in the finance sector, a wide range of profitable opportunities and avoidable risks can be identified. It is of course an understatement that the equivalent will enter the medical and healthcare domains. The simple foundation of all machine learning, and of AI, is data per se. Data is the solution; we just need to know how to handle it, and ways to do this are ML, deep learning, and AI.

In medicine the needs for machine learning are associated with

• Increased data generation from electronic health records (EHRs), digitalization
• Improvements in decision making, risk prediction
• Uncovering of patterns and trends in data, finding hidden patterns, key data extraction
• Building statistical models, time saving
• Solving of complex problems for humans

Arthur Samuel coined the term Machine Learning in 1959. A later, widely cited definition by Tom Mitchell is that “a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

In other words:

Data > Training the Machine > Building a Model > Predicting Outcome

Machine learning (ML), in essence, is a subset of AI, which provides machines the ability to learn automatically and improve from data pattern analysis. The ML has been programmed in a way that it can adjust its parameters. Some more definitions of terms used in this textbook are the following:

Algorithms – Set of rules and statistical techniques to learn patterns in data, mapping all decisions that a model can take.
Model – A mathematical equation, i.e., a computation or a formula, which is the result of an algorithm that takes some values as input and produces some values as output.
Predictor feature – Variable of the data that helps predict the output, e.g., a physical factor in patients, where a gene, height, lab value, vital sign, or weight can predict a symptom or diagnosis.
Response Variable – The feature, output variable, or target variable that will be predicted.
Training data – The data, which is used to train the ML model.
Testing data – Unseen data used to test the ML model after the training procedure.
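The terms above can be made concrete with a minimal sketch in plain Python. The dataset (body weight as the predictor feature, a hypothetical lab value as the response variable) is invented for illustration, and ordinary least squares is used here only as a simple example of an algorithm; the chapter does not prescribe it.

```python
# Toy data, invented for illustration: (predictor feature, response variable).
# Predictor: body weight in kg; response: a hypothetical lab value.
records = [(50, 3.1), (55, 3.3), (60, 3.6), (65, 3.8), (70, 4.1),
           (75, 4.3), (80, 4.6), (85, 4.8), (90, 5.1), (95, 5.3)]

# Hold out 20% as testing data. Every fifth record is held out so that both
# sets span the same range; a random split is more common in practice.
testing_data = records[2::5]
training_data = [r for r in records if r not in testing_data]


def fit_least_squares(data):
    """Algorithm: ordinary least squares for one predictor feature.

    Returns the model parameters (intercept, slope).
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_least_squares(training_data)


def model(x):
    """Model: the fitted equation, mapping an input value to an output value."""
    return intercept + slope * x


# Evaluate on the unseen testing data with the mean absolute error.
errors = [abs(model(x) - y) for x, y in testing_data]
mean_abs_error = sum(errors) / len(errors)
print(f"model: response = {intercept:.3f} + {slope:.4f} * weight")
print(f"mean absolute error on testing data: {mean_abs_error:.3f}")
```

Here the algorithm is the fitting procedure, while the model is the resulting equation; the testing data plays no part in the fit and is used only to check it.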

The machine learning process involves the construction of a Predictive Model, which is then used to identify a solution to a Problem Statement. Hence the process of ML involves several steps:

Definition of the objective > Data gathering > Data preparation > Data exploration > Model construction built on training > Model testing > Predictions
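The steps above can be sketched end to end in plain Python. Everything specific here is an assumption made for illustration: the toy records, the single predictor (a hypothetical lab value), and the simple threshold rule used as the model. A real project would use real patient data and an established library.

```python
# Step 1. Objective: predict whether a patient recovers (1) or not (0).

# Step 2. Data gathering: invented records (lab_value, recovered).
# None marks a missing measurement.
raw = [(2.1, 1), (2.4, 1), (None, 1), (3.0, 1), (3.2, 1),
       (5.9, 0), (6.3, 0), (6.8, 0), (None, 0), (7.4, 0),
       (2.8, 1), (6.1, 0)]

# Step 3. Data preparation: drop records with missing values.
clean = [(x, y) for x, y in raw if x is not None]


def class_mean(data, label):
    """Mean lab value of all records with the given outcome label."""
    values = [x for x, y in data if y == label]
    return sum(values) / len(values)


# Step 4. Data exploration: compare the mean lab value per outcome.
print("mean lab value, recovered:", round(class_mean(clean, 1), 2))
print("mean lab value, not recovered:", round(class_mean(clean, 0), 2))

# Step 5. Model construction on training data (80% of the cleaned records).
# Records are sorted and every fifth one is held out, so that both outcomes
# appear in the training and in the testing data.
ordered = sorted(clean, key=lambda r: r[0])
testing = ordered[2::5]
training = [r for r in ordered if r not in testing]

# The "model" is a threshold halfway between the two class means.
threshold = (class_mean(training, 1) + class_mean(training, 0)) / 2


def predict(lab_value):
    """Return 1 (recovers) if the lab value is below the learned threshold."""
    return 1 if lab_value < threshold else 0


# Step 6. Model testing: accuracy on the held-out records.
correct = sum(predict(x) == y for x, y in testing)
accuracy = correct / len(testing)

# Step 7. Prediction for a new, unseen patient.
new_patient_prediction = predict(2.5)
print(f"threshold={threshold:.2f}, accuracy={accuracy:.2f}, "
      f"prediction={new_patient_prediction}")
```

With such cleanly separated toy data the accuracy is trivially high; the point of the sketch is only the order of the steps, not the quality of the model.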

An illustration of Machine Learning (ML) [18]

AIM ML case example:



• Definition of Objective
Formulation of the problem idea: e.g., to predict if an infected patient will recover, yes or no.
Q: What target feature: e.g., temperature, blood pressure, or other specific symptoms
Q: What input data: medical records, vital signs history, lab results, etc.
Q: What kind of problem: Binary classification? Clustering? Regression problem?
Hence, at step 1 we define how we will solve the problem, what kind of data we need, what we are trying to predict, and what information is needed to predict it.
• Data Gathering
Data such as temperature curves, lab values, vital signs, physical examinations, massive loads of previous comparable patients’ EHRs, etc. Where can we get this data – gathered manually, scraped from the web, from EHRs, or from other sources? This is the most time-consuming element of the ML process. We create the dataset.
For ML exercises there are a lot of training datasets online, e.g., Kaggle (https://www.kaggle.com/) and Grand Challenge (https://grand-challenge.org/challenges/), where a whole range of sets can be downloaded, having numerous different themes: weather forecasts, economic forecasts, etc. Hence, developers can skip the time-consuming data gathering step, and just download the dataset.
• Curating
Data preparation, or cleaning, involves erasing inconsistencies, e.g., missing factors or redundant information, erasing unnecessary data, format correction, and getting the data ready for analysis. Step 3 is likely the hardest step to perform. It is easy to bias the dataset if, e.g., one factor’s values are missing more frequently. Any mistake will affect the result.
• Exploratory Data Analysis
This is the ML brainstorming step, which involves the understanding of patterns and trends in the data. Insights are concluded, such as correlations between variables. For instance, low blood pressure and alternating fever or other pre-septic signs are factors that the output result will depend on. Exactly how, we have to find out in the patterns of the vast material, i.e., the correlations of such variables.
• Building a Machine Learning Model
A predictive model is created with ML algorithms, e.g., Linear Regression, Decision Trees, etc. This stage always starts with data splitting, into training data and testing data. The former is always used to build the model. Training data is usually 80% of the total.
In this AIM example, we are predicting the outcome of classification variables, also known as categorical variables, i.e., recovered patient yes or no: two alternatives. In this case we can use logistic regression, support vector machines, K nearest neighbor, or Naïve Bayes, etc. Which algorithmic model can be used depends on the problem statement, i.e., it depends on the task to solve. The methods suggested to better approach a specific type of task are the results of a continuous process of trial and error.
• Model Evaluation and Optimization
This step evaluates and tests the model, so it can be improved, parameters tuned, etc. A part of the testing dataset is used as a validation dataset. The accuracy and errors, as a form of ML performance metrics, are calculated.
• Predictions
The final outcome is used to make predictions about the given medical condition, and in this case the outcome will be a categorical one. First you get a probability, and based on a preset range, the clinician can decide whether the answer is yes or no.

Types of Machine Learning

There are basically three types of ML – supervised, unsupervised, and reinforcement learning.

In supervised learning, we train the machine with labeled data. In other words, the labels act like guides. In AIM, we may feed the machine with expert interpretations of, e.g., X-ray images, such as fracture or no fracture of a specific bone. We explicitly train the machine with these labels. This type is suitable for regression and

classification problems. The approach involves mapping labeled input to known output. Applicable algorithms include linear regression, logistic regression, Support Vector Machines, KNN, etc.
In unsupervised learning the machine trains with unlabeled data, which lets the machine work on unguided information. No labels are fed into the procedure. Here the machine classifies by itself; it will identify dissimilarities between an X-ray with a fracture and one without, by noticing a different pixel pattern, other shadows, fracture lines, and other classifying features. It clusters into two groups in this example. This type is suitable for association and clustering problems. The approach is to understand patterns and discover output. Applicable unsupervised algorithms include clustering with hierarchical clustering algorithms, k-means, and mixture models. Furthermore, anomaly detection algorithms such as k-nearest neighbor are widely used in medical image analysis, together with deep neural network approaches such as Deep Belief Nets, autoencoders, and Self-Organizing Maps (SOM).
In reinforcement learning, the agent is placed in an environment and learns how to behave inside these surroundings by performing certain actions and noting what kind of rewards are received for these. An example is how Robinson Crusoe started to adapt to the desolate island. He explored the environment and identified possible rewards and dangers. This type of learning is suitable for reward-based problem solutions. The approach is a trial and error method. Applicable algorithms include Q-learning, SARSA, etc.

Machine Learning Problem Solutions

ML problems can be classified into three types – regression, classification, and clustering problems.
Regression problems – Solved with supervised learning. The output will always be a continuous quantity, i.e., a variable with an infinite range of values, for instance, medical lab results. The body temperature can be 36.8, 36.9, 35.9, etc. The data is variable. The aim is to forecast or predict. It is solved with algorithms like linear regression. Regression is fitting a curve through data.
Classification problems – Solved with supervised learning. The output is always a categorical quantity. The main goal is to determine the data category, for instance, classification into skeletal injury or undamaged structure, or gender, each with two categorical outcomes. On ImageNet there are even 1000 classes. Solved with algorithms such as logistic regression, support vector machines, K nearest neighbor, etc.
Clustering problems – Solved with unsupervised learning. Assigns data points into clusters. The principal aim is to group similar entities into two or more clusters, based on feature similarity, for instance, to find hidden signs of severe disease. Hence, when the data is insufficient, and we don’t know the output of, e.g., two categories, then we form clusters. Solved, e.g., by the algorithm of K-means.

Supervised Learning Algorithms
Below, seven useful supervised learning algorithms are presented.

Fig. 1 Illustration of three machine learning problems; left to right: linear and nonlinear regression, classification, and clustering [19]
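The regression problem described above, fitting a curve through data to predict a continuous quantity, can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming ordinary least squares as the fitting method; the function name `fit_line` and the temperature readings are invented for the example and do not come from the chapter.

```python
# Minimal sketch: fit a straight line y = b0 + b1 * x by ordinary least squares.
# The readings below are invented for illustration only.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: hours since admission vs. body temperature (degrees C)
hours = [0, 1, 2, 3, 4]
temps = [36.8, 37.0, 37.2, 37.4, 37.6]

b0, b1 = fit_line(hours, temps)
predicted = b0 + b1 * 5  # forecast of the continuous outcome at hour 5
print(round(b0, 2), round(b1, 2), round(predicted, 2))
```

With these made-up readings the fitted slope is about 0.2 degrees per hour, so the forecast for hour 5 is roughly 37.8. The output is a continuous number, which is what separates regression from classification.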
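The clustering problem named above, which the text says can be solved with K-means, can be sketched without any external library. The snippet below is an illustrative assumption rather than the authors' implementation: it works on one-dimensional values, takes the starting centroids as an argument, and uses invented lab values. The function name `kmeans` is hypothetical.

```python
# Minimal k-means sketch in plain Python on 1-D values.
# Data and starting centroids are invented for illustration.

def kmeans(points, centroids, max_iter=100):
    """Group 1-D values around the given starting centroids."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical lab values forming two visibly separate groups
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
final_centroids, groups = kmeans(values, centroids=[0.0, 6.0])
print(final_centroids, groups)
```

No labels are supplied anywhere: the two groups emerge purely from the distances between the values, which is the defining property of an unsupervised method.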

• Linear Regression – A method to predict the dependent variable, plotted on the y-axis, based on the values of independent variables along the x-axis. The variables are continuous or discrete. Used for predictions of a continuous quantity. Hence the curve fitting the data is linear. The equation is:

$Y = \beta_0 + \beta_1 X + e$

$Y$ – Dependent variable
$\beta_0$ – Y-intercept
$\beta_1$ – Slope
$X$ – Independent variable
$e$ – Error

For instance, variables with lab results can be imported in a CSV format into Python and then prepared, analyzed, and adequate algorithms applied.
• Logistic Regression – A method used to predict a dependent variable, from a dataset, when the dependent variable is categorical, and the outcome is 1 or 0, e.g., in orthopedic medical imaging, this would correspond to the presence of a bone fracture or no fracture. The function is sigmoid with values that go from 0 to 1. Afterward classification algorithms can be applied. See: https://en.wikipedia.org/wiki/Logistic_regression.
• Decision Tree – This is another classification algorithm, and looks like an inverted tree. It is an ML algorithm where each node signifies a predictor variable (aka feature), the links between the nodes are decisions, and at the end of each branch there is a leaf, which stands for an outcome, or response variable.

An image showing an example of a decision tree [20]

Let us assume we have a large data set of ICU patients, and we would like to predict whether an infected patient faces the risk of a serious complication, e.g., sepsis, or not. Every step in the inverted tree represents a categorical classification step, i.e., a choice between two values in a binary tree. For instance, we use these short queries – “feverish or not,” then “normal blood pressure or not,” “normal pulse or not,” “chills or not,” “affected general condition or not,” etc.
Each node that represents observations is directed through the branches, purely conjunctions of features, to the leaves, also known as targets/labels. This is a common approach in triage or telephone consultation, where the nonmedical operator tries to decide whether a patient should seek the emergency room, see his/her GP the day or just wait.
The most significant clinical factor should be the initial root node, followed by internal nodes, and then the terminal nodes or leaves lead to a suggested outcome. Branches are the answers, yes or no, 1 or 0, etc.
The ID3 algorithm (Iterative Dichotomizer 3) is one of the most effective ways of building the tree within healthcare:

Step 1: Selection of the best attribute (A) – e.g., affected general condition?
Step 2: Assign A as a decision variable for the root node – affected or unaffected patient?
Step 3: For every value A can take, create a descendant node – if yes, if no, etc.
Step 4: Add classification labels to the leaf node
Step 5: If the data results in correct classification, then stop.
Step 6: If not, then iterate.

What best separates the data in a clinical decision tree is of course the most important factor first, the one which classifies the patient group the best. During the construction, it is common to repeatedly try with different variables and analyze which is the most suitable. The two most important factors to consider here are information gain and entropy. The variable that best separates the data into the desired output classes is the variable with the highest information gain.
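Entropy and information gain, named above as the two key factors when choosing the root node, can be computed directly. The sketch below assumes the standard Shannon entropy in bits and a weighted average over the child nodes; the four patient records, the attribute names, and the helper names `entropy` and `information_gain` are hypothetical and serve only to show how the attribute with the highest gain would be selected in Step 1.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result

def information_gain(rows, attribute, target):
    """Entropy of the parent node minus the weighted entropy of its children."""
    parent = entropy([r[target] for r in rows])
    children = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        children += len(subset) / len(rows) * entropy(subset)
    return parent - children

# Hypothetical toy records, invented for illustration
patients = [
    {"fever": "yes", "low_bp": "yes", "sepsis": "yes"},
    {"fever": "yes", "low_bp": "no",  "sepsis": "yes"},
    {"fever": "no",  "low_bp": "yes", "sepsis": "no"},
    {"fever": "no",  "low_bp": "no",  "sepsis": "no"},
]

gains = {a: information_gain(patients, a, "sepsis") for a in ("fever", "low_bp")}
root = max(gains, key=gains.get)  # attribute with the highest information gain
print(gains, root)
```

In this toy set "fever" splits the patients perfectly (gain of 1 bit) while "low_bp" tells us nothing (gain of 0), so "fever" would become the root node.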

The entropy is a measure of the uncertainty or impurity that the data contains. The information gain (IG) signals how much information a specific variable or feature brings us, in regard to the final outcome, which can be concluded with this logic:
E.g., has the patient had a fever?

$p(\text{yes}) = \dfrac{\text{No. of yes outcomes in the parent node}}{\text{total number of outcomes}}$

The entropy can be calculated in a similar manner:

$\text{Entropy}_{\text{parent}} = -\left(p_{\text{yes}} \log_2 p_{\text{yes}} + p_{\text{no}} \log_2 p_{\text{no}}\right)$

In this way, the variables’ data splitting capacities are calculated to deem their suitability in a given place of the decision tree:

$\text{Information Gain} = \text{Entropy}_{\text{parent}} - [\text{weighted average}] \times \text{Entropy}_{\text{children}}$

• Random Forest – Builds multiple decision trees (a forest) and fuses them to achieve a more precise and stable prediction. The forest gives more accuracy, reduces overfitting (when the model also learns the noise or disturbance in the data, which negatively affects its ability to predict from new data), and uses bagging, which means multiple trees test the data. This is suitable for more complex medical situations, e.g., a patient with multiple symptoms. From huge data sets of medical records, bootstrapping can then be executed in a row of circuits to make predictions.

[Figure: Random Forest Simplified – an instance is passed to Tree-1, Tree-2, ... Tree-n, which vote Class-A, Class-B, Class-B; majority voting gives the final class]
An illustration of a Random Forest [21]

With the bootstrapped dataset we build a decision tree, starting at the root node, for which the best attribute is used to split the dataset. For each of the upcoming branch nodes, the process is repeated. For each step, the best attributes are chosen. The iteration is run hundreds of times, and hence a forest of trees arises. The bootstrap dataset is used multiple times.
Finally, the prediction stage is reached. If we intend to predict a medical condition in a patient, we run the data through the forest of decision trees, and after all trees have been used, the classification that the majority of trees have voted for is presented.
The model can be evaluated with the part of the dataset that was not bootstrapped, the so-called out-of-bag dataset.
• Naïve Bayes Classifier – A supervised classification algorithm based on Bayes’ Theorem, which solves classification problems with a probabilistic approach. The main idea is that the predictor variables in an ML model are considered independent of each other. The concept is called naïve, since it does not consider any correlations between the variables. In medical real-world problems, there are often exactly such correlations anyway, but those are disregarded in this model.

The mathematics building up this model calculates the probability for an event to occur based on events in the past:

P(A|B) – the conditional probability of the event A happening, given event B
P(A) – the probability of event A happening
P(B) – the probability of event B happening
P(B|A) – the conditional probability of the event B happening, given event A

which gives

$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$

In a medical example this model could be used to classify, e.g., the likely infectious agent, based

on its diagnostic features – lab tests, microscopy, X-ray, quick tests, clinical signs, etc.

Infectious agent    One lab test   Microscopy   Quick test   Clinical signs
Infection type 1    4500/5000      0            0            5000/5000
Infection type 2    500/5000       5000/5000    4000/5000    0
Infection type 3    5000/5000      0            1000/5000    500/5000

Above, we have 15,000 patients, who can be divided into three groups of 5000 with one specific infection per group. Let’s say we are then given the following observation:

                    One lab test   Microscopy   Quick test   Clinical signs
Observation         Yes            No           Yes          No

To predict whether the infection is of a certain type, Naïve Bayes can be used. P is the probability. H is the hypothesis. C is a specific class. C1, class 1, etc.

$P(H \mid \text{Multiple Evidences}) = \dfrac{P(C_1 \mid H) \times P(C_2 \mid H) \times \dots \times P(C_n \mid H) \times P(H)}{P(\text{Multiple Evidences})}$

All of the above alternatives are calculated, by inserting them separately; hence we receive the probability of which one is the likely infection in the observation. The conditional probability is delivered. There is a plethora of educational and clinically relevant examples online [22].

• K-Nearest Neighbor – KNN is a supervised learning algorithm, also used in unsupervised learning problems, which classifies a new data point into a target class, led by its distance to its closest neighbor points. For instance, medical images are presented to the model, which classifies them, e.g., whether a skin mole is a melanoma or a benign nevus.

KNN is a simple and easily applied ML algorithm of both classification (mainly) and regression type. It is nonparametric, i.e., it makes no assumptions about the data, unlike Naïve Bayes, which assumes there are no relations between the variables. It is also a lazy algorithm, meaning it remembers the training set, instead of learning a discriminative function. The most important feature is that it is based on feature similarity with neighboring data points.
The K value stands for the number of nearest neighbors, within a radius defined by, e.g., Euclidean or Manhattan distances – there are many possible distance metrics. When determining the nature of a novel data point, the number of members of preexisting class categories among its neighbors is crucial. If a new data point had one neighbor of type A and two of type B within its radius, it would be assigned the type B, if K were set to 3 (to include the three closest neighbors). If more neighbors are enclosed, then the type may change. The adequate K value can be calculated with, e.g., the elbow method, see below.
The data points (x, y) in a diagram are separated by a length, which is calculated with the Euclidean distance. Its simple equation is used by KNN to check the closeness of a new data point to its closest neighbors.

Example of K-Nearest Neighbors (KNN). Calculating the Euclidean distance of each point to the unclassified gray point and defining the K number of neighbor points to consider, the classification occurs. For instance, if K = 3, then x will belong to class 2. The x is in the center of the circles, which indicate the distance to a point (the cross). [19]

• Support Vector Machine – This is a classification and regression algorithm, which separates data with the use of hyperplanes. SVM learns from labeled training data. Support Vector Regressors are useful in regression problems; otherwise SVMs are mainly used in classification problems. Nonlinear data can be classified by

SVM with the use of kernel tricks. Nonlinear means the data cannot be separated with a single straight line, as elaborated below.

SVM works by drawing a boundary, a hyperplane, between different classes, and hence separating them in the best way. Support vectors are the data points closest to the hyperplane. The boundary will be drawn based on info on the support vectors. The optimal hyperplane has a maximal distance, i.e., margin, to the support vectors. It is easy as long as the hyperplane separation is linear. If this is not the case, nonlinear SVM is used. Here the kernel trick transforms the data into another dimension, e.g., separating them on the z-axis instead, if a 2D-approach didn’t allow separation with a straight line. In the 3D-space a clear hyperplane might then be visualized, if the data allow.
Python can easily demonstrate and run all of the algorithms mentioned in this introductory part of the book.

Unsupervised Learning Algorithms
The main aim of the unsupervised learning algorithm K-means clustering is to group similar data points into a cluster. The process classifies objects into a predefined number of groups, so they are as dissimilar as possible between the groups, and as similar as possible within the group.
Every cluster has a centroid, from which distances to the objects are calculated. Grouping is then based on minimum distance. When faced with new medical population material, we need to first guess how many clusters there may be, and then provide a centroid for all these. The algorithm then calculates the Euclidean distance of the points from each centroid and assigns the point to the most proximate cluster. Over and over the clusters can be recalculated and new points added, and these steps are repeated until the centroids become the average of the cluster, i.e., iteration until the centroid value doesn’t change.
With the elbow method the most optimal K value for a given problem is calculated. First the sum of squared errors (SSE) is computed for some values of K. The SSE is defined as the sum of the squared distance between each data point, or member, of the cluster and its centroid. The method can be plotted as a graph of the WCSS(k) value (within-cluster sums of squares), using this formula:

$\mathrm{WCSS}(k) = \sum_{j=1}^{k} \sum_{x_i \in \text{cluster } j} \lVert x_i - \bar{x}_j \rVert^2$

Where $\bar{x}_j$ is the sample mean in cluster j
Elbow method equation [23].
The distortion will decrease with an increased number of clusters, i.e., with a higher K value. At one point the curve bends abruptly, like an elbow in the graph. In large AIM studies, it is critical to pick the best number of clusters.

Reinforcement Learning
Reinforcement learning algorithms consist of two main components – agent and environment.
For instance, the RL agent learns from the environment by rewards or failures, as if exposed to a new computer game. The RL agent iterates the game until it masters it. Some main concepts:

Reward (R) – The instant return from the environment as feedback on the last agent action
Policy (π) – The approach the agent uses to decide the next action
Value (V) – The expected long-term reward with discount, in opposition to short-term reward
Action-value (Q) – Like Value, but includes an extra parameter, the current action (A)

The Reward Maximization Theory states that an RL agent must be trained in such a manner that it takes the best action, so that the reward is maximized.
Discounting means that future rewards count for less than immediate ones. It is measured with a gamma value between 0 and 1; the smaller the gamma value, the larger the discount.
Two other very important terms are exploration and exploitation, which form a trade-off. The former is about exploring the environment to gather new information, the latter about exploiting what is already known about the environment.
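The concepts listed above (reward, action-value Q, discount gamma, exploration) can be tied together in a small tabular Q-learning sketch. Everything specific here is an assumption made for illustration: a five-cell corridor with the exit at cell 4, a reward of 100 for reaching the exit, gamma set to 0.8, purely random exploration, and the simplified update rule Q(state, action) = R(state, action) + gamma × max Q(next state, all actions). The names `step` and `train` are hypothetical.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the exit (goal)
GOAL = 4
ACTIONS = (-1, +1)  # column 0: move left, column 1: move right
GAMMA = 0.8         # discount factor (assumed value)

def step(state, action):
    """Return (next_state, reward) for one move along the corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 100 if next_state == GOAL else 0
    return next_state, reward

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # Q matrix: one row per state, one column per action, all zeros at first
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(GOAL)          # random non-goal starting cell
        while state != GOAL:
            a = rng.randrange(len(ACTIONS))  # pure exploration: random action
            next_state, reward = step(state, ACTIONS[a])
            # Simplified update: immediate reward plus discounted best future value
            q[state][a] = reward + GAMMA * max(q[next_state])
            state = next_state
    return q

q_table = train()
# Exploitation: in each cell pick the action with the larger Q value
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:GOAL]]
print(policy)
```

After training, the value of moving right shrinks by the factor gamma for every cell further from the exit, which shows how discounting spreads the final reward backwards through the Q matrix.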

Another concept is the Markov Decision Process, which is the mathematical approach for mapping a solution in reinforcement learning, and includes the following parameters:

Set of actions, A
Set of states, S
Reward, R
Policy, π
Value, V

For example, this can be shown in the shortest path problem, i.e., to calculate the shortest path between two nodes, with the minimum possible cost.
Q-Learning algorithms are among the most important examples of reinforcement learning, which can be used to reinforce, e.g., the pathways taken by an agent, which is taught to get out of a labyrinth with several rooms, if this is the set goal.
The memory learnt by the agent with experience is represented as a Q matrix, where the rows represent the current agent state and the columns the possible actions leading to the next state, which results in the final formula:

$Q(\text{state}, \text{action}) = R(\text{state}, \text{action}) + \gamma \times \max\left[Q(\text{next state}, \text{all actions})\right]$

The γ (gamma) parameter ranges, as said, from 0 to 1. If γ is closer to 0, the agent will consider immediate rewards, and if closer to 1, the agent will weigh future rewards greater.
Loosely speaking, γ closer to 0 favors exploiting immediate rewards, while γ closer to 1 favors exploring for future rewards.
With these words I round off this simple mathematical review, and tilt our focus to other general considerations on AI, ML and DL, e.g., the definitions and their limitations, which come next.

Limitations of Machine Learning

ML is first of all not well suited to handling high-dimensional data, where the input and output are very large, since this complex data consumes resources and causes the curse of dimensionality as new dimensions are added. Just going from 1D to 2D and to 3D places quite new demands. And in a real-life situation such as in AIM there can be thousands of dimensions.
One of the big challenges with traditional ML models is a process named feature extraction, involved in object recognition, handwriting recognition, and natural language processing, which consumes a lot of resources. In these types of tasks, the dataset size has a big impact on machine learning performance. It is necessary to have a larger dataset for learning such varied and complex data. Therefore, the usage of deep learning approaches, such as CNN with hundreds of millions of trainable weights, is required.

Introduction of Deep Learning

Deep learning (DL) models are capable of focusing by themselves on feature extraction with very little guidance from the programmer, and can especially solve the dimensionality problem – to avoid the curse of dimensionality [24]. The main idea is to imitate the structure of the brain. DL models learn by themselves automatically, and they can quickly identify the decisive factors.
For instance, DL can be used for facial recognition. If ML is used, all specific elements of a face must be separately defined; but the neural networks of DL will automatically detect the key features of each individual.
Deep learning is a form of ML, which uses a computing model hugely inspired by the cerebral structure, where dendrites receive signals to the neuron from other neurons, the neuronal cell bodies summarize all the inputs, and the axons transmit these to other cells, by firing off at a certain threshold, etc.
In DL the neurons are represented by artificial neurons, also called perceptrons. They hence receive input of data from multiple sources, and deliver an output. The input and output are exactly like predictor variables. Data fed to a perceptron will undergo different functions and transformations, and then give an output. The perceptrons are connected into an artificial neural network. DL is a collection of statistical ML techniques, which

learn hierarchical feature structures, based on the concept of artificial neural networks.
In this neural network structure, there are different types of layers – the input layer receiving all the inputs, and the output layer delivering the result. Between these layers are a number of hidden layers. The number of perceptrons in each layer and the number of layers depend on the problem in question.
For image recognition in radiology, at first the high-dimensional data is presented to the input layer, which in this case, due to the massive load, will contain multiple sublayers of artificial input neurons to digest the entire input. The output received from the first input layer contains extractable patterns, identifying the edges of the images, based on contrast levels. This data will be relayed to the first hidden layer, which will be able to identify certain clinically relevant features (or in facial recognition, facial features). Further transfer to the second hidden layer will make it possible to create a fuller picture (or the entire face). If the deeper recognition’s entirety is sufficient (and no more layers are needed), then the data is transferred to the output layer to deliver the classification.

Single-Layer Perceptrons

An Artificial Neural Network (ANN) with a single layer used for binary classification tasks is a linear model. The ANN, like the neuron, has a set of inputs, and each of these is assigned a certain weight; it computes some specific functions on these weighted inputs and produces an output. It separates the data into two separate classes (low or high).

Principle of single-layer perceptron [25]

The perceptron above receives different inputs X1, X2, ... Xn, etc., and biases, and those are weighted according to W1, W2, ... Wn, Y1, Y2, ... Yn.
The term activation function alludes to its equivalent in the human cerebral cellular function with the action potential, where the neurons are activated at a certain threshold. It is mirrored here mathematically, when the function becomes saturated above a given threshold.
There are several types of activation functions, e.g., signum, sigmoid, tanh, etc. All functions map the input to the respective output. The bias can shift the activation function in order to get an exact output.
As a medical perceptron analogy – should a person at the ER be sent to the ICU or not? There are three cardinal symptoms, X1–3, and those have an importance or weight of W1–3. If the symptom is present, X attains the value of 1, and if not, 0. W1 is set to 6, W2 to 2, and W3 to 2. Then the firing threshold is considered. If the threshold is 5, it means the patient will be admitted if symptom 1 is present, even if the two others are not. If the threshold is 3, then either symptom 1 or the two other symptoms together will be sufficient for patient transfer.

X1 – symptom 1 with W1 of 6
X2 – symptom 2 with W2 of 2
X3 – symptom 3 with W3 of 2

The limitations of perceptrons are associated with the absence of hidden layers, with the consequence that they can only be applied to linear problems. When dealing with nonlinearly separable data a single-layer approach is not feasible. A multilayer perceptron with backpropagation can, though, be used to solve nonlinear problems.

Multilayer Perceptron – Artificial Neural Network
A multilayer perceptron has the same structure as a single-layer perceptron, but with one or several hidden layers, and hence is considered a deep neural network.
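The ER-to-ICU analogy above translates directly into a few lines of code. The sketch below uses the weights 6, 2, and 2 and the thresholds 5 and 3 from the text; the assumption that the perceptron fires when the weighted sum is greater than or equal to the threshold is ours, since the text does not state whether equality counts, and the function name `perceptron_fires` is hypothetical.

```python
def perceptron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs reaches the threshold.

    Assumption: "reaches" means greater than or equal to the threshold.
    """
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum >= threshold

WEIGHTS = (6, 2, 2)  # W1, W2, W3 from the triage example

# Each tuple is (X1, X2, X3): 1 if the symptom is present, 0 if not
cases = {
    "symptom 1 only": (1, 0, 0),
    "symptoms 2 and 3": (0, 1, 1),
    "symptom 2 only": (0, 1, 0),
}

for threshold in (5, 3):
    for name, symptoms in cases.items():
        decision = perceptron_fires(symptoms, WEIGHTS, threshold)
        print(f"threshold {threshold}, {name}: transfer = {decision}")
```

With threshold 5 only symptom 1 is enough for transfer; with threshold 3 the two minor symptoms together (weighted sum 4) also suffice, while a single minor symptom (weighted sum 2) does not.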
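Why a hidden layer is needed for nonlinearly separable data can be illustrated with the classic XOR problem, which is not taken from the chapter but is a common textbook case. In the sketch below the hidden-layer weights are picked by hand rather than learned by backpropagation, so it only shows that a two-layer network can represent XOR; the small grid search over single-layer weights is a finite check, not a proof. All names are hypothetical.

```python
def step(value, threshold):
    """Step activation: 1 when the value reaches the threshold, else 0."""
    return 1 if value >= threshold else 0

def single_layer(x1, x2, w1, w2, threshold):
    """One perceptron with two weighted inputs."""
    return step(w1 * x1 + w2 * x2, threshold)

def two_layer_xor(x1, x2):
    """Two hidden perceptrons feeding one output perceptron (hand-set weights)."""
    h_or = step(x1 + x2, 1)    # fires if at least one input is on
    h_and = step(x1 + x2, 2)   # fires only if both inputs are on
    return step(h_or - 2 * h_and, 1)  # "OR but not AND"

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Try every integer weight and threshold from -3 to 3 for a single perceptron
grid = range(-3, 4)
single_layer_solves_xor = any(
    all(single_layer(x1, x2, w1, w2, t) == y for (x1, x2), y in XOR.items())
    for w1 in grid for w2 in grid for t in grid
)
two_layer_solves_xor = all(
    two_layer_xor(x1, x2) == y for (x1, x2), y in XOR.items()
)
print(single_layer_solves_xor, two_layer_solves_xor)
```

No single perceptron in the searched grid reproduces XOR, whereas the network with one hidden layer does. In practice the weights would be found by training rather than set by hand.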

[Figure: input layer (X1 ... XN), hidden layer(s), output layer (y1 ... yN)]
A multilayer Artificial Neural Network (ANN) [26]

The weights between the layers of the deep network are the principal way in which long-term info is stored in artificial neural networks. The ANN learns new info by altering the weights or keeping them up to date. A set of new inputs passes through the first hidden layer, and then to the next as shown in the figure. Initially in an ANN the weights are assigned at random. With backpropagation, which is the quintessence of supervised neural network training, the weights of a neural network are fine-tuned. The tuning is based on the error rate obtained in the previous epoch (or iteration). Done properly this reduces the error rates and makes the model reliable by increasing its generalization. In order to have a correct output and a reduced error rate, the weights need to be adequate. Backpropagation is a way to train the perceptron by updating the weights.
The most common DL algorithm for supervised training of a multilayer perceptron is backpropagation. After the forward pass through the whole network, the error is propagated backward and the weights are updated to minimize the error (compared to the wanted output). The procedure is iterated using gradient descent to do it faster, so that the errors/losses are reduced and the output optimized.
The limitation of a feed-forward network can be exemplified with, e.g., image recognition. The network is exposed to a set of images, where the first photo presented will not really change the way it classifies the next photo. So the output at time T is independent of the output at time T-1. This cannot be a logical approach if the T-1 information is necessary to understand the next information coming at T. Here we need a concept, which is the same as you apply when you read this book – you need to understand parts I–III before you proceed to the systematic review in part IV. In a feed-forward ANN the info at T + 1 has nothing to do with the info at T or T-1. Hence, such an ANN cannot be used in situations where the output is based on previous outputs, e.g., for predictions of words in a sentence.
The solution to this problem is a Recurrent Neural Network (RNN), which is a type of ANN designed to recognize patterns in series of data, e.g., genomes, text, speech, handwriting, or numerical time series from medical sensors, or from stock markets, government agencies, etc. The applications are vast for RNNs. The prediction is based on past output.

[Figure: CNN layer diagram with convolution and max pooling stages followed by two dense layers of 4096 units]
Fig. 2 A Convolutional Neural Network (CNN) architecture, with (fully connected) dense layer of 4096 units, derived from the last max-pool layer to the right (dimensionality change) [27]

Another concept is the Convolutional Neural Network (CNN), which is useful for image processing. A computer sees an image by breaking it down to three color channels, red-green-blue

(RGB), and reads the image with RGB values. Each of the channels is mapped with the pixels of the image. (The computer calculates the value of each pixel, and delivers the image size). In a CNN, a neuron will only be connected to a small portion of the previous layer. The CNNs are not fully connected.

Natural Language Processing

The main reason for the need of text mining and Natural Language Processing (NLP) is the stratospheric amount of data generated every day, and it is expected to grow. For instance:

Instagram   8.95 million photos posted/day
Twitter     500 million tweets/day
Facebook    3.2 billion likes and 350 million photos/day
Mail        300 billion emails sent/day

The vast majority of all data is completely unstructured. By mining, structuring, and analyzing the data, huge scientific and economic values can be harvested. This is the essence of text mining – the process of deriving meaningful data out of natural language. NLP is a method used in text mining, and is the part of computer science and AI which deals with human languages and helps computers read human texts. A few applications of NLP and text mining are: auto-correction, auto-completion in searches, spam email detection, predictive typing, spell-checkers, and email classification. Sentiment analysis provides insights into the public or customer opinions on certain topics or products. Social media platforms use this analysis continuously. Chatbots are widely used in, e.g., customer services. Speech recognition systems in use, such as Siri, Cortana, and Alexa, are all NLP applications. Machine translation is also an example of NLP, for instance, in use by Google Translate. Other NLP applications include information extraction, keyword search, and advertisement matching.
The Generative Pre-trained Transformer (GPT-3) is an autoregressive language model, which uses deep learning to produce human-like text. In the GPT series, it is the third-generation language prediction model, and was created by OpenAI in San Francisco [28]. The capacity of GPT-3 is 175 billion machine learning parameters, and it is currently the largest example of a natural language processing (NLP) model [29].
A major concept in NLP is tokenization, which is the process of splitting the whole data (corpus) into smaller items (chunks). First it breaks a complex sentence into words, then analyses each single word’s importance in the sentence and then translates. Stemming is the procedure of normalizing the word into its root or base or nominative state. The stemming algorithm provides this, with limitations though. In order to improve the accuracy, lemmatization uses a large dictionary to group different inflected forms of a word under one base form, i.e., the lemma. Similar to stemming it labels a group of words into a common bundle. The output is then a usable proper word. Stemming may lead to an indiscriminate cutting of the word, so that the grammar or the understanding is lost, hence the introduction of lemmatization, which always delivers a word found in a dictionary. Lemmatization occurs after the morphological analysis.
Handling stop words is critical to applications – they can be removed with no loss of meaning, in order to focus on the keywords. Left in place, the stop words would only decrease search accuracy.
Another concept, useful in natural language processing (NLP), is the document-term matrix, which is a mathematical matrix that describes the frequency of terms occurring in a set of documents. Columns correspond to terms and rows to documents. Features can also be used besides terms [30].
Natural language processing has enormous potential for text mining of electronic health records and other medical applications, as will be elaborated in many other chapters of this reference textbook.

Conclusions

In conclusion, machine learning and deep learning algorithms have been widely implemented in various sectors, with noticeable, exponentially growing demand for applications.
There will be no single area in preclinical medicine or clinical specialty, which will not be

profoundly affected by applications of artificial intelligence in medicine. AI will thoroughly recast all fields of healthcare.
The era of mathematical medicine, which can be said to have started with Archibald Pitcairne already in the seventeenth century [2], now comes into full fruition with the emerging era of AI in medicine.
This book encompasses the future potential of medicine and the benefits that can be unleashed when AI platforms can unlock the strengths of Big Data for Healthcare.

References

1. Sparkes B. The Red and The Black: studies in Greek pottery. Routledge; 1996. p. 124. ISBN 0-415-12661-4, ISBN 978-0-415-12661-8; two late fifth-century vase paintings depicting the death of Talos are discussed by Robertson M. The death of Talos. J Hellenic Stud. 1977;97:159f.
2. Ashrafian H. Mathematics in medicine: the 300-year legacy of iatro-mathematics. Lancet. 2013;382(9907):1780.
3. Guerrini A. Archibald Pitcairne and Newtonian medicine. Med Hist. 1987;31:70–83.
4. Iatro-mathematics. Lancet. 1920;196:610–11.
5. Putting Watson to Work: Watson in healthcare. IBM. Retrieved 11 Nov 2013.
6. IBM Watson helps fight cancer with evidence-based diagnosis and treatment suggestions. IBM. Retrieved 12 Nov 2013.
7. Saxena M. IBM Watson progress and 2013 Roadmap (Slide 7). IBM; 2013. Retrieved 12 Nov 2013.
8. Wakeman N. IBM’s Watson heads to medical school. Washington Technology; 2011. Retrieved 19 Feb 2011.
9. Upbin B. IBM’s Watson gets its first piece of business in healthcare. Forbes; 2013, February 8.
10. Miliard M. Watson heads to medical school: Cleveland Clinic, IBM Send Supercomputer to College. Healthcare IT News; 2012, October 30. Retrieved 11 Nov 2013.
11. Ghosh S. Google is consolidating DeepMind’s healthcare AI business under its new Google Health unit. Business Insider. Retrieved 30 Jan 2020.
12. Baraniuk C. Google’s DeepMind to peek at NHS eye scans for disease analysis. BBC; 2016, July 6. Retrieved 6 July 2016.
13. Baraniuk C. Google DeepMind targets NHS head and neck cancer treatment. BBC; 2016, August 16. Retrieved 5 Sept 2016.
14. Marr B. https://www.bernardmarr.com/default.asp?contentID=1373. Accessed 8 Feb 2021.
15. https://medium.com/better-programming/pythons-advantages-and-disadvantages-summarized-212b5fdf8883. Accessed 8 Feb 2021.
16. The 6th DOMO Report. Domo.com, 2018.
17. Andrae A, Edler T. On global electricity usage of communication technology: trends to 2030. Challenges. 2015;6:117–57.
18. Illustration by Nilay Nishit, Birla Institute of Technology, Mesra, India, May 2019.
19. Illustration by Federica Aresu, KTH, and Niklas Lidströmer, Karolinska Institute, Stockholm, Sweden, February 2021.
20. https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm, open-source image.
21. Illustration by Venkata Jagannath, TIBCO Spotfire, http://community.tibco.com. Accessed 12 Feb 2021, released under a free license on Wikipedia.
22. http://Machinelearningmastery.com. Accessed 11 Feb 2021.
23. Oliver Carloni, SemSpirit.com, Research engines in artificial intelligence. His website presents profound and comprehensive guidance on, and illustrations of, most of the useful calculations for AI.
24. Poggio T, Liao Q. Theory I: Deep networks and the curse of dimensionality. Bull Polish Acad Sci Tech Sci. 2018;66(6):761–73.
25. Rosenblatt F. The perceptron, a perceiving and recognizing automaton (Project Para). Cornell Aeronautical Laboratory; 1957.
26. Image by Kiyoshi Kawaguchi, The University of Texas at El Paso College of Engineering, Electrical & Computer Engineering, utep.edu.
27. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems (NIPS); 2012.
28. Sagar R. OpenAI releases GPT-3, the largest model so far. Analytics India Magazine. 2020, June 3. Retrieved 31 July 2020.
29. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D. Language models are few-shot learners. 2020;arXiv:2005.14165.
30. Document-feature matrix: Tutorials for quanteda. http://tutorials.quanteda.io. Accessed 11 Feb 2021.
