
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Week 7: Tools and Resources and Interdisciplinary Inspiration

Video 7.2: Interdisciplinary Inspiration
Mathematics and Statistics are Necessary Foundations (not only Inspiration)
Many parts of Mathematics are crucial for Machine Learning.
Vector and Matrix fundamentals
Elementary Matrices, Vectors, Linear mappings, Inner and Outer Products, Measures of Similarity in Vector
Spaces, Differentiation of matrices and Vectors, The Chain Rule, Multidimensional Taylor Series Expansions,
The Pseudoinverse of a matrix and Least Squares Techniques, Eigenvalues and Eigenvectors, Tensors
Geometry for state-space visualization
Geometric interpretation of mappings, Hypercubes, Decision regions & boundaries, Discriminant functions.
Quadratic surfaces and boundaries
Optimization
Gradient Descent-based procedures, error function contours and trajectories
Graphs and Digraphs
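As a minimal illustration of one of the tools listed above, the sketch below (assuming NumPy is available) solves a small least-squares problem via the Moore-Penrose pseudoinverse; the data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical design matrix X (4 samples, 2 features) and targets y.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.1, 2.9, 4.2])

# Least-squares solution w = pinv(X) @ y, which equals (X^T X)^(-1) X^T y
# when X has full column rank.
w = np.linalg.pinv(X) @ y
print("least-squares weights:", w)
```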

In a similar fashion, Statistics is a cornerstone of Machine Learning.

Statistics has traditionally dealt with data collection, data modelling, data analysis, data interpretation and data
presentation. The core theoretical part of statistics, mathematical statistics, is based on Probability Theory, a branch
of Mathematics. Machine Learning depends mostly on inferential statistics, which draws conclusions
about populations from studies of samples, in contrast to descriptive statistics, which primarily summarizes samples.
Areas of particular interest are Stochastic processes (e.g. Markov processes), Bayesian methods and Monte Carlo
methods.
Inspiration from Theoretical Philosophy and Logic
The classical modes of reasoning inherited from theoretical philosophy and logic have obviously had a
strong impact on artificial intelligence in general and machine learning in particular:
• Deduction as a basis for inductive logic programming
• Induction as a foundation and general source of inspiration for most parts of machine learning
• Abduction as a basis for explanation-based learning and backward reasoning in Bayesian networks
• Analogy as a basis for case-based reasoning and associative memories.

The couplings between Logic/Theorem proving and certain parts of Artificial Intelligence are very strong.

Specific examples that have been mentioned during the course are the following:
• The Logic Theorist program by Newell and Simon proved theorems from Principia Mathematica
• McCulloch and Pitts based their analysis of the capability of a neuron for performing logical operations
on a specific logic notation introduced by Carnap
• SLD resolution theorem proving was a basis for Logic Programming.
• The Lambda calculus was the basis for the LISP language.
Inspiration from Linguistics

Many aspects of Language, including the Learning of Language, are central to Artificial Intelligence.

The theoretical and structural view of Language had its major proponent in Noam Chomsky,
who published his classical work 'Syntactic Structures' in 1957.

This view of Linguistics went hand in hand with the developments in Computer Science and Artificial Intelligence
during the same period.

It was also well aligned with the movement away from Behaviourism and towards Cognitivism in psychology.

George Lakoff widened Chomsky's work into the realm of Semantics, also highlighting the importance of cultural
differences and embodiment.
Inspiration from Psychology
The various approaches in Machine Learning reflect a long ongoing debate in Psychology,
where the pendulum in the last century has swung between Reductionism and Cognitivism.

Reductionism strives to reduce explanations of all mental behaviour to low-level neurophysiological
processes (to be related to Artificial Neural Network approaches in Machine Learning), while Cognitivism (Miller, Broadbent,
Cherry and Bruner) argues for the relevance and existence of an abstract model of Cognition in terms of Symbols, Concepts and
logically related inferences (to be related to Symbolic Machine Learning approaches).

In between these extremes, one finds the intermediate standpoints of:


• Structuralism (Wundt). Structuralism as a school of psychology seeks to analyze the mind in terms of the simplest
definable components and how these components fit together to form more complex experiences as well as how they
correlate to physical events. The evidence is based on introspection and self-reports.
• Functionalism (James and Dewey) is a psychological school of thought that was a direct outgrowth of Darwinian thinking,
which focuses attention on the utility and purpose of behavior that has been modified over years of human
existence.
• Behaviourism (Watson, Skinner and Thorndike): assumes that all behaviors are either reflexes produced by a response
to certain stimuli in the environment, or a consequence of the individual's history, including especially
reinforcement and punishment, together with the individual's current motivational state and controlling stimuli. The
parallel to the perspectives in reinforcement learning is obvious.
• Gestalt psychology (Köhler and Wertheimer) The central principle of gestalt psychology is that the mind forms a
global whole with self-organizing tendencies.
Inspiration from Philosophy, Psychology and Anthropology combined
Categorization (Classification, Concept formation)
The Classical view of categories (with old roots but exemplified by the work of e.g. Bruner):
- Categories are arbitrary. Typically the ways we choose to categorize objects are culturally and linguistically based
- Categories have defining attributes (feature/value combinations). All members share them, no nonmembers share
them, and there is no overlap between members and nonmembers.
- The Intension (set of attributes) determines the extension of a category (its members)
- The member space has no internal structure and all members are regarded as equal, first-class citizens.
- The levels in a hierarchy (lattice) of categories have the same status.

The Modern or natural view of categories (prototype theory):


- Categories are in many cases motivated by properties of our sensory system and the world surrounding us.
- Categories are built around central members or prototypes defined by sharing more attributes with other
members than with nonmembers
- Members are graded based on typicality
- Obviously typicality generates a topology of the member space
- The borders between categories are fuzzy rather than crisp.
- The levels in a hierarchy (lattice) of categories have different status, with the middle layer being most fundamental to everyday
thinking (the basic level) in contrast to more abstract (superordinate) and more detailed (subordinate) levels.

The shift from the classical view to a modern or mixed view was triggered by the work of Brown, Lenneberg, Mervis, Rosch, Berlin and
Kay. The most extensive studies were made on color terms (Berlin and colleagues studied color terms in 98 languages).

The modern view of categorization has strongly influenced subareas such as Instance-based learning (the k-nearest neighbour approach) and
some forms of clustering analysis (the k-means approach).
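As a minimal, hedged sketch of how the prototype view maps onto these techniques, the code below builds k-means-style prototypes as category means and grades members by distance-based typicality; the data, category names and the typicality formula are illustrative assumptions, not part of the original work.

```python
import numpy as np

# Illustrative 2-D feature vectors for members of two hypothetical categories.
category_a = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]])
category_b = np.array([[4.0, 4.2], [3.8, 4.0], [4.1, 3.9]])

# Prototype = mean of the members (a k-means-style centroid).
proto_a = category_a.mean(axis=0)
proto_b = category_b.mean(axis=0)

def typicality(x, prototype):
    # Graded membership: the closer to the prototype, the more typical.
    return 1.0 / (1.0 + np.linalg.norm(x - prototype))

x = np.array([1.5, 1.3])
print("typicality w.r.t. A:", typicality(x, proto_a))
print("typicality w.r.t. B:", typicality(x, proto_b))
```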
Inspiration from Neuro Science
The objects of study for Neuro Science are the nervous systems of animals in general and humans
in particular. A special focus is also on the brain.

The atoms of the nervous system are neurons. A neuron is an electrically excitable cell that receives,
processes, and transmits information through electrical and chemical signals. In general, the
nervous system is analogue, not digital.

A typical neuron consists of a cell body (soma), dendrites (input channels), and an axon (output
channel). The excitation of an output signal from a neuron is called 'firing'. An axon from a
presynaptic neuron meets the dendrites of a postsynaptic neuron at a synapse.
The human brain is assumed to have on the order of 100 billion neurons and roughly 10,000 times as many
connections, with the potential for large-scale parallel activity.

Two major questions discussed in Neuro Science and Psychology for almost a century are:
• the belief that brain functions contributing to specific behaviours are primarily local to limited
brain areas (the Principle of Locality) in contrast to the belief that large portions of the brain
contribute to all kinds of behaviours (Holism)
• the belief that cognitive models of behaviour are just a figment of our imagination and that all
that exists is the myriad of atomic neuron activations (Reductionism)

The above big questions will be further elaborated by looking in more detail at:
• The work by Karl Lashley
• The work by Donald Hebb
• The work by David Hubel, Torsten Wiesel
• The works by Roger Sperry, Alexander Luria, Eric Kandel, Karl Pribram
The work by Karl Lashley
Karl Lashley was one of the most influential neuroscientists of the first part of the 20th century. He started as a student of
the father of Behaviourism but developed into the most clear-sighted proponent of a balanced view between Holism and
Localization. Furthermore, Lashley, in spite of being a scrupulous experimentalist, seriously questioned the more extreme
beliefs in reductionism. He moved the focus from a mostly passive brain primarily triggered by external stimuli toward a
view of an almost always active brain with central and hierarchic control that proactively accommodates to external input.

At the core of Lashley's thinking were the concepts of:


• Equipotentiality – meaning that large areas of the brain potentially have the ability to contribute to specific behaviours
• Law of mass action – meaning that the consequences of brain damage are proportional to the amount of tissue
damaged rather than to the exact localization of the damaged part
• Plasticity – meaning that the contributions to specific behaviours can be taken over by alternative brain regions in the
case of damage to specific regions.

Lashley also seriously researched the possibilities for sharply localizing the manifestation of singular concepts or memories in
the brain, what he termed the search for the Engram. Lashley's conclusion was that this search was as fruitless as the
search for the holy grail, as any functionality seemed to be spread over large areas of the brain, but his standpoint was
balanced enough to also serve as a basis for the proponents of symbolism (e.g. Newell).
The work by Donald Hebb
Donald Hebb published in 1949 a neuroscientific theory claiming that an increase in synaptic efficacy arises from
a presynaptic cell's repeated and persistent stimulation of a postsynaptic cell. The theory is alternatively called Hebb's
theory, Hebb's rule, Hebb's postulate or the Cell Assembly theory. The theory is often summarized as "Cells that fire
together wire together."

Hebb expressed himself as follows:


'Let us assume that the persistence or repetition of a reverberatory activity (or "trace") tends to induce lasting cellular changes that add to its
stability. ... When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or
metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.'

Hebb’s Law can be represented in the form of two rules:


• If two neurons on either side of a connection (synapse) are activated synchronously, then the weight of that
connection is increased.
• If two neurons on either side of a connection (synapse) are activated asynchronously, then the weight of that
connection is decreased.

Hebb described the overall learning phenomena of the brain as a combination of:
- the local learning enabled directly by Hebb's law above, and
- more holistic learning in the sense of sequences or complex structures of synaptic connections being adapted.
So in this sense, Hebb had a balanced view in the locality/holism debate.
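A minimal sketch of the two rules stated above, assuming NumPy, a made-up learning rate and a bipolar (+1/-1) coding of activity: the outer product strengthens connections between synchronously active units and weakens asynchronous ones.

```python
import numpy as np

def hebbian_update(weights, pre, post, lr=0.1):
    """One Hebbian step: pre and post are activity vectors coded as +1
    (active) / -1 (inactive). The outer product is positive when the two
    sides of a connection are active together (weight increased) and
    negative when they are active asynchronously (weight decreased)."""
    return weights + lr * np.outer(post, pre)

# Illustrative example: 3 presynaptic and 2 postsynaptic units.
w = np.zeros((2, 3))
pre = np.array([+1, -1, +1])
post = np.array([+1, +1])
w = hebbian_update(w, pre, post)
print(w)
```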
The work by David Hubel and Torsten Wiesel
Nobel prize awarded work by David Hubel and Torsten Wiesel in the 1950s and 1960s showed that cat and monkey
visual cortexes contain neurons that individually respond to small regions of the visual field.

Provided the eyes are not moving, the regions of the visual field within which visual stimuli affect the firing of
single neurons are called receptive fields. Neighboring neurons have similar and overlapping receptive fields. Receptive
field sizes and locations vary systematically to form a complete map of visual space. The response of a specific neuron
to a subset of stimuli within its receptive field is called neuronal tuning.

Hubel and Wiesel identified two basic visual cell types in the brain:
• simple cells, whose output is maximized by straight edges having particular orientations within their receptive
field. Neurons of this kind are located in the earlier visual areas.
• complex cells, which have larger receptive fields, whose output is insensitive to the exact position of the edges in
the field. In the higher visual areas, neurons have complex tuning. For example, in the inferior temporal cortex, a
neuron may fire only when a certain face appears in its receptive field.
Hubel and Wiesel also proposed a cascading model of these two types of cells for use in pattern recognition tasks.

The work by Hubel and Wiesel strengthens the belief in the importance of the combination of the principles of locality
and specificity.
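As an informal illustration of the cascade idea (not Hubel and Wiesel's own model), the sketch below treats a 'simple cell' as a thresholded match against an oriented edge template at one position, and a 'complex cell' as the maximum of simple-cell responses over positions, which makes its output insensitive to the exact edge location; the template, image and threshold are made-up assumptions.

```python
import numpy as np

# Illustrative vertical-edge template (the "preferred orientation").
template = np.array([[-1, 1],
                     [-1, 1]])

def simple_cell(patch, threshold=1.5):
    # Responds only when an edge of the preferred orientation appears
    # at this exact position in its small receptive field.
    return max(0.0, float(np.sum(patch * template)) - threshold)

def complex_cell(image):
    # Pools simple-cell responses over positions: sensitive to the edge
    # orientation but not to its exact location.
    h, w = image.shape
    responses = [simple_cell(image[i:i + 2, j:j + 2])
                 for i in range(h - 1) for j in range(w - 1)]
    return max(responses)

# A small made-up image containing a vertical edge.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
print(complex_cell(img))
```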
The Human Visual System

[Figure: schematic of the human visual system, showing the dorsal stream (Eye → LGN → V1 → V2 → V3/V3A → V5 → posterior parietal cortex, with the superior colliculus and STS) and the ventral stream (V1 → V2 → V4 → TEO → TE), i.e. striate cortex → extrastriate cortex → inferior temporal cortex.]
The works by Roger Sperry, Alexander Luria, Eric Kandel,
Fernando Nottebohm and Karl Pribram
All the work mentioned below sheds some further light on the two main issues highlighted: Localization/Holism and the
standpoint with respect to reductionism.
Roger Sperry is best known for his work on the separate functionalities of the two brain hemispheres, left and
right, where the left has a stronger role in language and conceptually oriented tasks while the right has a bias towards
spatial functions and more detailed discrimination functions. Sperry also studied plasticity phenomena between the
two hemispheres in the presence of certain kinds of brain damage and in different periods of life.
Alexander Luria and his colleagues looked into the seemingly hierarchical organization of different brain regions
in their contributions to evoking specific behaviours. They also showed how the interplay of roles among brain regions may
change over time from childhood to adulthood. It seems that in younger individuals sensory regions dominate,
while in older individuals the more planning-oriented regions have the upper hand.
Eric Kandel and colleagues made important contributions regarding the relationship between simple neuron functions
and the functionality of an organism as a whole by systematically studying very simple organisms in a comprehensive
way.
Fernando Nottebohm published some suggestive work on the neural localization of the functionalities enabling bird
song, adding evidence for the Localization standpoint.
Karl Pribram on the other hand argued for the Holistic standpoint through his famous analogy between the Brain and
a Hologram (a hologram being an image recording, where the recorded image can be reconstructed from every tiny
fragment of the recording).
Inspiration from Logic and Neuro Science combined
In 1943 a neuroscience researcher, Warren McCulloch, and a logician, Walter Pitts, made a bold first attempt to
demonstrate that a model inspired by the then-current view of the functionality of a neuron could be proven to exhibit
behaviour equivalent to classical logical operations.

The approach is also referred to as the Threshold Logic Unit, Linear Threshold Unit or Threshold Logic Gate model.
The model had the following characteristics.

• A neuron has a binary output (0,1)


• It has excitatory as well as inhibitory inputs.
• A neuron receives inputs from excitatory synapses, all having
identical weights.
• An Inhibitory input has an absolute veto power over any
excitatory inputs
• A neuron computes a weighted sum and has a fixed threshold as its transfer
function
• These networks displayed no learning. They were essentially
"hard-wired" logic devices.
With two binary inputs of unit weight and firing when the weighted sum reaches the threshold, the unit realizes OR if the threshold is 1 and AND if the threshold is 2.
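A minimal sketch of such a threshold unit along the lines described above: unit-weight excitatory inputs, absolute veto for inhibitory inputs, and a fixed threshold. The function name and example inputs are illustrative assumptions.

```python
def mcp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit: binary inputs and output, identical unit
    weights, and inhibitory inputs with absolute veto power."""
    if any(inhibitory):            # absolute veto over any excitation
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# With two binary excitatory inputs and firing when the sum reaches
# the threshold: threshold = 1 gives OR, threshold = 2 gives AND.
print(mcp_neuron([1, 0], [], threshold=1))  # OR  -> 1
print(mcp_neuron([1, 0], [], threshold=2))  # AND -> 0
print(mcp_neuron([1, 1], [], threshold=2))  # AND -> 1
```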
Inspiration from Genetics and Evolution Theory
Evolutionary Computing in general and Genetic algorithms in
particular are inspired by Darwinian evolutionary theory and
ideas of the survival of the fittest by natural selection.
The computational model is conceptually simple:
- Managing a Population (read data-set) of Chromosoms (data-
item) where every position in the chromosom corresponds to a
Gene (read feature).
The population is developed in a cycle, as follows:

For every new generation (population), all chromosomes are evaluated with respect
to fitness. The fittest subset of chromosomes is allowed to reproduce.
Reproduction takes place by:
- a crossover operation between pairs of the fittest chromosomes (exchange of
genes at certain chromosome positions), and
- mutation of the new population with a certain probability (which helps in getting
out of local optima).
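A minimal sketch of this cycle (fitness evaluation, selection of the fittest, single-point crossover, mutation); the bit-string encoding, the "one-max" fitness function and the parameter values are illustrative assumptions rather than part of the lecture.

```python
import random

def fitness(chrom):
    # Illustrative fitness: number of 1-genes (the "one-max" toy problem).
    return sum(chrom)

def crossover(a, b):
    point = random.randint(1, len(a) - 1)   # single-point crossover
    return a[:point] + b[point:]

def mutate(chrom, p=0.01):
    # Flip each gene with a small probability (helps escape local optima).
    return [1 - g if random.random() < p else g for g in chrom]

def evolve(pop, generations=50):
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        fittest = pop[: len(pop) // 2]       # only the fittest half reproduces
        pop = [mutate(crossover(random.choice(fittest),
                                random.choice(fittest)))
               for _ in range(len(pop))]
    return max(pop, key=fitness)

population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
print(evolve(population))
```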
Inspiration from Physics and the Engineering Sciences
Thermodynamics
The Information Gain measure that is used to prioritize the choice of features in the build-up of a decision tree
is based on an entropy measure over the probabilities of class membership, inspired by the concept of Entropy
in thermodynamics, which measures molecular disorder. The second law of thermodynamics states that Entropy
can never decrease unless 'order' is enforced by external influence.
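A minimal sketch of the entropy measure and of the information gain for one candidate split, using the usual Shannon entropy over class proportions; the example labels and split are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent_labels, child_label_subsets):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in child_label_subsets)
    return entropy(parent_labels) - weighted

# Illustrative split of 10 examples into two branches by some feature.
parent = ['+'] * 5 + ['-'] * 5
print(information_gain(parent, [['+'] * 4 + ['-'], ['+'] + ['-'] * 4]))
```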

Statistical Mechanics
The Energy function in Hopfield Networks is inspired by a mathematical model of ferromagnetism in Statistical
Mechanics called the Ising model named after the physicist Ernst Ising. The model consists of discrete variables that
represent magnetic dipole moments of atomic spins that can be in one of two states (+1 or −1). The spins are arranged in
a graph, usually a lattice, allowing each spin to interact with its neighbors.
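A minimal sketch of the Ising-style energy function used in Hopfield networks, E = -1/2 · s^T W s with states in {+1, -1} (bias terms omitted); the weight matrix and state vector are illustrative assumptions.

```python
import numpy as np

def energy(weights, states):
    """Hopfield/Ising-style energy E = -0.5 * s^T W s for states in {+1, -1}
    and a symmetric weight matrix with zero diagonal."""
    return -0.5 * states @ weights @ states

# Illustrative symmetric weights for 3 units and one state configuration.
W = np.array([[0.0, 1.0, -0.5],
              [1.0, 0.0, 0.3],
              [-0.5, 0.3, 0.0]])
s = np.array([+1, -1, +1])
print(energy(W, s))
```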

Metallurgy
The Simulated annealing process used to find global optima in Boltzmann Machines is inspired by the annealing
process in Metallurgy, where a material is heated above its recrystallization temperature,
kept at a suitable temperature for a suitable amount of time, and then slowly cooled to achieve increased
ductility and decreased hardness.
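A minimal sketch of simulated annealing on a one-dimensional toy objective: worse moves are accepted with a probability that shrinks as the "temperature" is slowly lowered. The objective, cooling schedule and parameters are illustrative assumptions.

```python
import math
import random

def objective(x):
    # Illustrative multimodal function with several local minima.
    return x * x + 10 * math.sin(x)

def simulated_annealing(x=5.0, temp=10.0, cooling=0.95, steps=1000):
    for _ in range(steps):
        candidate = x + random.uniform(-1.0, 1.0)
        delta = objective(candidate) - objective(x)
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / temp), which decreases as the temperature falls.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
        temp *= cooling   # slow cooling, as in metallurgical annealing
    return x

print(simulated_annealing())
```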

Control theory, Operations analysis, Cybernetics (Wiener) and Information Theory (Shannon)
The Feedback loop based on Error measurement/Reward that is a core component in Reinforcement learning is inherited
from Control Theory. The dynamic programming technique (Bellman) and many other techniques in Reinforcement
learning are also inherited from the same field.
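A minimal sketch of the dynamic-programming idea (Bellman) mentioned above, as value iteration on a tiny, made-up Markov decision process; the states, actions, rewards and discount factor are illustrative assumptions.

```python
# Hypothetical 3-state MDP: transitions[state][action] = (next_state, reward).
transitions = {
    0: {'a': (1, 0.0), 'b': (2, 1.0)},
    1: {'a': (0, 0.0), 'b': (2, 2.0)},
    2: {'a': (2, 0.0), 'b': (2, 0.0)},   # absorbing state
}
gamma = 0.9                      # discount factor
V = {s: 0.0 for s in transitions}

# Value iteration: repeatedly apply the Bellman optimality update
#   V(s) <- max_a [ r(s, a) + gamma * V(s') ].
for _ in range(100):
    V = {s: max(r + gamma * V[s2] for (s2, r) in transitions[s].values())
         for s in transitions}

print(V)
```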
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Thanks for your attention!

The next week of the course will contain:


- A review of assignment-related tasks as a rehearsal for the final exam
- Some examples of applications and some demos.
