
Introduction to

Machine Learning

By
Renu Dhir
Department of Computer Science & Engineering
NIT Jalandhar
What is Learning?
Herbert Simon: “Learning is any process by
which a system improves performance from
experience.”
What is the task?
» Classification
» Problem solving / planning / control
Webster's definition of “to learn”
“To gain knowledge or understanding of, or skill in
by study, instruction or experience”
• Learning a set of new facts
• Learning HOW to do something
• Improving ability of something already
learned
What is Machine Learning
Machine Intelligence
Computer Intelligence
Artificial Intelligence
It is the outward characteristics of a system that qualify
it to be classified as intelligent:
for example,
 expertise,
 ability to satisfactorily deal with unexpected and
unfamiliar situations
and ability to reason, deduce, and infer from
incomplete information
Simon's definition of “machine learning”
“Learning denotes changes in the system that are
adaptive in the sense that they enable the system to
do the same task or tasks drawn from the same
population more effectively the next time”
What is Machine Learning?
Optimize a performance criterion using
example data or past experience.
Role of Statistics: Inference from a sample
Role of Computer science: Efficient
algorithms to
» Solve the optimization problem
» Represent and evaluate the model for
inference
Why “Learn” ?

Machine Learning is programming computers to optimize


a performance criterion using example data or past
experience.
There is no need to “learn” to calculate payroll
Learning is used when:
» Human expertise does not exist (navigating on Mars),
» Humans are unable to explain their expertise (speech
recognition)
» Solution changes in time (routing on a computer
network)
» Solution needs to be adapted to particular cases (user
biometrics)
Machine Learning: A Definition

Definition: A computer program


is said to learn from experience
E with respect to some class of
tasks T and performance
measure P, if its performance
at tasks in T, as measured by P,
improves with experience E.
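As a concrete sketch of this definition (all data values below are hypothetical illustrations, not from the source): take T = classifying 1-D points into two classes, E = labeled examples, and P = accuracy on a held-out test set. With a simple nearest-centroid learner, performance at T, measured by P, improves as experience E grows.

```python
def centroids(examples):
    """Learn one mean value per class from (feature, label) pairs (the experience E)."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Assign x to the class with the nearest centroid (the task T)."""
    return min(model, key=lambda y: abs(model[y] - x))

def accuracy(model, test):
    """Performance measure P: fraction of held-out points classified correctly."""
    return sum(predict(model, x) == y for x, y in test) / len(test)

test = [(1.0, 0), (3.0, 1), (0.2, 0), (4.5, 1)]      # held-out instances of T

little_E = centroids([(1.9, 0), (10.0, 1)])           # 2 examples of experience
more_E = centroids([(0.0, 0), (0.5, 0), (1.0, 0),
                    (3.0, 1), (4.0, 1), (5.0, 1)])    # 6 examples of experience
print(accuracy(little_E, test), accuracy(more_E, test))   # → 0.5 1.0
```

With only two (atypical) examples, the centroids sit in the wrong places and P = 0.5; with more experience, P rises to 1.0.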
AI and ML
As Artificial Intelligence (AI)
continues to progress rapidly in
2020, achieving mastery over
Machine Learning (ML) is becoming
increasingly important for all the
players in this field.

This is because both AI and ML


complement each other.
AI and ML

• Unfortunately, there’s still much
confusion within the public and the media
regarding what truly is Artificial
Intelligence (AI),
• and what truly is Machine Learning (ML).
• Often the terms are used as synonyms;
in other cases, they are treated as
discrete, parallel advancements,
• while others take advantage of the
trend to create hype and excitement,
so as to increase sales and revenue.
Examples of Successful Applications of
Machine Learning

Learning to recognize spoken words


(Lee, 1989; Waibel, 1989).
Learning to drive an autonomous vehicle
(Pomerleau, 1989).
Learning to classify new astronomical
structures (Fayyad et al., 1995).
Learning to play world-class
backgammon (Tesauro 1992, 1995).
Problem Solving / Planning / Control
Performing actions in an environment in order to
achieve a goal.
» Solving calculus problems
» Playing checkers, chess, or backgammon
» Balancing a pole
» Driving a car or a jeep
» Flying a plane, helicopter, or rocket
» Controlling an elevator
» Controlling a character in a video game
» Controlling a mobile robot
Why is Machine Learning Important?

Some tasks cannot be defined well,


except by examples (e.g., recognizing
people).
Relationships and correlations can be
hidden within large amounts of data.
Machine Learning/Data Mining may be
able to find these relationships.
Human designers often produce
machines that do not work as well as
desired in the environments in which
they are used.
Why is Machine Learning Important?
The amount of knowledge available
about certain tasks might be too large
for explicit encoding by humans (e.g.,
medical diagnostic).
Environments change over time.
New knowledge about tasks is
constantly being discovered by
humans. It may be difficult to
continuously re-design systems “by
hand”.
Why Study Machine Learning?
The Time is Ripe
Many basic effective and efficient algorithms
available.
Large amounts of on-line data available.
Large amounts of computational resources
available.
Develop systems that are too difficult /
expensive to construct manually because they
require specific detailed skills or knowledge
tuned to a specific task
(knowledge engineering bottleneck).
• ML is a branch of artificial intelligence (AI),
and as defined by Computer Scientist and
machine learning pioneer Tom M. Mitchell:
• “ML is the study of computer algorithms that
allow computer programs to automatically
improve through experience.”
 ML is one of the ways we expect to achieve AI.
 Machine learning relies on working with small to
large datasets, examining and comparing the
data to find common patterns and explore
nuances.
 (A nuance is a slight or delicate variation in
tone, color, or meaning; a subtle difference.)
For instance,
if you provide a machine learning (ML) model with
many songs that you enjoy, along with their
corresponding audio statistics (danceability,
instrumentality, tempo, or genre),
it ought to be able (depending on the supervised
ML model used) to act as a recommender system,
suggesting music that, with a high probability,
you will enjoy in the future,
much as Netflix, Spotify, and other companies do.
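That idea can be sketched in a few lines of Python (song names and feature values below are hypothetical, and this toy stands in for the far more elaborate models such services actually use): recommend the unheard song whose audio statistics lie closest to the average of the songs the user already likes.

```python
def recommend(liked, catalog):
    """Return the unheard catalog title nearest the user's average taste vector."""
    n = len(liked)
    taste = [sum(f[i] for f in liked.values()) / n for i in range(3)]
    def dist(feats):
        return sum((a - b) ** 2 for a, b in zip(taste, feats))
    return min((t for t in catalog if t not in liked), key=lambda t: dist(catalog[t]))

liked = {            # (danceability, instrumentality, tempo scaled to 0-1)
    "song_a": (0.9, 0.1, 0.6),
    "song_b": (0.8, 0.2, 0.7),
}
catalog = dict(liked, **{
    "song_c": (0.85, 0.15, 0.65),   # close to the user's taste
    "song_d": (0.10, 0.90, 0.20),   # far from it
})
print(recommend(liked, catalog))    # → song_c
```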
Personalized Digital Media:
ML has massive potential in the entertainment
industry, and the technology has already found a
home in streaming services such as Netflix,
Amazon Prime, Spotify, and Google Play.
Some algorithms are already being used to
eliminate buffering and low-quality playback,
getting you the best quality from your internet
service provider.
ML algorithms are also making use of the almost
endless stream of data about consumers’ viewing
habits, helping streaming services offer more
useful recommendations.
In a simple example,
if you load an ML program with a considerably large dataset
of x-ray pictures along with their descriptions (symptoms,
items to consider, and others), it ought to have the
capacity to assist with
(or perhaps automate) the analysis of x-ray pictures
later on.

The ML model looks at each of the pictures in the
diverse dataset and finds common patterns among pictures
whose labels carry comparable indications.

Furthermore (assuming that we use an acceptable ML
algorithm for images), when you load the model with new
pictures, it compares their features with the examples it
has gathered before, to disclose how likely the new pictures
are to contain any of the indications it has analyzed previously.
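The matching step described above can be sketched as a k-nearest-neighbour scorer (the feature vectors and labels below are invented for illustration; real systems extract image features with deep networks rather than using two hand-picked numbers):

```python
def finding_likelihood(new, examples, k=3):
    """Score a new image: share of its k most similar stored examples per label.

    examples: list of (feature_vector, label) pairs."""
    nearest = sorted(
        examples,
        key=lambda e: sum((a - b) ** 2 for a, b in zip(e[0], new)),
    )[:k]
    scores = {}
    for _, label in nearest:
        scores[label] = scores.get(label, 0) + 1 / k
    return scores

examples = [                      # hypothetical 2-D image features
    ((0.9, 0.8), "fracture"), ((0.85, 0.75), "fracture"),
    ((0.1, 0.2), "normal"), ((0.15, 0.1), "normal"), ((0.8, 0.9), "fracture"),
]
print(finding_likelihood((0.88, 0.82), examples))   # "fracture" dominates
```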
The term “AI” was coined in 1956 by a group of researchers,
including Allen Newell and Herbert A. Simon.
Since then, the AI industry has gone through many fluctuations.
In the early decades, there was much hype surrounding the industry, and
many scientists concurred that human-level AI was just around the
corner.
However, undelivered promises caused a general disenchantment
among the public and led to the AI winter, a period
where funding and interest in the field subsided considerably.
Afterward, organizations attempted to distance themselves from the
term AI,
which had become synonymous with unsubstantiated hype, and used
different terms to refer to their work.
For instance, IBM described Deep Blue as a supercomputer and
explicitly stated that it did not use AI, even though it did.
During this period, a variety of other terms, such as big data,
predictive analytics, and ML, started gaining traction and popularity.
In 2012, ML, DL, and NNs made great strides and found use in a
growing number of fields.
Organizations suddenly started to use the terms of ML and DL for
advertising their products.
NLP & DL
They will help more and more with the
production of media too.
NLP (Natural Language Processing)
algorithms help write trending news
stories to decrease production time,
and a new MIT-developed AI,
named Shelley helps users write horror
stories through
DL algorithms and a bank of user-
generated fiction.
At this rate, the next great content
creators may not be human at all.
History of Machine Learning

1950s
» Samuel’s checker player
» Selfridge’s Pandemonium
1960s:
» Neural networks: Perceptron
» Pattern recognition
» Learning in the limit theory
» Minsky and Papert prove limitations of Perceptron
1970s:
» Symbolic concept induction
» Winston’s arch learner
» Expert systems and the knowledge acquisition bottleneck
» Quinlan’s ID3
» Michalski’s AQ and soybean diagnosis
» Scientific discovery with BACON
» Mathematical discovery with AM
History of Machine Learning (cont.)

1980s:
» Advanced decision tree and rule learning
» Explanation-based Learning (EBL)
» Learning and planning and problem solving
» Utility problem
» Analogy
» Cognitive architectures
» Resurgence of neural networks (connectionism, backpropagation)
» Valiant’s PAC Learning Theory
» Focus on experimental methodology
1990s
» Data mining
» Adaptive software agents and web applications
» Text learning
» Reinforcement learning (RL)
» Inductive Logic Programming (ILP)
» Ensembles: Bagging, Boosting, and Stacking
» Bayes Net learning
History of Machine Learning (cont.)

2000s
» Support vector machines
» Kernel methods
» Graphical models
» Statistical relational learning
» Transfer learning
» Sequence labeling
» Collective classification and structured outputs
» Computer Systems Applications
• Compilers
• Debugging
• Graphics
• Security (intrusion, virus, and worm detection)
» Email management
» Personalized assistants that learn
» Learning in robotics and vision
Foundations of AI & ML
AI & ML draw on: Computer Science & Engineering,
Mathematics, Philosophy, Economics, Biology,
Psychology, Cognitive Science, and Linguistics.
The Foundation of AI

Philosophy
» The study of human intelligence began
in philosophy, with no formal expression
» Initiated the idea of the mind as a
machine and its internal operations
The Foundation of AI
 Mathematics formalizes the three main
areas of AI: computation, logic, and
probability
 Computation leads to analysis of the
problems that can be computed
Complexity theory
 Probability contributes the “degree of
belief” to handle uncertainty in AI
 Decision theory combines probability
theory and utility theory (bias)

The Foundation of AI

Psychology
» How do humans think and act?
» The study of human reasoning and
acting
» Provides reasoning models for AI
» Strengthens the idea that
• humans and other animals can be
considered information
processing machines
The Foundation of AI

Computer Engineering
» How to build an efficient computer?
» Provides the artifact that makes AI
application possible
» The power of computers makes
computation of large and difficult
problems easier
» AI has also contributed its own work
to computer science, including
time-sharing, the linked list data type,
OOP, etc.
The Foundation of AI
Control theory and Cybernetics
» How can artifacts operate under their own
control?
» The artifacts adjust their actions
• To do better for the environment over
time
• Based on an objective function and
feedback from the environment
» Not limited only to linear systems but also
other problems
• as language, vision, and planning, etc.
The Foundation of AI
Linguistics
» For understanding natural
languages
• different approaches have been
adopted from linguistic work
» Formal languages
» Syntactic and semantic analysis
» Knowledge representation

Areas of Influence for Machine
Learning
Statistics: How best to use samples drawn from
unknown probability distributions to help decide
from which distribution some new sample is
drawn?
Brain Models: Non-linear elements with weighted
inputs (ANNs) have been suggested as simple
models of biological neurons.
Adaptive Control Theory: How to deal with
controlling a process having unknown
parameters that must be estimated during
operation?
Areas of Influence for Machine
Learning
• Psychology: How to model human performance on
various learning tasks?
• Artificial Intelligence: How to write algorithms to
acquire the knowledge humans are able to acquire,
at least, as well as humans?
• Evolutionary Models: How to model certain aspects
of biological evolution to improve the performance
of computer programs?
How to Achieve AI & ML?

Four approaches:
» Thinking humanly
» Acting humanly
» Thinking rationally
» Acting rationally
Acting humanly: Turing Test
Turing (1950) "Computing machinery and
intelligence":
“Can machines think?”  “Can machines behave
intelligently?”
Predicted that by 2000, a machine might have a 30%
chance of fooling a lay person for 5 minutes
Anticipated all major arguments against AI in
following 50 years
Suggested major components of AI: knowledge,
reasoning, language understanding, learning
Capabilities
» Natural language processing
» Knowledge representation
» Automated reasoning
» Machine learning
» Computer vision
» Robotics
Turing Test

Three rooms contain a person, a computer, and an


interrogator.
The interrogator can communicate with the other two by
teleprinter.
The interrogator tries to determine which is the person and
which is the machine.
The machine tries to fool the interrogator into believing that
it is the person.
If the machine succeeds, then we conclude that the
machine can think.
Acting Humanly: The Turing Test

Alan Turing
1912-1954

To be intelligent, a program should


simply act like a human
The Turing Test - Example

Acting Humanly
To pass the Turing test, the computer/robot
needs:
» Natural language processing to communicate
successfully.
» Knowledge representation to store what it knows or
hears.
» Automated reasoning to answer questions and draw
conclusions using stored information.
» Machine learning to adapt to new circumstances
and to detect and extrapolate patterns.

» These are the main branches of AI.

Thinking Humanly
Real intelligence requires thinking, so think
like a human!
First, we should know how a human thinks
» Introspect one's thoughts
» Physiological experiments to understand how
someone thinks
» Brain imaging (MRI, ...)
Then, we can build programs and models
that think like humans
» Resulted in the field of cognitive
science: a merger between AI and
Psychology.
Thinking humanly: cognitive modeling
1960s "cognitive revolution":
information-processing psychology
Requires scientific theories of internal
activities of the brain
How to validate? Requires
» Predicting and testing behavior of
human subjects (top-down)
» Direct identification from
neurological data (bottom-up)
Both approaches (roughly, Cognitive
Science and Cognitive Neuroscience)
are now distinct from AI
Problems with Imitating Humans

The human thinking process is


difficult to understand:
How does the mind arise from the
brain?
Think also about unconscious
tasks such as vision and speech
understanding.
Humans are not perfect!
We make a lot of systematic
mistakes.
Thinking Rationally
Instead of thinking like a human : think rationally.
Find out how correct thinking must proceed:
the laws of thought.
Aristotle: what are correct arguments/thought
processes?
» Syllogisms ("conclusion, inference")
» A syllogism is a rhetorical device that begins with a
major statement, known as a premise, narrows down to
a minor statement, or premise, and then arrives at a
conclusion using deductive reasoning.
» Premises: Socrates is a man; all men are mortal.
» Conclusion: Socrates is mortal.
This initiated logic: a traditional and important
branch of mathematics and computer science.
Thinking rationally: "laws of thought"
Problem: it is not always possible to model thought as a
set of rules; sometimes there is uncertainty.
Even when a model is available, the complexity of the
problem may be too large to allow for a solution.
Logicians in the 19th century developed a precise
notation for statements about all kinds of objects in the
world
Problems:
» It’s not easy to take informal knowledge and state it
in the formal terms required by logical notation
» There is a big difference between solving a problem
» “in principle”
» and solving it
» “in practice”
Acting Rationally

This is how birds fly. Humans tried to mimic
birds for centuries. This is how we finally
achieved “artificial flight”.
Acting Rationally
Rational agent: acts as to achieve the best
outcome
Logical thinking is only one aspect of
appropriate behavior: reactions like getting
your hand out of a hot place is not the result of a
careful deliberation, yet it is clearly rational.
Sometimes there is no provably correct thing to do, yet
something must be done.
Instead of insisting on how the program should
think, we insist on how the program should act:
we care only about the final result.
Advantages:
» More general than “thinking rationally”
» More mathematically principled; proven to achieve
rationality, unlike human behavior or thought
Acting Rationally: Rational agent
Rational behavior: doing the right thing
The right thing: which is expected to maximize goal
achievement, given the available information
The rational-agent approach has two advantages
» It’s more general than the “laws of thought”
because correct inference is just one of several
possible mechanisms for achieving rationality
» Second, it’s more amenable to scientific
development than approaches based on human
behavior or human thought.
One point to keep in mind: we will see before too long
that achieving perfect rationality is not feasible in
complicated environments
What is AI & ML?

Views of AI fall into four categories:

Thinking humanly Thinking rationally

Acting Humanly Acting rationally

Most study and focus advocates
“Acting Rationally”
What is Artificial Intelligence & ML ?

                HUMAN                 RATIONAL
THOUGHT         Systems that think    Systems that think
                like humans           rationally
BEHAVIOUR       Systems that act      Systems that act
                like humans           rationally
Disciplines which form the core of AI-inner circle
Fields which draw from these disciplines-outer circle.
Computer vision is a field of AI that trains
computers to capture and interpret information
from image and video data.
Computer vision is a field of computer science
that focuses on enabling computers to identify
and understand objects and people in images and
videos. Like other types of AI, computer vision
seeks to perform and automate tasks that
replicate human capabilities.
With each passing day and gradually as we move into future,

smart or intelligent machines will slowly replace and


enhance human capabilities in many areas. ...
Study in this area of AI has rapidly influenced the
emergence of smart technologies that has a huge impact on
our daily lives.
The intelligence exhibited by machines or software is
often termed “Artificial Intelligence”, a subfield
of computer science.
AI along with ML is now a potential game changer in the
history of computing backed with strong data analytics.
The fields of science, engineering, business and medicine
have become smarter, with prediction capabilities that
smooth our daily activities.
AI, on the other hand, is vast in scope.
“AI & ML” are the science and engineering of
making computers behave in ways that, until
recently, we thought required human intelligence.”
That is a great way to define AI & ML in a
single sentence;
however, it still shows how broad and vague the
field is.
Fifty years ago, a chess-playing program was considered
a form of AI, since game theory, along with game
strategies, were capabilities that only a human brain could
perform.
Nowadays, a chess game seems dull and antiquated, since one is
part of almost every computer’s operating system (OS);
therefore, “until recently” is something that progresses
with time.
AI, as we know it today, is symbolized by human-AI
interaction gadgets such as Google Home, Siri, and Alexa,
and by the ML-powered video prediction systems that power
Netflix, Amazon, and YouTube.
These technological advancements are progressively
becoming essential in our daily lives.
They are intelligent assistants that enhance our abilities
as humans and professionals — making us more and more
productive.
In contrast to machine learning (ML),
AI is a moving target, and its definition changes as its
related technological advancements turn out to be further
developed .
Possibly, within a few decades, today’s innovative AI
advancements ought to be considered as dull as flip-phones
are to us right now.
On a broad level,
we can differentiate both AI and ML as:
AI is a bigger concept to create intelligent
machines that can simulate human thinking
capability and behavior, whereas,
ML is an application or subset of AI that
allows machines to learn from data without
being programmed explicitly.
AI seeks the optimal solution,
and leads to intelligence or wisdom.
ML seeks a solution, whether optimal
or not, and leads to knowledge.
AI and ML
While ML is based on the idea
that machines should be able to learn
and adapt through experience,
AI refers to a broader idea
where machines can execute tasks
"smartly."
Artificial Intelligence (AI) applies
ML, DL and other techniques
to solve actual problems.
• AI brings with it a promise of genuine human-to-machine
interaction.
• When machines become intelligent, they can understand
requests, connect data points and draw conclusions. They
can reason, observe and plan.
• Consider: Leaving for a business trip tomorrow? Your
intelligent device will automatically offer weather reports
and travel alerts for your destination city.
• Planning a large birthday celebration?
Your smart bot will help with invitations, make
reservations and remind you to pick up the cake.
• Planning a direct marketing campaign?
Your AI assistant can instinctively segment your
customers into groups for targeted messaging and
increased response rates.
• Clearly, we’re not talking about robotic butlers. This isn’t
a Hollywood movie. But we are at a new level of cognition
in the AI field that has grown to be truly useful in our lives.
• Not all AI is ML.
• This does not mean the machine is
self-aware or similar to human
intelligence; it only means that
the machine is capable of solving a
specific problem.
• ML refers to a particular type
of AI that learns by itself.
• And as it gets more data, it gets
better at learning.
Difference b/w AI and ML
AI: The concept of AI is broader than ML. It uses
computers to imitate cognitive human functions.
ML: ML is a subset of AI, where machines have the
ability to think and perform actions based on their past
experiences. They can change their algorithms as per
the data sets on which they are operating.
AI: The main aim of AI is to increase the chance of
success, not accuracy.
ML: ML focuses on accuracy rather than success.
AI: AI is not a system; it can be implemented within a
system to operate on computer programs that can work smart.
ML: In ML the system can work and learn from datasets.
AI: The goal is to simulate natural intelligence to
solve complex problems.
ML: The goal is to learn from data for a certain task to
maximize the performance of the machine on the task.
Difference b/w AI and ML
AI: AI is primarily used in decision making.
ML: ML allows the system to learn from previous
experiences.
AI: It develops a system to mimic humans, so systems
can respond and behave in certain circumstances.
ML: It helps in creating self-learning algorithms.
AI: Helps in finding the optimal solution.
ML: ML will go after a solution without thinking much
about whether it is optimal.
AI: AI leads to intelligence or wisdom.
ML: ML leads to knowledge.
• AI technology is important because it enables
human capabilities – understanding, reasoning,
planning, communication and perception – to be
undertaken by software increasingly
effectively, efficiently and at low cost. ...
• Applications of AI-powered computer vision will
be particularly significant in the transport
sector.
• The iterative aspect of ML is important ,
because as models are exposed to new data, they
are able to independently adapt.
• They learn from previous computations to
produce reliable, repeatable decisions and
results.
• One thing about AI is that it has been a never-ending
journey of building modern machinery with human intellect.
• It is a far-fetched approach to be able to colonize (to take
control of another country or place and make it a colony) the
human mindset for systematic operations.
• The programmatic implementation for AI might take some
time.
• As far as ML is concerned, you can start working on small
sets of data for initial task screening and adoption.
• Since ML is a subset of AI, it will still take time to develop
and deliver.
• If you wish to implement AI technology in your business,
then you should hire an expert artificial intelligence
developer from the best AI development company for
effective results.
• There has been a huge debate on AI Vs ML. The choice is
ultimately yours when you are looking forward to choosing
between AI and ML.
AI and ML are now considered to be among the biggest
innovations since the microchip.
AI used to be a fanciful concept from science fiction, but
now it’s becoming a daily reality.
NNs (imitating the process of real neurons in the brain)
are paving the way toward breakthroughs in machine
learning, called “deep learning.”
ML can help us live happier, healthier, and more productive
lives… if we know how to harness its power.
Some say that AI & ML are ushering in another “industrial
revolution.”
Whereas the previous Industrial Revolution harnessed
physical and mechanical strength, this new revolution will
harness mental and cognitive ability.
One day, computers will not only replace manual labor, but
also mental labor.
But how exactly will this happen? And is it already
happening?
Machine learning as most in-demand AI skill

Over the past three years alone the number of


AI-related job postings on Indeed has
increased by more than 119 percent, according
to the platform's latest AI talent report.
Google uses ML algorithms to provide its
customers with a valuable and personalized
experience.
Gmail,
Google Search
and Google Maps already have
ML embedded in services.
ML as most in-demand AI skill

It is estimated that there will be 2.3 million jobs


in the field of AI and ML by 2026. ...
When AI & ML takes over repetitive or
dangerous tasks,
it frees up the human workforce to do work they
are better equipped for—
tasks that involve creativity and empathy among
others.
If people are doing work that is more engaging
for them,
it could increase happiness and job satisfaction.
ML and DL as subfields of AI
AI as a whole contains many subfields, including ML. ML is mesmerizing,
particularly its advanced sub-branches, i.e., DL and the various types of
NNs; it can seem like “magic”, even when the public, at times, has trouble
observing its internal workings.
While some tend to compare DL and NNs to the way the human brain
works, there are essential differences between the two.
ML automates analytical model building.
4 Key Types of Data Analytics
Descriptive Analytics. Descriptive analytics is the simplest type of analytics
and the foundation the other types are built on.
Diagnostic Analytics. Diagnostic analytics addresses the next logical
question, “Why did this happen?”
Predictive Analytics. Predictive analytics uses ML and AI as tools to parse
data and predict possible outcomes.
Prescriptive Analytics. Prescriptive analytics is a type of data analytics that
uses statistical algorithms, ML techniques, and AI to analyze data and provide
recommendations on the actions to take to optimize business outcomes.
Analytical model building encompasses algorithms and models that allow
computers to make decisions based on statistical data relationships and
patterns.
ML and DL as subfields of AI

It uses methods from NNs, statistics, operations


research and physics to find hidden insights in data
without being explicitly programmed where to look
or what to conclude.
A NN is a kind of ML inspired by the workings of
the human brain.
It’s a computing system made up of interconnected
units (like neurons) that processes information by
responding to external inputs, relaying information
between each unit.
The process requires multiple passes at the data to
find connections and derive meaning from undefined
data.
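The "interconnected units relaying information" idea can be sketched as a single forward pass through a tiny two-layer network (the weights and inputs here are arbitrary illustrative values, not trained ones):

```python
import math

def sigmoid(z):
    """Squash a unit's summed input into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    """Each unit weighs its inputs, adds a bias, and fires through sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, 0.8]                                       # external inputs
hidden = layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, -0.2])   # 2 hidden units
output = layer(hidden, [[2.0, -1.5]], [0.1])                # 1 output unit
print(output)
```

Training would adjust the weights over multiple passes at the data (backpropagation); this sketch shows only how information flows between the units.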
ML and DL as subfields of AI
DL uses huge NNs with many layers of processing units,
taking advantage of advances in computing power and
improved training techniques to learn complex patterns in
large amounts of data.
Common applications include image and speech recognition.
Computer vision relies on pattern recognition and DL to
recognize what’s in a picture or video.
When machines can process,
analyze and understand images,
they can capture images or videos in real time and interpret
their surroundings.
NLP is the ability of computers to analyze, understand and
generate human language, including speech.
The next stage of NLP is natural language interaction,
which allows humans to communicate with computers using
normal, everyday language to perform tasks.
Traditional AI & ML Vs Modern
Data Mining
Retail: Market basket analysis, Customer
relationship management (CRM)
Finance: Credit scoring, fraud detection
Manufacturing: Optimization, troubleshooting
Medicine: Medical diagnosis
Telecommunications: Quality of service
optimization
Bioinformatics: Motifs, alignment
Web mining: Search engines
...
Why Study ML?
Engineering Better Computing Systems
Learning general models from data of particular
examples.
Data is cheap and abundant (data warehouses, data
marts); knowledge is expensive and scarce.
Develop systems that can automatically adapt and
customize themselves to individual users.
» Personalized news or mail filter
» Personalized tutoring
Example in retail: Customer transactions to consumer
behavior:
People who bought “Da Vinci Code” also bought “The
Five People You Meet in Heaven” (www.amazon.com)
What We Talk About When We Talk About “Learning”

Build a model that is a good and useful


approximation to the data.
Discover new knowledge from large databases (data mining ).
» Market basket analysis (e.g. diapers and beer)
» "If a customer buys bread, they are also likely to buy milk" is an
association rule that could be mined from this data set.
Medical text mining (e.g. migraines to calcium channel blockers to
magnesium)

Basket analysis:
P(Y | X): the probability that somebody who buys X also
buys Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
Is it a good recommendation by AI & ML or not?
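The conditional probability above can be estimated directly from transaction data as count(X and Y together) / count(X); a minimal sketch with hypothetical baskets:

```python
def confidence(transactions, x, y):
    """Estimate P(y | x): share of baskets containing x that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(y in t for t in with_x) / len(with_x)

baskets = [                       # hypothetical transactions
    {"beer", "chips"}, {"beer", "chips", "salsa"}, {"beer", "bread"},
    {"beer", "chips"}, {"beer"}, {"bread", "milk"}, {"chips"},
]
print(confidence(baskets, "beer", "chips"))   # 3 of 5 beer baskets → 0.6
```

In association-rule mining this quantity is called the rule's confidence; a real system would also check the rule's support (how often X occurs at all) before recommending anything.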
KBS ( Knowledge based systems)

 A knowledge-based system (KBS) is a


form of AI & ML that aims to capture
the knowledge of human experts to
support decision-making. ...
 The typical architecture of a
knowledge-based system,
 which informs its problem-solving
method,
 includes a knowledge base
 and an inference engine.
KBS ( Knowledge based systems)
KBS
 A knowledge-based system (KBS) is a form of AI & ML that aims to
capture the knowledge of human experts to support decision-
making.
 Examples of KBSs include expert systems (ES), which are so called
because of their reliance on human expertise.
 The KB contains a collection of information in a given field --
medical diagnosis, for example.
 The inference engine deduces insights from the information housed
in the KB.
 KBS also include an interface through which users query the
system and interact with it.
 A KBS may vary with respect to its problem-solving method or
approach.
 Some systems encode expert knowledge as rules and are therefore
referred to as rule-based systems.
 Another approach, case-based reasoning, substitutes cases for
rules.
 Cases are essentially solutions to existing problems that a case-
based system will attempt to apply to a new problem.
Where KBS are used?
 Over the years, KBS have been developed for a
number of applications.
 MYCIN, for example, was an early KBS created
to help doctors diagnose diseases.
 Healthcare has remained an important market
for knowledge-based systems, which are now
referred to as clinical decision-support systems
in the health sciences context.
 KBS have also been employed in applications as
diverse as avalanche path analysis, industrial
equipment fault diagnosis and cash management.
KBSs and AI, ML, Big Data, NNs & DL
 While a subset of AI, classical KBSs differ in
approach from some of the newer developments in AI.
 Daniel Dennett, a philosopher and cognitive
scientist, in his 2017 book, From Bacteria to Bach
and Back, cited a strategy shift from early AI,
characterized by "top-down-organized,
bureaucratically efficient know-it-all"
 systems to systems that harness Big Data and
"statistical pattern-finding techniques" such as
 data-mining and DL in a more bottom-up approach.
 Examples of AI following the latter approach
include NN systems,
 a type of DL technology that concentrates on
signal processing and pattern recognition problems
such as facial recognition.
Soft Computing
Important branch of study in the
area of intelligence and Knowledge
based systems.
Human reasoning is predominantly
approximated, qualitative and soft.
Humans can effectively handle
incomplete, imprecise and fuzzy
information in making intelligent
decisions
Soft Computing (SC)
Soft Computing (SC) is an emerging field that
consists of complementary elements of
fuzzy logic,
Neural computing,
evolutionary computation,
ML
and probabilistic reasoning.
Due to their strong learning and cognitive
ability and good tolerance of uncertainty and
imprecision,
Soft computing techniques have found wide
applications.
Soft Computing
 SC is an emerging approach having ability of the human mind to
reason and learn in an environment of uncertainty and imprecision.
 SC is based on some biological inspired methodologies such as
genetics, evolution, ant’s behaviors, particles swarming
 (Swarm intelligence is a form of collective learning and decision-
making based on decentralized, self-organized systems. Natural
examples are commonplace—flocks of birds and schools of fish act
and react as groups, without instruction or direction from any single
leader.), human nervous systems, etc.
 SC is often the only solution when we don’t have any mathematical
model of the problem (i.e., an algorithm), or when we need a solution
to a complex problem in real time.
 SC is easy to adapt with changed scenario and can be implemented
with parallel computing.
 It has enormous applications in many application areas such as medical
diagnosis, computer vision, handwritten character recognition,
pattern recognition, machine intelligence, weather forecasting,
network optimization, VLSI design, etc.
What is Soft Computing?
The idea behind soft computing is to model cognitive
behavior of human mind.
Soft computing is the foundation of conceptual intelligence
in machines.
Unlike hard computing, Soft computing is tolerant of
imprecision, uncertainty, partial truth, and approximation,
computationally intelligent,
− possess human like expertise in particular domain,
− can adapt to the changing environment and can learn
to do better
− can explain their decisions
Hard Vs Soft Computing Paradigms
∙ Soft computing
− Uses inexact methods to give useful but
inexact answers to intractable problems.
− Represents a significant paradigm shift in
the aims of computing - a shift which
reflects the human mind.
− Tolerant to imprecision, uncertainty,
partial truth, and approximation.
− Well suited for real world problems where
ideal models are not available.
Hard Vs Soft Computing Paradigms
∙ Hard computing
− Based on the concept of precise modelling and
analyzing to yield accurate results.
− Works well for simple problems, but is bound
by the NP-Complete set.
− Many contemporary problems do not lend
themselves to precise solutions such as
− Recognition problems (handwriting, speech,
objects, images),
− Mobile robot coordination, forecasting,
combinatorial problems etc.
Difference b/w Soft and Hard Computing

Hard Computing                            Soft Computing
Conventional computing requires a         Soft computing is tolerant of
precisely stated analytical model.        imprecision.

Often requires a lot of computation       Can solve some real-world problems
time.                                     in reasonably less time.

Not suited for real-world problems for    Suitable for real-world problems.
which an ideal model is not present.

It requires full truth.                   Can work with partial truth.

It is precise and accurate.               Imprecise.

High cost for solution.                   Low cost for solution.
Components of Soft Computing
∙ Components of soft computing include:
− Fuzzy Logic (FL)
Evolutionary Computation (EC) - based on the origin of
the species
Genetic Algorithm
 Evolutionary algorithms
 Differential evolution
− Meta heuristic and Swarm Intelligence
 Ant Colony Optimizations
• Particle swarm optimization
− Neural Network (NN)
− Ideas about probability including:
» Bayesian network
Chaos theory
Perceptron and ML
How three important fields
AI and ML and DL are related?
Relationship among AI and ML and DL
Relationship among AI , ML, DL
and NN
Relationship among AI , ML, DL and Data Science
Relationship among AI , ML, DL , Big data and data Science
How ML is different from Deep Learning?
DL began to perform tasks that were
impossible to do with classic rule-based
programming.
Fields such as speech and face recognition,
DIP classification and NLP, which were at
early stages, suddenly took great leaps,
and
in March 2019, three of the most recognized DL
pioneers won a Turing Award thanks to their
contributions
and breakthroughs that have made deep NNs
a critical component of today's computing.
Learning Algorithm
Uses training values for the target function to induce a
hypothesized definition that fits these examples and
hopefully generalizes to unseen examples.
In statistics, learning to approximate a continuous
function ( continuous set can be measured in fractions
and decimals) is called regression.
Attempts to minimize some measure of error (loss
function) such as mean squared error:
E = ( Σ_{b ∈ B} [Vtrain(b) − V(b)]² ) / |B|
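The mean squared error above can be computed directly. A minimal sketch, where the training values and the hypothesis outputs are made-up numbers:

```python
# Mean squared error: E = (1/|B|) * sum over b in B of (Vtrain(b) - V(b))^2
def mean_squared_error(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

targets = [1.0, 2.0, 3.0]      # Vtrain(b): training values (made up)
predictions = [1.5, 2.0, 2.0]  # V(b): hypothesis outputs (made up)
print(mean_squared_error(targets, predictions))  # (0.25 + 0 + 1) / 3
```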
Classification
Example: Credit
scoring
Differentiating
between low-risk
and high-risk
customers from
their income and
savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
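The discriminant above is a simple IF-THEN rule. A minimal sketch; the threshold values standing in for θ1 and θ2 are chosen arbitrarily for illustration:

```python
# Rule-based discriminant:
#   IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk
THETA1 = 30000  # income threshold (hypothetical value)
THETA2 = 5000   # savings threshold (hypothetical value)

def credit_risk(income, savings):
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45000, 8000))  # low-risk: both thresholds exceeded
print(credit_risk(45000, 2000))  # high-risk: savings too low
```

In practice a learning algorithm would choose the thresholds from labeled customer data rather than by hand.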
Classification: Applications
Face recognition: Pose, lighting, (glasses, beard), make-
up, hair style and occlusion. (In this space, the foreground
objects of the scene will occlude the background
surfaces.
Put simply, occlusion in an image occurs when an object
hides a part of another object.
The areas that are occluded depend on the position of the
camera relative to the scene.
Or: an occlusion occurs when something has been closed
up or blocked off.)
Character recognition: Different handwriting styles.
Speech recognition: Temporal dependency.
» Use of a dictionary or the syntax of the language.
» Sensor fusion: Combine multiple modalities; eg, visual (lip image)
and acoustic for speech
Medical diagnosis: From symptoms to illnesses
Face Recognition
Training examples of a person
Test images
Classification
Classification: A Three-Step Process
1. Model construction (Learning): Each record (instance) is
assumed to belong to a predefined class, as determined
by one of the attributes, called the class label.
The set of all records used for construction of the
model is called the training set. The model is usually
represented in the form of classification rules
(IF-THEN statements) or decision trees.
2. Model Evaluation (Accuracy):
Estimate the accuracy rate of the model based on a test set.
The known label of each test sample is compared with the
classified result from the model.
Accuracy rate: percentage of test-set samples correctly
classified by the model.
The test set is independent of the training set; otherwise over-
fitting will occur.
3. Model Use (Classification):
The model is used to classify unseen instances (assigning
class labels) and to predict the value of an actual attribute.
How to Classify? How do humans classify items?
For example, suppose you had to classify the healthiness of
a food.
Identify a set of features indicative of health: fat,
cholesterol, sugar, sodium, etc.
Extract features from foods: read nutritional facts,
chemical analysis, etc.
Combine evidence from the features into a hypothesis:
add health features together to get a “healthiness factor”.
Finally, classify the item based on the evidence: if the
“healthiness factor” is above a certain value, then deem it
healthy.
Ontologies: an ontology is a labeling or categorization scheme.
Examples:
Binary (spam, not spam); Multi-valued (red, green, blue);
Hierarchical (news/local/sports).
Different classification tasks require different ontologies.
Linear and Non linear classifier
• A linear classifier is a model that decides how to categorize a
set of data points into a discrete class based on a linear combination
of its explanatory variables.
• As an example, combining details about a dog such as weight,
height, color and other features would be used by a model to decide
its species.
• In the field of ML, the goal of statistical classification is to use an
object's characteristics to identify which class (or group) it belongs
to.
• A linear classifier achieves this by making a classification decision
based on the value of a linear combination of the characteristics.
• An object's characteristics are also known as feature values and
are typically presented to the machine in a vector called a feature
vector.
• Such classifiers work well for practical problems such as document
classification, and more generally for problems with many variables
(features),
• reaching accuracy levels comparable to non-linear classifiers while
taking less time to train and use.
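The decision described above can be sketched as a weighted sum of the feature vector followed by a threshold. The weights, bias, and inputs below are made-up values for illustration:

```python
# Linear classifier: classify from the sign of the linear combination
# w . x + b of the feature vector x.
def linear_classify(weights, bias, features):
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

weights = [0.4, -0.2]  # one weight per feature (hypothetical values)
bias = -1.0
print(linear_classify(weights, bias, [5.0, 2.0]))  # score 0.6 > 0 -> class 1
print(linear_classify(weights, bias, [1.0, 3.0]))  # score -1.2 -> class 0
```

Training a linear classifier amounts to choosing the weights and bias from labeled examples; the decision rule itself stays this simple.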
Regression and classification
These are both related to prediction, where regression
predicts a value from a continuous set (can be measured in
fractions and decimals)., It is mainly used for prediction,
forecasting, time series modeling, and determining the
causal-effect relationship between variables.
In Regression, we plot a graph between the variables which
best fits the given data points, using this plot, the ML model
can make predictions about the data.
whereas classification predicts 'belonging' to a
class. E.g.: predicting the price of a house depending on its 'size' (sq.
feet or whatever unit) is regression. If instead the prediction of price
is in words, viz. 'very costly', 'costly', 'affordable',
'cheap', and 'very cheap', this relates to classification.
Each class may correspond to some range of values;
and, say, the 'location' of the house being some 'numerical
value' relates to regression.
Regression Vs Classification
Regression: given a set of data, find the best relationship
that represent the set of data.
Classification: given a known relationship, identify the class
that the data belongs to.
We can see that regression and classification start from
opposing ends:
to find a pattern, or to find the pattern that it belongs to.
But the result is what would make us to choose between the
two.
For example, simple hard classifiers just try to put the
example into a specific class (e.g. SVM) (for instance, whether a
project is profitable or not, without accounting for how
much),
whereas regression can give an exact figure of profit as
some continuous value.
Regression Vs Classification
Classification trees have dependent variables
that are categorical and unordered.
Regression trees have dependent variables that
are continuous values or ordered whole values.
Regression means to predict the output value
using training data.
Classification means to group the output into a
class.
E.g. we use Regression to predict the house
price from training data
and use classification to predict the type of
tumor i.e. harmful or not harmful using training
data.
Regression Vs Classification
In case of classification we can consider probabilistic
models (eg logistic regression) where each class or
label has some probability which can be weighted by
the cost associated with each label or class
and thus give us with final value on basis of which we
can decide to put it some label or not.
(For e.g., label A has a probability of 0.3 but the payoff is
huge (1000), whereas label B has a probability of 0.7 but the
payoff is very low (10).)
So for maximizing the profit we might label the
example as label A instead of B.
Classification and Regression
The Formula for Linear Regression
we know the formula: y = mx + b and represents the slope-
intercept of a straight line. ‘y’ and ‘x’ represent variables,
‘m’ describes the slope of the line and ‘b’ describe the y-
intercept, where the line crosses the y-axis.
For Linear Regression, ‘y’ represents the dependent
variable,
‘x’ represents the independent variable,
β0 represents the y-intercept
and β1 represents the slope,
which describes the relationship between the independent
variable and the dependent variable.
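The intercept and slope can be estimated from data by ordinary least squares. A minimal sketch using the standard closed-form formulas; the data points are made up and chosen to lie exactly on a line:

```python
# Ordinary least squares for simple linear regression y = b0 + b1*x:
#   b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
#   b0 = mean_y - b1 * mean_x
def fit_line(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

xs = [1, 2, 3, 4]  # made-up points lying exactly on y = 2x + 1
ys = [3, 5, 7, 9]
print(fit_line(xs, ys))  # (1.0, 2.0): intercept 1, slope 2
```

With noisy data the same formulas return the line of best fit rather than an exact match.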
Linear Regression
 The use of LR is to make predictions on continuous
dependent variables with the assistance and knowledge
from independent variables & to find the line of best fit,
which can accurately predict the output of future events
for continuous dependent variables.
It is a statistical method used in (DS & ML) for predictive
analysis.
 Simple Linear Regression is a regression model that
estimates the relationship between one single
independent variable and one dependent variable using
a straight line.
 If there are more than two independent variables, we
then call this Multiple Linear Regression.
 Using the strategy of the line of best fits helps us to
understand the relationship between the dependent and
independent variable; which should be of linear nature.
Regression Models
Regression models are helpful for
predicting numerical values based on
different data points, such as sales revenue
projections for a given business.
• Some popular regression algorithms are
linear regression,
• logistic regression and
• polynomial regression.
• Linear Regression is used for solving
Regression problems,
• whereas Logistic regression is used for
solving the classification problems.
Linear Vs Logistic Regression
In logistic Regression, we predict the values of categorical
variables.
Here x in the curve is called the independent variable,
predictor variable, or explanatory variable because it has a
known value.
Logistic Regression
In general, logistic regression explores how independent variables
affect one dependent variable by looking at historical data values of
both variables.
Logistic regression can predict the probability of an event based on
known conditions
Binary logistic regression is used to predict the probability of a binary
outcome, such as yes or no, true or false, or 0 or 1.
For example, it could be used to predict whether a customer will churn
or not, whether a patient has a disease or not, or whether a loan will be
repaid or not.
(Definition: Churn is a measurement of the percentage of accounts that
cancel or choose not to renew their subscriptions.)
target variable here is 'Churn' which will tell us whether or not a
particular customer has churned.
It is a binary variable - 1 means that the customer has churned and 0
means the customer has not churned.
Advantages: Can handle both categorical and continuous predictor
variables.
Disadvantages: Assumes linearity, requires large sample size, prone to
overfitting.
Logistic Regression as most popular M L algorithms
Logistic Regression which comes under the Supervised Learning technique.
Logistic regression predicts the output of a categorical dependent variable;
therefore the outcome must be a categorical or discrete value.
It can be either Yes or No, 0 or 1, True or False, etc.,
but instead of giving the exact value as 0 and 1,
it gives probabilistic values which lie between 0 and 1.
Logistic Regression is much like Linear Regression except in how
they are used.
In Logistic regression, instead of fitting a regression line, we fit an "S" shaped
logistic function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something such
as whether the cells are cancerous or not, a mouse is obese or not based on its
weight, etc.
Logistic Regression is based on Maximum Likelihood Estimation, which is a
method of estimating the parameters of an assumed probability distribution,
given some observed data
Logistic Regression is a significant machine learning algorithm
because it has the ability to provide probabilities and classify new data using
continuous and discrete datasets.
Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used for the
classification.
Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
It maps any real value into another value within a range of 0 and 1.
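The mapping can be written as σ(z) = 1 / (1 + e^(−z)). A minimal sketch of the function and its behavior at a few sample inputs:

```python
import math

# Sigmoid: maps any real value into the range (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # 0.5: an even chance
print(sigmoid(4))   # close to 1: strong evidence for the positive class
print(sigmoid(-4))  # close to 0: strong evidence for the negative class
```

Thresholding the output at 0.5 turns the probability into a hard 0/1 classification.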
Assumptions for Logistic Regression:
The dependent variable must be categorical in nature.
The independent variable should not have multi-collinearity.
Multicollinearity is a statistical concept where several independent
variables in a model are correlated.
Least Mean Squares (LMS) Algorithm
A gradient descent algorithm that incrementally
updates the weights of a linear function in an attempt to
minimize the mean squared error
Until weights converge :
For each training example b do :
1) Compute the error:
   error(b) = Vtrain(b) − V(b)
2) For each board feature, fi, update its weight, wi:
   wi ← wi + c · fi · error(b)
for some small constant (learning rate) c
LMS Discussion
Intuitively, LMS executes the following rules:
» If the output for an example is correct, make no
change.
» If the output is too high, lower the weights
proportional to the values of their corresponding
features, so the overall output decreases
» If the output is too low, increase the weights
proportional to the values of their corresponding
features, so the overall output increases.
Under the proper weak assumptions, LMS can be
proven to eventually converge to a set of weights
that minimizes the mean squared error.
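The LMS rules above can be sketched as a training loop over examples. A minimal example; the two-feature data, learning rate c, and epoch count are illustrative, with the data generated from a known linear target so convergence is easy to see:

```python
# LMS: for each example b, compute error(b) = Vtrain(b) - V(b),
# then update each weight: wi <- wi + c * fi * error(b).
def lms_train(examples, targets, epochs=500, c=0.05):
    weights = [0.0] * len(examples[0])
    for _ in range(epochs):
        for features, target in zip(examples, targets):
            predicted = sum(w * f for w, f in zip(weights, features))
            error = target - predicted          # error(b)
            weights = [w + c * f * error        # wi <- wi + c * fi * error(b)
                       for w, f in zip(weights, features)]
    return weights

# Illustrative data generated by the target function V(b) = 2*f1 + 3*f2
examples = [[1, 0], [0, 1], [1, 1], [2, 1]]
targets = [2, 3, 5, 7]
print(lms_train(examples, targets))  # weights approach [2.0, 3.0]
```

Each pass follows the intuition in the slide: outputs that are too high push the weights down, outputs that are too low push them up, in proportion to the feature values.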
The Formula for Sigmoid Function
σ(z) = 1 / (1 + e^(−z))
Logistic Regression is based on Maximum Likelihood
Estimation,
which is a method of estimating the parameters of an
assumed probability distribution, given some observed data.
Cost Function
A Cost Function is a mathematical formula used to calculate
the error: the difference between our predicted value
and the actual value.
It simply measures how wrong the model is in terms of its
ability to estimate the relationship between x and y.
Cost Function of a Linear & Non-Linear Regression
The cost function of linear regression is mean squared error (MSE)
(or its square root, root mean squared error, RMSE).
MSE measures the average squared difference between an observation’s actual
and predicted values.
The cost will be output as a single number which is associated with our current
set of weights.
The reason we use a cost function is to improve the accuracy of the model;
minimising MSE does this.
Logistic Regression
The Cost Function of a Logistic Regression cannot use MSE because our prediction
function is non-linear (due to sigmoid transform).
Therefore we use a cost function called Cross-Entropy, also known as Log Loss.
Cross-entropy measures the difference between two probability distributions for a
given random variable or set of events.
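For binary labels, the cross-entropy (log loss) of a prediction p against a true label y is −[y·log(p) + (1−y)·log(1−p)], averaged over the examples. A minimal sketch; the labels and predicted probabilities are made up:

```python
import math

# Binary cross-entropy (log loss) between true labels y (0 or 1)
# and predicted probabilities p, averaged over the examples.
def log_loss(y_true, y_prob):
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / len(y_true)

print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))  # low loss: confident and correct
print(log_loss([1, 0, 1], [0.2, 0.9, 0.3]))  # high loss: confident and wrong
```

Unlike MSE, this loss grows without bound as a confident prediction approaches the wrong label, which is why it pairs naturally with the sigmoid output of logistic regression.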
Type of Logistic Regression
On the basis of the categories, Logistic Regression
can be classified into three types:
Binomial:
In binomial Logistic regression, there can be only
two possible types of the dependent variables,
such as 0 or 1, Pass or Fail, etc.
Multinomial:
In multinomial Logistic regression, there can be 3
or more possible unordered types of the
dependent variable, such as "cat", "dogs", or
"sheep"
Ordinal: In ordinal Logistic regression, there can
be 3 or more possible ordered types of dependent
variables, such as "low", "Medium", or "High".
Linear Vs Logistic Regression

Linear Regression                           Logistic Regression
Used to predict the continuous dependent    Used to predict the categorical dependent
variable using a given set of independent   variable using a given set of independent
variables.                                  variables.

The outputs produced must be a continuous   The outputs produced must be categorical
value, such as price and age.               values, such as 0 or 1, Yes or No.

The relationship between the dependent      The relationship DOES NOT need to be
variable and independent variable must be   linear between the dependent and
linear.                                     independent variables.

Used for solving Regression problems.       Used for solving Classification problems.

We are finding and using the line of best   We are using the S-curve (Sigmoid) to
fit to help us easily predict outputs.      help us classify predicted outputs.

Least square estimation method is used      Maximum likelihood estimation method is
for the estimation of accuracy.             used for the estimation of accuracy.

There is a possibility of collinearity      There should not be any collinearity
between the independent variables.          between the independent variables.
Applications for Supervised Learning
Situations where there is no human expert
» x: Bond graph for a new molecule.
» f(x): Predicted binding strength to AIDS protease molecule.
Situations where humans can perform the task but can't
describe how they do it.
» x: Bitmap picture of hand-written character
» f(x): Ascii code of the character
Situations where the desired function is changing frequently
» x: Description of stock prices and trades for last 10 days.
» f(x): Recommended stock transactions
Situations where each user needs a customized function f
» x: Incoming email message.
» f(x): Importance score for presenting to user (or deleting
without presenting).
Supervised Learning
Given: Training examples (x; f(x)) for some unknown
function f
Find: A good approximation to f.
Example Applications
Handwriting Recognition
» x: Data from pen motion.
» f(x): Letter of the alphabet.
Disease diagnosis
» x: Properties of patient (symptoms, lab tests)
» f(x): Disease (or maybe, recommended therapy)
Face recognition
» x: Bitmap picture of person's face
» f(x): Name of the person.
Spam Detection
» x: Email message
» f(x): Spam or not spam.
Supervised Learning: Uses
Prediction of future cases:
Use the rule to predict the output for future
inputs
Knowledge extraction: The rule is easy to
understand
Compression: The rule is simpler than the
data it explains
Outlier detection: Exceptions that are not
covered by the rule, e.g., fraud
Unsupervised Learning
We are interested in capturing inherent
organization in the data
» clustering,
» density estimation
No output
Example applications
» Customer segmentation in CRM
» Image compression: Color quantization
» Bioinformatics: Learning motifs
Supervised Vs Unsupervised Learning
Supervised learning algorithms try to model
relationship and dependencies between the target
prediction output and the input features,
such that we can predict the output values for new
data based on those relationships,
which it has learned from previous datasets fed.
Unsupervised learning, another type of ML are the
family of ML algorithms,
which have main uses in pattern detection and
descriptive modeling.
These algorithms do not have output categories
or labels on the data
(the model trains with unlabeled data).
Supervised Vs Unsupervised Learning
Supervised learning can be separated into two types of
problems:
classification and regression:
Classification problems use an algorithm to accurately
assign test data into specific categories, such as
separating apples from oranges.
Or, in the real world, supervised learning algorithms can be
used to classify spam in a separate folder from your inbox.
Linear classifiers,
support vector machines,
decision trees
and random forest
are all common types of classification algorithms.
Uses of Unsupervised learning models for three main tasks:
•Clustering is a data mining technique for grouping unlabeled data based
on their similarities or differences.
•For example, K-means clustering algorithms assign similar data points into
groups, where the K value represents the size of the grouping and
granularity.
•This technique is helpful for market segmentation, image compression, etc.
•Association is another type of unsupervised learning method that uses
different rules to find relationships between variables in a given dataset.
These methods are frequently used for market basket analysis and
recommendation engines, along the lines of “Customers Who Bought This
Item Also Bought” recommendations.
•Dimensionality reduction is a learning technique used when the number
of features (or dimensions) in a given dataset is too high.
•It reduces the number of data inputs to a manageable size while also
preserving the data integrity.
•Often, this technique is used in the data preprocessing stage, such as
when autoencoders remove noise from visual data to improve picture
quality.
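The K-means step mentioned above can be sketched in plain Python: repeatedly assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. The 1-D data and starting centroids are made up for illustration:

```python
# Minimal 1-D K-means: assign each point to its nearest centroid,
# then recompute each centroid as the mean of its assigned points.
def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]  # two obvious groups (made up)
print(kmeans_1d(points, [0.0, 5.0]))       # centroids settle near 1.0 and 10.0
```

Real implementations work on multi-dimensional feature vectors and pick K and the starting centroids more carefully, but the assign-then-recompute loop is the same.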
key differences between
Supervised and unsupervised learning
Goals:
In supervised learning, the goal is to predict outcomes for
new data. You know up front the type of results to expect.
With an unsupervised learning algorithm, the goal is to get
insights from large volumes of new data.
The ML itself determines what is different or interesting
from the dataset.
Applications:
Supervised learning models are ideal for spam detection,
sentiment analysis, weather forecasting and pricing
predictions, among other things.
In contrast, unsupervised learning is a great fit for anomaly
detection, recommendation engines, customer personas
and medical imaging.
key differences between supervised and
unsupervised learning
Complexity:
Supervised learning is a simple method for machine
learning, typically calculated through the use of programs
like R or Python.
In unsupervised learning, you need powerful tools for
working with large amounts of unclassified data.
Unsupervised learning models are computationally
complex because they need a large training set to
produce intended outcomes.
Drawbacks:
Supervised learning models can be time-consuming to
train, and the labels for input and output variables require
expertise.
Meanwhile, unsupervised learning methods can have
wildly inaccurate results unless you have human
intervention to validate the output variables.