
Please read this disclaimer before proceeding:

This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only for the respective group /
learning community for which it is intended. If you are not the addressee, you should not
disseminate, distribute, or copy it through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient, you are
notified that disclosing, copying, distributing, or taking any action in reliance on
the contents of this information is strictly prohibited.
22IT401 - ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Department: Information Technology


Batch/Year: 2022 – 26 / II Year
Created by:
Dr. T. Mahalingam
Dr. S. Selvakanmani

Date: 01.02.2024
Table of Contents

S.NO.  CONTENTS  (SLIDE NO.)
1  CONTENTS  (5)
2  COURSE OBJECTIVES  (7)
3  PRE REQUISITES (COURSE NAMES WITH CODE)  (8)
4  SYLLABUS (WITH SUBJECT CODE, NAME, LTPC DETAILS)  (9)
5  COURSE OUTCOMES (6)  (13)
6  CO-PO/PSO MAPPING  (14)
7  LECTURE PLAN – UNIT 4  (15)
8  ACTIVITY BASED LEARNING – UNIT 4  (17)
9  LECTURE NOTES – UNIT 4
10  Motivation for Machine Learning, Applications, Machine Learning  (19)
11  Learning associations, Classification, Regression  (23)
12  The Origin of machine learning, Uses and abuses of machine learning, Success cases  (26)
13  How do machines learn, Abstraction and knowledge representation, Generalization  (35, 37)
14  Factors to be considered, Assessing the success of learning, Metrics for evaluation of classification method  (42)
15  Steps to apply machine learning to data  (43)
16  Machine learning process, Input data and ML algorithm  (44)
17  Classification of machine learning algorithms  (44)
18  General ML architecture  (44)
19  Group of algorithms, Reinforcement learning, Supervised learning, Unsupervised learning, Semi-Supervised learning, Algorithms  (45)
20  Ensemble learning, Matching data to an appropriate algorithm  (47)
21  ASSIGNMENT – UNIT 4  (52)
22  PART A Q & A (WITH K LEVEL AND CO)  (53)
23  PART B Qs (WITH K LEVEL AND CO)  (58)
24  SUPPORTIVE ONLINE CERTIFICATION COURSES  (59)
25  REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY  (60)
26  CONTENTS BEYOND THE SYLLABUS  (63)
27  ASSESSMENT SCHEDULE  (65)
28  PRESCRIBED TEXT BOOKS & REFERENCE BOOKS  (66)
29  MINI PROJECT SUGGESTIONS  (67)

2. COURSE OBJECTIVES

• Understand the concept of agents, problem solving and searching strategies.
• Familiarize with knowledge reasoning and representation based AI systems and approaches.
• Apply the aspect of the probabilistic approach to AI.
• Understand the concepts of machine learning approaches.
• Recognize the concepts of machine learning and its deterministic tools.
3. PRE REQUISITES

PRE-REQUISITE CHART

22IT401 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Pre-requisite courses:
• 22MA401 – Probability and Statistics
• 22CS303 – Design and Analysis of Algorithms
• 22CS102 – Problem Solving using C++
4. 22IT401 ARTIFICIAL INTELLIGENCE AND MACHINE
LEARNING

OBJECTIVES
• Understand the concept of Artificial Intelligence
• Familiarize with Knowledge based AI systems and approaches
• Apply the aspect of Probabilistic approach to AI
• Identify the Neural Networks and NLP in designing AI models
• Recognize the concepts of Machine Learning and its deterministic tools
UNIT 1 PROBLEM SOLVING AND SEARCH STRATEGIES
Introduction: What Is AI, The Foundations Of Artificial Intelligence, The
History Of Artificial Intelligence, The State Of The Art. Intelligent Agents: Agents And
Environments, Good Behaviour: The Concept Of Rationality, The Nature Of
Environments, And The Structure Of Agents. Solving Problems By Searching: Problem-
Solving Agents, Uninformed Search Strategies, Informed (Heuristic) Search Strategies,
Heuristic Functions. Beyond Classical Search: Local Search Algorithms and Optimization
Problems, Searching With Nondeterministic Actions And Partial Observations, Online
Search Agents And Unknown Environments. Constraint Satisfaction Problems:
Definition, Constraint Propagation, Backtracking Search, Local Search, The Structure Of
Problems.
List of Exercise/Experiments
1. Implementation of uninformed search algorithm (BFS and DFS).
2. Implementation of Informed Search algorithm (A* and Hill
Climbing Algorithm)

UNIT 2 KNOWLEDGE REPRESENTATION AND REASONING


Logical Agents: Knowledge-Based Agents, Propositional Logic,
Propositional Theorem Proving, Effective Propositional Model Checking, Agents Based
on Propositional Logic. First-Order Logic: Syntax and Semantics, Knowledge Engineering
in FOL, Inference in First-Order Logic, Unification and Lifting, Forward Chaining,
Backward Chaining, Planning: Definition, Algorithms, Planning Graphs, Hierarchical
Planning, Multi-agent Planning. Knowledge Representation: Ontological Engineering,
Categories and Objects, Events, Mental Events and Mental Objects, Reasoning Systems
for Categories, Reasoning with Default Information, The Internet Shopping World.
List of Exercise/Experiments
1. Implementation of forward and backward chaining.
2. Implementation of unification algorithms.

UNIT 3 LEARNING
Learning from Examples: Forms of Learning, Supervised Learning, Learning
Decision Trees, Evaluating and Choosing the Best Hypothesis, The Theory of Learning,
Regression and Classification with Linear Models, Artificial Neural Networks. Applications:
Human computer interaction (HCI), Knowledge management technologies, AI for customer
relationship management, Expert systems, Data mining, text mining, and Web mining,
Other current topics.
List of Exercise/Experiments
1. NumPy Operations
2. NumPy arrays
3. NumPy Indexing and Selection
4. NumPy Exercise (see the illustrative sketch after this list):
(i) Write code to create a 4x3 matrix with values ranging from 2 to 13.
(ii) Write code to replace the odd numbers with -1 in the given array.
(iii) Perform the following operations on an array of mobile phone prices 6999,
7500, 11999, 27899, 14999, 9999.
a) Create a 1-D array of mobile phone prices
b) Convert this array to float type
c) Append a new mobile phone with a price of Rs. 13999 to this array
d) Reverse this array of mobile phone prices
e) Apply GST of 18% on mobile phone prices and update this array
f) Sort the array in descending order of price
g) What is the average mobile phone price?
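A minimal NumPy sketch for the exercise above; the array in part (ii) is assumed to be np.arange(10), since the original array is not reproduced here.

```python
import numpy as np

# (i) 4x3 matrix with values ranging from 2 to 13
matrix = np.arange(2, 14).reshape(4, 3)

# (ii) Replace the odd numbers with -1 (assumed input array)
arr = np.arange(10)
arr[arr % 2 == 1] = -1

# (iii) Operations on an array of mobile phone prices
prices = np.array([6999, 7500, 11999, 27899, 14999, 9999])  # a) 1-D array
prices = prices.astype(float)                                # b) convert to float
prices = np.append(prices, 13999.0)                          # c) append Rs. 13999
reversed_prices = prices[::-1]                               # d) reverse the array
prices = prices * 1.18                                       # e) apply 18% GST
sorted_desc = np.sort(prices)[::-1]                          # f) sort descending
average_price = prices.mean()                                # g) average price

print(matrix, arr, sorted_desc, average_price, sep="\n")
```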


UNIT 4 FUNDAMENTALS OF MACHINE LEARNING


Motivation for Machine Learning, Applications, Machine Learning,
Learning associations, Classification, Regression, The Origin of machine learning, Uses
and abuses of machine learning, Success cases, How do machines learn, Abstraction
and knowledge representation, Generalization, Factors to be considered, Assessing the
success of learning, Metrics for evaluation of classification method, Steps to apply
machine learning to data, Machine learning process, Input data and ML algorithm,
Classification of machine learning algorithms, General ML architecture, Group of
algorithms, Reinforcement learning, Supervised learning, Unsupervised learning, Semi-
Supervised learning, Algorithms, Ensemble learning, Matching data to an appropriate
algorithm.
List of Exercise/Experiments
1. Build linear regression models to predict housing prices using Python, with a dataset
available in Google Colab (see the sketch below).
2. Stock Ensemble-based Neural Network for Stock Market Prediction using
Historical Stock Data and Sentiment Analysis.
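A minimal sketch for exercise 1, assuming the California housing dataset bundled with scikit-learn as a stand-in for the Colab dataset (the actual dataset and features may differ).

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load housing features and prices, then hold out a test split
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)            # learn regression coefficients
predictions = model.predict(X_test)    # predict prices for unseen houses
print("R^2 on test data:", r2_score(y_test, predictions))
```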
UNIT 5 MACHINE LEARNING AND TYPES
Supervised Learning, Regression, Linear regression, Multiple linear
regression, A multiple regression analysis, The analysis of variance for multiple
regression, Examples for multiple regression, Overfitting, Detecting overfit models:
Cross validation, Cross validation: The ideal procedure, Parameter estimation, Logistic
regression, Decision trees: Background, Decision trees, Decision trees for credit card
promotion, An algorithm for building decision trees, Attribute selection measure:
Information gain, Entropy, Decision Tree: Weekend example, Occam’s Razor,
Converting a tree to rules, Unsupervised learning, Semi Supervised learning,
Clustering, K – means clustering, Automated discovery, Reinforcement learning, Multi-
Armed Bandit algorithms, Influence diagrams, Risk modelling, Sensitivity analysis,
Causal learning.


List of Exercise/Experiments
Use Cases
Case Study 1: Churn Analysis and Prediction (Survival Modelling)
Cox-proportional models
Churn Prediction
Case Study 2: Credit card Fraud Analysis
Imbalanced Data
Neural Network
Case study 3: Sentiment Analysis or Topic Mining from New York Times
Similarity measures (Cosine Similarity, Chi-Square, N Grams)
Part-of-Speech Tagging
Stemming and Chunking
Case Study 4: Sales Funnel Analysis
A/B testing
Campaign effectiveness, Web page layout effectiveness
Scoring and Ranking
Case Study 5: Recommendation Systems and Collaborative filtering
User based
Item Based
Singular value decomposition–based recommenders
Case Study 6: Customer Segmentation and Value
Segmentation Strategies
Lifetime Value
Case Study 7: Portfolio Risk Conformance
Risk Profiling
Portfolio Optimization
Case Study 8: Uber Alternative Routing
Graph Construction
Route Optimization

5. COURSE OUTCOMES

Course Outcome Statements in Cognitive Domain

Course Code | Course Outcome Statement | Cognitive/Affective Level of the Course Outcome | Expected Level of Attainment
C211.1 | Explain the problem solving and search strategies. | Understand (K2) | 70%
C211.2 | Demonstrate the techniques for knowledge representation and reasoning. | Apply (K3) | 70%
C211.3 | Interpret various forms of learning, artificial neural networks and its applications. | Apply (K3) | 70%
C211.4 | Experiment various machine learning algorithms. | Analyse (K4) | 70%
C211.5 | Employ AI and machine learning algorithms to solve real world problems. | Apply (K3) | 70%
6. CO-PO/PSO MAPPING

Correlation Matrix of the Course Outcomes to Programme Outcomes and Programme Specific Outcomes Including Course Enrichment Activities

Programme Outcomes (POs) PO1–PO12 and Programme Specific Outcomes (PSOs) PSO1–PSO3

Course Outcomes (COs) | Correlation levels
C211.1 (K2) | 2, 1, 1, 1, 3, 3
C211.2 (K3) | 3, 2, 1, 1, 3, 3
C211.3 (K3) | 3, 2, 1, 1, 3, 2, 3
C211.4 (K4) | 3, 3, 2, 2, 3, 3
C211.5 (K3) | 3, 2, 1, 1, 3, 2, 3
LECTURE PLAN – UNIT IV
UNIT IV FUNDAMENTALS OF MACHINE LEARNING

Sl. No. | Topic | No. of Periods | Pertaining CO(s) | Taxonomy Level | Mode of Delivery
1 | Motivation for Machine Learning, Applications, Machine Learning | 1 | CO2 | K2 | MD1
2 | Learning associations, Classification, Regression | 1 | CO2 | K2 | MD1
3 | The Origin of machine learning, Uses and abuses of machine learning, Success cases | 1 | CO2 | K2 | MD1
4 | How do machines learn, Abstraction and knowledge representation, Generalization | 1 | CO2 | K3 | MD1
5 | Factors to be considered, Assessing the success of learning, Metrics for evaluation of classification method | 1 | CO2 | K3 | MD1
6 | Steps to apply machine learning to data | 1 | CO2 | K2 | MD1
7 | Machine learning process, Input data and ML algorithm, General ML architecture | 1 | CO2 | K2 | MD1
8 | Group of algorithms, Reinforcement learning, Supervised learning, Unsupervised learning, Semi-Supervised learning, Algorithms | 1 | CO2 | K2 | MD1
9 | Ensemble learning, Matching data to an appropriate algorithm | 1 | CO2 | K2 | MD1
LECTURE PLAN – UNIT IV

ASSESSMENT COMPONENTS
AC 1. Unit Test
AC 2. Assignment
AC 3. Course Seminar
AC 4. Course Quiz
AC 5. Case Study
AC 6. Record Work
AC 7. Lab / Mini Project
AC 8. Lab Model Exam
AC 9. Project Review

MODE OF DELIVERY
MD 1. Oral Presentation
MD 2. Tutorial
MD 3. Seminar
MD 4. Hands On
MD 5. Videos
MD 6. Field Visit
Activity Based Learning - Semantris

Using the game "Semantris" from Google's AI Experiments, students can see how AI
uses NLP datasets to work out the best word associations.
UNIT IV
FUNDAMENTALS OF MACHINE LEARNING

Lecture Notes
UNIT 4 FUNDAMENTALS OF MACHINE LEARNING

The motivation behind ML


Consider two ways in which a person can learn a foreign language:

 Learning the language rules by heart, using textbooks, dictionaries, and so on. That's
how college students usually do it.

 Observing live language: by communicating with native speakers, reading books, and
watching movies. That's how children do it.

 In both cases, you build in your mind the language model, or, as some prefer to say, develop a
sense of language.
 In the first case, you are trying to build a logical system based on rules. In this case, you will
encounter many problems: the exceptions to the rule, different dialects, borrowing from other
languages, idioms, and lots more. Someone else, not you, derived and described for you
the rules and structure of the language.
 In the second case, you derive the same rules from the available data. You may not even be
aware of the existence of these rules, but gradually adjust yourself to the hidden structure and
understand the laws. You use your special brain cells called mirror neurons, trying to mimic
native speakers. This ability is honed by millions of years of evolution. After some
time, when facing the wrong word usage, you just feel that something is wrong but you can't
tell immediately what exactly.
 In any case, the next step is to apply the resulting language model in the real world. Results
may differ. In the first case, you will experience difficulty every time you find the missing
hyphen or comma, but may be able to get a job as a proofreader at a publishing house. In the
second case, everything will depend on the quality, diversity, and amount of the data on
which you were trained. Just imagine a person in the center of New York who studied English
through Shakespeare. Would he be able to have a normal conversation with people around
him?

 Now we'll put the computer in place of the person in our example. Two approaches, in this
case, represent the two programming techniques. The first one corresponds to writing ad hoc
algorithms consisting of conditions, cycles, and so on, by which a programmer expresses
rules and structures. The second one represents ML , in which case the computer itself
identifies the underlying structure and rules based on the available data.
 The analogy is deeper than it seems at first glance. For many tasks, building the algorithms
directly is impossibly hard because of the variability in the real world. It
may require the work of experts in the domain, who must describe all rules and edge cases
explicitly. Resulting models can be fragile and rigid. On the other hand, this same task can
be solved by allowing computers to figure out the rules on their own from a reasonable
amount of data. An example of such a task is face recognition. It's virtually impossible to
formalize face recognition in terms of conventional imperative algorithms and data
structures. Only recently was the task successfully solved, with the help of ML.

What is ML ?

 ML is a subdomain of AI that has demonstrated significant progress over the last
decade, and remains a hot research topic. It is a branch of knowledge concerned with
building algorithms that can learn from data and improve themselves with regards to the
tasks they perform. ML allows computers to deduce the algorithm for some task or to
extract hidden patterns from data. ML is known by several different names in different
research communities: predictive analytics, data mining, statistical learning, pattern
recognition, and so on. One can argue that these terms have some subtle differences, but
essentially, they all overlap to the extent that you can use the terminology interchangeably.

Applications of Machine learning

Machine learning is a buzzword in today's technology, and it is growing very rapidly day by
day. We use machine learning in our daily life even without knowing it, in tools such as Google
Maps, Google Assistant, Alexa, and so on. Below are some of the most trending real-world
applications of machine learning:
1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to identify
objects, persons, places, digital images, etc. The popular use case of image recognition and face
detection is, Automatic friend tagging suggestion:

Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload a photo with
our Facebook friends, we automatically get a tagging suggestion with names, and the technology
behind this is machine learning's face detection and recognition algorithm.

It is based on the Facebook project named "DeepFace," which is responsible for face recognition and
person identification in the picture.
2. Speech Recognition

While using Google, we get the option of "Search by voice"; it comes under speech
recognition, and it's a popular application of machine learning.

Speech recognition is a process of converting voice instructions into text, and it is also
known as "Speech to text", or "Computer speech recognition." At present, machine
learning algorithms are widely used by various applications of speech recognition.
Google assistant, Siri, Cortana, and Alexa are using speech recognition technology to
follow the voice instructions.

3. Traffic prediction:

If we want to visit a new place, we take help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions.

It predicts the traffic conditions, such as whether traffic is cleared, slow-moving, or heavily
congested, with the help of two things:

o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time

Everyone who uses Google Maps is helping to make the app better. It takes information
from the user and sends it back to its database to improve the performance.

4. Product recommendations:

Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search
for some product on Amazon, we start getting advertisements for the same
product while surfing the internet in the same browser, and this is because of machine
learning.

Google understands the user's interests using various machine learning algorithms and suggests
products as per the customer's interest.

Similarly, when we use Netflix, we find some recommendations for entertainment series,
movies, etc., and this is also done with the help of machine learning.
5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars. Machine
learning plays a significant role in self-driving cars. Tesla, a popular car
manufacturing company, is working on self-driving cars. It uses an unsupervised learning
method to train the car models to detect people and objects while driving.

6. Email Spam and Malware Filtering:

Whenever we receive a new email, it is filtered automatically as important, normal, or
spam. We always receive important mail in our inbox marked with the important symbol and
spam emails in our spam box, and the technology behind this is machine learning. Below
are some spam filters used by Gmail:

o Content Filter
o Header Filter
o General Blacklists Filter
o Rules-based Filters
o Permission Filters

Some machine learning algorithms, such as the Multi-Layer Perceptron, Decision Tree,
and Naïve Bayes classifier, are used for email spam filtering and malware detection.
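As an illustration (not Gmail's actual pipeline), a Naïve Bayes spam filter can be sketched with scikit-learn; the tiny dataset below is made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "lowest price guaranteed, click here",
          "meeting rescheduled to monday", "please review the attached report"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)        # bag-of-words features

classifier = MultinomialNB()
classifier.fit(X, labels)                   # learn word frequencies per class

test = vectorizer.transform(["click here to win a prize"])
print(classifier.predict(test))             # -> [1], flagged as spam
```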

7. Virtual Personal Assistant:

We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri.
As the name suggests, they help us find information using our voice instructions.
These assistants can help us in various ways just by our voice instructions, such as playing
music, calling someone, opening an email, scheduling an appointment, etc.

Association Rule Learning

Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another data item and maps them accordingly so that the result can be
more profitable. It tries to find some interesting relations or associations among the
variables of a dataset. It is based on different rules to discover the interesting relations between
variables in the database.

Association rule learning is one of the very important concepts of machine
learning, and it is employed in Market Basket analysis, Web usage mining,
continuous production, etc. Here, market basket analysis is a technique used by
various big retailers to discover the associations between items. We can understand it
by taking the example of a supermarket: in a supermarket, all products that are
purchased together are put together.

For example, if a customer buys bread, he most likely will also buy butter, eggs, or
milk, so these products are stored within a shelf or mostly nearby.

Association rule learning can be divided into three types of algorithms:

1. Apriori
2. Eclat
3. F-P Growth Algorithm

We will understand these algorithms in later chapters.

How does Association Rule Learning work?
Association rule learning works on the concept of an If-Then statement, such as "if A then B".
Here the If element is called the antecedent, and the Then statement is called the consequent. These
types of relationships, where we can find some association or relation between two items,
are known as single cardinality. It is all about creating rules, and if the number of items
increases, then the cardinality also increases accordingly. So, to measure the associations
between thousands of data items, there are several metrics. These metrics are given below:

o Support
o Confidence
o Lift

Let's understand each of them:

Support

Support is the frequency of A, or how frequently an itemset appears in the dataset. It is defined
as the fraction of the transactions T that contain the itemset X.
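In standard notation, with Freq(X) denoting the number of transactions containing X and |T| the total number of transactions, this is commonly written as:

$$\mathrm{Support}(X) = \frac{\mathrm{Freq}(X)}{|T|}$$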

Confidence

Confidence indicates how often the rule has been found to be true, or how often the
items X and Y occur together in the dataset when the occurrence of X is already given. It
is the ratio of the number of transactions that contain both X and Y to the number of
transactions that contain X.
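In the same notation, the confidence of a rule X ⇒ Y is commonly written as:

$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Freq}(X \cup Y)}{\mathrm{Freq}(X)} = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)}$$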
Lift

It is the strength of any rule. It is the ratio of the observed support measure to the expected
support if X and Y were independent of each other. It has three possible values:
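In the same notation, the formula and the conventional reading of the three values are:

$$\mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)\times\mathrm{Support}(Y)}$$

o Lift = 1: X and Y are independent of each other.
o Lift > 1: X and Y are positively correlated; buying X makes buying Y more likely.
o Lift < 1: X and Y are negatively correlated; buying X makes buying Y less likely.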

Applications of Association Rule Learning

It has various applications in machine learning and data mining. Below are some
popular applications of association rule learning:

o Market Basket Analysis: It is one of the popular examples and applications of
association rule mining. This technique is commonly used by big retailers to determine
the association between items.

o Medical Diagnosis: With the help of association rules, patients can be treated more easily, as
it helps in identifying the probability of illness for a particular disease.

o Protein Sequence: The association rules help in determining the synthesis of artificial
proteins.

o It is also used for catalog design, loss-leader analysis, and many other
applications.
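To make the support, confidence, and lift metrics above concrete, here is a small self-contained Python sketch; the transactions are invented for illustration.

```python
# Toy computation of support, confidence, and lift for the rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

sup_bread = support({"bread"})
sup_butter = support({"butter"})
sup_both = support({"bread", "butter"})

confidence = sup_both / sup_bread            # P(butter | bread)
lift = sup_both / (sup_bread * sup_butter)   # > 1 means a positive association

print(f"support={sup_both:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```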

The Origin of machine learning

1950 – this is the year when Alan Turing, one of the most brilliant and
influential British mathematicians and computer scientists, created the
Turing test. The test was designed to determine whether a computer has
human-like intelligence. In order to pass the test, the
computer needs to be able to convince a human to believe that it’s another
human. Apart from a computer program simulating a 13-year-old Ukrainian
boy who is said to have passed the Turing test, there were no other
successful attempts so far.
1952 – Arthur Samuel, the American pioneer in the field of artificial intelligence and
computer gaming, wrote the very first computer learning program. That program was
actually the game of checkers. The IBM computer would first study which moves lead to
winning and then put them into its program.

1957 – this year witnessed the design of the very first neural network for computers, called
the perceptron, by Frank Rosenblatt. It successfully simulated the thought processes of the
human brain. This is where today's neural networks originate from.

1967 – The nearest neighbor algorithm was written for the first time this year. It allows
computers to start using basic pattern recognition. This algorithm can be used to map a
route for a traveling salesman that starts in a random city and ensures that the salesman
passes by all the required cities in the shortest time. Today, the nearest neighbor algorithm
called KNN is mostly used to classify a data point on the basis of how its neighbors are
classified. KNN is used in retail applications that recognize patterns in credit card usage or
for theft prevention when implemented in CCTV image recognition in retail stores.

1981 – Gerald Dejong introduced the concept of explanation-based learning (EBL). In this
type of learning, the computer analyzes training data and generates a general rule that it can
follow by discarding the data that doesn’t seem to be important.

1985 – Terry Sejnowski invented the NetTalk program that could learn to pronounce words
just like a baby does during the process of language acquisition. The artificial neural
network aimed to reconstruct a simplified model that would show the complexity of
learning human- level cognitive tasks.

The 1990s – during the 1990s, the work in machine learning shifted from the knowledge-
driven approach to the data-driven approach. Scientists and researchers created programs
for computers that could analyze large amounts of data and draw conclusions from the
results. This
led to the development of the IBM Deep Blue computer, which won against the world’s chess
champion Garry Kasparov in 1997.

2006 – this is the year when the term “deep learning” was coined by Geoffrey Hinton. He used
the term to explain a brand-new type of algorithms that allow computers to see and distinguish
objects or text in images or videos.

2010 – this year saw the introduction of Microsoft Kinect that could track even 20 human
features at the rate of 30 times per second. Microsoft Kinect allowed users to interact with
machines via gestures and movements.

2011 – this was an interesting year for machine learning. For starters, IBM’s Watson managed
to beat human competitors at Jeopardy. Moreover, Google developed Google Brain equipped
with a deep neural network that could learn to discover and categorize objects (in particular,
cats).

2012 – Google X lab developed a machine learning algorithm able to autonomously browse
YouTube videos and identify those that contained cats.

2014 – Facebook introduced DeepFace, a special software algorithm able to recognize and
verify individuals on photos at the same level as humans.

2015 – this is the year when Amazon launched its own machine learning platform, making
machine learning more accessible and bringing it to the forefront of software development.
Moreover, Microsoft created the Distributed Machine Learning Toolkit, which enables
developers to efficiently distribute machine learning problems across multiple machines.
During the same year, however, more than three thousand AI and robotics researchers
endorsed by figures like Elon Musk, Stephen Hawking, and Steve Wozniak signed an open
letter warning
about the dangers of autonomous weapons that could select targets without any human
intervention.

2016 – this was the year when Google’s artificial intelligence algorithms managed to beat
a professional player at the Chinese board game Go. Go is considered the world’s most
complex board game. The AlphaGo algorithm developed by Google won five out of five
games in the competition, bringing AI to the front page.

2020 – OpenAI announced a groundbreaking natural language processing algorithm, GPT-3,
with a remarkable ability to generate human-like text when given a prompt. Today, GPT-3
is considered the largest and most advanced language model in the world, using 175
billion parameters and Microsoft Azure's AI supercomputer for training.

Uses and abuses of machine learning

Most people have heard of Deep Blue, the chess-playing computer that in 1997 was the
first to win a game against a world champion. Another famous computer, Watson, defeated
two human opponents on the television trivia game show Jeopardy in 2011. Based on
these stunning accomplishments, some have speculated that computer intelligence will
replace workers in information technology occupations, just as machines replaced workers
in fields and assembly lines.

The truth is that even as machines reach such impressive milestones, they are still
relatively limited in their ability to thoroughly understand a problem. They are pure
intellectual horsepower without direction. A computer may be more capable than a human
of finding subtle patterns in large databases, but it still needs a human to motivate the
analysis and turn the result into meaningful action.

Machines are not good at asking questions, or even knowing what questions to ask. They
are much better at answering them, provided the question is stated in a way that the
computer can comprehend. Present-day machine learning algorithms partner with people
much like a
bloodhound partners with its trainer: the dog's sense of smell may be many times stronger
than its master's, but without being carefully directed, the hound may end up chasing its tail.

To better understand the real-world applications of machine learning, we'll now consider
some cases where it has been used successfully, some places where it still has room for
improvement, and some situations where it may do more harm than good.

Machine learning successes

Machine learning is most successful when it augments, rather than replaces, the specialized
knowledge of a subject-matter expert. It works with medical doctors at the forefront of the
fight to eradicate cancer; assists engineers and programmers with efforts to create smarter
homes and automobiles; and helps social scientists to build knowledge of how societies
function.
Toward these ends, it is employed in countless businesses, scientific laboratories,
hospitals, and governmental organizations. Any effort that generates or aggregates data
likely employs at least one machine learning algorithm to help make sense of it.

Though it is impossible to list every use case for machine learning, a look at recent
success stories identifies several prominent examples:
 Identification of unwanted spam messages in email
 Segmentation of customer behavior for targeted advertising
 Forecasts of weather behavior and long-term climate changes
 Reduction of fraudulent credit card transactions
 Actuarial estimates of financial damage of storms and natural disasters
 Prediction of popular election outcomes
 Development of algorithms for auto-piloting drones and self-driving cars
 Optimization of energy use in homes and office buildings
 Projection of areas where criminal activity is most likely
 Discovery of genetic sequences linked to diseases
By the end of this book, you will understand the basic machine learning algorithms that are
employed to teach computers to perform these tasks. For now, it suffices to say that no matter
what the context is, the machine learning process is the same. Regardless of the task, an
algorithm takes data and identifies patterns that form the basis for further action.

The limits of machine learning

Although machine learning is used widely and has tremendous potential, it is important to
understand its limits. Machine learning, at this time, emulates a relatively limited subset of the
capabilities of the human brain. It offers little flexibility to extrapolate outside of strict
parameters and knows no common sense. With this in mind, one should be extremely careful to
recognize exactly what an algorithm has learned before setting it loose in the real world.

Without a lifetime of past experiences to build upon, computers are also limited in their ability
to make simple inferences about logical next steps. Take, for instance, the banner
advertisements seen on many websites. These are served according to patterns learned by data
mining the browsing history of millions of users. Based on this data, someone who views
websites selling shoes is interested in buying shoes and should therefore see advertisements
for shoes. The problem is that this becomes a never-ending cycle in which, even after shoes
have been purchased, additional shoe advertisements are served, rather than advertisements
for shoelaces and shoe polish.

Many people are familiar with the deficiencies of machine learning's ability to understand
or translate language, or to recognize speech and handwriting. Perhaps the earliest example
of this type of failure is in a 1994 episode of the television show The Simpsons, which
showed a parody of the Apple Newton tablet. For its time, the Newton was known for its
state-of-the-art handwriting recognition. Unfortunately for Apple, it would occasionally fail
to great effect. The television episode illustrated this through a sequence in which a bully's
note to "Beat up Martin" was misinterpreted by the Newton as "Eat up Martha."

Machine language processing has improved enough in the time since the Apple Newton
that Google, Apple, and Microsoft are all confident in their ability to offer voice-activated
virtual concierge services such as Google Assistant, Siri, and Cortana. Still, these services
routinely struggle to answer relatively simple questions. Furthermore, online translation
services sometimes misinterpret sentences that a toddler would readily understand, and the
predictive text feature on many devices has led to a number of humorous "autocorrect fail"
sites that illustrate computers' ability to understand basic language but completely
misunderstand context.

Some of these mistakes are surely to be expected. Language is complicated, with multiple
layers of text and subtext, and even human beings sometimes misunderstand context. In
spite of the fact that machine learning is rapidly improving at language processing, the
consistent
shortcomings illustrate the important fact that machine learning is only as good as the data it
has learned from. If context is not explicit in the input data, then just like a human, the
computer will have to make its best guess from its limited set of past experiences.

Machine learning ethics

At its core, machine learning is simply a tool that assists us with making sense of the world's
complex data. Like any tool, it can be used for good or for evil. Where machine learning
goes most wrong is when it is applied so broadly, or so callously, that humans are treated as
lab rats, automata, or mindless consumers. A process that may seem harmless can lead to
unintended consequences when automated by an emotionless computer. For this reason,
those using machine learning or data mining would be remiss not to at least briefly consider
the ethical implications of the art.

Due to the relative youth of machine learning as a discipline and the speed at which it is
progressing, the associated legal issues and social norms are often quite uncertain, and
constantly in flux. Caution should be exercised when obtaining or analyzing data in order to
avoid breaking laws; violating terms of service or data use agreements; or abusing the trust
or violating the privacy of customers or the public.

Retailers routinely use machine learning for advertising, targeted promotions, inventory
management, or the layout of the items in a store. Many have equipped checkout lanes with
devices that print coupons for promotions based on a customer's buying history. In exchange
for a bit of personal data, the customer receives discounts on the specific products he or she
wants to buy. At first, this appears relatively harmless, but consider what happens when this
practice is taken a bit further.

One possibly apocryphal tale concerns a large retailer in the United States that employed
machine learning to identify expectant mothers for coupon mailings. The retailer hoped that if
these mothers-to-be received substantial discounts, they would become loyal customers who
would later purchase profitable items such as diapers, baby formula, and toys.
Equipped with machine learning methods, the retailer identified items in the customer purchase
history that could be used to predict with a high degree of certainty not only whether a woman was
pregnant, but also the approximate timing for when the baby was due.

After the retailer used this data for a promotional mailing, an angry man contacted the chain and
demanded to know why his daughter received coupons for maternity items. He was furious that the
retailer seemed to be encouraging teenage pregnancy! As the story goes, when the retail chain called
to offer an apology, it was the father who ultimately apologized after confronting his daughter and
discovering that she was indeed pregnant!

Whether completely true or not, the lesson learned from the preceding tale is that common sense
should be applied before blindly applying the results of a machine learning analysis. This is
particularly true in cases where sensitive information, such as health data, is concerned. With a bit
more care, the retailer could have foreseen this scenario and used greater discretion when choosing
how to reveal the pattern its machine learning analysis had discovered.

As machine learning algorithms are more widely applied, we find that computers may learn some
unfortunate behaviors of human societies. Sadly, this includes perpetuating race or gender
discrimination and reinforcing negative stereotypes. For example, researchers have found that
Google's online advertising service is more likely to show ads for high-paying jobs to men than
women, and is more likely to display ads for criminal background checks to black people than white
people.

Proving that these types of missteps are not limited to Silicon Valley, a Twitter chatbot service
developed by Microsoft was quickly taken offline after it began spreading Nazi and anti- feminist
propaganda. Often, algorithms that at first seem "content neutral" quickly start to reflect majority
beliefs or dominant ideologies. An algorithm created by Beauty.AI to reflect an objective
conception of human beauty sparked controversy when it favored almost exclusively white people.
Imagine the consequences if this had been applied to facial recognition software for criminal
activity!

To limit the ability of algorithms to discriminate illegally, certain jurisdictions have well-
intentioned laws that prevent the use of racial, ethnic, religious, or other protected class data for
business reasons. However, excluding this data from a project may not be enough because machine
learning algorithms can still inadvertently learn to discriminate. If a certain segment
of people tends to live in a certain region, buys a certain product, or otherwise behaves in a way
that uniquely identifies them as a group, machine learning algorithms can infer the protected
information from other factors. In such cases, you may need to completely de-identify these people
by excluding any potentially identifying data in addition to the already-protected statuses.

Apart from the legal consequences, inappropriate use of data may hurt the bottom line. Customers
may feel uncomfortable or become spooked if aspects of their lives they consider private are made
public. In recent years, a number of high-profile web applications have experienced a mass exodus
of users who felt exploited when the applications' terms of service agreements changed or their
data was used for purposes beyond what the users had originally intended. The fact that privacy
expectations differ by context, by age cohort, and by locale adds complexity to deciding the
appropriate use of personal data. It would be wise to consider the cultural implications of your
work before you begin on your project, in addition to being aware of ever-more-restrictive
regulations such as the European Union's newly- implemented General Data Protection
Regulation (GDPR) and the inevitable policies that will follow in its footsteps

Finally, it is important to note that as machine learning algorithms become progressively more
important to our everyday lives, there are greater incentives for nefarious actors to work to exploit
them. Sometimes, attackers simply want to disrupt algorithms for laughs or notoriety— such as
"Google bombing," the crowd-sourced method of tricking Google's algorithms to highly rank a
desired page.

Other times, the effects are more dramatic. A timely example of this is the recent wave of so- called
fake news and election meddling, propagated via the manipulation of advertising and
recommendation algorithms that target people according to their personality. To avoid giving such
control to outsiders, when building machine learning systems, it is crucial to consider how they
may be influenced by a determined individual or crowd.

How do machines learn, Abstraction and knowledge representation, Generalization,
Factors to be considered, Assessing the success of learning

A commonly cited formal definition of machine learning, proposed by computer scientist Tom
M. Mitchell, says that a machine is said to learn if it is able to take experience and utilize it such
that its performance improves upon similar experiences in the future. This definition is
fairly exact, yet says little about how machine learning techniques actually learn to
transform data into actionable knowledge.

Regardless of whether the learner is a human or a machine, the basic learning process is
similar. It can be divided into three components as follows:

 Data input: It utilizes observation, memory storage, and recall to provide a factual
basis for further reasoning.
 Abstraction: It involves the translation of data into broader representations.
 Generalization: It uses abstracted data to form a basis for action.

To better understand the learning process, think about the last time you studied for a
difficult test, perhaps for a university final exam or a career certification. Did you wish for
an eidetic (that is, photographic) memory? If so, you may be disappointed to learn that
perfect recall is unlikely to save you much effort. Without a higher understanding, your
knowledge is limited exactly to the data input, meaning only what you had seen before and
nothing more. Therefore, without knowledge of all the questions that could appear on the
exam, you would be stuck attempting to memorize answers to every question that could
conceivably be asked. Obviously, this is an unsustainable strategy.

Instead, a better strategy is to spend time selectively managing only a smaller set of key
ideas. The commonly used learning strategies of creating an outline or a concept map are
similar to how a machine performs knowledge abstraction. The tools define relationships
among information and in doing so, depict difficult ideas without needing to memorize
them word- for-word. It is a more advanced form of learning because it requires that the
learner puts the topic into his or her own words.

It is always a tense moment when the exam is graded and the learning strategies are either
vindicated or implicated with a high or low mark. Here, one discovers whether the learning
strategies generalized to the questions that the teacher or professor had selected.
Generalization requires a breadth of abstracted data, as well as a higher-level understanding of
how to apply such knowledge to unforeseen topics. A good teacher can be quite helpful in this
regard.

Keep in mind that although we have illustrated the learning process as three distinct steps,
they are merely organized this way for illustrative purposes. In reality, the three components
of learning are inextricably linked. In particular, the stages of abstraction and generalization
are so closely related that it would be impossible to perform one without the other. In human
beings, the entire process happens subconsciously. We recollect, deduce, induct, and intuit. Yet
for a computer, these processes must be made explicit. On the other hand, this is a benefit of
machine learning. Because the process is transparent, the learned knowledge can be examined
and utilized for future action.

Abstraction and knowledge representation

Representing raw input data in a structured format is the quintessential task for a learning
algorithm. Prior to this point, the data is merely ones and zeros on a disk or in memory; they
have no meaning. The work of assigning a meaning to data occurs during the
abstraction process.

The connection between ideas and reality is exemplified by the famous René Magritte
painting The Treachery of Images shown as follows:
Source: http://collections.lacma.org/node/239578

The painting depicts a tobacco pipe with the caption Ceci n'est pas une pipe ("this is not a
pipe"). The point Magritte was illustrating is that a representation of a pipe is not truly a
pipe. In spite of the fact that the pipe is not real, anybody viewing the painting easily
recognizes that the picture is a pipe, suggesting that observers' minds are able to connect the
picture of a pipe to the idea of a pipe, which can then be connected to an actual pipe that
could be held in the hand. Abstracted connections like this are the basis of knowledge
representation, the formation of logical structures that assist with turning raw sensory
information into a meaningful insight.

During the process of knowledge representation, the computer summarizes raw inputs in a
model, an explicit description of the structured patterns among data. There are many
different types of models. You may already be familiar with some. Examples include:

 Equations
 Diagrams such as trees and graphs
 Logical if/else rules
 Groupings of data known as clusters
The choice of model is typically not left up to the machine. Instead, the model is dictated by
the learning task and the type of data being analyzed. Later in this chapter, we will discuss
methods for choosing the type of model in more detail.

The process of fitting a particular model to a dataset is known as training. Why is this not
called learning? First, note that the learning process does not end with the step of data
abstraction. Learning requires an additional step to generalize the knowledge to future data.
Second, the term training more accurately describes the actual process undertaken when the
model is fitted to the data. Learning implies a sort of inductive, bottom-up reasoning. Training
better connotes the fact that the machine learning model is imposed by the human teacher onto
the machine student, providing the computer with a structure it attempts to model after.

When the model has been trained, the data has been transformed into an abstract form that
summarizes the original information. It is important to note that the model does not itself
provide additional data, yet it is sometimes interesting on its own. How can this be? The
reason is that by imposing an assumed structure on the underlying data, it gives insight into
the unseen and provides a theory about how the data is related. Take for instance the discovery
of gravity. By fitting equations to observational data, Sir Isaac Newton deduced the concept of
gravity. But gravity was always present. It simply wasn't recognized as a concept until the
model noted it in abstract terms—specifically, by becoming the g term in a model that
explains observations of falling objects.

Most models will not result in the development of theories that shake up scientific
thought for centuries. Still, your model might result in the discovery of previously unseen
relationships among data. A model trained on genomic data might find several genes that
when combined
are responsible for the onset of diabetes; banks might discover a seemingly innocuous type
of transaction that systematically appears prior to fraudulent activity; psychologists might
identify a combination of characteristics indicating a new disorder. The underlying
relationships were always present; but in conceptualizing the information in a different
format, a model presents the connections in a new light.

Generalization

Recall that the learning process is not complete until the learner is able to use its abstracted
knowledge for future action. Yet an issue remains before the learner can proceed—there
are countless underlying relationships that might be identified during the abstraction
process and myriad ways to model these relationships. Unless the number of potential
theories is limited, the learner will be unable to utilize the information. It would be stuck
where it started, with a large pool of information but no actionable insight.

The term generalization describes the process of turning abstracted knowledge into a form
that can be utilized for action. Generalization is a somewhat vague process that is a bit
difficult to describe. Traditionally, it has been imagined as a search through the entire set
of models (that is, theories) that could have been abstracted during training. Specifically, if
you imagine a hypothetical set containing every possible theory that could be established
from the data, generalization involves the reduction of this set into a manageable number
of important findings.

Generally, it is not feasible to reduce the number of potential concepts by examining them
one- by-one and determining which are the most useful. Instead, machine learning
algorithms generally employ shortcuts that more quickly divide the set of concepts.
Toward this end, the algorithm will employ heuristics, or educated guesses about
where to find the most important concepts.

Heuristics are routinely used by human beings to quickly generalize experience to new
scenarios. If you have ever utilized gut instinct to make a snap decision prior to fully
evaluating your circumstances, you were intuitively using mental heuristics.

For example, the availability heuristic is the tendency for people to estimate the likelihood
of an event by how easily examples can be recalled. The availability heuristic might help
explain
the prevalence of the fear of airline travel relative to automobile travel, despite automobiles
being statistically more dangerous. Accidents involving air travel are highly publicized and
traumatic events, and are likely to be very easily recalled, whereas car accidents barely
warrant a mention in the newspaper.

The preceding example illustrates the potential for heuristics to result in illogical conclusions.
Browsing a list of common logical fallacies, one is likely to note many that seem rooted in
heuristic-based thinking. For instance, the gambler's fallacy, or the belief that a run of bad
luck implies that a stretch of better luck is due, may be resultant from the application of the
representativeness heuristic, which erroneously led the gambler to believe that all random
sequences are balanced since most random sequences are balanced.

The folly of misapplied heuristics is not limited to human beings. The heuristics employed by
machine learning algorithms also sometimes result in erroneous conclusions. If the
conclusions are systematically imprecise, the algorithm is said to have a bias. For example,
suppose that a machine learning algorithm learned to identify faces by finding two circles, or
eyes, positioned side-by-side above a line for a mouth. The algorithm might then have trouble
with, or be biased against faces that do not conform to its model. This may include faces with
glasses, turned at an angle, looking sideways, or with darker skin tones. Similarly, it could be
biased toward faces with lighter eye colors or other characteristics that do not conform to its
understanding of the world.

In modern usage, the word bias has come to carry quite negative connotations. Various forms
of media frequently claim to be free from bias, and claim to report the facts objectively,
untainted by emotion. Still, consider for a moment the possibility that a little bias might be
useful. Without a bit of arbitrariness, might it be a bit difficult to decide among several
competing choices, each with distinct strengths and weaknesses? Indeed, some recent studies
in the field of psychology have suggested that individuals born with damage to portions of
the brain responsible for emotion are ineffectual at decision making, and might spend hours
debating simple decisions such as what color shirt to wear or where to eat lunch.
Paradoxically, bias is what blinds us from some information while also allowing us to utilize
other information for action.

Assessing the success of learning

Bias is a necessary evil associated with the abstraction and generalization process inherent
in any machine learning task. Every learner has its weaknesses and is biased in a particular
way; there is no single model to rule them all. Therefore, the final step in the generalization
process is to determine the model's success in spite of its biases.

After a model has been trained on an initial dataset, the model is tested on a new dataset,
and judged on how well its characterization of the training data generalizes to the new data.
It's worth noting that it is exceedingly rare for a model to perfectly generalize to every
unforeseen case.

In part, the failure for models to perfectly generalize is due to the problem of noise, or
unexplained variations in data. Noisy data is caused by seemingly random events, such as:

 Measurement error due to imprecise sensors that sometimes add or subtract a bit from
the reading
 Issues with reporting data, such as respondents reporting random answers to survey
questions in order to finish more quickly
 Errors caused when data is recorded incorrectly, including missing, null, truncated,
incorrectly coded, or corrupted values

Trying to model the noise in data is the basis of a problem called overfitting. Because
noise is unexplainable by definition, attempting to explain the noise will result in
erroneous conclusions that do not generalize well to new cases. Attempting to generate
theories to explain the noise also results in more complex models that are more likely to
ignore the true pattern the learner is trying to identify. A model that seems to perform well
during training but does poorly during testing is said to be overfitted to the training
dataset as it does not generalize well.
Solutions to the problem of overfitting are specific to particular machine learning approaches. For
now, the important point is to be aware of the issue. How well models are able to handle noisy data
is an important source of distinction among them.
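A small sketch, using synthetic data, of how overfitting shows up as a gap between training and test performance: a high-degree polynomial fits the noisy training points almost perfectly but generalizes poorly.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=40)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          "train R^2:", round(r2_score(y_train, model.predict(X_train)), 2),
          "test R^2:", round(r2_score(y_test, model.predict(X_test)), 2))
```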

Steps to apply machine learning to your data

Any machine learning task can be broken down into a series of more manageable steps.

1. Collecting data: Whether the data is written on paper, recorded in text files and spreadsheets, or
stored in an SQL database, you will need to gather it in an electronic format suitable for analysis.
This data will serve as the learning material an algorithm uses to generate actionable knowledge.

2. Exploring and preparing the data: The quality of any machine learning project is based largely
on the quality of data it uses. This step in the machine learning process tends to require a
great deal of human intervention. An often cited statistic suggests that 80 percent of the effort in
machine learning is devoted to data. Much of this time is spent learning more about the data
and its nuances during a practice called data exploration.
3. Training a model on the data: By the time the data has been prepared for analysis, you are
likely to have a sense of what you are hoping to learn from the data. The specific machine learning
task will inform the selection of an appropriate algorithm, and the algorithm will represent the data
in the form of a model.
4. Evaluating model performance: Because each machine learning model results in a biased
solution to the learning problem, it is important to evaluate how well the algorithm learned from its
experience. Depending on the type of model used, you might be able to evaluate the accuracy of
the model using a test dataset, or you may need to develop measures of performance specific to
the intended application.
5. Improving model performance: If better performance is needed, it becomes necessary to utilize
more advanced strategies to augment the performance of the model.
Sometimes, it may be necessary to switch to a different type of model altogether. You may need to
supplement your data with additional data, or perform additional preparatory work as in step two
of this process.
After these steps have been completed, if the model appears to be performing satisfactorily, it can
be deployed for its intended task. As the case may be, you might utilize your model to provide
score data for predictions (possibly in real time), for projections of financial data, to generate
useful insight for marketing or research, or to automate tasks such as mail delivery or flying
aircraft. The successes and failures of the deployed model might even provide additional data to
train the next generation of your model.
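
The five steps above can be sketched end to end in code. The following minimal, hypothetical example
uses Python and scikit-learn (assumed available); the built-in Iris dataset and the logistic regression
model merely stand in for your own data and chosen algorithm:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # 1. Collect data (a built-in dataset stands in for your own source).
    X, y = load_iris(return_X_y=True)

    # 2. Explore and prepare the data: hold out a test set and scale the features.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # 3. Train a model on the prepared training data.
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)

    # 4. Evaluate model performance on the unseen test data.
    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

    # 5. Improve performance, e.g. by searching over the regularization strength C.
    search = GridSearchCV(LogisticRegression(max_iter=200), {"C": [0.1, 1.0, 10.0]}, cv=5)
    search.fit(X_train, y_train)
    print("Best C:", search.best_params_)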
Types of Machine Learning Architecture

The Machine Learning Architecture can be categorized on the basis of the algorithm used in
training.

1. Supervised Learning
In supervised learning, the training data used to build the mathematical model consists of both
inputs and desired outputs. Each input has an assigned output, which is also known as a supervisory
signal. From the available training data, the system determines the relationship between input and
output and applies it to subsequent inputs after training in order to predict the corresponding
output. Supervised learning can further be divided into classification and regression analysis based
on the output criteria: classification applies when the outputs are restricted to a limited set of
discrete values, whereas regression defines a numerical range of values for the output. Examples of
supervised learning include face detection and speaker verification systems.
2. Unsupervised Learning
Unlike supervised learning, unsupervised learning uses training data that does not contain outputs.
Unsupervised learning identifies relationships in the input data based on trends and commonalities,
and the output is determined by the presence or absence of such trends in the user input.

3. Reinforcement Learning
Here the system is trained to decide on the action that is relevant in a particular context, using
various algorithms to determine the correct approach given the present state. This is widely used
in training gaming systems to respond to user inputs accordingly.
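
A minimal sketch (Python with scikit-learn assumed; the dataset is illustrative) contrasting the first
two settings on the same inputs:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans

    X, y = load_iris(return_X_y=True)

    # Supervised: inputs X are paired with desired outputs y (the supervisory signal).
    clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    print("Predicted class for the first sample:", clf.predict(X[:1]))

    # Unsupervised: only X is given; structure is inferred from the data itself.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print("Cluster assigned to the first sample:", km.labels_[0])

Reinforcement learning does not fit this pattern, since it learns from interaction and reward rather
than from a fixed dataset.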
Architecting the Machine Learning Process
1. Data Acquisition
As machine learning relies on available data for the system to make decisions, the first step
defined in the architecture is data acquisition. This involves collecting the data, preparing it, and
segregating the case scenarios based on the features involved in the decision-making cycle, then
forwarding the data to the processing unit for further categorization. This stage is sometimes called
the data preprocessing stage. The data model expects reliable, fast, and elastic data, which may be
discrete or continuous in nature. The data is then passed into stream processing systems (for
continuous data) or stored in batch data warehouses (for discrete data) before being passed on to
the data modeling or processing stages.

2. Data Processing
The data received by the data acquisition layer is sent forward to the data processing layer, where
it is subjected to advanced integration and processing; this involves normalization of the data,
data cleaning, transformation, and encoding. The data processing also depends on the type of
learning being used. For example, if supervised learning is being used, the data needs to be
segregated into the samples required for training the system; the data thus created is called
training sample data, or simply training data. Data processing also depends on the kind of
processing required: it may involve handling continuous data, which calls for a specific
function-based architecture such as the lambda architecture, or handling discrete data, which may
require memory-bound processing. The data processing layer also defines whether processing is
applied to data in transit or at rest.
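
As an illustration of this layer, the following hypothetical sketch uses pandas and scikit-learn (both
assumed available); the file name and the column names "age", "income", "city", and "label" are
invented for the example:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler, OneHotEncoder
    from sklearn.compose import ColumnTransformer

    df = pd.read_csv("raw_data.csv")        # data handed over by the acquisition layer
    df = df.dropna(subset=["label"])        # cleaning: drop rows with no target value

    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["age", "income"]),                 # normalization
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),    # encoding
    ])
    X = preprocess.fit_transform(df[["age", "income", "city"]])
    y = df["label"]

    # Segregate training data (for supervised learning) from a held-out test sample.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)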

3. Data Modeling
This layer of the architecture involves the selection of the algorithms that might adapt the system
to the problem for which the learning is being devised. These algorithms are evolved from, or
inherited from, a set of libraries and are used to model the data accordingly; this makes the
system ready for the execution step.
4. Execution
This is the stage of machine learning where experimentation, testing, and tuning are performed.
The general goal is to optimize the algorithm in order to extract the required machine outcome and
maximize system performance. The output of this step is a refined solution capable of providing
the data the machine needs to make decisions.

5. Deployment
Like any other software output, ML outputs need to be operationalized or forwarded for further
exploratory processing. The output can be considered a non-deterministic query that needs to be
deployed into the decision-making system. It is advisable to move the ML output seamlessly into
production, where it enables the machine to make decisions directly based on the output and
reduces the dependency on further exploratory steps.
What Is Ensemble?
Multiple machine learning algorithms are used in ensemble learning, aiming to improve the
correct prediction ratio on a dataset. A dataset is used to train a list of machine learning models,
and the distinct predictions made by each of the models applied to the dataset form the basis of an
ensemble learning model. The ensemble model then combines the outcomes of different models'
predictions to get the final result.
Each model has advantages and disadvantages. By integrating different independent models,
ensemble models can effectively mask a particular model's flaws.
Ensemble techniques typically fall into one of three categories:
Bagging Ensemble Learning

The name Bagging came from the abbreviation of Bootstrap AGGregatING. As the name implies,
the two key ingredients of Bagging are bootstrap and aggregation.
Bootstrap aggregation, or bagging for short, is an ensemble learning method that seeks a diverse
group of ensemble members by varying the training data.
This typically involves using a single machine learning algorithm, almost always an unpruned
decision tree, and training each model on a different sample of the same training dataset. The
predictions made by the ensemble members are then combined using simple statistics, such as
voting or averaging.
Key to the method is the manner in which each sample of the dataset is prepared to train
ensemble members. Each model gets its own unique sample of the dataset. Examples
(rows) are drawn from the dataset at random, although with replacement.
Replacement means that if a row is selected, it is returned to the training dataset for potential re-
selection in the same training dataset. This means that a row of data may be selected zero, one, or
multiple times for a given training dataset.
This is called a bootstrap sample. It is a technique often used in statistics with small datasets to
estimate the statistical value of a data sample. By preparing multiple different bootstrap samples
and estimating a statistical quantity and calculating the mean of the estimates, a better overall
estimate of the desired quantity can be achieved than simply estimating from the dataset directly.
In the same manner, multiple different training datasets can be prepared, used to estimate a
predictive model, and make predictions. Averaging the predictions across the models typically
results in better predictions than a single model fit on the training dataset directly.
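
A minimal sketch of this bootstrap-and-average idea using NumPy (assumed available; the data values
are illustrative):

    import numpy as np

    rng = np.random.default_rng(seed=0)
    data = rng.normal(loc=50.0, scale=10.0, size=30)   # a small dataset

    estimates = []
    for _ in range(200):
        sample = rng.choice(data, size=len(data), replace=True)   # one bootstrap sample
        estimates.append(sample.mean())                           # estimate on that sample

    print("Single-sample estimate of the mean:", data.mean())
    print("Bootstrap-averaged estimate:", np.mean(estimates))

Bagging applies the same idea to models rather than to a single statistic: each bootstrap sample
trains one ensemble member, and the members' predictions are averaged.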
We can summarize the key elements of bagging as follows:
• Bootstrap samples of the training dataset.
• Unpruned decision trees fit on each sample.
• Simple voting or averaging of predictions.
In summary, the contribution of bagging is in the varying of the training data used to fit each
ensemble member, which, in turn, results in skillful but different models.
It is a general approach and easily extended. For example, more changes to the training dataset
can be introduced, the algorithm fit on the training data can be replaced, and the mechanism
used to combine predictions can be modified.

Many popular ensemble algorithms are based on this approach, including:


• Bagged Decision Trees (canonical bagging)
• Random Forest
• Extra Trees
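
A minimal bagging sketch in Python with scikit-learn (assumed available; the synthetic dataset is
illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

    # Canonical bagging: decision trees (the default base estimator), each fit on
    # its own bootstrap sample, combined by voting.
    bagging = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=1)
    print("Bagged trees accuracy:", cross_val_score(bagging, X, y, cv=5).mean())

    # Random Forest: bagging plus random feature selection at each split.
    forest = RandomForestClassifier(n_estimators=100, random_state=1)
    print("Random forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())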
Next, let’s take a closer look at stacking.

Stacking Ensemble Learning


Stacked Generalization, or stacking for short, is an ensemble method that seeks a diverse group
of members by varying the model types fit on the training data and using a model to combine
predictions.
Stacking has its own nomenclature where ensemble members are referred to as level-0 models
and the model that is used to combine the predictions is referred to as a level-1 model.

The two-level hierarchy of models is the most common approach, although more layers of
models can be used. For example, instead of a single level-1 model, we might have 3 or 5
level-1 models and a single level-2 model that combines the predictions of level-1 models in
order to make a prediction.
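
A minimal stacking sketch with scikit-learn (assumed available); the particular choice of level-0
models is illustrative:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=2)

    # Diverse level-0 models; a logistic regression acts as the level-1 model that
    # learns how best to combine their predictions.
    level0 = [
        ("tree", DecisionTreeClassifier()),
        ("svm", SVC()),
        ("knn", KNeighborsClassifier()),
    ]
    stack = StackingClassifier(estimators=level0,
                               final_estimator=LogisticRegression(max_iter=200))
    print("Stacking accuracy:", cross_val_score(stack, X, y, cv=5).mean())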
Boosting Ensemble Learning
Boosting is an ensemble method that seeks to change the training data to focus attention on
examples that previous fit models on the training dataset have gotten wrong.
In boosting, […] the training dataset for each subsequent classifier increasingly focuses on
instances misclassified by previously generated classifiers.

The key property of boosting ensembles is the idea of correcting prediction errors. The models
are fit and added to the ensemble sequentially such that the second model attempts to correct
the predictions of the first model, the third corrects the second model, and so on.
This typically involves the use of very simple decision trees that only make a single or a few
decisions, referred to in boosting as weak learners. The predictions of the weak learners are
combined using simple voting or averaging, although the contributions are weighed
proportional to their performance or capability. The objective is to develop a so-called “strong-
learner” from many purpose-built “weak-learners.”
… an iterative approach for generating a strong classifier, one that is capable of achieving
arbitrarily low training error, from an ensemble of weak classifiers, each of which can barely
do better than random guessing.
Typically, the training dataset is left unchanged and instead, the learning algorithm is modified
to pay more or less attention to specific examples (rows of data) based on whether they have
been predicted correctly or incorrectly by previously added ensemble members. For example,
the rows of data can be weighed to indicate the amount of focus a learning algorithm must give
while learning the model.

We can summarize the key elements of boosting as follows:


• Bias training data toward those examples that are hard to predict.
• Iteratively add ensemble members to correct predictions of prior models.

• Combine predictions using a weighted average of models.


The idea of combining many weak learners into strong learners was first proposed theoretically
and many algorithms were proposed with little success. It was not until the Adaptive Boosting
(AdaBoost) algorithm was developed that boosting was demonstrated as an effective ensemble
method.
The term boosting refers to a family of algorithms that are able to convert weak learners to
strong learners.
Since AdaBoost, many boosting methods have been developed and some, like stochastic
gradient boosting, may be among the most effective techniques for classification and
regression on tabular (structured) data.
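
A minimal boosting sketch with scikit-learn (assumed available; the synthetic dataset and parameter
values are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=3)

    # AdaBoost: decision stumps added sequentially; rows are re-weighted so that
    # later learners focus on the examples earlier learners got wrong.
    ada = AdaBoostClassifier(n_estimators=100, random_state=3)
    print("AdaBoost accuracy:", cross_val_score(ada, X, y, cv=5).mean())

    # Stochastic gradient boosting: each new tree fits the errors of the current ensemble.
    gbm = GradientBoostingClassifier(n_estimators=100, subsample=0.8, random_state=3)
    print("Gradient boosting accuracy:", cross_val_score(gbm, X, y, cv=5).mean())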

Summary
In this section, you discovered the three standard ensemble learning techniques for machine
learning.
Specifically, you learned:
•Bagging involves fitting many decision trees on different samples of the same dataset and
averaging the predictions.
•Stacking involves fitting many different model types on the same data and using another
model to learn how to best combine the predictions.
•Boosting involves adding ensemble members sequentially that correct the predictions made by
prior models and outputs a weighted average of the predictions.
ASSIGNMENT – UNIT IV

Very Easy
1. Explain each component of architecture for building machine learning systems
with diagram
2. What are the differences between supervised and unsupervised machine
learning? Explain what you think semi-supervised machine learning is.

Easy
1. Explain in detail about reinforcement learning.
2. Explain unsupervised learning with diagram and applications.

Medium
1. How ensembles learning improves model performance? Explain anyone
ensemble based method.
2. Describe a semi-supervised learning approach and why it can build a stronger
model by using unlabelled data.

Hard
Explain travelling sales person problem. Describe a suitable machine learning
model that supports and solve the problem to cover maximum number of places
with minimum distance covered by the person.

Very Hard
Nowadays, data stored in medical databases are growing in an increasingly rapid
way. Analyzing the data is crucial for medical decision making and management.
There is a huge requirement for the support of specific knowledge-based
problem solving activities through the analysis of patients' raw data collected
during diagnosis. There is an increasing demand for discovery of new knowledge to
be extracted by the analysis of representative collections of example cases,
described by symbolic and numeric descriptors. Explain how machine learning
can deal with the problem of finding interesting patterns in data for the above
scenario. Choose an appropriate model and explain for the application.

PART A- UNIT-4
Write down a few applications of Machine Learning? K2 CO4
Machine learning has a wide range of applications across various
fields. They are
1. Image Recognition
2. Speech Recognition
3. Traffic prediction
4. Product recommendations
5. Self-driving cars
6. Email Spam and Malware Filtering
7. Virtual Personal Assistant
Write a short note on the motivation for ML? K2 CO4
The motivation behind machine learning lies in its ability to tackle
complex tasks that are difficult to solve using traditional rule-based
programming. By allowing computers to learn from data and
identify underlying patterns and structures, ML enables automation,
optimization, and innovation across various domains, ultimately
leading to more efficient and intelligent systems.
List any 3 applications in ML K1 CO4
Machine learning finds applications in various domains:
Spam Detection in Email
Customer Behavior Analysis for Targeted Advertising
Weather Forecasting and Climate Prediction
Fraud Detection in Credit Card Transactions
Actuarial Assessment of Storm and Natural Disaster Financial Impact
Autonomous Vehicles Algorithm Development
Energy Efficiency Optimization in Buildings
Crime Hotspot Identification
Genetic Sequencing for Disease Identification
What is learning association? K1 CO4
Learning association involves identifying patterns and relationships
between variables or data items within a dataset. Through
association rule learning, algorithms analyze the co-occurrence of
items and uncover meaningful associations, such as frequently co-
purchased products in a transactional database. By understanding
these associations, businesses can make informed decisions for
product placement, marketing strategies, and customer
recommendations.
PART A- UNIT- 4
Difference between classification and clustering? K1 CO4
Classification is a supervised learning task that assigns predefined class labels to
new instances using a model trained on labeled data, whereas clustering is an
unsupervised learning task that groups unlabeled instances into clusters based on
their similarity, without any predefined classes.
Write a short note on Clustering or Define Clustering K1 CO4


Clustering is a data analysis technique used to group similar data
points together based on their characteristics. It aims to identify
natural patterns or groupings within a dataset without prior
knowledge of group memberships. Clustering algorithms partition
data into clusters, where data points within the same cluster are more
similar to each other than to those in other clusters.
Write down the methods used in Supervised learning and K2 CO4
Unsupervised learning?
In supervised learning, methods include:
Regression: Predicting a continuous outcome variable based on one or
more input features.
Classification: Assigning categorical labels or classes to input data
based on their features.

In unsupervised learning, methods include:


Clustering: Grouping similar data points together based on their
features without labeled outcomes.
Dimensionality Reduction: Reducing the number of input variables
while preserving important information and patterns in the data.
PART A- UNIT-4
What is knowledge representation? Or Define knowledge K2 CO4
representation?
Knowledge representation is the process of structuring information in a
format that a computer system can interpret, understand, and
manipulate. It involves encoding knowledge, concepts, and
relationships in a formalized manner, allowing for reasoning, inference,
and problem-solving
Write down the metrics used for evaluation of classification K2 CO4
method?
Accuracy: Proportion of correctly classified instances.
Precision: Proportion of true positive predictions out of all positive
predictions.
Recall (Sensitivity): Proportion of true positive predictions out of all
actual positive instances.
F1 Score: Harmonic mean of precision and recall.
ROC Curve: Plot of true positive rate against false positive rate.
AUC (Area Under ROC Curve): Quantifies overall performance of the
model.
Confusion Matrix: Summarizes performance with counts of true
positives, true negatives, false positives, and false negatives.
Cohen's Kappa: Measures inter-rater agreement for categorical items.
Write down the steps to apply machine learning data? K2 CO4
Define Problem: Clearly articulate the problem to be solved with
machine learning.
Preprocess Data: Clean, handle missing values, scale, and engineer
features.
Select Model: Choose a suitable machine learning algorithm based on
problem type and data.
Train Model: Train the selected model on the training data.
Evaluate Model: Assess model performance using appropriate
evaluation metrics.
Deploy Model: Put the model into production for real-world use.
What is reinforcement learning? K1 CO4
Reinforcement learning involves an agent learning to make decisions by
interacting with an environment to maximize cumulative rewards. The
agent learns through trial-and-error, receiving feedback in the form of
rewards or penalties based on its actions. It aims to discover an optimal
policy that maps states to actions to achieve the desired goal. RL has
applications in various domains, including robotics,
gaming, and autonomous systems.
PART A- UNIT-4
Write a short note on semi supervised and supervised learning K1 CO4
Supervised Learning:
Supervised learning trains a model on labeled data, where each input is
paired with a corresponding target label. The model learns to map input
features to target labels, enabling it to make predictions on new data.
Examples include image classification, spam detection, and stock price
prediction.
Semi-Supervised Learning:
Semi-supervised learning uses a combination of labeled and unlabeled
data for training. The model learns from the labeled data to generalize to
unlabeled instances, effectively leveraging the unlabeled data to improve
performance. It's beneficial when labeled data is scarce or expensive to
obtain, as seen in applications like speech recognition and document
classification.

What is Ensemble learning? K1 CO4


Multiple machine learning algorithms are used in ensemble learning,
aiming to improve the correct prediction ratio on a dataset. A dataset is
used to train a list of machine learning models, and the distinct
predictions made by each of the models applied to the dataset form the
basis of an ensemble learning model. The ensemble model then
combines the outcomes of different models' predictions to get the final
result.
What is Bagging? K1 CO4
Bagging (Bootstrap Aggregating) is an ensemble learning technique
where multiple models are trained on bootstrapped subsets of the
training data. Each model contributes to the final prediction through
averaging (for regression) or voting (for classification), reducing variance
and improving overall performance. Examples include Random Forest,
where decision trees are aggregated using bagging to enhance predictive
accuracy.
Write a short note on bagging and boosting? K2 CO4
Bagging (Bootstrap Aggregating):
Bagging combines predictions from multiple models trained on
bootstrapped subsets, reducing variance and overfitting; Random Forest
is a popular example.
Boosting:
Boosting sequentially trains weak learners, focusing on previously
misclassified instances to improve predictive accuracy; AdaBoost and
Gradient Boosting Machines (GBM) are common boosting algorithms.
Difference between reinforcement and unsupervised learning? K1 CO4
Reinforcement learning trains an agent through interaction with an environment,
using rewards and penalties as feedback to learn an optimal policy, whereas
unsupervised learning discovers structure (such as clusters or associations) in
unlabeled data without any feedback signal or notion of reward.
Give any two machine learning ethics? K2 CO4


Fairness and Bias Mitigation:
Ensuring fairness in machine learning involves mitigating biases to
prevent discrimination, promoting equity and impartiality in decision-
making.
Transparency and Accountability:
Transparency in AI systems ensures decision-making processes are
understandable and accountable, fostering trust among users and
stakeholders.

How do machines learn? K2 CO4
Machines learn through a process of data input, abstraction, and generalization:
raw input data is abstracted into a knowledge representation (a structured model
of the data), and that representation is then generalized so it can be applied to
new, unseen cases. Factors such as noise and model bias must be considered, and
the success of learning is assessed by how well the model performs on data it was
not trained on.
PART B- UNIT-4
What is regression? Explain multiple linear regression with a neat diagram. (13) K2 CO 4

Explain various metrics for evaluating classification method (13). K2 CO 4


What are all the factors to be considered for assessing the success of K2 CO 4
learning? Explain any one algorithm with example. (13)

Explain the following in detail: (13) K2 CO 4


1. Metrics for evaluation of classification method.
2. Steps to apply machine learning to data.
3. Machine learning process
Discuss about ID3 and SVM in detail? (13) K2 CO 4

Write a short note on classification and explain any one algorithm. (13) K2 CO 4


What is classification method? Explain any algorithm in detail (13) K2 CO 4
Explain ML architecture in detail with neat diagram. (13) K2 CO 4

Explain general machine learning algorithm with neat diagram? (13) K2 CO 4


Draw the architecture for machine learning algorithm and explain it in K2 CO 4
detail? (13)
K2 CO 4
Explain Ensemble Learning and its types? (15)
What is EL? Explain bagging and boosting in detail with neat diagram K3 CO 4
SUPPORTIVE ONLINE COURSES – UNIT IV

https://onlinecourses.nptel.ac.in/noc21_cs42/preview
An Introduction to Artificial Intelligence
By Prof. Mausam | IIT Delhi

https://www.coursera.org/learn/computational-thinking-problem-
solving

https://www.coursera.org/learn/artificial-intelligence-education-
for-teachers

https://www.coursera.org/specializations/ai-healthcare

https://www.coursera.org/learn/predictive-modeling-machine-
learning
https://www.drdobbs.com/parallel/the-practical-application-of-
prolog/184405220

REAL TIME APPLICATION- UNIT IV
Neural Networks find extensive applications in areas where traditional computers
don’t fare too well. Like, for problem statements where instead of programmed
outputs, you’d like the system to learn, adapt, and change the results in sync with
the data you’re throwing at it. Neural networks also find
rigorous applications whenever we talk about dealing with noisy or incomplete data.
And honestly, most of the data present out there is indeed noisy.

With their brain-like ability to learn and adapt, Neural Networks form the entire
basis and have applications in Artificial Intelligence, and consequently, Machine
Learning algorithms. Before we get to how Neural Networks power Artificial
Intelligence, let’s first talk a bit about what exactly is Artificial Intelligence.

For the longest time possible, the word “intelligence” was just associated with the
human brain. But then, something happened! Scientists found a way of training
computers by following the methodology our brain uses. Thus came Artificial
Intelligence, which can essentially be defined as intelligence originating from
machines. To put it even more simply, Machine Learning is simply providing
machines with the ability to “think”, “learn”, and “adapt”.

With so much said and done, it’s imperative to understand what exactly are the use
cases of AI, and how Neural Networks help the cause. Let’s dive into
the applications of Neural Networks across various domains – from Social
Media and Online Shopping, to Personal Finance, and finally, to the smart assistant
on your phone.

You should remember that this list is in no way exhaustive, as the applications
of neural networks are widespread. Basically, anything that makes the machines
learn is deploying one or the other type of neural network.

Social Media
The ever-increasing data deluge surrounding social media gives the creators of these
platforms the unique opportunity to dabble with the unlimited data they have. No
wonder you get to see a new feature every fortnight. It’s only fair to say that all of this
would’ve been like a distant dream without Neural Networks to save the day.

Neural Networks and their learning algorithms find extensive applications in the world of
social media. Let’s see how:

Facebook
As soon as you upload a photo to Facebook, the service automatically highlights faces
and prompts you to tag your friends. How does it instantly identify which of your friends is in the
photo?
The answer is simple – Artificial Intelligence. In a video highlighting Facebook’s Artificial
Intelligence research, they discuss the applications of Neural Networks to power their
facial recognition software. Facebook is investing heavily in this area, not only within the
organization, but also through the acquisitions of facial-recognition startups
like Face.com (acquired in 2012 for a rumored $60M), Masquerade (acquired in 2016 for
an undisclosed sum), and Faciometrics (acquired in 2016 for an undisclosed sum).
In June 2016, Facebook announced a new Artificial Intelligence initiative that uses
various deep neural networks such as DeepText – an artificial intelligence engine
that can understand the textual content of thousands of posts per second, with
near-human accuracy.
Instagram
Instagram, acquired by Facebook back in 2012, uses deep learning by making use
of a connection of recurrent neural networks to identify the contextual meaning of
an emoji – which has been steadily replacing slangs (for instance, a laughing
emoji could replace “rofl”).
By algorithmically identifying the sentiments behind emojis, Instagram creates
and auto-suggests emojis and emoji related hashtags. This may seem like a
minor application of AI, but being able to interpret and analyze this emoji-to-text
translation at a larger scale sets the basis for further analysis on how people use
Instagram.
Online Shopping
Do you find yourself in situations where you’re set to buy something, but you end
up buying a lot more than planned, thanks to some super-awesome
recommendations?
Yeah, blame neural networks for that. By making use of neural network and its
learnings, the e-commerce giants are creating Artificial Intelligence systems that
know you better than yourself. Let’s see how:
Search
Your Amazon searches (“earphones”, “pizza stone”, “laptop charger”, etc) return a
list of the most relevant products related to your search, without wasting much
time. In a description of its product search technology, Amazon states that
its algorithms learn automatically to combine multiple relevant features. It uses
past patterns and adapts to what is important for the customer in question.
And what makes the algorithms “learn”? You guessed it right – Neural Networks!
Recommendations
Amazon shows you recommendations using its “customers who viewed this item
also viewed”, “customers who bought this item also bought”, and also via curated
recommendations on your homepage, on the bottom of the item pages, and
through emails. Amazon makes use of Artificial Neural Networks to train its
algorithms to learn the pattern and behaviour of its users. This, in turn, helps
Amazon provide even better and customized recommendations.
CONTENT BEYOND SYLLABUS – UNIT IV

Autoencoders
Autoencoders are a specialized class of algorithms that can learn efficient
representations of input data with no need for labels. It is a class of
artificial neural networks designed for unsupervised learning.
Learning to compress and effectively represent input data without specific
labels is the essential principle of an autoencoder. This is
accomplished using a two-fold structure that consists of an encoder and a
decoder.
The encoder transforms the input data into a reduced-dimensional
representation, which is often referred to as “latent space” or “encoding”.
From that representation, a decoder rebuilds the initial input. For the
network to gain meaningful patterns in data, a process of encoding and
decoding facilitates the definition of essential features.
The general architecture of an autoencoder includes an encoder, decoder,
and bottleneck layer.

Encoder
•The input layer takes the raw input data.
•The hidden layers progressively reduce the dimensionality of the input,
capturing important features and patterns. These layers compose the
encoder.
•The bottleneck layer (latent space) is the final hidden layer, where the
dimensionality is significantly reduced. This layer represents the
compressed encoding of the input data.

Decoder
•The bottleneck layer takes the encoded representation and expands it back
to the dimensionality of the original input.
•The hidden layers progressively increase the dimensionality and aim to
reconstruct the original input.
•The output layer produces the reconstructed output, which ideally should be
as close as possible to the input data.

The loss function used during training is typically a reconstruction loss, measuring the
difference between the input and the reconstructed output. Common choices include mean
squared error (MSE) for continuous data or binary cross-entropy for binary data.

During training, the autoencoder learns to minimize the reconstruction loss, forcing the
network to capture the most important features of the input data in the bottleneck layer.

After the training process, only the encoder part of the autoencoder is retained to encode
the same type of data that was used in the training process. The different ways to constrain
the network are:
Keep small Hidden Layers: If the size of each hidden layer is kept as small as
possible, then the network will be forced to pick up only the representative
features of the data thus encoding the data.
Regularization: In this method, a loss term is added to the cost function which
encourages the network to train in ways other than copying the input.
Denoising: Another way of constraining the network is to add noise to the
input and teach the network how to remove the noise from the data.
Tuning the Activation Functions: This method involves changing the activation functions
of various nodes so that a majority of the nodes are dormant, thus effectively reducing the
size of the hidden layers.
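
A minimal autoencoder sketch in Python using Keras (TensorFlow assumed to be available); the layer
sizes and the random stand-in data are illustrative only:

    import numpy as np
    from tensorflow.keras import layers, models

    X = np.random.rand(1000, 64).astype("float32")   # stand-in for real input data

    autoencoder = models.Sequential([
        layers.Input(shape=(64,)),
        layers.Dense(32, activation="relu"),    # encoder hidden layer
        layers.Dense(8, activation="relu"),     # bottleneck (latent space)
        layers.Dense(32, activation="relu"),    # decoder hidden layer
        layers.Dense(64, activation="sigmoid"), # reconstruction of the input
    ])

    # Reconstruction loss: mean squared error between the input and its reconstruction.
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)   # target is the input itself

    print("Reconstruction MSE:", autoencoder.evaluate(X, X, verbose=0))

Keeping the hidden layers small, as in this sketch, is the first of the constraint strategies listed
above.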
ASSESSMENT SCHEDULE

Tentative schedule for the Assessment During


2023-2024 EVEN semester

S.NO   Name of the Assessment   Start Date   End Date   Portion

1 UNIT TEST 1 2.2.24 9.2.24 UNIT 1

2 IAT 1 12.2.24 17.2.24 UNIT 1 & 2

3 UNIT TEST 2 11.3.24 16.3.24 UNIT 3

4 IAT 2 1.4.24 6.4.24 UNIT 3 & 4

5 MODEL 20.4.24 30.4.24 ALL 5 UNITS

PRESCRIBED TEXT BOOKS AND REFERENCE BOOKS

1. Introduction to Artificial Intelligence and Machine Learning (IBM ICE Publications).

2. Stuart Russell, Peter Norvig, “Artificial Intelligence: A Modern Approach”, Third Edition,
Pearson Education / Prentice Hall of India, 2010.

3. Elaine Rich and Kevin Knight, “Artificial Intelligence”, Third Edition, Tata McGraw-Hill, 2010.

REFERENCES:

1. Patrick H. Winston, “Artificial Intelligence”, Third Edition, Pearson Education, 2006.

2. Dan W. Patterson, “Introduction to Artificial Intelligence and Expert Systems”, PHI, 2006.

3. Nils J. Nilsson, “Artificial Intelligence: A New Synthesis”, Harcourt Asia Pvt. Ltd., 2000.

Mini Projects
1. Fruit Classifier: This project involves using machine learning to identify fruits. It is an
easy project suitable for beginners.

2. Student Performance Predictor: This mini-project involves predicting academic success
using machine learning. It is an easy project suitable for beginners.

3. Sentiment Analysis on Social Media: This project involves analyzing user reviews with
machine learning. It is a medium level project.

4. Loan Approval Prediction: This project involves assessing creditworthiness using machine
learning. It is a medium level project.

5. Autonomous Robot Navigation: This project involves implementing reinforcement learning
for pathfinding. It is a hard project.

6. Medical Image Segmentation: This project involves identifying and analyzing tumor regions
with machine learning. It is a hard project.

7. Natural Language Processing Chatbot: This project involves building an intelligent
conversational agent with machine learning.
Thank you

Disclaimer:

This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document through
email in error, please notify the system manager. This document contains proprietary
information and is intended only to the respective group / learning community as
intended. If you are not the addressee you should not disseminate, distribute or
copy through e-mail. Please notify the sender immediately by e-mail if you have
received this document by mistake and delete this document from your system. If
you are not the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this information is
strictly prohibited.

