

Unit-1(MLT) - Lecture notes 1



Machine learning Techniques (KCS-055)


Departmental Elective II
Unit – 1(Introduction)
1. Introduction-
1.1 What is Learning
Learning is the process of acquiring new understanding, knowledge, behaviors, skills,
values, attitudes, and preferences. The ability to learn is possessed by humans,
animals, and some machines; there is also evidence for some kind of learning in
certain plants.

1.2 Learning in Computer


Ever since computers were invented, we have wondered whether they might be made
to learn. If we could understand how to program them to learn to improve
automatically with experience-the impact would be dramatic. Imagine computers
learning from medical records which treatments are most effective for new diseases,
personal software assistants learning the evolving interests of their users in order to
highlight especially relevant stories from the online morning newspaper. A successful
understanding of how to make computers learn would open up many new uses of
computers and new levels of competence and customization.

We do not yet know how to make computers learn nearly as well as people learn.
However, algorithms have been invented that are effective for certain types of learning
tasks, and a theoretical understanding of learning is beginning to emerge. Many
practical computer programs have been developed to exhibit useful types of learning,
and significant commercial applications have begun to appear. For problems such as
speech recognition, algorithms based on machine learning outperform all other
approaches that have been attempted to date. In the field known as data mining,
machine learning algorithms are being used routinely to discover valuable knowledge
from large commercial databases containing equipment maintenance records, loan
applications, financial transactions, medical records, and the like. As our
understanding of computers continues to mature, it seems inevitable that machine
learning will play an increasingly central role in computer science and computer
technology. In recent years, many successful ML applications have been developed,
ranging from data-mining programs that learn to detect fraudulent credit card
transactions, to information-filtering systems that learn users’ reading preferences, to
autonomous vehicles that learn to drive on public highways.

1.3 Multidisciplinary


Machine learning draws on concepts and results from many fields, including statistics,
artificial intelligence, philosophy, information theory, biology, cognitive science,
computational complexity, and control theory.

2. Introduction to Machine learning from programming perspective-


What is machine learning? Let us try to understand machine learning from a
programming perspective. The field of Machine Learning (ML) is concerned with the
question of how to construct computer programs that automatically improve with
experience. So, how is programming different from machine learning? We will try to
answer that question first and then slowly go into the basic terminology of machine
learning and the various modules of machine learning systems. So, let us try to
understand machine learning from a programmer's perspective. Let's take two
problems.

Figure 1
The first problem: let's write a program to add two numbers a and b. Most of you
will wonder why such a basic question is being asked; this is probably among the
first programs all of us have written. So, how do we write it? We essentially write a
function f() which takes two arguments a and b and returns a + b. This is a program
all of us are familiar with; we can add two numbers very easily by writing a
computer program.
Let us try to solve a slightly different problem with the same technique and see
whether we can solve it or whether we need more tools in our toolkit. The second
problem: say we have a bunch of handwritten digits 8, 9, 2. We have fixed an area in
which these digits can be written, and now the task is: can you write a program to
recognize these digits? Your job is to write a function that recognizes a digit given
its image. So, can you write a program, just as you did for the addition of two
numbers, to recognize handwritten digits? I can imagine that some of you have
started thinking about writing rules for different kinds of numbers. But are rules
really scalable? What if I write a number in a slightly different orientation, or in a
very different style? The rules will probably break; they will not be able to cater to
all situations. Yet as human beings we are able to recognize these numbers. What
makes us able to? We will come to this question in a bit. But before that, can we
write down the process of recognizing these digits just as we did in the other
problem, where we added two numbers? When we were given two numbers a and b,
we immediately came up with a function to add them, which was simply a + b. But
as you must be realizing right now, it is incredibly hard to come up with a stepwise
process to recognize digits.
So, how do we really solve this problem? Before getting into the solution, let us
think about the difference between the two problems: why are we able to solve the
first one very easily, while the second one, recognizing digits with computers, is so
much harder?
What are the key differences between these two problems?
In the first problem, the formula to add two numbers was known to us. So, given two
numbers a and b, I can simply compute a + b and that gives us the answer.
But in the case of the second problem, where we are trying to recognize digits, we
can recognize them with our own vision but are unable to come up with steps that we
can code into the computer so that the computer can also recognize them. So, we
need to do something else: machine learning.
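The contrast above can be made concrete. For the first problem the rule is known, so we can code it directly; no such hand-written rule exists for the second. A minimal sketch of the first problem:

```python
# The first problem: the rule (a + b) is known, so we code it explicitly.
def f(a, b):
    """Add two numbers -- a hand-written, explicit rule."""
    return a + b

print(f(3, 4))  # 7
```

No analogous one-line body can be written for "recognize the digit in this image", which is exactly why the rest of the notes turn to learning from examples.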

Let us take a step back and try to understand why we are able to recognize these
digits. We have been seeing digits like these right from our childhood; when we
started our formal education, we were introduced to them.
So, somehow our brain is trained to recognize these digits even if they are written in
a slightly different style or in a slightly different orientation. Can we mimic the
training that our brain received and give the same training to a computer? This is the
question that ML tries to explore. So, let us write down the key difference between
the traditional programming paradigm and ML.
In the traditional programming world, we have a program: we give some data as
input and we also supply the rules, or rather we code the rules into the program. We
then pass the data into this program, the rules get applied to the data, and we get the
output. We did exactly this while adding two numbers. When we sort numbers, we
likewise give step-by-step instructions to the computer on how to sort them.
Now, let us look at how machine learning operates. Remember the handwritten digit
recognition example: we have data, but we do not have rules.
We cannot write a traditional computer program, but we can provide lots of
examples of handwritten digits along with the corresponding digit. For example, I
can say that 8 is the digit corresponding to this image, 9 is the digit corresponding to
that image, 2 is the digit corresponding to another, and so on. We have many
examples of handwritten digit images along with their actual labels, which are
simply the numbers shown in the images.
We have the data and we also provide the intended output as input to the ML system,
and machine learning comes up with rules, sometimes also called patterns or models.
You can now see a clear difference: in traditional programming the rules are on the
left-hand side (the input side), while in ML the rules are on the right-hand side, and
the output, which was on the right-hand side in traditional programming, has moved
to the left-hand side (the input side). (See Figure 1.)
A traditional program takes data and rules as input; the rules are applied to the input
data to produce the output. In the case of ML, the data and the output are given as
input, and ML comes up with the rules, patterns, or models that it sees in the input
data.
Let us write down the steps in the ML process. We have data and we have labels. We
input them to an ML trainer. The trainer looks at the input data and the
corresponding labels (outputs) and forms rules. This gives us a model, or a set of
rules; the model is nothing but a mapping from input to output. Once we have this
model, we can take new data and pass it through the model to get the output. Notice
that once we have the model, the process is exactly the same as in the programming
world, because once I know the model I know exactly the formula that maps the
input to the output. The work we do in ML training is to take the data and the desired
output and use the ML trainer to come up with a model; once we have the model, we
can use it to produce output on new data. So there are two stages in the machine
learning process- 1. Training 2. Inference or Prediction
The stage in which we start from data and arrive at a model is called the training
stage. The phase in which we apply the model to new data and get the output is
called inference or prediction.
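The two stages above can be sketched as follows. The function names `train` and `predict` are hypothetical, and the "trainer" here is a trivial majority-label rule, used only to make the data-and-labels-in, model-out flow visible:

```python
from collections import Counter

def train(data, labels):
    """Training stage: look at (data, labels) and produce a model.
    Here the 'model' is simply the most common label -- a stand-in
    for a real learned mapping from input to output."""
    return Counter(labels).most_common(1)[0][0]

def predict(model, new_data):
    """Inference stage: apply the learned model to new data."""
    return [model for _ in new_data]

# Training stage: images and their labels go in, a model comes out.
model = train(["img1", "img2", "img3"], [8, 9, 8])
# Inference stage: the model is applied to data it has never seen.
print(predict(model, ["img4", "img5"]))  # [8, 8]
```

A real trainer would of course learn a mapping that depends on the input, but the two-stage shape (fit once, then apply to new data) is exactly the one described in the text.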

3. Definition of Machine Learning


A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.
For example, assume that a machine has to predict whether a customer will buy a
specific product, let's say "Antivirus", this year or not. The machine will do this by
looking at previous knowledge/past experience, i.e., the data on products that the
customer has bought each year. If he buys Antivirus every year, then there is a high
probability that the customer will buy an antivirus this year as well. This is how
machine learning works at the basic conceptual level.
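The antivirus example can be reduced to the simplest possible estimate: the probability of buying this year is the fraction of past years in which the customer bought. The function name and the purchase history below are invented for illustration:

```python
def purchase_probability(history):
    """Estimate P(customer buys this year) as the fraction of past
    years in which the product was bought -- the simplest possible
    frequency estimate, matching the intuition in the text."""
    return sum(history) / len(history)

# 1 = bought Antivirus that year, 0 = did not (hypothetical data).
past_years = [1, 1, 1, 0, 1]
print(purchase_probability(past_years))  # 0.8
```

A customer who bought in four of the last five years gets a high estimated probability, which is exactly the "high probability that the customer is going to buy this year as well" reasoning above.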

4. History of ML

1950 — Alan Turing creates the “Turing Test” to determine if a computer has real
intelligence. To pass the test, a computer must be able to fool a human into believing it
is also human.

1952 — Arthur Samuel wrote the first computer learning program. The program was
the game of checkers, and the IBM computer improved at the game the more it
played, studying which moves made up winning strategies and incorporating those
moves into its program.

1957 — Frank Rosenblatt designed the first neural network for computers (the
perceptron), which simulated the thought processes of the human brain.

1967 — The “nearest neighbor” algorithm was written, allowing computers to begin
using very basic pattern recognition. This could be used to map a route for traveling
salesmen, starting at a random city but ensuring they visit all cities during a short tour.

1979 — Students at Stanford University invent the “Stanford Cart” which can
navigate obstacles in a room on its own.

1981 — Gerald Dejong introduces the concept of Explanation Based Learning (EBL),
in which a computer analyses training data and creates a general rule it can follow by
discarding unimportant data.

1985 — Terry Sejnowski invents NetTalk, which learns to pronounce words the same
way a baby does.

1990s — Work on machine learning shifts from a knowledge-driven approach to a
data-driven approach. Scientists begin creating programs for computers to analyze
large amounts of data and draw conclusions — or “learn” — from the results.

1997 — IBM’s Deep Blue beats the world champion at chess.

2006 — Geoffrey Hinton coins the term “deep learning” to explain new algorithms
that let computers “see” and distinguish objects and text in images and videos.

2010 — The Microsoft Kinect can track 20 human features at a rate of 30 times per
second, allowing people to interact with the computer via movements and gestures.

2011 — IBM’s Watson beats its human competitors at Jeopardy.

2011 — Google Brain is developed, and its deep neural network can learn to discover
and categorize objects much the way a cat does.

2012 – Google’s X Lab develops a machine learning algorithm that is able to
autonomously browse YouTube videos to identify the videos that contain cats.

2014 – Facebook develops DeepFace, a software algorithm that is able to recognize
or verify individuals in photos to the same level as humans can.

2015 – Amazon launches its own machine learning platform.

2015 – Microsoft creates the Distributed Machine Learning Toolkit, which enables the
efficient distribution of machine learning problems across multiple computers.

2015 – Over 3,000 AI and Robotics researchers, endorsed by Stephen Hawking, Elon
Musk and Steve Wozniak (among many others), sign an open letter warning of the
danger of autonomous weapons which select and engage targets without human
intervention.

2016 – Google’s artificial intelligence algorithm beats a professional player at the
Chinese board game Go, which is considered the world’s most complex board game
and is many times harder than chess. The AlphaGo algorithm developed by Google
DeepMind managed to win five games out of five in the Go competition.

5. Relation of Artificial Intelligence, Machine Learning and Deep Learning

Artificial Intelligence (AI) -the broad discipline of creating intelligent machines.



Machine Learning (ML) -refers to systems that can learn from experience.

Deep Learning (DL) -refers to systems that learn from experience on large data sets.

6. Types of Learning

6.1 Supervised Learning:
Supervised learning is when the model is trained on a labelled dataset. A labelled
dataset is one which has both input and output parameters. In this type of learning,
both the training and validation datasets are labelled, as shown in the figures below.

Both of the above figures show labelled datasets:

• Figure A: a dataset of a shopping store, useful for predicting whether a customer
will purchase a particular product under consideration or not, based on his/her
gender, age and salary.
Input: Gender, Age, Salary
Output: Purchased, i.e. 0 or 1; 1 means the customer will purchase and 0 means
the customer won't purchase.
• Figure B: a meteorological dataset which serves the purpose of predicting wind
speed based on different parameters.
Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
Output: Wind Speed

Types of Supervised Learning:

1. Classification: a supervised learning task where the output has defined labels
(discrete values). For example, in Figure A above, the output Purchased has
defined labels, i.e. 0 or 1; 1 means the customer will purchase and 0 means the
customer won't. Classification can be either binary or multi-class. In binary
classification, the model predicts either 0 or 1 (yes or no), whereas in multi-class
classification the model predicts one of more than two classes.
Example: Gmail classifies mail into several classes such as Social, Promotions,
Updates and Forums.
2. Regression: a supervised learning task where the output has a continuous value.
For example, in Figure B above, the output Wind Speed does not take discrete
values but is continuous within a range. The goal is to predict a value as close to
the actual output as our model can, and evaluation is done by calculating the
error value. The smaller the error, the greater the accuracy of our regression
model.
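To make supervised learning concrete, here is a minimal sketch of one simple supervised method, 1-nearest-neighbour, on Figure-A-style data. The data values are invented, only age and salary are used for simplicity, and the notes do not prescribe this particular algorithm:

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction: return the label of the training
    point closest to x (squared Euclidean distance, which is monotone
    in the true distance, so the argmin is the same)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]

# Figure-A-style labelled data: (age, salary) -> purchased (0/1).
# Values are made up for illustration; a real dataset would be larger.
X = [(25, 20000), (47, 85000), (52, 90000), (23, 18000)]
y = [0, 1, 1, 0]
print(nearest_neighbor_predict(X, y, (50, 88000)))  # 1
```

Because the output here is a discrete label, this is classification; the same function would perform (crude) regression if `train_y` held continuous values such as wind speeds.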

6.2 Unsupervised Learning


Unsupervised learning is the training of a machine using information that is not
labelled (no output label is present), allowing the algorithm to act on that
information without guidance. Here the task of the machine is to group information
according to similarities, patterns and differences without any prior training on the
data. Unlike supervised learning, no teacher is provided, which means no training
will be given to the machine; the machine must find the hidden structure in
unlabelled data by itself.
For instance, suppose the machine is given images containing both dogs and cats
that it has never seen before.

The machine has no idea about the features of dogs and cats, so it cannot categorize
the images as "dogs" and "cats". But it can categorize them according to their
similarities, patterns and differences, i.e., it can easily split the pictures into two
groups: one containing all the pictures with dogs, and the other all the pictures with
cats. The machine learned nothing beforehand; there were no training data or
examples.
Unsupervised learning is classified into two categories of algorithms:
• Clustering: a clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behavior.
• Association: an association rule learning problem is where you want to discover
rules that describe large portions of your data, such as "people that buy X also
tend to buy Y".
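A minimal sketch of clustering, assuming 1-D data and two groups (the notes name no specific algorithm; this is the classic k-means idea, pared down for illustration):

```python
def kmeans_1d(points, k=2, iters=10):
    """A minimal k-means sketch on 1-D data: alternately assign each
    point to its nearest centre, then move each centre to the mean of
    its assigned points. No labels are ever used -- the groupings
    emerge from the data alone."""
    centres = points[:k]                 # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centres[j]))
            clusters[nearest].append(p)
        centres = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
    return centres, clusters

# Two obvious groups hidden in unlabelled numbers.
centres, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
print(sorted(round(c, 2) for c in centres))  # [1.0, 9.0]
```

The algorithm is never told which group a point belongs to, which is precisely the "no teacher" setting described above.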

6.3 Semi-Supervised Learning


The most basic disadvantage of any supervised learning algorithm is that the dataset
has to be hand-labelled, either by a machine learning engineer or a data scientist.
This is a very costly process, especially when dealing with large volumes of data.
The most basic disadvantage of unsupervised learning is that its application
spectrum is limited.
To counter these disadvantages, the concept of semi-supervised learning was
introduced. In this type of learning, the algorithm is trained on a combination of
labelled and unlabelled data. Typically, this combination contains a very small
amount of labelled data and a very large amount of unlabelled data. The basic
procedure is that the programmer first clusters similar data using an unsupervised
learning algorithm and then uses the existing labelled data to label the rest of the
unlabelled data.
Intuitively, one may imagine the three types of learning algorithms as Supervised
learning where a student is under the supervision of a teacher at both home and
school, Unsupervised learning where a student has to figure out a concept himself and
Semi-Supervised learning where a teacher teaches a few concepts in class and gives
questions as homework which are based on similar concepts.
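One simple instance of the procedure described above can be sketched as follows: the few labelled points are used to label the large unlabelled pool. Here each unlabelled 1-D point simply takes the label of its nearest labelled point; the data and labels are invented for illustration:

```python
def propagate_labels(labeled, unlabeled):
    """Semi-supervised sketch: give each unlabeled point the label of
    its nearest labeled point (1-D values for simplicity).
    `labeled` is a list of (value, label) pairs."""
    out = []
    for x in unlabeled:
        _, label = min(labeled, key=lambda vl: abs(vl[0] - x))
        out.append((x, label))
    return out

# A tiny hand-labelled set plus a larger unlabelled pool.
labeled = [(1.0, "cat"), (9.0, "dog")]
print(propagate_labels(labeled, [1.3, 8.7, 0.5]))
# [(1.3, 'cat'), (8.7, 'dog'), (0.5, 'cat')]
```

Only two points were hand-labelled, yet the whole pool ends up labelled, which is the economic argument for semi-supervised learning made above.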
6.4 Reinforcement Learning
Reinforcement learning addresses the question of how a system that senses and acts
in its environment can learn to choose optimal actions to achieve its goals. This very
generic problem covers tasks such as learning to control a mobile robot, learning to
optimize operations in factories, and learning to play board games. Each time the
system performs an action in its environment, a trainer may provide a reward or
penalty to indicate the desirability of the resulting state. The task of the agent is to
learn to choose sequences of actions that produce the greatest cumulative reward.

Consider the scenario of teaching new tricks to your cat:

• As the cat doesn't understand English or any other human language, we can't tell
her directly what to do. Instead, we follow a different strategy.
• We emulate a situation, and the cat tries to respond in many different ways. If the
cat's response is the desired one, we give her fish; otherwise, some penalty.
• Now whenever the cat is exposed to the same situation, it executes a similar
action even more enthusiastically, in expectation of getting more reward (food).
• That is how the cat learns "what to do" from positive experiences.
• At the same time, the cat also learns what not to do from negative experiences.
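The cat example can be sketched as a tiny reward-learning loop. The action names, reward values, learning rate and exploration rate below are all invented for illustration; real reinforcement learning algorithms (e.g. Q-learning) generalize this to states and sequences of actions:

```python
import random

def train_cat(episodes=1000, seed=0):
    """Minimal reward-learning sketch: the agent keeps a running score
    per action, usually exploits the best-scoring action, sometimes
    explores, and nudges each tried action's score toward the reward
    (fish) or penalty it received."""
    random.seed(seed)
    scores = {"sit": 0.0, "jump": 0.0, "ignore": 0.0}
    reward = {"sit": 1.0, "jump": 0.2, "ignore": -0.5}  # fish vs penalty
    for _ in range(episodes):
        if random.random() < 0.2:                # explore occasionally
            action = random.choice(list(scores))
        else:                                    # exploit best so far
            action = max(scores, key=scores.get)
        # Move this action's score a small step toward its reward.
        scores[action] += 0.1 * (reward[action] - scores[action])
    return max(scores, key=scores.get)

print(train_cat())  # 'sit'
```

After enough episodes the rewarded action dominates: positive experiences teach "what to do" and penalties teach "what not to do", exactly as in the cat analogy.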

7. Successful Applications of Machine Learning-


-Learning to recognize spoken words- All of the most successful speech recognition
systems employ machine learning in some form. For example, the SPHINX system
(e.g., Lee 1989) learns speaker-specific strategies for recognizing the primitive sounds
(phonemes) and words from the observed speech signal. Neural network learning
methods (e.g., Waibel et al. 1989) and methods for learning hidden Markov models
(e.g., Lee 1989) are effective for automatically customizing to individual speakers,
vocabularies, microphone characteristics, background noise, etc. Similar techniques
have potential applications in many signal-interpretation problems.
-Learning to drive an autonomous vehicle- ML methods have been used to train
computer-controlled vehicles to steer correctly when driving on a variety of road
types. For example, the ALVINN system (Pomerleau 1989) has used its learned
strategies to drive unassisted at 70 miles per hour for 90 miles on public highways
among other cars. Similar techniques have possible applications in many sensor-based
control problems.
-Learning to classify new astronomical structures-
Machine learning methods have been applied to a variety of large databases to learn
general regularities implicit in the data. For example, decision tree learning algorithms
have been used by NASA to learn how to classify celestial objects from the second
Palomar Observatory Sky Survey (Fayyad et al. 1995). This system is now used to
automatically classify all objects in the Sky Survey, which consists of three terabytes
of image data.
-Learning to play world-class backgammon- The most successful computer
programs for playing games such as backgammon are based on ML algorithms. For
example, the world's top computer program for backgammon, TD-GAMMON
(Tesauro 1992, 1995), learned its strategy by playing over one million practice games
against itself. It now plays at a level competitive with the human world champion.

Similar techniques have applications in many practical problems where very large
search spaces must be examined efficiently.

8.Well-Posed/Defined Learning Problems


Let us begin our study of ML by considering a few learning tasks. We will define
learning broadly, to include any computer program that improves its performance at
some task through experience. Put more precisely,
Definition: A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
For example, a computer program that learns to play checkers might improve its
performance as measured by its ability to win at the class of tasks involving playing
checkers games, through experience obtained by playing games against itself. In
general, to have a well-defined learning problem, we must identify these three
features: the class of tasks, the measure of performance to be improved, and the source
of experience.
A checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won against opponents
Training experience E: playing practice games against itself
We can specify many learning problems in this fashion, such as learning to recognize
handwritten words, or learning to drive a robotic automobile autonomously.

A handwriting recognition learning problem:


Task T: recognizing and classifying handwritten words within images
Performance measure P: percent of words correctly classified
Training experience E: a database of handwritten words with given classifications

A robot driving learning problem:


Task T: driving on public four-lane highways using vision sensors
Performance measure P: average distance traveled before an error (as judged by
human overseer)
Training experience E: a sequence of images and steering commands recorded while
observing a human driver
9. Hypothesis- A hypothesis (plural: hypotheses), in a scientific context, is a testable
statement about the relationship between two or more variables, or a proposed
explanation for some observed phenomenon. A hypothesis can then be tested for
truth via experiments or mathematical proofs. If it is confirmed, it becomes a theory;
a theory is a proven explanation of a phenomenon.

A machine learning hypothesis is a candidate model that approximates a target
function for mapping inputs to outputs.

10. Designing A Learning System- Let us take the simple example of designing a
system to predict house prices.
10.1 Choosing the Training Experience- This is the most important step in designing
an ML system. A credible, live set of training examples can serve the purpose; a
survey can be conducted to obtain a good set of training examples. For example, in
designing an ML system for predicting house prices, training data can be of the form
below.
{X1, X2, X3, X4, X5, X6, X7, X8, X9, X10} : {Y1, Y2, Y3, Y4, Y5, Y6, Y7, Y8, Y9, Y10}
X1, X2, X3, … are called data points (inputs) and Y1, Y2, … are called labels
(outputs). Each data point can have multiple features; for example, each data point
can have four features, i.e. X1 has features X11, X12, X13, X14.
For example, in the case of the house price prediction problem, these features could
be:
1. No. of rooms in the house
2. Distance from the market
3. Distance from the station
4. Area in square feet
Y1, Y2, Y3, … are the corresponding prices in the training data set.
Task T: Price estimation of a house
Performance measure P: Accuracy with which price is estimated
Training experience E: Data points collected from training examples
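The training experience above can be laid out directly in code. Every value below is invented purely to show the shape of the data: each data point Xi carries the four features Xi1..Xi4, paired with one price label Yi:

```python
# Hypothetical training set for the house price problem.
# Feature order matches the notes:
# (rooms, distance_to_market_km, distance_to_station_km, area_sqft)
X = [
    (3, 1.5, 0.8, 1200),
    (2, 3.0, 2.5,  800),
    (4, 0.5, 1.0, 1600),
]
Y = [150000, 90000, 210000]   # corresponding prices (labels)

assert len(X) == len(Y)       # one label per data point
print(X[0], "->", Y[0])
```

This pairing of data points with labels is the training experience E that the rest of the design builds on.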

In order to complete the design of the learning system, we must now choose
1. the exact type of knowledge to be learned
2. a representation for this target knowledge
3. a learning mechanism

10.2 Choosing the Target Function/Evaluation function and its representation-


The next design choice is to determine exactly what type of knowledge will be
learned and how it will be used by the performance program. In the house price
example, we need a function which can map a set of inputs to a set of outputs,
i.e. Estimateprice, V(X): X ---> Y
For example, a chess-playing program needs only to learn how to choose the best
move from among the legal moves. This learning task is representative of a large
class of tasks for which the legal moves that define some large search space are
known a priori, but for which the best search strategy is not known. Many
optimization problems fall into this class, such as the problems of scheduling and
controlling manufacturing processes, where the available manufacturing steps are
well understood but the best strategy for sequencing them is not. In any ML model, it
is useful to reduce the problem of improving performance P at task T to the problem
of learning some particular target function such as Estimateprice, V(X). The choice
of the target function is therefore a key design choice.
We can represent the target function as a linear combination of the various features,
a quadratic polynomial of the features, or an artificial neural network. In general,
this choice of representation involves a crucial tradeoff, and depends upon the nature
of the problem and the experience of the designer in the field. On one hand, we wish
to pick a very expressive representation to allow representing as close an
approximation as possible to the ideal target function V(X). On the other hand, the
more expressive the representation, the more training data the program will require
in order to choose among the alternative hypotheses it can represent. To keep the
discussion brief, let us choose a simple representation:
for any given data point, the function will be calculated as a linear combination of
the features of the data point.
X1: No. of rooms in the house
X2: Distance from the market
X3: Distance from the station
X4: Area in square feet
Estimateprice(V) = W0 + W1X1 + W2X2 + W3X3 + W4X4

Thus, our learning program will represent the target function as a linear function of
the above form, where W0 through W4 are numerical coefficients, or weights, to be
chosen by the learning algorithm. The learned values of the weights W1 through W4
will determine the relative importance of the various house features in determining
the price of the house, whereas the weight W0 provides an additive constant.
To summarize our design choices so far, we have elaborated the original formulation
of the learning problem by choosing a type of training experience, a target function
to be learned, and a representation for this target function. The partial design of a
house price estimation program therefore becomes:
Task T: Price estimation of a house
Performance measure P: Accuracy with which price is estimated
Training experience E: Data points collected from training examples
Target function, Estimateprice(V): X ---> Y
Representation of target function: Estimateprice(V) = W0 + W1X1 + W2X2 + W3X3 + W4X4
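The linear representation above transcribes directly into code. The weight values below are invented purely for illustration (a base price plus per-feature contributions); in practice they would be chosen by the learning algorithm described next:

```python
def estimate_price(x, w):
    """Linear target function from the notes:
    V(X) = W0 + W1*X1 + W2*X2 + W3*X3 + W4*X4.
    `x` holds the four features and `w` the five weights (W0 first)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# Hypothetical weights: W0 base price, then one weight per feature.
w = [10000, 25000, -5000, -3000, 100]
x = (3, 1.5, 0.8, 1200)   # rooms, market km, station km, sqft
print(estimate_price(x, w))  # 195100
```

Note the signs: distance features get negative weights (farther from the market or station lowers the estimate), which is the kind of relative importance the weights are meant to capture.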

10.3 Choosing a Function Approximation Algorithm- Now we need to find
optimized values of W0, W1, W2, W3, W4 so that the difference between the
predicted value and the actual value is minimized.
10.3.1 Adjusting the weights
All that remains is to specify the learning algorithm for choosing the weights Wi
that best fit the set of training examples {Xi, Yi}. One common approach is to define
the best hypothesis, or set of weights, as the one that minimizes the squared error E
between the training values and the values predicted by the hypothesis V:

E = ∑_{i=1}^{n} (Yi − V(Xi))²

Thus, we seek the weights that minimize E for the observed training examples.
Several algorithms are known for finding weights of a linear function that minimize E
defined in this way. In our case, we require an algorithm that will incrementally refine
the weights as new training examples become available and that will be robust to
errors in these estimated training values. One such algorithm is called the least mean
squares, or LMS training rule. For each observed training example it adjusts the
weights a small amount in the direction that reduces the error on this training example.
This algorithm can be viewed as performing a stochastic gradient-descent search
through the space of possible hypotheses (weight values) to minimize the squared
error E. The LMS algorithm is defined as follows:

LMS weight update rule.


For each training example (Xi, Yi), use the current weights to calculate V(Xi).
For each weight Wi, update it as
Wi <---- Wi + η (Yi − V(Xi)) Xi
where Xi here denotes the value of the i-th feature of the data point, and η is a small
constant (e.g., 0.1) that moderates the size of the weight update. To get an intuitive
understanding of why this weight update rule works, notice that when the error
(Yi − V(Xi)) is zero, no weights are changed. When (Yi − V(Xi)) is positive (i.e.,
when V(Xi) is too low), then each weight is increased in proportion to the value of its
corresponding feature. This raises the value of V(Xi), reducing the error. Notice that
if the value of some feature Xi is zero, then its weight is not altered regardless of the
error, so the only weights updated are those whose features actually occur in the
training example. Surprisingly, in certain settings this simple weight-tuning method
can be proven to converge to the least-squared-error approximation to the training
values.
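The LMS rule above can be sketched directly in Python. The learning rate, epoch count and toy data below are invented for illustration; the data are generated from a known linear function so that convergence can be checked against the true weights:

```python
def lms_train(X, Y, lr=0.1, epochs=100):
    """LMS (least mean squares) training from the notes: for each
    example, update every weight by  Wi <- Wi + lr * (y - V(x)) * xi,
    where x0 = 1 so that W0 acts as the additive constant."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            xe = [1.0] + list(x)                       # prepend bias input
            v = sum(wi * xi for wi, xi in zip(w, xe))  # current prediction
            err = y - v                                # (Yi - V(Xi))
            w = [wi + lr * err * xi for wi, xi in zip(w, xe)]
    return w

# Toy data from price = 2 + 3 * feature (noiseless, one feature), so
# the learned weights should approach W0 = 2, W1 = 3.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
Y = [2.0, 5.0, 8.0, 11.0]
w = lms_train(X, Y)
print([round(wi, 2) for wi in w])  # approximately [2.0, 3.0]
```

Because the data are noiseless and genuinely linear, the stochastic updates drive the error on every example toward zero, illustrating the convergence claim at the end of the section.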

11. Perspectives and issues in Machine Learning


One useful perspective on machine learning is that it involves searching a very large
space of possible hypotheses to determine the one that best fits the observed data and
any prior knowledge held by the learner. For example, consider the space of
hypotheses that could in principle be output by the above house price learner. This
hypothesis space consists of all evaluation functions that can be represented by some
choice of values for the weights W0 through W4. The learner's task is thus to search
through this vast space to locate the hypothesis that is most consistent with the
available training examples. The LMS algorithm for fitting weights achieves this
goal by iteratively tuning the weights, adding a correction to each weight each time
the hypothesized evaluation function predicts a value that differs from the training
value. This algorithm works well when the hypothesis representation considered by
the learner defines a continuously parameterized space of potential hypotheses.
The field of machine learning is concerned with answering questions such as the
following:
1. What algorithms exist for learning general target functions from specific training
examples? In what settings will particular algorithms converge to the desired
function, given sufficient training data? Which algorithms perform best for which
types of problems and representations?
2. How much training data is sufficient? What general bounds can be found to relate
the confidence in learned hypotheses to the amount of training experience and the
character of the learner's hypothesis space?

3. When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is only
approximately correct?
4. What is the best strategy for choosing a useful next training experience, and how
does the choice of this strategy alter the complexity of the learning problem?
5. What is the best way to reduce the learning task to one or more function
approximation problems? Put another way, what specific functions should the
system attempt to learn? Can this process itself be automated?
6. How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?

12. Data Science vs Machine Learning


Data Science follows an interdisciplinary approach. It lies at the intersection of
Maths, Statistics, Artificial Intelligence, Software Engineering and Design
Thinking. Data Science deals with data collection, cleaning, analysis, visualisation,
model creation, model validation, prediction, designing experiments, hypothesis
testing and much more. The aim of all these steps is just to derive insights from
data.
Machine learning is a branch of artificial intelligence that is utilised by data
science to achieve its objectives. It gives machines the ability to learn, without
being explicitly programmed.
