Unit 1 Notes
We do not yet know how to make computers learn nearly as well as people learn.
However, algorithms have been invented that are effective for certain types of learning
tasks, and a theoretical understanding of learning is beginning to emerge. Many
practical computer programs have been developed to exhibit useful types of learning,
and significant commercial applications have begun to appear. For problems such as
speech recognition, algorithms based on machine learning outperform all other
approaches that have been attempted to date. In the field known as data mining,
machine learning algorithms are being used routinely to discover valuable knowledge
from large commercial databases containing equipment maintenance records, loan
applications, financial transactions, medical records, and the like. As our
understanding of computers continues to mature, it seems inevitable that machine
learning will play an increasingly central role in computer science and computer
technology. In recent years, many successful ML applications have been developed,
ranging from data-mining programs that learn to detect fraudulent credit card
transactions, to information-filtering systems that learn users’ reading preferences, to
autonomous vehicles that learn to drive on public highways.
Figure 1
The first problem: let’s write a program to add two numbers a and b. Most of you
will wonder what kind of question this is; it is such a basic question that this
program is probably among the earliest programs all of us have written. So, how
do we really write this program? We essentially write a function f() which takes two
arguments a and b and returns a + b. This is a program that all of you are
familiar with; we can add two numbers very easily by writing a computer program.
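The addition program described above can be written in a few lines of Python:

```python
def f(a, b):
    # The rule (a + b) is known to us, so we code it by hand.
    return a + b

print(f(3, 4))  # 7
```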
Let us try to solve a slightly different problem with the same technique and we will
see whether we can solve it or if we need some more tools in our toolkit. The second
problem is: let’s say we have a bunch of handwritten digits, such as 8, 9, and 2. What we have
done is fix an area in which you can write these digits, and now the task is:
can you write a program to recognize these digits? Your job is to write a function that
recognizes the digit given its image. So, can you write a program, just as you
did for the addition of two numbers, to recognize handwritten digits? Now, I can
imagine that some of you must have started thinking about writing rules for different
kinds of numbers. But are rules really scalable? What if I write a number in a slightly
different orientation, or in a very different style? Probably the rules will
break; rules would not be able to cater to all the situations. But as human beings, we
are able to recognize these numbers. What makes us recognize them? We
will come to this question in a bit. But before that, can we write down the process of
recognizing these digits just as we did in the other problem, where we added two
numbers? When we were given two numbers a and b, we immediately came up
with a function to add them, which was simply a + b. But as you can imagine,
or as you must be finding right now, it is incredibly hard to come up with a
stepwise process to recognize the digits.
So, how do we really solve this problem? Before getting into the solution,
let us think about the difference between these two problems: why are we
able to solve the first problem very easily, while the second problem, recognizing
digits with computers, is much harder for us?
What are the key differences between these two problems?
In the first problem, the formula to add two numbers was known to us. So, given two
numbers a and b, we could simply compute a + b, and that gave us the answer.
But in the case of the second problem, where we are trying to recognize digits, we can
recognize them with our vision but are unable to come up with steps that we can code
up in the computer so that the computer can also start recognizing digits. So, we need
to do something else: machine learning.
Let us take a step back and try to understand why we are able to recognize these digits.
You can think of it this way: we have been seeing these kinds of digits right from our
childhood. When we started our formal education, we were introduced to these digits.
So, somehow our brain is trained to recognize these digits even if they are written in a
slightly different style or in a slightly different orientation. Can we try to mimic the
training that was provided to our brain; can we give the same training to a computer?
Let’s try to explore that. This is the question that ML tries to explore. So, let us write
down the key difference between the traditional programming paradigm and ML.
In the traditional programming world, we have a program; we give some data as
input and we also input the rules. Rather, we code these rules in the program and then
pass the data into this program; the rules get applied to the data and we get the output.
We did exactly the same thing while adding two numbers. When we sort numbers,
we also give step-by-step instructions to the computer on how to sort them.
Now, let us look at how machine learning operates. Remember the handwritten digit
recognition example: we have data, but we do not have rules.
We cannot write a traditional computer program, but we can provide lots of
examples of handwritten digits along with the corresponding digit. For example, I can
say that for this image, 8 is the corresponding digit; for this image it is 9; for this
image it is 2; and this one is 8. We have lots of examples where we have images of
handwritten digits along with their actual labels, which are nothing but the numbers
written in the handwritten digit images.
We have data, and we also provide the intended output as input to ML, and machine
learning comes up with rules, sometimes also called patterns or models.
You can now see a clear difference here: in traditional programming, the rules are on
the left-hand side (input side), and in ML, the rules are on the right-hand side; the
output, which was on the right-hand side in traditional programming, has moved to
the left-hand side (input side). (See Figure 1)
The traditional program takes data and rules as input, the rules are applied to the input
data to produce the output. In the case of ML, data and the desired output are given
as input, and ML comes up with the rules, patterns, or models that it sees in the
input data.
Let us write down the steps in the ML process. We have data and
we have labels; we input them to the ML trainer. The trainer looks at the input data
and the corresponding labels (outputs) and forms rules. This gives us a model, or rules;
the model is nothing but a mapping from input to output. Once we get this
model, we can take new data and pass it through the model to get the output.
Once we have the model, the process is exactly the same as in the traditional
programming world, because once I know the model, I know exactly the formula to
map the input to the output. The work that we do in ML training is to take the data
and the desired output and use the ML trainer to come up with a model; once we have
the model, we can use it to get the output on new data. So there are two stages in the
machine learning process:
1. Training
2. Inference or Prediction
The stage where we start from data and reach a model is called the training stage.
The phase where we apply the model to new data and get the output is called
inference or prediction.
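The two stages above can be illustrated with a deliberately tiny sketch: a 1-nearest-neighbour learner whose "model" is simply the stored labelled examples. The data points and labels below are invented for illustration (two point clusters standing in for two digit classes).

```python
import math

# Training stage: data and labels go in, a model comes out.
# For 1-nearest-neighbour, the model is just the stored labelled examples.
def train(points, labels):
    return list(zip(points, labels))

# Inference stage: apply the model to new data to get an output label.
def predict(model, query):
    nearest = min(model, key=lambda example: math.dist(example[0], query))
    return nearest[1]

# Hypothetical labelled data: two clusters standing in for two digit classes.
data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels = ["8", "8", "9", "9"]

model = train(data, labels)               # training
print(predict(model, (0.1, 0.0)))          # inference -> "8"
print(predict(model, (4.8, 5.2)))          # inference -> "9"
```

Note how, once training is done, prediction looks just like ordinary programming: a fixed mapping from input to output.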
4. History of ML
1950 — Alan Turing creates the “Turing Test” to determine if a computer has real
intelligence. To pass the test, a computer must be able to fool a human into believing it
is also human.
1952 — Arthur Samuel wrote the first computer learning program. The program was
the game of checkers, and the IBM computer improved at the game the more it
played, studying which moves made up winning strategies and incorporating those
moves into its program.
1957 — Frank Rosenblatt designed the first neural network for computers (the
perceptron), which simulates the thought processes of the human brain.
1967 — The “nearest neighbor” algorithm was written, allowing computers to begin
using very basic pattern recognition. This could be used to map a route for traveling
salesmen, starting at a random city but ensuring they visit all cities during a short tour.
1979 — Students at Stanford University invent the “Stanford Cart” which can
navigate obstacles in a room on its own.
1981 — Gerald Dejong introduces the concept of Explanation Based Learning (EBL),
in which a computer analyses training data and creates a general rule it can follow by
discarding unimportant data.
1985 — Terry Sejnowski invents NetTalk, which learns to pronounce words the same
way a baby does.
2006 — Geoffrey Hinton coins the term “deep learning” to explain new algorithms
that let computers “see” and distinguish objects and text in images and videos.
2010 — The Microsoft Kinect can track 20 human features at a rate of 30 times per
second, allowing people to interact with the computer via movements and gestures.
2011 — Google Brain is developed, and its deep neural network can learn to discover
and categorize objects much the way a cat does.
2015 – Microsoft creates the Distributed Machine Learning Toolkit, which enables the
efficient distribution of machine learning problems across multiple computers.
2015 – Over 3,000 AI and Robotics researchers, endorsed by Stephen Hawking, Elon
Musk and Steve Wozniak (among many others), sign an open letter warning of the
danger of autonomous weapons which select and engage targets without human
intervention.
Machine Learning (ML) -refers to systems that can learn from experience.
Deep Learning (DL) -refers to systems that learn from experience on large data sets.
6. Types of Learning
6.1 Supervised Learning:
Supervised learning is when the model is trained on a labelled
dataset. A labelled dataset is one which has both input and output parameters. In this
type of learning, both training and validation datasets are labelled, as shown in the
figures below.
In regression, the output variable is not a discrete value but is continuous within a
particular range. The goal here is to predict a value as close to the actual output
value as our model can, and evaluation is then done by calculating an error value.
The smaller the error, the greater the accuracy of our regression model.
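The error-based evaluation described above can be sketched as follows. The actual values and the two models' predictions are invented for illustration; mean squared error is one common choice of error value.

```python
# Mean squared error: average of the squared differences between
# actual and predicted values.
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical continuous targets and two competing models' predictions.
actual  = [3.0, 5.0, 7.0]
model_a = [2.9, 5.1, 7.2]   # predictions close to the actual values
model_b = [1.0, 9.0, 4.0]   # predictions far from the actual values

# The smaller the error, the better the regression model.
print(mean_squared_error(actual, model_a) < mean_squared_error(actual, model_b))  # True
```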
Thus, the machine has no idea about the features of dogs and cats, so it cannot
categorize the pictures into dogs and cats directly. But it can categorize them
according to their similarities, patterns, and differences; that is, it can easily split
the above pictures into two parts. The first part may contain all pictures having dogs
in them, and the second part all pictures having cats. Here nothing was learned
beforehand, meaning there is no training data or examples.
Unsupervised learning is classified into two categories of algorithms:
Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as people that buy X
also tend to buy Y.
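The clustering idea above, discovering groupings with no labels at all, can be sketched with a minimal one-dimensional k-means (k = 2). The data points are invented, and the sketch assumes neither cluster ever becomes empty.

```python
# Minimal 1-D k-means with two clusters: group points by similarity
# without any labels being provided.
def kmeans_1d(points, iters=10):
    # Initialise the two centroids at the extremes of the data.
    c1, c2 = min(points), max(points)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearer centroid.
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # Update step: move each centroid to the mean of its cluster.
        # (Assumes g1 and g2 are non-empty, which holds for this data.)
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# Two obvious groupings emerge without any labels.
low, high = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.3, 8.7])
print(low)   # [0.8, 1.0, 1.2]
print(high)  # [8.7, 9.0, 9.3]
```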
Now, whenever the cat is exposed to the same situation, it
executes a similar action even more enthusiastically, in
expectation of getting more reward (food).
That is the kind of learning the cat gets about "what to do" from
positive experiences.
At the same time, the cat also learns what not to do when faced with
negative experiences.
Similar techniques have applications in many practical problems where very large
search spaces must be examined efficiently.
A hypothesis is a proposed explanation for some observed phenomenon. A hypothesis
can then be tested for truth via experiments or mathematical proofs. If it is tested
and found correct, it becomes a theory. A theory is a proven explanation of a
phenomenon.
In order to complete the design of the learning system, we must now choose
1. the exact type of knowledge to be learned
2. a representation for this target knowledge
3. a learning mechanism
Thus, our learning program will represent the Target Function as a linear function of
the form EstimatePrice(V) = W0 + W1X1 + W2X2 + W3X3 + W4X4, where W0 through W4 are
numerical coefficients, or weights, to be chosen by the learning algorithm. Learned
values for the weights W1 through W4 will determine the relative importance of the
various house features in determining the price of the house, whereas the weight W0
provides an additive constant.
To summarize our design choices thus far, we have elaborated the original formulation
of the learning problem by choosing a type of training experience, a target function to
be learned, and a representation for this target function. Therefore, the partial design
of a house-pricing estimation program becomes:
Task T: price estimation of a house
Performance measure P: accuracy with which the price is estimated
Training experience E: data points collected from training examples
Target function: EstimatePrice(V): X ---> Y
Representation of target function: EstimatePrice(V) = W0 + W1X1 + W2X2 + W3X3 + W4X4
Thus, we seek the weights that minimize E for the observed training examples.
Several algorithms are known for finding the weights of a linear function that minimize E
defined in this way. In our case, we require an algorithm that will incrementally refine
the weights as new training examples become available and that will be robust to
errors in these estimated training values. One such algorithm is called the least mean
squares, or LMS, training rule. For each observed training example, it adjusts the
weights by a small amount in the direction that reduces the error on that training example.
This algorithm can be viewed as performing a stochastic gradient-descent search
through the space of possible hypotheses (weight values) to minimize the squared
error E. The LMS algorithm is defined as follows:
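A sketch of the LMS update rule in Python: for each example, every weight is nudged in the direction that reduces the squared error on that example, Wi <- Wi + eta * (y - y_hat) * Xi. The training data below is synthetic, generated from known weights (a one-feature version of the house-pricing function) so we can check that the rule recovers them.

```python
# One LMS step: adjust the weights a small amount to reduce the error
# on a single training example (x, y).
def lms_step(weights, x, y, eta=0.02):
    # x[0] is fixed at 1.0 so weights[0] plays the role of the constant W0.
    y_hat = sum(w * xi for w, xi in zip(weights, x))
    error = y - y_hat
    return [w + eta * error * xi for w, xi in zip(weights, x)]

# Synthetic data from the known target y = 2 + 3*x1 (weights we hope to recover).
examples = [([1.0, float(x1)], 2.0 + 3.0 * x1) for x1 in range(6)]

weights = [0.0, 0.0]
for _ in range(2000):            # incrementally refine as examples stream in
    for x, y in examples:
        weights = lms_step(weights, x, y)

print([round(w, 2) for w in weights])  # close to [2.0, 3.0]
```

The learning rate eta and the number of passes are illustrative choices; the key point is that each update uses only one example, which makes the rule incremental and robust to noisy estimates.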
3. When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is only
approximately correct?
4. What is the best strategy for choosing a useful next training experience, and how
does the choice of this strategy alter the complexity of the learning problem?
5. What is the best way to reduce the learning task to one or more function
approximation problems? Put another way, what specific functions should the
system attempt to learn? Can this process itself be automated?
6. How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?