History, Schematic Diagram of ML System, Data Set, Types and Terminology of ML



AP
Machine Learning History
1950s :
• Samuel’s checker-playing program
1960s:
• Neural network: Rosenblatt’s perceptron
• Pattern Recognition
• Minsky and Papert prove limitations of Perceptron
1970s:
• Symbolic concept induction
• Expert Systems and knowledge acquisition bottleneck
• Quinlan’s ID3
• Natural Language Processing (symbolic)
Machine Learning History
… Cntd.
1980s :
• Advanced decision tree and rule learning
• Learning and planning and problem solving
• Resurgence of neural networks
• Valiant’s PAC learning theory
• Focus on experimental methodology

Machine Learning History
… Cntd.
1990s: Machine Learning and Statistics
• Support Vector Machines
• Adaptive agents and web applications
• Text Learning
• Reinforcement Learning
• Ensembles
• Bayes Net Learning

Applications Where ML Has
Come to the Forefront
• 1994: Self-driving car road test
• 1997: Deep Blue beats world champion Garry Kasparov in the
game of chess
• 2009: Google builds a self-driving car
• 2011: Watson wins the game of Jeopardy!
• 2014: Human vision surpassed by ML systems

Popularity of ML in Recent Time

• New Software/Algorithms
̶ Neural networks
̶ Deep Learning
• New Hardware
̶ GPUs
• Cloud Enabled Availability of Big Data

How ML Solutions Differ From
Programmable Solutions
[Diagram: Algorithm approach — Data + Program → Computer → Output.
Machine learning approach — Data + Examples of Data I/O → Computer → Program.]

Schematic Diagram of
ML System
[Diagram: Experiences/Data and Background Knowledge/Bias feed the Learner, which builds Models;
the Reasoner applies the Models to a Problem/Task and produces an Answer/Performance.]
Various Domains and
Application of ML
Medicine:
• Diagnose a disease
̶ Input: symptoms, lab measurements, test results, etc.
̶ Output: one of a set of possible diseases, or “None of the above”
• Data mining of historical medical records to learn which future
patients will respond best to which treatments.
Computer Vision:
̶ To find what objects appear in an image
̶ To convert hand-written digits to characters
̶ To detect where objects appear in an image
Various Domains and
Application of ML … Cntd.
Robot control:
• To design autonomous mobile robots that learn to navigate from
their own experience.
Natural language processing (NLP):
• To detect where entities are mentioned in NL
• To detect what facts are expressed in NL
• To detect if a product/movie review is positive, negative or neutral
Financial:
• To predict if a stock price will rise or fall in the next few milliseconds
• To predict if a user will click on an ad or not, in order to decide which
ad to show (targeted advertising)
Various Domains and
Application of ML … Cntd.
Business Intelligence
• Robustly forecasting product sales quantities taking
seasonality and trend into account
• Identifying cross selling promotional opportunities for
consumer goods
• Identify the price sensitivity of a consumer product and
identify the optimum price point that maximizes net profit
• Optimizing product location at a super market retail outlet
• Modelling variables impacting customer churn and refining
strategy
Various Domains and
Application of ML … Cntd.
Few Other Applications
• Fraud detection: credit card providers
• Determining whether or not someone will default on a home
mortgage
• Understanding consumer sentiment based on unstructured text data
• Forecasting women’s conviction rates based on external
macroeconomic factors

Data Set (Example)

Cell Nuclei

• Cell samples were taken from tumors in cancer patients before
surgery and imaged; the tumors were excised
• Patients were followed to determine whether or not the cancer
recurred, and how long until recurrence or how long they remained disease-free

Data Set … Cntd.
Thirty real-valued variables per tumor.
• Two variables that can be predicted:
– Outcome (R=recurrence, N=non-recurrence)
– Time (until recurrence, for R, time healthy, for N).

tumor size   texture   perimeter   ……   outcome   time
18.02        27.6      117.5       ……   N         31
17.99        10.38     122.8       ……   N         61
20.29        14.34     135.1       ……   R         27
……

Data Set Terminology
tumor size   texture   perimeter   ……   outcome   time
18.02        27.6      117.5       ……   N         31
17.99        10.38     122.8       ……   N         61
20.29        14.34     135.1       ……   R         27
……

• A training example i has the form: 〈xi,1, . . . , xi,n, yi〉,
where n is the number of attributes (30 in our case).
• We will use the notation xi to denote the column vector
with elements xi,1, . . . , xi,n.
• The training set D consists of m training examples.
• We denote the m×n matrix of attributes by X and the
size-m column vector of outputs from the data set by y.
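This notation can be sketched in NumPy, using the three example rows from the table above (the NumPy representation is an illustration, not part of the slides):

```python
import numpy as np

# The three example rows above form the m x n attribute matrix X
# (here m = 3, n = 3; the real data set has n = 30 attributes).
X = np.array([
    [18.02, 27.6, 117.5],   # x_1
    [17.99, 10.38, 122.8],  # x_2
    [20.29, 14.34, 135.1],  # x_3
])

# The size-m column vector of outputs y (the recurrence outcome;
# "time" would be a second output vector).
y = np.array(["N", "N", "R"])

m, n = X.shape
print(m, n)  # 3 3
```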
Steps for creating a Learner
Step 1. Choose the training experience/data - Features
Step 2. Choose the target function (that is to be learned)
Step 3. Choose how to represent the target function
– Class of functions (Hypothesis Language)
Step 4. Choose a learning algorithm to infer the target function
Rich representation – more difficult to learn
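The four steps can be sketched with a toy learner. Everything below is an illustrative assumption: made-up data, a linear hypothesis language, and NumPy's least-squares fit as the learning algorithm:

```python
import numpy as np

# Step 1: choose the training data (features x, targets y) -- made up.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Step 2: the target function to be learned maps x to y.
# Step 3: choose a representation (hypothesis language): f(x) = w*x + b.
# Step 4: choose a learning algorithm: least squares, via np.polyfit.
w, b = np.polyfit(x, y, deg=1)

def predict(x_new):
    return w * x_new + b
```

A richer hypothesis language (e.g. a higher-degree polynomial) could fit the data more closely but, as the slide notes, would be more difficult to learn reliably.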

Types of ML
Based on the information available:
1) Supervised learning
2) Unsupervised learning
3) Reinforcement learning
4) Semi-supervised learning

Based on the role of the learner:
1) Passive learning
2) Active learning

Supervised Learning
• Supervised learning, as the name indicates, involves the presence of a
supervisor acting as a teacher.
• Basically, supervised learning is learning in which we teach or train
the machine using data that is well labelled, meaning the data is
already tagged with the correct answer.
• After that, the machine is provided with a new set of
examples (data) so that the supervised learning algorithm analyses the
training data (set of training examples) and produces a correct
outcome from the labelled data.
• Generally, what happens in supervised learning:
̶ X, Y (pre-classified training examples)
̶ Given an observation X, what is the best label for Y?
Schematic of Supervised
Learning

[Diagram: labelled pairs (Input 1, Output 1) … (Input n, Output n) are fed to the Learning
Algorithm, which produces a Model; a new input X′ given to the Model yields output Y′.]
Example of Supervised
Learning
• For instance, suppose you are given a basket filled with different kinds
of fruits. Now the first step is to train the machine with all different fruits
one by one like this:

• If the shape of the object is rounded with a depression at the top, and its
colour is red, then it will be labelled as – Apple
• If the shape of the object is a long curving cylinder with a green-yellow
colour, then it will be labelled as – Banana

Example of Supervised
Learning …Cntd.
• Now suppose that, after training, the machine is given a new fruit
(say an apple) from the basket and asked to identify it.
• Since the machine has already learned from the previous data,
this time it has to use that knowledge wisely.
• It will first classify the fruit by its shape and colour, confirm
the fruit name as APPLE, and put it in the Apple category.
• Thus the machine learns from the training data (the basket
containing fruits) and then applies that knowledge to the test data (the new fruit).
Classification of Supervised
Learning
Supervised learning is classified into two categories of
algorithms:
• Classification: A classification problem is when the output
variable is a category, such as “red” or “blue”, or “disease” and
“no disease”.
• Regression: A regression problem is when the output
variable is a real value, such as “dollars” or “weight”.
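The two problem types can be sketched on three rows of the tumor data set from earlier in the deck; the nearest-neighbour rule here is an illustration, not a method from the slides:

```python
import numpy as np

# One feature (tumor perimeter) with two possible targets:
perimeter = np.array([117.5, 122.8, 135.1])
outcome = np.array(["N", "N", "R"])  # category   -> classification
time = np.array([31.0, 61.0, 27.0])  # real value -> regression

def classify(x_new):
    """Classification: predict the category of the nearest example."""
    return outcome[np.argmin(np.abs(perimeter - x_new))]

def regress(x_new):
    """Regression: predict the real value of the nearest example."""
    return time[np.argmin(np.abs(perimeter - x_new))]

print(classify(134.0))  # R
print(regress(118.0))   # 31.0
```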

Unsupervised Learning
• Unsupervised learning is the training of a machine using information
that is neither classified nor labelled, allowing the algorithm to
act on that information without guidance.
• Here the task of the machine is to group unsorted information
according to similarities, patterns and differences, without any prior
training on the data.
• Unlike supervised learning, no teacher is provided, which means no
training will be given to the machine. The machine is therefore restricted
to finding the hidden structure in unlabelled data by itself.
• Unsupervised learning:
̶ Input: X
̶ Output: given a set of X’s, clusters (or summaries of them)
Schematic of Unsupervised
Learning
[Diagram: unlabelled inputs Input 1 … Input n are fed to the Learning Algorithm,
which groups them into Clusters.]
Example of Unsupervised
Learning
• For instance, suppose the machine is given an image containing both dogs
and cats, none of which it has ever seen.
• The machine has no idea about the features of dogs and cats, so it
can’t categorize them as dogs and cats.
• But it can categorize them according to their similarities, patterns, and
differences, i.e., it can easily split the picture into two parts:
the first may contain all pics having dogs in them, and the second may
contain all pics having cats in them.
• Here the machine didn’t learn anything beforehand, meaning no training
data or examples are needed in such scenarios.
Classification of Unsupervised
Learning
Unsupervised learning is classified into two categories of algorithms:
• Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by
purchasing behaviour
• Association: An association rule learning problem is where you want
to discover rules that describe large portions of your data, such as
people that buy X also tend to buy Y
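A minimal clustering sketch: k-means with k = 2 on made-up one-dimensional purchase amounts. Both the data and the initial centroids are illustrative assumptions:

```python
import numpy as np

# Made-up 1-D purchase amounts with two natural groups.
data = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 10.1])
centroids = np.array([0.0, 5.0])  # initial guesses

for _ in range(10):
    # Assign each point to its nearest centroid...
    labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
    # ...then move each centroid to the mean of its assigned points.
    centroids = np.array([data[labels == k].mean() for k in range(2)])

print(labels)     # the two discovered groups
print(centroids)  # cluster centres near 1.0 and 9.5
```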

Reinforcement Learning
• Reinforcement learning is defined as a machine learning method
that is concerned with how software agents should take actions in
an environment
• Reinforcement learning helps you to maximize some portion of the
cumulative reward
• This learning method helps you to learn how to attain a complex
objective or maximize a specific dimension over many steps

Schematic of Reinforcement
Learning
[Diagram: the Agent, in state S_t with reward R_t, takes action A_t on the Environment;
the Environment returns next state S_{t+1} and reward R_{t+1} to the Agent.]
Example of Reinforcement
Learning
• Consider the scenario of teaching new tricks to your cat
• As the cat doesn’t understand English or any other human language, we
can’t tell her directly what to do. Instead, we follow a different strategy.
• We emulate a situation, and the cat tries to respond in many different
ways. If the cat’s response is the desired way, we will give her fish (or food).
• Now whenever the cat is exposed to the same situation, it
executes a similar action even more enthusiastically, in expectation of
getting more reward (food).
• That’s how the cat learns “what to do” from positive
experiences.
• At the same time, the cat also learns what not to do when faced with
negative experiences.
Semi-supervised Learning
• Semi-supervised learning is a combination of
supervised and unsupervised learning.
• It uses partially labelled training data,
usually a small portion of labelled and a larger
portion of unlabelled data.
• In a way, semi-supervised learning can be found
in humans as well. A large part of human
learning is semi-supervised.

Example of Semi-supervised
Learning
• A small amount of labelling of objects during childhood
leads to identifying a number of similar (not identical)
objects throughout a lifetime.
• Suppose a child comes across fifty different cars, but
its elders have only pointed to four and identified them
as a car.
• The child can still automatically label most of the
remaining 46 cars as a ‘car’ with considerable
accuracy.

Passive and Active Learning
Passive Learning
• Traditionally, learning algorithms have been passive
learners, which take a given batch of data and process it to
produce a hypothesis or model
• Data → Learner → Model

Active Learning
• Active learning is a special case of machine learning in which
a learning algorithm is able to interactively query the user
(or some other information source) to obtain the desired
outputs at new data points

Passive and Active Learning … Cntd.
Active Learning
• Active learners are instead allowed to query the
environment
– Ask questions
– Perform experiments
• There are situations in which unlabelled data is abundant
but manually labelling is expensive. In such a scenario,
learning algorithms can actively query the user/teacher for
labels
• This type of iterative supervised learning is called active
learning
• Open issues: how to query the environment optimally?
How to account for the cost of queries?
Terminology in ML
1. Accuracy: Percentage of correct predictions made by the model.
2. Attribute: A quality describing an observation (e.g. color, size,
weight). In Excel terms, attributes are column headers.
3. Bias metric: What is the average difference between your
predictions and the correct value for that observation?
4. Low bias: Could mean every prediction is correct. It could also
mean half of your predictions are above their actual values and half
are below, in equal proportion, resulting in a low average difference.
5. High bias (with low variance): Suggests your model may be
underfitting and you’re using the wrong architecture for the job.

Terminology in ML …Cntd.
6. Bias term: Allows models to represent patterns that do not
pass through the origin.
7. Categorical variables: Variables with a discrete set of possible
values. Can be ordinal (order matters) or nominal (order
doesn’t matter).
8. Classification: Predicting a categorical output.
✓ Binary classification predicts one of two possible outcomes
(e.g. is the email spam or not spam?)
✓ Multi-class classification predicts one of multiple possible
outcomes (e.g. is this a photo of a cat, dog, horse or human?)

Terminology in ML …Cntd.

9. Classification threshold: The lowest probability value at which
we’re comfortable asserting a positive classification.
✓ For example, if the predicted probability of
being diabetic is > 50%, return True, otherwise
return False.
10. Clustering: Unsupervised grouping of data into
buckets.
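A minimal sketch of applying a classification threshold; the predicted probabilities below are made up for illustration:

```python
# A positive label is asserted only when the predicted probability
# exceeds the classification threshold.
threshold = 0.5
predicted_probabilities = [0.91, 0.42, 0.50, 0.73]

predictions = [p > threshold for p in predicted_probabilities]
print(predictions)  # [True, False, False, True]
```

Note that a probability exactly at the threshold (0.50 here) is not asserted as positive, since the slide says "the lowest probability value at which we're comfortable" and the comparison used is strict.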

Terminology in ML …Cntd.
11. Confusion matrix: A table that describes the performance of a
classification model by grouping predictions into 4 categories.
• True positives: we correctly predicted they do have
diabetes
• True negatives: we correctly predicted they don’t have
diabetes
• False positives: we incorrectly predicted they do have
diabetes (Type I error)
• False negatives: we incorrectly predicted they don’t have
diabetes (Type II error)
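The four confusion-matrix cells can be counted directly; the actual/predicted labels below are made up for illustration (1 = diabetic, 0 = not):

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # Type I error
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # Type II error

print(tp, tn, fp, fn)  # 3 3 1 1
```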
Terminology in ML …Cntd.
12.Convergence: A state reached during the training of a
model when the loss changes very little between each
iteration.
13.Deduction: A top-down approach to answering
questions or solving problems.
• A logic technique that starts with a theory and tests
that theory with observations to derive a conclusion.
• E.g. We suspect X, but we need to test our hypothesis
before coming to any conclusions.

Terminology in ML …Cntd.
14. Dimension: In machine learning and data science, “dimension”
differs from its meaning in physics: here the dimension of the data
means how many features there are in your data set.
• E.g. in an object-detection application, the flattened image size and
colour channels (e.g. 28×28×3) form the features of the input set.
• In house-price prediction, if house size is the only feature, the data
set is one-dimensional.
15. Epoch: An epoch describes the number of times the
algorithm sees the entire data set.
Terminology in ML …Cntd.

16. Extrapolation: Making predictions outside the range of a dataset.
• E.g. my dog barks, so all dogs must bark.
• In machine learning we often run into trouble when
we extrapolate outside the range of our training data.

Terminology in ML …Cntd.
17. Feature: With respect to a dataset, a feature represents an
attribute and value combination.
• Color is an attribute. “Color is blue” is a feature.
• In Excel terms, features are similar to cells.
18. Feature selection: Feature selection is the process of selecting
relevant features from a data-set for creating a machine learning
model.
19. Feature vector: A list of features describing an observation with
multiple attributes.
• In Excel we call this a row.

Terminology in ML …Cntd.
20. Hyperparameters: Hyperparameters are higher-level properties
of a model, such as how fast it can learn (learning rate) or the
complexity of the model.
• The depth of trees in a decision tree or the number of hidden
layers in a neural network are examples of hyperparameters.
21. Induction: A bottom-up approach to answering questions or
solving problems. A logic technique that goes from observations to
theory.
• E.g. we keep observing X, so we infer that Y must be True.
22. Instance: A data point, row, or sample in a dataset. Another term
for observation.
Terminology in ML …Cntd.
23.Label: The “answer” portion of an observation in supervised
learning.
• For example, in a dataset used to classify flowers into different
species, the features might include the petal length and petal
width, while the label would be the flower’s species.
24. Null accuracy: Baseline accuracy that can be achieved by always
predicting the most frequent class (“B has the highest frequency,
so let’s guess B every time”).
25. Outlier: An observation that deviates significantly from other
observations in the dataset.
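Null accuracy (item 24 above) can be computed in a few lines; the label list is made up for illustration:

```python
from collections import Counter

# Always guessing the most frequent class sets the baseline accuracy.
labels = ["B", "A", "B", "B", "C", "B", "A", "B"]

most_common_class, count = Counter(labels).most_common(1)[0]
null_accuracy = count / len(labels)

print(most_common_class)  # B
print(null_accuracy)      # 5/8 = 0.625
```

Any model worth deploying should beat this baseline.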

Terminology in ML …Cntd.

26. Overfitting: Overfitting occurs when your model learns
the training data too well and incorporates details and
noise specific to your dataset. You can tell a model is
overfitting when it performs great on your
training/validation set, but poorly on your test set (or
new real-world data).

Terminology in ML …Cntd.
27.Precision: In the context of binary
classification (Yes/No), precision measures
the model’s performance at classifying
positive observations (i.e. “Yes”). In other
words, when a positive value is predicted,
how often is the prediction correct?

Terminology in ML …Cntd.
28.Recall (Or True Positive Rate (TPR)): Also called
sensitivity. In the context of binary classification (Yes/No),
recall measures how “sensitive” the classifier is at
detecting positive instances. In other words, for all the
true observations in our sample, how many did we
“catch.”

Terminology in ML …Cntd.
Precision vs Recall
• Analyzing brain scans and trying to predict whether a person has a tumor
(True) or not (False).
• Precision is the % of True guesses that were actually correct. If we guess 1
image is True out of 100 images and that image is actually True, then our
precision is 100%! Our results aren’t helpful, however, because we missed
9 of the 10 brain tumors (10 images actually contain a brain tumor).
• Recall, or sensitivity, provides another lens with which to view how good
our model is. Again, let’s say there are 100 images, 10 with brain tumors, and
we correctly guessed 1 had a brain tumor. Precision is 100%, but recall is
10%.
• Perfect recall requires that we catch all 10 tumors!
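The brain-scan numbers above work out as follows (100 images, 10 actual tumors, the model flags exactly 1 image, which really is a tumor):

```python
tp = 1  # tumor images correctly flagged
fp = 0  # healthy images wrongly flagged
fn = 9  # tumor images missed

precision = tp / (tp + fp)  # of the images we flagged, how many were tumors?
recall = tp / (tp + fn)     # of the 10 tumors, how many did we catch?

print(precision)  # 1.0 -> 100%
print(recall)     # 0.1 -> 10%
```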
Terminology in ML …Cntd.

29. Specificity: In the context of binary classification
(Yes/No), specificity measures the model’s performance at
classifying negative observations (i.e. “No”). In other
words, when the correct label is negative, how often is
the prediction correct?

Terminology in ML …Cntd.

30. Underfitting: Underfitting occurs when your model
over-generalizes and fails to incorporate relevant
variations in your data that would give your model more
predictive power.
• You can tell a model is underfitting when it performs
poorly on both training and test sets.

Terminology in ML …Cntd.
31. Variance: How tightly packed are your predictions
for a particular observation relative to each other?
• Low variance suggests your model is internally
consistent, with predictions varying little from each
other after every iteration.
• High variance (with low bias) suggests your model
may be overfitting and reading too deeply into the noise
found in every training set.

