UCS551 Chapter 5 - Machine Learning (Intro)

CHAPTER 5: MACHINE LEARNING CONCEPT AND TECHNIQUES
DR AZLIN AHMAD
CONTENT
• What is Machine Learning?
• Concept of Learning
• Supervised and Unsupervised Learning
• Concept of Training and Testing
• Cross Validation and Split Ratio
WHAT IS MACHINE LEARNING?
• Artificial Intelligence (AI) is an area of Computer Science that emphasizes the creation and development of intelligent machines that can think, work and react like humans.
• Machine Learning gives "computers the ability to learn without being explicitly programmed." (Samuel, A., 1959)
• Machine Learning is a subset of Artificial Intelligence that deals with the extraction of patterns from data sets.
• Because of new computing technologies, machine learning today is not like machine learning of the past.
• It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see if computers could learn from data.
Examples of machine learning in use:
• Self-driving Google car
• Online recommendation offers such as those from Amazon and Netflix
• Fraud detection
• Knowing what customers are saying about you on Twitter

THE CRITERIA NEEDED WHILE CREATING A GOOD MACHINE LEARNING SYSTEM
• Data preparation capabilities.
• Algorithms: basic and advanced.
• Automation and iterative processes.
• Scalability.
• Ensemble modeling.

Terminology:
• In machine learning, a target is called a label.
• In statistics, a target is called a dependent variable.
• A variable in statistics is called a feature in machine learning.
• A transformation in statistics is called feature creation in machine learning.

Target = label = class label
Variable = feature = attribute = criterion = characteristic (a column in your dataset)
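
The terminology above can be illustrated with a small sketch. The fruit rows, the column names and the derived `is_round` column are made up for illustration; they are not part of the course material.

```python
# Hypothetical toy dataset: each row is one fruit, each key is a column.
rows = [
    {"shape": "round", "color": "red", "diameter_cm": 7.5, "fruit": "apple"},
    {"shape": "long", "color": "yellow", "diameter_cm": 3.5, "fruit": "banana"},
    {"shape": "round", "color": "green", "diameter_cm": 8.0, "fruit": "apple"},
]

TARGET = "fruit"  # the target column, i.e. the label / class label

# Features (variables, attributes): every column except the target.
feature_names = [name for name in rows[0] if name != TARGET]
X = [[row[name] for name in feature_names] for row in rows]

# Labels: the target value of each row.
y = [row[TARGET] for row in rows]

# Feature creation (a "transformation" in statistics): derive a new
# numeric column from an existing one.
is_round = [1 if row["shape"] == "round" else 0 for row in rows]

print(feature_names)
print(X)
print(y)
print(is_round)
```

Here `X` holds the features and `y` holds the labels, which is the naming most machine learning libraries follow.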
APPLICATIONS OF MACHINE LEARNING
• Financial services
  - Banks and other businesses in the financial industry use machine learning technology for two key purposes: to identify important insights in data, and to prevent fraud. The insights can identify investment opportunities, or help investors know when to trade. Data mining can also identify clients with high-risk profiles, or use cybersurveillance to pinpoint warning signs of fraud.
• Government
  - Government agencies such as public safety and utilities have a particular need for machine learning since they have multiple sources of data that can be mined for insights. Analyzing sensor data, for example, identifies ways to increase efficiency and save money. Machine learning can also help detect fraud and minimize identity theft.
• Health care
  - Machine learning is a fast-growing trend in the health care industry, thanks to the advent of wearable devices and sensors that can use data to assess a patient's health in real time. The technology can also help medical experts analyze data to identify trends or red flags that may lead to improved diagnoses and treatment.
• Retail
  - Websites recommending items you might like based on previous purchases are using machine learning to analyze your buying history. Retailers rely on machine learning to capture data, analyze it and use it to personalize the shopping experience, run marketing campaigns, optimize prices, plan merchandise supply, and gain customer insights.
• Oil and gas
  - Finding new energy sources. Analyzing minerals in the ground. Predicting refinery sensor failure. Streamlining oil distribution to make it more efficient and cost-effective. The number of machine learning use cases for this industry is vast, and still expanding.
• Transportation
  - Analyzing data to identify patterns and trends is key to the transportation industry, which relies on making routes more efficient and predicting potential problems to increase profitability. The data analysis and modeling aspects of machine learning are important tools for delivery companies, public transportation and other transportation organizations.
LEARNING IN MACHINE LEARNING
WHAT IS LEARNING?
Learning is one of the fundamental building blocks of AI solutions.
Learning is a process that improves the knowledge of an AI program by making observations about its environment.
The AI learning process focuses on processing a collection of input-output pairs for a specific function and predicting the output for new inputs.
[Figure: INPUT (features of apples: shape, color, size, diameter) -> PROCESS (machine learning) -> RESULTS/OUTPUT ("It's an apple" / "It's a banana")]
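
As a minimal sketch of learning from input-output pairs, the example below stores a few labelled fruits and predicts the output for a new input by copying the label of the closest stored example (a 1-nearest-neighbour rule). The measurements and the choice of nearest neighbour are assumptions made for illustration; the slides do not prescribe a particular algorithm.

```python
import math

# Hypothetical input-output pairs: (diameter_cm, length_cm) -> fruit name.
training_pairs = [
    ((7.5, 7.0), "apple"),
    ((8.0, 7.5), "apple"),
    ((3.5, 18.0), "banana"),
    ((3.0, 20.0), "banana"),
]


def predict(features):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        training_pairs,
        key=lambda pair: math.dist(pair[0], features),  # Euclidean distance
    )
    return nearest_label


print("It's an", predict((7.8, 7.2)))  # a round fruit, close to the apples
print("It's a", predict((3.2, 19.0)))  # a long fruit, close to the bananas
```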
TYPES OF LEARNING
Supervised
• Training data includes desired outputs.
• E.g. classification: "this is spam, this is not". The learning is supervised.
Unsupervised
• Training data does not include desired outputs.
• E.g. clustering. It is hard to tell what is good learning and what is not.
SUPERVISED LEARNING
(LEARNING WITH TEACHER SIGNALS OR TARGETS)
Definition: The process of adjusting the weights in a neural net using a learning algorithm; the desired output for each of a set of training input vectors is presented to the net. Many iterations through the training data may be required.
Designed to perform pattern classification: to classify an input pattern as either belonging or not belonging to a given category.
[Figure: Iris Data Set (UCI repository)]
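
The definition above can be sketched with the simplest possible neural net, a single perceptron. This is an illustrative stand-in, not necessarily the network used in the course: the logical-AND training data, the learning rate and the function names are all assumptions.

```python
def activation(weights, bias, x):
    """Weighted sum of the inputs plus the bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(weights, bias, x):
    """1 = belongs to the category, 0 = does not belong."""
    return 1 if activation(weights, bias, x) >= 0 else 0


def train_perceptron(samples, targets, learning_rate=1.0, max_epochs=100):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):  # many passes over the data may be needed
        errors = 0
        for x, target in zip(samples, targets):
            # The desired output (teacher signal) is compared with the net's output.
            delta = target - classify(weights, bias, x)
            if delta != 0:
                errors += 1
                # Adjust the weights in the direction that reduces the error.
                weights = [w + learning_rate * delta * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * delta
        if errors == 0:  # every training vector is classified correctly
            break
    return weights, bias


# Hypothetical training set: logical AND of two binary inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, targets)
print([classify(weights, bias, x) for x in samples])
```

The loop stops early once a full pass produces no errors, which only happens when the data are linearly separable, as they are here.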
UNSUPERVISED LEARNING
(LEARNING WITHOUT THE USE OF TEACHER SIGNALS)
- Produces the output based on input data without labelled responses.
- The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or groupings in data.
- The clusters are modelled using a measure of similarity, which is defined upon metrics such as Euclidean or probabilistic distance.
[Figure: clustering result at the 1st iteration (QE: 1.80, TE: 0.32) and at the 5000th iteration (QE: 0.48, TE: 0.1467)]
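
To make cluster analysis with a Euclidean similarity measure concrete, here is a minimal k-means sketch. k-means is used only as a familiar example of clustering; the figure above may come from a different method. The points, the naive initialisation (first k points) and the iteration count are assumptions.

```python
import math


def k_means(points, k, iterations=10):
    """Group unlabelled points into k clusters using Euclidean distance."""
    centroids = [points[i] for i in range(k)]  # naive start: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabelled data: two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the grouping emerges only from the distances between the points.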
SPLIT RATIO
DATA SET (1000 rows) = THE WHOLE DATASET (100%)
• TRAINING SET (80%): used to learn, i.e. to teach the model
• TESTING SET (20%): used to test the model
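
The 80/20 split of a 1000-row dataset can be sketched as follows. The function name `split_train_test`, the shuffling step and the fixed seed are assumptions for illustration; the rows are stand-in integers rather than real data.

```python
import random


def split_train_test(rows, train_ratio=0.8, seed=42):
    """Shuffle the rows, then cut them into a training and a testing part."""
    shuffled = list(rows)                  # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


dataset = list(range(1000))  # stand-in for 1000 rows of data
train, test = split_train_test(dataset)
print(len(train), len(test))  # 800 200
```

Shuffling before cutting avoids a split that simply follows the order in which the rows were collected.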
SPLIT RATIO
(TRAINING, VALIDATION (OPTIONAL) AND TESTING)
Training Dataset: The sample of data used to fit the model. This is the actual dataset that we use to train the model (weights and biases in the case of a neural network). The model sees and learns from this data.
Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration. The validation set is used to evaluate a given model, but this is for frequent evaluation. We as machine learning engineers use this data to fine-tune the model hyperparameters.
Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset. The test dataset provides the gold standard used to evaluate the model. It is only used once a model is completely trained (using the train and validation sets). The test set is generally what is used to evaluate competing models.
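
A minimal sketch of how the three datasets are used together: the validation set picks a hyperparameter, and the test set is touched only once at the end. The 60/20/20 ratio, the toy one-dimensional data and the k-nearest-neighbour model are assumptions chosen to keep the example short.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def accuracy(train, evaluation, k):
    """Fraction of evaluation rows that the model labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in evaluation)
    return hits / len(evaluation)


# Hypothetical data: one numeric feature, label 1 when the value is >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
random.Random(0).shuffle(data)

# Assumed 60/20/20 split into training, validation and test sets.
train, validation, test = data[:60], data[60:80], data[80:]

# Tune the hyperparameter k on the validation set only.
best_k = max([1, 3, 5], key=lambda k: accuracy(train, validation, k))

# Use the test set once, for the final evaluation of the chosen model.
test_accuracy = accuracy(train, test, best_k)
print(best_k, test_accuracy)
```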
CROSS VALIDATION
1000 ROWS OF DATA -> 5 FOLDS (EACH FOLD CONSISTS OF 200 ROWS)

              K=1       K=2       K=3       K=4       K=5
Iteration 1   TESTING   TRAINING  TRAINING  TRAINING  TRAINING
Iteration 2   TRAINING  TESTING   TRAINING  TRAINING  TRAINING
Iteration 3   TRAINING  TRAINING  TESTING   TRAINING  TRAINING
Iteration 4   TRAINING  TRAINING  TRAINING  TESTING   TRAINING
Iteration 5   TRAINING  TRAINING  TRAINING  TRAINING  TESTING

• Cross-validation is a technique to evaluate predictive models by partitioning the original sample into a training set to train the model, and a test set to evaluate it.
• In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data.
• The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimate.
• The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once.
• For classification problems, one typically uses stratified k-fold cross-validation, in which the folds are selected so that each fold contains roughly the same proportions of class labels.
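
The k-fold procedure described above can be sketched as an index generator. The function name `k_fold_splits` and the fixed seed are assumptions, and the sketch is plain (not stratified) k-fold; in practice a library routine would normally be used instead.

```python
import random


def k_fold_splits(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)       # random partition of the rows
    folds = [indices[i::k] for i in range(k)]  # k folds of (almost) equal size
    for i in range(k):
        test_idx = folds[i]                    # one fold is held out for testing
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]          # the other k-1 folds train the model
        yield train_idx, test_idx


splits = list(k_fold_splits(1000, k=5))
for round_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(round_number, len(train_idx), len(test_idx))  # 800 and 200 each round
```

A model would be trained and scored once per yielded pair, and the k scores averaged to give the single estimate mentioned above.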
LINKS:
• Movie clips: https://drive.google.com/drive/folders/14mGBMMcLtxvVv9KEZBO_BLhbT7QuoY1v?usp=sharing
REFERENCE
• https://www.sas.com/en_my/insights/analytics/machine-learning.html
• https://www.oreilly.com/ideas/machine-learning-a-quick-and-simple-definition
• https://www.openml.org/a/estimation-procedures/7
• https://machinelearningmastery.com/k-fold-cross-validation/
• https://www.quora.com/What-is-training-learning-and-testing-in-machine-learning
