6 - CSE3013 - Learning Systems


CSE3013 - "Artificial Intelligence"

Learning Systems

Dr. Pradeep K V
Assistant Professor (Sr.)
School of Computer Science and Engineering
VIT - Chennai

Dr. Pradeep K V CSE3013 - "Artificial Intelligence" 1/ 42


Contents...

Introduction to Machine Learning


Traditional Learning Vs Machine Learning
Types of Learning (Machine)
Features and Applications of ML
AI Versus ML
Types of Machine Learning



Introduction

Definition-1
Machine learning is a system of computer algorithms that learn from examples and
improve themselves without being explicitly coded by a programmer.

Definition-2
It is about teaching computers to learn from data so that they can make decisions,
make predictions, or identify patterns without being explicitly programmed to do so.

How Does Machine Learning Work?



Machine Learning vs. Traditional Programming



Machine Learning...!

Definition-3
Machine learning enables a machine to automatically learn from data, improve
performance from experiences, and predict things without being explicitly
programmed.

A machine learning system learns from historical data, builds prediction
models, and, whenever it receives new data, predicts the output for it.



Introduction to ML

Features of Machine Learning

Machine learning uses data to detect various patterns in a given dataset.
It can learn from past data and improve automatically.
It is a data-driven technology.
Machine learning is similar to data mining in that it also deals with
huge amounts of data.

Types of Machine Learning

Supervised Learning (Classification, Regression)
Unsupervised Learning (Clustering, Association)
Semi-supervised Learning (class of supervised learning)
Reinforcement Learning



Types of ML



Supervised Learning
In supervised learning, we provide the data along with the desired output (i.e.,
labelled data). For instance, if we want our system to learn cat detection, we
collect thousands of images, draw a bounding box around each cat, and feed the
entire labelled dataset to the machine so that it can learn the pattern from the examples.



Unsupervised Learning

Here, we provide data and let the machine find the patterns in the dataset.
For instance, we provide three different shapes (circles, triangles, and squares) and let
the machine group them. This technique is called clustering.



Semi-Supervised Learning

Semi-supervised learning is a class of supervised learning tasks and techniques
that also make use of unlabeled data for training. Here, the machine learns
from partially labelled data and applies what it has learned to the unlabeled data.

For instance, a photo-storage service groups all the photos of an
individual, so you only have to label one image and all the rest are labelled
with the same name because they show the same person.
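
The photo-storage example can be sketched roughly as follows. This is only an illustration: it assumes the photos have already been grouped by person (the `group` field), and the names, ids, and the `propagate_labels` helper are hypothetical.

```python
# Each photo belongs to a group of similar faces; only some photos carry a name.
photos = [
    {"id": 1, "group": "A", "name": "Asha"},   # labelled by the user
    {"id": 2, "group": "A", "name": None},
    {"id": 3, "group": "B", "name": None},
    {"id": 4, "group": "A", "name": None},
    {"id": 5, "group": "B", "name": "Ravi"},   # labelled by the user
    {"id": 6, "group": "C", "name": None},     # no labelled photo in this group
]

def propagate_labels(items):
    """Copy the known name of each group onto its unlabelled photos."""
    known = {p["group"]: p["name"] for p in items if p["name"] is not None}
    # Groups without any labelled photo keep the name None.
    return [{**p, "name": known.get(p["group"])} for p in items]

labelled = propagate_labels(photos)
names = [p["name"] for p in labelled]
print(names)
```

Groups that contain no labelled photo stay unlabelled, which is why a small amount of labelled data is still required.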



Reinforcement Learning

Here, the machine is commonly referred to as an agent, and the agent receives
a reward (or a penalty) based on each of its actions. It then learns which
actions maximize the rewards and minimize the penalties.
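
The reward-and-penalty loop can be sketched with a very small agent. This is a minimal illustration, not a full reinforcement learning algorithm: the two-action environment, the epsilon-greedy exploration rule, and the name `train_agent` are all assumptions made for the example.

```python
import random

def train_agent(rewards, steps=200, epsilon=0.1, seed=0):
    """Estimate the value of each action by trial and error."""
    rng = random.Random(seed)             # seeded so the run is repeatable
    estimates = [0.0] * len(rewards)      # current value estimate per action
    counts = [0] * len(rewards)           # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))    # explore a random action
        else:
            # exploit the action with the highest estimate so far
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]          # the environment answers with a reward or penalty
        counts[action] += 1
        # running average of the rewards observed for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Hypothetical environment: action 0 earns a penalty (-1), action 1 a reward (+1).
estimates = train_agent(rewards=[-1.0, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action)
```

The agent is never told which action is correct; it only sees the reward that follows each choice.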



Overfitting and Underfitting

After training, the goal of our model is to
generalize to unseen data as accurately as possible.
If the model yields very accurate results on the training data but fails to
generalize to unseen data, this is called over-fitting, because the model
over-fits the training data.
If the model does not predict accurately even on the training data, the
model has not learned the underlying pattern, which is known as under-fitting.
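
The difference can be seen by comparing accuracy on training data with accuracy on unseen data. The sketch below uses three hand-written toy models (a memorizer, a constant predictor, and a threshold rule); the data and the models are assumptions chosen only to make the contrast visible.

```python
def accuracy(model, data):
    """Fraction of (x, y) pairs for which model(x) equals y."""
    return sum(model(x) == y for x, y in data) / len(data)

# Toy data: the true label is 1 when x >= 5, otherwise 0.
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
test = [(0, 0), (4, 0), (5, 1), (9, 1)]

memory = dict(train)

def memorizer(x):
    # Over-fits: recalls training points exactly, answers 0 for anything unseen.
    return memory.get(x, 0)

def constant(x):
    # Under-fits: ignores the input completely.
    return 1

def threshold(x):
    # Captures the underlying pattern.
    return 1 if x >= 5 else 0

models = {"memorizer": memorizer, "constant": constant, "threshold": threshold}
results = {name: (accuracy(m, train), accuracy(m, test)) for name, m in models.items()}
for name, (train_acc, test_acc) in results.items():
    print(name, train_acc, test_acc)
```

High training accuracy with low test accuracy points to over-fitting; low accuracy on both points to under-fitting.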

Challenges Encountered in Machine Learning

Insufficient data
Poor-quality data: reduce noise and discard outliers (points that differ markedly from
the other members of the same group)
Irrelevant features



Applications of ML I

Augmentation:
Machine learning that assists humans with their day-to-day tasks,
personally or commercially, without having complete control of the output.
It is used in virtual assistants, data analysis, and software solutions.
Its primary use is to reduce errors due
to human bias.

Automation:
Machine learning that works entirely autonomously in a field without
the need for human intervention. For example, robots performing the
essential process steps in manufacturing plants.

Finance industry:
Machine learning is growing in popularity in the finance industry. Banks
mainly use ML to find patterns in their data and to prevent
fraud.



Applications of ML II

Government organizations:
Governments make use of ML to manage public safety and utilities.
China, for example, uses face recognition on a massive scale, and its
government uses artificial intelligence to deter jaywalking.

Healthcare industry:
Healthcare was one of the first industries to use machine learning, with
image detection.

Marketing:
AI is used broadly in marketing thanks to abundant access to data.
Before the age of mass data, researchers developed advanced mathematical
tools such as Bayesian analysis to estimate the value of a customer. With the
boom in data, marketing departments rely on AI to optimize customer
relationships and marketing campaigns.



Applications of Machine Learning



History of Machine Learning



Machine Learning Life Cycle I



Machine Learning Life Cycle II

1 Gathering Data: the first step, in which we identify the data sources and
obtain the data that the problem requires. The quantity and quality of the
collected data determine the efficiency of the output. In general, the more
data we have, the more accurate the prediction tends to be.
Identify the various data sources (structured/unstructured;
files, databases, the internet)
Collect the data
Integrate the data obtained from different sources into a coherent set of data
(the dataset)

2 Data Preparation: a step where we put our data into a suitable place
and prepare it for use in machine learning training.
Data exploration: used to understand the nature of the data that we have
to work with. We need to understand the characteristics, format, and
quality of the data. A better understanding of the data leads to a more effective
outcome. In this step, we look for correlations, general trends, and outliers.
Data pre-processing: the next step is to preprocess the data for
analysis.



Machine Learning Life Cycle III

3 Data Wrangling: the process of cleaning the data, selecting the
variables to use, and transforming the data into a proper format to make it
more suitable for analysis. Cleaning is needed because quality issues in the data
can negatively affect the quality of the outcome.
Collected data may have various issues, including:
Missing values
Duplicate data
Invalid data
Noise
We can use various filtering techniques to clean the data.
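
A filtering pass over the first three issues might look like the sketch below. The records, the valid age range of 0 to 120, and the `clean` helper are hypothetical; noise handling is left out because it depends on the data.

```python
raw_rows = [
    {"age": "34", "city": "Chennai"},
    {"age": "", "city": "Vellore"},      # missing value
    {"age": "34", "city": "Chennai"},    # duplicate of the first row
    {"age": "-5", "city": "Chennai"},    # invalid value
    {"age": "51", "city": "Vellore"},
]

def clean(rows):
    """Drop rows with missing, invalid, or duplicated values."""
    seen = set()
    cleaned = []
    for row in rows:
        age_text = row["age"].strip()
        if not age_text:
            continue                      # missing value
        age = int(age_text)
        if not 0 <= age <= 120:
            continue                      # outside the assumed valid range
        key = (age, row["city"])
        if key in seen:
            continue                      # duplicate record
        seen.add(key)
        cleaned.append({"age": age, "city": row["city"]})
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Dropping rows is only one option; missing values are also commonly filled in, depending on how much data can be spared.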

4 Analysis of Data: build an ML model to analyze the data using various
analytical techniques and review the outcome. It involves:
Selection of analytical techniques (classification, regression, cluster
analysis, association, ...)
Building models
Reviewing the result



Machine Learning Life Cycle IV

5 Train Model: using the dataset, train the model to improve its
performance and obtain a better outcome for the problem. Training a model is
required so that it can learn the various patterns, rules, and
features.

6 Test Model: check the accuracy of the trained model by providing
a test dataset to it. Testing the model determines its percentage accuracy
against the requirements of the project or problem.

7 Deployment: the last step of the machine learning life cycle is deployment,
where we deploy the model in a real-world system.
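
The train and test steps can be sketched with a deliberately simple model. The example assumes a one-feature dataset and uses a nearest-centroid rule (each class is represented by the mean of its training values); the data and function names are illustrative only.

```python
def train(samples):
    """Return the mean feature value of each class (a nearest-centroid model)."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Pick the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def accuracy_percent(model, samples):
    """Percentage of test samples that the model labels correctly."""
    correct = sum(predict(model, x) == label for x, label in samples)
    return 100.0 * correct / len(samples)

train_set = [(1.0, "small"), (2.0, "small"), (3.0, "small"),
             (7.0, "large"), (8.0, "large"), (9.0, "large")]
test_set = [(1.5, "small"), (4.0, "small"), (6.5, "large"), (5.5, "small")]

model = train(train_set)                         # step 5: train
test_accuracy = accuracy_percent(model, test_set)  # step 6: test
print(model, test_accuracy)
```

Whether 75% is good enough depends on the requirements of the project, which is the comparison the test step calls for.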



AI versus ML

Artificial Intelligence | Machine Learning
AI is a technology that enables a machine to simulate human behavior. | ML is a subset of AI that allows a machine to learn automatically from past data without being explicitly programmed.
The goal is to make a smart system that, like humans, can solve complex problems. | The goal is to allow machines to learn from data so that they can give accurate output.
In AI, we build intelligent systems to perform any task like a human. | In ML, we teach machines with data to perform a particular task and give an accurate result.
ML and DL are the two main subsets of AI. | Deep learning is a main subset of ML.
AI has a very wide scope. | Machine learning has a limited scope.
AI works to create an intelligent system that can perform various complex tasks. | Machine learning works to create machines that can perform only those specific tasks for which they are trained.
An AI system is concerned with maximizing the chances of success. | Machine learning is mainly concerned with accuracy and patterns.
The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc. | The main applications of ML are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
AI can be divided into three types: Weak AI, General AI, and Strong AI. | ML can be divided into supervised learning, unsupervised learning, and reinforcement learning.
It includes learning, reasoning, and self-correction. | It includes learning and self-correction when introduced to new data.
AI deals with structured, semi-structured, and unstructured data. | ML deals with structured and semi-structured data.

Data Sets I

What is a Dataset?
A dataset is a collection of data arranged in some order.
A dataset can hold anything from a simple array to a database
table.

A tabular dataset can be understood as a database table or matrix, where
each column corresponds to a particular variable and each row
corresponds to one record of the dataset.
The most widely supported file type for a tabular dataset is the comma-separated
values (CSV) file.
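
As a small illustration of the column/row view, the sketch below parses CSV text with Python's standard `csv` module. The column names and values are made up for the example, and the text is read from a string rather than a file so that the snippet is self-contained.

```python
import csv
import io

# Hypothetical tabular dataset: one header line, then one record per line.
csv_text = """area_sqft,bedrooms,price
850,2,120000
1200,3,185000
1500,3,210000
"""

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)              # each row is one record, keyed by column name
columns = reader.fieldnames      # the variables (columns) of the dataset

# Values are read as strings, so convert a column before doing arithmetic on it.
prices = [int(row["price"]) for row in rows]
print(columns, len(rows), prices)
```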



Data Sets II

Types of data in datasets


Numerical data: Such as house price, temperature, etc.
Categorical data: Such as Yes/No, True/False, Blue/green, etc.
Ordinal data: similar to categorical data, but the categories have a meaningful
order, so values can be compared (for example, low < medium < high).
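
The three data types are usually turned into numbers differently before training. The following sketch shows one common convention, assuming hypothetical records with a numerical `price`, a categorical `color`, and an ordinal `size`: numbers are kept as they are, categories become one-hot indicators, and ordered categories become ranks.

```python
records = [
    {"price": 120000.0, "color": "Blue", "size": "small"},
    {"price": 185000.0, "color": "Green", "size": "large"},
    {"price": 150000.0, "color": "Blue", "size": "medium"},
]

# Ordinal data: the order small < medium < large is preserved by the ranks.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}

# Categorical data: no order, so each category gets its own 0/1 column.
colors = sorted({r["color"] for r in records})

def encode(record):
    one_hot = [1 if record["color"] == c else 0 for c in colors]
    return [record["price"], *one_hot, SIZE_ORDER[record["size"]]]

encoded = [encode(r) for r in records]
print(colors, encoded)
```
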
Note: A real-world dataset is often huge, which makes it difficult to manage and
process at the initial level. Therefore, to practice ML algorithms, we can use
a small dummy dataset.

Need for a Dataset: We need a lot of data to work on ML projects because
ML/AI models can’t be trained without data. One of the most important
aspects of building an ML/AI project is gathering and preparing the dataset.



Data Sets III

During the development of an ML project, the developers rely completely on
the datasets. In building ML applications, datasets are divided into two parts:
Training dataset
Test dataset
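
A minimal way to divide a dataset into these two parts is to shuffle it and cut it at a chosen fraction. The helper name `train_test_split`, the 25% test fraction, and the seed below are assumptions for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rows = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)      # seeded shuffle for repeatable splits
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

dataset = list(range(20))                  # stand-in for 20 records
train_rows, test_rows = train_test_split(dataset)
print(len(train_rows), len(test_rows))
```

Shuffling before the cut avoids a test set made up only of the last records collected.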

Note: These datasets are large, so downloading them requires
a fast internet connection.



Data Sets IV

Popular sources for Machine Learning datasets


Kaggle Datasets : https://www.kaggle.com/datasets.
UCI Machine Learning Repository :
https://archive.ics.uci.edu/ml/index.php.
Datasets via AWS : https://registry.opendata.aws/
Google’s Dataset Search Engine :
https://toolbox.google.com/datasetsearch
Microsoft Datasets : https://msropendata.com/
Awesome Public Dataset Collection :
https://github.com/awesomedata/awesome-public-datasets
Computer Vision Datasets : https://www.visualdata.io/
Scikit-learn dataset : https://scikit-learn.org/stable/datasets/index.html.
Government Datasets :
https://data.gov.in/
https://www.data.gov/
https://data.europa.eu/euodp/data/dataset

Types of Machine Learning


1. Supervised Learning

Supervised learning is a type of ML in which machines are trained using
well-labeled training data and then predict the output based on that data.

Labeled data means that the input data has already been tagged with the
appropriate output.

In supervised learning, the training data provided to the machines acts as a
supervisor, teaching the machines how to predict the output correctly. It
applies the same concept as a student learning under the supervision of
a teacher.

Supervised learning is the process of providing correct input and output data to
a machine learning model. The goal is to find a mapping function that
maps the input variable (X) to the output variable (Y).

In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.



How Does Supervised Learning Work?

Models are trained using labelled datasets, where the model learns about each
type of data. After the training process is completed, the model is tested on
test data (a held-out portion of the labelled data that was not used for
training) and predicts the output.

The following example illustrates how supervised learning works:

Assume we have a dataset with various shapes such as squares, rectangles,
triangles, and polygons. The first step is to train the model on each shape:
If the given shape has four sides, and all the sides are equal, then it is
labelled as a square.
If the given shape has three sides, then it is labelled as a triangle.
If the given shape has six equal sides, then it is labelled as a hexagon.
After training, we use the test dataset to put our model to the test, and the
model’s task is to identify the shape.

The machine has already been trained on all types of shapes, so when it
encounters a new one, it classifies it based on the number of sides and predicts the
output.
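
The shape example can be sketched as follows. Each shape is described by two assumed features, the number of sides and whether all sides are equal, and the "model" is simply a majority vote over the labels seen for each feature combination. A real classifier would generalize beyond exact matches; this lookup is only meant to show the train-then-predict flow.

```python
from collections import Counter, defaultdict

# Labelled training data: ((number_of_sides, all_sides_equal), label)
training_data = [
    ((4, True), "square"),
    ((4, True), "square"),
    ((3, False), "triangle"),
    ((3, True), "triangle"),
    ((6, True), "hexagon"),
]

def fit(examples):
    """Learn the most frequent label for every feature combination."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0] for features, counts in votes.items()}

def predict(model, features):
    """Return the learned label, or 'unknown' for a combination never seen in training."""
    return model.get(features, "unknown")

model = fit(training_data)
test_shapes = [(4, True), (3, False), (6, True), (5, True)]
predictions = [predict(model, shape) for shape in test_shapes]
print(predictions)
```

The last test shape has five sides, which never appeared in training, so the sketch reports it as unknown instead of guessing.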


Steps Involved in Supervised Learning

First, determine the type of training dataset.
Collect the labelled training data.
Split the labelled dataset into a training dataset, a test dataset, and a
validation dataset.
Determine the input features of the training dataset, which should carry
enough information for the model to predict the output accurately.
Choose a suitable algorithm for the model, such as a support vector
machine, decision tree, etc.
Execute the algorithm on the training dataset. Sometimes we also need a
validation set, held out from the training data, to tune the control
parameters.
Evaluate the accuracy of the model by providing the test set. If the model
predicts the correct outputs, the model is accurate.
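
The role of the validation set in these steps can be sketched as below. The example assumes a one-feature dataset that has already been split by hand, uses a small k-nearest-neighbors classifier as the algorithm, and treats `k` as the control parameter; all data values are made up, including one deliberately mislabelled training point.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# The point (3, "b") is a mislabelled (noisy) training example.
train_set = [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (7, "b"), (8, "b"), (9, "b")]
validation_set = [(2.9, "a"), (3.2, "a"), (8.5, "b")]
test_set = [(1.5, "a"), (6.5, "b")]

# Tune the control parameter on the validation set only.
candidates = [1, 3]
best_k = max(candidates, key=lambda k: accuracy(train_set, validation_set, k))

# Report the final accuracy on the untouched test set.
test_accuracy = accuracy(train_set, test_set, best_k)
print(best_k, test_accuracy)
```

With `k = 1` the noisy point misleads the model on the validation set, so the larger `k` is selected; the test set is used only once, for the final evaluation.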



Advantages and Disadvantages of Supervised Learning

Advantages of Supervised learning

With the help of supervised learning, the model can predict the output on
the basis of prior experiences.
In supervised learning, we can have an exact idea about the classes of
objects.
Supervised learning model helps us to solve various real-world problems
such as fraud detection, spam filtering, etc.

Disadvantages of Supervised learning

Supervised learning models are not well suited to very complex tasks.
Supervised learning may fail to predict the correct output if the test data
differs substantially from the training data.
Training requires a lot of computation time.
In supervised learning, we need enough knowledge about the classes of
objects.



Types of Supervised Learning

1 Classification: used when the output variable is categorical, which
means it takes one of a fixed set of classes, such as Yes/No, Male/Female, True/False,
etc.
Random Forest
Decision Trees
Logistic Regression
Support vector Machines
2 Regression: used when there is a relationship between the input variable
and the output variable. It is used for the prediction of continuous
variables, such as weather forecasting, market trends, etc.
Linear Regression
Regression Trees
Non-Linear Regression
Bayesian Linear Regression
Polynomial Regression
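
As an illustration of regression, the simplest entry in the list above, linear regression with one input variable, can be fitted with the ordinary least-squares formulas. The data points are invented and lie exactly on a line, so the fitted values are easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]           # generated from y = 2x + 1

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict a continuous value for a new input
print(slope, intercept, prediction)
```

Unlike classification, the output here is a number on a continuous scale, not a class label.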



2. Unsupervised Learning

As we know, supervised ML is a type of learning in which models are trained using
labelled data, with the training data acting as the supervisor.

However, there may be many cases where we do not have labelled data and
must find hidden patterns in the given dataset. Unsupervised learning
techniques are required to solve such cases.

It is an ML technique in which models are not supervised using a training dataset.
Instead, the models themselves find the hidden patterns and insights in the given data. It
can be compared to the learning that takes place in the human brain while
learning new things.

It is a type of ML in which models are trained using an unlabeled dataset and are
allowed to act on that data without any supervision.

The goal of unsupervised learning is to find the underlying structure of a
dataset, group the data according to similarities, and represent the
dataset in a compressed format.



Example

Suppose the algorithm is given a dataset containing images of various types of
cats and dogs. The algorithm is never trained on labels for this dataset, so it
has no prior idea about the dataset’s characteristics.

Its task is to identify the image features on its own. It
does so by clustering the image dataset into groups according to the
similarities between images.

Why use Unsupervised Learning?

It is helpful for finding useful insights in the data.
It is similar to how a human learns to think from their own experiences,
which makes it closer to real AI.
It works on unlabeled and uncategorized data, which makes unsupervised
learning more important.
In the real world, we do not always have input data with the corresponding
output, so to solve such cases, we need unsupervised learning.



Working of Unsupervised Learning

Here, the input data is unlabeled, i.e., it is not categorized and the corresponding
outputs are not given.

This unlabeled data is fed to the ML model in order to train it. First, the model
interprets the raw data to find the hidden patterns in it, and then
a suitable algorithm is applied, such as k-means clustering or hierarchical clustering.

Once applied, the algorithm divides the data objects
into groups according to the similarities and differences between the objects.
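
The grouping step can be sketched with a bare-bones k-means on one-dimensional data. The points, the two starting centroids, and the fixed number of iterations are assumptions chosen for the example; practical implementations choose starting centroids more carefully and stop when the assignments no longer change.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and updating centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]     # unlabeled input data
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

No labels are supplied at any point; the two groups emerge only from the distances between the points.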



Types of Unsupervised Learning

Clustering: a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have few or no
similarities with the objects of another group. It finds the commonalities
between the data objects and categorizes them according to the presence and
absence of those commonalities.
Association: used for finding the relationships between variables in
a large database. It determines the sets of items that occur together in
the dataset, which makes marketing strategy more effective. For example, people
who buy item X (say, bread) also tend to purchase item Y
(butter or jam). A typical example of association rule learning is Market
Basket Analysis.
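
The bread-and-butter rule can be quantified with the two standard measures of association rule learning: support (how often an itemset occurs) and confidence (how often the rule holds when its left-hand side occurs). The five transactions below are invented for illustration.

```python
transactions = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

pair_support = support({"bread", "butter"})          # 3 of 5 transactions
rule_confidence = confidence({"bread"}, {"butter"})  # 3 of the 4 bread transactions
print(pair_support, rule_confidence)
```

Algorithms such as Apriori search for all itemsets whose support exceeds a chosen threshold instead of checking one rule at a time.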



Unsupervised Learning Algorithms

K-means clustering
KNN (k-nearest neighbors; usually classed as a supervised method)
Hierarchical clustering
Anomaly detection
Neural Networks
Principal Component Analysis
Independent Component Analysis
Apriori algorithm
Singular value decomposition



Advantages and Disadvantages of Unsupervised Learning

Advantages of Unsupervised learning

Unsupervised learning is used for more complex tasks than
supervised learning because, in unsupervised learning, we don’t have
labeled input data.
Unsupervised learning is preferable when labels are scarce, as unlabeled data
is easier to get than labeled data.

Disadvantages of Unsupervised learning

Unsupervised learning is intrinsically more difficult than supervised learning,
as it does not have corresponding outputs.
The result of an unsupervised learning algorithm might be less accurate,
as the input data is not labeled and the algorithm does not know the exact output
in advance.



Supervised Versus Unsupervised

Supervised Learning | Unsupervised Learning
These algorithms are trained using labeled data. | They are trained using unlabeled data.
The model takes direct feedback to check whether it is predicting the correct output. | It does not take any feedback.
The model predicts the output. | It finds the hidden patterns in the data.
Input data is provided to the model along with the output. | Only input data is provided to the model.
The goal is to train the model so that it can predict the output when it is given new data. | The goal is to find the hidden patterns and useful insights in the unknown dataset.
It needs supervision to train the model. | It does not need any supervision to train the model.
It can be categorized into Classification and Regression problems. | It can be categorized into Clustering and Association problems.
It can be used in cases where we know the input and its corresponding output. | It can be used in cases where we have only input data and no corresponding output data.
It produces an accurate result. | It may give a less accurate result.
It is not close to true AI, as we first train the model on the data, and only then can it predict the correct output. | It is closer to true AI, as it learns in the way a child learns daily routine things from experience.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes various algorithms such as Clustering, KNN (usually classed as a supervised method), and the Apriori algorithm.
