
Supervised Machine Learning
By Dr. Raivrajsinh S. Vaghela
Outline
• Basics of Supervised Learning
• Prediction
• Classification
• Understanding Datasets
• Feature Selection
• Feature Normalization
• Data Cleaning
• Training, Testing & Validation Sets
Basics of Supervised Learning

• Supervised learning uses a training set to teach models to yield the desired output.
• This training dataset includes inputs and correct outputs, which allow the model to learn over time.
• The algorithm measures its accuracy through a loss function, adjusting until the error has been sufficiently minimized.
Supervised Learning
• In data mining, supervised learning can be separated into two types of problems: classification and regression.
Classification
• Classification uses an algorithm to accurately assign test data into specific
categories.
• It recognizes specific entities within the dataset and attempts to draw some
conclusions on how those entities should be labeled or defined.
• Common classification algorithms include:
• Linear classifiers
• Support vector machines (SVM)
• Decision trees
• k-nearest neighbors
• Random forests
Classification
• Classification is a type of supervised learning that categorizes input
data into predefined labels. It involves training a model on labeled
examples to learn patterns between input features and output
classes. In classification, the target variable is a categorical value. For
example, classifying emails as spam or not.

The model’s goal is to generalize this learning to make accurate
predictions on new, unseen data. Algorithms like Decision Trees,
Support Vector Machines, and Neural Networks are commonly used
for classification tasks.
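The spam example above can be sketched in a few lines. This is a toy illustration, not a real spam filter: the features (number of links, number of all-caps words) and the numbers are made up, and scikit-learn is assumed to be installed.

```python
# Toy spam classification: features are [num_links, num_caps_words],
# labels are 1 (spam) or 0 (not spam). Data is invented for illustration.
from sklearn.tree import DecisionTreeClassifier

X_train = [[8, 12], [6, 9], [7, 15], [0, 1], [1, 0], [0, 2]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)  # learn patterns from labeled examples

# Generalize to a new, unseen email.
print(clf.predict([[5, 10]]))  # many links and caps words: classed as spam
```

The same `fit`/`predict` pattern applies whether the model is a decision tree, an SVM, or a neural network.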
Regression
• Regression is used to understand the relationship between dependent
and independent variables.
• It is commonly used to make projections, such as for sales revenue
for a given business.
• Popular regression algorithms include:
• Linear regression
• Logistic regression
• Polynomial regression
Regression

• Regression is a supervised learning technique used to predict continuous numerical values based on input features. It aims to establish a functional relationship between independent variables and a dependent variable, such as predicting house prices based on features like size, bedrooms, and location.
• The goal is to minimize the difference between predicted and actual
values using algorithms like Linear Regression, Decision Trees, or
Neural Networks, ensuring the model captures underlying patterns in
the data.
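The house-price example above can be sketched as follows. The sizes, bedroom counts, and prices are invented for illustration, and scikit-learn is assumed to be installed.

```python
# Toy house-price regression: features are [size_sqft, bedrooms],
# the target is a continuous price. All numbers are made up.
from sklearn.linear_model import LinearRegression

X = [[1000, 2], [1500, 3], [2000, 3], [2500, 4], [3000, 4]]
y = [200_000, 280_000, 350_000, 430_000, 500_000]

reg = LinearRegression()
reg.fit(X, y)  # minimizes squared difference between predicted and actual

# Predict a continuous value for an unseen house.
print(reg.predict([[1800, 3]]))  # falls between the nearby training prices
```

Unlike classification, the output is a number on a continuous scale rather than a category.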
Data
• Data is the driving force of ML.
• Data comes in the form of words and numbers stored in tables, or as
the values of pixels and waveforms captured in images and audio files.
We store related data in datasets.
• For example, we might have a dataset of the following:
• Images of cats
• Housing prices
• Weather information
• Datasets are made up of individual examples that contain features and a
label.
• You could think of an example as analogous to a single row in a
spreadsheet.
• Features are the values that a supervised model uses to predict the label.
The label is the "answer," or the value we want the model to predict.
• In a weather model that predicts rainfall, the features could be latitude,
longitude, temperature, humidity, cloud coverage, wind direction, and
atmospheric pressure.
• The label would be rainfall amount.
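A single labeled example from the weather dataset described above might look like this in plain Python. The field names and values are illustrative, not from a real dataset.

```python
# One labeled example: the features are the model's inputs,
# the label ("rainfall_mm") is the answer it should learn to predict.
example = {
    "features": {
        "latitude": 41.88,
        "longitude": -87.63,
        "temperature_c": 18.5,
        "humidity_pct": 72,
        "cloud_coverage_pct": 60,
        "wind_direction_deg": 225,
        "pressure_hpa": 1012,
    },
    "label": {"rainfall_mm": 3.2},
}

print(example["label"]["rainfall_mm"])
```

A dataset is then just a collection of many such examples, analogous to rows in a spreadsheet.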
Dataset
• A dataset is characterized by its size and diversity. Size indicates the number of examples.
Diversity indicates the range those examples cover. Good datasets are both large and highly
diverse.

• Some datasets are both large and diverse. However, some datasets are large but have low
diversity, and some are small but highly diverse. In other words, a large dataset doesn’t
guarantee sufficient diversity, and a dataset that is highly diverse doesn't guarantee
sufficient examples.

• For instance, a dataset might contain 100 years' worth of data, but only for the month of July.
Using this dataset to predict rainfall in January would produce poor predictions. Conversely,
a dataset might cover only a few years but contain every month. This dataset might produce
poor predictions because it doesn't contain enough years to account for variability.
Characterized by Features
• A dataset can also be characterized by the number of its features. For
example, some weather datasets might contain hundreds of features,
ranging from satellite imagery to cloud coverage values.
• Other datasets might contain only three or four features, like
humidity, atmospheric pressure, and temperature.
• Datasets with more features can help a model discover additional
patterns and make better predictions.
• However, datasets with more features don't always produce models
that make better predictions because some features might have no
causal relationship to the label.
Understanding Datasets in the Context of Supervised Learning
Model Generation from Labeled Examples
• In supervised learning, a model is the complex collection of numbers
that define the mathematical relationship from specific input feature
patterns to specific output label values. The model discovers these
patterns through training.

• The model takes in a single labeled example and provides a prediction.
• The model compares its predicted value with the actual value and updates its solution.

[Figure: An ML model updating its predicted value.]
• The model repeats this process for each labeled example in the dataset.
• In this way, the model gradually learns the correct relationship
between the features and the label.
• This gradual understanding is also why large and diverse datasets
produce a better model.
• The model has seen more data with a wider range of values and has
refined its understanding of the relationship between the features
and the label.
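The predict-compare-update loop described above can be sketched in plain Python. This is a deliberately simplified stand-in for real training algorithms: one feature, a single learned weight, and hand-picked data where the true relationship is y = 2x.

```python
# Minimal training loop: predict, compare with the label, update.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (feature, label), y = 2x

w = 0.0    # the "model": a single number defining the feature-label relationship
lr = 0.05  # learning rate (chosen for illustration)

for epoch in range(200):
    for x, y in examples:
        pred = w * x         # 1. model makes a prediction for one example
        error = pred - y     # 2. compare predicted value with actual value
        w -= lr * error * x  # 3. update the solution to reduce the error

print(round(w, 2))  # converges near 2.0, the true relationship
```

Real models learn millions of such numbers at once, but the gradual refinement over many labeled examples works the same way.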
Evaluating

• We evaluate a trained model to determine how well it learned. When we evaluate a model, we use a labeled dataset, but we only give the model the dataset's features.
• We then compare the model's predictions to the labels' true values.
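The comparison step above can be sketched as a simple accuracy calculation; the predictions and held-out labels here are invented for illustration.

```python
# Evaluation: the model saw only the features of the test set;
# its predictions are now compared against the true labels.
def accuracy(predictions, true_labels):
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)

true_labels = [1, 0, 1, 1, 0]  # held-out answers the model never saw
predictions = [1, 0, 1, 0, 0]  # what the model returned for the features

print(accuracy(predictions, true_labels))  # 4 of 5 correct: 0.8
```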
Advantages of Supervised Learning

• The power of supervised learning lies in its ability to accurately predict patterns and make data-driven decisions across a variety of applications.
• Labeled training data benefits supervised learning by enabling models to accurately learn patterns and relationships between inputs and outputs.
• Supervised learning models can accurately predict and classify new data.
• Supervised learning has a wide range of applications, including classification, regression, and even more complex problems like image recognition and natural language processing.
• Well-established evaluation metrics, including accuracy, precision, recall, and F1-score, facilitate the assessment of supervised learning model performance.
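The metrics named above can be computed directly; a minimal sketch on hand-made predictions, assuming scikit-learn is installed:

```python
# Standard evaluation metrics for a binary classifier.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # true labels (toy data)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions (toy data)

# 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
print("accuracy: ", accuracy_score(y_true, y_pred))   # 6/8 = 0.75
print("precision:", precision_score(y_true, y_pred))  # 3/4 = 0.75
print("recall:   ", recall_score(y_true, y_pred))     # 3/4 = 0.75
print("f1:       ", f1_score(y_true, y_pred))         # 0.75
```

Precision asks "of everything predicted positive, how much was right?", recall asks "of everything actually positive, how much was found?", and F1 is their harmonic mean.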
Disadvantages of Supervised Learning

• Although supervised learning methods have benefits, their limitations require careful consideration during problem formulation, data collection, model selection, and evaluation.
• Overfitting: Models can overfit training data, which leads to poor
performance on new, unseen data due to the capture of noise.
• Feature Engineering: Extracting relevant features from raw data is crucial
for model performance, but this process can be time-consuming and may
require domain expertise.
• Bias in Models: Training data biases can lead to unfair predictions.
• Supervised learning heavily depends on labeled training data, which can be
costly, time-consuming, and may require domain expertise.
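The overfitting point above can be demonstrated on synthetic data. Here the labels are pure noise, so there is nothing real to learn; a fully grown decision tree still memorizes the training set perfectly but does no better than chance on unseen data. This assumes scikit-learn is installed.

```python
# Overfitting demo: perfect training accuracy, chance-level test accuracy.
import random
from sklearn.tree import DecisionTreeClassifier

random.seed(0)
X = [[random.random(), random.random()] for _ in range(400)]
y = [random.randint(0, 1) for _ in range(400)]  # labels are pure noise
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

tree = DecisionTreeClassifier(random_state=0)  # no depth limit: memorizes
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # 1.0: the noise is captured exactly
test_acc = tree.score(X_test, y_test)     # near 0.5: noise doesn't generalize
print(train_acc, test_acc)
```

Limiting model capacity (e.g. `max_depth`) or holding out a validation set are standard ways to detect and control this gap.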
