AI-Module 4 - Updated

MODULE 4

What is a Feature?

• In the context of machine learning, a feature (also known as a variable or attribute) is an individual measurable property or characteristic of a data point that is used as input for a machine learning algorithm.
Features can be numerical, categorical, or text-based, and they
represent different aspects of the data that are relevant to the problem
at hand.
• For example, in a dataset of housing prices, features could include the
number of bedrooms, the square footage, the location, and the age of
the property. In a dataset of customer demographics, features could
include age, gender, income level, and occupation.
• The choice and quality of features are critical in machine learning, as
they can greatly impact the accuracy and performance of the model.
Feature Engineering
• Feature engineering is the pre-processing step of machine learning,
which is used to transform raw data into features that can be used for
creating a predictive model using Machine learning.
• All machine learning algorithms take input data to generate the output.
• The input data remains in a tabular form consisting of rows (instances
or observations) and columns (variable or attributes), and these attributes
are often known as features.
• Example: An image is an instance in computer vision, but a line in the
image could be the feature. In NLP, a document can be an observation, and
the word count could be the feature.
• A feature is an attribute or individual measurable property or
characteristic of a phenomenon.

• The feature engineering process selects the most useful predictor variables for the model.

• Feature engineering in ML mainly comprises four processes: Feature Creation, Transformations, Feature Extraction, and Feature Selection.
1. Feature Creation:
• Feature creation is finding the most useful variables to be used in a
predictive model.
• The process is subjective, and it requires human creativity and
intervention.
• New features are created by combining existing features using operations such as addition, subtraction, and ratios, and these new features offer great flexibility.
• Types of Feature Creation:
• Domain-Specific: Creating new features based on domain knowledge, such
as creating features based on business rules or industry standards.
• Data-Driven: Creating new features by observing patterns in the data, such
as calculating aggregations or creating interaction features.
• Synthetic: Generating new features by combining existing features or synthesizing new data points.
2. Transformation:
• It involves adjusting the predictor variable to improve the accuracy and performance of the
model.
• It ensures that all the variables are on the same scale, making the model easier to
understand.
• It ensures that all the features are within the acceptable range to avoid any computational
error.
• Types of Feature Transformation:
1. Normalization: Rescaling the features to a similar range, such as between 0 and 1, to prevent some features from dominating others.
2. Scaling: Transforming numerical variables to a similar scale, such as a standard deviation of 1, so that they can be compared more easily and the model considers all features equally.
3. Encoding: Transforming categorical features into a numerical representation. Examples are one-hot encoding and label encoding.
4. Transformation: Transforming the features using mathematical operations to change the distribution or scale of the features. Examples are logarithmic, square root, and reciprocal transformations.
3. Feature Extraction:
• Feature extraction is an automated feature engineering process that generates new
variables by extracting them from the raw data.
• The aim is to reduce the volume of data so that it can be easily used and managed
for data modelling.
• Feature extraction methods include cluster analysis, text analytics, edge
detection algorithms, and principal components analysis (PCA).
• Types of Feature Extraction:
1. Dimensionality Reduction: Reducing the number of features by transforming the data into a lower-dimensional space while retaining important information.
2. Feature Combination: Combining two or more existing features to create a new one. For example, the interaction between two features.
3. Feature Aggregation: Aggregating features to create a new one. For example, calculating the mean, sum, or count of a set of features.
4. Feature Transformation: Transforming existing features into a new representation. For example, log transformation of a feature with a skewed distribution.
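
A minimal sketch of feature extraction via PCA with scikit-learn; the synthetic data and the choice of two retained components are illustrative assumptions:

```python
# A minimal sketch of feature extraction with PCA (scikit-learn);
# the data here is synthetic and purely illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))          # 100 instances, 4 raw features

pca = PCA(n_components=2)              # keep 2 principal components
X_reduced = pca.fit_transform(X)       # shape: (100, 2)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)   # variance retained per component
```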
4. Feature Selection:
• Feature selection is a way of selecting the subset of the most relevant features
from the original features set by removing the redundant, irrelevant, or noisy
features.
• This is done in order to reduce overfitting in the model and improve the
performance.
• Types of Feature Selection:
1. Filter Method: Based on a statistical measure of the relationship between the feature and the target variable. Features with a high correlation are selected.
2. Wrapper Method: Based on the evaluation of a feature subset using a specific machine learning algorithm. The feature subset that results in the best performance is selected.
3. Embedded Method: Performs feature selection as part of the training process of the machine learning algorithm.
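
A minimal sketch of the filter method using scikit-learn's SelectKBest; the Iris dataset and k=2 are illustrative assumptions:

```python
# A minimal sketch of filter-based feature selection (scikit-learn);
# the dataset and k are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter method: score each feature against the target and keep the best k.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)        # (150, 2)
print(selector.get_support())  # boolean mask of the kept features
```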
Feature Engineering Techniques
1. Imputation:
• Imputation deals with handling missing values in data.
• Deleting records with missing values is one way of dealing with the missing data issue, but it could mean losing a chunk of valuable data. This is where imputation helps.
• Data imputation can be classified into two types:
 Categorical Imputation: Missing categorical values are generally
replaced by the most commonly occurring value (mode) of the feature.
 Numerical Imputation: Missing numerical values are generally replaced
by the mean or median of the corresponding feature.

• Example: Categorical Imputation

• Example: Numerical Imputation
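
A minimal sketch of both imputation types using pandas and scikit-learn; the toy DataFrame below is an illustrative assumption:

```python
# A minimal sketch of numerical (median) and categorical (mode) imputation;
# the toy DataFrame is an illustrative assumption.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age":  [25, np.nan, 40, 35, np.nan],
    "city": ["Delhi", "Mumbai", np.nan, "Mumbai", "Mumbai"],
})

# Numerical imputation: replace missing values with the median.
num_imputer = SimpleImputer(strategy="median")
df[["age"]] = num_imputer.fit_transform(df[["age"]])

# Categorical imputation: replace missing values with the mode.
cat_imputer = SimpleImputer(strategy="most_frequent")
df[["city"]] = cat_imputer.fit_transform(df[["city"]])

print(df)
```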



2. Discretization:
• Discretization involves taking a set of values of data and grouping sets
of them in some logical fashion into bins (or buckets).
• Binning can apply to numerical values as well as to categorical values.



• The grouping of data can be done as follows:
 Grouping of equal intervals (equal width)
 Grouping based on equal frequencies (of observations in the bin)
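
A minimal sketch of both grouping strategies with pandas; the income values and the choice of four bins are illustrative assumptions:

```python
# A minimal sketch of binning (discretization) with pandas; the income
# values and bin counts are illustrative assumptions.
import pandas as pd

income = pd.Series([12, 25, 31, 47, 58, 60, 75, 93])

# Equal-width bins: each bin spans the same range of values.
equal_width = pd.cut(income, bins=4)

# Equal-frequency bins: each bin holds (roughly) the same number of observations.
equal_freq = pd.qcut(income, q=4)

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```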


3. Categorical encoding:
• Categorical encoding is the technique used to encode categorical features
into numerical values which are usually simpler for an algorithm to
understand.
• This can be done by:
(i) Integer Encoding
(ii) One-Hot Encoding

(i) Integer Encoding:
• Integer encoding consists of replacing the categories with digits from 1 to n (or 0 to n-1), where n is the number of distinct categories of the variable.
• Each unique category is assigned an integer value.
• This method is also called label encoding.
• This method is used when an ordinal relationship exists among the categories.
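
A minimal sketch of integer (label) encoding with scikit-learn; the size categories are an illustrative assumption. Note that LabelEncoder assigns integers alphabetically, so when the ordinal order matters, an explicit mapping (e.g., OrdinalEncoder with a stated category order) is preferable:

```python
# A minimal sketch of integer (label) encoding with scikit-learn;
# the "size" categories are an illustrative assumption.
from sklearn.preprocessing import LabelEncoder

sizes = ["small", "medium", "large", "medium", "small"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(sizes)

print(list(encoder.classes_))  # ['large', 'medium', 'small'] (alphabetical)
print(encoded)                 # [2 1 0 1 2]
```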

(ii) One-Hot Encoding:
• For categorical variables where no ordinal
relationship exists, a one-hot encoding (OHE) can be
applied.
• Here a new binary variable is added for each
unique integer value.
• In the “color” variable example, there are 3
categories: red, green and blue.
• Therefore 3 binary variables: ‘color_red’,
‘color_blue’ and ‘color_green’ are needed.
• A “1” value is placed in the binary variable for the given color and “0” values for the other colors.
• The binary variables are often called “dummy variables” or “indicator variables”.
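
A minimal sketch of one-hot encoding the “color” example with pandas; the sample values are an illustrative assumption:

```python
# A minimal sketch of one-hot encoding with pandas.
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

dummies = pd.get_dummies(df["color"], prefix="color").astype(int)
print(dummies)
#    color_blue  color_green  color_red
# 0           0            0          1
# 1           0            1          0
# 2           1            0          0
# 3           0            1          0
```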
4. Feature Splitting:
• Feature splitting is the process of separating features into two or more
parts to make new features.
• This technique helps the algorithms to better understand and learn the
patterns in the dataset.
• Example 1: Sale Date is split into year, month and day.

• Example 2: Time stamp is split into 6 different attributes.
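
A minimal sketch of splitting a timestamp with pandas; the sale dates are an illustrative assumption:

```python
# A minimal sketch of feature splitting: a timestamp becomes several
# new features. The sale dates are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({"sale_date": pd.to_datetime(["2023-01-15 09:30:00",
                                                "2024-07-04 18:05:00"])})

df["year"]  = df["sale_date"].dt.year
df["month"] = df["sale_date"].dt.month
df["day"]   = df["sale_date"].dt.day
df["hour"]  = df["sale_date"].dt.hour

print(df)
```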



5. Handling outliers:
• Outliers are unusually high or low values in the dataset which are
unlikely to occur in normal scenarios.
• Since outliers could adversely affect the model's predictions, they must be handled appropriately.
• Methods of handling outliers include:
 Removal: The records containing outliers are removed from the variable.
However, the presence of outliers over multiple variables could result in
losing out on a large portion of the data.
 Replacing values: The outliers could alternatively be treated as missing
values and replaced by using appropriate imputation.
 Capping: Capping the maximum and minimum values and replacing them
with an arbitrary value.
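
A minimal sketch of outlier removal and capping using the common IQR fences; the data and the 1.5 × IQR thresholds are illustrative assumptions:

```python
# A minimal sketch of IQR-based outlier handling with pandas;
# thresholds and data are illustrative assumptions.
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95, 11, 10, -40])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Capping: clip values outside the fences to the fence values.
capped = s.clip(lower=lower, upper=upper)

# Removal: alternatively, drop the outlying records.
filtered = s[(s >= lower) & (s <= upper)]

print(capped.tolist())
print(filtered.tolist())
```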
6. Variable transformations:
• Variable transformation techniques
could help with normalizing skewed
data.
• Skewness is a measure of the
asymmetry of a distribution.
• A distribution is asymmetrical when its
left and right side are not mirror images.
• Some common variable transformations are the logarithmic, square root, and Box-Cox transformations, which, when applied to heavy-tailed distributions, result in less skewed values.
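
A minimal sketch of these transformations with NumPy and SciPy; the right-skewed sample is an illustrative assumption:

```python
# A minimal sketch of skew-reducing transformations; the data is an
# illustrative right-skewed sample.
import numpy as np
from scipy import stats

x = np.array([1, 2, 2, 3, 4, 5, 8, 13, 60, 150], dtype=float)

log_x  = np.log(x)               # logarithmic transformation
sqrt_x = np.sqrt(x)              # square root transformation
boxcox_x, _ = stats.boxcox(x)    # Box-Cox (requires positive values)

# Skewness should drop after each transformation.
print(stats.skew(x), stats.skew(log_x), stats.skew(sqrt_x), stats.skew(boxcox_x))
```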
7. Scaling:
• Feature scaling is a method used to normalize the range of independent
variables or features of data.
• The commonly used processes of scaling include:
 Min-Max Scaling/Normalization: This process involves the rescaling of
all values in a feature in the range 0 to 1. In other words, the minimum
value in the original range will take the value 0, the maximum value will
take 1 and the rest of the values in between the two extremes will be
appropriately scaled.

 Standardization/Variance scaling: The mean is subtracted from every data point and the result is divided by the standard deviation, to arrive at a distribution with a mean of 0 and a variance of 1.
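
A minimal sketch of both scaling methods with scikit-learn; the data is an illustrative assumption:

```python
# A minimal sketch of min-max scaling and standardization (scikit-learn);
# the data is an illustrative assumption.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [4.0], [6.0], [10.0]])

# Min-max: x' = (x - min) / (max - min), mapped into [0, 1].
print(MinMaxScaler().fit_transform(X).ravel())   # approx. [0. 0.333 0.556 1.]

# Standardization: x' = (x - mean) / std, giving mean 0 and variance 1.
print(StandardScaler().fit_transform(X).ravel())
```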
8. Creating features:
• Feature creation involves deriving new features from existing ones.
• This can be done by simple mathematical operations such as aggregations to obtain the mean, median, mode, or sum, or by taking the difference or even the product of two values.
• These features, although derived directly from the given data, can have an impact on performance when carefully chosen to relate to the target.
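
A minimal sketch of feature creation with pandas; the housing columns are illustrative assumptions:

```python
# A minimal sketch of deriving new features from existing ones;
# the housing columns are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "total_price": [300000, 450000],
    "area_sqft":   [1500, 1800],
    "bedrooms":    [3, 4],
    "bathrooms":   [2, 3],
})

# Ratio feature: price per square foot.
df["price_per_sqft"] = df["total_price"] / df["area_sqft"]

# Sum feature: total number of rooms.
df["total_rooms"] = df["bedrooms"] + df["bathrooms"]

print(df)
```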
Introduction to ML

a) Evolution of Machine Learning
• The term Machine Learning (ML) was first used by Arthur Samuel, one of
the pioneers of Artificial Intelligence at IBM, in 1959.
• Machine learning (ML) is an important tool for the goal of leveraging
technologies around artificial intelligence.
• Because of its learning and decision-making abilities, machine learning is often
referred to as AI, though, in reality, it is a subdivision of AI.
• Until the late 1970s, it was a part of AI’s evolution. Then, it branched off to
evolve on its own.
• Machine learning is now responsible for some of the most significant
advancements in technology.

b) What is Machine Learning (ML)?
• Machine learning (ML) is defined as a discipline of artificial intelligence (AI) that
provides machines the ability to automatically learn from data and past experiences
to identify patterns and make predictions with minimal human intervention.
• Machine learning is a branch of artificial intelligence (AI) and computer science
which focuses on the use of data and algorithms to imitate the way that humans learn,
gradually improving its accuracy.

• Machine learning is an application of AI that provides systems the ability to learn on their own and improve from experience without being programmed externally.

• Machine learning was defined by Stanford University as “the science of getting computers to act without being explicitly programmed.”
• Traditional programming is a manual process: the programmer creates the program. Programming aims to answer a problem using a predefined set of rules or logic.

• In machine learning, the algorithm automatically formulates the rules from the data. Machine learning seeks to construct a model or logic for the problem by analyzing its input data and answers.
c) Types of ML
• Based on the methods and way of learning, machine learning is divided into four types: supervised, unsupervised, semi-supervised, and reinforcement learning.

• Supervised machine learning is defined by its use of labeled datasets
to train algorithms to classify data or predict outcomes accurately.
• As input data is fed into the model, the model adjusts its weights until
it has been fitted appropriately. This occurs as part of the cross-validation
process, to ensure that the model avoids overfitting or underfitting.
• Supervised learning helps organizations solve a variety of real-world
problems at scale, such as classifying spam into a separate folder from
your inbox. Some methods used in supervised learning include neural
networks, naïve Bayes, linear regression, logistic regression, random
forest, and support vector machines (SVM).

1. Supervised Machine Learning Algorithms:
• The primary purpose of supervised learning is to scale the scope of
data and to make predictions of unavailable, future or unseen data
based on labeled sample data.
• Supervised learning is where there are input variables (x) and an
output variable (Y), and an algorithm is used to learn the mapping
function from the input to the output: Y = f(x).
• The goal is to approximate the mapping function so well that when
there comes a new input data (x), the machine should be able to
predict the output variable (Y) for that data.
• Supervised machine learning includes two major
processes: classification and regression.
 Classification is the process that categorizes a set of data into classes
(e.g., yes/no, true/false, 0/1, yes/no/maybe). There are various types of classification
problems, such as binary classification, multi-class classification, and multi-label
classification. Examples of classification problems are spam filtering, image
classification, sentiment analysis, classifying cancerous and non-cancerous
tumors, customer churn prediction, etc.

 Regression is the process of identifying patterns and calculating predictions
of continuous outcomes. The different regression analysis techniques are
used when the target and independent variables show a linear or non-linear
relationship with each other and the target variable contains continuous values.
Examples of regression problems are predicting the house rate, predicting a month's
sales, predicting the age of a person, prediction of rain, determining market trends, etc.
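
A minimal sketch of one classification model and one regression model with scikit-learn; the synthetic datasets are illustrative assumptions:

```python
# A minimal sketch of supervised learning: a classifier and a regressor
# trained on synthetic data (scikit-learn).
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import train_test_split

# Classification: predict a discrete class label.
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
clf = LogisticRegression().fit(Xc_tr, yc_tr)
print("classification accuracy:", clf.score(Xc_te, yc_te))

# Regression: predict a continuous value.
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
reg = LinearRegression().fit(Xr_tr, yr_tr)
print("regression R^2:", reg.score(Xr_te, yr_te))
```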
• Which of the following is/are classification task(s)? (Multiple options
may be correct)
• (a) Predicting whether an email is spam or not spam
• (b) Predicting the number of COVID cases over a given period
• (c) Predicting the score of a cricket team
• (d) Identifying the language of a text document



• Which of the following is/are regression task(s)? (Multiple options
may be correct)
• (a) Predicting whether or not a customer will repay a loan based on
their credit history
• (b) Forecasting the amount of rainfall in a given place
• (c) Identifying the types of crops from aerial images of farms
• (d) Predicting the future price of a stock

• The most widely used supervised algorithms are:
 Linear Regression
 Logistic Regression
 Random Forest
 Boosting algorithms
 Support Vector Machines
 Decision Trees
 Naive Bayes
 Nearest Neighbor.



2. Unsupervised Machine Learning Algorithms:
• Unsupervised learning feeds on unlabeled data.
• In unsupervised machine learning algorithms, the desired results are unknown and yet to be
defined.
• Unsupervised learning algorithms apply the following techniques to describe the data:
 Clustering: An exploration of data used to segment it into meaningful groups (i.e., clusters)
based on internal patterns, without any prior knowledge of group membership. The groups
are defined by the similarity of individual data objects to one another and their dissimilarity
from the rest. Examples: identifying fraudulent or criminal activity, classifying network traffic,
identifying fake news, etc.

 Dimensionality reduction: Most of the time, there is a lot of noise in the incoming data.
Machine learning algorithms use dimensionality reduction to remove this noise while distilling
the relevant information. Examples: image compression, distilling a database full of emails
down to the features that separate "spam" from "not spam".
• Unsupervised machine learning uses machine learning algorithms to
analyze and cluster unlabeled datasets (into subsets called clusters). These
algorithms discover hidden patterns or data groupings without the
need for human intervention.
• This method's ability to discover similarities and differences in
information makes it ideal for exploratory data analysis, cross-selling
strategies, customer segmentation, and image and pattern recognition.
• It’s also used to reduce the number of features in a model through the
process of dimensionality reduction. Principal component analysis
(PCA) and singular value decomposition (SVD) are two common
approaches for this. Other algorithms used in unsupervised learning
include neural networks, k-means clustering, and probabilistic
clustering methods.



• Which of the following is/are unsupervised learning problem(s)?
(Multiple options may be correct)
• (a) Grouping documents into different categories based on their topics
• (b) Forecasting the hourly temperature in a city based on historical temperature patterns
• (c) Identifying close-knit communities of people in a social network
• (d) Training an autonomous agent to drive a vehicle
• (e) Identifying different species of animals from images



• The most widely used unsupervised algorithms are:
 K-means clustering
 PCA (Principal Component Analysis)
 Association rule.
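
A minimal sketch of k-means clustering and PCA with scikit-learn; the synthetic blob data is an illustrative assumption:

```python
# A minimal sketch of unsupervised learning: k-means clustering and PCA
# on synthetic data (scikit-learn).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Clustering: group the unlabeled points into 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])

# Dimensionality reduction: project the 5 features down to 2.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)
```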

3. Semi-supervised Machine Learning Algorithms:
• Semi-supervised learning algorithms represent a middle ground between
supervised and unsupervised algorithms.
• In this type of learning, the algorithm is trained upon a combination of
labeled and unlabelled data.
• This combination will contain a very small amount of labeled data and a
very large amount of unlabelled data.
• The basic procedure involved is that first, the programmer will cluster
similar data using an unsupervised learning algorithm and then use the
existing labeled data to label the rest of the unlabelled data.
• Examples: Text document classifier, Speech analysis etc.
• One popular semi-supervised ML algorithm is the Label Propagation
algorithm.
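
A minimal sketch of Label Propagation with scikit-learn; hiding roughly 90% of the Iris labels is an illustrative assumption:

```python
# A minimal sketch of semi-supervised learning with Label Propagation
# (scikit-learn); unlabeled points are marked with -1 by convention.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)

# Hide most labels: keep ~10% labeled, mark the rest as -1 (unlabeled).
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.9] = -1

model = LabelPropagation().fit(X, y_partial)
print("accuracy on true labels:", model.score(X, y))
```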
• Semi-supervised learning offers a happy medium between supervised
and unsupervised learning. During training, it uses a smaller labeled
data set to guide classification and feature extraction from a larger,
unlabeled data set. Semi-supervised learning can solve the problem of
not having enough labeled data for a supervised learning algorithm. It
also helps if it’s too costly to label enough data.

4. Reinforcement Machine Learning Algorithms:
• Reinforcement ML employs a technique called exploration/exploitation.
• It’s an iterative algorithm. The action takes place, the consequences
are observed, and the next action considers the results of the first
action.
• Using this algorithm, the machine is trained to make specific
decisions.
• It works this way: The machine is exposed to an environment where it
trains itself continually using trial and error. The machine learns from
past experience and tries to capture the best possible knowledge to
make accurate business decisions.
• Examples: Video games, Self-driving cars etc.
• Which of the following statement(s) about Reinforcement Learning
(RL) is/are true? (Multiple options may be correct)
• (a) While learning a policy, the goal is to maximize the long-term reward.
• (b) During training, the agent is explicitly provided the most optimal action to be taken in each state.
• (c) The state of the environment changes based on the action taken by the agent.
• (d) RL is used for building agents to play chess.
• (e) RL is used for predicting the prices of apartments from their features.

• Most common reinforcement learning algorithms include:
 Q-Learning
 Temporal Difference (TD)
 Monte-Carlo Tree Search (MCTS)
 Asynchronous Actor-Critic Agents (A3C).
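
A minimal sketch of tabular Q-learning on a tiny corridor environment; the environment, rewards, and hyperparameters are illustrative assumptions:

```python
# A minimal sketch of tabular Q-learning on a 5-state corridor; the agent
# starts at the left end and is rewarded for reaching the right end.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0                           # start at the left end
    while s != n_states - 1:        # rightmost state is the goal
        # Exploration/exploitation: random action with probability epsilon.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update rule.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))         # learned policy: should prefer "right"
```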
