Feature engineering on Numerical data:

Feature engineering on numeric data involves creating new features or transforming existing ones to enhance the predictive power of machine learning models. Here are some common techniques for feature engineering on numeric data (a short code sketch after the list illustrates several of them):

1. Polynomial Features:
- Polynomial features involve creating new features by raising existing
features to a power.
- This technique allows the model to capture nonlinear relationships
between variables.
- For example, if you have a feature 'x', you can create polynomial
features like 'x^2', 'x^3', etc.

2. Interaction Features:
- Interaction features are created by combining two or more existing
features.
- These features can capture relationships between variables that may
be useful for prediction.
- For example, if you have features 'x' and 'y', you can create an
interaction feature 'x*y'.

3. Logarithmic Transformation:
- Applying a logarithmic transformation to a feature can help in
handling skewed or exponentially distributed data.
- It can reduce the impact of large values and make the relationship
between variables more linear.

4. Scaling and Normalization:
- Scaling and normalization techniques are used to bring features to a similar scale and remove differences in magnitude.
- Common scaling methods include standardization (mean = 0, standard
deviation = 1) and min-max scaling (values mapped to a specified range,
often 0 to 1).
- Scaling helps algorithms that are sensitive to the scale of features,
such as gradient-based methods.

5. Binning/Discretization:
- Binning involves dividing a continuous feature into discrete intervals
or bins.
- It can help in capturing non-linear relationships or handling outliers.
- Binning can be done based on equal-width intervals, equal-frequency
intervals, or using custom intervals based on domain knowledge.

6. Time-based Features:
- If your dataset contains timestamps or time-related information, you
can extract additional features from them.
- Examples include hour of the day, day of the week, month, season, or
time differences between events.

7. Statistical Aggregations:
- You can compute statistical aggregations on numeric features, such as
mean, median, sum, variance, etc.
- Aggregating features over different groups or time windows can
capture useful patterns or trends in the data.

8. Domain-specific Transformations:
- Depending on the domain or problem you are working on, you might
apply specific transformations that make sense for the data.
- Examples include logarithmic returns for financial data, ratios,
percentage changes, or domain-specific calculations.
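
A minimal sketch of several of the techniques above (polynomial and interaction features, a log transform, scaling, and binning) using pandas and scikit-learn; the column names and values below are made up purely for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures, StandardScaler, MinMaxScaler

# Hypothetical numeric data: two made-up columns.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [10.0, 40.0, 90.0, 160.0, 250.0]})

# 1-2. Polynomial and interaction features (x^2, x*y, y^2, ...).
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_feats = pd.DataFrame(poly.fit_transform(df[["x", "y"]]),
                          columns=poly.get_feature_names_out(["x", "y"]))

# 3. Logarithmic transformation (log1p handles zeros safely).
df["y_log"] = np.log1p(df["y"])

# 4. Scaling: standardization and min-max scaling.
df["x_std"] = StandardScaler().fit_transform(df[["x"]]).ravel()
df["x_minmax"] = MinMaxScaler().fit_transform(df[["x"]]).ravel()

# 5. Binning a continuous column into three equal-width intervals.
df["y_bin"] = pd.cut(df["y"], bins=3, labels=["low", "mid", "high"])

print(poly_feats.head())
print(df.head())
```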

Remember that feature engineering is an iterative process, and it's important to evaluate the impact of the engineered features on the model's performance through cross-validation or other evaluation techniques.
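
As a hedged sketch of that evaluation step, the snippet below compares cross-validated scores with and without an added engineered feature; the dataset and model are arbitrary placeholders:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Baseline: original features only.
base_score = cross_val_score(Ridge(), X, y, cv=5).mean()

# Engineered: append the square of the first column as an extra feature.
X_eng = np.hstack([X, X[:, [0]] ** 2])
eng_score = cross_val_score(Ridge(), X_eng, y, cv=5).mean()

print(f"baseline R^2: {base_score:.3f}, with engineered feature: {eng_score:.3f}")
```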

Feature engineering on Temporal data:

Feature engineering on temporal data involves extracting meaningful features from timestamps or time-related information. Here are some techniques for feature engineering on temporal data (a short code sketch after the list illustrates several of them):

1. Time Decomposition:
- Extract components such as year, month, day, hour, minute, second
from the timestamp.
- These components can provide insights into seasonal patterns, daily
trends, or specific time intervals that may be relevant for the problem.

2. Time Lags:
- Create lag features by shifting the values of a variable forward or
backward in time.
- Lags can capture temporal dependencies and help the model
understand how the target variable changes over time.
- For example, you can create features representing the value of a
variable one hour ago, one day ago, or one week ago.

3. Rolling and Expanding Statistics:
- Compute rolling or expanding statistics to capture trends and patterns over a specific window of time.
- Rolling statistics involve calculating metrics like mean, sum, standard
deviation, etc., over a fixed-size window that moves along with time.
- Expanding statistics, on the other hand, consider all the data points up
to a certain time point.
- These features can capture moving averages, cumulative sums, or
other patterns that change over time.

4. Time Since:
- Calculate the time elapsed since a specific event or reference point.
- For example, you can calculate the time since the last purchase, the
time since the last login, or the time since a specific event occurred.
- These features can capture recency or the temporal relationship
between events.

5. Cyclical Encoding:
- Encode cyclical time features, such as hour of the day or month of the
year, using trigonometric transformations.
- Cyclical encoding helps capture circular patterns in time, where the
first and last values are close to each other.
- For example, you can represent the hour of the day as sine and cosine
components to preserve the cyclical nature of the variable.

6. Time-based Aggregations:
- Compute various statistical aggregations (mean, median, min, max,
etc.) for different time periods, such as hourly, daily, weekly, or monthly.
- Aggregations can capture trends, seasonality, or periodic patterns in
the data.

7. Event Count:
- Count the occurrences of specific events within a given time window.
- For example, count the number of purchases, logins, or any other
relevant events that occurred within the last hour, day, or week.
- Event counts can capture activity levels or the intensity of certain
behaviors.

8. Time Intervals:
- Calculate the duration or time differences between specific events or
reference points.
- For example, compute the time duration between two consecutive
purchases, the time between order placement and delivery, or the time
since the last login session.
- These features can capture waiting times, time intervals between
events, or other time-related metrics.
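
A rough pandas sketch of several of these temporal techniques (time decomposition, lags, rolling/expanding statistics, time since a reference point, and cyclical encoding) on a made-up hourly event log:

```python
import numpy as np
import pandas as pd

# Hypothetical event log with hourly timestamps and a numeric value.
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=48, freq="h"),
    "value": np.random.default_rng(0).random(48),
})

# 1. Time decomposition: calendar components from the timestamp.
df["hour"] = df["timestamp"].dt.hour
df["dayofweek"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month

# 2. Time lags: the value one and 24 observations ago.
df["value_lag1"] = df["value"].shift(1)
df["value_lag24"] = df["value"].shift(24)

# 3. Rolling and expanding statistics over the value column.
df["value_roll_mean_6"] = df["value"].rolling(window=6).mean()
df["value_expanding_sum"] = df["value"].expanding().sum()

# 4. Time since a reference point (here: hours since the first record).
df["hours_since_start"] = (df["timestamp"] - df["timestamp"].iloc[0]) / pd.Timedelta(hours=1)

# 5. Cyclical encoding of the hour of day with sine/cosine components.
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)

print(df.head())
```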

Remember to consider the problem domain, the specific temporal characteristics of the data, and the context of the problem when applying these techniques. Exploratory data analysis and domain knowledge play a crucial role in identifying relevant features and deriving insights from temporal data.

Feature engineering on Image data:

Feature engineering is an important step in machine learning, and it can be especially critical when working with image data. Here are a few common techniques for feature engineering on image data (a short code sketch after the list illustrates two of them):

1. Color spaces: Image data can be represented in different color spaces such as RGB, HSV, CMYK, etc. Depending on the use case, a particular color space may be more suitable. For example, HSV is often used in image segmentation tasks, while RGB is commonly used for object detection.

2. Histograms: A histogram of an image can be used to extract features that represent the distribution of pixel intensities. Histograms can be computed for different color channels and can be used to detect specific patterns or textures in an image.

3. Edge detection: Edge detection algorithms can be used to identify the boundaries between different regions in an image. This can be useful for object detection or segmentation tasks.

4. Texture analysis: Texture analysis techniques can be used to extract features that describe the patterns and structures in an image. This can include methods such as Gabor filters or local binary patterns.

5. Convolutional neural networks: CNNs are a powerful technique for feature extraction in image data. They automatically learn features from the raw pixel data by training on a large dataset of labeled images.
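
A minimal NumPy-only sketch of the histogram and edge-detection ideas above, using a randomly generated array as a stand-in for a real image (a library such as OpenCV or scikit-image would normally provide more sophisticated filters):

```python
import numpy as np

# Hypothetical image: a random 64x64 RGB array standing in for real pixel data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Color-space style feature: convert RGB to a single grayscale channel.
gray = image.astype(float) @ np.array([0.299, 0.587, 0.114])

# Histogram features: the distribution of pixel intensities.
hist_r, _ = np.histogram(image[:, :, 0], bins=16, range=(0, 256))
hist_gray, _ = np.histogram(gray, bins=16, range=(0, 256))

# Simple edge feature: gradient magnitude of the grayscale image.
gy, gx = np.gradient(gray)
edge_strength = np.sqrt(gx ** 2 + gy ** 2)

# Flatten summaries into a feature vector for a downstream model.
features = np.concatenate([hist_r, hist_gray,
                           [edge_strength.mean(), edge_strength.std()]])
print(features.shape)
```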

In general, the choice of feature engineering technique will depend on the specific problem and the characteristics of the image data. A combination of techniques may be necessary to achieve the best results.

Principal component analysis:

Principal Component Analysis (PCA) is a widely used technique for feature extraction and dimensionality reduction. PCA can be applied to a variety of data types, including image data. Here's how it works for feature extraction in image data (a NumPy sketch of the steps follows the list):

1. Convert the image data into a matrix: Each image is represented as a matrix where each pixel is a feature.

2. Normalize the data: PCA requires the data to be normalized so that the
features have the same scale. This can be achieved by subtracting the
mean and dividing by the standard deviation.

3. Compute the covariance matrix: The covariance matrix represents the relationship between the features. It is computed as the dot product of the transpose of the (centered) data matrix with itself, divided by the number of samples minus one.

4. Compute the eigenvectors and eigenvalues: The eigenvectors of the covariance matrix represent the directions in which the data varies the most. The eigenvalues represent the amount of variance in each direction.

5. Select the top k eigenvectors: The top k eigenvectors with the highest
eigenvalues are selected. These eigenvectors represent the most important
directions in the data.

6. Project the data onto the new feature space: The original data is
projected onto the new feature space spanned by the top k eigenvectors.
This results in a lower-dimensional representation of the data.
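
A rough NumPy sketch of these six steps on randomly generated data standing in for flattened images; scikit-learn's PCA class wraps essentially the same procedure:

```python
import numpy as np

# Hypothetical data: 100 "images" flattened into rows of 64 pixel features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))

# Step 2: normalize each feature (zero mean, unit standard deviation).
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 3: covariance matrix of the features.
cov = np.cov(X_norm, rowvar=False)

# Step 4: eigenvectors and eigenvalues (eigh, since the covariance matrix is symmetric).
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 5: keep the top k eigenvectors (eigh returns eigenvalues in ascending order).
k = 10
top_vecs = eigvecs[:, np.argsort(eigvals)[::-1][:k]]

# Step 6: project the data onto the new k-dimensional feature space.
X_reduced = X_norm @ top_vecs
print(X_reduced.shape)  # (100, 10)
```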

PCA can be used to extract meaningful features from image data and
reduce the dimensionality of the data, which can make subsequent
machine learning tasks more efficient and effective. However, it's
important to note that PCA may not always result in the best features for
a particular task, and other techniques such as convolutional neural
networks may be more appropriate for some image data applications.
https://www.javatpoint.com/principal-component-analysis

UNIT-IV

1) What is a Support Vector Machine (SVM)? Explain the basic working principle of SVMs and their advantages in comparison to other machine learning algorithms.

https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm

2) How does the Linear Discriminant Function help with binary classification in SVMs? Describe the mathematical formulation of the Linear Discriminant Function and how it can be used to separate two classes of data.

The Linear Discriminant Function, also known as the decision function, plays a crucial role in binary classification using Support Vector Machines (SVMs). It helps to separate two classes of data by defining a decision boundary (hyperplane) in the feature space.

Mathematically, the Linear Discriminant Function takes the form:

f(x) = w^T * x + b

where:
- f(x) represents the output of the discriminant function for a given input
sample x.
- w is the weight vector that determines the orientation of the decision
boundary.
- x is the input sample or feature vector.
- b is the bias term or the offset.

The weight vector w and the bias term b are learned during the training
process of an SVM.

The goal of the Linear Discriminant Function is to assign a positive or negative value to the input samples based on their position with respect to the decision boundary. If f(x) is positive, the sample is classified as one class, and if it is negative, it is classified as the other class.

The decision boundary itself is defined by the equation:

w^T * x + b = 0

For SVMs, the decision boundary is specifically designed to maximize the margin, which is the distance between the decision boundary and the nearest training samples of both classes. SVMs aim to find the hyperplane that separates the two classes with the maximum margin, providing a robust decision boundary.

During the training phase of an SVM, the weights w and the bias term b
are optimized to find the best possible separation between the two classes.
This optimization is typically formulated as a quadratic programming
problem with constraints, where the objective is to maximize the margin
while minimizing the classification errors.

Once the SVM is trained, the learned weights and bias are used in the
Linear Discriminant Function to classify new unseen samples. By
evaluating the sign of f(x) for a given input, the SVM assigns the sample
to one of the two classes.
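
A minimal sketch of this with scikit-learn's SVC (linear kernel) on a synthetic two-blob dataset, where the fitted coef_ and intercept_ play the roles of w and b in f(x) = w^T * x + b:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two linearly separable blobs as a toy binary classification problem.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear").fit(X, y)

w = clf.coef_[0]        # weight vector of the discriminant function
b = clf.intercept_[0]   # bias term

# f(x) = w^T x + b for a new sample; the sign gives the predicted class.
x_new = np.array([0.0, 2.0])
f_x = w @ x_new + b
print("f(x) =", f_x, "-> class", int(f_x > 0))
print("matches clf.predict:", clf.predict([x_new]))
```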

In summary, the Linear Discriminant Function in SVMs provides a mathematical formulation for separating two classes of data. It takes an input sample, applies a linear transformation using weights and a bias term, and assigns a class label based on the sign of the resulting value. The decision boundary defined by the discriminant function maximizes the margin between the classes, leading to effective and robust binary classification.

3) Discuss the concept of a linear maximal margin classifier for linearly separable data in SVMs. Explain the significance of the maximal margin and how it helps with classification.

The concept of a linear maximal margin classifier in Support Vector Machines (SVMs) is based on the idea of finding a decision boundary (hyperplane) that maximizes the margin between two classes of linearly separable data. The maximal margin is significant because it offers several benefits for classification.

In SVMs, the goal is to find the hyperplane that best separates the two classes while maximizing the margin. The margin is defined as the distance between the decision boundary and the closest training samples of each class. The hyperplane that achieves the largest margin is considered the optimal solution.

The significance of the maximal margin can be understood through the following points:

1. Robustness: A large margin provides a buffer zone between the decision boundary and the data points. This means that even if there are small perturbations or outliers in the training data, they are less likely to influence the classification. The decision boundary obtained by maximizing the margin tends to be more resilient to noise and generalizes better to unseen data.

2. Generalization: A larger margin corresponds to a lower degree of model complexity. By maximizing the margin, SVMs prioritize simpler decision boundaries that are less likely to overfit the training data. This promotes better generalization to new, unseen samples and helps avoid the problem of overfitting, where the model becomes too specialized to the training data and performs poorly on unseen data.

3. Optimal separation: The maximal margin classifier aims to find the hyperplane that maximally separates the two classes. By doing so, it maximizes the distance between the classes, making the classification more reliable and reducing the chances of misclassification. The maximal margin provides a clear separation between the two classes, enhancing the discriminative power of the classifier.

4. Support Vectors: The data points that lie closest to the decision
boundary and determine its position are called support vectors. These
support vectors have a crucial role in defining the decision boundary and
maximizing the margin. By focusing on these critical points, SVMs
effectively capture the most informative samples and utilize them for
classification.

The concept of the linear maximal margin classifier in SVMs demonstrates the importance of finding a decision boundary that maximizes the margin between classes. This approach offers robustness to noise, promotes better generalization, provides optimal separation, and focuses on the most informative support vectors. By considering these factors, SVMs achieve effective and accurate classification for linearly separable data.
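
As a concrete illustration, for a trained linear SVM the margin width can be read off the learned weights as 2 / ||w|| (the distance between the hyperplanes w^T * x + b = +1 and w^T * x + b = -1). A small sketch on well-separated synthetic data, using a large C to approximate the hard-margin case:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Well-separated blobs so a hard-margin-like fit is possible.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.6, random_state=0)

# A large C approximates the hard-margin (maximal margin) classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)
print("margin width:", margin_width)
print("support vectors:", clf.support_vectors_)  # the points that define the boundary
```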

4) What is a linear soft margin classifier in SVMs? Explain why it is needed when dealing with overlapping classes and how it is different from the linear maximal margin classifier.

A linear soft margin classifier in Support Vector Machines (SVMs) is an extension of the linear maximal margin classifier that allows for some degree of overlap between the classes. It is designed to handle situations where the data points of different classes are not completely separable by a linear boundary.

In real-world scenarios, it is common to encounter datasets where the classes are not perfectly separable due to various factors such as noise, overlapping distributions, or inherent class structure. In such cases, the linear maximal margin classifier may not be applicable because it strictly requires a clear separation between the classes. This is where the linear soft margin classifier comes into play.

The linear soft margin classifier introduces a slack variable (ξ) for each
data point, which allows for misclassification or data points falling within
the margin or on the wrong side of the decision boundary. The
introduction of slack variables relaxes the constraint of perfect separation
and enables the model to tolerate a certain amount of error.

The objective of the linear soft margin classifier is to find a decision boundary that balances between maximizing the margin and minimizing the classification errors. The optimization problem is formulated to minimize the sum of the slack variables while still maintaining a reasonable margin. This is achieved by introducing a regularization term that penalizes the slack variables in the objective function.

The main differences between the linear soft margin classifier and the
linear maximal margin classifier are as follows:

1. Handling overlapping classes: The linear maximal margin classifier assumes perfectly separable classes, where there is no overlap between the data points of different classes. In contrast, the linear soft margin classifier is designed to handle overlapping classes by allowing misclassifications and data points within the margin.

2. Slack variables: The linear soft margin classifier introduces slack variables (ξ) to handle misclassifications and data points that fall within the margin or on the wrong side of the decision boundary. These slack variables represent the degree of error or violation of the margin constraints.

3. Trade-off between margin and errors: While the linear maximal margin
classifier prioritizes maximizing the margin and finding the hyperplane
with the largest separation, the linear soft margin classifier balances
between maximizing the margin and minimizing the classification errors.
It seeks a compromise between a larger margin and allowing some
misclassifications.

4. Regularization: The linear soft margin classifier incorporates a regularization term in the objective function to control the extent of error tolerance. The regularization term helps to prevent overfitting by penalizing large slack variables and encouraging a simpler decision boundary.
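
In scikit-learn's SVC, this trade-off is controlled by the C parameter: a small C tolerates more slack (a wider margin with more violations), while a large C penalizes errors heavily. A rough sketch on overlapping synthetic data:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs: no hyperplane separates the classes perfectly.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C -> more slack and typically more support vectors; large C -> fewer errors tolerated.
    print(f"C={C:<6} training accuracy={clf.score(X, y):.2f} "
          f"support vectors={len(clf.support_)}")
```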

In summary, the linear soft margin classifier in SVMs is used when dealing with overlapping classes, where the data points are not linearly separable. It relaxes the strict requirement of perfect separation by introducing slack variables and balancing between margin maximization and error minimization. This approach enables SVMs to handle more complex datasets with overlapping classes while still providing effective and robust classification.

5) Explain the concept of kernel-induced feature spaces in SVMs. Describe how a nonlinear classifier can be constructed using this approach and what types of kernels are commonly used.
