ML Notes
1. Polynomial Features:
- Polynomial features involve creating new features by raising existing
features to a power.
- This technique allows the model to capture nonlinear relationships
between variables.
- For example, if you have a feature 'x', you can create polynomial
features like 'x^2', 'x^3', etc.
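For example, a minimal sketch using scikit-learn's PolynomialFeatures (the input values here are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# a single feature 'x' with a few illustrative values
X = np.array([[1.0], [2.0], [3.0]])

# degree=3 generates x, x^2, x^3; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)
print(poly.get_feature_names_out())  # ['x0' 'x0^2' 'x0^3']
```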
2. Interaction Features:
- Interaction features are created by combining two or more existing
features.
- These features can capture joint effects between variables that neither
feature expresses on its own, which may be useful for prediction.
- For example, if you have features 'x' and 'y', you can create an
interaction feature 'x*y'.
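A minimal pandas sketch (the column names 'x' and 'y' and their values are illustrative):

```python
import pandas as pd

# hypothetical features 'x' and 'y'
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

# new interaction feature capturing the joint effect of x and y
df["x_times_y"] = df["x"] * df["y"]
```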
3. Logarithmic Transformation:
- Applying a logarithmic transformation to a feature can help in
handling skewed or exponentially distributed data.
- It can reduce the impact of large values and make the relationship
between variables more linear.
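A minimal sketch, assuming a right-skewed column named 'income' (the name and values are illustrative):

```python
import numpy as np
import pandas as pd

# hypothetical right-skewed feature
df = pd.DataFrame({"income": [20_000, 35_000, 50_000, 1_000_000]})

# log1p computes log(1 + x): safe for zeros and compresses the large outlier
df["log_income"] = np.log1p(df["income"])
```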
5. Binning/Discretization:
- Binning involves dividing a continuous feature into discrete intervals
or bins.
- It can help in capturing non-linear relationships or handling outliers.
- Binning can be done based on equal-width intervals, equal-frequency
intervals, or using custom intervals based on domain knowledge.
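A minimal pandas sketch of the three binning strategies (the age values and bin edges are illustrative):

```python
import pandas as pd

ages = pd.Series([3, 17, 25, 42, 67, 90])

equal_width = pd.cut(ages, bins=3)              # equal-width intervals
equal_freq = pd.qcut(ages, q=3)                 # equal-frequency (quantile) intervals
custom = pd.cut(ages, bins=[0, 18, 65, 120],    # domain-knowledge intervals
                labels=["minor", "adult", "senior"])
```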
6. Time-based Features:
- If your dataset contains timestamps or time-related information, you
can extract additional features from them.
- Examples include hour of the day, day of the week, month, season, or
time differences between events.
7. Statistical Aggregations:
- You can compute statistical aggregations on numeric features, such as
mean, median, sum, variance, etc.
- Aggregating features over different groups or time windows can
capture useful patterns or trends in the data.
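A minimal sketch of group-wise aggregation with pandas (the table and column names are hypothetical):

```python
import pandas as pd

# hypothetical transactions, one row per purchase
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 25.0, 5.0, 7.5, 12.0],
})

# per-customer aggregates that can be joined back to the data as features
agg = df.groupby("customer_id")["amount"].agg(["mean", "median", "sum", "var"])
```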
8. Domain-specific Transformations:
- Depending on the domain or problem you are working on, you might
apply specific transformations that make sense for the data.
- Examples include logarithmic returns for financial data, ratios,
percentage changes, or domain-specific calculations.
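For the financial example, a minimal sketch (the price series is illustrative):

```python
import numpy as np
import pandas as pd

# hypothetical daily closing prices
prices = pd.Series([100.0, 102.0, 101.0, 105.0])

pct_change = prices.pct_change()      # simple percentage change
log_returns = np.log(prices).diff()   # logarithmic returns
```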
1. Time Decomposition:
- Extract components such as year, month, day, hour, minute, second
from the timestamp.
- These components can provide insights into seasonal patterns, daily
trends, or specific time intervals that may be relevant for the problem.
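A minimal sketch using the pandas datetime accessor (the timestamps are illustrative):

```python
import pandas as pd

ts = pd.to_datetime(pd.Series(["2024-01-15 08:30:00", "2024-07-04 21:05:00"]))

features = pd.DataFrame({
    "year": ts.dt.year,
    "month": ts.dt.month,
    "day": ts.dt.day,
    "hour": ts.dt.hour,
    "dayofweek": ts.dt.dayofweek,  # Monday = 0
})
```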
2. Time Lags:
- Create lag features by shifting the values of a variable forward or
backward in time.
- Lags can capture temporal dependencies and help the model
understand how the target variable changes over time.
- For example, you can create features representing the value of a
variable one hour ago, one day ago, or one week ago.
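A minimal sketch, assuming an hourly series named 'demand' (name, values, and frequency are illustrative):

```python
import pandas as pd

demand = pd.Series(
    [5.0, 7.0, 6.0, 9.0],
    index=pd.date_range("2024-01-01", periods=4, freq="h"),
)

lags = pd.DataFrame({
    "demand": demand,
    "lag_1": demand.shift(1),    # value one hour ago
    "lag_24": demand.shift(24),  # value one day ago (for hourly data)
})
```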
4. Time Since:
- Calculate the time elapsed since a specific event or reference point.
- For example, you can calculate the time since the last purchase, the
time since the last login, or the time since a specific event occurred.
- These features can capture recency or the temporal relationship
between events.
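A minimal sketch of a recency feature (the login timestamps and reference time are hypothetical):

```python
import pandas as pd

logins = pd.to_datetime(pd.Series(["2024-03-01 10:00", "2024-03-05 09:30"]))
now = pd.Timestamp("2024-03-07 12:00")  # reference point at which features are computed

# hours elapsed since the most recent login
hours_since_last_login = (now - logins.max()).total_seconds() / 3600
```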
5. Cyclical Encoding:
- Encode cyclical time features, such as hour of the day or month of the
year, using trigonometric transformations.
- Cyclical encoding preserves the circular structure of time, where the last
value of the range wraps around to the first (e.g., hour 23 is adjacent to hour 0).
- For example, you can represent the hour of the day as sine and cosine
components to preserve the cyclical nature of the variable.
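A minimal sketch of sine/cosine encoding for hour of the day (the hour values are illustrative):

```python
import numpy as np
import pandas as pd

hours = pd.Series([0, 6, 12, 23])  # hour of the day, 0-23

# map the 0-23 range onto a circle so hour 23 and hour 0 end up close together
hour_sin = np.sin(2 * np.pi * hours / 24)
hour_cos = np.cos(2 * np.pi * hours / 24)
```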
6. Time-based Aggregations:
- Compute various statistical aggregations (mean, median, min, max,
etc.) for different time periods, such as hourly, daily, weekly, or monthly.
- Aggregations can capture trends, seasonality, or periodic patterns in
the data.
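A minimal sketch using pandas resampling (the daily sales series is hypothetical):

```python
import pandas as pd

sales = pd.Series(
    [3.0, 5.0, 2.0, 8.0, 6.0, 4.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# weekly aggregates: mean, min, and max per calendar week
weekly = sales.resample("W").agg(["mean", "min", "max"])
```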
7. Event Count:
- Count the occurrences of specific events within a given time window.
- For example, count the number of purchases, logins, or any other
relevant events that occurred within the last hour, day, or week.
- Event counts can capture activity levels or the intensity of certain
behaviors.
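A minimal sketch counting events in a trailing one-hour window (the purchase timestamps are illustrative):

```python
import pandas as pd

# hypothetical purchase log, one row per purchase, indexed by timestamp
purchases = pd.DataFrame(
    {"n": 1},
    index=pd.to_datetime(["2024-01-01 09:00", "2024-01-01 09:40", "2024-01-01 11:30"]),
)

# number of purchases in the hour ending at each event
purchases["count_last_hour"] = purchases["n"].rolling("1h").sum()
```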
8. Time Intervals:
- Calculate the duration or time differences between specific events or
reference points.
- For example, compute the time duration between two consecutive
purchases, the time between order placement and delivery, or the time
since the last login session.
- These features can capture waiting times, time intervals between
events, or other time-related metrics.
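A minimal sketch computing gaps between consecutive events (the purchase timestamps are hypothetical):

```python
import pandas as pd

purchase_times = pd.to_datetime(pd.Series([
    "2024-01-01 10:00", "2024-01-04 15:30", "2024-01-10 08:00",
]))

# duration between consecutive purchases, in hours
hours_between = purchase_times.diff().dt.total_seconds() / 3600
```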
2. Standardize the data: PCA is sensitive to feature scales, so the features
should be put on a common scale. This is done by subtracting each feature's
mean and dividing by its standard deviation.
5. Select the top k eigenvectors: The top k eigenvectors with the highest
eigenvalues are selected. These eigenvectors represent the most important
directions in the data.
6. Project the data onto the new feature space: The original data is
projected onto the new feature space spanned by the top k eigenvectors.
This results in a lower-dimensional representation of the data.
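A minimal NumPy sketch of these steps on synthetic data (the array shapes and the choice of k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 samples, 5 features (synthetic)
k = 2                           # number of components to keep

# standardize: zero mean, unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# covariance matrix of the standardized features
cov = np.cov(X_std, rowvar=False)

# eigendecomposition (eigh is suited to symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(cov)

# sort eigenvectors by decreasing eigenvalue and keep the top k
order = np.argsort(eigvals)[::-1]
top_k = eigvecs[:, order[:k]]

# project the data onto the new k-dimensional feature space
X_reduced = X_std @ top_k       # shape (100, k)
```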
PCA can be used to extract meaningful features from image data and
reduce the dimensionality of the data, which can make subsequent
machine learning tasks more efficient and effective. However, it's
important to note that PCA may not always result in the best features for
a particular task, and other techniques such as convolutional neural
networks may be more appropriate for some image data applications.
https://www.javatpoint.com/principal-component-analysis
UNIT-IV
https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm
f(x) = w^T * x + b
where:
- f(x) represents the output of the discriminant function for a given input
sample x.
- w is the weight vector that determines the orientation of the decision
boundary.
- x is the input sample or feature vector.
- b is the bias term or the offset.
The weight vector w and the bias term b are learned during the training
process of an SVM.
During the training phase of an SVM, the weights w and the bias term b
are optimized to find the best possible separation between the two classes.
This optimization is typically formulated as a quadratic programming
problem with constraints, where the objective is to maximize the margin
while minimizing the classification errors.
Once the SVM is trained, the learned weights and bias are used in the
Linear Discriminant Function to classify new unseen samples. By
evaluating the sign of f(x) for a given input, the SVM assigns the sample
to one of the two classes.
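A minimal scikit-learn sketch of this (the synthetic data and the C value are illustrative; class labels follow scikit-learn's convention):

```python
import numpy as np
from sklearn.svm import LinearSVC

# synthetic two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LinearSVC(C=1.0).fit(X, y)

w = clf.coef_[0]        # learned weight vector
b = clf.intercept_[0]   # learned bias term

x_new = np.array([1.5, 2.0])
f_x = w @ x_new + b                     # f(x) = w^T * x + b
predicted_class = 1 if f_x > 0 else 0   # the sign of f(x) picks the class
```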
In SVMs, the goal is to find the hyperplane that best separates the two
classes while maximizing the margin. The margin is defined as the
distance between the decision boundary and the closest training samples
of each class. The hyperplane that achieves the largest margin is
considered the optimal solution.
The significance of the maximal margin can be understood through the
following points:
4. Support Vectors: The data points that lie closest to the decision
boundary and determine its position are called support vectors. These
support vectors have a crucial role in defining the decision boundary and
maximizing the margin. By focusing on these critical points, SVMs
effectively capture the most informative samples and utilize them for
classification.
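A minimal sketch inspecting the support vectors of a linear SVM fitted with scikit-learn (the synthetic data and C value are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# a linear kernel with a large C approximates the maximal-margin classifier
clf = SVC(kernel="linear", C=10.0).fit(X, y)

print(clf.support_vectors_)        # the training points closest to the boundary
print(len(clf.support_vectors_))   # typically only a small subset of the data
```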
The linear soft margin classifier introduces a slack variable (ξ) for each
data point, which allows for misclassification or data points falling within
the margin or on the wrong side of the decision boundary. The
introduction of slack variables relaxes the constraint of perfect separation
and enables the model to tolerate a certain amount of error.
The main differences between the linear soft margin classifier and the
linear maximal margin classifier are as follows:
3. Trade-off between margin and errors: While the linear maximal margin
classifier prioritizes maximizing the margin and finding the hyperplane
with the largest separation, the linear soft margin classifier balances
between maximizing the margin and minimizing the classification errors.
It seeks a compromise between a larger margin and allowing some
misclassifications.
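A minimal sketch of this trade-off using scikit-learn's C parameter, which penalizes the slack variables (the dataset and the two C values are illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# overlapping two-class data, so perfect separation is not possible
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

# small C: wide margin, more slack (more tolerated misclassifications)
soft = SVC(kernel="linear", C=0.01).fit(X, y)

# large C: narrower margin, errors penalized heavily (closer to maximal margin)
hard = SVC(kernel="linear", C=100.0).fit(X, y)

print(len(soft.support_), len(hard.support_))  # the softer margin typically uses more support vectors
```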