SVM Report


B22bb009

Anchitya Kumar

ASSIGNMENT_4_LAB REPORT

QUESTION 1: Decision Boundaries

Determination of Decision Boundaries for Original and Scaled Data:

Two Linear Support Vector Classifier (LinearSVC) models were trained: one on the original data and one on data scaled with StandardScaler. The decision boundaries of the two models were then plotted side by side for comparison, as in the sketch below.
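A minimal sketch of this comparison is given below. The use of the Iris dataset (restricted to two features so the boundary can be drawn in 2-D) and the specific plotting grid are illustrative assumptions, not a record of the exact code used.

```python
# Sketch: LinearSVC on original vs. StandardScaler-scaled data, plotted side by side.
# Dataset choice (Iris, first two features) is an assumption for illustration.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, :2]          # two features so the boundary can be plotted
y = iris.target

X_scaled = StandardScaler().fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, data, title in zip(axes, [X, X_scaled], ["Original data", "Scaled data"]):
    clf = LinearSVC(max_iter=10000).fit(data, y)

    # Evaluate the classifier on a grid to draw the decision regions.
    xx, yy = np.meshgrid(
        np.linspace(data[:, 0].min() - 1, data[:, 0].max() + 1, 300),
        np.linspace(data[:, 1].min() - 1, data[:, 1].max() + 1, 300),
    )
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(data[:, 0], data[:, 1], c=y, edgecolor="k", s=20)
    ax.set_title(title)

plt.tight_layout()
plt.show()
```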

QUESTION 2: Kernel SVM and Decision Boundaries


1. Creation of an Artificial Dataset

Using scikit-learn's `make_moons` function, a synthetic dataset was created. This dataset has 200 data points, 15%–20% noise, and two classes that form crescent moon shapes. The dataset's synthetic nature makes controlled experimentation and analysis possible.
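A short sketch of how such a dataset might be generated follows; the specific noise level (0.15) and random seed are assumed illustrative values within the range stated above.

```python
# Sketch: generate the two-class crescent-moon dataset described above.
# noise=0.15 and random_state=42 are assumed illustrative values.
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
print(X.shape, y.shape)   # (200, 2) (200,)
```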

2. Decision Boundaries for Various Kernels

1. Implement SVM models using the linear, polynomial, and RBF kernels.

Linear Kernel:

- Strengths: Basic and suitable for linearly separable data.

- Limitations: Ineffective for datasets with non-linear relationships.

Polynomial Kernel:

- Strengths: More versatile than the linear kernel, capable of capturing some non-linear patterns.

- Limitations: Prone to sensitivity towards the choice of the polynomial degree.

RBF Kernel:

- Strengths: Extremely flexible, adept at handling intricate non-linear patterns.

- Limitations: Prone to overfitting if the hyperparameters are not appropriately adjusted.

2. Plot the decision boundaries for each kernel on the synthetic dataset, as in the sketch below.
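The following sketch fits the three kernels and plots their decision boundaries on the moons data; the hyperparameter values (degree, gamma, C) are assumed illustrative defaults rather than the exact settings used in the experiment.

```python
# Sketch: fit SVMs with linear, polynomial, and RBF kernels on the moons data
# and plot each decision boundary. Hyperparameter values are illustrative assumptions.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

kernels = {
    "Linear": SVC(kernel="linear", C=1.0),
    "Polynomial": SVC(kernel="poly", degree=3, C=1.0),
    "RBF": SVC(kernel="rbf", gamma="scale", C=1.0),
}

# Grid over the feature space for drawing decision regions.
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, clf) in zip(axes, kernels.items()):
    clf.fit(X, y)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=20)
    ax.set_title(f"{name} kernel")

plt.tight_layout()
plt.show()
```
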
Kernel Complexity Analysis:

1. Linear Kernel: The linear kernel is appropriate for datasets with well-separated
classes. However, in the unscaled Iris dataset example, it struggles to capture
the curved decision boundary accurately. Feature scaling becomes crucial in this
case, highlighting the importance of preparing the data properly.

2. Polynomial Kernel: The polynomial kernel offers more flexibility than the
linear kernel, allowing it to model more complex decision boundaries. In the
make_moons dataset example, the polynomial kernel SVM fits a polynomial
decision boundary, which is particularly beneficial for datasets exhibiting curved
patterns.

3. RBF Kernel (Radial Basis Function): The RBF kernel stands out as the most
versatile option among the three. It excels at capturing intricate and nonlinear
decision boundaries. In the make_moons dataset, the RBF kernel SVM provides a
smooth, curvilinear decision boundary that adapts well to complex datasets with
non-linear structures. Its adaptability enables it to handle a wide range of
decision boundaries, both simple and intricate.

Interplay of Hyperparameters:

The interplay between the gamma (γ) and C hyperparameters in the RBF kernel
SVM plays a pivotal role in balancing model bias and variance:

- Gamma (γ): A higher γ value makes the decision boundary more sensitive to
individual data points, resulting in a complex and intricately shaped boundary.
Conversely, a lower γ value creates a smoother, less complex boundary. In the
visualizations, you can discern that elevated γ values lead to intricate decision
boundaries, which, if not managed properly, can result in overfitting.

- C: The C hyperparameter influences the SVM's emphasis on accurately classifying training points. A higher C pushes towards a hard margin, meaning the model is less tolerant of misclassification. In contrast, a lower C allows some misclassification, yielding a softer margin. With a low C, the model may generalize effectively but may fit noisy training data less closely.

The dynamic interplay between γ and C allows you to strike a balance between
model bias and variance. In cases where the data is relatively clean and well-
structured, opting for a higher C and a moderate γ can create a hard margin.
However, when dealing with noisy data or aiming for superior generalization, you
may lean towards lower C and γ values.
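
To make the γ/C interplay concrete, the sketch below fits RBF-kernel SVMs over a small grid of γ and C values and plots each resulting boundary; the particular grid values are illustrative assumptions.

```python
# Sketch: visualize how gamma and C shape the RBF decision boundary.
# The grid of gamma/C values is an illustrative assumption.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))

gammas = [0.1, 1, 10]
Cs = [0.1, 1, 100]
fig, axes = plt.subplots(len(gammas), len(Cs), figsize=(12, 10))
for i, gamma in enumerate(gammas):
    for j, C in enumerate(Cs):
        clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
        ax = axes[i, j]
        ax.contourf(xx, yy, Z, alpha=0.3)
        ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=10)
        ax.set_title(f"gamma={gamma}, C={C}")

plt.tight_layout()
plt.show()
```
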
Generalization vs. Overfitting:

Ensuring that the RBF kernel SVM generalizes effectively while fine-tuning
hyperparameters is of utmost importance. Strategies to promote generalization
and mitigate overfitting include:

1. Cross-validation: Employing techniques like cross-validation (e.g., GridSearchCV) facilitates the discovery of the optimal γ and C combination. It assesses the model's performance on different data subsets, diminishing the risk of overfitting (see the sketch after this list).

2. Regularization (C): Adjusting the C parameter enables control of the regularization strength. Excessive C values can induce overfitting, underscoring the significance of finding an ideal C that balances model bias and variance.

3. Gamma (γ): Exploring a range of γ values helps in averting overfitting. High γ values can lead to overly complex models, reinforcing the importance of selecting an appropriate γ value.

4. Data Splitting: Utilizing train-test splitting ensures that the model's performance is evaluated on unseen data, offering insights into its generalization capabilities (also illustrated in the sketch below).

5. Visualization: The visual examination of decision boundaries aids in assessing the model's complexity and suitability for the dataset, thereby helping in the quest for generalization without overfitting.
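A sketch combining the cross-validation and train-test splitting strategies above is given here; the parameter grid and split ratio are assumed illustrative choices.

```python
# Sketch: tune gamma and C with GridSearchCV and check generalization on a
# held-out test split. Parameter grid and test_size are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", search.best_estimator_.score(X_test, y_test))
```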
