SVM Report


B22bb009

Anchitya Kumar

ASSIGNMENT_4_LAB REPORT

QUESTION 1: Decision Boundaries

Determination of Decision Boundaries for Original and Scaled Data:

Two Linear Support Vector Classifier (LinearSVC) models were trained: one on the original data and one on data scaled with StandardScaler. The decision boundaries of the two models were then plotted side by side for comparison, as in the sketch below.
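A minimal sketch of this comparison is given below. The use of the Iris dataset (restricted to two features so the boundary can be drawn in 2-D) and the specific plotting grid are illustrative assumptions, not a record of the exact code used.

```python
# Sketch: LinearSVC on original vs. StandardScaler-scaled data, plotted side by side.
# Dataset choice (Iris, first two features) is an assumption for illustration.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, :2]          # two features so the boundary can be plotted
y = iris.target

X_scaled = StandardScaler().fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, data, title in zip(axes, [X, X_scaled], ["Original data", "Scaled data"]):
    clf = LinearSVC(max_iter=10000).fit(data, y)

    # Evaluate the classifier on a grid to draw the decision regions.
    xx, yy = np.meshgrid(
        np.linspace(data[:, 0].min() - 1, data[:, 0].max() + 1, 300),
        np.linspace(data[:, 1].min() - 1, data[:, 1].max() + 1, 300),
    )
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(data[:, 0], data[:, 1], c=y, edgecolor="k", s=20)
    ax.set_title(title)

plt.tight_layout()
plt.show()
```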

QUESTION 2: Kernel SVM and Decision Boundaries


1. Creation of an Artificial Dataset

Using scikit-learn's `make_moons` function, a synthetic dataset was created. This dataset has 200 data points, 15%–20% noise, and two classes that form crescent moon shapes. The dataset's synthetic nature makes controlled experimentation and analysis possible.
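A short sketch of how such a dataset might be generated follows; the specific noise level (0.15) and random seed are assumed illustrative values within the range stated above.

```python
# Sketch: generate the two-class crescent-moon dataset described above.
# noise=0.15 and random_state=42 are assumed illustrative values.
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
print(X.shape, y.shape)   # (200, 2) (200,)
```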

2. Decision Boundaries for Various Kernels

1. Implement SVM models using the linear, polynomial, and RBF kernels.

Linear Kernel:

- Strengths: Basic and suitable for linearly separable data.

- Limitations: Ineffective for datasets with non-linear relationships.

Polynomial Kernel:

- Strengths: More versatile than the linear kernel, capable of capturing some non-linear patterns.

- Limitations: Prone to sensitivity towards the choice of the polynomial degree.

RBF Kernel:

- Strengths: Extremely flexible, adept at handling intricate non-linear patterns.

- Limitations: Prone to overfitting if the hyperparameters are not appropriately adjusted.

2. Plot the decision boundaries for each kernel on the synthetic dataset, as in the sketch below.
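The following sketch fits the three kernels and plots their decision boundaries on the moons data; the hyperparameter values (degree, gamma, C) are assumed illustrative defaults rather than the exact settings used in the experiment.

```python
# Sketch: fit SVMs with linear, polynomial, and RBF kernels on the moons data
# and plot each decision boundary. Hyperparameter values are illustrative assumptions.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

kernels = {
    "Linear": SVC(kernel="linear", C=1.0),
    "Polynomial": SVC(kernel="poly", degree=3, C=1.0),
    "RBF": SVC(kernel="rbf", gamma="scale", C=1.0),
}

# Grid over the feature space for drawing decision regions.
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, clf) in zip(axes, kernels.items()):
    clf.fit(X, y)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=20)
    ax.set_title(f"{name} kernel")

plt.tight_layout()
plt.show()
```
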
Kernel Complexity Analysis:

1. Linear Kernel: The linear kernel is appropriate for datasets with well-separated
classes. However, in the unscaled Iris dataset example, it struggles to capture
the curved decision boundary accurately. Feature scaling becomes crucial in this
case, highlighting the importance of preparing the data properly.

2. Polynomial Kernel: The polynomial kernel offers more flexibility than the
linear kernel, allowing it to model more complex decision boundaries. In the
make_moons dataset example, the polynomial kernel SVM fits a polynomial
decision boundary, which is particularly beneficial for datasets exhibiting curved
patterns.

3. RBF Kernel (Radial Basis Function): The RBF kernel stands out as the most
versatile option among the three. It excels at capturing intricate and nonlinear
decision boundaries. In the make_moons dataset, the RBF kernel SVM provides a
smooth, curvilinear decision boundary that adapts well to complex datasets with
non-linear structures. Its adaptability enables it to handle a wide range of
decision boundaries, both simple and intricate.

Interplay of Hyperparameters:

The interplay between the gamma (γ) and C hyperparameters in the RBF kernel
SVM plays a pivotal role in balancing model bias and variance:

- Gamma (γ): A higher γ value makes the decision boundary more sensitive to
individual data points, resulting in a complex and intricately shaped boundary.
Conversely, a lower γ value creates a smoother, less complex boundary. In the
visualizations, you can discern that elevated γ values lead to intricate decision
boundaries, which, if not managed properly, can result in overfitting.

- C: The C hyperparameter influences the SVM's emphasis on accurately classifying training points. A higher C pushes towards a hard margin, meaning the model is less tolerant of misclassification. In contrast, a lower C allows some misclassification, yielding a softer margin. With a low C, the model may generalize effectively but may fit noisy training data less closely.

The dynamic interplay between γ and C allows you to strike a balance between
model bias and variance. In cases where the data is relatively clean and well-
structured, opting for a higher C and a moderate γ can create a hard margin.
However, when dealing with noisy data or aiming for superior generalization, you
may lean towards lower C and γ values.
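
To make the γ/C interplay concrete, the sketch below fits RBF-kernel SVMs over a small grid of γ and C values and plots each resulting boundary; the particular grid values are illustrative assumptions.

```python
# Sketch: visualize how gamma and C shape the RBF decision boundary.
# The grid of gamma/C values is an illustrative assumption.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))

gammas = [0.1, 1, 10]
Cs = [0.1, 1, 100]
fig, axes = plt.subplots(len(gammas), len(Cs), figsize=(12, 10))
for i, gamma in enumerate(gammas):
    for j, C in enumerate(Cs):
        clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
        ax = axes[i, j]
        ax.contourf(xx, yy, Z, alpha=0.3)
        ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=10)
        ax.set_title(f"gamma={gamma}, C={C}")

plt.tight_layout()
plt.show()
```
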
Generalization vs. Overfitting:

Ensuring that the RBF kernel SVM generalizes effectively while fine-tuning
hyperparameters is of utmost importance. Strategies to promote generalization
and mitigate overfitting include:

1. Cross-validation: Employing techniques like cross-validation (e.g., GridSearchCV) facilitates the discovery of the optimal γ and C combination. It assesses the model's performance on different data subsets, diminishing the risk of overfitting (see the sketch after this list).

2. Regularization (C): Adjusting the C parameter enables control of the regularization strength. Excessive C values can induce overfitting, underscoring the significance of finding an ideal C that balances model bias and variance.

3. Gamma (γ): Exploring a range of γ values helps in averting overfitting. High γ values can lead to overly complex models, reinforcing the importance of selecting an appropriate γ value.

4. Data Splitting: Utilizing train-test splitting ensures that the model's performance is evaluated on unseen data, offering insights into its generalization capabilities (also illustrated in the sketch below).

5. Visualization: The visual examination of decision boundaries aids in assessing the model's complexity and suitability for the dataset, thereby helping in the quest for generalization without overfitting.
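A sketch combining the cross-validation and train-test splitting strategies above is given here; the parameter grid and split ratio are assumed illustrative choices.

```python
# Sketch: tune gamma and C with GridSearchCV and check generalization on a
# held-out test split. Parameter grid and test_size are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", search.best_estimator_.score(X_test, y_test))
```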
