Machine
Learning Models
Isha Rani
Data & AI Leader
@isharanimicrosoftleader
Linear Regression
Advantages
• Simple and easy to understand.
• Provides insights into relationships between variables.
• Suitable for predicting linear trends.
• Efficient for large datasets.
• Facilitates feature importance analysis.
Disadvantages
• Assumes a linear relationship, limiting applicability.
• Sensitivity to outliers can impact model accuracy.
• May not capture complex, non-linear patterns.
• Assumes independence of errors, which may not always hold.
• Limited in handling categorical or non-continuous data.
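As a concrete illustration of fitting a linear trend, here is a minimal sketch of one-feature least-squares regression in plain Python (the data points are invented for the example):

```python
def fit_simple_linear_regression(xs, ys):
    # Closed-form OLS for a single feature:
    # slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1
slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

Because the toy data is perfectly linear, the fit recovers the slope and intercept exactly; real data would leave a residual error.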
Logistic Regression
Advantages
• Effective for binary classification problems.
• Provides probabilities of class membership.
• Less prone to overfitting, especially with regularization.
• Works well when the relationship between features and outcome is approximately linear.
• With regularization, copes reasonably well with irrelevant features.
Disadvantages
• Assumes a linear relationship between features and log odds.
• Sensitive to outliers and multicollinearity.
• May struggle with high-dimensional datasets.
• Requires a large sample size for stable results.
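A minimal sketch of how logistic regression turns a linear score into a class probability, trained with plain stochastic gradient descent on made-up one-feature data (the learning rate and epoch count are arbitrary choices):

```python
import math

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    # Single-feature logistic regression; ys are 0/1 labels.
    # Gradient step per sample: w += lr * (y - p) * x, b += lr * (y - p)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def predict_proba(w, b, x):
    # Sigmoid of the linear score gives P(class = 1)
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Toy data: negatives on the left, positives on the right
w, b = train_logistic([-2, -1, 1, 2], [0, 0, 1, 1])
```

The returned probabilities (rather than hard labels) are what the second advantage above refers to.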
Decision Tree
Advantages
• Easily interpretable and visualizable.
• No assumptions about the distribution of data.
• Can handle both numerical and categorical data.
• Automatically selects important features.
• Robust to outliers in the data.
• Requires minimal data preprocessing.
Disadvantages
• Prone to overfitting, especially with deep trees.
• Instability: small variations in the data may lead to different splits.
• Biased towards features with more levels.
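The splitting idea can be sketched as a single decision stump that picks the threshold minimising Gini impurity (the toy data is invented for the example):

```python
def gini(labels):
    # Gini impurity for binary 0/1 labels; 0 means the node is pure.
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_split(xs, ys):
    # Try every candidate threshold and keep the one with the lowest
    # weighted Gini impurity of the resulting left/right partitions.
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Two well-separated groups: the stump should split between 3 and 10
threshold = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

A full tree applies this search recursively to each partition, which is where the depth-related overfitting above comes from.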
K-Nearest Neighbour
Advantages
• Simple and easy to implement.
• Adapts well to changes in the dataset.
• Effective for both classification and regression tasks.
• No model training phase, making it versatile for dynamic data.
• Non-parametric: makes no assumptions about the underlying data distribution.
Disadvantages
• Computationally expensive for large datasets.
• Requires a suitable distance metric for accurate results.
• Noisy data and irrelevant features can significantly impact predictions.
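A minimal sketch of the core idea: classify a query point by majority vote among its k nearest training points (the sample points are made up):

```python
from collections import Counter

def knn_classify(train, query, k=3):
    # train: list of (feature_vector, label) pairs; query: feature vector.
    # Sort by Euclidean distance, then take a majority vote among the top k.
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two well-separated clusters of labelled points
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
```

Note that every prediction scans the whole training set, which is exactly the computational cost the first disadvantage describes.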
K-Means
Advantages
• Efficient and computationally faster for large datasets.
• Simple and intuitive algorithm for unsupervised clustering.
• Guaranteed to converge (to a local optimum).
• Scalable to a large number of dimensions/features.
• Works well with spherical or isotropic clusters.
• Can be used for preliminary data exploration and segmentation.
Disadvantages
• Sensitive to the initial placement of centroids.
• Assumes clusters with similar size and density.
• Sensitive to outliers, which can distort cluster boundaries.
• Requires the specification of the number of clusters 'k' in advance.
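Lloyd's algorithm, the standard K-Means procedure, can be sketched in a few lines for one-dimensional data (seeding centroids from the first k points is a simplification of the sensitivity noted above; real implementations use smarter initialisation such as k-means++):

```python
def kmeans_1d(points, k, iters=10):
    # Lloyd's algorithm on 1-D data: alternate between assigning each
    # point to its nearest centroid and recomputing centroids as means.
    centroids = points[:k]  # naive seeding; illustrates init sensitivity
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups around 2 and 11
centres = kmeans_1d([1, 2, 3, 10, 11, 12], 2)
```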
Support Vector Machine (SVM)
Disadvantages
• Computationally intensive, especially with large datasets.
• Difficult to interpret and visualize complex decision boundaries.
• Choice of kernel and associated parameters may require careful tuning.
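To illustrate why kernel choice matters, here is the radial basis function (RBF) kernel, one common option; `gamma` is the kind of tuning parameter the last bullet refers to (the value 0.5 is an arbitrary example):

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    # RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2).
    # Nearby points score close to 1; distant points score close to 0.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

A small `gamma` makes distant points look similar (smoother boundary); a large `gamma` makes the boundary wiggly and prone to overfitting, which is why this parameter needs careful tuning.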
Principal Component Analysis (PCA)
Disadvantages
• Assumes linear relationships between variables, limiting its applicability.
• May lose some information when reducing dimensionality.
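The first principal component of mean-centred 2-D data can be sketched with power iteration on the covariance matrix (a simplification of full PCA, which computes all components at once):

```python
def first_principal_component(data, iters=100):
    # Power iteration: repeatedly multiply a unit vector by the 2x2
    # covariance matrix; it converges to the top eigenvector, i.e. the
    # direction of maximum variance.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centred = [(x - mx, y - my) for x, y in data]
    cxx = sum(x * x for x, _ in centred) / n
    cyy = sum(y * y for _, y in centred) / n
    cxy = sum(x * y for x, y in centred) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v

# Points lying on the line y = x: the component should be (1,1)/sqrt(2)
v = first_principal_component([(1, 1), (2, 2), (3, 3), (4, 4)])
```

Projecting onto fewer components than the data has dimensions is exactly where the information loss in the second bullet occurs.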
Naive Bayes
Advantages
• Simple and computationally efficient, especially for large datasets.
• Requires a small amount of training data to estimate parameters.
• Handles irrelevant features well due to the independence assumption.
• Well-suited for online learning and real-time applications.
Disadvantages
• May struggle with capturing complex relationships in the data.
• Cannot model interactions between features.
• Sensitivity to the quality of the input data and features.
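A minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing on a made-up spam/ham example, showing how little training data the method needs:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    # docs: list of (token_list, label). Count class priors and
    # per-class word frequencies; also collect the vocabulary.
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    # Score each class by log prior + sum of smoothed log likelihoods;
    # the "naive" step is multiplying per-word probabilities as if the
    # words were independent given the class.
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_lp = None, float("-inf")
    for label, lc in label_counts.items():
        lp = math.log(lc / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[label][t] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

# Four tiny training documents invented for the example
model = train_nb([(["free", "win"], "spam"), (["win", "cash"], "spam"),
                  (["meeting", "today"], "ham"), (["project", "meeting"], "ham")])
```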
Artificial Neural Network (ANN)
Advantages
• Capable of learning complex non-linear relationships in data.
• Can automatically extract relevant features from raw data.
• Parallel processing capability enhances efficiency for certain tasks.
• Adaptable to various problem types, including classification and regression.
• Robust to noisy data and can handle large, high-dimensional datasets.
Disadvantages
• Prone to overfitting, especially with limited training data.
• Difficulties in determining the optimal architecture and hyperparameters.
• Computationally intensive, requiring substantial resources for training.
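To illustrate the first advantage, here is a tiny one-hidden-layer network whose weights approximate XOR, a non-linear pattern no linear model can represent (the weights are hand-picked for the example, not learned; training would use backpropagation):

```python
import math

def forward(x, w1, b1, w2, b2):
    # One hidden layer with tanh activations, then a linear output unit.
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(wo * h for wo, h in zip(w2, hidden)) + b2

# Hand-picked weights: hidden unit 1 fires when at least one input is on,
# hidden unit 2 fires when both are on; the output subtracts them, so the
# result is large only for exactly one active input (XOR).
W1, B1 = [[4.0, 4.0], [4.0, 4.0]], [-2.0, -6.0]
W2, B2 = [1.0, -1.0], 0.0
```

Thresholding the output at 1.0 reproduces XOR on all four input pairs; the non-linear tanh in the hidden layer is what makes this possible.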