
Isha Rani

Data & AI Leader


@isharanimicrosoftleader

Machine Learning Models

Linear Regression
Advantages
• Simple and easy to understand.
• Provides insights into relationships
between variables.
• Suitable for predicting linear trends.
• Efficient for large datasets.
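• Coefficients show the direction and strength of each feature's effect, which supports feature importance analysis when features are on comparable scales.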

Disadvantages
• Assumes a linear relationship, limiting applicability.
• Sensitivity to outliers can impact model accuracy.
• May not capture complex, non-linear patterns.
• Assumes independence of errors, which may not always hold.
• Limited in handling categorical or non-continuous data: categorical predictors must be encoded first, and the target must be continuous.
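The sensitivity to outliers listed above can be seen in a minimal sketch. The `fit_line` helper and the toy data are illustrative assumptions, not part of the original slides; the helper implements ordinary least squares for a single feature.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: the clean targets follow y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_ys = [2.0, 4.0, 6.0, 8.0, 10.0]
outlier_ys = [2.0, 4.0, 6.0, 8.0, 30.0]  # only the last point is corrupted

clean_slope, clean_intercept = fit_line(xs, clean_ys)
outlier_slope, outlier_intercept = fit_line(xs, outlier_ys)

print(clean_slope, clean_intercept)      # slope 2, intercept 0
print(outlier_slope, outlier_intercept)  # slope 6, intercept -8
```

A single corrupted point out of five triples the fitted slope, because squared errors give large residuals a disproportionate weight.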

Logistic Regression
Advantages
• Effective for binary classification problems.
• Provides probabilities of class membership.
• Less prone to overfitting, especially with regularization.
• Works well when the relationship between the features and the log odds of the outcome is approximately linear.
• Tolerates moderate noise, and regularization helps downweight irrelevant features.

Disadvantages
• Assumes a linear relationship between features and log odds.
• Sensitive to outliers and multicollinearity.
• May struggle with high-dimensional datasets unless regularization is applied.
• Requires a large sample size for stable results.
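As a rough illustration of how logistic regression turns a linear score into a class probability, the sketch below applies the sigmoid function to the log odds. The weights, bias, and `predict_proba` helper are hypothetical values chosen for the example, not fitted parameters.

```python
import math


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    # The log odds are a linear function of the features.
    log_odds = bias + sum(w * x for w, x in zip(weights, features))
    # The sigmoid maps any real-valued log odds into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


weights = [0.8, -0.5]  # hypothetical coefficients
bias = -0.2            # hypothetical intercept

p = predict_proba([2.0, 1.0], weights, bias)  # log odds = 0.9
label = int(p >= 0.5)                         # threshold at 0.5

print(round(p, 3), label)
```

Because the output is a probability rather than a hard label, the decision threshold can be moved away from 0.5 when the two kinds of error have different costs.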

Decision Tree
Advantages
• Easily interpretable and visualizable.
• No assumptions about the distribution of data.
• Can handle both numerical and categorical data.
• Automatically selects important features.
• Robust to outliers in the data.
• Requires minimal data preprocessing.

Disadvantages
• Prone to overfitting, especially with deep trees.
• Instability – small variations in data may lead to different
splits.
• Biased towards features with more levels.
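One way to see how a tree picks informative features is to score candidate splits by impurity. The sketch below assumes Gini impurity as the criterion (the slides do not name one), and the helpers `gini` and `best_split` and the toy data are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


labels = [0, 0, 0, 1, 1, 1]
informative = [1, 2, 3, 4, 5, 6]  # separates the classes cleanly
shuffled = [5, 1, 4, 2, 6, 3]     # carries little information about the label

threshold_a, score_a = best_split(informative, labels)
threshold_b, score_b = best_split(shuffled, labels)

print(threshold_a, score_a)  # a threshold of 3.5 gives two pure children
print(score_b > score_a)     # the uninformative feature scores worse
```

A tree builder would choose the first feature at this node, which is the sense in which feature selection happens automatically. The same greedy search is also why small changes in the data can change which split wins.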

K-Nearest Neighbour
Advantages
• Simple and easy to implement.
• Adapts well to changes in the dataset.
• Effective for both classification and regression tasks.
• No model training phase, making it versatile for dynamic data.
• With a sufficiently large k, voting over several neighbours dampens the effect of isolated noisy points.

Disadvantages
• Computationally expensive for large datasets.
• Requires a suitable distance metric for accurate results.
• Noisy data and irrelevant features can significantly impact
predictions.
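The effect of noise on K-Nearest Neighbour predictions depends heavily on k. The sketch below is a minimal majority-vote classifier using Euclidean distance; the `knn_predict` helper and the toy points, including one deliberately mislabelled point, are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points closest to the query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),  # class "a" cluster
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0),  # class "b" cluster
    (1.0, 1.0),                          # noisy point: sits in "a" territory
]
train_labels = ["a", "a", "a", "b", "b", "b", "b"]

query = (0.9, 0.9)
prediction_k1 = knn_predict(train_points, train_labels, query, k=1)
prediction_k3 = knn_predict(train_points, train_labels, query, k=3)

print(prediction_k1)  # follows the single noisy neighbour
print(prediction_k3)  # the two clean neighbours outvote it
```

Every prediction scans the whole training set, which is the source of the computational cost on large datasets.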

K-Means
Advantages
• Computationally efficient, even on large datasets.
• Simple and intuitive algorithm for unsupervised clustering.
• Easily adaptable to different types of numeric data, though not to arbitrarily shaped clusters.
• Scalable to a large number of dimensions/features.
• Works well with spherical or isotropic clusters.
• Can be used for preliminary data exploration and
segmentation.

Disadvantages
• Sensitive to the initial placement of centroids.
• Assumes clusters with similar size and density.
• Sensitive to outliers, which can distort cluster boundaries.
• Requires the specification of the number of clusters 'k' in
advance.
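Sensitivity to the initial centroids can be demonstrated with a small sketch of the standard iterative procedure (assign points to the nearest centroid, then recompute centroids as cluster means). The `kmeans` helper, the four toy points, and both initializations are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Basic K-Means loop; returns (final_centroids, sum_of_squared_errors)."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, sse


# Four corners of a wide rectangle: the natural clusters are left and right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_centroids, good_sse = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
bad_centroids, bad_sse = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_centroids, good_sse)  # left/right split, error 1.0
print(bad_centroids, bad_sse)    # stuck on a top/bottom split, error 16.0
```

Both runs converge, but the second settles in a much worse local optimum. Running the algorithm from several initializations and keeping the lowest-error result is a common mitigation.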

Support Vector Machine


Advantages
• Effective in high-dimensional spaces, even with limited data.
• Robust in handling non-linear decision boundaries through
kernel functions.
• Optimizes the margin, promoting generalization to unseen
data.
• Versatile, suitable for both classification and regression tasks.
• Resistant to overfitting, especially in high-dimensional spaces.

Disadvantages
• Computationally intensive, especially with large datasets.
• Difficult to interpret and visualize complex decision
boundaries.
• Choice of kernel and associated parameters may require
careful tuning.
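The idea behind kernel functions can be sketched with a small numeric check. The example assumes a degree-2 polynomial kernel, chosen only because its feature map is easy to write out; the slides do not name a specific kernel, and the function names are illustrative.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel on 2-D inputs: (x . y) ** 2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature space that the kernel above corresponds to."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x = (1.0, 2.0)
y = (3.0, 4.0)

kernel_value = poly_kernel(x, y)                      # computed in 2-D
explicit_value = dot(feature_map(x), feature_map(y))  # computed in 3-D

print(kernel_value, explicit_value)  # both are 121, up to rounding
```

The kernel returns the inner product in the higher-dimensional space without ever constructing that space, which is how a linear margin there can become a non-linear boundary in the original features.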

Principal Component Analysis


Advantages
• Reduces dimensionality while retaining most of the variability
in the data.
• Uncovers underlying patterns and relationships between
features.
• Mitigates multicollinearity issues in regression and
classification models.
• Useful for visualization and exploration of high-dimensional
datasets.

Disadvantages
• Assumes linear relationships between variables, limiting its
applicability.
• May lose some information when reducing dimensionality.
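How much variability a single principal component retains can be computed directly in two dimensions, where the eigenvalues of the covariance matrix have a closed form. The helper name and the toy data below are assumptions made for this sketch.

```python
import math


def explained_variance_ratio_2d(points):
    """Share of total variance captured by the first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    half_trace = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    first, second = half_trace + spread, half_trace - spread
    return first / (first + second)


# Hypothetical data: two strongly correlated features.
correlated = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
# Hypothetical data: two uncorrelated features with equal variance.
uncorrelated = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

ratio_correlated = explained_variance_ratio_2d(correlated)
ratio_uncorrelated = explained_variance_ratio_2d(uncorrelated)

print(round(ratio_correlated, 4))    # close to 1: one component suffices
print(round(ratio_uncorrelated, 4))  # 0.5: dropping a component loses half
```

When features are strongly correlated, dropping the second component loses very little. When they are uncorrelated, the information loss noted above becomes substantial.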

Naive Bayes
Advantages
• Simple and computationally efficient, especially for large
datasets.
• Requires a small amount of training data to estimate
parameters.
• Handles irrelevant features well due to the independence
assumption.
• Well-suited for online learning and real-time applications.

Disadvantages
• May struggle with capturing complex relationships in the data.
• Cannot model interactions between features.
• Sensitivity to the quality of the input data and features.
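The independence assumption can be made concrete with a small word-count classifier. The sketch assumes a multinomial model with add-one (Laplace) smoothing; the `train` and `predict` helpers and the four toy messages are hypothetical.

```python
import math
from collections import Counter


def train(docs):
    """docs is a list of (words, label) pairs; returns simple count tables."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {word for words, _ in docs for word in words}
    return class_counts, word_counts, vocab


def predict(words, model):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in words:
            # Independence assumption: each word contributes its own factor.
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


docs = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "meeting"], "ham"),
]
model = train(docs)

spam_guess = predict(["free", "money"], model)
ham_guess = predict(["meeting", "noon"], model)
print(spam_guess, ham_guess)
```

Training only counts occurrences, so the model can be updated incrementally as new messages arrive. The per-word factors are also why interactions between features are invisible to it.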

Artificial Neural Network (ANN)
Advantages
• Capable of learning complex non-linear relationships in data.
• Can automatically extract relevant features from raw data.
• Parallel processing capability enhances efficiency for certain
tasks.
• Adaptable to various problem types, including classification
and regression.
• Robust to noisy data and can handle large, high-dimensional
datasets.

Disadvantages
• Prone to overfitting, especially with limited training data.
• Difficulties in determining the optimal architecture and
hyperparameters.
• Computationally intensive, requiring substantial resources for
training.
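A hidden layer is what lets a network represent relationships that no single linear unit can. The sketch below wires up XOR with hand-set weights and a step activation; the weights are chosen by hand for illustration rather than learned, and the function names are hypothetical.

```python
def step(z):
    """Threshold activation: fires when the weighted input is positive."""
    return 1 if z > 0 else 0


def xor_network(x1, x2):
    # Hidden layer: one unit behaves like OR, the other like AND.
    hidden_or = step(x1 + x2 - 0.5)
    hidden_and = step(x1 + x2 - 1.5)
    # Output unit: fires when OR is on and AND is off.
    return step(hidden_or - hidden_and - 0.5)


outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0, 1, 1, 0]
```

In practice the weights are found by gradient-based training with differentiable activations, which is where the cost and the hyperparameter choices listed above come in.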
