Machine
Learning Models
Isha Rani
Data & AI Leader
@isharanimicrosoftleader
Linear Regression
Advantages
• Simple and easy to understand.
• Provides insights into relationships between variables.
• Suitable for predicting linear trends.
• Efficient for large datasets.
• Facilitates feature importance analysis.
Disadvantages
• Assumes a linear relationship, limiting applicability.
• Sensitivity to outliers can impact model accuracy.
• May not capture complex, non-linear patterns.
• Assumes independence of errors, which may not always hold.
• Limited in handling categorical or non-continuous data.
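As a concrete illustration of fitting a linear trend, here is a minimal sketch of one-feature least-squares regression in plain Python (the data points are invented for the example):

```python
def fit_simple_linear_regression(xs, ys):
    # Closed-form OLS for a single feature:
    # slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1
slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

Because the toy data is perfectly linear, the fit recovers the slope and intercept exactly; real data would leave a residual error.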
Logistic Regression
Advantages
• Effective for binary classification problems.
• Provides probabilities of class membership.
• Less prone to overfitting, especially with regularization.
• Works well when the relationship between features and outcome is approximately linear.
• With regularization, copes reasonably well with irrelevant features.
Disadvantages
• Assumes a linear relationship between features and log odds.
• Sensitive to outliers and multicollinearity.
• May struggle with high-dimensional datasets.
• Requires a large sample size for stable results.
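A minimal sketch of how logistic regression turns a linear score into a class probability, trained with plain stochastic gradient descent on made-up one-feature data (the learning rate and epoch count are arbitrary choices):

```python
import math

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    # Single-feature logistic regression; ys are 0/1 labels.
    # Gradient step per sample: w += lr * (y - p) * x, b += lr * (y - p)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def predict_proba(w, b, x):
    # Sigmoid of the linear score gives P(class = 1)
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Toy data: negatives on the left, positives on the right
w, b = train_logistic([-2, -1, 1, 2], [0, 0, 1, 1])
```

The returned probabilities (rather than hard labels) are what the second advantage above refers to.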
Decision Tree
Advantages
• Easily interpretable and visualizable.
• No assumptions about the distribution of data.
• Can handle both numerical and categorical data.
• Automatically selects important features.
• Robust to outliers in the data.
• Requires minimal data preprocessing.
Disadvantages
• Prone to overfitting, especially with deep trees.
• Instability: small variations in the data may lead to different splits.
• Biased towards features with more levels.
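The splitting idea can be sketched as a single decision stump that picks the threshold minimising Gini impurity (the toy data is invented for the example):

```python
def gini(labels):
    # Gini impurity for binary 0/1 labels; 0 means the node is pure.
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_split(xs, ys):
    # Try every candidate threshold and keep the one with the lowest
    # weighted Gini impurity of the resulting left/right partitions.
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Two well-separated groups: the stump should split between 3 and 10
threshold = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

A full tree applies this search recursively to each partition, which is where the depth-related overfitting above comes from.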
K-Nearest Neighbour
Advantages
• Simple and easy to implement.
• Adapts well to changes in the dataset.
• Effective for both classification and regression tasks.
• No model training phase, making it versatile for dynamic data.
• Non-parametric: makes no assumptions about the underlying data distribution.
Disadvantages
• Computationally expensive for large datasets.
• Requires a suitable distance metric for accurate results.
• Noisy data and irrelevant features can significantly impact predictions.
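A minimal sketch of the core idea: classify a query point by majority vote among its k nearest training points (the sample points are made up):

```python
from collections import Counter

def knn_classify(train, query, k=3):
    # train: list of (feature_vector, label) pairs; query: feature vector.
    # Sort by Euclidean distance, then take a majority vote among the top k.
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two well-separated clusters of labelled points
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
```

Note that every prediction scans the whole training set, which is exactly the computational cost the first disadvantage describes.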
K-Means
Advantages
• Efficient and computationally faster for large datasets.
• Simple and intuitive algorithm for unsupervised clustering.
• Guaranteed to converge (to a local optimum).
• Scalable to a large number of dimensions/features.
• Works well with spherical or isotropic clusters.
• Can be used for preliminary data exploration and segmentation.
Disadvantages
• Sensitive to the initial placement of centroids.
• Assumes clusters with similar size and density.
• Sensitive to outliers, which can distort cluster boundaries.
• Requires the specification of the number of clusters 'k' in advance.
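Lloyd's algorithm, the standard K-Means procedure, can be sketched in a few lines for one-dimensional data (seeding centroids from the first k points is a simplification of the sensitivity noted above; real implementations use smarter initialisation such as k-means++):

```python
def kmeans_1d(points, k, iters=10):
    # Lloyd's algorithm on 1-D data: alternate between assigning each
    # point to its nearest centroid and recomputing centroids as means.
    centroids = points[:k]  # naive seeding; illustrates init sensitivity
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups around 2 and 11
centres = kmeans_1d([1, 2, 3, 10, 11, 12], 2)
```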
Support Vector Machine (SVM)
Disadvantages
• Computationally intensive, especially with large datasets.
• Difficult to interpret and visualize complex decision boundaries.
• Choice of kernel and associated parameters may require careful tuning.
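To illustrate why kernel choice matters, here is the radial basis function (RBF) kernel, one common option; `gamma` is the kind of tuning parameter the last bullet refers to (the value 0.5 is an arbitrary example):

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    # RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2).
    # Nearby points score close to 1; distant points score close to 0.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

A small `gamma` makes distant points look similar (smoother boundary); a large `gamma` makes the boundary wiggly and prone to overfitting, which is why this parameter needs careful tuning.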
Principal Component Analysis (PCA)
Disadvantages
• Assumes linear relationships between variables, limiting its applicability.
• May lose some information when reducing dimensionality.
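The first principal component of mean-centred 2-D data can be sketched with power iteration on the covariance matrix (a simplification of full PCA, which computes all components at once):

```python
def first_principal_component(data, iters=100):
    # Power iteration: repeatedly multiply a unit vector by the 2x2
    # covariance matrix; it converges to the top eigenvector, i.e. the
    # direction of maximum variance.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centred = [(x - mx, y - my) for x, y in data]
    cxx = sum(x * x for x, _ in centred) / n
    cyy = sum(y * y for _, y in centred) / n
    cxy = sum(x * y for x, y in centred) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v

# Points lying on the line y = x: the component should be (1,1)/sqrt(2)
v = first_principal_component([(1, 1), (2, 2), (3, 3), (4, 4)])
```

Projecting onto fewer components than the data has dimensions is exactly where the information loss in the second bullet occurs.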
Naive Bayes
Advantages
• Simple and computationally efficient, especially for large datasets.
• Requires a small amount of training data to estimate parameters.
• Handles irrelevant features well due to the independence assumption.
• Well-suited for online learning and real-time applications.
Disadvantages
• May struggle with capturing complex relationships in the data.
• Cannot model interactions between features.
• Sensitivity to the quality of the input data and features.
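A minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing on a made-up spam/ham example, showing how little training data the method needs:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    # docs: list of (token_list, label). Count class priors and
    # per-class word frequencies; also collect the vocabulary.
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    # Score each class by log prior + sum of smoothed log likelihoods;
    # the "naive" step is multiplying per-word probabilities as if the
    # words were independent given the class.
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_lp = None, float("-inf")
    for label, lc in label_counts.items():
        lp = math.log(lc / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[label][t] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

# Four tiny training documents invented for the example
model = train_nb([(["free", "win"], "spam"), (["win", "cash"], "spam"),
                  (["meeting", "today"], "ham"), (["project", "meeting"], "ham")])
```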
Artificial Neural Network (ANN)
Advantages
• Capable of learning complex non-linear relationships in data.
• Can automatically extract relevant features from raw data.
• Parallel processing capability enhances efficiency for certain tasks.
• Adaptable to various problem types, including classification and regression.
• Robust to noisy data and can handle large, high-dimensional datasets.
Disadvantages
• Prone to overfitting, especially with limited training data.
• Difficulties in determining the optimal architecture and hyperparameters.
• Computationally intensive, requiring substantial resources for training.
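To illustrate the first advantage, here is a tiny one-hidden-layer network whose weights approximate XOR, a non-linear pattern no linear model can represent (the weights are hand-picked for the example, not learned; training would use backpropagation):

```python
import math

def forward(x, w1, b1, w2, b2):
    # One hidden layer with tanh activations, then a linear output unit.
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(wo * h for wo, h in zip(w2, hidden)) + b2

# Hand-picked weights: hidden unit 1 fires when at least one input is on,
# hidden unit 2 fires when both are on; the output subtracts them, so the
# result is large only for exactly one active input (XOR).
W1, B1 = [[4.0, 4.0], [4.0, 4.0]], [-2.0, -6.0]
W2, B2 = [1.0, -1.0], 0.0
```

Thresholding the output at 1.0 reproduces XOR on all four input pairs; the non-linear tanh in the hidden layer is what makes this possible.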