
Data Science

Certificate Program
Delivered In Collaboration with IBM

Contact Hours: 184 Hours

Computer Skills, Basic Knowledge in Maths & Analytical Mindset

No Programming Experience Required

Mode of Delivery: Instructor-Led Training

Course objectives
360DigiTMG's Data Scientist Certification Programme is one of the most comprehensive data
scientist courses in India. It is specially designed to suit both data professionals and beginners who
want to make a career in this fast-growing profession. Over 4 months, students will learn key
techniques such as Statistical Analysis, Regression Analysis, Data Mining, Machine Learning,
Predictive Modeling, Forecasting, Text Mining, Deep Learning, Neural Networks, and tools such as
Python Programming, R Programming, Tableau, SQL, Big Data Hadoop, Spark and Cloud Computing.

Who should attend

• Candidates aspiring to be Data Scientist, Machine Learning Expert, Analytics Manager/Professional, Business Analyst, Data Analyst, etc.
• Graduates who are looking to build a career in Data Science and Machine Learning
• Employees of Organizations who are planning to shift to Data Science and Proactive Predictive Analytics
• Mid-level Executives
• Managers with Knowledge of Basic

Learning Outcomes
The 4-month training will provide some useful guidelines:

• Work with various data generation sources
• Perform text mining to generate customer sentiment analysis
• Analyze structured and unstructured data using different tools and techniques
• Develop an understanding of descriptive and predictive analytics
• Apply data-driven, machine learning approaches for business decisions
• Build prediction models for day-to-day applications
• Perform forecasting to take proactive business decisions
• Use data visualization concepts to represent data for easy understanding

• Lecture with a blend of theoretical & practical exposure
• Discussions & group activities
• Quizzes and final exam to attain certificate
• Internship opportunity for eligible candidates
• Live project opportunity to engage and explore Data Science and Machine Learning with USA-based Analytics Consulting firm - Innodatatics

Data Science Learning path

1. Basics of Python & R Programming
2. Application of Data Science
3. Project Management: Overview of Handling Project
4. Data Collection
5. Data Preparation / Cleansing
6. Data Analysis
7. Data Visualization
8. Machine Learning / Data Mining Supervised Learning
9. Black Box Techniques
10. Text Mining NLP
11. Data Mining & Unsupervised
12. Forecasting / Time Series

Exclusive IBM Modules
Final Exam
Live Projects
Assignments & Practice Sessions
Resume & LinkedIn

Course Outline

Module 1 Data Science Project Management Methodology
Topics
• Introduction to Big Data
• Data, Data, Data everywhere
• Data and its uses – A case study (Grocery store)
• Interactive Marketing using Data & IoT – A case study
• Stages of Analytics
  ◦ Descriptive Analytics
  ◦ Diagnostic Analytics
  ◦ Predictive Analytics
  ◦ Prescriptive Analytics
• Machine Learning Categories
  ◦ Supervised Learning
  ◦ Unsupervised Learning
  ◦ Reinforcement Learning
• Data Science Project Lifecycle
• Frameworks for Building Machine Learning Systems
  ◦ Knowledge Discovery Databases (KDD)
  ◦ SEMMA (Sample, Explore, Modify, Model, Assess)
  ◦ Cross-Industry Standard Process for Data Mining
• CRISP-DM
  ◦ Business Understanding
    - Define Business Problem – Objective and Constraints
    - Assess and Analyze Scenarios
    - Define Data Mining Problem
    - Project Plan
  ◦ Data Understanding
    - Data Collection
    - Data Description
    - Exploratory Data Analysis
    - Data Quality Analysis
  ◦ Data Preparation
    - Data Integration
    - Data Wrangling
    - Feature Extraction and Engineering
    - Attribute Generation and Selection
  ◦ Modeling
    - Selecting Modeling Methods
    - Model Training
    - Model Evaluation and Improving by Tuning
    - Model Assessment
  ◦ Evaluation
  ◦ Deployment

Module 2 Data Understanding: Exploratory Data Analytics (EDA) / Descriptive Analytics
Topics
• Common Data Formats
  ◦ CSV
  ◦ XML
  ◦ SQL (Databases)
• Data Types
  ◦ Numeric (Quantitative)
  ◦ Categorical (Qualitative)
  ◦ Continuous
  ◦ Discrete
  ◦ Count
  ◦ Text
  ◦ Measurement Scales
    - Nominal
    - Ordinal
    - Interval
    - Ratio Types
• Data Collection
  ◦ Primary Sources
    - Surveys
    - Simulations
    - Sensors Data
    - Design of Experiments, etc.
  ◦ Secondary Sources
    - Data Warehouses
    - Data Lakes
    - Databases (SQL, NoSQL, etc.)
• Data and Datasets
  ◦ Structured Data vs. Unstructured Data
  ◦ Big Data vs. Regular Size Data
  ◦ Cross-Sectional Data vs. Time Series Data
  ◦ Balanced vs. Imbalanced Data
  ◦ Offline vs. Real-Time Data
• Population and Sample
  ◦ Sampling Techniques
    - Probability Sampling (Unbiased)
    - Non-Probability Sampling (Biased)
• Sampling Techniques for handling Balanced vs. Imbalanced Datasets
  ◦ Random Resampling - Under & Over Sampling
  ◦ K-fold Cross-Validation
  ◦ SMOTE - Synthetic Minority Oversampling Technique
  ◦ MSMOTE - Modified SMOTE
  ◦ Cluster-Based Sampling
• Sampling Funnel and its Components
  ◦ Population
  ◦ Sampling Frame
  ◦ Simple Random Sampling
  ◦ Sample
• Data Cleansing / Preparation / Wrangling / Munging
  ◦ Outlier Analysis / Treatment
  ◦ Missing Values Handling / Imputation
  ◦ Data Filtering
  ◦ Typecasting
  ◦ Transformations
  ◦ Duplicate Data Handling
  ◦ Managing Categorical Data
  ◦ Standardizing and Normalizing the Data
  ◦ Zero and Near-Zero Variance Feature
• Random Variable and its Definition
• Probability & Probability Distribution
  ◦ Continuous Probability Distribution / Probability Density Function
  ◦ Discrete Probability Distribution / Probability Mass Function

Module 3 Statistical Data Business Intelligence and Data Visualization
Topics
• Measures of Central Tendency
  ◦ Mean/Average
  ◦ Median
  ◦ Mode
• Measures of Dispersion
  ◦ Variance
  ◦ Standard Deviation
  ◦ Range
• Measure of Skewness
• Measure of Kurtosis
• Spread of the Data
• Various Graphical Techniques to Understand Data
  ◦ Univariate
    - Line Charts
    - Bar Plots
    - Dot Charts
    - Histograms / Frequency Distribution
    - Box Plots / Box and Whisker Plots
    - Density Plots
    - Q-Q Plots / Normal Quantile – Quantile Plots
  ◦ Bivariate
    - Scatter Plots
  ◦ Multivariate
    - Pair Plots
    - Heat Maps
    - Correlation Matrix

Module 4 Feature Engineering and Selection
Topics
• Feature Engineering
• Binarization
• Rounding
• Interactions
• Binning
  ◦ Fixed-Width Binning
• Adaptive Binning
• Transformations
  ◦ Log Transform
  ◦ Box-Cox Transform
• Feature Engineering on Numeric Data
• Feature Engineering on Categorical Data
  ◦ Transforming Nominal Features
  ◦ Transforming Ordinal Features
• Encoding Categorical Features
  ◦ One Hot Encoding Scheme
  ◦ Dummy Coding Schema
  ◦ Effect Coding Schema
  ◦ Bin-Counting Schema
  ◦ Feature Hashing Schema
• Feature Engineering on Text Data
• Feature Engineering on Temporal Data
• Feature Engineering on Image Data
• Feature Scaling
  ◦ Standardized Scaling
  ◦ Min-Max Scaling
  ◦ Robust Scaling
• Feature Selection Techniques
  ◦ Threshold-Based Methods
  ◦ Statistical Methods
  ◦ Recursive Feature Elimination
  ◦ Model-Based Selection

Module 5 Probability and Probability Distributions (Continuous & Discrete)

• Discrete Probability Distribution - Binomial
• Continuous Probability Distribution - Normal
• Standard Normal Distribution / Z-Distribution
• Z scores and the Z table
• QQ Plot / Quantile - Quantile plot
• Sample Statistics
• Population Parameters
• Inferential Statistics
• Sampling Variation
• Central Limit Theorem
• Confidence Interval - Concept
• Confidence Interval with Sigma
• t-Distribution / Student's-t Distribution
• Confidence Interval without Sigma
  ◦ Population Parameter Standard Deviation Known
  ◦ Population Parameter Standard Deviation Not Known

Module 6 Confirmatory Analysis - Hypothesis Testing

• Business Understanding
• Formulating Hypothesis Statements
• (Ho) Null Hypothesis – Default Condition / Current Condition / Status Quo
• (Ha/H1) Alternative Hypothesis – Action Condition
• Type I Error (Alpha) – Caused by Rejection of a True Ho
• Type II Error – Caused by No Rejection of a False Ho
• Comparative Study using Hypothesis testing
• Parametric vs. Non-Parametric Test Cases
• Hypothesis Test Cases Based on Variable of Interest being Evaluated
  ◦ Y is Continuous
  ◦ Y is Discrete
• 1 Sample z-test
• 2 Sample t-test
• Mann-Whitney Test
• Paired t-test
• ANOVA
• ANOVA vs. ANOM
• 2 Proportion Tests
• Chi-Square Test
• Tukey Test

Module 7 Data Mining Supervised Learning – Regression Analysis

• Scatter Diagram
  ◦ Correlation Analysis – Direction, Strength, Linearity
  ◦ Correlation vs. Covariance
• Correlation and Causation
• Correlation Coefficient (r)
• Principles of Regression
• Ordinary Least Squares – Unbiased Technique
• Interpretation of Regression Output
  ◦ Coefficients
  ◦ p-values for significance
  ◦ Coefficient of Determination (R²)
• Simple Linear Regression
• Non-Linear Regression Techniques
  ◦ Exponential Regression
  ◦ Logarithmic Regression
  ◦ Polynomial Regression
  ◦ Power Regression
• Zero Intercept Model
• Model Evaluation
  ◦ Loss Function
  ◦ Cost Function
  ◦ Error Function
  ◦ Residuals

Module 8 Predictive Modelling – Multiple Linear Regression

• Multivariate Regression
• LINE assumption
  ◦ Linearity
  ◦ Collinearity (Variance Inflation Factor)
  ◦ Independent Errors
  ◦ Auto Correlation
  ◦ Normality
  ◦ Homoscedasticity / Equal Variance
  ◦ Heteroscedasticity
• Multiple Linear Regression
  ◦ Step AIC
  ◦ Forward Selection
  ◦ Backward Elimination
  ◦ Stepwise Method
• Leverage
• Residuals vs. Predicting Variables Plots
• Fitted vs. Residuals Plot
• Histogram of the Normalized Residuals
• Q-Q plot of the Normalized Residuals
• Shapiro-Wilk Normality Test on the Residuals
• Cook's Distance Plot of the Residuals
• Testing a Subset of Regression Coefficients
• Model Quality Metrics
• Deletion Diagnostics
• Influence Plot
• Added Variable Plots
• Cook's Distance

Module 9 Lasso and Ridge Regressions


• Multiple R² and Adjusted R²
• Understanding Overfitting (Variance) vs. Underfitting (Bias)
• Generalization Error
• Regularization Techniques
  ◦ L1 Norm
  ◦ L2 Norm
• Penalty Term for Cost Function
• LASSO (Least Absolute Shrinkage and Selection Operator) Regression
• Ridge Regression / Tikhonov Regularization
• Elastic Net Regression
• Finding Optimized Alpha

Module 10 Logistic Regression – Binary Value Prediction, MLE

• Principles of Logistic Regression
• Logit Function
• Types of Logistic Regression
• Assumption & Steps in Logistic Regression
• Analysis of Simple Logistic Regression Results
• Multiple Logistic Regression
• Confusion Matrix
  ◦ False Positive, False Negative
  ◦ True Positive, True Negative
• Performance Metrics
  ◦ Precision
  ◦ Sensitivity / Recall
  ◦ Specificity
  ◦ F1 Ratio
• Receiver Operating Characteristics Curve (ROC curve)
• Area Under Curve (AUC)
• Lift Charts and Gain Charts
• Finding the best Cutoff Value
• Risk-Taking vs. Risk-Averse Strategies

Module 11 Multiclass Regression - Multinomial & Ordinal Logistic Regression

• Logit and Log-Likelihood
• Category Baselining
• Modeling (Multi) Nominal Categorical Data
• Modeling Ordinal Categorical Data
• Multilogit Function
• Interpretation of p-values
• Exponential Family of Distributions
  ◦ Bernoulli
  ◦ Dirichlet
  ◦ Gamma
  ◦ Geometric
• Residual Deviance

Module 12 Multiclass Regression - Multinomial & Ordinal Logistic Regression

• Over Dispersion
• Discrete Probability Distribution
  ◦ Negative Binomial Distribution
  ◦ Poisson Distribution
• Poisson Regression
• Poisson Regression with Offset
• Negative Binomial Regression
• Model Fit Test with Residual Deviance
• Interpretation of Negative Binomial Regression Coefficients
• Interpretation of Poisson Regression Coefficients
• Saturated Models
• Effects of Interaction Variables
• Effects of Moderation Variables
• Link Functions
  ◦ Identity Link
  ◦ Log Link
  ◦ Logit Link
  ◦ Probit Link
  ◦ Log-Log Link
• Treatment of Data with Excessive Zeros
  ◦ Zero-Inflated Poisson
  ◦ Zero-Inflated Negative Binomial
  ◦ Hurdle Model

Module 13 Data Mining Supervised Learning – Machine Learning - KNN Classifier

• Parametric Learning
• Building a KNN Model by Splitting the Data
• Calculating Distance
• Bias-Variance Tradeoff
• Improving Model Performance through
• Weighted Voting Process
• Deciding the best K value
• Understanding various generalization and regularization techniques to avoid Over Fitting and Under Fitting

Module 14 Decision Tree
Topics
• Elements of Classification Tree:
  ◦ Root Node
  ◦ Child Node
  ◦ Leaf Node, etc.
• The decision to build a Tree
• The decision on when to stop the growth of a Tree
• Greedy Algorithm
• Measure of Entropy
• Gini Index, Chi-Squared Statistic, Gain Ratio
• Attribute Selection using Information Gain
• Developing a Tree using Information Gain Technique
• Decision Tree C5.0
• Pruning
  ◦ Pre-Pruning
  ◦ Post-Pruning
• Grafting Branches
  ◦ Sub-Tree Raising
  ◦ Sub-Tree Replacement
• Strengths and Weaknesses of the Decision Tree
• Devising Cost Matrix

Module 15 Ensemble Techniques - Bagging and Boosting

• Overfitting
• Underfitting
• Bias vs. Variance
• Voting
  ◦ Soft Voting
  ◦ Hard Voting
• Meta-Learning Methods
• Allocation Functions, Combination
• Stacking / Stack Generalization
• Parallel Model Training - Bagging (Bootstrap Aggregation)
• Sequential Model Training – Boosting
• The culmination of Multiple Trees - Random Forest / Decision Tree Forest
• Variable Importance Plot
• Out-of-Bag Error Rate
• Random Forest with k-Fold Validation
• Strategies of Random Feature Selection
• Ensemble Learning for Regression
• Ensemble Learning for Classification

Module 16 AdaBoost & Extreme Gradient Boosting


• AdaBoost / Adaptive Boosting

• Reweighting
• Gradient Boosting
• Extreme Gradient Boosting (XGB)
• Hyperparameters
  ◦ Cross-Validation
    - Leave One Out CV
    - K-Fold CV
    - Stratified K-Fold CV

Module 17 Introduction to Neural Network
Topics
• Neurons of a Biological Brain
• Artificial Neuron
• Perceptron
• Perceptron Algorithm
• Iterative Approach
  ◦ Threshold Error
  ◦ Predefined Iterations
• Use Case to Classify a Linearly Separable Data
• Multilayer Perceptron to Handle Non-Linear Data

Module 18 Building Blocks of Neural Network
Topics
• Integration Functions
• Activation Functions
• Weights
• Bias
• Learning Rate (eta)
• Error Functions
  ◦ Mean Squared Error
  ◦ Binary Cross-Entropy
  ◦ Cross-Entropy

Module 19 Deep Learning Black Box Technique - Neural Network

• Artificial Neural Networks
• ANN Structure
• Activation Functions
• Error Surface
• Gradient Descent Algorithm
• Backward Propagation
• Network Topology
• Principles of Gradient Descent (Manual Calculation)
• Learning Rate (eta)
  ◦ Momentum
  ◦ Constant Learning Rate
  ◦ Shrinking Learning Rate
• Batch Gradient Descent
• Stochastic Gradient Descent
• Minibatch Stochastic Gradient Descent
• Optimization Methods: Adagrad, Adadelta, RMSprop, Adam

Module 20 Deep Learning Algorithms for Videos, Images, Text
Topics
• Convolution Neural Network (CNN)
  ◦ ImageNet Challenge – Winning Architectures
  ◦ Parameter Explosion with MLPs
  ◦ Convolution Networks
  ◦ Convolution Layers with Filters and Visualizing Convolution Layers
  ◦ Pooling Layer, Padding, Stride
  ◦ Properties of CNN
  ◦ Adversaries
• Recurrent Neural Network
  ◦ Language Models
  ◦ Traditional Language Model
  ◦ Disadvantages of MLP
  ◦ Back Propagation Through Time
  ◦ Long Short-Term Memory (LSTM)
  ◦ LSTM – Architecture
    - Cell State
    - Input Gate
    - Output Gate
    - Forget Gate
    - Sigmoid and Tanh
  ◦ Gated Recurrent Unit (GRU)
  ◦ Architecture & Gates
  ◦ Final Memory at Current Timestep

Module 21 Kernel Method – SVM

• Support Vector Machines / Large-Margin / Max-Margin Classifier
• Hyperplanes
• Best Fit "boundary"
• Linear Support Vector Machine using Maximum Margin
• SVM for Noisy Data
• Non-Linear Space Classification
• Non-Linear Kernel Tricks
  ◦ Linear Kernel
  ◦ Polynomial
  ◦ Sigmoid
  ◦ Gaussian RBF
• SVM for Multi-Class Classification
  ◦ One vs. All
  ◦ One vs. One
• Directed Acyclic Graph (DAG) SVM

Module 22 Text Mining & Natural Language Processing (NLP)

• Sources of Data
• Bag of Words
• Pre-Processing, Corpus Document Term Matrix (DTM) & TDM
• Stemming
• Lemmatization
• TF / TF-IDF
• Word Clouds, Lexical Dispersion Plot
• Co-occurrence Matrix
• Corpus Level Word Clouds
  ◦ Sentiment Analysis
  ◦ Positive Word Clouds
  ◦ Negative Word Clouds
  ◦ Unigram, Bigram, Trigram
• Semantic Network
• Clustering
• Extract User Reviews of the Product/Services from Amazon, Snapdeal and Trip Advisor
• Extraction and Text Analytics in Python
• Latent Dirichlet Allocation (LDA)
• Topic Modelling
• Parts of Speech Tagging
• Sentiment Extraction
• Lexicons & Emotion Mining

Module 23 Machine Learning Classifier Technique - Naive Bayes
Topics
• Probability, Joint Probability, Conditional Probability
• Bayes Rule
• Naïve Bayes Classifier / Probabilistic Classification
• Prior Probability
  ◦ Data Prior
  ◦ Class Prior
  ◦ Marginal Likelihood
• Posterior Probability
• MAP Rule
• Practical Issue in Handling Continuous Attributes
• Underflow Prevention
• Laplace Estimator
• Strengths and Weaknesses of Naïve Bayes
• Text Classification using Naive Bayes
• Hidden Markov Models

Module 24 Data Mining Unsupervised Learning – Clustering
Topics
• Data Mining Process
• Supervised vs Unsupervised Learning
• Measures of Distance
  ◦ Numeric - Euclidean, Manhattan, Mahalanobis
  ◦ Categorical - Binary Euclidean, Simple Matching Coefficient, Jaccard's Coefficient
  ◦ Mixed - Gower's General Dissimilarity Coefficient
• Types of Linkages
  ◦ Single Linkage / Nearest Neighbor
  ◦ Complete Linkage / Farthest Neighbor
  ◦ Average Linkage
  ◦ Centroid Linkage
• Hierarchical Clustering / Agglomerative Clustering
• Non-Hierarchical Clustering / K-Means
  ◦ Measurement Metrics of Clustering
    - Within the Sum of Squares
    - Between the Sum of Squares
    - Total Sum of Squares
  ◦ Choosing the Ideal K value using Screeplot / Elbow Curve
• K-Medians
• K-Medoids
• K-Modes
• Clustering Large Applications (CLARA)
• Partitioning Around Medoids (PAM)
• Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
• Ordering Points to Identify the Clustering Structure (OPTICS)

Module 25 Data Mining Unsupervised Learning - Dimension Reduction

• High Dimensional Data
• Factor Analysis
• Dimension Reduction
• Advantages of PCA
• Calculation of PCA Weights
• Basics of Matrix Algebra
• 2D Visualization using Principal
• Linear Discriminant Analysis
• Singular Value Decomposition

Module 26 Data Mining Unsupervised Learning - Association Rules

• Market Basket / Affinity Analysis / Relationship Mining
• If-Then Probabilistic Statements – Classification Rules
• Measure of Association
  ◦ Support
  ◦ Confidence
  ◦ Lift Ratio
• Frequent Item Sets
• Drawbacks of Measures of Association
• Sparse Matrix and Density Calculation
• Apriori Algorithm
• Visualizing Transaction Data
• 3 Categories of Association Rules
  ◦ Actionable
  ◦ Trivial
  ◦ Inexplicable
• Sequential Pattern Mining

Module 27 Recommendation Engine

• User-Based Collaborative Filtering
• The measure of Distance/Similarity between users
• Driver for Recommendation
• Computation Reduction Techniques
• Item to Item Collaborative Filtering
• Search-Based Methods
• Content-Based Filtering
• Hybrid-Recommendation Engine
• Popularity Based Recommendation Engine
• SVD in Recommendation
• Matrix Factorization Based Recommendation Engine
• The vulnerability of Recommender Systems

Module 28 Network / Graph Analytics

• Definition of a Network / Graph
• Vertices / Nodes
• Edges / Connections / Links
  ◦ Adjacency Matrix
  ◦ Unidirectional
  ◦ Bidirectional
• Node Properties
  ◦ Degree Centrality
  ◦ Closeness Centrality
  ◦ Eigenvector Centrality
  ◦ Betweenness Centrality
  ◦ Google Page Ranking
  ◦ Diffusion Centrality
• Centrality as Predictors
• Entity Resolution
• Network Properties
  ◦ Path
  ◦ Shortest Path
  ◦ Diameter
  ◦ Average Path Length
  ◦ Density
  ◦ Cluster Coefficient
• Community Detection Algorithm
  ◦ Edge Betweenness
  ◦ Fast Greedy
  ◦ Leading Eigenvector

Module 29 Survival Analytics


• Examples of Survival Analysis

• Time to Event/ Duration Analysis
• Censoring
  ◦ Right Censored
  ◦ Left Censored
  ◦ Interval Censored
• Survival, Hazard, Cumulative Hazard
• Introduction to Parametric and Non-Parametric Functions
• Kaplan-Meier Survival Function and Curve

Module 30 Forecasting/Time Series – Model-Driven Algorithms

• Introduction to Time Series Data
• Steps to Forecasting
• Components to Time Series Data
• Scatter Plot and Time Plot
• Lag Plot
• ACF - Auto-Correlation Function / Correlogram
• Visualization Principles
• Naïve Forecast Methods
• Errors in the Forecast
  ◦ Mean Error
  ◦ Mean Absolute Error
  ◦ Mean Square Error
  ◦ Root Mean Square Error
  ◦ Mean Percentage Error
  ◦ Mean Absolute Percentage Error
• Model-Based approaches
  ◦ Linear Model
  ◦ Exponential Model
  ◦ Quadratic Model
  ◦ Additive Seasonality
  ◦ Multiplicative Seasonality
• Model-Based Approaches Continued
• AR (Auto-Regressive) Model for Errors
• Random Walk

Module 31 Forecasting/Time Series – Data-Driven Algorithms

• ARMA (Auto-Regressive Moving Average), Order p and q
• ARIMA (Auto-Regressive Integrated Moving Average), Order p, d, and q
• Data-Driven Approach to Forecasting
• Smoothing Techniques
  ◦ Moving Average
    - Centered Moving Average
    - Trailing Moving Average
  ◦ Exponential Smoothing
  ◦ Holts / Double Exponential Smoothing
  ◦ Winters / Holt-Winters
• De-Seasoning and De-Trending
  ◦ Differencing
  ◦ Seasonal Index
• Econometric Models
• ARCH and GARCH for High-Frequency Data

Module 32 AutoML

• AutoML Methods
  ◦ Meta-Learning
    - Transfer Learning
    - Few Shot Learning
  ◦ Hyperparameter Optimization
    - Grid Search
    - Randomized Search
    - Bayesian Optimization
  ◦ Neural Architecture Search
  ◦ Network Architecture Search
• AutoML Systems
  ◦ Auto-WEKA
  ◦ Hyperopt-sklearn
  ◦ Auto-sklearn
  ◦ Auto-Net 1.0 & 2.0
  ◦ TPOT
  ◦ Hyperas - keras
• AutoML on Cloud - AWS
  ◦ Amazon SageMaker
  ◦ SageMaker Notebook Instance for Model Development, Training and
  ◦ XG Boost Classification Model
  ◦ Training Jobs
  ◦ Hyperparameter Tuning Jobs
• AutoML on Cloud - Azure
  ◦ Workspace
  ◦ Environment
  ◦ Compute Instance
  ◦ Compute Targets
  ◦ Automatic Featurization
  ◦ AutoML and ONNX
• AutoML on Cloud - GCP
  ◦ AutoML Natural Language Performing Document Classification
  ◦ AutoML Vision APIs for Image Classification
  ◦ Performing Sentiment Analysis using AutoML Natural Language API
  ◦ TensorFlow Models Using Cloud ML Engine
  ◦ Cloud ML Engine and Its Components
  ◦ Training and Deploying Applications on Cloud ML Engine
  ◦ Choosing Right Cloud ML Engine for Training Jobs

Exclusive Course Module from IBM

• Introduction to Data Science
• Python for Data Science
• Using R with Databases

Use Cases

Problem Type | Use Cases | ML Algorithms
Generate Hypothesis and Prove them using various Test cases | | Hypothesis Testing - The '4' Must Know Hypothesis Tests
Predicting a continuous number | What would be the weight gained on the consumption of extra calories | Simple Linear Regression
Predicting a continuous number | How is Salary offered affected based on the Employee's | Simple Linear Regression
Predicting a continuous number | What will be the sales for a computer accessories store? | Multiple Linear Regression
Predicting a continuous number | Predict the price of a car with various features/attributes | Multiple Linear Regression
Predict the probability of an event (True/False) | Prediction of Extra Marital Affairs | Logistic Regression
Predict the probability of an event (True/False) | Probability of a client subscribing to a term deposit | Logistic Regression
Predict the probability of an event (True/False) | What is the probability of a candidate winning an election? | Logistic Regression
Predict the probability of event out of many possible events (Multi-class) | Students' likeliness to apply to graduate school from the survey feedback | Multiclass Classification models (Nominal Regression)
Group the contents based on similarity | Customer Segmentation for the Aviation sector | K-means clustering, Hierarchical clustering
Group the contents based on similarity | Group the States based on the Crime Rate | K-means clustering, Hierarchical clustering
Group the contents based on similarity | Customer Segmentation for Insurance firm | K-means clustering, Hierarchical clustering
Dimension Reduction | Map high dimensional dataset to a smaller feature set | Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Linear Discriminant Analysis
Measuring the dependency between entities | Identify the relationship between different books | Association Rule Mining
Recommender systems | Build a Recommendation Engine to recommend Stand-up Comedy videos to users | Content-based filtering, Collaborative filtering
Categorizing the variable of interest into one of the Classes | Categorizing the class into one of the classes based on its preparation and material used | k-NN Classifier
Categorizing the variable of interest into one of the Classes | Classify the animals into one of the multiclass types | k-NN Classifier
Prediction of Categories | Classify a loan applicant as Risky or Good Customer | Decision Tree, Random Forest, AdaBoost & Extreme Gradient Boosting
Predict the probability of event out of many possible events (Multi-class) | A cloth manufacturing company is interested to know about the segment or attributes contributing to high sale. | Decision Tree, Random Forest, AdaBoost & Extreme Gradient Boosting
Opinion Mining | Predict the sentiment for a product on e-commerce | Text Mining & Natural Language Processing (NLP)
Opinion Mining | Perform Competitor Analysis for Mobiles (iPhone and One plus) based on the customer's | Text Mining & Natural Language Processing (NLP)
Predict the probability of an event (True/False) | Predict the probability of a new incoming email being Spam or | Naive Bayes Classifier
Predict the probability of an event out of many possible events (Multi-class) | Prepare a classification model for salaried personal | Naive Bayes Classifier
Predicting a continuous number | What will be the profits made by Start-ups? | Artificial Neural Network, SVM
Predicting a continuous number | Predict the area damaged due to wild forest fire. | Artificial Neural Network, SVM
Predicting a continuous number | The Strength of concrete mixture defines the life of any construction. Predict the strength of concrete mixture. | Artificial Neural Network, SVM
Duration Analysis | Analyze the life tables of the patients to predict the survival | Survival Analytics
Prediction of a continuous number in time | Forecast the Solar Power | Forecasting/Time Series
Prediction of a continuous number in time | Forecast the demand for Bikes on the data collected daily from the Bike-sharing app | Forecasting/Time Series
Prediction of a continuous number in time | Predict the sales of Coca-Cola for every quarter. | Forecasting/Time Series
Prediction of a continuous number in time | Forecast the monthly plastic wastage being generated. | Forecasting/Time Series

Software Used

Course Deliverables
01 Understanding of the real-life application of concepts with industry use cases
02 Additional assignments of over 140 hours
03 Access to free webinars and the latest industry updates
04 24/7 interaction with our trainers through online forums and WhatsApp groups
05 Access to free SAS Base e-learning
06 Hands-on experience in a live project
07 Access to Learning Management System for lifetime
08 Job assistance in data science fields
09 Three mock interviews to test your job-readiness

Accreditation to international certification bodies

For further details, call us at


36, 1st Main Road, New Colony, Behind Lalitha Jewellery,
Chromepet, Chennai 600044

