
Data Preprocessing:

Dataset Processing (EDA, Web Scraping, Feature Engineering)


Multicollinearity

NumPy, pandas, Matplotlib, Plotly, Seaborn, SciPy, scikit-learn, Regex, File Operations in Python
One hot encoding
Normalization
Regularization, Generalization
Computer Vision Data Preprocessing
● Introduction to Image Pre-Processing
● Data augmentation
● Transformation operation
● Pixel brightness transformations (PBT)
● Gamma Correction
● Histogram equalization
● Sigmoid stretching
● Geometric Transformations
● Image Filtering and Segmentation
● Image Segmentation
● Fourier transform
NLP Data Preprocessing
● Tokenization
● Lemmatization and stemming.
● Bag of words
● TF-IDF
● Stop Words Removal
● N-grams
● Regex Matching
● Text Matching
● Chunking
● Date Matcher
● Part-of-speech tagging
● Sentence Detector (DL models)
● Dependency parsing
● Sentiment Detection (ML models)
● Spell Checker (ML & DL models)
● Doc2Vec Embeddings (Word2Vec)
● Word2Vec Embeddings (Word2Vec)
● Word Embeddings (GloVe & Word2Vec)
● Sentiment Analysis.
● Named Entity Recognition (NER).
● Summarization.
● Topic Modeling.
● Text Classification.
● Keyword Extraction.
● LDA

Supervised Machine Learning Algorithm:


Type 1: Regression Algorithm**
a. Linear Regression
b. Support Vector Regression
c. Decision Tree Regression
d. Random Forest Regression

Type 2: Classification Algorithm


a. Logistic Regression
b. SVM
c. KNN
d. Random Forest
e. Decision Tree
f. Naive Bayes
g. Ensemble Techniques (Bagging, Boosting, Gradient Boosting, AdaBoost, XGBoost)

Unsupervised Machine Learning Algorithm:


1. K-means clustering
2. Hierarchical clustering
3. Apriori
4. PCA
Deep Learning Algorithm:
1. Neural Networks
2. CNN
3. Gradient Descent
4. GAN
Reinforcement Learning: (to be updated in the future)

NLP:
Word Cloud
Knowledge graphs
BERT Embeddings
DistilBERT Embeddings
RoBERTa Embeddings
DeBERTa Embeddings
XLM-RoBERTa Embeddings
Longformer Embeddings
ALBERT Embeddings
XLNet Embeddings
RNN
LSTM
GRU
Transformer
ELMo Embeddings
Universal Sentence Encoder
Sentence Embeddings
Chunk Embeddings
Neural Machine Translation (MarianMT)
Text-To-Text Transfer Transformer (Google T5)
Generative Pre-trained Transformer 2 (OpenAI GPT-2)
Unsupervised keywords extraction
Language Detection & Identification (up to 375 languages)
Multi-class Text Classification (DL model)
Multi-label Text Classification (DL model)
Multi-class Sentiment Analysis (DL model)
BERT for Token & Sequence Classification
DistilBERT for Token & Sequence Classification
ALBERT for Token & Sequence Classification
RoBERTa for Token & Sequence Classification
XLM-RoBERTa for Token & Sequence Classification
XLNet for Token & Sequence Classification
Longformer for Token & Sequence Classification
Named entity recognition (DL model)
Easy TensorFlow integration
GPU Support
Full integration with Spark ML functions

*** Key topics


1. Sigmoid vs Softmax
2. Overfitting, Underfitting
3. Parameter, Hyperparameter
4. Multi-layer Perceptron, Neural Network
5. Activation function
6. Forward propagation, Backward propagation
7. Deep neural network
8. Vanishing gradients
9. Data augmentation
10. Transfer learning
11. Image classification, Classification with localization, Object detection
12. YOLO algorithm
13. Neural style transfer
14. Sentiment classification
15. Transformer
16. K-fold cross-validation techniques
17. Confusion matrix
18. NER (for NLP)
19. TensorFlow, OpenCV, PyTorch, Keras, etc.

Mathematics & Statistics:


1. Calculus
2. Linear Algebra
3. Statistics
4. Discrete Mathematics

Additional Learning:
1. MLOps
2. Git, GitHub
3. PySpark, Hadoop, etc.
4. Docker, FastAPI, REST API
5. NoSQL, MongoDB, MySQL
6. Data Structure and Algorithms****
7. AWS, Microsoft Azure, Google Cloud Platform
8. Stackhome (e.g., where can we get datasets?)
