Data_science


Full Stack

DATA SCIENCE & AI


PYTHON
Introduction to Data Science
• Introduction to Data Science
• Discussion on Course Curriculum
• Introduction to Programming
Python – Basics
• Introduction to Python: Installation and Running (Jupyter Notebook, .py file from terminal, Google Colab)
• Data types and type conversion
• Variables
• Operators
• Flow Control: If, Elif, Else
• Loops
• Python Identifier
• Built-in Functions and Modules (print, type, id, len, sys)
Python - Data Types & Utilities
List, List of Lists and List Comprehension
• List creation
• Create a list with variable
• List mutable concept
• len() || append() || pop()
• insert() || remove() || sort() || reverse()
• Forward indexing
• Backward indexing
• Forward slicing
• Backward slicing
• Step slicing
Set
• Set creation with variable
• len() || add() || remove() || pop()
• union() || intersection() || difference()
Tuple
• Tuple creation
• Create tuple with variable
• Tuple immutable concept
• len() || count() || index()
• Forward indexing
• Backward indexing
Dictionary and Dictionary Comprehension
• Create a dictionary using variable
• keys:values concept
• len() || keys() || values() || items()
• get() || pop() || update()
• Comparison of data structures
• Introduction to range()
• Pass range() in the list
• range() arguments
• For loop introduction using range()
Functions
• Inbuilt vs User Defined
• User Defined Function
• Function Argument
• Types of Function Arguments
• Actual Argument
• Global variable vs Local variable
• Anonymous Function | LAMBDA
Packages
Map Reduce
OOPs
Class & Object:
• What is meant by an inbuilt class
• How to create a user class
• Create a class & object
• __init__ method
• Python constructor
• Constructor, self & comparing objects
• Instance variable & class variable
Methods:
• What is an instance method
• What is a class method
• What is a static method
• Accessor & Mutator
Python DECORATOR:
• How to use a decorator
• Inner class, outer class
• Inheritance
Polymorphism:
• Duck typing
• Operator overloading
• Method overloading
• Method overriding
• Magic method
• Abstract class & Abstract method
• Iterator
• Generators in Python
Python - Production Level
• Error / Exception Handling
• File Handling
• Docstrings
• Modularization
Pickling & Unpickling
Pandas
• Introduction, Fundamentals, Importing Pandas, Aliasing, DataFrame
• Series – Intro, Creating Series Object, Empty Series Object, Create Series from List/Array/Column from DataFrame, Index in Series, Accessing values in Series
• NaN Value
• Series – Attributes (values, index, dtypes, size)
• Series – Methods – head(), tail(), sum(), count(), nunique(), etc.
• Data Frame
• Loading Different Files
• Data Frame Attributes
• Data Frame Methods
• Rename Column & Index
• Inplace Parameter
• Handling missing or NaN values
• iloc and loc
• Data Frame – Filtering
• Data Frame – Sorting
• Data Frame – GroupBy
• Merging or Joining
• Data Frame – Concat
• DataFrame - Adding, dropping columns & rows
• DataFrame - Date and time
• DataFrame - Concatenate multiple CSV files
Numpy
• Introduction, Installation, pip command, import numpy package, ModuleNotFoundError, Famous Alias name to Numpy
• Fundamentals – Create Numpy Array, Array Manipulation, Mathematical Operations, Indexing & Slicing
• Numpy Attributes
• Important Methods – min(), max(), sum(), reshape(), count_nonzero(), sort(), flatten(), etc.
• Adding value to array of values
• Diagonal of a Matrix
• Trace of a Matrix
• Parsing, Adding and Subtracting Matrices
• Statistical Functions: numpy.mean(), numpy.median(), numpy.std(), numpy.sum(), numpy.min()
• Filter in Numpy
Matplotlib
• Introduction
• Pyplot
• Figure Class
• Axes Class
• Setting Limits and Tick Labels
• Multiple Plots
• Legend
• Different Types of Plots:
• Line Graph
• Bar Chart
• Histograms
• Scatter Plot
• Pie Chart
• 3D Plots
• Working with Images
• Customizing Plots
Seaborn
• catplot() function
• stripplot() function
• boxplot() function
• violinplot() function
• pointplot() function
• barplot() function
• Visualizing statistical relationship with Seaborn relplot() function
• scatterplot() function
• regplot() function
• lmplot() function
• Seaborn FacetGrid() function
• Multi-plot grids
• Statistical Plots
• Color Palettes
• Faceting
• Regression Plots
• Distribution Plots
• Categorical Plots
• Pair Plots
Scipy
• Signal and Image Processing (scipy.signal, scipy.ndimage)
• Linear Algebra (scipy.linalg)
• Integration (scipy.integrate)
• Statistics (scipy.stats)
• Spatial Distance and Clustering (scipy.spatial)
Statsmodels
• Linear Regression (statsmodels.regression)
• Time Series Analysis (statsmodels.tsa)
• Statistical Tests (statsmodels.stats)
• Anova (statsmodels.stats.anova)
• Datasets (statsmodels.datasets)
Mathematics
Set Theory
• Data Representation & Database Operations
Combinatorics
• Feature Selection
• Permutations and Combinations for Sampling
• Hyperparameter Tuning
• Experiment Design
• Data Partitioning and Cross-Validation
Probability
• Basics
• Theoretical Probability
• Empirical Probability
• Addition Rule
• Multiplication Rule
• Conditional Probability
• Total Probability
• Probability Decision Tree
• Bayes Theorem
• Sensitivity & Specificity in Probability
• Bernoulli Naïve Bayes, Gaussian Naïve Bayes, Multinomial Naïve Bayes
Distributions
• Binomial, Poisson, Normal Distribution, Standard Normal Distribution
• Gaussian Distribution, Uniform Distribution
• Z Score
• Skewness
• Kurtosis
• Geometric Distribution
• Hypergeometric Distribution
• Markov Chain
Linear Algebra
• Linear Equations
• Matrices (Matrix Algebra: Vector, Matrix, Vector-matrix multiplication, Matrix-matrix multiplication)
• Determinant
• Eigenvalue and Eigenvector
Euclidean Distance & Manhattan Distance
Calculus
• Differentiation
• Partial Differentiation
• Max & Min
Indices & Logarithms

STATISTICS
Introduction
• Population & Sample
• Reference & Sampling technique
Types of Data
• Qualitative or Categorical – Nominal & Ordinal
• Quantitative or Numerical – Discrete & Continuous
• Cross Sectional Data & Time Series Data
Measures of Central Tendency
• Mean, Mode & Median – Their frequency distribution
Descriptive Statistics – Measures of Symmetry
• Skewness (positive skew, negative skew, zero skew)
• Kurtosis (Leptokurtic, Mesokurtic, Platykurtic)
Measurement of Spread
• Range, Variance, Standard Deviation
Measures of Variability
• Interquartile Range (IQR)
• Mean Absolute Deviation (MAD)
• Coefficient of variation
• Covariance
Levels of Data Measurement
• Nominal, Ordinal, Interval, Ratio
Variable
• Types of Variables
• Categorical Variables - Nominal variable & ordinal variables
• Numerical Variables: discrete & continuous
• Dependent Variable
• Independent Variable
• Control, Moderating & Mediating
Entropy in ML
Information Gain
Surprise in ML
Loss Function & Cost Function
• Mean Squared Error, Mean Absolute Error – Loss Function
• Huber Loss Function
• Cross Entropy Loss Function
Frequency Distribution Table
• Relative Frequency, Cumulative Frequency
• Histogram
• Scatter Plots
• Range
• Calculate Class Width
• Create Intervals
• Count Frequencies
• Construct the Table
Correlation, Regression & Collinearity
• Pearson & Spearman Correlation Methods
• Regression Error Metrics
Others
• Percentiles, Quartiles, Interquartile Range
• Different types of Plots for Continuous, Categorical variable
• Box Plot, Outliers
• Confidence Intervals
• Central Limit Theorem
• Degrees of freedom
Bias and Variance in ML
Inferential Statistics
• Hypothesis Testing: One tail, two tail and p-value
• Formulation of Null & Alternate Hypothesis
• Type-I error & Type-II error
• Statistical Tests:
• Sample Test
• ANOVA Test
• Chi-square Test
• Z-Test & T-Test

SQL
Introduction
• DBMS vs RDBMS
• Intro to SQL
• SQL vs NoSQL
• MySQL Installation
Keys
• Primary Key
• Foreign Key
Constraints
• Unique
• Not NULL
• Check
• Default
• Auto Increment
CRUD Operations
• Create
• Retrieve
• Update
• Delete
SQL Languages
• Data Definition Language (DDL)
• Data Query Language
• Data Manipulation Language (DML)
• Data Control Language
• Transaction Control Language
SQL Commands
• Create
• Insert
• Alter, Modify, Rename, Update
• Delete, Truncate, Drop
• Grant, Revoke
• Commit, Rollback
• Select
SQL Clause
• Where
• Distinct
• Order By
• Group By
• Having
• Limit
Operators
• Comparison Operators
• Logical Operators
• Membership Operators
• Identity Operators
Wild Cards
Aggregate Functions
SQL Joins
• Inner Join & Outer Join
• Left Join & Right Join
• Self & Cross Join
• Natural Join
EDA & ML
EDA
• Univariate Analysis
• Bivariate Analysis
• Multivariate Analysis
Data Visualisation
• Various Plots on different datatypes
• Plots for Continuous Variables
• Plots for Discrete Variables
• Plots for Time Series Variables
ML Introduction
• What is Machine Learning?
• Types of Machine Learning Methods
  o Supervised Learning
  o Unsupervised Learning
  o Reinforcement Learning
• Classification problem in general
• Validation Techniques: CV, OOB
• Different types of metrics for Classification
• Curse of dimensionality
• Feature Transformations
• Feature Selection
• Imbalanced Dataset and its effect on Classification
• Bias Variance Tradeoff
Important Elements of Machine Learning
Multiclass Classification
• One-vs-All
• Overfitting and Underfitting
• Error Measures
• PCA learning
• Statistical learning approaches
• Introduction to the SKLEARN FRAMEWORK
Data Processing
• Creating training and test sets, Data scaling and Normalisation
• Feature Engineering – Adding new features as per requirement, Modifying the data
• Data Cleaning – Treating the missing values, Outliers
• Data Wrangling – Encoding, Feature Transformations, Feature Scaling
• Feature Selection – Filter Methods, Wrapper Methods, Embedded Methods
• Dimension Reduction – Principal Component Analysis (Sparse PCA & Kernel PCA), Singular Value Decomposition
• Non Negative Matrix Factorization
Regression
• Introduction to Regression
• Mathematics involved in Regression
• Regression Algorithms:
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Lasso Regression
• Ridge Regression
• Elastic Net Regression
Evaluation Metrics for Regression:
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• R²
• Adjusted R²
Classification
• Introduction
• K-Nearest Neighbors
• Logistic Regression:
• Implementation and Optimizations
• Stochastic gradient descent algorithms
• Finding the optimal Hyperparameters through Grid Search
• Support Vector Machines (Linear SVM):
  o Linear support vector machines
• Scikit-learn implementation
• Linear Classification
• Kernel-based classification
  o Radial Basis Function
  o Polynomial Kernel
  o Sigmoid Kernel
  o Custom Kernels
• Non-linear examples
• 2 features form a straight line & 3 features form a plane
• Hyperplane and Support vectors
• Controlled support vector machines
• Support Vector Regression
• Kernel SVM (Non-Linear SVM)
• Naive Bayes:
• Bayes theorem
• Naive Bayes Classifiers
• Naive Bayes in scikit-learn (Bernoulli Naive Bayes, Multinomial Naive Bayes, Gaussian Naive Bayes)
• Decision Trees:
  o Binary Decision Trees
  o Binary decisions
  o CART Algorithm
  o Impurity measures (Gini impurity index, Cross-entropy impurity index, Misclassification impurity index)
  o Feature importance
  o Decision tree classification with scikit-learn
• Random Forest / Bagging:
• Random Forests and Feature importance in Random Forest
• AdaBoost
• Gradient tree boosting
• Voting classifier
• Ensemble: Bagging
• Ensemble: Boosting
• Ada Boost
• Gradient Boost
• XG Boost
• Evaluation Metrics for Classification:
• Confusion Matrix
• Bias-Variance Tradeoff
o Cross Validation:
o Stratified Cross validation
o K-Fold Cross validation
• Hyper Parameter Tuning
• Accuracy & F1 Score
• Precision & Recall
• Sensitivity & Specificity
• True Positive Rate, False Positive Rate
• ROC & ROC_AUC
Clustering
Introduction
K-Means Clustering:
• Finding the optimal number of clusters
• Optimizing the inertia
• Cluster instability
• Elbow method
Hierarchical Clustering
Agglomerative clustering
DBSCAN Clustering
Association Rules
• Market Basket Analysis
• Apriori Algorithm
Recommendation Engines
• Collaborative Filtering:
• User based collaborative filtering
• Item based collaborative filtering
• Recommendation Engines
Time Series & Forecasting
• What is Time series data
• Different components of time series data
• Stationarity of time series data
• ACF, PACF
• Time Series Models:
• AR
• ARMA
• ARIMA
• SARIMAX
Model Selection & Evaluation
Over Fitting & Under Fitting
Others
• Dummy Variable, One-hot encoding
• GridSearchCV vs RandomizedSearchCV
ML Pipeline
• Joblib And Pickling
ML Model Deployment in Flask
PowerBI
Introduction
• Power BI for Data scientist
• Types of reports
• Data source types
• Installation
Basic Report Design
• Data sources and Visual types
• Canvas and fields
• Table and Tree map
• Format button and Data Labels
• Legend, Category and Grid
• CSV and PDF Exports
Visual Sync, Grouping
• Slicer visual
• Orientation, selection process
• Slicer: Number, Text, slicer list
• Bin count, Binning
Hierarchies, Filters
• Creating Hierarchies
• Drill Down options
• Expand and show
• Visual filter, Page filter, Report filter
• Drill Thru Reports
Power Query
• Power Query transformation
• Table and Column Transformations
• Text and time transformations
• Power Query functions
• Merge and append transformations
DAX Functions
• DAX Architecture, Entity Sets
• DAX Data types, Syntax Rules
• DAX measures and calculations
• Creating measures
• Creating Columns
Deep Learning
Deep Learning at a Glance
• Introduction to Neural Network
• Biological and Artificial Neuron
• Introduction to perceptron
• Perceptron and its learning rule and drawbacks
• Multilayer Perceptron, loss function
• Neural Network Activation function
Training MLP: Backpropagation
Cost Function
Gradient Descent Backpropagation - Vanishing and Exploding Gradient Problem
Introduction to PyTorch
Regularization
Optimizers
Hyperparameters and tuning of the same
TENSORFLOW FRAMEWORK
• Introduction to TensorFlow
• TensorFlow Basic Syntax
• TensorFlow Graphs
• Variables and Placeholders
• TensorFlow Playground
ANN (Artificial Neural Network)
• ANN Architecture
• Forward & Backward Propagation, Epoch
• Introduction to TensorFlow, Keras
• Vanishing Gradient
• Fine-tuning neural network hyperparameters
• Number of hidden layers, Number of neurons per hidden layer
• Activation function
• Installation of YOLOv8, Keras, Theano
PyTorch Library
RNN (Recurrent Neural Network)
• Introduction to RNN
• Back Propagation through time
• Input and output sequences
• RNN vs ANN
• LSTM (Long Short-Term Memory)
• Different types of RNN: LSTM, GRU
• Bidirectional RNN
• Sequence-to-sequence architecture (Encoder-Decoder)
• BERT Transformers
• Text generation and classification using Deep Learning
• Generative AI (ChatGPT)
Basics of Image Processing
• Histogram of images
• Basic filters applied on the images
Convolutional Neural Networks (CNN)
• ImageNet Dataset
• Project: Image Classification
• Different types of CNN architectures
• Recurrent Neural Network (RNN)
• Using pre-trained model: Transfer Learning
Natural Language Processing (NLP)
Natural Language Processing (NLP)
• Text Cleaning
• Texts, Tokens
• Basic text classification based on Bag of Words
Document Vectorization
• Bag of Words
• TF-IDF Vectorizer
• n-gram: Unigram, Bigram
• Word vectorizer basics, One Hot Encoding
• Count Vectorizer
• Word cloud and gensim
• Word2Vec and GloVe
• Text classification using Word2Vec and GloVe
• Parts of Speech Tagging (PoS Tagging or POST)
• Topic Modelling using LDA
• Sentiment Analysis
Twitter Sentiment Analysis Using TextBlob
• TextBlob
• Installing textblob library
• Simple TextBlob Sentiment Analysis Example
• Using NLTK’s Twitter Corpus
spaCy Library
• Introduction, What is a Token, Tokenization
• Stop words in spaCy library
• Stemming
• Lemmatization
• Lemmatization through NLTK
• Lemmatization using spaCy
• Word Frequency Analysis
• Counter
• Part of Speech, Part of Speech Tagging
• PoS by using spaCy and NLTK
• Dependency Parsing
• Named Entity Recognition (NER)
• NER with NLTK
• NER with spaCy
Computer Vision
Human vision vs Computer vision
• CNN Architecture
• CONVOLUTION – MAX POOLING – FLATTEN LAYER – FULLY CONNECTED LAYER
• Striding and padding
• Max pooling
• Data Augmentation
• Introduction to OpenCV & YOLOv3 Algorithm
Image Processing with OpenCV
• Image basics with OpenCV
• Opening Image Files with OpenCV
• Drawing on Images, Image files with OpenCV
• Face Detection with OpenCV
Video Processing with OpenCV
• Introduction to Video Basics, Object Detection
• Object Detection with OpenCV
Reinforcement Learning
• Introduction to Reinforcement Learning
• Architecture of Reinforcement Learning
• Reinforcement Learning with OpenAI
• Policy Gradient Theory
OPEN AI
• Introduction to OpenAI
• Generative AI
• ChatGPT (3.5)
• LLM (Large Language Model)
• Classification Tasks with Generative AI
• Content Generation and Summarization with Generative AI
• Information Retrieval and Synthesis workflow with Gen AI
Time Series and Forecasting
• Time Series Forecasting using Deep Learning
• Seasonal-Trend decomposition using LOESS (STL) models
• Bayesian time series analysis
MakerSuite Google
• PaLM API
• MUM models
Azure ML
