
Pioneer in Artificial Intelligence Training in Pune, India

https://marsiantech.com/

Course Outline: DATA SCIENCE Master (200 Hours)

1. Introduction
• What is Data Science
• Evolution of Analytics
• Data Science Components
• Data Scientist Skillset
• Types of Data Scientists
• Introduction to Machine Learning
• Data Science Process

2. Analytic Techniques using R
2.1 Introduction to R Programming
• When and Why to use R for Analytics
• Types of Objects in R
• Naming Conventions in R
• Creating Objects in R
• Data Structure in R
• Matrix, Data Frame, String, Vectors
• Understanding Vectors & Data input in R
• Lists, Data Elements
• Creating Data Files using R
• Importing Data Files from other sources
• Know your Data

2.2 Data Manipulation & Exploration in R
• Sorting Data
• Sub-setting Data
• Selecting (Keeping) Variables
• Excluding (Dropping) Variables
• Selecting Observations and Selection using Subset Function
• Merging Data
• Adding Rows
• Data Type Conversion
• Built-In Numeric Functions
• Built-In Character Functions
• User Built Functions
• Control Structures
• Loop Functions
• Outlier & Missing Values

3. Advanced Excel

4. Statistical Concepts & Application
4.1 Descriptive Statistics
• Data Basics
• Observations, variables, and data matrices
• Types of variables
• Relationships between variables
• Central Tendency
• Measures of Central Tendency
- Arithmetic Mean / Average
- Merits & Demerits of Arithmetic Mean
- Mode
- Merits & Demerits of Mode
- Median
- Merits & Demerits of Median
- Variance

4.2 Data Visualization
• Bar Graph
• Pie Chart
• Box Plot
• Scatter Plot
• Histograms
• Bimodal & Multimodal Histograms
• Frequency Chart
• Line Charts
• Basic Statistics & Data Visualization in R

4.3 Probability Basics
• Notation and Terminology
• Unions and Intersections
• Conditional Probability and Independence

https://marsiantech.com/ Branches: Pimple Saudagar and Viman Nagar, Pune, INDIA. Tel: 7028052111, 7028052112

4.4 Probability Distributions
• Random Variable
• Probability Distributions
• Probability Mass Function
• Parameters vs. Statistics
• Binomial Distribution
• Poisson Distribution
• Normal Distribution
• Standard Normal Distribution
• Central Limit Theorem
• Cumulative Distribution Function
• Probability Distributions in R

4.5 Probability Distributions Sampling
• Random Sampling
• Systematic Random Sampling
• Stratified Random Sampling
• Cluster Random Sampling

4.6 Inferential Statistics
• Hypothesis Testing
- Null Hypothesis
- Alternate Hypothesis
- Level of Significance
- P-Value, Normality
- Decision Criteria
• Tests of Hypothesis
- Large Sample Test
- Small Sample Test
- One Sample: Testing Population Mean
- Hypothesis in One Sample z-test
- Two Sample: Testing Population Mean
- One Sample t-test
- Two Sample t-test
- Paired t-test
- Hypothesis in Paired Samples t-test
- Chi-Square test
- Hypothesis in Chi-Square test
- F test, Hypothesis in F test

4.7 ANOVA (Analysis of Variance)
• Hypothesis in Analysis of Variance
• General setup of ANOVA
• Nonparametric Test and a Parametric Test
• Tests of Hypothesis using R
• Analysis of Variance Using R

5. Analytic Techniques using Python
5.1 Basics of Python Language
• When and Why to use Python for Analytics
• Introduction & Installation of Python
• Python Syntax
• Strings
• Lists and Dictionaries
• Loops
• Regular Expressions

5.2 Introduction to Pandas
• Selecting data from Pandas DataFrame
• Slicing and dicing using Pandas
• GroupBy / Aggregate
• Strings with Pandas
• Cleaning up messy data with Pandas
• Dropping Entries
• Selecting Entries

5.3 Data Manipulation using Pandas
• Data Alignment
• Sorting and Ranking
• Summary Statistics
• Missing values
• Merging data
• Concatenation
• Combining DataFrames
• Pivot
• Duplicates
• Binning

5.4 Visualization with Matplotlib
• Anatomy of a Matplotlib Plot
• Matplotlib Installation
• Matplotlib Basic Plots & its Containers
• A Matplotlib Figure, its components and properties
• Axes and other graphical objects
• Pylab & Pyplot
• Scatter plots
• 2D Plots: Straight Lines & Curves
• Histograms
• Pie Charts
• Bar Graphs
• Box Plots
• Data for Matplotlib Plots
• Set up Title, Axes Labels, Legend, Layout
• Showing, Saving and Closing Your Plot

5.5 Scientific Libraries in Python
• Numpy
• Scikit-Learn

6. Machine Learning
6.1 Fundamentals of Machine Learning
• Overview & Terminologies
• What is Machine Learning?
• Why Learn?
• When is Learning required?
• Data Mining
• Application Areas and Roles
• Types of Machine Learning
- Supervised Learning
- Unsupervised Learning
- Reinforcement learning

6.2 Machine Learning Concepts & Terminologies
• Steps in developing a Machine Learning application
• Key tasks of Machine Learning
• Modelling Terminologies
• Learning a Class from Examples
• Probability and Inference
• PAC (Probably Approximately Correct) Learning
• Noise
• Noise and Model Complexity
• Triple Trade-Off
• Association Rules
• Association Measures
• Sample Algorithms

-- Supervised Learning --

Predictive Models
(Simple/Multiple/Logistic Regression)

7. Simple Linear Regression
• Correlation
• Regression
• Model Assumptions
• Estimation Process
• Least Squares Method
• The Coefficient of Determination
• Correlation and Regression Using R & Python
• Simple Linear Regression Assignments


8. Multiple Regression Analysis
• Introduction
• Design Requirements
• Assumptions
• Independence
• Normality, Homoscedasticity, Linearity
• Multiple Regression
• Formal Statement of the Model
• Estimating parameters of the model
• F-test for the overall fit of the model
• Multiple regression model Building
• Selecting the best Regression equation
• Examples/Use Cases
• Interpreting the Final Model
• Multicollinearity and its Diagnostics
• Examples/Use Cases
• Qualitative Independent Variables
• Indicator variables
• Interpretation of Regression Coefficients
• Examples/Use Cases
• Regression Diagnostics and Residual Analysis
• Multiple Linear Regression Using R & Python
• Multiple Regression Assignment

9. Logistic Regression Analysis
• Theory Behind Logistic Regression
- Assessing the Model and Predictors
• When and Why do we Use Logistic Regression?
- Binary
- Multinomial
• Interpreting Logistic Regression
• Assumptions
• Sample size requirements
• The logistic function & Interpretation
• Methods for including variables
• Computational method
• Logistic Regression Model using R & Python
• Logistic Regression Assignment

10. Maximum Likelihood Estimation
• Bernoulli distribution
• Multinomial distribution
• Gaussian distribution
• Assessing the Model
• Assessing Changes in Models
• Assessing Predictors
• Methods of Regression
• Complete Separation
• Overdispersion
• MLE using Python

11. Decision Trees
• Understanding the Concept
• Internal decision nodes
• Terminal leaves
• Tree induction: Construction of the tree
• Classification Trees
• Entropy
• Selecting Attribute
• Information Gain
• Partially learned tree
• Overfitting
• Causes for overfitting
• Overfitting Prevention (Pruning) Methods
• Reduced Error Pruning
• Decision trees - Advantages & Drawbacks
• Ensemble Models
• Decision Trees using Python
• Decision Trees Assignment

12. Random Forests
• Introduction & Motivation
• Ensemble Methods - Bagging, Boosting & Random Forests
• Ensemble Classifiers
• Ensemble Models
• How do random forests work?
• Gini Index
• Operation of Random Forest
• Random forest algorithm
• Common variables for random forests
• Random Forest – practical consideration
• Random Forest – Features, Advantages and Disadvantages
• Limitations of random forests
• Random Forest using Python

13. Support Vector Machine
• Problem Definition
• Separating Hyperplanes
• Linear separable case
• Formula for the Margin
• Finding the optimal hyperplane
• The optimization problem
• The Lagrangian Dual Problem
• Importance of the Support Vectors
• VC dimension
• Non-linear SVM
• Mapping the data to higher dimension
• The Kernel Trick
• Important Kernel Issues
• Soft Margin
• The primal optimization problem
• The Dual Formulation
• The “C” Problem: Overfitting and Underfitting
• Model selection procedure
• SVM For Multi-class classification
• Applications of SVM
• Advantages & Drawbacks of SVM

14. Bayesian Theory
• Axioms of Probability Theory
• Conditional Probability
• Independence
• Joint Distribution
• Bayes' Rule
• Bayesian Categorization
• Generative Probabilistic Models
• Naïve Bayes Generative Model
• Naïve Bayesian Categorization
• Example & Exercises
• Naïve Bayes Classifier using Python

15. K-Nearest Neighbor (K-NN)
• Non-parametric methods
• k-Nearest Neighbor Estimator
• How to Choose k or h
• Strengths and Weaknesses
• K-Nearest Neighbor using Python

16. Boosting
• Gradient Boosting
• AdaBoost

17. Intro to Dimensionality Reduction
• Principal Components Analysis (PCA)
• Singular Value Decomposition (SVD)
• Latent Dirichlet Analysis (LDA)
• PCA using Python

Model Building Exercises in Python


-- Un-Supervised Learning --

18. K Means Clustering
• Parametric Methods Recap
• Clustering
• Direct Clustering Method
• Mixture densities
• Classes v/s Clusters
• Non-Hierarchical Clustering
• K-Means
• Distance Metrics
• K-Means Algorithm
• K-Means Objective
• Color Quantization
• Vector Quantization
• Encoding/Decoding
• Soft Clustering
• Expectation Maximization (EM)
• EM Algorithm
• Feature Selection vs Extraction
• Seed Choice
• Uses of Clustering
• Clustering as Pre-processing
• K-Means Clustering using Python

19. Time Series
• The Art of Forecasting
• Forecasting Approaches
• Qualitative Forecasting Methods
• Quantitative Forecasting Methods
• Time Series & its Components
- Trend
- Cyclical
- Seasonal
- Irregular
• Smoothing Methods
- Moving Average Method
- Exponential Smoothing Method
• Forecast Effect of Smoothing Coefficient
• Linear Time-Series Forecasting Model
• Forecast using Trend Models
• The Linear Trend Model
• Time Series Plot
• Seasonality Plot
• Trend Analysis
• Quadratic Time-Series Forecasting Model
• Quadratic Time-Series Model Relationships
• Quadratic Trend Model
• Exponential Time-Series Forecasting Model
• Exponential Weight
• Exponential Trend Model
• Autoregressive Modeling
• Time Series Data Plot
• Auto-correlation Plot
• Evaluating Forecasts
• Quantitative Forecasting Steps
• Forecasting Guidelines
• Pattern of Forecast Error
• Residual Analysis
• Time Series Using Python


20. Deep Learning / Neural Network
• Introduction to Deep Learning
• Historical Context
• Advances in Related Fields
• Pre-requisites
• Deep Learning Frameworks
• Introduction of each Framework
• TensorFlow, Theano, Keras, PyTorch
• Architecture of each Framework
• How to choose a Framework & when
• Artificial Neural Network
• Neuron & Perceptron
• Natural Language Processing (NLP) & NLTK
• Natural Language Understanding (NLU)
• Natural Language Generation (NLG)
• Forward Propagation
• Backward Propagation
• Image Processing
• Types of Aggregation Graphs
• Anomaly Detection
• Use Cases

21. Machine Learning in Cloud
• Machine Learning Services & Features
• R and Python Scripting in Cloud
• Hands on labs
- Accessing Machine Learning Services
- Getting Data
- Preparation of Data
- Selecting and Applying Machine Learning Algorithm
- Publishing Models

22. Tableau
22.1 Tableau Fundamentals
- Uses of Tableau
- Tableau Installation
- Help Menu and Samples
- Connecting to data
- Dimensions and measures
- Tableau interfaces
- Single table & Multiple table
- Copy and Paste

22.2 Data Visualization with Tableau
- Features of Tableau
- Exporting Data
- Connecting Sheets
- Making Basic Visualization
- Making sense out of visuals
- Bullet Charts
- Dual Axis
- Reference Lines
- Pareto Charts
- Waterfall Charts
- Joins
- Other Advanced Techniques

22.3 Conditional Formatting & Scripting
- How to make Charts with Conditions
- Calculated Fields
- Drill Down
- Drill in Effects
- Date/Time Manipulations

22.4 Dashboard Integration
- Dashboard Designing
- Action Settings
- Linking Charts and Storytelling

- Capstone Project


23. Big Data and Hadoop Fundamentals
• What is Big Data?
• Challenges of Big Data
• Traditional Data Vs Big Data
• What is Hadoop?
• Architecture
• Hadoop Ecosystem

24. HDFS [Hadoop Distributed File System]
• File System Overview
• Distributed File System
• Hadoop Distributed File System
• Salient Features
• HDFS Architecture
• NameNode, Secondary NameNode
• DataNode
• Reading & Writing data to HDFS
• HDFS Heartbeats
• HDFS High Availability
• Architecture of NameNode
• Data Integrity, Failure Handling, Hadoop API

25. Hive and HiveQL
• Hive Overview
• Hive Architecture
• Execution Flow & Compiler
• Hive Metastore
• Data Model
• Physical Layout
• Hive DDL & DML Commands
• Partitions & Buckets
• Querying in Hive
• Join & Join Optimizations
• Hive Built-in Functions
• Extensibilities
• UDF
• RDBMS (SQL) Vs Hive (HQL)
• Use Cases

26. Spark & Spark ML
• Spark Introduction
• Spark Architecture
• Important components in Spark
• Introducing RDDs
• RDD Basics
• RDDs of Key / Value Pairs
• Transformations and Actions
• RDD APIs
• RDD Examples
• Spark SQL Overview
• Spark ML

Special Sessions on:
- Interview Preparation
- Resume Building
- Mock Interviews
- Post-Training Engagement


27. Pre-requisites and Setup


• Setup on Eclipse / PyCharm
• Numpy and SciPy

28. Introduction to Keras


• Overview of Keras
• Installation Procedure
- Dependencies
- TensorFlow backend
- Theano backend
• Guiding Principles
- Modularity
- Minimalism
- Easy Extensibility
- Work with Python
• When to use Keras?
• Code Examples


31. TensorFlow
• TensorFlow installation
• Introduction to TensorFlow
• TensorFlow APIs
• Tensors
• Importing TensorFlow
• Building & Running a computational graph
• Variables: Create, Initialize, Save, and Load
• Tensor Ranks, Shapes, and Types
• Sharing Variables
• Reading Data
• Supervisor: Training Helper for Days-Long Trainings
• TensorFlow Debugger (tfdbg) Command-Line-Interface Tutorial: MNIST
• How to Use TensorFlow Debugger (tfdbg) with tf.contrib.learn
• Exporting and Importing a MetaGraph
• TensorFlow Version Semantics
• TensorFlow Data Versioning: GraphDefs and Checkpoints
• TensorBoard: Suite of visualization tools

32. Convolutional Neural Networks (CNN)
• Convolution Operation
• Pooling Operation
• Convolution-Detector-Pooling Building Block
• Convolution Variants
• Intuition behind CNNs

33. Recurrent Neural Networks (RNN)
• RNN Basics
• Training RNNs
• Bidirectional RNNs
• Gradient Explosion and Vanishing
• Gradient Clipping
• Long Short-Term Memory

34. Autoencoders

35. Custom Metrics

36. Use Cases
• Natural Language Processing [NLP]
• Object Recognition

- Frequently Asked Interview Questions
