
SMEC

Detailed Syllabus

Course Name: Master in Data Science

Key Features:
• Experiential Learning
• Offline/Online Classes
• Case studies and assignments
• Hands-on Projects
• Mentoring Sessions
• Job Assistance

Learning Pathway:
• Learn Python programming from scratch
• Statistical and mathematical essentials for Data Science
• Data Science with Python
• Machine Learning
• Natural language processing
• Database
• Django, Flask Application Development
• Data visualization techniques

Program Outcomes:
• Deep understanding of data structures and data manipulation.
• Understand and use linear and non-linear regression models and classification techniques for data
analysis.
• Comprehensive knowledge of supervised, unsupervised and reinforcement learning models
such as linear regression, logistic regression, clustering, decision trees, naive Bayes, support vector
machines, random forests, K-NN and K-means.
• Gain expertise in mathematical computing using the NumPy and Scikit-Learn packages.
• Gain expertise in exploratory data analysis using pandas, Matplotlib and Seaborn.
• Gain expertise in time series modeling.
• Understand deep reinforcement learning techniques applied in Natural Language Processing.
• Understand the different components of the Hadoop ecosystem; learn to work with HBase, its
architecture and data storage, and the difference between HBase and an RDBMS; and use Hive
for partitioning.
• Understand MapReduce and its characteristics.

www.smeclabs.com

1. Learn Python Programming from Scratch
Programming is an increasingly important skill, and this program will establish your proficiency in
basic programming concepts. By the end of the program, you will understand basic programming concepts
such as data types, variables, strings, loops and functions; object-oriented programming; and software
engineering using Python. The module includes 25+ practice sessions covering all topics.

1.1 Objectives:

• Gain fundamental knowledge of programming basics.
• Understand core programming concepts such as data types, variables, strings, loops, lists and
functions, and the object-oriented programming principles behind classes.
• Comprehend software engineering concepts, using Python.

1.2 Program curriculum:

• Course Introduction
• Programming

2. Statistical and Mathematical Essentials for Data Science

Statistics is the science of collecting, classifying and analyzing data and of quantifying uncertainty
with probability. A foundational part of Data Science, this session will enable you to define statistics
and its essential terms, explain measures of central tendency and dispersion, and comprehend skewness,
correlation, regression and distributions. Understanding the data is key to performing exploratory
data analysis and justifying your conclusions about a business or scientific problem.

2.1 Objectives:

• Understand the fundamentals of statistics
• Work with different types of data
• Plot different types of data
• Calculate the measures of central tendency, asymmetry and variability
• Calculate correlation and covariance
• Distinguish and work with different types of distributions
• Estimate confidence intervals
• Perform hypothesis testing
• Make data-driven decisions
• Understand the mechanics of regression analysis
• Carry out regression analysis
• Use and understand dummy variables
• Apply these statistical concepts to Data Science work in Python

2.2 Program Curriculum:

1. Introduction
2. Sample or Population Data?
3. The Fundamentals of Descriptive Statistics
4. Measures of Central Tendency, Asymmetry, and Variability
5. Practical Example: Descriptive Statistics
6. Distributions
7. Estimators and Estimates
8. Confidence Intervals
9. Practical Example: Inferential Statistics
10. Hypothesis Testing: Introduction
11. Practical Example: Hypothesis Testing
12. The Fundamentals of Regression Analysis
13. Assumptions for Linear Regression Analysis
14. Dealing with Categorical Data
15. Practical Example: Regression Analysis

3. Data Science with Python

Perform hands-on data analysis in a lab environment based on Jupyter Notebook and PyCharm, and create
your own Data Science projects. You will learn the essential concepts of Python programming and gain
in-depth knowledge of data analytics, Machine Learning, data visualization, web scraping and natural
language processing. Python is a required skill for many Data Science positions.

3.1 Objectives:

• Write your first Python program using variables, strings, functions, loops and conditions
• Understand lists, sets, dictionaries, conditions and branching, and objects and classes
• Work with data in Python: read and write files, and load, process and save data with pandas
• Gain an in-depth understanding of Data Science processes: data wrangling, data exploration, data
visualization, hypothesis building and testing
• Install the required Python environment and other auxiliary tools and libraries
• Understand the essential concepts of Python programming such as data types, tuples, lists,
dictionaries, basic operators and functions
• Perform high-level mathematical computing using the NumPy package and its vast library of
mathematical functions

• Perform data analysis and manipulation using the data structures and tools provided in the pandas
package
• Gain expertise in Machine Learning using the Scikit-Learn package
• Gain an in-depth understanding of supervised and unsupervised learning models such as
linear regression, logistic regression, clustering, dimensionality reduction, K-NN and pipelines
• Use Python's Matplotlib library for data visualization
• Extract useful data from websites by performing web scraping using Python

3.2 Program Curriculum:

1. Python Basics
2. Python Data Structures
3. Python Programming Fundamentals
4. Working with Data in Python
5. Data Science Overview
6. Data Analytics Overview
7. Statistical Analysis and Business Applications
8. Python Environment Setup and Essentials
9. Mathematical Computing with Python (NumPy)
10. Data Manipulation with Pandas
11. Machine Learning with Scikit-Learn
12. Natural Language Processing with Scikit-Learn
13. Data Visualization in Python using Matplotlib
14. Web Scraping with Beautiful Soup
15. Working with NumPy Arrays

4. Machine Learning
This module builds expertise in Machine Learning, a subfield of Artificial Intelligence that automates
data analysis so that computers can learn and adapt through experience to perform specific tasks without
being explicitly programmed. You will master Machine Learning concepts and techniques, including
supervised and unsupervised learning, their mathematical and heuristic aspects, and hands-on modeling to
develop algorithms, preparing you for a role that demands advanced Machine Learning knowledge.

4.1 Objectives:

• Master the concepts of supervised and unsupervised learning, recommendation engines and time
series modeling
• Gain practical mastery over principles, algorithms and applications of Machine Learning through
hands-on projects
• Acquire thorough knowledge of the statistical and heuristic aspects of Machine Learning

• Implement models such as support vector machines, kernel SVM, naive Bayes, decision tree
classifiers, random forest classifiers, logistic regression, K-means clustering and more in Python
• Validate Machine Learning models, interpret accuracy metrics, and improve the final models with
optimization techniques
• Comprehend the theoretical concepts and how they relate to the practical aspects of Machine
Learning

4.2 Program Curriculum:

1. Introduction to Artificial Intelligence and Machine Learning
2. Data Wrangling and Manipulation
3. Supervised Learning
4. Feature Engineering
5. Supervised Learning Classification
6. Unsupervised Learning
7. Time Series Modeling
8. Ensemble Learning
9. Recommender Systems
10. Text Mining
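To make the supervised-learning and model-validation topics concrete, the following is a minimal sketch of a K-nearest-neighbours classifier with a train/test split and an accuracy metric, written in plain Python rather than Scikit-Learn. The two-group data set is invented for illustration.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: (features, label) pairs forming two well-separated groups
data = [((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"), ((1.3, 1.1), "A"),
        ((0.9, 0.7), "A"), ((5.0, 5.1), "B"), ((5.2, 4.8), "B"), ((4.9, 5.3), "B"),
        ((5.1, 5.0), "B"), ((4.8, 4.9), "B")]

# Train/test split: shuffle with a fixed seed, keep 70% for training and 30% for testing
random.seed(0)
random.shuffle(data)
split = int(len(data) * 0.7)
train, test = data[:split], data[split:]

predictions = [knn_predict(train, features) for features, _ in test]
acc = accuracy([label for _, label in test], predictions)
print(f"test accuracy: {acc:.2f}")
print(knn_predict(train, (1.0, 1.0)))  # a new point near group A
```

In Scikit-Learn the same workflow is usually expressed with train_test_split, KNeighborsClassifier and accuracy_score; the hand-written version simply exposes what those steps do.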

5. Database
A database is an organized collection of structured information, or data, typically stored electronically in
a computer system. A database is usually controlled by a database management system (DBMS).
Company data is stored in databases and later retrieved using Python to build analytics and derive
insights into business problems.

5.1 Objectives:
• Understand the fundamentals of SQL databases
• Learn methods to structure and configure your database
• Author efficient SQL statements and clauses
• Manage your SQL database
5.2 Program curriculum:
1. Introduction to SQL
2. Database Normalization and Entity-Relationship (ER) Model
3. Installing and Configuring MySQL
4. Understanding Database and Tables
5. Learn Operators, Constraints, and Data Types
6. Understanding functions, Subqueries, Operators, and Derived Tables in SQL
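The following sketch illustrates constraints, subqueries and derived tables. It uses SQLite through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server); the orders table and its rows are hypothetical.

```python
import sqlite3

# SQLite (bundled with Python) stands in for MySQL; MySQL-specific features are not shown
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data types and constraints: NOT NULL and CHECK reject invalid rows
cur.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    amount REAL NOT NULL CHECK (amount >= 0)
)""")
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 40.0), ("alice", 80.0), ("carol", 300.0), ("bob", 60.0)],
)

# Subquery in WHERE: orders larger than the average order amount
cur.execute("SELECT customer, amount FROM orders WHERE amount > (SELECT AVG(amount) FROM orders)")
big_orders = cur.fetchall()

# Derived table in FROM: compute per-customer totals first, then filter on the aggregate
cur.execute("""
    SELECT customer, total
    FROM (SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer) AS totals
    WHERE total >= 100
    ORDER BY total DESC
""")
top_customers = cur.fetchall()
conn.close()

print(big_orders)
print(top_customers)
```

The average order here is 120.0, so only carol's 300.0 order passes the subquery filter, while the derived table ranks carol (300.0), alice (200.0) and bob (100.0).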

6. Flask Web Development
Flask is a microframework designed to let developers create and scale web apps quickly and simply.
It is built on WSGI (the Web Server Gateway Interface), the standard way for Python web servers to pass
requests to web applications or frameworks.
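To illustrate the WSGI idea, here is a minimal sketch of a WSGI application written with only the standard library. It is not Flask code: the ROUTES dictionary and the home and about views are hypothetical, and they play the role that Flask's routing system fills for you.

```python
from wsgiref.util import setup_testing_defaults

def home(environ):
    return "200 OK", "Home page"

def about(environ):
    return "200 OK", "About page"

# Hypothetical routing table mapping URL paths to view functions;
# in Flask, @app.route("/about") builds a mapping like this for you.
ROUTES = {"/": home, "/about": about}

def application(environ, start_response):
    """A minimal WSGI application: the server calls it once per request."""
    view = ROUTES.get(environ.get("PATH_INFO", "/"))
    if view is None:
        status, body = "404 Not Found", "Not found"
    else:
        status, body = view(environ)
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]  # a WSGI app returns an iterable of bytes

def call(path):
    """Invoke the app directly, without a server, by faking the WSGI environ."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body

print(call("/about"))
print(call("/missing"))
```

To serve the app over HTTP for local testing, the standard library's wsgiref.simple_server.make_server("127.0.0.1", 8000, application) returns a server whose serve_forever() method handles requests. A Flask app object is itself a WSGI application, so WSGI servers such as Gunicorn can run either.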

6.1 Objectives:
• Understand the fundamentals of Flask
• Install Flask and create a Flask application
• Understand the structure and scope of a Flask application
• Deploy a Flask application
6.2 Program Curriculum:
1. Routing: Flask's routing system maps URLs to Python functions, allowing you to define your
application's URL structure.
2. Templates: Flask uses Jinja2 as its template engine, allowing you to easily generate dynamic HTML
pages.
3. Forms: Flask exposes submitted form data through its request object, and extensions such as Flask-WTF make it easy to validate user input.
4. Sessions and cookies: Flask's session and cookie management features allow you to store user data
and maintain state across multiple requests.
5. Database integration: Flask can be used with a variety of databases, including MySQL.
6. RESTful APIs: Flask is a popular choice for building RESTful APIs due to its simplicity and
flexibility.
7. Extensions: Flask has a wide range of extensions available for adding functionality to your
application, including Flask-WTF, Flask-Security, and Flask-Mail.
8. Flask-Login: Flask-Login is an extension that provides user session management in Flask
applications, handling logging users in and out and remembering their sessions.
9. Flask-RESTful: Flask-RESTful is an extension for building RESTful APIs with Flask, providing
additional features for API development.
10. Deployment: Flask can be deployed to a variety of environments, including traditional web hosting,
cloud-based services, and containers.

7. FastAPI

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python. Creating
APIs is an important part of making your software accessible. In machine learning, APIs allow different
applications to share data and work together, saving time and effort.

7.1 Objectives:

• Construct Python microservices with FastAPI
• Build FastAPI microservices that serve machine learning predictions
• Deploy Python Lambda microservices
• Understand HTTP methods (GET, POST and DELETE requests), async/await, user modeling, database
usage, HTTP status codes and raising exceptions
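FastAPI lets endpoints be declared with async def so that the server can handle other requests while one is waiting on I/O. The sketch below shows only the underlying asyncio idea, not FastAPI itself; fetch_prediction and the 0.2-second delay are hypothetical stand-ins for a database or model-server call.

```python
import asyncio
import time

async def fetch_prediction(request_id: int, delay: float) -> dict:
    """Hypothetical I/O-bound handler, e.g. waiting on a database or a model server."""
    await asyncio.sleep(delay)  # yields control so other requests can make progress
    return {"request_id": request_id, "status": 200}

async def main() -> list:
    # Three requests that each wait 0.2 s are awaited concurrently
    return await asyncio.gather(*(fetch_prediction(i, 0.2) for i in range(3)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f} s")  # close to 0.2 s, not the 0.6 s a sequential loop would take
```

The gain only applies to waiting (network, disk, database); CPU-heavy work such as running a large model inside the handler still blocks the event loop.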

7.2 Program Curriculum:

• Construct Python Microservices with FastAPI
• Understanding FastAPI and Uvicorn
• Installation and Creating Your First API
• Path Parameters
• Query Parameters
• Combining Path and Query Parameters
• Query parameters and string validations
• Path parameters and numeric validations
• Body: multiple parameters, fields and nested models
• Declaring request example data
• Cookie parameters
• Response status codes
• Handling errors
• JSON compatible encoder, security
• Bigger applications: multiple files
• Testing and debugging

8. Django Web Development
Django is a Python-based web framework that allows you to quickly create efficient web applications.
It ships with built-in features such as the Django admin interface and a default SQLite (SQLite3)
database, giving you ready-made components for rapid development.

8.1 Objectives:
• Understand the fundamentals of Django
• Understand Django's MVT (Model-View-Template) architecture and its influence
• Install Django and create a Django website
• Deploy a Django application

8.2 Program Curriculum:

1. Models: Django's object-relational mapping (ORM) tool is used to define models, which are Python
classes that represent database tables.
2. Views: Views are Python functions (or classes) that handle HTTP requests and return HTTP responses.
3. Templates: Django's template system allows you to define the structure and layout of your web
pages.
4. Forms: Django provides a powerful form handling system, which makes it easy to process user input
and validate data.
5. Admin site: Django comes with a built-in admin site, which provides an easy-to-use interface for
managing your application's data.
6. Authentication: Django provides built-in authentication mechanisms to handle user authentication
and authorization.
7. Middleware: Middleware is a mechanism in Django that allows you to process requests before they
reach a view and responses after the view returns them.
8. URLs: Django's URL routing system allows you to map URLs to views and organize your
application's URLs.
9. Testing: Django provides a built-in testing framework that makes it easy to write tests for your
application.
10. Deployment: Django can be deployed to a variety of environments, including traditional web
hosting, cloud-based services, and containers.
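Middleware wraps the view in layers: each layer can act on the request on the way in and on the response on the way out, or return early. The sketch below imitates the shape of Django's middleware (a class that receives get_response in __init__ and handles the request in __call__) using plain dictionaries instead of Django's HttpRequest and HttpResponse, so it runs without Django installed; the two middleware classes and the view are hypothetical.

```python
import time

class TimingMiddleware:
    """Outer layer: measures how long the rest of the chain takes."""

    def __init__(self, get_response):
        self.get_response = get_response  # the next middleware, or finally the view

    def __call__(self, request):
        start = time.perf_counter()  # code here runs before the view
        response = self.get_response(request)
        # code here runs after the view (or an inner layer) has produced a response
        response["headers"]["X-Elapsed"] = f"{time.perf_counter() - start:.6f}"
        return response

class RequireUserMiddleware:
    """Inner layer: short-circuits with 403 so the view never runs for anonymous requests."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if not request.get("user"):
            return {"status": 403, "headers": {}, "body": "Forbidden"}
        return self.get_response(request)

def hello_view(request):
    """Hypothetical view: takes a request, returns a response."""
    return {"status": 200, "headers": {}, "body": f"Hello, {request['user']}"}

# The first class listed is the outermost layer, mirroring the order of Django's MIDDLEWARE setting
MIDDLEWARE = [TimingMiddleware, RequireUserMiddleware]

handler = hello_view
for middleware_class in reversed(MIDDLEWARE):
    handler = middleware_class(handler)

ok = handler({"user": "alice"})
denied = handler({"user": None})
print(ok["status"], ok["body"])
print(denied["status"], denied["body"])
```

Because TimingMiddleware is outermost, it still adds its header even when the inner layer rejects the request.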

9. Big Data Analytics

9.1 Objectives:
• Understand the concepts of big data in the industry
• Explore the sources of big data and the problems associated with solving big data problems
• Understand how big data and the Hadoop framework can solve a wide range of problems
• Understand Hadoop's architecture and primary components, such as MapReduce and the Hadoop
Distributed File System (HDFS)
• Learn to add and remove nodes from Hadoop clusters, check the available disk space on each
node, and modify configuration parameters
• Perform data analytics with Scala
• Perform big data analytics using Spark

9.2 Program Curriculum:

1. Introduction to Hadoop, Architecture, Administration and Components
2. Understanding NoSQL Databases: HBase
3. Understand basic object-oriented programming methodologies in Scala
4. Introduction to Scala, Basics of Functional Programming, Case Objects and Classes, Collections
5. Introduction to Spark, Working with RDDs in Apache Spark, Processing Real-Time Data
6. Perform DataFrame operations in Spark using SQL queries
7. Spark MLlib: Modelling Big Data with Spark
8. Introduction to Spark GraphX
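MapReduce, covered in the Hadoop module, splits a job into a map phase, a shuffle/sort phase and a reduce phase. The word-count sketch below runs all three phases in a single Python process to show the data flow; it does not use Hadoop's APIs, and the input lines are invented.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input; in Hadoop each mapper would receive one input split
lines = ["Big data needs big tools", "Spark and Hadoop process big data"]

mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

In a real cluster, many mappers and reducers run in parallel on different nodes, and the shuffle moves data across the network; Spark's RDD operations (map, reduceByKey) express the same pattern in memory.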

10. Natural Language Processing (AI)

This Natural Language Processing and Speech Recognition course gives you a detailed look at the
science of applying machine learning algorithms to process large amounts of natural language data. The
module focuses primarily on natural language understanding, feature engineering, natural language
generation, automatic speech recognition, and speech-to-text and text-to-speech conversion.

10.1 Objectives:

• Master the concepts of text mining and natural language processing
• Understand the NLP pipeline
• Understand tokenization, sentence splitting, stemming and lemmatization in NLP
• Implement models to perform natural language processing using NLTK
• Build natural language models for tasks such as sentiment analysis, summarization and identifying
words in text
• Understand use cases for implementing NLP in different domains
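The sketch below illustrates sentence splitting, tokenization and stemming with regular expressions and a deliberately simple suffix-stripping rule. It is a toy stand-in, not NLTK: in NLTK these steps are typically done with sent_tokenize, word_tokenize and PorterStemmer, while lemmatization (for example WordNetLemmatizer) maps words to dictionary forms instead of chopping suffixes. The sample text is made up.

```python
import re
from collections import Counter

text = "Cats are running. The cat runs fast! Dogs chase cats."

# Sentence splitting: break after '.', '!' or '?' when followed by whitespace (naive)
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Tokenization: lower-case the text and keep runs of letters/digits (drops punctuation)
tokens = re.findall(r"[a-z0-9]+", text.lower())

def toy_stem(word: str) -> str:
    """Toy suffix stripper for illustration only; real stemmers such as Porter's use many more rules."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant, e.g. "runn" -> "run"
            if len(stem) > 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

stems = [toy_stem(t) for t in tokens]
freq = Counter(stems)  # bag-of-words counts over the stemmed tokens

print(sentences)
print(stems)
print(freq.most_common(2))
```

Stemming lets "cats"/"cat" and "running"/"runs" count as the same feature, which is why it often precedes bag-of-words models for text classification.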

10.2 Program Curriculum:

• Introduction to Artificial Intelligence and Machine Learning in NLP
• Perform text analysis, including summarization, information extraction, sentiment mining and
text classification
• Create a basic speech recognizer to convert speech to text
• Study advanced speech and language processing (Speech and Language Processing: An Introduction
to Natural Language Processing, Computational Linguistics and Speech Recognition)
• Implement emotion recognition and sentiment analysis using real-time speech recognition

11. AI, Deep Learning, Computer Vision (OpenCV)

Deep learning is one of the most significant technological advances in artificial intelligence and
machine learning, and its software and tools are designed to provide high-end scientific computing for
solving problems in multiple domains. Learning OpenCV is key to performing image processing, filtering
and manipulation of image data captured from a webcam or generated by any other source. Several
libraries offer AI solutions; for deep learning, TensorFlow with Keras is widely used in industry. This
course builds in-depth knowledge of developing and deploying machine learning and deep learning models,
from scratch to advanced methods including convolutional neural networks (CNNs), recurrent neural
networks (RNNs), long short-term memory (LSTM) networks and transfer learning. You will get hands-on
practice with several deep learning models and tune hyperparameters to improve model performance.
11.1 Objectives:

• Use OpenCV to perform image filtering, thresholding, recognition and similar tasks
• Understand the concepts of TensorFlow and Keras
• Understand low-level data manipulation in TensorFlow
• Understand the language and fundamental concepts of artificial neural networks
• Build a deep learning pipeline to develop and deploy a model from scratch
• Master advanced techniques for hyperparameter tuning of deep learning models
• Understand image recognition pipelines for object detection, face detection and similar tasks
• Understand the power of deep learning compared with classical machine learning algorithms

11.2 Program Curriculum:

1. Introduction to Artificial Intelligence and Machine Learning in Computer Vision
2. Perform various image manipulations for image recognition
3. Learn to build a convolutional neural network
4. Create a basic image recognizer using a convolutional neural network
5. Understanding time series data analytics
6. Create recurrent neural networks (RNNs) and LSTMs to model time series data, such as
cryptocurrency analysis and stock prediction
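The core operation of a convolutional neural network slides a small kernel over the image and sums element-wise products at each position (strictly a cross-correlation, which is what CNN libraries compute). This plain-Python sketch applies a Sobel-style kernel to a hypothetical 5x5 image containing a vertical edge, followed by a ReLU, to show how a single filter produces a feature map.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation as computed by CNN layers: no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the image patch under it, then sum
            total = sum(kernel[di][dj] * image[i + di][j + dj]
                        for di in range(kh) for dj in range(kw))
            row.append(total)
        output.append(row)
    return output

def relu(feature_map):
    """Element-wise ReLU activation: negative responses become zero."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical 5x5 grayscale image: dark left columns, bright right columns (a vertical edge)
image = [[0, 0, 10, 10, 10] for _ in range(5)]

# Sobel kernel for horizontal gradients, which responds to vertical edges
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]

feature_map = relu(conv2d(image, kernel))
for row in feature_map:
    print(row)
```

The output is strongest (40) where the kernel straddles the dark-to-bright boundary and zero over the uniform region; a CNN learns many such kernels from data instead of using hand-picked ones.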

12. Kafka, MQTT and AWS IoT

Kafka and MQTT are two complementary technologies; together they make it possible to build end-to-end
IoT integration from the edge to the data center. MQTT is a widely used, ISO-standardized,
publish-subscribe messaging protocol with many implementations, such as Mosquitto and HiveMQ. MQTT is
mainly used in Internet of Things scenarios (such as connected cars or smart homes). However, MQTT is
not built for high scalability, long-term storage or easy integration with legacy systems. Apache Kafka
is a highly scalable distributed streaming platform that ingests, stores, processes and forwards high
volumes of data from thousands of IoT devices.
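In MQTT, subscribers register topic filters with a broker, and the broker forwards each published message to every matching subscriber. The sketch below implements the standard wildcard rules ('+' matches exactly one topic level, '#' matches the parent level and any remaining levels) and a toy in-memory broker. It is a simplification, not Mosquitto or HiveMQ: networking, QoS, retained messages and the special handling of topics starting with '$' are omitted, and the topics are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: matches everything from here down
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level mismatch ('+' accepts any single level)
    return len(filter_levels) == len(topic_levels)

class ToyBroker:
    """In-memory stand-in for an MQTT broker: routes each publish to matching subscribers."""

    def __init__(self):
        self.subscriptions = []  # (topic_filter, callback) pairs

    def subscribe(self, topic_filter, callback):
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic, payload):
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)

received = []
broker = ToyBroker()
broker.subscribe("home/+/temperature", lambda t, p: received.append(("temps", t, p)))
broker.subscribe("home/#", lambda t, p: received.append(("all", t, p)))
broker.publish("home/kitchen/temperature", "21.5")
broker.publish("home/garage/door", "open")
print(received)
```

Publishers and subscribers never reference each other directly; they only share topic names, which is what makes MQTT convenient for large fleets of devices.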

12.1 Objectives:

• Integrate IoT devices with data analytics
• Use MQTT brokers to integrate IoT devices
• Understand Kafka for large-scale integration
• Design IIoT (Industrial IoT) data architectures using MQTT and Kafka
• Build Data Science pipelines using AWS IoT and HiveMQ

12.2 Program Curriculum:

• Understanding IoT data pipelines for Data Science
• Understanding MQTT and Kafka to build IoT systems
• Configuring MQTT
• Understanding MQTT subscribers, publishers and brokers
• Deploying HiveMQ services with IoT devices
• Understanding AWS IoT Core
• Configuring and managing AWS services
• Deploying IoT devices with AWS IoT Core, AWS IoT Device Management and AWS IoT
Analytics

Technologies:

Python:

Introduction to Python and Computer Programming, Data Types, Variables, Basic Input/Output
Operations, Basic Operators, Boolean Values, Conditional Execution, Loops, Lists and List Processing,
Logical and Bitwise Operations, Functions, Tuples, Dictionaries, Sets, and Data Processing, Modules,
Packages, String and List Methods, and Exceptions, File Handling, Regular Expressions, Database,
The Object-Oriented Approach: Classes, Methods, Objects, and the Standard Objective Features;
Exception Handling, and Working with Files.

Matplotlib:

Scatter plots, Bar charts, Histograms, Stacked charts, Legends, Titles and Styles, Figures and
subplots, Plotting functions in pandas, Labelling and arranging figures, Saving plots.

Seaborn:

Style functions, Color palettes, Distribution plots, Categorical plots, Regression plots, Axis
grid objects.

NumPy

Creating NumPy arrays, Indexing and slicing in NumPy, Downloading and parsing data,
Creating multidimensional arrays, NumPy data types, Array attributes, Creating array views and
copies, Manipulating array shapes, I/O.
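A frequent source of bugs is the difference between array views and copies. This sketch, assuming NumPy is installed, shows that reshaping and basic slicing return views that share memory with the original array, while boolean indexing and .copy() return independent data.

```python
import numpy as np

a = np.arange(12)                # 1-D array: 0, 1, ..., 11
m = a.reshape(3, 4)              # reshape returns a view when possible: same data, new shape
print(m.shape, m.ndim, m.dtype)  # array attributes

row = m[1]                       # basic indexing/slicing returns a view
row[0] = 100                     # ...so writing through the view changes `a` too
print(a[4])                      # 100

evens = a[a % 2 == 0]            # boolean (fancy) indexing returns a copy
evens[0] = -1                    # ...so `a` is unchanged
print(a[0])                      # 0

c = m.copy()                     # explicit copy: fully independent data
print(np.shares_memory(m, a), np.shares_memory(c, a))  # True False
```

When a function should not modify its caller's data, taking an explicit .copy() is the safe default.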

Pandas:

Using multilevel series, Series and DataFrames, Grouping, aggregating, Merging DataFrames,
Generating summaries, Grouping data into logical pieces, Manipulating dates, Creating metrics for analysis,
Data wrangling, Merging and joining, Data munging using pandas, Building a predictive model.
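The sketch below, assuming pandas is installed, chains several of the listed operations: merging two DataFrames, manipulating dates, and grouping and aggregating into a summary. The sales and stores tables are hypothetical.

```python
import pandas as pd

# Hypothetical sales records and a store lookup table
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-15", "2024-02-28"]),
    "store_id": [1, 2, 1, 2, 1],
    "amount": [250.0, 100.0, 300.0, 150.0, 50.0],
})
stores = pd.DataFrame({"store_id": [1, 2], "region": ["North", "South"]})

# Merge (like a SQL LEFT JOIN) to attach each sale's region
merged = sales.merge(stores, on="store_id", how="left")

# Manipulate dates: derive a monthly period to group on
merged["month"] = merged["date"].dt.to_period("M")

# Group into logical pieces and aggregate into summary metrics per region and month
summary = (
    merged.groupby(["region", "month"])["amount"]
    .agg(["sum", "count"])
    .reset_index()
)
print(summary)
```

The resulting summary has one row per region and month, with total sales and the number of transactions, a typical starting point for metrics and predictive features.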

Flask

Flask Configuration, Application creation, Routing, Variable Rules, URL Building,
HTTP methods, Templates, Static Files, Request Object, Sending Form Data to Template,
Sessions, Redirect & Errors, File Uploading.

Django

Django Installation, Django Project, Django Admin Interface, App, MVT, Model, Views,
Templates, File Handling, Forms, Validation, File Upload, Database Connectivity, Database
Migrations, Django Middleware, Request and Response, Django Exceptions, Django Session,
Django Redirects

Scikit-learn:

Scikit-Learn Overview, Plotting a graph, Identifying features and labels, Saving and
opening a model, Classification, Train/test split, What is KNN?, What is SVM?, Linear regression,
Logistic vs linear regression, KMeans, Neural networks, Overfitting and underfitting, Backpropagation,
Cost function and gradient descent, CNNs
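K-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The plain-Python sketch below implements that loop (Lloyd's algorithm) with fixed starting centroids on a made-up data set; Scikit-Learn's KMeans adds smarter initialisation (k-means++) and multiple restarts on top of the same idea.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) with caller-supplied initial centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two groups, and two deliberately poor starting centroids
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why library implementations try several initialisations and keep the one with the lowest within-cluster sum of squares.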

Keras:
Introduction to Deep Learning - Biological Neural Networks, Artificial Neural Networks,
Activation Functions, Introduction to Deep Learning Libraries, Regression Models with Keras,
Classification Models with Keras, Deep Neural Networks, Convolutional Neural Networks, Recurrent
Neural Networks.
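Activation functions decide what each neuron outputs. The sketch below writes three common ones (sigmoid, ReLU and softmax) and a single dense neuron in plain Python so the arithmetic is visible; in Keras the same functions are normally selected by name, for example Dense(10, activation="softmax"). The weights and inputs are arbitrary illustrative values.

```python
import math

def sigmoid(x: float) -> float:
    """Squashes any real number into (0, 1); common for binary-classification outputs."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    """Rectified linear unit: passes positive values through, zeroes out negatives."""
    return max(0.0, x)

def softmax(logits: list[float]) -> list[float]:
    """Turns a vector of scores into probabilities that sum to 1 (multi-class outputs)."""
    shift = max(logits)  # subtracting the max avoids overflow in exp() without changing the result
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense_neuron(inputs, weights, bias, activation):
    """One artificial neuron: weighted sum of inputs plus bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

print(sigmoid(0.0), relu(-3.0), relu(2.5))
print(softmax([1.0, 2.0, 3.0]))
print(dense_neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid))
```

Stacking many such neurons into layers, and learning the weights by backpropagation, is what a Keras Sequential model does.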

TensorFlow

Introduction to TensorFlow, Hello World with TensorFlow, Linear Regression, Nonlinear
Regression, Logistic Regression, Activation Functions, Convolutional Neural Networks (CNN), CNN
History, Understanding CNNs, CNN Applications, Distributed Computing, Exporting, Multi-Layer
Perceptron Learning, Hidden Layers of Perceptron, Optimizers, Gradient Descent
Optimization, Forming Graphs, Image Recognition using TensorFlow
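Gradient descent, listed under optimizers, repeatedly moves the parameters a small step against the gradient of the cost function. This sketch fits y = wx + b to hypothetical noise-free data generated from y = 2x + 1, using hand-derived mean-squared-error gradients in plain Python; TensorFlow automates the gradient computation (for example with tf.GradientTape) and the update (for example with tf.keras.optimizers.SGD).

```python
# Hypothetical data generated from y = 2x + 1 (no noise)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0       # parameters to learn, starting from zero
learning_rate = 0.05
n = len(xs)

for _ in range(2000):
    # Prediction errors and the mean-squared-error gradients with respect to w and b
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
    grad_b = (2 / n) * sum(errors)
    # Gradient descent update: step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

mse = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n
print(f"w={w:.3f}, b={b:.3f}, mse={mse:.6f}")
```

The learning rate matters: for this data, a rate above roughly 0.15 makes the updates diverge instead of converge.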

NLTK

Introduction to NLP, Linguistic Resources, Word Level Analysis, Syntactic Analysis,
Semantic Analysis, Word Sense Disambiguation, Natural Language Discourse Processing, Part-of-Speech
(PoS) Tagging, Natural Language Processing - Inception, NLP - Information Retrieval,
Applications of NLP, Natural Language Processing - Python (NLTK)

MySQL

MySQL – Introduction, Installation, Create Database, Drop Database, Selecting Database, Data
Types, Create Tables, Drop Tables, Insert Query, Select Query, WHERE Clause, Update Query,
DELETE Query, LIKE Clause, Sorting Results, Using Joins, Handling NULL Values, ALTER
Command, Aggregate functions, MySQL Clauses, MySQL Conditions.
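Joins and NULL handling are where SQL results most often surprise newcomers. This sketch, using SQLite via Python's sqlite3 module as a stand-in for MySQL, contrasts an inner join with a LEFT JOIN ... IS NULL query and shows COALESCE replacing NULLs; the customers and orders tables are hypothetical.

```python
import sqlite3

# SQLite (bundled with Python) stands in for MySQL; these statements are standard SQL
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers (id, name, city) VALUES (1, 'Asha', 'Pune'), (2, 'Ben', NULL), (3, 'Chen', 'Delhi');
    INSERT INTO orders (customer_id, amount) VALUES (1, 50.0), (1, 70.0), (3, 20.0);
""")

# INNER JOIN: only customers that have at least one matching order
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# LEFT JOIN + IS NULL: customers with no orders ("= NULL" would never match, since NULL compares as unknown)
no_orders = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

# COALESCE substitutes a default value for NULL
cities = conn.execute("SELECT name, COALESCE(city, 'unknown') FROM customers ORDER BY id").fetchall()
conn.close()

print(totals)
print(no_orders)
print(cities)
```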

MongoDB

No Schema, Install MongoDB, How MongoDB Works, Insert First Data, CRUD
Operations, Insert Many, Update and Update Many, Delete and Delete Many, Diving Deep into find,
Difference between Update and Update Many, Projection, Intro to Embedded Documents, Embedded
Documents in Action, Adding Arrays, Fetching Data from Structured Data, Schema Types, Types
of Data in MongoDB, Relationships between Data.

Web Scraping:

Scraping webpages, Scraping steps, Beautiful Soup package.
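The curriculum uses Beautiful Soup for scraping. So that the sketch runs without third-party packages, the example below swaps in the standard library's html.parser.HTMLParser to extract link targets and link text from an inline, hypothetical HTML page; Beautiful Soup's find_all("a") offers a more convenient interface for the same task.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from <a> tags using only the standard library."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Hypothetical page content; in practice it would come from an HTTP response body
html = """
<html><body>
  <h1>Courses</h1>
  <a href="/python">Python from Scratch</a>
  <p>Some text <a href="/ml">Machine <b>Learning</b></a></p>
</body></html>
"""

parser = LinkExtractor()
parser.feed(html)
print(parser.links)
```

Text inside nested tags (the <b> element here) is still collected, because the parser only tracks whether it is inside an open <a> tag.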

Java

Features of Java, Java basics, if statement, Loops, Arrays, Switch, Methods, Defining a
class, Access Modifiers, Scope and lifetime of variables, Creating an Object, Object invocation,
Method Overloading, Constructors, this, Inheritance, Overriding, super, final, Local Classes, Anonymous
Classes, Static classes, Inner classes, Nested Classes, Abstract classes, Interfaces, Packages, Access control,
Basic java.lang Package, Exception Handling, java.util Package, Collections Framework, I/O and
streaming, DBMS & RDBMS, JDBC

Hadoop: Hadoop Architecture & HDFS

Introduction to Big Data and Hadoop, Commands to monitor the cluster, Hadoop Architecture,
Distributed Storage (HDFS) and YARN, What is HDFS, Need for HDFS, Regular File System vs HDFS,
Characteristics of HDFS, HDFS Architecture and Components, High Availability Cluster
Implementations, HDFS Component File System Namespace, Data Block Split, Data Replication
Topology, HDFS Command Line, Demo: Common HDFS Commands.
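HDFS stores a file as fixed-size blocks and replicates each block across DataNodes. The arithmetic below is a small sketch of that split-and-replicate calculation, assuming the Hadoop 2.x/3.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); both are configurable per cluster and per file.

```python
import math

BLOCK_SIZE_MB = 128  # default dfs.blocksize in Hadoop 2.x and 3.x
REPLICATION = 3      # default dfs.replication

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate how HDFS splits one file into blocks and replicates them (ignores metadata)."""
    if file_size_mb <= 0:
        return {"blocks": 0, "last_block_mb": 0, "block_replicas": 0, "raw_storage_mb": 0}
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        # The final block is usually partial and only uses the space it needs on disk
        "last_block_mb": file_size_mb - (blocks - 1) * block_size_mb,
        "block_replicas": blocks * replication,        # block copies spread across DataNodes
        "raw_storage_mb": file_size_mb * replication,  # total disk consumed cluster-wide
    }

print(hdfs_footprint(1000))
# {'blocks': 8, 'last_block_mb': 104, 'block_replicas': 24, 'raw_storage_mb': 3000}
```

A 1000 MB file therefore becomes seven full blocks plus one 104 MB block, and triple replication makes it consume about 3000 MB of raw cluster storage.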

HBase
HBase Overview, Data Model, Configuration, Shell, Write, MemStore, General Commands,
Creating a Table using HBase Shell, Creating a Table Using API, Listing a Table using HBase Shell,
Listing Tables Using API, Enabling a Table, Describe & Alter, Drop a Table, Create Data, Update Data,
Read Data, Delete Data.

Hive
Introduction to Hive, Hive SQL over Hadoop MapReduce, Hive Architecture, Interfaces to
Run Hive Queries, Running Beeline from Command Line, Hive DDL and DML, Creating New Table,
Data Types, File Format Types, Data Serialization, Hive Table and Avro Schema, Hive Optimization:
Partitioning, Bucketing and Sampling, Data Insertion, Data Representation and Import Using Hive.

Scala
Basics of Functional Programming and Scala, Functional Programming, Programming With
Scala, Basic Literals and Arithmetic Programming, Logical Operators, Arrays, Lists, Tuples, Sets, Maps,
Type Inference, Classes, Objects, Functions in Scala, Type Inference in Functions, Anonymous Functions
and Classes, Exception Handling, File Operations

Apache Spark

Introduction to Apache Spark, Limitations of MapReduce in Hadoop, Advantages of Spark,
Components of Spark, Spark Architecture, Spark Shell, Application of In-memory Processing, Spark
Cluster, Running Scala Programs in the Spark Shell, RDD.

Apache Kafka

Understanding messaging systems in IoT, Understanding the Apache Kafka Ecosystem, Architecture,
Core Concepts, Producer and Consumer concepts, Extended APIs Overview (Kafka Connect, Kafka
Streams), Topics, Partitions, Brokers, Producers, Consumers, Clusters, Industry use cases and applications
in Data Science pipelines
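Kafka preserves ordering only within a partition, so producers usually send related messages with the same key. The sketch below imitates keyed partitioning with in-memory lists; it uses zlib.crc32 as a stand-in hash, whereas Kafka's default partitioner uses murmur2, so the partition numbers will not match a real cluster. The device IDs and readings are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3

def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition: hash(key) mod number of partitions.

    crc32 is a stand-in for Kafka's murmur2 hash, used here only for illustration.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Each partition is an append-only log; a record's list index plays the role of its offset
partitions = defaultdict(list)

def produce(key: str, value: str) -> tuple[int, int]:
    p = choose_partition(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1  # (partition, offset)

# Hypothetical IoT readings keyed by device ID
for device, reading in [("sensor-1", "20.1"), ("sensor-2", "18.4"), ("sensor-1", "20.3"),
                        ("sensor-3", "22.0"), ("sensor-1", "20.6")]:
    produce(device, reading)

# All readings for one key land in one partition, so their order is preserved
p = choose_partition("sensor-1")
print([value for key, value in partitions[p] if key == "sensor-1"])  # ['20.1', '20.3', '20.6']
```

Spreading different keys across partitions is what lets Kafka scale consumers horizontally, while same-key messages stay in order for each device.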

AWS IoT
Configuring and deploying AWS for IoT data stream processing with services such
as AWS IoT Core, streaming MQTT messages to AWS IoT, AWS IoT Device Management and AWS
IoT Analytics, Understanding AWS IoT APIs and SDKs, Industry use cases and applications in Data
Science pipelines
