Data Science Training Course Syllabus

Course Content
1. Introduction to Data Science
Learning Objectives - This module gives you an understanding of Big Data and of the roles and
responsibilities of a Data Scientist. You will learn how Hadoop and R are used in Big Data analytics and
which methodologies are used in the analysis. The module covers common Big Data and
non-Big Data problems and the Data Science methods available to solve them. We will also
work through a few real-life data sets that a Data Scientist encounters in day-to-day work, using
tools including R and Hadoop.
Topics - Introduction to Big Data, Roles played by a Data Scientist, Analyzing Big Data using Hadoop and
R, Methodologies used for analysis, the Architecture and Methodologies used to solve Big Data
problems (for example, Data Acquisition from various sources, Data Preparation, Data Transformation
using MapReduce (RMR), Application of Machine Learning Techniques, Data Visualization etc.), Problem
statements for a few Data Science problems that we will solve during the course.

2. Basic Data Manipulation using R

Learning Objectives - In this module, you will learn the various data manipulation techniques using R.
Topics - Understanding vectors in R, Reading data, Combining data, Subsetting data, Sorting data and
some basic data generation functions.

3. Machine Learning Techniques Using R Part-1

Learning Objectives - In this module, you will get an overview of Machine Learning algorithms and of
Supervised and Unsupervised Learning Techniques.
Topics - Machine Learning Overview, ML Common Use Cases, Understanding Supervised and
Unsupervised Learning Techniques, Clustering, Similarity Metrics, Distance Measure Types: Euclidean,
Cosine Measures, Creating predictive models.
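As an illustration of the two distance measures named above, here is a minimal sketch. The course itself works in R; Python is used here only for illustration, and the function names are hypothetical, not taken from the course material.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Euclidean distance depends on vector magnitude, while cosine similarity depends only on direction, which is why the latter is often preferred for comparing text documents of different lengths.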

GIIT Getin IT Solutions
Marathahalli, Bangalore,
9845563072 / 78..

4. Machine Learning Techniques Using R Part-2
Learning Objectives - In this module, you will learn Unsupervised Machine Learning Techniques and the
implementation of different algorithms, for example, K-Means Clustering, TF-IDF and Cosine Similarity.
Topics - Understanding K-Means Clustering, Understanding TF-IDF and Cosine Similarity and their
application to Vector Space Model, Implementing Association rule mining in R.
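To make the TF-IDF topic concrete, the sketch below computes TF-IDF weights for a tiny made-up corpus. It assumes one common weighting (term frequency as count divided by document length, inverse document frequency as the natural log of N over document frequency); other variants exist, and the syllabus does not say which one the course uses. The course works in R; Python and the name `tf_idf` are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = term count / document length and idf = ln(N / df).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical three-document corpus, already split into lowercase tokens.
docs = [
    ["big", "data", "with", "hadoop"],
    ["data", "science", "with", "r"],
    ["machine", "learning", "with", "r"],
]
vectors = tf_idf(docs)
```

A term that occurs in every document, such as "with" here, receives a weight of zero, while a term unique to one document receives the highest weight. The resulting weight vectors are what the Vector Space Model compares using cosine similarity.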

5. Machine Learning Techniques Using R Part-3

Learning Objectives - In this module, you will learn the Supervised Learning Techniques and the
implementation of various Techniques, for example, Decision Trees, Random Forest Classifier etc.
Topics - Understanding the Process flow of Supervised Learning Techniques, Decision Tree Classifier, How
to build Decision Trees, Random Forest Classifier, What Random Forests are, Features of Random Forest,
Out-of-Bag Error Estimate and Variable Importance, Naive Bayes Classifier.
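Building a decision tree involves scoring candidate splits. The sketch below shows Gini impurity, one widely used splitting criterion; the syllabus does not state which criterion the course teaches, so treat this as an assumption. Python is used for illustration only (the course uses R), and the function names are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate two-way split."""
    total = len(left) + len(right)
    return ((len(left) / total) * gini_impurity(left)
            + (len(right) / total) * gini_impurity(right))
```

A tree builder using this criterion would evaluate `split_impurity` for each candidate split and choose the split with the lowest value; a perfectly separating split scores zero.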

6. Introduction to Hadoop Architecture

Learning Objectives - In this module, you will learn the HDFS Architecture, the MapReduce Paradigm and
a few data acquisition techniques in Hadoop.
Topics - Hadoop Architecture, Common Hadoop commands, MapReduce and Data loading techniques
(Directly in R and in Hadoop using Sqoop, Flume, and other Data Loading Techniques), Removing
anomalies from the data.

7. Integrating R with Hadoop

Learning Objectives - In this module, you will learn the methods to integrate two popular open-source
software tools for Big Data analytics: R and Hadoop. You will also learn techniques to write your own
Mappers and Reducers.
Topics - Integrating R with Hadoop using RHadoop and the RMR package, Exploring RHIPE (R and Hadoop
Integrated Programming Environment), Writing MapReduce Jobs in R and executing them on Hadoop.
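The Mapper and Reducer roles can be illustrated with the classic word-count job. The sketch below is a plain in-memory simulation of the map, shuffle and reduce steps; it does not use Hadoop, RMR or RHIPE, and it is written in Python for illustration even though the course writes these jobs in R. All function names are hypothetical.

```python
from collections import defaultdict

def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())

counts = run_job(["R and Hadoop", "Hadoop and Mahout"])
```

On a real cluster the framework performs the shuffle and runs many mapper and reducer instances in parallel; the programmer supplies only the map and reduce functions.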

8. Mahout Introduction and Algorithm Implementation
Learning Objectives - In this module, you will understand the Apache Mahout Machine Learning library and
gain an insight into the methods to achieve Parallel Processing using Algorithms in Mahout.
Topics - Implementing Machine Learning Algorithms on larger Data Sets with Apache Mahout.

9. Additional Mahout Algorithms and Parallel Processing using R

Learning Objectives - In this module, you will learn how to implement a Random Forest Classifier with
a Parallel Processing Library in R.
Topics - Implementation of different Mahout algorithms, Random Forest Classifier with a parallel
processing Library in R.

10. Project
Learning Objectives - In this module, you will learn various approaches to solving a Data Science problem
and how different technologies and tools (R, Hadoop, Mahout) work together in a typical Data Science
project.
