big data syllabus


BCA-IOP BADA

Name of The Course: Introduction to Big Data Science
Course Code: BCABA1101

              Theory          Practical
L  T  P  C    IA  MTE  ETE    PR  ETE
3  0  2  4    20  15   30     15  20

Prerequisite

Corequisite

Antirequisite

Course Objectives:

The student should be made to:

Course Outcomes

CO1 Describe what Data Science is and the skill sets needed to be a data scientist.

CO2 Explain in basic terms what Statistical Inference means. Identify probability distributions commonly used as foundations for statistical modeling. Fit a model to data.

CO3 Explain the significance of exploratory data analysis (EDA) in data science. Apply basic tools (plots, graphs, summary statistics) to carry out EDA.

CO4 Describe the Data Science Process and how its components interact. Use APIs and other tools to scrape the Web and collect data.

CO5 Identify and explain fundamental mathematical and algorithmic ingredients that constitute a Recommendation Engine (dimensionality reduction, singular value decomposition, principal component analysis). Build a recommendation system using existing components.

CO6 Describe advances and the latest trends in data science.

Text Book (s)


1. Cathy O'Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline. O'Reilly. 2014.

Reference Book (s)

1. Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1, Cambridge University Press. 2014. (free online)

2. Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020. 2013.

3. Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know about Data Mining and Data-analytic Thinking. ISBN 1449361323. 2013.

4. Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning, Second Edition. ISBN 0387952845. 2009. (free online)

5. Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science. (Note: this is a book currently being written by the three authors. The authors have made the first draft of their notes for the book available online. The material is intended for a modern theoretical course in computer science.)

6. Mohammed J. Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. 2014.

7. Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, Third Edition. ISBN 0123814790. 2011.

Unit-1 Introduction to BI 8 hours

What is Data Science? - Big Data and Data Science hype, and getting past the hype - Why now? - Datafication - Current landscape of perspectives - Skill sets needed. Statistical Inference - Populations and samples - Statistical modelling, probability distributions, fitting a model - Intro to R.

Unit-2 Exploratory Data Analysis and the Data Science Process 8 hours

Exploratory Data Analysis and the Data Science Process - Basic tools (plots, graphs and summary statistics) of EDA - Philosophy of EDA - The Data Science Process - Case Study: RealDirect (online real estate firm). Three Basic Machine Learning Algorithms - Linear Regression - k-Nearest Neighbors (k-NN) - k-means.
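
As an illustration of the k-Nearest Neighbors topic listed in this unit, the following is a minimal Python sketch of classification by majority vote among the k closest training points. The function name `knn_predict`, the use of Euclidean distance, and the toy data are assumptions made for this example; they are not taken from the prescribed textbook.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in two dimensions.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

Sorting the whole training set keeps the sketch short; for large datasets a spatial index or a partial sort would normally be preferred.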

Unit-3 Machine Learning Algorithm and Usage in Applications 8 hours

Motivating application: Filtering Spam - Why Linear Regression and k-NN are poor choices for Filtering Spam - Naive Bayes and why it works for Filtering Spam - Data Wrangling: APIs and other tools for scraping the Web. Feature Generation and Feature Selection (Extracting Meaning From Data) - Motivating application: user (customer) retention - Feature Generation (brainstorming, role of domain expertise, and place for imagination) - Feature Selection algorithms - Filters; Wrappers; Decision Trees; Random Forests.
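
As an illustration of the Naive Bayes spam-filtering topic in this unit, the sketch below trains a word-count model and scores a message by log prior plus smoothed log likelihoods. It assumes a multinomial model with add-one (Laplace) smoothing and whitespace tokenization; the function names and the four toy messages are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

def train_naive_bayes(messages):
    """Collect per-label word counts from a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of messages with that label
    vocab = set()
    for text, label in messages:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest posterior log score for `text`."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        # Log prior for the label.
        score = math.log(n / total)
        # Denominator for add-one smoothing: words seen in label + vocabulary size.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set.
messages = [("win cash prize now", "spam"),
            ("cheap prize win win", "spam"),
            ("meeting agenda for monday", "ham"),
            ("lunch on monday", "ham")]
model = train_naive_bayes(messages)

print(predict(model, "win a prize"))
print(predict(model, "monday meeting"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.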

Unit-4 Building a User-Facing Data Product 8 hours

Algorithmic ingredients of a Recommendation Engine - Dimensionality Reduction - Singular Value Decomposition - Principal Component Analysis - Exercise: build your own recommendation system. Mining Social-Network Graphs - Social networks as graphs - Clustering of graphs - Direct discovery of communities in graphs - Partitioning of graphs - Neighborhood properties in graphs.

Unit-5 Data Visualization and Ethical Issues 8 hours

Basic principles, ideas and tools for data visualization - Examples of inspiring (industry) projects - Exercise: create your own visualization of a complex dataset - Discussions on privacy, security, ethics - A look back at Data Science - Next-generation data scientists.

Unit-6 Research 8 hours

Advances and latest trends in the areas covered in the course, and their latest applications.

The latest research conducted in the areas covered in the course.

Discussion of recent papers published in IEEE and ACM Transactions, Web of Science and SCOPUS indexed journals, and high-impact-factor conferences and symposiums.

Discussion of some of the latest products available in the market based on the areas covered in the course, and of patents filed in those areas.


BCA-IOP Big Data

Name of The Course: Foundation of Big Data System
Course Code: BCABI1101

              Theory          Practical
L  T  P  C    IA  MTE  ETE    PR  ETE
3  0  2  4    20  15   30     15  20

Prerequisite

Corequisite

Antirequisite

COURSE OBJECTIVES:

To understand the Data Science Process and learn the techniques, tools, statistical methodologies and machine learning algorithms used in the process.

COURSE OUTCOMES:


CO1 Students should know the design issues of the Hadoop architecture.

CO2 Students should learn various techniques for big data analytics.

CO3 Students should be able to identify real-time problems and design solutions using various big data analytics techniques.

CO4 Students should be able to use supervised and unsupervised learning for prediction.

CO5 Students should be able to use classification and clustering algorithms.

CO6 Students should understand current research trends in big data.

COURSE CONTENT: Hours

UNIT I INTRODUCTION TO BIG DATA: 9

Introduction - Distributed file system - Big Data and its importance, Four V's in big data, Drivers for Big Data, Big data analytics, Big data applications. Algorithms using MapReduce, Matrix-Vector Multiplication by MapReduce.
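
As an illustration of the last topic in this unit, the sketch below simulates matrix-vector multiplication in the MapReduce style using plain Python, with no Hadoop cluster involved. It assumes the matrix is stored as sparse (row, column, value) triples and that the whole vector fits in memory on every mapper; the function names and the example matrix are hypothetical.

```python
from collections import defaultdict

def map_phase(matrix_entries, vector):
    """Map: for each matrix entry (i, j, m_ij), emit the pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]

def shuffle(pairs):
    """Shuffle: group intermediate values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the partial products of each row to obtain that row's result."""
    return {row: sum(values) for row, values in groups.items()}

def matvec(matrix_entries, vector):
    """Run the three simulated phases in sequence."""
    return reduce_phase(shuffle(map_phase(matrix_entries, vector)))

# Hypothetical 2x3 matrix [[1, 2, 0], [0, 3, 4]] as (row, column, value) triples.
entries = [(0, 0, 1), (0, 1, 2), (1, 1, 3), (1, 2, 4)]
v = [1, 1, 2]

print(matvec(entries, v))  # {0: 3, 1: 11}
```

When the vector is too large for a single mapper's memory, the matrix and vector are usually split into matching stripes so that each mapper only needs one stripe of the vector.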

UNIT II INTRODUCTION TO HADOOP: 9

Big Data - Apache Hadoop & Hadoop Ecosystem - Moving Data in and out of Hadoop - Understanding inputs and outputs of MapReduce - Data Serialization.

UNIT-III HADOOP ARCHITECTURE: 9

Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop Shell commands, Anatomy of File Write and Read, NameNode, Secondary NameNode, and DataNode, Hadoop MapReduce paradigm, Map and Reduce tasks, Job and Task trackers - Cluster Setup - SSH & Hadoop Configuration - HDFS Administering - Monitoring & Maintenance.

UNIT-IV HADOOP ECOSYSTEM AND YARN: 9

Hadoop ecosystem components - Schedulers - Fair and Capacity, Hadoop 2.0 New Features - NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN.

UNIT-V HIVE AND HIVEQL, HBASE: 9

Hive Architecture and Installation, Comparison with Traditional Database, HiveQL - Querying Data - Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, HBase concepts - Advanced Usage, Schema Design, Advanced Indexing - Pig, ZooKeeper - how it helps in monitoring a cluster, how HBase uses ZooKeeper, and how to build applications with ZooKeeper.

Unit VI 5 hours

Advances and latest trends in the areas covered in the course, and their latest applications.

The latest research conducted in the areas covered in the course.

Discussion of recent papers published in IEEE and ACM Transactions, Web of Science and SCOPUS indexed journals, and high-impact-factor conferences and symposiums.

Discussion of some of the latest products available in the market based on the areas covered in the course, and of patents filed in those areas.


Reference Books

1. Boris Lublinsky, Kevin T. Smith and Alexey Yakubovich, “Professional Hadoop Solutions”, Wiley, ISBN: 9788126551071, 2015.

2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.

3. Tom White, “Hadoop: The Definitive Guide”, O'Reilly, 2012.

4. Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.

5. Tom Plunkett, Brian Macdonald et al., “Oracle Big Data Handbook”, Oracle Press, 2014.

6. Jay Liebowitz, “Big Data and Business Analytics”, CRC Press, 2013.
