Big Data Analytics

Department of Master of Computer Applications

FMTH0301/Rev.5.3
Course Plan

Semester: III Year: 2023-2024


Course Title: Big Data Analytics Course Code: 20ECAC801
Total Contact Hours: 40 Duration of ESA: 3 Hours
ESA Marks: 50 ISA Marks: 50
Lesson Plan Author: S.V.Budni Date: 01-01-2024
Checked By: Deepa.Mulimani Date: 01-01-2024
Prerequisites:

i. Knowledge of DBMS, Data mining and Statistics.

Course Outcomes (COs):


At the end of the course the student should be able to:

1. Explain the concept and challenge of big data and why existing technology is
inadequate to analyze the big data.
2. Analyze the impact of big data on business decisions and strategy.
3. Gain hands-on experience on large-scale analytics tools to solve some open big data
problems.
4. Apply non-relational databases and techniques for storing and processing large volumes
of semi-structured and unstructured data.

Powered by www.ioncudos.com Page 1 of 19.


Course Articulation Matrix: Mapping of Course Outcomes (CO) with Program Outcomes

Course Title: Big Data Analytics Semester: 3
Course Code: 20ECAC801 Year: 2023-24

Course Outcomes (CO) / Program Outcomes (PO): 1 2 3 4 5 6 7 8 9 10 11 12 13 14

1. Explain the concept and challenge of big data and why existing technology is inadequate to analyze the big data; L M M
2. Analyze the impact of big data on business decisions and strategy. L M M L
3. Gain hands-on experience on large-scale analytics tools to solve some open big data problems; L M M M L
4. Apply non-relational databases and techniques for storing and processing large volumes of semi-structured and unstructured data. L M M L

Degree of compliance L: Low M: Medium H: High


Competency addressed in the Course and corresponding Performance Indicators

Competency 1.4 - Demonstrate competence in computing knowledge
    1.4.1 - Apply software engineering principles to solve computing problems.

Competency 2.2 - Demonstrate an ability to formulate a solution plan and methodology for a computing problem
    2.2.1 - Decompose complex problem into interconnected sub-problems.
    2.2.2 - Identify the requirements of sub-problems and formulate the solution plan.

Competency 3.2 - Demonstrate an ability to generate a diverse set of alternative design solutions.
    3.2.1 - Apply formal idea generation tools to develop multiple computing design solutions.
    3.2.2 - Build models, prototypes, etc., to develop diverse set of design solutions.

Competency 5.2 - Demonstrate an ability to select and apply domain specific tools, techniques and resources
    5.2.1 - Identify the strengths and limitations of tools for (i) acquiring information, (ii) modeling and simulating, (iii) monitoring system performance, and (iv) creating designs.
    5.2.2 - Demonstrate proficiency in using domain specific tools.

Competency 7.2 - Demonstrate an ability to identify changing trends in computing knowledge and practice
    7.2.1 - Identify technological advances in computing that required practitioners to stay updated with current technologies.
    7.2.2 - Recognize the necessity of being updated with new developments in the domain.

E.g., 1.2.3 represents program outcome ‘1’, competency ‘2’ and performance indicator ‘3’.

Course Code: 20ECAC801 Course Title: Big Data Analytics
L-T-P: 3-0-1 Credits: 4 Contact Hrs: 5
ISA Marks: 50 ESA Marks: 50 Total Marks: 100
Teaching Hrs: 40+24 Exam Duration: 3 hrs
Content Hrs
Unit – I
Chapter 1: Types of digital data and concept of big data 4 Hrs
Classification of digital data: Unstructured, Semi-structured, and Structured;
Characteristics of data, Evolution of big data, and definition of big data: 5 Vs,
challenges with big data, typical data warehouse environment: Hadoop
Environment.
Chapter 2: Big Data Analytics 7 Hrs
What is big data analytics? What big data analytics is not? Classification of
analytics, Top challenges facing big data, Importance of big data analytics, Need of
technology to meet big data challenges, Data science: business acumen skills,
technology expertise, mathematics expertise, Data scientist, terminologies used in
big data environments, BASE, top analytics tools.
Chapter 3: Big data technology landscape 4 Hrs
Not Only SQL (NoSQL): Types of NoSQL, Advantages of NoSQL, Use of NoSQL in
industry, NewSQL, Hadoop: features, key advantages, versions, overview of
Hadoop ecosystem, Hadoop distributions, Hadoop versus SQL, Cloud-based
Hadoop solutions.
Unit – II
Chapter 4: Hadoop distributed file system 7 Hrs
Introduction, Why Hadoop, RDBMS versus Hadoop, distributed computing
challenges: hardware failure, how to process gigantic store of data, history of
Hadoop, Hadoop overview, use case of Hadoop, Hadoop distributors, Hadoop
Distributed File System (HDFS): Name node, Data node, secondary Name node,
anatomy of file read, anatomy of file write; replica placement, processing of data
with Hadoop, Managing resources and applications with Hadoop, Interacting with
Hadoop ecosystem.

Chapter 5: MongoDB and query language 4 Hrs


Introduction, Why MongoDB, Terms used in RDBMS and MongoDB, data types in
MongoDB, MongoDB query language: basic functions, Arrays, aggregate functions,
MapReduce function, JavaScript programming, Cursors in MongoDB, MongoImport
and MongoExport.

Chapter 6: Cassandra and MapReduce programming 4 Hrs


Introduction, Apache Cassandra, features of Cassandra, data types, CQLSH,
Keyspaces, CRUD operations, Introduction to MapReduce, Mapper, Reducer,
Combiner, partitioner, searching, Sorting, and compression.
Unit – III
Chapter 7: Hive and query language 5 Hrs
Introduction, What is Hive, History of Hive and recent releases of Hive, Hive
integration and work flow, Hive data units; Hive architecture, Hive data types, Hive
file format, Hive Query Language (HQL): DDL, DML, Hive shell, database, tables,
Partitions, Bucketing, Views, Sub-query: RCFile implementation, SERDE, User
defined function.


Chapter 8: PIG 5 Hrs


Introduction, What is PIG, Key features of PIG; The anatomy of PIG, PIG
philosophy, use case for PIG: ETL processing, PIG Latin overview, Data types in
PIG, Running PIG, execution modes of PIG, HDFS commands, relational operators,
eval function, complex data types, piggy bank, user defined function.

Text Book
1. Seema Acharya, Subhashini Chellappan, Big Data and Analytics, First edition,
2015, Wiley publications.
References
1. EMC Education Services, Data Science and Big Data Analytics: Discovering,
Analyzing, Visualizing and Presenting Data, Wiley Publications.
2. Frank J. Ohlhorst, Big Data Analytics: Turning Big Data into Big Money,
Wiley and SAS Business Series, 2012.
3. Colleen McCue, Data Mining and Predictive Analysis: Intelligence Gathering
and Crime Analysis, Elsevier, 2007.
4. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
5. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge
Data Streams with Advanced Analytics, Wiley and SAS Business Series,
2012.
6. Paul Zikopoulos, Chris Eaton, Understanding Big Data:
Analytics for Enterprise Class Hadoop and Streaming Data, McGraw Hill,
2011.
7. Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques,
Second Edition, Elsevier, Reprinted 2008.

Evaluation Scheme
CIE Scheme

Assessment | Theory
ISA-1 | 15
ISA-2 | 15
Lab Practices + Seminar | 20
Total | 50


List of Practices

BDA Tools: Hadoop, MongoDB, Cassandra, Hive, Pig

S. No | Assignment
1 | Hadoop implementation of MapReduce problem.
2 | Demonstration of CRUD operations in MongoDB.
3 | Implementation of MapReduce functions in MongoDB for log data analysis.
4 | Integration of JavaScript with MongoDB, loading of large data into MongoDB.
5 | Storing of unstructured data and demonstration of Cassandra Query Language.
6 | Demonstration of ETL process using Hive and Hive Query Language (HQL).
7 | Log analysis using Hive.
8 | Implementation of word count problem using Pig.


Course Unitization for ISA Exams and ESA Examination

Topics / Chapters | Teaching hours | No. of Questions in ISA-1 | No. of Questions in ISA-2 | No. of Questions in ESA Exam
Unit I
1. Types of digital data and concept of big data | 4 | 1 | -- | 1
2. Big Data Analytics | 7 | 1 | -- | 1
3. Big data technology landscape | 4 | 1 | -- | 1
Unit II
4. Hadoop distributed file system | 7 | -- | 1 | 1
5. MongoDB and query language | 4 | -- | 1 | 1
6. Cassandra and MapReduce programming | 4 | -- | 1 | 1
Unit III
7. Hive and query language | 5 | -- | -- | 1
8. PIG | 5 | -- | -- | 1
Note:
1. Each Question carries 20 marks and may consist of sub-questions.
2. Mixing of sub-questions from different chapters within a unit (only for Unit I and Unit
II) is allowed in ISA-1, ISA-2 and ESA.
3. Answer 5 full questions of 20 marks each (two full questions each from Unit I and
Unit II, and one full question from Unit III) out of 8 in ESA.

Date: Head of Department


Course Assessment Plan

Course Title: Big Data Analytics Code: 20ECAC801


Course outcomes (COs) | Weightage in assessment
1. Explain the concept and challenge of big data and why existing technology is inadequate to analyze the big data; | 12.5%
2. Analyze the impact of big data on business decisions and strategy. | 12.5%
3. Gain hands-on experience on large-scale analytics tools to solve some open big data problems; | 12.5%
4. Apply non-relational databases and techniques for storing and processing large volumes of semi-structured and unstructured data. | 62.5%

Assessment Methods | ISA 1 | ISA 2 | Seminar | Lab | ESA
Weightage | 15% | 15% | 10% | 10% | 50%


Chapter-wise Plan

Course Code and Title: 20ECAC801/Big Data Analytics


Chapter Number and Title: 1. Types of digital data and concept of big data Planned Hours: 4 hrs

Learning Outcomes:
At the end of the topic the student should be able to:

TLO's | CO's | BL | CA Code
1. Differentiate between structured, semi-structured and unstructured data. | CO1 | L3 | 1.4
2. Explain the need to integrate structured, semi-structured and unstructured data in the context of big data analytics. | CO1 | L3 | 1.4
3. Explain characteristics and significance of big data. | CO1 | L2 | 2.2
4. Address challenges of big data. | CO1 | L3 | 1.4

Lesson Schedule
Class No. - Portion covered per hour
1. Classification of digital data: Unstructured, Semi-structured, Structured.
2. Characteristics of data, Evolution of big data.
3. Definition of big data: 5 Vs, challenges with big data.
4. Typical data warehouse environment: Hadoop Environment.

Review Questions
Sr. No. - Questions | TLO | BL | PI Code
1. Illustrate types of digital data. Explain sources of structured data. | TLO1 | L3 | 1.4.1
2. Why is integration of structured, semi-structured and unstructured data needed in the context of data generated by Facebook? | TLO2 | L3 | 1.4.1
3. Define big data. Explain the five V’s of big data. Illustrate sources of big data. | TLO3 | L3 | 2.2.2
4. What are the challenges of big data? How is a traditional BI environment different from a big data environment? | TLO4 | L3 | 1.4.1
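
As a rough illustration of the three classes of digital data covered in this chapter, the sketch below parses one record in each form. The sample values and the three helper functions are hypothetical and were made up for this example; they are not taken from the course material or the textbook.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
STRUCTURED = "id,name,city\n1,Asha,Hubli\n"  # fixed schema, like an RDBMS row
SEMI_STRUCTURED = '{"id": 1, "name": "Asha", "tags": ["student", "mca"]}'  # self-describing
UNSTRUCTURED = "Asha posted a photo from Hubli yesterday."  # free text, no schema


def parse_structured(text):
    """Every row shares the same columns, so a header row describes all of them."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_semi_structured(text):
    """Fields are tagged by name and may nest or vary between records."""
    return json.loads(text)


def parse_unstructured(text):
    """No schema is available: a naive parser can only split the text into tokens."""
    return text.split()
```

The point of the contrast is that the amount of schema carried by the data decreases from the first form to the third, which is why each form needs a different storage and processing approach.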


Course Code and Title: 20ECAC801/Big Data Analytics


Chapter Number and Title: 2. Big Data Analytics Planned Hours: 7 hrs

Learning Outcomes:
At the end of the topic the student should be able to:

TLO's | CO's | BL | CA Code
1. Explain significance of big data analytics in various business domains. | CO2 | L2 | 1.4
2. Explain the role of data scientist. | CO2 | L2 | 2.2
3. Explain various terminologies used in the big data environment. | CO2 | L2 | 1.4
4. Select best tool/s for data analytics based on the business domain. | CO2 | L3 | 2.2

Lesson Schedule
Class No. - Portion covered per hour
1 What is big data analytics? What big data analytics is not?
2 Classification of analytics, Top challenges facing big data
3 Importance of big data analytics
4 Need of technology to meet big data challenges, Data science: business acumen skills
5 Technology expertise, mathematics expertise
6 Data scientist, terminologies used in big data environments
7 BASE, top analytics tools

Review Questions
Sr. No. - Questions | TLO | BL | PI Code
1 Why is big data analytics needed? Explain terminologies used in big data analytics. | TLO1 | L3 | 2.2.2
2 Explain the knowledge required by a data scientist. State the CAP theorem as used in BDA. | TLO2 and TLO3 | L2 | 2.2.2
3 What are predictive and prescriptive analytics? Illustrate any three analytics tools used for BDA. | TLO4 | L3 | 2.2.2


Course Code and Title: 20ECAC801/Big Data Analytics


Chapter Number and Title: 3. Big data technology landscape Planned Hours: 4 hrs
Learning Outcomes:
At the end of the topic the student should be able to:

TLO's | CO's | BL | CA Code
1. Explain the significance of different types of NoSQL such as Cassandra, MongoDB and InfiniteGraph for storing unstructured and semi-structured data. | CO4 | L3 | 1.4
2. Compare and contrast SQL, NoSQL and NewSQL technologies. | CO4 | L3 | 2.2
3. Explain key advantages of Hadoop and cite the differences between Hadoop 1.0 and Hadoop 2.0. | CO4 | L2 | 2.2

Lesson Schedule
Class No. - Portion covered per hour
1. Not Only SQL (NoSQL): Types of NoSQL, Advantages of NoSQL
2. Use of NoSQL in industry, NewSQL
3. Hadoop: features, key advantages, versions, overview of Hadoop ecosystem
4. Hadoop distributions, Hadoop versus SQL, Cloud-based Hadoop solutions.

Review Questions
Sr.No. - Questions | TLO | BL | PI Code
1. What are the properties and benefits of NoSQL? Illustrate classification of NoSQL. | TLO1 | L3 | 2.2.2
2. Who are the vendors of NoSQL? Identify and explain the best parameters used to compare SQL, NoSQL and NewSQL. | TLO2 | L3 | 2.2.2
3. Justify the need for Hadoop in the context of BDA. Cite key advantages of Hadoop. How is Hadoop 1.0 different from Hadoop 2.0? | TLO3 | L3 | 2.2.2


Course Code and Title: 20ECAC801/Big Data Analytics


Chapter Number and Title: 4. Hadoop distributed file system Planned Hours: 7 hrs
Learning Outcomes:
At the end of the topic the student should be able to:

TLO's | CO's | BL | CA Code
1 Comprehend the reasons behind the popularity of Hadoop. | CO3 | L3 | 2.2
2 Explain parameters used to compare Hadoop and RDBMS. | CO3 | L3 | 2.2
3 Perform HDFS operations and use Hadoop ecosystem. | CO3 | L3 | 1.1
4 Apply MapReduce programming framework to process massive amounts of data in parallel. | CO3 | L3 | 1.4

Lesson Schedule
Class No. - Portion covered per hour
1. Introduction, Why Hadoop, RDBMS versus Hadoop.
2. Distributed computing challenges: hardware failure, how to process gigantic store of
data.
3. History of Hadoop, Hadoop overview, use case of Hadoop
4. Hadoop distributors, Hadoop Distributed File System (HDFS), Name node, Data
node, secondary Name node.
5. Anatomy of file read, anatomy of file write, replica placement
6. Processing of data with Hadoop, Managing resources and applications with Hadoop
7. Interaction with Hadoop ecosystem.
Review Questions
Sr.No. - Questions | TLO | BL | PI Code
1 What is the key consideration behind Hadoop's popularity? List and explain parameters used to compare Hadoop and RDBMS. | TLO1 & TLO2 | L2 | 1.4.1
2 Which features of HDFS make it suitable for distributed computing? Identify and explain the components used to build HDFS architecture. | TLO3 | L3 | 2.2.2
3 How is MapReduce programming used to process massive amounts of data in parallel? Explain in the context of the word count problem. | TLO4 | L3 | 2.2.2
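
The word count problem referred to in the last review question can be sketched in plain Python to show the map, shuffle and reduce phases. This is a single-process illustration only, not Hadoop code: the function names `mapper`, `shuffle`, `reducer` and `word_count` are chosen for this example, and on a real cluster the framework performs the shuffle and runs many mappers and reducers in parallel.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big analytics"])` counts "big" twice because the mapper lowercases each line before splitting it.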


Course Code and Title: 20ECAC801/Big Data Analytics


Chapter Number and Title: 5. MongoDB and query language Planned Hours: 4 hrs
Learning Outcomes:
At the end of the topic the student should be able to:

TLO's | CO's | BL | CA Code
1. Explain properties and the need of MongoDB for storing non-relational data. | CO4 | L3 | 2.2
2. Perform CRUD operations in MongoDB. | CO4 | L3 | 2.2
3. Apply MapReduce programming framework in MongoDB. | CO4 | L3 | 2.2
4. Apply MongoImport and MongoExport commands in MongoDB to perform data flow between CSV file and MongoDB. | CO4 | L3 | 2.2

Lesson Schedule
Class No. - Portion covered per hour
1 Introduction, Why MongoDB, Terms used in RDBMS and MongoDB
2 Data types in MongoDB
3 MongoDB query language: basic functions, Arrays, aggregate functions, MapReduce function
4 JavaScript programming, Cursors in MongoDB, MongoImport and MongoExport.

Review Questions
Sr.No. - Questions | TLO | BL | PI Code
1 What is MongoDB and why is it needed? How are replication and sharding performed in MongoDB? | TLO1 | L3 | 2.2.2
2 Compare terms used in RDBMS and MongoDB. Illustrate CRUD operation in MongoDB. | TLO2 | L3 | 2.2.2
3 How is the MapReduce framework used in MongoDB? Illustrate the same. | TLO3 | L3 | 2.2.2
4 How is data flow between MongoDB and a CSV file implemented? Illustrate the same. | TLO4 | L3 | 2.2.2
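
The data flow in the last review question is normally done with the `mongoimport` and `mongoexport` command-line tools (for CSV, with options such as `--type csv` and `--headerline` on import and `--fields` on export). The sketch below does not call MongoDB at all; it only mimics, in plain Python, the conversion those tools perform between CSV rows and documents. The helper names `csv_to_documents` and `documents_to_csv` are hypothetical, and the sketch assumes the CSV file has a header row.

```python
import csv
import io


def csv_to_documents(csv_text):
    """Turn CSV text with a header row into a list of dict 'documents'."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def documents_to_csv(documents, fields):
    """Write the selected fields of each document back out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(documents)  # fields missing from a document are left empty
    return buffer.getvalue()
```

Note that every imported value is a string in this sketch; type handling in the real tools depends on the options used and is not modelled here.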


Course Code and Title: 20ECAC801/Big Data Analytics


Chapter Number and Title: 6. Cassandra and MapReduce programming Planned Hours: 4 hrs
Learning Outcomes:
At the end of the topic the student should be able to:

TLO's | CO's | BL | CA Code
1 Use Apache Cassandra to handle large amounts of data across many commodity servers. | CO4 | L3 | 1.4
2 Write queries in CQL for Apache Cassandra. | CO4 | L3 | 1.4
3 Build mapper and reducer to process large amounts of data in Hadoop. | CO4 | L3 | 2.2

Lesson Schedule
Class No. - Portion covered per hour
1 Introduction, Apache Cassandra, features of Cassandra
2 Data types, CQLSH, Keyspaces, CRUD operations
3 Introduction to MapReduce, Mapper, Reducer, Combiner.
4 Partitioner, searching, sorting, compression.

Review Questions
Sr.No. - Questions | TLO | BL | PI Code
1 Explain notable points and technical features of Apache Cassandra. | TLO1 | L2 | 1.4.1
2 What is the need for CQL? Illustrate use of Collections, Import and Export in Apache Cassandra. | TLO2 | L3 | 1.4.1
3 Illustrate how MapReduce programming is implemented using mapper and reducer on a Hadoop cluster. | TLO3 | L3 | 2.2.2
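
The roles of the combiner and the partitioner listed in the lesson schedule can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Hadoop's implementation: `zlib.crc32` stands in for the key hash that a hash-based partitioner would use, and the function names `combine`, `partition` and `route` are invented for this example.

```python
from collections import Counter
import zlib


def combine(pairs):
    """Combiner: pre-aggregate (key, count) pairs on the mapper side."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())


def partition(key, num_reducers):
    """Partitioner: pick a reducer index from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Send each combined pair to the bucket of the reducer that owns its key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, count in combine(pairs):
        buckets[partition(key, num_reducers)].append((key, count))
    return buckets
```

The combiner reduces the volume of data sent over the network, while the partitioner guarantees that all pairs with the same key reach the same reducer.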


Course Code and Title: 20ECAC801/Big Data Analytics


Chapter Number and Title: 7. Hive and query language Planned Hours: 5 hrs
Learning Outcomes:
At the end of the topic the student should be able to:

TLO's | CO's | BL | CA Code
1. Use Hive to query structured data built on top of Hadoop. | CO4 | L3 | 2.2
2. Create database, tables and execute DML statements on Hive. | CO4 | L3 | 2.2
3. Differentiate between static and dynamic partition in Hive. | CO4 | L3 | 1.4
4. Create managed tables and external tables using HQL. | CO4 | L3 | 2.2

Lesson Schedule
Class No. - Portion covered per hour
1 Introduction, What is Hive, History of Hive and recent releases of Hive
2 Hive integration and work flow, Hive data units
3 Hive architecture, Hive data types, Hive file format, Hive Query Language (HQL): DDL
4 DML, Hive shell, database, tables, Partitions, Bucketing, Views
5 Sub-query: RCFile implementation, SERDE, User defined function

Review Questions
Sr.No. - Questions | TLO | BL | PI Code
1 What is Hive? How is it used to query structured data built on top of Hadoop? | TLO1 | L3 | 2.2.2
2 What types of data are supported by Hive? Illustrate DDL statements in HQL. | TLO2 | L3 | 2.2.2
3 Why are partitions required in Hive? Illustrate static and dynamic partitions in Hive. | TLO3 | L3 | 1.4.1
4 Cite features of Hive. Illustrate how managed tables and external tables are created in Hive. | TLO4 | L3 | 2.2.2
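
The difference between static and dynamic partitions can be pictured with a small plain-Python model. Hive stores each partition of a table in its own sub-directory named `column=value`; the sketch below only models that naming and the way rows are assigned, and does not touch Hive or HDFS. The function names and the sample rows in the note that follows are hypothetical.

```python
from collections import defaultdict


def static_partition(rows, column, value):
    """Static partitioning: the caller names the target partition up front."""
    return {f"{column}={value}": list(rows)}


def dynamic_partition(rows, column):
    """Dynamic partitioning: the partition is derived from each row's own value."""
    partitions = defaultdict(list)
    for row in rows:
        # The partition column lives in the directory name, not in the data file.
        data = {k: v for k, v in row.items() if k != column}
        partitions[f"{column}={row[column]}"].append(data)
    return dict(partitions)
```

With rows such as `{"name": "Asha", "year": 2023}` and `{"name": "Ravi", "year": 2024}`, `dynamic_partition(rows, "year")` produces one entry per distinct year, whereas `static_partition` places every row it is given into the single partition named by the caller.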


Course Code and Title: 20ECAC801/Big Data Analytics


Chapter Number and Title: 8. PIG Planned Hours: 5 hrs

Learning Outcomes:
At the end of the topic the student should be able to:

TLO's | CO's | BL | CA Code
1. Explain features and anatomy of Pig. | CO4 | L2 | 1.4
2. Write scripts in Pig Latin language to analyze large amounts of data. | CO4 | L2 | 2.2
3. Run Pig both in interactive mode and batch mode. | CO4 | L3 | 1.4
4. Apply relational operators and built-in APIs to extract required patterns from large amounts of data. | CO4 | L3 | 2.2

Lesson Schedule
Class No. - Portion covered per hour
1 Introduction, What is PIG, Key features of PIG
2 The anatomy of PIG, PIG philosophy, use case for PIG: ETL processing
3 PIG Latin overview, Data types in PIG, Running PIG, execution modes of PIG
4 HDFS commands, relational operators, eval function
5 Complex data types, piggy bank, user defined function.

Review Questions
Sr.No. - Questions | TLO | BL | PI Code
1 Why do we need Apache Pig? What features make Apache Pig popular? How is Apache Pig different from MapReduce? | TLO1 | L3 | 1.4.1
2 Write a user-defined function for use in Pig Latin that converts a name into uppercase. | TLO2 | L3 | 2.2.2
3 Explain different running modes and execution modes of Pig. | TLO3 | L2 | 1.4.1
4 Write a Pig Latin script to illustrate the use of relational operators, using the FILTER operator. | TLO4 | L3 | 2.2.2
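
Pig user-defined functions are written in a host language such as Java or Python and then called from Pig Latin. The sketch below shows one possible Python version of the uppercase function from review question 2, together with a plain-Python analogue of the FILTER operator from question 4. It assumes that the `outputSchema` decorator is provided by Pig's Python support (`pig_util`) when the script is registered with Pig; a stand-in decorator is defined so the file also runs without Pig. The names `to_upper` and `filter_rows` are chosen for this example.

```python
try:
    # Assumption: present when the file is run through Pig's Python UDF support.
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        """Stand-in decorator so the file also runs under plain Python."""
        def wrap(func):
            return func
        return wrap


@outputSchema("name:chararray")
def to_upper(name):
    """Return the name in uppercase; pass null values through unchanged."""
    return None if name is None else name.upper()


def filter_rows(rows, predicate):
    """Plain-Python analogue of Pig's FILTER ... BY operator."""
    return [row for row in rows if predicate(row)]
```

In a Pig script, a file like this would typically be made available with a `REGISTER ... USING jython AS ...` statement before `to_upper` is applied inside a `FOREACH ... GENERATE`; the exact registration details depend on the Pig version in the lab setup.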

Question Paper Title: Model Question Paper for ISA- I

Total Duration: 75 Minutes Course: Big Data Analytics Maximum Marks: 40
Code: 20ECAC801
Note:
1. Answer any two FULL questions.
2. Missing data may be suitably assumed with justification.
Q.No. | Questions | Marks | CO | BL | PO | PI Code
1a | Illustrate types of digital data. Explain sources of structured data. | 10 | CO1 | L3 | 1 | 1.4.1
1b | What are the properties and benefits of NoSQL? Illustrate classification of NoSQL. | 10 | CO4 | L3 | 2 | 2.2.2
2a | Why is big data analytics needed? Explain terminologies used in big data analytics. | 10 | CO2 | L3 | 2 | 2.2.2
2b | Why is integration of structured, semi-structured and unstructured data needed in the context of data generated by Facebook? | 5 | CO1 | L3 | 1 | 1.4.1
2c | Define big data. Explain three V’s of big data. Illustrate sources of big data. | 5 | CO2 | L2 | 2 | 2.2.2
3a | Who are the vendors of NoSQL? Identify and explain the best parameters used to compare SQL, NoSQL, and NewSQL. | 10 | CO4 | L3 | 2 | 2.2.2
3b | What are predictive and prescriptive analytics? Illustrate any three analytics tools used for BDA. | 10 | CO2 | L3 | 2 | 2.2.2

Question Paper Title: Model Question Paper for ISA- II


Total Duration: 75 Minutes Course: Big Data Analytics Maximum Marks: 40
Code: 20ECAC801
Note:
1. Answer any two FULL questions.
2. Missing data may be suitably assumed with justification.
Q.No. | Questions | Marks | CO | BL | PO | PI Code
1a | How is MapReduce programming used to process massive amounts of data in parallel? Explain in the context of the word count problem. | 10 | CO3 | L3 | 2 | 2.2.2
1b | Illustrate how MapReduce programming is implemented using mapper and reducer on a Hadoop cluster. | 10 | CO1 | L3 | 2 | 2.2.2
2a | What is MongoDB and why is it needed? How are replication and sharding performed in MongoDB? | 10 | CO4 | L2 | 1 | 1.4.1
2b | What is the key consideration behind Hadoop's popularity? List and explain parameters used to compare Hadoop and RDBMS. | 5 | CO3 | L3 | 2 | 2.2.2
2c | Which features of HDFS make it suitable for distributed computing? Identify and explain the components used to build HDFS architecture. | 5 | CO3 | L2 | 1 | 1.4.1
3a | What is the need for CQL? Illustrate use of Collections, Import and Export in Apache Cassandra. | 10 | CO4 | L3 | 2 | 2.2.2
3b | Compare terms used in RDBMS and MongoDB. Illustrate CRUD operation in MongoDB. | 10 | CO1 | L3 | 1 | 1.4.1


Question Paper Title: Model Question Paper for ESA


Total Duration: 3 hours Course: Big Data Analytics Maximum Marks: 100
Code: 20ECAC801
Note:
1. Answer any two FULL questions each from UNIT I and UNIT II, and any one FULL question
from UNIT III.
2. Missing data may be suitably assumed with justification.
Unit I
Q.No. | Questions | Marks | CO | BL | PO | PI Code
1a | What are the different types of digital data? What are their sources? Explain with illustrative examples. | 10 | CO1 | L3 | 1 | 1.4.1
1b | Illustrate the context where NoSQL is the solution. What are the properties and benefits of NoSQL? Illustrate classification of NoSQL. | 10 | CO4 | L3 | 2 | 2.2.2
2a | Justify the need for big data analytics. List and explain the challenges of big data. | 10 | CO2 | L3 | 2 | 2.2.2
2b | Why is integration of structured, semi-structured and unstructured data needed in the context of data generated by Facebook? | 5 | CO1 | L3 | 1 | 1.4.1
2c | What is big data? What are the characteristics of big data? | 5 | CO1 | L2 | 2 | 2.2.2
3a | Who are the vendors of NoSQL? Identify and explain the best parameters used to compare SQL, NoSQL and NewSQL. | 10 | CO4 | L3 | 2 | 2.2.2
3b | Illustrate how predictive analytics differs from prescriptive analytics. Illustrate any three analytics tools used for BDA. | 10 | CO2 | L3 | 2 | 2.2.2
Unit II
Q.No. | Questions | Marks | CO | BL | PO | PI Code
4a | How is the MapReduce concept adopted for processing big data? Explain in the context of the word count problem. | 10 | CO3 | L3 | 2 | 2.2.2
4b | Illustrate how MapReduce programming is implemented using mapper and reducer on a Hadoop cluster. | 10 | CO4 | L3 | 2 | 2.2.2
5a | Explain the features of MongoDB. How are replication and sharding performed in MongoDB? | 10 | CO4 | L3 | 2 | 2.2.2
5b | What is the key consideration behind Hadoop's popularity? List and explain parameters used to compare Hadoop and RDBMS. | 5 | CO3 | L3 | 1 | 1.4.1
5c | Why is HDFS suitable for distributed computing? Explain the components used to build HDFS architecture. | 5 | CO3 | L3 | 2 | 2.2.2
6a | What are the features of Apache Cassandra? Illustrate use of Collections, Import and Export in Apache Cassandra. | 10 | CO4 | L3 | 2 | 2.2.2
6b | Compare the Insert, Update, Delete and Select statements of RDBMS and MongoDB. Illustrate CRUD operation in MongoDB. | 10 | CO4 | L3 | 2 | 2.2.2
Unit III
Q.No. | Questions | Marks | CO | BL | PO | PI Code
7a | With a neat diagram, explain the functionalities of the building blocks of the Hive architecture. | 10 | CO4 | L3 | 1 | 1.4.1
7b | What types of data are supported by Hive? Illustrate DDL statements in HQL. | 10 | CO4 | L3 | 1 | 1.4.1
8a | Why do we need Apache Pig? What features make Apache Pig popular? How is Apache Pig different from MapReduce? | 10 | CO4 | L3 | 1 | 1.4.1
8b | Explain two execution modes of Pig. Write a user-defined function for use in Pig Latin that converts a name into uppercase. | 10 | CO4 | L3 | 2 | 2.2.2
