
BDA

1) The MapReduce framework of Hadoop also takes care of _______.


a) Scheduling b) Monitoring c) Re-executing failed task d) All

2) The term NoSQL was first coined by _______.


a) Doug Laney b) Carlo Strozzi c) Brewer d) Gartner

* 3) Structured, unstructured and semi-structured data relate to
which of the following characteristics?
a) Velocity b) Volatility c) Variability d) Volume

* 4) In which of the following is the analysis descriptive,
predictive and prescriptive?
a) Analytics 1.0 b) Analytics 2.0 c) Analytics 3.0 d) None

5) When NameNode starts up, it reads the ______ and ______ from
disk.
a) TaskTracker, JobTracker b) FsImage, EditLog
c) Master Node, Slave Node d) None of the above.

6) Which of the following is a tool to transfer data between Hadoop


and Relational Databases?
a) Sqoop b) HBase c) Hive d) Pig

7) Which of the following is/are advantages of Hadoop?


a) Scalable b) Cost Effective c) Fault-tolerant d) All the above
8) Core MongoDB operations are ________.
a) create, select, update, delete b) create, read, update, delete
c) create, read, update, drop d) create, remove, update, drop
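
A minimal hedged sketch of the CRUD cycle in the legacy mongo shell (the books collection and its fields are illustrative, not part of the question):

db.books.insert({ title: "BDA", qty: 10 })                 // Create
db.books.find({ title: "BDA" })                            // Read
db.books.update({ title: "BDA" }, { $set: { qty: 12 } })   // Update
db.books.remove({ title: "BDA" })                          // Delete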

9) Cassandra is a ________ database.


a) Document-oriented b) Graph-oriented c) Column-oriented d) SQL

10) A List is a collection of _________.


a) Unordered elements b) Ordered elements
c) Paired elements d) Only images

11) PIG is ________.


a) dataflow language b) NoSQL database
c) import export tool d) scheduling engine

12) Hive provides _________ kinds of partitions.


a) Static b) Dynamic
c) Both Static and dynamic d) Neither static nor dynamic

13) ______ is Data warehousing tool.


a) Jaspersoft studio b) Cassandra c) Pig d) Hive

14) ETL processing in Pig stand for _________.


a) Extract, transform and load b) Extend transfer and load
c) Extract, transform and local d) None of the above
15) _______ used to transmit data between web server and web
application.
a) XML b) JSON c) Both a) and b) d) None

16) MS-Excel files are under the category of _________ data.


a) Structured data b) Semi-structured data c) Unstructured data
d) None

17) Real time processing deals with __________ characteristic of


data.
a) Variety b) Velocity c) Variability d) Volume

18) The 3V’s term of big data was first introduced by


a) Doug Laney b) Brewer c) Carlo Strozzi d) Doug Cutting

19) _________ has no support for ACID properties.


a) SQL b) MySQL c) NewSQL d) NoSQL

20) A coordinated processing of a program by multiple processors,
each working on a different part of the program and using its own
operating system and memory, is called
a) In memory analysis b) Distributed system
c) Massively parallel processing d) Shared disk

* 21) The Node Manager is responsible for launching application ________.
a) Resources b) Job c) Containers d) None

22) Which of the following formats is supported by MongoDB ?
a) XML b) SQL c) BSON d) All of the mentioned
23) Core MongoDB operations are
a) Create, Select, Update, Delete b) Create, Read, Update, Delete
c) Create, Read, Update, Drop d) None

* 24) Which of the following is the correct command to update a
document ?
a) db.books.update({item:"book", qty:{$gt:7}}, {$set:{x:5}, $inc:{y:8}})
b) db.books.find().update({item:"book", qty:{$gt:7}}, {$set:{x:5}, $inc:{y:8}})
c) db.books.update({item:"book", qty:{$gt:7}}, {$set:{x:5}, $inc:{y:8}}, {multi:true})
d) db.books.find().update({item:"book", qty:{$gt:7}}, {$set:{x:5}, $inc:{y:8}}, {multi:true})
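
A hedged sketch of the update pattern these options are testing, using the legacy mongo shell API (the books collection and the fields item, qty, x, y come from the question):

db.books.update(
    { item: "book", qty: { $gt: 7 } },    // query: which documents to match
    { $set: { x: 5 }, $inc: { y: 8 } },   // set x to 5 and increment y by 8
    { multi: true }                       // apply to all matches, not just the first
)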

25) Which of the following is a wide column store ?


a) MongoDB b) Cassandra c) Riak d) Redis

26) MetaStore contains _________ of Hive tables.


a) System catalog b) Drivers c) Database d) CLI

27) The interactive mode of Pig is


a) Pig Engine b) Grunt c) ETL d) Pig Latin

28) In Cassandra, __________ is called the peer-to-peer communication
protocol used for intra-ring communication.
a) Anti-Entropy b) Gossip protocol c) Hinted Handoffs d) All
29) CCTV footage is which type of data ?
a) Structured data b) Unstructured data c) Semi-structured data d) All
of these

* 30) Big volume of data like 1 Yottabyte is equal to
a) 1024^4 bytes b) 1024^6 bytes c) 1024^8 bytes d) 1024^9 bytes

31) A system that continues to function even when a network partition
occurs is said to have
a) Partition tolerance b) Consistency c) Availability d) None

* 32) ___________ is a robust database that supports the ACID
properties of transactions and has the scalability of NoSQL.
a) SQL b) NewSQL c) MySQL d) NoSQL

33) A typical block size used by HDFS is
a) 64 MB b) 128 MB c) 32 MB d) 256 MB

34) _____________ is the book-keeper of HDFS.


a) Name node b) Data node c) Job Tracker d) Task tracker

35) NameNode uses ______________ to record every transaction.
a) FsImage b) EditLog c) Data node d) Map Reduce

36) MongoDB has been adopted as _____________ software by a
number of major websites and services.
a) Frontend b) Backend c) Proprietary d) All of the mentioned
37) Which one of the following is equivalent to: Select * from
employee order by salary desc;
a) db.employee.find().sort({“salary”:1}) b) db.employee.find().sort({“salary”:-1})
c) db.employee.sort({“salary”:1}) d) db.employee.sort({“salary”:-1})
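
In the mongo shell, the sort direction is given per field: 1 is ascending and -1 is descending. A brief hedged sketch (the employee collection name comes from the question):

db.employee.find().sort({ salary: -1 })   // descending, like ORDER BY salary DESC
db.employee.find().sort({ salary: 1 })    // ascending
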
38) Hive is a ______________ tool.
a) Data Flow b) Data Warehouse c) Import Export Tool d) Data Transfer

39) Cassandra is a column-oriented database designed to support a
__________ symmetric node architecture.
a) Peer to Peer b) Master Slave c) Both a) and b) d) None

40) The 3 types of collections used in Cassandra are
a) Array, Set, List b) Set, List, Struct
c) Set, Map, Array d) Set, List, Map

41) MetaStore consists of _____________ and a ____________
a) Metaservices, database b) Metatable, WebUI
c) Metaservices, drivers d) CLI, Server

* 42) Pig is used in the ____________ process.
a) ETL b) Scripting c) Database d) None

43) E-mails are under the category of ___________ data.
a) Structured data b) Semi-structured data
c) Unstructured data d) None of above
44) ______ is used to transmit data between a web server and a web
application.
a) XML b) JSON c) Both a and b d) None

45) MongoDB is based on the ________ and _________ properties of the
CAP theorem.
a) consistency and availability b) availability and partition tolerance
c) consistency and partition tolerance d) all of the above

* 46) Big volume of data like 1 Zettabyte is equal to
a) 1024^4 Bytes b) 1024^6 Bytes c) 1024^5 Bytes d) 1024^7 Bytes

47) The CAP theorem is also called the ________ theorem.
a) Doug Laney b) Brewer c) Carlo Strozzi d) Doug Cutting

*48) Hadoop supports ________ data formats.
a) structured b) semi-structured c) unstructured d) all the above

49) MongoDB is
a) RDBMS b) Document-oriented DBMS
c) Object Oriented DBMS d) Key-value store

50) Which command in MongoDB is equivalent to SQL select ?


a) find() b) search() c) document() d) none of above
* 51) What does the following command do ?
db.sample.find().limit(10)
a) Show 10 documents randomly from the collection sample
b) Show only first 10 documents from the collection sample
c) Repeats the first document 10 times
d) None of above

52) In Cassandra, _________ is called the peer-to-peer communication


protocol used for intra-ring communication.
a) Anti-entropy b) Hinted Handoffs c) Gossip protocol d) None
of above

53) Databases are under the category of ______ data.


a) Structured data b) Semi-structured data
c) Unstructured data d) None of above

54) What was Hadoop named after ?


a) Creator Doug Cutting’s favorite circus act
b) Cutting’s high school rock band
c) The toy elephant of Cutting’s son
d) A sound Cutting’s laptop made during Hadoop’s development

55) A ____ serves as the master and there is only one NameNode
per cluster.
a) Data Node b) NameNode c) Data block d) Replication

56) _____ NameNode is used when the Primary NameNode goes down.
a) Rack b) Data c) Secondary d) None of these

57) What is the default HDFS replication factor ?


a) 4 b) 1 c) 3 d) 2

*58) Which one is not one of the big data features ?
a) Velocity b) Veracity c) Volume d) Variety

59) NameNode in HDFS uses ___ to store the file system name space.
a) EditLog b) FsImage c) Data node d) Map reduce


60) How many blocks will be created for a file that is 300 MB ?
The default block size is 64 MB and the replication factor is 3.
(Working: ceil(300/64) = 5 logical blocks; with 3 replicas, 5 × 3 = 15 stored blocks.)
a) 30 b) 15 c) 5 d) 100

61) What does the Job Tracker do ?
a) Stores blocks of data b) Stores metadata
c) Coordinates and schedules the job d) Acts as a mini reducer

62) HDFS is based on
a) Facebook file system b) Google file system
c) IBM file system d) Yahoo file system

63) The Hadoop framework is written in
a) C++ b) Python c) Java d) C

64) Apache Cassandra was born at
a) Google b) Facebook c) IBM d) Yahoo

65)Which of the following is a valid flow in Hadoop ?

a. Input -> Reducer -> Mapper -> Combiner -> Output
b. Input -> Mapper -> Reducer -> Combiner -> Output
c. Input -> Mapper -> Combiner -> Reducer -> Output
d. Input -> Reducer -> Combiner -> Mapper -> Output
66) Hive is used as
a) Data Flow Language b) Data Warehousing Language
c) Workflow Language d) Scheduling Language

67) MapReduce was devised by
a) Apple b) Google c) Microsoft d) Samsung

68) Hadoop was first introduced by
a) Doug Laney b) Brewer c) Carlo Strozzi d) Doug Cutting

69) On a single Hadoop cluster, how many NameNodes can run ?
a) depends on clusters b) only one
c) only 3 d) depends on data nodes

* 70) Apache Hadoop YARN is a sub-project of
a) Hadoop 1.0 b) Hadoop 2.0 c) Both d) None

71) A container used to hold application data in Cassandra is called
a) Document b) Table c) Keyspaces d) Record

SECTION-II
1. Pig in Hadoop Eco system is

a) Data Flow language b) NoSQL database c) import export tool d) scheduling engine

Ans: A

2. Which of the following function is used to read data in PIG ?


A. WRITE

B. READ

C. LOAD -ans

D. None of the mentioned

3. You can run Pig in interactive mode using the ______ shell.

A. Grunt -ans

B. FS

C. HDFS

D. None of the mentioned

4. ________ is the slave/worker node and holds the user data in the form of Data
Blocks.

A. DataNode -ans

B. NameNode

C. Data block

D. Replication

5. What was Hadoop named after?


a) Creator Doug Cutting’s favorite circus act
b) Cutting’s high school rock band
c) The toy elephant of Cutting’s son
d) A sound Cutting’s laptop made during Hadoop’s development

6. A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node b) NameNode c) Data block d) Replication

7. What is the default HDFS replication factor?


a) 4 b) 1 c) 3 d) 2

8. Which one is not one of the big data feature?


a) Velocity b) Veracity c) Volume d) Variety

9. The 3 types of Collections used in Cassandra are


a) Array, List, Map b) List, Map, Struct c) List, Set, Map d) Set, Map, Array
10. The metastore in Hive consist of ______ and ______
a) driver, services b) metaservices, database
c) driver, database d) metaservices, driver

11. Pig is __________________


a) Data Flow language b) NoSQL database c) import export tool d) scheduling
engine

12. Which of the following is a valid flow in Hadoop ?

a. Input -> Reducer -> Mapper -> Combiner -> Output
b. Input -> Mapper -> Reducer -> Combiner -> Output
c. Input -> Mapper -> Combiner -> Reducer -> Output -ans
d. Input -> Reducer -> Combiner -> Mapper -> Output
13. Hive is used as ______________
a) Data Flow language b) Data Warehousing language
c) Workflow language d) scheduling language
14. MapReduce was devised by ...

a) Apple b) Google -ans c) Microsoft d) Samsung


15. The Hadoop framework is written in
a) C++ b) Python c) Java d) C

16. Apache Cassandra was born at _________


a) Google b) Facebook c) IBM d) Yahoo

17. A container used to hold application data in Cassandra is called _______


a) Document b) Table c) Keyspaces -ans d) Record

18. Which one of the following is equivalent to the following in MongoDB:
Select * from employees order by salary desc;
a) db.employee.find().sort({“salary” :1}); b) db.employee.sort({“salary” :-1});
c) db.employee.find().sort({“salary” :-1}); d) db.employee.sort({“salary” :1});
19) The term ‘ETL’ used in Pig stands for
a) extract, transform, load b) extend, transfer, load
c) extract, transform, local d) extract, transfer, load

20) Which of the following is a component of Hadoop?

a) YARN
b) HDFS
c) Map Reduce
d) All of above - ans

21) Which of the following platforms does Apache Hadoop run on ?


a) Bare metal
b) Unix like
c) Cross platform -ans
d) Debian

22) Which type of data can Hadoop deal with ?


a) Structured
b) Semi - structured
c) Unstructured
d) All of above - ans

23) Which of the following is a column-oriented database that runs on top of HDFS
a) Hive
b) Sqoop
c) HBase
d) Flume

24) Which of the following is not a daemon process that runs on a
Hadoop cluster ?

a. JobTracker
b. DataNode
c. TaskTracker
d. TaskNode -ans

25) MongoDB stores all documents in _____________

a) tables
b) collections -ans
c) rows
d) all of the mentioned

26) Which of the following queries selects documents in the records collection that
match the condition { “user_id”: { $lt: 42 } }?

a) db.records.findOne( { “user_id”: { $lt: 42 } }, { “history”: 0 } )


b) db.records.find( { “user_id”: { $lt: 42 } }, { “history”: 0 } ) -ans
c) db.records.findOne( { “user_id”: { $lt: 42 } }, { “history”: 1 } )
d) db.records.select( { “user_id”: { $lt: 42 } }, { “history”: 0 } )
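
For reference, a hedged sketch of what the correct form does (the records collection, user_id condition and history projection come from the question):

db.records.find(
    { user_id: { $lt: 42 } },   // selection: documents where user_id < 42
    { history: 0 }              // projection: return all fields except history
)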

27) Which of the following key is used to denote uniqueness in the collection of
MongoDB?
a) _id -ans
b) id
c) id_
d) none of the mentioned

28) Which of the following lines skips the first 5 documents in the bios collection and
returns all remaining documents in MongoDB?

a) db.bios.find().limit( 5 )
b) db.bios.find().skip( 1 )
c) db.bios.find().skip( 5 )
d) db.bios.find().sort( 5 )
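
A short hedged sketch of cursor chaining in the mongo shell (the bios collection comes from the question; combining skip with limit is illustrative):

db.bios.find().skip(5)             // skip the first 5 documents, return the rest
db.bios.find().skip(5).limit(10)   // typical paging: skip 5, then return the next 10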

29) Each database created in hive is stored as


a) a directory -ans
b) a file
c) a hdfs block
d) a jar file
Brahmdevdada Mane Institute of Technology, Solapur
Department of Computer Science & Engineering
Multiple Choice Questions
Class:- BE-CSE ACY:-2020-21
Subject:- Big Data Analytics Sem-II

Unit 1 Introduction to Types of Digital Data


1. XML is which type of data ?

a) structured data b) unstructured data c) semi-structured data d) all three

Ans: C

2. E-mail is which type of data ?

a) structured data b) unstructured data c) semi-structured data d) all three

Ans: B

3. CCTV footage is which type of data ?

a) structured data b) unstructured data c) semi-structured data d) all three

Ans: B

4. XML and JSON are sources of which type of data ?

a) structured data b) unstructured data c) semi-structured data d) all three

Ans : C

5. OLTP system and spreadsheets are sources of which type of data ?

a) structured data b) unstructured data c) semi-structured data d) all three

Ans : A
6. Examples of types of data:

Structured: MS Access database relations/tables, MS Excel
Unstructured: Email, images, chat conversations, Facebook, videos
Semi-structured: XML

7. Match the following (answers shown):

NLP - Comprehend human or natural language input
Text Analytics - Text Mining
UIMA - Content Analytics
Noisy unstructured data - Text messages
Data Mining - Uses methods at the intersection of statistics, AI, machine learning & DBs
Noisy unstructured data - Chats
IBM - UIMA
Unit 2 Introduction to Big Data
1. Doug Laney, a Gartner analyst, coined the term ‘Big Data’.

2. Volatility is the characteristic of data dealing with its retention.

3. A Data Lake is a large repository of data held in its native format until it is needed.

4. The Variability characteristic of data explains the spikes in data.

5. Near real time or real time processing deals with the Velocity characteristic of data.

6. Big data is high volume, high velocity and high variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and
decision making.

7. Match the Following with answers

Column A Answer
PostgreSQL Open source relational database
Scientific Data Machine-generated unstructured data
Point of Sale Machine-generated structured data
Social Media Data Human-generated unstructured data
Gaming Related Data Human-generated structured data
Mobile Data Human-generated unstructured data
Unit 3 Big Data Analytics
1) The expansion for CAP is ____________,______________ and _____________ .

a) Consistency, Ability, Partition Tolerance b) Consistency, Atomicity, Partition Tolerance

c) Consistency, Availability, Parallel d) Consistency, Availability, Partition Tolerance

Ans: D

2) MongoDB is ___________ and___________.

a) Consistency and Partition Tolerance(CP) b) Consistency and Availability (CA)

c) Availability and Partition Tolerance (AP) d) none of above

Ans: A

3) Cassandra is ___________and ___________.

a) Consistency and Partition Tolerance(CP) b) Consistency and Availability (CA)

c) Availability and Partition Tolerance (AP) d) none of above

Ans: C

4) __________ has no support for ACID properties of transaction.

a) NoSQL b) newSQL c) SQL d) none of above

Ans: A

5) __________ is a robust database that supports ACID properties of transactions and has the
scalability of NoSQL.

a) NoSQL b) newSQL c) SQL d) none of above

Ans: B

6) The expansion of BASE is ______________

Ans:- Basically Available, Soft state, Eventual consistency

7) Hadoop has a shared nothing architecture.

8) In Hadoop 2.0, a new and separate resource management framework called Yet Another
Resource Negotiator (YARN) has been added.

9) The CAP theorem is also called the Brewer theorem.

10) In-memory analytics technology helps query data that resides in a computer's random
access memory (RAM) rather than data stored on physical disks.

11) Eventual consistency is a consistency model used in distributed computing to achieve
high availability.

12) Scalability is an important advantage of the shared nothing architecture.

13) In a shared disk architecture, multiple processors have their own private memory.

14) In a shared memory architecture, central memory is shared by multiple processors.

15) Ambari is a web-based tool for provisioning, managing and monitoring Apache Hadoop
clusters.
Unit 4 Introduction to Hadoop
1. The 3V’s term of Big Data was first introduced by _________

a) Doug Laney b) Brewer c) Carlo Strozzi d) Doug Cutting

Ans: A

2. Hadoop was first introduced by __________

a) Doug Laney b) Brewer c) Carlo Strozzi d) Doug Cutting

Ans : D

3. NoSQL was first introduced by ____________

a) Doug Laney b) Brewer c) Carlo Strozzi d) Doug Cutting

Ans : C

4. NameNode in HDFS uses _________ to store the file system name space.

a) EditLog b) FsImage c) Data node d) Map reduce

Ans: B

5. NameNode in HDFS uses _________ to record every transaction.

a) EditLog b) FsImage c) Data node d) Map reduce

Ans: A

6. A typical block size used by HDFS is _______

a) 32 MB b) 64 MB c) 64 KB d) 32 KB

Ans: B

7. How many blocks will be created for a file that is 300 MB? The default block size is 64
MB and the replication factor is 3.

a) 30 b)15 c)5 d) 100

Ans: B

8. What does the Job Tracker do ?

a) stores blocks of data b) stores metadata c) coordinates and schedules the job d) acts
as a mini reducer
Ans: C

9. The MapReduce programming model widely used in analytics was developed at ______

a) Yahoo b) IBM c) Google d) Facebook

Ans: C

10. On a single Hadoop cluster, how many NameNodes can run ?

a) depends on clusters b) only one c) only 3 d) depends on data nodes

Ans: A

11. HDFS is based on _______

a) Facebook file system b) Google file system c) IBM file system d) Yahoo file system

Ans.: B

12. Apache Hadoop YARN is a sub-project of ______

a) Hadoop 1.0 b) Hadoop 2.0 c) Both d) None

Ans: B

13. YARN stands for ______________

a) Yet Another Relocator Node b) Yet Apache Resource Negotiator

c) Yet Another Resource Negotiator d) Yahoo Another Resource Negotiator

Ans: C

14. Hadoop supports structured, semi-structured and unstructured data formats.

15. In Hadoop, data is processed in parallel.

16. NameNode uses FsImage to store file system namespace.

17. NameNode uses EditLog to record every transaction

18. The Secondary NameNode is a helper or housekeeping daemon.

19. The DataNode is responsible for read/write file operations.

20. Hadoop 2.x is based on YARN architecture.

21. YARN is responsible for Cluster Management.

22. HDFS has Master/Slave architecture.

23. HDFS is built using Java Language.


24. The NameNode periodically receives a Heartbeat and a block report from each of the
data nodes in the cluster.

25. Receipt of a heartbeat implies that the DataNode is functioning properly.

26. A block report contains a list of all blocks on a data node.

27. Match me (answers are given here in the table)

HDFS Storage

Mapreduce programming Processing Data

Master Node Name Node

Slave Node Data Node

Hadoop Implementation Google File System and Map Reduce

28. Match me (answers are given here in the table)

Name Node Handles storage on master

Job Tracker Handles processing on master

Data Node Handles storage on slave

Task Tracker Handles processing on slave

29. Oozie is used to import/export data from an RDBMS. Ans:- False

30. “hadoop fs -ls /” will show the contents of the HDFS root directory. Ans:- True
BIG DATA ANALYTICS SKNSCOEK

UNIT - I

1. As companies move past the experimental phase with Hadoop, many cite the need
for additional capabilities, including:
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management and SQL support

2. Point out the correct statement :


a) Hadoop does need specialized hardware to process the data
b) Hadoop 2.0 allows live stream processing of real time data
c) In the Hadoop programming framework output files are divided into lines or records
d) None of the mentioned

3. According to analysts, for what can traditional IT systems provide a foundation when
they’re integrated with big data technologies like Hadoop ?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data

4. Hadoop is a framework that works with a variety of related tools. Common cohorts
include:
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet

5. Point out the wrong statement :


a) Hadoop’s processing capabilities are huge and its real advantage lies in the ability to
process terabytes & petabytes of data
b) Hadoop uses a programming model called “MapReduce”; all programs should
conform to this model in order to work on the Hadoop platform

c) The programming model, MapReduce, used by Hadoop is difficult to write and test
d) All of the mentioned

6. What was Hadoop named after?


a) Creator Doug Cutting’s favorite circus act
b) Cutting’s high school rock band
c) The toy elephant of Cutting’s son
d) A sound Cutting’s laptop made during Hadoop’s development

7. All of the following accurately describe Hadoop, EXCEPT:


a) Open source
b) Real-time
c) Java-based
d) Distributed computing approach

8. __________ can best be described as a programming model used to develop Hadoop-


based applications that can process massive amounts of data.
a) MapReduce
b) Mahout
c) Oozie
d) All of the mentioned

9. __________ has the world’s largest Hadoop cluster.


a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned

10. Facebook Tackles Big Data With _______ based on Hadoop.


a) ‘Project Prism’
b) ‘Prism’
c) ‘Project Big’
d) ‘Project Data’

UNIT – II

1. ________ is a platform for constructing data flows for extract, transform, and load
(ETL) processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive

2. Point out the correct statement :


a) Hive is not a relational database, but a query engine that supports the parts of SQL
specific to querying data
b) Hive is a relational database with SQL support
c) Pig is a relational database with SQL support
d) All of the mentioned

3. _________ hides the limitations of Java behind a powerful and concise Clojure API for
Cascading.
a) Scalding
b) HCatalog
c) Cascalog
d) All of the mentioned

4. Hive also supports custom extensions written in :


a) C#
b) Java
c) C
d) C++

5. Point out the wrong statement :


a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering
b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop
offering

c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
d) All of the mentioned

6. ________ is the most popular high-level Java API in Hadoop Ecosystem


a) Scalding
b) HCatalog
c) Cascalog
d) Cascading

7. ___________ is a general-purpose computing model and runtime system for
distributed data analytics.
a) Mapreduce
b) Drill
c) Oozie
d) None of the mentioned

8. The Pig Latin scripting language is not only a higher-level data flow language but
also has operators similar to :
a) SQL
b) JSON
c) XML
d) All of the mentioned

9. _______ jobs are optimized for scalability but not latency.


a) Mapreduce
b) Drill
c) Oozie
d) Hive

10. ______ is a framework for performing remote procedure calls and data serialization.
a) Drill
b) BigTop
c) Avro
d) Chukwa

UNIT - III

1. IBM and ________ have announced a major initiative to use Hadoop to support
university courses in distributed computer programming.
a) Google Latitude
b) Android (operating system)
c) Google Variations
d) Google

2. Point out the correct statement :


a) Hadoop is an ideal environment for extracting and transforming small volumes of
data
b) Hadoop stores data in HDFS and supports data compression/decompression
c) The Giraph framework is less useful than a MapReduce job for solving graph and
machine learning problems
d) None of the mentioned

3. What license is Hadoop distributed under ?


a) Apache License 2.0
b) Mozilla Public License
c) Shareware
d) Commercial

4. Sun also has the Hadoop Live CD ________ project, which allows running a fully
functional Hadoop cluster using a live CD.
a) OpenOffice.org
b) OpenSolaris
c) GNU
d) Linux

5. Which of the following genres does Hadoop produce ?


a) Distributed file system
b) JAX-RS
c) Java Message Service

d) Relational Database Management System

6. What was Hadoop written in ?


a) Java (software platform)
b) Perl
c) Java (programming language)
d) Lua (programming language)

7. Which of the following platforms does Hadoop run on ?


a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like

8. Hadoop achieves reliability by replicating the data across multiple hosts, and hence
does not require ________ storage on hosts.
a) RAID
b) Standard RAID levels
c) ZFS
d) Operating system

9. Above the file systems comes the ________ engine, which consists of one Job Tracker,
to which client applications submit MapReduce jobs.
a) MapReduce
b) Google
c) Functional programming
d) Facebook

10. The Hadoop list includes the HBase database, the Apache Mahout ________ system,
and matrix operations.
a) Machine learning
b) Pattern recognition
c) Statistical classification
d) Artificial intelligence

UNIT – IV

1. A ________ node acts as the Slave and is responsible for executing a Task assigned to
it by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker

2. Point out the correct statement :


a) MapReduce tries to place the data and the compute as close as possible
b) Map Task in MapReduce is performed using the Mapper() function
c) Reduce Task in MapReduce is performed using the Map() function
d) All of the mentioned

3. ___________ part of the MapReduce is responsible for processing one or more chunks
of data and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned

4. _________ function is responsible for consolidating the results produced by each of


the Map() functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned

5. Point out the wrong statement :


a) A MapReduce job usually splits the input data-set into independent chunks which
are processed by the map tasks in a completely parallel manner
b) The MapReduce framework operates exclusively on <key, value> pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide the
map and reduce methods

d) None of the mentioned

6. Although the Hadoop framework is implemented in Java, MapReduce applications


need not be written in :
a) Java
b) C
c) C#
d) None of the mentioned

7. ________ is a utility which allows users to create and run jobs with any executables
as the mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned

8. __________ maps input key/value pairs to a set of intermediate key/value pairs.


a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned

9. The number of maps is usually driven by the total size of :


a) inputs
b) outputs
c) tasks
d) None of the mentioned

10. _________ is the default Partitioner for partitioning key space.


a) HashPar
b) Partitioner
c) HashPartitioner
d) None of the mentioned

UNIT – V

1. Mapper implementations are passed the JobConf for the job via the ________ method
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configureable
d) None of the mentioned

2. Point out the correct statement :


a) Applications can use the Reporter to report progress
b) The Hadoop MapReduce framework spawns one map task for each InputSplit
generated by the InputFormat for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-
len, value) format
d) All of the mentioned

3. Input to the _______ is the sorted output of the mappers.


a) Reducer
b) Mapper
c) Shuffle
d) All of the mentioned

4. The right number of reduces seems to be :


a) 0.90
b) 0.80
c) 0.36
d) 0.95

5. Point out the wrong statement :


a) Reducer has 2 primary phases
b) Increasing the number of reduces increases the framework overhead, but increases
load balancing and lowers the cost of failures

c) It is legal to set the number of reduce-tasks to zero if no reduction is desired


d) The framework groups Reducer inputs by keys (since different mappers may have
output the same key) in sort stage

6. The output of the _______ is not sorted in the Mapreduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned

7. Which of the following phases occur simultaneously ?


a) Shuffle and Sort
b) Reduce and Sort
c) Shuffle and Map
d) All of the mentioned

8. Mapper and Reducer implementations can use the ________ to report progress or just
indicate that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

9. __________ is a generalization of the facility provided by the MapReduce framework to


collect data output by the Mapper or the Reducer
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

10. _________ is the primary interface for a user to describe a MapReduce job to the
Hadoop framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf

d) None of the mentioned



UNIT - VI

1. Which of the following scripts generates more than three MapReduce jobs ?
a)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = group a by (j#'PIG_SCRIPT_ID', j#'USER', j#'JOBNAME');
c = for b generate group.$1, group.$2, COUNT(a);
d = filter c by $2 > 3;
dump d;
b)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = display a by (j#'PIG_SCRIPT_ID', j#'USER', j#'JOBNAME');
c = foreach b generate group.$1, group.$2, COUNT(a);
d = filter c by $2 > 3;
dump d;
c)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = group a by (j#'PIG_SCRIPT_ID', j#'USER', j#'JOBNAME');
c = foreach b generate group.$1, group.$2, COUNT(a);
d = filter c by $2 > 3;
dump d;
d) None of the mentioned

2. Point out the correct statement :


a) LoadPredicatePushdown is same as LoadMetadata.setPartitionFilter
b) getOutputFormat() is called by Pig to get the InputFormat used by the loader
c) Pig works with data from many sources
d) None of the mentioned

3. Which of the following finds the running time of each script (in seconds) ?
a)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
c = group b by (id, user, script_name);
d = foreach c generate group.user, group.script_name, (MAX(b.end) - MIN(b.start))/1000;
dump d;
b)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = for a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
c = group b by (id, user, script_name);
d = for c generate group.user, group.script_name, (MAX(b.end) - MIN(b.start))/1000;
dump d;
c)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
d) All of the mentioned

4. Which of the following scripts determines the number of scripts run by user and
queue on a cluster ?
a)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;
b)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
c)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
d) None of the mentioned

5. Point out the wrong statement :


a) Pig can invoke code in a language like Java only.
b) Pig enables data workers to write complex data transformations without knowing
Java.
c) Pig’s simple SQL-like scripting language is called Pig Latin, and appeals to developers
already familiar with scripting languages and SQL.
d) Pig is complete, so you can do all required data manipulations in Apache Hadoop
with Pig.

6. Which of the following scripts is used to check scripts that have failed jobs ?
a)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;
b)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
c)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
d) None of the mentioned

7. Which of the following code is used to find scripts that use only the default
parallelism ?
a)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;
b)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
c)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
d) None of the mentioned

8. Pig Latin is _______ and fits very naturally in the pipeline paradigm while SQL is
instead declarative.
a) functional
b) procedural
c) declarative

d) All of the mentioned

9. In comparison to SQL, Pig uses :


a) lazy evaluation
b) ETL
c) Supports pipeline splits
d) All of the mentioned

10. Which of the following is an entry in jobconf ?


a) pig.job
b) pig.input.dirs
c) pig.feature
d) None of the mentioned

UNIT – VII

1. Which of the following is shortcut for DUMP operator ?


a) \de alias
b) \d alias
c) \q
d) None of the mentioned

2. Point out the correct statement:


a) Invoke the Grunt shell using the “enter” command
b) Pig does not support jar files
c) Both the run and exec commands are useful for debugging because you can modify a
Pig script in an editor
d) All of the mentioned

3. Which of the following command is used to show values to keys used in Pig ?
a) set
b) declare
c) display
d) All of the mentioned

4. Use the __________ command to run a Pig script that can interact with the Grunt
shell (interactive mode).
a) fetch
b) declare
c) run
d) All of the mentioned

5. Point out the wrong statement:


a) You can run Pig scripts from the command line and from the Grunt shell
b) DECLARE defines a Pig macro
c) Use Pig scripts to place Pig Latin statements and Pig commands in a single file
d) None of the mentioned

6. Which of the following command can be used for debugging ?


a) exec
b) execute
c) error
d) throw

7. Which of the following file contains user defined functions (UDFs) ?


a) script2-local.pig
b) pig.jar
c) tutorial.jar
d) excite.log.bz2

8. Which of the following is correct syntax for parameter substitution using cmd ?
a) pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun]
script
b) {%declare | %default} param_name param_value
c) {%declare | %default} param_name param_value cmd
d) All of the mentioned

9. You can specify parameter names and parameter values in one of the ways:
a) As part of a command line.
b) In parameter file, as part of a command line
c) With the declare statement, as part of Pig script
d) All of the mentioned

10. _________ are scanned in the order they are specified on the command line.
a) Command line parameters
b) Parameter files
c) Declare and default preprocessors
d) Both parameter files and command line parameters

UNIT – VIII

1. _________ operator is used to review the schema of a relation.


a) DUMP
b) DESCRIBE
c) STORE
d) EXPLAIN

2. Point out the correct statement :


a) During the testing phase of your implementation, you can use LOAD to display
results to your terminal screen
b) You can view outer relations as well as relations defined in a nested FOREACH
statement
c) Hadoop properties are interpreted by Pig
d) None of the mentioned

3. Which of the following operator is used to view the map reduce execution plans ?
a) DUMP
b) DESCRIBE
c) STORE
d) EXPLAIN

4. ___________ operator is used to view the step-by-step execution of a series of


statements.
a) ILLUSTRATE
b) DESCRIBE
c) STORE
d) EXPLAIN

5. Point out the wrong statement :


a) ILLUSTRATE operator is used to review how data is transformed through a sequence
of Pig Latin statements
b) ILLUSTRATE is based on an example generator
c) Several new private classes make it harder for external tools such as Oozie to
integrate with Pig statistics


d) None of the mentioned

6. __________ is a framework for collecting and storing script-level statistics for Pig
Latin.
a) Pig Stats
b) PStatistics
c) Pig Statistics
d) None of the mentioned

7. The ________ class mimics the behavior of the Main class but gives users a statistics
object back.
a) PigRun
b) PigRunner
c) RunnerPig
d) None of the mentioned

8. ___________ is a simple xUnit framework that enables you to easily test your Pig
scripts.
a) PigUnit
b) PigXUnit
c) PigUnitX
d) All of the mentioned

9. Which of the following will compile the Pigunit ?


a) $pig_trunk ant pigunit-jar
b) $pig_tr ant pigunit-jar
c) $pig_ ant pigunit-jar
d) None of the mentioned

10. PigUnit runs in Pig’s _______ mode by default.


a) local
b) tez
c) mapreduce
d) None of the mentioned
