Bda MCQ
5) When NameNode starts up, it reads the ______ and ______ from
disk.
a) TaskTracker, JobTracker b) FsImage, EditLog
c) Master Node, Slave Node d) None of the above.
49) MongoDB is
a) RDBMS b) Document-oriented DBMS
c) Object Oriented DBMS d) Key-value store
55) A ____ serves as the master and there is only one NameNode
per cluster.
a) Data Node b) NameNode c) Data block d) Replication
62) HDFS is based on
a) Facebook file system b) Google file system
c) IBM file system d) Yahoo file system
Which of the following is the correct sequence of the MapReduce data flow?
a. Input -> Reducer -> Mapper -> Combiner -> Output
b. Input -> Mapper -> Reducer -> Combiner -> Output
c. Input -> Mapper -> Combiner -> Reducer -> Output
d. Input -> Reducer -> Combiner -> Mapper -> Output
66) Hive is used as
a) Data Flow Language b) Data Warehousing Language
c) Workflow Language d) Scheduling Language
69) How many NameNodes can run on a single Hadoop cluster?
a)depend on clusters b) only one
c) only 3 d) depend on data nodes
SECTION-II
1. Pig in the Hadoop ecosystem is
Ans: A
B. READ
C. LOAD -ans
3. You can run Pig in interactive mode using the ______ shell.
A. Grunt -ans
B. FS
C. HDFS
4. ________ is the slave/worker node and holds the user data in the form of Data
Blocks.
A. DataNode -ans
B. NameNode
C. Data block
D. Replication
6. A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node b) NameNode c) Data block d) Replication
Which of the following is the correct sequence of the MapReduce data flow?
a. Input -> Reducer -> Mapper -> Combiner -> Output
b. Input -> Mapper -> Reducer -> Combiner -> Output
c. Input -> Mapper -> Combiner -> Reducer -> Output -ans
d. Input -> Reducer -> Combiner -> Mapper -> Output
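The flow in option (c), Input -> Mapper -> Combiner -> Reducer -> Output, can be sketched as a single-process word count in Python (a toy illustration of the phases, not actual Hadoop API code; the sample input lines are invented):

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def combine_or_reduce(pairs):
    # Sum counts per key; the combiner and the reducer run the same
    # aggregation logic in a word count, so one function serves both.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

lines = ["big data big cluster", "big data"]

# Input -> Mapper
mapped = [pair for line in lines for pair in mapper(line)]
# -> Combiner (local pre-aggregation per map task, simulated globally here)
combined = combine_or_reduce(mapped)
# -> Reducer -> Output
output = dict(combine_or_reduce(combined))

print(output["big"])  # 3
```

In a real cluster the combiner runs on each mapper's local output before the shuffle, which is why it sits between the map and reduce phases in option (c).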
13. Hive is used as ______________
a) Data Flow language b) Data Warehousing language
c) Workflow language d) Scheduling language
14. MapReduce was devised by ...
a) YARN
b) HDFS
c) Map Reduce
d) All of the above - ans
23) Which of the following is a column-oriented database that runs on top of HDFS?
a) Hive
b) Sqoop
c) HBase
d) Flume
24) Which of the following is not a daemon process that runs on a
Hadoop cluster?
a. JobTracker
b. DataNode
c. TaskTracker
d. TaskNode -ans
25) MongoDB stores data in the form of ________
a) tables
b) collections -ans
c) rows
d) all of the mentioned
26) Which of the following queries selects documents in the records collection that
match the condition { "user_id": { $lt: 42 } }?
27) Which of the following key is used to denote uniqueness in the collection of
MongoDB?
a) _id -ans
b) id
c) id_
d) none of the mentioned
28) Which of the following lines skips the first 5 documents in the bios collection and
returns all remaining documents in MongoDB?
a) db.bios.find().limit( 5 )
b) db.bios.find().skip( 1 )
c) db.bios.find().skip( 5 )
d) db.bios.find().sort( 5 )
Ans: C
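The skip()/limit() cursor semantics behind question 28 can be simulated in Python on a plain list (a minimal sketch, not a real MongoDB driver call; the bios documents here are invented for illustration):

```python
# In-memory sketch of MongoDB's skip()/limit() cursor behaviour.
# The bios documents are made up for illustration.
bios = [{"_id": i, "name": f"person-{i}"} for i in range(1, 11)]  # 10 docs

def find(docs, skip=0, limit=None):
    """Mimic db.bios.find().skip(n).limit(m) on a plain list."""
    result = docs[skip:]          # skip(n): drop the first n documents
    if limit is not None:
        result = result[:limit]   # limit(m): cap the result at m documents
    return result

remaining = find(bios, skip=5)    # skips the first 5, returns the rest
first_five = find(bios, limit=5)  # returns only the first 5

print(len(remaining))        # 5 documents remain after skipping 5 of 10
print(remaining[0]["_id"])   # 6
```

This mirrors why option (c), db.bios.find().skip( 5 ), is the answer: skip discards a prefix of the result set, while limit truncates it.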
Ans: B
Ans: B
Ans : C
Ans : A
6. Example on types of data (match each Column A item to its data type)
Database – structured data
MS Excel – structured data
Images – unstructured data
Videos – unstructured data
Facebook – unstructured data
IBM UIMA – framework for managing unstructured information
Unit 2 Introduction to Big Data
1. Doug Laney, a Gartner analyst, coined the term ‘Big Data’.
3. A Data Lake is a large repository of data kept in its native format until it is needed.
5. Near-real-time or real-time processing deals with the Velocity characteristic of
data.
6. Big Data is high-volume, high-velocity and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and
decision making.
Column A Answer
PostgreSQL Open-source relational database
Scientific Data Machine-generated unstructured data
Point of Sale Machine-generated structured data
Social Media Data Human-generated unstructured data
Gaming Related Data Human-generated structured data
Mobile Data Human-generated unstructured data
Unit 3 Big Data Analytics
1) The expansion for CAP is ____________,______________ and _____________ .
Ans: D
Ans: A
Ans: C
Ans: A
5) __________ is a robust database that supports the ACID properties of transactions and has the
scalability of NoSQL.
Ans: B
8) In Hadoop 2.0, a new and separate resource-management framework called Yet Another
Resource Negotiator (YARN) has been added.
10) In-memory analytics technology queries data that resides in a computer's random-access
memory (RAM) rather than data stored on physical disks.
11) Eventual consistency is a consistency model used in distributed computing to achieve high
availability.
13) In a shared-disk architecture, multiple processors share disk storage but each has its own private memory.
15) Ambari is a web-based tool for provisioning, managing and monitoring Apache Hadoop
clusters.
Unit 4 Introduction to Hadoop
1. The 3 V terms of Big Data were first introduced by _________
Ans: A
Ans : D
Ans : C
4. The NameNode in HDFS uses _________ to store the file system namespace.
Ans: B
Ans: A
a) 32 MB b) 64 MB c) 64 KB d) 32 KB
Ans: B
7. How many blocks will be created for a file that is 300 MB? The default block size is 64
MB and the replication factor is 3.
Ans: B
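The arithmetic behind question 7 can be checked with a short Python sketch (block size and replication factor taken from the question):

```python
import math

file_size_mb = 300
block_size_mb = 64   # HDFS default block size per the question
replication = 3      # replication factor per the question

# Logical blocks: a file is split into fixed-size blocks, with the
# final block holding whatever is left over.
blocks = math.ceil(file_size_mb / block_size_mb)

# Physical copies stored across the cluster after replication.
physical_copies = blocks * replication

print(blocks)           # 5 (four 64 MB blocks + one 44 MB block)
print(physical_copies)  # 15
```

Note that replication does not change the number of logical blocks, only how many copies of each block the cluster stores.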
a) stores blocks of data b) stores metadata c) coordinates and schedules jobs d) acts as a mini
reducer
Ans: C
9. The MapReduce programming model widely used in analytics was developed at ______
Ans: C
10. How many NameNodes can run on a single Hadoop cluster?
Ans: A
a) Facebook file system b) Google file system c) IBM file system d) Yahoo file system
Ans.: B
Ans: B
Ans: C
25. Receipt of a heartbeat implies that the DataNode is functioning properly.
HDFS Storage
30. "hadoop fs -ls /" will show the contents of the HDFS root directory. Ans:- True
BIG DATA ANALYTICS SKNSCOEK
UNIT - I
1. As companies move past the experimental phase with Hadoop, many cite the need
for additional capabilities, including:
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management and SQL support
3. According to analysts, for what can traditional IT systems provide a foundation when
they’re integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
4. Hadoop is a framework that works with a variety of related tools. Common cohorts
include:
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet
c) The programming model, MapReduce, used by Hadoop is difficult to write and test
d) All of the mentioned
UNIT – II
1. ________ is a platform for constructing data flows for extract, transform, and load
(ETL) processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive
3. _________ hides the limitations of Java behind a powerful and concise Clojure API for
Cascading.
a) Scalding
b) HCatalog
c) Cascalog
d) All of the mentioned
c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
d) All of the mentioned
8. The Pig Latin scripting language is not only a higher-level data flow language but
also has operators similar to:
a) SQL
b) JSON
c) XML
d) All of the mentioned
10. ______ is a framework for performing remote procedure calls and data serialization.
a) Drill
b) BigTop
c) Avro
d) Chukwa
UNIT - III
1. IBM and ________ have announced a major initiative to use Hadoop to support
university courses in distributed computer programming.
a) Google Latitude
b) Android (operating system)
c) Google Variations
d) Google
4. Sun also has the Hadoop Live CD ________ project, which allows running a fully
functional Hadoop cluster using a live CD.
a) OpenOffice.org
b) OpenSolaris
c) GNU
d) Linux
8. Hadoop achieves reliability by replicating the data across multiple hosts, and hence
does not require ________ storage on hosts.
a) RAID
b) Standard RAID levels
c) ZFS
d) Operating system
9. Above the file systems comes the ________ engine, which consists of one Job Tracker,
to which client applications submit MapReduce jobs.
a) MapReduce
b) Google
c) Functional programming
d) Facebook
10. The Hadoop list includes the HBase database, the Apache Mahout ________ system,
and matrix operations.
a) Machine learning
b) Pattern recognition
c) Statistical classification
d) Artificial intelligence
UNIT – IV
1. A ________ node acts as the Slave and is responsible for executing a Task assigned to
it by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
3. ___________ part of the MapReduce is responsible for processing one or more chunks
of data and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned
7. ________ is a utility which allows users to create and run jobs with any executables
as the mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
UNIT – V
1. Mapper implementations are passed the JobConf for the job via the ________ method
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configureable
d) None of the mentioned
6. The output of the _______ is not sorted in the Mapreduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned
8. Mapper and Reducer implementations can use the ________ to report progress or just
indicate that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned
10. _________ is the primary interface for a user to describe a MapReduce job to the
Hadoop framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf
UNIT - VI
1. Which of the following scripts generates more than three MapReduce jobs?
a)
3. Which of the following finds the running time of each script (in seconds)?
a)
4. Which of the following scripts determines the number of scripts run by user and
queue on a cluster?
a)
6. Which of the following scripts is used to check for scripts that have failed jobs?
a)
7. Which of the following code snippets is used to find scripts that use only the default
parallelism?
a)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;
b)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
c)
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
d) None of the mentioned
8. Pig Latin is _______ and fits very naturally in the pipeline paradigm while SQL is
instead declarative.
a) functional
b) procedural
c) declarative
UNIT – VII
3. Which of the following commands is used to show the values of keys used in Pig?
a) set
b) declare
c) display
d) All of the mentioned
4. Use the __________ command to run a Pig script that can interact with the Grunt
shell (interactive mode).
a) fetch
b) declare
c) run
d) All of the mentioned
8. Which of the following is the correct syntax for parameter substitution using cmd?
a) pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun]
script
b) {%declare | %default} param_name param_value
c) {%declare | %default} param_name param_value cmd
d) All of the mentioned
9. You can specify parameter names and parameter values in one of the following ways:
a) As part of a command line.
b) In parameter file, as part of a command line
c) With the declare statement, as part of Pig script
d) All of the mentioned
10. _________ are scanned in the order they are specified on the command line.
a) Command line parameters
b) Parameter files
c) Declare and default preprocessors
d) Both parameter files and command line parameters
UNIT – VIII
3. Which of the following operators is used to view the MapReduce execution plan?
a) DUMP
b) DESCRIBE
c) STORE
d) EXPLAIN
6. __________ is a framework for collecting and storing script-level statistics for Pig
Latin.
a) Pig Stats
b) PStatistics
c) Pig Statistics
d) None of the mentioned
7. The ________ class mimics the behavior of the Main class but gives users a statistics
object back.
a) PigRun
b) PigRunner
c) RunnerPig
d) None of the mentioned
8. ___________ is a simple xUnit framework that enables you to easily test your Pig
scripts.
a) PigUnit
b) PigXUnit
c) PigUnitX
d) All of the mentioned