
MALLA REDDY ENGINEERING COLLEGE (AUTONOMOUS)

Maisammaguda, Dhulapally (Post Via Kompally), Secunderabad 500100.


Department of Computer Science and Engineering
IV B.Tech I Sem II Mid Examination (MR18 Regulations-2021-22 Admitted Batch)
Subject: Big Data Analytics Marks: 20

1. _________ controls the partitioning of the keys of the intermediate map-outputs. [B]
A) Collector B) Partitioner C) Input Format D)None of the mentioned
2. A combinator in MapReduce is a function that: [C]
A) Assembles program fragments B) Helps in fragmenting the program C) Builds programs from
program fragments D) Builds program fragments
3. Output of the mapper is first written on the local disk for sorting and _________ process. [A]
A) Shuffling B) secondary sorting C) forking D) reducing
4. Point out the correct statement. [D]
A) Mapper maps input key/value pairs to a set of intermediate key/value pairs
B) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce
methods
C) Mapper and Reducer interfaces form the core of the job
D) None of the mentioned
5. Users can control which keys (and hence records) go to which Reducer by implementing a custom? [A]
A) Partitioner B) Output Split C) Reporter D) All of the above
6. The number of reduces for the job is set by the user via _________ [B]
A) JobConf.setNumTasks(int) B) JobConf.setNumReduceTasks(int) C) JobConf.setNumMapTasks(int)
D) All of the above
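As a study aid for questions 1, 5 and 6: below is a minimal sketch of a custom Partitioner using the classic org.apache.hadoop.mapred API (the one JobConf belongs to). The class name, key/value types and hashing rule are illustrative assumptions, not part of the syllabus.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Controls which reducer each intermediate map-output key is sent to.
public class FirstCharPartitioner implements Partitioner<Text, IntWritable> {

    @Override
    public void configure(JobConf job) {
        // No configuration needed for this sketch.
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Keys that share a first character land on the same reducer.
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}
```

In the driver, conf.setPartitionerClass(FirstCharPartitioner.class) installs it, and conf.setNumReduceTasks(4) is the call asked about in question 6 (the newer mapreduce API exposes the same setting on Job).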
7. The right level of parallelism for maps seems to be around _________ maps per-node. [B]
A) 1 to 10 B) 10 to 100 C) 100 to 150 D) 150 to 200
8. Applications can use the ____________ to report progress and set application-level status messages.[B]
A) Partitioner B) Reporter C) OutputSplit D) All of the above
9. _________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks. [A]
A) Reduce B) Map C) Reducer D) All of the above
10. The framework groups Reducer inputs by key in _________ stage. [A]
A) sort B) shuffle C) reduce D) none of the mentioned
11. The output of the reduce task is typically written to the File System via _____________ [A]
A) OutputCollector.collect B) OutputCollector.get C) OutputCollector.receive D) OutputCollector.put
12. _________ is the name of the archive you would like to create. [B]
A) Archive B) archiveName C) name D) none of the mentioned
13. What are the core methods of a Reducer? [D]
A) Setup(), reduce(), cleanup()
B) Get(), mapreduce(), cleanup()
C) Put(), reduce(), clean()
D) Set-up(), reduce(), cleanup()
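As a study aid for questions 13 and 11: a word-count style reducer sketch using the newer org.apache.hadoop.mapreduce API, showing the three core methods; the older mapred API wrote output through OutputCollector.collect instead of Context.write. Class and type choices are assumptions for the example.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the values collected for each key.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void setup(Context context) {
        // One-time initialization before any reduce() call.
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // The old mapred API equivalent is OutputCollector.collect(key, value).
        context.write(key, new IntWritable(sum));
    }

    @Override
    protected void cleanup(Context context) {
        // One-time teardown after the last reduce() call.
    }
}
```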
14. The split size is normally the size of a ________ block, which is appropriate for most applications.[D]
A) Generic B) Task C) Library D) HDFS
15. What license is Hadoop distributed under? [C]
A) Apache License 2.1 B) Apache License 2.2 C) Apache License 2.0 D) Apache License 1.0
16. Which of the following is false about RawComparator? [C]
A) Compares the keys by byte.
B) Performance can be improved in the sort and shuffle phase by using RawComparator
C) Intermediary keys are deserialized to perform a comparison
D) all of the mentioned
17. InputFormat class calls the ________ function and computes splits for each file and then sends them
to the job tracker. [C]
A) puts B) gets C) getSplits D) all of the mentioned
18. ______________ class allows the Map/Reduce framework to partition the map outputs based on
certain key fields, not the whole keys. [B]
A) KeyFieldPartitioner B) KeyFieldBasedPartitioner C) KeyFieldBased D) none of the mentioned
19. Which of the following is not a goal of HDFS? [C]
A) Fault detection and recovery
B) Handle huge dataset
C) Prevent deletion of data
D) Provide high network bandwidth for data movement
20. The daemons associated with the MapReduce phase are ________ and task-trackers. [A]
A) job-tracker B) map-tracker C) reduce-tracker D) all of the mentioned
21. Point out the wrong statement. [D]
A) The Mapper outputs are sorted and then partitioned per Reducer
B) The total number of partitions is the same as the number of reduce tasks for the job
C) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value)
format
D) None of the mentioned
22. Point out the correct statement. [A]
A) The minimum split size is usually 1 byte, although some formats have a lower bound on the split
size
B) Applications may impose a minimum split size
C) The maximum split size defaults to the maximum value that can be represented by a Java long type
D) all of the mentioned
23. The output of the _______ is not sorted in the MapReduce framework for Hadoop. [D]
A) Mapper B) Cascader C) Scalding D) none of the mentioned
24. Point out the correct statement. [D]
A) A Hadoop archive maps to a file system directory
B) Hadoop archives are special format archives
C) A Hadoop archive always has a *.har extension
D) all of the mentioned
25. Which of the following is the only way of running mappers? [B]
A) MapReducer B) MapRunner C) MapRed D) All of the mentioned
26. Hadoop comes with a set of ________ for data I/O. [D]
A) Methods B) commands C) classes D) none of the mentioned
27. Gzip (short for GNU zip) generates compressed files that have a _________ extension. [B]
A) .gzip B) .gz C) .gzp D) .g
28. The key, a ____________, is the byte offset within the file of the beginning of the line. [B]
A) LongReadable B) LongWritable C) VLongWritable D) All of the mentioned
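As a study aid for question 28: with the default TextInputFormat, a mapper receives a LongWritable byte-offset key and a Text line value. This sketch (all names assumed) exists only to make those input types visible.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// With TextInputFormat each record arrives as
// (byte offset of the line within the file, the line itself).
public class OffsetMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Emit line -> offset just to expose the (LongWritable, Text) input types.
        context.write(line, offset);
    }
}
```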
29. How many formats of SequenceFile are present in Hadoop I/O? [C]
A) 2 B) 3 C) 4 D) 5
30. Point out the correct statement. [D]
A) A Hadoop archive maps to a file system directory
B) Hadoop archives are special format archives
C) A Hadoop archive always has a *.har extension
D) All of the mentioned
31. Apache Hadoop ___________ provides a persistent data structure for binary key-value pairs.[B]
A) GetFile B) SequenceFile C) Putfile D) All of the mentioned
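As a study aid for question 31: a minimal sketch writing binary key-value pairs to a SequenceFile; the output path and records are assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Writes three key-value pairs to a SequenceFile.
public class SequenceFileWriteDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path("pairs.seq"); // illustrative output path
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            for (int i = 0; i < 3; i++) {
                writer.append(new IntWritable(i), new Text("record-" + i));
            }
        }
    }
}
```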
32. Apache _______ is a serialization framework that produces data in a compact binary format. [D]
A) Oozie B) Impala C) Kafka D) Avro
33. Which of the following is based on the DEFLATE algorithm? [C]
A) LZO B) Bzip2 C) Gzip D) All of the mentioned
34. Point out the correct statement. [D]
A) The sequence file also can contain a “secondary” key-value list that can be used as file metadata
B) SequenceFile formats share a header that contains some information which allows the reader to
recognize its format
C) There are key and value class names that allow the reader to instantiate those classes, via
reflection, for reading
D) All of the mentioned
35. The ____________ is an iterator which reads through the file and returns objects using the next()
method. [B]
A) DatReader B) DatumReader C) DatumRead D) none of the mentioned
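As a study aid for question 35: reading an Avro data file; the DatumReader deserializes each record and the file reader hands the objects back one at a time via next(). The file name is an assumption.

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

// Iterates over the records stored in an Avro data file.
public class AvroReadDemo {
    public static void main(String[] args) throws IOException {
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        File file = new File("records.avro"); // illustrative input file
        try (DataFileReader<GenericRecord> fileReader =
                new DataFileReader<>(file, datumReader)) {
            while (fileReader.hasNext()) {
                GenericRecord record = fileReader.next();
                System.out.println(record);
            }
        }
    }
}
```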
36. The _________ codec from Google provides modest compression ratios. [B]
A) Snapcheck B) Snappy C) FileCompress D) None of the mentioned
37. The ____________ data file is based on the Avro serialization framework, which was primarily created
for Hadoop. [B]
A) Oozie B) Avro C) cTakes D) Lucene
38. Point out the correct statement. [D]
A) The reduce input must have the same types as the map output, although the reduce output types may
be different again
B) The map input key and value types (K1 and V1) are different from the map output types
C) The partition function operates on the intermediate key
D) All of the mentioned
39. Which of the following format is more compression-aggressive? [C]
A) Partition Compressed B) Record Compressed C) Block-Compressed D) Uncompressed
40. In _____________ the default job is similar, but not identical, to the Java equivalent. [B]
A) Mapreduce B) Streaming C) Orchestration D) All of the mentioned
41. Which of the following works well with Avro? [C]
A) Lucene B) kafka C) MapReduce D) None of the mentioned
42. An input _________ is a chunk of the input that is processed by a single map. [B]
A) textformat B) split C) datanode D) all of the mentioned
43. The __________ is a directory that contains two SequenceFiles. [C]
A) ReduceFile B) MapperFile C) MapFile D) None of the mentioned
44. Which of the following is the default output format? [C]
A) TextFormat B) TextOutput C) TextOutputFormat D) None of the mentioned
45. The _________ appends just the value field, append(value), and the key is a LongWritable that contains
the record number, count + 1. [B]
A) SetFile B) ArrayFile C) BloomMapFile D) None of the mentioned
46. The ________ method in the ModelCountReducer class “reduces” the values the mapper collects into
a derived value. [C]
A) count B) add C) reduce D) all of the mentioned
47. Point out the wrong statement. [C]
A) The data file contains all the key, value records, but key N + 1 must be greater than or equal to
key N
B) SequenceFile is a kind of Hadoop file-based data structure
C) The MapFile type is splittable as it contains a sync point after several records
D) None of the mentioned
48. The ____________ class extends and implements several Hadoop-supplied interfaces. [C]
A) AvroReducer B) Mapper C) AvroMapper D) None of the mentioned
49. Point out the wrong statement. [C]
A) Hadoop sequence file format stores sequences of binary key-value pairs
B) SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence
file’s keys and values as opaque binary objects
C) SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence
file’s keys and values as opaque binary objects.
D) None of the mentioned
50. The ______ file is populated with the key and a LongWritable that contains the starting byte position
of the record. [B]
A) Array B) Index C) Immutable D) all of the mentioned
51. _________ is the output produced by TextOutputFormat, Hadoop's default OutputFormat. [B]
A) KeyValueTextInputFormat B) KeyValueTextOutputFormat C) FileValueTextInputFormat
D) None of the mentioned
52. Which of the following are NOT metadata items? [D]
A) List of HDFS files B) HDFS block locations C) Replication factor of files
D) File Records distribution
53. Point out the wrong statement. [B]
A) Java code is used to deserialize the contents of the file into objects
B) Avro allows you to use complex data structures within Hadoop MapReduce jobs
C) The m2e plugin automatically downloads the newly added JAR files and their dependencies
D) None of the mentioned
54. ___________ generates keys of type LongWritable and values of type Text. [B]
A) TextOutputFormat B) TextInputFormat C) OutputInputFormat D) None of the mentioned
55. An ___________ is responsible for creating the input splits, and dividing them into records. [D]
A) TextOutputFormat B) TextInputFormat C) OutputInputFormat D) InputFormat
56. __________ is a variant of SequenceFileInputFormat that converts the sequence file’s keys and
values to Text objects. [D]
A) SequenceFile B) SequenceFileAsTextInputFormat C) SequenceAsTextInputFormat D) all of the
mentioned
57. _____________ is another implementation of the MapRunnable interface that runs mappers
concurrently in a configurable number of threads. [C]
A) MultithreadedRunner B) MultithreadedMap C) MultithreadedMapRunner
D) SinglethreadedMapRunner
58. __________ class allows you to specify the InputFormat and Mapper to use on a per-path basis. [C]
A) MultipleOutputs B) SingleInputs C) MultipleInputs D) None of the mentioned
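As a study aid for question 58: MultipleInputs binds an InputFormat (and normally a dedicated Mapper) to each input path. The paths are assumptions and the identity Mapper stands in for real mapper classes.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultipleInputsDemo {
    // Wires a different InputFormat to each input path.
    public static void configure(Job job) {
        MultipleInputs.addInputPath(job, new Path("/in/text"),
                TextInputFormat.class, Mapper.class);
        MultipleInputs.addInputPath(job, new Path("/in/seq"),
                SequenceFileInputFormat.class, Mapper.class);
    }
}
```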
59. ____________ class accepts the values that the ModelCountMapper object has collected. [A]
A) AvroReducer B) Mapper C) AvroMapper D) None of the mentioned
60. The Hadoop MapReduce framework spawns one map task for each __________ generated by the
InputFormat for the job. [C]
A) OutputSplit B) InputSplitStream C) InputSplit D) All of the above
61. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the
default file system. [C]
A) Hive B) Pig C) Map Reduce D) All of the above
62. Which of the following are Big Data solution candidates? [D]
A) Processing 1.5 TB of data every day
B) Processing 30 minutes of flight sensor data
C) Interconnecting 50K data points (approx. 1 MB input file)
D) All of the above
63. The __________ guarantees that excess resources taken from a queue will be restored to it within N
minutes of its need for them. [B]
A) capacitor B) scheduler C) datanode D) none of the mentioned
64. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to
obtain a _________ for that split. [B]
A) InputReader B) RecordReader C) OutputReader D) None of the mentioned
65. Point out the wrong statement. [C]
A) Hadoop works better with a small number of large files than a large number of small files
B) CombineFileInputFormat is designed to work well with small files
C) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical
MapReduce job
D) All of the above
66. The default InputFormat is __________, which treats each line of the input as a value; the
associated key is its byte offset. [B]
A) TextFormat B) TextInputFormat C) InputFormat D) All of the above
67. _________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large
clusters. [C]
A) Flow Scheduler B) Data Scheduler C) Capacity Scheduler D) None of the mentioned
68. What is the default HDFS replication factor? [C]
A) 4 B) 1 C) 3 D) 2
69. Which of the following writes MapFiles as output? [C]
A) DBInpFormat B) MapFileOutputFormat C) SequenceFileAsBinaryOutputFormat D) None of the
mentioned
70. Which of the following methods adds a path or paths to the list of inputs? [B]
A) setInputPaths() B) addInputPath() C) TextFileInputFormat D) none of the mentioned
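As a study aid for question 70: setInputPaths() replaces the whole input list, while addInputPath() appends one more path. The paths are assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputPathsDemo {
    public static void configure(Job job) throws IOException {
        // Replaces the entire list of inputs in one call.
        FileInputFormat.setInputPaths(job, new Path("/data/day1"));
        // Appends a single additional path to the list.
        FileInputFormat.addInputPath(job, new Path("/data/day2"));
    }
}
```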
71. ___________ takes node and rack locality into account when deciding which blocks to place in the
same split. [B]
A) CombineFileOutputFormat B) CombineFileInputFormat
C) TextFileInputFormat D) none of the mentioned
72. __________ is the parent argument used to specify the relative path to which the files should be
archived. [B]
A) -archiveName <name> B) -p <parent_path> C) <destination> D) <source>
73. ______ is the base class for all implementations of InputFormat that use files as their data source. [B]
A) FileTextFormat B) FileInputFormat C) FileOutputFormat D) None of the mentioned
74. ___________ is an input format for reading data from a relational database, using JDBC. [B]
A) DBInput B) DBInputFormat C) DBInpFormat D) None of the mentioned
75. Point out the correct statement. [D]
A) With TextInputFormat and KeyValueTextInputFormat, each mapper receives a variable number of
lines of input
B) StreamXmlRecordReader, the page elements can be interpreted as records for processing by a mapper
C) The number depends on the size of the split and the length of the lines.
D) All of the mentioned
76. Which of the following is not true about Pig? [B]
A) Apache Pig is an abstraction over MapReduce
B) Pig cannot perform all the data manipulation operations in Hadoop.
C) Pig is a tool/platform which is used to analyze large data sets, representing them as data flows.
D) None of the above
77. Which of the following is/are a feature of Pig? [D]
A) Rich set of operators B) Ease of programming C) Extensibility D) All of the above
78. In which year was Apache Pig released? [B]
A) 2005 B) 2006 C) 2007 D) 2008
79. Pig mainly operates in how many modes? [A]
A) 2 B) 3 C) 4 D) 5
80. Which of the following companies developed Pig? [B]
A) Google B) Yahoo C) Microsoft D) Apple
81. Which of the following functions is used to read data in Pig? [D]
A) Write B) Read C) Perform D) Load
82. __________ is a framework for collecting and storing script-level statistics for Pig Latin. [C]
A) Pig Stats B) PStatistics C) Pig Statistics D) All of the above
83. Which of the following is a true statement? [D]
A) Pig is a high level language. B) Performing a Join operation in Apache Pig is pretty simple. C)
Apache Pig is a data flow language. D) All of the above
84. Which of the following will compile the PigUnit? [A]
A) $pig_trunk ant pigunit-jar B) $pig_tr ant pigunit-jar C) $pig_ ant pigunit-jar D) $pigtr_ ant
pigunit-jar
85. Point out the wrong statement. [A]
A) Pig can invoke code in languages like Java only
B) Pig enables data workers to write complex data transformations without knowing Java
C) Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already
familiar with scripting languages and SQL
D) Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig
86. You can run Pig in interactive mode using the ______ shell [A]
A) Grunt B) FS C) HDFS D) None of the mentioned
87. Which of the following is the default mode? [A]
A) Mapreduce B) Tez C) Local D) All of the mentioned
88. Use the __________ command to run a Pig script that can interact with the Grunt shell (interactive
mode). [C]
A) fetch B) declare C) run D) all of the mentioned
89. What are the different complex data types in Pig? [D]
A) Maps B) Tuples C) Bags D) All of these
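As a study aid for question 89: a Pig Latin schema sketch declaring all three complex types in one relation; the file and field names are assumptions.

```pig
-- Illustrative only: a tuple, a bag of tuples and a map in one schema.
students = LOAD 'students.txt'
           AS (name:   chararray,
               addr:   tuple(street:chararray, city:chararray),
               grades: bag{g: tuple(course:chararray, score:int)},
               extras: map[]);
DESCRIBE students;
```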
90. What are the various diagnostic operators available in Apache Pig? [D]
A) Dump Operator B) Describe Operator C) Explain Operator D) All of these
91. The query "SHOW DATABASES LIKE 'h.*';" gives as output the databases [B]
A) containing h in their name B) starting with h C) ending with h D) containing 'h.'
92. Which of the following statements is correct? [B]
A) Pig is an execution engine that replaces the MapReduce core in Hadoop.
B) Pig is an execution engine that utilizes the MapReduce core in Hadoop.
C) Pig is an execution engine that compiles Pig Latin scripts into database queries.
D) Pig is an execution engine that compiles Pig Latin scripts into HDFS.
93. Which of the following statements about Pig are not correct? [A]
A) In general, to implement a task, the number of lines of code in Pig and Hadoop are roughly the same
B) Pig makes use of Hadoop job chaining
C) Code written for the Pig engine is compiled into Hadoop jobs
D) Code written for the Pig engine is directly compiled into machine code
94. Consider a file whose first column contains names. You are tasked with writing a Pig Latin script
that outputs the unique names occurring in this file. Which Pig Latin operators do you use (choose the
minimum number)? [A]
A) foreach, distinct B) filter, distinct C) foreach, filter D) foreach
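A sketch of the minimal answer to question 94 in Pig Latin: FOREACH projects the first column and DISTINCT removes duplicates. The file name, delimiter and schema are assumptions.

```pig
records    = LOAD 'people.txt' USING PigStorage(',')
             AS (name:chararray, age:int);
names_only = FOREACH records GENERATE name;   -- keep only the first column
uniq_names = DISTINCT names_only;             -- drop duplicate names
DUMP uniq_names;
```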
95. Which of the following definitions of complex data types in Pig are correct? [B]
A) Tuple: a set of key/value pairs B) Tuple: an ordered set of fields. C) Bag: a collection of
key/value pairs. D) Bag: an ordered set of fields.
96. Which guarantee that Hadoop provides does Pig break? [B]
A) Calls to the Reducer's reduce() method only occur after the last Mapper has finished running.
B) All values associated with a single key are processed by the same Reducer.
C) The Combiner (if defined) may run multiple times, on the Map-side as well as the Reduce-side.
D) Task stragglers due to slow machines (not data skew) can be sped up through speculative execution.
97. Which of the following statements about Pig is correct? [C]
A) Pig always generates the same number of Hadoop jobs given a particular script, independent of the
amount/type of data that is being processed.
B) Pig replaces the MapReduce core with its own execution engine.
C) Pig may generate a different number of Hadoop jobs given a particular script, dependent on the
amount/type of data that is being processed.
D) When doing a default join, Pig will detect which join-type is probably the most efficient.
98. Which of the following definitions of complex data types in Pig are correct? [B]
A) Tuple: a set of key/value pairs. B) Tuple: an ordered set of fields C) Map: a collection of
tuples. D) Bag: an ordered set of fields
99. Which of the following is/are a feature of Pig? [D]
A) Rich set of operators B) Ease of programming C) Extensibility D) All of the above
100. What can Hive not offer? [B]
A) storing data in tables and columns B) Online transaction processing C) Handling date time data
D) Partitioning stored data
101. Using the ALTER DATABASE command on a database, you can change the [C]
A) database name B) database creation time C) dbproperties D) directory where the database is
stored
102. The partitioning of a table in Hive creates more [B]
A) subdirectories under the database name B) subdirectories under the table name C) files under
database name D) directory where the database is stored
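As a study aid for question 102: a partitioned Hive table; each distinct partition value becomes a subdirectory under the table's own directory. All names are assumptions.

```sql
-- Data for course='bigdata' lands under a subdirectory of the table, e.g.
-- .../warehouse/mydb.db/scores/course=bigdata/
CREATE TABLE scores (
    student STRING,
    marks   INT
)
PARTITIONED BY (course STRING);
```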
103. Creating a table and loading it with a select clause in one query applies to [A]
A) only managed tables B) only external tables C) Both types of tables D) All of the above
104. An element in a STRUCT column in Hive is referred to by [D]
A) index B) key C) colon D) dot
105. CREATE TABLE TABLE_NAME LIKE VIEW_NAME [A]
A) creates a table which is copy of the view B) is invalid C) runs only if the view has data D) runs
only if the view is in same directory as the table
106. Indexes can be created [A]
A) only on managed tables B) only on views C) only on external tables D) only on views with
partitions
107. If a Hive query produces an unexpected result, its cause can be investigated using [B]
A) Block size in HDFS B) Virtual columns C) Virtual parameters D) Query logs
108. The command to list the functions currently loaded in a Hive session is [B]
A) LIST FUNCTIONS B) SHOW FUNCTIONS C) DESCRIBE FUNCTIONS D) FIND
FUNCTIONS
109. To add a new user-defined function permanently to Hive, we need to [C]
A) Create a new version of Hive B) Add the .class Java code to the FunctionRegistry C) Add the .jar
Java code to the FunctionRegistry D) Add the .jar Java code to $HOME/.hiverc
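For contrast with question 109: session-scoped UDF registration in HiveQL; placing these same lines in $HOME/.hiverc replays them at the start of every session. The jar path, function name and class are assumptions.

```sql
ADD JAR /tmp/my-udfs.jar;          -- illustrative path
CREATE TEMPORARY FUNCTION to_upper -- illustrative function name
    AS 'com.example.hive.ToUpperUDF';
```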
110. Which of the following is/are INCORRECT with respect to Hive? [B]
A) Hive provides an SQL interface to process large amounts of data
B) Hive needs a relational database like Oracle to perform query operations and store data.
C) Hive works well on all files stored in HDFS
D) Both A and B
111. Which of the following is not a feature of HiveQL? [D]
A) Supports joins B) Supports indexes C) Supports views D) Supports transactions
112. Which of the following operator executes a shell command from the Hive shell? [B]
A) | B) ! C) # D) $
113. Hive uses _________ for logging. [D]
A) logj4 B) log4l C) log4i D) log4j
114. HCatalog is installed with Hive, starting with Hive release ___________ [C]
A) 0.10.0 B) 0.9.0 C) 0.11.0 D) 0.12.0
115. _______ supports a new command shell Beeline that works with HiveServer2. [A]
A) HiveServer2 B) HiveServer3 C) HiveServer4 D) HiveServer5
116. The ________ allows users to read or write Avro data as Hive tables. [A]
A) AvroSerde B) HiveSerde C) SqlSerde D) HiveQLSerde
117. We need to store the skill set of employees (which might have multiple values) in the Employee
table. Which of the following is the best way to store this information in Hive? [C]
A) Create a separate table to store the skill set
B) Create a column in the Employee table of MAP data type
C) Create a column in the Employee table of ARRAY data type
D) Storing multiple values in a single column is itself a violation
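As a study aid for question 117: a multi-valued skill set stored in a single ARRAY column; table and column names are assumptions.

```sql
CREATE TABLE employee (
    name   STRING,
    skills ARRAY<STRING>
);
-- Elements are accessed by index.
SELECT name, skills[0] FROM employee;
```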
118. Which of the following data type is supported by Hive? [D]
A) map B) record C) string D) enum
119. Letsfindcourse is generating a huge amount of data: sensor data from different courses, which is
unstructured in form. They moved to the Hadoop framework for storing and analyzing data. Which
technology in the Hadoop framework can they use to analyze this unstructured
data? [A]
A) MapReduce programming B) Hive C) RDBMS D) None of the above
120. What is Hive? [A]
A) An open source data warehouse system B) A relational database C) OLTP D) A language
121. Which of the following command sets the value of a particular configuration variable (key)? [B]
A) set -v B) set <key>=<value> C) set D) reset
122. Point out the wrong statement. [B]
A) source FILE <filepath> executes a script file inside the CLI
B) bfs <bfs command> executes a dfs command from the Hive shell
C) hive is Query language similar to SQL
D) none of the mentioned
123. Which of the following is a command line option? [A]
A) -d, --define <key=value> B) -e, --define <key=value> C) -f, --define <key=value> D) None of the
mentioned
124. _________ is a shell utility which can be used to run Hive queries in either interactive or batch
mode. [C]
A) $HIVE/bin/hive B) $HIVE_HOME/hive C) $HIVE_HOME/bin/hive D) All of the mentioned
125. Which of the following will remove the resource(s) from the distributed cache? [D]
A) delete FILE[S] <filepath>* B) delete JAR[S] <filepath>* C) delete ARCHIVE[S] <filepath>*
D) all of the mentioned

Course Coordinator HOD CSE
