
Question 1.

All of the following accurately describe Hadoop, EXCEPT:

A. Open-source

B. Real-time

C. Java-based

D. Distributed computing approach

Question 2. What are the challenges faced in optimizing Big Data analysis?

A. Big Data is not difficult to optimize

B. Both data and cost-effective ways to mine data to make business sense out of it

C. The technology to mine data

D. All of the above

Question 3. Which of the listed steps is not used in the deployment of a big data solution?

A. Data Ingestion

B. Data Processing

C. Data dissemination

D. Data Storage

Question 4. Which among the following is used to provide multiple inputs to Hadoop?

A. MultipleInputs class

B. MultipleInput Format

C. FileInput Format

D. DBInput Format

Question 5. Big data analysis does all of the following EXCEPT:

A. Spreads data

B. Analyse data

C. Organize data

D. Collect data

Question 6. Which analytics tool lets users create charts and dashboards to share online?

A. Apache Spark

B. Plotly

C. Lumify

D. None

Question 7. Fixed-size pieces of a MapReduce job are known as ____.

A. Splits

B. Tasks

C. Maps

D. Records

Answer: A) Splits. The fixed-size pieces of a MapReduce job are known as splits.
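
To make the idea of fixed-size pieces concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API: the helper `split_ranges` is a hypothetical name, and the 128 MB split size and 300 MB file size are assumed values chosen only for illustration. Real input formats may handle the final piece differently.

```python
def split_ranges(file_size, split_size):
    """Return (offset, length) pairs that cover file_size bytes.

    Hypothetical helper for illustration; not part of Hadoop.
    Every piece has length split_size except possibly the last one.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [
        (offset, min(split_size, file_size - offset))
        for offset in range(0, file_size, split_size)
    ]


MB = 1024 * 1024

# Assumed sizes: a 300 MB input cut into 128 MB pieces.
splits = split_ranges(300 * MB, 128 * MB)

for index, (offset, length) in enumerate(splits):
    print(f"split {index}: offset={offset // MB} MB, length={length // MB} MB")
```

Under these assumptions the input yields three pieces, and each piece could be handed to its own map task.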

Question 8. Identify the slave node among the following.

A. Job node

B. Data node

C. Task node

D. Name node

Question 9. Which of the following is not one of the forms of Big Data?

A. Structured

B. Unstructured

C. Processed

D. Semi-Structured

Question 10. Which among the following has the world’s largest Hadoop cluster?

A. Apple

B. Datamatics

C. Facebook

D. None of the mentioned

Question 11. What are the key and value in TextInputFormat?

A. Key- byte offset Value- It is the contents of the line

B. Key- Everything up to tab character Value- Remaining part of the line after tab character

C. Key and value- Both are user-defined

D. None of the above

Question 12. Identify the framework used for performing remote procedure calls and data
serialization.

A. Drill

B. BigTop

C. Avro

D. Chukwa

Question 13. Which part of MapReduce is responsible for processing one or more chunks of data
and producing the output results?

A. Maptask

B. Mapper

C. Task execution

D. All of the mentioned

Question 14. What license is Apache Hadoop distributed under?

A. Apache License 2.0

B. Shareware

C. Mozilla Public License

D. Commercial

Question 15. Which shell is used to run pig in interactive mode?

A. Grunt

B. FS

C. HDFS

D. None of the mentioned

Question 16. ______ is interpolated into the quotes to correctly handle spaces within the schema.

A. $SCHEMA

B. $ROW

C. $SCHEMASPACES

D. $NAMESPACES

Question 17. Identify the slave/worker node that holds the user data in the form of Data Blocks.

A. Data Block

B. NameNode

C. DataNode

D. Replication

Question 18. Predictive analytics relies on capturing relationships between explanatory variables and
the ___.

A. Predicted variables

B. Descriptive variables

C. Prescriptive variables

D. All of the mentioned above

Answer: A) Predicted variables
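
As a small illustration of capturing the relationship between an explanatory variable and a predicted variable, the sketch below fits a straight line by ordinary least squares in plain Python. The function name `fit_line` and the sample data (hours studied against a score) are assumptions made up for this example; they do not come from the question.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    xs holds the explanatory variable, ys the predicted variable.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample data that follows y = 2x + 1 exactly.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(hours, scores)

# Use the captured relationship to predict an unseen value.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

The fitted slope and intercept are the captured relationship; applying them to a new explanatory value gives the prediction.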

Question 19. Apart from HBaseAdmin, which other important class in this package
provides DDL functionality?

A. HTableDescriptor

B. HDescriptor

C. HTable

D. HTabDescriptor

Question 20. To register a “watch” on a znode's data, you need to use the ___ command to access
the current content or metadata.

A. stat

B. put

C. receive

D. gets

Question 21. Hadoop works in

A. master-worker fashion

B. master – slave fashion

C. worker/slave fashion

D. All of the mentioned

Question 22. Which industries use so-called "Big Data" in their day-to-day operations?

A. Weather forecasting

B. Marketing

C. Healthcare

D. All of the above

Question 23. Which of the following scenarios may not be a good fit for HDFS?

A. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file

B. HDFS is suitable for storing data related to applications requiring low latency data access

C. HDFS is suitable for storing data related to applications requiring low latency data access

D. None of the mentioned

Question 24. ______ was designed to overcome the limitations of the other Hive file formats.

A. ORC

B. OPC

C. ODC

D. None of the mentioned

Question 25. ______ is a general-purpose computing model and runtime system for distributed data
analytics.

A. Mapreduce

B. Drill

C. Oozie

D. None of the above

Question 26. What license is Apache Hadoop distributed under?

A. Commercial

B. Shareware

C. Mozilla Public License

D. Apache License 2.0

Question 27. Point out the correct statement.

A. Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of workload

B. HDFS runs on a small cluster of commodity-class nodes

C. NEWSQL is frequently the collection point for big data

D. None of the mentioned

Question 28. Fault Tolerance in RDD is achieved using which of the following:

A. Immutable nature of RDD

B. DAG (Directed Acyclic Graph)

C. Lazy-evaluation

D. None of the above

Question 29. Which algorithm is not a solution for multiclass classification problems?

A. Naive Bayes

B. Random Forests

C. Logistic Regression

D. Decision Trees

Question 30. Identify the correct statement about RDDs.

A. RDD is a programming paradigm

B. RDD in Apache Spark is an immutable collection of objects

C. It is a database

D. None of the above
