
Question: 1

Where must a Spark configuration be set up first?


Your answer

A Notebook

B Db2 Warehouse

C IBM Cloud

D Watson Data Platform

Question: 2
When sharing a notebook, what will always point to
the most recent version of the notebook?
Your answer

A Watson Studio homepage

B The permalink

C The Spark service

D PixieDust visualization

Question: 3
When creating a Watson Studio project, what do you
need to specify?
Your answer

A Spark service

B Data service

C Collaborators

D Data assets

Question: 4
You can import preinstalled libraries if you are using
which languages? (Select two.)
(Please select ALL that apply)
Your answer

A R

B Python

C Bash

D Rexx

E Scala

Question: 5
Who can control a Watson Studio project's assets?
Your answer

A Viewers

B Editors

C Collaborators

D Tenants

Question: 6
Which environmental variable needs to be set to
properly start ZooKeeper?
Your answer

A ZOOKEEPER_APP

B ZOOKEEPER_DATA

C ZOOKEEPER

D ZOOKEEPER_HOME

Question: 7
Which is the primary advantage of using column-
based data formats over record-based formats?
Your answer

A better compression using GZip

B supports in-memory processing

C facilitates SQL-based queries

D faster query execution

Question: 8

What is the primary purpose of Apache NiFi?


Your answer

A Collect and send data into a stream.

B Finding data across the cluster.

C Connect remote data sources via WiFi.

D Identifying non-compliant data access.

Question: 9
What are three examples of Big Data? (Choose
three.)
(Please select ALL that apply)
Your answer

A cash register receipts

B web server logs

C inventory database records

D bank records

E photos posted on Instagram

F messages tweeted on Twitter

Question: 10
What ZK CLI command is used to list all the ZNodes
at the top level of the ZooKeeper hierarchy, in the
ZooKeeper command-line interface?
Your answer

A get /

B create /

C listquota /

D ls /

Question: 11
What is the default data format Sqoop parses to
export data to a database?
Your answer

A JSON

B CSV

C XML

D SQL

Question: 12
Under the MapReduce v1 architecture, which
function is performed by the TaskTracker?
Your answer

A Keeps the tasks physically close to the data.

B Pushes map and reduce tasks out to DataNodes.

C Manages storage and transmission of intermediate output.

D Accepts MapReduce jobs submitted by clients.

Question: 13

Which statement describes "Big Data" as it is used
in the modern business world?
Your answer

A Indexed databases containing very large volumes of historical data used for com

B Non-conventional methods used by businesses and organizations to capture, ma

C Structured data stores containing very large data sets such as video and audio s

D The summarization of large indexed data stores to provide information about pot
Question: 14
Under the MapReduce v1 architecture, which
function is performed by the JobTracker?
Your answer

A Runs map and reduce tasks.

B Accepts MapReduce jobs submitted by clients.

C Manages storage and transmission of intermediate output.

D Reports status to MasterNode.

Question: 15
Which statement is true about the Hadoop
Distributed File System (HDFS)?
Your answer

A HDFS is a software framework to support computing on large clusters of computers.

B HDFS provides a web-based tool for managing Hadoop clusters.

C HDFS links the disks on multiple nodes into one large file system.

D HDFS is the framework for job scheduling and cluster resource management.

Question: 16
How does MapReduce use ZooKeeper?
Your answer

A Coordination between servers.

B Aid in the high availability of Resource Manager.

C Master server election and discovery.


D Server lease management of nodes.

Question: 17
Which two Spark libraries provide a native shell?
(Choose two.)
(Please select ALL that apply)
Your answer

A Python

B Scala

C C#

D Java

E C++

Question: 18
What is an authentication mechanism in
Hortonworks Data Platform?
Your answer

A IP address

B Preshared keys

C Kerberos

D Hardware token

Question: 19
What is Hortonworks DataPlane Services (DPS) used
for?
Your answer

A Manage, secure, and govern data stored across all storage environments.

B Transform data from CSV format into native HDFS data.

C Perform backup and recovery of data in the Hadoop ecosystem.

D Keep data up to date by periodically refreshing stale data.

Question: 20
What must be done before using Sqoop to import
from a relational database?
Your answer

A Copy any appropriate JDBC driver JAR to $SQOOP_HOME/lib.

B Complete the installation of Apache Accumulo.

C Create a Java class to support the data import.

D Create an empty database for Sqoop to access.

Question: 21

What is the native programming language for Spark?


Your answer

A Scala

B C++

C Java

D Python

Question: 22
Which Hortonworks Data Platform (HDP) component
provides a common web user interface for
applications running on a Hadoop cluster?
Your answer

A YARN

B HDFS

C Ambari

D MapReduce

Question: 23
Which Spark RDD operation returns values after
performing the evaluations?
Your answer

A Transformations

B Actions

C Caching

D Evaluations

Question: 24
Which two are use cases for deploying ZooKeeper?
(Choose two.)
(Please select ALL that apply)
Your answer

A Configuration bootstrapping for new nodes.

B Managing the hardware of cluster nodes.



C Storing local temporary data files.

D Simple data registry between nodes.

Question: 25
In a Hadoop cluster, which two are the result of
adding more nodes to the cluster? (Choose two.)
(Please select ALL that apply)
Your answer

A DataNodes increase capacity while NameNodes increase processing power.

B It adds capacity to the file system.

C Scalability increases by a factor of x^N-1.

D Capacity increases while fault tolerance decreases.

E It increases available processing power.

Question: 26
Which Spark RDD operation creates a directed
acyclic graph through lazy evaluations?
Your answer

A Distribution

B GraphX

C Transformations

D Actions
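As background for the two RDD questions above: transformations are lazy and only build up the computation graph, while an action forces evaluation. A rough plain-Python analogy (this is illustrative generator code, not Spark):

```python
# Plain-Python sketch of lazy transformations vs. an eager action:
# the generator expressions below build a pipeline (the "DAG") but
# compute nothing until list() pulls values through it.

def transform_map(data, fn):
    return (fn(x) for x in data)          # lazy: no work happens yet

def transform_filter(data, pred):
    return (x for x in data if pred(x))   # also lazy

pipeline = transform_filter(
    transform_map(range(10), lambda x: x * x),
    lambda x: x % 2 == 0,
)

result = list(pipeline)  # the "action": evaluation happens only here
```

The analogy is loose (real RDDs are distributed and fault-tolerant), but it captures why a transformation alone returns instantly while an action triggers computation.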

Question: 27
Which feature allows application developers to
easily use the Ambari interface to integrate Hadoop
provisioning, management, and monitoring
capabilities into their own applications?

Your answer

A REST APIs

B Postgres RDBMS

C Ambari Alert Framework

D AMS APIs

Question: 28
What is one disadvantage to using CSV formatted
data in a Hadoop data store?
Your answer

A Columns of data must be separated by a delimiter.

B Fields must be positioned at a fixed offset from the beginning of the record.

C It is difficult to represent complex data structures such as maps.

D Data must be extracted, cleansed, and loaded into the data warehouse.
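As background for option C above, a short illustration of why CSV struggles with complex structures such as maps: a nested record round-trips cleanly through JSON, while CSV can only carry the map as an opaque string (the record layout here is hypothetical):

```python
import csv
import io
import json

# A record containing a map (nested structure)
record = {"id": 1, "tags": {"env": "prod", "tier": "gold"}}

# JSON preserves the nesting on a round trip
back = json.loads(json.dumps(record))

# CSV flattens the map into a single string cell
buf = io.StringIO()
csv.writer(buf).writerow([record["id"], record["tags"]])
row = next(csv.reader(io.StringIO(buf.getvalue())))
```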

Question: 29
Which element of Hadoop is responsible for
spreading data across the cluster?
Your answer

A YARN

B MapReduce

C AMS

D HDFS

Question: 30
Which component of the Apache Ambari
architecture stores the cluster configurations?
Your answer

A Authorization Provider

B Ambari Metrics System

C Postgres RDBMS

D Ambari Alert Framework

Question: 31
Which two are examples of personally identifiable
information (PII)? (Choose two.)
(Please select ALL that apply)
Your answer

A Time of interaction

B Medical record number

C Email address

D IP address

Question: 32
Under the MapReduce v1 architecture, which
element of the system manages the map and reduce
functions?
Your answer

A SlaveNode

B JobTracker

C MasterNode

D StorageNode

E TaskTracker

Question: 33
Which component of the HDFS architecture
manages storage attached to the nodes?
Your answer

A NameNode

B StorageNode

C DataNode

D MasterNode

Question: 34
Which of the "Five V's" of Big Data describes the
real purpose of deriving business insight from Big
Data?
Your answer

A Volume

B Value

C Variety

D Velocity

E Veracity

Question: 35
Which component of the Spark Unified Stack
supports learning algorithms such as logistic
regression, naive Bayes classification, and SVM?

Your answer

A Spark Learning

B Spork

C Spark SQL

D MLlib

Question: 36
Which two descriptions are advantages of Hadoop?
(Choose two.)
(Please select ALL that apply)
Your answer

A able to use inexpensive commodity hardware

B intensive calculations on small amounts of data

C processing a large number of small files



D processing random access transactions

E processing large volumes of data with high throughput

Question: 37
Which two of the following are row-based data
encoding formats? (Choose two.)
(Please select ALL that apply)
Your answer

A CSV

B Avro

C ETL

D Parquet

E RC and ORC

Question: 38
Which statement describes the action performed by
HDFS when data is written to the Hadoop cluster?
Your answer

A The data is spread out and replicated across the cluster.

B The data is replicated to at least 5 different computers.

C The MasterNodes write the data to disk.

D The FsImage is updated with the new data map.

Question: 39
Under the MapReduce v1 architecture, which
element of MapReduce controls job execution on
multiple slaves?
Your answer

A MasterNode

B JobTracker

C SlaveNode

D TaskTracker

E StorageNode

Question: 40
Which component of the Spark Unified Stack
provides processing of data arriving at the system in
real-time?
Your answer

A MLlib

B Spark SQL

C Spark Streaming

D Spark Live

Question: 41
Which two registries are used for compiler and
runtime performance improvements in support of
the Big SQL environment? (Choose two)
(Please select ALL that apply)
Your answer

A DB2ATSENABLE

B DB2FODC

C DB2COMPOPT

D DB2RSHTIMEOUT

E DB2SORTAFTER_TQ

Question: 42
Which script is used to backup and restore the Big
SQL database?
Your answer

A bigsql_bar.py

B db2.sh

C bigsql.sh

D load.py

Question: 43
You need to create a table that is not managed by
the Big SQL database manager. Which keyword
would you use to create the table?
Your answer

A STRING

B BOOLEAN

C SMALLINT

D EXTERNAL

Question: 44
Which two of the following data sources are
currently supported by Big SQL? (Choose two)
(Please select ALL that apply)
Your answer

A Oracle

B PostgreSQL

C Teradata

D MySQL

E MariaDB

Question: 45
Which port is the default for the Big SQL Scheduler
to get administrator commands?
Your answer

A 7055

B 7054

C 7052

D 7053

Question: 46
Which tool should you use to enable Kerberos
security?
Your answer

A Hortonworks

B Ambari

C Apache Ranger

D Hive

Question: 47
Which two options can be used to start and stop Big
SQL? (Choose two)
(Please select ALL that apply)
Your answer

A Scheduler

B DSM Console

C Command line

D Java SQL shell

E Ambari web interface

Question: 48
Which command is used to populate a Big SQL
table?
Your answer

A CREATE

B QUERY

C SET

D LOAD

Question: 49
Which feature allows the bigsql user to securely
access data in Hadoop on behalf of another user?
Your answer

A Impersonation

B Privilege

C Rights

D Schema

Question: 50
Which command would you run to make a remote
table accessible using an alias?
Your answer

A SET AUTHORIZATION

B CREATE SERVER

C CREATE WRAPPER

D CREATE NICKNAME

Question: 51
The Big SQL head node has a set of processes
running. What is the name of the service ID running
these processes?
Your answer

A Db2

B hdfs

C user1

D bigsql

Question: 52

Which file format contains human-readable data
where the column values are separated by a
comma?
Your answer

A Parquet

B ORC

C Delimited

D Sequence

Question: 53
Which Big SQL authentication mode is designed to
provide strong authentication for client/server
applications by using secret-key cryptography?
Your answer

A Public key

B Flat files

C Kerberos

D LDAP

Question: 54
Which type of foundation does Big SQL build on?
Your answer

A Jupyter

B Apache HIVE

C RStudio

D MapReduce

Question: 55
You need to monitor and manage data security
across a Hadoop platform. Which tool would you
use?
Your answer

A SSL

B HDFS

C Hive

D Apache Ranger

Question: 56
What can be used to surround a multi-line string in a
Python code cell by appearing before and after the
multi-line string?
Your answer

A """

B "
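As background for the question above: in Python, triple quotes delimit a string that may span several lines, while a single quote character cannot. A minimal sketch:

```python
# Triple double-quotes surround a multi-line string in Python.
message = """first line
second line"""

lines = message.split("\n")  # the embedded newline is preserved
```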

Question: 57
For what are interactive notebooks used by data
scientists?
Your answer

A Packaging data for public distribution on a website.

B Quick data exploration tasks that can be reproduced.

C Providing a chain of custody of all data.

D Bulk loading data into a database.

Question: 58
What Python statement is used to add a library to
the current code cell?
Your answer

A pull

B import

C load

D using
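As background: Python's import statement is what brings a library into the current code cell. A minimal example using the standard-library math module:

```python
import math  # the import statement makes the library available

area = math.pi * 2 ** 2  # use the imported module right away
```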

Question: 59
What Python package has support for linear algebra,
optimization, mathematical integration, and
statistics?
Your answer

A NLTK

B Pandas

C NumPy

D SciPy

Question: 60
Which three main areas make up Data Science
according to Drew Conway? (Choose three.)
(Please select ALL that apply)
Your answer

A Traditional research

B Machine learning

C Substantive expertise

D Math and statistics knowledge

E Hacking skills
1- Select all the components of HDP which provide data access capabilities

 Pig
 Sqoop
 Flume
 MapReduce
 Hive

2- Select the components that provide the capability to move data from a relational database
into Hadoop.

 Sql
 Sqoop
 Hive
 Kaa
 Flume

3- Managing Hadoop clusters can be accomplished using which component?

 Ambari
 HBase
 Phoenix
 Hive
 Sqoop

4- True or False: The following components are value-add from IBM: Big Replicate, Big SQL,
BigIntegrate, BigQuality, Big Match

 TRUE
 FALSE

5- True or False: Data Science capabilies can be achieved using only HDP.

 TRUE
 FALSE
6- True or False: Ambari is backed by RESTful APIs for developers to easily integrate with
their own applications.

 TRUE
 FALSE

7- Which Hadoop funconalies does Ambari provide?

 None of the above


 All of the above
 Monitor
 Manage
 Provision
 Integrate

8- Which page from the Ambari UI allows you to check the versions of the software installed
on your cluster?

 Monitor page
 Integrate page
 The Admin > Manage Ambari page
 The Admin > Provision page

9- True or False?Creang users through the Ambari UI will also create the user on the HDFS.

 TRUE
 FALSE

10- True or False? You can use cURL commands to issue commands to Ambari.

 TRUE
 FALSE

11- True or False: Hadoop systems are designed for transaction processing.

 TRUE
 FALSE
12- What is the default number of replicas in a Hadoop system?

 5
 4
 3
 2

13- True or False: One of the driving principles of Hadoop is that the data is brought to the
program.

 TRUE
 FALSE

14- True or False: At least 2 NameNodes are required for a standalone Hadoop cluster.

 TRUE
 FALSE

15- True or False: The phases in a MR job are Map, Shuffle, Reduce, and Combiner

 TRUE
 FALSE

16- Centralized handling of job control flow is one of the limitations of MR v1.

 TRUE
 FALSE

16- The Job Tracker in MR1 is replaced by which component(s) in YARN?

 ResourceMaster
 ApplicaonMaster
 ApplicaonManager
 ResourceManager

17- What are the benets of using Spark?


(Please select the THREE that apply)

 Generality
 Versatility
 Speed
 Ease of use
18- What are the languages supported by Spark?
(Please select the THREE that apply)

 Javascript
 HTML

 Python
 Java
 Scala

19- Resilient Distributed Dataset (RDD) is the primary abstraction of Spark.

 TRUE
 FALSE

20- What would you need to do in a Spark application that you would not need to do in a
Spark shell to start using Spark?


 Extract the necessary libraries to load the SparkContext
 Export the necessary libraries to load the SparkContext
 Delete the necessary libraries to load the SparkContext
 Import the necessary libraries to load the SparkContext

21- True or False: NoSQL database is designed for those that do not want to use SQL.

 TRUE
 FALSE

22- Which database is a columnar storage database?

 SQL
 Hive
 HBase

23- Which database provides a SQL for Hadoop interface?

 Hive
 Hadoop
 HBase
24- Which Apache project provides coordinaon of resources?

 Streams
 Spark
 Zeppelin

 ZooKeeper

25- What is ZooKeeper's role in the Hadoop infrastructure?

 Manage the coordinaon between HBase servers


 None of the above
 Hadoop and MapReduce uses ZooKeeper to aid in high availability of Resource
Manager
 All of the above
 Flume uses ZooKeeper for conguraon purposes in recent releases

26- True or False: Slider provides an intuive UI which allows you to dynamically allocate
YARN resources.

 TRUE
 FALSE

27- True or False: Knox can provide all the security you need within your Hadoop
infrastructure.

 TRUE
 FALSE

28- True or False: Sqoop is used to transfer data between Hadoop and relational databases.

 TRUE
 FALSE

29- True or False: For Sqoop to connect to a relational database, the JDBC JAR files for that
database must be located in $SQOOP_HOME/bin.

 TRUE
 FALSE
30- True or False: Each Flume node receives data as "source", stores it in a "channel", and
sends it via a "sink".

 TRUE
 FALSE

31- Through what HDP component are Kerberos, Knox, and Ranger managed?

 Zookeeper
 Ambari
 Apache Knox

32- Which security component is used to provide peripheral security?

 Apache Ranger
 Apache Camel
 Apache Knox

33- One of the governance issues that Hortonworks DataPlane Service (DPS) addresses is
visibility over all of an organization's data across all of their environments — on-prem,
cloud, hybrid — while making it easy to maintain consistent security and governance.

 TRUE
 FALSE

34- True or False: The typical sources of streaming data are sensors, "data exhaust", and
high-rate transaction data.

 TRUE
 FALSE

35- What are the components of Hortonworks DataFlow (HDF)?

 Flow management
 Stream processing
 All of the above
 None of the above
 Enterprise services
36- True or False: NiFi is a disk-based, microbatch ETL tool that provides flow management

 TRUE

 FALSE

37- True or False: MiNiFi is a complementary data collection tool that feeds collected data to
NiFi

 TRUE
 FALSE

38- What main features does IBM Streams provide as a Streaming Data Platform?
(Please select the THREE that apply)

 Flow management
 Analysis and visualization
 Sensors
 Rich data connections
 Development support

39- What are the three types of Big Data?


(Please select the THREE that apply)

 Natural Language
 Semi-structured
 Graph-based
 Structured
 Machine-Generated
 Unstructured
40- What are the 4Vs of Big Data?
(Please select the FOUR that apply)

 Veracity
 Velocity
 Variety
 Value
 Volume
 Visualization

41- What are the most important computer languages for Data Analytics?
(Please select the THREE that apply)

 Scala
 HTML
 R

 SQL
 Python

42- True or False: GPUs are special-purpose processors that traditionally can be used to
power graphical displays, but for Data Analytics lend themselves to faster algorithm
execution because of the large number of independent processing cores.

 TRUE
 FALSE

43- True or False: Jupyter stores its workbooks in files with the .ipynb suffix. These files
cannot be stored locally or on a hub server.

 TRUE
 FALSE

44- True or False: The $BIGSQL_HOME/bin/bigsql start command is used to start Big SQL from the command line.

 TRUE
 FALSE
45- What are the two ways you can work with Big SQL?
(Please select the TWO that apply)

 JQuery
 R

 JSqsh
 Web tooling from DSM

46- What is one of the reasons to use Big SQL?

 Want to access your Hadoop data without using MapReduce


 You want to learn new languages like MapReduce
 Has a steep learning curve because Big SQL uses the standard 2011 query structure

47- Should you use the default STRING data type?

 Yes
 No

48- The BOOLEAN type is dened as SMALLINT SQL type in Big SQL.

 TRUE
 FALSE

49- Using the LOAD operaon is the recommended


rec ommended method for geng data into your Big
SQL table for best performance.


 TRUE
 FALSE

50- Which le storage format has the highest performance?

 Delimited
 Sequence
 RC
 Parquet
 Avro
51- What are the two ways to classify funcons?

 Built-in funcons
 Scalar funcons

User-dened funcons
 None of the above

52- True or False: UMASK is used to determine permissions on directories and files.

 TRUE
 FALSE
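As background for the umask question: the bits set in the umask are removed from the base creation mode (conventionally 0o777 for directories, 0o666 for files). A minimal Python sketch of the arithmetic:

```python
# umask arithmetic: bits set in the umask are cleared from the base mode.
base_dir, base_file = 0o777, 0o666
umask = 0o022                    # a common default umask

dir_mode = base_dir & ~umask     # directories default to 0o755
file_mode = base_file & ~umask   # files default to 0o644
```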

53- True or False: You can only Kerberize a Big SQL server before it is installed.

 TRUE
 FALSE

54- True or False: Authencaon with Big SQL only occurs at the Big SQL layer or the client's
applicaon layer.

 TRUE
 FALSE

55- True or False: Ranger and impersonaon works well together.

 TRUE
 FALSE

56- True or False: RCAC can hide rows and columns.

 TRUE
 FALSE

57- True or False: Nicknames can be used for wrappers and servers.

 TRUE
 FALSE
58- True or False: Server objects denes the property and values of the connecon.

 TRUE
 FALSE

59- True or False: The purpose of a wrapper is to provide a library of routines that doesn't
communicate with the data source.

 TRUE
 FALSE

60- True or False: User mappings are used to authencate to the remote data source.

 TRUE
 FALSE

61- True or False: Collaboraon with Watson Studio is an oponal add-on component that
must be purchased.
 TRUE
 FALSE

62- True or False: Watson Studio is designed only for Data Sciensts, other personas would
not know how to use it.

 TRUE
 FALSE

63- True or False: Community provides access to arcles, tutorials, and even data sets that
you can use.

 TRUE
 FALSE

64- True or False: You can import visualizaon libraries into Watson Studio.

 TRUE
 FALSE
65- True or False: Collaborators can be given certain access levels.

 TRUE
 FALSE

66- True or False: Watson Studio contains Zeppelin as a notebook interface.

 TRUE
 FALSE

67- Spark is developed in which language?

 Java
 Scala
 Python
 R

68- In Spark Streaming, the data can come from which sources?
 Kafka
 Flume
 Kinesis
 All of the above

69- Apache Spark has APIs in

 Java
 Scala
 Python
 All of the above

70- Which of the following is not a component of Spark Ecosystem?


 Sqoop
 GraphX
 MLlib
 BlinkDB
1- Which is an advantage that Zeppelin holds over Jupyter?
Your answer

A. Notebooks can be used by multiple people at the same time.

B. Notebooks can be connected to big data engines such as Spark.

C. Users must authenticate before using a notebook.

D. Zeppelin is able to use the R language.

2- Why might a data scientist need a particular kind of GPU (graphics processing unit)?
Your answer

A. To collect video for use in streaming data applications.

B. To display a simple bar chart of data on the screen.

C. To perform certain data transformations quickly.

D. To input commands to a data science notebook.

73- What command is used to list the "magic" commands in Jupyter?


Your answer

A. %list-all-magic

B. %dirmagic

C. %list-magic

D. %lsmagic
74- What is the first step in a data science pipeline?
Your answer

A. Exploration

B. Acquisition

C. Manipulation

D. Analytics

75- What is a markdown cell used for in a data science notebook?


Your answer

A. Documenting the computational process.

B. Configuring data connections.

C. Holding the output of a computation.

D. Writing code to transform data.

76- You have a distributed file system (DFS) and need to set permissions on
the /hive/warehouse directory to allow access to ONLY the bigsql user.
Which command would you run?
Your answer

A. hdfs dfs -chmod 700 /hive/warehouse

B. hdfs dfs -chmod 755 /hive/warehouse

C. hdfs dfs -chmod 770 /hive/warehouse

D. hdfs dfs -chmod 666 /hive/warehouse
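As background on why a mode of 700 restricts a directory to its owner: decoding the octal value with Python's standard stat module shows owner read/write/execute and no group or other bits:

```python
import stat

mode = 0o700
owner = bool(mode & stat.S_IRWXU)  # owner rwx bits are set
group = bool(mode & stat.S_IRWXG)  # group bits are clear
other = bool(mode & stat.S_IRWXO)  # other bits are clear

symbolic = stat.filemode(mode)     # e.g. ends with "rwx------"
```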
77- You need to determine the permission setting for a new schema directory.
Which tool would you use?
Your answer

A. umask

B. HDFS

C. Kerberos

D. GRANT

78- Which definition best describes RCAC?


Your answer

A. It grants or revokes certain user privileges.

B. It limits access by using views and stored procedures.

C. It grants or revokes certain directory privileges.

D. It limits the rows or columns returned based on certain criteria.

79- How many Big SQL management nodes do you need at minimum?
Your answer

A. 4

B. 1

C. 3

D. 2
80- Which directory permissions need to be set to allow all users to create
their own schema?
Your answer

A. 755

B. 666

C. 777

D. 700

81- Which command creates a user-defined schema function?


Your answer

A. CREATE FUNCTION
B. ALTER MODULE ADD FUNCTION

C. TRANSLATE FUNCTION

D. ALTER MODULE PUBLISH FUNCTION

82- What are Big SQL database tables organized into?


Your answer

A. Directories

B. Schemas

C. Hives

D. Files
83- Which Big SQL feature allows users to join a Hadoop data set to data in
external databases?
Your answer

A. Fluid query

B. Impersonation

C. Integration

D. Grant/Revoke privileges

84- Which two commands would you use to give or remove certain privileges
to/from a user?
Your answer

A. INSERT

B. GRANT

C. SELECT

D. REVOKE

E. LOAD
85- What is an advantage of the ORC file format?
Your answer

A. Efficient compression

B. Supported by multiple I/O engines

C. Data interchange outside Hadoop

D. Big SQL can exploit advanced features

86- You are creating a new table and need to format it with parquet. Which
partial SQL statement would create the table in parquet format?
Your answer

A. CREATE AS parquetfile

B. CREATE AS parquet

C. STORED AS parquetfile

D. STORED AS parquet
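As background, a hedged sketch of Hive/Big SQL-style DDL using a STORED AS clause; the table and column names here are hypothetical, not from the source:

```sql
-- Hypothetical Big SQL / Hive-style DDL; the STORED AS clause selects the format.
CREATE HADOOP TABLE sales_parquet (
    id INT,
    amount DOUBLE
)
STORED AS parquet;
```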

87- Which statement best describes a Big SQL database table?


Your answer

A. The defined format and rules around a delimited file.

B. A directory with zero or more data files.

C. A container for any record format.

D. A data type of a column describing its value.

88- You need to enable impersonation. Which two properties in the
bigsql-conf.xml file need to be marked true?
Your answer

A. bigsql.alltables.io.doAs

B. bigsql.impersonation.create.table.grant.public

C. DB2_ATS_ENABLE

D. DB2COMPOPT

E. $BIGSQL_HOME/conf

89- Using the Java SQL Shell, which command will connect to a database
called mybigdata?
Your answer

A. ./java tables

B. ./jsqsh mybigdata

C. ./java mybigdata

D. ./jsqsh go mybigdata

90- When connecting to an external database in a federation, you need to use
the correct database driver and protocol. What is this federation component
called in Big SQL?
Your answer
A. Wrapper

B. Data source

C. Nickname

D. User mapping

91- What are two primary limitations of MapReduce v1?


Your answer

A. Scalability

B. Workloads limited to MapReduce

C. Resource utilization

D. TaskTrackers can be a bottleneck to MapReduce jobs

E. Number of TaskTrackers limited to 1,000

92- Which feature makes Apache Spark much easier to use than MapReduce?
Your answer

A. Suitable for transaction processing.

B. Libraries that support SQL queries.

C. Applications run in-memory.

D. APIs for Scala, Python, C++, and .NET.

93- What is an example of a NoSQL datastore of the "Document Store" type?


Your answer

A. Cassandra

B. REDIS

C. HBase
D. MongoDB

94- Which Apache Hadoop application provides a high-level programming
language for data transformation on unstructured data?
Your answer

A. Zookeeper

B. Pig

C. Hive

D. Sqoop

95- Under the MapReduce v1 programming model, which shows the proper
order of the full set of MapReduce phases?
Your answer

A. Map -> Split -> Reduce -> Combine

B. Map -> Combine -> Reduce -> Shuffle

C. Map -> Combine -> Shuffle -> Reduce

D. Split -> Map -> Combine -> Reduce
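As background, the full MapReduce v1 phase sequence (Split, Map, Combine, Shuffle, Reduce) can be sketched in plain Python as a tiny word count; this is an illustrative analogy, not Hadoop code:

```python
from collections import defaultdict

# Split: the input is divided into independent chunks
splits = [["the", "quick", "fox"], ["the", "fox"]]

# Map: each word becomes a (word, 1) pair
mapped = [[(w, 1) for w in chunk] for chunk in splits]

# Combine: optional per-chunk local aggregation
combined = []
for chunk in mapped:
    local = defaultdict(int)
    for w, n in chunk:
        local[w] += n
    combined.append(local)

# Shuffle: group values by key across chunks
shuffled = defaultdict(list)
for local in combined:
    for w, n in local.items():
        shuffled[w].append(n)

# Reduce: final aggregation per key
counts = {w: sum(ns) for w, ns in shuffled.items()}
```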

96- Which three programming languages are directly supported by Apache
Spark?
Your answer

A. C#

B. C++

C. Java

D. .NET
E. Python

F. Scala

97- Which statement accurately describes how ZooKeeper works?


Your answer

A. All servers keep a copy of the shared data in memory.

B. Clients connect to multiple servers at the same time.

C. There can be more than one leader server at a time.

D. Writes to a leader server will always succeed.

98- Which computing technology provides Hadoop's high performance?


Your answer

A. Online Transactional Processing

B. Parallel Processing

C. Online Analytical Processing

D. RAID-0

99- Which two factors in a Hadoop cluster increase performance most
significantly?
Your answer

A. immediate failover of failed disks

B. parallel reading of large data files

C. data redundancy on management nodes

D. solid state disks

E. high-speed networking between nodes

F. large number of small data files

100- Which component of the Apache Ambari architecture provides statistical
data to the dashboard about the performance of a Hadoop cluster?
Your answer

A. Ambari Alert Framework

B. Ambari Wizard

C. Ambari Metrics System

D. Ambari Server

101- Apache Spark can run on which two of the following cluster managers?
Your answer

A. oneSIS

B. Linux Cluster Manager

C. Nomad

D. Apache Mesos

E. Hadoop YARN

102- Under the MapReduce v1 programming model, which optional phase is
executed simultaneously with the Shuffle phase?
Your answer

A. Map

B. Combiner

C. Reduce
D. Split

103- Which hardware feature on a Hadoop DataNode is recommended for
cost-efficient performance?
Your answer

A. JBOD

B. RAID

C. LVM

D. SSD

104- What is the name of the Hadoop-related Apache project that utilizes an in-
memory architecture to run applications faster than MapReduce?
Your answer

A. Spark

B. Python

C. Pig

D. Hive

105- Which statement is true about MapReduce v1 APIs?


Your answer

A. MapReduce v1 APIs cannot be used with YARN.

B. MapReduce v1 APIs are implemented by applications which are largely
independent of the execution environment.

C. MapReduce v1 APIs define how MapReduce jobs are executed.

D. MapReduce v1 APIs provide a flexible execution environment to run
MapReduce.

106- Hadoop uses which two Google technologies as its foundation?


Your answer

A. YARN

B. Google File System

C. Ambari

D. HBase

E. MapReduce

107- What are two common issues in distributed systems?


Your answer

A. Reduced performance when compared to a single server.

B. Partial failure of the nodes during execution.

C. Finding a particular node within the cluster.

D. Distributed systems are harder to scale up.

108- Which statement about Apache Spark is true?


Your answer

A. It runs on Hadoop clusters with RAM drives configured on each
DataNode.

B. It is much faster than MapReduce for complex applications on disk.

C. It supports HDFS, MS-SQL, and Oracle.

D. It features APIs for C++ and .NET.
109- Which two are valid watches for ZNodes in ZooKeeper?
Your answer
A. NodeRefreshed

B. NodeExpired

C. NodeChildrenChanged

D. NodeDeleted

110- Which Apache Hadoop component can potentially replace an RDBMS as a
large Hadoop datastore and is particularly good for "sparse data"?
Your answer

A. Spark

B. Ambari

C. HBase

D. MapReduce

111- Which statement describes an example of an application using streaming
data?
Your answer

A. An application evaluating sensor data in real-time.

B. One-time export and import of a database.

C. A web application that supports 10,000 users.

D. A system that stores many records in a database.
112- Which component of the Hortonworks Data Platform (HDP) is the
architectural center of Hadoop and provides resource management and a
central platform for Hadoop applications?
Your answer

A. HBase

B. HDFS

C. YARN

D. MapReduce

113- Which three are a part of the Five Pillars of Security?


Your answer

A. Administration

B. Audit

C. Resiliency

D. Speed

E. Data Protection

114- How can a Sqoop invocation be constrained to only run one mapper?
Your answer

A. Use the --limit mapper=1 parameter.

B. Use the -mapper 1 parameter.

C. Use the --single parameter.

D. Use the -m 1 parameter.


115- Under the YARN/MRv2 framework, which daemon is tasked with
negotiating with the NodeManager(s) to execute and monitor tasks?
Your answer

A. TaskManager

B. JobMaster

C. ResourceManager

D. ApplicationMaster

116- Apache Spark provides a single, unifying platform for which three of the
following types of operations?
Your answer

A. ACID transactions
B. graph operations

C. record locking

D. batch processing

E. machine learning

F. transaction processing

117- Which Apache Hadoop application provides an SQL-like interface to allow
abstraction of data on semi-structured data in a Hadoop datastore?
Your answer

A. Spark

B. Pig

C. YARN

D. Hive
118- Under the MapReduce v1 programming model, what happens in a
"Reduce" step?
Your answer

A. Data is aggregated by worker nodes.

B. Worker nodes process pieces in parallel.

C. Worker nodes store results on their own local file systems.

D. Input is split into pieces.
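The answer choices above describe the phases of the MapReduce v1 model. A toy pure-Python word count (an illustrative sketch, not the Hadoop API) shows how the Map, Shuffle, and Reduce steps fit together:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: worker nodes process input splits in parallel, emitting (word, 1) pairs.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all intermediate values by key before the Reduce step.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values per key.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big hadoop"])))
# counts == {"big": 2, "data": 1, "hadoop": 1}
```

The "Reduce" step is the final aggregation; splitting the input and parallel per-split processing happen in the Map phase.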

119- What are two security features Apache Ranger provides?


Your answer

A. Auditing

B. Authentication

C. Authorization

D. Availability

120- Under the YARN/MRv2 framework, the JobTracker functions are split into
which two daemons?
Your answer

A. JobMaster

B. TaskManager

C. ApplicationMaster

D. ScheduleManager

E. ResourceManager
121- Under the YARN/MRv2 framework, which daemon arbitrates the execution
of tasks among all the applications in the system?
Your answer

A. ApplicationMaster

B. JobMaster

C. ScheduleManager

D. ResourceManager

122- Which description characterizes a function provided by Apache Ambari?


Your answer

A. A wizard for installing Hadoop services on host servers.

B. Moves large amounts of streaming event data.

C. Moves information to/from structured databases.

D. A messaging system for real-time data pipelines.

123- What is the preferred replacement for Flume?


Your answer

A. Druid

B. Hortonworks Data Flow

C. NiFi

D. Storm
124- If a Hadoop node goes down, which Ambari component will notify the
Administrator?
Your answer

A. Ambari Metrics System

B. Ambari Alert Framework

C. Ambari Wizard

D. REST API

125- Hadoop 2 consists of which three open-source sub-projects maintained
by the Apache Software Foundation?
Your answer

A. HDFS

B. Hive

C. YARN

D. MapReduce

E. Big SQL

F. Cloudbreak

126- What is the architecture of Watson Studio centered on?

Your answer
A. Projects

B. Data Assets

C. Analytic Assets

D. Collaborators
127- Which type of cell can be used to document and comment on a process in
a Jupyter notebook?
Your answer

A. Markdown

B. Code

C. Kernel

D. Output

128- Which Watson Studio offering used to be available through something
known as IBM Bluemix?
Your answer

A. Watson Studio Business

B. Watson Studio Local

C. Watson Studio Cloud

D. Watson Studio Desktop

129- Where does the unstructured data of a project reside in Watson Studio?
Your answer

A. Wrapper

B. Database

C. Object Storage

D. Tables
130- Before you create a Jupyter notebook in Watson Studio, which two items
are necessary?
Your answer

A. Spark Instance

B. Scala

C. Project

D. File

E. URL

130- The basic abstracon of Spark Streaming is


Dstream
 RDD
 Shared Variable
 None of the above

131- Which of the following algorithms is not present in MLlib?

 Streaming Linear Regression


 Streaming KMeans
 Tanimoto distance
 None of the above

132- A DStream internally is

 Continuous Stream of RDDs
 Continuous Stream of DataFrames
 Continuous Stream of DataSets
 None of the above

133- Can we add or set up new string computation after SparkContext starts?


Yes
 No
134- Which of the following is not a feature of Spark?

 Supports in-memory computation
 Fault-tolerance
 It is cost efficient
 Compatible with other file storage systems

135- Which is the abstracon of Apache Spark?


 Shared Variable
 RDD
 Both the above

136- What are the parameters defined to specify a window operation?

Window length, sliding interval
State size, window length
State size, sliding interval

None of the above
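The two window parameters in the question above can be simulated over a list of per-batch values in plain Python (not the Spark Streaming API; the batch values are invented for illustration, and both parameters are expressed in numbers of batches):

```python
def windowed_sums(batches, window_length, sliding_interval):
    # batches: one aggregated value per batch interval.
    # window_length: how many batches each window covers.
    # sliding_interval: how many batches the window advances each step.
    results = []
    for end in range(window_length, len(batches) + 1, sliding_interval):
        results.append(sum(batches[end - window_length:end]))
    return results

# Window of 3 batches, sliding every 2 batches:
result = windowed_sums([1, 2, 3, 4, 5, 6], window_length=3, sliding_interval=2)
# result == [6, 12]  (1+2+3, then 3+4+5)
```

Note that consecutive windows overlap whenever the sliding interval is smaller than the window length, which is exactly why both parameters are needed.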

137- Which of the following is not an output operation on DStreams?


SaveAsTextFiles
ForeachRDD
SaveAsHadoopFiles
ReduceByKeyAndWindow

138- Dataset was introduced in which Spark release?


Spark 1.6
Spark 1.4.0
Spark 2.1.0
Spark 1.1

139- Which cluster managers does Spark support?


Standalone Cluster Manager
MESOS
YARN
All of the above

140- The default storage level of cache() is?

MEMORY_ONLY
MEMORY_AND_DISK
DISK_ONLY
MEMORY_ONLY_SER

141- Which is not a component on top of Spark Core?


Spark RDD
Spark Streaming

MLlib
None of the above

142- Apache Spark was made open-source in which year?


2010
2011
2008
2009

143- In addion to stream processing jobs, what all funconality do Spark provides?
Machine learning
Graph processing
Batch processing
All of the above

144- Is Spark included in every major distribution of Hadoop?


Yes
No

145- Which of the following is not true for Hadoop and Spark?
Both are data processing platforms
Both are cluster computing environments
Both have their own file system
Both use open source APIs to link between different tools

146- How much faster can Apache Spark potentially run batch-processing programs when
processed in memory than MapReduce can?
10 times faster
20 times faster
100 times faster
200 times faster
147- Which of the following provides Spark Core's fast scheduling capability to perform
streaming analytics?
RDD
GraphX
Spark Streaming
Spark R

148- Which of the following is the reason for Spark being speedier than MapReduce?
DAG execution engine and in-memory computation
Support for different language APIs like Scala, Java, Python and R
RDDs are immutable and fault-tolerant
None of the above

149- Can you combine the libraries of Apache Spark into the same application, for example,
MLlib, GraphX, SQL and DataFrames, etc.?
Yes
No

150- Which of the following is true for RDD?

RDD is a programming paradigm
RDD in Apache Spark is an immutable collection of objects
It is a database
None of the above

151- Which of the following is not a funcon of Spark Context in Apache Spark?
Entry point to Spark SQL
To Access various services

To
To set
get the
the conguraon
current status of Spark Applicaon

152- What are the features of Spark RDD?

In-memory computation
Lazy evaluations
Fault tolerance
All of the above
All of the above

153- How many SparkContexts can be active per JVM?

More than one
Only one
Not specific
None of the above
None of the above
154- In how many ways can an RDD be created?
4
3
2
1

155- How many tasks does Spark run on each partition?

Any number of tasks
One
More than one, less than five

156- Can we edit the data of an RDD, for example, the case conversion?
Yes
No

157- Which of the following is not a transformaon?


Flatmap
Map
Reduce
Filter

158- Which of the following is not an acon?


collect()
take(n)
top()
map
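The transformation/action split in the two questions above comes down to laziness: transformations only record what to do, and nothing runs until an action such as collect() is called. A minimal pure-Python sketch (the class name ToyRDD is invented for illustration, not Spark's API):

```python
class ToyRDD:
    """Minimal lazy pipeline: transformations build a plan, actions run it."""

    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []          # recorded transformations, not yet run

    def map(self, f):                 # transformation: returns a new ToyRDD
        return ToyRDD(self.data, self.ops + [("map", f)])

    def filter(self, pred):           # transformation: returns a new ToyRDD
        return ToyRDD(self.data, self.ops + [("filter", pred)])

    def collect(self):                # action: finally evaluates the plan
        out = list(self.data)
        for kind, f in self.ops:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

rdd = ToyRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
# Nothing has been computed yet; collect() triggers evaluation:
result = rdd.collect()
# result == [20, 30, 40]
```

This is why map, flatMap, and filter are transformations (they return a new RDD lazily), while collect(), take(n), and top() are actions (they return a result to the driver).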

159- Does Spark R make use of MLlib in any aspect?


Yes
No

160- You can connect an R program to a Spark cluster from -


RStudio
R Shell
Rscript
All of the above

161- For Mulclass classicaon problem which algorithm is not the soluon?
Naive Bayes
Random Forests
Logisc Regression
Decision Trees

162- For Regression problem which algorithm is not the soluon?


Logisc Regression
Ridge Regression
Decision Trees
Gradient-Boosted Trees

163- Which of the following is true about DataFrame?


DataFrames provide a more user-friendly API than RDDs.
DataFrame API have provision for compile-me type safety
Both the above

164- Which of the following is a tool of the Machine Learning Library?

Persistence
Utilities like linear algebra, statistics
Pipelines
All of the above

165- Is MLlib deprecated?


Yes
No

166- Which of the following is false for Apache Spark?

It provides high-level APIs in Java, Python, R, Scala
It can be integrated with Hadoop and can process existing Hadoop HDFS data
Spark is an open source framework which is written in Java
Spark is 100 times faster than Big Data Hadoop

167- Which of the following is true for Spark SQL?

It is the kernel of Spark
Provides an execution platform for all the Spark applications
It enables users to run SQL / HQL queries on top of Spark.
Enables powerful interactive and data analytics applications across live streaming
data

168- Which of the following is true for Spark Core?

It is the kernel of Spark
It enables users to run SQL / HQL queries on top of Spark.
It is the scalable machine learning library which delivers efficiencies
Improves the performance of iterative algorithms drastically.

169- Which of the following is true for Spark R?
It allows data scientists to analyze large datasets and interactively run jobs
It is the kernel of Spark
It is the scalable machine learning library which delivers efficiencies
It enables users to run SQL / HQL queries on top of Spark.

170- Which of the following is true for Spark MLlib?

Provides an execution platform for all the Spark applications
It is the scalable machine learning library which delivers efficiencies
Enables powerful interactive and data analytics applications across live streaming
data
All of the above

171- Which of the following is true for Spark Shell?

It helps Spark applications to easily run on the command line of the system
It runs/tests application code interactively
It allows reading from many types of data sources
All of the above

172- Which of the following is true for RDD?

We can operate on Spark RDDs in parallel with a low-level API
RDDs are similar to the table in a relational database
It allows processing of a large amount of structured data
It has a built-in optimization engine

173- RDDs are fault-tolerant and immutable


True
False

174- In which of the following cases do we keep the data in-memory?

Iterative algorithms
Interactive data mining tools
Both the above

175- When does Apache Spark evaluate an RDD?

Upon action
Upon transformation
On both transformation and action

176- The read operation on RDD is

Fine-grained
Coarse-grained
Either fine-grained or coarse-grained
Neither fine-grained nor coarse-grained

177- The write operation on RDD is

Fine-grained
Coarse-grained
Either fine-grained or coarse-grained
Neither fine-grained nor coarse-grained

178- Is it possible to mitigate stragglers in RDD?

Yes
No

179- Fault tolerance in RDD is achieved using

Immutable nature of RDD
DAG (Directed Acyclic Graph)
Lazy-evaluation
None of the above

180- What is a transformation in Spark RDD?

Takes an RDD as input and produces one or more RDDs as output.
Returns the final result of RDD computations.
The ways to send results from executors to the driver
None of the above

181- What is an action in Spark RDD?

The ways to send results from executors to the driver
Takes an RDD as input and produces one or more RDDs as output.
Creates one or many new RDDs
All of the above

182- Which of the following is true about narrow transformations -

The data required to compute resides on multiple partitions.
The data required to compute resides on a single partition.
Both the above

183- Which of the following is true about wide transformations -

The data required to compute resides on multiple partitions.
The data required to compute resides on a single partition.
Neither of the above
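The narrow/wide distinction in the questions above can be sketched with explicit partitions: a narrow transformation works within each partition on its own, while a wide one needs records from several partitions (a shuffle). Pure-Python illustration, not Spark itself; the partition contents are made-up numbers:

```python
from collections import defaultdict

partitions = [[1, 2], [3, 4], [5, 6]]

# Narrow transformation (e.g. map): each output partition depends on
# exactly one input partition -- no data movement is needed.
mapped = [[x * 2 for x in part] for part in partitions]

# Wide transformation (e.g. groupByKey on x % 3): one output group may
# need records that live on multiple input partitions -> a shuffle.
shuffled = defaultdict(list)
for part in mapped:
    for x in part:
        shuffled[x % 3].append(x)

# mapped == [[2, 4], [6, 8], [10, 12]]
# shuffled[0] == [6, 12]  (came from two different input partitions)
```

That cross-partition dependency is the reason wide transformations are the expensive ones in a Spark job.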
184- When we want to work with the actual dataset, at that point we use Transformation?
True
False

185- The shortcomings of Hadoop MapReduce were overcome by Spark RDD through

Lazy-evaluation
DAG
In-memory processing
All of the above

186- What does the Spark Engine do?

Scheduling
Distributing data across a cluster
Monitoring data across a cluster
All of the above

187- Caching is an optimization technique
True
False

188- Which of the following is the entry point of Spark Applicaon -


SparkSession
SparkContext
None of the both

189- SparkContext guides how to access the Spark cluster.


True
False

190- Which of the following is the entry point of Spark SQL?


SparkSession
SparkContext

191- Which of the following is open-source?


Apache Spark
Apache Hadoop
Apache Flink
All of the above
192- Apache Spark supports -
Batch processing
Stream processing
Graph processing
All of the above

193- Which of the following is not true for the map() operation?

Map transforms an RDD of length N into another RDD of length N.
In the Map operation the developer can define his own custom business logic.
It applies to each element of RDD and it returns the result as a new RDD
Map allows returning 0, 1 or more elements from the map function.

194- FlatMap transforms an RDD of length N into another RDD of length M. Which of the
following is true for N and M?

a. N>M

b. N<M

c. N<=M

Either a or b
Either b or c
Either a or c
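The N-vs-M relationship above is easy to check directly with pure-Python stand-ins for the Spark operations: map is always length-preserving, while flatMap lets each input element yield zero, one, or many outputs, so M can be smaller than, equal to, or larger than N:

```python
data = ["big data platform", "", "hello"]   # N == 3 input elements

# map: exactly one output element per input element, so length stays N.
mapped = [len(s) for s in data]             # 3 elements

# flatMap: each input can yield 0, 1 or more elements, so the output
# length M can be <, ==, or > N depending on the data.
flat_mapped = [word for s in data for word in s.split()]

# len(mapped) == 3, len(flat_mapped) == 4 here
# ("" yields 0 words, "big data platform" yields 3)
```

With other inputs flatMap can just as easily shrink the collection, which is why the correct relationship covers all three cases.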

195- Which of the following is a transformaon?


take(n)
top()
countByValue()
mapParonWithIndex()

196- Which of the following is acon?


Union(dataset)
Intersecon(other-dataset)
Disnct()
CountByValue()

197- In aggregate funcon can we get the data type dierent from as that input data type?
Yes
No
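The "Yes" behind the aggregate question can be demonstrated with a fold whose result type differs from the input type: folding integers into a (sum, count) pair and deriving a float average. This is a pure-Python sketch modeled loosely on Spark's aggregate(zeroValue, seqOp, combOp); the helper name and partitioning scheme are invented for illustration:

```python
from functools import reduce

def aggregate(values, zero, seq_op, comb_op, n_partitions=2):
    # Split the input into partitions, fold each with seq_op,
    # then merge the per-partition results with comb_op.
    chunks = [values[i::n_partitions] for i in range(n_partitions)]
    partials = [reduce(seq_op, chunk, zero) for chunk in chunks]
    return reduce(comb_op, partials)

# Input elements: ints. Accumulator/output: a (sum, count) tuple.
total, count = aggregate(
    [1, 2, 3, 4],
    zero=(0, 0),
    seq_op=lambda acc, x: (acc[0] + x, acc[1] + 1),
    comb_op=lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
average = total / count
# average == 2.5
```

Because seqOp folds an element into an accumulator of a different type and combOp merges accumulators, the output type is decoupled from the input type.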
198- In which of the following Acon the result is not returned to the driver.
collect()
top()
countByValue()
foreach()

199- Which of the following is true for stateless transformations?

Uses data or intermediate results from previous batches and computes the result of
the current batch.
Windowed operations and updateStateByKey() are two types of stateless
transformations.
The processing of each batch has no dependency on the data of previous batches.
None of the above

120- Which of the following is true for stateful transformations?

The processing of each batch has no dependency on the data of previous batches.
Uses data or intermediate results from previous batches and computes the result of
the current batch.
Stateful transformations are simple RDD transformations.
None of the above
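The stateless/stateful contrast in the two questions above can be simulated over a sequence of micro-batches: a stateless per-batch count uses only the current batch, while a stateful running count (in the spirit of updateStateByKey()) carries accumulated results forward. Pure-Python sketch with made-up batch contents:

```python
batches = [["a", "b", "a"], ["b"], ["a", "c"]]

# Stateless: each batch is computed independently of previous batches.
stateless_counts = [len(batch) for batch in batches]       # [3, 1, 2]

# Stateful: the result for each batch depends on state carried over
# from all previous batches.
state = {}
stateful_counts = []
for batch in batches:
    for key in batch:
        state[key] = state.get(key, 0) + 1
    stateful_counts.append(dict(state))

# stateful_counts[-1] == {'a': 3, 'b': 2, 'c': 1}
```

The second batch alone contains only one "b", yet the stateful result reports two, because intermediate results from the previous batch feed into the current one.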

121- The primary Machine Learning API for Spark is now the _____ based API
DataFrame
Dataset
RDD
All of the above

122- Which of the following is a module for Structured data processing?


GraphX
MLlib
Spark SQL
Spark R

123- Spark SQL translates commands into code. This code is processed by
Driver nodes
Executor Nodes
Cluster manager
None of the above

124- Spark SQL plays the main role in the optimization of queries.
True
False

125- This opmizer is based on funconal programming construct in


Java
Scala
Python
R

126- Catalyst Opmizer supports either rule-based or cost-based opmizaon.


True
False

127- Which of the following is not true for the Catalyst Optimizer?

The Catalyst optimizer makes use of the pattern-matching feature.
Catalyst contains the tree and the set of rules to manipulate the tree.
There are no specific libraries to process relational queries.
There are different rule sets which handle different phases of a query.

128- Which of the following is true for the tree in the Catalyst optimizer?
A tree is the main data type in the catalyst.
New nodes are defined as subclasses of the TreeNode class.
A tree contains a node object.
All of the above

129- Which of the following is true for the rule in the Catalyst optimizer?
We can manipulate a tree using rules.
We can define rules as a function from one tree to another tree.
Using a rule we get the pattern that matches each pattern to a result.
All of the above

130- Which of the following is not a Spark SQL query execution phase?
Analysis
Logical Optimization
Execution
Physical planning

131- In Spark SQL opmizaon which of the following is not present in the logical plan -
Constant folding
Abstract syntax tree
Projecon pruning
Predicate pushdown
132- In the analysis phase, which is the correct order of execution after forming the unresolved
logical plan?

a. Search relation by name from catalog.
b. Determine which attributes match to the same value to give them a unique ID.
c. Map the named attribute
d. Propagate and push types through expressions

abcd
acbd
adbc
dcab

133- In the Physical planning phase of query optimization we can use both cost-based and
rule-based optimization.
True
False

134- DataFrame in Apache Spark prevails over RDD and does not contain any feature of RDD.
True
False

135- Which of the following are the common features of RDD and DataFrame?
Immutability
In-memory
Resilient
All of the above

136- Which of the following is not true for DataFrame?

DataFrame in Apache Spark is behind RDD
We can build a DataFrame from different data sources: structured data files, tables in
Hive
The Application Programming Interfaces (APIs) of DataFrame are available in various
languages
Both in Scala and Java, we represent DataFrame as a Dataset of rows.

137- In DataFrame in Spark, once the domain object is converted into a data frame, the
regeneration of the domain object is not possible.
True
False
138- The DataFrame API has provision for compile-time type safety.
True
False

139- We can create a DataFrame using:

Tables in Hive
Structured data files
External databases
All of the above

140- Which of the following is the fundamental data structure of Spark?


RDD
DataFrame
Dataset
None of the above

141- Which of the following organizes data into named columns?

a. RDD
b. DataFrame
c. Dataset

Both a and b
Both b and c
Both a and c

142- Which of the following provides the object-oriented programming interface?
RDD
DataFrame
Dataset
None of the above

143- Aer transforming into DataFrame one cannot regenerate a domain object
True
False

144- RDD allows Java serialization


True
False
145- Which of the following makes use of an encoder for serialization?
RDD
DataFrame
Dataset
None of the above

146- Apache Spark is presently added in all major distributions of Hadoop


True
False

147- Does the Dataset API support Python and R?


Yes
No

148- Which of the following is slow to perform simple grouping and aggregation operations?

RDD
DataFrame
Dataset
All of the above

149- Which of the following is good for low-level transformations and actions?
RDD
DataFrame
Dataset
All of the above

150- Which of the following technologies is good for stream processing?


Apache Spark
Apache Hadoop
Apache Flink
None of the above

151- Which of the following is not true for Apache Spark Execuon?
To simplify working with structured data it provides DataFrame abstracon in
Python, Java, and Scala.
The data can be read and wrien in a variety of structured formats. For example,
JSON, Hive Tables, and Parquet.
Using SQL we can query data,only from inside a Spark program and not from
external tools.
The best way to use Spark SQL is inside a Spark applicaon. This empowers us to
load data and query it with SQL.
152- When SQL is run from another programming language the result will be
DataFrame
DataSet
Either DataFrame or Dataset
Neither DataFrame nor Dataset

153- The Dataset API is accessible in


Java and Scala
Java, Scala and Python
Scala and Python
Scala and R

154- The Dataset API is not supported by Python. But because of the dynamic nature of Python,
many benefits of the Dataset API are available.
True
False

155- Which of the following is true for the Catalyst optimizer?

The optimizer helps us to run queries much faster than their RDD counterparts.
The optimizer helps us to run queries a little faster than their RDD counterparts.
The optimizer helps us to run queries at the same speed as their RDD counterparts.

156- Which of the following are uses of Apache Spark SQL?

It executes SQL queries.
We can read data from an existing Hive installation using Spark SQL.
When we run SQL within another programming language we will get the result as a
Dataset/DataFrame.
All of the above

157- With the help of Spark SQL, we can query structured data as a distributed dataset
(RDD).
True
False

158- Spark SQL can connect through JDBC or ODBC.


True
False

159- Using Spark SQL, we can create or read a table containing union fields.
True
False
160- Which of the following is true for Spark SQL?
Hive transactions are not supported by Spark SQL.
No support for time-stamps in Avro tables.
Even if the inserted value exceeds the size limit, no error will occur.
All of the above

161- Which of the following is a daemon of Hadoop?


NameNode
Node manager
DataNode
All of the above

162- Which one of the following is false about Hadoop?

It is a distributed framework
The main algorithm used in it is MapReduce
It runs with commodity hardware
All are true

163- Hadoop Framework is written in


Python
Java
C++
Scala

164- Which of the following is a component of Hadoop?


YARN
HDFS
MapReduce
All of the above

165- The archive file created in Hadoop has the extension of
.hrh
.har
.hrc
.hrar

166- Which command is used to check the status of all daemons running in HDFS?
jps

fsck
distcp
None of the above
167- What license is Apache Hadoop distributed under?
Apache License 2.0
Shareware
Mozilla Public License
Commercial

168- Which of the following plaorms does Apache Hadoop run on ?


Bare metal
Unix-like
Cross-plaorm
Debian

169- Apache Hadoop achieves reliability by replicating the data across multiple hosts, and
hence does not require ________ storage on hosts.
Standard RAID levels
RAID
ZFS
Operang system

170- Which of the following is the correct statement?

Data locality means moving computation to data instead of data to computation
Data locality means moving data to computation instead of computation to data
Both the above
None of the above

171- Hadoop works in

master-worker fashion
master-slave fashion
worker/slave fashion
All of the mentioned

172- Which type of data can Hadoop deal with?

Structured
Semi-structured
Unstructured
All of the above

173- Which statement is false about Hadoop?

It runs with commodity hardware
It is a part of the Apache project sponsored by the ASF
It is best for live streaming of data
None of the above

174- Which of the following properties gets configured in mapred-site.xml?

Replication factor
Java Environment variables
Directory names to store hdfs files
Host and port where the MapReduce job runs

175- Which of the below Apache systems deals with ingesting streaming data to Hadoop?
Flume
Oozie
Hive
Kafka

176- As compared to RDBMS, Apache Hadoop

Has higher data integrity
Does ACID transactions
Is suitable for read and write many times
Works better on unstructured and semi-structured data.

177- Which command lists the blocks that make up each file in the filesystem?
hdfs fsck / -files -blocks
hdfs fsck / -blocks -files
hdfs fchk / -blocks -files
hdfs fchk / -files -blocks

178- In which languages can you code in Hadoop?


Java
Python
C++
All of the above

179- Which of the files contains the configuration settings for the HDFS daemons?
yarn-site.xml
hdfs-site.xml
mapred-site.xml
None of the above

180- All of the following accurately describe Hadoop, EXCEPT

Open source
Real-time
Java based
Distributed computing approach

181- Which of the files contains the configuration settings for NodeManager and
ResourceManager?
yarn-site.xml
yarn-site.xml

hdfs-site.xml
mapred-site.xml
None of the above

182- Hadoop can be used to create distributed clusters, based on commodity servers, that
provide low-cost processing and storage for unstructured data
True
False

183- Which of the following is a distributed mul-level database?


HDFS

HBase
Both the above
None of the above

184- Which of the following is used for machine learning on Hadoop?


Hive
Pig
HBase
Mahout

185- Which of the following is used to ingest streaming data into Hadoop clusters?
Flume
Sqoop
Both the above
None of the above

186- Hadoop distributed le system behaves similarly to which of the following:
RAID-1 Filesystem
RAID-0 Filesystem
Both the above
All of the above

187- Zookeeper ensures that

All the namenodes are actively serving the client requests
A failover is triggered when any of the datanodes fails.
Only one namenode is actively serving the client requests
A failover cannot be started by the hadoop administrator.

188- Which of the following is used to ingest data into Hadoop clusters?
Flume
Sqoop
Both the above
None of the above

189- Which of the following is a data processing engine for clustered computing?
Drill
Oozie
Spark
All of the above

190- Which tool could be used to move data from an RDBMS to HDFS?
Sqoop
Flume
Both the above
None of the above

191- All the les in a directory in HDFS can be merged together using which of the
following?
Put merge
Get merge
Remerge
Merge all

192- Which of these provides a stream processing system used in the Hadoop ecosystem?
Hive
Solr
Tez
Spark

193- The client reading the data from the HDFS filesystem in Hadoop does which of the
following?
Gets only the block locations from the namenode
Gets the data from the namenode
Gets both the data and block locations from the namenode
Gets the block locations from the datanode

194- Which of the following jobs are optimized for scalability but not latency?
Mapreduce

Drill
Oozie
Hive

195- Can mulple clients write into an HDFS le concurrently?


c oncurrently?

Yes
No

196- Which of the following commands is used to enter Safemode?

hadoop dfsadmin -safemode get
bin dfsadmin -safemode get
hadoop dfsadmin -safemode enter
None of the above

197- Which of the following commands is used to come out of Safemode?


hadoop dfsadmin -safemode leave
hadoop dfsadmin -safemode exit
hadoop dfsadmin -safemode out
None of the above

198- During Safemode the Hadoop cluster is in


Read-only
Write-only
Read-Write
None of the above

199- Does HDFS allow a client to read a file which is already opened for writing?
False
True

200- What happens when a file is deleted from the command line?
It is permanently deleted if trash is enabled.
It is permanently deleted and the file attributes are recorded in a log file.
It is placed into a trash directory common to all users for that cluster.
None of the above

201- Whichone of the following statements is false about Distributed Cache?


Hive
Drill
Oozie
Mapreduce
202- Which among the following are the features of Hadoop?
Open source
Fault-tolerant
High Availability
All of the above

203- The Checkpoint node downloads the FsImage and EditLogs from the NameNode, then
merges them and stores the modified FsImage
Into persistent storage
Back to the active NameNode

204- Which statement is true about Safemode?

It is a maintenance state of NameNode
In Safemode, the HDFS cluster is in read only
In Safemode, NameNode doesn't allow any modifications to the file system
All of the above

205- Which command is used to know the current status of safe mode?
hadoop dfsadmin -safemode get
hadoop dfsadmin -safemode getStatus
hadoop dfsadmin -safemode status
None of the above

206- Which of the following deals with the small files issue?

Hadoop archives
Sequence files
HBase
All of the above

207- Which of the following features overcomes this single point of failure?
None of the above
HDFS federation
High availability
Erasure coding

208- Which statement is true about NameNode High Availability?

Solves single point of failure
For high scalability
Reduces storage overhead to 50%
None of the above
209- In NameNode HA, when the active node fails, which node takes the responsibility of the
active node?
Secondary NameNode
Backup node
Standby node
Checkpoint node

210- What are the advantages of the 3x replication schema in Hadoop?


Fault tolerance
High availability
Reliability
All of the above

211- What are the advantages of HDFS federation in Hadoop?

Isolation
Namespace scalability
Improves throughput
All of the above

212- Rack Awareness improves


Data high availability and reliability
Performance of the cluster
Network bandwidth
All of the above

213- Which property is used to enable/disable speculative execution?

mapred.map.tasks.speculative.execution
mapred.reduce.tasks.speculative.execution
Both the above
None of the above

214- In which process is a duplicate task created to improve the overall execution time?
Erasure coding
Speculative execution
HDFS federation
None of the above

215- In which mode does each daemon run on a single node, but with a separate Java process
for each daemon?
Local (Standalone) mode
Fully distributed mode
Pseudo-distributed mode
None of the above

216- In which mode does each daemon run on a single node as a single Java process?
Local (Standalone) mode

Pseudo-distributed mode
Fully distributed mode
None of the above

217- In which mode do all daemons execute in separate nodes?
Local (Standalone) mode
Pseudo-distributed mode
Fully distributed mode
None of the above

218- Which conguraon le is used to control the HDFS replicaon factor?
mapred-site.xml
hdfs-site.xml
core-site.xml
yarn-site.xml

219- How do you adjust the size of the distributed cache?


local.cache.size.
mapred.cache.size.
hdfs.cache.size.
distributedcache.size.

220- Whas the default size of distributed cache


8 GB
10 GB
12 GB
16 GB

221- Distributed cache can cache files

 Jar files
 Read-only text files
 Archives
 All of the above

222- The total number of paroner is equal to


 The number of reducer
 The number of mapper

 The number of combiner


 None of the above
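Why the number of partitioners (partitions) matches the number of reducers: a hash partitioner assigns every intermediate key to one of the reducer buckets, so there is exactly one partition per reduce task. A minimal Python sketch of the default hash-partitioning rule (modeled on Hadoop's HashPartitioner, using Python's built-in hash for illustration):

```python
def partition(key, num_reducers):
    # Default rule: bucket index = hash(key) mod number of reduce tasks.
    return hash(key) % num_reducers

num_reducers = 3
keys = ["apple", "banana", "cherry", "apple"]
buckets = [partition(k, num_reducers) for k in keys]

# Every key lands in one of exactly num_reducers buckets,
# and equal keys always land in the same bucket.
```

Because equal keys always hash to the same bucket, each reducer sees all values for its keys, which is what makes per-key aggregation correct.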

223- Which of the following Hadoop cong les is used to dene the heap size?
 hdfs-site.xml


core-site.xml
hadoop-env.sh
 mapred-site.xml

224- Which of the following features will you use to submit jars and static files for a MapReduce job
during runtime?
 Distributed cache
 Speculative execution
 Data locality
 Erasure coding

225- Which of the following methods is used to set the output directory?
 FileOutputFormat.setOutputgetpath()
 OutputFormat.setOutputpath()
 FileOutputFormat.setOutputpath()
 OutputFormat.setOutputgetpath()

226- Which tool is used to distribute data evenly across datanodes?

 Balancer
 Disk balancer

227- Which tool is used to distribute data evenly on all disks of a datanode?
 Balancer
 Disk Balancer

228- Which of the following must be set to true to enable the disk balancer in hdfs-site.xml?
 dfs.balancer.enabled
 dfs.disk.balancer.enabled
 dfs.diskbalancer.enabled
 dfs.disk.balancer.enabled

229- In the disk balancer, the datanode uses which volume-choosing policy to choose the disk for
the block?
 Round-robin
 Available space
 All of the above
 None of the above
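The two volume-choosing policies above can be contrasted in a few lines: round-robin cycles through the disks in a fixed order regardless of capacity, while available-space always picks the disk with the most free room. Pure-Python sketch (the disk names and free-space figures are made-up numbers for illustration):

```python
import itertools

disks_free = {"disk0": 50, "disk1": 200, "disk2": 120}   # free GB (illustrative)

# Round-robin policy: ignore free space, just cycle through the disks.
rr = itertools.cycle(sorted(disks_free))
round_robin_choices = [next(rr) for _ in range(4)]
# -> ['disk0', 'disk1', 'disk2', 'disk0']

# Available-space policy: always pick the disk with the most free space.
available_space_choice = max(disks_free, key=disks_free.get)
# -> 'disk1'
```

Round-robin keeps write load even but can fill a small disk early; available-space evens out free capacity across heterogeneous disks.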

230- Which among the following are configuration files in Hadoop?


 core-site.xml
 hdfs-site.xml
 yarn-site.xml
 All of the above

231- Which of the following is used for large inter/intra-cluster copying?

 fsck
 distch
 DistCp
 dtutil

232- Hadoop uses Hadoop metrics for

 Troubleshooting
 Performance reporting purposes
 Monitoring
 All of the above

233- Is it possible to provide multiple inputs to Hadoop?


 Yes
 No

234- Is it possible to have Hadoop job output in multiple directories?


 Yes
 No

235- Which of the following is used to provide multiple outputs to Hadoop?

 MultipleOutputFormat
 MultipleOutputs class
 FileOutputFormat
 DBInputFormat

236- Which of the following commands is used to check for various inconsistencies?
 zkfc
 fs
 fsck
 fetchdt

237- What does commodity hardware in the Hadoop world mean?

 Very cheap hardware
 Industry standard hardware
 Discarded hardware
 Low specifications industry grade hardware

238- Distributed cache les can’t be accessed in Reducer.



True
False

239- Pig is a:
 Programming Language
 Data Flow Language
 Query Language
 Database

240- Distributed Cache can be used in


 Mapper phase only
 Reducer phase only
 In either phase, but not on both sides simultaneously
 In either phase

241- Which of the following is not a valid Hadoop cong le?


 core-default.xml
 hdfs-default.xml
 hadoop-default.xml
 mapred-default.xml

242- It is necessary to default all the properes in Hadoop cong les.


 True
 False

243- Which of the following is a column-oriented database that runs on top of HDFS?
 Hive
 Sqoop
 HBase
 Flume

244- Which command is used to show all the Hadoop daemons that are running on the machine?
 distcp
 jps
 fsck

245- Hadoop is a framework that works with a variety of related tools. Common cohorts
include:
 MapReduce, Hive and HBase
 MapReduce, MySQL and Google Apps
 MapReduce, Hummer and Iguana
 MapReduce, Heron and Trumpet
Question: 1
Which capability does IBM BigInsights add to enrich
Hadoop?
Your answer

A Jaql

B Fault tolerance through HDFS replication

C Adaptive MapReduce

D Parallel computing on commodity servers

Question: 2
What is one of the four characteristics of Big Data?
Your answer

A value

B volume

C verifiability

D volatility

Question: 3
Which Hadoop-related project provides common
utilities and libraries that support other Hadoop sub
projects?
Your answer

A Hadoop Common

B Hadoop HBase

C MapReduce

D BigTable

Question: 4
Which type of Big Data analysis involves the
processing of extremely large volumes of constantly
moving data that is impractical to store?
Your answer

A Federated Discovery and Navigation

B Text Analysis

C Stream Computing

D MapReduce

Question: 6
Which primary computing bottleneck of modern
computers is addressed by Hadoop?
Your answer

A 64-bit architecture

B disk latency

C MIPS

D limited disk capacity

Question: 7
Which Big Data function improves the decision-
making capabilities of organizations by enabling the
organizations to interpret and evaluate structured
and unstructured data in search of valuable
business information?
Your answer

A stream computing

B data warehousing

C analytics

D distributed file system

Question: 8

What is one of the two technologies that Hadoop uses as its foundation?
Your answer

A HBase

B Apache

C Jaql

D MapReduce
Question: 9
What key feature does HDFS 2.0 provide that HDFS
does not?
Your answer

A a high throughput, shared file system

B high availability of the NameNode

C data access performed by an RDBMS

D random access to data in the cluster

Question: 10
What are two of the core operators that can be used
in a Jaql query? (Select two.)
Your answer

A LOAD

B JOIN

C TOP

D SELECT

Question: 11
Which type of language is Pig?
Your answer

A SQL-like

B compiled language

C object oriented

D data flow

Question: 12
If you need to change the replication factor or
increase the default storage block size, which file do
you need to modify?
Your answer

A hdfs.conf

B hadoop-configuration.xml

C hadoop.conf

D hdfs-site.xml

Question: 13
To run a MapReduce job on the BigInsights cluster,
which statement about the input file(s) must be true?
Your answer

A The file(s) must be stored on the local file system where the map reduce job was developed.

B The file(s) must be stored in HDFS or GPFS.

C The file(s) must be stored on the JobTracker.

D No matter where the input files are before, they will be automatically copied to where the job runs.
Question: 14
What is a characteristic of IBM GPFS that
distinguishes it from other distributed file systems?
Your answer

A operating system independence

B posix compliance

C no single point of failure

D blocks that are stored on different nodes

Question: 15
Which statement represents a difference between
Pig and Hive?
Your answer

A Pig is used for creating MapReduce programs.

B Pig has a shell interface for executing commands.

C Pig is not designed for random reads/writes or low-latency queries.

D Pig uses Load, Transform, and Store.

Question: 16

Which command helps you create a directory called mydata on HDFS?
Your answer

A hdfs -dir mydata

B hadoop fs -mkdir mydata

C hadoop fs -dir mydata

D mkdir mydata

Question: 17

In which step of a MapReduce job is the output stored on the local disk?
Your answer

A Reduce

B Shuffle

C Combine

D Map

Question: 18
Under the MapReduce programming model, which
task is performed by the Reduce step?
Your answer

A Worker nodes process individual data segments in parallel.

B Worker nodes store results in the local file system.

C Input data is split into smaller pieces.

D Data is aggregated by worker nodes.

Question: 19
Which element of the MapReduce architecture runs
map and reduce jobs?
Your answer

A Reducer

B JobScheduler

C TaskTracker

D JobTracker

Question: 20
What is one of the two driving principles of
MapReduce?
Your answer

A spread data across a cluster of computers

B provide structure to unstructured or semi-structured data

C increase storage capacity through advanced compression algorithms

D provide a platform for highly efficient transaction processing
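
The MapReduce model that questions 17 through 20 revolve around can be illustrated with a toy word count in plain Python. This is a simulation of the map, shuffle, and reduce phases, not the Hadoop API; all function names and input data here are made up for illustration.

```python
from collections import defaultdict
from itertools import chain

# Toy simulation of MapReduce word count (not the Hadoop API).

def map_phase(split):
    # Each "mapper" processes its own input split and emits (word, 1) pairs.
    return [(word, 1) for word in split.split()]

def shuffle(pairs):
    # The framework groups intermediate pairs by key between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each "reducer" aggregates all values emitted for a given key.
    return {key: sum(values) for key, values in groups.items()}

# Input data split across three simulated worker nodes.
splits = ["big data big", "data moves to compute", "compute moves to data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle(mapped))
print(counts)  # prints {'big': 2, 'data': 3, 'moves': 2, 'to': 2, 'compute': 2}
```

The data is partitioned first (question 18, option C), each worker maps its own segment in parallel, and the aggregation (question 18, option D) happens in the reduce step.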


Question: 21
When running a MapReduce job from Eclipse, which
BigInsights execution models are available? (Select
two.)
Your answer

A Cluster

B Distributed

C Remote

D Debugging

E Local

Question: 22
Which statement is true regarding the number of
mappers and reducers configured in a cluster?
Your answer

A The number of reducers is always equal to the number of mappers.

B The number of mappers and reducers can be configured by modifying the mapred-site.x

C The number of mappers and reducers is decided by the NameNode.

D The number of mappers must be equal to the number of nodes in a cluster.

Question: 23
Which command displays the sizes of files and
directories contained in the given directory, or the
length of a file, in case it is just a file?
Your answer

A hadoop size

B hdfs -du

C hdfs fs size

D hadoop fs -du

Question: 24
Following the most common HDFS replica
placement policy, when the replication factor is
three, how many replicas will be located on the local rack?
Your answer

A three

B two

C one

D none

Question: 25
In the MapReduce processing model, what is the
main function performed by the JobTracker?
Your answer

A copies Job Resources to the shared file system
B coordinates the job execution


C executes the map and reduce functions

D assigns tasks to each cluster node

Question: 26
How are Pig and Jaql query languages similar?
Your answer

A Both are data flow languages.

B Both require schema.

C Both use Jaql query language.

D Both are developed primarily by IBM.

Question: 27
Under the HDFS architecture, what is one purpose of
the NameNode?
Your answer

A to manage storage attached to nodes

B to coordinate MapReduce jobs

C to regulate client access to files

D to periodically report status to DataNode


Question: 28
Which command should be used to list the contents
of the root directory in HDFS?
Your answer

A hadoop fs list

B hdfs root

C hadoop fs -ls /

D hdfs list /

Question: 29
What is one function of the JobTracker in
MapReduce?
Your answer

A runs map and reduce tasks

B keeps the work physically close to the data

C reports status of DataNodes

D manages storage

Question: 30
In addition to the high-level language Pig Latin, what
is a primary component of the Apache Pig platform?
Your answer

A built-in UDFs and indexing



B platform-specific SQL libraries

C an RDBMS such as DB2 or MySQL

D runtime environment

Question: 31
Which statement is true about Hadoop Distributed
File System (HDFS)?
Your answer

A Data is accessed through MapReduce.

B Data is designed for random access read/write.

C Data can be processed over long distances without a decrease in performance.

D Data can be created, updated and deleted.

Question: 32
Which is a use-case for Text Analytics?
Your answer

A managing customer information in a CRM database

B sentiment analytics from social media blogs

C product cost analysis from accounting systems

D health insurance cost/benefit analysis from payroll data


Question: 33
Which tool is used to access BigSheets?
Your answer

A BigSheets client

B Microsoft Excel

C Eclipse

D Web Browser

Question: 34
Which technology does Big SQL utilize for access to
shared catalogs?
Your answer

A Hive metastore

B RDBMS

C MapReduce

D HCatalog

Question: 35
Which statement will make an AQL view have
content displayed?
Your answer

A display view <view_name>
B return view <view_name>


C output view <view_name>

D export view <view_name>

Question: 36
You work for a hosting company that has data
centers spread across North America. You are trying
to resolve a critical performance problem in which a
large number of web servers are performing far
below expectations. You know that the information
written to log files can help determine the cause of
the problem, but there is too much data to manage
easily. Which type of Big Data analysis is
appropriate for this use case?
Your answer

A Text Analytics

B Stream Computing

C Data Warehousing

D Temporal Analysis

Question: 37
Which utility provides a command-line interface for
Hive?
Your answer

A Thrift client

B Hive shell

C Hive SQL client

D Hive Eclipse plugin

Question: 38
What is an accurate description of HBase?
Your answer

A It is a data flow language for structured data based on Ansi-SQL.

B It is a distributed file system that replicates data across a cluster.

C It is an open source implementation of Google's BigTable.

D It is a database schema for unstructured Big Data.

Question: 39
Which Hadoop-related technology provides a user-
friendly interface, which enables business users to
easily analyze Big Data?
Your answer

A BigSQL

B BigSheets

C Avro
D HBase

Question: 40
What drives the demand for Text Analytics?
Your answer

A Text Analytics is the most common way to derive value from Big Data.

B MapReduce is unable to process unstructured text.

C Data warehouses contain potentially valuable information.

D Most of the world's data is in unstructured or semi-structured text.

Question: 41
In Hive, what is the difference between an external
table and a Hive managed table?
Your answer

A An external table refers to an existing location outside the warehouse directory.

B An external table refers to a table that cannot be dropped.

C An external table refers to the data from a remote database.

D An external table refers to the data stored on the local file system.

Question: 42
Which statement about NoSQL is true?
Your answer

A It provides all the capabilities of an RDBMS plus the ability to manage Big Data.

B It is a database technology that does not use the traditional relational model.

C It is based on the highly scalable Google Compute Engine.

D It is an IBM project designed to enable DB2 to manage Big Data.

Question: 43
If you need to JOIN data from two workbooks, which
operation should be performed beforehand?
Your answer

A "Copy" to create a new sheet with the other workbook data in the current workbook

B "Group" to bring together the two workbooks

C "Load" to create a new sheet with the other workbook data in the current workbook

D "Add" to add the other workbook data to the current workbook

Question: 44
What is the "scan" command used for in HBase?
Your answer

A to get detailed information about the table

B to view data in an Hbase table

C to report any inconsistencies in the database

D to list all tables in Hbase


Question: 45

Which tool is used for developing a BigInsights Text Analytics extractor?
Your answer

A Eclipse with BigInsights tools for Eclipse plugin

B BigInsights Console with AQL plugin

C AQLBuilder

D AQL command line

Question: 46
What is the most efficient way to load 700MB of data
when you create a new HBase table?
Your answer

A Pre-create regions by specifying splits in create table command and use the insert command to load data.

B Pre-create regions by specifying splits in create table command and bulk loading the data.

C Pre-create the column families when creating the table and use the put command to load the data.

D Pre-create the column families when creating the table and bulk loading the data.

Question: 47
The following sequence of commands is executed:
create 'table_1','column_family1','column_family2'
put 'table_1','row1','column_family1:c11','r1v11'
put 'table_1','row2','column_family1:c12','r1v12'
put 'table_1','row2','column_family2:c21','r1v21'

put 'table_1','row3','column_family1:d11','r1v11'
put 'table_1','row2','column_family1:d12','r1v12'
put 'table_1','row2','column_family2:d21','r1v21'
In HBase, which value will the "count 'table_1'"
command return?
Your answer

A 4

B 3

C 6

D 2
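
The answer to question 47 follows from HBase's row semantics: `count` returns the number of rows (distinct row keys), and a `put` to an existing row key updates that row rather than creating a new one. A small Python sketch of those semantics, using a plain dict as a stand-in for the table (not the HBase client API):

```python
# Plain-dict stand-in for an HBase table: row key -> {family:qualifier -> value}.
table = {}

def put(row_key, column, value):
    # A put to an existing row key updates cells in place;
    # only a previously unseen row key adds a new row.
    table.setdefault(row_key, {})[column] = value

for row_key, column, value in [
    ("row1", "column_family1:c11", "r1v11"),
    ("row2", "column_family1:c12", "r1v12"),
    ("row2", "column_family2:c21", "r1v21"),
    ("row3", "column_family1:d11", "r1v11"),
    ("row2", "column_family1:d12", "r1v12"),
    ("row2", "column_family2:d21", "r1v21"),
]:
    put(row_key, column, value)

row_count = len(table)  # what `count 'table_1'` reports
print(row_count)  # prints 3 (row1, row2, row3)
```

Four of the six puts target row2, so only three distinct row keys exist.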

Question: 48
Which Hive command is used to query a table?
Your answer

A TRANSFORM

B SELECT

C GET

D EXPAND

Question: 49
Why develop SQL-based query languages that can
access Hadoop data sets?
Your answer

A because SQL enhances query performance

B because the MapReduce Java API is sometimes difficult to use

C because data stored in a Hadoop cluster lends itself to structured SQL queries

D because the data stored in Hadoop is always structured

Question: 50
Which key benefit does NoSQL provide?
Your answer

A It allows Hadoop to apply the schema-on-ingest model to unstructured Big Data.

B It allows an RDBMS to maintain referential integrity on a Hadoop data set.

C It allows customers to leverage high-end server platforms to manage Big Data.

D It can cost-effectively manage data sets too large for traditional RDBMS.

Question: 51
What makes SQL access to Hadoop data difficult?
Your answer

A Hadoop data is highly structured.

B Data is in many formats.

C Data is located on a distributed file system.


D Hadoop requires pre-defined schema.

Question: 52
Which command can be used in Hive to list the
tables available in a database/schema?
Your answer

A list tables

B describe tables

C show all

D show tables

Question: 53
In HBase, what is the "count" command used for?
Your answer

A to count the number of columns of a table

B to count the number of column families of a table

C to count the number of rows in a table

D to count the number of regions of a table

Question: 54
Which Hadoop-related technology supports analysis
of large datasets stored in HDFS using an SQL-like
query language?
Your answer

A HBase

B Pig

C Jaql

D Hive

Question: 55
How can the applications published to BigInsights
Web Console be made available for users to
execute?
Your answer

A They need to be marked as "Shared."

B They need to be copied under the user home directory.

C They need to be deployed with proper privileges.

D They need to be linked with the master application.

Question: 56
Which component of Apache Hadoop is used for
scheduling and running workflow jobs?
Your answer

A Eclipse

B Oozie

C Jaql

D Task Launcher
Question: 57
What is one of the main components of Watson
Explorer (InfoSphere Data Explorer)?
Your answer

A validater

B replicater

C crawler

D compressor

Question: 58
IBM InfoSphere Streams is designed to accomplish
which Big Data function?
Your answer

A analyze and react to data in motion before it is stored

B find and analyze historical stream data stored on disk

C analyze and summarize product sentiments posted to social media

D execute ad-hoc queries against a Hadoop-based data warehouse

Question: 59
Which IBM Big Data solution provides low-latency
analytics for processing data-in-motion?
Your answer

A InfoSphere Information Server



B InfoSphere Streams

C InfoSphere BigInsights

D PureData for Analytics

Question: 60
Which IBM tool enables BigInsights users to
develop, test and publish BigInsights applications?
Your answer

A Avro

B HBase

C Eclipse

D BigInsights Applications Catalog


Question: 5
Which description identifies the real value of Big Data and Analytics?
Your answer

A enabling customers to efficiently index and access large volumes of data

B gaining new insight through the capabilities of the world's interconnected intelligence

C providing solutions to help customers manage and grow large database systems

D using modern technology to efficiently store the massive amounts of data generated by social networks
Big Data mock exam
Questions and answers
Question: 1 A Jaql
Which capability does IBM BigInsights add to enrich Hadoop? B Fault tolerance through HDFS replication
C Adaptive MapReduce
D Parallel computing on commodity servers

Question: 2 A value
What is one of the four characteristics of Big Data? B volume
C verifiability
D volatility

Question: 3 A Hadoop Common


Which Hadoop-related project provides common utilities and libraries that support other B Hadoop HBase
Hadoop sub projects? C MapReduce
D BigTable
Question: 4 Which type of Big Data analysis involves A Federated Discovery and Navigation
the processing of extremely large volumes of constantly moving data that is impractical to B Text Analysis
store? C Stream Computing
D MapReduce

Question: 6 A 64-bit architecture


Which primary computing bottleneck of modern computers is addressed by Hadoop? B disk latency
C MIPS
D limited disk capacity
Question: 7 A stream computing
Which Big Data function improves the decision-making capabilities of organizations by B data warehousing
enabling the organizations to interpret and evaluate structured and unstructured data in C analytics
search of valuable business information? D distributed file system

Question: 8 A HBase
What is one of the two technologies that Hadoop uses as its foundation? B Apache
C Jaql
D MapReduce
Question: 9 A a high throughput, shared file system
What key feature does HDFS 2.0 provide that HDFS does not? B high availability of the NameNode
C data access performed by an RDBMS
D random access to data in the cluster
Question: 10 A LOAD
What are two of the core operators that can be used in a Jaql query? (Select two.) B JOIN
C TOP
D SELECT
Question: 11 A SQL-like
Which type of language is Pig? B compiled language
C object oriented
D data flow
If you need to change the replication factor or increase the default storage block size, which A hdfs.conf
file do you need to modify? B hadoop-configuration.xml
C hadoop.conf
D hdfs-site.xml

Question: 13 A The file(s) must be stored on the local file system where the map reduce job was
To run a MapReduce job on the BigInsights cluster, which statement about the input file(s) developed.
must be true? B The file(s) must be stored in HDFS or GPFS.
C The file(s) must be stored on the JobTracker.
D No matter where the input files are before, they will be automatically copied to where
the job runs.
Question: 14
What is a characteristic of IBM GPFS that distinguishes it from other distributed file systems? A operating system independence
B posix compliance
C no single point of failure
D blocks that are stored on different nodes
Question: 15 A Pig is used for creating MapReduce programs.
Which statement represents a difference between Pig and Hive? B Pig has a shell interface for executing commands.
C Pig is not designed for random reads/writes or low-latency queries.
D Pig uses Load, Transform, and Store.

Question: 16 A hdfs -dir mydata


B hadoop fs -mkdir mydata
Which command helps you create a directory called mydata on HDFS? C hadoop fs -dir mydata
D mkdir mydata

Question: 17 A Reduce
B Shuffle
In which step of a MapReduce job is the output stored on the local disk? C Combine
D Map

Question: 18 A Worker nodes process individual data segments in parallel.


Under the MapReduce programming model, which task is performed by the Reduce step? B Worker nodes store results in the local file system.
C Input data is split into smaller pieces.
D Data is aggregated by worker nodes.

Question: 19 A Reducer
Which element of the MapReduce architecture runs map and reduce jobs? B JobScheduler
C TaskTracker
D JobTracker

Question: 20 A spread data across a cluster of computers


What is one of the two driving principles of MapReduce? B provide structure to unstructured or semi-structured data
C increase storage capacity through advanced compression algorithms
D provide a platform for highly efficient transaction processing

Question: 21 A Cluster
When running a MapReduce job from Eclipse, which BigInsights execution models are B Distributed
available? (Select two.) C Remote
D Debugging
E Local

Question: 22 A The number of reducers is always equal to the number of mappers.


Which statement is true regarding the number of mappers and reducers configured in a cluster? B The number of mappers and reducers can be configured by modifying the mapred-site.xml file.
C The number of mappers and reducers is decided by the NameNode.
D The number of mappers must be equal to the number of nodes in a cluster.

Question: 23 A hadoop size


Which command displays the sizes of files and directories contained in the given directory, or B hdfs -du
the length of a file, in case it is just a file? C hdfs fs size
D hadoop fs -du

Question: 24 A three
Following the most common HDFS replica placement policy, when the replication factor is B two
three, how many replicas will be located on the local rack? C one
D none

Question: 25 A copies Job Resources to the shared file system


In the MapReduce processing model, what is the main function performed by the JobTracker? B coordinates the job execution
C executes the map and reduce functions
D assigns tasks to each cluster node

Question: 26 A Both are data flow languages.


How are Pig and Jaql query languages similar? B Both require schema.
C Both use Jaql query language.
D Both are developed primarily by IBM.

Question: 27 A to manage storage attached to nodes


Under the HDFS architecture, what is one purpose of the NameNode? B to coordinate MapReduce jobs
C to regulate client access to files
D to periodically report status to DataNode

Question: 28 A hadoop fs list


Which command should be used to list the contents of the root directory in HDFS? B hdfs root
C hadoop fs -ls /
D hdfs list /

Question: 29 A runs map and reduce tasks


What is one function of the JobTracker in MapReduce? B keeps the work physically close to the data
C reports status of DataNodes
D manages storage

Question: 30 A built-in UDFs and indexing


In addition to the high-level language Pig Latin, what is a primary component of the Apache B platform-specific SQL libraries
Pig platform? C an RDBMS such as DB2 or MySQL
D runtime environment

Question: 31 A Data is accessed through MapReduce.


Which statement is true about Hadoop Distributed File System (HDFS)? B Data is designed for random access read/write.
C Data can be processed over long distances without a decrease in performance.
D Data can be created, updated and deleted.

Question: 32 A managing customer information in a CRM database


Which is a use-case for Text Analytics? B sentiment analytics from social media blogs
C product cost analysis from accounting systems
D health insurance cost/benefit analysis from payroll data

Question: 33 A BigSheets client


Which tool is used to access BigSheets? B Microsoft Excel
C Eclipse
D Web Browser

Question: 34 A Hive metastore


Which technology does Big SQL utilize for access to shared catalogs? B RDBMS
C MapReduce
D HCatalog

Question: 35 A display view <view_name>


Which statement will make an AQL view have content displayed? B return view <view_name>
C output view <view_name>
D export view <view_name>

Question: 36 A Text Analytics


You work for a hosting company that has data centers spread across North America. You are B Stream Computing
trying to resolve a critical performance problem in which a large number of web servers are C Data Warehousing
performing far below expectations. You know that the information written to log files can help D Temporal Analysis
determine the cause of the problem, but there is too much data to manage easily. Which type
of Big Data analysis is appropriate for this use case?

Question: 37 A Thrift client


Which utility provides a command-line interface for Hive? B Hive shell
C Hive SQL client
D Hive Eclipse plugin

Question: 38 A It is a data flow language for structured data based on Ansi-SQL.


What is an accurate description of HBase? B It is a distributed file system that replicates data across a cluster.
C It is an open source implementation of Google's BigTable.
D It is a database schema for unstructured Big Data.

Question: 39 A BigSQL
Which Hadoop-related technology provides a user-friendly interface, which enables business B BigSheets
users to easily analyze Big Data? C Avro
D HBase

Question: 40 A Text Analytics is the most common way to derive value from Big Data.
What drives the demand for Text Analytics? B MapReduce is unable to process unstructured text.
C Data warehouses contain potentially valuable information.
D Most of the world's data is in unstructured or semi-structured text.

Question: 41 A An external table refers an existing location outside the warehouse directory.
In Hive, what is the difference between an external table and a Hive managed table? B An external table refers to a table that cannot be dropped.
C An external table refers to the data from a remote database.
D An external table refers to the data stored on the local file system.

Question: 42 A It provides all the capabilities of an RDBMS plus the ability to manage Big Data.
Which statement about NoSQL is true? B It is a database technology that does not use the traditional relational model.
C It is based on the highly scalable Google Compute Engine.
D It is an IBM project designed to enable DB2 to manage Big Data.

Question: 43 A "Copy" to create a new sheet with the other workbook data in the current workbook
If you need to JOIN data from two workbooks, which operation should be performed B "Group" to bring together the two workbooks
beforehand? C "Load" to create a new sheet with the other workbook data in the current workbook
D "Add" to add the other workbook data to the current workbook

Question: 44 A to get detailed information about the table


What is the "scan" command used for in HBase? B to view data in an Hbase table

C to report any inconsistencies in the database
D to list all tables in Hbase

Which tool is used for developing a BigInsights Text Analytics extractor? A Eclipse with BigInsights tools for Eclipse plugin
B BigInsights Console with AQL plugin
C AQLBuilder
D AQL command line

Question: 46 A Pre-create regions by specifying splits in create table command and use the insert
What is the most efficient way to load 700MB of data when you create a new HBase table? command to load data.
B Pre-create regions by specifying splits in create table command and bulk loading the
data.
C Pre-create the column families when creating the table and use the put command to
load the data.
D Pre-create the column families when creating the table and bulk loading the data.

Question: 47 A 4
The following sequence of commands is executed: B 3
create 'table_1','column_family1','column_family2' C 6
put 'table_1','row1','column_family1:c11','r1v11' D 2
put 'table_1','row2','column_family1:c12','r1v12'
put 'table_1','row2','column_family2:c21','r1v21'
put 'table_1','row3','column_family1:d11','r1v11'
put 'table_1','row2','column_family1:d12','r1v12'
put 'table_1','row2','column_family2:d21','r1v21'
In HBase, which value will the "count 'table_1'" command return?

Question: 48
Which Hive command is used to query a table? A TRANSFORM
B SELECT
C GET
D EXPAND

Question: 49 A because SQL enhances query performance


Why develop SQL-based query languages that can access Hadoop data sets? B because the MapReduce Java API is sometimes difficult to use
C because data stored in a Hadoop cluster lends itself to structured SQL queries
D because the data stored in Hadoop is always structured

Question: 50 A It allows Hadoop to apply the schema-on-ingest model to unstructured Big Data.
Which key benefit does NoSQL provide? B It allows an RDBMS to maintain referential integrity on a Hadoop data set.
C It allows customers to leverage high-end server platforms to manage Big Data.
D It can cost-effectively manage data sets too large for traditional RDBMS.

Question: 51 A Hadoop data is highly structured.


What makes SQL access to Hadoop data difficult? B Data is in many formats.
C Data is located on a distributed file system.
D Hadoop requires pre-defined schema.

Question: 52 A list tables


Which command can be used in Hive to list the tables available in a database/schema? B describe tables
C show all
D show tables

Question: 53 A to count the number of columns of a table


In HBase, what is the count command used for? B to count the number of column families of a table
C to count the number of rows in a table
D to count the number of regions of a table

Question: 54 A HBase
Which Hadoop-related technology supports analysis of large datasets stored in HDFS using an B Pig
SQL-like query language? C Jaql
D Hive

Question: 55 A They need to be marked as "Shared."


How can the applications published to BigInsights Web Console be made available for users to B They need to be copied under the user home directory.
execute? C They need to be deployed with proper privileges.
D They need to be linked with the master application.

Question: 56 A Eclipse
Which component of Apache Hadoop is used for scheduling and running workflow jobs? B Oozie
C Jaql
D Task Launcher
Question: 57 A validater
What is one of the main components of Watson Explorer (InfoSphere Data Explorer)? B replicater
C crawler
D compressor

Question: 58 A analyze and react to data in motion before it is stored


IBM InfoSphere Streams is designed to accomplish which Big Data function? B find and analyze historical stream data stored on disk
C analyze and summarize product sentiments posted to social media
D execute ad-hoc queries against a Hadoop-based data warehouse

Question: 59 A InfoSphere Information Server


Which IBM Big Data solution provides low-latency analytics for processing data-in-motion? B InfoSphere Streams
data-in-motion? B InfoSphere Streams
C InfoSphere BigInsights
D PureData for Analytics

Question: 60 A Avro
Which IBM tool enables BigInsights users to develop, test and publish BigInsights applications? B HBase
C Eclipse
D BigInsights Applications Catalog

Question: 5 A enabling customers to efficiently index and access large volumes of data
Which description identifies the real value of Big Data and Analytics? B gaining new insight through the capabilities of the world's interconnected intelligence
C providing solutions to help customers manage and grow large database systems
D using modern technology to efficiently store the massive amounts of data generated
by social networks

Big Data exam

Questions and answers
Question: 1 A Notebook
Where must a Spark configuration be set up first? B Db2 Warehouse
C IBM Cloud
D Watson Data Platform

Question: 2 A Watson Studio homepage


When sharing a notebook, what will always point to the most recent version of the notebook? B The permalink
C The Spark service
D PixieDust visualization

Question: 3 A Spark service


When creating a Watson Studio project, what do you need to specify? B Data service
C Collaborators
D Data assets

Question: 4 AR
You can import preinstalled libraries if you are using which languages? (Select two.) B Python
(Please select ALL that apply) C Bash
D Rexx
E Scala

Question: 5 A Viewers
Who can control a Watson Studio project assets? B Editors
C Collaborators
D Tenants

Question: 6 A ZOOKEEPER_APP
Which environmental variable needs to be set to properly start ZooKeeper? B ZOOKEEPER_DATA

C ZOOKEEPER
D ZOOKEEPER_HOME

Question: 7 A better compression using GZip


Which is the primary advantage of using column-based data formats over record-based B supports in-memory processing
formats? C facilitates SQL-based queries
D faster query execution
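The advantage behind this question can be sketched in plain Python with a toy table (illustrative only; real columnar formats like Parquet add encoding, compression, and metadata on top of this idea):

```python
# Toy comparison of record-based vs. column-based layouts.
rows = [("alice", 30, "NY"), ("bob", 25, "SF"), ("carol", 41, "LA")]

# Record-based: all fields of one record stored together.
row_store = rows

# Column-based: all values of one column stored together.
col_store = {
    "name": [r[0] for r in rows],
    "age":  [r[1] for r in rows],
    "city": [r[2] for r in rows],
}

# To average "age", the row store must touch every field of every record...
touched_row = sum(len(r) for r in row_store)   # 9 values scanned
# ...while the column store reads only the one column the query needs.
touched_col = len(col_store["age"])            # 3 values scanned

avg_age = sum(col_store["age"]) / len(col_store["age"])
print(touched_row, touched_col, avg_age)       # 9 3 32.0
```

Reading less data per analytic query is what makes query execution faster.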

Question: 8 A Collect and send data into a stream.


What is the primary purpose of Apache NiFi? B Finding data across the cluster.
C Connect remote data sources via WiFi.
D Identifying non-compliant data access.
Question: 9 A cash register receipts
What are three examples of Big Data? (Choose three.) B web server logs
(Please select ALL that apply) C inventory database records
D bank records
E photos posted on Instagram
F messages tweeted on Twitter

Question: 10 A get /
What ZK CLI command is used to list all the ZNodes at the top level of the ZooKeeper B create /
hierarchy, in the ZooKeeper command-line interface? C listquota /
D ls /

Question: 11 A JSON
What is the default data format Sqoop parses to export data to a database? B CSV
C XML
D SQL

Question: 12 A Keeps the tasks physically close to the data.


Under the MapReduce v1 architecture, which function is performed by the TaskTracker? B Pushes map and reduce tasks out to DataNodes.
C Manages storage and transmission of intermediate output.
D Accepts MapReduce jobs submitted by clients.

Question: 13 A Indexed databases containing very large volumes of historical data used for compliance
Which statement describes "Big Data" as it is used in the modern business world? reporting purposes.
B Non-conventional methods used by businesses and organizations to capture, manage,
process, and make sense of a large volume of data.
C Structured data stores containing very large data sets such as video and audio streams.
D The summarization of large indexed data stores to provide information about potential
problems or opportunities.

Question: 14 A Runs map and reduce tasks.


Under the MapReduce v1 architecture, which function is performed by the JobTracker? B Accepts MapReduce jobs submitted by clients.
C Manages storage and transmission of intermediate output.
D Reports status to MasterNode.

Question: 15 A HDFS is a software framework to support computing on large clusters of computers.


Which statement is true about the Hadoop Distributed File System (HDFS)? B HDFS provides a web-based tool for managing Hadoop clusters.
C HDFS links the disks on multiple nodes into one large file system.
D HDFS is the framework for job scheduling and cluster resource management.

Question: 16 A Coordination between servers.


How does MapReduce use ZooKeeper? B Aid in the high availability of Resource Manager.
C Master server election and discovery.
D Server lease management of nodes.

Question: 17 A Python
Which two Spark libraries provide a native shell? (Choose two.) B Scala
(Please select ALL that apply) C C#
D Java
E C++

Question: 18 A IP address
What is an authentication mechanism in Hortonworks Data Platform? B Preshared keys
C Kerberos
D Hardware token

Question: 19 A Manage, secure, and govern data stored across all storage environments.
What is Hortonworks DataPlane Services (DPS) used for? B Transform data from CSV format into native HDFS data.
C Perform backup and recovery of data in the Hadoop ecosystem.
D Keep data up to date by periodically refreshing stale data.

Question: 20 A Copy any appropriate JDBC driver JAR to $SQOOP_HOME/lib.


What must be done before using Sqoop to import from a relational database? B Complete the installation of Apache Accumulo.
C Create a Java class to support the data import.
D Create an empty database for Sqoop to access.

Question: 21 A Scala
What is the native programming language for Spark? B C++
C Java
D Python

Question: 22 A YARN
Which Hortonworks Data Platform (HDP) component provides a common web user interface B HDFS
for applications running on a Hadoop cluster? C Ambari
D MapReduce
Question: 23 A Transformations
Which Spark RDD operation returns values after performing the evaluations? B Actions
C Caching
D Evaluations

Question: 24 A Configuration bootstrapping for new nodes.


Which two are use cases for deploying ZooKeeper? (Choose two.) B Managing the hardware of cluster nodes.
(Please select ALL that apply) C Storing local temporary data files.
D Simple data registry between nodes.

Question: 25 A DataNodes increase capacity while NameNodes increase processing power.


In a Hadoop cluster, which two are the result of adding more nodes to the cluster? (Choose B It adds capacity to the file system.
two.) C Scalability increases by a factor of x^N-1.
(Please select ALL that apply) D Capacity increases while fault tolerance decreases.
E It increases available processing power.

Question: 26 A Distribution
Which Spark RDD operation creates a directed acyclic graph through lazy evaluations? B GraphX
C Transformations
D Actions
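Spark's lazy transformations versus eager actions can be imitated with Python generators (a conceptual analogy, not the Spark API): chaining builds a pipeline without computing anything, and only an action forces the chain to run.

```python
log = []

def numbers():
    for n in range(5):
        log.append(n)          # record when an element is actually produced
        yield n

# Chaining a "transformation": nothing has executed yet (like an RDD lineage/DAG).
pipeline = (n * n for n in numbers() if n % 2 == 0)
assert log == []               # still lazy

# An "action" triggers evaluation of the entire chain.
result = sum(pipeline)
assert result == 20            # squares of 0, 2, 4
assert log == [0, 1, 2, 3, 4]  # source consumed only at action time
```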

Question: 27 A REST APIs


Which feature allows application developers to easily use the Ambari interface to integrate B Postgres RDBMS
Hadoop provisioning, management, and monitoring capabilities into their own applications? C Ambari Alert Framework
D AMS APIs
Question: 28 A Columns of data must be separated by a delimiter.
What is one disadvantage to using CSV formatted data in a Hadoop data store? B Fields must be positioned at a fixed offset from the beginning of the record.
C It is difficult to represent complex data structures such as maps.
D Data must be extracted, cleansed, and loaded into the data warehouse.
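The CSV limitation in this question can be shown with the standard library: a nested map survives a JSON round trip, but CSV flattens it into an opaque string (richer formats such as Avro or Parquet also preserve nesting).

```python
import csv
import io
import json

record = {"user": "alice", "tags": {"dept": "sales", "region": "emea"}}

# JSON keeps the nested structure intact across a round trip.
assert json.loads(json.dumps(record)) == record

# CSV forces the map into a single string field.
buf = io.StringIO()
csv.writer(buf).writerow([record["user"], str(record["tags"])])
row = next(csv.reader(io.StringIO(buf.getvalue())))

assert isinstance(row[1], str)        # structure lost: it is now just text
assert row[1] != record["tags"]       # no longer a map after the round trip
```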

Question: 29 A YARN
Which element of Hadoop is responsible for spreading data across the cluster? B MapReduce
C AMS
D HDFS

Question: 30 A Authorization Provider


Which component of the Apache Ambari architecture stores the cluster configurations? B Ambari Metrics System
C Postgres RDBMS
D Ambari Alert Framework

Question: 31 A Time of interaction


Which two are examples of personally identifiable information (PII)? (Choose two.) B Medical record number
(Please select ALL that apply) C Email address
D IP address

Question: 32 A SlaveNode
Under the MapReduce v1 architecture, which element of the system manages the map and B JobTracker
reduce functions? C MasterNode
D StorageNode
E TaskTracker

Question: 33 A NameNode
Which component of the HDFS architecture manages storage attached to the nodes? B StorageNode
C DataNode
D MasterNode

Question: 34 A Volume
Which of the "Five V's" of Big Data describes the real purpose of deriving business insight from B Value
Big Data? C Variety
D Velocity
E Veracity

Question: 35 A Spark Learning


Which component of the Spark Unified Stack supports learning algorithms such as, logistic B Spork
regression, naive Bayes classification, and SVM? C Spark SQL
D MLlib

Question: 36 A able to use inexpensive commodity hardware


Which two descriptions are advantages of Hadoop? (Choose two.) B intensive calculations on small amounts of data
(Please select ALL that apply) C processing a large number of small files
D processing random access transactions

E processing large volumes of data with high throughput


Question: 37 A CSV
Which two of the following are row-based data encoding formats? (Choose two.) B Avro
(Please select ALL that apply) C ETL
D Parquet
E RC and ORC

Question: 38 A The data is spread out and replicated across the cluster.
Which statement describes the action performed by HDFS when data is written to the Hadoop B The data is replicated to at least 5 different computers.
cluster? C The MasterNodes write the data to disk.
D The FsImage is updated with the new data map.
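The write behavior in answer A can be sketched as follows: a file is split into blocks and each block is replicated (default factor 3) to distinct DataNodes. The round-robin placement below is a simplification; real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE = 4          # bytes, tiny for illustration (HDFS default is 128 MB)
REPLICATION = 3
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]

data = b"abcdefghij"    # 10 bytes -> 3 blocks
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

placement = {}
for b_idx, _ in enumerate(blocks):
    # choose REPLICATION distinct nodes for this block
    placement[b_idx] = [nodes[(b_idx + r) % len(nodes)] for r in range(REPLICATION)]

assert len(blocks) == 3
assert all(len(set(p)) == REPLICATION for p in placement.values())
```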

Question: 39 A MasterNode
Under the MapReduce v1 architecture, which element of MapReduce controls job execution B JobTracker
on multiple slaves? C SlaveNode
D TaskTracker
E StorageNode
Question: 40 A MLlib
Which component of the Spark Unified Stack provides processing of data arriving at the B Spark SQL
system in real-time? C Spark Streaming
D Spark Live

Question: 41 A DB2ATSENABLE
Which two registries are used for compiler and runtime performance improvements in B DB2FODC
support of the Big SQL environment? (Choose two) C DB2COMPOPT
(Please select ALL that apply) D DB2RSHTIMEOUT
E DB2SORTAFTER_TQ

Question: 42 A bigsql_bar.py
Which script is used to backup and restore the Big SQL database? B db2.sh
C bigsql.sh
D load.py

Question: 43 A STRING
You need to create a table that is not managed by the Big SQL database manager. Which B BOOLEAN
keyword would you use to create the table? C SMALLINT
D EXTERNAL

Question: 44 A Oracle
Which two of the following data sources are currently supported by Big SQL? (Choose two) B PostgreSQL
(Please select ALL that apply) C Teradata
D MySQL
E MariaDB

Question: 45 A 7055
Which port is the default for the Big SQL Scheduler to get administrator commands? B 7054
C 7052
D 7053

Question: 46 A Hortonworks
Which tool should you use to enable Kerberos security? B Ambari
C Apache Ranger
D Hive

Question: 47 A Scheduler
Which two options can be used to start and stop Big SQL? (Choose two) B DSM Console
(Please select ALL that apply) C Command line
D Java SQL shell

Question: 48 A CREATE
Which command is used to populate a Big SQL table? B QUERY
C SET
D LOAD

Question: 49 A Impersonation
Which feature allows the bigsql user to securely access data in Hadoop on behalf of another B Privilege
user? C Rights
D Schema

Question: 50 A SET AUTHORIZATION
Which command would you run to make a remote table accessible using an alias? B CREATE SERVER
C CREATE WRAPPER
D CREATE NICKNAME

Question: 51 A Db2
The Big SQL head node has a set of processes running. What is the name of the service ID B hdfs
running these processes? C user1
D bigsql

Question: 52 A Parquet
Which file format contains human-readable data where the column values are separated by a B ORC
comma? C Delimited
D Sequence

Question: 53 A Public key


Which Big SQL authentication mode is designed to provide strong authentication for B Flat files
client/server applications by using secret-key cryptography? C Kerberos
D LDAP

Question: 54 A Jupyter
Which type of foundation does Big SQL build on? B Apache HIVE
C RStudio
D MapReduce

Question: 55 A SSL
You need to monitor and manage data security across a Hadoop platform. Which tool would B HDFS
you use? C Hive
D Apache Ranger

Question: 56 A """
What can be used to surround a multi-line string in a Python code cell by appearing before and B "
after the multi-line string? C
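Answer A: triple quotes delimit a multi-line string literal in Python, appearing before and after the text.

```python
# A triple-quoted literal may span several lines without escape characters.
text = """first line
second line"""

assert "\n" in text
assert text.splitlines() == ["first line", "second line"]
```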

Question: 57 A Packaging data for public distribution on a website.


For what are interactive notebooks used by data scientists? B Quick data exploration tasks that can be reproduced.
C Providing a chain of custody of all data.
D Bulk loading data into a database.

Question: 58 A pull
What Python statement is used to add a library to the current code cell? B import
C load
D using
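Answer B in action: `import` makes a library available in the current code cell.

```python
import math  # the import statement loads a library into the cell's namespace

assert math.sqrt(16) == 4.0
```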

Question: 59 A NLTK
What Python package has support for linear algebra, optimization, mathematical integration, B Pandas
and statistics? C NumPy
D SciPy

Question: 60 A Traditional research


Which three main areas make up Data Science according to Drew Conway? (Choose three.) B Machine learning
(Please select ALL that apply) C Substantive expertise
D Math and statistics knowledge
E Hacking skills

Big Data Engineer v2 IBM Certification 2018
questions answers
1/ What are the 4Vs of Big Data? (Please select the FOUR that apply) •Veracity •Velocity •Variety •Volume
2/ What are the three types of Big Data? (Please select the THREE that apply) •Semi-structured •Structured •Unstructured
3/ Select all the components of HDP which provides data access capabilities •Pig •MapReduce •Hive
4/ Select the components that provides the capability to move data from relational database
•Sqoop •Kafka •Flume
into Hadoop.
5/ Managing Hadoop clusters can be accomplished using which component? Ambari
6/ True or False: The following components are value-add from IBM: Big Replicate, Big SQL,
TRUE
BigIntegrate, BigQuality, Big Match
7/ True or False: Data Science capabilities can be achieved using only HDP. FALSE(Big Data Ecosystem UNIT 2)
8/ True or False: Ambari is backed by RESTful APIs for developers to easily integrate with their
True
own applications.
9/ Which Hadoop functionalities does Ambari provide? Manage •Provision •Integrate •Monitor
10/ Which page from the Ambari UI allows you to check the versions of the software installed
The Admin > Manage Ambari page
on your cluster?
11/ True or False?Creating users through the Ambari UI will also create the user on the HDFS. FALSE
12/ True or False? You can use the CURL commands to issue commands to Ambari TRUE
13/ True or False: Hadoop systems are designed for transaction processing. FALSE
14/ What is the default number of replicas in a Hadoop system? 3
15/ True or False: One of the driving principles of Hadoop is that the data is brought to the
FALSE(Big Data Ecosystem UNIT 4)
program.
16/ True or False: At least 2 Name Nodes are required for a standalone Hadoop cluster. FALSE(Big Data Ecosystem UNIT 4)
17/ True or False: The phases in a MR job are Map, Shuffle, Reduce and Combiner TRUE
18/ Centralized handling of job control flow is one of the limitations of MRv1 TRUE
19/ The Job Tracker in MR1 is replaced by which component(s) in YARN? •ApplicationMaster •ResourceManager
20/ What are the benefits of using Spark? (Please select the THREE that apply) •Generality •Speed •Ease of use
21/ What are the languages supported by Spark? (Please select the THREE that apply) •Python •Java •Scala
22/ Resilient Distributed Dataset (RDD) is the primary abstraction of Spark. True
23/ What would you need to do in a Spark application that you would not need to do in a
Import the necessary libraries to load the SparkContext
Spark shell to start using Spark?
24/ True or False: NoSQL database is designed for those that do not want to use SQL. FALSE(Big Data Ecosystem UNIT 7)
25/ Which database is a columnar storage database? Hbase
26/ Which database provides a SQL for Hadoop interface? Hive
27/ Which Apache project provides coordination of resources? ZooKeeper
28/ What is ZooKeeper's role in the Hadoop infrastructure? •Manage the coordination between HBase servers •Hadoop and MapReduce uses
ZooKeeper to aid in high availability of Resource Manager •Flume uses
ZooKeeper for configuration purposes in recent releases
29/ True or False: Slider provides an intuitive UI which allows you to dynamically allocate YARN
FALSE(Big Data Ecosystem UNIT 8)
resources.

30/ True or False: Knox can provide all the security you need within your Hadoop
FALSE(Big Data Ecosystem UNIT 8)
infrastructure.
31/ True or False: Sqoop is used to transfer data between Hadoop and relational databases. True
32/ True or False: For Sqoop to connect to a relational database, the JDBC JAR files for that
FALSE(Big Data Ecosystem UNIT 9)
database must be located in $SQOOP_HOME/bin.
33/ True or False: Each Flume node receives data as "source", stores it in a "channel", and
True
sends it via a "sink".
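The source → channel → sink pipeline from item 33 can be sketched with a queue (conceptual only; a real Flume agent provides durability, transactions, and many source/sink types):

```python
from collections import deque

channel = deque()   # the buffering channel between source and sink
delivered = []      # stand-in for the sink's destination (e.g. HDFS)

def source(event):
    channel.append(event)                    # source puts events into the channel

def sink():
    while channel:
        delivered.append(channel.popleft())  # sink drains the channel in order

source("evt-1")
source("evt-2")
sink()
assert delivered == ["evt-1", "evt-2"]
assert not channel
```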
34/ Through what HDP component are Kerberos, Knox, and Ranger managed? Ambari
35/ Which security component is used to provide peripheral security? Apache Knox
36/ One of the governance issues that Hortonworks DataPlane Service (DPS) addresses is visibility
over all of an organization's data across all of their environments — on-prem, cloud, hybrid — True
while making it easy to maintain consistent security and governance
37/ True or false: The typical sources of streaming data are Sensors, "Data exhaust" and high-
True
rate transaction data.
38/ What are the components of Hortonworks Data Flow(HDF)? •Flow management •Stream processing •Enterprise services
39/ True or False: NiFi is a disk-based, microbatch ETL tool that provides flow management True
40/ True or False: MiNiFi is a complementary data collection tool that feeds collected data to
True
NiFi
41/ What main features does IBM Streams provide as a Streaming Data Platform? (Please
•Analysis and visualization •Rich data connections •Development support
select the THREE that apply)
42/ What are the most important computer languages for Data Analytics?(Please select the
•Python •R •Scala
THREE that apply
43/ True or False: GPUs are special-purpose processors that traditionally can be used to power
graphical displays, but for Data Analytics lend themselves to faster algorithm execution True
because of the large number of independent processing cores.
44/ True or False: Jupyter stores its workbooks in files with the .ipynb suffix. These files can
FALSE(Introduction to Data Science UNIT 1)
not be stored locally or on a hub server.
45/ $BIGSQL_HOME/bin/bigsql start command is used to start Big SQL from the command
True
line?
46/ What are the two ways you can work with Big SQL.(Please select the TWO that apply) •Jsqsh •Web tooling from DSM
47/ What is one of the reasons to use Big SQL? Want to access your Hadoop data without using MapReduce
48/ Should you use the default STRING data type? No(Big SQL UNIT 2)
49/ The BOOLEAN type is defined as SMALLINT SQL type in Big SQL. True
50/ Using the LOAD operation is the recommended method for getting data into your Big SQL
True
table for best performance.
51/ Which file storage format has the highest performance? Parquet
52/ What are the two ways to classify functions? •Built-in functions •User-defined functions
53/ True or False: UMASK is used to determine permissions on directories and files. True
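The UMASK rule from item 53 is bit arithmetic: the umask clears permission bits from the default mode. With the common umask 022, default 666 (files) becomes 644 and 777 (directories) becomes 755.

```python
UMASK = 0o022

def apply_umask(default_mode, umask=UMASK):
    # New objects get the default mode with the umask's bits removed.
    return default_mode & ~umask

assert oct(apply_umask(0o666)) == "0o644"   # rw-r--r-- for new files
assert oct(apply_umask(0o777)) == "0o755"   # rwxr-xr-x for new directories
```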
54/ True or False: You can only Kerberize a Big SQL server before it is installed. False(Big SQL UNIT 4)
55/ True or False: Authentication with Big SQL only occurs at the Big SQL layer or the client's False(Big SQL UNIT 4)
application layer.
56/ True or False: Ranger and impersonation works well together. False(Big SQL UNIT 4)
57/ True or False: RCAC can hide rows and columns. True
58/ True or False: Nicknames can be used for wrappers and servers. False(Big SQL UNIT 5)
59/ True or False: Server objects define the property and values of the connection. True
60/ True or False: The purpose of a wrapper is to provide a library of routines that doesn't
False(Big SQL UNIT 5)
communicate with the data source.
61/ True or False: User mappings are used to authenticate to the remote datasource. True
62/ True or False: Collaboration with Watson Studio is an optional add-on component that
False(Watson Studio UNIT 1)
must be purchased.

63/ True or False: Watson Studio is designed only for Data Scientists, other personas would
False(Watson Studio UNIT 1)
not know how to use it.
64/ True or False: Community provides access to articles, tutorials, and even data sets that you
True
can use.
65/ True or False: You can import visualization libraries into Watson Studio. True
66/ True or False: Collaborators can be given certain access levels. True
67/True or False: Watson Studio contains Zeppelin as a notebook interface. False(Watson Studio UNIT 2) Jupyter as notebook
Big Data multiple-choice quiz (QCM)
questions answers
Which component connects sinks and sources in Flume? A. HDFS
B. ElasticSearch
* C. channels
D. Interceptors
Why does YARN scale better than Hadoop v1 for multiple jobs? (Choose two.) A. There is one Job Tracker per cluster.
(Please select ALL that apply) * B. Job tracking and resource management are split.
C. Job tracking and resource management are one process.
* D. There is one Application Master per job.
What happens if a task fails during a Hadoop job execution? A. The job will be restarted with different compute nodes.
B. The entire job will fail.
C. The job will finish with incomplete results.
* D. The task will be restarted on another node.
What command will list files located on the HDFS in R? A. bigr.dir()
B. ls()
* C. bigr.listfs()
D. list()

Which Big SQL datatype should be avoided because it causes significant performance A. CHAR
degradation? * B. STRING
C. UNION
D. VARCHAR
You need to create multiple Big SQL tables with columns defined as CHAR. What needs to be * A. SET SYSHADOOP.COMPATIBILITY_MODE=1
set to enable CHAR columns? B. CREATE TABLE chartab
C. SET HADOOPCOMPATIBLITY_MODE=True
D. ALTER CHAR DATATYPE TO byte
What is the primary core abstraction of Apache Spark? A. GraphX
* B. Resilient Distributed Dataset (RDD)
C. Spark Streaming
D. Directed Acyclic Graph (DAG)

Which Text Analytics runtime component is used for languages such as Spanish and English by A. Named entity extractors
breaking a stream of text into phrases or words? B. Other extractors
* C. Standard tokenizer
D. Multilingual tokenizer
Question : Which two commands are used to load data into an existing Big SQL table from * A. Load
HDFS? (Choose two.) B. Table
(Please select ALL that apply) C. Select
* D. Insert
E. Create
Which command should you use to set the default schema in a Big SQL table and also create A. default
the schema if it does not exist? B. create
C. format
* D. use

What is missing from the following statement when querying a remote table? CREATE A. TABLE
_______ FOR remotetable1 … B. VIEW
* C. NICKNAME
D. INDEX

What are two major business advantages of using BigSheets? (Choose two.) * A. built-in data readers for multiple formats
(Please select ALL that apply) * B. spreadsheet-like querying and discovery interface
C. command-line-driven data analysis
D. feature rich programming environment

Where should you build extractors in the Information Extraction Web Tool? A. Documents
* B. Canvas
C. Property pane
D. Regular expression

In which text analytics phase are extractors developed and tested? A. Analysis
* B. Rule Development
C. Production
D. Performance Tuning
Which action is performed during the Reduce step of a MapReduce v1 processing cycle? * A. Intermediate results are aggregated.
B. The TaskTracker distributes the job to the cluster.
C. The initial problem is broken into pieces.
D. The JobTrackers execute their assigned tasks.

What are two benefits of using the IBM Big SQL processing engine? (Choose two.) A. Core functionality is written in Java for portability.
(Please select ALL that apply) B. The system is built to be started and stopped on demand.
* C. Various data storage formats are supported.
* D. It provides access to Hadoop data using SQL.
An organization is developing a proof-of-concept for a big data system. Which phase of the big * A. Engage
data adoption cycle is the company currently in? B. Execute
C. Explore
D. Educate
Which feature in a Big SQL federation is a library to access a particular type of data source? A. server
B. table
C. view
* D. wrapper
What is a feature of Apache ZooKeeper? A. generates shell programs for running components of Hadoop
B. monitors log files of cluster members
* C. maintains configuration information for a cluster
D. performance tunes a running cluster

Which open source component is a big data processing framework?


* A. Apache Spark
B. Apache Ambari
C. IBM BigSheets
D. IBM Big SQL
Which command is used to launch an interactive Python shell for Spark? A. python -spark
* B. pyspark
C. hadoop pyshell
D. spark-shell

What are the two types of Spark operations? (Choose two.) A. Sequences
(Please select ALL that apply) B. Vectors
C. DataFrames
* D. Transformations
* E. Actions
Which statement is true regarding Reduce tasks in MapReduce?
* A. They can run on any node.
B. They only run on nodes that didn't generate data during the Map step.
C. They run only on nodes that generated data during the Map step.
D. They only run on one node.
What command will load the BigR package in R? * A. library(bigr)
B. source("bigr")
C. dir(pattern="bigr")
D. bigr.connect
Which programming language is Apache Spark primarily written in? * A. Scala
B. Java
C. Python 2
D. C++
Which feature of Text Analytics should you use to process Japanese or Chinese language text? A. Annotation Query Language (AQL)
B. Standard tokenizer
* C. Multilingual tokenizer
D. Online Analytical Programming (OLAP)
Which kind of HBase row key maps to multiple SQL columns?
A. Primary
B. Dense
C. Unique
* D. Composite
What does the HCatalog component of Hive provide? A. collecting common data transformations into a library
B. maintaining an inventory of cluster nodes
* C. table and storage management layer for Hadoop
D. providing a REST gateway for jobs
Which action is performed prior to the Map step of a MapReduce v1 processing cycle? A. The job is sent sequentially to all nodes.
B. Output result sets are simplified to a single answer.
C. The data required is moved to the fastest nodes.
* D. The job is broken into individual task pieces and distributed.
Which integration API does Apache Ambari support? * A. REST
B. RMI
C. SOAP
D. RPC
How does an end-user interact with the IBM BigSheets tool? A. IBM-built desktop app
B. command line
C. mobile app
* D. web browser
Which software is at the core of the IBM BigInsights platform? * A. open source components
B. customer developed software
C. proprietary IBM libraries
D. cloud-based web services
What command is used to retrieve multiple rows out of an HBase table? A. pull
B. select
* C. scan
D. get

Which format is used to export extractor results? A. TXT


B. RTF
C. JSON
* D. CSV
How does Sqoop decide how to split data across mappers? * A. examining the primary key
B. moving the data to the closest network node
C. dividing the input bytes by available nodes
D. applying the split size to the data
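The primary-key splitting in answer A can be sketched as range math: take the key's min and max and divide that range evenly across the mappers, each mapper importing one slice. (Simplified boundary arithmetic; Sqoop itself handles data types and remainders.)

```python
def split_ranges(lo, hi, num_mappers):
    # Divide the inclusive key range [lo, hi] into num_mappers contiguous slices.
    size = (hi - lo + 1) / num_mappers
    bounds = [round(lo + i * size) for i in range(num_mappers)] + [hi + 1]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_mappers)]

# Keys 1..100 imported by 4 mappers:
ranges = split_ranges(1, 100, 4)
assert ranges == [(1, 25), (26, 50), (51, 75), (76, 100)]
```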
How should you use the pre-built extractors for a new project? A. Assign them to a new query.
B. Right click on the extractor and select Edit Output.
* C. Drag and drop them onto canvas.
D. Convert them to AQL Statements.
Why is the SYSPROC.SYSINSTALLOBJECT procedure used with Big SQL? A. to create a SNAPSHOT column
B. to set the location of the EXPLAIN.DDL
C. to specify the SQL statement to be explained
* D. to create an EXPLAIN table
What does the programmatic implementation of a Map function do? * A. Reads the data file and performs a transformation.
B. Combines previous results into an aggregate.
C. Locates the data in the DFS.
D. Computes the final result of the entire job.
Which statement will create a table with parquet files? * A. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) STORED AS PARQUETFILE;
B. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) STORED AS PARQUET;
C. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) SAVE AS PARQUETFILE;
D. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) SAVE AS PARQUET;
What is the JSqsh tool used for? A. web-based SQL editing
B. deploying the SQL JDBC driver
* C. command-line SQL queries
D. installing the IBM Data Server Manager (DSM)

What does the bucketing feature of Hive do? * A. sub-partitioning/grouping of data by hash within partitions
B. allows data to be stored in arrays
C. splits data into collections based on ranges
D. distributes the data dynamically for faster processing
What advantage does the Text Analytics Web UI give you? * A. It generates the AQL syntax for you.
B. It allows only single data types.
C. It allows only one type of file extension.
D. It teaches you how to write AQL syntax.
Which AQL candidate rule combines tuples from two views with the same schema? A. Blocks
B. Select
* C. Union
D. Sequence
Data collected within your organization has a short period of time when it is relevant. Which * A. Velocity
characteristic of a big data system does this represent? B. Validation
C. Variety
D. Volume
Assuming the same data is stored in multiple data formats, which format will provide faster * A. Parquet
query execution and require the least amount of IO operations to process? B. XML
C. flat file
D. JSON
Which feature of Text Analytics allows you to rollback your extractors when necessary? * A. Snapshots
B. Standard tokenizer
C. Scalar functions
D. Multilingual tokenizer
What defines a relation in an AQL extractor? * A. a view
B. a row
C. a schema
D. a column
Which command must be run after compiling a Java program so it can run on the Hadoop * A. jar cf name.jar *.class
cluster? B. hadoop classpath
C. jar tf name.jar
D. rm hadoop.class
What type of NoSQL datastore does HBase fall into? A. document
B. key-value
* C. column
D. graph
Which data inconsistency may appear while using ZooKeeper? A. excessively stale data views
* B. simultaneously inconsistent cross-client views
C. unreliable client updates across the cluster
D. out-of-order updates across clients
What is required to run an EXPLAIN statement in Big SQL? A. the explainable-sql-statement clause
B. the SYSPROC.SYSINSTALLOBJECT procedure
* C. proper authorization
D. a rule

Which command must be run first to become the HDFS user? * A. su - hdfs
B. hadoop fs
C. pwd
D. hdfs
Which Big SQL file format is human readable and supported by most tools, but is the least * A. Delimited
efficient file format? B. Parquet
C. Sequence
D. Avro
What is the default install location for the IBM Open Data Platform on Linux? A. /opt/ibm/iop
B. /var/iop
C. /usr/local/iop
* D. /usr/iop
You need to populate a Big SQL table to test an operation. Which INSERT statement is A. INSERT INTO ... SELECT FROM ...
recommended for testing, only because it does not support parallel reads or writes? * B. INSERT INTO ... VALUES (...)
C. INSERT INTO ... SELECT …
D. INSERT INTO ... SELECT ... WHERE …

Which command is used to launch an interactive Apache Spark shell? A. scala --spark
B. hadoop spark
C. spark
* D. spark-shell
Which data inconsistency may appear while using ZooKeeper? A. excessively stale data views
B. simultaneously inconsistent cross-client views
C. out-of-order updates across clients
D. unreliable client updates across the cluster
Which statement will create a table with parquet files? A. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) STORED AS PARQUET;
B. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) SAVE AS PARQUETFILE;
* C. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) STORED AS PARQUETFILE;
D. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) SAVE AS PARQUET;
What are extractors transformed into when they are executed? A. Candidate generation statements
B. BigSheets function statements
* C. Annotated Query Language (AQL) statements
D. Online Analytical Programming (OLAP) statements

You need to set up the command-line interface JSqsh to connect to a bigsql database. What is A. Run the $JSQSH_HOME/bin/JSQSH script.
the recommended method to set up the connection? B. Run the JSqsh driver wizard.
C. Modify database parameters in the .jsqsh/connections.xml file.
* D. Run the JSqsh connection wizard.
How will the following column mapping command be encoded? cf_data:full_names mapped by (last_name, First_name) separator ','
A. Hex
B. Character
C. Binary
* D. String
Which underlying data representation and access method does Big SQL use?
* A. Hive
B. TINYINT
C. MAP
D. SMALLINT
What does the MLlib component of Apache Spark support? * A. scalable machine learning
B. graph computation
C. SQL and HiveQL
D. stream processing

Which type of HBase column is mapped to multiple SQL columns?
A. Composite
B. Exclusive
* C. Dense
D. Double
What is a key factor in determining how to implement file compression with HDFS? * A. compression algorithm supports splitting
B. the CPU speed of the cluster members (MHz)
C. the speed of network transfers between nodes
D. the amount of storage space needed for all files

What is used in a Big SQL file system to organize tables?
A. JSqsh
B. DSM
* C. schemas
D. partitions
What command is used to start a Flume agent?
A. flume-start
B. flume-src
* C. flume-ng
D. flume-agent

Which component is required for Flume to work?
A. RDBMS
B. Syslog
* C. Data source
D. Interceptor

When creating a new table in Big SQL, what additional keyword is used in the CREATE TABLE statement to create the table in HDFS?
A. dfs
* B. hadoop
C. replicated
D. cloud

What is a feature of an Avro file?
A. directly readable by JavaScript
* B. versioning of the data
C. columns delimited by commas
D. formal schema language

What does the federation feature of Big SQL allow?
A. tuning server hardware performance
B. importing data into HDFS
C. rewriting statements for better execution performance
* D. querying multiple data sources in one statement

A Hadoop file listing is performed and one of the output lines is: -rw-r--r-- 5 biadmin biadmin 871233 2015-09-12 09:33 data.txt What does the 5 in the output represent?
A. permissions
* B. replication factor
C. login id of the file owner
D. data size
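The fields of that listing line can be pulled apart programmatically. A minimal sketch (plain Python string handling, not an HDFS API) that labels the second field as the replication factor:

```python
# Split one line of `hadoop fs -ls` output into its fields.
# Field layout: permissions, replication, owner, group, size, date, time, path.
line = "-rw-r--r-- 5 biadmin biadmin 871233 2015-09-12 09:33 data.txt"
fields = line.split()

replication = int(fields[1])  # the "5": number of HDFS block replicas
size_bytes = int(fields[4])   # file length in bytes, not the replica total

print(replication)  # 5
print(size_bytes)   # 871233
```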

What privilege is required to execute an EXPLAIN statement with INSERT privileges in Big SQL?
A. SYSMON authority
B. SECADM authority
* C. SQLADM authority
D. SYSCTRL authority

What does a computer need to understand unstructured data?
* A. context
B. usage
C. extractors
D. attribute types
What is the ApplicationMaster in YARN responsible for? (Choose two.)
(Please select ALL that apply)
* A. monitoring node execution status
* B. obtaining resources for computation
C. taking nodes offline for maintenance
D. allocating resources from all nodes

What is a limitation of Apache Spark? * A. It does not have universal tools.


B. It does not support streams.
C. It does not run Hadoop.
D. It does not in itself interact with SQL.
How is a sequence created in Canvas? A. Click on the New Literal button.
* B. Drag and drop one extractor onto another.
C. Right click on the extractor, and select Edit Output.
D. Select multiple extractors on the result pane.
Which type of key does HBase require in each row in an HBase table?
A. Duplicate
B. Foreign
* C. Unique
D. Primary
What is a key factor in determining how to implement file compression with HDFS? * A. compression algorithm supports splitting
B. the speed of network transfers between nodes
C. the amount of storage space needed for all files
D. the CPU speed of the cluster members (MHz)
Which two components make up a Hadoop node? (Choose two.) * A. CPU
(Please select ALL that apply) B. network
C. memory
* D. disk
Which statement is used to set the correct compatible collation with Big SQL? * A. CREATE SERVER
B. SEQUENCE
C. PUSHDOWN
D. CREATE WRAPPER
How does Apache Ambari use the Ganglia component?
A. to predict hardware failures
* B. to monitor cluster performance
C. to schedule cluster jobs
D. to add new nodes to the cluster

How can you fix duplicate results generated by an extractor from the same text because the text matches more than one dictionary entry?
A. edit output with overlapping matches
B. remove union statement
* C. remove with a consolidation rule
D. edit properties of the sequence

Which statement best describes Spark?
A. An instance of a federated database.
* B. A computing engine for a large-scale data set.
C. A logical view on top of Hadoop data.
D. An open source database query tool.
Which two tasks can an Apache Ambari admin do that a regular Apache Ambari user cannot do? (Choose two.)
(Please select ALL that apply)
A. browse job information
B. view service status
* C. modify configurations
* D. run service checks
What is the default replication factor for HDFS on a production cluster? A. 5
B. 1
C. 10
* D. 3
In the ZooKeeper environment, what does atomicity guarantee?
* A. Updates completely succeed or fail.
B. Updates are applied in the order created.
C. If an update succeeds, then it persists.
D. Every client sees the same view.
Which basic feature rule of AQL helps find an exact match to a single word or phrase?
A. Dictionary
B. Part of Speech
* C. Literals
D. Splits

What is the default data type in Big R?
* A. character
B. integer
C. complex
D. numeric

You have a very large Hadoop file system. You need to work on the data without migrating the A. MapReduce
data out or changing the data format. Which IBM tool should you use? * B. Big SQL
C. Data Server Manager
D. Pig
Which core component of the Hadoop framework is highly scalable and a common tool?
A. Sqoop
B. Pig
* C. MapReduce
D. Hive
How can you reduce the memory usage of the ANALYZE command in Big SQL?
A. Run everything in one batch.
B. Turn on distribution statistics.
* C. Run the command separately on different batches of columns.
D. Include all the columns in the batch.
What should you do in Text Analytics to fix an extractor that produces unwanted results?
A. Re-create the extractors.
B. Remove results with a consolidation rule.
* C. Create a new filter.
D. Edit the properties of the sequence.
QUESTIONS ANSWERS CORRECTION
A The list of deployments
B A list of your saved bookmarks
C The email address of the collaborator
You need to add a collaborator to your project. What do you need? D Your project ID c
A URL
B Scala
C File
Before you create a Jupyter notebook in Watson Studio, which two items are necessary? D Spark Instance
(Please select the TWO that apply) E Project de

A Database
B Wrapper
C Object Storage
Where does the unstructured data of a project reside in Watson Studio? D Tables c
A Watson Studio Desktop
B Watson Studio Cloud
C Watson Studio Business
Which Watson Studio offering used to be available through something known as IBM Bluemix? D Watson Studio Local b
A Data Assets
B Projects
C Collaborators
What is the architecture of Watson Studio centered on? D Analytic Assets b
A INSERT
B GRANT
C REVOKE
Which two commands would you use to give or remove certain privileges to/from a user? D LOAD
(Please select the TWO that apply) E SELECT bc
A 4
B 2
C 1
How many Big SQL management nodes do you need at minimum? D 3 c
A /apps/hive/warehouse/data
B /apps/hive/warehouse/bigsql
C /apps/hive/warehouse/
What is the default directory in HDFS where tables are stored? D /apps/hive/warehouse/schema c
A ./java mybigdata
B ./jsqsh mybigdata
C ./java tables
Using the Java SQL Shell, which command will connect to a database called mybigdata? D ./jsqsh go mybigdata b
A 777
B 755
C 700
Which directory permissions need to be set to allow all users to create their own schema? D 666 a
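Octal mode 777 grants read, write, and execute to owner, group, and others, which is what lets every user create entries under the directory. A local-filesystem sketch of the octal math (using `os.chmod` rather than `hdfs dfs -chmod`, so only the permission bits are being demonstrated):

```python
import os
import stat
import tempfile

# Create a scratch directory and open it up to all users.
d = tempfile.mkdtemp()
os.chmod(d, 0o777)

mode = stat.S_IMODE(os.stat(d).st_mode)
print(oct(mode))  # 0o777: owner/group/other all have rwx

# 755 by contrast drops write for group and other,
# so only the owner could create schema subdirectories.
os.chmod(d, 0o755)
print(oct(stat.S_IMODE(os.stat(d).st_mode)))  # 0o755
```

The same bit layout applies to the HDFS permission model, which mirrors POSIX modes.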
A Files
B Schemas
C Hives
What are Big SQL database tables organized into? D Directories b
A hdfs dfs -chmod 770 /hive/warehouse
B hdfs dfs -chmod 755 /hive/warehouse
C hdfs dfs -chmod 700 /hive/warehouse
D hdfs dfs -chmod 666 /hive/warehouse
You have a distributed file system (DFS) and need to set permissions on the /hive/warehouse directory. Which command would you use? c
A It grants or revokes certain user privileges.
B It grants or revokes certain directory privileges.
C It limits the rows or columns returned based on certain criteria.
Which definition best describes RCAC? D It limits access by using views and stored procedures. c
A A data type of a column describing its value.
B The defined format and rules around a delimited file.
C A container for any record format.
Which statement best describes a Big SQL database table? D A directory with zero or more data files. d
A Big SQL can exploit advanced features
B Data interchange outside Hadoop
C Supported by multiple I/O engines
What is an advantage of the ORC file format? D Efficient compression d

A GRANT
B umask
C Kerberos
D HDFS
You need to determine the permission setting for a new schema directory. Which tool would you use? b
A Scheduler
B DSM
C Jupyter
Which tool would you use to create a connection to your Big SQL database? D Ambari b
A bigsql.alltables.io.doAs
B DB2COMPOPT
C DB2_ATS_ENABLE
D $BIGSQL_HOME/conf
E bigsql.impersonation.create.table.grant.public
You need to enable impersonation. Which two properties in the bigsql-conf.xml file need to be set?
(Please select the TWO that apply) ae
A CREATE AS parquet
B STORED AS parquetfile
C STORED AS parquet
You are creating a new table and need to format it with parquet. Which partial SQL statement would you use? D CREATE AS parquetfile b
A TRANSLATE FUNCTION
B CREATE FUNCTION
C ALTER MODULE ADD FUNCTION
Which command creates a user-defined schema function? D ALTER MODULE PUBLISH FUNCTION b
A YARN
B Spark
C Pig
Which Apache Hadoop application provides an SQL-like interface to allow abstraction of data on Hadoop? D Hive d
A A messaging system for real-time data pipelines.
B A wizard for installing Hadoop services on host servers.
C Moves information to/from structured databases.
Which description characterizes a function provided by Apache Ambari? D Moves large amounts of streaming event data. b
A Writes to a leader server will always succeed.
B All servers keep a copy of the shared data in memory.
C There can be more than one leader server at a time.
Which statement accurately describes how ZooKeeper works? D Clients connect to multiple servers at the same time. b
A MemcacheD
B CouchDB
C Riak
Which NoSQL datastore type began as an implementation of Google's BigTable that can store any kind of data? D Hbase d
A It is a powerful platform for managing large volumes of structured data.
B It is designed specifically for IBM Big Data customers.
C It is a Hadoop distribution based on a centralized architecture with YARN at its core.
Which statement is true about Hortonworks Data Platform (HDP)? D It is engineered and developed by IBM's BigInsights team. c
A Pig
B Hive
C Python
What is the name of the Hadoop-related Apache project that utilizes an in-memory architecture? D Spark d
A It runs on Hadoop clusters with RAM drives configured on each DataNode.
B It supports HDFS, MS-SQL, and Oracle.
C It is much faster than MapReduce for complex applications on disk.
Which statement about Apache Spark is true? D It features APIs for C++ and .NET. c
A It determines the size and distribution of data split in the Map phase.
B It aggregates all input data before it goes through the Map phase.
C It reduces the amount of data that is sent to the Reducer task nodes.
Which statement is true about the Combiner phase of the MapReduce architecture? D It is performed after the Reducer phase to produce the final output. c

A MapReduce v1 APIs cannot be used with YARN.


B MapReduce v1 APIs provide a flexible execution environment to run MapReduce.
C MapReduce v1 APIs are implemented by applications which are largely independent of the execution environment.
Which statement is true about MapReduce v1 APIs? D MapReduce v1 APIs define how MapReduce jobs are executed. c
A Hive
B Cloudbreak
C Big SQL
D MapReduce
Hadoop 2 consists of which three open-source sub-projects maintained by the Apache Software E HDFS
(Please select the THREE that apply) F YARN def
A Authorization Provider
B Ambari Alert Framework
C Postgres RDBMS
Which component of the Apache Ambari architecture integrates with an organization's LDAP or Active Directory? D REST API a
A Authenticating and auditing user access.
B Loading bulk data into an Hadoop cluster.
What are two services provided by ZooKeeper? C Maintaining configuration information.
(Please select the TWO that apply) D Providing distributed synchronization. cd
A Sesame
B Neo4j
C MongoDB
What is an example of a Key-value type of NoSQL datastore? D REDIS d
A MLlib
B RDD
C Mesos
Which Spark Core function provides the main element of Spark API? D YARN b
A org.apache.mr
B org.apache.hadoop.mr
C org.apache.hadoop.mapred
Which is the Java class prefix for the MapReduce v1 APIs? D org.apache.mapreduce c
A SSD
B JBOD
C RAID
Which hardware feature on an Hadoop datanode is recommended for cost-efficient performance? D LVM b
A ApplicationMaster
B JobMaster
C TaskManager
Under the YARN/MRv2 framework, the JobTracker functions are split into which two daemons? D ScheduleManager
(Please select the TWO that apply) E ResourceManager ae
A The number of rows to commit per transaction.
B The number of rows to send to each mapper.
C The table name to export from the database.
What does the split-by parameter tell Sqoop? D The column to use as the primary key. d
A Ambari
B Google File System
C HBase
Hadoop uses which two Google technologies as its foundation? D YARN
(Please select the TWO that apply) E MapReduce be
A Requires extremely rapid processing.
B Data is processed in batch.
Which two are attributes of streaming data? C Simple, numeric data.
(Please select the TWO that apply) D Sent in high volume. ad
Which statement accurately describes how ZooKeeper works? All servers keep a copy of the shared data in memory.

A API and perimeter security.
B Management of Kerberos in the cluster.
What two security functions does Apache Knox provide? C Proxying services.
(Please select the TWO that apply) D Database field access auditing. ac
A REDIS
B HBase
C Cassandra
What is an example of a NoSQL datastore of the "Document Store" type? D MongoDB d
A Hive
B Sqoop
C Pig
Which Apache Hadoop application provides a high-level programming language for data transformation? D Zookeeper c
A Big Data
B Big Match
C Big Replicate
D Big SQL
What are three IBM value-add components to the Hortonworks Data Platform (HDP)? E Big YARN
(Please select the THREE that apply) F Big Index bcd
A Accumulo
B HBase
C Oozie
Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2, MySQL, or other relational database? D Sqoop d
A Data Protection
B Speed
C Resiliency
Which three are a part of the Five Pillars of Security? D Audit
(Please select the THREE that apply) E Administration ade
A MLlib
B Mesos
C Spark SQL
Which component of the Spark Unified Stack allows developers to intermix structured database queries with Spark programs? D Java c
A Hadoop YARN
B Apache Mesos
C Nomad
Apache Spark can run on which two of the following cluster managers? D Linux Cluster Manager
(Please select the TWO that apply) E oneSIS ab
A Suitable for transaction processing.
B Libraries that support SQL queries.
C APIs for Scala, Python, C++, and .NET.
Which feature makes Apache Spark much easier to use than MapReduce? D Applications run in-memory. b
A Run Sqoop using the vi editor.
B Use the --import-command line argument.
What are two ways the command-line parameters for a Sqoop invocation can be simplified? C Include the --options-file command line argument.
(Please select the TWO that apply) D Place the commands in a file. cd
A NodeChildrenChanged
B NodeDeleted
Which two are valid watches for ZNodes in ZooKeeper? C NodeExpired
(Please select the TWO that apply) D NodeRefreshed ab
A ResourceManager
B JobMaster
C ScheduleManager
Under the YARN/MRv2 framework, which daemon arbitrates the execution of tasks among all the applications in the system? D ApplicationMaster a

A NiFi
B Hortonworks Data Flow
C Druid
What is the preferred replacement for Flume? D Storm b
A Use the -mapper 1 parameter.
B Use the --limit mapper=1 parameter.
C Use the -m 1 parameter.
How can a Sqoop invocation be constrained to only run one mapper? D Use the --single parameter. c
A Reduce
B Map
C Combiner
Under the MapReduce v1 programming model, which optional phase is executed simultaneously with the Map phase? D Split c
A Acquisition
B Manipulation
C Exploration
What is the first step in a data science pipeline? D Analytics a
A Holding the output of a computation.
B Configuring data connections.
C Documenting the computational process.
What is a markdown cell used for in a data science notebook? D Writing code to transform data. c
A Common desktop app.
B Database interface.
C Linux SSH session.
What does the user interface for Jupyter look like to a user? D App in web browser. d
A To display a simple bar chart of data on the screen.
B To collect video for use in streaming data applications.
C To perform certain data transformation quickly.
Why might a data scientist need a particular kind of GPU (graphics processing unit)? D To input commands to a data science notebook. c
A %dirmagic
B %lsmagic
C %list-magic
What command is used to list the "magic" commands in Jupyter? D %list-all-magic b
A The list of deployments
B A list of your saved bookmarks
You need to add a collaborator to your project. What do you need? C The email address of the collaborator
D Your project ID c
A record locking
B batch processing
C machine learning
D transaction processing
Apache Spark provides a single, unifying platform for which three of the following types of operations? E graph operations
(Please select the THREE that apply) F ACID transactions bce
A Scala
B Python
C Java
D .NET
Which three programming languages are directly supported by Apache Spark? E C#
(Please select the THREE that apply) F C++ abc

A Map -> Split -> Reduce -> Combine

B Map -> Combine -> Shuffle -> Reduce

C Map -> Combine -> Reduce -> Shuffle


Under the MapReduce v1 programming model, which shows the proper order of the full set of phases? D Split -> Map -> Combine -> Reduce b
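The ordering in option B can be traced with a toy word count. A minimal sketch (plain Python, not Hadoop) where the combiner pre-aggregates each mapper's output locally before the shuffle groups keys across mappers for the reduce:

```python
from collections import defaultdict

splits = ["a b a", "b b c"]  # one input split per mapper

# Map: emit (word, 1) pairs per split.
mapped = [[(w, 1) for w in s.split()] for s in splits]

# Combine: local aggregation on each mapper before any data moves.
combined = []
for pairs in mapped:
    local = defaultdict(int)
    for w, n in pairs:
        local[w] += n
    combined.append(list(local.items()))

# Shuffle: group values by key across all mappers.
shuffled = defaultdict(list)
for pairs in combined:
    for w, n in pairs:
        shuffled[w].append(n)

# Reduce: final aggregation per key.
counts = {w: sum(ns) for w, ns in shuffled.items()}
print(sorted(counts.items()))  # [('a', 2), ('b', 3), ('c', 1)]
```

The combiner step is why less data crosses the network: each mapper ships one pre-summed pair per word instead of one pair per occurrence.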
A ResourceManager
B JobMaster
C ApplicationMaster
Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the NodeManager(s)? D TaskManager c
A Collector
B Source
C Stream
What is the final agent in a Flume chain named? D Agent a
A Availability
B Authorization
What are two security features Apache Ranger provides? C Authentication
(Please select the TWO that apply) D Auditing bd
A Worker nodes store results on their own local file systems.
B Data is aggregated by worker nodes.
C Worker nodes process pieces in parallel.
Under the MapReduce v1 programming model, what happens in a "Reduce" step? D Input is split into pieces. b

A MapReduce
B Spark
C HBase
Which Apache Hadoop component can potentially replace an RDBMS as a large Hadoop datastore? D Ambari c
A RCFile
B SequenceFiles
C Flat
Which data encoding format supports exact storage of all data in binary representations? D Parquet b
A One time export and import of a database.
B An application evaluating sensor data in real-time.
C A system that stores many records in a database.
Which statement describes an example of an application using streaming data? D A web application that supports 10,000 users. b
A Scalability
B Resource utilization
C TaskTrackers can be a bottleneck to MapReduce jobs
What are two primary limitations of MapReduce v1? D Number of TaskTrackers limited to 1,000
(Please select the TWO that apply) E Workloads limited to MapReduce ab
A RAM
B network
C CPU
Which component of an Hadoop system is the primary cause of poor performance? D disk latency d
A Partial failure of the nodes during execution.
B Finding a particular node within the cluster.
What are two common issues in distributed systems? C Reduced performance when compared to a single server.
(Please select the TWO that apply) D Distributed systems are harder to scale up. ab
A Ambari Metrics System
B Ambari Alert Framework
C Ambari Server
Which component of the Apache Ambari architecture provides statistical data to the dashboard D Ambari Wizard a

A Impersonation
B Grant/Revoke privileges
C Fluid query
Which Big SQL feature allows users to join a Hadoop data set to data in external databases? D Integration c
A Data source
B User mapping
C Nickname
When connecting to an external database in a federation, you need to use the correct database D Wrapper d
A Parsing and loading data into a notebook.
B Autoconfiguring data connections using a registry.
C Extending the core language with shortcuts.
What is a "magic" command used for in Jupyter? D Running common statistical analyses. c
A RAID-0
B Online Transactional Processing
C Parallel Processing
Which computing technology provides Hadoop's high performance? D Online Analytical Processing c
A ScheduleManager
B ResourceManager
C ApplicationMaster
Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are components of which daemon? D TaskManager b
A large number of small data files
B solid state disks
C immediate failover of failed disks
D high-speed networking between nodes
Which two factors in a Hadoop cluster increase performance most significantly? E data redundancy on management nodes
(Please select the TWO that apply) F parallel reading of large data files df
A MapReduce
B YARN
C HDFS
Which component of the Hortonworks Data Platform (HDP) is the architectural center of Hadoop? D Hbase b
A Ambari Alert Framework
B Ambari Wizard
C Ambari Metrics System
If a Hadoop node goes down, which Ambari component will notify the Administrator? D REST API a
A Code

B Kernel
C Output
Which type of cell can be used to document and comment on a process in a Jupyter notebook? D Markdown d
A Notebooks can be used by multiple people at the same time.
B Users must authenticate before using a notebook.
C Notebooks can be connected to big data engines such as Spark.
Which is an advantage that Zeppelin holds over Jupyter? D Zeppelin is able to use the R language. a
A Jaql
B Fault tolerance through HDFS replication
C Adaptive MapReduce
Which capability does IBM BigInsights add to enrich Hadoop? D Parallel computing on commodity servers c
A value
B volume
C verifiability
What is one of the four characteristics of Big Data? D volatility b
A Hadoop Common
B Hadoop HBase
C MapReduce
Which Hadoop-related project provides common utilities and libraries that support other Hadoop subprojects? D BigTable a
A Federated Discovery and Navigation
B Text Analysis
C Stream Computing
Which type of Big Data analysis involves the processing of extremely large volumes of constantly moving data? D MapReduce c
A 64-bit architecture
B disk latency
C MIPS
Which primary computing bottleneck of modern computers is addressed by Hadoop? D limited disk capacity b
A stream computing
B data warehousing
C analytics
Which Big Data function improves the decision-making capabilities of organizations by enabling D distributed file system c
A HBase
B Apache
C Jaql
What is one of the two technologies that Hadoop uses as its foundation? D MapReduce d
A a high throughput, shared file system
B high availability of the NameNode
C data access performed by an RDBMS
What key feature does HDFS 2.0 provide that HDFS does not? D random access to data in the cluster b
A LOAD
B JOIN
C TOP
What are two of the core operators that can be used in a Jaql query? (Select two.) D SELECT bc
A SQL-like
B compiled language
C object oriented
Which type of language is Pig? D data flow d
A hdfs.conf
B hadoop-configuration.xml
C hadoop.conf
If you need to change the replication factor or increase the default storage block size, which file D hdfs-site.xml d
A The file(s) must be stored on the local file system where the map reduce job was developed.
B The file(s) must be stored in HDFS or GPFS.
C The file(s) must be stored on the JobTracker.
To run a MapReduce job on the BigInsights cluster, which statement about the input file(s) must be true? D No matter where the input files are before, they will be automatically copied to where the tasks run. b
A operating system independence
B posix compliance
C no single point of failure
What is a characteristic of IBM GPFS that distinguishes it from other distributed file systems? D blocks that are stored on different nodes b
A Pig is used for creating MapReduce programs.
B Pig has a shell interface for executing commands.
C Pig is not designed for random reads/writes or low-latency queries.
Which statement represents a difference between Pig and Hive? D Pig uses Load, Transform, and Store. d
A hdfs -dir mydata
B hadoop fs -mkdir mydata
C hadoop fs -dir mydata
Which command helps you create a directory called mydata on HDFS? D mkdir mydata b
A Reduce
B Shuffle
C Combine
In which step of a MapReduce job is the output stored on the local disk? D Map d

A Worker nodes process individual data segments in parallel.


B Worker nodes store results in the local file system.
C Input data is split into smaller pieces.
Under the MapReduce programming model, which task is performed by the Reduce step? D Data is aggregated by worker nodes. d
A Reducer
B JobScheduler
C TaskTracker
Which element of the MapReduce architecture runs map and reduce jobs? D JobTracker c
A spread data across a cluster of computers
B provide structure to unstructured or semi-structured data
C increase storage capacity through advanced compression algorithms
What is one of the two driving principles of MapReduce? D provide a platform for highly efficient transaction processing a
A Cluster
B Distributed
C Remote
D Debugging
When running a MapReduce job from Eclipse, which BigInsights execution models are available? E Local ae
A The number of reducers is always equal to the number of mappers.
B The number of mappers and reducers can be configured by modifying the mapred-site.xml file.
C The number of mappers and reducers is decided by the NameNode.
Which statement is true regarding the number of mappers and reducers configured in a cluster? D The number of mappers must be equal to the number of nodes in a cluster. b
A hadoop size
B hdfs -du
C hdfs fs size
Which command displays the sizes of files and directories contained in the given directory, or the length of a file? D hadoop fs -du d
A three
B two
C one
Following the most common HDFS replica placement policy, when the replication factor is three D none c
A copies Job Resources to the shared file system
B coordinates the job execution
C executes the map and reduce functions
In the MapReduce processing model, what is the main function performed by the JobTracker? D assigns tasks to each cluster node b
A Both are data flow languages.
B Both require schema.
C Both use Jaql query language.
How are Pig and Jaql query languages similar? D Both are developed primarily by IBM. a
A to manage storage attached to nodes
B to coordinate MapReduce jobs
C to regulate client access to files
Under the HDFS architecture, what is one purpose of the NameNode? D to periodically report status to DataNode c
A hadoop fs list
B hdfs root
C hadoop fs -ls /
Which command should be used to list the contents of the root directory in HDFS? D hdfs list / c
A runs map and reduce tasks
B keeps the work physically close to the data
C reports status of DataNodes
What is one function of the JobTracker in MapReduce? D manages storage b
A built-in UDFs and indexing
B platform-specific SQL libraries
C an RDBMS such as DB2 or MySQL
In addition to the high-level language Pig Latin, what is a primary component of the Apache Pig platform? D runtime environment d

A Data is accessed through MapReduce.


B Data is designed for random access read/write.
C Data can be processed over long distances without a decrease in performance.
Which statement is true about Hadoop Distributed File System (HDFS)? D Data can be created, updated and deleted. a
A managing customer information in a CRM database
B sentiment analytics from social media blogs
C product cost analysis from accounting systems
Which is a use-case for Text Analytics? D health insurance cost/benefit analysis from payroll data b
A BigSheets client
B Microsoft Excel
C Eclipse
Which tool is used to access BigSheets? D Web Browser d
A Hive metastore
B RDBMS
C MapReduce
Which technology does Big SQL utilize for access to shared catalogs? D HCatalog a
A display view <view_name>
B return view <view_name>
C output view <view_name>
Which statement will make an AQL view have content displayed? D export view <view_name> c
A Text Analytics
B Stream Computing
C Data Warehousing
You work for a hosting company that has data centers spread across North America. You are try D Temporal Analysis a
A Thrift client
B Hive shell
C Hive SQL client
Which utility provides a command-line interface for Hive? D Hive Eclipse plugin b
A It is a data flow language for structured data based on Ansi-SQL.
B It is a distributed file system that replicates data across a cluster.
C It is an open source implementation of Google's BigTable.
What is an accurate description of HBase? D It is a database schema for unstructured Big Data. c
A BigSQL
B BigSheets
C Avro
Which Hadoop-related technology provides a user-friendly interface, which enables business users to analyze big data? D HBase b
A Text Analytics is the most common way to derive value from Big Data.
B MapReduce is unable to process unstructured text.
C Data warehouses contain potentially valuable information.
What drives the demand for Text Analytics? D Most of the world's data is in unstructured or semi-structured text. d
A An external table refers an existing location outside the warehouse directory.
B An external table refers to a table that cannot be dropped.
C An external table refers to the data from a remote database.
In Hive, what is the difference between an external table and a Hive managed table? D An external table refers to the data stored on the local file system. a
A It provides all the capabilities of an RDBMS plus the ability to manage Big Data.
B It is a database technology that does not use the traditional relational model.
C It is based on the highly scalable Google Compute Engine.
Which statement about NoSQL is true? D IItt is an IBM project designed to enable DB2 to manage Big Data. b
A "Copy" to create a new sheet with the other workbook data in the current workbook
B "Group" to bring together the two workbooks
C "Load" to create a new sheet with the other workbook data in the current workbook
If you need to JOIN data from two workbooks, which operation should be performed beforehand? D "Add" to add the other workbook data to the current workbook c
A to get detailed information about the table
B to view data in an Hbase table
C to report any inconsistencies in the database
What is the "scan" command used for in HBase? D to list all tables in Hbase c
A Eclipse with BigInsights tools for Eclipse plugin
B BigInsights Console with AQL plugin
C AQLBuilder
Which tool is used for developing a BigInsights Text Analytics extractor? D AQL command line a
A Pre-create regions by specifying splits in create table command and use the insert command
B Pre-create regions by specifying splits in create table command and bulk loading the data
C Pre-create the column families when creating the table and use the put command to load the data
What is the most efficient way to load 700MB of data when you create a new HBase table? D Pre-create the column families when creating the table and bulk loading the data. b
The following sequence of commands is executed:
create 'table_1','column_family1','column_family2'
put 'table_1','row1','column_family1:c11','r1v11'
put 'table_1','row2','column_family1:c12','r1v12'
put 'table_1','row2','column_family2:c21','r1v21'
put 'table_1','row3','column_family1:d11','r1v11'
put 'table_1','row2','column_family1:d12','r1v12'
put 'table_1','row2','column_family2:d21','r1v21'
A 4
B 3
C 6
In HBase, which value will the "count 'table_1'" command return? D 2 b
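The sequence above can be checked with a plain-Python sketch (no HBase needed): a dict keyed by row key stands in for the table, since `count` in the HBase shell returns the number of rows, not the number of cells.

```python
# Toy stand-in for an HBase table: row key -> {column: value}.
# Repeated puts to the same row key add cells, not rows, so
# `count 'table_1'` returns the number of distinct row keys.
puts = [
    ("row1", "column_family1:c11", "r1v11"),
    ("row2", "column_family1:c12", "r1v12"),
    ("row2", "column_family2:c21", "r1v21"),
    ("row3", "column_family1:d11", "r1v11"),
    ("row2", "column_family1:d12", "r1v12"),
    ("row2", "column_family2:d21", "r1v21"),
]

table = {}
for row, column, value in puts:
    table.setdefault(row, {})[column] = value

count = len(table)  # analogous to `count 'table_1'`
print(count)        # -> 3 (row1, row2, row3)
```

Four of the six puts target row2, which is why the cell count (6) differs from the row count (3).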
A TRANSFORM
B SELECT
C GET
Which Hive command is used to query a table? D EXPAND b
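For reference, Hive's query verb is the standard SQL SELECT. The sketch below uses Python's built-in sqlite3 as a stand-in for a Hive session; the table name and rows are made up for illustration.

```python
# Illustration only: Hive tables are queried with SELECT, just like
# standard SQL. sqlite3 stands in here for a Hive session.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("ada", "eng"), ("lin", "ops")])

rows = conn.execute(
    "SELECT name FROM employees WHERE dept = 'eng'").fetchall()
print(rows)  # -> [('ada',)]
```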
A because SQL enhances query performance
B because the MapReduce Java API is sometimes difficult to use
C because data stored in a Hadoop cluster lends itself to structured SQL queries
Why develop SQL-based query languages that can access Hadoop data sets? D because the data stored in Hadoop is always structured b
A It allows Hadoop to apply the schema-on-ingest model to unstructured Big Data.
B It allows an RDBMS to maintain referential integrity on a Hadoop data set.
C It allows customers to leverage high-end server platforms to manage Big Data.
Which key benefit does NoSQL provide? D It can cost-effectively manage data sets too large for traditional RDBMS. d
A Hadoop data is highly structured.
B Data is in many formats.
C Data is located on a distributed file system.
What makes SQL access to Hadoop data difficult? D Hadoop requires pre-defined schema. b
A list tables
B describe tables
C show all
Which command can be used in Hive to list the tables available in a database/schema? D show tables d
A to count the number of columns of a table
B to count the number of column families of a table
C to count the number of rows in a table
In HBase, what is the "count" command used for? D to count the number of regions of a table c
A HBase
B Pig
C Jaql
Which Hadoop-related technology supports analysis of large datasets stored in HDFS using an SQL-like language? D Hive d
A They need to be marked as "Shared."
B They need to be copied under the user home directory.
C They need to be deployed with proper privileges.
How can the applications published to BigInsights Web Console be made available for users to e… D They need to be linked with the master application. c
A Eclipse
B Oozie
C Jaql
Which component of Apache Hadoop is used for scheduling and running workflow jobs? D Task Launcher b
A validater
B replicater
C crawler
What is one of the main components of Watson Explorer (InfoSphere Data Explorer)? D compressor c
A analyze and react to data in motion before it is stored
B find and analyze historical stream data stored on disk
C analyze and summarize product sentiments posted to social media
IBM InfoSphere Streams is designed to accomplish which Big Data function? D execute ad-hoc queries against a Hadoop-based data warehouse c
A InfoSphere Information Server
B InfoSphere Streams
C InfoSphere BigInsights
Which IBM Big Data solution provides low-latency analytics for processing data-in-motion? D PureData for Analytics b
A Avro
B HBase
C Eclipse
Which IBM tool enables BigInsights users to develop, test and publish BigInsights applications? D BigInsights Applications Catalog c
A enabling customers to efficiently index and access large volumes of data
B gaining new insight through the capabilities of the world's interconnected intelligence
C providing solutions to help customers manage and grow large database systems
Which description identifies the real value of Big Data and Analytics? D using modern technology to efficiently store the massive amounts of data generated by b
Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2, MySQL, or other databases? Sqoop
Which NoSQL datastore type began as an implementation of Google's BigTable that can store any type of data and sc… HBase
A Parallel Processing
B Online Analytical Processing
C Online Transactional Processing
Which computing technology provides Hadoop's high performance? D RAID-0 a
Test Blanc
Question 1
You need to determine the permission setting for a new schema directory. Which tool would you use?
Your answer
A. Kerberos
B. HDFS
C. umask
D. GRANT
Question 2
Using the Java SQL Shell, which command will connect to a database called mybigdata?
Your answer
A. ./java tables
B. ./jsqsh mybigdata
C. ./jsqsh go mybigdata
D. ./java mybigdata
Question 3
Which tool would you use to create a connection to your Big SQL database?
Your answer
A. Ambari
B. Scheduler
C. DSM
D. Jupyter
Question 4
Which definition best describes RCAC?
Your answer
A. It limits the rows or columns returned based on certain criteria.
B. It grants or revokes certain directory privileges.
C. It grants or revokes certain user privileges.
D. It limits access by using views and stored procedures.
Question 5
What is an advantage of the ORC file format?
Your answer
A. Data interchange outside Hadoop
B. Supported by multiple I/O engines
C. Big SQL can exploit advanced features
D. Efficient compression
Question 6
You need to enable impersonation. Which two properties in the bigsql-conf.xml file need to be marked true?
Your answer
A. bigsql.alltables.io.doAs
B. bigsql.impersonation.create.table.grant.public
C. DB2_ATS_ENABLE
D. DB2COMPOPT
E. $BIGSQL_HOME/conf
Question 7
What are Big SQL database tables organized into?
Your answer
A. Directories
B. Files
C. Hives
D. Schemas
Question 8
Which two commands would you use to give or remove certain privileges to/from a user?
Your answer
A. GRANT
B. REVOKE
C. INSERT
D. SELECT
E. LOAD
You have a distributed file system (DFS) and need to set permissions on the /hive/warehouse directory to allow access to ONLY the bigsql user. Which command would you run?
Your answer
A. hdfs dfs -chmod 770 /hive/warehouse
B. hdfs dfs -chmod 700 /hive/warehouse
C. hdfs dfs -chmod 755 /hive/warehouse
D. hdfs dfs -chmod 666 /hive/warehouse
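As a side note on what mode 700 means: the owner gets read/write/execute, group and others get nothing, which is why `hdfs dfs -chmod 700` restricts a directory to a single user. A local demonstration with Python's os and stat modules:

```python
# Mode 700 = rwx for the owner, nothing for group and others.
# Demonstrated on a local directory; HDFS uses the same octal notation.
import os
import stat
import tempfile

path = tempfile.mkdtemp()
os.chmod(path, 0o700)

mode = stat.S_IMODE(os.stat(path).st_mode)      # permission bits only
symbolic = stat.filemode(os.stat(path).st_mode) # ls-style string
os.rmdir(path)

print(oct(mode), symbolic)  # -> 0o700 drwx------
```

By contrast, 770 would also admit the group, and 755/666 open the directory to everyone, so only 700 limits access to the bigsql user alone.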
Question 10
Which statement best describes a Big SQL database table?
Your answer
A. A container for any record format.
B. The defined format and rules around a delimited file.
C. A directory with zero or more data files.
D. A data type of a column describing its value.
Question 11
Which Big SQL feature allows users to join a Hadoop data set to data in external databases?
Your answer
A. Integration
B. Fluid query
C. Impersonation
D. Grant/Revoke privileges
Question 12
Which directory permissions need to be set to allow all users to create their own schema?
Your answer
A. 700
B. 755
C. 666
D. 777
Question 13
What is the default directory in HDFS where tables are stored?
Your answer
A. /apps/hive/warehouse/
B. /apps/hive/warehouse/data
C. /apps/hive/warehouse/bigsql
D. /apps/hive/warehouse/schema
Question 14
How many Big SQL management nodes do you need at minimum?
Your answer
A. 1
B. 2
C. 4
D. 3
Question 15
When connecting to an external database in a federation, you need to use the correct database driver and protocol. What is this federation component called in Big SQL?
Your answer
A. Wrapper
B. User mapping
C. Data source
D. Nickname
Question 16
Which Apache Hadoop application provides an SQL-like interface to allow abstraction of data on semi-structured data in a Hadoop datastore?
Your answer
A. Pig
B. Spark
C. Hive
D. YARN
Question 17
Apache Spark provides a single, unifying platform for which three of the following types of operations?
Your answer
A. record locking
B. machine learning
C. batch processing
D. ACID transactions
E. graph operations
F. transaction processing
Question 18
Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the NodeManager(s) to execute and monitor tasks?
Your answer
A. JobMaster
B. ApplicationMaster
C. ResourceManager
D. TaskManager
Question 19
Which component of the Hortonworks Data Platform (HDP) is the architectural center of Hadoop and provides resource management and a central platform for Hadoop applications?
Your answer
A. MapReduce
B. HBase
C. YARN
D. HDFS
Question 20
What is the final agent in a Flume chain named?
Your answer
A. Agent
B. Stream
C. Collector
D. Source
Question 21
Which Apache Hadoop application provides a high-level programming language for data transformation on unstructured data?
Your answer
A. Sqoop
B. Zookeeper
C. Hive
D. Pig
Question 22
Which component of the Spark Unified Stack allows developers to intermix structured database queries with Spark's programming language?
Your answer
A. Java
B. MLlib
C. Mesos
D. Spark SQL
Question 23
Under the MapReduce v1 programming model, what happens in a "Reduce" step?
Your answer
A. Worker nodes process pieces in parallel.
B. Worker nodes store results on their own local file systems.
C. Input is split into pieces.
D. Data is aggregated by worker nodes.
Question 24
Which statement is true about Hortonworks Data Platform (HDP)?
Your answer
A. It is designed specifically for IBM Big Data customers.
B. It is a powerful platform for managing large volumes of structured data.
C. It is a Hadoop distribution based on a centralized architecture with YARN at its core.
D. It is engineered and developed by IBM's BigInsights team.
Question 25
Apache Spark can run on which two of the following cluster managers?
Your answer
A. Nomad
B. Hadoop YARN
C. oneSIS
D. Apache Mesos
E. Linux Cluster Manager
Question 26
Which statement is true about the Combiner phase of the MapReduce architecture?
Your answer
A. It determines the size and distribution of the data split in the Map phase.
B. It reduces the amount of data that is sent to the Reducer task nodes.
C. It is performed after the Reducer phase to produce the final output.
D. It aggregates all input data before it goes through the Map phase.
Question 27
Which three are a part of the Five Pillars of Security?
Your answer
A. Administration
B. Audit
C. Speed
D. Data Protection
E. Resiliency
Question 28
Which feature makes Apache Spark much easier to use than MapReduce?
Your answer
A. Libraries that support SQL queries.
B. Applications run in-memory.
C. Suitable for transaction processing.
D. APIs for Scala, Python, C++, and .NET.
Question 29
Under the YARN/MRv2 framework, the JobTracker functions are split into which two daemons?
Your answer
A. ResourceManager
B. TaskManager
C. JobMaster
D. ApplicationMaster
E. ScheduleManager
Question 30
Which computing technology provides Hadoop's high performance?
Your answer
A. RAID-0
B. Parallel Processing
C. Online Analytical Processing
D. Online Transactional Processing
Question 31
What are two security features Apache Ranger provides?
Your answer
A. Availability
B. Authorization
C. Authentication
D. Auditing
Question 32
Which two are attributes of streaming data?
Your answer
A. Simple, numeric data.
B. Sent in high volume.
C. Data is processed in batch.
D. Requires extremely rapid processing.
Question 33
Which statement describes an example of an application using streaming data?
Your answer
A. An application evaluating sensor data in real-time.
B. A web application that supports 10,000 users.
C. One time export and import of a database.
D. A system that stores many records in a database.
Question 34
Which statement is true about MapReduce v1 APIs?
Your answer
A. MapReduce v1 APIs are implemented by applications which are largely independent of t…
B. MapReduce v1 APIs define how MapReduce jobs are executed.
C. MapReduce v1 APIs provide a flexible execution environment to run MapReduce.
D. MapReduce v1 APIs cannot be used with YARN.
Question 35
What two security functions does Apache Knox provide?
Your answer
A. API and perimeter security.
B. Database field access auditing.
C. Proxying services.
D. Management of Kerberos in the cluster.
Question 36
Hadoop 2 consists of which three open-source sub-projects maintained by the Apache Software Foundation?
Your answer
A. HDFS
B. Hive
C. Big SQL
D. Cloudbreak
E. YARN
F. MapReduce
Question 37
Which Hadoop ecosystem tool can import data into a Hadoop cluster from DB2, MySQL, or other databases?
Your answer
A. Sqoop
B. HBase
C. Accumulo
D. Oozie
Question 38
Which statement accurately describes how ZooKeeper works?
Your answer
A. There can be more than one leader server at a time.
B. Writes to a leader server will always succeed.
C. All servers keep a copy of the shared data in memory.
D. Clients connect to multiple servers at the same time.
Question 39
What does the split-by parameter tell Sqoop?
Your answer
A. The number of rows to send to each mapper.
B. The column to use as the primary key.
C. The number of rows to commit per transaction.
D. The table name to export from the database.
Question 40
Which two are valid watches for ZNodes in ZooKeeper?
Your answer
A. NodeChildrenChanged
B. NodeRefreshed
C. NodeDeleted
D. NodeExpired
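A toy model of the watch mechanism may help here (this is not the real ZooKeeper client API): a watch is a one-shot callback that fires with an event type such as NodeChildrenChanged or NodeDeleted and must be re-registered to fire again.

```python
# Toy ZNode with one-shot watches, mimicking ZooKeeper's semantics:
# a registered watch fires once, then is cleared.
events = []

class ZNode:
    def __init__(self):
        self.children = {}
        self.watchers = []  # one-shot watches

    def watch(self, callback):
        self.watchers.append(callback)

    def _fire(self, event):
        watchers, self.watchers = self.watchers, []  # clear before firing
        for cb in watchers:
            cb(event)

    def create_child(self, name):
        self.children[name] = ZNode()
        self._fire("NodeChildrenChanged")

node = ZNode()
node.watch(events.append)
node.create_child("a")   # fires the watch
node.create_child("b")   # no watch left: watches trigger only once
print(events)            # -> ['NodeChildrenChanged']
```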
Question 41
Which component of the Apache Ambari architecture integrates with an organization's LDAP or Active Directory service?
Your answer
A. REST API
B. Authorization Provider
C. Ambari Alert Framework
D. Postgres RDBMS
Question 42
Which hardware feature on an Hadoop datanode is recommended for cost-efficient performance?
Your answer
A. SSD
B. RAID
C. JBOD
D. LVM
Question 43
Which data encoding format supports exact storage of all data in binary representations such as VARBINARY columns?
Your answer
A. SequenceFiles
B. Flat
C. Parquet
D. RCFile
Question 44
What is the name of the Hadoop-related Apache project that utilizes an in-memory architecture to run applications faster than MapReduce?
Your answer
A. Pig
B. Spark
C. Python
D. Hive
Question 45
Which statement about Apache Spark is true?
Your answer
A. It features APIs for C++ and .NET.
B. It supports HDFS, MS-SQL, and Oracle.
C. It is much faster than MapReduce for complex applications on disk.
D. It runs on Hadoop clusters with RAM drives configured on each DataNode.
Question 46
Which description characterizes a function provided by Apache Ambari?
Your answer
A. Moves information to/from structured databases.
B. A wizard for installing Hadoop services on host servers.
C. Moves large amounts of streaming event data.
D. A messaging system for real-time data pipelines.
Question 47
What is the preferred replacement for Flume?
Your answer
A. Druid
B. Hortonworks Data Flow
C. Storm
D. NiFi
Question 48
What are two primary limitations of MapReduce v1?
Your answer
A. Scalability
B. TaskTrackers can be a bottleneck to MapReduce jobs
C. Resource utilization
D. Number of TaskTrackers limited to 1,000
E. Workloads limited to MapReduce
Question 49
What are two services provided by ZooKeeper?
Your answer
A. Maintaining configuration information.
B. Providing distributed synchronization.
C. Authenticating and auditing user access.
D. Loading bulk data into an Hadoop cluster.
Question 50
What are three IBM value-add components to the Hortonworks Data Platform (HDP)?
Your answer
A. Big Match
B. Big YARN
C. Big SQL
D. Big Replicate
E. Big Data
F. Big Index
Question 51
What does the user interface for Jupyter look like to a user?
Your answer
A. Database interface.
B. Linux SSH session.
C. App in web browser.
D. Common desktop app.
Question 52
What is the first step in a data science pipeline?
Your answer
A. Analytics
B. Manipulation
C. Exploration
D. Acquisition
Question 53
What command is used to list the "magic" commands in Jupyter?
Your answer
A. %dirmagic
B. %list-all-magic
C. %list-magic
D. %lsmagic
Question 54
What is a "magic" command used for in Jupyter?
Your answer
A. Autoconfiguring data connections using a registry.
B. Parsing and loading data into a notebook.
C. Extending the core language with shortcuts.
D. Running common statistical analyses.
Question 55
Which is an advantage that Zeppelin holds over Jupyter?
Your answer
A. Zeppelin is able to use the R language.
B. Notebooks can be used by multiple people at the same time.
C. Notebooks can be connected to big data engines such as Spark.
D. Users must authenticate before using a notebook.
Question 56
Where does the unstructured data of a project reside in Watson Studio?
Your answer
A. Database
B. Wrapper
C. Tables
D. Object Storage
Question 57
Before you create a Jupyter notebook in Watson Studio, which two items are necessary?
Your answer
A. Scala
B. URL
C. Project
D. Spark Instance
E. File
Question 58
Which type of cell can be used to document and comment on a process in a Jupyter notebook?
Your answer
A. Kernel
B. Output
C. Code
D. Markdown
Question 59
Which Watson Studio offering used to be available through something known as IBM Bluemix?
Your answer
A. Watson Studio Cloud
B. Watson Studio Desktop
C. Watson Studio Local
D. Watson Studio Business
Question 60
What is the architecture of Watson Studio centered on?
Your answer
A. Collaborators
B. Data Assets
C. Projects
D. Analytic Assets
Big Data Engineer v2
IBM Certification
2018

1/ What are the 4Vs of Big Data? (Please select the FOUR that apply)
• Veracity
• Velocity
• Variety
• Volume

2/ What are the three types of Big Data? (Please select the THREE that apply)
• Semi-structured
• Structured
• Unstructured

3/ Select all the components of HDP which provide data access capabilities
• Pig
• MapReduce
• Hive

4/ Select the components that provide the capability to move data from relational databases into Hadoop.
• Sqoop
• Kafka
• Flume

5/ Managing Hadoop clusters can be accomplished using which component?
• Ambari

6/ True or False: The following components are value-add from IBM: Big
Replicate, Big SQL, BigIntegrate, BigQuality, Big Match
• TRUE

7/ True or False: Data Science capabilities can be achieved using only HDP.
FALSE (Big Data Ecosystem UNIT 2)
p.45 // Hortonworks Data Platform.
8/ True or False: Ambari is backed by RESTful APIs for developers to easily
integrate with their own applications.
• True

9/ Which Hadoop functionalities does Ambari provide?
• Manage
• Provision
• Integrate
• Monitor

10/ Which page from the Ambari UI allows you to check the versions of the
software installed on your cluster?
• The Admin > Manage Ambari page

11/ True or False? Creating users through the Ambari UI will also create the
user on the HDFS.
• FALSE

12/ True or False? You can use curl commands to issue commands to
Ambari.
• TRUE

13/ True or False: Hadoop systems are designed for transaction processing.
• FALSE

14/ What is the default number of replicas in a Hadoop system?
• 3
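The replication factor shows up directly in capacity planning: with the default of 3, every block is stored on three DataNodes. A quick sketch of the arithmetic (the 128 MB block size is a common default, assumed here for illustration):

```python
# Block and storage arithmetic for HDFS's default replication factor.
import math

replication_factor = 3   # HDFS default
block_size_mb = 128      # common default block size (assumed)
file_size_mb = 1024      # a 1 GB file

blocks = math.ceil(file_size_mb / block_size_mb)
block_replicas = blocks * replication_factor
raw_storage_mb = file_size_mb * replication_factor

print(blocks, block_replicas, raw_storage_mb)  # -> 8 24 3072
```

So a 1 GB file consumes about 3 GB of raw cluster storage, spread over 24 block replicas.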
15/ True or False: One of the driving principles of Hadoop is that the data is brought to the program.
FALSE (Big Data Ecosystem UNIT 4)
⇒ programs are brought to the data, not the data to the program

16/ True or False: At least 2 Name Nodes are required for a standalone
Hadoop cluster.
FALSE (Big Data Ecosystem UNIT 4)
⇒ One Name Node is required

17/ True or False: The phases in a MR job are Map, Shuffle, Reduce and
Combiner
• TRUE
18/ Centralized handling of job control flow is one of the limitations of MR v1.
• TRUE
19/ The Job Tracker in MR1 is replaced by which component(s) in YARN?
• ApplicationMaster
• ResourceManager

20/ What are the benefits of using Spark? (Please select the THREE that
apply)
• Generality
• Speed
• East o use

21/ What are the languages supported by Spark? (Please select the THREE
that apply)
• Python
• Java
• Scala

22/ Resilient Distributed Dataset (RDD) is the primary abstraction of Spark.
• True

23/ What would you need to do in a Spark application that you would not
need to do in a Spark shell to start using Spark?
• Import the necessary libraries to load the SparkContext

24/ True or False: NoSQL database is designed for those that do not want to
use SQL.
FALSE (Big Data Ecosystem UNIT 7)
⇒ Note: HBase and other NoSQL distributed data stores are subject to the CAP Theorem, which states that distributed NoSQL data stores can only achieve 2 out of the 3 properties: consistency, availability and partition tolerance.
25/ Which database is a columnar storage database?
• HBase

26/ Which database provides a SQL for Hadoop interface?
• Hive

27/ Which Apache project provides coordination of resources?

• ZooKeeper

28/ What is ZooKeeper's role in the Hadoop infrastructure?
• Manage the coordination between HBase servers
• Hadoop and MapReduce use ZooKeeper to aid in high availability of the Resource Manager
• Flume uses ZooKeeper for configuration purposes in recent releases
29/ True or False: Slider provides an intuitive UI which allows you to
dynamically allocate YARN resources.
FALSE (Big Data Ecosystem UNIT 8)
⇒ Apache Slider works in conjunction with YARN to deploy distributed applications and to monitor them.
30/ True or False: Knox can provide all the security you need within your Hadoop infrastructure.
FALSE (Big Data Ecosystem UNIT 8)
⇒ Apache Knox provides peripheral security services to an Hadoop cluster

31/ True or False: Sqoop is used to transfer data between Hadoop and relational databases.
• True

32/ True or False: For Sqoop to connect to a relational database, the JDBC JAR files for that database must be located in $SQOOP_HOME/bin.
FALSE (Big Data Ecosystem UNIT 9)
⇒ Must copy the JDBC driver JAR files for any relational databases to $SQOOP_HOME/lib
33/ True or False: Each Flume node receives data as "source", stores it in a
"channel", and sends it via a "sink".
• True

34/ Through what HDP component are Kerberos, Knox, and Ranger managed?
• Ambari

35/ Which security component is used to provide peripheral security?
• Apache Knox

36/ One of the governance issues that Hortonworks DataPlane Service (DPS) addresses is visibility over all of an organization's data across all of their environments — on-prem, cloud, hybrid — while making it easy to maintain consistent security and governance
• True

37/ True or false: The typical sources of streaming data are Sensors, "Data
exhaust" and high-rate transaction data.
• True
38/ What are the components of Hortonworks Data Flow (HDF)?
• Flow management
• Stream processing
• Enterprise services

39/ True or False: NiFi is a disk-based, microbatch ETL tool that provides flow
management
• True

40/ True or False: MiNiFi is a complementary data collection tool that feeds
collected data to NiFi
• True
41/ What main features does IBM Streams provide as a Streaming Data
Platform? (Please select the THREE that apply)
• Analysis and visualization
• Rich data connections
• Development support
42/ What are the most important computer languages for Data Analytics?
(Please select the THREE that apply)
• Python
• R
• Scala

43/ True or False: GPUs are special-purpose processors that traditionally can
be used to power graphical displays, but for Data Analytics lend themselves to
faster algorithm execution because of the large number of independent
processing cores.
• True

44/ True or False: Jupyter stores its workbooks in files with the .ipynb suffix.
These files can not be stored locally or on a hub server.
FALSE (Introduction to Data Science UNIT 1)
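Context for question 44: a .ipynb file is plain JSON, which is exactly why it can be stored locally or on a hub server. A minimal sketch writing and re-reading one; the structure below is a pared-down version of the real nbformat layout.

```python
# Write a minimal notebook-shaped JSON file and read it back.
import json
import os
import tempfile

notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hi')"]},
    ],
}

path = os.path.join(tempfile.mkdtemp(), "demo.ipynb")
with open(path, "w") as f:
    json.dump(notebook, f)

with open(path) as f:
    loaded = json.load(f)

print(len(loaded["cells"]))  # -> 2
```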
Course 3
45/ The $BIGSQL_HOME/bin/bigsql start command is used to start Big SQL from the command line?
• True

46/ What are the two ways you can work with Big SQL.
(Please select the TWO that apply)
• Jsqsh
• Web tooling rom DSM

47/ What is one of the reasons to use Big SQL?
• Want to access your Hadoop data without using MapReduce