Big Data Hadoop Interview Questions and Answers
2. What is Cassandra?
Apache Cassandra is one of the most widely used open-source NoSQL distributed database
management systems. It is designed to store and manage large volumes of data with no
single point of failure. Originally developed at Facebook, Cassandra is written in Java,
scales well for Big Data workloads and supports flexible schemas. Among the various
types of NoSQL databases, Cassandra is a hybrid of a column-oriented and a key-value
store. The keyspace is the outermost container for an application's data, and a table
(or column family) is an entity within a keyspace.
3. List the benefits of using Cassandra.
Unlike traditional databases, Apache Cassandra delivers near real-time performance,
which simplifies the work of developers, administrators, data analysts and software
engineers.
SPM primarily monitors Cassandra metrics along with various OS and JVM metrics.
Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, ZooKeeper
and other Big Data platforms. The main features of SPM include correlation of
events and metrics, distributed transaction tracing, real-time graphs with
zooming, anomaly detection and heartbeat alerting.
7. Define memtable.
Like a table, a memtable is an in-memory, write-back cache that holds content in key
and column format. The data in a memtable is sorted by key, and each column family has
its own memtable, from which column data is retrieved by key. The memtable accumulates
writes until it is full and is then flushed to disk.
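The write-then-flush behaviour described above can be sketched in a few lines of Python. This is only a toy model, not Cassandra's implementation: the `Memtable` class, its `max_entries` threshold (counted in row keys) and the `flushed` list standing in for on-disk files are all assumptions made for illustration.

```python
class Memtable:
    """Toy in-memory write buffer for one table (hypothetical, not Cassandra code)."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries  # assumed flush threshold, counted in row keys
        self.rows = {}                  # row key -> {column name: value}
        self.flushed = []               # stands in for files written to disk

    def write(self, key, column, value):
        # Buffer the write in memory, then flush once the threshold is reached.
        self.rows.setdefault(key, {})[column] = value
        if len(self.rows) >= self.max_entries:
            self.flush()

    def read(self, key):
        # Look up a row by key; returns None if the key is not buffered.
        return self.rows.get(key)

    def flush(self):
        # Sort by row key and freeze the result as a tuple, then start empty again.
        sstable = tuple((k, dict(self.rows[k])) for k in sorted(self.rows))
        self.flushed.append(sstable)
        self.rows = {}


mt = Memtable(max_entries=3)
mt.write("user:2", "name", "Bob")
mt.write("user:1", "name", "Alice")
mt.write("user:3", "name", "Carol")  # third key reaches the threshold and triggers a flush
flushed_keys = [key for key, _ in mt.flushed[0]]
```

After the third write the buffer is empty again and `flushed_keys` lists the row keys in sorted order, regardless of the order in which they were written.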
8. What is an SSTable? How is it different from relational tables?
SSTable expands to ‘Sorted String Table’. It is an important data file in Cassandra
to which memtables are regularly flushed. SSTables are stored on disk and exist
for each Cassandra table. They are immutable: once an SSTable is written, no data
items can be added to it or removed from it. For each SSTable, Cassandra creates
three separate files: a partition index, a partition summary and a bloom filter.
9. Explain the concept of Bloom Filter.
A Bloom filter is an off-heap data structure (held in native memory rather than on the
Java heap) associated with each SSTable. Cassandra uses it to check whether an SSTable
may contain the requested data before performing any disk I/O. A Bloom filter can
return false positives but never false negatives, so a negative answer lets Cassandra
skip that SSTable entirely.
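To make the idea concrete, here is a minimal Bloom filter sketch in Python. It is not Cassandra's implementation: the class name, the bit-array size, the number of hash functions and the use of SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (illustrative only; sizes and hashing are assumptions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for key in ("user:1", "user:2"):
    bf.add(key)
```

A key that was added always answers `True`. A key that was never added usually answers `False`, which is the case where the disk read can be skipped, but it may occasionally answer `True` (a false positive).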
10. Explain CAP Theorem.
Because systems must be able to scale when additional resources are needed, the CAP
theorem plays a major role in shaping the scaling strategy for distributed systems.
The Consistency, Availability and Partition tolerance (CAP) theorem states that a
distributed system like Cassandra can provide only two of these three characteristics
at the same time; one of them has to be sacrificed. Consistency guarantees that a read
returns the most recent write, Availability guarantees a reasonable response within a
minimal time, and Partition tolerance means the system continues operating when
network partitions occur. Because a distributed system has to tolerate partitions, the
two options available in practice are AP and CP.
11. State the differences between a node, a cluster and a data center in Cassandra.
Cassandra has several components. A node is a single machine running Cassandra, while
a cluster is a collection of nodes that hold similar types of data grouped together.
Data centers are useful when serving customers in different geographical areas: you
can group the nodes of a cluster into different data centers.
12. How do you write a query in Cassandra?
Using CQL (Cassandra Query Language). The cqlsh shell is used for interacting with the
database.
13. Which operating systems does Cassandra support?
Windows and Linux.
14. What is the Cassandra data model?
The Cassandra data model consists of four main components:
Cluster: made up of multiple nodes and keyspaces.
Keyspace: a namespace that groups multiple column families, typically one per application.
Column: consists of a column name, a value and a timestamp.
Column family: multiple columns referenced by a row key.
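The nesting of these four components can be pictured with plain Python dictionaries. This is only a mental model, not how Cassandra stores data; the keyspace name `shop`, the column family `users`, the row key and the `make_column` helper are all hypothetical.

```python
import time


def make_column(name, value):
    # A column carries its name, its value and a write timestamp (microseconds here).
    return {"name": name, "value": value, "timestamp": time.time_ns() // 1000}


# cluster -> keyspace -> column family -> row key -> columns
cluster = {
    "shop": {                 # keyspace (hypothetical name)
        "users": {            # column family
            "user:1": {       # row key
                "name": make_column("name", "Alice"),
                "email": make_column("email", "alice@example.com"),
            }
        }
    }
}

column = cluster["shop"]["users"]["user:1"]["email"]
```

Looking up a value means walking the hierarchy from the keyspace down to a single column, which is what the final line does.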
15. What is CQL?
CQL is the Cassandra Query Language, used to access and query the Apache Cassandra
distributed database. Statements go through a CQL parser, and all implementation
details are left to the server. The syntax of CQL is similar to SQL, but it does not
alter the Cassandra data model.
16. Explain the concept of compaction in Cassandra.
Compaction is a maintenance process in Cassandra in which SSTables are reorganized to
optimize the data structures on disk. It works alongside memtable flushing, because
every flush writes a new SSTable. There are two types of compaction in Cassandra:
Minor compaction: started automatically when a new SSTable is created. Here,
Cassandra condenses all the equally sized SSTables into one.
Major compaction: triggered manually using nodetool. It compacts all SSTables of a
column family into one.
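The core of compaction, merging several sorted SSTables into one and keeping only the newest version of each key, can be sketched as follows. This is a simplified illustration, not Cassandra's algorithm: the `compact` function and the `(key, value, timestamp)` tuple layout are assumptions, and details such as deletion markers are left out.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest value for each key."""
    merged = {}
    for sstable in sstables:
        for key, value, timestamp in sstable:
            # Keep an entry only if it is newer than what has been seen so far.
            if key not in merged or timestamp > merged[key][1]:
                merged[key] = (value, timestamp)
    # Emit one new table sorted by key; the inputs are left unmodified.
    return [(key, value, ts) for key, (value, ts) in sorted(merged.items())]


older = [("user:1", "Alice", 100), ("user:2", "Bob", 100)]
newer = [("user:2", "Robert", 200), ("user:3", "Carol", 200)]
result = compact([older, newer])
```

Here `user:2` appears in both inputs, and only the version with the higher timestamp survives in `result`. The input tables are never modified, which matches the immutability of SSTables: compaction writes a new table instead of editing existing ones.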
17. Does Cassandra support ACID transactions?
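Unlike relational databases, Cassandra does not support full ACID transactions with
rollback or locking. It instead offers atomic, isolated and durable writes at the row
level, with tunable consistency.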
18. Explain cqlsh.
cqlsh expands to Cassandra Query Language Shell and provides the CQL interactive
terminal. It is a Python-based command-line prompt, used on Linux or Windows, that
executes CQL commands such as ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE
and many others. With cqlsh, users can define a schema, insert data and execute a
query.
19. What is a super column in Cassandra?
A Cassandra super column is a unique element that groups similar collections of data.
Super columns are key-value pairs whose values are columns. A super column is a sorted
array of columns, and the elements follow this hierarchy: keyspace > column family >
super column > column (a data structure in JSON).
Similar to row keys, super column entries contain no independent values of their own
and are used to collect other columns. Note that super column keys appearing in
different rows do not necessarily match and are never required to.
20. Define the consistency levels for read operations in Cassandra.