
FaunaDB:

A Guide for Relational Users

Technical Whitepaper
Table of Contents

Evolution of relational databases
Relational databases
Structure
Integrity
Manipulation
Database transactions and ACID
What does relational mean to different database operators?
Challenges posed by the advent of the internet era
Impedance mismatch
Sub-optimal data models
Dynamic scalability
Increasing data volumes
Performance
The NoSQL evolution
The growth of the multi-datacenter
Scaling out
Superior data handling
Lower cost of ownership
Flexible data model
Application performance
But what about data consistency and relations?
Operational readiness
Security
Lack of standardization
Joins
Need of the day: A modern database
The relational checklist in FaunaDB
Examples
Step 1: Create the class
Step 2: Create indices
Step 3: Create instances (documents)
Step 4: Join the two and return the results
Conclusion
Appendix
Evolution of relational databases

The evolution of databases has been a history of "creative destruction". Joseph Schumpeter, the famous Austrian economist, in his widely accepted economic theory, explains creative destruction as the "process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one". Taking a cue from this, we can easily relate it to how database management systems have evolved over the last 50 years or so.

In the 1960s, both the hierarchical and the network models for storing data became popular. Hierarchical databases structured data in a tree-like model in which each record of data is connected to another one using links. The hierarchical database enforced a one-to-many relationship between a parent and a child record and, in order to read any record, the whole database needed to be traversed from the root node to the leaf. The hierarchical model was the first database model introduced by IBM. The network model allowed more natural modeling of relationships between entities. Unlike the hierarchical model, the network model allowed each record to have multiple parent and child records, forming a generalized graph structure. In spite of its relatively flexible structure, the network model did not gain traction over the hierarchical model because of IBM's decision to use the latter in its established IMS database. These early databases saw adoption only with businesses with deep pockets, and they were conceived as complex, difficult, and time-consuming to design. The upfront cost of procuring specialized hardware was also an inhibitor and, along with the poor performance of applications, left a lot to be desired. The complexity of modeling beyond one-to-many relationships only added to these shortcomings.

Tedious navigational access and high total cost of ownership ultimately led to the relational database model, i.e., the next disruption, in the 1970s.

Relational databases

The next big step in database technology came in the 1970s when Edgar Codd described the relational database model. Codd proposed 12 rules (actually 13, numbered 0 through 12) to explain the properties of a relational database system:
Rule 0: The foundation rule
■ The system must qualify as relational, as a database, and as a management system. For a
system to qualify as a relational database management system (RDBMS), that system must use
its relational facilities (exclusively) to manage the database.
■ The other 12 rules derive from this rule. The rules are as follows:

Rule 1: The information rule: All information in the database is to be represented in one and only
one way, namely by values in column positions within rows of tables.

Rule 2: The guaranteed access rule: All data must be accessible. This rule is essentially a
restatement of the fundamental requirement for primary keys. It says that every individual scalar
value in the database must be logically addressable by specifying the name of the containing
table, the name of the containing column and the primary key value of the containing row.
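To make logical addressability concrete, here is a minimal SQL sketch (using the EMP table from the SCOTT schema revisited later in this paper) that reaches a single scalar value purely by table name, column name, and primary key value:

-- One scalar value, addressed by table (emp), column (sal), and primary key (empno)
SELECT sal
FROM emp
WHERE empno = 7839;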

Rule 3: Systematic treatment of null values: The DBMS must allow each field to remain null (or
empty). Specifically, it must support a representation of “missing information and inapplicable
information” that is systematic, distinct from all regular values (for example, “distinct from zero or
any other number”, in the case of numeric values), and independent of data type. It is also implied
that such representations must be manipulated by the DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model: The system must support an
online, inline, relational catalog that is accessible to authorized users by means of their regular
query language. That is, users must be able to access the database’s structure (catalog) using the
same query language that they use to access the database’s data.
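For instance, in databases that implement the SQL-standard information schema (a hedged illustration; the catalog's exact name and layout vary by vendor), the structure itself is queried with ordinary SQL:

-- Read the database's own structure from the catalog
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'emp';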

Rule 5: The comprehensive data sublanguage rule: The system must support at least one
relational language that:
1. Has a linear syntax.
2. Can be used both interactively and within application programs.
3. Supports data definition operations (including view definitions), data manipulation operations
(update as well as retrieval), security and integrity constraints, and transaction management
operations (begin, commit, and rollback).

Rule 6: The view updating rule: All views that are theoretically updatable must also be updatable
by the system.

Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert,
update, and delete operations. This means that data can be retrieved from a relational database
in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert,
update, and delete operations should be supported for any retrievable set rather than just for a
single row in a single table.
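As a sketch of set-at-a-time manipulation, the single SQL statement below updates every qualifying row at once rather than one record at a time (the raise percentage is purely illustrative):

-- Give every employee in department 20 a 5% raise in one set-level operation
UPDATE emp
SET sal = sal * 1.05
WHERE deptno = 20;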

Rule 8: Physical data independence: Changes to the physical level (how the data is stored,
whether in arrays or linked lists, etc.) must not require a change to an application based on the
structure.

Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so
on) must not require a change to an application based on the structure. Logical data independence
is more difficult to achieve than physical data independence.

Rule 10: Integrity independence: Integrity constraints must be specified separately from
application programs and stored in the catalog. It must be possible to change such constraints as
appropriate without unnecessarily affecting existing applications.
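A constraint declared in the catalog, as in the sketch below, is enforced by the DBMS itself and can be changed without touching application code (the constraint name is hypothetical):

-- Integrity lives in the database catalog, not in application programs
ALTER TABLE emp
ADD CONSTRAINT emp_sal_positive CHECK (sal > 0);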

Rule 11: Distribution independence: The distribution of portions of the database to various
locations should be invisible to users of the database. Existing applications should continue to
operate successfully:

1. When a distributed version of the DBMS is first introduced.


2. When existing distributed data are redistributed around the system.

Rule 12: The non-subversion rule: If the system provides a low-level (record-at-a-time) interface,
then that interface cannot be used to subvert the system, for example, bypassing a relational
security or integrity constraint.

In a strict sense, none of the modern-day databases adhere to these 13 rules, but many commercial databases do come close. An extensive examination of all the rules is beyond the scope of this paper, but if we summarize them, the core tenets of an RDBMS distill down to three very high-level concepts: Structure, Integrity, and Manipulation (Baker, Relational Databases 1992).

Structure

The structure of the relational model refers to Entities, which over the years have become synonymous with a Table in relational lingo. Codd proposed (in Rule #1) that all data in the database be represented as values in column positions within rows of tables. Basically, put only one piece of data in each field, and do not put the same data in different fields of different tables. This process is called normalization and leads to the creation of entity relationships within the database. Normalization is essentially a data-analysis process used to find the simplest possible data structure for a given collection of information by eliminating the replication of data across tables. However, in order to build relationships between entities, there is a need to uniquely identify each record. This unique identifier for a table row is called a Primary key. Data in primary key columns are used to build relationships between two or more related entities.

It is very important to understand here that the ground rules for relational databases were set at a time when storing and retrieving information from disks was quite expensive, and thus the application of normalization became really important. Normalizing data to the most granular level often leads to complicated data models that are extremely difficult to comprehend. Over the years, new data modeling techniques like the star schema and the snowflake schema have been accepted even though they don't adhere to the principles of normalization.

Integrity

Integrity mainly refers to relational integrity and constraints. It enforces that a foreign key of one relation either holds values that are present in the referenced primary key columns or is null. This rule means that if one instance of an entity exists and refers to another entity occurrence, then the referenced entity must also exist. The other aspects of integrity are constraints that can enforce uniqueness in a specific column without making it a primary key. In relational databases, unique constraints are enforced using indexes. Indexes are key to data access in relational databases.
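A minimal SQL sketch makes both ideas concrete: the primary keys below uniquely identify rows, and the foreign key guarantees that every employee's department actually exists (table and column names follow the EMP and DEPT tables used later in this paper; the column types are illustrative):

-- Each department is uniquely identified by deptno
CREATE TABLE dept (
  deptno INTEGER PRIMARY KEY,
  dname  VARCHAR(14),
  loc    VARCHAR(13)
);

-- emp.deptno must match an existing dept.deptno (or be null)
CREATE TABLE emp (
  empno  INTEGER PRIMARY KEY,
  ename  VARCHAR(10),
  deptno INTEGER REFERENCES dept (deptno)
);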

Manipulation

The data manipulation aspect of relational databases can be divided into two major elements:

■ Manipulation Operations
■ Manipulation Language

Data manipulation operations adhere to relational algebra and consist of the set operators, such as intersection, union, minus, etc. The ability to Join by comparing data in two columns in two different tables forms the crux of data manipulation in relational databases. Joins provide the ability to access any data in the normalized entities.

The manipulation language, as mentioned in Codd's Rule #5, allows the user to interact with the data stored across various tables. Users access data through a high-level, declarative database language. Instead of writing an algorithm to obtain desired records one at a time, the application programmer is only required to specify a predicate that identifies the desired record(s) or combination of records.
The Structured Query Language, or SQL, has become the universal standard for this data manipulation language. SQL can be executed interactively or embedded in a programming language. The embedded mode makes SQL the database programming language for a host language, e.g., C, FORTRAN, COBOL, and so forth. SQL as a database manipulation language has become so popular that it has become synonymous with relational databases. It is easy to learn and provides a very intuitive syntax to operate on the data.
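As an illustration of this declarative style, the query below (a sketch against the EMP and DEPT tables used later in this paper) states only what is wanted; how the rows are located and joined is left entirely to the database:

-- "What", not "how": no access path or join algorithm is specified
SELECT e.ename, d.dname
FROM emp e
JOIN dept d ON e.deptno = d.deptno
WHERE d.dname = 'SALES';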

Database transactions and ACID

A database transaction is a unit of work performed within a database in a coherent and reliable way, independent of other transactions. Transactions generally represent any change in the database. The built-in support for transactions has been one of the key reasons for the widespread adoption of relational databases. A database transaction by definition needs to be atomic, consistent, isolated, and durable, i.e., ACID compliant.

Atomicity: An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.

Consistency: Consistency guarantees that any transactions started in the future necessarily see the effects of other transactions committed in the past.

Isolation: Isolation determines how transaction integrity is visible to other users and systems.

Durability: Durability guarantees that transactions that have been committed will survive permanently.
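A sketch of how these guarantees surface to the application programmer: the two statements below either both commit or both roll back (generic SQL; the exact transaction syntax varies by vendor, and the salary figures are illustrative):

BEGIN;
-- Reallocate salary between two employees: both updates succeed or neither does
UPDATE emp SET sal = sal - 500 WHERE empno = 7839;
UPDATE emp SET sal = sal + 500 WHERE empno = 7844;
COMMIT; -- once committed, durability guarantees the change survives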

What does relational mean to different database operators?

We have discussed the foundations of relational databases in the above sections. However, over
the past 40 years, the interpretation of the word “relational” has morphed. If we ran a survey across
database developers, architects, and even administrators to ask them which relational property is the
most important, the answers would be surprisingly different. While most developers would identify
SQL, a database architect might choose referential integrity, and the administrator might choose the
ACID properties.

Over the years, the use of relational databases for mission-critical systems has also ensured certain
operational readiness. The ability to recover to any point in time or to failover to a standby system in
case of any hardware failure are now considered table stakes in the relational world.

Challenges posed by the advent of the internet era

The internet revolution in the late 90s changed how business was conducted. The need for global, internet-scale applications changed the fundamental requirements of how applications needed to be designed. The introduction of new application programming paradigms exposed issues with the relational database model, some of which forced more innovation in the database industry.

Impedance mismatch

Impedance mismatch refers to the disconnect between the object-oriented and the relational worlds. New-generation programming languages are primarily object-oriented, where objects are connected via references and build an object hierarchy or graph. But relational databases store data in table rows and columns. To store the object hierarchy or graph in a relational database, the object or graph has to be sliced and flattened to fit the normalized data format. This results in complex joining of tables and often leads to performance issues in the database.

Sub-optimal data models

While relational systems meet most business operations needs (because relationships are the way most businesses are modeled), the relational data model doesn't fit every data domain. Today, more than 85% of corporate data is generated either on the web as free-form text or by machines and modern-day applications that need to store data in various formats. Textual unstructured data includes documents, presentations, videos, email, chat, and social media posts, while machine data includes data generated by various sensors, electric meters, IoT devices, and so on. Analyzing all of this unstructured data can be challenging with massive data volumes, many different file types, and high creation speeds. The structured data model, with its upfront definition of the table structure and referential constraints, is not suitable for storing and retrieving such unstructured data.

In the pre-internet era, the design of an application was driven by the data model. A business process was mapped to entity relationships, and applications were built within the bounds of that definition. With the advent of the internet era, the focus changed to capturing data in its original free form. Thus, the rigidity of a structured model had to be given up. Applications now require flexibility so that data can be captured in its original form.

Dynamic scalability

Relational databases thrived during the client-server era and clearly were not built to handle web-scale workloads. So, when the application load increased, the only way to keep up with performance needs was to scale up, i.e., move the database to a bigger machine. Many people tried horizontal scaling with shared disks across multiple machines, but that approach could not keep up with the demands of running a database across multiple data centers. The scale-up approach left a lot to be desired and led to a high total cost of ownership. The operational overhead to maintain large databases is significant and often requires hiring a team of highly skilled DBAs to maintain the application backend.

Increasing data volumes

Data volumes generated by applications are growing exponentially. In the last decade, the notion of a large database has changed from a few hundred GBs to hundreds of TBs. Even petabyte scale is not uncommon these days. Relational databases were designed to run on single servers and hence are limited by design to the system resources that can be made available on the node. Relational databases did undergo improvements to cope with this increased demand by allowing active-passive architectures or by creating data shards. But eventually it came down to one single point of failure that always left the database vulnerable. For example, Oracle RAC can support multiple database nodes, but it still requires a shared disk subsystem where a single corruption or crash can take down the entire application.

Performance

Application performance expectations have changed significantly over the years. Applications are expected to ingest data at high speed from multiple sources and to keep read latencies under a few milliseconds.

One of the prime requirements of a relational database is to support a declarative language for data manipulation. If we take a step back and assess the reasons behind the popularity of SQL, we will see that it is primarily because it helps to answer the "What?" and not the "How?". In other words, the end user doesn't need to specify the access path (i.e., an index or join order), but only provide the name of the entity (table). While in theory this sounds pretty good, in reality it is quite the opposite. For any given application, a team of performance engineers carefully studies access patterns and suggests the required indices to ensure agreed SLAs. While the end users may not care about the nuts and bolts of indices or materialized views, the operational overhead of maintaining such additional objects cannot be ignored. Now, as data volumes grow and new access patterns emerge, the predictability of performance goes out the window. This has been a huge problem with relational databases, so much so that database vendors like Oracle have introduced several features just to ensure that the execution plan (access pattern) doesn't change.

The other side of the problem has been the need to join multiple tables, even for simple queries. As data volumes grow, performing efficient joins across multiple tables poses a challenge to meet business expectations.
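The performance-engineering work described above typically happens outside the application, in statements like the hedged sketch below (the index name is hypothetical); each such object must then be maintained for the life of the system:

-- An access path added by a DBA so that lookups by department stay fast
CREATE INDEX emp_deptno_idx ON emp (deptno);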
Primarily because of these shortcomings, companies are progressively considering alternatives to legacy relational infrastructure.

The NoSQL evolution

NoSQL (or Not Only SQL) encompasses a wide variety of different database technologies that have been developed in response to growth in data volume, increased frequency of data retrieval, and performance and processing needs. It has become an alternative to traditional relational databases, offering built-in horizontal scalability and high availability without compromising performance.

The growth of the multi-datacenter

One of the factors that has triggered the need for distributed databases is the adoption of multiple data centers by enterprise IT. Multiple data centers were initially introduced purely as a disaster recovery strategy, but enterprises soon realized that in order to stay competitive they could not have machines idle in a data center purely as insurance against an impending failure. Relational databases provide ways to keep a copy of the database in sync across data centers, but they do not support active-active setups, i.e., writes across multiple data centers. NoSQL databases have been built with the premise of being distributed across multiple data centers.

The key benefits of using NoSQL to process Big Data are as follows:

Scaling out

As discussed earlier, relational databases relied on scaling up -- purchasing larger, more expensive servers -- as workload increased, but NoSQL databases can scale out transparently across multiple data centers. NoSQL databases are designed to work on low-cost commodity hardware.

Superior data handling

NoSQL distributed databases allow data to be spread across hundreds of servers with little reduction in performance. Features like data replication, automatic repair, relaxed data distribution, and simpler data models make database management much easier in NoSQL databases.

Lower cost of ownership

NoSQL databases usually use clusters of inexpensive commodity servers to manage extremely large data and transaction volumes. Thus, both the upfront cost and the management cost for NoSQL databases are much lower than with an RDBMS.

Flexible data model

An upfront data model definition is not a requirement in a NoSQL database. Thus, it fosters rapid application development and deployment. The data created, or the underlying data model, can be restructured at any time without any application disruption. This offers immense application and business flexibility.

Application performance

The flexibility in data models and the ability to store data in multiple formats (key-value, document, column-family, and graph) also help with improved application performance.

These advantages have led to the widespread adoption of NoSQL databases in the last decade.

But what about data consistency and relations?

Before getting into the shortcomings of the first-generation NoSQL databases, it is very important to understand the CAP theorem, which describes the strategies for distributing application logic across networks. The CAP theorem describes three areas of concern and prescribes that one can only choose two out of the three in case of a network partition:

■ Consistency - All database users see the same data, even with concurrent updates
■ Availability - All database users are able to access some version of the data
■ Partition Tolerance - The entire database can be distributed across multiple networks (read: data centers)
In distributed NoSQL databases, one of the biggest challenges is ensuring that data remains synchronized across data centers or regions. If it is very important for application users to have a consistent view of data irrespective of the data center they are connected to, any writes on one of the database servers must be replicated to all the data centers without much delay. If it is extremely critical, or a business requirement, that each user sees all changes as they are happening, it becomes essential that the application wait until a majority of the database servers have acknowledged a specific write or the version of data being read. In this case, "consistency" of data is the driver, and for a significant amount of time this was a use case for relational databases.

However, if the availability of the application is more important than consistency, database clients can write data to any node of the cluster without waiting for an agreement from another database server. The database must provide a way to reconcile the changes, and the data is "eventually consistent".

A majority of the first-generation NoSQL databases chose availability over consistency and are "eventually consistent". While this strategy of eventual consistency (AP systems) may work for some applications, most mission-critical applications will choose consistency over availability. The requirement to see a consistent version of data is paramount to a number of financial, retail, and inventory-based applications. Hence, mission-critical applications have been resistant to adopting NoSQL databases.

"When we started at Twitter, databases were bad. When we left, they were still bad."
- Evan Weaver, CEO of Fauna

Anecdotal evidence of the inefficiencies of the first-generation NoSQL systems is available firsthand from Evan Weaver (now CEO at Fauna), employee number 15 at Twitter and responsible for running its software infrastructure team. Twitter, one of the early adopters of Cassandra, couldn't rely on existing databases to scale the site. This resulted in frequent outages and the famous fail whale page.

Apart from the problem of eventual consistency, the following shortcomings have also been a deterrent to the adoption of NoSQL databases.
Operational readiness

A lot of NoSQL databases are extremely difficult to manage and maintain. One possible reason is that most of these databases are an outgrowth of an open source project. Open source committers are great at adding new functionality, but they seldom care about enterprise readiness. Large companies can throw developers at these issues and work around the immaturities, but smaller digital businesses cannot afford to do that. They need a solution that just works out of the box.

Typical problems are often very basic requirements for operators. There are no easy ways to take incremental backups or straightforward ways to do point-in-time recoveries. Most of these first-generation NoSQL databases simply lacked the enterprise readiness required to run the database in a corporate data center.

Security

Security is a major concern for enterprise IT infrastructures. Security in NoSQL databases is very weak; authentication and encryption, both in motion and at rest, are very weakly implemented.

Lack of standardization

The design and query languages vary widely among NoSQL database products. Depending on the type of NoSQL database chosen, the learning curve can be very steep.

Joins

NoSQL databases do not require any predefined data models or referential integrity. As a result, there is no predefined structure that can be used to join two different collections of data.

With these shortcomings in place, it is clear that modern-day applications need a database that has the best-of-breed features from both the relational and NoSQL database worlds.

Need of the day: A modern database

The advent of NoSQL databases in the last decade has addressed the issue of scaling out, but at the expense of relational integrity, and even then the primary focus has been on how data is stored and distributed. Some of these NoSQL databases are strictly key-value stores, some are document based, and a few are wide-column stores. This focus on how to store the data has often meant that the needs for consistency, enterprise readiness, and ease of use have been overlooked. We have many NoSQL database solutions available today as point solutions for niche use cases, but none of them have evolved into a platform that provides the enterprise-grade readiness that we usually identify with RDBMSs. Today's modern applications want the global scale and flexibility of NoSQL but, at the same time, they want the consistency, security, and reliability of relational systems.

FaunaDB provides the best of both the relational and NoSQL worlds. Here is a summary of FaunaDB
features that let you create distributed applications at a global scale without compromising the
integrity of data:

Strong ACID Transactions: FaunaDB guarantees 100% ACID transactions across a distributed database cluster.

Native Multi-tenancy: FaunaDB has built-in quality of service (QoS) capabilities that allow multiple tenants to co-exist without any noisy-neighbor issues. FaunaDB is great for SaaS applications.

Scalability: FaunaDB can scale from a single-node install to an enterprise-grade setup across multiple data centers with no downtime. The management of the database is completely transparent to the application.

High Availability: FaunaDB's peer-to-peer/masterless architecture provides a seamless response to any node failure or data center outage.

Operational Simplicity: Operational simplicity is at the core of FaunaDB. Unlike incumbent NoSQL solutions, FaunaDB's rich cluster management APIs make cluster setup a breeze.

Security: FaunaDB provides identity management, row-level access control, and transaction auditing, with encryption in motion and encryption at rest coming soon.

Document Relational Features: FaunaDB supports various application programming languages with built-in drivers. A single query in FaunaDB (FQL) can pull data across document, relational graph, and temporal datasets.

The relational checklist in FaunaDB

Structure: FaunaDB provides the flexibility of NoSQL databases when it comes to schema design. At the same time, like relational databases, FaunaDB enforces structure and defined access patterns to the data through indexes and the (soon to be added) class validators. FaunaDB stores data as documents (rows) in the database.

Integrity: FaunaDB supports the creation of primary keys and will soon be able to enforce foreign keys.

Joins: FaunaDB supports equi-joins and other set operations like union, intersect, minus, etc. across classes. Support for outer joins is on the roadmap.

Indexes: FaunaDB supports the creation of various types of indexes. Indexes are used to define the access path to data.

ACID: FaunaDB fully supports transactions and provides strong consistency.

Query Language: The Fauna Query Language (FQL) allows for complex, precise manipulation and retrieval of data stored within FaunaDB.

Enterprise Readiness: FaunaDB not only solves the problem of scale, but all the problems adjacent to scale, like security, compliance, global distribution, consistency, multi-tenancy, and more.

Examples
In this section, we will show how indexes can be created in FaunaDB and how we can leverage them
to join two different classes. In order to expand on the example, we will use the data and structure of
the EMP and DEPT tables usually found in the SCOTT schema of almost every Oracle database.

Step 1: Create the class

Fauna doesn’t enforce a predefined structure onclasses, as one would normally associate with tables
in a relational database. The structure is enforced in the instances created in the class.
The following commands can be executed using Fauna Shell. Refer to the FaunaDB documentation for
setting up the Fauna Shell in your laptop or workstation.

# Create the dept class
CreateClass({name: "dept"});

# Create the emp class
CreateClass({name: "emp"});

Step 2: Create indices

Access paths to data are defined via indexes within Fauna. In the example below, two such indexes are created to join the two classes on the department number ("deptno") column. Indexes can have both terms and values; the values of one index are joined to the terms of another index. So, to run a query that returns all employees for a specific department, we need an index on the Dept class with the term "dname" (department name) and the value "deptno" (department number). The index on the Emp class should use "deptno" as the term and the Refs (locations of the data, loosely analogous to ROWIDs in relational databases) as the values.

CreateIndex({name: "emp_by_deptno",
  source: Class("emp"),
  terms: [{field: ["data", "deptno"]}]})
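The matching index on the Dept class described above, which the join in Step 4 relies on, is created the same way (its definition also appears in the appendix):

# Index on the Dept class: term "dname", value "deptno"
CreateIndex({name: "dept_by_name",
  source: Class("dept"),
  terms: [{field: ["data", "dname"]}],
  values: [{field: ["data", "deptno"]}]})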

Step 3: Create instances (documents)

Use the following commands to create records in the Emp and Dept classes. Only a single sample of each is shown; the entire script is available in the appendix of this document.

# Create a dept document

Create(
  Class("dept"),
  { data:
    { "deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK" }
  });

# Create an emp document

Create(
  Class("emp"),
  { data:
    { "empno": 7839, "ename": "KING", "job": "PRESIDENT", "mgr": null,
      "hiredate": "1981-11-17", "sal": 5000, "comm": null, "deptno": 10 }
  });

Step 4: Join the two and return the results

In this example, let us assume that we want to find all employees in the SALES department.

# Find all Refs in the Emp class matching the SALES dept
Paginate(Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno")))

# Get the documents using the Refs
Map(
  Paginate(Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno"))),
  Lambda("X", Get(Var("X"))))

The above query returns all records in the Emp class matching the SALES department.
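For relational users, a rough SQL analogue of this join would be the sketch below (against the classic EMP and DEPT tables; FaunaDB evaluates the equivalent through the two indexes rather than a table scan):

-- Equivalent intent in SQL: all employees in the SALES department
SELECT e.*
FROM emp e
JOIN dept d ON e.deptno = d.deptno
WHERE d.dname = 'SALES';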

employees> Paginate(Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno")))
{ data:
  [ Ref(Class("emp"), "212658237367910913"),
    Ref(Class("emp"), "212658785828733440"),
    Ref(Class("emp"), "212659256472633858"),
    Ref(Class("emp"), "212659262752555520"),
    Ref(Class("emp"), "212659274083467777") ] }
employees> Map(Paginate(Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno"))), Lambda("X", Get(Var("X"))))
{ data:
  [ { ref: Ref(Class("emp"), "212658237367910913"),
      ts: 1539065549080650,
      data:
       { empno: 7698,
         ename: 'BLAKE',
         job: 'MANAGER',
         mgr: 7839,
         hiredate: '1981-5-1',
         sal: 2850,
         deptno: 30 } },
    { ref: Ref(Class("emp"), "212658785828733440"),
      ts: 1539066072138690,
      data:
       { empno: 7499,
         ename: 'ALLEN',
         job: 'SALESMAN',
         mgr: 7698,
         hiredate: '1981-02-22',
         sal: 1600,
         comm: 300,
         deptno: 30 } },
    { ref: Ref(Class("emp"), "212659256472633858"),
      ts: 1539066520973573,
      data:
       { empno: 7521,
         ename: 'WARD',
         job: 'SALESMAN',
         mgr: 7698,
         hiredate: '1981-02-22',
         sal: 1250,
         comm: 500,
         deptno: 30 } },
    { ref: Ref(Class("emp"), "212659262752555520"),
      ts: 1539066526966442,
      data:
       { empno: 7844,
         ename: 'TURNER',
         job: 'SALESMAN',
         mgr: 7698,
         hiredate: '1981-09-08',
         sal: 1500,
         comm: 0,
         deptno: 30 } },
    { ref: Ref(Class("emp"), "212659274083467777"),
      ts: 1539066537773781,
      data:
       { empno: 7900,
         ename: 'JAMES',
         job: 'CLERK',
         mgr: 7698,
         hiredate: '1987-12-3',
         sal: 950,
         deptno: 30 } } ] }

Conclusion
We have traced the continuous evolution of database management systems over the last 50 years
and how disruption during the internet era has led to a shift from the traditional relational model to
the new generation of NoSQL databases. While ACID guarantees were taken for granted in traditional
relational databases, the first generation of NoSQL databases either had no support or supported a
very restrictive flavor of transactions. As a result, much of the complexity of ensuring the integrity and
consistency of data had to be pushed into the application layers. This made developing applications
on the first generation of NoSQL databases often cumbersome and eventually difficult to maintain. The
lack of complete transaction support coupled with the lack of enterprise readiness has prevented the
widespread adoption of NoSQL databases in mission-critical transactional applications that require
global scale.

Unlike those first-generation NoSQL databases, FaunaDB supports fully distributed ACID transactions
with serializable isolation across geographically distributed replicas. FaunaDB’s built-in relational-
like capabilities provide the operational readiness and robustness that are usually associated with
relational databases.

FaunaDB is the future of data-driven applications that require the best of both relational and NoSQL
worlds.

Appendix
CreateClass({name: "dept"});

Create(
  Class("dept"),
  { data:
    { "deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK" }
  });

Create(
  Class("dept"),
  { data:
    { "deptno": 20, "dname": "RESEARCH", "loc": "DALLAS" }
  });

Create(
  Class("dept"),
  { data:
    { "deptno": 30, "dname": "SALES", "loc": "CHICAGO" }
  });

Create(
  Class("dept"),
  { data:
    { "deptno": 40, "dname": "OPERATIONS", "loc": "BOSTON" }
  });

/* Insert into the Emp class */

CreateClass({name: "emp"});

Create(
  Class("emp"),
  { data:
    { "empno": 7839, "ename": "KING", "job": "PRESIDENT", "mgr": null,
      "hiredate": "1981-11-17", "sal": 5000, "comm": null, "deptno": 10 }
  });
Create(
  Class("emp"),
  { data:
    { "empno": 7698, "ename": "BLAKE", "job": "MANAGER", "mgr": 7839,
      "hiredate": "1981-5-1", "sal": 2850, "comm": null, "deptno": 30 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7782, "ename": "CLARK", "job": "MANAGER", "mgr": 7839,
      "hiredate": "1981-6-9", "sal": 2450, "comm": null, "deptno": 10 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7566, "ename": "JONES", "job": "MANAGER", "mgr": 7839,
      "hiredate": "1981-4-2", "sal": 2975, "comm": null, "deptno": 20 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7788, "ename": "SCOTT", "job": "ANALYST", "mgr": 7566,
      "hiredate": "1987-07-13", "sal": 3000, "comm": null, "deptno": 20 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7902, "ename": "FORD", "job": "ANALYST", "mgr": 7566,
      "hiredate": "1981-12-3", "sal": 3000, "comm": null, "deptno": 20 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7369, "ename": "SMITH", "job": "CLERK", "mgr": 7902,
      "hiredate": "1980-12-17", "sal": 800, "comm": null, "deptno": 20 }
  });
Create(
  Class("emp"),
  { data:
    { "empno": 7499, "ename": "ALLEN", "job": "SALESMAN", "mgr": 7698,
      "hiredate": "1981-02-22", "sal": 1600, "comm": 300, "deptno": 30 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7521, "ename": "WARD", "job": "SALESMAN", "mgr": 7698,
      "hiredate": "1981-02-22", "sal": 1250, "comm": 500, "deptno": 30 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7844, "ename": "TURNER", "job": "SALESMAN", "mgr": 7698,
      "hiredate": "1981-09-08", "sal": 1500, "comm": 0, "deptno": 30 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7876, "ename": "ADAMS", "job": "CLERK", "mgr": 7788,
      "hiredate": "1987-07-13", "sal": 1100, "comm": null, "deptno": 20 }
  });

Create(
  Class("emp"),
  { data:
    { "empno": 7900, "ename": "JAMES", "job": "CLERK", "mgr": 7698,
      "hiredate": "1987-12-3", "sal": 950, "comm": null, "deptno": 30 }
  });

/* Here we create the two indexes */

CreateIndex({name: "dept_by_name",
  source: Class("dept"),
  terms: [{field: ["data", "dname"]}],
  values: [{field: ["data", "deptno"]}]})

CreateIndex({name: "emp_by_deptno",
  source: Class("emp"),
  terms: [{field: ["data", "deptno"]}]})

/* Find all records in Emp for the department SALES */

Paginate(Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno")))

/* Get the full records from Emp */

Map(
  Paginate(Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno"))),
  Lambda("X", Get(Var("X"))))
