
NoSQL: A quick primer

If you work with databases, you’ve probably heard of NoSQL. Even if you haven’t, odds are that you depend on NoSQL databases more than you know, if not as a developer, as an end user. It’s becoming more and more popular with today’s largest companies for its flexibility and scalability, in areas ranging from gaming and e-commerce to big data and real-time web apps. The use cases for NoSQL are continuing to grow, and, with the availability of NoSQL database services in the cloud, the benefits that it provides are within the reach of all.

NoSQL databases have been around since the 1960s, under various names. However, their popularity began to surge, and the NoSQL label was attached, much more recently, as leading technology companies began adopting NoSQL databases for their ability to handle petabytes of rapidly changing, unstructured data. But what exactly is a NoSQL database and, more importantly, what can it do for you as a developer?

NoSQL defined
NoSQL is the name for a category of databases that are nonrelational in nature, meaning that data storage and retrieval aren’t handled using a predefined schema, with structured rows and columns, as with a relational database. Instead, NoSQL databases don’t require a predefined schema and employ data models that make them highly effective at handling unstructured, unpredictable data, often with blazing-fast query speeds. By design, most NoSQL databases also support horizontal scalability.

Common NoSQL data models
The most common types of NoSQL data models include:

• Key-value type, which pairs keys and values using a hash table, in a manner similar to how a file path points to a file containing some data. The key is used to reference the value, which can include any arbitrary value: for example, an integer, string, a JSON structure (aka a document), a JPEG, an array, and so on.

• Document databases extend the concept of the key-value database by organizing entire documents into groups often called collections. A key can be any attribute within the document, within which data is encoded using a standardized format, such as XML or JSON. (In general, key-value stores don’t support nested key-value pairs, whereas document databases do. What’s more, because document databases store their data in a format that the database can understand, they allow queries on any attribute within a document.)

• Columnar, or wide-column databases, which generally store the values of one or more columns together in a storage block. Unlike relational databases, a columnar database can efficiently store and query across rows that contain sparsely filled columns.

• Graph, which uses a data model based on nodes, edges, and properties to represent interconnected data, such as relationships between people in a social network.

It’s worth noting that most NoSQL databases can also handle highly structured data; they just aren’t limited to it, nor do you need to define a database schema ahead of time. Similarly, if you want to add new data types to a NoSQL database, unlike with a relational database, you won’t need to stop what you’re
doing, add new columns, and then move your data to the new schema. This can be a big advantage when it comes to agile development and more frequent software release cycles.

Horizontal scalability
Another factor that contributes to the rapid adoption of NoSQL databases is that they’re designed to scale out, or scale horizontally, which makes them capable of handling a virtually unlimited amount of data. That’s not to say that you can’t scale out a relational database, but it can get tricky. Many NoSQL databases, in comparison, have the inherent capabilities that allow them to scale out automatically and distribute their data over an arbitrary number of servers.

Replication
Most NoSQL databases are distributed and support some form of automatic replication, which can help maintain service availability in the event of a planned or unplanned outage. Replication also lets you distribute copies of your data across multiple geographies. For geographically distributed apps, this means that someone using an app in one part of the world can read from a local replica rather than waiting for data to be retrieved from the other side of the globe.

When to consider NoSQL
At this point you might be asking, “So when should I use a NoSQL database?” To answer this, it’s worth starting with an acknowledgment that “NoSQL” can also mean “Not only SQL.” As we’ve stated, NoSQL databases can also handle structured data, and can often be accessed using a structured query language like SQL. Although at first that might seem to muddy the picture when it comes to the SQL or NoSQL question, it really doesn’t; it just shifts the focus to what your app needs to do, and what’s required of your database to support that.

If you need to handle unstructured data at any scale, NoSQL might be a good place to start. Now consider the other characteristics of many NoSQL databases, such as low latency, horizontal scalability, and automatic replication. Clearly, these characteristics lend themselves well to a distributed app that requires fast performance across multiple geographic regions, achieved by using the enabling characteristics of NoSQL to put a copy of your data in each geography where your users reside. Similarly, the low latency of NoSQL makes it a strong candidate for delivering real-time customer experiences, like you might need for e-commerce or gaming. NoSQL is also proving popular in other scenarios, such as building serverless apps and implementing big data/analytics over operational data/transactional apps.

The takeaway here is that, at the end of the day, the decision on when to use NoSQL is about more than just whether your data is structured or not; it’s about what your app needs to do, and how easily and flexibly you can achieve that.

How to choose a NoSQL database

Flexibility in handling unstructured data, inherent horizontal scalability, and built-in replication are all reasons why NoSQL databases are becoming more and more popular. And with so many of them to choose from, developers can usually find one that’s well suited to their data. Such specialized, purpose-built NoSQL databases can also serve queries with blazing speed in many cases, which is critical in delivering real-time user experiences at scale; gaming and e-commerce are two good examples. However, that’s not to say that there aren’t some potential tradeoffs and other important
considerations associated with choosing a NoSQL database.

Programming models and APIs
If you’ve worked with relational databases, you’re probably aware that they’re not always a good match for the data structures you use when programming. Many NoSQL databases, however, are aggregate oriented, with an aggregate defined as a collection of data that you interact with as a unit, making them a much more natural fit for modern object-oriented programming languages.

As such, when it comes to choosing a NoSQL database, you’ll probably want to start by choosing a data model, and then evaluate the NoSQL databases that support it, along with the programming languages and SDKs that each database supports. Does the database lock you into a given SDK and language, or will you have a choice in the matter? And does the SDK have what you need to get the most out of your distributed database, such as transparent multihoming APIs to ensure that your app can properly operate in case of a planned or unplanned failover?

Consistency vs. latency
Because a replicated NoSQL database is, in effect, a distributed system, you’ll need to be aware of the CAP theorem. Also called Brewer’s theorem, it states that it’s impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

• Consistency: Ensuring that every request receives the most recent data

• Availability: Ensuring that every request receives a response

• Partition tolerance: Ensuring that the system continues to operate in the event of a failure between network nodes

The impossibility result of the CAP theorem proves that it’s impossible for such a system to both remain highly available and deliver linearizable consistency in the event of a network failure (in which replicas are unable to talk to each other). Similarly, the CAP theorem shows that, in the absence of a network failure, you can achieve both availability and consistency. However, even in the absence of a network failure, you still need to consider tradeoffs between consistency and latency (formally codified in the PACELC theorem), because data packets being sent over a network wire are unable to travel faster than the speed of light.

Some NoSQL databases don’t guarantee consistency. Most of them, however, let you choose from either end of the spectrum: strong consistency (you’ll get the latest data, but you might need to wait) or eventual consistency (you’ll get a fast response, but the data might be stale). Some NoSQL databases support other consistency levels, which typically fall in between those extremes. The key takeaway here is that, all other things being equal, the more flexibility and control you have in terms of consistency levels, and thus the tradeoffs between consistency and latency, the better off you’ll be.

On-premises vs. cloud, and which cloud?
NoSQL databases have been around for years, so you can find many that were designed to run on-premises. However, it’s worth noting that NoSQL databases really started becoming popular with the advent of the cloud, and for good reason: their distributed nature and horizontal scalability make them an ideal fit. In fact, odds are that, regardless of the data model you choose, you’ll find several cloud options. But as you’re probably aware, not all clouds are created equal. So how do you choose?

In approaching this decision, in addition to programming languages/APIs and consistency/latency tradeoffs, you might want to consider the following:

• Supported data models. Does the cloud provider support all the data models that I might want to use? And if so, will I need to juggle a bunch of different database services?

• Deployment and operations. How easily can I deploy my database, and then replicate it to other regions if needed? How tedious are the setup and maintenance requirements? Do I get a fully managed service, or will I need to worry about patching and planned downtime?

• Geographic presence. Where are the cloud provider’s datacenters? Can I put my data where I want it? How will I handle important regulatory and data sovereignty issues, such as the European Union’s new General Data Protection Regulation?

• Ease of replication. What’s the process for replicating my database to a different geographic region? How complex is the process, and how long will it take?

• Scalability. How will I secure the database resources required to ensure adequate performance, and scale for growth? Will I need to pre-provision and pay for resources that I might never use, or can I scale up and down on demand to handle unpredictable workloads?

• High availability. What will happen in the event of an unexpected failure? Is high availability built into the service, or will it be an added complication that I’ll need to worry about?

• Service levels. Does the cloud service guarantee a certain level of availability? Does it have any latency guarantees? And if so, are they “empty promises” or are they financially backed?

• Ecosystem. How tightly integrated is the database with the rest of the cloud platform? Does it provide all the services I need, and can they be quickly stitched together to build a complete solution?

Finally, in selecting a NoSQL database service, it’s worth taking a step back and examining its cloud platform as a whole. Rarely does any database exist in isolation, so you’ll want to make sure the service you choose, and the platform upon which it resides, can provide everything that you’ll need to put your NoSQL database to use. The specific services you’ll need will depend on your app, such as the ability to integrate your NoSQL database with other app components via serverless functions. Other cloud services that you might need are more scenario specific, such as those for ingesting massive volumes of IoT data, implementing real-time streaming analytics, or building AI into your apps. And don’t forget about ease of integration, such as triggering a serverless function when your NoSQL data changes. After all, even if a cloud platform provides all the services you need, you don’t want to tie them together with paper clips and glue.

Azure Cosmos DB: A globally distributed, multi-model database

Azure Cosmos DB, the globally distributed database service from Microsoft, is a lot more than just another NoSQL database. It provides native support for all major NoSQL data models (key-value, document, graph, and columnar), exposed through multiple APIs so you can use familiar tools and frameworks. Azure Cosmos DB also delivers turnkey global distribution, multi-master support, and elastic scaling of throughput and storage, making it ideal for apps that require extremely low latency, anywhere in the world.

With Azure Cosmos DB, you get things that you can’t find anywhere else. It’s the only database service that offers five well-defined consistency levels, enabling you to avoid the all-or-nothing tradeoffs you face with most NoSQL databases. It even indexes your data for you as it’s ingested, without requiring you to deal with schema or index management. And it delivers guaranteed high availability and low latency, all backed by industry-leading service-level agreements (SLAs).

Best of all, because Azure Cosmos DB is a fully managed Microsoft Azure service, you won’t need to manage virtual machines, deploy and configure software, or deal with upgrades. Every database is automatically backed up, protected against regional failures, and encrypted, so you won’t have to worry about those things either, leaving you with even more time to focus on your app.

Check out our technical training series
This seven-part webinar series covers the following topics:

• Technical overview of Azure Cosmos DB

• Build real-time personalized experiences with AI and serverless technology

• Using the Gremlin and Table APIs with Azure Cosmos DB

• Build or migrate your MongoDB app to Azure Cosmos DB

• Understanding operations of Azure Cosmos DB

• Build serverless apps with Azure Cosmos DB and Azure Functions

• Apply real-time analytics with Azure Cosmos DB and Spark

A brief history of Azure Cosmos DB
As a cloud service, Azure Cosmos DB is built from the ground up for multitenancy, elastic scalability, high availability, and global distribution, with low latencies and intuitive, predictable consistency levels. The work began in 2010, when developers at Microsoft set out to build a database that could meet those fundamental requirements for internal global apps. The result was a new fully managed nonrelational database service called Azure DocumentDB.

Seven years later, we announced Azure Cosmos DB, the first globally distributed, multi-model database service for building planet-scale apps. Since then, we’ve added support for new APIs, a native Apache Spark connector, the Azure Cosmos DB Change Feed Processor Library (which provides a sorted list of documents in the order in which they were modified), support for Azure Cosmos DB in
the Azure Storage Explorer, and a number of features for monitoring and troubleshooting.

In January 2018, InfoWorld’s 2018 Technology of the Year awards recognized Azure Cosmos DB, zeroing in on its “innovative approach to the complexities of building and managing distributed systems.”

So just how did we achieve this? By design, Azure Cosmos DB does three things very well:

• Partitioning, which is what enables elastic scale out of storage and throughput.

• Replication, which enables turnkey global distribution, augmented with a set of well-defined consistency levels to let you tune consistency versus performance.

• Resource governance, through which Azure Cosmos DB can offer comprehensive SLAs encompassing the four dimensions of global distribution that customers care about the most: throughput, latency at the ninety-ninth percentile, availability, and consistency.

Key features and capabilities
To understand how you can use Azure Cosmos DB to build infinitely scalable, highly responsive global apps, it’s worth looking at its key capabilities in more detail. Later in this e-book, we’ll take a deeper dive into many of these same concepts.

Multiple data models. Azure Cosmos DB is the only fully managed service that natively supports document, graph, key-value, and columnar NoSQL data models, all in one place. These data models are supported through the following APIs, with SDKs available in multiple languages:

• SQL API: An API for accessing the core schema-less JSON document-oriented database engine with rich SQL querying capabilities.

• Azure Cosmos DB API for MongoDB: An API for accessing the document-oriented, massively scalable MongoDB-as-a-service that you can use to easily move existing MongoDB apps to the cloud. The MongoDB API enables connectivity between Azure Cosmos DB and existing MongoDB libraries, drivers, tools, and apps.

• Cassandra API: An API for accessing the column-based, globally distributed Cassandra-as-a-service, which makes it easy to move existing Apache Cassandra apps to the cloud. The Cassandra API enables connectivity between Azure Cosmos DB and existing Cassandra libraries, drivers, tools, and apps.

• Gremlin (graph) API: An API to the fully managed, horizontally scalable database service that supports Open Graph APIs (based on the Apache TinkerPop specification).

• Azure Table API: An API built to provide automatic indexing, guaranteed low latency, global distribution, and other features of Azure Cosmos DB to existing Azure Table storage apps with very minimal effort.

Figure 1. Azure Cosmos DB natively supports document, graph, key value, and columnar data models.
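
To make the four data models a little more concrete, here is a minimal sketch in plain Python that holds similar user data in each shape. It uses only in-memory structures and does not call any Azure Cosmos DB SDK; every name in it (`kv_store`, `find_by_city`, `column_values`, `neighbors`) is hypothetical and chosen for illustration only.

```python
import json

# Key-value: the store only understands the key; the value is an opaque blob.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document: the store understands the structure, so nested attributes are
# queryable and documents in one collection need not share the same fields.
users = [
    {"id": "42", "name": "Ada", "address": {"city": "London"}},
    {"id": "43", "name": "Lin", "address": {"city": "Taipei"}, "tags": ["vip"]},
]

def find_by_city(collection, city):
    """Return the documents whose nested address.city equals `city`."""
    return [d for d in collection if d.get("address", {}).get("city") == city]

# Wide-column: each row keeps only the columns it actually has (sparse rows).
wide_rows = {
    "42": {"name": "Ada", "city": "London"},
    "43": {"name": "Lin", "city": "Taipei", "tier": "vip"},
}

def column_values(rows, column):
    """Return {row_key: value} for the rows that contain `column`."""
    return {key: cols[column] for key, cols in rows.items() if column in cols}

# Graph: nodes connected by labeled edges (source, label, destination).
edges = [("42", "knows", "43")]

def neighbors(edge_list, node, label):
    """Return the nodes reachable from `node` over edges with `label`."""
    return [dst for src, lbl, dst in edge_list if src == node and lbl == label]
```

The point of the sketch is the difference in what the store can "see": the key-value entry must be fetched and decoded by the caller, while the document, wide-column, and graph shapes each allow a different kind of query against the stored structure.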

Turnkey global distribution. Azure Cosmos DB is the only database service that delivers turnkey global distribution. It lets you distribute your data to any number of Azure regions with just a few mouse clicks, keeping your data close to your users to maximize app performance. With the Azure Cosmos DB multihoming APIs, your app always knows where the nearest copy of your data resides, without any configuration changes, even as you add and remove regions.

Multi-master support. With multi-master support (multi-region writes), you can write data to any region associated with your Azure Cosmos DB account and have those updates propagate asynchronously, enabling you to seamlessly scale both write and read throughput anywhere around the world. You’ll get single-digit millisecond write latencies at the ninety-ninth percentile, 99.999 percent write (and read) availability, and comprehensive and flexible built-in conflict resolution. Multi-master support is crucial for building globally distributed apps and significantly simplifies their development.

Limitless, elastic scale out of storage and throughput. With Azure Cosmos DB, you pay only for the storage and throughput that you need, and can independently and elastically scale storage and throughput at any time, across the globe.

Guaranteed low latency. With its latch-free, write-optimized database engine, Azure Cosmos DB delivers guaranteed low latency. For a typical 1-KB item, reads are guaranteed to be under 10 milliseconds at the ninety-ninth percentile; indexed writes are guaranteed to be under 10 milliseconds at the ninety-ninth percentile, within the same Azure region. Median latencies are even lower, at under 5 milliseconds.

Five well-defined consistency options. Azure Cosmos DB is the only database service that offers five well-defined, practical, and intuitive consistency levels, ranging from strong to eventual. In between those two extremes, you get three intermediate consistency levels to choose from (bounded staleness, consistent-prefix, and session), enabling you to fine-tune the tradeoffs between consistency and latency for your app.

No schema or index management. Azure Cosmos DB lets you rapidly iterate without worrying about schemas or indexes. The Azure Cosmos DB database engine is schema agnostic, and Azure Cosmos DB is the only database service that automatically indexes all the data it ingests, resulting in blazing-fast queries. It works across all supported data models, without the need for schemas or secondary indexes.

Global presence. As a foundational Azure service, Azure Cosmos DB is available in all regions where Azure is available, currently 54 regions worldwide.

Industry-leading security and compliance. When you choose Azure Cosmos DB, you run on Microsoft Azure, the world’s most trusted cloud, with more compliance offerings than any other cloud provider. Data within Azure Cosmos DB is always encrypted, both at rest and in motion, as are indexes, backups, and attachments. Encryption is enabled by default, in a manner that’s transparent to your app and has no impact on performance, throughput, or availability.

“Always on” availability. Azure Cosmos DB provides a 99.99 percent availability SLA for all single-region accounts and a 99.999 percent read availability SLA for all multi-region accounts. Automatic failover helps protect against the unlikely event of a regional outage, with all SLAs maintained. You can prioritize failover order for multi-region accounts and can manually trigger failover to test the end-to-end availability of your app, with guaranteed zero data loss.
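
The two extremes of the consistency spectrum, strong and eventual, can be illustrated with a toy model. The sketch below is not how Azure Cosmos DB is implemented and does not model the three intermediate levels. It assumes a single primary copy and one replica that lags behind until replication runs, and the class and method names (`ToyReplicatedStore`, `read_eventual`, `read_strong`) are hypothetical.

```python
class ToyReplicatedStore:
    """A primary copy plus one replica that lags until replicate() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes accepted by the primary, not yet on the replica

    def write(self, key, value):
        """Accept a write on the primary and queue it for the replica."""
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Apply all queued writes to the replica, in the order they were made."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read_eventual(self, key):
        """Fast local read: answers immediately, but may return stale data."""
        return self.replica.get(key)

    def read_strong(self, key):
        """Read that first waits for the replica to catch up.

        The wait is modeled by applying every pending write before answering,
        which is where the extra latency would come from in a real system.
        """
        self.replicate()
        return self.replica.get(key)


store = ToyReplicatedStore()
store.write("cart", 1)
store.replicate()
store.write("cart", 2)                # accepted, but not yet replicated

stale = store.read_eventual("cart")   # replica still holds the old value
fresh = store.read_strong("cart")     # catches up first, then answers
```

In this model the eventual read returns the older value because it never waits, while the strong read pays for freshness by waiting on replication. That is the same consistency-versus-latency tradeoff the intermediate levels let you tune.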

Unmatched, enterprise-grade SLAs. With Azure Cosmos DB, you can rest assured that your apps are running on an enterprise-grade database service. In fact, Azure Cosmos DB is the first and only database service to offer industry-leading, financially-backed SLAs for 99.999 percent high availability, latency at the ninety-ninth percentile, guaranteed throughput, and consistency.

Figure 2. Azure Cosmos DB offers industry-leading, financially backed SLAs
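
To put availability percentages such as 99.99 and 99.999 percent in perspective, it can help to convert them into the downtime they would allow. The sketch below is plain arithmetic, not an official SLA calculation. It assumes a 365-day year, and the function name `max_downtime_minutes` is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime (in minutes) that a given availability level allows."""
    return period_minutes * (1 - availability_percent / 100)


four_nines = max_downtime_minutes(99.99)    # about 52.6 minutes per year
five_nines = max_downtime_minutes(99.999)   # about 5.3 minutes per year
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the step from 99.99 to 99.999 percent matters for apps that must stay reachable.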

Common use cases

Now that we’ve covered the key features and capabilities of Azure Cosmos DB, just how can you put them to use? As a fully managed, multi-model database service, Azure Cosmos DB is a good choice for a broad range of apps. It’s especially well suited for event-driven serverless apps that require low latency and that might need to scale rapidly and globally. Add in its support for multiple data models and APIs and five consistency levels, and you have a NoSQL-compatible database service capable of supporting almost any scenario where a traditional relational database isn’t a good fit.

That said, here are common scenarios where Microsoft customers are using Azure Cosmos DB:

• Globally distributed apps. Azure Cosmos DB lets you build modern apps at a global scale, ensuring uncompromised performance no matter where your users are. You can easily put copies of your data in regions across the world, knowing you’ll get guaranteed low latencies and built-in failover to ensure high availability and disaster recovery.

• Real-time customer experiences. The guaranteed low latency provided by Azure Cosmos DB makes it ideal for delivering real-time customer experiences and other latency-sensitive apps. And when you use Azure Cosmos DB together with Azure Databricks for its advanced analytics and machine learning capabilities, you can build apps that provide personalization and real-time recommendations.

• Internet of things (IoT). Azure Cosmos DB lets you accommodate diverse and unpredictable IoT workloads, enabling you to scale instantly and elastically to handle sustained, write-heavy data ingestion, all with uncompromised query performance.

• E-commerce. Azure Cosmos DB supports flexible schemas and hierarchical data, making it well suited for storing product catalog data where different products have different attributes. This is one of the reasons why Azure Cosmos DB is used extensively in Microsoft’s own e-commerce platforms.

• Gaming. Modern games rely on the cloud to deliver personalized content like in-game stats, social media integration, and leaderboards. Through its low-latency reads and writes, Azure Cosmos DB can help deliver an engaging, uncompromised in-game experience across large and changing user bases. At the same time, its instant, elastic scalability enables it to easily support the traffic spikes that are likely to occur during new game launches, online tournaments, and feature updates.

• Serverless apps. Azure Cosmos DB integrates natively with Azure Functions, making it easy to build event-driven, serverless apps that let you seamlessly scale data ingestion, throughput, and data volumes. Your data will be made available immediately and indexed automatically, with stable ingestion rates and query performance. And with the change feed support in Azure Cosmos DB, you can easily use changes in your data to kick off other actions and/or synchronize multiple data models in your event-driven app.

• Big data and analytics. Azure Cosmos DB integrates effortlessly with Azure Databricks for advanced analytics via Apache Spark, enabling you to implement machine learning at scale across fast-changing, high-volume, globally distributed data. The Spark to Azure Cosmos DB connector lets Azure Cosmos DB act as an input source or output sink for Spark jobs and can even push down predicate filtering to indexes within Azure Cosmos DB to improve the efficiency of Spark jobs.

• Migration of existing NoSQL workloads to the cloud. Azure Cosmos DB makes it easy to migrate existing NoSQL workloads to the cloud, in many cases with no more than a change to a connection string in your app. With the Azure Cosmos DB MongoDB and Cassandra APIs, you can migrate on-premises MongoDB and Cassandra databases to Azure Cosmos DB, respectively, then continue to use your existing tools, drivers, libraries, and SDKs. You won’t need to spend any more time managing an on-premises database and will benefit from all that Azure Cosmos DB brings to the table. The videos on the Azure Cosmos DB YouTube channel can help you get started.

On the following pages, we take a deeper look at these and other key capabilities of Azure Cosmos DB, including how they work and how to put them to use. We’re confident that, by the time you finish reading, you’ll be ready to choose an API and go hands-on with Azure Cosmos DB. Or, if you prefer to learn by doing, you can skip forward to Choosing a data model and API, get started with your chosen API, and refer to the Key Concepts section of this e-book on an as-needed basis.

If you’d prefer to watch a video, many of these same concepts are also covered in the first webinar in the Azure Cosmos DB Technical Training series.
