
NoSQL Databases

 A database is usually defined as a collection of data, and the system that handles the data, transactions, problems and issues of the database is known as a Database Management System (DBMS).
 Databases were first developed in the 1960s to satisfy the need for storing and finding data.
 The relational database has been the foundation of enterprise data management for over thirty years.

But the way we build and run applications today, coupled with unrelenting growth in new data sources and user loads, is pushing relational databases beyond their limits. This is compelling more and more organizations to migrate to alternatives.

Introduction

 "NoSQL database" is a new breed of database management system which doesn't use the
relational model — it uses a wide variety of data models. The actual data model that it uses
depends on the database.
 When people use the term “NoSQL database”, they typically use it to refer to any non-relational
database. Some say the term “NoSQL” stands for “non SQL” while others say it stands for “not only
SQL.”
 NoSQL database is a highly scalable and flexible database management system.
 It allows the user to store and process unstructured data and flexible-structured data.
 NoSQL systems don’t generally provide the same level of data consistency as SQL databases. In fact, while SQL databases have traditionally sacrificed scalability and performance for the ACID properties, NoSQL databases trade strict consistency for high speed and scalability.

 A common misconception is that NoSQL databases or non-relational databases don’t store relationship data well. This is not true - they just store it differently than relational databases do. In fact, many find modeling relationship data in NoSQL databases to be easier than in SQL databases, because related data doesn’t have to be split between tables. NoSQL data models allow related data to be nested within a single data structure.
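For instance, a blog post and its comments, which an RDBMS would split into two tables, can live in one document. A minimal sketch (the field names are illustrative, not taken from this text):

{
  "title": "Why NoSQL?",
  "author": "kamal",
  "comments": [
    { "user": "asha", "text": "Very helpful" },
    { "user": "ravi", "text": "Nice overview" }
  ]
}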

 Although NoSQL databases have been around since the 1960s, it wasn't until the early 2000s that
the NoSQL approach started to pick up steam, and a whole new generation of NoSQL systems was
born.

 In the present application development scenario we generally observe that:

o As storage costs rapidly decrease, applications need to store more data, and the demand for efficient queries increases.
o Data comes in all shapes and sizes - structured, semi-structured, and polymorphic.

So defining the schema in advance has become nearly impossible. NoSQL databases allow developers to store huge amounts of unstructured data, giving them a lot of flexibility.

 On one hand, there is the relational database management system (RDBMS), which offers strong consistency, powerful query capabilities, and a lot of accumulated knowledge and expertise over the years. On the other hand, there is the NoSQL approach, which offers higher scalability, i.e. it can run faster and support bigger loads.
 NoSQL databases in general don’t support complicated queries and don’t have a structured schema. They recommend de-normalization and are designed to be distributed (cloud-scale). Because of the distributed model, any server can answer any query, but the server that answers our query might not have the latest data.

NoSQL characteristics

1. Can handle large data volume.


2. Scalable replication and distribution - A NoSQL database automatically spreads data across multiple servers without requiring application assistance. Servers can be added or removed from the data layer without application downtime.
3. Schema-less: Data can be inserted in a NoSQL database without first defining a rigid database schema. The format of the data being inserted can be changed anytime without application disruption. This provides immense application flexibility, which in turn delivers substantial business flexibility.
4. Open-source development.

NoSQL Database Types


Several different varieties of NoSQL databases have been created to support specific needs and use
cases. There are four NoSQL Database types: key-value store, document store, column-oriented
database, and graph database. Each type solves a problem that can’t be solved with relational
databases. Actual implementations are often combinations of these.

1. Column-Oriented Store (Wide Column Store)


Traditional relational databases are row-oriented, with each row having a row-id and each field within the row stored together in a table. Let’s say, for example’s sake, that no extra data about hobbies is stored and you have only a single table to describe people, as shown in the figure.

Every time you look something up in a row-oriented database, every row is scanned, regardless of which columns you require. Let’s say we only want a list of birthdays in September. The database will scan the table from top to bottom and left to right, as shown in the figure, eventually returning the list of birthdays.

Column databases store each column separately, with the related row numbers. Every entity (person)
is divided over multiple tables, allowing for quicker scans when only a small number of columns are
involved.

A column database maps the data to the row numbers; in that way counting becomes quicker, so it’s
easy to see how many people like archery, for instance. Storing the columns separately also allows for
optimized compression because there’s only one data type per table.

The column-oriented database shines when performing analytics and reporting: summing values and counting entries. A row-oriented database is often the operational database of choice for actual transactions (such as sales). Overnight batch jobs bring the column-oriented database up to date, supporting lightning-speed lookups and aggregations using MapReduce algorithms for reports. Examples of column-family stores are Apache HBase, Cassandra (originally developed at Facebook), Hypertable, and the grandfather of wide-column stores, Google BigTable.

2. Key-Value Store

 The database stores data as a collection of key/value pairs - each item contains a key and a value. A simple phone directory is a classic example of a key-value database.
 A value can typically only be retrieved by referencing its key.

Example: Redis, Voldemort, Riak, and Amazon’s Dynamo are popular key-value databases.

Some popular use cases of key-value databases:
 Storing user session data
 Maintaining schema-less user profiles
 Storing user preferences
 Storing shopping cart data

However, key-value databases are not the ideal choice for every use case, namely when:
 We have to query the database by a specific data value.
 We need relationships between data values.
 We need to operate on multiple unique keys.
 Our business needs to update a part of the value frequently.
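As a concrete sketch, a session store might hold entries like the following (the keys and values are illustrative assumptions):

session:1001 -> { "user": "amal", "cart": ["pen", "book"] }
session:1002 -> { "user": "sourav", "cart": [] }

Each value is opaque to the store: it can only be fetched or replaced via its key, which is exactly the limitation the list above describes.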
3. Document Store
 Document store NoSQL databases are similar to key-value databases. The only difference is that the value contains structured or semi-structured data.
 This structured/semi-structured value is referred to as a document and can be in XML, JSON or BSON format.
 Its associated key is the unique identifier for that value.

A document store does assume a certain document structure that can be specified with a schema.
Document stores appear the most natural among the NoSQL database types because they’re designed
to store everyday documents as is.

For example, say we want to store newspaper or magazine articles in a database.


In an RDBMS the following tables will be needed:
1. Article - the article text goes in one table
2. Author - all the information about the author
3. Reader - all the information about readers
4. Comment - comments on the article when published on a website

In a document store, a newspaper article can instead be stored as a single entity, as shown in the figure; this lowers the burden of working with data spread across different tables for those used to seeing articles as a whole.
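A hedged sketch of such a single article document (the field names are assumptions):

{
  "title": "NoSQL on the Rise",
  "author": { "name": "A. Saha", "email": "a.saha@example.com" },
  "body": "Article text goes here...",
  "comments": [
    { "reader": "R. Das", "text": "Good read" }
  ]
}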

Examples of document stores are MongoDB and Apache CouchDB.

Document store databases are preferable for:
 E-commerce platforms
 Content management systems
 Analytics platforms
 Blogging platforms

Document store NoSQL databases are not the right choice if you have to run complex search queries or if your application requires transactions spanning multiple operations.

The MongoDB JSON Connection

JSON and BSON are close cousins, as their nearly identical names imply, but you wouldn’t know it by
looking at them side-by-side. JSON, or JavaScript Object Notation, is the wildly popular standard for
data interchange on the web, on which BSON (Binary JSON) is based.

Because JSON is very common, MongoDB’s designers chose JSON for representing data structures in the document model. However, there are several issues that make JSON less than ideal for usage inside of a database:
1. JSON is a text-based format, and text parsing is very slow
2. JSON’s readable format is far from space-efficient, another database concern
3. JSON only supports a limited number of basic data types

In order to make MongoDB JSON-first, but still high-performance and general-purpose, BSON was
invented to bridge the gap: a binary representation to store data in JSON format, optimized for speed,
space, and flexibility.

BSON’s binary structure encodes type and length information, which allows it to be parsed much more
quickly. Since its initial formulation, BSON has been extended to add some optional non-JSON-native
data types, like dates and binary data, without which MongoDB would have been missing some
valuable support.

MongoDB stores data in BSON format both internally, and over the network, but that doesn’t mean
you can’t think of MongoDB as a JSON database. Anything you can represent in JSON can be natively
stored in MongoDB, and retrieved just as easily in JSON.
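For example, in the mongo shell you write JSON-like syntax but can use BSON-specific types such as dates and 64-bit integers (a sketch; the collection and field names are assumptions):

db.events.insertOne({ title: "Launch", at: new Date("2020-10-25"), views: NumberLong(42) })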

4. Graph databases

 Graph databases store data in nodes and edges.

 Nodes typically store information about people, places, and things, while edges store information about the relationships between the nodes.
 Graph databases excel in use cases where you need to traverse relationships to look for patterns, such as social networks, fraud detection, and recommendation engines.
 Neo4j and JanusGraph are examples of graph databases.

Graph databases are basically built upon the Entity – Attribute – Value model. Entities are also known
as nodes, which have properties. It is a very flexible way to describe how data relates to other data.

Nodes store data about each entity in the database, relationships describe how nodes are connected, and properties are key-value pairs of information attached to nodes and relationships. Whereas a traditional database stores a description of each possible relationship in foreign key fields or junction tables, graph databases allow for virtually any relationship to be defined on-the-fly.

Graph base NoSQL databases are usually used in:


 Fraud detection
 Graph based search
 Network and IT operations
 Social networks, etc

When should NoSQL be used:


1. When a huge amount of data needs to be stored and retrieved.
2. The relationship between the data you store is not that important.
3. The data changes over time and is not structured.
4. Support for constraints and joins is not required at the database level.
5. The data is growing continuously and you need to scale the database regularly to handle it.

We will mainly focus our study on MongoDB. So let us give you a brief introduction of MongoDB.

MongoDB
 MongoDB is a highly flexible and scalable open-source NoSQL database management platform that uses a document-oriented database model. MongoDB is written in C++ and supports various forms of data.
 It was developed as a solution for working with large volumes of distributed data that cannot be processed effectively in relational models.
 MongoDB stores data in flat files using its own binary storage objects, so data storage is very compact and efficient - well suited to high data volumes. MongoDB stores data in JSON-like documents, which makes the database very flexible and scalable.
 Each MongoDB database contains collections, which in turn contain documents. Each document can have a varying number of fields, and documents can differ from one another in size and content.

The figure shows the relationship of RDBMS terminology with MongoDB.

A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.
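A hedged sketch of such a document (all field names are illustrative):

{
  "_id": 1,
  "name": "Amal Saha",
  "address": { "city": "Kolkata", "pin": "700001" },
  "phones": ["9432114134", "9830011122"],
  "courses": [ { "code": "CS101", "grade": "A" } ]
}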

Some Key Features of MongoDB Include:

 High Performance : MongoDB provides high performance data persistence. In particular,

o Support for embedded data models reduces input and output operations, and hence reduces I/O activity on the database system.
o Indexes support faster queries.
o Its query language is rich, supporting text search, aggregation features, and CRUD operations.

 High Availability - MongoDB’s replication facility is called replica sets. A replica set is a group of
MongoDB servers that maintain the same data set, providing redundancy and increasing data
availability. Replication ensures automatic failover and high availability.
 Automatic Scaling - MongoDB provides horizontal scalability as part of its core functionality. It features sharding, which makes horizontal scalability possible by distributing data across a cluster of machines. This supports increasing data needs at a cost that is lower than vertical methods of handling system growth.
 It employs multiple storage engines, thereby ensuring the right engine is used for the right
workload, which in turn enhances performance.

Note:
Horizontal scaling means we scale by adding additional machines to our existing pool of resources. Vertical scaling means we scale by adding more computing power, such as CPU and RAM, to an existing machine.

Advantages of MongoDB over RDBMS

 Flexible schemas: MongoDB is a document database in which one collection holds different documents. The number of fields, and the content and size of documents, can differ from one document to another.
 Structure of a single object is clear
 No complex joins
 Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL (see the example after this list)
 Ease of scale-out: MongoDB is easy to scale
 Conversion / mapping of application objects to database objects not needed
 Uses internal memory for storing the (windowed) working set, enabling faster access of data
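As an illustration of such dynamic query-ability, the following shell query filters on fields without any predefined schema (a sketch reusing the employee example that appears later in these notes):

db.employee.find({ gender: 'M', status: 'A' })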

Where should MongoDB be used?

 Big Data
 Content Management and Delivery
 Mobile and Social Infrastructure
 User Data Management
 Data Hub

Limitations of MongoDB

While MongoDB incorporates great features to deal with many of the challenges in big data, it comes
with some limitations, such as:

1. To use joins, you have to manually add code (see the sketch after this list), which may cause slower execution and less-than-optimum performance.
2. Lack of joins also means that MongoDB requires a lot of memory as all files have to be mapped
from disk to memory.
3. Document sizes cannot be bigger than 16MB.
4. The nesting functionality is limited and cannot exceed 100 levels.
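Regarding point 1, the usual way to hand-code a join is the $lookup stage of the aggregation pipeline (available since MongoDB 3.2), which performs a left outer join. A hedged sketch with assumed collection and field names:

db.orders.aggregate([
  { $lookup: {
      from: "customers",      // collection to join with
      localField: "custId",   // field in orders
      foreignField: "_id",    // matching field in customers
      as: "customer"          // name of the output array field
  } }
])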

MongoDB Atlas vs MongoDB Compass: What are the differences?


MongoDB Atlas is a global cloud database service built and run by the team behind MongoDB. That means that Atlas takes responsibility for hosting, patching, managing and securing your MongoDB cluster, leaving you free to put it to good use.

Atlas makes installing MongoDB as easy as clicking a button and answering 4 questions. Once that is
complete you will have a MongoDB cluster running a few minutes later. Creating users and allocating
limited permissions is easy and done through a nice UI.

Atlas also handles growing/shrinking your cluster when the need arises, and patching/upgrading your MongoDB cluster when a new version is released. MongoDB Atlas belongs to the "MongoDB Hosting" category of the tech stack.

MongoDB Atlas is generally recommended to any company that has a significant need for a NoSQL database and does not want to manage its own infrastructure. Using MongoDB Atlas can significantly reduce management time and cost, which saves valuable resources for other tasks. It also suits smaller companies, as MongoDB Atlas scales up and down very quickly.

On the other hand, MongoDB Compass is detailed as "A GUI for MongoDB". Visually explore your data.
Run ad hoc queries in seconds. Interact with your data with full CRUD functionality. View and optimize
your query performance.
MongoDB Compass can be primarily classified under "Database Tools".
MongoDB is better placed in large projects with great scalability needs. It also allows you to work quite comfortably with projects based on programming languages such as JavaScript (Angular), TypeScript and C#. When MongoDB is used, MongoDB Compass may be used as a tool.

Hadoop Vs. MongoDB: What Should You Use for Big Data?

No discussion on Big Data is complete without bringing up Hadoop and MongoDB, two of the most
prominent software programs that are available today.

What is Hadoop?

Hadoop is an open-source set of programs that you can use and modify for your big data processes. It is made up of 4 modules, each of which performs a specific task related to big data analytics. These modules are:
 Distributed File System (HDFS)
 MapReduce
 Hadoop Common
 Hadoop YARN

Why Should We Use Hadoop?

Here for your consideration are six reasons why Hadoop may be the best fit for your company and its
need to capitalize on big data.

1. You can quickly store and process large amounts of varied data generated from the internet of
things and social media.
2. The Distributed File System gives Hadoop high computing power necessary for fast data
computation.
3. Hadoop protects against hardware failure by redirecting jobs to other nodes and automatically
storing multiple copies of data.
4. You can store a wide variety of structured or unstructured data (including images and videos)
without having to preprocess it.
5. The open-source framework runs on commodity servers, which are more cost-effective than
dedicated storage.
6. Adding nodes enables a system to scale to handle increasing data sets. This is done with little
administration.

Limitations of Hadoop

1. Due to its programming model, MapReduce is suitable for simple requests that can be handled as independent units, but it is not as effective with interactive and iterative tasks. Unlike independent tasks that need a simple sort and shuffle, iterative tasks require multiple map and reduce phases to complete. As a result, numerous files are created between the map and reduce phases, making it inefficient for advanced analytics.
2. Only a few entry-level programmers have the Java skills necessary to work with MapReduce. This has seen providers rushing to put SQL on top of Hadoop, because programmers skilled in SQL are easier to find.

3. Hadoop is a complex application and requires a deep level of knowledge to enable functions such as security protocols. Also, Hadoop lacks storage and network encryption.
4. Hadoop does not provide a full suite of tools necessary for handling metadata or for managing,
cleansing, and ensuring data quality.
5. Its complex design makes it unsuitable for handling smaller amounts of data since it can't support
random reading of small files efficiently.
6. Because Hadoop’s framework is written almost entirely in Java, a language frequently targeted by cyber-criminals, the platform poses notable security risks.

Both Hadoop and MongoDB offer more advantages compared to the traditional relational database
management systems (RDBMS), including parallel processing, scalability, ability to handle aggregated
data in large volumes, MapReduce architecture, and cost-effectiveness due to being open source.
Moreover, they process data across nodes or clusters, saving on hardware costs.

However, in the context of comparing them to RDBMS, each platform has some strengths over the
other. We discuss them in detail below:
RDBMS Replacement
MongoDB: A flexible platform that can make a suitable replacement for RDBMS.
Hadoop: Hadoop cannot replace RDBMS but rather supplements it by helping to archive data.

Memory Handling
MongoDB: MongoDB is a C++ based database, which makes it better at memory handling.
Hadoop: Hadoop is a Java-based collection of software that provides a framework for storage, retrieval, and processing. Hadoop optimizes space better than MongoDB.

Data Import and Storage
MongoDB: Data is stored as JSON, BSON, or binary, and all fields can be queried, indexed, aggregated, or replicated at once. Additionally, data in MongoDB has to be in JSON or CSV formats to be imported.
Hadoop: Hadoop accepts various formats of data, thus eliminating the need for data transformation during processing.

Big Data Handling
MongoDB: MongoDB was not built with big data in mind.
Hadoop: Hadoop, on the other hand, was built for that sole purpose. As such, it is great at batch processing and running long ETL jobs. Implementing MapReduce on Hadoop is more efficient than in MongoDB, again making it a better choice for analysis of large data sets.

Real-time Data Processing
MongoDB: MongoDB handles real-time data analysis better and is also a good option for client-side data delivery due to its readily available data. Additionally, MongoDB's geospatial indexing makes it ideal for gathering and analyzing GPS or geographical data in real time.
Hadoop: Hadoop is not very good at real-time data handling, but if you run Hadoop SQL-like queries on Hive, you can make data queries with a lot more speed and effectiveness than JSON.

Each company and individual comes with its own unique needs and challenges, so there’s no such thing
as a one-size-fits-all solution. When determining something like Hadoop vs. MongoDB, you have to
make your choice based on your unique situation.

You could take a look and see which big companies use which platform and try to follow their example. For instance, eBay, SAP, Adobe, LinkedIn, McAfee, MetLife, and Foursquare use MongoDB. On the other hand, Microsoft, Cloudera, IBM, Intel, Teradata, Amazon, and MapR Technologies are counted among notable Hadoop users.

Ultimately, both Hadoop and MongoDB are popular choices for handling big data. However, although they have many similarities (e.g., open-source, NoSQL, schema-free, and MapReduce support), their approach to data processing and storage is different. It is precisely this difference that finally helps us determine the best choice between Hadoop and MongoDB.

Cassandra vs MongoDB
You probably came across Cassandra and MongoDB when searching for a NoSQL database. Still, these
two popular NoSQL choices have much less in common than expected.

Cassandra vs MongoDB: Similarities

When comparing two database systems, it is usually assumed that they share similarities as well. Although such similarities do exist for Cassandra and MongoDB, they are limited.

 Most importantly, Cassandra and MongoDB are classified as NoSQL databases.


 Cassandra was released in 2008, as one of these NoSQL databases. A year later, MongoDB was
created.
 They are free, open-source software. You can download the database packages, set up, and
configure them at no expense.
 Initially created by developers from Facebook, Cassandra is now under the ownership of the
Apache project and part of its open-source community. On the other hand, MongoDB is one of the
most popular database management systems in the world with a strong community of MongoDB
developers.

Cassandra vs MongoDB: Differences

Data Availability
MongoDB: A single master directs multiple slave nodes. If the master node goes down, one of the slave nodes takes over its role. Although automatic failover does ensure recovery, it may take up to a minute for a slave to become the master. During this time, the database isn't able to respond to requests.
Cassandra: Cassandra utilizes multiple masters inside a cluster. With multiple masters present, there is no fear of any downtime. The redundant model ensures high availability at all times.

Scalability
MongoDB: Only the master node can write and accept input. In the meantime, the slave nodes are only used for reads. As MongoDB has a single master node, it is limited in terms of write scalability.
Cassandra: Having multiple master nodes increases Cassandra's write capabilities. It allows the database to coordinate numerous writes at the same time, all coming from its masters. Therefore, the more master nodes there are in a cluster, the better the write speed (scalability).

Data Model
MongoDB: MongoDB's data model is categorized as object- and document-oriented. This means it can represent any kind of object structure, which can have properties or even be nested for multiple levels. If you need a rich data model, MongoDB may be the better solution.
Cassandra: Cassandra has a table structure using rows and columns; it is column-oriented. Still, it is more flexible than relational databases, since each row is not required to have the same columns. Upon creation, these columns are assigned one of the available Cassandra data types, ultimately relying more on data structure.

Query Language
MongoDB: MongoDB uses queries structured as JSON fragments rather than a dedicated query language. If you or your team is used to SQL, this will be something to get used to. However, it is easy enough to manage.
Cassandra: Cassandra has its own query language called CQL (Cassandra Query Language). Its syntax is similar to SQL but still has some limitations. Essentially, the database has a different way of storing and recovering data due to it being non-relational.

How are Queries Different?
Selecting records from the employee table:
MongoDB: db.employee.find()
Cassandra: SELECT * FROM employee;

Inserting records into the employee table:
MongoDB: db.employee.insert({ empid: '101', firstname: 'John', lastname: 'Doe', gender: 'M', status: 'A'})
Cassandra: INSERT INTO employee (empid, firstname, lastname, gender, status) VALUES('101', 'John', 'Doe', 'M', 'A');

Updating records in the employee table:
MongoDB: db.employee.update({"empid" : '101'}, {$set: { "firstname" : "James"}})
Cassandra: UPDATE employee SET firstname = 'James' WHERE empid = '101';

Supported Programming Languages
MongoDB: ActionScript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala, Smalltalk
Cassandra: C#, C++, Clojure, Erlang, Go, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala

Aggregation
MongoDB: MongoDB has a built-in aggregation framework. This feature allows it to retrieve data by utilizing a multi-stage pipeline to transform the documents into aggregated results. However, such a framework is only efficient when working with small or medium-sized data traffic. It is believed that MongoDB works best with Node.js.
Cassandra: Cassandra has no aggregation framework and requires external tools like Hadoop, Spark and others.

Schema
MongoDB: MongoDB is a database that does not require a schema, naturally making it more adaptable to changes. In its prior versions, the default configuration did not enforce any schema at all. Today, you can decide whether you want a schema or not. Such flexibility means the database can accept documents of different structures and interpret them once in the software. If you need flexibility in terms of schema, MongoDB would probably suit you better.
Cassandra: Cassandra is a much more stationary database. It facilitates static typing and demands the categorization and definition of columns beforehand.

Performance
There are a number of factors that impact the performance of these two types of databases. Mainly, the database model (or schema) makes a big difference in performance quality, as some are more suitable for MongoDB while others may work better with Cassandra. What's more, the load characteristic of the application your database needs to support also plays a crucial role. If you are expecting heavy load input, Cassandra, with its multiple master nodes, will give better results. With heavy load output, both MongoDB and Cassandra will show good performance.

Finally, many consider MongoDB to have the upper hand when it comes to consistency requirements. Still, this may vary depending on the application. Also, you can manually configure Cassandra to meet the consistency standards you set.

Installing MongoDB On Windows 10 and Getting started with MongoDB
Compass
In this guide we are going to do the following things:
 Installing 'MongoDB Community Edition'
 Running MongoDB Server instance
 Getting started with MongoDB compass (a GUI to connect to MongoDB database)
 Creating a user database, and working with collections and documents.
 Using Mongo Shell

Installing MongoDB community Edition


Other than the Community Edition (the free-to-use edition of MongoDB), there is a commercial Enterprise Edition. There is also the 'MongoDB Atlas Free Tier Cluster', a cloud-hosted service for running and maintaining MongoDB deployments. Download 'MongoDB Community' for Windows from https://www.mongodb.com/try/download/community.

Run the msi installer after completion of download. Following screen will appear.

After clicking the Next button, a window will appear to choose the Setup Type (Complete or Custom); choose the 'Complete' option.

In the following 'Service Configuration' dialog, we are going to uncheck 'Install MongoD as a Service' (checked by default) so that we can start the MongoDB instance ourselves rather than have it running as a service all the time.

In the following dialog, check 'Install MongoDB Compass' and click the Next button. MongoDB Compass is the GUI which allows us to connect to the MongoDB server and perform various operations.

Then the installation will start. After installation, the message 'Completed the MongoDB 5.0.9……. Setup Wizard' will appear in a dialog. Click the Finish button on that dialog to exit setup.

Then set the Path environment variable to include the MongoDB installation's bin folder, as shown below.

We can verify the installation using the Control Panel.

Creating Data Directory and Running MongoDB Server


Open cmd.exe as an Admin. Create the data directory (say E:\mongoDbData). A 'data directory' is the directory where the MongoDB server instance will store data. If no data directory is specified, MongoDB will use \data\db inside your MongoDB installation.

Assuming we have set the path of the bin folder in the Path environment variable, run the server using the following command:

C:\WINDOWS\system32>mongod.exe --dbpath=e:\mongoDbData

The following messages will appear in the console:


{"t":{"$date":"2020-10-25T09:46:28.606+05:30"},"s":"I", "c":"CONTROL", "id":23285,
"ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --
sslDisabledProtocols 'none'"}
{"t":{"$date":"2020-10-25T09:46:29.369+05:30"},"s":"W", "c":"ASIO", "id":22601,
"ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2020-10-25T09:46:29.369+05:30"},"s":"I", "c":"NETWORK", "id":4648602,
"ctx":"main","msg":"Implicit TCP FastOpen in use."}
{"t":{"$date":"2020-10-25T09:46:29.370+05:30"},"s":"I", "c":"STORAGE", "id":4615611,
"ctx":"initandlisten","msg":"MongoDB
starting","attr":{"pid":10488,"port":27017,"dbPath":"e:/mongoDbData","architecture":"64-
bit","host":"Kamal-PC"}}
{"t":{"$date":"2020-10-25T09:46:29.371+05:30"},"s":"I", "c":"CONTROL", "id":23398,
"ctx":"initandlisten","msg":"Target operating system minimum
version","attr":{"targetMinOS":"Windows 7/Windows Server 2008 R2"}}

{"t":{"$date":"2020-10-25T09:46:29.371+05:30"},"s":"I", "c":"CONTROL", "id":23403,
"ctx":"initandlisten","msg":"Build
Info","attr":{"buildInfo":{"version":"4.4.1","gitVersion":"ad91a93a5a31e175f5cbf8c69561e788bbc55ce
1","modules":[],"allocator":"tcmalloc","environment":{"distmod":"windows","distarch":"x86_64","targ
et_arch":"x86_64"}}}}
{"t":{"$date":"2020-10-25T09:46:29.371+05:30"},"s":"I", "c":"CONTROL", "id":51765,
"ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Microsoft Windows
10","version":"10.0 (build 18362)"}}}
{"t":{"$date":"2020-10-25T09:46:29.371+05:30"},"s":"I", "c":"CONTROL", "id":21951,
"ctx":"initandlisten","msg":"Options set by command
line","attr":{"options":{"storage":{"dbPath":"e:\\mongoDbData"}}}}
{"t":{"$date":"2020-10-25T09:46:29.394+05:30"},"s":"I", "c":"STORAGE", "id":22315,
"ctx":"initandlisten","msg":"Opening
WiredTiger","attr":{"config":"create,cache_size=3524M,session_max=33000,eviction=(threads_min=4,
threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,comp
ressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimu
m=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"
}}
{"t":{"$date":"2020-10-25T09:46:29.694+05:30"},"s":"I", "c":"STORAGE", "id":22430,
"ctx":"initandlisten","msg":"WiredTiger
message","attr":{"message":"[1603599389:693814][10488:140709311307424], txn-recover:
[WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (0, 0)"}}
{"t":{"$date":"2020-10-25T09:46:29.694+05:30"},"s":"I", "c":"STORAGE", "id":22430,
"ctx":"initandlisten","msg":"WiredTiger
message","attr":{"message":"[1603599389:694204][10488:140709311307424], txn-recover:
[WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (0, 0)"}}
{"t":{"$date":"2020-10-25T09:46:29.770+05:30"},"s":"I", "c":"STORAGE", "id":4795906,
"ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":376}}
{"t":{"$date":"2020-10-25T09:46:29.772+05:30"},"s":"I", "c":"RECOVERY", "id":23987,
"ctx":"initandlisten","msg":"WiredTiger
recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2020-10-25T09:46:29.978+05:30"},"s":"I", "c":"STORAGE", "id":22262,
"ctx":"initandlisten","msg":"Timestamp monitor starting"}
{"t":{"$date":"2020-10-25T09:46:30.081+05:30"},"s":"W", "c":"CONTROL", "id":22120,
"ctx":"initandlisten","msg":"Access control is not enabled for the database. Read and write access to
data and configuration is unrestricted","tags":["startupWarnings"]}
{"t":{"$date":"2020-10-25T09:46:30.081+05:30"},"s":"W", "c":"CONTROL", "id":22140,
"ctx":"initandlisten","msg":"This server is bound to localhost. Remote systems will be unable to
connect to this server. Start the server with --bind_ip <address> to specify which IP addresses it should
serve responses from, or with --bind_ip_all to bind to all interfaces. If this behavior is desired, start the
server with --bind_ip 127.0.0.1 to disable this warning","tags":["startupWarnings"]}
{"t":{"$date":"2020-10-25T09:46:30.085+05:30"},"s":"I", "c":"STORAGE", "id":20320,
"ctx":"initandlisten","msg":"createCollection","attr":{"namespace":"admin.system.version","uuidDispo

sition":"provided","uuid":{"uuid":{"$uuid":"2c9266cf-5f50-4d9c-a17c-
4ca1b0542d13"}},"options":{"uuid":{"$uuid":"2c9266cf-5f50-4d9c-a17c-4ca1b0542d13"}}}}
{"t":{"$date":"2020-10-25T09:46:30.236+05:30"},"s":"I", "c":"INDEX", "id":20345,
"ctx":"initandlisten","msg":"Index build: done
building","attr":{"buildUUID":null,"namespace":"admin.system.version","index":"_id_","commitTimest
amp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2020-10-25T09:46:30.237+05:30"},"s":"I", "c":"COMMAND", "id":20459,
"ctx":"initandlisten","msg":"Setting featureCompatibilityVersion","attr":{"newVersion":"4.4"}}
{"t":{"$date":"2020-10-25T09:46:30.241+05:30"},"s":"I", "c":"STORAGE", "id":20536,
"ctx":"initandlisten","msg":"Flow Control is enabled on this deployment"}
{"t":{"$date":"2020-10-25T09:46:30.245+05:30"},"s":"I", "c":"STORAGE", "id":20320,
"ctx":"initandlisten","msg":"createCollection","attr":{"namespace":"local.startup_log","uuidDisposition
":"generated","uuid":{"uuid":{"$uuid":"f67b320a-a67f-4627-8abe-
1fea5a1e9054"}},"options":{"capped":true,"size":10485760}}}
{"t":{"$date":"2020-10-25T09:46:30.383+05:30"},"s":"I", "c":"INDEX", "id":20345,
"ctx":"initandlisten","msg":"Index build: done
building","attr":{"buildUUID":null,"namespace":"local.startup_log","index":"_id_","commitTimestamp"
:{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2020-10-25T09:46:30.612+05:30"},"s":"I", "c":"FTDC", "id":20625,
"ctx":"initandlisten","msg":"Initializing full-time diagnostic data
capture","attr":{"dataDirectory":"e:/mongoDbData/diagnostic.data"}}
{"t":{"$date":"2020-10-25T09:46:30.615+05:30"},"s":"I", "c":"CONTROL", "id":20712,
"ctx":"LogicalSessionCacheReap","msg":"Sessions collection is not set up; waiting until next sessions
reap interval","attr":{"error":"NamespaceNotFound: config.system.sessions does not exist"}}
{"t":{"$date":"2020-10-25T09:46:30.615+05:30"},"s":"I", "c":"NETWORK", "id":23015,
"ctx":"listener","msg":"Listening on","attr":{"address":"127.0.0.1"}}
{"t":{"$date":"2020-10-25T09:46:30.615+05:30"},"s":"I", "c":"STORAGE", "id":20320,
"ctx":"LogicalSessionCacheRefresh","msg":"createCollection","attr":{"namespace":"config.system.sessi
ons","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"dbfe682f-e207-43ef-8c37-
6a87658dd4b1"}},"options":{}}}
{"t":{"$date":"2020-10-25T09:46:30.616+05:30"},"s":"I", "c":"NETWORK", "id":23016,
"ctx":"listener","msg":"Waiting for connections","attr":{"port":27017,"ssl":"off"}}
{"t":{"$date":"2020-10-25T09:46:30.807+05:30"},"s":"I", "c":"INDEX", "id":20345,
"ctx":"LogicalSessionCacheRefresh","msg":"Index build: done
building","attr":{"buildUUID":null,"namespace":"config.system.sessions","index":"_id_","commitTimes
tamp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2020-10-25T09:46:30.807+05:30"},"s":"I", "c":"INDEX", "id":20345,
"ctx":"LogicalSessionCacheRefresh","msg":"Index build: done
building","attr":{"buildUUID":null,"namespace":"config.system.sessions","index":"lsidTTLIndex","com
mitTimestamp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2020-10-25T09:46:30.808+05:30"},"s":"I", "c":"COMMAND", "id":51803,
"ctx":"LogicalSessionCacheRefresh","msg":"Slow
query","attr":{"type":"command","ns":"config.system.sessions","command":{"createIndexes":"system.
sessions","indexes":[{"key":{"lastUse":1},"name":"lsidTTLIndex","expireAfterSeconds":1800}],"writeCo

ncern":{},"$db":"config"},"numYields":0,"reslen":114,"locks":{"ParallelBatchWriterMode":{"acquireCou
nt":{"r":5}},"ReplicationStateTransition":{"acquireCount":{"w":5}},"Global":{"acquireCount":{"r":2,"w":
3}},"Database":{"acquireCount":{"r":2,"w":3}},"Collection":{"acquireCount":{"r":3,"w":2}},"Mutex":{"ac
quireCount":{"r":6}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":1},"storage":{},"protocol
":"op_msg","durationMillis":193}}

If we don’t set the path of the bin folder in the Path environment variable, then we should follow these steps:
 First navigate to your MongoDB bin folder
 To start MongoDB, run mongod.exe from the Command Prompt.

It will start the MongoDB main process, and the “Waiting for connections” message will appear in the console.

If you want to connect to MongoDB through the shell, use the commands below in another command prompt (don’t close the earlier command prompt window which is running MongoDB).

Remember:
 mongod is the command to run the server
 mongo is the command to run the client to access the server

Please see the warning above: we need to install “mongosh” separately.

Install MongoDB Shell, mongosh
The MongoDB Shell (mongosh) is not installed with MongoDB Server. You need to follow the mongosh
installation instructions to download and install mongosh separately.

It will download a zip file. Now extract the files from the downloaded archive into c:\Program Files\mongosh-1.5.1-win32-x64.

Add the mongosh binary to your PATH environment variable.

Ensure that the extracted MongoDB Shell binary is in the desired location in your filesystem, then add
that location to your PATH environment variable.

To confirm that your PATH environment variable is correctly configured to find mongosh, open a
command prompt and enter the mongosh --help command. If your PATH is configured correctly, a list
of valid commands displays.

How to use the MongoDB Shell to connect to a MongoDB local deployment

Prerequisites : To use the MongoDB Shell, we must have a MongoDB deployment to connect to.

So run the local MongoDB instance in a command prompt using the following command:
C:\WINDOWS\system32>mongod.exe --dbpath=e:\mongoDbData

We can use the MongoDB Shell to connect to MongoDB version 4.0 or greater.

When the local MongoDB instance is on the default port:

Run mongosh without any command-line options, in another command window, to connect to a MongoDB instance running on your localhost with the default port 27017:

mongosh

This is equivalent to the command: mongosh "mongodb://localhost:27017"

When we issue show dbs at the prompt we will get the following output:

When the local MongoDB instance is on a non-default port:

To connect to a MongoDB instance running on localhost with a non-default port, say 28015:

mongosh "mongodb://localhost:28015"

The test entry shown at the bottom of the result is the default database mongosh connects us to.

Creating and Working with Database via MongoDB Compass


1. Run the MongoDB server from a command prompt if it is not already running:
mongod.exe --dbpath=e:\mongoDbData (do not close the command window)
2. Start MongoDB Compass: press the shortcut key Win+S, type 'mongodb compass' in the search box that opens, or click the MongoDB Compass shortcut or choose it from the Start menu.

To connect with the MongoDB server, click on New Connection.

The hostname and port should already be populated. Clicking on the 'Connect' button will show the following window.

There are 2 sets of databases in a MongoDB instance:

1. default or reserved databases created automatically: local, admin, and config
2. user-created databases

As shown above, the three databases 'admin', 'config' and 'local' are created by default.

local database in MongoDB

 Reserved database used to store the metadata of the replication process and other related data.
 It is not itself replicated, meaning that collections in the local database will not replicate from the primary node of MongoDB to the secondary nodes.
 On startup of each mongod instance, the MongoDB engine inserts a document into the startup_log collection of the local database; this information is helpful for diagnostic purposes. The collection startup_log is a capped collection.

admin database

 It plays a vital role in the authentication and authorization of MongoDB database users. This database is used for administrative purposes too.
 There are different security mechanisms to enable security in MongoDB. If you have enabled security in MongoDB for authentication and authorization of database users, then this admin database comes into the picture.

config database

 Used to store information related to sharding and its metadata.

 If you have a standalone MongoDB server, then this config database is not applicable for you. It is applicable only in a sharding environment, not even for MongoDB servers running under a replica set.

Notes
Capped collections are fixed-size circular collections that follow the insertion order to support high performance for create, read, and delete operations. By circular, it means that when the fixed size allocated to the collection is exhausted, it starts deleting the oldest documents in the collection without requiring any explicit commands.
Capped collections restrict updates to documents if the update results in an increased document size. Since capped collections store documents in the order of the disk storage, this ensures that a document's size does not grow beyond the space allocated on disk. Capped collections are best for storing log information, cache data, or any other high-volume data.
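A capped collection is created explicitly with createCollection; the collection name and sizes below are illustrative:

db.createCollection("serverLog", { capped: true, size: 10485760, max: 5000 })
// size is the maximum size in bytes; max optionally caps the number of documents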
Use custom Collation - New in version 3.4.

Collation allows users to specify language-specific rules for string comparison, such as rules for
lettercase and accent marks. You can specify collation for a collection or a view, an index, or specific
operations that support collation.
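For instance, a collection can be given a default collation at creation time (a sketch; the locale and strength values are assumptions):

db.createCollection("names", { collation: { locale: "fr", strength: 1 } })
// strength 1 compares base characters only, ignoring case and accent marks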

Creating User Database

Click on 'CREATE DATABASE' and enter database name as authDB and first collection name as users
(In MongoDB, a collection is the equivalent of an RDBMS table):

Clicking on 'CREATE DATABASE' will create the authDB database and its first collection users as shown:

Creating Document

A 'Document' is a record in a MongoDB collection. It is the basic unit of data in MongoDB. Documents are written in BSON (Binary JSON) format. BSON is similar to JSON but has a more type-rich format. To create a document, click on the 'authDB' database (above screenshot). It will show the collection list, where the first collection users will appear.
Click on 'users' collection (above screenshot):

Insert Documents

IMPORTANT : Inserting documents is not permitted in MongoDB Compass Readonly Edition.

Compass provides two ways to insert documents into your collections: JSON Mode and a Field-by-Field
Editor.

 JSON Mode (New in Compass 1.20): Allows you to write or paste JSON documents in the editor. Use this mode to insert multiple documents at once as an array.
 Field-by-Field Editor: Provides a more interactive experience to create documents, allowing you to select individual field values and types. This mode only supports inserting one document at a time.

Procedure for inserting documents into the collection:

1. Clicking on the Add Data dropdown shows two options, Import File and Insert Document. Select Insert Document for inserting a document.
2. Select the appropriate view based on how you would like to insert documents.
3. Click the { } brackets for JSON view (this is the default view). Click the list icon for Field-by-Field mode. Click the table icon to enter the document in tabular view.
4. Enter the data as shown below.
5. Click Insert.

Limitation: The Insert Document button is not available if you are connected to a Data Lake.

In JSON format, type or paste the document(s) you want to insert into the collection. To insert multiple
documents, enter a comma-separated array of JSON documents.

EXAMPLE
The following array inserts 2 documents into the collection:
[

{"_id":4,"name":"Sourav Roy","dob":"10-10-1992","registerOn":"26-10-2020",
"usrType":"F","collegeid":"Heritage","roles":"M,A","passkey":"mak002",
"mobile":"9432114134","email":"kap@gmail.com","delStatus":"N"},
{"_id":5,"name":"Amal Saha","dob":"10-10-1958","registerOn":"25-10-2020",
"usrType":"F","collegeid":"Heritage","roles":"M,A,S","passkey":"mak001",
"mobile":"9432114134","email":"kap@gmail.com","delStatus":"N"}
]
NOTE : If you do not provide an ObjectId in your document, Compass automatically generates an
ObjectId.
In Tabular view a grid will appear, as shown below, where each column is a field of the JSON document. We can perform all operations on an individual document appearing in a row using the action buttons.

Modify Documents

You can edit existing documents in your collection. When you edit a document, Compass performs
a findAndModify operation to update the document.
Limitations
 Modifying documents is not permitted in MongoDB Compass Readonly Edition.
 You cannot use the MongoDB Compass GUI to modify documents in a sharded collection. As an
alternative, you can use the Embedded MongoDB Shell in Compass to modify a sharded collection.
To learn more about updates on sharded collections, see Sharded Collection Behavior.

Procedure

Select the appropriate tab based on whether you are viewing your documents in List, JSON, or Table
view:
 To modify a document, hover over the document and click the pencil icon as shown in picture
below.
 After you click the pencil icon, the document enters edit mode
 You can now make changes to the fields, values, or data types of values and click on Update
 To exit the edit mode and cancel all pending changes to the document, click the Cancel button.

Clone Documents

IMPORTANT : Cloning documents is not permitted in MongoDB Compass Readonly Edition.

You can insert new documents by cloning the schema and values of an existing document in a
collection. Select the appropriate tab based on whether you are viewing your documents in List, JSON,
or Table view
To clone a document, hover over the desired document and click the Clone button.

When you click the Clone button, Compass opens the document insertion dialog with the same schema
and values as the cloned document. You can edit any of these fields and values before you insert the
new document. To learn more about inserting documents, see Insert Documents.

Delete Documents

IMPORTANT : Deleting documents is not permitted in MongoDB Compass Readonly Edition.


Select the appropriate tab based on whether you are viewing your documents in List, JSON, or Table
view:
To delete a document, hover over the document and click the delete icon.

After you click the delete button, the document is flagged for deletion. Compass asks for confirmation
that you want to remove the document:

Once you confirm, Compass deletes the document from the collection.

Delete Multiple Documents


You cannot delete multiple documents at once from the Compass UI . As an alternative, you can use
the db.collection.deleteMany() method in the embedded MongoDB Shell to delete multiple documents
in a single operation.
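A hedged sketch of such a bulk delete (the filter reuses the delStatus field from the sample users documents above):

db.users.deleteMany({ delStatus: "Y" })
// removes every document whose delStatus field equals "Y"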

Mongo Shell
 Mongo Shell is a JavaScript-based command line interface used to connect to MongoDB and to perform various operations. The mongo shell historically came with the MongoDB installation by default (newer releases ship mongosh separately, as noted earlier).

Prerequisites

 The MongoDB server must be installed and running before you can connect to it from
the mongo shell.
 Once you have verified that the mongod server is running, open a terminal window (or a
command prompt for Windows) and run mongo.

Connecting MongoDB

To connect the Mongo Shell to a Mongo Server, follow these steps:

 Run mongod.exe from a command prompt as shown below. Don’t close the command window.
 Run mongo.exe from another command prompt to execute the mongo shell. At the prompt we can type shell commands to execute.

Another command window is used for running the shell.

In the mongo shell window, test the following shell commands.

List of databases: To list the available databases use the show dbs command.

Switching database: The command use databaseName can be used to switch to a database:
> use authDB
switched to db authDB
>

Querying a collection

To find the documents in the collection "users" in the database authDB, issue the command:
db.getCollection("users").find()
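Since users is a simple collection name, the shorter db.users.find() form also works, and a filter plus projection can be passed (a sketch reusing fields from the earlier insert example):

db.users.find({ usrType: "F" }, { name: 1, email: 1, _id: 0 })
// returns only the name and email fields of the matching documents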

Format Printed Results

The db.collection.find() method returns a cursor to the results; however, in the mongo shell, if the
returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically
iterated up to 20 times to print up to the first 20 documents that match the query. The mongo shell
will prompt Type it to iterate another 20 times.

To format the printed result, you can append .pretty() to the operation, as in the following:
db.myCollection.find().pretty()

In addition, you can use the following explicit print methods in the mongo shell:
 print() to print without formatting
 print(tojson(<obj>)) to print with JSON formatting, equivalent to printjson()
 printjson() to print with JSON formatting, equivalent to print(tojson(<obj>))
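To step through results manually instead of letting the shell auto-iterate, assign the cursor to a variable as mentioned above:

var cursor = db.users.find();
while (cursor.hasNext()) {
   printjson(cursor.next());   // print each matching document with JSON formatting
}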

Exit the Shell

To exit the shell, type quit() or use the <Ctrl-C> shortcut.

Inserting a document

Use the db.collection.insertOne(theDocument) command to insert a new document. For example:

> db.getCollection("test-collection").insertOne({name:"Tina", dept: "Admin", phone:"222-222-222"})
{
    "acknowledged" : true,
    "insertedId" : ObjectId("5c86a3ae6bbcb1dcee194565")
}
>
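Several documents can be inserted in one call with insertMany (a sketch; the documents are illustrative):

> db.getCollection("test-collection").insertMany([
   { name: "Raju", dept: "Sales", phone: "111-111-111" },
   { name: "Mina", dept: "HR", phone: "333-333-333" }
])
// the result reports acknowledged: true and one insertedId per document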

Multi-line Operations in the mongo Shell

If you end a line with an open parenthesis ('('), an open brace ('{'), or an open bracket ('['), then the
subsequent lines start with ellipsis ("...") until you enter the corresponding closing parenthesis (')'),
the closing brace ('}') or the closing bracket (']'). The mongo shell waits for the closing parenthesis,
closing brace, or the closing bracket before evaluating the code, as in the following example:

> if ( x > 0 ) {
... count++;
... print (x);
... }

You can exit the line continuation mode if you enter two blank lines, as in the following example:

> if (x > 0
...
...
>

Comparison of the mongo Shell and mongosh

mongosh is currently available as a Beta release. The new MongoDB Shell, mongosh, offers numerous advantages over the mongo shell, such as:

 Improved syntax highlighting.
 Improved command history.
 Improved logging.

During the beta stage, mongosh supports a subset of the mongo shell methods. Achieving feature parity between mongosh and the mongo shell is an ongoing effort. To maintain backwards compatibility, the methods that mongosh supports use the same syntax as the corresponding methods in the mongo shell. To see the complete list of methods supported by mongosh, see MongoDB Shell Methods.

VSCode with MongoDB


Visual Studio Code has great support for working with MongoDB databases, whether your own instance or one hosted in the cloud with MongoDB Atlas. With the MongoDB for VS Code extension, you can create, manage, and query MongoDB databases from within VS Code.

Install the extension

MongoDB support for VS Code is provided by the MongoDB for VS Code extension. To install the
MongoDB for VS Code extension, open the Extensions view by pressing Ctrl+Shift+X and search for
'MongoDB' to filter the results. Select the MongoDB for VS Code extension.

After installation of the extension, a leaf icon for MongoDB will appear on the left sidebar menu. If we click on this icon the following screen will appear.

Connect to a MongoDB deployment

Note: To connect to a deployment using a connection string, we must have a MongoDB cluster running
on our machine or have one in the cloud using Atlas.

So first start the local MongoDB instance from a command prompt using the following command:

Open the MongoDB interactive panel by clicking on the leaf icon on the left sidebar menu, then click on
Add Connection (1 in the diagram) to connect to a database instance. Then click Connect (2) and enter the connection string for our database in the text bar at the top of the window (3 in the diagram). In this case it is mongodb://localhost:27017/authDB. This database is already created.
Upon a successful connection, you should see the following changes:

Play with your database

To perform queries and other database operations on our new database, we can create a Playground in VS Code. Click on the green create playground button in VS Code, as shown in the above diagram, to create a playground.

It will open a new editor tab which should look like the one below. This is a default template supplied to help write code for MongoDB in VS Code.

Delete the content in the default template and paste the following to test our authDB database:
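The script itself appeared only as a screenshot in the original; a minimal playground sketch against authDB might look like this (the query is an assumption based on the users collection created earlier):

use('authDB');        // select the database
db.users.find({});    // return all documents in the users collection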

Click the Play button to run all code written in the editor, or only the selected portion.

Click on the play button at the top-right side to run the code. A new panel "Playground Result" should open with our results like below:

This is the way to work with our MongoDB databases locally using VS Code, perform database
operations and see the results on the fly!

MongoDB Datatypes
 String − This is the most commonly used datatype to store the data. String in MongoDB must be
UTF-8 valid.
 Integer − This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending
upon your server.
 Boolean − This type is used to store a boolean (true/ false) value.
 Double − This type is used to store floating point values.
 Min/ Max keys − This type is used to compare a value against the lowest and highest BSON elements.
 Arrays − This type is used to store arrays or list or multiple values into one key.
 Timestamp − This type is used to store a timestamp. This can be handy for recording when a document has been modified or added.
 Object − This datatype is used for embedded documents.
 Null − This type is used to store a Null value.
 Symbol − This datatype is used identically to a string; however, it's generally reserved for
languages that use a specific symbol type.
 Date − This datatype is used to store the current date or time in UNIX time format. You can specify
your own date time by creating object of Date and passing day, month, year into it.
 Object ID − This datatype is used to store the document’s ID.
 Binary data − This datatype is used to store binary data.
 Code − This datatype is used to store JavaScript code into the document.
 Regular expression − This datatype is used to store regular expression.
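A single document can mix several of these types. The following sketch (collection and field names are illustrative) stores a string, a 32-bit integer, a boolean, a double, an array, an embedded document, a null and a date:

db.types.insertOne({
   name: "Alice",             // String
   age: NumberInt(30),        // Integer (32 bit)
   active: true,              // Boolean
   score: 9.5,                // Double
   tags: ["a", "b"],          // Array
   profile: { city: "Pune" }, // Object (embedded document)
   middleName: null,          // Null
   createdAt: new Date()      // Date
})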

Create database

The use command will create a new database if it doesn't exist; otherwise it will return the existing database.

Syntax : use DATABASE_NAME
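For example (the database name is arbitrary; MongoDB persists a new database only once the first document is written to it):

> use myNewDB
switched to db myNewDB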

Write Operations Overview


 A write operation is any operation that creates or modifies data in the MongoDB instance. In
MongoDB, write operations target a single collection. All write operations in MongoDB are atomic
on the level of a single document.
 There are three classes of write operations in MongoDB: insert, update, and remove.
 Insert operations add new data to a collection. Update operations modify existing data, and
remove operations delete data from a collection. No insert, update, or remove can affect more
than one document atomically.
 For the update and remove operations, you can specify criteria, or conditions, that identify the
documents to update or remove. These operations use the same query syntax to specify the
criteria as read operations.
 MongoDB allows applications to determine the acceptable level of acknowledgement required of
write operations.



Insert Document

To insert data into a MongoDB collection, you need to use MongoDB's insert() or save() method. First start the desktop mongoDB instance and run mongosh (the MongoDB shell) to test these operations.

Syntax of insert() method

db.collection.insert(
   <document or array of documents>,
   {
      writeConcern: <document>,   // defines the acceptable level of acknowledgement
      ordered: <boolean>
   }
)
Parameters:

 document (Required; document or array) − A document or array of documents to be inserted into the collection.
 writeConcern (Optional; document) − A document expressing the write concern. Omit to use the default write concern.
 ordered (Optional; boolean; defaults to true) −
o true : perform an ordered insert of the documents in the array, and if an error occurs with one of the documents, MongoDB will return without processing the remaining documents in the array.
o false : perform an unordered insert, and if an error occurs with one of the documents, continue processing the remaining documents in the array.

Notes on writeConcern

 The write concern is a specification of MongoDB for write operations that determines the acknowledgement you want after a write operation has taken place. Write concern can be defined as “the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters“.
 MongoDB has a default write concern that acknowledges all writes, which means that after every write, MongoDB has to always return an acknowledgement (in the form of a document) indicating that it was successful. When asking for write acknowledgement, if none is returned (in case of failover, crashes), the write is unsuccessful.



 This behavior is useful to understand, especially in distributed MongoDB deployments (i.e. replica sets and sharded clusters), since you will have more than one mongod instance, and depending on your needs, maybe you don't want all instances to acknowledge the write, just a few, to speed up writes. Also, when you specify a write concern, you can request journal writing, so you can guarantee the operation's result and any rollbacks required if a failover happens.

Simply put, a write concern is an indication of ‘durability’ passed along with write operations to MongoDB. To clarify, let us look at the syntax:

{ w: <value>, j: <boolean>, wtimeout: <number> }

Where,

 w can be an integer or "majority"; it represents the number of members that must acknowledge the write. Default value is 1. "majority" states that acknowledgment is requested from a majority of the “voting nodes.”
 j field indicates that a write be acknowledged after it is written to the on-disk journal as opposed to just the system memory. Unspecified by default.
 wtimeout specifies the timeout for applying the write concern. Unspecified by default. Setting a write concern on replica sets without a wtimeout can cause writes to block indefinitely. Note that “If you do not specify the wtimeout option and the level of write concern is unachievable, the write operation will block indefinitely."

Example:

db.inventory.insert(
   { sku: "abcdxyz", qty : 100, category: "Clothing" },
   { writeConcern: { w: 2, j: true, wtimeout: 5000 } }
)

This insert’s write concern can be read as follows: acknowledge this write when ‘at least 2 members of the replica set have written it to their journals within 5000 msecs or return an error’.
Available Write Concerns

w=0 (Unacknowledged) : Requests no acknowledgment of the write operation. However, w: 0 may return information about socket exceptions and networking errors to the application. Data can be rolled back if the primary steps down before the write operations have replicated to any of the secondaries. If you specify w: 0 but include j: true, the j: true prevails to request acknowledgment from the standalone mongod or the primary of a replica set.

w=1 (Acknowledged) : The write will be acknowledged by the server (the primary on replica set configuration).

w=N (Replica Set Acknowledged) : The write will be acknowledged by the primary server, and replicated to N-1 secondaries. w greater than 1 requires acknowledgment from the primary and as many data-bearing secondaries as needed to meet the specified write concern. For example, consider a 3-member replica set with a primary and 2 secondaries. Specifying w: 2 would require acknowledgment from the primary and one of the secondaries. Specifying w: 3 would require acknowledgment from the primary and both secondaries.

w=majority (Majority Acknowledged) : The write will be acknowledged by the majority of the replica set (including the primary). This is a special reserved string.

w=<tag set> (Replica Set Tag Set Acknowledged) : The write will be acknowledged by members of the entire tag set.

j=true (Journaled) : The write will be acknowledged by the primary and the journal flushed to disk.

j=unspecified or false (In memory) : MongoDB does not wait for w: "majority" writes to be written to the on-disk journal before acknowledging the writes.

Returns from insert command:

 A WriteResult object for single insert that contains the status of the operation
 A BulkWriteResult object for bulk inserts that contains the status of the operation.

Example :

1. Insert a single invoice document in invoice collection. Document does not contain the _id field:

db.invoice.insert( { inv_no: "I00001", inv_date: "10/10/2012" } );

If we don't specify the _id parameter, then MongoDB assigns a unique ObjectId for this document.

Output:

> db.invoice.insert( { inv_no: "I00001", inv_date: "10/10/2012" } );
WriteResult({ "nInserted" : 1 })

Successful Results : Upon success, the WriteResult object contains information on the number of
documents inserted as shown above.

Write Concern Errors : If the insert() method encounters write concern errors, the results include
the WriteResult.writeConcernError field:
WriteResult({
"nInserted" : 1,
"writeConcernError" : {
"code" : 64,
"errmsg" : "waiting for replication timed out at shard-a"
}
})
Errors Unrelated to Write Concern : If the insert() method encounters a non-write concern error, the
results include the WriteResult.writeError field:
WriteResult({
   "nInserted" : 0,
   "writeError" : {
      "code" : 11000,
      "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.foo.$_id_ dup key: { : 1.0 }"
   }
})
When we find the inserted document we will see that the _id was generated automatically:

> db.invoice.find();
{ "_id" : ObjectId("567554d2f61afaaed2aae48f"), "inv_no" : "I00001", "inv_date" : "10/10/2012" }

Note :
 _id is a 12-byte value (displayed as a 24-character hexadecimal string) that is unique for every document in a collection. The 12 bytes are divided as follows − _id: ObjectId(4 bytes timestamp, 3 bytes machine id, 2 bytes process id, 3 bytes incrementer). A small sketch of reading the timestamp back appears after this note.
 Create Collection - If the collection does not exist, then the insert() method will create the
collection.
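Because the leading 4 bytes of an ObjectId are a timestamp, the shell can recover a document's creation time from its _id; for example (the returned value will vary):

> db.invoice.findOne()._id.getTimestamp()
ISODate("...")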

2. Insert a single invoice document in invoice collection specifying the _id field:

db.invoice.insert( { _id: 901, inv_no: "I00001", inv_date: "10/10/2012" } );

Output:
> db.invoice.insert( { _id: 901,inv_no: "I00001", inv_date: "10/10/2012" } );
WriteResult({ "nInserted" : 1 })

The operation inserts the following document in the invoice collection:

{ "_id" : 901, "inv_no" : "I00001", "inv_date" : "10/10/2012" }

 If you specify the _id field, the value must be unique within the collection. For operations with
write concern, if you try to create a document with a duplicate _id value, mongod returns a
duplicate key exception.
 Other Methods to Add Documents : You can also add new documents to a collection using methods
that have an upsert option. If the option is set to true, these methods will either modify existing
documents or add a new document when no matching documents exist for the query.

3. Performs a bulk insert

We will insert three documents by passing an array of documents to the insert() method. The documents in the array do not need to have the same fields. For instance, the first document in the array has an _id field and a unit field. But the second and third documents do not contain an _id field; mongod will create the _id field for the second and third documents during the insert:

db.orders.insert(
   [
      { _id: 15, ord_no: 2001, qty: 200, unit: "doz" },
      { ord_no: 2005, qty: 320 },
      { ord_no: 2008, qty: 250, rate:85 }
   ]
);

Note : Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert operation to MongoDB; however, if the client sends a document without an _id field, the mongod will add the _id field and generate the ObjectId.

Output:

BulkWriteResult({
   "writeErrors" : [ ],
   "writeConcernErrors" : [ ],
   "nInserted" : 3,
   "nUpserted" : 0,
   "nMatched" : 0,
   "nModified" : 0,
   "nRemoved" : 0,
   "upserted" : [ ]
})

The operation inserted the following three documents:

{ "_id" : 15, "ord_no" : 2001, "qty" : 200, "unit" : "doz" }
{ "_id" : ObjectId("56755896f61afaaed2aae490"), "ord_no" : 2005, "qty" : 320 }
{ "_id" : ObjectId("56755896f61afaaed2aae491"), "ord_no" : 2008, "qty" : 250, "rate" : 85 }
You can also pass an array of nested documents into the insert() method as shown below:

> db.createCollection("post")
> db.post.insert([
   {
      title: "MongoDB Overview",
      description: "MongoDB is no SQL database",
      by: "tutorials point",
      url: "http://www.tutorialspoint.com",
      tags: ["mongodb", "database", "NoSQL"],
      likes: 100
   },
   {
      title: "NoSQL Database",
      description: "NoSQL database doesn't have tables",
      by: "tutorials point",
      url: "http://www.tutorialspoint.com",
      tags: ["mongodb", "database", "NoSQL"],
      likes: 20,
      comments: [
         {
            user:"user1",
            message: "My first comment",
            dateCreated: new Date(2013,11,10,2,35),
            like: 0
         }
      ]
   }
])
BulkWriteResult({
   "writeErrors" : [ ],
   "writeConcernErrors" : [ ],
   "nInserted" : 2,
   "nUpserted" : 0,
   "nMatched" : 0,
   "nModified" : 0,
   "nRemoved" : 0,
   "upserted" : [ ]
})
>

Note : To insert the document we can use db.post.save(document) too. If you don't specify _id in the document then the save() method will work the same as the insert() method. If the document contains an _id field, then the save() method is equivalent to an update with the upsert option set to true.
BulkWriteResult - A wrapper that contains the results of the Bulk.execute() method. It has the following properties.

 BulkWriteResult.nInserted - The number of documents inserted using the Bulk.insert() method.
 BulkWriteResult.nMatched - The number of existing documents selected for update or replacement. If the update/replacement operation results in no change to an existing document, e.g. a $set expression updates the value to the current value, nMatched can be greater than nModified.
 BulkWriteResult.nModified - The number of existing documents updated or replaced. If the update/replacement operation results in no change to an existing document, such as setting the value of the field to its current value, nModified can be less than nMatched.
 BulkWriteResult.nRemoved - The number of documents removed.
 BulkWriteResult.nUpserted - The number of documents inserted through operations with the Bulk.find.upsert() option.
 BulkWriteResult.upserted - An array of documents that contains information for each document inserted through operations with the Bulk.find.upsert() option. Each document contains the following information:
o BulkWriteResult.upserted.index - An integer that identifies the operation in the bulk operations list, which uses a zero-based index.
o BulkWriteResult.upserted._id - The _id value of the inserted document.
 BulkWriteResult.writeErrors - An array of documents that contains information regarding any error, unrelated to write concerns, encountered during the update operation. The writeErrors array contains an error document for each write operation that errors. Each error document contains the following fields:
o BulkWriteResult.writeErrors.index - An integer that identifies the write operation in the bulk operations list, which uses a zero-based index. See also Bulk.getOperations().
o BulkWriteResult.writeErrors.code - An integer value identifying the error.
o BulkWriteResult.writeErrors.errmsg - A description of the error.
o BulkWriteResult.writeErrors.op - A document identifying the operation that failed. For instance, an update/replace operation error will return a document specifying the query, the update, the multi and the upsert options; an insert operation will return the document the operation tried to insert.
 BulkWriteResult.writeConcernError - Document that describes an error related to write concern and contains the fields:
o BulkWriteResult.writeConcernError.code - An integer value identifying the cause of the write concern error.
o BulkWriteResult.writeConcernError.errInfo - A document identifying the write concern setting related to the error.
o BulkWriteResult.writeConcernError.errmsg - A description of the cause of the write concern error.

Perform an Unordered Insert

The following example performs an unordered insert of three documents. With unordered inserts, if an
error occurs during an insert of one of the documents, MongoDB continues to insert the remaining
documents in the array.
db.products.insert(
[
{ _id: 20, item: "lamp", qty: 50, type: "desk" },
{ _id: 21, item: "lamp", qty: 20, type: "floor" },
{ _id: 22, item: "bulk", qty: 100 }
],
{ ordered: false } )
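As a sketch of the difference (assuming the three documents above were just inserted, so _id 20 now collides), re-running an unordered insert that reuses _id 20 fails only for that document and still inserts the others:

db.products.insert(
   [
      { _id: 23, item: "desk", qty: 10 },
      { _id: 20, item: "lamp", qty: 5 },    // duplicate key: reported in writeErrors
      { _id: 24, item: "chair", qty: 15 }
   ],
   { ordered: false } )

The returned BulkWriteResult would show "nInserted" : 2 and a single entry in the "writeErrors" array with code 11000.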

The insertOne() method


This is used to insert a single document into the collection. If the collection does not exist, then
insertOne() method creates the collection first and then inserts the specified document.

Syntax : db.collection.insertOne(<document>,
{ writeConcern: <document> }
)
Parameter
 <document> The document or record that is to be stored in the database
 writeConcern: Optional.

Return Value: It returns the _id of the document inserted into the database.

Example
Following example creates a new collection named empDetails and inserts a document using the
insertOne() method.
> db.createCollection("empDetails")
{ "ok" : 1 }
> db.empDetails.insertOne(
{



First_Name: "Radhika",
Last_Name: "Sharma",
Date_Of_Birth: "1995-09-26",
e_mail: "radhika_sharma.123@gmail.com",
phone: "9848022338"
})
Output after the document is successfully inserted:

{
   "acknowledged" : true,
   "insertedId" : ObjectId("5dd62b4070fb13eec3963bea")
}
>
The insertMany() method - This method can insert multiple documents. To this method you
need to pass an array of documents.
Syntax
db.collection.insertMany([ <document 1>, <document 2>, … ],
{
writeConcern: <document>, ordered: <boolean>
}
)
Parameter

 <documents> − The array of documents or records to be stored in the database.
 writeConcern − Optional.
 ordered − Optional. Can be set to true or false.

Return Value: It returns the _ids of the documents inserted into the database.
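A quick sketch of its use, reusing the empDetails collection from above (the field values are illustrative, and the ObjectId values are placeholders):

> db.empDetails.insertMany([
   { First_Name: "Rachel", Last_Name: "Christopher", Date_Of_Birth: "1990-02-16" },
   { First_Name: "Fathima", Last_Name: "Sheik", Date_Of_Birth: "1990-02-16" }
])
{
   "acknowledged" : true,
   "insertedIds" : [ ObjectId("..."), ObjectId("...") ]
}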



MongoDB - Query Document
 Read operations, or queries, retrieve data stored in the database. In MongoDB, queries select
documents from a single collection.
 Queries specify criteria, or conditions, that identify the documents that MongoDB returns to the
clients.
 A query may include a projection that specifies the fields from the matching documents to return.
The projection limits the amount of data that MongoDB returns to the client over the network.
find() : To query data from MongoDB collection.
For query operations, MongoDB provides a db.collection.find() method. The method accepts both the
query criteria and projections and returns a cursor to the matching documents. You can optionally
modify the query to impose limits, skips, and sort orders.

(Diagram: the components of a MongoDB query operation, with the same query shown in SQL for comparison.)

Example

db.users.find( { age: { $gt: 18 } }, { name: 1, address: 1 } ).limit(5)

This query selects the documents in the users collection that match the condition age is greater than
18. To specify the greater than condition, query criteria uses the greater than (i.e. $gt) query selection
operator. The query returns at most 5 matching documents (or more precisely, a cursor to those
documents). The matching documents will return with only the _id, name and address fields.

Query Behavior

MongoDB queries exhibit the following behavior:

 All queries in MongoDB address a single collection.
 You can modify the query to impose limits, skips, and sort orders.
 The order of documents returned by a query is not defined unless you specify a sort().
 Operations that modify existing documents (i.e. updates) use the same query syntax as queries to select documents to update.
 In aggregation pipeline, the $match pipeline stage provides access to MongoDB queries.

MongoDB provides a db.collection.findOne() method as a special case of find() that returns a single
document.

Projections, which are the second argument to the find() method, may either specify a list of fields to
return or list fields to exclude in the result documents.
Important: Except for excluding the _id field in inclusive projections, you cannot mix exclusive and
inclusive projections.

Projection Examples

Exclude One Field From a Result Set : db.records.find( { "user_id": { $lt: 42 } }, { "history": 0 } )
This query selects documents in the records collection that match the condition { "user_id": { $lt: 42
} }, and uses the projection { "history": 0 } to exclude the history field from the documents.

Return Two fields and the _id Field : db.records.find( { "user_id": { $lt: 42 } }, { "name": 1, "email": 1 } )
This query selects documents in the records collection that match the query { "user_id": { $lt: 42 }
} and uses the projection { "name": 1, "email": 1 } to return just the _id field (implicitly included), name
field, and the email field in the documents in the result set.

Return Two Fields and Exclude _id : db.records.find( { "user_id": { $lt: 42} }, { "_id": 0, "name": 1 , "email": 1 } )
This query selects documents in the records collection that match the query { "user_id": { $lt: 42}
}, and only returns the name and email fields in the documents in the result set.

Projection Behavior : MongoDB projections have the following properties:

 By default, the _id field is included in the results. To suppress the _id field from the result set,
specify _id: 0 in the projection document.
 For fields that contain arrays, MongoDB provides the following projection operators: $elemMatch,
$slice, and $.
 For related projection functionality in the aggregation framework pipeline, use the $project
pipeline stage.



Cursors
 In the mongo shell, the primary method for the read operation is the db.collection.find() method.
This method queries a collection and returns a cursor to the returning documents.
 To access the documents, you need to iterate the cursor. However, in the mongo shell, if the
returned cursor is not assigned to a variable using the var keyword, then the cursor is
automatically iterated up to 20 times to print up to the first 20 documents in the results.

For example, in the mongo shell, the following read operation queries the inventory collection for
documents that have type equal to ’food’ and automatically print up to the first 20 matching
documents:

db.inventory.find( { type: 'food' } );

To manually iterate the cursor to access the documents, see Iterate a Cursor in the mongo Shell

Cursor Behaviors

Starting in MongoDB 5.0 (and 4.4.8), cursors created within a client session can close when the
corresponding server session ends with the killSessions command, if the session times out, or if the
client has exhausted the cursor.

By default, server sessions have an expiration timeout of 30 minutes. To change the value, set
the localLogicalSessionTimeoutMinutes parameter when starting up mongod.

Cursors Opened Outside of a Session

Cursors that aren't opened under a session automatically close after 10 minutes of inactivity, or if the client has exhausted the cursor. To override this behavior in mongosh, you can use the cursor.noCursorTimeout() method:

var myCursor = db.users.find().noCursorTimeout();

After setting the noCursorTimeout option, you must either close the cursor manually with cursor.close() or exhaust the cursor's results.

Cursor Isolation :

MongoDB cursors can return the same document more than once in some situations.

 Because the cursor is not isolated during its lifetime, intervening write operations on a document
may result in a cursor that returns a document more than once if that document has changed.
 As a cursor returns documents other operations may interleave with the query. If some of these
operations change the indexed field on the index used by the query; then the cursor will return the
same document more than once.



To handle this situation, we can use snapshot mode. This prevents the cursor from returning a
document more than once because an intervening write operation results in a move of the document.
The mongo shell provides the cursor.snapshot() method: db.collection.find().snapshot()

Cursor Batches

The MongoDB server returns the query results in batches. Batch size will not exceed the maximum
BSON document size. For most queries, the first batch returns 101 documents or just enough
documents to exceed 1 megabyte. Subsequent batch size is 4 megabytes.

To override the default size of the batch, we can use batchSize() and limit(). batchSize() specifies the number of documents to return in each batch of the response from the MongoDB instance. In most cases, modifying the batch size will not affect the user or the application, as mongosh and most drivers return results as if MongoDB returned a single batch.

Example : db.inventory.find().batchSize(10) sets the batch size for the results of a query (i.e. find())
to 10. The batchSize() method does not change the output in mongosh, which, by default, iterates over
the first 20 documents.

Note : Specifying 1 or a negative number is analogous to using the limit() method.

For queries that include a sort operation without an index, the server must load all the documents in
memory to perform the sort before returning any results. As you iterate through the cursor and reach
the end of the returned batch, if there are more results, cursor.next() will perform a getmore operation
to retrieve the next batch. To see how many documents remain in the batch as you iterate the cursor,
you can use the objsLeftInBatch() method, as in the following example:

var myCursor = db.inventory.find();


var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
myCursor.objsLeftInBatch();

Cursor Information

The db.serverStatus() method returns a document that includes a metrics field. The metrics field
contains a cursor field with the following information:
 number of timed out cursors since the last server restart
 number of open cursors with the option DBQuery.Option.noTimeout set to prevent timeout
after a period of inactivity
 number of “pinned” open cursors
 total number of open cursors

When we call the db.serverStatus() method and accesses the metrics field from the results and then
the cursor field from the metrics field it results following document:



db.serverStatus().metrics.cursor
{
   "timedOut" : <number>,
   "open" : {
      "noTimeout" : <number>,
      "pinned" : <number>,
      "total" : <number>
   }
}

A pinned cursor is an internal implementation detail where the pinned flag denotes an open cursor that is actively in use and should not be deleted. Cursors are generally pinned for a short period of time. Example: a find() or getMore() operation will pin a cursor to prevent it from being deleted while fetching a next batch of results, and unpin the cursor when results are returned.
Type Bracketing

MongoDB treats some data types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison. For most data types, however, comparison operators only perform comparisons on documents where the BSON type of the target field matches the type of the query operand. Consider the following collection:

{ "_id": "apples", "qty": 5 }
{ "_id": "bananas", "qty": 7 }
{ "_id": "oranges", "qty": { "in stock": 8, "ordered": 12 } }
{ "_id": "avocados", "qty": "fourteen" }

The following query uses $gt to return documents where the value of qty is greater than 4:

db.collection.find( { qty: { $gt: 4 } } )

The query returns the following documents:

{ "_id": "apples", "qty": 5 }
{ "_id": "bananas", "qty": 7 }

The document with _id equal to "oranges" is not returned because its qty value is of type object, and the document with _id equal to "avocados" is not returned because its qty value is of type string.

Iterate the Returned Cursor

The find() method returns a cursor to the results. In the mongosh shell, if the returned cursor is not assigned to a variable using the var keyword, the cursor is automatically iterated to access up to the first 20 documents that match the query. You can set the DBQuery.shellBatchSize variable to change the number of automatically iterated documents.
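For example, to have the shell print 10 documents per iteration instead of the default 20:

DBQuery.shellBatchSize = 10;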

To manually iterate over the results, assign the returned cursor to a variable with the var keyword, as
shown in the following sections.

With Variable Name

Here we use the variable myCursor to hold the returned cursor so that we can iterate it and print the matching documents:
var myCursor = db.bios.find( );

With next() Method : This cursor method next() is used to access the next document in the cursor:
var myCursor = db.bios.find( );
var myDocument = myCursor.hasNext() ? myCursor.next() : null;

if (myDocument) {
   var myName = myDocument.name;
   print (tojson(myName));
}

To print, you can also use the printjson() method instead of print(tojson()):

if (myDocument) {
   var myName = myDocument.name;
   printjson(myName);
}

We can also use the cursor method next() in a loop to access the documents, as shown below. hasNext() returns true if the cursor has more documents and can be iterated.

var myCursor = db.users.find( { type: 2 } );
while (myCursor.hasNext()) {
   printjson(myCursor.next());
}

With forEach() Method

The forEach() function is used to apply a JavaScript function to every document present in the cursor. The following example uses forEach() to iterate the cursor and access the documents:

var myCursor = db.bios.find( );
myCursor.forEach(printjson);

Iterator Index

In mongosh, you can use the toArray() method to iterate the cursor and return the documents in an array, as in the following:

var myCursor = db.inventory.find( { type: 2 } );
var documentArray = myCursor.toArray();
var myDocument = documentArray[3];

Note that the toArray() method loads into RAM all documents returned by the cursor; the toArray() method exhausts the cursor.

Additionally, some drivers provide access to the documents by using an index on the cursor (i.e. cursor[index]). This is a shortcut for first calling the toArray() method and then using an index on the resulting array.

var myCursor = db.users.find( { type: 2 } );
var myDocument = myCursor[1];

Here myCursor[1] is equivalent to myCursor.toArray()[1].

The findOne() method

Apart from the find() method, there is findOne() method, that returns only one document.
Syntax : >db.COLLECTION_NAME.findOne()
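For example, using the mycol collection shown in the next section, the following returns the first document whose title matches, rather than a cursor:

> db.mycol.findOne({ title: "MongoDB Overview" })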



RDBMS Where Clause Equivalents in MongoDB

We will use following collection for testing different options of query

Example : Assume we have created a collection named mycol as −

> use sampleDB
switched to db sampleDB
> db.createCollection("mycol")
{ "ok" : 1 }
>
And inserted 3 documents in it using the insert() method as shown below −
> db.mycol.insert([
{
title: "MongoDB Overview",
description: "MongoDB is no SQL database",
by: "tutorials point",
url: "http://www.tutorialspoint.com",
tags: ["mongodb", "database", "NoSQL"],
likes: 100
},
{
title: "NoSQL Database",
description: "NoSQL database doesn't have tables",
by: "tutorials point",
url: "http://www.tutorialspoint.com",
tags: ["mongodb", "database", "NoSQL"],
likes: 20,
comments: [
{
user:"user1",
message: "My first comment",
dateCreated: new Date(2013,11,10,2,35),
like: 0
}
]
}
])
Following method retrieves all the documents in the collection −
> db.mycol.find()
{ "_id" : ObjectId("5dd4e2cc0821d3b44607534c"), "title" : "MongoDB Overview", "description" :
"MongoDB is no SQL database", "by" : "tutorials point", "url" : "http://www.tutorialspoint.com", "tags"
: [ "mongodb", "database", "NoSQL" ], "likes" : 100 }



{ "_id" : ObjectId("5dd4e2cc0821d3b44607534d"), "title" : "NoSQL Database", "description" : "NoSQL
database doesn't have tables", "by" : "tutorials point", "url" : "http://www.tutorialspoint.com", "tags" :
[ "mongodb", "database", "NoSQL" ], "likes" : 20, "comments" : [ { "user" : "user1", "message" : "My
first comment", "dateCreated" : ISODate("2013-12-09T21:05:00Z"), "like" : 0 } ] }
>
The pretty() Method

To display the results in a formatted way, you can use pretty() method.
Syntax : >db.COLLECTION_NAME.find().pretty()
Example
Following example retrieves all the documents from the collection named mycol and arranges them in
an easy-to-read format.
> db.mycol.find().pretty()
{
"_id" : ObjectId("5dd4e2cc0821d3b44607534c"),
"title" : "MongoDB Overview",
"description" : "MongoDB is no SQL database",
"by" : "tutorials point",
"url" : "http://www.tutorialspoint.com",
"tags" : [
"mongodb",
"database",
"NoSQL"
],
"likes" : 100
}
{
"_id" : ObjectId("5dd4e2cc0821d3b44607534d"),
"title" : "NoSQL Database",
"description" : "NoSQL database doesn't have tables",
"by" : "tutorials point",
"url" : "http://www.tutorialspoint.com",
"tags" : [
"mongodb",
"database",
"NoSQL"
],
"likes" : 20,
"comments" : [
{
"user" : "user1",
"message" : "My first comment",
"dateCreated" : ISODate("2013-12-09T21:05:00Z"),
"like" : 0



}
]
}
To query the document on the basis of some condition, you can use the following operations.

Equality : {<key>:{$eq:<value>}}
   Example : db.mycol.find({"by":"tutorials point"}).pretty()
   RDBMS equivalent : where by = 'tutorials point'

Less Than : {<key>:{$lt:<value>}}
   Example : db.mycol.find({"likes":{$lt:50}}).pretty()
   RDBMS equivalent : where likes < 50

Less Than Equals : {<key>:{$lte:<value>}}
   Example : db.mycol.find({"likes":{$lte:50}}).pretty()
   RDBMS equivalent : where likes <= 50

Greater Than : {<key>:{$gt:<value>}}
   Example : db.mycol.find({"likes":{$gt:50}}).pretty()
   RDBMS equivalent : where likes > 50

Greater Than Equals : {<key>:{$gte:<value>}}
   Example : db.mycol.find({"likes":{$gte:50}}).pretty()
   RDBMS equivalent : where likes >= 50

Not Equals : {<key>:{$ne:<value>}}
   Example : db.mycol.find({"likes":{$ne:50}}).pretty()
   RDBMS equivalent : where likes != 50

Values in an array : {<key>:{$in:[<value1>, <value2>,……<valueN>]}}
   Example : db.mycol.find({"name":{$in:["Raj", "Ram", "Raghu"]}}).pretty()
   RDBMS equivalent : where name matches any of the values in ["Raj", "Ram", "Raghu"]

Values not in an array : {<key>:{$nin:[<value1>, <value2>,……<valueN>]}}
   Example : db.mycol.find({"name":{$nin:["Ramu", "Raghav"]}}).pretty()
   RDBMS equivalent : where name is not in the array ["Ramu", "Raghav"], or doesn’t exist at all

AND in MongoDB

Syntax : To query documents based on the AND condition, you need to use $and keyword. Following is
the basic syntax of AND –

>db.mycol.find({ $and: [ {<key1>:<value1>}, { <key2>:<value2>} ] })


Example : Following example will show all the tutorials written by 'tutorials point' and whose title is
'MongoDB Overview'.
> db.mycol.find({$and:[{"by":"tutorials point"},{"title": "MongoDB Overview"}]}).pretty()
{
   "_id" : ObjectId("5dd4e2cc0821d3b44607534c"),
   "title" : "MongoDB Overview",
   "description" : "MongoDB is no SQL database",
   "by" : "tutorials point",
   "url" : "http://www.tutorialspoint.com",
   "tags" : [
      "mongodb",
      "database",
      "NoSQL"
   ],
   "likes" : 100
}
>

For the above given example, the equivalent where clause will be ' where by = 'tutorials point' AND title = 'MongoDB Overview' '. You can pass any number of key, value pairs in the find clause.
OR in MongoDB

To query documents based on the OR condition, you need to use $or keyword.
Example : Following example will show all the tutorials written by 'tutorials point' or whose title is
'MongoDB Overview'.
>db.mycol.find({$or:[{"by":"tutorials point"},{"title": "MongoDB Overview"}]}).pretty()
{
"_id": ObjectId(7df78ad8902c),
"title": "MongoDB Overview",
"description": "MongoDB is no sql database",
"by": "tutorials point",
"url": "http://www.tutorialspoint.com",
"tags": ["mongodb", "database", "NoSQL"],
"likes": "100"
}
>
Using AND and OR Together

The following example will show the documents that have likes greater than 10 and whose title is
either 'MongoDB Overview' or by is 'tutorials point'. Equivalent SQL where clause is 'where likes>10
AND (by = 'tutorials point' OR title = 'MongoDB Overview')'

>db.mycol.find({"likes": {$gt:10}, $or: [{"by": "tutorials point"},


{"title": "MongoDB Overview"}]}).pretty()
{
"_id": ObjectId(7df78ad8902c),
"title": "MongoDB Overview",
"description": "MongoDB is no sql database",
"by": "tutorials point",
"url": "http://www.tutorialspoint.com",
"tags": ["mongodb", "database", "NoSQL"],
"likes": "100"
}
>
Query on Embedded Documents
When the field holds an embedded document, a query can either specify an exact match on the
embedded document or specify a match by individual fields in the embedded document using the dot
notation.

Exact Match on the Embedded Document

To specify an equality match on the whole embedded document, use the query document { <field>:
<value>} where <value> is the document to match. Equality matches on an embedded document
require an exact match of the specified <value>, including the field order.

In the following example, the query matches all documents where the value of the field producer is an
embedded document that contains only the field company with the value ’ABC123’ and the field
address with the value ’123 Street’, in the exact order:

db.inventory.find( { producer: { company: 'ABC123', address: '123 Street' } } )

Equality Match on Fields within an Embedded Document

Use the dot notation to match by specific fields in an embedded document. Equality matches for
specific fields in an embedded document will select documents in the collection where the embedded
document contains the specified fields with the specified values. The embedded document can contain
additional fields.

In the following example, the query uses the dot notation to match all documents where the value of
the field producer is an embedded document that contains a field company with the value ’ABC123’
and may contain other fields:

db.inventory.find( { 'producer.company': 'ABC123' } )

Return Specific Fields in Embedded Documents

Use the dot notation to return specific fields inside an embedded document. For example, the
inventory collection contains the following document:
{
"_id" : 3, "type" : "food", "item" : "aaa",
"classification": { dept: "grocery", category: "chocolate" }
}



The following operation returns all documents that match the query. The specified projection returns
only the category field in the classification document. The returned category field remains inside the
classification document.

db.inventory.find( { type: 'food', _id: 3 }, { "classification.category": 1, _id: 0 } )

The operation returns the following document:

{ "classification" : { "category" : "chocolate" } }

Arrays

When the field holds an array, you can query for an exact array match or for specific values in the
array. If the array holds embedded documents, you can query for specific fields in the embedded
documents using dot notation. If you specify multiple conditions using the $elemMatch operator, the
array must contain at least one element that satisfies all the conditions.

If you specify multiple conditions without using the $elemMatch operator, then some combination of
the array elements, not necessarily a single element, must satisfy all the conditions; i.e. different
elements in the array can satisfy different parts of the conditions. Consider an inventory collection that
contains the following documents:
{ _id: 5, type: "food", item: "aaa", ratings: [ 5, 8, 9 ] }
{ _id: 6, type: "food", item: "bbb", ratings: [ 5, 9 ] }
{ _id: 7, type: "food", item: "ccc", ratings: [ 9, 5, 8 ] }

Exact Match on an Array

To specify equality match on an array, use the query document { <field>: <value> } where <value> is
the array to match. Equality matches on the array require that the array field match exactly the
specified <value>, including the element order.

The following example queries for all documents where the field ratings is an array that holds exactly three elements, 5, 8, and 9, in this order:

db.inventory.find( { ratings: [ 5, 8, 9 ] } )

The operation returns the following document:

{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }

Match an Array Element

Equality matches can specify a single element in the array to match. These specifications match if the
array contains at least one element with the specified value.
The following example queries for all documents where ratings is an array that contains 5 as one of its
elements:

db.inventory.find( { ratings: 5 } )



The operation returns the following documents:

{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }
{ "_id" : 6, "type" : "food", "item" : "bbb", "ratings" : [ 5, 9 ] }
{ "_id" : 7, "type" : "food", "item" : "ccc", "ratings" : [ 9, 5, 8 ] }

Match a Specific Element of an Array

Equality matches can specify equality matches for an element at a particular index or position of the
array using the dot notation. In the following example, the query uses the dot notation to match all
documents where the ratings array contains 5 as the first element:

db.inventory.find( { 'ratings.0': 5 } )

The operation returns the following documents:

{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }
{ "_id" : 6, "type" : "food", "item" : "bbb", "ratings" : [ 5, 9 ] }

On Arrays and Embedded Documents

The following operation queries the bios collection and returns the last field in the name embedded
document and the first two elements in the contribs array:
db.bios.find({ }, { _id: 0, 'name.last': 1, contribs: { $slice: 2 } } )

Starting in MongoDB 4.4, you can also specify embedded fields using the nested form, e.g.
db.bios.find( { }, { _id: 0, name: { last: 1 }, contribs: { $slice: 2 } } )

Specify Multiple Criteria for Array Elements

The $elemMatch operator limits the contents of an <array> field from the query results to contain only
the first element matching the $elemMatch condition.

For Single Element Satisfies the Criteria use $elemMatch operator to specify multiple criteria on the
elements of an array such that at least one array element satisfies all the specified criteria. The
following example queries for documents where the ratings array contains at least one element that is
greater than ($gt) 5 and less than ($lt) 9:

db.inventory.find( { ratings: { $elemMatch: { $gt: 5, $lt: 9 } } } )

The operation returns the following documents, whose ratings array contains the element 8 which
meets the criteria:
{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }
{ "_id" : 7, "type" : "food", "item" : "ccc", "ratings" : [ 9, 5, 8 ] }



Other document { "_id" : 6, "type" : "food", "item" : "bbb", "ratings" : [ 5, 9 ] } does not match the
criteria because at least one array element 5 or 9 is not >5 and < 9.

Combination of Elements Satisfies the Criteria

The following example queries for documents where the ratings array contains elements that in some
combination satisfy the query conditions; e.g., one element can satisfy the greater than 5 condition
and another element can satisfy the less than 9 condition, or a single element can satisfy both:

db.inventory.find( { ratings: { $gt: 5, $lt: 9 } } )

The operation returns the following documents:

{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }
{ "_id" : 6, "type" : "food", "item" : "bbb", "ratings" : [ 5, 9 ] }
{ "_id" : 7, "type" : "food", "item" : "ccc", "ratings" : [ 9, 5, 8 ] }

The document with "ratings" : [ 5, 9 ] matches the query since the element 9 is greater than 5 (the first condition) and the element 5 is less than 9 (the second condition).

Array of Embedded Documents

Consider that the inventory collection includes the following documents, each holding an array of embedded documents in the memos field:

{
   _id: 100, type: "food", item: "xyz",
   qty: 25, price: 2.5,
   ratings: [ 5, 8, 9 ],
   memos: [ { memo: "on time", by: "shipping" }, { memo: "approved", by: "billing" } ]
}
{
   _id: 101, type: "fruit", item: "jkl",
   qty: 10, price: 4.25,
   ratings: [ 5, 9 ],
   memos: [ { memo: "on time", by: "payment" }, { memo: "delayed", by: "shipping" } ]
}
Match a Field in the Embedded Document Using the Array Index

If you know the array index of the embedded document, you can specify the document using the
embedded document’s position using the dot notation. The following example selects all documents
where the memos contains an array whose first element (i.e. index is 0) is a document that contains
the field by whose value is ’shipping’:

db.inventory.find( { 'memos.0.by': 'shipping' } )

The operation returns the following document:

{
   _id: 100, type: "food", item: "xyz",
   qty: 25, price: 2.5, ratings: [ 5, 8, 9 ],
   memos: [ { memo: "on time", by: "shipping" }, { memo: "approved", by: "billing" } ]
}
Match a Field Without Specifying Array Index

If you do not know the index position of the document in the array, concatenate the name of the field
that contains the array, with a dot (.) and the name of the field in the embedded document.

The following example selects all documents where the memos field contains an array that contains at
least one embedded document that contains the field by with the value ’shipping’:

db.inventory.find( { 'memos.by': 'shipping' } )


The operation returns the following documents:
{
_id: 100, type: "food", item: "xyz",
qty: 25, price: 2.5, ratings: [ 5, 8, 9 ],
memos: [ { memo: "on time", by: "shipping" }, { memo: "approved", by: "billing" } ]
}
{
_id: 101, type: "fruit", item: "jkl",
qty: 10, price: 4.25, ratings: [ 5, 9 ],
memos: [ { memo: "on time", by: "payment" }, { memo: "delayed", by: "shipping" } ]
}
Specify Multiple Criteria for Array of Documents

For Single Element Satisfies the Criteria use $elemMatch operator to specify multiple criteria on an
array of embedded documents such that at least one embedded document satisfies all the specified
criteria.
The following example queries for documents where the memos array has at least one embedded
document that contains both the field memo equal to ’on time’ and the field by equal to ’shipping’:

db.inventory.find( { memos: { $elemMatch: { memo: 'on time', by: 'shipping' } } } )

The operation returns the following document:


{
_id: 100, type: "food", item: "xyz",
qty: 25, price: 2.5, ratings: [ 5, 8, 9 ],
memos: [ { memo: "on time", by: "shipping" }, { memo: "approved", by: "billing" } ]
}
Combination of Elements Satisfies the Criteria

The following example queries for documents where the memos array contains elements that in some combination satisfy the query conditions; e.g. one element satisfies the field memo equal to ’on time’ condition and another element satisfies the field by equal to ’shipping’ condition, or a single element can satisfy both criteria:

db.inventory.find( { 'memos.memo': 'on time', 'memos.by': 'shipping' } )

The query returns the following documents:


{
_id: 100, type: "food", item: "xyz",
qty: 25, price: 2.5, ratings: [ 5, 8, 9 ],
memos: [ { memo: "on time", by: "shipping" }, { memo: "approved", by: "billing" } ]
}
{
_id: 101, type: "fruit", item: "jkl",
qty: 10, price: 4.25, ratings: [ 5, 9 ],
memos: [ { memo: "on time", by: "payment" }, { memo: "delayed", by: "shipping" } ]
}
MongoDB - Projection
In MongoDB, projection means selecting only the necessary data rather than selecting whole of the
data of a document. If a document has 5 fields and you need to show only 3, then select only 3 fields
from them.

The find() Method

Selects documents in a collection or view and returns a cursor to the selected documents.
Syntax : db.collection.find(query, projection)

 query (document; Optional) − Specifies selection filter using query operators. To return all documents in a collection, omit this parameter or pass an empty document ({}).
 projection (document; Optional) − Specifies the fields to return in the documents that match the query filter. To return all fields in the matching documents, omit this parameter. In MongoDB, when you execute the find() method, it displays all fields of a document. To limit this, you need to set a list of fields with value 1 or 0: 1 is used to show the field while 0 is used to hide it.

Returns: When the find() method “returns documents,” the method is actually returning a cursor to
the documents.

Example
Consider the collection mycol has the following data −
{_id : ObjectId("507f191e810c19729de860e1"), title: "MongoDB Overview"},
{_id : ObjectId("507f191e810c19729de860e2"), title: "NoSQL Overview"},
{_id : ObjectId("507f191e810c19729de860e3"), title: "Tutorials Point Overview"}
Following example will display the title of the document while querying the document. Please note the _id field is always displayed while executing the find() method; if you don't want this field, then you need to set it as 0.

>db.mycol.find({},{"title":1,_id:0})
{"title":"MongoDB Overview"}
{"title":"NoSQL Overview"}
{"title":"Tutorials Point Overview"}
>
The projection parameter

It determines which fields are returned in the matching documents. The projection parameter takes a document of the form: { <field1>: <value>, <field2>: <value> ... }

 <field>: <1 or true> − Specifies the inclusion of a field.
 <field>: <0 or false> − Specifies the exclusion of a field.
 "<field>.$": <1 or true> − With the use of the $ array projection operator, you can specify the projection to return the first element that matches the query condition on the array field; e.g. "arrayField.$" : 1. (Not available for views.)
 <field>: <array projection> − Using the array projection operators $elemMatch, $slice, specifies the array element(s) to include, thereby excluding those elements that do not meet the expressions. (Not available for views.)
 <field>: <$meta expression> − Using the $meta operator expression, specifies the inclusion of available per-document metadata. (Not available for views.)
 <field>: <aggregation expression> − Specifies the value of the projected field. Starting in MongoDB 4.4, with the use of aggregation expressions and syntax, including the use of literals and aggregation variables, you can project new fields or project existing fields with new values. For example:
o If you specify a non-numeric, non-boolean literal (such as a literal string or an array or an operator expression) for the projection value, the field is projected with the new value; e.g.:
   { field: [ 1, 2, 3, "$someExistingField" ] }
   { field: "New String Value" }
   { field: { status: "Active", total: { $sum: "$existingArray" } } }
o To project a literal value for a field, use the $literal aggregation expression; e.g.:
   { field: { $literal: 5 } }
   { field: { $literal: true } }
   { field: { $literal: { fieldWithValue0: 0, fieldWithValue1: 1 } } }
In versions 4.2 and earlier, any specification value (with the exception of the previously unsupported document value) is treated as either true or false to indicate the inclusion or exclusion of the field. (New in version 4.4.)

Projection for Array Fields

For fields that contain arrays, MongoDB provides the following projection operators: $elemMatch,
$slice, and $.

For example, the inventory collection contains the following document:


{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }

Then the following operation uses the $slice projection operator to return just the first two elements in
the ratings array.

db.inventory.find( { _id: 5 }, { ratings: { $slice: 2 } } )

$elemMatch, $slice, and $ are the only way to project portions of an array. For instance, you cannot
project a portion of an array using the array index; e.g. { "ratings.0": 1 } projection will not project the
array with the first element.
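As a sketch of the $ positional operator against the same document (the array field must appear in the query condition), the following would return only the first ratings element that satisfies the condition, i.e. { "_id" : 5, "ratings" : [ 8 ] }:

db.inventory.find( { _id: 5, ratings: { $gte: 8 } }, { "ratings.$": 1 } )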

MongoDB - Limit Records


The Limit() Method
To limit the records in MongoDB, you need to use limit() method. The method accepts one number
type argument, which is the number of documents that you want to be displayed.

Syntax : >db.COLLECTION_NAME.find().limit(NUMBER)
Example : Consider the collection mycol has the following data.

{_id : ObjectId("507f191e810c19729de860e1"), title: "MongoDB Overview"},
{_id : ObjectId("507f191e810c19729de860e2"), title: "NoSQL Overview"},
{_id : ObjectId("507f191e810c19729de860e3"), title: "Tutorials Point Overview"}

Following example will display only two documents while querying the document.

>db.mycol.find({},{"title":1,_id:0}).limit(2)
{"title":"MongoDB Overview"}
{"title":"NoSQL Overview"}
>

If you don't specify the number argument in the limit() method then it will display all documents from the collection.
MongoDB Skip() Method
Method skip() accepts number type argument and is used to skip the number of documents.
Syntax : >db.COLLECTION_NAME.find().limit(NUMBER).skip(NUMBER)
Example
Following example will display only the second document.



>db.mycol.find({},{"title":1,_id:0}).limit(1).skip(1)
{"title":"NoSQL Overview"}
>
Please note, the default value in skip() method is 0.
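Together, limit() and skip() give simple pagination. A sketch for fetching the second page of 10 documents, sorted for a stable order (the collection name is assumed):

db.posts.find().sort({ _id: 1 }).skip(10).limit(10)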

MongoDB - Sort Records


The sort() Method

To sort documents in MongoDB, you need to use sort() method. The method accepts a document
containing a list of fields along with their sorting order. To specify sorting order 1 and -1 are used. 1 is
used for ascending order while -1 is used for descending order.

Syntax : >db.COLLECTION_NAME.find().sort({KEY:1})
Example
Consider the collection mycol has the following data.
{_id : ObjectId("507f191e810c19729de860e1"), title: "MongoDB Overview"}
{_id : ObjectId("507f191e810c19729de860e2"), title: "NoSQL Overview"}
{_id : ObjectId("507f191e810c19729de860e3"), title: "Tutorials Point Overview"}
Following example will display the documents sorted by title in the descending order.

>db.mycol.find({},{"title":1,_id:0}).sort({"title":-1})
{"title":"Tutorials Point Overview"}
{"title":"NoSQL Overview"}
{"title":"MongoDB Overview"}
>
Please note, if you don't specify the sorting preference, then sort() method will display the documents
in ascending order.

We can use the following syntax to sort documents in MongoDB by multiple fields:

db.myCollection.find().sort( { "field1": 1, "field2": -1 } )



MongoDB - Update Document
MongoDB's update() and save() methods are used to update document into a collection.
 The update() method updates the values in the existing document
 The save() method replaces the existing document with the document passed in save() method.

Syntax : >db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)


Example
Consider the mycol collection has the following data.
{ "_id" : ObjectId(5983548781331adf45ec5), "title":"MongoDB Overview"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"Tutorials Point Overview"}

Following example will set the new title 'New MongoDB Tutorial' of the documents whose title is
'MongoDB Overview'.
>db.mycol.update({'title':'MongoDB Overview'},{$set:{'title':'New MongoDB Tutorial'}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

 Choose the condition which you want to use to decide which document needs to be updated. Here
we want to update the document which has title = ‘MongoDB Overview’
 Use the set command to modify the Field Name
 Choose which Field Name you want to modify and enter the new value accordingly.
 The update operation returns a WriteResult object which contains the status of the operation. A
successful update of the document returns the above object. The nMatched field specifies the
number of existing documents matched for the update, and nModified specifies the number of
existing documents modified.

>db.mycol.find()
{ "_id" : ObjectId(5983548781331adf45ec5), "title":"New MongoDB Tutorial"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"Tutorials Point Overview"}
>
>db.Employee.update({"Employeeid" : 1},{$set: { "EmployeeName" : "NewMartin"}});
If the command is executed successfully, the following output will be shown:

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })



By default, MongoDB will update only a single document. To update multiple documents, you need to
set a parameter 'multi' to true.

>db.mycol.update({'title':'MongoDB Overview'}, {$set:{'title':'New MongoDB Tutorial'}},{multi:true})

Update an embedded field.

To update a field within an embedded document, use the dot notation. When using the dot notation,
enclose the whole dotted field name in quotes. The following updates the model field within the
embedded details document.

db.inventory.update({ item: "ABC1" },{ $set: { "details.model": "14Q2" } })

The update operation returns a WriteResult object which contains the status of the operation. A
successful update of the document returns the following object:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Replace the Document

To replace the entire content of a document except for the _id field, pass an entirely new document as
the second argument to update(). The replacement document can have different fields from the
original document. In the replacement document, you can omit the _id field since the _id field is
immutable. If you do include the _id field, it must be the same value as the existing value.

The following operation replaces the document with item equal to "BE10". The newly replaced
document will only contain the _id field and the fields in the replacement document.
db.inventory.update(
   { item: "BE10" },
   {
      item: "BE05",
      stock: [ { size: "S", qty: 20 }, { size: "M", qty: 5 } ],
      category: "apparel"
   }
)
upsert Option

By default, if no document matches the update query, the update() method does nothing. However, by
specifying upsert: true, the update() method either updates matching document or documents, or
inserts a new document using the update specification if no matching document exists.

 Specify upsert: true for the update replacement operation.

When you specify upsert: true for an update operation to replace a document and no matching
documents are found, MongoDB creates a new document using the equality conditions in the update
conditions document, and replaces this document, except for the _id field if specified, with the update
document.



The following operation either updates a matching document by replacing it with a new document or
adds a new document if no matching document exists.

db.inventory.update(
   { item: "TBD1" },
   {
      item: "TBD1",
      details: { "model" : "14Q4", "manufacturer" : "ABC Company" },
      stock: [ { "size" : "S", "qty" : 25 } ],
      category: "houseware"
   },
   { upsert: true }
)
The update operation returns a WriteResult object which contains the status of the operation,
including whether the db.collection.update() method modified an existing document or added a new
document.
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0,
"_id" : ObjectId("53dbd684babeaec6342ed6c7") })

The nMatched field shows that the operation matched 0 documents.


The nUpserted of 1 shows that the update added a document.
The nModified of 0 specifies that no existing documents were updated.
The _id field shows the generated _id field for the added document.

 Specify upsert: true for the update specific fields operation.

When you specify upsert : true for an update operation that modifies specific fields and no matching
documents are found, MongoDB creates a new document using the equality conditions in the update
conditions document, and applies the modification as specified in the update document. The following
update operation either updates specific fields of a matching document or adds a new document if no
matching document exists.

db.inventory.update(
   { item: "TBD2" },
   { $set: { details: { "model" : "14Q3", "manufacturer" : "IJK Co." }, category: "houseware" } },
   { upsert: true }
)

MongoDB Save() Method


Updates an existing document or inserts a new document, depending on its document parameter.
The save() method has the following form:

db.collection.save(
   <document>,
   { writeConcern: <document> }   // Optional. A document expressing the write concern. Omit to use the default write concern.
)

The save() returns an object that contains the status of the operation.



The save() method uses either the insert or the update command, which use the default write concern.
To specify a different write concern, include the write concern in the options parameter.

Insert : If the document does not contain an _id field, then the save() method calls the insert() method.
During the operation, the mongo shell will create an ObjectId and assign it to the _id field.

NOTE
Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the
insert operation to MongoDB; however, if the client sends a document without an _id field,
the mongod will add the _id field and generate the ObjectId.

Update : If the document contains an _id field, then the save() method is equivalent to an update with
the upsert option set to true and the query predicate on the _id field.
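As an illustration of that equivalence, a save() carrying an _id (values borrowed from the example below) behaves roughly like an update() with upsert enabled:

db.products.save( { _id: 100, item: "water", qty: 30 } )
// roughly equivalent to:
db.products.update( { _id: 100 }, { _id: 100, item: "water", qty: 30 }, { upsert: true } )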

Examples

Save a New Document without Specifying an _id Field

Here save() method performs an insert since the document passed to the method does not contain
the _id field:
db.products.save( { item: "book", qty: 40 } )

During the insert, the shell will create the _id field with a unique ObjectId value, as verified by the
inserted document:
{ "_id" : ObjectId("50691737d386d8fadbd6b01d"), "item" : "book", "qty" : 40 }

Save a New Document Specifying an _id Field : Here save() performs an update with upsert:true since
the document contains an _id field:
db.products.save( { _id: 100, item: "water", qty: 30 } )

Because the _id field holds a value that does not exist in the collection, the update operation results in
an insertion of the document. The results of these operations are identical to an update() method with
the upsert option set to true.

Following example will replace the document with the _id '507f191e810c19729de860ea'.
>db.mycol.save(
{
"_id" : ObjectId("507f191e810c19729de860ea"), "title":"Tutorials Point New Topic",
"by":"Tutorials Point"
}
)
WriteResult({"nMatched" : 0, "nUpserted" : 1, "nModified" : 0,
"_id" : ObjectId("507f191e810c19729de860ea")
})



>db.mycol.find()
{ "_id" : ObjectId("507f191e810c19729de860ea"), "title":"Tutorials Point New Topic",
"by":"Tutorials Point"}
{ "_id" : ObjectId("507f191e810c19729de860e6"), "title":"NoSQL Overview"}
{ "_id" : ObjectId("507f191e810c19729de860e7"), "title":"Tutorials Point Overview"}
>

MongoDB findOneAndUpdate() method


The findOneAndUpdate() method updates the values in the existing document.

Syntax : >db.COLLECTION_NAME.findOneAndUpdate(SELECTION_CRITERIA, UPDATED_DATA)


Example
Assume we have created a collection named empDetails and inserted three documents in it as shown
below −
> db.empDetails.insertMany(
[
{
First_Name: "Radhika", Last_Name: "Sharma",
Age: "26", e_mail: "radhika_sharma.123@gmail.com",
phone: "9000012345"
},
{
First_Name: "Rachel", Last_Name: "Christopher",
Age: "27", e_mail: "Rachel_Christopher.123@gmail.com",
phone: "9000054321"
},
{
First_Name: "Fathima", Last_Name: "Sheik",
Age: "24", e_mail: "Fathima_Sheik.123@gmail.com",
phone: "9000054321"
}
]
)
Following example updates the age and email values of the document with name 'Radhika'. Note that by default findOneAndUpdate() returns the document as it was before the update; pass the option returnNewDocument: true (as here) to get the updated document back.
> db.empDetails.findOneAndUpdate( {First_Name: 'Radhika'},
{ $set: { Age: '30', e_mail: 'radhika_newemail@gmail.com'}},
{ returnNewDocument: true } )
{
"_id" : ObjectId("5dd6636870fb13eec3963bf5"),
"First_Name" : "Radhika", "Last_Name" : "Sharma",
"Age" : "30",
"e_mail" : "radhika_newemail@gmail.com",
"phone" : "9000012345"
}
MongoDB updateOne() method
 This method updates a single document that matches the given filter; like updateMany(), it can be used in multi-document transactions.
 When you update your documents, the value of the _id field does not change.
 It can also add new fields to the documents.

Syntax : >db.COLLECTION_NAME.updateOne(<filter>, <update>, options)

Parameters:
filter: It specifies the selection criteria for the update. The type of this parameter is document. If an empty document, i.e. {}, is passed, this method updates the first document returned in the collection with the update document.
update: The type of this parameter is document or pipeline and it contains modification that will apply
to the documents. It can be a update Document(only contain update operator expressions) or
aggregation pipeline(only contain aggregation stages, i.e, $addFields, $project, $replaceRoot).

Options :
{
upsert: <boolean>,
writeConcern: <document>,
collation: <document>,
arrayFilters: [ <filterdocument1>, ... ],
hint: <document|string> // Available starting in MongoDB 4.2.1
}

Optional Parameters:
 upsert: The value of this parameter is either true or false. If true, the method updates the document that matches the given filter, or, if no document in the collection matches the filter, inserts a new document (i.e. performs an upsert). The type of this parameter is Boolean and its default value is false.
 writeConcern: It is only used when you do not want to use the default write concern. The type of
this parameter is document.
 collation: It specifies the use of the collation for operations. It allows users to specify the language-
specific rules for string comparison like rules for letter case and accent marks. The type of this
parameter is document.
 arrayFilters: It is an array of filter documents that indicates which array elements to modify for an
update operation on an array field. The type of this parameter is an array.
 hint: It is a document or field that specifies the index to use to support the filter. It can take an
index specification document or the index name string and if you specify an index that does not
exist, then it will give an error.



Return: This method will return a document that contains a boolean acknowledged as true (if the write
concern is enabled) or false (if the write concern is disabled), matchedCount represents the number of
matched documents, modifiedCount represents the number of modified documents, and upsertedId
represents the _id of the upserted document.

Example
> db.empDetails.updateOne({First_Name: 'Radhika'},
{ $set: { Age: '30',e_mail: 'radhika_newemail@gmail.com'}})
The operation returns
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 0 }
after a successful update. Here modifiedCount is 0 because the earlier findOneAndUpdate() had already set these values, so nothing needed to change.
If no matches were found, the operation instead returns:
{ "acknowledged" : true, "matchedCount" : 0, "modifiedCount" : 0 }

Note : Setting upsert: true would insert the document if no match was found.
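A minimal sketch of that behaviour (the filter value is just an illustration):

db.empDetails.updateOne(
   { First_Name: 'Ankit' },        // assume no document matches this name
   { $set: { Age: '25' } },
   { upsert: true }
)
// The result then reports matchedCount: 0, modifiedCount: 0 and includes an upsertedId for the new document.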

MongoDB updateMany() method


The updateMany() method updates all the documents that matches the given filter.

Syntax : >db.COLLECTION_NAME.updateMany(<filter>, <update>, options)


Example
> db.empDetails.updateMany({Age:{ $gt: "25" }},{ $set: { Age: '00'}} )
{ "acknowledged" : true, "matchedCount" : 2, "modifiedCount" : 2 }

You can see the updated values if you retrieve the contents of the document using the find method as
shown below −
> db.empDetails.find()
{ "_id" : ObjectId("5dd6636870fb13eec3963bf5"), "First_Name" : "Radhika", "Last_Name" : "Sharma",
"Age" : "00", "e_mail" : "radhika_newemail@gmail.com", "phone" : "9000012345" }
{ "_id" : ObjectId("5dd6636870fb13eec3963bf6"), "First_Name" : "Rachel", "Last_Name" :
"Christopher", "Age" : "00", "e_mail" : "Rachel_Christopher.123@gmail.com", "phone" : "9000054321"
}
{ "_id" : ObjectId("5dd6636870fb13eec3963bf7"), "First_Name" : "Fathima", "Last_Name" : "Sheik",
"Age" : "24", "e_mail" : "Fathima_Sheik.123@gmail.com", "phone" : "9000054321" }
>
MongoDB - Delete Document

The remove() Method


MongoDB's remove() method is used to remove a document from the collection. remove() method
accepts two parameters. One is deletion criteria and second is justOne flag.
 deletion criteria − (Optional) deletion criteria according to which documents will be removed.
 justOne − (Optional) if set to true or 1, then remove only one document.



Syntax : >db.COLLECTION_NAME.remove(DELETION_CRITERIA)

Example

Consider the mycol collection has the following data.


{_id : ObjectId("507f191e810c19729de860e1"), title: "MongoDB Overview"},
{_id : ObjectId("507f191e810c19729de860e2"), title: "NoSQL Overview"},
{_id : ObjectId("507f191e810c19729de860e3"), title: "Tutorials Point Overview"}
Following example will remove all the documents whose title is 'MongoDB Overview'.
>db.mycol.remove({'title':'MongoDB Overview'})
WriteResult({"nRemoved" : 1})
> db.mycol.find()
{"_id" : ObjectId("507f191e810c19729de860e2"), "title" : "NoSQL Overview" }
{"_id" : ObjectId("507f191e810c19729de860e3"), "title" : "Tutorials Point Overview" }

Remove Only One


If there are multiple records and you want to delete only the first record, then set justOne parameter
in remove() method.
>db.COLLECTION_NAME.remove(DELETION_CRITERIA,1)
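For example, assuming several documents share the same title, the following removes only the first matching document:

db.mycol.remove( {'title':'NoSQL Overview'}, 1 )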

Remove All Documents


If you don't specify deletion criteria, then MongoDB will delete all documents from the
collection. This is the equivalent of SQL's truncate command.
> db.mycol.remove({})
WriteResult({ "nRemoved" : 2 })
> db.mycol.find()
>



Advanced MongoDB queries for nested documents

Sample data is some product data for an online shop of laptops, as demonstrated below :
[
{
"_id": 1,
"name": "HP EliteBook Model 1",
"price": 38842.0,
"quantity": 1,
"brand": "HP",
"attributes": [
{ "attribute_name": "cpu", "attribute_value": "Intel Core i7" },
{ "attribute_name": "memory", "attribute_value": "8GB" },
{ "attribute_name": "storage", "attribute_value": "256GB" }
]
},
{
"_id": 2,
"name": "Lenovo IdeaPad Model 2",
"price": 9405.0,
"quantity": 2,
"brand": "Lenovo",
"attributes": [
{ "attribute_name": "cpu", "attribute_value": "Intel Core i5" },
{ "attribute_name": "memory", "attribute_value": "8GB" },
{ "attribute_name": "storage", "attribute_value": "256GB" }
]
},
………
]
As we see, the laptop documents in the laptops collection have an attributes field that is an array of
embedded documents. It is more complex to query and update a field like this than simple non-nested
fields.
First, let’s find all the laptops whose CPU is Intel Core i7. This should be pretty simple because the
CPU value is unambiguous:

db.laptops.find(
{
"attributes.attribute_value": "Intel Core i7"
}
)
Note that the nested field is queried with a dot notation and must be put in quotes. We should get the
result we want because there is only one nested document in the attributes array that has the value for
CPU models:
[
{
_id: 34,
name: 'Lenovo ThinkPad Model 34',
price: 14988,
quantity: 3,
brand: 'Lenovo',
attributes: [
{ attribute_name: 'cpu', attribute_value: 'Intel Core i7' },
{ attribute_name: 'memory', attribute_value: '8GB' },
{ attribute_name: 'storage', attribute_value: '256GB' }
]
},
{
_id: 35,
name: 'Lenovo ThinkPad Model 35',
price: 22644,
quantity: 4,
brand: 'Lenovo',
attributes: [
{ attribute_name: 'cpu', attribute_value: 'Intel Core i7' },
{ attribute_name: 'memory', attribute_value: '8GB' },
{ attribute_name: 'storage', attribute_value: '256GB' }
]
},
......
]

Now let’s find all the laptops that have a memory of 16GB. Intuitively, you may want to use a query like
this:
db.laptops.find(
{
"attributes.attribute_name": "memory",
"attributes.attribute_value": "16GB"
}
)
When the above query is executed, it seems all the laptops whose memory is 16GB are returned:
[
{
_id: 9,
name: 'HP EliteBook Model 9',
price: 22209,
quantity: 9,
brand: 'HP',
attributes: [
{ attribute_name: 'cpu', attribute_value: 'Intel Core i7' },
{ attribute_name: 'memory', attribute_value: '16GB' },
{ attribute_name: 'storage', attribute_value: '512GB' }
]
},
{
_id: 11,
name: 'HP ZBook Model 11',
price: 45175,
quantity: 1,
brand: 'HP',
attributes: [
{
attribute_name: 'cpu',
attribute_value: 'Intel Core i7'
},
{ attribute_name: 'memory', attribute_value: '16GB' },
{ attribute_name: 'storage', attribute_value: '512GB' }
]
},
......
]
However, this is where many beginners of MongoDB make mistakes and where some bugs are
introduced into your code. If you enter “it” in mongosh to show more results or just scroll down close to
the bottom of the result page with an IDE and check carefully, you will find something strange:
...,
{
   _id: 144,
   name: 'HP ZBook Model 144',
   price: 14759,
   quantity: 2,
   brand: 'HP',
   attributes: [
      { attribute_name: 'cpu', attribute_value: 'Intel Core i5' },
      { attribute_name: 'memory', attribute_value: '16GB' },
      { attribute_name: 'storage', attribute_value: '16GB' }
   ]
},
{
   _id: 145,
   name: 'HP ZBook Model 145',
   price: 53855,
   quantity: 3,
   brand: 'HP',
   attributes: [
      { attribute_name: 'cpu', attribute_value: 'Intel Core i7' },
      { attribute_name: 'memory', attribute_value: '8GB' },
      { attribute_name: 'storage', attribute_value: '16GB' }
   ]
},
...

We get the laptops whose storage is 16GB as well with the query above! This is because the above query finds the documents where the attributes array has at least one embedded document that contains field attribute_name equal to memory and at least one embedded document (but not necessarily the same embedded document) that contains field attribute_value equal to 16GB.

Since all the attributes arrays have an embedded document whose attribute_name is memory, the
query above gave us erroneous results. What we want is that both conditions should be satisfied by
the same embedded document. To achieve this, we cannot query by dot notation as shown above but
need to use the $elemMatch operator:

db.laptops.find(
   {
      attributes: {
         $elemMatch: {
            attribute_name: 'memory',
            attribute_value: '16GB'
         }
      }
   }
)

This time the laptops whose storage is 16GB but memory is not 16GB will not be returned. If you
don’t believe it, you can count the number of documents that are returned by both queries:

db.laptops.find(
{
"attributes.attribute_name": "memory",
"attributes.attribute_value": "16GB"
}
).count()
// Returns 77

db.laptops.find(
{
attributes: {
$elemMatch: {
attribute_name: 'memory', attribute_value: '16GB'
}
}
}
).count()
// Returns 75



Now let’s try an even more complex case and find all the laptops whose memory is 16GB and storage is
1TB. We would need to use the $and operator to specify the conditions for two nested documents.

db.laptops.find(
{
$and: [
{
attributes: {
$elemMatch: {
attribute_name: 'memory',
attribute_value: '16GB'
}
}
},
{
attributes: {
$elemMatch: {
attribute_name: 'storage',
attribute_value: '1TB'
}
}
}]
}
)
This query will return the result we want:
[
{
_id: 22,
name: 'HP EliteBook Model 22',
price: 32425,
quantity: 1,
brand: 'HP',
attributes: [
{ attribute_name: 'cpu', attribute_value: 'Intel Core i7' },
{ attribute_name: 'memory', attribute_value: '16GB' },
{ attribute_name: 'storage', attribute_value: '1TB' }
]
},
{
_id: 107,
name: 'HP EliteBook Model 107',
price: 35450,
quantity: 6,
brand: 'HP',
attributes: [
{ attribute_name: 'cpu', attribute_value: 'Intel Core i7' },
{ attribute_name: 'memory', attribute_value: '16GB' },
{ attribute_name: 'storage', attribute_value: '1TB' }
]
},
{
_id: 129,
name: 'Lenovo Legion Model 129',
price: 29495,
quantity: 7,
brand: 'Lenovo',
attributes: [
{ attribute_name: 'cpu', attribute_value: 'Intel Core i7' },
{ attribute_name: 'memory', attribute_value: '16GB' },
{ attribute_name: 'storage', attribute_value: '1TB' }
]
}
]
Note that even though the $and operator is the default one in MongoDB, it is mandatory here because we are querying the same field in two different conditions. Otherwise, we will only query by the second condition and will get incorrect results:

db.laptops.find(
   {
      attributes: {
         $elemMatch: {
            attribute_name: 'memory',
            attribute_value: '16GB'
         }
      },
      attributes: {
         $elemMatch: {
            attribute_name: 'storage',
            attribute_value: '1TB'
         }
      }
   }
)

You will get one incorrect result in this case. If you try with other conditions, you will get more incorrect results:

...
{
   _id: 67,
   name: 'HP ZBook Studio Model 67',
   price: 54575,
   quantity: 6,
   brand: 'HP',
   attributes: [
      { attribute_name: 'cpu', attribute_value: 'Intel Core i7' },
      { attribute_name: 'memory', attribute_value: '32GB' },
      { attribute_name: 'storage', attribute_value: '1TB' }
   ]
},
...



Once you know how to query an array with nested documents, it should be fairly easy to update it. Note
that the nested documents are ordered in the array and therefore you can access a nested document by
the index position.

Let’s update the memory to 16GB for the laptop with _id equal to 1:
db.laptops.updateOne(
{ _id: 1 },
{ $set: { "attributes.1.attribute_value": "16GB" } }
)
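A related sketch: instead of hard-coding the index position, you can match the array element by its attribute_name and update it with the positional $ operator, which refers to the first element matched by the query:

db.laptops.updateOne(
   { _id: 1, "attributes.attribute_name": "memory" },
   { $set: { "attributes.$.attribute_value": "16GB" } }
)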



Mongo Shell
 Mongo Shell is a JavaScript based command line interface to connect to MongoDB and to perform
various operations. The legacy mongo shell comes with the MongoDB installation. The shell is useful for
performing administrative functions and examining running instances.
 The new MongoDB Shell, mongosh, is a fully functional JavaScript and Node.js 16.x REPL (Read-
Eval-Print-Loop) environment for interacting with MongoDB deployments. You can use
the MongoDB Shell to test queries and operations directly with your database.
 mongosh is available as a standalone package in the MongoDB download center. We have to
download and install it separately before use. We have already covered it earlier.

mongosh offers numerous advantages over the legacy mongo shell:
o Improved syntax highlighting.
o Improved command history.
o Improved logging.

Currently mongosh supports a subset of the mongo shell methods. Achieving feature parity
between mongosh and the mongo shell is an ongoing effort.

To maintain backwards compatibility, the methods that mongosh supports use the same syntax as the
corresponding methods in the mongo shell.

We can use the MongoDB Shell to connect to MongoDB version 4.0 or greater.

Prerequisites

 The MongoDB server must be installed and running before you can connect to it from
the mongo shell or mongoDB shell.
 Once you have verified that the mongod server is running, open a terminal window (or a
command prompt for Windows) and run mongo or mongosh.

Connecting MongoDB and run mongo shell (Legacy)

To connect the mongo shell to a Mongo server, follow these steps:

 Run mongod.exe from a command prompt as shown below. Don’t close the command window.
 Run mongo.exe from another command prompt to execute mongo shell. In the prompt we can
type shell command to execute.

You should start MongoDB before starting the shell because the shell automatically attempts to connect
to a MongoDB server on startup. As the shell is a full-featured JavaScript interpreter, it is capable of
running arbitrary JavaScript programs.



Using mongosh
We will discuss how to use mongosh for manipulating databases.

As mongosh is built on top of Node.js the entire Node.js API is available inside mongosh. This is a big
step forward from the legacy mongo shell, where the API available to developers was a limited
JavaScript subset. We can customize mongosh to suit developer needs like any other modern tool.

Use an Editor for Commands

The mongosh console is line oriented. However, you can also use an editor to work with multiline
functions. There are two options:
1. Use the edit command with an external editor.
2. Use the .editor command, a built-in editor.
Here we will discuss only how to use the built-in editor.

Connecting MongoDB and run mongoDB shell

To connect the MongoDB Shell to a Mongo server, follow these steps:


 Run mongod.exe from a command prompt as shown below. Don’t close the command window.
 Run mongosh from another command prompt to execute the MongoDB shell. In the prompt we can
type shell commands to execute.
 In command prompt when we enter mongosh without any option, it will try to connect to a
MongoDB instance running on localhost with default port 27017. This is equivalent to the
command mongosh "mongodb://localhost:27017"
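As a sketch, to connect to a deployment on a different host or port, pass the connection string explicitly (the host name and port here are purely illustrative):

mongosh "mongodb://db.example.com:28015"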

Execute javascript and MongoDB Commands


Using in-built editor

 Run the .editor command to execute multi-line commands. Press Ctrl+d to run a command
or Ctrl+c to cancel.
 MongoDB shell is JavaScript and Node.js REPL, so you can also execute limited JavaScript code.

In the editor, after entering the command we press Ctrl+D to execute it, and the result is shown on the next line. Press Ctrl+C twice to exit from the MongoDB shell.
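A small sketch of an .editor session (the function and values are purely illustrative):

test> .editor
// Entering editor mode (Ctrl+D to finish, Ctrl+C to cancel)
function square(n) { return n * n; }
square(12)
// press Ctrl+D to execute
144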
Let us take a simple mathematical program:

test> x = 100
100
test> x / 5;
20

You can also use the JavaScript libraries:

test> "Hello, World!".replace("World", "MongoDB");
Hello, MongoDB!

We can even define and call JavaScript functions; we can also use the editor to execute them.
test> function factorial (n) {
... if (n <= 1) return 1;
... return n * factorial(n - 1);
... }
test> factorial (5);
120

List of databases

To list available databases use the show dbs command. Use the db command to check the current database:

test> db
test



Switching database

The command use databaseName can be used to switch to a database:

test> use authDB
switched to db authDB
authDB>

Now the prompt will change from test to authDB.

Querying a collection

To find the documents in the collection "users" in the database authDB issue the command
db.getCollection("users").find(). The db.collection.find() method returns a cursor to the results.



(Screenshot: the image field in the returned documents is displayed as a binary buffer.)

cursor.pretty()

It configures the cursor to display results in a format that is easy to read. The pretty() method has the
prototype form: db.collection.find(<query>).pretty()

The pretty() method:
 Does not change the output format in mongosh.
 Changes the output format in the legacy mongo shell.

Note : In most cases, mongosh methods work the same way as the legacy mongo shell methods.
However, some legacy methods are unavailable in mongosh.



Examples : Consider the following document:

db.books.save({
"_id" : ObjectId("54f612b6029b47909a90ce8d"),
"title" : "A Tale of Two Cities",
"text" : "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age
of foolishness...",
"authorship" : "Charles Dickens"})

By default, db.collection.find() returns data in a dense format:


db.books.find()
{ "_id" : ObjectId("54f612b6029b47909a90ce8d"), "title" : "A Tale of Two Cities", "text" : "It was the
best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness...",
"authorship" : "Charles Dickens" }

By using cursor.pretty() you can set the cursor to return data in a format that is easier to read:
db.books.find().pretty()
{
"_id" : ObjectId("54f612b6029b47909a90ce8d"),
"title" : "A Tale of Two Cities",
"text" : "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of
foolishness...",
"authorship" : "Charles Dickens"
}
Inserting a document : Use the db.collection.insertOne(theDocument) command (the original screenshot showed the command and how to check that the inserted document was added to the collection).
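A minimal sketch of what that screenshot likely showed (the collection and document fields are assumptions):

test> db.books.insertOne({ title: "Oliver Twist", authorship: "Charles Dickens" })
// returns { acknowledged: true, insertedId: ObjectId(...) }
test> db.books.find()    // verify that the inserted document was added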



We can specify a different _id field value in one or more documents, as shown below.

db.employees.insert([
   {
      firstName: "John",
      lastName: "King",
      email: "john.king@abc.com"
   },
   {
      _id: 1,
      firstName: "Sachin",
      lastName: "T",
      email: "sachin.t@abc.com"
   },
   {
      firstName: "James",
      lastName: "Bond",
      email: "jamesb@abc.com"
   }
])

Output:
{
   acknowledged: true,
   insertedIds: {
      '0': ObjectId("616d63eda861820797edd9b3"),
      '1': 1,
      '2': ObjectId("616d63eda861820797edd9b5")
   }
}

Note : By default, the insert() method performs ordered inserts. So, if an error occurred in any of the
documents, then it won't process the remaining documents.

MongoDB Shell Collection Methods


We can run aggregation pipelines on your collections using the MongoDB Shell. Aggregation pipelines
transform your documents into aggregated results based on selected pipeline stages.

Common uses for aggregation include:


 Grouping data by a given expression.
 Calculating results based on multiple fields and storing those results in a new field.
 Filtering data to return a subset that matches a given criteria.
 Sorting data.

When you run an aggregation, MongoDB Shell outputs the results directly to the terminal.

Following are the MongoDB collection methods that are used in different scenarios.

1. db.collection.aggregate(pipeline, option)

The aggregate method calculates aggregate values for the data in a collection or a view and returns
computed results. It collects values from various documents, groups them together, and then performs
different types of operations on the grouped data, like sum, average, minimum, maximum, etc., to
return a computed result. It is similar to the aggregate functions of SQL.



MongoDB provides three ways to perform aggregation:
 Aggregation pipeline
 Map-reduce function
 Single-purpose aggregation
We will discuss the aggregation pipeline operators in detail later on.

Parameter | Type | Description
pipeline | array | A sequence of data aggregation operations or stages. The method can still accept the pipeline stages as separate arguments instead of as elements in an array; however, if we do not specify the pipeline as an array, we cannot specify the options parameter.
options | document | Optional. Additional options that aggregate() passes to the aggregate command. Available only if you specify the pipeline as an array.

Aggregation pipeline

The aggregation pipeline consists of stages and each stage transforms the documents. In other
words, the aggregation pipeline is a multi-stage pipeline, so in each stage:
 the documents are taken as input and produce a resultant set of documents
 in the next stage (if available) the resultant documents are taken as input and produce output
 this process goes on till the last stage.

Let us discuss the aggregation pipeline with the help of an example; a sketch of the command appears after the sample data below. (The original figure showed the command and a pictorial representation of the aggregation stages, annotating that $-prefixed names refer to fields of the input documents.) In the example:

 Aggregation is applied on a collection of train fares


 The $match stage filters the documents by the value in class field i.e. class: “first-class” and passes
the document to the second stage.
 In the second stage, the $group stage groups the documents by the id field to calculate the sum of
fare for each unique id. Here id is a field of the document, not the _id field (which is the unique
key field). In $group, _id: "$id" indicates grouping by the id field.



(The original figure listed the documents in the train collection.)
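A minimal sketch of the aggregate command being described, reconstructed from the stage descriptions above (collection and field names taken from the text):

db.train.aggregate([
   // Stage 1: keep only the first-class fare documents
   { $match: { class: "first-class" } },
   // Stage 2: group by the id field and total the fare per unique id
   { $group: { _id: "$id", total: { $sum: "$fare" } } }
])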

Stages: Each stage starts with a stage operator. The stage operators are:

 $match: It is used for filtering the documents; it can reduce the number of documents that are
given as input to the next stage.
 $project: It is used to select some specific fields from a collection.
 $group: It is used to group documents based on some value.
 $sort: It is used to sort the document that is rearranging them
 $skip: It is used to skip n number of documents and passes the remaining documents
 $limit: It is used to pass first n number of documents thus limiting them.
 $unwind: It is used to unwind documents that are using arrays i.e. it deconstructs an array field
in the documents to return documents for each element.
 $out: It is used to write resulting documents to a new collection

Expressions: An expression refers to the name of a field in the input documents, e.g. in { $group : { _id : "$id",
total: { $sum: "$fare" } } }, $id and $fare are expressions.

Accumulators: These are basically used in the group stage
 $sum: It sums numeric values for the documents in each group
 $count: It counts the total number of documents
 $avg: It calculates the average of all given values from all documents
 $min: It gets the minimum value from all the documents
 $max: It gets the maximum value from all the documents
 $first: It gets the first document from the grouping
 $last: It gets the last document from the grouping

Note:
 In $group, _id is a mandatory field.
 $out must be the last stage in the pipeline.
 $sum: 1 will count the number of documents and $sum: "$fare" will give the sum of the total fare
generated per id.

2. db.collection.bulkWrite()

MongoDB provides clients the ability to perform write operations in bulk. Bulk write operations affect
a single collection. Array of write operations are executed by this operation. Operations are executed
in a specific order by default.

MongoDB allows applications to determine the acceptable level of acknowledgement required for bulk
write operations.
The db.collection.bulkWrite() method provides the ability to perform bulk insert, update, and remove
operations. MongoDB also supports bulk insert through db.collection.insertMany().

Syntax:
db.collection.bulkWrite(
   [ <operation 1>, <operation 2>, ... ],
   { writeConcern : <document>, ordered : <boolean> }
)
Ordered vs Unordered Operations

Bulk write operations can be either ordered or unordered.

With an ordered list of operations, MongoDB executes the operations serially. If an error occurs during
the processing of one of the write operations, MongoDB will return without processing any remaining
write operations in the list.

With an unordered list of operations, MongoDB can execute the operations in parallel, but this
behavior is not guaranteed. If an error occurs during the processing of one of the write operations,
MongoDB will continue to process remaining write operations in the list.

By default, bulkWrite() performs ordered operations. To specify unordered write operations,
set ordered : false in the options document.
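As a sketch, the same call shape with unordered execution (the operations and values are illustrative only):

db.pets.bulkWrite(
   [
      { insertOne: { "document": { "_id": 7, "name": "Hiss", "type": "Snake" } } },
      { deleteOne: { "filter": { "_id": 3 } } }
   ],
   { ordered: false }   // keep processing the remaining operations even if one fails
)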

bulkWrite() supports the following write operations:
 insertOne
 updateOne
 updateMany
 replaceOne
 deleteOne
 deleteMany

Each write operation is passed to bulkWrite() as a document in an array.



Example

Here’s an example of using db.collection.bulkWrite() to perform a bulk write operation against a
collection called pets. Suppose we insert the following documents into the collection:

db.pets.insertMany([
{ _id: 1, name: "Wag", type: "Dog", weight: 20 },
{ _id: 2, name: "Bark", type: "Dog", weight: 10 },
{ _id: 3, name: "Meow", type: "Cat" },
{ _id: 4, name: "Scratch", type: "Cat" },
{ _id: 5, name: "Bruce", type: "Bat" }
])
We can now use db.collection.bulkWrite() to perform a bulk write operation against that collection.

db.pets.bulkWrite([
   { insertOne: { "document": { "_id": 6, "name": "Bubbles", "type": "Fish" } } },
   { updateOne: {
      "filter": { "_id": 2 },
      "update": { $set: { "weight": 15 } }
   } },
   { deleteOne: { "filter": { "_id": 5 } } },
   { replaceOne: {
      "filter": { "_id": 4 },
      "replacement": { "name": "Bite", "type": "Dog", "weight": 5 }
   } }
])

In this case, we inserted one document, updated another document, deleted another, and replaced
another document.

The db.collection.bulkWrite() method returns the following:
 A boolean acknowledged as true if the operation ran with write concern or false if write concern was disabled.
 A count for each write operation.
 An array containing an _id for each successfully inserted or upserted document.

Result:
{
   "acknowledged" : true,
   "deletedCount" : 1,
   "insertedCount" : 1,
   "matchedCount" : 2,
   "upsertedCount" : 0,
   "insertedIds" : { "0" : 6 },
   "upsertedIds" : { }
}



View the Result

Now let’s take a look at the documents in the collection again.

db.pets.find()

Result:
{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 15 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat" }
{ "_id" : 4, "name" : "Bite", "type" : "Dog", "weight" : 5 }
{ "_id" : 6, "name" : "Bubbles", "type" : "Fish" }

We can see that all the changes were made as specified.

Another example

Following performs multiple write operations. The characters collection contains the following
documents:

{ "_id" : 1, "char" : "Brisbane", "class" : "monk", "lvl" : 4 },


{ "_id" : 2, "char" : "Eldon", "class" : "alchemist", "lvl" : 3 },
{ "_id" : 3, "char" : "Meldane", "class" : "ranger", "lvl" : 3 }

The following bulkWrite() performs multiple operations on the collection:

try {
db.characters.bulkWrite(
[
{ insertOne : { "document" : { "_id" : 4, "char" : "Dithras", "class" : "barbarian", "lvl" : 4 } } },
{ insertOne : { "document" : { "_id" : 5, "char" : "Taeln", "class" : "fighter", "lvl" : 3 } } },
{ updateOne : { "filter" : { "char" : "Eldon" },
"update" : { $set : { "status" : "Critical Injury" } } }
},
{ deleteOne : { "filter" : { "char" : "Brisbane" } } },
{ replaceOne :
{
"filter" : { "char" : "Meldane" },
"replacement" : { "char" : "Tanys", "class" : "oracle", "lvl" : 4 }
}
}
]
);
}
catch (e) { print(e); }

The operation returns the following:


{
"acknowledged" : true,
"deletedCount" : 1,
"insertedCount" : 2,
"matchedCount" : 2,
"upsertedCount" : 0,
"insertedIds" : {
"0" : 4,
"1" : 5
},
"upsertedIds" : {

}
}
Some MongoDB Shell Collection Methods and their syntax used in bulkWrite()

insertOne: It inserts only one document into the collection.

db.collection.bulkWrite( [
{ insertOne : { "document" : <document> } }
])

updateOne: It updates only one document that matches the filter in the collection.

db.collection.bulkWrite( [
{ updateOne :
{
"filter": <document>,
"update": <document or pipeline>,
"upsert": <boolean>,
"collation": <document>,
"arrayFilters": [ <filterdocument1>, ... ],
"hint": <document|string>
}
}
])

updateMany: It updates all the filter matched documents in the collection.

db.collection.bulkWrite( [
{ updateMany :{
"filter" : <doc.>,
"update" : <document or pipeline>,
"upsert" : <Boolean>,
"collation": <document>,

Prepared by Kamal Podder Page 13


"arrayFilters": [ <filterdocument1>, ... ],
"hint": <document|string> // Available starting in 4.2.1
}
}
])
replaceOne: It replaces a single document in the collection that matches the filter.

db.collection.bulkWrite([
{ replaceOne :
{
"filter" : <doc.>,
"replacement" : <doc.>,
"upsert" : <boolean>,
"collation": <document>,
"hint": <document|string>
}
}
])

3. db.collection.count(query, option)

The db.collection.count() method is used to return the count of documents that would match a find()
query. The db.collection.count() method does not perform the find() operation but instead counts and
returns the number of results that match a query.

Parameter - query: The query selection criteria. Type: document. Required.

Note:
 This method is equivalent to db.collection.find().count().
 You cannot use this method in transactions.
 On a sharded cluster, if you use this method without a query predicate, then it will return an
inaccurate count if orphaned documents exist or if a chunk migration is in progress. To avoid
such a situation, use the db.collection.aggregate() method, as sketched below.
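A minimal sketch of such an accurate count using the $count aggregation stage:

db.restaurants.aggregate([ { $count: "total" } ])
// e.g. returns { "total" : 25359 }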

Consider a Sample document in the restaurants collection:


{
"address": {
"building": "1007",
"coord": [ -73.856077, 40.848447 ],
"street": "Morris Park Ave",
"zipcode": "10462"
},
"borough": "Bronx",
"cuisine": "Bakery",
"grades": [
{ "date": { "$date": 1393804800000 }, "grade": "A", "score": 2 },
{ "date": { "$date": 1378857600000 }, "grade": "A", "score": 6 },
{ "date": { "$date": 1358985600000 }, "grade": "A", "score": 10 },
{ "date": { "$date": 1322006400000 }, "grade": "A", "score": 9 },
{ "date": { "$date": 1299715200000 }, "grade": "B", "score": 14 }
],
"name": "Morris Park Bake Shop",
"restaurant_id": "30075445"
}
.....
Example: Count all documents from the collection

Count the number of the documents in the restaurants collection.

db.restaurants.count();

Output :
> db.restaurants.count();
25359
Example : Count all Documents that Match a Query

Count the number of the documents in the restaurants collection with the field cuisine matching "American":

db.restaurants.find({"cuisine" : "American "}).count()

Output :
> db.restaurants.find({"cuisine" : "American "}).count();
6183

Example : Count all Documents that Match a Query using more than one criterion

Count the number of the documents in the collection restaurants filtering with the field cuisine equal
to Italian and zipcode equal to 10075:
db.restaurants.find( { "cuisine": "Italian", "address.zipcode": "10075" } ).count();

Output:

> db.restaurants.find( { "cuisine": "Italian", "address.zipcode": "10075" } ).count();


15

4. db.collection.countDocuments(query, options)

The countDocuments() method returns the number of documents that match the query for a collection
or view. It does not use the metadata to return the count.

Only countDocuments() returns the actual count of the documents; the other methods return counts
based upon the collection's metadata.



Note :

 The db.collection.find method returns a cursor. The cursor.count() method on the cursor counts
the number of documents referenced by the cursor. This is the same as db.collection.count().
 Both these methods (cursor.count() and db.collection.count()) are deprecated as of MongoDB
v4.0 in favor of the new APIs countDocuments() and estimatedDocumentCount().
 Avoid using the db.collection.count() method without a query predicate since, without the query
predicate, the method returns results based on the collection’s metadata, which may result in an
approximate count.
 db.collection.countDocuments(query) returns the count of documents that match the query for a
collection or view. This is the method you need to use to count the number of documents in your
collection.
 Most of the time all of the above return the exact same thing. Only countDocuments() returns
the actual count of the documents; the other methods return counts based upon the collection's
metadata.

Syntax: db.collection.countDocuments( <query>, <options> )

Examples

Count all Documents in a Collection


To count the number of all documents in the orders collection, use the following operation:
db.orders.countDocuments({})

Count all Documents that Match a Query


Count the number of the documents in the orders collection with the field ord_dt greater than new
Date('01/01/2012'):
db.orders.countDocuments( { ord_dt: { $gt: new Date('01/01/2012') } }, { limit: 100 } )

5. db.collection.createIndex()

It can create the indexes on collections

Syntax: db.collection.createIndex(keys, options)

Keys: For an ascending index on a field we need to specify a value of 1 and for the descending index
we need to specify a value of -1.

Example

The example below creates an ascending index on the field tut_Date.

db.collection.createIndex( { tut_Date: 1 } )



The following example shows a compound index created on the tut_Date field and the tut_code field.

db.collection.createIndex( { tut_Date: 1, tut_code: -1 } )

The example below will create an index named category_tutorial. The example creates the index
with a collation that specifies the locale fr and comparison strength 2.

db.collection.createIndex(
{ category: 1 },
{ name: "category_tutorial", collation: { locale: "fr", strength: 2 } }
)

6. db.collection.createIndexes()

The createIndexes() method creates one or more indexes on a collection. It is used to create one or
more indexes based on the field of the document. If the index is already created or exists then this
method does not recreate the existing index.

Syntax: db.collection.createIndexes( [keyPatterns, ]options)

Keypatterns: It is an array that contains the index specification documents. Each document has field-
value pairs, where the field is the index key and the value describes the type of index for that field.
For an ascending index on a field we need to specify a value of 1 and for a descending index we
need to specify a value of -1.

Example

In the example below we consider an employee collection and create an index on the field Employid
(the original screenshot showed the command and its output).
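A minimal sketch of what that command likely looked like (the field name is taken from the text above):

db.employee.createIndexes([ { Employid: 1 } ])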

Now, the example below creates two indexes on the products collection:
 Index on the manufacturer field in ascending order.
 Index on the category field in ascending order.
It uses a collation which specifies the locale fr and comparison strength 2.

db.products.createIndexes(
   [ { "manufacturer": 1 }, { "category": 1 } ],
   { collation: { locale: "fr", strength: 2 } }
)

Create an ascending index on the single field name:
db.employee.createIndexes([{name:1}])

Create a descending index on the single field name:
db.employee.createIndexes([{name:-1}])

Create an index on multiple fields: ascending index on name, and descending index on department:

db.employee.createIndexes([{name:1,department:-1}])

Here, we create the indexes on multiple fields, i.e. an ascending index on the name field and a
descending index on the department field.

Creates the ascending unique indexes on the joinYear field:

db.employee.createIndexes([{joinYear:1}],{unique:true})

Here, we create the ascending unique indexes on the joinYear field by setting the value of the unique
parameter to true.



Data Modelling in MongoDB
Data modeling is the process of defining how data is stored and what relationships exist between
different entities in our data. Data modeling aims to visually represent the relationship between
different entities in data.

Introduction
 In SQL databases you must determine and declare a table’s schema before inserting data. But
MongoDB supports flexible schema, which means there is no need of defining a structure for the
data before insertion.
 MongoDB is a document based database. A set of documents form a collection. Any document
within the same collection is not mandatory to have same set of fields or structure and the data
type for a field can differ across documents within a collection.
 To change the structure of the documents in a collection, such as add new fields, remove existing
fields, or change the field values to a new type, we simply update the documents to the new
structure.
This flexibility facilitates easy mapping of documents to an entity or an object. The documents in a
collection may have substantial variation in structure but each one will match the data fields of the
represented entity. In practice, however, the documents in a collection share a similar structure. This
will give us a better chance to set some document validation rules as a way of improving data integrity
during insert and update operations.

When designing data models we should always consider two things :

 Application-specific data access patterns (i.e. queries, updates, and processing of the data).
Finding out the questions that our users will have is paramount to designing our entities.
 The inherent structure of the data itself.

 Document Structure

The key decision in designing data models for MongoDB applications revolves around the structure of
documents and how the application represents relationships between data. There are 2 ways in which
the relationships between the data can be established in MongoDB:
1. Embedded Documents 2. Reference Documents

Embedded Documents

Embedded documents capture relationships between data by storing related data in a single document
structure. MongoDB documents make it possible to embed document structures in a field or array
within a document. These denormalized data models allow applications to retrieve and manipulate
related data in a single database operation.

Consider 2 collections student and address. Let us see how the address can be embedded into the
student collection.

student collection:
{
   _id: 123,
   name: "Student1"
}

address collection (contains the addresses of the students):
{
   _studentId: 123,
   street: "123 Street",
   city: "Bangalore",
   state: "KA"
}
{
   _studentId: 123,
   street: "456 Street",
   city: "Punjab",
   state: "HR"
}

Embedding multiple addresses can also be done. See the below example:
{
   _id: 123,
   name: "Student1",
   addresses: [
      {
         street: "123 Street",
         city: "Bangalore",
         state: "KA"
      },
      {
         street: "456 Street",
         city: "Punjab",
         state: "HR"
      }
   ]
}

With MongoDB, you may embed related data in a single structure or document. These schemas are
generally known as "denormalized" models, and take advantage of MongoDB's rich documents. As a
result, applications may need to issue fewer queries and updates to complete common operations.

In general, use embedded data models when:

 We have “contains” relationships between entities.


 We have one-to-many relationships between entities. In these relationships the “many” or child
documents always appear with or are viewed in the context of the “one” or parent documents.

Strength of embedding

 We can retrieve all relevant information in a single query. So better performance for read
operations
 Avoid implementing joins in application code or using $lookup
 Update related information as a single atomic write operation. By default, all CRUD operations on a
single document are ACID compliant



Note: Documents in MongoDB must be smaller than the maximum BSON document size.

Weaknesses of Embedding

 Restricted document size. All documents in MongoDB are constrained to the BSON size of 16
megabytes. Therefore, overall document size together with embedded data should not surpass this
limit. Otherwise, for some storage engines such as MMAPv1, data may outgrow and result in data
fragmentation as a result of degraded write performance.
 Large documents mean more overhead if most fields are not relevant. You can increase query
performance by limiting the size of the documents that you are sending over the wire for each
query.
 Data duplication: multiple copies of the same data make it harder to query the replicated data and
it may take longer to filter embedded documents, hence outdo the core advantage of embedding.

Referenced Documents

References store the relationships between data by including links or references from one document to
another. Applications can resolve these references to access the related data. Broadly, these
are normalized data models.

This is one of the ways to implement the relationship between data stored in different collections. In
this, a reference to the data in one collection will be used in connecting the data between the
collections. Consider the 2 collections books and publishers as shown below:

{
   title: "Java in action",
   author: "author1",
   language: "English",
   publisher: {
      name: "My publications",
      founded: 1990,
      location: "SF"
   }
}
{
   title: "Hibernate in action",
   author: "author2",
   language: "English",
   publisher: {
      name: "My publications",
      founded: 1990,
      location: "SF"
   }
}

In this example, the publisher data is repeated. In order to avoid this repetition, we can add references
of the books to the publisher data instead of using the entire data of the publisher in every book entry,
as shown below. This can be done the other way round as well, where one can reference the publisher
id in the books data; your choice.

{
   name: "My Publications",
   founded: 1980,
   location: "CA",
   books: [ 111222333, 444555666, ... ]
}
{
   _id: 111222333,
   title: "Java in action",
   author: "author1",
   language: "English"
}
{
   _id: 444555666,
   title: "Hibernate in action",
   author: "author2",
   language: "English"
}

Normalized data models describe relationships using references between documents.

In general, use normalized data models:

 when embedding would result in duplication of data but would not provide sufficient read
performance advantages to outweigh the implications of the duplication.
 to represent more complex many-to-many relationships.
 to model large hierarchical data sets.

Strengths of Referencing

 Data consistency is better. Same piece of information is not repeated as embedded document in
various other documents (say same author in various books). Hence chances of data inconsistency
are pretty low.
 Improved data integrity. Due to normalization, it is easy to update data regardless of operation
duration length and therefore ensure correct data for every document without causing any
confusion. For updating an author we need not update author in several book documents.
 By splitting up data, we will have smaller documents. Less likely to reach 16-MB-per-document
limit
 Improved cache utilization. Canonical documents that are accessed frequently are stored in the
cache, unlike embedded copies, each of which is accessed only a few times.
 Improved flexibility especially with a large set of subdocuments. Infrequently accessed
information not needed on every query.
 Faster writes.

Weaknesses of Referencing

 Multiple lookups: Since we have to look in a number of documents that match criteria there is
increased read time when retrieving from disk. Besides, this may result into cache misses.
 Many queries are issued to achieve some operation hence normalized data models require more
round trips to the server to complete a specific operation.
Example of Data Modelling in MongoDB

Let us consider a simple example of building a student database in a college. Assume there are 3
models – Student, Address and Course. In a typical RDBMS database, these 3 models will be translated
into 3 tables (the original figure showed the three tables). Hence, from this model, if a student's details
have to be added, then entries should be made in all the 3 tables.

Let us see how the same data can be modelled in MongoDB. The schema design will have only one
collection Student, with the structure shown below. Data related to all the 3 models will be kept under
one collection:

{
   _id: 123,
   firstName: 'Test',
   lastName: 'Student',
   address: [{
      City: 'Bangalore',
      State: 'Karnataka',
      Country: 'India'
   }],
   Course: 'MCA'
}

NOTE : Field names in a collection like firstName and lastName etc. in the above examples also use
memory, maybe 10 to 20 bytes or so. But when the dataset is very large, this can add up to a lot of
memory. Hence it is advised, for large datasets, to use short field names to store data in collections,
like fname instead of firstName.
Embedded Vs References
Effective data models support your application needs. The key consideration for the structure of your
documents is the decision to embed or to use references.

Embedded documents, generally known as denormalized models:
o Allow applications to store related pieces of information in the same document.
o Use to represent relationships between entities (one-to-one or one-to-many relationships).
o Provide better performance for read operations, as well as the ability to request and retrieve
related data in a single database operation.
o Make it possible to update related data in a single atomic write operation.
o However, embedding related data in documents may lead to situations where documents grow
after creation. With the MMAPv1 storage engine, document growth can impact write performance
and lead to data fragmentation.

Referenced documents, known as normalized data models. In general, use normalized data models:
o When embedding would result in duplication of data but would not provide sufficient read
performance advantages.
o To represent more complex many-to-many relationships.
o To model large hierarchical data sets.
o References provide more flexibility than embedding. However, client-side applications must issue
follow-up queries to resolve the references. In other words, normalized data models can require
more round trips to the server.

Schema Design Approaches – Relational vs. MongoDB


"How do I model a schema for my application?"

 Proper MongoDB schema design is the most critical part of deploying a scalable, fast, and
affordable database. It's one of the most common questions developers have pertaining to
MongoDB. And the answer is, it depends. This is because document databases have a rich
vocabulary that is capable of expressing data relationships in more ways than SQL.

 There are many things to consider when picking a schema:
o Is your app read or write heavy?
o What data is frequently accessed together?
o What are your performance considerations?
o How will your data set grow and scale?

Comparison between designing a MongoDB schema and an RDBMS schema is natural as most
developers come from an RDBMS background. So, let's see how these two design patterns differ.
Relational Schema Design

In RDBMS, developers model their schema independent of queries. They use prescribed approaches,
and then apply normalization.

In this example (the original figure showed the tables), you can see that the user data is split into
separate tables and it can be JOINED together using foreign keys in the user_id column of the
Professions and Cars tables. Now, let's take a look at how we might model this same data in MongoDB.



MongoDB Schema Design

Now, MongoDB schema design works a lot differently than relational schema design. With MongoDB
schema design, there is:
a) No formal process b) No algorithms c) No rules

The only thing that matters is that you design a schema that will work well for your application. Two
different apps that use the same exact data might have very different schemas if the applications are
used differently. When designing a schema, we want to take into consideration the following:
 Efficient storing the data
 Provide good query performance
 Require reasonable amount of hardware

Let's take a look at how we might model the relational User model in MongoDB.
{
   "first_name": "Paul",
   "surname": "Miller",
   "cell": "447557505611",
   "city": "London",
   "location": [45.123, 47.232],
   "profession": ["banking", "finance", "trader"],
   "cars": [
      { "model": "Bentley", "year": 1973 },
      { "model": "Rolls Royce", "year": 1965 }
   ]
}

Here instead of splitting our data up into separate collections or documents, we take advantage of
MongoDB's document based design to embed data into arrays and objects within the User object. Now
we can make one simple query to pull all that data together for our application.

Type of Relationships
Now we will discuss some interesting patterns and relationships and how we model them with real-
world examples.

The most important consideration we make for our schema is how the data is going to be used by the
system. So, for the exact same data as in the examples listed below, we might have a completely
different schema than the one outlined here. In each example, we will outline the requirements for the
application and why a given schema was used for that example.

In this process we are going to establish a couple of handy rules to help our schema design.

1. One-to-One

Let's take a look at our User document. This example has some one-to-one data in it. For example,
here one user can only have one name. So, this would be an example of a one-to-one relationship. We
can model all one-to-one data as key-value pairs in our database.

{
  "_id": "ObjectId('AAA')",
  "name": "Joe Karlsson",
  "company": "MongoDB",
  "twitter": "@JoeKarlsson1",
  "twitch": "joe_karlsson",
  "tiktok": "joekarlsson",
  "website": "joekarlsson.com"
}

We should prefer key-value pairs embedded in the document. For example, an employee can work in
one and only one department.
Subset Pattern : A potential problem with the embedded document pattern is that it can lead to large
documents that contain fields that the application does not need. This unnecessary data can cause
extra load on the server and slow down read operations. Instead, you can use the subset pattern to
retrieve the subset of data which is accessed the most frequently in a single database call.

Consider an application that shows information on movies. The database contains a movie collection
with the following schema:
{
  "_id": 1,
  "title": "The Arrival of a Train", "year": 1896, "runtime": 1,
  "released": ISODate("1896-01-25"),
  "poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
  "plot": "A group of people are standing in a straight line along the platform of a railway station,
           waiting for a train, which is seen coming at some distance. When the train stops at the
           platform, ...",
  "fullplot": "A group of people are standing in a straight line along the platform of a railway station,
           waiting for a train, which is seen coming at some distance. When the train stops at the
           platform, the line dissolves. The doors of the railway-cars open, and people on the platform
           help passengers to get off.",
  "lastupdated": ISODate("2015-08-15T10:06:53"),
  "type": "movie",
  "directors": [ "Auguste Lumière", "Louis Lumière" ],
  "imdb": {
    "rating": 7.3,
    "votes": 5043,
    "id": 12
  },
  "countries": [ "France" ],
  "genres": [ "Documentary", "Short" ],
  "tomatoes": {
    "viewer": {
      "rating": 3.7,
      "numReviews": 59
    },
    "lastUpdated": ISODate("2020-01-09T00:02:53")
  }
}

The movie collection contains several fields that the application does not need to show a simple
overview of a movie, such as fullplot and rating information.
Instead of storing all of the movie data in a single collection, we can split the collection into two
collections as shown below :

The movie collection contains basic information on a movie. This is the data that the application loads
by default:
// movie collection
{
"_id": 1,
"title": "The Arrival of a Train", "year": 1896, "runtime": 1,
"released": ISODate("1896-01-25"),
"type": "movie",
"directors": [ "Auguste Lumière", "Louis Lumière" ],
"countries": [ "France" ],
"genres": [ "Documentary", "Short" ],
}
The movie_details collection contains additional, less frequently-accessed data for each movie:

// movie_details collection
{
  "_id": 156,
  "movie_id": 1, // reference to the movie collection
  "poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
  "plot": "A group of people are standing in a straight line along the platform of a railway station,
           waiting for a train, which is seen coming at some distance. When the ….",
  "fullplot": "A group of people are standing in a straight line along the platform of a railway station,
           waiting for a train, which is seen coming at some distance. When the train stops at the
           platform, the line dissolves. The doors of the railway-cars open, and people….",
  "lastupdated": ISODate("2015-08-15T10:06:53"),
  "imdb": {
    "rating": 7.3,
    "votes": 5043,
    "id": 12
  },
  "tomatoes": {
    "viewer": {
      "rating": 3.7,
      "numReviews": 59
    },
    "lastUpdated": ISODate("2020-01-29T00:02:53")
  }
}

This method improves read performance because it requires the application to read less data to fulfill
its most common request. The application can make an additional database call to fetch the less-
frequently accessed data if needed, as the sketch below shows.

TIP: When considering where to split your data, the most frequently-accessed portion of the data
should go in the collection that the application loads first.
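A minimal sketch of this two-call access pattern in the mongo shell, using the movie and
movie_details collections above:

// Load the overview first; this satisfies the most common request.
var movie = db.movie.findOne( { _id: 1 } );

// Only when the full details are actually needed, issue a second call.
var details = db.movie_details.findOne( { movie_id: movie._id } );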
Trade-Offs of the Subset Pattern

Smaller documents result in improved read performance and make more memory available for the
application. However, it is important to understand your application and the way it loads data. If you
split your data into multiple collections improperly, your application will often need to make multiple
trips to the database and rely on join-like operations to retrieve all of the data that it needs.

In addition, splitting your data into many small collections may increase required database
maintenance, as it may become difficult to track what data is stored in which collection.
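MongoDB expresses such a join with the $lookup aggregation stage; a minimal sketch, again using the
movie/movie_details split from above:

// Join the details back onto the overview in a single round trip.
db.movie.aggregate( [
  { $match: { _id: 1 } },
  { $lookup: {
      from: "movie_details",      // the secondary collection
      localField: "_id",          // movie._id ...
      foreignField: "movie_id",   // ... matches movie_details.movie_id
      as: "details"               // joined documents land in this array
  } }
] )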

2. One-to-Many

While the most common way to represent a one-to-one relationship in a document database is
through an embedded document, there are several ways to model one-to-many relationships in a
document schema. When considering your options for how to best model these, though, there are
three properties of the given relationship you should consider:
 Cardinality: Cardinality is the measure of the number of individual elements in a given set. For
example, if a class has 30 students, you could say that class has a cardinality of 30. In a one-to-
many relationship, the size of “many” will affect how you might model the data.
 Independent access: Some related data will rarely, if ever, be accessed separately from the main
object. Whether or not we will ever access a related document alone will also affect how we might
model the data.
 Whether the relationship between data is strictly a one-to-many relationship: For example
consider the courses a student attends at a university. From the student’s perspective, they can
participate in multiple courses, so it may seem like a one-to-many relationship. However, university
courses are rarely attended by a single student; more often, multiple students will attend the same
class. In cases like this, the relationship in question is not really a one-to-many relationship, but a
many-to-many relationship, and thus you’d take a different approach to model this relationship
than you would a one-to-many relationship.

a) One-to-Few (a one-to-many relationship where the "many" side holds only a few documents)

For example, we might need to store several addresses associated with a given user. It's unlikely that a
user for our application would have more than a couple of different addresses. For relationships like
this, we would define this as a one-to-few relationship.
{
  "_id": "ObjectId('AAA')",
  "name": "Joe Karlsson",
  "company": "MongoDB",
  "twitter": "@JoeKarlsson1",
  "twitch": "joe_karlsson",
  "tiktok": "joekarlsson",
  "website": "joekarlsson.com",
  "addresses": [
    { "street": "123 Sesame St", "city": "Anytown", "cc": "USA" },
    { "street": "123 Avenue Q", "city": "New York", "cc": "USA" }
  ]
}

Prefer embedding for one-to-few relationships. Generally speaking, the default action is to embed data
within a document; we pull it out and reference it only if we need to access it on its own, it's too big, or
there is some other compelling reason.

Rule 1: Favor embedding unless there is a compelling reason not to.

Subset Pattern

A potential problem with the embedded document pattern is that it can lead to large documents. In
this case, you can use the subset pattern to only access data which is required by the application,
instead of the entire set of embedded data. Consider an e-commerce site that has a list of reviews for a
product:
{
  "_id": 1,
  "name": "Super Widget",
  "description": "This is the most useful item in your toolbox.",
  "price": { "value": NumberDecimal("119.99"), "currency": "USD" },
  "reviews": [
    {
      "review_id": 786,
      "review_author": "Kristina",
      "review_text": "This is indeed an amazing widget.",
      "published_date": ISODate("2019-02-18")
    },
    {
      "review_id": 785,
      "review_author": "Trina",
      "review_text": "Nice product. Slow shipping.",
      "published_date": ISODate("2019-02-17")
    },
    ...
    {
      "review_id": 1,
      "review_author": "Hans",
      "review_text": "Meh, it's okay.",
      "published_date": ISODate("2017-12-06")
    }
  ]
}

The reviews are sorted in reverse chronological order. When a user visits a product page, the
application loads the ten most recent reviews.
Instead of storing all of the reviews with the product, you can split the collection into two collections:

 The product collection stores information on each product, including the product’s ten most recent
reviews:
{
  "_id": 1,
  "name": "Super Widget",
  "description": "This is the most useful item in your toolbox.",
  "price": { "value": NumberDecimal("119.99"), "currency": "USD" },
  "reviews": [
    {
      "review_id": 786,
      "review_author": "Kristina",
      "review_text": "This is indeed an amazing widget.",
      "published_date": ISODate("2019-02-18")
    }
    ...
    {
      "review_id": 776,
      "review_author": "Pablo",
      "review_text": "Amazing!",
      "published_date": ISODate("2019-02-16")
    }
  ]
}
 The review collection stores all reviews. Each review contains a reference to the product for which
it was written.
{
  "review_id": 786, "product_id": 1, "review_author": "Kristina",
  "review_text": "This is indeed an amazing widget.",
  "published_date": ISODate("2019-02-18")
}
{
  "review_id": 785, "product_id": 1, "review_author": "Trina",
  "review_text": "Nice product. Slow shipping.",
  "published_date": ISODate("2019-02-17")
}
...
{
  "review_id": 1, "product_id": 1,
  "review_author": "Hans",
  "review_text": "Meh, it's okay.",
  "published_date": ISODate("2017-12-06")
}

By storing the ten most recent reviews in the product collection, only the required subset of the overall
data is returned in the call to the product collection. If a user wants to see additional reviews, the
application makes a call to the review collection.
Trade-Offs of the Subset Pattern

Using smaller documents containing more frequently-accessed data reduces the overall size of the
working set. These smaller documents result in improved read performance for the data that the
application accesses most frequently.

However, the subset pattern results in data duplication. In the example, reviews are maintained in
both the product collection and the reviews collection. Extra steps must be taken to ensure that the
reviews are consistent between each collection. For example, when a customer edits their review, the
application may need to make two write operations: one to update the product collection and one to
update the reviews collection. You must also implement logic in your application to ensure that the
reviews in the product collection are always the ten most recent reviews for that product; one way to
do this is sketched below.
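One way to maintain that "ten most recent" invariant is a $push with the $each, $sort, and $slice
modifiers; a minimal sketch (the new review shown is hypothetical), with the full review also written
to the review collection as a separate operation:

// Add the new review, keep the array sorted newest-first, and cap it at ten entries.
db.product.updateOne(
  { _id: 1 },
  { $push: {
      reviews: {
        $each: [ {
          "review_id": 787,
          "review_author": "Pat",
          "review_text": "Solid widget.",
          "published_date": ISODate("2019-02-19")
        } ],
        $sort: { published_date: -1 },  // newest first
        $slice: 10                      // keep only the ten most recent
      }
  } }
)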

Other Sample Use Cases

In addition to product reviews, the subset pattern can also be a good fit to store:
 Comments on a blog post, when you only want to show the most recent or highest-rated
comments by default.
 Cast members in a movie, when you only want to show cast members with the largest roles by
default.

b) One-to-Many (the "many" side has bounded cardinality)

Let's say that we are building a product page for an e-commerce website, and we are going to have to
design a schema that will be able to show product information. In our system, we save information
about all the many parts that make up each product for repair services. How would you design a
schema to save all this data, but still make your product page performant? You might want to consider
a one-to-many schema since your one product is made up of many parts.

Now, with a schema that could potentially be saving thousands of sub parts, we probably do not need
to have all of the data for the parts on every single request, but it's still important that this relationship
is maintained in our schema. So, we might have a Products collection with data about each product in
our e-commerce store, and in order to keep that part data linked, we can keep an array of Object IDs
that link to a document that has information about the part. These parts can be saved in the same
collection or in a separate collection, if needed. Let's take a look at how this would look.

Child references (i.e. child ids kept in the parent document, as shown below) work well when there are
too many related objects to embed them directly inside the parent document, but the number is still
within known bounds.

Products: (here the child references are stored in the parent)

{
  "name": "left-handed smoke shifter",
  "manufacturer": "Acme Corp",
  "catalog_number": "1234",
  "parts": ["ObjectID('AAAA')", "ObjectID('BBBB')", "ObjectID('CCCC')"]
}

Parts:

{
  "_id" : "ObjectID('AAAA')",
  "partno" : "123-aff-456",
  "name" : "#4 grommet",
  "qty": "94",
  "cost": "0.94",
  "price": "3.99"
}

 Rule 2: Needing to access an object on its own is a compelling reason not to embed it.
 Rule 3: Avoid joins/lookups if possible, but don't be afraid of them if they can provide a better
  schema design.
c) Unbounded One-to-Many Relationships with Parent References

What if we have a schema where there could be potentially millions of subdocuments, or more?

There are cases when the number of associated documents might be unbounded and will continue to
grow with time. Let's imagine that you have been asked to create a server logging application. Each
server could potentially save a massive amount of data, depending on how verbose you're logging and
how long you store server logs for.

With MongoDB, tracking data within an unbounded array is dangerous, since we could potentially hit
that 16-MB-per-document limit. Any given host could generate enough messages to overflow the 16-
MB document size, even if only ObjectIDs are stored in an array. So, we need to rethink how we can
track this relationship without coming up against any hard limits.

So, instead of tracking the relationship between the host and the log message in the host document,
let's have each log message store the host that it is associated with. By storing the data in the log
message, we no longer need to worry about an unbounded array messing with our application! This is
known as parent referencing, i.e. keeping the reference of the parent in the child document.

Let's take a look at how this might work.

Hosts:
{
  "_id": ObjectID("AAAB"),
  "name": "goofy.example.com",
  "ipaddr": "127.66.66.66"
}

Log Message:
{
  "time": ISODate("2014-03-28T09:42:41.382Z"),
  "message": "cpu is on fire!",
  "host": ObjectID("AAAB")
}

 Rule 4: Arrays should not grow without bound. If there are more than a couple of hundred
  documents on the "many" side, don't embed them; if there are more than a few thousand
  documents on the "many" side, don't use an array of ObjectID references. High-cardinality arrays
  are a compelling reason not to embed.
Another example: imagine that the university's student council has a message board where any
student can post whatever messages they want, including questions about courses, travel stories, job
postings, study materials, or just free chat. A sample message in this example consists of a subject
and a message body:
{
  "_id": ObjectId("61741c9cbc9ec583c836174c"),
  "subject": "Books on kinematics and dynamics",
  "message": "Hello! Could you recommend good introductory books covering the topics of kinematics
              and dynamics? Thanks!",
  "posted_on": ISODate("2021-07-23T16:03:21Z")
}
We can use either of the two approaches discussed previously — embedding and child references — to
model this relationship. If you were to decide on embedding, the student’s document might take a
shape like this:
{
  "_id": ObjectId("612d1e835ebee16872a109a4"),
  "first_name": "Sammy",
  "last_name": "Shark",
  "emails": [
    { "email": "sammy@digitalocean.com", "type": "work" },
    { "email": "sammy@example.com", "type": "home" }
  ],
  "courses": [ ObjectId("61741c9cbc9ec583c836170a"), ObjectId("61741c9cbc9ec583c836170b") ],
  "message_board_messages": [
    {
      "subject": "Books on kinematics and dynamics",
      "message": "Hello! Could you recommend good introductory books covering the topics of
                  kinematics and dynamics? Thanks!",
      "posted_on": ISODate("2021-07-23T16:03:21Z")
    },
    ...
  ]
}
 The number and length of messages will quickly make this array incredibly long and could easily
  exceed the 16MB size limit, so the cardinality of this relation argues against embedding.
 Additionally, the messages might need to be accessed separately from the student, as would be the
  case if the message board page is designed to show the latest messages posted by students. This
  also suggests that embedding is not the best choice for this scenario.
 We should also consider whether the message board messages are frequently accessed when
  retrieving the student's document. If not, having them all embedded inside that document would
  incur a performance penalty when retrieving and manipulating this document.

Now consider using child references instead of embedding full documents as in the previous example.
The individual messages would be stored in a separate collection, and the student’s document could
then have the following structure:
{
  "_id": ObjectId("612d1e835ebee16872a109a4"),
  "first_name": "Sammy",
  "last_name": "Shark",
  "emails": [
    { "email": "sammy@digitalocean.com", "type": "work" },
    { "email": "sammy@example.com", "type": "home" }
  ],
  "courses": [ ObjectId("61741c9cbc9ec583c836170a"), ObjectId("61741c9cbc9ec583c836170b") ],
  "message_board_messages": [
    ObjectId("61741c9cbc9ec583c836174c"),
    ...
  ]
}
 In this example, the message_board_messages field now stores the child references to all messages
written by Sammy.
 However, changing the approach solves only one of the issues mentioned before in that it would
now be possible to access the messages independently. But although the student’s document size
would grow more slowly using the child references approach, the collection of object identifiers
could also become unwieldy given the unbounded cardinality of this relation. A student could easily
write thousands of messages during their four years of study, after all.

In such scenarios, a common way to connect one object to another is through parent references. Here
it is not the student document that refers to individual messages, but rather a reference in the
message's document that points to the student who wrote it.

To use parent references, you would need to modify the message document schema to contain a
reference to the student who authored the message:
{
  "_id": ObjectId("61741c9cbc9ec583c836174c"),
  "subject": "Books on kinematics and dynamics",
  "message": "Hello! Could you recommend good introductory books covering the topics of kinematics
              and dynamics? Thanks!",
  "posted_on": ISODate("2021-07-23T16:03:21Z"),
  "posted_by": ObjectId("612d1e835ebee16872a109a4")  // student id – the parent reference
}
Notice the new posted_by field contains the object identifier of the student’s document. Now, the
student’s document won’t contain any information about the messages they’ve posted:
{
"_id": ObjectId("612d1e835ebee16872a109a4"),
"first_name": "Sammy",
"last_name": "Shark",
"emails": [ { "email": "sammy@digitalocean.com", "type": "work" },
{ "email": "sammy@example.com", "type": "home" }
],
"courses": [ ObjectId("61741c9cbc9ec583c836170a"), ObjectId("61741c9cbc9ec583c836170b") ]
}



 To retrieve the list of messages written by a student, you would use a query on the messages
collection and filter against the posted_by field. Having them in a separate collection makes it safe
to let the list of messages grow without affecting any of the student’s documents.

 When using parent references, creating an index on the field referencing the parent document can
significantly increase the query performance each time you filter against the parent document
identifier.

In this type of situation it's generally advised to store related documents separately and use parent
references to connect them to the parent document. A minimal sketch of the index and query follows.
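A sketch under the schema above (the collection name messages is an assumption):

// Index the parent reference so lookups by student stay fast as the collection grows.
db.messages.createIndex( { posted_by: 1 } )

// Fetch the latest messages written by Sammy, newest first.
db.messages.find( { posted_by: ObjectId("612d1e835ebee16872a109a4") } )
           .sort( { posted_on: -1 } )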

3. Many-to-Many

This is another very common schema pattern that we see all the time in relational and MongoDB
schema designs. For this pattern, let's imagine that we are building a to-do application. In our app, a
user may have many tasks and a task may have many users assigned to it.

In order to preserve these relationships between users and tasks, there will need to be references from
the one user to the many tasks and references from the one task to the many users. Let's look at how
this could work for a to-do list application.
Users:
{
  "_id": ObjectID("AAF1"),
  "name": "Kate Monster",
  "tasks": [ObjectID("ADF9"), ObjectID("AE02"), ObjectID("AE73")]
}

Tasks:
{
  "_id": ObjectID("ADF9"),
  "description": "Write blog post about MongoDB schema design",
  "due_date": ISODate("2014-04-01"),
  "owners": [ObjectID("AAF1"), ObjectID("BB3G")]
}

We can see that each user has a sub-array of linked tasks, and each task has a sub-array of owners for
each item in our to-do app.
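To resolve the references in either direction, a minimal sketch (the collection names users and tasks
are assumptions):

// All tasks assigned to Kate: resolve her child references with $in.
var kate = db.users.findOne( { _id: ObjectID("AAF1") } );
db.tasks.find( { _id: { $in: kate.tasks } } );

// All owners of a task, going the other way.
var task = db.tasks.findOne( { _id: ObjectID("ADF9") } );
db.users.find( { _id: { $in: task.owners } } );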

Note: There is no firm rule for when to use child references, parent references, or both based on the
cardinality of the relation. We might choose a different approach at either a lower or higher cardinality
if that is what best suits the application in question. After all, we always want to structure our data to
suit the manner in which the application queries and updates it.



Some considerations while designing a schema in MongoDB
1. From the approaches discussed earlier it is clear that the schema should be designed according to
   user requirements.
2. Combine objects into one document if you will use them together. Otherwise separate them (but
   make sure there is no need for joins).
3. Duplicate the data (but within limits), because disk space is cheap compared to compute time.
4. Do joins on write, not on read.
5. Optimize your schema for the most frequent use cases.
6. Do complex aggregation in the schema.

7. Storage Optimization for Small Documents

Each MongoDB document contains a certain amount of overhead. This overhead is normally
insignificant but becomes significant if all documents are just a few bytes, as might be the case if the
documents in your collection only have one or two fields.

Consider the following suggestions and strategies for optimizing storage utilization for these
collections:

 Use the _id field explicitly.

MongoDB clients automatically add an _id field to each document and generate a unique 12-
byte ObjectId for the _id field. Furthermore, MongoDB always indexes the _id field. For smaller
documents this may account for a significant amount of space.

To optimize storage use, users can specify a value for the _id field explicitly when inserting documents
into the collection. The value in the _id field serves as a primary key for documents in the collection, so
it must be unique.
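A minimal sketch (the collection name ratings and the key format are hypothetical):

// Reuse a natural, unique key as _id instead of a generated 12-byte ObjectId.
db.ratings.insertOne( { _id: "user123:movie456", score: 4 } )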

 Use shorter field names.

MongoDB stores all field names in every document. Consider a collection of small documents that
resemble the following:
{ last_name : "Smith", best_score: 3.9 }

If we shorten the field named last_name to lname and the field named best_score to score, as follows,
we could save 9 bytes per document.
{ lname : "Smith", score : 3.9 }

NOTE
Shortening field names reduces expressiveness and does not provide considerable benefit for larger
documents or where document overhead is not of significant concern. Shorter field names do not
reduce the size of indexes, because indexes have a predefined structure. In general, it is not
necessary to use short field names.

8. Data Use and Performance

When designing a data model, consider how applications will use your database. For instance, if your
application only uses recently inserted documents, consider using Capped Collections. Or if your
application needs are mainly read operations to a collection, adding indexes to support common
queries can improve performance.

9. Operational Factors and Data Models

While designing a data model we should consider various operational factors that impact the
performance of MongoDB. These factors are operational, or address requirements that arise outside of
the application but impact the performance of MongoDB-based applications. When developing a data
model, analyze all of your application's read operations and write operations in conjunction with the
following considerations.

 Document Growth

Some updates to documents can increase the size of documents. These updates include pushing
elements to an array (i.e. $push) and adding new fields to a document. When using the MMAPv1
storage engine, document growth can be a consideration for your data model. For MMAPv1, if the
document size exceeds the allocated space for that document, MongoDB will relocate the
document on disk.

 Atomicity

In MongoDB, a write operation is atomic on the level of a single document, even if the operation
modifies multiple embedded documents within a single document.

A data model that embeds related data in a single document facilitates these kinds of atomic
operations. For data models that store references between related pieces of data, the application
must issue separate read and write operations to retrieve and modify these related pieces of data.

Ensure that the application stores all fields with atomic dependency requirements in the same
document. If the application can tolerate non-atomic updates for two pieces of data, you can store
these data in separate documents.
o When a single write operation (e.g. db.collection.updateMany()) modifies multiple documents,
the modification of each document is atomic, but the operation as a whole is not atomic.
o When performing multi-document write operations, whether through a single write operation
or multiple write operations, other operations may interleave.

For situations that require atomicity of reads and writes to multiple documents (in a single or multiple
collections), MongoDB supports multi-document transactions:



 In version 4.0, MongoDB supports multi-document transactions on replica sets.
 In version 4.2, MongoDB introduces distributed transactions, which adds support for multi-
document transactions on sharded clusters and incorporates the existing support for multi-
document transactions on replica sets.

For details regarding transactions in MongoDB, see the Transactions page.
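A minimal sketch of a multi-document transaction in the mongo shell, assuming a replica set running
MongoDB 4.0+ (the database and collection names are hypothetical):

// Transfer a book between two library branches atomically.
const session = db.getMongo().startSession();
const branches = session.getDatabase("library").branches;

session.startTransaction();
try {
  branches.updateOne( { _id: "downtown" }, { $inc: { copies: -1 } } );
  branches.updateOne( { _id: "uptown" },   { $inc: { copies: 1 } } );
  session.commitTransaction();   // both updates become visible together
} catch (error) {
  session.abortTransaction();    // neither update is applied
  throw error;
} finally {
  session.endSession();
}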

 Sharding

MongoDB uses sharding to provide horizontal scaling by partitioning a collection within a database
to distribute the collection’s documents across a number of mongod instances or shards. These
clusters support deployments with large data sets and high-throughput operations.

To distribute data and application traffic in a sharded collection, MongoDB uses the shard key.
Selecting the proper shard key has significant implications for performance, and can enable or
prevent query isolation and increased write capacity. It is important to consider carefully the field
or fields to use as the shard key.

 Indexes

Use indexes to improve performance for common queries. Build indexes on fields that appear often
in queries and for all operations that return sorted results. MongoDB automatically creates a
unique index on the _id field.

 Large Number of Collections

In certain situations, you might choose to store related information in several collections rather
than in a single collection. Consider a sample collection logs that stores log documents for various
environments and applications. The logs collection contains documents of the following form:

{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ...}

If the total number of documents is low, you may group documents into collections by type. For logs,
consider maintaining distinct log collections, such as logs_dev and logs_debug. The logs_dev collection
would contain only the documents related to the dev environment.

Generally, having a large number of collections has no significant performance penalty and can result
in very good performance. Distinct collections are very important for high-throughput batch
processing.
When using models that have a large number of collections, consider the following behaviors:

o Each collection has a certain minimum overhead of a few kilobytes.


o Each index, including the index on _id, requires at least 8 kB of data space.



o For each database, a single namespace file (i.e. <database>.ns) stores all meta-data for that
database, and each index and collection has its own entry in the namespace file. MongoDB
places limits on the size of namespace files.

 Data Lifecycle Management

Data modeling decisions should take data lifecycle management into consideration.
The Time to Live (TTL) feature of collections expires documents after a period of time. Consider
using the TTL feature if your application requires some data to persist in the database for only a
limited period of time.

Additionally, if your application only uses recently inserted documents, consider Capped
Collections. Capped collections provide first-in-first-out (FIFO) management of inserted documents
and efficiently support operations that insert and read documents based on insertion order.
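Minimal sketches of both features (the collection names and values are hypothetical):

// TTL: documents in log_events expire one hour after their createdAt time.
db.log_events.createIndex( { createdAt: 1 }, { expireAfterSeconds: 3600 } )

// Capped collection: a fixed 5 MB of space; the oldest documents are evicted first (FIFO).
db.createCollection( "recent_events", { capped: true, size: 5242880 } )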

GridFS
GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of
16MB. Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and
stores each of those chunks as a separate document. By default GridFS limits chunk size to 255k.

GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file
metadata. When you query a GridFS store for a file, the driver or client will reassemble the chunks as
needed. You can perform range queries on files stored through GridFS. You also can access information
from arbitrary sections of files, which allows you to “skip” into the middle of a video or audio file.

GridFS is useful not only for storing files that exceed 16MB but also for storing any files for which you
want access without having to load the entire file into memory.



Model Tree Structures
MongoDB allows various ways to use tree data structures to model large hierarchical or nested data
relationships. We will discuss following types of tree structure.

 Model Tree Structures with Parent References

A data model representing a tree-like structure in MongoDB documents can be described by storing
references to "parent" nodes in children nodes.

Pattern

The Parent References pattern stores each tree node in a document; in addition to the tree node, the
document stores the id of the node's parent. Consider the following hierarchy of categories. The
following example models the tree using Parent References, storing the reference to the parent
category in the field parent:

db.categories.insertMany( [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null }
] )

The category tree is represented in these documents.
o The query to retrieve the parent of a node is fast and straightforward:
  db.categories.findOne( { _id: "MongoDB" } ).parent
o We can create an index on the field parent to enable fast search by the parent node:
  db.categories.createIndex( { parent: 1 } )
o We can query by the parent field to find its immediate children nodes:
  db.categories.find( { parent: "Databases" } )
o To retrieve entire subtrees, see $graphLookup; a sketch follows below.
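A minimal sketch of retrieving a subtree with $graphLookup (available since MongoDB 3.4), using the
categories collection above:

// Recursively collect every category whose parent chain leads back to "Programming".
db.categories.aggregate( [
  { $match: { _id: "Programming" } },
  { $graphLookup: {
      from: "categories",
      startWith: "$_id",          // begin the walk at "Programming"
      connectFromField: "_id",    // follow _id ...
      connectToField: "parent",   // ... into other documents' parent field
      as: "descendants"
  } }
] )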



 Model Tree Structures with Child References

In this data model, a tree-like structure in MongoDB documents is described by storing references to
child nodes (in an array) in the parent nodes.

Pattern
The Child References pattern stores each tree node in a document; in addition to the tree node, the
document stores in an array the id(s) of the node's children. The following example models the tree
using Child References, storing the references to the node's children in the field children:

db.categories.insertMany( [
  { _id: "MongoDB", children: [] },
  { _id: "dbm", children: [] },
  { _id: "Databases", children: [ "MongoDB", "dbm" ] },
  { _id: "Languages", children: [] },
  { _id: "Programming", children: [ "Databases", "Languages" ] },
  { _id: "Books", children: [ "Programming" ] }
] )

Here the array of children is stored along with the tree node.

o The query to retrieve the immediate children of a node is fast and straightforward:
  db.categories.findOne( { _id: "Databases" } ).children
o We can create an index on the field children to enable fast search by the child nodes:
  db.categories.createIndex( { children: 1 } )
o We can query for a node in the children field to find its parent node as well as its siblings:
  db.categories.find( { children: "MongoDB" } )
  The result will give Databases as the parent and dbm as the sibling.
o The Child References pattern provides a suitable solution to tree storage as long as no
operations on subtrees are necessary.
o This pattern may also provide a suitable solution for storing graphs where a node may have
multiple parents.

 Model Tree Structures with an Array of Ancestors

This data model describes a tree-like structure in MongoDB documents using references to parent
nodes and an array that stores all ancestors.

Pattern: The Array of Ancestors pattern stores each tree node in a document; in addition to the tree
node, the document stores in an array the id(s) of the node's ancestors or path.

The following example models the above tree using Array of Ancestors. In addition to
the ancestors’ field, these documents also store the reference to the immediate parent category in
the parent field:



db.categories.insertMany( [
  { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" },
  { _id: "dbm", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" },
  { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" },
  { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" },
  { _id: "Programming", ancestors: [ "Books" ], parent: "Books" },
  { _id: "Books", ancestors: [ ], parent: null }
] )

o The query to retrieve the ancestors or path of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).ancestors
o We can create an index on the field ancestors to enable fast search by the ancestors nodes:
db.categories.createIndex( { ancestors: 1 } )
o We can query by the field ancestors to find all its descendants:
db.categories.find( { ancestors: "Programming" } )

The Array of Ancestors pattern provides a fast and efficient solution to find the descendants and the
ancestors of a node by creating an index on the elements of the ancestors field. This makes Array of
Ancestors a good choice for working with subtrees.

The Array of Ancestors pattern is slightly slower than the Materialized Paths pattern but is more
straightforward to use.

 Model Tree Structures with Materialized Paths

This data model describes a tree-like structure in MongoDB documents by storing full relationship
paths between documents.

Pattern: The Materialized Paths pattern stores each tree node in a document; in addition to the tree
node, the document stores as a string the id(s) of the node's ancestors or path. Although the
Materialized Paths pattern requires additional steps of working with strings and regular expressions,
the pattern also provides more flexibility in working with the path, such as finding nodes by partial
paths.

Consider the following hierarchy of categories:

The following example models the tree using Materialized Paths, storing the path in the field path; the
path string uses the comma , as a delimiter:



Prepare Data

db.categories.insertMany( [
{ _id: "Books", path: null },
{ _id: "Programming", path: ",Books," },
{ _id: "Databases", path: ",Books,Programming," },
{ _id: "Languages", path: ",Books,Programming," },
{ _id: "MongoDB", path: ",Books,Programming,Databases," },
{ _id: "dbm", path: ",Books,Programming,Databases," },
{ _id: "Java", path: ",Books,Programming,Languages," },
{ _id: "C++", path: ",Books,Programming,Languages," },
{ _id: "C", path: ",Books,Programming,Languages," },
{ _id: "Functional", path: ",Books,Programming,Languages,Java," },
{ _id: "Spring", path: ",Books,Programming,Languages,Java," },
{ _id: "MicroServices", path: ",Books,Programming,Languages,Java," },
] );
// add index on "path" field.
db.categories.createIndex( { path: 1 } );

This index may improve performance depending on the query:



o For queries from the root Books sub-tree (e.g. /^,Books,/ or /^,Books,Programming,/), an index on
the path field improves the query performance significantly.
o For queries of sub-trees where the path from the root is not provided in the query
(e.g. /,Databases,/), or similar queries of sub-trees, where the node might be in the middle of the
indexed string, the query must inspect the entire index.

For these queries an index may provide some performance improvement if the index is significantly
smaller than the entire collection.

Query

 We can query to retrieve the whole tree, sorting by the field path

db.categories.find().sort( { path: 1 } );

//Output:
{ "_id" : "Books", "path" : null }
{ "_id" : "Programming", "path" : ",Books," }
{ "_id" : "Databases", "path" : ",Books,Programming," }
{ "_id" : "Languages", "path" : ",Books,Programming," }
{ "_id" : "MongoDB", "path" : ",Books,Programming,Databases," }
{ "_id" : "dbm", "path" : ",Books,Programming,Databases," }
{ "_id" : "Java", "path" : ",Books,Programming,Languages," }
{ "_id" : "C++", "path" : ",Books,Programming,Languages," }
{ "_id" : "C", "path" : ",Books,Programming,Languages," }
{ "_id" : "Functional", "path" : ",Books,Programming,Languages,Java," }
{ "_id" : "Spring", "path" : ",Books,Programming,Languages,Java," }
{ "_id" : "MicroServices", "path" : ",Books,Programming,Languages,Java," }

 Use regular expressions on the path field to find the descendants/children of a given category

Note : A regular expression is a “prefix expression” if it starts with a caret (^) or a left anchor (\A),
followed by a string of simple symbols. The ^ is used to make sure that the string starts with a certain
character. For example, the regex /^abc.*/ will be optimized by matching only against the values from
the index that start with abc.
db.categories.find( { path: /,Databases,/ } );

The // delimiters enclose the search criteria. Hence, specifying /,Databases,/ means "find those
documents where path contains this string," i.e. the descendants of Databases.

// Output:
{ "_id" : "MongoDB", "path" : ",Books,Programming,Databases," }
{ "_id" : "dbm", "path" : ",Books,Programming,Databases," }



We can also retrieve the descendants of Books, where Books is at the topmost level of the
hierarchy:
db.categories.find( { path: /^,Books,/ } )

 Query to find all the closest related nodes, with a score, for a given node. Here we are finding the
  nodes closest to "Spring".

db.categories.createIndex( { path: "text" } ); // create a text index to use text search.

// Find the path for node "Spring". find() returns a cursor and next() returns the next document in
// the cursor.
var pathForNodeSpring = db.getCollection("categories").find( { _id: "Spring" } ).next().path;

// Use it to search the tree with a score; the highest score is the best match.
db.categories.find(
  { $text: { $search: pathForNodeSpring } },
  { score: { $meta: "textScore" } }   // assigns a score to each match
).sort( { score: { $meta: "textScore" } } );

Here we are using text search based on the text index created earlier. "textScore" returns the score
associated with the corresponding $text query for each matching document; the text score signifies
how well the document matched the search term or terms.

// Output:

{ "_id" : "Functional", "path" : ",Books,Programming,Languages,Java,", "score" : 2.5 }


{ "_id" : "Spring", "path" : ",Books,Programming,Languages,Java,", "score" : 2.5 }
{ "_id" : "MicroServices", "path" : ",Books,Programming,Languages,Java,", "score" : 2.5 }
{ "_id" : "Java", "path" : ",Books,Programming,Languages,", "score" : 2 }
{ "_id" : "C++", "path" : ",Books,Programming,Languages,", "score" : 2 }
{ "_id" : "C", "path" : ",Books,Programming,Languages,", "score" : 2 }
{ "_id" : "Databases", "path" : ",Books,Programming,", "score" : 1.5 }
{ "_id" : "Languages", "path" : ",Books,Programming,", "score" : 1.5 }
{ "_id" : "MongoDB", "path" : ",Books,Programming,Databases,", "score" : 1.3333333333333333 }
{ "_id" : "dbm", "path" : ",Books,Programming,Databases,", "score" : 1.3333333333333333 }
{ "_id" : "Programming", "path" : ",Books,", "score" : 1 }

The highest score (2.5) indicates the siblings closest to the node "Spring"; here they are Functional, Spring,
and MicroServices. This is helpful where a reader doesn't find a book on "Spring" but the application can
still suggest related books available in the same category.

Cons:
Modifications made to a node must be reflected in the paths of all its descendants/child nodes; a sketch
of such an update follows.
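A minimal sketch of propagating such a change, using a pipeline update with $replaceOne (available
since MongoDB 4.4); the new name is hypothetical. Renaming the "Languages" node itself would
require inserting a new document and removing the old one, since _id is immutable; the part specific
to the Materialized Paths pattern is rewriting the descendants' paths:

// Rewrite the stored path of every descendant of "Languages" after renaming it to "Lang".
db.categories.updateMany(
  { path: /,Languages,/ },
  [ { $set: {
      path: { $replaceOne: { input: "$path", find: ",Languages,", replacement: ",Lang," } }
  } } ]
)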



 Model Tree Structures with Nested Sets

The nested set model is to number the nodes according to a tree traversal, which visits each node
twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each
node, which are stored as two attributes.

To generate the nested set representation of a tree the tree is traversed in depth-first order (dotted
line). Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures.
The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a
graph) and explores as far as possible along each branch before backtracking.

Figure: Nested sets representation of a tree. Each node is assigned a pair of numbers that record the
order in which it is visited during the depth-first traversal: the left number records the first visit to the
node, and the right number records the last visit.

The set of nodes in a given subtree corresponds to those nodes whose left and right visitation numbers
fall within the range of the numbers assigned to the root of the subtree. For example, for root E all
subtree nodes must have left and right visitation numbers between 4 and 11. Nodes B, C, and D satisfy
these criteria, hence they belong to the subtree of E.

Pattern :

 The Nested Sets pattern identifies each node in the tree as stops in a round-trip traversal of the
tree.
 The application visits each node in the tree twice; first during the initial trip, and second during the
return trip. The Nested Sets pattern stores each tree node in a document; in addition to the tree
node, document stores the id of node’s parent, the node’s initial stop in the left field, and its return
stop in the right field.

Consider the following hierarchy of categories:

The "Books" category, with the highest position in the hierarchy, encompasses all subordinate
categories. It is therefore given left and right domain values of 1 and 12; the latter value is double the
total number of nodes being represented.

(We are covering here only a basic query, not how to insert or update node info.)

The following example models the tree using Nested Sets:

db.categories.insertMany( [
  { _id: "Books", parent: 0, left: 1, right: 12 },
  { _id: "Programming", parent: "Books", left: 2, right: 11 },
  { _id: "Languages", parent: "Programming", left: 3, right: 4 },
  { _id: "Databases", parent: "Programming", left: 5, right: 10 },
  { _id: "MongoDB", parent: "Databases", left: 6, right: 7 },
  { _id: "dbm", parent: "Databases", left: 8, right: 9 }
] )

All the descendants of a parent node (e.g. Databases) must have left and right visitation numbers
within that node's range (5 and 10).
We can query to retrieve the descendants of a node:
var databaseCategory = db.categories.findOne( { _id: "Databases" } );
db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } );

The Nested Sets pattern provides a fast and efficient solution for finding subtrees but is inefficient for
modifying the tree structure. As such, this pattern is best for static trees that do not change.
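The same numbering also yields a node's ancestors by inverting the comparisons; a minimal sketch:

// Ancestors of "MongoDB": every node whose [left, right] interval strictly encloses its own.
var mongodbCategory = db.categories.findOne( { _id: "MongoDB" } );
db.categories.find( { left: { $lt: mongodbCategory.left }, right: { $gt: mongodbCategory.right } } );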

Model Specific Application Contexts


We can use various data model patterns that are specific to particular application contexts. Each
pattern suits a particular application requirement. We are going to discuss some of these patterns.

 Model Data for Atomic Operations

Although MongoDB supports multi-document transactions for replica sets (starting in version 4.0) and
sharded clusters (starting in version 4.2), for many scenarios, the denormalized data model will
continue to be optimal for your data and use cases.



Pattern
In MongoDB, a write operation on a single document is atomic. For fields that must be updated
together, embedding the fields within the same document ensures that the fields can be updated
atomically.

For example, consider a situation where you need to maintain information on books, including the
number of copies available for checkout as well as the current checkout information. The available
copies of the book and the checkout information should be in sync. As such, embedding
the available field and the checkout field within the same document ensures that you can update the
two fields atomically.
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly",
available: 3,
checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]
}
Then to update with new checkout information, you can use the db.collection.updateOne() method to
atomically update both the available field and the checkout field:
db.books.updateOne (
{ _id: 123456789, available: { $gt: 0 } },
{
$inc: { available: -1 },
$push: { checkout: { by: "abc", date: new Date() } }
}
)
The operation returns a document that contains information on the status of the operation:
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

The matchedCount field shows that 1 document matched the update condition,
and modifiedCount shows that the operation updated 1 document. If no document matched the
update condition, then matchedCount and modifiedCount would be 0 and would indicate that you
could not check out the book.

 Model Data to Support Keyword Search

If your application needs to perform queries on the content of a field that holds text you can perform
exact matches on the text or use $regex to use regular expression pattern matches. However, for many
operations on text, these methods do not satisfy application requirements.



This pattern describes one method for supporting keyword search in MongoDB: keywords are stored in
an array in the same document as the text field. Combined with a multi-key index, this pattern can
support an application's keyword search operations.

Pattern
To add structures to your document to support keyword-based queries, create an array field in your
documents and add the keywords as strings in the array. You can then create a multi-key index on the
array and create queries that select values from the array.

EXAMPLE
Suppose you have a collection of library volumes for which you want to provide topic-based search. For
each volume, you add the array topics, with as many keywords as needed for the given volume.
For the Moby-Dick volume you might have the following document:

{ title : "Moby-Dick" ,
author : "Herman Melville" ,
published : 1851 ,
ISBN : 0451526996 ,
topics : [ "whaling" , "allegory" , "revenge" , "American" ,
"novel" , "nautical" , "voyage" , "Cape Cod" ]
}
You then create a multi-key index on the topics array:
db.volumes.createIndex( { topics: 1 } )

The multi-key index creates separate index entries for each keyword in the topics array. For example
the index contains one entry for whaling and another for allegory.
You then query based on the keywords. For example:

db.volumes.findOne( { topics : "voyage" }, { title: 1 } )

NOTE : An array with a large number of elements, such as one with several hundreds or thousands of
keywords will incur greater indexing costs on insertion.
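Where a volume must match several keywords at once, the $all operator can be used; a minimal
sketch against the same collection:

// Find volumes tagged with both keywords; the multi-key index on topics still applies.
db.volumes.find( { topics: { $all: [ "whaling", "voyage" ] } }, { title: 1 } )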

 Model Data for Schema Versioning

Database schemas occasionally need to be updated. For example, a schema designed to hold user
contact information may need to be updated to include new methods of communication as they
become popular, such as Twitter or Skype.

Those who have worked with RDBMS must understand the challenge of introducing changes in the
underlying database schema of any application, especially after it's been deployed to production
environment. Typically that involves stopping the application, running the migrations, waiting for the
changes to kick in and then restarting the application. There is no way to avoid this downtime,
however minimal that may be. Moreover, if there is a problem, such as a migration failure, then
reverting back to the old shape of the database would be an additional challenge to deal with.

However, with NoSQL databases like MongoDB this problem can be avoided, because the document
model is very flexible. There can be various approaches to making schema changes. Here we will
discuss the schema versioning pattern.

You can use MongoDB’s flexible schema model, which supports differently shaped documents in the
same collection, to gradually update your collection’s schema. As you update your schema model, the
Schema Versioning pattern allows you to track these updates with version numbers.

Your application code can use version numbers to identify and handle differently shaped documents
without downtime.

Schema Versioning Pattern

To implement the Schema Versioning pattern, add a schema_version (or similarly named) field to your
schema the first time that you modify your schema. Documents that use the new schema should have
a schema_version of 2 to indicate that they adhere to the second iteration of your schema. If you
update your schema again, increment the schema_version.

Your application code can use a document’s schema_version, or lack thereof, to conditionally handle
documents. Use the latest schema to store new information in the database.

Example
The following example iterates upon the schema for documents in the users collection.
In the first iteration of this schema, a record includes galactic_id, name, and phone fields:

// users collection
{
"_id": "<ObjectId>",
"galactic_id": 123,
"name": "Anakin Skywalker",
"phone": "503-555-0000",
}
In the next iteration, the schema is updated to include more information in a different shape:
// users collection
{
  "_id": "<ObjectId>",
  "galactic_id": 123,
  "name": "Darth Vader",
  "contact_method": {
    "work": "503-555-0210",
    "home": "503-555-0220",
    "twitter": "@realdarthvader",
    "skype": "AlwaysWithYou"
  },
  "schema_version": "2"
}

Adding a schema_version means that an application can identify documents shaped for the new
schema and handle them accordingly. The application can still handle old documents if
schema_version does not exist on the document.
For example, consider an application that finds a user’s phone number(s) by galactic_id. Upon being
given a galactic_id, the application needs to query the database:

db.users.find( { galactic_id: 123 } );

After the document is returned from the database, the application checks to see whether the
document has a schema_version field.
 If it does not have a schema_version field, the application passes the returned document to a
dedicated function that renders the phone field from the original schema.
 If it does have a schema_version field, the application checks the schema version. In this example,
the schema_version is 2 and the application passes the returned document to a dedicated function
that renders the new contact_method.work and contact_method.home fields.

Using the schema_version field, application code can support any number of schema iterations in the
same collection by adding dedicated handler functions to the code; a sketch of such a handler follows.
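A minimal sketch of that conditional handling in shell-style JavaScript (the handler function names are
hypothetical):

var user = db.users.findOne( { galactic_id: 123 } );

if (user.schema_version === undefined) {
  renderPhoneFromOriginalSchema(user);   // hypothetical handler for the original phone field
} else if (user.schema_version === "2") {
  renderContactMethod(user);             // hypothetical handler for contact_method.work / .home
}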

Use Cases

The Schema Versioning pattern is ideal for any one or a combination of the following cases:
 Application downtime is not an option
 Updating documents may take hours, days, or weeks of time to complete
 Updating documents to the new schema version is not a requirement

The Schema Versioning pattern helps you better decide when and how data migrations will take place
relative to traditional, tabular databases.



Data Modeling Example: Sample E-Commerce System with MongoDB

Simple e-commerce systems are a good starting point for data modeling with document databases like
MongoDB. These examples easily demonstrate core concepts of application development with
MongoDB and contain several patterns that you can reuse in other problem domains. MongoDB lets
you organize your data in "BSON documents," which you can think of as "typed JSON" documents.

The Product Catalog

The first step is to design the schema for the website. Consider an initial product schema:
{
  sku: "111445GB3",
  title: "Simsong One mobile phone",
  description: "The greatest Onedroid phone on the market .....",

  manufacture_details: {
    model_number: "A123X",
    release_date: new ISODate("2012-05-17T08:14:15.656Z")
  },

  shipping_details: {
    weight: 350,
    width: 10,
    height: 10,
    depth: 1
  },

  quantity: 99,

  pricing: { price: 1000 }
}

This data model stores physical details like manufacturing and shipping information as embedded
documents in the larger product document, which makes sense because these physical details are
unique features of the product. This gives the document "strong data locality," which allows easy
mapping in an object-oriented environment.
To insert the document in the products collection, use the following commands.

mongo
use ecommerce
db.products.insert({
  sku: "111445GB3",
  title: "Simsong One mobile phone",
  description: "The greatest Onedroid phone on the market .....",

  manufacture_details: {
    model_number: "A123X",
    release_date: new ISODate("2012-05-17T08:14:15.656Z")
  },

  shipping_details: {
    weight: 350,
    width: 10,
    height: 10,
    depth: 1
  },

  quantity: 99,

  pricing: {
    price: 1000
  }
})

The first command (mongo) starts the MongoDB console and connects to the local MongoDB server on
localhost and port 27017. The next chooses the ecommerce database (use ecommerce), and the third
inserts the product document in the products collection. Going forward, all commands assume you are
in the MongoDB shell using the ecommerce database.

The products data model has a unique sku that identifies the product, a title, a description, a stock
quantity, and pricing information about the item.

All products have categories. In the case of the Simsong One it's a 15G phone and also has a FM
receiver. As a result, this product falls into both the mobile/15G and the radio/fm categories. Add the
categories to the existing document, with the following update() operation:

db.products.update({sku: "111445GB3"}, {$set: { categories: ['mobile/15G', 'mobile/fm'] }});

To support efficient queries using the categories field, add an index on the categories field for
the products collection:
db.products.ensureIndex({categories:1 })

This returns all the products for a specific category using the index and an anchored regular expression.
As long as the regular expression is case-sensitive and anchored, MongoDB will use the index to answer
the query. For example, fetch all the products in the category that begins with mobile/fm:

db.products.find({categories: /^mobile\/fm/})

To be able to provide a list of all the products in a category, amend the data model with a categories
collection. In this collection, each document represents a category and contains the path for that
category in the category tree. These documents would resemble the following.
{
title: "Mobiles containing a FM radio",
parent: "mobile",
path: "mobile/fm"
}
Insert the document into the categories collection and add indexes to this collection:

db.categories.insert({title: "Mobiles containing a FM radio", parent: "mobile", path: "mobile/fm"})


db.categories.insert({title: "Mobiles with 15G support", parent: "mobile", path: "mobile/15G"})
db.categories.ensureIndex({parent: 1, path: 1})



db.categories.ensureIndex({path: 1})

Because each category document stores its full path, the application can use the same anchored-regular-expression method to find all categories under a specific category root as it uses for finding products by category. For example, to return all sub-categories of the category "mobile", use the following query:

db.categories.find({parent: /^mobile/}, {_id: 0, path: 1})


This will return the following documents:
{"path": "mobile/fm"}
{"path": "mobile/15G"}

Using these path values, the application can access the category tree and extract more sub-categories with a single index-supported query. Furthermore, the application can pull all the documents for a specific category using this path value.
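For example, a minimal sketch of both lookups using the mobile/fm path value and the indexes created above:

// All sub-categories under the "mobile" root, via an anchored regular expression on path:
db.categories.find({ path: /^mobile/ })
// All products filed under one specific category path (an equality match on an array element):
db.products.find({ categories: "mobile/fm" })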

The Cart

A cart in an e-commerce system allows users to reserve items from the inventory and keep them until
they check out and pay for the items. The application must ensure that at any point in time there are
no more items in carts than there are in stock, and that if a user abandons the cart, the application
returns the items from the cart to the inventory without losing track of any objects. Take the
following document, which models the cart:

{
  _id: "the_users_session_id",
  status: 'active',
  quantity: 2,
  total: 2000,
  products: []
}
The products array contains the list of products the customer intends to purchase. Use the
following insert() operation to create the cart:

db.carts.insert({ _id: "the_users_session_id", status: 'active', quantity: 2, total: 2000, products: [] });

If the inventory had 99 items, after this operation the inventory should have 97 items. To prevent
"overselling," the application must move items from the inventory to the cart. To support these
operations, the application must perform a set of updates and be able to "roll back" changes if something
goes awry. Begin by adding a product to the customer's cart with the following operation:

db.carts.update({
  _id: "the_users_session_id", status: 'active'
}, {
  $set: { modified_on: new ISODate() },
  $push: {
    products: {
      sku: "111445GB3", quantity: 1, title: "Simsong One mobile phone", price: 1000
    }
  }
});
Then, check to ensure that the inventory can support adding the product to the customer's cart:
db.products.update({
sku: "111445GB3", quantity: {$gte: 1}
}, {
$inc: {quantity: -1},
$push: {
in_carts: {
quantity:1, id: "the_users_session_id", timestamp: new ISODate()
}
}
})
This operation only succeeds if there is sufficient inventory, and the application must detect the
operation's success or failure. Call getLastError to fetch the result of the attempted update:
if(!db.runCommand({getLastError:1}).updatedExisting) {
db.carts.update({
_id: "the_users_session_id"
}, {
$pull: {products: {sku:"111445GB3"}}
})
}
If updatedExisting is false in the resulting document, the operation failed and the application must "roll
back" the attempt to add the product to the user’s cart. This pattern ensures that the application
cannot have more products in carts than the available inventory.

In addition to simply adding objects to carts, there are a number of cart-related operations that the
application must be able to support:
 users may add or remove objects from the cart.
 users may abandon a cart, and the application must return the items in the cart to inventory.

The next sequence of operations allows the application to ensure that carts are up to date and that the
application has enough inventory to cover them. Update the cart with the new quantity, using the
following update() operation:

var new_quantity = 2;
var old_quantity = 1;
var quantity_delta = new_quantity - old_quantity;

db.carts.update({
  _id: "the_users_session_id", "products.sku": "111445GB3", status: "active"
}, {
  $set: { modified_on: new ISODate(), "products.$.quantity": new_quantity }
})
The variables new_quantity, old_quantity and quantity_delta hold the new and previous quantity in the cart, as well as the delta that needs to be requested from the inventory.

Now, remove the additional item from the inventory and update the number of items in the shopping cart:
db.products.update({
sku: "111445GB3",
"in_carts.id": "the_users_session_id",
quantity: {
$gte: 1
}
}, {
$inc: { quantity: (-1)*quantity_delta },
$set: {
"in_carts.$.quantity": new_quantity, timestamp: new ISODate()
}
})
Ensure the application has enough inventory for the operation. If there is not sufficient inventory, the
application must roll back the last operation. The following operation checks for errors
using getLastError and rolls back the operation if it returns an error:

if(!db.runCommand({getLastError:1}).updatedExisting) {
  db.carts.update({
    _id: "the_users_session_id", "products.sku": "111445GB3"
  }, {
    $set: { "products.$.quantity": old_quantity }
  })
}
If a user abandons the purchase process or the shopping cart grows stale and times out, the
application must return cart content to the inventory. This operation requires a loop that finds all
expired or canceled carts and then returns the content of each cart to the inventory. Begin by finding
all sufficiently "stale" carts, and use an operation that resembles the following:

var carts = db.carts.find({status: "expiring"}).toArray()

for(var i = 0; i < carts.length; i++) {
  var cart = carts[i]

  for(var j = 0; j < cart.products.length; j++) {
    var product = cart.products[j]

    // Return this product's reserved quantity to the inventory and
    // remove the cart's reservation entry from the in_carts array.
    db.products.update({
      sku: product.sku,
      "in_carts.id": cart._id,
      "in_carts.quantity": product.quantity
    }, {
      $inc: {quantity: product.quantity},
      $pull: {in_carts: {id: cart._id}}
    })
  }

  db.carts.update({
    _id: cart._id
  }, {
    $set: {status: 'expired'}
  })
}
This operation walks all products in each cart, returns them to the inventory, and removes the cart
identifiers from the in_carts array in the product documents. Once the application has returned all of
the items to the inventory, the application sets the cart's status to expired.

Checkout

When the user clicks the "confirm" button in the checkout portion of the application, the application
creates an "order" document that reflects the entire order. Consider the following operation:

db.orders.insert({
  created_on: new ISODate("2012-05-17T08:14:15.656Z"),

  shipping: {
    customer: "Peter P Peterson",
    address: "Longroad 1343",
    city: "Peterburg",
    region: "",
    state: "PE",
    country: "Peteonia",
    delivery_notes: "Leave at the gate",

    tracking: {
      company: "ups",
      tracking_number: "22122X211SD",
      status: "ontruck",
      estimated_delivery: new ISODate("2012-05-17T08:14:15.656Z")
    }
  },

  payment: {
    method: "visa",
    transaction_id: "2312213312XXXTD"
  },

  products: [
    {quantity: 2, sku: "111445GB3", title: "Simsong mobile phone", unit_cost: 1000, currency: "USDA"}
  ]
})
For a relational database, you might need to model this as a set of tables: for orders, shipping,
tracking, and payment. Using MongoDB, one can create a single document that is self-contained, easy
to understand, and maps simply into an object oriented application. After inserting this document, the
application must ensure inventory is up to date before completing the checkout. Begin by setting the
cart as finished, with the following operation:

db.carts.update({
_id: "the_users_session_id"
}, {
$set: {status:"complete"}
});
Use the following operation to remove the cart identifier from all product records:
db.products.update({
"in_carts.id": "the_users_session_id"
}, {
$pull: {in_carts: {id: "the_users_session_id"}}
}, false, true);
By using "multi-update," which is the last argument in the update() method, this operation will update
all matching documents in one set of operations.
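In recent MongoDB releases, the same multi-document update is more commonly written with updateMany(), which updates all matching documents by default. A minimal equivalent sketch:

db.products.updateMany(
  { "in_carts.id": "the_users_session_id" },
  { $pull: { in_carts: { id: "the_users_session_id" } } }
)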

Mongo DB Data Model Example


Relationships

Relationships represent the way documents are related to each other. They can be modeled through
Embedded and Referenced approaches. The relationship types can be One to One (1:1), One to Many
(1:N), Many to One (N:1) and Many to Many (N:N).

We will start looking at Person document. The sample data for the person document is shown below:
Person Document
{
"_id":ObjectId("52eecd85242f436000001"),
"person": "Tom Hanks",
"id": "987654321",
"ssn": "345982341",



"gender": "male"
}
Another document of type department is shown below:
Department Document
{
"_id":ObjectId("82aacd85242f436000011"),
"department": "HR",
"id": "9",
"location": "Los Angeles"
"country": "USA"
}
One to one relationship between Person and Department is modeled below using the embedded and
reference approaches.

Embedded approach: Person document with an embedded department.

{
  "_id": ObjectId("52eecd85242f436000001"),
  "person": "Tom Hanks",
  "id": "987654321",
  "ssn": "345982341",
  "gender": "male",
  "department": {
    "department": "HR",
    "id": "9",
    "location": "Los Angeles",
    "country": "USA"
  }
}

Reference approach: Person to Department using references.

Person document
{
  "_id": ObjectId("52eecd85242f436000001"),
  "person": "Tom Hanks",
  "id": "987654321",
  "ssn": "345982341",
  "gender": "male"
}

Department document
{
  "_id": ObjectId("82aacd85242f436000011"),
  "department": "HR",
  "person_id": "52eecd85242f436000001",
  "id": "9",
  "location": "Los Angeles",
  "country": "USA"
}

One to Many relationship between person and address is shown below using the embedded and
reference approaches.

Embedded approach: Person to Address (One to Many).

{
  "_id": ObjectId("52ffc33cd85242f436000001"),
  "person": "Tom Hanks",
  "id": "987654321",
  "ssn": "345982341",
  "gender": "male",
  "address": [
    {
      "street": "92 A, Windsor Apt",
      "code": 123456,
      "city": "Los Angeles",
      "state": "California",
      "country": "USA"
    },
    {
      "street": "25 Franklin Apt",
      "code": 456789,
      "city": "Chicago",
      "state": "Illinois",
      "country": "USA"
    }
  ]
}

Reference approach: Person to Address using references.

Person Document
{
  "_id": ObjectId("52ffc33cd85242f436000001"),
  "person": "Tom Hanks",
  "id": "987654321",
  "ssn": "345982341",
  "gender": "male"
}

Address Document 1
{
  "person_id": "52ffc33cd85242f436000001",
  "street": "92 A, Windsor Apt",
  "code": 123456,
  "city": "Los Angeles",
  "state": "California",
  "country": "USA"
}

Address Document 2
{
  "person_id": "52ffc33cd85242f436000001",
  "street": "25 Franklin Apt",
  "code": 456789,
  "city": "Chicago",
  "state": "Illinois",
  "country": "USA"
}

A group document can have many to many relationship with person document. A sample group
document is shown below:
Group Document
{
"_id":ObjectId("22avxd85242f436000001"),
"group": "Group1",
"type": "Engineers"
}
Many to Many relationship between Person and Group is shown using embedded approach.
Person to Group using embedded
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"person": "Tom Hanks",
"id": "987654321",
"ssn": "345982341",
"gender": "male"
"groups": [
{
"_id":ObjectId("22avxd85242f436000001"),
"group": "Group1",
"type": "Engineers"
},



{
"_id":ObjectId("35kfsd85242f436000001"),
"group": "Group2",
"type": "Managers"
}
]
}
Using references, Person to Group many to many relationship is shown below:
Person to Group using references

Person Document
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"person": "Tom Hanks",
"id": "987654321",
"ssn": "345982341",
"gender": "male"
}

Group Document 1

{
"_id":ObjectId("22avxd85242f436000001"),
"group": "Group1",
"type": "Engineers"
}

Group Document 2
{
"_id":ObjectId("35kfsd85242f436000001"),
"group": "Group2",
"type": "Managers"
}
The relationship between a manager and a person can be modeled as a parent to child relationship. The relationship is shown below using the embedded approach.
Manager to Person using Embedded
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"manager": "John Smith",
"id": "987652321",
"ssn": "245982341",
"gender": "male",
"persons":[
{



"_id":ObjectId("52ffc33cd85242f436000001"),
"person": "Tom Hanks",
"id": "987654321",
"ssn": "345982341",
"gender": "male"
},
{
"_id":ObjectId("83eec33cd85242f436000001"),
"person": "Roger Harper",
"id": "387654321",
"ssn": "324982341",
"gender": "male"
},
]
}
Parent to child relationship between Manager and Person using references approach is shown below:
Manager to Person using references
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"manager": "John Smith",
"id": "987652321",
"ssn": "245982341",
"gender": "male",
"persons":[
ObjectId("52ffc33cd85242f436000001"),
ObjectId("83eec33cd85242f436000001")
]
}
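To resolve such references, the application issues a second query using the stored ObjectIds (a minimal sketch; the managers and persons collection names are illustrative):

// Load the manager, then fetch the referenced person documents in one query.
var manager = db.managers.findOne({ "manager": "John Smith" });
var persons = db.persons.find({ "_id": { $in: manager.persons } }).toArray();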

Model Monetary Data


Applications that handle monetary data often require the ability to capture fractional units of currency
and need to emulate decimal rounding with exact precision when performing arithmetic. The binary-
based floating-point arithmetic used by many modern systems (i.e., float, double) is unable to
represent exact decimal fractions and requires some degree of approximation, making it unsuitable for
monetary arithmetic. This constraint is an important consideration when modeling monetary data.

There are several approaches to modeling monetary data in MongoDB using the numeric and non-
numeric models.

Numeric Model : The numeric model may be appropriate if you need to query the database for exact,
mathematically valid matches or need to perform server-side arithmetic, e.g., $inc, $mul,
and aggregation framework arithmetic.
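For instance, a minimal sketch of server-side arithmetic on a decimal field (the accounts collection and its fields are illustrative, not part of the examples below):

// Apply a 10% increase on the server, without first reading the value into the client.
db.accounts.update(
  { "_id": 1 },
  { $mul: { "balance": NumberDecimal("1.10") } }
)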



The following approaches follow the numeric model:

 Using the Decimal BSON Type, which is a decimal-based floating-point format capable of providing
exact precision. Available in MongoDB version 3.4 and later.
 Using a Scale Factor to convert the monetary value to a 64-bit integer (long BSON type) by
multiplying by a power-of-10 scale factor.

Non-Numeric Model : If there is no need to perform server-side arithmetic on monetary data or if
server-side approximations are sufficient, modeling monetary data using the non-numeric model may
be suitable.

The following approach follows the non-numeric model:

 Using two fields for the monetary value: One field stores the exact monetary value as a non-
numeric string and another field stores a binary-based floating-point (double BSON type)
approximation of the value.

NOTE : Arithmetic mentioned on this page refers to server-side arithmetic performed by mongod or mongos, and not to client-side arithmetic.

1. Numeric Model

Using the Decimal BSON Type - New in version 3.4.

The decimal BSON type uses the IEEE 754 decimal128 decimal-based floating-point numbering format.
Unlike binary-based floating-point formats (i.e., the double BSON type), decimal128 does not
approximate decimal values and is able to provide the exact precision required for working with
monetary data.

From the mongo shell, decimal values are assigned and queried using the NumberDecimal() constructor.
The following example adds a document containing gas prices to a gasprices collection:

db.gasprices.insert( { "_id" : 1, "date" : ISODate(), "price" : NumberDecimal("2.099"), "station" : "Quikstop", "grade" : "regular" } )

The following query matches the document above:


db.gasprices.find( { price: NumberDecimal("2.099") } )

Converting Values to Decimal

A collection’s values can be transformed to the decimal type by performing a one-time transformation
or by modifying application logic to perform the transformation as it accesses records.



TIP
As an alternative to the procedure outlined below, starting in version 4.0 you can use the $convert operator and its helper $toDecimal to convert values to NumberDecimal().

One-Time Collection Transformation : A collection can be transformed by iterating over all documents
in the collection, converting the monetary value to the decimal type, and writing the document back to
the collection.

NOTE : It is strongly advised to add the decimal value to the document as a new field and remove the
old field later once the new field’s values have been verified.

WARNING
Be sure to test decimal conversions in an isolated test environment. Once datafiles are created or
modified with MongoDB version 3.4 they will no longer be compatible with previous versions and there
is no support for downgrading datafiles containing decimals.

Scale Factor Transformation:

Consider the following collection which used the Scale Factor approach and saved the monetary value
as a 64-bit integer representing the number of cents:

{ "_id" : 1, "description" : "T-Shirt", "size" : "M", "price" : NumberLong("1999") },


{ "_id" : 2, "description" : "Jeans", "size" : "36", "price" : NumberLong("3999") },
{ "_id" : 3, "description" : "Shorts", "size" : "32", "price" : NumberLong("2999") },
{ "_id" : 4, "description" : "Cool T-Shirt", "size" : "L", "price" : NumberLong("2495") },
{ "_id" : 5, "description" : "Designer Jeans", "size" : "30", "price" : NumberLong("8000") }

The long value can be converted to an appropriately formatted decimal value by multiplying price and NumberDecimal("0.01") using the $multiply operator. The following aggregation pipeline assigns the converted value to the new priceDec field in the $addFields stage:

db.clothes.aggregate(
  [
    { $match: { price: { $type: "long" }, priceDec: { $exists: 0 } } },
    {
      $addFields: {
        priceDec: { $multiply: [ "$price", NumberDecimal( "0.01" ) ] }
      }
    }
  ]
).forEach( function( doc ) {
  db.clothes.save( doc );
})
The results of the aggregation pipeline can be verified using the db.clothes.find() query:



{ "_id" : 1, "description" : "T-Shirt", "size" : "M", "price" : NumberLong(1999), "priceDec" :
NumberDecimal("19.99") }
{ "_id" : 2, "description" : "Jeans", "size" : "36", "price" : NumberLong(3999), "priceDec" :
NumberDecimal("39.99") }
{ "_id" : 3, "description" : "Shorts", "size" : "32", "price" : NumberLong(2999), "priceDec" :
NumberDecimal("29.99") }
{ "_id" : 4, "description" : "Cool T-Shirt", "size" : "L", "price" : NumberLong(2495), "priceDec" :
NumberDecimal("24.95") }
{ "_id" : 5, "description" : "Designer Jeans", "size" : "30", "price" : NumberLong(8000), "priceDec" :
NumberDecimal("80.00") }

If you do not want to add a new field with the decimal value, the original field can be overwritten. The
following update() method first checks that price exists and that it is a long, then transforms
the long value to decimal and stores it in the price field:

db.clothes.update(
{ price: { $type: "long" } },
{ $mul: { price: NumberDecimal( "0.01" ) } },
{ multi: 1 }
)
The results can be verified using the db.clothes.find() query:
{ "_id" : 1, "description" : "T-Shirt", "size" : "M", "price" : NumberDecimal("19.99") }
{ "_id" : 2, "description" : "Jeans", "size" : "36", "price" : NumberDecimal("39.99") }
{ "_id" : 3, "description" : "Shorts", "size" : "32", "price" : NumberDecimal("29.99") }
{ "_id" : 4, "description" : "Cool T-Shirt", "size" : "L", "price" : NumberDecimal("24.95") }
{ "_id" : 5, "description" : "Designer Jeans", "size" : "30", "price" : NumberDecimal("80.00") }

Non-Numeric Transformation:

Consider the following collection which used the non-numeric model and saved the monetary value as
a string with the exact representation of the value:
{ "_id" : 1, "description" : "T-Shirt", "size" : "M", "price" : "19.99" }
{ "_id" : 2, "description" : "Jeans", "size" : "36", "price" : "39.99" }
{ "_id" : 3, "description" : "Shorts", "size" : "32", "price" : "29.99" }
{ "_id" : 4, "description" : "Cool T-Shirt", "size" : "L", "price" : "24.95" }
{ "_id" : 5, "description" : "Designer Jeans", "size" : "30", "price" : "80.00" }

The following function first checks that price exists and that it is a string, then transforms
the string value to a decimal value and stores it in the priceDec field:

db.clothes.find( { $and : [ { price: { $exists: true } }, { price: { $type: "string" } } ] } )
.forEach( function( doc ) {
  doc.priceDec = NumberDecimal( doc.price );
  db.clothes.save( doc );
} );



The function does not output anything to the command line. The results can be verified using
the db.clothes.find() query:

{ "_id" : 1, "description" : "T-Shirt", "size" : "M", "price" : "19.99", "priceDec" : NumberDecimal("19.99") }


{ "_id" : 2, "description" : "Jeans", "size" : "36", "price" : "39.99", "priceDec" : NumberDecimal("39.99") }
{ "_id" : 3, "description" : "Shorts", "size" : "32", "price" : "29.99", "priceDec" : NumberDecimal("29.99") }
{ "_id" : 4, "description" : "Cool T-Shirt", "size" : "L", "price" : "24.95", "priceDec" : NumberDecimal("24.95") }
{ "_id" : 5, "description" : "Designer Jeans", "size" : "30", "price" : "80.00", "priceDec" : NumberDecimal("80.00")
}
NOTE : If you are using MongoDB version 3.4 or higher, using the decimal type for modeling monetary
data is preferable to the Scale Factor method.

2. Non-Numeric Model

To model monetary data using the non-numeric model, store the value in two fields:
1. In one field, encode the exact monetary value as a non-numeric data type; e.g., BinData or
a string.
2. In the second field, store a double-precision floating point approximation of the exact value.

The following example uses the non-numeric model to store 9.99 USD for the price and 0.25 USD for
the fee:
{
price: { display: "9.99", approx: 9.9900000000000002, currency: "USD" },
fee: { display: "0.25", approx: 0.2499999999999999, currency: "USD" }
}
With some care, applications can perform range and sort queries on the field with the numeric
approximation. However, the use of the approximation field for the query and sort operations requires
that applications perform client-side post-processing to decode the non-numeric representation of the
exact value and then filter out the returned documents based on the exact monetary value.
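For example, a minimal sketch of that pattern against the price field shown above (the items collection name is illustrative): query on the double approximation on the server, then re-check each candidate on the client against the exact string value.

// Server-side: range query on the approximation field.
var candidates = db.items.find({ "price.approx": { $gt: 5.0 } }).toArray();
// Client-side: decode the exact value and filter out false positives.
var exact = candidates.filter(function (doc) {
  return parseFloat(doc.price.display) > 5.0;
});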

Model Time Data


MongoDB stores times in UTC by default, and will convert any local time representations into this form.
Applications that must operate or report on some unmodified local time value may store the time zone
alongside the UTC timestamp, and compute the original local time in their application logic.

Example
In the MongoDB shell, you can store both the current date and the current client’s offset from UTC.

var now = new Date();
db.data.save( { date: now, offset: now.getTimezoneOffset() } );

You can reconstruct the original local time by applying the saved offset:

var record = db.data.findOne();
var localNow = new Date( record.date.getTime() - ( record.offset * 60000 ) );

Use Buckets for Time-Series Data

A common method to organize time-series data is to group the data into buckets where each bucket
represents a uniform unit of time such as a day or year. Bucketing organizes specific groups of data to
help:
 Discover historical trends,
 Forecast future trends, and
 Optimize storage usage.

The Bucket Pattern

Consider a collection that stores temperature data obtained from a sensor. The sensor records the
temperature every minute and stores the data in a collection called temperatures:

// temperatures collection
{
"_id": 1, "sensor_id": 12345,
"timestamp": ISODate("2019-01-31T10:00:00.000Z"), "temperature": 40
}
{
"_id": 2, "sensor_id": 12345,
"timestamp": ISODate("2019-01-31T10:01:00.000Z"), "temperature": 40
}
{
"_id": 3, "sensor_id": 12345,
"timestamp": ISODate("2019-01-31T10:02:00.000Z"), "temperature": 41
}
...
This approach does not scale well in terms of data and index size. For example, if the application
requires indexes on the sensor_id and timestamp fields, every incoming reading from the sensor would
need to be indexed to improve performance.

You can leverage the document model to bucket the data into documents that hold the measurements
for a particular timespan. Consider the following updated schema which buckets the readings taken
every minute into hour-long groups:
{
"_id": 1, "sensor_id": 12345,
"start_date": ISODate("2019-01-31T10:00:00.000Z"),
"end_date": ISODate("2019-01-31T10:59:59.000Z"),
"measurements": [



{ "timestamp": ISODate("2019-01-31T10:00:00.000Z"), "temperature": 40 },
{ "timestamp": ISODate("2019-01-31T10:01:00.000Z"), "temperature": 40 },
...
{ "timestamp": ISODate("2019-01-31T10:42:00.000Z"), "temperature": 42 }
],
"transaction_count": 42, "sum_temperature": 1783
}

This updated schema improves scalability and mirrors how the application actually uses the data. A
user likely wouldn’t query for a specific temperature reading. Instead, a user would likely query for
temperature behavior over the course of an hour or day. The Bucket pattern helps facilitate those
queries by grouping the data into uniform time periods.
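A minimal sketch of how the application might maintain such a bucket as each reading arrives, using an upsert that appends the measurement and updates the running totals (the upsert option creates the hour's bucket if it does not exist yet):

db.temperatures.update(
  {
    "sensor_id": 12345,
    "start_date": ISODate("2019-01-31T10:00:00.000Z"),
    "end_date": ISODate("2019-01-31T10:59:59.000Z")
  },
  {
    $push: { "measurements": { "timestamp": ISODate("2019-01-31T10:43:00.000Z"), "temperature": 41 } },
    $inc: { "transaction_count": 1, "sum_temperature": 41 }
  },
  { upsert: true }
)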

Combine the Computed and Bucket Patterns

The example document contains two computed fields: transaction_count and sum_temperature. If the
application frequently needs to retrieve the sum of temperatures for a given hour, computing a
running total of the sum can help save application resources. This Computed Pattern approach
eliminates the need to calculate the sum each time the data is requested.

The pre-aggregated sum_temperature and transaction_count values enable further computations such
as the average temperature (sum_temperature / transaction_count) for a particular bucket. It is much
more likely that users will query the application for the average temperature between 2:00 and 3:00
PM rather than querying for the specific temperature at 2:03 PM. Bucketing and pre-computing certain
values allows the application to more readily provide that information.
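A minimal sketch of computing that average on the server from the pre-aggregated fields:

db.temperatures.aggregate([
  { $match: { "sensor_id": 12345 } },
  { $project: {
      "start_date": 1,
      "avg_temperature": { $divide: [ "$sum_temperature", "$transaction_count" ] }
  } }
])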

Sample Use Cases

In addition to time-series data, the Bucket pattern is useful for Internet of Things projects where you
have multiple datasets coming from many different sources. It can be helpful to bucket that data into
groups (e.g. based on device type or system) to more easily retrieve and parse the data.

The Bucket pattern is also commonly used in financial applications to group transactions by type, date,
or customer.

Model Computed Data


Often, an application needs to derive a value from source data stored in a database. Computing a new
value may require significant CPU resources, especially in the case of large data sets or in cases where
multiple documents must be examined. If a computed value is requested often, it can be more efficient
to save that value in the database ahead of time. This way, when the application requests data, only
one read operation is required.



Computed Pattern

If your reads significantly outnumber your writes, the computed pattern reduces the frequency of
having to perform computations. Instead of attaching the burden of computation to every read, the
application stores the computed value and recalculates it as needed. The application can either
recompute the value with every write that changes the computed value’s source data, or as part of a
periodic job.

NOTE : With periodic updates, the computed value is not guaranteed to be exact in any given read.
However, this approach may be worth the performance boost if exact accuracy isn’t a requirement.
Example

An application displays movie viewer and revenue information. Users often want to know how many people saw a certain movie and how much money that movie made. Consider the following screenings collection:

// screenings collection
{
  "theater": "Alger Cinema", "location": "Lakeview, OR",
  "movie_title": "Reservoir Dogs",
  "num_viewers": 344, "revenue": 3440
}
{
  "theater": "City Cinema", "location": "New York, NY",
  "movie_title": "Reservoir Dogs",
  "num_viewers": 1496, "revenue": 22440
}
{
  "theater": "Overland Park Cinema", "location": "Boise, ID",
  "movie_title": "Reservoir Dogs",
  "num_viewers": 760, "revenue": 7600
}

In this example, to total num_viewers and revenue you must perform a read for every theater that screened the movie titled "Reservoir Dogs" and sum the values of those fields.
To avoid performing that computation every time the information is requested, you can compute the
total values and store them in a movies collection with the movie record itself:

// movies collection
{
"title": "Reservoir Dogs",
"total_viewers": 2600, "total_revenue": 33480,
...
}
In a low-write environment, the computation could be done in conjunction with any update of
the screenings data. In an environment with more regular writes, the computations could be done at
defined intervals, every hour for example. The source data in screenings isn't affected by writes to
the movies collection, so you can run calculations at any time.
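A minimal sketch of the low-write variant: whenever a new screening is recorded, the application applies the same delta to the pre-computed totals (the collection and field names follow the example above):

// After inserting a screening with 344 viewers and 3440 in revenue,
// fold those numbers into the movie's running totals.
db.movies.update(
  { "title": "Reservoir Dogs" },
  { $inc: { "total_viewers": 344, "total_revenue": 3440 } }
)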



This is a common design pattern that reduces CPU workload and increases application performance.
Whenever you are performing the same calculations repeatedly and you have a high read to write
ratio, consider the Computed Pattern.

Other Sample Use Cases

In addition to cases where summing is requested frequently, such as getting total revenue or viewers
in the movie database example, the computed pattern is a good fit wherever calculations need to be
run against data. For example:

 A car company that runs massive aggregation queries on vehicle data, storing results to show for
the next few hours until the data is recomputed.
 A consumer reporting company that compiles data from several different sources to create rank-
ordered lists like the “100 Best-Reviewed Gadgets”. The lists can be regenerated periodically while
the underlying data is updated independently.



Schema Validation in MongoDB
There are likely to be situations in which we might need our data documents to follow a particular
structure or fulfill certain requirements. Many document databases allow defining rules that dictate
how parts of your documents’ data should be structured while still offering some freedom to change
this structure if needed.

MongoDB has a feature called schema validation that allows you to apply constraints on your
documents’ structure. Schema validation is built around JSON Schema, an open standard for JSON
document structure description and validation.

Step 1 — Inserting Documents Without Applying Schema Validation


To illustrate the schema validation features, we use a sample database containing documents that represent the highest mountains in the world. The sample document for Mount Everest will take this form:

{
  "name": "Everest",
  "height": 8848,
  "location": ["Nepal", "China"],
  "ascents": {
    "first": { "year": 1953 },
    "first_winter": { "year": 1980 },
    "total": 5656
  }
}

This document contains the following information:
 name: the peak’s name.
 height: the peak’s elevation, in meters.
 location: the countries in which the mountain is located.
 ascents: this field’s value is another document. When one document is stored within another document like this, it’s known as an embedded or nested document. Each ascents (climbs) document describes successful ascents of the given mountain. Specifically, each ascents document contains a total field that lists the total number of successful ascents of each given peak. Additionally, each of these nested documents contains two fields whose values are also nested documents:
 first: this field’s value is a nested document that contains one field, year, which describes the year of the first overall successful ascent.
 first_winter: this field’s value is a nested document that also contains a year field, the value of which represents the year of the first successful winter ascent of the given mountain.

Run the following insertOne() method to simultaneously create a collection named peaks and insert the document:

db.peaks.insertOne(
  {
    "name": "Everest",
    "height": 8848,
    "location": ["Nepal", "China"],
    "ascents": {
      "first": { "year": 1953 },
      "first_winter": { "year": 1980 },
      "total": 5656
    }
  }
)

In the above example we added the document without any special validation on fields like name or
height. If a document is valid JSON, that is enough to insert it into the collection. However, this
isn't enough to keep the database logically consistent and meaningful, so we should build schema
validation rules to make sure the data documents in the peaks collection follow a few essential
requirements.



Step 2 — Validating String Fields

In MongoDB, schema validation works on individual collections by assigning a JSON Schema document to the collection. JSON Schema is an open standard that allows us to define and validate the structure of JSON documents. We do this by creating a schema definition that lists a set of requirements that documents in the given collection must follow to be considered valid.

 Any given collection can only use a single JSON Schema.
 We can assign a schema when we create the collection or at any time afterwards.
 If we decide to change the original validation rules later on, we will have to replace the
original JSON Schema document with one that aligns with the new requirements.

To assign a JSON Schema validator document to the peaks collection we run the following command:
db.runCommand({
  "collMod": "collection_name",
  "validator": {
    $jsonSchema: {JSON_Schema_document}
  }
})

The runCommand method executes the collMod command, which modifies the specified collection by applying the validator attribute to it. The validator attribute is responsible for schema validation.

The validator attribute accepts the $jsonSchema operator, which defines a JSON Schema document to be used as the schema validator for the given collection.

Warning: In order to execute the collMod command, your MongoDB user must be granted the
appropriate privileges.

We can also assign a JSON Schema validator when we create a collection. To do so, use the following syntax:

db.createCollection(
  "collection_name", {
    "validator": {
      $jsonSchema: {JSON_Schema_document}
    }
})

Here, collection_name is the name of the collection to which we want to assign the validator document, and the validator option assigns the specified JSON Schema document as the collection's validator.

 Applying a JSON Schema validator from the start like this means every document we add to the
collection must satisfy the requirements set by the validator.
 When we add validation rules to an existing collection, though, the new rules won’t affect existing
documents until we try to modify them.
 The JSON Schema document we pass to the validator attribute should outline every validation rule
we want to apply to the collection.



The following example JSON Schema will make sure that the name field is present in every document
in the collection, and that the name field’s value is always a string:
{
  "bsonType": "object",
  "description": "Document describing a mountain peak",
  "required": ["name"],
  "properties": {
    "name": {
      "bsonType": "string",
      "description": "Name must be a string and is required"
    }
  }
}

This schema document outlines requirements that certain parts of documents entered into the collection must follow. The root part of the JSON Schema document (the fields before properties, which in this case are bsonType, description, and required) describes the database document itself.

 The bsonType property describes the data type that the validation engine will expect to find. For
the database document itself, the expected type is object. This means that you can only add
objects — in other words, complete, valid JSON documents surrounded by curly braces ({ and }) —
to this collection. If you were to try to insert some other kind of data type (like a standalone string,
integer, or an array), it would cause an error.
 In MongoDB, every document is an object. However, JSON Schema is a standard used to describe
and validate all kinds of valid JSON documents, and a plain array or a string is valid JSON, too. When
working with MongoDB schema validation, you’ll find that you must always set the root
document’s bsonType value as object in the JSON Schema validator.

 Next, the description property (optional) provides a short description of the documents found in
this collection.
 The next property in the validation document is the required field. The required field can only
accept an array containing a list of document fields that must be present in every document in the
collection. In this example, ["name"] means that the documents only have to contain
the name field to be considered valid.
 Following that is a properties object that describes the rules used to validate document fields. For
each field that you want to define rules for, include an embedded JSON Schema document named
after the field. Be aware that you can define schema rules for fields that aren’t listed in
the required array. This can be useful in cases where your data has fields that aren’t required, but
you’d still like for them to follow certain rules when they are present.

These embedded schema documents will follow a similar syntax as the main document. To apply this
JSON Schema to the peaks collection you created in the previous step, run the
following runCommand() method:
db.runCommand({
  "collMod": "peaks",
  "validator": {
    $jsonSchema: {
      "bsonType": "object",
      "description": "Document describing a mountain peak",
      "required": ["name"],
      "properties": {
        "name": {
          "bsonType": "string",
          "description": "Name must be a string and is required"
        }
      }
    }
  }
})

MongoDB will respond with a success message indicating that the collection was successfully modified:

Output
{ "ok" : 1 }

Following that, MongoDB will no longer allow us to insert documents into the peaks collection if they don’t have a name field. To test this, try inserting a document that fully describes a mountain, aside from the missing name field:

db.peaks.insertOne({
  "height": 8611,
  "location": ["Pakistan", "China"],
  "ascents": {
    "first": { "year": 1954 },
    "first_winter": { "year": 1921 },
    "total": 306
  }
})

This time, the operation will trigger an error message indicating a failed document validation:

Output
WriteError({
  "index" : 0,
  "code" : 121,
  "errmsg" : "Document failed validation",
  ...
})

Note: Starting with MongoDB 5.0, when validation fails the error messages point towards the failed
constraint. In MongoDB 4.4 and earlier, the database provides no further details on the failure reason.

Step 3 — Validating Number Fields

When no validation is applied, MongoDB will accept any value for the height field, even values that
don't make any sense for it, like negative values, as long as the inserted document is written in valid
JSON syntax. To work around this, you can extend the schema validation document from the previous
step to include additional rules regarding the height field.

Start by ensuring that the height field is always present in newly-inserted documents and that it’s
always expressed as a number. Modify the schema validation with the following command:
db.runCommand({
  "collMod": "peaks",
  "validator": {
    $jsonSchema: {
      "bsonType": "object",
      "description": "Document describing a mountain peak",
      "required": ["name", "height"],
      "properties": {
        "name": { "bsonType": "string", "description": "Name must be a string and is required" },
        "height": { "bsonType": "number", "description": "Height must be a number and is required" }
      }
    }
  }
})

The description field is auxiliary; any description you include should only help other users understand the intention behind the JSON Schema.

The height is now guaranteed to be a number, but a user could still insert a document whose height makes no sense, such as a negative value. To prevent this, you could add a few more properties to the schema validation document. Replace the current schema validation settings by running the following operation:

db.runCommand({
  "collMod": "peaks",
  "validator": {
    $jsonSchema: {
      "bsonType": "object",
      "description": "Document describing a mountain peak",
      "required": ["name", "height"],
      "properties": {
        "name": { "bsonType": "string", "description": "Name must be a string and is required" },
        "height": {
          "bsonType": "number",
          "description": "Height must be a number between 100 and 10000 and is required",
          "minimum": 100, "maximum": 10000
        }
      }
    }
  }
})

The minimum and maximum attributes set constraints on values included in height fields, ensuring they can’t be lower than 100 or higher than 10000.

Step 4 — Validating Array Fields

Now we can turn our attention to the location field to guarantee its data consistency.
As peaks can span more than one country, it would make sense to store each peak’s location data as an
array containing one or more country names instead of just a string value. As with the height values,
making sure each location field’s data type is consistent across every document can help with
summarizing data when using aggregation pipelines.

First, consider some examples of location values that users might enter, and weigh which ones would
be valid or invalid:
 ["Nepal", "China"]: this is a two-element array, and would be a valid value for a mountain spanning
two countries.



 ["Nepal"]: this example is a single-element array, it would also be a valid value for a mountain
located in a single country.
 "Nepal": this example is a plain string. It would be invalid because although it lists a single country,
the location field should always contain an array
 []: an empty array, this example would not be a valid value. After all, every mountain must exist in
at least one country.
 ["Nepal", "Nepal"]: this two-element array would also be invalid, as it contains the same value
appearing twice.
 ["Nepal", 15]: lastly, this two-element array would be invalid, as one of its values is a number
instead of a string and this is not a correct location name.

To ensure that MongoDB will correctly interpret each of these examples as valid or invalid, run the
following operation to create some new validation rules for the peaks collection:

db.runCommand({
"collMod": "peaks",
"validator": {
$jsonSchema: {
"bsonType": "object",
"description": "Document describing a mountain peak",
"required": ["name", "height", "location"],
"properties": {
"name": { "bsonType": "string", "description": "Name must be a string and is required" },
"height": {
"bsonType": "number",
"description": "Height must be a number between 100 and 10000 and is required",
"minimum": 100, "maximum": 10000
},
"location": {
"bsonType": "array",
"description": "Location must be an array of strings",
"minItems": 1,
"uniqueItems": true,
"items": { "bsonType": "string" }
}
},
}
}
})
The minItems property validates that the array must contain at least one element, and
the uniqueItems property is set to true to ensure that elements within each location array will be
unique. This will prevent values like ["Nepal", "Nepal"] from being accepted. Lastly,
the items subdocument defines the validation schema for each individual array item. Here, the only
expectation is that every item within a location array must be a string.
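For instance, under these rules an insert with a plain string location would now be rejected (a minimal sketch; the name and height values are illustrative):

db.peaks.insertOne({
  "name": "K2",
  "height": 8611,
  "location": "Pakistan"   // invalid: location must be an array of strings
})
// The operation fails with "Document failed validation".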
Step 5 — Validating Embedded Documents

At this point, your peaks collection has three fields — name, height and location — that are being kept
in check by schema validation. This step focuses on defining validation rules for the ascents field, which
describes successful attempts at summiting each peak.
In the example document from Step 1 that represents Mount Everest, the ascents field was structured
as follows:
{
  "name": "Everest",
  "height": 8848,
  "location": ["Nepal", "China"],
  "ascents": {
    "first": { "year": 1953 },
    "first_winter": { "year": 1980 },
    "total": 5656
  }
}

The ascents subdocument contains a total field whose value represents the total number of ascent attempts for the given mountain. It also contains information on the first winter ascent of the mountain as well as the first ascent overall.

For now, assume the information you will always want to have in each document is the total number of ascent attempts: the ascents field must always be present and its value must always be a subdocument. This subdocument, in turn, must always contain a total attribute holding a number greater than or equal to zero.

Once again, replace the schema validation document for the peaks collection by running the
following runCommand() method:

db.runCommand({
"collMod": "peaks",
"validator": {
$jsonSchema: {
"bsonType": "object",
"description": "Document describing a mountain peak",
"required": ["name", "height", "location", "ascents"],
"properties": {
"name": { "bsonType": "string", "description": "Name must be a string and is required" },
"height": {
"bsonType": "number",
"description": "Height must be a number between 100 and 10000 and is required",
"minimum": 100, "maximum": 10000 },
"location": {
"bsonType": "array",
"description": "Location must be an array of strings",
"minItems": 1,
"uniqueItems": true,
"items": { "bsonType": "string" }



},
"ascents": {
"bsonType": "object",
"description": "Ascent attempts information",
"required": ["total"],
"properties": {
"total": {
"bsonType": "number",
"description": "Total number of ascents must be 0 or higher",
"minimum": 0
}
}
        }
      },
    }
  }
})

Anytime the document contains subdocuments under any of its fields, the JSON Schema for that field follows the exact same syntax as the main document schema. This makes it straightforward to define complex validation schemas for document structures containing multiple subdocuments in a hierarchical structure.

MongoDB’s schema validation feature should not be considered a replacement for data validation at
the application level, but it can further safeguard against violating data constraints that are essential to
keeping your data meaningful. Using schema validation can be a helpful tool for structuring one’s data
while retaining the flexibility of a schemaless approach to data storage.

We can use the validationLevel option to determine which operations MongoDB applies the validation
rules:
 If the validationLevel is strict (the default), MongoDB applies validation rules to all inserts and
updates.
 If the validationLevel is moderate, MongoDB applies validation rules to inserts and to updates to
existing documents that already fulfill the validation criteria. With the moderate level, updates to
existing documents that do not fulfill the validation criteria are not checked for validity.

For example, create a contacts collection with the following documents:

db.contacts.insert([
{ "_id": 1, "name": "Anne", "phone": "+1 555 123 456", "city": "London", "status": "Complete" },
{ "_id": 2, "name": "Ivan", "city": "Vancouver" }
])
Issue the following command to add a validator to the contacts collection:

db.runCommand( {
collMod: "contacts",
validator: { $jsonSchema: {
bsonType: "object",



required: [ "phone", "name" ],
properties: {
phone: { bsonType: "string", description: "must be a string and is required" },
name: { bsonType: "string", description: "must be a string and is required" }
}
} },
validationLevel: "moderate"
})
The contacts collection now has a validator with the moderate validationLevel:

 If you attempted to update the document with _id of 1, MongoDB would apply the validation rules,
since the existing document matches the criteria.
 In contrast, MongoDB will not apply validation rules to updates to the document with _id of 2, as it
does not meet the validation criteria (it lacks the required phone field).
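A minimal sketch of what this means in practice for the two documents above:

// Checked: document 1 currently satisfies the schema, so this update
// must keep it valid; removing the required name field fails.
db.contacts.update( { "_id": 1 }, { $unset: { "name": "" } } )   // rejected

// Not checked: document 2 never satisfied the schema, so this update proceeds.
db.contacts.update( { "_id": 2 }, { $set: { "city": "Toronto" } } )   // succeeds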

To disable validation entirely, you can set validationLevel to off.

Accept or Reject Invalid Documents

The validationAction option determines how MongoDB handles documents that violate the validation
rules:
 If the validationAction is error (the default), MongoDB rejects any insert or update that violates
the validation criteria.
 If the validationAction is warn, MongoDB logs any violations but allows the insertion or update
to proceed.
For example, create a contacts2 collection with the following JSON Schema validator:

db.createCollection( "contacts2", {
validator: { $jsonSchema: {
bsonType: "object",
required: [ "phone" ],
properties: {
phone: { bsonType: "string", description: "must be a string and is required" },
email: { bsonType : "string", pattern : "@mongodb\.com$",
description: "must be a string and match the regular expression pattern" },
status: { enum: [ "Unknown", "Incomplete" ],
description: "can only be one of the enum values" }
}
} },
validationAction: "warn"
})
With the warn validationAction, MongoDB logs any violations but allows the insertion or update to
proceed.
For example, the following insert operation violates the validation rule:



db.contacts2.insert( { name: "Amanda", status: "Updated" } )

However, since the validationAction is warn only, MongoDB only logs the validation violation message
and allows the operation to proceed:
2017-12-01T12:31:23.738-05:00 W STORAGE [conn1] Document would fail validation collection:
example.contacts2 doc: { _id: ObjectId('5a2191ebacbbfc2bdc4dcffc'), name: "Amanda", status:
"Updated" }

Bypass Document Validation

Users can bypass document validation using the bypassDocumentValidation option. The following commands can bypass validation per operation using this option:
 applyOps command
 findAndModify command and db.collection.findAndModify() method
 mapReduce command and db.collection.mapReduce() method
 insert command
 update command
 $out and $merge stages for the aggregate command and db.collection.aggregate() method

For deployments that have enabled access control, to bypass document validation, the authenticated
user must have bypassDocumentValidation action. The built-in roles dbAdmin and restore provide this
action.

Specify Validation with Query Operators


When we want to create dynamic validation rules that compare multiple field values at runtime, we can
use query operators such as $eq and $gt to compare fields. This is useful, for example, if you have a
field that depends on the value of another field and need to ensure that those values are correctly
proportional to each other.

Restrictions
 We can't specify the following query operators in a validator object:
o $expr with $function expressions
o $near
o $nearSphere
o $text
o $where
 We can't specify schema validation for:
o Collections in the admin, local, and config databases
o System collections



Example

Let’s consider an application that tracks customer orders. Each order has a base total and a GST (tax) rate. The orders collection contains these fields to track the total price: total, GST, and totalWithGST.

We create schema validation with query operators to ensure that totalWithGST matches the
expected combination of total and GST.

 Create a collection with validation.

Create an orders collection with schema validation:


db.createCollection( "orders",
{
validator: {
$expr:
{
$eq: [ "$totalWithGST", { $multiply: [ "$total", { $sum:[ 1, "$GST" ] } ] } ]
}
} With this validation we can only insert documents if the totalWithGST field
}) equals total * (1 + GST).

 Confirm that the validation prevents invalid documents.

The following operation fails because the totalWithGST field does not equal the correct value:
db.orders.insertOne( {
  total: NumberDecimal("141"),
  GST: NumberDecimal("0.20"),
  totalWithGST: NumberDecimal("169")
})

Since 141 * (1 + 0.20) equals 169.2, the value of the totalWithGST field must be 169.2.
The operation returns this error:
MongoServerError: Document failed validation
Additional information: {
failingDocumentId: ObjectId("62bcc9b073c105dde9231293"),
details: {
operatorName: '$expr',
specifiedAs: {
'$expr': {
'$eq': [
'$totalWithGST', { '$multiply': [ '$total', { '$sum': [ 1, '$GST' ] } ] }
]
}
},
reason: 'expression did not match',
expressionResult: false



}
}
Another Example

The following example specifies validator rules using the query expression $or:

db.createCollection( "contacts", Validation occurs during


{ validator: { $or: [ updates and inserts. When you
{ phone: { $type: "string" } }, add validation to a collection,
{ email: { $regex: /@mongodb\.com$/ } }, existing documents do not
{ status: { $in: [ "Unknown", "Incomplete" ] } } undergo validation checks until
] modification.
}
})



Aggregation
 Aggregation operations process data records and return computed results.
 Aggregation operations group values from multiple documents together, and can perform a variety
of operations on the grouped data to return a single result.
 MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce
function, and single purpose aggregation methods.

The aggregate() Method

For aggregation in MongoDB, use the aggregate() method.

Syntax : >db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
Example
In the collection you have the following data:

{
  _id: ObjectId(7df78ad8902c),
  title: 'MongoDB Overview',
  description: 'MongoDB is no sql database',
  by_user: 'tutorials point',
  url: 'http://www.tutorialspoint.com',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 100
},
{
  _id: ObjectId(7df78ad8902d),
  title: 'NoSQL Overview',
  description: 'No sql database is very fast',
  by_user: 'tutorials point',
  url: 'http://www.tutorialspoint.com',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 10
},
{
  _id: ObjectId(7df78ad8902e),
  title: 'Neo4j Overview',
  description: 'Neo4j is no sql database',
  by_user: 'Neo4j',
  url: 'http://www.neo4j.com',
  tags: ['neo4j', 'database', 'NoSQL'],
  likes: 750
}

Now from the above collection, if you want to display a list stating how many tutorials are written by each user, use the following aggregate() method:

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{ "_id" : "tutorials point", "num_tutorial" : 2 }
{ "_id" : "Neo4j", "num_tutorial" : 1 }
>

In this example, we have grouped documents by the field by_user, and for each occurrence of by_user the previous value of the sum is incremented. The equivalent SQL query for this use case would be: select by_user, count(*) from mycol group by by_user.

Following is a list of available aggregation expressions.

$sum : Sums up the defined value from all documents in the collection. If used on a field that contains both numeric and non-numeric values, $sum ignores the non-numeric values and returns the sum of the numeric values. If used on a field that does not exist in any document in the collection, $sum returns 0 for that field. If all operands are non-numeric, $sum returns 0.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])

$avg : Calculates the average of all given values from all documents in the collection.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])

$min : Gets the minimum of the corresponding values from all documents in the collection.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])

$max : Gets the maximum of the corresponding values from all documents in the collection.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])

$push : Inserts the value to an array in the resulting document.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])

$addToSet : Inserts the value to an array in the resulting document but does not create duplicates.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])

$first : Gets the first document from the source documents according to the grouping. Typically this only makes sense together with some previously applied "$sort" stage.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])

$last : Gets the last document from the source documents according to the grouping. Typically this only makes sense together with some previously applied "$sort" stage.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])

Pipeline Concept

In a UNIX shell, a pipeline means executing an operation on some input and using the output as the input for the next command, and so on. MongoDB supports the same concept in its aggregation framework.

There is a set of possible stages, each of which takes a set of documents as input and produces a resulting set of documents (or the final resulting JSON document at the end of the pipeline). The output can then in turn be used as input for the next stage, and so on.

Following are the possible stages in aggregation framework –



 $project − Used to select some specific fields from a collection.
 $match − This is a filtering operation, so it can reduce the number of documents that are given as input to the next stage.
 $group − This does the actual aggregation, as discussed above.
 $sort − Sorts the documents.
 $skip − Makes it possible to skip forward in the list of documents by a given number of documents.
 $limit − Limits the number of documents to look at, by the given number, starting from the current position.
 $unwind − Used to unwind documents that contain arrays. With an array the data is in a sense pre-joined, and this operation undoes that, producing individual documents again. Thus this stage increases the number of documents for the next stage. A short sketch combining several of these stages follows this list.
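As a minimal sketch (reusing the mycol tutorials collection from the example above), several stages can be chained in one call:

db.mycol.aggregate([
   { $match: { tags: "database" } },                                  // keep only documents tagged "database"
   { $group: { _id: "$by_user", totalLikes: { $sum: "$likes" } } },   // total likes per author
   { $sort: { totalLikes: -1 } },                                     // most-liked authors first
   { $limit: 2 }                                                      // keep the top two
])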

Aggregation Pipeline

MongoDB’s aggregation framework is modeled on the concept of data processing pipelines.


Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.
For example:

db.orders.aggregate([
   { $match: { status: "A" } },
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])

First Stage: The $match stage filters the documents by the status field and passes to the next stage those documents that have status equal to "A".

Second Stage: The $group stage groups the documents by the cust_id field to calculate the sum of the amount for each unique cust_id.

 The most basic pipeline stages provide filters that operate like queries and document
transformations that modify the form of the output document.
 Other pipeline operations provide tools for grouping and sorting documents by specific field or
fields as well as tools for aggregating the contents of arrays, including arrays of documents. In
addition, pipeline stages can use operators for tasks such as calculating the average or
concatenating a string.
 The pipeline provides efficient data aggregation using native operations within MongoDB, and is
the preferred method for data aggregation in MongoDB.
 The aggregation pipeline can operate on a sharded collection.



 The aggregation pipeline can use indexes to improve its performance during some of its stages. In
addition, the aggregation pipeline has an internal optimization phase.

Map-Reduce

 MongoDB also provides map-reduce operations to perform aggregation. Map-reduce uses custom JavaScript functions to perform the map and reduce operations, as well as the optional finalize operation.
 The aggregation pipeline provides better performance and a more coherent interface than map-reduce. For examples of aggregation alternatives to map-reduce operations, see Map-Reduce Examples. See also Map-Reduce to Aggregation Pipeline.

Single Purpose Aggregation Operations

 MongoDB also provides db.collection.estimatedDocumentCount(), db.collection.count() and db.collection.distinct().
 All of these operations aggregate documents from a single collection. While these operations provide simple access to common aggregation processes, they lack the flexibility and capabilities of the aggregation pipeline and map-reduce. A short sketch of their use follows.
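For instance, against the mycol collection used earlier (a minimal sketch):

db.mycol.distinct("by_user")                       // [ "tutorials point", "Neo4j" ]
db.mycol.count({ by_user: "tutorials point" })     // 2
db.mycol.estimatedDocumentCount()                  // 3, from collection metadata, no query filter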

Aggregation Pipeline
 Pipeline

The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they
pass through the pipeline. Pipeline stages do not need to produce one output document for every
input document; e.g., some stages may generate new documents or filter out documents.

Pipeline stages can appear multiple times in the pipeline with the exception of $out, $merge, and
$geoNear stages. For a list of all available stages, see Aggregation Pipeline Stages.

MongoDB provides the db.collection.aggregate() method in the mongo shell and the aggregate
command to run the aggregation pipeline.

For example usage of the aggregation pipeline, consider Aggregation with User Preference Data and
Aggregation with the Zip Code Data Set.
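The database command form is equivalent to the shell helper. A minimal sketch using the mycol collection from earlier (note that the cursor document is required when running the command directly):

db.runCommand({
   aggregate: "mycol",                                                       // target collection
   pipeline: [ { $group: { _id: "$by_user", num_tutorial: { $sum: 1 } } } ],
   cursor: { }                                                               // return results via a cursor
})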

Starting in MongoDB 4.2, you can use the aggregation pipeline for updates :
Command           mongo Shell Methods
findAndModify     db.collection.findOneAndUpdate()
                  db.collection.findAndModify()
update            db.collection.updateOne()
                  db.collection.updateMany()
                  db.collection.update()
                  Bulk.find.update()
                  Bulk.find.updateOne()
                  Bulk.find.upsert()
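As a sketch of such a pipeline-style update, reusing the mycol collection from earlier (the popular field here is invented purely for illustration):

db.mycol.updateMany(
   { },                                                       // match every document
   [ { $set: { popular: { $gte: [ "$likes", 100 ] } } } ]     // a pipeline stage computes the new field
)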

From the above commands we can understand the process of aggregate processing. The aggregate() function performs the aggregation, and its pipeline involves three kinds of components: stages, expressions and accumulators.

 db.collection.aggregate() can create a pipeline with multiple components to process a series of documents. These components include: $match for filtering, $project for projection (mapping), $group for grouping, $sort for sorting, $limit for limiting and $skip for skipping.
 db.collection.aggregate() uses MongoDB's built-in native operations, which makes aggregation very efficient. It supports functionality similar to SQL's GROUP BY operation without requiring users to write custom JavaScript routines.
 The pipeline is limited to 100MB of memory per stage. If a stage exceeds this limit, MongoDB will generate an error. In order to be able to process large data sets, you can set allowDiskUse to true, which lets pipeline stages write data to temporary files. This works around the 100MB memory limit.
 The db.collection.aggregate() method returns a cursor, so the results can be operated on directly, like any other cursor in the mongo shell.
 Each document in the output of db.collection.aggregate() is subject to the BSON document size limit of 16MB. Returning a cursor avoids this limit applying to the result set as a whole: since version 2.6, the db.collection.aggregate() method returns a cursor and can therefore return result sets of any size.

Stages: Each stage begins with one of the stage operators:

 $match: Filters the documents; it can reduce the number of documents that are given as input to the next stage.
 $project: Selects some specific fields from a collection.
 $group: Groups documents based on some value.
 $sort: Sorts the documents, i.e. rearranges them.
 $skip: Skips n documents and passes the remaining documents on.
 $limit: Passes only the first n documents, thus limiting them.
 $unwind: Unwinds documents that contain arrays, i.e. it deconstructs an array field in the documents to return a document for each element.
 $out: Writes the resulting documents to a new collection.

Expressions: An expression typically refers to the name of a field in the input documents, e.g. in { $group : { _id : "$id", total: { $sum: "$fare" } } } both "$id" and "$fare" are expressions.
Accumulators: These are basically used in the $group stage.
 $sum: Sums numeric values for the documents in each group.
 $count: Counts the total number of documents.
 $avg: Calculates the average of all given values from all documents.
 $min: Gets the minimum value from all the documents.
 $max: Gets the maximum value from all the documents.
 $first: Gets the first document from the grouping.
 $last: Gets the last document from the grouping.

Aggregate syntax

Basic format: db.collection.aggregate(pipeline, options)


Parameter Description:

Parameter   Type       Description
pipeline    array      A series of data aggregation operations or stages. Changed in version 2.6: the pipeline stages can still be passed as separate parameters instead of as elements in an array; however, if you do not specify the pipeline as an array, you cannot specify the options parameter.
options     document   Optional. Additional options that aggregate() passes to the aggregate command.
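For example, the options document is where the allowDiskUse flag discussed above goes; a minimal sketch using the orders collection from the earlier example:

db.orders.aggregate(
   [ { $group: { _id: "$cust_id", total: { $sum: "$amount" } } } ],   // pipeline, as an array
   { allowDiskUse: true }                                             // options document
)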

Aggregation Pipeline Builder


New in version 1.14.0

The Aggregation Pipeline Builder in MongoDB Compass provides the ability to create aggregation
pipelines to process data. With aggregation pipelines, documents in a collection or view pass through
multiple stages where they are processed into a set of aggregated results. The particular stages and
results can be modified depending on your application's needs.

To access the aggregation pipeline builder, navigate to the collection or view for which you wish to
create an aggregation pipeline and click the Aggregations tab. You are presented with a blank
aggregation pipeline. The Preview of Documents in the Collection section of the Aggregations view
displays 20 documents sampled from the current collection.

To populate the Aggregation Pipeline Builder, we can either:


 Create a new Aggregation Pipeline in the UI,
 Open a saved pipeline, or
 Import a Pipeline from Plain Text

Create an Aggregation Pipeline


Navigate to Aggregation Pipeline

Start the MongoDB instance from the command prompt and start Compass. Select the database connection stored as a favorite (authDBCon). Click Connect as shown in the figure.

The list of databases will be shown. Click on the authDB database. The collections in the database will be shown. Click on the users collection. The collection window has various tabs at the top. Click on the Aggregations



tab, and the Aggregation pipeline pane will appear as shown below. This pane will be used to add pipeline stages.



Add an aggregation pipeline stage

In the aggregation pipeline pane in the bottom-left of the view, click the Select... dropdown and select the aggregation pipeline stage to use for the first stage of the pipeline:



Fill in the pipeline stage.

Fill in your selected stage. As you modify the pipeline stage, the preview documents shown in the pane
to the right of the stage update automatically to reflect the results of your pipeline as it progresses,
provided Auto Preview is enabled:

As shown below, we have added a $match stage to select all the documents where country = 'Mexico' and base = 'DWC'. The documents satisfying the expression will be displayed in the right pane.
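The stage entered in the builder would look something like the following (a sketch; the screenshot itself is not reproduced here):

{ $match: { country: "Mexico", base: "DWC" } }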

Add additional pipeline stages.

Additional aggregation stages can be added similarly as desired by clicking the Add Stage button below
your last aggregation stage. Repeat above two operation to define the additional stage.

NOTE : The toggle to the right of the name of each pipeline stage dictates whether that stage is
included in the pipeline. Toggling a pipeline stage also updates the pipeline preview, which reflects
whether or not that stage is included.

EXAMPLE : The following pipeline excludes the first $match stage and then includes the $project stage:


Limitations : The $out stage is not available if you are connected to a Data Lake.



Specify Custom Collation

Use custom collation to specify language-specific rules for string comparison, such as rules for letter case and accent marks.

To specify a custom collation:
1. Click the Collation button at the top of the pipeline builder.
2. Enter your collation document.

Save a Pipeline

You can save a pipeline so that you can access it later. If you load a saved pipeline, you can modify it without changing the original saved copy. You can also create a view from your pipeline results.

To save your pipeline:
1. Click the Save button at the top of the pipeline builder.
2. Enter a name for your pipeline.
3. Click Save.

Create a View from Pipeline Results


NOTE : Creating a view from pipeline results does not save the pipeline itself.
To create a view from your pipeline results:
1. Click the arrow next to the Save button at the top
of the pipeline builder.
2. Click Create a View.
3. Enter a name for your view.
4. Click Create.

Compass creates a view from your pipeline results in the same database where the pipeline was
created.

Open a Saved Pipeline


1. Click the Folder icon at the top left of the pipeline builder.
2. Hover over the pipeline you want to open and click Open.
3. In the modal, click Open Pipeline.

Pipeline Options
The toggles at the top of the pipeline builder control the following options:
Option         Description
Sample Mode    (Recommended) When enabled, limits input documents passed to $group, $bucket, and $bucketAuto stages. Set the document limit with the Limit setting.
Auto Preview   When enabled, Compass automatically updates the preview documents pane to reflect the results of each active stage as the pipeline progresses.



Pipeline Settings

To view and modify pipeline settings:


1. Click the gear icon at the top right of the pipeline builder to open the Settings panel.
2. Modify the pipeline settings shown in the panel.
3. Click Apply to save changes and close the Settings panel.

Example
The following example walks through creating and executing a pipeline for a collection containing
airline data. You can download this dataset from the following link: air_airlines.json.
For instructions on importing JSON data into your cluster, see mongoimport. This procedure assumes
you have the data in the example.air_airlines namespace.



Create the Pipeline

The following pipeline has two aggregation stages: $group and $match.

1. The $group stage groups documents by their active status and country. The stage also adds a new
field called flightCount containing the number of documents in each group.
2. The $match stage filters documents to only return those with a flightCount value greater than or
equal to 5.



Copy the Pipeline to the Clipboard

1. Click the Export to Language button at the top of the pipeline builder. The button opens the
following modal:

2. Click the Copy button in the My Pipeline panel on the left. This button copies your pipeline to
your clipboard in mongo shell syntax. You will use the pipeline in the following section.



Execute the Pipeline

1. Launch and connect to a mongod instance which contains the imported air_airlines.json dataset.
2. Switch to the example database where the air_airlines collection exists:
   use example
3. Run the following command to execute the pipeline created in Compass:

db.air_airlines.aggregate([{
   $group: {
      _id: {
         active: '$active',
         country: '$country'
      },
      flightCount: {
         $sum: 1
      }
   }
}, {
   $match: {
      flightCount: {
         $gte: 5
      }
   }
}])

The following is an excerpt of the documents returned by the pipeline:

{ "_id" : { "active" : "Y", "country" : "Nigeria" }, "flightCount" : 5 }
{ "_id" : { "active" : "N", "country" : "Switzerland" }, "flightCount" : 46 }
{ "_id" : { "active" : "N", "country" : "Bahrain" }, "flightCount" : 8 }
{ "_id" : { "active" : "N", "country" : "Guinea-Bissau" }, "flightCount" : 8 }
{ "_id" : { "active" : "N", "country" : "Argentina" }, "flightCount" : 14 }
{ "_id" : { "active" : "N", "country" : "Moldova" }, "flightCount" : 17 }
{ "_id" : { "active" : "Y", "country" : "Israel" }, "flightCount" : 6 }
{ "_id" : { "active" : "N", "country" : "Finland" }, "flightCount" : 7 }

Import Pipeline from Text


We can import aggregation pipelines from plain text into the Aggregation Pipeline Builder to easily modify and execute them. Importing a plain text aggregation pipeline shows how each stage of the pipeline affects the output, and illustrates the effects of modifying specific pipeline stages using the Pipeline Builder's Output panes.

Syntax
The imported pipeline must be in the MongoDB query language (i.e. the same syntax used as the
pipeline parameter of the db.collection.aggregate() method). The imported pipeline must be an array,
even if there is only one stage in the pipeline.

Procedure

 Open the Aggregation Pipeline Builder for the desired collection

Navigate to the collection for which you wish to import your aggregation pipeline. Click the
Aggregations tab.



 Open the New Pipeline From Plain Text dialog

a. Click the arrow next to the icon at the top of the pipeline builder.
b. Click New Pipeline From Text.

 Enter your pipeline in the dialog

If you have a pre-written pipeline you wish to import into the Aggregation Pipeline Builder, copy it to
your clipboard and paste it into the New Pipeline from Plain Text dialog. Otherwise, type your pipeline
in the input.

 Click Create New


 Click Confirm to import your pipeline

Once you import your pipeline, you can add and modify individual stages and see the results reflected
in the Output of each respective stage.



Introduction to the common pipeline stages of aggregate

$match

 Filter the documents, and only the documents that meet the specified conditions are passed to the
next pipeline stage.
 $match accepts a document with the specified query criteria. The query syntax is the same as the read operation query syntax. Syntax : { $match: { <query> } }
 $match can use all conventional query operators except geospatial ones. In practical applications, try to put $match early in the pipeline. This has two advantages: first, it quickly filters out unnecessary documents, saving time and reducing the pipeline's workload; second, if $match is executed before projection and grouping, the query can use indexes.

Restrictions:
 You cannot use $where in a $match query as part of an aggregation pipeline.
 To use $text in the $match stage, the $match stage must be the first stage of the pipeline.
 Views do not support text search.
Example

Sample data:

{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }


{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b257"), "author" : "ahn", "score" : 60, "views" : 1000 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b258"), "author" : "li", "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b259"), "author" : "annT", "score" : 60, "views" : 50 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25a"), "author" : "li", "score" : 94, "views" : 999 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25b"), "author" : "ty", "score" : 95, "views" : 1000 }

1. Use $match to do simple matching query
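The command itself was lost in conversion; a query consistent with the sample data and the output below (assuming the documents are stored in a collection named articles, as in the later examples) is:

db.articles.aggregate( [ { $match : { author : "dave" } } ] )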


Output :

/* 1 */
{
"_id" : ObjectId("512bc95fe835e68f199c8686"),
"author" : "dave", "score" : 80, "views" : 100
}
/* 2 */
{
"_id" : ObjectId("512bc962e835e68f199c8687"),
"author" : "dave", "score" : 85, "views" : 521
}



2. Use the $match pipeline to select the documents to be processed, and then output the results to
the $group pipeline to calculate the document count:
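The command was again lost in conversion; a pipeline consistent with the sample data and the output below (same assumed articles collection) is:

db.articles.aggregate( [
   { $match: { $or: [ { score: { $gt: 70, $lt: 90 } }, { views: { $gte: 1000 } } ] } },
   { $group: { _id: null, count: { $sum: 1 } } }
] )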
Output :
/* 1 */
{
"_id" : null,
"count" : 5.0
}
$count

Returns the count of the documents input to the stage. It can be understood as the count of documents that would match a find() query on the collection or view: like db.collection.count(), it does not actually perform the find() operation, but counts and returns the number of results that would match the query.

Example:
Sample data:

{ "_id" : 1, "subject" : "History", "score" : 88 }


{ "_id" : 2, "subject" : "History", "score" : 92 }
{ "_id" : 3, "subject" : "History", "score" : 97 }
{ "_id" : 4, "subject" : "History", "score" : 71 }
{ "_id" : 5, "subject" : "History", "score" : 79 }
{ "_id" : 6, "subject" : "History", "score" : 83 }
Implementation:
1) The $match stage excludes documents with a score of 80 or less and passes the documents with a score greater than 80 to the next stage.
2) The $count stage returns the count of the remaining documents in the aggregation pipeline and assigns the value to a field named passing_scores.
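The command was lost in conversion; a pipeline consistent with the sample data and this description (assuming the documents are in a collection named scores) is:

db.scores.aggregate( [
   { $match: { score: { $gt: 80 } } },     // 1) keep documents with score above 80
   { $count: "passing_scores" }            // 2) count what is left
] )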

Implementation results:

Output :
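{ "passing_scores" : 4 }

(Under the pipeline sketched above, four documents, with scores 88, 92, 97 and 83, pass the $match stage.)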



$group
The $group stage separates documents into groups according to a "group key". The output is one
document for each unique group key and the documents of each different grouping are output to the
next stage.

A group key is often a field, or group of fields. The group key can also be the result of an expression.
Use the _id field in the $group pipeline stage to set the group key. The output documents can also
contain additional fields that are set using accumulator expressions.

NOTE : $group does not order its output documents.


The $group stage has the following prototype form:

{
  $group:
    {
      _id: <expression>, // Group key
      <field1>: { <accumulator1> : <expression1> },
      ...
    }
}

_id : Required. The _id expression specifies the group key. If you specify an _id value of null, or any other constant value, the $group stage returns a single document that aggregates values across all of the input documents.

field : Optional. Computed using the accumulator operators.

The _id and the accumulator operators can accept any valid expression.

Accumulator operator

A list of some accumulator operators is given below (with the closest SQL analogy in parentheses where one exists):

$avg (avg) : Calculates the mean value. Ignores non-numeric values. Changed in version 5.0: available in the $setWindowFields stage.
$first (limit 0,1) : Returns the first document of each group. If the input is sorted, that order is used; if not, it is the first document in the default stored order. Changed in version 5.0: available in the $setWindowFields stage.
$firstN : Returns an aggregation of the first n elements within a group. Only meaningful when documents are in a defined order. Distinct from the $firstN array operator. New in version 5.2: available in $group, expressions and the $setWindowFields stage.
$last : Returns the last document of each group. If the input is sorted, that order is used; if not, it is the last document in the default stored order.
$max (max) : According to the grouping, gets the maximum value of the corresponding field of all documents in the collection.
$min (min) : According to the grouping, gets the minimum value of the corresponding field of all documents in the collection.
$push : Adds the value of the specified expression to an array.
$addToSet : Adds the value of an expression to a set (no duplicate values, unordered).
$sum (sum) : Calculates the sum.
$stdDevPop : Returns the population standard deviation of the input values.
$stdDevSamp : Returns the sample standard deviation of the input values.

The memory limit of the $group stage is 100MB. By default, if the stage exceeds this limit, $group will generate an error. However, to allow processing of large datasets, set the allowDiskUse option to true to enable the $group operation to write temporary files.

Example:
Sample data:

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }


{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }

1. The following aggregation operation uses the $group stage to group documents by month, day and year, calculate the total price and average quantity, and count the number of documents in each group:
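The command was lost in conversion; a pipeline consistent with the sample data and the results shown below (using the same test collection naming as the later examples) is:

db.getCollection('test').aggregate( [
   {
      $group: {
         _id: { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },
         totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
         averageQuantity: { $avg: "$quantity" },
         count: { $sum: 1 }
      }
   }
] )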

return:
/* 1 */
{
"_id" : { "month" : 4, "day" : 4, "year" : 2014 },
"totalPrice" : 200, "averageQuantity" : 15.0, "count" : 2.0
}
/* 2 */
{
"_id" : { "month" : 3, "day" : 15, "year" : 2014 },
"totalPrice" : 50, "averageQuantity" : 10.0, "count" : 1.0
}

/* 3 */
{
"_id" : { "month" : 3, "day" : 1, "year" : 2014 },
"totalPrice" : 40, "averageQuantity" : 1.5, "count" : 2.0
}

2. Group by null: the following aggregation operation specifies a group _id of null, calculating the total price, average quantity and count of all documents in the collection:
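A pipeline consistent with that description (the original command image is missing) would be:

db.getCollection('test').aggregate( [
   {
      $group: {
         _id: null,
         totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
         averageQuantity: { $avg: "$quantity" },
         count: { $sum: 1 }
      }
   }
] )
// expected result for the sample data:
// { "_id" : null, "totalPrice" : 290, "averageQuantity" : 8.6, "count" : 5 }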

3. Query distinct values

The following aggregation uses the $group stage to group documents by item, thereby retrieving the distinct item values:

db.getCollection('test').aggregate([
   { $group : { _id : "$item" } }
])

Output :
{ "_id" : "xyz" }
{ "_id" : "jkl" }
{ "_id" : "abc" }

4. Data conversion

 Group the data in the collection by price and convert the item values into an array.

The returned _id value is the field specified in the group key. The items field can be given any name; it holds the grouped list. $push creates an array and stores part or all of each grouped source document in it.

db.getCollection('test').aggregate( [
   { $group : { _id : "$price", items: { $push : "$item" } } }
])

Output :
{ "_id" : 5, "items" : [ "xyz", "xyz" ] }
{ "_id" : 20, "items" : [ "jkl" ] }
{ "_id" : 10, "items" : [ "abc", "abc" ] }

 Aggregate using the utility variable $$ROOT to group documents by item. The $$ROOT variable contains the source document of the group; to pass the documents through unmodified, $push $$ROOT into the output of the group. Here we group on item and insert the whole documents into an array field 'books'. The generated document cannot exceed the BSON document size limit.
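A pipeline matching the output below (the original command image is missing) would be:

db.getCollection('test').aggregate( [
   { $group: { _id: "$item", books: { $push: "$$ROOT" } } }
] )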

return:

{
"_id" : "xyz",
"books" : [
{
"_id" : 3,
"item" : "xyz", "price" : 5, "quantity" : 10,
"date" : ISODate("2014-03-15T09:00:00.000Z")
},
{
"_id" : 4,
"item" : "xyz", "price" : 5, "quantity" : 20,
"date" : ISODate("2014-04-04T11:21:39.736Z")
}
]
}

{
"_id" : "jkl",



"books" : [
{
"_id" : 2,
"item" : "jkl", "price" : 20, "quantity" : 1,
"date" : ISODate("2014-03-01T09:00:00.000Z")
}
]
}
{
"_id" : "abc",
"books" : [
{
"_id" : 1,
"item" : "abc", "price" : 10, "quantity" : 2,
"date" : ISODate("2014-03-01T08:00:00.000Z")
},
{
"_id" : 5,
"item" : "abc", "price" : 10, "quantity" : 10,
"date" : ISODate("2014-04-04T21:23:13.331Z")
}
]
}
$unwind

This operator deconstructs the array field from the input document to output the document for each
element. In short, it is used to split an array into separate documents.

Syntax: You can pass a field path operand or a document operand to unwind an array field.

Field Path Operand

 You can pass the array field path to $unwind. When using this syntax, $unwind does not output a document if the field value is null, missing, or an empty array.

{ $unwind: <field path> }

 When you specify the field path, prefix the field name with a dollar sign $ and enclose it in quotes.

Document Operand with Options

 You can pass a document to $unwind to specify various behavior options.

{
  $unwind:
    {
      path: <field path>,
      includeArrayIndex: <string>,
      preserveNullAndEmptyArrays: <boolean>
    }
}

 includeArrayIndex : Optional. The name of a new field to hold the array index of the element. The name cannot start with a dollar sign $.
 preserveNullAndEmptyArrays : Optional. The default value is false.
   o If true, and the path is null, missing, or an empty array, $unwind outputs the document.
   o If false, and the path is null, missing, or an empty array, $unwind does not output a document.

Example:

1. Sample data : { "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] }

The following aggregation uses $unwind to output a document for each element in the sizes array:

db.getCollection('test').aggregate(
   [ { $unwind : "$sizes" } ]
)

return:
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }

Each output document is the same as the input document, except that the value of the sizes field is one of the values from the original sizes array.

2. Take the following example data:

{ "_id" : 1, "item" : "ABC", "sizes": [ "S", "M", "L"] }


{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }

The following $unwind operation uses the include array index option to output the array index of an
array element.

db.getCollection('test').aggregate( [
{ $unwind: { path: "$sizes", includeArrayIndex: "arrayIndex" } }
])

return:
{ “_id” : 1, “item” : “ABC”, “sizes” : “S”, “arrayIndex” : NumberLong(0) }
{ “_id” : 1, “item” : “ABC”, “sizes” : “M”, “arrayIndex” : NumberLong(1) }
{ “_id” : 1, “item” : “ABC”, “sizes” : “L”, “arrayIndex” : NumberLong(2) }
{ “_id” : 3, “item” : “IJK”, “sizes” : “M”, “arrayIndex” : null }



The following $unwind operation uses the preserveNullAndEmptyArrays option to include in the output the documents where the sizes field is missing, null, or an empty array.

db.inventory.aggregate( [
{ $unwind: { path: "$sizes", preserveNullAndEmptyArrays: true } }
])
return:
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 2, "item" : "EFG" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }

$project

 $project can select the desired fields and drop the unwanted fields from the document.
 The specified field can be an existing field from an input document or a new calculated field.
 It can also perform some complex operations through pipeline expressions, such as mathematical
operations, date operations, string operations, and logical operations.

Syntax : { $project: { <specification(s)> } }

The $project pipeline stage is used to select fields (include fields, add fields, hide fields such as _id: 0, exclude fields, etc.), to rename fields and to derive fields.

Specifications come in the following forms:

<field>: <1 or true>    Whether to include the field; field: 1/0 indicates selecting / not selecting the field.
_id: <0 or false>       Suppresses the _id field.
<field>: <expression>   Adds a new field or resets the value of an existing field. Changed in version 3.6: MongoDB 3.6 adds the variable REMOVE. If the expression evaluates to $$REMOVE, the field is excluded from the output.
<field>: <0 or false>   New function in v3.4: specifies a field to exclude.
 By default the _id field is included in the output documents. To include any other field of the input documents in the output documents, you must explicitly specify the inclusion in $project. If you specify the inclusion of a field that does not exist in the document, $project ignores that field inclusion and does not add the field to the document.
 To add a new field or reset the value of an existing field, specify the field name and set its value to an expression.
 To set a field value directly to a number or Boolean literal, rather than to an expression that resolves to a literal, use the $literal operator. Otherwise, $project treats the number or Boolean literal as a flag to include or exclude the field.



 We can effectively rename a field by specifying a new field and setting its value to the field path of an existing field.
 The $project stage supports creating new array fields directly using square brackets. If the array specification contains a field that does not exist in the document, the operation substitutes null as the value of that field.
 We can use dot notation when projecting or adding / resetting a field embedded in a document. For example: "contact.address.country": <1 or 0 or expression> or
   contact: { address: { country: <1 or 0 or expression> } }

Example:

Sample data:
{
"_id" : 1,
title: "abc123", isbn: "0001122223334", author: { last: "zzz", first: "aaa" },
copies: 5, lastModified: "2016-07-28"
}

1. The output documents of the following $project stage contain only the _id, title and author fields:

db.getCollection('test').aggregate( [ { $project : { title : 1 , author : 1 } } ] )


return:
{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

To exclude the _id field from the output documents of the $project stage, set it to 0.

db.getCollection('test').aggregate( [ { $project : { _id: 0, title : 1 , author : 1 } } ] )


return:
{ "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

2. Exclude fields from nested documents. In the following $project stage, the author.first and lastModified fields are excluded from the output:

db.test.aggregate( [ { $project : { "author.first" : 0, "lastModified" : 0 } } ] )

Alternatively, we can nest exclusion specifications in a document:

db.test.aggregate( [ { $project: { "author": { "first": 0}, "lastModified" : 0 } } ] )


return:
{
   "_id" : 1,
   "title" : "abc123", "isbn" : "0001122223334",
   "author" : { "last" : "zzz" }, "copies" : 5
}



Starting with MongoDB 3.6, we can use the variable REMOVE in aggregation expressions to conditionally exclude a field.

Sample data:
{
"_id" : 1,
title: "abc123", isbn: "0001122223334",
author: { last: "zzz", first: "aaa" }, copies: 5, lastModified: "2016-07-28"
}
{
"_id" : 2,
title: "Baked Goods", isbn: "9999999999999",
author: { last: "xyz", first: "abc", middle: "" }, copies: 2, lastModified: "2017-07-21"
}
{
"_id" : 3,
title: "Ice Cream Cakes", isbn: "8888888888888",
author: { last: "xyz", first: "abc", middle: "mmm" }, copies: 5, lastModified: "2017-07-22"
}
3. The following $project stage uses the $$REMOVE variable to exclude the author.middle field if it is equal to "":

db.books.aggregate( [
   {
      $project: {
         title: 1,
         "author.first": 1,
         "author.last" : 1,
         "author.middle": {
            $cond: {
               if: { $eq: [ "", "$author.middle" ] },
               then: "$$REMOVE",
               else: "$author.middle"
            }
         }
      }
   }
] )

return:
{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }
{ "_id" : 2, "title" : "Baked Goods", "author" : { "last" : "xyz", "first" : "abc" } }
{ "_id" : 3, "title" : "Ice Cream Cakes", "author" : { "last" : "xyz", "first" : "abc", "middle" : "mmm" } }
Include specified fields from an embedded document (the results return only the specified fields of the nested documents, along with the _id field):

Sample document:

{ _id: 1, user: "1234", stop: { title: "book1", author: "xyz", page: 32 } }


{ _id: 2, user: "7890", stop: [ { title: "book2", author: "abc", page: 5 },
{ title: "book3", author: "ijk", page: 100 } ] }
Only the title field in the stop field is returned:



db.bookmarks.aggregate( [ { $project: { "stop.title": 1 } } ] )
or
db.bookmarks.aggregate( [ { $project: { stop: { title: 1 } } } ] )
return:
{ "_id" : 1, "stop" : { "title" : "book1" } }
{ "_id" : 2, "stop" : [ { "title" : "book2" }, { "title" : "book3" } ] }

Include calculated fields

Sample data:
{
"_id" : 1,
title: "abc123", isbn: "0001122223334",
author: { last: "zzz", first: "aaa" }, copies: 5
}
Here the isbn value is split into parts, and lastName and copiesSold are added as derived fields:

db.books.aggregate( [
   {
      $project: {
         title: 1,
         isbn: {
            prefix: { $substr: [ "$isbn", 0, 3 ] },
            group: { $substr: [ "$isbn", 3, 2 ] },
            publisher: { $substr: [ "$isbn", 5, 4 ] },
            title: { $substr: [ "$isbn", 9, 3 ] },
            checkDigit: { $substr: [ "$isbn", 12, 1] }
         },
         lastName: "$author.last",
         copiesSold: "$copies"
      }
   }
] )

Return result :
{
   "_id" : 1,
   "title" : "abc123",
   "isbn" : {
      "prefix" : "000",
      "group" : "11",
      "publisher" : "2222",
      "title" : "333",
      "checkDigit" : "4"
   },
   "lastName" : "zzz",
   "copiesSold" : 5
}
Project a new array field

Sample data:

{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "x" : 1, "y" : 1 }

The following aggregation operation returns the new array field myArray:



db.collection.aggregate( [ { $project: { myArray: [ "$x", "$y" ] } } ] )
return:
{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "myArray" : [ 1, 1 ] }

If the returned array contains fields that do not exist, null will be returned:
db.collection.aggregate( [ { $project: { myArray: [ "$x", "$y", "$someField" ] } } ] )

return:
{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "myArray" : [ 1, 1, null ] }

$limit

Limits the number of documents passed to the next stage in the pipeline.

Syntax : { $limit: <positive integer> }

Example:
db.article.aggregate(
   { $limit : 5 }
);

This operation returns only the first five documents passed to it by the pipeline. $limit has no effect on the content of the documents it passes.

$skip

Skips the specified number of documents entering the stage and passes the remaining documents to the next stage in the pipeline.

Syntax : { $skip: <positive integer> }

Example:
db.article.aggregate(
   { $skip : 5 }
);

This operation skips the first five documents passed to it by the pipeline. $skip has no effect on the content of the documents passed along the pipeline.

$sort
Sort all input documents and return them to the pipeline in sort order.

Syntax: { $sort: { <field_1>: <sort order>, <field_2>: <sort order> ... } }

$sort specifies the fields to sort on and the corresponding sort order for the documents. The sort order can have one of the following values:
 1 specifies ascending order.
 -1 specifies descending order.
 { $meta: "textScore" } sorts by the computed textScore metadata in descending order.

Example:
To sort fields, set the sort order to 1 or – 1 to specify ascending or descending sort, respectively, as
shown in the following example:

db.users.aggregate( [ { $sort : { age : -1, posts: 1 } } ])



When comparing values of different bson types, mongodb uses the following comparison order, from
lowest to highest:
1 MinKey (internal type)
2 Null
3 Numbers (ints, longs, doubles, decimals)
4 Symbol, String
5 Object
6 Array
7 BinData
8 ObjectId
9 Boolean
10 Date
11 Timestamp
12 Regular Expression
13 MaxKey (internal type)

We will discuss metadata sort after $sortByCount.

$sortByCount

Groups incoming documents based on the value of a specified expression, then computes the count
of documents in each distinct group.
 Each output document contains two fields: an _id field containing the distinct grouping value,
and a count field containing the number of documents belonging to that grouping or category.
 The documents are sorted by count in descending order.

Syntax : { $sortByCount: <expression> }

The expression is the expression to group by. We can specify any expression except a document literal. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes. For example, to group by the field employee, specify "$employee" as the expression:

{ $sortByCount: "$employee" }

The $sortByCount stage is equivalent to the following $group + $sort sequence:


{ $group: { _id: <expression>, count: { $sum: 1 } } },
{ $sort: { count: -1 } }

Example

Consider a collection exhibits with the following documents:


{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"tags" : [ "painting", "satire", "Expressionism", "caricature" ] }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"tags" : [ "woodcut", "Expressionism" ] }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "tags" : [ "oil", "Surrealism", "painting" ] }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"tags" : [ "woodblock", "ukiyo-e" ] }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"tags" : [ "Surrealism", "painting", "oil" ] }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"tags" : [ "oil", "painting", "abstract" ] }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893,
"tags" : [ "Expressionism", "painting", "oil" ] }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "tags" : [ "abstract", "painting" ] }

The following operation unwinds the tags array and uses the $sortByCount stage to count the number
of documents associated with each tag:

db.exhibits.aggregate( [ { $unwind: "$tags" }, { $sortByCount: "$tags" } ] )

The operation returns the following documents, sorted in descending order by count:
{ "_id" : "painting", "count" : 6 }
{ "_id" : "oil", "count" : 4 }
{ "_id" : "Expressionism", "count" : 3 }
{ "_id" : "Surrealism", "count" : 2 }
{ "_id" : "abstract", "count" : 2 }
{ "_id" : "woodblock", "count" : 1 }
{ "_id" : "woodcut", "count" : 1 }
{ "_id" : "ukiyo-e", "count" : 1 }
{ "_id" : "satire", "count" : 1 }
{ "_id" : "caricature", "count" : 1 }

Text Score Metadata Sort


Suppose we have a collection called posts with the following documents:

{
   "_id" : 1, "title" : "Web",
   "body" : "Create a funny website with these three easy steps...",
   "date" : "2021-01-01T00:00:00.000Z"
}
{
   "_id" : 2, "title" : "Animals",
   "body" : "Animals are funny things...",
   "date" : ISODate("2020-01-01T00:00:00Z")
}
{
   "_id" : 3, "title" : "Oceans",
   "body" : "Oceans are wide and vast, but definitely not funny...",
   "date" : ISODate("2021-01-01T00:00:00Z")
}

Notice that the first date field contains a date string, whereas the other two documents use a Date object. The date string contains exactly the same date as document 3, and this date is a later date than the date in document 2.
We can use the sort() method to sort on the values of a calculated metadata field. Here the posts collection is sorted using the "textScore" metadata. The field name within the sort() method can be arbitrary since the query system ignores the field name.

"textScore" is metadata which returns the score associated with the corresponding $text query for each matching document. The text score signifies how well the document matched the search term or terms. We can use the { $meta: "textScore" } argument to sort by descending relevance score when using $text searches.

db.posts.find(
   { $text: { $search: "funny" } },
   { score: { $meta: "textScore" }}
).sort({ score: { $meta: "textScore" } }
).pretty()

From MongoDB 4.4, the { score: { $meta: "textScore" }} projection line is optional. Omitting it will omit the score field from the results. So we can do the following (from MongoDB 4.4):

db.posts.find(
   { $text: { $search: "funny" } }
).sort({ score: { $meta: "textScore" } }
).pretty()

Result: sorted by { $meta: "textScore" }.

{
"_id" : 2, "title" : "Animals", "body" : "Animals are funny things...",
"date" : ISODate("2020-01-01T00:00:00Z"), "score" : 0.6666666666666666
}
{
"_id" : 3, "title" : "Oceans", "body" : "Oceans are wide and vast, but definitely not funny...",
"date" : ISODate("2021-01-01T00:00:00Z"), "score" : 0.6
}
{
"_id" : 1, "title" : "Web", "body" : "Create a funny website with these three easy steps...",
"date" : "2021-01-01T00:00:00.000Z", "score" : 0.5833333333333334
}
Note : Doing $text searches like this requires that we have created a text index. If not, an IndexNotFound error will be returned.

To understand the metadata sort, let us discuss text-based search in the MongoDB engine (it behaves like a web search engine).

Sorting the Metadata


In MongoDB, we can sort on metadata values using the sort() method. Let us discuss this with the help of examples. Doing $text searches and metadata sorts requires that we have created a text index; if not, an IndexNotFound error will be returned.



SAMPLE DATA

db.recipes.insertMany([
{"name": "Cafecito", "description": "A sweet and rich Cuban hot coffee made by topping an
espresso shot with a thick sugar cream foam."},
{"name": "New Orleans Coffee", "description": "Cafe Noir from New Orleans is a spiced, nutty
coffee made with chicory."},
{"name": "Affogato", "description": "An Italian sweet dessert coffee made with fresh-brewed
espresso and vanilla ice cream."},
{"name": "Maple Latte", "description": "A wintertime classic made with espresso and steamed
milk and sweetened with some maple syrup."},
{"name": "Pumpkin Spice Latte", "description": "It wouldn't be autumn without pumpkin spice
lattes made with espresso, steamed milk, cinnamon spices, and pumpkin puree."}
])

Creating a Text Index

 To start using MongoDB’s full-text search capabilities, we must create a text index on a collection.
 A text index is a special type of index used to further facilitate searching fields containing text data.
When a user creates a text index, MongoDB will automatically drop any language-specific stop
words from searches. This means that MongoDB will ignore the most common words for the given
language (in English, words like “a”, “an”, “the”, or “this”).
 MongoDB will also implement a form of suffix-stemming in searches. This involves MongoDB
identifying the root part of the search term and treating other grammar forms of that root (created
by adding common suffixes like “-ing”, “-ed”, or perhaps “-er”) as equivalent to the root for the
purposes of the search.
 You can only create one text index for any given MongoDB collection, but the index can be created
using more than one field.

In our example collection, there is useful text stored in both the name and description fields of each
document. It could be useful to create a text index for both fields.

Run the following createIndex() method, which will create a text index for the two fields:

db.recipes.createIndex({ "name": "text", "description": "text" });

Searching for One or More Individual Words

 Perhaps the most common search problem is to look up documents containing one or more
individual words.
 Typically, users expect the search engine to be flexible in determining where the given search
terms should appear. As an example, if you were to use any popular web search engine and type in
“coffee sweet spicy”, you likely are not expecting results that will contain those three words in that



exact order. It’s more likely that you’d expect a list of web pages containing the words “coffee”,
“sweet”, and “spicy” but not necessarily immediately near each other.

That’s also how MongoDB approaches typical search queries when using text indexes.

Here we outlines how MongoDB interprets search queries with a few examples.

Example : 1

We want to search for coffee drinks with spices in their recipe, so we search for the word spiced alone
using the following command:

db.recipes.find({ $text: { $search: "spiced" } });

Notice that the syntax when using full-text search is slightly different from regular queries. Individual field names, like name or description, don't appear in the filter document. Instead, the query uses the $text operator, telling MongoDB that this query intends to use the text index we created previously. We don't need to be any more specific than that, because a collection may only have a single text index.

After running this command, MongoDB produces the following list of documents:

Output
{ "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte", "description" : "It
wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon
spices, and pumpkin puree." }
{ "_id" : ObjectId("61895d2787f246b334ece912"), "name" : "New Orleans Coffee", "description" :
"Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory." }

There are two documents in the result set, both of which contain words resembling the search query. While the New Orleans Coffee document does have the word spiced in the description, the Pumpkin Spice Latte document doesn't.

As MongoDB uses stemming, it stripped the word spiced down to just spice, looked up spice in the index, and stemmed the indexed words as well. Because of this, the words spice and spices in the Pumpkin Spice Latte document matched the search query successfully, even though you didn't search for either of those words specifically.

Example : 2

Look up documents with a two-word query, spiced espresso, to look for a spicy, espresso-based coffee.

db.recipes.find({ $text: { $search: "spiced espresso" } });



Output
{ "_id" : ObjectId("61895d2787f246b334ece914"), "name" : "Maple Latte", "description" : "A
wintertime classic made with espresso and steamed milk and sweetened with some maple syrup." }
{ "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian
sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream." }
{ "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and
rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam." }
{ "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte", "description" : "It
wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon
spices, and pumpkin puree." }
{ "_id" : ObjectId("61895d2787f246b334ece912"), "name" : "New Orleans Coffee", "description" :
"Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory." }

When using multiple words in a search query, MongoDB performs a logical OR operation, so a
document only has to match one part of the expression to be included in the result set. The results
contain documents containing both spiced and espresso or either term alone. Notice that words do not
necessarily need to appear near each other as long as they appear in the document somewhere.

Scoring the Results and Sorting By Score

When a query, especially a complex one, returns multiple results, some documents are likely to be a
better match than others. For example, when you look for spiced espresso drinks, those that are both
spiced and espresso-based are more fitting than those without spices or not using espresso as the
base.
Full-text search engines typically assign a relevance score to the search results, indicating how well
they match the search query. MongoDB also does this, but the search relevance is not visible by
default.

Search once again for spiced espresso, but this time have MongoDB also return each result’s search
relevance score. To do this, you could add a projection after the query filter document:

db.recipes.find(
{ $text: { $search: "spiced espresso" } },
{ score: { $meta: "textScore" } }
)

The projection { score: { $meta: "textScore" } } uses the $meta operator, a special kind of projection
that returns specific metadata from returned documents. This example returns the
documents’ textScore metadata, a built-in feature of MongoDB’s full-text search engine that contains
the search relevance score.

After executing the query, the returned documents will include a new field named score, as was
specified in the filter document:



Output
{ "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian
sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream.",
"score" : 0.5454545454545454 }
{ "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and
rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam.",
"score" : 0.5384615384615384 }
{ "_id" : ObjectId("61895d2787f246b334ece914"), "name" : "Maple Latte",
"description" : "A wintertime classic made with espresso and steamed milk and sweetened with
some maple syrup.", "score" : 0.55 }
{ "_id" : ObjectId("61895d2787f246b334ece912"), "name" : "New Orleans Coffee",
"description" : "Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory.",
"score" : 0.5454545454545454 }
{ "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte",
"description" : "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed
milk, cinnamon spices, and pumpkin puree.", "score" : 2.0705128205128203 }

Notice how much higher the score is for Pumpkin Spice Latte, the only coffee drink that contains both
the words spiced and espresso. According to MongoDB’s relevance score, it’s the most relevant
document for that query. However, by default, the results are not returned in order of relevance.

To change that, you could add a sort() clause to the query, like this:

db.recipes.find(
   { $text: { $search: "spiced espresso" } },
   { score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } );

The syntax for the sorting document is the same as that of the projection. Now, the list of documents is the same, but their order is different:

Output
{ "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte", "description" : "It
wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon
spices, and pumpkin puree.", "score" : 2.0705128205128203 }
{ "_id" : ObjectId("61895d2787f246b334ece914"), "name" : "Maple Latte", "description" : "A
wintertime classic made with espresso and steamed milk and sweetened with some maple syrup.",
"score" : 0.55 }
{ "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian
sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream.", "score" :
0.5454545454545454 }
{ "_id" : ObjectId("61895d2787f246b334ece912"), "name" : "New Orleans Coffee", "description" :
"Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory.", "score" :
0.5454545454545454 }
{ "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and
rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam.", "score" :
0.5384615384615384 }



The Pumpkin Spice Latte document appears as the first result since it has the highest relevance score.
Sorting results according to their relevance score can be helpful. This is especially true with queries
containing multiple words, where the most fitting documents will usually contain multiple search terms
while the less relevant documents might contain only one.

Text Score

The $text operator assigns a score to each document that contains the search term in the indexed
fields. The score represents the relevance of a document to a given text search query. The score can be
part of a sort() method specification as well as part of the projection expression.
The { $meta: "textScore" } expression provides information on the processing of the $text operation.

Examples
Sample Data
{ _id: 1, subject: "coffee", author: "xyz", views: 50 },
{ _id: 2, subject: "Coffee Shopping", author: "efg", views: 5 },
{ _id: 3, subject: "Baking a cake", author: "abc", views: 90 },
{ _id: 4, subject: "baking", author: "xyz", views: 100 },
{ _id: 5, subject: "Café Con Leche", author: "abc", views: 200 },
{ _id: 6, subject: "Сырники", author: "jkl", views: 80 },
{ _id: 7, subject: "coffee and cream", author: "efg", views: 10 },
{ _id: 8, subject: "Cafe con Leche", author: "xyz", views: 10 }

Create a text index on field subject : db.articles.createIndex( { subject: "text" } )

Populate the collection with the sample data.

 Search for a Single Word

The query specifies a $search string of coffee: db.articles.find( { $text: { $search: "coffee" } } )

This query returns the documents that contain the term coffee in the indexed subject field, or more
precisely, the stemmed version of the word:

{ "_id" : 2, "subject" : "Coffee Shopping", "author" : "efg", "views" : 5 }


{ "_id" : 7, "subject" : "coffee and cream", "author" : "efg", "views" : 10 }
{ "_id" : 1, "subject" : "coffee", "author" : "xyz", "views" : 50 }

 Match Any of the Search Terms

If the search string is a space-delimited string, $text operator performs a logical OR search on each
term and returns documents that contains any of the terms.
The following query specifies a $search string of three terms delimited by space, "bake coffee cake":



db.articles.find( { $text: { $search: "bake coffee cake" } } )

This query returns documents that contain either bake or coffee or cake in the indexed subject field, or
more precisely, the stemmed version of these words:

{ "_id" : 2, "subject" : "Coffee Shopping", "author" : "efg", "views" : 5 }


{ "_id" : 7, "subject" : "coffee and cream", "author" : "efg", "views" : 10 }
{ "_id" : 1, "subject" : "coffee", "author" : "xyz", "views" : 50 }
{ "_id" : 3, "subject" : "Baking a cake", "author" : "abc", "views" : 90 }
{ "_id" : 4, "subject" : "baking", "author" : "xyz", "views" : 100 }



Some common Mongo aggregation examples compared with MySQL
Comparison of Mongo and MySQL aggregation

In order to make it easy to understand, we first make the following analogy between the common Mongo aggregation operations and the corresponding MySQL query clauses:

SQL operation / function     MongoDB aggregation operation
where                        $match
group by                     $group
having                       $match
select                       $project
order by                     $sort
limit                        $limit
sum()                        $sum
count()                      $sum
join                         $lookup (new in v3.2)
Here we will discuss some common Mongo aggregation examples compared with MySQL. Suppose there are documents in a collection named orders, of the following shape:

{
cust_id: "abc123",
ord_date: ISODate("2012-11-02T17:04:11.102Z"),
status: 'A',
price: 50,
items: [ { sku: "xxx", qty: 25, price: 1 },
{ sku: "yyy", qty: 25, price: 1 } ]
}

1. Count all records in the orders collection

MongoDB:
db.orders.aggregate( [
   { $group: { _id: null, count: { $sum: 1 } } }
] )

MySQL:
SELECT COUNT(*) AS count FROM orders

2. Sum all prices in the orders collection

MongoDB:
db.orders.aggregate( [
   { $group: { _id: null, total: { $sum: "$price" } } }
] )

MySQL:
SELECT SUM(price) AS total FROM orders

3. For each unique cust_id, calculate the sum of prices

MongoDB:
db.orders.aggregate( [
   { $group: { _id: "$cust_id", total: { $sum: "$price" } } }
] )

MySQL:
SELECT cust_id, SUM(price) AS total
FROM orders
GROUP BY cust_id

4. For each unique pair of cust_id and ord_date, calculate the total price, excluding the time part of the date.

MongoDB:
db.orders.aggregate( [
   {
      $group: {
         _id: {
            cust_id: "$cust_id",
            ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date" }
            }
         },
         total: { $sum: "$price" }
      }
   }
] )

MySQL:
SELECT cust_id, ord_date, SUM(price) AS total
FROM orders
GROUP BY cust_id, ord_date

Note: The $dayOfMonth aggregation pipeline operator returns the day of the month for a date as a number between 1 and 31. Similarly, $month returns the month and $year returns the year.
5. For each cust_id, return the cust_id and the total number of orders if it is more than 1.

MongoDB:
db.orders.aggregate( [
   { $group: { _id: "$cust_id", count: { $sum: 1 } } },
   { $match: { count: { $gt: 1 } } }
] )

MySQL:
SELECT cust_id, COUNT(*)
FROM orders
GROUP BY cust_id HAVING COUNT(*) > 1

Note: After grouping we filter using $match, which is equivalent to the HAVING clause in SQL. When used in the $group stage, $sum has the syntax { $sum: <expression> } and returns the collective sum of all the numeric values that result from applying the specified expression to each document in a group of documents that share the same group-by key. Here the expression is 1, so it aggregates a value of one for each document in the group, thus yielding the total number of documents per group.
6. For each unique cust_id and ord_date, calculate the total price, return only the records whose total price is greater than 250, and exclude the time part of the date.

MongoDB:
db.orders.aggregate( [
   {
      $group: {
         _id: {
            cust_id: "$cust_id",
            ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date" }
            }
         },
         total: { $sum: "$price" }
      }
   },
   { $match: { total: { $gt: 250 } } }
] )

MySQL:
SELECT cust_id, ord_date, SUM(price) AS total
FROM orders
GROUP BY cust_id, ord_date
HAVING total > 250
7. For each unique cust_id with status = 'A', calculate the total price.

MongoDB:
db.orders.aggregate( [
   { $match: { status: 'A' } },
   { $group: { _id: "$cust_id", total: { $sum: "$price" } } }
] )

MySQL:
SELECT cust_id, SUM(price) AS total
FROM orders
WHERE status = 'A'
GROUP BY cust_id

Note: Here $match is applied before grouping, which is equivalent to the WHERE clause of SQL.

8. For each unique cust_id with status = 'A', calculate the total price and return only the records whose total price is greater than 250.

MongoDB:
db.orders.aggregate( [
   { $match: { status: 'A' } },
   { $group: { _id: "$cust_id", total: { $sum: "$price" } } },
   { $match: { total: { $gt: 250 } } }
] )

MySQL:
SELECT cust_id, SUM(price) AS total
FROM orders
WHERE status = 'A'
GROUP BY cust_id
HAVING total > 250
9. For each unique cust_id, sum the quantities of the order line items. In MongoDB each order's line items are stored in the embedded array field items:

items: [ { sku: "xxx", qty: 25, price: 1 },
         { sku: "yyy", qty: 25, price: 1 } ]

MongoDB:
db.orders.aggregate( [
   { $unwind: "$items" },
   {
      $group: {
         _id: "$cust_id",
         qty: { $sum: "$items.qty" }
      }
   }
] )

MySQL:
SELECT cust_id, SUM(li.qty) AS qty
FROM orders o, order_lineitem li
WHERE li.order_id = o.id
GROUP BY cust_id

Note: $unwind deconstructs the array field from the input document and outputs one document for each element. In short, it splits an array into separate documents. The quantity of each item is then accessed as $items.qty.



10. Count unique cust_id and ord_date pairs, excluding the time part of the date.

MongoDB:
db.orders.aggregate( [
   {
      $group: {
         _id: {
            cust_id: "$cust_id",
            ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date" }
            }
         }
      }
   },
   {
      $group: { _id: null, count: { $sum: 1 } }
   }
] )

MySQL:
SELECT COUNT(*)
FROM (SELECT cust_id, ord_date
      FROM orders
      GROUP BY cust_id, ord_date) AS DerivedTable

Note: The result of the first $group is fed into the second $group to count the total number of groups. The first $group is equivalent to the inner query in SQL.
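The analogy table above maps SQL's join to $lookup, which none of these examples demonstrates. A minimal sketch, assuming the line items live in a separate order_lineitem collection (hypothetical here) whose order_id field references orders._id:

db.orders.aggregate( [
   {
      $lookup: {
         from: "order_lineitem",        // hypothetical collection holding the line items
         localField: "_id",             // field in the orders documents
         foreignField: "order_id",      // matching field in order_lineitem (assumed)
         as: "lineitems"                // name of the output array field
      }
   }
] )

Each output document is the original order with an added lineitems array holding the matching documents from order_lineitem.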



How To Use Aggregations in MongoDB
Stages can perform operations on data such as:
 filtering: Where the list of documents is narrowed down through a set of criteria
 sorting: You can reorder documents based on a chosen field
 transforming: The ability to change the structure of documents means we can remove or rename
certain fields, or perhaps rename or group fields within an embedded document for legibility
 grouping: We can also process multiple documents together to form a summarized result

In the following steps, we’ll prepare a test database to serve as an example data set. We’ll then learn
how to use a few of the most common aggregation pipeline stages individually. Finally, we’ll combine
these stages together to form a complete example pipeline.

Preparing the Test Data

To understand how aggregation pipelines work, we need a collection of documents with multiple
fields of different types that we can filter, sort, group, and summarize in different ways. Here we will
use a sample collection describing the twenty most populated cities in the world.

Each document contains the following information:


 name: the city’s name.
 country: the country where the city is located.
 continent: the continent where the city is located.
 population: the city’s population, in millions.

Run the following insertMany() method in the MongoDB shell to simultaneously create a collection
named cities and insert twenty sample documents into it. These documents describe the twenty most
populated cities in the world:

db.cities.insertMany([
{"name": "Seoul", "country": "South Korea", "continent": "Asia", "population": 25.674 },
{"name": "Mumbai", "country": "India", "continent": "Asia", "population": 19.980 },
{"name": "Lagos", "country": "Nigeria", "continent": "Africa", "population": 13.463 },
{"name": "Beijing", "country": "China", "continent": "Asia", "population": 19.618 },
{"name": "Shanghai", "country": "China", "continent": "Asia", "population": 25.582 },
{"name": "Osaka", "country": "Japan", "continent": "Asia", "population": 19.281 },
{"name": "Cairo", "country": "Egypt", "continent": "Africa", "population": 20.076 },
{"name": "Tokyo", "country": "Japan", "continent": "Asia", "population": 37.400 },
{"name": "Karachi", "country": "Pakistan", "continent": "Asia", "population": 15.400 },
{"name": "Dhaka", "country": "Bangladesh", "continent": "Asia", "population": 19.578 },
{"name": "Rio de Janeiro", "country": "Brazil", "continent": "South America", "population": 13.293 },
{"name": "São Paulo", "country": "Brazil", "continent": "South America", "population": 21.650 },
{"name": "Mexico City", "country": "Mexico", "continent": "North America", "population": 21.581 },
{"name": "Delhi", "country": "India", "continent": "Asia", "population": 28.514 },
{"name": "Buenos Aires", "country": "Argentina", "continent": "South America",
"population": 14.967 },
{"name": "Kolkata", "country": "India", "continent": "Asia", "population": 14.681 },
{"name": "New York", "country": "United States", "continent": "North America",
"population": 18.819 },
{"name": "Manila", "country": "Philippines", "continent": "Asia", "population": 13.482 },
{"name": "Chongqing", "country": "China", "continent": "Asia", "population": 14.838 },
{"name": "Istanbul", "country": "Turkey", "continent": "Europe", "population": 14.751 }
])

The output will contain a list of object identifiers assigned to the newly inserted objects.

We can verify that the documents were properly inserted by running the find() method on
the cities collection with no arguments like db.cities.find(). This will retrieve all the documents in the
collection.

Using the $match Aggregation Stage

Whether we want to do light document structure processing, summarizing, or complex
transformations, we’ll usually want to focus our analysis on just a selection of documents matching
specific criteria. $match can be used to narrow down the list of documents at any given step of a
pipeline, ensuring that all subsequent operations are executed on a limited list of entries.
As an example, run the following operation. This will construct an aggregation pipeline using a
single $match stage without any particular filtering query:

db.cities.aggregate([
   { $match: { } }
])

As aggregation pipelines are multi-step processes, the argument is a list of stages, hence the use of square brackets [] denoting an array of multiple elements.

Each element inside this array is an object describing a stage. The stage is written as { $match: { } }. It is
describing the processing stage, the key $match refers to the stage type, and the value { } describes its
parameters. In our example, the $match stage uses the empty query document as its parameter and is
the only stage in the whole processing pipeline.

Remember that $match narrows down the list of documents from the collection. With no filtering
parameters applied, MongoDB will return the list of all cities from the collection.

Next, run the aggregate() method again, but this time include a query document as a parameter to
the $match stage. Any valid query document can be used here.

We can think of using the $match stage as equivalent to querying the collection with find(). The biggest
difference is that $match can be used multiple times in the aggregation pipeline, allowing us to query documents that have already been processed and transformed earlier in the pipeline.

Run the following aggregate() method. This example includes a $match stage to select only cities from
North America:
db.cities.aggregate([
   { $match: { "continent": "North America" } }
])

Here the query document { "continent": "North America" } appears as the parameter to the $match stage.

Consequently, MongoDB returns two cities from North America:

Output
{ "_id" : ObjectId("612d1e835ebee16872a109b0"), "name" : "Mexico City", "country" : "Mexico",
"continent" : "North America", "population" : 21.581 }
{ "_id" : ObjectId("612d1e835ebee16872a109b4"), "name" : "New York", "country" : "United States",
"continent" : "North America", "population" : 18.819 }

This command returns the same output as the following one which instead uses the find() method to
query the database: db.cities.find({ "continent": "North America" })

The following aggregate() method returns cities from North America and Asia:

db.cities.aggregate([
{ $match: { "continent": { $in: ["North America", "Asia"] } } }
])
With that, we’ve learned how to execute an aggregation pipeline and use the $match stage to
narrow down the collection’s documents.

Using the $sort Aggregation Stage

$match does nothing to change or transform the data as it passes through the pipeline. When querying
the database, it’s common to expect a certain order when retrieving the results. Using the standard
query mechanism, you can specify the document order by appending a sort() method to the end of
a find() query. For example, to retrieve every city in the collection and sort them in descending order
by population, you could run an operation like this:

db.cities.find().sort({ "population": -1 })

We can alternatively sort the documents in an aggregation pipeline by including a $sort stage. To
illustrate this, run the following aggregate() method. This follows a similar syntax to the previous
examples that used a $match stage:

db.cities.aggregate([
{ $sort: { "population": -1 } }
])



MongoDB will return the same result set as the previous find() operation since using an aggregation
pipeline with just a sorting stage is equivalent to a standard query with a sort order applied:

Output
{ "_id" : ObjectId("612d1e835ebee16872a109ab"), "name" : "Tokyo", "country" : "Japan",
"continent" : "Asia", "population" : 37.4 }
{ "_id" : ObjectId("612d1e835ebee16872a109b1"), "name" : "Delhi", "country" : "India",
"continent" : "Asia", "population" : 28.514 }
{ "_id" : ObjectId("612d1e835ebee16872a109a4"), "name" : "Seoul", "country" : "South Korea",
"continent" : "Asia", "population" : 25.674 }
...
Suppose we want to retrieve cities just from North America sorted by population in ascending order.
To do so, we can apply two processing stages as shown below :

db.cities.aggregate([
   { $match: { "continent": "North America" } },
   { $sort: { "population": 1 } }
])

This time, MongoDB will return documents representing New York and Mexico City, the only two cities from North America, starting with New York as it has a lower population.

To obtain these results, MongoDB first passed the document collection through the $match stage,
filtered the documents against the query criteria, and then forwarded the results to the next stage in
line responsible for sorting the results. Just like the $match stage, $sort can appear multiple times in
the aggregation pipeline and can sort documents by any field you might need, including fields that will
only appear in the document structure during the aggregation.
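As an illustrative sketch, a single $sort stage can also sort on multiple fields at once, for example by population in descending order with ties broken alphabetically by name:

db.cities.aggregate([
   { $sort: { "population": -1, "name": 1 } }
])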

Note: When running filtering and sorting stages at the beginning of the aggregation pipeline, before
any projection, grouping, or other transformation stages, MongoDB will use indexes to maximize
performance just like it would with a standard query.

Using the $group Aggregation Stage

The $group stage groups documents according to a specified grouping expression. Its output documents hold information about the group and can contain additional computed fields like sums or averages across the list of documents from the group.

Here we include a $group stage that will group the resulting documents by the continent in which
each city is located:

db.cities.aggregate([
   { $group: { "_id": "$continent" } }
])

For $group stages within an aggregation pipeline, it is required that we specify an _id field with a valid expression.

This aggregate() method does specify an _id value: namely, each value found in the continent field of each document in the cities collection. Any time we want to refer to the values of a field in an aggregation pipeline like this, we must precede the name of the field with a dollar sign ($). In MongoDB, this is referred to as a field path, as it directs the operation to the appropriate field where it
can find the values to be used in the pipeline stage.

Here in this example, "$continent" tells MongoDB to take the continent field from the original
document and use its value to construct the expression value in the aggregation pipeline. MongoDB
will output a single document for each unique value of that grouping expression:
Output
{ "_id" : "Africa" } Here output is a single document for each of the five
{ "_id" : "Asia" } continents represented in the collection. By default, the
{ "_id" : "South America" } grouping stage doesn’t include any additional fields from
{ "_id" : "Europe" } the original document, since it wouldn’t know how or
{ "_id" : "North America" } from which document to source the other values.

We can, however, specify multiple single-field values in a grouping expression. The following example
will group documents based on the values of the continent and country fields:

db.cities.aggregate([
   {
      $group: {
         "_id": { "continent": "$continent", "country": "$country" }
      }
   }
])

Here the _id field of the grouping expression uses an embedded document which, in turn, has two fields inside: one for the continent name and another for the country name. Both fields refer to fields from the original documents using the field path dollar sign notation.
This time MongoDB returns 14 results, as there are 14 distinct continent-country pairs in the
collection:
Output
{ "_id" : { "continent" : "Europe", "country" : "Turkey" } }
{ "_id" : { "continent" : "South America", "country" : "Argentina" } }
{ "_id" : { "continent" : "Asia", "country" : "Bangladesh" } }
{ "_id" : { "continent" : "Asia", "country" : "Philippines" } }
{ "_id" : { "continent" : "Asia", "country" : "South Korea" } }
{ "_id" : { "continent" : "Asia", "country" : "Japan" } }
{ "_id" : { "continent" : "Asia", "country" : "China" } }
{ "_id" : { "continent" : "North America", "country" : "United States" } }
{ "_id" : { "continent" : "North America", "country" : "Mexico" } }
{ "_id" : { "continent" : "Africa", "country" : "Nigeria" } }
{ "_id" : { "continent" : "Asia", "country" : "India" } }
{ "_id" : { "continent" : "Asia", "country" : "Pakistan" } }
{ "_id" : { "continent" : "Africa", "country" : "Egypt" } }
{ "_id" : { "continent" : "South America", "country" : "Brazil" } }

MongoDB provides a number of accumulator operators which allow us to find more granular details
about our data. An accumulator operator, sometimes referred to simply as an accumulator, is a
special type of operation that maintains its value or state as it passes through an aggregation pipeline,
such as a sum or average of more than one value.

To illustrate, run the following aggregate() method. This method’s $group stage creates the
required _id grouping expression as well as three additional computed fields. These computed fields all
include an accumulator operator and its value. Here’s a breakdown of these computed fields:

 highest_population: this field contains the maximum population value in the group.
The $max accumulator operator computes the maximum value for "$population" across all
documents in a group.
 first_city: contains the name of the first city in the group. The $first accumulator operator takes the
value of "$name" from the first document appearing in the group. Notice that since the list of
documents is now unordered, this doesn’t automatically make it the city with the highest
population, but rather the first city MongoDB finds within each group.
 cities_in_top_20: holds the number of cities in the collection for each continent-country pair. To
accomplish this, the $sum accumulator operator adds one for each document in the group; it
doesn’t refer to a particular field in the source document, thus yielding the number of documents
per group.

We can add as many additional computed fields as needed for our use case, but for now run this
example query:
db.cities.aggregate([
{
$group: {
"_id": { "continent": "$continent", "country": "$country" },
"highest_population": { $max: "$population" },
"first_city": { $first: "$name" },
"cities_in_top_20": { $sum: 1 }
}
}
])

MongoDB returns the following 14 documents, one for each unique group defined by the grouping
expression:
Output
{ "_id" : { "continent" : "North America", "country" : "United States" }, "highest_population" : 18.819,
"first_city" : "New York", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "Asia", "country" : "Philippines" }, "highest_population" : 13.482,
"first_city" : "Manila", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "North America", "country" : "Mexico" }, "highest_population" : 21.581,
"first_city" : "Mexico City", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "Africa", "country" : "Nigeria" }, "highest_population" : 13.463,
"first_city" : "Lagos", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "Asia", "country" : "India" }, "highest_population" : 28.514,

Prepared by Kamal Podder Page 6


"first_city" : "Mumbai", "cities_in_top_20" : 3 }
{ "_id" : { "continent" : "Asia", "country" : "Pakistan" }, "highest_population" : 15.4,
"first_city" : "Karachi", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "Africa", "country" : "Egypt" }, "highest_population" : 20.076,
"first_city" : "Cairo", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "South America", "country" : "Brazil" }, "highest_population" : 21.65,
"first_city" : "Rio de Janeiro", "cities_in_top_20" : 2 }
{ "_id" : { "continent" : "Europe", "country" : "Turkey" }, "highest_population" : 14.751,
"first_city" : "Istanbul", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "Asia", "country" : "Bangladesh" }, "highest_population" : 19.578,
"first_city" : "Dhaka", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "South America", "country" : "Argentina" }, "highest_population" : 14.967,
"first_city" : "Buenos Aires", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "Asia", "country" : "South Korea" }, "highest_population" : 25.674,
"first_city" : "Seoul", "cities_in_top_20" : 1 }
{ "_id" : { "continent" : "Asia", "country" : "Japan" }, "highest_population" : 37.4,
"first_city" : "Osaka", "cities_in_top_20" : 2 }
{ "_id" : { "continent" : "Asia", "country" : "China" }, "highest_population" : 25.582,
"first_city" : "Beijing", "cities_in_top_20" : 3 }

The field names in the returned documents correspond to the computed field names in the grouping
stage document. To examine the results more closely, let’s narrow our focus to a single document:

{ "_id" : { "continent" : "Asia", "country" : "Japan" }, "highest_population" : 37.4,


"first_city" : "Osaka", "cities_in_top_20" : 2 }
 The _id field holds the grouping expression values for Japan and Asia.
 The cities_in_top_20 field shows that two Japanese cities are on the list of the 20 most
populated cities. Recall we have added two documents in the test data representing cities in
Japan (Tokyo and Osaka), so this value is correct.
 The highest_population corresponds to the population of Tokyo, which is indeed the higher
population of the two.
 However, the first_city shows Osaka and not Tokyo, as one might expect. That’s because the
grouping stage used a list of source documents that were not ordered by population, so it
couldn’t guarantee the logical meaning of “first” in that scenario. Osaka was processed by the
pipeline first and hence appears in the first_city field value.

Note: In addition to the three described in this step, there are several more accumulator operators
available in MongoDB that can be used for a variety of aggregations.
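For instance, $avg and $min follow the same pattern. A brief sketch that computes the average and minimum population per continent (the output field names are our own choice):

db.cities.aggregate([
   {
      $group: {
         "_id": "$continent",
         "avg_population": { $avg: "$population" },
         "min_population": { $min: "$population" }
      }
   }
])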

Using the $project Aggregation Stage

When working with aggregation pipelines, you’ll sometimes want to return only a few of a document
collection’s multiple fields or change the structure slightly to move some fields into embedded
documents.



Say, for example, that we’d like to retrieve the population for each of the cities in the sample
collection, but we’d like the results to take the following format:

Required document structure


{
   "location" : {
      "country" : "South Korea",
      "continent" : "Asia"
   },
   "name" : "Seoul",
   "population" : 25.674
}

 The location field contains the country and continent pair
 The city’s name and population are shown in the name and population fields, respectively
 The document identifier _id doesn’t appear in the output document
We can use the $project stage to construct new document structures in an aggregation pipeline,
thereby altering the way resulting documents appear in the result set.

To illustrate, run the following aggregate() method which includes a $project stage:

db.cities.aggregate([
   {
      $project: {
         "_id": 0,
         "location": {
            "country": "$country",
            "continent": "$continent"
         },
         "name": "$name",
         "population": "$population"
      }
   }
])

The value for this $project stage is a projection document describing the output structure. These projection documents follow the same format as those used in queries, constructed as inclusion projections or exclusion projections. The projection document keys correspond to the keys from input documents entering the $project stage.

When the projection document contains keys with 1 as their values, it describes the list of fields that will be included in the result. If, on the other hand, projection keys are set to 0, the projection document describes the list of fields that will be excluded from the result.
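For contrast, a plain inclusion projection sketch that keeps only the name and population fields (plus _id, which is included by default unless explicitly suppressed):

db.cities.aggregate([
   { $project: { "name": 1, "population": 1 } }
])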

In an aggregation pipeline, projections can also include additional computed fields. In such cases, the
projection automatically becomes an inclusion projection, and only the _id field can be suppressed by
appending "_id": 0 to the projection document. Computed fields use the dollar sign field path notation
for their values and can refer to the values from input documents.

In this example, the document identifier is suppressed with "_id": 0, the name and population are
computed fields referring to the name and population fields from the input documents, respectively.
The location field becomes an embedded document with two additional keys: country and continent,
referring to fields from the input documents.

Using this projection stage, MongoDB will return the following documents:
Output



{ "location" : { "country" : "South Korea", "continent" : "Asia" }, "name" : "Seoul", "population" : 25.674 }
{ "location" : { "country" : "India", "continent" : "Asia" }, "name" : "Mumbai", "population" : 19.98 }
{ "location" : { "country" : "Nigeria", "continent" : "Africa" }, "name" : "Lagos", "population" : 13.463 }
{ "location" : { "country" : "China", "continent" : "Asia" }, "name" : "Beijing", "population" : 19.618 }
{ "location" : { "country" : "China", "continent" : "Asia" }, "name" : "Shanghai", "population" : 25.582 }
{ "location" : { "country" : "Japan", "continent" : "Asia" }, "name" : "Osaka", "population" : 19.281 }
{ "location" : { "country" : "Egypt", "continent" : "Africa" }, "name" : "Cairo", "population" : 20.076 }
{ "location" : { "country" : "Japan", "continent" : "Asia" }, "name" : "Tokyo", "population" : 37.4 }
{ "location" : { "country" : "Pakistan", "continent" : "Asia" }, "name" : "Karachi", "population" : 15.4 }
{ "location" : { "country" : "Bangladesh", "continent" : "Asia" }, "name" : "Dhaka", "population" : 19.578 }
{ "location" : { "country" : "Brazil", "continent" : "South America" }, "name" : "Rio de Janeiro",
"population" : 13.293 }
{ "location" : { "country" : "Brazil", "continent" : "South America" }, "name" : "São Paulo", "population" : 21.65 }
{ "location" : { "country" : "Mexico", "continent" : "North America" }, "name" : "Mexico City",
"population" : 1.581 }
{ "location" : { "country" : "India", "continent" : "Asia" }, "name" : "Delhi", "population" : 28.514 }
{ "location" : { "country" : "Argentina", "continent" : "South America" }, "name" : "Buenos Aires",
"population" : 14.967 }
{ "location" : { "country" : "India", "continent" : "Asia" }, "name" : "Kolkata", "population" : 14.681 }
{ "location" : { "country" : "United States", "continent" : "North America" }, "name" : "New York",
"population" : 18.819 }
{ "location" : { "country" : "Philippines", "continent" : "Asia" }, "name" : "Manila", "population" : 13.482 }
{ "location" : { "country" : "China", "continent" : "Asia" }, "name" : "Chongqing", "population" : 14.838 }
{ "location" : { "country" : "Turkey", "continent" : "Europe" }, "name" : "Istanbul", "population" : 14.751 }

Each document now follows the new format transformed through the projection stage.

Putting All the Stages Together

We’re now ready to join together all the previous stages to form a fully functional aggregation pipeline
that both filters and transforms documents.

Suppose the task at hand is to find the most populated city for each country in Asia and
North America and return both its name and population. The results should be sorted by the highest
population, returning countries with the largest cities first, and we are interested only in countries
where the most populated city crosses the threshold of 20 million people. Lastly, the document
structure we aim for should replicate the following:

Example document
{
"location" : {
"country" : "Japan",
"continent" : "Asia"
},
"most_populated_city" : {
"name" : "Tokyo",

Prepared by Kamal Podder Page 9


"population" : 37.4
}
}
To illustrate how to retrieve a data set that would satisfy these requirements, this step outlines how to
build the appropriate aggregation pipeline.
db.cities.aggregate([
   {
      $match: {
         "continent": { $in: ["North America", "Asia"] }
      }
   }
])

This pipeline’s $match stage will only find cities in North America and Asia, and the documents representing these cities will be returned in their full original structure and with default ordering.
Output
{ "_id" : ObjectId("612d1e835ebee16872a109a4"), "name" : "Seoul", "country" : "South Korea",
"continent" : "Asia", "population" : 25.674 }
{ "_id" : ObjectId("612d1e835ebee16872a109a5"), "name" : "Mumbai", "country" : "India",
"continent" : "Asia", "population" : 19.98 }
{ "_id" : ObjectId("612d1e835ebee16872a109a7"), "name" : "Beijing", "country" : "China",
"continent" : "Asia", "population" : 19.618 }
{ "_id" : ObjectId("612d1e835ebee16872a109a8"), "name" : "Shanghai", "country" : "China",
"continent" : "Asia", "population" : 25.582 }
...
After the $match stage we add a $sort stage so that we can order the cities from the highest to the lowest
population, as follows:

db.cities.aggregate([
   {
      $match: { "continent": { $in: ["North America", "Asia"] } }
   },
   {
      $sort: { "population": -1 }
   }
])

The $sort stage tells MongoDB to order the documents by population in descending order.
Once again the returned documents have the same structure, but this time Tokyo comes first since it
has the highest population:
Output
{ "_id" : ObjectId("612d1e835ebee16872a109ab"), "name" : "Tokyo", "country" : "Japan",
"continent" : "Asia", "population" : 37.4 }
...
We now have the list of cities sorted by the population coming from the expected continents, so the
next necessary action for this scenario is to group cities by their countries, choosing only the most
populated city from each group. To do so, add a $group stage to the pipeline:



db.cities.aggregate([
{
$match: { "continent": { $in: ["North America", "Asia"] } }
},
{
$sort: { "population": -1 }
},
{
$group: { "_id": { "continent": "$continent", "country": "$country" },
"first_city": { $first: "$name" },
"highest_population": { $max: "$population" }
}
}
])
This new stage’s grouping expression tells MongoDB to group cities by unique continent and country
pairs. For each group, two computed values summarize the groups. The highest_population value uses
the $max accumulator operator to find the highest population in the group. The first_city gets the
name of the first city in the group of documents. Because the documents entering this stage were
already sorted by population in descending order, we can be sure that the first city will also be the most
populated city in the group and that it will match the numerical population value.

Adding this $group stage changes the number of documents returned by this method as well as their
structure. This time, the method only returns nine documents, as there are only nine unique country
and continent pairs in the previously filtered cities list. Each document corresponds to one of these
pairs, and consists of the grouping expression value in the _id field and two computed fields:
Output
{ "_id" : { "continent" : "North America", "country" : "United States" }, "first_city" : "New York",
"highest_population" : 18.819 }
{ "_id" : { "continent" : "Asia", "country" : "China" }, "first_city" : "Shanghai",
"highest_population" : 25.582 }
{ "_id" : { "continent" : "Asia", "country" : "Japan" }, "first_city" : "Tokyo", "highest_population" : 37.4 }
{ "_id" : { "continent" : "Asia", "country" : "South Korea" }, "first_city" : "Seoul",
"highest_population" : 25.674 }
{ "_id" : { "continent" : "Asia", "country" : "Bangladesh" }, "first_city" : "Dhaka",
"highest_population" : 19.578 }
{ "_id" : { "continent" : "Asia", "country" : "Philippines" }, "first_city" : "Manila",
"highest_population" : 13.482 }
{ "_id" : { "continent" : "Asia", "country" : "India" }, "first_city" : "Delhi", "highest_population" : 28.514 }
{ "_id" : { "continent" : "Asia", "country" : "Pakistan" }, "first_city" : "Karachi",
"highest_population" : 15.4 }
{ "_id" : { "continent" : "North America", "country" : "Mexico" }, "first_city" : "Mexico City",
"highest_population" : 21.581 }

Notice that the resulting documents for each group are not ordered by the population value. New York
comes first, but the second city — Shanghai — has a population of almost 7 million people more. Also, several countries have the most populated cities below the expected threshold of 20 million.
Remember that filtering and sorting stages can appear multiple times in the pipeline, and that each
stage’s output is the next stage’s input. Use another $match stage to filter the groups so that only
countries whose most populated city satisfies the 20 million population minimum remain:

db.cities.aggregate([
   {
      $match: { "continent": { $in: ["North America", "Asia"] } }
   },
   { $sort: { "population": -1 } },
   {
      $group: {
         "_id": { "continent": "$continent", "country": "$country" },
         "first_city": { $first: "$name" },
         "highest_population": { $max: "$population" }
      }
   },
   {
      $match: { "highest_population": { $gt: 20.0 } }
   }
])

This filtering $match stage refers to the highest_population field available in the documents coming from the grouping stage, even though such a field is not part of the structure of the original documents.

This time, five countries appear in the output:


Output
{ "_id" : { "continent" : "Asia", "country" : "China" }, "first_city" : "Shanghai",
"highest_population" : 25.582 }
{ "_id" : { "continent" : "Asia", "country" : "Japan" }, "first_city" : "Tokyo", "highest_population" : 37.4 }
{ "_id" : { "continent" : "Asia", "country" : "South Korea" }, "first_city" : "Seoul",
"highest_population" : 25.674 }
{ "_id" : { "continent" : "Asia", "country" : "India" }, "first_city" : "Delhi", "highest_population" : 28.514 }
{ "_id" : { "continent" : "North America", "country" : "Mexico" }, "first_city" : "Mexico City",
"highest_population" : 21.581 }

Next, sort the results by their highest_population value. To do so, add another $sort stage:
db.cities.aggregate([
{
$match: { "continent": { $in: ["North America", "Asia"] } }
},
{ $sort: { "population": -1 } },
{
$group: {
"_id": { "continent": "$continent", "country": "$country" },
"first_city": { $first: "$name" },
"highest_population": { $max: "$population" }

Prepared by Kamal Podder Page 12


}
},
{
$match: { "highest_population": { $gt: 20.0 } }
},
{ $sort: { "highest_population": -1 } }
])
The document structure doesn’t change, and MongoDB still returns five documents corresponding to
the country groups. This time, however, Japan appears first since Tokyo is the most populated city of
all in the data set:

Output
{ "_id" : { "continent" : "Asia", "country" : "Japan" }, "first_city" : "Tokyo", "highest_population" : 37.4 }
{ "_id" : { "continent" : "Asia", "country" : "India" }, "first_city" : "Delhi", "highest_population" : 28.514 }
{ "_id" : { "continent" : "Asia", "country" : "South Korea" }, "first_city" : "Seoul",
"highest_population" : 25.674 }
{ "_id" : { "continent" : "Asia", "country" : "China" }, "first_city" : "Shanghai",
"highest_population" : 25.582 }
{ "_id" : { "continent" : "North America", "country" : "Mexico" }, "first_city" : "Mexico City",
"highest_population" : 21.581 }

The last requirement is to transform the document structure to match the sample shown previously.
For your review, here’s that sample once more:
Example document

{
   "location" : {
      "country" : "Japan",
      "continent" : "Asia"
   },
   "most_populated_city" : {
      "name" : "Tokyo",
      "population" : 37.4
   }
}

This sample’s location embedded document resembles the _id grouping expression value, as both include country and continent fields. The most populated city name and population are nested as an embedded document under the most_populated_city field. This is different from the grouping results, where all computed fields are top-level fields.
To transform the results to align with this structure, add a $project stage to the pipeline:

db.cities.aggregate([
{
$match: { "continent": { $in: ["North America", "Asia"] } }
},
{ $sort: { "population": -1 } },
{
$group: {
"_id": { "continent": "$continent", "country": "$country" },

Prepared by Kamal Podder Page 13


"first_city": { $first: "$name" },
"highest_population": { $max: "$population" }
}
},
{
$match: { "highest_population": { $gt: 20.0 } }
},
{ $sort: { "highest_population": -1 } },
{
$project: {
"_id": 0,
"location": { "country": "$_id.country", "continent": "$_id.continent", },
"most_populated_city": { "name": "$first_city", "population": "$highest_population" }
}
}
])
 This $project stage first suppresses the _id field from appearing in the output.
 Then it creates a location field as a nested document containing two fields: country and continent.
"$_id.country" pulls values from the country field from inside the _id embedded document of the
input, and $_id.continent pulls values from its continent field.
 most_populated_city follows a similar structure, nesting the name and population fields inside.
These refer to the top-level fields first_city and highest_population, respectively.

This projection stage effectively constructs an entirely new structure for the output as shown below:

Output
{ "location" : { "country" : "Japan", "continent" : "Asia" }, "most_populated_city" : { "name" : "Tokyo",
"population" : 37.4 } }
{ "location" : { "country" : "India", "continent" : "Asia" }, "most_populated_city" : { "name" : "Delhi",
"population" : 28.514 } }
{ "location" : { "country" : "South Korea", "continent" : "Asia" },
"most_populated_city" : { "name" : "Seoul", "population" : 25.674 } }
{ "location" : { "country" : "China", "continent" : "Asia" },
"most_populated_city" : { "name" : "Shanghai", "population" : 25.582 } }
{ "location" : { "country" : "Mexico", "continent" : "North America" },
"most_populated_city" : { "name" : "Mexico City", "population" : 21.581 } }

This output meets all the requirements defined at the beginning of this step:
 It only includes cities from Asia and North America in the lists.
 For each country and continent pair, a single city is selected, and it’s the city with the highest
population.
 The selected city’s name and population are listed.
 Cities are sorted from the most populated to least populated.
 The output format is altered to align with the example document.
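One operator from the earlier comparison table that this walkthrough never used is $limit, the counterpart of SQL's LIMIT. A minimal closing sketch that returns only the three most populated cities from the sample collection:

db.cities.aggregate([
   { $sort: { "population": -1 } },
   { $limit: 3 }        // keep only the first three documents after sorting
])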