Lecture 3 FULL Explanation

NoSQL
Trouble spots
Aggregation
Definition
Aggregation is the process of transforming documents in a
collection into aggregated results based on a specified set of
operations. These operations can include grouping, filtering,
sorting, and performing various calculations.
Aggregation is a powerful feature that allows you to analyze
data and derive insights from your MongoDB collections.
Aggregation Pipeline:
MongoDB's aggregation framework operates primarily
through the concept of an aggregation pipeline.
The aggregation pipeline is a series of stages through which
documents pass, with each stage performing a specific
operation on the documents.
These stages are executed sequentially, with the output of one
stage serving as the input to the next stage.
Basic Structure:
The basic structure of an aggregation pipeline consists of one or more
stages, each defined as a document with a specific operation.
These stages are passed to the aggregate() method of a MongoDB
collection.
With the $group operator, we can perform all the aggregation or
summary queries that we need, such as finding counts, totals, averages
or maximums.
• In this example, we want to know the number of documents per
university in our ‘university’ collection:
db.university.aggregate([ { $group : { _id : '$name', totaldocs : { $sum : 1 } } } ]).pretty()
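To make the result of this $group stage concrete, the following is a minimal plain-JavaScript sketch of the same counting logic. It runs without a database; the sample documents and the helper name countByName are assumptions made for illustration and are not part of MongoDB.

```javascript
// Hypothetical sample documents standing in for the 'university' collection.
const universityDocs = [
  { name: "University A", city: "City 1" },
  { name: "University A", city: "City 1" },
  { name: "University B", city: "City 2" },
];

// Plain-JavaScript equivalent of
//   { $group: { _id: "$name", totaldocs: { $sum: 1 } } }
// Every document adds 1 to the counter that belongs to its group key.
function countByName(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }
  // One output document per distinct key, shaped like the $group output.
  return [...counts].map(([name, totaldocs]) => ({ _id: name, totaldocs }));
}

console.log(countByName(universityDocs));
```

MongoDB itself does not guarantee the order of the documents produced by $group, so a $sort stage is needed when order matters.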

MongoDB $group aggregation operators
Operator   Meaning
$count     Counts the documents in each group (available as a $group accumulator from MongoDB 5.0; equivalent to { $sum : 1 }).
$max       Returns the maximum value of the given expression across the documents in each group.
$min       Returns the minimum value of the given expression across the documents in each group.
$avg       Returns the average of the numeric values of the given expression in each group.
$sum       Returns the sum of the numeric values of the given expression in each group.
$push      Returns an array containing the value of the given expression for every document in each group.
Example $max
db.sales.aggregate([
  { $group: {
      _id: "$item",
      maxTotalAmount: { $max: { $multiply: [ "$price", "$quantity" ] } },
      maxQuantity: { $max: "$quantity" }
  } }
])
This operation groups the documents by the item field and uses the $max
accumulator to compute the maximum total amount and the maximum quantity for each
group of documents.
MongoDB $unwind
The $unwind operator enables us to work with the values of the fields within
an array.
Where there is an array field within the input documents, you will sometimes
need to output the document several times, once for every element of that array.
Each copy of the document has the array field replaced with the successive
element.
The following aggregation uses the $unwind stage to output a
document for each element in the sizes array:
db.inventory.aggregate( [ { $unwind : "$sizes" } ] )
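The effect of $unwind can be illustrated with a small plain-JavaScript sketch that mimics the stage's default behaviour. The sample documents and the helper name unwind are assumptions for illustration; the real stage runs inside the server.

```javascript
// Hypothetical inventory documents; only the first has a non-empty sizes array.
const inventoryDocs = [
  { _id: 1, item: "ABC1", sizes: ["S", "M", "L"] },
  { _id: 2, item: "EFG", sizes: [] },
  { _id: 3, item: "IJK" },
];

// Mimics { $unwind: "$<field>" } with default options:
// - missing, null, or empty-array fields produce no output document
// - a non-array value is treated as a single-element array
// - otherwise one copy of the document is emitted per array element
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return [];
    if (!Array.isArray(value)) return [doc];
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

console.log(unwind(inventoryDocs, "sizes"));
```

With the sample data, only document 1 contributes output: three copies whose sizes field holds "S", "M" and "L" in turn.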

MongoDB main operators
$match: Filters the documents to pass only those that match the
specified conditions.
{ $match: { field: value } }
$group: Groups documents by a specified key and applies
accumulator expressions to calculate aggregated values.
{ $group: { _id: "$field", total: { $sum: "$value" } } }
$project: Reshapes documents by including, excluding, or
renaming fields.
{ $project: { newField: "$existingField", _id: 0 } }
Other commonly used stages are $sort (orders the documents), $skip
(discards the first n documents) and $limit (passes on at most n documents).
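The stages above are usually chained in one pipeline. The following is a sketch only, assuming a sales collection whose documents have item, price and quantity fields (as in the $max example); the filter condition and the output field names are illustrative choices.

```javascript
// Each array element is one stage; documents flow from top to bottom.
const salesPipeline = [
  // 1. Keep only documents with a positive quantity.
  { $match: { quantity: { $gt: 0 } } },
  // 2. One output document per item, with the summed amount.
  { $group: { _id: "$item", totalAmount: { $sum: { $multiply: ["$price", "$quantity"] } } } },
  // 3. Rename _id to item and keep totalAmount.
  { $project: { _id: 0, item: "$_id", totalAmount: 1 } },
  // 4. Highest totals first.
  { $sort: { totalAmount: -1 } },
  // 5. Skip nothing here; a larger value would page through the results.
  { $skip: 0 },
  // 6. Return at most five documents.
  { $limit: 5 },
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.sales.aggregate(salesPipeline).toArray());
}
```

Placing $match first is a deliberate choice: it reduces the number of documents that the later stages have to process.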

MongoDB $out
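This is an unusual type of stage because it writes the results of your
aggregation into a collection instead of returning them. $out must be
the last stage in the pipeline. It creates a new collection or, if the
named collection already exists, replaces its contents entirely. It does
not add results to the existing documents; use the $merge stage for that.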
Example $out
db.universities.aggregate([
  { $group : { _id : '$name', totaldocs : { $sum : 1 } } },
  { $out : 'aggResults' }
])
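When the results should be added to an existing collection rather than replace it, the $merge stage (available from MongoDB 4.2) is the usual alternative to $out. The sketch below reuses the collection names from the example above; the whenMatched and whenNotMatched settings are one possible choice, not the only one.

```javascript
// Same grouping as the $out example, but merged into 'aggResults'
// instead of replacing the whole collection.
const mergePipeline = [
  { $group: { _id: "$name", totaldocs: { $sum: 1 } } },
  {
    $merge: {
      into: "aggResults",        // target collection
      on: "_id",                 // field used to match existing documents
      whenMatched: "replace",    // overwrite a document with the same _id
      whenNotMatched: "insert",  // add a document for a new _id
    },
  },
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.universities.aggregate(mergePipeline).toArray();
}
```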
Advantages of Aggregation in MongoDB:
• Performance: Aggregation pipelines can efficiently process large
volumes of data and return aggregated results.
• Flexibility: The aggregation framework provides a rich set of
operators and stages, allowing for complex data transformation and
analysis.
• Scalability: Aggregation operations can be distributed across multiple
nodes in a MongoDB cluster, enabling horizontal scalability.
• Integration: Aggregation pipelines can easily integrate with other
MongoDB features like indexing, sharding, and replication.
In summary, aggregation in MongoDB is a versatile and powerful
feature that allows for efficient data analysis and manipulation using
aggregation pipelines and a rich set of operators and stages. It's a
fundamental tool for deriving insights and performing complex
calculations on MongoDB collections.
Backup/restore
1. Basic Backup - mongodump:
1. Install MongoDB Tools: Ensure that MongoDB Tools, which includes
mongodump, are installed on your system. You can download the MongoDB
Tools package from the MongoDB website or install it via package managers
like apt, yum, or brew, depending on your operating system.
2. Run mongodump Command: Open a terminal or command prompt and
execute the mongodump command, specifying the target database and optionally
the host and port if it's not running on the default localhost:27017.
mongodump --host <hostname> --port <port> --db <database_name> --out <backup_directory>
Replace <hostname>, <port>, <database_name>, and <backup_directory> with
your MongoDB server details and the desired backup directory path.
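A backup is only useful if it can be restored, which is done with the companion tool mongorestore. The following Node.js sketch builds the argument lists for both tools so that the two commands stay consistent; it only prints them and does not run anything. The function names, the database name and the backup path are assumptions for illustration.

```javascript
// Builds the arguments for mongodump, mirroring the command shown above.
function mongodumpArgs({ host = "localhost", port = 27017, database, backupDir }) {
  return ["--host", host, "--port", String(port), "--db", database, "--out", backupDir];
}

// Builds the arguments for mongorestore: --nsInclude limits the restore to the
// collections of one database, and the dump directory is passed as the last argument.
function mongorestoreArgs({ host = "localhost", port = 27017, database, backupDir }) {
  return ["--host", host, "--port", String(port), "--nsInclude", `${database}.*`, backupDir];
}

// Hypothetical settings; replace them with your own values.
const backupSettings = { database: "university", backupDir: "/backups/university-dump" };

console.log("mongodump " + mongodumpArgs(backupSettings).join(" "));
console.log("mongorestore " + mongorestoreArgs(backupSettings).join(" "));
```

The returned arrays could be passed to a process-spawning function such as child_process.spawn, which is one way to schedule backups from a script.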
2. Filesystem Snapshots:
• If your MongoDB deployment uses a filesystem that supports
snapshot capabilities (e.g., ZFS, LVM), you can take filesystem
snapshots to create backups. This method offers fast and
consistent backups, but it's crucial to ensure that the snapshots are
taken in a consistent state.
3. MongoDB Atlas Backup Service:
• If you're using MongoDB Atlas, you can leverage its built-in backup
service for automated backups. Atlas offers continuous backups with
point-in-time recovery, providing a convenient and reliable backup
solution without the need for manual intervention.
Steps to Enable Backup in MongoDB Atlas:
1. Access Atlas Dashboard: Log in to your MongoDB Atlas account and
navigate to your cluster's dashboard.
2. Select Cluster: Click on the cluster for which you want to enable
backups.
3. Navigate to Backup: In the cluster's overview, go to the "Backup" tab.
4. Configure Backup Settings: Configure backup settings such as backup
frequency, retention policy, and preferred backup window according to
your requirements.
5. Save Changes: Once configured, save the changes to enable automated
backups for your MongoDB Atlas cluster.
Best Practices for MongoDB Backup:
• Regular Schedule: Ensure regular backups are taken, preferably
daily or more frequently for critical data.
• Offsite Storage: Store backups in a separate location or on a
different server to mitigate the risk of data loss in case of
hardware failures or disasters.
• Test Restores: Periodically test backup restores to verify their
integrity and ensure that data can be recovered successfully.
• Monitor Backup Jobs: Monitor backup jobs to detect any
failures or issues promptly.
• Encryption: Consider encrypting backup files to protect
sensitive data during transit and storage.
JSON Schema – data validation

JSON Schema
Creating a JSON Schema for data validation in MongoDB involves
defining the structure and constraints of your documents using the
JSON Schema standard. MongoDB supports JSON Schema
validation natively, allowing you to enforce data integrity and
consistency at the database level.
Define Your JSON Schema:
Start by defining the structure of your documents using JSON
Schema. This includes specifying the fields, their types, and any
constraints or validation rules.
Example JSON Schema for a hypothetical users collection:
{
  "$jsonSchema": {
    "bsonType": "object",
    "required": ["username", "email", "age"],
    "properties": {
      "username": {
        "bsonType": "string",
        "description": "Username of the user"
      },
      "email": {
        "bsonType": "string",
        "pattern": "^\\S+@\\S+\\.\\S+$",
        "description": "Email address of the user"
      },
      "age": {
        "bsonType": "int",
        "minimum": 18,
        "maximum": 120,
        "description": "Age of the user"
      },
      "createdAt": {
        "bsonType": "date",
        "description": "Date when the user was created"
      }
    }
  }
}
Apply the JSON Schema to Your Collection:
Once you have defined your JSON Schema, you can apply it to
your MongoDB collection using the collMod command or the
createCollection command with the validator option.
Example using collMod:
db.runCommand({
  collMod: "users",
  validator: {
    $jsonSchema: {
      // Your JSON Schema definition here
    }
  },
  validationLevel: "strict",  // or "moderate" or "off"
  validationAction: "error"   // or "warn"
})
Validation Options:
• validationLevel: Specifies which operations are validated. Options are "strict", "moderate", or
"off". Use "strict" (the default) to apply the rules to all inserts and updates, "moderate" to apply
them to inserts and to updates of existing documents that already satisfy the rules (updates to
existing documents that do not satisfy them are not checked), and "off" to disable validation altogether.
• validationAction: Specifies the action to take when a validation rule is violated. Options
are "error" or "warn". Use "error" to reject the operation that violates the validation rule
and "warn" to log a warning but still allow the operation.
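For a collection that does not exist yet, the same validator and options can be passed to createCollection. The following sketch reuses the users schema from above in shortened form; the variable names are assumptions for illustration, and the database call only runs inside mongosh.

```javascript
// Shortened version of the users schema shown earlier.
const userSchema = {
  bsonType: "object",
  required: ["username", "email", "age"],
  properties: {
    username: { bsonType: "string", description: "Username of the user" },
    email: { bsonType: "string", pattern: "^\\S+@\\S+\\.\\S+$", description: "Email address of the user" },
    age: { bsonType: "int", minimum: 18, maximum: 120, description: "Age of the user" },
  },
};

// Options document for createCollection: validator plus the two settings.
const usersCollectionOptions = {
  validator: { $jsonSchema: userSchema },
  validationLevel: "strict",  // validate all inserts and updates
  validationAction: "error",  // reject documents that violate the schema
};

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.createCollection("users", usersCollectionOptions);
}
```

With these settings, an insert that omits a required field such as email, or whose age is below 18, should be rejected by the server with a validation error.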
Testing and Maintenance:
• Test Your Schema: Before deploying your JSON Schema to a production
environment, thoroughly test it with sample data to ensure that it behaves as
expected and enforces the desired constraints.
• Maintenance: Periodically review and update your JSON Schema as your
data model evolves. This ensures that your validation rules remain relevant
and effective over time.
Benefits of JSON Schema Validation
• Data Integrity: Enforce consistency and integrity of your data by defining strict
validation rules.
• Security: Protect against malicious or erroneous data insertion by validating
documents against predefined rules.
• Ease of Use: Leverage the simplicity and familiarity of JSON Schema to define
your validation rules.
By following these steps, you can create a JSON Schema with data validation in
MongoDB, ensuring that your documents adhere to predefined rules and constraints,
thus maintaining data integrity and consistency within your database.
Transactions
Let's consider a scenario where we need to transfer a student from one course to
another in a university system.
This operation involves updating two collections: students and courses. We want to
ensure that the student is removed from the old course and added to the new course
atomically to maintain data integrity.
Scenario:
• Suppose we have two collections: students and courses.
• Each document in the students collection represents a student, and each
document in the courses collection represents a course.
• Students are associated with courses through a courseId field in the student
document.
Implementing the Transaction:
1. Start a Transaction: Begin by starting a client session and a transaction on it.
2. Perform Operations within the Transaction:
1. Find the student document and set its courseId to the new course, which
removes the student from the old course.
2. Find the course document corresponding to the old course and decrement the
enrollmentCount.
3. Find the course document corresponding to the new course and increment
the enrollmentCount.
3. Commit or Abort the Transaction: Depending on the success or failure of the
operations within the transaction, commit or abort the transaction accordingly.
Considerations:
• The transaction ensures that all operations take effect together or not at all. If
any operation fails, the transaction is aborted and its changes are rolled back.
• By using a transaction, we maintain data integrity, ensuring that the student is
transferred from the old course to the new course atomically.
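The transfer scenario can be sketched as a function that works on a client session. This is an illustration under stated assumptions, not a definitive implementation: the function name transferStudent, the database name and the identifiers are hypothetical, and multi-document transactions require a replica set or sharded cluster rather than a standalone server.

```javascript
// session: a client session, e.g. db.getMongo().startSession() in mongosh.
function transferStudent(session, dbName, studentId, oldCourseId, newCourseId) {
  const database = session.getDatabase(dbName);
  session.startTransaction();
  try {
    // Move the student only if they are currently enrolled in the old course.
    const result = database.students.updateOne(
      { _id: studentId, courseId: oldCourseId },
      { $set: { courseId: newCourseId } }
    );
    if (result.matchedCount !== 1) {
      throw new Error("student not found in the old course");
    }
    // Keep both enrollment counters consistent with the move.
    database.courses.updateOne({ _id: oldCourseId }, { $inc: { enrollmentCount: -1 } });
    database.courses.updateOne({ _id: newCourseId }, { $inc: { enrollmentCount: 1 } });
    session.commitTransaction();
  } catch (error) {
    // Any failure rolls back every change made inside the transaction.
    session.abortTransaction();
    throw error;
  }
}

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  const session = db.getMongo().startSession();
  try {
    transferStudent(session, "university", 1, "C101", "C202");
  } finally {
    session.endSession();
  }
}
```

Including the old courseId in the student filter is a design choice: if the student has already been moved, the update matches nothing and the whole transaction is aborted instead of corrupting the counters.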
Geospatial queries
Why we need it
Geospatial queries in MongoDB allow you to perform spatial operations on
geospatial data, such as querying for documents based on their proximity to a point,
within a certain shape, or finding the nearest locations. MongoDB supports two
ways of representing geospatial data, each with its own index type and query
syntax: GeoJSON objects and legacy coordinate pairs.
GeoJSON:
GeoJSON is a format for encoding a variety of geographic data structures.
MongoDB uses GeoJSON objects to represent points, shapes, and other spatial data.
Here's how to use GeoJSON for geospatial queries:
a. Storing GeoJSON Data:

You can store GeoJSON data in MongoDB documents using the GeoJSON format:
{
  "location": {
    "type": "Point",
    "coordinates": [longitude, latitude]
  }
}
b. Creating Geospatial Index:

To enable efficient geospatial queries, you need to create a geospatial index on the
GeoJSON field:
db.collection.createIndex({ location: "2dsphere" });

c. Performing Geospatial Queries:
Now you can perform various geospatial queries:
Query by Location Proximity:
db.collection.find({
  location: {
    $near: {
      $geometry: {
        type: "Point",
        coordinates: [longitude, latitude]
      },
      $maxDistance: distanceInMeters
    }
  }
});
Query Within a Shape:
db.collection.find({
  location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [/* Define polygon vertices */]
      }
    }
  }
});
Legacy Coordinate Pairs:

Legacy coordinate pairs represent a location as a two-element array of coordinates,
with longitude first and latitude second. While GeoJSON is the preferred format,
MongoDB also supports legacy coordinate pairs for geospatial queries.
a. Storing Legacy Coordinate Pairs:

You can store longitude and latitude coordinates in MongoDB documents using
arrays:
{ "location": [longitude, latitude] }
b. Creating Geospatial Index:
Similarly, you need to create a geospatial index on the legacy coordinate pair field:
db.collection.createIndex({ location: "2d" });


c. Performing Geospatial Queries:
You can perform geospatial queries using legacy coordinate pairs:
Query by Location Proximity:
db.collection.find({
  location: {
    $near: [longitude, latitude],
    $maxDistance: distanceInRadians
  }
});
Query Within a Box:
db.collection.find({
  location: {
    $geoWithin: {
      $box: [[minLongitude, minLatitude],
             [maxLongitude, maxLatitude]]
    }
  }
});
Conclusion
• Indexing: Ensure that you create a geospatial index on the field containing the
geospatial data to optimize query performance.
• Accuracy: Pay attention to the precision of your coordinates and distance units
(e.g., radians or meters) to ensure accurate results.
• Projection: You can use projection to retrieve only specific fields of the matched
documents, reducing network overhead.
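The point about distance units can be made concrete with a small conversion sketch. It assumes an Earth radius of about 6378.1 km for the radian conversion; the coordinates, the radius and the helper names are hypothetical. The radian-based query uses the $centerSphere operator, which takes a [longitude, latitude] centre and a radius in radians.

```javascript
// Approximate equatorial radius of the Earth in kilometres (assumed value).
const EARTH_RADIUS_KM = 6378.1;

// GeoJSON queries express $maxDistance in meters.
function kmToMeters(km) {
  return km * 1000;
}

// Spherical queries on legacy coordinate pairs express distances in radians.
function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

const center = [-73.97, 40.77]; // hypothetical [longitude, latitude]
const radiusKm = 5;

// GeoJSON point with a 2dsphere index: distance in meters.
const geoJsonQuery = {
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: center },
      $maxDistance: kmToMeters(radiusKm),
    },
  },
};

// Circle on a sphere around a coordinate pair: radius in radians.
const centerSphereQuery = {
  location: { $geoWithin: { $centerSphere: [center, kmToRadians(radiusKm)] } },
};

console.log(JSON.stringify(geoJsonQuery));
console.log(JSON.stringify(centerSphereQuery));
```

Either query document can be passed to find() on a collection that has the matching geospatial index.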
Geospatial queries in MongoDB allow you to work with spatial data effectively,
enabling applications such as location-based services, geofencing, and spatial
analysis. By understanding the GeoJSON and legacy coordinate pairs formats and
leveraging MongoDB's geospatial indexing capabilities, you can perform a variety of
spatial operations efficiently within your NoSQL database.
