NoSQL Assignment


JOSÉ MARÍA PEREZ BELTRÁN

INDEX

1. Introduction

2. Theoretical basis, architecture and application of NoSQL databases

3. NoSQL – MongoDB

4. Benefits and disadvantages of NoSQL

5. Conclusion

1. Introduction
This work intends to present the main characteristics of NoSQL database systems,
discuss data representation models as well as establish their advantages and
disadvantages.

This document covers NoSQL, a newer family of database system architectures.

First, the theory of NoSQL databases, their architecture and their applications will be analyzed.

Second, MongoDB will be presented as an example of a NoSQL database.

2. Theoretical basis, architecture and application of NoSQL databases
BRIEF INTRODUCTION
In recent years, a wide variety of NoSQL databases have emerged, created mainly by companies to meet their own needs in areas such as scalability, performance and maintenance, for which they could not find an existing solution on the market. Because NoSQL databases differ so widely in the requirements they address and the functionality they offer, it is difficult to keep an overview of the current landscape of non-relational databases. NoSQL databases can be regarded as a separate category within the world of databases. Several concepts and characteristics common to NoSQL databases are discussed below, along with a general classification of the most important NoSQL databases according to their data model.

Relational database systems have long been the world's most widely used model for storing and retrieving information. The well-known phrase "one size fits all" sums up the philosophy behind their design: a single general-purpose model was expected to meet the database needs of companies and user communities alike.

In recent years this has changed. Needs are now different, and Big Data is largely responsible.

NoSQL MOVEMENT
The term NoSQL was first used in 1998 to refer to a relational database that did not use the SQL language. The term was revived in 2009 in talks by advocates of non-relational databases.

NoSQL FEATURES
-Avoid unnecessary complexity. Relational databases provide a great deal of functionality and constraints to maintain data consistency, in some cases much more than is necessary. Overall, this makes database operations slower; NoSQL databases drop part of this functionality to increase performance.

-High performance. Many NoSQL databases provide better performance than conventional RDBMS systems. To illustrate this, consider the volumes of data that large companies write into their NoSQL databases: Google's Bigtable, for example, is reported to process up to 20 petabytes per day (Lai 2009).

-Horizontal scalability and low-cost hardware. Unlike relational databases, NoSQL systems have been designed to scale horizontally: the software makes it simple to add or remove machines without a high operational cost.

-Lower complexity and cost of building a database cluster. This follows from the previous point: because these systems can add and remove nodes easily, building a cluster is simpler and cheaper.

-Compromising reliability for performance. Data reliability is a very important issue, but in some situations it is reasonable to trade a lower level of reliability for higher performance.

-The phrase "one size fits all" was, and still is, incorrect. Today there are many problems that cannot be solved with a traditional approach to databases. Many companies, especially Internet companies and web-oriented startups, have embraced NoSQL solutions, probably encouraged by the growing use and acceptance of the technology. However, many of these technologies were not mature enough (and many still are not), so companies have had to watch them grow and stabilize version by version. Care must therefore be taken when deciding to adopt this type of technology, since there are many solutions and types of databases to choose from.

-Trends in programming languages and development frameworks. For some time now, it has become common practice to keep the database access layer functionally independent of the rest of the code. This practice, already good for relational databases, is especially significant for the NoSQL movement, which has rushed to develop database connectors for many programming languages.

-Cloud computing requirements. In an interview, Dwight Merriman of 10gen (the company that develops and maintains MongoDB) named two main requirements for databases in cloud computing environments: high scalability, especially horizontal, and minimal administration time. In his view, the following kinds of databases would work well in a cloud environment:

1. Databases specialized in batch data storage and MapReduce operations.

2. Key-value stores.

3. Databases whose logic is closer to traditional database systems than to key-value stores, but without giving up the performance and scalability characteristics of the latter.

-Yesterday's needs versus today's needs. In the 1960s and 1970s, databases were designed to run on a single, very powerful server. Today, by contrast, many companies, especially web-oriented ones, run many cheaper machines on the assumption that they will fail and need to be replaced, and applications must be designed accordingly. Amazon deals with this daily in its AWS service: it stresses that anything can fail, and that your application must be prepared for the loss of part of the hardware.

CAP THEOREM
CAP stands for:

- Consistency. A distributed system is usually said to be consistent if, after a write operation, all subsequent reads see that update, regardless of the part of the system they read from.

- Availability. A system is highly available when it has been designed and implemented so that it keeps operating (serving reads and writes) even when a node becomes unavailable or some hardware has to be removed because of faults or upgrades.

- Partition Tolerance. The ability of a system to keep working when the network is split into different regions or logical divisions and one of these parts remains unreachable for a while.

The CAP theorem (also known as Brewer's theorem (Brewer 2012)) states that a distributed system cannot simultaneously guarantee all three of these properties; at most it can guarantee two of them.

The CAP theorem implies that, because any distributed data store is vulnerable to network connectivity failures, it has to tolerate partitions between its nodes, and while a partition lasts it must compromise between keeping the information accessible and guaranteeing that its most recent version is returned.

The CAP Theorem gives us three options for combinations of pairs of attributes that can
be guaranteed at the same time. Let's see what they are:

-CA: Consistency and Availability. Access to the information is guaranteed and the value of the data is consistent (the same) for all requests; changes are visible immediately. However, the system does not tolerate partitions between its nodes. Examples: relational databases (Oracle, MySQL, SQL Server), Neo4j.
Figure 3: AC

-AP: Availability and Partition Tolerance. Access to the data is guaranteed and the system tolerates (manages) partitions between nodes, but consistency takes a back seat: updates are not replicated to all nodes instantly, so different nodes may temporarily return different values. Examples: DynamoDB, CouchDB, Cassandra.

Figure 4: PA

-CP: Consistency and Partition Tolerance. Consistency of the data across nodes is guaranteed and partitions between nodes are tolerated, at the expense of availability: the system may fail a request or take time to respond to the user. Examples: MongoDB, HBase, Redis.

Figure 5: PC
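To make the trade-off concrete, here is a toy model in JavaScript (Node.js) of a single value replicated on three nodes. It is a hypothetical illustration, not how MongoDB, Cassandra or any other system named above works: the Cluster class, the majority-quorum rule used for the CP mode and the "v1"/"v2" values are assumptions, and healing a partition and reconciling replicas afterwards are not modeled.

```javascript
// Toy model of the CAP trade-off for one replicated value (hypothetical, illustration only).
class Cluster {
  constructor(nodeCount, mode) {
    this.mode = mode; // "CP" or "AP"
    this.nodes = Array.from({ length: nodeCount }, () => ({ value: null }));
    this.groups = [this.nodes.map((_, i) => i)]; // initially every node can reach every other
  }

  // Split the network, e.g. [[0, 1], [2]]: node 2 can no longer reach nodes 0 and 1.
  partition(groups) {
    this.groups = groups;
  }

  groupOf(node) {
    return this.groups.find((g) => g.includes(node));
  }

  // A group has a quorum when it contains a strict majority of all nodes.
  hasQuorum(group) {
    return group.length > this.nodes.length / 2;
  }

  write(node, value) {
    const group = this.groupOf(node);
    if (this.mode === "CP" && !this.hasQuorum(group)) {
      throw new Error("unavailable: no majority reachable"); // CP gives up availability
    }
    // Replicate only to the nodes that are reachable right now.
    for (const i of group) this.nodes[i].value = value;
  }

  read(node) {
    const group = this.groupOf(node);
    if (this.mode === "CP" && !this.hasQuorum(group)) {
      throw new Error("unavailable: no majority reachable");
    }
    return this.nodes[node].value; // in AP mode this may be stale
  }
}

const cp = new Cluster(3, "CP");
const ap = new Cluster(3, "AP");
for (const cluster of [cp, ap]) {
  cluster.write(0, "v1");
  cluster.partition([[0, 1], [2]]); // node 2 is cut off
  cluster.write(0, "v2"); // accepted on the majority side in both modes
}
console.log(ap.read(2)); // prints v1: the AP cluster answers, but with stale data
try {
  cp.read(2);
} catch (e) {
  console.log(e.message); // the CP cluster refuses rather than return stale data
}
```

Both clusters tolerate the partition; they differ only in what the isolated node does, which is exactly the choice between C and A that the theorem describes.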
ACID vs. BASE
The relational database world is familiar with ACID transactions.

Relational database management systems, regardless of vendor, are designed so that transactions executed through SQL comply with the ACID properties. The name comes from the four guarantees they provide: Atomicity, Consistency, Isolation and Durability.

- Atomicity. A transaction is either executed completely or not at all; it can never be left halfway.

- Consistency. The data saved after a transaction must always be valid, that is, it must satisfy the database's integrity rules.

- Isolation. Concurrent transactions are independent and do not interfere with each other.

- Durability. Once a transaction has been committed, its changes persist over time, even after a system failure.

Figure 1: Acid Transactions
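Atomicity and consistency can be illustrated with the classic bank transfer. The JavaScript (Node.js) sketch below is hypothetical: transfer() works on an in-memory object rather than a real database, and it only models atomicity (all-or-nothing writes with rollback) plus one integrity rule; isolation and durability are not modeled.

```javascript
// Hypothetical in-memory example: transfer() is not a real database API.
// Either both balance updates are kept (commit) or neither is (rollback).
function transfer(accounts, from, to, amount) {
  const snapshot = { ...accounts }; // state before the transaction starts
  try {
    if (!(from in accounts) || !(to in accounts)) {
      throw new Error("unknown account");
    }
    accounts[from] -= amount; // first write
    if (accounts[from] < 0) {
      // Integrity rule violated halfway through: abort the whole transaction.
      throw new Error("insufficient funds");
    }
    accounts[to] += amount; // second write
    return true; // commit
  } catch (err) {
    Object.assign(accounts, snapshot); // rollback: undo any partial write
    return false;
  }
}

const accounts = { alice: 100, bob: 20 };
transfer(accounts, "alice", "bob", 30); // commits: alice 70, bob 50
transfer(accounts, "bob", "alice", 500); // aborts: balances stay alice 70, bob 50
console.log(accounts);
```

The failed transfer never leaves bob with a negative balance or alice with money that was not debited: the transaction is never left halfway.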

The BASE model is an approach related to ACID that gives up strong consistency and isolation in favor of availability, graceful degradation and performance. The name BASE comes from:

-Basically Available. The system keeps working even when some part fails, because storage follows the principles of distribution and replication.

-Soft State. The nodes do not have to be consistent with each other all the time.

- Eventual Consistency. Consistency is reached eventually: if no new updates are made, all nodes end up with the same data. By contrast, a relational database management system is expected to comply with the ACID model.

ACID                              BASE
Strong consistency                Weak consistency
Isolation                         Availability
Focused on commits                Best effort
Nested transactions               Approximate answers
More conservative                 More optimistic
Complicated (schema) evolution    Easier evolution
-                                 Easier and faster

Table 1: ACID vs. BASE

The non-relational model follows the BASE approach more closely, on the premise that applications must work most of the time (basic availability), need not be consistent at every moment (soft state), and will become consistent eventually (eventual consistency).
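Soft state and eventual consistency can be sketched with replicas that accept writes independently and later exchange their data. The JavaScript (Node.js) example below assumes last-writer-wins, one common conflict-resolution strategy, with caller-supplied logical timestamps; Replica and syncReplicas are hypothetical names, and real systems differ in how they timestamp, detect and resolve conflicts.

```javascript
// Hypothetical sketch of eventual consistency with last-writer-wins (LWW) replicas.
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // key -> { value, ts }
  }

  write(key, value, ts) {
    const current = this.data.get(key);
    // Keep the write with the larger timestamp; ties are broken by value so
    // that every replica picks the same winner.
    if (!current || ts > current.ts || (ts === current.ts && String(value) > String(current.value))) {
      this.data.set(key, { value, ts });
    }
  }

  read(key) {
    const entry = this.data.get(key);
    return entry ? entry.value : undefined;
  }
}

// Anti-entropy round: every replica applies every other replica's entries.
function syncReplicas(replicas) {
  for (const source of replicas) {
    for (const [key, { value, ts }] of source.data) {
      for (const target of replicas) target.write(key, value, ts);
    }
  }
}

const r1 = new Replica("r1");
const r2 = new Replica("r2");
r1.write("cart", "book", 1);
r2.write("cart", "book+pen", 2); // a later update accepted by another node
console.log(r1.read("cart"), r2.read("cart")); // book book+pen (soft state: replicas disagree)
syncReplicas([r1, r2]);
console.log(r1.read("cart"), r2.read("cart")); // book+pen book+pen (converged)
```

Between the write and the synchronization both replicas stay available but disagree; after it they agree, which is the BASE behavior described above.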

TYPES
-Key-value: Key-value databases are highly partitionable and allow horizontal scaling at levels that other types of databases cannot reach. Use cases such as gaming, ad tech and IoT lend themselves particularly well to the key-value data model. Amazon DynamoDB is designed to provide consistent single-digit millisecond latency for workloads of any scale. This consistent performance is one of the main reasons why Snapchat's Stories feature, which includes Snapchat's largest storage write workload, was moved to DynamoDB.

-Documents: In application code, data is often represented as a JSON object or document, because it is an efficient and intuitive data model for developers. Document databases make it easy for developers to store and query data using the same document model format that they use in application code. The flexible, semi-structured and hierarchical nature of documents and document databases allows them to evolve with the needs of applications. The document model works well for catalogs, user profiles and content management systems, where each document is unique and evolves over time.

-Graphs: The purpose of a graph database is to make it easier to build and run applications that work with highly connected data sets. Typical use cases for a graph database include social media, recommendation engines, fraud detection and knowledge graphs. Amazon Neptune is a fully managed graph database service. Neptune supports both the Property Graph model and the Resource Description Framework (RDF), offering a choice of two graph APIs: TinkerPop and RDF/SPARQL. Other popular graph systems include Neo4j and Apache Giraph (the latter is a graph-processing framework rather than a database).

-Search: Many applications generate logs to help developers troubleshoot issues. Amazon OpenSearch Service is designed to provide real-time visualization and analysis of machine-generated data by indexing, aggregating and searching semi-structured logs and metrics. Amazon OpenSearch Service is also a powerful, high-performance search engine for full-text search use cases. Expedia uses more than 150 Amazon OpenSearch Service domains, 30 TB of data and 30 billion documents for a variety of critical use cases, ranging from operational monitoring and troubleshooting to distributed application stack tracing and price optimization.

CONSISTENT HASHING
This idea arose as a solution to a problem in key-value databases. During normal operation, the keys to be stored are distributed more or less evenly among all the servers that make up the storage cluster, for example by taking the hash of each key modulo the number of servers. The problem appears when one or more nodes have to be added: with such a scheme, almost all keys must be redistributed. Consistent hashing limits this redistribution: a new server only takes over keys from the nodes that previously held them (in a cache cluster, from the other cache nodes rather than from the main database), while the remaining keys stay where they were. There are two widely used approaches to accomplish this:

-Assign each server a single point on the hash ring, so that the servers are initially spread evenly among the keys. When a new server has to be inserted, the number of keys between each pair of servers is calculated and the new node is placed where the concentration of keys is highest, relieving the part of the cluster where most hotspots are concentrated. To find the server responsible for a key, a client hashes the key, locates that point on the ring and moves clockwise until it reaches a server. That node holds, or should hold, the value being looked for.

-Divide the ring into partitions. Their number should be fairly large, since it never changes during the life of the storage cluster. Each node is divided into subnodes, and each subnode is mapped to a portion of the ring. Thus, each node has subnodes distributed throughout the entire ring, and each subnode is responsible for a portion of the circle, that is, a small set of keys. Shrinking the cluster by one node simply means that the remaining nodes add subnodes in the gaps left free. Conversely, to add a new node to the cluster, the nodes already occupying the ring give up some of their subnodes to the new node joining the storage cluster.
Figure 2: consistent hashing
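The lookup described above can be sketched in JavaScript (Node.js). This is a minimal sketch, not the implementation used by any particular database: it places each server on the ring through several virtual nodes (the "subnodes" of the second approach) and resolves a key by walking clockwise, as in the first approach, but it does not implement the fixed partition count. HashRing, ringPosition and the server names are hypothetical, and MD5 is an arbitrary choice of hash function.

```javascript
const { createHash } = require("crypto");

// Map a string to a position on a 32-bit ring. MD5 is used only because it is
// built into Node.js and spreads values evenly; it is not a security choice here.
function ringPosition(key) {
  return createHash("md5").update(key).digest().readUInt32BE(0);
}

class HashRing {
  constructor(vnodesPerServer = 100) {
    this.vnodesPerServer = vnodesPerServer; // subnodes (virtual nodes) per server
    this.points = []; // { pos, server } entries kept sorted by pos
  }

  addServer(server) {
    for (let i = 0; i < this.vnodesPerServer; i++) {
      this.points.push({ pos: ringPosition(`${server}#${i}`), server });
    }
    this.points.sort((a, b) => a.pos - b.pos);
  }

  removeServer(server) {
    // The removed server's arcs fall to the next points clockwise (other servers).
    this.points = this.points.filter((p) => p.server !== server);
  }

  // Hash the key, then walk clockwise to the first virtual node at or after it.
  getServer(key) {
    if (this.points.length === 0) throw new Error("ring is empty");
    const pos = ringPosition(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      // Binary search for the first point with point.pos >= pos.
      const mid = (lo + hi) >> 1;
      if (this.points[mid].pos < pos) lo = mid + 1;
      else hi = mid;
    }
    // Past the last point, wrap around to the start of the ring.
    return this.points[lo % this.points.length].server;
  }
}

const ring = new HashRing();
["cache-a", "cache-b", "cache-c"].forEach((s) => ring.addServer(s));
const keys = Array.from({ length: 1000 }, (_, i) => `user:${i}`);
const before = keys.map((k) => ring.getServer(k));
ring.addServer("cache-d");
const moved = keys.filter((k, i) => ring.getServer(k) !== before[i]);
// Roughly a quarter of the keys move (the exact count depends on the hashes),
// and every key that moves goes to cache-d.
console.log(`${moved.length} of ${keys.length} keys moved`);
```

The last lines show the property that motivates the technique: adding a server only moves the keys that now fall on its arcs, instead of remapping almost every key as a hash-modulo-server-count scheme would.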

MapReduce
MapReduce is a data processing paradigm divided into two distinct phases: Map and Reduce. In the Map phase, each node processes its portion of the input and emits intermediate key-value pairs; in the Reduce phase, the values that share the same key are combined into a final result. The tasks of both phases are executed in a distributed way on different processing (worker) nodes. A master process controls and manages their execution and is also in charge of accepting new jobs sent to the system by clients.

This processing model relies on distributed data storage technologies, on whose nodes the map and reduce operations are executed. In Hadoop, this role is played by HDFS (Hadoop Distributed File System), which stores files divided into data blocks. HDFS provides the block-level division of the data that MapReduce needs in order to run. The processing results can be stored in the same storage system or in an external database or system.

Figure 6: Map reduce
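The two phases can be illustrated with the classic word-count example. The sketch below is a hypothetical single-process simulation in JavaScript (Node.js): mapPhase, shuffle, reducePhase and mapReduce are illustrative names, and the grouping step stands in for the shuffle that a framework such as Hadoop performs between map and reduce tasks running on different nodes.

```javascript
// Single-process simulation of MapReduce word count (illustration only).
// Map: emit a (word, 1) pair for every word in one input split.
function mapPhase(line) {
  return line.toLowerCase().split(/\W+/).filter(Boolean).map((w) => [w, 1]);
}

// Shuffle: group intermediate values by key, as the framework does between phases.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: combine all values that share a key into one result.
function reducePhase(key, values) {
  return [key, values.reduce((sum, v) => sum + v, 0)];
}

function mapReduce(splits) {
  const intermediate = splits.flatMap((split) => mapPhase(split)); // one map task per split
  const result = {};
  for (const [key, values] of shuffle(intermediate)) {
    const [k, total] = reducePhase(key, values); // one reduce call per key
    result[k] = total;
  }
  return result;
}

const splits = ["the quick brown fox", "the lazy dog", "The fox"];
console.log(mapReduce(splits)); // { the: 3, quick: 1, brown: 1, fox: 2, lazy: 1, dog: 1 }
```

Because each map call only sees its own split and each reduce call only sees one key, both phases can be spread across many nodes, which is what allows MapReduce to scale.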

Finally, bringing together some concepts seen previously, Consistent Hashing can be
used as a form of orchestration for the MapReduce paradigm.

3. NoSQL – MongoDB
MongoDB (from "humongous") is a scalable, high-performance, open-source NoSQL database. It stores data structures as JSON-like (JavaScript Object Notation) documents with a dynamic schema, which makes integrating data in certain applications easier and faster. This type of NoSQL database handles four elements:

-Database: contains a set of collections.

-Collection: contains a set of documents. It can be compared to a table in the relational model, but documents with different attributes can be stored in the same collection.

-Document: a set of fields.

-Field: a key-value pair, where the key is the name of the field and the value is its content.

Practical example:

use dc
db.history.insertOne({
  "name": "The Dark Knight Returns",
  "Pages": 25,
  "characters": [
    { "name": "Batman", "Earth": "Earth 1" },
    { "name": "Superman", "Earth": "Earth 1" }
  ]
})

Query:

db.history.find({ "characters.name": "Batman" })

The example shows how a single document can store both the information about a comic (its name and number of pages) and the characters that appear in it, together with the Earth each of them lives on. The query returns every comic in the history collection in which Batman appears.
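The query works because MongoDB resolves a dotted path such as "characters.name" through the array of embedded documents. The following JavaScript sketch imitates that behavior for simple equality queries only; matches and valuesAtPath are hypothetical helpers, not MongoDB's implementation, and they ignore query operators such as $gt as well as numeric array indexes in paths.

```javascript
// Hypothetical, simplified matcher illustrating dotted-path equality queries.
// Collect every value reachable from `value` by following the path `parts`.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) {
    // Descend into every element of an array, as MongoDB does for embedded arrays.
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (value !== null && typeof value === "object" && parts[0] in value) {
    return valuesAtPath(value[parts[0]], parts.slice(1));
  }
  return []; // path does not exist in this document
}

// A document matches when every field in the query has at least one equal value.
function matches(doc, query) {
  return Object.entries(query).every(([path, expected]) => {
    const found = valuesAtPath(doc, path.split("."));
    // Equality also matches a field that holds an array containing the value.
    return found.some((v) => v === expected || (Array.isArray(v) && v.includes(expected)));
  });
}

const comic = {
  name: "The Dark Knight Returns",
  Pages: 25,
  characters: [
    { name: "Batman", Earth: "Earth 1" },
    { name: "Superman", Earth: "Earth 1" },
  ],
};
console.log(matches(comic, { "characters.name": "Batman" })); // true
console.log(matches(comic, { "characters.name": "Joker" })); // false
```

This is why the comic and its characters can live in one document: no join is needed to find comics by character name.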

WHAT IS BSON?

BSON is a binary format used to store information in MongoDB; it is a binary encoding of JSON-like documents. This encoding was chosen because it offers certain advantages for storing data, such as efficient traversal and support for data types that JSON lacks, like dates and binary data. In short, BSON and JSON are the formats MongoDB works with: JSON is the format in which information is presented to users and applications, and BSON is the format MongoDB uses internally.
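To show what "binary encoding" means in practice, the sketch below encodes a very small subset of BSON by hand in JavaScript (Node.js): UTF-8 strings (type 0x02), 32-bit integers (type 0x10) and embedded documents (type 0x03), following the layout of the BSON specification at bsonspec.org. It is an illustration under those assumptions; encodeDocument and encodeElement are hypothetical names, and real applications should use an official library such as the bson npm package.

```javascript
// Minimal BSON encoder for a tiny subset of the format (illustration only).
// A C-style string: UTF-8 bytes followed by a 0x00 terminator.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0x00])]);
}

function encodeElement(name, value) {
  if (typeof value === "string") {
    const bytes = cstring(value);
    const len = Buffer.alloc(4);
    len.writeInt32LE(bytes.length, 0); // string length includes the trailing 0x00
    return Buffer.concat([Buffer.from([0x02]), cstring(name), len, bytes]);
  }
  if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
    const v = Buffer.alloc(4);
    v.writeInt32LE(value, 0); // little-endian 32-bit integer
    return Buffer.concat([Buffer.from([0x10]), cstring(name), v]);
  }
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    return Buffer.concat([Buffer.from([0x03]), cstring(name), encodeDocument(value)]);
  }
  throw new Error(`type not supported in this sketch: ${name}`);
}

// document ::= int32 total size, list of elements, 0x00 terminator
function encodeDocument(doc) {
  const elements = Buffer.concat(Object.entries(doc).map(([k, v]) => encodeElement(k, v)));
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + elements.length + 1, 0);
  return Buffer.concat([total, elements, Buffer.from([0x00])]);
}

console.log(encodeDocument({ hello: "world" }));
// <Buffer 16 00 00 00 02 68 65 6c 6c 6f 00 06 00 00 00 77 6f 72 6c 64 00 00>
```

Every document starts with its total length in bytes, so a reader can skip over a whole document or field without parsing it, which is part of what makes BSON efficient to traverse.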

4. Benefits and disadvantages of NoSQL


Advantages of a non-relational database or NoSQL database

NoSQL databases ("NoSQL" meaning non-relational databases) are well suited to many modern applications, such as mobile, web and gaming applications, that require flexible, scalable, high-performance and highly functional databases to provide great user experiences.

-Scalability and a decentralized nature: they support distributed structures.

-Flexibility: they tend to be much more open and flexible databases. NoSQL databases generally offer flexible schemas that allow faster, more iterative development, and the flexible data model makes them ideal for semi-structured and unstructured data. They adapt to project needs much more easily than entity-relationship models.

-Schema changes can be made without stopping the database.

-Horizontal scalability: they can grow by adding machines rather than having to run on large machines. NoSQL databases are generally designed to scale across distributed clusters of hardware instead of scaling by adding expensive, robust servers. Some cloud providers handle these operations in the background as a fully managed service.

-They can run on machines with few resources.

-Highly functional: NoSQL databases provide highly functional APIs and data types designed specifically for their respective data models.

-Queries optimized for large amounts of data: NoSQL databases are optimized for specific data models and access patterns, which allows higher performance than trying to achieve similar functionality with relational databases.

Disadvantages of a NoSQL database

-Not all NoSQL databases support atomic operations and data integrity; many of them offer only what is called eventual consistency.
-Incompatibility with SQL: newer databases use their own query languages and are not 100% compatible with the SQL of relational databases. Troubleshooting queries in a NoSQL database is therefore more complicated.
-Lack of standardization: there are many NoSQL databases and still no standard comparable to the one for relational databases, so the future of these databases is uncertain.
-Cross-platform support: some systems still need many improvements to support operating systems other than Linux.
-Poor usability: their administration tools are often not very usable, or administration is done through the console.

5. Conclusion
NoSQL databases are now one more option in the range of alternatives for storing your applications' data. There are several types, but in general their main objective is to solve the performance and scalability problems of RDBMSs. On the other hand, RDBMSs are by no means going away: their transactional capabilities make them well suited to most existing applications. However, they will probably undergo changes. Just as object-oriented databases influenced the evolution of RDBMSs in the past, we will see many NoSQL ideas applied to relational databases in the future. Applications will also use more than one type of database: the one that suits the application and, beyond that, the one that suits a particular use case within it, so developments that combine several types of databases will not be rare. The point is to keep adapting, lose the fear of leaving the safety of an RDBMS and start using other alternatives. Many NoSQL databases are already production quality, and some even have commercial support available and are backed by major companies.

