Work NoSQL
INDEX
1. Introduction
3. NoSQL – MongoDB
5. Conclusion
1. Introduction
This work presents the main characteristics of NoSQL database systems, discusses their
data representation models, and sets out their advantages and disadvantages.
First, the theory of NoSQL databases, their architecture and their applications are
analyzed.
Relational database systems have long been the world's most widely used model for
storing and retrieving information. The famous phrase "one size fits all" reflects the way
relational databases were designed: as a single model meant to cover the database needs
of every company and user community.
In recent years this has changed. The needs are now different, and Big Data is largely
responsible.
NoSQL MOVEMENT
The term NoSQL was first used in 1998 to refer to a relational database that did not use
the SQL language. The term was revived in 2009 in talks given by advocates of
non-relational databases.
NoSQL FEATURES
-Avoid unnecessary complexity. Relational databases provide a great deal of
functionality and constraints to maintain data consistency, in some cases much more
than is necessary. Overall, this makes database operations take longer; NoSQL systems
drop the features an application does not need in order to increase performance.
-Horizontal scalability and low cost hardware. Unlike relational databases, NoSQL
systems have been designed to scale horizontally. The software is designed to be able
to add or remove machines in a simple way without having a really high operational cost.
-Complexity and cost of building a database cluster. This is related to the previous
point: NoSQL systems can add and remove nodes easily, which avoids much of the
complexity and cost of building a relational database cluster.
-Compromising reliability for performance. Data reliability is a very important issue, but
there are certain times when you can demand an increase in performance in exchange
for a lower level of reliability.
-The phrase "one size fits all" has been and is incorrect. Currently, a large number of
problems cannot be addressed with the traditional approach to databases. Many
companies, especially startups and Internet-oriented companies, have embraced NoSQL
solutions with force, surely encouraged by the increasing use and acceptance of the
technology. But many of these technologies were not mature enough (many still are not
today), so those companies have had to watch them grow and stabilize version by
version. Care must be taken when deciding to adopt this type of technology, since there
are currently many solutions and types of databases.
-Cloud Computing requirements. In an interview, Dwight Merriman of 10gen (the
company that develops and maintains MongoDB) mentions the two main requirements
for databases in cloud computing environments: high scalability, especially horizontal,
and minimal administration times. From his point of view, the following databases
would work well in a cloud environment.
-Yesterday's needs versus today's needs. In the 1960s and 1970s, databases were
designed to run on a single, very powerful server. This is contrary to the current trend
at many companies, especially web-oriented ones, which use several cheaper machines
on the expectation that they will fail and need to be replaced. Applications also have to
be designed accordingly. This is something that Amazon, with its AWS service, has to
deal with on a daily basis: Amazon repeatedly explains that everything can fail, and that
it is your application that must be prepared to face a possible loss of part of the hardware.
CAP THEOREM
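CAP stands for:
- Consistency. Every request sees the same, most recent value of the data, regardless of
the node that serves it.
- Availability. High availability occurs when the system has been designed and
implemented so that it can continue to operate (reads, writes) even after a node
becomes unavailable or some hardware has to be taken out of service because of
failures or updates.
- Partition tolerance. The system continues to operate even when network failures split
the nodes into groups that cannot communicate with each other.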
The CAP theorem (also known as Brewer's theorem (Brewer 2012)) states that it is
impossible for a distributed system to simultaneously guarantee these 3 characteristics.
However, the CAP theorem also says that you can guarantee 2 of these 3 properties.
The CAP theorem implies that any distributed data storage system is vulnerable to
network connectivity failures; therefore, once partitions between the nodes have to be
tolerated, the system must make a trade-off between access to the information
(availability) and access to its most recent version (consistency).
The CAP Theorem gives us three options for combinations of pairs of attributes that can
be guaranteed at the same time. Let's see what they are:
-CA: Consistency and Availability- Access to the information is guaranteed and the value
of the data is consistent (the same) for all requests served; if there are changes, they are
visible immediately. However, the system does not tolerate partitioning of the nodes.
Examples: relational databases (Oracle, MySQL, SQL Server), Neo4j.
Figure 3: CA
-AP: Availability and Partition Tolerance- Access to the data is guaranteed and the
system tolerates (manages) the partitioning of the nodes, but consistency takes second
place: it is not preserved, because a data value is not replicated to the other nodes
instantly. Examples: DynamoDB, CouchDB, Cassandra.
Figure 4: AP
-CP: Consistency and Partition Tolerance- The consistency of the data between the
different nodes is guaranteed and the partitioning of the nodes is tolerated, but
availability is sacrificed, so the system may fail or take time to respond to the user's
request. Examples: MongoDB, HBase, Redis.
Figure 5: CP
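The difference between the AP and CP combinations can be illustrated with a toy model. The sketch below is not how any of the databases named above is implemented; the `Replica` class, its `mode` labels and the helper `read_during_partition` are hypothetical names chosen for the illustration. It only shows the choice a node faces while it is cut off from its peers: answer with a possibly stale value, or refuse to answer.

```python
class Replica:
    """One node holding a local copy of a single value (toy model)."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP": labels used only in this sketch
        self.value = "v1"         # last value this node received
        self.partitioned = False  # True when the node cannot reach its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse to answer rather than risk a stale value.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP (or a healthy CP node): answer with the local copy, which may be stale.
        return self.value


def read_during_partition(mode):
    """Simulate a read on a node that is isolated from the rest of the cluster."""
    replica = Replica(mode)
    replica.partitioned = True  # a newer value written elsewhere never arrives
    try:
        return replica.read()
    except RuntimeError as err:
        return str(err)


print(read_during_partition("AP"))  # the stale local value
print(read_during_partition("CP"))  # an error message instead of data
```

In the AP case the user always gets an answer; in the CP case the user never gets a wrong one. A real system makes this choice per operation and with far more nuance.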
ACID vs. BASE
The relational database world is familiar with ACID transactions.
Transactions expressed in SQL are designed to comply with the ACID properties,
regardless of the database management system. They are called this because they
guarantee Atomicity, Consistency, Isolation and Durability.
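- Atomicity. A transaction must be executed completely or not at all; it cannot be left
halfway.
- Consistency. The data that is saved after the transaction must always be valid data.
- Isolation. Transactions that run at the same time must not interfere with each other.
- Durability. Once a transaction has been committed, its changes are not lost, even if the
system fails afterwards.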
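Atomicity is easy to observe with a small experiment. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `accounts` table, the account names and the `transfer` helper are assumptions made up for the illustration. A transfer that would leave a negative balance raises an error inside the transaction, and the database is left exactly as it was before.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # both UPDATEs were undone together


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and neither balance changes, while a transfer of 30 succeeds and updates both rows together.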
The BASE model is an alternative approach to ACID that gives up strict consistency and
isolation in favor of availability, graceful degradation, and performance. The BASE model
takes its name from:
-Basically Available. The system works even when some part fails, because the storage
follows the principles of distribution and replication.
-Soft State. The nodes do not have to be consistent with each other all the time.
-Eventual consistency. If no new updates arrive, all nodes end up holding the same data.
ACID                     BASE
-Strong consistency      -Weak consistency
-Isolation               -Availability
                         -Easier evolution
The non-relational model adheres more closely to the BASE approach: applications must
work most of the time (basic availability), they do not have to be consistent at every
moment (soft state), but they must become consistent eventually (eventual consistency).
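Soft state and eventual consistency can be sketched with two in-memory nodes. This is a simplified illustration, not the replication protocol of any particular database: the `Node` class, the integer version numbers and the `anti_entropy` function are assumptions made for the example. Each node accepts writes on its own, so a read on the other node may be stale until the two exchange their data.

```python
class Node:
    """A replica that stores key -> (version, value) pairs."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Exchange entries so both nodes keep the highest version of each key."""
    for key in set(a.data) | set(b.data):
        newest = max(
            a.data.get(key, (0, None)),
            b.data.get(key, (0, None)),
            key=lambda entry: entry[0],  # compare by version only
        )
        a.data[key] = b.data[key] = newest


n1, n2 = Node(), Node()
n1.write("cart", ["book"], version=1)
stale = n2.read("cart")            # soft state: n2 has not seen the write yet
anti_entropy(n1, n2)               # replication runs at some later moment
converged = n2.read("cart")        # both nodes now agree
```

Between the write and the call to `anti_entropy` the nodes disagree, which is exactly what the BASE approach allows; afterwards they hold the same value.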
TYPES
-Key-value: Key-value databases are highly partitionable and allow horizontal scaling at
scales that other types of databases cannot reach. Use cases such as gaming, ad tech, and
IoT lend themselves particularly well to the key-value data model. Amazon DynamoDB is
designed to provide consistent single-digit millisecond latency at any scale of
workload. This consistent performance is one of the main reasons why Snapchat's
Stories feature, which includes Snapchat's largest storage write workload, was moved to
DynamoDB.
-Graph: The purpose of a graph database is to make it easier to build and run
applications that work with highly connected data sets. Typical use cases for a graph
database include social media, recommendation engines, fraud detection, and
knowledge graphs. Amazon Neptune is a fully managed graph database service.
Neptune supports both the Property Graph model and the Resource Description
Framework (RDF), which offers a choice of two graph APIs: TinkerPop and RDF/SPARQL.
Popular graph databases include Neo4j and Giraph.
CONSISTENT HASHING
This idea arose as a solution to a problem found in key-value databases. During normal
operation, the keys to be stored are distributed more or less evenly among all the
servers that make up the storage cluster. The problem occurs when one or more nodes
have to be added to the cluster and, with a naive distribution scheme, nearly all keys
have to be redistributed. Consistent hashing addresses this redistribution problem by
mapping both keys and servers onto a circular hash space (the ring), so that a new server
obtains its keys from the cache nodes that previously had them assigned, and not
directly from the main database. There are two widely used approaches to accomplish
this task:
-Mark each server with a single point on the ring so that, initially, the servers are evenly
spaced and the keys are evenly distributed among them. When a new server has to be
inserted, the number of keys that fall between each pair of servers is calculated, and the
new node is placed where the concentration of keys is highest, relieving the part of the
cluster with the most hotspots. When a client has a key and wants to find the server it
has to ask, it only has to compute the hash of the key, locate that point on the ring and
move clockwise until it finds a server. That is the node that contains, or should contain,
the value being sought.
-Divide the ring into partitions. The number of partitions should be fairly large, since it
will never change during the life of the storage cluster. Each node is divided into
subnodes, and each subnode is mapped to a portion of the ring. Thus, each node has
subnodes distributed around the entire ring, and each subnode takes charge of a portion
of the circle, that is, a reduced set of keys. Removing a node from the cluster simply
means that the remaining nodes add subnodes in the gaps left free. Conversely, to add a
new node to the cluster, the nodes already occupying the ring give up a number of
subnodes to the new node that joins the storage cluster.
Figure 2: consistent hashing
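The clockwise lookup and the use of subnodes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any specific product: it places each server's subnodes at hashed positions on the ring rather than on a fixed set of partitions, it uses MD5 only as a convenient way to spread points, and the names `HashRing`, `ring_position`, the server names and the default of 100 subnodes are all hypothetical.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a point on the ring (here, the 128-bit MD5 space)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, subnodes=100):
        self.subnodes = subnodes  # points per server (illustrative default)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name

    def add(self, server):
        for i in range(self.subnodes):
            point = ring_position(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        """Walk clockwise from the key's position to the first server point."""
        i = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[i % len(self._points)]]


ring = HashRing()
for name in ("server-a", "server-b", "server-c"):
    ring.add(name)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("server-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is visible in `moved`: every key that changes owner goes to the new server, and all other keys stay where they were. With a plain `hash(key) % number_of_servers` scheme, most keys would change owner instead.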
MapReduce
MapReduce is a data processing paradigm divided into two distinct phases or steps: Map
and Reduce. The subprocesses associated with a job are executed in a distributed way on
different processing nodes, or slaves. A Master process controls and manages their
execution, and is also in charge of accepting new jobs sent to the system by clients.
Finally, tying together concepts seen previously, consistent hashing can be used to
decide how the work of the MapReduce paradigm is distributed among the nodes.
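The two phases can be illustrated with the classic word count. The sketch below runs in a single process, so it shows only the shape of the computation, not the distribution across nodes or the role of the master; the function names are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all the values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = word_count(["NoSQL scales out", "SQL scales up"])
```

In a real deployment each call to `map_phase` would run on the node that holds that input split, and each key group would be sent to the node responsible for reducing it.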
3. NoSQL – MongoDB
MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL
database. It stores data structures in JSON-like (JavaScript Object Notation) documents
with a dynamic schema, making the integration of data in certain applications easier and
faster. In this type of NoSQL database, four elements are handled:
-Field: a pair made up of a key and a value, where the key is the name of the field and
the value is its content.
Practical example:
})
Query:
db.history.find({"characters.name": "Batman"})
The example shows how a single document can store both the information of a comic
(its name and number of pages) and the names of the characters that appear in it (in
addition to the land where they live).
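To make the shape of such a document concrete, here is a sketch in plain Python. The document is hypothetical: its field names and values are invented for the illustration and are not the data of the original example. The `values_at` and `matches` helpers are a simplified stand-in for the way a dotted path such as `characters.name` reaches into an array of embedded documents; they are not MongoDB's query engine.

```python
# Hypothetical comic document; every field name and value is illustrative.
comic = {
    "name": "Detective Comics",
    "pages": 32,
    "characters": [
        {"name": "Batman", "land": "Gotham"},
        {"name": "Robin", "land": "Gotham"},
    ],
}


def values_at(document, path):
    """Collect the values found by following a dotted path, descending into lists."""
    current = [document]
    for part in path.split("."):
        next_level = []
        for item in current:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    next_level.append(candidate[part])
        current = next_level
    return current


def matches(document, path, expected):
    """Return True if any value reached through `path` equals `expected`."""
    return expected in values_at(document, path)


found = matches(comic, "characters.name", "Batman")
```

The comic and its characters live in one document, so a single lookup answers the question without joining separate tables.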
WHAT IS BSON?
BSON is a binary format used to store information in MongoDB. BSON is the binary
encoding of the JSON format. This encoding has been chosen because it presents
certain advantages when storing data, such as efficiency when encoding, decoding and
traversing the data. Basically, BSON and JSON are the formats that MongoDB works
with: JSON is the format in which information is presented to users and applications,
and BSON is the format that MongoDB uses internally.
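The idea of a binary encoding can be made concrete with a tiny hand-written encoder. The sketch below covers only two value types (UTF-8 strings and 32-bit integers) in a flat document, following the published BSON layout: a little-endian length prefix, one type byte and a null-terminated name per field, and a closing null byte. The function names are assumptions for the illustration; real applications should rely on a driver's BSON library instead.

```python
import struct


def cstring(text):
    """Encode a field name as a null-terminated UTF-8 string."""
    return text.encode("utf-8") + b"\x00"


def encode_document(fields):
    """Encode a flat dict of str / 32-bit int values using the BSON layout."""
    body = b""
    for name, value in fields.items():
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: length-prefixed, null-terminated
            body += b"\x02" + cstring(name) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # 0x10 = 32-bit signed integer
            body += b"\x10" + cstring(name) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type for field {name!r}")
    # The total length counts its own 4 bytes plus the trailing null byte.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


encoded = encode_document({"hello": "world"})
```

Because every string and every document carries its length up front, a reader can skip over fields it does not need without parsing them, which is one source of the efficiency mentioned above.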
NoSQL databases (NoSQL refers to non-relational databases) are well suited for
many modern applications, such as mobile, web, and gaming, that require flexible,
scalable, high-performance, and highly functional databases to provide great
user experiences.
-Horizontal scalability. They are able to grow in number of machines, rather than
having to reside on large machines. NoSQL databases are generally designed to
scale using distributed clusters of hardware rather than by adding
expensive and robust servers. Some cloud providers handle these operations in
the background, as a fully managed service.
-Not all NoSQL databases support atomicity of operations and data integrity. They
support what is called eventual consistency.
-Compatibility problems between SQL statements. Newer databases use their
own query language features and are not 100% compatible with relational
database SQL. Troubleshooting problematic queries in a NoSQL database is
more complicated.
-Lack of standardization. There are many NoSQL databases and there is still no
standard like there is for relational databases. This makes the future of these
databases uncertain.
-Cross-platform support. Some systems still need many improvements to
support non-Linux operating systems.
-Poor usability. They usually have administration tools that are not very usable
or are accessed through the console.
5. Conclusion
NoSQL databases are now one more option in the portfolio of alternatives for storing
the data of your applications. There are several types, but in general their main
objective is to solve the performance and scalability problems of RDBMSs. On the other
hand, RDBMSs are by no means going away. Their transactional capabilities make them
a good fit for most existing applications. However, they will surely undergo changes. Just
as object-oriented databases influenced the evolution of RDBMSs in the past, we will
see many NoSQL ideas applied to relational databases in the future. You will also use
more than one type of database: the one that suits your application and, even more, the
one that suits a particular use case within your application. So it will not be rare to see
projects that use more than one type of database. The point is that you have to keep
adapting, lose the fear of leaving the safety of an RDBMS and start using other
alternatives. Many NoSQL databases are already production quality, and some even
have commercial support available and are backed by major companies.