NoSQL Assignment 2

RELATIONAL AND NON-RELATIONAL DATABASES

ITNPBD3

ASSIGNMENT TWO
SUBMISSION DATE- 26/05/2023

Introduction
This assignment creates a non-relational database design. First, I would like to highlight the
crucial role that data plays in every industry, whether in decision-making, implementing
business changes, investing in a project, or hiring a candidate for a job. Data science has
helped to streamline business processes, give managers convenient ways to track business
performance, enhance research and development, monitor employee performance appraisals, and
track sales and expenses faster, which supports decisions on improving profitability, managing
stock, and improving the supply chain. Many websites, including social media, e-commerce,
agency, and education sites, use database software to store and manage data efficiently.

The website I chose


One website used by many people around the world is Facebook.com, the world’s largest social
media platform, which connects people across the globe. It is generally accessed through the
Facebook app. Facebook hosts a range of activities and pages run by businesses and individuals.
To manage such high volumes of data, Facebook presumably uses a range of advanced database
technologies. I use Facebook often to maintain my own profile and, like many others, I run 3-4
business pages through Facebook’s business platform, Meta Business Suite.

Reasons for choosing Facebook


For this assignment, I chose Facebook for two main reasons. (1) I use it often and therefore
have a basic idea of the types of data it manages. (2) Even though it is a large social media
platform, my research indicates that part of its data is managed in MySQL. This makes it easier
to weigh the benefits of a NoSQL database against MySQL for those particular kinds of data.

Assumptions- types of data stored by Facebook


To maintain such a massive data infrastructure, Facebook uses not only high-end data
technologies but also, for certain types of data, MySQL. Facebook initially ran MySQL with the
InnoDB storage engine and later developed MyRocks, a MySQL storage engine built on RocksDB; it
also adopted a range of NoSQL database software to support its business requirements.
As per research, the data that Facebook maintains in MySQL includes a social graph that tracks
user interactions such as comments, likes, reactions, and user events, which are presumably
modeled as entities and tables. The entities are described in more detail below:
- Relationship Entity: Possibly shows the friend list in a profile and the type of
relationship (Friend, Follower, or both). The key attributes likely to be involved are:
  - User ID
  - Friend ID
  - Relationship Type
- Posts Entity: This includes the list of pictures or other content posted or shared by a
particular user. The possible key attributes could be:
  - User ID
  - Post ID
  - Post Type
  - Timestamps
- Reactions Entity: Facebook tracks users’ interaction with posts using likes, reactions,
and comments. Attributes that could be involved are:
  - User ID
  - Post ID
  - Reaction Type
- Comments Entity: Users also interact with each other by commenting on each other’s
posts. The attributes involved could be:
  - User ID
  - Friend ID
  - Comment ID
  - Timestamps
- Events Entity: User events on Facebook are possibly managed by MySQL. The key
attributes involved could be:
  - User ID
  - Event ID
  - Event Details
  - Event Reactions
  - Participation List
- Transactions Entity: Transactions occur on business pages, where a business running ad
campaigns needs to check its updated balance before making any further ad payments. The key
attributes here could be:
  - User ID
  - Account Details
  - Promotion Type
  - Account Balance
- Pages Entity: Facebook is likely to use a mix of relational databases, including MySQL,
to manage business and other pages. The key attributes for the Pages entity could be:
  - Page Name
  - Page ID
  - Page Type
  - Page Likes
  - Content ID
  - Author ID
These are some of the possible entities and their key attributes maintained in a relational
database design on Facebook.
Use Cases for Facebook:
In terms of use cases for Facebook, users interact differently with each of the entities mentioned
above.
- A use case for the User Profile Entity could be a user who is interacting with the Profile
settings of Facebook to upload a new profile picture or update their educational
qualifications for example.
- Users interact with the profiles of other users by sending them friend requests and tagging
them in posts.
- Users interact with content posted by their friends by reacting/commenting on their posts.
- In terms of the events entity, users will interact with the events page to create and
promote their upcoming events.
- Users can also be businesses who may interact with the Facebook Page promotion
platform by spending on ad campaigns to promote their website.

Business Requirements for Facebook:


- Scalability: Because large volumes of data are processed every second, Facebook needs a
data structure that can sustain heavy traffic so that any request, for example retrieving
the timestamp that shows when a picture was uploaded, is processed smoothly. Each of the
entities described above should be able to scale horizontally across many machines rather
than being confined to tables on a single server.

- Flexibility: Each of the entities explained above has a complex data structure, and a
rigid database schema makes changes costly and can slow queries down. Instead, Facebook can
adopt a schema-less design for all these entities, allowing a list of values to be stored
within a document and adding flexibility to its data modeling.

- Fault tolerance: Facebook’s data structure should be modeled to withstand failures such
as a network partition, which can be done through replication. The same query (for example,
a user checking the account balance on their business page) should produce the same result
when run from a different location. For this, the data should be replicated on more than one
node (machine) so that queries return consistent results.

- Performance and Speed: With billions of users reading and writing the same types of data
at the same time, Facebook needs efficient query optimization for operations such as
generating codes for two-factor authentication when logging in from more than one device,
quickly returning the total engagement from promotional content, or refreshing a newsfeed.

- Security: Facebook contains confidential information such as passwords, account
information, and other sensitive information which must be backed by encryption and
powerful authentication mechanisms to protect the data from unauthorized access.

Relative merits of using a relational design for Facebook:


- Maintains ACID transactions:
When the entities mentioned above are maintained in MySQL, Facebook’s transactions follow the
ACID properties (Atomicity, Consistency, Isolation, and Durability). This is where MySQL stands
out against a NoSQL data structure. An update to a user profile that touches several fields,
such as the picture, password, and account information, either succeeds as a whole or is not
applied at all (Atomicity). Every transaction leaves the database in a valid state; for a
business transaction this means the updated account balance is shown immediately (strict
consistency), whereas a newsfeed only needs related posts to appear eventually. Concurrent
transactions do not interfere with each other, so two payments against the same account behave
as if they ran one after the other (Isolation). Once a transaction has been committed, its
changes are saved permanently, even if a failure follows (Durability).
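
The all-or-nothing behaviour of an ad payment can be sketched in a few lines of plain JavaScript. This is only an in-memory illustration under assumed names (`payForAd`, `accountBalance`, and `adSpend` are hypothetical), not how MySQL implements transactions: the change is prepared on a copy and committed only if the balance rule still holds.

```javascript
// Hypothetical in-memory state standing in for two values touched by one ad payment.
const state = { accountBalance: 100, adSpend: 0 };

// Apply both changes or neither: work on a copy and commit only if every check passes.
function payForAd(current, amount) {
  const draft = { ...current };
  draft.accountBalance -= amount;
  draft.adSpend += amount;
  if (draft.accountBalance < 0) {
    // The balance rule is violated, so the draft is discarded (rollback).
    return { committed: false, state: current };
  }
  return { committed: true, state: draft };
}

console.log(payForAd(state, 30));  // committed: balance 70, ad spend 30
console.log(payForAd(state, 150)); // rejected: the original state is returned unchanged
```

Because the original object is never modified, a rejected payment leaves no partial update behind, which is the property the Atomicity and Consistency points describe.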
- Database integrity and consistency:
Joins across tables and strict foreign key constraints help to maintain database integrity.
Facebook holds confidential data such as user profiles, which are linked to the posts that
users like or interact with, so enforced links between tables are necessary. Business pages
may also involve financial transactions, for which Facebook requires users’ financial
information. That information is linked to users’ profile information as well, so the same
constraints apply. In terms of consistency, a newsfeed only needs content matching user
interests and preferences to be available at all times. A financial transaction, however,
needs strict consistency, that is, it must show the updated balance. MySQL supports both.
- Generate insights through queries:
With MySQL, Facebook can not only sort, update, and delete data as required but also use SQL
queries to extract specific data that aids decision-making. For instance, Facebook can use
event data and user engagement to see which of the events shared by friends attract a
particular user, and then display similar events in that user’s feed. MySQL’s query optimizer
can also handle complex queries for more specific data, such as the time at which a post
received the highest number of reactions.
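
The "time with the highest number of reactions" question is a grouping query. A minimal sketch of that logic in plain JavaScript follows; the reaction records, their timestamps, and the function name `peakReactionHour` are illustrative assumptions, and in SQL the same idea would be expressed with a GROUP BY on the hour.

```javascript
// Hypothetical reaction records; field names follow the Reactions entity, values are made up.
const reactions = [
  { PostID: "98765", UserID: "John890", ReactionType: "Heart", Timestamp: "2023-04-30T09:15:00Z" },
  { PostID: "98765", UserID: "Sara321", ReactionType: "Like", Timestamp: "2023-04-30T18:05:00Z" },
  { PostID: "98765", UserID: "Donna456", ReactionType: "Like", Timestamp: "2023-04-30T18:40:00Z" },
];

// Count reactions per UTC hour for one post and return the busiest hour.
function peakReactionHour(records, postId) {
  const counts = new Map();
  for (const r of records) {
    if (r.PostID !== postId) continue;
    const hour = new Date(r.Timestamp).getUTCHours();
    counts.set(hour, (counts.get(hour) || 0) + 1);
  }
  let best = null;
  for (const [hour, count] of counts) {
    if (best === null || count > best.count) best = { hour, count };
  }
  return best; // null when the post has no reactions
}

console.log(peakReactionHour(reactions, "98765")); // { hour: 18, count: 2 }
```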

Why is the NoSQL database more appropriate?


A NoSQL database, also known as a non-relational database, is more appropriate because, unlike
relational databases, which store data within a fixed schema and are queried with SQL,
non-relational databases offer more flexibility in storing and retrieving data. This applies
to the entities mentioned earlier: Relationships, Posts, Reactions, Comments, Events, Pages,
and Transactions. Some of the advantages are listed below:
Scalability
Facebook operates on a massive scale, handling enormous amounts of data every second. To cope
with the rapid growth of data and concurrent usage for the entities described above, Facebook
should adopt NoSQL database technologies. NoSQL databases are designed for horizontal
scalability, so each of the entities (Relationships, Posts, Reactions, Comments, Events, Pages,
and Transactions) can be stored as documents in a distributed database and spread across more
machines as the data grows.
More Flexibility
Facebook has several classifications for each of its entities, and storing all the data in one
big table may not be convenient. For example, the first time a comment includes a link, the
table needs a new column, and the queries that insert and update comments have to be
rewritten, which is time-consuming and inflexible. A NoSQL database, however, can easily
accommodate such dynamic, loosely structured data within document stores.
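
As a small illustration of this point, the sketch below holds two comment documents with different shapes in the same collection. The `Text` and `Link` fields and the helper `describeComment` are assumptions made for the example; the point is only that the first comment with a link needs no change to the comments already stored.

```javascript
// Hypothetical comment documents in one collection; only the second carries a link.
const comments = [
  { CommentID: 5658, UserID: "John890", PostID: "98765", Text: "Great picture!" },
  {
    CommentID: 5659, UserID: "Sara321", PostID: "98765", Text: "See this too",
    Link: { Url: "https://example.com/gallery", Title: "Gallery" },
  },
];

// Readers treat the new field as optional instead of relying on a schema change.
function describeComment(c) {
  return c.Link ? `${c.Text} [${c.Link.Url}]` : c.Text;
}

for (const c of comments) console.log(describeComment(c));
```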
Denormalization
In MySQL, data is stored in rigid tabular formats linked by strict joins. With a NoSQL
database, a range of heterogeneous data can be assembled within a single ‘Document’ in a
distributed database. For example, the entity ‘Events’ has an attribute called Event Details,
which itself has three values: description, date, and location. These cannot be stored as a
list of values for one attribute in a normalized MySQL table. In a NoSQL database, they can be
recorded as an embedded object (or an array of objects) within Event Details.
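
The Event Details example can be sketched as follows. The two "rows" and the helper `embedDetails` are hypothetical; they only show how data that would sit in two joined relational tables collapses into one document with an embedded object.

```javascript
// Hypothetical normalized rows, as they might sit in two relational tables.
const eventRow = { EventID: "Jack.Art.Exb", EventName: "Jack's Art Exhibition" };
const detailRow = {
  EventID: "Jack.Art.Exb", Description: "Finest Art in town",
  Date: "2023-04-30", Location: "Street11",
};

// Denormalize: embed the detail row so that one read returns the whole event.
function embedDetails(event, details) {
  const { EventID, ...rest } = details; // drop the join key, keep the values
  return { ...event, EventDetails: rest };
}

const eventDoc = embedDetails(eventRow, detailRow);
console.log(eventDoc.EventDetails.Location); // Street11
```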
Saves time
A distributed NoSQL database is stored across a cluster of commodity machines, and each node
in the cluster stores a portion of the database. When data is placed on nodes close to where
it is used, it spends less time travelling through the network. Commodity hardware also keeps
costs under control. For instance, users of the data can quickly find the most-viewed video
content from a particular location.

Shared Risks
NoSQL data is distributed across clusters over wide geographical areas. Storing data on more
than one node protects it against local failures. For instance, after a financial transaction
to run Facebook ads, the balance is updated automatically. It is important to have replicas of
the current balance because, if there is a network partition, a query that reaches only an
isolated node may not show the updated balance. With replicas, Facebook can read the updated
figure from other machines that hold it. Additionally, the wider the geographic spread of the
replicas, the safer the data becomes.
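
The balance example can be illustrated with a toy model of three replicas. This is a simplified sketch, not MongoDB's actual replication protocol; the node names, the version counter, and the functions `writeBalance` and `readBalance` are all assumptions made for illustration.

```javascript
// Hypothetical three-node replica group; each node keeps a versioned copy of the balance.
const nodes = [
  { name: "node-a", reachable: true, balance: { value: 100, version: 1 } },
  { name: "node-b", reachable: true, balance: { value: 100, version: 1 } },
  { name: "node-c", reachable: true, balance: { value: 100, version: 1 } },
];

// Write the new balance, with a higher version, to every node that can be reached.
function writeBalance(replicas, value) {
  const version = Math.max(...replicas.map((n) => n.balance.version)) + 1;
  for (const n of replicas) {
    if (n.reachable) n.balance = { value, version };
  }
}

// Read from the reachable nodes and trust the copy with the highest version.
function readBalance(replicas) {
  const live = replicas.filter((n) => n.reachable);
  if (live.length === 0) throw new Error("no replica reachable");
  const newest = live.reduce((a, b) => (b.balance.version > a.balance.version ? b : a));
  return newest.balance.value;
}

nodes[2].reachable = false; // node-c is cut off by a network partition
writeBalance(nodes, 70);    // an ad payment lowers the balance to 70
nodes[0].reachable = false; // node-a fails afterwards
nodes[2].reachable = true;  // node-c returns with its stale copy
console.log(readBalance(nodes)); // 70, served by node-b
```

Even with one node down and another holding stale data, the updated balance is still readable because a second replica received the write.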
Better Performance
NoSQL allows fast data retrieval, speedy query performance, and personalized recommendations,
all of which Facebook needs for its business operations, including the entities mentioned
above. It can perform fast read/write operations because data is stored in denormalized form.
Thus a query such as the number of times a user visited a Facebook page can be answered much
faster with NoSQL queries. In summary, NoSQL is more appropriate in terms of flexibility,
scalability, faster read/write operations, efficient data storage, and query optimization, all
of which add to its performance, speed, and reliability.

Given the business requirements mentioned earlier, it is recommended that Facebook use NoSQL
databases, as their benefits clearly cater to those requirements. Initially, Facebook used
MySQL, but even after compressing data, the available space was not enough to accommodate the
mass of data. This required additional hardware to keep up with storage, which greatly added
to the cost. Facebook therefore moved such big data to several NoSQL database systems. Below
is a list of operations that are run by NoSQL software at Facebook:
- Newsfeed: Facebook uses NoSQL software to generate relevant content for its users that
matches their interests and preferences.
- Logs: Logs are a way to track bugs. Because Facebook operates on a massive scale, it
cannot run without logs, which greatly help with bug fixes. To manage logs, Facebook
uses a distributed data store called LogDevice.
- Analytics: Facebook uses PrestoDB, HBase, and Apache Hadoop for running analytics
and data warehousing. When refreshing a newsfeed, it uses analytics to match user
preferences in order to display the related posts instantly in the newsfeed.
- Messages and Notifications: Facebook uses Apache Cassandra to run inbox search.
The software handles message threading, retrieves chat history, and displays read/unread
statuses.
- Ads and Targeting: NoSQL software used by Facebook stores user preferences and
demographic data to suggest ads, track ad clicks, and optimize ad campaigns through
analytics.

Using MongoDB for Facebook


MongoDB is a widely used NoSQL database that stores and manipulates data in a document store.
It uses documents rather than the tables of a MySQL database, and these documents are
organized into collections. MongoDB represents documents in a JSON-like format (JavaScript
Object Notation, stored internally as BSON), and the document design relies heavily on use
cases and business requirements.
The entities mentioned earlier are likely to be run through MySQL queries on Facebook. Even
though MySQL has unique advantages in terms of strong consistency, complex query processing,
and data integrity, it is recommended that Facebook use a NoSQL database like MongoDB, because
the entities do not fit a pre-defined schema and need a data structure that adapts to
fast-changing business requirements.

Collections for Facebook


The entities – Relationships, Posts, Reactions, Comments, Events, Pages, and Transactions can
be illustrated using a UML Diagram Displayed Below:
Denormalized
I have denormalized the data structure so that some fields contain a list of JSON objects and
some contain a list of values.
- The “Relationship” collection has a field called ‘Friend ID’, which is an array because
one user may have several friends.
- The ‘Events’ collection has an object called ‘Event Details’, which itself contains the
fields ‘Description’, ‘Dates’, and ‘Location’.
- ‘Events’ also contains an object called ‘Participation’, which holds the arrays
‘Interested Participants’ and ‘Going Participants’.
- “Transactions” includes a field called Account Details, which holds nested objects in an
array. Account details include fields such as UserID, Password, Contact, and Card Details.
- “Page” includes ‘Admins’, which may hold an array of Admin IDs.
In the UML diagram above, I have highlighted two profiles, a user and their friend, for a
generic understanding. I have also identified the relationships between the collections:
- “Relationships” is linked with “Posts” to show that people who are friends on Facebook
can react to and comment on each other’s posts.
- “Posts” is linked with “Reactions” and “Comments”, as users interact with posts in their
newsfeed through them.
- “Relationships” is linked with “Events” to show that users can invite their friends to
participate in events. Users and their friends can also act as Event admins.
- “Relationships” is linked with “Pages” to show that users with business pages can
invite their friends to like their page. Users and their friends can also be Admins on a
page.
- “Transactions” is linked to “Page” because pages run ad campaigns, which involve
transactions. Thus the account details of the Page admins and the Transaction ID are
connected.

Embedding decisions:
I have embedded Author ID, Post ID, Post Type, and Timestamps within the “Posts” collection
and added references to Comment ID and Reaction Type to show which posts received which type
of reaction and which comments.
Some more references:
- User ID (Relationship) represents Author ID (Posts)
- One of the Friend IDs (Relationship) represents one of the Friend IDs (Reactions)
- One of the Friend IDs (Relationship) represents one of the Friend IDs (Comments)
- User ID (Relationship) represents one of the Page Admins (Pages)
- One of the Friend IDs (Relationship) represents one of the Page Admins (Pages)
- User ID (Relationship) represents one of the Event Admins (Events)
- One of the Friend IDs (Relationship) represents one of the Event Admins (Events)
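
The references listed above are resolved by the application rather than by a database join. A minimal sketch of following the Friend ID references into Posts is given below, using in-memory arrays in place of collections. The second and third posts and the function name `postsByFriends` are hypothetical; in the mongo shell the equivalent lookup would be a `find` with an `$in` filter on AuthorID.

```javascript
// In-memory stand-ins for the Relationships and Posts collections.
const relationships = [
  { UserId: "Jack123", FriendID: ["John890", "Sara321", "Donna456", "Clara673"] },
];
const posts = [
  { AuthorID: "Jack123", PostID: "98765", PostType: "Picture" },
  { AuthorID: "John890", PostID: "11111", PostType: "Video" },   // hypothetical post
  { AuthorID: "Other000", PostID: "22222", PostType: "Status" }, // hypothetical non-friend
];

// Follow the FriendID references to collect posts written by a user's friends.
function postsByFriends(userId, rels, allPosts) {
  const rel = rels.find((r) => r.UserId === userId);
  if (!rel) return [];
  const friends = new Set(rel.FriendID);
  return allPosts.filter((p) => friends.has(p.AuthorID));
}

console.log(postsByFriends("Jack123", relationships, posts));
// Shell equivalent of the filter step:
//   db.posts.find({ AuthorID: { $in: ["John890", "Sara321", "Donna456", "Clara673"] } })
```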
Document representation in MongoDB
I have highlighted some examples of the MongoDB syntax to show how the collections Posts
and Events are connected to the Relationships Collection.

Relationships Collection:
db.relationships.insertOne({
  'UserId': "Jack123",
  'FriendID': ["John890", "Sara321", "Donna456", "Clara673"],
  'RelationshipType': [{'John890': "Friend", 'Sara321': "Friend", 'Donna456': "Follower", 'Clara673': "Follower"}]
})

Posts Collection:
db.posts.insertOne({
  'AuthorID': "Jack123", 'PostID': "98765", 'PostType': "Picture",
  'Timestamp': 30, 'ReactionType': "Heart", 'CommentID': 5658
})

Events Collection:
db.events.insertOne({
  'EventName': "Jack's Art Exhibition", 'EventID': "Jack.Art.Exb",
  // The date is quoted: an unquoted 2023-4-30 is evaluated as subtraction and stored as 1989.
  'Event Details': {'Description': "Finest Art in town", 'Date': "2023-04-30", 'Location': "Street11"},
  'Event Admin': "Jack123", 'Event Participants': ["John890", "Sara321", "Donna456"]
})
Design Decisions for a Distributed Cluster:

- Sharding: Using shard keys and indexes (such as hashed indexes to optimize queries), data
can be split into small portions, called shards, across multiple machines. For instance,
the User ID is linked to most of the collections, so it can serve as a shard key to
distribute the data across the network. Location can be used in the same way: in the events
collection, the event I named “Jack’s Art Exhibition”, to be held at “Street 11”, should
have its shard (its part of the data) stored closest to that location. Likewise, a business
page targeting multiple locations could have its customer data sharded by location.
- Replicas: Replicas of the event details could be kept on other nodes (machines) located
close to “Street 11” for local data protection. This gives the data high availability and
fault tolerance.
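
How a hashed shard key spreads users over machines can be sketched as below. The hash function here is a simple illustrative one (MongoDB uses its own internal hashing), the shard count of 3 is an assumption, and location-based placement is not covered by this sketch.

```javascript
// Illustrative string hash; MongoDB computes hashed shard keys with its own internal function.
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h;
}

// Map a shard key value to one of `shardCount` shards.
function shardFor(userId, shardCount) {
  return hashKey(userId) % shardCount;
}

// Group the sample users by the shard they would land on (3 shards is an assumption).
const users = ["Jack123", "John890", "Sara321", "Donna456", "Clara673"];
const placement = {};
for (const u of users) {
  const shard = shardFor(u, 3);
  (placement[shard] = placement[shard] || []).push(u);
}
console.log(placement);

// On a real sharded cluster the shell helper would be along the lines of:
//   sh.shardCollection("facebook.posts", { AuthorID: "hashed" })
// where the database name "facebook" is hypothetical.
```

The same user ID always maps to the same shard, so every document keyed by that user can be found without searching the other machines.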

Examples of MongoDB syntax to retrieve data:


Relationships
List only the friends of “Jack123”
> db.relationships.find ({'UserId':"Jack123"},{'FriendID':1, _id:0})
{ "FriendID" : [ "John890", "Sara321", "Donna456", "Clara673" ] }
Posts
Find the details of Post ID-98765
> db.posts.find ({'PostID':"98765"})
{"AuthorID" : "Jack123", "PostID" : "98765", "PostType" : "Picture", "Timestamp" :
30, "ReactionType" : "Heart", "CommentID" : 5658 }
Events
Find the event located at Street 11
> db.events.find ({'Event Details.Location':"Street11"})
{ "EventName" : "Jack's Art Exhibition", "EventID" : "Jack.Art.Exb",
"Event Details" : { "Description" : "Finest Art in town", "Date" : "2023-04-30",
"Location" : "Street11" }, "Event Admin" : "Jack123",
"Event Participants" : [ "John890", "Sara321", "Donna456" ] }
(The date is returned as a string only when it is stored in quotes; stored unquoted as
2023-4-30, it is evaluated as subtraction and comes back as 1989.)
