Project Report

Social Media Database Management System

Khoury College of Computer Sciences


Northeastern University, Boston, MA
{vuda.p, murali.ak, velumuri.a, kodati.sr}@northeastern.edu
December 11, 2023

INTRODUCTION

In the ever-evolving landscape of social media, managing and harnessing the immense flow of information has become a paramount challenge. Our social media platform emerges as a comprehensive solution designed to meet the dynamic needs of contemporary data handling. At the core of this project is a sophisticated database management system meticulously crafted to integrate seamlessly with user data.

Objectives

Unified Data Repository:
Our database system serves as a centralized repository, consolidating data from diverse social media users. From user profiles and posts to engagement metrics and trending topics, it captures the rich network of social interactions.

Efficient Data Retrieval:
A key objective is to facilitate efficient data retrieval. Users can seamlessly query and analyze information, empowering them to derive valuable insights from the vast reservoir of social media data.

Real-time Integration:
Keeping pace with the real-time nature of social media, the system is designed for swift and seamless integration with social media APIs. This ensures that the database is continuously updated to reflect the latest interactions and trends.

Security and Compliance:
Our system prioritizes the security of social media data, implementing stringent measures within the RDBMS framework. It adheres to relevant data protection regulations, providing a secure environment for managing and storing sensitive information.

Scalability and Performance:
The project is engineered for vertical scalability, allowing for the seamless expansion of system resources to handle increasing data loads. As the volume of social media interactions grows, the system can be vertically scaled by enhancing the capabilities of existing hardware, ensuring consistent performance.

Relational Quality:
Relational databases form the foundation of our database system. The system arranges social media data into logical tables by utilizing the organized nature of an RDBMS, guaranteeing data integrity and enabling sophisticated interactions between entities.

User-Friendly Interface:
Understanding the importance of user experience, the system provides an intuitive interface. Users can navigate the database effortlessly, gaining quick access to the information they need.

In summary, the project embraces the strengths of RDBMS and vertical scalability to create a robust social media database management system. By prioritizing relational quality and scalability, we aim to provide a solution that aligns with the structured requirements of social media data processing.

KEY DESIGN DECISIONS

ER DIAGRAM

Link to the ERD: https://whimsical.com/social-media-database-TVEAJ7PPxiaabb74gELvuo
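As a quick illustration of the relational structure the ERD captures, here is a minimal sketch of two of the core tables (user and user_auth) as an in-memory SQLite database in Python. The column names follow the report, but the exact DDL is an assumption, not the project's actual MySQL schema; the UNIQUE foreign key is what makes the user/user_auth link one-to-one.

```python
import sqlite3

# Minimal sketch of the user / user_auth split (SQLite syntax).
# Column names follow the report; exact types and constraints are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE user (
        user_id     CHAR(36) PRIMARY KEY,   -- UUID for global uniqueness
        user_name   VARCHAR(50) NOT NULL,
        email_id    VARCHAR(100),
        is_verified BOOLEAN DEFAULT 0,
        is_active   BOOLEAN DEFAULT 1,
        is_reported BOOLEAN DEFAULT 0
    )
""")

# A UNIQUE foreign key gives the one-to-one link back to user.
conn.execute("""
    CREATE TABLE user_auth (
        auth_id       INTEGER PRIMARY KEY,
        user_id       CHAR(36) NOT NULL UNIQUE REFERENCES user(user_id),
        password_hash VARCHAR(255) NOT NULL
    )
""")

conn.execute("INSERT INTO user (user_id, user_name) VALUES ('u-1', 'alice')")
conn.execute("INSERT INTO user_auth (user_id, password_hash) VALUES ('u-1', 'x')")
row = conn.execute(
    "SELECT u.user_name FROM user u JOIN user_auth a ON a.user_id = u.user_id"
).fetchone()
```

Attempting to insert a second user_auth row for the same user_id violates the UNIQUE constraint, which is exactly the one-to-one guarantee the design calls for.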
The above diagram is derived by performing reverse engineering in MySQL Workbench after finalizing the database and generating the tables.

Below are the tables created and a description of each:

1. user:
- The use of a CHAR(36) primary key for user_id suggests the use of UUIDs, ensuring uniqueness.
- Storing essential user information such as username, email, and phone number.
- Including boolean flags (is_verified, is_active, is_reported) for user status tracking.

2. user_auth:
- A separate table for user authentication details (user_auth) to maintain a clean separation of concerns.
- A foreign key reference to the user table, ensuring a one-to-one relationship between a user and their authentication details; each user has a single set of authentication details in the user_auth table.

3. report:
- Utilizing an ENUM for category and a VARCHAR field for report_reason to categorize and describe different types of reports.
- Foreign key references to the user table for both reported_by and reported_profile, establishing a one-to-many relationship between reporters and reported profiles, indicating a user can be associated with many reports, but each report is linked to a single user.

4. device:
- Using an INT primary key for device_id and a foreign key reference to the user table, establishing a many-to-one relationship, indicating many devices can be associated with a single user.
- Storing device information, including type, name, and login status.

5. messages:
- Employing an ENUM for message_type to categorize different message types.
- Two foreign key references to the user table for both sender and receiver, establishing a many-to-many relationship, indicating many messages can be sent by or received by a single user.

6. followers:
- Storing follower relationships with references to both the follower and the followed user; a user can have many followers and can follow many other users.
- The time_stamp field tracks the time of the follow action.

7. posts, video, photo, likes, bookmarks, comments, comment_likes, comment_reply, tag, repost:
- A CHAR(36) primary key in each table, suggesting the use of UUIDs for uniqueness.
- Utilizing foreign keys to reference the user and posts tables, establishing relationships between users, posts, and various interactions with posts, indicating many posts can be authored by a single user and many interactions can be associated with a single user or post.

8. mentions:
- Tracking mentions by users in posts, with foreign key references to the user and posts tables.

9. groupss and group_members:
- Representing user groups with a groupss table and tracking group memberships in group_members.
- Using a many-to-many relationship between users and groups: a user can be a member of many groups, and each group can have many members.

10. block_list:
- Storing information about blocked relationships between users. A user can block many other users, and each user can be blocked by many other users.

These design decisions ensure a structured and normalized database schema, capturing the relationships between different entities in a social media platform. The use of UUIDs enhances data integrity, and foreign key relationships enforce referential integrity across the database. The many-to-many relationships, especially in tables like followers and messages, reflect the dynamic nature of social interactions.

DATA COLLECTION

The Faker library in Python and ChatGPT are the primary tools used to produce the dataset.

By using the Python Faker package, the dataset gains realistic and varied user profiles. A representative and comprehensive dataset is produced with the help of Faker, which makes it possible to generate realistic names, phone numbers, email addresses, and other user attributes.

ChatGPT Emulates Natural Language Interaction:
To replicate natural language exchanges within the dataset, ChatGPT has been used. This covers creating a variety of message contents, comments, and other text-based exchanges. The dataset's realism and variety are increased by using ChatGPT, which captures the subtleties of user discussion on a social media platform.

The following factors are taken into account for data created using ChatGPT and the Python Faker library:

Variability in Data:
To replicate real-world conditions, we made sure that the generated data shows a varied range of values for each field. We used a combination of Faker providers to increase diversity, as they offer multiple options for producing realistic data.

Practical User Profiles:
We made use of the Faker library's features to construct plausible user profiles. To increase the authenticity of the data, we paid close attention to specifics like names, email addresses, and phone numbers.
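The generation approach described above can be sketched as follows. Since Faker is a third-party dependency, this self-contained stand-in uses only the standard library to mimic the same ideas (unique UUID keys, varied field values, and registration timestamps that precede login timestamps); the helper names and value pools are illustrative, not the project's actual generation script.

```python
import random
import uuid
from datetime import datetime, timedelta

# Stand-in for the Faker-based generator: unique UUID ids, varied
# fields, and registration timestamps that precede login timestamps.
FIRST_NAMES = ["prudhvi", "akhil", "anusha", "sruthi", "david"]
DOMAINS = ["example.com", "example.org"]

def make_user(rng: random.Random) -> dict:
    first = rng.choice(FIRST_NAMES)
    created = datetime(2023, 1, 1) + timedelta(days=rng.randint(0, 300))
    return {
        "user_id": str(uuid.uuid4()),                 # CHAR(36) UUID key
        "user_name": f"{first}{rng.randint(1, 999)}",
        "email_id": f"{first}@{rng.choice(DOMAINS)}",
        "created_at": created,
        # Temporal realism: last_login always follows created_at.
        "last_login": created + timedelta(hours=rng.randint(1, 720)),
    }

rng = random.Random(42)  # seeded for reproducibility
users = [make_user(rng) for _ in range(100)]
```

With a real Faker instance, `fake.name()`, `fake.email()`, and `fake.date_time_between()` would replace the hand-rolled pools, but the consistency checks stay the same.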
Bios and Profile Pictures:
We made sure that the generated bios and profile pictures are accurate and diversified, which improves the legitimacy of user profiles.

Temporal Realism:
We made sure that the data's temporal distribution is realistic when utilising timestamps. Dates of registration, for example, ought to come before dates of login.

Variability in Message Content:
When creating messages with ChatGPT, we varied the language and substance to reflect different communication styles. This contributes to a more realistic and varied dataset.

Relationships between Followers and Followed:
We utilised ChatGPT to mimic natural conversation in order to foster user interactions, and made sure the relationships created between followers and followed users are in line with social media behaviour.

Randomness and Uniqueness:
To add variability, we used Faker's randomization features. We also made sure each user has a distinct identity and that the generated data doesn't lead to implausible scenarios.

Handling Edge Cases:
We were mindful of potential edge cases that might arise during data generation. For example, we ensured that generated email addresses are unique and do not lead to conflicts.

Consistency Across Tables:
We verified that the data generated for different tables (e.g., users, messages, posts) is consistent and adheres to the relationships defined in the schema.

Simulating User Interactions:
We used ChatGPT to simulate user interactions, such as generating realistic messages, comments, or post captions. This contributes to a more comprehensive social media dataset.

Testing and Validation:
We regularly validated the generated data against the defined schema and expectations to catch any inconsistencies or anomalies early in the data generation process.

Scalability:
We considered the scalability of data generation, ensuring that the generated dataset can be scaled to the desired size while maintaining its realism.

Ethical Considerations:
We were mindful of ethical considerations when generating data, especially when creating content or interactions, and avoided generating data that may be offensive or inappropriate.

DATA CLEANING

Data Validation:
Ensured that each user_id is unique across the entire dataset to maintain the integrity of primary key constraints. Validated the format of email addresses (email_id) and phone numbers (phone_number) to ensure they adhere to expected patterns.

Handling Missing Data:
Checked for missing values in all columns, especially in essential fields like user_name, email_id, and user_id. Decided on an appropriate strategy, such as imputation or removal, based on the extent and nature of the missing data.

Duplicate Removal:
Checked for and eliminated duplicate records in all tables, especially in tables with primary keys, to avoid redundancy and maintain data integrity.

DateTime Handling:
Verified that all timestamp fields (created_at, updated_at, login_time, etc.) are in the correct datetime format. Additionally, checked for any inconsistencies or anomalies in these fields.
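A few of the cleaning checks above can be expressed directly in Python. This is a simplified sketch: the regular expression, the 10-digit phone rule, and the timestamp format are assumptions based on the report's examples, not the project's actual cleaning script.

```python
import re
from datetime import datetime

# Simplified versions of three cleaning checks from this section.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")

def valid_email(email_id: str) -> bool:
    """Format check for email_id (pattern is an assumption)."""
    return EMAIL_RE.fullmatch(email_id) is not None

def valid_phone(phone_number: str) -> bool:
    """Expected pattern assumed here: exactly 10 digits."""
    return phone_number.isdigit() and len(phone_number) == 10

def valid_timestamps(created_at: str, login_time: str) -> bool:
    """Temporal check: registration must not come after login."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(created_at, fmt) <= datetime.strptime(login_time, fmt)
```

In practice these checks would run over every row of the generated tables before loading, flagging rows for imputation or removal as described above.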
Gender Enum Validation:
Ensured that the gender column only contains the values 'Male', 'Female', or 'Prefer not to say'. This helps maintain data consistency.

Bio Length Validation:
Checked the length of the bio field to ensure it doesn't exceed the defined limit (255 characters).

Boolean Values:
Verified that columns with Boolean values (is_verified, is_active, is_reported, is_logged_in, permission) only contain TRUE or FALSE (or their equivalent) and handled any inconsistencies.

Device Token Length:
Validated the length of the device_token field to ensure it doesn't exceed the defined limit (20 characters).

Message Content Length:
Checked the length of the message_content field in the messages table to ensure it doesn't exceed the specified limit (1000 characters).

Column-Specific Validation:
For each table, considered any specific validation criteria related to the semantics of the data. For instance, in the report table, ensured that the category field only contains valid report categories.

Foreign Key Integrity:
Ensured that foreign keys (user_id, reported_by, reported_profile, post_id, etc.) in all tables reference existing primary keys in their respective tables.

Consistent Enum Values:
Checked that all enum values (category, device_type, privacy, etc.) are consistent with their predefined set of values.

NORMALIZATION PROCEDURES

Normalization is the process of organizing and structuring a relational database to reduce data redundancy and improve data integrity. The main normalization procedures typically involve breaking down tables and relationships to eliminate redundancy and dependency issues.

First Normal Form (1NF):
Elimination of Duplicate Columns:
- Each column in every table holds atomic (indivisible) values.
- For example, the messages table has separate columns for sender_id and receiver_id, ensuring each message's sender and receiver are represented distinctly.

Second Normal Form (2NF):
Elimination of Partial Dependencies:
- Tables have a primary key, and all non-key attributes are fully functionally dependent on the entire primary key.
- For instance, the user table has attributes like first_name and last_name depending on the entire primary key (user_id).

Third Normal Form (3NF):
Elimination of Transitive Dependencies:
- Non-key attributes depend only on the primary key and not on other non-key attributes.
- In the user table, attributes like gender and date_of_birth depend only on the primary key (user_id).

Boyce-Codd Normal Form (BCNF):
Ensuring Key Dependency:
- Each determinant (a column whose value determines another value) is a candidate key.
- The tables satisfy BCNF as they do not have overlapping candidate keys.

Normalization of Many-to-Many Relationships:
Introducing Junction Tables:
For many-to-many relationships, junction tables (e.g., followers, group_members, and various interaction tables) are used to break down relationships into two one-to-many relationships, contributing to normalization.

Normalization of Enumerated Data:
Using Enumerated Data Types:
- The use of ENUM types for attributes like gender, category, message_type, device_type, and privacy helps in maintaining data integrity and readability.

Normalization of Repeated Data:
Separation of Concerns:
Authentication details are separated into the user_auth table, ensuring that authentication-related information is stored independently.
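The junction-table pattern described above can be illustrated with group membership. The following in-memory SQLite sketch uses the table and column names from the report, but the exact DDL is assumed; the composite primary key on (user_id, group_id) is what splits the many-to-many relationship into two one-to-many relationships.

```python
import sqlite3

# group_members acts as a junction table, splitting the user<->group
# many-to-many relationship into two one-to-many relationships.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id CHAR(36) PRIMARY KEY);
    CREATE TABLE groupss (group_id CHAR(36) PRIMARY KEY);
    CREATE TABLE group_members (
        user_id  CHAR(36) REFERENCES user(user_id),
        group_id CHAR(36) REFERENCES groupss(group_id),
        PRIMARY KEY (user_id, group_id)   -- one row per membership
    );
    INSERT INTO user VALUES ('u1'), ('u2');
    INSERT INTO groupss VALUES ('g1'), ('g2');
    INSERT INTO group_members VALUES ('u1','g1'), ('u1','g2'), ('u2','g1');
""")

# u1 belongs to two groups, and g1 has two members: many-to-many,
# but every group_members row itself points to exactly one user and one group.
members_of_g1 = [r[0] for r in conn.execute(
    "SELECT user_id FROM group_members WHERE group_id = 'g1' ORDER BY user_id")]
```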
VIEWS

A view is a virtual table that is based on the result of a SELECT query. It does not store the data itself but provides a way to represent the data stored in one or more tables. Views can simplify complex queries, restrict access to specific columns, or aggregate data.

1. Activity:
Code:
CREATE VIEW activity AS
SELECT
    u.user_id,
    u.user_name,
    p.post_id,
    p.post_content,
    p.created_at AS post_created_at,
    c.comment_id,
    c.comment_content,
    c.created_at AS comment_created_at,
    l.like_id,
    l.created_at AS like_created_at
FROM user_auth u
LEFT JOIN posts p ON u.user_id = p.author
LEFT JOIN comments c ON u.user_id = c.user_id
LEFT JOIN likes l ON u.user_id = l.user_id;

Explanation:
- This view combines information about user activity, including posts, comments, and likes.
- It uses LEFT JOINs to include all users, even if they haven't posted, commented, or liked.

2. Trending Posts:
Code:
CREATE VIEW trending_posts AS
SELECT
    p.post_id,
    p.post_content,
    p.created_at AS post_created_at,
    COUNT(DISTINCT l.like_id) AS like_count,
    COUNT(DISTINCT c.comment_id) AS comment_count
FROM posts p
LEFT JOIN likes l ON p.post_id = l.post_id
LEFT JOIN comments c ON p.post_id = c.post_id
GROUP BY p.post_id, p.post_content, p.created_at
ORDER BY like_count DESC, comment_count DESC;

Explanation:
- This view presents information about trending posts based on like and comment counts.
- It uses LEFT JOINs to capture posts even if they have no likes or comments.
- COUNT(DISTINCT ...) is used because joining both likes and comments multiplies rows; plain COUNT would inflate both counts.
- The results are grouped by post and ordered by like count and comment count in descending order.

3. Followers:
Code:
CREATE VIEW followers_view AS
SELECT
    u.user_id AS follower_id,
    u.user_name AS follower_user_name,
    f.time_stamp,
    u2.user_id AS following_id,
    u2.user_name AS following_user_name
FROM followers f
JOIN user_auth u ON f.follower_user_id = u.user_id
JOIN user_auth u2 ON f.following_user_id = u2.user_id;

Explanation:
- This view provides a clear representation of followers and their corresponding following relationships.
- It uses INNER JOINs to link follower and following user details.

4. Reported Content View:
Code:
CREATE VIEW reported_content_view AS
SELECT
    r.report_id,
    r.category,
    r.report_reason,
    r.created_at AS report_created_at,
    u.user_id AS reported_by_user_id,
    u.user_name AS reported_by_user_name,
    u2.user_id AS reported_user_id,
    u2.user_name AS reported_user_name
FROM report r
JOIN user_auth u ON r.reported_by = u.user_id
JOIN user_auth u2 ON r.reported_profile = u2.user_id;

Explanation:
- This view consolidates information about reported content, including details about the reporter and the reported user.
- It uses INNER JOINs to link report details with information about the users who reported and were reported.
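A trending-posts view of this shape can be exercised end-to-end against an in-memory SQLite database, since SQLite accepts an equivalent view definition. The tables below are simplified stand-ins for the report's schema; the point of the sketch is that the LEFT JOINs keep posts with no likes or comments, and DISTINCT counts stay correct when both joins multiply rows.

```python
import sqlite3

# Exercise a simplified trending_posts view; the quiet post still
# appears (with zero counts) because of the LEFT JOINs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id TEXT PRIMARY KEY, post_content TEXT);
    CREATE TABLE likes (like_id TEXT PRIMARY KEY, post_id TEXT);
    CREATE TABLE comments (comment_id TEXT PRIMARY KEY, post_id TEXT);
    INSERT INTO posts VALUES ('p1','hello'), ('p2','quiet post');
    INSERT INTO likes VALUES ('l1','p1'), ('l2','p1');
    INSERT INTO comments VALUES ('c1','p1');

    CREATE VIEW trending_posts AS
    SELECT p.post_id,
           COUNT(DISTINCT l.like_id)    AS like_count,
           COUNT(DISTINCT c.comment_id) AS comment_count
    FROM posts p
    LEFT JOIN likes l ON p.post_id = l.post_id
    LEFT JOIN comments c ON p.post_id = c.post_id
    GROUP BY p.post_id
    ORDER BY like_count DESC, comment_count DESC;
""")
rows = conn.execute("SELECT * FROM trending_posts").fetchall()
```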
STORED PROCEDURES

A stored procedure is a set of SQL statements that can be stored in a database and executed later. It allows for encapsulating a series of operations into a single named unit, making code more modular and reusable. Stored procedures can accept parameters, perform actions, and return results.

1. DELETING A REPORTED USER (includes a trigger):
Code:
DELIMITER $$
CREATE PROCEDURE delete_reported_user(
    IN reported_profile_id CHAR(36)
)
BEGIN
    DECLARE total_reports INT;

    SELECT COUNT(*)
    INTO total_reports
    FROM report
    WHERE reported_profile = reported_profile_id;

    IF total_reports >= 3 THEN
        DELETE FROM user
        WHERE user_id = reported_profile_id;
    END IF;
END $$
DELIMITER ;

-- trigger which calls the stored procedure
DELIMITER $$
CREATE TRIGGER after_insert_report
AFTER INSERT ON report
FOR EACH ROW
BEGIN
    CALL delete_reported_user(NEW.reported_profile);
END $$
DELIMITER ;

Purpose: Deletes a reported user based on a fixed threshold (3 reports).
Parameters:
reported_profile_id: ID of the reported user.
Actions:
- Counts the number of reports for the reported user.
- If the count is equal to or exceeds 3, deletes the user from the user table.

after_insert_report Trigger:
Purpose: Automatically triggers the delete_reported_user procedure after a new report is inserted.
Actions:
- Calls the delete_reported_user procedure with the reported user ID from the newly inserted report.

2. CREATING A POST:
Code:
DELIMITER $$
CREATE PROCEDURE create_post(
    IN author_id CHAR(36),
    IN post_content VARCHAR(255),
    IN caption VARCHAR(255),
    IN location VARCHAR(255)
)
BEGIN
    DECLARE new_post_id CHAR(36);
    -- Generate the UUID first so it can be returned; LAST_INSERT_ID()
    -- only tracks AUTO_INCREMENT keys, not UUID primary keys.
    SET new_post_id = UUID();

    INSERT INTO posts (post_id, post_content, created_at, author, caption, location)
    VALUES (new_post_id, post_content, NOW(), author_id, caption, location);

    SELECT new_post_id AS post_id;
END $$
DELIMITER ;

Purpose: Creates a new post.
Parameters:
author_id: ID of the post author.
post_content: Content of the post.
caption: Caption for the post.
location: Location associated with the post.
Actions:
- Inserts a new post into the posts table with the provided details.
- Returns the ID of the newly created post.

3. LIKING A POST:
Code:
DELIMITER $$
CREATE PROCEDURE like_post(
    IN user_id CHAR(36),
    IN post_id CHAR(36)
)
BEGIN
    DECLARE new_like_id CHAR(36);
    SET new_like_id = UUID();

    INSERT INTO likes (like_id, created_at, user_id, post_id)
    VALUES (new_like_id, NOW(), user_id, post_id);

    SELECT new_like_id AS like_id;
END $$
DELIMITER ;
Purpose: Records a user's like on a post.
Parameters:
user_id: ID of the user liking the post.
post_id: ID of the liked post.
Actions:
- Inserts a new like into the likes table with the user ID and post ID.
- Returns the ID of the newly created like.

4. GETTING POSTS:
Code:
DELIMITER $$
CREATE PROCEDURE get_posts(
    IN user_id CHAR(36)
)
BEGIN
    SELECT *
    FROM posts
    WHERE author = user_id;
END $$
DELIMITER ;

Purpose: Retrieves all posts by a specific user.
Parameters:
user_id: ID of the user whose posts are retrieved.
Actions:
- Selects all posts from the posts table where the author ID matches the provided user ID.

FUNCTIONS

A function is a precompiled set of SQL statements that performs a specific task. Functions can take input parameters, perform operations, and return a result. Functions are typically used for calculations or data manipulations and can be invoked within SQL statements or other stored procedures.

1. MESSAGES PER DAY:
Code:
DELIMITER $$
CREATE FUNCTION messages_on_a_day(
    userID CHAR(36),
    createdAt DATE
)
RETURNS INT READS SQL DATA
BEGIN
    DECLARE photo_message_count INT;

    SELECT COUNT(*) INTO photo_message_count
    FROM messages
    WHERE sender_id = userID
      AND message_type = 'Photo'
      AND DATE(created_at) = createdAt;

    RETURN photo_message_count;
END $$
DELIMITER ;

Purpose: Retrieves the count of photo messages sent by a user on a specific date.
Parameters: userID, createdAt (date).
Returns: The count of photo messages sent on the specified date.

2. LATEST POST:
Code:
DELIMITER $$
CREATE FUNCTION get_latest_post(
    user_id CHAR(36)
)
RETURNS CHAR(36) READS SQL DATA
BEGIN
    DECLARE latest_post_id CHAR(36);

    SELECT post_id
    INTO latest_post_id
    FROM posts
    WHERE author = user_id
    ORDER BY created_at DESC
    LIMIT 1;

    RETURN latest_post_id;
END $$
DELIMITER ;

Purpose: Retrieves the ID of the latest post authored by a specific user.
Parameters: user_id.
Returns: The ID of the latest post.

3. POST COUNT:
Code:
DELIMITER $$
CREATE FUNCTION get_post_count(
    user_id CHAR(36)
)
RETURNS INT READS SQL DATA
BEGIN
    DECLARE post_count INT;

    SELECT COUNT(*)
    INTO post_count
    FROM posts
    WHERE author = user_id;

    RETURN post_count;
END $$
DELIMITER ;

Purpose: Retrieves the count of posts authored by a specific user.
Parameters: user_id.
Returns: The count of posts.

4. LIKES COUNT:
Code:
DELIMITER $$
CREATE FUNCTION get_likes_count(
    postid CHAR(36)
)
RETURNS INT READS SQL DATA
BEGIN
    DECLARE likes_count INT;

    SELECT COUNT(*)
    INTO likes_count
    FROM likes
    WHERE post_id = postid;

    RETURN likes_count;
END $$
DELIMITER ;

Purpose: Retrieves the count of likes for a specific post.
Parameters: postid.
Returns: The count of likes for the given post.

APPLICATION DESCRIPTION

Below are the main features provided by the application.

1. User Registration and Authentication:
- Users can register by providing necessary information such as username, email, and password.
- The application authenticates users during login to ensure secure access.

2. Profile Management:
- Users can manage their profiles, including updating personal information, changing profile pictures, and setting privacy preferences.

3. Post Creation and Interaction:
- Users can create posts containing text, images, videos, or other multimedia content.
- Interaction features include liking, commenting, and sharing posts.

4. Social Connections:
- Users can follow and be followed by other users to build a network.
- The application displays a user's feed, showing posts from followed users.

5. Messaging:
- Private messaging functionality allows users to send direct messages to each other.

6. Group Creation and Membership:
- Users can create or join groups based on common interests.
- Group members can share posts within the group.

7. Content Discovery:
- Features like trending posts, hashtags, and search functionality help users discover interesting content.

Data Storage and Retrieval

1. User Data:
- User information, including profiles, credentials, and preferences, is stored in the `user` and `user_auth` tables.
- Retrieval involves querying these tables based on user IDs or other relevant criteria.

2. Post and Content Data:
- Post content is stored in the `posts` table, and multimedia content (images, videos) may have separate tables like `photo` and `video`.
- Retrieval involves joining tables to fetch posts and associated multimedia content.

3. Social Connections:
- Follower and following relationships are stored in the `followers` table.
- Retrieval involves querying this table to determine a user's followers or the users a person is following.

4. Messaging Data:
- Message data is stored in the `messages` table.
- Retrieval involves querying this table based on sender and receiver IDs.

5. Group and Membership Data:
- Group information is stored in the `groupss` table, and membership details are in the `group_members` table.
- Retrieval involves querying these tables based on group IDs and user IDs.

6. Interaction Data (Likes, Comments):
 Likes and comments are stored in the `likes` and
`comments` tables.
The file structure in the image is a Flask project. It is well-
 Retrieval involves querying these tables based on
post IDs or user IDs. organized and consistent, with each type of file grouped

7. Notification Data: into a separate directory.


 Notification details are stored in a dedicated table,
and retrieval involves querying based on user IDs
 social-media-app: This directory contains the
and notification types.
main application code, including views, and mod-
8. Content Discovery Data:
 Trending posts data may be stored separately, and els.
retrieval involves querying this data based on rele-  data_operations: This directory contains scripts
vant criteria.
for managing the database, such as creating, up-
dating, and deleting data.
CODE STRUCTURE
Below are the request bodies for insert, update
and delete APIs
Insert
Endpoint: http://127.0.0.1:5000/insert_data
Request body:
{
"table_name": "user",
"user_id": "99911d4e-197d-4484-
bde2-1e22efb125e8",
"user_name": "prudhviivuda",
"email_id":
"prudhvi.vuda2@example.com",
"phone_number": "9573768312",
"first_name": "prudhvi",
"last_name": "vuda",
"created_at": "2023-01-01
12:00:00",
"updated_at": "2023-01-01
12:00:00",
"gender": "Male",
"date_of_birth": "1990-01-15",
"profile_image":
"profile_img.jpg",
"bio": "A bio about prudhvi
Doe.",
"is_verified": true,
"is_active": true,
"is_reported": false
}
 parser: This directory contains modules for pars-
ing social media content.
 app.py: This is the main entry point for the Flask
Update
application. It contains the code that initializes the
Endpoint: http://127.0.0.1:5000/update_data
Flask framework and loads the application's set-
Request body:
{
tings.
"table_name": "user",  db_utils.py: This module contains utility func-
"user_id": "1bf8a166-1fea-4e5a- tions for interacting with the database.
bc4d-b15ceed934af",
"bio": "Hey, there",
 instructions.txt: This file contains instructions
"user_name": "DClark" for setting up and running the Flask application.
}  README.md: This file provides an overview of
the Flask application, including its purpose, fea-
tures, and installation instructions.

Delete Link to the repository:

Endpoint: http://127.0.0.1:5000/delete_data https://github.com/Prudhvivuda/social-media-db-system

Request body:
{
In summary, the application utilizes a relational database
with multiple tables to store various types of data. Data re-
"table_name": "user",
trieval is achieved through SQL queries, often involving
"user_id": "21a8bd86-7561-4a8c- joins to assemble comprehensive information for display in
99be-6e094450f24c" the application's user interface. The database design sup-
} ports the features of the application and ensures efficient
data storage and retrieval.

ANALYSIS AND GRAPHS

 queries: This directory contains modules for exe-


cuting database queries.
 schema: This is the directory which contains the
schema creation commands, data insertion
queries, stores procedures, views, triggers and
functions.
 views: This directory contains Python modules
that define the application's views. Views are re-
sponsible for handling user requests and generat-
Figure 1 Number of Posts by age group and gender
ing responses.
The number of posts made on social media by users of var- Text posts: 45%
ious ages and genders is displayed in a bar chart. The y- Image posts: 30%
axis displays the total number of posts, while the x-axis Video posts: 20%
displays the age group and gender. Link posts: 5%

According to the chart, 18–24 is the age group with the The pie chart shows that text posts are the most popular
most posts, followed by 25–34. 65 and older is the age type of post on social media, accounting for nearly half of
group with the fewest posts. With the exception of the 45– all posts. Image posts are also popular, accounting for 30%
54 age range, men posted more than women overall. of all posts. Video posts are less popular, accounting for
20% of all posts. Link posts are the least popular type of
Here is a more detailed breakdown of the chart: post, accounting for only 5% of all posts.
Age group 18-24: Males made 390 posts, while females
made 320 posts. This is likely because text posts are the easiest type of post
Age group 25-34: Males made 350 posts, while females to create. They also require the least amount of data to up-
made 280 posts. load, which is important for users who are on limited data
Age group 35-44: Males made 300 posts, while females plans. Image posts are also popular because they are vis-
made 260 posts. ually appealing and can be used to convey a lot of informa-
Age group 45-54: Males made 250 posts, while females tion quickly. Video posts can be even more engaging than
made 270 posts. image posts, but they can be more time-consuming to cre-
Age group 55-64: Males made 200 posts, while females ate and upload. Link posts are the least popular type of post
made 180 posts. because they are the least engaging. They also require
Age group 65+: Males made 150 posts, while females users to leave the social media platform in order to access
made 130 posts. the linked content.

Overall, the chart shows that younger users are more active Overall, the pie chart shows that text and image posts are
on social media than older users. It also shows that males the most popular types of posts on social media. Video
are more active on social media than females, except for in posts are less popular, but they are becoming more popular
the 45-54 age group. as internet speeds increase and social media platforms
make it easier to upload and share videos. Link posts are
the least popular type of post.

Figure 3 Reported types

The report shows the percentage of reported content by fe-


male users and male users on a social media database. The
report shows that male users reported more content than fe-
Figure 2 Device usage analysis
male users, accounting for 67% of all reported content. Fe-
male users reported 33% of all content.
The pie chart for the social media database shows the per-
centage of posts that are made in each category. The cate- The report shows that the most common type of reported
gories are listed in the order of popularity, with text posts content was harassment, followed by threats and imperson-
being the most popular. ation. Male users were more likely to report harassment
and threats, while female users were more likely to report
Here is a breakdown of the pie chart: impersonation.
The report also shows that male users were more likely to report spam and inappropriate content. This could be because male users are more likely to follow businesses and organizations on social media, which are more likely to post spammy content. Additionally, male users may be more likely to be exposed to inappropriate content, as they are more likely to follow accounts that post sexually suggestive or violent content.

Overall, the report shows that male users are more likely to report content on social media than female users. This could be due to a number of factors, such as the types of accounts that male users follow, the types of content that male users are exposed to, and the different ways that male and female users interact with social media.
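Percentage splits like the 67% / 33% figure above can be computed in a single aggregate query. This is a sketch against a hypothetical `reports` table that records the reporting user's gender; the names are illustrative, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per report, tagged with the reporter's gender.
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, reporter_gender TEXT)")
conn.executemany("INSERT INTO reports (reporter_gender) VALUES (?)",
                 [("M",)] * 67 + [("F",)] * 33)

# Percentage of all reports filed by each gender.
rows = conn.execute("""
    SELECT reporter_gender,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM reports), 1) AS pct
    FROM reports
    GROUP BY reporter_gender
    ORDER BY pct DESC
""").fetchall()
print(rows)  # [('M', 67.0), ('F', 33.0)]
```

The `100.0 *` multiplier forces floating-point division, which would otherwise truncate to an integer in SQLite.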
CONCLUSION

In conclusion, the project has successfully achieved its objectives by implementing a comprehensive and scalable database schema for our application. The database design incorporates essential features such as user authentication, messaging, social interactions, and content sharing, providing a robust foundation for the application's functionalities.

Throughout the development process, careful consideration was given to data integrity, relational dependencies, and performance optimizations. The use of foreign key relationships, indexing, and appropriate data types ensures the efficiency and reliability of the database operations.

Additionally, the implementation of triggers, such as the automatic removal of related records in the case of user blocking, enhances the application's user experience and maintains data consistency.
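The blocking trigger described above can be sketched as follows. The table and column names here are illustrative stand-ins for the actual schema: when a block is recorded, the trigger removes any follow relationships between the two users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Illustrative tables; the project's real schema may differ.
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    CREATE TABLE blocks  (blocker_id INTEGER, blocked_id INTEGER);

    -- When a block is inserted, drop follow edges in both directions.
    CREATE TRIGGER purge_follows_on_block AFTER INSERT ON blocks
    BEGIN
        DELETE FROM follows
        WHERE (follower_id = NEW.blocked_id AND followee_id = NEW.blocker_id)
           OR (follower_id = NEW.blocker_id AND followee_id = NEW.blocked_id);
    END;
""")

conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (2, 1), (3, 1)])
conn.execute("INSERT INTO blocks VALUES (1, 2)")  # user 1 blocks user 2

remaining = conn.execute("SELECT * FROM follows").fetchall()
print(remaining)  # [(3, 1)] -- only the unrelated follow survives
```

Putting this cleanup in a trigger rather than in application code means the rule holds no matter which code path inserts the block row.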
The integration of Flask for the backend, Postman for API testing, and SQL for database management has resulted in a well-rounded and functional system. Moving forward, potential improvements could involve optimization for larger datasets, enhanced security measures, and exploring opportunities for scalability.

This project not only demonstrates technical proficiency in database design but also underscores the importance of thoughtful planning and collaboration between backend and frontend components for a seamless user experience. Overall, the completed database schema and associated functionalities provide a solid foundation for future iterations and enhancements, ensuring the application is well-prepared to meet the demands of its users.

If there were more time, several additional enhancements and optimizations could be considered for this project. Here are some potential areas for improvement:

1. Performance Optimization:
• Explore database sharding or partitioning strategies for improved scalability with a growing user base.
• Conduct performance testing and profiling to identify and address bottlenecks.

2. Security Enhancements:
• Implement encryption for sensitive user data, such as passwords and personal information, to enhance overall security.
• Integrate secure coding practices to mitigate potential vulnerabilities, such as SQL injection, parameter tampering, and cross-site scripting (XSS).

3. User Authentication and Authorization:
• Enhance user authentication mechanisms by incorporating multi-factor authentication for added security.
• Implement role-based access control (RBAC) to control user access to different features or data within the application.

4. User Experience (UX) Improvements:
• Implement caching mechanisms to improve response times for frequently accessed data.
• Enhance error handling and provide meaningful error messages to users.
• Implement features for users to customize and personalize their experience.

5. Scalability and Load Testing:
• Make use of cloud-based databases and services for improved scalability and flexibility.

6. Data Analytics and Reporting:
• Integrate tools for data analytics and reporting to derive insights from user interactions and trends.
• Implement scheduled jobs or scripts to generate reports on user engagement, popular content, etc.

7. Cross-Platform Compatibility:
• Optimize the application for cross-platform compatibility, ensuring a consistent experience across different devices and browsers.

8. Documentation and Code Refactoring:
• Document the database schema, API endpoints, and application architecture comprehensively.
• Conduct code reviews and consider refactoring to improve code maintainability and readability.
9. Testing and Quality Assurance:
• Implement a comprehensive testing strategy, including unit tests, integration tests, and end-to-end tests, to ensure the reliability of the application.
• Set up continuous integration and continuous deployment (CI/CD) pipelines for automated testing and deployment.

10. User Feedback and Iterative Development:
• Gather user feedback through surveys, analytics, or user interviews to identify areas for improvement.
• Plan for iterative development cycles to incorporate user feedback and continually enhance the application.

For future students working on database management projects, here are some pieces of advice to help ensure a successful and effective project:

1. Understand Requirements Thoroughly:
• Begin by gaining a deep understanding of the project requirements and objectives. Clearly define the scope and functionalities of the database system.

2. Design a Solid Database Schema:
• Invest time in designing a well-thought-out and normalized database schema. Consider the relationships between tables, data types, and indexing to optimize data retrieval.

3. Choose the Right Database System:
• Select a database system that aligns with the project's requirements. Consider factors such as data volume, complexity, scalability, and performance.

4. Plan for Scalability:
• Anticipate future growth and design the database system to scale efficiently. Consider scalability options such as sharding, partitioning, or using cloud-based services.

5. Implement Data Validation and Quality Checks:
• Ensure that data entering the database is validated and meets quality standards. Implement checks at the application level and, if possible, within the database.

6. Document Database Structure and Processes:
• Maintain comprehensive documentation for the database schema, stored procedures, and any scripts. Clear documentation aids in understanding, troubleshooting, and future development.

7. Test Thoroughly:
• Prioritize thorough testing, including unit tests, integration tests, and stress tests. Identify and resolve issues early in the development process.

A well-managed database is crucial for the overall success of an application or system. By following these pieces of advice, future students can navigate the complexities of database management projects more effectively.

REFERENCES

[1] https://github.com/Prudhvivuda/social-media-db-system
[2] https://github.com/ssahibsingh/Social-Media-Database-Project
[3] https://github.com/RilThunder/CSS-475-Final-Project
[4] https://github.com/Abhishek4848/SocialMedia-Database-Management-System
[5] Saving Social Media Data: Understanding Data Management Practices
