ASSIGNMENT 2
PART 1
1. Executive Summary
2. Introduction
3. Methodology
3.6.8. Scalability
3.6.9. Performance
3.6.10. Security
3.6.11. Flexibility
PART 2
1. Queries
2. Results
3. Work Division
4. Conclusion
References
PART 1
1. Executive Summary
This report presents the findings of a two-part big data and database management assignment. In
Part 2, the query operations on the tourPedia database were completed successfully. The
exercise provided hands-on practice with datasets and a variety of data manipulation and
querying techniques.
2. Introduction
Organisations today must handle enormous volumes of data, which has made big data databases
increasingly important. Technological advances and widely available high-speed internet mean
that data is being produced at an unprecedented rate. To stay competitive, organisations need to
manage this data and draw insights from it. This need has driven the emergence of big data
technologies such as MongoDB, a NoSQL document-oriented database that is
commonly used for big data applications (Chong & Fan, 2019).
This report covers the implementation of a big data database in MongoDB, created by merging the
individual databases of each group member. It gives an overview of each member's individual
database, describes how these databases were integrated, discusses the decisions made during
implementation, and offers a critical assessment of the chosen database solution. It also
assesses the finished database's usability and utility from the end user's viewpoint
(Diogo & Santos, 2020).
The report aims to give readers a thorough grasp of how to create a big data database using
MongoDB and the advantages it may offer businesses. It is intended to assist people and
organisations considering a big data database solution to meet their data management
requirements.
3. Methodology
The big data database was created in MongoDB using a collaborative methodology, with each group
member contributing their individual database design from Assessment 1. To find
improvement opportunities and ensure the separate databases were compatible with MongoDB,
Online_Bookings, Referrals, and Insurance Provider that provide unique aspect of this ERD.
Therapy_Type, and Patient Vital Signs. We can use these entities to expand the scope of our
The addition of the Medical_Test, Therapy_Session, Therapy_Type, and Patient Vital Signs
entities in the ERD greatly increases the scope and functionality of the final database solution.
These entities offer additional channels for healthcare data collection and processing, which is
crucial for supporting clinical activities like disease diagnosis, chronic condition management,
An elegant technique that successfully represents the intricacy of the relationships between the
entities. This method supports correct data recording and reduces the chance of duplicate or
inconsistent data.
The association names and cardinalities offer a precise and thorough representation of the
connections between the database's elements. This feature makes it easier to navigate between
related entities and ensures that data is correctly associated with the appropriate entity,
This ERD has two association names on a single association line connecting two entities, and
there is no direct relationship between Doctor and Patient. These flaws can be corrected.
Furthermore, the ERD contains no unique entity that could extend the scope of our final database.
For improved data modelling, a few flaws with this ERD must be fixed. First, there are occasions
where a single association line connecting two entities lists two association names. This may
make the connections between entities unclear and confusing. For instance, in this ERD, there is
no clear connection between the entities Doctor and Patient. A more accurate and understandable
representation of the data would arise from making the links between entities more explicit and
The absence of a special entity that can broaden the database's scope is another problem with this
ERD. Electronic health records, laboratory testing, and other particular entities can offer more
functionality and insights for hospital management and decision-making. Such entities would
To find any flaws or faults and to improve the data model for accurate and efficient data
Source code for the final big data database implementation can be found at this link:
https://github.com/inqlo/asim-hospital/blob/main/asim-new_york_city_hospital.json
The video demonstration can be found here:
https://drive.google.com/file/d/1ws6wNGNCW1atHJIWtH4saRKtIjJqEx0c/view
Compared with the earlier ERD, the final database design, which incorporates the additional
entities Patient Vital Signs, Therapy Session, Therapy Type, and Medical Test, appears to be an
improvement.
The newly established entities offer more functionality and insights into managing patient health
and treatment, which can be helpful for healthcare practitioners. For instance, the inclusion of
Patient Vital Signs can aid medical professionals in tracking a patient's vital signs over time and
identifying any changes that would call for additional care or treatment. The Therapy Type entity
offers details on the sorts of therapy a patient receives, while the Therapy Session entity enables
doctors to monitor therapy sessions and their results. Last but not least, the Medical Test entity
keeps a record of the specifics of the medical tests carried out, which can help physicians make
The ERD still leaves room for improvement, though. One potential problem was the absence of
unambiguous links between some of the entities. The relationship between the Hospital
Overall, adding the new entities has expanded the database's capabilities and improved the
accuracy and level of detail with which patient health and treatment data are managed. If
needed, the ERD can be refined further to ensure it is clear and well organised, with unambiguous
entity and attribute names and clear links between entities.
3.6.1. Documentation and Metadata
The solution contains thorough documentation and metadata that facilitates efficient database
maintenance and understanding, improving data usability, and ensuring accurate interpretation
The final database solution establishes a consistent naming convention and data representation,
enhancing clarity, usability, and manipulation of data, thereby reducing confusion and mistakes.
By acting as a central repository for diverse healthcare-related data, the database system
supports data exchange and integration with other systems such as electronic health records
(EHR), laboratory systems, or billing systems (Patel & Patel, 2018).
The final database solution aims to provide comprehensive data coverage by including essential
entities and attributes relevant to healthcare operations, laying the foundation for capturing and
The database solution implements suitable methods for data quality, privacy, and compliance,
enforcing data governance standards, defining data stewardship responsibilities, and executing
The final database solution optimizes data storage by making use of effective storage methods
like indexing, compression, and partitioning, maximizing storage efficiency, and reducing
The database solution includes techniques for handling mistakes, exceptions, and inconsistent
data, maintaining data integrity, system stability, and user confidence in the veracity of the stored
3.6.8. Scalability
The final database solution is designed to scale with the growth of the healthcare organization. It
can handle a significant amount of data and can accommodate the addition of new entities as the
organization expands. This scalability ensures that the database can meet the evolving needs of
3.6.9. Performance
The final database solution is optimized for performance, ensuring that queries are executed
efficiently and quickly. The solution can process large amounts of data without slowing down,
ensuring that healthcare professionals can access the information they need in a timely manner.
3.6.10. Security
The final database solution includes robust security measures to protect sensitive healthcare data
from unauthorized access. The solution implements authentication and authorization mechanisms
to ensure that only authorized personnel can access the data. It also provides data encryption to
protect against data breaches and cyber-attacks (Diogo & Santos, 2020).
3.6.11. Flexibility
The final database solution is designed to be flexible, allowing for customization based on the
specific needs of the healthcare organization. The solution can be tailored to include additional
entities, attributes, and relationships as required, ensuring that it can adapt to the evolving needs
The JSON structure made available to healthcare organisations is significant because it offers a
uniform format for storing and organising crucial medical data. Its use of simple key-value
pairs keeps the data accessible and processable, making it simple for healthcare
The versatility of the JSON format, which enables healthcare providers to store diverse types of
data in one place, is a key benefit. This helps them manage patient information effectively and
make informed decisions about care. Additionally, the JSON's hierarchical structure ensures
that related data, such as patient referrals and bookings, is grouped together to make it simpler to
management solution. They are able to accurately track patient data, including medical
history, treatment plans, prescriptions, and billing data. This helps healthcare professionals to
provide patients with better treatment and ensure that all parts of their care are managed
effectively.
PART 2
1. Queries
2. Give the name and phone number of places with a phone number entered
($exists, $ne);
db.paris.find(
},
"name": 1,
"contact.phone": 1
$and: [
},
"name": 1,
"contact": 1
4. Name of places whose name contains the word “hotel” (pay attention to case);
db.paris.find(
{ "name": 1 }
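As a hedged illustration of question 4, the sketch below writes one plausible filter as a plain JavaScript object and mirrors its effect on a few in-memory documents. It assumes that "pay attention to case" means the match should ignore letter case. `$regex` and `$options` are standard MongoDB query operators, but the sample documents and the names `hotelFilter`, `samplePlaces`, and `hotelNames` are made up for illustration and do not come from the original query.

```javascript
// Assumed filter: match "hotel" anywhere in the name, ignoring letter case.
const hotelFilter = { name: { $regex: "hotel", $options: "i" } };
// Projection shown in the query above: return only the name (plus _id by default).
const hotelProjection = { name: 1 };
// In mongosh this pair would be used as: db.paris.find(hotelFilter, hotelProjection)

// Hypothetical documents standing in for the "paris" collection.
const samplePlaces = [
  { name: "Grand Hotel Example" },
  { name: "HOTEL Sample" },
  { name: "Example Bistro" },
  { name: "hotel du test" },
];

// Plain-JavaScript mirror of the filter, built from the same pattern and flag.
const hotelPattern = new RegExp(hotelFilter.name.$regex, hotelFilter.name.$options);
const hotelNames = samplePlaces
  .filter((place) => hotelPattern.test(place.name))
  .map((place) => place.name);

console.log(hotelNames);
```

A name spelled "Hôtel" with a circumflex would not match this pattern, so whether a broader pattern is needed depends on how names are written in the data.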
db.paris.find(
{ "services": { $size: 5 } },
{ "name": 1, "services": 1 })
{ "location.coord.coordinates": 1, _id: 0 }
10. For each “poi” category place name, give the number of reviews whose source is Facebook.
db.paris.aggregate([
{
$match: {
"reviews.source": "Facebook",
category: "poi"
},
$group: {
_id: "$name",
},
$sort: {
reviewCount: -1
])
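One way the pipeline for question 10 could be completed is sketched below, together with a plain-JavaScript mirror that runs on in-memory data. The fragment above shows only the `$match`, the `$group` on `$name`, and the `$sort` on `reviewCount`. The `$unwind` stage and the `$sum: 1` accumulator are assumptions added so that individual reviews, rather than places, are counted. The sample documents, the "Foursquare" source value, and the names `facebookPipeline`, `poiSample`, and `facebookCounts` are hypothetical.

```javascript
// Assumed completion of the pipeline; in mongosh: db.paris.aggregate(facebookPipeline)
const facebookPipeline = [
  { $match: { category: "poi" } },                          // keep only "poi" places
  { $unwind: "$reviews" },                                  // one document per review
  { $match: { "reviews.source": "Facebook" } },             // keep Facebook reviews only
  { $group: { _id: "$name", reviewCount: { $sum: 1 } } },   // count reviews per place name
  { $sort: { reviewCount: -1 } },                           // most-reviewed places first
];

// Hypothetical documents standing in for the "paris" collection.
const poiSample = [
  { name: "Tower X", category: "poi",
    reviews: [{ source: "Facebook" }, { source: "Foursquare" }, { source: "Facebook" }] },
  { name: "Garden Y", category: "poi", reviews: [{ source: "Facebook" }] },
  { name: "Bistro Z", category: "restaurant", reviews: [{ source: "Facebook" }] },
  { name: "Museum W", category: "poi", reviews: [] },
];

// Plain-JavaScript mirror of the same steps.
const countsByName = new Map();
for (const place of poiSample) {
  if (place.category !== "poi") continue;
  for (const review of place.reviews ?? []) {
    if (review.source !== "Facebook") continue;
    countsByName.set(place.name, (countsByName.get(place.name) ?? 0) + 1);
  }
}
const facebookCounts = [...countsByName]
  .map(([name, reviewCount]) => ({ _id: name, reviewCount }))
  .sort((a, b) => b.reviewCount - a.reviewCount);

console.log(facebookCounts);
```

Without the `$unwind` stage, `$sum: 1` would count matching place documents rather than individual reviews, which is why it is included in this sketch.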
11. For each place name in the “restaurant” category, give the average rating and
the number of comments.
db.paris.aggregate([
$match: {
category: "restaurant"
},
{
$group: {
_id: "$name",
])
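The grouping stage for question 11 is cut off above, so the following is only a sketch of how an average rating and a comment count per restaurant name might be computed. It assumes each review carries a numeric `rating` field and that every review counts as one comment; neither assumption is confirmed by the text. The output field names `averageRating` and `commentCount`, the sample documents, and the variable names are likewise hypothetical.

```javascript
// Assumed pipeline; in mongosh: db.paris.aggregate(restaurantPipeline)
const restaurantPipeline = [
  { $match: { category: "restaurant" } },
  { $unwind: "$reviews" }, // one document per review
  {
    $group: {
      _id: "$name",
      averageRating: { $avg: "$reviews.rating" }, // "rating" is an assumed field name
      commentCount: { $sum: 1 },                  // each review counted as one comment
    },
  },
];

// Hypothetical documents standing in for the "paris" collection.
const restaurantSample = [
  { name: "Bistro A", category: "restaurant", reviews: [{ rating: 4 }, { rating: 5 }] },
  { name: "Cafe B", category: "restaurant", reviews: [{ rating: 3 }] },
  { name: "Hotel C", category: "accommodation", reviews: [{ rating: 5 }] },
];

// Plain-JavaScript mirror: accumulate a running sum and count per restaurant name.
const ratingStats = new Map();
for (const place of restaurantSample) {
  if (place.category !== "restaurant") continue;
  for (const review of place.reviews ?? []) {
    const entry = ratingStats.get(place.name) ?? { sum: 0, count: 0 };
    entry.sum += review.rating;
    entry.count += 1;
    ratingStats.set(place.name, entry);
  }
}
const restaurantSummary = [...ratingStats].map(([name, stats]) => ({
  _id: name,
  averageRating: stats.sum / stats.count,
  commentCount: stats.count,
}));

console.log(restaurantSummary);
```

In this sketch, restaurants with no reviews drop out at the `$unwind` stage. Setting `preserveNullAndEmptyArrays: true` on `$unwind` would keep them if that is preferred.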
The $lookup aggregation stage in MongoDB performs a left outer join between two collections
in the same database. You can use it to combine data from different collections
based on a common field or expression (MongoDB, 2023).
The $lookup stage's syntax is:
$lookup: {
  from: <collection>,
  localField: <field>,
  foreignField: <field>,
  as: <outputArray>
}
For example:
db.paris.aggregate([
  {
    $lookup: {
      from: "accommodation",
      localField: "_id",
      foreignField: "place_id",
      as: "accommodation"
    }
  }
])
This query uses the $lookup aggregation stage to join the paris collection with the
accommodation collection, matching the _id field of paris against the place_id field of
accommodation. Each output document gains an additional accommodation field containing an
array of the matched documents from the accommodation collection.
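The left outer join behaviour described above can be mirrored in a few lines of plain JavaScript. This is a simplified sketch: it ignores edge cases that the real stage handles, such as array-valued or missing join fields. The helper `lookupJoin`, the sample documents, and the `type` field are hypothetical; only the collection and field names `paris`, `accommodation`, `_id`, and `place_id` come from the example query.

```javascript
// Hypothetical documents standing in for the two collections.
const parisDocs = [
  { _id: 1, name: "Place A" },
  { _id: 2, name: "Place B" },
];
const accommodationDocs = [
  { _id: 10, place_id: 1, type: "room" },
  { _id: 11, place_id: 1, type: "suite" },
];

// Simplified in-memory version of a left outer join: every local document is kept,
// and the matching foreign documents are attached as an array under the `as` key.
function lookupJoin(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((foreign) => foreign[foreignField] === doc[localField]),
  }));
}

const joinedPlaces = lookupJoin(parisDocs, accommodationDocs, "_id", "place_id", "accommodation");

console.log(JSON.stringify(joinedPlaces, null, 2));
```

"Place B" has no matching accommodation documents, yet it still appears in the output with an empty array. That is what makes the join a left outer join rather than an inner join.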
2. Results
The JSON data was downloaded from the provided URL and saved locally for later use.
To make the data easier to manage and retrieve, it was imported into a
MongoDB database, and particular collections, including Paris and TourPedia, were created.
Numerous fields are included in the "paris" collection of the "tourPedia" database to record
crucial information about each site, including _id, contact, name, location, category, description,
services, and reviews. Queries on the "paris" collection required filtering, aggregation, lookup,
necessary data from the "paris" collection. Combining data from multiple collections, with the
results limited to a small set, was demonstrated using a specific query and the $lookup stage.
3. Work Division
Asim Khan and Muhammad Salman worked as a team to develop the final database solution.
The entity relationship diagram (ERD) analysis and final database design were the responsibility
of Asim Khan. He further refined the ERD by using association names and cardinalities,
and he resolved many-to-many links by including junction tables and alternative entities. The
dataset was obtained, and the JSON was denormalized with significant assistance from
Muhammad Salman. He changed the data into a format that could be used and would work with
the eventual database solution. Additionally, he made sure the information was correct,
build the database's framework and manage queries. He created the database's tables,
relationships, and schema, making sure they were efficient, clear, and consistent. Additionally,
he created queries that let users access and change data in the database.
4. Conclusion
This report explored two key components of big data processing: database architecture and
query manipulation. The first part covered combining entity-relationship diagrams (ERDs) and
creating a unique database schema. The second part, which used actual data from the
"tourPedia" database, concentrated on query manipulation. Together, the integration of ERDs,
the construction of our database schema, and the queries on the "tourPedia" database
illustrated the use of big data principles. This exercise enhanced our knowledge of database
design and query execution while highlighting the importance of
data management and analysis in real-world contexts. This knowledge and skill set will be
References
Baig, S. A., & Khan, A. (2017). A comparative analysis of NoSQL databases. International
Chong, E., & Fan, J. (2019). NoSQL databases for big data applications: An overview. Big Data
https://doi.org/10.1016/j.ijmedinf.2020.104180
Hersh, W. R., Weiner, M. G., Embi, P. J., Logan, J. R., Payne, P. R., Bernstam, E. V., ... &
Lehmann, H. P. (2013). Caveats for the use of operational electronic health record data in
Masic, I., Pandza, H., & Kulasin, I. (2014). Importance of medical databases and registries in
Mehta, S., Sankaranarayanan, R., & Varadarajan, S. (2019). A comparative study of NoSQL
databases for big data applications. International Journal of Computer Science and Information
Patel, R., & Patel, D. (2018). A comparative study of relational and NoSQL databases.