DLMDMDQL01 Course Book
Masthead
Publisher:
IU Internationale Hochschule GmbH
IU International University of Applied Sciences
Juri-Gagarin-Ring 152
D-99084 Erfurt
Mailing address:
Albert-Proeller-Straße 15-19
D-86675 Buchdorf
media@iu.org
www.iu.org
DLMDMDQL01
Version No.: 001-2022-1013
Module Director
Prof. Dr. Peter Poensgen
Table of Contents
Data Query Languages
Module Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Introduction
Data Query Languages 7
Signposts Throughout the Course Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Unit 1
Introduction to Data Query Languages 12
1.1 Definition of Data Query Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Unit 2
Data Management 18
2.1 Data Life Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Unit 3
Fundamentals of SQL 28
3.1 Brief Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Unit 4
Advanced SQL 44
4.1 Transaction Control Language (TCL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Unit 5
Data Query Languages for NoSQL Database and Other Purposes 62
5.1 Document Databases (N1QL/Couchbase and MongoDB) . . . . . . . . . . . . . 62
Unit 6
Using Data Query Languages within Application Programming 82
6.1 Special Aspects (Architecture, Connection Management, Coding, and Testing) . . . . . . . . 82
Appendix 1
List of References 94
Appendix 2
List of Tables and Figures 100
Introduction
Data Query Languages
Welcome
This course book contains the core content for this course. Additional learning materials can
be found on the learning platform, but this course book should form the basis for your
learning.
The content of this course book is divided into units, which are divided further into sections.
Each section contains only one new key concept to allow you to quickly and efficiently add
new learning material to your existing knowledge.
At the end of each section of the digital course book, you will find self-check questions.
These questions are designed to help you check whether you have understood the concepts
in each section.
For all modules with a final exam, you must complete the knowledge tests on the learning
platform. You will pass the knowledge test for each unit when you answer at least 80% of the
questions correctly.
When you have passed the knowledge tests for all the units, the course is considered finished and you will be able to register for the final assessment. Please ensure that you complete the evaluation prior to registering for the assessment.
Good luck!
Learning Objectives
Databases are one of the most used technologies today. Nearly every industry uses databases of some sort to store data. This storage also enables data to be grown, manipulated, and exported for various uses. Thus, it is important to understand how to access these data using data query languages.
The course Data Query Languages addresses how data are stored in databases, and it also
defines various types of data. This is important for the data life cycle, a key concept that
describes the multiple stages that data go through when used in a database.
The most used data query language is SQL, and it consists of a data definition language, a
data query language, and a data manipulation language. There are also advanced commands,
as seen with transaction control languages and data control languages.
Although SQL still has prominence and is usually the first data query language that students
study and explore, numerous other databases are becoming more common as market needs
change. NoSQL is one such category of databases, and it includes document databases and graph databases, along with associated query languages such as GraphQL. All of these are influential in the data management industry. They each play an important role depending on the requirements of a project and the particulars of the data utilized.
Unit 1
Introduction to Data Query Languages
STUDY GOALS
DL-E-DLMDMDQL01-U01
1. Introduction to Data Query Languages
Introduction
The number of programming languages continues to grow each year as complexity
increases, technology advances, and the needs of the average developer evolve. As with
many other technologies, whether iPhones or TVs, programming languages evolve in
generations that can be divided by appearance, advancement, and technology facilita-
tion.
Data query languages are widely used by programmers as they are essential to the
operations of their designated databases. Everywhere that a database is used, query
languages are also used. Students in many professions beyond computer programming are learning query languages and databases, as career fields increasingly find this knowledge advantageous even in non-technical roles.
In general, data query languages are relatively easy to read and understand by any
user. Although advanced queries and commands need to be studied and reviewed (as
with any other programming language), good queries make for quick transactions
within the database and for easy review, without requiring a high level of expertise.
Tools built on top of query languages have also grown in number. This growth has allowed many without formal expertise in data languages to pull and utilize data; anyone who can drag and drop can create analytic reports.
Earlier generations included the very basic machine language in the first generation, assembly language in the second, and high-level language in the third (Flynn, n.d.). These three types of languages lack the complexity found in data languages. For example, machine language included extremely simple commands—think 0s and 1s. Query languages go far beyond that level of complexity. They allow for tasks like pulling and updating complex data using given functions. This advanced complexity means that fewer lines of code are used compared to what was needed in earlier generations.

Machine language: This is a computer programming language that consists of binary or hexadecimal instructions to which a computer can respond directly.

Although the code is more complex, query languages are significantly easier to read since they are written in English-like sentences (Flynn, n.d.). Thus, users can read the code more naturally and learn it more easily than in past generations. Languages in this fourth generation include Perl, Python, and Ruby (Enos, 2020).

Assembly language: This is a type of low-level programming language that is intended to communicate directly with a computer's hardware.

The shift from the fourth generation to the fifth generation was a significant leap. Although the fifth generation of programming languages is the most "current" generation, past generations are still widely used. Fifth-generation languages are defined as "any programming language based on problem-solving using constraints given to the program, rather than using an algorithm written by a programmer" (Enos, 2020, para. 7).
This generation did not alter how the code looked; instead, it changed how the code worked. Data query languages within this generation vary greatly, as they depend on the programmer to ensure that queries are written correctly. Fifth-generation languages are moving away from the necessity of a programmer. In fact, they give the computer a greater share of the work.
What makes the data query language so strong is its uniqueness compared to other
programming languages. It has objects, it relies on strict procedures, and its main goal
is to retrieve or manipulate data rather than to run a program. In some sense, data
query languages are the strong legs that support the table of the programming application, where another language—such as Java, Python, or .NET—runs the rest of the program. However, the main program cannot do anything with the data without the query language's support.
Although it was the first data query language, SQL isn’t the only popular language in
use today. SQL belongs to the relational family of databases, which makes it a strong
candidate for anyone who needs to organize large quantities of data for reporting tools
or management systems. However, with the growth of big data and its increasing role in
business, NoSQL (which stands for “not only SQL”) was developed to fulfill what SQL
and relational databases could not (Meier & Kaufmann, 2019).
A standard example of the graph database model is Neo4j, whose query language, Cypher, is a declarative language that allows the user to store and retrieve data from the graph database (Cypher Query Language, n.d.). For JavaScript Object Notation (JSON) data, the query language JSONiq is used. The key-value store model follows a hash data structure, with a unique key and a pointer; Amazon SimpleDB is an example of this model. An example of the column model is Cassandra, which arranges columns by column family and in which keys point to multiple columns. Finally, the standard example of the document store model is MongoDB. This model stores information in JSON-like documents (Big Data Analytics News, 2014).
Another query language to note is MDX from Microsoft, which is standard for online analytical processing (OLAP) tools. MDX stands for multidimensional expressions and is the query language for multidimensional databases. Its queries can return results in multiple dimensions; to conceptualize this, imagine cube-like results (Moran, 2019).
Summary
Data query languages are the key to accessing the data found in a database. They are used alongside other programs to access data. Their specialized queries make accessing, modifying, and deleting data easier for the user, although advanced queries may require study before they are easily readable.
As with other programming languages, data query languages have evolved over
time and new languages have been created to suit the needs of today’s users. Data
query languages belong to the fourth generation of programming languages. Com-
pared to other programming languages used today, data query languages are more
evolved than those of previous generations, yet they lack the complexities of the
current fifth generation.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 2
Data Management
STUDY GOALS
DL-E-DLMDMDQL01-U02
2. Data Management
Introduction
Data are enigmas. Definitionally, data are factual pieces of information that we use to
measure an object or concept (Watt & Eng, 2022). While data may seem like simple
information, they go through life cycles completely unique to themselves. Data can be
broken down into unique categories and can be divided based on style, appearance,
and structure. They can be used as part of a calculation. They can be a person, a place,
an event, or any number of things. In fact, the different forms data can take are endless; therefore, categorizing data and understanding their life cycle is more complex than meets the eye.
The data life cycle provides the framework for the successful management and reuse of
data. Much like other life cycles in technology (or even non-technology), it follows the
same pattern of creation, living, maintenance, and death. Data, far from straightforward,
dynamically expand in this life cycle, and important steps are required for data to be
properly utilized.
Data are also divided into structured, unstructured, and semi-structured categories based on how they look, how they can be stored, and whether a user can easily read the information they contain. Data were first conceived as simple pieces of information; however, over time it was discovered that they can be cataloged, categorized, and placed based on their structure.
The data life cycle consists of five phases (Christiansen, 2021):

1. Creating
2. Storing
3. Using
4. Archiving
5. Destroying
Creating
Data need to be created before anything can be done with them. Therefore, creation is the first phase. There are multiple ways data can be created by an organization (Christiansen, 2021). These creation methods can also include other various aspects (e.g., governance), sub-steps (e.g., procedures), or sub-life cycles in which the data can be manipulated into proper formatting. The creation phase can be deceiving as it appears quick and straightforward. However, as mentioned, data are enigmas: They rapidly grow and unravel.
Storing
Data must be stored in the correct location (physical or virtual), and strong security
measures, efficient retention policies, and other lifecycle management processes must
be used when doing so (Christiansen, 2021). Organizations should also consider strong
backup processes, the need for multiple locations (whether they should be on-site or
off-site), security around these backups, and the schedules for how often these backups are created. All these policies contribute to resilient data retention by the organization and strong security for the users. For example, whenever a data leak occurs, it is common that a key element of the storing phase was missing, which caused the data to be stored insecurely and, thus, made vulnerable.
Using
Once the data are created and stored somewhere safe, they can then be used. This
stage forms the basis for some decision-making processes within an organization
(Christiansen, 2021). Data, in this phase, are retrieved, changed, updated, moved to
other applications (potentially overlapping with another data life cycle phase), and
saved for use in other contexts or for other purposes.
Security is still just as important in this phase. Audit trails, which track who is using data and how they are doing so, are an important and easy way to retrieve backup versions should a user manipulate data incorrectly, whether this was done maliciously or innocently. Specialists will usually create security levels for each type of user. For example, administrators will have access to all data, while regular read-only users may only have access to general data.

Audit trails: These are a record of the changes that have been made to a database or file.
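As a brief sketch of how an audit trail can be kept at the database level, a trigger can record every change automatically. The syntax below is MySQL-style, and the employee table with its phone column is a hypothetical example:

```sql
-- Hypothetical table that stores one row per change to employee.phone
CREATE TABLE employee_audit
( changed_at TIMESTAMP,
  changed_by VARCHAR(30),
  old_phone INT,
  new_phone INT
);

-- Fires before every UPDATE on employee and records who changed what
CREATE TRIGGER trg_employee_update
BEFORE UPDATE ON employee
FOR EACH ROW
  INSERT INTO employee_audit
  VALUES (CURRENT_TIMESTAMP, CURRENT_USER(), OLD.phone, NEW.phone);
```

Administrators can then query employee_audit to see who changed which data and when, supporting exactly the kind of malicious-or-innocent investigation described above.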
Archiving
When the appropriate time arrives, or at consistent intervals, copies of the data will be
moved to separate storage locations. This step is relevant to the work of developers,
while it is not generally important to users. In this step, developers will remove any
data that is unnecessary or old from the production database (Christiansen, 2021). This
increases efficiency as it eliminates maintenance of this unnecessary data in the active
database. As this step is time-consuming, costs more money, and has a lower immediate return, users see little use for data that are not in production or active use. Developers know differently: They understand how data can become important down the road, in unforeseeable future situations. Archiving may be a mundane step, but it is essential to the overall data life cycle.
Destroying
If all the data ever used by a company were always collected and archived, a significant
amount of useless data would be unnecessarily preserved. Not all old data are useless,
but some data are not worth keeping. If all data were retained, the collection size
would cause the amount of storage needed to balloon, which would prove costly to the
company (Christiansen, 2021). Therefore, companies must decide which data should be
destroyed.
As stated by Blancco (2019), “it’s also important to note that destroying data (data
destruction), is not the same as destroying the media on which data is stored (physical
destruction)” (para. 10). Physical destruction involves making the place where the data
are saved completely unusable. For example, this could include completely breaking a
hard drive or shredding storage media. Non-physical destruction ensures that the data
are destroyed forever. For example, this could include using a magnetic eraser to sani-
tize the data source, a process that makes the data irrecoverable. (Blancco, 2020). The
“data destruction process is confirmed using recognized verification methods and pro-
duces a certified, tamper-proof report” (Blancco, 2020, para. 15).
Structured
Structured data is highly organized and easy to read. Most often found in relational or
SQL databases, data in this dataset can be easily added, updated, and deleted by the
user (IBM Cloud Education, 2021). Generally, any user with an understanding of data
could use this type of dataset, including accessing and reading it. This has led to the
creation of many tools that make the use of structured data easier.
Structured data have the following properties (Meier & Kaufmann, 2019, p. 144):
• Schema: The database must have a structure including proper table formation, integrity constraints, and the definition of referential integrity.
• Data types: With the use of the relational database, data will always be found in a data type (for example, CHARACTER, INTEGER, DATE, or TIMESTAMP).
With structure come limitations, however. The dataset lacks flexibility and usability as it
is so highly structured (IBM Cloud Education, 2021), and its schemas require high
amounts of storage space.
Unstructured

Because the data are unstructured, they remain pure and in their natural format. Unlike structured data, which need to be divided and changed to fit the needs of the database schema, unstructured data remain untouched and in the same format as they are received. This is a positive for any data scientists who wish to use the data for analysis. This also makes it easier to pull data quickly, since no changes need to be made after the data are pulled. Thus, the rate of accumulation is much faster with unstructured data.
However, because the data is less structured and has low readability, it requires some-
one with high expertise to use it to prepare reports. This expertise can come from data
scientists or even specialized tools designed to handle unstructured data.
Key differences between structured and unstructured data include the following:
• Structured data are stored in the form of tables that require less storage space. They
can be stored in data warehouses, which makes them highly scalable. Unstructured
data, on the other hand, are stored as media files or in NoSQL databases, which
require more space. They can be stored in data lakes, which makes them difficult to
scale (IBM Cloud Education, 2021).
• Structured data are used in machine learning (ML) algorithms and strong report writing, whereas unstructured data are used in natural language processing (NLP), text mining, and big data management (IBM Cloud Education, 2021).
• Structured data have a predefined data model and are formatted to a set data
structure before they are placed in data storage, whereas unstructured data are
stored in their native format and are not processed until they are used (IBM Cloud
Education, 2021).
Semi-Structured
Semi-structured data have the following properties (Meier & Kaufmann, 2019, p. 144):
• They consist of a set of data objects whose structure and content are subject to
continuous changes.
• Data objects are either atomic or composed of other data objects (complex objects).
• Atomic data objects contain data values of a specified data type.
• Data management systems for semi-structured data work without a fixed database
schema since the structure and content change constantly.
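To illustrate, many relational systems can also hold semi-structured data. The sketch below uses MySQL 5.7+ syntax with a hypothetical product table: each row stores a JSON document whose structure varies from row to row, matching the properties listed above.

```sql
-- Each row may carry a differently shaped JSON document
CREATE TABLE product
( id INT,
  attributes JSON
);

INSERT INTO product VALUES (1, '{"color": "red", "size": "M"}');
INSERT INTO product VALUES (2, '{"weight_kg": 1.2, "fragile": true}');

-- Extract a field that only some rows contain; rows without it return NULL
SELECT id, JSON_EXTRACT(attributes, '$.color') AS color
FROM product;
```

No fixed schema constrains the attributes column, so new fields or nested objects can be added at any time without altering the table.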
There are two major categories of databases in use today: SQL and NoSQL databases.
Both branch off into separate database query languages, and both have designated
needs that make them powerful or even necessary according to the specific needs of
the user or the requirements of the project.
SQL databases belong to the relational database family, which means that all data within the database can be related through keys, allowing one piece of data to be looked up in relation to others. A relational database thus has more structure than other types. This type of database tends to be used for data that require organization, reporting, or a designated structure. For example, customer relationship management (CRM) systems or sales systems find this type of database useful.
SQL databases are “mainly designed for integrity and transaction protection” (Meier &
Kaufmann, 2019, p. 201). This leads to restrictions on the amount of data that can be
involved, as more data lead to slower processes. NoSQL databases, which are schemaless and non-relational, were invented because of this limitation.
NoSQL databases are non-relational databases and are much newer to the industry. They were created as the need to store increasingly large amounts of data grew. These databases are thus associated with the handling of large datasets by data analysts and big data scientists. They normally contain large amounts of data but little to no structure at all, and they offer greater processing power for such volumes.
NoSQL is also a collective term that refers to multiple models. Each model represents
its own unique framework that differs from others in terms of schema and use. There
are four such setups (Meier & Kaufmann, 2019, p. 16):
1. Documents
2. Graphs
3. Key-value pairs
4. Columns
Although these structures are different, they can all process large amounts of data quickly while using a massively distributed storage architecture. This includes analyzing large volumes of data or searching for specific results. Some NoSQL systems also guarantee strong consistency, which "means that the NoSQL database management system ensures full consistency at all times" (Meier & Kaufmann, 2019, p. 16).
NoSQL also provides flexibility with its schemaless structure. As explained by Vaish
(2013), “almost all NoSQL implementations offer schemaless data representation. This
means that you don’t have to think too far ahead to define a structure and you can
continue to evolve over time— including adding new fields or even nesting the data”
(p. 23). NoSQL provides quick development, consistent querying, and increased functionality for large datasets, all of which are worth noting for any student of query languages.
Summary
Data are pieces of information that are stored within a database, and databases are
used everywhere. Much like any other modern-day technology, the design, role, and
number of databases offered have exponentially grown, mostly in the last few dec-
ades. Databases hold the data that society uses to store valuable information, cre-
ate reports to form business decisions, and retain secure information on subject
matters.
Data can be viewed in a lifecycle. Data are created, they live, and they die (or are
reused). Data are not static. They go through lifecycles much like projects or biolog-
ical organisms go through their respective lifecycles. This data lifecycle process is
essential for organizations to follow to ensure data is managed properly, stored
correctly, and destroyed following the proper guidelines.
When broken down, data can be understood based on their data structure. Data
can be categorized as structured, semi-structured, or unstructured. In this order
(structured to unstructured), data go from most readable to least readable, most
usable to least usable, and least flexible to most flexible. No structure is inherently
better than the others. They are equally important to understand for data management, projects, and databases.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 3
Fundamentals of SQL
STUDY GOALS
… describe structured query language (SQL) and the four types of sublanguages within SQL.
DL-E-DLMDMDQL01-U03
3. Fundamentals of SQL
Introduction
The use of relational databases is becoming standard practice for most industries today. In line with this trend, the use of the programming language needed to run these databases, structured query language (SQL), has also increased.
SQL has been around for decades. Because of its history, it has a level of complexity
that makes it powerful. What is interesting about SQL is that, although it’s old and com-
plex, its general command syntax is relatively simple. This simple syntax breakdown is
why it’s so easily accessible to the everyday coder. The commands that run SQL can be
broken down into four categories that each designate a distinct function. These are
commands that create the databases, populate tables, pull data, and destroy data.
Above the level of these commands are data types that allow data to become even more structured and related. In general, data can be characters, numbers, or dates. Complexity is derived from the details of these data types: They can be of different sizes, which means they can hold various amounts of data based on the needs of the data. Although the data may be complex depending on these details, the syntax of the data types remains quite simple.
SQL is much like any other programming language. Understanding its basics helps programmers understand how databases work and relate through objects and data and, most importantly, the commands that run through a relational database to make it all work.
Queries are the commands SQL uses to handle these roles and data manipulation needs, and their results can then be used for further analysis or even further queries. These queries can be divided into the following four categories (GeeksforGeeks, 2021b):

• Data definition language (DDL)
• Data query language (DQL)
• Data manipulation language (DML)
• Data control language (DCL)
Under these categories are commands that are pivotal to running an SQL database,
including setting up, creating tables, and entering data. These categories, along with
their commands, can be viewed in a hierarchy, as seen in the following figure.
As shown, there are advanced languages, such as transaction control language (TCL), that can be further researched and studied. However, all these languages and their key commands fit within the main language of SQL. SQL goes hand-in-hand with the relational database model, in which tables are connected through relationships. These relationships provide the power of queries, allowing data to be connected, processed, and corrected.
When writing commands in SQL, there are rules that should be followed for easy reading; they will not break the database if ignored, but they do make the code look professional and effective for the reader. These accepted rules include the following (Watt & Eng, 2014):
• If commands have multiple parts, the subparts should be indented and lined up to show relationships.
• Upper-case letters should be used for reserved words, such as the clauses themselves.
• Lower-case letters should be used to represent user-generated words, such as table names or column names.
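Applied to a small query on a hypothetical employee table (the filter condition is invented for illustration), these conventions look like the following:

```sql
SELECT name, phone
  FROM employee
  WHERE emp_id < 100
  ORDER BY name;
```

Reserved words (SELECT, FROM, WHERE, ORDER BY) are upper case, user-generated names (employee, name, phone, emp_id) are lower case, and the subordinate clauses are indented under the opening SELECT.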
Data Types
Relational databases can store data using data types. These data types are mostly
strings (words), numbers, and dates. SQL does have specific data types that differ from
other programming languages; however, many columns will mostly include the same
simple data types (Beaulieu, 2020). The three data types most found in SQL are charac-
ter, numerical, and date.
Each SQL implementation (e.g., MySQL, Microsoft SQL Server, and Oracle) has slightly different data types; however, the declarations and standard data types below are generally found in all of them. They are given in MySQL syntax for standardization; if a different implementation is used, its documentation should be checked for confirmation.
Character data

Character data can be stored in a fixed or variable length depending on the data needed. Fixed length is indicated by the keyword "char," while variable length is indicated by the keyword "varchar." When declared, they appear as follows:

char(20) /* fixed-length */
varchar(20) /* variable-length */

These both declare the storage of 20 characters in length. However, because of their intrinsic characteristics in MySQL, the char column can only store 255 bytes maximum, while the varchar column can store up to 65,535 bytes (or around 64 KB) (Beaulieu, 2020).

Fixed or variable length: Data in fixed length indicate the record will always store the specified amount of data. Data in variable length will only hold the given amount of data.
In general, the char data type is used for smaller, predetermined character sets, such as defined yes/no columns, state/province abbreviations, or gender. Varchar is suggested for anything longer, such as names, state names, or addresses. If varchar cannot hold the data within its 64 KB limit, there are text data types that exceed this size. Various SQL text data types of varying sizes are shown in the table below (Beaulieu, 2020).
Text Data Types in MySQL

Type         Maximum size (bytes)
tinytext     255
text         65,535
mediumtext   16,777,215
longtext     4,294,967,295
Numerical data

As in the case of character data, it would be wrong to assume that the numerical data type is simple. It also has multiple types that can be chosen based on the specific needs of the database and users. Like the character types, the numerical types will allocate the necessary space to store the data based on the type given by the user (Beaulieu, 2020).
There are five distinct integer types. They are shown in the following table, along with their minimum and maximum signed and unsigned values.

Signed value: A signed value can represent both positive and negative numbers; unsigned values can only represent positive numbers.

Required Storage and Range for Integer Types Supported by MySQL

Type        Storage (bytes)   Signed range                                                Unsigned range
tinyint     1                 -128 to 127                                                 0 to 255
smallint    2                 -32,768 to 32,767                                           0 to 65,535
mediumint   3                 -8,388,608 to 8,388,607                                     0 to 16,777,215
int         4                 -2,147,483,648 to 2,147,483,647                             0 to 4,294,967,295
bigint      8                 -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807    0 to 18,446,744,073,709,551,615

The sizes of these types range from the tinyint type, which is one byte in size, to the bigint type, which holds eight bytes. Therefore, the space needed to hold the numerical data should align with the type chosen.
For example, if a simple Boolean (that is, a true or false) is needed, it could be saved in
the database as a 0 or 1. In this case, the data type used could be a tinyint, as it needs
very little space (just one digit). A larger type could also be used without causing any
explicit errors. However, using a larger size would incur unneeded space and overallo-
cated space, something database developers try to avoid.
Another example would be an employee's salary. Tinyint, which cannot store a value greater than 255, should not be used. Nor should smallint be used, which has an unsigned maximum of 65,535. Mediumint is a sufficient data type for many salaries, as it holds signed values up to 8,388,607, but this should be discussed with the users and the company stakeholders. However, if a salary larger than that needs to be stored (such as a CEO's multimillion-dollar package), mediumint would not be sufficient to contain it.
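These sizing decisions can be sketched in a single table declaration (MySQL syntax; the table and column names are hypothetical examples):

```sql
CREATE TABLE payroll
( emp_id INT,        -- room for billions of IDs (signed maximum 2,147,483,647)
  is_active TINYINT, -- Boolean flag stored as 0 or 1; one byte is enough
  salary INT         -- covers salaries beyond mediumint's 8,388,607 signed maximum
);
```

Each column uses the smallest type whose range safely covers the expected values, which avoids the overallocation described above.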
Temporal data

The standard temporal types and their default (zero) values in MySQL are as follows:

Type   Default value
DATE   '0000-00-00'
TIME   '00:00:00'
YEAR   0000
The data definition language (DDL) is a set of standards that all structured query languages follow (Pedamkar, 2020a). This means every instance of SQL follows these commands, which makes them uniform and fundamental to understand. These commands are also simple to learn.
CREATE Command
When designing a new database, these are the first commands used to create the essential objects. For example, a new employee table will be created by using the DDL CREATE TABLE command. Within this command, columns and column data types are designated.

Primary key: This refers to the set of attributes that uniquely identify the row.

In this case, the employee table could be created using the following command:

CREATE TABLE employee
( emp_id SMALLINT,
  name VARCHAR(30),
  phone INT,
  CONSTRAINT pk_employee PRIMARY KEY (emp_id)
);

The statement above creates a new employee table with three columns: emp_id, name, and phone. These columns each have a data type that describes the data that should be saved in that column. For example, name is of the type varchar(30), which means that it contains character data with a maximum length of 30 characters.
ALTER Command
If the table (or another object) needs to be changed, the ALTER TABLE command is
used. This command can be used to add a column, change an existing column, rename
a table or column, or delete a column.
Not only do these syntaxes follow the same general coding steps, but they are also
easy to read and follow, even for novice coders.
For example, using the syntax above, if a new “birthdate” column needs to be added to
the already created table (employee), the relevant line of code will look like the following:
DROP Command
Finally, when the table becomes obsolete (as well as the data inside of the table), DROP
TABLE should be used. This command will delete the table as well as any data
contained inside it.
The table called “employee” that was previously used can be dropped using the
following line of code:
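Following the DROP TABLE syntax described above, this would be:

DROP TABLE employee;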
The SELECT command has multiple clauses that make up the larger query. In fact, only
one clause is needed to run it (the SELECT clause); however, the more clauses that are
included, the more precisely the query can target data. The individual clauses are
described in the sections that follow.
Query Clauses
For example, we can generate a list of employees’ names and phone numbers from the
employee table using the following query:
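Assuming the employee table defined earlier, the query could be:

SELECT name, phone
FROM employee;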
This query pulls all the data from the columns “name” and “phone” that are located in
the employee table.
The clauses should be listed in the order above; however, depending on the needs of
the query results, not all of them need to be present.
FROM Clause
The FROM clause is the second clause listed in the SQL query. This clause defines all
tables that will be needed to run this specific query as well as any links between them
(Beaulieu, 2020). The tables do not necessarily need to be permanent tables. They can
also be derived, temporary, or virtual tables.
Permanent tables
Permanent tables are the most common table type and are thus well known among
database developers. They store data within rows, have attributes used to pull more
information, and are stored within the database as a table.
Derived tables
Derived tables, or subqueries, are “a query contained within another query” (Beaulieu,
2020, p. 92). A derived table is surrounded by parentheses and is found in the FROM
clause as if it were another table. Because it is in the FROM clause, it can interact with
other tables in the same clause as if it were a permanent table.
SELECT information
FROM (SELECT concat(name, ' ', dateofbirth) AS information
      FROM employee
      WHERE employee.name = 'John Smith') emp;
In this example, the derived table emp is being created within the parentheses in the
FROM clause and can therefore be used by the SELECT clause.
The derived table within the parentheses is also being saved under an alias. This allows
one to easily call it in other parts of the SELECT statement, as in the example above,
where the SELECT clause uses the information alias. An alias can be created at any
time by using the keyword AS and by adding any name after it to indicate the alias.
Temporary tables
Temporary tables are implemented much like permanent tables. However, a major
difference is that temporary tables are deleted after a set time, usually at the end of the
session. Each database server handles them differently, and this type of table should be
used with caution.
WHERE Clause
The WHERE clause is the filter condition of the SELECT statement. It utilizes columns
found in the tables of the FROM clause to filter out unneeded data by testing the
given conditional expressions (Sharma, 2020). This is completed using comparison
operators such as LIKE, <, >, =, etc.
SELECT column_1, ..., column_n
FROM table_name
WHERE [condition];
Using the employee table, it could be implemented using the following query:
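Assuming a pattern match on the name column, such a query could be:

SELECT *
FROM employee
WHERE name LIKE '%John%';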
This query filters the rows to only show the employee data in which the column name
contains the name “John.”
GROUP BY Clause
The GROUP BY clause works a little differently. It is used to group rows that have the
same values in the result set (Sharma, 2020). Usually, an aggregate function is then
applied to the grouped rows, e.g., count, sum, average, maximum, or minimum.
For example
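Assuming the employee table, such a grouping query could be:

SELECT name, count(*)
FROM employee
GROUP BY name;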
The above query will group all identical names together into one row so that there are
no duplicates, and then it will count the number of occurrences using the count()
aggregate function. The “*” in the count function indicates “all columns.” It thus counts
the number of rows, regardless of whether data exist.
To put this in better perspective, consider the following as the original employee table.

Name    Birthdate
John    01/04/1965
Sara    06/15/1979
John    10/10/1999
Marta   11/06/1980
Sara    09/12/1988
John    04/19/1991
The query used as an example above would then produce the following.
Group By Example
Name Count(*)
John 3
Sara 2
Marta 1
HAVING Clause
The HAVING clause is ancillary to the WHERE clause and is used to apply aggregate
functions as filter conditions. It is needed because the WHERE clause does not allow
aggregate functions (Sharma, 2020). Whenever aggregate functions are needed as
filters, they can be added with this clause.
SELECT Column
FROM Table
WHERE condition
GROUP BY Column
HAVING condition
[ORDER BY Column];
ORDER BY Clause
The ORDER BY clause returns the query results sorted by the columns listed after it; it
is the sorting clause of the SELECT statement (Sharma, 2020). If no columns are listed,
or if the ORDER BY clause is not present, the result will not be in any particular order
(Beaulieu, 2020). As stated by Beaulieu (2020), “the order by clause is the mechanism
for sorting your result set using either raw column data or expressions based on
column data” (p. 94).
The columns in the ORDER BY clause do not need to be listed in the SELECT clause;
however, they do need to be present in the tables called in the FROM clause.
Furthermore, while the ORDER BY clause will sort in ascending order by default, it can
be switched to a descending order by using the DESC keyword. The ASC keyword can
also be used to make the ascending order explicit.
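For instance, assuming the employee table, if the names are sorted in reverse with

SELECT name
FROM employee
ORDER BY name DESC;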
the result will be the same list but with the names listed in descending order. The
same rule can be applied to columns that have numbers or dates as the data type.
INSERT Command
The INSERT command is used to add data into the database. The general syntax for an
INSERT command is
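In a general form (the column and value lists here are placeholders):

INSERT INTO table_name (column_1, ..., column_n)
VALUES (value_1, ..., value_n);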
In this case, one row of data will be inserted into a table. For example,
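Assuming the employee table created earlier (the values here are illustrative), such a
statement could be:

INSERT INTO employee (emp_id, name, phone)
VALUES (101, 'John Smith', 5550100);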
will insert a row of data into the employee table. The table that records will be added
to should be named in the statement, and the columns that data will be added to
should be listed. Data should then be listed in the VALUES clause in the same order as
the columns that the data will be pushed into (Elmasri & Navathe, 2017).
It is also possible to insert more than one row. This is done by listing multiple values in
the parentheses and using a comma to separate the groups of data. For example,
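Assuming the employee table again (values illustrative), a multi-row insert could look
like:

INSERT INTO employee (emp_id, name, phone)
VALUES (102, 'Sara Lee', 5550101),
       (103, 'Marta Diaz', 5550102);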
As shown, the data are listed in the same order as the columns. This is the same as
inserting single rows and separating each group of values with a comma.
An additional way to insert data is by using data from another table. This is done by
using a subquery. The following shows an example of this:

INSERT INTO employee (emp_no, name, phone, birthdate)
SELECT pers_no,
       concat(first_name, ' ', last_name),
       pers_phone,
       dob
FROM person;

Subquery: These are queries that appear inside another query statement.
As with other INSERT statements, the number of columns, as well as the type of data
found in each column, must match the columns of the table being inserted into.
Because it is a subquery, this query can be expanded using the other SELECT clauses,
which narrows down the exact data that needs to be added to the table.
UPDATE Command
The UPDATE command is used to modify an existing data point within the database.
This can be done across multiple rows or columns, yet in only one table at a time.
UPDATE table_name
SET column_1 = expression_1 ,
column_2 = expression_2 ,
...,
column_n = expression_n
[WHERE predicates] ;
In the SET clause, a column can only be listed once, while the WHERE clause narrows
the update down to the exact row or rows specified.
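Assuming the employee table (values illustrative), an update could look like:

UPDATE employee
SET phone = 5550199
WHERE name = 'John Smith';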
DELETE Command
The DELETE command will remove data from a specific column, row, or table in a data-
base. The basic syntax for this command is as follows:
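A common general form (table name and condition are placeholders) is:

DELETE FROM table_name
[WHERE condition];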
The WHERE clause should be added unless all data in a table are meant to be deleted.
The WHERE clause will filter down to the exact rows to be deleted (Elmasri & Navathe,
2017).
For example, DELETE FROM employee will delete all the data from the entire employee
table but will not delete the table itself. However, if a WHERE statement is included,
such as
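(assuming a pattern match on the name column)

DELETE FROM employee
WHERE name LIKE '%John%';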
data will only be deleted in the rows in which the column “name” contains the name
John (Elmasri & Navathe, 2017).
Summary
SQL is run by powerful commands that can be divided into four different categories
or sub-languages. By learning key commands in each of these four sub-languages,
a developer can create a database, add data, and navigate through queries. These
four categories define how the data is being interacted with and what precisely is
being done.
SQL is an older language but is still among the most used since it is the language
of relational databases. What adds to its power is the ease with which new
programmers or even non-programmers can learn it. The syntax of all commands stays
relatively the same, regardless of which attribute is being updated or which data is
being inserted. The relatively simple syntax means that people across many industries
can quickly learn and understand this language, which contributes to SQL’s
popularity.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 4
Advanced SQL
4. Advanced SQL
Introduction
When one steps past structured query language’s (SQL’s) most basic commands, one
finds that it has developed very sophisticated capabilities. This is seen in its
transaction control language (TCL) and its procedural language for SQL (PL/SQL), in
which multiple steps and tasks are added to what was previously a one-line command.
This expansion leads to further abilities, such as controlling code flow with transaction
commits and rollbacks, and even creating functions and variables with PL/SQL. Both
start with the commands of SQL but can expand into blocks of code, which allows for
powerful interactions with the database. In the case of PL/SQL, this can grow into a
new application, one built entirely on SQL.
With this expansion of abilities comes the importance of controlling security and
access using data control language (DCL). This sub-language handles the commands
that permit and deny users access to specific objects within a database. As applications
become more complex using SQL commands, TCL, or PL/SQL, these DCL commands are
important steps to securing any database. They also allow users to utilize databases
more easily.
When that is completed, the SET TRANSACTION command is run, which gives a transac-
tion a name:
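In Oracle-style syntax (assumed here), this could look like:

SET TRANSACTION NAME 'transaction_name';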
If the transaction is a success, the changes can be saved in the database. This is com-
pleted with the COMMIT command:
COMMIT;
This transaction would delete two rows from the employee table, followed by
committing (or saving) the deletion.
If any error were to occur, or if the SQL statement run was incorrect, the ROLLBACK
command can be run as follows:
ROLLBACK;
This command can only be used to undo transactions since the last COMMIT or
ROLLBACK command was issued (GeeksforGeeks, 2020b).
Another point in a transaction is a SAVEPOINT, where changes can be rolled back to the
specified point without rolling back the entire transaction (GeeksforGeeks, 2020b).
SAVEPOINT SAVEPOINT_NAME;
One can return to this point by using the ROLLBACK command, which undoes changes
to that specified point. The syntax using the ROLLBACK command to the SAVEPOINT is
as follows:
ROLLBACK TO SAVEPOINT_NAME;
To put this all into context, these commands could all be used as such:
SAVEPOINT SP1;
//Savepoint created.
DELETE FROM employee WHERE name = 'Joseph Turner';
//deleted
SAVEPOINT SP2;
//Savepoint created.
In this example, SP1 is the first SAVEPOINT, created before the deletion. One deletion
has then taken place.
Assume that a deletion has taken place, but that it was done in error (e.g., an incorrect
employee was deleted). Thus, you decide to ROLLBACK to the SAVEPOINT that you
identified as SP1, which is before deletion. This can be done by running
ROLLBACK TO SP1;
//Rollback completed.
Finally, after a SAVEPOINT is used, it can be removed. This is done using the RELEASE
SAVEPOINT command in the following manner:
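Continuing the earlier example, releasing the first savepoint could look like:

RELEASE SAVEPOINT SP1;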
This makes the SAVEPOINT ineligible to be used again by the ROLLBACK command to
undo transactions. The SET TRANSACTION command introduced earlier, by contrast, “is
used to initiate a database transaction and used to specify characteristics of the
transaction that follows” (GeeksforGeeks, 2020b, para. 23).
Functions
Functions are key components in creating powerful steps in any coding file. According
to Taylor (2019), “functions perform computations or operations that are more elabo-
rate than what ... would [be expected by] a simple command statement to do” (p. 2017).
There are multiple types of functions in SQL; the following are a small sample.
CURRENT_DATE
Returns the current date; in this case, 2022-01-12.
CURRENT_TIME(1)
Returns the current time with the number of digits beyond the decimal point specified
in the argument. In this case, 08:22:24.3.
CURRENT_TIMESTAMP(2)
Returns the current timestamp and, as with CURRENT_TIME, will return the number of
digits beyond the decimal point using the number specified in the argument. In this
case, 2022-01-12 08:22:24.32.
Numeric functions
Numeric functions can take in a variety of data types, but the output will always be a
numeric value (Taylor, 2019). SQL has 14 types of numeric value functions. Not all will be
discussed in this unit, but all are worth reading about.
For example, a call whose target string is found at the very start of the source string
returns 1. If the string is not found, the function returns 0. If the target string has a
null value, the result is also null.
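These behaviors match SQL’s POSITION function; assuming that is the function meant
here, the calls could look like:

POSITION('B' IN 'Bread')   -- returns 1, since 'B' begins at the first character
POSITION('x' IN 'Bread')   -- returns 0, since 'x' does not occur in 'Bread'
POSITION('B' IN NULL)      -- returns null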
String Functions
Functions also exist when manipulating strings. SQL allows a significant number of
string functions, and the following sections provide an overview of the most basic of
these functions.
SUBSTRING (FROM)
The SUBSTRING function is designed to cut a piece out of a string by using the starting
point and length passed through the argument. For example,
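Any source string whose eighth through eleventh characters spell “tran” fits the
description that follows; for instance (an assumed string, as the original is not shown):

SUBSTRING('client transfer' FROM 8 FOR 4)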
will return “tran.” These four characters are returned by the query because it cuts four
characters out of the string, beginning with the eighth character and finishing with the
eleventh (Taylor, 2019).
For example, UPPER('HapPy') returns 'HAPPY', while LOWER('HapPy') returns
'happy'.
Conversion functions
It is common to have to convert from one data type to another. Conversion functions
are designed to perform this task. SQL processes conversion functions with the CAST
expression (Taylor, 2019).
CAST can convert one data type to another and then back again. For example, it can
convert a date to a string and then convert the string back into a date; the DATE value
'2020-02-20' can be converted to a string, for instance. However, there will be an error
if you attempt to convert a string that does not represent a valid date. To implement
this expression, pass the data to be converted along with the new data type. For
example,

CAST('16234' AS INTEGER)
Mathematical functions
Although there are several useful mathematical functions in SQL, the ones that are
most important to know are COUNT, AVG, MAX, MIN, and SUM. These are all aggregate
functions, and the tasks they perform will be exemplified through the use of the table
below.
Table grocery_order

Item    Price   Number
Apple   0.25    5
Chips   3.45    1
Milk    1.79    2
Butter  3.88    -
Bread   4.23    -
The COUNT function returns the number of rows in the table. For example,
SELECT count(*)
FROM grocery_order;
will return the number of rows in the grocery_order table, which is 5. In contrast,

SELECT count(number)
FROM grocery_order;

will return only the number of rows with a non-null value in the number column,
which is 3.
The AVG function calculates the average of the values in the specified column. The
function works only on columns that contain numeric data (Taylor, 2019). For example,

SELECT AVG(number)
FROM grocery_order;

will return the average of the non-null values in the number column, (5 + 1 + 2) / 3 ≈
2.67.
Like the AVG function, the MAX function works on numeric columns; it will find the
maximum value in the column. For example,
SELECT MAX(number)
FROM grocery_order;
will return 5.
Additionally, the MIN function will find the minimum value in the column. For
example,
SELECT MIN(number)
FROM grocery_order;
will return 1.
Finally, the SUM function is used to add all the numbers in a column. For example, if
the total number of grocery items ordered needs to be found, the following query
would be used:

SELECT SUM(number)
FROM grocery_order;

This returns 8, the sum of the non-null values in the number column.
ISNULL
The ISNULL function checks if the value passed contains any value or is null (Snaidero,
2021). “Null” is a special marker used in columns to indicate when a value does not
exist (Snaidero, 2021). This is distinct from an empty string or from 0; instead, in the
case of “null,” no value exists.
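In MySQL’s single-argument form (assumed here), for example:

SELECT ISNULL(NULL);   -- returns 1
SELECT ISNULL(25);     -- returns 0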
ISNUMERIC
The ISNUMERIC function denotes if the input is a numeric type or not. As described by
Fadlallah (2021), “if the input expression is evaluated to a valid numeric data type, SQL
server ISNUMERIC returns 1; otherwise, it returns 0” (para. 2).
SELECT ISNUMERIC(4567);

will return 1, since 4567 is a valid numeric value.
Well-written queries bring about strong data results. Occasionally, the data needed for
these results exist across multiple tables. When this occurs, tables need to be joined in
the query, and this can be done using multiple methods.
For the following example queries, two tables will be used, a student table and a
teacher table. The student table is composed of information on the students, and it
contains five columns.
Student Table
The teacher table also contains five columns and contains information about teachers.
Teacher Table
If a list needs to be generated of all the teachers and students, this can be done in
multiple ways in SQL. The SQL operators UNION, UNION ALL, and JOIN can all create
this list, but each uses different rules and syntax, and each has its own advantages and
disadvantages.
UNION
The UNION operator is much like basic algebraic addition. It takes one table and adds it
to another. This operator “enables you to draw information from two or more tables
that have the same structure” (Taylor, 2019, p. 310).
A key aspect of this definition is that the tables must have the same structure, which
means they must have the same number of columns in the SELECT clause. The
corresponding columns must all have the same data types and lengths. Given the
same structure, the UNION operator will add all rows while eliminating duplicates
(Taylor, 2019, p. 310).
For example, the following command shows the proper use of the UNION operator:
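Assuming both tables share name, id, and birthdate columns (the actual column
names are not shown above), the command could look like:

SELECT name, id, birthdate
FROM student
UNION
SELECT name, id, birthdate
FROM teacher;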
Notice that both SELECT clauses call the same number of columns (3) and the
columns match in terms of data type.
UNION Result
With these results, note that the column names of the first table are used in the
resulting table; they can be changed using aliases.
Although each SELECT clause contains the same number of columns, the level and
gender columns are distinct data types.
UNION ALL
If a query is needed in which duplicates should not be removed, the UNION ALL
operation should be used. Because the UNION operation will eliminate any duplicate
rows, if the desired result requires these duplicates to be shown, the ALL operator
should be added (Taylor, 2019).
Student Table 2
Teacher Table 2
A new row is added that contains the same data in the first three columns should the
following query be called:
UNION Result 2
With the addition to the ALL operator, the query will appear as follows:
UNION All
The UNION ALL operator will include all data, regardless of duplicates. Therefore,
the “Mae Davis” student and teacher rows are included twice after this operator is
used.
JOIN operators
To create queries that combine data from multiple data sources and, thus, create more
exact results, tables may be combined using JOIN clauses. Joins can be used in
multiple ways depending on the needs of the final query (Taylor, 2019).
Natural JOIN
Multiple tables can be listed in the FROM clause, and a connection between the tables
can be established in the WHERE clause. This creates a natural join wherein columns
can be derived from each table.
For example,
SELECT *
FROM teacher, student
WHERE teacher.teacher_id = student.student_id;
will produce a list where the teacher_id from the teacher table equals the student_id
from the student table. This operation is useful as it clearly indicates what is needed,
but it can be tedious to write out. As stated by Taylor (2019), “to avoid ambiguity, it
makes good sense to qualify the column names with the names of the tables they
came from. However, writing those table names repeatedly can be tiresome” (p. 319).
INNER JOIN
Another way to join tables is to use the JOIN keyword. The most common JOIN is the
INNER JOIN. This operator is similar to a natural join; however, they use different join
processes and keywords. For example,
SELECT *
FROM teacher INNER JOIN student
ON teacher.teacher_id = student.student_id;
creates the same result as the natural join shown above. According to Taylor (2019), “an
inner join discards all rows from the result table that don’t have corresponding rows in
both source tables” (p. 323).
OUTER JOIN
In contrast to the INNER JOIN, the OUTER JOIN operator preserves unmatched rows.
For example, consider two tables, Table A and Table B, that you wish to join with an
outer join. Table A may have rows that don’t have matching counterparts in Table B,
while Table B may have rows that don’t have matching counterparts in Table A. An
outer join includes all rows regardless of whether they have matching counterparts.
This stands in contrast to an inner join, which will only include matched rows (Taylor,
2019).
EXISTS
The EXISTS operator is used with a subquery to determine whether the subquery
returns any rows. As explained by Taylor (2019), “if the subquery returns at least one
row, that result satisfies the EXISTS condition, and the outer query executes.” (p. 252).
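Assuming the teacher and student tables above, with matching id values for people
who appear in both (column names assumed), such a query could be sketched as:

SELECT first_name, last_name
FROM student
WHERE EXISTS
    (SELECT DISTINCT teacher_id
     FROM teacher
     WHERE teacher.teacher_id = student.student_id);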
The subquery (the SELECT statement found within the parentheses) will return all
teacher rows also found within the student table. The DISTINCT keyword ensures that
only one copy of each teacher_id will exist. The outer query returns the first and last
names of the students who are also found in the teacher table.
The two data control language (DCL) commands, GRANT and REVOKE, are reserved
mostly for database administrators (DBAs) for granting and revoking privileges to any
object. Users can be given privileges to one or more tables, views, or other objects.
Users may also be granted specific levels, such as read-only, update, delete, or insert
privileges. All these privileges are controlled by the administrators.
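As a sketch (the object and user names here are illustrative), granting and then
revoking a read-only privilege could look like:

GRANT SELECT ON employee TO some_user;
REVOKE SELECT ON employee FROM some_user;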
Database administrators: A database administrator is a technician who is responsible
for organizing databases and managing the data within them.

4.3 Differences between Various SQL Versions (MSSQL, PL/SQL, etc.)

SQL is the standard for interacting with databases. Over previous years and decades,
multiple dialects of SQL have emerged, all based on the same basic language. Similar
to the Latin origins of both French and Italian, MySQL and Microsoft SQL both stemmed
from SQL (GeeksforGeeks, 2021a).

Many SQL versions, such as PostgreSQL, MySQL, and SQLite, have very similar syntaxes.
According to Khalil (2018), “Microsoft SQL Server has the greatest contrast in SQL syntax,
as well as a wide variety of functions not available in other platforms” (para. 7). The
biggest differences between these SQL languages are in terms of data case sensitivity
and data functions. For example, in MySQL, there is no data case sensitivity. Therefore,
WHERE name = 'John' and WHERE name = 'john' are the same. This is different in
SQLite, in which WHERE name = 'John' and WHERE name = 'john' are not the same.

PostgreSQL: This SQL language is an advanced, enterprise-class, open-source relational
database system. It supports both SQL (relational) and JSON (non-relational) querying.

Another large difference between these languages is how they handle dates and times.
This is exemplified by how each retrieves the current date and time. In MySQL, the
command is (GeeksforGeeks, 2021a)

SELECT now();

which results in

2017-01-13 08:03:52
This would not work in SQL Server, which would require the following command
(TechOnTheNet, 2022):
SELECT GETDATE();
which results in
2019-04-28 15:13:26.270
As shown, not only is the function different, but the result is different as well. These
differences are important when translating or working between different languages as
well as importing or exporting data between databases.
When looking at processing or transactional SQL languages such as PL/SQL, the
differences are more pronounced. PL/SQL extends SQL and “allows the execution of a
block of code at a time which increases its performance. The block of code consists of
procedures, functions, loops, variables, packages, [and] triggers” (Pedamkar, 2020b,
para. 7). PL/SQL, as a procedural language, is used for creating applications, while SQL
is used for manipulating data and is written as commands that are run individually.
Summary
The genius of structured query language (SQL) comes from its ability to expand into
other programming languages while staying true to its simple commands and
database interactions. By expanding with transaction control language (TCL), a
developer can add multiple steps to any process, including commits, savepoints, and
rollbacks. Entire processes can be coded to fit the needs of the business, users, or
stakeholders, and SQL provides the foundation for this.
Just as important are security and users’ ability to access data easily. These are
handled by data control language (DCL). Although DCL commands are normally run by
database administrators (DBAs), they should still be understood by all programmers.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 5
Data Query Languages for NoSQL
Database and Other Purposes
Introduction
With the growing need for data, analytics, and security, along with the increasing
opportunities that application development offers when aligned with databases, the
offerings of a relational, structured query language (SQL) database have become
limited. Because of these limitations, “not only SQL” (NoSQL) databases have emerged
and become common.
Document databases are the most popular of these databases. Examples of these
include MongoDB and Couchbase. These databases are composed of documents that
hold semi-structured data, adapt to the data, and are not restricted to the schema. This
means that the database can handle the data that are different from one document to
another.
Finally, GraphQL will be considered. GraphQL is less a database than a query layer for
connecting to databases, one optimized for application programming interfaces (APIs).
It is optimized for developers and for API connections, regardless of the backend
database.
The document database creates a key that it pairs with a specific document.
Information is then located within this document. This schemaless database is held
together by its documents, and these documents hold these keys in a key-value pair
that brings value to the data. Flexible documents, instead of tables, are used to store
data in field-value pairs that can be stored using a variety of types and structures, and
they do not have to remain tied to any specific data type (MongoDB, 2021b).

Key-value pair: These are pairs in which a key serves as a unique identifier, e.g., {id:
10293}.
Because document databases have a flexible schema and the documents do not have
to have the same data types or even fields, the process of adding data to and pulling
data from them is very easy (MongoDB, 2021b). For example, in an SQL database, a user
would need to know the table schema and the table columns to pull the data from the
table. Because of its flexibility, a document database does not require such definitions.
There are three additional factors that differentiate document databases (and non-
relational databases) from relational databases (MongoDB, 2021b):

1. The ease of use of the data model. Documents map to objects, making them coder-
friendly (reflecting their key-value pairs). Further, data are stored together in a way
that is much more accessible for developers. Less code is required, and higher
performance is achieved.
2. The universal knowledge of JavaScript object notation (JSON) among the coding
community. JSON has become an established standard for data interchange and
storage. This makes it a better choice for saving code or data, as it is accessible to
more users. JSON documents are also more readable and even language independent.
Their key-value pair structure (that is, the underlying structure of JSON) makes their
use standard among developers.
3. The flexibility of the schema. Without a schema, developers can essentially develop
immediately without studying the schema or performing any needed searches. Fields
are independent of one another and can differ between documents. As such,
developers can also modify the structure at any time, avoiding disruptions and
potential code breakage.
When reviewing a document database, the features described below will be found
(MongoDB, 2021b).
Advantages
Document databases offer many advantages to their users. More precisely, they allow
an intuitive data model that is “fast and easy for developers to work with” (MongoDB,
2021b, para. 1). Additionally, the flexibility of their schema allows the data model to
change as the needs of the application change. In a relational database, if a cell does
not contain data, for example, the cell must exist regardless—it would just be a cell
without any data. However, in a document database, the document or field within the
document could be deleted without any repercussions for the integrity of the database
(Technical Matters, 2020).
This also allows for easier integration and scaling of new information. The database
can scale out to adapt to the data (MongoDB, 2021b). In a relational database, new data
could force changes to the schema, while in a document database, new data can be
added to the new documents without affecting previously inserted data (Technical Mat-
ters, 2020). These advantages cause document databases to be generally applicable in
various cases and across many industries (MongoDB, 2021b, para. 3).
Disadvantages
Document databases can have some disadvantages depending on the specific needs of
the data. If the data need to be related and linked, document databases will be slow to
respond since they are not built for interlinking tasks. Even with high volumes of data,
if the data are related (e.g., in a banking database), a relational database would have
superior performance and storage compared to a document database (Technical
Matters, 2020).
N1QL/Couchbase
Like all NoSQL databases, Couchbase is schema-free and flexible, responding to the
needs of the data. It also provides CRUD operations, which makes it a useful database
optimized for interactive applications (GeeksforGeeks, 2020a). In Couchbase Ser-
ver, “documents are stored in collections, which are stored in scopes, which are in turn
stored in buckets within a namespace” (Couchbase, 2022, para. 3). This can be written in
the following format (Couchbase, 2022):
namespace:bucket.scope.collection
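For instance, with the default namespace and Couchbase's travel-sample bucket (the bucket, scope, and collection names here are illustrative assumptions, not taken from the original text), a fully qualified keyspace path would be:

```
default:`travel-sample`.inventory.hotel
```

The backticks are required because the bucket name contains a hyphen.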
Couchbase runs using the non-first normal form query language (N1QL, pronoun-
ced “nickel”), which is based on SQL coding standards. N1QL uses SELECT, FROM, and
WHERE to build a simple query, just like an SQL query. In contrast to an SQL query, the
response comes in the JSON document format stored in the Couchbase server. It is also
important to note the document structure. For example, a simple N1QL query returns a
result such as the following:
[
{
"country": "United Kingdom"
}
]
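The query itself is not reproduced in this extract. A sketch of a query that would produce a result of this shape, modeled on Couchbase's travel-sample examples (the keyspace, alias, and field names are assumptions), is:

```sql
SELECT a.country
FROM `travel-sample`.inventory.airline a
WHERE a.name = "EasyJet";
```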
The query reads almost exactly like SQL, yet it results in JSON.
There are other differences between SQL and N1QL when looking at more advanced queries,
but some advanced queries still stay true to their SQL roots. For example, the following
query will return the names of (at a maximum) ten hotels that accept pets in the city of
Medway (Couchbase, 2022):
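The query is not reproduced in this extract; against the travel-sample data set it would plausibly read (the field names are assumptions):

```sql
SELECT h.name
FROM `travel-sample`.inventory.hotel h
WHERE h.city = "Medway" AND h.pets_ok = true
LIMIT 10;
```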
MongoDB
MongoDB is one of the most common document databases. MongoDB documents are
stored in binary JSON (BSON) format, which “is a variation of JSON with some additional
data types and is more efficient for storage than JSON” (Elmasri & Navathe, 2016,
p. 920). These documents are stored in collections.
What makes MongoDB so powerful is its ease of use and straightforward scaling (Chodorow, 2013).
For example, it replaces the traditional relational table with an easier-to-read and
easier-to-use document that allows all types of data to be embedded in a schema-free
design. Furthermore, if needed, any document, current or future, can grow quickly and
easily compared to schema-restricted tables.
Features
Although a common database, MongoDB does not replace a relational database. It merely
adds options for how data are stored, such as scaling. There are also other factors
to consider when looking at using MongoDB (Raj & Deka, 2018). These include
• data insert consistency. MongoDB can insert large amounts of data at an extremely
high speed with equally high consistency.
• data corruption recovery. There is a command to perform data repair in MongoDB,
but this operation can be time consuming.
• load balancing. MongoDB supports faster replication and automatic load balancing
configuration because of data placed in shards.
• avoids joins. Joins in a relational database are examples of query deceleration. By
removing joins, queries speed up.
• changing schemas. Changing schemas in a relational database causes issues and
can possibly slow down queries. MongoDB is schemaless, which means that adding
new fields will not have negative effects.
As mentioned, MongoDB is not structured with tables, rows, and columns. Instead, it
contains collections, documents, and fields. This layout is important to remember
when starting a MongoDB collection.
For example,
db.createCollection("project")
As seen in the previous example, MongoDB runs mostly on simple commands in which
further parameters can be used to specify the arguments or can be left empty. This
streamlines coding for the programmer.
Collections are grouped together to form databases (Chodorow, 2013). A single instance
of MongoDB can host several databases. Each collection should store documents per-
taining to the topic of the database for ease of use and quicker data capturing. Much
like a SQL database, these databases also have permissions and restrictions that can
be added to users (Chodorow, 2013).
Data models
As stated, instead of tables and relationships like a relational database, MongoDB is
composed of documents. These documents are in a BSON format, which looks exactly
like JSON. For example, the following is a MongoDB document (Chodorow, 2013):
{
"greeting" : "Hello, world!",
"foo" : 3
}
This document contains two keys, "greeting" and "foo", with corresponding values,
"Hello, world!" and 3.
Although quite simple, there are still some rules MongoDB data models must follow
(Chodorow, 2013):
• The keys in a document are strings but cannot contain certain characters, such as
the null character (\0).
• Values can be one of several different data types, such as string or integer. They
can even be embedded documents.
Documents
Now that the collection "project" exists, documents can be added (Elmasri & Navathe,
2016). This is again done with the command line. For example, consider the following
command:
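The command itself is not reproduced in this extract; a minimal sketch (the field names and values are illustrative assumptions) could be:

```javascript
// insert a single document, written as BSON/JSON, into the project collection
db.project.insertOne({ name: "Website relaunch", age: 25 })
```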
This line is inserting a document into the collection project. Notice that the collection
name is referred to in the command.
Secondly, notice how the data are inserted into the document. It is the BSON format as
described. Behind the scenes, this document is given a unique key stored in the _id
field that is automatically created and stored by MongoDB (Elmasri & Navathe, 2016). It
can be created by the programmer, but it is generally easier to let MongoDB generate
it. As mentioned before, collections have dynamic schemas, which
means that no two documents in a collection need to have the same layout.
Transactions
While MongoDB can handle all CRUD operations, its commands do not look like SQL
commands in terms of key words (e.g., SELECT, INSERT, or DELETE). Although the two look
different and use different terminology, SQL commands translate easily into MongoDB
once they are understood. This can be explored by first studying the SELECT command
in SQL, which corresponds to the .find() command in MongoDB. Because MongoDB is
command-line based and uses functions rather than commands, the functions take
arguments to narrow down searches. In MongoDB,
db.project.find()
is equivalent to SELECT * FROM project in SQL (MongoDB, 2021a). Comparing the two in
this way can make MongoDB easier to translate or learn.
Like the SELECT statement, the .find() command can help narrow down the query by
using filters constructed in BSON format and added to the argument (MongoDB, 2021a).
For example,
db.project.find({age: 25})
To understand this MongoDB statement, it is important to notice that the search
criteria are written inside curly brackets (i.e., {}), the same as any BSON value. To
search, BSON values need to be written in correct syntax. For example, to add more
criteria to this .find(), they would need to be written in correct BSON format, as seen
in the following example:
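The example is not reproduced in this extract; a sketch with more than one criterion (field names and values are assumptions) is:

```javascript
// find documents that match several criteria at once
db.project.find({ age: 25, status: "active" })
```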
It may look a little overwhelming at first, but when compared to SQL and reduced to
simple BSON syntax, MongoDB is written in a simple grammar for basic and quick
development. When this basic syntax is understood, more complex syntaxes can be added
(MongoDB, 2021a). For example,
translates easily to
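Both statements are elided in this extract; a plausible reconstruction that matches the $in operator discussed next (the table, field, and values are assumptions) is:

```javascript
// SQL: SELECT * FROM project WHERE age IN (25, 35)
// the same query in MongoDB, using the $in operator:
db.project.find({ age: { $in: [25, 35] } })
```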
The special command $in can be changed to $or for the OR operator in SQL, as seen in
the following example:
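The example is not reproduced here; the equivalent $or form of such a filter (values assumed) would be:

```javascript
// matches documents where age is 25 OR 35
db.project.find({ $or: [{ age: 25 }, { age: 35 }] })
```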
Deleting and updating single documents follow the same command pattern (MongoDB, 2021a):
db.<collection_name>.deleteOne(<condition>)
and
db.<collection_name>.updateOne(<condition>)
Indexes
An index makes searching a database faster, much like the index of a book. Without an
index, the query would have to use a table scan (Chodorow, 2013), which means the
server would need to search the entire database to find the query’s results.
This can take a long time, especially in a MongoDB database designed to hold large
amounts of data.
To view how an index improves a query, the explain() function can be used to show the
details of a query’s execution. For example, consider the following query and
output (Chodorow, 2013):
db.users.find({username: "user101"}).explain()
{
"cursor": "BasicCursor",
"nscanned": 1000000,
"nscannedObjects": 1000000,
"n": 1,
"millis": 721,
"nYields": 0,
"nChunkSkips": 0,
"isMultiKey": false,
"indexOnly": false,
"indexBounds": { }
}
This query searches for a random username, and the explain() function outputs fields
that show detailed information on how the query runs. The “nscanned” field is the most
interesting, as it shows the number of documents the query looked at to satisfy the
query (Chodorow, 2013). Also important to note is the “millis” field, which shows time
in milliseconds. In this case, it took the query 721 milliseconds to scan 1,000,000
documents (in this example, all the documents in the collection). To speed this up, an
index can be added to the “username” field with the following command (Chodorow, 2013):
db.users.ensureIndex({"username" : 1})
This could take a few minutes depending on the collection size, but it can show notable
results. For example, the previous query can be run in the following manner (Chodorow,
2013):
db.users.find({"username" : "user101"}).explain()
{
"cursor": "BtreeCursor username_1",
"nscanned": 1,
"nscannedObjects": 1,
"n": 1,
"millis": 3,
"nYields": 0,
"nChunkSkips": 0,
"isMultiKey": false,
"indexOnly": false,
"indexBounds": { "username" : [ [ "user101", "user101" ] ] }
}
After adding the index, the find() query drops to 3 milliseconds while needing to scan
only one object. As summarized by Chodorow (2013), “indexes have their price: every
write (insert, update, or delete) will take longer for every index you add. This is
because MongoDB has to update all your indexes whenever your data changes, as well as
the document itself” (pp. 83–84). Accordingly, MongoDB has a limit of 64 indexes per
collection (Chodorow, 2013). Although all examples use a fictitious database, they show
how an index, when added to the most queried field, can improve queries in practice.
By understanding SQL first, MongoDB can be easier to learn; however, it was built for
overall usefulness and is now well incorporated into many organizations, regardless of
prior SQL knowledge.
When reviewing a graph database, the following building blocks are found (Neo4j, 2022c):
nodes, which represent the entities of the data; relationships, which connect nodes to
one another; and properties, which store values on nodes and relationships. Further,
there are labels, which are important ways for nodes to be grouped together. Nodes can
have zero or more labels, and labels do not have properties (JavaTPoint, 2021).
Cypher/Neo4j
Neo4j is an open-source, native graph database that provides a backend for applicati-
ons (Neo4j, 2022c). Neo4j is known for its flexibility, which is derived from the fact that
data aren’t "stored as a ‘graph abstraction’ on top of another technology, it’s stored
[like a] whiteboard” (Neo4j, 2022c, para. 13).
Data model
The data of the graph database are easily illustrated using circles, lines, and arrows
(Neo4j, 2022a). This illustration explains the entities, relationships, and flows of infor-
mation the database will then use in code. A drawing of the connections of data on a
whiteboard is a good way of picturing a graph data model. It can start very simple and
rough and develop into a more complicated series of relationships.
A simple graph database for a car insurance fraud investigation could look something
like the following figure (Webber & Van Bruggen, 2020).
In the figure above, nodes and relationships are labeled and are easily read by non-
technical readers. For example, in this instance “a Person LIVES_AT a Location, and a
Person DRIVES a Car that HAS_INSURANCE and was INVOLVED_IN an Accident” (Webber
& Van Bruggen, 2020, p. 18). The graph is easy to understand and navigate, which leads
to less interpretation for database administrators and better communication between
the technical and non-technical members of the team.
These whiteboard images can become more in-depth over time concerning nodes and
relationships. The graph model can evolve to include more layers and possibly even
more entities and relationships as the specific needs for the data take shape.
CREATE (single);
would create a node single. This can be verified using the following code:
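The verification code is not shown in this extract; the usual Cypher query for returning all nodes is:

```cypher
MATCH (n)
RETURN n
```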
This code will return the node (or nodes) that are present in the graph.
To go further with the CREATE statement, a node with multiple labels can be created as
well by using the following syntax:
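The syntax itself is not reproduced here; in general form, multiple labels are simply chained with colons:

```cypher
CREATE (node:Label1:Label2)
```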
For example, you could create a node “Kalam” with the label “person,” “president,”
and “scientist.”
CREATE (Kalam:person:president:scientist)
You could take this CREATE statement further by adding the following properties:
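The statement with properties is not reproduced in this extract; a sketch (the property names and values are illustrative assumptions) could be:

```cypher
CREATE (Kalam:person:president:scientist {name: "A. P. J. Abdul Kalam", born: 1931})
```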
As important as these nodes may be, their relationships are equally as important.
These relationships can be created using the MATCH statement with the following syn-
tax:
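The syntax is not reproduced here; a generic sketch (the labels, property names, and relationship type are illustrative assumptions) is:

```cypher
MATCH (a:person), (b:person)
WHERE a.name = "Kalam" AND b.name = "Sharma"
CREATE (a)-[r:KNOWS]->(b)
RETURN r
```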
To make a more impactful view of Neo4j, compare this Cypher code (Bitnine, 2016)
MATCH (cited_o:Affiliation)<--(cited_a:Author)
-->(cited_p:Paper)<--(citing_p:Paper)
-->(citing_t:Term)
WHERE citing_t.term_name = 'Database'
RETURN cited_o.name
with the equivalent SQL query:
SELECT affiliation.affiliation_name
FROM affiliation, affiliation_info, author_info, reference_info, term_info, term
WHERE affiliation.affiliation_id = affiliation_info.affiliation_id
AND affiliation_info.author_id = author_info.author_id
AND author_info.paper_id = reference_info.cited_paper
AND reference_info.citing_paper = term_info.paper_id
AND term_info.term_id = term.term_id
AND term.term_name = 'Database'
Note that Cypher was developed to optimize joins and relationships, and it thus needs
less code than SQL. The JOINs are not written out as they are in SQL because the
relationships already exist in the graph; they are expressed directly in the pattern
(and the arrows help as well). Writing and comparing the two shows how similar they
are, and how Cypher makes writing such queries easier.
All procedures available in a Neo4j instance can be listed with the following command
(Neo4j, 2022b):
SHOW PROCEDURES
This is similar for functions, which can be listed with the following call (Neo4j, 2022b):
CALL dbms.functions()
More details on each function and procedure can also be shown by filtering and narro-
wing the search using the aforementioned commands. Doing so will help describe
exactly how each procedure or function can run (e.g., what the required arguments are),
along with the expected output. This general filtering and search further help to narrow
the analysis and provide examples for prospective uses.
User-defined procedures and functions extend Neo4j and Cypher’s already powerful
codes. Many monitoring, analysis, and security features can be implemented using
user-defined procedures and functions. To create custom procedures in Neo4j and
Cypher, the rules defined by Neo4j should be followed; both user-defined procedures and
functions are subject to the same set of rules (Neo4j, 2022b).
A basic example of calling a procedure could look like the following (Neo4j, 2022b):
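The example itself is not reproduced in this extract. Based on the description that follows, it plausibly used the APOC procedure apoc.load.jdbc; a sketch (the JDBC connection string and table name are illustrative assumptions) could be:

```cypher
WITH "jdbc:mysql://localhost:3306/company?user=root&password=secret" AS url
CALL apoc.load.jdbc(url, "employees") YIELD row
RETURN row
LIMIT 5
```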
As noted, the code is loading data from an SQL database. The code then uses the
apoc.load.jdbc() procedure, which takes two arguments. One is already defined by
the user (url).
Node labels
Node labels are case sensitive, and the first letter of each word begins with a capital
letter (Neo4j, 2022c). The following are examples of node labels:
(:Person)
(:NetworkAddress)
(:VeryDescriptiveLabel)
Relationship types
Relationship types are also case sensitive and written in all upper case with an under-
score between words (Neo4j, 2022c). The following are examples of relationship types:
[:FOLLOWS]
[:ACTED_IN]
[:IS_IN_LOVE_WITH]
Property names and functions
Property names, variables, parameters, and function names are case sensitive and
written in camelCase, beginning with a lowercase letter (Neo4j, 2022c). The following
are examples:
title
size()
businessAddress
firstName
customerAccountNumber
allShortestPaths()
Clauses
Clauses are capitalized and are placed at the beginning of a new line. They are not case
sensitive. It is possible to change casing (e.g., mAtCh), put multiple keywords on a line,
or mistype clauses, and Cypher will still execute the query. However, for the readability
and supportability of queries, it is recommended that the clauses are in all capital let-
ters and placed at the beginning of a new line. The following show examples of clauses
(Neo4j, 2022c):
MATCH (n:Person)
WHERE n.name = 'Bob'
RETURN n;
//-----------
WITH "1980-01-01" AS birthdate
MATCH (person:Person)
WHERE person.birthdate > birthdate
RETURN person.name;
Keywords
Keywords follow the same pattern as clauses. They should consist of all capital letters
and are not case sensitive. However, they do not need to be placed on a separate line.
Keywords in Cypher include words such as DISTINCT, IN, STARTS WITH, CONTAINS,
NOT, AS, AND, and others.
MATCH (p:Person)-[:VISITED]-(place:City)
RETURN collect(DISTINCT place.name);
//------
MATCH (a:Airport)
RETURN a.airportIdentifier AS AirportCode;
//------
MATCH (c:Company)
WHERE c.name CONTAINS 'Inc.' AND c.startYear IN [1990, 1998, 2007, 2010]
RETURN c;
Subqueries are placed in curly brackets and indented. This has a positive influence on
readability for both current and future users, and it holds for Cypher as well as for
other programming languages.
If the subquery is only one line, it is not necessary to put it on its own line or to indent
it. Instead, it can be written like the following example query, which adheres to the
recommended guidelines for using subqueries (Neo4j, 2022c):
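The example query is not reproduced in this extract; a sketch of a query with a short, single-line subquery (the labels and properties are illustrative assumptions) is:

```cypher
MATCH (p:Person)
CALL { WITH p MATCH (p)-[:VISITED]->(c:City) RETURN count(c) AS cityCount }
RETURN p.name, cityCount;
```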
GraphQL
Because it was created by developers and for developers, GraphQL offers solid
performance and architectural advantages. One of its performance advantages involves
preventing the over-fetching often found in a RESTful API (Stemmler, 2021). This means
that it does not ask for more data than are required to fulfill its query. The result is less
bandwidth used, an advantage for any web application.
GraphQL has its own type of syntax to define the schema of an API. This is the schema
definition language (SDL), and it contributes to GraphQL’s interaction with an API signi-
ficantly (How to Graph QL, n.d.). For example, the SDL defines a simple type called “Per-
son” in the following way (How to Graph QL, n.d.):
type Person {
name: String!
age: Int!
}
This type, Person, has two fields, name and age, with their respective types of String
and Int. The ! indicates they are both required.
Relationships can be expressed in much the same way. For example, a Person can be
associated with a Post, as seen in the following case (How to Graph QL, n.d.):
type Post {
title: String!
author: Person!
}
type Person {
name: String!
age: Int!
posts: [Post!]!
}
To query all these data, the following query would be sent to a server:
{
allPersons {
name
}
}
The server would respond with the following JSON:
{
"allPersons": [
{ "name": "Olaf" },
{ "name": "Stella" },
{ "name": "Noreen" }
]
}
In this query to the server, the allPersons field is the root field of the query, while the
payload is everything that follows. In this case, the payload is name (How to Graph QL,
n.d.).
Fields can also carry arguments. For example, consider the following query (How to
Graph QL, n.d.):
{
allPersons(last: 2) {
name
}
}
This argument states that the allPersons field has a last parameter that only returns
up to a specific number of persons (How to Graph QL, n.d.).
Working from this basic structure, GraphQL can expand to fit the needs of many APIs.
This makes it easy for any developer to adapt and manipulate structures according to
their specific needs.
Summary
“Not only SQL” (NoSQL) databases offer many opportunities for developers who
seek unrestrained storage and analytics. Document databases store semi-structured
data in documents that adapt to the data and require no schema, a common thread
among all NoSQL databases. Having no schema means data are easily added and
manipulated.
Two examples of such document databases are MongoDB and Couchbase. MongoDB
is more common and offers a document database where collections are created
with command lines used by developers. These command lines easily create,
delete, and manipulate data. Couchbase is a document database with key-value
document access that offers low latency for any developer.
A second NoSQL database to keep in mind is the graph database. This database
functions like a whiteboard with free-flowing nodes. It is responsive and, again,
requires no schema. Neo4j is an example of a graph database whose query language,
Cypher, looks surprisingly similar to SQL.
Finally, there is GraphQL, a query language optimized for application programming
interfaces (APIs). GraphQL does not pull from any one specific database but can be
used in front of any database. This offers clear performance and architectural
advantages for developers.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Unit 6
Using Data Query Languages within
Application Programming
Introduction
A web developer’s key competency is their knowledge of the languages needed to
develop powerful applications. This includes understanding data query languages and
how to incorporate them into web applications. Although larger teams can afford to
have specialized developers who each only know a few languages, many teams depend
on team members who know all programming languages on their development stack to
successfully create and implement projects. These stacks are typically organized in a
three-tier architecture.
Testing can be tricky with data query languages in an application programming envi-
ronment. However, unit testing is essential for developers. The positive and negative
results from these tests help coding projects to advance toward their goals.
Architecture
When first setting up the environment in which the code will be written, it is important
to understand where each part of the environment exists. The database, for example,
lives on what is called the “database tier.” The programming language lives on the “pro-
gramming tier,” and the user of the application lives on “the client tier.” These tiers
make up the three-tier architecture (IBM Cloud Education, 2020) of application develop-
ment.
The database tier runs the database management software, and this software processes
the data query language (IBM Cloud Education, 2020). It can be built on any database
management system, including MySQL, Oracle, PostgreSQL, or MongoDB.
The programming tier, sometimes called the application tier, houses the business logic
and processing (IBM Cloud Education, 2020). This tier will query from the database to
return to the client tier. This layer can be run on any popular programming language
(e.g., Python, Ruby, or PHP) and can use frameworks such as Django, Rails, Symfony,
or ASP.NET.
Finally, there is the client tier, or presentation tier, which provides the user interface.
This tier includes the website that users will see and use. This front-end is developed
using HTML, CSS, and JavaScript.
Sometimes this architecture can consist of two, four, or even more tiers; however, we
will only analyze the three-tier variety.
Connection Management
Connecting the programming tier to the database tier depends on the programming
language used. Older languages, such as C and Java, require external libraries to
connect to the database tier. More modern languages, such as PHP, Python, and Ruby,
offer built-in libraries. To add more complexity, this will also
depend on the type of database that is used. For example, MySQL connects more easily
to back-end programming languages than Oracle SQL since Oracle SQL usually requires
the addition of a library. What all these languages have in common is that the connec-
tion from the programming tier to the database tier depends on a specific code to sup-
ply the connection.
Java, for example, requires a DriverManager class to establish the connection. This
class could look something like the following (The Java Tutorials, 2021):
if (this.dbms.equals("mysql")) {
conn = DriverManager.getConnection(
"jdbc:" + this.dbms + "://" +
this.serverName +
":" + this.portNumber + "/",
connectionProps);
} else if (this.dbms.equals("derby")) {
conn = DriverManager.getConnection(
"jdbc:" + this.dbms + ":" +
this.dbName +
";create=true",
connectionProps);
}
System.out.println("Connected to database");
return conn;
}
As the code shows, after the connection is initiated, it is completed based on what type
of database is present. The first IF statement checks whether the database management
system (DBMS) is a MySQL database. The ELSE IF statement checks for a Derby DBMS.
Derby is an open-source relational database that is implemented entirely in Java and is
available under the Apache License.
Coding and Testing
The difficulty when combining great programming and high-quality database query
code is how to test them properly. Testing can become difficult when two tiers are
working together as they may not communicate in a manner conducive to testing out-
put. Therefore, unit testing is best used. There are numerous ways to test a data query
language inside an application environment, and some development applications have
built-in tools. However, knowledge of good unit tests can provide outstanding services
for the end-user and team (Gill, 2022).
When applying unit testing to database testing, a developer can test all components of
the database as they are developed, including each new column, constraint, and trigger.
Furthermore, SQL queries (the most tested part of the database testing process) can be
carried out by creating scenarios. After the scenarios are created, they can be followed
through the database with the code. For example, if the scenario follows a query that
inserts data into the table, does the table populate correctly? Although this is a very
small, functional test, when it is done correctly it can speed up the entire development
process.
Database testing plays a key role in application development. If testing is not
completed, or not done correctly, the system can deadlock, and there can be data
corruption, data loss, and decreased performance (Gill, 2022). Unit testing of
databases helps ensure the
development of the entire application is a success.
Python Examples
The following examples all assume an SQL database (in this case, SQLite3) has been
created and the Python code (also set up in the environment) is connected to the
database. SQLite3 is a C library that provides a lightweight, disk-based database that
doesn't require a separate server process.
This can be accomplished using the following script (GeeksforGeeks, 2021c):
import sqlite3
# connecting to the database
connection = sqlite3.connect("gfg.db")
# cursor
crsr = connection.cursor()
What is new here is the cursor line, in which the object crsr is created and a
connection to the cursor object is initiated. The cursor object in Python is used to
execute SQL queries, acting as middleware between the SQLite database connection and
the SQL query (GeeksforGeeks, 2021c). When commands are executed in Python, they are
executed using the cursor object only.
Once the connection is established, Python can perform create, read, update, and
delete (CRUD) operations on the database. In Python, it is easiest to create an object
and write the SQL command in it. The following command will do so (GeeksforGeeks,
2021c):
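The command is not reproduced in this extract; a minimal sketch (the table name and columns are illustrative assumptions) could be:

```python
# SQL command stored in a plain Python string object;
# table name and columns are illustrative assumptions
sql_command = """CREATE TABLE emp (
    staff_number INTEGER PRIMARY KEY,
    fname VARCHAR(20),
    lname VARCHAR(30),
    age INTEGER
);"""
```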
All commands can be run by first creating the object and then executing it. Consider
the following example, which puts the cursor and the object with the SQL command
together:
# cursor
crsr = connection.cursor()
# execute the SQL command stored in the sql_command object
crsr.execute(sql_command)
Notice that the cursor is opened with the connection, and then executed. If the script is
completed (and no more commands need to run in the file), Python uses a commit and
close command:
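The closing commands are not reproduced in this extract; for SQLite3 they are the following (shown with a fresh connection so the snippet is self-contained):

```python
import sqlite3

connection = sqlite3.connect("gfg.db")

# commit any pending transaction so changes are saved to the file
connection.commit()

# close the connection; no further commands can be run on it afterwards
connection.close()
```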
Java Example
Without going into too much detail concerning the differences between Python and
Java, how these languages work and how they connect and interact with a database are
roughly the same. The following Java method, for example, will create a table (The Java
Tutorials, 2021):
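The method is not reproduced in this extract. A sketch in the spirit of the JDBC tutorial's COFFEES running example (the table, columns, and class name are illustrative assumptions, and an open JDBC Connection is assumed to exist) could be:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class CoffeesTable {
    // creates the COFFEES table over an existing JDBC connection
    public static void createTable(Connection con) throws SQLException {
        String createString =
            "CREATE TABLE COFFEES " +
            "(COF_NAME VARCHAR(32) NOT NULL, " +
            "PRICE FLOAT, " +
            "PRIMARY KEY (COF_NAME))";
        try (Statement stmt = con.createStatement()) {
            stmt.executeUpdate(createString);
        }
    }
}
```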
There is a separate method for establishing the connection using the DriverManager
class; the table-creating method then uses that connection to execute the SQL command
string.
Inserting values into an SQL table using Java follows roughly the same steps, as seen
in the following example (The Java Tutorials, 2021):
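The example is not reproduced in this extract; a sketch (the COFFEES table and values are illustrative assumptions, and an open Connection is assumed) could be:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class CoffeesInsert {
    // inserts one row into an assumed COFFEES table
    public static void insertRow(Connection con) throws SQLException {
        try (Statement stmt = con.createStatement()) {
            stmt.executeUpdate(
                "INSERT INTO COFFEES VALUES ('Colombian', 7.99)");
        }
    }
}
```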
Handling the reading of tables also involves similar steps. The following example code
is used to retrieve data (The Java Tutorials, 2021):
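The example code is not reproduced in this extract; a sketch of a reading method (the COFFEES table and columns are illustrative assumptions, and an open Connection is assumed) could be:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CoffeesQuery {
    // reads and prints all rows from an assumed COFFEES table
    public static void viewTable(Connection con) throws SQLException {
        String query = "SELECT COF_NAME, PRICE FROM COFFEES";
        try (Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            while (rs.next()) {
                System.out.println(
                    rs.getString("COF_NAME") + "\t" + rs.getFloat("PRICE"));
            }
        }
    }
}
```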
Given Java’s differences compared to Python, its code is heftier when accomplishing
SQL tasks. However, both code blocks can cleanly connect and write SQL commands to
the database.
Summary
There is more to data query languages (DQLs) than just queries and databases.
When they expand into application programming, DQLs allow a simple program to
become interactive, usable, and secure. By adding a database, the application
becomes a three-tier application in which the client, the programming language,
and the database all are divided yet connected between layers. These layers facili-
tate smoother development and higher security. The connection between the data-
base and programming layers can be tricky depending on the languages found in
each tier. However, most modern languages have built-in connection management.
Knowledge Check
You can check your understanding by completing the questions for this unit on the
learning platform.
Good luck!
Evaluation
Congratulations!
You have now completed the course. After you have completed the knowledge tests on
the learning platform, please carry out the evaluation for this course. You will then be
eligible to complete your final assessment. Good luck!
Appendix 1
List of References
Barney, L. (2008). Oracle database Ajax & PHP web application development. Oracle
Press.
Big Data Analytics News. (2014, February 24). Types and examples of NoSQL databases.
https://bigdataanalyticsnews.com/types-examples-nosql-databases/
Blancco. (2019, August 2). What is data destruction? For data protection, the definition
matters. https://www.blancco.com/resources/article-data-destruction-definition/
DiFranza, A. (n.d.). 5 reasons why a computer professional needs SQL skills. Charlotte
Business Journal. https://www.bizjournals.com/charlotte/news/2020/03/06/5-reasons-
why-a-computer-professional-needs-sql.html
Elmasri, R., & Navathe, S. B. (2016). Fundamentals of database systems (Global ed.).
Pearson.
Elmasri, R., & Navathe, S. B. (2017). Fundamentals of database systems (Global ed., 7th
ed.). Pearson.
Fadlallah, H. (2021, July 2). An overview of the SQL server ISNUMERIC function. SQLShack.
https://www.sqlshack.com/an-overview-of-the-sql-server-isnumeric-function/
GeeksforGeeks. (2021b, September 30). SQL: DDL, DQL, DML, DCL and TCL commands.
https://www.geeksforgeeks.org/sql-ddl-dql-dml-dcl-tcl-commands/
Gill, N. S. (2022, February 3). Database unit testing and test-driven database develop-
ment. Xenonstack. https://www.xenonstack.com/blog/database-unit-testing
IBM Cloud Education. (2020, October 28). Three-tier architecture. IBM. https://
www.ibm.com/cloud/learn/three-tier-architecture
IBM Cloud Education. (2021, June 29). Structured vs. unstructured data: What’s the differ-
ence? IBM. https://www.ibm.com/cloud/blog/structured-vs-unstructured-data
Khalil, M. (2018, October 15). SQL Server, PostgreSQL, MySQL... what's the difference?
Where do I start? DataCamp. https://www.datacamp.com/community/blog/sql-differen-
ces
Meier, A., & Kaufmann, M. (2019). SQL & NoSQL databases: Models, languages, consis-
tency options and architectures for big data management. Springer. https://doi.org/
10.1007/978-3-658-24549-8
Moran, R. W. (2019, January 17). Getting to know OLAP and MDX: Microsoft's new multidi-
mensional database tools. ITPro Today. https://www.itprotoday.com/sql-server/getting-
know-olap-and-mdx
MySQL. (n.d.). Integer types (exact value): INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT,
BIGINT. In MySQL 8.0 reference manual. Oracle. https://dev.mysql.com/doc/
refman/8.0/en/integer-types.html
Raj, P., & Deka, C. G. (2018). A deep dive into NoSQL databases: The use cases and appli-
cations. Elsevier Science & Technology.
Risch, T. (2016). Query language. In L. Liu & M. Özsu (Eds.), Encyclopedia of database
systems. Springer. https://doi.org/10.1007/978-1-4899-7993-3_1090-2
Snaidero, B. (2021, March 18). Using the SQL ISNULL() function. MSSQLTips. https://
www.mssqltips.com/sqlservertip/6776/sql-isnull-function-examples/
Stemmler, K. (2021, February 16). What is GraphQL? GraphQL introduction. Apollo Blog.
https://www.apollographql.com/blog/graphql/basics/what-is-graphql-introduction/
Taylor, A. G. (2019). SQL all-in-one for dummies (3rd ed.). John Wiley & Sons.
TechGig Correspondent. (n.d.). Top 5 NoSQL databases for data scientists in 2020.
https://content.techgig.com/top-5-nosql-databases-for-data-scientists-in-2020/article-
show/78330888.cms
Technical Matters. (2020, March 23). Document databases: How do document stores
work? Ionos. https://www.ionos.com/digitalguide/hosting/technical-matters/docu-
ment-database/
Vaish, G. (2013). Getting started with NoSQL: Your guide to the world and technology of
NoSQL. Packt Publishing.
Watt, A., & Eng, N. (2014). SQL: Structured query language. In A. Watt (Ed.), Database
design (2nd ed.). BCcampus Open Education. https://opentextbc.ca/dbdesign01/chap-
ter/sql-structured-query-language/
Watt, A., & Eng, N. (2022). Database design (2nd ed.). BCcampus Open Education.
https://opentextbc.ca/dbdesign01/
Webber, J. D., & Van Bruggen, R. (2020). Graph databases for dummies (Neo4j special
ed.). John Wiley & Sons.
Appendix 2
List of Tables and Figures
SQL Commands
Source: Krista Sheely (2022), based on GeeksforGeeks (2021b).
Query Clauses
Source: Krista Sheely (2022), based on Beaulieu (2020).
Three-Tier Architecture
Source: Krista Sheely (2022), based on Amazon Web Services (n.d.).