
COURSE BOOK

Data Query Languages


DLMDMDQL01

Masthead

Publisher:
IU Internationale Hochschule GmbH
IU International University of Applied Sciences
Juri-Gagarin-Ring 152
D-99084 Erfurt

Mailing address:
Albert-Proeller-Straße 15-19
D-86675 Buchdorf

media@iu.org
www.iu.org

DLMDMDQL01
Version No.: 001-2022-1013

© 2022 IU Internationale Hochschule GmbH


This course book is protected by copyright. All rights reserved.
This course book may not be reproduced and/or electronically edited, duplicated, or distributed in any form
without written permission from IU Internationale Hochschule GmbH.
The authors/publishers have identified the authors and sources of all graphics to the best of their abilities.
However, if any erroneous information has been provided, please notify us accordingly.

Module Director
Prof. Dr. Peter Poensgen

Mr. Poensgen has been a lecturer in business intelligence at IU International University of
Applied Sciences since July 2020. His areas of expertise include data analysis and database
management.

Mr. Poensgen began his professional career as a database and development specialist in
Düsseldorf, Germany. This was followed by a project manager position in finance, where he
began to develop his knowledge of database-based systems. In leadership roles at banks
and in energy trading, he was responsible for the development, support, and project
management of trading, back office, and risk-management systems. He has been the IT
coordinator of an insurance company in Cologne since 2012.

Parallel to his professional duties, he has lectured on IT management and mathematical
economics at the Institute for Information Systems at Lübeck University, where he earned
his doctorate on the topic of query processing and optimization in database systems.

Table of Contents
Data Query Languages

Module Director

Introduction: Data Query Languages
Signposts Throughout the Course Book
Learning Objectives

Unit 1: Introduction to Data Query Languages
1.1 Definition of Data Query Languages
1.2 Differentiation to Other Languages
1.3 Typical Examples of Data Query Languages

Unit 2: Data Management
2.1 Data Life Cycle
2.2 Types of Datasets (Structured, Semi-Structured, and Unstructured Data)
2.3 Role of Databases (SQL and NoSQL Databases)

Unit 3: Fundamentals of SQL
3.1 Brief Overview
3.2 Data Definition Language (DDL)
3.3 Data Query Language (DQL)
3.4 Data Manipulation Language (DML)

Unit 4: Advanced SQL
4.1 Transaction Control Language (TCL)
4.2 Data Control Language (DCL)
4.3 Differences between Various SQL Versions (MSSQL, PL/SQL, etc.)

Unit 5: Data Query Languages for NoSQL Database and Other Purposes
5.1 Document Databases (N1QL/Couchbase and MongoDB)
5.2 Graph Databases (Cypher/Neo4j)
5.3 GraphQL for APIs

Unit 6: Using Data Query Languages within Application Programming
6.1 Special Aspects (Architecture, Connection Management, Coding, and Testing)
6.2 Examples (SQL in Python and SQL in Java)

Appendix 1: List of References
Appendix 2: List of Tables and Figures
Introduction
Data Query Languages

Signposts Throughout the Course Book

Welcome

This course book contains the core content for this course. Additional learning materials can
be found on the learning platform, but this course book should form the basis for your
learning.

The content of this course book is divided into units, which are divided further into sections.
Each section contains only one new key concept to allow you to quickly and efficiently add
new learning material to your existing knowledge.

At the end of each section of the digital course book, you will find self-check questions.
These questions are designed to help you check whether you have understood the concepts
in each section.

For all modules with a final exam, you must complete the knowledge tests on the learning
platform. You will pass the knowledge test for each unit when you answer at least 80% of the
questions correctly.

When you have passed the knowledge tests for all the units, the course is considered
finished and you will be able to register for the final assessment. Please ensure that you
complete the evaluation prior to registering for the assessment.

Good luck!

Learning Objectives

Databases are one of the most widely used technologies today. Almost every industry uses
databases of some sort to store data. This storage also enables data to be accumulated,
manipulated, and exported for various uses. Thus, it is important to understand how to
access these data using data query languages.

The course Data Query Languages addresses how data are stored in databases, and it also
defines various types of data. This is important for the data life cycle, a key concept that
describes the multiple stages that data go through when used in a database.

The most widely used data query language is SQL, and it consists of a data definition
language, a data query language, and a data manipulation language. There are also advanced
commands, as seen with transaction control languages and data control languages.

Although SQL still has prominence and is usually the first data query language that students
study and explore, numerous other databases are becoming more common as market needs
change. NoSQL is one such category of databases, and it includes document databases and
graph databases; query languages for other purposes, such as GraphQL for APIs, are also
covered. All of these technologies are influential in the data management industry. They
each play an important role depending on the requirements of a project and the particulars
of the data utilized.
Unit 1
Introduction to Data Query Languages

STUDY GOALS

On completion of this unit, you will be able to …

… define data query languages.

… compare data query languages to other programming languages.

… recognize examples of query languages.

DL-E-DLMDMDQL01-U01

1. Introduction to Data Query Languages

Introduction
The number of programming languages continues to grow each year as complexity
increases, technology advances, and the needs of the average developer evolve. As with
many other technologies, whether iPhones or TVs, programming languages evolve in
generations that can be divided by appearance, advancement, and technology facilitation.

Data query languages are widely used by programmers, as they are essential to the
operations of their designated databases. Everywhere that a database is used, query
languages are also used. Students in many professions beyond computer programming
are learning query languages and databases, as professions and career fields increasingly
find this knowledge advantageous to their non-technical fields.

In general, data query languages are relatively easy for any user to read and understand.
Although advanced queries and commands need to be studied and reviewed (as with any
other programming language), good queries make for quick transactions within the
database and for easy review, without requiring a high level of expertise.

1.1 Definition of Data Query Languages


Put simply, a data query language refers to a language that is used to interact with any
database. Data query languages are primarily used within databases or within applications
in which data is requested by the user. Users create and manipulate this language to
return data in a specified form, length, or order. This data can also be sent between
applications. Data query languages perform commands, or queries, on databases, and
make requests to pull data from objects found within schemas. Query languages can
program a dataset using a natural language. This goes well beyond data analysis and
processing calculation (Risch, 2016).

Query: A query is a single request entered into a computer database that involves
connecting user input to the contents of databases or information systems.

Schema: A schema shows the structure of data and can be considered a blueprint of how a
database is constructed.

Data query language commands allow the user to query a database to retrieve data from a
source of information (Barney, 2008). For example, in MySQL, this source would be a table,
whereas in MongoDB it would be a document. The result would be in a structure that users
could utilize to process the data efficiently, whether for analyses, reviews, or
presentations. Since they do not depend on each other, queries can be used
independently. Moreover, they can be built to run on a larger scale with dependencies and
procedures linked to other queries or programs.

Databases are inevitably becoming mainstream in small businesses, start-ups,
non-profits, and even at the individual level, since any entity who wishes to keep track of
large amounts of information must use databases. Accordingly, knowledge of query
languages and how to use them is also becoming widespread. The number of user-friendly
query languages has grown in parallel with the popularity of query languages. This growth
has allowed many without formal expertise in data languages to pull and utilize data, and
anyone who can drag and drop can create analytic reports.
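As a minimal sketch of what such a request looks like in practice, the following uses Python's built-in sqlite3 module; the customers table and its values are invented for illustration. A single query returns data in a specified form and order:

```python
import sqlite3

# In-memory database with a small, invented example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Anna", "Erfurt"), ("Ben", "Cologne"), ("Clara", "Erfurt")],
)

# A query: a single request that pulls data in a specified form and order.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Erfurt",)
).fetchall()
print(rows)  # [('Anna',), ('Clara',)]
```

Note that the query states only which rows and columns are wanted; the database decides how to retrieve them.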

1.2 Differentiation to Other Languages


Programming, like other technologies, was created over time. As each generation built
on previous versions, each grew in complexity. Data query languages have been present
since the 1970s, and languages from this era are considered fourth-generation pro-
gramming languages. Since then, only one more generation has been created, the fifth
generation (Flynn, n.d.). Since they are fourth-generation programming languages, data
query languages are less complex than the fifth-generation languages used today.

Earlier generations included the very basic machine language in the first generation,
assembly language in the second, and high-level language in the third (Flynn, n.d.).
These three types of languages lack the complexity found in data languages. For example,
machine language included extremely simple commands (think 0s and 1s). Query
languages go far beyond that level of complexity. They allow for tasks like pulling and
updating complex data using given functions. This advanced complexity means that fewer
lines of code are used compared to what was needed in earlier generations.

Machine language: This is a computer programming language that consists of binary or
hexadecimal instructions to which a computer can respond directly.

Assembly language: This is a type of low-level programming language that is intended to
communicate directly with a computer's hardware.

Although the code is more complex, query languages are significantly easier to read
since they are written in English-like sentences (Flynn, n.d.). Thus, users can read the
code more naturally and learn it more easily than in past generations. Languages in
this fourth generation include Perl, Python, and Ruby (Enos, 2020).

The shift from the fourth generation to the fifth generation was a significant leap.
Although the fifth generation of programming languages is the most "current" generation,
past generations are still widely used. Fifth-generation languages are defined as
"any programming language based on problem-solving using constraints given to the
program, rather than using an algorithm written by a programmer" (Enos, 2020, para. 7).
This generation did not alter how the code looked; instead, it changed how the code
worked. Data query languages within this generation vary greatly, as it depends on the
programmer to ensure that queries are written correctly. Fifth-generation languages are
moving away from the necessity of a programmer. In fact, they give the computer a
greater share of the work.
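The fourth-generation point about fewer lines of code can be sketched by computing the same total twice, once declaratively and once with an explicit loop. This uses Python's sqlite3 module; the orders table and its values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,), (4.5,)])

# Fourth-generation, declarative style: state WHAT is wanted, not how to compute it.
total_sql = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Third-generation style: spell out HOW to compute the same result, step by step.
total_loop = 0.0
for (amount,) in conn.execute("SELECT amount FROM orders"):
    total_loop += amount

print(total_sql, total_loop)  # both 40.0
```

The single SUM query replaces the whole loop; the database engine, not the programmer, chooses the algorithm.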

Beyond the Generations

What makes the data query language so strong is its uniqueness compared to other
programming languages. It has objects, it relies on strict procedures, and its main goal
is to retrieve or manipulate data rather than to run a program. In some sense, data
query languages are the strong legs that support the table of the programming
application, where another language, such as Java, Python, or .NET, runs the rest of the
program. However, the main program cannot do anything with the data without the query
language's support.

1.3 Typical Examples of Data Query Languages


SQL is the most used query language today. From its initial introduction, it quickly
became an industry standard (DiFranza, n.d.) and is widely used by most companies and
programmers. SQL became so popular since its appearance in the 1970s that it branched
into proprietary developments (e.g., Oracle SQL) and open-source versions (e.g.,
MariaDB).

Although it was the first data query language, SQL isn't the only popular language in
use today. SQL belongs to the relational family of databases, which makes it a strong
candidate for anyone who needs to organize large quantities of data for reporting tools
or management systems. However, with the growth of big data and its increasing role in
business, NoSQL (which stands for "not only SQL") was developed to fulfill what SQL and
relational databases could not (Meier & Kaufmann, 2019).

NoSQL, or non-relational, databases are becoming just as common in the market as
relational databases. The most common of these is MongoDB (TechGig Correspondent,
n.d.), followed closely by ElasticSearch, which is used by over 3,000 companies, and
Amazon's DynamoDB (TechGig Correspondent, n.d.). NoSQL is much more than it appears,
as it expands into multiple models (Meier & Kaufmann, 2019, p. 202). These models
include the graph database model, the key-value stores model, the column family
database model, and the document stores model.

A standard example of the graph database model is Cypher for Neo4j. Cypher is a
declarative query language that allows the user to store and retrieve data from the
graph database (Cypher Query Language, n.d.). For JavaScript object notation (JSON),
the query language JSONiq is used. The key-value store model follows a hash data
structure, with a unique key and a pointer; Amazon SimpleDB is an example of this
model. An example of the column model is Cassandra, which arranges columns by column
family and in which keys point to multiple columns. Finally, the standard example of
the document stores model is MongoDB. This model stores information in JSON-like
documents (Big Data Analytics News, 2014).
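The document model can be sketched in a few lines of plain Python. This imitates the idea behind a MongoDB-style find() filter, not the actual driver API, and the records are invented for illustration:

```python
# Sketch of the document-store idea: each record is a schema-free, JSON-like
# document, and a query is itself a document describing the fields to match
# (similar in spirit to MongoDB's find()).
documents = [
    {"_id": 1, "name": "Anna", "city": "Erfurt", "tags": ["vip"]},
    {"_id": 2, "name": "Ben", "city": "Cologne"},   # no "tags" field: allowed
    {"_id": 3, "name": "Clara", "city": "Erfurt"},
]

def find(collection, query):
    """Return all documents whose fields match every key/value in the query."""
    return [doc for doc in collection if all(doc.get(k) == v for k, v in query.items())]

matches = find(documents, {"city": "Erfurt"})
print([doc["name"] for doc in matches])  # ['Anna', 'Clara']
```

Unlike a relational table, no document is forced to carry the same fields as its neighbors.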

Another query language to note is MDX from Microsoft, which is standard for OLAP
tools. MDX stands for multidimensional expressions and is the query language for
multidimensional databases. These queries can return results in multiple dimensions; to
conceptualize this, imagine cube-like results (Moran, 2019).

Summary

Data query languages are the key to accessing the data found in a database. They
are used alongside other programs to access data. Their specialized queries make
accessing, modifying, and deleting data easier for the user; however, advanced queries
still require study to read and write correctly.

As with other programming languages, data query languages have evolved over time, and
new languages have been created to suit the needs of today's users. Data query
languages belong to the fourth generation of programming languages. Compared to other
programming languages used today, data query languages are more evolved than those of
previous generations, yet they lack the complexities of the current fifth generation.

The necessity of query languages has increased exponentially in recent decades.
Businesses now rely on data when making business decisions. Since data query languages
are central to the practical utilization of data, understanding how to use these
languages is essential. Market and industry leaders have learned the value that comes
from having employees with database query language knowledge, as their work contributes
to business decisions. Having an expert SQL programmer is no longer essential for using
database data; many business students now have enough low-level SQL knowledge to
understand, use, and pull data.

Knowledge Check

Did you understand this unit?

You can check your understanding by completing the questions for this unit on the
learning platform.

Good luck!
Unit 2
Data Management

STUDY GOALS

On completion of this unit, you will be able to …

… classify the different phases of the data life cycle.

… differentiate between structured, unstructured, and semi-structured data.

… explain the importance and uses of SQL and NoSQL databases.

DL-E-DLMDMDQL01-U02

2. Data Management

Introduction
Data are enigmas. Definitionally, data are factual pieces of information that we use to
measure an object or concept (Watt & Eng, 2022). While data may seem like simple
information, they go through life cycles completely unique to themselves. Data can be
broken down into unique categories and can be divided based on style, appearance, and
structure. They can be used as part of a calculation. They can describe a person, a
place, an event, or any number of things. In fact, the different forms data can take are
endless, and therefore categorizing data and understanding their life cycle is more
complex than meets the eye.

The data life cycle provides the framework for the successful management and reuse of
data. Much like other life cycles in technology (or even outside of it), it follows the
same pattern of creation, living, maintenance, and death. Data, far from
straightforward, dynamically expand in this life cycle, and important steps are required
for data to be properly utilized.

Data are also divided into structured, unstructured, and semi-structured based on how
they look, how they can be stored, and whether a user can easily read the information
they contain. Data were first conceived as simple pieces of information; however, over
time it was discovered that they can be cataloged, categorized, and placed based on
their structure.

2.1 Data Life Cycle


Data do not appear out of thin air. Like any other project, creature, or natural
phenomenon, data go through life cycles, from creation, to use, to destruction. As with
other life cycles, the data life cycle is specific to data, although it has the same
main characteristics seen in other life cycles. The data life cycle can be summarized
in five phases. When data follow these phases, they are best able to serve the needs of
the user, whether that is a company, organization, or individual. The five phases of
the data life cycle are as follows (Christiansen, 2021):

1. Creating
2. Storing
3. Using
4. Archiving
5. Destroying

Creating

Data need to be created before anything can be done with them. Therefore, creation is
the first phase. There are multiple ways data can be created by an organization
(Christiansen, 2021):

• Acquisition involves acquiring data from another source, organization, or application
that exists outside the organization.
• Entry involves the manual entry of data into a system by staff within the
organization.
• Capture involves capturing data using multiple devices and through multiple processes
(e.g., social media analytics).

These three ways can also include other various aspects (e.g., governance), sub-steps
(e.g., procedures), or sub-life cycles in which the data can be manipulated into the
proper formatting. The creation phase can be deceiving, as it appears quick and
straightforward. However, as mentioned, data are enigmas: They rapidly grow and are
unraveled.

Storing

Data must be stored in the correct location (physical or virtual), and strong security
measures, efficient retention policies, and other life cycle management processes must
be used when doing so (Christiansen, 2021). Organizations should also consider strong
backup processes, the need for multiple locations (whether on-site or off-site),
security around these backups, and the schedules for how often backups are created. All
of these policies contribute to resilient data retention by the organization and strong
security for the users. For example, whenever a data leak occurs, it is common that a
key element of the storing phase was missing, which caused the data to be stored
insecurely and, thus, made vulnerable.

Using

Once the data are created and stored somewhere safe, they can be used. This stage forms
the basis for some decision-making processes within an organization (Christiansen,
2021). Data, in this phase, are retrieved, changed, updated, moved to other
applications (potentially overlapping with another data life cycle phase), and saved
for use in other contexts or for other purposes.

Security is still just as important in this phase. Audit trails, which track who is
using data and how they are doing so, are an important and easy way to retrieve backup
versions should a user manipulate data incorrectly, whether this was done maliciously
or innocently. Specialists will usually create security levels for each type of user.
For example, administrators will have access to all data, while regular read-only users
may only have access to general data.

Audit trails: These are a record of the changes that have been made to a database or
file.
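One common way to implement such an audit trail is a database trigger that records the old and new values of every change. Below is a minimal sketch in SQLite via Python's sqlite3 module; the accounts and audit_log tables are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    account_id INTEGER,
    old_balance REAL,
    new_balance REAL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Record every change so an incorrect update can be traced and reversed.
CREATE TRIGGER log_balance_update AFTER UPDATE ON accounts
BEGIN
    INSERT INTO audit_log (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
""")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100.0)")
conn.execute("UPDATE accounts SET balance = 75.0 WHERE id = 1")

trail = conn.execute(
    "SELECT account_id, old_balance, new_balance FROM audit_log"
).fetchall()
print(trail)  # [(1, 100.0, 75.0)]
```

The audit table preserves the previous value, so a bad update can be identified and rolled back by hand.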

Archiving

When the appropriate time arrives, or at consistent intervals, copies of the data will
be moved to separate storage locations. This step is relevant to the work of
developers, while it is not generally important to users. In this step, developers will
remove any data that are unnecessary or old from the production database (Christiansen,
2021). This increases efficiency, as it eliminates the maintenance of unnecessary data
in the active database. Because this step is time-consuming, costs money, and has a
lower return, users don't see much use for data that are not in production or in active
use. Developers know differently: They understand how data become important down the
road and in unforeseeable future situations. Archiving may be a mundane step, but it is
essential to the overall data life cycle.

Destroying

If all the data ever used by a company were always collected and archived, a
significant amount of useless data would be unnecessarily preserved. Not all old data
are useless, but some data are not worth keeping. If all data were retained, the
collection size would cause the amount of storage needed to balloon, which would prove
costly to the company (Christiansen, 2021). Therefore, companies must decide which data
should be destroyed.

Companies usually decide this by creating strong governance or compliance around data
practices. This ensures all levels of leadership agree on what data should be preserved
or destroyed and how often data destruction should take place.

As stated by Blancco (2019), "it's also important to note that destroying data (data
destruction), is not the same as destroying the media on which data is stored (physical
destruction)" (para. 10). Physical destruction involves making the place where the data
are saved completely unusable. For example, this could include completely breaking a
hard drive or shredding storage media. Non-physical destruction ensures that the data
are destroyed forever. For example, this could include using a magnetic eraser to
sanitize the data source, a process that makes the data irrecoverable (Blancco, 2020).
The "data destruction process is confirmed using recognized verification methods and
produces a certified, tamper-proof report" (Blancco, 2020, para. 15).

2.2 Types of Datasets (Structured, Semi-Structured, and Unstructured Data)

Datasets can be divided into the categories of structured, unstructured, and
semi-structured. Each type of dataset is "sourced, collected and scaled in different
ways, and each one resides in a different type of database" (IBM Cloud Education, 2021,
para. 1).

Structured

Structured data are highly organized and easy to read. Most often found in relational
or SQL databases, data in this type of dataset can be easily added, updated, and
deleted by the user (IBM Cloud Education, 2021). Generally, any user with an
understanding of data could use this type of dataset, including accessing and reading
it. This has led to the creation of many tools that make the use of structured data
easier.

Structured data have the following properties (Meier & Kaufmann, 2019, p. 144):

• Schema: The database must have a structure, including proper table formation,
integrity constraints, and the definition of referential integrity.
• Data types: With the use of a relational database, data will always be found in a
data type (for example, CHARACTER, INTEGER, DATE, or TIMESTAMP).
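These two properties can be made concrete with a small schema sketch using Python's sqlite3 module. The tables are invented for illustration, and note that SQLite enforces referential integrity only when the foreign_keys pragma is enabled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
-- A schema: typed columns, integrity constraints, and referential integrity.
CREATE TABLE departments (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    hired_on DATE,
    dept_id  INTEGER NOT NULL REFERENCES departments(id)
);
""")
conn.execute("INSERT INTO departments (id, name) VALUES (1, 'IT')")
conn.execute(
    "INSERT INTO employees (name, hired_on, dept_id) VALUES ('Anna', '2020-07-01', 1)"
)

# The schema rejects data that would violate referential integrity:
try:
    conn.execute("INSERT INTO employees (name, dept_id) VALUES ('Ben', 99)")
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)  # False: department 99 does not exist
```

The structure that makes such data easy to query is exactly what rejects ill-formed rows at insert time.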

With structure come limitations, however. The dataset lacks flexibility and usability as it
is so highly structured (IBM Cloud Education, 2021), and its schemas require high
amounts of storage space.

Some examples of structured datasets include account databases or customer
relationship management (CRM) software. Both require structured data that can be
created into reports and strict processes.

Customer relationship manager: This is a system that refers to the principles,
practices, and guidelines that an organization follows when interacting with its
customers.

Unstructured

On the opposite end of the dataset spectrum, there is unstructured data, which does
not have any schema or predefined data model (IBM Cloud Education, 2021, p. 17). It is
found in NoSQL databases and "cannot be processed and analyzed via conventional data
tools and methods" (IBM Cloud Education, 2021, p. 17).

Because the data are unstructured, they remain pure and in their natural format. Unlike
structured data, which need to be divided and changed to fit the needs of the database
schema, unstructured data remain untouched and in the same format as they are received.
This is a positive for any data scientists who wish to use the data for analysis. It
also makes it easier to pull data quickly, since no changes need to be made after the
data are pulled. Thus, the rate of accumulation is much faster with unstructured data.

However, because the data are less structured and have low readability, using them to
prepare reports requires someone with high expertise. This expertise can come from data
scientists or even specialized tools designed to handle unstructured data.

Some examples of unstructured data include chatbots or predictive data analytics. Both
require responsiveness to the data itself, not specifically to how the data are stored.
They also require large sets of data, which NoSQL databases expertly allow.

Differences between Structured and Unstructured Data

Although it is important to understand what each dataset represents, it is equally
important to understand their differences. "While structured...data gives a 'birds-eye
view' of customers, unstructured…data provides a deeper understanding of customer
behavior and intent" (IBM Cloud Education, 2021, para. 16). There are some other key
differences to note between structured and unstructured data:

• Structured data are stored in the form of tables that require less storage space.
They can be stored in data warehouses, which makes them highly scalable. Unstructured
data, on the other hand, are stored as media files or in NoSQL databases, which require
more space. They can be stored in data lakes, which makes them difficult to scale (IBM
Cloud Education, 2021).
• Structured data are used in machine learning (ML) and its algorithms or in strong
report writing, whereas unstructured data are used in natural language processing
(NLP), text mining, and big data management (IBM Cloud Education, 2021).
• Structured data have a predefined data model and are formatted to a set data
structure before they are placed in data storage, whereas unstructured data are stored
in their native format and are not processed until they are used (IBM Cloud Education,
2021).

Semi-Structured

Finally, semi-structured data are a type of dataset that lies between structured and
unstructured data. Some examples of this type of dataset include JavaScript object
notation (JSON), comma-separated values (CSV), and extensible markup language (XML).
Semi-structured datasets are considered more complex and detailed than structured ones
but easier to read and understand than unstructured ones. Semi-structured data use
"metadata" to identify specific data characteristics and scale data into records and
preset fields. Metadata ultimately enable semi-structured data to be better cataloged,
searched, and analyzed than unstructured data (IBM Cloud Education, 2021).

Semi-structured data have the following properties (Meier & Kaufmann, 2019, p. 144):

• They consist of a set of data objects whose structure and content are subject to
continuous changes.
• Data objects are either atomic or composed of other data objects (complex objects).
• Atomic data objects contain data values of a specified data type.
• Data management systems for semi-structured data work without a fixed database
schema, since the structure and content change constantly.
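A small JSON example makes these properties concrete (the field names and values are invented for illustration): each object carries its own field names as metadata, and fields may be missing or nested without breaking the dataset.

```python
import json

# Semi-structured data: every object names its own fields (metadata), objects
# may nest, and no fixed schema forces each record to look the same.
raw = """
[
  {"name": "Anna", "city": "Erfurt", "contacts": {"email": "anna@example.org"}},
  {"name": "Ben"}
]
"""
people = json.loads(raw)

# The embedded field names let the data be searched and cataloged, even though
# a record may omit a field entirely.
cities = [p.get("city", "unknown") for p in people]
print(cities)  # ['Erfurt', 'unknown']
```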

2.3 Role of Databases (SQL and NoSQL Databases)


Today's economy and society need databases in order to operate. Every second, trillions
of data points are sent from database to database, while data analysts, scientists, and
even amateur programmers and computer users harvest these data in usable form. Online
shopping, social media, doctor's appointments, and schoolwork are a few of the
countless examples of daily activities that involve the use of databases. As explained
by Meier and Kaufmann (2019), "Users of databases are rarely aware of the immaterial
and concrete business values contained in any individual database" (p. v).

Data analyst: A data analyst serves as a gatekeeper for an organization's data so
stakeholders can understand data and use it to make strategic business decisions.

There are two major categories of databases in use today: SQL and NoSQL databases.
Each is associated with its own query languages, and each has strengths that make it
powerful or even necessary depending on the specific needs of the user or the
requirements of the project.

SQL databases belong to the relational database family, which means that all data
within the database can be related through keys and looked up in relation to other
data. A relational database thus has more structure than other types. The role of this
database tends to involve data that require organization or reporting, or that need a
designated structure. For example, customer relationship management (CRM) systems or
sales systems find this type of database useful.

SQL databases are “mainly designed for integrity and transaction protection” (Meier &
Kaufmann, 2019, p. 201). This leads to restrictions on the amount of data that can be
involved, as more data lead to slower processing. NoSQL databases, which are schemaless
and non-relational, were invented because of this limitation.

NoSQL databases are non-relational databases and are much newer to the industry. They
were created as the need to store ever-larger amounts of data grew. These databases are
thus associated with the handling of large datasets by data analysts and big data
scientists. They typically hold large amounts of data with little to no structure at
all, and they offer greater processing power for such volumes.

NoSQL is also a collective term that refers to multiple models. Each model represents
its own unique framework that differs from others in terms of schema and use. There
are four such setups (Meier & Kaufmann, 2019, p. 16):

1. Documents
2. Graphs
3. Key-value pairs
4. Columns

Although these structures are different, they can all process large amounts of data
quickly while using a massively distributed storage architecture. This includes
analyzing large volumes of data or searching for specific results. Where a system
offers strong consistency, this “means that the NoSQL database management system
ensures full consistency at all times” (Meier & Kaufmann, 2019, p. 16).

NoSQL also provides flexibility with its schemaless structure. As explained by Vaish
(2013), “almost all NoSQL implementations offer schemaless data representation. This
means that you don’t have to think too far ahead to define a structure and you can
continue to evolve over time— including adding new fields or even nesting the data”
(p. 23). NoSQL provides quick development, consistent querying, and increased
functionality for large datasets, all of which are worth noting for any student of
query languages.

Data Management

Summary

Data and data management are important components to understand in today's market. Not
only is the use of data growing, but the management of such data is also becoming
increasingly important in the general workflows of many industries.

Data are pieces of information that are stored within a database, and databases are
used everywhere. Much like any other modern-day technology, the design, role, and
number of databases offered have exponentially grown, mostly in the last few dec-
ades. Databases hold the data that society uses to store valuable information, cre-
ate reports to form business decisions, and retain secure information on subject
matters.

Data can be viewed in a lifecycle. Data are created, they live, and they die (or are
reused). Data are not static. They go through lifecycles much like projects or biolog-
ical organisms go through their respective lifecycles. This data lifecycle process is
essential for organizations to follow to ensure data is managed properly, stored
correctly, and destroyed following the proper guidelines.

When broken down, data can be understood based on their data structure. Data
can be categorized as structured, semi-structured, or unstructured. In this order
(structured to unstructured), data go from most readable to least readable, most
usable to least usable, and least flexible to most flexible. No structure is inherently
better than the other. They are equally important to understand for data manage-
ment, projects, and databases.

Knowledge Check

Did you understand this unit?

You can check your understanding by completing the questions for this unit on the
learning platform.

Good luck!
Unit 3
Fundamentals of SQL

STUDY GOALS

On completion of this unit, you will be able to …

… describe structured query language (SQL) and the four types of sublanguages within SQL.

… differentiate between SQL data types.

… duplicate fundamental SQL commands.

… explain what each fundamental SQL command performs on the database.


3. Fundamentals of SQL

Introduction
The use of relational databases is becoming standard practice for most industries
today. According to this trend, the use of the programming language needed to run
these databases, structured query language (SQL), has also increased.

SQL has been around for decades. Because of its history, it has a level of complexity
that makes it powerful. What is interesting about SQL is that, although it’s old and com-
plex, its general command syntax is relatively simple. This simple syntax breakdown is
why it’s so easily accessible to the everyday coder. The commands that run SQL can be
broken down into four categories that each designate a distinct function. These are
commands that create the databases, populate tables, pull data, and destroy data.

Above the level of these commands are data types that allow data to even become
more structured and related. In general, data can be characters, numbers, or dates.
Complexity is derived from the details of these data types. They can be of different
sizes, which means they can hold various sizes of data based on the needs of the data.
Although the data may be complex depending on these details, the syntax of the data
types remains quite simple.

SQL is much like any other programming language. Understanding its basics helps
programmers understand how databases work and relate through objects and data, and,
most importantly, the commands that run through a relational database to make it all
work.

3.1 Brief Overview


SQL is used within a relational database to create databases, manage objects within
databases, query data, and control database security. Since its inception in the early
1970s, when it was developed by IBM (Watt & Eng, 2014), SQL has evolved into multiple
commercial versions including Oracle SQL, MySQL, and Microsoft SQL Server, as well as
open-source versions such as MariaDB. These versions allowed SQL to become one of
the most used languages in all developer environments and in databases.

Queries are the commands SQL uses to fulfill these roles and data manipulation needs,
and their results can then be used for further analysis or even further queries. These
queries can be divided into the following four categories (GeeksforGeeks, 2021b):

1. Data definition language (DDL)


2. Data query language (DQL)
3. Data manipulation language (DML)
4. Data control language (DCL)

Under these categories are commands that are pivotal to running an SQL database,
including setting up, creating tables, and entering data. These categories, along with
their commands, can be viewed in a hierarchy, as seen in the following figure.

As shown, there are advanced languages, such as transactional control language (TCL),
that can be further researched and studied. However, all of these languages, along with
their key commands, fit within the main language of SQL. SQL goes hand-in-hand with the
relational database model, in which tables are connected through relationships. These
relationships provide the power of the queries, through which data can be connected,
processed, and corrected.

When writing commands in SQL, there are rules that should be followed for easy reading;
the database will not break if they aren't followed, but following them makes the code
look professional and readable. These accepted rules include the following (Watt & Eng,
2014):

• Each command should begin on a new line.
• New clauses should line up with the beginning of other clauses.
• If a clause has multiple parts, these subparts should be indented and lined up to
show relationships.
• Upper-case letters should be used for reserved words, such as the clauses them-
selves.
• Lower-case letters should be used to represent user-generated words, such as table
names or column names.

Data Types

Relational databases can store data using data types. These data types are mostly
strings (words), numbers, and dates. SQL does have specific data types that differ from
other programming languages; however, many columns will mostly include the same
simple data types (Beaulieu, 2020). The three data types most found in SQL are charac-
ter, numerical, and date.

Each SQL language (e.g., MySQL, Microsoft SQL Server, and Oracle) will have slightly dif-
ferent data types from each other; however, the declaration and standard data types
below are generally found in all languages. They will be given in MySQL syntax for stan-
dardization, and documentation should be checked if a different language is imple-
mented for confirmation.

Character data
Character data can be stored in a fixed or variable length depending on the data
needed. Fixed length is indicated by the keyword “char,” while variable length is
indicated by the keyword “varchar.” When declared, they appear as follows:

char(20) /* fixed-length */
varchar(20) /* variable-length */

Fixed or variable length: Data in fixed length indicate the record will always store
the specified amount of data. Data in variable length will only hold the given amount
of data.

These both declare the storage of 20 characters in length. However, because of their
intrinsic characteristics in MySQL, the char column can only store 255 bytes maximum,
while the varchar column can store up to 65,535 bytes (or around 64 KB) (Beaulieu,
2020).

In general, the char data type is used for smaller, predetermined character sets such
as defined yes/no columns, state/province abbreviations, or gender. Varchar is
suggested for anything longer, such as names, state names, or addresses. If varchar
cannot hold the data at 64 KB, there are text data types that can exceed this size.
Various SQL text data types of varying sizes are shown in the table below (Beaulieu,
2020).

SQL Text Types

Text type       Maximum number of bytes
Tinytext        255
Text            65,535
Mediumtext      16,777,215
Longtext        4,294,967,295

Numerical data
As in the case of character data, it would be wrong to assume that the numerical data
type is simple. It also has multiple types that can be chosen based on the specific
needs of the database and users. Like the character type, the numerical data will
appropriate the necessary space to store the data based on the type given by the user
(Beaulieu, 2020).

There are five distinct integer types. They are shown in the following table, along
with their minimum and maximum signed and unsigned values.

Signed value: A signed value can represent both positive and negative numbers; unsigned
values can only represent positive numbers.

Required Storage and Range for Integer Types Supported by MySQL

Type        Storage (bytes)   Signed value range                Unsigned value range
TINYINT     1                 -128 to 127                       0 to 255
SMALLINT    2                 -32,768 to 32,767                 0 to 65,535
MEDIUMINT   3                 -8,388,608 to 8,388,607           0 to 16,777,215
INT         4                 -2,147,483,648 to 2,147,483,647   0 to 4,294,967,295
BIGINT      8                 -2^63 to 2^63-1                   0 to 2^64-1

The sizes of these types can range from the tinyint type, which can be one byte in size,
to the bigint type, which can hold eight bytes. Therefore, the space needed to hold the
numerical data should align with the type chosen.

For example, if a simple Boolean (that is, a true or false) is needed, it could be
saved in the database as a 0 or 1. In this case, the data type used could be a tinyint,
as it needs very little space (just one digit). A larger type could also be used
without causing any explicit errors. However, using a larger type would allocate
unneeded space, something database developers try to avoid.

Another example would be an employee's salary. Tinyint, which cannot store values above
255, should not be used. Nor should a smallint be used, which has a maximum value of
65,535. Mediumint is a sufficient data type for many salaries, as it holds values up to
8,388,607, but this should be discussed with the users and the company stakeholders. If
a CEO's multimillion-dollar salary exceeding that maximum needs to be stored, mediumint
would not be sufficient to contain it.

Date and time data


Date and time data types are just as important as character and numeric data types. As
with character and numeric data types, there are multiple options to choose from when
one needs to store a date or time. The data types to select from are as follows.

Date and Time Data Type Syntax

Data type “Zero” value

DATE '0000-00-00'

TIME '00:00:00'

DATETIME '0000-00-00 00:00:00'

TIMESTAMP '0000-00-00 00:00:00'

YEAR 0000

3.2 Data Definition Language (DDL)


The data definition language (DDL) sublanguage deals with the objects of the database.
These objects include tables, views, schemas, and catalogs, among others (Taylor, 2019).
The DDL will create, change, or delete these objects using basic commands in SQL. This
is done using the CREATE, ALTER, and DROP commands.

The DDL is a set of standards that all structured query languages follow (Pedamkar,
2020a). This means every instance of SQL follows these commands, which makes them
uniform and fundamental. These commands are simple to understand since they follow a
straightforward schema. Each command conforms to a somewhat simple breakdown of
subcommands, and these make the “starting” commands in SQL quick to learn and remember
(Pedamkar, 2020a).

CREATE Command

When designing a new database, these are the first commands used to create the
essential objects. For example, a new employee table is created using the DDL CREATE
TABLE command. Within this command, columns and column data types are designated.

In this case, the employee table could be created using the following command:

CREATE TABLE employee
( emp_id SMALLINT,
  name VARCHAR(30),
  phone INT,
  CONSTRAINT pk_employee PRIMARY KEY (emp_id)
);

Primary key: This refers to the set of attributes that uniquely identify the row.

The statement above creates a new employee table with three columns: emp_id, name, and
phone. These columns each have a data type that describes the data that should be saved
in that column. For example, the name is of the type varchar(30), which means that it
contains characters with a maximum of 30 bytes.

The syntax for any new table is

CREATE TABLE <TableName> (
<Column1> <DataType>,
<Column2> <DataType>,
<Column3> <DataType>,
...
<ColumnN> <DataType>
)
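To try the CREATE TABLE statement without a MySQL installation, one option is an in-memory SQLite database driven by Python's built-in sqlite3 module; SQLite accepts the same basic CREATE TABLE syntax. This is a minimal sketch, not the course's own tooling:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Create the employee table from the example above.
cur.execute("""
    CREATE TABLE employee (
        emp_id   SMALLINT,
        name     VARCHAR(30),
        phone    INT,
        CONSTRAINT pk_employee PRIMARY KEY (emp_id)
    )
""")

# Confirm the table now exists in the schema catalog.
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = [row[0] for row in cur.fetchall()]
print(tables)  # ['employee']
```

In a MySQL installation, the same statement would simply be issued in the MySQL client.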

ALTER Command

If the table (or another object) needs to be changed, the ALTER TABLE command is
used. This command can be used to add a column, change an existing column, rename
a table or column, or delete a column.

The syntaxes for these commands are as follows (Pedamkar, 2020a):



ALTER TABLE <TableName> ADD <ColumnName> <DataType>
ALTER TABLE <TableName> MODIFY <ColumnName> <DataType>
ALTER TABLE <TableName> RENAME TO <NewTableName>
ALTER TABLE <TableName> RENAME COLUMN <ColumnName> TO <NewColumnName>
ALTER TABLE <TableName> DROP COLUMN <ColumnName>

Not only do these syntaxes follow the same general coding steps, but they are also
easy to read and follow, even for novice coders.

For example, using the syntax above, if a new “birthdate” column needs to be added to
the already created employee table, the relevant line of code will look like the
following:

ALTER TABLE employee ADD birthdate DATE;

DROP Command

Finally, when the table becomes obsolete (as well as the data inside of the table), DROP
TABLE should be used. This command will delete the table as well as any data con-
tained inside this table.

The syntax for dropping a table is (Pedamkar, 2020a):

DROP TABLE <TableName>

The table called “employee” that was previously used can be dropped using the following
line of code:

DROP TABLE employee;
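The ALTER TABLE ... ADD and DROP TABLE commands can be tried the same way. Below is a minimal sketch against an in-memory SQLite database via Python's sqlite3 module (SQLite supports this subset of the ALTER syntax shown above; the table is a throwaway example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id SMALLINT, name VARCHAR(30), phone INT)")

# ALTER TABLE ... ADD: append a birthdate column, as in the example above.
cur.execute("ALTER TABLE employee ADD birthdate DATE")
columns = [row[1] for row in cur.execute("PRAGMA table_info(employee)")]
print(columns)  # ['emp_id', 'name', 'phone', 'birthdate']

# DROP TABLE: the table and any data in it are gone afterwards.
cur.execute("DROP TABLE employee")
tables = [row[0] for row in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # []
```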

3.3 Data Query Language (DQL)


Data query language consists of the select command. This statement is used to query
the data on the database object (GeeksforGeeks, 2021b). It “is a component of SQL
statement that allows getting data from the database and imposing order upon it”
(GeeksforGeeks, 2021b, para. 6). When the SELECT statement is used, a new temporary
table is generated from permanent tables within the database. This temporary table
can then be saved for further use or stored for ad hoc reviews of the data.

The SELECT command has multiple clauses that make up the larger query. In fact, only
one clause (the SELECT clause) is needed to run it; however, the more clauses that are
included, the more precisely the query can pull data. The clauses are listed in the
following table.

Query Clauses

Clause name   Purpose
select        Determines which columns to include in the query's result set
from          Identifies the tables from which to retrieve data and how the tables
              should be joined
where         Filters out unwanted data
group by      Groups rows together by common column values
having        Filters out unwanted groups
order by      Sorts the rows of the final result set by one or more columns

For example, we can generate a list of employee names and phone numbers from the
employee table using the following query:

SELECT name, phone
FROM employee

This query pulls all the data from the columns “name” and “phone” that are located in
the employee table.

The clauses should be listed in the order above; however, they do not always need to
be present based on the needs of the query results.
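A hands-on way to follow along is to load a few invented rows into an in-memory SQLite database (via Python's built-in sqlite3 module, used here only for illustration) and run the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id SMALLINT, name VARCHAR(30), phone INT)")
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "John Smith", 5551234), (2, "Sara Lee", 5555678)],  # invented sample rows
)

# SELECT picks the columns; FROM names the table the data come from.
cur.execute("SELECT name, phone FROM employee")
rows = sorted(cur.fetchall())  # sorted for a deterministic printout
print(rows)  # [('John Smith', 5551234), ('Sara Lee', 5555678)]
```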

FROM Clause

The FROM clause is the second clause listed in the SQL query. This clause defines all
tables that will be needed to run this specific query as well as any links between them
(Beaulieu, 2020). The tables do not necessarily need to be permanent tables. They can
also be derived, temporary, or virtual tables.

Permanent tables
Permanent tables are the most common table type and are thus well known among database
developers. They store data within rows, have attributes used to pull more information,
and are stored within the database as tables.

Derived tables
Derived tables, or subqueries, are “a query contained within another query” (Beaulieu,
2020, p. 92). A derived table is surrounded by parentheses and is found in the FROM
clause as if it were another table. Because it is in the FROM clause, it can interact with
other tables in the same clause as if it were a permanent table.

For example, the following shows a derived table:

SELECT information
FROM (SELECT concat(name, ' ', dateofbirth) AS information
      FROM employee
      WHERE employee.name = 'John Smith') emp;

In this example, the derived table emp is being created within the parentheses in the
FROM clause and can therefore be used by the SELECT clause.

The derived table within the parentheses is also given an alias (emp), and the computed
column is aliased as information. This allows them to be easily referenced in other
parts of the SELECT statement, as in the example above, where the SELECT clause uses
the information alias. An alias can be created at any time by using the keyword 'AS'
and adding any name after it to indicate the alias.
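The derived-table query can be reproduced in the same hands-on fashion. One dialect note: the sketch below uses SQLite (via Python's sqlite3), which concatenates strings with the || operator instead of MySQL's concat(); the row data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (name VARCHAR(30), dateofbirth DATE)")
cur.execute("INSERT INTO employee VALUES ('John Smith', '1980-01-15')")

# The inner query in parentheses is the derived table, aliased as emp;
# the outer SELECT reads its 'information' column like any table column.
cur.execute("""
    SELECT information
    FROM (SELECT name || ' ' || dateofbirth AS information
          FROM employee
          WHERE employee.name = 'John Smith') emp
""")
result = cur.fetchone()[0]
print(result)  # John Smith 1980-01-15
```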

Temporary tables
Temporary tables are implemented much like permanent tables. However, a major dif-
ference is that temporary tables are deleted after a set time, usually after the session.
Each database server is different, and this type of table should be used with caution.

Virtual table or view

The final type of table is the virtual table, also known as a view. This type of table
is stored in the data dictionary and looks and acts like a table; however, it has no
data associated with it (Beaulieu, 2020).

Data dictionary: The SQL data dictionary stores information about the definition of the
database.

Views are extremely powerful and are thus commonly utilized (Sharma, 2020). They are
wonderful for often-used queries, for security setups in which data can be created or
viewed by specific users, and for a database's general ease of use.

WHERE Clause

The WHERE clause is the filter condition of the SELECT statement. It uses columns from
the tables in the FROM clause to filter out unwanted data by applying the given
conditional expressions (Sharma, 2020). This is accomplished using comparison operators
such as LIKE, <, >, =, etc.

The syntax for a WHERE clause is as follows:

SELECT Column1, ..., ColumnN
FROM Table_name
WHERE [condition];

Using the employee table, it could be implemented in the following query:

SELECT name, dob
FROM employee
WHERE name = 'John';

This query filters the rows to show only data from the employee table in which the
name column contains the name “John.”

GROUP BY Clause

The GROUP BY clause works a little differently. It is used to group rows that have the
same value in the result set (Sharma, 2020). Usually, an aggregate function is then
applied to the grouped rows, e.g., count, sum, average, maximum, or minimum.

For example,

SELECT name, count(*)
FROM employee
GROUP BY name

The above query will group all names together into one row so that there are no dupli-
cates, and then it will count the number of occurrences using the count() aggregate
function. The “*” in the count function indicates “all columns.” It thus counts the num-
ber of rows, regardless of whether data exist.

To put this in better perspective, consider the following as the original employee table.

Group By Example Before

Name Date of birth

John 01/04/1965

Sara 06/15/1979

John 10/10/1999

Marta 11/06/1980

Sara 09/12/1988

John 04/19/1991

The query used as an example above would then produce the following.

Group By Example

Name Count(*)

John 3

Sara 2

Marta 1
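The before-and-after tables can be reproduced end to end. The sketch below loads the six sample rows into an in-memory SQLite database (via Python's sqlite3, for illustration only) and runs the GROUP BY query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (name VARCHAR(30), dob VARCHAR(10))")
cur.executemany("INSERT INTO employee VALUES (?, ?)", [
    ("John", "01/04/1965"), ("Sara", "06/15/1979"), ("John", "10/10/1999"),
    ("Marta", "11/06/1980"), ("Sara", "09/12/1988"), ("John", "04/19/1991"),
])

# One output row per distinct name; count(*) counts the rows in each group.
cur.execute("SELECT name, count(*) FROM employee GROUP BY name")
counts = dict(cur.fetchall())
print(sorted(counts.items()))  # [('John', 3), ('Marta', 1), ('Sara', 2)]
```

The output matches the "after" table: three Johns, two Saras, one Marta.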

HAVING Clause

The HAVING clause is ancillary to the WHERE clause and is used to apply aggregate
functions as filter conditions. It is used in this way because the WHERE clause does
not allow aggregate functions (Sharma, 2020). Whenever aggregate functions are needed
for filtering, they can be added with this clause.

The syntax will look like the following:

SELECT Column
FROM Table
WHERE condition
GROUP BY Column
HAVING condition
[ORDER BY Column];
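Continuing with the same sample names from the GROUP BY example, the sketch below shows HAVING filtering groups after aggregation, which WHERE cannot do (again SQLite via Python's sqlite3, for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (name VARCHAR(30))")
cur.executemany("INSERT INTO employee VALUES (?)",
                [("John",), ("Sara",), ("John",), ("Marta",), ("Sara",), ("John",)])

# HAVING keeps only groups whose aggregate satisfies the condition:
# Marta's group (count 1) is filtered out of the result.
cur.execute("""
    SELECT name, count(*)
    FROM employee
    GROUP BY name
    HAVING count(*) > 1
""")
frequent = dict(cur.fetchall())
print(sorted(frequent.items()))  # [('John', 3), ('Sara', 2)]
```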

ORDER BY Clause

The ORDER BY clause returns the query result sorted by the columns listed after it. It
is the sorting clause of the SELECT statement (Sharma, 2020). If no columns are listed,
or if the ORDER BY clause is not present, the result will not be in any particular
order (Beaulieu, 2020). As stated by Beaulieu (2020), “the order by clause is the
mechanism for sorting your result set using either raw column data or expressions based
on column data” (p. 94).

The columns in the ORDER BY clause do not need to be listed in the SELECT clause;
however, they do need to be present in the tables called in the FROM clause.

Furthermore, while the ORDER BY clause sorts in ascending order by default, it can be
switched to descending order with the DESC keyword. The ASC keyword can also be stated
explicitly for clarity.

For example, the returned list in the query

SELECT name, phone
FROM employee
ORDER BY name;

lists the names of employees in ascending order. This is because the name column is
listed in the ORDER BY clause, and ORDER BY defaults to ascending order.

If the query is changed to

SELECT name, phone
FROM employee
ORDER BY name DESC;

the result will be the same list but with the names in descending order. The same rule
can be applied to columns that have numbers or dates as the data type.
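Both sort directions can be observed with a few invented rows (SQLite via Python's sqlite3, for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (name VARCHAR(30), phone INT)")
cur.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Sara", 111), ("John", 222), ("Marta", 333)])

# Default sort direction is ascending ...
asc = [r[0] for r in cur.execute("SELECT name, phone FROM employee ORDER BY name")]
print(asc)   # ['John', 'Marta', 'Sara']

# ... and DESC reverses it.
desc = [r[0] for r in cur.execute(
    "SELECT name, phone FROM employee ORDER BY name DESC")]
print(desc)  # ['Sara', 'Marta', 'John']
```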

3.4 Data Manipulation Language (DML)


Data manipulation language (DML) is another sublanguage of SQL that deals with data
altering. It does so by using the following commands (Watt & Eng, 2014):

• INSERT, used to insert data into a table
• UPDATE, used to update data in a table
• DELETE, used to delete data from a table

INSERT Command

The INSERT command is used to add data into the database. The general syntax for an
INSERT command is

INSERT INTO table (column1, column2, column3…)
VALUES (value1, value2, value3…)

In this case, one row of data will be inserted into a table. For example,

INSERT INTO employee (emp_no, name, phone, birthdate)
VALUES (10001, 'Ronaldo Jones', 923443523, '1979-05-23');

will insert a row of data into the employee table. The table that records will be added
to should be named, and the columns that data will be added to should be listed. The
data should then be listed in the VALUES clause in the same order as the columns the
data will be pushed into (Elmasri & Navathe, 2017).

It is also possible to insert more than one row. This is done by listing multiple value
groups in parentheses and separating the groups with commas. For example,

INSERT INTO employee (emp_no, name, phone, birthdate)
VALUES
(10001, 'Ronaldo Jones', 923443523, '1979-05-23'),
(10002, 'Hillary Adams', 8234423123, '1999-10-12'),
(10003, 'Martin Miller', 555434523, '1978-03-05');

As shown, the data are listed in the same order as the columns. This is the same as
inserting individual rows and separating each one with a comma.

An additional way to insert data is to use data from another table. This is done by
using a subquery. The following shows an example of this:

Subquery: These are queries that appear inside another query statement.

INSERT INTO employee (emp_no, name, phone, birthdate)
SELECT pers_no,
       concat(first_name, ' ', last_name),
       pers_phone,
       dob
FROM person

As with other INSERT statements, the number of columns, as well as the type of data
found in each column, matches the INSERT table.

Because it is a subquery, this query can be expanded using the other SELECT clauses,
which narrows down the exact data that needs to be added to the table.
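The INSERT ... SELECT pattern can be sketched as follows. The person table and its columns mirror the example above but are invented for the demonstration, and SQLite's || operator stands in for MySQL's concat():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person (pers_no INT, first_name VARCHAR(30), "
            "last_name VARCHAR(30), pers_phone INT, dob DATE)")
cur.execute("CREATE TABLE employee (emp_no INT, name VARCHAR(30), "
            "phone INT, birthdate DATE)")
cur.execute(
    "INSERT INTO person VALUES (7, 'Ronaldo', 'Jones', 923443523, '1979-05-23')")

# INSERT ... SELECT: the rows come from a query instead of a VALUES list.
cur.execute("""
    INSERT INTO employee (emp_no, name, phone, birthdate)
    SELECT pers_no, first_name || ' ' || last_name, pers_phone, dob
    FROM person
""")
row = cur.execute("SELECT * FROM employee").fetchone()
print(row)  # (7, 'Ronaldo Jones', 923443523, '1979-05-23')
```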

UPDATE command

The UPDATE command is used to modify an existing data point within the database.
This can be done in multiple rows or columns yet in only one table at a time.

The syntax for this command is as follows:

UPDATE table_name
SET column_1 = expression_1 ,
column_2 = expression_2 ,
...,
column_n = expression_n
[WHERE predicates] ;

In the SET statement, a column can only be listed once, while the WHERE clause narrows
the update down to the exact row or rows specified.
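A short sketch of UPDATE with a SET assignment and a WHERE filter (SQLite via Python's sqlite3; invented rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_no INT, name VARCHAR(30), phone INT)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "John", 111), (2, "Sara", 222)])

# SET assigns the new value; WHERE restricts the change to matching rows only.
cur.execute("UPDATE employee SET phone = 999 WHERE name = 'John'")
phones = dict(cur.execute("SELECT name, phone FROM employee"))
print(sorted(phones.items()))  # [('John', 999), ('Sara', 222)]
```

Only John's row changes; Sara's phone number is untouched because her row does not match the WHERE predicate.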

DELETE Command

The DELETE command removes data from a specific row or rows in a database table. The
basic syntax for this command is as follows:

DELETE FROM table_name
[WHERE filter]

The WHERE clause should be added unless all rows of the table are meant to be deleted.
The WHERE clause filters down to the exact rows and data points to be deleted (Elmasri
& Navathe, 2017).

For example, DELETE FROM employee will delete all the data from the entire employee
table but will not delete the table itself. However, if a WHERE clause is included,
such as

DELETE FROM employee
WHERE name = 'John'

data will only be deleted in the rows in which the column “name” contains the name
John (Elmasri & Navathe, 2017).

Summary

SQL is run by powerful commands that can be divided into four different categories
or sub-languages. By learning key commands in each of these four sub-languages,
a developer can create a database, add data, and navigate through queries. These
four categories define how the data is being interacted with and what precisely is
being done.

SQL is an older language but is still among the most used since it is the language of
relational databases. What adds to its power is the ease with which new programmers or
even non-programmers can learn it. The syntax of all commands stays relatively the
same, regardless of which attribute is being updated or which data are being inserted.
The relatively simple syntax means that people across many industries can quickly learn
and understand this language, which contributes to SQL's popularity.

Knowledge Check

Did you understand this unit?

You can check your understanding by completing the questions for this unit on the
learning platform.

Good luck!
Unit 4
Advanced SQL

STUDY GOALS

On completion of this unit, you will be able to …

… describe the transaction and data control languages

… create commands in the transaction and data control languages

… differentiate between various versions of SQL


4. Advanced SQL

Introduction
When one steps past structured query language's (SQL's) most basic commands, one finds
that it has developed very sophisticated capabilities. This is seen in its use in
transaction control language (TCL) or procedural language for SQL (PL/SQL), in which
multiple steps and tasks are added to the previously one-line command.

This expansion leads to further abilities, such as coding control with transaction com-
mitting and rollback, and even creating functions and variables with PL/SQL. They both
start with the commands of SQL but can expand into groups of code, which allows for
powerful interactions with the database. In the case of PL/SQL, this can lead to becom-
ing a new application, one built entirely on SQL.

With this expansion of abilities comes the importance of controlling security and
access using data control language (DCL). This sub-language handles the commands
that permit and deny users access to specific objects within a database. As applications
become more complex using SQL commands, TCL, or PL/SQL, these DCL commands are
important steps to securing any database. They also allow users to utilize databases
more easily.

4.1 Transaction Control Language (TCL)


Transaction control language moves away from one-command functioning and toward a set
of tasks in a single execution. Every database transaction “begins with a specific task
and ends when all the tasks in the group successfully complete” (GeeksforGeeks, 2020b,
para. 1). The transaction has two possible results, success or failure. These
transactions must be controlled completely in a database to avoid failure and
incomplete transactions (which also result in failures).

Database transaction: This symbolizes a unit of work performed within a DBMS against a
database and treated independently from other transactions.

These transactions are first implemented by using the BEGIN TRANSACTION command
(GeeksforGeeks, 2020b).

BEGIN TRANSACTION transaction_name;

When that is completed, the SET TRANSACTION command can be run, which specifies the
characteristics of the transaction, such as its access mode:

SET TRANSACTION [ READ WRITE | READ ONLY ];

If the transaction is a success, the changes can be saved in the database. This is com-
pleted with the COMMIT command:

COMMIT;

An example of this process could look like the following:

DELETE FROM employee WHERE name = 'Joseph Turner';
COMMIT;

This transaction would delete the matching rows from the employee table, followed by
committing (or saving) the deletion.

If any error were to occur, or if the SQL statement run was incorrect, the ROLLBACK
command can be run as follows:

ROLLBACK;

This command can only be used to undo transactions since the last COMMIT or ROLL-
BACK command was issued (GeeksforGeeks, 2020b).

Another point in a transaction is a SAVEPOINT, where changes can be rolled back to the
specified point without rolling back the entire transaction (GeeksforGeeks, 2020b).

This can be done using the following syntax:

SAVEPOINT SAVEPOINT_NAME;

One can return to this point by using the ROLLBACK command, which undoes changes
to that specified point. The syntax using the ROLLBACK command to the SAVEPOINT is
as follows:

ROLLBACK TO SAVEPOINT_NAME;

To put this all into context, these commands could be used as follows:

SAVEPOINT SP1;
//Savepoint created.
DELETE FROM employee WHERE name = 'Joseph Turner';
//deleted
SAVEPOINT SP2;
//Savepoint created.

In this example, SP1 is the first SAVEPOINT, created before the deletion; one deletion has then taken place.

After deletion, SAVEPOINT SP2 is created.

Assume that a deletion has taken place, but that it was done in error (e.g., an incorrect
employee was deleted). Thus, you decide to ROLLBACK to the SAVEPOINT that you
identified as SP1, which is before deletion. This can be done by running

ROLLBACK TO SP1;
-- Rollback completed.

Finally, after a SAVEPOINT is used, it can be removed. This is done using the RELEASE
SAVEPOINT command in the following manner:

RELEASE SAVEPOINT SAVEPOINT_NAME;

This makes the SAVEPOINT ineligible to be used again by the ROLLBACK command to undo transactions (GeeksforGeeks, 2020b).
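The full transaction flow above can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the employee table and its rows are hypothetical, and SQLite is used because its savepoint commands match the syntax shown here.

```python
import sqlite3

# Autocommit mode (isolation_level=None) lets the SQL transaction
# commands below run exactly as written.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE employee (name TEXT)")
con.execute("INSERT INTO employee VALUES ('Joseph Turner'), ('Ada Smith')")

con.execute("BEGIN TRANSACTION")
con.execute("SAVEPOINT SP1")          # savepoint before the deletion
con.execute("DELETE FROM employee WHERE name = 'Joseph Turner'")
con.execute("ROLLBACK TO SP1")        # undo only the work since SP1
con.execute("RELEASE SAVEPOINT SP1")  # savepoint no longer needed
con.execute("COMMIT")

count = con.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(count)  # 2 -- the delete was rolled back, so both rows remain
```

Because the ROLLBACK TO undid the DELETE before the COMMIT, the committed state still contains both rows.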

Functions

Functions are key components in creating powerful steps in any coding file. According
to Taylor (2019), “functions perform computations or operations that are more elabo-
rate than what ... would [be expected by] a simple command statement to do” (p. 2017).
There are multiple types of functions in SQL; the following are a small sample.

Date and time functions


SQL includes three functions that work exclusively with dates and times, returning the current date, the current time, or both (Taylor, 2019). As stated in Taylor (2019), "CURRENT_DATE returns the current date; CURRENT_TIME returns the current time; and CURRENT_TIMESTAMP returns both the current date and the current time. CURRENT_DATE doesn't take an argument, but CURRENT_TIME and CURRENT_TIMESTAMP both take a single argument" (p. 228).

Examples of each include the following:

CURRENT_DATE

Returns the current date, for example 2022-01-12.

CURRENT_TIME(1)

Returns the current time with the number of digits beyond the decimal point specified
in the argument. In this case, 08:22:24.3.

CURRENT_TIMESTAMP(2)

Returns the current timestamp and, as with CURRENT_TIME, will return the number of digits beyond the decimal point specified in the argument. In this case, 2022-01-12 08:22:24.32.
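These functions can be tried out directly; the sketch below uses SQLite through Python's sqlite3 module. Note that SQLite accepts CURRENT_DATE and CURRENT_TIMESTAMP but not the fractional-seconds argument, so the precision shown above is implementation-specific.

```python
import sqlite3

con = sqlite3.connect(":memory:")
d, ts = con.execute("SELECT CURRENT_DATE, CURRENT_TIMESTAMP").fetchone()
print(d)   # e.g., 2022-01-12
print(ts)  # e.g., 2022-01-12 08:22:24
```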

Numeric functions
Numeric functions can take in a variety of data types, but the output will always be a
numeric value (Taylor, 2019). SQL has 14 types of numeric value functions. Not all will be
discussed in this unit, but all are worth reading about.

Position expression (POSITION)


POSITION searches for a specific target string within the passed source string and returns the starting position of that target. The syntax is as follows (Taylor, 2019):

POSITION (target IN source)

For example,

POSITION ('T' IN 'Transmission')

returns 1. If the string is not found, the function returns 0. For example,

POSITION ('Q' IN 'Transmission')

returns 0. If the target string has a null value, the result is also null.
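POSITION is not implemented in every product; SQLite, for instance, offers INSTR with the argument order reversed. The sketch below reproduces both results using Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite's INSTR(source, target) is the closest equivalent of
# POSITION (target IN source); note the reversed argument order.
hit = con.execute("SELECT INSTR('Transmission', 'T')").fetchone()[0]
miss = con.execute("SELECT INSTR('Transmission', 'Q')").fetchone()[0]
print(hit)   # 1
print(miss)  # 0
```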

Extract expression (EXTRACT)


The EXTRACT function extracts a single field from a date, time, or interval (Taylor, 2019).

For example,

EXTRACT (MONTH FROM DATE '2018-12-04')

will return 12.
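EXTRACT is likewise dialect-dependent; SQLite expresses the same field extraction with strftime. A minimal check via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# strftime('%m', ...) pulls the month field out of a date string.
month = int(con.execute("SELECT strftime('%m', '2018-12-04')").fetchone()[0])
print(month)  # 12
```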

String functions

Functions also exist for manipulating strings. SQL allows a significant number of string functions, and the following sections provide an overview of the most basic of these functions.

SUBSTRING (FROM)
The SUBSTRING function is designed to cut a piece out of a string using the starting point and length passed through the argument. For example,

SUBSTRING('manual transmission' FROM 8 FOR 4);

will return "tran." These four characters are returned by the query because it cuts four characters out of the string, beginning with the eighth character and finishing with the eleventh (Taylor, 2019).

As stated by Taylor (2019), it is important to note that "some implementations do not adhere strictly to the…standard syntax for the SUBSTRING function, or for the other functions that follow" (p. 221). It is thus essential to "check the documentation of the implementation you are using," particularly if you run into problems using sample codes (Taylor, 2019, p. 221).
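SQLite is one such implementation: it spells the function SUBSTR(string, start, length) rather than the SUBSTRING … FROM … FOR form. The example above can be reproduced with Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SUBSTR(string, 8, 4): four characters starting at the eighth.
piece = con.execute("SELECT SUBSTR('manual transmission', 8, 4)").fetchone()[0]
print(piece)  # tran
```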

UPPER and LOWER


The UPPER and LOWER functions concern capitalization. These functions convert whatever string is passed through the argument to either upper case or lower case, respectively.

For example,

UPPER('HapPy');

will return "HAPPY", while the query

LOWER('HapPy');

will return "happy".

Conversion functions
It is common to have to convert from one data type to another. Conversion functions
are designed to perform this task. SQL processes conversion functions with the CAST
expression (Taylor, 2019).

CAST can convert one data type to another and then back again. For example, it can convert a date to a string and then convert the string back into a date: the DATE value '2020-02-20' can be converted to a string. However, there will be an error if you attempt to convert a non-date string. To implement this expression, pass the data to be converted along with the new data type. For example,

CAST(‘16234’ AS INTEGER)

will return 16234 and convert it to an INTEGER in the database.
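The CAST expression can be verified with Python's sqlite3 module; the string comes back as a genuine integer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
val = con.execute("SELECT CAST('16234' AS INTEGER)").fetchone()[0]
print(val)        # 16234
print(type(val))  # <class 'int'>
```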

Mathematical functions
Although there are several useful mathematical functions in SQL, the ones that are most important to know are COUNT, AVG, MAX, MIN, and SUM. These are all aggregate functions, and the tasks they perform will be exemplified through the use of the table below.

Table grocery_order

Item      Unit_price   Number
Apple     .25          5
Chips     3.45         1
Milk      1.79         2
Butter    3.88         -
Bread     4.23         -

The COUNT function returns the number of rows in the table. For example,

SELECT count(*)
FROM grocery_order;

will return the number of rows in the grocery_order table, which is 5. In contrast,

SELECT count(number)
FROM grocery_order;

will return 3, as only 3 rows contain data.

The AVG function calculates the average of the values in the specified columns. The
function works only on columns that contain numeric data (Taylor, 2019). For example,

SELECT AVG(number)
FROM grocery_order;

will average the numeric data in the column number.

Like the AVG function, the MAX function will find the maximum value in the column. For example,

SELECT MAX(number)
FROM grocery_order;

will return 5.

Additionally, the MIN function will find the minimum value in the column. For example,

SELECT MIN(number)
FROM grocery_order;

will return 1.

Finally, the SUM function is used to add all numbers in a column. For example, to find the total number of grocery items ordered, the following query would be used:

SELECT SUM(number)
FROM grocery_order;

This would return 8.
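All five aggregates can be checked in one query; the sketch below rebuilds the grocery_order table in an in-memory SQLite database via Python's sqlite3 module, with NULL standing in for the missing Number values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE grocery_order (item TEXT, unit_price REAL, number INTEGER)")
con.executemany(
    "INSERT INTO grocery_order VALUES (?, ?, ?)",
    [("Apple", 0.25, 5), ("Chips", 3.45, 1), ("Milk", 1.79, 2),
     ("Butter", 3.88, None), ("Bread", 4.23, None)],
)
row = con.execute(
    "SELECT COUNT(*), COUNT(number), AVG(number), MAX(number), MIN(number), SUM(number) "
    "FROM grocery_order"
).fetchone()
# COUNT(*)=5, COUNT(number)=3, MAX=5, MIN=1, SUM=8; AVG is about 2.67,
# because the aggregates over "number" ignore the two NULL rows.
print(row)
```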

ISNULL
The ISNULL function checks whether the value passed contains any value or is null (Snaidero, 2021). "Null" is a special marker used in columns to indicate when a value does not exist (Snaidero, 2021). This is distinct from an empty string or from 0; instead, in the case of "null," no value exists.

ISNUMERIC
The ISNUMERIC function denotes whether the input is a numeric type or not. As described by Fadlallah (2021), "if the input expression is evaluated to a valid numeric data type, SQL Server ISNUMERIC returns 1; otherwise, it returns 0" (para. 2).

The following query is an example of this function:

SELECT ISNUMERIC(4567);

This query would return 1.

Querying Multiple Tables

Well-written queries bring about strong data results. Occasionally, the data needed for
these results exist across multiple tables. When this occurs, tables need to be joined in
the query, and this can be done using multiple methods.

For the following example queries, two tables will be used, a student table and a
teacher table. The student table is composed of information on the students, and it
contains five columns.

Student Table

student_id   first_name   last_name   level   active
10521        Antonio      Gemma       1       T
94512        Mohammad     Loram       3       F
71532        Peter        Giroud      2       T
74415        Marta        Williams    1       T

The teacher table also contains five columns and contains information about teachers.

Teacher Table

teacher_id   first_name   last_name   gender   active
53452        Franz        Rosenbaum   M        F
85567        Simone       Mckee       F        T
23451        Vladislav    Mykolenko   M        T
12678        Hugo         Rodriguez   M        F

If a list needs to be generated of all the teachers and students, this can be done in
multiple ways in SQL. The SQL operators UNION, UNION ALL, and JOIN all create this
list, but using different rules and syntaxes. There are different positives and negatives
associated with using each method.

UNION

The UNION operator is much like basic algebraic addition. It takes one table and adds it
to another. This operator “enables you to draw information from two or more tables
that have the same structure” (Taylor, 2019, p. 310).

A key aspect of this definition is that the tables must have the same structure, which
means they must have the same number of columns in the SELECT clause. The corre-
sponding columns must all have the same data types and lengths. Along with having
the same structure, the UNION operator will add all rows while eliminating duplicates
(Taylor, 2019, p. 310).

For example, the following command shows the proper use of the UNION operator:

SELECT student_id, first_name, last_name
FROM student
UNION
SELECT teacher_id, first_name, last_name
FROM teacher;

Notice that both SELECT clauses call the same number of columns (3) and the
columns match in terms of data type.

The following data would be produced.

UNION Result

student_id   first_name   last_name
10521        Antonio      Gemma
94512        Mohammad     Loram
71532        Peter        Giroud
74415        Marta        Williams
53452        Franz        Rosenbaum
85567        Simone       Mckee
23451        Vladislav    Mykolenko
12678        Hugo         Rodriguez

With these results, note that the column names of the first table are used in the resulting table but can be changed using aliases.

The following is an example of a UNION operator that would fail:

SELECT student_id, first_name, last_name, level
FROM student
UNION
SELECT teacher_id, first_name, last_name, gender
FROM teacher;

Although each SELECT clause contains the same number of columns, the level and
gender columns are distinct data types.

UNION ALL
If a query is needed where duplicates should not be removed, the UNION ALL opera-
tion should be used. Because the UNION operation will eliminate any duplicate rows, if
the desired result requires these duplicates to be shown, the ALL operator should be
added (Taylor, 2019).

For example, consider the following two tables.

Student Table 2

student_id   first_name   last_name   level   active
10521        Antonio      Gemma       1       T
94512        Mohammad     Loram       3       F
71532        Peter        Giroud      2       T
74415        Marta        Williams    1       T
052621       Agnieszka    Nowak       1       T

Teacher Table 2

teacher_id   first_name   last_name   gender   active
53452        Franz        Rosenbaum   M        F
85567        Simone       Mckee       F        T
23451        Vladislav    Mykolenko   M        T
12678        Hugo         Rodriguez   M        F
052621       Agnieszka    Nowak       F        T



A new row containing the same data in the first three columns has been added to each table. Should the following query be called:

SELECT student_id, first_name, last_name
FROM student
UNION
SELECT teacher_id, first_name, last_name
FROM teacher;

The query would result in the following:

UNION Result 2

student_id   first_name   last_name
10521        Antonio      Gemma
94512        Mohammad     Loram
71532        Peter        Giroud
74415        Marta        Williams
53452        Franz        Rosenbaum
85567        Simone       Mckee
23451        Vladislav    Mykolenko
12678        Hugo         Rodriguez
052621       Agnieszka    Nowak

With the addition of the ALL operator, the query will appear as follows:

SELECT student_id, first_name, last_name
FROM student
UNION ALL
SELECT teacher_id, first_name, last_name
FROM teacher;

And it will now result in the following.



UNION ALL Result

student_id   first_name   last_name
10521        Antonio      Gemma
94512        Mohammad     Loram
71532        Peter        Giroud
74415        Marta        Williams
53452        Franz        Rosenbaum
85567        Simone       Mckee
23451        Vladislav    Mykolenko
12678        Hugo         Rodriguez
052621       Agnieszka    Nowak
052621       Agnieszka    Nowak

The UNION ALL operator will include all data, regardless of duplicates. Therefore, the "Agnieszka Nowak" student and teacher rows are both included after this operator is used.
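The duplicate-handling difference can be sketched with Python's sqlite3 module, using trimmed-down versions of the two tables (only the three selected columns, and only the rows needed to show the effect, are recreated):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (student_id TEXT, first_name TEXT, last_name TEXT);
    CREATE TABLE teacher (teacher_id TEXT, first_name TEXT, last_name TEXT);
    INSERT INTO student VALUES ('10521', 'Antonio', 'Gemma'),
                               ('052621', 'Agnieszka', 'Nowak');
    INSERT INTO teacher VALUES ('53452', 'Franz', 'Rosenbaum'),
                               ('052621', 'Agnieszka', 'Nowak');
""")
union = con.execute(
    "SELECT student_id, first_name, last_name FROM student "
    "UNION SELECT teacher_id, first_name, last_name FROM teacher"
).fetchall()
union_all = con.execute(
    "SELECT student_id, first_name, last_name FROM student "
    "UNION ALL SELECT teacher_id, first_name, last_name FROM teacher"
).fetchall()
print(len(union))      # 3 -- the shared row appears once
print(len(union_all))  # 4 -- the shared row appears twice
```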

JOIN operators
To create queries that combine data from multiple data sources and, thus, to create more exact results, tables may be combined using JOIN clauses. Joins can be written in multiple ways depending on the needs of the final query (Taylor, 2019).

Natural JOIN
Multiple tables can be listed in the FROM clause, and a connection between the tables
can be established in the WHERE clause. This creates a natural join wherein columns
can be derived from each table.

For example,

SELECT *
FROM teacher, student
WHERE teacher.teacher_id = student.student_id;

will produce a list where the teacher_id from the teacher table equals the student_id
from the student table. This operation is useful as it clearly indicates what is needed,
but it can be tedious to write out. As stated by Taylor (2019), “to avoid ambiguity, it
makes good sense to qualify the column names with the names of the tables they
came from. However, writing those table names repeatedly can be tiresome” (p. 319).

INNER JOIN
Another way to join tables is to use the JOIN keyword. The usual JOIN is the INNER JOIN. This operator is similar to a natural join; however, they use different join processes and keywords. For example,

SELECT *
FROM teacher INNER JOIN student
ON teacher.teacher_id = student.student_id;

creates the same result as the natural join shown above. According to Taylor (2019), “an
inner join discards all rows from the result table that don’t have corresponding rows in
both source tables” (p. 323).

OUTER JOIN
In contrast to the INNER JOIN, the OUTER JOIN operator preserves unmatched rows.
For example, consider two tables, Table A and Table B, that you wish to join with an
outer join. Table A may have rows that don’t have matching counterparts in Table B,
while Table B may have rows that don’t have matching counterparts in Table A. An
outer join includes all rows regardless of whether they have matching counterparts.
This stands in contrast to an inner join, which will only include matched rows (Taylor,
2019).

LEFT OUTER JOIN


When using the LEFT OUTER JOIN operator, the query will include all matching rows between Table A and Table B, as well as all unmatched Table A rows, with nulls filling the Table B columns (Taylor, 2019). Another way to conceptualize this type of join is as all of Table A, extended with its matching data from Table B. This is assuming Table A is listed before Table B (i.e., it is the "left" table). This can also be abbreviated to LEFT JOIN for simplified querying (Taylor, 2019).

RIGHT OUTER JOIN


The RIGHT OUTER JOIN is similar to the LEFT OUTER JOIN, except that it preserves the rows of the right table instead of the left. If Table B is written after Table A (i.e., it is the "right" table), then the result is all of Table B, extended with its matching data from Table A. A simplified way of writing this join is as RIGHT JOIN (Taylor, 2019).

FULL OUTER JOIN


Additionally, there is the FULL OUTER JOIN. As defined by Taylor (2019), “the full outer
join combines the functions of the left outer join and the right outer join. It retains the
unmatched rows from both the left and the right tables” (p. 326). This operator can thus
be used when you wish to utilize the functions of both RIGHT OUTER JOIN and LEFT
OUTER JOIN.
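The inner/outer distinction can be sketched with Python's sqlite3 module using two small hypothetical tables, a and b. Only the INNER and LEFT joins are shown, since RIGHT and FULL outer joins require SQLite 3.39 or newer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, dept TEXT);
    INSERT INTO a VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO b VALUES (1, 'HR');
""")
inner = con.execute(
    "SELECT a.name, b.dept FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()
left = con.execute(
    "SELECT a.name, b.dept FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()
print(inner)  # [('Ann', 'HR')] -- unmatched rows discarded
print(left)   # [('Ann', 'HR'), ('Ben', None)] -- unmatched left row kept
```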

EXISTS
The EXISTS operator is used with a subquery to determine whether the subquery
returns any rows. As explained by Taylor (2019), “if the subquery returns at least one
row, that result satisfies the EXISTS condition, and the outer query executes.” (p. 252).

Consider the following example:

SELECT first_name, last_name
FROM student
WHERE EXISTS
(SELECT DISTINCT teacher_id
FROM teacher
WHERE STUDENT.student_id = TEACHER.teacher_id)

The subquery (the SELECT statement found within the parentheses) will return all teacher rows also found within the student table. The DISTINCT keyword ensures that only one copy of each teacher_id will exist. The outer query returns the first and last names of the students who are also found in the teacher table.
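A reduced version of this EXISTS query runs as-is on SQLite. The sketch below, using Python's sqlite3 module, recreates only the id and name columns and adds one hypothetical overlapping person so that the subquery finds a match:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (student_id TEXT, first_name TEXT, last_name TEXT);
    CREATE TABLE teacher (teacher_id TEXT, first_name TEXT, last_name TEXT);
    INSERT INTO student VALUES ('10521', 'Antonio', 'Gemma'),
                               ('052621', 'Agnieszka', 'Nowak');
    INSERT INTO teacher VALUES ('53452', 'Franz', 'Rosenbaum'),
                               ('052621', 'Agnieszka', 'Nowak');
""")
rows = con.execute("""
    SELECT first_name, last_name FROM student
    WHERE EXISTS (SELECT DISTINCT teacher_id FROM teacher
                  WHERE student.student_id = teacher.teacher_id)
""").fetchall()
print(rows)  # [('Agnieszka', 'Nowak')] -- the only student who is also a teacher
```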

4.2 Data Control Language (DCL)


Databases and the data within them need to be controlled to maintain their integrity and security and to prevent general misuse. Data control language (DCL) contains statements to help protect the data. DCL includes commands to give and remove permissions for users concerning any object within the database (Taylor, 2019).

There are two general commands to review (GeeksforGeeks, 2021b):

GRANT: allow specified users to perform specified tasks.

REVOKE: cancel previously granted or denied permissions.

These two commands are reserved mostly for database administrators (DBAs) for
granting and revoking privileges to any object. Users can be given privileges to one or
more tables, views, or other objects. Users may also be granted specific levels, such as
read-only, update, delete, or insert privileges. All these privileges are controlled by the
administrators.

Database administrators
A database administrator is a technician who is responsible for organizing databases and managing the data within them.

4.3 Differences between Various SQL Versions (MSSQL, PL/SQL, etc.)

SQL is the standard for interacting with databases. Over the past years and decades, multiple dialects of SQL have emerged, all based on the same basic language. Similar to the Latin origins of both French and Italian, MySQL and Microsoft SQL both stemmed from SQL (GeeksforGeeks, 2021a).

PostgreSQL
This SQL language is an advanced, enterprise-class, and open-source relational database system. It supports both SQL (relational) and JSON (non-relational) querying.

Many SQL versions, such as PostgreSQL, MySQL, and SQLite, have very similar syntaxes. According to Khalil (2018), "Microsoft SQL Server has the greatest contrast in SQL syntax, as well as a wide variety of functions not available in other platforms" (para. 7). The biggest differences between these SQL languages are in terms of data case sensitivity and data functions. For example, in MySQL, there is no data case sensitivity. Therefore, WHERE name = 'John' and WHERE name = 'john' are the same. This is different in SQLite, in which WHERE name = 'John' and WHERE name = 'john' are not the same.

Another large difference between these languages is how they handle dates and times. This is exemplified by how each retrieves the current date and time. In MySQL, the command is (GeeksforGeeks, 2021a)

SELECT now();

which results in

2017-01-13 08:03:52

This would not work in SQL Server, which would require the following command
(TechOnTheNet, 2022):

SELECT GETDATE();

which results in

2019-04-28 15:13:26.270

As shown, not only is the function different, but the result is different as well. These
differences are important when translating or working between different languages as
well as importing or exporting data between databases.
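SQLite makes the same point a third way: neither NOW() nor GETDATE() exists there, and the current timestamp is instead obtained with datetime('now'). A quick check via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite's spelling of "current date and time":
ts = con.execute("SELECT datetime('now')").fetchone()[0]
print(ts)  # e.g., 2019-04-28 15:13:26
```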

When looking at processing or transactional SQL languages such as PL/SQL, the differ-
ences are more pronounced. PL/SQL extends SQL and “allows the execution of a block
of code at a time which increases its performance. The block of code consists of proce-
dures, functions, loops, variables, packages, [and] triggers” (Pedamkar, 2020b, para. 7).
PL/SQL is used for creating applications as a procedural language, while SQL is used for manipulating data; SQL itself is written in commands that are run individually.

Another way to conceptualize PL/SQL is as a combination of SQL with the characteristics of a programming language (Pedamkar, 2020b). It was created by Oracle and is embedded in
the Oracle Database, along with SQL and Java. It is another language entirely and builds
on the language of SQL to create applications to show the data retrieved by the SQL
queries (Pedamkar, 2020b). This is in contrast to MySQL, which is a different version of
SQL.

Summary

The genius of structured query language (SQL) comes from its ability to expand into
other programming languages while staying true to its simple commands and data-
base interactions. By expanding with transaction control language (TCL), a devel-
oper can add multiple steps to any process including commits, savepoints, and roll-
backs. Entire processes can be coded to fit the needs of the business, users, or
stakeholders, and SQL provides the foundation for this.

If an application is needed, procedural language for SQL (PL/SQL) provides a solution. Blocks of code are created beyond plain SQL, combining functions and programs into more sophisticated applications than a single SQL command. Because SQL is
an older language, it has diverged into multiple SQL “child” languages, similar to
how Latin diverged into various Romance languages. These derivatives of SQL, although running on the same foundations as their predecessor, do contain differences that make it impossible for them to run each other's code. Thus, it is important to understand these differences and which database is being run before running essential pieces of code.

More importantly, security and users’ ability to use data easily must be understood.
This can be accomplished by understanding data control language (DCL). Although
they are normally run by database administrators (DBAs), these commands should
still be understood by all programmers.

Knowledge Check

Did you understand this unit?

You can check your understanding by completing the questions for this unit on the
learning platform.

Good luck!
Unit 5
Data Query Languages for NoSQL Database and Other Purposes

STUDY GOALS

On completion of this unit, you will be able to …

… differentiate between various “not only SQL” (NoSQL) databases.

… define a NoSQL document database.

… execute NoSQL document database code.

… understand what a NoSQL graph database is.

… describe a GraphQL database.

DL-E-DLMDMDQL01-U05

5. Data Query Languages for NoSQL Database and Other Purposes

Introduction
With the growing need for data, analytics, and security, along with the increasing opportunities that application development offers when aligned with databases, the offerings of a relational, structured query language (SQL) database have become limited. Because of these limitations, "not only SQL" (NoSQL) databases have emerged and become common.

Document databases are the most popular of these databases. Examples of these
include MongoDB and Couchbase. These databases are composed of documents that
hold semi-structured data, adapt to the data, and are not restricted to the schema. This
means that the database can handle the data that are different from one document to
another.

Another type of NoSQL database worth reviewing is the graph database. As opposed to document databases, which are created with documents, graph databases look like connections on a whiteboard. However, they are still free-flowing (according to the needs of the data) and do not impose a strict schema. Graph databases contain nodes and connections rather than tables and relationships.

Finally, GraphQL will be considered. GraphQL is less a database than a connection to databases that is optimized for application programming interfaces (APIs). It is optimized for developers and for API connections that can sit in front of any backend database.

5.1 Document Databases (N1QL/Couchbase and MongoDB)
Document databases are the most common NoSQL database and the most popular alternative to SQL databases (MongoDB, 2021b). They are used to manage semi-structured data that do not adhere to a fixed structure. Documents can be stored in formats such as JavaScript object notation (JSON), binary JavaScript object notation (BSON), and extensible markup language (XML; Technical Matters, 2020).
Because it does not have a precisely defined structure, “this data is not suitable for
relational databases since its information cannot be arranged in tables” (Technical
Matters, 2020, para. 2).

Key-value pair
These are pairs in which a key serves as a unique identifier, e.g., {id: 10293}.

The document database creates a key that it pairs with a specific document. Information is then located within this document. This schemaless database is held together by its documents, and these documents hold their keys in key-value pairs that bring value to the data. Flexible documents, instead of tables, are used to store data in field-value pairs that can use a variety of types and structures, and they do not have to remain tied to any specific data type (MongoDB, 2021b).

Because document databases have a flexible schema and the documents do not have to have the same data types or even fields, the process of adding data to and pulling data from them is very easy (MongoDB, 2021b). For example, in an SQL database, a user would need to know the table schema and the table columns to pull the data from the table. Because of its flexibility, a document database does not require such definitions.

There are three additional factors that differentiate document databases (and non-
relational databases) from relational databases (MongoDB, 2021b):

1. The ease of use of the data model. Documents map to objects, making them coder-friendly (reflecting their key-value pairs). Further, data are stored together in a much more accessible way for developers. Less code is required, and higher performance is achieved.
2. The universal knowledge of JavaScript object notation (JSON) among the coding community. JSON has become an established standard for data interchange and storage. This makes it a better choice when saving code or data, as it is more accessible to more users. JSON documents are also more readable and even language independent. Their key-value pair structure (that is, the underlying structure of JSON) makes their use more standard among developers.
3. The flexibility of the schema. Without a schema, developers can essentially develop immediately without studying or performing any needed searches. Fields are independent of one another and can differ between documents. As such, developers can also modify the structure at any time, avoiding disruptions and potential code breakage.

When reviewing a document database, the following features will be found (MongoDB,
2021b):

• Although no defined schema is present, data will be stored in documents. These documents map to objects that can be used in many programming languages, which allows for rapid programming.
• Document databases are distributed and resilient, which allows for horizontal scaling, data distribution, and data replication. This is cheaper than with vertical counterparts.
• Most document databases have an API or query language that allows developers to
perform create, read, update, and delete (CRUD) operations on the database.

Advantages

Document databases offer many advantages to their users. More precisely, they allow an intuitive data model that is "fast and easy for developers to work with" (MongoDB, 2021b, para. 1). Additionally, the flexibility of their schema allows the data model to change as the needs of the application change. In a relational database, if a cell does not contain data, for example, the cell must still exist; it would just be a cell without any data. However, in a document database, the document or field within the document could be deleted without any repercussions for the integrity of the database (Technical Matters, 2020).

This also allows for easier integration and scaling of new information. The database
can scale out to adapt to the data (MongoDB, 2021b). In a relational database, new data
could force changes to the schema, while in a document database, new data can be
added to the new documents without affecting previously inserted data (Technical Mat-
ters, 2020). These advantages cause document databases to be generally applicable in
various cases and across many industries (MongoDB, 2021b, para. 3).

Disadvantages

Document databases can have some disadvantages according to the specific needs of
the data. If the data need to be related and linked, document databases will be slow to
respond since they are not built for interlinking tasks. Even with high volumes of data,
if the data are related (e.g., in a banking database), a relational database would have superior performance and storage compared to a document database (Technical Matters, 2020).

N1QL/Couchbase

Couchbase Server is a JSON document database designed with easy-to-scale JSON document (or key-value) access that has highly sustained output and low latency. Couchbase applications "may help in serving many users by storing, creating, aggregating, retrieving, manipulating and presenting data. It is designed to be gathered from a single machine to very large-scale deployments spanning many machines" (GeeksforGeeks, 2020a, para. 1).

Like all NoSQL databases, Couchbase is schema-free and flexible, responding to the
needs of the data. It also provides CRUD operations, which makes it a useful database
optimized for interactive applications (GeeksforGeeks, 2020a). In Couchbase Server, "documents are stored in collections, which are stored in scopes, which are in turn stored in buckets within a namespace" (Couchbase, 2022, para. 3). This can be written in the following format (Couchbase, 2022):

namespace:bucket.scope.collection

Couchbase runs using the non-first normal form query language (N1QL, pronounced "nickel"), which is based on SQL coding standards. N1QL uses SELECT, FROM, and WHERE to build a simple query, just like an SQL query. In contrast to an SQL query, the response comes in the JSON document format stored in the Couchbase server. It is also important to note the document structure. For example, the following N1QL query

SELECT a.country FROM default:`travel-sample`.inventory.airline a
WHERE a.name = "Excel Airways";

produces the following result (Couchbase, 2022):



[
{
"country": "United Kingdom"
}
]

The query reads almost exactly like SQL, yet it results in JSON.

There are other differences between SQL and N1QL when looking at more advanced queries, but some advanced queries still stay true to their SQL roots. For example, the following
query will return the names of (at a maximum) ten hotels that accept pets in the city of
Medway (Couchbase, 2022):

SELECT h.name FROM default:`travel-sample`.inventory.hotel h
WHERE h.city="Medway" AND h.pets_ok=true LIMIT 10;

MongoDB

MongoDB is one of the most common document databases. MongoDB documents are stored in binary JSON (BSON) format, which "is a variation of JSON with some additional data types and is more efficient for storage than JSON" (Elmasri & Navathe, 2016, p. 920). These documents are stored in collections.

What makes MongoDB so powerful is its ease of use and straightforward scaling (Chodorow, 2013). For example, it replaces the traditional relational table with an easier-to-read and easier-to-use document that allows all types of data to be embedded in a schema-free design. Furthermore, if needed, any document (or any future document) can grow quickly and easily compared to schema-restricted tables.

Features
Although a common database, MongoDB does not replace a relational database. It merely adds options for how to store data, such as scaling. There are also other factors to consider when looking at using MongoDB (Raj & Deka, 2018). These include

• data insert consistency. MongoDB can insert large amounts of data at an extremely
high speed with equally high consistency.
• data corruption recovery. There is a command to perform data repair in MongoDB,
but this operation can be time consuming.
• load balancing. MongoDB supports faster replication and automatic load balancing
configuration because of data placed in shards.
• avoidance of joins. Joins in a relational database slow queries down. By removing
joins, queries speed up.
• changing schemas. Changing schemas in a relational database causes issues and
can possibly slow down queries. MongoDB is schemaless, which means that adding
new fields will not have negative effects.

As mentioned, MongoDB is not structured with tables, columns, and rows. Instead, it contains collections, fields, and documents. This new layout is important to remember when starting a MongoDB collection.

Database and collections


To start a database in MongoDB, a collection must be created. Instead of creating and populating tables, a MongoDB database creates and populates collections.

For example,

db.createCollection("project")

creates a new collection, "project".

As seen in the previous example, MongoDB runs mostly on simple commands in which
further parameters can be used to specify the arguments or can be left empty. This
streamlines coding for the programmer.

Collections are grouped together to form databases (Chodorow, 2013). A single instance
of MongoDB can host several databases. Each collection should store documents per-
taining to the topic of the database for ease of use and quicker data capturing. Much
like a SQL database, these databases also have permissions and restrictions that can
be added to users (Chodorow, 2013).

Data models
As stated, instead of tables and relationships like a relational database, MongoDB is
comprised of documents. These documents are in a BSON format, which looks exactly
like JSON. For example, the following is a MongoDB document (Chodorow, 2013):

{
"greeting" : "Hello, world!",
"foo" : 3
}

This document contains two keys, "greeting" and "foo", with corresponding values,
"Hello, world!" and 3.

Although quite simple, there are still some rules MongoDB data models must follow
(Chodorow, 2013):

• The keys in a document are strings but cannot contain the null character (\0).
• Values can be one of several different data types, such as string or integer. They can
even be embedded documents.

• A document cannot contain duplicate keys.


• Key/value pairs in documents are ordered. Thus, {"x" : 1, "y" : 2} is not the
same as {"y" : 2, "x" : 1}. MongoDB may reorder the field order. Thus, the
order does not usually matter and the schema should not depend on a certain
ordering of fields. (Chodorow, 2013)

Documents
Now that the collection "project" exists, documents can be added (Elmasri & Navathe,
2016). This is again done with the command line. For example, consider the following
command:

db.project.insert({name: "Jack", age: 25, favColor: "blue"});

This line is inserting a document into the collection project. Notice that the collection
name is referred to in the command.

Secondly, notice how the data are inserted into the document. It is the BSON format as described. Behind the scenes, this document is given a unique key stored in the _id field that is automatically created and stored by MongoDB (Elmasri & Navathe, 2016). The key can be supplied by the programmer, but it is generally easier to let MongoDB generate it. As mentioned before, collections have dynamic schemas, which means that no two documents in a collection need to have the same layout.
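This behavior can be illustrated with a pure-Python sketch. It is not MongoDB itself: the collection is just a list, and a UUID stands in for MongoDB's 12-byte ObjectId. It mimics how an _id is auto-generated on insert and how a dynamic schema allows differently shaped documents:

```python
import uuid

project = []  # stands in for the "project" collection


def insert(collection, doc):
    """Insert a document, auto-generating an _id if none was supplied."""
    doc = dict(doc)
    doc.setdefault("_id", uuid.uuid4().hex)
    collection.append(doc)
    return doc["_id"]


insert(project, {"name": "Jack", "age": 25, "favColor": "blue"})
insert(project, {"name": "Anna"})  # dynamic schema: different fields are fine

print(all("_id" in d for d in project))  # True: every document got a key
print(len({d["_id"] for d in project}))  # 2: each key is unique
```

Note that, as in MongoDB, an _id supplied by the caller would be kept; only missing keys are generated.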

Transactions
While MongoDB can handle all CRUD operations, its commands do not look like SQL commands in terms of keywords (e.g., SELECT, INSERT, or DELETE). Although they look different and use different terminology, SQL commands translate easily into MongoDB once you understand them. This can be explored by first studying the SELECT command in SQL, which corresponds to the .find() command in MongoDB. Because MongoDB is command-line based and uses functions rather than commands, the functions require arguments to narrow down searches. In MongoDB,

db.project.find()

is, in coding terms, the same as searching

SELECT * FROM project

in SQL (MongoDB, 2021a). By comparing the two, sometimes it can be easier to translate
or learn MongoDB.

Like the SELECT statement, the .find() command can help narrow down the query by
using filters constructed in BSON format and added to the argument (MongoDB, 2021a).
For example,

db.project.find({age: 25})

is the same as inputting



SELECT * FROM project WHERE age = 25

Important for understanding this MongoDB statement is to notice how the search condition is written inside curly brackets (i.e., {}), just like any BSON value. To search, BSON values need to be written in correct syntax. For example, to add more conditions to this .find(), they would need to be written in correct BSON format, as seen in the following example:

db.project.find({age: 25, name: "John"})

It may look a little overwhelming at first, but when comparing to SQL and converting to simple BSON syntax, MongoDB is written in simple grammar for basic and quick development. When this basic syntax is understood, more complex syntaxes can be added (MongoDB, 2021a). For example,

db.project.find( { age: { $in: [ 25, 26 ] } } )

translates easily to

SELECT * FROM project WHERE age IN (25, 26)

The special command $in can be changed to $or, the equivalent of the OR operator in SQL. Note, however, that $or is a top-level operator that takes a list of conditions rather than a list of values. The SQL query

SELECT * FROM project WHERE age = 25 OR age = 26

is written in MongoDB as follows:

db.project.find( { $or: [ { age: 25 }, { age: 26 } ] } )
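To see how such filter documents behave, consider the following pure-Python sketch. It is not MongoDB itself (the collection is just a list of dicts, and the matcher covers only equality, $in, and $or), but it mirrors the semantics of the queries above:

```python
def matches(doc, query):
    """Illustrative subset of MongoDB's filter semantics."""
    for key, cond in query.items():
        if key == "$or":
            # $or is a top-level operator taking a list of sub-queries
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict) and "$in" in cond:
            if doc.get(key) not in cond["$in"]:
                return False
        elif doc.get(key) != cond:
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


project = [
    {"name": "Jack", "age": 25},
    {"name": "John", "age": 26},
    {"name": "Mary", "age": 30},
]

# Like db.project.find({age: {$in: [25, 26]}})
print([d["name"] for d in find(project, {"age": {"$in": [25, 26]}})])
# Like db.project.find({$or: [{age: 25}, {age: 26}]})
print([d["name"] for d in find(project, {"$or": [{"age": 25}, {"age": 26}]})])
# Both print ['Jack', 'John']
```

Either form selects the same documents; the difference is only in how the condition is expressed.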

When looking at the MongoDB .deleteOne() and .updateOne() functions, these arguments also apply. The syntax for these functions is as follows (MongoDB, 2021a):

db.<collection_name>.deleteOne(<condition>)

and

db.<collection_name>.updateOne(<condition>, <update>)

An example of deleting a row could look like the following:

db.project.deleteOne({age: 25, name: "John"})

This is the same as coding the following in SQL code:

DELETE FROM project WHERE age = 25 AND name = 'John'



Indexes
An index makes searching a database faster, much like the index of a book. Without an index, the query would have to use a table scan (Chodorow, 2013), which means the server would need to search the entire database to find the query's results. This can take a long time, especially in a MongoDB database designed to hold large amounts of data.

To view how an index improves a query, the explain() function can be used to show
the query details of when it is executed. For example, consider the following query and
output (Chodorow, 2013):

db.users.find({username: "user101"}).explain()
{
"cursor": "BasicCursor",
"nscanned": 1000000,
"nscannedObjects": 1000000,
"n": 1,
"millis": 721,
"nYields": 0,
"nChunkSkips": 0,
"isMultiKey": false,
"indexOnly": false,
"indexBounds": { }
}

This query searches for a random username, and the explain() function outputs fields that show detailed information on how the query runs. The "nscanned" field is the most interesting, as it shows the number of documents the query looked at to satisfy the query (Chodorow, 2013). Also important to note is the "millis" field, which shows the time in milliseconds. In this case, it took the query 721 milliseconds to search 1,000,000 documents (which in this example is all the documents in the collection).

This can be improved by implementing an index on the username using ensureIndex(), as seen in the following example (Chodorow, 2013):

db.users.ensureIndex({"username" : 1})

This could take a few minutes depending on the collection size, but it can show notable
results. For example, the previous query can be run in the following manner (Chodorow,
2013):

db.users.find({"username" : "user101"}).explain()
{
"cursor": "BtreeCursor username_1",
"nscanned": 1,
"nscannedObjects": 1,
"n": 1,
"millis": 3,
"nYields": 0,
"nChunkSkips": 0,
"isMultiKey": false,
"indexOnly": false,
"indexBounds": { "username" : [ [ "user101",
"user101" ] ] }
}

After adding the index, the find() query decreases to 3 milliseconds, while only needing to scan 1 object. As summarized by Chodorow (2013), "indexes have their price: every write (insert, update, or delete) will take longer for every index you add. This is because MongoDB has to update all your indexes whenever your data changes, as well as the document itself" (pp. 83–84). Accordingly, MongoDB has a limit of 64 indexes per collection (Chodorow, 2013). Although all examples are in a fictitious database, the queries show how an index, when added to the most frequently queried field, can improve queries in reality.
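The principle behind these numbers can be mimicked in plain Python. This is an illustration only, not MongoDB's B-tree index: a table scan must examine documents one by one, while an index (here, simply a dictionary keyed on username) answers the same lookup after touching a single entry:

```python
# A fictitious collection of 10,000 user documents
users = [{"username": f"user{i}"} for i in range(10_000)]


def table_scan(collection, username):
    """Examine documents one by one, counting how many were scanned."""
    nscanned = 0
    for doc in collection:
        nscanned += 1
        if doc["username"] == username:
            return doc, nscanned
    return None, nscanned


# "Index": built once up front, then each lookup is a single hash access
index = {doc["username"]: doc for doc in users}

doc, nscanned = table_scan(users, "user101")
print(nscanned)                 # 102 documents scanned to find the match
print(index["user101"] is doc)  # True: same document, found in one lookup
```

As in MongoDB, the index is not free: it costs memory and must be maintained on every write, which is exactly the trade-off Chodorow describes.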

By understanding SQL first, MongoDB can be easier to learn; however, it was built for
overall usefulness and is now well incorporated into many organizations, regardless of
prior SQL knowledge.

5.2 Graph Databases (Cypher/Neo4j)


A graph database takes its name from how the data are represented. These databases
depict data as a picture or graph (Neo4j, 2022c). Instead of tables, there are nodes and
relationships. The schema can look like a whiteboard and be very free-flowing and
unrestrictive (Neo4j, 2022c). Since it is NoSQL, the schema is flexible and responsive to
the needs of the data.

When reviewing a graph database, the following features are found (Neo4j, 2022c):

• Nodes can hold any number of key-value pairs or properties.


• Relationships between nodes provide directed and named connections between
two node entities (e.g. Person LOVES Person).
• Relationships always have a direction, a type, a start node, and an end node. They
can also have properties, just like nodes.
• Nodes can have any type of relationship and any number of them without sacrific-
ing performance.
• Although relationships are always directed, they can be navigated efficiently in any
direction.

Further, there are labels, which are important ways for nodes to be grouped together.
Nodes can have zero or more labels, and labels do not have properties (JavaTPoint,
2021).

Cypher/Neo4j

Neo4j is an open-source, native graph database that provides a backend for applicati-
ons (Neo4j, 2022c). Neo4j is known for its flexibility, which is derived from the fact that
data aren’t "stored as a ‘graph abstraction’ on top of another technology, it’s stored
[like a] whiteboard” (Neo4j, 2022c, para. 13).

Data model
The data of the graph database are easily illustrated using circles, lines, and arrows
(Neo4j, 2022a). This illustration explains the entities, relationships, and flows of infor-
mation the database will then use in code. A drawing of the connections of data on a
whiteboard is a good way of picturing a graph data model. It can start very simple and
rough and develop into a more complicated series of relationships.

A simple graph database supporting a car insurance fraud investigation could look something like the following figure (Webber & Van Bruggen, 2020).

In the figure above, nodes and relationships are labeled and are easily read by non-
technical readers. For example, in this instance “a Person LIVES_AT a Location, and a
Person DRIVES a Car that HAS_INSURANCE and was INVOLVED_IN an Accident” (Webber
& Van Bruggen, 2020, p. 18). The graph is easy to understand and navigate, which leads
to less interpretation for database administrators and better communication between
the technical and non-technical members of the team.

These whiteboard images can become more in-depth over time concerning nodes and
relationships. The graph model can evolve to include more layers and possibly even
more entities and relationships as the specific needs for the data take shape.

Cypher query languages


Just as other database languages, Neo4j also provides proprietary atomicity, consistency, isolation, and durability (ACID) transactions. Its language is like SQL, but optimized for graphs, making it easy for programmers to handle (Neo4j, 2022c). It also contains a "flexible property graph schema that can adapt over time, making it possible to materialize and add new relationships later to shortcut and speed up the domain data when the business needs change" (Neo4j, 2022c, para. 18).

ACID
This acronym stands for atomicity, consistency, isolation, and durability. It refers to a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other issues.

Neo4j runs using cypher query language (CQL), another type of query language (JavaTPoint, 2021). CQL's syntax closely resembles SQL, which makes it easy to read and learn. Since it is a graph database, the first step with CQL is to create a node. This is done by using a CREATE statement with the following syntax:

CREATE (node_name);

For example, the code

CREATE (single);

would create a node single. This can be verified using the following code:

MATCH (n) RETURN n

This code will return the node (or nodes) that are present in the collection.

To go further with the CREATE statement, a node with multiple labels can be created as
well by using the following syntax:

CREATE (node:label1:label2:. . . . labeln)

For example, you could create a node “Kalam” with the label “person,” “president,”
and “scientist.”

CREATE (Kalam:person:president:scientist)

You could take this CREATE statement further by adding the following properties:

CREATE (node:label { key1: value, key2: value, . . . . . . . . . })

For example, you could create the following node:

CREATE (Ajeet:Developer{name: "Ajeet Kumar", YOB: 1989, POB: "Mau"})

As important as these nodes may be, their relationships are equally as important.
These relationships can be created using the MATCH statement with the following syn-
tax:

MATCH (a:LabelOfNode1), (b:LabelOfNode2)
WHERE a.name = "nameofnode1" AND b.name = "nameofnode2"
CREATE (a)-[:Relation]->(b)
RETURN a,b

For example, consider the following code:

MATCH (a:Player), (b:Country)
WHERE a.name = "Raul Vinci" AND b.name = "Italy"
CREATE (a)-[r:FOOTBALLER_OF]->(b)
RETURN a,b
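The data model these statements build can be sketched in plain Python. This is an illustration of the concepts, not Neo4j itself: nodes carry labels and properties, and every relationship has a type, a direction, a start node, and an end node:

```python
nodes = []  # each node: a set of labels plus key-value properties
rels = []   # each relationship: (start_node, type, end_node)


def create_node(labels, **props):
    node = {"labels": set(labels), "props": props}
    nodes.append(node)
    return node


def create_rel(start, rel_type, end):
    rels.append((start, rel_type, end))  # direction is start -> end


# CREATE (a:Player {name: "Raul Vinci"}), (b:Country {name: "Italy"})
a = create_node(["Player"], name="Raul Vinci")
b = create_node(["Country"], name="Italy")

# CREATE (a)-[:FOOTBALLER_OF]->(b)
create_rel(a, "FOOTBALLER_OF", b)

# Roughly: MATCH (p:Player)-[:FOOTBALLER_OF]->(c) RETURN c.name
result = [end["props"]["name"]
          for start, rel_type, end in rels
          if rel_type == "FOOTBALLER_OF" and "Player" in start["labels"]]
print(result)  # ['Italy']
```

A real graph database adds indexing, traversal optimization, and ACID transactions on top of this basic shape, but the node/relationship structure is the same.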

To get a clearer picture of Neo4j's advantages, compare this Cypher code (Bitnine, 2016)

MATCH (cited_o:Affiliation)<--(cited_a:Author)-->(cited_p:Paper)
      <--(citing_p:Paper)-->(citing_t:Term)
WHERE citing_t.term_name = 'Database'
RETURN cited_o.name

to the following SQL code:

SELECT affiliation.affiliation_name
FROM affiliation, affiliation_info, author_info, reference_info, term_info, term
WHERE affiliation.affiliation_id = affiliation_info.affiliation_id
AND affiliation_info.author_id = author_info.author_id
AND author_info.paper_id = reference_info.cited_paper
AND reference_info.citing_paper = term_info.paper_id
AND term_info.term_id = term.term_id
AND term.term_name = 'Database'

Note that Cypher was developed to optimize joins and relationships, and it thus needs less code than SQL. Joins are not written out as they are in SQL; because the relationships already exist in the graph, they are implicit in the Cypher pattern (the arrows make them visible as well). Writing and comparing the two shows how similar they are and how Cypher makes such queries easier to write.

User defined procedures and functions


Neo4j and Cypher provide multiple built-in procedures and functions that are useful in assisting the daily use and analysis of data and databases. To ensure the expected procedures are installed, the following code should be run (Neo4j, 2022b):

SHOW PROCEDURES

This is similar for functions, which can be listed with the following code (Neo4j, 2022b):

CALL dbms.functions()

More details on each function and procedure can also be shown by filtering and narro-
wing the search using the aforementioned commands. Doing so will help describe
exactly how each procedure or function can run (e.g., what the required arguments are),
along with the expected output. This general filtering and search further help to narrow
the analysis and provide examples for prospective uses.

User-defined procedures and functions extend Neo4j and Cypher’s already powerful
codes. Many monitoring, analysis, and security features can be implemented using
user-defined procedures and functions. To create custom procedures in Neo4j and
Cypher, the following rules should be utilized (Neo4j, 2022b):

• @Procedure annotated, Java methods used


• an additional mode attribute included (e.g., READ, WRITE, DBMS)
• a Java 8 Stream of simple objects with public fields returned
• field names turned into result columns available for YIELD

User-defined functions should have the following attributes (Neo4j, 2022b):

• @UserFunction annotated, public Java methods in a class used


• use default name package-name.method-name
• return a single value
• are read only

Both user-defined procedures and functions follow the subsequent rules (Neo4j,
2022b):

• take @Name annotated parameters (with optional default values)


• can use an injected @Context public GraphDatabaseService
• run within transaction of the Cypher statement
• use supported types of parameters and results (e.g., Long, Double, Boolean,
String, Node, Relationship, Path, Object)

A basic example of calling a procedure could look like the following (Neo4j, 2022b):

WITH "jdbc:mysql://localhost:3306/northwind?user=root" as url


CALL apoc.load.jdbc(url,"products") YIELD row
RETURN row
ORDER BY row.UnitPrice DESC
LIMIT 20

As noted, the code loads data from a MySQL database using the apoc.load.jdbc() procedure, which takes two arguments; the first (url) has already been defined by the user.

A function, like a procedure, could be coded as the following (Neo4j, 2022b):

RETURN apoc.date.format(timestamp()) as time,


apoc.date.format(timestamp(),'ms','yyyy-MM-dd') as date,
apoc.date.parse('13.01.1975','s','dd.MM.yyyy') as unixtime,
apoc.date.parse('2017-01-05 13:03:07') as millis

In this example, the functions apoc.date.format and apoc.date.parse are utilized to format and parse timestamps intended for different results.
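Python's standard library offers an analogous format/parse pair, which can serve as a mental model for what these APOC functions do (this is an analogy only, not APOC itself; the parsed date is the one from the example above):

```python
from datetime import datetime, timezone

# Like apoc.date.format(timestamp(), 'ms', 'yyyy-MM-dd'): render "now" as a date
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
print(today)  # e.g., '2024-05-01'

# Like apoc.date.parse('13.01.1975', 's', 'dd.MM.yyyy'): seconds since the epoch
unixtime = int(datetime(1975, 1, 13, tzinfo=timezone.utc).timestamp())
print(unixtime)  # 158803200
```

In both systems, "format" turns an epoch value into human-readable text, while "parse" goes the other way.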

Cypher style guide


Much like SQL and other programming languages, Cypher has a recommended format
for code that is designed to facilitate quicker writing and reading between developers.
Just like with SQL, most are centered around how to call objects within the database.

Node labels
Node labels are case sensitive, and the first letter of each word begins with a capital letter (Neo4j, 2022c). The following are examples of node labels:

(:Person)
(:NetworkAddress)
(:VeryDescriptiveLabel)

Relationship types
Relationship types are also case sensitive and written in all upper case with an under-
score between words (Neo4j, 2022c). The following are examples of relationship types:

[:FOLLOWS]
[:ACTED_IN]
[:IS_IN_LOVE_WITH]

Property keys, variables, parameters, aliases, and functions


Property keys, variables, parameters, aliases, and functions are written in camelCase: they begin with a lower-case letter, and each subsequent word starts with a capital letter. They are also case sensitive. Examples of these include the following (Neo4j, 2022c):

title
size()
businessAddress
firstName
customerAccountNumber
allShortestPaths()

Clauses
Clauses are capitalized and are placed at the beginning of a new line. They are not case
sensitive. It is possible to change casing (e.g., mAtCh), put multiple keywords on a line,
or mistype clauses, and Cypher will still execute the query. However, for the readability
and supportability of queries, it is recommended that the clauses are in all capital let-
ters and placed at the beginning of a new line. The following show examples of clauses
(Neo4j, 2022c):

MATCH (n:Person)
WHERE n.name = 'Bob'
RETURN n;
//-----------
WITH "1980-01-01" AS birthdate
MATCH (person:Person)
WHERE person.birthdate > birthdate
RETURN person.name;

Keywords
Keywords follow the same pattern as clauses. They should consist of all capital letters and are not case sensitive. However, they do not need to be placed on a separate line. Keywords in Cypher include words such as DISTINCT, IN, STARTS WITH, CONTAINS, NOT, AS, AND, and others.

Some examples of keywords include the following (Neo4j, 2022c):

MATCH (p:Person)-[:VISITED]-(place:City)
RETURN collect(DISTINCT place.name);
//------
MATCH (a:Airport)
RETURN a.airportIdentifier AS AirportCode;
//------
MATCH (c:Company)
WHERE c.name CONTAINS 'Inc.' AND c.startYear IN [1990, 1998, 2007, 2010]
RETURN c;

Indentation and line breaks


Clauses, when grouped together or placed as one long clause, can become congested if
indentations and line breaks are not added. Indentation is best done at the ON
CREATE or ON MATCH commands and at any subqueries. Indenting at these locations
generally increases readability. Consider the following example (Neo4j, 2022c):

//indent 2 spaces on lines with ON CREATE or ON MATCH


MATCH (p:Person {name: 'Alice'})
MERGE (c:Company {name: 'Wayne Enterprises'})
MERGE (p)-[rel:WORKS_FOR]-(c)
ON CREATE SET rel.startYear = date({year: 2018})
ON MATCH SET rel.updated = date()
RETURN p, rel, c;

//indent 2 spaces with braces for subqueries


MATCH (p:Person)
WHERE EXISTS {
MATCH (p)-->(c:Company)
WHERE c.name = 'Neo4j'
}
RETURN p.name;

There is a subquery in these examples that is in curly brackets and is also indented.
This has a positive influence on readability for both current and future users. This is
true for Cypher and for other programming languages.

If the subquery is only one line, it is not necessary to put it on its own line or to indent
it. Instead, it can be written like the following example query, which adheres to the
recommended guidelines for using subqueries (Neo4j, 2022c):

//indent 2 spaces without braces for 1-line subqueries


MATCH (p:Person)
WHERE EXISTS { MATCH (p)-->(c:Company) }
RETURN p.name

5.3 GraphQL for APIs


In the age where APIs and quick queries are dominant, traditional RESTful API styles have become too cumbersome for the amount of data traffic that must be processed. GraphQL was developed by Facebook to fix these issues and, more specifically, to be used for querying with APIs with the aim of improving the development of web applications for developers (Stemmler, 2021).

RESTful API
This is an API that conforms to the design principles of the REST (that is, the representational state transfer) architectural style.

What distinguishes GraphQL from other data query languages (like MySQL) is that GraphQL can query data from any number of different sources rather than only from one direct database. These data could come from a database, micro-service, or even an underlying RESTful API. When using GraphQL, it makes no difference where the data come from. It is able to deal with so many data types by shaping the data "within a strictly typed schema, telling it how to resolve data when asked for" (Stemmler, 2021, para. 12).

Because it was created by developers and for developers, GraphQL offers solid performance and architectural advantages. One of its performance advantages involves preventing the over-fetching often found in a RESTful API (Stemmler, 2021). This means that it does not ask for more data than are required to fulfill its query. The result is less bandwidth used, an advantage for any web application.

One of GraphQL’s architectural advantages is found in how it allows developers to code ahead, regardless of where another team is on the same project (Stemmler, 2021). For example, a user interface team only needs to know about a single endpoint (a single source of truth), as GraphQL is independent of any database. This increases development speed.

The Schema Definition Language (SDL)

GraphQL has its own type of syntax to define the schema of an API. This is the schema
definition language (SDL), and it contributes to GraphQL’s interaction with an API signi-
ficantly (How to Graph QL, n.d.). For example, the SDL defines a simple type called “Per-
son” in the following way (How to Graph QL, n.d.):

type Person {
name: String!
age: Int!
}

This type, Person, has two fields, name and age, with their respective types of String
and Int. The ! indicates they are both required.

Relationships can be expressed closely in the same way. For example, Person can be
associated with Post as seen in the following case (How to Graph QL, n.d.):

type Post {
title: String!
author: Person!
}

Person should thus be updated to reflect this relationship:

type Person {
name: String!
age: Int!
posts: [Post!]!
}

To query all these data, the following query would be sent to a server:

{
allPersons {
name
}
}

This would result in the following response:

{
"allPersons": [
{ "name": "Olaf" },
{ "name": "Stella" },
{ "name": "Noreen" }
]
}

In this query to the server, the allPersons field is the root field of the query, while the
payload is everything that follows. In this case, the payload is name (How to Graph QL,
n.d.).

To include arguments, the root field takes the argument as follows:

{
allPersons(last: 2) {
name
}
}

This argument states that the allPersons field accepts a last parameter that returns only a specific number of the most recent persons; here, the last two (How to Graph QL, n.d.).

Working with this basic structure, GraphQL can expand to fit the needs of many APIs. This makes it easy for any developer to adapt and manipulate structures according to their specific needs.
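How a server resolves such a query (a root field, its arguments, and a payload of selected fields) can be sketched in a few lines of Python. The data and the resolver below are invented for illustration; a real server would use a GraphQL implementation such as graphql-core:

```python
persons = [
    {"name": "Olaf", "age": 30},
    {"name": "Stella", "age": 28},
    {"name": "Noreen", "age": 35},
]


def all_persons(selection, last=None):
    """Resolve the allPersons root field: apply the argument, then
    project only the fields named in the query's payload."""
    data = persons[-last:] if last else persons
    return [{field: p[field] for field in selection} for p in data]


# { allPersons { name } }
print({"allPersons": all_persons(["name"])})
# { allPersons(last: 2) { name } }
print({"allPersons": all_persons(["name"], last=2)})
# The second query returns only the last two persons, Stella and Noreen
```

The key point the sketch captures is that the client's selection set, not the server's storage, decides which fields appear in the response; this is exactly what prevents over-fetching.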

Summary

“Not only SQL” (NoSQL) databases offer many opportunities for developers who
seek unrestrained storage and analytics. Document databases store semi-struc-
tured data in documents that react to the data and require no schema—a common
thread among all NoSQL databases. Having no schema means data are easily
added and manipulated.

Two examples of such document databases are MongoDB and Couchbase. MongoDB is more common and offers a document database where collections are created with command lines used by developers. These command lines easily create, delete, and manipulate data. Couchbase is a document database with key-value document access that offers low latency for any developer.

A second NoSQL database to keep in mind is a Graph Database. This database func-
tions as a whiteboard with free-flowing nodes. It is responsive and, again, requires
no schema. Neo4j is an example of a graph database that looks surprisingly similar
to SQL.

Finally, there is GraphQL, a query language optimized for application programming interfaces (APIs). GraphQL does not pull from any one particular database but can work with any data source. This offers clear performance and architectural advantages for developers.

Knowledge Check

Did you understand this unit?

You can check your understanding by completing the questions for this unit on the
learning platform.

Good luck!
Unit 6
Using Data Query Languages within
Application Programming

STUDY GOALS

On completion of this unit, you will be able to …

… understand the three-tier architecture of application programming.

… describe connections between the database and programming tiers.

… define unit testing and how it can relate to database programming.

… recognize database programming in Python and Java.

DL-E-DLMDMDQL01-U06

6. Using Data Query Languages within Application Programming

Introduction
A web developer’s key competency is their knowledge of the languages needed to
develop powerful applications. This includes understanding data query languages and
how to incorporate them into web applications. Although larger teams can afford to
have specialized developers who each only know a few languages, many teams depend
on team members who know all programming languages on their development stack to
successfully create and implement projects. These stacks are found in a three-tier
architecture.

Three-tier architecture consists of the environments necessary to develop secure and


successful applications, while it also hosts languages in their native environments.
Because each language is found on separate tiers, connections between programming
languages and databases are essential for successful coding and testing. No two lan-
guages are exactly alike when connecting between tiers, but having fundamental
knowledge about one can help a developer work with another language. Python and
Java are examples of languages that differ slightly in connection and setup yet are sim-
ilar enough that a basic understanding of one makes it easier to learn or interact with
the other.

Testing can be tricky with data query languages in an application programming envi-
ronment. However, unit testing is essential for developers. The positive and negative
results from these tests help coding projects to advance toward their goals.

6.1 Special Aspects (Architecture, Connection


Management, Coding, and Testing)
Architecture and connection management are essential to the use of data query lan-
guages in applications. It is important to understand these elements in both older and
newer data query languages. Unit testing ensures that the code functions properly and
increases its quality.

Architecture

When first setting up the environment in which the code will be written, it is important
to understand where each part of the environment exists. The database, for example,
lives on what is called the “database tier.” The programming language lives on the “pro-
gramming tier,” and the user of the application lives on “the client tier.” These tiers
make up the three-tier architecture (IBM Cloud Education, 2020) of application develop-
ment.

The database tier runs the database management software, and this software manages the data query language (IBM Cloud Education, 2020). This can be any database management software, including MySQL, Oracle, PostgreSQL, or MongoDB.

The programming tier, sometimes called the application tier, houses the business logic
and processing (IBM Cloud Education, 2020). This tier will query from the database to
return to the client tier. This layer can be run on any popular programming language
(e.g., Python, Ruby, or PHP) and can contain frameworks such as Django, Rails, Sym-
phony, or ASP.NET.

Finally, there is the client tier, or presentation tier, which provides the user interface.
This tier includes the website that users will see and use. This front-end is developed
using HTML, CSS, and JavaScript.

This architecture provides many advantages to modern development and functionality.


Importantly, it facilitates faster development, and each tier usually represents different
teams in a department (IBM Cloud Education, 2020). According to this division, each
team can work simultaneously within their teams. Other advantages of this architecture
include improved scalability and reliability (IBM Cloud Education, 2020). Any part of the
tier can scale independently, and, if there is a problem with one tier, the other layers
can perform and run without downtime. The most important advantage of three-tier
architecture is higher security (IBM Cloud Education, 2020). Because the client tier and
the database tier do not communicate, it is more difficult for hackers to access internal
data.

Sometimes this architecture can consist of two, four, or even more tiers; however, we
will only analyze the three-tier variety.

Connection Management

How the programming tier connects to the database tier depends on the programming
language used. Older languages, such as C and Java, require additional libraries or
drivers to connect to the database tier, whereas more modern languages, such as PHP,
Python, and Ruby, offer built-in support. To add more complexity, the connection also
depends on the type of database that is used. For example, MySQL connects more easily
to back-end programming languages than Oracle SQL, which usually requires an
additional library. What all these languages have in common is that the connection from
the programming tier to the database tier depends on specific connection code.

Java, for example, requires a DriverManager class to establish the connection. This
class could look something like the following (The Java Tutorials, 2021):

public Connection getConnection() throws SQLException {
    Connection conn = null;
    Properties connectionProps = new Properties();
    connectionProps.put("user", this.userName);
    connectionProps.put("password", this.password);

    if (this.dbms.equals("mysql")) {
        conn = DriverManager.getConnection(
            "jdbc:" + this.dbms + "://" +
            this.serverName +
            ":" + this.portNumber + "/",
            connectionProps);
    } else if (this.dbms.equals("derby")) {
        conn = DriverManager.getConnection(
            "jdbc:" + this.dbms + ":" +
            this.dbName +
            ";create=true",
            connectionProps);
    }
    System.out.println("Connected to database");
    return conn;
}

As the code shows, after the connection is initiated, it is completed based on the type
of database present. The first IF statement checks whether the database management
system (DBMS) is a MySQL database. The ELSE IF statement checks for a Derby DBMS.

Derby
This is an open-source relational database that is implemented entirely in Java and is
available under the Apache License.

Coding and Testing

The difficulty when combining great programming and high-quality database query
code is how to test them properly. Testing can become difficult when two tiers work
together, as they may not communicate in a manner conducive to testing output.
Therefore, unit testing is best used. There are numerous ways to test a data query
language inside an application environment, and some development applications have
built-in tools. However, knowledge of good unit tests can provide outstanding services
for the end-user and team (Gill, 2022).

Unit testing is a testing approach found in test-driven development environments. This
means production code is written after automated test cases (Gill, 2022). Although it is
a very specific testing approach, it is quickly becoming mainstream. Unit testing
involves testing one specific part of the code, or a single functionality, without waiting
for the remaining parts of the code to be written (Gill, 2022). This allows for quicker
development, improved code quality, and less expensive development.

When applying unit testing to database testing, a developer can test all components of
the database as they are developed, including each new column, constraint, and trigger.
Furthermore, SQL queries (the most tested part of the database testing process) can be
carried out by creating scenarios. After the scenarios are created, they can be followed
through the database with the code. For example, if the scenario follows a query that
inserts data into the table, does the table populate correctly? Although this is a very
small, functional test, when it is done correctly it can speed up the entire development
process.
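The insert scenario described above can be sketched as a database unit test. The following example uses Python's built-in unittest module with an in-memory SQLite database; the table and column names are illustrative, not from a real project:

```python
import sqlite3
import unittest

class TestEmployeeInsert(unittest.TestCase):
    """Scenario: a query inserts data — does the table populate correctly?"""

    def setUp(self):
        # A fresh in-memory database isolates each test case.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE emp (staff_number INTEGER PRIMARY KEY, fname TEXT)")

    def tearDown(self):
        self.conn.close()

    def test_insert_populates_table(self):
        self.conn.execute("INSERT INTO emp VALUES (23, 'Rishabh')")
        rows = self.conn.execute(
            "SELECT staff_number, fname FROM emp").fetchall()
        self.assertEqual(rows, [(23, "Rishabh")])
```

Run with `python -m unittest` in the file's directory. Because setUp builds a fresh database for every test, each scenario can be verified as soon as the corresponding column, constraint, or query is written.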

Database testing plays a key role in application development. If testing is not completed,
or not done correctly, the system can deadlock, and data corruption, data loss, and
decreased performance can result (Gill, 2022). Unit testing of databases helps ensure the
development of the entire application is a success.

6.2 Examples (SQL in Python and SQL in Java)


When integrating SQL with a programming language (e.g., Python or Java), the general
process is the same. The language first needs to connect to the database. This is
followed by building the query, sending the query to the database, and, finally,
receiving the response. The response is the result of the query; for statements such as
INSERT, it is a success confirmation or an error rather than returned data.
Understanding the response in this way is just as important as building the query
properly.
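The two kinds of response can be seen concretely with Python's built-in sqlite3 module (the same idea holds for other drivers; the table name here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (staff_number INTEGER PRIMARY KEY)")

# A successful INSERT returns no result rows; the meaningful "response"
# is the absence of an error (plus, here, the affected row count).
affected = conn.execute("INSERT INTO emp VALUES (23)").rowcount
print("rows affected:", affected)

# A failing INSERT answers with an error instead of data.
try:
    conn.execute("INSERT INTO emp VALUES (23)")  # duplicate primary key
    error = None
except sqlite3.IntegrityError as e:
    error = e
print("error response:", error)

conn.close()
```

Application code therefore has to handle both outcomes: inspect the result for queries that return data, and catch the driver's exception for statements that do not.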

Python Examples

The following examples all assume an SQL database (in this case, SQLite3) has been
created and the Python code (also set up in the environment) is connected to the
database.

SQLite3
This is a C library that provides a lightweight, disk-based database that doesn't require
a separate server process.

This can be accomplished using the following script (GeeksforGeeks, 2021c):

import sqlite3
# connecting to the database
connection = sqlite3.connect("gfg.db")

# cursor
crsr = connection.cursor()
# print statement will execute if there
# are no errors
print("Connected to the database")

# close the connection
connection.close()

New here is the cursor line, in which the object crsr is created from the connection.
The cursor object in Python is used for executing SQL queries, and it acts as
middleware between the SQLite database connection and the SQL query (GeeksforGeeks,
2021c). When commands are executed in Python, they are executed through the cursor
object only.

Once the connection is established, Python can perform create, read, update, and
delete (CRUD) operations on the database. In Python, it is easiest to store the SQL
command in a string object and then execute it. The following command will do so
(GeeksforGeeks, 2021c):

# SQL command to create a table in the database
sql_command = """CREATE TABLE emp (
staff_number INTEGER PRIMARY KEY,
fname VARCHAR(20),
lname VARCHAR(30),
gender CHAR(1),
joining DATE);"""

All commands can be run by first creating the object and then executing it. Consider
the following example, which puts the cursor and the object with the SQL command
together:

# cursor
crsr = connection.cursor()

# SQL command to insert the data in the table
sql_command = """INSERT INTO emp VALUES (23, "Rishabh",\
"Bansal", "M", "2014-03-28");"""
crsr.execute(sql_command)

Notice that the cursor is opened from the connection and then used to execute the
command. When the script is finished (and no more commands need to run in the file),
the changes are committed and the connection is closed:

# To save the changes in the files. Never skip this.
# If we skip this, nothing will be saved in the database.
connection.commit()

# close the connection
connection.close()
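One caveat worth noting: the examples above interpolate literal values directly into the SQL string. In practice, the sqlite3 module's parameter substitution is the safer pattern, since the driver escapes the values and prevents SQL injection. A small variant of the same insert, sketched here with the chapter's emp table and an in-memory database for illustration:

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # in-memory database for illustration
crsr = connection.cursor()
crsr.execute("""CREATE TABLE emp (
    staff_number INTEGER PRIMARY KEY,
    fname VARCHAR(20),
    lname VARCHAR(30),
    gender CHAR(1),
    joining DATE)""")

# ? placeholders let the driver escape the values safely instead of
# pasting them into the SQL string by hand.
row = (23, "Rishabh", "Bansal", "M", "2014-03-28")
crsr.execute("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", row)
connection.commit()

result = crsr.execute("SELECT fname, lname FROM emp").fetchall()
print(result)  # [('Rishabh', 'Bansal')]
connection.close()
```

The commit-then-close ending is the same as before; only the way values reach the query changes.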

Java Example

Without going into too much detail concerning the differences between Python and
Java, the way these languages connect to and interact with a database is roughly the
same. The following Java method, for example, will create a table (The Java Tutorials,
2021):

public void createTable() throws SQLException {
    String createString =
        "create table COFFEES " +
        "(COF_NAME varchar(32) NOT NULL, " +
        "SUP_ID int NOT NULL, " +
        "PRICE numeric(10,2) NOT NULL, " +
        "SALES integer NOT NULL, " +
        "TOTAL integer NOT NULL, " +
        "PRIMARY KEY (COF_NAME), " +
        "FOREIGN KEY (SUP_ID) REFERENCES SUPPLIERS (SUP_ID))";
    try (Statement stmt = con.createStatement()) {
        stmt.executeUpdate(createString);
    } catch (SQLException e) {
        JDBCTutorialUtilities.printSQLException(e);
    }
}

The connection itself is established in a separate method using the DriverManager
class; the method above then uses the resulting connection (con) to execute the SQL
command string.

Inserting values into an SQL table using Java follows roughly the same steps, as seen
in the following example (The Java Tutorials, 2021):

public void populateTable() throws SQLException {
    try (Statement stmt = con.createStatement()) {
        stmt.executeUpdate("insert into COFFEES " +
            "values('Colombian', 00101, 7.99, 0, 0)");
        stmt.executeUpdate("insert into COFFEES " +
            "values('French_Roast', 00049, 8.99, 0, 0)");
        stmt.executeUpdate("insert into COFFEES " +
            "values('Espresso', 00150, 9.99, 0, 0)");
        stmt.executeUpdate("insert into COFFEES " +
            "values('Colombian_Decaf', 00101, 8.99, 0, 0)");
        stmt.executeUpdate("insert into COFFEES " +
            "values('French_Roast_Decaf', 00049, 9.99, 0, 0)");
    } catch (SQLException e) {
        JDBCTutorialUtilities.printSQLException(e);
    }
}

Handling the reading of tables also involves similar steps. The following example code
is used to retrieve data (The Java Tutorials, 2021):

public static void viewTable(Connection con) throws SQLException {
    String query = "select COF_NAME, SUP_ID, PRICE, SALES, TOTAL from COFFEES";
    try (Statement stmt = con.createStatement()) {
        ResultSet rs = stmt.executeQuery(query);
        while (rs.next()) {
            String coffeeName = rs.getString("COF_NAME");
            int supplierID = rs.getInt("SUP_ID");
            float price = rs.getFloat("PRICE");
            int sales = rs.getInt("SALES");
            int total = rs.getInt("TOTAL");
            System.out.println(coffeeName + ", " + supplierID + ", " + price +
                ", " + sales + ", " + total);
        }
    } catch (SQLException e) {
        JDBCTutorialUtilities.printSQLException(e);
    }
}

Compared to Python, Java's code is more verbose when accomplishing SQL tasks.
However, both languages can cleanly connect to the database and issue SQL commands
to it.

Summary

There is more to data query languages (DQLs) than just queries and databases.
When they extend into application programming, DQLs allow a simple program to
become interactive, usable, and secure. By adding a database, the application
becomes a three-tier application in which the client, the programming language,
and the database are divided into, yet connected across, layers. These layers
facilitate smoother development and higher security. The connection between the
database and programming layers can be tricky depending on the languages found
in each tier; however, most modern languages have built-in connection management.

When bringing it all together, proper testing is essential to a successful program.
Unit testing is an important option, as it facilitates quicker development and
cleaner testing. It can be done in any of the tiers and between connections of
the tiers. Database testing can be difficult, and unit testing helps resolve open
questions regarding quality and failures in code.

Tier programming, as well as connection management, can be exemplified with the
programming languages Python and Java. Both languages need proper connection
strings and proper SQL code to work with the database. However, in terms of the
actual code that each language uses, the two show differences that stem from their
age and programming type.

Knowledge Check

Did you understand this unit?

You can check your understanding by completing the questions for this unit on the
learning platform.

Good luck!
Evaluation 91

Congratulations!

You have now completed the course. After you have completed the knowledge tests on
the learning platform, please carry out the evaluation for this course. You will then be
eligible to complete your final assessment. Good luck!
Appendix 1
List of References

Amazon Web Services. (n.d.). Three-tier architecture overview. https://docs.aws.amazon.com/whitepapers/latest/serverless-multi-tier-architectures-api-gateway-lambda/three-tier-architecture-overview.html

Barney, L. (2008). Oracle database Ajax & PHP web application development. Oracle Press.

Beaulieu, A. (2020). Learning SQL (3rd ed.). O’Reilly.

Big Data Analytics News. (2014, February 24). Types and examples of NoSQL databases. https://bigdataanalyticsnews.com/types-examples-nosql-databases/

Bitnine. (2016, October 20). What is the graph database? https://bitnine.net/blog-graph-database/what-is-the-graph-database/

Blancco. (2019, August 2). What is data destruction? For data protection, the definition matters. https://www.blancco.com/resources/article-data-destruction-definition/

Chodorow, K. (2013). MongoDB: The definitive guide (2nd ed.). O’Reilly.

Christiansen, L. (2021, March 5). 5 stages in data lifecycle management. ZipReporting. https://zipreporting.com/en/data-management/data-lifecycle-management.html

Couchbase. (2022). Run your first N1QL query. https://docs.couchbase.com/server/current/getting-started/try-a-query.html

Cypher Query Language. (n.d.). Neo4j graph database platform. https://neo4j.com/developer/cypher/

DiFranza, A. (n.d.). 5 reasons why a computer professional needs SQL skills. Charlotte Business Journal. https://www.bizjournals.com/charlotte/news/2020/03/06/5-reasons-why-a-computer-professional-needs-sql.html

Elmasri, R., & Navathe, S. B. (2016). Fundamentals of database systems (Global ed.). Pearson.

Elmasri, R., & Navathe, S. B. (2017). Fundamentals of database systems (Global ed., 7th ed.). Pearson.

Enos, J. (2020, October 7). Generation of programming languages. Medium. https://medium.com/analytics-vidhya/generation-of-programming-languages-6e74aff63109

Fadlallah, H. (2021, July 2). An overview of the SQL server ISNUMERIC function. SQLShack. https://www.sqlshack.com/an-overview-of-the-sql-server-isnumeric-function/

Flynn, I. M. (n.d.). Generations, languages. Encyclopedia.com. https://www.encyclopedia.com/computing/news-wires-white-papers-and-books/generations-languages

GeeksforGeeks. (2020a, July 2). Introduction to Couchbase. https://www.geeksforgeeks.org/introduction-to-couchbase/

GeeksforGeeks. (2020b, August 14). SQL: Transactions. https://www.geeksforgeeks.org/sql-transactions/

GeeksforGeeks. (2021a, September 3). SQL: Date functions. https://www.geeksforgeeks.org/sql-date-functions/

GeeksforGeeks. (2021b, September 30). SQL: DDL, DQL, DML, DCL and TCL commands. https://www.geeksforgeeks.org/sql-ddl-dql-dml-dcl-tcl-commands/

GeeksforGeeks. (2021c, September 30). SQL using Python. https://www.geeksforgeeks.org/sql-using-python/

Gill, N. S. (2022, February 3). Database unit testing and test-driven database development. Xenonstack. https://www.xenonstack.com/blog/database-unit-testing

How to GraphQL. (n.d.). Core concepts. Prisma. https://www.howtographql.com/basics/2-core-concepts/

IBM Cloud Education. (2020, October 28). Three-tier architecture. IBM. https://www.ibm.com/cloud/learn/three-tier-architecture

IBM Cloud Education. (2021, June 29). Structured vs. unstructured data: What’s the difference? IBM. https://www.ibm.com/cloud/blog/structured-vs-unstructured-data

JavaTPoint. (2021). Neo4j tutorial. https://www.javatpoint.com/neo4j-tutorial

The Java Tutorials. (2021). Lesson: JDBC basics. Oracle. https://docs.oracle.com/javase/tutorial/jdbc/basics/index.html

Khalil, M. (2018, October 15). SQL Server, PostgreSQL, MySQL... what's the difference? Where do I start? DataCamp. https://www.datacamp.com/community/blog/sql-differences

Meier, A., & Kaufmann, M. (2019). SQL & NoSQL databases: Models, languages, consistency options and architectures for big data management. Springer. https://doi.org/10.1007/978-3-658-24549-8

MongoDB. (2021a). Query documents. https://docs.mongodb.com/manual/tutorial/query-documents/

MongoDB. (2021b). What is a document database? https://www.mongodb.com/document-databases

Moran, R. W. (2019, January 17). Getting to know OLAP and MDX: Microsoft's new multidimensional database tools. ITPro Today. https://www.itprotoday.com/sql-server/getting-know-olap-and-mdx

MySQL. (n.d.). Integer types (exact value): INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT. In MySQL 8.0 reference manual. Oracle. https://dev.mysql.com/doc/refman/8.0/en/integer-types.html

Neo4j. (2022a). Cypher style guide. https://neo4j.com/developer/cypher/style-guide/

Neo4j. (2022b). User defined procedures and functions. https://neo4j.com/developer/cypher/procedures-functions/

Neo4j. (2022c). What is a graph database? https://neo4j.com/developer/graph-database/

Pedamkar, P. (2020a). Data definition language. EDUCBA. https://www.educba.com/data-definition-language/

Pedamkar, P. (2020b). PLSQL vs SQL. EDUCBA. https://www.educba.com/pl-sql-vs-sql/

Raj, P., & Deka, C. G. (2018). A deep dive into NoSQL databases: The use cases and applications. Elsevier Science & Technology.

Risch, T. (2016). Query language. In L. Liu & M. Özsu (Eds.), Encyclopedia of database systems. Springer. https://doi.org/10.1007/978-1-4899-7993-3_1090-2

Sharma, A. (2020). Introduction to SQL clauses. EDUCBA. https://www.educba.com/sql-clauses/

Snaidero, B. (2021, March 18). Using the SQL ISNULL() function. MSSQLTips. https://www.mssqltips.com/sqlservertip/6776/sql-isnull-function-examples/

Stemmler, K. (2021, February 16). What is GraphQL? GraphQL introduction. Apollo Blog. https://www.apollographql.com/blog/graphql/basics/what-is-graphql-introduction/

Taylor, A. G. (2019). SQL all-in-one for dummies (3rd ed.). John Wiley & Sons.

TechGig Correspondent. (n.d.). Top 5 NoSQL databases for data scientists in 2020. https://content.techgig.com/top-5-nosql-databases-for-data-scientists-in-2020/articleshow/78330888.cms

Technical Matters. (2020, March 23). Document databases: How do document stores work? Ionos. https://www.ionos.com/digitalguide/hosting/technical-matters/document-database/

TechOnTheNet. (2022). SQL server functions. https://www.techonthenet.com/sql_server/functions/getdate.php

Vaish, G. (2013). Getting started with NoSQL: Your guide to the world and technology of NoSQL. Packt Publishing.

Watt, A., & Eng, N. (2014). SQL: Structured query language. In A. Watt (Ed.), Database design (2nd ed.). BCcampus Open Education. https://opentextbc.ca/dbdesign01/chapter/sql-structured-query-language/

Watt, A., & Eng, N. (2022). Database design (2nd ed.). BCcampus Open Education. https://opentextbc.ca/dbdesign01/

Webber, J. D., & Van Bruggen, R. (2020). Graph databases for dummies (Neo4j special ed.). John Wiley & Sons.
Appendix 2
List of Tables and Figures

SQL Commands
Source: Krista Sheely (2022), based on GeeksforGeeks (2021b).

SQL Text Types
Source: Krista Sheely (2022), based on Beaulieu (2020).

Required Storage and Range for Integer Types Supported by MySQL
Source: Krista Sheely (2022), based on MySQL (n.d.).

Date and Time Data Type Syntax
Source: Krista Sheely (2022), based on MySQL (n.d.).

Query Clauses
Source: Krista Sheely (2022), based on Beaulieu (2020).

A Graph Model for Car Insurance Fraud Investigation
Source: Krista Sheely (2022), based on Bitnine (2016).

Three-Tier Architecture
Source: Krista Sheely (2022), based on Amazon Web Services (n.d.).

All other tables and figures
Source: Krista Sheely (2022).