
Unit - 1

UNIT I RELATIONAL DATABASES 10


Purpose of Database System – Views of data – Data Models – Database System Architecture – Introduction to relational
databases – Relational Model – Keys – Relational Algebra – SQL fundamentals – Advanced SQL features – Embedded
SQL – Dynamic SQL – Introduction to NoSQL
Database Management System (DBMS)

▪ Collection of interrelated data
▪ Set of programs to access the data
▪ DBMS contains information about a particular enterprise
▪ DBMS provides an environment that is both convenient and efficient to use.
▪ Database Applications:
  • Banking: all transactions
  • Airlines: reservations, schedules
  • Universities: registration, grades
  • Sales: customers, products, purchases
  • Manufacturing: production, inventory, orders, supply chain
  • Human resources: employee records, salaries, tax deductions
▪ Databases touch all aspects of our lives

1. Purpose of Database Systems (Advantages of Databases)


Consider the part of a bank enterprise that keeps information about all customers and savings accounts. To
allow users to manipulate the information, the system has a number of application programs that manipulate
the files, including programs to add a new account, find the balance of an account, generate monthly statements,
and debit or credit an account. System programmers wrote these application programs to meet the needs of the
bank, and the data they use are kept in ordinary files. Such a typical file-processing system has several major
disadvantages:
1. Data redundancy and Data inconsistency:
Since different programmers create the files and application programs over a long period, the
various files are likely to have different formats and the programs may be written in several
programming languages. Moreover, the same information may be duplicated in several places. This
redundancy may lead to data inconsistency.
For example, the address and telephone number of a particular student may be present in a file
that is maintained in the office, and also in a file maintained in the department. This redundancy leads to
higher storage and access cost. If a change in the student address is updated in the office file but not in
the department file, information mismatch or data inconsistency occurs.
2. Difficulty in accessing data:
Data in file systems may be in different formats, and the programs may be written in several
programming languages across different platforms. This makes the data difficult to access.
For example, suppose the bank officers need the names of all customers who live in a particular
area, say “Madurai”. The designers who wrote the programs may not have anticipated this request, so they
have to either write a new program to generate the list or extract the information
manually. Neither option is satisfactory.
3. Data isolation:
Data are scattered in various files, and the files may be in different locations. Writing a new
application program to retrieve the appropriate data is therefore difficult.
4. Integrity problems:
The data values stored in the database must satisfy certain types of consistency constraints or
conditions.

1 Department of IT, PSNA CET


For example, the bank may require that the balance of an account never fall below
Rs. 500. Developers enforce this constraint by adding appropriate code to the various
application programs. However, when the constraint changes, for example when the minimum balance is raised to
Rs. 1000, every program that enforces it must be found and changed, which is difficult.
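For contrast, a DBMS lets the minimum-balance rule be declared once in the schema as a CHECK constraint, so every application is checked by the database itself. The sketch below uses Python's built-in `sqlite3` module; the table and column names (`account`, `acc_no`, `balance`) and the helper functions are illustrative assumptions, not part of the original notes.

```python
import sqlite3

def make_bank(min_balance):
    """Create an in-memory account table whose CHECK constraint enforces min_balance."""
    conn = sqlite3.connect(":memory:")
    # CHECK constraints cannot take bound parameters, so the value is inlined via int().
    conn.execute(
        "CREATE TABLE account ("
        " acc_no TEXT PRIMARY KEY,"
        f" balance INTEGER NOT NULL CHECK (balance >= {int(min_balance)}))"
    )
    return conn

def try_insert(conn, acc_no, balance):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO account VALUES (?, ?)", (acc_no, balance))
        return True
    except sqlite3.IntegrityError:
        return False

bank = make_bank(500)
print(try_insert(bank, "A100", 700))  # True
print(try_insert(bank, "A101", 300))  # False: violates balance >= 500
```

Raising the minimum to Rs. 1000 then means changing one declaration in the schema rather than every application program.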
5. Atomicity problems:
A system may fail because of a hardware or software fault. In such cases, the information should
not be lost or left half-updated; it should be restored to the consistent state it had before the failure. This
assurance is given by the property called atomicity in database systems. In applications built on a file
system, however, it is difficult to ensure the atomicity property.
For example, consider a program that transfers Rs. 50 from account A to account B. If a system
failure occurs during the execution of the program, Rs. 50 may be deducted from account A without
being credited to account B. This leaves the data in an inconsistent state.
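The transfer example can be sketched with Python's `sqlite3` module: the debit and the credit run inside one transaction, so a failure between them rolls the debit back. The starting balances (Rs. 1000 and Rs. 2000), the `fail_midway` flag that simulates a crash, and the function names are assumptions made for illustration.

```python
import sqlite3

def open_bank():
    """Create two hypothetical accounts A and B in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst atomically; return True if the transfer committed."""
    try:
        with conn:  # one transaction: commit if the block succeeds, roll back otherwise
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
        return True
    except RuntimeError:
        return False  # the debit has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account"))

bank = open_bank()
transfer(bank, "A", "B", 50, fail_midway=True)
print(balances(bank))  # {'A': 1000, 'B': 2000}: no money lost
transfer(bank, "A", "B", 50)
print(balances(bank))  # {'A': 950, 'B': 2050}
```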
6. Concurrent access anomalies:
For improved performance and faster response, many systems allow multiple
users to update the data simultaneously. If such concurrent accesses are not controlled, data inconsistency may
result.
For example, suppose account A has Rs. 1000 and two customers withdraw Rs. 100 and Rs. 200
from it concurrently. If both read the balance of Rs. 1000 before either writes, the final balance may be
Rs. 900 or Rs. 800 (whichever write happens last) instead of the correct Rs. 700.
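The anomaly can be reproduced with a small deterministic simulation (no real threads): in the unsafe schedule every withdrawal reads the balance before any of them writes, so later writes overwrite earlier ones. The function names are hypothetical.

```python
def withdraw_interleaved(balance, amounts):
    """Unsafe schedule: all withdrawals read the balance first, then each writes in turn."""
    reads = [balance for _ in amounts]       # every customer reads the same starting value
    for read_value, amount in zip(reads, amounts):
        balance = read_value - amount        # each write overwrites the previous one
    return balance

def withdraw_serial(balance, amounts):
    """Safe schedule: each withdrawal reads the latest balance before writing."""
    for amount in amounts:
        balance = balance - amount
    return balance

print(withdraw_interleaved(1000, [100, 200]))  # 800: the Rs. 100 withdrawal is lost
print(withdraw_interleaved(1000, [200, 100]))  # 900: the Rs. 200 withdrawal is lost
print(withdraw_serial(1000, [100, 200]))       # 700: the correct result
```

A DBMS prevents the first outcome through concurrency control, which makes concurrent transactions behave as if they ran one after another.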
7. Security problems:
Not every user of the database system should be able to access all the data. For example,
students should not be able to modify their marks in the database, so that access right must be denied
to them to guarantee security. Such security constraints are difficult to enforce in a file-processing
system.
All these disadvantages faced by the file processing systems are resolved in the database management
systems.
2. Views of Data
The major purpose of a database system is to provide users with an abstract view of the data. That is, the
system hides certain details of how the data are stored and maintained.
2.1. Data abstraction:
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Developers hide this
complexity from users through three levels of abstraction, which simplify users’ interactions with
the system.
Levels of abstraction:
Figure: Levels of abstraction (from top to bottom: the View level with View 1, View 2, …, View n; the Logical level; the Physical level)


Physical level:
It is the lowest level of abstraction, which describes how the data are actually stored in the database,
for example, whether they are stored sequentially or randomly. In addition, it describes the complex
low-level data structures in detail, i.e., whether the data are stored in an array, stack,
queue, list, etc.
Logical level:
It is the next-higher level of abstraction which describes the actual data stored in the database and
the interrelationship which exists among those data. This level uses various data models to describe the
data. The database administrators who must decide what information to keep in the database use this
level of abstraction.
View level:
It is the highest level of abstraction, which describes only part of the entire database. Many users of
the database system do not need all the information available in the overall database; instead, they need
to access only a part of it. For example, customers need not be aware of the salary details
of the employees working in the bank.
2.2. Instances and Schemas:
The state of the database changes over time as information is inserted and deleted by the users.
Instance:
The collection of information stored in the database at a particular moment is called an instance of
the database.
Schema:
The overall design of the database is called the database schema. The schema changes
infrequently, if at all. The physical schema (internal level) describes the database design at the physical level. The
logical schema (conceptual level) describes the database design at the logical level. A database may also
have several schemas at the view level, called subschemas, each describing a different view of the
database.
Figure: Three schema architecture (end users 1 … n work with External View 1 … External View n at the external level; below them is the Conceptual Schema at the conceptual level, then the Internal Schema at the internal level, then the stored database)

2.3. Data Independence:


The concept of data independence can be defined as the capacity to change the schema at one
level of a database system without having to change the schema at the next higher level.
Two types of data independence:
2.3.1. Logical data independence:
It is the capacity to change the conceptual schema without having to change external schemas or
application programs. That is, the logical design (structure) of the database may be changed without
changing the application program.
2.3.2. Physical data independence:
It is the capacity to change the internal schema without having to change conceptual schema.
That is, the way in which the data are stored in the database may be changed without changing the
structure of the database.
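Both kinds of data independence can be illustrated with Python's `sqlite3` module: an application query written once keeps working after a physical change (adding an index, i.e., a new access path) and after a simple logical change (adding an attribute). The `student` table, its rows and the `students_in` function are hypothetical; larger logical changes (such as splitting a table) usually also need views to shield the applications.

```python
import sqlite3

def students_in(conn, dept):
    """Application query, written once against the logical schema."""
    rows = conn.execute("SELECT name FROM student WHERE dept = ? ORDER BY name", (dept,))
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Anu", "IT"), (2, "Arun", "CSE"), (3, "Priya", "IT")])
before = students_in(conn, "IT")

# Physical change: a new index changes how data can be reached, not the query text.
conn.execute("CREATE INDEX idx_student_dept ON student(dept)")
after_index = students_in(conn, "IT")

# Logical change: a new attribute in the conceptual schema; the old query is unaffected.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_alter = students_in(conn, "IT")

print(before == after_index == after_alter == ["Anu", "Priya"])  # True
```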
3. Data Models
Data model is a collection of conceptual tools for describing data, data relationships, data semantics (meaning)
and consistency constraints. A data model provides a way to describe the design of a database at the physical,
logical and view level.
The data models can be classified into four categories.
• Relational model
• Entity-relationship model
• Object-based data model
• Semi structured data model
In addition to these four data models, two older data models preceded the relational
data model:
• Network data model
• Hierarchical data model
3.1. Relational model
The relational model uses a collection of tables to represent both the data and the relationships
among those data. Each table has multiple columns, and each column has a unique name. Tables allow quick
comparison by row and column, and an item is retrieved easily by finding the intersection of a row and a column.
The tables are called relations. Each row of data is equivalent to a record and is called a tuple. Each column of
data is equivalent to a field and is called an attribute. A relational database is not one big table; instead, it is
usually designed as many related tables.
Example: Account relation
Account No.   Balance
A100          500
A101          700
A102          400
A103          500

Customer relation
Customer Name   SSN    Account No.   City
Jeya            7465   A100          Chennai
Saleem          7466   A101          Madurai
Hari            7467   A102          Madurai
Jeya            7465   A103          Chennai
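As a sketch of how related tables are used together, the two relations above can be loaded into SQLite through Python's `sqlite3` module and linked through their shared Account No. column. The column names (`account_no`, `customer_name`, etc.) are assumptions adapted from the headings. This answers the earlier “customers in Madurai” request with one query instead of a new program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE customer (customer_name TEXT, ssn INTEGER, account_no TEXT, city TEXT)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A100", 500), ("A101", 700), ("A102", 400), ("A103", 500)])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)",
                 [("Jeya", 7465, "A100", "Chennai"), ("Saleem", 7466, "A101", "Madurai"),
                  ("Hari", 7467, "A102", "Madurai"), ("Jeya", 7465, "A103", "Chennai")])

# The relations are linked by matching values in the shared account_no column.
madurai = conn.execute(
    "SELECT c.customer_name, a.balance"
    " FROM customer AS c JOIN account AS a ON c.account_no = a.account_no"
    " WHERE c.city = 'Madurai' ORDER BY c.customer_name").fetchall()
print(madurai)  # [('Hari', 400), ('Saleem', 700)]
```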

Basic principles involved in creating a relational database


1. The order of attributes in a table is irrelevant, i.e., a table with attributes rollno and name can be
written in either order:
Rollno   Name

Or

Name   Rollno

2. Each tuple must be identified by the primary key.
3. Each table must have a unique name that identifies the relation.
4. There can be no duplicate attributes, i.e., two attributes in a single table cannot both be called Name:
Name   Name   (not allowed)

5. There can be only one value in each cell (row-column intersection) of a table, i.e., the rollno value of a
single tuple cannot be “1,2”:
Rollno   Name
1,2      Anu   (not allowed)



Three basic operations in a relational database:
1. Select: creates a subset of rows that satisfy certain conditions.
2. Join: combines related tables to provide the user with more information.
3. Project: creates a subset consisting of chosen columns of a table, giving the user a new table
that contains only the information required.
Diagram of relational database model (example table):
Emp. No   Name      Salary
1         Anu       2000
2         Pravina
3         Ajith     3000

3.2. Entity-Relationship model
ER data model is based on the perception of a real world that consists of a collection of basic
objects, called entities, and the relationships among those objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects.
Entities are described in the database by a set of attributes. Attributes are properties of an entity. A
relationship is an association among several entities. The set of all entities of the same type and the set
of all relationships of the same type are termed an entity set and a relationship set, respectively.
The overall logical structure of a database can be expressed graphically by an ER diagram.
Entity: Customer, Account
Attribute: Cust. Name, cust. Street, cust. Id, cust. City, acc. No, balance
Relationship: can deposit
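Figure: ER diagram for a bank database (entity Customer with attributes Cust. Name, Cust. Street, Cust. Id, Cust. City; entity Account with attributes Acc. No, Balance; relationship Can deposit connecting Customer and Account)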

3.3. Object-based data model


This model is based on a collection of objects. An object contains values stored in instance
variables (data members) within the object. An object also contains bodies of code, called methods, that
operate on the object. Objects that contain the same types of values and the same methods are grouped
together into a class. Thus a class is a collection of similar objects. This combination of data and methods
forms a type definition similar to an abstract data type in a programming language. The only way in which
one object can access the data of another object is by invoking a method of that object. This action is called
sending a message to the object. Object-based databases allow complex data types.
The object-relational data model combines features of the object-oriented data model and the relational
data model: it extends the relational data model by providing a richer type system, including complex data
types and object orientation.
3.4. Semi structured data model
This model permits the specification of data where individual data items of the same type may
have different sets of attributes. The Extensible Markup Language (XML) is used widely to represent
semi structured data.
<Catalog>
  <CD>
    <Title> Empire </Title>
    <Artist> Bob </Artist>
    <Country> India </Country>
  </CD>
  <CD>
    <Title> </Title>
    <Artist> XYZ </Artist>
    <Country> India </Country>
  </CD>
</Catalog>

3.5. Hierarchical data model


Two major concepts in the hierarchical data model are records and parent-child relationships. A
record is a collection of field values for an entity. Records are grouped into record types, and each record
type is given a unique name. A parent-child relationship type (PCR type) is a 1:N relationship between two
record types. The record type on the 1 side is called the parent record type, and the one on the N side is
called the child record type. An instance or occurrence of a PCR type consists of one record of the parent
record type and a number of records of the child record type. A hierarchical schema (structure of the
database) is displayed as a hierarchical diagram.
Example:

Department (Dname, Dno, Dlocation, Mgrname)
    Employee (Eno, Ename, Bdate, Address)
    Project (Pname, Pno, Plocation)

The above figure shows three record types, namely Department, Employee and Project. There are two
PCR types: (Department, Employee) and (Department, Project). Each occurrence of the
(Department, Employee) PCR type relates one department record to the records of the many employees
who work in that department.

Department: Research
Employee: Anu, Arun, Priya, Saleem


Each occurrence of the (Department, Project) PCR type relates a department record to the records of projects
controlled by that department.
Department: Research
Project: Product 1, Product 2, Product 3, Product 4

Hierarchical schema defines a tree data structure.


Department

Employee Project

Here, a record type corresponds to a node of the tree and a PCR type corresponds to an edge of the tree.
Virtual parent-child relationships
The hierarchical model has problems representing M:N relationships. To solve this problem, the virtual
parent-child relationship (VPCR) type was introduced.
Figure: record types Employee (E), Project (P) and ppointer (R)
Here, Project is called the virtual parent of ppointer, and ppointer is called the virtual child. This also
avoids redundancy.
Data definition in Hierarchical model:
The HDDL (Hierarchical data definition language) is used for defining the schema.
Data manipulation in Hierarchical model:
The HDML (Hierarchical data manipulation language) is used for inserting, deleting, selecting,
replacing the data.
3.6. Network data model:
The basic data structures used in the network model are records and sets. Data is stored in records. A
record is a collection of data values. Records are classified into record types.
Example:
Student
Name   Rollno   Address   Dept

Here the record type is Student, and its data items are Name, Rollno, Address and Dept. The values of these
data items for one student form a record.
1:N Relationship representation:
A 1:N relationship between two record types is called a set type. Set types are represented by a Bachman
diagram.
Figure: Bachman diagram with owner record type Department (Dname, Dno, Dlocation) and member record type Student (Name, Rollno, Address, Dept), linked by the set type Major-Dept
Each set type definition has:
• Set type name [Major-Dept]
• Owner record type [Department]
• Member record type [Student]
There are many set occurrences (or set instances) corresponding to a set type. Each instance relates one record
from the owner record type to n records from the member record type.

Set instances:
Owner Department CSE: member Student records Arun, Balaji, Chandru, Dinesh, Elango
Owner Department Physics: member Student records Sai, Sharmila, Sumitha

M:N Relationship representation:
Assume that an employee can work on several projects and that a project typically has several employees
working on it. To represent this M:N relationship, the two record types Employee (Eno, …) and
Project (Pno, …) are used together with an additional record type, Works on (Hours). This additional record
type is called a linking record type (or dummy record type).
The network data definition language is used for creating the schema, and the network data manipulation
language (commands such as GET, FIND, STORE, ERASE, MODIFY, CONNECT, DISCONNECT,
RECONNECT) is used for data manipulation.
Illustrate database system structure with suitable block diagram.



A database system is partitioned into modules that deal with each of the responsibilities of the overall system.
The functional components of a database system can be broadly divided into the storage manager and
the query processor components. The storage manager is important because databases typically require a large
amount of storage space. The query processor is important because it helps the database system simplify and
facilitate access to data.
It is the job of the database system to translate updates and queries written in a nonprocedural language, at the
logical level, into an efficient sequence of operations at the physical level.
Query Processor
The query processor components include
· DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
· DML compiler, which translates DML statements in a query language into an evaluation plan consisting
of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization, that is, it picks the lowest cost evaluation plan
from among the alternatives.
· Query evaluation engine, which executes low-level instructions generated by the DML compiler.
Storage Manager
A storage manager is a program module that provides the interface between the low-level data stored in the
database and the application programs and queries submitted to the system. The storage manager is responsible
for the interaction with the file manager. The raw data are stored on the disk using the file system, which is
usually provided by a conventional operating system. The storage manager translates the various DML
statements into low-level file-system commands. Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database.
The storage manager components include:
· Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks
the authority of users to access data.



· Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
· File manager, which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
· Buffer manager, which is responsible for fetching data from disk storage into main memory, and
deciding what data to cache in main memory. The buffer manager is a critical part of the database system, since
it enables the database to handle data sizes that are much larger than the size of main memory.
Transaction Manager
A transaction is a collection of operations that performs a single logical function in a database application.
Each transaction is a unit of both atomicity and consistency. Thus, we require that transactions do not violate
any database-consistency constraints. That is, if the database was consistent when a transaction started, the
database must be consistent when the transaction successfully terminates. The transaction manager ensures that
the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating
system crashes) and transaction failures.
Application Architectures
▪ Two-tier architecture: E.g. client programs using ODBC/JDBC to communicate with a database
▪ Three-tier architecture: E.g. web-based applications, and applications built using “middleware”

Why do we require Keys in DBMS?


• We use keys to define various types of integrity constraints in a database. A table represents a
collection of records for a relation. There might be thousands of these records, and some of them might
even be duplicates.
• Thus, we need a way to identify each of these records uniquely, i.e., without any duplicates. Keys
solve this problem.
• For example, consider a database of all the students studying in a college. Which attribute identifies
each student uniquely? A name alone may not, since two students can share a name; we might need a
combination such as name, department, section and year. Alternatively, we can use only the university
roll number and fetch all the other details based on it.
• A key in a DBMS can be a combination of multiple attributes (columns), or it can be a single attribute.
The main purpose of a key is to give every record a unique identity of its own.

What are the different types of Keys in DBMS?


Keys are of seven broad types in DBMS:
• Candidate Key
• Primary Key
• Foreign Key
• Super Key
• Alternate Key
• Composite Key
• Unique Key
1. Primary Key
The primary key is a column or a set of columns of a table that identifies every record in that table
uniquely. A table can have only one primary key, and no two rows can have the same primary-key value.
The PRIMARY KEY (PK) constraint placed on a column or set of columns does not allow a null value or a
duplicate in them. A primary-key value that is referenced by a foreign key (explained below) cannot be
changed freely, because the referencing rows would then point to a value that no longer exists.
2. Super Key
A super key is any set of one or more attributes that, taken together, identifies every row in a table
uniquely. Adding more attributes to a super key still gives a super key.
A super key is a superset of a candidate key (explained below): every candidate key is a super key, and the
primary key of a table is chosen from among its candidate keys, i.e., its minimal super keys.
3. Candidate Key
A candidate key is a minimal set of attributes that identifies the rows of a table uniquely: no attribute can
be removed from it without losing uniqueness. The primary key is selected from the candidate keys, so a
candidate key has the same uniqueness property as the primary key. A table can have multiple candidate keys.
4. Alternate Key
A table may have several candidate keys that could serve as the primary key, but only one is chosen. The
candidate keys that did not become the primary key are known as alternate keys.
5. Foreign Key
A foreign key establishes a relationship between two tables. It requires every value in a column or set of
columns to match a value of the referenced table’s primary key (or to be null). A foreign key helps us
maintain referential integrity.
6. Composite Key
A composite key is a key made up of multiple attributes that together identify every tuple in a table
uniquely. The individual attributes may not be unique when considered separately, but taken together
they guarantee uniqueness.
7. Unique Key
A unique key is a column or a set of columns whose values must be unique across the records of a table.
It differs from a primary key in that it can accept null values: standard SQL and many systems allow
several nulls in a unique column, while some, such as SQL Server, allow only one. A primary key, on the
other hand, cannot have a null value.
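The key types can be tried out in SQLite through Python's `sqlite3` module. The sketch below declares a primary key, an alternate key (UNIQUE), a unique key that accepts NULLs, a foreign key and a composite key, then records which inserts the DBMS accepts. All table and column names are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and, like standard SQL, it allows several NULLs in a UNIQUE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""CREATE TABLE department (
    dept_id   TEXT PRIMARY KEY NOT NULL,  -- primary key (NOT NULL: SQLite would otherwise allow NULL here)
    dept_name TEXT NOT NULL UNIQUE        -- alternate (candidate) key
)""")
conn.execute("""CREATE TABLE student (
    rollno  INTEGER PRIMARY KEY,
    email   TEXT UNIQUE,                  -- unique key: NULLs are allowed
    dept_id TEXT REFERENCES department(dept_id)   -- foreign key
)""")
conn.execute("""CREATE TABLE enrolment (
    rollno    INTEGER REFERENCES student(rollno),
    course_id TEXT,
    PRIMARY KEY (rollno, course_id)       -- composite key
)""")

def accepted(sql):
    """Return True if the statement succeeds, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted("INSERT INTO department VALUES ('IT', 'Information Technology')"),  # True
    accepted("INSERT INTO department VALUES ('IT', 'Another name')"),            # False: duplicate primary key
    accepted("INSERT INTO department VALUES ('CS', 'Information Technology')"),  # False: duplicate alternate key
    accepted("INSERT INTO student VALUES (1, NULL, 'IT')"),                      # True
    accepted("INSERT INTO student VALUES (2, NULL, 'IT')"),                      # True: second NULL in unique key
    accepted("INSERT INTO student VALUES (3, 'c@x.in', 'CSE')"),                 # False: no department 'CSE'
    accepted("INSERT INTO enrolment VALUES (1, 'DB')"),                          # True
    accepted("INSERT INTO enrolment VALUES (1, 'DB')"),                          # False: duplicate composite key
]
print(results)
```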

Relational algebra operations can be divided into the following groups:


Unary Relational Operations
• SELECT (symbol: σ)
• PROJECT (symbol: π)
• RENAME (symbol: ρ)
Relational Algebra Operations From Set Theory
• UNION (∪)
• INTERSECTION (∩)
• DIFFERENCE (-)
• CARTESIAN PRODUCT (X)
Binary Relational Operations
• JOIN (⋈)
• DIVISION (÷)
SELECT (σ)



The SELECT operation selects the subset of tuples that satisfy a given selection condition (predicate). It is
denoted by the symbol sigma (σ):
σ_p(r)
where
σ is the selection operator,
r is the relation (the name of the table), and
p is the selection predicate, a propositional-logic formula built from comparisons joined by and, or, not.
Example 1
σ topic = "Database" (Tutorials)
Output - Selects tuples from Tutorials where topic = 'Database'.
Example 2
σ topic = "Database" and author = "guru99"( Tutorials)
Output - Selects tuples from Tutorials where the topic is 'Database' and 'author' is guru99.
Example 3
σ sales > 50000 (Customers)
Output - Selects tuples from Customers where sales is greater than 50000
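A minimal sketch of σ in Python: a relation is modelled as a list of dictionaries (one per tuple) and the predicate p as a function returning True or False. The rows of the `tutorials` relation are made up for illustration.

```python
def select(relation, predicate):
    """σ_p(r): keep the tuples of r for which predicate p is true."""
    return [t for t in relation if predicate(t)]

# Hypothetical Tutorials relation; the rows are invented for the example.
tutorials = [
    {"title": "SQL Basics", "topic": "Database", "author": "guru99"},
    {"title": "Normal Forms", "topic": "Database", "author": "anu"},
    {"title": "Sorting", "topic": "Algorithms", "author": "guru99"},
]

db = select(tutorials, lambda t: t["topic"] == "Database")
db_guru = select(tutorials, lambda t: t["topic"] == "Database" and t["author"] == "guru99")
print([t["title"] for t in db])       # ['SQL Basics', 'Normal Forms']
print([t["title"] for t in db_guru])  # ['SQL Basics']
```

In SQL, the predicate corresponds to the WHERE clause.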

Projection(π)
Projection keeps only the attributes mentioned in the projection list and eliminates all other attributes of
the input relation. The result is a vertical subset of the relation.
Projection extracts the values of the specified attributes and eliminates duplicate tuples from the result. It
is denoted by the symbol pi (π). In other words, it keeps specific columns of a relation and discards the
other columns.
Example of Projection:
Consider the following Customers table:
CustomerID   CustomerName   Status
1            Google         Active
2            Amazon         Active
3            Apple          Inactive
4            Alibaba        Active

Here, the projection of CustomerName and Status gives
π CustomerName, Status (Customers)
CustomerName   Status
Google         Active
Amazon         Active
Apple          Inactive
Alibaba        Active
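The duplicate elimination is easiest to see when projecting onto Status alone. In this sketch, each tuple is a dictionary and the result is a Python set, so repeated tuples collapse just as in relational algebra (SQL’s SELECT, by contrast, keeps duplicates unless DISTINCT is used). The function name `project` is an illustrative choice.

```python
def project(relation, attributes):
    """π_attributes(r): keep only the listed attributes and drop duplicate tuples."""
    return {tuple(t[a] for a in attributes) for t in relation}

customers = [
    {"CustomerID": 1, "CustomerName": "Google", "Status": "Active"},
    {"CustomerID": 2, "CustomerName": "Amazon", "Status": "Active"},
    {"CustomerID": 3, "CustomerName": "Apple", "Status": "Inactive"},
    {"CustomerID": 4, "CustomerName": "Alibaba", "Status": "Active"},
]

print(len(project(customers, ["CustomerName", "Status"])))  # 4: no duplicates to remove
print(sorted(project(customers, ["Status"])))               # [('Active',), ('Inactive',)]
```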
Union operation (∪)
UNION is denoted by the symbol ∪. A ∪ B includes all tuples that are in A, in B, or in both. It is written as:
Result ← A ∪ B
For a union operation to be valid, the following conditions must hold:
• A and B must have the same number of attributes.
• The domains of corresponding attributes must be compatible.
Duplicate tuples are removed from the result automatically.
Example
Consider the following tables.
Table A
column 1   column 2
1          1
1          2

Table B
column 1   column 2
1          1
1          3

A ∪ B gives
Table A ∪ B
column 1   column 2
1          1
1          2
1          3

Set Difference (-)
It is denoted by the - symbol. The result of A - B is a relation that includes all tuples that are in A but not
in B.
• The attribute names of A have to match the attribute names of B.
• The two operand relations A and B must be union compatible.
Example
A - B
Table A - B
column 1   column 2
1          2
Intersection
Intersection is denoted by the symbol ∩.
A ∩ B
defines a relation consisting of all tuples that are in both A and B. A and B must be union compatible.

Example:
A ∩ B
Table A ∩ B
column 1   column 2
1          1
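Because relations are sets of tuples, the three set operations map directly onto Python’s set operators. This sketch reuses Tables A and B from the examples; each tuple is (column 1, column 2), so A and B are union compatible. The last line checks the standard identity A ∩ B = A - (A - B).

```python
A = {(1, 1), (1, 2)}   # Table A: (column 1, column 2)
B = {(1, 1), (1, 3)}   # Table B: same attributes, so A and B are union compatible

print(sorted(A | B))   # union:        [(1, 1), (1, 2), (1, 3)]
print(sorted(A - B))   # difference:   [(1, 2)]
print(sorted(A & B))   # intersection: [(1, 1)]
print(A & B == A - (A - B))  # True: intersection expressed with difference
```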



Cartesian product (X)
The Cartesian product A X B combines every tuple of A with every tuple of B, so the result has the columns of
both relations and (number of tuples in A) × (number of tuples in B) rows. A Cartesian product is rarely
meaningful on its own; it becomes meaningful when it is followed by other operations, such as a selection.
Example – Cartesian product
σ A.column 2 = '1' (A X B)
Output – The result pairs each row of A whose column 2 has the value 1 with every row of B:
σ A.column 2 = '1' (A X B)
A.column 1   A.column 2   B.column 1   B.column 2
1            1            1            1
1            1            1            3
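A short Python sketch of the same computation, using `itertools.product` to form A X B from Tables A and B and then applying the selection on A.column 2 (position 1 of each combined tuple).

```python
from itertools import product

A = [(1, 1), (1, 2)]   # (A.column 1, A.column 2)
B = [(1, 1), (1, 3)]   # (B.column 1, B.column 2)

# A X B: every tuple of A concatenated with every tuple of B (2 * 2 = 4 tuples).
a_times_b = [a + b for a, b in product(A, B)]

# σ A.column 2 = 1 (A X B): position 1 holds A.column 2 in each combined tuple.
selected = [t for t in a_times_b if t[1] == 1]

print(len(a_times_b))  # 4
print(selected)        # [(1, 1, 1, 1), (1, 1, 1, 3)]
```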
Join Operations
A join operation is essentially a Cartesian product followed by a selection.
The join operation is denoted by ⋈.
It combines related tuples from different relations.
Types of JOIN:
Various forms of join operation are:
Inner Joins:
• Theta join
• EQUI join
• Natural join
Outer join:
• Left Outer Join
• Right Outer Join
• Full Outer Join
Inner Join:
In an inner join, only those tuples that satisfy the matching criteria are included, while the rest are excluded.
Let's study various types of Inner Joins:
Theta Join:
The general case of the JOIN operation is called a theta join. It is denoted by the symbol θ:
A ⋈θ B
A theta join can use any condition in its selection criteria.
For example:
A ⋈_(A.column 2 > B.column 2) B
Table A ⋈_(A.column 2 > B.column 2) B
A.column 1   A.column 2   B.column 1   B.column 2
1            2            1            1
EQUI join:
When a theta join uses only equality conditions, it becomes an equi join.
For example:
A ⋈_(A.column 2 = B.column 2) B
Table A ⋈_(A.column 2 = B.column 2) B
A.column 1   A.column 2   B.column 1   B.column 2
1            1            1            1
Joins are among the most expensive operations for an RDBMS to evaluate efficiently, which is one reason
why join algorithms and query optimization matter so much for performance.
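A theta join can be sketched as a nested loop over both relations that keeps each concatenated pair satisfying the condition; an equi join is the special case where the condition is an equality. This naive nested-loop strategy is only one way to evaluate a join (real systems also use hash and sort-merge joins), and the function name is illustrative.

```python
def theta_join(r, s, condition):
    """r ⋈_θ s: concatenate each pair of tuples (a, b) for which condition(a, b) holds."""
    return [a + b for a in r for b in s if condition(a, b)]

A = [(1, 1), (1, 2)]   # (A.column 1, A.column 2)
B = [(1, 1), (1, 3)]   # (B.column 1, B.column 2)

print(theta_join(A, B, lambda a, b: a[1] > b[1]))   # theta join: [(1, 2, 1, 1)]
print(theta_join(A, B, lambda a, b: a[1] == b[1]))  # equi join:  [(1, 1, 1, 1)]
```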
NATURAL JOIN (⋈)
A natural join can be performed only if there is a common attribute (column) between the relations; the
name and type of the attribute must be the same. Tuples are matched on equal values of the common
attribute, and that attribute appears only once in the result.
Example
Consider the following two tables:
C
Num   Square
2     4
3     9

D
Num   Cube
2     8
3     27

C ⋈ D
Num   Square   Cube
2     4        8
3     9        27
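A minimal natural-join sketch with tuples as Python dictionaries: two tuples match when they agree on every attribute name they share, and merging the dictionaries keeps each shared attribute only once. Tables C and D are those of the example.

```python
def natural_join(r, s):
    """r ⋈ s: match tuples on all shared attribute names; shared attributes appear once."""
    return [{**x, **y}
            for x in r for y in s
            if all(x[a] == y[a] for a in x.keys() & y.keys())]

C = [{"Num": 2, "Square": 4}, {"Num": 3, "Square": 9}]
D = [{"Num": 2, "Cube": 8}, {"Num": 3, "Cube": 27}]

for row in natural_join(C, D):
    print(row)
# {'Num': 2, 'Square': 4, 'Cube': 8}
# {'Num': 3, 'Square': 9, 'Cube': 27}
```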
OUTER JOIN
In an outer join, along with the tuples that satisfy the matching criteria, we also include some or all tuples
that do not match the criteria.
Left Outer Join (A ⟕ B)
The left outer join keeps all tuples of the left relation. If no matching tuple is found in the right relation,
the attributes of the right relation in the join result are filled with null values.

Consider the following two tables:
A
Num   Square
2     4
3     9
4     16

B
Num   Cube
2     8
3     27
5     125

A ⟕ B
Num   Square   Cube
2     4        8
3     9        27
4     16       NULL

Right Outer Join (A ⟖ B)
The right outer join keeps all tuples of the right relation. If no matching tuple is found in the left relation,
the attributes of the left relation in the join result are filled with null values.

A ⟖ B
Num   Cube   Square
2     8      4
3     27     9
5     125    NULL

Full Outer Join (A ⟗ B)
In a full outer join, all tuples from both relations are included in the result, irrespective of the matching
condition; missing values on either side are filled with nulls.

A ⟗ B
Num   Square   Cube
2     4        8
3     9        27
4     16       NULL
5     NULL     125
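The three outer joins can be sketched with one Python function whose two flags decide which unmatched tuples are kept; Python’s `None` stands in for SQL NULL. The sketch assumes the relations share exactly one attribute (`key`), as A and B share Num; the function and flag names are hypothetical.

```python
def outer_join(r, s, key, keep_left=True, keep_right=True):
    """Join r and s on the shared attribute `key`.

    keep_left / keep_right decide whether unmatched tuples of r / s are kept,
    padded with None (standing in for NULL) on the other side.
    """
    r_only = {a for t in r for a in t} - {key}   # attributes of r other than the key
    s_only = {a for t in s for a in t} - {key}   # attributes of s other than the key
    result = []
    for x in r:
        matches = [y for y in s if y[key] == x[key]]
        result.extend({**x, **y} for y in matches)          # matching pairs (inner join part)
        if not matches and keep_left:
            result.append({**x, **dict.fromkeys(s_only)})    # left tuple padded with NULLs
    if keep_right:
        for y in s:
            if not any(x[key] == y[key] for x in r):
                result.append({**dict.fromkeys(r_only), **y})  # right tuple padded with NULLs
    return result

def rows(result):
    """Show each result tuple in the fixed column order (Num, Square, Cube)."""
    return [(t["Num"], t["Square"], t["Cube"]) for t in result]

A = [{"Num": 2, "Square": 4}, {"Num": 3, "Square": 9}, {"Num": 4, "Square": 16}]
B = [{"Num": 2, "Cube": 8}, {"Num": 3, "Cube": 27}, {"Num": 5, "Cube": 125}]

print(rows(outer_join(A, B, "Num", keep_right=False)))  # left:  [(2, 4, 8), (3, 9, 27), (4, 16, None)]
print(rows(outer_join(A, B, "Num", keep_left=False)))   # right: [(2, 4, 8), (3, 9, 27), (5, None, 125)]
print(rows(outer_join(A, B, "Num")))                    # full:  adds both unmatched tuples
```

Setting both flags to False leaves only the matching pairs, i.e., the inner join.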
Summary
Select (σ): Selects the subset of tuples that satisfy a given selection condition.
Projection (π): Eliminates all attributes of the input relation except those mentioned in the projection list.
Union (∪): Includes all tuples that are in A or in B (or in both).
Set Difference (−): The result of A − B is a relation containing all tuples that are in A but not in B.
Intersection (∩): Defines a relation consisting of all tuples that are in both A and B.
Cartesian Product (×): Combines every tuple of one relation with every tuple of the other, merging their columns.
Inner Join: Includes only those tuples that satisfy the matching criteria.
Theta Join (θ): The general case of the JOIN operation, using an arbitrary comparison condition θ.
Equi Join: A theta join that uses only equality conditions.
Natural Join (⋈): Can only be performed if there is a common attribute (column) between the relations.
Outer Join: Includes, along with the tuples that satisfy the matching criteria, some or all tuples that do not.
Left Outer Join (⟕): Keeps all tuples of the left relation.
Right Outer Join (⟖): Keeps all tuples of the right relation.
Full Outer Join (⟗): Includes all tuples from both relations, irrespective of the matching condition.
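The operations in the summary can be sketched directly in Python by modelling each relation as a set of tuples (so duplicates disappear, as in the relational algebra). The relations R and S and the helper names `select`, `project` and `theta_join` are hypothetical, chosen only for illustration.

```python
from itertools import product

# Relations modelled as sets of tuples; set semantics means no duplicate tuples.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}

union = R | S                    # R ∪ S
intersection = R & S             # R ∩ S
difference = R - S               # R − S
# R × S: every R tuple concatenated with every S tuple.
cartesian = {r + s for r, s in product(R, S)}

def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, positions):
    """π: keep only the listed attribute positions (duplicates collapse)."""
    return {tuple(t[i] for i in positions) for t in relation}

def theta_join(left, right, condition):
    """θ-join: a Cartesian product followed by a selection on each pair."""
    return {l + r for l, r in product(left, right) if condition(l, r)}

# An equi join is a theta join whose condition is an equality.
equi = theta_join(R, S, lambda l, r: l[0] == r[0])
print(sorted(equi))  # [(2, 'b', 2, 'b')]
```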

SQL - Data Definition Language (DDL)
The DDL allows the specification of not only a set of relations but also information about each relation, including:
➢ The schema for each relation.
➢ The domain of values associated with each attribute.
➢ Integrity constraints.
➢ The set of indices to be maintained for each relation.
➢ Security and authorization information for each relation.
➢ The physical storage structure of each relation on disk.
Domain Types or Data types in SQL
➢ char(n). Fixed length character string, with user-specified length n.
➢ varchar(n). Variable length character strings, with user-specified maximum length n.
➢ int. Integer (a finite subset of the integers that is machine-dependent).
➢ smallint. Small integer (a machine-dependent subset of the integer domain type).
➢ numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of the decimal point.
➢ real, double precision. Floating point and double-precision floating point numbers, with
machine-dependent precision.
➢ float(n). Floating point number, with user-specified precision of at least n digits.
➢ Null values are allowed in all the domain types. Declaring an attribute to be not null prohibits null
values for that attribute.
➢ create domain construct in SQL-92 creates user-defined domain types
create domain person-name char(20) not null

Date/Time Types in SQL


➢ date. Dates, containing a (4-digit) year, month and day of the month
➢ E.g. date '2001-07-27'
➢ time. Time of day, in hours, minutes and seconds.
➢ E.g. time '09:00:30', time '09:00:30.75'
➢ timestamp: date plus time of day
➢ E.g. timestamp '2001-07-27 09:00:30.75'
➢ interval: period of time
➢ E.g. interval '1' day
➢ Subtracting a date/time/timestamp value from another gives an interval value
➢ Interval values can be added to date/time/timestamp values
Create Table Construct
An SQL relation is defined using the create table command:
create table r (A1 D1, A2 D2, ..., An Dn,
    (integrity-constraint1), ...,
    (integrity-constraintk))
➢ r is the name of the relation
➢ each Ai is an attribute name in the schema of relation r
➢ Di is the data type of values in the domain of attribute Ai
Example:
create table branch (branch-name char(15) not null, branch-city char(30),
assets integer)
Integrity Constraints in Create Table
➢ not null
➢ primary key (A1, ..., An)
➢ check (P), where P is a predicate
Example: Declare branch-name as the primary key for branch and ensure that the values of assets are non-negative.
create table branch (branch-name char(15), branch-city char(30), assets integer,
    primary key (branch-name), check (assets >= 0))
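A sketch of the same table in SQLite via Python's sqlite3 module, showing both constraints rejecting bad rows. Hyphens are not legal in unquoted SQL identifiers, so branch-name becomes branch_name here; the sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE branch (
        branch_name CHAR(15),
        branch_city CHAR(30),
        assets      INTEGER,
        PRIMARY KEY (branch_name),
        CHECK (assets >= 0)
    )
""")
conn.execute("INSERT INTO branch VALUES ('Perryridge', 'Horseneck', 1700000)")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO branch VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

attempts = [
    ("Brighton", "Brooklyn", -5),        # violates check (assets >= 0)
    ("Perryridge", "Brooklyn", 100),     # duplicate primary key
    ("Brighton", "Brooklyn", 7100000),   # satisfies both constraints
]
accepted = [try_insert(row) for row in attempts]
print(accepted)  # [False, False, True]
```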
Drop and Alter Table Constructs
➢ The drop table command deletes all information about the dropped relation from the database.
➢ The alter table command is used to add attributes to an existing relation.
alter table r add A D
where A is the name of the attribute to be added to relation r and D is the domain of A.
➢ All tuples in the relation are assigned null as the value for the new attribute.
➢ The alter table command can also be used to drop attributes of a relation
alter table r drop A
where A is the name of an attribute of relation r
➢ Dropping of attributes is not supported by many databases
SQL Query - Basic Structure
➢ SQL is based on set and relational operations with certain modifications and enhancements
➢ A typical SQL query has the form:
select A1, A2, ..., An from r1, r2, ..., rm where P
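➢ Each Ai represents an attribute
➢ Each ri represents a relation
➢ P is a predicate.
➢ This query is equivalent to the relational algebra expression (except that SQL does not remove duplicates):
$\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$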
➢ The result of an SQL query is a relation.
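To make the correspondence between select-from-where and "projection of a selection over a Cartesian product" concrete, this sketch runs one query in SQLite and then evaluates it literally step by step in Python. Because SQL keeps duplicates, the two results are compared as sets. Table contents and underscore column names are illustrative assumptions.

```python
import sqlite3
from itertools import product

borrower = [("Hayes", "L-15"), ("Jones", "L-17")]
loan = [("L-15", "Perryridge", 1500), ("L-17", "Downtown", 1000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO borrower VALUES (?, ?)", borrower)
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", loan)

sql_result = set(conn.execute("""
    SELECT customer_name, amount FROM borrower, loan
    WHERE borrower.loan_number = loan.loan_number AND branch_name = 'Perryridge'
"""))

# Tuple layout after the product: (customer_name, b.loan_number, l.loan_number, branch_name, amount)
cross = [b + l for b, l in product(borrower, loan)]                      # r1 x r2
selected = [t for t in cross if t[1] == t[2] and t[3] == "Perryridge"]   # sigma_P
projected = {(t[0], t[4]) for t in selected}                             # pi_{name, amount}

print(sql_result == projected)  # True
```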
The select Clause
➢ The select clause lists the attributes desired in the result of a query
➢ corresponds to the projection operation of the relational algebra
➢ E.g. find the names of all branches in the loan relation
select branch-name from loan
➢ In the “pure” relational algebra syntax, the query would be:
$\Pi_{\text{branch-name}}(\text{loan})$
➢ SQL allows duplicates in relations as well as in query results.
➢ To force the elimination of duplicates, insert the keyword distinct after select.
➢ Find the names of all branches in the loan relations, and remove duplicates
select distinct branch-name from loan
➢ The keyword all specifies that duplicates not be removed.
select all branch-name from loan
An asterisk in the select clause denotes “all attributes”
select * from loan
The select clause can contain arithmetic expressions involving the operators +, -, *, and /, and operating on constants or attributes of tuples.
The query:
select loan-number, branch-name, amount * 100 from loan
would return a relation that is the same as the loan relation, except that the value of the attribute amount is multiplied by 100.
The where Clause
➢ The where clause specifies conditions that the result must satisfy
➢ Corresponds to the selection predicate of the relational algebra.
➢ To find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200:
select loan-number from loan where branch-name = 'Perryridge' and amount > 1200
➢ Comparison results can be combined using the logical connectives and, or, and not.
➢ Comparisons can be applied to results of arithmetic expressions.
➢ SQL includes a between comparison operator
➢ E.g. Find the loan number of those loans with loan amounts between $90,000 and $100,000 (that is, ≥ $90,000 and ≤ $100,000)
select loan-number from loan where amount between 90000 and 100000
The from Clause
➢ The from clause lists the relations involved in the query
➢ corresponds to the Cartesian product operation of the relational algebra.
➢ Find the Cartesian product borrower x loan
select * from borrower, loan
➢ Find the name, loan number and loan amount of all customers having a loan at the Perryridge branch.
select customer-name, borrower.loan-number, amount from borrower, loan where borrower.loan-number = loan.loan-number and branch-name = 'Perryridge'
Tuple Variables
Tuple variables are defined in the from clause via the use of the as clause.
Find the customer names and their loan numbers for all customers having a loan at some branch.
select customer-name, T.loan-number, S.amount from borrower as T, loan as S
where T.loan-number = S.loan-number
Find the names of all branches that have greater assets than some branch located in Brooklyn.
select distinct T.branch-name from branch as T, branch as S where T.assets > S.assets and S.branch-city = 'Brooklyn'
String Operations
➢ SQL includes a string-matching operator for comparisons on character strings. Patterns are described
using two special characters:
➢ percent (%). The % character matches any substring.
➢ underscore (_). The _ character matches any single character.
➢ Find the names of all customers whose street includes the substring "Main".
select customer-name from customer where customer-street like '%Main%'
➢ Match the string "Main%" (the escape character \ makes the following % a literal percent sign)
like 'Main\%' escape '\'
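A sketch of the three pattern forms run in SQLite from Python. Note one assumption-sensitive difference: SQLite's LIKE is case-insensitive for ASCII letters by default, whereas standard SQL pattern matching is case-sensitive. Names and streets are illustrative sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_street TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("Adams", "Main Street"),
    ("Brooks", "North Main Road"),
    ("Curry", "Park Avenue"),
    ("Glenn", "Main% Lane"),
])

def names(sql):
    return [row[0] for row in conn.execute(sql)]

# % matches any substring: streets containing "Main" anywhere.
contains_main = names("SELECT customer_name FROM customer "
                      "WHERE customer_street LIKE '%Main%' ORDER BY customer_name")

# _ matches exactly one character: "Cu" followed by exactly three characters.
one_char = names("SELECT customer_name FROM customer WHERE customer_name LIKE 'Cu___'")

# With ESCAPE '\', the sequence \% is a literal percent sign; the trailing % is a wildcard.
literal_percent = names(r"SELECT customer_name FROM customer "
                        r"WHERE customer_street LIKE 'Main\%%' ESCAPE '\'")

print(contains_main, one_char, literal_percent)
```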
Ordering the Display of Tuples
➢ List in alphabetic order the names of all customers having a loan in Perryridge branch
select distinct customer-name from borrower, loan where borrower.loan-number = loan.loan-number
and branch-name = 'Perryridge' order by customer-name
➢ We may specify desc for descending order or asc for ascending order, for each attribute; ascending
order is the default.
➢ E.g. order by customer-name desc
Set Operations
➢ The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations ∪, ∩, and −.
➢ Each of the above operations automatically eliminates duplicates; to retain all duplicates use the
corresponding multiset versions union all, intersect all and except all.
Find all customers who have a loan, an account, or both:
(select customer-name from depositor) union (select customer-name from borrower)
Find all customers who have both a loan and an account.
(select customer-name from depositor) intersect (select customer-name from borrower)
Find all customers who have an account but no loan.
(select customer-name from depositor) except (select customer-name from borrower)
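A sketch of the three set operations in SQLite via Python. SQLite does not accept the parenthesized `(select ...) union (select ...)` form, so the parentheses are dropped, and among the multiset versions it supports only UNION ALL. The sample names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customer_name TEXT);
    CREATE TABLE borrower  (customer_name TEXT);
    INSERT INTO depositor VALUES ('Hayes'), ('Johnson'), ('Johnson'), ('Smith');
    INSERT INTO borrower  VALUES ('Hayes'), ('Jones'), ('Smith');
""")

def names(sql):
    return [row[0] for row in conn.execute(sql)]

# Loan, account, or both (duplicates removed).
either = names("SELECT customer_name FROM depositor UNION "
               "SELECT customer_name FROM borrower ORDER BY 1")
# Both a loan and an account.
both = names("SELECT customer_name FROM depositor INTERSECT "
             "SELECT customer_name FROM borrower ORDER BY 1")
# An account but no loan.
account_only = names("SELECT customer_name FROM depositor EXCEPT "
                     "SELECT customer_name FROM borrower ORDER BY 1")
# UNION ALL keeps every duplicate: 4 + 3 rows.
all_rows = names("SELECT customer_name FROM depositor UNION ALL "
                 "SELECT customer_name FROM borrower")

print(either, both, account_only, len(all_rows))
```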
Aggregate Functions
These functions operate on the multiset of values of a column of a relation, and return a single value:
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Find the average account balance at the Perryridge branch.
select avg (balance) from account where branch-name = 'Perryridge'
Find the number of tuples in the customer relation.
select count (*) from customer
Find the number of depositors in the bank
select count (distinct customer-name) from depositor
Aggregate Functions – Group By
➢ Find the number of depositors for each branch.
select branch-name, count (distinct customer-name) from depositor, account
where depositor.account-number = account.account-number group by branch-name
➢ Find the names of all branches where the average account balance is more than $1,200.
select branch-name, avg (balance) from account group by branch-name having avg (balance) > 1200
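A sketch of group by and having in SQLite via Python, simplified to a single account table (the sample balances are illustrative). The where clause, had there been one, filters rows before grouping; having filters whole groups after the aggregates have been computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL);
    INSERT INTO account VALUES
        ('A-101', 'Downtown',   500),
        ('A-102', 'Perryridge', 400),
        ('A-201', 'Perryridge', 900),
        ('A-217', 'Brighton',   750),
        ('A-218', 'Brighton',  2000);
""")

# One output row per branch, with aggregates computed over that branch's rows.
per_branch = conn.execute("""
    SELECT branch_name, COUNT(*), AVG(balance) FROM account
    GROUP BY branch_name ORDER BY branch_name
""").fetchall()

# HAVING keeps only the groups whose average exceeds 1200.
rich = conn.execute("""
    SELECT branch_name, AVG(balance) FROM account
    GROUP BY branch_name HAVING AVG(balance) > 1200
""").fetchall()

print(per_branch)  # [('Brighton', 2, 1375.0), ('Downtown', 1, 500.0), ('Perryridge', 2, 650.0)]
print(rich)        # [('Brighton', 1375.0)]
```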
Null Values
➢ It is possible for tuples to have a null value, denoted by null, for some of their attributes
➢ null signifies an unknown value or that a value does not exist.
➢ The predicate is null can be used to check for null values.
➢ E.g. Find all loan numbers that appear in the loan relation with null values for amount.
select loan-number from loan where amount is null
➢ The result of any arithmetic expression involving null is null
➢ E.g. 5 + null returns null
➢ However, aggregate functions simply ignore nulls
➢ more on this shortly
Null Values and Aggregates
Total all loan amounts
select sum (amount) from loan
➢ The above statement ignores null amounts
➢ The result is null if there is no non-null amount
➢ All aggregate operations except count(*) ignore tuples with null values on the aggregated
attributes.
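The null rules above can be checked in SQLite via Python; SQL null arrives as Python's None. The loan rows are illustrative sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [("L-11", 900), ("L-14", 1500), ("L-99", None)])

# SUM and COUNT(amount) skip the null; COUNT(*) counts every tuple.
total, n_amounts, n_rows = conn.execute(
    "SELECT SUM(amount), COUNT(amount), COUNT(*) FROM loan").fetchone()

# "is null" is the way to find nulls; "amount = NULL" is never true.
missing = [r[0] for r in conn.execute("SELECT loan_number FROM loan WHERE amount IS NULL")]

# Arithmetic and comparisons involving null yield null.
plus_null, eq_null = conn.execute("SELECT 5 + NULL, NULL = NULL").fetchone()

# SUM over no non-null values is null rather than 0.
empty_sum = conn.execute("SELECT SUM(amount) FROM loan WHERE amount IS NULL").fetchone()[0]

print(total, n_amounts, n_rows, missing, plus_null, eq_null, empty_sum)
# 2400 2 3 ['L-99'] None None None
```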
Nested Subqueries
➢ SQL provides a mechanism for the nesting of subqueries.
➢ A subquery is a select-from-where expression that is nested within another query.
➢ A common use of subqueries is to perform tests for set membership, set comparisons, and set
cardinality.
Example Query
Find all customers who have both an account and a loan at the bank.
select distinct customer-name from borrower where customer-name in (select
customer-name from depositor)
Find all customers who have a loan at the bank but do not have an account at the bank
select distinct customer-name from borrower where customer-name not in (select customer-name from depositor)
Find all customers who have both an account and a loan at the Perryridge branch
select distinct customer-name from borrower, loan
where borrower.loan-number = loan.loan-number and
branch-name = 'Perryridge' and (branch-name, customer-name) in
(select branch-name, customer-name from depositor, account
where depositor.account-number = account.account-number)
Set Comparison
Find all branches that have greater assets than some branch located in Brooklyn.
select distinct T.branch-name from branch as T, branch as S
where T.assets > S.assets and S.branch-city = 'Brooklyn'
Same query using > some clause
select branch-name from branch where assets > some (select assets
from branch where branch-city = 'Brooklyn')
Example Query
Find the names of all branches that have greater assets than all branches located in Brooklyn.
select branch-name from branch where assets > all (select assets from branch
where branch-city = 'Brooklyn')
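SQLite has no `> some (...)` or `> all (...)` syntax, so this sketch rewrites the two queries using a standard equivalence: for a non-empty, null-free subquery, `> some` behaves like `> min(...)` and `> all` like `> max(...)`. The rewrite differs when the subquery is empty (`> all` over an empty set is true, while `> max` of nothing compares against null and is never true). Python's `any`/`all` cross-check the definitions. Branch data and underscore names are illustrative assumptions.

```python
import sqlite3

branches = [("Brighton", "Brooklyn", 7100000), ("Downtown", "Brooklyn", 9000000),
            ("Perryridge", "Horseneck", 1700000), ("Round Hill", "Horseneck", 9500000),
            ("Redwood", "Palo Alto", 2100000)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER)")
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)", branches)

# "assets > some (Brooklyn assets)" rewritten as "assets > min(Brooklyn assets)".
gt_some = sorted(r[0] for r in conn.execute("""
    SELECT branch_name FROM branch
    WHERE assets > (SELECT MIN(assets) FROM branch WHERE branch_city = 'Brooklyn')"""))

# "assets > all (Brooklyn assets)" rewritten as "assets > max(Brooklyn assets)".
gt_all = sorted(r[0] for r in conn.execute("""
    SELECT branch_name FROM branch
    WHERE assets > (SELECT MAX(assets) FROM branch WHERE branch_city = 'Brooklyn')"""))

# Cross-check against the literal definitions of some/all.
brooklyn = [a for _, city, a in branches if city == "Brooklyn"]
assert gt_some == sorted(n for n, _, a in branches if any(a > b for b in brooklyn))
assert gt_all == sorted(n for n, _, a in branches if all(a > b for b in brooklyn))
print(gt_some, gt_all)  # ['Downtown', 'Round Hill'] ['Round Hill']
```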
Views
Views provide a mechanism to hide certain data from the view of certain users. To create a view we use the command:
create view v as <query expression>
where:
➢ <query expression> is any legal query expression
➢ the view name is represented by v
Example Queries
A view consisting of branches and their customers
create view all-customer as
(select branch-name, customer-name from depositor, account where depositor.account-number = account.account-number)
union
(select branch-name, customer-name from borrower, loan where borrower.loan-number = loan.loan-number)
Find all customers of the Perryridge branch
select customer-name from all-customer where branch-name = 'Perryridge'
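A sketch of the all-customer view in SQLite via Python. Hyphenated names are rewritten with underscores (hyphens are not legal in unquoted identifiers), the parentheses around the union members are dropped because SQLite rejects them, and the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    CREATE TABLE account   (account_number TEXT, branch_name TEXT, balance REAL);
    CREATE TABLE borrower  (customer_name TEXT, loan_number TEXT);
    CREATE TABLE loan      (loan_number TEXT, branch_name TEXT, amount REAL);
    INSERT INTO depositor VALUES ('Hayes', 'A-102'), ('Turner', 'A-305');
    INSERT INTO account   VALUES ('A-102', 'Perryridge', 400), ('A-305', 'Round Hill', 350);
    INSERT INTO borrower  VALUES ('Adams', 'L-16'), ('Hayes', 'L-15');
    INSERT INTO loan      VALUES ('L-16', 'Perryridge', 1300), ('L-15', 'Perryridge', 1500);

    CREATE VIEW all_customer AS
        SELECT branch_name, customer_name FROM depositor, account
        WHERE depositor.account_number = account.account_number
        UNION
        SELECT branch_name, customer_name FROM borrower, loan
        WHERE borrower.loan_number = loan.loan_number;
""")

# The view is queried exactly like a table.
perryridge = [r[0] for r in conn.execute(
    "SELECT customer_name FROM all_customer WHERE branch_name = 'Perryridge' ORDER BY 1")]
print(perryridge)  # ['Adams', 'Hayes']  (UNION keeps Hayes only once)
```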
Modification of the Database – Deletion (DML)
Delete all account records at the Perryridge branch
delete from account where branch-name = 'Perryridge'
Delete all accounts at every branch located in Needham city.
delete from account where branch-name in (select branch-name
from branch where branch-city = 'Needham')
Example Query
Delete the record of all accounts with balances below the average at the bank.
delete from account where balance < (select avg (balance) from account)
➢ Problem: as we delete tuples from account, the average balance changes
➢ Solution used in SQL:
1. First, compute avg balance and find all tuples to delete
2. Next, delete all tuples found above (without recomputing avg or retesting the tuples)
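This sketch contrasts SQL's two-phase behaviour with the problematic "recompute after every deletion" strategy. The SQLite statement evaluates the uncorrelated average once; the hypothetical helper `delete_one_at_a_time` re-evaluates it after each deletion and removes too much. Balances are illustrative: the original average is 400, but after deleting A-1 it becomes 550.

```python
import sqlite3

rows = [("A-1", 100), ("A-2", 500), ("A-3", 600)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", rows)

# SQL semantics: the average (400) is computed first, then qualifying tuples are deleted.
conn.execute("DELETE FROM account WHERE balance < (SELECT AVG(balance) FROM account)")
remaining = [r[0] for r in conn.execute("SELECT account_number FROM account ORDER BY 1")]

def delete_one_at_a_time(rows):
    """Hypothetical naive strategy: recompute the average after every single deletion."""
    rows = dict(rows)
    while rows:
        avg = sum(rows.values()) / len(rows)
        victim = next((k for k, v in rows.items() if v < avg), None)
        if victim is None:
            break
        del rows[victim]
    return sorted(rows)

naive = delete_one_at_a_time(rows)
print(remaining, naive)  # ['A-2', 'A-3'] ['A-3']
```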
Modification of the Database – Insertion (DML)
Add a new tuple to account
insert into account values ('A-9732', 'Perryridge', 1200)
or equivalently
insert into account (branch-name, balance, account-number) values ('Perryridge', 1200, 'A-9732')
Add a new tuple to account with balance set to null
insert into account values ('A-777', 'Perryridge', null)
Modification of the Database – Updates (DML)
Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%.
➢ Write two update statements:
update account set balance = balance  1.06 where balance > 10000
update account set balance = balance  1.05 where balance  10000
➢ The order is important
➢ Can be done better using the case statement.
Case Statement for Conditional Updates
Same query as before: Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%.
update account set balance = case when balance <= 10000 then balance * 1.05
else balance * 1.06 end
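A sketch in SQLite via Python showing why the two-statement order matters and why the case form avoids the issue: with the 5% update first, an account at 9,800 becomes 10,290 and is then raised again by 6%. The account data is illustrative; balances are compared approximately because they are floating point.

```python
import sqlite3

def fresh():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-1", 9800), ("A-2", 20000)])
    return conn

def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))

# Wrong order: the 5% raise pushes A-1 over 10,000, so it then also gets 6%.
wrong = fresh()
wrong.execute("UPDATE account SET balance = balance * 1.05 WHERE balance <= 10000")
wrong.execute("UPDATE account SET balance = balance * 1.06 WHERE balance > 10000")

# Single statement with CASE: each row is tested once, against its old balance.
right = fresh()
right.execute("""
    UPDATE account SET balance = CASE
        WHEN balance <= 10000 THEN balance * 1.05
        ELSE balance * 1.06
    END
""")

print(balances(wrong))  # A-1 is about 10907.4 (raised twice)
print(balances(right))  # A-1 is about 10290.0, A-2 about 21200.0
```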
Update of a View
Create a view of all loan data in loan relation, hiding the amount attribute
create view branch-loan as select branch-name, loan-number from loan
Add a new tuple to branch-loan
insert into branch-loan values ('Perryridge', 'L-307')
This insertion must be represented by the insertion of the tuple ('L-307', 'Perryridge', null) into the loan relation.
Updates on more complex views are difficult or impossible to translate, and hence are disallowed.
Most SQL implementations allow updates only on simple views (without aggregates) defined on a single relation.
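SQLite, for example, treats views as read-only and accepts an insert into a view only through an INSTEAD OF trigger that spells out the translation. This sketch writes the translation described above by hand: the hidden amount attribute is filled with null. The trigger name and underscore identifiers are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount REAL);
    CREATE VIEW branch_loan AS SELECT branch_name, loan_number FROM loan;

    -- Translate an insert on the view into an insert on loan, with amount = NULL.
    CREATE TRIGGER branch_loan_insert INSTEAD OF INSERT ON branch_loan
    BEGIN
        INSERT INTO loan (loan_number, branch_name, amount)
        VALUES (NEW.loan_number, NEW.branch_name, NULL);
    END;
""")

conn.execute("INSERT INTO branch_loan VALUES ('Perryridge', 'L-307')")
stored = conn.execute("SELECT * FROM loan").fetchall()
print(stored)  # [('L-307', 'Perryridge', None)]
```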
Transactions (TCL)
A transaction is a sequence of queries and update statements executed as a single unit
➢ Transactions are started implicitly and terminated by one of:
➢ commit work: makes all updates of the transaction permanent in the database
➢ rollback work: undoes all updates performed by the transaction.
Motivating example
➢ Transfer of money from one account to another involves two steps:
➢ deduct from one account and credit to another
➢ If one step succeeds and the other fails, the database is in an inconsistent state
➢ Therefore, either both steps should succeed or neither should
If any step of a transaction fails, all work done by the transaction can be undone by rollback work.
Rollback of incomplete transactions is done automatically, in case of system failures.
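A sketch of the money-transfer example using Python's sqlite3 module, whose default (legacy) transaction mode implicitly opens a transaction before the first UPDATE. A CHECK constraint (an assumption added for the demo) makes the debit fail when it would overdraw, and rollback then undoes the credit that already happened. Account data is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    account_number TEXT PRIMARY KEY,
    balance REAL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)])
conn.commit()

def balances():
    return dict(conn.execute("SELECT account_number, balance FROM account"))

def transfer(src, dst, amount):
    """Credit dst and debit src as one unit: both updates persist or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE account_number = ?",
                     (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE account_number = ?",
                     (amount, src))   # violates the CHECK if src would go negative
        conn.commit()                 # like commit work
        return True
    except sqlite3.IntegrityError:
        conn.rollback()               # like rollback work: the credit to dst is undone
        return False

failed = transfer("A-101", "A-215", 1000)   # would overdraw A-101
after_failed = balances()
ok = transfer("A-101", "A-215", 200)
after_ok = balances()
print(after_failed, after_ok)
# {'A-101': 500.0, 'A-215': 700.0} {'A-101': 300.0, 'A-215': 900.0}
```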
Joined Relations
➢ Join operations take two relations and return as a result another relation.
➢ These additional operations are typically used as subquery expressions in the from clause
➢ Join condition – defines which tuples in the two relations match, and what attributes are present in the
result of the join.
➢ Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on
the join condition) are treated.
Join Types : inner join, left outer join, right outer join, full outer join
Embedded SQL
➢ The SQL standard defines embeddings of SQL in a variety of programming languages such as Pascal, PL/I, Fortran, C, and Cobol.
➢ A language in which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language comprise embedded SQL.
➢ The basic form of these languages follows that of the System R embedding of SQL into PL/I.
➢ The EXEC SQL statement is used to identify embedded SQL requests to the preprocessor:
EXEC SQL <embedded SQL statement> END-EXEC
Note: this varies by language. E.g. the Java embedding (SQLJ) uses #sql { .... };
Example Query
From within a host language, find the names and cities of customers with more than the variable amount
dollars in some account.
Specify the query in SQL and declare a cursor for it
EXEC SQL
declare c cursor for
select customer.customer-name, customer-city
from depositor, customer, account
where depositor.customer-name = customer.customer-name
and depositor.account-number = account.account-number and account.balance > :amount
END-EXEC

Embedded SQL (Cont.)
The open statement causes the query to be evaluated
EXEC SQL open c END-EXEC
The fetch statement causes the values of one tuple in the query result to be placed in host language variables.
EXEC SQL fetch c into :cn, :cc END-EXEC
Repeated calls to fetch get successive tuples in the query result
A variable called SQLSTATE in the SQL communication area (SQLCA) gets set to ‘02000’ to indicate
no more data is available
The close statement causes the database system to delete the temporary relation that holds the result of
the query.
EXEC SQL close c END-EXEC
Note: above details vary with language. E.g. the Java embedding defines Java iterators to step through
result tuples.
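Python has no embedded-SQL preprocessor, but its DB-API cursor follows a similar open/fetch/close cycle, so this sketch is an analogy rather than embedded SQL itself. The host variable `amount` is bound to a `?` placeholder, and `fetchone()` returning None plays the part of SQLSTATE '02000'. Schema names use underscores and the data is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_name TEXT, customer_city TEXT);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    CREATE TABLE account   (account_number TEXT, balance REAL);
    INSERT INTO customer  VALUES ('Hayes', 'Harrison'), ('Johnson', 'Palo Alto'), ('Smith', 'Rye');
    INSERT INTO depositor VALUES ('Hayes', 'A-102'), ('Johnson', 'A-201'), ('Smith', 'A-215');
    INSERT INTO account   VALUES ('A-102', 400), ('A-201', 900), ('A-215', 700);
""")

amount = 500  # host-language variable, playing the role of :amount

# Roughly "declare c cursor for ..." plus "open c".
c = conn.cursor()
c.execute("""
    SELECT customer.customer_name, customer_city
    FROM depositor, customer, account
    WHERE depositor.customer_name = customer.customer_name
      AND depositor.account_number = account.account_number
      AND account.balance > ?
    ORDER BY customer.customer_name
""", (amount,))

# Roughly "fetch c into :cn, :cc" repeated until no more data.
results = []
while True:
    row = c.fetchone()
    if row is None:          # analogous to SQLSTATE '02000'
        break
    cn, cc = row
    results.append((cn, cc))

c.close()  # roughly "close c"
print(results)  # [('Johnson', 'Palo Alto'), ('Smith', 'Rye')]
```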
Dynamic SQL
➢ Allows programs to construct and submit SQL queries at run time.
➢ Example of the use of dynamic SQL from within a C program:
char *sqlprog = "update account set balance = balance * 1.05 where account-number = ?";
EXEC SQL prepare dynprog from :sqlprog;
char account[10] = "A-101";
EXEC SQL execute dynprog using :account;
➢ The dynamic SQL program contains a ?, which is a placeholder for a value that is provided when the SQL program is executed.
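A Python analogue of the C example, using sqlite3, where `execute` both prepares and runs a statement and binds values to `?` placeholders. The hypothetical helper `dynamic_update` builds the WHERE clause at run time from whichever conditions the caller supplies; column names are checked against an allow-list because only values, not identifiers, can be passed through placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400)])

ALLOWED = {"account_number", "branch_name"}  # columns that may appear in the WHERE clause

def dynamic_update(rate, **conditions):
    """Build an UPDATE at run time; values go through ? placeholders, never into the text."""
    if not conditions or any(col not in ALLOWED for col in conditions):
        raise ValueError("conditions must name allowed columns")
    where = " AND ".join(f"{col} = ?" for col in conditions)
    sqlprog = "UPDATE account SET balance = balance * ? WHERE " + where
    conn.execute(sqlprog, (rate, *conditions.values()))
    return sqlprog

sqlprog_text = dynamic_update(1.05, account_number="A-101")

try:
    dynamic_update(1.10, owner="x")   # not an allowed column
    rejected = False
except ValueError:
    rejected = True

print(sqlprog_text)  # UPDATE account SET balance = balance * ? WHERE account_number = ?
```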

Introduction to NoSQL
What is a NoSQL database?
NoSQL, also referred to as "not only SQL" or "non-SQL", is an approach to database design that enables the storage and querying of data outside the traditional structures found in relational databases. While it can still store data found within relational database management systems (RDBMS), it stores that data differently. The decision to use a relational or a non-relational database is largely contextual and depends on the use case.
Instead of the typical tabular structure of a relational database, NoSQL databases house data within one data structure, such as a JSON document. Since this non-relational design does not require a fixed schema, it allows rapid scaling to manage large and typically unstructured data sets.
Many NoSQL databases are also distributed databases, which means that information is copied and stored on various servers, which can be remote or local. This improves the availability and reliability of data: if some of the data goes offline, the rest of the database can continue to run.
Today, companies need to manage large data volumes at high speeds, with the ability to scale up quickly, to run modern web applications in nearly every industry. In this era of growth in cloud, big data, and mobile and web applications, NoSQL databases provide that speed and scalability, making them a popular choice for their performance and ease of use.
Key Highlights on SQL vs NoSQL
Aspect | SQL | NoSQL
Type | Relational database management system (RDBMS) | Non-relational or distributed database system
Schema | Fixed, static, or predefined schema | Dynamic schema
Hierarchical data | Not suited for hierarchical data storage | Best suited for hierarchical data storage
Complex queries | Best suited for complex queries | Not so good for complex queries
Scaling | Vertically scalable | Horizontally scalable
Properties | Follows ACID properties | Follows CAP (consistency, availability, partition tolerance)
Examples | MySQL, PostgreSQL, Oracle, MS-SQL Server, etc. | MongoDB, HBase, Neo4j, Cassandra, etc.

NoSQL databases are non-relational databases that store data in a non-tabular form. NoSQL stands for "not only SQL". The main types are document, key-value, wide-column, and graph databases.
Types of NoSQL Databases:
• Document-based databases
• Key-value stores
• Column-oriented databases
• Graph-based databases

Document-Based Database:
The document-based database is a non-relational database. Instead of storing data in rows and columns (tables), it uses documents to store the data. A document database stores data in JSON, BSON, or XML documents.
Documents can be stored and retrieved in a form that is much closer to the data objects used in applications, which means less translation is required to use the data in applications. In a document database, individual elements can be accessed through indexes assigned to them for faster querying.
A collection is a group of documents with similar contents. The documents in a collection are not required to share the same schema, because document databases have a flexible schema.
Key features of document databases:
• Flexible schema: Documents in the database have a flexible schema, which means the documents need not share the same schema.
• Faster creation and maintenance: Creating documents is easy, and minimal maintenance is required once a document is created.
• No foreign keys: There are no enforced relationships between documents, so documents can be independent of one another; hence there is no requirement for foreign keys in a document database.
• Open formats: Documents are built using XML, JSON, and other formats.
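The ideas above (documents with differing fields, a collection, an index for faster lookup, JSON as the storage format) can be sketched with plain Python dictionaries. This is a hypothetical in-memory illustration, not the API of MongoDB or any other document database; `find` and `by_name` are invented names.

```python
import json

# A "collection": documents with a flexible schema (fields differ per document).
customers = [
    {"_id": 1, "name": "Hayes", "city": "Harrison", "accounts": ["A-102"]},
    {"_id": 2, "name": "Jones", "city": "Rye", "phone": "555-0101"},
    {"_id": 3, "name": "Smith", "accounts": ["A-215", "A-217"],
     "address": {"street": "North", "city": "Rye"}},   # nested sub-document
]

# A simple index on "name": maps a value straight to its document.
by_name = {doc["name"]: doc for doc in customers}

def find(collection, **criteria):
    """Return documents whose top-level fields equal all the given values."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]

in_rye = find(customers, city="Rye")        # only Jones: Smith's city is nested
serialized = json.dumps(by_name["Smith"])   # a document maps directly to JSON
print([d["name"] for d in in_rye], serialized)
```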
Key-Value Stores:
A key-value store is a non-relational database and the simplest form of a NoSQL database. Every data element in the database is stored as a key-value pair. The data can be retrieved by using the unique key allotted to each element. The values can be simple data types like strings and numbers, or complex objects.
A key-value store is like a relational table with only two columns: the key and the value.
Key features of the key-value store:
• Simplicity.
• Scalability.
• Speed.
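A minimal in-memory sketch of the key-value model, built on a Python dict; the class and method names are hypothetical and do not correspond to any particular product's API. Values are reachable only through their key, and a value can be a simple type or a complex object.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: each value is reachable only through its unique key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value           # insert, or overwrite an existing key

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)         # deleting a missing key is a no-op


store = KeyValueStore()
store.put("customer:Hayes", {"city": "Harrison", "accounts": ["A-102"]})  # complex value
store.put("visits", 7)                                                    # simple value
store.put("visits", 8)                                                    # overwrite
store.delete("customer:Hayes")
print(store.get("visits"), store.get("customer:Hayes"))  # 8 None
```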
Column Oriented Databases:
A column-oriented database is a non-relational database that stores data in columns instead of rows. This means that when we want to run analytics on a small number of columns, we can read those columns directly without consuming memory with unwanted data.
Columnar databases are designed to read data more efficiently and retrieve it with greater speed, and they are used to store large amounts of data. Key features of column-oriented databases:
• Scalability.
• Compression.
• Very responsive.
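A toy Python sketch of the row versus column layouts. Summing one attribute in the column layout touches only that attribute's list. Run-length encoding is shown as one example of why columns compress well (repeated values sit next to each other); it is an illustrative technique, not a description of any specific product's storage format.

```python
# Row-oriented layout: each record stored together.
rows = [
    ("A-101", "Downtown", 500.0),
    ("A-102", "Perryridge", 400.0),
    ("A-201", "Brighton", 900.0),
]

# Column-oriented layout: each attribute stored as its own array.
columns = {
    "account_number": [r[0] for r in rows],
    "branch_name": [r[1] for r in rows],
    "balance": [r[2] for r in rows],
}

# Analytics over one attribute reads only that column.
total_balance = sum(columns["balance"])

def run_length_encode(values):
    """Compress runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

print(total_balance)                                           # 1800.0
print(run_length_encode(["Rye", "Rye", "Rye", "Brooklyn"]))    # [['Rye', 3], ['Brooklyn', 1]]
```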
Graph-Based databases:
Graph-based databases focus on the relationships between elements. They store data in the form of nodes, and the connections between the nodes are called links or relationships.
Key features of graph databases:
• In a graph-based database, it is easy to identify the relationships between data by following the links.
• Query results can be returned in real time.
• Query speed depends on the number of relationships among the database elements.
• Updating data is also easy, as adding a new node or edge to a graph database is a straightforward task that does not require significant schema changes.
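A small Python sketch of the graph model: nodes connected by labelled links stored in an adjacency list, plus a breadth-first traversal that follows links. The amount of work grows with the number of relationships visited, which matches the point about speed above. The data and function names are hypothetical.

```python
from collections import deque

# Each edge (link) records a relationship between two nodes.
edges = [
    ("Hayes", "A-102", "owns"),
    ("Johnson", "A-102", "owns"),
    ("Johnson", "A-201", "owns"),
    ("Smith", "A-215", "owns"),
]

graph = {}

def add_edge(a, b, label):
    """Store an undirected labelled edge; new nodes need no schema change."""
    graph.setdefault(a, []).append((b, label))
    graph.setdefault(b, []).append((a, label))

for edge in edges:
    add_edge(*edge)

def reachable(start):
    """Breadth-first traversal: every node connected to start through any chain of links."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour, _ in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

print(sorted(reachable("Hayes")))  # ['A-102', 'A-201', 'Hayes', 'Johnson']
```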
