Hand Out Intro To Database
SOFTWARE ENGINEERING
INTRODUCTION TO
DATABASE
BY
MR. NDUMU
HAND OUT
INTRODUCTION TO DATABASE SYSTEMS
INTRODUCTION TO DATABASE
Data within the most common types of databases in operation today is typically modeled in
rows and columns in a series of tables to make processing and data querying efficient. The
data can then be easily accessed, managed, modified, updated, controlled, and organized.
Most databases use structured query language (SQL) for writing and querying data.
Databases have evolved dramatically since their inception in the early 1960s. Navigational
databases such as the hierarchical database (which relied on a tree-like model and allowed
only a one-to-many relationship), and the network database (a more flexible model that
allowed multiple relationships), were the original systems used to store and manipulate
data. Although simple, these early systems were inflexible. In the 1980s, relational
databases became popular, followed by object-oriented databases in the 1990s. More
recently, NoSQL databases came about as a response to the growth of the internet and the
need for faster speed and processing of unstructured data. Today, cloud databases and self-
driving databases are breaking new ground when it comes to how data is collected, stored,
managed, and utilized.
Databases and spreadsheets (such as Microsoft Excel) are both convenient ways to store
information. The primary differences between the two are:
Spreadsheets were originally designed for one user, and their characteristics reflect that.
They’re great for a single user or small number of users who don’t need to do a lot of
incredibly complicated data manipulation. Databases, on the other hand, are designed to
hold much larger collections of organized information—massive amounts, sometimes.
Databases allow multiple users at the same time to quickly and securely access and query
the data using highly complex logic and language.
TYPES OF DATABASES
There are many different types of databases. The best database for a specific organization
depends on how the organization intends to use the data.
These are only a few of the several dozen types of databases in use today. Other, less
common databases are tailored to very specific scientific, financial, or other functions. In
addition to the different database types, changes in technology development approaches
and dramatic advances such as the cloud and automation are propelling databases in
entirely new directions. Some of the latest databases include:
Open source databases. An open source database system is one whose source code
is publicly available; such databases could be SQL or NoSQL databases.
Cloud databases. A cloud database is a collection of data, either structured or
unstructured, that resides on a private, public, or hybrid cloud computing platform.
There are two types of cloud database models: traditional and database as a service
(DBaaS). With DBaaS, administrative tasks and maintenance are performed by a
service provider.
Multimodel database. Multimodel databases combine different types of database
models into a single, integrated back end. This means they can accommodate various
data types.
Document/JSON database. Designed for storing, retrieving, and managing document-
oriented information, document databases are a modern way to store data in JSON
format rather than rows and columns.
Self-driving databases. The newest and most groundbreaking type of database, self-
driving databases (also known as autonomous databases) are cloud-based and use
machine learning to automate database tuning, security, backups, updates, and other
routine management tasks traditionally performed by database administrators.
WHAT IS A DATABASE?
A database (DB) is a collection of data that lives for a long time. Many systems fit this
definition, for example, a paper-based file system, a notebook, or even a string with knots
for counting.
A schema is meta-data that describes data. Such meta-data can describe the structure of the
data, which ranges from strictly enforced structure (relational) to semi-structured (XML) and
unstructured data (text files). Before we define the schema we must decide on a model of
the data - a metaphor. In a relational database, the n-ary relation is used to model data.
A database can be defined as a shared collection of interrelated data designed to meet the
varied information needs of an organisation. (McFadden and Hoffer).
PROPERTIES OF A DATABASE
A database is:
Shared: All qualified users in the organisation have access to the same data
for use in a variety of activities.
ADVANTAGES
There are a number of advantages to the database approach. These are discussed in more
detail in the topic Database Systems. Some of the advantages of a database system are
summarised below.
Data independence: In a database system the data is held independently of the
application programs that use it. Changes to the data or the way it is stored will not
necessarily affect the programs that use the data.
Reduced maintenance: A number of tools for maintaining the data are already built
in to most DBMS. Tools may include report generators, query languages, form
generators, code checkers, data dictionaries and maintenance logs.
COMPONENTS
Repository: Contains the rules under which users can access the data and the rules
under which the data is organised, i.e. metadata.
Interfaces: The various ways an end user may access the data: via display terminals,
phone links from remote terminals, touch screens, bar code readers, voice commands,
touch pads, graphics selection, selections from menus with keyboard or mouse, keying
in commands or scanning cards. The interface is also the form in which the information
is delivered to the end user, e.g. graph, printed report, graphical display.
Application programs: Used to perform the main operations on the data, i.e. create,
modify, delete and retrieve. Application programs also combine the data in meaningful
ways to produce reports.
Computer Aided Software Engineering Tools (CASE tools): Also considered to be part
of the database environment. These are automated tools to assist with the design of
systems and databases.
Database Administrators (DBA): Manage the DBMS and are responsible for the overall
information resources of an organisation. You will look at the role of the DBA more
fully in a later section.
System Developers: Include systems analysts and programmers. You will look at the
role of each of these in other modules. Both use CASE tools to assist in the
development of application programs.
DATABASE MANAGEMENT SYSTEM (DBMS)
A database management system (DBMS) is system software for creating and managing
databases. A DBMS makes it possible for end users to create, read, update and delete data
in a database. The most prevalent type of data management platform, the DBMS essentially
serves as an interface between databases and end users or application programs, ensuring
that data is consistently organized and remains easily accessible.
FUNCTIONS OF A DBMS
The DBMS manages three important things: the data, the database engine that allows data
to be accessed, locked and modified, and the database schema, which defines the database's
logical structure. These three foundational elements help provide concurrency, security,
data integrity and uniform data administration procedures. Typical database
administration tasks supported by the DBMS include change management, performance
monitoring and tuning, security, and backup and recovery. Many database management
systems are also responsible for automated rollbacks and restarts as well as the logging and
auditing of activity in databases.
The DBMS is perhaps most useful for providing a centralized view of data that can be
accessed by multiple users, from multiple locations, in a controlled manner. A DBMS can
limit what data the end user sees, as well as how that end user can view the data, providing
many views of a single database schema. End users and software programs are free from
having to understand where the data is physically located or on what type of storage media
it resides because the DBMS handles all requests.
The DBMS can offer both logical and physical data independence. This means it can protect
users and applications from needing to know where data is stored or having to be
concerned about changes to the physical structure of data. As long as programs use the
application programming interface (API) for the database that is provided by the DBMS,
developers won't have to modify programs just because changes have been made to the
database.
In a relational database management system (RDBMS), the most widely used type of DBMS,
this API is SQL, a standard programming language for defining, protecting and accessing
data in an RDBMS.
Relational database management system (RDBMS) -- adaptable to most use cases, but
RDBMS Tier-1 products can be quite expensive.
NoSQL DBMS -- well-suited for loosely defined data structures that may evolve over
time.
In-memory database management system (IMDBMS) -- provides faster response
times and better performance.
Columnar database management system (CDBMS) -- well-suited for data warehouses
that have a large number of similar data items.
Cloud-based database management system -- the cloud service provider is
responsible for providing and maintaining the DBMS.
Using a DBMS to store and manage data comes with advantages, but also processing
overhead. One of the biggest advantages of using a DBMS is that it lets end users and
application programmers access and use the same data while managing data integrity. Data
is better protected and maintained when it can be shared using a DBMS instead of creating
new iterations of the same data stored in new files for every new application. The DBMS
provides a central store of data that can be accessed by multiple users in a controlled
manner.
Another advantage of a DBMS is that it can be used to impose a logical, structured
organization on the data. A DBMS delivers economy of scale for processing large amounts
of data because it is optimized for such operations.
A DBMS can also provide many views of a single database schema. A view defines what data
the user sees and how that user sees the data. The DBMS provides a level of abstraction
between the conceptual schema that defines the logical structure of the database and the
physical schema that describes the files, indexes and other physical mechanisms used by
the database. When a DBMS is used, systems can be modified much more easily when
business requirements change. New categories of data can be added to the database without
disrupting the existing system and applications can be insulated from how data is
structured and stored.
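The idea of a view can be sketched with Python's built-in sqlite3 module. The employee table, its columns and the employee_directory view are hypothetical names chosen for illustration; the view exposes names and departments while hiding salaries.

```python
import sqlite3

# In-memory database; table and view names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ada", "Payroll", 5200.0), (2, "Ben", "Public Relations", 4100.0)],
)

# The view defines what a directory user sees: no salary column.
conn.execute(
    "CREATE VIEW employee_directory AS "
    "SELECT id, name, department FROM employee"
)

directory_rows = conn.execute(
    "SELECT * FROM employee_directory ORDER BY id"
).fetchall()
print(directory_rows)
```

A user who is only given access to employee_directory works with the same underlying data as the payroll application, but never sees the salary column.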
However, a DBMS must perform additional work to provide these advantages, which
introduces overhead. A DBMS will use more memory and CPU than a simple file
storage system, and different types of DBMSes will require different types and levels of
system resources.
A DBMS provides data independence: changes to storage mechanisms and formats
can be made without modifying the entire application. There are four main types of
database organization:
Relational Database: Data is organized as tables in a relational database system.
Operations such as "select" and "join" can be performed
on these tables. This is the most widely used system of database organization.
Flat Database: Data is organized in a single kind of record with a fixed number of
fields. This database type encounters more errors due to the repetitive nature of data.
Object-Oriented Database: Data is organized with similarity to object-oriented
programming concepts. An object consists of data and methods, while classes group
objects having similar data and methods.
Hierarchical Database: Data is organized with hierarchical relationships. It becomes
a complex network if the one-to-many relationship is violated.
1. The data are combined to form operational units to minimise the duplication of data and
increase access to all data in the data base.
2. The data base can be extended, so that more data and programs can be added to the system.
3. The capacity to store the large amount of data necessary for users' needs. The data are stored on
direct-access devices for online support.
4. The controls in the system limit access to the data base files and protect the
confidentiality of all data in these files.
5. The capacity to interrogate data files, retrieve and modify data, and record the
changes.
The objectives of a data base management system are to facilitate the creation of data
structures and relieve the programmer of the problems of setting up complicated files.
The objectives of DBMS can be narrated as follows:
Every computer application has unique requirements. For example, special purpose
software systems that handle personnel, inventory, and marketing data may differ not only
in the type of information they store, but also in the facilities they provide for data entry
and retrieval.
The cost of designing and building special purpose software systems for Data management
tasks often prohibits otherwise cost effective automation. Data base management systems
are general purpose programs that dramatically reduce the time necessary to computerise
an application.
2. A mechanism for accessing data that provides a measure of data independence, i.e., to
some extent it insulates application programs from changes to the data structure.
3. Creating program and data independence. Either one can be altered independently of the
other.
5. Providing security to the user's data. Access is limited to authorized users by passwords
or similar schemes.
6. Reducing physical storage requirements by separating the logical and physical aspects of
the data base.
Merits of DBMS:
1. Integrity
2. Security
3. Data independence
4. Shared data
5. Conflict resolution
6. Reduction of redundancies.
1. Integrity:
Centralised control can also ensure that adequate checks are incorporated in the DBMS to
provide data integrity. Data integrity means that the data contained in the data base is both
accurate and consistent. Therefore, data values being entered for storage could be checked
to ensure that they fall within a specified range and are of the correct format.
For example, the value for the age of an employee may be required to fall in the range 16 to 75.
Another integrity check that should be incorporated in the data base is to ensure that if
there is a reference to a certain object, that object must exist. In the case of an automatic teller
machine, for example, a user is not allowed to transfer funds from a non-existent savings
account to a checking account.
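The age range check can be sketched as a CHECK constraint, here using Python's built-in sqlite3 module. The employee table and the try_insert helper are hypothetical; only the 16 to 75 range comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause asks the DBMS itself to reject out-of-range ages.
conn.execute(
    """
    CREATE TABLE employee (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER NOT NULL CHECK (age BETWEEN 16 AND 75)
    )
    """
)


def try_insert(name, age):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO employee (name, age) VALUES (?, ?)", (name, age)
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("Ada", 34)   # inside the range
rejected = try_insert("Ben", 12)   # outside the range
print(accepted, rejected)
```

Because the rule lives in the schema rather than in each application, every program that writes to the table is subject to the same check.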
2. Security:
Different levels of security could be implemented for various types of data and operations.
The enforcement of security could be data value dependent (e.g., a manager has access to
the salary details of employees in his department only), as well as data type dependent (but
the manager cannot access the medical history of any employee, including those in his
department).
3. Data Independence:
Data independence is usually considered from two points of view: physical data
independence and logical data independence. Physical data independence allows changes
in the physical storage devices or organisation of the files to be made without requiring
changes in the conceptual view or any of the external views and hence in the application
programs using the data base.
Thus, the files may migrate from one type of physical media to another or the file structure
may change without any need for changes in the application programs. Logical data
independence implies that application programs need not be changed if fields are added to
an existing record; nor do they have to be changed if fields not used by application programs
are deleted.
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. Data independence is advantageous in the data base
environment since it allows for changes at one level of the data base without affecting other
levels. These changes are absorbed by the mappings between the levels.
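One aspect of logical data independence, adding a field to an existing record without changing the programs that do not use it, can be sketched as follows. This is a minimal illustration with Python's sqlite3 module; the employee table, the phone column and the name_and_department function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ada', 'Payroll')")


def name_and_department(conn, employee_id):
    """Application code: it names only the fields it actually needs."""
    return conn.execute(
        "SELECT name, department FROM employee WHERE id = ?", (employee_id,)
    ).fetchone()


before = name_and_department(conn, 1)

# The record type gains a new field; the application above is not edited.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")

after = name_and_department(conn, 1)
print(before, after)
```

The application returns the same result before and after the change because it asks the DBMS for named fields instead of depending on the stored record layout.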
4. Shared Data:
A data base allows the sharing of data under its control by any number of application
programs or users. In the example discussed earlier, the applications for the public relations
and payroll departments could share the data contained for the record type employee.
5. Conflict Resolution:
Since the data base is under the control of the data base administrator (DBA), the DBA should
resolve the conflicting requirements of various users and applications. In essence, the DBA
chooses the best file structure and access method to get optimal performance for the critical
applications, while permitting less critical applications to continue to use the data base,
albeit with a relatively slower response.
6. Reduction of Redundancies:
Centralised control of data by the DBA avoids unnecessary duplication of data and
effectively reduces the total amount of data storage required. It also eliminates the extra
processing necessary to trace the required data in a large mass of data.
Another advantage of avoiding duplication is the elimination of the inconsistencies that
tend to be present in redundant data files. Any redundancies that exist in the DBMS are
controlled and the system ensures that these multiple copies are consistent.
DISADVANTAGES OF DBMS:
Backup and recovery operations are very complex in a data base management system
(DBMS) environment, and this is especially evident in a concurrent multi-user data base system. A data
base system requires a certain amount of controlled redundancy and duplication to enable
access to related data items.
Decentralisation of the data base, that is, the replacement of a single centralised data
base by independent and co-operating distributed data bases, solves the problems that arise
from centralisation, namely failures and down time.
3. Cost of Software, Hardware and Migration:
For a well-designed and effective data base system, it is necessary to purchase and develop
the software, and the hardware has to be upgraded to allow for the extensive programs and
the work spaces required for their execution and storage. This involves considerable cost. An
additional cost is that of migration, that is, the shift from a traditional separate
application environment to an integrated application environment.
From my point of view, the basic objectives of a database system can be summarized as
below:
1. A database should act as a kind of medium to collect and store the incoming data in
an organized way. For example, in case of a relational database such as Oracle, the
main purpose of this database is to store the input data and organize them in terms
of attributes (columns) and tuples (rows) grouped into relations (tables).
2. A database, in addition to storing the input data, should allow for an efficient retrieval
of stored data as per user’s requirements.
3. A database should be implemented with various security features such that it ensures
high level of integrity, i.e., developing a trust for the users about their data stored in
the database.
4. A database should be highly scalable as the amount of data increases over time. In this
context, it should also be highly adaptable to changes with respect to business needs.
5. A database should be highly consistent despite the number of concurrent transactions
operating on the data stored in it. Further, it should also be highly durable so that it
prevents loss of data despite the loss of power.
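The consistency objective in point 5 is usually supported by transactions, which apply a group of changes completely or not at all. The sketch below is a single-connection illustration using Python's sqlite3 module; the account table and the transfer function are hypothetical, and it does not demonstrate concurrency or durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # debit violates CHECK, credit is undone
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 has already been executed when the debit is rejected, yet the rollback leaves both balances as they were after the first transfer.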
DBMS DATABASE MODELS
A Database model defines the logical design and structure of a database and defines how
data will be stored, accessed and updated in a database management system. While the
Relational Model is the most widely used database model, there are other models too:
Hierarchical Model
Network Model
Entity-relationship Model
Relational Model
HIERARCHICAL MODEL
This database model organises data into a tree-like structure, with a single root, to which
all the other data is linked. The hierarchy starts from the root data, and expands like a tree,
adding child nodes to the parent nodes.
In this model, a child node will only have a single parent node.
This model efficiently describes many real-world relationships, such as the index of a book, recipes,
etc.
In the hierarchical model, data is organised into a tree-like structure with a one-to-many
relationship between two different types of data. For example, one department can have
many courses, many professors and, of course, many students.
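The single-parent rule can be sketched with a small tree structure in Python. The Node class and the node names are illustrative assumptions, not part of any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Exactly one parent per node; None marks the root.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "Node") -> "Node":
        # Enforce the hierarchical rule: a child may have only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up to the root and return the path from root to this node."""
        parts = []
        node: Optional["Node"] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


department = Node("Computer Science")
courses = department.add_child(Node("Courses"))
professors = department.add_child(Node("Professors"))
databases = courses.add_child(Node("Databases"))
print(databases.path())
```

Calling professors.add_child(databases) raises ValueError, because the Databases node already belongs to Courses; this is the restriction that the network model relaxes.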
NETWORK MODEL
This is an extension of the hierarchical model. In this model data is organised more like a
graph, and a child node is allowed to have more than one parent node.
In this database model data is more interrelated, as more relationships are established between
records. Because the data is more interrelated, accessing it is also easier and
faster. This database model was used to map many-to-many data relationships.
This was the most widely used database model, before Relational Model was introduced.
ENTITY-RELATIONSHIP MODEL
In this database model, relationships are created by dividing the object of interest into an entity
and its characteristics into attributes.
E-R models represent these relationships in pictorial form to make it easier
for different stakeholders to understand.
This model is good for designing a database, which can then be turned into tables in the relational
model (explained below).
Let's take an example. If we have to design a School Database, then Student will be an entity
with attributes name, age, address, etc. As Address is generally complex, it can be another
entity with attributes street name, pincode, city, etc., and there will be a relationship between
them.
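The School Database example can be sketched in Python with two small classes, one per entity. The field names follow the attributes listed above; the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Address:
    # Address is modelled as its own entity because it has several attributes.
    street_name: str
    pincode: str
    city: str


@dataclass
class Student:
    name: str
    age: int
    # The relationship between the two entities: a student has an address.
    address: Address


student = Student(
    name="Asha",
    age=17,
    address=Address(street_name="12 Market Road", pincode="560001", city="Bengaluru"),
)
print(student.address.city)
```

When the design is later turned into relational tables, each class would typically become a table and the address field a reference from one table to the other.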
Relationships can also be of different types.
RELATIONAL MODEL
This model was introduced by E.F. Codd in 1970, and since then it has become the most widely
used database model around the world.
The basic structure of data in the relational model is tables. All the information related to a
particular type is stored in rows of that table.
In the coming tutorials we will learn how to design tables, normalize them to reduce data
redundancy and how to use Structured Query Language to access data from tables.
DATABASE MODELS
NETWORK MODEL
The popularity of the network data model coincided with the popularity of the hierarchical
data model. Some data were more naturally modeled with more than one parent per child.
So, the network model permitted the modeling of many-to-many relationships in data. In
1971, the Conference on Data Systems Languages (CODASYL) formally defined the network
model. The basic data modeling construct in the network model is the set construct. A set
consists of an owner record type, a set name, and a member record type. A member record
type can have that role in more than one set, hence the multiparent concept is supported.
An owner record type can also be a member or owner in another set. The data model is a
simple network, and link and intersection record types (called junction records by IDMS)
may exist, as well as sets between them. Thus, the complete network of relationships is
represented by several pairwise sets; in each set one record type is the owner (at the
tail of the network arrow) and one or more record types are members (at the head of the
relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The
CODASYL network model is based on mathematical set theory.
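The set construct can be sketched in Python as a triple of set name, owner record type and member record type. The record types Supplier, Part and Shipment and the set names are hypothetical; Shipment plays the role of a link record between the other two.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SetType:
    name: str    # set name
    owner: str   # owner record type (the "1" side)
    member: str  # member record type (the "M" side)


# Shipment is a member in two sets, so it has two kinds of parent.
set_types = [
    SetType(name="SUPPLIES", owner="Supplier", member="Shipment"),
    SetType(name="SHIPPED_AS", owner="Part", member="Shipment"),
]


def owners_of(record_type: str, sets: List[SetType]) -> List[str]:
    """List every owner record type of the given member record type."""
    return sorted(s.owner for s in sets if s.member == record_type)


shipment_owners = owners_of("Shipment", set_types)
print(shipment_owners)
```

Two 1:M sets that share the Shipment member record together represent a many-to-many relationship between suppliers and parts.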
HIERARCHICAL MODEL
The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent
and child data segments. This structure implies that a record can have repeating
information, generally in the child data segments. Data is stored in a series of records, each of which has a
set of field values attached to it. The model collects all the instances of a specific record together as a
record type. These record types are the equivalent of tables in the relational model,
with the individual records being the equivalent of rows. To create links between these
record types, the hierarchical model uses parent-child relationships. These are a 1:N
mapping between record types. This is done by using trees which, like the set theory used in the
relational model, are "borrowed" from maths. For example, an organization might store
information about an employee, such as name, employee number, department, salary. The
organization might also store information about an employee's children, such as name and
date of birth. The employee and children data forms a hierarchy, where the employee data
represents the parent segment and the children data represents the child segment. If an
employee has three children, then there would be three child segments associated with one
employee segment. In a hierarchical database the parent-child relationship is one to many.
This restricts a child segment to having only one parent segment. Hierarchical DBMSs were
popular from the late 1960s, with the introduction of IBM's Information Management
System (IMS) DBMS, through the 1970s.
RELATIONAL MODEL
Certain fields may be designated as keys, which means that searches for specific values of
that field will use indexing to speed them up. Where fields in two different tables take
values from the same set, a join operation can be performed to select related records in
the two tables by matching values in those fields. Often, but not always, the fields will have
the same name in both tables. For example, an "orders" table might contain (customer-ID,
product-code) pairs and a "products" table might contain (product-code, price) pairs so to
calculate a given customer's bill you would sum the prices of all products ordered by that
customer by joining on the product-code fields of the two tables. This can be extended to
joining multiple tables on multiple fields. Because these relationships are only specified at
retrieval time, relational databases are classed as dynamic database management systems.
The relational database model is based on relational algebra.
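The customer bill example can be sketched with Python's built-in sqlite3 module. The table and column names follow the text (with underscores in place of hyphens); the rows and the bill_for function are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, product_code TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)", [("P1", 10.0), ("P2", 2.5)]
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "P1"), (1, "P2"), (1, "P2"), (2, "P1")],
)


def bill_for(conn, customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


print(bill_for(conn, 1))
print(bill_for(conn, 2))
```

The relationship between the two tables is stated only in the query, through the ON clause, which is what "specified at retrieval time" means in practice.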
OBJECT-ORIENTED MODEL
Object DBMSs add database functionality to object programming languages. They bring
much more than persistent storage of programming language objects. Object DBMSs
extend the semantics of the C++, Smalltalk and Java object programming languages to
provide full-featured database programming capability, while retaining native language
compatibility. A major benefit of this approach is the unification of the application and
database development into a seamless data model and language environment. As a result,
applications require less code, use more natural data modeling, and code bases are easier
to maintain. Object developers can write complete database applications with a modest
amount of additional effort.
This one-to-one mapping of object programming language objects to database objects has
two benefits over other storage approaches: it provides higher performance management
of objects, and it enables better management of the complex interrelationships between
objects. This makes object DBMSs better suited to support applications such as financial
portfolio risk analysis systems, telecommunications service applications, World Wide Web
document structures, design and manufacturing systems, and hospital patient record
systems, which have complex relationships between data.
The table name and column names are helpful to interpret the meaning of values in each
row. The data are represented as a set of relations. In the relational model, data are stored
as tables. However, the physical storage of the data is independent of the way the data are
logically organized.
Operations in Relational Model
Best Practices for creating a Relational Model
Advantages of using Relational model
Disadvantages of using Relational model
1. Attribute: Each column in a table. Attributes are the properties which define a
relation, e.g., Student_Rollno, NAME, etc.
2. Table: In the relational model, relations are saved in table format. A table is
stored along with its entities. A table has two properties, rows and columns. Rows
represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: A relation schema represents the name of the relation with its
attributes.
5. Degree: The total number of attributes in the relation is called the degree of the
relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance: A relation instance is a finite set of tuples in the RDBMS.
Relation instances never have duplicate tuples.
9. Relation key: Every row has one or more attributes that together identify it uniquely;
these are called the relation key.
10. Attribute domain: Every attribute has a pre-defined set of permitted values, its scope,
which is known as the attribute domain.
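Several of these terms can be made concrete with a few lines of Python, treating a relation instance as a set of tuples. The AGE attribute and the sample rows are hypothetical additions; Student_Rollno and NAME come from the list above.

```python
# Relation schema: the ordered attribute names of a Student relation.
schema = ("Student_Rollno", "NAME", "AGE")

# Relation instance: a finite set of tuples, so duplicates cannot occur.
instance = {(1, "Asha", 17), (2, "Ravi", 18), (3, "Mei", 17)}


def degree(schema):
    """Degree = number of attributes in the relation."""
    return len(schema)


def cardinality(instance):
    """Cardinality = number of tuples (rows) in the relation instance."""
    return len(instance)


def column(schema, instance, attribute):
    """Values of one attribute, listed in tuple order (sorted by roll number)."""
    index = schema.index(attribute)
    return [row[index] for row in sorted(instance)]


# Adding a tuple that is already present leaves the instance unchanged.
instance.add((1, "Asha", 17))

print(degree(schema), cardinality(instance))
print(column(schema, instance, "NAME"))
```

Using a Python set mirrors the rule that a relation instance never holds duplicate tuples: the repeated add has no effect on the cardinality.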
RELATIONAL INTEGRITY CONSTRAINTS
Relational integrity constraints are conditions which must hold for a valid
relation. These integrity constraints are derived from the rules in the mini-world that the
database represents.
There are many types of integrity constraints. Constraints in a relational database
management system are mostly divided into three main categories:
1. Domain constraints
2. Key constraints
3. Referential integrity constraints
Domain Constraints
Domain constraints specify that, within each tuple, the value of each attribute must be
a value from the domain of that attribute. The domain is specified as a data type; standard data types include integers, real
numbers, characters, Booleans, variable length strings, etc.
Example:
The example shown demonstrates creating a domain constraint such that CustomerName
is not NULL
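A minimal sketch of such a constraint, assuming Python's built-in sqlite3 module and a Customer table whose Status column name is a guess made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL restricts the domain of CustomerName: a missing value is not allowed.
conn.execute(
    """
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerName TEXT NOT NULL,
        Status       TEXT
    )
    """
)

try:
    conn.execute(
        "INSERT INTO Customer (CustomerID, CustomerName, Status) "
        "VALUES (1, NULL, 'Active')"
    )
    null_rejected = False
except sqlite3.IntegrityError:
    # The DBMS refuses the row because CustomerName is NULL.
    null_rejected = True

print(null_rejected)
```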
Key constraints
An attribute that can uniquely identify a tuple in a relation is called the key of the table. The
value of the attribute for different tuples in the relation has to be unique.
Example:
In the given table, CustomerID is a key attribute of the Customer table. Each customer has
a single key value: CustomerID = 1 belongs only to CustomerName = "Google".
CustomerID  CustomerName  Status
1           Google        Active
2           Amazon        Active
3           Apple         Inactive
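The key constraint on this table can be sketched with Python's sqlite3 module by declaring CustomerID as the primary key. The Status column name, the insert_customer helper and the extra "Microsoft" row are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,  -- the key: must be unique per row
        CustomerName TEXT NOT NULL,
        Status       TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Google", "Active"), (2, "Amazon", "Active"), (3, "Apple", "Inactive")],
)


def insert_customer(conn, customer_id, name, status):
    """Return True if the row was stored, False if the key already exists."""
    try:
        conn.execute(
            "INSERT INTO Customer VALUES (?, ?, ?)", (customer_id, name, status)
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = insert_customer(conn, 1, "Microsoft", "Active")  # key 1 is taken
new_ok = insert_customer(conn, 4, "Microsoft", "Active")        # key 4 is free
print(duplicate_ok, new_ok)
```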
Referential integrity constraints
Referential integrity constraints are based on the concept of foreign keys. A foreign key is an
attribute of a relation which refers to a key attribute of another relation.
A referential integrity constraint applies where a relation refers to a key attribute of a
different or the same relation; the referenced key value must exist in the table being referred to.
Example:
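As a rough illustration, the sketch below uses Python's sqlite3 module with a hypothetical Billing table whose CustomerID column is a foreign key into Customer. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE Billing (
        InvoiceNo  INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
        Amount     INTEGER
    )
    """
)
conn.execute("INSERT INTO Customer VALUES (1, 'Google')")


def add_invoice(conn, invoice_no, customer_id, amount):
    """Return True if stored, False if the referenced customer does not exist."""
    try:
        conn.execute(
            "INSERT INTO Billing VALUES (?, ?, ?)", (invoice_no, customer_id, amount)
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid = add_invoice(conn, 1, 1, 300)    # customer 1 exists
orphan = add_invoice(conn, 2, 99, 100)  # customer 99 does not exist
print(valid, orphan)
```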
Operations in Relational Model
The four basic operations performed on a relational database are insert, update, delete and select.
Whenever one of these operations is applied, integrity constraints specified on the
relational database schema must never be violated.
Insert Operation
The insert operation gives values of the attribute for a new tuple which should be inserted
into a relation.
Update Operation
You can see that in the below-given relation table CustomerName= 'Apple' is updated from
Inactive to Active.
Delete Operation
To specify deletion, a condition on the attributes of the relation selects the tuple to be
deleted.
The Delete operation could violate referential integrity if the tuple which is deleted is
referenced by foreign keys from other tuples in the same database.
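The insert, update and delete operations described above can be sketched with Python's sqlite3 module, using select statements to observe the results. The Status column name and the Billing table are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Customer ("
    "CustomerID INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL, Status TEXT)"
)
conn.execute(
    "CREATE TABLE Billing ("
    "InvoiceNo INTEGER PRIMARY KEY, "
    "CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID))"
)

# Insert: supply attribute values for new tuples.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Google", "Active"), (2, "Amazon", "Active"), (3, "Apple", "Inactive")],
)
conn.execute("INSERT INTO Billing VALUES (1, 1)")  # invoice for customer 1

# Update: change Apple from Inactive to Active.
conn.execute("UPDATE Customer SET Status = 'Active' WHERE CustomerName = 'Apple'")
apple_status = conn.execute(
    "SELECT Status FROM Customer WHERE CustomerName = 'Apple'"
).fetchone()[0]

# Delete: a condition on the attributes selects the tuple to remove.
conn.execute("DELETE FROM Customer WHERE CustomerID = 2")

# Deleting a tuple that a foreign key still references is refused.
try:
    conn.execute("DELETE FROM Customer WHERE CustomerID = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Select: read back what remains.
remaining = [
    row[0]
    for row in conn.execute("SELECT CustomerName FROM Customer ORDER BY CustomerID")
]
print(apple_status, delete_blocked, remaining)
```

Customer 2 has no invoices and is deleted without complaint, while customer 1 is kept because a Billing row still refers to it.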
Select Operation
Best Practices for creating a Relational Model
The values of an attribute should be from the same domain.
Simplicity: A relational data model is simpler than the hierarchical and network
model.
Structural Independence: The relational database is only concerned with data and not
with a structure. This can improve the performance of the model.
Easy to use: The relational model is easy to use, as tables consisting of rows and columns
are quite natural and simple to understand.
Query capability: It makes it possible for a high-level query language like SQL to avoid
complex database navigation.
Data independence: The structure of a database can be changed without having to
change any application.
Scalable: A database can be enlarged, in both the number of records (rows) and the number
of fields, to enhance its usability.
A few relational databases have limits on field lengths which can't be exceeded.
Relational databases can sometimes become complex as the amount of data grows,
and the relations between pieces of data become more complicated.
Complex relational database systems may lead to isolated databases where the
information cannot be shared from one system to another.
The relational database model was a huge leap forward from the network database model.
Instead of relying on a parent-child or owner-member relationship, the relational model
allows any file to be related to any other by means of a common field. Suddenly, the
complexity of the design was greatly reduced because changes could be made to the
database schema without affecting the system's ability to access data. And because access
was not by means of paths to and from files, but from a direct relationship between files,
new relations between these files could easily be added.
In 1970, when E.F. Codd developed the model, it was thought to be impractical. The
increased ease of use came at a large performance penalty, and the hardware in those days
was not able to implement the model. Since then, of course, hardware has taken huge strides,
to the point where today even the simplest computers can run sophisticated relational
database management systems.
Relational databases go hand-in-hand with the development of SQL. The simplicity of SQL -
where even a novice can learn to perform basic queries in a short period of time - is a large
part of the reason for the popularity of the relational model.
The two tables below relate to each other through the product_code field. Any two tables
can relate to each other simply by creating a field they have in common.
Table 1
Product_code Description Price
Table 2
Invoice_code Invoice_line Product_code Quantity
3804 1 A416 10
3804 2 C923 15
BASIC CONCEPTS OF ER MODEL IN DBMS
The main data objects are termed Entities, and their details are defined as attributes. Some
of these attributes are important and are used to identify the entity, and different entities
are related using relationships.
Let's take an example to explain everything. For a School Management Software, we will
have to store Student information, Teacher information, Classes, Subjects taught in each
class etc.
Considering the above example, Student is an entity, Teacher is an entity, similarly, Class,
Subject etc are also entities.
An Entity is generally a real-world object which has characteristics and holds relationships
in a DBMS.
If a Student is an Entity, then the complete dataset of all the students will be the Entity Set
ER Model: Attributes
If a Student is an Entity, then student's roll no., student's name, student's age, student's
gender etc will be its attributes.
An attribute can be of many types, here are different types of attributes defined in ER
database model:
1. Simple attribute: The attributes with values that are atomic and cannot be broken
down further are simple attributes. For example, student's age.
2. Composite attribute: A composite attribute is made up of more than one simple
attribute. For example, student's address will contain, house no., street name, pincode
etc.
3. Derived attribute: These are attributes which are not stored in the database, but are
derived using other attributes. For example, the average age of students in a class.
4. Single-valued attribute: As the name suggests, these have a single value.
5. Multi-valued attribute: These can have multiple values.
ER MODEL: KEYS
If the attribute roll no. can uniquely identify a student entity, amongst all the students, then
the attribute roll no. will be said to be a key.
1. Super Key
2. Candidate Key
3. Primary Key
ER Model: Relationships
When an Entity is related to another Entity, they are said to have a relationship. For
example, a Class entity is related to the Student entity, because students study in classes;
hence this is a relationship.
For example, if 2 entities are involved, it is said to be Binary relationship, if 3 entities are
involved, it is said to be Ternary relationship, and so on.
An ER Diagram is a visual representation of data that describes how data is related to each
other. In the ER Model, we break data down into entities and attributes and set up
relationships between entities; all of this can be represented visually using the ER diagram.
For example, in the below diagram, anyone can see and understand what the diagram wants
to convey: Developer develops a website, whereas a Visitor visits a website.
COMPONENTS OF ER DIAGRAM
Entity, Attributes, Relationships etc. form the components of an ER Diagram, and there are
defined symbols and shapes to represent each one of them.
Entity
Attributes for any Entity
Weak Entity
To represent a Key attribute, the attribute name inside the Ellipse is underlined.
Derived Attribute for any Entity
Derived attributes are those which are derived based on other attributes, for example, age
can be derived from date of birth.
To represent a derived attribute, another dotted ellipse is created inside the main ellipse.
Double Ellipse, one inside another, represents the attribute which can have multiple values.
ER Diagram: Entity
An Entity can be any object, place, person or class. In ER Diagram, an entity is represented
using rectangles. Consider an example of an Organisation- Employee, Manager, Department,
Product and many more can be taken as entities in an Organisation.
A weak entity is an entity that depends on another entity. A weak entity doesn't have any key
attribute of its own. A double rectangle is used to represent a weak entity.
ER Diagram: Attribute
ER Diagram: Composite Attribute
An attribute can also have its own attributes. Such attributes are known as Composite
attributes.
ER Diagram: Relationship
There are three types of relationship that exist between Entities.
1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship
Binary Relationship means relation between two Entities. This is further divided into three
types.
The above example describes that one student can enroll only for one course and a course
will also have only one Student. This is not what you will usually see in real-world
relationships.
The below example showcases this relationship, which means that 1 student can opt for
many courses, but a course can only have 1 student. Sounds weird! This is how it is.
Many to One Relationship
It reflects the business rule that many entities can be associated with just one entity. For
example, a Student enrolls for only one Course but a Course can have many Students.
The above diagram represents that one student can enroll for more than one course, and
a course can have more than 1 student enrolled in it.
A Ternary relationship involves three entities. In such relationships we always consider two
entities together and then look upon the third.
For example, in the diagram above, we have three related entities, Company, Product and
Sector. To understand the relationship better or to define rules around the model, we
should relate two entities and then derive the third one.
A Company produces many Products/ each product is produced by exactly one company.
A Company operates in only one Sector / each sector has many companies operating in it.
Considering the above two rules or relationships, we see that although the complete
relationship involves three entities, we are looking at two entities at a time.
Normalization of Database
Normalization is a multi-step process that puts data into tabular form, removing duplicated
data from the relation tables.
If a table is not properly normalized and has data redundancy, then it will not only eat up
extra memory space but will also make it difficult to handle and update the database
without facing data loss. Insertion, Updation and Deletion Anomalies are very frequent if a
database is not normalized. To understand these anomalies let us take an example of a
Student table.
In the table above, we have data of 4 Computer Sci. students. As we can see, data for the
fields branch, hod(Head of Department) and office_tel is repeated for the students who are
in the same branch in the college, this is Data Redundancy.
Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a branch, data of the
student cannot be inserted, or else we will have to set the branch information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information
will be repeated for all those 100 students.
Updation Anomaly
What if Mr. X leaves the college? or is no longer the HOD of computer science department?
In that case all the student records will have to be updated, and if by mistake we miss any
record, it will lead to data inconsistency. This is Updation anomaly.
Deletion Anomaly
In our Student table, two different kinds of information are kept together: Student
information and Branch information. Hence, at the end of the academic year, if student
records are deleted, we will also lose the branch information. This is Deletion anomaly.
NORMALIZATION RULE
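For a table to be in the First Normal Form, it should follow the following 4 rules:
1. It should only have single (atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. The order in which data is stored does not matter.
The First Normal Form is discussed in detail later in this handout, as are Partial Dependency
and how to normalize a table to the 2nd Normal Form.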
Third Normal Form (3NF)
A table is in the Third Normal Form when it is in the Second Normal Form and has no
transitive dependency. The Third Normal Form is covered in detail later in this handout; we
suggest you first study the Second Normal Form.
Boyce and Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a stricter version of the Third Normal Form. This form deals
with a certain type of anomaly that is not handled by 3NF. A 3NF table which does not have
multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF, the
following conditions must be satisfied:
1. It should be in the Third Normal Form.
2. For any dependency A → B, A should be a super key.
BCNF is explained in detail, with an easy to understand example, later in this handout.
Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when:
1. It is in the Boyce-Codd Normal Form.
2. And, it doesn't have Multi-Valued Dependency.
We suggest you understand the other normal forms before you study the Fourth Normal
Form.
The 1st (First) Normal Form is Step 1 of the Normalization process. The 1st Normal Form
expects you to design your table in such a way that it can easily be extended and it is easier
for you to retrieve data from it whenever required.
As seen earlier, data redundancy or repetition can lead to several issues like Insertion,
Deletion and Updation anomalies, and Normalization can reduce data redundancy and make
the data more meaningful.
If the tables in a database are not even in the 1st Normal Form, it is considered bad database
design.
The first normal form expects you to follow a few simple rules while designing your
database, and they are:
Rule 1: Single Valued Attributes
Each column of your table should be single valued, which means it should not contain
multiple values. We will explain this with the help of an example later; let's see the other
rules for now.
Rule 2: Attribute Domain should not change
This is more of a "Common Sense" rule. In each column the values stored must be of the
same kind or type.
For example: If you have a column dob to save the dates of birth of a set of people, then you
must not save the 'names' of some of them in that column along with the 'date of birth' of
others. It should hold only 'date of birth' for all the records/rows.
Rule 3: Unique name for Attributes/Columns
This rule expects that each column in a table should have a unique name. This is to avoid
confusion at the time of retrieving data or performing any other operation on the stored
data.
If two or more columns have the same name, then the DBMS will be left confused.
Rule 4: Order doesn't matter
This rule says that the order in which you store the data in your table doesn't matter.
Time for an Example
Although all the rules are self explanatory still let's take an example where we will create a
table to store student data which will have student's roll no., their name and the name of
subjects they have opted for.
Our table already satisfies 3 rules out of the 4 rules, as all our column names are unique, we
have stored data in the order we wanted to and we have not inter-mixed different type of
data in columns.
But out of the 3 different students in our table, 2 have opted for more than 1 subject, and
we have stored the subject names in a single column. As per the 1st Normal Form, each
column must contain an atomic value.
It's very simple, because all we have to do is break the values into atomic values.
Here is our updated table and it now satisfies the First Normal Form.
101 Akon OS
101 Akon CN
102 Bkon C
By doing so, although a few values are getting repeated, the values for the subject column
are now atomic for each record/row.
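The step of breaking multi-valued cells into atomic values can be sketched in a few lines of Python. The starting rows below are illustrative assumptions about how the table looked before normalization (the row for roll no. 103 in particular is made up), and to_first_normal_form is a hypothetical helper name.

```python
# Rows before normalization: the subject column holds several values at once.
unnormalized = [
    (101, "Akon", "OS, CN"),
    (102, "Bkon", "C, C++"),
    (103, "Ckon", "Java"),  # assumed third student with a single subject
]


def to_first_normal_form(rows):
    """Return one row per (roll_no, name, subject) so every value is atomic."""
    result = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            result.append((roll_no, name, subject.strip()))
    return result


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

Each student now appears once per subject, which repeats the roll no. and name but leaves no cell holding more than one value.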
Using the First Normal Form, data redundancy increases, as there will be many columns
with same data in multiple rows but each row as a whole will be unique.
For a table to be in the Second Normal Form, it must satisfy two conditions:
1. The table should be in the First Normal Form.
2. There should be no Partial Dependency.
What is Partial Dependency? Do not worry about it. First let's understand what
Dependency in a table is.
What is Dependency?
Let's take an example of a Student table with columns student_id, name, reg_no(registration
number), branch and address(student's home address).
In this table, student_id is the primary key and will be unique for every row, hence we can
use student_id to fetch any row of data from this table.
Even in a case where student names are the same, if we know the student_id we can easily
fetch the correct record.
Hence we can say that a Primary Key for a table is the column or group of columns (composite
key) which can uniquely identify each record in the table.
I can ask for the branch name of the student with student_id 10, and I can get it. Similarly, if
I ask for the name of the student with student_id 10 or 11, I will get it. So all I need is
student_id, and every other column depends on it, or can be fetched using it.
What is Partial Dependency?
Now that we know what dependency is, we are in a better state to understand what partial
dependency is.
For a simple table like Student, a single column like student_id can uniquely identify all the
records in a table.
But this is not true all the time. So now let's extend our example to see if more than 1 column
together can act as a primary key.
Let's create another table for Subject, which will have subject_id and subject_name fields
and subject_id will be the primary key.
subject_id subject_name
1 Java
2 C++
3 Php
Now we have a Student table with student information and another table Subject for storing
subject information.
Let's create another table Score, to store the marks obtained by students in the respective
subjects. We will also be saving name of the teacher who teaches that subject along with
marks.
score_id student_id subject_id marks teacher
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the Score table we save the student_id to know which student's marks these are, and the
subject_id to know which subject the marks are for.
Together, student_id + subject_id forms a Candidate Key for this table, which can be the
Primary key.
See, if I ask you to get me marks of student with student_id 10, can you get it from this table?
No, because you don't know for which subject. And if I give you subject_id, you would not
know for which student. Hence we need student_id + subject_id to uniquely identify any
row.
Now if you look at the Score table, we have a column named teacher which is only dependent
on the subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
As we just discussed, the primary key for this table is a composition of two columns,
student_id and subject_id, but the teacher's name depends only on the subject, hence on
subject_id, and has nothing to do with student_id.
This is Partial Dependency, where an attribute in a table depends on only a part of the
primary key and not on the whole key.
How to remove Partial Dependency?
There can be many different solutions for this, but our objective is to remove the teacher's
name from the Score table.
The simplest solution is to remove the teacher column from the Score table and add it to the
Subject table. Hence, the Subject table will become:
And our Score table is now in the second normal form, with no partial dependency.
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
Quick Recap
1. For a table to be in the Second Normal form, it should be in the First Normal form and
it should not have Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in the
table depends only on a part of the primary key and not on the complete primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute which is
causing partial dependency, and move it to some other table where it fits in well.
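The decomposition in this recap can be tried out with Python's sqlite3 module. The sketch below assumes the column types shown and adds a UNIQUE constraint on (student_id, subject_id) to reflect the candidate key; both are illustrative choices rather than something the text specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Subject (
        subject_id   INTEGER PRIMARY KEY,
        subject_name TEXT,
        teacher      TEXT  -- depends on subject_id alone, so it is stored here
    );
    CREATE TABLE Score (
        score_id   INTEGER PRIMARY KEY,
        student_id INTEGER,
        subject_id INTEGER REFERENCES Subject(subject_id),
        marks      INTEGER,
        UNIQUE (student_id, subject_id)  -- the candidate key from the text
    );
    INSERT INTO Subject VALUES
        (1, 'Java', 'Java Teacher'), (2, 'C++', 'C++ Teacher'), (3, 'Php', 'Php Teacher');
    INSERT INTO Score VALUES (1, 10, 1, 70), (2, 10, 2, 75), (3, 11, 1, 80);
    """
)

# The teacher's name is no longer repeated in Score; a join brings it back.
rows = conn.execute(
    """
    SELECT sc.student_id, su.subject_name, sc.marks, su.teacher
    FROM Score AS sc
    JOIN Subject AS su ON su.subject_id = sc.subject_id
    ORDER BY sc.score_id
    """
).fetchall()
for row in rows:
    print(row)
```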
Third Normal Form is an upgrade to Second Normal Form. When a table is in the Second
Normal Form and has no transitive dependency, then it is in the Third Normal Form.
We have already seen the Second Normal Form and normalized our Score table into the 2nd
Normal Form.
So let's use the same example, where we have 3 tables: Student, Subject and Score.
Student Table
student_id name reg_no branch address
Subject Table
subject_id subject_name teacher
3 Php Php Teacher
Score Table
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the exam name and
total marks, so let's add 2 more columns to the Score table.
What is Transitive Dependency?
With exam_name and total_marks added to our Score table, it saves more data now. Primary
key for our Score table is a composite key, which means it's made up of two attributes or
columns → student_id + subject_id.
Our new column exam_name depends on both student and subject. For example, a
mechanical engineering student will have a Workshop exam but a computer science student
won't. And for some subjects you have Practical exams and for some you don't. So we can say
that exam_name is dependent on both student_id and subject_id.
And what about our second new column total_marks? Does it depend on our Score table's
primary key?
Well, the column total_marks depends on exam_name as with exam type the total score
changes. For example, practicals are of less marks while theory exams are of more marks.
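But exam_name is just another column in the Score table. It is not a primary key or even a
part of the primary key, and total_marks depends on it. This is Transitive Dependency: a
non-prime attribute depends on another non-prime attribute rather than directly on the
primary key.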
Again the solution is very simple. Take out the columns exam_name and total_marks from
Score table and put them in an Exam table and use the exam_id wherever required.
The new Exam table
exam_id exam_name total_marks
1 Workshop 200
2 Mains 70
3 Practicals 30
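A small sketch of the resulting design, using Python's sqlite3 module. The Exam rows are the ones listed above; the exam_id assigned to each Score row is an assumption made only so that the join has something to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Exam (
        exam_id     INTEGER PRIMARY KEY,
        exam_name   TEXT,
        total_marks INTEGER  -- depends on the exam, so it is stored once here
    );
    CREATE TABLE Score (
        score_id   INTEGER PRIMARY KEY,
        student_id INTEGER,
        subject_id INTEGER,
        marks      INTEGER,
        exam_id    INTEGER REFERENCES Exam(exam_id)
    );
    INSERT INTO Exam VALUES (1, 'Workshop', 200), (2, 'Mains', 70), (3, 'Practicals', 30);
    -- The exam_id given to each score row is an assumption for illustration.
    INSERT INTO Score VALUES (1, 10, 1, 70, 2), (2, 10, 2, 75, 1), (3, 11, 1, 80, 1);
    """
)


def score_report():
    """Join Score with Exam to recover exam_name and total_marks per score."""
    return conn.execute(
        """
        SELECT s.score_id, s.marks, e.exam_name, e.total_marks
        FROM Score AS s
        JOIN Exam AS e ON e.exam_id = s.exam_id
        ORDER BY s.score_id
        """
    ).fetchall()


report = score_report()
for row in report:
    print(row)
```

Because total_marks now lives in one row per exam, changing it requires updating a single Exam row instead of every matching Score row.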
Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also
known as 3.5 Normal Form.
BCNF builds on the Third Normal Form and on removing transitive dependency from a table,
so we suggest you follow those sections before this one.
Rules for BCNF
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:
1. It should be in the Third Normal Form.
2. For any dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means that for a dependency
A → B, A cannot be a non-prime attribute if B is a prime attribute.
Below we have a college enrolment table with columns student_id, subject and professor.
103 C# P.Chash
As you can see, we have also added some sample data to the table.
One student can enrol for multiple subjects. For example, student with student_id
101, has opted for subjects - Java & C++
For each subject, a professor is assigned to the student.
And, there can be multiple professors teaching one subject like we have for Java.
Well, in the table above student_id, subject together form the primary key, because using
student_id and subject, we can find all the columns of the table.
One more important point to note here is, one professor teaches only one subject, but one
subject may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends
on the professor name.
This table satisfies the 1st Normal Form because all the values are atomic, column names
are unique and all the values stored in a particular column are of the same domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
And there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
In the table above, student_id and subject form the primary key, which means the subject
column is a prime attribute.
But there is one more dependency, professor → subject.
And while subject is a prime attribute, professor is a non-prime attribute, which is not
allowed by BCNF.
To make this relation(table) satisfy BCNF, we will decompose this table into two tables,
student table and professor table.
Student Table
student_id p_id
101 1
101 2
and so on...
Professor Table
p_id professor subject
1 P.Java Java
2 P.Cpp C++
and so on...
And now, this relation satisfies the Boyce-Codd Normal Form.
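The BCNF argument can be checked mechanically on sample data. In the Python sketch below, the enrolment rows are illustrative assumptions (only a few of the names appear in the text), determines is a hypothetical helper, and a check on sample rows only shows that a dependency holds in this data, not that it holds in general.

```python
# Sample (student_id, subject, professor) rows; assumed for illustration.
enrolment = [
    (101, "Java", "P.Java"),
    (101, "C++", "P.Cpp"),
    (102, "Java", "P.Java2"),
    (103, "C#", "P.Chash"),
    (104, "Java", "P.Java"),
]
STUDENT, SUBJECT, PROFESSOR = 0, 1, 2


def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# professor -> subject holds, but professor does not identify a whole row,
# so professor is not a super key and the table is not in BCNF.
fd_holds = determines(enrolment, [PROFESSOR], [SUBJECT])
professor_is_super_key = determines(enrolment, [PROFESSOR], [STUDENT, SUBJECT])
violates_bcnf = fd_holds and not professor_is_super_key

# Decompose into a professor table (p_id, professor, subject)
# and a student table (student_id, p_id).
pairs = sorted({(prof, subj) for _, subj, prof in enrolment})
professor_table = [(p_id, prof, subj) for p_id, (prof, subj) in enumerate(pairs, start=1)]
p_id_of = {prof: p_id for p_id, prof, _ in professor_table}
student_table = [(sid, p_id_of[prof]) for sid, _, prof in enrolment]

print(violates_bcnf)
print(professor_table)
print(student_table)
```

In the professor table, p_id determines every other column, so the dependency professor → subject no longer sits beside a composite key.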
INTRODUCTION TO SQL(STRUCTURED QUERY LANGUAGE)
Structured Query Language (SQL) is a database query language used for storing and
managing data in a Relational DBMS. SQL was the first commercial language introduced for
E.F. Codd's Relational model of database. Today almost all RDBMS (MySQL, Oracle, Informix,
Sybase, MS Access) use SQL as the standard database query language. SQL is used to
perform all types of data operations in an RDBMS.
SQL Command
DDL: Data Definition Language
This includes changes to the structure of the table, like creating a table, altering a table,
deleting a table etc.
In many database systems, such as Oracle and MySQL, all DDL commands are
auto-committed. That means they save all the changes permanently in the database.
Command Description
rename to rename a table
DML: Data Manipulation Language
DML commands are used for manipulating the data stored in the table and not the table
itself.
DML commands are not auto-committed. This means the changes are not permanent in the
database until they are committed, and they can be rolled back.
Command Description
TCL: Transaction Control Language
These commands keep a check on other commands and their effect on the database.
These commands can annul changes made by other commands by rolling the data back to
its original state. They can also make any temporary change permanent.
Command Description
savepoint to save temporarily
DCL: Data Control Language
Data control language consists of the commands to grant authority to, and take it back from,
any database user.
Command Description
DQL: Data Query Language
Data query language is used to fetch data from tables based on conditions that we can easily
apply.
Command Description
create is a DDL SQL command used to create a table or a database in relational database
management system.
CREATING A DATABASE
CREATE DATABASE Test;
The above command will create a database named Test, which will be an empty schema
without any table.
To create tables in this newly created database, we can again use the create command.
CREATING A TABLE
create command can also be used to create tables. Now when we create a table, we have to
specify the details of the columns of the tables too. We can specify the names and datatypes
of various columns in the create command itself.
create table command will tell the database system to create a new table with the given
table name and column information.
The above command will create a new table with name Student in the current database with
3 columns, namely student_id, name and age. The column student_id will only store
integers, name will hold up to 100 characters and age will again store only integer values.
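The command described here can be reconstructed from the column details given above and run with Python's sqlite3 module. This is a sketch under one caveat: SQLite records the declared types but does not itself enforce the 100-character limit, so treat it as an illustration of the syntax rather than of every DBMS's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create the Student table with the three columns described in the text.
conn.execute("CREATE TABLE Student (student_id INT, name VARCHAR(100), age INT)")

# PRAGMA table_info lists each column with its declared type.
columns = [
    (name, col_type)
    for _cid, name, col_type, *_rest in conn.execute("PRAGMA table_info(Student)")
]
print(columns)
```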
If you are not currently logged into the database in which you want to create the table, then
you can also add the database name along with the table name, using the dot operator (.).
For example, if we have a database with name Test and we want to create a table Student in
it, then we can do so using the following query:
CREATE TABLE Test.Student(student_id INT, name VARCHAR(100), age INT);
Here we have listed some of the most commonly used datatypes used for columns in tables.
Datatype Use
VARCHAR used for columns which will be used to store characters and integers, basically
a string.
CHAR used for columns which will store fixed-length character values (a single character
by default).
TEXT used for columns which will store text which is generally long in length. For
example, if you create a table for storing profile information of a social
networking website, then for the about me section you can have a column of type
TEXT.
DROP COMMAND
DROP command completely removes a table from the database. This command will also
destroy the table structure and the data stored in it. For example,
DROP TABLE Student;
The above query will delete the Student table completely. It can also be used on Databases,
to delete the complete database. For example, to drop a database,
DROP DATABASE Test;
The above query will drop the database with name Test from the system.
RENAME query
RENAME command is used to set a new name for any existing table. Following is the syntax
(as used in MySQL),
RENAME TABLE old_table_name TO new_table_name;
Data Manipulation Language (DML) statements are used for managing data in a database.
DML commands are not auto-committed. It means changes made by DML commands are not
permanent in the database until committed; they can be rolled back.
Talking about the Insert command, whenever we post a Tweet on Twitter, the text is stored
in some table, and as we post a new tweet, a new record gets inserted in that table.
INSERT command
Insert command is used to insert data into a table. Following is its general syntax,
Lets see an example,
The above command will insert a new record into student table.
101 Adam 15
We can use the INSERT command to insert values for only some specific columns of a row.
We can specify the column names along with the values to be inserted like this,
The above SQL query will only insert id and name values in the newly inserted record.
Both the statements below will insert NULL value into age column of the student table.
Or,
The above command will insert only two column values and the other column is set to null.
S_id S_Name age
101 Adam 15
102 Alex
101 Adam 15
102 Alex
103 chris 14
Suppose the column age in our table has a default value of 14.
Also, if you run the below query, it will insert default value into the age column, whatever
the default value may be.
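The three kinds of insert discussed in this section can be tried out with Python's sqlite3 module. The statements are a reconstruction from the result tables shown above; the column names s_id and name follow the UPDATE example later in the text, and the DEFAULT 14 clause encodes the assumed default value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT DEFAULT 14)")

# 1. Values for every column, in table order.
conn.execute("INSERT INTO student VALUES (101, 'Adam', 15)")

# 2. An explicit NULL for age: the default is not applied.
conn.execute("INSERT INTO student VALUES (102, 'Alex', NULL)")

# 3. Only some columns named: the omitted age column takes its default value.
conn.execute("INSERT INTO student (s_id, name) VALUES (103, 'chris')")

rows = conn.execute("SELECT s_id, name, age FROM student ORDER BY s_id").fetchall()
for row in rows:
    print(row)
```

Python shows the SQL NULL stored for Alex's age as None.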
Let's take an example of a real-world problem. These days, Facebook provides an option for
Editing your status update, how do you think it works? Yes, using the Update SQL command.
Let's learn about the syntax and usage of the UPDATE command.
UPDATE COMMAND
UPDATE command is used to update any record of data in a table. Following is its general
syntax,
101 Adam 15
102 Alex
103 chris 14
101 Adam 15
102 Alex 18
103 chris 14
In the above statement, if we do not use the WHERE clause, then our update query will
update age for all the rows of the table to 18.
We can also update values of multiple columns using a single UPDATE statement.
UPDATE student SET name='Abhi', age=17 where s_id=103;
The above command will update two columns of the record which has s_id 103.
101 Adam 15
102 Alex 18
103 Abhi 17
When we have to update any integer value in a table, then we can fetch and update the value
in the table in a single statement.
For example, if we have to update the age column of student table every year for every
student, then we can simply run the following UPDATE statement to perform the following
operation:
As you can see, we have used age = age + 1 to increment the value of age by 1.
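The UPDATE variants from this section can be run end to end with Python's sqlite3 module. The first and last statements are reconstructed from the result tables and the age = age + 1 expression mentioned in the text, so treat them as a sketch rather than the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", None), (103, "chris", 14)],
)


def all_rows():
    return conn.execute("SELECT s_id, name, age FROM student ORDER BY s_id").fetchall()


# Update one column of one record, selected with WHERE.
conn.execute("UPDATE student SET age = 18 WHERE s_id = 102")
# Update two columns of one record in a single statement.
conn.execute("UPDATE student SET name = 'Abhi', age = 17 WHERE s_id = 103")
after_where = all_rows()

# No WHERE clause: every row is updated, each age computed from its old value.
conn.execute("UPDATE student SET age = age + 1")
after_increment = all_rows()

print(after_where)
print(after_increment)
```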
When you ask any question in Studytonight's Forum it gets saved into a table. And using the
Delete option, you can even delete a question asked by you. How do you think that works?
Yes, using the Delete DML command.
Let's study about the syntax and the usage of the Delete command.
DELETE command
101 Adam 15
102 Alex 18
103 Abhi 17
The above command will delete all the records from the table student.
In our student table if we want to delete a single record, we can use the WHERE clause to
provide a condition in our DELETE statement.
The above command will delete the record where s_id is 103 from the table student.
101 Adam 15
102 Alex 18
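Both forms of DELETE can be checked with Python's sqlite3 module. The statements below are reconstructed from the descriptions in this section; the sketch runs the WHERE form first so that both results can be shown on the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 17)],
)


def all_rows():
    return conn.execute("SELECT s_id, name, age FROM student ORDER BY s_id").fetchall()


# Delete only the record that matches the condition.
conn.execute("DELETE FROM student WHERE s_id = 103")
after_where = all_rows()

# Without a WHERE clause every record is deleted; the empty table remains.
conn.execute("DELETE FROM student")
after_delete_all = all_rows()

print(after_where)
print(after_delete_all)
```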
TRUNCATE command is different from DELETE command. The DELETE command will delete
all the rows from a table, whereas the TRUNCATE command not only deletes all the records
stored in the table, but also re-initializes the table (like a newly created table).
For example: If you have a table with 10 rows and an auto_increment primary key, and you
use the DELETE command to delete all the rows, it will delete all the rows but will not
re-initialize the primary key; hence if you insert any row after using the DELETE command,
the auto_increment primary key will start from 11. But in the case of the TRUNCATE
command, the primary key is re-initialized, and it will again start from 1.
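Changes made by DML commands such as INSERT, UPDATE or DELETE are not permanent and
can be rolled back. To avoid that, we use the COMMIT command to mark the changes as
permanent.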
COMMIT;
ROLLBACK COMMAND
This command restores the database to the last committed state. It is also used with the
SAVEPOINT command to jump to a savepoint in an ongoing transaction.
If we have used the UPDATE command to make some changes to the database, and realise
that those changes were not required, then we can use the ROLLBACK command to roll back
those changes, if they were not committed using the COMMIT command.
ROLLBACK TO savepoint_name;
SAVEPOINT command
SAVEPOINT command is used to temporarily save a transaction so that you can rollback to
that point whenever required.
SAVEPOINT savepoint_name;
In short, using this command we can name the different states of our data in any table and
then rollback to that state using the ROLLBACK command whenever required.
id name
1 Abhi
2 Adam
4 Alex
Lets use some SQL queries on the above table and see the results.
COMMIT;
SAVEPOINT A;
SAVEPOINT B;
SAVEPOINT C;
id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
7 Bravo
Now let's use the ROLLBACK command to roll back the state of data to the savepoint B.
ROLLBACK TO B;
id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
Now let's again use the ROLLBACK command to roll back the state of data to the savepoint
A
ROLLBACK TO A;
1 Abhi
2 Adam
4 Alex
5 Abhijit
So now you know how the commands COMMIT, ROLLBACK and SAVEPOINT work.
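The sequence above can be replayed with Python's sqlite3 module, which accepts SAVEPOINT and ROLLBACK TO. The table name class and the exact INSERT statements are assumptions reconstructed from the result tables, and the sketch is simplified: row 5 is inserted directly as Abhijit.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction statements below are passed to SQLite exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE class (id INT, name VARCHAR(100))")
conn.execute("INSERT INTO class VALUES (1, 'Abhi'), (2, 'Adam'), (4, 'Alex')")


def names():
    return [row[0] for row in conn.execute("SELECT name FROM class ORDER BY id")]


conn.execute("BEGIN")
conn.execute("INSERT INTO class VALUES (5, 'Abhijit')")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO class VALUES (6, 'Chris')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO class VALUES (7, 'Bravo')")
conn.execute("SAVEPOINT C")
all_names = names()

conn.execute("ROLLBACK TO B")  # undoes the insert of Bravo
after_b = names()

conn.execute("ROLLBACK TO A")  # undoes the insert of Chris as well
after_a = names()

conn.execute("COMMIT")  # whatever is left becomes permanent

print(all_names)
print(after_b)
print(after_a)
```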
Data Control Language (DCL) is used to control privileges in a database. To perform any
operation in the database, such as creating tables, sequences or views, a user needs
privileges. Privileges are of two types:
System: This includes permissions for creating a session, a table, etc., and all other types of
system privileges.
Object: This includes permissions for any command or query that performs an
operation on the database tables.
DCL has two commands:
GRANT: Used to give a user access privileges or other privileges for the
database.
REVOKE: Used to take back permissions from a user.
Allow a User to create session
When we create a user in SQL, it is not even allowed to log in and create a session until
the proper permissions/privileges are granted to the user.
GRANT CREATE SESSION TO username;
To allow a user to create tables in the database, we can use the command below:
GRANT CREATE TABLE TO username;
Allowing a user to create tables is not enough to start storing data in them. We must also
provide the user with privileges to use the available tablespace for their tables and data.
ALTER USER username QUOTA UNLIMITED ON SYSTEM;
The above command alters the user details and gives the user access to unlimited
tablespace on system.
sysdba is a set of privileges which has all the permissions in it. So if we want to provide all
the privileges to a user, we can simply grant them the sysdba permission.
GRANT sysdba TO username;
Sometimes a user is restricted from creating some tables with names which are reserved for
system tables. But we can grant a user the privilege to create any table using the command
below:
GRANT CREATE ANY TABLE TO username;
If you want to allow a user to drop any table from the database, grant this privilege to
the user:
GRANT DROP ANY TABLE TO username;
And if you want to take back privileges from a user, use the REVOKE command.
REVOKE CREATE TABLE FROM username;
The WHERE clause is used to specify a condition while retrieving, updating or deleting
data from a table. This clause is used mostly with SELECT, UPDATE and DELETE queries.
When we specify a condition using the WHERE clause, the query executes only for those
records for which the condition is true.
Syntax for WHERE clause
Here is how you can use the WHERE clause with a DELETE statement, or any other
statement:
DELETE FROM table_name WHERE [condition];
The WHERE clause is written after the table name (after the FROM clause in a SELECT) and
specifies the condition for execution.
Now we will use the SELECT statement to display data of the table, based on a condition,
which we will add to our SELECT query using WHERE clause.
Let's write a simple SQL query to display the record for student with s_id as 101.
SELECT s_id,
name,
age,
address
FROM student WHERE s_id = 101;
In the above example we have applied a condition to an integer field, but what if we
want to apply the condition to the name field? In that case we must enclose the value in single
quotes ' '. Some databases also accept double quotes, but single quotes are accepted by all.
SELECT s_id,
name,
age,
address
FROM student WHERE name = 'Adam';
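The two SELECT queries above, together with WHERE in UPDATE and DELETE statements, can be run end to end with Python's sqlite3 module. This is a minimal sketch: the column types and the sample rows are invented for illustration and do not come from the handout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT)")
# Sample rows are invented for illustration.
cur.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (101, "Adam", 15, "Town A"),
    (102, "Alex", 18, "Town B"),
    (103, "Abhi", 17, "Town C"),
])

# Condition on an integer column: the value is written without quotes.
by_id = cur.execute(
    "SELECT s_id, name, age, address FROM student WHERE s_id = 101").fetchall()

# Condition on a text column: the value goes in single quotes.
by_name = cur.execute(
    "SELECT s_id, name, age, address FROM student WHERE name = 'Adam'").fetchall()

# WHERE also limits which rows UPDATE and DELETE touch.
cur.execute("UPDATE student SET age = 19 WHERE s_id = 102")
cur.execute("DELETE FROM student WHERE name = 'Abhi'")
remaining = cur.execute("SELECT s_id, age FROM student ORDER BY s_id").fetchall()
conn.close()
```

Without the WHERE clause, the UPDATE and DELETE statements would have applied to every row in the table.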
OPERATORS FOR WHERE CLAUSE CONDITION
Following is a list of operators that can be used while specifying the WHERE clause
condition.
Operator Description
= Equal to
!= Not Equal to
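The two operators listed above can be compared side by side in a small sketch with Python's sqlite3 module. The table contents are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (s_id INTEGER, name TEXT)")
cur.executemany("INSERT INTO student VALUES (?, ?)",
                [(101, "Adam"), (102, "Alex"), (103, "Abhi")])

# "=" keeps only the rows whose s_id matches the value.
equal = cur.execute("SELECT name FROM student WHERE s_id = 102").fetchall()

# "!=" keeps every row whose s_id differs from the value.
not_equal = cur.execute(
    "SELECT name FROM student WHERE s_id != 102 ORDER BY s_id").fetchall()
conn.close()
```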
The ORDER BY clause is used with the SELECT statement to arrange the retrieved data in sorted
order. By default it sorts the retrieved data in ascending order. To sort the data
in descending order, the DESC keyword is used with the ORDER BY clause.
Syntax of Order By
The above query will return the resultant data in ascending order of the salary.
Using Order by DESC
The above query will return the resultant data in descending order of the salary.
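Ascending and descending ordering by salary can be sketched with Python's sqlite3 module as follows. The table name emp, its columns and its rows are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (eid INTEGER, name TEXT, salary INTEGER)")
# Hypothetical sample rows.
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (401, "Anu", 9000),
    (402, "Shane", 8000),
    (403, "Rohan", 6000),
])

# No keyword after the column name: ascending order is the default.
ascending = cur.execute("SELECT name, salary FROM emp ORDER BY salary").fetchall()

# DESC reverses the order.
descending = cur.execute("SELECT name, salary FROM emp ORDER BY salary DESC").fetchall()
conn.close()
```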
of these operations is a new relation, which might be formed from one
or more input relations.
SELECT (symbol: σ)
PROJECT (symbol: π)
RENAME (symbol: ρ)
UNION (∪)
INTERSECTION (∩)
DIFFERENCE (-)
CARTESIAN PRODUCT ( x )
JOIN
DIVISION
SELECT (σ)
The SELECT operation is used as an expression to choose tuples which meet the selection
condition. Select operation selects tuples that satisfy a given predicate.
σp(r)
σ is the selection operator
r is the relation
p is a propositional logic formula (the predicate)
Example 1
Example 2
σ topic = 'Database' and author = 'guru99' (Tutorials)
Output - Selects tuples from Tutorials where the topic is 'Database' and
'author' is guru99.
Example 3
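The selection operation described above can be sketched in Python as a filter over rows. This is only an illustration: the helper name select and the sample rows of Tutorials are assumptions, with a relation modelled as a list of dictionaries.

```python
# A relation modelled as a list of rows; each row maps attribute name -> value.
tutorials = [
    {"topic": "Database", "author": "guru99"},
    {"topic": "Database", "author": "someone_else"},
    {"topic": "Networking", "author": "guru99"},
]

def select(relation, predicate):
    """Sigma: return the rows of `relation` for which `predicate` is true."""
    return [row for row in relation if predicate(row)]

# Rows where topic is 'Database' and author is 'guru99'.
result = select(tutorials,
                lambda t: t["topic"] == "Database" and t["author"] == "guru99")
```

The result has the same attributes as the input relation; selection only reduces the number of rows.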
Projection(π)
The projection eliminates all attributes of the input relation but those
mentioned in the projection list. The projection method defines a
relation that contains a vertical subset of Relation.
Example of Projection:
Input relation:
1 Google Active
2 Amazon Active
3 Apple Inactive
4 Alibaba Active
Projecting onto the second and third attributes gives:
Google Active
Amazon Active
Apple Inactive
Alibaba Active
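The projection example above can be sketched in Python. The column names id, name and status are hypothetical labels (the handout's table headers are not shown), and project is a helper written for this illustration.

```python
customers = [
    (1, "Google", "Active"),
    (2, "Amazon", "Active"),
    (3, "Apple", "Inactive"),
    (4, "Alibaba", "Active"),
]
attributes = ("id", "name", "status")  # hypothetical column names

def project(relation, schema, wanted):
    """Pi: keep only the `wanted` columns, dropping duplicate rows."""
    positions = [schema.index(a) for a in wanted]
    seen, out = set(), []
    for row in relation:
        reduced = tuple(row[i] for i in positions)
        if reduced not in seen:
            seen.add(reduced)
            out.append(reduced)
    return out

result = project(customers, attributes, ("name", "status"))
statuses = project(customers, attributes, ("status",))
```

Projecting onto status alone leaves only two rows, because a relation is a set and duplicates are removed.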
Example
Table A
column 1 column 2
1 1
1 2
Table B
column 1 column 2
1 1
1 3
A ∪ B gives
Table A ∪ B
column 1 column 2
1 1
1 2
1 3
Example
A-B
Table A - B
column 1 column 2
1 2
Intersection
A ∩ B
Defines a relation consisting of the set of all tuples that are in both A and B.
However, A and B must be union-compatible.
Example:
A∩B
Table A ∩ B
column 1 column 2
1 1
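The union, difference and intersection results for tables A and B above can be checked with Python's built-in set type, treating each row as a tuple. This is a minimal sketch rather than a database implementation.

```python
# Each relation is a set of (column 1, column 2) rows.
A = {(1, 1), (1, 2)}
B = {(1, 1), (1, 3)}

union = A | B          # rows in A or in B, duplicates removed
difference = A - B     # rows in A that are not in B
intersection = A & B   # rows in both A and B
```

All three operations assume the two relations are union-compatible, that is, they have the same number of columns with matching domains.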
Cartesian product(X)
σ column 2 = '1' (A X B)
Output – The above example shows all rows from relation A and B whose
column 2 has value 1.
σ column 2 = '1' (A X B)
column 1 column 2
1 1
1 1
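A Cartesian product followed by a selection can be sketched in Python with itertools.product. One assumption is made loudly here: the condition on "column 2" is read as A's column 2, since the product of A and B has two columns with that name.

```python
from itertools import product

A = [(1, 1), (1, 2)]
B = [(1, 1), (1, 3)]

# Every row of A is concatenated with every row of B: |A| * |B| combined rows.
a_times_b = [a + b for a, b in product(A, B)]

# Selection over the product. Index 1 is A's column 2 (an assumption; the
# condition could equally be read against B's column 2 at index 3).
selected = [row for row in a_times_b if row[1] == 1]
```

Each combined row carries all four values, two from A followed by two from B.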
Join Operations
JOIN operation also allows joining variously related tuples from different
relations.
Types of JOIN:
Inner Joins:
Theta join
EQUI join
Natural join
Outer join:
Inner Join:
In an inner join, only those tuples that satisfy the matching criteria are
included, while the rest are excluded. Let's study various types of Inner
Joins:
Theta Join:
Example
A ⋈θ B
For example:
column 1 column 2
1 2
EQUI join:
When a theta join uses only equality conditions, it becomes an equi join.
For example:
column 1 column 2
1 1
Example
Table C
Num Square
2 4
3 9
Table D
Num Cube
2 8
3 27
C ⋈ D
Num Square Cube
2 4 8
3 9 27
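The inner joins discussed above (theta, equi and natural) can be sketched in Python over the Num/Square and Num/Cube relations. The helper names theta_join and natural_join are written for this illustration, with each row modelled as a dictionary.

```python
C = [{"Num": 2, "Square": 4}, {"Num": 3, "Square": 9}]
D = [{"Num": 2, "Cube": 8}, {"Num": 3, "Cube": 27}]

def theta_join(left, right, condition):
    """Pair every left row with every right row and keep pairs passing `condition`."""
    return [(l, r) for l in left for r in right if condition(l, r)]

def natural_join(left, right):
    """Join on all attribute names the two relations share, keeping one copy of each."""
    out = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[k] == r[k] for k in common):
                out.append({**l, **r})
    return out

# Equi join: a theta join whose condition uses only equality.
equi = theta_join(C, D, lambda l, r: l["Num"] == r["Num"])
# A theta join with a non-equality comparison.
greater = theta_join(C, D, lambda l, r: l["Num"] > r["Num"])
natural = natural_join(C, D)
```

The natural join needs no explicit condition because it matches on the shared attribute Num automatically.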
OUTER JOIN
In an outer join, along with tuples that satisfy the matching criteria, we
also include some or all tuples that do not match the criteria.
The left outer join keeps all tuples of the left relation. If no matching tuple is
found in the right relation, the attributes of the right relation in the join result
are filled with null values.
Consider the following 2 Tables
Table A
Num Square
2 4
3 9
4 16
Table B
Num Cube
2 8
3 18
5 75
A ⟕ B
Num Square Cube
2 4 8
3 9 18
4 16 -
The right outer join keeps all tuples of the right relation. If no matching tuple is
found in the left relation, the attributes of the left relation in the join result
are filled with null values.
A ⟖ B
Num Cube Square
2 8 4
3 18 9
5 75 -
In a full outer join, all tuples from both relations are included in the
result, irrespective of the matching condition.
A ⟗ B
Num Square Cube
2 4 8
3 9 18
4 16 -
5 - 75
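The three outer joins can be sketched in Python using the same A and B data as the tables above. The helper outer_join is written for this illustration; each relation is modelled as a dictionary keyed by Num, and None stands in for the null value shown as "-". Result rows are always ordered as (Num, Square, Cube).

```python
A = {2: 4, 3: 9, 4: 16}    # Num -> Square
B = {2: 8, 3: 18, 5: 75}   # Num -> Cube

def outer_join(left, right, keep_left, keep_right):
    """Join two Num-keyed relations; None stands in for a null value."""
    keys = set(left) & set(right)      # matching tuples are always kept
    if keep_left:
        keys |= set(left)              # also keep unmatched left tuples
    if keep_right:
        keys |= set(right)             # also keep unmatched right tuples
    return [(num, left.get(num), right.get(num)) for num in sorted(keys)]

left_join = outer_join(A, B, keep_left=True, keep_right=False)
right_join = outer_join(A, B, keep_left=False, keep_right=True)
full_join = outer_join(A, B, keep_left=True, keep_right=True)
```

Setting both flags to False would give the plain inner join, which contains only the matching tuples.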
RELATIONAL ALGEBRA
Relational database systems are expected to be equipped with a query language that can
assist its users to query the database instances. There are two kinds of query languages −
relational algebra and relational calculus.
Relational Algebra
Select
Project
Union
Set difference
Cartesian product
Rename
Notation − σp(r)
Where σ stands for the selection operator, p for the selection predicate, and r for the
relation. p is a propositional logic formula which may use connectors like and, or, and not.
These terms may use relational operators like − =, ≠, ≥, <, >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450.
Output − Selects tuples from books where subject is 'database' and 'price' is 450 or those
books published after 2010.
It projects the column(s) named in its attribute list. Duplicate rows are eliminated, since a
relation is a set.
For example −
∏subject, author (Books)
Selects and projects columns named as subject and author from the relation Books.
r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r ∪ s
Where r and s are either database relations or relation result sets (temporary relations).
Output − Projects the names of the authors who have either written a book or an article or
both.
Set Difference (−)
The result of set difference operation is tuples, which are present in one relation but are not
in the second relation.
Notation − r − s
Output − Provides the name of authors who have written books but not articles.
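The union and set difference outputs described above can be sketched with Python sets. The relation contents, and the assumption that both relations store (title, author) pairs, are hypothetical and only serve the illustration.

```python
# Hypothetical relations: (title, author) pairs.
books = {("DB Basics", "Asha"), ("SQL Primer", "Ben")}
articles = {("Indexing Notes", "Ben"), ("Joins Explained", "Chen")}

# Project the author column out of each relation first.
book_authors = {author for _, author in books}
article_authors = {author for _, author in articles}

# Union: authors who wrote a book or an article or both.
either = book_authors | article_authors
# Set difference: authors who wrote books but not articles.
books_only = book_authors - article_authors
```

Projecting before combining keeps the two operands union-compatible, since both are then single-column sets of author names.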
Notation − r × s
r × s = { q t | q ∈ r and t ∈ s}
Output − Yields a relation, which shows all the books and articles written by tutorialspoint.
The results of relational algebra are also relations but without any name. The rename
operation allows us to rename the output relation. The rename operation is denoted with the
small Greek letter rho (ρ).
Notation − ρ x (E)
Where the result of expression E is saved with the name x.
Set intersection
Assignment
Natural join
Relational Calculus
Notation − {T | Condition}
For example −
Output − Returns tuples with 'name' from Author who has written article on 'database'.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
Output − The above query will yield the same result as the previous one.
In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as
done in TRC, mentioned above).
Notation −
{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }
Where a1, a2, ..., an are attributes and P stands for a formula built from those attributes.
For example −
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is
database.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also
involves relational operators.
The expressive power of Tuple Relational Calculus and Domain Relational Calculus is
equivalent to that of Relational Algebra.
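The difference between the tuple and domain styles of relational calculus can be sketched with Python set comprehensions, which read much like the set-builder notation. The relation authors, its attribute names and its rows are assumptions made for this illustration.

```python
from collections import namedtuple

# Hypothetical relation; the attribute names are assumptions for illustration.
Author = namedtuple("Author", ["name", "article"])
authors = [
    Author("Asha", "database"),
    Author("Ben", "networks"),
    Author("Chen", "database"),
]

# Tuple calculus style: the variable t ranges over whole tuples.
# { t.name | t in authors and t.article = 'database' }
trc = {t.name for t in authors if t.article == "database"}

# With an existential quantifier: a name qualifies if there exists some tuple t
# carrying that name together with article = 'database'.
names = {t.name for t in authors}
trc_exists = {n for n in names
              if any(t.name == n and t.article == "database" for t in authors)}

# Domain calculus style: the variables range over attribute values.
# { <name> | <name, article> in authors and article = 'database' }
drc = {name for (name, article) in authors if article == "database"}
```

All three expressions describe the same set of names, which mirrors the point that the two calculi have the same expressive power.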