
UNIT- I

What is Data?
Data is a collection of facts. It can be numbers, words, measurements, observations or even just descriptions of
things. Data as an abstract concept can be viewed as the lowest level of abstraction from which information
and then knowledge are derived.
What is Database?
A database is an organized collection of data that can easily be accessed, managed and updated. A database
can be thought of as a set of logically related files organized to facilitate access by one or more application
programs and to minimize data redundancy. Examples of database management systems used to build databases
include Microsoft Access, SQL Server and Oracle.
What is File System?
File systems are used on data storage devices, such as hard disk drives, floppy disks, optical discs, or flash
memory storage devices, to maintain the physical locations of the computer files. In the database context, a
file-based (file-processing) system refers to the older approach of storing data in separate files, used before
the advent of relational tables and schemas.

What is DBMS?
A DBMS is a collection of programs that enables you to store, modify, and extract information from a database.
In other words, it is software designed to manage the databases installed on a system's hard drive or
network.
ACID Properties of DBMS
Atomicity states that database modifications must follow an “all or nothing” rule. Each transaction is said to be
“atomic.” If one part of the transaction fails, the entire transaction fails. It is critical that the database
management system maintain the atomic nature of transactions in spite of any DBMS, operating system or
hardware failure.
Consistency states that only valid data will be written to the database. If, for some reason, a transaction is
executed that violates the database’s consistency rules, the entire transaction will be rolled back and the
database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully
executes, it will take the database from one state that is consistent with the rules to another state that is also
consistent with the rules.
Isolation requires that multiple transactions occurring at the same time not impact each other’s execution. For
example, if A issues a transaction against a database at the same time that B issues a different transaction,
both transactions should operate on the database in an isolated manner. The outcome should be the same as if
the database performed A’s entire transaction before executing B’s, or vice versa. This prevents A’s transaction
from reading intermediate data produced as a side effect of part of B’s transaction that will not eventually be committed to
the database. Note that the isolation property does not ensure which transaction will execute first, merely that
they will not interfere with each other.
Durability ensures that any transaction committed to the database will not be lost. Durability is ensured
through the use of database backups and transaction logs that facilitate the restoration of committed
transactions in spite of any subsequent software or hardware failures.
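Atomicity is easiest to see with a failing money transfer. The sketch below is a minimal illustration, assuming Python's built-in sqlite3 module (SQLite, not one of the products named above). The `account` table, the accounts "A" and "B", and the helper functions are hypothetical. A CHECK constraint forbids negative balances, so an overdraft makes the second UPDATE fail, and the whole transaction, including the credit that had already been applied, is rolled back.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint makes any overdraft fail inside the transaction.
    conn.execute(
        "CREATE TABLE account (id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failing debit leaves a half-done change to undo.
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was rolled back together with the failed debit


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "A", "B", 50), balances(bank))  # True {'A': 50, 'B': 70}
    print(transfer(bank, "A", "B", 80), balances(bank))  # False {'A': 50, 'B': 70}
```

The second transfer fails as a unit, so account B never keeps the 80 it was briefly credited.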

Advantages of DBMS over File Systems


a. Data Redundancy and Inconsistency
In the conventional file processing system, every user group maintains its own files for handling its data.
This may lead to duplication of data, which results in more space utilization and higher access cost.
Eventually the duplicated copies may become inconsistent. A DBMS addresses these shortcomings by keeping
related data in a single, centrally controlled database, which reduces redundancy and the inconsistency it causes.
b. Difficulty in accessing data
A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently, making it
possible to produce quick answers to ad hoc queries. In a traditional file-based system, each new kind of query
typically requires a new application program, whereas a DBMS handles such queries directly.
c. Data Isolation
In file-based systems, data may be scattered across many files in various formats, and managing each file
independently becomes a very tedious job. Hence, writing a new application program to retrieve the appropriate data
becomes very difficult. In a database, by contrast, data from different tables can easily be accessed and combined
from a single place.
d. Integrity Problems:
Since related data is stored in one single database, enforcing data integrity is much easier in a DBMS. Data
integrity means that the data contained in the database is both accurate and consistent. For example, the DBMS can
check the values being entered for storage to ensure that they fall within a specified range and are in the
correct format. Such checks are difficult to enforce consistently in traditional file-based systems, where every
application program must implement them itself.
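As a minimal sketch of such declarative checks, assuming SQLite through Python's sqlite3 module, the hypothetical `employee` table below declares a range check and a simple format (length) check once. The DBMS then rejects bad rows no matter which program inserts them. The table, columns and limits are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employee table: the DBMS itself checks range and format on every insert.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        age    INTEGER NOT NULL CHECK (age BETWEEN 18 AND 65),
        phone  TEXT    NOT NULL CHECK (length(phone) = 10)
    )
""")


def try_insert(emp_id: int, age: int, phone: str) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, age, phone))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 30, "9876543210"))  # True: within range, correct length
print(try_insert(2, 12, "9876543210"))  # False: age outside 18..65
print(try_insert(3, 30, "12345"))       # False: phone has the wrong length
```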
e. Atomicity Problems:
In a DBMS, if a transaction fails due to any incident or accident, data can be restored to the consistent state that
existed prior to the failure. This ensures that a failure partway through a transaction does not leave the database
in an inconsistent state.
f. Concurrent Access:
A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being
accessed by only one user at a time. In reality, the data may be accessed by many users simultaneously, while the
DBMS preserves its consistency and integrity.
g. Security:
In conventional systems, applications are developed in an ad hoc manner. Often different systems of an
organization would access different components of the operational data. In such an environment, enforcing
security can be quite difficult. Setting up of a database makes it easier to enforce security restrictions since the
data is now centralized. It is easier to control who has access to what parts of the database.

Relational Database Management Systems


It is a type of database management system (DBMS) that stores data in the form of related tables. A relational
database is created using the relational model, which represents both data and the relationships among data as a
collection of tables. Some of the features of an RDBMS are as follows:

- Stores data in tables
- Persists data in the form of rows and columns
- Provides primary keys to uniquely identify rows
- Creates indexes for quicker data retrieval
- Relates two or more tables through a common column (primary key and foreign key)
- Provides multi-user access, with access privileges controlled for individual users

Some of the widely used RDBMSs are Oracle, IBM’s DB2 and Microsoft’s SQL Server.
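Several of the features listed above (tables of rows and columns, a primary key, an index) can be shown in a few lines. The sketch below is a minimal illustration, assuming SQLite via Python's sqlite3 module. The `student` table, its data and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows and columns, with StudentID as the primary key (hypothetical table).
conn.execute("CREATE TABLE student (StudentID INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (3, "Meera", "Pune")],
)
conn.commit()
# A secondary index on City, which the DBMS may use to speed up lookups by city.
conn.execute("CREATE INDEX idx_student_city ON student (City)")


def names_in(city: str) -> list:
    rows = conn.execute("SELECT Name FROM student WHERE City = ? ORDER BY StudentID", (city,))
    return [name for (name,) in rows]


def duplicate_key_rejected() -> bool:
    """The primary key guarantees that no two rows share a StudentID."""
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (1, 'Duplicate', 'Goa')")
        return False
    except sqlite3.IntegrityError:
        return True


print(names_in("Pune"))          # ['Asha', 'Meera']
print(duplicate_key_rejected())  # True
```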

View of Data

a) Data Abstraction
   - Physical Level
   - Logical Level
   - View Level
b) Instances and Schemas

a) Data Abstraction:
When the DBMS hides certain details of how data is stored and maintained, it provides what is called
the abstract view of data. Complexity is hidden from users through several levels of abstraction,
described below:
1. Physical level:
a) Lowest level of abstraction.
b) It describes how data are actually stored.
c) It describes low-level complex data structures in detail.
d) At this level, efficient algorithms to access data are defined.
2. Logical level:
a) It is the next-higher level of abstraction. It defines what data are stored in the database and what
relationships exist among those data.
b) Users at this level need not be aware of the physical-level complexity used to implement the simple
structures.
c) Generally, database administrators (DBAs) and programmers work at the logical level of abstraction.

3. View level:
a) It is the highest level of abstraction.
b) It describes only a part of the whole Database for particular group of users.
c) This view hides all complexity.
d) It exists only to simplify user interaction with system.
e) Bank tellers and end users of application software fall under this category.
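In SQL, the view level is commonly realized with views. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module. The `customer` table and the `teller_view` view are hypothetical. A bank teller querying the view sees only part of the logical schema, and the salary column stays hidden.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, branch TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Pune", 52000), (2, "Ravi", "Delhi", 61000)],
)
conn.commit()
# View level: tellers see only part of the logical schema (the salary column is hidden).
conn.execute("CREATE VIEW teller_view AS SELECT cust_id, name, branch FROM customer")


def teller_columns() -> list:
    cur = conn.execute("SELECT * FROM teller_view")
    return [col[0] for col in cur.description]  # column names visible through the view


print(teller_columns())  # ['cust_id', 'name', 'branch']
print(conn.execute("SELECT name, branch FROM teller_view ORDER BY cust_id").fetchall())
# [('Asha', 'Pune'), ('Ravi', 'Delhi')]
```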

[Figure: Three Levels of Data Abstraction]

b) Instances and Schemas

In DBMS, an instance is the information stored in the database at a particular moment; it can be thought of
as a snapshot of the database at a given point in time. A database has many instances over its lifetime,
because its contents change as information is inserted and deleted.

In DBMS, a schema is the overall design of the database. It is the logical structure of the database that defines the
tables, the fields in each table, and the relationships between fields and tables. A single schema may
correspond to many different instances over time.
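A minimal sketch of the difference, assuming SQLite via Python's sqlite3 module and a hypothetical `account` table. SQLite keeps each table's definition (part of the schema) in its `sqlite_master` catalog. Inserting a row changes the instance but leaves that definition untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")


def schema() -> str:
    """The schema: the table's definition, stored by SQLite in sqlite_master."""
    return conn.execute("SELECT sql FROM sqlite_master WHERE name = 'account'").fetchone()[0]


def instance() -> list:
    """The instance: a snapshot of the rows stored at this moment."""
    return conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no").fetchall()


s0, i0 = schema(), instance()
conn.execute("INSERT INTO account VALUES (101, 500)")
conn.commit()
s1, i1 = schema(), instance()

print(s0 == s1)  # True: the design did not change
print(i0, i1)    # [] [(101, 500)]
```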

Database systems have several schemas, partitioned according to the levels of abstraction. The physical
schema describes the database design at the physical level, while the logical schema describes the
database design at the logical level. A database may also have several schemas at the view level,
sometimes called subschemas, that describe different views of the database.

Of these, the logical schema is by far the most important, in terms of its effect on application programs,
since programmers construct applications by using the logical schema. The physical schema is hidden
beneath the logical schema, and can usually be changed easily without affecting application programs.
Application programs are said to exhibit physical data independence if they do not depend on the
physical schema, and thus need not be rewritten if the physical schema changes.

Data Models:

A data model can be thought of as a diagram or flowchart that illustrates the relationships between
data. A data model is not just a way of structuring data: it also defines a set of operations that can be
performed on the data. It provides concepts that describe the details of how data is stored in the
computer.

There are various types of Data Models:


a) Hierarchical Model
b) Network Data Model
c) Relational Model
d) Entity-Relationship Model
e) Object Relational Data Model

a) Entity-Relationship Model:


The Entity-Relationship (ER) model is an abstract way to describe a database and a high-level data model
that is useful in developing a conceptual design for a database. An ER model is a diagram containing
entities or "items", relationships among them, and attributes of the entities. One of the key
techniques in ER modeling is to document the entity and relationship types in a graphical form
called an Entity-Relationship (ER) diagram.

Basic Building blocks of E-R Model

i) Entity: An entity may be defined as a thing which is recognized as being capable of an
independent existence and which can be uniquely identified. Information about an entity is
captured in the form of attributes and/or relationships.
ii) Attributes: An attribute is a characteristic that describes an entity, such as the shape, size,
color or other properties of an object, or a customer's name and address.
iii) Relationship: A relationship is an association among two or more entities. When the design is
implemented as relational tables, a relationship between two tables is usually represented by a foreign key
in one table that references the primary key of the other table. There are four types of relationship:
- One to One
- One to many
- Many to one
- Many to many

iv) Lines: Lines link attributes to entity sets and entity sets to relationships.

b) Relational Model:
The relational model of databases provides a very simple way of looking at data structured into
tables. The relational model is based on set theory (the mathematical notion of a relation). It uses a collection
of tables to represent both data and the relationships among those data. Each table has multiple
columns and each column has a unique name. The relational model is at a lower level of abstraction
than the E-R model, but it is the data model most widely used by database designers.
Data Independence:
Data independence is the type of data transparency that matters for a centralized DBMS. Physical
data independence deals with hiding the details of the storage structure from user applications.
Nearly all modern applications are based on the principle of data independence. In fact, the whole
concept of a database management system (DBMS) supports the notion of data independence,
since it represents a system for managing data separately from the programs that use the data.
Data independence means that the data can be redefined or reorganized without requiring the
programs that use the data to be rewritten. In this manner, the data remains accessible to those
programs while its definition and organization can evolve.

Types of Data Independence:


a) Physical Independence: It is the ability to modify the physical schema without causing
application programs to be rewritten. Modifications at this level are usually made to improve
performance and access efficiency. They may include adding a new storage device or
modifying indexes.
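A minimal sketch of physical data independence, assuming SQLite via Python's sqlite3 module. The `orders` table and index name are hypothetical. Adding an index is a physical-level change: the application's query text stays the same and returns the same rows. Only the access path the DBMS may choose differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("asha", 120), ("ravi", 80), ("asha", 45)],
)
conn.commit()

# The application query mentions only logical names (table and columns).
APP_QUERY = "SELECT order_id, amount FROM orders WHERE customer = ? ORDER BY order_id"
before = conn.execute(APP_QUERY, ("asha",)).fetchall()

# Physical-level change: add an index. The application query is left untouched.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = conn.execute(APP_QUERY, ("asha",)).fetchall()

print(before)           # [(1, 120), (3, 45)]
print(before == after)  # True
```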

b) Logical Independence: Logical data independence makes it possible to change the structure
of the data independently of modifying the applications or programs that make use of the
data. There is no need to rewrite current applications as part of the process of adding data items to, or
removing them from, the system. This may include addition or deletion of entities, attributes
or relationships.
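A minimal sketch of logical data independence, assuming SQLite via Python's sqlite3 module and a hypothetical `student` table. A new attribute is added to the logical schema, and an application query that names its columns keeps working unchanged. The last line shows the limit of this independence: a program that uses `SELECT *` does see the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.commit()

# The application names the columns it needs instead of using SELECT *.
APP_QUERY = "SELECT roll_no, name FROM student ORDER BY roll_no"
before = conn.execute(APP_QUERY).fetchall()

# Logical-level change: add a new attribute to the entity.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after = conn.execute(APP_QUERY).fetchall()

print(before == after)  # True: the application is unaffected
print(conn.execute("SELECT * FROM student").fetchall())  # [(1, 'Asha', None)]
```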

Database languages
Relational databases normally use Structured Query Language (SQL) both for managing data and for defining the
database itself. SQL is a specialized language for creating, updating, deleting, and requesting
information from databases. SQL is an ANSI and ISO standard, and is the de facto standard
database query language.

Types of Database Language


a) Data Definition Language
DDL statements are used to build and modify the structure of tables and other objects in
the database. These database objects include tables, views, sequences, catalogs,
indexes, and aliases. Basically DDL defines overall structure of a database which can be
explicitly seen in the scope of schema. Some of DDL Examples are
CREATE - to create objects in the database
ALTER - alters the structure of the database
DROP - delete objects from the database
TRUNCATE - remove all records from a table, including all space allocated for the
records
COMMENT - add comments to the data dictionary
RENAME - rename an object

Example of DDL:
CREATE TABLE DEPARTMENT (DepartmentID INTEGER, DepartmentName VARCHAR(25),
StartedDate DATETIME, PRIMARY KEY (DepartmentID));
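The DDL commands above can be tried on the DEPARTMENT table. This is a minimal sketch, assuming SQLite via Python's sqlite3 module. SQLite accepts VARCHAR(25) and DATETIME as type names but does not enforce the length. It spells RENAME as `ALTER TABLE ... RENAME TO` and has no TRUNCATE statement, so TRUNCATE is omitted. The Location column and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names() -> list:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


def column_names(table: str) -> list:
    # PRAGMA table_info returns one row per column; field 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# CREATE: the DEPARTMENT table from the example above.
conn.execute(
    "CREATE TABLE DEPARTMENT (DepartmentID INTEGER, DepartmentName VARCHAR(25), "
    "StartedDate DATETIME, PRIMARY KEY (DepartmentID))"
)
# ALTER: change the structure by adding a (hypothetical) Location column.
conn.execute("ALTER TABLE DEPARTMENT ADD COLUMN Location VARCHAR(30)")
cols = column_names("DEPARTMENT")
# RENAME: SQLite spells this as ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE DEPARTMENT RENAME TO DEPT")
renamed = table_names()
# DROP: delete the object from the database.
conn.execute("DROP TABLE DEPT")
dropped = table_names()

print(cols)     # ['DepartmentID', 'DepartmentName', 'StartedDate', 'Location']
print(renamed)  # ['DEPT']
print(dropped)  # []
```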
b) Data Manipulation Language
Data Manipulation Language (DML) is used to retrieve, insert, modify and delete data in a database.
These commands are used by all database users during the routine operation of the
database:
Insert- inserting data to tables
Update- updating data of tables
Delete- deleting data from table
Select- viewing data from table
Example of DML: SELECT * FROM DEPARTMENT WHERE DepartmentID = 2;
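The four DML commands can be run against the DEPARTMENT table in sequence. The sketch below assumes SQLite via Python's sqlite3 module, and the department rows are invented for illustration. Values are passed as `?` parameters rather than pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DEPARTMENT (DepartmentID INTEGER PRIMARY KEY, "
    "DepartmentName VARCHAR(25), StartedDate DATETIME)"
)

# INSERT: add rows (sample data is hypothetical).
conn.executemany(
    "INSERT INTO DEPARTMENT VALUES (?, ?, ?)",
    [(1, "Sales", "2015-04-01"), (2, "Research", "2017-09-15"), (3, "Support", "2019-01-10")],
)
# UPDATE: change data in an existing row.
conn.execute("UPDATE DEPARTMENT SET DepartmentName = ? WHERE DepartmentID = ?", ("R&D", 2))
# DELETE: remove a row.
conn.execute("DELETE FROM DEPARTMENT WHERE DepartmentID = ?", (3,))
conn.commit()

# SELECT: the example query from the text.
row = conn.execute("SELECT * FROM DEPARTMENT WHERE DepartmentID = 2").fetchone()
remaining = conn.execute("SELECT COUNT(*) FROM DEPARTMENT").fetchone()[0]

print(row)        # (2, 'R&D', '2017-09-15')
print(remaining)  # 2
```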

Database Users and Administrators


There are various users in a database who have various levels of interaction with the database system. Each
user has their own role and privileges. They interact with the database in order to query and update the
database and generate reports. The various database users, who use the database for various
reasons, are as follows:
a) Naive Users: These are the users who query and update the database by invoking previously written
application programs, such as those built with VB or Oracle Reports, or ERP applications. For example, a bank teller who
needs to transfer $50 from account A to account B invokes a program called transfer. This program asks
the teller for the amount of money to be transferred, the account from which the money is to be
transferred, and the account to which the money is to be transferred. The typical user interface for
naive users is a forms interface, where the user can fill in appropriate fields of the form. Naive users
may also simply read reports generated from the database.

b) Application Programmers: They are computer professionals who write application programs for
accessing the database and managing business logic. They can choose tools, such as rapid application
development (RAD) tools and fourth-generation programming languages, to develop application programs
with minimal effort.

c) Sophisticated Users: They are experienced users who interact with the database without
writing application programs. Instead, they use SQL to interact with the database directly. They submit
each such query to a query processor, whose function is to break down DML statements into
instructions that the storage manager understands. Sophisticated users include engineers, scientists,
business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS so
as to implement their applications to meet their complex requirements.

d) Specialized Users: Specialized users write applications such as computer-aided design systems,
knowledge-base and expert systems that store data having complex data types. Complex data includes
graphics and audio data.
Database Administrators
DBA is a person who has central control over both data and application programs. The responsibilities of
DBA vary depending upon the job description and corporate and organization policies. DBA is responsible
for the installation, configuration, upgrade, administration, monitoring, security and maintenance of
database in an organization.
Roles of Database Administrators
a) Installation, Configuration, Upgrade and Migration
b) Backup and Recovery
c) Database Security
d) Storage and Capacity Planning
e) Performance monitoring and Tuning
f) Troubleshooting
g) High Availability and Scalability
[Figures: Hospital Systems; Student Enrollment; Office System]
Entity Relationship Data Model
The entity-relationship model (or ER model) is a way of graphically representing the logical relationships of
entities (or objects) in order to create a database. In ER modeling, the structure for a database is portrayed
as a diagram, called an entity-relationship diagram. An ER diagram consists of the following major components:
- Rectangles represent entity sets
- Ellipses represent attributes
- Diamonds represent relationship sets
- Lines link attributes to entity sets and entity sets to relationship sets
- Double ellipses represent multivalued attributes
- Dashed ellipses represent derived attributes
- Double lines indicate total participation of an entity in a relationship set
- Double rectangles represent weak entity sets
E-R Model is comprised of following:
a) Entity Sets: Entities are the basic building blocks of the E-R data model. They are usually recognizable
concepts, either concrete or abstract, such as persons, places, things, or events, which have relevance to the
database. An entity set is a set of entities of the same type (e.g., all persons having an account at a
bank).
Weak Entity Set: An entity set that does not have a primary key is referred to as a weak entity set. We
depict a weak entity set by double rectangles. The existence of a weak entity set depends on the
existence of a strong entity set. For a weak entity set to be meaningful, it must be associated with
another entity set called the identifying or owner entity set. The relationship associating the weak
entity set with the identifying entity set is called identifying relationship.
Strong Entity Set: An entity set that has a primary key is referred to as a strong entity set. For
example, the STUDENT entity has a key attribute Roll No which uniquely identifies it; hence it is a strong
entity set.

b) Relationship Sets: A relationship set is a mathematical relation among two or more entity sets.
Relationship sets involving more than two entity sets are relatively rare; most relationships are
binary relationships.
Mapping Constraints: An E-R scheme may define certain constraints to which the contents of a
database must conform.
Mapping Cardinalities of a Relationship: A mapping cardinality is a data constraint that specifies the number of
entities to which another entity can be associated via a relationship set. For a binary relationship set between
entity sets A and B, the mapping cardinality must be one of the following types:
- One to One: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
- One to Many: An entity in A is associated with any number (zero or more) of entities in B. An entity in B is
associated with at most one entity in A.
- Many to One: An entity in A is associated with at most one entity in B. An entity in B is
associated with any number (zero or more) of entities in A.
- Many to Many: An entity in A is associated with any number of entities in B, and an entity in B is associated
with any number of entities in A.
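A mapping cardinality is a constraint on the schema, so a sample of data can only show which type it is consistent with. The sketch below takes a relationship set given as (a, b) pairs and reports the tightest of the four types that the pairs satisfy, following the definitions above. The function name and the sample entity names are hypothetical.

```python
from collections import Counter


def mapping_cardinality(pairs) -> str:
    """Classify a binary relationship set, given as (a, b) pairs, by the
    tightest mapping cardinality that these particular pairs satisfy."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)  # how many B entities each A entity is related to
    per_b = Counter(b for _, b in pairs)  # how many A entities each B entity is related to
    a_many = any(n > 1 for n in per_a.values())
    b_many = any(n > 1 for n in per_b.values())
    if not a_many and not b_many:
        return "one-to-one"
    if a_many and not b_many:
        return "one-to-many"   # an A entity has several B partners, each B has one A
    if not a_many and b_many:
        return "many-to-one"   # each A has at most one B, a B has several A partners
    return "many-to-many"


print(mapping_cardinality([("c1", "loan1"), ("c2", "loan2")]))             # one-to-one
print(mapping_cardinality([("c1", "acc1"), ("c1", "acc2"), ("c2", "acc3")]))  # one-to-many
print(mapping_cardinality([("e1", "d1"), ("e2", "d1")]))                   # many-to-one
print(mapping_cardinality([("s1", "c1"), ("s1", "c2"), ("s2", "c1")]))      # many-to-many
```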

[Figures: One to One Relation; One to Many Relation; Many to One Relation]


c) Attributes: An attribute is a characteristic that describes an entity, such as the shape, size, color or
other properties of an object. The database schema associates one or more attributes with each database
entity. For example, a customer's name, address, city, balance, etc. are attributes that describe the
customer. Some of the various types of attributes are as below:

- Simple and Composite Attributes: A simple attribute consists of a single atomic value and cannot be
subdivided; for example, the attributes age and sex are simple attributes. A composite attribute is an
attribute that can be further subdivided. For example, the attribute ADDRESS can be subdivided into street,
city, state, and zip code.

- Single Valued and Multivalued Attributes: A single valued attribute can have only a single value.
For example, a person can have only one 'date of birth', 'age', etc. A single valued attribute can be simple
or composite: 'date of birth' is a composite attribute and 'age' is a simple attribute, but both are
single valued attributes.

Multivalued attributes can have multiple values. For instance, a person may have multiple phone
numbers, multiple degrees, etc. Multivalued attributes are shown by double ellipses in the ER diagram.

- Stored and Derived Attributes: The value of a derived attribute is computed from one or more stored
attributes. For example, 'Date of birth' of a person is a stored attribute. The value of the attribute
'AGE' can be derived by subtracting the 'Date of Birth' (DOB) from the current date. In this case the stored
attribute supplies the value from which the derived attribute is calculated. Derived attributes are shown by
dashed ellipses in the ER diagram.
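Deriving AGE from the stored DATE OF BIRTH needs one extra step beyond subtracting years: if this year's birthday has not arrived yet, the person is a year younger. This is a minimal sketch using Python's standard `datetime.date`. The function name is hypothetical, and `today` is passed in explicitly so the result is reproducible.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Derive AGE from the stored DATE OF BIRTH: whole years completed by `today`."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


print(age_on(date(2000, 5, 20), date(2024, 5, 19)))  # 23
print(age_on(date(2000, 5, 20), date(2024, 5, 20)))  # 24
```

Because AGE is always computed from the date of birth, it never needs to be stored or updated as time passes.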

Keys
A key is a single field or a combination of multiple fields. Its purpose is to access or retrieve data rows
from a table according to the requirement. Keys are defined in tables to access or sequence
the stored data quickly and smoothly. They are also used to create links between different
tables. A key may also be called a key field or sort key. Keys help enforce integrity
and help identify the relationships between tables.

In DBMS, there are several kinds of keys, which help database systems manage their operation
efficiently. The various types of keys are as follows:

a) Super Key: A super key is a set of one or more attributes within a table whose values can be
used to uniquely identify a row. A table might have many super keys. Candidate keys are a
special subset of super keys. For example, suppose customer is an entity with customer-id and
customer-name as attributes. Then {customer-id} and {customer-id, customer-name} are both super keys,
since each can identify a single customer uniquely. But customer-name alone cannot be a super key,
because many customers can have the same name.
b) Candidate Key: A candidate key is a column, or set of columns, in a table that can uniquely
identify any database record without referring to any other data. Each table may have one
or more candidate keys, but one candidate key is special, and it is called the primary key.
This is usually the best among the candidate keys. For example, suppose we have customer-id,
customer-name, and customer-address as attributes. Suppose customer-name and customer-address together
are sufficient to uniquely identify a row. Similarly, customer-id alone may also be enough to uniquely
identify a row. Therefore, {customer-id} and {customer-name, customer-address} are both candidate keys.
Although {customer-id, customer-name} also uniquely identifies a row, it cannot be termed a candidate key,
because its subset customer-id is itself a candidate key. Hence, a candidate key is a minimal set of fields
that uniquely identifies a tuple (row): a super key none of whose proper subsets is also a super key.
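The customer example can be checked mechanically. The sketch below tests whether a set of attributes is a super key of a given table instance and then searches, smallest first, for candidate keys, meaning super keys that contain no smaller key. The function names and sample rows are hypothetical, and underscores replace the hyphens in the attribute names. A real key is a constraint declared on the schema. Data can only rule a key out, never prove it, so this checks consistency with one instance.

```python
from itertools import combinations


def is_superkey(rows, attrs) -> bool:
    """True if no two rows in this instance agree on all of `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes) -> list:
    """Minimal super keys: super keys none of whose proper subsets is a super key."""
    keys = []
    for size in range(1, len(attributes) + 1):      # smallest combinations first
        for combo in combinations(attributes, size):
            # Skip supersets of an already-found key; they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


customers = [
    {"customer_id": 1, "customer_name": "Ravi", "customer_address": "Pune"},
    {"customer_id": 2, "customer_name": "Ravi", "customer_address": "Delhi"},
    {"customer_id": 3, "customer_name": "Asha", "customer_address": "Pune"},
]
ATTRS = ("customer_id", "customer_name", "customer_address")

print(candidate_keys(customers, ATTRS))
# [('customer_id',), ('customer_name', 'customer_address')]
```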

c) Primary Key: A primary key is a value that can be used to identify a unique row in a table. It
can either be a normal attribute that is guaranteed to be unique (such as Social Security
Number in a table with no more than one record per person) or it can be generated by the
DBMS (such as auto-increment number). Primary keys may consist of a single attribute or
multiple attributes in combination. A primary key column cannot contain a NULL value.

d) Foreign Key: A foreign key is a field in a relational table that matches the primary key
column of another table. The foreign key can be used to make sure that the rows in one
table have corresponding rows in another table. Foreign key values can be null, even though
primary key values can't.

[Figure: Category table (Primary Key: CategoryID) and Product table (Foreign Key: CategoryID)]
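The Category/Product pair can be sketched as follows, assuming SQLite via Python's sqlite3 module. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The column names other than CategoryID, and the sample rows, are hypothetical. A product that points to a missing category is rejected, while a NULL foreign key is accepted, as the text notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
conn.execute(
    "CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT, "
    "CategoryID INTEGER REFERENCES Category (CategoryID))"
)
conn.execute("INSERT INTO Category VALUES (1, 'Beverages')")
conn.commit()


def add_product(product_id: int, name: str, category_id) -> bool:
    """Return True if the row is accepted, False if the foreign key rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Product VALUES (?, ?, ?)", (product_id, name, category_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_product(10, "Tea", 1))          # True: category 1 exists
print(add_product(11, "Widget", 99))      # False: no category 99
print(add_product(12, "Unsorted", None))  # True: a NULL foreign key is allowed
```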
Specialization, Generalization and Aggregation
Specialization: The process of designating subgroupings within an entity set is called specialization.
We use the IS A relationship to represent specialization. The IS A relationship may also be referred to as a
superclass-subclass relationship. Subclasses are defined on the basis of some distinguishing
characteristics of the entities in the superclass. Subclasses inherit the attributes and relationships of
their superclasses.
Example: Salaried Employee IS A Employee
Hourly Employee IS A Employee

Generalization: The reverse of specialization is generalization. Several classes with common features
are generalized into a superclass. For example, the entity types Car and Truck share the common attributes
LicensePlateNo, VehicleID and Price; therefore they can be generalized into the superclass Vehicle.
Generalization is used to emphasize the similarities among lower-level entity sets and to hide their
differences. It makes the ER diagram simpler because shared attributes are not repeated. Generalization is
denoted by a triangle component labeled ‘IS A’.
Aggregation: One limitation of the E-R model is that it cannot express relationships among relationships. The
best way to define relationship among relationship is the use of aggregation. To illustrate the need for such a
construct, consider the relationship works-on between employee, branch, and job. The best way to model a
situation like this is by the use of aggregation: the relationship set works-on, relating the entity sets
employee, branch and job, is treated as a higher-level entity set. Such an entity set is treated in the same manner
as any other entity set. We can then create a binary relationship manages between works-on and manager to represent
who manages what task.

[Figures: ER diagram with redundant relationships; ER diagram with aggregation]
Relational Database
A relational database is a collection of data items organized as a set of formally described tables from which
data can be accessed easily. A relational database is created using the relational model. The software used in a
relational database is called a relational database management system (RDBMS). A relation is usually described
as a table, which is organized into rows and columns. In a relational database, all data are stored and accessed
via relations.

Relational databases are created using a special computer language, structured query language (SQL) that is
the standard for database interoperability. SQL is the foundation of the popular relational database products
available today, from Access to Oracle.

Types of Data Models


a) Hierarchical Model
b) Network Model
c) Relational Model
d) Object Relational Model

a) Hierarchical Model: A hierarchical database model is a data model in which the data is organized into
a tree-like structure. The structure allows representing information using parent/child relationships:
each parent can have many children, but each child has only one parent.
Redundancy would occur because hierarchical databases handle one-to-many relationships well but do
not handle many-to-many relationships well. This is because a child may only have one parent.
However, in many cases we want to have the child be related to more than one parent.

b) Network Model: While the hierarchical database model structures data as a tree of records, with each
record having one parent record and many children, the network model allows each record to have
multiple parent and child records, forming a generalized graph structure. This allows the network model to
support many-to-many relationships.
Advantages of a Network Database Model
- Since it supports many-to-many relationships, any record in a network database can be reached easily
- For more complex data, it is easier to use because of the multiple relationships among its data
- It is easier to navigate and search for information because of its flexibility

Disadvantage of a Network Database Model

- Difficult for first-time users
- Difficult to alter, because changing the stored information can require changes throughout the entire
database

Network Database vs. Hierarchical Database Model

- Relationships: the network model supports many-to-many relationships; the hierarchical model supports
one-to-many relationships.
- Navigation: the network model is easily accessed because of the linkage between the information; the
hierarchical model is difficult to navigate because of its strict owner-to-member connection.
- Flexibility: the network model offers great flexibility among the information files because of the multiple
relationships among the files; the hierarchical model offers less flexibility with the collection of
information because of the hierarchical position of the files.

c) Relational Model: Described in previous topic.


d) Object Relational Model: An object-relational database (ORD), or object-relational database
management system (ORDBMS), is a database management system (DBMS) similar to a relational
database, but with an object-oriented database model: objects, classes and inheritance are directly
supported in database schemas and in the query language. The primary goal of the object-relational model
is to provide more power, greater flexibility, better performance, and greater data integrity
than the models that came before it. Object-relational databases extend the relational model with
object-oriented features. They support complex data types such as BLOBs, abstract data types,
multimedia, etc.

An object-relational database can be said to provide a middle ground between relational databases
and object-oriented databases (OODBMS). In object-relational databases, the approach is
essentially that of relational databases: the data resides in the database and is manipulated
collectively with queries in a query language; at the other extreme is OODBMS in which the
database is essentially a persistent object store for software written in an object-oriented
programming language, with a programming API for storing and retrieving objects, and little or
no specific support for querying.
Structured Query Language
Structured Query Language is a special-purpose programming language designed for managing data in
relational database management systems (RDBMS). SQL is an ANSI (American National Standards
Institute) standard language used for manipulating data.
SQL allows users to access data in relational database management systems, such as Oracle, Sybase,
Informix, Microsoft SQL Server, Access, and others, by allowing users to describe the data they wish to
see. SQL also allows users to define the data in a database, and manipulate that data. Some common
SQL commands include "select," "insert," "update," and "delete." The language was first developed at IBM
in the mid-1970s and was called SEQUEL, for "Structured English Query Language."

SQL is based on set and relational operations with certain modifications and enhancements. A typical
SQL query has the form:
SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
WHERE P
where each Ai represents an attribute, each ri represents a relation, and P is a predicate.
And the final result of an SQL query is a relation.
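One way to read the SELECT-FROM-WHERE form is as three steps: form the Cartesian product of r1, ..., rm, keep the combinations that satisfy P (selection), and keep only A1, ..., An (projection). The pure-Python sketch below is a hypothetical illustration of that reading, not how a real DBMS evaluates queries. Relations are lists of dicts, and column names are assumed to be distinct across relations. Like SQL without DISTINCT, it keeps duplicate result rows.

```python
from itertools import product


def select_from_where(attrs, relations, predicate) -> list:
    """Evaluate SELECT attrs FROM r1, ..., rm WHERE predicate (illustrative only)."""
    result = []
    for combo in product(*relations):      # Cartesian product r1 x r2 x ... x rm
        row = {}
        for part in combo:
            row.update(part)               # merge the component rows into one
        if predicate(row):                 # selection: keep rows satisfying P
            result.append(tuple(row[a] for a in attrs))  # projection onto A1..An
    return result


employee = [{"emp": "Asha", "dept_id": 1}, {"emp": "Ravi", "dept_id": 2}]
dept = [{"d_id": 1, "dname": "Sales"}, {"d_id": 2, "dname": "R&D"}]

# SELECT emp, dname FROM employee, dept WHERE dept_id = d_id
print(select_from_where(("emp", "dname"), [employee, dept], lambda r: r["dept_id"] == r["d_id"]))
# [('Asha', 'Sales'), ('Ravi', 'R&D')]
```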

SQL statements are divided into two major categories: data manipulation language (DML) and data
definition language (DDL). DML statements are used to work with the data in tables. DDL statements
are used to build and modify the structure of your tables and other objects in the database. In many
systems (Oracle, for example), a DDL statement takes effect immediately and is committed implicitly.

SQL Operations (Some DML Statements)


a) Select Clause: The select clause corresponds to the projection operation of the relational algebra. It is used
to list the attributes desired in the result of a query. The result is stored in a result table, called the result-set.
The basic SELECT statement has 3 clauses:
– SELECT, which is mandatory and lists the columns to return
– FROM, which defines the table from which data is retrieved
– WHERE, an optional condition used to filter rows

Examples of the select clause:

SELECT column_name(s) FROM table_name; (select specific column(s)/attributes from the table)
SELECT * FROM table_name; (select all columns/attributes from the table)
Practical example of the select clause:
SELECT serialno, name FROM country_details WHERE city = 'Rome';

Here, in this statement,
– serialno and name are attributes/fields
– country_details is the table
– city = 'Rome' is the predicate
The statement returns a result-set containing the serialno and name of every row whose city is Rome.

With the WHERE clause, the following comparison operators can be used:

Operator | Description
---------+-------------------------------------------------
=        | Equal
<>       | Not equal
>        | Greater than
<        | Less than
>=       | Greater than or equal
<=       | Less than or equal
BETWEEN  | Between an inclusive range
LIKE     | Search for a pattern
IN       | To specify multiple possible values for a column

Logical operators (AND, OR, NOT) and more advanced operations (subqueries, joining tables, grouping
queries) are also available in SQL.
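A few of these operators can be tried out with Python's sqlite3 module. The rows of country_details below are hypothetical, invented only so that each operator has something to match. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which is not true of every DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical contents for the country_details table used above.
conn.execute("CREATE TABLE country_details (serialno INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO country_details VALUES (?, ?, ?)",
    [(1, "Italy", "Rome"), (2, "France", "Paris"),
     (3, "Italy", "Milan"), (4, "Spain", "Madrid"), (5, "Vatican", "Rome")],
)

def names(sql):
    """Run a query and return the first column of every row, in serialno order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY serialno")]

# The query from the text: rows whose city is Rome.
rome = names("SELECT name FROM country_details WHERE city = 'Rome'")
# BETWEEN is inclusive at both ends, so serialno 2, 3 and 4 all qualify.
mid = names("SELECT name FROM country_details WHERE serialno BETWEEN 2 AND 4")
# LIKE with % matches any sequence of characters: cities starting with M.
m_cities = names("SELECT city FROM country_details WHERE city LIKE 'M%'")
# IN lists several acceptable values for one column.
listed = names("SELECT name FROM country_details WHERE city IN ('Paris', 'Madrid')")
# Conditions can be combined with AND, OR and NOT.
combo = names("SELECT name FROM country_details WHERE city = 'Rome' AND NOT (name = 'Italy')")

print(rome)      # ['Italy', 'Vatican']
print(mid)       # ['France', 'Italy', 'Spain']
print(m_cities)  # ['Milan', 'Madrid']
print(listed)    # ['France', 'Spain']
print(combo)     # ['Vatican']
```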

b) Insert Clause: The INSERT INTO statement is used to insert a new row into a table. It can be written in two
forms. The first form doesn't specify the column names where the data will be inserted, only their values:

INSERT INTO table_name VALUES (value1, value2, value3,...)

The second form specifies both the column names and the values to be inserted:

INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,...)

The number of columns and values must be the same. If a column is not specified, the default value for the
column is used. The values specified (or implied) by the INSERT statement must satisfy all the applicable
constraints (such as primary keys, CHECK constraints, and NOT NULL constraints). If a syntax error occurs or if
any constraints are violated, the new row is not added to the table and an error is returned instead.

Example of an INSERT statement:


INSERT into developers (code, name) values (5, 'Bob');
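The sketch below runs both INSERT forms against SQLite through Python's sqlite3 module. The developers table layout (a primary key, a NOT NULL name and a defaulted team column) is an assumption made for illustration; it shows the default being filled in and constraint-violating rows being refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical developers table: code is the primary key, name may not be NULL,
# and team falls back to a default when no value is supplied.
conn.execute("""
    CREATE TABLE developers (
        code INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        team TEXT DEFAULT 'unassigned'
    )
""")

# First form: a value for every column, in table order.
conn.execute("INSERT INTO developers VALUES (1, 'Alice', 'backend')")
# Second form: named columns only; the omitted column team gets its default.
conn.execute("INSERT INTO developers (code, name) VALUES (5, 'Bob')")

def try_insert(sql):
    """Return None if the row was inserted, otherwise SQLite's error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

# Both statements violate a constraint, so neither row is added.
dup_key = try_insert("INSERT INTO developers (code, name) VALUES (5, 'Carol')")  # duplicate key
no_name = try_insert("INSERT INTO developers (code) VALUES (6)")                 # NULL name

rows = conn.execute("SELECT code, name, team FROM developers ORDER BY code").fetchall()
print(rows)  # [(1, 'Alice', 'backend'), (5, 'Bob', 'unassigned')]
print(dup_key)
print(no_name)
```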

c) Delete Clause: Sometimes you need to delete one or more rows from a table. The
DELETE statement is used to delete rows in a table.

Syntax for delete operation is:


DELETE FROM table_name WHERE some_column=some_value;

Without the where clause, all records in the table will be deleted. Syntax without where clause is:
DELETE FROM table_name;

Once a DELETE has been committed, it cannot be undone; before the transaction commits, it can still be rolled
back. It is also important to understand that a DELETE statement removes entire records, not just the data in
specified fields. If you only want to clear certain fields, use an UPDATE statement that sets them to NULL.
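A minimal sketch of these three points with Python's sqlite3 module, assuming a hypothetical staff table. The connection is opened with isolation_level=None so that the module issues no implicit BEGIN and the transaction boundaries are exactly the explicit BEGIN and ROLLBACK shown.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                 [(1, "Ann", "555-0101"), (2, "Raj", "555-0102"), (3, "Li", "555-0103")])

def count():
    return conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0]

# DELETE with a WHERE clause removes only the matching row(s); it commits at once here.
conn.execute("DELETE FROM staff WHERE id = 2")
after_where = count()

# Inside an uncommitted transaction, even a DELETE without WHERE can be undone.
conn.execute("BEGIN")
conn.execute("DELETE FROM staff")
inside_txn = count()
conn.execute("ROLLBACK")
after_rollback = count()

# To clear one field rather than the whole record, use UPDATE ... SET ... = NULL.
conn.execute("UPDATE staff SET phone = NULL WHERE id = 3")
li = conn.execute("SELECT name, phone FROM staff WHERE id = 3").fetchone()
print(after_where, inside_txn, after_rollback, li)  # 2 0 2 ('Li', None)
```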

d) Update Clause: The UPDATE statement is used to update existing records (rows) in a table.

Syntax for update operation is:

UPDATE table_name SET column1=value1, column2=value2,...
WHERE some_column=some_value;

Without the WHERE clause, every row in the table will be updated, so the UPDATE command must be used
carefully. A single column or multiple columns can be modified at a time. However, the updated values must
satisfy all the applicable constraints (such as primary keys, unique indexes, CHECK constraints, and NOT
NULL constraints).
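The following sketch, again using Python's sqlite3 module and a hypothetical emp table with a CHECK constraint, shows an UPDATE restricted by WHERE, an UPDATE without WHERE touching every row, and an UPDATE refused because it would break a constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint rejects negative salaries.
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, "
             "salary INTEGER CHECK (salary >= 0))")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "IT", 100), (2, "HR", 200), (3, "IT", 300)])

# With WHERE: only IT rows change, and two columns are set in one statement.
conn.execute("UPDATE emp SET salary = salary + 10, dept = 'ENG' WHERE dept = 'IT'")
after_where = conn.execute("SELECT id, dept, salary FROM emp ORDER BY id").fetchall()

# Without WHERE: every row in the table is updated.
conn.execute("UPDATE emp SET salary = 0")
after_all = conn.execute("SELECT salary FROM emp ORDER BY id").fetchall()

# An update that would violate the CHECK constraint is rejected.
try:
    conn.execute("UPDATE emp SET salary = -5 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(after_where)  # [(1, 'ENG', 110), (2, 'HR', 200), (3, 'ENG', 310)]
print(after_all)    # [(0,), (0,), (0,)]
print(rejected)     # True
```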

SQL Operations (Some DDL Statements)


a) Create Clause: The create clause is used whenever there is a need to create a table, database, user, trigger or a
view. Syntax for creating a table is:

CREATE TABLE table_name
(column_name1 data_type, column_name2 data_type, column_name3 data_type, ....);

The data type specifies what type of data the column can hold, for example INT, VARCHAR or DATE.

Example:
CREATE TABLE Persons
(P_Id int, LastName varchar(255), FirstName varchar(255), Address varchar(255), City varchar(255));
b) Drop Clause: The drop clause is used whenever there is a need to drop a table, database, user, trigger or a
view. Syntax for dropping a table is:

DROP TABLE table_name;

Dropping an object requires the privilege to drop it.

c) Alter Clause: Alter clause is used whenever there is a need to alter a table, database, user, trigger or a
view. Syntax for altering any object is:

ALTER objecttype objectname parameters


For example,

ALTER TABLE student ADD rollno INTEGER;


ALTER TABLE student DROP COLUMN rollno;
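The CREATE / ALTER / DROP life cycle can be sketched with Python's sqlite3 module, reusing the Persons and student examples. This assumes SQLite's dialect: ALTER TABLE ... DROP COLUMN exists only from SQLite 3.35.0 onward, so the sketch checks the library version before using it, and PRAGMA table_info is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def tables():
    return [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# CREATE TABLE, using the Persons example from the text.
conn.execute("""CREATE TABLE Persons (P_Id int, LastName varchar(255), FirstName varchar(255),
                Address varchar(255), City varchar(255))""")
conn.execute("CREATE TABLE student (name TEXT)")

# ALTER TABLE ... ADD adds a column to an existing table.
conn.execute("ALTER TABLE student ADD rollno INTEGER")
after_add = columns("student")

# ALTER TABLE ... DROP COLUMN needs SQLite 3.35.0 or later.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE student DROP COLUMN rollno")
after_drop = columns("student")

# DROP TABLE removes the table definition together with all of its rows.
conn.execute("DROP TABLE Persons")
print(after_add, after_drop, tables())
```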

Query by Example

Query-by-Example (QBE) is another language for querying (and, like SQL, for creating and modifying) relational
data. It differs from SQL, and from most other database query languages, in having a graphical user
interface that lets users write queries by creating example tables on the screen. For example, a user may
want to select an entry with an ID of 123 from a table called "Table1". Using SQL, the user would need to enter
the command "SELECT * FROM Table1 WHERE ID = 123". The QBE interface may allow the user to just click on
Table1, type "123" in the ID field and click "Search."

QBE is offered with many database programs, though the interface often differs between applications. For
example, Microsoft Access has a completely graphical QBE interface known as "Query Design View". The
phpMyAdmin application used with MySQL offers a Web-based interface where users can select a query
operator and fill in blanks with search terms. Whatever QBE implementation a program provides, the
purpose is the same: to make it easier to run database queries and to avoid the frustration of SQL errors.
UNIT- II

Integrity constraints provide a way of ensuring that changes made to the database by authorized users do not
result in a loss of data consistency; they are used to ensure the accuracy and consistency of data in a relational
database. Enforcing data integrity ensures the quality of the data in the database. For example, if an employee
is entered with an employee_id value of 123, the database should not allow another employee to have an ID
with the same value. If you have an employee_rating column intended to have values ranging from 1 to 5, the
database should not accept a value of 6. Data integrity falls into the following categories:

a) Domain Constraints/ Integrity


b) Referential Constraints/ Integrity

Domain Constraints/ Integrity: Domain constraints are the most elementary form of integrity constraint. A
domain is defined as the set of all values permitted for an attribute. Domain integrity enforces valid
entries for a given column by restricting the type, the format, or the range of possible values. Such constraints
are easy for the system to test whenever a new data item is entered into the database.

The check clause permits domains to be restricted. For example, we can use a check clause to ensure that an
hourly_wage domain allows only values of at least 4.00:

CREATE DOMAIN hourly_wage NUMERIC(5,2) CONSTRAINT value_test CHECK (VALUE >= 4.00);

– The domain hourly_wage is declared to be a decimal number with 5 digits, 2 of which are after the
decimal point.
– The domain has a constraint that ensures that the hourly wage is at least 4.00.
– The clause CONSTRAINT value_test is optional; naming the constraint is useful to indicate which constraint an
update violated.
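SQLite, used here through Python's sqlite3 module, has no CREATE DOMAIN, so this sketch expresses the same hourly-wage rule as a named column-level CHECK constraint on a hypothetical worker table. SQLite also does not enforce NUMERIC(5,2) precision, so only the range test is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No CREATE DOMAIN in SQLite: the domain rule becomes a named column CHECK.
conn.execute("""
    CREATE TABLE worker (
        name TEXT,
        hourly_wage NUMERIC CONSTRAINT value_test CHECK (hourly_wage >= 4.00)
    )
""")

def insert(name, wage):
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        conn.execute("INSERT INTO worker VALUES (?, ?)", (name, wage))
        return True
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)  # SQLite reports a CHECK constraint failure
        return False

ok = insert("Asha", 7.25)
edge = insert("Ben", 4.00)   # exactly 4.00 is allowed, since the test is >=
bad = insert("Cy", 3.50)
print(ok, edge, bad)  # True True False
```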

Referential Constraints/ Integrity: Referential integrity is a database concept that ensures that
relationships between tables remain consistent. Referential integrity prevents inconsistent data from being
created in the database by ensuring that any data shared between tables remains consistent. To put it
another way, it ensures that the soundness of the relationships remains intact.
For referential integrity to hold in a relational database, any field in a table that is declared a foreign
key can contain only NULL or values from a parent table's primary key or a candidate key. For instance, deleting a
record that contains a value referred to by a foreign key in another table would break referential integrity.
Some relational database management systems (RDBMS) can enforce referential integrity, normally either
by also deleting the rows that refer to the deleted record (a cascading delete), or by returning an error and
not performing the delete.
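Both enforcement behaviours can be seen in SQLite through Python's sqlite3 module. The dept and employee tables below are hypothetical; one child table uses the default action (the delete is refused) and the other declares ON DELETE CASCADE. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
# Child table 1: deleting a referenced department is refused (default NO ACTION).
conn.execute("CREATE TABLE emp_strict (id INTEGER PRIMARY KEY, "
             "dept_id INTEGER REFERENCES dept(id))")
# Child table 2: deleting a department also deletes the rows that point to it.
conn.execute("CREATE TABLE emp_cascade (id INTEGER PRIMARY KEY, "
             "dept_id INTEGER REFERENCES dept(id) ON DELETE CASCADE)")

conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Sales"), (2, "Ops")])
conn.execute("INSERT INTO emp_strict VALUES (10, 1)")
conn.execute("INSERT INTO emp_cascade VALUES (20, 2)")

# Department 1 is still referenced by emp_strict, so the delete returns an error.
try:
    conn.execute("DELETE FROM dept WHERE id = 1")
    strict_deleted = True
except sqlite3.IntegrityError:
    strict_deleted = False

# Department 2 is referenced only with ON DELETE CASCADE, so its employee goes too.
conn.execute("DELETE FROM dept WHERE id = 2")
left = conn.execute("SELECT COUNT(*) FROM emp_cascade").fetchone()[0]
print(strict_deleted, left)  # False 0
```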
Sample SQL using Key constraints:

For Creating Primary Key

CREATE TABLE Persons
(P_Id int NOT NULL, LastName varchar(255) NOT NULL, FirstName varchar(255),
Address varchar(255), City varchar(255), PRIMARY KEY (P_Id));

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple columns:

CREATE TABLE Persons (P_Id int NOT NULL,LastName varchar(255) NOT NULL,FirstName varchar(255),
Address varchar(255),City varchar(255),CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName))

In the example above there is only ONE PRIMARY KEY (pk_PersonID). However, the value of the pk_PersonID is
made up of two columns (P_Id and LastName). If a primary key consists of more than one column, duplicate
values are allowed in one column, but each combination of values from all the columns in the primary key
must be unique.
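The rule for composite keys can be checked directly. This sketch uses Python's sqlite3 module with the pk_PersonID definition above (Address and City omitted for brevity); the person names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Named composite primary key, as in the pk_PersonID example (other columns omitted).
conn.execute("""CREATE TABLE Persons (P_Id int NOT NULL, LastName varchar(255) NOT NULL,
                FirstName varchar(255),
                CONSTRAINT pk_PersonID PRIMARY KEY (P_Id, LastName))""")

def add(p_id, last, first):
    """Return True if the row was inserted, False if the primary key rejected it."""
    try:
        conn.execute("INSERT INTO Persons VALUES (?, ?, ?)", (p_id, last, first))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    add(1, "Hansen", "Ola"),     # new combination: accepted
    add(1, "Svendson", "Tove"),  # P_Id repeats, but (1, 'Svendson') is new: accepted
    add(2, "Hansen", "Kari"),    # LastName repeats, combination is new: accepted
    add(1, "Hansen", "Other"),   # (1, 'Hansen') already exists: rejected
]
print(results)  # [True, True, True, False]
```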

For Altering Primary Key (either statement adds a primary key to an existing table)


ALTER TABLE Persons ADD PRIMARY KEY (P_Id);
ALTER TABLE Persons ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName);

For Dropping Primary Key


ALTER TABLE Persons DROP PRIMARY KEY; (MySQL)
ALTER TABLE Persons DROP CONSTRAINT pk_PersonID; (SQL Server / Oracle / MS Access)

A FOREIGN KEY in one table points to a PRIMARY KEY in another table.

Let's illustrate the foreign key with an example. Look at the following two tables:
The "Persons" table: P_Id, LastName, FirstName, Address, City
The “Orders” table: O_id, OrderNo, P_Id
Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons" table.
The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables. The FOREIGN
KEY constraint also prevents invalid data from being inserted into the foreign key column, because the value
has to be one of the values contained in the table it points to.
For Creating Foreign Key

CREATE TABLE Orders
(O_Id int NOT NULL, OrderNo int NOT NULL, P_Id int, PRIMARY KEY (O_Id),
CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id) REFERENCES Persons(P_Id));

For Altering Foreign Key


ALTER TABLE Orders ADD CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id) REFERENCES Persons(P_Id)

For Dropping Foreign Key


ALTER TABLE Orders DROP FOREIGN KEY fk_PerOrders; (MySQL)
ALTER TABLE Orders DROP CONSTRAINT fk_PerOrders; (SQL Server / Oracle / MS Access)
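Here the Persons/Orders definitions are run in SQLite through Python's sqlite3 module (LastName only, for brevity) to show the FOREIGN KEY constraint refusing an order for a person who does not exist. The order numbers are hypothetical, and SQLite needs PRAGMA foreign_keys = ON before it checks foreign keys at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE Persons (P_Id int NOT NULL, LastName varchar(255), PRIMARY KEY (P_Id))")
conn.execute("""CREATE TABLE Orders (O_Id int NOT NULL, OrderNo int NOT NULL, P_Id int,
                PRIMARY KEY (O_Id),
                CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id) REFERENCES Persons(P_Id))""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen')")

def insert_order(o_id, order_no, p_id):
    try:
        conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", (o_id, order_no, p_id))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"

first = insert_order(1, 77895, 1)      # ok: person 1 exists
second = insert_order(2, 44678, 99)    # rejected: there is no person 99
third = insert_order(3, 22456, None)   # ok: a NULL foreign key references nothing
print(first, second, third, sep="\n")
```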

Assertion:
An assertion is a predicate expressing a condition that we want the database always to satisfy. Assertions are
included in the SQL standard, although most widely used DBMSs do not implement them. Syntax:

CREATE ASSERTION <name> CHECK (<predicate>)

When an assertion is created, the system tests it for validity. If the assertion is valid, any further modification
to the database is allowed only if it does not cause that assertion to be violated. Because the assertion must be
rechecked at every change to the relations mentioned in its declaration, this testing may introduce a significant
amount of computing overhead (query evaluation), so assertions should be used carefully. Domain constraints,
functional dependencies and referential integrity are special forms of assertion.

For example, if you want to prevent investors from withdrawing more than a certain amount of money from a
collective fund, you could create an assertion using the following SQL statement:

CREATE ASSERTION maximum_withdrawal
CHECK (NOT EXISTS (
  SELECT * FROM investor
  WHERE investor.withdrawal_limit <= (SELECT SUM(withdrawals.amount)
                                      FROM withdrawals
                                      WHERE withdrawals.investor_id = investor.ID)));

Once you add the MAXIMUM_WITHDRAWAL ASSERTION to the database definition, the DBMS will check to
make sure that the assertion remains TRUE each time you execute an SQL statement that modifies either the
INVESTOR or WITHDRAWALS tables. As such, each time the user or application program attempts to execute an
INSERT, UPDATE, or DELETE statement on one of the tables in the assertion's CHECK clause, the DBMS checks
the check condition against the database, including the proposed modification. If the check condition remains
TRUE, the DBMS carries out the modification. If the modification makes the check condition FALSE, the DBMS
does not perform the modification and returns an error code indicating that the statement was unsuccessful
due to an assertion violation.

Example: ensuring that every loan has at least one borrower who maintains an account with a balance of at
least $1000:

CREATE ASSERTION balance_constraint CHECK (NOT EXISTS (SELECT * FROM loan L
WHERE NOT EXISTS (SELECT * FROM borrower B, depositor D, account A WHERE L.loan_no = B.loan_no AND
B.cname = D.cname AND D.account_no = A.account_no AND A.balance >= 1000)));
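Because SQLite (like most widely used systems) does not support CREATE ASSERTION, this sketch evaluates the balance assertion's predicate as an ordinary query through Python's sqlite3 module. A DBMS with assertion support would run such a check automatically after each modification; here it is called by hand. The customers, loans and balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan      (loan_no TEXT);
    CREATE TABLE borrower  (cname TEXT, loan_no TEXT);
    CREATE TABLE depositor (cname TEXT, account_no TEXT);
    CREATE TABLE account   (account_no TEXT, balance NUMERIC);
    INSERT INTO loan      VALUES ('L1'), ('L2');
    INSERT INTO borrower  VALUES ('Ada', 'L1'), ('Bo', 'L2');
    INSERT INTO depositor VALUES ('Ada', 'A1'), ('Bo', 'A2');
    INSERT INTO account   VALUES ('A1', 1500), ('A2', 800);
""")

# The assertion's predicate, evaluated as a query: 1 means it holds, 0 means violated.
PREDICATE = """
    SELECT NOT EXISTS (
        SELECT * FROM loan L
        WHERE NOT EXISTS (
            SELECT * FROM borrower B, depositor D, account A
            WHERE L.loan_no = B.loan_no AND B.cname = D.cname
              AND D.account_no = A.account_no AND A.balance >= 1000))
"""

def assertion_holds():
    return conn.execute(PREDICATE).fetchone()[0] == 1

before = assertion_holds()  # False: Bo's only account holds 800
conn.execute("UPDATE account SET balance = 1200 WHERE account_no = 'A2'")
after = assertion_holds()   # True: every loan now has a borrower with >= 1000
print(before, after)
```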

Triggers:
A trigger is a statement (procedure) that is executed automatically by the DBMS whenever a specified event
occurs. Triggers are mostly used for maintaining the integrity of the information in the database. For
example, when a new record is added to the employees table, new records should also be created in the taxes,
vacations and salaries tables.
Since triggers are event-driven specialized procedures, they are stored in and managed by the DBMS. A trigger
cannot be called directly; the DBMS automatically fires the trigger as a result of a data modification to the
associated table.
A trigger defines a set of actions that are performed in response to an insert, update, or delete operation on a
specified table. When such an SQL operation is executed, the trigger is said to have been activated. Triggers are
optional and are defined using the CREATE TRIGGER statement.
The general syntax of CREATE TRIGGER (as written in MySQL) is:
CREATE TRIGGER trigger_name trigger_time trigger_event ON tbl_name FOR EACH ROW trigger_statement
where trigger_time is BEFORE or AFTER, and trigger_event is INSERT, UPDATE or DELETE.
The general syntax of DROP TRIGGER is:
DROP TRIGGER trigger_name
An example: all new customers opening an account must have a balance of at least $100; however, once the
account is opened, the balance can fall below that amount. In this case you have to use a trigger, because you
want the condition evaluated only when a new record is inserted.
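The opening-balance rule can be sketched as a BEFORE INSERT trigger. This uses SQLite's dialect through Python's sqlite3 module: the WHEN clause and RAISE(ABORT, ...) are SQLite features (MySQL writes trigger bodies differently), and the account table and trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, balance NUMERIC)")
# Fires only on INSERT, so the $100 minimum applies when an account is opened
# but not to later UPDATEs.
conn.execute("""
    CREATE TRIGGER min_opening_balance
    BEFORE INSERT ON account
    FOR EACH ROW
    WHEN NEW.balance < 100
    BEGIN
        SELECT RAISE(ABORT, 'opening balance must be at least 100');
    END
""")

def open_account(no, amount):
    try:
        conn.execute("INSERT INTO account VALUES (?, ?)", (no, amount))
        return True
    except sqlite3.DatabaseError as exc:  # RAISE(ABORT, ...) makes the INSERT fail
        print("rejected:", exc)
        return False

opened_low = open_account("A1", 50)   # False: below the opening minimum
opened_ok = open_account("A2", 150)   # True
conn.execute("UPDATE account SET balance = 20 WHERE account_no = 'A2'")  # allowed later
balance = conn.execute("SELECT balance FROM account WHERE account_no = 'A2'").fetchone()[0]
print(opened_low, opened_ok, balance)  # False True 20
```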
Normalization
First Normal Form:
First normal form imposes a very basic requirement on relations; unlike the other normal forms, it does not
require additional information such as functional dependencies.

A domain is atomic if elements of the domain are considered to be indivisible units. We say that a relation
schema R is in first normal form (1NF) if the domains of all attributes of R are atomic.
For example, consider the following table. We can see that the Movies Rented column holds multiple values,
i.e. it is non-atomic.

Hence we take it to 1NF, where every value is atomic.
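Converting a multi-valued column to 1NF amounts to emitting one row per value. The plain-Python sketch below assumes the multiple movies are stored as one comma-separated string (titles that themselves contain commas would break this); the member names and titles are hypothetical.

```python
# Hypothetical un-normalized rows: the second field holds several movies at once.
unnormalized = [
    ("Janet", "Movie A, Movie B"),
    ("Robert", "Movie C, Movie D"),
    ("Robert", "Movie B"),
]

def to_first_normal_form(rows):
    """Return one (member, movie) row per movie, so every value is atomic."""
    flat = []
    for member, movies in rows:
        for movie in movies.split(","):
            flat.append((member, movie.strip()))
    return flat

flat = to_first_normal_form(unnormalized)
for row in flat:
    print(row)
```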


Second Normal Form
The following relation is in First Normal Form, but not Second Normal Form:
Order # | Customer        | Contact Person  | Total
--------+-----------------+-----------------+---------
1       | Acme Widgets    | John Doe        | $134.23
2       | ABC Corporation | Fred Flintstone | $521.24
3       | Acme Widgets    | John Doe        | $1042.42
4       | Acme Widgets    | John Doe        | $928.53

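In the table above, the order number serves as the primary key. Notice that the customer and total amount
are dependent upon the order number: this data is specific to each order. However, the contact person is
dependent upon the customer. (Strictly speaking, because the key here is the single column Order #, this is a
transitive dependency, Order # → Customer → Contact Person, which 3NF addresses; a 2NF violation requires a
non-key attribute that depends on only part of a composite key.) An alternative design is to create two tables: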

Customer        | Contact Person
----------------+----------------
Acme Widgets    | John Doe
ABC Corporation | Fred Flintstone

Order # | Customer        | Total
--------+-----------------+---------
1       | Acme Widgets    | $134.23
2       | ABC Corporation | $521.24
3       | Acme Widgets    | $1042.42
4       | Acme Widgets    | $928.53
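The two-table design can be tried in SQLite through Python's sqlite3 module. The table and column names below are assumptions (spaces and "#" removed), and the replacement contact "Jane Roe" is hypothetical. A change of contact person becomes a one-row update, and a join rebuilds the original wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Contact person is stored once per customer ...
    CREATE TABLE customers (customer TEXT PRIMARY KEY, contact_person TEXT);
    -- ... and each order only records which customer placed it.
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY,
                         customer TEXT REFERENCES customers(customer), total NUMERIC);
    INSERT INTO customers VALUES ('Acme Widgets', 'John Doe'),
                                 ('ABC Corporation', 'Fred Flintstone');
    INSERT INTO orders VALUES (1, 'Acme Widgets', 134.23), (2, 'ABC Corporation', 521.24),
                              (3, 'Acme Widgets', 1042.42), (4, 'Acme Widgets', 928.53);
""")

# Changing the contact person is now a single-row update instead of three.
conn.execute("UPDATE customers SET contact_person = 'Jane Roe' WHERE customer = 'Acme Widgets'")

# A join reproduces the original wide table whenever it is needed.
joined = conn.execute("""
    SELECT o.order_no, o.customer, c.contact_person, o.total
    FROM orders o JOIN customers c ON o.customer = c.customer
    ORDER BY o.order_no
""").fetchall()
for row in joined:
    print(row)
```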

Third Normal Form

Third normal form is pretty simple. In plain terms it means that no column can depend on a non-key column. In
technical terms we say that there are no "transitive dependencies."

The example below shows a table that is not in third normal form. The problem is that the letter grade
depends on the numeric grade:

SUBJECT  | YEAR | TEACHER | STUDENT    | GRADE | LETTER
---------+------+---------+------------+-------+-------
HIST-101 | 2008 | RUSSELL | NIRGALAI   | 80    | B
HIST-101 | 2008 | RUSSELL | JBOONE     | 90    | A
HIST-101 | 2008 | RUSSELL | PCLAYBORNE | 95    | A
The key is SUBJECT + YEAR + TEACHER + STUDENT; there are no duplicate values of this combination of columns.
In plain terms, no student can be listed in the same class twice, which makes sense. Next, note that the table
has only one actual property, the GRADE column: the student's final grade in a particular class.

The violation is the last column, LETTER, which is not a property of the key but is functionally dependent upon
the GRADE value. The term "functionally dependent" means that column LETTER is a function of GRADE.

As always, normalization is meant to help us. The problem with the table as-is is that you cannot guarantee
that, when the GRADE column is updated, the LETTER column will be updated correctly. The non-normalized
table can lead to what we call UPDATE ANOMALIES.
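One 3NF repair is to drop LETTER from the table and derive it from a separate grading-scale table. The sketch runs this in SQLite through Python's sqlite3 module; the grade boundaries (A from 90, B from 80, C from 70) are a hypothetical scale, since the text does not give one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF version: the table stores only GRADE; LETTER lives in a lookup table.
    CREATE TABLE results (subject TEXT, year INTEGER, teacher TEXT, student TEXT,
                          grade INTEGER,
                          PRIMARY KEY (subject, year, teacher, student));
    -- Hypothetical grading scale.
    CREATE TABLE grade_scale (min_grade INTEGER, max_grade INTEGER, letter TEXT);
    INSERT INTO grade_scale VALUES (90, 100, 'A'), (80, 89, 'B'), (70, 79, 'C');
    INSERT INTO results VALUES ('HIST-101', 2008, 'RUSSELL', 'NIRGALAI', 80),
                               ('HIST-101', 2008, 'RUSSELL', 'JBOONE', 90),
                               ('HIST-101', 2008, 'RUSSELL', 'PCLAYBORNE', 95);
""")

LETTERS = """
    SELECT r.student, r.grade, g.letter
    FROM results r JOIN grade_scale g ON r.grade BETWEEN g.min_grade AND g.max_grade
    ORDER BY r.student
"""

# Updating GRADE alone can no longer leave a stale LETTER behind.
conn.execute("UPDATE results SET grade = 72 WHERE student = 'JBOONE'")
rows = conn.execute(LETTERS).fetchall()
print(rows)  # [('JBOONE', 72, 'C'), ('NIRGALAI', 80, 'B'), ('PCLAYBORNE', 95, 'A')]
```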

Another example: suppose the Student table had the student's birth date as an attribute and also the student's
age. The student's age depends on the birth date (it is a fact about the birth date), so Third Normal Form is
violated.

Boyce Codd Normal Form


Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly
stronger version of the third normal form (3NF). A relation R is said to be in BCNF if, whenever a functional
dependency X → A holds in R and A is not in X, X is a superkey of R. Most relations that are in 3NF are also
in BCNF.

Example: Person (SI#, Name, Address)


The only FD is SI# → Name, Address. Since SI# is a key, Person is in BCNF.

A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF. If there are
multiple overlapping candidate keys, the table may still suffer from update anomalies.
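The BCNF test can be automated with attribute closure: X is a superkey exactly when X+ contains every attribute, and to test a relation R itself it is enough to check the FDs given for R. In this plain-Python sketch, the Person schema comes from the text, while R(Student, Course, Instructor), where each instructor teaches one course, is a hypothetical counter-example with the overlapping candidate keys {Student, Course} and {Student, Instructor}.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the non-trivial FDs X -> Y whose left side X is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)              # skip trivial FDs
            and closure(lhs, fds) != set(schema)]     # X+ must cover the whole schema

# Person(SI#, Name, Address) with SI# -> Name, Address: in BCNF.
person = ["SI#", "Name", "Address"]
person_fds = [(["SI#"], ["Name", "Address"])]

# Hypothetical R(Student, Course, Instructor): Instructor -> Course has a
# non-superkey left side, so R is in 3NF but not in BCNF.
sci = ["Student", "Course", "Instructor"]
sci_fds = [(["Student", "Course"], ["Instructor"]), (["Instructor"], ["Course"])]

print(bcnf_violations(person, person_fds))  # []
print(bcnf_violations(sci, sci_fds))        # [(['Instructor'], ['Course'])]
```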
Example for Normalizing Tables

Assume a video library maintains a database of movies rented out. Without any normalization, all information is
stored in one table, as shown below. Here in the table, we can see that the Movies Rented column has multiple values.

Hence, taking the table to First Normal Form (1NF), we separate each tuple so that every field holds a single value.

But in the above table there is still repetition of data, as we can see from the many duplicate values. Hence we
need to take this table to Second Normal Form (2NF) by assigning a primary key, MEMBERSHIP ID. The same key is
used as a foreign key in another table. The main purpose of assigning primary/foreign keys is to avoid repetition
of data and to manage it efficiently.
Even after converting the table to 2NF, we can see that changing the non-key attribute FULLNAME may change
SALUTATION. That means SALUTATION is functionally dependent upon FULLNAME, and if FULLNAME is changed by an
update, we may end up with an incorrect SALUTATION. So, to reach Third Normal Form (3NF), we need to separate the
attributes that are functionally dependent on non-key attributes in this table. Hence, our 3NF table would be:

Functional Dependency
A Functional Dependency describes a relationship between attributes in a single relation. An attribute is
functionally dependent on another if we can use the value of one attribute to determine the value of another.

Consider the relation:

STUDENTS = {Student_ID, First_Name, Last_Name}


We may state that the set of attributes {First_Name, Last_Name} is functionally dependent on the attribute set
{Student_ID}. This means that, given a value for Student_ID, we can always uniquely determine the value of
First_Name and the value of Last_Name. Note that, for this relation, the opposite would not be true. For
example, if there are three students with the same first and last name, we will get a list of three student IDs.
That is, we cannot uniquely determine a value for Student_ID given values of the attributes {First_Name,
Last_Name}.

A set of attributes Y is functionally dependent on a set of attributes X if a given set of values for each attribute
in X determines unique values for the set of attributes in Y. We use the notation:

X → Y to denote that Y is functionally dependent on X. The set of attributes X is known as the determinant of
the FD X → Y.
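The definition can be turned into a check on a table instance: X → Y is violated as soon as two rows agree on all of X but differ on Y. This plain-Python sketch uses hypothetical STUDENTS rows. A True result only means this particular instance does not violate the FD; whether the FD holds for the schema is a statement about every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one table instance (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new X value, or returns the Y seen before.
        if seen.setdefault(x, y) != y:
            return False  # same X values, different Y values
    return True

students = [
    {"Student_ID": 1, "First_Name": "Sam", "Last_Name": "Lee"},
    {"Student_ID": 2, "First_Name": "Sam", "Last_Name": "Lee"},
    {"Student_ID": 3, "First_Name": "Ana", "Last_Name": "Diaz"},
]

print(fd_holds(students, ["Student_ID"], ["First_Name", "Last_Name"]))  # True
print(fd_holds(students, ["First_Name", "Last_Name"], ["Student_ID"]))  # False
```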

Trivial and Non-Trivial Functional Dependencies
When examining the FDs associated with a relation, some FDs are often of less interest because they are
self-evident. For example, consider the relation STUDENT (Student_ID, First_Name, Last_Name). The FD
{First_Name, Last_Name} → {First_Name} is true but of little informational interest.

A trivial functional dependency occurs when you describe a functional dependency of an attribute on a
collection of attributes that includes the original attribute. For example, “{A, B} -> B” is a trivial functional
dependency, as is “{name, SSN} -> SSN”. This type of functional dependency is called trivial because it can be
derived from common sense: if you already know the value of B, then the value of B is uniquely determined by
that knowledge. An FD X → Y in which Y is not a subset of X is non-trivial.

Full Functional Dependency


A full functional dependency occurs when you already meet the requirements for a functional dependency and
the set of attributes on the left side of the functional dependency statement cannot be reduced any further.
For example, “{SSN, age} -> name” is a functional dependency, but it is not a full functional dependency
because you can remove age from the left side of the statement without impacting the dependency
relationship.
Partial Functional Dependency
If A is a set of attributes of a table and B is an attribute, B is partially dependent on A if some attribute can be
removed from A and the dependency still holds. For example, “{SSN, age} -> name” is a partial functional
dependency, because you can remove age from the left side of the statement without impacting the
dependency relationship.
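Whether a dependency is full or partial can be decided mechanically: try every proper subset of the left side and see whether it still determines the right side, using attribute closure. This plain-Python sketch assumes the hypothetical FD set F = {SSN -> name, SSN -> age} to match the SSN example.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full(lhs, rhs, fds):
    """X -> Y is full if it holds and no proper subset of X still determines Y."""
    if not rhs <= closure(lhs, fds):
        raise ValueError("the dependency does not hold at all")
    for size in range(len(lhs)):  # sizes 0 .. len(X) - 1, i.e. proper subsets only
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(set(subset), fds):
                return False  # a smaller left side works: partial dependency
    return True

# Hypothetical FDs: SSN determines both name and age.
F = [({"SSN"}, {"name"}), ({"SSN"}, {"age"})]
print(is_full({"SSN", "age"}, {"name"}, F))  # False: drop age, SSN -> name remains
print(is_full({"SSN"}, {"name"}, F))         # True
```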
Transitive Dependencies
Transitive dependencies occur when there is an indirect relationship that causes a functional dependency. For
example, “A -> C” is a transitive dependency when it is true only because both “A -> B” and “B -> C” are true.
Transitive Closure of Functional Dependencies
We need to consider all functional dependencies that hold. Given a set F of functional dependencies, we can
prove that certain other ones also hold. We say these are logically implied by F.

Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of functional dependencies:

A → B, A → C, CG → H, CG → I, B → H

Then the functional dependency A → H is logically implied, since A → B and B → H.

The closure of a set F of functional dependencies is the set of all functional dependencies logically implied by F.
We denote the closure of F by F+.
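Listing F+ explicitly can be very large, so in practice one computes the attribute closure X+ instead: an FD X → Y is in F+ exactly when Y is contained in X+. This plain-Python sketch applies that to R = (A, B, C, G, H, I) and F from the text, writing each attribute as a single letter.

```python
def closure(attrs, fds):
    """Attribute closure X+: keep applying every FD whose left side is already in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F from the text, one FD per (lhs, rhs) pair; "CG" means the attribute set {C, G}.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

def implied(lhs, rhs):
    """X -> Y is in F+ exactly when Y is contained in X+."""
    return set(rhs) <= closure(lhs, F)

print(sorted(closure("A", F)))   # ['A', 'B', 'C', 'H']
print(implied("A", "H"))         # True, via A -> B and B -> H
print(implied("AG", "I"))        # True, via A -> C and CG -> I
print(sorted(closure("AG", F)))  # all six attributes, so AG is a superkey of R
```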

Lossless Decomposition

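In this example, we have a relation R:

Model Name | Price | Category
-----------+-------+---------
a11        | 100   | Canon
s20        | 200   | Nikon
a70        | 150   | Canon

R is decomposed into Relation R1 (Model Name, Category) and Relation R2 (Price, Category):

Model Name | Category        Price | Category
-----------+---------        ------+---------
a11        | Canon           100   | Canon
s20        | Nikon           200   | Nikon
a70        | Canon           150   | Canon
Relation R1                  Relation R2

Model Name | Price | Category
-----------+-------+---------
a11        | 100   | Canon
a11        | 150   | Canon
s20        | 200   | Nikon
a70        | 100   | Canon
a70        | 150   | Canon
Natural join of Relation R1 and Relation R2 (on Category)

– In this example, additional (spurious) tuples are obtained along with the original tuples.
– Although there are more tuples, this leads to less information, because we can no longer tell which price
belongs to which model.
– Due to this loss of information, the decomposition in this example is called a lossy decomposition
or lossy-join decomposition.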
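The plain-Python sketch below reproduces the example: projecting R onto (Model Name, Category) and (Price, Category) and joining back on Category yields five tuples instead of three. For contrast it also decomposes on Model Name, assuming (as the data suggests) that Model Name is a key of R; a decomposition is lossless when the shared attributes functionally determine all of R1 or all of R2.

```python
# Relation R from the text: (Model Name, Price, Category).
R = {("a11", 100, "Canon"), ("s20", 200, "Nikon"), ("a70", 150, "Canon")}

# Lossy: R1(Model Name, Category) and R2(Price, Category) share only Category,
# which is not a key of either projection.
R1 = {(m, c) for (m, p, c) in R}
R2 = {(p, c) for (m, p, c) in R}
lossy = {(m, p, c1) for (m, c1) in R1 for (p, c2) in R2 if c1 == c2}
print(len(lossy), lossy == R)  # 5 False: spurious tuples such as ('a11', 150, 'Canon')

# Lossless alternative: both projections keep Model Name (assumed to be a key of R).
RA = {(m, c) for (m, p, c) in R}  # (Model Name, Category)
RB = {(m, p) for (m, p, c) in R}  # (Model Name, Price)
rejoined = {(m, p, c) for (m, c) in RA for (m2, p) in RB if m == m2}
print(rejoined == R)  # True
```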
