DBMS Updated Note - 1581657063
What is Data?
Data is a collection of facts. It can be numbers, words, measurements, observations or even just descriptions of
things. Data as an abstract concept can be viewed as the lowest level of abstraction from which information
and then knowledge are derived.
What is Database?
A database is an organized collection of data that can easily be accessed, managed and updated. A database
can be thought of as a set of logically related files organized to facilitate access by one or more application
programs and to minimize data redundancy. Examples of systems used to create and manage databases include
Microsoft Access, SQL Server and Oracle.
What is File System?
File systems are used on data storage devices, such as hard disk drives, floppy disks, optical discs, or flash
memory storage devices, to keep track of the physical locations of computer files. In the database context,
a file system refers to the older style of storing data in separate files, which was used before the advent of
relational tables and schemas.
What is DBMS?
A DBMS is a collection of programs that enables you to store, modify, and extract information from a database.
In other words, it is software designed to manage all the databases that are currently installed on a system's
hard drive or network.
ACID Properties of DBMS
Atomicity states that database modifications must follow an “all or nothing” rule. Each transaction is said to be
“atomic.” If one part of the transaction fails, the entire transaction fails. It is critical that the database
management system maintain the atomic nature of transactions in spite of any DBMS, operating system or
hardware failure.
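The "all or nothing" rule can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original notes: the account table, its CHECK constraint and the transfer function are all assumed names chosen for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # The credit is applied first and succeeds on its own...
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            # ...but if the debit violates the CHECK constraint, both updates are undone.
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

ok = transfer(conn, "A", "B", 30)       # both updates are committed together
failed = transfer(conn, "B", "A", 500)  # debit fails, so the credit is rolled back too
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer the balances are the same as before it started, which is the behaviour atomicity asks for.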
Consistency states that only valid data will be written to the database. If, for some reason, a transaction is
executed that violates the database’s consistency rules, the entire transaction will be rolled back and the
database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully
executes, it will take the database from one state that is consistent with the rules to another state that is also
consistent with the rules.
Isolation requires that multiple transactions occurring at the same time not impact each other’s execution. For
example, if A issues a transaction against a database at the same time that B issues a different transaction,
both transactions should operate on the database in an isolated manner. The result should be the same as if the
database had performed A’s entire transaction before executing B’s, or vice-versa. This prevents A’s transaction
from reading intermediate data produced as a side effect of part of B’s transaction that will not eventually be
committed to the database. Note that the isolation property does not determine which transaction will execute
first, merely that they will not interfere with each other.
Durability ensures that any transaction committed to the database will not be lost. Durability is ensured
through the use of database backups and transaction logs that facilitate the restoration of committed
transactions in spite of any subsequent software or hardware failures.
Some of the widely used RDBMS are Oracle, IBM’s DB2 and Microsoft’s SQL Server.
View of Data
a) Data Abstraction
Physical Level
Logical Level
View Level
b) Instances and Schemas
a) Data Abstraction:
When the DBMS hides certain details of how data is stored and maintained, it provides what is called
the abstract view of data. Complexity is hidden from users through several levels of abstraction.
Following are the levels of abstraction:
1. Physical level:
a) Lowest level of abstraction.
b) It describes how data are actually stored.
c) It describes low-level complex data structures in detail.
d) At this level, efficient algorithms to access data are defined.
2. Logical level:
a) It is the next-higher level of abstraction. It defines what data are stored in the database and what
relationships exist between those data.
b) Users at this level need not be aware of the physical-level complexity used to implement the simple
structures.
c) Generally, database administrators (DBAs) and programmers work at the logical level of abstraction.
3. View level:
a) It is the highest level of abstraction.
b) It describes only a part of the whole database for a particular group of users.
c) This view hides all complexity.
d) It exists only to simplify user interaction with the system.
e) Bank tellers and software users fall under this category.
b) Instances and Schemas:
In DBMS, an instance is the information stored in the database at a particular moment; it can be thought of
as a snapshot of the database at that moment in time. A database has many instances over its lifetime
because it changes as information is inserted and deleted.
In DBMS, the schema is the overall design of the database. It is the logical structure of the database that
defines the tables, the fields in each table, and the relationships between fields and tables. One schema may
have many instances over time.
Database systems have several schemas, partitioned according to the levels of abstraction. The physical
schema describes the database design at the physical level, while the logical schema describes the
database design at the logical level. A database may also have several schemas at the view level,
sometimes called subschemas that describe different views of the database.
Of these, the logical schema is by far the most important, in terms of its effect on application programs,
since programmers construct applications by using the logical schema. The physical schema is hidden
beneath the logical schema, and can usually be changed easily without affecting application programs.
Application programs are said to exhibit physical data independence if they do not depend on the
physical schema, and thus need not be rewritten if the physical schema changes.
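Physical data independence can be illustrated with a small sqlite3 sketch. The customer table, its rows and the index name are assumptions made up for this example. The application's query is written once against the logical schema, and adding an index (a change at the physical level) does not require the query to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "Pokhara"), (2, "Bikash", "Kathmandu"), (3, "Chandra", "Pokhara")],
)

# The application only knows the logical schema: table and column names.
QUERY = "SELECT name FROM customer WHERE city = ? ORDER BY name"

before = conn.execute(QUERY, ("Pokhara",)).fetchall()

# Change the physical schema only: add an index on the city column.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

# The same, unmodified query still returns the same rows.
after = conn.execute(QUERY, ("Pokhara",)).fetchall()
```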
Data Models:
A data model can be thought of as a diagram or flowchart that illustrates the relationships between
data. A data model is not just a way of structuring data: it also defines a set of operations that can be
performed on the data. It provides concepts that describe the details of how data is stored in the
computer.
iv) Lines: They link attributes to entity sets and entity sets to relationships.
b) Relational Model:
The relational model of databases provides a very simple way of looking at data structured into
tables. The relational model is based on (relational) set theory in mathematics. It uses a collection
of tables to represent both data and relationship among those data. Each table has multiple
columns and each column has a unique name. The relational model is at a lower level of abstraction
than the E-R Model, but it is the most widely used data model among database designers.
Data Independence:
Data independence is the type of data transparency that matters for a centralized DBMS. Physical
data independence deals with hiding the details of the storage structure from user applications.
Nearly all modern applications are based on the principle of data independence. In fact, the whole
concept of a database management system (DBMS) supports the notion of data independence
since it represents a system for managing data separately from the programs that use the data.
Data independence ensures that the data cannot be redefined or reorganized by any of the
programs that make use of the data. In this manner, the data remains accessible, but is also stable
and cannot be corrupted by the applications using it.
b) Logical Independence: Logical data independence makes it possible to change the structure
of the data independently of modifying the applications or programs that make use of the
data. There is no need to rewrite current applications as part of the process of adding data to or
removing data from the system. This may include addition or deletion of entities, attributes
or relationships.
Database languages
Database systems normally use Structured Query Language (SQL) for managing the data and
the database itself. It is a specialized language for updating, deleting, and requesting
information from databases. SQL is an ANSI and ISO standard, and is the de facto standard
database query language.
b) Application Programmers: They are computer professionals who write application programs for
accessing database and managing business logic. They can choose tools, such as rapid application
development (RAD) and fourth generation programming language to develop the application program
with minimal effort.
c) Sophisticated Users: They are experienced users who interact with the database without
writing application programs. Instead, they use SQL to interact with the database directly. They submit
each such query to a query processor, whose function is to break down DML statements into
instructions that the storage manager understands. Sophisticated users include engineers, scientists,
business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS so
as to implement their applications to meet their complex requirements.
d) Specialized Users: Specialized users write applications such as computer-aided design systems,
knowledge-base and expert systems that store data having complex data types. Complex data includes
graphics and audio data.
Database Administrators
DBA is a person who has central control over both data and application programs. The responsibilities of
DBA vary depending upon the job description and corporate and organization policies. DBA is responsible
for the installation, configuration, upgrade, administration, monitoring, security and maintenance of
database in an organization.
Roles of Database Administrators
a) Installation, Configuration, Upgrade and Migration
b) Backup and Recovery
c) Database Security
d) Storage and Capacity Planning
e) Performance monitoring and Tuning
f) Troubleshooting
g) High Availability and Scalability
Hospital Systems
Student Enrollment
Office System
Entity Relationship Data Model
The entity-relationship model (or ER model) is a way of graphically representing the logical relationships of
entities (or objects) in order to create a database. In ER modeling, the structure for a database is portrayed
as a diagram, called an entity-relationship diagram. An ER diagram consists of the following major components:
Rectangles represent entity sets
Ellipses represent attributes
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Double ellipses represent multivalued attributes
Dashed ellipses represent derived attributes
Double lines indicate total participation of an entity in a relationship set
Double rectangles represent weak entity sets
The E-R Model comprises the following:
a) Entity Sets: An entity is the basic building block of the E-R data model. Entities are usually recognizable
concepts, either concrete or abstract, such as persons, places, things, or events which have relevance to the
database. An entity set is a set of entities of the same type (e.g., all persons having an account at a
bank).
Weak Entity Set: An entity set that does not have a primary key is referred to as a weak entity set. We
depict a weak entity set by double rectangles. The existence of a weak entity set depends on the
existence of a strong entity set. For a weak entity set to be meaningful, it must be associated with
another entity set called the identifying or owner entity set. The relationship associating the weak
entity set with the identifying entity set is called identifying relationship.
Strong Entity Set: An entity set that does have a primary key is referred to as a strong entity set. For
example, the STUDENT entity set has a key attribute Roll No which uniquely identifies each student; hence it is
a strong entity set.
b) Relationship Sets: A relationship set is a mathematical relation among two or more entity sets.
Relationships among more than two entity sets are rarely found. Most relationships are
binary relationships.
Mapping Constraints: An E-R scheme may define certain constraints to which the contents of a
database must conform.
Mapping Cardinalities of a Relationship: A mapping cardinality is a constraint that specifies the number
of entities to which another entity can be related via a relationship set. For a binary relationship set
between entity sets A and B, the mapping cardinality must be one of the following types:
- One to One: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
- One to Many: An entity in A is associated with any number in B. An entity in B is associated
with at most one entity in A.
- Many to One: An entity in A is associated with at most one entity in B. An entity in B is
associated with any number in A.
- Many to Many: An entity in A is associated with any number of entities in B, and an entity in B is
associated with any number of entities in A.
Simple and Composite Attributes: A simple attribute consists of a single atomic value and cannot be
subdivided. For example, the attributes age, sex etc. are simple attributes. A composite attribute is an
attribute that can be further subdivided. For example, the attribute ADDRESS can be subdivided into street,
city, state, and zip code.
Single Valued and Multivalued Attributes: A single valued attribute can have only a single value.
For example, a person can have only one 'date of birth', 'age' etc. A single valued attribute can be either
simple or composite: 'date of birth' is a composite attribute and 'age' is a simple attribute, but
both are single valued attributes.
Multivalued attributes can have multiple values. For instance, a person may have multiple phone
numbers, multiple degrees etc. Multivalued attributes are shown by a double ellipse in
the ER diagram.
Stored and Derived Attributes: The value for the derived attribute is derived from the stored
attribute. For example 'Date of birth' of a person is a stored attribute. The value for the attribute
'AGE' can be derived by subtracting the 'Date of Birth'(DOB) from the current date. Stored
attribute supplies a value to the related attribute.
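The derivation of AGE from the stored 'Date of Birth' can be sketched as a small function. The function name age_on and the sample dates are assumptions for illustration; age is counted in completed years.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in completed years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

dob = date(2000, 6, 15)                               # stored attribute
age_before_birthday = age_on(dob, date(2024, 6, 14))  # derived attribute
age_on_birthday = age_on(dob, date(2024, 6, 15))
```

Because the value is computed on demand, AGE never needs to be stored or kept up to date in the table.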
Keys
A key is a single field or a combination of multiple fields. Its purpose is to access or retrieve data rows
from a table according to the requirement. Keys are defined in tables to access or sequence
the stored data quickly and smoothly. They are also used to create links between different
tables. A key can also be called a key field, sort key, index, or key word. Keys help enforce integrity
and help identify the relationships between tables.
In DBMS, there is a variety of keys which help database systems manage their operations
efficiently. Following are the various types of keys:
a) Super Key: A super key is a set of one or more attributes within a table whose values can be
used to uniquely identify a row. A table might have many super keys. Candidate keys are a
special subset of super keys. For example, suppose we have customer as an entity with customer-id
and customer-name as attributes. A super key may be customer-id alone, or customer-id together with
customer-name, as each can identify a single customer uniquely. But customer-name alone
cannot be a super key because many customers can have the same name.
b) Candidate Key: A candidate key is a column, or set of columns, in a table that can uniquely
identify any database record without referring to any other data. Each table may have one
or more candidate keys, but one candidate key is special, and it is called the primary key.
This is usually the best among the candidate keys. For example, suppose we have customer-id,
customer-name, and customer-address as attributes, and that customer-name and
customer-address together are sufficient to uniquely identify a row. Similarly, customer-id alone is
enough to uniquely identify a row. Therefore, {customer-id} and {customer-name,
customer-address} are both candidate keys. Even though {customer-id, customer-name}
can also uniquely identify a row, it cannot be termed a candidate key
because customer-id by itself is already a candidate key. Hence, a candidate key is a minimal
set of fields that uniquely identifies a tuple (row).
c) Primary Key: A primary key is a value that can be used to identify a unique row in a table. It
can either be a normal attribute that is guaranteed to be unique (such as Social Security
Number in a table with no more than one record per person) or it can be generated by the
DBMS (such as auto-increment number). Primary keys may consist of a single attribute or
multiple attributes in combination. A primary key column cannot contain a NULL value.
d) Foreign Key: A foreign key is a field in a relational table that matches the primary key
column of another table. The foreign key can be used to make sure that the rows in one
table have corresponding rows in another table. Foreign key values can be null, even though
primary key values can't.
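The difference between super keys and candidate keys can be sketched in a few lines of Python. This is an illustration under a loud assumption: it checks uniqueness only against the sample rows given, whereas a real key is a property of the schema that must hold for every possible instance. The function names and the sample customers are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows share the same values for `attrs`."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, all_attrs):
    """Return the minimal super keys, trying smaller attribute sets first."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

customers = [
    {"customer_id": 1, "customer_name": "Ram", "customer_address": "Kathmandu"},
    {"customer_id": 2, "customer_name": "Ram", "customer_address": "Pokhara"},
    {"customer_id": 3, "customer_name": "Sita", "customer_address": "Kathmandu"},
]
attributes = ("customer_id", "customer_name", "customer_address")

id_and_name = is_superkey(customers, ("customer_id", "customer_name"))
name_only = is_superkey(customers, ("customer_name",))
keys = candidate_keys(customers, attributes)
```

On this sample data the pair (customer_id, customer_name) is a super key but not a candidate key, because customer_id alone already identifies each row.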
Generalization: The reverse of specialization is generalization. Several classes with common features
are generalized into a super class. For example, the entity types Car and Truck share common attributes
License, PlateNo, VehicleID and Price, therefore they can be generalized into the super class Vehicle.
Generalization is used to emphasize the similarities among lower-level entity set and to hide
differences. It makes ER diagram simpler because shared attributes are not repeated. Generalization is
denoted through a triangular component labeled ‘IS A’.
Aggregation: One limitation of the E-R model is that it cannot express relationships among relationships. The
best way to define a relationship among relationships is the use of aggregation. To illustrate the need for such a
construct, consider the relationship works-on between employee, branch, and job. With aggregation, the
relationship set works-on relating the entity sets employee, branch and job is treated as a higher-level entity
set, in the same manner as any other entity set. We can then create a binary relationship manages between
works-on and manager to represent who manages what task.
Relational databases are created using a special computer language, structured query language (SQL) that is
the standard for database interoperability. SQL is the foundation for all of the popular database applications
available today, from Access to Oracle.
a) Hierarchical Model: A hierarchical database model is a data model in which the data is organized into
a tree-like structure. The structure allows representing information using parent/child relationships:
each parent can have many children, but each child has only one parent.
Redundancy would occur because hierarchical databases handle one-to-many relationships well but do
not handle many-to-many relationships well. This is because a child may only have one parent.
However, in many cases we want to have the child be related to more than one parent.
b) Network Model: While the hierarchical database model structures data as a tree of records, with each
record having one parent record and many children, the network model allows each record to have
multiple parent and child records, forming a generalized graph structure. This allows network model to
support many-to-many relationship.
Advantages of a Network Database Model
Because it supports many-to-many relationships, records in a network database can be accessed easily
from any related record
For more complex data, it is easier to use because of the multiple relationships found among its
data
Easier to navigate and search for information because of its flexibility
Network Model | Hierarchical Model
Easily accessed because of the linkage between the information | Difficult to navigate because of its strict owner-to-member connection
Great flexibility among the information files because of the multiple relationships among the files | Less flexibility with the collection of information because of the hierarchical position of the files
An object-relational database can be said to provide a middle ground between relational databases
and object-oriented databases (OODBMS). In object-relational databases, the approach is
essentially that of relational databases: the data resides in the database and is manipulated
collectively with queries in a query language; at the other extreme is OODBMS in which the
database is essentially a persistent object store for software written in an object-oriented
programming language, with a programming API for storing and retrieving objects, and little or
no specific support for querying.
Structured Query Language
Structured Query Language is a special-purpose programming language designed for managing data in
relational database management systems (RDBMS). SQL is an ANSI (American National Standards
Institute) standard language used for manipulating data.
SQL allows users to access data in relational database management systems, such as Oracle, Sybase,
Informix, Microsoft SQL Server, Access, and others, by allowing users to describe the data they
wish to see. SQL also allows users to define the data in a database, and manipulate that data. Some common
SQL commands include "select," "insert," "update," and "delete." The language was first developed at IBM
in the early 1970s and was called SEQUEL for "Structured English Query Language."
SQL is based on set and relational operations with certain modifications and enhancements. A typical
SQL query has the form:
Select A1, A2, ..., An
From r1, r2, ..., rm
Where P
– Ai s represent attributes
– ri s represent relations
– P is a predicate.
And the final result of an SQL query is a relation.
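As a concrete instance of the Select/From/Where form, the following sketch runs one query through Python's sqlite3 module. The customer and account tables and their contents are assumptions invented for the example. Here A1, A2 are customer_name and balance, r1, r2 are customer and account, and P is the join condition combined with a balance test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE account  (account_no INTEGER PRIMARY KEY, customer_id INTEGER, balance INTEGER);
    INSERT INTO customer VALUES (1, 'Ram'), (2, 'Sita');
    INSERT INTO account  VALUES (101, 1, 500), (102, 2, 1500), (103, 1, 2500);
""")

rows = conn.execute("""
    SELECT customer.customer_name, account.balance   -- A1, A2
    FROM customer, account                           -- r1, r2
    WHERE customer.customer_id = account.customer_id -- predicate P
      AND account.balance > 1000
    ORDER BY account.balance
""").fetchall()
```

The result is itself a relation: a set of rows with the two selected attributes.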
SQL statements are divided into two major categories: data manipulation language (DML) and data
definition language (DDL). DML statements are used to work with the data in tables. DDL statements
are used to build and modify the structure of your tables and other objects in the database. When you
execute a DDL statement, it takes effect immediately.
With the WHERE clause, the following comparison operators can be used:
Operator Description
= Equal
<> Not equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
BETWEEN Between an inclusive range
LIKE Search for a pattern
IN To specify multiple possible values for a column
Other logical operations (AND, OR, NOT) and extended queries (subqueries, joining tables,
grouping queries) can also be done via SQL commands.
b) Insert Clause: The INSERT INTO statement is used to insert a new row in a table. The first form doesn't specify
the column names where the data will be inserted, only their values:
INSERT INTO table_name VALUES (value1, value2, value3,...)
The second form specifies both the column names and the values to be inserted:
INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,...)
The number of columns and values must be the same. If a column is not specified, the default value for the
column is used. The values specified (or implied) by the INSERT statement must satisfy all the applicable
constraints (such as primary keys, CHECK constraints, and NOT NULL constraints). If a syntax error occurs or if
any constraints are violated, the new row is not added to the table and an error is returned instead.
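Both INSERT forms, the use of a default value, and the rejection of a row that violates a constraint can be tried out with sqlite3. The student table and its columns are assumed for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city    TEXT DEFAULT 'Unknown'
    )
""")

# First form: values only, in the order the columns were declared.
conn.execute("INSERT INTO student VALUES (1, 'Ram', 'Kathmandu')")

# Second form: named columns; the omitted city column takes its default.
conn.execute("INSERT INTO student (roll_no, name) VALUES (2, 'Sita')")

# A duplicate primary key violates a constraint, so the row is not added.
try:
    conn.execute("INSERT INTO student (roll_no, name) VALUES (1, 'Hari')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT roll_no, name, city FROM student ORDER BY roll_no").fetchall()
```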
c) Delete Clause: During some operations, you need to delete single or multiple rows from the table. The
DELETE statement is used to delete rows in a table. With a WHERE clause, only the rows that satisfy the
condition are deleted:
DELETE FROM table_name WHERE some_column = some_value;
Without the WHERE clause, all records in the table will be deleted. Syntax without the WHERE clause is:
DELETE FROM table_name;
Once a DELETE statement has been committed, you cannot undo the operation. It is also important to
understand that a DELETE statement deletes entire records, not just data in specified fields. If you just want to
delete certain fields, use an UPDATE query that changes the value to NULL.
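The three cases described for DELETE (with a WHERE clause, blanking a single field through UPDATE, and without a WHERE clause) can be compared in a short sqlite3 sketch. The student table and its rows are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ram", "Kathmandu"), (2, "Sita", "Pokhara"), (3, "Hari", "Pokhara")],
)

# With WHERE: only the matching rows are removed.
conn.execute("DELETE FROM student WHERE city = 'Pokhara'")
after_where = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# To clear one field rather than the whole record, update it to NULL.
conn.execute("UPDATE student SET city = NULL WHERE roll_no = 1")
city = conn.execute("SELECT city FROM student WHERE roll_no = 1").fetchone()[0]

# Without WHERE: every row in the table is removed.
conn.execute("DELETE FROM student")
after_all = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```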
d) Update Clause: The UPDATE statement is used to update existing records/rows in a table.
Without the WHERE clause, all the rows will be updated. Hence, the UPDATE command must be used
carefully. It is possible to modify a single column or multiple columns at a time. However, the updated value must not
conflict with any of the applicable constraints (such as primary keys, unique indexes, CHECK constraints, and NOT
NULL constraints).
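A minimal sqlite3 sketch of these UPDATE rules follows. The employee table, its CHECK constraint and the sample rows are assumptions for the example. It updates several columns of one row, shows an update rejected by a constraint, and shows an update without WHERE touching every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "salary INTEGER CHECK (salary > 0))"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [(1, "Ram", 1000), (2, "Sita", 2000)])

# Several columns of a single row can be changed in one statement.
conn.execute("UPDATE employee SET salary = 1200, name = 'Ram K.' WHERE id = 1")

# An update that conflicts with a constraint is rejected and changes nothing.
try:
    conn.execute("UPDATE employee SET salary = -5 WHERE id = 2")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Without WHERE, the change is applied to every row.
conn.execute("UPDATE employee SET salary = salary + 100")

rows = conn.execute("SELECT id, name, salary FROM employee ORDER BY id").fetchall()
```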
The data type specifies what type of data the column can hold. It may be of INT, VARCHAR, DATE etc.
c) Alter Clause: Alter clause is used whenever there is a need to alter a table, database, user, trigger or a
view. Syntax for altering any object is:
Query by Example
Query-by-Example (QBE) is another language for querying (and, like SQL, for creating and modifying) relational
data. It is different from SQL, and from most other database query languages, in having a graphical user
interface that allows users to write queries by creating example tables on the screen. For example, a user may
want to select an entry from a table called "Table1" with an ID of 123. Using SQL, the user would need to input
the command, "SELECT * FROM Table1 WHERE ID = 123". The QBE interface may allow the user to just click on
Table1, type in "123" in the ID field and click "Search."
QBE is offered with most database programs, though the interface is often different between applications. For
example, Microsoft Access has a QBE interface known as "Query Design View" that is completely graphical. The
phpMyAdmin application used with MySQL offers a Web-based interface where users can select a query
operator and fill in blanks with search terms. Whatever QBE implementation is provided with a program, the
purpose is the same – to make it easier to run database queries and to avoid the frustrations of SQL errors.
UNIT- II
Integrity constraints provide a way of ensuring the changes made to the database by authorized users do not
result in a loss of data consistency and are used to ensure accuracy and consistency of data in a relational
database. Enforcing data integrity ensures the quality of the data in the database. For example, if an employee
is entered with an employee_id value of 123, the database should not allow another employee to have an ID
with the same value. If you have an employee_rating column intended to have values ranging from 1 to 5, the
database should not accept a value of 6. Data integrity falls into following categories:
Domain Constraints/ Integrity: Domain constraints are the most elementary form of integrity constraint. A
domain is defined as the set of all unique values permitted for an attribute. Domain Integrity enforces valid
entries for a given column by restricting the type, the format, or the range of possible values. They are tested
easily by the system whenever a new data item is entered into the database.
Create domain hourly-wage numeric (5,2) constraint value-test check (value >= 4.00)
– The domain hourly-wage is declared to be a decimal number with 5 digits, 2 of which are after the
decimal point
– The domain has a constraint that ensures that the hourly-wage is at least 4.00.
– The clause constraint value-test is optional; useful to indicate which constraint an update violated.
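The same range test can be tried in sqlite3. Note the assumption: SQLite does not support CREATE DOMAIN, so this sketch attaches an equivalent named CHECK constraint directly to a column of a hypothetical worker table. SQLite also accepts the NUMERIC(5,2) declaration but does not enforce its precision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE worker (
        name        TEXT PRIMARY KEY,
        hourly_wage NUMERIC(5,2) NOT NULL
            CONSTRAINT value_test CHECK (hourly_wage >= 4.00)
    )
""")

def try_insert(name, wage):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO worker VALUES (?, ?)", (name, wage))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("Ram", 4.00)   # exactly 4.00 satisfies value >= 4.00
rejected = try_insert("Sita", 3.99)  # below the permitted range
```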
Referential Constraints/ Integrity: Referential integrity is a database concept that ensures that
relationships between tables remain consistent. Referential integrity prevents inconsistent data from being
created in the database by ensuring that any data shared between tables remains consistent. To put it
another way, it ensures that the soundness of the relationships remains intact.
For referential integrity to hold in a relational database, any field in a table that is declared a foreign
key can contain only values from a parent table's primary key or a candidate key. For instance, deleting a
record that contains a value referred to by a foreign key in another table would break referential integrity.
Some relational database management systems (RDBMS) can enforce referential integrity, normally either
by deleting the foreign key rows as well to maintain integrity, or by returning an error and not performing
the delete.
Sample SQL using Key constraints:
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple columns:
CREATE TABLE Persons (
    P_Id int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255),
    CONSTRAINT pk_PersonID PRIMARY KEY (P_Id, LastName)
);
In the example above there is only ONE PRIMARY KEY (pk_PersonID). However, the value of the pk_PersonID is
made up of two columns (P_Id and LastName). If a primary key consists of more than one column, duplicate
values are allowed in one column, but each combination of values from all the columns in the primary key
must be unique.
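The behaviour of the composite key can be checked with sqlite3 using the Persons table from the sample above. The inserted names are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        P_Id INTEGER NOT NULL,
        LastName VARCHAR(255) NOT NULL,
        FirstName VARCHAR(255),
        Address VARCHAR(255),
        City VARCHAR(255),
        CONSTRAINT pk_PersonID PRIMARY KEY (P_Id, LastName)
    )
""")

conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Hansen')")
# The same P_Id with a different LastName is a new combination, so it is allowed.
conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Svendson')")

# Repeating an existing (P_Id, LastName) combination violates the primary key.
try:
    conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Hansen')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
```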
Let's illustrate the foreign key with an example. Look at the following two tables:
The "Persons" table: P_Id, LastName, FirstName, Address, City
The “Orders” table: O_id, OrderNo, P_Id
Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons" table.
The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables. The FOREIGN
KEY constraint also prevents invalid data from being inserted into the foreign key column, because it has
to be one of the values contained in the table it points to.
For Creating Foreign Key
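One way to declare the foreign key on the Persons and Orders tables described above is sketched here with sqlite3. This is an illustration rather than the notes' original statement: the column types and sample values are assumed, and the PRAGMA line is specific to SQLite, which enforces foreign keys only when they are switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE Persons (
        P_Id INTEGER PRIMARY KEY,
        LastName VARCHAR(255) NOT NULL
    );
    CREATE TABLE Orders (
        O_id INTEGER PRIMARY KEY,
        OrderNo INTEGER NOT NULL,
        P_Id INTEGER,
        FOREIGN KEY (P_Id) REFERENCES Persons (P_Id)
    );
    INSERT INTO Persons VALUES (1, 'Hansen');
    INSERT INTO Orders VALUES (1, 77895, 1);
""")

def attempt(sql):
    """Return True if the statement succeeded, False if a constraint blocked it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

orphan_insert = attempt("INSERT INTO Orders VALUES (2, 44678, 99)")  # no person 99
parent_delete = attempt("DELETE FROM Persons WHERE P_Id = 1")        # still referenced
null_fk = attempt("INSERT INTO Orders VALUES (3, 22456, NULL)")      # NULL is permitted
```

The first two statements are refused because they would leave an order pointing at a person who does not exist, while a NULL foreign key is accepted.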
Assertion:
An assertion is a predicate expressing a condition that we want the database always to satisfy. Assertions are
included in the SQL standard. Syntax:
When an assertion is created, the DBMS tests it for validity. If the assertion is valid, any further
modification to the database is allowed only if it does not cause that assertion to be violated. Assertions must
therefore be checked at any change to the relations mentioned in the assertion declaration. This testing may
introduce a significant amount of computing overhead (query evaluation), so assertions should be used carefully.
Domain constraints, functional dependencies and referential integrity are special forms of assertion.
For example, if you want to prevent investors from withdrawing more than a certain amount of money from
collective fund, you could create an assertion using the following SQL statement:
Once you add the MAXIMUM_WITHDRAWAL ASSERTION to the database definition, the DBMS will check to
make sure that the assertion remains TRUE each time you execute an SQL statement that modifies either the
INVESTOR or WITHDRAWALS tables. As such, each time the user or application program attempts to execute an
INSERT, UPDATE, or DELETE statement on one of the tables in the assertion's CHECK clause, the DBMS checks
the check condition against the database, including the proposed modification. If the check condition remains
TRUE, the DBMS carries out the modification. If the modification makes the check condition FALSE, the DBMS
does not perform the modification and returns an error code indicating that the statement was unsuccessful
due to an assertion violation.
Triggers:
A trigger is a statement (procedure) that is executed automatically by the DBMS whenever a specified event
occurs. The trigger is mostly used for maintaining the integrity of the information on the database. For
example, when a new record is added to the employees table, new records should also be created in the tables
of the taxes, vacations and salaries.
Since triggers are event-driven specialized procedures, they are stored in and managed by the DBMS. A trigger
cannot be called or executed directly; the DBMS automatically fires the trigger as a result of a data modification to the
associated table.
A trigger defines a set of actions that are performed in response to an insert, update, or delete operation on a
specified table. When such an SQL operation is executed, the trigger is said to have been activated. Triggers are
optional and are defined using the CREATE TRIGGER statement.
The general syntax of CREATE TRIGGER is:
CREATE TRIGGER trigger_name trigger_time trigger_event ON tbl_name FOR EACH ROW trigger_statement
The general syntax of DROP TRIGGER is:
DROP TRIGGER trigger_name
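As a minimal sketch of the employees example mentioned earlier, the following Python snippet uses the standard sqlite3 module to create and then drop a trigger. The table layouts and the trigger name employee_added are assumptions for illustration. Note that SQLite's dialect wraps the trigger statements in BEGIN ... END, which differs slightly from the general syntax given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal tables; real ones would carry more columns.
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE taxes     (employee_id INTEGER PRIMARY KEY);
    CREATE TABLE vacations (employee_id INTEGER PRIMARY KEY);
    CREATE TABLE salaries  (employee_id INTEGER PRIMARY KEY);

    -- trigger_time = AFTER, trigger_event = INSERT, tbl_name = employees
    CREATE TRIGGER employee_added
    AFTER INSERT ON employees
    FOR EACH ROW
    BEGIN
        INSERT INTO taxes     (employee_id) VALUES (NEW.id);
        INSERT INTO vacations (employee_id) VALUES (NEW.id);
        INSERT INTO salaries  (employee_id) VALUES (NEW.id);
    END;
""")

def related_counts(conn):
    """Row counts of the three tables maintained by the trigger."""
    return [
        conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in ("taxes", "vacations", "salaries")
    ]

# The DBMS fires the trigger; the application only inserts the employee.
conn.execute("INSERT INTO employees (name) VALUES ('Employee One')")
counts_with_trigger = related_counts(conn)

# After DROP TRIGGER, inserts no longer create the related records.
conn.execute("DROP TRIGGER employee_added")
conn.execute("INSERT INTO employees (name) VALUES ('Employee Two')")
counts_after_drop = related_counts(conn)
```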
An example: all new customers opening an account must have a balance of at least $100; however, once the
account is opened, their balance can fall below that amount. In this case you have to use a trigger, rather
than a constraint that is checked on every modification, because you only want the condition evaluated when a
new record is inserted.
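The opening-balance rule could be sketched as follows, again using Python's sqlite3 module. The accounts table, its columns and the trigger name are hypothetical, and the rule is read here as "at least $100 at opening". The trigger fires only on INSERT, so a later UPDATE that lowers the balance is not checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL);

    -- Evaluated only when a new row is inserted, never on UPDATE.
    CREATE TRIGGER minimum_opening_balance
    BEFORE INSERT ON accounts
    FOR EACH ROW
    WHEN NEW.balance < 100
    BEGIN
        SELECT RAISE(ABORT, 'opening balance must be at least 100');
    END;
""")

# Opening an account with too little money is rejected by the trigger.
try:
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 50)")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

# A valid opening balance is accepted, and the balance may later fall.
conn.execute("INSERT INTO accounts (id, balance) VALUES (2, 150)")
conn.execute("UPDATE accounts SET balance = 20 WHERE id = 2")
balance = conn.execute("SELECT balance FROM accounts WHERE id = 2").fetchone()[0]
account_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```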
Normalization
First Normal Form:
First Normal form imposes a very basic requirement on relations; unlike the other normal forms, it does not
require additional information such as functional dependencies.
A domain is atomic if elements of the domain are considered to be indivisible units. We say that a relation
schema R is in first normal form (1NF) if the domains of all attributes of R are atomic.
For example, consider the following table, in which the Movies rented column holds multiple values and is therefore non-atomic.
In the table above, the order number serves as the primary key. Notice that the customer and total amount
are dependent upon the order number -- this data is specific to each order. However, the contact person is
dependent upon the customer. An alternative way to accomplish this would be to create two tables:
Third Normal Form:
Third normal form is pretty simple. In plain terms it means that no column can depend on a non-key column. In
technical terms we say that there are no "transitive dependencies."
The example below shows a table that is not in third normal form. The problem is that the letter grade
depends on the numeric grade:
The violation is the last column, LETTER, which is not a property of the key but is functionally dependent upon
the GRADE value. The term "functionally dependent" means that column LETTER is a function of GRADE.
As always, normalization is meant to help us. The problem with the table as it stands is that you cannot
guarantee that the LETTER column will be correctly updated whenever the GRADE column is updated. The
non-normalized table can lead to what we call UPDATE ANOMALIES.
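A small sketch of such an update anomaly, and of one way a 3NF decomposition avoids it, is given here in Python with sqlite3. Only the GRADE and LETTER columns come from the text; the table names, the student column and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: letter depends on grade, which is not the key.
    CREATE TABLE scores_flat (
        student TEXT PRIMARY KEY,
        grade   INTEGER NOT NULL,
        letter  TEXT NOT NULL
    );
    INSERT INTO scores_flat VALUES ('s1', 92, 'A');

    -- 3NF decomposition: the dependency grade -> letter gets its own table.
    CREATE TABLE scores (student TEXT PRIMARY KEY, grade INTEGER NOT NULL);
    CREATE TABLE grade_letters (grade INTEGER PRIMARY KEY, letter TEXT NOT NULL);
    INSERT INTO scores VALUES ('s1', 92);
    INSERT INTO grade_letters VALUES (92, 'A'), (75, 'C');
""")

# Update anomaly: the grade changes but the stored letter is left stale.
conn.execute("UPDATE scores_flat SET grade = 75 WHERE student = 's1'")
stale = conn.execute("SELECT grade, letter FROM scores_flat").fetchone()

# In the decomposed design the letter is looked up, so it cannot go stale.
conn.execute("UPDATE scores SET grade = 75 WHERE student = 's1'")
derived = conn.execute("""
    SELECT s.grade, g.letter
    FROM scores s
    JOIN grade_letters g ON g.grade = s.grade
""").fetchone()
```

After the update, stale holds the inconsistent pair (75, 'A'), while derived holds (75, 'C').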
As another example, suppose the Student table had the student's birth date as an attribute and also the
student's age. The student's age depends on the student's birth date (it is a fact about the birth date), so
Third Normal Form is violated.
Boyce-Codd Normal Form (BCNF):
A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF. If there are
multiple overlapping candidate keys, the table may still suffer from update anomalies.
Example for Normalizing Tables
Assume a video library maintains a database of movies rented out. Without any normalization, all information is stored
in one table as shown below. In that table, the column Movies rented holds multiple values.
Hence, to take the table to First Normal Form (1NF), we split the rows so that each tuple holds a single value in that column.
But in the above table there is repetition of data, as we can see many duplicate values. Hence we need to take
this table to Second Normal Form (2NF) by assigning a primary key, MEMBERSHIP ID. The same key is used as a
foreign key in another table. The main purpose of assigning primary and foreign keys is to avoid repetition of
data and to manage it efficiently.
Even though we have converted the table to 2NF, we can see here that changing the non-key attribute FULLNAME
may require changing the salutation. That means SALUTATION is functionally dependent upon FULLNAME, and if
FULLNAME is changed by an update, we may be left with an incorrect SALUTATION. So, to reach Third Normal Form
(3NF), we need to separate out the attributes that are functionally dependent on non-key attributes in this
table. Hence, our 3NF table would be
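One possible layout for the normalized video library is sketched here in Python with sqlite3. The attribute names MEMBERSHIP ID, FULLNAME and SALUTATION come from the text; the split into three tables, the salutation_id key and all sample rows are assumptions for illustration, not the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- 3NF: salutations are stored once and referenced by key.
    CREATE TABLE salutations (
        salutation_id INTEGER PRIMARY KEY,
        salutation    TEXT NOT NULL
    );
    -- 2NF: each member is stored once, identified by membership_id.
    CREATE TABLE members (
        membership_id INTEGER PRIMARY KEY,
        full_name     TEXT NOT NULL,
        salutation_id INTEGER NOT NULL REFERENCES salutations(salutation_id)
    );
    -- 1NF: one movie per row instead of a multi-valued column.
    CREATE TABLE rentals (
        membership_id INTEGER NOT NULL REFERENCES members(membership_id),
        movie         TEXT NOT NULL,
        PRIMARY KEY (membership_id, movie)
    );
    INSERT INTO salutations VALUES (1, 'Ms.'), (2, 'Mr.');
    INSERT INTO members VALUES (1, 'Member One', 1), (2, 'Member Two', 2);
    INSERT INTO rentals VALUES (1, 'Movie A'), (1, 'Movie B'), (2, 'Movie C');
""")

# The original wide view can still be reassembled with joins.
rows = conn.execute("""
    SELECT s.salutation, m.full_name, r.movie
    FROM rentals r
    JOIN members m     ON m.membership_id = r.membership_id
    JOIN salutations s ON s.salutation_id = m.salutation_id
    ORDER BY m.membership_id, r.movie
""").fetchall()

# The foreign key rejects a rental for a membership that does not exist.
try:
    conn.execute("INSERT INTO rentals VALUES (99, 'Movie D')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```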
Functional Dependency
A Functional Dependency describes a relationship between attributes in a single relation. An attribute is
functionally dependent on another if we can use the value of one attribute to determine the value of another.
A set of attributes Y is functionally dependent on a set of attributes X if a given set of values for each attribute
in X determines unique values for the set of attributes in Y. We use the notation:
X → Y to denote that Y is functionally dependent on X. The set of attributes X is known as the determinant of
the FD X → Y.
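The definition can be checked against a concrete set of tuples. The following Python sketch tests whether X → Y holds in a given relation instance; the function name fd_holds and the sample rows are hypothetical. Note that an instance can only refute a functional dependency: an FD that holds for the current rows is not thereby proven for every possible state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. The FD fails as soon as two tuples agree on every
    attribute in lhs but differ on some attribute in rhs.
    """
    seen = {}
    for row in rows:
        x_value = tuple(row[attr] for attr in lhs)
        y_value = tuple(row[attr] for attr in rhs)
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True

# Hypothetical sample data.
students = [
    {"Student_ID": 1, "First_Name": "Ana", "Last_Name": "Rai"},
    {"Student_ID": 2, "First_Name": "Ana", "Last_Name": "Shah"},
    {"Student_ID": 3, "First_Name": "Ben", "Last_Name": "Rai"},
]

id_determines_name = fd_holds(students, ["Student_ID"], ["First_Name", "Last_Name"])
first_determines_last = fd_holds(students, ["First_Name"], ["Last_Name"])
```

With these rows, Student_ID determines both name attributes, while First_Name does not determine Last_Name, because two students share the first name "Ana".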
Trivial and Non-Trivial Functional Dependencies
It is often the case when examining the FDs associated with a relation that some FDs are of less interest
because they are self-evident. For example, consider the relation:
STUDENT (Student_ID, First_Name, Last_Name)
The FD {First_Name, Last_Name} → {First_Name} is true but of little informational interest.
A trivial functional dependency occurs when you describe a functional dependency of an attribute on a
collection of attributes that includes the original attribute. For example, “{A, B} → B” is a trivial functional
dependency, as is “{name, SSN} → SSN”. This type of functional dependency is called trivial because it can be
derived from common sense. It is obvious that if you already know the value of B, then the value of B can be
uniquely determined by that knowledge.
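In set terms, an FD X → Y is trivial exactly when Y is a subset of X. A minimal Python sketch of that test follows; the function name is_trivial is hypothetical.

```python
def is_trivial(lhs, rhs):
    """An FD lhs -> rhs is trivial when every attribute of rhs is in lhs."""
    return set(rhs) <= set(lhs)

trivial_ab = is_trivial({"A", "B"}, {"B"})
trivial_ssn = is_trivial({"name", "SSN"}, {"SSN"})
non_trivial = is_trivial({"Student_ID"}, {"First_Name"})
```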
Suppose we are given a relation scheme R = (A, B, C, G, H, I), and the set of functional dependencies:
A → B, A → C, CG → H, CG → I, B → H
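Reading the dependencies above as A → B, A → C, CG → H, CG → I and B → H, one common way to find out which further dependencies they imply is to compute the closure of a set of attributes. The Python sketch below uses the usual fixed-point procedure; the function name closure and the representation of FDs as pairs of sets are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs of attribute sets. Whenever the
    whole left-hand side is already in the result, the right-hand side
    is added; this repeats until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [
    ({"A"}, {"B"}),
    ({"A"}, {"C"}),
    ({"C", "G"}, {"H"}),
    ({"C", "G"}, {"I"}),
    ({"B"}, {"H"}),
]

a_closure = closure({"A"}, F)        # A, B, C and H
ag_closure = closure({"A", "G"}, F)  # every attribute of R
```

Since H is in the closure of {A}, the dependency A → H is implied by the set (via A → B and B → H). Since the closure of {A, G} contains every attribute of R, AG determines the whole relation.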
Lossless Decomposition
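In this example, additional tuples are obtained along with the original tuples when the decomposed relations are joined back together.
Although there are more tuples, this leads to less information, because the original tuples can no longer be told apart from the spurious ones.
Because of this loss of information, the decomposition in the previous example is called a lossy decomposition,
or lossy-join decomposition.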
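The effect can be reproduced with a small hypothetical relation R(A, B, C); the data and helper functions below are assumptions for illustration, not the example referred to above. Decomposing on B, which does not determine the other attributes, produces spurious tuples when the parts are joined, while decomposing on the key A gives back exactly the original relation.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicates removed; rows are dicts."""
    distinct = {tuple((attr, row[attr]) for attr in attrs) for row in rows}
    return [dict(pairs) for pairs in distinct]

def natural_join(left, right):
    """Join tuples that agree on every attribute the two sides share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[attr] == r_row[attr] for attr in shared):
                joined.append({**l_row, **r_row})
    return joined

def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

# Hypothetical relation: A is a key, B has the same value in both tuples.
R = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "x", "C": 20},
]

# Lossy: R1(A, B) and R2(B, C) share only B, which determines nothing.
lossy = natural_join(project(R, ["A", "B"]), project(R, ["B", "C"]))

# Lossless: R1(A, B) and R2(A, C) share the key A.
lossless = natural_join(project(R, ["A", "B"]), project(R, ["A", "C"]))
```

The lossy join returns four tuples, two of which were never in R; the lossless join returns the original two.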