ADBMS


UNIT 1

Entity

An entity is any object in the system that we want to model and store information
about. Entities are usually recognizable concepts, either concrete or abstract,
such as persons, places, things, or events that have relevance to the database.
Some specific examples of entities are Employee, Student and Lecturer. A design
tool that allows database administrators to view the relationships between
several entities is called an entity relationship diagram (ERD). In database
administration, only those things about which data will be captured or stored are
considered entities. If you aren't going to capture data about something, there's
no point in creating an entity for it in a database.

Attribute
An attribute is a characteristic of an entity. It may also refer to a database field.
Attributes describe the instances in a row of a database table. An attribute
defines the information about the entity that needs to be stored. If the entity is
an employee, attributes could include name, employee ID, health plan enrollment,
and work location. An entity will have zero or more attributes, and each of those
attributes applies only to that entity. For example, the employee ID of 123456
belongs to that employee entity alone. Attributes also have further refinements,
such as domain and key. The domain of an attribute describes the set of values it
can take.
Types of Attributes
Simple attribute − Simple attributes are atomic values, which cannot be divided
further. For example, a student's phone number is an atomic value of 10 digits.
Composite attribute − Composite attributes are made of more than one simple
attribute. For example, a student's complete name may have first_name and
last_name.
Derived attribute − Derived attributes are attributes that do not exist in the
physical database; their values are derived from other attributes present in
the database. For example, average_salary in a department need not be stored
directly in the database; instead, it can be derived. As another example, age can
be derived from date_of_birth.
Single-value attribute − Single-value attributes contain a single value. For
example, Social_Security_Number.
Multi-value attribute − Multi-value attributes may contain more than one value.
For example, a person can have more than one phone number, email_address,
etc.
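The attribute types above can be sketched in Python. The Student class, its field names and the sample values are hypothetical: first_name and last_name make up a composite name, phone_numbers is multi-valued, and age is a derived attribute computed from date_of_birth instead of being stored.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Student:
    student_id: str       # simple, single-valued attribute
    first_name: str       # simple components of the composite attribute "name"
    last_name: str
    date_of_birth: date   # stored attribute from which age is derived
    phone_numbers: list[str] = field(default_factory=list)  # multi-valued attribute

    @property
    def name(self) -> str:
        # Composite attribute assembled from its simple components.
        return f"{self.first_name} {self.last_name}"

    def age(self, on: date) -> int:
        # Derived attribute: computed from date_of_birth, never stored.
        years = on.year - self.date_of_birth.year
        if (on.month, on.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1  # the birthday has not come yet in that year
        return years


s = Student("S1", "Asha", "Rao", date(2001, 8, 15), ["9876543210", "9123456780"])
print(s.name, s.age(on=date(2024, 6, 1)), len(s.phone_numbers))  # Asha Rao 22 2
```

Computing age on demand means the value never goes stale, which is the usual reason for deriving it rather than storing it.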
KEY
Any attribute in a table that uniquely identifies each record in the table is
called a key. It can be a single attribute or a combination of attributes. For
example, in a STUDENT table, STUDENT_ID is a key, since it is unique for each
student. In a PERSON table, the passport number, driving license number, phone
number, SSN and email address are keys, since they are unique for each person.
TYPES
Primary key
A primary key is a column, or a group of columns, in a table that uniquely
identifies the rows in that table. The data values placed in the primary key column
must be unique to each row; no duplicates can be used. In addition, nulls are not
allowed in primary key columns. For example, in a STUDENT table, STUDENT_ID is
the primary key, since it is unique for each student.
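A minimal sketch of the two primary key rules (no duplicates, no nulls), assuming Python's built-in sqlite3 module and an in-memory database; the student table and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written out explicitly: for legacy reasons SQLite otherwise
# accepts NULL in most non-INTEGER PRIMARY KEY columns.
conn.execute("CREATE TABLE student (student_id TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES ('S1', 'Asha')")


def try_insert(student_id, name):
    """Attempt an insert and report whether the primary key accepted it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, name))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


print(try_insert("S2", "Ravi"))   # new, unique key
print(try_insert("S1", "Meera"))  # duplicate key
print(try_insert(None, "Kiran"))  # NULL key
```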
Candidate Key
Candidate keys are defined as the minimal sets of fields that can uniquely
identify each record in a table. A candidate key is a column (or set of columns)
that meets all of the requirements of a primary key. In other words, it has the
potential to be a primary key, for example a CustomerNo column in a customer
table. A table can have more than one candidate key, from which the primary key
is selected.
Composite Key
A composite key, in the context of relational databases, is a combination of two or
more columns in a table that can be used to uniquely identify each row in the
table. Uniqueness is only guaranteed when the columns are combined; when
taken individually the columns do not guarantee uniqueness.
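The following sketch, assuming Python's sqlite3 module, declares a composite primary key on a hypothetical order_line table. Neither order_no nor line_no is unique on its own, but the pair is, so only a repeated pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        order_no INTEGER NOT NULL,
        line_no  INTEGER NOT NULL,
        product  TEXT,
        PRIMARY KEY (order_no, line_no)   -- composite key: the pair must be unique
    )
""")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1, 1, "pen"), (1, 2, "ink"), (2, 1, "paper")],  # values repeat, pairs do not
)

try:
    conn.execute("INSERT INTO order_line VALUES (1, 2, 'eraser')")  # pair (1, 2) exists
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```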
Super key
A super key is a set of attributes within a table that can uniquely identify
each record within the table. Every candidate key is a super key, so the set of
super keys is a superset of the set of candidate keys.
For a STUDENT table with the columns student_id, name and phone, super keys
would include student_id, (student_id, name), phone, etc.
The first one is simple: student_id is unique for every row of data, hence it can
be used to identify each row uniquely.
Next comes (student_id, name). The names of two students can be the same, but
their student_id values cannot, hence this combination can also be a key.
Similarly, the phone number of every student is unique, hence phone can also be
a key.
So they are all super keys.
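The super key idea can be checked mechanically against sample rows. This is a sketch with hypothetical data: is_superkey tests whether a set of columns tells every row apart, and candidate_keys keeps only the minimal super keys. A sample can only show that a set of columns is not a key; real keys come from the meaning of the data, not from one table instance.

```python
from itertools import combinations

# Hypothetical STUDENT rows matching the discussion above.
STUDENT = [
    {"student_id": 1, "name": "Asha", "phone": "9000000001"},
    {"student_id": 2, "name": "Ravi", "phone": "9000000002"},
    {"student_id": 3, "name": "Asha", "phone": "9000000003"},
]


def is_superkey(rows, attrs):
    """True if no two rows share the same values on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: super keys with no proper subset that is also one."""
    columns = list(rows[0])
    keys = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            # Supersets of keys already found are super keys but not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


print(is_superkey(STUDENT, ("student_id", "name")))  # True
print(is_superkey(STUDENT, ("name",)))               # False
print(candidate_keys(STUDENT))                       # [('student_id',), ('phone',)]
```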
Foreign key
A FOREIGN KEY is a key used to link two tables together.

A FOREIGN KEY is a field (or collection of fields) in one table that refers to the
PRIMARY KEY in another table.
The table containing the foreign key is called the child table, and the table
containing the candidate key is called the referenced or parent table.
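A minimal foreign key sketch, assuming Python's sqlite3 module; the department (parent) and employee (child) tables are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key to the parent
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # parent row exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")  # no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```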
Secondary or Alternative key
Candidate keys that are not selected as the primary key are known as secondary
keys or alternative keys.
Simple key
A simple key consists of a single field that uniquely identifies a record. The
single field cannot be divided into more fields. A primary key made of one column
is also a simple key.
Compound key
A compound key has more than one field to uniquely identify a record. A compound
key differs from a composite key in that each part of a compound key is a foreign
key, whereas a part of a composite key may or may not be a foreign key.
Natural /Domain/ Business Key
A natural key is a key formed from attributes that already exist in the real world
and is declared as the primary key. Natural keys are sometimes called business or
domain keys because they are based on real-world observation: their attributes
and values exist outside the database, and they have a logical relationship with
the data in the table.
For example, a Social Security Number (SSN) is a natural key that can be declared
as the primary key.
Surrogate Key
A surrogate key is a kind of primary key whose values are not taken from
real-world data. It is a system-generated value (often a sequential number) that
uniquely identifies the entity in the system and has no business meaning to the
user.
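The contrast between natural and surrogate keys can be sketched with Python's sqlite3 module. In this hypothetical person table, ssn is a natural key kept unique, while person_id is a surrogate key whose values SQLite assigns itself (sequentially here, not randomly); the SSN values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,   -- surrogate key: value generated by SQLite
        ssn       TEXT NOT NULL UNIQUE,  -- natural key: meaningful in the real world
        name      TEXT
    )
""")
# person_id is omitted from the inserts, so SQLite generates it.
cur = conn.execute("INSERT INTO person (ssn, name) VALUES ('123-45-6789', 'Asha')")
first_id = cur.lastrowid
cur = conn.execute("INSERT INTO person (ssn, name) VALUES ('987-65-4321', 'Ravi')")
second_id = cur.lastrowid

print(first_id, second_id)  # 1 2
```

Keeping the natural key as a UNIQUE column alongside the surrogate key preserves the real-world uniqueness rule while giving other tables a short, stable value to reference.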
Non-key Attributes
Non-key attributes are the attributes (fields) of a table other than the candidate
key attributes.
Non-prime Attributes
Non-prime attributes are attributes that are not part of any candidate key;
attributes that belong to some candidate key are called prime attributes.
RELATIONSHIP
A relationship, in the context of databases, is a situation that exists between two
relational database tables when one table has a foreign key that references the
primary key of the other table. Relationships allow relational databases to split
and store data in different tables, while linking disparate data items.
For example, a bank database has a CUSTOMER_MASTER table that stores customer
data with a primary key column named CUSTOMER_ID; it also has an
ACCOUNTS_MASTER table, which holds information about various bank
accounts and associated customers. To link these two tables and determine
customer and bank account information, a corresponding CUSTOMER_ID column
must be inserted in the ACCOUNTS_MASTER table, referencing existing customer
IDs from the CUSTOMER_MASTER table. In this case, the ACCOUNTS_MASTER
table’s CUSTOMER_ID column is a foreign key that references a column with the
same name in the CUSTOMER_MASTER table. This is an example of a relationship
between the two tables.
TYPES
One-to-One Relationships

A pair of tables bears a one-to-one relationship when a single record in the first
table is related to only one record in the second table, and a single record in the
second table is related to only one record in the first table.
One-to-Many Relationships
A one-to-many relationship exists between a pair of tables when a single record in
the first table can be related to one or more records in the second table, but a
single record in the second table can be related to only one record in the first
table.
Many-to-Many Relationships
Each record in both tables can relate to any number of records (or no records) in
the other table. For instance, a person can have several siblings, and each of
those siblings in turn has several siblings. Many-to-many relationships require a
third table, known as an associative or linking table, because relational systems
can't directly accommodate the relationship.
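A sketch of a linking table, assuming Python's sqlite3 module and hypothetical student, course and enrollment tables. Each enrollment row holds two foreign keys, and the pair is the table's composite primary key, so the many-to-many relationship becomes two one-to-many relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT);
    -- Linking table: each row pairs one student with one course.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  TEXT    REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO course  VALUES ('DB', 'Databases'), ('OS', 'Operating Systems');
    INSERT INTO enrollment VALUES (1, 'DB'), (1, 'OS'), (2, 'DB');
""")

# Both directions of the relationship go through the linking table.
asha_courses = [r[0] for r in conn.execute("""
    SELECT c.title FROM course c
    JOIN enrollment e ON e.course_id = c.course_id
    WHERE e.student_id = 1 ORDER BY c.title""")]
db_students = [r[0] for r in conn.execute("""
    SELECT s.name FROM student s
    JOIN enrollment e ON e.student_id = s.student_id
    WHERE e.course_id = 'DB' ORDER BY s.name""")]

print(asha_courses, db_students)  # ['Databases', 'Operating Systems'] ['Asha', 'Ravi']
```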
Association
Association is a relationship between two objects. In other words, association
defines the multiplicity between objects. One-to-one, one-to-many, many-to-one
and many-to-many are all terms that define an association between objects.
DATABASE
A database is a collection of information that is organized so that it can be easily
accessed, managed and updated.
Data is organized into rows, columns and tables, and it is indexed to make it
easier to find relevant information. Data gets updated, expanded and deleted as
new information is added. Databases process workloads to create and update
themselves, querying the data they contain and running applications against it.
DATABASE MANAGEMENT SYSTEM
A database management system (DBMS) is system software for creating and
managing databases. The DBMS provides users and programmers with a
systematic way to create, retrieve, update and manage data. The DBMS essentially
serves as an interface between the database and end users or application
programs, ensuring that data is consistently organized and remains easily
accessible.
The DBMS manages three important things: the data; the database engine, which
allows data to be accessed, locked and modified; and the database schema, which
defines the database’s logical structure.
Data Independence
Data independence is the ability to modify a schema definition at one level
without affecting the schema definition at the next higher level. In other words,
the schema at one level can be changed without altering the schema at the level
above it. It is one of the major advantages of a DBMS.
TYPES
1) Logical Data Independence
Logical data independence indicates that the conceptual schema can be changed
without affecting the existing external schemas. The change would be absorbed
by the mapping between the external and conceptual levels. Logical data
independence also insulates application programs from operations such as
combining two records into one or splitting an existing record into two or more
records.
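Logical data independence can be illustrated with a view. In this sketch (Python's sqlite3 module, hypothetical table names), an employee table is split into two tables, and a view named employee recreates the old shape, so the application's query text does not change. The view plays the role of the external/conceptual mapping; note that a view over a join is read-only in SQLite unless triggers are added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Original conceptual schema: a single employee table.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Pune')")


def external_query(c):
    # The application's (external level) query; its text never changes.
    return c.execute("SELECT name, city FROM employee WHERE emp_id = 1").fetchone()


before = external_query(conn)

# Conceptual change: split the record into two tables, then rebuild the old
# shape as a view so the external schema is unaffected.
conn.executescript("""
    CREATE TABLE emp_core (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp_address (emp_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO emp_core SELECT emp_id, name FROM employee;
    INSERT INTO emp_address SELECT emp_id, city FROM employee;
    DROP TABLE employee;
    CREATE VIEW employee AS
        SELECT c.emp_id, c.name, a.city
        FROM emp_core c JOIN emp_address a ON a.emp_id = c.emp_id;
""")

after = external_query(conn)
print(before, before == after)  # ('Asha', 'Pune') True
```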
2) Physical Data Independence
The ability to change the physical schema without changing the logical schema is
called physical data independence. For example, a change to the internal schema,
such as using different file organizations or storage structures, storage devices,
or indexing strategies, should be possible without having to change the
conceptual or external schemas. The change would be absorbed by the mapping
between the conceptual and internal levels.
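Physical data independence can be sketched the same way: adding an index changes only the internal level (the access path), while the query and its result stay the same. The book table and its rows are hypothetical, and since EXPLAIN QUERY PLAN wording differs between SQLite versions, the plan is printed but not relied upon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, subject TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO book (subject, title) VALUES (?, ?)",
    [("database", "Intro to DBMS"), ("networks", "TCP/IP Basics"),
     ("database", "SQL Recipes")],
)

QUERY = "SELECT title FROM book WHERE subject = 'database' ORDER BY title"
before = conn.execute(QUERY).fetchall()

# Internal-level change: add an index. The query text is not touched.
conn.execute("CREATE INDEX idx_book_subject ON book(subject)")
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
after = conn.execute(QUERY).fetchall()

print(before == after)  # True
print(plan)             # exact wording depends on the SQLite version
```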
THREE TIER ARCHITECTURE
A 3-tier architecture separates its tiers from each other based on the complexity
of the users and how they use the data present in the database. It is the most
widely used architecture to design a DBMS.
Database (Data) Tier: has an internal schema, which describes the physical
storage structure of the database. The internal schema uses a physical data model
and describes the complete details of data storage and access paths for the
database.
Application (Middle) Tier: At this tier reside the application server and the
programs that access the database. For a user, this application tier presents an
abstracted view of the database. End-users are unaware of the existence of the
database beyond the application. At the other end, the database tier is not aware
of any other user beyond the application tier. Hence, the application layer sits in
the middle and acts as a mediator between the end-user and the database.
User (Presentation) Tier: End-users operate on this tier and know nothing
about the existence of the database beyond this layer. At this layer, multiple
views of the database can be provided by the application. All views are generated
by applications that reside in the application tier.
DATABASE COMPONENTS
A database system is composed of four components:
• Data
• Hardware
• Software
• Users
which coordinate with each other to form an effective database system.
1. Data - It is a very important component of the database system. Most
organizations generate, store and process large amounts of data. The data acts as
a bridge between the machine parts, i.e. hardware and software, and the users
who access it directly or through application programs.
2. Hardware - The hardware consists of the secondary storage devices such as
magnetic disks (hard disk, zip disk, floppy disks), optical disks (CD-ROM), magnetic
tapes etc. on which data is stored together with the Input/Output devices
(mouse, keyboard, printers), processors, main memory etc. which are used for
storing and retrieving the data in a fast and efficient manner.
3. Software - The Software part consists of DBMS which acts as a bridge between
the user and the database or in other words, software that interacts with the
users, application programs, and database and files system of a particular storage
media (hard disk, magnetic tapes etc.) to insert, update, delete and retrieve data.
For performing operations such as insertion, deletion and updating, we can
either use query languages like SQL, QUEL and Gupta SQL or application
software such as Visual Basic, Developer etc.
4. Users - Users are those persons who need the information from the database
to carry out their primary business responsibilities i.e. Personnel, Staff, Clerical,
Managers, Executives etc. On the basis of the job and requirements made by
them they are provided access to the database totally or partially.
The various types of users which can access the database are:-
• Database Designers
• Database Administrators (DBA)
• Application Programmers
• End Users
1)DATABASE DESIGNERS
Database Designers are the group of people who actually work on the designing
part of the database. They keep a close watch on what data should be kept and in
what format. They identify and design the whole set of entities, relations,
constraints, and views.
2)DATABASE ADMINISTRATORS
Administrators maintain the DBMS and are responsible for administering the
database. They are responsible for overseeing its usage and deciding by whom it
should be used. They create access profiles for users and apply limitations to
maintain isolation and enforce security. Administrators also look after DBMS resources like
system license, required tools, and other software and hardware related
maintenance.
3)APPLICATION PROGRAMMERS
As the name shows, application programmers are the ones who write application
programs that use the database. These application programs are written in
programming languages like COBOL, PL/I (Programming Language One), Java and
fourth-generation languages. These programs are written to meet user
requirements. Retrieving information, creating new information and changing
existing information are done by these application programs.
They interact with the DBMS through DML (Data Manipulation Language) calls.
4)End Users
End users are those who access the database from the terminal end. They use the
developed applications and they don’t have any knowledge about the design and
working of the database. These are the second class of users and their main aim
is just to get their task done. There are basically two types of end users that are
discussed below.
• Sophisticated Users
These users have great knowledge of query languages. They access data by
entering different queries from the terminal end. They do not write programs but
they can interact with the system by writing queries.
• Naive Users
Any user who does not have any knowledge about databases can be in this
category. Their task is just to use the developed application and get the desired
results. For example, clerical staff in a bank are naive users. They don’t have any
DBMS knowledge but they still use the database and perform their given tasks.
Advantages of Database Management System (DBMS)
1. Controlling Data Redundancy: In non-database systems (traditional computer
file processing), each application program has its own files. In this case, the
duplicated copies of the same data are created at many places. In DBMS, all the
data of an organization is integrated into a single database. The data is recorded
at only one place in the database and it is not duplicated.
2. Data Consistency: By controlling the data redundancy, the data consistency is
obtained. If a data item appears only once, any update to its value has to be
performed only once and the updated value (new value of item) is immediately
available to all users.
If the DBMS has reduced redundancy to a minimum level, the database system
enforces consistency. It means that when a data item appears more than once in
the database and is updated, the DBMS automatically updates each occurrence of
a data item in the database.
3. Data Sharing: In DBMS, data can be shared by authorized users of the
organization. The DBA manages the data and gives rights to users to access the
data. Many users can be authorized to access the same set of information
simultaneously. Remote users can also share the same data. Similarly, the data of
the same database can be shared between different application programs.
4. Data Integration: In DBMS, data in database is stored in tables. A single
database contains multiple tables and relationships can be created between
tables (or associated data entities). This makes it easy to retrieve and update data.
5. Data Security: Data security is the protection of the database from
unauthorized users. Only authorized persons are allowed to access the
database. Some users may be allowed to access only a part of the database, i.e.,
the data that is related to them or to their department.
DISADVANTAGES
1)Cost
A DBMS requires a high initial investment in hardware, software and trained staff.
The size of this investment depends on the size and functionality of the
organization. The organization also has to pay recurring annual maintenance costs.
2)Complexity
A DBMS fulfils many requirements and solves many problems related to
databases. But all this functionality has made the DBMS extremely complex
software. Developers, designers, DBAs and end users of the database must have
the right skills if they want to use it properly. If they don’t understand this
complex system, it may cause loss of data or database failure.
3)Technical staff requirement
Any organization has many employees working for it, and they can perform many
other tasks outside their domain, but it is not easy for them to work on a DBMS.
A team of technical staff who understand the DBMS is required, and the company
has to pay them handsome salaries too.
4)Database Failure
In a DBMS, all the files are stored in a single database, so the risk posed by a
database failure is greater. Any accidental failure of a component may cause
loss of valuable data. This is a serious concern for big firms.
5)Extra Cost of Hardware
A DBMS requires disk storage for the data, and sometimes you need to purchase
extra space to store your data. Sometimes you also need a dedicated machine
for better database performance. These machines and the storage space add to
hardware costs.
UNIT 2
Relational Database (RDB)
A relational database (RDB) is a collective set of multiple data sets organized by
tables, records and columns. RDBs establish a well-defined relationship between
database tables. Tables communicate and share information, which facilitates
data searchability, organization and reporting.
RDBs use Structured Query Language (SQL), a standard language that provides an
easy programming interface for database interaction.
CODD'S RULES
Dr Edgar F. Codd, after his extensive research on the Relational Model of database
systems, came up with twelve rules of his own, which according to him, a
database must obey in order to be regarded as a true relational database.
These rules apply to any database system that manages stored data using only
its relational capabilities. This requirement is a foundation rule (often called
Rule 0), which acts as a base for all the other rules.
Rule 1: Information Rule
The data stored in a database, may it be user data or metadata, must be a value
of some table cell. Everything in a database must be stored in a table format.
Rule 2: Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a
combination of table-name, primary-key (row value), and attribute-name (column
value). No other means, such as pointers, can be used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment.
This is a very important rule because a NULL can be interpreted as one of the
following: data is missing, data is not known, or data is not applicable.
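A small sketch of NULL's uniform treatment in SQL, assuming Python's sqlite3 module (Python None is stored as NULL) and a hypothetical emp table. Comparisons with NULL yield unknown rather than true or false, which is why a separate IS NULL test is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, phone TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("Asha", "9000000001"), ("Ravi", None)])

# Comparing with NULL gives NULL (unknown), returned to Python as None.
print(conn.execute("SELECT NULL = NULL, NULL <> 1").fetchone())  # (None, None)

# So "= NULL" matches no row, while IS NULL finds the missing phone number.
eq_null = conn.execute("SELECT COUNT(*) FROM emp WHERE phone = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM emp WHERE phone IS NULL").fetchone()[0]
print(eq_null, is_null)  # 0 1
```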
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online
catalog, known as data dictionary, which can be accessed by authorized users.
Users can use the same query language to access the catalog which they use to
access the database itself.
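SQLite offers a concrete example of an active online catalog: its schema is stored in the sqlite_master table and read with ordinary SQL. A minimal sketch with a hypothetical student table and index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# The catalog is itself a table, queried with the same SELECT used for user data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
print(catalog)  # [('index', 'idx_student_name'), ('table', 'student')]
```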
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that
supports data definition, data manipulation, and transaction management
operations. This language can be used directly or by means of some application. If
the database allows access to data without any help of this language, then it is
considered as a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be
updatable by the system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, update, and deletion. This must
not be limited to a single row; that is, it must also support union, intersection
and minus operations to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that
access the database. Any change in the physical structure of a database must not
have any impact on how the data is being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its user’s view
(application). Any change in logical data must not affect the applications using it.
For example, if two tables are merged or one is split into two different tables,
there should be no impact or change on the user application. This is one of the
most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity
constraints can be independently modified without the need of any change in the
application. This rule makes a database independent of the front-end application
and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various
locations. Users should always get the impression that the data is located at one
site only. This rule has been regarded as the foundation of distributed database
systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the
interface must not be able to subvert the system and bypass security and integrity
constraints.
DATA INTEGRITY
Data integrity is the maintenance of, and the assurance of the accuracy and
consistency of, data over its entire life-cycle, and is a critical aspect to the design,
implementation and usage of any system which stores, processes, or retrieves
data.
Data integrity is normally enforced in a database system by a series of integrity
constraints or rules. They are:
Entity integrity concerns the concept of a primary key. Entity integrity is an
integrity rule which states that every table must have a primary key and that the
column or columns chosen to be the primary key should be unique and not null.
Referential integrity concerns the concept of a foreign key. The referential
integrity rule states that any foreign-key value can only be in one of two states.
The usual state of affairs is that the foreign-key value refers to a primary key value
of some table in the database. Occasionally, and this will depend on the rules of
the data owner, a foreign-key value can be null. In this case, we are explicitly
saying that either there is no relationship between the objects represented in the
database or that this relationship is unknown.
Domain integrity specifies that all columns in a relational database must be
declared upon a defined domain. The primary unit of data in the relational data
model is the data item. Such data items are said to be non-decomposable or
atomic. A domain is a set of values of the same type. Domains are therefore pools
of values from which actual values appearing in the columns of a table are drawn.
User-defined integrity refers to a set of rules specified by a user, which do not
belong to the entity, domain and referential integrity categories.
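Domain and user-defined integrity are often expressed as CHECK constraints. A sketch assuming Python's sqlite3 module; the result table, the 0 to 100 range for marks and the set of allowed grades are hypothetical rules chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE result (
        student_id INTEGER NOT NULL,
        marks      INTEGER NOT NULL CHECK (marks BETWEEN 0 AND 100),  -- domain of marks
        grade      TEXT CHECK (grade IN ('A', 'B', 'C', 'F'))         -- allowed values
    )
""")


def accepts(row):
    """Return True if the row satisfies every constraint and is stored."""
    try:
        conn.execute("INSERT INTO result VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepts((1, 87, "A")))   # True
print(accepts((2, 140, "A")))  # False: marks outside the 0 to 100 domain
print(accepts((3, 55, "Z")))   # False: grade not in the allowed set
```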
DATABASE ANOMALIES
Anomalies are problems that can occur in poorly planned, un-normalised
databases where all the data is stored in one table (a flat-file database).
Insertion Anomaly - An insertion anomaly is the inability to add data to the
database due to the absence of other data. E.g. a library database that cannot
store the details of a new member until that member has taken out a book. This
results in database inconsistencies due to omission.
Deletion Anomaly - A deletion anomaly is the unintended loss of data due to
deletion of other data. E.g. Deleting a book loan from a library member can
remove all details of the particular book from the database such as the author,
book title etc.
Update Anomaly - An update anomaly is a data inconsistency that results from
data redundancy and a partial update. For example, when we try to update one
data item whose copies are scattered over several places, a few instances get
updated properly while a few others are left with old values. Such instances leave
the database in an inconsistent state.
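The update anomaly can be reproduced on a flat table. In this sketch (Python's sqlite3 module, hypothetical library data), a member's address is repeated on every loan, and changing only one copy leaves two different addresses for the same member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flat, un-normalised table: the member's address is repeated on every loan.
conn.execute(
    "CREATE TABLE loan (loan_id INTEGER PRIMARY KEY, member TEXT, address TEXT, book TEXT)"
)
conn.executemany(
    "INSERT INTO loan (member, address, book) VALUES (?, ?, ?)",
    [("Asha", "12 Park St", "DBMS"), ("Asha", "12 Park St", "SQL")],
)

# Partial update: only one copy of Asha's address is changed.
conn.execute("UPDATE loan SET address = '7 Lake Rd' WHERE loan_id = 1")

addresses = sorted({r[0] for r in conn.execute("SELECT address FROM loan WHERE member = 'Asha'")})
print(addresses)  # ['12 Park St', '7 Lake Rd']: one member, two addresses
```

Storing the address once, in a separate member table referenced by the loan table, removes the redundancy that makes this partial update possible.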
RELATIONAL ALGEBRA
Relational algebra is a special form of algebra that describes the data stored in
relational databases and the query languages used to access that data. It was first
developed by E. F. Codd at IBM and was formally introduced in 1970. Codd's work
became the basis for database query languages such as SQL.
RELATIONAL ALGEBRA OPERATIONS
Select Operation (σ)
Selects tuples from a relation whose attributes meet the selection criteria, which
is normally expressed as a predicate.
Notation: σp(r)
Where σ stands for selection, r stands for the relation, and p is a propositional
logic formula (the selection predicate) which may use connectors like and, or, and
not. These terms may use relational operators like: =, ≠, ≥, <, >, ≤.
For example:
σsubject="database"(Books)
Output: Selects tuples from books where subject is 'database'.
Project Operation (∏)
It keeps only the listed column(s) of a relation.
Notation: ∏A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r. Duplicate rows are
automatically eliminated, as a relation is a set.
For example:
∏subject, author (Books)
Selects and projects columns named as subject and author from the relation
Books.
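The select and project operations can be modelled in plain Python. In this sketch a relation is a pair of (attribute names, set of tuples), and the set gives the duplicate-free semantics mentioned above. The Books rows are hypothetical.

```python
# A relation is modelled as (attribute names, set of tuples).
BOOKS = (
    ("title", "subject", "author"),
    {
        ("DB Basics", "database", "Codd"),
        ("SQL Guide", "database", "Date"),
        ("Net 101", "networks", "Date"),
    },
)


def select(relation, predicate):
    """σp(r): keep the tuples for which predicate(row as dict) is true."""
    attrs, rows = relation
    return attrs, {t for t in rows if predicate(dict(zip(attrs, t)))}


def project(relation, *names):
    """∏names(r): keep only the named columns; the set drops duplicate rows."""
    attrs, rows = relation
    idx = [attrs.index(n) for n in names]
    return names, {tuple(t[i] for i in idx) for t in rows}


db_books = select(BOOKS, lambda r: r["subject"] == "database")
print(sorted(db_books[1]))
print(sorted(project(BOOKS, "subject", "author")[1]))
print(sorted(project(BOOKS, "author")[1]))  # [('Codd',), ('Date',)]: duplicate removed
```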
Join
Combines attributes of two relations into one.
R3 = join(R1,D1,R2,D2)
Given a domain from each relation (D1 from R1 and D2 from R2), join considers all
possible pairs of tuples from the two relations, and if their values for the chosen
domains are equal, it adds a tuple to the result containing all the attributes of
both tuples (discarding the duplicate domain).
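The join(R1, D1, R2, D2) description translates almost directly into Python. This sketch uses the same (attribute names, set of tuples) representation of a relation; the EMP and DEPT relations are hypothetical.

```python
def join(r1, d1, r2, d2):
    """R3 = join(R1, D1, R2, D2): pair up tuples whose D1 and D2 values are equal."""
    attrs1, rows1 = r1
    attrs2, rows2 = r2
    i1, i2 = attrs1.index(d1), attrs2.index(d2)
    keep2 = [i for i in range(len(attrs2)) if i != i2]  # drop the duplicate domain
    result = set()
    for t1 in rows1:          # consider all possible pairs of tuples
        for t2 in rows2:
            if t1[i1] == t2[i2]:
                result.add(t1 + tuple(t2[i] for i in keep2))
    heading = attrs1 + tuple(attrs2[i] for i in keep2)
    return heading, result


EMP = (("emp", "dept_id"), {("Asha", 10), ("Ravi", 20), ("Meera", 10)})
DEPT = (("dept_no", "dept_name"), {(10, "Sales"), (30, "HR")})

attrs, rows = join(EMP, "dept_id", DEPT, "dept_no")
print(attrs, sorted(rows))
```

Ravi (department 20) and HR (department 30) have no matching partner, so neither appears in the result.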
Union Operation (∪)
It performs binary union between two given relations and is defined as:
r ∪ s = { t | t ∈ r or t ∈ s}
For a union operation to be valid, the following conditions must hold:
r and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
Example
∏ author (Books) ∪ ∏ author (Articles)
Output: Projects the names of the authors who have either written a book or an
article or both.
Set Difference (−)
The result of set difference operation is tuples, which are present in one relation
but are not in the second relation.
Notation: r − s
Finds all the tuples that are present in r but not in s.
Example
∏author(Books) − ∏author(Articles)
Output: Provides the name of authors who have written books but not articles.
Cartesian Product (Χ)
Combines information of two different relations into one.
Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
EXAMPLE
σauthor = 'tutorialspoint'(Books Χ Articles)
Output − Yields a relation, which shows all the books and articles written by
tutorialspoint
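Union, set difference and Cartesian product map onto Python set operations. In this sketch a relation is (attribute names, set of tuples), and only the same-arity condition is checked; attribute domain compatibility is left to the caller. The author relations are hypothetical.

```python
def union(r, s):
    """r ∪ s = { t | t ∈ r or t ∈ s }; set union drops duplicate tuples."""
    if len(r[0]) != len(s[0]):
        raise ValueError("union needs relations with the same number of attributes")
    return r[0], r[1] | s[1]


def difference(r, s):
    """r − s: the tuples that are in r but not in s."""
    if len(r[0]) != len(s[0]):
        raise ValueError("difference needs relations with the same number of attributes")
    return r[0], r[1] - s[1]


def product(r, s):
    """r Χ s = { q t | q ∈ r and t ∈ s }: every tuple of r paired with every tuple of s."""
    return r[0] + s[0], {q + t for q in r[1] for t in s[1]}


BOOK_AUTHORS = (("author",), {("Codd",), ("Date",)})
ARTICLE_AUTHORS = (("author",), {("Date",), ("Gray",)})

print(sorted(union(BOOK_AUTHORS, ARTICLE_AUTHORS)[1]))       # [('Codd',), ('Date',), ('Gray',)]
print(sorted(difference(BOOK_AUTHORS, ARTICLE_AUTHORS)[1]))  # [('Codd',)]
print(len(product(BOOK_AUTHORS, ARTICLE_AUTHORS)[1]))        # 4
```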
SQL(STRUCTURED QUERY LANGUAGE)
Structured Query Language (SQL)[5][6][7][8] is a domain-specific language used in
programming and designed for managing data held in a relational database
management system (RDBMS), or for stream processing in a relational data
stream management system (RDSMS). It is particularly useful in handling
structured data where there are relations between different entities/variables of
the data.
SQL commands are mainly categorized into four categories as discussed below:
DDL (Data Definition Language): DDL or Data Definition Language
actually consists of the SQL commands that can be used to define the database
schema. It simply deals with descriptions of the database schema and is used to
create and modify the structure of database objects in the database.
Examples of DDL commands:
CREATE – is used to create the database or its objects (like tables, indexes,
functions, views, stored procedures and triggers).
DROP – is used to delete objects from the database.
ALTER – is used to alter the structure of the database.
CREATE INDEX – is used to create indexes on tables. Indexes are used to retrieve
data from the database faster.
TRUNCATE – is used to remove all records from a table; the space allocated for
the records is also released.
COMMENT – is used to add comments to the data dictionary.
RENAME – is used to rename an object existing in the database.

DML (Data Manipulation Language): The SQL commands that deal with the
manipulation of data present in the database belong to DML or Data Manipulation
Language, and this includes most of the SQL statements.
Examples of DML:
SELECT – is used to retrieve data from a database.
INSERT – is used to insert data into a table.
UPDATE – is used to update existing data within a table.
DELETE – is used to delete records from a database table.
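A short sketch of DDL and DML statements run through Python's sqlite3 module, with a hypothetical student table. SQLite supports CREATE, ALTER TABLE and CREATE INDEX, but it has no TRUNCATE, GRANT or REVOKE statements, so those are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the schema.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN city TEXT")
conn.execute("CREATE INDEX idx_student_city ON student(city)")

# DML: work with the data held in that schema.
conn.execute("INSERT INTO student (name, city) VALUES ('Asha', 'Pune'), ('Ravi', 'Delhi')")
conn.execute("UPDATE student SET city = 'Mumbai' WHERE name = 'Ravi'")
conn.execute("DELETE FROM student WHERE name = 'Asha'")
rows = conn.execute("SELECT name, city FROM student").fetchall()
print(rows)  # [('Ravi', 'Mumbai')]
```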
DCL (Data Control Language): DCL includes commands such as GRANT and
REVOKE, which mainly deal with the rights, permissions and other controls of the
database system.
Examples of DCL commands:
GRANT – gives users access privileges to the database.
REVOKE – withdraws access privileges given by the GRANT command.
TCL (Transaction Control Language): TCL commands deal with transactions
within the database.
Examples of TCL commands:
COMMIT – commits a transaction.
ROLLBACK – rolls back a transaction if an error occurs.
SAVEPOINT – sets a savepoint within a transaction.
SET TRANSACTION – specifies characteristics for the transaction.
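The TCL commands can be sketched with Python's sqlite3 module, assuming a hypothetical account table. Passing isolation_level=None stops the module from opening transactions implicitly, so the BEGIN, SAVEPOINT, ROLLBACK TO, COMMIT and ROLLBACK statements are exactly what SQLite executes. SQLite does not support SET TRANSACTION, so it is omitted.

```python
import sqlite3

# isolation_level=None: no implicit transactions; we control them explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.execute("SAVEPOINT before_credit")
conn.execute("UPDATE account SET balance = balance + 300 WHERE id = 2")  # wrong amount
conn.execute("ROLLBACK TO before_credit")   # undo only the work after the savepoint
conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
conn.execute("COMMIT")                      # make the transfer permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")                    # undo the whole second transaction

print(dict(conn.execute("SELECT id, balance FROM account")))  # {1: 70, 2: 80}
```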
FORMS AND REPORTS IN MS ACCESS
Forms make the data available on the screen. With a form, we can view and edit
the data, display it nicely, sort it, add to it, delete it and so on. Forms let us work
with our data. They don’t hold any data – they are just a tool for viewing the data
in our table. If we change something in our form, we’re actually changing it in
our table. Yes, we could – theoretically – just work with the data in our tables
directly, like we would work with data in an Excel spreadsheet. But remember
that tables don’t sort the data or present it nicely – so we’ll quite likely end up
looking at very old data that’s no longer relevant (because the new data is a long
way down the table). That’s a bad thing, and it’s much less likely to happen with a
form.
Reports also display our data, but on paper. Unlike Forms, Reports don’t allow us
to edit the data – they are designed to be static. After all, once we’ve printed our
data on paper (or as a PDF) it’s going to be pretty static, so Reports reflect that.
Whether it’s a product catalogue or a staff directory, an invoice or a
manufacturing docket, if it’s data on paper, it’s a report.

UNIT3
RELATIONAL DATABASE DESIGN (RDD)
Relational database design (RDD) models information and data into a set of tables
with rows and columns. Each row of a relation/table represents a record, and
each column represents an attribute of data. The Structured Query Language
(SQL) is used to manipulate relational databases. The design of a relational
database is composed of four stages, where the data are modeled into a set of
related tables. The stages are:
Define relations/attributes
Define primary keys
Define relationships
Normalization
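The first three stages can be sketched in SQLite: each CREATE TABLE defines a relation and its attributes, PRIMARY KEY defines the primary key, and a REFERENCES (foreign key) clause defines a relationship. The `department`/`student` schema is hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Stages 1 and 2: relations, attributes and primary keys (hypothetical schema)
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
# Stage 3: a relationship, expressed as a foreign key from student to department
conn.execute("""
    CREATE TABLE student (
        sid     INTEGER PRIMARY KEY,
        sname   TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    )
""")

conn.execute("INSERT INTO department VALUES (10, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', 99)")  # no department 99
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True  # the relationship is enforced by the database
print(fk_rejected)  # True
```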
FULL FUNCTIONAL DEPENDENCY
Full functional dependency is the property on which the Second Normal Form
(2NF) is built: a relation is in 2NF when it meets the requirements of First Normal
Form (1NF) and all non-key attributes are fully functionally dependent on the
primary key. Formally, in a relation, an attribute (or set of attributes) X is fully
functionally dependent on a set of attributes Y when X is functionally dependent
on Y (Y -> X) and is not functionally dependent on any proper subset of Y.
For example, let there be a relation R (Course, Sid, Sname, fid, schedule, room,
marks).
Full Functional Dependencies: {Course, Sid} -> Sname, {Course, Sid} -> Marks,
etc.
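Full functional dependency can be tested with attribute closure: Y -> X holds when X lies in the closure of Y, and it is full when no proper subset of Y has X in its closure. The example above does not list the complete FD set, so this sketch assumes a hypothetical one: {Course, Sid} -> {Sname, marks} and Course -> {fid, schedule, room}.

```python
from itertools import combinations

# Hypothetical (assumed) FD set for R(Course, Sid, Sname, fid, schedule, room, marks).
FDS = [
    ({"Course", "Sid"}, {"Sname", "marks"}),
    ({"Course"}, {"fid", "schedule", "room"}),
]

def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_dependency(lhs, attr, fds):
    """True if lhs -> attr holds and no proper subset of lhs determines attr."""
    lhs = set(lhs)
    if attr not in closure(lhs, fds):
        return False  # lhs -> attr does not hold at all
    for size in range(len(lhs)):  # sizes 0 .. len-1 give the proper subsets
        for subset in combinations(sorted(lhs), size):
            if attr in closure(set(subset), fds):
                return False
    return True

print(is_full_dependency({"Course", "Sid"}, "marks", FDS))     # True
print(is_full_dependency({"Course", "Sid"}, "schedule", FDS))  # False
```

`schedule` fails the test because Course alone already determines it.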
PARTIAL FUNCTIONAL DEPENDENCY
A partial dependency exists in a relation when a non-prime attribute (an attribute
that is not part of any candidate key) is functionally dependent on a proper
subset of a candidate key.
For example, let there be a relation R (Course, Sid, Sname, fid, schedule, room,
marks) with candidate key {Course, Sid}.
Partial Functional Dependencies: Course -> Schedule, Course -> Room
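A sketch that lists partial dependencies by checking which non-prime attributes fall in the closure of a proper subset of the candidate key. The FD set is a hypothetical assumption: {Course, Sid} -> {Sname, marks} and Course -> {fid, schedule, room}. Under these FDs, {Course, Sid} is the only candidate key, because neither Course nor Sid appears on the right-hand side of any FD.

```python
from itertools import combinations

R = {"Course", "Sid", "Sname", "fid", "schedule", "room", "marks"}
# Hypothetical (assumed) FD set.
FDS = [
    ({"Course", "Sid"}, {"Sname", "marks"}),
    ({"Course"}, {"fid", "schedule", "room"}),
]
KEY = {"Course", "Sid"}  # the only candidate key under the assumed FDs

def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, relation, fds):
    """Return (subset, attr) pairs where a proper subset of key determines a non-prime attr."""
    non_prime = relation - key  # valid because key is the only candidate key here
    found = []
    for size in range(1, len(key)):  # non-empty proper subsets only
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(set(subset), fds) & non_prime):
                found.append((subset, attr))
    return found

for lhs, attr in partial_dependencies(KEY, R, FDS):
    print(", ".join(lhs), "->", attr)
# Course -> fid
# Course -> room
# Course -> schedule
```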
TRANSITIVE DEPENDENCY
If a non-prime attribute of the relation is derived either from another non-prime
attribute or from a combination of part of a candidate key and a non-prime
attribute, the dependency is called a transitive dependency. Formally, let A, B, and
C designate three distinct attributes (or distinct collections of attributes) in the
relation, and suppose all three of the following conditions hold:
1) A → B
2) It is not the case that B → A
3) B → C
Then the functional dependency A → C (which follows from conditions 1 and 3 by
the axiom of transitivity) is a transitive dependency.
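The three conditions can be checked with attribute closure. This is a minimal sketch on a hypothetical relation Student(Sid, Dept, DeptHead), assuming the FDs Sid -> Dept and Dept -> DeptHead.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_transitive(a, b, c, fds):
    """A -> C is transitive via B if A -> B, not B -> A, and B -> C."""
    a, b, c = set(a), set(b), set(c)
    return (b <= closure(a, fds)          # condition 1: A -> B
            and not a <= closure(b, fds)  # condition 2: B does not determine A
            and c <= closure(b, fds))     # condition 3: B -> C

# Hypothetical FDs for Student(Sid, Dept, DeptHead).
FDS = [({"Sid"}, {"Dept"}), ({"Dept"}, {"DeptHead"})]
print(is_transitive({"Sid"}, {"Dept"}, {"DeptHead"}, FDS))  # True
```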
UNNORMALISED RELATION
An unnormalized relation is a relation that contains repeating groups of values.
An unnormalized relation can also contain relations nested within other relations,
as well as all kinds of transitive dependencies. Unnormalized relations are
sometimes denoted by 0NF.
NORMALIZATION
Database normalization is a technique for organizing the data in a database.
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics like insertion, update and
deletion anomalies. It is a multi-step process that puts data into tabular form,
removing duplicated data from the relation tables.
Normalization is used for mainly two purposes:
Eliminating redundant (repeated) data.
Ensuring data dependencies make sense, i.e. data is stored logically.
Normalization Rule
Normalization rules are divided into the following normal forms:
1)First Normal Form
2)Second Normal Form
3)Third Normal Form
4)BCNF
5)Fourth Normal Form
1)First Normal Form (1NF)
For a table to be in the First Normal Form, it must satisfy the following 4 rules:
a) It should only have single (atomic) valued attributes/columns.
b) Values stored in a column should be of the same domain.
c) All the columns in a table should have unique names.
d) The order in which data is stored does not matter.
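Rule a) is the one most often violated. A minimal sketch of fixing it: a hypothetical student table whose `phones` cell holds several values is flattened so that each row stores one atomic phone number. The data and the `to_1nf` helper are illustrative assumptions.

```python
# Hypothetical unnormalized data: "phones" holds several values in one cell.
unnormalized = [
    {"sid": 1, "sname": "Asha", "phones": ["9820000001", "9820000002"]},
    {"sid": 2, "sname": "Ravi", "phones": ["9900000003"]},
]

def to_1nf(rows, multi_valued, new_name):
    """Return one row per value of the multi-valued column so every cell is atomic."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = {k: v for k, v in row.items() if k != multi_valued}
            new_row[new_name] = value  # a single, atomic value
            flat.append(new_row)
    return flat

for r in to_1nf(unnormalized, "phones", "phone"):
    print(r)
```

After flattening, (sid, phone) identifies a row. `sname` now repeats for every phone number, which is the kind of redundancy the higher normal forms address.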
2)Second Normal Form (2NF)
For a table to be in the Second Normal Form,
a) It should be in the First Normal form.
b) And, it should not have Partial Dependency.
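One way to remove a partial dependency is to move the partially dependent attributes into their own relation. This sketch uses hypothetical rows of R (Course, Sid, Sname, fid, schedule, room, marks) and assumes Course -> fid, schedule, room (partial on the key {Course, Sid}) and {Course, Sid} -> Sname, marks. It performs the decomposition by relational projection.

```python
# Hypothetical rows; assumed FDs: Course -> fid, schedule, room and
# {Course, Sid} -> Sname, marks.
R = [
    {"Course": "DBMS", "Sid": 1, "Sname": "Asha", "fid": "F1", "schedule": "Mon", "room": "101", "marks": 78},
    {"Course": "DBMS", "Sid": 2, "Sname": "Ravi", "fid": "F1", "schedule": "Mon", "room": "101", "marks": 65},
    {"Course": "OS",   "Sid": 1, "Sname": "Asha", "fid": "F2", "schedule": "Tue", "room": "202", "marks": 81},
]

def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in result:
            result.append(t)
    return result

# Attributes that depend only on Course go into a relation keyed by Course.
course = project(R, ["Course", "fid", "schedule", "room"])
# The remaining attributes stay with the full key {Course, Sid}.
enrolment = project(R, ["Course", "Sid", "Sname", "marks"])
print(course)  # [('DBMS', 'F1', 'Mon', '101'), ('OS', 'F2', 'Tue', '202')]
print(enrolment)
```

The DBMS schedule and room are now stored once instead of once per enrolled student.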
3)Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,
a) It is in the Second Normal form.
b) And, it doesn't have Transitive Dependency.
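In code, it is usually easier to test the widely used equivalent formulation of 3NF: for every non-trivial FD X -> A, either X is a superkey or A is a prime attribute (part of some candidate key). This is a minimal sketch that finds candidate keys by brute force, so it only suits small relations. It is applied to a hypothetical Student(Sid, Dept, DeptHead) with assumed FDs Sid -> Dept and Dept -> DeptHead, and it assumes the FDs mention only attributes of the relation being tested.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure covers the relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= relation:
                keys.append(s)
    return keys

def is_3nf(relation, fds):
    """For every FD X -> A (A not in X): X is a superkey or A is prime."""
    prime = set().union(*candidate_keys(relation, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= relation:
            continue  # lhs is a superkey
        if not (rhs - lhs) <= prime:
            return False
    return True

R = {"Sid", "Dept", "DeptHead"}
FDS = [({"Sid"}, {"Dept"}), ({"Dept"}, {"DeptHead"})]
print(is_3nf(R, FDS))  # False
# Decomposing on the transitive dependency gives two 3NF relations.
print(is_3nf({"Sid", "Dept"}, [({"Sid"}, {"Dept"})]))            # True
print(is_3nf({"Dept", "DeptHead"}, [({"Dept"}, {"DeptHead"})]))  # True
```

The original relation fails because Dept -> DeptHead has a left side that is not a superkey and a right side that is not prime, which is exactly the transitive dependency Sid -> Dept -> DeptHead.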
4)Boyce and Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a stricter version of the Third Normal Form. This
form deals with a certain type of anomaly that is not handled by 3NF. A 3NF table
which does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
For a table to be in BCNF, the following conditions must be satisfied:
R must be in Third Normal Form,
and, for each non-trivial functional dependency (X → Y), X should be a super key.
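The superkey condition can be checked directly. This minimal sketch uses a hypothetical relation R(Student, Course, Teacher) with assumed FDs {Student, Course} -> Teacher and Teacher -> Course. Its candidate keys {Student, Course} and {Student, Teacher} overlap, so R is in 3NF (Course is prime) but not in BCNF.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= relation:
            violations.append((lhs, rhs))
    return violations

# Hypothetical relation and assumed FDs.
R = {"Student", "Course", "Teacher"}
FDS = [({"Student", "Course"}, {"Teacher"}), ({"Teacher"}, {"Course"})]
print(bcnf_violations(R, FDS))  # [({'Teacher'}, {'Course'})]
```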
5)Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,
It is in the Boyce-Codd Normal Form.
And, it doesn't have Multi-Valued Dependency.
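A multi-valued dependency X ->> Y means that, for tuples agreeing on X, the Y values combine freely with the remaining values. This sketch tests that definition on one relation instance. An instance can show that an MVD is violated, but it cannot prove the MVD holds for all future data. The course/teacher/book rows are hypothetical. If course ->> teacher holds and course is not a superkey, the relation is not in 4NF and would be split into (course, teacher) and (course, book).

```python
def mvd_holds(rows, attrs, x, y):
    """Test the multi-valued dependency X ->> Y on one relation instance.

    rows: list of tuples with values ordered as in attrs; x, y: sets of names.
    For every pair t1, t2 that agree on X, the tuple taking its X and Y
    values from t1 and all other values from t2 must also be present.
    """
    idx = {a: i for i, a in enumerate(attrs)}
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[idx[a]] == t2[idx[a]] for a in x):
                t3 = tuple(t1[idx[a]] if (a in x or a in y) else t2[idx[a]]
                           for a in attrs)
                if t3 not in present:
                    return False
    return True

ATTRS = ["course", "teacher", "book"]
# Hypothetical data: a course's teachers and books are independent of each other.
rows = [
    ("DBMS", "Smith", "Book1"),
    ("DBMS", "Smith", "Book2"),
    ("DBMS", "Jones", "Book1"),
    ("DBMS", "Jones", "Book2"),
]
print(mvd_holds(rows, ATTRS, {"course"}, {"teacher"}))      # True
print(mvd_holds(rows[:3], ATTRS, {"course"}, {"teacher"}))  # False
```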
Lossless Join and Dependency Preserving Decomposition
Decomposition of a relation is done when a relation in the relational model is not
in an appropriate normal form. A relation R should be decomposed into two or more
relations only if the decomposition is lossless join as well as dependency preserving.
Lossless Join Decomposition
If we decompose a relation R into relations R1 and R2,
Decomposition is lossy if R1 ⋈ R2 ⊃ R
Decomposition is lossless if R1 ⋈ R2 = R
To check for lossless join decomposition using the FD set, the following conditions
must hold:
1) The union of the attributes of R1 and R2 must be equal to the attributes of R.
Each attribute of R must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2) The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3) The common attributes must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For example, a relation R (A, B, C, D) with FD set {A->BC} is decomposed into
R1(ABC) and R2(AD), which is a lossless join decomposition as:
First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = (A) ≠ Φ.
Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because
A->BC is given.
Dependency Preserving Decomposition
If we decompose a relation R into relations R1 and R2, all dependencies of R
must either be a part of R1 or R2 or be derivable from the combination of the FDs
of R1 and R2.
For example, a relation R (A, B, C, D) with FD set {A->BC} is decomposed into
R1(ABC) and R2(AD), which is dependency preserving because the FD A->BC is a part
of R1(ABC).
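Whether an FD can be derived from the FDs of the parts can be tested without computing the projected FD sets. This minimal sketch uses the standard textbook closure-based test: starting from X, repeatedly add (closure(result ∩ Ri) ∩ Ri) for every part Ri, and check whether Y is reached. It reuses the example above, plus a hypothetical R(A, B, C) with A -> B and B -> C split into AB and AC, which loses B -> C.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fd, parts, fds):
    """True if X -> Y follows from the FDs that hold within the parts."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

def is_dependency_preserving(parts, fds):
    return all(preserves(fd, parts, fds) for fd in fds)

print(is_dependency_preserving([set("ABC"), set("AD")], [(set("A"), set("BC"))]))  # True
FDS2 = [(set("A"), set("B")), (set("B"), set("C"))]  # hypothetical second example
print(is_dependency_preserving([set("AB"), set("AC")], FDS2))  # False
```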
