Introduction of DBMS (Database Management System)


DBMS stands for Database Management System. The term can be broken down as DBMS = Database +
Management System. A database is a collection of data, and a management system is a set of programs to
store and retrieve that data. Based on this, a DBMS can be defined as a collection of inter-related
data together with a set of programs to store and access that data in an easy and effective manner.

The main purpose of a database system is to manage data. Consider a university that keeps
data about students, teachers, courses, books etc. To manage this data we need to store it
somewhere we can add new data, delete unused data, update outdated data and retrieve data. To
perform these operations we need a database management system that stores the
data in such a way that all of these operations can be performed efficiently.

File Based System:-


A file system manages data using files on a hard disk. Users are allowed to create, delete, and
update files according to their requirements. Consider the example of a file-based University
Management System. Data about students is available to their respective Departments, Academics Section,
Result Section, Accounts Section, Hostel Office etc. Some of the data is common to all sections, such as Roll
No, Name, Father Name, Address and Phone number, but some data is available to one
particular section only, such as the hostel allotment number, which belongs to the hostel office.

The following are the disadvantages of a file-based system.

• Redundancy of data: Data is said to be redundant if the same data is copied in many places.
If a student wants to change a phone number, it has to be updated in every section.
Similarly, old records must be deleted from all sections that hold data on that student.

• Inconsistency of data: Data is said to be inconsistent if multiple copies of the same data
do not match each other. If the phone number is different in the Accounts Section and the
Academics Section, the data is inconsistent. Inconsistency may be caused by typing errors
or by not updating all copies of the same data.

• Difficult data access: A user must know the exact location of a file to access data, so
the process is slow and time consuming. Searching for the hostel allotment number of one
student among 10000 unsorted student records, for example, is difficult.

• Unauthorized access: A file system may lead to unauthorized access to data. If a
student gets access to the file holding his marks, he can change them without authorization.

• No concurrent access: Access to the same data by multiple users at the same time is
known as concurrency. A file-based system does not manage concurrency, so data can safely be
accessed by only one user at a time.

• No backup and recovery: A file system does not by itself provide any backup or recovery
of data if a file is lost or corrupted.

Advantage of DBMS over file system


A Database Management System (DBMS) is defined as the software system that allows users to
define, create, maintain and control access to the database. A DBMS makes it possible for end
users to create, read, update and delete data in the database. It is a layer between programs and data.

Compared to the File Based Data Management System, Database Management System has many
advantages. Some of these advantages are given below −

 Reducing Data Redundancy

File-based data management systems contain multiple files that are stored in many
different locations in a system or even across multiple systems. Because of this, there
are sometimes multiple copies of the same file, which leads to data redundancy.

A database reduces this because the data is kept in a single place and any change to it is reflected
immediately. Because of this, there is far less chance of encountering duplicate data.

 Sharing of Data

In a database, the users of the database can share the data among themselves. There are various
levels of authorisation to access the data, and consequently the data can only be shared based on
the correct authorisation protocols being followed.

Many remote users can also access the database simultaneously and share the data between
themselves.

 Data Integrity

Data integrity means that the data is accurate and consistent in the database. Data Integrity is
very important as there are multiple databases in a DBMS. All of these databases contain data
that is visible to multiple users. So it is necessary to ensure that the data is correct and consistent
in all the databases and for all the users.
 Data Security

Data security is a vital concept in a database. Only authorised users should be allowed to access
the database, and their identity should be authenticated, for example with a username and password.
Unauthorised users should not be allowed to access the database under any circumstances, as this
violates the security of the data.

 Privacy

The privacy rule in a database means that only authorized users can access the database, according
to its privacy constraints. There are levels of database access, and a user can only view the data
he is allowed to. For example, on social networking sites the access constraints differ between
the different accounts a user may want to view.

 Backup and Recovery

A Database Management System typically provides facilities for backup and recovery. Users do not
need to back up data by hand, because this can be taken care of by the DBMS. Moreover, it can also
restore the database to its previous condition after a crash or system failure.

 Data Consistency

Data consistency is supported in a database because redundancy is kept to a minimum. All data appears
consistently across the database and the data is the same for all users viewing the database.
Moreover, any changes made to the database are immediately reflected to all users, which
avoids data inconsistency.

Disadvantages of DBMS:

• Cost: The implementation cost of a DBMS is high compared to a file system.
• Complexity: Database systems are complex to understand.
• Performance: Database systems are generic, making them suitable for various
applications. However, this generality can reduce their performance for some applications.

DBMS – Three Level Architecture (Logical DBMS Architecture)


The logical architecture describes how data in the database is perceived by the user. It is not concerned
with how the data is handled and processed by the DBMS, but only with how it looks.

It is divided into three levels of abstraction: (1) the external or view level, (2) the conceptual
level and (3) the internal or physical level.


1. External level:-
• It is also called the view level. It is called the “view” level because several
users can view their desired data at this level, which is fetched internally from the
database with the help of the conceptual and internal level mappings.
• The user doesn’t need to know database schema details such as data structures or
table definitions. The user is only concerned with the data that is returned to
the view level after it has been fetched from the database (present at the internal level).
• The external level is the “top level” of the three-level DBMS architecture.

2. Conceptual level

 It is also called logical level. The whole design of the database such as relationship
among data, schema of data etc. are described in this level.
 Database constraints and security are also implemented in this level of
architecture. This level is maintained by DBA (database administrator).

3. Internal level

 This level is also known as physical level. This level describes how the data is actually
stored in the storage devices. This level is also responsible for allocating space to the
data. This is the lowest level of the architecture.
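
As a rough illustration of the external and conceptual levels, the sketch below uses Python's built-in sqlite3 module. The table `student`, the view `hostel_view` and the sample row are assumptions made up for this example; the internal level (how rows are laid out on disk) is left entirely to the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Conceptual level: the full logical definition of the data.
conn.execute("""
    CREATE TABLE student (
        roll_no   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        phone     TEXT,
        hostel_no TEXT
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', '9876500000', 'H-12')")

# External level: a view exposing only what the hostel office needs.
conn.execute("""
    CREATE VIEW hostel_view AS
    SELECT roll_no, name, hostel_no FROM student
""")

hostel_rows = conn.execute("SELECT * FROM hostel_view").fetchall()
print(hostel_rows)  # the phone column is not visible through the view
```

A user of `hostel_view` never needs to know how the `student` table is defined or stored, which is the point of separating the levels.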
Physical DBMS Architecture

The physical architecture describes the software components used to enter and process
data and how these software components are related and interconnected. It is possible to
identify a number of key functions which are common to most database management system.
Based on various functions, the database system may be partitioned into the following
modules.

DML Pre compiler:-


Every DBMS has two basic sets of languages: the Data Definition Language (DDL), which holds the
set of commands needed to define the format of the data being stored, and the Data Manipulation
Language (DML), which holds the set of commands that modify and process data to produce user-defined
output. DML statements can also be written in an application program. The DML pre-compiler
converts DML statements embedded in an application program into normal procedure calls in the host
language. The pre-compiler interacts with the query processor in order to produce the appropriate
code.

DDL Compiler:

The Data Definition Language compiler processes schema definitions specified in the DDL.

Its output includes metadata such as the names of the files, data items, storage details of
each file, mapping information, constraints etc.
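
To make the DDL/DML distinction concrete, here is a small sketch using Python's built-in sqlite3 module. The `course` table and its rows are hypothetical; `sqlite_master` is SQLite's own catalog table, used here only to show that a schema definition ends up stored as metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure of the data (a schema definition).
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")

# DML: adds, changes, reads and removes the data itself.
conn.execute("INSERT INTO course VALUES (1, 'DBMS')")
conn.execute("INSERT INTO course VALUES (2, 'Networks')")
conn.execute("UPDATE course SET title = 'Database Systems' WHERE course_id = 1")
conn.execute("DELETE FROM course WHERE course_id = 2")
courses = conn.execute("SELECT course_id, title FROM course").fetchall()

# The schema written in DDL is kept as metadata in SQLite's catalog table.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(courses)
print(catalog)
```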
Database Administrator:
A Database Administrator (DBA) is the person responsible for controlling, maintaining,
coordinating, and operating the database management system. Managing, securing, and taking care
of the database system is the DBA's prime responsibility.

Importance of Database Administrator (DBA) :

• The Database Administrator manages and controls the three levels of the DBMS architecture
(internal level, conceptual level, and external level) and, in
discussion with the wider user community, defines the overall view of the
database. The DBA then provides the external views for different users and applications.
• The Database Administrator is responsible for maintaining the integrity and security of the
database and for keeping out unauthorized users. The DBA grants permissions to users of the database
and keeps a profile of each and every user.
• The Database Administrator is also accountable for keeping the database protected and secure, so
that any chance of data loss is kept to a minimum.

Role and Duties of Database Administrator (DBA) :

• Decides hardware –
The DBA chooses economical hardware, based on cost, performance and efficiency, that
best suits the organization. The hardware is the interface between end users
and the database.
• Manages data integrity and security –
Data integrity needs to be checked and managed accurately, as it protects and restricts data
from unauthorized use. The DBA keeps an eye on the relationships within the data to maintain data integrity.
• Database design –
The DBA is responsible and accountable for logical design, physical design, external model
design, and integrity and security control.
• Database implementation –
The DBA implements the DBMS and checks database loading at the time of its implementation.
• Query processing performance –
The DBA enhances query processing by improving its speed, performance and accuracy.
• Tuning database performance –
If users are not able to get data quickly and accurately, the organization may lose
business. By tuning SQL commands, the DBA can enhance the performance of the database.

Types of Database Administrator (DBA) :

• Administrative DBA –
Their job is to maintain the server and keep it functional. They are concerned with data
backups, security, troubleshooting, replication, migration etc.
• Data Warehouse DBA –
They have the roles described above, but are also accountable for merging data from various sources into the
data warehouse. They also design the warehouse, and clean and scrub data prior to
loading.
• Development DBA –
They build and develop queries, stored procedures, etc. that meet the organization's
needs. They are on a par with programmers.
• Application DBA –
They manage all requirements of the application components that interact with the
database and carry out activities such as application installation and coordination,
application upgrades, database cloning, data load process management, etc.
• Architect –
They are responsible for designing schemas, such as building tables. They work to build a
structure that meets the organisation's needs. The design is then used by developers and
development DBAs to design and implement the real application.
• OLAP DBA –
They design and build multi-dimensional cubes for decision support or OLAP
systems.

TYPES OF DATABASES:

Databases can be classified according to the number of users, the location of the database and the
type of usage.

Users: The number of users determines whether the database is single-user or multi-user.
A single-user database allows only one user to interact with the database at a time. Generally,
a desktop system is used for a single-user database, so it can be termed a desktop database.
A multi-user database allows multiple users at a time. It can be used for either a work-group
or an enterprise application. A database meant for a department or a small organization
is termed a work-group database, whereas an enterprise database is designed for an
organization with multiple departments and a larger number of users.

Location: A database can also be classified by its location. A database that is located at a
single location is called a centralized database. A database that is distributed across
different locations is called a distributed database.

Usage: A database that is designed to record day-to-day operations is called an operational
database. A database that is designed to store data meant for generating
information to make business decisions is called a data warehouse.

Relational model
It is used for data storage and processing.
A relation is a table with columns and rows. It is based on the mathematical concept of a relation,
which is represented physically as a table.
Some of the benefits of the relational model are:-

Ease of use:- The simple tabular representation of the database helps the user define and query
the database conveniently.

Flexibility:- Data can be added and deleted easily. Data from
various tables can also be manipulated easily using a few basic operations.

Accuracy:- In an RDBMS, relational algebra operations are used to manipulate the database.
Because these operations are mathematically well defined, their results are unambiguous.

Relational Terms (in RDBMS):-

Informal Term                  Formal Term

Table                          Relation
Column header/Field            Attribute
All possible column values     Domain
Row                            Tuple
Table definition               Schema of relation


Relation name: Student

Stu_Id    Stu_Name     Stu_Age      (attributes)

23        Steve        29           (each row is a tuple)
367       Chaitanya    27
234       Ajeet        28

The values in each column are taken from that attribute's domain.

Degree of a relation:- The number of attributes in a relation is called the degree of the
relation.

Cardinality of a relation:- The number of tuples in a relation is called the cardinality of the
relation.
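
As a quick check of the two definitions, the sketch below holds the Student relation shown earlier as plain Python data and counts its attributes and tuples. The variable names are chosen only for this illustration.

```python
# The Student relation from the text, held as a header plus a list of tuples.
attributes = ("Stu_Id", "Stu_Name", "Stu_Age")
tuples = [
    (23, "Steve", 29),
    (367, "Chaitanya", 27),
    (234, "Ajeet", 28),
]

degree = len(attributes)   # number of attributes (columns)
cardinality = len(tuples)  # number of tuples (rows)

print(degree, cardinality)  # 3 3
```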

Properties of Relation:-

1. A relation has a name that is distinct from all other relation names in the relational schema.
2. Each cell of the relation contains exactly one value.
3. Each attribute has a distinct name.
4. Each tuple is distinct; there are no duplicate tuples in a relation.
Relational keys
KEYS in DBMS:- A key in a DBMS is an attribute or set of attributes that helps you identify a
row (tuple) in a relation (table). Keys also allow you to find the relation between two tables. A key
uniquely identifies a row in a table by a combination of one or more columns in that table.

student_id   name   phone        age

1            Akon   9876723452   17
2            Akon   9991165674   19
3            Bkon   7898756543   18
4            Ckon   8987867898   19
5            Dkon   9990080080   17

Above is a Student table, with attributes student_id, name, phone and age.

Super Key

A super key is defined as a set of attributes within a table that can uniquely identify each record
within the table. A super key is a superset of a candidate key: it contains a candidate key, possibly
with extra attributes.

In the table defined above, the super keys include student_id, (student_id, name),
phone, (student_id, age) and (student_id, age, name, phone).

The first one is simple: student_id is unique for every row of data, hence it
can be used to identify each row uniquely.

Next comes (student_id, name). The names of two students can be the same, but their
student_id can't be the same, hence this combination can also be a key.

Similarly, the phone number of every student is unique, hence phone can also be a
key. Next comes (student_id, age). The ages of two students can be the same, but their
student_id can't be the same, hence this combination can also be a key.

So they are all super keys.

Candidate Key

A candidate key is a minimal set of fields that can uniquely identify each record
in a table. It is an attribute or a set of attributes that can act as a primary key for a table.
In our example, student_id and phone are both candidate keys for the table Student.

• A candidate key can never be NULL or empty, and its value should be unique.
• There can be more than one candidate key for a table.
• A candidate key can be a combination of more than one column (attribute).
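
The difference between a super key and a candidate key (minimality) can be sketched in a few lines of Python. This is only an illustration on the five sample rows of the Student table; the helper `is_super_key` is a name invented here, and it tests uniqueness in this sample only.

```python
from itertools import combinations

columns = ("student_id", "name", "phone", "age")
rows = [
    (1, "Akon", "9876723452", 17),
    (2, "Akon", "9991165674", 19),
    (3, "Bkon", "7898756543", 18),
    (4, "Ckon", "8987867898", 19),
    (5, "Dkon", "9990080080", 17),
]

def is_super_key(attrs):
    """True if no two sample rows agree on every attribute in attrs."""
    idx = [columns.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(rows)

# Every non-empty attribute set that identifies rows uniquely in this sample.
super_keys = [
    set(combo)
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
    if is_super_key(combo)
]

# A candidate key is a super key with no smaller super key inside it.
candidate_keys = [
    key for key in super_keys
    if not any(other < key for other in super_keys)
]

print(candidate_keys)
```

On this sample the result contains {student_id} and {phone}, as stated above, but also {name, age}, simply because these five rows happen not to repeat a name and age pair. A key is a statement about every valid row the table may ever hold, so a check against sample data can rule a key out but cannot prove one.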

Primary Key

Primary key is a candidate key that is most appropriate to become the main key for any table. It
is a key that can uniquely identify each record in a table.

For the table Student we can make the student_id column as the primary key.

Composite Key

A key that consists of two or more attributes that together uniquely identify any record in a table is called a
composite key. The attributes that form the composite key are not keys
independently or individually.
Consider a Score table that stores the marks scored by a student in a
particular subject.

In this table student_id and subject_id together form the primary key, hence it is a
composite key.
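
A minimal sketch of such a composite key, using Python's built-in sqlite3 module. The column set is an assumption based on the description above (student_id, subject_id and marks), and the sample values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE score (
        student_id INTEGER NOT NULL,
        subject_id INTEGER NOT NULL,
        marks      INTEGER,
        PRIMARY KEY (student_id, subject_id)  -- composite key
    )
""")

# Repeating student_id or subject_id on its own is fine.
conn.execute("INSERT INTO score VALUES (10, 1, 70)")
conn.execute("INSERT INTO score VALUES (10, 2, 75)")
conn.execute("INSERT INTO score VALUES (11, 1, 80)")

# Repeating the same (student_id, subject_id) pair is rejected.
try:
    conn.execute("INSERT INTO score VALUES (10, 1, 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM score").fetchone()[0]
print(duplicate_rejected, row_count)
```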

Secondary or Alternative key

The candidate keys that are not selected as the primary key are known as secondary keys or
alternative keys.

FOREIGN KEY

A FOREIGN KEY is a key used to link two tables together.

A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY
KEY in another table.

The table containing the foreign key is called the child table, and the table containing the
referenced key is called the referenced or parent table.

Look at the following two tables:

"Persons" table:

PersonID   LastName    FirstName   Age

1          Hansen      Ola         30
2          Svendson    Tove        23
3          Pettersen   Kari        20

"Orders" table:

OrderID   OrderNumber   PersonID

1         77895         3
2         44678         3
3         22456         2
4         24562         1

In the above example "PersonID" column in the "Orders" table points to the "PersonID" column
in the "Persons" table.

The "PersonID" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.

The "PersonID" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between
tables.

The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign
key column, because it has to be one of the values contained in the table it points to.
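
Both behaviours can be tried out with Python's built-in sqlite3 module. This is a sketch under a few assumptions: the column types are guesses, the order with PersonID 99 is a made-up invalid row, and the `rejected` helper is named only for this example. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("""
    CREATE TABLE Persons (
        PersonID  INTEGER PRIMARY KEY,
        LastName  TEXT, FirstName TEXT, Age INTEGER
    )
""")
conn.execute("""
    CREATE TABLE Orders (
        OrderID     INTEGER PRIMARY KEY,
        OrderNumber INTEGER NOT NULL,
        PersonID    INTEGER,
        FOREIGN KEY (PersonID) REFERENCES Persons (PersonID)
    )
""")
conn.execute("INSERT INTO Persons VALUES (3, 'Pettersen', 'Kari', 20)")
conn.execute("INSERT INTO Orders VALUES (1, 77895, 3)")  # PersonID 3 exists

def rejected(sql):
    """Run a statement; report whether an integrity constraint blocked it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Invalid data: there is no person 99 for the order to point to.
bad_insert = rejected("INSERT INTO Orders VALUES (5, 11111, 99)")
# Destroying a link: person 3 is still referenced by order 1.
bad_delete = rejected("DELETE FROM Persons WHERE PersonID = 3")

print(bad_insert, bad_delete)
```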

Types of Database Users:-

1. Naive User:- Users who need not be aware of the presence of the database system.
These are end users of the database who work through a menu-driven
application.
2. Application Programmer:- They are responsible for developing
application programs or user interfaces. The application programs are written in a
high-level language.
3. Sophisticated User:- They interact with the system without writing
programs; they make their requests in a query language.
4. Specialized User:- Users who write specialized database applications that do
not fit into the traditional data processing framework.
5. Online User:- Users who communicate with the database directly
online.

Constraints in DBMS-
• Relational constraints are the restrictions imposed on the database contents and operations.
• They ensure the correctness of data in the database.
• Every relation has some conditions that must hold for it to be a valid relation. These are called
Relational Integrity Constraints.
Types of Constraints in DBMS-
In a DBMS, there are the following 5 types of relational constraints-

1. Domain constraint
2. Tuple Uniqueness constraint
3. Key constraint
4. Entity Integrity constraint
5. Referential Integrity Constraint
1. Domain Constraint-
• A domain constraint defines the domain, or set of valid values, for an attribute.
• It specifies that the value taken by the attribute must be an atomic value from its domain.

Example-

Consider the following Student table-

STU_ID   Name       Age

S001     Akshay     20
S002     Abhishek   21
S003     Shashank   20
S004     Rahul      A

Here, the value ‘A’ is not allowed since only integer values can be taken by the Age attribute.
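
One way to express such a domain in practice is a CHECK constraint. The sketch below uses Python's built-in sqlite3 module; the explicit `typeof` check is an assumption specific to SQLite, which by default would otherwise store the text 'A' in an INTEGER column without complaint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        stu_id TEXT PRIMARY KEY,
        name   TEXT NOT NULL,
        -- domain of age: whole numbers only
        age    INTEGER CHECK (typeof(age) = 'integer')
    )
""")

conn.execute("INSERT INTO student VALUES ('S001', 'Akshay', 20)")  # accepted

try:
    conn.execute("INSERT INTO student VALUES ('S004', 'Rahul', 'A')")
    domain_violation_caught = False
except sqlite3.IntegrityError:
    domain_violation_caught = True  # 'A' is outside the domain of age

print(domain_violation_caught)
```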

2. Tuple Uniqueness Constraint-


Tuple Uniqueness constraint specifies that all the tuples must be necessarily unique in any relation.

Example-01:

Consider the following Student table-

STU_ID Name Age

S001 Akshay 20
S002 Abhishek 21

S003 Shashank 20

S004 Rahul 20

This relation satisfies the tuple uniqueness constraint since here all the tuples are unique.

Example-02:

Consider the following Student table-

STU_ID Name Age

S001 Akshay 20

S001 Akshay 20

S003 Shashank 20

S004 Rahul 20

This relation does not satisfy the tuple uniqueness constraint, since the first two tuples are identical.

3. Key Constraint-

Key constraint specifies that in any relation-

• All the values of the primary key must be unique.
• The value of the primary key must not be null.

Example-01:

Consider the following Student table-

STU_ID   Name       Age

S001     Akshay     20
S001     Abhishek   21
S003     Shashank   20
S004     Rahul      20

This relation does not satisfy the key constraint, as not all the values of the primary key are
unique: S001 appears twice.

Example-02:

STU_ID Name Age

S001 Akshay 20

S002 Abhishek 21

S003 Shashank 20

S004 Rahul 20

This relation satisfies the key constraint, as all the values of the primary key are unique.

4. Entity Integrity Constraint-

• The entity integrity constraint specifies that no attribute of the primary key may contain a null value in
any relation.
• This is because a tuple with a null value in its primary key cannot be uniquely identified.

Example-

Consider the following Student table-

STU_ID   Name       Age

S001     Akshay     20
S002     Abhishek   21
S003     Shashank   20
NULL     Rahul      20

This relation does not satisfy the entity integrity constraint, as the primary key contains a NULL
value.
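
The key constraint and the entity integrity constraint can both be seen in action with Python's built-in sqlite3 module. This is a sketch, not a reference implementation: the `violates` helper is a name made up here, and NOT NULL is written out because SQLite otherwise tolerates NULL in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is explicit: SQLite would otherwise accept NULL in a TEXT primary key.
conn.execute("""
    CREATE TABLE student (
        stu_id TEXT NOT NULL PRIMARY KEY,
        name   TEXT,
        age    INTEGER
    )
""")
conn.execute("INSERT INTO student VALUES ('S001', 'Akshay', 20)")

def violates(sql, params):
    """Run an insert; report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

insert = "INSERT INTO student VALUES (?, ?, ?)"
duplicate_key = violates(insert, ("S001", "Abhishek", 21))  # key constraint
null_key = violates(insert, (None, "Rahul", 20))            # entity integrity

print(duplicate_key, null_key)
```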

5. Referential Integrity Constraint-

 This constraint is enforced when a foreign key references the primary key of a relation.
 It specifies that all the values taken by the foreign key must either be available in the relation
of the primary key or be null.
Important Results-

The following two important results emerge from the referential integrity constraint-

• We cannot insert a record into a referencing relation if the corresponding record does not exist in
the referenced relation.
• We cannot delete or update a record of the referenced relation if the corresponding record
exists in the referencing relation.

Example-

Consider the following two relations- ‘Student’ and ‘Department’.


Here, relation ‘Student’ references the relation ‘Department’.

Student

STU_ID   Name       Dept_no

S001     Akshay     D10
S002     Abhishek   D10
S003     Shashank   D11
S004     Rahul      D14

Department

Dept_no   Dept_name

D10       ASET
D11       ALS
D12       ASFL
D13       ASHS

Here,

• The relation ‘Student’ does not satisfy the referential integrity constraint.
• This is because in relation ‘Department’, no primary key value corresponds to department no. D14.
• Thus, the referential integrity constraint is violated.
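
The same check can be written out by hand. The sketch below copies the rows of the two relations into Python lists and looks for foreign key values with no matching primary key; the variable names are chosen only for this illustration.

```python
# Rows copied from the Student and Department tables in the example.
students = [
    ("S001", "Akshay", "D10"),
    ("S002", "Abhishek", "D10"),
    ("S003", "Shashank", "D11"),
    ("S004", "Rahul", "D14"),
]
departments = [("D10", "ASET"), ("D11", "ALS"), ("D12", "ASFL"), ("D13", "ASHS")]

# Primary key values of the referenced relation.
valid_dept_nos = {dept_no for dept_no, _ in departments}

# A foreign key value must be null (None) or appear in the referenced relation.
violations = [
    (stu_id, dept_no)
    for stu_id, _, dept_no in students
    if dept_no is not None and dept_no not in valid_dept_nos
]

print(violations)  # [('S004', 'D14')]
```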

Attributes:-
Attributes are the properties that define an entity type.
Entities are represented by means of their properties, which are called attributes.
Entity:- An object with physical existence, e.g. person, car, house, or
an object with conceptual existence, e.g. company, job, university.
Entity Set:- The set of all entities of the same type.

[Figure: the entity type Student with its entity set S1, S2, S3]

Eg. Roll no, Name, DOB, Age, Address are the attributes which define the entity type Student.

[Figure: entity Student with attributes Roll no, Name, DOB, Age and Address]

Note:- There exists a domain or range of values that can be assigned to attribute.

Eg:- A student name cannot be numeric, and age cannot be negative.

Types of attributes
1. Simple attributes
2. Composite attributes
3. Derived attributes
4. Single-valued attributes
5. Multi-valued attributes

1. Simple attributes:- These hold atomic values, which cannot be divided further.
Eg:- Phone number, age.

2. Composite attributes:- These can be divided into smaller sub-parts.

[Figure: entity Student with composite attribute Name, made up of F. Name, M. Name and L. Name]

3. Derived attributes:- These attributes do not exist in the physical database, but their values are
derived from other attributes present in the database.

[Figure: entity Student with attribute DOB and derived attribute Age]

4. Single-valued attributes:- These contain only a single value.

5. Multi-valued attributes:- These may contain more than one value.
Eg:- A person can have more than one phone number.
A person can have more than one email id.

Example:-

[Figure: entity Employee with key attribute EMP id, composite attribute Name (F. Name, L. Name),
multi-valued attribute Phone no, derived attribute Age (from DOB) and simple attribute salary]
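
The attribute types above can be mirrored loosely in a Python class. This is only an analogy, not how a DBMS stores attributes; the field names and the sample employee are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Employee:
    emp_id: int                      # key attribute
    first_name: str                  # composite attribute Name = first + last
    last_name: str
    dob: date                        # stored attribute
    salary: float                    # simple, single-valued attribute
    phone_numbers: list[str] = field(default_factory=list)  # multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from dob, never stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)

emp = Employee(1, "Asha", "Rao", date(1990, 6, 15), 50000.0,
               ["9876500000", "9876500001"])
print(emp.age(date(2024, 6, 14)), emp.age(date(2024, 6, 15)))  # 33 34
```

Keeping age as a computed value rather than a stored one avoids the stored value going stale as time passes.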
ER:- Diagram Component:-
Mapping Cardinalities :-

Advanced Features of ER-Diagram :-

There are three such features:
1. Generalisation.
2. Specialisation.
3. Aggregation.

1. Generalisation:-

Generalization is a process in which the common attributes of two or more entities form a new entity. This
newly formed entity is called a generalized entity.

Generalization Example

Let's say we have two entities, Student and Teacher.

Attributes of entity Student are: Name, Address & Grade
Attributes of entity Teacher are: Name, Address & Salary

[Figure: ER diagram before generalization]

These two entities have two common attributes, Name and Address, so we can make a generalized entity with these
common attributes.

[Figure: ER diagram after generalization]

We have created a new generalized entity, Person, which has the common attributes of both entities.
After the generalization process the entities Student and Teacher have only the
specialized attributes Grade and Salary respectively, and their common attributes (Name & Address) are now associated
with the new entity Person, which is in a relationship with both entities (Student & Teacher).

1. Generalization uses a bottom-up approach, where two or more lower-level entities combine to form a higher-level
new entity.
2. The new generalized entity can in turn combine with another lower-level entity to create a still higher-level
generalized entity.
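
Generalization has a loose counterpart in class inheritance. The sketch below is an analogy in Python rather than an ER notation; the sample names and values are made up for the illustration.

```python
from dataclasses import dataclass

@dataclass
class Person:            # generalized entity: the shared attributes
    name: str
    address: str

@dataclass
class Student(Person):   # keeps only its own attribute
    grade: str

@dataclass
class Teacher(Person):
    salary: float

s = Student(name="Ajeet", address="Agra", grade="A")
t = Teacher(name="Steve", address="Delhi", salary=50000.0)

# Both lower-level entities are still Persons and inherit name and address.
print(isinstance(s, Person), isinstance(t, Person), s.name, t.address)
```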

Specialization
Specialization is a process in which an entity is divided into sub-entities. You can think of it as the reverse
of generalization, in which two entities combine to form a new higher-level entity.
Specialization is a top-down process.

The idea behind specialization is to find the subsets of entities that have a few distinguishing attributes. For
example, consider an entity Employee, which can be further classified into the sub-entities Technician, Engineer &
Accountant because these sub-entities have some distinguishing attributes.

Specialization Example

Here we have a higher-level entity “Employee”, which we have divided into the sub-entities
“Technician”, “Engineer” & “Accountant”. All of these are employees of a company; however,
their roles are completely different and they have a few different attributes. For example, a
Technician handles service requests, an Engineer works on a project and an Accountant handles the credit & debit
details. All three employee types have a few attributes in common, such as name & salary, which are
left associated with the parent entity “Employee”.

Aggregation

Aggregation is a process in which a single entity alone does not make sense in a relationship, so the
relationship between two entities is treated as one entity. The following example makes this clearer.
Aggregation Example

In the real world, a manager manages not only the employees working under them but also
the project. In such a scenario, if the entity “Manager” forms a “Manages” relationship with
either the “Employee” or the “Project” entity alone, it does not make sense, because the manager has to manage both.
In these cases the relationship between two entities acts as one entity. In our example, the relationship
“Works-On” between “Employee” & “Project” acts as one entity that has a relationship “Manages” with the entity
“Manager”.
UNIT-II: DATABASE INTEGRITY AND NORMALISATION:

What is a Functional Dependency?


Functional Dependency (FD) describes the relationship of one attribute to another attribute in a database
management system (DBMS). Functional dependencies help you maintain the quality of data
in the database. A functional dependency is denoted by an arrow →. The functional dependency of Y on
X is represented by X → Y. Functional dependencies play a vital role in telling good
database design from bad. They are used in understanding the process of normalization.

Example:
Employee number   Employee Name   Salary   City

1                 Dana            50000    San Francisco
2                 Francis         38000    London
3                 Andrew          25000    Tokyo

In this example, if we know the value of Employee number, we can obtain Employee Name, City, Salary,
etc. From this, we can say that City, Employee Name, and Salary are functionally dependent on
Employee number.
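
Whether a functional dependency holds in a given set of rows can be tested mechanically: equal values on the left must always come with equal values on the right. The sketch below is one way to write that test; the function name `holds`, the short column names and the fourth row are assumptions added for the illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The same left-hand value must always give the same right-hand value.
        if seen.setdefault(key, value) != value:
            return False
    return True

employees = [
    {"emp_no": 1, "name": "Dana", "salary": 50000, "city": "San Francisco"},
    {"emp_no": 2, "name": "Francis", "salary": 38000, "city": "London"},
    {"emp_no": 3, "name": "Andrew", "salary": 25000, "city": "Tokyo"},
    # Hypothetical extra row (not in the table above): a second employee in London.
    {"emp_no": 4, "name": "Maria", "salary": 41000, "city": "London"},
]

print(holds(employees, ("emp_no",), ("name", "salary", "city")))  # True
print(holds(employees, ("city",), ("name",)))                     # False
```

As with keys, a check on sample rows can disprove a dependency but cannot prove that it holds for all future data.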

Types of Functional Dependency


1. Fully functional dependency:- A dependency A → B in a relation schema R is said to be a full functional
dependency if removing any attribute from A means the dependency no longer holds.

We then say that B is fully functionally dependent on A.

Eg:-

ABC → D { D is fully functionally dependent on ABC }

This means D does not depend on any proper subset of ABC:

BC → D does not hold, because BC cannot determine D

C → D does not hold, because C cannot determine D

A → D does not hold, because A cannot determine D

Only ABC determines D, so D is fully functionally dependent on ABC.

Eg:-

StdID   Name    ProfID   Grade

S1      Pinky   P2       5
S2      Lucky   P1       6

In the above table StdID and ProfID are the identifying keys,

so StdID and ProfID together determine Grade (StdID, ProfID → Grade).

Transitive dependency:-

A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies. For example,

X -> Z is a transitive dependency if the following three conditions hold:

• X -> Y
• Y does not -> X
• Y -> Z

Note: A transitive dependency can only occur in a relation of three or more attributes. Identifying this dependency
helps us normalize the database into 3NF (3rd Normal Form).

Example: Let’s take an example to understand it better:

Book                 Author                 Author_age

Game of Thrones      George R. R. Martin    66
Harry Potter         J. K. Rowling          49
Dying of the Light   George R. R. Martin    66

{Book} -> {Author} (if we know the book, we know the author's name)

{Author} does not -> {Book}

{Author} -> {Author_age}

Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold. That makes sense,
because if we know the book name we can find the author’s age.
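
The three conditions can be checked directly on the sample rows. The following is a small sketch; the helper `holds` and the lower-case column names are introduced only for this example.

```python
def holds(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values."""
    seen = {}
    return all(seen.setdefault(r[lhs], r[rhs]) == r[rhs] for r in rows)

books = [
    {"book": "Game of Thrones", "author": "George R. R. Martin", "author_age": 66},
    {"book": "Harry Potter", "author": "J. K. Rowling", "author_age": 49},
    {"book": "Dying of the Light", "author": "George R. R. Martin", "author_age": 66},
]

book_to_author = holds(books, "book", "author")        # X -> Y
author_to_book = holds(books, "author", "book")        # Y -> X (should fail)
author_to_age = holds(books, "author", "author_age")   # Y -> Z
book_to_age = holds(books, "book", "author_age")       # X -> Z, the transitive one

print(book_to_author, author_to_book, author_to_age, book_to_age)
# True False True True
```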

Partial Dependency?
For a simple table like Student, a single column like student_id can uniquely identfy all the records in a table.
But this is not true all the time. So now let's extend our example to see if more than 1 column together can act as
a primary key.
Let's create another table for Subject, which will have subject_id and subject_name fields and subject_id will
be the primary key.
subject_id subject_name

1 Java

2 C++

3 Php
Now we have a Student table with student information and another table Subject for storing subject
information.
Let's create another table Score, to store the marks obtained by students in the respective subjects. We will also
be saving name of the teacher who teaches that subject along with marks.

score_id student_id subject_id marks teacher

1 10 1 70 Java Teacher

2 10 2 75 C++ Teacher

3 11 1 80 Java Teacher
In the Score table we save the student_id to know which student the marks belong to and the subject_id to know
which subject the marks are for.
Together, student_id + subject_id form a candidate key for this table, which can be used as the primary key.
How can this combination be a primary key?
If you are asked for the marks of the student with student_id 10, you cannot get them from this table, because you
don't know for which subject. And given only a subject_id, you would not know for which student. Hence we
need student_id + subject_id to uniquely identify any row.

Now if you look at the Score table, we have a column named teacher which depends only on the subject: for
Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
As we just discussed, the primary key for this table is a composite of two columns, student_id and subject_id,
but the teacher's name depends only on the subject, hence on subject_id, and has nothing to do with student_id.
This is Partial Dependency, where an attribute in a table depends on only a part of the primary key and not on
the whole key.
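
A rough Python sketch of how this partial dependency shows up in the Score data follows. The helpers `values_per` and `determines` are hypothetical names for this illustration: they group the rows by one column of the key and look at how many different values another column takes within each group.

```python
from collections import defaultdict

score = [
    {"score_id": 1, "student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"score_id": 2, "student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"score_id": 3, "student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]

def values_per(rows, determinant, attribute):
    """Collect the set of attribute values seen for each determinant value."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[determinant]].add(row[attribute])
    return dict(groups)

def determines(rows, determinant, attribute):
    """True when each determinant value maps to exactly one attribute value."""
    return all(len(v) == 1 for v in values_per(rows, determinant, attribute).values())

# teacher depends on subject_id alone, which is only part of the key.
print(determines(score, "subject_id", "teacher"))   # True
# marks need both parts of the key.
print(determines(score, "subject_id", "marks"))     # False
print(determines(score, "student_id", "marks"))     # False
```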

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database
Normalization is the process of organizing the data in a database to avoid data redundancy and the insertion,
update and deletion anomalies. Let's discuss the anomalies first; then we will discuss the normal forms with
examples.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are – Insertion, update
and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has
four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for
storing employee’s address and emp_dept for storing the department details in which the employee works. At
some point of time the table looks like this:

emp_id emp_name emp_address emp_dept


101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004
The above table is not normalized. We will see the problems that we face when a table is not
normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two departments of
the company. If we want to update the address of Rick then we have to update it in both rows. If the correct
address gets updated in one row but not in the other, then as per the database Rick would have two different
addresses, and the data would be inconsistent.

Insert anomaly: Suppose a new employee joins the company, who is under training and currently not assigned to
any department then we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.

Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the rows
that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned
only to this department.

To overcome these anomalies we need to normalize the data.

Normalization.
Here are the most commonly used normal forms:

 First normal form(1NF)


 Second normal form(2NF)
 Third normal form(3NF)
 Boyce & Codd normal form (BCNF)

First normal form (1NF)


As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only
atomic values.

Example: Suppose a company wants to store the names and contact details of its employees. It creates a table
that looks like this:

emp_id  emp_name  emp_address  emp_mobile

101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212
                               9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123
                               8123450987
Two employees (Jon & Lester) have two mobile numbers each, so the company stored both numbers in the same field,
as you can see in the table above.

This table is not in 1NF as the rule says "each attribute of a table must have atomic (single) values"; the
emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF we should have the data like this:

emp_id emp_name emp_address emp_mobile


101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212
102 Jon Kanpur 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123
104 Lester Bangalore 8123450987
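
The conversion to 1NF can be sketched in Python as follows. It is a minimal illustration, assuming the unnormalized rows hold the mobile numbers as a list; the function name `to_1nf` is made up for this sketch.

```python
# Unnormalized rows: emp_mobile holds a list, i.e. a non-atomic value.
employees = [
    {"emp_id": 101, "emp_name": "Herschel", "emp_address": "New Delhi",
     "emp_mobile": ["8912312390"]},
    {"emp_id": 102, "emp_name": "Jon", "emp_address": "Kanpur",
     "emp_mobile": ["8812121212", "9900012222"]},
    {"emp_id": 103, "emp_name": "Ron", "emp_address": "Chennai",
     "emp_mobile": ["7778881212"]},
    {"emp_id": 104, "emp_name": "Lester", "emp_address": "Bangalore",
     "emp_mobile": ["9990000123", "8123450987"]},
]

def to_1nf(rows, multi_valued):
    """Emit one row per value of the multi-valued column."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            flat.append({**row, multi_valued: value})
    return flat

for row in to_1nf(employees, "emp_mobile"):
    print(row["emp_id"], row["emp_name"], row["emp_address"], row["emp_mobile"])
```

Each output row now carries a single mobile number, at the cost of repeating the other columns for Jon and Lester.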

Second normal form (2NF)


A table is said to be in 2NF if both the following conditions hold:

 Table is in 1NF (First normal form)


 No non-prime attribute is dependent on a proper subset of any candidate key of the table.

An attribute that is not part of any candidate key is known as non-prime attribute.

Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can
teach more than one subject, the table can have multiple rows for the same teacher. They create a table
that looks like this:

teacher_id subject teacher_age


111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime
attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates
the rule for 2NF as the rule says "no non-prime attribute is dependent on a proper subset of any candidate key
of the table".

To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:

teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:

teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
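
The split into two tables can be sketched in Python with plain tuples. The variable names mirror the table names above; the `rebuilt` check at the end is an extra step added for this illustration to show that nothing is lost by the split.

```python
# (teacher_id, subject, teacher_age) rows from the original table.
teacher = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# teacher_age depends on teacher_id alone, so it moves to its own table.
teacher_details = sorted({(tid, age) for tid, _subject, age in teacher})
teacher_subject = [(tid, subject) for tid, subject, _age in teacher]

print(teacher_details)  # [(111, 38), (222, 38), (333, 40)]

# Joining the two tables on teacher_id gives back the original rows.
age_of = dict(teacher_details)
rebuilt = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]
print(rebuilt == teacher)  # True
```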

Third Normal form (3NF)


A table design is said to be in 3NF if both the following conditions hold:

 Table must be in 2NF


 Transitive functional dependency of non-prime attribute on any super key should be removed.

An attribute that is not part of any candidate key is known as non-prime attribute.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional dependency
X -> Y at least one of the following conditions holds:

 X is a super key of table


 Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Suppose a company wants to store the complete address of each employee, they create a table named
employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district
1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip} and so on


Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any candidate keys.

Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That
makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id).
This violates the rule of 3NF.

To make this table comply with 3NF we have to break the table into two tables to remove the transitive
dependency:

employee table:

emp_id emp_name emp_zip


1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
employee_zip table:

emp_zip emp_state emp_city emp_district


282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
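
As a rough Python sketch of the same split, the two tables can be modelled as dictionaries keyed by emp_id and emp_zip. The function `full_address` is a hypothetical helper for this illustration; it follows the chain emp_id -> emp_zip -> (state, city, district) with two lookups.

```python
# (emp_id, emp_name, emp_zip, emp_state, emp_city, emp_district)
employee_details = [
    (1001, "John", "282005", "UP", "Agra", "Dayal Bagh"),
    (1002, "Ajeet", "222008", "TN", "Chennai", "M-City"),
    (1006, "Lora", "282007", "TN", "Chennai", "Urrapakkam"),
    (1101, "Lilly", "292008", "UK", "Pauri", "Bhagwan"),
    (1201, "Steve", "222999", "MP", "Gwalior", "Ratan"),
]

employee = {}      # emp_id  -> (emp_name, emp_zip)
employee_zip = {}  # emp_zip -> (emp_state, emp_city, emp_district)

for emp_id, name, zip_code, state, city, district in employee_details:
    employee[emp_id] = (name, zip_code)
    employee_zip[zip_code] = (state, city, district)

def full_address(emp_id):
    """Rebuild the original row contents by following emp_id -> emp_zip."""
    name, zip_code = employee[emp_id]
    return (name, zip_code) + employee_zip[zip_code]

print(full_address(1001))  # ('John', '282005', 'UP', 'Agra', 'Dayal Bagh')
```

With this layout the state, city and district of a zip code are stored in one place, so changing them touches a single entry no matter how many employees share that zip code.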

Boyce Codd normal form (BCNF)


It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies
with BCNF if it is in 3NF and, for every non-trivial functional dependency X->Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They store the
data like this:

emp_id emp_nationality emp_dept dept_type dept_no_of_emp


1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF because emp_id and emp_dept each determine other attributes, yet neither emp_id nor
emp_dept alone is a super key of the table.

To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:

emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:

emp_dept dept_type dept_no_of_emp


Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table:

emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
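
The BCNF test can be sketched in Python with the usual attribute-closure computation. The function names `closure` and `bcnf_violations` are assumptions made for this sketch, and the check is simplified: for each table it only looks at the listed dependencies that lie wholly inside that table, not at every dependency that could be derived.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(table_attrs, fds):
    """Listed FDs inside the table whose left side is not a super key of it."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if (lhs | rhs) <= table_attrs and not table_attrs <= closure(lhs, fds)
    ]

fds = [
    (frozenset({"emp_id"}), frozenset({"emp_nationality"})),
    (frozenset({"emp_dept"}), frozenset({"dept_type", "dept_no_of_emp"})),
]

original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
decomposed = {
    "emp_nationality": {"emp_id", "emp_nationality"},
    "emp_dept": {"emp_dept", "dept_type", "dept_no_of_emp"},
    "emp_dept_mapping": {"emp_id", "emp_dept"},
}

print(closure({"emp_id", "emp_dept"}, fds) == original)  # True: it is a key
print(len(bcnf_violations(original, fds)))               # 2
for name, attrs in decomposed.items():
    print(name, bcnf_violations(attrs, fds))             # [] for every table
```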

Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss of information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.

Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the same relation as it was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives back the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida
DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is a lossless-join decomposition.
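
The join test can be sketched in Python as below. The helpers `project`, `natural_join` and `same_relation` are names invented for this illustration. The second decomposition, `by_city`, is a hypothetical counter-example that is not part of the text: it shares only EMP_CITY, which is not a key, so the join produces extra rows.

```python
COLUMNS = ("EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME")
ROWS = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]
original = [dict(zip(COLUMNS, row)) for row in ROWS]

def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Combine rows that agree on every column the two tables share."""
    common = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

def same_relation(a, b):
    """Compare two tables as sets of rows, ignoring row order."""
    return {frozenset(r.items()) for r in a} == {frozenset(r.items()) for r in b}

employee = project(original, ("EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"))
department = project(original, ("DEPT_ID", "EMP_ID", "DEPT_NAME"))
print(same_relation(natural_join(employee, department), original))  # True

# Hypothetical lossy split: the only shared column is EMP_CITY.
by_city = project(original, ("EMP_CITY", "DEPT_ID", "DEPT_NAME"))
lossy_join = natural_join(employee, by_city)
print(len(lossy_join))  # 7 rows: the two Mumbai employees each match two departments
```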

Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part of R1 or R2 or
must be derivable from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The relation R is
decomposed into R1(ABC) and R2(AD), which is dependency preserving because FD A->BC is a part of relation R1(ABC).
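
A commonly used way to test dependency preservation without computing every projected dependency can be sketched in Python as below. The function names are assumptions for this illustration. The first call uses the R(A, B, C, D) example above; the second uses a hypothetical relation R(A, B, C) with A->B and B->C split into R1(AB) and R2(AC), which loses B->C.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(lhs, rhs, fds, decomposition):
    """Can lhs -> rhs be enforced using only the decomposed tables?"""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # What this one table can tell us, given what we already know.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

# Example from the text: R(A, B, C, D), A -> BC, split into R1(ABC) and R2(AD).
fds = [({"A"}, {"B", "C"})]
decomposition = [{"A", "B", "C"}, {"A", "D"}]
print(is_preserved({"A"}, {"B", "C"}, fds, decomposition))  # True

# Hypothetical counter-example: B -> C cannot be checked in R1(AB) or R2(AC).
fds2 = [({"A"}, {"B"}), ({"B"}, {"C"})]
decomposition2 = [{"A", "B"}, {"A", "C"}]
print(is_preserved({"B"}, {"C"}, fds2, decomposition2))     # False
```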

File Organization
 The File is a collection of records. Using the primary key, we can access the records. The type and frequency of
access can be determined by the type of file organization which was used for a given set of records.
 File organization is a logical relationship among various records. This method defines how file records are
mapped onto disk blocks.
 File organization is used to describe the way in which the records are stored in terms of blocks, and the blocks
are placed on the storage medium.
 The first approach to map the database to files is to use several files and store records of only one fixed length
in any given file. An alternative approach is to structure our files so that they can accommodate records of
multiple lengths.
 Files of fixed length records are easier to implement than the files of variable length records.
Objective of file organization
 Selection of records should be optimal, i.e., records should be selected as fast as possible.
 Insert, delete or update transactions on the records should be quick and easy.
 Insert, update or delete operations should not introduce duplicate records.
 Records should be stored efficiently so that the cost of storage is minimal.

Types of file organization:

There are various methods of file organization. Each method has pros and cons with respect to
access or selection. The programmer decides on the best-suited file organization method
according to the requirements.

Types of file organization are as follows:

1. Sequential File Organization


This method is the easiest method for file organization. In this method, records are stored sequentially in the
file. This method can be implemented in two ways:

1. Pile File Method:


 It is quite a simple method. In this method, we store the records in a sequence, i.e., one after another. Here, the
records are stored in the order in which they are inserted into the table.
 In case of updating or deleting any record, the record is searched for in the memory blocks. When it is
found, it is marked for deletion, and the new record is inserted.

Insertion of the new record:

Suppose we have records R1, R3 and so on up to R9 and R8 in a sequence. Here, a record is nothing but
a row in a table. Suppose we want to insert a new record R2 in the sequence; it will simply be placed at the end
of the file.
2. Sorted File Method:
 In this method, the new record is always inserted at the file's end, and then the sequence is sorted in ascending
or descending order. Sorting of records is based on the primary key or any other key.
 In the case of modification of any record, the record is updated and then the file is sorted, so that the updated
record ends up in the right place.

Insertion of the new record:

Suppose there is a preexisting sorted sequence of records R1, R3 and so on up to R6 and R7. Suppose a
new record R2 has to be inserted in the sequence; it will be inserted at the end of the file, and then the
sequence will be sorted.
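
The two insertion methods can be compared with a short Python sketch. Records are reduced to integer keys, and the key values are hypothetical stand-ins for R1, R3 and so on. The `bisect.insort` call at the end is an alternative that is not described in the text: it places the key directly at its sorted position instead of appending and re-sorting.

```python
import bisect

def pile_insert(file, key):
    """Pile file: the new record simply goes at the end."""
    file.append(key)

def sorted_insert(file, key):
    """Sorted file as described above: append, then sort the whole file."""
    file.append(key)
    file.sort()

pile_file = [1, 3, 9, 8]
pile_insert(pile_file, 2)
print(pile_file)    # [1, 3, 9, 8, 2]

sorted_file = [1, 3, 6, 7]
sorted_insert(sorted_file, 2)
print(sorted_file)  # [1, 2, 3, 6, 7]

# Alternative: insert straight into the sorted position.
alternative = [1, 3, 6, 7]
bisect.insort(alternative, 2)
print(alternative)  # [1, 2, 3, 6, 7]
```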
Advantages of sequential file organization
 It is a fast and efficient method for processing huge amounts of data.
 In this method, files can be easily stored on cheaper storage media like magnetic tapes.
 It is simple in design. It does not require much effort to store the data.
 This method is used when most of the records have to be accessed like grade calculation of a student, generating
the salary slip, etc.
 This method is used for report generation or statistical calculations.

Disadvantage of sequential file organization


 It wastes time, because we cannot jump directly to the required record; we have to move through the file
sequentially until we reach it.
 Sorted file method takes more time and space for sorting the records.

2. Heap file organization


 It is the simplest and most basic type of organization. It works with data blocks. In heap file organization, the
records are inserted at the file's end. When the records are inserted, it doesn't require the sorting and ordering of
records.
 When the data block is full, the new record is stored in some other block. This new data block need not be the
very next data block; the DBMS can select any data block in the memory to store new records. The heap file is also
known as an unordered file.
 In the file, every record has a unique id, and every page in a file is of the same size. It is the DBMS
responsibility to store and manage the new records.
Insertion of a new record

Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose we want to insert a new record
R2 in the heap. If data block 3 is full then the record will be inserted in any data block selected by the DBMS,
let's say data block 1.

If we want to search, update or delete data in heap file organization, then we need to traverse the data from
the start of the file till we get the requested record.

If the database is very large then searching, updating or deleting a record will be time-consuming because
there is no sorting or ordering of records. In heap file organization, we need to check all the data until we
get the requested record.
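
A heap file can be sketched in Python as a list of fixed-capacity blocks. The class `HeapFile`, the block capacity of 2 and the placement policy (first block with free space) are all assumptions made for this illustration; a real DBMS is free to choose any block. Block numbers in the sketch start at 0.

```python
class HeapFile:
    def __init__(self, num_blocks, capacity):
        self.blocks = [[] for _ in range(num_blocks)]
        self.capacity = capacity

    def insert(self, record):
        """Put the record in the first block with free space; return its block number."""
        for number, block in enumerate(self.blocks):
            if len(block) < self.capacity:
                block.append(record)
                return number
        self.blocks.append([record])  # every block is full: allocate a new one
        return len(self.blocks) - 1

    def search(self, record):
        """Scan from the start; return (block number or None, blocks read)."""
        for number, block in enumerate(self.blocks):
            if record in block:
                return number, number + 1
        return None, len(self.blocks)

heap = HeapFile(num_blocks=3, capacity=2)
for record in ["R1", "R3", "R6", "R4", "R5"]:
    heap.insert(record)

print(heap.insert("R2"))  # 2: blocks 0 and 1 are already full
print(heap.search("R2"))  # (2, 3): three blocks had to be read
print(heap.search("R9"))  # (None, 3): a miss reads every block
```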

Advantage of Heap file organization


 It is a very good method of file organization for bulk insertion. If there is a large amount of data which needs to
be loaded into the database at one time, then this method is best suited.
 In case of a small database, fetching and retrieving of records is faster than in sequential file organization.

Disadvantage of Heap file organization


 This method is inefficient for large databases because it takes time to search for or modify a record.

3. Hash File Organization


Hash File Organization uses the computation of hash function on some fields of the records. The hash
function's output determines the location of disk block where the records are to be placed.

When a record has to be retrieved using the hash key columns, the address is generated and the whole
record is retrieved using that address. In the same way, when a new record has to be inserted, the address
is generated using the hash key and the record is inserted directly. The same process is applied in the case of
delete and update.

In this method, there is no need to search or sort the entire file. Each record is stored at the location
computed by the hash function, so the records are scattered across the memory rather than kept in order.
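
Hash file organization can be sketched in Python as below. The number of buckets, the hash function (key modulo bucket count) and the sample records are assumptions chosen for this illustration, with each bucket standing in for a disk block.

```python
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]  # each bucket stands in for a disk block

def bucket_of(key):
    """Hypothetical hash function: the key modulo the number of buckets."""
    return key % NUM_BUCKETS

def insert(key, record):
    buckets[bucket_of(key)].append((key, record))

def lookup(key):
    """Only the one bucket chosen by the hash function is examined."""
    for stored_key, record in buckets[bucket_of(key)]:
        if stored_key == key:
            return record
    return None

def delete(key):
    bucket = buckets[bucket_of(key)]
    bucket[:] = [(k, r) for k, r in bucket if k != key]

for emp_id, name in [(101, "Herschel"), (102, "Jon"), (103, "Ron"), (104, "Lester")]:
    insert(emp_id, name)

print(bucket_of(103), lookup(103))  # 3 Ron
delete(102)
print(lookup(102))                  # None
```

Two keys that hash to the same bucket, for example 101 and 105 here, share a block, so the lookup still compares the stored keys inside that bucket.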
4. B+ Tree File Organization
 B+ tree file organization is the advanced method of an indexed sequential access method. It uses a tree-like
structure to store records in File.
 It uses the same concept of key-index where the primary key is used to sort the records. For each primary key,
the value of the index is generated and mapped with the record.
 The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this method, all
the records are stored only at the leaf node. Intermediate nodes act as a pointer to the leaf nodes. They do not
contain any records.

The above B+ tree shows that:


 There is one root node of the tree, i.e., 25.
 There is an intermediary layer with nodes. They do not store the actual record. They have only pointers to the
leaf node.
 The node to the left of the root holds a value smaller than the root and the node to the right holds a larger
value, i.e., 15 and 30 respectively.
 There is only one leaf level, and it holds only values, i.e., 10, 12, 17, 20, 24, 27 and 29.
 Searching for any record is easier because the tree is balanced: all the leaf nodes are at the same depth.
 In this method, any record can be reached by traversing a single path from the root.
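
A search over a tree of this shape can be sketched in Python as below. The grouping of keys into leaves is an assumption, and the right-most leaf holds an illustrative key 30 that is not in the list above. Only keys are stored; the records themselves are omitted. The sketch covers lookup and range scan only, not insertion or node splitting.

```python
import bisect

class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys of the records kept in this leaf
        self.next = None   # link to the next leaf, used for range scans

class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # separator keys only, no records
        self.children = children    # always len(keys) + 1 children

# Hypothetical layout of the leaves under root 25 and inner nodes 15 and 30.
leaves = [Leaf([10, 12]), Leaf([17, 20, 24]), Leaf([27, 29]), Leaf([30])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right

root = Internal([25], [Internal([15], leaves[0:2]), Internal([30], leaves[2:4])])

def find_leaf(node, key):
    """Follow a single root-to-leaf path."""
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def contains(key):
    return key in find_leaf(root, key).keys

def range_scan(low, high):
    """Find the first leaf, then walk the linked leaves in key order."""
    leaf = find_leaf(root, low)
    found = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return found
            if key >= low:
                found.append(key)
        leaf = leaf.next
    return found

print(contains(20))        # True
print(contains(25))        # False: 25 is only a separator in the root
print(range_scan(12, 27))  # [12, 17, 20, 24, 27]
```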

Advantage of B+ tree file organization


 In this method, searching becomes very easy as all the records are stored only in the leaf nodes, which are kept
sorted in a sequential linked list.
 Traversing through the tree structure is easier and faster.
 The size of the B+ tree has no restrictions, so the number of records can increase or decrease and the B+ tree
structure can also grow or shrink.
 It is a balanced tree structure, and any insert/update/delete does not affect the performance of tree.

Disadvantage of B+ tree file organization


 This method is inefficient for static tables, i.e., tables whose data rarely changes.

5. Indexed sequential access method (ISAM)


ISAM method is an advanced sequential file organization. In this method, records are stored in the file using
the primary key. An index value is generated for each primary key and mapped with the record. This index
contains the address of the record in the file.

If any record has to be retrieved based on its index value, then the address of its data block is fetched from the
index and the record is retrieved from the memory.

Advantage of ISAM:
 In this method, since the index holds the address of each record's data block, searching for a record in a huge
database is quick and easy.
 This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key
values, we can retrieve the data for the given range of value. In the same way, the partial value can also be easily
searched, i.e., the student name starting with 'JA' can be easily searched.
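
Exact, range and partial retrieval through a sorted index can be sketched in Python as below. The student names and roll numbers are hypothetical, the index is built on the name column to match the 'JA' example, and the position of a record in `data_file` stands in for its disk address.

```python
import bisect

# Records in arrival order; the list position plays the role of the address.
data_file = [
    {"name": "JAMES", "roll_no": 1},
    {"name": "ANNA", "roll_no": 2},
    {"name": "JACK", "roll_no": 3},
    {"name": "RAVI", "roll_no": 4},
    {"name": "JANE", "roll_no": 5},
]

# The index is kept sorted by key; each entry holds (key, address).
index = sorted((record["name"], address) for address, record in enumerate(data_file))
keys = [key for key, _address in index]

def lookup(key):
    """Exact match: binary search in the index, then one fetch by address."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return data_file[index[i][1]]
    return None

def range_lookup(low, high):
    """All records whose key lies between low and high, inclusive."""
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return [data_file[address]["name"] for _key, address in index[start:stop]]

def prefix_lookup(prefix):
    """Partial retrieval: keys with a common prefix are adjacent in the index."""
    names = []
    for key, address in index[bisect.bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break
        names.append(data_file[address]["name"])
    return names

print(lookup("RAVI"))          # {'name': 'RAVI', 'roll_no': 4}
print(range_lookup("B", "K"))  # ['JACK', 'JAMES', 'JANE']
print(prefix_lookup("JA"))     # ['JACK', 'JAMES', 'JANE']
```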

Disadvantage of ISAM
 This method requires extra space in the disk to store the index value.
 When the new records are inserted, then these files have to be reconstructed to maintain the sequence.
 When the record is deleted, then the space used by it needs to be released. Otherwise, the performance of the
database will slow down.

6. Cluster file organization


 When records of two or more tables are stored in the same file, it is known as a cluster. Such files hold two or
more tables in the same data block, and the key attributes which are used to map these tables together are stored
only once.
 This method reduces the cost of searching for various records in different files.
 The cluster file organization is used when there is a frequent need for joining the tables with the same condition.
These joins return only a few records from both tables. In the given example, we are retrieving the records of
only particular departments. This method is not suited to retrieving the records of all departments.

In this method, we can directly insert, update or delete any record. Data is sorted based on the key with which
searching is done. Cluster key is a type of key with which joining of the table is performed.

Types of Cluster file organization:

Cluster file organization is of two types:

1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The above EMPLOYEE
and DEPARTMENT relationship is an example of an indexed cluster. Here, all the records are grouped based
on the cluster key DEP_ID.

2. Hash Clusters:

It is similar to the indexed cluster. In hash cluster, instead of storing the records based on the cluster key, we
generate the value of the hash key for the cluster key and store the records with the same hash key value.

Advantage of Cluster file organization


 The cluster file organization is used when there is a frequent request for joining the tables with same joining
condition.
 It provides the efficient result when there is a 1:M mapping between the tables.

Disadvantage of Cluster file organization


 This method has the low performance for the very large database.
 If there is any change in the joining condition, then this method cannot be used. If we change the joining
condition, then traversing the file takes a lot of time.
 This method is not suitable for a table with a 1:1 condition.

Storage of Database on Hard Disks:-


A file organization refers to the organization of the data of a file into records, blocks, and access structures; this includes the way
records and blocks are placed on the storage medium and interlinked. An access method, on the other hand, is the way in which the data
can be retrieved based on the file organization.
TYPES OF INDEXES
Indexing is a data structure technique to efficiently retrieve records from the data base files based on some attributes on which the
indexing has been done. Indexing in database system is similar to what we see in books.


Indexing is defined based on its indexing attributes. Indexing can be of the following types-
•Primary Index-Primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally
the primary key of the relation.
•Secondary Index-Secondary index may be generated from a field which is a candidate key and has a unique value in every record,
or a non-key with duplicate values.
•Clustering Index-Clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Ordered Indexing is of two types-
•Dense Index
•Sparse Index

Dense Index:-

In dense index, there is an index record for every search key value in the database. This makes searching faster but
requires more space to store the index records themselves. Index records contain the search key value and a pointer to the
actual record on the disk.

Sparse Index
In sparse index, index records are not created for every search key. An index record here contains a search key and an
actual pointer to the data on the disk. To search for a record, we first follow the index record and reach the actual
location of the data. If the data we are looking for is not where we directly reach by following the index, then the system
starts a sequential search from there until the desired data is found.
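
The difference between the two kinds of index can be sketched in Python as below. The block contents are hypothetical keys, and the sparse index keeps one entry per block (the first key of the block), which is one common choice rather than the only one.

```python
import bisect

# A sorted data file split into blocks of three keys each.
blocks = [[10, 12, 17], [20, 24, 27], [29, 31, 35]]

# Dense index: one entry per key, pointing at (block number, offset).
dense_index = {
    key: (block_no, offset)
    for block_no, block in enumerate(blocks)
    for offset, key in enumerate(block)
}

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [(block[0], block_no) for block_no, block in enumerate(blocks)]
anchor_keys = [key for key, _block_no in sparse_index]

def sparse_search(key):
    """Jump to the block whose first key is <= key, then scan it sequentially."""
    position = bisect.bisect_right(anchor_keys, key) - 1
    if position < 0:
        return None  # smaller than every key in the file
    block_no = sparse_index[position][1]
    for offset, stored_key in enumerate(blocks[block_no]):
        if stored_key == key:
            return block_no, offset
    return None

print(len(dense_index), len(sparse_index))  # 9 3
print(dense_index[24])                      # (1, 1)
print(sparse_search(24))                    # (1, 1)
print(sparse_search(25))                    # None
```

The dense index answers with a single lookup but needs nine entries here, while the sparse index needs only three entries and pays for it with a short sequential scan inside the block.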
Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored on the disk along with the actual
database files. As the size of the database grows, so does the size of the indices. There is an immense need to keep
the index records in the main memory so as to speed up the search operations.

Multi-level Index helps in breaking down the index into several smaller indices in order to make the outermost level so
small that it can be saved in a single disk block, which can easily be accommodated anywhere in the main memory.

INDEX AND TREE STRUCTURE
