
Unit No 2.

Data Modelling and Relational Database Design



2.1 Data Modelling using ER Diagram: Representation of Entities, Attributes, Relationships and their Type, Cardinality,
Generalization, Specialization, Aggregation.

2.2 Relational data model: Structure of Relational Database Model, Types of keys, Referential Integrity Constraints

2.3 Codd’s rules

2.4 Database Design using E-R, E-R to Relational

2.5 Normalization – Normal forms based on primary keys (1NF, 2NF, 3NF, BCNF). Note: Case studies based on E-R diagram &
Normalization
2.1 Data Modelling using ER Diagram: Representation of Entities, Attributes, Relationships and their Type, Cardinality,
Generalization, Specialization, Aggregation.

ER Diagram:
• The Entity-Relationship (ER) model is a model for identifying the entities to be represented in the database and for
representing how those entities are related.
• The ER data model specifies enterprise schema that represents the overall logical structure of a database graphically.
• E-R diagrams are used to model real-world objects like a person, a car, a company and the relation between these real-
world objects.
Entity, Entity Type, Entity Set –
An Entity may be an object with a physical existence – a particular person, car, house, or employee – or it may be an object
with a conceptual existence – a company, a job, or a university course.
An Entity is an object of an Entity Type, and the set of all entities of that type is called an Entity Set. e.g., E1 is an entity
of Entity Type Student, and the set of all students is the Entity Set. In an ER diagram, an Entity Type is represented by a rectangle.
Attribute(s):
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB, Age, Address, Mobile_No
are the attributes that define entity type Student. In ER diagram, the attribute is represented by an oval.

1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called the key attribute. For example, Roll_No will be
unique for each student. In an ER diagram, a key attribute is represented by an oval with the attribute name underlined.
2. Composite Attribute –
An attribute composed of several other attributes is called a composite attribute. For example, the Address attribute of the
Student entity type consists of Street, City, State, and Country. In an ER diagram, a composite attribute is represented by an
oval connected to the ovals of its component attributes.
3. Multivalued Attribute –
An attribute that can hold more than one value for a given entity. For example, Phone_No (can be more than one for a given
student). In an ER diagram, a multivalued attribute is represented by a double oval.

4. Derived Attribute –
An attribute that can be derived from other attributes of the entity type is known as a derived attribute. e.g.; Age (can be
derived from DOB). In ER diagram, the derived attribute is represented by a dashed oval.
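
As a rough code analogue of the four attribute kinds, the sketch below models the Student entity type as a Python dataclass. The class names, field names, and sample values are assumptions made for illustration; they are not part of the ER notation itself.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: made up of several simpler attributes."""
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int                 # key attribute: unique per student
    name: str
    dob: date
    address: Address             # composite attribute
    phone_nos: list = field(default_factory=list)  # multivalued attribute

    def age_on(self, today):
        """Derived attribute: computed from dob instead of being stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


# Hypothetical sample entity.
s1 = Student(1, "Ram", date(2005, 6, 15),
             Address("MG Road", "Delhi", "Delhi", "India"),
             ["9455123451", "9000000000"])
print(s1.age_on(date(2024, 6, 14)))
```

Age is exposed as a method rather than a stored field because a derived attribute goes stale if it is stored while the value it depends on (DOB, together with the current date) changes.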
The complete entity type Student with its attributes can be represented as:
Relationships and their Type :-

A relationship type represents the association between entity types. For example, ‘Enrolled in’ is a relationship
type that exists between the entity types Student and Course. In an ER diagram, a relationship type is represented by
a diamond connected to the participating entity types by lines.
A set of relationships of the same type is known as a relationship set. The following relationship set depicts S1 enrolled
in C2, S2 enrolled in C1, and S3 enrolled in C3.
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called as the degree of a relationship set.
1. Unary Relationship –
When there is only ONE entity set participating in a relationship, the relationship is called a unary relationship. For example,
in a 'married to' relationship, one person is married to one other person, and both belong to the same entity set.

2. Binary Relationship –
When there are TWO entity sets participating in a relationship, the relationship is called a binary relationship. For
example, a Student is enrolled in a Course.

3. n-ary Relationship –
When there are n entity sets participating in a relationship, the relationship is called an n-ary relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality. Cardinality can be
of different types:

1. One-to-one – When each entity in each entity set can take part only once in the relationship, the cardinality is one-to-
one. Let us assume that a male can marry one female and a female can marry one male. So the relationship will be one-to-
one.
Using Sets, it can be represented as:
2. Many to one – When entities in one entity set can take part only once in the relationship set and entities in other entity
sets can take part more than once in the relationship set, cardinality is many to one. Let us assume that a student can take
only one course but one course can be taken by many students. So the cardinality will be n(many) to 1. It means that for
one course there can be n students but for one student, there will be only one course.

Using Sets, it can be represented as:

In this case, each student is taking only 1 course but 1 course has been taken by many students.
3. Many to many – When entities in all entity sets can take part more than once in the relationship, the cardinality is many to
many. Let us assume that a student can take more than one course and one course can be taken by many students. So the
relationship will be many to many.
Using sets, it can be represented as:

In this example, student S1 is enrolled in C1 and C3, and Course C3 has S1, S3, and S4 enrolled in it. So it is a many-to-many
relationship.
The one-to-many case mirrors many-to-one: each entity on one side can be related to more than one entity on the other side,
while each entity on the other side takes part at most once.
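
The cardinality types above can be checked mechanically on a sample relationship set. The sketch below is only an illustration: the function name `classify_cardinality` is made up, and it classifies one given set of pairs, whereas a cardinality in an ER diagram is a rule that every possible relationship set must obey.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify a binary relationship set given as (left, right) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does some left entity take part more than once? Some right entity?
    left_repeats = any(len(r) > 1 for r in rights_per_left.values())
    right_repeats = any(len(l) > 1 for l in lefts_per_right.values())

    if left_repeats and right_repeats:
        return "many-to-many"
    if right_repeats:
        return "many-to-one"   # many left entities share one right entity
    if left_repeats:
        return "one-to-many"
    return "one-to-one"


# Pairs taken from the example: S1 is enrolled in C1 and C3; C3 has S1, S3, S4.
enrolled_in = [("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")]
print(classify_cardinality(enrolled_in))
```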
Participation Constraint:
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If each student must enroll in a
course, the participation of students will be total. Total participation is shown by a double line in the ER diagram.
2. Partial Participation – An entity in the entity set may or may NOT participate in the relationship. If some courses have
no students enrolled in them, the participation of the course will be partial. Partial participation is shown by a single line
in the ER diagram.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total participation and Course Entity set
having partial participation.
Using set, it can be represented as,

Every student in the Student Entity set is participating in a relationship but there exists a course C4 that is not taking
part in the relationship.
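
Total and partial participation can be tested on a sample relationship set in the same spirit. In this sketch the helper name `participation` and the specific pairs are assumptions; only the facts that every student is enrolled and that course C4 takes no part come from the text.

```python
def participation(entity_set, relationship_pairs, position):
    """Return 'total' if every entity appears at `position` in some pair."""
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entity_set) <= participating else "partial"


students = {"S1", "S2", "S3", "S4"}
courses = {"C1", "C2", "C3", "C4"}
# Hypothetical pairs: every student is enrolled, course C4 has no students.
enrolled_in = [("S1", "C1"), ("S2", "C2"), ("S3", "C3"), ("S4", "C3")]

student_participation = participation(students, enrolled_in, 0)
course_participation = participation(courses, enrolled_in, 1)
print(student_participation, course_participation)
```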
Generalization :-
• Generalization is like a bottom-up approach in which two or more entities of lower level combine to form a higher level
entity if they have some attributes in common.
• In generalization, an entity of a higher level can also combine with the entities of the lower level to form a further higher
level entity.
• Generalization is more like subclass and superclass system, but the only difference is the approach. Generalization uses
the bottom-up approach.
• In generalization, entities are combined to form a more generalized entity, i.e., subclasses are combined to make a
superclass.
For example, the Faculty and Student entities can be generalized to create a higher level entity, Person.
Specialization :-
• Specialization is a top-down approach, and it is the opposite of Generalization. In specialization, one higher level entity can be
broken down into two or more lower level entities.
• Specialization is used to identify the subset of an entity set that shares some distinguishing characteristics.
• Normally, the superclass is defined first, the subclass and its related attributes are defined next, and relationship set are
then added.
For example: In an Employee management system, EMPLOYEE entity can be specialized as TESTER or DEVELOPER based on
what role they play in the company.
Aggregation
In aggregation, the relationship between two entities is treated as a single entity: a relationship together with its
corresponding entities is aggregated into a higher level entity.
For example: the relationship in which a Center entity offers a Course entity acts as a single entity, which is in turn in a
relationship with another entity, Visitor. In the real world, a visitor to a coaching center will rarely enquire about the Course
alone or the Center alone; the enquiry is about both together.
2.2 Relational data model: Structure of Relational Database Model, Types of keys, Referential Integrity
Constraints

What is the Relational Model?


The relational model represents how data is stored in Relational Databases. A relational database stores data in the
form of relations (tables). Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE, and AGE
shown in Table 1.

Table 1: STUDENT

ROLL_NO  NAME    ADDRESS  PHONE       AGE
1        RAM     DELHI    9455123451  18
2        RAMESH  GURGAON  9652431543  18
3        SUJIT   ROHTAK   9156253131  20
4        SURESH  DELHI                18
IMPORTANT TERMINOLOGIES
•Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME
•Relation Schema: A relation schema represents the name of the relation with its attributes. e.g.; STUDENT (ROLL_NO,
NAME, ADDRESS, PHONE, AGE) is the relation schema for STUDENT. A schema that has more than one relation is called a
Relational Schema.
•Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one of which is shown as:

1 RAM DELHI 9455123451 18

•Relation Instance: The set of tuples of a relation at a particular instant of time is called a relation instance. Table 1 shows
the relation instance of STUDENT at a particular time. It can change whenever there is an insertion, deletion, or update in
the database.

•Degree: The number of attributes in the relation is known as the degree of the relation. The STUDENT relation defined
above has degree 5.
•Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation defined above has cardinality
4.
•Column: A column represents the set of values for a particular attribute. The column ROLL_NO extracted from the
relation STUDENT is:

ROLL_NO
1
2
3
4

•NULL Values: The value which is not known or unavailable is called a NULL value. It is represented by blank space. e.g.;
PHONE of STUDENT having ROLL_NO 4 is NULL.
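
The terms above can be checked against a live table. The sketch below loads the STUDENT relation from Table 1 into an in-memory SQLite database using Python's sqlite3 module; the column types are assumptions, since the text does not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, "
    "PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", None, 18),   # None is stored as NULL
    ],
)

# Degree: number of attributes (columns) in the relation.
degree = len(conn.execute("SELECT * FROM STUDENT").description)

# Cardinality: number of tuples (rows) in the current relation instance.
cardinality = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]

# NULL is tested with IS NULL, not with "= NULL".
null_phone_rolls = [
    row[0] for row in conn.execute("SELECT ROLL_NO FROM STUDENT WHERE PHONE IS NULL")
]

print(degree, cardinality, null_phone_rolls)
```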
Types of keys in Relational Model :-
•Keys play an important role in the relational database.
•A key is used to uniquely identify any record or row of data in a table. It is also used to establish and identify relationships
between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON table,
passport_number, license_number, SSN are keys since they are unique for each person.
1. Primary key
•The primary key is the key chosen to identify one and only one instance of an entity uniquely. An entity can contain multiple keys, as
we saw in the PERSON table. The key which is most suitable from that list becomes the primary key.
•In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the EMPLOYEE table, we could
instead select License_Number or Passport_Number as the primary key, since they are also unique.
•For each entity, the choice of primary key depends on the requirements and on the developer's judgment.
2. Candidate key
•A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple.
•The primary key is chosen from the candidate keys. The candidate keys that are not chosen are as strong as the
primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The other unique attributes, like SSN,
Passport_Number, License_Number, etc., are also candidate keys.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. Every candidate key is a super key, and any superset of a
candidate key is also a super key.

For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two employees can be the
same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.

The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.


4. Foreign key
•Foreign keys are the columns of a table that point to the primary key of another table.
•Every employee works in a specific department in a company, and employee and department are two different entities. So
we can't store the department's information in the employee table. That's why we link these two tables through the primary
key of one table.
•We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE table.
•In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in a relation. These
attributes or combinations of attributes are called the candidate keys. One key is chosen as the primary key from
these candidate keys, and the remaining candidate keys, if any exist, are termed alternate keys. In other words, the total
number of alternate keys is the total number of candidate keys minus one (the primary key). The alternate key may or may
not exist. If there is only one candidate key in a relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate keys. In this relation,
Employee_Id is chosen as the primary key, so the other candidate key, PAN_No, acts as the Alternate key.
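
The relationship between super keys, candidate keys, the primary key, and alternate keys can be sketched as a brute-force search over a sample of rows. The function names and the sample employee rows are hypothetical. Note that a key is a constraint on every possible instance of the relation, so a sample can only show which attribute sets are consistent with being keys.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are superkeys for these rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical sample rows: two employees share a name.
employees = [
    {"Employee_Id": 1, "PAN_No": "P100", "Name": "Asha"},
    {"Employee_Id": 2, "PAN_No": "P200", "Name": "Ravi"},
    {"Employee_Id": 3, "PAN_No": "P300", "Name": "Asha"},
]

keys = candidate_keys(employees, ["Employee_Id", "PAN_No", "Name"])
primary_key = keys[0]        # the designer's choice; here simply the first one
alternate_keys = keys[1:]    # candidate keys that were not chosen
print(primary_key, alternate_keys)
```

(Employee_Id, Name) passes `is_superkey` but is not returned by `candidate_keys`, because it contains the smaller key Employee_Id and is therefore not minimal.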

6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also known as
Concatenated Key.


For example, in an employee relation, we assume that an employee may be assigned multiple roles, and an employee may
work on multiple projects simultaneously. So the primary key will be composed of all three attributes, namely Emp_ID,
Emp_role, and Proj_ID, in combination. These attributes together act as a composite key, since the primary key comprises more
than one attribute.
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys are created when a primary key is
large and complex and has no relationship with many other relations. The data values of artificial keys are usually
numbered in serial order.
For example, the primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the employee relation. So it
would be better to add a new virtual attribute to identify each tuple in the relation uniquely.

Referential Integrity Constraints -
A referential integrity constraint is also known as a foreign key constraint. A foreign key is a key whose values are derived
from the Primary key of another table.
The table from which the values are derived is known as the Master or Referenced Table, and the table in which values are
inserted accordingly is known as the Child or Referencing Table. In other words, the table containing the
foreign key is called the child table, and the table containing the Primary key/candidate key is called the referenced or
parent table. In the relational model, a candidate key can be defined as a set of attributes,
which can contain zero or more attributes.

The syntax of the Master Table or Referenced table is:


CREATE TABLE Student (Roll int PRIMARY KEY, Name varchar(25) , Course varchar(10) );

Here column Roll is acting as Primary Key, which will help in deriving the value of foreign key in the child table.
The syntax of Child Table or Referencing table is:
CREATE TABLE Subject (Roll int references Student, SubCode int, SubName varchar(10) );

In the above table, column Roll is acting as Foreign Key, whose values are derived using the Roll value of Primary key from Master
table.
Foreign Key Constraint OR Referential Integrity constraint-
There are two referential integrity constraint:
Insert Constraint: Value cannot be inserted in CHILD Table if the value is not lying in MASTER Table
Delete Constraint: Value cannot be deleted from MASTER Table if the value is lying in CHILD Table
Suppose you wanted to insert Roll = 5 with other values of columns in the SUBJECT Table, then you will immediately see an
error "Foreign key Constraint Violated" i.e. on running an insertion command as:
Insert into SUBJECT values(5, 786, 'OS'); will not be entertained by SQL due to the Insertion Constraint (you cannot insert a
value in a child table if the value is not lying in the master table; since Roll = 5 is not present in the master table, it
will not be allowed to enter Roll = 5 in the child table).
Similarly, if you want to delete Roll = 4 from the STUDENT Table, then you will immediately see an error "Foreign key Constraint
Violated" i.e. on running a deletion command as:

Delete from STUDENT where Roll = 4; will not be entertained by SQL due to the Deletion Constraint (you cannot delete
a value from the master table if the value is lying in the child table; since Roll = 4 is present in the child table, it will
not be allowed to delete Roll = 4 from the master table. If we somehow managed to delete Roll = 4, then Roll = 4 would remain
in the child table without a matching row in the master table, which would ultimately violate the insertion constraint).
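
Both constraints can be observed with a small experiment. The sketch below runs the two CREATE TABLE statements from the text in an in-memory SQLite database through Python's sqlite3 module. The student names and courses are made-up sample data, and the exact error text depends on the database product. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE Student (Roll int PRIMARY KEY, Name varchar(25), Course varchar(10))")
conn.execute("CREATE TABLE Subject (Roll int REFERENCES Student, SubCode int, SubName varchar(10))")

# Hypothetical sample data: Roll 1..4 exist in the master table.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Ram", "BCA"), (2, "Ramesh", "BCA"), (3, "Sujit", "MCA"), (4, "Suresh", "MCA")],
)
conn.execute("INSERT INTO Subject VALUES (4, 786, 'OS')")  # allowed: Roll 4 exists


def try_sql(statement):
    """Run a statement and report whether the database accepted it."""
    try:
        conn.execute(statement)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Insert constraint: Roll = 5 is not in the master table.
insert_result = try_sql("INSERT INTO Subject VALUES (5, 786, 'OS')")
# Delete constraint: Roll = 4 is still referenced by the child table.
delete_result = try_sql("DELETE FROM Student WHERE Roll = 4")

print(insert_result)
print(delete_result)
```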
ON DELETE CASCADE.
As per the deletion constraint, a value cannot be deleted from the MASTER Table if the value is lying in the CHILD Table. The next
question is: can we delete the value from the master table if the value is lying in the child table, without violating the
deletion constraint? That is, the moment we delete the value from the master table, the rows corresponding to it should also
get deleted from the child table.
The answer is YES. We only have to make a slight modification while creating the child table, i.e. adding on
delete cascade.

TABLE SYNTAX
CREATE TABLE Subject (Roll int references Student on delete cascade, SubCode int, SubName varchar(10) );
In the above syntax, just after the references keyword (used for creating the foreign key), we have added on delete cascade.
With this in place, we can delete a value from the master table even if the value is lying in the child table, without violating
the deletion constraint. If you now delete Roll = 4 from the master table while Roll = 4 is lying in the child
table, the rows having Roll = 4 in the child table will also get deleted.
Suppose the two tables STUDENT and SUBJECT have four rows each, and you delete Roll = 4 from the STUDENT (Master) Table by
writing the SQL command: delete from STUDENT where Roll = 4;
The moment SQL executes the above command, the row having Roll = 4 in the SUBJECT (Child) Table will also get deleted. The
resultant STUDENT and SUBJECT tables will look like:

From the above two tables STUDENT and SUBJECT, you can see that Roll = 4 is deleted from STUDENT and, because of on delete
cascade, the row with Roll = 4 is deleted from SUBJECT as well. If the foreign key were declared with on delete set null
instead, the row would stay in SUBJECT and its Roll value would be replaced by NULL. This shows that a foreign key can have
NULL values. If column Roll in the SUBJECT table is a Primary Key as well as a Foreign Key, then the foreign key cannot
have NULL values.
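
The difference between deleting the child row and setting its foreign key to NULL can be seen by running the same delete under two declarations, ON DELETE CASCADE and ON DELETE SET NULL. This is a sketch on SQLite through Python's sqlite3 module; the helper name `remaining_subject_rows` and the sample rows are assumptions.

```python
import sqlite3


def remaining_subject_rows(on_delete):
    """Delete Roll = 4 from Student and return what is left in Subject.

    `on_delete` is only ever one of the fixed literals used below, so
    building the DDL string by concatenation is safe in this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Student (Roll int PRIMARY KEY, Name varchar(25))")
    conn.execute(
        "CREATE TABLE Subject (Roll int REFERENCES Student ON DELETE "
        + on_delete
        + ", SubCode int, SubName varchar(10))"
    )
    conn.executemany("INSERT INTO Student VALUES (?, ?)", [(3, "Sujit"), (4, "Suresh")])
    conn.executemany(
        "INSERT INTO Subject VALUES (?, ?, ?)", [(3, 101, "DBMS"), (4, 786, "OS")]
    )
    conn.execute("DELETE FROM Student WHERE Roll = 4")
    return conn.execute(
        "SELECT Roll, SubCode, SubName FROM Subject ORDER BY SubCode"
    ).fetchall()


cascade_rows = remaining_subject_rows("CASCADE")    # child row is removed
set_null_rows = remaining_subject_rows("SET NULL")  # child row kept, Roll becomes NULL

print(cascade_rows)
print(set_null_rows)
```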
2.3 Codd’s rules
Not every database that has tables and constraints can be called a relational database system, and a database that merely
uses the relational data model is not necessarily a Relational Database Management System (RDBMS). So, some rules define what
a database must satisfy to be a true RDBMS. These rules were published in 1985 by Dr. Edgar F. Codd (E.F. Codd), who did
extensive research on the relational model of database systems. Codd presented 13 rules to test a DBMS
against his relational model; a database that follows the rules is called a true relational database (RDBMS).
Because they are numbered 0 to 12, these 13 rules are popularly known as Codd's 12 rules.
Rule 0: The Foundation Rule
The database must be in relational form so that the system can manage the database entirely through its relational capabilities.

Rule 1: Information Rule


All information in a database must be stored as values in the cells of tables, arranged in rows and columns.

Rule 2: Guaranteed Access Rule

Every single data item (atomic value) must be logically accessible from a relational database using a combination of
table name, primary key value, and column name.

Rule 3: Systematic Treatment of Null Values

This rule defines the systematic treatment of Null values in database records. A null value can have various meanings in the
database, such as missing data, no value in a cell, inapplicable information, or unknown data. The primary key must
not be null.
Rule 4: Active/Dynamic Online Catalog based on the relational model
The entire logical structure of the database must be described in an online catalog, known as the data dictionary, which is
itself stored in relational form. Authorized users can query the catalog with the same query language they use to access the database.

Rule 5: Comprehensive Data Sub Language Rule

A relational database may support various languages, but at least one language must have a well-defined linear syntax, be
expressible as character strings, and comprehensively support data definition, view definition,
data manipulation, integrity constraints, and transaction management operations. If the database allows access to the
data without going through such a language, it is considered a violation of this rule.

Rule 6: View Updating Rule


All views that are theoretically updatable must also be updatable in practice by the database system.

Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
A database system must support high-level relational operations such as insert, update, and delete on sets of rows, not just
on a single row. It must also support the union, intersection and minus operations.
Rule 8: Physical Data Independence Rule
Applications that access the database must be independent of how the data is physically stored. If the physical structure or
storage of the database is changed, it must not
have any effect on external applications that are accessing the data from the database.

Rule 9: Logical Data Independence Rule


It is similar to physical data independence. It means that if any changes occur at the logical level (table structures), they should
not affect the user's view (application). For example, if a table is split into two tables, or two tables are joined to create
a single table, these changes should not impact the user view or application.

Rule 10: Integrity Independence Rule

A database must maintain integrity independently of the applications that use it. Integrity constraints should be defined in the
database itself, using its query language, and should not rely on any external factor or application to be enforced. This also
makes the database independent of each front-end application.
Rule 11: Distribution Independence Rule

The distribution independence rule states that a database must work properly even if it is stored in different locations
and used by different end-users. When a user accesses the database through an application, they should not
be aware that the data is distributed across sites; it should always appear as if the data is located at one site. End users
should be able to run the same SQL queries regardless of where the data is stored.

Rule 12: Non Subversion Rule

The non-subversion rule states that if a system has a low-level or separate language other than SQL to access the database,
that language must not be able to subvert or bypass the integrity rules and constraints when it transforms data.
2.4 Database Design using E-R, E-R to Relational :-

ER (Entity Relationship) Diagram in DBMS


•ER model stands for an Entity-Relationship model. It is a high-level data model. This model is used to define the data
elements and relationship for a specified system.
•It develops a conceptual design for the database. It also develops a very simple and easy to design view of data.
•In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
For example, suppose we design a school database. In this database, the student will be an entity with attributes like
address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there
will be a relationship between them.
b. Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. The composite attribute is
represented by an ellipse, and the ellipses of its component attributes are connected to it.
c. Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. The double oval is used to
represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a dashed
ellipse.
For example, a person's age changes over time and can be derived from another attribute like Date of birth.
3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent the relationship.

Types of relationship are as follows:


a. One-to-One Relationship
When only one instance of each entity is associated with the relationship, it is known as a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of the entity on the right are associated with the
relationship, it is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by one specific scientist.

c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity on the right are associated with the
relationship, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship

When more than one instance of the entity on the left and more than one instance of the entity on the right are associated with
the relationship, it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
ER Model to Relational Model

The ER Model, when conceptualized into diagrams, gives a good overview of entity relationships, which is easier to
understand. ER diagrams can be mapped to a relational schema, that is, it is possible to create a relational schema from an ER
diagram. We cannot import all the ER constraints into the relational model, but an approximate schema can be generated.
There are several processes and algorithms available to convert ER Diagrams into a Relational Schema. Some of them are
automated and some of them are manual. Here we focus on mapping the diagram contents to relational basics.
ER diagrams mainly comprise:
•Entity and its attributes
•Relationship, which is association among entities
Mapping Entity
An entity is a real-world object with some attributes.

Mapping Process (Algorithm)


•Create table for each entity.
•Entity's attributes should become fields of tables with their respective data types.
•Declare primary key.
Mapping Relationship
A relationship is an association among entities.
Mapping Process
•Create table for a relationship.
•Add the primary keys of all participating Entities as fields of table with their respective data types.
•If relationship has any attribute, add each attribute as field of table.
•Declare a primary key composed of the primary keys of all participating entities.
•Declare all foreign key constraints.
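
The entity and relationship mapping steps above can be sketched for the Student / Course / 'Enrolled in' example. The column names, the Enrolled_On relationship attribute, and the sample rows are assumptions for illustration. The SQL is run on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One table per entity, with its primary key declared.
CREATE TABLE Student (Roll_No INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Course  (Course_Id TEXT PRIMARY KEY, Title TEXT);

-- One table for the relationship: the primary keys of both participants,
-- the relationship's own attribute, a composite primary key, and foreign keys.
CREATE TABLE Enrolled_In (
    Roll_No     INTEGER REFERENCES Student(Roll_No),
    Course_Id   TEXT    REFERENCES Course(Course_Id),
    Enrolled_On TEXT,
    PRIMARY KEY (Roll_No, Course_Id)
);
""")

conn.execute("INSERT INTO Student VALUES (1, 'S1')")
conn.execute("INSERT INTO Course VALUES ('C1', 'DBMS'), ('C3', 'OS')")
conn.execute(
    "INSERT INTO Enrolled_In VALUES (1, 'C1', '2024-01-10'), (1, 'C3', '2024-01-11')"
)

courses_of_s1 = [
    row[0]
    for row in conn.execute(
        "SELECT Course_Id FROM Enrolled_In WHERE Roll_No = 1 ORDER BY Course_Id"
    )
]

# The foreign key rejects an enrollment for a student that does not exist.
try:
    conn.execute("INSERT INTO Enrolled_In VALUES (9, 'C1', '2024-01-12')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(courses_of_s1, orphan_rejected)
```

A composite primary key over both foreign keys fits a many-to-many relationship, because it lets a student appear with several courses while preventing the same pair from being stored twice.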
Mapping Hierarchical Entities
ER specialization or generalization comes in the form of hierarchical entity sets.
Mapping Process
•Create tables for all higher-level entities.
•Create tables for lower-level entities.
•Add primary keys of higher-level entities in the table of lower-level entities.
•In lower-level tables, add all other attributes of lower-level entities.
•Declare primary key of higher-level table and the primary key for lower-level table.
•Declare foreign key constraints.
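
For hierarchical entities, the steps above can be sketched with Person as the higher-level entity and Student and Faculty as lower-level entities, reusing the generalization example from section 2.1. The column names and sample rows are assumptions. Each lower-level table uses the higher-level primary key as both its own primary key and a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Higher-level entity holds the shared attributes.
CREATE TABLE Person (Person_Id INTEGER PRIMARY KEY, Name TEXT);

-- Lower-level entities: the key of Person plus their own attributes.
CREATE TABLE Student (
    Person_Id INTEGER PRIMARY KEY REFERENCES Person(Person_Id),
    Roll_No   INTEGER
);
CREATE TABLE Faculty (
    Person_Id  INTEGER PRIMARY KEY REFERENCES Person(Person_Id),
    Department TEXT
);
""")

conn.execute("INSERT INTO Person VALUES (1, 'Asha'), (2, 'Ravi')")
conn.execute("INSERT INTO Student VALUES (1, 42)")
conn.execute("INSERT INTO Faculty VALUES (2, 'CS')")

# A full Student record is rebuilt by joining with the higher-level table.
student_view = conn.execute(
    "SELECT p.Name, s.Roll_No FROM Student s "
    "JOIN Person p ON p.Person_Id = s.Person_Id"
).fetchall()
print(student_view)
```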
2.5 Normalization – Normal forms based on primary keys (1NF, 2NF, 3NF, BCNF) Note: Case
studies based on E-R diagram & Normalization

Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate
undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalization divides a larger table into smaller tables and links them using relationships.
The normal form is used to reduce redundancy from the database table.
Why do we need Normalization?
The main reason for normalizing the relations is removing these anomalies. Failure to eliminate anomalies leads to data
redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of
guidelines that helps to guide you in creating a good database structure.

Data modification anomalies can be categorized into three types:


•Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted into a relation because some other
data is missing.

•Deletion Anomaly: A deletion anomaly occurs when the deletion of data results in the unintended loss of
some other important data.

•Update Anomaly: An update anomaly occurs when an update of a single data value requires multiple rows of data to be
updated.
Types of Normal Forms:
Normalization works through a series of stages called Normal forms. The normal forms apply to individual relations. A
relation is said to be in a particular normal form if it satisfies the constraints of that form.
Following are the various types of Normal forms:
Normal Form  Description

1NF          A relation is in 1NF if every attribute contains only atomic values.
2NF          A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF          A relation will be in 3NF if it is in 2NF and no transitive dependency exists.
BCNF         A stronger definition of 3NF is known as Boyce-Codd normal form.
4NF          A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF          A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
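
The 2NF condition can be illustrated with a small functional dependency check. In this sketch the function names and the sample enrollment rows are hypothetical, and the primary key is assumed to be (Roll_No, Course_Id). A dependency that holds on sample rows is only evidence, since a functional dependency is a rule about all valid data.

```python
def fd_holds(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})


# Hypothetical relation with primary key (Roll_No, Course_Id).
enrollment = [
    {"Roll_No": 1, "Course_Id": "C1", "Student_Name": "Ram", "Grade": "A"},
    {"Roll_No": 1, "Course_Id": "C3", "Student_Name": "Ram", "Grade": "B"},
    {"Roll_No": 3, "Course_Id": "C3", "Student_Name": "Sujit", "Grade": "A"},
]

# Grade depends on the whole key, not on Roll_No alone.
grade_on_full_key = fd_holds(enrollment, ["Roll_No", "Course_Id"], ["Grade"])
grade_on_roll_only = fd_holds(enrollment, ["Roll_No"], ["Grade"])

# Student_Name depends on only part of the key: a partial dependency,
# so the relation is not in 2NF.
name_on_roll_only = fd_holds(enrollment, ["Roll_No"], ["Student_Name"])

# Decomposition into two relations removes the partial dependency.
students = project(enrollment, ["Roll_No", "Student_Name"])
grades = project(enrollment, ["Roll_No", "Course_Id", "Grade"])
print(students)
print(grades)
```

After the split, the name Ram is stored once instead of once per course, which removes the update anomaly for student names.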
Advantages of Normalization
•Normalization helps to minimize data redundancy.
•Greater overall database organization.
•Data consistency within the database.
•Much more flexible database design.
•Enforces the concept of relational integrity.

Disadvantages of Normalization
•You cannot start building the database before knowing what the user needs.
•The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
•It is very time-consuming and difficult to normalize relations of a higher degree.
