Professional Documents
Culture Documents
DBMS Unit-4
DBMS Unit-4
Schema Refinement:
Schema is a relation with attribute set like R(A,B,C,D).Eg. student(sid,sname,cid).
A schema refinement process means to avoid redundancy in the database. Redundancy means same data
placed in 2 different locations.
Redundancy can be in 3 types.
1. Table Level Redundancy:
If a file with the name student.txt is stored in C: Drive and another file with the same name student.txt
is stored in D: Drive then this type of redundancy called Table level Redundancy.
2. Row Level Redundancy:
If a single table contains different tuples with the same values called Row Level Redundancy.
Eg: student table
Sid Sname branch
S1 A CSE
S1 A CSE
S3 B IT
3. Column Level Redundancy:
If a single table contains different columns/fields with the same values called Column Level or Subset of
Column Level Redundancy.
Eg: student table
Sid Sname branch
S1 A CSE
S2 A CSE
S3 B IT
To remove this type of anomalies, we will normalize the table or split the table or join the tables. There
can be various normalized forms of a table like 1NF, 2NF, 3NF, BCNF etc. we will apply the different
normalization schemes according to the current form of the table.
Example 2:
Anomalies:
If the database is redundant we face 3 problems/Anomalies.
1. Update anomalies: If one copy of such repeated data is updated, an inconsistency is created unless
all copies are similarly updated.
Example: STAFF table
For example Room H940 has been improved, it is now of Room size= 500. For updating a single entity,
we have to update all other columns where room=H940.
2. Insertion anomalies: It may not be possible to store some information unless some other
information is stored as well.
Example: STAFF table
For example we have built a new room(B123) but it has not yet been timetabled for any courses or
members of staff.
Deletion anomalies: It may not be possible to delete some information without losing some other
information as well.
Example: STAFF table
For example if we remove the entity, course_no from the table, the details of room C320 get deleted.
Which implies the corresponding course will also get deleted.
Functional Dependency:
The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
X → Y
The left side of FD is known as a determinant, the right side of the production is known as a dependent.
Example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if we
know the Emp_Id, we can tell that employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional Dependency:
There are 2 types of functional dependencies.
1. Trivial Functional Dependency
2. Non-Trivial Functional Dependency
Trivial Functional Dependency:
A → B has trivial functional dependency if B is a subset of A.
The following dependencies are also trivial like: A → A, B → B
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as Employee_
Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependenc
ies too.
Non-Trivial functional dependency:
A → B has a non-trivial functional dependency if B is not a subset of A.
When A intersection B is NULL, then A → B is called as complete non-trivial.
Example:
ID → Name, Name → DOB
Armstrong's axioms Properties/Inference Rules:
The term Armstrong axioms refer to the sound and complete set of inference rules or axioms,
introduced by William W. Armstrong, that is used to test the logical implication of functional
dependencies. If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of
all functional dependencies logically implied by F. Armstrong’s Axioms are a set of rules, that when
applied repeatedly, generates a closure of functional dependencies .
Armstrong's axioms are used to conclude functional dependencies on a relational database.
The Functional dependency has 6 types of inference rule:
1. Reflexive Rule (IR1):
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
2.Augmentation Rule (IR2):
The augmentation is also called as a partial dependency. In augmentation, if X determines Y, then XZ
determines YZ for any Z.
If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3):
In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4):
Union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
Proof:
X → Y (given)
X → Z (given)
X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
XY → YZ (using IR2 on 2 by augmentation with Y)
X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5):
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
Proof:
X → YZ (given)
YZ → Y (using IR1 Rule)
X → Y (using IR3 on 1 and 2)
6. Pseudo transitive Rule (IR6):
In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W then XZ → W
Proof:
X → Y (given)
WY → Z (given)
WX → WY (using IR2 on 1 by augmenting with W)
WX → Z (using IR3 on 3 and 2)
Surrogate key:
It is an artificial key which aims to uniquely identify each record is called a surrogate key. This kind of
partial key in dbms is unique because it is created when you don't have any natural primary key.
Surrogate key is usually an integer. A surrogate key is a value generated right before the record is
inserted into a table.
Example 1:
Fname Lastname Start Time End Time
Above, given example, shown shift timings of the different employee. In this example, a surrogate key is
needed to uniquely identify each employee.
Surrogate keys in sql are allowed when
No property has the parameter of the primary key.
In the table when the primary key is too big or complicated.
Example 2:
ProductPrice Table:
Key ProductID Price
Keys:
Types of keys:
1. Primary key:
It is the first key used to identify one and only one instance of an entity uniquely. An entity can contain
multiple keys, as we saw in the PERSON table. The key which is most suitable from those lists becomes
a primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys since
they are also unique.
For each entity, the primary key selection is based on requirements and developers.
2. Candidate key:
A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
Except for the primary key, the remaining attributes are considered a candidate key. The candidate keys
are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the attributes,
like SSN, Passport_Number, License_Number, etc., are considered a candidate key.
3. Super Key:
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate
key.
For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the name of
two employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this combination can
also be a key.
4. Foreign key:
Foreign keys are the column of the table used to point to the primary key of another table.
Every employee works in a specific department in a company, and employee and department are two
different entities. So we can't store the department's information in the employee table. That's why we
link these two tables through the primary key of one table.
We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
5. Alternate key:
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in a
relation. These attributes or combinations of the attributes are called the candidate keys. One key is
chosen as the primary key from these candidate keys, and the remaining candidate key, if it exists, is
termed the alternate key. In other words, the total number of the alternate keys is the total number of
candidate keys minus the primary key. The alternate key may or may not exist. If there is only one
candidate key in a relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No,
acts as the Alternate key.
6. Composite key:
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is
also known as Concatenated Key.
For example, in employee relations, we assume that an employee may be assigned multiple roles, and
an employee may work on multiple projects simultaneously. So the primary key will be composed of all
three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So these attributes act as a
composite key since the primary key comprises more than one attribute.
Attribute closure: Attribute closure of an attribute set can be defined as set of attributes which can be
functionally determined from it.
To find attribute closure of an attribute set-
add elements of attribute set to the result set.
recursively add elements to the result set which can be functionally determined from the elements of
result set.
Normalization:
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used
to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them using relationship.
The normal form is used to reduce redundancy from the database table.
Types of Normal Forms:
There are the six types of normal forms:
1. 1NF
2. 2NF
3. 3NF
4. BCNF
5. 4NF
6. 5NF
First Normal Form (1NF):
A relation will be 1NF if it contains an atomic value.
It states that an attribute of a table cannot hold multiple values. It must hold only single-valued
attribute.
First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302
The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
Partial Dependency:
Partial Dependency occurs when a non-prime attribute is functionally dependent on part of a candidate
key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
STUDENT Table:
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity.
Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and
two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE:
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY:
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Decomposition of Relation:
Decomposition of a relation is done when a relation in relational model is not in appropriate normal
form. Relation R is decomposed into two or more relations if decomposition is lossless join as well as
dependency preserving.
Lossless Decomposition:
If the information is not lost from the relation that is decomposed, then the decomposition will be
lossless.
The lossless decomposition guarantees that the join of relations will result in the same relation as it
was decomposed.
The relation is said to be lossless decomposition if natural joins of all the decomposition give the
original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing