DBMS Mod 5


MODULE 5

Relational Database Design


Pitfalls in Relational Database Design
 A developer is faced with confusing tables, values, functions, triggers, and much more while
developing a database design.
 During this process, the chances of making some common mistakes are inevitable.
 This list of mistakes is not exhaustive; however, avoiding them greatly increases the chances of developing a
successful database.
 The right database design causes less trouble during development and deployment, and performs
better. Hence, to get it right in one go, here is a list of nine mistakes to avert while
designing a database.
 Relational database design requires that we find a “good” collection of relation schemas.
 A bad design may lead to:
1. Repetition of information.
2. Inability to represent certain information.
 Design goals:
1. Avoid redundant data.
2. Ensure that relationships among attributes are represented.
1. Poor Design/Planning
 The database is a vital aspect of every custom software, hence taking the time to map out the goals of
database design ensures the success of any project. Consequences of lack of planning are seen further
down the line and impacts projects in terms of time management.
 Improper planning of the database leaves you with no time to go back and fix errors, and a hastily
designed database is also more vulnerable to cyber attacks. Therefore, consider sitting down with paper
and drawing a data model as per business requirements.
 For example, coders use database schemas to incorporate database designs as they are the blueprints that
help developers to visualize databases. Developers can avoid poor planning/design by checking off the
following points.
I. Main tables of your database model
II. Names for tables
III. Rules for naming tables
IV. Time span required for the project
 These pointers help resolve essential issues within a project and skipping any of these will only delay your
project.
2. Ignoring Normalization

 Normalization groups directly related data under a single table, while indirectly related data is put
under separate tables. These tables are connected with a logical relationship between parent and
child tables.
 Lack of normalization leads to duplication of data and reduces data consistency, because the
same data ends up stored in more than one place. Finding related data is strenuous due to the lack
of grouping and costs time while searching. Hence, consider applying normalization rules during
database design.
 Even a database that follows some normalization rules may not function as required. In practice,
tables should be normalized at least to third normal form, as this layout represents entities well
and balances updating, inserting, and deleting records. If a table does not comply with 1NF, 2NF,
or 3NF, redesign it.
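The parent–child table layout described above can be sketched with SQLite. The `customers`/`orders` schema and values below are hypothetical, invented for illustration:

```python
import sqlite3

# Hypothetical example: customer details live in a parent table and are
# referenced by a child table, instead of being repeated on every order.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL)""")
conn.execute("""CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL)""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 250.0)")
conn.execute("INSERT INTO orders VALUES (11, 1, 99.5)")

# The customer's name is stored once; each order reaches it through the key.
rows = conn.execute("""SELECT c.name, o.amount
                       FROM orders o
                       JOIN customers c ON o.customer_id = c.customer_id
                       ORDER BY o.order_id""").fetchall()
print(rows)  # [('Alice', 250.0), ('Alice', 99.5)]
```

If the name changed, it would be updated in exactly one row of the parent table, which is the point of the logical parent–child relationship.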
3. Redundant Records

 Redundancy in a database is a condition in which the same set of data is stored at


two different places. This means two different spots or two separate fields in
software. This is a problem for developers because they have to keep several
versions of data updated.
 Redundant records lead to unnecessary growth in database size, which in turn
decreases the efficiency of the database and invites data inconsistency. Hence, the
best option is to avoid redundant records, unless they are essential, for example
for backups.
 Data redundancy is classified into two aspects – wasteful and excessive. Out of
these, wasteful redundancy occurs when a set of data is repeated needlessly.
Complicated data storing or inefficient coding results in wasteful redundancy.
4. Poor Naming Standards
 Naming is a personal choice, however, it is an important aspect of documentation. Poor naming
standards result in messy and large data files, hence consider incorporating consistency.
 The purpose of naming is to allow all future developers or programmers to easily understand
the components of databases and what was their use. This saves time for developers and they
need not go through documents to understand the meaning of a name.
 There isn’t a universal guide to naming conventions, but it’s best to avoid bad naming practices.
Here are examples of unsuccessful naming conventions that one must avoid
1. Underscore_for_Word_Separation
 In certain display situations an underscore can be mistaken for a blank space, so a developer
may read the name as a compound of two objects. The underscore also requires a key combination
that is hard to locate on keyboards configured for international usage. Finally, an
underscore makes a name longer, with a character that holds no meaning.
2. Meaningless or Generic Names
 Names such as flag, scrap, data_table, or config are ambiguous and misleading. Generic or
meaningless names are a problem for new developers, who must read the data models all over
again to work out what each object holds. For routines, prefer verbs that express actions
rather than static names, for example, calculate, summarize, or append.

3. ALL UPPER CASE


 All-caps naming rules out camel case, so words either stick together or must be separated with
underscores. This also hurts readability, because it becomes hard to distinguish hundreds of
similarly named objects in one go.
 To avoid bad naming, pick one standard and stick to it; ensure that names make sense to other
developers.
5. Lack of Documentation
 In one developer survey, poor technical documentation was the second most challenging problem
developers faced. Lack of documentation leads to the loss of vital information or a
tedious handover process to a new programmer.
 Consider documenting everything you know from day one because any documentation is better
than none. Well-organized documentation throughout the project helps to wrap up everything
smoothly and in turn, helps build robust software.
 The goal of documentation is to provide information to the support programmer to detect bugs
and fix them. Documentation starts with naming columns, objects, and tables in a database
model. A well-documented data model consists of solid names, definitions on columns, tables,
relationships, and check and default constraints.
 It is recommended to include sample values and anything else that a developer joining a year into
the project would need to know.
6. One Table to Hold All Domain Values
 The next common pitfall encountered while designing a database is to prepare one table for all domain
values. For example, you may have ranges of values for varied areas such as order status, account status,
and payment status, each with different values.
 The first thought that comes to mind is to store all the values in one place because they are all status
values. Such a table would look like:
1. Table or entity (order, account, and payment)
2. Key (1,2, or 3 for whole table/entity)
3. Value (pending, draft, paid, etc.)
 This table looks simple, yet it comes with its own set of issues. This approach does not include referential
integrity because there’s no simplified way to assure statuses that are applicable to a table are associated
with that specific table.
 For example, you can’t relate the primary key to the account table and make sure that only account statuses
are chosen.
 Hence, to avoid these hassles, consider creating a separate table for each set of domain values, for
instance, order_status, account_status, and payment_status.
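This fix can be demonstrated with a small SQLite sketch. The status values below are invented for illustration; the point is that a per-entity status table lets a foreign key reject statuses that belong to a different entity, which the single shared table could not do:

```python
import sqlite3

# One lookup table per entity: account statuses live in their own table,
# so the foreign key enforces that accounts only use account statuses.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE account_status (status TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO account_status VALUES (?)",
                 [("active",), ("suspended",), ("closed",)])
conn.execute("""CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    status     TEXT NOT NULL REFERENCES account_status(status))""")

conn.execute("INSERT INTO account VALUES (1, 'active')")  # a valid status

rejected = False
try:
    # 'paid' is a payment status, not an account status -> constraint fails
    conn.execute("INSERT INTO account VALUES (2, 'paid')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

With the generic all-statuses table, nothing stops an account row from pointing at a payment status; here the reference to `account_status` provides the referential integrity the text asks for.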
7. Ignoring Frequency or Purpose of the Data
 For instance, a system where data is collected manually each day will not have the same data model as
one where information is created in real time. That’s because managing a few thousand records a month
is different from handling millions of them in the same time period.
 Further, data volume is not the only fact to consider because the purpose of data impacts data structure,
normalization, implementation, and record size of the entire system.
8. Insufficient Indexing
 Insufficient indexing describes a database whose performance suffers from improper,
excessive, or missing indexes. If indexes aren’t created properly, the SQL server must scan more
records to retrieve the data requested by a query.
 Index efficiency is connected to the column type: for instance, indexes on INT columns give the best
performance, while indexes on DATE, VARCHAR, or DECIMAL columns are not as efficient. This
consideration may even lead to redesigning tables for the best possible efficiency.
 Overall, indexing is a complex decision, because too much indexing is as bad as too little; both impact
the final outcome.
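The effect of a missing index can be observed with SQLite's EXPLAIN QUERY PLAN. The employee table below is made up for illustration, and the exact plan wording varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(i, "dept%d" % (i % 10)) for i in range(1000)])

# Without an index, an equality lookup has to scan the whole table.
plan = conn.execute("EXPLAIN QUERY PLAN "
                    "SELECT * FROM employee WHERE emp_id = 500").fetchall()
print(plan[0][3])  # a full scan of employee

conn.execute("CREATE INDEX idx_emp_id ON employee(emp_id)")

# With the index, the same query becomes an index search.
plan = conn.execute("EXPLAIN QUERY PLAN "
                    "SELECT * FROM employee WHERE emp_id = 500").fetchall()
print(plan[0][3])  # a search using idx_emp_id
```

The same tool also reveals over-indexing: every extra index shows up as additional write work on insert and update, which is why the decision cuts both ways.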
9. Lack of Testing
 Without database testing, you cannot tell whether the data values stored and retrieved by the
database are valid. Testing helps preserve transaction data, avoids data loss,
and prevents unauthorized access to information.
 The database is essential for every type of software application, therefore testers need to know
about SQL during testing. Consider testing for a banking application, and during tests a few
things to note are:
1. No loss of information during the process.
2. Application stores transaction data correctly in the database and displays it accurately.
3. No aborted or partial operation data is saved by the application.
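Point 3 above, that no aborted or partial operation data is saved, can be checked with a small transaction test. The balance table and amounts here are hypothetical, not from the module:

```python
import sqlite3

# Simulate a transfer that fails halfway, then verify that rollback
# leaves no partial update behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balance (account TEXT PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO balance VALUES (?, ?)",
                 [("A", 100.0), ("B", 50.0)])
conn.commit()

try:
    conn.execute("UPDATE balance SET amount = amount - 30 WHERE account = 'A'")
    raise RuntimeError("simulated crash before the credit step")
    # the matching credit to B is never executed:
    # conn.execute("UPDATE balance SET amount = amount + 30 WHERE account = 'B'")
except RuntimeError:
    conn.rollback()  # aborted operation: the debit must not persist

amounts = dict(conn.execute("SELECT account, amount FROM balance"))
print(amounts)  # {'A': 100.0, 'B': 50.0}
```

If the rollback were missing, account A would be left at 70.0 with no matching credit, exactly the partial-operation state the banking test is meant to catch.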
 So, these were the nine common pitfalls to avoid during database design. For developers, creating
a neat and tight database structure is essential for a seamless project flow. Hence, follow the
above aspects for successful database creation.
NORMALIZATION

What is Normalization in DBMS?


 In a database, a huge amount of data gets stored in multiple tables.
 Redundancy may be present in this data.
 So Normalization in DBMS can be defined as the process that eliminates redundancy
from the data and ensures data integrity.
 Normalization of data also helps in removing the insert, update, and delete
anomalies.
How Does Normalization work in DBMS?

 The normalization in the DBMS can be defined as a technique to design the schema of a
database and this is done by modifying the existing schema which also reduces the redundancy
and dependency of the data.
 So with Normalization, the unwanted duplication in data is removed along with the anomalies.
 An insert anomaly occurs when a record cannot be inserted without unrelated data being
present, forcing null or duplicated values into columns.
 An update anomaly occurs when the same value appears in multiple rows, so an update must be
applied in every place it occurs or the data becomes inconsistent. A delete anomaly occurs when
deleting a record also removes other, unrelated information from the table.
 So the aim of normalization is to remove redundant data as well as storing only related data in
the table. This decreases the database size and the data gets logically stored in the database.
In simple words we can say,
 Normalization is the process of organizing data to minimize:
 Redundancy/duplication/repetition.
 Insertion, deletion, and updating anomalies.

Normalization Basics
Normalization is basically a process which involves two steps. These are:
 Put the data into a tabular form so as to eliminate repeating groups.
 Remove any kind of duplication from the related tables.
Advantages of Normalization

Normalization has a number of advantages. Some of them are as follows:


 Normalization helps eliminate or minimise data redundancy.
 It helps in better understanding of relations and data stored in them thereof.
 Data structure is easier to maintain.
 Normalization is a reversible process; hence no information is lost during
transformation.
 It helps in efficient use of storage space.
 It provides flexibility of data due to efficient structuring.
Disadvantages of Normalization in DBMS

Some disadvantages of Normalization in DBMS are as follows:


 You should have good knowledge of your users’ requirements.
 Normal Forms of higher end i.e., 4NF and 5NF have performance issues.
 It is a time-consuming and complicated process to follow. You require a good
amount of experience to create an optimally normalized database.
 Decomposition of data needs to be done carefully else it may lead to a poor
database design.
Objective of Normalization

The main objectives of using normalization technique are as follows −


 It provides a formal framework for analysing the relations based on the key
attributes and their functional dependencies.
 Freeing the relations from insertion, update and delete anomalies.
 Reducing the need of re-structuring the tables.
While decomposing, the normalization process should ensure the following
two properties are satisfied −

 Lossless join or non-additive property − It guarantees that the spurious tuples are not
generated with respect to the relation schemas created after decomposition.
 Dependency preservation property − It ensures that every functional dependency is
represented in some of the individual relations resulting after decomposition.
 Denormalization − It is the process of storing the join of higher normal form relations as a
base relation- which is in a lower normal form.
Functional Dependency in DBMS

 The term functional dependency means a constraint between two sets of attributes.
 Typically, this relationship is demonstrated between the primary key and non-key attributes
within a table of a Database Management System, where the non-key attribute is
functionally dependent on the primary key attribute.
 A functional dependency in a table is termed ‘minimal’ when it has these characteristics:
there is only one non-key attribute in the dependency, any change made to the primary key
attribute brings a change to the non-key attribute as well, and any alteration of the
functional dependency affects the table contents tied to the primary key.
• A functional dependency is written X → Y. The left side of the arrow is identified as the
Determinant, while the right side is identified as the Dependent.
• Here, X is the primary key attribute and Y is any dependent non-key attribute from the
same table as the primary key.
• This shows that the non-key attribute Y is functionally dependent on the primary key
attribute X. In other words, if column X of the table uniquely identifies column Y of the
same table, then the functional dependency of column Y on column X is symbolized as X → Y.
Examples:
Student_ID → Student_Name
Student_ID → Dept
Student_ID → DOB
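A functional dependency X → Y can be checked mechanically: no two rows may agree on X while disagreeing on Y. A small sketch of that check, with student rows invented for illustration:

```python
# X -> Y holds when rows that share an X value always share the Y value.
def holds(rows, x, y):
    seen = {}
    for row in rows:
        if row[x] in seen and seen[row[x]] != row[y]:
            return False        # same determinant, different dependent
        seen[row[x]] = row[y]
    return True

students = [
    {"Student_ID": "S_001", "Student_Name": "Asha", "Dept": "CS"},
    {"Student_ID": "S_002", "Student_Name": "Ravi", "Dept": "CS"},
    {"Student_ID": "S_001", "Student_Name": "Asha", "Dept": "CS"},
]
print(holds(students, "Student_ID", "Student_Name"))  # True
print(holds(students, "Dept", "Student_Name"))        # False
```

Note that such a check only confirms a dependency in the data at hand; whether X → Y is a rule of the schema is a design decision, not something sample rows can prove.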
2. Trivial Functional Dependency in DBMS
 A functional dependency is known as trivial if the dependent attribute set is a subset of the
determinant attribute set.
 This scenario typically occurs when the primary key is formed by two columns
and one of them is functionally dependent on the combined set.
 X → Y, is a trivial functional dependency, if Y is a subset of X. Let us consider the below table,
 Here, if the primary key is a combination of the columns Student_ID and Student_Name, then
the Student_Name column is in Trivial Functional Dependency relationship with the primary
key set [Student_ID, Student_Name].
 Any changes made in the Student_Name column will have its effects on the primary key set
[Student_ID, Student_Name], as the Student_Name column is a subset of the primary key
attribute set.
 For a Student ID, S_001, the primary key combination will be [S_001, Sname01]. If a change
to the name is made as Sname001, then the primary key combination will change as [S_001,
Sname001], as the Student_Name column is a subset of the primary key.
3. Non-Trivial Functional Dependency in DBMS
 Here, if the primary key is the column Student_ID, and the Student_Name column is not a subset of
Student_ID, then the Student_Name column is in a non-trivial functional dependency relationship with the
primary key Student_ID.
4. Transitive Functional Dependency in DBMS
 A Transitive Functional Dependency is a type of functional dependency in which a non-key
attribute depends on the primary key indirectly, through its dependency on another non-key attribute.
 A Transitive Functional Dependency can only occur in a relation of three or more attributes that
are functionally dependent on the primary key attribute.
 In this table, the Student_ID column is the primary key.
 The values in the Student_ID column are formed by the combination of the first letter
from the Student_Name column, last code from the Dept column and date & month from
the DOB column.
 Any change made in any of these columns will be reflected in the primary key
column, that is, the Student_ID column.
 Any new record inserted in this table will also have a Student_ID value formed from the
combination of the other three non-key columns.
What is 1NF in DBMS?
 A relation is in 1NF if every attribute contains only atomic values.
 It states that an attribute of a table cannot hold multiple values; it must hold only a single
value per cell.
 First normal form disallows multi-valued attributes, composite attributes, and their
combinations.

Requirements
1. The Attributes must be Single Valued
 Every column in your table must be single-valued. It means that no columns should have
multiple values in a single cell. In case we don’t have single values in a cell, we won’t be able to
call it 1NF.
 For instance, take a table that holds data about a single novel and its writers, with the
columns [Book ID], [Writer 1], [Writer 2], and [Writer 3]. Here, [Writer 1], [Writer 2], and
[Writer 3] repeat the same attribute; they do not refer to different Book IDs. Thus, this
table would not be in 1NF.
2. The Domain of attributes must not change
 Every value stored in every table column must be of the same type/ kind. Random values should not make
up the table.
 For instance, if a table consists of a column named DOB that saves the date of birth of various people, we
cannot use this column to save the names of these people. We need a separate column for that. Every
column must hold separate sets of attributes in a DBMS table.
3. Every Column/ Attribute must have a Unique Name
 A 1NF table expects every column present in the table to have a unique name of its own. This way, it
becomes feasible to avoid confusion while the system is retrieving, editing, or adding data, or
performing any other operation on the table. If multiple columns shared the same name, the
system could not tell them apart.
4. The order of Data does not matter
 The order in which we store the data in a table does not matter in 1NF; it is simply a way of
storing information, with no ordering significance.
Example (decomposing a student relation with a multi-valued phone attribute):
Key of R1 = (rollno, phone)
Key of R2 = (rollno)
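The rollno/phone keys above come from splitting a multi-valued phone attribute out of a student relation. That 1NF step can be sketched as follows, with sample rows invented for illustration:

```python
# Non-1NF: the phone cell holds several values at once.
non_1nf = [
    ("R001", "Asha", "98450, 99860"),
    ("R002", "Ravi", "97310"),
]

# 1NF: one atomic phone value per row; the key becomes (rollno, phone).
in_1nf = [
    (rollno, name, phone.strip())
    for rollno, name, phones in non_1nf
    for phone in phones.split(",")
]
for row in in_1nf:
    print(row)
# ('R001', 'Asha', '98450')
# ('R001', 'Asha', '99860')
# ('R002', 'Ravi', '97310')
```

After the split, rollno alone no longer identifies a row, which is why the key of the phone relation becomes the pair (rollno, phone).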
Second Normal Form (2NF)
 To be in second normal form, a relation must be in first normal form and relation must not contain any
partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute
(attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key
of the table.
 In the second normal form, all non-key attributes are fully functional dependent on the primary key.
Rules Followed in 2nd Normal Form in DBMS
For a relation to be in the 2NF, it must be:
 in 1NF;
 should not consist of partial dependency.
In simpler words,
 If a relation is in 1NF and all the attributes of the non-primary keys are fully dependent on primary keys,
then this relation is known to be in the 2NF or the Second Normal Form.
 We can conclude that the attribute SUBJECT_FEE is a non-prime one since it doesn’t belong to the
candidate key here {SUBJECT_NO, CAND_ID} ;

 But, on the other hand, SUBJECT_NO → SUBJECT_FEE, meaning SUBJECT_FEE depends directly on
SUBJECT_NO, which forms a proper subset of the candidate key. Here, SUBJECT_FEE is a non-prime
attribute that depends on a proper subset of the candidate key; thus, it forms a partial
dependency.

 Conclusion: The relation mentioned here does not exist in 2NF.


Now, the tables are in their Second Normal Form.

Note: The Second Normal Form tries to reduce any redundant data from getting stored in the system’s
memory. For instance, if we take an example of about 100 candidates taking the S1 subject, then we don’t
have to store their fees as 1000 as a record for all the 100 candidates. Rather, we can store them all at once
in the second table as the subject fee for S1 is 1000.
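The note above can be sketched in SQLite: after the 2NF decomposition, 100 enrolments for S1 share a single fee row. Table and column names follow the example but the layout is illustrative:

```python
import sqlite3

# 2NF decomposition: SUBJECT_FEE depends only on SUBJECT_NO, so it moves
# into its own table and is stored once per subject.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE subject (
    subject_no  TEXT PRIMARY KEY,
    subject_fee INTEGER)""")
conn.execute("""CREATE TABLE enrolment (
    cand_id    TEXT,
    subject_no TEXT,
    PRIMARY KEY (cand_id, subject_no))""")

conn.execute("INSERT INTO subject VALUES ('S1', 1000)")
conn.executemany("INSERT INTO enrolment VALUES (?, 'S1')",
                 [("C%03d" % i,) for i in range(100)])

# 100 candidates take S1, yet the fee 1000 is recorded exactly once.
fee_rows = conn.execute(
    "SELECT COUNT(*) FROM subject WHERE subject_no = 'S1'").fetchone()[0]
print(fee_rows)  # 1
```

Changing the S1 fee now means updating one row instead of a hundred, which removes the update anomaly the partial dependency caused.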
Third Normal Form(3 NF)
 A relation is said to be in 3rd normal form in DBMS (or 3NF) when it is in the second normal form, but no
transitive dependency exists for a non-prime attribute.
 In simpler words: if a relation is in 2NF and none of its non-primary-key attributes
transitively depends on the primary key, then we can say that the relation is in the third normal
form, or 3NF.

Rules Followed in 3rd Normal Form in DBMS


 We can say that a relation is in the third normal form when, for every non-trivial functional
dependency P -> Q, at least one of these conditions holds:
 P acts as a super key; or
 Q is a prime attribute, meaning every element of Q forms a part of some candidate key.
Uses of Third Normal Form in DBMS
 We use the 3NF to reduce any duplication of data and achieve data integrity in a database.
 The third normal form is fit for the designing of normal relational databases. It is because a majority of the
3NF tables are free from the anomalies of deletion, updates, and insertion.
 Added to this, a 3NF would always ensure losslessness and preservation of the functional dependencies.
BCNF in DBMS
 BCNF in DBMS (Boyce Codd Normal Form) is an advanced version of the third normal form (3NF).
 A table or a relation is said to be in BCNF in DBMS if it is already in 3NF and, for every
non-trivial functional dependency (say, X -> Y), X is a super key (for example, the primary key or a
candidate key).
 In simple terms, in any dependency X -> Y, X cannot be a non-key set of attributes.
What is 4NF in DBMS?

 A relation R is in 4NF if it is in BCNF and there is no non-trivial multivalued dependency.
 For a dependency A ->> B, if a single value of A is associated with multiple independent
values of B, then A ->> B is a multivalued dependency.
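A multivalued dependency A ->> B in a relation R(A, B, C) can be tested: for each value of A, the B values and C values must combine freely, so every pairing appears. A sketch with rows invented to mirror the regno/phoneno/qualification example:

```python
from itertools import product

# A ->> B holds in R(A, B, C) when, per A-group, the rows are exactly the
# cross product of that group's B values and C values.
def mvd_holds(rows):
    groups = {}
    for a, b, c in rows:
        bs, cs = groups.setdefault(a, (set(), set()))
        bs.add(b)
        cs.add(c)
    expected = {(a, b, c)
                for a, (bs, cs) in groups.items()
                for b, c in product(bs, cs)}
    return set(rows) == expected

# regno ->> phoneno: every phone/qualification pairing is present
rows = [(3, "p1", "BE"), (3, "p1", "ME"),
        (3, "p2", "BE"), (3, "p2", "ME")]
print(mvd_holds(rows))       # True
print(mvd_holds(rows[:-1]))  # False: the (p2, ME) pairing is missing
```

The need to store every such pairing is exactly the redundancy that 4NF removes by splitting phones and qualifications into separate relations.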
Anomalies

The relation also suffers from anomalies, which are as follows:


 Insertion anomaly: If we want to insert a new phoneno for regno3 then we have to insert 3
rows, because for each phoneno we have stored all three combinations of qualification.
 Deletion anomaly: If we want to delete the qualification diploma, then we have to delete it
in more than one place.
 Updation anomaly: If we want to update the qualification diploma to IT, then we have to
update in more than one place.
Now, regno ->> phoneno is a trivial MVD (since {regno} ∪ {phoneno} = R1)
=> R1 is in 4NF.
regno ->> qualification is a trivial MVD (since {regno} ∪ {qualification} = R2)
=> R2 is in 4NF.
