DBMS Module 4

Module 4 Chapter 1; Database Design Theory 4.0. Introduction 4.1 Objective 4.2. Introduction to DB design 4.3 Informal Design Guidelines for Relation Schemas 4.3.1 Imparting Clear Semantics to Attributes in Relations 4.3.2 Redundant Information in Tuples and Update Anomalies 4.3.3 NULL Values in Tuples 4.34 Generation of Spurious Tuples 4.4 Funetional Dependencies 4.4.1 Normalization of Relations, 4.4.2. Practical Use of Normal Forms 4.4.3 Definitions of Keys and Attributes Partici 4.4.4 First Normal Form 4.4.5 Second Normal Form 4.4.6 Third Normal Form 4.5 General Definition of Second and Third Normal Form 4.6 Boyce-Codd Normal Form 4.7. Multivalued Dependency and Fourth Normal Form 4.7.1 Formal Definition of Multivalued Dependency 4.8 Join Dependencies and Fifth Normal Form 4.9. Inference Rules for Functional Dependencies 4.10. Equivalence of Sets of Functional Dependencies 4.11 Sets of Functional Dependene 4.12. Properties of Relational Decompositions 4.13 Algorithms for Relational Database Schema Design 4.13.1 Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas 4.13.2 Nonadditive Join Decomposition into BCNF Schemas 4.13.3 Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas 4.14 About Nulls, Dangling Tuples, and Alternative Relational Designs 4.14.1 Problems with NULL Values and Dangling Tuples 4.15 Other Dependencies and Normal Forms 4.15.1 Inclusion Dependencies 4.15.2 Template Dependencies 4.15.3 Functional Dependencies Based on Arithmetic Functions and Procedures 4.15.4 Domain-Key Normal Form 4.16 Assignment Questions 4.17. Expected Outcome 4.18 Further Reading Dept. of CSE,ATMECE, Mysuru Page 1Database Management System [18CS53] 4.0 Introduction Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate date redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables. This module discuss the basic and higher normal forms. 4.1 Objectives To study the process of normalization and refine the database design To normalize the tables upto 4NF and SNF To study lossless and lossy join operations To study inference rules To study other dependencies and Normal Forms ° 4.2 Introduction to DB design Eact relation scheme consists of @ number of attributes, and the relational database scheme consists of a number of relation schemas. So far, we have assumed that attributes are grouped to form a relation schema by using the common sense of the database designer or by mapping a database scheme design from a conceptual data mode! such as the ER or Enhanced-ER (EER) data model. These models make the designer identify entity types and relationship types and their respective attributes, which leads to a natural and logical grouping of the attributes into relations. Database Design deals with coming up with a ‘good’ schema. There are two levels at which we can discuss the goodness of relation schemas: 1. The logical (or conceptual) level—how users interpret the relation schemas ané the meaning of their attributes. 2. The implementation (or physical storage) level—how the tuples in a base relation are stored and updated, This level applies only to schemas of base relations ‘An Example = STUDENT relation with attributes: studName, rollNo, gender, studDept "DEPARTMENT relation with attributes: deptName, officePhone, hod = Several students belong to a department = studDept gives the name of the student's department Correct schema: Stud percramertinas StudName | rollNo | gender | studDept depename | officePhone. HOD LosDatabase Management System [18CS53] Incorrect schema Studdept StudName rollNe gender depiName | officePhone HOD Problems with bad schema + Redundant storage of data: Office Phone & HOD info-stored redundantly once with each student thal belongs to the department - wastage of disk space + Aprogram that updates Office Phone of a department - must change it at several olaces - more running time ~ error-prone 4.3 Informal Design Guidelines for Relation Schemas * Four informal guidelines that may be used as measures to determine the quality of ‘elation schema design: 1. Making sure that the semantics of the attributes is clearin the schema 2. Reducing the redundant information in tuples 3, Reducing the NULL values in tuples 4. Disallowing the possibilty of generating spurious tuples = These measures are not aways independent of one another 4.3.1 Imparting Clear Semantics to Attributes in Relations * semantics of a relation refers to its meaning resulting from the interpretation of attribute values in a tuple "Whenever we group attributes tc form a relation schema, we assume thal attributes belonging to one ‘elation have certain real-world meaning and a proper interpretatior associated with therr "The easier it is to explain the semantics of the relation, the better the relation schema design will be Guideline 1 = Design a relation schema so that itis easy to explain its meaning "Do not combine attributes from multiple entity types and relationship types into a single relation Dept. of CSE,ATMECE, Mysuru Page 3Database Management System [18CS53] + it relation schema corresponds to one entity type or one relationship type, itis, straightforward tc interpret and to explair its meaning + ifthe relation corresponds to a mixture of multiple entities and relationships semantic ambiguities will result and the relation cannot be easily explained. Examples of Violating Guideline 1 EMP_DEPT cuwie | sev | bowe | wooneze [ canecn | oune | cacnsen SSN | PNUMBER | Hours | ENAME | PNAME | PLOCATION EMP_PROJ Fig: schema diagram for company database Both the relation schemas have clear semantics A tuple in the EMP_DEPT relatior scheme -epresents a single employee but includes additional information— the ame (Dname) of the department for which the employee works and the Socid Security number (Dmgr_ssn) of the department manager. A tuple in the EMP_PROJ relates an employee to a project but also includes the employee name (Ename), project name (Prame), and projectlocation (Plocation) logically correct but they violate Guideline 1 by mixing attributes from distinct real-world entities + EMP_DEPT mixes attributes of employees and departments + EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship They may be used as views, but they cause problems when used as base relations 4.3.2 Redundant Information in Tuples and Update Anomalies (One goal of schema design is to minimize the storage space used by the base relations Grouping attributes into relation schemas has a significant effect on storage space For example, compare the space usec by the two base relations EMPLOYEE anc DEPARTMENT with thatfor an EMP_DEPT base relation In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber, Dname, Dmar_ssn) are repeated for every employee who works for that department Dept. of CSE,ATMECE, Mysuru Page 4Database Management System [18CS53] + In contrast, each department's information appears only once in the DEPARTMENT relation. Only the department number Dnumber ‘s repeated in the EMPLOYEE relation for each employee who works in that department as a foreign key ewrtoree a | if Say [Sa | dete |B | an_| 12368760 |1965-01.00|791 Foden Hoon. Tx] m_ [0000 fcseasees | 5 rntn | T | Wong [2004sss 1086-1208 ]600 ees Howson 1X _[M_[s000 fsssssees | 5 ‘sca | [Zale [e0000777 | 1980-01-10 [3001 Carte Sng. | F_[asoo0 faresiant | « Tear | S| Wace | o0765005|1941-0920[201 Gary Batre, TK | F [e000 fnooosses | « anes | K [Rayon] 650004444 |1082:00-16 | 075 Fre Ou Hobs TK] [sooo faseasees | 6 doy A [gi | a50450a60 [1072071 [501 fice Howe. TK_[ [sooo faoeasees | 5 Fi [ yar [0700087 [1080082090 Dn, Hon TK | M_[asoo0 foresazt [« tenes | [Bay _[eeessss | 0371-10 [«60Sorm Hoosen TK [M5000 uu [+ | Figure 1: One possible database state for the COMPANY relational database schema DEPT_LOCATIONS DEPARTMENT Te RST Doane Dronber | _Mgcaen | Wet tan date : ae Resewrsh S| Saseassss | 1986-05-22 ‘ariniewation 4 | 967654391 | 1996-01-01 = — Heasquarers 1 saasessss | 19810619 Belere 3 Sugarland 5 Houston Dept. of CSE,ATMECE, Mysuru PagesDatabase Management System [18CS53] WORKS ON prover — = pee Seth Pett [asessre [1 | 05: Frade a | [haseseree [a [5 Prowse 2 | Sugotend | 5 ae = hee eoeeeaeae | 8 | 400 Computerization | 10 [Siatad | 4 {poses | eae: Reorgarzton | 20 | Heuston | 1 [season [2 | a, [[Rowbeneits | 80 [Stina | a [Bssessoss [7 [100 ([Baseasses [9 [00 DEPENDENT woeesses | 10 | 100 Ean | Dependontname [eende] Baste [Relatonahp soacassss | 20 | 100 330648655 | Alice F_| 1965.06.05 | Deugher [sesearr | a0_| soo) ‘909448585 | Theodore w_| 1909-10-25 | Son Ea 30446685 | Joy F_ | 1058.05.08 | Spouse eo 987654921 | Abner M_| 19420228 | Spouse [serene [a0 | 60 725466760 _| Michel | 186-0706 | Sen [peresnsor_[30_[ 200) 175466780_| Alice F_| 10881280 | 0. Teme [ois 179456780__| Elabath F_| 19670505 | Spovee easeesess | 20 | NULL Figure 1 : One possible database state for the COMPANY relationd database schema Redundancy Mp DEFT ———— Ene “Sm [Baw | Adaess Drumter | Oname | Omgrsse | [Smith John | 72466700) 1085.01 09 [704 Fondren, Houston TK) 5 | Researon | adaadbs55 5 4 4 5 5 4 7 [Wong FsuioTfassaassns | 1956-1208 [@s8 Voss, Houston TX eerach | 293045565 Zuo Aisa, [000967777 [19880710 [3821 Cone rng TE ‘amnvaizn | 087660221 | ita JnnferS [987664921 [104-0520 [201 Bory, Bole Tk ‘amniraten [087650201 Naan Raed |e8604aa)106200-15 [976 FroOak nb ea os [Engh Joe A [4s0ss045a/ 19720731 [S691 Rice Howion TA esceoh —[ eoseabsos ator Asmad V._ [987067007 [1966 0825 [S60 Date, ovr. ‘Amn | 907354201 Bor, Janes E 1897-11-10 [ 460 Stone, Houston, TX Headquarters [ 888655055, fii Redundancy aT aaaseme [2 [75 | Saeki fapaneczat | 2 | 00| Nagar Rash [Engh Ine= | Engh J | se765s871 [90 | 200 | Woe, ete 5 serisisti | — 29 —| 150 | Wes one Fig: Sample states for EMP_DEPT and EMP_PROJ resulting from applying NATURAL JOIN to the relations in Figure * Dept. of CSE,ATMECE, Mysuru Page 6Database Management System [18CS53] * Storing natural joins of base relations leads to an additional problem referred to as update anomalies. These can be classifiec into + insertion anomalies * deletion anomalies, * modification anomalies Insertion Anomalies + Insertion anomalies can be differentiated nto two types, illustrated by the following examples based on the EMP_DEPT relation: 1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or NULLS + For example, to insert a new tuple for an employee who works in department number 5, we must enter all the attribute values of department 5 correctly so that they are consistent with the corresponding values for department 5 in other tuples in EMP_DEPT + In the design of Employee in fig 1, we do not have to worry about this consistency problem hecause we enter only the department number in the employee tuple: all other attribute values of department 5 are recorded only once in the database, as a single tuple in the DEPARTMENT relation 2. Itis difficult to insert a new department that has no employees as yetin the EMP_DEPT relation. The only way to do this is to olace NULL values in the attributes for employee - This violates the entity integrity for EMP_DEPT because Ssn is its orimary key This problem does not occur in the design of Figure 1 because a department is entered in the DEPARTMENT relation whether or not any employees work for it and whenever an employees assigned to that department, a corresponding tuple is nsertec in EMPLOYEE. Deletion Anomalies "The problem of deletion anomalies ‘s related to the seconc insertion anomaly situation just discussec - If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database ~ This problem does not occurin the database of Figure 2 because DEPARTMENT tuples are stored separately. Dept. of CSE,ATMECE, Mysuru Page7Database Management System [18CS53] Modification Anomalies «In EMP_DEPT, if we change the value of one of the attributes of a particular department—say, the manager of department 5—we must update the tuples of all employees who work in thal department; otherwise, the database will become inconsistent + If we fail to update some tuples, the same department will be showr to have two different values for managerin different employee tuples, which would be wrong Guideline 2 * Design the base elation schemas so that no insertion, deletion, or modification anomalies are present in the relations = if any anomalies are present, note them clearly anc make sure that the programs that update the database will operate correctly * The second guideline is consistent with and, in a way, a restatement of the first guideline " These guidelines may sometimes have tc be violated in order to improve the performance of certain queries. 4.3.3 NULL Values in Tuples + If many of the attributes do not apply to all tuples in the relation, we enc up with many NULLs in those tuples - this can waste space at the storage level - may lead to problems with understanding the meaning of the attributes ~ may also lead to problems with specifying JOIN operations - how to account for them when aggregate operations such as COUNT or SUM are applied + SELECT and JOIN operations involve comparisons; if NULL values are oresent, the results may become unpredictable. + Moreover, NULLs can have multiple interpretations, such as the following + The attribute does not apply to this tuple. For example, Visa_status may not apply to US. students. + The attribute value for this tuple is unknown. For example, the Date_of_birth may be unknown for an employee. + The value is known but absent; thalis it has not been recorded yet. For example, the Home_Phone_Number for an employee may exist, but may not be avaitable and recorded yet. Guideline 3 Dept. of CSE,ATMECE, Mysuru Page 8Database Management System [18CS53] As far as possible avoic placing attributes in 2 base ‘elation whose values may frequently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases only and do ‘not apply to a majority of tuples ir the relatior Using space efficiently anc avoiding joins with NULL values are the twc overriding criteria that determine whether to include the columns that may have NULLsin a relatior or to have a separate relation for those columns with the appropriate key columns For example, if only 15 percent of employees have individual offices,there is little justification for including an attribute Office_number in the EMPLOYEE relation; rather, a relation EMP_OFFICES(Essn, Office_number) can be createc to include tuples for only the employees with individual offices, 4.3.4 Generation of Spurious Tuples Consider the twc relation schemas EMP_LOCS and EMP_PROJ1 which can be usec instead of the single EMP_PROJ EMP Locs: Ename | Plosation T PK EMP PROM ‘Sso [ Prumber | Hours | Prams | Plecation T PK. Atuple in EMP_LOCS means that the employee whose name is Ename works on some project whose location is Plocation A tuple in EMP_PRO41 ‘efers to the fact thal the employee whose Social Security number is Ssn works Hours per week on the project whose name, number, and location are Pname, Pnumber, and Plocation. Dept. of CSE,ATMECE, Mysuru Page 9Database Management System [18CS53] EMP Loss EMP PROM rare | Plain Sen_—_[ Promber [ows [Prana [Peston ‘Si Jom 8 Betare ‘seria | 1 [995 | Posun ——| Bale ‘Sith Jom [Sugar] [12806 | 915 | Pay | Sugar Naren RansehK[ Rousin | [seenecaat | 8 [400] tear Houmon Eras byeaA | Balas “auESIEE | —1 [ana | Peak | are Engi Joyen A | Suprne| [ASUS 300 Peas —] Sata Wg. akin | Sugerind| [ROOTES 180] Pees | Good Wig, Flin [Houston | [SH6596 700] Poet | Reon Vion FenkinT.[Stafod | [Sseesee | 10 [190 | Optra | Sati Zaina, Aka | Stef’ | [Bateasses_ | 2000 WialocsernforS] Satord —] [aveuerra | to Yao Hovton | lasrener | 10 | 80 ‘oerestsnt | 90 [200 er CT cecessee [29 —[ RULE "Suppose that we used EMP_PRO4J1 and EMP_LOCS as the base relations instead of EMP_PROV. This produces @ particularly bad schema design because we cannot recover the information that was originally in EMP_PRO4 from EMP_PROW and EMP_LOcS = If we attempt a NATURAL JOIN operation on EMP_PROWJ1 and EMP_LOCS, the result produces many more tuples than the origind set of tuples in EMP_PROJ = Additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious information that is not valid. + The spurious tuples are marked by asterisks (") + [izaseree | 1 [ans [moa [Balan — [Bgl Je ‘aauseres | 2 | 75 [eraser | sora | son + fizssemee | 275 [poor suguna | eaio 19702 + Figssrun [225 [rogue [Sunn [Wien Fran, + [seasaaae | a —[-a00- | Proonaiz | Haaston | Wena. Fri fszesaeea [1 | 200 | roasts | Sotare —| git oes + [asaataesa | —2 [200 [pooner [soar [ sun, ba suesnesa [a | 200 [Posi | Sapsira | cain oes + [assesses | 2 [200 [Poss [Segura [ Weng. Fann + ssscass | —2 [a0 [roar [sora [oui ba ~ [assesses | 2 [100 [Pear [sega | eas Jee ssuessss [2 | 100 | Prsdacre | Spurs | ong Ft Paecetses [a [100 Pot ——[Hacton [Wong Fant 25448685 [10 | 100 | Companion | Sato Fri. + [ssoeas665 | 20 [100 | Rocrsriton [Haun [Naan Ramos _[SHsb58 Hanon Wova Flin "Decomposing EMP_PROJ into EMP_LOCS anc EMP_PRO\1 is undesirable vecause when we JOIN them dack using NATURAL JOIN, we de not gel the correct original information Dept. of CSE,ATMECE, Mysuru Page 10Database Management System [18CS53] = This is because in this case Plocation is the attribute that -elates EMP_LOCS and EMP_PRO41, anc Plocation is neither 2 primary key nor a foreign key in either EMP_LOCS or EMP_PROJ Guideline 4 = Design relation schemas so that they can be joined with equality conditions on attributes that are appropriately related (primary key, foreign @ Dept. of CSE,ATMECE, Mysuru Page 12Database Management System [18CS53] = The following FDs may hold because the four tuples in the current extension have no violation of these constraints *B>C *CoR SABC + {A BJD +{C DB. * The following do not hold because we already have violations of them in the given extension. + A— B (tuples 1 and 2 violate this constraint, + BA (tuples 2 and 3 violate this constraint + DC (tuples 3 and 4 violate it) Normal Forms Based on Primary Keys We assume that ¢ = Set of function dependencies is given for each relation * Each relation has a designated primary key * This information combined with the tests (conditions) for norma forms drives the normalization process for relationd schema design * First three normal forms for relation takes into account all candidate keys of a relation rather than the primary key 4.4.1 Normalization of Relations = The normalization process, as first proposed by Codd (19724), takes a relation schema through a series of tests to certify whether it satisfies a certain normal form. = Initially, Code proposed three normal forms, which he c normal form + Allthese normal forms are based on a single analytical tool: the functiond dependencies among the attributes of a relation = A fourth normal form (4NF) and a fifth normal form (SNF) were proposed, based on the concepts of multivalued dependencies and join dependencies, respectively + Normalization of data can be considered a process of analyzing the given elation schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies lec first second, anc third Dept. of CSE,ATMECE, Mysuru Page 13Database Management System [18CS53] + It can be considerec as a ‘fitering’ or “purfication’ process to make the design have ‘successively better quality = Unsatisfactory relation schemas that do not meet certain conditions—the normal form tests—are decomposed into smaller relatior schemas that meet the tests and hence possess the desirable properties Thus, the normalization procedure provides database designers with the following: + A formal framework for analyzing relation schemas based on their keys and n the functional dependencies among their attributes = A series of normal form tests that car be carriec out on individual relation schemas so thal the relational database can be normalized to any desirec degree Definition: The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized 4.4.2 Practical Use of Normal Forms + Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties "Database design as practiced in industry today pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF. ‘The database designers need not normalize to the highest possible normal form = Relations may be lefl in @ lower normalization status, such as 2NF, for performance reasons "Definition: Denormalization is the orocess of storing the joir of higher normal form relations as a base relation, whic is in a lower normal form, 4.4.3 Definitions of Keys and Attributes Participating in Keys * Superkey: specifies a uniqueness constraint that no two distinct tuples in any state r of R can have the same value «key Kis a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more + Example: + The attribute set {Ssn} is a key because no two employees tuples can have the same value for Ssr Dept. of CSE,ATMECE, Mysuru Page 14Database Management System [18CS53] *Any set of attributes that includes Ssn—for example, {Ssn, Name, Address}—is 2 superkey if a relation schema has more than one key, eachis called a candidate key One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys In 2 practical relational database, each relation schema must have a primary key If no candidate key is known for a relation, the entire relation can be treated as a default superkey For example (Ssn} s the only candidate key for EMPLOYEE, so it s also the primary key Definition. An attribute of relation scheme R is called a prime attribute of Rif itis member of some candidate key of R. An attribute is called nonprime if itis not a prime attribute—thal is if it's not a member of any candidate key WORKS ON In WORKS_ON relation Both Ssn and Phumber are prime attributes whereas other attributes are nonprime. 4.4.4 First Normal Form Defined to disallow multivalued attributes, composite attributes, and their combinations It states that the domair of an attribute must include only atomic (simple indivisible) values and thal the value of any attribute in a tuple must be 2 single value from the domain of that attribute NF disallows relations within relations or relations as attribute values within tuples The only attribute values permitted by 1NF are single atomic (or le) values. Consider the DEPARTMENT relation schema shownin Figure below (@) DEPARTMENT Dname | Doumber [ Dmgrssn | Diocations t_| 4 Primary key is Dnumber We assume that each department can have a number of locations Dept. of CSE,ATMECE, Mysuru Page 15Database Management System [18CS53] The DEPARTMENT schema and a sample relation state are shownin Figure below DEPARTMENT name [ Doumber | Dmgr_ssn Diocations Research 5 ‘893445555 | (Bellaire, Sugerland, Houston) Administration] 4 1987854321 | (Stattora) Headquarters [+t 288665565 | {Houston} ‘As we can see this is not in 1NF because Diocations is not an atomic attribute, as illustrated by the first tuple in Figure There are two ways we can look at the Dlocations attribute + The domain of Diocations contains atomic values but some tuples can nave a set of these values In this case Dlocations is not functionally dependent on the primary key Dnumber + The domain of Diocations contains sets of values and hence is nonatomic. In this case Onumber--Diocations because each set is considered a single member of the attribute domain In either case, the DEPARTMENT relation is not in 1NF ‘There are three main techniques to achieve first normal form for such a relation 1. Remove the attribute Dlocations that violates 1NF and place it in a separate relatior DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT. The orimary key of this relations the combination {Dnumber, Dlocation}. A distinct tuple in DEPT_LOCATIONS exists for each location of a department. This decomposes the non-1NF relation into two 1NF relations, 2. Expand the key so that there will be a separate tuple in the original DEPARTMENT. ‘elation for each location of @ DEPARTMENT. In this case. the primary key becomes the combination {Dnumber, Diocation}. This solution has the disadvantage of introducing redundancy in the relation DEPARTMENT Dname | Dnumber | Dmor_ssn_| Diocation Research 5 333445555 | Bellaire Research 5 299446665 | Sugarland Research 5 (333445555 | Houston ‘Administration| 4 (987654921 | Stafford Headquarters [1 aaa665655 | Houstan Dept. of CSE,ATMECE, Mysuru Page 16Database Management System [18CS53] 3. if a maximum number of values is knowr for the attribute—tor example if it is knowr thal at most three locations can exist for @ department—replace the Diocations attribute by three atomic attributes: Dlocationt, Dlocation2, and Diocation3. This solution has the disadvantage of introducing NULL values if most departments have fewer than three _ locations. Querying on this attribute becomes more difficult; forexample, consider how you would write the query’ List the departments that have ‘Bellaire’ as one of their locations in this design. = Of the three solutions, the first is generally considered best because it does not suffer from redundancy anc it is completely general. naving na limit placed an @ maximum number of values * First normal form also disallows multivalued attributes that are themselves composite. = These are called nested relations because each tuple can have a relation within it. @) EMP_PROJ = (Tse [kane [Pramber [Rowe = Figure above shows how the EMP_PROJ relation could appear if nesting is allowed Each tuple represents an employee entity anc @ relation PROJS(Pnumber Hours) within eack tuple represents the employee's projects anc the hours 9er week that employee works on each project. "The schema of this EMP_PRO4 relation can be represented as follows: EMP_PROJ(Ssn, Ename, (PROJS(Pnumber. Hours)}) = Ssr's the primary key of the EMP_PROJ relation and Phumber is the partial key of the nested relation; that is, within each tuple, the nested relation must have unique values of Pnumber * To normalize this into 1NF, we remove the nested relation attributes into a new relation and propagate the primary key inte it; the primary key of the new relation will combine the partial key with the primary key of the original relation * Decomposition and orimary key oropagation yield the schemas EMP_PROJ1 and EMP_PROL2, EMP_PROJ2 Sen] Prumber | Heure EMP_PROSt Dept. of CSE,ATMECE, Mysuru Page 17Database Management System [18CS53] EMP_PROJ = name Paumber | Hows 1123456780 _| Smith, John B. 1 225) 459455455 | | Engish Joyce A 39446508 ‘Wong, Frankin T, lsooee7777 Zolaya, Alicia J. faeroa7a87 | labbar, Ahmad V laa7asaaa7 | Wiatace, Janvier S [rg lames © 4.4.5 Second Normal Form * Second normal form (2NF} is based on the concept of full functional dependency = A functional dependency X — Y is a full functional dependency if -emoval of any attribute A from X means that the dependency does not hold any more; that is for any attribute A € X, (X—{A}) does not functionally determine Y = A functional dependency XY is @ partial dependency f some attribute A ¢ X car be removed from X and the dependency stil holds that is, for some A € X, (K={A}) > Y EMP PROJ ‘Seq [Prumber [Hears |Ename [Prame [Poca FDI : Foo| Fba | * In the above figure , {Ssn, Prumber} — Hours is full dependency (neither Ssn — Hours nor Pnumber--Hours holds) + {Ssn, Pnumber}—Ename is partial because Ssn—Ename holds + Definition. A relatior scheme R is in 2NF if every nonprime attribute A in R is fuly functionally dependent on the primary key of R = The test for 2NF involves testing for functional dependencies whose left-hanc side attributes are part of the primary key = _ Ifthe primary key contains a single attribute, the test need not be applied at all Dept. of CSE,ATMECE, Mysuru Page 18Database Management System [18CS53] + The EMP_PROJ elation is in 1NF but is not in 2NF. = The nonprime attribute Ename violates 2NF because of =D2. as do the nonprime attributes Pname and Plocation because of FD3 = The functional dependencies FD2 and FD3 make Ename 2name, and Plocation partially dependent on the primary key {Ssn, Pnumber) of EMP_PROJ, thus violating the 2NF test. "Ifa relation schema is not in 2NF, t can be second normalized or 2NF normalized into a number of 2NF relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent. Therefore, the functiond dependencies FD1, FD2. and FDS lead to the decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in Figure delow, pach of which is in 2NF. EMP pros {NF Normaizstion Pi (Saree) [Sacre] aie ri > mp om 4.4.6 Third Normal Form + Transitive functional dependency ‘A functional dependency XY in a relation schema R is a transitive dependency if there exists a set of attribute Z that are neither a primary nor a subset of any key of R(candidate key) and both X > Z and Y > Zholds + Example: EMP DEPT [Levame | Sen [ Bests [Aaoss | Onamber | Orane [Dries 4 [+ 4 4 t 4 + SSN > DMGRSSNis a transitive FD since SSN > DNUMBER and DNUMBER. > DMGRSSN hold Dept. of CSE,ATMECE, Mysuru Page 19Database Management System [18CS53] Dnumberis neither a key itself nor a subset of the key of EMP_DEPT + SSN > ENAME is non-transitive since there is nc set of attributes X where SSN X andX > ENAME "Definition: A relation schema Ris in third normal form (SNF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key "The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key exist, However, EMP_DEPT is not in 3NF decause of the transitive dependency of Dmgr_ssn (and also Dname) on Ssn via Dnumber EMP_DEPT Ename | San | Bdate | Address [ Daumber | Dname al a a a f + We car normalize EMP_DEPT by decomposing it inte the two 3NF relatior schemas FD: and =n? {NF Normadiztion | ept 02 [Eramo [San [date [Adaress TDnumbor ] [Daunbar TOname [Omer ssn] ae ae a | L + = ED’ and ED2 represent independent entity facts about employees and departments EMP_DEPT without generating spurious tuples * Problematic FD + Left-hand side is part of primary key + Left-hand side is a non-key attribute relatior into new relations A NATURAL JOIN operation on ED1 and FD? wil recover the original elation 2NF and 3NF normalization remove these problem FDs by decomposing the original = In general, we want to design our relation schemas so that they have neither partial nor transitive dependencies because these types of dependencies cause the update anomalies Dept. of CSE,ATMECE, Mysuru Page 20Table 15.1 Database Management System [18CS53] Summary of Normal Forms Based on Primary Keys and Corresponding Normalization Normal Form First (INF) Second (2NF) ‘Third (3NF) Test Relation should have no multivalued attributes or nested relations. For relations where primary key contains mltipl attributes, no nonkey attribute should be functionally dependent on a part ofthe primary key. Relation should not have @ nonkey attribute functionally determined by another nonkey ateibute (or by a set of nonkey attributes) That is, there should be no transitive dependency of a nonkey attribute on the primary key. Remedy (Normalization) Form new relations for each multivalued attribute or nested relation, Decompose and set up a new relation for ‘each partial key with its dependent atrib- lte(s). Make sure to keep a relation with ‘the original primary key and any attributes ‘that are fully functionally dependent on it. Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s) Dept. of CSE,ATMECE, Mysuru Page 21Database Management System [18CS53] 4.5 General Definition of Second and Third Normal Form + Takes into account all candidate keys of a relation into account * Definition of 2NF: A relation schame R is in second normal form (2NF] if every onprime attribute Ain R 's not partially dependent on any key of R * Consider the relation schema LOTS which describes parcels of land for sale in various counties of a state * Suppose that there are two candidate keys Property_id# and {County_name, Lot#; that is lot numbers are unique only within each county, out Property_id# numbers are unique across counties for the entire state. Candidate Key Lots. Property id | County_name | Lot | Area | Price | Taxrato FO 4 a ee } roo } 4 4 Fo3 Fos 4 * Based on the two candidate keys Property_idit and {County_name Loti, the functional dependencies FD’ anc FDZ hold + FD1: Property_id — { County_name Lott Area Price, Tax_rate} + FD2:{County_name Lott} —{Property_id, Area,Price, Tax_rate} + FD3: County_name — Tax_rate + FDA: Area Price = We choose Property_id#t as the primary key, but no specid consideration will be given to this key over the other candidate key = FDS says that the tax rate s fixed for a given county (does not vary ot by lot within the same county) + FD4 says that the price of 2 lot is determined by its arez regardless of which countyit is in + The LOTS relation schema violates the generd definition of 2NF because Tax_rate is partially dependent on the candidate key {County_name, Loti}, due to FD2 + Tonormalize LOTS into 2NF, we decompose it into the two relationsLOTS1 and LOTS2 Dept. of CSE,ATMECE, MysuruDatabase Management System [18CS53] Property a | Gouniyiname | Low | Wea | Price ‘County name | Tacralo = We construct LOTS’ by removing the attribute Tax_rate that violates 2NF from LOTS and placing it with County_name (the left-hand side of FDS that causes the partial dependency) nto another relation LOTS2. + BothLOTS1 and LOTS? are in 2NF. + Definition of 3NF: A ‘elatior schema R ‘s in thir normal form (3NF) if, whenever 2 nontrivial functional dependency X—-A holds in R, either (a) X is a superkey of R, or (b) A sa prime attribute of R = According to this definition LOTS2 is in 3NF * FD4 in LOTS! violates 3NF because Aree 's not a superkey anc¢ Price is not a rime attribute in LOTS1 + To normalize LOTS1 into 3NF, we decompose it into the relation schemas LOTS1A anc LOTSIB Lorsia Lorsia Properyae | Comimnane | Lot [Aca] [area Poe FoI 4 an) ros|_ ro 4 = We construct LOTS1A by removing the attribute Price that violates 3NF fromLOTS1 anc placing it with Aree (the lefthand side of FD4 that causes the transitive dependency) into another relation LOTS'B. "Bott LOTS1A and LOTSIB are in 3NF lors 1NF Lorsi Lotsa 2NF LoTsiA —LTS1B__LoTs2 SNF Dept. of CSE,ATMECE, Mysuru Page 23Database Management System [18CS53] 4.6 Boyce-Codd Normal Form * Boyce-Codd normal form (BCNF) was proposed as a simpler form of SNF. out it was found to be stricter than 3NF "Every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF = Definition. A relation scheme R 's in BCNF if whenever 2 nontrivial functional dependency X--A holds in R then X sa superkey of R = The formal definition of BCNF differs from the definition of 3NF in thal condition (b) of 3NF, which allows A to de prime, is absent from BCNF. That makes BCNF a stronger normal form compared to 3NF = In our example, =DS violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A = FDS satisfies 3NF in LOTS1A because County_name is a prime attribute (condition b) but this condition does nat exist in the definition of BCNF = We can decompose LOTS1A inte two BCNF relations LOTS1AX and LOTSTAY. This decomposition loses the functional dependency =D2 because its attributes no longer coexist in the same relation after decomposition LoTsta Property id | Gouniy name [Lott [Area Ft ’ oo Fo 4 L | 4 Fos # BCNF Normalization LoTs1Ax Lorsiay Prpery Gt [Area [LeiP] [Arn [ Coury ane = In practice, most relation schemas that are n 3NF are also in BCNF = Only ifX+A holds in a relation schema R with X not being a superkey and A being ¢ prime attribute will R be in 3NF but not in BCNF = Example: consider the relation TEACH with the following dependencies: Dept. of CSE,ATMECE, Mysuru Page 24Database Management System [18CS53] TEACH ‘Student Course Instructor Narayan_| Database Mark ‘Smith | Database Nevatha Srith | Operating Sterns | Ammar “Seth | Trery Sehr Wallace | Datsbase Mark Wallace | Operating Systems | Anamad Wong | Database ‘Omiveinak Zelaya | Database Navathe Narayan | Operating Systeme | Ammar £1: {Student, Course} — Instructor FD2 Instructor Course ~ means that each instructor teaches one course + (Student, Course} is @ candidate key for this relator * The dependencies shown falow the patter in Figure below with Student as A Course as B, anc \nstructor as C R Alslc FDI 4 FD2 4 = Hence this relation is in 3NF but not BCNF = Decomposition of this relation scheme into two schemas snot straightforward because it may be decomposec into one of the three following possible pairs: 41. R1(Student, Instructor) and R2(Student, Course} 2. R4(Course, Instructor) and R2(Course, Student) 3, Ri (Instructor, Course)and R2(Instructor, Student = It is generally not sufficient tc check separately that each ‘elatior scheme in the database is, say, in BCNF or 3NF = Rather, the proces of normalization through decompositior must also confirm the existence of additional properties thal the relational schemas, taker together, should possess. These would include two properties: “The nonadditive join orlossless join property, which guarantees that the spurious tuple generation oroblerr does not occur with respect to the relation schemas created after decomposition. +The dependency preservation property, which ensures that each functional dependency is representec in some individual relation resulting after decomposition. Dept. of CSE,ATMECE, Mysuru Page 25Database Management System [18CS53] We are not able to meet the functiond dependency preservation but we must meet the ‘non additive join property Nonadditive Join Test for Binary Decomposition: ‘A decomposition D={R:, R:} of R has the lossless join property with respect to a set of functional dependencies F on Rif and only ifeither + The FD ((RinRe) —(R+Ra)is in F* oF +The FD (RinRa) (RRs in F* The third decomposition meets the test Ria: is Instructor Re-Re is Course Hence, the proper decomposition of TEACH into BCNF relationsis TEACH (Instructor, Course) and TEACH2(Instructor, Student) In general, a relation R not in BCNF can be decomposed so as to meet the nonadditive join prorperty by the following procedure. It decomposes R successively into set of relations that are in BCNF: Let R be the relation notin BCNF, letX R. and let X- Abethe FD that causes violation of BCNF. R may be decomposed into two relation: R-A XA either R-A or XAis not in BCNF, repeat the process 4.7 Multivalued Dependency and Fourth Normal Form "For example, consider the relation EMP shown in Figure below EMP Ename | Phane | Dname ‘Saith x Jebn ‘Smith Y Anna ‘Smith x Aanal ‘Seth Y Jebn = A tuple in this EMP relatior represents the fact that an employee whose name is Ename works on the oroject whose name is Pname and has a dependent whose name is Oname = An employee may work on several projects and may have several dependents = The employee's projects and dependents are independent of one another Dept. of CSE,ATMECE, Mysuru Page 26Database Management System [18CS53] = lo keep the relation state consistent, and to avoid any spurious relationship betweer the two independent attributes we must nave a separate tuple to represent every combination of an employee's dependent and an employee's project "In the relation state showr in the EMP, the employee Smith works on two projects X’ and "Y an¢ has two dependents ‘John’ and ‘Anna! anc therefore there are 4 tuples to represent these facts together «The relation =MP is an all-key relation (witt xey made up of all attributes) and therefore no f.d’s and as such qualifies to be a BCNF relation = There is a redundancy in the relatior EMP-the dependent information is repeated for every project and project information is repeated for every dependent = To address this situation, the concept of multivalued dependency(MVD) was proposed and based on this dependency, the fourth normal form was defined * Multivalued dependencies are a consequence of 1NF which disellows an attribute in a tuple to have a set of values, and the accompanying process of converting an unnormalized relation into NF * Informally, whenever two independent 1:N relationships are mixed in the same relation, R(A, B, C),an MVD may arise 4.7.1 Formal Definition of Multivalued Dependency Definition. A multivalued dependency XY specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state rof R_ iftwo tuples 1 anc {2 exist in r such that t1[X] = t2[X], then two tuples 3 and t4 should also existin rwith the following properties where we use Z to denote (R — (K (1 Y)) 131K] = t4DX = t1PX] = 1204. (3[Y! = t1[Y] ane t4[¥] = (2[Y] 1312, = 12(Z] and ta{Z| = t11Z1 ae Phan | Drama] LetXEname, Y-Prame ‘Smith % John) _ tY{Ename}=t2fename}=Smith ‘Smith Y as = (EMP-(Ename u Prame)] | Smith x ron ‘Smith Y John 13(Ename)=t4(Ename)=t1 (Ename)=t2(Ename)=Smith Dept. of CSE,ATMECE, Mysuru Page 27Database Management System [18CS53] = t3(Pname)=ti(Pname)=X and t4(Pname)=t2(Pname)=Y = t3(Dname)=t2(Dname)=Anna and t4(Dname)=t1(Dname)=John = Whenever XY holds, we say that X multidetermines Y. Because of the symmetryin the definition, whenever X —— Y holds in R, so does X >> Z Hence, X -»-» Yimplies XZ, and therefore itis sometimes written as X— VIZ. = An MVDX —>- Y in Ris called a trivial MVD if EMP_PROJECTS (a) Visa subset of X, or = (b)XuY=R ame name ‘Smith x Smith | ¥ "For example, the relation EMP_PROJECTS has the trivial MVC ame —>—+ Pname = An MVD that satisfies neither (a) nor (b} is called a nontrivial MVD. * If we have a nontrivial MVC in a relation, we may have to repeat values redundantly in the tuples "In the EMP relation the values °X' and 'Y of Pname are repeated with each value of Dname (or, by symmetry, the values ‘John’ and ‘Anna’ of Dname are repeated with each value of Prame} * This redundancy is clearly undesirable. "We now present the definitior of fourth normal form (4NF), which is violated when relatior has undesirable multivalued dependencies. an¢ hence can be usec to identify and decompose such relations Definition: A relation schema Ris in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X + Yin F* X is a superkey for R = The process of normalizing a relatior involving the nontrivial MVDs that is not in ANE consists of decomposing it so that each MVC is represented by a separate relatior where it becomes a trivial MVE EMP_PROJECTS EMP_DEPENDENTS Ename | Pname Ename | Dname Smith x Smith John: Smith ¥ Smith Anna Dept. of CSE,ATMECE, Mysuru Page 28Database Management System [18CS53] = We decompose EMF into EMP_PROJECTS anc EMP_DEPENDENTS + Both EMP_PROJECTS anc EMP_DEPENDENTS are in 4NF, because the MVDs Ename -— + Pname in EMP_PROJECTS and Ename —— Dname in EMP_DEPENDENTS are trivial MVDs + No other nontrivial MVDs hole in either EMP_PROJECTS or EMP_DEPENDENTS, No FDs hole in these relation schemas either = We can state the following soints: + An all-key elation is always in BCNF since it hasno FDs + An all-Key relation such as the EMP, which has no FDs but has the MVC Ename—>—> Pname | Dname is not in 4NF + Avrelation that is not in 4NF due to a nontrivial MVD must be decomposed to convertit into a set of relations in 4NF + The decomposition removes the redundancy caused by the MVC 4.8 Join Dependencies and Fifth Normal Form + Ajoin dependency (JD), denoted by JD(R1 2, .... Rn), spectfied on relation scheme R. specifies a constraint on the states r of R. The constraint states that every lega state r of R should have a nonadditive join decomposition into R1, R2. .... Rn. Hence, for every such r we have # (gM mata, (=r = Ajoin dependency JD(R1, R2,...., Rn), specified on relation schema R. is a trivial JD if one of the relation schemas Ri in JD(R1, R2, .... Ra) is equal to R. Fifth normal form (project-join normal form) + A relation schema R is in fifth normal form (SNF) (or projectjoin normal form (PJNF)) with respect to a set F of functional, mutivalued, and join dependencies if for every nontrivial join dependency JD(R1, R2, .., Rn in F* every Ris a superkey of R + A database is said to be in SNF, if and only if, + It's in 4NE Dept. of CSE,ATMECE, Mysuru Page 29Database Management System [18CS53] + It we can decompose table further to eliminate redundancy and anomaly, and when we re-join the decomposec tables by means of candidate keys we should not be losing the original date or any new record set should not arise In simple words, joining two or more decomposed table should not lose records nor create new records. supply Sname | Part_name | Proj name ‘Smith Belt Pro “Smith Froly, Adomoty Fre Walton Pro "Adame Fro [aaa | Fox] Smit Prot Fig: The relation SUPPLY with no MVDs is in 4NF out not in SNF if it has the JD(R1, R2, R3) Ry R Ry ‘Shane | Pat name Shame | Prolnane Painame | Prolnane ‘Smith Belt ‘Smith Prox Batt Projx ‘Smith Nut ‘Smith Prov Nut Proj ‘Adamsky [Bolt ‘Adamsky | ProiY Bot Proi¥ Walton Nut Walton Proz Nut Proz ‘Adameky [Nail ‘Adamaky | Prox Nai Prox Fig: Decomposing the relation SUPPLY into the SNF relations R1, 2, R3. Dept. of CSE,ATMECE, Mysuru Page 30Database Management System [18CS53] Chapter 2: Normalization Algorithms 4.9 Inference Rules for Functional Dependencies "Let F be the set of functional dependencies that are specified on relation schema R = The schema designer specifies the functional dependencies that are semantically obvious = Numerous other functional dependencies hold in all legal relation instances among sets of attributes that can be derived from and satisfy the dependencies in F = Those other dependencies can be inferred or deduced from the FDsin F. + For example: + If each department has one manager, so that Dept_no uniquely determines Mgr_ssr (Dept_no — Mgr_ssn), ane a manager has é unique phone number called Mgr_phone (Mgr_ssn—Mgr_phone), + Then these two dependencies togetherimply that Dept_no —> Mgr_phone + Thisis an inferred FD and need not be explicitly statec in addition to the two given FDs. = Definition. Formally, the set of all dependencies that include F as well as all dependencies that can be inferrec from Fis called the closure of F;itis denoted by F*. = For example, suppose that we specify the following set F of obvious functional dependencies on the relation scheme EMP_DEPT EP DEPT EWE ov [ wow [ oon [ cnnmen [ owe | ovorn * Fef Ssn — {Ename, Bdate, Address, Dnumber} Dnumber — {Dname, Dmgr_ssn} ) + Some of the additional functional dependencies that we can infer from F are the following: + Ssn— {Dname, Omgr_ssn} + Ssn—Ssn Dept. of CSE,ATMECE, Mysuru Page 31Database Management System [18CS53] + Dnumber — Dname "An FDX — Yis inferred from a set of dependencies F specified on Rif X + Y holds in every legal relation state r of R "The closure F* of Fis the set of all functional dependencies that can be inferred from F * Set of inference rules car be used to infer new dependencies from given set of dependencies "We use the notation F |X — Y to denote thal the functional dependency XY is inferred from the set of functional dependencies F = we use an abbreviated notation when discussing functional dependencies. We concatenate attribute variables and drop the commas for convenience * The FD (X,¥}-+Z's abbreviated to XYZ, and the FD {X. Y,Z} + (U. V} is abbreviated toXYZ— UV. * Threerules IR1 through IR2 are well-known inference rules for function dependencies. * They are proposed by Armstrong and hence known as Armstrong’s axioms + IRI (reflexive rule): if X2 Y, then XY. + IR2 (augmentation rule): (XY) |=XZYZ. + IRS (transitive rule): (XY, Y+Z}|=X+Z + The reflexive rule (IR1) states that a set of attributes always determines itself or any of its subsets, whict is obvious. + Because IR1 generates dependencies that are always true such dependencies are ralled trivial. * Formally, a functional dependency X—+Y is trivial if X 2 Y; otherwise it is nontrivial * The augmentation rule (IR2) says that adding the same set of attributes to both the left- and right-hand sides of a dependency results in another valid dependency * According to IR3, functional dependencies are transitive "There are three other inference rules that follow from IR1,IR2 anc IR3. They are + IR4 (decomposition, or projective, rule): {XYZ} |=XY + IRS (union, or additive, rule): XY, XZ) |>X-YZ + IR6 (pseudotransitive rule): (XY,WYZ} |-WX=Z + The decomposition cule (IR4) says that we car remove attributes from the right-hand side of a dependency; applying this rule repeatedly can decompose the FD X—{A1, A2 . An} into the set of dependencies {X-A1, X+A2, ....X+An} = The union rule (IRS) allows us to de the opposite; we can combine a set of dependencies {X-»A1, XrA2, ..., X-»An}into the single FD X—{A1, A2, .... An} = The pseudotransitive rule (IR6) allows us to replace a set of attributes Yon the left han: side of a dependency with another set X that functionally determines Y, and can be Dept. of CSE,ATMECE, Mysuru Page 32

DBMS Module 4

Uploaded by

Copyright:

Available Formats

You might also like

DBMS Module 4

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

DBMS Module 4

Uploaded by

Copyright:

Available Formats

You might also like