Database Design: Logical Models: Normalization and The Relational Model

You might also like

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 62

Database Design: Logical Models:

Normalization and The Relational


Model
University of California, Berkeley
School of Information
IS 257: Database Management

IS 257 – Fall 2006 2006.09.14 - SLIDE 1


Announcements

• I will be away next week


• Instead we will have an informal workshop
to work on issues of choosing and
designing your personal Databases

IS 257 – Fall 2006 2006.09.14 - SLIDE 2


Lecture Outline

• Review
– Conceptual Model and UML
• Logical Model for the Diveshop
database
• Normalization
• Relational Advantages and
Disadvantages

IS 257 – Fall 2006 2006.09.14 - SLIDE 3


Lecture Outline

• Review
• Logical Design for the Diveshop
database
• Normalization
• Relational Advantages and
Disadvantages

IS 257 – Fall 2006 2006.09.14 - SLIDE 4


DiveShop ER Diagram
Customer
No DiveCust
1
Destination Customer ShipVia
Name No
Destination n
no n 1 ShipVia ShipVia
Dest 1
n
DiveOrds
1

Destination 1

no Order
Destination
Site No n
No
Order
Sites n
No
1 1

Site No
n 1/n
DiveItem
Item
BioSite ShipWrck n
No
Species n
No
Site No
1
1

Species DiveStok Item


BioLife No
No

IS 257 – Fall 2006 2006.09.14 - SLIDE 5


Lecture Outline

• Review
– Conceptual Model and UML
• Logical Model for the Diveshop
database
• Normalization
• Relational Advantages and
Disadvantages

IS 257 – Fall 2006 2006.09.14 - SLIDE 6


Database Design Process

Application 1 Application 2 Application 3 Application 4


External External External External
Model Model Model Model
Application 1
Conceptual
requirements
Application 2
Conceptual
requirements
Internal
Conceptual Logical Model
Application 3 Model Model
Conceptual
requirements
Application 4
Conceptual
requirements

IS 257 – Fall 2006 2006.09.14 - SLIDE 7


Logical Model: Mapping to a Relational
Model
• Each entity in the ER Diagram becomes a
relation.
• A properly normalized (next time) ER diagram
will indicate where intersection relations for
many-to-many mappings are needed.
• Relationships are indicated by common columns
(or domains) in tables that are related.
• We will examine the tables for the Diveshop
derived from the ER diagram

IS 257 – Fall 2006 2006.09.14 - SLIDE 8


DiveShop ER Diagram
Customer
No DiveCust
1
Destination Customer ShipVia
Name No
Destination n
no n 1 ShipVia ShipVia
Dest 1
n
DiveOrds
1

Destination 1

no Order
Destination
Site No n
No
Order
Sites n
No
1 1

Site No
n 1/n
DiveItem
Item
BioSite ShipWrck n
No
Species n
No
Site No
1
1

Species DiveStok Item


BioLife No
No

IS 257 – Fall 2006 2006.09.14 - SLIDE 9


Customer = DIVECUST

Customer No
Name Street City State/Prov Zip/Postal Code
Country Phone First Contact
1480 Louis Jazdzewski
2501 O'Connor
New Orleans LA 60332 U.S.A. (902) 555-88881/29/95
1481 Barbara W right
6344 W . Freeway
San Francisco
CA 95031 U.S.A. (415) 555-4321 2/2/93
1909 Stephen Bredenburg
559 N.E. 167
Indianapolis
Place IN 46241 U.S.A. (317) 555-3644 1/5/93
1913 Phillip Davoust
123 First Street
Berkeley CA 94704 U.S.A. (415) 555-9184 3/9/98
1969 David Burgett
320 Montgomery
SeattleStreet
WA 98105 U.S.A. (206) 555-75803/12/99
2001 Mary Rioux1701 Gateway Pueblo
Blvd. #385
CO 81002 U.S.A. (719) 555-20103/15/97
2306 Kim Lopez 14134 Nottingham
HonoluluLaneHI 96826 U.S.A. (808) 555-50501/29/99
2589 Hiram Marley7233 Mill Run
SanDrive
Francisco
CA 94123 U.S.A. (415) 555-64302/18/99
3154 Tanya Kulesa505 S. Flower,
NewMail
YorkStop
NY 48943 10032 U.S.A. (212) 555-67501/30/99
3333 Charles Sekaron
110 East Park
Miller
Avenue,SDBox 8 57362 U.S.A. (613) 555-43333/16/98
3684 Lowell Lutz915 E. Fesler
Dallas TX 75043 U.S.A. (214) 555-27222/15/99
4158 Keith Lucas56 South Euclid
Chicago IL 60542 U.S.A. (312) 555-43103/17/98
4175 Karen Ng 2134 Elmhill Klamath
Pike FallsOR 97603 U.S.A. (503) 555-47003/20/99
5510 Ken Soule 58 Sansome Aurora
Street CO 89022 U.S.A. (303) 555-6695 2/5/99

IS 257 – Fall 2006 2006.09.14 - SLIDE 10


Dive Order = DIVEORDS

Order No Customer No
Sale Date Ship Via PaymentMethod
CcNumber CcExpDateNo Of People
Depart DateReturn DateDestinationVacationCost
307 1480 9/1/99 UPS Visa 12345 678 90 1/1/01 2 11/8/00 11/15/00 Fiji 10000
310 1481 9/1/99 FedEx Check 1 4/4/00 4/18/00 Santa Barbara 6000
313 1909 9/1/99 W alk In Visa 456456456 9/11/00 4 6/27/00 7/11/00 Cozumel 8000
314 1913 9/1/99 FedEx Check 3 2/7/00 2/14/00 Monterey 6000
317 1969 9/1/99 FedEx AmEx 432432432 12/31/02 4 5/9/00 5/16/00 Fiji 20000
320 2001 9/1/99 W alk In Cash 1 10/10/00 10/17/00 Santa Barbara 3000
321 2306 9/1/99 Emery Master Card
1112223334 8/12/00 1 3/15/00 4/12/00 New Jersey 8000
325 2589 9/1/99 Emery AmEx 332332332 12/10/99 1 3/15/00 4/12/00 New Jersey 8000
326 3333 9/1/99 FedEx Money Order 2 2/10/00 2/17/00 Monterey 4000
327 3684 9/1/99 DHL Master Card
122122321 11/9/99 4 3/10/00 3/23/00 Florida 24000
329 4158 9/1/99 W alk In Cash 1 5/4/00 5/15/00 Cozumel 1571
330 4175 9/1/99 FedEx Check 2 7/3/00 7/10/00 Florida 6000
331 5510 9/1/99 FedEx Money Order 6 6/20/00 6/30/00 Santa Barbara 36000
333 5926 9/1/99 DHL Discover 123123123 12/21/00 2 6/10/00 6/17/00 Fiji 10000
336 5719 9/1/99 FedEx Cash 10 4/2/00 4/24/00 Great Barrier Reef
200000

IS 257 – Fall 2006 2006.09.14 - SLIDE 11


Line item = DIVEITEM
Order No Item No Rental/SaleQty Line Note
307 90010 Rental 4
307 90020 Rental 1 This is our most popular mask.
307 90021 Rental 1
307 90030 Rental 2 These are our best selling fins.
307 90051 Rental 2
310 90011 Rental 1
310 90045 Rental 1
310 90059 Rental 1 A good weight belt for beginners
310 90074 Rental 1
310 90078 Rental 1
313 90127 Sale 1 Holds 10 cubic feet of cargo.
314 90072 Rental 3
314 90094 Rental 3
314 90100 Rental 3
317 90012 Sale 2

IS 257 – Fall 2006 2006.09.14 - SLIDE 12


Shipping information = SHIPVIA

Ship Via Ship Cost


DHL 8
Emery 11
FedEx 12
UPS 10
US Mail 6

IS 257 – Fall 2006 2006.09.14 - SLIDE 13


Dive Equipment Stock= DIVESTOK

Item No DescriptionEquipment On Class


Hand Reorder Point
Cost Sale Price Rental Price
90010 Shotgun 2 Snorkel - Clear 12 2 $18.00 $30.00 $2.00
90011 Shotgun 2 Snorkel - Red 12 2 $18.00 $30.00 $2.00
90012 Shotgun 2 Snorkel - Teal 11 2 $18.00 $30.00 $2.00
90020 Tri-Vent Mask
Mask- Clear 14 2 $62.50 $100.00 $5.00
90021 Tri-Vent Mask
Mask- Red 10 2 $62.50 $100.00 $5.00
90022 Tri-Vent Mask
Mask- Teal 14 2 $62.50 $100.00 $7.00
90023 Quad Vision Mask
Mask - Clear 11 2 $48.25 $80.00 $7.00
90024 Quad Vision Mask
Mask - Red 13 2 $48.25 $80.00 $7.00
90025 Quad Vision Mask
Mask - Teal 10 2 $48.25 $80.00 $10.00
90030 Sea W ing Fins
Fins - Clear 12 2 $60.00 $100.00 $12.00
90031 Sea W ing Fins
Fins - Red 11 2 $60.00 $100.00 $12.00
90032 Sea W ing Fins
Fins - Teal 12 2 $60.00 $100.00 $12.00
90033 Jet Fin - Black
Fins 14 2 $30.00 $60.00 $10.00
90040 D350 Second Regulator
Stage 11 1 $162.50 $270.00 $20.00
90041 G250 Second Regulator
Stage 13 1 $144.50 $240.00 $20.00
90042 G200 Second Regulator
Stage 12 1 $105.25 $175.00 $20.00

IS 257 – Fall 2006 2006.09.14 - SLIDE 14


Dive Locations = DEST

De s tina tio nDe


Nos tina tio nAvg
Na me
Te mp Avg
(F) Te mp S(C)
p ring Te mp
S p ring
(F) Te mpS umme
(C) r TeSmp
umme
(F) r TeFa
mpll Te
(C)mp Fa
(F)ll Te mp Winte
(C) r Te mp
Winte
(F) r Te mp
Acco
(C)mo d aNig
tio ns
ht Life Bo d y o f WaTrate rve l Co s t
1 Co zume l 78 25.556 76 24.444 84 28.889 78 25.556 74 23.333 Che a p S le e p y Ca rib b e a n 1000
2 Gre a t Ba rrie r Re e f80 26.667 76 24.444 84 28.889 78 25.556 76 24.444 Mo d e ra te P le a s a nt Co ra l S e a 5000
3 Mo nte re y 60 15.556 62 16.667 64 17.778 64 17.778 58 14.444 Exp e ns ive Wild P a cific 2000
4 S a nta Ba rb a ra 75 23.889 73 22.777 78 25.556 72 22.222 70 21.111 Exp e ns ive Wild P a cific 3000
5 Flo rid a 77 25 75 23.889 85 29.444 78 25.556 70 21.111 Mo d e ra te P le a s a nt Ca rib b e a n 3000
6 Fiji 75 23.889 76 24.444 80 26.667 74 23.333 70 21.111 Exp e ns ive S le e p y S o uth P a cific 5000
7 Ne w J e rs e y 57 13.889 57 13.89 60 15.556 58 14.444 53 11.667 Exp e ns ive P le a s a nt Atla ntic 2000

IS 257 – Fall 2006 2006.09.14 - SLIDE 15


Dive Sites = SITE

S ite No De s tina tio nSNo


ite Na me S ite Hig hlig htS iteDis
Notatence
sDisfro
ta nc
m eDe
Tofro
p
wnm
th (m)
(ft)
ToDewnp (km)
th (m) Vis ib ility (ft)Vis ib ility (m)
Curre nt S kill Le ve l
1001 1 P a la nca r R e e f R e e f 10 16.09 100 30.48 150 45.72 S tro ng Inte rme d ia te
1002 1 S a nta R o s a R e eR
f eef 8 12.87 80 24.384 150 45.72 S tro ng Inte rme d ia te
1003 1 Cha nca na b R e e R f eef 4 6.437 60 18.288 100 30.48 Mild Be g inning
1004 1 P unta S ur Re e f 13 20.92 120 36.576 175 53.34 S tro ng Ad va nc e d
1005 1 Yo c a b R e e f Re e f 6 9.656 50 15.24 100 30.48 Mild Be g inning
2001 2 He ro n Is la nd Re e f 50 80.47 90 27.432 150 45.72 Mild Inte rme d ia te
2002 2 Co d Ho le Fis h 45 72.42 50 15.24 150 45.72 Mild Be g inning
2003 2 Butte rfly Ba y Ca ve s 20 32.19 70 21.336 70 21.336 No ne Ad va nc e d
2004 2 Whe e le r R e e f Ma rine Life 30 48.28 50 15.24 125 38.1 Mild Be g inning
2005 2 Wa ta na b e Ma rine Life 130 209.2 150 45.72 200 60.96 No ne Inte rme d ia te
3001 3 P o int Lo b o s Ma rine Life 3 4.828 60 18.288 75 22.86 No ne Be g inning
3002 3 Ma ca b e e Be a c hMa rine Life 0.1 0.161 40 12.192 40 12.192 No ne Be g inning
3003 3 P inna c le s P inna cle 1 1.609 60 18.288 50 15.24 Mild Be g inning
3004 3 Mo na s te ry Be a c M
h a rine Life 3 4.828 50 15.24 40 12.192 S urg e Be g inning

IS 257 – Fall 2006 2006.09.14 - SLIDE 16


Sea Life = BIOLIFE

S p e cie s NoCa te g o ry Co mmo n Na me S p e cie s Na me Le ng th (cm) Le ng th (in)


No te s Gra p hic
90020 Trig g e rfis hClo wn Trig g e rfis hBa llis to id e s co ns p icillum
50 19.685
90030 S na p p e r R e d Emp e ro r Lutja nus s e b a e 60 23.622
90050 Wra s s e Gia nt Ma o ri Wra s Che s e ilinus und ula tus 229 90.157
90070 Ang e lfis h Blue Ang e lfis h P o ma ca nthus na ua rchus 30 11.811
90080 Co d Luna rta il R o ckco dVa rio la lo uti 80 31.496
90090 S co rp io nfisFire
h fis h P te ro is vo lita ns 38 14.961
90100 Butte rflyfis hOrna te Butte rflyfis C
h ha e to d o n Orna tis s imus1 9 7.4803
90110 S ha rk S we ll S ha rk Ce p ha lo s cyllium ve ntrio 102
s um40.157
90120 R a y Ba t Ra y Mylio b a tis ca lifo rnica 56 22.047
90130 Ee l Ca lifo rnia Mo ra y Gymno tho ra x mo rd a x 150 59.055
90140 Co d Ling co d Op hio d o n e lo ng a tus 150 59.055

IS 257 – Fall 2006 2006.09.14 - SLIDE 17


BIOSITE -- linking relation

S p e cie s No S ite No
90010 2001
90010 2002
90010 2003
90010 2004
90010 2005
90010 6001
90010 6003
90010 6004
90010 6005
90020 2001
90020 2002

IS 257 – Fall 2006 2006.09.14 - SLIDE 18


Shipwrecks = SHIPWRK

S hip N a m e S ite N o Ca te g o ry T yp e Inte re s t T o nna g e Le ng th (ft)


Le ng th (m ) Be a m (ft)
Be a m (m ) Ca us e D a te S unk Co mPmaesnts
s e ng
S urvivo
e rs /Cre
Co
rs w
nd itio n G ra p h
De la wa re 7007 Co m m e rc iaSl te a m F re ig
T hte
re a rs ure 1646 252 76.8096 37 11.2776 F ire 66 66 Bro ke n
F .S .Lo o p 4004 Co m m e rc iaSl te a m S c hoMa o ne
c hine
r ry 794 193 58.8264 39 11.8872 D e lib e ra te 1/1/47 0 S c a tte re d
Go s fo rd 4001 Co m m e rc iaBa l rq ue R igFgixture
e d S a il 2250 280 85.344 42 12.8016 F ire Inta c t
Gre a t Is a a c 7002 Co m m e rc iaSl e a g o ing TFug
ixture 1117 185 56.388 37 11.2776 Co llis io n 4/16/47 27 27 Inta c t
Lizzie D 7001 Co m m e rc iaTl ug /R um runne
T re ars ure 122 84 25.6032 21 6.4008 U nkno wn 10/19/22 8 0 Inta c t
Mo ha wk 7004 P a s s e ng e rO c e a n LineTr re a s ure 8140 402 122.5296 54 16.4592 Co llis io n 1/25/35 163 118 S c a tte re d
R .P . R e s o r 7006 Co m m e rc iaOl il T a nke r T re a s ure 7450 435 132.588 66.8 20.36064 Milita ry 2/28/42 50 2 Bro ke n
S ta r o f S c o tla nd 4002 P a s s e ng e rBritis h Q -BoTaret a s ure 1250 263 80.1624 35 10.668 W e a the r 1/22/42 5 4 Bro ke n
T o lte n 7008 Co m m e rc iaFlre ig hte r F ixture 1858 280 85.344 43 13.1064 Milita ry 3/13/42 28 1 Inta c t
US S Mo o d y 4006 Milita ry W W I De s troT ye
re ar s ure 1308 314 95.7072 31 9.4488 D e lib e ra te 1/1/33 0 Inta c t
Va lia nt 4003 P a s s e ng e rLuxury Mo toTrreYa a scure
ht 444 162.4 49.49952 26 7.9248 F ire 12/17/30 25 25 Inta c t

IS 257 – Fall 2006 2006.09.14 - SLIDE 19


Mapping to Other Models
• Hierarchical
– Need to make decisions about access paths
• Network
– Need to pre-specify all of the links and sets
• Object-Oriented
– What are the objects, datatypes, their
methods and the access points for them
• Object-Relational
– Same as relational, but what new datatypes
might be needed or useful (more on OR later)

IS 257 – Fall 2006 2006.09.14 - SLIDE 20


Lecture Outline

• Review
• Logical Model for the Diveshop
database
• Normalization
• Relational Advantages and
Disadvantages

IS 257 – Fall 2006 2006.09.14 - SLIDE 21


Normalization
• Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets
of relations containing the same data
• Normalization is a multi-step process
beginning with an “unnormalized” relation
– Hospital example from Atre, S. Data Base:
Structured Techniques for Design,
Performance, and Management.
IS 257 – Fall 2006 2006.09.14 - SLIDE 22
Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)

IS 257 – Fall 2006 2006.09.14 - SLIDE 23


Normalization

Functional
dependency
No transitive
of nonkey
dependency
attributes on
between
the primary
nonkey Boyce- key - Atomic
attributes
Codd and values only

All
Higher Full
determinants Functional
are candidate dependency
keys - Single of nonkey
multivalued attributes on
dependency the primary
key

IS 257 – Fall 2006 2006.09.14 - SLIDE 24


Unnormalized Relations
• First step in normalization is to convert the
data into a two-dimensional table
• In unnormalized relations data can repeat
within a column

IS 257 – Fall 2006 2006.09.14 - SLIDE 25


Unnormalized Relation
Patient # Surgeon # Surg. date Patient Name Patient Addr Surgeon Surgery Postop drug
Drug side effects

Gallstone
s removal;
Jan 1, 15 New St. Beth Little Kidney
145 1995; June New York, Michael stones Penicillin, rash
1111 311 12, 1995 John White NY Diamond removal none- none

Eye
Charles Cataract
Apr 5, Field removal
243 1994 May 10 Main St. Patricia Thrombos Tetracyclin Fever
1234 467 10, 1995 Mary Jones Rye, NY Gold is removal e none none
Dogwood
Lane Open
Jan 8, Harrison, David Heart Cephalosp
2345 189 1996 Charles Brown NY Rosen Surgery orin none
55 Boston
Post Road,
Nov 5, Chester, Cholecyst
4876 145 1995 Hal Kane CN Beth Little ectomy Demicillin none
Blind Brook Gallstone
May 10, Mamaronec s
5123 145 1995 Paul Kosher k, NY Beth Little Removal none none
Eye
Cornea
Replacem
Apr 5, Hilton Road ent Eye
1994 Dec Larchmont, Charles cataract Tetracyclin
6845 243 15, 1984 Ann Hood NY Field removal e Fever

IS 257 – Fall 2006 2006.09.14 - SLIDE 26


First Normal Form
• To move to First Normal Form a relation
must contain only atomic values at each
row and column.
– No repeating groups
– A column or set of columns is called a
Candidate Key when its values can uniquely
identify the row in the relation.

IS 257 – Fall 2006 2006.09.14 - SLIDE 27


First Normal Form
Patient # Surgeon # Surgery DatePatient Name Patient Addr Surgeon Name Surgery Drug adminSide Effects

15 New St.
New York, Gallstone
1111 145 01-Jan-95 John W hite NY Beth Little s removal Penicillin rash
15 New St. Kidney
New York, Michael stones
1111 311 12-Jun-95 John W hite NY Diamond removal none none
Eye
10 Main St. Cataract Tetracyclin
1234 243 05-Apr-94 Mary Jones Rye, NY Charles Field removal e Fever

10 Main St. Thrombos


1234 467 10-May-95 Mary Jones Rye, NY Patricia Gold is removal none none
Dogwood
Lane Open
Charles Harrison, Heart Cephalosp
2345 189 08-Jan-96 Brown NY David Rosen Surgery orin none

55 Boston
Post Road,
Chester, Cholecyst
4876 145 05-Nov-95 Hal Kane CN Beth Little ectomy Demicillin none

Blind Brook Gallstone


Mamaronec s
5123 145 10-May-95 Paul Kosher k, NY Beth Little Removal none none
Eye
Hilton Road Cornea
Larchmont, Replacem Tetracyclin
6845 243 05-Apr-94 Ann Hood NY Charles Field ent e Fever

Hilton Road Eye


Larchmont, cataract
6845 243 15-Dec-84 Ann Hood NY Charles Field removal none none

IS 257 – Fall 2006 2006.09.14 - SLIDE 28


1NF Storage Anomalies
• Insertion: A new patient has not yet undergone
surgery -- hence no surgeon # -- Since surgeon # is
part of the key we can’t insert.
• Insertion: If a surgeon is newly hired and hasn’t
operated yet -- there will be no way to include that
person in the database.
• Update: If a patient comes in for a new procedure,
and has moved, we need to change multiple
address entries.
• Deletion (type 1): Deleting a patient record may
also delete all info about a surgeon.
• Deletion (type 2): When there are functional
dependencies (like side effects and drug) changing
one item eliminates other information.
IS 257 – Fall 2006 2006.09.14 - SLIDE 29
Second Normal Form
• A relation is said to be in Second Normal
Form when every nonkey attribute is fully
functionally dependent on the primary
key.
– That is, every nonkey attribute needs the full
primary key for unique identification

IS 257 – Fall 2006 2006.09.14 - SLIDE 30


Second Normal Form

Patient # Patient Name Patient Address


15 New St. New
1111 John White York, NY
10 Main St. Rye,
1234 Mary Jones NY
Charles Dogwood Lane
2345 Brown Harrison, NY
55 Boston Post
4876 Hal Kane Road, Chester,
Blind Brook
5123 Paul Kosher Mamaroneck, NY
Hilton Road
6845 Ann Hood Larchmont, NY

IS 257 – Fall 2006 2006.09.14 - SLIDE 31


Second Normal Form

Surgeon # Surgeon Name

145 Beth Little

189 David Rosen

243 Charles Field

311 Michael Diamond

467 Patricia Gold

IS 257 – Fall 2006 2006.09.14 - SLIDE 32


Second Normal Form
Patient # Surgeon # Surgery Date Surgery Drug Admin Side Effects
Gallstones
1111 145 01-Jan-95 removal
Kidney Penicillin rash
stones
1111 311 12-Jun-95 removal none none
Eye Cataract
1234 243 05-Apr-94 removal Tetracycline Fever
Thrombosis
1234 467 10-May-95 removal none none
Open Heart Cephalospori
2345 189 08-Jan-96 Surgery n none
Cholecystect
4876 145 05-Nov-95 omy Demicillin none
Gallstones
5123 145 10-May-95 Removal none none
Eye cataract
6845 243 15-Dec-84 removal none none
Eye Cornea
6845 243 05-Apr-94 Replacement Tetracycline Fever

IS 257 – Fall 2006 2006.09.14 - SLIDE 33


1NF Storage Anomalies Removed

• Insertion: Can now enter new patients without


surgery.
• Insertion: Can now enter Surgeons who haven’t
operated.
• Deletion (type 1): If Charles Brown dies the
corresponding tuples from Patient and Surgery
tables can be deleted without losing information
on David Rosen.
• Update: If John White comes in for third time,
and has moved, we only need to change the
Patient table

IS 257 – Fall 2006 2006.09.14 - SLIDE 34


2NF Storage Anomalies
• Insertion: Cannot enter the fact that a particular
drug has a particular side effect unless it is given
to a patient.
• Deletion: If John White receives some other drug
because of the penicillin rash, and a new drug
and side effect are entered, we lose the
information that penicillin can cause a rash
• Update: If drug side effects change (a new
formula) we have to update multiple occurrences
of side effects.

IS 257 – Fall 2006 2006.09.14 - SLIDE 35


Third Normal Form
• A relation is said to be in Third Normal Form if
there is no transitive functional dependency
between nonkey attributes
– When one nonkey attribute can be determined with
one or more nonkey attributes there is said to be a
transitive functional dependency.
• The side effect column in the Surgery table is
determined by the drug administered
– Side effect is transitively functionally dependent on
drug so Surgery is not 3NF

IS 257 – Fall 2006 2006.09.14 - SLIDE 36


Third Normal Form

Patient # Surgeon # Surgery Date Surgery Drug Admin

1111 145 01-Jan-95 Gallstones removal Penicillin


Kidney stones
1111 311 12-Jun-95 removal none

1234 243 05-Apr-94 Eye Cataract removal Tetracycline

1234 467 10-May-95 Thrombosis removal none

2345 189 08-Jan-96 Open Heart Surgery Cephalosporin

4876 145 05-Nov-95 Cholecystectomy Demicillin

5123 145 10-May-95 Gallstones Removal none

6845 243 15-Dec-84 Eye cataract removal none


Eye Cornea
6845 243 05-Apr-94 Replacement Tetracycline

IS 257 – Fall 2006 2006.09.14 - SLIDE 37


Third Normal Form

Drug Admin Side Effects


Cephalosporin none
Demicillin none
none none
Penicillin rash
Tetracycline Fever

IS 257 – Fall 2006 2006.09.14 - SLIDE 38


2NF Storage Anomalies Removed

• Insertion: We can now enter the fact that a


particular drug has a particular side effect
in the Drug relation.
• Deletion: If John White recieves some
other drug as a result of the rash from
penicillin, but the information on penicillin
and rash is maintained.
• Update: The side effects for each drug
appear only once.

IS 257 – Fall 2006 2006.09.14 - SLIDE 39


Boyce-Codd Normal Form
• Most 3NF relations are also BCNF
relations.
• A 3NF relation is NOT in BCNF if:
– Candidate keys in the relation are composite
keys (they are not single attributes)
– There is more than one candidate key in the
relation, and
– The keys are not disjoint, that is, some
attributes in the keys are common

IS 257 – Fall 2006 2006.09.14 - SLIDE 40


Most 3NF Relations are also BCNF – Is this
one?
Patient # Patient Name Patient Address
15 New St. New
1111 John W hite York, NY
10 Main St. Rye,
1234 Mary Jones NY
Charles Dogwood Lane
2345 Brown Harrison, NY
55 Boston Post
4876 Hal Kane Road, Chester,
Blind Brook
5123 Paul Kosher Mamaroneck, NY
Hilton Road
6845 Ann Hood Larchmont, NY

IS 257 – Fall 2006 2006.09.14 - SLIDE 41


BCNF Relations

Patient # Patient Name Patient # Patient Address


15 New St. New
1111 John W hite 1111 York, NY
10 Main St. Rye,
1234 Mary Jones 1234 NY
Charles Dogwood Lane
2345 Brown 2345 Harrison, NY
55 Boston Post
4876 Hal Kane 4876 Road, Chester,
Blind Brook
5123 Paul Kosher 5123 Mamaroneck, NY
Hilton Road
6845 Ann Hood 6845 Larchmont, NY

IS 257 – Fall 2006 2006.09.14 - SLIDE 42


Fourth Normal Form
• Any relation is in Fourth Normal Form if it
is BCNF and any multivalued
dependencies are trivial
• Eliminate non-trivial multivalued
dependencies by projecting into simpler
tables

IS 257 – Fall 2006 2006.09.14 - SLIDE 43


Fifth Normal Form
• A relation is in 5NF if every join
dependency in the relation is implied by
the keys of the relation
• Implies that relations that have been
decomposed in previous NF can be
recombined via natural joins to recreate
the original relation.

IS 257 – Fall 2006 2006.09.14 - SLIDE 44


Effectiveness and Efficiency Issues for
DBMS
• Focus on the relational model
• Any column in a relational database can
be searched for values.
• To improve efficiency indexes using
storage structures such as BTrees and
Hashing are used
• But many useful functions are not
indexable and require complete scans of
the the database

IS 257 – Fall 2006 2006.09.14 - SLIDE 45


Example: Text Fields
• In conventional RDBMS, when a text field
is indexed, only exact matching of the text
field contents (or Greater-than and Less-
than).
– Can search for individual words using pattern
matching, but a full scan is required.
• Text searching is still done best (and
fastest) by specialized text search
programs (Search Engines) that we will
look at more later.

IS 257 – Fall 2006 2006.09.14 - SLIDE 46


Normalization
• Normalization is performed to reduce or
eliminate Insertion, Deletion or Update
anomalies.
• However, a completely normalized
database may not be the most efficient or
effective implementation.
• “Denormalization” is sometimes used to
improve efficiency.

IS 257 – Fall 2006 2006.09.14 - SLIDE 47


Normalizing to death
• Normalization splits database information
across multiple tables.
• To retrieve complete information from a
normalized database, the JOIN operation
must be used.
• JOIN tends to be expensive in terms of
processing time, and very large joins are
very expensive.

IS 257 – Fall 2006 2006.09.14 - SLIDE 48


Downward Denormalization
Customer Customer
Before: After:
ID ID
Address Address
Name Name
Telephone Telephone

Order Order
Order No Order No
Date Taken Date Taken
Date Dispatched Date Dispatched
Date Invoiced Date Invoiced
Cust ID Cust ID
Cust Name
IS 257 – Fall 2006 2006.09.14 - SLIDE 49
Upward Denormalization
Order Order
Order No Order No
Date Taken Date Taken
Date Dispatched Date Dispatched
Date Invoiced Date Invoiced
Cust ID Cust ID
Cust Name Cust Name
Order Price
Order Item
Order No Order Item
Item No Order No
Item Price Item No
Num Ordered Item Price
Num Ordered

IS 257 – Fall 2006 2006.09.14 - SLIDE 50


Denormalization
• Usually driven by the need to improve
query speed
• Query speed is improved at the expense
of more complex or problematic DML
(Data manipulation language) for updates,
deletions and insertions.

IS 257 – Fall 2006 2006.09.14 - SLIDE 51


Using RDBMS to help normalize

• Example database: Cookie


• Database of books, libraries, publisher and
holding information for a shared (union)
catalog

IS 257 – Fall 2006 2006.09.14 - SLIDE 52


Cookie relationships

IS 257 – Fall 2006 2006.09.14 - SLIDE 53


Cookie BIBFILE relation
ACCNO AUTHOR TITLE LOC PUBID DATE PRICE PAGINATION ILL HEIGHT
A003 AMERICAN LIBRARY ASSOCIATION
ALA BULLETIN CHICAGO 04 $3.00 63 V. ILL. 26
T082 ANDERSON, THEODORE THE TEACHING OF MODERN
PARIS LANGUAGES 53 1955 $10.95 294 P. 22
C024 AXT, RICHARD G. COLLEGE SELF STUDY BOULDER,
: LECTURES
CO. ON51 INSTITU
1960 $7.00 X, 300 P. GRAPHS 28
B006 BALDERSTON, FREDERICK MANAGING
E. TODAYS SAN
UNIVERSITY
FRANCISCO 27 1975 $6.00 XVI, 307 P. 24
B007 BARZUN, JACQUES TEACHER IN AMERICA GARDEN CITY 18 1954 $7.00 280 P. 18
B005 BARZUN, JACQUES THE AMERICAN UNIVERSITY
NEW YORK : HOW IT 24
RUNS, W 1970 $5.00 XII, 319 P. 20
B008 BARZUN, JACQUES THE HOUSE OF INTELLECT
NEW YORK 24 1961 $8.00 VIII, 271 P. 21
B010 BELL, DANIEL THE COMING OF POST-INDUSTRIAL
NEW YORK SOCIETY
09 1976
: $10.00 XXVII, 507 P. 21
B009 BENSON, CHARLES S. IMPLEMENTING THE SAN LEARNING
FRANCISCO
SOCIETY
27 1974 $9.00 XVII, 147 P. 24
B012 BERG, IVAR EDUCATION AND JOBS BOSTON
: THE GREAT TRAINING
10 1971 $12.00 XX, 200 P. 21
B011 BERSI, ROBERT M. RESTRUCTURING THE W BACCALAUREATE
ASHINGTON, D.C. 03 1973 $11.00 IV, 160P. 23
B014 BEVERIDGE, W ILLIAM I.THE ART OF SCIENTIFIC
NEWINVESTIGATION
YORK 58 1957 $14.00 XIV, 239 P. 18
B013 BIRD, CAROLINE THE CASE AGAINST NEW
COLLEGE YORK 08 1975 $13.00 XII, 308 P. 18
B016 BISSELL, CLAUDE T. THE STRENGTH OF THE TORONTO
UNIVERSITY 57 1968 $14.00 VII, 251 P. 21
B017 BLAIR, GLENN MYERS EDUCATIONAL PSYCHOLOGYNEW YORK 30 1962 $11.00 678 P. 24
F047 BLAKE, ELIAS, JR. THE FUTURE OF THECAMBRIDGE,
BLACK COLLEGES MA.02 1971 $14.25 VIII, PP. 539 23
B116 BOLAND, R.J. CRITICAL ISSUES IN INFORMATION
CHICHESTER, ENG. SYSTEMS
63 1987
R $30.95 XV, 394 P. ILL. 24
S102 BROW N, SANBORN C., SCIENTIFIC
ED. MANPOWCAMBRIDGE,
ER MASS.
29 1971 $4.00 X, 180 P. 26
B118 BUCKLAND, MICHAEL K.LIBRARY SERVICES ELMSFORD,
IN THEORY AND NY CONTEXT
70 1983 $12.00 XII, 201 P. ILL. 23
B018 BUDIG, GENE A. ACADEMIC QUICKSAND LINCOLN,
: SOME NEBRASKA
TRENDS
37 AND1973ISS $13.00 74 P. 23
C031 CALIFORNIA. DEPT. OFLAWJUSTICE
IN THE SCHOOLMONTCLAIR, N.J. 35 1974 $0.50 IV, 87 P. 21
C032 CAMPBELL, MARGARETWA. HY W OULD A GIRLOLD
GO INTOW ESTBURY,
MEDICINE?
48
N.Y. 1973 $1.50 V, 114 P. 24
C034 CARNEGIE COMMISSION A DIGEST
ON HIGHER
OF REPORTS
NEWOFYORK THE CARNEGIE
30 COMM
1974 $3.50 399 P. 24

IS 257 – Fall 2006 2006.09.14 - SLIDE 54


How to Normalize?
• Currently no way to have multiple authors
for a given book, and there is duplicate
data spread over the BIBFILE table
• Can we use the DBMS to help us
normalize?
• Access example…

IS 257 – Fall 2006 2006.09.14 - SLIDE 55


Database Creation in Access
• Simplest to use a design view
– wizards are available, but less flexible
• Need to watch the default values
• Helps to know what the primary key is, or
if one is to be created automatically
– Automatic creation is more complex in other
RDBMS and ORDBMS
• Need to make decision about the physical
storage of the data

IS 257 – Fall 2006 2006.09.14 - SLIDE 56


Database Creation in Access
• Some Simple Examples

IS 257 – Fall 2006 2006.09.14 - SLIDE 57


Lecture Outline

• Review
• Logical Model for the Diveshop
database
• Normalization
• Relational Advantages and
Disadvantages

IS 257 – Fall 2006 2006.09.14 - SLIDE 58


Advantages of RDBMS
• Relational Database Management
Systems (RDBMS)
• Possible to design complex data storage
and retrieval systems with ease (and
without conventional programming).
• Support for ACID transactions
– Atomic
– Consistent
– Independent
– Durable

IS 257 – Fall 2006 2006.09.14 - SLIDE 59


Advantages of RDBMS
• Support for very large databases
• Automatic optimization of searching (when
possible)
• RDBMS have a simple view of the
database that conforms to much of the
data used in business
• Standard query language (SQL)

IS 257 – Fall 2006 2006.09.14 - SLIDE 60


Disadvantages of RDBMS
• Until recently, no real support for complex
objects such as documents, video, images,
spatial or time-series data. (ORDBMS add -- or
make available support for these)
• Often poor support for storage of complex
objects from OOP languages (Disassembling the
car to park it in the garage)
• Usually no efficient and effective integrated
support for things like text searching within fields
(MySQL does have simple keyword searching
now with index support)
IS 257 – Fall 2006 2006.09.14 - SLIDE 61
Next Week
• Database Design Workshop

IS 257 – Fall 2006 2006.09.14 - SLIDE 62

You might also like