Professional Documents
Culture Documents
Introduction To Database Systems CSE 544: Lecture #2 January 16, 2007
Introduction To Database Systems CSE 544: Lecture #2 January 16, 2007
CSE 544
Lecture #2
January 16, 2007
1
Review Questions: NULLS
From Lecture 1:
3
Review Question: ACID
From the reading assignment
SQL for Web Nerds:
4
Discussion of Project/Phase 1
• Task 1: Schema design
5
Task 1: Schema Design
Official requirement
• Read the project description
• Design a “good” database schema
6
Task 1: Schema Design
What you should do:
• Read description AND look inside the
starter code App_code/Provided/…
• Read the classes, determine the fields…
7
Task 1: Schema Design
• Optional: draw an E/R diagram
• Create a file:
CREATE TABLE Customer ( . . . )
CREATE TABLE Invoice ( . . . )
...
9
Task 2: Import Sample Data
INSERT INTO Customer ( . . . )
• Create a file: VALUES (‘John’, ….)
INSERT INTO Customer ( . . . )
VALUES (‘Sue’, ….)
...
10
Task 3: Modify Starter Code
The starter code:
• C#
• ASP.NET (you do not need to understand it)
http://www.ecma-international.org/activities/Languages/Introduction%20to%20Csharp.pdf
12
C# - Highlights
• C# = C++.Sytnax + Java.Semantics
class Hello {
static void Main() {
Console.WriteLine("Hello world");
}
}
14
Properties: Getters and Setters
public class Point {
private int x; Point uvw = new Point();
private string c;
uvw.position = 55;
public int position { uvw.color = “green”;
get { return x; } uvw.position =
set { x = value; c = “red”; } uvw.position * 2;
} if (uvw.color == “green”)
public string color { …
get { return c; }
set { c = value; x++; }
} 15
Indexers
publicGetters
class Stuff { with […]
and setters
private int x[];
uvw[12] = 55;
uvw[99] = uvw[12]*7 + 2;
16
Enum
• Why ?
Nguyen creates file 1, you create file 2
18
Dataset
This is an important class that allows you to
interact with a database
• DataTable
• DataRow
19
DataSet
DataSet myLocalDB = new DataSet();
.....
. . . . . /* create inside a table called “books” */
. . . . . /* (this is shown on a following slide) */
21
Connecting to a Database
/* create inside a table called “books” */
23
Task 3: Modify Starter Code
• What you have to do:
• App_Code/Phase1/Billing and Shipping/…
foreach(datarow r in invoices.Rows) {
invoiceList.Add(new Invoice( r ));
}
Now you need
return invoiceList; to implement this
}
26
Task 3: Modify Starter Code
public partial class Invoice {
public Invoice(DataRow invoiceData) {
/* here goes your code, something like that: */
init(invoiceData); /* may need it in several places */
}
....
private void init(DataRow invoiceData) {
invoiceId = (int) invoiceData[“invoiceId”];
orderDate = (DateTime) invoiceData[“date”];
.....
In Provided In you SQL 27
Task 3 Summary
• The previous slides give you a hint at how to
proceed
• You have to fill in the eight files under Phase1
• While doing this, examine the code under
Provided
• Figuring out WHAT TO DO takes longer than
actually doing it
• Run the project often to see how it behaves
• Compare with working example at
http://iisqlsrv.cs.washington.edu/CSEP544/Phase1
_Example/ 28
Time Estimate
• Task 1: about 9 tables or so, 2-3 hours
29
E/R Diagrams
• E/R diagrams: Chapter 2
30
Database Design
1. Requirements Analysis: e.g. Phase 1 of the project
Objects entities
Classes entity sets Product
price
makes Company
Product
stockprice
buys employs
Person
name category
price
Product
34
What is a Relation ?
• A mathematical definition:
– if A, B are sets, then a relation R is a subset of
AB
1 a
• A={1,2,3}, B={a,b,c,d},
2 b
A B = {(1,a),(1,b), . . ., (3,d)} A=
R = {(1,a), (1,c), (3,b)} c
3
B= d
- makes is a subset of Product Company:
makes Company
Product 35
Multiplicity of E/R Relations
• one-one:
1 a
2 b
3 c
d
• many-one
1 a
2 b
3 c
d
• many-many
1 a
2 b
3 c
d Note: book places arrow differently
36
name category
name
price
makes Company
Product
stockprice
What does
this say ?
buys employs
Person
Product
Purchase Store
Person
Product
Purchase Store
Person
Product
Purchase Store
Person
Product
Purchase Store
Person
Purchase
StoreOf Store
BuyerOf Person
42
3. Design Principles
What’s wrong?
43
Moral: be faithful!
Design Principles:
What’s Wrong?
date
Product
Purchase Store
Product
Purchase Store
Moral: don’t
complicate life more
than it already is.
Person 45
Logical Database Design =
E/R Relations
• Relationship relation
46
Entity Set to Relation
name category
price
Product
makes Company
Product
Stock price
Makes(product-name, product-category, company-name, year)
Product-name Product-Category Company-name Starting-year
makes Company
Product
Stock price
No need for Makes. Modify Product:
Purchase(prodName,stName,ssn)
Person
ssn name
50
Modeling Subclasses
Products
Software Educational
products products
price
Product
isa isa
– SoftwareProduct field1
field2
field3
– EducationalProduct
field1
field2
field4
field5 53
Product
Subclasses to Relations
Name Price Category
Gizmo 99 gadget
Product
p1 p2
p3 ep1
sp1 EducationalProduct
ep2
SoftwareProduct
sp2 ep3
55
Difference between OO and E/R
inheritance
• E/R: entity sets overlap
Product
p1 p2
p3 ep1
sp1 EducationalProduct
ep2
SoftwareProduct sp2 ep3
56
No need for multiple inheritance in E/R
Product
p1 p2
p3 ep1
sp1 EducationalProduct
ep2
SoftwareProduct sp2
esp1 esp2 ep3
FurniturePiece
Company
Person
ownedByPerson ownedByPerson
59
Modeling Union Types with
Subclasses
Solution 2: better, more laborious
Owner
isa isa
ownedBy
Person Company
FurniturePiece
60
Constraints in E/R Diagrams
Finding constraints is part of the modeling process.
Commonly used constraints:
price
makes
v. s.
makes
63
Referential Integrity Constraints
<100
Product makes Company
65
Weak Entity Sets
Entity sets are weak when their key comes from other
classes to which they are related.
67
Schema Refinements = Normal
Forms
• 1st Normal Form = all tables are flat
• 2nd Normal Form = obsolete
• Boyce Codd Normal Form = will study
• 3rd Normal Form = see book
68
First Normal Form (1NF)
• A database schema is in First Normal Form
if all tables are flat Student
Student Name GPA
Alice 3.8
Name GPA Courses Bob 3.7
Carol 3.9
Math
Relational Model:
plus FD’s
Normalization:
Eliminates anomalies
70
Data Anomalies
When a database is poorly designed we get anomalies:
71
Relational Schema Design
Recall set attributes (persons with several phones):
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
One person may have multiple phones, but lives in only one city
Anomalies:
• Redundancy = repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number:
what is his city ? 72
Relation Decomposition
Break the relation into two:
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
74
Functional Dependencies
• A form of constraint
– hence, part of the schema
• Finding them is part of the database design
• Also used in normalizing the relations
75
Functional Dependencies
Definition:
R
A1 ... Am B1 ... Bm
t’
Position Phone
79
Example
80
Example
FD’s are constraints:
• On some instances they hold name color
• On others they don’t category department
color, category price
name color
If all these FDs are true: category department
color, category price
Why ?? 83
Goal: Find ALL Functional
Dependencies
• Anomalies occur when certain “bad” FDs
hold
A1 ... Am B1 ... Bm
A1, A2, …, An B1
A1, A2, …, An B2
.....
A1, A2, …, An Bm
85
Armstrong’s Rules (1/3)
where i = 1, 2, ..., n
A1 … Am
Why ?
86
Armstrong’s Rules (1/3)
88
Example (continued)
Start from the following FDs: 1. name color
2. category department
3. color, category price
Infer the following FDs:
Which Rule
Inferred FD
did we apply ?
4. name, category name
5. name, category color
6. name, category category
7. name, category color, category
8. name, category price 89
Example (continued)
1. name color
Answers: 2. category department
3. color, category price
Which Rule
Inferred FD
did we apply ?
4. name, category name Trivial rule
5. name, category color Transitivity on 4, 1
6. name, category category Trivial rule
7. name, category color, category Split/combine on 5, 6
8. name, category price Transitivity on 3, 7
90
THIS IS TOO HARD ! Let’s see an easier way.
Closure of a set of Attributes
Given a set of attributes A1, …, An
{name, category}+ =
{ name, category, color, department, price }
R(A,B,C,D,E,F) A, B C
A, D E
B D
A, F B
93
Why Do We Need Closure
• With closure we can find all FD’s easily
• To check if X A
– Compute X+
– Check if A X+
94
Using Closure to Infer ALL FDs
Example:
A, B C
A, D B
B D
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ =ABCD, AC+=AC, AD+=ABCD,
BC+=BCD, BD+=BD, CD+=CD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X Y, s.t. Y X+ and XY = :
AB CD, ADBC, ABC D, ABD C, ACD B 95
Another Example
• Enrollment(student, major, course, room, time)
student major
major, course room
course time
96
Keys
• A superkey is a set of attributes A1, ..., An s.t. for
any other attribute B, we have A1, ..., An B
97
Computing (Super)Keys
• Compute X+ for all sets X
• If X+ = all attributes, then X is a key
• List only the minimal X’s
98
Example
Product(name, price, category, color)
99
Example
Product(name, price, category, color)
student address
room, time course
student, course room, time
101
Eliminating Anomalies
Main idea:
• X A is OK if X is a (super)key
• X A is not OK otherwise
102
Example
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Joe 987-65-4321 908-555-1234 Westfield
104
Key or Keys ?
Can we have more than one key ?
Equivalently:
X, either (X+ = X) or (X+ = all attributes)
106
BCNF Decomposition Algorithm
repeat
choose A1, …, Am B1, …, Bn that violates BNCF
split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
continue with both R1 and R2
until no more violations
Is there a
2-attribute
relation that is
B’s A’s Others not in BCNF ?
In practice, we have
R1 R2 107
a better algorithm (coming up)
Example
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Joe 987-65-4321 908-555-1234 Westfield
109
Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN name, age
age hairColor
110
BCNF Decomposition Algorithm
BCNF_Decompose(R)
let Y = X+ - X
let Z = [all attributes] - X+
decompose R into R1(X Y) and R2(X Z)
continue to decompose recursively R1 and R2
111
Find X s.t.: X ≠X+ ≠ [all attributes]
Iteration 1: Person
SSN+ = SSN, name, age, hairColor
Decompose into: P(SSN, name, age, hairColor)
Phone(SSN, phoneNumber)
Iteration 2: P
age+ = age, hairColor What are
Decompose: People(SSN, name, age) the keys ?
Hair(age, hairColor)
Phone(SSN, phoneNumber) 112
R(A,B,C,D)
A B
BC
Example
R(A,B,C,D)
A+ = ABC ≠ ABCD
R1(A,B,C) R2(A,D)
B+ = BC ≠ ABC
What are
R11(B,C) R12(A,B) the keys ?
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
114
Theory of Decomposition
• Sometimes it is correct:
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
115
Lossless decomposition
Incorrect Decomposition
• Sometimes it is not:
Name Price Category
Gizmo 19.99 Gadget What’s
OneClick 24.99 Camera incorrect ??
Gizmo 19.99 Camera
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
Unit Company
120
Solution: 3rd Normal Form
(3NF)
A simple condition for removing anomalies from relations:
Tradeoff:
BCNF = no anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies 121
Computing 3NF
• Basic idea: whenever there is an FD of the
form X A, then add a table XA to the
decomposition
• Read the book
– I don’t find the explanation there too good
• In practice: use BCNF and enforce missing
FD’s in the application
122