Spatial Data Base Chapter Two - Data Modeling 4th Year

Chapter Two

Data Modeling

Outline
1. Overview
2. Data Modeling Using The Entity-Relationship Model
– Conceptual Data Models For Database Design
– An Example Database Application
– Entity Types, Entity Sets, Attributes, and Keys
– Relationship Types
– Design Choices For ER Conceptual Design

3. The Relational Data Model & Relational Database Constraints


– Relational Model Concepts
– Relational Model Constraints And Relational Database Schemas
– Update Operations And Dealing With Constraint Violations
1. Overview

1.1 What is a Data Model?


 What is a model? (Dictionary meaning)
 A set of plans (blueprint drawing) for a building
 A small representation of a system to analyze properties of interest
 What is a Data Model?
 Specifies the structure or schema of a data set
 Documents the description of the data
 Facilitates early analysis of some properties, e.g. querying ability,
redundancy, consistency, storage space requirements, etc.
 Example:
 Databases organize a dataset as a collection of tables

1.2 Why Data Models?
 Data models facilitate
 Early analysis of properties, e.g. storage cost, querying ability, ...
 Reuse of shared data among multiple applications
 Exchange of data across organizations
 Conversion of data to new software / environment
 Example: the Y2K crisis of the year 2000
 Many computer software systems were developed without well-defined
data models in the 1960s and 1970s. These systems used a variety
of data models for representing time and date. Some of the
representations used two digits to represent years. In the late 1990s, people
worried that the two-digit representation of the year might lead to erroneous
behavior.
 For example, the age of a person born in 1960 (represented as 60) in the year
2000 (represented as 00) may appear negative and may be flagged as an
illegal data item. A large amount of effort and resources (hundreds of
billions of dollars) was spent on revising the software.

 Proper use of a data model might have significantly reduced these costs. If
time and date had been modeled as abstract data types in the software, only the
small portion of the software implementing the date ADT would have had to be
reviewed and revised.
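
The age example can be reproduced in a few lines. This is only a minimal sketch, not code from any affected system: the function names age_two_digit and age_with_date_type are hypothetical, and Python's standard datetime.date stands in for the date ADT mentioned above.

```python
from datetime import date


def age_two_digit(birth_yy, current_yy):
    """Age computed directly from two-digit years, as in the Y2K scenario."""
    return current_yy - birth_yy


def age_with_date_type(birth, today):
    """Age computed from full dates held in a date abstract data type."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


# Born in 1960 (stored as 60), evaluated in 2000 (stored as 00).
print(age_two_digit(60, 0))                                     # -60
print(age_with_date_type(date(1960, 5, 1), date(2000, 6, 1)))   # 40
```

Because every caller goes through the date type, a change of representation would be confined to that type rather than spread across the whole program.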

 So we have to follow some predefined steps (the software engineering life cycle)
so as to mitigate some of these unforeseen problems.

2. Data Modeling Using The Entity-Relationship Model

2.1 Database Design phases

Phase One: Requirement collection and analysis
 During this step, the database designers interview prospective
database users to understand and document their data requirements.
 The result of this step is a concisely written set of users' requirements.
 In parallel with specifying the data requirements, it is useful to specify
the known functional requirements of the application.

FIGURE 2.1 : A simplified diagram of main phases of database design.

• Phase Two: The Conceptual Schema
– It is a concise description of the data requirements of the users and
includes detailed descriptions of the entity types, relationships,
and constraints; these are expressed using the concepts provided
by the high-level data model.
– The high-level conceptual schema can also be used as a reference
to ensure that all users' data requirements are met and that the
requirements do not conflict.
• Phase Three: The Logical Design
– It is the actual implementation of the database, using a
commercial DBMS.
– The conceptual schema is transformed from the high-level data model
into the implementation data model.
• Phase Four: The Physical Design
– During this phase, the internal storage structures, indexes (access
paths), and file organizations for the database files are specified.
2.2 AN EXAMPLE DATABASE APPLICATION
 Assume that we want to create a database application called COMPANY.
 The COMPANY database keeps track of a company's employees,
departments, and projects. Suppose that after the requirements
collection and analysis phase, the database designers provided the
following description of the "miniworld":
i. The company is organized into departments. Each department has a
unique name, a unique number, and a particular employee who
manages the department. We keep track of the start date when that
employee began managing the department.
ii. A department controls a number of projects, each of which has a
unique name, a unique number, and a single location.
iii. We store each employee's name, social security number, address,
salary, sex, and birth date. An employee is assigned to one
department but may work on several projects, which are not
necessarily controlled by the same department. We keep track of the
number of hours per week that an employee works on each project.
We also keep track of the direct supervisor of each employee.
iv. We want to keep track of the dependents of each employee for
insurance purposes. We keep each dependent's first name, sex, birth
date, and relationship to the employee.

FIGURE 2.2 : An ER schema diagram for the COMPANY database.
2.3 ENTITY TYPES, ENTITY SETS, ATTRIBUTES, AND KEYS
• Entities and Attributes: The basic object that the ER model represents
is an entity, which is a "thing" in the real world with an independent
existence. An entity may be an object with a physical existence (for
example, a particular person, car, house, or employee) or it may be an
object with a conceptual existence (for example, a company, a job, or a
university course).
• Each entity has attributes, the particular properties that describe it. For
example, an employee entity may be described by the employee's
name, age, address, salary, and job. A particular entity will have a value
for each of its attributes. The attribute values that describe each entity
become a major part of the data stored in the database.
• WEAK ENTITY TYPES: Entity types that do not have key attributes of their
own are called weak entity types.
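
The difference between an ordinary entity type and a weak entity type can be sketched with two small classes. This is an illustrative sketch only: the class names, field names, and sample values are made up for this example, and identifying a dependent by the owner's key plus the dependent's first name is an assumption based on the usual textbook treatment of weak entities, not something stated in the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Employee:
    """Entity type with a key attribute of its own (ssn)."""
    ssn: str
    name: str
    address: str
    salary: float
    sex: str
    birth_date: date


@dataclass(frozen=True)
class Dependent:
    """Weak entity type: none of its own attributes is a key."""
    employee_ssn: str   # key of the owning EMPLOYEE entity
    first_name: str
    sex: str
    birth_date: date
    relationship: str

    def identity(self):
        # Assumption: owner key + first name identifies a dependent.
        return (self.employee_ssn, self.first_name)


# Invented sample entities.
emp1 = Employee("111", "Abebe", "Addis Ababa", 9000.0, "M", date(1980, 1, 15))
emp2 = Employee("222", "Sara", "Adama", 9500.0, "F", date(1985, 7, 2))
dep1 = Dependent(emp1.ssn, "Liya", "F", date(2010, 3, 9), "daughter")
dep2 = Dependent(emp2.ssn, "Liya", "F", date(2012, 8, 21), "daughter")

print(dep1.first_name == dep2.first_name)   # True: the name alone is not unique
print(dep1.identity() == dep2.identity())   # False: owner key tells them apart
```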
FIGURE 2.3 : Preliminary design of entity types for the COMPANY
database.

2.4 RELATIONSHIP TYPES
 There are several implicit relationships among the various entity types.
 For example, the attribute Manager of DEPARTMENT refers to an
employee who manages the department;
 The attribute Controlling Department of PROJECT refers to the
department that controls the project;
 The attribute Department of EMPLOYEE refers to the department
for which the employee works; and so on.
 In the ER model, these references should not be represented as
attributes but as relationships, which are discussed later on.
• In ER diagrams, relationship types are displayed as diamond-shaped
boxes, which are connected by straight lines to the rectangular boxes
representing the participating entity types.
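
The idea of modeling a reference as a relationship rather than as an attribute can be sketched as follows. This is a hedged illustration: the relationship name WorksOn, the field names, and the sample data are assumptions chosen for this example. The hours value belongs to the relationship instance, not to the employee or the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    ssn: str
    name: str


@dataclass(frozen=True)
class Project:
    number: int
    name: str


@dataclass(frozen=True)
class WorksOn:
    """One instance of a relationship type between EMPLOYEE and PROJECT."""
    employee: Employee
    project: Project
    hours: float   # attribute of the relationship itself


# Invented sample entities and relationship instances.
e1 = Employee("111", "Abebe")
e2 = Employee("222", "Sara")
p1 = Project(1, "ProductX")
p2 = Project(2, "ProductY")

works_on = {WorksOn(e1, p1, 32.5), WorksOn(e1, p2, 7.5), WorksOn(e2, p1, 20.0)}


def projects_of(emp):
    """Names of the projects an employee participates in."""
    return sorted(w.project.name for w in works_on if w.employee == emp)


def hours_of(emp):
    """Total weekly hours over all projects of an employee."""
    return sum(w.hours for w in works_on if w.employee == emp)


print(projects_of(e1))   # ['ProductX', 'ProductY']
print(hours_of(e1))      # 40.0
```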
FIGURE 2.5 : Summary of the notation for ER diagrams.

FIGURE 2.6 : Part of an ER diagram for a COMPANY database.

FIGURE 2.7 : Part of an ER diagram for a COURSES database.

2.5 CONCEPTUAL DESIGN FOR LARGE ENTERPRISES

 We have thus far concentrated on the constructs available in the ER
model for describing various application concepts and relationships.
But the process of conceptual design consists of more than just
describing small fragments of the application in terms of ER
diagrams.
 For a large enterprise, the design may require the efforts of more
than one designer and span data and application code used by a
number of user groups.
 Using a high-level, semantic data model such as ER diagrams for
conceptual design in such an environment offers the additional
advantage that the high-level design can be diagrammatically
represented and is easily understood by the many people who must
provide input to the design process.
 An important aspect of the design process is the methodology used to
structure the development of the overall design and to ensure that the
design takes into account all user requirements and is consistent.
 The usual approach is that the requirements of various user groups
are considered, any conflicting requirements are somehow resolved,
and a single set of global requirements is generated at the end of the
requirements analysis phase.
 Generating a single set of global requirements is a difficult task, but
it allows the conceptual design phase to proceed with the
development of a logical schema that spans all the data and
applications throughout the enterprise.
Exercise one : The Prescriptions-R-X chain of pharmacies has offered to
give you a free lifetime supply of medicines if you design its database.
Here's the information that you gather:
– Patients are identified by an SID, and their names, addresses, and ages
must be recorded.
– Doctors are identified by an EID. For each doctor, the name, specialty,
and years of experience must be recorded.
– Each pharmaceutical company is identified by name and has a phone
number.
– For each drug, the trade name and formula must be recorded. Each
drug is sold by a given pharmaceutical company, and the trade name
identifies a drug uniquely from among the products of that company. If a
pharmaceutical company is deleted, you need not keep track of its
products any longer.
– Each pharmacy has a name, address, and phone number.
– Every patient has a primary physician. Every doctor has at least one
patient.
– Each pharmacy sells several drugs and has a price for each. A drug
could be sold at several pharmacies, and the price could vary from
one pharmacy to another.
– Doctors prescribe drugs for patients. A doctor could prescribe one or
more drugs for several patients. Each prescription has a date and a
quantity associated with it.
– Pharmaceutical companies have long-term contracts with
pharmacies. A pharmaceutical company can contract with several
pharmacies, and a pharmacy can contract with several
pharmaceutical companies. For each contract, you have to store a
start date, an end date, and the text of the contract.
– Pharmacies appoint a supervisor for each contract. There must
always be a supervisor for each contract, but the contract supervisor
can change over the lifetime of the contract.
Draw an ER diagram that captures the above information.
Exercise two: Nahom Records has decided to store information about musicians
who perform on its albums (as well as other company data) in a
database. The company has wisely chosen to hire you as a database
designer (at your usual consulting fee of 600 Birr/day).
– Each musician that records at Nahom has an SSN, a name, an address, and a phone
number.
– Each instrument that is used in songs recorded at Nahom has a name (e.g., guitar,
synthesizer, flute).
– Each album that is recorded on the Nahom label has a title, a copyright date, a
format (e.g., CD, VCD, DVD), and an album identifier.
– Each song recorded at Nahom has a title and an author.
– Each musician may play several instruments, and a given instrument may be played
by several musicians.
– Each album has a number of songs on it, but no song may appear on more than one
album.
– Each song is performed by one or more musicians, and a musician may perform a
number of songs.
– Each album has exactly one musician who acts as its producer. A musician may
produce several albums, of course.
• Design a conceptual schema for Nahom and draw an ER diagram
for your schema. The information above describes the situation that the Nahom database
must model. Be sure to indicate all key and cardinality constraints and any assumptions that you make.
Identify any constraints that you are unable to capture in the ER diagram and briefly explain why you
could not express them.
3. Relational Model Concepts
 The relational model represents the database as a collection of
relations, where each relation is a table with rows and
columns.
 In the formal relational model terminology, a row is called a
tuple, a column header is called an attribute, and the table is
called a relation. The data type describing the types of values
that can appear in each column is represented by a domain of
possible values.
 A common method of specifying a domain is to specify a data
type from which the data values forming the domain are drawn.
Examples:
 Local_phone_numbers: The set of ten-digit phone numbers.
 Names: The set of character strings that represent names of persons.
 Grade_point_averages: Possible values of computed grade point averages;
each must be a real (floating-point) number between 0 and 4.
 Employee_ages: Possible ages of employees of a company; each must be
a value between 18 and 65 years old.
 Date of birth: Values can have various formats, such as yyyy-mm-dd or dd mm, yyyy.
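
A domain can be read as a membership test on values. The following is a minimal sketch of three of the domains listed above as Python predicates; the function names are hypothetical, and treating a phone number as a string of exactly ten ASCII digits is an assumption about the intended format.

```python
import re


def in_local_phone_numbers(value):
    """Local_phone_numbers: strings of exactly ten digits."""
    return re.fullmatch(r"[0-9]{10}", value) is not None


def in_grade_point_averages(value):
    """Grade_point_averages: real numbers between 0 and 4."""
    return 0.0 <= value <= 4.0


def in_employee_ages(value):
    """Employee_ages: whole numbers between 18 and 65."""
    return isinstance(value, int) and 18 <= value <= 65


print(in_local_phone_numbers("0911223344"))   # True
print(in_local_phone_numbers("09112233"))     # False
print(in_grade_point_averages(4.3))           # False
print(in_employee_ages(40))                   # True
```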
 An example of a relation schema that describes university
students is the following:
– STUDENT(Name: string, SSN: string, HomePhone: string,
Address: string, DateofBirth: DateTime, Age: integer, GPA: real)
 For this relation schema, STUDENT is the name of the relation,
which has seven attributes. In the above definition, we showed
assignment of generic types such as string or integer to the
attributes.

 Key of a Relation: Each row has a value of a data item (or set of items) that
uniquely identifies that row in the table, called the key. Here SSN is the key.
 Remark: we display the relation as a table, where each tuple is shown as a row and each attribute
corresponds to a column header indicating a role or interpretation of the values in that column. Null values
represent attributes whose values are unknown or do not exist for some individual STUDENT tuple.
 The degree of a relation is the number of attributes n of its
relation schema.
 A relation schema R of degree n is denoted by
R(A1 , A2 , …, An )
 The domain of Ai is denoted by dom(Ai).
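
The notation R(A1, A2, ..., An) and dom(Ai) can be made concrete with the STUDENT schema. This is an illustrative sketch: the helper names degree and dom mirror the notation above but are otherwise hypothetical, and mapping string, integer, real, and DateTime to the Python types str, int, float, and datetime is an assumption.

```python
from datetime import datetime

# Relation schema STUDENT: each attribute name is paired with its domain.
STUDENT = {
    "Name": str,
    "SSN": str,
    "HomePhone": str,
    "Address": str,
    "DateofBirth": datetime,
    "Age": int,
    "GPA": float,
}


def degree(schema):
    """Degree of a relation: the number of attributes in its schema."""
    return len(schema)


def dom(schema, attribute):
    """Domain of one attribute of the schema."""
    return schema[attribute]


print(degree(STUDENT))                # 7
print(dom(STUDENT, "GPA").__name__)   # float
```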

DEFINITION SUMMARY

Informal Terms          Formal Terms
Table                   Relation
Column                  Attribute/Domain
Row                     Tuple
Values in a column      Domain
Table Definition        Schema of a Relation
3.1 Characteristics of Relations
 Ordering of tuples in a relation r(R):
– A relation is defined as a set of tuples. Mathematically,
elements of a set have no order among them; hence,
tuples in a relation do not have any particular order.
 Ordering of attributes in a relation schema R (and of values
within each tuple):
 We will consider the attributes in R(A1, A2, ..., An) and
the values in t = <v1, v2, ..., vn> to be ordered.
 However, the order of attributes and their values is not that
important as long as the correspondence between
attributes and values is maintained.
 Values in a tuple: All values are considered atomic
(indivisible). A special null value is used to represent
values that are unknown or inapplicable to certain tuples.
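
These characteristics can be checked with ordinary Python collections. The sketch below is illustrative only: the sample tuples are invented, a Python set stands in for a relation state, a dict stands in for a tuple whose values are tied to attribute names, and None stands in for the null value.

```python
# A relation state as a set of tuples: listing order does not matter.
r1 = {("Abebe", "111", 3.2), ("Sara", "222", 3.8)}
r2 = {("Sara", "222", 3.8), ("Abebe", "111", 3.2)}
print(r1 == r2)   # True

# When each value is paired with its attribute name, attribute order
# does not matter either, because the correspondence is preserved.
t1 = {"Name": "Abebe", "SSN": "111", "GPA": 3.2}
t2 = {"GPA": 3.2, "SSN": "111", "Name": "Abebe"}
print(t1 == t2)   # True

# A null value marks an unknown or inapplicable attribute value.
t3 = {"Name": "Sara", "SSN": "222", "GPA": None}
print(t3["GPA"] is None)   # True
```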
3.2 Relational Integrity Constraints
 So far, we have discussed the characteristics of single relations. In a
relational database, there will typically be many relations, and the
tuples in those relations are usually related in various ways.
 There are generally many restrictions or constraints on the actual
values in a database state. These constraints are derived from the rules
in the miniworld that the database represents.
 Constraints are conditions that must hold on all valid relation states.
 There are three main types of constraints in the relational model:
 Key constraints
 Entity integrity constraints
 Referential integrity constraints
 Another implicit constraint is the domain constraint
 Every value in a tuple must be from the domain of its attribute (or it
could be null, if allowed for that attribute)
i. Key Constraints
 A relation is defined as a set of tuples. By definition, all elements
of a set are distinct; hence, all tuples in a relation must also be
distinct. This means that no two tuples can have the same
combination of values for all their attributes.
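 A key is a minimal set of attributes whose combined values are
distinct in every tuple; such a set is also called a candidate key.
 If a relation has several candidate keys, one is chosen arbitrarily
to be the primary key.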
 The primary key attributes are underlined.
 Example: Consider the CAR relation schema:
 CAR(License#, SerialNo, Make, Model, Year): We can choose
SerialNo as the primary key.
 The primary key value is used to uniquely identify each tuple in a
relation and provides the tuple identity.
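
The CAR example can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions, not the course's own implementation: the column name LicenseNo replaces License# to avoid the # character, the sample rows are invented, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CAR (
        LicenseNo TEXT NOT NULL UNIQUE,        -- the other candidate key
        SerialNo  TEXT NOT NULL PRIMARY KEY,   -- chosen as the primary key
        Make      TEXT,
        Model     TEXT,
        Year      INTEGER
    )
""")
conn.execute(
    "INSERT INTO CAR VALUES ('AA-12345', 'SN-001', 'Toyota', 'Corolla', 2015)"
)


def try_insert(row):
    """Insert one CAR tuple and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO CAR VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Same SerialNo as an existing tuple: violates the primary key.
dup_serial = try_insert(("AA-99999", "SN-001", "Ford", "Focus", 2018))
# Same LicenseNo as an existing tuple: violates the other candidate key.
dup_license = try_insert(("AA-12345", "SN-002", "Ford", "Focus", 2018))
# Both key values are new: the tuple is accepted.
fresh = try_insert(("AA-99999", "SN-002", "Ford", "Focus", 2018))

print(dup_serial)
print(dup_license)
print(fresh)
```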
 Relational Database Schema: A set S of relation schemas that
belong to the same database.
 S is the name of the whole database schema
 S = {R1, R2, ..., Rn}
 R1, R2, …, Rn are the names of the individual relation
schemas within the database S

Schema Diagram for the COMPANY Relational Database Schema

One possible database state for the COMPANY relational database schema

ii. Entity Integrity
 The primary key attributes PK of each relation schema R in S
cannot have null values in any tuple of r(R).
 This is because primary key values are used to identify
the individual tuples.
 t[PK] ≠ null for any tuple t in r(R)
 If PK has several attributes, null is not allowed in any of
these attributes
 Note: Other attributes of R may be constrained to disallow null values, even
though they are not members of the primary key.
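
Entity integrity for a multi-attribute primary key can be sketched with sqlite3. The table and column names (WORKS_ON, Essn, Pno, Hours) and the sample values are assumptions made for this example. NOT NULL is written out on each key column because SQLite does not automatically reject nulls in such primary key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE WORKS_ON (
        Essn  TEXT    NOT NULL,   -- part of the primary key
        Pno   INTEGER NOT NULL,   -- part of the primary key
        Hours REAL,               -- ordinary attribute, null allowed here
        PRIMARY KEY (Essn, Pno)
    )
""")


def try_insert(row):
    """Insert one WORKS_ON tuple and report whether it was accepted."""
    try:
        conn.execute("INSERT INTO WORKS_ON VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


null_first_part = try_insert((None, 1, 10.0))    # null in one key attribute
null_second_part = try_insert(("111", None, 10.0))
null_non_key = try_insert(("111", 1, None))      # null outside the key

print(null_first_part)
print(null_second_part)
print(null_non_key)
```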
iii. Referential Integrity
 A constraint involving two relations.
 Used to specify a relationship among tuples in two relations:
the referencing relation and the referenced relation.
 Tuples in the referencing relation R1 have attributes FK (called
foreign key attributes) that reference the primary key
attributes PK of the referenced relation R2. A tuple t1 in R1 is
said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
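
The condition t1[FK] = t2[PK] can be observed with sqlite3. This is a hedged sketch: EMPLOYEE plays the referencing relation R1 and DEPARTMENT the referenced relation R2, the column names (Dnumber, Dno, Ssn) and sample rows are assumptions, and insert_employee is a hypothetical helper. SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enable foreign key enforcement
conn.execute(
    "CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Ssn  TEXT NOT NULL PRIMARY KEY,
        Name TEXT NOT NULL,
        Dno  INTEGER REFERENCES DEPARTMENT(Dnumber)   -- foreign key FK
    )
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")


def insert_employee(ssn, name, dno):
    """Insert one EMPLOYEE tuple and report whether it was accepted."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", (ssn, name, dno))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


existing_dept = insert_employee("111", "Abebe", 5)   # department 5 exists
missing_dept = insert_employee("222", "Sara", 9)     # no department 9
null_dept = insert_employee("333", "Kebede", None)   # null FK references nothing

print(existing_dept)
print(missing_dept)
print(null_dept)
```

In this sketch a null foreign key is accepted because the Dno column allows nulls; a schema could forbid that with NOT NULL.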
Displaying a relational database schema and its constraints

 Each relation schema can be displayed as a row of attribute names
 The name of the relation is written above the attribute names
 The primary key attribute (or attributes) will be underlined
 A foreign key (referential integrity) constraint is displayed as a
directed arc (arrow) from the foreign key attributes to the
referenced table
 The arrow can also point to the primary key of the referenced relation for
clarity
 Next slide shows the COMPANY relational schema diagram

3.3 Populated database state

 Each relation will have many tuples in its current relation state
 The relational database state is a union of all the individual
relation states
 Whenever the database is changed, a new state arises
 Basic operations for changing the database:
 INSERT a new tuple in a relation
 DELETE an existing tuple from a relation
 MODIFY an attribute of an existing tuple
 Next slide shows an example state for the COMPANY database

Populated database state for COMPANY

Possible violations for each operation
 INSERT may violate any of the constraints:
– Domain constraint:
• if one of the attribute values provided for the new tuple is not
of the specified attribute domain
– Key constraint:
• if the value of a key attribute in the new tuple already exists
in another tuple in the relation
– Entity integrity:
• if the primary key value is null in the new tuple
– Referential integrity:
• if a foreign key value in the new tuple references a primary
key value that does not exist in the referenced relation

• DELETE may violate only referential integrity:
– If the primary key value of the tuple being deleted is
referenced from other tuples in the database
• UPDATE may violate the domain constraint and a NOT NULL
constraint on an attribute being modified
– Any of the other constraints may also be violated,
depending on the attribute being updated:
• Updating the primary key (PK):
– Similar to a DELETE followed by an INSERT, so it may violate
key, entity integrity, and referential integrity constraints
• Updating a foreign key (FK):
– May violate referential integrity
• Updating an ordinary attribute (neither PK nor FK):
– Can only violate domain constraints
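
The DELETE and UPDATE cases can be tried against a small two-table database. This is a sketch under assumptions: the table and column names and the sample rows are invented, attempt is a hypothetical helper, and the DBMS is left at SQLite's default behavior of rejecting a violating operation. A real design may instead choose options such as cascading the change or setting the foreign key to null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enable foreign key enforcement
conn.execute(
    "CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Ssn  TEXT NOT NULL PRIMARY KEY,
        Name TEXT NOT NULL,
        Dno  INTEGER REFERENCES DEPARTMENT(Dnumber)
    )
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (4, 'Administration'), (5, 'Research')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('111', 'Abebe', 5), ('222', 'Sara', 4)")


def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# DELETE of a tuple that is still referenced: referential integrity.
delete_referenced = attempt("DELETE FROM DEPARTMENT WHERE Dnumber = 5")
# UPDATE of the primary key to a value that already exists: key constraint.
update_pk = attempt("UPDATE EMPLOYEE SET Ssn = '222' WHERE Ssn = '111'")
# UPDATE of the foreign key to a missing department: referential integrity.
update_fk = attempt("UPDATE EMPLOYEE SET Dno = 9 WHERE Ssn = '111'")
# UPDATE that sets a NOT NULL attribute to null.
update_null = attempt("UPDATE EMPLOYEE SET Name = NULL WHERE Ssn = '111'")
# UPDATE of an ordinary attribute with a valid value.
update_plain = attempt("UPDATE EMPLOYEE SET Name = 'Abebe K.' WHERE Ssn = '111'")

for result in (delete_referenced, update_pk, update_fk, update_null, update_plain):
    print(result)
```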
Exercise

Consider the following relations for a database that keeps track of student
enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this
schema.
