
Modeling Data Objects

LO1 Identify Entities and Relationships


1.1 Understanding and analyzing business operations
Entity Types, Entity Sets, Attributes and Keys
ER Model Overview
The ER model describes data in terms of:
• Entities and entity sets: objects
• Relationships and relationship sets: connections between objects
• Attributes: properties that characterize or describe entities or relationships

Entities and Attributes Example

Entity Sets (also called entity types)
• A collection (or set) of similar entities that have the same attributes
• The ER model defines entity sets, not individual entities
• Entity sets are, however, described in terms of their attributes

BY:- MM Page 1

Categories of Attributes
• Simple (atomic, one value) or composite (divided into sub-parts), e.g. Name = surname + first_name
• Single-valued or multi-valued
   – Single-valued: one value for a particular entity
   – Multi-valued: a set of values is possible for one entity, e.g. a Student has a number of telephone numbers
• Optional attributes: unknown, not applicable, or missing values are possible for the attribute
• Stored or derived: the value of a derived attribute is not stored but computed from other attributes when required, e.g. net pay (gross – tax) or the number of employees on a project
• Key attributes: attribute values are constrained to be distinct for the individual entities in the entity set
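
The attribute categories above can be sketched in code. This is only an illustration, assuming a hypothetical Employee entity; the class and field names (emp_id, phones, office, and so on) are invented for the example and do not come from the Company diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Name:
    """Composite attribute: divided into sub-parts."""
    surname: str
    first_name: str


@dataclass
class Employee:
    emp_id: int                   # key attribute: distinct for every entity
    name: Name                    # composite attribute
    gross: float                  # simple, single-valued, stored
    tax: float                    # simple, single-valued, stored
    phones: List[str] = field(default_factory=list)  # multi-valued
    office: Optional[str] = None  # optional: may be unknown or not applicable

    @property
    def net_pay(self) -> float:
        """Derived attribute: computed when required, not stored."""
        return self.gross - self.tax


emp = Employee(1, Name("Doe", "Jane"), gross=1000.0, tax=150.0,
               phones=["555-0100", "555-0101"])
print(emp.net_pay)  # 850.0
```

Because net_pay is a property, it can never disagree with gross and tax, which is the usual reason for deriving a value instead of storing it.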

Initial ER Diagram for Company


• Four entity types
• Most attributes are simple, single-valued, and stored
• Works_on and Locations are multi-valued
• Employee’s Name is composite
• Employee has one key, Department and Project have two keys, Dependent has none

Relationship Sets
• Associations between entity sets that express some real-world relationship
• Represented as diamonds in an E-R diagram
• Project and Employee participate in the employs relationship


• The function that an entity plays in a relationship is called that entity’s role

Relationships
• There can be more than one relationship between the same entities.
• Relationships can be recursive, and roles can be indicated for clarity.
• A relationship can also have attributes.
Degree of Relationships
• Refers to the number of entities participating in a relationship
• Most relationships are binary (degree 2), but you can also have ternary (degree 3) relationships
etc.

Mapping Cardinalities
Mapping cardinalities express the number of entities to which another entity can be associated via a relationship. We will use this convention:
– One to one
– One to many
– Many to one
– Many to many

Cardinality Examples
• A person has only one ID book and an
ID book belongs to only 1 person

• A school can have many pupils, but


a pupil attends only 1 school

• A person can own many houses and a


house can have multiple owners
Participation Constraints


• Entity set E has total participation in relationship set R: every entity in E participates in at least one relationship in R
   – E.g. an ID book must have an owner
   – E.g. a house must have an owner
• Partial participation: only some entities in E participate in relationships in R
   – E.g. not every person owns a house
• We will use a double line to show total participation and a single line to show partial participation (NB: there are many styles of ER notation, but the ideas are the same).
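
Mapping cardinality and participation can be checked against sample data. The sketch below is a rough illustration, assuming a relationship is given as a set of (left, right) pairs; the function names and the sample people, houses, schools, and pupils are hypothetical. Note that it reports only what one data sample exhibits, while the real constraint is a design decision about every possible state.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def mapping_cardinality(pairs: Iterable[Tuple[str, str]]) -> str:
    """Classify a relationship instance given as (left, right) pairs."""
    pairs = set(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    left_many = any(n > 1 for n in left_counts.values())    # some left entity has several partners
    right_many = any(n > 1 for n in right_counts.values())  # some right entity has several partners
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"
    if right_many:
        return "many-to-one"
    return "one-to-one"


def has_total_participation(entities: Set[str], participants: Iterable[str]) -> bool:
    """True if every entity in the set takes part in at least one relationship."""
    return entities <= set(participants)


attends = {("school1", "pupil1"), ("school1", "pupil2"), ("school2", "pupil3")}
owns = {("ann", "house1"), ("ann", "house2"), ("bob", "house2")}
persons = {"ann", "bob", "cat"}
houses = {"house1", "house2"}

print(mapping_cardinality(attends))  # one-to-many
print(mapping_cardinality(owns))     # many-to-many
print(has_total_participation(houses, (h for _, h in owns)))   # True: every house has an owner
print(has_total_participation(persons, (p for p, _ in owns)))  # False: cat owns no house
```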


Keys
• A super key of an entity set is one or more attributes whose values uniquely identify a
particular entity.
• A candidate key is a minimal super key (i.e. no subset of the candidate key is a super key).
• A primary key: the candidate key selected by the database designer to uniquely identify
entities (chosen carefully such that attribute values rarely change).
• Keys of relationship sets are formed from the primary keys of participating entities (cardinality
must be considered).
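
The difference between a super key and a candidate key can be illustrated with a small search over one sample of data. This is a minimal sketch; the student rows and attribute names are hypothetical. A key is a property of the schema (of every legal state), so a sample can only rule a key out, never prove it.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def is_superkey(rows: Sequence[Row], attrs: Sequence[str]) -> bool:
    """True if no two rows agree on all the attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows: Sequence[Row], attributes: Sequence[str]) -> List[Tuple[str, ...]]:
    """Minimal super keys of this sample, smallest first."""
    keys: List[Tuple[str, ...]] = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


students = [
    {"student_id": 1, "email": "a@example.org", "name": "Ann"},
    {"student_id": 2, "email": "b@example.org", "name": "Bob"},
    {"student_id": 3, "email": "c@example.org", "name": "Ann"},
]

print(is_superkey(students, ["student_id", "name"]))  # True, but not minimal
print(candidate_keys(students, ["student_id", "email", "name"]))
# [('student_id',), ('email',)]
```

The designer would then pick one of the candidate keys, for instance student_id, as the primary key because its values rarely change.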

Weak Entity Sets


• An entity set that does not have sufficient attributes to form a primary key is referred to as a weak entity set.
• The existence of a weak entity set depends on the existence of a strong entity set, called the identifying entity set. It is therefore existence dependent on the identifying entity set.
• The identifying relationship must be many-to-one from the weak entity set to the identifying entity set.
• Participation of the weak entity set in the relationship must be total.
• The discriminator (or partial key) of a weak entity set distinguishes weak entities that depend on the same specific strong entity.
• The primary key of a weak entity set is the primary key of the identifying entity set plus the partial key of the weak entity set.
• Example: many payments are made on a loan
   – Payments don’t exist without a loan
   – Multiple loans will each have a first, second, etc. payment
   – So each payment is unique only in the context of the loan that it is paying off
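
One possible way to realize the loan and payment example in a relational database is sketched below with Python's built-in sqlite3 module. The table and column names (loan_number, payment_number, paid) are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE loan (
    loan_number INTEGER PRIMARY KEY,
    amount      REAL NOT NULL
);
CREATE TABLE payment (
    -- NOT NULL foreign key: total participation in the identifying relationship
    loan_number    INTEGER NOT NULL REFERENCES loan(loan_number) ON DELETE CASCADE,
    payment_number INTEGER NOT NULL,   -- discriminator (partial key)
    paid           REAL NOT NULL,
    -- key of the identifying entity set + partial key of the weak entity set
    PRIMARY KEY (loan_number, payment_number)
);
""")

conn.execute("INSERT INTO loan VALUES (1, 5000.0)")
conn.execute("INSERT INTO loan VALUES (2, 800.0)")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (1, 2, 100.0), (2, 1, 50.0)])

# A payment cannot exist without its loan.
try:
    conn.execute("INSERT INTO payment VALUES (99, 1, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting a loan removes the payments that depend on it.
conn.execute("DELETE FROM loan WHERE loan_number = 1")
remaining = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(orphan_rejected, remaining)  # True 1
```

Both loans have a payment number 1, which is allowed because the payment number is unique only together with the loan number.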


1.2 Identify the Scope of the System


1. Database
What is DB Scope?
DB Scope is a tool for developers, used to view and edit databases, application information blocks, and
saved and unsaved preferences.
What's so special about DB Scope?
DB Scope uses the concept of "layouts" for your application's databases. You create a layout that
describes the format of each of your databases, using the 14 data types supported by DB Scope.

Scope, what will be in the Database?


A good database is one that is simple to understand and well planned, with no redundant tables. ERDs (Entity-Relationship Diagrams) or EERs (Enhanced Entity-Relationship Diagrams) can be used to design such a database.

A well-scoped, well-designed database has these benefits:
• Data or information is easy to locate quickly.
• No redundant data.
• No repetition.
• More security, e.g. while one user is accessing or changing data, another user cannot change the same data at the same time.
• Table references (keys such as primary and foreign keys) are easy to maintain.

Example: Define the Scope of the Database Project
• Identify which organizational subdivisions will be served by the database
• Define which functions within these organizations will utilize the database
• Identify which existing and planned applications will be converted to the database system
• Prepare a proposal for management and obtain the go-ahead


2. Application
To determine the boundaries of an application system, it is important to examine the application
system from both functional and technical perspectives. A single application system will
normally have the following characteristics:
• Consistent user interface design and application behavior
• Common architecture (e.g. programming language, program design)
• Shared application system components (e.g. visual objects, programs, database tables)
When undertaking an analysis of a computer system, it is important to verify the presumed scope of the application system and to identify all interface or integration points it has with other application systems.

The scope of the application system affects the systems analysis effort. Scope is also significant
when formulating system maintenance, enhancement or replacement options.

Define System Scope

Since system size is a measure of the magnitude of all components of a system that are within the
current scope, the system scope should be documented in the project plan before the system size
is estimated. The scope statement defines what the project will and will not include, in enough
detail to clearly communicate to all participants.

The scope must be a complete definition encompassing all types of requirements:

The external business requirements are generally the most obvious requirements, and the ones for which the definition of scope is easiest.

The system design may imply requirements that are not specified. For example, the design of a client/server system may need a firewall for data moving in and out of the environment. This may add the need for user exits or other components integrated with the data propagation software.


Other components are often implied but not clearly defined, such as performance, interfaces, operations, and implementation. These components should be included if they are within the scope of the system being sized. If there is any question about whether something is included, it should be assumed to be within the scope of the sizing (refer to Document Assumptions) until the system scope specifically excludes it.

Clarify System Boundaries

In addition to the scope, it is important that the system boundaries are clearly understood before
the system size is estimated. The boundaries identify where the system to be sized starts and
ends. The sizing should include everything for which the team is responsible. Items or areas
that will be excluded should be clearly stated.

There are two primary reasons to exclude something from the sizing:

The component is another team's responsibility. For example, a business to business system may
provide a standard format for suppliers to interface with their systems. The interface
components that provide the standard interface would be in scope, but the supplier interfaces
may be excluded.

The component is assumed to be already implemented. For example, if the system will use
standard reusable components, such as standard date routines or file access routines that will not
be modified, then these may be excluded from the scope while the new interfaces to these
routines may be in scope.
1.3 Reviewing business rules to determine impact
Overview of business rules
Use business rules to control the behavior of a business practice.

What is a business rule?


A business rule is anything that imposes structure upon or controls the behavior of a business
practice. A rule can enforce business policy, establish common guidelines within an
organization, or control access in a business environment.

Once you've established the Business Rules you believe to be appropriate, review their
specification sheets. Carefully examine the specification sheet and make certain that the rule has
been properly established and that all the appropriate areas on the sheet are clearly marked. If
you find an error, make the necessary modifications and review it once more. Repeat this process
until you've reviewed every Business Rule.

Business Rules are an important component of the database. Along with contributing to overall data integrity, Business Rules impose integrity constraints that are specific to the organization. As you've seen, these rules help to ensure the validity and consistency of the data within the context of the manner in which the organization functions or conducts its business. Furthermore, these rules will affect the manner in which the database is implemented in an RDBMS and how it works with the application program used to work with the database.

When to use a business rule


Use business rules to govern frequently changing business practices, whether they originate within the business or are mandated from outside it, for example by regulatory agencies. Some typical uses for business rules are as follows:
• Determining current interest rates
• Calculating discounts for products
• Calculating sales tax
• Determining special groups such as senior citizens or preferred customers

How to use business rules


Develop and deploy business rules using the Eclipse-based business rules editors in IBM®
Integration Designer. Manage and modify business rule values using the Web-based business
rules manager, which is an option of IBM Process Server. For more information about these
tools, see the appropriate topics in the IBM Integration Designer Information Center and
the IBM Process Server Information Center, respectively.

1.4 Relationships
A relationship works by matching data in key columns — usually columns with the same name
in both tables. In most cases, the relationship matches the primary key from one table, which
provides a unique identifier for each row, with an entry in the foreign key in the other table. For
example, book sales can be associated with the specific titles sold by creating a relationship
between the title_id column in the titles table (the primary key) and the title_id column in
the sales table (the foreign key).
There are three types of relationships between tables. The type of relationship that is created
depends on how the related columns are defined.

One-to-Many Relationships
A one-to-many relationship is the most common type of relationship. In this type of relationship,
a row in table A can have many matching rows in table B, but a row in table B can have only one
matching row in table A. For example, the publishers and titles tables have a one-to-many
relationship: each publisher produces many titles, but each title comes from only one publisher.
Make a one-to-many relationship if only one of the related columns is a primary key or has a
unique constraint.
The primary key side of a one-to-many relationship is denoted by a key symbol. The foreign key
side of a relationship is denoted by an infinity symbol.

Many-to-Many Relationships
In a many-to-many relationship, a row in table A can have many matching rows in table B, and
vice versa. You create such a relationship by defining a third table, called a junction table, whose
primary key consists of the foreign keys from both table A and table B. For example,
the authors table and the titles table have a many-to-many relationship that is defined by a one-to-many relationship from each of these tables to the titleauthors table. The primary key of the titleauthors table is the combination of the au_id column (the authors table's primary key) and the title_id column (the titles table's primary key).

One-to-One Relationships
In a one-to-one relationship, a row in table A can have no more than one matching row in table
B, and vice versa. A one-to-one relationship is created if both of the related columns are primary
keys or have unique constraints.
This type of relationship is not common because most information related in this way would be
all in one table. You might use a one-to-one relationship to:
• Divide a table with many columns.
• Isolate part of a table for security reasons.
• Store data that is short-lived and could be easily deleted by simply deleting the table.
• Store information that applies only to a subset of the main table.
The primary key side of a one-to-one relationship is denoted by a key symbol. The foreign key
side is also denoted by a key symbol.
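
The three relationship types can be sketched as table definitions using Python's built-in sqlite3 module. The tables publishers, titles, authors, and titleauthors and the columns title_id and au_id follow the text; pub_id, the name columns, the pub_details table, and the sample rows are assumptions added for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE publishers (pub_id INTEGER PRIMARY KEY, pub_name TEXT NOT NULL);

-- One-to-many: many titles rows may carry the same pub_id.
CREATE TABLE titles (
    title_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    pub_id   INTEGER NOT NULL REFERENCES publishers(pub_id)
);

CREATE TABLE authors (au_id INTEGER PRIMARY KEY, au_name TEXT NOT NULL);

-- Many-to-many: junction table whose primary key combines both foreign keys.
CREATE TABLE titleauthors (
    au_id    INTEGER NOT NULL REFERENCES authors(au_id),
    title_id INTEGER NOT NULL REFERENCES titles(title_id),
    PRIMARY KEY (au_id, title_id)
);

-- One-to-one: the foreign key is also the primary key, so it is unique.
CREATE TABLE pub_details (
    pub_id  INTEGER PRIMARY KEY REFERENCES publishers(pub_id),
    address TEXT
);
""")

conn.execute("INSERT INTO publishers VALUES (1, 'Acme Press')")
conn.executemany("INSERT INTO titles VALUES (?, ?, ?)",
                 [(10, 'SQL Basics', 1), (11, 'ER Modeling', 1)])
conn.executemany("INSERT INTO authors VALUES (?, ?)", [(100, 'Ann'), (101, 'Bob')])
conn.executemany("INSERT INTO titleauthors VALUES (?, ?)",
                 [(100, 10), (101, 10), (100, 11)])
conn.execute("INSERT INTO pub_details VALUES (1, '1 Main St')")

titles_per_publisher = conn.execute(
    "SELECT COUNT(*) FROM titles WHERE pub_id = 1").fetchone()[0]
authors_of_title_10 = [row[0] for row in conn.execute(
    "SELECT a.au_name FROM authors a "
    "JOIN titleauthors ta ON ta.au_id = a.au_id "
    "WHERE ta.title_id = 10 ORDER BY a.au_name")]

# A second details row for the same publisher violates the one-to-one rule.
try:
    conn.execute("INSERT INTO pub_details VALUES (1, 'Another address')")
    second_details_rejected = False
except sqlite3.IntegrityError:
    second_details_rejected = True

print(titles_per_publisher, authors_of_title_10, second_details_rejected)
# 2 ['Ann', 'Bob'] True
```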

1.5 Documenting Entity relationship Diagram


Documentation of an E-R Schema
• An Entity-Relationship schema is rarely sufficient by itself to represent all the aspects of an application in detail.
• It is therefore important to complement every E-R schema with supporting documentation, which can facilitate the interpretation of the schema itself and describe properties of the data that cannot be expressed directly by the constructs of the model.
• A widely used documentation concept for conceptual schemas is the business rule.

Business Rules
• Business rules are used to describe the properties of an application, e.g. the fact that an employee cannot earn more than his or her manager.
• A business rule can be:
   – the description of a concept relevant to the application (also known as a business object),
   – an integrity constraint on the data of the application,
   – a derivation rule, whereby information can be derived from other information within a schema.

Documentation Techniques
• Descriptive business rules can be organized as a data dictionary. This is made up of two tables: the first describes the entities of the schema, the other describes the relationships.
• Business rules that describe constraints can be expressed in the following form:
   <concept> must/must not <expression on concepts>
• Business rules that describe derivations can be expressed in the following form:
   <concept> is obtained by <operations on concepts>
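
The two rule forms above can be illustrated in code. The sketch below assumes a hypothetical Employee record with a salary and an optional manager; it checks the constraint rule from the text (an employee must not earn more than his or her manager) and adds one invented derivation rule (the number of subordinates is obtained by counting the employees who report to a manager).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Employee:
    emp_id: int
    salary: float
    manager_id: Optional[int] = None


def salary_rule_violations(employees: List[Employee]) -> List[int]:
    """Constraint rule: an employee must not earn more than his or her manager."""
    by_id: Dict[int, Employee] = {e.emp_id: e for e in employees}
    return [
        e.emp_id
        for e in employees
        if e.manager_id in by_id and e.salary > by_id[e.manager_id].salary
    ]


def number_of_subordinates(employees: List[Employee], manager_id: int) -> int:
    """Derivation rule: obtained by counting the employees who report to the manager."""
    return sum(1 for e in employees if e.manager_id == manager_id)


staff = [
    Employee(1, 5000.0),                # the manager
    Employee(2, 4000.0, manager_id=1),
    Employee(3, 6000.0, manager_id=1),  # earns more than the manager
]

print(salary_rule_violations(staff))     # [3]
print(number_of_subordinates(staff, 1))  # 2
```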


LO2 Develop Normalization


2.1 Introduction to Normalization
Normalization: Definitions
Normalization is a method used to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. It is the process of decomposing relations with anomalies to produce smaller, well-structured relations.

Well-Structured Relations
A relation that contains minimal data redundancy and allows users to insert, delete, and update
rows without causing data inconsistencies.

Goal is to avoid anomalies:


Insertion Anomaly – adding new rows forces the user to create duplicate data
Deletion Anomaly – deleting rows may cause a loss of data that would be needed for other future rows
Modification Anomaly – changing data in a row forces changes to other rows because of duplication

Anomalies in this Table


Insertion–can’t enter a new employee without having the employee take a class.
Deletion–if we remove employee 140, we lose information about the existence of a Tax Acc
class.
Modification–giving a salary increase to employee 100 forces us to update multiple records


Why do these anomalies exist?


Because there are two themes (entity types) in this one relation. This results in data
duplication and an unnecessary dependency between the entities.

2.2 Informal Design Guidelines for Relational Schema


The four informal guidelines for measuring the quality of a relational schema design are:
Guideline 1: Semantics of the Relation Attributes
Design a relational schema so that it is easy to explain its meaning. Do not combine attributes
from multiple entity types and relationship types into a single relation. Intuitively, if a relation
schema corresponds to one entity type, or one relationship type, the meaning tends to be clear.

Example of Good Design


Example of Bad Design

Guideline 2: Reducing the Redundant Values in Tuples


Design the base relation schemas so that no insertion, deletion, or modification anomalies are
present in the relation.

If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly. Improper grouping of attributes into a relation schema causes the following problems:
• Storage wastage
• Insert anomalies
• Delete anomalies
• Modification anomalies


Storage Wastage

Insert Anomalies

Delete Anomalies


Modification Anomalies

Guideline 3: Reducing Null Values in Tuples


If possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable, make sure they apply in exceptional cases only and not to the majority of tuples in a relation.
Problems with null values:
• Waste of disk space
• Difficulty in understanding the meaning of attributes
• Problems in specifying JOIN operations
• Problems in applying some aggregate functions
• A null may have multiple interpretations (not applicable, unknown, unavailable)
Reasons for nulls:
• The attribute is not applicable or invalid
• The attribute value is unknown (it may exist)
• The value is known to exist, but is unavailable

Guideline 4: Disallowing the Generation of Spurious (False) Tuples


Make sure that the foreign keys refer to unique keys.


• The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural join of any relations.

Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated. Do not have relations that contain matching attributes other than foreign key-primary key combinations. If such relations are unavoidable, do not join them on such attributes, because the join may produce spurious tuples.
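
The effect of this guideline can be seen on a small example. The sketch below is an illustration only, with hypothetical rows and attribute names: it decomposes one relation in two ways and joins the pieces back. Joining on plocation, which is neither a primary key nor a foreign key, produces spurious tuples; joining on pnumber, the key of the project relation, does not.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def project(rows: List[Row], attrs: List[str]) -> List[Row]:
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: row[a] for a in attrs})
    return out


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Join rows that agree on every attribute the two relations share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows: List[Row]) -> Set[Tuple]:
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


works_on = [
    {"ssn": "111", "pnumber": 10, "plocation": "North"},
    {"ssn": "222", "pnumber": 20, "plocation": "North"},
]

# Bad decomposition: the pieces share only plocation.
bad = natural_join(project(works_on, ["ssn", "plocation"]),
                   project(works_on, ["pnumber", "plocation"]))
spurious = as_set(bad) - as_set(works_on)

# Good decomposition: the pieces share pnumber, the key of the second piece.
good = natural_join(project(works_on, ["ssn", "pnumber"]),
                    project(works_on, ["pnumber", "plocation"]))

print(len(bad), len(spurious))              # 4 2
print(as_set(good) == as_set(works_on))     # True
```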

2.3 Functional Dependencies


Functional Dependency
A functional dependency is a relationship that exists when one attribute uniquely determines another attribute. If R is a relation with attributes X and Y, a functional dependency between the attributes is written X -> Y, which specifies that Y is functionally dependent on X. Here X is termed the determinant and Y the dependent attribute. Each value of X is associated with precisely one Y value. A functional dependency serves as a constraint between two sets of attributes. Defining functional dependencies is an important part of relational database design and is the basis of normalization.


• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes

For a given relation, attribute B is functionally dependent on attribute A if every valid value of A uniquely determines the value of B. More generally:
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y
• X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y:
   if t1[X] = t2[X], then t1[Y] = t2[Y] in any relation instance r(R)
• X -> Y in R specifies a constraint on all relation instances r(R)
FD Constraints
• FDs are derived from the real-world constraints on the attributes
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])

Examples of FD constraints
• Social Security Number determines employee name:
   SSN -> ENAME
• Project Number determines project name and location:
   PNUMBER -> {PNAME, PLOCATION}
• Employee SSN and project number determine the hours per week that the employee works on the project:
   {SSN, PNUMBER} -> HOURS
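
The tuple-based definition (if t1[X] = t2[X], then t1[Y] = t2[Y]) translates directly into a check on one relation instance. This is a minimal sketch with hypothetical sample rows; since an FD must hold on every instance, such a check can refute an FD but cannot prove it.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same X value seen before with another Y value
            return False
    return True


emp_proj = [
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 1, "HOURS": 32.5},
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 2, "HOURS": 7.5},
    {"SSN": "222", "ENAME": "Wong", "PNUMBER": 2, "HOURS": 10.0},
]

print(fd_holds(emp_proj, ["SSN"], ["ENAME"]))             # True
print(fd_holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(emp_proj, ["SSN"], ["HOURS"]))             # False
print(fd_holds(emp_proj, ["PNUMBER"], ["HOURS"]))         # False
```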
Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
• Armstrong's inference rules:
   A1. (Reflexive) If Y is a subset of X, then X -> Y
   A2. (Augmentation) If X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
   A3. (Transitive) If X -> Y and Y -> Z, then X -> Z
• A1, A2, and A3 form a sound and complete set of inference rules

Additional Useful Inference Rules
• Decomposition: if X -> YZ, then X -> Y and X -> Z
• Union: if X -> Y and X -> Z, then X -> YZ
• Pseudotransitivity: if X -> Y and WY -> Z, then WX -> Z
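
In practice, inference is usually carried out by computing the closure of a set of attributes, that is, every attribute that the set determines under F. The attribute-closure algorithm is a standard technique that is not described in these notes; the sketch below is one straightforward way to write it, with helper names (fd, closure, implies) chosen for this example. An FD X -> Y follows from F exactly when Y is contained in the closure of X.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names, e.g. fd("A B", "C")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: List[FD], lhs: str, rhs: str) -> bool:
    """X -> Y follows from fds when Y is a subset of the closure of X."""
    return set(rhs.split()) <= closure(lhs.split(), fds)


F = [
    fd("SSN", "ENAME"),
    fd("PNUMBER", "PNAME PLOCATION"),
    fd("SSN PNUMBER", "HOURS"),
    fd("SSN", "DNUMBER"),
    fd("DNUMBER", "DMGRSSN"),
]

print(sorted(closure(["SSN"], F)))  # ['DMGRSSN', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, "SSN", "DMGRSSN"))  # True, by transitivity

G = [fd("X", "Y"), fd("W Y", "Z")]
print(implies(G, "W X", "Z"))  # True, by pseudotransitivity
print(implies(G, "X", "Z"))    # False
```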

2.4 Normal forms based on primary keys


Definitions of Keys and Attributes Participating in Keys
• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S that is a subset of R, with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
• If a relation schema has more than one key, each is called a candidate key.
• One of the candidate keys is arbitrarily designated to be the primary key.
• A prime attribute is a member of some candidate key.
• A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.

First Normal Form


• Disallows composite attributes, multivalued attributes, and nested relations, i.e. attributes whose values for an individual tuple are non-atomic
• Considered to be part of the definition of a relation
• All values must be atomic


2.5 General definition of second and third normal forms


Second Normal Form
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
• A functional dependency X -> Y is a partial dependency if some attribute A in X can be removed from X and the dependency still holds
• 2NF uses the concepts of FDs and the primary key
• Definitions:
   – Prime attribute: an attribute that is a member of the primary key K
   – Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
• {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
• {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds
• R can be decomposed into 2NF relations via the process of 2NF normalization
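
A 2NF check amounts to looking for non-prime attributes that are determined by a proper subset of the primary key. The sketch below is a rough illustration using the FDs quoted above; the relation name EMP_PROJ, its attribute list, and the function names are assumptions made for the example.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key: Set[str], attributes: Set[str],
                         fds: List[FD]) -> List[Tuple[Tuple[str, ...], str]]:
    """Non-prime attributes determined by a proper subset of the primary key."""
    found = []
    non_prime = attributes - key
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((subset, attr))
    return found


# Hypothetical relation EMP_PROJ with primary key {SSN, PNUMBER}.
attributes = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
key = {"SSN", "PNUMBER"}
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

for subset, attr in partial_dependencies(key, attributes, fds):
    print(subset, "->", attr)
# ('PNUMBER',) -> PLOCATION
# ('PNUMBER',) -> PNAME
# ('SSN',) -> ENAME
```

HOURS does not appear in the output because it depends on the whole key. Moving each reported attribute into a relation keyed by the subset that determines it is the usual 2NF decomposition.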

Third Normal Form


• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
• Transitive functional dependency: an FD X -> Z is transitive if there is a set of attributes Y that is neither a candidate key nor a subset of any key of R, and both X -> Y and Y -> Z hold
• Examples:
   – SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
   – SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME

2.6 BCNF (Boyce-Codd Normal Form)


• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R
• Each normal form is strictly stronger than the previous one:
   – Every 2NF relation is in 1NF
   – Every 3NF relation is in 2NF
   – Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)

Rule for a schema not in BCNF
• Let R be a schema not in BCNF; then there is at least one nontrivial functional dependency X -> Y such that X is not a superkey of R

Example of a schema not in BCNF
• borloan = (customer_id, loan_number, amount)
• loan_number -> amount holds, but loan_number is not a superkey
• If a relation is not in BCNF, it can be decomposed to create relations that are in BCNF:
   – borrower = (customer_id, loan_number) is in BCNF because no nontrivial functional dependency holds on it
   – loan = (loan_number, amount) has one nontrivial functional dependency that holds, loan_number -> amount, but loan_number is a superkey, so loan is in BCNF
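
The borloan example can be checked mechanically. The sketch below is a simplification with an invented function name: it tests only the FDs that are listed explicitly, restricted to the attributes of each relation. That is enough for this example, but a complete test of a decomposed relation would have to consider every FD implied by the original set.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes: Set[str], fds: List[FD]) -> List[FD]:
    """Listed FDs that apply to the relation, are nontrivial, and whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        applies = lhs <= attributes
        nontrivial = bool((rhs & attributes) - lhs)
        is_superkey = attributes <= closure(lhs, fds)
        if applies and nontrivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


fds = [({"loan_number"}, {"amount"})]

borloan = {"customer_id", "loan_number", "amount"}
borrower = {"customer_id", "loan_number"}
loan = {"loan_number", "amount"}

print(bcnf_violations(borloan, fds))   # [({'loan_number'}, {'amount'})]
print(bcnf_violations(borrower, fds))  # []
print(bcnf_violations(loan, fds))      # []
```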


LO3 Validate Data Model using Normalization


3.1 Validating the Data Model
Even when you have completed your data model it may not completely reflect the system being
developed. You need to review your business rules once more to see if you have created an
accurate picture.

Look at the example data model answer for the Painting Hire system. The normalization process resulted in an entity called Portfolio, with a key of (Artist No, Painting No). This would allow a painter to have painted several paintings. However, it would also allow the same painting to be painted by several painters. But a requirement of the system was 'each painting can only have one artist associated with it'.

Therefore, the current data model needs reviewing to prevent multiple artists being associated
with the same painting.

This has been achieved in the revised data model. The entity Portfolio has been deleted and
replaced by a foreign key of Artist No in the Painting entity.
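
The revised model can be sketched with Python's built-in sqlite3 module. The table and column names (artist, painting, artist_no, painting_no) and the sample rows are assumptions for this illustration; the actual Painting Hire model may contain further attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE artist (
    artist_no   INTEGER PRIMARY KEY,
    artist_name TEXT NOT NULL
);
-- The Portfolio entity is gone: each painting row carries exactly one artist_no.
CREATE TABLE painting (
    painting_no INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    artist_no   INTEGER NOT NULL REFERENCES artist(artist_no)
);
""")

conn.executemany("INSERT INTO artist VALUES (?, ?)", [(1, 'Artist A'), (2, 'Artist B')])
conn.executemany("INSERT INTO painting VALUES (?, ?, ?)",
                 [(10, 'Dawn', 1), (11, 'Dusk', 1)])  # one artist, several paintings

# Recording a second artist for painting 10 is rejected by the primary key.
try:
    conn.execute("INSERT INTO painting VALUES (10, 'Dawn', 2)")
    second_artist_accepted = True
except sqlite3.IntegrityError:
    second_artist_accepted = False

paintings_by_artist_1 = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE artist_no = 1").fetchone()[0]
print(second_artist_accepted, paintings_by_artist_1)  # False 2
```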

3.4 Submitting final approval to client


This Final Design Approval form must be submitted by an authorized representative prior to the
release of the final project. For web development, the final design approval form is required prior
to the coding / development stage.

Should changes be requested after Aleberry receives a signed Final Design Approval form,
Client will be charged at an hourly rate.
By submitting this document the client declares satisfaction with the design and relinquishes the
right to any remaining uncharged design revisions of the project as it stands on:

After receiving this form, Aleberry Creative will contact you with any final design files.
