Database


Chapter 1

Definitions
- Database: an organized collection of logically related data
- Data: stored representations of meaningful objects and events
+ Structured: numbers, text, dates
+ Unstructured: images, video, documents
- Information: data processed to increase knowledge in the person using the data
- Metadata: data that describes the properties and context of user data

Disadvantages of File processing


- Program-data dependence
- Duplication of data
- Limited data sharing
- Lengthy development times
- Excessive Program Maintenance
Problems with Data Dependency
- Each application programmer must maintain his/her own data
- Each application program:
+ needs to include code for the metadata of each file
+ must have its own processing routines for reading, inserting, updating, and deleting data
- Non-standard file formats
Duplicate data: the presence of multiple identical or similar copies of the same data within a system or database (read this for understanding only)

Problems with Data redundancy


- Waste of space to have duplicate data
- Causes more maintenance headaches 
- The biggest problem:
+ Data changes in one file could cause inconsistencies
+ Compromises in data integrity
All of the problems above are why we need a database management system (DBMS)
Advantages of the Database Approach

- Program-data independence
- Planned data redundancy
- Improved data consistency
- Improved data sharing
- Increased application development productivity
- Enforcement of standards
- Improved data quality
- Improved data accessibility and responsiveness
- Reduced program maintenance
- Improved decision support

Costs and Risks of the Database Approach

- New, specialized personnel


- Installation and management cost and complexity
- Conversion costs
- Need for explicit backup and recovery
- Organizational conflict

Elements of the Database Approach


- Data models
+ Graphical system capturing nature and relationship of data
+ Enterprise Data Model: high-level entities and relationships for the organization
+ Project Data Model: more detailed view, matching data structure in database or data warehouse
- Entities:
+ Noun form describing a person, place, object, event, or concept
+ Composed of attributes
- Relationships (Between entities)
+ Usually one-to-many (1:M) or many-to-many (M:N)
- Relational Databases
+ Database technology: tables (relations) representing entities, and primary/foreign keys representing relationships
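A minimal sketch of the idea above, using Python's built-in sqlite3 module. Two tables stand for two entities, and a foreign key stands for the 1:M relationship between them. ORDER_T, all column names and all sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_T (
    CustomerID   INTEGER PRIMARY KEY,   -- primary key identifies each customer
    CustomerName TEXT NOT NULL
);
CREATE TABLE ORDER_T (
    OrderID    INTEGER PRIMARY KEY,
    OrderDate  TEXT NOT NULL,
    CustomerID INTEGER NOT NULL
               REFERENCES CUSTOMER_T (CustomerID)  -- foreign key = the 1:M relationship
);
INSERT INTO CUSTOMER_T VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO ORDER_T VALUES (1001, '2024-01-15', 1),
                           (1002, '2024-02-03', 1),
                           (1003, '2024-02-10', 2);
""")

# Follow the relationship by matching foreign key values to primary key values.
orders_per_customer = conn.execute("""
    SELECT c.CustomerName, COUNT(o.OrderID)
    FROM CUSTOMER_T AS c
    JOIN ORDER_T AS o ON o.CustomerID = c.CustomerID
    GROUP BY c.CustomerID
    ORDER BY c.CustomerID
""").fetchall()
print(orders_per_customer)
```

One customer row can be matched by many order rows, which is how the table design carries the 1:M relationship.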
Components of the Database Environment
(Skim for understanding; the notes below explain each item in the figure.)
1. Data modeling and design tools: automated tools used to design databases and application programs.
2. Repository: centralized storehouse of metadata
3. Database Management System (DBMS): software for managing the database.
4. Database: storehouse of the data
5. Application Programs: software using the data.
6. User Interface: text and graphical displays to users

The range of database Applications


- Personal databases
- Two-tier Client/Server databases
- Multitier Client/Server databases
- Enterprise applications
+ Enterprise resource planning (ERP) systems
+ Data warehousing implementations

Enterprise Data Model

- First step in database development
- Specifies scope and general content.
- Overall picture of organizational data at high level of abstraction
- Entity-relationship diagram
- Descriptions of entity types
- Relationships between entities
- Business rules

SDLC (System Development Life Cycle)


- Detailed, well-planned development process
- Time-consuming, but comprehensive
- Long development cycle

Prototyping
- RAD (rapid application development)
- Cursory attempt at conceptual data modeling
- Define database during development of initial prototype.
- Repeat implementation and maintenance activities with new prototype versions

Database Schema
Chapter 2:
Business Rules:
- Are statements that define or constrain some aspect of the business
- Are derived from policies, procedures, events, functions
- Assert business structure
- Control/influence business behavior
- Are expressed in terms familiar to end users
- Are automated through DBMS software

A good business rule is: Declarative, Precise, Atomic, Consistent, Expressible, Distinct, Business-oriented

A good data name is: Related to business, Meaningful and self-documenting, Unique, Readable,
Composed of words from approved list, Repeatable, Written in standard syntax.

E-R Model (read this so you don't get confused)


- Entities:
+ Entity instance: a single occurrence of an entity type, such as a particular person, place, object, event, or concept (often corresponds to a row in a table)
+ Entity type: a collection of entities that share common properties (often corresponds to a table)

- Relationships: classified by degree as unary, binary, or ternary
+ Relationship instance: a link between entities (corresponds to primary key-foreign key equivalencies in related tables)
+ Relationship type: a category of relationship, i.e. a link between entity types (ex: one-to-one, one-to-many, ...)
+ Two entities can have more than one type of relationship between them (multiple relationships)

+ Associative Entity – a combination of relationship and entity


- Attribute – property or characteristic of an entity or relationship type (often corresponds
to a field in a table)
+ Classifications of attributes:
1. Required versus Optional Attributes
2. Simple versus Composite Attribute
3. Single-Valued versus Multivalued Attribute
4. Stored versus Derived Attributes
5. Identifier Attributes

- Identifiers (Keys): an attribute (or combination of attributes) that uniquely identifies individual instances of an entity type
+ Choose keys that will not change in value and will not be null; avoid keys built from values such as locations or people, which might change

The figures are included only to make this less confusing; they probably won't be on the exam.

This part has many explanatory figures. Including them all wastes paper when printing, so open the slides and skim them (chapter 2, pages 14-15, ...); reading the notes alongside the figures on the slides is easier to follow.

Cardinality constraints
- Cardinality Constraints: the number of instances of one entity that can or must be associated with each instance of another entity
- Minimum cardinality
+ if 0, then optional
+ if 1 or more, then mandatory
- Maximum cardinality: the maximum number
(Page 26 has a figure explaining mandatory one, optional one, ...)

Mandatory one: exactly 1
Mandatory many: 1 or more (>= 1)
Optional one: 0 or at most 1
Optional many: 0 or any number
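The four cases above follow directly from the minimum and maximum cardinality. A small Python sketch of that rule; the function name and the use of None for "many" are assumptions for illustration, not notation from the slides.

```python
def describe_cardinality(minimum, maximum):
    """Name a cardinality constraint from its minimum and maximum.

    `maximum` is an int, or None to mean "many" (no upper bound).
    """
    if minimum < 0 or (maximum is not None and maximum < max(minimum, 1)):
        raise ValueError("invalid cardinality bounds")
    # Minimum 0 means participation is optional; 1 or more means mandatory.
    participation = "Optional" if minimum == 0 else "Mandatory"
    # Maximum 1 means "one"; anything larger (or unbounded) means "many".
    amount = "one" if maximum == 1 else "many"
    return f"{participation} {amount}"


for bounds in [(1, 1), (1, None), (0, 1), (0, None)]:
    print(bounds, "->", describe_cardinality(*bounds))
```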
Strong vs. Weak Entities, and Identifying Relationships
- Strong entity : ( identifier underlined with a single line)
+ exists independently of other types of entities
+ has its own unique identifier

- Weak entity : (partial identifier underlined with a double line)


+ dependent on a strong entity (identifying owner)…cannot exist on its own
+ does not have a unique identifier (only a partial identifier)
+ entity box has a double line

- Identifying relationship: links strong entities to weak entities

Page 32 has a figure (just look at it for understanding)

Chapter 3

Supertypes and Subtypes


- Enhanced ER model: extends original ER model with new modeling constructs
- Subtype: A subgrouping of the entities in an entity type that has attributes distinct from
those in other subgroupings
- Supertype: A generic entity type that has a relationship with one or more subtypes
- Attribute Inheritance:
+ Subtype entities inherit values of all attributes of the supertype
+ An instance of a subtype is also an instance of the supertype
(figure: chapter 3, page 5)

Relationships and Subtypes


- Relationships at the supertype level indicate that all subtypes will participate in the
relationship
- The instances of a subtype may participate in a relationship unique to that subtype. In
this situation, the relationship is shown at the subtype level
(figure on page 8)

Generalization and Specialization


- Generalization: The process of defining a more general entity type from a set of more
specialized entity types. BOTTOM-UP
- Specialization: The process of defining one or more subtypes of the supertype and
forming supertype/subtype relationships. TOP-DOWN
(figures on pages 10, 11, 12, 13)
Constraints in Supertype/ Completeness Constraint
- Completeness Constraints: Whether an instance of a supertype must also be a member
of at least one subtype
+ Total Specialization Rule: Yes (double line)
+ Partial Specialization Rule: No (single line)
(figures on pages 16, 17)

Constraints in Supertype/ Disjointness constraint


- Disjointness Constraints: Whether an instance of a supertype may simultaneously be a
member of two (or more) subtypes
+ Disjoint Rule: An instance of the supertype can be only ONE of the subtypes (it may belong to either subtype, but not to both at the same time)
+ Overlap Rule: An instance of the supertype could be more than one of the subtypes (it may belong to both at the same time; in the slide example it must belong to at least one of them, because that example also applies the total specialization rule)
(figures on pages 18, 19)

Constraints in Supertype/ Subtype Discriminators


- Subtype Discriminator: An attribute of the supertype whose values determine the target
subtype(s)
+ Disjoint – a simple attribute with alternative values to indicate the possible subtypes
+ Overlapping – a composite attribute whose subparts pertain to different subtypes.
Each subpart contains a Boolean value to indicate whether or not the instance belongs
to the associated subtype
(figures on pages 21, 22, 23)
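A rough Python sketch of the two kinds of discriminator described above. The type codes "H", "S", "C", the flag names, and the subtype names are hypothetical examples, not values taken from the slides.

```python
# Disjoint rule: one simple attribute whose value names the single subtype.
EMPLOYEE_TYPE_CODES = {  # hypothetical codes
    "H": "HOURLY EMPLOYEE",
    "S": "SALARIED EMPLOYEE",
    "C": "CONSULTANT",
}

# Overlap rule: a composite attribute with one Boolean subpart per subtype.
PART_TYPE_FLAGS = {  # hypothetical subparts
    "manufactured": "MANUFACTURED PART",
    "purchased": "PURCHASED PART",
}


def subtypes_disjoint(employee_type):
    """A simple discriminator selects exactly one subtype."""
    return [EMPLOYEE_TYPE_CODES[employee_type]]


def subtypes_overlap(part_type):
    """A composite discriminator selects every subtype whose flag is True."""
    return [PART_TYPE_FLAGS[name] for name, belongs in part_type.items() if belongs]


print(subtypes_disjoint("S"))
print(subtypes_overlap({"manufactured": True, "purchased": True}))
```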

Entity clusters
- EER diagrams are difficult to read when there are too many entities and relationships
- Solution: Group entities and relationships into entity clusters
- Entity cluster: Set of one or more entity types and associated relationships grouped into
a single abstract entity type
(figures on pages 25, 26, 27)

Packaged Data models


- Predefined data models
- Could be universal or industry-specific
- Universal data model = a generic or template data model that can be reused as a starting
point for a data modeling project (also called a “pattern”)
- Advantages:
+ Use proven model components
+ Save time and cost
+ Less likelihood of data model errors
+ Easier to evolve and modify over time
+ Aid in requirements determination
+ Easier to read
+ Supertype/subtype hierarchies promote reuse
+ Many-to-many relationships enhance model flexibility
+ Vendor-supplied data model fosters integration with vendor’s applications
+ Universal models support inter-organizational systems
(figures on pages 30, 31)

Chapter 4
Relation
- A relation is a named, two-dimensional table of data.
- A table consists of rows (records) and columns (attributes or fields).
- Requirements for a table to qualify as a relation:
+ It must have a unique name.
+ Every attribute value must be atomic (not multivalued, not composite).
+ Every row must be unique (can’t have two rows with exactly the same values for all
their fields).
+ Attributes (columns) in tables must have unique names.
+ The order of the columns must be irrelevant.
+ The order of the rows must be irrelevant.
NOTE: all relations are in 1st Normal form

Correspondence with E-R model


- Relations (tables) correspond with entity types and with many-to-many relationship
types.
- Rows correspond with entity instances and with many-to-many relationship instances.
- Columns correspond with attributes.

Key Fields
- Keys are special fields that serve two main purposes:
+ Primary keys are unique identifiers of the relation in question
+ Foreign keys are identifiers that enable a dependent relation (on the many side of a
relationship) to refer to its parent relation (on the one side of the relationship).
- Keys can be simple (a single field) or composite (more than one field).
- Keys usually are used as indexes to speed up the response to user queries.
(figure on page 8)
Integrity Constraints

- Domain Constraints: Allowable values for an attribute.


- Entity Integrity: No primary key attribute may be null. All primary key fields MUST have
data.
- Referential Integrity:
+ Rules that maintain consistency between the rows of two related tables.
+ rule states that any foreign key value (on the relation of the many side) MUST match a
primary key value in the relation of the one side. (Or the foreign key can be null)
(figure on page 12)
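The three kinds of constraint above can be tried out with Python's built-in sqlite3 module. This is only a sketch: ORDER_T, the column definitions and the sample values are assumptions, and the PRAGMA line is SQLite-specific (SQLite enforces foreign keys only when asked to).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: turn on foreign key checks
conn.executescript("""
CREATE TABLE CUSTOMER_T (
    CustomerID    TEXT NOT NULL PRIMARY KEY,            -- entity integrity
    CustomerState TEXT CHECK (length(CustomerState) = 2) -- domain constraint
);
CREATE TABLE ORDER_T (
    OrderID    TEXT NOT NULL PRIMARY KEY,
    CustomerID TEXT REFERENCES CUSTOMER_T (CustomerID)   -- referential integrity
);
INSERT INTO CUSTOMER_T VALUES ('C1', 'HI');
""")


def is_rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Domain constraint: 'Hawaii' is not an allowable two-letter state value.
domain_violation = is_rejected("INSERT INTO CUSTOMER_T VALUES ('C2', 'Hawaii')")
# Entity integrity: the primary key may not be null.
null_primary_key = is_rejected("INSERT INTO CUSTOMER_T VALUES (NULL, 'FL')")
# Referential integrity: 'C9' matches no primary key on the one side.
orphan_foreign_key = is_rejected("INSERT INTO ORDER_T VALUES ('O1', 'C9')")
# A null foreign key is allowed by the rule as stated above.
null_foreign_key = is_rejected("INSERT INTO ORDER_T VALUES ('O2', NULL)")

print(domain_violation, null_primary_key, orphan_foreign_key, null_foreign_key)
```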

Transforming EER Diagrams into Relations

- Mapping Regular Entities to Relations (figures on pages 14, 15, 16)


1. Simple attributes: E-R attributes map directly onto the relation
2. Composite attributes: Use only their simple, component attributes
3. Multivalued Attribute: Becomes a separate relation with a foreign key taken from the
superior entity

- Mapping Weak Entities (figures on pages 18, 19)
+ Becomes a separate relation with a foreign key taken from the superior entity
+ Primary key composed of: partial identifier of the weak entity & primary key of the identifying relation (strong entity)

- Mapping Binary Relationships (pages 21, 22, 23, 24, 25)


+ One-to-Many–Primary key on the one side becomes a foreign key on the many side
+ Many-to-Many–Create a new relation with the primary keys of the two entities as its
primary key
+ One-to-One–Primary key on the mandatory side becomes a foreign key on the optional
side

- Mapping Associative Entities (pages 27, 28, 29, 30)
+ Identifier Not Assigned: default primary key for the association relation is composed of the primary keys of the two entities (as in an M:N relationship)
+ Identifier Assigned, used when:
1. It is natural and familiar to end-users
2. The default identifier may not be unique
- Mapping Unary Relationships (pages 32, 33)
+ One-to-Many–Recursive foreign key in the same relation
+ Many-to-Many–Two relations:
1. One for the entity type
2. One for an associative relation in which the primary key has two attributes, both
taken from the primary key of the entity

- Mapping Ternary (and n-ary) Relationships (pages 35, 36)


+ One relation for each entity and one for the associative entity
+ Associative entity has foreign keys to each entity in the relationship

- Mapping Supertype/Subtype Relationships (pages 38, 39)


+ One relation for supertype and for each subtype
+ Supertype attributes (including identifier and subtype discriminator) go into supertype
relation
+ Subtype attributes go into each subtype; primary key of supertype relation also
becomes primary key of subtype relation
+ 1:1 relationship established between supertype and each subtype, with supertype as
primary table
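A few of the mapping rules above, sketched with Python's sqlite3 module: a unary 1:M relationship (recursive foreign key), a supertype/subtype pair (shared primary key), and an M:N relationship (new relation with a composite primary key). Every table, column and sample value is a hypothetical example, not the schema from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce foreign keys
conn.executescript("""
-- Supertype relation: identifier plus subtype discriminator (EmployeeType).
-- ManagerID is a recursive foreign key: unary 1:M in the same relation.
CREATE TABLE EMPLOYEE (
    EmployeeID   INTEGER PRIMARY KEY,
    EmployeeName TEXT NOT NULL,
    EmployeeType TEXT NOT NULL CHECK (EmployeeType IN ('H', 'S')),
    ManagerID    INTEGER REFERENCES EMPLOYEE (EmployeeID)
);
-- Subtype relation: the supertype's primary key is also its primary key (1:1).
CREATE TABLE HOURLY_EMPLOYEE (
    EmployeeID INTEGER PRIMARY KEY REFERENCES EMPLOYEE (EmployeeID),
    HourlyRate REAL NOT NULL
);
CREATE TABLE COURSE (
    CourseID    INTEGER PRIMARY KEY,
    CourseTitle TEXT NOT NULL
);
-- M:N relationship: new relation whose primary key combines both primary keys.
CREATE TABLE CERTIFICATE (
    EmployeeID    INTEGER NOT NULL REFERENCES EMPLOYEE (EmployeeID),
    CourseID      INTEGER NOT NULL REFERENCES COURSE (CourseID),
    DateCompleted TEXT,
    PRIMARY KEY (EmployeeID, CourseID)
);
INSERT INTO EMPLOYEE VALUES (1, 'Manager M', 'S', NULL), (2, 'Worker W', 'H', 1);
INSERT INTO HOURLY_EMPLOYEE VALUES (2, 18.5);
INSERT INTO COURSE VALUES (10, 'SQL Basics');
INSERT INTO CERTIFICATE VALUES (2, 10, '2024-03-01');
""")

# Walk every mapped relationship for employee 2.
row = conn.execute("""
    SELECT e.EmployeeName, m.EmployeeName, h.HourlyRate, c.CourseTitle
    FROM EMPLOYEE AS e
    JOIN EMPLOYEE AS m ON m.EmployeeID = e.ManagerID
    JOIN HOURLY_EMPLOYEE AS h ON h.EmployeeID = e.EmployeeID
    JOIN CERTIFICATE AS ce ON ce.EmployeeID = e.EmployeeID
    JOIN COURSE AS c ON c.CourseID = ce.CourseID
""").fetchone()
print(row)
```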

Data Normalization
- Primarily a tool to validate and improve a logical design so that it satisfies certain
constraints that avoid unnecessary duplication of data
- The process of decomposing relations with anomalies to produce smaller, well-structured relations

Well-structured Relations
- A relation that contains minimal data redundancy and allows users to insert, delete, and
update rows without causing data inconsistencies
- Goal is to avoid anomalies
+ Insertion Anomaly–adding new rows forces user to create duplicate data
+ Deletion Anomaly–deleting rows may cause a loss of data that would be needed for
other future rows
+ Modification Anomaly–changing data in a row forces changes to other rows because of
duplication
Anomalies in this Table
1. Insertion: can't enter a new employee without having the employee take a class
2. Deletion: if we remove employee 140, we lose information about the existence of a Tax Acc class
3. Modification: giving a salary increase to employee 100 forces us to update multiple records
Why do these anomalies exist? Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary dependency between the entities
Functional Dependencies and Keys
- Functional Dependency: The value of one attribute (the determinant) determines the
value of another attribute

- Candidate Key:
+A unique identifier. One of the candidate keys will become the primary key
+ Each non-key field is functionally dependent on every candidate key
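A functional dependency can be tested against sample rows: the same determinant value must never appear with two different dependent values. The Python sketch below assumes a helper named `holds` and made-up rows loosely modelled on the employee/course example (the salaries and extra course titles are invented). Sample data can only disprove a dependency; it cannot prove that the dependency holds for all future data.

```python
def holds(rows, determinant, dependent):
    """Return True if `determinant -> dependent` is not violated by `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [  # hypothetical sample data
    {"EmpID": 100, "Salary": 48000, "CourseTitle": "SPSS"},
    {"EmpID": 100, "Salary": 48000, "CourseTitle": "Surveys"},
    {"EmpID": 140, "Salary": 52000, "CourseTitle": "Tax Acc"},
]

print(holds(rows, ["EmpID"], ["Salary"]))        # EmpID determines Salary
print(holds(rows, ["EmpID"], ["CourseTitle"]))   # employee 100 has two courses
print(holds(rows, ["EmpID", "CourseTitle"], ["Salary"]))
```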

First Normal Form (pages 47, 48)
- No multivalued attributes
- Every attribute value is atomic
- All relations are in 1st Normal Form
Second Normal Form (pages 51, 52)
- 1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE primary key
- Every non-key attribute must be defined by the entire key, not by only part of the key
- No partial functional dependencies
Third Normal Form (page 54)
- 2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes)
- Solution: a non-key determinant with transitive dependencies goes into a new table; the non-key determinant becomes the primary key in the new table and stays as a foreign key in the old table
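A small Python sketch of removing a partial dependency (moving from 1NF to 2NF) by decomposition. It assumes a relation keyed on (EmpID, CourseTitle) in which Name and Salary depend on EmpID alone; the rows and the `project` helper are illustrative assumptions.

```python
rows = [  # hypothetical data; primary key is (EmpID, CourseTitle)
    {"EmpID": 100, "Name": "Employee A", "Salary": 48000,
     "CourseTitle": "SPSS", "DateCompleted": "2024-06-19"},
    {"EmpID": 100, "Name": "Employee A", "Salary": 48000,
     "CourseTitle": "Surveys", "DateCompleted": "2024-10-07"},
    {"EmpID": 140, "Name": "Employee B", "Salary": 52000,
     "CourseTitle": "Tax Acc", "DateCompleted": "2024-12-08"},
]


def project(rows, attributes):
    """Relational projection: keep only `attributes` and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Attributes that depend on only part of the key move to their own relation.
employee = project(rows, ["EmpID", "Name", "Salary"])
emp_course = project(rows, ["EmpID", "CourseTitle", "DateCompleted"])

# Joining on EmpID rebuilds the original rows, so no information was lost.
rejoined = [{**e, **c} for c in emp_course for e in employee
            if e["EmpID"] == c["EmpID"]]

print(len(employee), len(emp_course), rejoined == rows)
```

After the split, the salary of employee 100 is stored in one row only, so a raise no longer has to update several records.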

Chapter 6

SQL = Structured Query Language


- The standard for relational database management systems (RDBMS)
- RDBMS: A database management system that manages data as a collection of tables in
which all relationships are represented by common values in related tables
- Benefits:
+ Reduced training costs
+ Productivity
+ Application portability
+ Application longevity
+ Reduced dependence on a single vendor
+ Cross-system communication
SQL: DDL, DML, and DCL

- Data Definition Language (DDL): Commands that define a database, including creating, altering, and dropping tables and establishing constraints
- Data Manipulation Language (DML): Commands that maintain and query a database
- Data Control Language (DCL): Commands that control a database, including administering privileges and committing data

SQL Database Definition

- Data Definition Language (DDL)


- Major CREATE statements:
+CREATE SCHEMA–defines a portion of the database owned by a particular user
+CREATE TABLE–defines a new table and its columns
+CREATE VIEW–defines a logical table from one or more tables or views
- Other CREATE statements: CHARACTER SET, COLLATION, TRANSLATION, ASSERTION,
DOMAIN
Table Creation (illustrations on pages 11, 12, 13, 14, 15, 16)
- Steps in table creation:
1. Identify data types for attributes
2. Identify columns that can and cannot be null
3. Identify columns that must be unique (candidate keys)
4. Identify primary key–foreign key mates
5. Determine default values
6. Identify constraints on columns (domain specifications)
7. Create the table and associated indexes
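The steps above can be sketched as a single CREATE TABLE run through Python's sqlite3 module. The ORDER_T columns, constraint names, default values and index name are all assumptions chosen for illustration, and data type names vary between DBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce foreign keys
conn.executescript("""
CREATE TABLE CUSTOMER_T (
    CustomerID    INTEGER PRIMARY KEY,
    CustomerName  TEXT NOT NULL,
    CustomerState TEXT
);
CREATE TABLE ORDER_T (
    OrderID     INTEGER NOT NULL,                    -- steps 1-2: data type, not null
    OrderNumber TEXT NOT NULL UNIQUE,                -- step 3: candidate key
    CustomerID  INTEGER NOT NULL,
    OrderDate   TEXT NOT NULL DEFAULT CURRENT_DATE,  -- step 5: default value
    OrderStatus TEXT NOT NULL DEFAULT 'OPEN'
                CHECK (OrderStatus IN ('OPEN', 'SHIPPED', 'CLOSED')),  -- step 6
    CONSTRAINT Order_PK PRIMARY KEY (OrderID),       -- step 4: primary key ...
    CONSTRAINT Order_FK FOREIGN KEY (CustomerID)
               REFERENCES CUSTOMER_T (CustomerID)    -- ... and its foreign key mate
);
CREATE INDEX Order_Customer_IDX ON ORDER_T (CustomerID);  -- step 7: index
INSERT INTO CUSTOMER_T VALUES (1, 'Customer A', 'HI');
INSERT INTO ORDER_T (OrderID, OrderNumber, CustomerID) VALUES (1001, 'A-1001', 1);
""")

# The omitted OrderStatus column falls back to its default value.
status = conn.execute(
    "SELECT OrderStatus FROM ORDER_T WHERE OrderID = 1001"
).fetchone()[0]
index_names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'ORDER_T'"
)]
print(status, index_names)
```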

Data Integrity Controls


- Referential integrity–constraint that ensures that foreign key values of a table must
match primary key values of a related table in 1:M relationships
- Restricting:
+ Deletes of primary records
+ Updates of primary records
+ Inserts of dependent records
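A sketch of restricting deletes and updates of primary records, again with Python's sqlite3 module. ORDER_T and the sample rows are assumptions; ON DELETE RESTRICT and ON UPDATE RESTRICT are standard SQL clauses, while the PRAGMA line is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce foreign keys
conn.executescript("""
CREATE TABLE CUSTOMER_T (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL
);
CREATE TABLE ORDER_T (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL
               REFERENCES CUSTOMER_T (CustomerID)
               ON DELETE RESTRICT ON UPDATE RESTRICT
);
INSERT INTO CUSTOMER_T VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO ORDER_T VALUES (1001, 1);
""")


def rejected(sql):
    """Return True if the statement fails with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Customer 1 still has an order, so deleting or re-keying it is refused.
delete_with_orders = rejected("DELETE FROM CUSTOMER_T WHERE CustomerID = 1")
update_with_orders = rejected("UPDATE CUSTOMER_T SET CustomerID = 5 WHERE CustomerID = 1")
# Customer 2 has no dependent rows, so the delete goes through.
delete_without_orders = rejected("DELETE FROM CUSTOMER_T WHERE CustomerID = 2")

print(delete_with_orders, update_with_orders, delete_without_orders)
```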
SQL commands
- Removing Table: DROP TABLE CUSTOMER_T;
- Insert Statement : Adds one or more rows to a table

- Delete Statement: remove rows from a table


+ delete certain rows: DELETE FROM CUSTOMER_T WHERE CUSTOMERSTATE = 'HI';
+ all rows: DELETE FROM CUSTOMER_T;
- Update Statement: modifies data in existing rows

- Select statement
+ SELECT: List the columns (and expressions) that should be returned from the query
+ FROM: Indicate the table(s) or view(s) from which data will be obtained
+ WHERE: Indicate the conditions under which a row will be included in the result
+ GROUP BY: Indicate categorization of results
+ HAVING: Indicate the conditions under which a category (group) will be included
+ ORDER BY: Sorts the result according to specified criteria
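The commands above, run end to end with Python's sqlite3 module as a rough illustration. CUSTOMER_T and CUSTOMERSTATE come from the notes; the other column names and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMER_T (
        CUSTOMERID    INTEGER PRIMARY KEY,
        CUSTOMERNAME  TEXT,
        CUSTOMERSTATE TEXT
    )
""")

# INSERT: add rows (one INSERT statement per tuple here).
conn.executemany("INSERT INTO CUSTOMER_T VALUES (?, ?, ?)", [
    (1, "Customer A", "HI"), (2, "Customer B", "FL"), (3, "Customer C", "FL"),
    (4, "Customer D", "CA"), (5, "Customer E", "CA"), (6, "Customer F", "TX"),
    (7, "Customer G", "NY"), (8, "Customer H", "FL"),
])
# UPDATE: modify data in an existing row.
conn.execute("UPDATE CUSTOMER_T SET CUSTOMERSTATE = 'CA' WHERE CUSTOMERID = 6")
# DELETE: remove only the rows that match the condition.
conn.execute("DELETE FROM CUSTOMER_T WHERE CUSTOMERSTATE = 'HI'")

# SELECT with every clause from the list above.
states = conn.execute("""
    SELECT CUSTOMERSTATE, COUNT(*) AS NUMCUSTOMERS   -- columns and expressions
    FROM CUSTOMER_T                                  -- source table
    WHERE CUSTOMERID <> 2                            -- filters rows
    GROUP BY CUSTOMERSTATE                           -- one group per state
    HAVING COUNT(*) > 1                              -- filters groups
    ORDER BY NUMCUSTOMERS DESC                       -- sorts the result
""").fetchall()
print(states)
```

WHERE is applied to individual rows before grouping, while HAVING is applied to whole groups afterwards.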

Ex: pages 25, 26, 27, ..., 32

Processing Multiple Tables


- Join – a relational operation that causes two or more tables with a common domain to
be combined into a single table or view
- Equi-join–a join in which the joining condition is based on equality between values in
the common columns; common columns appear redundantly in the result table
- Natural join – a join in which one of the duplicate columns is eliminated in the result
table
- Outer join – a join in which rows that do not have matching values in common columns
are nonetheless included in the result table (as opposed to inner join, in which rows
must have matching values in order to appear in the result table)
- Union join – includes all data from each table that was joined
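A sketch comparing the join types above with Python's sqlite3 module. ORDER_T and the sample rows are assumptions. The union join is not shown, since SQLite has no UNION JOIN operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_T (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE ORDER_T (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO CUSTOMER_T VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO ORDER_T VALUES (1001, 1), (1002, 1);
""")


def columns_and_rows(sql):
    """Run a query and return (column names, rows)."""
    cursor = conn.execute(sql)
    return [d[0] for d in cursor.description], cursor.fetchall()


# Equi-join: the common column appears once per table in the result.
equi_cols, equi_rows = columns_and_rows("""
    SELECT * FROM CUSTOMER_T AS c
    JOIN ORDER_T AS o ON c.CustomerID = o.CustomerID
""")
# Natural join: one of the duplicate columns is eliminated.
natural_cols, natural_rows = columns_and_rows(
    "SELECT * FROM CUSTOMER_T NATURAL JOIN ORDER_T"
)
# Outer join: Customer B has no order but is still included (with NULL).
outer_cols, outer_rows = columns_and_rows("""
    SELECT c.CustomerName, o.OrderID
    FROM CUSTOMER_T AS c
    LEFT OUTER JOIN ORDER_T AS o ON c.CustomerID = o.CustomerID
    ORDER BY c.CustomerID, o.OrderID
""")

print(len(equi_cols), len(natural_cols))
print(outer_rows)
```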
Ex: pages 36, 37, 38, 39, 40, 41
