
Database Management Systems

Dr. Md. Rakibul Hoque


University of Dhaka
Learning Objectives

• What are the problems of managing data resources in a traditional file environment?
• What are the major capabilities of database management systems (DBMS) and why is a relational DBMS so powerful?
• What are the principal tools and technologies for accessing information from databases to improve business performance and decision making?
• Why are information policy, data administration, and data quality assurance essential for managing the firm’s data resources?
Organizing Data in a Traditional
File Environment

• File organization terms and concepts
• Computer system organizes data in a hierarchy
• Bit: Smallest unit of data; binary digit (0,1)
• Byte: Group of bits that represents a single
character
• Field: Group of characters as word(s) or
number
• Record: Group of related fields
• File: Group of records of same type
Organizing Data in a Traditional
File Environment

• File organization terms and concepts
• Computer system organizes data in a hierarchy
• Database: Group of related files
• Entity: Person, place, thing on which we
store information.
• Attribute: Each characteristic, or quality,
describing entity
• E.g., Attributes Date or Grade belong to entity
COURSE
Organizing Data in a Traditional
File Environment
The Data Hierarchy
A computer system organizes data in a hierarchy that starts with the bit, which
represents either a 0 or a 1. Bits can be grouped to form a byte to represent one
character, number, or symbol. Bytes can be grouped to form a field, and related
fields can be grouped to form a record. Related records can be collected to form
a file, and related files can be organized into a database.
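As a rough illustration of the hierarchy, the following Python sketch builds each level in turn. The COURSE record layout, the STUDENT file name, and all sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Bit and byte: one character stored as a single byte (ASCII assumed).
character = "A"
byte_value = character.encode("ascii")[0]
bits = format(byte_value, "08b")  # the eight bits that make up the byte


# Field and record: a record groups related fields.
# The field names below are hypothetical.
@dataclass
class CourseRecord:
    student_id: int
    course: str
    date: str
    grade: str


# File: a group of records of the same type.
course_file = [
    CourseRecord(39044, "IS 101", "F06", "B+"),
    CourseRecord(59432, "IS 101", "F06", "A"),
]

# Database: a group of related files (the STUDENT file is left empty here).
database = {"COURSE": course_file, "STUDENT": []}
```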
Organizing Data in a Traditional
File Environment
Traditional File Processing

The use of a traditional approach to file processing encourages each
functional area in a corporation to develop specialized applications and files.
Each application requires a unique data file that is likely to be a subset of the
master file. These subsets of the master file lead to data redundancy and
inconsistency, processing inflexibility, and wasted storage resources.
Database Management
System
 A database management system
(DBMS) is a collection of interrelated
data and a set of programs to access
those data. The collection of data,
usually referred to as a database, contains
information relevant to an enterprise.
Database Management
System
 A database management system is a collection
of programs that enable users to create and
maintain a database.
-----Elmasri & Navathe
 A database management system, or DBMS, is
software designed to assist in maintaining
and utilizing large collections of data. ---------
Ramakrishnan & Gehrke
Database Systems Vs File
Systems ( Why DBMS?)
 Ordinary file system has a number of major
drawbacks:

1. Data redundancy and inconsistency
- Multiple file formats, duplication of information
in different files
2. Difficulty in accessing data
- Need to write a new program to carry out
each new task
Database Systems Vs File
Systems ( Why DBMS?)

3. Data isolation
- Multiple files and formats
4. Integrity problems
- Integrity constraints (e.g., account balance >
0) become part of program code
- Hard to add new constraints or change
existing ones.
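By contrast, a DBMS lets the constraint be declared once with the data instead of being repeated in every program. A minimal sketch using Python's built-in sqlite3 module follows; the account table, its column names, and the helper try_insert are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity constraint (balance > 0) is declared in the table definition,
# so every program that writes to the table is subject to it.
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance > 0))"
)


def try_insert(account_number, balance):
    """Hypothetical helper: return True if the DBMS accepts the row."""
    try:
        conn.execute(
            "INSERT INTO account VALUES (?, ?)", (account_number, balance)
        )
        return True
    except sqlite3.IntegrityError:
        # The DBMS, not the application code, rejected the row.
        return False


accepted = try_insert("A-101", 500.0)
rejected = try_insert("A-102", -40.0)
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Changing the rule later means altering one table definition rather than hunting through application programs.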
Database Systems Vs File
Systems ( Why DBMS?)

5. Atomicity problems
- Failures may leave database in an
inconsistent state with partial updates
carried out. E.g., transfer of funds from
one account to another should either
complete or not happen at all.
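The funds-transfer case can be sketched with a transaction in Python's built-in sqlite3 module. The account table, the balance >= 0 rule, and the transfer function are hypothetical; the point is only that a failure in the second update undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("A-101", 500.0), ("A-102", 100.0)]
)
conn.commit()


def transfer(source, target, amount):
    """Move funds in one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT account_number, balance FROM account"))


# The debit would overdraw A-102, so the credit to A-101 is rolled back too.
failed = transfer("A-102", "A-101", 300.0)
after_failure = balances()

succeeded = transfer("A-101", "A-102", 200.0)
after_success = balances()
```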
Database Systems Vs File
Systems ( Why DBMS?)

 6. Concurrent-access anomalies
- Concurrent access is needed for system performance and
usability
- Uncontrolled concurrent accesses can lead
to inconsistencies. E.g. two people reading
a balance and updating it at the same time.
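The balance example can be traced in a few lines of Python. The first part replays an uncontrolled interleaving by hand; the second serializes the read-and-update step with a lock, which is only an analogy for the concurrency control a DBMS provides. The Account class and the amounts are hypothetical.

```python
import threading

# Uncontrolled access: both users read the balance before either writes.
balance = 100
read_by_user_1 = balance
read_by_user_2 = balance
balance = read_by_user_1 - 30  # user 1 writes 70
balance = read_by_user_2 - 50  # user 2 overwrites it with 50
lost_update_balance = balance  # user 1's withdrawal has been lost


class Account:
    """Hypothetical account whose read-modify-write step is serialized."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Only one thread at a time may read and then update the balance.
        with self._lock:
            current = self.balance
            self.balance = current - amount


account = Account(100)
threads = [
    threading.Thread(target=account.withdraw, args=(amount,))
    for amount in (30, 50)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
controlled_balance = account.balance
```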
Database Systems Vs File
Systems ( Why DBMS?)

7. Security problems: Not every user of the database system should be able to
access all the data.

Database systems offer solutions to all these problems.
Some Commercial Database
Management Software

 For Personal Computers

1. Microsoft Access
2. FoxPro
3. dBase
Some Commercial Database
Management Software

1. Oracle – Oracle 8i, Oracle 9i, Oracle 10g, 11g
2. Microsoft SQL Server
3. IBM DB2/DB2 UDB
4. Informix
5. Sybase
6. Ingres
Some Open Source Database
Management Software

1. CUBRID
2. Firebird
3. MariaDB
4. MongoDB
5. PostgreSQL
6. MySQL
7. SQLite
Database System
Applications
Databases are widely used. Some representative applications are:
1. Banking: For customer information, accounts, loans and banking transactions.
2. Airlines/Railways/Road Transport: For ticket reservation, schedules and routes.
3. Universities: For student information, courses and grades (education
management).
4. Credit card transactions: For purchases on credit cards and monthly statement
generation.
5. Sales: For customer, product and purchase information.
Database System
Applications
6. Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, storing information about the
communication networks.
7. Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds.
8. Manufacturing: For management of supply chains and for tracking production of items
in factories, inventories of items in warehouses/stores and orders for items.
9. Human resources: For information about employees, salaries, payroll taxes and
benefits and for generation of paychecks.
Capabilities of Database
Management Systems (DBMSs)
There are three important capabilities of DBMS that traditional file
environments lack—data definition, data dictionary, and a data
manipulation language.
–Data definition capability: Specifies structure of database content, used
to create tables and define characteristics of fields
–Data dictionary: Automated or manual file storing definitions of data
elements and their characteristics
–Data manipulation language: Used to add, change, delete, retrieve data
from database
• Structured Query Language (SQL)
• Microsoft Access user tools for generating SQL
–Many DBMS have report generation capabilities for creating polished
reports (Crystal Reports)
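The three capabilities can be seen side by side in a short sketch using Python's built-in sqlite3 module. The supplier table, its columns, and the sample rows are hypothetical; SQLite's PRAGMA table_info stands in here for a data dictionary lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a table and define the characteristics of its fields.
conn.execute(
    "CREATE TABLE supplier ("
    " supplier_number INTEGER PRIMARY KEY,"
    " supplier_name TEXT NOT NULL,"
    " supplier_city TEXT)"
)

# Data dictionary: ask the DBMS for the stored definitions of the fields.
data_dictionary = [
    (name, column_type)
    for _cid, name, column_type, _notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(supplier)"
    )
]

# Data manipulation (SQL): add, change, retrieve, and delete data.
conn.execute("INSERT INTO supplier VALUES (8259, 'Supplier A', 'Dhaka')")
conn.execute("INSERT INTO supplier VALUES (8261, 'Supplier B', 'Khulna')")
conn.execute(
    "UPDATE supplier SET supplier_city = 'Chattogram' WHERE supplier_number = 8261"
)
after_update = conn.execute(
    "SELECT supplier_name, supplier_city FROM supplier ORDER BY supplier_number"
).fetchall()
conn.execute("DELETE FROM supplier WHERE supplier_number = 8259")
remaining = conn.execute("SELECT supplier_number FROM supplier").fetchall()
```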
Process of Database Design

• Conceptual design vs. physical design
• Entity-relationship diagram: a correct data model is essential for a system to serve the business well
• Relational database model
• Referential integrity
– Rules used by RDBMS to ensure relationships between tables remain consistent
• Normalization
– Streamlining complex groupings of data to minimize redundant data elements and awkward many-to-many relationships
Process of Database Design
• Conceptual (logical) design: abstract model from a business perspective. Develop a logical database
model, which describes data using notation that corresponds to a data organization used by a database
management system.
• Physical design: how the database is arranged on direct-access storage devices. Prescribe the technical
specifications for the computer files and databases in which to store the data.
• Logical and physical database design proceed in parallel with other system design steps.
Process of Database Design

 E-R Model
 The purpose of E-R diagramming is to capture the
richest possible understanding of the meaning of
the data necessary for an information system or
organization.
 The E-R model is expressed in terms of:
 Entities in the business environment
 Relationships among those entities
 The attributes (properties) of both the entities and their
relationships
Process of Database Design
• Entity – person, place, or object about which the organization wishes to maintain data. An entity may be concrete, such as a person or a book, or abstract, such as a loan, a holiday, or a concept.
• An entity has its own identity that distinguishes it from every other entity.
– Person: Employee, Student, Patient
– Place: Store, Warehouse
– Object: Machine, Building
– Concept: Account, Course


Process of Database Design
• Attribute – named property or characteristic of an entity that is of interest to the organization. An attribute name is a noun and should be unique. For uniqueness and clarity, each attribute name should follow a standard format. Similar attributes of different entity types should use similar but distinguishing names.
– Student: Student_ID, Student_Name, Address, Cell
• An attribute definition:
– States what the attribute is and possibly why it is important
– Should make it clear what is included and what is not included
– Contains any aliases or alternative names
– States the source of values for the attribute


Process of Database Design
(Types of attributes)
• Required – must have a value for every entity (or relationship) instance with which it is associated
• Optional – may not have a value for every entity (or relationship) instance with which it is associated
• Simple
– An attribute that cannot be divided into subparts. Each entity has a single atomic value for the attribute. For example: city, country.
• Composite
– An attribute that can be divided into subparts, i.e., other attributes. For example:
– Address(Apt#, House#, Street, City, State, ZipCode, Country), or
– Name(FirstName, MiddleName, LastName).
• Single-valued
– There is a single value of the attribute for a particular entity. Example: Student_ID
• Multi-valued
– An entity may have multiple values for the attribute. Example: Phone_Number
Derived Attributes

• Derived – the value of this type of attribute can be derived from the values of other related attributes
or entities.
• Example: the age attribute in a customer entity set that also has an attribute named date-of-birth.
date-of-birth may be referred to as a base attribute, or a stored attribute. The value of a derived
attribute is not stored, but is computed when required.
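A small Python sketch of the age example follows. Only date_of_birth is stored; age is computed on request. The Customer class, its field names, and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    customer_id: str
    customer_name: str
    date_of_birth: date  # base (stored) attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


customer = Customer("C-001", "Sample Customer", date(1990, 6, 15))
age_day_before_birthday = customer.age(date(2024, 6, 14))
age_on_birthday = customer.age(date(2024, 6, 15))
```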
Identifier Attributes (Keys)

• Identifier (key) – an attribute (or combination of attributes) that uniquely identifies individual instances of an entity type
• Primary key – attribute (or combination of attributes) whose value is unique across all occurrences of a relation
• Simple versus composite identifier
• Candidate identifier – an attribute that could be a key because it satisfies the requirements for being an identifier
• Choose identifiers that
– Will not change in value
– Will not be null
Two Entities, EMPLOYEE e1 and
COMPANY c1 and their attributes
Relationships
• Relationship – an association among several entities. For example, a depositor
relationship associates a customer with each account that he or she has.
• A relationship name is a verb phrase in the present tense
• Relationship definition guidelines:
– Explain what action is being taken and why it is important
– Give examples to clarify the action
– Explain any optional participation
– Explain the reason for any explicit maximum cardinality other than many
– Explain any reasons for participation restrictions
– Explain the extent of history that is kept in the relationship
– Explain whether an entity instance involved in a relationship instance can transfer participation to another relationship instance


Process of Database Design
(Data Models)
• A data model is a collection of conceptual tools for
describing data, data relationships, data semantics and
consistency constraints.
• 1. Entity-Relationship Model: This is a higher-level data
model. It is based on a perception of the real world as
a collection of basic objects, called entities, and
the relationships among these objects.
– customer – customer-id, customer-name, customer-street, customer-city
– account – account-number, balance
Process of Database Design
(Data Models)
The COMPANY Conceptual Schema in
UML Class Diagram Notation
Human Resource Schema
NOTATION for E-R diagrams
 The E-R diagram can express the overall logical
structure of a database graphically. Such a
diagram consists of the following major
components:
 Rectangles, which represent entity sets
 Ellipses/ ovals, which represent attributes
 Diamonds, which represent relationship sets
 Lines, which link attributes to entity sets and entity sets
to relationship sets
 Each key attribute is underlined
NOTATION for E-R diagrams

• Double ellipses/ovals, which represent multivalued attributes
• Dashed ellipses/ovals, which denote derived attributes
• Double lines, which indicate total participation of an entity in a relationship set
• Double rectangles, which represent weak entity sets
• Double diamonds, which identify the relationship set for a weak entity set
Data Models
2. Relational Model: This is a lower-level model. It uses a
collection of tables to represent both data and the relationships
among those data. The relational model is an example of a
record-based model, because the database is
structured in fixed-format records of several types. Each table
contains records of a particular type. Each record type defines
a fixed number of fields, or attributes. The columns of the table
correspond to the attributes of the record type.

Each table has multiple columns, and each column has a unique name.
Data Models
Transforming E-R Diagrams into
Relations
• Transforming an E-R diagram into normalized relations
and merging all of them into one consolidated set of
relations takes four steps:
1. Represent entities (each becomes a relation)
2. Represent relationships (each must be represented in the relational database design)
3. Normalize the relations (make them well structured)
4. Merge the relations (renormalize if necessary to remove redundancy)
Representing Entities
• Each regular entity is transformed into a relation
• The identifier of the entity type becomes the primary key of the corresponding relation
• The primary key must satisfy the following two conditions:
– The value of the key must uniquely identify every row in the relation
– The key should be nonredundant
• The entity type label is translated into a relation name


Initial Conceptual Design of Entity
Types for the COMPANY Database
Schema

• Based on the requirements, we can identify four initial entity types in the COMPANY database:
– DEPARTMENT
– PROJECT
– EMPLOYEE
– DEPENDENT
• Their initial conceptual design is shown on the following slide
• The initial attributes shown are derived from the requirements description
Initial Design of Entity Types:
EMPLOYEE, DEPARTMENT, PROJECT,
DEPENDENT
Refining the COMPANY database
schema by introducing relationships

• By examining the requirements, six relationship types are identified
• All are binary relationships (degree 2)
• Listed below with their participating entity types:
– WORKS_FOR (between EMPLOYEE, DEPARTMENT)
– MANAGES (also between EMPLOYEE, DEPARTMENT)
– CONTROLS (between DEPARTMENT, PROJECT)
– WORKS_ON (between EMPLOYEE, PROJECT)
– SUPERVISION (between EMPLOYEE (as subordinate), EMPLOYEE (as supervisor))
– DEPENDENTS_OF (between EMPLOYEE, DEPENDENT)
E-R DIAGRAM – Relationship Types
are:
WORKS_FOR, MANAGES, WORKS_ON, CONTROLS, SUPERVISION, DEPENDENTS_OF
Foreign Key and Referential
Integrity

• Foreign key – to make sure the tables relate to each
other, the primary key from one table is stored in a
related table as a foreign key. A foreign key is an attribute that appears
as a nonprimary key attribute in one relation and as a
primary key attribute (or part of a primary key) in
another relation.
• Referential integrity – rule that states that either each
foreign key value must match a primary key value in
another relation or the foreign key value must be null
(i.e., have no value)
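The rule can be exercised in a few lines with Python's built-in sqlite3 module. The department and employee tables are loosely modeled on the COMPANY entity types, but the column names, sample values, and the insert_employee helper are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute(
    "CREATE TABLE department ("
    " dept_number INTEGER PRIMARY KEY,"
    " dept_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_number INTEGER REFERENCES department(dept_number))"  # foreign key
)
conn.execute("INSERT INTO department VALUES (5, 'Research')")


def insert_employee(emp_id, name, dept_number):
    """Hypothetical helper: return True if the row satisfies referential integrity."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, dept_number)
        )
        return True
    except sqlite3.IntegrityError:
        return False


matching_key = insert_employee(1, "Employee A", 5)   # matches a primary key value
null_key = insert_employee(2, "Employee B", None)    # null foreign key is allowed
dangling_key = insert_employee(3, "Employee C", 9)   # no department 9 exists
```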
COMPANY Database
Referential Integrity Constraints for
COMPANY database
One possible database state for
the COMPANY relational database
schema
One possible database state for the COMPANY
relational database schema – continued

Conceptual Data Model and
Transformed Relations
Well-structured relation

• Well-structured relation – relation that contains
a minimum amount of redundancy and that
allows users to insert, modify, and delete the
rows without errors or inconsistencies.
Well-structured relation

The figure shows an example of a relation named EMPLOYEE2,
which contains data about employees and the courses they
have completed. This relation contains the following attributes:
Emp_ID, Name, Dept, Salary, Course,
and Date_Completed.
EMPLOYEE2 (Emp_ID, Name, Dept, Salary, Course, Date_Completed)
Well-structured relation

• EMPLOYEE2 is not a well-structured relation. If we
examine the sample data in the table, we notice a
considerable amount of redundancy. For example,
the Emp_ID, Name, Dept, and Salary values appear
in two separate rows for employees 100, 110, and
150.
• Consequently, if the salary for employee 100
changes, we must record this fact in two rows (or
more, for some employees).
• The problem with this relation is that it contains data
about two entities:
– EMPLOYEE and COURSE.
Well-structured relation

• The figure shows an example of a relation named EMPLOYEE1. This relation contains the following
attributes describing employees: Emp_ID, Name, Dept, and Salary.
– EMPLOYEE1(Emp_ID, Name, Dept, Salary)
• This table has five sample rows, corresponding to five employees.
• EMPLOYEE1 is a well-structured relation. Each row of the table contains data describing one
employee, and any modification to an employee’s data (such as a change in salary) is confined
to one row of the table.
Normalization
• Normalization – the process of converting
complex data structures into simple, stable data
structures.
• Normalization is a systematic approach to
decomposing tables to eliminate data
redundancy and undesirable characteristics such as
insertion, update, and deletion anomalies. It is a
multi-step process that puts data into tabular form
by removing duplicated data from the relation
tables.
• The result of normalization is that every
nonprimary key attribute depends upon the whole
primary key and nothing but the primary key.
Normalization
• First Normal Form (1NF):
– Unique rows and no multivalued attributes; all relations are in 1NF
• Second Normal Form (2NF):
– Each nonprimary key attribute is identified by the whole key (referred to as full functional dependency)
• Third Normal Form (3NF):
– Nonprimary key attributes do not depend on each other (i.e., there are no transitive dependencies)


Functional Dependency
• Normalization is based on the analysis of functional dependence. A functional dependency is a
constraint between two attributes in which the
value of one attribute is determined by the value of another
attribute
• Functional dependency is not a mathematical dependency
• Instances (or sample data) in a relation do not prove the
existence of a functional dependency
• Knowledge of the problem domain is the most reliable method for
identifying functional dependencies
Functional Dependency

• Example of attribute B being functionally dependent on attribute A
– Dependency is represented by an arrow ( → )
– Attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B
– Represented as: A → B
Functional Dependency

• Emp_ID → Name
– A given Emp_ID value can have only one Name value associated with it.
Functional Dependency

An attribute may be functionally dependent on two (or more) attributes
rather than on a single attribute. For example, consider the relation
EMP COURSE (Emp_ID, Course, Date_Completed). We represent the functional
dependency in this relation as follows:
Emp_ID, Course → Date_Completed
(this is sometimes shown as Emp_ID + Course → Date_Completed).
In this case, Date_Completed cannot be determined by either Emp_ID or
Course alone because Date_Completed is a characteristic of an employee
taking a course.
Functional Dependency

• Consider the sample data in the relation EXAMPLE(A,B,C,D).
• The sample data in this relation prove that attribute B is not
functionally dependent on attribute A because A does not
uniquely determine B (two rows with the same value of A have
different values of B).
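The check described here is mechanical and can be sketched in Python. The function violates_fd and the four sample rows are hypothetical stand-ins for the figure. As noted earlier in these slides, sample data can disprove a functional dependency but cannot prove one, so a False result only means that no counterexample was found.

```python
def violates_fd(rows, determinant, dependent):
    """Return True if the sample rows contradict determinant -> dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = tuple(row[attr] for attr in dependent)
        # The same determinant value paired with a different dependent value
        # is a counterexample to the dependency.
        if seen.setdefault(key, value) != value:
            return True
    return False


# Hypothetical sample data for EXAMPLE(A, B, C, D).
example = [
    {"A": "X", "B": "U", "C": "X", "D": "Y"},
    {"A": "Y", "B": "X", "C": "Z", "D": "X"},
    {"A": "Z", "B": "Y", "C": "Y", "D": "Y"},
    {"A": "Y", "B": "Z", "C": "W", "D": "Z"},
]

a_to_b_violated = violates_fd(example, ["A"], ["B"])  # two rows share A = "Y"
b_to_a_violated = violates_fd(example, ["B"], ["A"])  # no counterexample here
```

Because determinant is a list, the same function also covers composite determinants such as Emp_ID, Course → Date_Completed.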
Second Normal Form (2NF)

• Second normal form (2NF) – a relation is in second
normal form if every nonprimary key attribute is
functionally dependent on the whole primary key
• To convert a relation into 2NF, decompose the
relation into new relations using the attributes, called
determinants, that determine other attributes
• The determinants are the primary keys of the new
relations
Second Normal Form (2NF)

• EMPLOYEE2 is an example of a relation that is not in second normal form.
EMPLOYEE2 (Emp_ID, Name, Dept, Salary, Course, Date_Completed)
• To convert a relation to second normal form, we decompose the relation into new
relations using the attributes, called determinants, that determine other attributes;
the determinants are the primary keys of these relations. We can use the
principles of normalization to convert the EMPLOYEE2 table with its redundancy
to EMPLOYEE1 and EMP COURSE:
1. EMPLOYEE1(Emp_ID, Name, Dept, Salary): This relation is in second normal form because its primary key is the single attribute Emp_ID.
2. EMP COURSE(Emp_ID, Course, Date_Completed): This relation is in second normal form because Date_Completed depends on the whole key (Emp_ID, Course).
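The decomposition can be sketched in plain Python by projecting EMPLOYEE2 onto each determinant's attributes. The rows below are hypothetical sample data (the figure itself is not reproduced here), arranged so that employees 100, 110, and 150 each appear twice.

```python
# Hypothetical rows: (Emp_ID, Name, Dept, Salary, Course, Date_Completed)
employee2 = [
    (100, "Employee A", "Marketing", 42000, "SPSS", "2023-06-19"),
    (100, "Employee A", "Marketing", 42000, "Surveys", "2023-10-07"),
    (140, "Employee B", "Accounting", 39000, "Tax Acc", "2023-12-08"),
    (110, "Employee C", "Info Systems", 41500, "SPSS", "2023-01-12"),
    (110, "Employee C", "Info Systems", 41500, "C++", "2023-04-22"),
    (190, "Employee D", "Finance", 38000, "Investments", "2023-05-07"),
    (150, "Employee E", "Marketing", 38500, "SPSS", "2023-06-19"),
    (150, "Employee E", "Marketing", 38500, "TQM", "2023-08-12"),
]

# EMPLOYEE1: attributes determined by Emp_ID alone (duplicates collapse).
employee1 = sorted({row[:4] for row in employee2})

# EMP COURSE: Date_Completed is determined by (Emp_ID, Course).
emp_course = sorted({(row[0], row[4], row[5]) for row in employee2})

# Joining the two relations on Emp_ID reproduces the original rows,
# so no information was lost by the decomposition.
rejoined = {
    employee + (course, completed)
    for employee in employee1
    for emp_id, course, completed in emp_course
    if employee[0] == emp_id
}

# A salary change for employee 100 now touches one row instead of two.
rows_for_100_before = sum(1 for row in employee2 if row[0] == 100)
rows_for_100_after = sum(1 for row in employee1 if row[0] == 100)
```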
Second Normal Form (2NF)
Third Normal Form (3NF)

• Third normal form (3NF) – a relation is in second
normal form and has no functional (transitive)
dependencies between two (or more) nonprimary
key attributes
Third Normal Form (3NF)

• Consider the relation SALES (Customer_ID, Customer_Name, Salesperson, Region)
• The following functional dependencies exist in the SALES relation:
– 1. Customer_ID → Customer_Name, Salesperson, Region (Customer_ID is the primary key.)
– 2. Salesperson → Region (Each salesperson is assigned to a unique region.)
• Notice that SALES is in second normal form because the primary key consists of a single
attribute (Customer_ID). However, Region is functionally dependent on Salesperson, and
Salesperson is functionally dependent on Customer_ID. As a result, there are data maintenance
problems in SALES.
Third Normal Form (3NF)

1. A new salesperson (Robinson) assigned to the North
region cannot be entered until a customer has been
assigned to that salesperson (because a value for
Customer_ID must be provided to insert a row in the
table).
2. If customer number 6837 is deleted from the table,
we lose the information that salesperson Hernandez is
assigned to the East region.
3. If salesperson Smith is reassigned to the East
region, several rows must be changed to reflect that
fact (two rows are shown in Figure Sales).
Third Normal Form (3NF)

These problems can be avoided by decomposing SALES into the two
relations, based on the two determinants, shown in Figure SPERSON.
These relations are the following:
– SALES1(Customer_ID, Customer_Name, Salesperson)
– SPERSON(Salesperson, Region)
• Note that Salesperson is the primary key in SPERSON. Salesperson is
also a foreign key in SALES1. A foreign key is an attribute that appears
as a nonprimary key attribute in one relation (such as SALES1) and as
a primary key attribute (or part of a primary key) in another relation. You
designate a foreign key by using a dashed underline.
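A sketch of the decomposed relations using Python's built-in sqlite3 module shows how each of the three maintenance problems listed for SALES goes away. The customer names, customer IDs other than 6837, and Smith's original region are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE sperson (salesperson TEXT PRIMARY KEY, region TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE sales1 ("
    " customer_id INTEGER PRIMARY KEY,"
    " customer_name TEXT NOT NULL,"
    " salesperson TEXT REFERENCES sperson(salesperson))"  # foreign key
)
conn.executemany(
    "INSERT INTO sperson VALUES (?, ?)",
    [("Smith", "South"), ("Hernandez", "East")],
)
conn.executemany(
    "INSERT INTO sales1 VALUES (?, ?, ?)",
    [
        (8023, "Customer A", "Smith"),
        (7924, "Customer B", "Smith"),
        (6837, "Customer C", "Hernandez"),
    ],
)

# 1. Insertion: Robinson can be recorded before having any customers.
conn.execute("INSERT INTO sperson VALUES ('Robinson', 'North')")

# 2. Deletion: removing customer 6837 keeps Hernandez's region on record.
conn.execute("DELETE FROM sales1 WHERE customer_id = 6837")
hernandez_region = conn.execute(
    "SELECT region FROM sperson WHERE salesperson = 'Hernandez'"
).fetchone()[0]

# 3. Update: reassigning Smith changes exactly one row.
rows_changed = conn.execute(
    "UPDATE sperson SET region = 'East' WHERE salesperson = 'Smith'"
).rowcount

# Region is still available per customer through a join.
customer_regions = conn.execute(
    "SELECT s.customer_id, p.region FROM sales1 AS s"
    " JOIN sperson AS p ON p.salesperson = s.salesperson"
    " ORDER BY s.customer_id"
).fetchall()
```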
Removing Transitive Dependencies (a)
Relation with Transitive Dependency (b)
Relation in 3NF
Third Normal Form (3NF)
 A foreign key must satisfy referential integrity, which specifies that the value of an
attribute in one relation depends on the value of the same attribute in another
relation. Thus, the value of Salesperson in each row of table SALES1 is limited
only to the current values of Salesperson in the SPERSON table. If some sales do
not have to have a salesperson, then it is possible for the value of Salesperson to
be null (i.e., have no value). Referential integrity is one of the most important
principles of the relational model.
AN UNNORMALIZED RELATION
FOR ORDER

An unnormalized relation contains repeating groups. For example, there
can be many parts and suppliers for each order. There is only a one-to-one
correspondence between Order_Number and Order_Date.
NORMALIZED TABLES
CREATED FROM ORDER

After normalization, the original relation ORDER has been broken down into
four smaller relations. The relation ORDER is left with only two attributes and
the relation LINE_ITEM has a combined, or concatenated, key consisting of
Order_Number and Part_Number.
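The concatenated key can be sketched with Python's built-in sqlite3 module. Only ORDER and LINE_ITEM are shown. The table is named orders here because ORDER is a reserved word in SQL, and the part_quantity column and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# ORDER is left with two attributes.
conn.execute(
    "CREATE TABLE orders ("
    " order_number INTEGER PRIMARY KEY,"
    " order_date TEXT NOT NULL)"
)

# LINE_ITEM is identified by the combination of order and part.
conn.execute(
    "CREATE TABLE line_item ("
    " order_number INTEGER NOT NULL REFERENCES orders(order_number),"
    " part_number INTEGER NOT NULL,"
    " part_quantity INTEGER NOT NULL,"
    " PRIMARY KEY (order_number, part_number))"
)

conn.execute("INSERT INTO orders VALUES (1634, '2024-02-02')")
conn.execute("INSERT INTO line_item VALUES (1634, 152, 2)")
conn.execute("INSERT INTO line_item VALUES (1634, 137, 3)")  # same order, other part

try:
    # The same part on the same order repeats the concatenated key.
    conn.execute("INSERT INTO line_item VALUES (1634, 152, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

line_count = conn.execute("SELECT COUNT(*) FROM line_item").fetchone()[0]
```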
MICROSOFT ACCESS DATA
DICTIONARY FEATURES

Microsoft Access has a rudimentary data dictionary capability that displays
information about the size, format, and other characteristics of each field in a
database. Displayed here is the information maintained in the SUPPLIER table.
AN ACCESS QUERY
Non-Relational Databases and
Databases in the Cloud

• Data are now stored in text messages, social media
postings, maps, and the like. Non-relational database
management systems are better at managing large
data sets on distributed computing networks. They can
easily be scaled up or down depending on the
particular needs of your business at a particular time.
• Cloud computing service companies provide a way for
you to manage your company’s data through Internet
access using a Web browser.
Non-Relational Databases and
Databases in the Cloud
• Non-relational databases: “NoSQL”
– More flexible data model
– Data sets stored across distributed machines
– Easier to scale
– Handle large volumes of unstructured and structured data (Web, social media, graphics)
• Databases in the cloud
– Typically, less functionality than on-premises DBs
– Amazon Relational Database Service, Microsoft SQL Azure
– Private clouds
Multidimensional Databases

• A multidimensional database presents
the data to the user in several
dimensions. A three-dimensional
database might present the
information by
– Sales Region
– Season
– Product Line
Tools for Improving Business
Performance and Decision Making
MULTIDIMENSIONAL DATA MODEL: The view that is showing is product versus region.
If you rotate the cube 90 degrees, the face that will show is product versus
actual and projected sales. If you rotate the cube 90 degrees again, you will
see region versus actual and projected sales. Other views are possible.
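Rotating the cube amounts to choosing which two dimensions form the rows and columns while the third is held fixed. A plain-Python sketch follows; the product names, regions, sales figures, and the face function are hypothetical.

```python
# Hypothetical sales facts: (product, region, measure, sales)
raw_facts = [
    ("Nuts", "East", "actual", 100),
    ("Nuts", "East", "projected", 110),
    ("Nuts", "West", "actual", 80),
    ("Nuts", "West", "projected", 90),
    ("Bolts", "East", "actual", 60),
    ("Bolts", "East", "projected", 70),
    ("Bolts", "West", "actual", 40),
    ("Bolts", "West", "projected", 50),
]
facts = [
    dict(zip(("product", "region", "measure", "sales"), row)) for row in raw_facts
]


def face(facts, row_dim, col_dim, **fixed):
    """Return one face of the cube: row_dim versus col_dim, other dimension fixed."""
    table = {}
    for fact in facts:
        if all(fact[dim] == value for dim, value in fixed.items()):
            table[(fact[row_dim], fact[col_dim])] = fact["sales"]
    return table


# The view that is showing: product versus region (actual sales).
product_by_region = face(facts, "product", "region", measure="actual")

# Rotate once: product versus actual and projected sales (East region).
product_by_measure = face(facts, "product", "measure", region="East")

# Rotate again: region versus actual and projected sales (Nuts).
region_by_measure = face(facts, "region", "measure", product="Nuts")
```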
Managing the Firm’s Data
Resources

• Establishing an information policy
– Firm’s rules, procedures, and roles for sharing, managing, and standardizing data
• Data administration
– Establishes policies and procedures to manage data
• Data governance
– Deals with policies and processes for managing availability, usability, integrity, and security of data, especially regarding government regulations
• Database administration
– Creating and maintaining the database
Managing the Firm’s Data
Resources

• Ensuring data quality
– More than 25 percent of critical data in Fortune 1000
company databases are inaccurate or incomplete
– Redundant data
– Inconsistent data
– Faulty input
– Before a new database is in place, the firm needs to:
• Identify and correct faulty data
• Establish better routines for editing data once the database is in
operation
Managing the Firm’s Data
Resources

• Data quality audit:
– Structured survey of the accuracy and level of
completeness of the data in an information system
• Survey samples from data files, or
• Survey end users for perceptions of quality
• Data cleansing
– Software to detect and correct data that are incorrect,
incomplete, improperly formatted, or redundant
– Enforces consistency among different sets of data from
separate information systems
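A very small audit-and-cleanse pass might look like the following Python sketch. The customer records, the field names, the simplified e-mail pattern, and the audit function are all hypothetical. Real data cleansing software applies far richer rules.

```python
import re

# Hypothetical customer records drawn from a data file.
records = [
    {"customer_id": "C001", "name": "Customer A", "email": "a@example.com"},
    {"customer_id": "C002", "name": "", "email": "b@example.com"},       # incomplete
    {"customer_id": "C003", "name": "Customer C", "email": "c-at-example"},  # bad format
    {"customer_id": "C001", "name": "Customer A", "email": "a@example.com"},  # redundant
]

# Deliberately simple pattern, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def audit(records):
    """Return the positions of incomplete, badly formatted, and redundant records."""
    report = {"incomplete": [], "badly_formatted": [], "redundant": []}
    seen = set()
    for position, record in enumerate(records):
        if any(not value for value in record.values()):
            report["incomplete"].append(position)
        if record["email"] and not EMAIL_PATTERN.match(record["email"]):
            report["badly_formatted"].append(position)
        key = tuple(sorted(record.items()))
        if key in seen:
            report["redundant"].append(position)
        seen.add(key)
    return report


report = audit(records)

# Cleansing step: keep only the records that raised no flag.
flagged = {position for positions in report.values() for position in positions}
cleaned = [r for position, r in enumerate(records) if position not in flagged]
```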
Thank
You
