Chap 1 - Fundamentals of Database Systems

You might also like

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 48

Fundamentals of Database

Systems

(Comp 222)
Some of the notes are adapted from:

Elmasri et. al (2004). Fundamentals of Database Systems, 4 th ed,


Pearson education
Introduction
 Database
– Databases play a critical role in almost all areas where computers are used,
including business, engineering, medicine, law, education, and library
science, etc.
– Database systems are designed to manage large data set in an organization.
– They are used to:
• maintain internal records,
• present data to customers and clients on the World-Wide-Web, and
• support many other commercial processes
• Thus the DB course is about:
1. How to organize data
2. Supporting multiple users
3. Efficient and effective data retrieval
4. Secured and reliable storage of data
5. Maintaining consistent data
6. Making information useful for decision making
Definition
General definition:
• A database is a collection of related data .
– Data-facts that can be recorded and that have implicit meaning.
e.g. names, telephone numbers, and addresses of the people you know
Specific definition:
• A database is a shard collection of logically related data, organized
systematically to meet the information need of multiple users. A random
assortment of data cannot correctly be referred to as a database.
• A database is nothing more than a collection of shared information that exists
over a long period of time, often many years.
• A database represents some aspect of the real world, sometimes called the mini
world or the Universe of Discourse (UoD) where Changes to the mini world can
be reflected in the database.
• A database is designed, built, and populated with data for a specific purpose. It
has an intended group of users and some preconceived applications in which
these users are interested.
• It is a collection of data that is managed by a DBMS.
DBMS
• A database management system (DBMS) is a collection of
programs that enables users to create and maintain a database.
• It is a powerful tool for creating and managing large amounts
of data efficiently and allowing it to persist over long periods
of time, safely
• The DBMS is hence a general-purpose software system that
facilitates the processes of defining, constructing, and
manipulating databases for various applications.
Database system
• Database system: database and DBMS
software together.
• E.g. UNIVERSITY database,
• we store data to represent each student, course, section,
grade report, and prerequisite as a record in the
appropriate file
Database Management levels/approaches

Q. How do you manage your data?

1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
Database Management levels/approaches ….

1. Manual Approach
• Primitive and traditional way of information handling
• Cards and paper are used for the purpose.
• The data storage and retrieval will be performed using human
labour.
– Files for as many event and objects as the organization are used to
store information.
– Each of the files containing various kinds of information is labelled
and stored in one or more cabinets.
– The cabinets could be kept in safe places for security purpose based
on the sensitivity of the information contained in it.
– Insertion and retrieval is done by searching first for the right cabinet
then for the right file then the information.
– One could have an indexing system to facilitate access to the data
Database Management levels/approaches ….

1. Manual Approach...
• Limitations of the Manual approach
» Prone to error
» Difficult to update, retrieve, integrate
» You have the data but it is difficult to
compile the information
» Limited to small size information
» Cross referencing is difficult
Database Management levels/approaches ….

• An alternative approach of data handling is a


computerized way of dealing with the information.
• The computerized approach could also be either
decentralized or centralized based on where the data
resides in the system
• Possible approaches:
• Traditional File Based Approach (decentralized)
• Database Approach (centralized)
Database Management levels/approaches ….

2. Traditional File Based Approach


• File based systems were an early attempt to
computerize the manual filing system.
• This approach is the decentralized computerized data
handling method.
• A collection of application programs perform services
for the end-users. In such systems, every application
program that provides service to end users define and
manage its own data
• Such systems have number of programs for each of the
different applications in the organization.
• File, in traditional file based approach, is a collection of
records which contains logically related data.
Database Management levels/approaches ….
2. Traditional File Based Approach...
Database Management levels/approaches ….

2. Traditional File Based Approach


Limitations of the Traditional File Based approach

– Separation or Isolation of Data


– Limited data sharing
– Lengthy development and maintenance time
– Duplication or redundancy of data (money and time cost
and loss of data integrity)
– Data dependency on the application
– Incompatible file formats or data structures (e.g. “C” and
COBOL)
Database Management levels/approaches ….

2. Traditional File Based Approach....


Limitations of the Traditional File Based approach...
– update anomalies
• Modification Anomalies: a problem experienced when one
ore more data value is modified on one application program
but not on others containing the same data set.
• Deletion Anomalies: a problem encountered where one
record set is deleted from one application but remain
untouched in other application programs.
• Insertion Anomalies: a problem experienced when ever
there is new data item to be recorded, and the recording is not
made in all the applications. And when same data item is
inserted at different applications, there could be errors in
encoding which makes the new data item to be considered as
a totally different object.
Database Management levels/approaches ….
3. Database Approach
– Database is just a computerized record keeping system or a kind of
electronic filing cabinet.
– Database is a repository for collection of computerized data files.
– Database is a shared collection of logically related data and description of
data designed to meet the information needs of an organization.
– Database is a collection of logically related data where these logically
related data comprises entities, attributes, relationships, and business rules
of an organization's information.
– In addition to containing data required by an organization, database also
contains a description of the data which is known as “Metadata” or “Data
Dictionary” or “Systems Catalogue” or “Data about Data” or some times
“Data Directory”.
– Since a database contains information about the data (metadata), it is called
a self descriptive collection of integrated records.
– The purpose of a database is to store information and to allow users to
retrieve and update that information on demand.
– Database is deigned once and used simultaneously by many users.
– Keeps the data in central position
Database Management levels/approaches ….
3. Database Approach
Database Management levels/approaches ….
3. Database Approach...
Benefits of the database approach
– Program data independence.
– Centralized information control
– Data can be shared
– Improved accessibility of data
– Redundancy can be reduced
– Quality data can be maintained
– Inconsistency can be avoided
– Integrity can be maintained
– Security measures can be enforced
– Improved decision support
– Standards can be enforced
– Compactness
– Speed
– Less labour

Q. Quality of data?
Database Management levels/approaches ….

• What limitations a database approach will


have?
Database Management System (DBMS)
• Database Management System (DBMS) is a Software package used for providing
EFFICIENT, CONVENIENT and SAFE MULTI-USER storage of and access to
MASSIVE amounts of PERSISTENT data.
• Generally, it provides the following services:
– Data storage, retrieval and update in the database
– A user accessible catalogue
– Transaction support service
– Concurrency Control Services
– Recovery Services
– Authorization Services (Security)
– Support for Data Communication
– Integrity Services
– Services to promote data independency between the data and the application
– Utility services: sets of utility service facilities like
• Importing data
• Statistical analysis support
• Index reorganization
• Garbage collection
DBMS Architecture
DBMS Facilities/functions

• A DBMS is software package used to design, manage, and


maintain databases.
• Each DBMS should have facilities to define the database,
manipulate the content of the database and control the
database.
– Data Definition Language (DDL)
–  Data Manipulation Language (DML)
– Data Dictionary
– Data Control Language
DBMS Facilities/functions
• Data Definition Language (DDL):
– Language used to define each data element
required by the organization.
– Commands for setting up schema or the intension
of database
– These commands are used to setup a database,
create, delete and alter table with the facility of
handling constraints
– E.g. CREATE, DELETE, ALTER in SQL
DBMS Facilities/functions …
• Data Manipulation Language (DML):
– Is a core command used by end-users and programmers to
store, retrieve, and access the data in the database e.g. SQL
– Since the required data or Query by the user will be
extracted using this type of language, it is also called
"Query Language“
– E.g. INSERT, UPDATE, SELECT etc in SQL
• Data Dictionary:
– used to store and organize information about the data
stored in the database.
– Meta data (data about data)
DBMS Facilities /functions …
• Data Control Language:
– Database is a shared resource that demands control of data
access and usage.
– The database administrator should have the facility to
control the overall operation of the system.
– Data Control Languages are commands that will help the
Database Administrator to control the database.
– The commands include grant or revoke privileges to
access the database or particular object within the database
and to store or remove database transactions
DBMS components
• Hardware:
– personal computers, mainframe or any server computers to be used in
multi-user system;
– network infrastructure, and
– other peripherals
• Software:
– DBMS software,
– application programs,
– operating systems,
– network software,
– language software
• Data:
– Operational data is the data actually stored in the system to be used by the
user.
– Metadata is the data that is used to store information about the database
itself.
DBMS components …
• Procedure:
– this is the rules and regulations on how to design and
use a database.
– It includes procedures like:
• how to log on to the DBMS,
• how to use facilities,
• how to start and stop DBMS,
• how to make backup,
• how to treat hardware and software failure,
• how to change the structure of the database.

• People:
– the people in the organization that are responsible or play a role in
designing, implementing, managing, administering and using the
resources in the database.
Database Development Lifecycle
• steps in designing a database system
1. Planning: that is identifying information gap in an
organization and propose a database solution to solve the
problem.
2. Analysis: concentrates more on fact finding about the
problem or the opportunity.
• Feasibility analysis,
• requirement determination and structuring, and
• selection of best design
Database Development Lifecycle…
3. Design: in database development more emphasis is given to
this phase.
1. Conceptual Design: concise description of the data, data type,
relationship between data and constraints on the data.
– There is no implementation or physical detail consideration.
– Used to elicit and structure all information requirements
– E.g ER-model
2. Logical Design: a higher level conceptual abstraction with selected
specific data model to implement the data structure.
– It is particular DBMS independent and with no other physical
considerations.
– E.g. Normalization (discover new entities and rules)
3. Physical Design: physical implementation of the logical design of the
database with respect to internal storage and file structure of the
database for the selected DBMS.
– To develop all technology and organizational specification. 
Database Development Lifecycle…
4. Implementation: the testing and deployment of the designed
database for use.
5. Operation and Support: administering and maintaining the
operation of the database system and providing support to users.
Tuning the database operations for best performance.
Roles in Database Design and Use
1. Database Administrator (DBA)
• Responsible to oversee, control and manage the database resources (the
database itself, the DBMS and other related software)
– Authorizing access to the database
– Coordinating and monitoring the use of the database
– Responsible for determining and acquiring hardware and software resources
– Accountable for problems like poor security, poor performance of the
system
– Involves in all steps of database development
• In big organizations having huge amount of data and user requirement.
– Data Administrator (DA): is responsible on management of data
resources.
• database planning, and development,
• maintenance of standards policies and procedures at the conceptual and logical
design phases.
– Database Administrator (DBA): This is more technically oriented role.
• DBA is responsible for the physical realization of the database.
• Involved in physical design, implementation, security and integrity control of the
database.
Roles in Database Design and Use
2. Database Designer (DBD)
• Identifies the data to be stored and choose the appropriate structures to
represent and store the data.
• Should understand the user requirement and should choose how the user
views the database.
• Involve on the design phase before the implementation of the database
system.
• two distinctions of database designers,
– Logical and Conceptual DBD
• Identifies data (entity, attributes and relationship) relevant to the organization
• Identifies constraints on each data
• Understand data and business rules in the organization
• Sees the database independent of any data model at conceptual level and consider one
specific data model at logical design phase.
– Physical DBD
• Take logical design specification as input and decide how it should be physically realized.
• Map the logical data model on the specified DBMS with respect to tables and integrity
constraints. (DBMS dependent designing)
• Select specific storage structure and access path to the database
• Design security measures required on the database
Roles in Database Design and Use

3. Application Programmer and Systems Analyst


• System analyst determines the user requirement and how the
user wants to view the database.
• The application programmer implements these specifications
as programs; code, test, debug, document and maintain the
application program.
• The application programmer determines the interface on how
to retrieve, insert, update and delete data in the database.
• The application could use any high level programming
language according to the availability, the facility and the
required service.
Roles in Database Design and Use
4. End Users
• Workers, whose job requires accessing the database frequently for
various purposes
– Naïve Users:
• Sizable proportion of users
• Unaware of the DBMS
• Only access the database based on their access level and demand
• Use standard and pre-specified types of queries.
– Sophisticated Users
• Users familiar with the structure of the Database and facilities of the DBMS.
• Have complex requirements
• Have higher level queries
• Are most of the time engineers, scientists, business analysts, etc
– Casual Users
• Users who access the database occasionally.
• Need different information from the database each time.
• Use sophisticated database queries to satisfy their needs.
• Are most of the time middle to high level managers.
Actors of the database system
• Actors on the Scene:
– Data Administrator
– Database Administrator
– Database Designer
– End Users

• Workers behind the scene


– DBMS designers and implementers: who design and implement
different DBMS software.
– Tool Developers: experts who develop software packages that
facilitates database system designing and use
– Operators and Maintenance Personnel: system administrators who
are responsible for actually running and maintaining the hardware and
software of the database system and the information technology
facilities.
ANSI-SPARC Architecture
• Actors who have various roles have different ways of
looking (views) to the database
• All users have their own customized views to the database
• Executing their roles in the database doesn’t affect others
views and roles

 Such kinds of functionalities are possible due to the three


level ANSI-SPARC architecture
ANSI-SPARC Architecture
• Three-Level database architecture
ANSI-SPARC Architecture
• External Level: Users' view of the database. It describes
that part of database that is relevant to a particular user.
– Different users have their own customized view of the
database independent of other users.

• Conceptual Level: Community view of the database.


– Describes what data is stored in database and relationships
among the data along with the business constraints.

• Internal Level: Physical representation of the database


on the computer.
– Describes how the data is stored in the database.
ANSI-SPARC Architecture
• ANSI-SPARC Architecture and Database Design Phases
ANSI-SPARC Architecture
• Differences between Three Levels of ANSI-SPARC Architecture
Data Independence

1. Logical Data Independence:


• Refers to immunity of external schemas to
changes in conceptual schema.
• Conceptual schema changes e.g. addition/removal
of entities should not require changes to external
schema or rewrites of application programs.
• The capacity to change the conceptual schema
without having to change the external schemas
and their application programs.
2. Physical Data Independence
• The ability to modify the physical schema without
changing the logical schema
• The capacity to change the internal schema
without having to change the conceptual schema
• Internal schema changes e.g. using different file
organizations, storage structures/devices should
not require change to conceptual or external
schemas.
Data Independence and the ANSI-SPARC Three-level
Architecture
Data models
• Data Definition Languages are too low level to describe the data
requirements of an organization in a way that is readily understandable
by a variety of users.
• We need a higher-level language to describe the database at a higher
level so that the data can be represented in an understandable way.

• Data Model
– a higher-level description of the database schema
– a set of concepts to describe the structure of a database, and certain
constraints that the database should obey.
– it is a description of the way that data is stored in a database
– helps to understand the relationship between entities and to create the most
effective structure to hold data.
– is a collection of tools or concepts for describing
• Data
• Data relationships
• Data semantics
• Data constraints
Categories of data models
• Categorized according to the types of concepts they
use to describe the database structure
– Object-based (High-level or conceptual)
• provide concepts that are close to the way many users perceive
data
– Record-based (representational or implementation)
• provide concepts that may be understood by end users but that
are not too far removed from the way data is organized within the
computer.
• Representational data models hide some details of data storage
but can be implemented on a computer system in a direct way.
• Popular data model
– Physical (low-level)
• provide concepts that describe the details of how data is stored in
the computer
Categories of data models…

• Record-based Data Models


– Consist of a number of fixed format records.
– Each record type defines a fixed number of fields,
– Each field is typically of a fixed length.
• Hierarchical Data Model
• Network Data Model
• Relational Data Model
Record-based Data Models…
1. Hierarchical Model
• The simplest data model
• Record type is referred to as node or Department
segment
• The top node is the root node
• Nodes are arranged in a hierarchical
structure as sort of upside-down tree
• A parent node can have more than one
child node
Employee Job

• A child node can only have one parent


node
• The relationship between parent and
child is one-to-many Time Card Activity
• Relation is established by creating
physical link between stored records
(each is stored with a predefined
access path to other records)
• To add new record type or relationship, What will be the pros and cons
the database must be redefined and of hierarchical model?
then stored in a new form.
Record-based Data Models…
2. Network Model
– Allows record types to have Department Job

more than one parent unlike


hierarchical model
– A network data models sees
records as set members
– Each set has an owner and
one or more members
– Allow many to many Employee

relationship between entities


– Like hierarchical model
Activity

network model is a collection


of physically linked records.
– Allow member records to
have more than one owner
Time Card

What will be the pros and cons of network


model?
Record-based Data Models…
3. Relational Data Model
• Viewed as a collection of tables called
“Relations” equivalent to collection of record
types
• Relation: Two dimensional table
• Stores information or data in the form of Alternative terminologies
tables  rows and columns
Relation Table File
• A row of the table is called tuple equivalent
to record Tuple Row Record
• A column of a table is called attribute Attribute Column Field
equivalent to fields
• Data value is the value of the Attribute
• Records are related by the data stored jointly
in the fields of records in two tables or files.
The related tables contain information that
creates the relation
• The tables seem to be independent but are
related some how.
• No physical consideration of the storage is
required by the user
• Many tables are merged together to come up
with a new virtual view of the relationship
Chapter Two
Relational Data Model

You might also like