DBMS Slides
DATABASE MANAGEMENT SYSTEM (DBMS)
• Database
– is a collection of related data and its metadata organized in a
structured format
– for optimized information management
• Database System
– is an integrated system of hardware, software, people,
procedures, and data
– that define and regulate the collection, storage, management,
and use of data within a database environment
Introduction to DBMS
• A DBMS is a collection of software programs that allows a user to
define data types, structures, and constraints, store data permanently,
and modify and delete that data.
• A DBMS is basically software used to add, modify, delete, and select
data in the database.
• In simpler words, a DBMS is a collection of interrelated data and
the software programs used to access those data.
• Purpose of DBMS: a DBMS addresses the following drawbacks of the
file-processing approach:
1. Data redundancy and inconsistency
• Same information may be duplicated in several places.
• All copies may not be updated properly.
2. Difficulty in accessing data
• A new program has to be written to carry out each new task.
3. Data isolation
• Data are held in different formats.
• Difficult to write new application programs.
4. Security problems
• Every user of the system should be able to access only the data they are
permitted to see.
E.g. payroll people only handle employee records, and cannot see
customer accounts; tellers only access account data and cannot see payroll
data.
• Difficult to enforce this with application programs.
5. Integrity problems
• Data may be required to satisfy constraints.
E.g. no account balance below $25.00.
• Again, difficult to enforce or to change constraints with the file-
processing approach.
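The integrity example above (no account balance below $25.00) can be declared once in the database instead of being repeated in every application program. The following is a minimal sketch using Python's standard sqlite3 module; the table name `account`, its columns, and the helper `open_account` are assumptions made for illustration.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        account_no INTEGER PRIMARY KEY,
        balance    REAL NOT NULL CHECK (balance >= 25.00)
    )
    """
)


def open_account(account_no, balance):
    """Insert an account; return True if the DBMS accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO account (account_no, balance) VALUES (?, ?)",
                (account_no, balance),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the row.
        return False


accepted = open_account(1, 100.00)  # satisfies the constraint
rejected = open_account(2, 10.00)   # violates balance >= 25.00
```

Changing the rule later means changing one CHECK clause rather than every program that writes to the table.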
DBMS manages interaction between end users and database
DBMS vs Flat File System
Advantages of DBMS:
• Controlling Redundancy
• Sharing of Data
• Data Consistency
• Integration of Data
• Integrity Constraints
• Data Security
• Report Writers
• Control Over Concurrency
• Backup and Recovery Procedures
• Data Independence
Disadvantages of DBMS:
• Cost of Hardware and Software
• Cost of Data Conversion
• Cost of Staff Training
• Appointing Technical Staff
• Database Damage
Common terms used in DBMS
• Instance: The collection of information stored in
the database at a particular moment.
• Schema: The overall design of the database that is
not expected to change frequently.
• Mapping
• Metadata: Data about data.
• Data dictionary: A special file that stores metadata.
• Methods: Bodies of code that operate on the values
an object contains.
Database System Structure
Components include:
Storage Manager
Disk Storage
Query Processor
1. Storage Manager
• The storage manager stores, retrieves, and updates data in the database.
It translates DML statements into low-level statements that the
database understands.
• Its components are:
• Authorization and integrity manager: It checks for the correctness of
the constraints specified and also checks the authenticity of the users
while entering the database.
• Transaction manager: It ensures that the database satisfies the ACID
properties (atomicity, consistency, isolation, durability).
• File manager: It is responsible for managing the allocation of space on
the disk storage.
• Buffer manager: It is responsible for fetching data from the disk
storage and for deciding what data to cache in main memory. It enables
the database to handle data sizes that are larger than the size of main memory.
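Atomicity, the "A" in ACID, can be illustrated with a funds transfer: either both updates take effect or neither does. This is a minimal sketch using Python's standard sqlite3 module, not a description of how any particular transaction manager is built; the `account` table, the names Ram and Sita, and the `transfer` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("Ram", 100), ("Sita", 50)]
)
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as a single transaction."""
    try:
        with conn:  # commit if both updates succeed, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative; the credit is undone too.
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer("Ram", "Sita", 30)       # succeeds
failed = transfer("Ram", "Sita", 500)  # fails and leaves no partial update
```

The second call credits Sita before the debit fails, yet the rollback removes that credit, so the balances stay consistent.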
2. Disk Storage
• It is responsible for allocating the disk space and also for
maintaining the data in the database.
• Its components are:
• Data files: The place where the database itself is stored.
• Data dictionary: It stores metadata about the schema of the
database.
• Indices: They provide quick access to the data items that hold
particular values.
3. Query Processor
• It simplifies and facilitates access to the data.
• It translates queries written in a non-procedural language and
executes the resulting statements.
• Its components are:
• DDL interpreter: It interprets DDL statements and translates them
into a low-level form understood by the system.
• DML compiler: It translates DML statements into an
evaluation plan consisting of low-level instructions that the query
evaluation engine understands.
• Query evaluation engine: It executes the low-level instructions
generated by the DML compiler.
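Many systems let the user inspect the evaluation plan chosen for a query. As one concrete illustration (SQLite is used here only as a convenient example, and its planner output format varies between versions), the sketch below asks SQLite for the plan of a query and then runs the query. The `books` table and the index name `idx_books_subject` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, subject TEXT, price INTEGER)"
)
# An index gives quick access to rows holding a particular subject value.
conn.execute("CREATE INDEX idx_books_subject ON books (subject)")
conn.executemany(
    "INSERT INTO books (subject, price) VALUES (?, ?)",
    [("database", 450), ("networks", 300)],
)

query = "SELECT id, price FROM books WHERE subject = ?"

# EXPLAIN QUERY PLAN returns one row per plan step; the last column is
# a human-readable description of that step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("database",)).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)

# Executing the query hands the plan to the evaluation engine.
rows = conn.execute(query, ("database",)).fetchall()
```

Printing `plan_text` shows how the engine intends to reach the rows, for instance whether it scans the table or searches an index.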
Two-tier Architecture
• In two-tier architecture, the user interface and application
programs are on the client side, and the query and transaction
facilities are on the server side.
• When DBMS access is required, the application program
establishes a connection between client and server.
• The client sends queries and the server responds to the
authenticated client's requests.
• The client interacts directly with the server.
• The business logic is coupled with either the client side or the
server side.
Advantages:
• Suitable for environments where business rules do not change frequently
• Suitable when the number of users is limited
Disadvantages:
• Unsuitable for large-scale organizations
• Only a limited number of clients can access the server
Three-tier Architecture
• In three-tier architecture, the user interface and application
programs are on the client side, the query and transaction
facilities are on the server side, and in between there is an
intermediate layer that stores the business rules (procedures/
constraints) used to access data from the database.
• When DBMS access is required, the application program
establishes a connection between client and server, but before
the client's request is forwarded, the application rules check the
client's credentials.
• The client interacts with the server only indirectly: it connects
directly to the application rules layer, which in turn connects
directly to the server.
Advantages:
• Efficiency
• Security
• Scalability
• Flexible
Disadvantages:
• More complex structure
• More difficult to set up
• More difficult to maintain
• Expensive
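The three-tier flow described above (client, intermediate rules layer, database server) can be sketched in a few lines. This is a toy model under stated assumptions, not a real deployment: the class names `DatabaseServer` and `ApplicationServer`, the credential map, and the "own account only" rule are all hypothetical.

```python
import sqlite3


class DatabaseServer:
    """Data tier: owns the connection and runs the queries."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER)"
        )
        self._conn.execute("INSERT INTO account VALUES ('Ram', 100)")
        self._conn.commit()

    def balance_of(self, owner):
        row = self._conn.execute(
            "SELECT balance FROM account WHERE owner = ?", (owner,)
        ).fetchone()
        return None if row is None else row[0]


class ApplicationServer:
    """Middle tier: checks credentials and applies business rules."""

    def __init__(self, db, credentials):
        self._db = db
        self._credentials = credentials  # hypothetical user -> password map

    def get_balance(self, user, password, owner):
        if self._credentials.get(user) != password:
            raise PermissionError("invalid credentials")
        if user != owner:
            raise PermissionError("users may only read their own account")
        return self._db.balance_of(owner)


# The client only ever talks to the application server.
app = ApplicationServer(DatabaseServer(), {"Ram": "secret"})
own_balance = app.get_balance("Ram", "secret", "Ram")
```

Because the rules live in the middle tier, they can change without touching either the client code or the database server.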
Levels/Layers of DBMS Architecture
• External Level: The external level is
described by a schema, i.e. it consists of the
definitions of the logical records and relationships
in the external view.
• Conceptual Level: The conceptual level
represents the entire database. The conceptual
schema describes the records and
relationships included in the conceptual view.
• Internal Level: The internal level indicates
how the data will be stored and describes
the data structures and access methods to be
used by the database.
Components of DBMS
1. Hardware: Can range from a PC to a network of
computers.
2. Software: DBMS, operating system, network
software (if necessary) and also the application
programs.
3. Data: Used by the organization and a description of
this data called the schema.
4. People: Includes database designers, DBAs,
application programmers, and end-users.
5. Procedure: Instructions and rules that should be
applied to the design and use of the database and
DBMS.
Views of Data
Data Abstraction
• It is the process of giving users a simple interface by
hiding the underlying complexities of data management
from them.
• It defines views, i.e. which user can see which part of the database.
• Database system provides users with abstract view
of the data.
• It only shows a part of database that a user needs.
Physical Level
• It is the lowest level of abstractions and describes
how the data are stored on the storage disk and
access mechanisms to retrieve the data.
• The DBMS developer is concerned with this level.
• This level emphasizes minimizing the number of
disk accesses so that data can be retrieved quickly.
• It describes complex low-level data structures in
detail.
Logical Level
• This level describes what data are stored in the database
and the relationships between the data.
• It describes stored data in terms of the data model.
• It describes the database in terms of small number of
relatively simple structures.
View Level
• Not every user needs to access all the information stored in
the database.
• This level is basically concerned with dividing the
database according to the need of the database users.
• It simplifies the interaction of the users with the system.
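In SQL systems the view level is commonly realized with views. The sketch below, using Python's standard sqlite3 module, exposes a directory of employees while hiding the salary column; the `employee` table and the `employee_directory` view are assumed names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ram", "Sales", 50000), (2, "Sita", "Payroll", 60000)],
)

# The view shows only the part of the database this group of users needs.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT id, name, dept FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY id")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
```

Users who query `employee_directory` see names and departments but never the salary column stored at the logical level.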
Data Independence
• A database system normally contains a lot of metadata in addition to
users' data. It is rather difficult to modify or update a set of metadata
once it is stored in the database. But as a database grows, it needs to
change over time to satisfy the requirements of the users. If all the data
were interdependent, every change would become a tedious and highly
complex job.
• Metadata itself follows a layered architecture, so that when we
change data at one layer, it does not affect the data at another layer.
The layers are independent of, but mapped to, each other.
• Logical Data Independence: Logical data is data about the database,
i.e. it stores information about how data is managed inside. Logical data
independence is a mechanism that decouples this logical description from
the actual data stored on the disk. If we make some changes to the table
format, it should not change the data residing on the disk.
• Physical Data Independence: It is the ability to change the physical
data without impacting the schema or logical data. For example, if we
upgrade the storage system itself, it should not have any impact on
the logical data or schemas.
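Both kinds of independence can be observed from an application's point of view: a query that names only the columns it needs keeps returning the same answer after a logical change (a new column) and after a physical change (a new index). This is a small sketch with Python's standard sqlite3 module; the `student` table, its columns, and the index name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)", [(1, "Ram"), (2, "Sita")]
)


def student_names():
    """The 'application program': it depends only on roll_no and name."""
    cursor = conn.execute("SELECT name FROM student ORDER BY roll_no")
    return [name for (name,) in cursor]


before = student_names()

# Logical change: the table format gains a column.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical_change = student_names()

# Physical change: a new access structure is added on disk.
conn.execute("CREATE INDEX idx_student_name ON student (name)")
after_physical_change = student_names()
```

In all three cases `student_names()` is unchanged, which is the practical benefit of keeping the layers independent.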
Data Models
• The basic design or the structure of the database is
the data model.
• It is a collection of conceptual tools for describing
data, data relationships, data semantics, and
consistency constraints.
• The basic structure of the database that organizes
data, defines how the data are stored and accessed,
and the relationships between the data, is the data
model.
Types of Database Models
• Hierarchical Model
• Network Model
• Entity-Relationship(E-R) Model
• Relational Model
• Object-Oriented Model
• Object-Relational Model
Hierarchical Data Model
• Data is represented as a tree.
• A record type can belong to only one owner type, but an
owner type can own many record types.
• Advantages
– Conceptual simplicity
• groups of data could be related to each other
• related data could be viewed together
– Centralization of data
• reduced redundancy and promoted consistency
• Disadvantages
– Limited representation of data relationships
• did not allow Many-to-Many (M:N) relations
– Complex implementation
• required in-depth knowledge of physical data storage
– Structural Dependence
• data access requires physical storage path
– Lack of Standards
• limited portability
Network Data Model
• It is a modified version of the hierarchical data model that
allows more general connections among the nodes.
• They are difficult to use but are more flexible than
hierarchical databases.
• Advantages
– More data relationship types
– More efficient and flexible data access
• “network” vs. “tree” path traversal
– Conformance to standards
• enhanced database administration and portability
• Disadvantages
– System complexity
• require familiarity with the internal structure for data
access
– Lack of structural independence
• small structural changes require significant program
changes
Relational Data Model
• It is a lower level model that uses a collection of tables to
represent both data and relationships among those data.
• Each table has multiple columns, depending on the number
of attributes, and each column has a unique name.
• Advantages
– Structural independence
• Separation of database design and physical data
storage/access
• Easier database design, implementation, management,
and use
– Ad hoc query capability with Structured Query Language
(SQL)
• SQL translates user queries to codes
• Disadvantages
– Substantial hardware and system software overhead
• more complex system
– Poor design and implementation is made easy
• ease-of-use allows careless use of RDBMS
E-R Data Model
• Entity Relationship Model
• It is a high-level model based on the needs of the organization.
• An entity is an object that is distinguishable from other objects, and
a relationship is an association among several entities.
Notations of ER Model
Cardinality in ER Diagram: Crow’s Foot Model
• Advantages
– Exceptional conceptual simplicity
• easily viewed and understood representation of database
• facilitates database design and management
– Integration with the relational database model
• enables better database design via conceptual modeling
• Disadvantages
– Incomplete model on its own
• Limited representational power
– cannot model data constraints not tied to entity
relationships
» e.g. attribute constraints
– cannot represent relationships between attributes
within entities
• No data manipulation language (e.g. SQL)
– Loss of information content
• Hard to include attributes in ERD
Object Oriented Data Model
• It represents entity sets as classes, and a class represents both
the attributes and the behaviour of the entity.
• Instance of a class is an object.
• The internal part of the object is not externally visible.
• One object communicates with the other by sending
messages.
• Advantages
– Semantic representation of data
• fuller and more meaningful description of data via object
– Modularity, reusability, inheritance
– Ability to handle
• complex data
• sophisticated information requirements
• Disadvantages
– Lack of standards
• no standard data access method
– Complex navigational data access
• class hierarchy traversal
– Steep learning curve
• difficult to design and implement properly
– More system-oriented than user-centered
– High system overhead
• slow transactions
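The object-oriented ideas listed above (class, object, hidden internal state, message passing, inheritance) can be sketched in plain Python. This shows the concepts only and is not an object database; the `Account` and `SavingsAccount` classes are hypothetical.

```python
class Account:
    """A class describes both attributes and behaviour of an entity."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance  # internal state, private by convention

    def deposit(self, amount):
        """Other objects change the balance only by sending this message."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance


class SavingsAccount(Account):
    """Inheritance: reuses Account and adds interest behaviour."""

    def __init__(self, owner, balance=0, rate_percent=5):
        super().__init__(owner, balance)
        self.rate_percent = rate_percent

    def add_interest(self):
        # Integer arithmetic keeps the example free of rounding issues.
        self.deposit(self._balance * self.rate_percent // 100)


acct = SavingsAccount("Ram", 1000, rate_percent=5)  # an object (instance)
acct.add_interest()
final_balance = acct.balance()
```

Callers never touch `_balance` directly; they send `deposit`, `add_interest`, or `balance` messages to the object.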
Database Users
• Users are distinguished by the way they interact with the database
system.
1. Naïve users:
• They interact with the system by invoking one of the application
programs written previously. Naïve users include bank tellers, receptionists, etc.
• E.g., a bank teller needs to add Rs.90 to account Ram and deduct the
same amount from account Sita. The application program the bank
teller uses for this is transfer.
2. Application programmers:
• They are specialized computer professionals who write the application
programs for naïve users and can use tools to build user interfaces easily.
3. Sophisticated users:
• They make requests to the database without writing application programs,
typically by using a database query language.
4. Specialized users:
• They are experts who write specialized database applications, such as
systems storing complex data types and CAD systems, that do not fit into the
traditional data-processing framework.
Concept of Relational Data Model
• Relational data model is the primary data model, which is
used widely for data storage and processing. It is simple and
has all the properties and capabilities required to process
data with storage efficiency.
• Tables − In relational data model, relations are saved in the
format of Tables. This format stores the relation among
entities. A table has rows and columns, where rows
represent records and columns represent the attributes.
• Tuple − A single row of a table, which contains a single
record for that relation is called a tuple.
• Relation instance − A finite set of tuples in the relational
database system represents relation instance. Relation
instances do not have duplicate tuples.
• Relation schema − A relation schema describes the relation
name (table name), attributes, and their names.
• Relation key − Each row has one or more attributes, known
as relation key, which can identify the row in the relation
(table) uniquely.
• Attribute domain − Every attribute has some pre-defined
value scope, known as attribute domain.
• Relational database systems are expected to be equipped
with a query language that helps their users query the
database instances. There are two kinds of query languages:
– relational algebra
– relational calculus
Keys in DBMS
Keys play an important role in a relational database; they are used for
identifying unique rows in a table. They also establish relationships among
tables.
• Primary Key: A primary key is a column or set of columns in a table that
uniquely identifies tuples (rows) in that table.
• Super Key: A super key is a set of one or more columns (attributes) that
uniquely identifies rows in a table.
• Candidate Key: A super key with no redundant attribute is known as
candidate key.
• Alternate Key: Out of all candidate keys, only one gets selected as
primary key, remaining keys are known as alternate or secondary keys.
• Composite Key: A key that consists of more than one attribute to
uniquely identify rows (also known as records & tuples) in a table is
called composite key.
• Foreign Key: Foreign keys are the columns of a table that point to the
primary key of another table. They act as a cross-reference between
tables.
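The relationship between super keys and candidate keys can be checked mechanically on a sample of rows: a super key has no two rows agreeing on all its attributes, and a candidate key is a super key with no redundant attribute. The sketch below is illustrative only. Keys are properties of the schema, so a sample can disprove a key but never prove one, and the `students` data and function names are assumptions.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys of this particular relation instance."""
    keys = []
    # Try small attribute sets first so that minimal keys are found first.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(key) <= set(attrs) for key in keys):
                continue  # contains a smaller key, so it is redundant
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


students = [
    {"roll_no": 1, "email": "ram@example.com", "name": "Ram"},
    {"roll_no": 2, "email": "sita@example.com", "name": "Sita"},
    {"roll_no": 3, "email": "ram2@example.com", "name": "Ram"},
]
keys = candidate_keys(students, ["roll_no", "email", "name"])
```

Here `roll_no` and `email` each qualify on their own; choosing one as the primary key makes the other an alternate key, while `("roll_no", "name")` is a super key but not a candidate key.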
Constraints
Every relation has some conditions that must hold for it to be a
valid relation. These conditions are called Relational Integrity
Constraints. There are three main integrity constraints −
• Key constraints: enforce that in a relation with a key attribute,
no two tuples can have identical values for the key attributes, and
a key attribute cannot have NULL values.
• Domain constraints: enforce that every attribute is bound to
a specific range of values.
• Referential integrity constraints: state that if a relation
refers to a key attribute of a different relation (foreign key)
or of the same relation, then that key element must exist.
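The three kinds of constraint can be declared in SQL and are then enforced by the DBMS on every insert. The following sketch uses Python's standard sqlite3 module. The `department` and `employee` tables, the age range, and the `violates` helper are assumptions, and SQLite needs foreign key checking switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.execute("CREATE TABLE department (dept_id TEXT NOT NULL PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  TEXT NOT NULL PRIMARY KEY,              -- key constraint
        age     INTEGER CHECK (age BETWEEN 18 AND 65),  -- domain constraint
        dept_id TEXT REFERENCES department (dept_id)    -- referential integrity
    )
    """
)
conn.execute("INSERT INTO department VALUES ('D1')")
conn.execute("INSERT INTO employee VALUES ('E1', 30, 'D1')")
conn.commit()


def violates(sql, params):
    """Return True if the DBMS rejects the statement."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


insert = "INSERT INTO employee VALUES (?, ?, ?)"
results = {
    "duplicate key": violates(insert, ("E1", 40, "D1")),
    "null key": violates(insert, (None, 40, "D1")),
    "domain": violates(insert, ("E2", 200, "D1")),
    "referential": violates(insert, ("E3", 40, "D9")),
    "valid row": violates(insert, ("E4", 40, "D1")),
}
```

Only the last row satisfies all three constraints, so it is the only one stored.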
Relational Algebra
• Relational algebra is a procedural query language, which takes
instances of relations as input and yields instances of relations as
output.
• It uses operators to perform queries. An operator can be either unary
or binary.
• Relational algebra is performed recursively on a relation and
intermediate results are also considered relations.
• The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set difference
Cartesian product
Rename
• Additional operations include Joins, Set Intersection, Assignment.
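Because every operator takes relations and returns a relation, the operators compose freely. The sketch below models a relation as a Python set of tuples, each tuple being a frozenset of (attribute, value) pairs, so duplicates disappear automatically. This representation and the helper names `row`, `project`, `rename`, and `product` are assumptions chosen for brevity; union and set difference are simply the built-in `|` and `-` on sets.

```python
def row(**values):
    """Build one tuple of a relation from attribute=value pairs."""
    return frozenset(values.items())


def project(relation, attrs):
    """Keep only the named attributes; duplicate tuples collapse."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def rename(relation, mapping):
    """Rename attributes according to mapping (old name -> new name)."""
    return {frozenset((mapping.get(a, a), v) for a, v in t) for t in relation}


def product(r, s):
    """Cartesian product; assumes r and s share no attribute names."""
    return {t | u for t in r for u in s}


books = {
    row(title="DB", year=2011),
    row(title="OS", year=2009),
    row(title="AI", year=2011),
}
old_books = {row(title="OS", year=2009)}
shelves = {row(shelf=1), row(shelf=2)}

years = project(books, {"year"})               # two tuples, not three
recent = books - old_books                     # set difference
everything = books | old_books                 # union
names = rename(project(books, {"title"}), {"title": "name"})
placements = product(project(books, {"title"}), shelves)
```

The intermediate results (`project(books, {"title"})`, for instance) are themselves relations, which is why they can be fed straight into `rename` and `product`.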
Select Operation (σ)
• It selects tuples that satisfy the given predicate from a
relation.
• Notation − σ_p(r), where σ is the selection operator, p is the
selection predicate, and r is the relation. p is a propositional logic
formula which may use connectives like and, or, and not. Its terms
may use relational operators like =, ≠, ≥, <, >, ≤.
• For example −
σ_{subject = "database" and price = "450" or year > "2010"}(Books)
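The selection example above can be sketched as a filter over a list of tuples. The sample `books` data and the `select` helper are assumptions, and price and year are stored as numbers here rather than as the quoted strings of the notation. Note that and binds more tightly than or, so the predicate reads (subject = "database" and price = 450) or year > 2010.

```python
def select(relation, predicate):
    """Return the tuples of relation that satisfy predicate."""
    return [t for t in relation if predicate(t)]


books = [
    {"title": "A", "subject": "database", "price": 450, "year": 2005},
    {"title": "B", "subject": "database", "price": 300, "year": 2008},
    {"title": "C", "subject": "networks", "price": 500, "year": 2015},
]

result = select(
    books,
    lambda t: (t["subject"] == "database" and t["price"] == 450)
    or t["year"] > 2010,
)
titles = [t["title"] for t in result]
```

Book A qualifies through the first conjunct, book C through the year test, and book B satisfies neither.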
ER Model to Relational Model
• Mapping an Entity:
– Create a table for each entity.
– The entity's attributes should become fields of the table with their
respective data types.
– Declare the primary key.
• Mapping a Relationship:
– Create a table for the relationship.
– Add the primary keys of all participating entities as fields of the table
with their respective data types.
– If the relationship has any attributes, add each attribute as a field of
the table.
– Declare a primary key composed of all the primary keys of the
participating entities.
– Declare all foreign key constraints.
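The mapping steps above can be followed for a small, hypothetical ER diagram in which a Student entity and a Course entity are joined by an Enrolls relationship that carries a grade attribute. The sketch uses Python's standard sqlite3 module; all table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- One table per entity, attributes become fields, primary key declared.
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT NOT NULL
    );
    -- One table for the relationship: the participants' primary keys,
    -- the relationship's own attribute, a composite primary key, and
    -- the foreign key constraints.
    CREATE TABLE enrolls (
        roll_no   INTEGER NOT NULL,
        course_id TEXT NOT NULL,
        grade     TEXT,
        PRIMARY KEY (roll_no, course_id),
        FOREIGN KEY (roll_no) REFERENCES student (roll_no),
        FOREIGN KEY (course_id) REFERENCES course (course_id)
    );
    """
)

tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
# In PRAGMA table_info, column 1 is the field name and column 5 is its
# position within the primary key (0 if it is not part of the key).
enrolls_key = [
    info[1] for info in conn.execute("PRAGMA table_info(enrolls)") if info[5] > 0
]
```

The composite key on `enrolls` means a student can appear once per course, and the foreign keys prevent enrolments that refer to a missing student or course.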
Schedule 1
• Example schedule:
Let T1 and T2 be the transactions defined previously.
The following schedule is not a serial schedule,
but it is equivalent to Schedule 1.
Schedule 2
• Network Transparency
• Replication Transparency
• Fragmentation Transparency
• Scalability