
DATABASE MANAGEMENT

SYSTEM (DBMS)
• Database
– is a collection of related data and its metadata organized in a
structured format
– for optimized information management

• Database Management System (DBMS)


– is software that enables easy creation, access, and
modification of databases
– for efficient and effective database management

• Database System
– is an integrated system of hardware, software, people,
procedures, and data
– that define and regulate the collection, storage, management,
and use of data within a database environment
Introduction to DBMS
• A DBMS is a collection of software programs that allows a user to
define data types, structures, and constraints, store data permanently,
and modify and delete that data.
• DBMS is basically software used to add, modify, delete, and select data
from the database.
• In simpler words, DBMS is a collection of interrelated data and
software programs to access those data.

• Why use a DBMS:


– Data independence and efficient access.
– Reduced application development time.
– Data integrity and security.
– Uniform data administration.
– Concurrent access, recovery from crashes.
• Functions of DBMS:
– Stores data and related data entry forms, report definitions, etc.
– Hides the complexities of relational database model from the user
▪ facilitates the construction/definition of data elements and
their relationships
▪ enables data transformation and presentation
– Enforces data integrity
– Implements data security management
▪ access, privacy, backup & restoration

• Purpose of DBMS – problems of file-processing systems that a DBMS addresses:
1. Data redundancy and inconsistency
• Same information may be duplicated in several places.
• All copies may not be updated properly.
2. Difficulty in writing a new program to carry out each new task
3. Data isolation —
• Data in different formats.
• Difficult to write new application programs.
4. Security problems
• Every user of the system should be able to access only the data they are
permitted to see.
E.g. payroll people only handle employee records, and cannot see
customer accounts; tellers only access account data and cannot see payroll
data.
• Difficult to enforce this with application programs.
5. Integrity problems
• Data may be required to satisfy constraints.
E.g. no account balance below $25.00.
• Again, difficult to enforce or to change constraints with the file-
processing approach.
DBMS manages interaction between end users and database
DBMS vs Flat File System
Advantages of DBMS:
• Controlling Redundancy
• Sharing of Data
• Data Consistency
• Integration of Data
• Integrity Constraints
• Data Security
• Report Writers
• Control Over Concurrency
• Backup and Recovery Procedures
• Data Independence
Disadvantages of DBMS:
• Cost of Hardware and Software
• Cost of Data Conversion
• Cost of Staff Training
• Appointing Technical Staff
• Database Damage
Common terms used in DBMS
• Instance: The collection of information currently
stored in the database at a particular moment in time.
• Schema: The overall design of the database that is
not expected to change frequently.
• Mapping: The correspondence between schemas at adjacent
levels of the architecture (external-conceptual and conceptual-internal).
• Metadata: Data about data.
• Data dictionary: A special file that stores metadata.
• Methods: Values and bodies of codes that the
object contains.
Database System Structure
Components include:
1. Storage Manager
2. Disk Storage
3. Query Processor
1. Storage Manager
• Storage manager stores, retrieves, and updates data in the database.
It translates DML statements into low level statements that the
database understands
• Its components are:
• Authorization and integrity manager: It checks for the correctness of
the constraints specified and also checks the authenticity of the users
while entering the database.
• Transaction manager: It ensures that the database follows the ACID
property.
• File manager: It is responsible for managing the allocation of space on
disk storage.
• Buffer manager: It is responsible for fetching data from disk
storage and deciding what data to cache in main memory. It enables
the system to handle data sizes that are larger than main memory.
2. Disk Storage
• It is responsible for allocating the disk space and also for
maintaining the data in the database.
• Its components are:
• Data files: These are where the database itself is stored.
• Data dictionary: It stores metadata about the schema of the
database.
• Indices: It provides quick access to the data items that hold
particular values.
3. Query Processor
• It simplifies and facilitates easy access to the data.
• It translates queries written in a non-procedural language into
low-level instructions and executes them.
• Its components are:
• DDL interpreter: It interprets DDL statements into low level
language understood by the system.
• DML compiler: It translates DML statements into an
evaluation plan consisting low level instructions that the query
evaluation engine understands.
• Query evaluation engine: It executes low level statements
generated by the DML compiler.
Two-tier Architecture
• In two tier architecture, user interface and application
programs are on the client side and query and transaction
facility are on the server side
• When DBMS access is required, the application program
establishes a connection between the client and the server
• Client queries and server provides response to the
authenticated client request/queries.
• There is direct interaction of client with the server
• The business logic is coupled with either the client side or the
server side.
Advantages:
• Suitable for environments where business rules do not change frequently
• Suitable when the number of users is limited
Disadvantages:
• Useless for large scale organizations
• Only a limited number of clients can access
Three-tier Architecture
• In three tier architecture, user interface and application
programs are on the client side and query and transaction
facility are on the server side and in between there is an
intermediate layer which stores business rules (procedures/
constraints) used to access data from the database.
• When DBMS access is required, the application program
establishes a connection between the client and the server, but
before forwarding the client's request, the application-rules layer
checks the client's credentials.
• The client interacts with the server indirectly: the client connects
directly to the application-rules layer, and the application-rules
layer connects to the server.
Advantages:
• Efficiency
• Security
• Scalability
• Flexibility
Disadvantages:
• More complex structure
• More difficult to set up
• More difficult to maintain
• Expensive
Levels/Layers of DBMS Architecture
• External Level: - The external level is
described by a schema, i.e. it consists of the
definitions of the logical records and relationships
in the external view.
• Conceptual Level: - Conceptual Level
represents the entire database. Conceptual
schema describes the records and
relationship included in the Conceptual view.
• Internal Level: - The internal level indicates
how the data will be stored and describes
the data structures and access methods to be
used by the database.
Components of DBMS
1. Hardware: Can range from a PC to a network of
computers.
2. Software: DBMS, operating system, network
software (if necessary) and also the application
programs.
3. Data: Used by the organization and a description of
this data called the schema.
4. People: Includes database designers, DBAs,
application programmers, and end-users.
5. Procedure: Instructions and rules that should be
applied to the design and use of the database and
DBMS.
Views of Data
Data Abstraction
• It is the process of providing users with an easy interface by
hiding the underlying complexities of data management.
• It defines views, i.e. which user can view which part of the data.
• Database system provides users with abstract view
of the data.
• It only shows a part of database that a user needs.
Physical Level
• It is the lowest level of abstractions and describes
how the data are stored on the storage disk and
access mechanisms to retrieve the data.
• DBMS developer is concerned with this level.
• This level emphasizes minimizing the number of
disk accesses so that data can be retrieved very fast.
• It describes complex low-level data structures in
detail.
Logical Level
• This level describes what data are stored in the database
and the relationships between the data.
• It describes stored data in terms of the data model.
• It describes the database in terms of small number of
relatively simple structures.

View Level
• Not every user needs to access all the information stored in
the database.
• This level is basically concerned with dividing the
database according to the need of the database users.
• It simplifies the interaction of the users with the system.
Data Independence
• A database system normally contains a lot of data in addition to users’
data. It is rather difficult to modify or update a set of metadata once it
is stored in the database. But as a DBMS expands, it needs to change
over time to satisfy the requirements of the users. If all of the data
were dependent on a single representation, changing it would become a
tedious and highly complex job.
• Metadata itself follows a layered architecture, so that when we
change data at one layer, it does not affect the data at another level.
This data is independent but mapped to each other.
• Logical Data Independence- The ability to change the logical schema
(how data is organized into tables and relationships) without affecting
the lower level. If we make some changes to a table's format, it should
not change the data residing on the disk.
• Physical Data Independence- The ability to change the physical
storage of data without impacting the schema or logical data. For example, if we
upgrade the storage system itself, it should not have any impact on
the logical data or schemas.
Data Models
• The basic design or the structure of the database is
the data model.
• It is a collection of conceptual tools for describing
data, data relationships, data semantics, and
consistency constraints.
• The basic structure of the database that organizes
data, defines how the data are stored and accessed,
and the relationships between the data, is the data
model.
Types of Database Models
• Hierarchical Model
• Network Model
• Entity-Relationship(E-R) Model
• Relational Model
• Object-Oriented Model
• Object-Relational Model
Hierarchical Data Model
• Data is represented as a tree.
• A record type can belong to only one owner type, but an
owner type can own many record types.
• Advantages
– Conceptual simplicity
• groups of data could be related to each other
• related data could be viewed together
– Centralization of data
• reduced redundancy and promoted consistency

• Disadvantages
– Limited representation of data relationships
• did not allow Many-to-Many (M:N) relations
– Complex implementation
• required in-depth knowledge of physical data storage
– Structural Dependence
• data access requires physical storage path
– Lack of Standards
• limited portability
Network Data Model
• It is a modified version of the Hierarchical Data Model that
allows more general connections among the nodes.
• They are difficult to use but are more flexible than
hierarchical databases.
• Advantages
– More data relationship types
– More efficient and flexible data access
• “network” vs. “tree” path traversal
– Conformance to standards
• enhanced database administration and portability

• Disadvantages
– System complexity
• require familiarity with the internal structure for data
access
– Lack of structural independence
• small structural changes require significant program
changes
Relational Data Model
• It is a lower level model that uses a collection of tables to
represent both data and relationships among those data.
• Each table has multiple columns, depending on the number
of attributes, and each column has a unique name.
• Advantages
– Structural independence
• Separation of database design and physical data
storage/access
• Easier database design, implementation, management,
and use
– Ad hoc query capability with Structured Query Language
(SQL)
• SQL queries are translated into low-level code by the RDBMS

• Disadvantages
– Substantial hardware and system software overhead
• more complex system
– Poor design and implementation is made easy
• ease-of-use allows careless use of RDBMS
E-R Data Model
• Entity Relationship Model
• It is a high level model based on the need of the organization.
• Its entities are distinguishable from other objects and
relationship is an association among several entities.
Notations of ER Model
Cardinality in ER Diagram: Crow’s Foot Model
• Advantages
– Exceptional conceptual simplicity
• easily viewed and understood representation of database
• facilitates database design and management
– Integration with the relational database model
• enables better database design via conceptual modeling
• Disadvantages
– Incomplete model on its own
• Limited representational power
– cannot model data constraints not tied to entity
relationships
» e.g. attribute constraints
– cannot represent relationships between attributes
within entities
• No data manipulation language (e.g. SQL)
– Loss of information content
• Hard to include attributes in ERD
Object Oriented Data Model
• It represents entity sets as class and a class represents both
attributes and the behaviour of the entity.
• Instance of a class is an object.
• The internal part of the object is not externally visible.
• One object communicates with the other by sending
messages.
• Advantages
– Semantic representation of data
• fuller and more meaningful description of data via object
– Modularity, reusability, inheritance
– Ability to handle
• complex data
• sophisticated information requirements

• Disadvantages
– Lack of standards
• no standard data access method
– Complex navigational data access
• class hierarchy traversal
– Steep learning curve
• difficult to design and implement properly
– More system-oriented than user-centered
– High system overhead
• slow transactions
Database Users
• Users are distinguished by the way they interact with the database
system.
1. Naïve users:
• They interact with the system by invoking one of the application
programs written previously. Naïve users include bank tellers, receptionists, etc.
• For example, a bank teller needs to add Rs. 90 to account Ram and deduct the
same amount from account Sita; the application program the teller
invokes is 'transfer'.
2. Application programmers:
• They are specialized computer professionals who write the application
programs for naïve users and may use tools to build easy user interfaces.
3. Sophisticated users:
• They make requests to the database without writing application programs;
instead they form their requests in a database query language.
4. Specialized users:
• They are experts who write specialized database applications, such as
systems storing complex data types or CAD systems, that do not fit into
the traditional data-processing framework.
Concept of Relational Data Model
• Relational data model is the primary data model, which is
used widely for data storage and processing. It is simple and
has all the properties and capabilities required to process
data with storage efficiency.
• Tables − In relational data model, relations are saved in the
format of Tables. This format stores the relation among
entities. A table has rows and columns, where rows
represent records and columns represent the attributes.
• Tuple − A single row of a table, which contains a single
record for that relation is called a tuple.
• Relation instance − A finite set of tuples in the relational
database system represents relation instance. Relation
instances do not have duplicate tuples.
• Relation schema − A relation schema describes the relation
name (table name), attributes, and their names.
• Relation key − Each row has one or more attributes, known
as relation key, which can identify the row in the relation
(table) uniquely.
• Attribute domain − Every attribute has some pre-defined
value scope, known as attribute domain.
• Relational database systems are expected to be equipped
with a query language that can assist its users to query the
database instances. There are two kinds of query languages
– relational algebra
– relational calculus
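• For example (a hypothetical relation): the schema STUDENT(RollNo, Name, Dept)
might have the instance {(1, Ram, CS), (2, Sita, EE)}; each row such as
(1, Ram, CS) is a tuple, RollNo can serve as the relation key, and the
domain of RollNo is the set of valid roll numbers.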
Keys in DBMS
Key plays an important role in relational database; it is used for
identifying unique rows from table. It also establishes relationship among
tables
• Primary Key: A primary key is a column or set of columns in a table that
uniquely identifies tuples (rows) in that table.
• Super Key: A super key is a set of one or more columns (attributes) that
uniquely identify rows in a table.
• Candidate Key: A super key with no redundant attribute is known as
candidate key.
• Alternate Key: Out of all candidate keys, only one gets selected as
primary key, remaining keys are known as alternate or secondary keys.
• Composite Key: A key that consists of more than one attribute to
uniquely identify rows (also known as records & tuples) in a table is
called composite key.
• Foreign Key: Foreign keys are the columns of a table that point to the
primary key of another table. They act as a cross-reference between
tables.
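As a rough SQL illustration of these keys (the STUDENT and ENROLLMENT tables below are hypothetical, not taken from any example above):
CREATE TABLE STUDENT (
    student_id INT PRIMARY KEY,           -- primary key: uniquely identifies each row
    email VARCHAR(100) UNIQUE             -- another candidate key; here an alternate key
);
CREATE TABLE ENROLLMENT (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id),  -- composite key made of two attributes
    FOREIGN KEY (student_id) REFERENCES STUDENT(student_id)  -- foreign key cross-reference
);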
Constraints
Every relation has some conditions that must hold for it to be a
valid relation. These conditions are called Relational Integrity
Constraints. There are three main integrity constraints −
• Key constraints: enforce that in a relation with a key attribute,
no two tuples can have identical values for the key attributes, and
a key attribute cannot have NULL values.
• Domain constraints: enforce that every attribute takes values only from
its permissible range of values (its domain).
• Referential integrity constraints: state that if a relation
refers to a key attribute of a different (or the same) relation
through a foreign key, then the referenced key value must exist.
Relational Algebra
• Relational algebra is a procedural query language, which takes
instances of relations as input and yields instances of relations as
output.
• It uses operators to perform queries. An operator can be either unary
or binary.
• Relational algebra is performed recursively on a relation and
intermediate results are also considered relations.
• The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set difference
Cartesian product
Rename
• Additional operations include Joins, Set Intersection, Assignment.
Select Operation (σ)
• It selects tuples that satisfy the given predicate from a
relation.
• Notation − σp(r), where σ stands for the selection operation,
p is the selection predicate, and r stands for the relation. p is a
propositional logic formula which may use connectors like and, or, and not.
These terms may use relational operators like − =, ≠, ≥, < , >, ≤.
• For example −
σsubject = "database" and price = "450" or year > "2010"(Books)

Output − Selects tuples from Books where subject is 'database' and price is 450,
or those books published after 2010.
Project Operation (∏)
• It projects column(s) that satisfy a given predicate.
• Notation − ∏A1, A2, ..., An (r), where A1, A2, ..., An are attribute names
of relation r.
• Duplicate rows are automatically eliminated as relation is a
set.
• For example −
∏subject, author (Books)

Output − Selects and projects the columns named subject and author
from the relation Books.
Union Operation (∪)
• It performs binary union between two given relations and is
defined as −
r ∪ s = { t | t ∈ r or t ∈ s}
• Notation − r U s where r and s are either database relations
or relation result set (temporary relation).
• For a union operation to be valid, the following conditions
must hold −
r and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
• For Example:
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have
either written a book or an article or both.
Set Difference (−)

• The result of the set difference operation is tuples which are
present in one relation but are not in the second relation.
• Notation: r − s which finds all the tuples that are present
in r but not in s.
• For Example:
∏ author (Books) − ∏ author (Articles)

Output − Provides the names of authors who have written
books but not articles.
Cartesian Product (Χ)
• Combines information of two different relations into one.
• Notation: r Χ s where r and s are relations and their output
will be defined as: r Χ s = { q t | q ∈ r and t ∈ s}
• For Example:
σauthor = ‘abcd' (Books Χ Articles)

Output − Yields a relation which shows all the books and
articles written by abcd.

Rename Operation (ρ)
• The results of relational algebra expressions are also relations, but
without any name. The rename operation allows us to
rename the output relation.
• Notation: ρ x (E), where the result of expression E is saved
with the name x.
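For reference, the selection and projection examples above could be written approximately in SQL as follows (assuming a Books table with subject, price, year and author columns, and treating price and year as numbers):
SELECT *                                  -- selection (σ)
FROM Books
WHERE (subject = 'database' AND price = 450) OR year > 2010;

SELECT DISTINCT subject, author           -- projection (∏); DISTINCT removes duplicates as in the algebra
FROM Books;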
ER Model to Relational Model
• ER diagrams can be mapped to a relational schema, i.e. it is possible to
create a relational schema using an ER diagram. Although not all ER
constraints can be imported into the relational model, an
approximate schema can be generated.
• ER diagrams mainly comprise −
Entity and its Attributes, i.e. a real-world object with characteristic
values which describe the instances in the rows of the database.
Relationship, which is an association among entities.

• Mapping an Entity:
– Create table for each entity.
– Entity’s attributes should become fields of tables with their
respective data types.
– Declare primary key.
ER Model to Relational Model
• Mapping a Relationship:
– Create table for a relationship.
– Add the primary keys of all participating Entities as fields of table
with their respective data types.
– If relationship has any attribute, add each attribute as field of
table.
– Declare a primary key composing all the primary keys of
participating entities.
– Declare all foreign key constraints.

• Mapping a Weak Entity Set:


– Create table for weak entity set.
– Add all its attributes to table as field.
– Add the primary key of identifying entity set.
– Declare all foreign key constraints.
ER Model to Relational Model
• Mapping a Hierarchical Entity:
ER specialization or generalization comes in the form of hierarchical
entity sets.
– Create tables for all higher-level entities.
– Create tables for lower-level entities.
– Add primary keys of higher-level entities in the table of lower-level
entities.
– In lower-level tables, add all other attributes of lower-level
entities.
– Declare primary key of higher-level table and the primary key for
lower-level table.
– Declare foreign key constraints.
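A minimal SQL sketch of the entity and relationship mapping rules above, using hypothetical STUDENT and COURSE entities connected by an ENROLLS relationship with a grade attribute:
CREATE TABLE STUDENT (
    student_id INT PRIMARY KEY,
    name VARCHAR(50)
);
CREATE TABLE COURSE (
    course_id INT PRIMARY KEY,
    title VARCHAR(50)
);
CREATE TABLE ENROLLS (                     -- table for the relationship
    student_id INT,                        -- primary key of one participating entity
    course_id INT,                         -- primary key of the other participating entity
    grade CHAR(2),                         -- attribute of the relationship itself
    PRIMARY KEY (student_id, course_id),   -- composite primary key
    FOREIGN KEY (student_id) REFERENCES STUDENT(student_id),
    FOREIGN KEY (course_id) REFERENCES COURSE(course_id)
);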
Structured Query Language (SQL)
• SQL is a non-procedural programming language for
Relational Databases.
• It is designed over relational algebra and tuple relational
calculus.
• SQL comes as a package with all major distributions of
RDBMS.
• SQL comprises both data definition and data manipulation
languages. Using the data definition properties of SQL, one
can design and modify database schema, whereas data
manipulation properties allows SQL to store and retrieve
data from database.
Database Languages
1. Data Definition Language(DDL):
• Database language that is used to create, delete or modify
database schema is called DDL.
• It is used by Database Administrators(DBA) to specify the
conceptual schema.
• DDL interpreter converts DDL statements into equivalent low
level statements understood by the DBMS.
• Normally, create, alter, and drop statements are DDL
statements.
• DDL statements make changes in the schema
Some examples:
• CREATE - to create objects in the database
• ALTER - alters the structure of the database
• DROP - delete objects from the database
• TRUNCATE - removes all records from a table, including all space
allocated for the records
• COMMENT - add comments to the data dictionary
• RENAME - rename an object
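A short, hedged illustration of these DDL statements on a hypothetical EMPLOYEE table (Oracle-style syntax assumed):
CREATE TABLE EMPLOYEE (emp_id INT, name VARCHAR(50));   -- create an object
ALTER TABLE EMPLOYEE ADD (salary NUMBER(10,2));         -- alter its structure
COMMENT ON TABLE EMPLOYEE IS 'Staff master data';       -- add a comment to the data dictionary
TRUNCATE TABLE EMPLOYEE;                                -- remove all rows and their space
RENAME EMPLOYEE TO STAFF;                               -- rename the object
DROP TABLE STAFF;                                       -- delete the object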
2. Data Manipulation Language(DML):
• Database language that enables insert, update,
delete, and retrieval of data from the database is
called Data Manipulation Language.
• DML compiler converts DML statements into
equivalent low level statements that the database
understands.
• Normally, insert, update, delete, select are DML
commands.
• DML reflects change in the instance, not the schema
Some examples:
• SELECT - Retrieve data from a database
• INSERT - Insert data into a table
• UPDATE - Updates existing data within a table
• DELETE - deletes records from a table; the space for the records
remains
• MERGE - UPSERT operation (insert or update)
• CALL - Call a PL/SQL or Java subprogram
• EXPLAIN PLAN - explain access path to data
• LOCK TABLE - control concurrency
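A short illustration of these DML statements, again on a hypothetical EMPLOYEE(emp_id, name, salary) table:
INSERT INTO EMPLOYEE (emp_id, name, salary) VALUES (1, 'Ram', 45000);   -- insert a row
UPDATE EMPLOYEE SET salary = 50000 WHERE emp_id = 1;                    -- update existing data
SELECT emp_id, name FROM EMPLOYEE WHERE salary > 40000;                 -- retrieve data
DELETE FROM EMPLOYEE WHERE emp_id = 1;                                  -- delete rows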
SQL Views
• A VIEW is a virtual table, through which a selective portion of the data
from one or more tables can be seen.
• Views do not contain data of their own.
• They are used to restrict access to the database or to hide data
complexity.
• A view is stored as a SELECT statement in the database. DML
operations on a view like INSERT, UPDATE, DELETE affect the data in
the original table upon which the view is based.
• The Syntax to create a SQL view is:
CREATE VIEW view_name
AS
SELECT column_list_to_display
FROM table_name [WHERE condition];
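For instance, following the syntax above, a view that exposes only non-sensitive columns of a hypothetical EMPLOYEE(emp_id, name, dept, salary) table might be created as:
CREATE VIEW emp_public
AS
SELECT emp_id, name
FROM EMPLOYEE
WHERE dept = 'SALES';
Selecting from emp_public then shows only the id and name of sales employees, while hiding salary.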
SQL Indexes
• An index is a schema object. It is used by the server to speed up the
retrieval of rows by using a pointer.
• It can reduce disk I/O by using a rapid path access method to locate
data quickly. An index helps to speed up select queries and where
clauses, but it slows down data input, with the update and the insert
statements.
• Indexes can be created or dropped with no effect on the data.
• When should indexes be created:
A column contains a wide range of values
A column does not contain a large number of null values
One or more columns are frequently used together in a where
clause or a join condition
• When should indexes be avoided:
The table is small
The columns are not often used as a condition in the query
The column is updated frequently
SQL Indexes contd.
• Syntax for Creating an Index:
CREATE INDEX index_name
ON table_name (column);
where index_name is the name given to the index, table_name is the name of the
table on which the index is created, and column is the name of the
column to which it applies.
• For multiple columns:
CREATE INDEX index_name
ON table_name (column1, column2, .....);
• Unique Indexes:
CREATE UNIQUE INDEX index_name
ON table_name (column);
Unique indexes are used to maintain the integrity of the data
present in the table as well as for fast performance; they do not allow
duplicate values to be entered into the indexed column.
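Following that syntax, indexes on a hypothetical EMPLOYEE table might look like (names and columns are illustrative only):
CREATE INDEX emp_name_idx
ON EMPLOYEE (last_name);                 -- speeds up queries filtering on last_name

CREATE UNIQUE INDEX emp_email_idx
ON EMPLOYEE (email);                     -- also prevents duplicate email values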
SQL Constraints
Constraints are the rules that we can apply on the type of data in a table. That is,
we can specify the limit on the type of data that can be stored in a particular
column in a table using constraints. The available constraints in SQL are:
• NOT NULL: This constraint tells that we cannot store a null value in a column.
That is, if a column is specified as NOT NULL then we will not be able to store
null in this particular column any more.
• UNIQUE: This constraint when specified with a column, tells that all the
values in the column must be unique. That is, the values in any row of a
column must not be repeated.
• PRIMARY KEY: A primary key is a field which can uniquely identify each row in
a table. And this constraint is used to specify a field in a table as primary key.
• FOREIGN KEY: A Foreign key is a field which can uniquely identify each row in
another table. And this constraint is used to specify a field as a Foreign key.
• CHECK: This constraint helps to validate the values of a column to meet a
particular condition. That is, it helps to ensure that the value stored in a
column meets a specific condition.
• DEFAULT: This constraint specifies a default value for the column when no
value is specified by the user.
SQL Constraints contd.
• We can specify constraints at the time of creating the table using
CREATE TABLE statement. We can also specify the constraints after
creating a table using ALTER TABLE statement.
• Syntax using CREATE TABLE statement:
CREATE TABLE sample_table
(
column1 data_type(size) constraint_name,
column2 data_type(size) constraint_name,
column3 data_type(size) constraint_name,
....
);
where sample_table is the name of the table to be created, data_type is
the type of data that can be stored in the field, constraint_name is the
name of the constraint, for example- NOT NULL, UNIQUE, PRIMARY KEY
etc.
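A hedged example combining several of these constraints in one hypothetical table definition (the BRANCH table referenced by the foreign key is assumed to exist):
CREATE TABLE ACCOUNT (
    acc_no INT PRIMARY KEY,                                        -- PRIMARY KEY
    owner VARCHAR(50) NOT NULL,                                    -- NOT NULL
    email VARCHAR(100) UNIQUE,                                     -- UNIQUE
    balance DECIMAL(10,2) DEFAULT 25.00 CHECK (balance >= 25.00),  -- DEFAULT and CHECK (e.g. no balance below $25)
    branch_id INT REFERENCES BRANCH(branch_id)                     -- FOREIGN KEY
);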
Oracle Database
• An Oracle database is a collection of data treated as a unit. The
purpose of a database is to store and retrieve related information. A
database server is the key to solving the problems of information
management. In general, a server reliably manages a large amount of
data in a multiuser environment so that many users can concurrently
access the same data. All this is accomplished while delivering high
performance.
• Oracle Database is the first database designed for enterprise grid
computing, the most flexible and cost effective way to manage
information and applications. The database has logical
structures and physical structures. Because the physical and logical
structures are separate, the physical storage of data can be managed
without affecting the access to logical storage structures.
Primary Architecture Components
1. Oracle server: An Oracle server includes an Oracle Instance and
an Oracle database.
• An Oracle database includes several different types of files: datafiles,
control files, redo log files and archive redo log files. The Oracle
server also accesses parameter files and password files.
• This set of files has several purposes.
o One is to enable system users to process SQL statements.
o Another is to improve system performance.
o Still another is to ensure the database can be recovered if there is
a software/hardware failure.
• The database server must manage large amounts of data in a multi-
user environment.
• The server must manage concurrent access to the same data.
• The server must deliver high performance. This generally means fast
response times.
Primary Architecture Components
Oracle instance: An Oracle Instance consists of two different sets of
components:
• The first component set is the set of background processes (PMON,
SMON, RECO, DBW0, LGWR, CKPT, D000 and others).
o Each background process is a computer program.
o These processes perform input/output and monitor other Oracle
processes to provide good performance and database reliability.
• The second component set includes the memory structures that
comprise the Oracle instance.
• When an instance starts up, a memory structure called the System
Global Area (SGA) is allocated.
o At this point the background processes also start.
• An Oracle Instance provides access to one and only one Oracle
database.
Primary Architecture Components
Oracle database: An Oracle database consists of files.
• Sometimes these are referred to as operating system files, but they
are actually database files that store the database information that a
firm or organization needs in order to operate.
• The redo log files are used to recover the database in the event of
application program failures, instance failures and other minor
failures.
• The archived redo log files are used to recover the database if a disk
fails.
• Other files not shown in the figure include:
o The required parameter file that is used to specify parameters for
configuring an Oracle instance when it starts up.
o The optional password file authenticates special users of the
database – these are termed privileged users and include database
administrators.
o Alert and Trace Log Files – these files store information about
errors and actions taken that affect the configuration of the database.
Primary Architecture Components
2. User and server processes: The processes shown in the figure are
called user and server processes. These processes are used to manage
the execution of SQL statements.

• A Shared Server Process can share memory and variable processing
for multiple user processes.
• A Dedicated Server Process manages memory and variables for a
single user process.
Oracle Internal Memory Structure
The basic memory structures associated with Oracle Database include:
• System global area (SGA): The SGA is a group of shared memory
structures, known as SGA components, that contain data and control
information for one Oracle Database instance. All server and background
processes share the SGA. Examples of data stored in the SGA include
cached data blocks and shared SQL areas.
• Program global area (PGA): A PGA is a non-shared memory region that
contains data and control information exclusively for use by an Oracle
process. Oracle Database creates the PGA when an Oracle process starts.
One PGA exists for each server process and background process. The
collection of individual PGAs is the total instance PGA, or instance PGA.
Database initialization parameters set the size of the instance PGA, not
individual PGAs.
• User global area (UGA): The UGA is memory associated with a user
session.
• Software code areas: These are portions of memory used to store code
that is being run or can be run. Oracle Database code is stored in a
software area that is typically at a different location from user
programs—a more exclusive or protected location.
Physical & Logical Data Structures
• The physical database structure comprises datafiles, temp
files, redo log files and control files:
Datafiles: Datafiles contain the database's data. The data of
logical data structures such as tables and indexes is stored in
the datafiles of the database. One or more datafiles form a
logical unit of database storage called a tablespace.
Redo log files: The purpose of these files is to record all
changes made to data. These files protect database against
failures.
Control files: Control files contain entries such as
database name, name and location of datafiles and redo log
files and time stamp of database creation.
• Logical structures include tablespaces, schema objects, data
blocks, extents and segments:
Tablespaces: The database is logically divided into one or
more tablespaces. Each tablespace uses one or more
datafiles to physically store its data.
Schema objects: Schema objects are the structures that
represent the database's data. Schema objects include
structures such as tables, views, sequences, stored procedures,
indexes, synonyms, clusters and database links.
Data Blocks: A data block represents a specific number of
bytes of physical database space on disk.
Extents: An extent represents contiguous data blocks that
are used to store a specific type of information.
Segments: A segment is a set of extents allocated for a
certain logical structure.
Tablespaces
• An Oracle database is comprised of one or more logical
storage units called tablespaces. The database's data is
collectively stored in the database's tablespaces. Each
tablespace in an Oracle database is comprised of one or
more operating system files called datafiles.
• Tablespaces are the bridge between certain physical and
logical components of the Oracle database. Tablespaces are
where you store Oracle database objects such as tables,
indexes and rollback segments. You can think of a tablespace
like a shared disk drive in Windows.
Types of Tablespaces
1. Permanent: Used to store your user and application data. Oracle Database
uses permanent tablespaces to store permanent data, such as system data.
Each user is assigned a default permanent tablespace.
2. Undo: A database running in automatic undo management mode
transparently creates and manages undo data in the undo tablespace. Oracle
Database uses undo data to roll back transactions, to provide read
consistency, to help with database recovery, and to enable features such as
Oracle Flashback Query. A database instance can have only one active undo
tablespace.
3. Temporary: Used for storing temporary data, as would be created when
SQL statements perform sort operations. An Oracle database gets a
temporary tablespace when the database is created. You would create
another temporary tablespace if you were creating a temporary tablespace
group. Under typical circumstances, you do not have to create additional
temporary tablespaces. If you have an extremely large database, then you
might configure additional temporary tablespaces. The TEMP tablespace is
typically used as the default temporary tablespace for users who are not
explicitly assigned a temporary tablespace.
Background Processes
• To maximize performance and accommodate many users, a
multiprocess Oracle Database system uses background
processes.
• Background processes consolidate functions that would
otherwise be handled by multiple database programs
running for each user process.
• Background processes asynchronously perform I/O and
monitor other Oracle Database processes to provide
increased parallelism for better performance and reliability.
• The use of additional database server features or options can
cause more background processes to be present. For
example, when you use Advanced Queuing, the queue
monitor (QMNn) background process is present.
Triggers & Stored Procedures
• A stored procedure is a set of Structured Query Language (SQL)
statements with an assigned name, which are stored in a relational
database management system as a group, so it can be reused and
shared by multiple programs.
• Triggers are similar to stored procedures. A trigger stored in the
database can include SQL and PL/SQL or Java statements to run as a
unit and can invoke stored procedures.
• However, procedures and triggers differ in the way that they are
invoked. A procedure is explicitly run by a user, application, or trigger.
Triggers are implicitly fired by the Database when a triggering event
occurs, no matter which user is connected or which application is
being used.
Triggers
You can write triggers that fire whenever one of the following operations occurs:
DML statements (INSERT, UPDATE, DELETE) on a particular table or view, issued
by any user
DDL statements (CREATE or ALTER primarily) issued either by a particular
schema/user or by any schema/user in the database.
Database events, such as logon/logoff, errors, or startup/shutdown, also
issued either by a particular schema/user or by any schema/user in the
database.
Triggers can be used to:
• Automatically generate derived column values
• Prevent invalid transactions
• Enforce complex security authorizations
• Enforce referential integrity across nodes in a distributed database
• Provide transparent event logging
• Maintain synchronous table replicates
• Gather statistics on table access
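As a sketch (not Oracle's own example), a row-level trigger that logs salary changes on a hypothetical employees table into a hypothetical salary_audit table could be written in PL/SQL as:
CREATE OR REPLACE TRIGGER salary_audit_trg
AFTER UPDATE OF salary ON employees
FOR EACH ROW
BEGIN
    -- fires implicitly on every UPDATE of salary, whichever user or application issues it
    INSERT INTO salary_audit (emp_id, old_salary, new_salary, changed_on)
    VALUES (:OLD.emp_id, :OLD.salary, :NEW.salary, SYSDATE);
END;
/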
Cursors
• A cursor is a database object which is used to retrieve data
from a result set one row at a time. The cursor can be used
when the data needs to be updated row by row.
• A cursor is a temporary work area created in the system
memory when a SQL statement is executed. This temporary
work area is used to store the data retrieved from the
database, and manipulate this data.
• Cursors can be faster than a while loop but they do have
more overhead and occupy memory of the system.
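A minimal PL/SQL sketch of an explicit cursor over a hypothetical employees table, fetching one row at a time:
DECLARE
    CURSOR emp_cur IS
        SELECT emp_id, salary FROM employees;
BEGIN
    FOR emp_rec IN emp_cur LOOP
        -- each iteration processes exactly one row of the result set
        DBMS_OUTPUT.PUT_LINE(emp_rec.emp_id || ': ' || emp_rec.salary);
    END LOOP;
END;
/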
Privileges
• Privileges define the access rights provided to a user on a
database object. There are two types of privileges:
1) System privileges - This allows the user to CREATE, ALTER, or
DROP database objects.
2) Object privileges - This allows the user to EXECUTE, SELECT,
INSERT, UPDATE, or DELETE data from database objects to which
the privileges apply.
• Only Database Administrators or owners of the database object
can provide/remove privileges on a database object.
• Roles, on the other hand, are created by users (usually
administrators) and are used to group together privileges or
other roles.
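For example (hypothetical users and table), privileges might be granted and revoked as:
GRANT CREATE TABLE TO alice;                  -- system privilege
GRANT SELECT, INSERT ON employees TO bob;     -- object privileges on a specific object
REVOKE INSERT ON employees FROM bob;          -- remove one object privilege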
Roles
• Roles are a collection of privileges or access rights.
• When there are many users in a database it becomes difficult to grant
or revoke privileges user by user. Therefore, if you define roles, you can
grant or revoke a role to or from users, thereby automatically granting or
revoking the privileges the role contains.
• You can either create Roles or use the system roles pre-defined by
oracle.
• Some pre-defined system roles, such as CONNECT, RESOURCE, and DBA, are each granted a specific set of privileges.
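A minimal sketch of a user-defined role (hypothetical names):
CREATE ROLE clerk;
GRANT SELECT, INSERT ON employees TO clerk;   -- collect privileges in the role
GRANT clerk TO alice;                         -- one grant gives alice all of the role's privileges
REVOKE clerk FROM alice;                      -- one revoke takes them all away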
User-Defined Functions
• User-defined functions combine the advantages of stored
procedures with the capabilities of SQL predefined functions.
• They can accept parameters, perform specific calculations based
on data retrieved by one or more SELECT statement, and return
results directly to the calling SQL statement.
• For example, user-defined functions can be used in the following
to provide functionality that is not available in SQL or SQL built-in
functions:
The select list of a SELECT statement
The condition of a WHERE clause
CONNECT BY, START WITH, ORDER BY, and GROUP BY clauses
The VALUES clause of an INSERT statement
The SET clause of an UPDATE statement
Functional Dependency
• Functional dependency (FD) is a set of constraints between
two attributes in a relation.
• Functional dependency says that if two tuples have the same
values for attributes A1, A2, ..., An, then those two tuples
must have the same values for attributes B1, B2, ..., Bn.
• Functional dependency is represented by an arrow sign (→)
that is, X→Y, where X functionally determines Y. The left-
hand side attributes determine the values of attributes on
the right-hand side
• If a functional dependency (FD) X → Y holds, where Y is a
subset of X, then it is called a trivial FD. Trivial FDs always
hold.
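• For example, in a hypothetical relation EMPLOYEE(emp_id, name, dept), the
FD emp_id → {name, dept} holds if any two tuples with the same emp_id also
agree on name and dept; emp_id → emp_id is a trivial FD.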
Armstrong’s Axioms
• If F is a set of functional dependencies then the closure of F,
denoted as F+, is the set of all functional dependencies
logically implied by F. Armstrong's Axioms are a set of rules,
that when applied repeatedly, generates a closure of
functional dependencies.
• Reflexive rule: If alpha is a set of attributes and beta
is a subset of alpha, then alpha → beta holds.
• Augmentation rule: If a → b holds and y is an attribute set, then
ay → by also holds. That is, adding attributes to a dependency
does not change the basic dependency.
• Transitivity rule: Same as the transitive rule in algebra: if a → b
holds and b → c holds, then a → c also holds. Here a → b means
that a functionally determines b.
Normalization
• Normalization is a method to remove the following
anomalies and bring the database to a consistent state:
• Update anomalies − If data items are scattered and are not
linked to each other properly, then it could lead to strange
situations. For example, when we try to update one data
item having its copies scattered over several places, a few
instances get updated properly while a few others are left
with old values. Such instances leave the database in an
inconsistent state.
• Deletion anomalies − We try to delete a record, but parts
of it are left undeleted because, unknowingly, the data is
also saved somewhere else.
• Insert anomalies − We try to insert data into a record that
does not exist at all.
Normalization
• Developed by E.F. Codd, normalization is a process of
organizing data into tables/relations in such a way that the
results of using the database are always unambiguous and as
intended. Such normalization is intrinsic to relational
database theory. It may have the effect of duplicating data
within the database and often results in the creation of
additional tables.
• Types of Normalization/ Normal Forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
First Normal Form
• First normal form enforces these criteria:
Eliminate repeating groups in individual relations.
Create a separate relation for each set of related data.
Identify each set of related data with a primary key.
• Example: Colour attribute of Table_Product contains multiple values.
The relation can be decomposed as follows to be in 1NF:
Second Normal Form
• A table is said to be in 2NF if both the following conditions hold:
Relation is in 1NF, i.e. atomic values for all attributes.
No non-prime attribute is dependent on the proper subset of any
candidate key of table.
• An attribute that is not part of any candidate key is known as a non-
prime attribute.
• Example: Table purchase detail has a composite primary key-
customer id, store id. The non key attribute is location and location
depends on store id, which is part of the primary key. Decomposed
into 2NF as:
Third Normal Form
• A table is said to be in 3NF if both the following conditions hold:
Relation must be in 2NF.
Transitive functional dependency of non-prime attribute on any
super key should be removed.
• A is a transitive dependency of C (A->C) if A is functionally dependent
on B (A->B), and B is functionally dependent on C (B->C) but not on A
(B not->A).
• Example: In Table Book Details, book_id determines genre_id and
genre_id determines genre type. Thus book_id determines genre
type via genre_id. Decomposing into 3NF as follows:
Boyce-Codd Normal Form
• It is an advanced version of 3NF; that is why it is also referred to as 3.5NF.
• BCNF is stricter than 3NF.
• A table complies with BCNF if it is in 3NF and, for every functional
dependency X->Y, X is a super key of the table.
• Example: Consider a relation with KEY: {Student, Course} and FDs:
{student, course} -> Teacher, Teacher-> Course. Teacher is not a
superkey but it determines course, so decomposing into BCNF:
Fourth Normal Form
• 4NF is a level of database normalization where there are no non-
trivial multivalued dependencies (MVD) other than a candidate key.
• It builds on the first three normal forms (1NF, 2NF, 3NF) and the BCNF.
• A MVD on R, X ->-> Y , says that if two tuples of R agree on all the
attributes of X, then their components in Y may be swapped, and the
result will be two tuples that are also in the relation, i.e., for each
value of X, the values of Y are independent of the values of R-X-Y.
• Example: Consider a relation with Key: {student, major, hobby} and
MVDs: student ->-> major and student ->-> hobby. Decomposing into 4NF:
Fifth Normal Form
• A database is said to be in 5NF, if and only if:
It is in 4NF.
If we can decompose table further to eliminate redundancy and
anomaly, and when we re-join the decomposed tables by means of
candidate keys, we should not be losing the original data or any new
record set should not arise.
• Example: Consider a relation with Key: {seller, company, product} and
MVD: seller ->-> company, product. Product is related to
company, so, decomposing into 5NF:
Domain Key Normal Form (DKNF)
• DKNF is a normal form used in database normalization which requires
that the database contains no constraints other than domain
constraints and key constraints.
• A domain constraint specifies the permissible values for a given
attribute, while a key constraint specifies the attributes that uniquely
identify a row in a given table.
• The domain/key normal form is achieved when every constraint on
the relation is a logical consequence of the definition of keys and
domains, and enforcing key and domain restraints and conditions
causes all constraints to be met. Thus, it avoids all non-temporal
anomalies.
• The third normal form, Boyce–Codd normal form, fourth normal
form and fifth normal form are special cases of the domain/key
normal form. All have either functional, multi-valued or join
dependencies that can be converted into (super)keys. The domains on
those normal forms were unconstrained, so all domain constraints are satisfied.
Lossless Join Decomposition
• If we decompose a relation R into relations R1 and R2,
Decomposition is lossy if R1 ⋈ R2 ⊃ R
Decomposition is lossless if R1 ⋈ R2 = R
• To check for lossless join decomposition using FD set, following
conditions must hold:
Union of Attributes of R1 and R2 must be equal to attribute of R.
Each attribute of R must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
Intersection of Attributes of R1 and R2 must not be NULL.
Att(R1) ∩ Att(R2) ≠ Φ
Common attribute must be a key for at least one relation (R1 or
R2)
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
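• For example, decomposing R(A, B, C) with FD A -> B into R1(A, B) and R2(A, C)
is lossless: Att(R1) U Att(R2) = {A, B, C} = Att(R), Att(R1) ∩ Att(R2) = {A} ≠ Φ,
and the common attribute A is a key of R1 (since A -> B).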
Dependency Preserving
Decomposition
• If we decompose a relation R into relations R1 and R2, All
dependencies of R either must be a part of R1 or R2 or must
be derivable from combination of FD’s of R1 and R2.
• For Example, A relation R (A, B, C, D) with FD set{A->BC} is
decomposed into R1(ABC) and R2(AD) which is dependency
preserving because FD A->BC is a part of R1(ABC).
Transaction Management
• Collection of operations that form a single logical unit of
work is called transaction.
• Transaction management ensures that the database system
has ACID properties.
ACID Properties
1. Atomicity:
Atomic means whole. This property ensures that either all the changes made
by the transaction are reflected in the database or none are; partial
changes or partial execution never occur.
2. Consistency:
The database must move from one consistent state to another after the
execution of the transaction.
3. Isolation:
Even though many transactions may run at the same time, isolation
ensures that if transactions A and B are executing concurrently, the
result is as if either A executed first and then B, or B executed first
and then A.
4. Durability:
Changes made to the database are permanent; even if a system
failure occurs, it does not erase the effects of previously executed
transactions.
Example of Fund Transfer
• Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Consistency requirement – the sum of A and B is unchanged by the execution of
the transaction.
• Atomicity requirement— if the transaction fails after step 3 and before step 6,
the system should ensure that its updates are not reflected in the database, else
an inconsistency will result.
• Durability requirement— once the user has been notified that the transaction
has completed (i.e., the transfer of the $50 has taken place), the updates to the
database by the transaction must persist despite failures.
• Isolation requirement— if between steps 3 and 6, another transaction is allowed
to access the partially updated database, it will see an inconsistent database (the
sum A + B will be less than it should be). Can be ensured trivially by running
transactions serially, that is one after the other. However, executing multiple
transactions concurrently has significant benefits.
Transaction States
Implementation of Atomicity and Durability

• The recovery-management component of a database system
implements the support for atomicity and durability.
• The shadow-database scheme:
assume that only one transaction is active at a time.
a pointer called db_pointer always points to the current
consistent copy of the database.
all updates are made on a shadow copy of the database, and
db_pointer is made to point to the updated shadow copy only after
the transaction reaches partial commit and all updated pages
have been flushed to disk.
in case transaction fails, old consistent copy pointed to by
db_pointer can be used, and the shadow copy can be deleted.
The shadow-database scheme:
• Assumes disks to not fail
• Useful for text editors, but extremely inefficient for large
databases: executing a single transaction requires copying
the entire database.
Concurrency Execution
• Multiple transactions are allowed to run concurrently in the
system. Advantages are:
increased processor and disk utilization, leading to
better transaction throughput: one transaction can be using
the CPU while another is reading from or writing to the disk
reduced average response time for transactions: short
transactions need not wait behind long ones.
• Concurrency control schemes – mechanisms to achieve
isolation, i.e., to control the interaction among the
concurrent transactions in order to prevent them from
destroying the consistency of the database.
Schedules
• Schedules are sequences that indicate the chronological order in
which instructions of concurrent transactions are executed
a schedule for a set of transactions must consist of all
instructions of those transactions
must preserve the order in which the instructions appear in
each individual transaction.
• Example schedule:
Let T1 transfer $50 from A to B, and
T2 transfer 10% of the balance from A to B.
The following is a serial schedule,
in which T1 is followed by T2.

Schedule 1
• Example schedule:
Let T1 and T2 be the transactions defined previously.
The following schedule is not a serial schedule,
but it is equivalent to Schedule 1.

Schedule 2

• In both Schedule 1 and Schedule 2, the sum A + B is preserved.


Serializability
• Basic Assumption – Each transaction preserves database
consistency.
• Thus serial execution of a set of transactions preserves
database consistency.
• A (possibly concurrent) schedule is serializable if it is
equivalent to a serial schedule. Different forms of schedule
equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
• We ignore operations other than read and write instructions.
Testing for Serializability
• Consider some schedule of a set of transactions T1, T2, ..., Tn
• Precedence graph: a directed graph where the vertices are
the transactions (names).
• We draw an arc from Ti to Tj if the two transactions conflict
and Ti accessed the data item on which the conflict arose
earlier.
• We may label the arc by the item that was accessed.
Conflict Serializability
• Instructions li and lj of transactions Ti and Tj respectively,
conflict if and only if there exists some item Q accessed by
both li and lj, and at least one of these instructions wrote Q.

1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
Conflict Serializability contd.
• If a schedule S can be transformed into a schedule S´ by a
series of swaps of non-conflicting instructions, we say that S
and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict
equivalent to a serial schedule
• Example of a schedule that is not conflict serializable:

• We are unable to swap instructions in the above schedule to
obtain either the serial schedule < T3, T4 >, or the serial
schedule < T4, T3 >.
View Serializability
• Let S and S´ be two schedules with the same set of transactions.
S and S´ are view equivalent if the following three conditions
are met:
1. For each data item Q, if transaction Ti reads the initial value
of Q in schedule S, then transaction Ti must, in schedule S´, also
read the initial value of Q.
2. For each data item Q if transaction Ti executes read(Q) in
schedule S, and that value was produced by transaction Tj (if any),
then transaction Ti must in schedule S´ also read the value of Q
that was produced by transaction Tj .
3. For each data item Q, the transaction (if any) that performs
the final write(Q) operation in schedule S must perform the final
write(Q) operation in schedule S´.
• As can be seen, view equivalence is also based purely on reads
and writes alone.
View Serializability contd.
• A schedule S is view serializable if it is view equivalent to a
serial schedule.
• Every conflict serializable schedule is also view serializable.
• Following is a schedule which is view-serializable but not
conflict serializable.

• Every view serializable schedule that is not conflict
serializable has blind writes.
Implementation of Isolation
• Schedules must be conflict or view serializable, and
recoverable, for the sake of database consistency, and
preferably cascadeless.
• A policy in which only one transaction can execute at a time
generates serial schedules, but provides a poor degree of
concurrency.
• Concurrency-control schemes trade-off between the amount
of concurrency they allow and the amount of overhead that
they incur.
• Some schemes allow only conflict-serializable schedules to
be generated, while others allow view-serializable schedules
that are not conflict-serializable.
Failure Classification
• To see where the problem has occurred, we generalize a failure into
various categories:
– Transaction Failure: A transaction has to abort when it fails to
execute or when it reaches a point from where it can’t go any
further. This is called transaction failure where only a few
transactions or processes are hurt. Reasons may be-
• Logical Errors
• System Errors
– System Crash: There are problems external to the system that may
cause the system to stop abruptly and cause the system to crash.
For example, interruptions in power supply may cause the failure
of underlying hardware or software failure.
– Disk Failure: Disk failures include formation of bad sectors,
unreachability to the disk, disk head crash or any other failure,
which destroys all or a part of disk storage.
Loss of Volatile Storage
• A volatile storage like RAM stores all the active logs, disk buffers, and
related data. In addition, it stores all the transactions that are being
currently executed.
• What happens if such a volatile storage crashes abruptly? It would
obviously take away all the logs and active copies of the database. It
makes recovery almost impossible, as everything that is required to
recover the data is lost.
• Following techniques may be adopted in case of loss of volatile
storage:
– We can have checkpoints at multiple stages so as to save the
contents of the database periodically.
– The state of the active database in volatile memory, which may also
contain logs, active transactions, and buffer blocks, can be
periodically dumped onto stable storage.
– A <dump> marker can be written to the log file whenever the database
contents are dumped from volatile memory to stable storage (a small
sketch follows below).
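A minimal sketch of the dump idea in the last bullet, assuming a plain file stands in for stable storage and a text file stands in for the log; the file names and record layout are made up for illustration.

```python
import json
import time

def dump_to_stable_storage(db_state, dump_path='dump.json', log_path='log.txt'):
    """Copy the in-memory database state to 'stable storage' and record a
    <dump> marker in the log so recovery knows where a full copy exists."""
    with open(dump_path, 'w') as f:
        json.dump(db_state, f)                    # full copy on stable storage
    with open(log_path, 'a') as log:
        log.write(f'<dump, {time.time()}>\n')     # marker for the recovery manager

db = {'A': 100, 'B': 250}     # state held in volatile memory
dump_to_stable_storage(db)
```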
Recovery from Catastrophic Failure
• A catastrophic failure is one where a stable, secondary storage device
gets corrupted. Along with the storage device, all the valuable data stored
on it is lost.
• We have two different strategies to recover data from such a catastrophic
failure −
Remote backup: Here a backup copy of the database is stored at a
remote location from where it can be restored in case of a catastrophe.
Alternatively, database backups can be taken on magnetic tapes and
stored at a safer place. This backup can later be transferred onto a freshly
installed database to bring it to the point of backup.
• Databases that have grown large are too bulky to be backed up frequently.
In such cases, we have techniques to restore a database just by looking at
its logs. So, all that we need to do here is take a backup of all the logs
at frequent intervals of time. The database can be backed up once a week,
and the logs, being very small, can be backed up every day or as frequently
as possible.
Recovery and Atomicity
• When a DBMS recovers from a crash, it should maintain the following:
It should check the states of all the transactions, which were being
executed.
A transaction may be in the middle of some operation; the DBMS
must ensure the atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it
needs to be rolled back.
No transactions would be allowed to leave the DBMS in an
inconsistent state.
• There are two types of techniques, which can help a DBMS in recovering
as well as maintaining the atomicity of a transaction:
Maintaining the logs of each transaction, and writing them onto some
stable storage before actually modifying the database.
Maintaining shadow paging, where the changes are made on volatile
memory and the actual database is updated later.
Log based Recovery
• A log is a sequence of records that keeps track of the actions performed
by a transaction. It is important that the logs are written prior to the
actual modification and stored on a stable storage medium, which is
failsafe.
• Log-based recovery works as follows:
The log file is kept on a stable storage media.
When a transaction enters the system and starts execution, it writes
a log about it: <Tn, Start>
When the transaction modifies an item X, it writes a log record <Tn, X,
V1, V2>, which records that Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs: <Tn, commit>
• The database can be modified using two approaches:
1. Deferred database modification − All logs are written on to the stable
storage and the database is updated when a transaction commits.
2. Immediate database modification − Each log follows an actual
database modification. That is, the database is modified immediately after
every operation.
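A minimal sketch of the deferred-modification approach, using the log record formats given above (<Tn, start>, <Tn, X, V1, V2>, <Tn, commit>); the in-memory structures and the redo routine are illustrative, not a real recovery manager.

```python
# Deferred database modification with a write-ahead log (illustrative sketch).
log = []                           # stands in for a log file on stable storage
database = {'X': 10, 'Y': 20}

def start(tn):
    log.append((tn, 'start'))

def write(tn, item, new_value):
    # Deferred modification: the change is only recorded in the log;
    # the database itself is untouched until the transaction commits.
    log.append((tn, item, database[item], new_value))   # <Tn, X, V1, V2>

def commit(tn):
    log.append((tn, 'commit'))

def redo_committed():
    """Replay the log, applying updates only for transactions that committed."""
    committed = {rec[0] for rec in log if len(rec) == 2 and rec[1] == 'commit'}
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            tn, item, old, new = rec
            database[item] = new

start('T1'); write('T1', 'X', 15); commit('T1')   # committed: redone on recovery
start('T2'); write('T2', 'Y', 99)                 # no commit: ignored on recovery
redo_committed()
print(database)    # {'X': 15, 'Y': 20}
```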
Recovery with Concurrent Transactions
• When more than one transaction is being executed in parallel, the
logs are interleaved. The concept of 'checkpoints' is used during
recovery to limit how far back the logs must be scanned.
• A checkpoint is a mechanism where all the previous logs are removed
from the system and stored permanently on a storage disk. A checkpoint
declares a point before which the DBMS was in a consistent state and
all the transactions were committed.
Recovery with Concurrent Transactions contd.
• The recovery system reads the logs backwards from the end
to the last checkpoint.
• It maintains two lists, an undo-list and a redo-list.
• If the recovery system sees a log with <Tn, Start> and <Tn,
Commit> or just <Tn, Commit>, it puts the transaction in the
redo-list.
• If the recovery system sees a log with <Tn, Start> but finds no
commit or abort log for Tn, it puts the transaction in the undo-list.
• All the transactions in the undo-list are then undone and their logs
are removed. The transactions in the redo-list are redone from their
log records, and their logs are then saved again (a small sketch
follows below).
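A small sketch of the redo-/undo-list construction described above, scanning the log backwards to the most recent checkpoint; the record layout is made up for the example.

```python
# Classify transactions into redo/undo lists (illustrative sketch).
log = [
    ('T1', 'start'), ('T1', 'commit'),
    ('checkpoint',),
    ('T2', 'start'), ('T2', 'commit'),
    ('T3', 'start'),                     # no commit/abort: must be undone
]

def build_lists(log):
    # Scan backwards until the most recent checkpoint record.
    tail = []
    for rec in reversed(log):
        if rec == ('checkpoint',):
            break
        tail.append(rec)
    started   = {rec[0] for rec in tail if rec[1] == 'start'}
    committed = {rec[0] for rec in tail if rec[1] == 'commit'}
    ended     = {rec[0] for rec in tail if rec[1] in ('commit', 'abort')}
    redo_list = committed           # saw <Tn, Commit> after the checkpoint
    undo_list = started - ended     # saw <Tn, Start> but no commit/abort
    return redo_list, undo_list

redo_list, undo_list = build_lists(log)
print(redo_list)   # {'T2'}
print(undo_list)   # {'T3'}
```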
Recoverability
• Need to address the effect of transaction failures on concurrently
running transactions.
Recoverable schedule: if a transaction Tj reads a data item
previously written by a transaction Ti, the commit operation of Ti
appears before the commit operation of Tj.
Cascading rollback: a single transaction failure leads to a series
of transaction rollbacks. This can lead to the undoing of a
significant amount of work.
Cascadeless schedule: cascading rollbacks cannot occur; for
each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears
before the read operation of Tj. Every cascadeless schedule is also
recoverable. It is desirable to restrict schedules to those that are
cascadeless (a small sketch checking both conditions follows below).
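A small sketch (assuming schedules are lists of (transaction, action, item) steps and that commit steps carry the item None) that checks the recoverability and cascadeless conditions stated above.

```python
def read_from_pairs(schedule):
    """Yield (writer, reader, write_pos, read_pos) whenever a transaction
    reads an item last written by a different transaction."""
    last_write, pairs = {}, []
    for pos, (t, action, q) in enumerate(schedule):
        if action == 'write':
            last_write[q] = (t, pos)
        elif action == 'read' and q in last_write and last_write[q][0] != t:
            pairs.append((last_write[q][0], t, last_write[q][1], pos))
    return pairs

def commit_pos(schedule, t):
    return next((i for i, step in enumerate(schedule)
                 if step[0] == t and step[1] == 'commit'), None)

def is_recoverable(schedule):
    # Whenever Tj reads from Ti and Tj commits, Ti must commit first.
    for ti, tj, _, _ in read_from_pairs(schedule):
        ci, cj = commit_pos(schedule, ti), commit_pos(schedule, tj)
        if cj is not None and (ci is None or ci > cj):
            return False
    return True

def is_cascadeless(schedule):
    # Whenever Tj reads from Ti, Ti must have committed before that read.
    for ti, _, _, read_pos in read_from_pairs(schedule):
        ci = commit_pos(schedule, ti)
        if ci is None or ci > read_pos:
            return False
    return True

s = [('T1', 'write', 'A'), ('T2', 'read', 'A'),    # T2 reads A before T1 commits
     ('T1', 'commit', None), ('T2', 'commit', None)]
print(is_recoverable(s))    # True:  T1 commits before T2 commits
print(is_cascadeless(s))    # False: T2 read A before T1 committed
```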
Object Oriented DBMS (OODBMS)
• An object-oriented database management system is a database
management system that supports the creation and modelling of data as
objects.
• OODBMS also includes support for classes of objects and the inheritance
of class properties, and incorporates methods, subclasses and their
objects.
• Most of the object databases also offer some kind of query language,
permitting objects to be found through a declarative programming
approach.
• Also called an object database management system (ODMS).
• OODBMS allows programmers to enjoy the consistency that comes with
one programming environment because the database is integrated with
the programming language and uses the same representation model.
• Certain object-oriented databases are designed to work with object-
oriented programming languages such as Delphi, Python, Java, Perl,
Objective C and Visual Basic .NET.
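Purely as a conceptual illustration, and not tied to any particular ODMS product, the snippet below shows the kind of class hierarchy, with inheritance and methods, that an OODBMS would persist and query directly as objects rather than flattening it into rows.

```python
# Conceptual sketch only: the object model an OODBMS stores directly.
class Person:                          # base class: its properties are inherited
    def __init__(self, name):
        self.name = name

class Student(Person):                 # subclass inherits Person's properties
    def __init__(self, name, courses):
        super().__init__(name)
        self.courses = courses

    def is_enrolled_in(self, course):  # a method kept together with the object's state
        return course in self.courses

s = Student('Asha', ['DBMS', 'Networks'])
print(s.is_enrolled_in('DBMS'))        # True; an object query language could express the same filter declaratively
```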
Distributed DBMS (DDBMS)
• A transaction can be executed by multiple networked
computers in a unified manner.
• A distributed DBMS processes a transaction that accesses a
distributed data base.
• A distributed database (DDB) is a collection of multiple
logically related databases distributed over a computer
network.
• A distributed database management system manages a
distributed database while making the distribution
transparent to the user.
True Distributed DBMS Architecture (architecture diagram omitted in this
text version)
Distributed Database: Properties
• Levels of transparency
• Network Transparency
– Location Transparency: the command used is independent of both the
location of the data and the site from which the command was issued.
– Naming Transparency: a specified name is unambiguous.
• Replication Transparency
– The user is unaware of multiple copies, if any.
• Fragmentation Transparency
– The user is unaware if relations are partitioned across sites.
Distributed Database: Properties
• Increased reliability and availability
– Reliability refers to the probability that the system is up and
running at a given point in time.
– Availability is the probability that the system is continuously
available (usable or accessible) during a time interval.
– Both are facilitated by the multiplicity of resources: if one node
fails, others remain available.
• Improved performance: keep data closer to where it is needed most.
• Scalability
– Allows new nodes to be added at any time without changing the
entire configuration.
Distributed Database: Properties
• Data Fragmentation
– Split a relation into logically related and correct parts.
– A relation can be fragmented in two ways, as sketched below:
• Horizontal Fragmentation: a horizontal subset of a relation,
containing those tuples that satisfy the selection conditions.
• Vertical Fragmentation: a subset of a relation created from a
subset of its columns.
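A minimal sketch of both fragmentation styles, representing a relation as a list of row dictionaries; the table, columns, and selection condition are made up for the example.

```python
# Horizontal and vertical fragmentation of a relation (illustrative sketch).
employee = [
    {'emp_id': 1, 'name': 'Asha',  'dept': 'Sales', 'salary': 52000},
    {'emp_id': 2, 'name': 'Ravi',  'dept': 'HR',    'salary': 48000},
    {'emp_id': 3, 'name': 'Meena', 'dept': 'Sales', 'salary': 61000},
]

# Horizontal fragment: the tuples satisfying a selection condition (dept = 'Sales').
sales_fragment = [row for row in employee if row['dept'] == 'Sales']

# Vertical fragment: a subset of columns; the key (emp_id) is kept in every
# vertical fragment so the original relation can be reconstructed by a join.
payroll_fragment = [{'emp_id': row['emp_id'], 'salary': row['salary']}
                    for row in employee]

print(sales_fragment)
print(payroll_fragment)
```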
Distributed Database: Properties
• Data Replication
– The database, or selected parts of it, is replicated across sites.
– In full replication the entire database is replicated at every site; in
partial replication some selected part is replicated to some of the sites.
– Data replication is achieved through a replication schema (a small
sketch follows below).
• Data Distribution (Data Allocation)
– This is relevant only in the case of partial replication or
partitioning.
– The selected portion of the database is distributed to the database
sites.
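A minimal sketch of a replication/allocation schema as a mapping from fragments to the sites that hold a copy; the site and fragment names are made up, and a real DDBMS would manage this catalog itself.

```python
# Replication / allocation schema as a fragment -> sites mapping (illustrative sketch).
sites = ['site_A', 'site_B', 'site_C']

# Full replication: every fragment is stored at every site.
full_replication = {'sales_fragment': sites[:], 'payroll_fragment': sites[:]}

# Partial replication / allocation: each fragment is placed only at the sites
# where it is needed most (keeps data close to its users).
partial_replication = {
    'sales_fragment':   ['site_A', 'site_B'],
    'payroll_fragment': ['site_C'],
}

def sites_holding(schema, fragment):
    """Which sites can answer a query over the given fragment."""
    return schema.get(fragment, [])

print(sites_holding(partial_replication, 'sales_fragment'))   # ['site_A', 'site_B']
```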
