Database Management: Department of Computer Science, School of Computing Sciences

Database Management
II B. Sc. B
Department of Computer Science,
School of Computing Sciences,
St. Joseph’s College (Autonomous),
Trichy – 2.
Unit II
• Storage Structure
• Introduction
• File Organization and Addressing Schemes
• Relational Data Structure
• Introduction
• Relations
• Domains
Introduction
• A database system presents users with a high-level view of the stored data.
• Physically, the data are stored as bits and bytes on different storage devices.
• Types of Data Storage
• Different types of storage are available for storing data. They differ from one
another in speed and accessibility. The following types of storage are used for
storing data:
• Primary Storage
• Secondary Storage
• Tertiary Storage
Types of Data Storage
• Primary Storage
• It is the primary area that offers quick access to the stored data.
• It is volatile storage: the data are lost on a power cut or a system crash.
• Main memory and cache are the types of primary storage.
• Secondary Storage
• Secondary storage is also called Online storage.
• It is the storage area that allows the user to save and store data
permanently.
• This type of memory does not lose the data due to any power failure or
system crash.
• That's why we also call it non-volatile storage.
• Tertiary Storage
• It is the storage type that is external to the computer system.
• It has the slowest speed.
• But it is capable of storing a large amount of data.
• It is also known as Offline storage.
• Tertiary storage is generally used for data backup.
File Organization and Addressing Schemes
• A file is a collection of records.
• Records are accessed using a key (for example, the primary key).
• The type and frequency of access can be determined by the type of file organization.
• File organization is a logical relationship among various records. It defines how file
records are mapped onto disk blocks
• File organization is used to describe the way in which the records are stored in terms
of blocks, and the blocks are placed on the storage medium.
• Approaches to mapping the database to files:
1) Use several files and store records of only one fixed length in any given file.
2) Structure the files so that they can contain records of multiple lengths.
• Files of fixed-length records are easier to implement than files of variable-length
records.
• There are a number of methods to organize files.
• These methods can be categorized based on access or selection.
• The programmer can decide the best-suited method based on the requirement.
a) Sequential Organization
b) Indexed Sequential Organization
c) Direct Organization of File
d) Interface Indexing
e) Hashing Scheme of File Organization
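One reason fixed-length records are easier to implement is that the position of record n can be computed directly as n × record size. The following is a minimal Python sketch of that idea. The record layout (a 4-byte id and a 20-byte name) and the sample names are assumptions made for illustration only.

```python
import struct

# Hypothetical fixed-length layout: 4-byte integer id + 20-byte name field.
RECORD = struct.Struct("<i20s")

def pack_record(rec_id, name):
    """Encode one record into exactly RECORD.size bytes (name is null-padded)."""
    return RECORD.pack(rec_id, name.encode("utf-8"))

def read_record(data, n):
    """Fetch record number n (0-based) by computing its byte offset."""
    offset = n * RECORD.size
    rec_id, raw_name = RECORD.unpack_from(data, offset)
    return rec_id, raw_name.rstrip(b"\x00").decode("utf-8")

# Build an in-memory "file" of three records (made-up sample data).
names = ["Asha", "Bala", "Chitra"]
file_bytes = b"".join(pack_record(i, name) for i, name in enumerate(names))

# Jump straight to the third record without reading the first two.
third = read_record(file_bytes, 2)
print(RECORD.size, third)
```

With variable-length records the offset of record n cannot be computed this way, so the file needs extra bookkeeping such as length prefixes or an offset table.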
Sequential Organization
• Easiest way of file organization
• Files are stored sequentially
• Implemented in two ways:
• Pile File Method
• Records are stored in sequence
• Records are placed in the file in the order in which they are inserted into the tables.
• Sorted File Method
• A new record is first inserted at the file's end, and the file is then
sorted in ascending or descending order.
• Sorting of records is based on the primary key or any other key.
• On modification of any record, the record is updated and
the file is then sorted again
Sequential Organization
• Disadvantages
• It wastes time because it doesn't allow jumping directly to a particular record.
• The sorted file method takes more time and space for sorting the records.
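The two sequential methods and the cost of searching them can be sketched in Python as follows. The in-memory list standing in for a file, the function names, and the sample records are all assumptions for illustration.

```python
def pile_insert(file, record):
    """Pile file: append the record in arrival order."""
    file.append(record)

def sorted_insert(file, record):
    """Sorted file: append at the end, then re-sort the file by key."""
    file.append(record)
    file.sort(key=lambda r: r[0])

def sequential_search(file, key):
    """Scan from the start; return (record, number_of_records_examined)."""
    for examined, record in enumerate(file, start=1):
        if record[0] == key:
            return record, examined
    return None, len(file)

# Made-up (key, value) records arriving out of key order.
arrivals = [(30, "C"), (10, "A"), (20, "B")]

pile, ordered = [], []
for rec in arrivals:
    pile_insert(pile, rec)
    sorted_insert(ordered, rec)

print(pile)
print(ordered)
print(sequential_search(pile, 20))
```

In both methods a search has to walk the records one after another, which is the time cost noted in the disadvantages. The sorted method additionally pays for a sort on every insert.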
Indexed Sequential Organization
• ‘Indexes created from a sequential set of primary keys are referred to
as index sequential’.
• A sequential file that is indexed is called an index sequential file.
• The index provides for random access to records, while the sequential
nature of the file provides easy access to the subsequent records as
well as sequential processing.
• In index sequential organization, a hierarchy of indexes is followed
• Updates to an index sequential file may entail modifications to the
index in addition to the file
• The addresses of nonexistent records can be set to an impossibly high
or low value to indicate their absence from the file.
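A minimal Python sketch of the idea: a small index gives random access to the right block, and the sorted blocks still support sequential processing. The single-level index, the block size, and the sample records are assumptions for illustration. A real index sequential file would typically use a hierarchy of indexes.

```python
import bisect

# Sorted data file split into blocks of (key, value) records (made-up data).
blocks = [
    [(10, "A"), (20, "B"), (30, "C")],
    [(40, "D"), (50, "E"), (60, "F")],
    [(70, "G"), (80, "H"), (90, "I")],
]

# Index: the lowest key stored in each block.
index = [block[0][0] for block in blocks]

def lookup(key):
    """Random access: use the index to pick one block, then scan only it."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None

def scan_from(key):
    """Sequential access: yield records with keys >= key, in key order."""
    pos = max(bisect.bisect_right(index, key) - 1, 0)
    for block in blocks[pos:]:
        for k, v in block:
            if k >= key:
                yield k, v

print(lookup(50))
print(list(scan_from(60)))
```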
Direct Organization of File
• A hash function is used to calculate the address of the block in which a
record is stored.
• The hash function can be a simple or a complex mathematical function.
• The function is applied to some columns/attributes, either key or
non-key columns, to get the block address.
• Hence each record is stored at an effectively random location, irrespective
of the order in which records arrive.
• This method is known as Direct or Random file organization.
• If the hash function is applied to a key column, that column is
called the hash key.
• If the hash function is applied to a non-key column, that column is
called the hash column.
Advantages of Hash File Organization
• Records need not be sorted after any transaction, so the effort of sorting is avoided in
this method.
• Since the block address is given by the hash function, accessing any record is very fast.
Updating or deleting a record is similarly quick.
• This method can handle multiple transactions because each record is independent of the others;
since no record's storage location depends on another's, multiple records can be accessed at
the same time.
• It is suitable for online transaction systems such as online banking and ticket booking systems.
Disadvantages of Hash File Organization
• Data may be lost accidentally: if two records hash to the same address and the collision
is not handled, the older record is overwritten by the newer one.
• Since the records are stored at random addresses, they are scattered in memory, so memory
is not used efficiently.
• This method is not suitable for searching a range of data, because each record is
stored at a random address.
• Only searches for an exact name or value are efficient.
• This method is efficient only when the search is done on the hash column. Otherwise, it
cannot find the correct address of the data.
• If there are multiple hash columns, searching on only one of them will not give
correct results.
• If the hash columns are frequently updated, the data block address changes
accordingly, because each update generates a new address.
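The behaviour described above can be sketched in Python. The modulo hash function, the number of blocks, and the sample records are assumptions for illustration. The sketch also assumes that each block holds a list of records, so that two keys hashing to the same address do not overwrite each other.

```python
NUM_BLOCKS = 8  # assumed number of data blocks

def block_address(hash_key):
    """A simple hash function: key modulo the number of blocks."""
    return hash_key % NUM_BLOCKS

# Each block is a list so colliding records are kept, not overwritten.
blocks = [[] for _ in range(NUM_BLOCKS)]

def insert(key, value):
    blocks[block_address(key)].append((key, value))

def find(key):
    """Exact-match search: go straight to one block."""
    for k, v in blocks[block_address(key)]:
        if k == key:
            return v
    return None

def range_search(low, high):
    """Range search: the hash gives no ordering, so every block is scanned."""
    hits = [(k, v) for block in blocks for k, v in block if low <= k <= high]
    return sorted(hits)

# Made-up records; 101 and 109 collide because both give 5 modulo 8.
insert(101, "Anu")
insert(109, "Ravi")
insert(42, "Mala")

print(block_address(101), block_address(109))
print(find(109))
print(range_search(100, 110))
```

An exact-match lookup touches a single block, while the range search has to visit all of them, which is why hashing suits point queries better than range queries.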
Interface Indexing
• An indexed file contains records ordered by a record key.
• A record key uniquely identifies a record and determines the
sequence in which it is accessed with respect to other records.
• Each record contains a field that contains the record key.
• An indexed file can also use alternate indexes, that is, record
keys that let you access the file using a different logical
arrangement of the records.
• The possible record transmission (access) modes for indexed
files are sequential, random, or dynamic.
• When indexed files are read or written sequentially, the
sequence is that of the key values.
Hashing Scheme of File Organization

• Data bucket
• Data buckets are the memory locations where the records are stored.
• These buckets are also considered as Unit Of Storage.
• Hash Function
• The hash function (a mathematical calculation) maps the set of all search keys to actual record addresses.
• It uses the primary key to generate the hash index, which is the address of the data block.
• Hash Index
• The prefix of an entire hash value is taken as a hash index.
• Every hash index has a depth value, which signifies how many bits are used for computing
the hash.
• With a depth value of n, these bits can address 2^n buckets. When all these bits are consumed,
the depth value is increased by one and twice as many buckets are allocated.
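The relationship between the depth value and the number of buckets can be sketched in Python. The use of `zlib.crc32` as the hash function, the 32-bit hash width, and the sample key are assumptions for illustration. The slides do not prescribe a particular hash function.

```python
import zlib

HASH_BITS = 32  # crc32 yields an unsigned 32-bit value

def hash_value(key):
    """Full hash value of a search key (illustrative choice of function)."""
    return zlib.crc32(key.encode("utf-8"))

def hash_index(key, depth):
    """Take the leading `depth` bits (the prefix) of the hash value."""
    return hash_value(key) >> (HASH_BITS - depth)

def bucket_count(depth):
    """A depth of n bits can address 2^n buckets."""
    return 2 ** depth

# Increasing the depth by one doubles the number of addressable buckets.
for depth in (1, 2, 3):
    print(depth, bucket_count(depth), hash_index("A-101", depth))
```

Because the index is a prefix of the same hash value, a key's index at depth n + 1 is its index at depth n with one more bit appended. This is what lets the bucket space double without rehashing every key from scratch.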
Relational Data Structures
• Relational Model (RM) - introduced by Dr. E. F. Codd in
1970
• RM - how users perceive (view) data
• RM represents data in the form of two-dimensional tables
• Each table represents some real-world person, place, thing,
or event about which information is collected. A relational
database is a collection of two-dimensional tables.
• The organization of data into relational tables is known as
the logical view of the database.
• The way the database software physically stores the data on a
computer disk is called the internal view.
• The internal view differs from product to product and does not concern us
here.
• The relational model is used by relational database software
such as Oracle, Microsoft SQL Server, or even personal database
systems such as Access or Fox
• Data structure, relationships, and data integrity form the basis of
the relational model.
RM Terminology
• relational model is based on the mathematical concept of a relation
• physically represented as a table
• Codd, a trained mathematician, used terminology taken from mathematics,
principally set theory and predicate logic
• Degree – number of domains in the relation
• Relation – is a table
• Tuple – row in a relation
• Cardinality of the relation – number of tuples in a relation
• To define a relation – Cartesian product could be defined first
Relations
• A relation r over a collection of sets (domains) D1, D2, ..., Dn is a
subset of the Cartesian product D1 × D2 × ... × Dn
• Thus a relation is a set of n-tuples (d1, d2, ..., dn) where di ∈ Di
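The definition can be checked on a small example in Python. The three domains and the `loan` tuples below are made-up values chosen only to illustrate the definition, together with the degree and cardinality terms introduced earlier.

```python
from itertools import product

# Hypothetical domains D1, D2, D3.
D1 = {"L-11", "L-14"}             # loan numbers
D2 = {"Perryridge", "Downtown"}   # branch names
D3 = {900, 1500}                  # amounts

# The Cartesian product D1 x D2 x D3 contains every possible 3-tuple.
cartesian = set(product(D1, D2, D3))

# A relation is any subset of that product.
loan = {
    ("L-11", "Perryridge", 900),
    ("L-14", "Downtown", 1500),
}

is_relation = loan <= cartesian   # subset test
degree = len(next(iter(loan)))    # number of domains per tuple
cardinality = len(loan)           # number of tuples

print(len(cartesian), is_relation, degree, cardinality)
```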

Domain
• The set of allowable values for an attribute
• Every attribute in a relation is defined on a domain
• Domains may be distinct for each attribute, or two or more attributes
may be defined on the same domain
Relational Algebra
• Collection of operations on relations
• Procedural query language
• Consists of a set of operations that take relations as input and produce a relation as output
• The fundamental operations of relational algebra are select,
project, union, set difference, Cartesian product, and rename
• In addition to the fundamental operations, there are several other
operations, namely set intersection, natural join, division, and
assignment
• The selection, projection, join, and division operations can be seen as
special relational operators
Relational Algebra
SELECT
• The select operation selects tuples that satisfy a given predicate.
• The lowercase Greek letter sigma (σ) is used to denote selection.
• The predicate appears as a subscript to σ.
• The argument relation is given in parentheses following the σ.
Example
• Select those tuples of the loan relation where the branch is “Perryridge”.
σ branch_name=“Perryridge”(loan)
SQL equivalent: SELECT * FROM loan WHERE branch_name = 'Perryridge'
• Find all tuples in which the amount lent is more than $1200
σ amount>1200(loan)
• Find tuples pertaining to loans of more than $1200 made by the Perryridge branch
σ branch_name=“Perryridge” ∧ amount>1200(loan)
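The three selections can be imitated in Python by filtering a list of tuples with a predicate. The `select` helper and the sample rows of `loan` are assumptions for illustration, not data from the slides.

```python
# Made-up sample rows for the loan relation.
loan = [
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1000},
    {"loan_number": "L-11", "branch_name": "Round Hill", "amount": 900},
    {"loan_number": "L-14", "branch_name": "Downtown", "amount": 1500},
]

def select(relation, predicate):
    """Keep only the tuples for which the predicate is true."""
    return [t for t in relation if predicate(t)]

# Tuples whose branch is Perryridge.
perryridge = select(loan, lambda t: t["branch_name"] == "Perryridge")

# Tuples whose amount exceeds 1200.
over_1200 = select(loan, lambda t: t["amount"] > 1200)

# Conjunction of both conditions.
both = select(
    loan,
    lambda t: t["branch_name"] == "Perryridge" and t["amount"] > 1200,
)

print([t["loan_number"] for t in perryridge])
print([t["loan_number"] for t in over_1200])
print([t["loan_number"] for t in both])
```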
Relational Algebra
PROJECT
• The project operation is a unary operation that returns its argument
relation, with certain attributes left out.
• Since a relation is a set, any duplicate rows are eliminated.
• Projection is denoted by the Greek letter pi (π).
• The argument relation follows in parentheses.
Example
• List all loan numbers and the amount of each loan. The corresponding query is
π loan_number, amount(loan)
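Projection can be sketched in Python as building a set of shortened tuples, so that duplicates disappear automatically. The `project` helper and the sample rows are assumptions for illustration.

```python
# Made-up sample rows for the loan relation.
loan = [
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1000},
    {"loan_number": "L-14", "branch_name": "Downtown", "amount": 1500},
]

def project(relation, attributes):
    """Keep only the named attributes; a set removes duplicate rows."""
    return {tuple(t[a] for a in attributes) for t in relation}

# Projection onto loan_number and amount keeps all three rows.
numbers_and_amounts = project(loan, ["loan_number", "amount"])

# Projection onto branch_name alone collapses the two Perryridge rows.
branches = project(loan, ["branch_name"])

print(sorted(numbers_and_amounts))
print(sorted(branches))
```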
Relational Algebra
SET DIFFERENCE
• The minus sign (-) is used to denote the set difference operation
• The result contains the tuples that are in one relation but not in the other
• R1 - R2
Example
• Find the names of all customers who have an account but
not a loan
π customer_name (depositor) - π customer_name (borrower)
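The same query can be sketched with Python sets, whose `-` operator is set difference. The contents of `depositor` and `borrower` are made-up sample tuples for illustration.

```python
# Made-up sample relations: (customer_name, account_or_loan_number).
depositor = {("Hayes", "A-102"), ("Johnson", "A-101"), ("Smith", "A-215")}
borrower = {("Hayes", "L-15"), ("Jones", "L-17")}

# Project each relation onto customer_name.
depositor_names = {name for name, _ in depositor}
borrower_names = {name for name, _ in borrower}

# Customers with an account but no loan.
account_only = depositor_names - borrower_names

print(sorted(account_only))
```

Set difference is not symmetric: swapping the operands would instead give customers who have a loan but no account.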
Thank You