Unit 4 Rdbms

UNIT – IV STORAGE AND FILE ORGANIZATION

STORAGE SYSTEM IN DBMS

A database system provides the user with a high-level view of the stored
data. Underneath, however, the data is stored as bits and bytes on
different storage devices.

Types of Data Storage

For storing the data, there are different types of storage options available.
These storage types differ from one another in speed and accessibility.
The following types of storage devices are used for storing the data:

o Primary Storage
o Secondary Storage
o Tertiary Storage

Primary Storage
It is the storage area that offers the quickest access to the stored data.
Primary storage is also known as volatile storage, because this type of
memory does not store the data permanently. As soon as the system loses
power or crashes, the data is lost. Main memory and cache are the types of
primary storage.
o Main Memory: It holds the data that the processor is currently
operating on. Every machine instruction works on data in main memory.
This type of memory can store gigabytes of data on a system, but it
is usually too small to hold the entire database. The main memory
also loses its whole content if the system shuts down because of
power failure or other reasons.
o Cache: It is one of the costliest storage media and also the fastest.
A cache is a small storage area that is usually managed by the
computer hardware. When designing data structures and algorithms
for query processing, designers pay attention to cache effects.

Secondary Storage
Secondary storage is also called online storage. It is the storage area that
allows the user to save and store data permanently. This type of memory
does not lose the data due to a power failure or system crash. That's why
we also call it non-volatile storage.

There are some commonly described secondary storage media which are
available in almost every type of computer system:

o Flash Memory: Flash memory is used in USB (Universal Serial Bus)
keys, which are plugged into the USB slots of a computer system.
These USB keys help transfer data to a computer system and come in
various capacities. Unlike main memory, flash memory keeps its
stored data through a power cut or other failure. Flash memory is
also commonly used in server systems for caching frequently used
data. This improves performance, and flash memory can store larger
amounts of data than the main memory.
o Magnetic Disk Storage: This type of storage media is also known as
online storage media. A magnetic disk is used for storing data for
a long time, and it is capable of storing an entire database. The
computer system must move data from the disk into main memory
before the data can be accessed. If the system performs any
operation that modifies the data, the modified data must be written
back to the disk. A major strength of a magnetic disk is that its
data survives a system crash or power failure; a failure of the disk
itself, however, can destroy the stored data.

Tertiary Storage
It is the storage type that is external to the computer system. It has the
slowest speed, but it is capable of storing a large amount of data. It is also
known as offline storage. Tertiary storage is generally used for data backup.
The following tertiary storage devices are available:

o Optical Storage: An optical disk can store megabytes or gigabytes
of data. A Compact Disk (CD) can store 700 megabytes of data, with a
playtime of around 80 minutes. A Digital Video Disk (DVD) can store
4.7 gigabytes (single layer) or 8.5 gigabytes (dual layer) of data on
each side of the disk.
o Tape Storage: It is a cheaper storage medium than disks.
Generally, tapes are used for archiving or backing up data. Tape
provides slow access to data because it reads data sequentially from
the start. Thus, tape storage is also known as sequential-access
storage. Disk storage is known as direct-access storage because we can
directly access the data at any location on the disk.

Storage Hierarchy
Besides the above, various other storage devices reside in the computer
system. These storage media are organized on the basis of data access
speed, cost per unit of data, and reliability. Thus, we can create a
hierarchy of storage media on the basis of cost and speed.

Arranging the storage media described above according to speed and cost
gives the hierarchy shown in the image.
In the image, the higher levels are expensive but fast. Moving down, the
cost per bit decreases and the access time increases. The storage media
from main memory upward are volatile, and all the media below main
memory are non-volatile.

What is disk storage in DBMS?

Hard disks are formatted in a well-defined layout to store data
efficiently. A hard disk platter has many concentric circles on it, called
tracks. Every track is further divided into sectors. A sector on a hard disk
typically stores 512 bytes of data.
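
As a rough illustration of how tracks and sectors add up to capacity, the
sketch below multiplies out a disk geometry. The surface, track and sector
counts are made-up values chosen for the example, not figures for any real
drive; only the 512-byte sector size comes from the text.

```python
SECTOR_BYTES = 512  # typical sector size mentioned in the text


def disk_capacity_bytes(surfaces, tracks_per_surface, sectors_per_track,
                        sector_bytes=SECTOR_BYTES):
    """Capacity of a disk with a uniform (hypothetical) geometry."""
    return surfaces * tracks_per_surface * sectors_per_track * sector_bytes


# Hypothetical geometry: 4 surfaces, 1,000 tracks each, 64 sectors per track.
capacity = disk_capacity_bytes(4, 1000, 64)
print(capacity)  # 131072000 bytes
```

Real drives usually place more sectors on outer tracks than on inner ones,
so a uniform geometry like this is only a simplification.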

How does a disk store data?


Data is stored on a hard drive in binary code, using 1s and 0s. The
information is spread out on the magnetic layer of the disk(s) and is read
or written by the read/write heads that 'float' above the surface thanks to
the layer of air produced by the ultra-fast rotation of the disk.

RAID (REDUNDANT ARRAY OF INDEPENDENT DISKS)

What is RAID?
RAID (redundant array of independent disks) is a way of storing the same data
in different places on multiple hard disks or solid-state drives (SSDs) to protect
data in the case of a drive failure. There are different RAID levels, however, and
not all have the goal of providing redundancy.

How RAID works


RAID works by placing data on multiple disks and allowing input/output (I/O)
operations to overlap in a balanced way, improving performance. Because an
array of several disks is more likely to suffer a drive failure than a single
disk, RAID also stores data redundantly to increase fault tolerance.

RAID arrays appear to the operating system (OS) as a single logical drive.

RAID employs the techniques of disk mirroring or disk striping. Mirroring
copies identical data onto more than one drive. Striping partitions the data
and spreads it over multiple disk drives. Each drive's storage space is
divided into units ranging from a sector of 512 bytes up to several
megabytes. The stripes of all the disks are interleaved and addressed in
order. Disk mirroring and disk striping can also be combined in a RAID array.

In a single-user system where large records are stored, the stripes are typically
set up to be small (512 bytes, for example) so that a single record spans all the
disks and can be accessed quickly by reading all the disks at the same time.

In a multiuser system, better performance requires a stripe wide
enough to hold the typical or maximum size record, enabling
overlapped disk I/O across drives.
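
The interleaving of stripe units across drives can be sketched as a simple
round-robin mapping. This is a minimal illustration, assuming equal-sized
stripe units numbered from 0; the function name locate_stripe_unit is
hypothetical and does not come from any RAID product.

```python
def locate_stripe_unit(logical_unit, num_disks):
    """Map a logical stripe-unit number to (disk index, unit index on that disk)
    under simple round-robin striping."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    return logical_unit % num_disks, logical_unit // num_disks


# With 4 disks, logical units 0..7 are interleaved across the drives.
layout = [locate_stripe_unit(unit, 4) for unit in range(8)]
print(layout)
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

Consecutive units land on different disks, which is what lets a large read
or write keep all the drives busy at the same time.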
RAID controller
A RAID controller is a device used to manage hard disk drives in a storage
array. It can be used as a level of abstraction between the OS and the physical
disks, presenting groups of disks as logical units. Using a RAID controller can
improve performance and help protect data in case of a crash.

A RAID controller may be hardware- or software-based. In a hardware-based
RAID product, a physical controller manages the entire array. The controller
can also be designed to support drive formats such as Serial Advanced
Technology Attachment (SATA) and Small Computer System Interface (SCSI). A
physical RAID controller can also be built into a server's motherboard.

With software-based RAID, the controller uses the resources of the hardware
system, such as the central processor and memory. While it performs the same
functions as a hardware-based RAID controller, software-based RAID
controllers may not enable as much of a performance boost and can affect the
performance of other applications on the server.

If a software-based RAID implementation is not compatible with a system's
boot-up process and hardware-based RAID controllers are too costly,
firmware-based (or driver-based) RAID is a potential option.

Firmware-based RAID controller chips are located on the motherboard, and all
operations are performed by the central processing unit (CPU), similar to
software-based RAID. However, with firmware, the RAID system is only
implemented at the beginning of the boot process. Once the OS has loaded, the
controller driver takes over RAID functionality. A firmware RAID controller is
not as pricey as a hardware option, but it puts more strain on the computer's
CPU. Firmware-based RAID is also called hardware-assisted software RAID,
hybrid model RAID and fake RAID.
Why data redundancy?

Data redundancy, although it takes up extra space, adds to disk reliability.
In case of a disk failure, if the same data is also stored on another disk,
we can retrieve the data and go on with the operation. On the other hand,
if the data is simply spread across multiple disks without any redundancy,
the loss of a single disk can affect the entire data set.
Key evaluation points for a RAID System

• Reliability: How many disk faults can the system tolerate?
• Availability: What fraction of the total session time is the system in
uptime mode, i.e. how available is the system for actual use?
• Performance: How good is the response time? How high is the throughput
(rate of processing work)? Note that performance involves many
parameters, not just these two.
• Capacity: Given a set of N disks each with B blocks, how much useful
capacity is available to the user?
RAID is transparent to the host system. This means that, to the host
system, the array appears as a single big disk presenting itself as a linear
array of blocks. This allows older technologies to be replaced by RAID
without making too many changes in the existing code.

Types of (or) Levels of RAID

RAID, or Redundant Array of Independent Disks, is a technology to connect
multiple secondary storage devices and use them as a single storage medium.
RAID consists of an array of disks in which multiple disks are connected
together to achieve different goals. RAID levels define the use of disk arrays.

RAID 0
In this level, a striped array of disks is implemented. The data is broken down
into blocks and the blocks are distributed among disks. Each disk receives a
block of data to write/read in parallel. It enhances the speed and performance
of the storage device. There is no parity and backup in Level 0.
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it
sends a copy of data to all the disks in the array. RAID level 1 is also
called mirroring and provides 100% redundancy in case of a failure.

RAID 2
RAID 2 records an Error Correction Code (ECC), based on the Hamming code, for
its data, which is striped across different disks. As in level 0 the data is
striped, but here each data bit in a word is recorded on a separate disk, and
the ECC codes of the data words are stored on a different set of disks. Due
to its complex structure and high cost, RAID 2 is not commercially available.

RAID 3
RAID 3 stripes the data onto multiple disks. The parity generated for each
data word is stored on a separate, dedicated disk. This technique allows the
array to survive a single disk failure.

RAID 4
In this level, an entire block of data is written onto data disks and then the
parity is generated and stored on a different disk. Note that level 3 uses byte-
level striping, whereas level 4 uses block-level striping. Both level 3 and level 4
require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity
generated for each stripe of data blocks is distributed among all the disks
rather than stored on a dedicated parity disk.
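
The text does not say how parity is computed. Parity in RAID levels 3, 4 and
5 is commonly the bitwise XOR of the data blocks in a stripe, and the sketch
below assumes that. The block contents and the helper name xor_blocks are
made up for the illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks belonging to one stripe (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second block and rebuilding it from the
# surviving blocks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

Because XOR-ing a value twice cancels it out, any single missing block in a
stripe can be recovered from the remaining blocks and the parity.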

RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are
generated and stored in distributed fashion among multiple disks. Two
parities provide additional fault tolerance. This level requires at least four disk
drives to implement RAID.
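
The capacity question raised earlier (N disks, each with B blocks) can be
worked out per level. The sketch below is an illustration under stated
assumptions: RAID 1 is treated as the text describes it (every disk holds
the same copy), and the minimum disk counts for levels 0, 1 and 5 are
general conventions that are not stated in the text. RAID 2 is left out.

```python
def usable_blocks(level, n_disks, blocks_per_disk):
    """Usable capacity, in blocks, of an array of identical disks."""
    minimum = {0: 2, 1: 2, 3: 3, 4: 3, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if n_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return n_disks * blocks_per_disk        # striping only, nothing reserved
    if level == 1:
        return blocks_per_disk                  # every disk holds the same copy
    if level == 6:
        return (n_disks - 2) * blocks_per_disk  # two disks' worth of parity
    return (n_disks - 1) * blocks_per_disk      # levels 3, 4, 5: one disk's worth


for level in (0, 1, 5, 6):
    print(level, usable_blocks(level, 4, 1000))
# 0 4000
# 1 1000
# 5 3000
# 6 2000
```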
Storage Access

A database is mapped into a number of different files, which are
maintained by the underlying operating system. Files are organized
into blocks, and a block may contain one or more data items.

A major goal of the DBMS is to minimize the number of block
transfers between the disk and memory. Since it is not possible to
keep all blocks in main memory, we need to manage the allocation of
the space available for the storage of blocks. This is similar to the
problems encountered by the operating system, and can be in conflict
with the operating system, since the OS is concerned with all processes
and the DBMS is concerned with only one family of processes.
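
One way to manage the limited space for blocks in memory is a buffer pool
that keeps recently used blocks and evicts the least recently used one when
it is full. The text does not name a replacement policy, so LRU is an
assumption here, and the class BufferPool and its read_block callback are
hypothetical names standing in for real disk I/O.

```python
from collections import OrderedDict


class BufferPool:
    """Tiny LRU buffer pool; read_block stands in for a real disk read."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block
        self.frames = OrderedDict()  # block id -> block contents
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.frames:
            # Hit: mark the block as most recently used.
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            # Pool is full: evict the least recently used block.
            self.frames.popitem(last=False)
        self.disk_reads += 1
        data = self.read_block(block_id)
        self.frames[block_id] = data
        return data


pool = BufferPool(capacity=2, read_block=lambda b: f"block-{b}")
for block_id in [1, 2, 1, 3, 1, 2]:
    pool.get(block_id)
print(pool.disk_reads)  # 4 disk reads for 6 requests
```

The sketch ignores modified ("dirty") blocks, which a real buffer manager
would have to write back to disk before evicting them.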

DBMS | FILE ORGANIZATION


A database consists of a huge amount of data. In an RDBMS the data is
grouped into tables, and each table holds related records. A user sees the
data as tables, but in reality this huge amount of data is stored in
physical storage in the form of files.
File
– A file is a sequence of records stored in binary format. A disk drive is
formatted into several blocks that can store records. File records are
mapped onto those disk blocks.
A file is a named collection of related information that is recorded on
secondary storage such as magnetic disks, magnetic tapes and optical
disks.

What is File Organization?

File Organization refers to the logical relationships among the various
records that constitute the file, particularly with respect to the means of
identification and access to any specific record. In simple terms, storing
the records of a file in a certain order is called file organization. File
structure refers to the format of the label and data blocks and of any
logical control record.
Types of File Organizations –
Various methods have been introduced to organize files. Each method has
advantages and disadvantages with respect to access or selection. It is
therefore up to the programmer to choose the file organization method best
suited to the requirements.
Some types of File Organizations are:

• Sequential File Organization
• Heap File Organization
• Hash File Organization
• Indexed sequential access method (ISAM)
• B+ Tree File Organization
• Clustered File Organization

The following sections discuss each of these file organizations, along
with the advantages and disadvantages of each method.

1. SEQUENTIAL FILE ORGANIZATION –

The easiest method of file organization is the sequential method. In this
method the records of the file are stored one after another in a sequential
manner. There are two ways to implement this method:
i. Pile File Method – This method is quite simple: we store the
records in a sequence, i.e. one after the other in the order in
which they are inserted into the tables.

Insertion of new record –
Let R1, R3, and so on up to R5 and R4 be the records in the
sequence. Here, a record is simply a row in a table. Suppose a
new record R2 has to be inserted in the sequence; it is simply
placed at the end of the file.

ii. Sorted File Method – In this method, as the name itself
suggests, whenever a new record has to be inserted, it is
always inserted in sorted (ascending or descending) order.
Sorting of records may be based on the primary key or on
any other key.
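
The difference between the two methods can be sketched with a Python list
standing in for the file. The integers 1 to 5 stand for the keys of records
R1 to R5; the function names are hypothetical.

```python
import bisect


def pile_insert(records, key):
    """Pile file: the new record always goes to the end of the file."""
    records.append(key)


def sorted_insert(records, key):
    """Sorted file: the new record goes to its position in key order."""
    bisect.insort(records, key)


pile = [1, 3, 5, 4]          # records in arrival order
pile_insert(pile, 2)
print(pile)                  # [1, 3, 5, 4, 2]

sorted_file = [1, 3, 4, 5]   # records kept in ascending key order
sorted_insert(sorted_file, 2)
print(sorted_file)           # [1, 2, 3, 4, 5]
```

On a real disk file, inserting into the middle means shifting or relinking
later records, which is why the sorted method costs more per insert.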

Pros of sequential file organization


o It is a fast and efficient method for processing huge amounts of data.
o In this method, files can be easily stored on cheaper storage media
such as magnetic tapes.
o It is simple in design and requires little effort to store the data.
o This method is used when most of the records have to be accessed, as in
calculating students' grades, generating salary slips, etc.
o This method is used for report generation or statistical calculations.

Cons of sequential file organization
o It wastes time because we cannot jump directly to a particular record;
we have to move through the file sequentially.
o The sorted file method takes more time and space for sorting the records.

2. HEAP FILE ORGANIZATION:

Heap File Organization works with data blocks. In this method
records are inserted at the end of the file, into the data blocks. No
sorting or ordering is required in this method. If a data block is full,
the new record is stored in some other block. This other data block need
not be the very next data block; it can be any block chosen by the DBMS.
It is the responsibility of the DBMS to store and manage the new records.

Insertion of new record –

Suppose we have five records in the heap, R1, R5, R6, R4 and R3,
and suppose a new record R2 has to be inserted in the heap. Since
the last data block, i.e. data block 3, is full, R2 will be inserted in any
of the data blocks selected by the DBMS, let's say data block 1.
If we want to search, delete or update data in heap file organization,
then we traverse the data from the beginning of the file until we get the
requested record. Thus if the database is very large, searching, deleting
or updating a record will take a lot of time.
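
The insertion and search behaviour can be sketched as follows. The block
capacity of two records and the "first block with free space" placement
rule are assumptions made for the example; the text only says that the DBMS
picks the block.

```python
BLOCK_CAPACITY = 2  # hypothetical number of records per data block


def heap_insert(blocks, record):
    """Place the record in the first block with free space,
    or start a new block if every block is full."""
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:
            block.append(record)
            return
    blocks.append([record])


def heap_search(blocks, record):
    """Linear scan from the start; returns (block index, slot) or None."""
    for block_index, block in enumerate(blocks):
        for slot, stored in enumerate(block):
            if stored == record:
                return block_index, slot
    return None


# Data block 3 (index 2) is full, data block 1 (index 0) has room.
blocks = [["R1"], ["R5", "R6"], ["R4", "R3"]]
heap_insert(blocks, "R2")
print(blocks)                     # [['R1', 'R2'], ['R5', 'R6'], ['R4', 'R3']]
print(heap_search(blocks, "R3"))  # (2, 1)
```

The search has to look at every record in the worst case, which matches the
cost described above for large databases.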

Pros and Cons of Heap File Organization –

Pros –
• Fetching and retrieving records is faster than in sequential file
organization, but only in the case of small databases.
• When a huge amount of data needs to be loaded into the database at
one time, this method of file organization is best suited.

Cons –
• Problem of unused memory blocks.
• Inefficient for larger databases.

3. Hash File Organization


Hash File Organization computes a hash function on some fields of the
records. The hash function's output determines the location of the disk
block where the records are to be placed.
When a record has to be retrieved using the hash key columns, the
address is generated, and the whole record is retrieved using that address.
In the same way, when a new record has to be inserted, the address is
generated using the hash key and the record is directly inserted. The same
process is applied in the case of delete and update.

In this method, there is no need to search or sort the entire file. The
records appear to be scattered across the storage, because the location of
each record depends only on its hash value.
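
A minimal sketch of this idea is shown below. It assumes integer keys, four
buckets, and a modulo hash function; all three are choices made for the
illustration, and the class name HashFile is hypothetical. Each bucket
stands for one disk block.

```python
class HashFile:
    """Records are placed in the bucket chosen by hashing the key."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _address(self, key):
        # Hash function: maps a key to a bucket number.
        return key % len(self.buckets)

    def insert(self, key, record):
        self.buckets[self._address(key)].append((key, record))

    def lookup(self, key):
        # Only one bucket is examined, never the whole file.
        for stored_key, record in self.buckets[self._address(key)]:
            if stored_key == key:
                return record
        return None

    def delete(self, key):
        bucket = self.buckets[self._address(key)]
        bucket[:] = [(k, r) for k, r in bucket if k != key]


students = HashFile()
students.insert(101, "Asha")
students.insert(105, "Ravi")   # 101 and 105 both hash to bucket 1
print(students.lookup(105))    # Ravi
```

Two keys that hash to the same bucket, as here, simply share the bucket in
this sketch; real systems need an overflow strategy when a bucket fills up.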

4. B+ File Organization

o B+ tree file organization is an advanced form of the indexed
sequential access method. It uses a tree-like structure to store
records in a file.
o It uses the same concept of key-index, where the primary key is used
to sort the records. For each primary key, the value of the index is
generated and mapped to the record.
o The B+ tree is similar to a binary search tree (BST), but a node can
have more than two children. In this method, all the records are
stored only at the leaf nodes. Intermediate nodes act as pointers to
the leaf nodes. They do not contain any records.

The above B+ tree shows that:

o There is one root node of the tree, i.e., 25.
o There is an intermediary layer with nodes. They do not store the actual
records; they hold only pointers to the leaf nodes.
o The node to the left of the root holds values smaller than the root and
the node to the right holds values larger than the root, i.e., 15 and 30
respectively.
o There is only one leaf level, and its nodes hold only values, i.e., 10,
12, 17, 20, 24, 27 and 29.
o Searching for any record is easier because all the leaf nodes are at the
same depth, i.e. the tree is balanced.
o In this method, any record can be reached by traversing a single path
from the root.
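
The single-path search can be sketched with the values from the example. The
figure is not reproduced here, so the way the seven values are grouped into
leaves is an assumption, and the empty right-most leaf exists only because
the example lists no value of 30 or more. The rule "keys smaller than a
separator go left, others go right" is also an assumption of this sketch.

```python
import bisect

# Internal nodes have "keys" and "children"; leaves have only "keys".
leaf_a = {"keys": [10, 12]}
leaf_b = {"keys": [17, 20, 24]}
leaf_c = {"keys": [27, 29]}
leaf_d = {"keys": []}
tree = {
    "keys": [25],
    "children": [
        {"keys": [15], "children": [leaf_a, leaf_b]},
        {"keys": [30], "children": [leaf_c, leaf_d]},
    ],
}


def search(node, key):
    """Follow one root-to-leaf path; return (found, keys of nodes visited)."""
    path = []
    while "children" in node:
        path.append(node["keys"])
        # Pick the child whose key range covers the search key.
        node = node["children"][bisect.bisect_right(node["keys"], key)]
    path.append(node["keys"])
    return key in node["keys"], path


print(search(tree, 20))  # (True, [[25], [15], [17, 20, 24]])
print(search(tree, 26))  # (False, [[25], [30], [27, 29]])
```

Every search visits exactly one node per level, so the cost depends on the
height of the tree rather than on the number of records.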

Pros of B+ tree file organization


o In this method, searching becomes very easy as all the records are stored
only in the leaf nodes, which are sorted and linked as a sequential list.
o Traversing through the tree structure is easier and faster.
o The size of the B+ tree has no restrictions, so the number of records can
increase or decrease and the B+ tree structure can also grow or shrink.
o It is a balanced tree structure, and it stays balanced after any
insert/update/delete, so the performance of the tree is not affected.

Cons of B+ tree file organization


o This method is inefficient for static tables, whose records rarely change.

5. Indexed sequential access method (ISAM)


ISAM is an advanced form of sequential file organization. In this method,
records are stored in the file using the primary key. An index value is
generated for each primary key and mapped to the record. This index
contains the address of the record in the file.
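
The mapping from primary key to record address can be sketched with an
in-memory file. The student records, the comma-separated record format and
the function name fetch are made up for the example; a byte offset plays the
role of the record's address.

```python
import io

# Hypothetical records, already sorted by primary key.
records = [(101, "Asha"), (105, "Ravi"), (110, "Meena")]

data_file = io.BytesIO()   # stands in for the data file on disk
index = {}                 # primary key -> byte offset of the record

for key, name in records:
    index[key] = data_file.tell()               # address of this record
    data_file.write(f"{key},{name}\n".encode())


def fetch(key):
    """Use the index to jump straight to the record, without scanning."""
    offset = index.get(key)
    if offset is None:
        return None
    data_file.seek(offset)
    return data_file.readline().decode().rstrip("\n")


print(fetch(105))  # 105,Ravi
```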

6. Cluster file organization

o When the records of two or more tables are stored in the same file, it
is known as a cluster. These files have two or more tables in the same
data block, and the key attributes used to map these tables together
are stored only once.
o This method reduces the cost of searching for various records in
different files.
o The cluster file organization is used when there is a frequent need for
joining the tables with the same condition. These joins give only a
few records from both tables. For example, a join might retrieve the
records for only particular departments. This method is not suited to
retrieving the records of all departments.
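
The idea of storing joined tables together, with the shared key kept once,
can be sketched as below. The department and employee rows and the function
name build_clusters are hypothetical; each dictionary entry stands for one
cluster keyed by the department id.

```python
# Hypothetical rows: (dept_id, dept_name) and (emp_name, dept_id).
departments = [(10, "Sales"), (20, "HR")]
employees = [("Anu", 10), ("Bala", 20), ("Chitra", 10)]


def build_clusters(departments, employees):
    """Group each department row with its employee rows.
    The join key (dept_id) is stored once per cluster."""
    clusters = {
        dept_id: {"dept_name": name, "employees": []}
        for dept_id, name in departments
    }
    for emp_name, dept_id in employees:
        clusters[dept_id]["employees"].append(emp_name)
    return clusters


clusters = build_clusters(departments, employees)
print(clusters[10])  # {'dept_name': 'Sales', 'employees': ['Anu', 'Chitra']}
```

Fetching one department together with its employees touches a single
cluster, which is the case this organization is designed for.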

Pros of Cluster file organization


o The cluster file organization is used when there is a frequent request for
joining the tables with the same joining condition.
o It gives efficient results when there is a 1:M mapping between the
tables.

Cons of Cluster file organization
o This method has low performance for very large databases.
o If there is any change in the joining condition, this method cannot be
used. If we change the joining condition, traversing the file takes a lot
of time.

File Operations
Operations on database files can be broadly classified into two categories −
• Update Operations
• Retrieval Operations
Update operations change the data values by insertion, deletion, or update.
Retrieval operations, on the other hand, do not alter the data but retrieve
them after optional conditional filtering. In both types of operations,
selection plays a significant role. Other than creation and deletion of a file,
there could be several operations, which can be done on files.
• Open − A file can be opened in one of the two modes, read
mode or write mode. In read mode, the operating system does not
allow anyone to alter data. In other words, data is read only. Files
opened in read mode can be shared among several entities. Write
mode allows data modification. Files opened in write mode can be
read but cannot be shared.
• Locate − Every file has a file pointer, which tells the current position
where the data is to be read or written. This pointer can be adjusted
accordingly. Using find (seek) operation, it can be moved forward or
backward.
• Read − By default, when files are opened in read mode, the file
pointer points to the beginning of the file. There are options where
the user can tell the operating system where to locate the file pointer
at the time of opening a file. The very next data to the file pointer is
read.
• Write − The user can choose to open a file in write mode, which enables
them to edit its contents. The edit can be a deletion, insertion, or
modification. The file pointer can be positioned at the time of
opening or can be changed dynamically if the operating system allows it.
• Close − This is the most important operation from the operating
system’s point of view. When a request to close a file is generated, the
operating system
o removes all the locks (if in shared mode),
o saves the data (if altered) to the secondary storage media, and
o releases all the buffers and file handles associated with the file.
The organization of data inside a file plays a major role here. The process
of moving the file pointer to a desired record inside a file varies based on
whether the records are arranged sequentially or clustered.
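
The open, locate, read, write and close operations can be tried out with
Python's built-in file API. The file name records.dat and the fixed 4-byte
record layout are assumptions made for this illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "records.dat")

# Open in write mode and store three fixed-length 4-byte records.
with open(path, "wb") as f:
    f.write(b"REC1REC2REC3")

# Open in read mode: the file pointer starts at the beginning.
with open(path, "rb") as f:
    first = f.read(4)     # reads b"REC1" and advances the pointer
    f.seek(8)             # locate: move the pointer to the third record
    third = f.read(4)     # reads b"REC3"
    position = f.tell()   # pointer is now at byte 12

# Open for reading and writing, then overwrite the second record in place.
with open(path, "r+b") as f:
    f.seek(4)
    f.write(b"NEW2")

with open(path, "rb") as f:
    contents = f.read()

print(first, third, position, contents)
# b'REC1' b'REC3' 12 b'REC1NEW2REC3'
```

Each with block closes the file on exit, which is when buffered data is
flushed and the file handle is released.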

DATA DICTIONARY STORAGE

A data dictionary is a repository of information about data in a database
or a data set (a type of metadata).

A data dictionary is like the A-Z dictionary of the relational database
system, holding all information about each relation in the database. It is
also known as the system catalog or metadata.

Data Dictionary consists of


Data Dictionary consists of the following information −

• Names of the tables in the database
• Constraints of a table, i.e. keys, relationships, etc.
• Columns of the tables that relate to each other
• Owner of the table
• Last accessed information of the object
• Last updated information of the object
An example of Data Dictionary can be personal details of a student −

Example
<StudentPersonalDetails>
Student_ID Student_Name Student_Address Student_City

The following is the data dictionary for the above fields −


Types of Data Dictionary

There are two main types of Data Dictionary in DBMS:

1. Integrated Data Dictionary
2. Stand Alone Data Dictionary

1) Integrated Data Dictionary

In DBMS, an integrated data dictionary is contained within the DBMS.

For example, every relational DBMS includes an integrated data dictionary,
or system catalog, which is frequently accessed and updated by the RDBMS.
Other DBMSs, mainly older ones, do not include an integrated data
dictionary. So, as an alternative, the DBA may use a stand-alone
data dictionary system.
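
As one concrete illustration of a built-in system catalog, SQLite (through
Python's sqlite3 module) exposes table definitions via the sqlite_master
table and the table_info pragma. The column types given to the
StudentPersonalDetails table below are assumptions, since the text lists
only the field names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StudentPersonalDetails (
        Student_ID      INTEGER PRIMARY KEY,
        Student_Name    TEXT NOT NULL,
        Student_Address TEXT,
        Student_City    TEXT
    )
""")

# The catalog lists every table the DBMS knows about.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute(
    "PRAGMA table_info(StudentPersonalDetails)")]

conn.close()
print(tables)
print(columns)
```

Other relational systems expose similar information under different names,
so the queries above apply to SQLite only.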

An integrated data dictionary is further classified into two types:

a. Active
An active data dictionary is automatically updated by the DBMS whenever
the database is accessed, so it keeps the access information up to date.
b. Passive
A passive data dictionary is not updated automatically and often needs a
batch process to be run to bring it up to date.

The access information in the Data Dictionary is mainly used by the DBMS
for query optimization. The main function of the Data Dictionary is to
store the description of all database objects. An integrated data
dictionary tends to bind its metadata to the data.

2) Stand Alone Data Dictionary


In DBMS, stand-alone data dictionary systems are typically more flexible
than an integrated data dictionary. They permit the DBA to define and
manage all of the organization's data, whether or not the data is
computerized.

A stand-alone data dictionary gives database designers and end users an
improved ability to communicate with each other, whatever the format of
the Data Dictionary.
The Data Dictionary is a tool that helps the DBA settle conflicts in the
data.

Data Dictionary doesn't have any standard format to store the information.
But there are some features that are common.

• Data Elements:
Data Dictionary stores the definition of all the data elements. It stores
name, data types, display formats, internal storage formats, and validation
rules. It also explains the use of data, where an element gets used, who has
used it and so on.
• Tables:
Data Dictionary stores the name of the user who created the table, number
of rows and columns, date at which table has been created and authorized
access and so on.
• Index:
Data Dictionary stores the indexes that are defined for database tables. For
every index, the DBMS stores the index name, the attributes used, the
location, the characteristics of the index and the date of creation.
• Programs:
Data Dictionary stores the programs that are created to access the database,
including reports, application and screen formats, SQL queries and so on.
• Relationship between data elements:
Data Dictionary stores whether the relationship is compulsory or optional,
its cardinality and connectivity, and so on.
• Administrators and End Users:
Data Dictionary stores the information of all administrators and end
users as well.

2 Marks
1. Define File
2. Define Storage
3. Types of storage
4. Define Disk and types
5. What is meant by RAID
6. Define Data Dictionary
7. Types of Data dictionary
5 Marks
1. Short notes on File organization
2. Short notes on Data dictionary
3. Explain types of storage.
10 marks
1. Discuss the types of file organizations
