
Data on external storage

Data is stored on external storage devices such as disks
and tapes, and fetched into memory when needed for
processing. The unit of information read from or
written to disk is called a page. The size of a page is
typically 4 or 8 KB.
Each record in a file has a unique id called a record id (rid). We
use the term data entry to refer to the records stored
in an index file.
Data on External Storage
• Disks: Can retrieve random page at fixed cost
– But reading several consecutive pages is much cheaper
than reading them in random order
• Tapes: Can only read pages in sequence
– Cheaper than disks; used for archival storage
• File organization: Method of arranging a file of records on
external storage.
– Record id (rid) is sufficient to physically locate record
– An index is a data structure that organizes data records on
disk to optimize certain kinds of retrieval operations
• Buffer manager brings pages from external storage to main
memory buffer pool. File and index layers make calls to the
buffer manager.
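
As a rough illustration of how a rid locates a record and how the file layer goes through the buffer manager, the sketch below models a rid as a (page number, slot number) pair. The names `Rid`, `BufferPool`, and `fetch_record`, the dictionary standing in for the disk, and the least-recently-used eviction policy are all assumptions made for illustration; the slides do not specify any of them.

```python
from collections import OrderedDict
from typing import NamedTuple


class Rid(NamedTuple):
    page_no: int  # which page of the file holds the record
    slot_no: int  # position of the record within that page


class BufferPool:
    """Toy buffer manager: caches pages in memory, evicting the least recently used."""

    def __init__(self, disk, capacity):
        self.disk = disk             # hypothetical "disk": page_no -> list of records
        self.capacity = capacity     # number of page frames in the pool
        self.frames = OrderedDict()  # page_no -> page contents
        self.disk_reads = 0

    def get_page(self, page_no):
        if page_no in self.frames:
            self.frames.move_to_end(page_no)  # mark as recently used
            return self.frames[page_no]
        self.disk_reads += 1                  # page not in pool: one disk read
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # evict the least recently used page
        self.frames[page_no] = self.disk[page_no]
        return self.frames[page_no]


def fetch_record(pool, rid):
    """File-layer call: ask the buffer manager for the page, then pick the slot."""
    return pool.get_page(rid.page_no)[rid.slot_no]


disk = {0: ["r0", "r1"], 1: ["r2", "r3"], 2: ["r4", "r5"]}
pool = BufferPool(disk, capacity=2)
first = fetch_record(pool, Rid(1, 0))   # reads page 1 from disk
second = fetch_record(pool, Rid(1, 1))  # page 1 is already in the pool
```

Both lookups touch the same page, so only the first one costs a disk read.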

Slide No:L1-1
Various File Organizations
Many alternatives exist, each ideal for some situations:
– Heap (random order) files: Files of randomly ordered
records are called heap files. This is the simplest file
structure, and it is unordered. Suitable when the typical
access is a file scan, i.e. retrieving all records.
– Sorted files: Files sorted on some fields are called sorted
files. Best if records must be retrieved in some order, or
when only a "range" of records is needed.
– Hashed files: Files that are hashed on some fields are called
hashed files.
• Like sorted files, they speed up searches for a subset of
records, based on values in certain ("search key") fields
• Updates are much faster than in sorted files.
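
The three organizations can be contrasted with a small in-memory sketch. Everything here is illustrative: the sample records, the function names, and the modulo hash function are assumptions, and Python lists stand in for files on disk.

```python
import bisect

records = [(27, "e"), (10, "a"), (63, "k"), (40, "g"), (15, "b")]  # (search key, payload)


def heap_search(heap, key):
    """Heap file: records are in arbitrary order, so a search scans the whole file."""
    return [r for r in heap if r[0] == key]


def sorted_range(sorted_file, lo, hi):
    """Sorted file: binary search finds the start, then matches are contiguous."""
    keys = [r[0] for r in sorted_file]
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return sorted_file[start:end]


def build_hashed(recs, n_buckets):
    """Hashed file: a hash function on the search key picks the bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for r in recs:
        buckets[r[0] % n_buckets].append(r)
    return buckets


def hashed_search(buckets, key):
    """Only the one bucket the key hashes to is examined."""
    return [r for r in buckets[key % len(buckets)] if r[0] == key]


sorted_file = sorted(records)
hashed_file = build_hashed(records, n_buckets=4)
```

The hashed file answers equality searches by looking in a single bucket, but it gives no help for a range such as 15 to 40, which the sorted file serves directly.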
Slide No:L1-2
Index
• Primary vs. secondary: an index on a set of fields that
includes the primary key is called a primary index; other
indexes are called secondary indexes.
• Clustered vs. unclustered: when the file is organized
so that the order of data records is the same as, or
"close to", the order of data entries in some index, that
index is called a clustered index; otherwise it is unclustered.
– A file can be clustered on at most one search key.
– Cost of retrieving data records through index varies
greatly based on whether index is clustered or not!
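
To see why the retrieval cost differs so much, the sketch below counts page reads for a range query that follows data entries in key order. It assumes 4 records per page, 20 records with keys 0 to 19, that only the most recently read page stays in memory, and a made-up scattered layout for the unclustered case; none of these figures come from the slides.

```python
R = 4   # records per page (assumed)
N = 20  # number of records, with keys 0..19 (assumed)


def page_fetches(position_of, lo, hi):
    """Follow data entries for keys lo..hi in key order and count page reads,
    assuming only the most recently read page stays in memory."""
    fetches, last_page = 0, None
    for key in range(lo, hi + 1):
        page = position_of(key) // R
        if page != last_page:
            fetches += 1
            last_page = page
    return fetches


def clustered(key):
    return key            # data records are stored in key order


def unclustered(key):
    return (key * 7) % N  # data records are scattered (hypothetical layout)


clustered_cost = page_fetches(clustered, 4, 11)
unclustered_cost = page_fetches(unclustered, 4, 11)
```

For the eight matching records, the clustered layout reads 2 pages while the scattered layout reads 8, one per record.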

Slide No:L1-3
Index data structures

1. Hash-based indexing
2. Tree-based indexing
Comparing File Organizations
• Heap files
– Records are stored in random order. Insertions are
done at the end of the file.
– Good for entire file scans.
– Good for retrieval of a particular record when its rid is
known; searching by value requires a scan.
– Not efficient for range selections.
• Sorted files
– Records are stored in sorted order of some fields.
– Supports retrieval of all records.
– Supports retrieval of a single record.
– Good for retrieving a range of records.
– Insertions and deletions are slow.
Slide No:L3-2
• Hashed files
– Records with a given search key value can be located
quickly.
– Not good for range selections.
– Allows fast delete and insert operations.

Slide No:L3-3
Cost Model for Our Analysis

– B: The number of data pages
– R: Number of records per page
– D: (Average) time to read or write a disk page
– C: The average time to process a record

Slide No:L3-1
Cost of Operations

                         (a) Scan      (b) Equality           (c) Range                                  (d) Insert    (e) Delete
(1) Heap                 BD            0.5BD                  BD                                         2D            Search + D
(2) Sorted               BD            D log_2 B              D(log_2 B + # pgs with match recs)         Search + BD   Search + BD
(3) Clustered            1.5BD         D log_F 1.5B           D(log_F 1.5B + # pgs w. match recs)        Search + D    Search + D
(4) Unclust. Tree index  BD(R+0.15)    D(1 + log_F 0.15B)     D(log_F 0.15B + # pgs w. match recs)       Search + 2D   Search + 2D
(5) Unclust. Hash index  BD(R+0.125)   2D                     BD                                         Search + 2D   Search + 2D
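
The scan and equality columns can be evaluated numerically with a short sketch. It assumes F is the fan-out of the tree index (the slides use F without defining it) and, like the table, counts only I/O time, so C does not appear. The function names and the sample values B = 1024, R = 100, D = 15, F = 100 are hypothetical.

```python
import math


def scan_cost(org, B, R, D):
    """Column (a): time to scan the whole file for each organization."""
    return {
        "heap": B * D,
        "sorted": B * D,
        "clustered": 1.5 * B * D,
        "unclustered_tree": B * D * (R + 0.15),
        "unclustered_hash": B * D * (R + 0.125),
    }[org]


def equality_cost(org, B, D, F):
    """Column (b): time for an equality search; F is the assumed index fan-out."""
    return {
        "heap": 0.5 * B * D,
        "sorted": D * math.log2(B),
        "clustered": D * math.log(1.5 * B, F),
        "unclustered_tree": D * (1 + math.log(0.15 * B, F)),
        "unclustered_hash": 2 * D,
    }[org]


B, R, D, F = 1024, 100, 15, 100  # hypothetical sample values
heap_scan = scan_cost("heap", B, R, D)
sorted_equality = equality_cost("sorted", B, D, F)
hash_equality = equality_cost("unclustered_hash", B, D, F)
```

With these sample values an equality search costs 7680 time units on the heap file, 150 on the sorted file, and 30 through the hash index, which shows how far apart the organizations are for this one operation.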

Slide No:L4-1
Indexed Sequential Access Method(ISAM)
In the ISAM data structure, the number of leaf pages is
fixed at file creation time. The records are stored in
leaf pages and are sorted.

ISAM is a static structure.

If a page is full, additional overflow pages are added
and chained to the leaf page.
It supports the basic insert, delete, and search
operations, and handles range queries very well.

Disadvantages
Insertions can result in long chains of overflow pages.
When a record is deleted from a primary leaf
page, the freed space is not reclaimed: the primary page
stays allocated even if it becomes empty.
ISAM

index entry: P0 K1 P1 K2 P2 ... Km Pm

[Figure: ISAM structure. Non-leaf pages sit above the leaf pages; the leaf level consists of primary pages, with overflow pages chained to them.]

Slide No:L7-3
ISAM

[Figure: the order of allocation of pages in ISAM: data pages, then index pages, then overflow pages.]

Slide No:L7-4
Example ISAM Tree
• Each node can hold 2 entries.

Root:         40
Index pages:  20 33 | 51 63
Leaf pages:   10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*

Slide No:L7-5
After Inserting 23*, 48*, 41*, 42* ...

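Index pages
Root:         40
Below root:   20 33 | 51 63

Primary leaf pages
10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*

Overflow pages
23* (chained to leaf 20* 27*)
48* 41* (chained to leaf 40* 46*)
42* (chained to overflow page 48* 41*)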

Slide No:L7-6
... Then Deleting 42*, 51*, 97*

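Index pages
Root:         40
Below root:   20 33 | 51 63

Primary leaf pages
10* 15* | 20* 27* | 33* 37* | 40* 46* | 55* | 63*

Overflow pages
23* (chained to leaf 20* 27*)
48* 41* (chained to leaf 40* 46*)

• The key 51 still appears in the index pages even though 51* has been deleted.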
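
The insert and delete behaviour of the example can be replayed with a small sketch. It is a simplification under stated assumptions: the two index levels are flattened into one sorted list of separator keys, pages are plain Python lists, an overflow page that becomes empty is dropped, and the class and method names are invented for illustration.

```python
import bisect

CAPACITY = 2  # entries per page, as in the example


class IsamLeaf:
    def __init__(self, entries):
        self.primary = list(entries)  # primary leaf page, allocated at creation
        self.overflow = []            # chain of overflow pages, each a list

    def insert(self, key):
        if len(self.primary) < CAPACITY:
            self.primary.append(key)
            self.primary.sort()
            return
        for page in self.overflow:
            if len(page) < CAPACITY:
                page.append(key)
                return
        self.overflow.append([key])   # chain a new overflow page

    def delete(self, key):
        if key in self.primary:
            self.primary.remove(key)  # primary page stays allocated even if empty
            return
        for page in self.overflow:
            if key in page:
                page.remove(key)
                if not page:
                    self.overflow.remove(page)  # assumed: empty overflow page is dropped
                return


class Isam:
    def __init__(self, leaves):
        self.leaves = [IsamLeaf(p) for p in leaves]
        # Static index, flattened: the smallest key of every leaf except the first.
        self.separators = [p[0] for p in leaves[1:]]

    def leaf_for(self, key):
        return self.leaves[bisect.bisect_right(self.separators, key)]

    def insert(self, key):
        self.leaf_for(key).insert(key)

    def delete(self, key):
        self.leaf_for(key).delete(key)

    def search(self, key):
        leaf = self.leaf_for(key)
        return key in leaf.primary or any(key in page for page in leaf.overflow)


def snapshot(tree):
    """Copy of every leaf as (primary page, overflow chain)."""
    return [(list(l.primary), [list(p) for p in l.overflow]) for l in tree.leaves]


tree = Isam([[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]])
for k in (23, 48, 41, 42):
    tree.insert(k)
after_inserts = snapshot(tree)
for k in (42, 51, 97):
    tree.delete(k)
after_deletes = snapshot(tree)
```

The separator list is never modified after construction, which is what makes the structure static: inserts can only grow overflow chains, and deleted keys such as 51 remain in the index.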

Slide No:L7-7
B+ Trees
It is a dynamic data structure and the most widely used
index structure.
It is a height-balanced tree in which every path from
the root of the tree to any leaf is of the same length.
The data pages are organized as a doubly linked list,
so we can traverse the pages in both directions.
Each node except the root has between d and 2d
entries, where d is the order of the tree.
d is a parameter and is a measure of the capacity of a
tree node.
Insert and delete operations keep the tree balanced.
B+ trees perform better than ISAM as inserts are
handled without overflow pages.

The leaf pages are not allocated sequentially as in
ISAM, so they are linked using page pointers.
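
The occupancy rule can be written as a one-line check. The function name is hypothetical, and the lower bound of 1 entry for the root is an assumption added here; the slides only say that the root is exempt from the d to 2d rule.

```python
def occupancy_ok(num_entries, d, is_root=False):
    """True if a node's entry count satisfies the B+ tree occupancy rule."""
    low = 1 if is_root else d  # assumed: the root needs at least one entry
    return low <= num_entries <= 2 * d


d = 2  # order of the tree (assumed for this example)
leaf_sizes = [2, 3, 2, 2, 2, 4]
all_leaves_ok = all(occupancy_ok(n, d) for n in leaf_sizes)
```

With d = 2, a non-root node holding a single entry would violate the rule, while a root with a single entry is allowed.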
B+ Tree Indexes-A dynamic data structure

[Figure: non-leaf pages above leaf pages; the leaf pages are sorted by search key.]

• Leaf pages contain data entries, and are chained (prev & next)
• Non-leaf pages have index entries; only used to direct searches:

index entry: P0 K1 P1 K2 P2 ... Km Pm

Slide No:L2-2
Example B+ Tree

Root:            17
(Entries < 17 are in the left subtree; entries >= 17 are in the right subtree)

Non-leaf pages:  5 13 | 27 30
Leaf pages:      2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*

Data entries in leaf level are sorted
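
Searching the example tree can be sketched as follows. The `Node` class and function names are invented for illustration, and the routing rule (a key equal to Ki follows pointer Pi) is an assumption inferred from the example, where 5* and 27* sit in the leaves to the right of the keys 5 and 27.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None, entries=None):
        self.keys = keys          # K1..Km in a non-leaf page
        self.children = children  # P0..Pm in a non-leaf page (None for a leaf)
        self.entries = entries    # sorted data entries in a leaf page
        self.next = None          # next leaf in the chain


def find_leaf(node, key):
    """Descend from the root, using index entries only to direct the search."""
    while node.children is not None:
        # Follow Pi where Ki <= key < K(i+1); P0 if key < K1.
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node


def range_search(root, lo, hi):
    """Find the leaf for lo, then walk the leaf chain until a key exceeds hi."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.entries:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


# Build the example tree from the slide.
leaves = [Node(entries=e) for e in
          ([2, 3], [5, 7, 8], [14, 16], [22, 24], [27, 29], [33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node(keys=[17], children=[
    Node(keys=[5, 13], children=leaves[0:3]),
    Node(keys=[27, 30], children=leaves[3:6]),
])

leaf_for_24 = find_leaf(root, 24).entries
matches = range_search(root, 15, 28)
```

The range search descends the tree only once; after that it follows the leaf chain, which is why the leaf pages are linked.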

Slide No:L2-3
