DS unit-IV

External Storage Devices
An external storage device, also referred to as auxiliary storage or
secondary storage, is a device that holds all the addressable data
storage that is not inside a computer's main storage or memory.
An external storage device can be removable or non-removable,
temporary or permanent, and accessible over a wired or wireless
network.
External storage lets users store data separately from a computer's
main (primary) storage and memory at a relatively low cost. It increases
storage capacity without the need to open up the system.
The external storage devices discussed here are
1. Magnetic tapes
2. Magnetic disks
1. Magnetic tapes
o A magnetic tape is made of plastic material coated with
ferrite.
o The length of a tape is expressed as the number of
characters or the number of tracks on the tape.
o Information is stored on the tape bit by bit.
o Information is read from or written to the tape using a tape
drive.
o The gap between records on the tape is called the record gap. It
varies from ½ to ¾ inch.
o Records are grouped into blocks. The gap between blocks
is called the inter-block gap.
o The number of records in a block is called the blocking factor.
Advantages
Magnetic tape is the cheaper of the two media.
Disadvantages
On magnetic tape, records can be processed only in sequential
order.
2. Magnetic Disk
A magnetic disk unit has two components
(i) Disk module
(ii) Disk drive
(i) Disk Module
• The disk module (disk pack) stores the information.
• A disk pack consists of several platters. Each platter has
two surfaces, each surface has tracks, and each track is
divided into sectors.
• A sector is the smallest addressable segment.
(ii) Disk Drive
• The disk drive reads information from and writes
information onto the disk.
• Three factors contribute to the input/output time
for disks
i. Seek time
ii. Latency time
iii. Transmission time
(i) Seek Time
• The time taken to position the read/write head over the correct cylinder.
• The maximum seek time is 1/10 second.
(ii) Latency Time
• The time until the right sector of the track is under the read/write head.
• The latency time is at most 1/40 second.
(iii) Transmission Time
• The time to transmit the block of data to or from the disk.
• Transmission rates are typically between 10^5 and 5×10^5 characters
per second.
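The three components add up to the time for one block transfer. The following is a minimal Python sketch of that sum, using the worst-case seek and latency figures from these notes and the slower transmission rate of 10^5 characters per second. The function name worst_case_io_time and the 250-character block in the last line are illustrative assumptions.

```python
MAX_SEEK = 1 / 10        # seconds, maximum seek time quoted in the notes
MAX_LATENCY = 1 / 40     # seconds, maximum latency time quoted in the notes
MIN_RATE = 10 ** 5       # characters per second, slowest rate quoted

def worst_case_io_time(block_chars, rate=MIN_RATE):
    """Upper bound on the time to read or write one block of block_chars characters."""
    transmission = block_chars / rate
    return MAX_SEEK + MAX_LATENCY + transmission

# Hypothetical block of 250 characters
print(worst_case_io_time(250))
```

For small blocks the seek and latency terms dominate, which is why records are grouped into larger blocks before being transferred.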
Sorting with Disks
• The most popular method for sorting on an external storage device is
merge sort.
• This method consists of two distinct phases.
i. Segments of the input file are sorted using a good internal
sort method. These sorted segments are called runs.
ii. The runs generated in the first phase are merged together.
Example
A file containing 4500 records can be sorted on disk in the
following way.
[Figure: blocked runs obtained after internal sorting]
[Figure: merging the 6 runs]
• In this example the 4500 records are divided into 6 runs, and
each run is divided into 3 blocks.
• Each run has 750 records. Each block has 250 records.
• Each of the 6 runs is sorted individually using an internal sorting method.
• Finally, the sorted runs are merged into a single sorted file.
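The two phases can be sketched in Python as follows. This is only an illustration under stated assumptions: in-memory lists stand in for disk files, the record values are random integers made up for the example, and the names make_runs, merge_two and merge_runs are hypothetical.

```python
import random

def make_runs(records, run_size):
    """Phase 1: cut the file into segments and sort each one internally."""
    return [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]

def merge_two(a, b):
    """Merge two sorted runs into one sorted run."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def merge_runs(runs):
    """Phase 2: merge the runs pairwise until a single run remains."""
    while len(runs) > 1:
        runs = [merge_two(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []

records = random.sample(range(100_000), 4500)   # made-up input file
runs = make_runs(records, 750)                   # 6 runs of 750 records
result = merge_runs(runs)
```

With 4500 records and a run size of 750, phase 1 yields the 6 runs of the example, and phase 2 needs three rounds of pairwise merging.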
K-Way Merging
• A 2-way merging algorithm on disk or tape needs about log_2 M passes over
the data, where M is the number of runs.
• The number of passes can be reduced using k-way merging.
• In this method k runs are merged simultaneously.
• A k-way merge needs only about log_k M passes.
Example
• In the example, 16 runs are first sorted individually.
• These runs are then merged, 4 at a time, into 4 runs.
• Finally the 4 runs are merged simultaneously into one sorted run.
• This is called 4-way merging. Here k = 4.
• It requires log_k M = log_4 16 = 2 passes, where M is the number of runs.
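The effect of k on the number of passes can be checked with a short Python sketch. It uses heapq.merge from the standard library to merge k runs at a time. The function name k_way_merge_sort and the 16 made-up runs are assumptions for illustration, and k is assumed to be at least 2.

```python
from heapq import merge

def k_way_merge_sort(runs, k):
    """Repeatedly merge k runs at a time; return (sorted run, number of passes).

    Assumes k >= 2 and that every run is already sorted.
    """
    passes = 0
    while len(runs) > 1:
        runs = [list(merge(*runs[i:i + k])) for i in range(0, len(runs), k)]
        passes += 1
    return (runs[0] if runs else []), passes

runs = [[i, i + 16, i + 32] for i in range(16)]   # 16 sorted runs (made-up data)
merged_4, passes_4 = k_way_merge_sort(runs, 4)    # 4-way merging
merged_2, passes_2 = k_way_merge_sort(runs, 2)    # 2-way merging
```

For 16 runs the 4-way version finishes in 2 passes, while the 2-way version needs 4.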
Selection Tree
• In k-way merging, each step must find the smallest of the k leading
records, one from each run.
• This smallest record is found using a selection tree.
• A selection tree is a binary tree in which each node
represents the smaller of its two children.
Tournament Tree
The tournament tree [3] is based on an elimination tournament, as
found in sports. In each game, two of the input elements compete.
The winner is promoted to the next round, so we get a binary
tree of games. The list is sorted in ascending order, so the winner of a
game is the smaller of the two elements.
Loser Tree
For k-way merging, it is more efficient to store only the loser of each
game at each node. The data structure is therefore called a loser tree.
When building the tree, or when replacing an element with the next one from
its list, we still promote the winner of the game towards the top.
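A k-way merge driven by a selection tree can be sketched in Python as below. Note that this is the simpler winner (tournament) tree, not the loser tree variant: every internal node stores the run whose leading record won the game. The array layout, the name selection_tree_merge and the use of infinity for an exhausted run are assumptions of this sketch, and the records are assumed to be numbers.

```python
import math

def selection_tree_merge(runs):
    """Merge sorted runs of numbers using a winner tree stored in an array."""
    k = len(runs)
    if k == 0:
        return []
    pos = [0] * k                      # next unread position in each run

    def key(r):
        # Leading record of run r; an exhausted run counts as +infinity.
        return runs[r][pos[r]] if pos[r] < len(runs[r]) else math.inf

    def play(node):
        # Store at this node the run that wins the game between its children.
        left, right = tree[2 * node], tree[2 * node + 1]
        tree[node] = left if key(left) <= key(right) else right

    # tree[k .. 2k-1] are the leaves (run numbers); tree[1 .. k-1] hold winners.
    tree = [0] * (2 * k)
    for r in range(k):
        tree[k + r] = r
    for node in range(k - 1, 0, -1):
        play(node)

    out = []
    for _ in range(sum(len(r) for r in runs)):
        winner = tree[1]               # run holding the smallest leading record
        out.append(runs[winner][pos[winner]])
        pos[winner] += 1
        node = (k + winner) // 2       # replay only the games on the path to the root
        while node >= 1:
            play(node)
            node //= 2
    return out

print(selection_tree_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
```

After each output only the games on one leaf-to-root path are replayed, so selecting the next record costs about log_2 k comparisons instead of k - 1.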
Sorting with Tapes
Sorting on tapes follows the same steps as sorting on disks, but the two differ in the following ways
• Tape sorting uses tapes as the external medium for storing the runs.
• On a disk, seek and latency times are about the same for every
location, but on a tape the access time differs from one run to another,
because a tape can be read only in sequential order.
Example
Sort 4500 records using 4 tapes and 6 runs.
Initial distribution of the runs
T1: Run1 Run3 Run5
T2: Run2 Run4 Run6
After the first merge pass
T3: Run1 Run3
T4: Run2
Here T3.Run1 = T1.Run1 + T2.Run2
After the second merge pass
T1: Run1
T2: Run1
Finally, the sorted records are stored on Tape 2.
• This type of sorting is known as balanced merge sort.
• A k-way balanced merge needs 2k tapes for sorting.
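The movement of runs between the tapes can be simulated with a short Python sketch. It is an illustration only: each tape is modelled as a list of runs, the k input tapes and k output tapes swap roles after every pass, and the name balanced_merge and the random records are assumptions.

```python
import random
from heapq import merge

def balanced_merge(runs, k=2):
    """Simulate a k-way balanced merge on 2k tapes; return (sorted run, passes)."""
    # Distribute the initial runs round-robin over the k input tapes.
    inputs = [runs[i::k] for i in range(k)]
    passes = 0
    while sum(len(tape) for tape in inputs) > 1:
        outputs = [[] for _ in range(k)]
        rounds = max(len(tape) for tape in inputs)
        for r in range(rounds):
            # Merge the r-th run of every input tape onto the next output tape.
            group = [tape[r] for tape in inputs if r < len(tape)]
            outputs[r % k].append(list(merge(*group)))
        inputs = outputs               # the tapes swap roles for the next pass
        passes += 1
    remaining = [run for tape in inputs for run in tape]
    return (remaining[0] if remaining else []), passes

records = random.sample(range(100_000), 4500)   # made-up records
runs = [sorted(records[i:i + 750]) for i in range(0, 4500, 750)]
result, passes = balanced_merge(runs, k=2)
```

With 6 runs and k = 2 the simulation goes through the same tape contents as the example: 3 runs on each of T1 and T2, then 2 runs and 1 run on T3 and T4, then 1 run on each of T1 and T2, and a final merge.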
Balanced Merge Sort
• Balanced merge sort can be carried out using algorithms M1, M2 and M3.
• Here k is the order of the merge and m is the number of runs.
M1 Algorithm
The total number of passes needed by algorithm M1 is 2 log_k m.
M2 Algorithm
The total number of passes needed by algorithm M2 is (3/2) log_k m + 1/2.
M3 Algorithm
The total number of passes needed by algorithm M3 is log_k m.
• Algorithm M3 therefore needs the fewest passes.

Polyphase Merge
• In balanced merge sort, 2k tapes are needed to avoid wasteful
passes during sorting.
• Polyphase merge uses fewer than 2k tapes for sorting.
• Example: merging 21 runs on 3 tapes. An entry n^r denotes r runs, each of length n.
Phase   T1     T2     T3     Fraction of total records read
1       1^13   1^8    --     1
2       1^5    --     2^8    16/21
3       --     3^5    2^3    15/21
4       5^3    3^2    --     15/21
5       5^1    --     8^2    16/21
6       --     13^1   8^1    13/21
7       21^1   --     --     1
• Counting the number of passes
1 + 16/21 + 15/21 + 15/21 + 16/21 + 13/21 + 1 = 5 4/7 ≈ 5.57 passes
• Polyphase merge therefore takes about 5.57 passes to merge 21 runs.
• Balanced merge sort needs 8 passes to merge 21 runs.
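The phase-by-phase fractions can be recomputed with a small simulation. The sketch below assumes 3 tapes, initial runs of length 1, and a starting distribution for which exactly one tape is empty at every phase (as with 13 and 8 runs). The name polyphase_passes is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def polyphase_passes(a_runs, b_runs):
    """Simulate a 3-tape polyphase merge starting with a_runs and b_runs
    runs of length 1; return (total passes, fraction read in each phase)."""
    total = a_runs + b_runs
    # Each tape is [number of runs, length of each run]; the third starts empty.
    tapes = [[a_runs, 1], [b_runs, 1], [0, 0]]
    fractions_read = [Fraction(1)]     # phase 1: the initial distribution
    while sum(n for n, _ in tapes) > 1:
        out = next(i for i in range(3) if tapes[i][0] == 0)
        x, y = [i for i in range(3) if i != out]
        merged = min(tapes[x][0], tapes[y][0])
        if merged == 0:
            raise ValueError("run counts are not a valid polyphase distribution")
        new_len = tapes[x][1] + tapes[y][1]
        tapes[x][0] -= merged
        tapes[y][0] -= merged
        tapes[out] = [merged, new_len]
        fractions_read.append(Fraction(merged * new_len, total))
    return sum(fractions_read), fractions_read

total_passes, per_phase = polyphase_passes(13, 8)
```

For 13 runs on T1 and 8 runs on T2 the simulation goes through seven phases and the fractions add up to 39/7, that is 5 4/7 passes.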
Sorting with Fewer than 3 Tapes
• Both balanced merge sort and polyphase merge
need at least 3 tapes. They sort in O(n log n) time,
where n is the number of records.
• Any algorithm that uses only 1 tape to sort n records takes
time proportional to at least n^2.
• An algorithm that uses 2 tapes can sort n records in
O(n log n) time, but only if a record can be rewritten
in place without destroying the adjacent records.
Symbol Tables
A symbol table is a set of name-value pairs. The following operations can be
performed on a symbol table.
i. Ask whether a particular name is already present
ii. Retrieve the attributes of that name
iii. Insert a new name and its value
iv. Delete a name and its value
Symbol tables can be implemented in the following ways
i. Static tree tables
ii. Dynamic tree tables
iii. Hash tables
(i) Static tree tables
• In a static tree table the identifiers are known in advance, and no
deletions or insertions are allowed.
• Static tree tables are discussed under the following topics
->Binary search tree
->Extended binary tree
->Huffman code
->Optimal binary search tree
(a) Binary Search Tree
A binary search tree is a binary tree. It is either empty or it satisfies
the following criteria
(i) All identifiers in the left subtree are less than the identifier in the
root node.
(ii) All identifiers in the right subtree are greater than the identifier in the
root node.
(iii) The left and right subtrees are also binary search trees.
Example
      if
     /  \
  for    while
         /
     repeat
      /
   loop
Example-2
      10
     /  \
    5    20
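Algorithm
Procedure search(T,X,i)
// Search the binary search tree T for X. On return, i = 0 if X is
// not in T; otherwise ident(i) = X.
  i=T
  while i<>0 do
    case
      :X<ident(i): i=LCHILD(i)
      :X=ident(i): return
      :X>ident(i): i=RCHILD(i)
    End
  End
End search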
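The search procedure translates almost line for line into Python. In this sketch the Node class and its field names are assumptions, None plays the role of the null link 0, and the sample tree holds the identifiers if, for, while, repeat and loop of the first example.

```python
class Node:
    def __init__(self, ident, left=None, right=None):
        self.ident, self.left, self.right = ident, left, right

def search(t, x):
    """Return the node whose identifier equals x, or None if x is not in the tree."""
    i = t
    while i is not None:
        if x < i.ident:
            i = i.left       # x can only be in the left subtree
        elif x > i.ident:
            i = i.right      # x can only be in the right subtree
        else:
            return i         # found
    return None

# Identifiers compared as strings: for < if < loop < repeat < while
tree = Node("if", Node("for"), Node("while", Node("repeat", Node("loop"))))
found = search(tree, "loop")
missing = search(tree, "do")
```

Each comparison discards one subtree, so the number of comparisons is at most the number of levels in the tree.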
(b) Extended Binary Trees
• In evaluating binary search trees it is useful to add a square node at
every place where there is a null link.
• A binary tree with n nodes has n+1 null links, so n+1 square nodes are added.
• The square nodes are called external nodes, also referred to as failure nodes. The original nodes are called internal nodes.
• A binary tree with external nodes added is called an extended binary tree.
• The external path length of a binary tree is the sum, over all external
nodes, of the lengths of the paths from the root to those nodes.
• The internal path length of a binary tree is the sum, over all internal
nodes, of the lengths of the paths from the root to those nodes.
Example
Internal path length I=0+1+1+2+3=7
External path length E=2+2+2+3+4+4=17
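Both path lengths can be computed in one traversal. The Python sketch below represents an internal node as a (left, right) tuple and an external node as None; this representation and the particular tree shape are assumptions, chosen so that the internal nodes lie at depths 0, 1, 1, 2, 3 as in the example. It also checks the standard identity E = I + 2n for a tree with n internal nodes.

```python
def path_lengths(tree, depth=0):
    """Return (internal path length, external path length, internal node count)."""
    if tree is None:                   # external (failure) node
        return 0, depth, 0
    left, right = tree
    li, le, ln = path_lengths(left, depth + 1)
    ri, re_, rn = path_lengths(right, depth + 1)
    return depth + li + ri, le + re_, 1 + ln + rn

leaf = (None, None)                    # internal node with two external children
# Internal nodes at depths 0, 1, 1, 2, 3 (one possible shape for the example)
tree = (leaf, (None, (None, leaf)))
I, E, n = path_lengths(tree)
```

For this tree I = 7, E = 17 and n = 5, which agrees with E = I + 2n.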
(c) Huffman Code
• A binary tree with minimum weighted external path length is found
using the Huffman algorithm.
• In this tree the weight of each internal node is the sum of the weights of its left and right
children.
Example (weights 2, 3 and 5; the nodes 2 and 3 are combined first)
      10
     /  \
    5    5
        / \
       2   3
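Algorithm
Procedure Huffman(L,n)
// L is a list of n single-node trees. Least(L) removes and returns
// the tree with the smallest weight in L.
  For i=1 to n-1 do
    Call getnode(T)
    Lchild(T)=Least(L)
    Rchild(T)=Least(L)
    Weight(T)=Weight(Lchild(T))+Weight(Rchild(T))
    Call insert(L,T)
  End
End Huffman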
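A runnable version of the procedure can be sketched in Python with a heap playing the role of the list L. This is one possible rendering, not the only one: trees are nested tuples, a leaf is just its weight, the weights are assumed to be a non-empty list, and the name huffman and the tie-breaking counter are assumptions of the sketch.

```python
import heapq
from itertools import count

def huffman(weights):
    """Build a Huffman tree from a non-empty list of weights.

    Returns (tree, weighted external path length).
    """
    tie = count()                      # tie-breaker so the heap never compares trees
    heap = [(w, next(tie), w) for w in weights]
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # Least(L)
        w2, _, right = heapq.heappop(heap)    # Least(L)
        cost += w1 + w2                # every merge adds the weight of the new node
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    return heap[0][2], cost

tree, cost = huffman([2, 3, 5])
```

For the weights 2, 3 and 5 the internal nodes get weights 5 and 10, and the weighted external path length is 5 + 10 = 15 (equivalently 2×2 + 3×2 + 5×1).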
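(d) Optimal Binary Search Tree
• An optimal binary search tree is a binary search tree in which the
nodes are arranged on levels such that the cost of the tree is minimum.
• Example: two binary search trees for the identifiers do, if and stop
Tree1
      stop
      /
     if
    /
  do
The maximum cost of tree1 is 2 (the longest path from the root has length 2).
Tree2
     if
    /  \
  do    stop
The cost of tree2 is 1.
So tree2 is the optimal binary search tree.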
Dynamic Tree Tables
• In dynamic tree tables, insertions and deletions are allowed.
• Dynamic tables can also be maintained as binary search trees.
• To insert an identifier, compare it with the root node. If it is less than the root, it is
inserted into the left subtree; if it is greater than the root, it is
inserted into the right subtree. The comparison is repeated until an empty position is found.
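Insertion into a binary search tree can be sketched in Python as follows. The Node class, the function names and the sample keys are assumptions made for illustration; an identifier that is already present is simply left in place.

```python
class Node:
    def __init__(self, ident):
        self.ident, self.left, self.right = ident, None, None

def insert(root, x):
    """Insert x into the binary search tree and return the (possibly new) root."""
    if root is None:
        return Node(x)
    node = root
    while True:
        if x < node.ident:
            if node.left is None:
                node.left = Node(x)    # empty position found on the left
                break
            node = node.left
        elif x > node.ident:
            if node.right is None:
                node.right = Node(x)   # empty position found on the right
                break
            node = node.right
        else:
            break                      # x is already present; nothing to insert
    return root

def inorder(t):
    """List the identifiers in ascending order."""
    return [] if t is None else inorder(t.left) + [t.ident] + inorder(t.right)

root = None
for x in [10, 5, 20, 15]:              # made-up keys
    root = insert(root, x)
```

An inorder traversal of the result lists the identifiers in sorted order, which is a quick way to check that the binary search tree property still holds after the insertions.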
Height Balanced Binary Trees
• Adelson-Velskii and Landis introduced the height
balanced tree in 1962. It is also called an AVL tree.
• An empty tree is height balanced.
• A non-empty tree T with left and right
subtrees TL and TR is height balanced if it satisfies the following conditions
1. TL and TR are height balanced
2. |HL-HR|<=1, where HL and HR are the heights of TL
and TR respectively.
Example
[Figure: height balanced tree with root A and subtrees rooted at B and C]
In the figure HL=2 and HR=1, so
|HL-HR|=1 and the tree is height balanced.
Rotations
The following rotations are performed when an insertion leaves the tree unbalanced. A is the nearest ancestor of the new node Y that becomes unbalanced.
LL: The new node Y is inserted in the left subtree of the left subtree of A.
LR: Y is inserted in the right subtree of the left subtree of A.
RR: Y is inserted in the right subtree of the right subtree of A.
RL: Y is inserted in the left subtree of the right subtree of A.
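The balance condition and the simplest of the four rotations can be sketched in Python. This is a minimal illustration, not a complete AVL implementation: heights are recomputed recursively instead of being stored, only the LL case is shown, and the Node class, function names and keys are assumptions.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(t):
    """Height of a tree; an empty tree has height 0."""
    return 0 if t is None else 1 + max(height(t.left), height(t.right))

def is_height_balanced(t):
    """True if |HL - HR| <= 1 at every node of the tree."""
    if t is None:
        return True
    return (abs(height(t.left) - height(t.right)) <= 1
            and is_height_balanced(t.left)
            and is_height_balanced(t.right))

def rotate_ll(a):
    """Repair an LL imbalance at A: A's left child B becomes the new subtree root."""
    b = a.left
    a.left, b.right = b.right, a
    return b

# Y = 10 was inserted in the left subtree of the left subtree of A = 30
a = Node(30, Node(20, Node(10)))
before = is_height_balanced(a)
b = rotate_ll(a)
after = is_height_balanced(b)
```

Before the rotation HL = 2 and HR = 0 at A, so the tree is not balanced. After it, B = 20 is the root with 10 and 30 as children. The RR case is the mirror image, and LR and RL each combine two such rotations.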
Hash Table
• An arithmetic function f(x) is applied to the key value, and the
result is used as the index of the record in a table. This table is called a hash table.
• The resulting address is called the hash address. Each address location
is called a bucket.
Hash Functions
The following functions are used as hash functions
1. Mid-Square
2. Division
3. Folding
4. Digit Analysis
1. Mid-Square
In the mid-square method the identifier is squared and
the middle of the square is taken as the bucket address.
Ex:
Rno=134
X=134
X^2=17956
The middle digit is 9. It is used as the index.
2. Division
The identifier X is divided by some number m and the
remainder is used as the hash address.
Ex:
Rno=134
X mod 6 = 134 mod 6 = 2
The remainder 2 is used as the hash address.
3. Folding
The identifier X is partitioned into several parts of the
same length, and the parts are added to obtain the hash
address.
Ex:
X=123/203/241
=123+203+241
=567
567 is used as the hash address.
4. Digit Analysis
In digit analysis the identifier X is interpreted as a number using some
radix r, and its digits are used to form the hash address.
In the example below the digits of each part are reversed before the parts are added.
Ex:
X=123/203/241
123 -> 321
203 -> 302
241 -> 142
-----
765
-----
765 is used as the hash address.
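The worked examples can be reproduced with a few lines of Python. The function names are hypothetical, identifiers are assumed to be non-negative integers, and two choices are assumptions of this sketch rather than part of the notes: mid_square takes the single middle digit of the square, and reversed_folding is simply the name given here to the digit-reversing calculation of the last example.

```python
def mid_square(x):
    """Middle digit of x squared (for an even number of digits, the one right of centre)."""
    s = str(x * x)
    return int(s[len(s) // 2])

def division(x, m):
    """Remainder of x divided by m."""
    return x % m

def folding(x, part_len=3):
    """Split the digits of x into parts of part_len digits and add the parts."""
    s = str(x)
    return sum(int(s[i:i + part_len]) for i in range(0, len(s), part_len))

def reversed_folding(x, part_len=3):
    """Like folding, but the digits of each part are reversed before adding."""
    s = str(x)
    return sum(int(s[i:i + part_len][::-1]) for i in range(0, len(s), part_len))

print(mid_square(134), division(134, 6), folding(123203241), reversed_folding(123203241))
```

In practice the result is usually reduced further, for example modulo the number of buckets, so that it falls inside the table.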
Overflow Handling
• If two records have the same hash address, it is called an overflow or
collision.
Overflow Handling Techniques
The following are the overflow handling techniques
1. Open Addressing
A. Linear Probing
B. Quadratic Probing
C. Random Probing
2. Chaining
3. Rehashing
1. Open Addressing
A. Linear Probing
• When a new identifier is hashed into a full bucket, it is necessary
to find another bucket for this identifier.
• Linear probing finds the closest unfilled bucket and puts the identifier in that
bucket.
Example
If R1 is placed in bucket A1 and R2 gets the same hash
address A1, then the closest unfilled bucket A2 is found and R2 is placed in
bucket A2.
B. Quadratic Probing
• Linear probing searches the bucket addresses (f(x)+i) mod b for i=0,1,2,..., where b is the number of buckets.
• In quadratic probing, if the address f(x) is already full, the identifier is placed in
the first free bucket among the addresses (f(x)+i^2) mod b for i=1,2,....
C. Random Probing
In random probing, if the hash address f(x) is already full,
a new hash address is calculated by adding a random number
to f(x).
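Linear and quadratic probing differ only in the offset added to f(x), so one Python sketch can cover both. The assumptions here are that identifiers are integers, the hash function is f(x) = x mod b, an empty bucket is marked with None, and the name probe_insert and the sample keys are hypothetical.

```python
def probe_insert(table, key, step=lambda i: i):
    """Insert key into an open-addressing table; return the bucket that was used.

    step(i) is the offset of the i-th probe: i for linear probing,
    i * i for quadratic probing.
    """
    b = len(table)
    home = key % b                     # assumed hash function f(x) = x mod b
    for i in range(b):
        slot = (home + step(i)) % b
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free bucket found on the probe sequence")

linear = [None] * 7
linear_slots = [probe_insert(linear, k) for k in (10, 17, 24)]

quadratic = [None] * 7
quadratic_slots = [probe_insert(quadratic, k, step=lambda i: i * i) for k in (10, 17, 24)]
```

All three keys hash to bucket 3. Linear probing places them in buckets 3, 4 and 5, while quadratic probing places them in buckets 3, 4 and 0, which spreads colliding identifiers further apart.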
2. Chaining
If the hash address f(x) is full, no search is made for a new
address. Instead, all the elements with the same hash address are stored in a linked
list attached to that address.
Example
1 -> R1 -> R4
2 -> R2 -> R6
3 -> R3
4 -> R5
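Algorithm
Procedure Chsearch(X,HT,b,J)
// Search the chain at HT(f(X)) for X. On return, J points to the
// node containing X, or J=0 if X is not in the table.
  J=HT(f(X))
  While (J<>0 and ident(J)<>X) do
    J=Link(J)
  End
End Chsearch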
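The chained search maps directly onto Python. In this sketch None stands for the null link 0, the hash function is assumed to be f(x) = x mod b, new nodes are pushed onto the front of their chain, and the names Node, ch_insert and ch_search are hypothetical.

```python
class Node:
    def __init__(self, ident, link=None):
        self.ident, self.link = ident, link

def ch_insert(ht, x):
    """Add x to the front of the chain for its hash address."""
    i = x % len(ht)                    # assumed hash function f(x) = x mod b
    ht[i] = Node(x, ht[i])

def ch_search(ht, x):
    """Return the node containing x, or None if x is not in the table."""
    j = ht[x % len(ht)]
    while j is not None and j.ident != x:
        j = j.link                     # follow the chain
    return j

ht = [None] * 4                        # b = 4 buckets
for x in (1, 5, 2, 6, 3, 4):           # made-up identifiers
    ch_insert(ht, x)
hit = ch_search(ht, 5)
miss = ch_search(ht, 9)
```

Only the identifiers that share a hash address are examined during a search, so the cost depends on the length of one chain rather than on how full the table is.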
3. Rehashing
The hash address is calculated using some hash function f(x). If the bucket at f(x)
is full, the hash function is changed, a new table is built,
and the identifiers are stored in that table. This method is called
rehashing.
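Rehashing as described here can be sketched in Python by rebuilding the table with a different division hash. The assumptions are that the old function is f(x) = x mod 5, the new one is f(x) = x mod 11, identifiers are integers, None marks an empty bucket, and collisions inside the new table are resolved by linear probing. The name rehash and the sample keys are hypothetical.

```python
def rehash(table, new_size):
    """Build a new table of new_size buckets using f(x) = x mod new_size."""
    keys = [k for k in table if k is not None]
    if new_size < len(keys):
        raise ValueError("new table is too small for the stored identifiers")
    new_table = [None] * new_size
    for key in keys:
        slot = key % new_size
        while new_table[slot] is not None:    # linear probing inside the new table
            slot = (slot + 1) % new_size
        new_table[slot] = key
    return new_table

# 10, 15 and 20 all hash to bucket 0 under f(x) = x mod 5
old = [10, 15, 20, None, None]
new = rehash(old, 11)
```

Under the new function the three identifiers no longer collide: 10, 15 and 20 land in buckets 10, 4 and 9.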
