
FS - IAT 3 @Sachin

MODULE 4

1. Discuss about blocks and its features for maintaining a sequence set with neat
sketch.
The Use of Blocks
• Adding or deleting a record should not affect the entire file, so the
changes must be localized.
• Blocks make this possible: an insertion or deletion changes only the block
involved, not the whole file.
• Records are stored in the file as blocks, and the block is the basic unit of
input and output.
• The buffer size is set equal to the block size, so that a whole block can be
brought into main memory and accessed easily.
• The link fields in each block point to the preceding block and the following
block.
• As with B-tree, the insertion of new records into block can cause the block to
overflow. This overflow condition can be handled by a block-splitting process.
• In a B-tree a split results in the promotion of a key. Here things are simpler:
▪ divide the records between two blocks and rearrange the links so we can
still move through the file in order by key, block after block.
• Underflow in a B+ tree can lead to either of two solutions:
▪ If a neighboring node is also half full, we can merge the two nodes,
freeing one up for reuse.
▪ If the neighboring nodes are more than half full, we can redistribute
records between the nodes to make the distribution more nearly even
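
The splitting step described above can be sketched in Python. This is a minimal illustration, not a full sequence-set implementation: the capacity of four records per block, the `Block` class, and the helper names `insert` and `keys_in_order` are all assumptions made for the example, and each block holds only keys rather than full records.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCK_CAPACITY = 4  # assumed maximum number of records per block


@dataclass
class Block:
    keys: List[str] = field(default_factory=list)
    prev: Optional["Block"] = None  # link to the preceding block
    next: Optional["Block"] = None  # link to the following block


def insert(block: Block, key: str) -> Optional[Block]:
    """Insert key in order; split the block if it overflows.

    Returns the new block created by a split, or None if no split occurred.
    """
    block.keys.append(key)
    block.keys.sort()
    if len(block.keys) <= BLOCK_CAPACITY:
        return None
    # Overflow: divide the records between the old block and a new one.
    mid = len(block.keys) // 2
    new_block = Block(keys=block.keys[mid:], prev=block, next=block.next)
    block.keys = block.keys[:mid]
    # Rearrange the links so the file can still be read in key order.
    if new_block.next is not None:
        new_block.next.prev = new_block
    block.next = new_block
    return new_block


def keys_in_order(first: Block) -> List[str]:
    """Walk the forward links and collect every key, block after block."""
    out, current = [], first
    while current is not None:
        out.extend(current.keys)
        current = current.next
    return out
```

Note that no key is promoted anywhere: the split touches only the overflowing block, the new block, and the links around them.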

• Example :

Initial Block Sequence Set


Sequence set after insertion of the CARTER record: block 2 splits, and the
contents are divided between blocks 2 and 4

Sequence set after deletion of the DAVIS record: block 4 is less than half full,
so it is merged with block 3
2. What is Separator? Demonstrate simple Prefix B+ Tree index with complete picture.
3. What is Prefix B+ tree? How it is different from B+ tree? Explain with example.

• The difference between a simple prefix B+ tree and a plain B+ tree is that the
latter does not use prefixes as separators. Instead, the separators in its
index set are simply copies of the actual keys.
• The operations performed on B+ trees are essentially the same as those for
simple prefix B+ trees. Both consist of a set of records arranged in key order
in a sequence set, coupled with an index set that provides rapid access to the
block containing any particular key/record combination.
• The only difference is that in a simple prefix B+ tree we build the index set
from the shortest separators, formed from key prefixes.
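
One way to find a shortest separator between two adjacent blocks is sketched below. The function name `shortest_separator` and the sample keys are hypothetical. The sketch assumes the usual rule that keys smaller than the separator lie in the left block and keys greater than or equal to it lie in the right block.

```python
def shortest_separator(last_key_left: str, first_key_right: str) -> str:
    """Return the shortest prefix of first_key_right that is greater than
    last_key_left, so that last_key_left < separator <= first_key_right."""
    if not last_key_left < first_key_right:
        raise ValueError("keys must be in strictly ascending order")
    for length in range(1, len(first_key_right) + 1):
        prefix = first_key_right[:length]
        if prefix > last_key_left:
            return prefix
    # Not reached: the full key is itself greater than last_key_left.
    return first_key_right
```

For instance, between a block ending in BOLEN and a block starting with CAMP the single letter C is enough, whereas a plain B+ tree would store a full copy of a key.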
Example :

Simple Prefix B+ Tree :

B+ Tree
4. Discuss in details with neat diagram about maintenance of prefix B+ tree block
overflow while deleting or inserting.

i. Changes are Localized to Single Blocks in the Sequence Set

• Some deletions require no merging or redistribution within the sequence set.
• Suppose we want to delete the records for EMBRY and FOLKS, and that neither
deletion results in any merging or redistribution within the sequence set. Since
there is no merging or redistribution, the effect of these deletions on the
sequence set is limited to changes within blocks 4 and 6.

• Inserting new records into the sequence set without causing a block split has
the same effect as deletions that do not result in merging: the index set
remains unchanged.

ii. Changes Involving Multiple Blocks in the Sequence Set


What happens if the insertion or deletion of records changes the number of blocks in
the sequence set?
Record insertion and deletion always take place in the sequence set, since that is
where the records are. If splitting, merging, or redistribution is necessary, perform
the operation just as you would if there were no index set at all; then, after the
record operations in the sequence set are complete, make changes as necessary in the
index set.
Insertion
Deletion
5. What is sequence set? How it is used to implement prefix B+ tree? Explain.

6. Distinguish B-Trees, B+ Trees, and Simple Prefix B+ Trees with suitable example and
sketch.
B Trees
• The sequence set can be processed in a truly linear, sequential way, providing
efficient access to records in order by key and value
• The index is built with a single key or separator per block of data records
instead of one key per data record

B+ Trees
• The major difference between B-tree and B+ tree is that in the B+ tree all the
key and record information is contained in a linked set of blocks known as the
sequence set.
• The key and record information is not in the upper-level, treelike portion of the
B+ tree.
• Indexed access to this sequence set is provided through a conceptually
separate structure called the index set.
• In a B+ tree the index set consists of copies of the keys that represent the
boundaries between sequence set blocks.
• These copies of keys are called separators because they separate a sequence
set block from its predecessor.
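
A small Python sketch of how separators guide a search to a sequence set block. It assumes a single-level index set held as a sorted list, with block i lying between separator i-1 and separator i; the function name `find_block` and the sample separators are made up for illustration.

```python
from bisect import bisect_right
from typing import List


def find_block(separators: List[str], key: str) -> int:
    """Return the index of the sequence set block that may hold key.

    Keys smaller than separators[0] belong to block 0; keys greater than
    or equal to separators[i] belong to block i + 1 or later.
    """
    return bisect_right(separators, key)


# Three separators divide the sequence set into four blocks (0..3).
separators = ["BO", "CAM", "E"]
```

The same lookup works whether the separators are full key copies (B+ tree) or shortened prefixes (simple prefix B+ tree); only the stored strings differ.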
Simple Prefix B+ Trees
• The simple prefix B+ tree builds on this advantage by making the separators in
the index set smaller than the keys in the sequence set, rather than just using
copies of these keys.

7. What is index set? Describe the block size for index set with example.
8. With neat diagram explain internal structure of Index set block to maintain variable
order prefix B+ tree.
MODULE 5

9. What is hashing? Brief about different hashing methods.


A hash function is like a black box that produces an address every time a key is
dropped into it. The process of producing an address from a key in this way is
called hashing. Formally, a hash function h(K) transforms a key K into an address.
The resulting address is used to store and retrieve the record.

Hashing serves the goal of file structures, which is to retrieve any record in a
single seek: the key is converted into an address, and direct access at that
address fetches the record.
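
A minimal Python sketch of the idea of h(K). The particular function used here (sum of character codes modulo the number of addresses) is only an assumption chosen for brevity, and the helpers `store` and `fetch` are hypothetical; collisions are deliberately ignored.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[str, str]]  # (key, record) or None when empty


def h(key: str, n_addresses: int) -> int:
    """Transform a key into an address in the range 0..n_addresses-1."""
    return sum(ord(ch) for ch in key) % n_addresses


def store(table: List[Slot], key: str, record: str) -> int:
    """Place the record at its home address (no collision handling)."""
    address = h(key, len(table))
    table[address] = (key, record)
    return address


def fetch(table: List[Slot], key: str) -> Optional[str]:
    """Go directly to the home address; no searching is involved."""
    entry = table[h(key, len(table))]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None
```

Both `store` and `fetch` compute the address from the key alone, which is what allows a record to be reached in one access.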
Different hashing methods
10.Predicting the Distribution of Records
For a given address, what is the probability that
• None of the keys will hash to the address?
• Exactly one key will hash to the address?
• Exactly two keys will hash to the address?
• Exactly three, four, and so on keys will hash to the address?
• All keys in the file will hash to the same given address?
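
These questions are commonly answered with the Poisson distribution, p(x) = (r/N)^x * e^(-r/N) / x!, where r is the number of records, N the number of available addresses, and x the number of keys hashing to one address. The notes do not show the formula, so treat the sketch below as an assumption about the intended method; it also presumes a hash function that spreads keys randomly and a reasonably large N.

```python
from math import exp, factorial


def poisson(x: int, r: int, n: int) -> float:
    """Approximate probability that exactly x of r keys hash to one
    given address out of n addresses."""
    ratio = r / n
    return ratio ** x * exp(-ratio) / factorial(x)


def expected_addresses(x: int, r: int, n: int) -> float:
    """Expected number of addresses that receive exactly x keys."""
    return n * poisson(x, r, n)
```

With r = N, for example, `poisson(0, 1000, 1000)` is e^(-1), so a little over a third of the addresses are expected to stay empty.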
11.What is packing density? How much extra memory should be used to avoid
collision?
• How many addresses should have one record plus one or more
synonyms?
• Assuming that only one record can be assigned to each home address,
how many overflow records can be expected?
• What percentage of records should be overflow records?

Packing Density
Formula to be used :

Numerical :
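
The formula and the numerical example are missing from these notes. As a stand-in, here is a sketch that assumes the standard definition, packing density = r / N (records stored divided by available spaces), and a Poisson model for the overflow estimate. The figures r = 500 and N = 1000 are illustrative only and are not taken from the notes.

```python
from math import exp


def packing_density(r: int, n: int) -> float:
    """Ratio of records stored (r) to available spaces (n)."""
    return r / n


def expected_overflow(r: int, n: int) -> float:
    """Expected overflow records when each address holds one record.

    Under the Poisson model, n * (1 - e^(-r/n)) addresses are occupied,
    each keeping one record; every other record is an overflow record.
    """
    occupied = n * (1 - exp(-r / n))
    return r - occupied


density = packing_density(500, 1000)
overflow = expected_overflow(500, 1000)
overflow_percent = 100 * overflow / 500
```

Under these assumptions a packing density of 0.5 still leaves roughly a fifth of the records as overflow records, which is why lowering the packing density (using more space) reduces collisions but never removes them entirely.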
12.Discuss about dynamic hashing with neat sketch.

Dynamic Hashing: Functionally, dynamic hashing and extendible hashing are very
similar. Both use a directory to track the addresses of the buckets, and both extend
the directory through the use of tries.
The key difference between the approaches is that dynamic hashing, like
conventional, static hashing, starts with a hash function that covers an address space
of a fixed size. As buckets within that fixed address space overflow, they split,
forming the leaves of a trie that grows down from the original address node.
Eventually, after enough additions and splitting, the buckets are addressed through a
forest of tries that have been seeded out of the original static address space.
Example :

Let's look at an example. Figure (a) shows an initial address space of four and four
buckets descending from the four addresses in the directory. In Fig. (b) we have split
the bucket at address 4. We address the two buckets resulting from the split as 40
and 41. We change the shape of the directory node at address 4 from a square to a
circle because it has changed from an external node to an internal node. In Fig. (c)
we split the bucket addressed by node 2, creating the new external nodes 20 and 21.
We also split the bucket addressed by 41, extending the trie downward to include 410
and 411.
Finding a key in a dynamic hashing scheme can involve the use of two hash functions
rather than just one. First, there is the hash function that covers the original address
space. If you find that the directory node is an external node and therefore points to
a bucket, the search is complete.

However, if the directory node is an internal node, then you need additional address
information to guide you through the 1s and 0s that form the trie. The primary
difference between the two approaches is that dynamic hashing allows for slower,
more gradual growth of the directory, whereas extendible hashing extends the
directory by doubling it.
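
The two-step search can be sketched in Python as follows. The directory is shaped like the one described for Fig. (c), but both hash functions are assumptions: the first maps an integer key to addresses 1 to 4 with a modulus, and the second simply reads off further bits of the key. The class names `Bucket` and `Internal` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Bucket:
    label: str  # external node: stands for a bucket of records


@dataclass
class Internal:
    zero: Union["Internal", Bucket]  # child followed on a 0 bit
    one: Union["Internal", Bucket]   # child followed on a 1 bit


def find_bucket(directory: Dict[int, Union[Internal, Bucket]], key: int) -> Bucket:
    n = len(directory)
    node = directory[key % n + 1]  # first hash: original address space (assumed)
    bits = key // n                # second hash: source of trie bits (assumed)
    while isinstance(node, Internal):
        node = node.one if bits & 1 else node.zero
        bits >>= 1
    return node  # an external node, so the search is complete


# Directory after the splits: 4 -> 40 / 41, 2 -> 20 / 21, 41 -> 410 / 411.
directory = {
    1: Bucket("1"),
    2: Internal(Bucket("20"), Bucket("21")),
    3: Bucket("3"),
    4: Internal(Bucket("40"), Internal(Bucket("410"), Bucket("411"))),
}
```

A key whose directory node is still external never needs the second function; only keys that land on a split address walk down the trie.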
13.What is linear hashing? Explain with neat diagram.

• Linear hashing does away with the directory. Linear hashing, like extendible
hashing, uses more bits of the hashed value as the address space grows. In the
example below, the address space initially consists of four buckets rather than
four directory nodes that point to buckets.

Example :
• As we add records, bucket b overflows. The overflow forces a split. However, as
Fig. (b) shows, it is not bucket b that splits, but bucket a. The reason for this is
that we are extending the address space linearly, and bucket a is the next bucket
that must split to create the next linear extension, which we call bucket A. A 3-bit
hash function, h3(k), is applied to buckets a and A to divide the records between
them. Since bucket b was not the bucket that we split, the overflowing record is
placed into an overflow bucket w.

• We add more records, and bucket d overflows. Bucket b is the next one to split
and extend the address space, so we use the h3(k) address function to divide the
records from bucket b and its overflow bucket between b and the new bucket B.
The record overflowing bucket d is placed in an overflow bucket x. The resulting
arrangement is illustrated in Fig. (c).

• Figure (d) shows what happens when, as we add more records, bucket d
overflows beyond the capacity of the overflow bucket w. Bucket c is the next in
the extension sequence, so we use the h3(k) address function to divide the records
between c and C.

• Finally, assume that bucket B overflows. The overflow record is placed in the
overflow bucket z. The overflow also triggers the extension to bucket D, dividing
the contents of d, x, and y between buckets d and D. At this point all of the
buckets use the h3(k) address function, and we have finished the expansion cycle.
The pointer for the next bucket to be split returns to bucket a to get ready for a
new cycle that will use an h4(k) address function to reach new buckets.
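
The split-pointer mechanism can be sketched in Python. This is a simplified model under several assumptions: keys are integers, the d-bit address function is taken to be the key modulo the current number of primary buckets, a bucket capacity of two is chosen arbitrarily, and an overflow bucket is modeled by letting a bucket's list grow past its capacity. The class name `LinearHashFile` is hypothetical.

```python
class LinearHashFile:
    def __init__(self, initial_buckets: int = 4, capacity: int = 2):
        self.n0 = initial_buckets  # size of the original address space
        self.level = 0             # number of completed expansion cycles
        self.next_split = 0        # pointer to the next bucket to split
        self.capacity = capacity
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key: int) -> int:
        size = self.n0 << self.level
        addr = key % size                # current address function
        if addr < self.next_split:       # bucket already split this cycle
            addr = key % (2 * size)      # use one more bit
        return addr

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._address(key)]
        bucket.append(key)               # may spill into "overflow"
        if len(bucket) > self.capacity:
            self._split()                # split next_split, not this bucket

    def _split(self) -> None:
        size = self.n0 << self.level
        old = self.buckets[self.next_split]
        self.buckets[self.next_split] = []
        self.buckets.append([])          # new bucket at size + next_split
        for key in old:
            self.buckets[key % (2 * size)].append(key)
        self.next_split += 1
        if self.next_split == size:      # expansion cycle finished
            self.level += 1
            self.next_split = 0

    def contains(self, key: int) -> bool:
        return key in self.buckets[self._address(key)]
```

As in the walkthrough above, the bucket that overflows is generally not the one that splits; the overflowing records wait in place until the split pointer reaches their bucket.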
14.What is double hashing? Explain in details.

Example :
• In the diagram, we have a hash table of size 5 and two keys, 10 and 15, which need
to be inserted into it.

• We start with the first key, 10, and apply the first hash function to get its
location in the hash table: h1(10) = 10 % 5 = 0, so key 10 goes into location 0.

• Now we take the second key, 15, and apply the first hash function again.
Location of key 15 = h1(15) = 15 % 5 = 0. This time location 0 is not empty, so
there is a collision.

• New location of key 15 = h1(15) + i * h2(15) = 0 + 1 * (15 % 7) = 0 + 1 = 1.
Location 1 of the hash table is empty, so we can place key 15 there.
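
The same example as a Python sketch, using h1(k) = k % 5 and h2(k) = k % 7 as in the text. Two details are assumptions added to make the sketch safe to run: the probe address is reduced modulo the table size, and a step that is a multiple of the table size (which would probe the same slot forever) is replaced by 1.

```python
from typing import List, Optional


def h1(key: int, size: int) -> int:
    return key % size


def h2(key: int) -> int:
    return key % 7


def insert(table: List[Optional[int]], key: int) -> int:
    """Insert key using double hashing and return the slot used."""
    size = len(table)
    step = h2(key)
    if step % size == 0:   # assumed guard: such a step never moves on
        step = 1
    for i in range(size):
        slot = (h1(key, size) + i * step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found")


table: List[Optional[int]] = [None] * 5
slot_10 = insert(table, 10)
slot_15 = insert(table, 15)
```

Because the step depends on the key, two keys that collide at the same home address usually follow different probe sequences, which is what distinguishes double hashing from a fixed-step search.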

15.What is chained progressive overflow? Brief with neat sketch.


Example :

Disadvantage :
• Requires a little more storage for the link field
• Chaining algorithm should guarantee that it is possible to get to any
synonym by starting at its home address

16.What is collision? Explain collision resolution by progressive overflow.


Collision: An attempt to store a record at an address that does not have sufficient
room, i.e. one that is already occupied by another record, which is a synonym. A
collision occurs when two record keys hash to the same address.
Example :
Suppose there is a key in the sample file with the name OLIVIER. Because the name
OLIVIER starts with the same two letters as the name LOWELL, they produce the
same address (004). There is a collision between the record for OLIVIER and the
record for LOWELL. We refer to keys that hash to the same address as synonyms.
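
The notes do not say which hash function produces address 004. One simple function that reproduces it, assumed here purely for illustration, multiplies the ASCII codes of the first two letters and keeps the last three digits of the product.

```python
def toy_hash(name: str) -> int:
    """Assumed hash: product of the ASCII codes of the first two
    letters, reduced to its last three digits."""
    return (ord(name[0]) * ord(name[1])) % 1000


lowell = f"{toy_hash('LOWELL'):03d}"    # 76 * 79 = 6004
olivier = f"{toy_hash('OLIVIER'):03d}"  # 79 * 76 = 6004
```

Since multiplication ignores the order of the two letters, LO and OL give the same product, so the two names are synonyms under this function.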
Progressive overflow: If a key, k1, hashes to the same address, a1, as another key, k2,
that is already stored there, then look for the first available address, a2, following
a1 and place k1 in a2. If the end of the address space is reached, wrap around to its
beginning. When searching for a key that is not in the file, the search stops when it
reaches an empty address or, if the address space is full, when it comes back to where
it began.
Example :
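
A minimal Python sketch of progressive overflow with one record per address. The function names `po_insert` and `po_search` and the idea of passing the hash function as a parameter are choices made for this illustration, not part of the notes.

```python
from typing import Callable, List, Optional


def po_insert(table: List[Optional[str]], key: str,
              hash_fn: Callable[[str], int]) -> int:
    """Store key at its home address or at the next free address."""
    n = len(table)
    home = hash_fn(key) % n
    for offset in range(n):
        addr = (home + offset) % n  # wrap around at the end
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("address space is full")


def po_search(table: List[Optional[str]], key: str,
              hash_fn: Callable[[str], int]) -> Optional[int]:
    """Return the address holding key, or None if it is not in the file."""
    n = len(table)
    home = hash_fn(key) % n
    for offset in range(n):
        addr = (home + offset) % n
        if table[addr] is None:
            return None             # empty address: key is not stored
        if table[addr] == key:
            return addr
    return None                     # back where the search began
```

If keys A, B and D all have home address 8 in a ten-slot table and C has home address 9, inserting them in the order A, B, C, D places them at 8, 9, 0 and 1: C and D wrap around past the end of the address space.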
