Lecture 09 Hash Index - Without Answers

Lecture 09: Hash Index
Long Cheng
Assistant Professor
c.long@ntu.edu.sg
DATABASE SYSTEM PRINCIPLES: Lecture 09: Hash Index 1

Indexes
Conventional Index
• Simple
• Could involve many I/Os
B+ Tree Index
• Search query and range query
• Few I/Os (roughly the height of the tree)
Hash Index
• Search query only
• 1 I/O (under some circumstances)

Hashing
key → h(key) → bucket (holding records)
• h is the hash function
• Bucket: some logical space for storing records that share the same hash value (a block in this example)
• E.g., h(key) = key mod B, so h(key) = 0, 1, …, B-1, where B is the number of buckets (blocks)
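A minimal sketch of the mod-B hash function above, assuming integer keys. The value B = 4 and the keys are made up for illustration, and each Python list stands in for one block.

```python
def h(key: int, num_buckets: int) -> int:
    """Hash function from the slide: h(key) = key mod B."""
    return key % num_buckets

B = 4  # number of buckets (assumed value, not from the slides)
buckets = [[] for _ in range(B)]  # each list stands in for one block

for key in [12, 7, 9, 16, 3]:  # made-up keys
    buckets[h(key, B)].append(key)

print(buckets)  # [[12, 16], [9], [], [7, 3]]
```

Every key lands in a bucket numbered 0 to B-1, and keys with the same hash value share a bucket.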

Two Alternatives
(1) key → h(key) → records
• The hash value leads directly to the bucket holding the records
• E.g., h(key) is the bucket no.

Two Alternatives
(2) key → h(key) → directory entry (Ptr) → record
• E.g., h(key) is the index id in the directory
• Directory
o Much smaller storage than the disk file
o Easier to update (suitable for dynamic case)

Within a Bucket
Do we keep records sorted (wrt a key)?
Yes, if
• CPU time is critical
• Inserts/Deletes are not too frequent
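The trade-off above can be sketched with Python's standard bisect module. A sorted bucket allows binary search (less CPU time per lookup), while every insert has to shift records to keep the order. The helper names and keys are hypothetical.

```python
import bisect

def insert_sorted(bucket: list, key) -> None:
    """Insert key while keeping the bucket sorted (shifts later records)."""
    bisect.insort(bucket, key)

def contains(bucket: list, key) -> bool:
    """Binary search inside one bucket: O(log n) comparisons."""
    pos = bisect.bisect_left(bucket, key)
    return pos < len(bucket) and bucket[pos] == key

bucket = []
for key in [42, 7, 19, 88, 3]:  # made-up keys
    insert_sorted(bucket, key)

print(bucket)                # [3, 7, 19, 42, 88]
print(contains(bucket, 19))  # True
print(contains(bucket, 20))  # False
```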

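Inserts and Deletes
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
4 buckets, numbered 0 to 3, each reached through a pointer
• Bucket 0: (empty)
• Bucket 1: (empty)
• Bucket 2: (empty)
• Bucket 3: (empty)

Inserts and Deletes
After inserting a, b, c, d:
• Bucket 0: d
• Bucket 1: a, c
• Bucket 2: b
• Bucket 3: (empty)
Next INSERT: h(e) = 1

Inserts and Deletes
Overflow: bucket 1 is full, so e goes into an overflow block
• Bucket 0: d
• Bucket 1: a, c → overflow block: e
• Bucket 2: b
• Bucket 3: (empty)
2 I/Os for accessing e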
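The insert example can be replayed with a small sketch. It assumes 2 records per block, as drawn in the slides, and takes the hash values of a to e directly from the slides instead of computing them. The class and function names are hypothetical.

```python
BLOCK_CAPACITY = 2  # records per block, as drawn in the slides

class Block:
    def __init__(self):
        self.records = []
        self.overflow = None  # next block in the overflow chain, if any

H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}  # hash values from the slides
table = [Block() for _ in range(4)]           # 4 buckets

def insert(key):
    block = table[H[key]]
    while len(block.records) >= BLOCK_CAPACITY:  # walk to a block with room
        if block.overflow is None:
            block.overflow = Block()             # chain a new overflow block
        block = block.overflow
    block.records.append(key)

def lookup_ios(key):
    """Number of blocks read to find key, or None if it is absent."""
    block, ios = table[H[key]], 0
    while block is not None:
        ios += 1
        if key in block.records:
            return ios
        block = block.overflow
    return None

for k in "abcde":
    insert(k)

print(lookup_ios("a"))  # 1
print(lookup_ios("e"))  # 2
```

Record e sits in the overflow block of bucket 1, so finding it reads two blocks.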

Inserts and Deletes
Delete: e, f, c
Before:
• Bucket 0: a
• Bucket 1: b, c → overflow block: d
• Bucket 2: e
• Bucket 3: f, g

Inserts and Deletes
• Delete e: bucket 2 becomes empty
• Delete f: move "g" up
• Delete c: move "d" up from the overflow block

Inserts and Deletes
After:
• Bucket 0: a
• Bucket 1: b, d
• Bucket 2: (empty)
• Bucket 3: g
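A sketch of the deletions above, assuming 2 records per block. After each delete, the remaining records of the bucket are repacked so that later records move up and empty overflow blocks disappear. This repacking is one simple way to get the behavior in the slides and is not necessarily how a real system does it.

```python
BLOCK_CAPACITY = 2  # records per block, as drawn in the slides

# bucket no. -> chain of blocks (primary block first, then overflow blocks)
buckets = {
    0: [["a"]],
    1: [["b", "c"], ["d"]],
    2: [["e"]],
    3: [["f", "g"]],
}

def delete(bucket_no, key):
    # Collect the surviving records (drops every record equal to key).
    records = [r for block in buckets[bucket_no] for r in block if r != key]
    # Repack into full blocks; later records "move up".
    chain = [records[i:i + BLOCK_CAPACITY]
             for i in range(0, len(records), BLOCK_CAPACITY)]
    buckets[bucket_no] = chain or [[]]  # always keep the primary block

for bucket_no, key in [(2, "e"), (3, "f"), (1, "c")]:
    delete(bucket_no, key)

print(buckets)  # {0: [['a']], 1: [['b', 'd']], 2: [[]], 3: [['g']]}
```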

How Many Buckets Should We Use?
• In principle, it should depend on the size of the database we have (e.g., the number of records)
• Once the number of buckets is decided, we can define the hash function accordingly
• Try to keep space utilization between 50% and 80%
o Utilization = # records stored / total # records that fit
o If < 50%, space is wasted
o If > 80%, overflows become significant
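As a worked example of the 50% to 80% guideline, the sketch below computes utilization and the range of bucket counts that keeps it within the target. The figures (10,000 records, 10 records per bucket) are made up, and the function names are hypothetical.

```python
import math

def utilization(num_records, num_buckets, records_per_bucket):
    """# records stored / total # records that fit."""
    return num_records / (num_buckets * records_per_bucket)

def bucket_range(num_records, records_per_bucket, low=0.5, high=0.8):
    """Smallest and largest bucket counts with utilization in [low, high]."""
    min_buckets = math.ceil(num_records / (high * records_per_bucket))
    max_buckets = math.floor(num_records / (low * records_per_bucket))
    return min_buckets, max_buckets

print(bucket_range(10_000, 10))       # (1250, 2000)
print(utilization(10_000, 1250, 10))  # 0.8
print(utilization(10_000, 2000, 10))  # 0.5
```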

How to Handle Growing Records?
• The number of buckets (# of the possible hash values) is decided at the beginning
• As records are added, overflow chains grow: many I/Os
Extensible Hash Index

Extensible Hash Index – 2 Ideas
(a) Use the first i of the b bits output by the hash function; i grows over time as more records are added
E.g., h(K) → 00110101 (b = 8 bits)
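Idea (a) can be sketched as a bit shift, assuming the hash value is held as a b-bit integer and "first" means most significant. The function name is hypothetical.

```python
B_BITS = 8  # b: bits output by the hash function (8 in the slide's 00110101)

def first_bits(hash_value: int, i: int, b: int = B_BITS) -> int:
    """Return the first (most significant) i of the b bits of hash_value."""
    return hash_value >> (b - i)

hv = 0b00110101
prefixes = [format(first_bits(hv, i), f"0{i}b") for i in range(1, 5)]
print(prefixes)  # ['0', '00', '001', '0011']
```

As i grows, the prefix gets longer and can tell more buckets apart.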

Extensible Hash Index – 2 Ideas
(b) Use a directory
h(K)[1-i] → directory entry → record
The first i bits of h(K) select a directory entry, which points to the bucket holding the record

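Extensible Hash Index – Example
• i: the number of bits used for hashing (directory); here i = 2
• j: the number of bits used for hashing (bucket), shown next to each bucket
• Directory with 4 entries, 4 buckets
• Hash function outputs 4 bits
• Each record is represented by its hash value
• 2 records fit in each bucket/block
Directory entry → bucket:
• 00 → 0000, 0001 (j = 2)
• 01 → 0111 (j = 2)
• 10 → 1001, 1010 (j = 2)
• 11 → 1100 (j = 2)
Search 1010: the first 2 bits are 10, so follow directory entry 10 to the bucket holding 1001 and 1010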
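The search in this example can be sketched as two dictionary lookups, with hash values kept as bit strings for readability. The block labels A to D are hypothetical names for the four buckets.

```python
i = 2  # number of bits used by the directory

blocks = {
    "A": ["0000", "0001"],
    "B": ["0111"],
    "C": ["1001", "1010"],
    "D": ["1100"],
}
directory = {"00": "A", "01": "B", "10": "C", "11": "D"}

def search(hash_value: str) -> bool:
    """Follow the first i bits through the directory, then scan one block."""
    block = blocks[directory[hash_value[:i]]]
    return hash_value in block

print(search("1010"))  # True
print(search("1011"))  # False
```

If the directory is held in main memory, a search reads just one block.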
Extensible Hash Index – Insertions
Initially
• i = 1
• 2 buckets
• 0 → 0001 (j = 1)
• 1 → 1001, 1100 (j = 1)
Insert 1010

Example continued
Insert 1010: bucket 1 is full. Cannot use 1 bit for hashing these three keys (1001, 1010, 1100), and thus increase to 2 bits. The bucket splits into two buckets with j = 2:
• 1001, 1010 (j = 2)
• 1100 (j = 2)

Example continued
New directory with i = 2:
• 00 → 0001 (j = 1)
• 01 → 0001 (j = 1), the same bucket as 00
• 10 → 1001, 1010 (j = 2); first two bits: 10
• 11 → 1100 (j = 2); first two bits: 11
The bucket holding 0001 still uses only the first 1 bit (0), so it covers first two bits 0x (namely 00 and 01)
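Example continued
i = 2
• 00, 01 → 0001 (j = 1)
• 10 → 1001, 1010 (j = 2)
• 11 → 1100 (j = 2)
Insert: 0111, 0000

Example continued
Insert 0111: it goes into the bucket shared by 00 and 01, which becomes 0001, 0111 (j = 1)
Insert 0000: that bucket is now full, so it splits into two buckets with j = 2. The directory already has i = 2, so it does not grow
• 00 → 0000, 0001 (j = 2)
• 01 → 0111 (j = 2)
• 10 → 1001, 1010 (j = 2)
• 11 → 1100 (j = 2)
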

Example continued
i = 2
• 00 → 0000, 0001 (j = 2)
• 01 → 0111 (j = 2)
• 10 → 1001, 1010 (j = 2)
• 11 → 1100 (j = 2)
Insert: 1001

Example continued
Insert 1001: bucket 10 is full, so it splits into two buckets with j = 3:
• 1001, 1001 (j = 3)
• 1010 (j = 3)

Example continued
Since j = 3 exceeds i = 2, the directory doubles to i = 3 (entries 000 to 111)

Example continued
i = 3
• 000, 001 → 0000, 0001 (j = 2)
• 010, 011 → 0111 (j = 2)
• 100 → 1001, 1001 (j = 3)
• 101 → 1010 (j = 3)
• 110, 111 → 1100 (j = 2)
• Now: i = 3 and 5 buckets
• Initially: i = 1 and 2 buckets
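The whole insertion example can be replayed with the sketch below. It assumes 4-bit hash values, 2 records per block, and a directory indexed by the first i bits. The class and method names are hypothetical, and this is one possible way to code the procedure, not the lecture's reference implementation.

```python
BLOCK_CAPACITY = 2  # records per block, as in the slides
B_BITS = 4          # the hash function outputs 4 bits

class Bucket:
    def __init__(self, j):
        self.j = j          # number of bits used for hashing (bucket)
        self.records = []

class ExtensibleHash:
    def __init__(self):
        self.i = 1                               # bits used by the directory
        self.directory = [Bucket(1), Bucket(1)]  # indexed by the first i bits

    def _slot(self, hv):
        return hv >> (B_BITS - self.i)

    def insert(self, hv):
        bucket = self.directory[self._slot(hv)]
        while len(bucket.records) >= BLOCK_CAPACITY:
            if bucket.j == B_BITS:
                raise OverflowError("no bits left; an overflow block is needed")
            if bucket.j == self.i:
                # Double the directory: each entry is followed by a copy.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)
            bucket = self.directory[self._slot(hv)]
        bucket.records.append(hv)

    def _split(self, old):
        j = old.j + 1
        zero, one = Bucket(j), Bucket(j)
        for r in old.records:                 # redistribute on the j-th bit
            bit = (r >> (B_BITS - j)) & 1
            (one if bit else zero).records.append(r)
        for slot, b in enumerate(self.directory):
            if b is old:                      # repoint entries that shared old
                bit = (slot >> (self.i - j)) & 1
                self.directory[slot] = one if bit else zero

index = ExtensibleHash()
for hv in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    index.insert(hv)

num_buckets = len({id(b) for b in index.directory})
print(index.i, num_buckets)  # 3 5
```

The first three inserts build the initial state (i = 1, 2 buckets), and the remaining four follow the slides. Inserting more equal hash values than fit in a block raises OverflowError in this sketch, because it has no overflow chains.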

Extensible Hash Index - Deletions
Two options:
• No merging of blocks
• Merge blocks and cut the directory if possible (reverse of the insert procedure)

Extensible Hash Index Still Needs Overflow Chains
Many records with duplicate keys:
• A full bucket (j = 1) holds 1100, 1100
• Insert 1100
• If we split (j = 2), all copies of 1100 still fall into the same bucket
Splitting it further does not solve the problem

Extensible Hash Index Still Needs Overflow Chains
Insert 1100: add an overflow block
• Bucket (j = 1): 1100, 1100 → overflow block: 1100
Question:
Can you think of another scenario where an overflow block is not avoidable?
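For the duplicate-key case above, a short sketch shows why splitting never helps: whichever bit is used, identical hash values always land on the same side. It assumes 4-bit hash values and 2 records per block, and the function name is hypothetical.

```python
B_BITS = 4
BLOCK_CAPACITY = 2

def split(records, j):
    """Partition records on bit j (counted from the left) of a 4-bit hash."""
    zero = [r for r in records if not (r >> (B_BITS - j)) & 1]
    one = [r for r in records if (r >> (B_BITS - j)) & 1]
    return zero, one

records = [0b1100, 0b1100, 0b1100]  # three records with the same hash value
largest_side = [max(len(z), len(o))
                for z, o in (split(records, j) for j in range(2, B_BITS + 1))]
print(largest_side)  # [3, 3, 3]
```

For every j, one side still holds all three records, which is more than a block can hold, so an overflow block is the only option.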
Summary of Indexes
Conventional Index
• Simple
• Could involve many I/Os
B+ Tree Index
• Search query and range query
• Few I/Os (height of the tree)
Hash Index
• Search query only
• 1 I/O (if no overflow chains), assuming the directory is in main memory
Create Indexes in SQL
CREATE INDEX name ON relation (attribute)

Duplicate values are allowed
CREATE UNIQUE INDEX name ON relation (attribute)

Duplicate values are not allowed
DROP INDEX name;
Note: The syntax for creating indexes varies amongst different databases.
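The three statements can be tried with Python's built-in sqlite3 module. This is a sketch on one particular DBMS: the relation student and the index names are hypothetical, and SQLite builds B-tree indexes, so it says nothing about hash indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")  # hypothetical
conn.execute("CREATE INDEX idx_name ON student (name)")       # duplicates allowed
conn.execute("CREATE UNIQUE INDEX idx_id ON student (id)")    # duplicates rejected

conn.execute("INSERT INTO student VALUES (1, 'Ann')")
conn.execute("INSERT INTO student VALUES (2, 'Ann')")  # duplicate name is fine

try:
    conn.execute("INSERT INTO student VALUES (1, 'Bob')")  # duplicate id
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX idx_name")
remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]

print(duplicate_rejected)  # True
print(remaining)           # ['idx_id']
```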

• Can we specify index techniques, e.g., B+-tree or hashing?
o Some DBMSs do not support this, e.g., SQL Server
o Some DBMSs support it, e.g., MySQL, PostgreSQL
• But you can always specify which sets of attributes you want to build indexes on

• Benefits. An index on an attribute may speed up the execution of queries in which a value or a range of values is specified for the attribute, and may also help joins involving that attribute
• Costs. It makes insertions, deletions, and updates slower
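The benefit side can be observed in SQLite through EXPLAIN QUERY PLAN, again as a sketch on one DBMS with a hypothetical relation emp. The exact wording of the plan text differs between SQLite versions, so the example only checks whether the index name appears in it. The cost side (slower updates) is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, dept TEXT, salary INTEGER)")
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept)")

def plan(sql):
    """Return the query plan's detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

with_index = plan("SELECT * FROM emp WHERE dept = 'CS'")
without_index = plan("SELECT * FROM emp WHERE salary = 100")

print("idx_emp_dept" in with_index)     # True: the index is used
print("idx_emp_dept" in without_index)  # False: no index on salary
```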

Recap
• Hash Index
• Extensible Hash Index
• Summary of Indexes and Index Creation

Questions?
Next lecture:
Lecture 10: Multiple Key Index
