
Lecture 09: Hash Index

Long Cheng
Assistant Professor
c.long@ntu.edu.sg

DATABASE SYSTEM PRINCIPLES: Lecture 09: Hash Index 1


Indexes

Conventional Index
• Simple
• Could involve many I/Os

B+ Tree Index
• Search query and range query
• Few I/Os (roughly the height of the tree)

Hash Index
• Search query only
• 1 I/O (under some circumstances)



Hashing

key → h(key) → bucket holding the records

• h is the hash function
• Bucket: some logical space for storing records which share the same hash value (a block for this example)
• E.g., h(key) = key mod B, so h(key) = 0, 1, …, B-1, where B is the number of buckets
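
As a minimal sketch of the mod-B hash function, the snippet below distributes a few integer keys over buckets. The value B = 4 and the sample keys are assumptions chosen only for illustration.

```python
B = 4  # number of buckets (assumed for illustration)


def h(key: int) -> int:
    """Map an integer key to a bucket number in 0..B-1."""
    return key % B


# One list per bucket; each list stands in for a block of records.
buckets = [[] for _ in range(B)]
for key in (3, 8, 13, 17, 22):  # hypothetical keys
    buckets[h(key)].append(key)

print(buckets)
```

Keys 13 and 17 both land in bucket 1 because they leave the same remainder when divided by B.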



Two Alternatives

(1) key → h(key) → records
E.g., h(key) is the bucket no.



Two Alternatives

(2) key → h(key) → Directory entry (Ptr) → record
E.g., h(key) is the index id in the directory

Directory
• Much smaller storage than the disk file
• Easier to update (suitable for dynamic case)
Within a Bucket

Do we keep records sorted (wrt a key)?

Yes, if
• CPU time critical
• Inserts/Deletes not too frequent
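
The trade-off can be sketched as follows, assuming a bucket is held in memory as a Python list: keeping it sorted lets a lookup use binary search (less CPU time), while every insert has to shift records to keep the order. The helper names `insert_sorted` and `contains` are hypothetical.

```python
import bisect


def insert_sorted(bucket, key):
    """Insert key while keeping the bucket sorted (shifts later records)."""
    bisect.insort(bucket, key)


def contains(bucket, key):
    """Binary search inside a sorted bucket."""
    pos = bisect.bisect_left(bucket, key)
    return pos < len(bucket) and bucket[pos] == key


bucket = []
for k in (5, 1, 3):  # hypothetical keys
    insert_sorted(bucket, k)

print(bucket, contains(bucket, 3), contains(bucket, 4))
```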



Inserts and Deletes

INSERT, with 4 buckets (bucket no. 0 to 3); each block holds 2 records in this example:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0

After these inserts:
Bucket 0: d
Bucket 1: a, c
Bucket 2: b
Bucket 3: (empty)

Then h(e) = 1: bucket 1 is full, so e goes into an overflow block reached by a pointer from bucket 1.

Bucket 0: d
Bucket 1: a, c → Overflow: e
Bucket 2: b
Bucket 3: (empty)

2 I/Os for accessing e
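
The insert example can be replayed with a small sketch. It assumes each bucket is a chain of blocks (a primary block followed by overflow blocks) and that reading one block costs one I/O; the class `StaticHashFile` and its method names are hypothetical, and the hash values are passed in directly as given in the example.

```python
BLOCK_CAPACITY = 2  # records per block, as in the example
NUM_BUCKETS = 4


class StaticHashFile:
    def __init__(self):
        # Each bucket is a chain of blocks; the first one is the primary block.
        self.buckets = [[[]] for _ in range(NUM_BUCKETS)]

    def insert(self, key, bucket_no):
        chain = self.buckets[bucket_no]
        for block in chain:
            if len(block) < BLOCK_CAPACITY:
                block.append(key)
                return
        chain.append([key])  # every block is full: add an overflow block

    def lookup_ios(self, key, bucket_no):
        """Number of blocks read before key is found, or None if absent."""
        for ios, block in enumerate(self.buckets[bucket_no], start=1):
            if key in block:
                return ios
        return None


f = StaticHashFile()
for key, hash_value in [("a", 1), ("b", 2), ("c", 1), ("d", 0), ("e", 1)]:
    f.insert(key, hash_value)

print(f.buckets[1])            # primary block and overflow block of bucket 1
print(f.lookup_ios("e", 1))    # I/Os needed to reach e
```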



Inserts and Deletes

DELETE: e, f, c

Before:
Bucket 0: a
Bucket 1: b, c → Overflow: d
Bucket 2: e
Bucket 3: f, g

• Delete e: bucket 2 becomes empty
• Delete f: move “g” up
• Delete c: move “d” up from the overflow block

After:
Bucket 0: a
Bucket 1: b, d
Bucket 2: (empty)
Bucket 3: g
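
A rough sketch of deletion with compaction follows. It assumes a block capacity of 2 and that d initially sits in an overflow block of bucket 1; the function `delete` is hypothetical and simply repacks the remaining records of a bucket into as few blocks as possible, which is one way to "move records up".

```python
BLOCK_CAPACITY = 2  # assumed records per block


def delete(chain, key):
    """Remove key from a bucket's chain of blocks, then compact the chain."""
    records = [r for block in chain for r in block if r != key]
    compacted = [records[i:i + BLOCK_CAPACITY]
                 for i in range(0, len(records), BLOCK_CAPACITY)]
    return compacted or [[]]  # keep one (empty) primary block


# Assumed starting layout: bucket number -> chain of blocks.
buckets = {
    0: [["a"]],
    1: [["b", "c"], ["d"]],  # d is in an overflow block
    2: [["e"]],
    3: [["f", "g"]],
}

for key, bucket_no in [("e", 2), ("f", 3), ("c", 1)]:
    buckets[bucket_no] = delete(buckets[bucket_no], key)

print(buckets)
```

After the three deletions the overflow block of bucket 1 is gone, since b and d fit in the primary block.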



How Many Buckets Should We Use?

• In principle, it should depend on the size of the database we have (e.g., the number of records)
• Once the number of buckets is decided, we can define the hash function accordingly
• Try to keep space utilization between 50% and 80%, where utilization = # records stored / total # records that fit
• If < 50%, wasting space
• If > 80%, overflows significant
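
The utilization guideline can be checked with a short sketch. The function names and the sample numbers (100 buckets, 10 records per block) are assumptions for illustration only.

```python
def utilization(num_records, num_buckets, records_per_block):
    """# records stored / total # records that fit."""
    return num_records / (num_buckets * records_per_block)


def assess(u):
    """Apply the 50%-80% guideline to a utilization value."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "ok"


for n in (300, 600, 950):  # hypothetical record counts
    u = utilization(n, num_buckets=100, records_per_block=10)
    print(n, u, assess(u))
```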



How to Handle Growing Records?

• The number of buckets (the # of possible hash values) is decided at the beginning
• As records grow, the overflow chains of the buckets get longer: many I/Os
• Solution: Extensible Hash Index


Extensible Hash Index – 2 Ideas

(a) Use the first i of b bits output by the hash function, where i grows over time with more records

E.g., h(K) → 00110101 (b = 8 bits)
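
Taking the first i of b bits amounts to a right shift by b - i positions. A minimal sketch, assuming the hash value is held as an integer and b = 8 as in the bit string above (the function name `prefix` is hypothetical):

```python
B_BITS = 8  # b: number of bits output by the hash function


def prefix(hash_value: int, i: int, b: int = B_BITS) -> int:
    """Return the first (most significant) i of the b bits of hash_value."""
    return hash_value >> (b - i)


hk = 0b00110101
for i in range(1, 5):
    # Show the prefix as an i-bit binary string.
    print(i, format(prefix(hk, i), f"0{i}b"))
```

As i grows, more leading bits are used, so more distinct directory entries become possible.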



Extensible Hash Index – 2 Ideas

(b) Use directory

h(K)[1-i] (the first i bits of h(K)) → Directory entry → bucket holding the record



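Extensible Hash Index – Example

• i: the number of bits used for hashing (directory)
• j: the number of bits used for hashing (bucket)
• Directory with i = 2
• 4 buckets
• Hash function outputs 4 bits
• Each record represented by its hash value
• 2 records in each bucket/block

Directory (i = 2):
00 → bucket {0000, 0001}, j = 2
01 → bucket {0111}, j = 2
10 → bucket {1001, 1010}, j = 2
11 → bucket {1100}, j = 2

Search 1010: the first 2 bits are 10, so follow directory entry 10 to the bucket {1001, 1010}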
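
The search step in this example can be sketched as a directory lookup on the first i bits. Representing hash values as bit strings and the directory as a Python dict is an assumption made for readability; the bucket contents are those of the example.

```python
I = 2  # number of bits used by the directory

# Directory entry (first I bits) -> bucket of hash values.
directory = {
    "00": ["0000", "0001"],
    "01": ["0111"],
    "10": ["1001", "1010"],
    "11": ["1100"],
}


def search(hash_bits: str) -> bool:
    """Follow the directory entry for the first I bits, then scan the bucket."""
    bucket = directory[hash_bits[:I]]
    return hash_bits in bucket


print(search("1010"), search("1011"))
```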
Extensible Hash Index – Insertions

Initially
• i = 1
• 2 buckets

Directory (i = 1):
0 → bucket {0001}, j = 1
1 → bucket {1001, 1100}, j = 1

Insert 1010
• Bucket 1 would hold 1001, 1100, and 1010. Cannot use 1 bit for hashing these three keys, and thus increase to 2 bits
• Split into bucket {1001, 1010}, j = 2 (first two bits: 10) and bucket {1100}, j = 2 (first two bits: 11)
• Bucket {0001} keeps j = 1. First 1 bit: 0; first two bits: 0x (namely 00 and 01)

New directory (i = 2):
00 → bucket {0001}, j = 1
01 → same bucket as 00
10 → bucket {1001, 1010}, j = 2
11 → bucket {1100}, j = 2
Example continued

Insert: 0111, 0000

• 0111 goes into the bucket shared by 00 and 01, which becomes {0001, 0111}, j = 1
• 0000 maps to the same bucket, which is now full, so the bucket is split using 2 bits: {0000, 0001}, j = 2 and {0111}, j = 2
• The directory already uses i = 2 bits, so it does not grow

Directory (i = 2):
00 → bucket {0000, 0001}, j = 2
01 → bucket {0111}, j = 2
10 → bucket {1001, 1010}, j = 2
11 → bucket {1100}, j = 2



Example continued

Insert: 1001

• Bucket {1001, 1010}, j = 2 is full, so it is split using 3 bits: {1001, 1001}, j = 3 and {1010}, j = 3
• j = 3 exceeds i = 2, so the directory doubles to i = 3

New directory (i = 3):
000 → bucket {0000, 0001}, j = 2
001 → same bucket as 000
010 → bucket {0111}, j = 2
011 → same bucket as 010
100 → bucket {1001, 1001}, j = 3
101 → bucket {1010}, j = 3
110 → bucket {1100}, j = 2
111 → same bucket as 110

• Initially: i = 1 and 2 buckets
• Now: i = 3 and 5 buckets
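
The whole insertion example can be replayed with a compact sketch of an extensible hash index. It is an illustration under assumptions, not a definitive implementation: hash values are 4-bit integers, each bucket holds 2 records, a bucket that can no longer be split simply grows past its capacity (standing in for an overflow block), and the class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # j: number of bits used by this bucket
        self.keys = []


class ExtensibleHash:
    def __init__(self, bits=4, capacity=2):
        self.bits = bits          # b: bits output by the hash function
        self.capacity = capacity  # records per bucket/block
        self.i = 1                # bits used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        return key >> (self.bits - self.i)  # first i bits of the hash value

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        while len(bucket.keys) >= self.capacity:
            if bucket.depth == self.bits:
                break  # cannot split further; models an overflow block
            if bucket.depth == self.i:
                # Double the directory: each entry is duplicated.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)
            bucket = self.directory[self._slot(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        bucket.depth += 1
        new = Bucket(bucket.depth)
        shift = self.bits - bucket.depth
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:  # redistribute on the newly used bit
            (new if (k >> shift) & 1 else bucket).keys.append(k)
        for slot, b in enumerate(self.directory):
            # Entries whose newly used bit is 1 now point to the new bucket.
            if b is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = new


index = ExtensibleHash()
for key in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001):
    index.insert(key)

for slot, b in enumerate(index.directory):
    keys = [format(k, "04b") for k in sorted(b.keys)]
    print(format(slot, "03b"), keys, "j =", b.depth)
```

The first three inserts rebuild the initial state (i = 1, 2 buckets); the remaining four follow the example and end with i = 3 and 5 distinct buckets.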



Extensible Hash Index - Deletions

Two options:
• No merging of blocks
• Merge blocks and cut the directory if possible (reverse of the insert procedure)



Extensible Hash Index Still Needs Overflow Chains

Many records with duplicate keys:
• Bucket {1100, 1100}, j = 1 is full; insert 1100
• If we split (j = 2), all three copies of 1100 still hash to the same bucket
• Splitting it further does not solve the problem



Extensible Hash Index Still Needs Overflow Chains

Insert 1100, add overflow block:
• Before: bucket {1100, 1100}, j = 1
• After: bucket {1100, 1100}, j = 1 → overflow block {1100}

Question:
Can you think of another scenario where an
overflow block is not avoidable?
Summary of Indexes

Conventional Index
• Simple
• Could involve many I/Os

B+ Tree Index
• Search query and range query
• Few I/Os (height of the tree)

Hash Index
• Search query only
• 1 I/O (if no overflow chains), assuming the directory is in main memory
Create Indexes in SQL

CREATE INDEX name ON relation (attribute)


Duplicate values are allowed

CREATE UNIQUE INDEX name ON relation (attribute)


Duplicate values are not allowed

DROP INDEX name;

Note: The syntax for creating indexes varies amongst different databases.
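
The three statements can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite stands in for the DBMS, and the relation `student` and the index names `idx_name` and `idx_id` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")

# Plain index: duplicate values are allowed.
conn.execute("CREATE INDEX idx_name ON student (name)")
# Unique index: duplicate values are not allowed.
conn.execute("CREATE UNIQUE INDEX idx_id ON student (id)")

conn.execute("INSERT INTO student VALUES (1, 'Ann')")
conn.execute("INSERT INTO student VALUES (2, 'Ann')")  # duplicate name is fine

try:
    conn.execute("INSERT INTO student VALUES (1, 'Bob')")  # duplicate id
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX idx_name")
remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]

print(duplicate_rejected, remaining)
```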



Create Indexes in SQL

• Can we specify index techniques, e.g., B+-tree or hashing?
o Some DBMSs do not support this, e.g., SQL Server
o Some DBMSs support this, e.g., MySQL, PostgreSQL
• But you can always specify the sets of attributes on which you want to build indexes



Create Indexes in SQL

• Benefits. An index on an attribute may speed up the execution of queries in which a value or a range of values is specified for the attribute, and may also help joins involving that attribute
• Costs. It makes insertions, deletions, and updates slower



Recap
• Hash Index
• Extensible Hash Index
• Summary of Indexes and Index Creation



Questions?

Next lecture:
Lecture 10: Multiple Key Index
