CS 245: Database System Principles: Notes 4: Indexing

You might also like

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 156

CS 245: Database System

Principles

Notes 4: Indexing

Hector Garcia-Molina

CS 245 Notes 4 1
Chapter 4

Indexing & Hashing


record
value ? value

CS 245 Notes 4 2
Topics

• Conventional indexes
• B-trees
• Hashing schemes

CS 245 Notes 4 3
Sequential File
10
20
30
40
50
60
70
80
90
100

CS 245 Notes 4 4
Dense Index Sequential File

10 10
20 20
30
40
30
40
50
60 50
70 60
80
70
90 80
100 90
110 100
120

CS 245 Notes 4 5
Sparse Index Sequential File

10 10
30 20
50
70
30
40
90
110 50
130 60
150
70
170 80
190 90
210 100
230

CS 245 Notes 4 6
Sparse 2nd level Sequential File

10 10 10
90 30 20
170 50
250 70
30
40
90
330 50
110
410 60
130
490
150
570 70
170 80
190 90
210 100
230

CS 245 Notes 4 7
• Comment:
{FILE,INDEX} may be contiguous
or not (blocks chained)

CS 245 Notes 4 8
Question:

• Can we build a dense, 2nd level index


for a dense index?

CS 245 Notes 4 9
Notes on pointers:

(1) Block pointer (sparse index) can be


smaller than record pointer

BP

RP

CS 245 Notes 4 10
Notes on pointers:
(2) If file is contiguous, then we can omit
pointers (i.e., compute them)

CS 245 Notes 4 11
R1

K1 R2
K2
K3 R3

K4
R4

CS 245 Notes 4 12
R1

K1 R2 say:
K2 1024 B
per block
K3 R3

K4
R4

• if we want K3 block:
get it at offset
(3-1)1024
= 2048 bytes

CS 245 Notes 4 13
Sparse vs. Dense Tradeoff

• Sparse: Less index space per record


can keep more of index in memory
• Dense: Can tell if any record exists
without accessing file

(Later:
– sparse better for insertions
– dense needed for secondary indexes)

CS 245 Notes 4 14
Terms
• Index sequential file
• Search key (  primary key)
• Primary index (on Sequencing field)
• Secondary index
• Dense index (all Search Key values in)
• Sparse index
• Multi-level index

CS 245 Notes 4 15
Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

CS 245 Notes 4 16
Duplicate keys

10
10
10
20
20
30
30
30
40
45
CS 245 Notes 4 17
Duplicate keys
Dense index, one way to implement?
10
10 10
10
10 10
20 20

20 20
30 30
30 30
30 30
40
45
CS 245 Notes 4 18
Duplicate keys
Dense index, better way?
10
10 10
20
30 10
40 20
20
30
30
30
40
45
CS 245 Notes 4 19
Duplicate keys
Sparse index, one way?
10
10 10
10
20 10
30 20
20
30
30
30
40
45
CS 245 Notes 4 20
Duplicate keys
Sparse index, one way?
10
careful if looking

10 10
for 20 or 30!

10
20 10
30 20
20
30
30
30
40
45
CS 245 Notes 4 21
Duplicate keys
Sparse index, another way?
– place first new key from block
10
10 10
20
30 10
30 20
20
30
30
30
40
45
CS 245 Notes 4 22
Duplicate keys
Sparse index, another way?
– place first new key from block
10
should 10 10
20
this be 30 10
40? 30 20
20
30
30
30
40
45
CS 245 Notes 4 23
Summary Duplicate values,
primary index
• Index may point to first instance of
each value only
File
Index a
a a
.
.

b
CS 245 Notes 4 24
Deletion from sparse index

10
10 20
30
50 30
70 40

90 50
110 60
130 70
150 80

CS 245 Notes 4 25
Deletion from sparse index
– delete record 40
10
10 20
30
50 30
70 40

90 50
110 60
130 70
150 80

CS 245 Notes 4 26
Deletion from sparse index
– delete record 40
10
10 20
30
50 30
70 40

90 50
110 60
130 70
150 80

CS 245 Notes 4 27
Deletion from sparse index
– delete record 30
10
10 20
30
50 30
70 40

90 50
110 60
130 70
150 80

CS 245 Notes 4 28
Deletion from sparse index
– delete record 30
10
10 20
40 30
50 30 40
70 40

90 50
110 60
130 70
150 80

CS 245 Notes 4 29
Deletion from sparse index
– delete records 30 & 40
10
10 20
30
50 30
70 40

90 50
110 60
130 70
150 80

CS 245 Notes 4 30
Deletion from sparse index
– delete records 30 & 40
10
10 20
30
50 30
70 40

90 50
110 60
130 70
150 80

CS 245 Notes 4 31
Deletion from sparse index
– delete records 30 & 40
10
10 20
50 30
70 50 30
70 40

90 50
110 60
130 70
150 80

CS 245 Notes 4 32
Deletion from dense index

10
10 20
20
30 30
40 40

50 50
60 60
70 70
80 80

CS 245 Notes 4 33
Deletion from dense index
– delete record 30
10
10 20
20
30 30
40 40

50 50
60 60
70 70
80 80

CS 245 Notes 4 34
Deletion from dense index
– delete record 30
10
10 20
20
30 30 40
40 40

50 50
60 60
70 70
80 80

CS 245 Notes 4 35
Deletion from dense index
– delete record 30
10
10 20
20
40 30 30 40
40 40

50 50
60 60
70 70
80 80

CS 245 Notes 4 36
Insertion, sparse index case

10
10 20
30
40 30
60

40
50
60

CS 245 Notes 4 37
Insertion, sparse index case
– insert record 34
10
10 20
30
40 30
60

40
50
60

CS 245 Notes 4 38
Insertion, sparse index case
– insert record 34
10
10 20
30
40 30
60 34
40
50
• our lucky day! 60
we have free space
where we need it!

CS 245 Notes 4 39
Insertion, sparse index case
– insert record 15
10
10 20
30
40 30
60

40
50
60

CS 245 Notes 4 40
Insertion, sparse index case
– insert record 15
10
10 20 15
20 30
40 30 20
60 30
40
50
60

CS 245 Notes 4 41
Insertion, sparse index case
– insert record 15
10
10 20 15
20 30
40 30 20
60 30
40
50
• Illustrated: Immediate reorganization 60
• Variation:
– insert new block (chained file)
– update index

CS 245 Notes 4 42
Insertion, sparse index case
– insert record 25
10
10 20
30
40 30
60

40
50
60

CS 245 Notes 4 43
Insertion, sparse index case
– insert record 25
10 25
10 20
30
40 30 overflow blocks
60 (reorganize later...)
40
50
60

CS 245 Notes 4 44
Insertion, dense index case

• Similar
• Often more expensive . . .

CS 245 Notes 4 45
Secondary indexes
Sequence
field

30
50
20
70
80
40
100
10
90
60

CS 245 Notes 4 46
Secondary indexes
Sequence
• Sparse index field

30
30
20
50
80 20
100 70
90 80
... 40
100
10
90
60

CS 245 Notes 4 47
Secondary indexes
Sequence
• Sparse index field

30
30
20
50
80 20
100 70
90 80
... 40
100
10
does not make sense! 90
60

CS 245 Notes 4 48
Secondary indexes
Sequence
• Dense index field

30
50
20
70
80
40
100
10
90
60

CS 245 Notes 4 49
Secondary indexes
Sequence
• Dense index field

10 30
20 50
30
40 20
70
50
60
80
70
40
... 100
10
90
60

CS 245 Notes 4 50
Secondary indexes
Sequence
• Dense index field

10 30
20 50
10 30
40 20
50
70
90
... 50
60
80
sparse 70
40
high ... 100
level 10
90
60

CS 245 Notes 4 51
With secondary indexes:
• Lowest level is dense
• Other levels are sparse

Also: Pointers are record pointers


(not block pointers; not computed)

CS 245 Notes 4 52
Duplicate values & secondary indexes
20
10
20
40
10
40
10
40
30
40

CS 245 Notes 4 53
Duplicate values & secondary indexes
one option...
10 20
10 10
10
20 20
40
20
30 10
40 40
40
10
40 40
40
... 30
40

CS 245 Notes 4 54
Duplicate values & secondary indexes
one option...
10 20
10 10
10
20
Problem: 20
40
20
excess overhead! 30 10
• disk space 40 40
40
• search time 10
40 40
40
... 30
40

CS 245 Notes 4 55
Duplicate values & secondary indexes
another option...
10 20
10

20
20
40
10
30
40
40 10
40
30
40

CS 245 Notes 4 56
Duplicate values & secondary indexes
another option...
10 20
10
20
Problem: 20
40
variable size 10
records in 30
40
index! 40 10
40
30
40

CS 245 Notes 4 57
Duplicate values & secondary indexes
20 
10 10
20
30 20
40 40

50 10
60 40
...
10 
40
Another idea (suggested in class): 30 
Chain records with same key?
40 

CS 245 Notes 4 58
Duplicate values & secondary indexes
20 
10 10
20
30 20
40 40

50 10
60 40
...
10 
40
Another idea (suggested in class): 30 
Chain records with same key?
40 
Problems:
• Need to add fields to records
• Need to follow chain to know records

CS 245 Notes 4 59
Duplicate values & secondary indexes
20
10 10
20
30 20
40 40
50 10
60 40
...
10
40
30
40
buckets
CS 245 Notes 4 60
Why “bucket” idea is useful

Indexes Records
Name: primary EMP (name,dept,floor,...)
Dept: secondary
Floor: secondary

CS 245 Notes 4 61
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index EMP Floor index

Toy 2nd

CS 245 Notes 4 62
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index EMP Floor index

Toy 2nd

 Intersect toy bucket and 2nd Floor


bucket to get set of matching EMP’s
CS 245 Notes 4 63
This idea used in
text information retrieval

Documents
...the cat is
fat ...

...was raining
cats and dogs...

...Fido the
dog ...

CS 245 Notes 4 64
This idea used in
text information retrieval

Documents
cat
...the cat is
fat ...
dog
...was raining
cats and dogs...

...Fido the
dog ...
Inverted lists

CS 245 Notes 4 65
IR QUERIES
• Find articles with “cat” and “dog”
• Find articles with “cat” or “dog”
• Find articles with “cat” and not “dog”

CS 245 Notes 4 66
IR QUERIES
• Find articles with “cat” and “dog”
• Find articles with “cat” or “dog”
• Find articles with “cat” and not “dog”

• Find articles with “cat” in title


• Find articles with “cat” and “dog”
within 5 words

CS 245 Notes 4 67
Common technique:
more info in inverted list

d1
cat Title 5
Author 10
Abstract 57
d3 d2

dog Title 100


Title 12

CS 245 Notes 4 68
Posting: an entry in inverted list.
Represents occurrence of
term in article
Size of a list: 1 Rare words or
(in postings) miss-spellings

106 Common words

Size of a posting: 10-15 bits (compressed)

CS 245 Notes 4 69
IR DISCUSSION

• Stop words
• Truncation
• Thesaurus
• Full text vs. Abstracts
• Vector model

CS 245 Notes 4 70
Vector space model

w1 w2 w3 w4 w5 w6 w7 …
DOC = <1 0 0 1 1 0 0 …>

Query= <0 0 1 1 0 0 0 …>

CS 245 Notes 4 71
Vector space model

w1 w2 w3 w4 w5 w6 w7 …
DOC = <1 0 0 1 1 0 0 …>

Query= <0 0 1 1 0 0 0 …>

PRODUCT = 1 + ……. = score

CS 245 Notes 4 72
• Tricks to weigh scores + normalize

e.g.: Match on common word not as


useful as match on rare words...

CS 245 Notes 4 73
• How to process V.S. Queries?

w1 w2 w3 w4 w5 w6 …
Q=<0 0 0 1 1 0 …>

CS 245 Notes 4 74
• Try Stanford Libraries
• Try Google, Yahoo, ...

CS 245 Notes 4 75
Summary so far

• Conventional index
– Basic Ideas: sparse, dense, multi-level…
– Duplicate Keys
– Deletion/Insertion
– Secondary indexes
– Buckets of Postings List

CS 245 Notes 4 76
Conventional indexes
Advantage:
- Simple
- Index is sequential file
good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

CS 245 Notes 4 77
Example Index (sequential)
10
20
30
continuous
40
50
60
free space
70
80
90

CS 245 Notes 4 78
Example Index (sequential)
10
20 39
30 31
33 35
continuous 36
40
50
60 32
38
free space 34
70
80
90 overflow area
(not sequential)

CS 245 Notes 4 79
Outline:

• Conventional indexes
• B-Trees  NEXT
• Hashing schemes

CS 245 Notes 4 80
• NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”

CS 245 Notes 4 81
3
5

CS 245
11

30
30
35

100
101
B+Tree Example

110
100

Notes 4
Root

120
130

150
156 120
179 150
180
180
n=3

200
82
Sample non-leaf

57

81

95
to keys to keys to keys to keys
< 57 57 k<81 81k<95 95

CS 245 Notes 4 83
Sample leaf node:
From non-leaf node

to next leaf
in sequence
57

81

95
with key 57

with key 81

with key 85
To record

To record

To record

CS 245 Notes 4 84
In textbook’s notation n=3

Leaf:
30 35
30
35

Non-leaf:
30
30

CS 245 Notes 4 85
Size of nodes: n+1 pointers
(fixed)
n keys

CS 245 Notes 4 86
Don’t want nodes to be too empty

• Use at least

Non-leaf: (n+1)/2 pointers

Leaf: (n+1)/2 pointers to data

CS 245 Notes 4 87
n=3

Full node min. node

Non-leaf

120
150
180

30

counts even if null


Leaf 11

30
35
3
5

CS 245 Notes 4 88
B+tree rules tree of order n

(1) All leaves at same lowest level


(balanced tree)
(2) Pointers in leaves point to records
except for “sequence pointer”

CS 245 Notes 4 89
(3) Number of pointers/keys for B+tree

Max Max Min Min


ptrs keys ptrsdata keys
Non-leaf n+1 n (n+1)/2 (n+1)/2- 1
(non-root)
Leaf n+1 n (n+1)/2 (n+1)/2
(non-root)
Root n+1 n 1 1

CS 245 Notes 4 90
Insert into B+tree

(a) simple case


– space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

CS 245 Notes 4 91
(a) Insert key = 32 n=3

100
30
11

30
31
3
5

CS 245 Notes 4 92
(a) Insert key = 32 n=3

100
30
11

30
31
32
3
5

CS 245 Notes 4 93
(a) Insert key = 7 n=3

100
30
11

30
31
3
5

CS 245 Notes 4 94
(a) Insert key = 7 n=3

100
30
57
11

30
31
3
5

CS 245 Notes 4 95
(a) Insert key = 7 n=3

100
30
7
57
11

30
31
3
5

CS 245 Notes 4 96
(c) Insert key = 160 n=3

100

120
150
180

180
200
150
156
179

CS 245 Notes 4 97
(c) Insert key = 160 n=3

100

120
150
180

180
200
150
156
179

160
179
CS 245 Notes 4 98
(c) Insert key = 160 n=3

100

120
150
180

180

180
200
150
156
179

160
179
CS 245 Notes 4 99
(c) Insert key = 160 n=3

160
100

120
150
180

180

180
200
150
156
179

160
179
CS 245 Notes 4 100
(d) New root, insert 45 n=3

10
20
30
12
10

20
25

30
32
40
1
2
3

CS 245 Notes 4 101


(d) New root, insert 45 n=3

10
20
30
12
10

20
25

30
32
40

40
45
1
2
3

CS 245 Notes 4 102


(d) New root, insert 45 n=3

40
10
20
30
12
10

20
25

30
32
40

40
45
1
2
3

CS 245 Notes 4 103


(d) New root, insert 45 n=3

new root

30

40
10
20
30
12
10

20
25

30
32
40

40
45
1
2
3

CS 245 Notes 4 104


Deletion from B+tree

(a) Simple case - no example


(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

CS 245 Notes 4 105


(b) Coalesce with sibling
n=4
– Delete 50

100
10
40
10
20
30

40
50

CS 245 Notes 4 106


(b) Coalesce with sibling
n=4
– Delete 50

100
10
40
40
10
20
30

40
50

CS 245 Notes 4 107


(c) Redistribute keys
n=4
– Delete 50

100
10
40
10
20
30
35

40
50
CS 245 Notes 4 108
(c) Redistribute keys
n=4
– Delete 50

40 35
100
10

35
10
20
30
35

40
50
CS 245 Notes 4 109
(d) Non-leaf coalese
n=4
– Delete 37

25
10
20

30
40

37
30
26
10
14

20
22

25

40
45
1
3

CS 245 Notes 4 110


(d) Non-leaf coalese
n=4
– Delete 37

25
10
20

30
40
30

37
30
26
10
14

20
22

25

40
45
1
3

CS 245 Notes 4 111


(d) Non-leaf coalese
n=4
– Delete 37

25
40
10
20

30
40
30

37
30
26
10
14

20
22

25

40
45
1
3

CS 245 Notes 4 112


(d) Non-leaf coalese
n=4
– Delete 37

25
new root

40
25
10
20

30
40
30

37
30
26
10
14

20
22

25

40
45
1
3

CS 245 Notes 4 113


B+tree deletions in practice

– Often, coalescing is not implemented


– Too hard and not worth it!

CS 245 Notes 4 114


Comparison: B-trees vs. static
indexed sequential file
Ref #1: Held & Stonebraker
“B-Trees Re-examined”
CACM, Feb. 1978

CS 245 Notes 4 115


Ref # 1 claims:
- Concurrency control harder in B-Trees
- B-tree consumes more space

For their comparison:


block = 512 bytes
key = pointer = 4 bytes
4 data records per block

CS 245 Notes 4 116


Example: 1 block static index
k1
1 data
block
k1 k2

127 keys k2
k3 k3

(127+1)4 = 512 Bytes


-> pointers in index implicit! up to 127
blocks

CS 245 Notes 4 117


Example: 1 block B-tree
k1 k1
1 data
block
k2
k2
...
63 keys
k63
k3
-
next
63x(4+4)+8 = 512 Bytes
-> pointers needed in B-tree up to 63
blocks because index is blocks
not contiguous
CS 245 Notes 4 118
Size comparison Ref. #1

Static Index B-tree


# data # data
blocks height blocks height

2 -> 127 2 2 -> 63 2


128 -> 16,129 3 64 -> 3968 3
16,130 -> 2,048,383 4 3969 -> 250,047 4
250,048 -> 15,752,961 5

CS 245 Notes 4 119


Ref. #1 analysis claims
• For an 8,000 block file,
after 32,000 inserts
after 16,000 lookups
 Static index saves enough accesses
to allow for reorganization

CS 245 Notes 4 120


Ref. #1 analysis claims
• For an 8,000 block file,
after 32,000 inserts
after 16,000 lookups
 Static index saves enough accesses
to allow for reorganization

Ref. #1 conclusion Static index better!!

CS 245 Notes 4 121


Ref #2: M. Stonebraker,
“Retrospective on a database
system,” TODS, June 1980

Ref. #2 conclusion B-trees better!!

CS 245 Notes 4 122


Ref. #2 conclusion B-trees better!!

• DBA does not know when to reorganize


• DBA does not know how full to load
pages of new index

CS 245 Notes 4 123


Ref. #2 conclusion B-trees better!!

• Buffering
– B-tree: has fixed buffer requirements
– Static index: must read several overflow
blocks to be efficient
(large & variable size
buffers needed for this)

CS 245 Notes 4 124


• Speaking of buffering…
Is LRU a good policy for B+tree buffers?

CS 245 Notes 4 125


• Speaking of buffering…
Is LRU a good policy for B+tree buffers?

 Of course not!
 Should try to keep root in memory
at all times
(and perhaps some nodes from second level)

CS 245 Notes 4 126


Interesting problem:
For B+tree, how large should n be?

n is number of keys / node

CS 245 Notes 4 127


Sample assumptions:
(1) Time to read node from disk is
(S+Tn) msec.

CS 245 Notes 4 128


Sample assumptions:
(1) Time to read node from disk is
(S+Tn) msec.
(2) Once block in memory, use binary
search to locate key:
(a + b LOG2 n) msec.
For some constants a,b; Assume a << S

CS 245 Notes 4 129


Sample assumptions:
(1) Time to read node from disk is
(S+Tn) msec.
(2) Once block in memory, use binary
search to locate key:
(a + b LOG2 n) msec.
For some constants a,b; Assume a << S

(3) Assume B+tree is full, i.e.,


# nodes to examine is LOGn N
where N = # records
CS 245 Notes 4 130
Can get:
f(n) = time to find a record

f(n)

nopt n

CS 245 Notes 4 131


 FIND nopt by f’(n) = 0

Answer is nopt = “few hundred”


(see homework for details)

CS 245 Notes 4 132


 FIND nopt by f’(n) = 0

Answer is nopt = “few hundred”


(see homework for details)

 What happens to nopt as


• Disk gets faster?
• CPU get faster?

CS 245 Notes 4 133


Variation on B+tree: B-tree (no +)

• Idea:
– Avoid duplicate keys
– Have record pointers in non-leaf nodes

CS 245 Notes 4 134


K1 P1 K2 P2 K3 P3

to record to record to record


with K1 with K2 with K3
to keys to keys to keys to keys
< K1 K1<x<K2 K2<x<k3 >k3

CS 245 Notes 4 135


10

CS 245
20
30
40
25
50
45
60
B-tree example

70
80
90 85 65

Notes 4
100 105 125
110
120
130
140 145
150 165
160
n=2

170
136

180
B-tree example n=2
• sequence pointers
not useful now!

125
(but keep space for simplicity)

65
105

145
165
25
45

85

110

160
100

120
130
140
150

170
180
20
10

30
40
50
60
70
80
90

CS 245 Notes 4 137


Note on inserts
• Say we insert record with key = 25
n=3
leaf

30
10
20

CS 245 Notes 4 138


Note on inserts
• Say we insert record with key = 25
n=3
leaf

30
10
20
• Afterwards:
20


10

25
30
CS 245 Notes 4 139
So, for B-trees:

MAX MIN
Tree Rec Keys Tree Rec Keys
Ptrs Ptrs Ptrs Ptrs
Non-leaf
non-root n+1 n n (n+1)/2 (n+1)/2-1 (n+1)/2-1
Leaf
non-root 1 n n 1 n/2 n/2
Root
non-leaf n+1 n n 2 1 1
Root
Leaf 1 n n 1 1 1

CS 245 Notes 4 140


Tradeoffs:
 B-trees have faster lookup than B+trees

 in B-tree, non-leaf & leaf different sizes


 in B-tree, deletion more complicated

CS 245 Notes 4 141


Tradeoffs:
 B-trees have faster lookup than B+trees

 in B-tree, non-leaf & leaf different sizes


 in B-tree, deletion more complicated

 B+trees preferred!

CS 245 Notes 4 142


But note:
• If blocks are fixed size
(due to disk and buffering restrictions)
Then lookup for B+tree is
actually better!!

CS 245 Notes 4 143


Example:
- Pointers 4 bytes
- Keys 4 bytes
- Blocks 100 bytes (just example)
- Look at full 2 level tree

CS 245 Notes 4 144


B-tree:
Root has 8 keys + 8 record pointers
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes

CS 245 Notes 4 145


B-tree:
Root has 8 keys + 8 record pointers
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes

Each of 9 sons: 12 rec. pointers (+12 keys)


= 12x(4+4) + 4 = 100 bytes

CS 245 Notes 4 146


B-tree:
Root has 8 keys + 8 record pointers
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes

Each of 9 sons: 12 rec. pointers (+12 keys)


= 12x(4+4) + 4 = 100 bytes

2-level B-tree, Max # records =


12x9 + 8 = 116

CS 245 Notes 4 147


B+tree:

Root has 12 keys + 13 son pointers


= 12x4 + 13x4 = 100 bytes

CS 245 Notes 4 148


B+tree:

Root has 12 keys + 13 son pointers


= 12x4 + 13x4 = 100 bytes

Each of 13 sons: 12 rec. ptrs (+12 keys)


= 12x(4 +4) + 4 = 100 bytes

CS 245 Notes 4 149


B+tree:

Root has 12 keys + 13 son pointers


= 12x4 + 13x4 = 100 bytes

Each of 13 sons: 12 rec. ptrs (+12 keys)


= 12x(4 +4) + 4 = 100 bytes

2-level B+tree, Max # records


= 13x12 = 156

CS 245 Notes 4 150


So... 8 records

B+ B

ooooooooooooo ooooooooo
156 records 108 records
Total = 116

CS 245 Notes 4 151


So... 8 records

B+ B

ooooooooooooo ooooooooo
156 records 108 records
Total = 116
• Conclusion:
– For fixed block size,
– B+ tree is better because it is bushier
CS 245 Notes 4 152
An Interesting Problem...
• What is a good index structure when:
– records tend to be inserted with keys
that are larger than existing values?
(e.g., banking records with growing data/time)
– we want to remove older data

CS 245 Notes 4 153


One Solution: Multiple Indexes
• Example: I1, I2
day days indexed days indexed
I1 I2
10 1,2,3,4,5 6,7,8,9,10
11 11,2,3,4,5 6,7,8,9,10
12 11,12,3,4,5 6,7,8,9,10
13 11,12,13,4,5 6,7,8,9,10

•advantage: deletions/insertions from smaller index


•disadvantage: query multiple indexes
CS 245 Notes 4 154
Another Solution (Wave Indexes)
day I1 I2 I3 I4
10 1,2,3 4,5,6 7,8,9 10
11 1,2,3 4,5,6 7,8,9 10,11
12 1,2,3 4,5,6 7,8,9 10,11, 12
13 13 4,5,6 7,8,9 10,11, 12
14 13,14 4,5,6 7,8,9 10,11, 12
15 13,14,15 4,5,6 7,8,9 10,11, 12
16 13,14,15 16 7,8,9 10,11, 12

•advantage: no deletions
•disadvantage: approximate windows

CS 245 Notes 4 155


Outline/summary
• Conventional Indexes
• Sparse vs. dense
• Primary vs. secondary
• B trees
• B+trees vs. B-trees
• B+trees vs. indexed sequential
• Hashing schemes --> Next

CS 245 Notes 4 156

You might also like