CS 245: Database System Principles: Notes 4: Indexing

CS 245: Database System
Principles
Notes 4: Indexing
Hector Garcia-Molina
CS 245 Notes 4 1
Chapter 4
Indexing & Hashing

record
value ? value
CS 245 Notes 4 2
Topics
• Conventional indexes
• B-trees
• Hashing schemes
CS 245 Notes 4 3
Sequential File
10
20
30
40
50
60
70
80
90
100
CS 245 Notes 4 4
Dense Index Sequential File
10 10
20 20
30
40
30
40
50
60 50
70 60
80
70
90 80
100 90
110 100
120
CS 245 Notes 4 5
Sparse Index Sequential File
10 10
30 20
50
70
30
40
90
110 50
130 60
150
70
170 80
190 90
210 100
230
CS 245 Notes 4 6
Sparse 2nd level Sequential File
10 10 10
90 30 20
170 50
250 70
30
40
90
330 50
110
410 60
130
490
150
570 70
170 80
190 90
210 100
230
CS 245 Notes 4 7
• Comment:
{FILE,INDEX} may be contiguous
or not (blocks chained)
CS 245 Notes 4 8
Question:
• Can we build a dense, 2nd level index

for a dense index?
CS 245 Notes 4 9
Notes on pointers:
(1) Block pointer (sparse index) can be

smaller than record pointer
BP
RP
CS 245 Notes 4 10
Notes on pointers:
(2) If file is contiguous, then we can omit
pointers (i.e., compute them)
CS 245 Notes 4 11
R1
K1 R2
K2
K3 R3
K4
R4
CS 245 Notes 4 12
R1
K1 R2 say:
K2 1024 B
per block
K3 R3
K4
R4
• if we want K3 block:
get it at offset
(3-1)1024
= 2048 bytes
CS 245 Notes 4 13
Sparse vs. Dense Tradeoff
• Sparse: Less index space per record

can keep more of index in memory
• Dense: Can tell if any record exists
without accessing file
(Later:
– sparse better for insertions
– dense needed for secondary indexes)
CS 245 Notes 4 14
Terms
• Index sequential file
• Search key (  primary key)
• Primary index (on Sequencing field)
• Secondary index
• Dense index (all Search Key values in)
• Sparse index
• Multi-level index
CS 245 Notes 4 15
Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes
CS 245 Notes 4 16
Duplicate keys
10
10
10
20
20
30
30
30
40
45
CS 245 Notes 4 17
Duplicate keys
Dense index, one way to implement?
10
10 10
10
10 10
20 20
20 20
30 30
30 30
30 30
40
45
CS 245 Notes 4 18
Duplicate keys
Dense index, better way?
10
10 10
20
30 10
40 20
20
30
30
30
40
45
CS 245 Notes 4 19
Duplicate keys
Sparse index, one way?
10
10 10
10
20 10
30 20
20
30
30
30
40
45
CS 245 Notes 4 20
Duplicate keys
Sparse index, one way?
10
careful if looking
10 10
for 20 or 30!
10
20 10
30 20
20
30
30
30
40
45
CS 245 Notes 4 21
Duplicate keys
Sparse index, another way?
– place first new key from block
10
10 10
20
30 10
30 20
20
30
30
30
40
45
CS 245 Notes 4 22
Duplicate keys
Sparse index, another way?
– place first new key from block
10
should 10 10
20
this be 30 10
40? 30 20
20
30
30
30
40
45
CS 245 Notes 4 23
Summary Duplicate values,
primary index
• Index may point to first instance of
each value only
File
Index a
a a
.
.
b
CS 245 Notes 4 24
Deletion from sparse index
10
10 20
30
50 30
70 40
90 50
110 60
130 70
150 80
CS 245 Notes 4 25
– delete record 40
10
10 20
30
50 30
70 40
90 50
110 60
130 70
150 80
CS 245 Notes 4 26
10
10 20
30
50 30
70 40
90 50
110 60
130 70
150 80
CS 245 Notes 4 27
10
10 20
30
50 30
70 40
90 50
110 60
130 70
150 80
CS 245 Notes 4 28
10
10 20
40 30
50 30 40
70 40
90 50
110 60
130 70
150 80
CS 245 Notes 4 29
– delete records 30 & 40
10
10 20
30
50 30
70 40
90 50
110 60
130 70
150 80
CS 245 Notes 4 30
10
10 20
30
50 30
70 40
90 50
110 60
130 70
150 80
CS 245 Notes 4 31
10
10 20
50 30
70 50 30
70 40
90 50
110 60
130 70
150 80
CS 245 Notes 4 32
Deletion from dense index
10
10 20
20
30 30
40 40
50 50
60 60
70 70
80 80
CS 245 Notes 4 33
10
10 20
20
30 30
40 40
50 50
60 60
70 70
80 80
CS 245 Notes 4 34
10
10 20
20
30 30 40
40 40
50 50
60 60
70 70
80 80
CS 245 Notes 4 35
10
10 20
20
40 30 30 40
40 40
50 50
60 60
70 70
80 80
CS 245 Notes 4 36
Insertion, sparse index case
10
10 20
30
40 30
60
40
50
60
CS 245 Notes 4 37
– insert record 34
10
10 20
30
40 30
60
40
50
60
CS 245 Notes 4 38
10
10 20
30
40 30
60 34
40
50
• our lucky day! 60
we have free space
where we need it!
CS 245 Notes 4 39
10
10 20
30
40 30
60
40
50
60
CS 245 Notes 4 40
10
10 20 15
20 30
40 30 20
60 30
40
50
60
CS 245 Notes 4 41
10
10 20 15
20 30
40 30 20
60 30
40
50
• Illustrated: Immediate reorganization 60
• Variation:
– insert new block (chained file)
– update index
CS 245 Notes 4 42
10
10 20
30
40 30
60
40
50
60
CS 245 Notes 4 43
10 25
10 20
30
40 30 overflow blocks
60 (reorganize later...)
40
50
60
CS 245 Notes 4 44
Insertion, dense index case
• Similar
• Often more expensive . . .
CS 245 Notes 4 45
Secondary indexes
Sequence
field
30
50
20
70
80
40
100
10
90
60
CS 245 Notes 4 46
Secondary indexes
Sequence
• Sparse index field
30
30
20
50
80 20
100 70
90 80
... 40
100
10
90
60
CS 245 Notes 4 47
Secondary indexes
Sequence
• Sparse index field
30
30
20
50
80 20
100 70
90 80
... 40
100
10
does not make sense! 90
60
CS 245 Notes 4 48
Secondary indexes
Sequence
• Dense index field
30
50
20
70
80
40
100
10
90
60
CS 245 Notes 4 49
Secondary indexes
Sequence
10 30
20 50
30
40 20
70
50
60
80
70
40
... 100
10
90
60
CS 245 Notes 4 50
Secondary indexes
Sequence
10 30
20 50
10 30
40 20
50
70
90
... 50
60
80
sparse 70
40
high ... 100
level 10
90
60
CS 245 Notes 4 51
With secondary indexes:
• Lowest level is dense
• Other levels are sparse
Also: Pointers are record pointers

(not block pointers; not computed)
CS 245 Notes 4 52
Duplicate values & secondary indexes
20
10
20
40
10
40
10
40
30
40
CS 245 Notes 4 53
one option...
10 20
10 10
10
20 20
40
20
30 10
40 40
40
10
40 40
40
... 30
40
CS 245 Notes 4 54
one option...
10 20
10 10
10
20
Problem: 20
40
20
excess overhead! 30 10
• disk space 40 40
40
• search time 10
40 40
40
... 30
40
CS 245 Notes 4 55
another option...
10 20
10
20
20
40
10
30
40
40 10
40
30
40
CS 245 Notes 4 56
another option...
10 20
10
20
Problem: 20
40
variable size 10
records in 30
40
index! 40 10
40
30
40
CS 245 Notes 4 57
20 
10 10
20
30 20
40 40
50 10
60 40
...
10 
40
Another idea (suggested in class): 30 
Chain records with same key?
40 
CS 245 Notes 4 58
20 
10 10
20
30 20
40 40
50 10
60 40
...
10 
40
Another idea (suggested in class): 30 
Chain records with same key?
40 
Problems:
• Need to add fields to records
• Need to follow chain to know records
CS 245 Notes 4 59
20
10 10
20
30 20
40 40
50 10
60 40
...
10
40
30
40
buckets
CS 245 Notes 4 60
Why “bucket” idea is useful
Indexes Records
Name: primary EMP (name,dept,floor,...)
Dept: secondary
Floor: secondary
CS 245 Notes 4 61
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index EMP Floor index
Toy 2nd
CS 245 Notes 4 62
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index EMP Floor index
Toy 2nd
 Intersect toy bucket and 2nd Floor

bucket to get set of matching EMP’s
CS 245 Notes 4 63
This idea used in
text information retrieval
Documents
...the cat is
fat ...
...was raining
cats and dogs...
...Fido the
dog ...
CS 245 Notes 4 64
This idea used in
text information retrieval
Documents
cat
...the cat is
fat ...
dog
...was raining
cats and dogs...
...Fido the
dog ...
Inverted lists
CS 245 Notes 4 65
IR QUERIES
• Find articles with “cat” and “dog”
• Find articles with “cat” or “dog”
• Find articles with “cat” and not “dog”
CS 245 Notes 4 66
IR QUERIES
• Find articles with “cat” or “dog”
• Find articles with “cat” and not “dog”
• Find articles with “cat” in title

within 5 words
CS 245 Notes 4 67
Common technique:
more info in inverted list
d1
cat Title 5
Author 10
Abstract 57
d3 d2
dog Title 100

Title 12
CS 245 Notes 4 68
Posting: an entry in inverted list.
Represents occurrence of
term in article
Size of a list: 1 Rare words or
(in postings) miss-spellings
106 Common words
Size of a posting: 10-15 bits (compressed)
CS 245 Notes 4 69
IR DISCUSSION
• Stop words
• Truncation
• Thesaurus
• Full text vs. Abstracts
• Vector model
CS 245 Notes 4 70
Vector space model
w1 w2 w3 w4 w5 w6 w7 …
DOC = <1 0 0 1 1 0 0 …>
Query= <0 0 1 1 0 0 0 …>
CS 245 Notes 4 71
Vector space model
w1 w2 w3 w4 w5 w6 w7 …
DOC = <1 0 0 1 1 0 0 …>
Query= <0 0 1 1 0 0 0 …>
PRODUCT = 1 + ……. = score
CS 245 Notes 4 72
• Tricks to weigh scores + normalize
e.g.: Match on common word not as

useful as match on rare words...
CS 245 Notes 4 73
• How to process V.S. Queries?
w1 w2 w3 w4 w5 w6 …
Q=<0 0 0 1 1 0 …>
CS 245 Notes 4 74
• Try Stanford Libraries
• Try Google, Yahoo, ...
CS 245 Notes 4 75
Summary so far
• Conventional index
– Basic Ideas: sparse, dense, multi-level…
– Duplicate Keys
– Deletion/Insertion
– Secondary indexes
– Buckets of Postings List
CS 245 Notes 4 76
Conventional indexes
Advantage:
- Simple
- Index is sequential file
good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance
CS 245 Notes 4 77
Example Index (sequential)
10
20
30
continuous
40
50
60
free space
70
80
90
CS 245 Notes 4 78
Example Index (sequential)
10
20 39
30 31
33 35
continuous 36
40
50
60 32
38
free space 34
70
80
90 overflow area
(not sequential)
CS 245 Notes 4 79
Outline:
• Conventional indexes
• B-Trees  NEXT
• Hashing schemes
CS 245 Notes 4 80
• NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”
CS 245 Notes 4 81
3
5
CS 245
11
30
30
35
100
101
B+Tree Example
110
100
Notes 4
Root
120
130
150
156 120
179 150
180
180
n=3
200
82
Sample non-leaf
57
81
95
to keys to keys to keys to keys
< 57 57 k<81 81k<95 95
CS 245 Notes 4 83
Sample leaf node:
From non-leaf node
to next leaf
in sequence
57
81
95
with key 57
with key 81
with key 85
To record
To record
To record
CS 245 Notes 4 84
In textbook’s notation n=3
Leaf:
30 35
30
35
Non-leaf:
30
30
CS 245 Notes 4 85
Size of nodes: n+1 pointers
(fixed)
n keys
CS 245 Notes 4 86
Don’t want nodes to be too empty
• Use at least
Non-leaf: (n+1)/2 pointers
Leaf: (n+1)/2 pointers to data
CS 245 Notes 4 87
n=3
Full node min. node
Non-leaf
120
150
180
30
counts even if null

Leaf 11
30
35
3
5
CS 245 Notes 4 88
B+tree rules tree of order n
(1) All leaves at same lowest level

(balanced tree)
(2) Pointers in leaves point to records
except for “sequence pointer”
CS 245 Notes 4 89
(3) Number of pointers/keys for B+tree
Max Max Min Min

ptrs keys ptrsdata keys
Non-leaf n+1 n (n+1)/2 (n+1)/2- 1
(non-root)
Leaf n+1 n (n+1)/2 (n+1)/2
(non-root)
Root n+1 n 1 1
CS 245 Notes 4 90
Insert into B+tree
(a) simple case

– space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
CS 245 Notes 4 91
(a) Insert key = 32 n=3
100
30
11
30
31
3
5
CS 245 Notes 4 92
100
30
11
30
31
32
3
5
CS 245 Notes 4 93
100
30
11
30
31
3
5
CS 245 Notes 4 94
100
30
57
11
30
31
3
5
CS 245 Notes 4 95
100
30
7
57
11
30
31
3
5
CS 245 Notes 4 96
(c) Insert key = 160 n=3
100
120
150
180
180
200
150
156
179
CS 245 Notes 4 97
100
120
150
180
180
200
150
156
179
160
179
CS 245 Notes 4 98
100
120
150
180
180
180
200
150
156
179
160
179
CS 245 Notes 4 99
160
100
120
150
180
180
180
200
150
156
179
160
179
CS 245 Notes 4 100
(d) New root, insert 45 n=3
10
20
30
12
10
20
25
30
32
40
1
2
3
CS 245 Notes 4 101

10
20
30
12
10
20
25
30
32
40
40
45
1
2
3
CS 245 Notes 4 102

40
10
20
30
12
10
20
25
30
32
40
40
45
1
2
3
CS 245 Notes 4 103

new root
30
40
10
20
30
12
10
20
25
30
32
40
40
45
1
2
3
CS 245 Notes 4 104

Deletion from B+tree
(a) Simple case - no example

(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
CS 245 Notes 4 105

(b) Coalesce with sibling
n=4
– Delete 50
100
10
40
10
20
30
40
50
CS 245 Notes 4 106

(b) Coalesce with sibling
n=4
– Delete 50
100
10
40
40
10
20
30
40
50
CS 245 Notes 4 107

(c) Redistribute keys
n=4
– Delete 50
100
10
40
10
20
30
35
40
50
CS 245 Notes 4 108
(c) Redistribute keys
n=4
– Delete 50
40 35
100
10
35
10
20
30
35
40
50
CS 245 Notes 4 109
(d) Non-leaf coalese
n=4
– Delete 37
25
10
20
30
40
37
30
26
10
14
20
22
25
40
45
1
3
CS 245 Notes 4 110

n=4
– Delete 37
25
10
20
30
40
30
37
30
26
10
14
20
22
25
40
45
1
3
CS 245 Notes 4 111

n=4
– Delete 37
25
40
10
20
30
40
30
37
30
26
10
14
20
22
25
40
45
1
3
CS 245 Notes 4 112

n=4
– Delete 37
25
new root
40
25
10
20
30
40
30
37
30
26
10
14
20
22
25
40
45
1
3
CS 245 Notes 4 113

B+tree deletions in practice
– Often, coalescing is not implemented

– Too hard and not worth it!
CS 245 Notes 4 114

Comparison: B-trees vs. static
indexed sequential file
Ref #1: Held & Stonebraker
“B-Trees Re-examined”
CACM, Feb. 1978
CS 245 Notes 4 115

Ref # 1 claims:
- Concurrency control harder in B-Trees
- B-tree consumes more space
For their comparison:

block = 512 bytes
key = pointer = 4 bytes
4 data records per block
CS 245 Notes 4 116

Example: 1 block static index
k1
1 data
block
k1 k2
127 keys k2
k3 k3
(127+1)4 = 512 Bytes

-> pointers in index implicit! up to 127
blocks
CS 245 Notes 4 117

Example: 1 block B-tree
k1 k1
1 data
block
k2
k2
...
63 keys
k63
k3
-
next
63x(4+4)+8 = 512 Bytes
-> pointers needed in B-tree up to 63
blocks because index is blocks
not contiguous
CS 245 Notes 4 118
Size comparison Ref. #1
Static Index B-tree

# data # data
blocks height blocks height
2 -> 127 2 2 -> 63 2

128 -> 16,129 3 64 -> 3968 3
16,130 -> 2,048,383 4 3969 -> 250,047 4
250,048 -> 15,752,961 5
CS 245 Notes 4 119

Ref. #1 analysis claims
• For an 8,000 block file,
after 32,000 inserts
after 16,000 lookups
 Static index saves enough accesses
to allow for reorganization
CS 245 Notes 4 120

Ref. #1 analysis claims
• For an 8,000 block file,
after 32,000 inserts
after 16,000 lookups
 Static index saves enough accesses
to allow for reorganization
Ref. #1 conclusion Static index better!!
CS 245 Notes 4 121

Ref #2: M. Stonebraker,
“Retrospective on a database
system,” TODS, June 1980
Ref. #2 conclusion B-trees better!!
CS 245 Notes 4 122

• DBA does not know when to reorganize

• DBA does not know how full to load
pages of new index
CS 245 Notes 4 123

• Buffering
– B-tree: has fixed buffer requirements
– Static index: must read several overflow
blocks to be efficient
(large & variable size
buffers needed for this)
CS 245 Notes 4 124

• Speaking of buffering…
Is LRU a good policy for B+tree buffers?
CS 245 Notes 4 125

• Speaking of buffering…
Is LRU a good policy for B+tree buffers?
 Of course not!
 Should try to keep root in memory
at all times
(and perhaps some nodes from second level)
CS 245 Notes 4 126

Interesting problem:
For B+tree, how large should n be?
n is number of keys / node
CS 245 Notes 4 127

Sample assumptions:
(1) Time to read node from disk is
(S+Tn) msec.
CS 245 Notes 4 128

Sample assumptions:
(S+Tn) msec.
(2) Once block in memory, use binary
search to locate key:
(a + b LOG2 n) msec.
For some constants a,b; Assume a << S
CS 245 Notes 4 129

Sample assumptions:
(S+Tn) msec.
(2) Once block in memory, use binary
search to locate key:
(a + b LOG2 n) msec.
For some constants a,b; Assume a << S
(3) Assume B+tree is full, i.e.,

# nodes to examine is LOGn N
where N = # records
CS 245 Notes 4 130
Can get:
f(n) = time to find a record
f(n)
nopt n
CS 245 Notes 4 131

 FIND nopt by f’(n) = 0
Answer is nopt = “few hundred”

(see homework for details)
CS 245 Notes 4 132

 FIND nopt by f’(n) = 0
Answer is nopt = “few hundred”

(see homework for details)
 What happens to nopt as

• Disk gets faster?
• CPU get faster?
CS 245 Notes 4 133

Variation on B+tree: B-tree (no +)
• Idea:
– Avoid duplicate keys
– Have record pointers in non-leaf nodes
CS 245 Notes 4 134

K1 P1 K2 P2 K3 P3
to record to record to record

with K1 with K2 with K3
to keys to keys to keys to keys
< K1 K1<x<K2 K2<x<k3 >k3
CS 245 Notes 4 135

10
CS 245
20
30
40
25
50
45
60
B-tree example
70
80
90 85 65
Notes 4
100 105 125
110
120
130
140 145
150 165
160
n=2
170
136
180
B-tree example n=2
• sequence pointers
not useful now!
125
(but keep space for simplicity)
65
105
145
165
25
45
85
110
160
100
120
130
140
150
170
180
20
10
30
40
50
60
70
80
90
CS 245 Notes 4 137

Note on inserts
• Say we insert record with key = 25
n=3
leaf
30
10
20
CS 245 Notes 4 138

Note on inserts
• Say we insert record with key = 25
n=3
leaf
30
10
20
• Afterwards:
20
–
–
10
25
30
CS 245 Notes 4 139
So, for B-trees:
MAX MIN
Tree Rec Keys Tree Rec Keys
Ptrs Ptrs Ptrs Ptrs
Non-leaf
non-root n+1 n n (n+1)/2 (n+1)/2-1 (n+1)/2-1
Leaf
non-root 1 n n 1 n/2 n/2
Root
non-leaf n+1 n n 2 1 1
Root
Leaf 1 n n 1 1 1
CS 245 Notes 4 140

Tradeoffs:
 B-trees have faster lookup than B+trees
 in B-tree, non-leaf & leaf different sizes

 in B-tree, deletion more complicated
CS 245 Notes 4 141

Tradeoffs:
 B-trees have faster lookup than B+trees
 in B-tree, non-leaf & leaf different sizes

 in B-tree, deletion more complicated
 B+trees preferred!
CS 245 Notes 4 142

But note:
• If blocks are fixed size
(due to disk and buffering restrictions)
Then lookup for B+tree is
actually better!!
CS 245 Notes 4 143

Example:
- Pointers 4 bytes
- Keys 4 bytes
- Blocks 100 bytes (just example)
- Look at full 2 level tree
CS 245 Notes 4 144

B-tree:
Root has 8 keys + 8 record pointers
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
CS 245 Notes 4 145

B-tree:
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
Each of 9 sons: 12 rec. pointers (+12 keys)

= 12x(4+4) + 4 = 100 bytes
CS 245 Notes 4 146

B-tree:
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
Each of 9 sons: 12 rec. pointers (+12 keys)

= 12x(4+4) + 4 = 100 bytes
2-level B-tree, Max # records =

12x9 + 8 = 116
CS 245 Notes 4 147

B+tree:
Root has 12 keys + 13 son pointers

= 12x4 + 13x4 = 100 bytes
CS 245 Notes 4 148

B+tree:

= 12x4 + 13x4 = 100 bytes
Each of 13 sons: 12 rec. ptrs (+12 keys)

= 12x(4 +4) + 4 = 100 bytes
CS 245 Notes 4 149

B+tree:

= 12x4 + 13x4 = 100 bytes
Each of 13 sons: 12 rec. ptrs (+12 keys)

= 12x(4 +4) + 4 = 100 bytes
2-level B+tree, Max # records

= 13x12 = 156
CS 245 Notes 4 150

So... 8 records
B+ B
ooooooooooooo ooooooooo
156 records 108 records
Total = 116
CS 245 Notes 4 151

So... 8 records
B+ B
ooooooooooooo ooooooooo
156 records 108 records
Total = 116
• Conclusion:
– For fixed block size,
– B+ tree is better because it is bushier
CS 245 Notes 4 152
An Interesting Problem...
• What is a good index structure when:
– records tend to be inserted with keys
that are larger than existing values?
(e.g., banking records with growing data/time)
– we want to remove older data
CS 245 Notes 4 153

One Solution: Multiple Indexes
• Example: I1, I2
day days indexed days indexed
I1 I2
10 1,2,3,4,5 6,7,8,9,10
11 11,2,3,4,5 6,7,8,9,10
12 11,12,3,4,5 6,7,8,9,10
13 11,12,13,4,5 6,7,8,9,10
•advantage: deletions/insertions from smaller index

•disadvantage: query multiple indexes
CS 245 Notes 4 154
Another Solution (Wave Indexes)
day I1 I2 I3 I4
10 1,2,3 4,5,6 7,8,9 10
11 1,2,3 4,5,6 7,8,9 10,11
12 1,2,3 4,5,6 7,8,9 10,11, 12
13 13 4,5,6 7,8,9 10,11, 12
14 13,14 4,5,6 7,8,9 10,11, 12
15 13,14,15 4,5,6 7,8,9 10,11, 12
16 13,14,15 16 7,8,9 10,11, 12
•advantage: no deletions
•disadvantage: approximate windows
CS 245 Notes 4 155

Outline/summary
• Conventional Indexes
• Sparse vs. dense
• Primary vs. secondary
• B trees
• B+trees vs. B-trees
• B+trees vs. indexed sequential
• Hashing schemes --> Next
CS 245 Notes 4 156

CS 245: Database System Principles: Notes 4: Indexing

Uploaded by

Copyright:

Available Formats

You might also like

CS 245: Database System Principles: Notes 4: Indexing

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CS 245: Database System Principles: Notes 4: Indexing

Uploaded by

Copyright:

Available Formats

CS 245: Database System

Indexing & Hashing

• Can we build a dense, 2nd level index

(1) Block pointer (sparse index) can be

• Sparse: Less index space per record

Also: Pointers are record pointers

 Intersect toy bucket and 2nd Floor

• Find articles with “cat” in title

dog Title 100

106 Common words

Size of a posting: 10-15 bits (compressed)

Query= <0 0 1 1 0 0 0 …>

Query= <0 0 1 1 0 0 0 …>

PRODUCT = 1 + ……. = score

e.g.: Match on common word not as

Non-leaf: (n+1)/2 pointers

Leaf: (n+1)/2 pointers to data

Full node min. node

counts even if null

(1) All leaves at same lowest level

Max Max Min Min

(a) simple case

CS 245 Notes 4 101

CS 245 Notes 4 102

CS 245 Notes 4 103

CS 245 Notes 4 104

(a) Simple case - no example

CS 245 Notes 4 105

CS 245 Notes 4 106

CS 245 Notes 4 107

CS 245 Notes 4 110

CS 245 Notes 4 111

CS 245 Notes 4 112

CS 245 Notes 4 113

– Often, coalescing is not implemented

CS 245 Notes 4 114

CS 245 Notes 4 115

For their comparison:

CS 245 Notes 4 116

(127+1)4 = 512 Bytes

CS 245 Notes 4 117

Static Index B-tree

2 -> 127 2 2 -> 63 2

CS 245 Notes 4 119

CS 245 Notes 4 120

Ref. #1 conclusion Static index better!!

CS 245 Notes 4 121

Ref. #2 conclusion B-trees better!!

CS 245 Notes 4 122

• DBA does not know when to reorganize

CS 245 Notes 4 123

CS 245 Notes 4 124

CS 245 Notes 4 125

CS 245 Notes 4 126

n is number of keys / node

CS 245 Notes 4 127

CS 245 Notes 4 128