The Power of Incorrectness: A Brief Introduction To Soft Heaps
The Problem
A heap (priority queue) is a data structure that stores elements with keys drawn from a totally ordered set (e.g. integers). We need to support the following operations:
Insert (add an element)
Update (decrease the key of an element)
Extract-min (find and delete the element with minimum key)
Merge (optional)
A Note on Notation
We evaluate algorithm speed using big-O notation. Most of the upper bounds on runtime given here are also lower bounds, but we use just big-O to simplify notation. Some of the runtimes are amortized, meaning they hold on average over a sequence of operations; they are stated as normal bounds to reduce confusion. All logs are base 2 and denoted lg. N is the number of elements in the heap at any time; we also use it to denote the number of operations. We work in the comparison model.
We store the elements in a tree with a constant branching factor. Heap condition: the key of any node is always at least the key of its parent. Exercise: show that we can perform insert and extract-min in time proportional to the height of the tree.
Binary Heaps
We use a perfectly balanced tree with a constant branching factor. The height of the tree is O(lg N), so insert, update, and extract-min all take O(lg N) time. Merge is not supported as a basic operation.
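These operations are easy to observe with Python's built-in heapq module, which implements an array-backed binary heap (the module is an illustration choice here, not part of the original notes):

```python
import heapq

# A binary heap via Python's heapq: insert and extract-min are O(lg N).
h = []
for x in [5, 1, 4, 2, 3]:
    heapq.heappush(h, x)          # insert: sift the new element up

mins = [heapq.heappop(h) for _ in range(3)]  # extract-min: sift the root down
print(mins)  # [1, 2, 3]
```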
Binomial Heap
Binomial heaps use a branching factor of O(lg N) and also support merge in O(lg N) time. The main idea is to keep a forest of trees, each with a power-of-two number of nodes and no two of the same size. When we link two trees of the same size, we get another tree whose size is again a power of two. We can merge two such forests in O(lg N) time in a manner analogous to binary addition.
We typically describe binomial trees recursively: one binomial tree attached to the root of another of the same size. Let the rank of a tree in a binomial heap be the log of the number of nodes it has.
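The binary-addition analogy can be made concrete. In this toy sketch (an assumed simplification: a "tree" is represented only by its size, since only sizes matter for the carry argument), a forest is a dict mapping rank to tree size, and merging works like adding two binary numbers with carries:

```python
def link(t1, t2):
    # Linking two trees of equal size gives one tree of double the size.
    assert t1 == t2
    return t1 * 2

def merge(f1, f2):
    """Merge two binomial forests (dicts rank -> tree size), like binary addition."""
    result, carry = {}, None
    for r in range(max(list(f1) + list(f2) + [-1]) + 2):
        trees = [f[r] for f in (f1, f2) if r in f]
        if carry is not None:
            trees.append(carry)
            carry = None
        if len(trees) == 3:
            result[r] = trees.pop()            # one tree stays at this rank
        if len(trees) == 2:
            carry = link(trees[0], trees[1])   # the pair becomes a carry
        elif len(trees) == 1:
            result[r] = trees[0]
    return result

# 11 + 01 = 100 in binary: forests of sizes {2,1} and {1} merge to {4}.
print(merge({0: 1, 1: 2}, {0: 1}))  # {2: 4}
```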
If we are lazy with binomial heaps, saving all the work until we perform extract-min, we can get O(1) per insert and merge, but O(lg N) for extract-min and update. Fibonacci heaps (by Fredman and Tarjan) can do insert, update, and merge in O(1) per operation, but still require O(lg N) for extract-min. Can we get rid of the O(lg N) factor?
No!
WHY?
A Bound on Sorting
We can't sort N numbers in the comparison model faster than O(N lg N) time. Sketch of proof: there are N! possible permutations of the input. Each operation in the comparison model has only 2 possible results, true/false. So for our algorithm to distinguish all N! inputs, we need lg(N!) = O(N lg N) operations.
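A quick numerical sanity check (a sketch, not part of the original notes): lg(N!) really does grow like N lg N, with the ratio approaching 1 as N grows.

```python
import math

def lg_factorial(n):
    # lg(n!) computed as a sum of logs, avoiding huge integers.
    return sum(math.log2(k) for k in range(2, n + 1))

# The ratio lg(N!) / (N lg N) tends to 1 (Stirling's approximation).
for n in [10, 100, 10000]:
    print(n, lg_factorial(n) / (n * math.log2(n)))
```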
Given an array of N elements, we can insert them into a heap with N inserts. Performing extract-min once gives the 1st element of the sorted list; the 2nd time gives the 2nd element. So we can perform extract-min N times to get the sorted list back. Hence at least one of insert or extract-min must take O(lg N) time per operation.
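The reduction above is just heapsort. A minimal sketch, again using heapq as the heap:

```python
import heapq

def heap_sort(xs):
    """Sort via the slide's reduction: N inserts, then N extract-mins."""
    h = []
    for x in xs:
        heapq.heappush(h, x)                           # N inserts
    return [heapq.heappop(h) for _ in range(len(xs))]  # N extract-mins

print(heap_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

Since sorting needs O(N lg N) comparisons in total, the N inserts and N extract-mins cannot all be O(1).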
Note there is a hidden assumption made in the proof on the previous slide:
The result given by every call of extract-min must be correct.
The Idea
We sacrifice some correctness to get a better runtime. To be more specific, we allow a fraction of the answers provided by extract-min to be incorrect.
Soft Heaps
Soft heaps support insert, update, extract-min, and merge in O(1) amortized time. No more than εN of all elements have their keys raised at any point, where ε (0 < ε < 1/2) is a fixed error parameter.
We put multiple elements on the same node, so the tree needs fewer nodes than elements. This allows a reduction in the height of the tree.
The Catch
If a node has multiple elements stored on it, how do we track which one is the minimum? Solution: we assign all the elements in the node's list the same key. Some of the keys are thereby raised; this is where the error rate comes in.
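A hypothetical sketch of this idea (the class and its fields are illustrative, not from Chazelle's paper): a node stores a list of items under one shared key, and any item whose original key is below the shared key counts as corrupted.

```python
class SoftNode:
    """One node of a soft heap: several items sharing a single key."""

    def __init__(self, items):
        self.items = list(items)      # original keys of the elements stored here
        self.key = max(self.items)    # the shared key; smaller keys are raised to it

    def corrupted(self):
        # Items whose key was raised to the shared key.
        return [x for x in self.items if x < self.key]

node = SoftNode([2, 3, 5])
print(node.key)          # 5
print(node.corrupted())  # [2, 3]
```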
Example
A modified binomial heap with 8 elements. Two of the nodes hold 2 elements instead of one. Note that the keys of elements 2 and 3 are raised, but two nodes in the deeper parts of the tree are no longer needed.
Extract-Min
If the root's list is not empty, we just take an element from it (whose key is close to the minimum), remove it, and reduce the size of the list by 1.
Recall
This is a bit trickier when the list is empty. In both cases we siphon elements from below the root to replenish the root's list, using a separate procedure called sift.
Sift
We pull some of the elements in the current node's list up the tree, concatenating the item lists when two lists collide. Then we perform sift on one of the children of the current node; up to this point we are doing the same thing as in a binary heap. However, in some cases we call sift on another child of the node as well, which makes the sift calls truly branching. The question is when to do this.
The total cost of merging is O(N), by an argument similar to counting the number of carries resulting from incrementing a binary counter N times. Result on sift from the paper (no proof):
Let r = 2 + 2⌈lg(1/ε)⌉. Then sift runs in O(r) time per call, which is O(1) per operation since ε is a constant.
We can also show that the runtime of O(lg(1/ε)) is optimal if at most εN elements can have their keys raised.
Note that if we set ε = 1/(2N), then εN < 1, so no errors can occur and we get a normal heap back.
A Problem
Given a list of N numbers, we want to find the kth largest in O(N) time. Randomized quickselect does it in O(N) expected time, but it's randomized. The best-known deterministic algorithm for this involves finding medians of groups of 5 numbers and taking the median of medians... basically a mess.
We insert all N elements into a soft heap with error rate ε = 1/3, and perform extract-min N/3 times. The largest number deleted then has rank between N/3 and 2N/3. So each round we can remove at least N/3 numbers from consideration (the ones on the other side of the kth element) and handle the rest recursively. Runtime: N + (2/3)N + (2/3)^2 N + (2/3)^3 N + ... = O(N).
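A sketch of that recursion, shown for the k-th smallest (the k-th largest is symmetric). Assumption: Python's heapq, an exact heap, stands in for the soft heap, so the pivot's rank is exactly the number of extractions rather than anywhere between N/3 and 2N/3; the shape of the recursion is the same.

```python
import heapq

def select(xs, k):
    """Return the k-th smallest element of xs (k is 1-based)."""
    if len(xs) == 1:
        return xs[0]
    h = list(xs)
    heapq.heapify(h)
    # Extract about n/3 elements; their maximum is the pivot. With a real
    # soft heap (eps = 1/3) its rank would lie between n/3 and 2n/3.
    deleted = [heapq.heappop(h) for _ in range(max(1, len(xs) // 3))]
    pivot = max(deleted)
    small = [x for x in xs if x < pivot]
    large = [x for x in xs if x > pivot]
    n_small, n_equal = len(small), len(xs) - len(small) - len(large)
    if k <= n_small:
        return select(small, k)               # answer is below the pivot
    if k <= n_small + n_equal:
        return pivot                          # answer is the pivot itself
    return select(large, k - n_small - n_equal)  # answer is above the pivot

print(select([9, 1, 8, 2, 7, 3, 6, 4, 5], 5))  # 5 (the median)
```

Each round discards at least a third of the elements, which is exactly the geometric sum quoted above.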
Other applications
Approximate sorting: sort N numbers so they're nearly ordered.
Dynamic maintenance of percentiles.
Minimum spanning trees.
Minimum spanning trees are the problem soft heaps were designed to solve, and they give the best algorithm to date. With soft heaps (and another 5 or 6 pages of work), we can get an O(E α(E)) algorithm for minimum spanning trees, where α(E) is the inverse Ackermann function.
Bibliography
Chazelle, Bernard. "The Soft Heap: An Approximate Priority Queue with Optimal Error Rate."
Chazelle, Bernard. "A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity."
Wikipedia