
The Power of Incorrectness

A Brief Introduction to Soft Heaps

The Problem

A heap (priority queue) is a data structure that stores elements with keys chosen from a totally ordered set (e.g. integers). We need to support the following operations:

Insert (add an element)
Update (decrease the key of an element)
Extract-min (find and delete the element with minimum key)
Merge (optional)
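As a concrete reference point, this interface can be sketched in Python on top of the standard-library heapq module (decrease-key is omitted here, since heapq has no direct support for it):

```python
import heapq

class Heap:
    """Minimal priority-queue sketch: insert, extract-min, and a naive merge."""
    def __init__(self):
        self._h = []                      # list kept in binary-heap order

    def insert(self, key):
        heapq.heappush(self._h, key)      # O(lg N)

    def extract_min(self):
        return heapq.heappop(self._h)     # removes and returns the minimum key

    def merge(self, other):
        for k in other._h:                # naive merge by re-insertion
            heapq.heappush(self._h, k)
```

The naive merge here costs O(N lg N); the whole point of binomial and Fibonacci heaps later in the deck is to do better.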

A Note on Notation

We evaluate algorithm speed using big-O notation. Most of the upper bounds on runtime given here are also lower bounds, but we use just big-O to simplify notation. Some of the runtimes given are amortized, meaning averaged over a sequence of operations; they are stated as ordinary bounds to reduce confusion. All logs are base 2 and denoted lg. N is the number of elements in our heap at any time; we also use it to denote the number of operations. We work in the comparison model.

What about delete?


Note we can perform delete lazily, by marking elements as deleted with a flag. Then we perform repeated extract-mins whenever the minimum element is already marked. So delete doesn't need to be treated any differently from extract-min.
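A minimal sketch of this lazy-deletion trick, assuming keys are distinct (a real implementation would flag element handles rather than key values):

```python
import heapq

class LazyDeleteHeap:
    """Heap with lazy delete: deletion just sets a flag; extract-min cleans up."""
    def __init__(self):
        self._h = []
        self._deleted = set()        # keys marked as deleted (assumes distinct keys)

    def insert(self, key):
        heapq.heappush(self._h, key)

    def delete(self, key):
        self._deleted.add(key)       # no structural work at delete time

    def extract_min(self):
        # repeatedly pop while the minimum is already marked deleted
        while self._h:
            k = heapq.heappop(self._h)
            if k in self._deleted:
                self._deleted.discard(k)
                continue
            return k
        raise IndexError("empty heap")
```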

The General Approach

We store the elements in a tree with a constant branching factor. Heap condition: the key of any node is always at least the key of its parent. *Exercise: show we can perform insert and extract-min in time proportional to the height of the tree.

Binary Heaps
We use a perfectly balanced tree with a constant branching factor. The height of the tree is O(lgN). So insert/update/extract min all take O(lgN) time. Merge is not supported as a basic operation.
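A sketch of such a binary heap stored in an array: insert sifts up and extract-min sifts down, each walking a single root-to-leaf path, hence O(lgN) per operation:

```python
class BinaryHeap:
    """Array-based binary min-heap; insert/extract_min cost O(height) = O(lg N)."""
    def __init__(self):
        self.a = []

    def insert(self, key):
        self.a.append(key)
        i = len(self.a) - 1
        # sift up: swap with parent while the heap condition is violated
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:
            self.a[(i - 1) // 2], self.a[i] = self.a[i], self.a[(i - 1) // 2]
            i = (i - 1) // 2

    def extract_min(self):
        root = self.a[0]
        last = self.a.pop()
        if self.a:
            self.a[0] = last
            i = 0
            # sift down: swap with the smaller child while out of order
            while True:
                l, r = 2 * i + 1, 2 * i + 2
                small = i
                if l < len(self.a) and self.a[l] < self.a[small]:
                    small = l
                if r < len(self.a) and self.a[r] < self.a[small]:
                    small = r
                if small == i:
                    break
                self.a[i], self.a[small] = self.a[small], self.a[i]
                i = small
        return root
```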

Binomial Heap

Binomial heaps use a branching factor of O(lgN) and can also support merge in O(lgN) time. The main idea is to keep a forest of trees, each with a number of nodes that is a power of two, and no two of the same size. When we merge two trees of the same size, we get another tree whose size is a power of two. So we can merge two such forests in O(lgN) time, in a manner analogous to binary addition.
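A structural sketch of the binary-addition analogy, with trees represented as (root_key, rank, children) tuples (illustrative only; a real binomial heap stores proper nodes). Linking two rank-k trees yields one rank-(k+1) tree, exactly like a carry:

```python
def link(t1, t2):
    """Link two binomial trees of equal rank; the smaller root becomes the parent."""
    (k1, r1, c1), (k2, r2, c2) = t1, t2   # (root_key, rank, children)
    assert r1 == r2
    if k1 <= k2:
        return (k1, r1 + 1, c1 + [t2])
    return (k2, r2 + 1, c2 + [t1])

def merge(f1, f2):
    """Merge two binomial forests, like binary addition with carries."""
    trees = sorted(f1 + f2, key=lambda t: t[1])   # process in rank order
    out = []
    while trees:
        t = trees.pop(0)
        if trees and trees[0][1] == t[1]:
            # two trees of the same rank: link them and 'carry' the result
            carry = link(t, trees.pop(0))
            i = 0
            while i < len(trees) and trees[i][1] < carry[1]:
                i += 1
            trees.insert(i, carry)                # keep the list in rank order
        else:
            out.append(t)
    return out
```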

Structure of Binomial Heaps

We typically describe a binomial tree recursively as one binomial tree attached to the root of another of the same size. Let the rank of a tree in a binomial heap be the log of the number of nodes it has.

Even More Heap

If we get lazy with binomial heaps, saving all the work until we perform extract-min, we can get to O(1) per insert and merge, but O(lgN) for extract-min and update. Fibonacci heaps (Fredman and Tarjan) support insert, update, and merge in O(1) amortized time per operation. But they still require O(lgN) for extract-min. Can we get rid of the O(lgN) factor?

No!
WHY?

A Bound on Sorting
We can't sort N numbers in the comparison model faster than O(NlgN) time. Sketch of proof:

There are N! possible permutations. Each operation in the comparison model can only have 2 possible results, true/false. So for our algorithm to distinguish all N! inputs, we need lg(N!) = O(NlgN) operations.
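The step lg(N!) = O(NlgN) follows from Stirling's approximation, and can be sanity-checked numerically:

```python
import math

def lg_factorial(n):
    """lg(n!) computed via lgamma, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(2)

# lg(N!) and N lg N agree up to a constant factor
for n in (10, 100, 1000):
    ratio = lg_factorial(n) / (n * math.log2(n))
    print(n, round(lg_factorial(n)), round(n * math.log2(n)), round(ratio, 2))
```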

Now Apply to Heaps

Given an array of N elements, we can insert them into a heap with N inserts. Performing extract-min once gives the 1st element of the sorted list; a 2nd time gives the 2nd element. So we can perform extract-min N times to read off a sorted list. Hence one of insert or extract-min must take at least O(lgN) time per operation.
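The reduction is literally this short (using heapq as the heap):

```python
import heapq

def heap_sort(xs):
    """Sort via the slide's reduction: N inserts followed by N extract-mins."""
    h = []
    for x in xs:
        heapq.heappush(h, x)                    # N inserts
    return [heapq.heappop(h) for _ in xs]       # N extract-mins, in sorted order
```

Since sorting needs O(NlgN) comparisons in total, insert and extract-min cannot both be faster than O(lgN).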

Is There a Way Around This

Note there is a hidden assumption made in the proof on the previous slide:
The result given by every call of extract-min must be correct.

The Idea
We sacrifice some correctness to get a better runtime. To be more specific, we allow a fraction of the answers provided by extract-min to be incorrect.

Soft Heaps
Supports insert, update, extract-min and merge in O(1) amortized time (for fixed ε). No more than εN of all elements (0 < ε ≤ 1/2) have their keys raised at any point.

The Motivation: Car Pooling

No, I Meant This :

The Idea in Words

We modify the binomial heap described earlier.

Trees don't have to be full anymore; the idea of rank can be transplanted.

We put multiple elements on the same node, resulting in the non-fullness. This allows a reduction in the height of the tree.

The Catch
If a node has multiple elements stored on it, how do we track which one is the minimum? Solution: we assign all the elements in the list the same key, so some of the keys get raised. This is where the error rate comes in.

Example

Modified binomial heap with 8 elements. Two of the nodes have 2 elements instead of one. Note that the keys of 2 and 3 are raised, but two nodes in the deeper parts of the tree are no longer there.

Outline of the Algorithm


Insert is done through merging of heaps. We merge as in binomial heaps, in a manner not so different from adding binary numbers. When inserting, we do not have to change any of the lists stored in the nodes; all we have to do is maintain heap order when merging trees.

Extract-Min

If the root's list is not empty, we just take something close to the minimum, remove it, and reduce the size of the list by 1.

Recall that we are allowed to be wrong sometimes.

This is a bit trickier when the list is empty. In both cases we siphon elements from below the root to append to the root's list, using a separate procedure called sift.

Sift

We pull some of the elements from lists lower in the tree up into the current node's list, concatenating item lists when two lists collide. Then we perform sift on one of the children of the current node. Note that at this point we're doing the same thing as in a binary heap. However, in some cases we call sift on another child of the node, which makes the sift calls truly branching. The question is when to do this.

How Many Elements Do We Sift?


This is tricky. If we don't sift enough, the height (and thus runtime) becomes O(lgN). But if we sift too much, we can get more than εN elements with keys raised. We use a combination of the size of the tree and the size of the current list to decide when to sift and destroy nodes, so the branching condition is key.

Sift Loop Condition


We call sift twice when the rank of the current tree is large enough (> r, for some constant r) and the rank is odd. The rank-being-odd condition ensures we never call sift more than twice. The constant r is used to globally control how much we sift.
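A loose structural sketch of sift, not Chazelle's exact procedure: the Node fields and the threshold R below are illustrative, and the bookkeeping that a real soft heap does (rank invariant, suffix-min pointers) is omitted. It shows the two essential moves, list concatenation with key raising, and the double call on odd ranks above the threshold:

```python
R = 3  # threshold: ranks above R (when odd) sift twice; controls the error rate

class Node:
    def __init__(self, key, rank, items=None, children=None):
        self.key = key            # common key shared by every item in the list
        self.rank = rank
        self.items = items if items is not None else [key]
        self.children = children or []

def sift(v):
    """Refill v's item list from below (simplified sketch)."""
    branching = v.rank > R and v.rank % 2 == 1   # the double-sift condition
    for _ in range(2 if branching else 1):
        if not v.children:
            return
        # take the child with the smallest key; its list moves up to v
        c = min(v.children, key=lambda n: n.key)
        sift(c)                    # refill c from further below first
        v.items += c.items         # concatenate the item lists
        v.key = c.key              # v's old items now carry the raised key c.key
        c.items = []
        if not c.children:
            v.children.remove(c)   # c is exhausted; destroy the node
```

The key assignment is where errors come from: items that were in v keep travelling under the larger key c.key.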

One More Detail


We need to keep a rank invariant, which states that a node has at least half as many children as its rank. This prevents excessive merging of lists. We can maintain this condition as follows: every time we find a violation at a root, we dismantle that node and merge its subtrees back into the heap.

Result of the Analysis

The total cost of merging is O(N), by an argument similar to counting the number of carries resulting from incrementing a binary counter N times. Result on sift from the paper (no proof):

Let r = 2 + 2lg(1/ε); then sift runs in O(r) time per call, which is O(1) per operation since ε is constant.

We can also show that the runtime of O(lg(1/ε)) is optimal if at most εN elements can have keys raised.

Note that if we set ε = 1/(2N), no errors can occur and we get a normal heap back.
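The binary-counter carry argument used for the merging cost can be checked directly: incrementing a counter N times causes fewer than 2N carries in total, so the amortized cost per increment (per insert) is constant:

```python
def total_carries(n):
    """Total carries from incrementing a binary counter n times; always < 2n."""
    bits, carries = [], 0
    for _ in range(n):
        i = 0
        while i < len(bits) and bits[i] == 1:
            bits[i] = 0          # a carry: analogous to linking two equal-rank trees
            carries += 1
            i += 1
        if i == len(bits):
            bits.append(1)
        else:
            bits[i] = 1
    return carries
```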

Is This of Any Use?


Don't ever submit this for your CS assignment and expect it to get right answers.

A Problem
Given a list of N numbers, we want to find the kth largest in O(N) time. Randomized quick-select does it in expected O(N) time, but it's randomized. The best-known deterministic algorithm for this involves finding the median of groups of 5 numbers and taking the median of medians...basically a mess.

A Simple Deterministic Solution

We insert all N elements into a soft heap with error rate ε = 1/3, and perform extract-min N/3 times. Then the largest number deleted has rank between (1/3)N and (2/3)N. So we can remove (1/3)N numbers from consideration each time (the ones on the other side of k) and do the rest recursively. Runtime: n + (2/3)n + (2/3)^2 n + (2/3)^3 n + ... = O(n)
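The recursion can be sketched with an ordinary exact heap standing in for the soft heap (so here the pivot's rank is exactly n/3; with a real soft heap it lands somewhere between n/3 and 2n/3, and one would partition around it by comparison). The O(N) runtime claim relies on the soft heap's O(1) operations, not on this stand-in. The sketch finds the k-th smallest:

```python
import heapq

def select(xs, k):
    """k-th smallest element, 1-indexed (recursion pattern from the slide)."""
    if len(xs) <= 3:
        return sorted(xs)[k - 1]
    h = list(xs)
    heapq.heapify(h)
    popped = [heapq.heappop(h) for _ in range(len(xs) // 3)]
    pivot = popped[-1]     # with a soft heap: true rank between n/3 and 2n/3
    if k <= len(popped):
        return select(popped, k)           # answer is among the popped third
    return select(h, k - len(popped))      # discard the popped third; recurse
```

Each call discards a constant fraction of the input, giving the geometric series on the slide.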

Other applications

Approximate sorting: sort n numbers so they're nearly ordered.
Dynamic maintenance of percentiles.
Minimum spanning trees:

This is the problem soft heaps were designed to solve, and it gives the best algorithm to date. With a soft heap (and another 5~6 pages of work), we can get an O(Eα(E)) algorithm for minimum spanning trees. (α(E) is the inverse Ackermann function.)

Bibliography
Chazelle, Bernard. "The Soft Heap: An Approximate Priority Queue with Optimal Error Rate."
Chazelle, Bernard. "A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity."
Wikipedia
