www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.algorithm.BinaryHeap

reply bearophile <bearophileHUGS lycos.com> writes:
I have tried to use BinaryHeap of Phobos2. As a warming up exercise I am
translating this little Python program to D2 using Phobos2 only, a very basic
Huffman encoder:

from heapq import heappush, heappop, heapify
from collections import defaultdict

def encode(freqs):
    """Huffman encode the given dict mapping symbols to weights"""
    heap = [[float(wt), [sym, '']] for sym,wt in freqs.iteritems()]
    heapify(heap)
    while len(heap) > 1:
        lo = heappop(heap)
        hi = heappop(heap)
        for i in lo[1:]:
            i[1] = '0' + i[1]
        for i in hi[1:]:
            i[1] = '1' + i[1]
        lohi = [lo[0] + hi[0]] + lo[1:] + hi[1:]
        heappush(heap, lohi)
    return sorted(heappop(heap)[1:], key=lambda x: (len(x[-1]), x))

astring = "this is an example for huffman encoding"
freqs = defaultdict(int)
for c in astring:
    freqs[c] += 1
huff = encode(freqs)
print "SYMBOL\tWEIGHT\tHUFFMAN CODE"
for h in huff:
    print "'%s'\t%s\t%s" % (h[0], freqs[h[0]], h[1])

That outputs is:
SYMBOL	WEIGHT	HUFFMAN CODE
' '	6	101
'n'	4	010
'a'	3	1001
'e'	3	1100
'f'	3	1101
'h'	2	0001
'i'	3	1110
'm'	2	0010
'o'	2	0011
's'	2	0111
'g'	1	00000
'l'	1	00001
'p'	1	01100
'r'	1	01101
't'	1	10000
'u'	1	10001
'x'	1	11110
'c'	1	111110
'd'	1	111111

(Note that the python heap isn't object oriented, I'd like to write it in C for
Python2.6, in the meantime if you need OOP you can use
http://code.activestate.com/recipes/502295/ ).

Regarding BinaryHeap of Phobos2:

This may be a dumb question: How do you push/add an item to such heap (to
translate the heappush)? I have found no info in the docs.

I suggest to add one or two examples of the acquire(Range r) method, because I
don't understand its purpose.

I suggest to add to the std.algorithm module something like:

BinaryHeap!(Range) binaryHeap(Range)(Range arr) {
    return BinaryHeap!(Range)(arr);
}

Otherwise most of the times you have to use:
auto heap = BinaryHeap!(typeof(arr))(arr);

(In the following few months I'll probably post many more notes about Phobos2).

Bye,
bearophile
Apr 27 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 This may be a dumb question: How do you push/add an item to such heap
 (to translate the heappush)? I have found no info in the docs.

It's a very good question. BinaryHeap intently lacks a way to increase its length because it's meant to be a structuring over an existing range, not an autonomous one. As such, BinaryHeap is pretty obtuse to use (you need to make an array first and then bind BinaryHeap to it). Once there's a clear way to express topology in Phobos, BinaryHeap will use it. Currently there's an embryo of topology-related choices in std.range: enum Topology { /** The range cannot change the container's topology (whereas it can change its content). This is useful if e.g. the container must control creation and destruction of its slots. */ fixed, /** The range can change the underlying container's structure. This is useful if the range is free-floating and is not owned by any container. */ flexible } These topology choices are used with SListRange (independent, growable singly-linked list vs. a range looking at a different singly-linked list). If the notion of topology turns out to be a good design, today's BinaryHeap is Topology.fixed, whereas a BinaryHeap that is Topology.flexible can grow no problem. Andrei
Apr 27 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 These topology choices are used with SListRange (independent, growable 
 singly-linked list vs. a range looking at a different singly-linked 
 list). If the notion of topology turns out to be a good design, today's 
 BinaryHeap is Topology.fixed, whereas a BinaryHeap that is 
 Topology.flexible can grow no problem.

Do you mean that an heap that allows pop() to remove items is Topology.fixed?? Is extending a dynamic array a change in its topology? I think you are over-generalizing a bit here. 99% of the times I just need a heap to push and pop items from it (eventually even many at the same time), and I don't care much of its internal implementation (that may be a dynamic array of pointers to large blocks of items, like a Deque, but with internal blocks not necessarily fully filled of items). Using a linked list to implement a binary heap doesn't sound like much efficient). So I suggest to use a simpler API, that takes a range, builds some internal representation, and allows me to push and pop items... Your design is more efficient (no copying data, for example) but its API is more hairy. Most people most of the time don't want to mess with such details. From what you say I guess there is currently no way to add items to an heap. Bye, bearophile
Apr 27 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei, if you have missed it, you can take a look at this simple API of mine:
http://code.activestate.com/recipes/502295/

This is the constructor (well almost, in Python the constructor is __new__), it
uses an optional key function, instead of a cmp:
def __init__(self, sequence=None, key=None, inplace=False):

"inplace" allows it to not copy items if the given sequence is a list (that is
a dynamic array).
inplace is False by default, to increase safety and reduce surprises, so the
default design is nice and safe. When your profiling or your brain tells you
need more speed, that can be as efficient as your implementation (because lists
are very commonly used data structures in Python, so most of the times your
items are already in a list).
In D probably a cmp function is faster than a key (because a key needs to
allocate new memory), but 99% of the times the key is nicer to use.

Bye,
bearophile
Apr 27 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 These topology choices are used with SListRange (independent,
 growable singly-linked list vs. a range looking at a different
 singly-linked list). If the notion of topology turns out to be a
 good design, today's BinaryHeap is Topology.fixed, whereas a
 BinaryHeap that is Topology.flexible can grow no problem.

Do you mean that an heap that allows pop() to remove items is Topology.fixed?? Is extending a dynamic array a change in its topology?

Not pop because reducing a range does not modify the topology. Growing is the issue.
 I think you are over-generalizing a bit here. 99% of the times I just
 need a heap to push and pop items from it (eventually even many at
 the same time), and I don't care much of its internal implementation
 (that may be a dynamic array of pointers to large blocks of items,
 like a Deque, but with internal blocks not necessarily fully filled
 of items). Using a linked list to implement a binary heap doesn't
 sound like much efficient). So I suggest to use a simpler API, that
 takes a range, builds some internal representation, and allows me to
 push and pop items... Your design is more efficient (no copying data,
 for example) but its API is more hairy. Most people most of the time
 don't want to mess with such details.

If you look at the use of BinaryHeap inside std.algorithm (topNCopy and topNIndex, two rather important basic algorithms) you'll see that a heap that organizes an existing range is the way to go. I agree that efficiency consideration don't have to make the interface obtuse. A good solution is to provide configurable libraries that make everybody happy.
 From what you say I guess there is currently no way to add items to
 an heap.

There isn't an efficient method (you could rebind the heap to a different array). The point is that it doesn't exist not because I believe it's not necessary, but because I am looking for a good way to provide it. Andrei
Apr 27 2009
parent Jason House <jason.james.house gnail.com> writes:
Given the rstricted use of std.algorithm's binary heap, what would be tge
recommended way to implement Dijkstra's algorithm. Basically, the way I'd
implement it is as follows:

1. Push nodes from edges connected to root onto heap using weights as key
2. Pop lowest value from heap
3. If connected node is endpoint, stop
4. Push nodes from edges connecting to popped node back on heap
     (weight = value popped + edge weight)

It's very simple, but requires growth...

Andrei Alexandrescu Wrote:

 bearophile wrote:
 Andrei Alexandrescu:
 These topology choices are used with SListRange (independent,
 growable singly-linked list vs. a range looking at a different
 singly-linked list). If the notion of topology turns out to be a
 good design, today's BinaryHeap is Topology.fixed, whereas a
 BinaryHeap that is Topology.flexible can grow no problem.

Do you mean that an heap that allows pop() to remove items is Topology.fixed?? Is extending a dynamic array a change in its topology?

Not pop because reducing a range does not modify the topology. Growing is the issue.
 I think you are over-generalizing a bit here. 99% of the times I just
 need a heap to push and pop items from it (eventually even many at
 the same time), and I don't care much of its internal implementation
 (that may be a dynamic array of pointers to large blocks of items,
 like a Deque, but with internal blocks not necessarily fully filled
 of items). Using a linked list to implement a binary heap doesn't
 sound like much efficient). So I suggest to use a simpler API, that
 takes a range, builds some internal representation, and allows me to
 push and pop items... Your design is more efficient (no copying data,
 for example) but its API is more hairy. Most people most of the time
 don't want to mess with such details.

If you look at the use of BinaryHeap inside std.algorithm (topNCopy and topNIndex, two rather important basic algorithms) you'll see that a heap that organizes an existing range is the way to go. I agree that efficiency consideration don't have to make the interface obtuse. A good solution is to provide configurable libraries that make everybody happy.
 From what you say I guess there is currently no way to add items to
 an heap.

There isn't an efficient method (you could rebind the heap to a different array). The point is that it doesn't exist not because I believe it's not necessary, but because I am looking for a good way to provide it. Andrei

Apr 27 2009