## digitalmars.D.learn - Creating a Priority Queue: An Adventure

"DarthCthulhu" <spam spam.com> writes:
```So I've just gotten into D and decided to have a go at creating
something relatively simple to get my feet wet: a priority queue.
Seemed like a simple enough thing, right? It's a pretty basic
data structure, it can't possibly be THAT difficult, right?

First, I tried using associative arrays only to discover that
associative arrays are not and can not be sorted, nor is the
order of elements guaranteed via insertion. Doh! Back to the
drawing board.

Next, I discovered the BinaryHeap. Aha! This does exactly what I
want! This should be easy! Just a thin little wrapper around it
and I should be golden.

I created a small helper struct to encapsulate the priority and
values... and then I discovered Tuple, which does that pretty
much already. Feeling pretty stupid for missing tuples before, I
rewrote the PriorityQueue to use tuples.

And now (two days later) things seem to work... except for one
big thing.

The BinaryHeap always has a length of zero. This is especially
confusing as I can verify that new elements are being inserted,
the BinaryHeap just keeps its length at zero. I figure I must be
doing something wrong, but I haven't the faintest clue what it
could be.

Here's some code:

<code>

struct PriorityQueue(P, V, alias predicate = "a > b") {

// Templated queue
alias Tuple!(P, V) PV;
BinaryHeap!(Array!(PV), predicate) queue;

property bool empty()
{
return queue.empty;
}

property PV front()
{

return queue.front;

}

property int length() {
return queue.length;
}

void insert(P priority, V value) {

// Insert the new element into the queue
queue.insert( PV(priority, value) );

}

void insert(PV ins) {
queue.insert(ins);
}

void popFront()
{
// Remove the first key
queue.popFront();
}

}

void main() {

PriorityQueue!(int, string) pq;

pq.insert(10, "HELLO10");
pq.insert(11, "HELLO11");
pq.insert(Tuple!(int, string)(3, "HELLO3"));

writefln("PQ: %s", pq.queue); <- prints PQ: [Tuple!(int,
string)(3, "HELLO3"), Tuple!(int, string)(10, "HELLO10"),
Tuple!(int, string)(11, "HELLO11")]

writefln("PQ length: %s", pq.queue.length); <- prints PQ length:
0

assert( !pq.empty ); <- Assert fails
}

</code>

I must be doing something really stupid here, but I have no clue
what it is. Anyone know?
```
Aug 04 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
```On 8/4/15 9:02 PM, DarthCthulhu wrote:

writefln("PQ: %s", pq.queue); <- prints PQ: [Tuple!(int, string)(3,
"HELLO3"), Tuple!(int, string)(10, "HELLO10"), Tuple!(int, string)(11,
"HELLO11")]

This is probably consuming your queue, popping all the data off as it
prints. If you print the length before hand, I'll bet it's not zero.

I don't know how to print the elements without removing them, as binary
heap doesn't have a range type, it seems to be the range itself (an odd
situation). Perhaps print the underlying storage?

-Steve
```
Aug 04 2015
"DarthCthulhu" <spam spam.com> writes:
```On Wednesday, 5 August 2015 at 01:27:53 UTC, Steven Schveighoffer
wrote:
On 8/4/15 9:02 PM, DarthCthulhu wrote:

writefln("PQ: %s", pq.queue); <- prints PQ: [Tuple!(int,
string)(3,
"HELLO3"), Tuple!(int, string)(10, "HELLO10"), Tuple!(int,
string)(11,
"HELLO11")]

This is probably consuming your queue, popping all the data off
as it prints. If you print the length before hand, I'll bet
it's not zero.

Aha! Yes, you are correct. I didn't know writefln was popping
elements off the heap. I thought it would've just walked down the
heap without altering it at all. Interesting. Now I feel kinda
silly.

I don't know how to print the elements without removing them,
as binary heap doesn't have a range type, it seems to be the
range itself (an odd situation). Perhaps print the underlying
storage?

-Steve

Yeah, I think the thing to do would be to make a helper function
that would return the Array!(Tuple!) that the heap contains.
Maybe as a const reference to make sure a user doesn't
accidentally alter the array?

```
Aug 04 2015
"Meta" <jared771 gmail.com> writes:
```On Wednesday, 5 August 2015 at 01:27:53 UTC, Steven Schveighoffer
wrote:
On 8/4/15 9:02 PM, DarthCthulhu wrote:

writefln("PQ: %s", pq.queue); <- prints PQ: [Tuple!(int,
string)(3,
"HELLO3"), Tuple!(int, string)(10, "HELLO10"), Tuple!(int,
string)(11,
"HELLO11")]

This is probably consuming your queue, popping all the data off
as it prints. If you print the length before hand, I'll bet
it's not zero.

I don't know how to print the elements without removing them,
as binary heap doesn't have a range type, it seems to be the
range itself (an odd situation). Perhaps print the underlying
storage?

-Steve

It looks like there was a breaking change made to BinaryHeap
somewhere between 2.065 and the present. The code compiles fine
on 2.065.

http://dpaste.dzfl.pl/65ba735d69e7
```
Aug 04 2015
"DarthCthulhu" <spam spam.com> writes:
```On Wednesday, 5 August 2015 at 02:26:48 UTC, Meta wrote:
On Wednesday, 5 August 2015 at 01:27:53 UTC, Steven
Schveighoffer wrote:
On 8/4/15 9:02 PM, DarthCthulhu wrote:

writefln("PQ: %s", pq.queue); <- prints PQ: [Tuple!(int,
string)(3,
"HELLO3"), Tuple!(int, string)(10, "HELLO10"), Tuple!(int,
string)(11,
"HELLO11")]

This is probably consuming your queue, popping all the data
off as it prints. If you print the length before hand, I'll
bet it's not zero.

I don't know how to print the elements without removing them,
as binary heap doesn't have a range type, it seems to be the
range itself (an odd situation). Perhaps print the underlying
storage?

-Steve

It looks like there was a breaking change made to BinaryHeap
somewhere between 2.065 and the present. The code compiles fine
on 2.065.

http://dpaste.dzfl.pl/65ba735d69e7

Interesting.

I notice that the output of the 'writefln("PQ: %s", pq.queue);'
line is different in 2.065 as well. Presumably the change was
made because if one is printing out a BinaryHeap, one is more
likely interested in the contents of the heap rather than its
signature?

I'm using 2.067.1.
```
Aug 04 2015
"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
```On Wednesday, 5 August 2015 at 02:26:48 UTC, Meta wrote:
It looks like there was a breaking change made to BinaryHeap
somewhere between 2.065 and the present. The code compiles fine
on 2.065.

http://dpaste.dzfl.pl/65ba735d69e7

https://github.com/D-Programming-Language/phobos/pull/1989
```
Aug 05 2015
=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
```On Wednesday, 5 August 2015 at 01:02:56 UTC, DarthCthulhu wrote:
I must be doing something really stupid here, but I have no
clue what it is. Anyone know?

For functional behaviour I prefer a postblit that duplicates the
underlying BinaryHeap.

https://github.com/nordlow/justd/blob/master/priority_queue.d

Now the foreach won't consume the queue.

This will however duplicate the underlying array aswell, which is
probably not what we want. How do we avoid this?
```
Aug 05 2015
"Per =?UTF-8?B?Tm9yZGzDtnci?= <per.nordlow gmail.com> writes:
```On Wednesday, 5 August 2015 at 09:04:54 UTC, Nordlöw wrote:
For functional behaviour I prefer a postblit that duplicates
the underlying BinaryHeap.

The postblit is the

this(this) { ... }
```
Aug 05 2015
"Per =?UTF-8?B?Tm9yZGzDtnci?= <per.nordlow gmail.com> writes:
```On Wednesday, 5 August 2015 at 09:04:54 UTC, Nordlöw wrote:
This will however duplicate the underlying array aswell, which
is probably not what we want. How do we avoid this?

Correction: the underlying storage array *must* be duplicated
whenever we want to iterate over it without side effects in the
original instance. That's just the way binary heaps work.
```
Aug 05 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
```On 8/5/15 7:09 AM, "Per =?UTF-8?B?Tm9yZGzDtnci?=
<per.nordlow gmail.com>" wrote:
On Wednesday, 5 August 2015 at 09:04:54 UTC, Nordlöw wrote:
This will however duplicate the underlying array aswell, which is
probably not what we want. How do we avoid this?

Correction: the underlying storage array *must* be duplicated whenever
we want to iterate over it without side effects in the original
instance. That's just the way binary heaps work.

Yeah, I think there is no way to "traverse" a binary heap in order
without manipulating it. However, you can print the underlying storage.

-Steve
```
Aug 05 2015
"John Colvin" <john.loughran.colvin gmail.com> writes:
```On Wednesday, 5 August 2015 at 11:09:29 UTC, Per Nordlöw wrote:
On Wednesday, 5 August 2015 at 09:04:54 UTC, Nordlöw wrote:
This will however duplicate the underlying array aswell, which
is probably not what we want. How do we avoid this?

Correction: the underlying storage array *must* be duplicated
whenever we want to iterate over it without side effects in the
original instance. That's just the way binary heaps work.

Crazy idea: what about a range that lazily copies as it needs to?
I.e. copy-on-write
```
Aug 05 2015
=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
```On Wednesday, 5 August 2015 at 13:37:19 UTC, John Colvin wrote:
On Wednesday, 5 August 2015 at 11:09:29 UTC, Per Nordlöw wrote:
On Wednesday, 5 August 2015 at 09:04:54 UTC, Nordlöw wrote:
This will however duplicate the underlying array aswell,
which is probably not what we want. How do we avoid this?

Correction: the underlying storage array *must* be duplicated
whenever we want to iterate over it without side effects in
the original instance. That's just the way binary heaps work.

Crazy idea: what about a range that lazily copies as it needs
to? I.e. copy-on-write

I guess you mean that popFront should copy on demand then. We
then an extra bool to keep track of whether the copying has been
done then. One problem though:

auto x = PQ;
x.insert(...); // one element
auto y = x; // no copying of underlying storage
x.popFront; // modified both x and y!
y.popFront; // copied on demands, but underlying storage is

I don't think this is a desired behaviour.
```
Aug 05 2015
"John Colvin" <john.loughran.colvin gmail.com> writes:
```On Wednesday, 5 August 2015 at 15:29:39 UTC, Nordlöw wrote:
On Wednesday, 5 August 2015 at 13:37:19 UTC, John Colvin wrote:
On Wednesday, 5 August 2015 at 11:09:29 UTC, Per Nordlöw wrote:
On Wednesday, 5 August 2015 at 09:04:54 UTC, Nordlöw wrote:
This will however duplicate the underlying array aswell,
which is probably not what we want. How do we avoid this?

Correction: the underlying storage array *must* be duplicated
whenever we want to iterate over it without side effects in
the original instance. That's just the way binary heaps work.

Crazy idea: what about a range that lazily copies as it needs
to? I.e. copy-on-write

I guess you mean that popFront should copy on demand then. We
then an extra bool to keep track of whether the copying has
been done then. One problem though:

auto x = PQ;
x.insert(...); // one element
auto y = x; // no copying of underlying storage
x.popFront; // modified both x and y!
y.popFront; // copied on demands, but underlying storage is

I don't think this is a desired behaviour.

in my vision, either x.popFront would also create a copy or you
would have to go "auto y = x.nonModifyingView" or similar. What I
don't want is something that copies 10000 elements just to use
x.take(6)
```
Aug 05 2015
"Per =?UTF-8?B?Tm9yZGzDtnci?= <per.nordlow gmail.com> writes:
```On Wednesday, 5 August 2015 at 18:20:41 UTC, John Colvin wrote:
in my vision, either x.popFront would also create a copy or you
would have to go "auto y = x.nonModifyingView" or similar. What
I don't want is something that copies 10000 elements just to
use x.take(6)

I suggest you rehearse on how a binary heap works. A binary heap
with array storage trades speed for memory compactness, a bit
similar to how quick sort relates to heap sort.

The underlying heap array in D's `BinaryHeap` contains both the
data leaves and their inter-orderings in a memory-packed manner.
This makes heaps very space-efficient and running `dup` on the
underlying array is very cheap (in D), especially for value types.

When you pop an element from the heap an in-place reorganisation
(balancing) will be performed on the array. This means `popFront`
will potentially (in the worst case) require a complete copy of
the array in order to not have to modify the original array.

AFAIK, the 10000 elements will always have to be copied even when
we just need x.take(2). Peeking the front (using `front()`) is
O(1), but *popping* the front (to get the next front) in the
range may, in the worst cast, have to require reorganisation of
*all* the 10000 elements.

Please correct me if I'm wrong.
```
Aug 06 2015
"John Colvin" <john.loughran.colvin gmail.com> writes:
```On Thursday, 6 August 2015 at 08:44:17 UTC, Per Nordlöw wrote:
On Wednesday, 5 August 2015 at 18:20:41 UTC, John Colvin wrote:
[...]

I suggest you rehearse on how a binary heap works. A binary
heap with array storage trades speed for memory compactness, a
bit similar to how quick sort relates to heap sort.

The underlying heap array in D's `BinaryHeap` contains both the
data leaves and their inter-orderings in a memory-packed
manner. This makes heaps very space-efficient and running `dup`
on the underlying array is very cheap (in D), especially for
value types.

When you pop an element from the heap an in-place
reorganisation (balancing) will be performed on the array. This
means `popFront` will potentially (in the worst case) require a
complete copy of the array in order to not have to modify the
original array.

AFAIK, the 10000 elements will always have to be copied even
when we just need x.take(2). Peeking the front (using
`front()`) is O(1), but *popping* the front (to get the next
front) in the range may, in the worst cast, have to require
reorganisation of *all* the 10000 elements.

Please correct me if I'm wrong.

Worst case for getting root off a binary heap is O(log(N)),
copying the whole thing is O(N).

In practice the copy may be very cheap, but it does mean more
memory usage and won't scale to very large N. Perhaps it's an OK
tradeoff, but you want to be careful.
```
Aug 06 2015
"Per =?UTF-8?B?Tm9yZGzDtnci?= <per.nordlow gmail.com> writes:
```On Thursday, 6 August 2015 at 09:08:04 UTC, John Colvin wrote:
Worst case for getting root off a binary heap is O(log(N)),
copying the whole thing is O(N).

Those numbers does not take into account the special properties
of *in-array-packed* implementation of a binary heap. They are
*theoretical properties* related to a graph implementation of a
`BinaryHeap`.

The important thing in our D case is *which elements* in the
array that needs to be changed or *moved*.
```
Aug 06 2015
"DarthCthulhu" <spam spam.com> writes:
```Okay, so, I decided to scrap the BinaryHeap version of the
priority queue, going back to basics and utilizing a simple
array. It works, huzzah!

Code:

module data_structures.priority_queue;

import std.array;
import std.range: assumeSorted;
import std.typecons: Tuple;

/*
Templated Priority Queue

Usage: 	PriorityQueue!(PRIORITY_TYPE, VALUE_TYPE,
OPTIONAL_PREDICATE)

*/
struct PriorityQueue(P, V, alias predicate = "a < b") {

// To make the code a bit more readable
alias Tuple!(P, V) PV;

PV _q[];

// Forward most function calls to the underlying array.
PV[]* opDot() {
return &_q;
}

// Determine if the queue is empty
property bool empty () {

return (_q.length == 0);
}

// Needed so foreach can work
property PV front() {

return _q.front;
}

// Chop off the front of the array
property void popFront() {

_q = _q[1 .. \$];
}

// Insert a record via a template tuple
void insert(ref PV rec) {

// Empty queue?
if (_q.length == 0 ) {
// just put the record into the queue
_q ~= rec;

return;
}

// Assume the queue is already sorted according to PREDICATE
auto a = assumeSorted!(predicate)(_q);

// Find a slice containing records with priorities less than
the insertion rec
auto p = a.lowerBound(rec);

int location = p.length;

// Insert the record
_q.insertInPlace(location, rec);

}

void insert(PV rec) {
insert(rec);
}

// Insert a record via decomposed priority and value
void insert(P priority, V value) {

PV rec = PV(priority, value);

// Insert the record
insert(rec);

}

// Merge two Priority Queues, returning the merge.
// The two queues must obviously be of the same type in Priority
and Value, and predicate;
ref PriorityQueue!(P, V, predicate) merge(ref PriorityQueue!(P,
V, predicate) qmerge) {

// Make a copy of this PriorityQueue
PriorityQueue!(P, V, predicate)* qreturn = new
PriorityQueue!(P, V, predicate);
qreturn._q = _q.dup;

// Add in all the elements of the merging queue
foreach(rec; qmerge) {
qreturn.insert(rec);
}

// Return the resulting merged queue
return *qreturn;

}

}

unittest {

alias P = int;
alias V = string;
alias PV = Tuple!(P, V);
alias PQ = PriorityQueue!(P, V, "a < b");
PQ pq, pq2, pq3;

import std.typecons: tuple;

// Test basic insertion
pq.insert(10, "HELLO10");
pq.insert(11, "HELLO11");
pq.insert(3, "HELLO3");
pq.insert(31, "HELLO31");
pq.insert(5, "HELLO5");
pq.insert(10, "HELLO10-2");

assert(pq.length == 6);

foreach (const e; pq) {}    // iteration
assert(!pq.empty);          // shouldn't consume queue

// Copy by value
pq2 = pq;

foreach (priority, value; pq) {
pq.popFront();
}

// pq and pq2 should be independent
assert( !pq2.empty);
assert( pq.empty );

// Test merging
pq3.insert(tuple(12, "HELLO12"));
pq3.insert(Tuple!(int, string)(17, "HELLO17"));
pq3.insert(tuple(7, "HELLO7"));

pq = pq2.merge(pq3);

assert ( !pq.empty);

assert(pq.front == tuple(3, "HELLO3"));
pq.popFront;
assert(pq.front == tuple(5, "HELLO5"));
pq.popFront;
assert(pq.front == tuple(7, "HELLO7"));
pq.popFront;

assert( pq.length == 6 );
}

And a little driver main() to illustrate the queue a bit better:

main() {

PriorityQueue!(int, string) pq, pq2, pq3;

pq.insert(10, "HELLO10");
pq.insert(11, "HELLO11");
pq.insert(Tuple!(int, string)(3, "HELLO3"));
pq.insert(5, "HELLO5");
pq.insert(Tuple!(int, string)(12, "HELLO12"));
pq.insert(Tuple!(int, string)(17, "HELLO17"));

pq2.insert(Tuple!(int, string)(15, "HELLO15"));
pq2.insert(Tuple!(int, string)(21, "HELLO21"));

writefln("\tPQ: %s \n\tPQ2: %s \n\tPQ3: %s", pq, pq2, pq3);

pq3 = pq.merge(pq2);

foreach(priority, value; pq3) {

writefln("Priority: %s \tValue: %s \tLength: %s", priority,
value, pq3.length);
pq3.popFront();
}

}

I think I like this version better than the BinaryHeap one. It's
a bit simpler and can be extended easier. For instance, I think
it would be pretty easy to use mixins to make a seperate
PriorityQueueChangable version which allows the alteration of
priorities. It's also pretty easy to search for priorities or
values of certain requirements.

I don't know enough about D to really say if its performant in
any way, but it does do the job. It took about a week to get to
this final form, coming from someone familiar with programming
but not D.

A quick review of this project from my perspective:

The first day was mostly taken up with reading about D, then
doing an abortive attempt with associative arrays until I
realized that associative arrays are not and can not be sorted.

The next day involved finding BinaryHeap and puzzling out how to
use it. I think I got really hung up on the template syntax for a
bit. It took a little to get used to. I also over engineered the
thing with a separate helper struct until I embarrassingly
discovered tuples would do the job easier.

The third day had me hitting a wall with not thinking that the
BinaryHeap eats its elements on traversal. Coming here was a
great help.

The fourth day was mostly cleaning up the BinaryHeap version and
becoming increasingly dissatisfied with it.

And now here we are today, with a new version written with just
standard arrays. The big problems I had were screwing up the
sorting predicate (I couldn't figure out why everything would
work correctly in one direction, but not the other until I
realized that assumeSorted needed to be told which predicate to
use). The predicate was also needed for the merge portion
otherwise changing the direction of the queue would cause a type
mismatch.

Those two problems plus a few other minor ones later, and the
queue is complete. I have to say, I really like D a lot!

Thanks to you all for the assistance!
```
Aug 07 2015
"DarthCthulhu" <spam spam.com> writes:
```On Wednesday, 5 August 2015 at 09:04:54 UTC, Nordlöw wrote:
On Wednesday, 5 August 2015 at 01:02:56 UTC, DarthCthulhu wrote:
I must be doing something really stupid here, but I have no
clue what it is. Anyone know?

For functional behaviour I prefer a postblit that duplicates
the underlying BinaryHeap.

https://github.com/nordlow/justd/blob/master/priority_queue.d

Now the foreach won't consume the queue.

Oh, neat! I stumbled on the same thing (making a .dup of the
BinaryHeap), but didn't know about the postblit. That makes
things simplier.

This will however duplicate the underlying array aswell, which
is probably not what we want. How do we avoid this?

I was wondering that, myself, when I stumbled on the .dup
solution. My first thought was to instantiate a templated Array!
first, then use BinaryHeap.assume or .acquire to make it a
BinaryHeap while also keeping a reference to the underlining
array. Then one could just return the array reference in a
separate function rather than destructively iterating through the
BinaryHeap.

My experiments didn't bear this out, however. Maybe I'm
misunderstanding what the .assume and .acquire functions do?

Incidentally, I also discovered the wonderful opDot function
which allows the PriorityQueue to shunt most of the work down to
the BinaryHeap directly.

// Forward most function calls to the underlying queue.
BinaryHeap!(Array!(PV), predicate)* opDot() {
return &_q;
}

Yeah, I think there is no way to "traverse" a binary heap in
order without manipulating it. However, you can print the
underlying storage.

There's a way to get at the underlining store? I couldn't find
any means to do so in the BinaryHeap documentation.

in my vision, either x.popFront would also create a copy or you
would have to go "auto y = x.nonModifyingView" or similar. What
I don't want is something that copies 10000 elements just to use
x.take(6)

Yeah, that's it exactly!
```
Aug 05 2015