www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Heap: container or range?

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
A "computer science heap" is a structure that offers fast access to the 
largest element and fast extraction of it (which in turn provides access 
to the next largest element etc.).

I'm just done working on the heap in std.algorithm. Now, it turns out 
that heap supports both a meaningful definition as a full-fledged 
container, and a beautiful definition as a range.

If Heap is a range, you initiate it with another range, which Heap 
organizes in the heap manner. Then, successive calls to next() nicely 
extract elements starting from the largest. If the underlying range 
supports put(), then Heap also supports put() to insert into the heap.

Heap as a container would offer similar primitives but in addition would 
"own" its data (would call destructors upon destruction, and would 
support value copying).

What do you think? Should I make Heap a container or a range?


Andrei
Jan 29 2009
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access
 to the next largest element etc.).
 
 I'm just done working on the heap in std.algorithm. Now, it turns out
 that heap supports both a meaningful definition as a full-fledged
 container, and a beautiful definition as a range.
 
 If Heap is a range, you initiate it with another range, which Heap
 organizes in the heap manner. Then, successive calls to next() nicely
 extract elements starting from the largest. If the underlying range
 supports put(), then Heap also supports put() to insert into the heap.
 
 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would
 support value copying).
 
 What do you think? Should I make Heap a container or a range?
 
 
 Andrei

I suppose it comes down to whether or not the typical case is to either have a heap or heap-view of another container. I'd suspect the first, to be honest. You could always just have Heap and HeapView. :) -- Daniel
Jan 29 2009
prev sibling next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Fri, Jan 30, 2009 at 11:49 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access to
 the next largest element etc.).

In response to your previous question of how to distinguish from a memory heap, you can use the specific name for the kind of heap it is, like BinaryHeap, Binomial Heap, Fibbonocci Heap, etc.
 I'm just done working on the heap in std.algorithm. Now, it turns out that
 heap supports both a meaningful definition as a full-fledged container, and
 a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap organizes
 in the heap manner. Then, successive calls to next() nicely extract elements
 starting from the largest. If the underlying range supports put(), then Heap
 also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would support
 value copying).

 What do you think? Should I make Heap a container or a range?

Both sound useful depending on circumstances. One provides the all-in-one convenient solution, the other is more of a "non-invasive heap". If you emphasize the non-invasive version, then creating the all-in-one version out of that should be pretty easy. But you can't take an all-in-one and turn it non-invasive so easily. --bb
Jan 29 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 11:49 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access to
 the next largest element etc.).

In response to your previous question of how to distinguish from a memory heap, you can use the specific name for the kind of heap it is, like BinaryHeap, Binomial Heap, Fibbonocci Heap, etc.

BinaryHeap it is. Thanks!
 I'm just done working on the heap in std.algorithm. Now, it turns out that
 heap supports both a meaningful definition as a full-fledged container, and
 a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap organizes
 in the heap manner. Then, successive calls to next() nicely extract elements
 starting from the largest. If the underlying range supports put(), then Heap
 also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would support
 value copying).

 What do you think? Should I make Heap a container or a range?

Both sound useful depending on circumstances. One provides the all-in-one convenient solution, the other is more of a "non-invasive heap". If you emphasize the non-invasive version, then creating the all-in-one version out of that should be pretty easy. But you can't take an all-in-one and turn it non-invasive so easily.

Ah, never mind all that. I realized that I can't have a heap range. This is because heaps must mutate the underlying range in-place and any copy will mess the heap up. Here's the example that clarify it for me: int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ]; auto h = Heap!(int[])(a); assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]); At this point the mutations done by take messed up h already!! Andrei
Jan 29 2009
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 11:49 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access to
 the next largest element etc.).

In response to your previous question of how to distinguish from a memory heap, you can use the specific name for the kind of heap it is, like BinaryHeap, Binomial Heap, Fibbonocci Heap, etc.

BinaryHeap it is. Thanks!
 I'm just done working on the heap in std.algorithm. Now, it turns out that
 heap supports both a meaningful definition as a full-fledged container, and
 a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap organizes
 in the heap manner. Then, successive calls to next() nicely extract elements
 starting from the largest. If the underlying range supports put(), then Heap
 also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would support
 value copying).

 What do you think? Should I make Heap a container or a range?

Both sound useful depending on circumstances. One provides the all-in-one convenient solution, the other is more of a "non-invasive heap". If you emphasize the non-invasive version, then creating the all-in-one version out of that should be pretty easy. But you can't take an all-in-one and turn it non-invasive so easily.

Ah, never mind all that. I realized that I can't have a heap range. This is because heaps must mutate the underlying range in-place and any copy will mess the heap up. Here's the example that clarify it for me: int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ]; auto h = Heap!(int[])(a); assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]); At this point the mutations done by take messed up h already!!

I would have expected index take(h,5)[0] to be 16... I still have yet to come to terms with passing ranges by value. I would expect take to modify my range. I naturally expect heaps to be destructive as elements are taken out. I also expect ranges to shrink. I don't see any issue...
Jan 29 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
 I still have yet to come to terms with passing ranges by value. I
 would expect take to modify my range. I naturally expect heaps to be
 destructive as elements are taken out. I also expect ranges to
 shrink. I don't see any issue...

Both ways have advantages and disadvantages. My early experiments involved passing by reference and it was an absolute mess: I'd forget a "ref" here and there with weird results, or I'd forget to save a copy of my range and I'd find is shrunk like an dehydrated fig in no time. One advantage of pass by value is that code is shorter and nicer - no need for a lot of temporaries. Andrei
Jan 29 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 12:14 PM, Andrei Alexandrescu
 Ah, never mind all that. I realized that I can't have a heap range. This is
 because heaps must mutate the underlying range in-place and any copy will
 mess the heap up. Here's the example that clarify it for me:

        int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
        auto h = Heap!(int[])(a);
        assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

Messed 'h' up or messed 'a' up?

'a' is messed up (mutated) by definition. The problem is that after h was copied to do take's deed, it got messed up (i.e., doesn't respect the heap property anymore).
 Anyway, in that case it would be nice if you can easily run the heap
 indirectly on an index or pointer buffer.
 
 Will something like this work?
 HeavyElement[] arrayOfHeavies;
 auto h = Heap!(int[], (int i, int j){return
 arrayOfHeavies[i]<arrayOfHeavies[j];} )( 0..arrayOfHeavies.length )
 
 Now   arrayOfHeavies[h.pop()] should give you the smallest element of
 the array of heavies, without ever having to copy/swap anything but
 ints.

Yah, that should work. You just need to initially fill the heap with the numbers 0 to arrayOfHeavies.length. I presume that's what you meant with the illegal slicing notation :o). Andrei
Jan 29 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 12:14 PM, Andrei Alexandrescu
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is
 because heaps must mutate the underlying range in-place and any copy 
 will
 mess the heap up. Here's the example that clarify it for me:

        int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
        auto h = Heap!(int[])(a);
        assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

Messed 'h' up or messed 'a' up?

'a' is messed up (mutated) by definition. The problem is that after h was copied to do take's deed, it got messed up (i.e., doesn't respect the heap property anymore).

I'm not seeing that. take should iterate through _h_, which will pop elements from _a_ (or a copy thereof) while maintaining the heap invariant.
Jan 29 2009
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 Ah, never mind all that. I realized that I can't have a heap range. This 
 is because heaps must mutate the underlying range in-place and any copy 
 will mess the heap up. Here's the example that clarify it for me:
 
         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);
 
 At this point the mutations done by take messed up h already!!

Hm... so assuming this is a min heap, I have: a = [4,1,3,2,16,9,10,14,8,7]; h = [1,2,3,4,7,9,10,14,8,16]; take(5, h); h = [8,10,9,16,14]; Shouldn't take call next repeatedly on the heap, which will in turn perform a popHeap operation? It's difficult to be sure what the result will be after take(5,h) occurs, but it should still be a valid heap, correct? Sean
Jan 29 2009
parent reply Sean Kelly <sean invisibleduck.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is because heaps must mutate the underlying range in-place and 
 any copy will mess the heap up. Here's the example that clarify it for 
 me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

Hm... so assuming this is a min heap, I have: a = [4,1,3,2,16,9,10,14,8,7]; h = [1,2,3,4,7,9,10,14,8,16]; take(5, h); h = [8,10,9,16,14]; Shouldn't take call next repeatedly on the heap, which will in turn perform a popHeap operation? It's difficult to be sure what the result will be after take(5,h) occurs, but it should still be a valid heap, correct?

Oh, I would also expect: a = [8,10,9,16,14, garbage]; Since it isn't aware of the length adjustment. Perhaps this is what you meant? Sean
Jan 29 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is because heaps must mutate the underlying range in-place and 
 any copy will mess the heap up. Here's the example that clarify it 
 for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

Hm... so assuming this is a min heap, I have: a = [4,1,3,2,16,9,10,14,8,7]; h = [1,2,3,4,7,9,10,14,8,16]; take(5, h); h = [8,10,9,16,14]; Shouldn't take call next repeatedly on the heap, which will in turn perform a popHeap operation? It's difficult to be sure what the result will be after take(5,h) occurs, but it should still be a valid heap, correct?

Oh, I would also expect: a = [8,10,9,16,14, garbage]; Since it isn't aware of the length adjustment. Perhaps this is what you meant? Sean

Yah, that's what I meant. h still thinks its length is 10 and that those 10 elements respect the heap property. Andrei
Jan 29 2009
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
I don't know if it matters, but I added the heap routines to Tango 
because I wanted heapsort, so they kind of came for free.  Perhaps a key 
distinction between ranges and algorithms is that algorithms may mutate 
the underlying data, but ranges may not?  I've been trying to think of 
another example of a mutating range and I haven't come up with one 
yet... unless you consider input ranges mutating, I suppose.


Sean
Jan 29 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 I don't know if it matters, but I added the heap routines to Tango 
 because I wanted heapsort, so they kind of came for free.  Perhaps a key 
 distinction between ranges and algorithms is that algorithms may mutate 
 the underlying data, but ranges may not?  I've been trying to think of 
 another example of a mutating range and I haven't come up with one 
 yet... unless you consider input ranges mutating, I suppose.

A great use for heaps is topNCopy: given an input range (!), give me back the top N items in it, fast. The way that works is by maintaining a heap in the return value. I'm sure interesting cases will arise with ranges that mutate their host, but heap doesn't look like a good candidate. Andrei
Jan 30 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Brad Roberts wrote:
 Since take is a mutator, that implies that h needs to be a reference
 type or a by ref value, right?  So, why not make it so?

That's a great question, and one that I don't have a definitive answer to. There are several reasons for which by-ref manipulation is dicey: 1. Composition is harder. Consider: foreach (e; take(10, map!("a*a")(range))) ... In this case, take is passed an rvalue. Rvalues can't and shouldn't bind to references (I think C++'s rule of binding rvalues to const references was a subtle disaster that necessitated endless contortions in defining rvalue references). So then we need to explicitate: auto t = map!("a*a")(range); foreach (e; take(10, t)) ... 2. Difficulties in coding. My first attempt at ranges was in some formatting code. A by-ref doctrine requires that "ref" is added religiously to the trafficked values. That's not much of a problem except that, if you forget, the code still compiles and runs as told, except that the way it's told is not the way you meant. 3. More difficulties in coding Coding with more non-local dependencies is harder than coding that creates new, relatively independent entities. The examples given so far were as simple as: int[] a = [ 1, 2, 3 ]; take(1, a); // WHY IS A NOT [2, 3] NOW??? In reality, code is almost never that simple and the reality is that it becomes exceedingly harder to assess the state of a range after several other lazy ranges mutate it at their leisure. That's what I have so far. On the other hand, other functions are a tougher call. Consider "drop". drop(n, range) throws away at most n elements from the range. There are a few possible ways to define drop: a) By reference int[] a = [ 1, 2, 3 ]; drop(1, a); assert(a == [ 2, 3]); b) By value int[] a = [ 1, 2, 3 ]; drop(1, a); assert(a == [ 1, 2, 3]); a = drop(1, a); assert(a == [ 2, 3]); Unlike take, drop is eager, hence its by-ref implementation doesn't raise as many problems. I'm not sure what's the best choice for drop. Andrei
Jan 30 2009
parent Denis Koroskin <2korden gmail.com> writes:
Andrei Alexandrescu Wrote:

[snip]
 
 int[] a = [ 1, 2, 3 ];
 take(1, a);
 

Why is a not [2, 3] now?
Jan 30 2009
prev sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Thu, 29 Jan 2009 21:26:39 -0800, Andrei Alexandrescu wrote:

 Sean Kelly wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is because heaps must mutate the underlying range in-place and 
 any copy will mess the heap up. Here's the example that clarify it 
 for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

Hm... so assuming this is a min heap, I have: a = [4,1,3,2,16,9,10,14,8,7]; h = [1,2,3,4,7,9,10,14,8,16]; take(5, h); h = [8,10,9,16,14]; Shouldn't take call next repeatedly on the heap, which will in turn perform a popHeap operation? It's difficult to be sure what the result will be after take(5,h) occurs, but it should still be a valid heap, correct?

Oh, I would also expect: a = [8,10,9,16,14, garbage]; Since it isn't aware of the length adjustment. Perhaps this is what you meant? Sean

Yah, that's what I meant. h still thinks its length is 10 and that those 10 elements respect the heap property.

Actually, if ranges are by value, I'd expect assert(equal(take(5, h) == take(5, h))); OTOH, I'd expect something like the following to also work: auto f = new File("blah.txt"); auto top = take(10, f.lines); I.e. I'd expect some ranges to be Translators, and others to be Mutators. Translators are read-only views into underlying anything which may involve some expensive bookkeeping. Mutators are basically references into a particular container and make sure that the container and other Mutators stay consistent.
Jan 30 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Thu, 29 Jan 2009 21:26:39 -0800, Andrei Alexandrescu wrote:
 
 Sean Kelly wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is because heaps must mutate the underlying range in-place and 
 any copy will mess the heap up. Here's the example that clarify it 
 for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

a = [4,1,3,2,16,9,10,14,8,7]; h = [1,2,3,4,7,9,10,14,8,16]; take(5, h); h = [8,10,9,16,14]; Shouldn't take call next repeatedly on the heap, which will in turn perform a popHeap operation? It's difficult to be sure what the result will be after take(5,h) occurs, but it should still be a valid heap, correct?

a = [8,10,9,16,14, garbage]; Since it isn't aware of the length adjustment. Perhaps this is what you meant? Sean

10 elements respect the heap property.

Actually, if ranges are by value, I'd expect assert(equal(take(5, h) == take(5, h)));

Me too. It does work indeed BUT NOT FOR INPUT RANGES. So in short you are showing yet another reason why heaps can't be good ranges.
 OTOH, I'd expect something like the following to also work:
 
 auto f = new File("blah.txt");
 auto top = take(10, f.lines);

That also works (just that in the new std.stdio File is a struct, so it's not dynamically allocated). However, f.lines is understandably an input range so this will NOT work: auto f = new File("blah.txt"); assert(equal(take(5, f.lines) == take(5, f.lines))); The two f.lines instances operate on the same underlying file handle, so they are input ranges: advancing one advances all others.
 I.e. I'd expect some ranges to be Translators, and others to be
 Mutators.  Translators are read-only views into underlying anything
 which may involve some expensive bookkeeping.  Mutators are basically
 references into a particular container and make sure that the container
 and other Mutators stay consistent.

I think the distinction is input vs. forward and mutable vs. immutable ranges (the latter distinction is not checked in yet). Andrei
Jan 30 2009
prev sibling next sibling parent reply Denis Koroskin <2korden gmail.com> writes:
Sergey Gromov Wrote:
[snip]
 OTOH, I'd expect something like the following to also work:
 
 auto f = new File("blah.txt");
 auto top = take(10, f.lines);
 

Same here. I feel that take (and some other adaptors) should accept range by reference. One of the advantages of ranges over opApply is that you can start foreach, break somewhere (throw an exception, recover) and then continue. It also means that the range is mutated inside foreach. If foreach mutates the range, so can other algorithms that operate on ranges do.
 I.e. I'd expect some ranges to be Translators, and others to be
 Mutators.  Translators are read-only views into underlying anything
 which may involve some expensive bookkeeping.  Mutators are basically
 references into a particular container and make sure that the container
 and other Mutators stay consistent.

Jan 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 Sergey Gromov Wrote: [snip]
 OTOH, I'd expect something like the following to also work:
 
 auto f = new File("blah.txt"); auto top = take(10, f.lines);
 

Same here. I feel that take (and some other adaptors) should accept range by reference. One of the advantages of ranges over opApply is that you can start foreach, break somewhere (throw an exception, recover) and then continue. It also means that the range is mutated inside foreach. If foreach mutates the range, so can other algorithms that operate on ranges do.

While I'm not 100% clear on what take should do, this particular argument is not that compelling. I really mean not compelling at all. Consider: int[] a = [ 1, 2, 3, 4, 5]; foreach (e; a) { if (e == 3) break; } assert(a == [4, 5]); Do you seriously expect that to be the case? Of course not. However, you would inconsistently expect this to happen: int[] a = [ 1, 2, 3, 4, 5]; foreach (e; take(4, a)) { if (e == 3) break; } assert(a == [4, 5]); Ranges are modeled to correspond to slices. Slices have few operations that manipulate them by reference (notably the controversial ~=) and range are made to be unsurprising when compared to slices. They are really extensions of slices. Andrei
Jan 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Brad Roberts wrote:
 On Fri, 30 Jan 2009, Andrei Alexandrescu wrote:
 
 Consider:

 int[] a = [ 1, 2, 3, 4, 5];
 foreach (e; a)
 {
    if (e == 3) break;
 }
 assert(a == [4, 5]);

 Do you seriously expect that to be the case? Of course not. However, you would
 inconsistently expect this to happen:

 int[] a = [ 1, 2, 3, 4, 5];
 foreach (e; take(4, a))
 {
    if (e == 3) break;
 }
 assert(a == [4, 5]);

 Ranges are modeled to correspond to slices. Slices have few operations that
 manipulate them by reference (notably the controversial ~=) and range are made
 to be unsurprising when compared to slices. They are really extensions of
 slices.


 Andrei

I hate to come back to naming, but I think that's a major part of what's causing grief here. Take implies mutation of the input range. How about 'subrange' or 'subset' or 'leftmost' anything that doesn't imply removal. Later, Brad

Names are important. "take" is taken from Haskell. Andrei
Jan 30 2009
parent Derek Parnell <derek psych.ward> writes:
On Fri, 30 Jan 2009 14:46:28 -0800, Andrei Alexandrescu wrote:

 Names are important. "take" is taken from Haskell.

No it hasn't. Haskell still has it so its been "copied" from Haskell. :-) -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Jan 30 2009
prev sibling parent Brad Roberts <braddr bellevue.puremagic.com> writes:
On Fri, 30 Jan 2009, Andrei Alexandrescu wrote:

 Consider:
 
 int[] a = [ 1, 2, 3, 4, 5];
 foreach (e; a)
 {
    if (e == 3) break;
 }
 assert(a == [4, 5]);
 
 Do you seriously expect that to be the case? Of course not. However, you would
 inconsistently expect this to happen:
 
 int[] a = [ 1, 2, 3, 4, 5];
 foreach (e; take(4, a))
 {
    if (e == 3) break;
 }
 assert(a == [4, 5]);
 
 Ranges are modeled to correspond to slices. Slices have few operations that
 manipulate them by reference (notably the controversial ~=) and range are made
 to be unsurprising when compared to slices. They are really extensions of
 slices.
 
 
 Andrei

I hate to come back to naming, but I think that's a major part of what's causing grief here. Take implies mutation of the input range. How about 'subrange' or 'subset' or 'leftmost' anything that doesn't imply removal. Later, Brad
Jan 30 2009
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range.
 This is because heaps must mutate the underlying range in-place and
 any copy will mess the heap up. Here's the example that clarify it
 for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

Hm... so assuming this is a min heap, I have: a = [4,1,3,2,16,9,10,14,8,7]; h = [1,2,3,4,7,9,10,14,8,16]; take(5, h); h = [8,10,9,16,14]; Shouldn't take call next repeatedly on the heap, which will in turn perform a popHeap operation? It's difficult to be sure what the result will be after take(5,h) occurs, but it should still be a valid heap, correct?

Oh, I would also expect: a = [8,10,9,16,14, garbage]; Since it isn't aware of the length adjustment. Perhaps this is what you meant? Sean

Yah, that's what I meant. h still thinks its length is 10 and that those 10 elements respect the heap property. Andrei

Since take is a mutator, that implies that h needs to be a reference type or a by ref value, right? So, why not make it so? Later, Brad
Jan 29 2009
prev sibling next sibling parent BCS <none anon.com> writes:
Hello Andrei,

 What do you think? Should I make Heap a container or a range?
 
 Andrei
 

what would be the problem of making it (optionaly) both?
Jan 29 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Jan 30, 2009 at 12:14 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 11:49 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access
 to
 the next largest element etc.).

In response to your previous question of how to distinguish from a memory heap, you can use the specific name for the kind of heap it is, like BinaryHeap, Binomial Heap, Fibbonocci Heap, etc.

BinaryHeap it is. Thanks!
 I'm just done working on the heap in std.algorithm. Now, it turns out
 that
 heap supports both a meaningful definition as a full-fledged container,
 and
 a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap
 organizes
 in the heap manner. Then, successive calls to next() nicely extract
 elements
 starting from the largest. If the underlying range supports put(), then
 Heap
 also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would
 support
 value copying).

 What do you think? Should I make Heap a container or a range?

Both sound useful depending on circumstances. One provides the all-in-one convenient solution, the other is more of a "non-invasive heap". If you emphasize the non-invasive version, then creating the all-in-one version out of that should be pretty easy. But you can't take an all-in-one and turn it non-invasive so easily.

Ah, never mind all that. I realized that I can't have a heap range. This is because heaps must mutate the underlying range in-place and any copy will mess the heap up. Here's the example that clarify it for me: int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ]; auto h = Heap!(int[])(a); assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]); At this point the mutations done by take messed up h already!!

Messed 'h' up or messed 'a' up? Anyway, in that case it would be nice if you can easily run the heap indirectly on an index or pointer buffer. Will something like this work? HeavyElement[] arrayOfHeavies; auto h = Heap!(int[], (int i, int j){return arrayOfHeavies[i]<arrayOfHeavies[j];} )( 0..arrayOfHeavies.length ) Now arrayOfHeavies[h.pop()] should give you the smallest element of the array of heavies, without ever having to copy/swap anything but ints. --bb
Jan 29 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Jan 30, 2009 at 1:17 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 12:14 PM, Andrei Alexandrescu
 Ah, never mind all that. I realized that I can't have a heap range. This
 is
 because heaps must mutate the underlying range in-place and any copy will
 mess the heap up. Here's the example that clarify it for me:

       int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
       auto h = Heap!(int[])(a);
       assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

Messed 'h' up or messed 'a' up?

'a' is messed up (mutated) by definition. The problem is that after h was copied to do take's deed, it got messed up (i.e., doesn't respect the heap property anymore).
 Anyway, in that case it would be nice if you can easily run the heap
 indirectly on an index or pointer buffer.

 Will something like this work?
 HeavyElement[] arrayOfHeavies;
 auto h = Heap!(int[], (int i, int j){return
 arrayOfHeavies[i]<arrayOfHeavies[j];} )( 0..arrayOfHeavies.length )

 Now   arrayOfHeavies[h.pop()] should give you the smallest element of
 the array of heavies, without ever having to copy/swap anything but
 ints.

Yah, that should work. You just need to initially fill the heap with the numbers 0 to arrayOfHeavies.length. I presume that's what you meant with the illegal slicing notation :o).

Ok. great. I thought that the slicing notation to make a sequence worked everywhere now in D2. I guess it's only inside a foreach. --bb
Jan 29 2009
prev sibling next sibling parent reply Yigal Chripun <yigal100 gmail.com> writes:
Andrei Alexandrescu wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access
 to the next largest element etc.).

 I'm just done working on the heap in std.algorithm. Now, it turns out
 that heap supports both a meaningful definition as a full-fledged
 container, and a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap
 organizes in the heap manner. Then, successive calls to next() nicely
 extract elements starting from the largest. If the underlying range
 supports put(), then Heap also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would
 support value copying).

 What do you think? Should I make Heap a container or a range?


 Andrei

regarding your previous question: isn't a Heap also called a Priority Queue?
Jan 29 2009
parent reply Bill Baxter <wbaxter gmail.com> writes:
On Fri, Jan 30, 2009 at 4:01 PM, Yigal Chripun <yigal100 gmail.com> wrote:
 regarding your previous question:
 isn't a Heap also called a Priority Queue?

I the distinction is that a Heap is one way to implement a Priority Queue. You could implement a priority queue by scanning a linked list for the smallest item and it would still be a (really inefficient) Priority Queue. --bb
Jan 29 2009
parent Yigal Chripun <yigal100 gmail.com> writes:
Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 4:01 PM, Yigal Chripun<yigal100 gmail.com>  wrote:
 regarding your previous question:
 isn't a Heap also called a Priority Queue?

I the distinction is that a Heap is one way to implement a Priority Queue. You could implement a priority queue by scanning a linked list for the smallest item and it would still be a (really inefficient) Priority Queue. --bb

Ah. right.. :)
Jan 30 2009
prev sibling parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 A "computer science heap" is a structure that offers fast access to the 
 largest element and fast extraction of it (which in turn provides access 
 to the next largest element etc.).
 
 I'm just done working on the heap in std.algorithm. Now, it turns out 
 that heap supports both a meaningful definition as a full-fledged 
 container, and a beautiful definition as a range.
 
 If Heap is a range, you initiate it with another range, which Heap 
 organizes in the heap manner. Then, successive calls to next() nicely 
 extract elements starting from the largest. If the underlying range 
 supports put(), then Heap also supports put() to insert into the heap.
 
 Heap as a container would offer similar primitives but in addition would 
 "own" its data (would call destructors upon destruction, and would 
 support value copying).
 
 What do you think? Should I make Heap a container or a range?
 
 
 Andrei

I think this depends on whether heap operations are required (and usable) in places other than in a heap container. For some applications (certain graph algorithms), you want two ways to access the data. You use an IndexedPriorityQueue structure, which contains both a heap of (key, value) pairs ordered by value, allowing you to pop the item with minimum value in O(log n) time, and also an array allowing you to access any item by key in O(1) time. Whenever you move items in the heap, you update the array at the same time. It's impossible to implement such a data stucture using STL heap primitives, and likewise, my guess is that it would be impossible in both a range and container Heaps (basically, you need to be able to hook in a user-defined function to be called whenever items are swapped in the heap). But if it is possible in one but not the other implementation, it should be favoured.
Jan 30 2009