digitalmars.D - Heap: container or range?

Andrei Alexandrescu (15/15) Jan 29 2009 A "computer science heap" is a structure that offers fast access to the

Daniel Keep (6/27) Jan 29 2009 I suppose it comes down to whether or not the typical case is to either
Bill Baxter (12/26) Jan 29 2009 In response to your previous question of how to distinguish from a

Andrei Alexandrescu (10/41) Jan 29 2009 Ah, never mind all that. I realized that I can't have a heap range. This...

Bill Baxter (13/62) Jan 29 2009 Messed 'h' up or messed 'a' up?

Andrei Alexandrescu (8/31) Jan 29 2009 'a' is messed up (mutated) by definition. The problem is that after h

Bill Baxter (5/38) Jan 29 2009 Ok. great. I thought that the slicing notation to make a sequence
Christopher Wright (3/22) Jan 29 2009 I'm not seeing that. take should iterate through _h_, which will pop

Jason House (3/48) Jan 29 2009 I would have expected index take(h,5)[0] to be 16...

Andrei Alexandrescu (7/11) Jan 29 2009 Both ways have advantages and disadvantages. My early experiments

Sean Kelly (11/21) Jan 29 2009 Hm... so assuming this is a min heap, I have:

Sean Kelly (6/30) Jan 29 2009 Oh, I would also expect:

Andrei Alexandrescu (4/39) Jan 29 2009 Yah, that's what I meant. h still thinks its length is 10 and that those...

Brad Roberts (5/46) Jan 29 2009 Since take is a mutator, that implies that h needs to be a reference

Andrei Alexandrescu (46/48) Jan 30 2009 That's a great question, and one that I don't have a definitive answer

Denis Koroskin (3/7) Jan 30 2009 Why is a not [2, 3] now?

Sean Kelly (7/7) Jan 29 2009 I don't know if it matters, but I added the heap routines to Tango

Andrei Alexandrescu (7/13) Jan 30 2009 A great use for heaps is topNCopy: given an input range (!), give me

Sergey Gromov (11/49) Jan 30 2009 Actually, if ranges are by value, I'd expect

Andrei Alexandrescu (13/62) Jan 30 2009 Me too. It does work indeed BUT NOT FOR INPUT RANGES. So in short you
Denis Koroskin (3/13) Jan 30 2009 Same here. I feel that take (and some other adaptors) should accept rang...

Andrei Alexandrescu (23/35) Jan 30 2009 While I'm not 100% clear on what take should do, this particular

Brad Roberts (6/32) Jan 30 2009 I hate to come back to naming, but I think that's a major part of what's...

Andrei Alexandrescu (3/38) Jan 30 2009 Names are important. "take" is taken from Haskell.

Derek Parnell (6/7) Jan 30 2009 No it hasn't. Haskell still has it so its been "copied" from Haskell. :-...

BCS (2/6) Jan 29 2009 what would be the problem of making it (optionaly) both?
Yigal Chripun (3/18) Jan 29 2009 regarding your previous question:

Bill Baxter (6/8) Jan 29 2009 I the distinction is that a Heap is one way to implement a Priority

Yigal Chripun (2/11) Jan 30 2009 Ah. right.. :)

Don (16/37) Jan 30 2009 I think this depends on whether heap operations are required (and

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

A "computer science heap" is a structure that offers fast access to the 
largest element and fast extraction of it (which in turn provides access 
to the next largest element etc.).

I'm just done working on the heap in std.algorithm. Now, it turns out 
that heap supports both a meaningful definition as a full-fledged 
container, and a beautiful definition as a range.

If Heap is a range, you initiate it with another range, which Heap 
organizes in the heap manner. Then, successive calls to next() nicely 
extract elements starting from the largest. If the underlying range 
supports put(), then Heap also supports put() to insert into the heap.

Heap as a container would offer similar primitives but in addition would 
"own" its data (would call destructors upon destruction, and would 
support value copying).

What do you think? Should I make Heap a container or a range?


Andrei

Jan 29 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Andrei Alexandrescu wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access
 to the next largest element etc.).
 
 I'm just done working on the heap in std.algorithm. Now, it turns out
 that heap supports both a meaningful definition as a full-fledged
 container, and a beautiful definition as a range.
 
 If Heap is a range, you initiate it with another range, which Heap
 organizes in the heap manner. Then, successive calls to next() nicely
 extract elements starting from the largest. If the underlying range
 supports put(), then Heap also supports put() to insert into the heap.
 
 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would
 support value copying).
 
 What do you think? Should I make Heap a container or a range?
 
 
 Andrei

I suppose it comes down to whether or not the typical case is to either
have a heap or heap-view of another container.  I'd suspect the first,
to be honest.

You could always just have Heap and HeapView.  :)

  -- Daniel

Jan 29 2009

Bill Baxter <wbaxter gmail.com> writes:

On Fri, Jan 30, 2009 at 11:49 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access to
 the next largest element etc.).

In response to your previous question of how to distinguish from a
memory heap, you can use the specific name for the kind of heap it is,
like BinaryHeap, Binomial Heap, Fibbonocci Heap, etc.


 I'm just done working on the heap in std.algorithm. Now, it turns out that
 heap supports both a meaningful definition as a full-fledged container, and
 a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap organizes
 in the heap manner. Then, successive calls to next() nicely extract elements
 starting from the largest. If the underlying range supports put(), then Heap
 also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would support
 value copying).

 What do you think? Should I make Heap a container or a range?

Both sound useful depending on circumstances.  One provides the
all-in-one convenient solution, the other is more of a "non-invasive
heap".

If you emphasize the non-invasive version, then creating the
all-in-one version out of that should be pretty easy.  But you can't
take an all-in-one and turn it non-invasive so easily.

--bb

Jan 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 11:49 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access to
 the next largest element etc.).

 
 In response to your previous question of how to distinguish from a
 memory heap, you can use the specific name for the kind of heap it is,
 like BinaryHeap, Binomial Heap, Fibbonocci Heap, etc.

BinaryHeap it is. Thanks!

 I'm just done working on the heap in std.algorithm. Now, it turns out that
 heap supports both a meaningful definition as a full-fledged container, and
 a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap organizes
 in the heap manner. Then, successive calls to next() nicely extract elements
 starting from the largest. If the underlying range supports put(), then Heap
 also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would support
 value copying).

 What do you think? Should I make Heap a container or a range?

 
 Both sound useful depending on circumstances.  One provides the
 all-in-one convenient solution, the other is more of a "non-invasive
 heap".
 
 If you emphasize the non-invasive version, then creating the
 all-in-one version out of that should be pretty easy.  But you can't
 take an all-in-one and turn it non-invasive so easily.

Ah, never mind all that. I realized that I can't have a heap range. This 
is because heaps must mutate the underlying range in-place and any copy 
will mess the heap up. Here's the example that clarify it for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

At this point the mutations done by take messed up h already!!


Andrei

Jan 29 2009

Bill Baxter <wbaxter gmail.com> writes:

On Fri, Jan 30, 2009 at 12:14 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 11:49 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access
 to
 the next largest element etc.).

 In response to your previous question of how to distinguish from a
 memory heap, you can use the specific name for the kind of heap it is,
 like BinaryHeap, Binomial Heap, Fibbonocci Heap, etc.

 BinaryHeap it is. Thanks!

 I'm just done working on the heap in std.algorithm. Now, it turns out
 that
 heap supports both a meaningful definition as a full-fledged container,
 and
 a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap
 organizes
 in the heap manner. Then, successive calls to next() nicely extract
 elements
 starting from the largest. If the underlying range supports put(), then
 Heap
 also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would
 support
 value copying).

 What do you think? Should I make Heap a container or a range?

 Both sound useful depending on circumstances.  One provides the
 all-in-one convenient solution, the other is more of a "non-invasive
 heap".

 If you emphasize the non-invasive version, then creating the
 all-in-one version out of that should be pretty easy.  But you can't
 take an all-in-one and turn it non-invasive so easily.

 Ah, never mind all that. I realized that I can't have a heap range. This is
 because heaps must mutate the underlying range in-place and any copy will
 mess the heap up. Here's the example that clarify it for me:

        int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
        auto h = Heap!(int[])(a);
        assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

Messed 'h' up or messed 'a' up?

Anyway, in that case it would be nice if you can easily run the heap
indirectly on an index or pointer buffer.

Will something like this work?
HeavyElement[] arrayOfHeavies;
auto h = Heap!(int[], (int i, int j){return
arrayOfHeavies[i]<arrayOfHeavies[j];} )( 0..arrayOfHeavies.length )

Now   arrayOfHeavies[h.pop()] should give you the smallest element of
the array of heavies, without ever having to copy/swap anything but
ints.

--bb

Jan 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 12:14 PM, Andrei Alexandrescu
 Ah, never mind all that. I realized that I can't have a heap range. This is
 because heaps must mutate the underlying range in-place and any copy will
 mess the heap up. Here's the example that clarify it for me:

        int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
        auto h = Heap!(int[])(a);
        assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

 
 Messed 'h' up or messed 'a' up?

'a' is messed up (mutated) by definition. The problem is that after h 
was copied to do take's deed, it got messed up (i.e., doesn't respect 
the heap property anymore).

 Anyway, in that case it would be nice if you can easily run the heap
 indirectly on an index or pointer buffer.
 
 Will something like this work?
 HeavyElement[] arrayOfHeavies;
 auto h = Heap!(int[], (int i, int j){return
 arrayOfHeavies[i]<arrayOfHeavies[j];} )( 0..arrayOfHeavies.length )
 
 Now   arrayOfHeavies[h.pop()] should give you the smallest element of
 the array of heavies, without ever having to copy/swap anything but
 ints.

Yah, that should work. You just need to initially fill the heap with the 
numbers 0 to arrayOfHeavies.length. I presume that's what you meant with 
the illegal slicing notation :o).


Andrei

Jan 29 2009

Bill Baxter <wbaxter gmail.com> writes:

On Fri, Jan 30, 2009 at 1:17 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 12:14 PM, Andrei Alexandrescu
 Ah, never mind all that. I realized that I can't have a heap range. This
 is
 because heaps must mutate the underlying range in-place and any copy will
 mess the heap up. Here's the example that clarify it for me:

       int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
       auto h = Heap!(int[])(a);
       assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

 Messed 'h' up or messed 'a' up?

 'a' is messed up (mutated) by definition. The problem is that after h was
 copied to do take's deed, it got messed up (i.e., doesn't respect the heap
 property anymore).

 Anyway, in that case it would be nice if you can easily run the heap
 indirectly on an index or pointer buffer.

 Will something like this work?
 HeavyElement[] arrayOfHeavies;
 auto h = Heap!(int[], (int i, int j){return
 arrayOfHeavies[i]<arrayOfHeavies[j];} )( 0..arrayOfHeavies.length )

 Now   arrayOfHeavies[h.pop()] should give you the smallest element of
 the array of heavies, without ever having to copy/swap anything but
 ints.

 Yah, that should work. You just need to initially fill the heap with the
 numbers 0 to arrayOfHeavies.length. I presume that's what you meant with the
 illegal slicing notation :o).

Ok. great.  I thought that the slicing notation to make a sequence
worked everywhere now in D2.  I guess it's only inside a foreach.

--bb

Jan 29 2009

Christopher Wright <dhasenan gmail.com> writes:

Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 12:14 PM, Andrei Alexandrescu
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is
 because heaps must mutate the underlying range in-place and any copy 
 will
 mess the heap up. Here's the example that clarify it for me:

        int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
        auto h = Heap!(int[])(a);
        assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

 Messed 'h' up or messed 'a' up?

 
 'a' is messed up (mutated) by definition. The problem is that after h 
 was copied to do take's deed, it got messed up (i.e., doesn't respect 
 the heap property anymore).

I'm not seeing that. take should iterate through _h_, which will pop 
elements from _a_ (or a copy thereof) while maintaining the heap invariant.

Jan 29 2009

Jason House <jason.james.house gmail.com> writes:

Andrei Alexandrescu Wrote:

 Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 11:49 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access to
 the next largest element etc.).

 
 In response to your previous question of how to distinguish from a
 memory heap, you can use the specific name for the kind of heap it is,
 like BinaryHeap, Binomial Heap, Fibbonocci Heap, etc.

 
 BinaryHeap it is. Thanks!
 
 I'm just done working on the heap in std.algorithm. Now, it turns out that
 heap supports both a meaningful definition as a full-fledged container, and
 a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap organizes
 in the heap manner. Then, successive calls to next() nicely extract elements
 starting from the largest. If the underlying range supports put(), then Heap
 also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would support
 value copying).

 What do you think? Should I make Heap a container or a range?

 
 Both sound useful depending on circumstances.  One provides the
 all-in-one convenient solution, the other is more of a "non-invasive
 heap".
 
 If you emphasize the non-invasive version, then creating the
 all-in-one version out of that should be pretty easy.  But you can't
 take an all-in-one and turn it non-invasive so easily.

 
 Ah, never mind all that. I realized that I can't have a heap range. This 
 is because heaps must mutate the underlying range in-place and any copy 
 will mess the heap up. Here's the example that clarify it for me:
 
          int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
          auto h = Heap!(int[])(a);
          assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);
 
 At this point the mutations done by take messed up h already!!

I would have expected index take(h,5)[0] to be 16...

I still have yet to come to terms with passing ranges by value. I would expect
take to modify my range. I naturally expect heaps to be destructive as elements
are taken out. I also expect ranges to shrink. I don't see any issue...

Jan 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 I still have yet to come to terms with passing ranges by value. I
 would expect take to modify my range. I naturally expect heaps to be
 destructive as elements are taken out. I also expect ranges to
 shrink. I don't see any issue...

Both ways have advantages and disadvantages. My early experiments 
involved passing by reference and it was an absolute mess: I'd forget a 
"ref" here and there with weird results, or I'd forget to save a copy of 
my range and I'd find is shrunk like an dehydrated fig in no time. One 
advantage of pass by value is that code is shorter and nicer - no need 
for a lot of temporaries.

Andrei

Jan 29 2009

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 
 Ah, never mind all that. I realized that I can't have a heap range. This 
 is because heaps must mutate the underlying range in-place and any copy 
 will mess the heap up. Here's the example that clarify it for me:
 
         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);
 
 At this point the mutations done by take messed up h already!!

Hm... so assuming this is a min heap, I have:

a = [4,1,3,2,16,9,10,14,8,7];
h = [1,2,3,4,7,9,10,14,8,16];
take(5, h);
h = [8,10,9,16,14];

Shouldn't take call next repeatedly on the heap, which will in turn 
perform a popHeap operation?  It's difficult to be sure what the result 
will be after take(5,h) occurs, but it should still be a valid heap, 
correct?


Sean

Jan 29 2009

Sean Kelly <sean invisibleduck.org> writes:

Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is because heaps must mutate the underlying range in-place and 
 any copy will mess the heap up. Here's the example that clarify it for 
 me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

 
 Hm... so assuming this is a min heap, I have:
 
 a = [4,1,3,2,16,9,10,14,8,7];
 h = [1,2,3,4,7,9,10,14,8,16];
 take(5, h);
 h = [8,10,9,16,14];
 
 Shouldn't take call next repeatedly on the heap, which will in turn 
 perform a popHeap operation?  It's difficult to be sure what the result 
 will be after take(5,h) occurs, but it should still be a valid heap, 
 correct?

Oh, I would also expect:

a = [8,10,9,16,14, garbage];

Since it isn't aware of the length adjustment.  Perhaps this is what you 
meant?


Sean

Jan 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is because heaps must mutate the underlying range in-place and 
 any copy will mess the heap up. Here's the example that clarify it 
 for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

 Hm... so assuming this is a min heap, I have:

 a = [4,1,3,2,16,9,10,14,8,7];
 h = [1,2,3,4,7,9,10,14,8,16];
 take(5, h);
 h = [8,10,9,16,14];

 Shouldn't take call next repeatedly on the heap, which will in turn 
 perform a popHeap operation?  It's difficult to be sure what the 
 result will be after take(5,h) occurs, but it should still be a valid 
 heap, correct?

 
 Oh, I would also expect:
 
 a = [8,10,9,16,14, garbage];
 
 Since it isn't aware of the length adjustment.  Perhaps this is what you 
 meant?
 
 
 Sean

Yah, that's what I meant. h still thinks its length is 10 and that those 
10 elements respect the heap property.

Andrei

Jan 29 2009

Brad Roberts <braddr puremagic.com> writes:

Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range.
 This is because heaps must mutate the underlying range in-place and
 any copy will mess the heap up. Here's the example that clarify it
 for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

 Hm... so assuming this is a min heap, I have:

 a = [4,1,3,2,16,9,10,14,8,7];
 h = [1,2,3,4,7,9,10,14,8,16];
 take(5, h);
 h = [8,10,9,16,14];

 Shouldn't take call next repeatedly on the heap, which will in turn
 perform a popHeap operation?  It's difficult to be sure what the
 result will be after take(5,h) occurs, but it should still be a valid
 heap, correct?

 Oh, I would also expect:

 a = [8,10,9,16,14, garbage];

 Since it isn't aware of the length adjustment.  Perhaps this is what
 you meant?


 Sean

 
 Yah, that's what I meant. h still thinks its length is 10 and that those
 10 elements respect the heap property.
 
 Andrei

Since take is a mutator, that implies that h needs to be a reference
type or a by ref value, right?  So, why not make it so?

Later,
Brad

Jan 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Brad Roberts wrote:
 Since take is a mutator, that implies that h needs to be a reference
 type or a by ref value, right?  So, why not make it so?

That's a great question, and one that I don't have a definitive answer 
to. There are several reasons for which by-ref manipulation is dicey:

1. Composition is harder.

Consider:

foreach (e; take(10, map!("a*a")(range))) ...

In this case, take is passed an rvalue. Rvalues can't and shouldn't bind 
to references (I think C++'s rule of binding rvalues to const references 
was a subtle disaster that necessitated endless contortions in defining 
rvalue references).

So then we need to explicitate:

auto t = map!("a*a")(range);
foreach (e; take(10, t)) ...

2. Difficulties in coding.

My first attempt at ranges was in some formatting code. A by-ref 
doctrine requires that "ref" is added religiously to the trafficked 
values. That's not much of a problem except that, if you forget, the 
code still compiles and runs as told, except that the way it's told is 
not the way you meant.

3. More difficulties in coding

Coding with more non-local dependencies is harder than coding that 
creates new, relatively independent entities. The examples given so far 
were as simple as:

int[] a = [ 1, 2, 3 ];
take(1, a);
// WHY IS A NOT [2, 3] NOW???

In reality, code is almost never that simple and the reality is that it 
becomes exceedingly harder to assess the state of a range after several 
other lazy ranges mutate it at their leisure.

That's what I have so far.

On the other hand, other functions are a tougher call. Consider "drop". 
drop(n, range) throws away at most n elements from the range. There are 
a few possible ways to define drop:

a) By reference
int[] a = [ 1, 2, 3 ];
drop(1, a);
assert(a == [ 2, 3]);

b) By value
int[] a = [ 1, 2, 3 ];
drop(1, a);
assert(a == [ 1, 2, 3]);
a = drop(1, a);
assert(a == [ 2, 3]);

Unlike take, drop is eager, hence its by-ref implementation doesn't 
raise as many problems. I'm not sure what's the best choice for drop.


Andrei

Jan 30 2009

Denis Koroskin <2korden gmail.com> writes:

Andrei Alexandrescu Wrote:

[snip]
 
 int[] a = [ 1, 2, 3 ];
 take(1, a);
 

Why is a not [2, 3] now?

Jan 30 2009

Sean Kelly <sean invisibleduck.org> writes:

I don't know if it matters, but I added the heap routines to Tango 
because I wanted heapsort, so they kind of came for free.  Perhaps a key 
distinction between ranges and algorithms is that algorithms may mutate 
the underlying data, but ranges may not?  I've been trying to think of 
another example of a mutating range and I haven't come up with one 
yet... unless you consider input ranges mutating, I suppose.


Sean

Jan 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 I don't know if it matters, but I added the heap routines to Tango 
 because I wanted heapsort, so they kind of came for free.  Perhaps a key 
 distinction between ranges and algorithms is that algorithms may mutate 
 the underlying data, but ranges may not?  I've been trying to think of 
 another example of a mutating range and I haven't come up with one 
 yet... unless you consider input ranges mutating, I suppose.

A great use for heaps is topNCopy: given an input range (!), give me 
back the top N items in it, fast. The way that works is by maintaining a 
heap in the return value.

I'm sure interesting cases will arise with ranges that mutate their 
host, but heap doesn't look like a good candidate.


Andrei

Jan 30 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 29 Jan 2009 21:26:39 -0800, Andrei Alexandrescu wrote:

 Sean Kelly wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is because heaps must mutate the underlying range in-place and 
 any copy will mess the heap up. Here's the example that clarify it 
 for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

 Hm... so assuming this is a min heap, I have:

 a = [4,1,3,2,16,9,10,14,8,7];
 h = [1,2,3,4,7,9,10,14,8,16];
 take(5, h);
 h = [8,10,9,16,14];

 Shouldn't take call next repeatedly on the heap, which will in turn 
 perform a popHeap operation?  It's difficult to be sure what the 
 result will be after take(5,h) occurs, but it should still be a valid 
 heap, correct?

 
 Oh, I would also expect:
 
 a = [8,10,9,16,14, garbage];
 
 Since it isn't aware of the length adjustment.  Perhaps this is what you 
 meant?
 
 Sean

 
 Yah, that's what I meant. h still thinks its length is 10 and that those 
 10 elements respect the heap property.

Actually, if ranges are by value, I'd expect

assert(equal(take(5, h) == take(5, h)));

OTOH, I'd expect something like the following to also work:

auto f = new File("blah.txt");
auto top = take(10, f.lines);

I.e. I'd expect some ranges to be Translators, and others to be
Mutators.  Translators are read-only views into underlying anything
which may involve some expensive bookkeeping.  Mutators are basically
references into a particular container and make sure that the container
and other Mutators stay consistent.

Jan 30 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Thu, 29 Jan 2009 21:26:39 -0800, Andrei Alexandrescu wrote:
 
 Sean Kelly wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Ah, never mind all that. I realized that I can't have a heap range. 
 This is because heaps must mutate the underlying range in-place and 
 any copy will mess the heap up. Here's the example that clarify it 
 for me:

         int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
         auto h = Heap!(int[])(a);
         assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);

 At this point the mutations done by take messed up h already!!

 Hm... so assuming this is a min heap, I have:

 a = [4,1,3,2,16,9,10,14,8,7];
 h = [1,2,3,4,7,9,10,14,8,16];
 take(5, h);
 h = [8,10,9,16,14];

 Shouldn't take call next repeatedly on the heap, which will in turn 
 perform a popHeap operation?  It's difficult to be sure what the 
 result will be after take(5,h) occurs, but it should still be a valid 
 heap, correct?

 Oh, I would also expect:

 a = [8,10,9,16,14, garbage];

 Since it isn't aware of the length adjustment.  Perhaps this is what you 
 meant?

 Sean

 Yah, that's what I meant. h still thinks its length is 10 and that those 
 10 elements respect the heap property.

 
 Actually, if ranges are by value, I'd expect
 
 assert(equal(take(5, h) == take(5, h)));

Me too. It does work indeed BUT NOT FOR INPUT RANGES. So in short you 
are showing yet another reason why heaps can't be good ranges.

 OTOH, I'd expect something like the following to also work:
 
 auto f = new File("blah.txt");
 auto top = take(10, f.lines);

That also works (just that in the new std.stdio File is a struct, so 
it's not dynamically allocated). However, f.lines is understandably an 
input range so this will NOT work:

auto f = new File("blah.txt");
assert(equal(take(5, f.lines) == take(5, f.lines)));

The two f.lines instances operate on the same underlying file handle, so 
they are input ranges: advancing one advances all others.

 I.e. I'd expect some ranges to be Translators, and others to be
 Mutators.  Translators are read-only views into underlying anything
 which may involve some expensive bookkeeping.  Mutators are basically
 references into a particular container and make sure that the container
 and other Mutators stay consistent.

I think the distinction is input vs. forward and mutable vs. immutable 
ranges (the latter distinction is not checked in yet).

Andrei

Jan 30 2009

Denis Koroskin <2korden gmail.com> writes:

Sergey Gromov Wrote:
[snip]
 OTOH, I'd expect something like the following to also work:
 
 auto f = new File("blah.txt");
 auto top = take(10, f.lines);
 

Same here. I feel that take (and some other adaptors) should accept range by
reference. One of the advantages of ranges over opApply is that you can start
foreach, break somewhere (throw an exception, recover) and then continue. It
also means that the range is mutated inside foreach. If foreach mutates the
range, so can other algorithms that operate on ranges do.

 I.e. I'd expect some ranges to be Translators, and others to be
 Mutators.  Translators are read-only views into underlying anything
 which may involve some expensive bookkeeping.  Mutators are basically
 references into a particular container and make sure that the container
 and other Mutators stay consistent.

Jan 30 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 Sergey Gromov Wrote: [snip]
 OTOH, I'd expect something like the following to also work:
 
 auto f = new File("blah.txt"); auto top = take(10, f.lines);
 

 
 Same here. I feel that take (and some other adaptors) should accept
 range by reference. One of the advantages of ranges over opApply is
 that you can start foreach, break somewhere (throw an exception,
 recover) and then continue. It also means that the range is mutated
 inside foreach. If foreach mutates the range, so can other algorithms
 that operate on ranges do.

While I'm not 100% clear on what take should do, this particular 
argument is not that compelling. I really mean not compelling at all.

Consider:

int[] a = [ 1, 2, 3, 4, 5];
foreach (e; a)
{
    if (e == 3) break;
}
assert(a == [4, 5]);

Do you seriously expect that to be the case? Of course not. However, you 
would inconsistently expect this to happen:

int[] a = [ 1, 2, 3, 4, 5];
foreach (e; take(4, a))
{
    if (e == 3) break;
}
assert(a == [4, 5]);

Ranges are modeled to correspond to slices. Slices have few operations 
that manipulate them by reference (notably the controversial ~=) and 
range are made to be unsurprising when compared to slices. They are 
really extensions of slices.


Andrei

Jan 30 2009

Brad Roberts <braddr bellevue.puremagic.com> writes:

On Fri, 30 Jan 2009, Andrei Alexandrescu wrote:

 Consider:
 
 int[] a = [ 1, 2, 3, 4, 5];
 foreach (e; a)
 {
    if (e == 3) break;
 }
 assert(a == [4, 5]);
 
 Do you seriously expect that to be the case? Of course not. However, you would
 inconsistently expect this to happen:
 
 int[] a = [ 1, 2, 3, 4, 5];
 foreach (e; take(4, a))
 {
    if (e == 3) break;
 }
 assert(a == [4, 5]);
 
 Ranges are modeled to correspond to slices. Slices have few operations that
 manipulate them by reference (notably the controversial ~=) and range are made
 to be unsurprising when compared to slices. They are really extensions of
 slices.
 
 
 Andrei

I hate to come back to naming, but I think that's a major part of what's 
causing grief here.  Take implies mutation of the input range.  How about 
'subrange' or 'subset' or 'leftmost' anything that doesn't imply removal.

Later,
Brad

Jan 30 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Brad Roberts wrote:
 On Fri, 30 Jan 2009, Andrei Alexandrescu wrote:
 
 Consider:

 int[] a = [ 1, 2, 3, 4, 5];
 foreach (e; a)
 {
    if (e == 3) break;
 }
 assert(a == [4, 5]);

 Do you seriously expect that to be the case? Of course not. However, you would
 inconsistently expect this to happen:

 int[] a = [ 1, 2, 3, 4, 5];
 foreach (e; take(4, a))
 {
    if (e == 3) break;
 }
 assert(a == [4, 5]);

 Ranges are modeled to correspond to slices. Slices have few operations that
 manipulate them by reference (notably the controversial ~=) and range are made
 to be unsurprising when compared to slices. They are really extensions of
 slices.


 Andrei

 
 I hate to come back to naming, but I think that's a major part of what's 
 causing grief here.  Take implies mutation of the input range.  How about 
 'subrange' or 'subset' or 'leftmost' anything that doesn't imply removal.
 
 Later,
 Brad

Names are important. "take" is taken from Haskell.

Andrei

Jan 30 2009

Derek Parnell <derek psych.ward> writes:

On Fri, 30 Jan 2009 14:46:28 -0800, Andrei Alexandrescu wrote:

 Names are important. "take" is taken from Haskell.

No it hasn't. Haskell still has it so its been "copied" from Haskell. :-)

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jan 30 2009

BCS <none anon.com> writes:

Hello Andrei,

 What do you think? Should I make Heap a container or a range?
 
 Andrei
 

what would be the problem of making it (optionaly) both?

Jan 29 2009

Yigal Chripun <yigal100 gmail.com> writes:

Andrei Alexandrescu wrote:
 A "computer science heap" is a structure that offers fast access to the
 largest element and fast extraction of it (which in turn provides access
 to the next largest element etc.).

 I'm just done working on the heap in std.algorithm. Now, it turns out
 that heap supports both a meaningful definition as a full-fledged
 container, and a beautiful definition as a range.

 If Heap is a range, you initiate it with another range, which Heap
 organizes in the heap manner. Then, successive calls to next() nicely
 extract elements starting from the largest. If the underlying range
 supports put(), then Heap also supports put() to insert into the heap.

 Heap as a container would offer similar primitives but in addition would
 "own" its data (would call destructors upon destruction, and would
 support value copying).

 What do you think? Should I make Heap a container or a range?


 Andrei

regarding your previous question:
isn't a Heap also called a Priority Queue?

Jan 29 2009

Bill Baxter <wbaxter gmail.com> writes:

On Fri, Jan 30, 2009 at 4:01 PM, Yigal Chripun <yigal100 gmail.com> wrote:
 regarding your previous question:
 isn't a Heap also called a Priority Queue?

I the distinction is that a Heap is one way to implement a Priority
Queue.  You could implement a priority queue by scanning a linked list
for the smallest item and it would still be a (really inefficient)
Priority Queue.

--bb

Jan 29 2009

Yigal Chripun <yigal100 gmail.com> writes:

Bill Baxter wrote:
 On Fri, Jan 30, 2009 at 4:01 PM, Yigal Chripun<yigal100 gmail.com>  wrote:
 regarding your previous question:
 isn't a Heap also called a Priority Queue?

 I the distinction is that a Heap is one way to implement a Priority
 Queue.  You could implement a priority queue by scanning a linked list
 for the smallest item and it would still be a (really inefficient)
 Priority Queue.

 --bb

Ah. right.. :)

Jan 30 2009

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
 A "computer science heap" is a structure that offers fast access to the 
 largest element and fast extraction of it (which in turn provides access 
 to the next largest element etc.).
 
 I'm just done working on the heap in std.algorithm. Now, it turns out 
 that heap supports both a meaningful definition as a full-fledged 
 container, and a beautiful definition as a range.
 
 If Heap is a range, you initiate it with another range, which Heap 
 organizes in the heap manner. Then, successive calls to next() nicely 
 extract elements starting from the largest. If the underlying range 
 supports put(), then Heap also supports put() to insert into the heap.
 
 Heap as a container would offer similar primitives but in addition would 
 "own" its data (would call destructors upon destruction, and would 
 support value copying).
 
 What do you think? Should I make Heap a container or a range?
 
 
 Andrei

I think this depends on whether heap operations are required (and 
usable) in places other than in a heap container.

For some applications (certain graph algorithms), you want two ways to 
access the data. You use an IndexedPriorityQueue structure, which 
contains both a heap of (key, value) pairs ordered by value, allowing 
you to pop the item with minimum value in O(log n) time, and also an 
array allowing you to access any item by key in O(1) time. Whenever you 
move items in the heap, you update the array at the same time.
It's impossible to implement such a data stucture using STL heap 
primitives, and likewise, my guess is that it would be impossible in 
both a range and container Heaps (basically, you need to be able to hook 
in a user-defined function to be called whenever items are swapped in 
the heap).
But if it is possible in one but not the other implementation, it should 
be favoured.

Jan 30 2009

D Programming

C/C++ Programming

Other

digitalmars.D - Heap: container or range?