www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - BinaryHeap is a range so it goes in std.range. Agree?

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I finalized BinaryHeap. It's pretty cool - it builds a forward range on 
top of a random-access range - typically T[] - or a random-access 
container - typically Array!T.

The difference is simple - if you build on top of a range the heap can't 
grow beyond the size of that range. Building on top of a container makes 
the heap growable.

Making BinaryHeap a range is actually pretty cool - just walking the 
heap is tantamount to lazily sorting the container. (Of course 
BinaryHeap has primitives in addition to the four range primitives.)

Do you agree with putting BinaryHeap in std.range (as opposed to 
std.container)?


Andrei
Jun 08 2010
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I finalized BinaryHeap. It's pretty cool - it builds a forward range on  
 top of a random-access range - typically T[] - or a random-access  
 container - typically Array!T.

 The difference is simple - if you build on top of a range the heap can't  
 grow beyond the size of that range. Building on top of a container makes  
 the heap growable.

 Making BinaryHeap a range is actually pretty cool - just walking the  
 heap is tantamount to lazily sorting the container. (Of course  
 BinaryHeap has primitives in addition to the four range primitives.)

 Do you agree with putting BinaryHeap in std.range (as opposed to  
 std.container)?

No. Binaryheap may work like a range, and it's definitely usable as a range, but the associativity of my brain says it's a container. I fear this will end up like enum for manifest constants, which behaves the way it should, but does not belong in that location. -- Simen
Jun 08 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/08/2010 10:00 AM, Simen kjaeraas wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I finalized BinaryHeap. It's pretty cool - it builds a forward range
 on top of a random-access range - typically T[] - or a random-access
 container - typically Array!T.

 The difference is simple - if you build on top of a range the heap
 can't grow beyond the size of that range. Building on top of a
 container makes the heap growable.

 Making BinaryHeap a range is actually pretty cool - just walking the
 heap is tantamount to lazily sorting the container. (Of course
 BinaryHeap has primitives in addition to the four range primitives.)

 Do you agree with putting BinaryHeap in std.range (as opposed to
 std.container)?

No. Binaryheap may work like a range, and it's definitely usable as a range, but the associativity of my brain says it's a container.

Let me be more specific: BinaryHeap does not work like a range. It _is_ a range. It implements all of a forward range's primitives with the required semantics.
 I fear this will end up like enum for manifest constants, which behaves
 the way it should, but does not belong in that location.

But where should I put it then? I thought it would be even more confusing if I put something in std.container that wait a minute, is not a container. Andrei
Jun 08 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/08/2010 10:53 AM, Simen kjaeraas wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 Let me be more specific: BinaryHeap does not work like a range. It
 _is_ a range. It implements all of a forward range's primitives with
 the required semantics.

Is not anything that works like a range automatically a range?

Yah, just wanted to be precise.
 But where should I put it then? I thought it would be even more
 confusing if I put something in std.container that wait a minute, is
 not a container.

How is it not a container? Because it uses a different container as a back-end?

It does not implement the container primitives and is not a reference type.
 Generally, a range with benefits (i.e. extra functions) is still a range
 if it implements the required range primitives with the required
 semantics.

So many containers are ranges, then.

There aren't because no container implements e.g. popFront().
 And we will see this question for
 any of those containers added to Phobos. Or we decide on some rules to
 disambiguate between the two, and follow those.

There are already rules that disambiguate range operations from similar container operations. For example a range defines popFront() whereas a container defines removeFront().
 I hold that a range is a view that does not change the underlying data,
 and does not store all its data.
 That makes an array a container, which I feel is correct. It may still
 have range functionality, and thus be both, but it should be
 categorized as a container first.

 Are there other ranges that violate this definition?

The explanation goes as Steve puts it. A container has control over its topology. I agree that a BinaryHeap built on top of a container may ultimately affect the topology of the container, which makes it unlike e.g. Take or Chain. I could choose to disallow that and simply require that BinaryHeap always works on top of a range, not a container. But I think it's useful to have the growing functionality, and I don't think that makes BinaryHeap hopelessly confusing. Andrei
Jun 08 2010
next sibling parent "Joel C. Salomon" <joelcsalomon gmail.com> writes:
On 06/08/2010 12:26 PM, Andrei Alexandrescu wrote:
 On 06/08/2010 10:53 AM, Simen kjaeraas wrote:
 I hold that a range is a view that does not change the underlying data,
 and does not store all its data.
 That makes an array a container, which I feel is correct. It may still
 have range functionality, and thus be both, but it should be
 categorized as a container first.

 Are there other ranges that violate this definition?

The explanation goes as Steve puts it. A container has control over its topology.

From your perspective as the library writer, perhaps. As a user, my criterion is this: If I might go behind the back of the view, either to directly access the underlying structure or to impose a different view, then the view is a “range”; if that is unlikely than the view is itself a structure, a “container”. One addendum: The fact that I can choose the underlying container does not count as “going behind the back of the view”. If I implement a BinaryHeap (say) on something that wraps an on-disk file, that’s just an implementation detail; it’s still just a variant of BinaryHeap. —Joel Salomon
Jun 08 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/08/2010 04:41 PM, Simen kjaeraas wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 06/08/2010 10:53 AM, Simen kjaeraas wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:


 But where should I put it then? I thought it would be even more
 confusing if I put something in std.container that wait a minute, is
 not a container.

How is it not a container? Because it uses a different container as a back-end?

It does not implement the container primitives and is not a reference type.

So it is not a container because you chose not to make it a container.

Instead of assuming that I'm simultaneously retard(ed) and stubborn, it would be more productive to ask more about my reasons. I'll assume you did so. I wrote BinaryHeap out of necessity. The necessity was to solve the selection problem (see topN, topNIndex in std.algorithm) and to implement n-way merge (see std.algorithm.nWayUnion, I think I'll rename it to nWayMerge). In all these cases, there was no necessity for defining a heap as a container; instead, using it as a means to add structure over an existing range were sufficient. If I defined the heap as a container, there would be need for one extra allocation and also extra indirections to ensure reference semantics. I wanted to avoid all that.
 There are already rules that disambiguate range operations from
 similar container operations. For example a range defines popFront()
 whereas a container defines removeFront().

And my point is that removeFront() is popFront() with a different name, so many containers could be considered ranges with mutated member functions. :p But we're arguing semantics.

It's a good point. I agree it would make sense to define a binary heap container in addition to a binary heap range. I'm only weary that they don't have very clean means to reuse code, so we'll end up with some unpleasant implementation internals. I guess I'll just have to do that.
 I agree that a BinaryHeap built on top of a container may ultimately
 affect the topology of the container, which makes it unlike e.g. Take
 or Chain. I could choose to disallow that and simply require that
 BinaryHeap always works on top of a range, not a container. But I
 think it's useful to have the growing functionality, and I don't think
 that makes BinaryHeap hopelessly confusing.

To me, this makes it a container. Now, my favorite way of dealing with this: Where would I look for a binary heap if I wanted one? I would think of it as a container, and thus check std.container. If it was not there, I would use the search function to find it. I can invent reasons, but it's mostly grounded in learned names and categories.

I think that's sensible. Andrei
Jun 08 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/09/2010 06:20 AM, Simen kjaeraas wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 It's a good point. I agree it would make sense to define a binary heap
 container in addition to a binary heap range. I'm only weary that they
 don't have very clean means to reuse code, so we'll end up with some
 unpleasant implementation internals. I guess I'll just have to do that.

I fear that this will also lead to confusion. "What? Two binheaps? What do they need that for?" Basically, the binheap range is a heapifying range, yes? So to me, it seems to make sense to have std.range.heapify, and std.container.binheap.

The problem is that operating on a range as a heap is not just one operation (i.e. heapify), it's a handful of them. STL took the approach of providing make_heap, push_heap, pop_heap, sort_heap, and is_heap. Their use is quite clunky. I think that's too unstructured an approach and would like to improve on it. Andrei
Jun 09 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/09/2010 10:13 AM, Simen kjaeraas wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 The problem is that operating on a range as a heap is not just one
 operation (i.e. heapify), it's a handful of them. STL took the
 approach of providing make_heap, push_heap, pop_heap, sort_heap, and
 is_heap. Their use is quite clunky. I think that's too unstructured an
 approach and would like to improve on it.

I might have mixed up names here. I meant heapify not as the operation, but as in 'turn this into a heap'.

Yes, but does that operation yield an object of a new type, or is just like STL's make_heap? Andrei
Jun 09 2010
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com> said:

 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and thus
 check std.container. If it was not there, I would use the search function
 to find it. I can invent reasons, but it's mostly grounded in learned
 names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.) -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jun 09 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/09/2010 07:57 AM, Michel Fortin wrote:
 On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com>
 said:

 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and thus
 check std.container. If it was not there, I would use the search function
 to find it. I can invent reasons, but it's mostly grounded in learned
 names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.)

I don't think this will scale. There are quite a few data structures out there, I'm afraid we'll have too many modules too soon. Andrei
Jun 09 2010
next sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 On 06/09/2010 07:57 AM, Michel Fortin wrote:
 On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com>
 said:

 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and 
 thus
 check std.container. If it was not there, I would use the search 
 function
 to find it. I can invent reasons, but it's mostly grounded in learned
 names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.)

I don't think this will scale. There are quite a few data structures out there, I'm afraid we'll have too many modules too soon. Andrei

On the other hand, I don't think we want a 5Mb module, either. There's a very large number of potential containers, once you include all the esoteric ones. (std.algorithm has the same problem, of course). It's a difficult tradeoff. I hope you're able to come up with a reasonable rationale.
Jun 09 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/09/2010 10:37 AM, Don wrote:
 Andrei Alexandrescu wrote:
 On 06/09/2010 07:57 AM, Michel Fortin wrote:
 On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com>
 said:

 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and
 thus
 check std.container. If it was not there, I would use the search
 function
 to find it. I can invent reasons, but it's mostly grounded in learned
 names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.)

I don't think this will scale. There are quite a few data structures out there, I'm afraid we'll have too many modules too soon. Andrei

On the other hand, I don't think we want a 5Mb module, either. There's a very large number of potential containers, once you include all the esoteric ones. (std.algorithm has the same problem, of course). It's a difficult tradeoff. I hope you're able to come up with a reasonable rationale.

I'm not sure what's the best approach. Best I can think of is by category, e.g. all sequences go in std.sequence, all trees go in std.tree, all hashes go in std.hash and so on. Since heaps are trees they may be in std.tree. But then there are quite a few heap kinds out there (binary, Fibonacci, binomial, soft) so perhaps std.heap is deserved. Andrei
Jun 09 2010
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-06-09 11:37:38 -0400, Don <nospam nospam.com> said:

 Andrei Alexandrescu wrote:
 On 06/09/2010 07:57 AM, Michel Fortin wrote:
 On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com>
 said:
 
 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and thus
 check std.container. If it was not there, I would use the search function
 to find it. I can invent reasons, but it's mostly grounded in learned
 names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.)

I don't think this will scale. There are quite a few data structures out there, I'm afraid we'll have too many modules too soon. Andrei

On the other hand, I don't think we want a 5Mb module, either. There's a very large number of potential containers, once you include all the esoteric ones.

Beside the size of the module, there is also the issue of namespace pollution. If you import std.container and you end up with a dozen of different containers plus their related functions, it's quite a lot. (Of course you can use selective import, but that's a workaround, it's better not have the problem in the first place.)
 (std.algorithm has the same problem, of course).
 It's a difficult tradeoff. I hope you're able to come up with a 
 reasonable rationale.

Personally I'd draw the line like this: a module should contain functions and types which are either tied in their implementation or which their use are correlated (if you use this thing you'll likely need this other one). For instance, a tree set and a tree map are almost the same, so they're good candidates to put together in a module (std.tree?). A linked list and an array are totally independent in implementation, and the need for one doesn't correlate with the need for another, so there's no reason to put them in the same module; they thus belong to separate modules. This "correlation in usage" rule should reduce the number of modules you need to import while also minimizing the number of symbols, the code size, and the dependencies of each module. What do you think? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jun 09 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/09/2010 02:34 PM, Michel Fortin wrote:
 On 2010-06-09 11:37:38 -0400, Don <nospam nospam.com> said:

 Andrei Alexandrescu wrote:
 On 06/09/2010 07:57 AM, Michel Fortin wrote:
 On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com>
 said:

 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container,
 and thus
 check std.container. If it was not there, I would use the search
 function
 to find it. I can invent reasons, but it's mostly grounded in learned
 names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.)

I don't think this will scale. There are quite a few data structures out there, I'm afraid we'll have too many modules too soon. Andrei

On the other hand, I don't think we want a 5Mb module, either. There's a very large number of potential containers, once you include all the esoteric ones.

Beside the size of the module, there is also the issue of namespace pollution. If you import std.container and you end up with a dozen of different containers plus their related functions, it's quite a lot. (Of course you can use selective import, but that's a workaround, it's better not have the problem in the first place.)

Say you import std.container and you end up with two dozen containers: Array, SList, DList, BinaryHeap (groovy baby!), RedBlackTree, Trie, ... Where are the clashes?
 (std.algorithm has the same problem, of course).
 It's a difficult tradeoff. I hope you're able to come up with a
 reasonable rationale.

Personally I'd draw the line like this: a module should contain functions and types which are either tied in their implementation or which their use are correlated (if you use this thing you'll likely need this other one). For instance, a tree set and a tree map are almost the same, so they're good candidates to put together in a module (std.tree?).

Hm. On the other hand, if one is using a tree set that doesn't increase by a lot the likelihood of using a tree map.
 A linked list
 and an array are totally independent in implementation, and the need for
 one doesn't correlate with the need for another, so there's no reason to
 put them in the same module; they thus belong to separate modules.

 This "correlation in usage" rule should reduce the number of modules you
 need to import while also minimizing the number of symbols, the code
 size, and the dependencies of each module. What do you think?

I think we should look for a better rule. Andrei
Jun 09 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-06-09 15:44:48 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 06/09/2010 02:34 PM, Michel Fortin wrote:
 Beside the size of the module, there is also the issue of namespace
 pollution. If you import std.container and you end up with a dozen of
 different containers plus their related functions, it's quite a lot. (Of
 course you can use selective import, but that's a workaround, it's
 better not have the problem in the first place.)

Say you import std.container and you end up with two dozen containers: Array, SList, DList, BinaryHeap (groovy baby!), RedBlackTree, Trie, ... Where are the clashes?

None... until you import something else that has a different Array type. If you import std.container because you need only BinaryHeap and doing this creates a clash with your own Array type you're going to have to disambiguate manually everywhere, rename your Array type everywhere, or to use a selective import. Not having the clash is better. And that clash will be quite frustrating if you need only BinaryHeap which has nothing to do with std.containers.Array.
 Personally I'd draw the line like this: a module should contain
 functions and types which are either tied in their implementation or
 which their use are correlated (if you use this thing you'll likely need
 this other one).
 
 For instance, a tree set and a tree map are almost the same, so they're
 good candidates to put together in a module (std.tree?).

Hm. On the other hand, if one is using a tree set that doesn't increase by a lot the likelihood of using a tree map.

Indeed, but since both share the same tree implementation, it'd be wasteful to make another module just for a few lines of code wrapping the same tree structure. The idea is to avoid creating a separate module for every small tweak of a core structure. There are multiple criteria which can be considered, my rule is based on two.
 A linked list
 and an array are totally independent in implementation, and the need for
 one doesn't correlate with the need for another, so there's no reason to
 put them in the same module; they thus belong to separate modules.
 
 This "correlation in usage" rule should reduce the number of modules you
 need to import while also minimizing the number of symbols, the code
 size, and the dependencies of each module. What do you think?

I think we should look for a better rule.

The rule I proposed was really "correlation in usage *or* shared implementation"; any of the two allows things to belong to the same module. The correlation avoids unnecessary imports, and the shared implementation part is to avoid scattering things in too many modules, and also to make it easier to keep common implementation details private. But it's just a guideline, it's what I'd try to do, not necessarily what I'd do every time. You don't have to follow a guideline when it makes more sense to do something else. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jun 09 2010
prev sibling parent Jonathan M Davis <jmdavisProg gmail.com> writes:
Michel Fortin wrote:

 Personally I'd draw the line like this: a module should contain
 functions and types which are either tied in their implementation or
 which their use are correlated (if you use this thing you'll likely
 need this other one).
 
 For instance, a tree set and a tree map are almost the same, so they're
 good candidates to put together in a module (std.tree?). A linked list
 and an array are totally independent in implementation, and the need
 for one doesn't correlate with the need for another, so there's no
 reason to put them in the same module; they thus belong to separate
 modules.
 
 This "correlation in usage" rule should reduce the number of modules
 you need to import while also minimizing the number of symbols, the
 code size, and the dependencies of each module. What do you think?
 

I'd expect that to lead to module explosion. It wouldn't be as bad as importing each class individually as typically happens in Java, but it would mean a lot more modules. It can certainly quickly lead to something close a module per container. I don't, personally, see much downside to throwing all of the containers in one module except that that's a lot to sort through when looking for a container documentation-wise. They definitely shouldn't be clashing with each other in any way, and ideally, you wouldn't be writing any code which clashed with them either since they'd be standard. Breaking up larger modules like std.algorithm and std.container isn't necessarily a bad idea, but it definitely should be done in a manner which avoids module explosion. Personally, I'd much prefer something closer to importing the entire standard library with one import than having to import each individual class one by one. - Jonathan M Davis
Jun 09 2010
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 On 06/09/2010 07:57 AM, Michel Fortin wrote:
 On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com>
 said:

 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and 
 thus
 check std.container. If it was not there, I would use the search 
 function
 to find it. I can invent reasons, but it's mostly grounded in learned
 names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.)

I don't think this will scale. There are quite a few data structures out there, I'm afraid we'll have too many modules too soon.

One solution is to have std.container consist of the following: public import std.containers.binaryheap; public import std.containers.redblacktree; ...
Jun 09 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/09/2010 02:48 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 On 06/09/2010 07:57 AM, Michel Fortin wrote:
 On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com>
 said:

 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and
 thus
 check std.container. If it was not there, I would use the search
 function
 to find it. I can invent reasons, but it's mostly grounded in learned
 names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.)

I don't think this will scale. There are quite a few data structures out there, I'm afraid we'll have too many modules too soon.

One solution is to have std.container consist of the following: public import std.containers.binaryheap; public import std.containers.redblacktree;

I wanted to do that but with the singular: public import std.container.binaryheap; So if someone imports std.container then they get them all, if someone imports std.container.something then they only get something. Having a module and a package with the same name does not currently work, and I don't think there's a good rationale for that limitation. Andrei
Jun 09 2010
parent Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 I wanted to do that but with the singular:
 
 public import std.container.binaryheap;
 
 So if someone imports std.container then they get them all, if someone 
 imports std.container.something then they only get something.
 
 Having a module and a package with the same name does not currently 
 work, and I don't think there's a good rationale for that limitation.

Because then it's overloading a name based on the contents of that name.
Jun 09 2010
prev sibling parent reply FeepingCreature <default_357-line yahoo.de> writes:
On 08.06.2010 23:41, Simen kjaeraas wrote:
 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and thus
 check std.container. 

Law of Least Astonishment. Never forget.
Jun 13 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
FeepingCreature wrote:
 On 08.06.2010 23:41, Simen kjaeraas wrote:
 Now, my favorite way of dealing with this: Where would I look for a
 binary heap if I wanted one? I would think of it as a container, and thus
 check std.container. 

Law of Least Astonishment. Never forget.

I made BinaryHeap a container, which I recently realized is the best decision. Unfortunately, the HTML documentation is not there: http://www.digitalmars.com/d/2.0/phobos/std_container.html Walter, could you please fix this? Thanks, Andrei
Jun 13 2010
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-06-08 11:09:44 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 But where should I put it then? I thought it would be even more 
 confusing if I put something in std.container that wait a minute, is 
 not a container.

import std.binheap; -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jun 08 2010
parent KennyTM~ <kennytm gmail.com> writes:
On Jun 9, 10 02:02, Michel Fortin wrote:
 On 2010-06-08 11:09:44 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 But where should I put it then? I thought it would be even more
 confusing if I put something in std.container that wait a minute, is
 not a container.

import std.binheap;

vote++;
Jun 08 2010
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 08 Jun 2010 10:47:33 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I finalized BinaryHeap. It's pretty cool - it builds a forward range on  
 top of a random-access range - typically T[] - or a random-access  
 container - typically Array!T.

 The difference is simple - if you build on top of a range the heap can't  
 grow beyond the size of that range. Building on top of a container makes  
 the heap growable.

 Making BinaryHeap a range is actually pretty cool - just walking the  
 heap is tantamount to lazily sorting the container. (Of course  
 BinaryHeap has primitives in addition to the four range primitives.)

 Do you agree with putting BinaryHeap in std.range (as opposed to  
 std.container)?

If BinaryHeap can be added to, how is it a range? I would suspect that BinaryHeap can be in std.range, but a BinaryHeap that takes a container as an input is definitely not a range IMO. You might say the same about an array, but an array is special in that if I append to one array, it does not affect the other. I don't know where it belongs. To me, a range is a type that in itself cannot affect the topology of the data structure. It's like a window into the data structure that provides a common interface. A container is the owner of the data structure, and can provide ranges over its data. This may be an inadequate view, but its worked for me so far. -Steve
Jun 08 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/08/2010 10:04 AM, Steven Schveighoffer wrote:
 On Tue, 08 Jun 2010 10:47:33 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I finalized BinaryHeap. It's pretty cool - it builds a forward range
 on top of a random-access range - typically T[] - or a random-access
 container - typically Array!T.

 The difference is simple - if you build on top of a range the heap
 can't grow beyond the size of that range. Building on top of a
 container makes the heap growable.

 Making BinaryHeap a range is actually pretty cool - just walking the
 heap is tantamount to lazily sorting the container. (Of course
 BinaryHeap has primitives in addition to the four range primitives.)

 Do you agree with putting BinaryHeap in std.range (as opposed to
 std.container)?

If BinaryHeap can be added to, how is it a range?

It is a forward range and also an output range by supporting put().
 I would suspect that
 BinaryHeap can be in std.range, but a BinaryHeap that takes a container
 as an input is definitely not a range IMO.

Well it still is. I mean isForwardRange!(BinaryHeap!(Array!int)) is true. Generally, a range with benefits (i.e. extra functions) is still a range if it implements the required range primitives with the required semantics.
 You might say the same about
 an array, but an array is special in that if I append to one array, it
 does not affect the other.

 I don't know where it belongs. To me, a range is a type that in itself
 cannot affect the topology of the data structure. It's like a window
 into the data structure that provides a common interface. A container is
 the owner of the data structure, and can provide ranges over its data.
 This may be an inadequate view, but its worked for me so far.

Actually if a heap is built on top of a container, it can affect its topology because it has the sensible primitive of adding to the heap. Andrei
Jun 08 2010
parent reply jlquinn <jlquinn optonline.net> writes:
Steven Schveighoffer Wrote:
 You might say the same about
 an array, but an array is special in that if I append to one array, it
 does not affect the other.

 I don't know where it belongs. To me, a range is a type that in itself
 cannot affect the topology of the data structure. It's like a window
 into the data structure that provides a common interface. A container is
 the owner of the data structure, and can provide ranges over its data.
 This may be an inadequate view, but its worked for me so far.

Actually if a heap is built on top of a container, it can affect its topology because it has the sensible primitive of adding to the heap.

Yes, that is my point of why at least growable heaps should not be considered ranges.

I second this point. In STL parlance, BinaryHeap would be a container adapter. Also, to me a range feels morally equivalent to an iterator and binary heap feels like a data structure that you can iterate over. And the fact that you can insert objects into it and later remove them in a manipulated order makes it feel even more like an active component and thus a container. So I'd naturally go looking for it in std.container and be a little confused to find it in std.range. It would become a quirk that one adjusts to. Jerry
Jun 08 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/08/2010 12:29 PM, jlquinn wrote:
 Steven Schveighoffer Wrote:
 You might say the same about an array, but an array is special
 in that if I append to one array, it does not affect the
 other.

 I don't know where it belongs. To me, a range is a type that in
 itself cannot affect the topology of the data structure. It's
 like a window into the data structure that provides a common
 interface. A container is the owner of the data structure, and
 can provide ranges over its data. This may be an inadequate
 view, but its worked for me so far.

Actually if a heap is built on top of a container, it can affect its topology because it has the sensible primitive of adding to the heap.

Yes, that is my point of why at least growable heaps should not be considered ranges.

I second this point. In STL parlance, BinaryHeap would be a container adapter. Also, to me a range feels morally equivalent to an iterator and binary heap feels like a data structure that you can iterate over. And the fact that you can insert objects into it and later remove them in a manipulated order makes it feel even more like an active component and thus a container. So I'd naturally go looking for it in std.container and be a little confused to find it in std.range. It would become a quirk that one adjusts to. Jerry

Unfortunately things are not that simple. A BinaryHeap built on top of a range can _still_ grow, just not larger than the size of the range. If you look through the code, you'll see that a constructor takes a range and an initial size. Oddly enough, this is a very frequent of binary heaps: you have a range and you want to take the top N. So I wouldn't want to discount that. So, what we have here are binary heaps having ranges as support (can only grow up to the size of the range) and binary heaps having containers as support (can grow with the container). Andrei
Jun 08 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 08 Jun 2010 11:12:01 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 06/08/2010 10:04 AM, Steven Schveighoffer wrote:
 On Tue, 08 Jun 2010 10:47:33 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I finalized BinaryHeap. It's pretty cool - it builds a forward range
 on top of a random-access range - typically T[] - or a random-access
 container - typically Array!T.

 The difference is simple - if you build on top of a range the heap
 can't grow beyond the size of that range. Building on top of a
 container makes the heap growable.

 Making BinaryHeap a range is actually pretty cool - just walking the
 heap is tantamount to lazily sorting the container. (Of course
 BinaryHeap has primitives in addition to the four range primitives.)

 Do you agree with putting BinaryHeap in std.range (as opposed to
 std.container)?

If BinaryHeap can be added to, how is it a range?

It is a forward range and also an output range by supporting put().
 I would suspect that
 BinaryHeap can be in std.range, but a BinaryHeap that takes a container
 as an input is definitely not a range IMO.

Well it still is. I mean isForwardRange!(BinaryHeap!(Array!int)) is true. Generally, a range with benefits (i.e. extra functions) is still a range if it implements the required range primitives with the required semantics.

Functionally yes, something that supports all the range functions can be considered a range, but semantically, to me anyways, I always consider a range to be a something that is a view into a container. Something that destroys the container as it iterates, or can affect the topology of the original container seems like it is a different animal. You could have called std.container's removeFront popFront, and then containers could be "ranges" too. Very slow and bizarre ranges :) This is part of the reason why I have trouble accepting I/O ranges -- they just behave so much differently than other ranges it seems like they should have their own interface. Have you ever seen that movie Wall-E? There's a scene where he finds a spork, and in trying to place it, he gets to his bin of forks and spoons, with a cup for each type. After going back and forth between spoons and forks, he just places it on the shelf between the two cups. Maybe BinaryHeap is a crange :)
 You might say the same about
 an array, but an array is special in that if I append to one array, it
 does not affect the other.

 I don't know where it belongs. To me, a range is a type that in itself
 cannot affect the topology of the data structure. It's like a window
 into the data structure that provides a common interface. A container is
 the owner of the data structure, and can provide ranges over its data.
 This may be an inadequate view, but its worked for me so far.

Actually if a heap is built on top of a container, it can affect its topology because it has the sensible primitive of adding to the heap.

Yes, that is my point of why at least growable heaps should not be considered ranges. -Steve
Jun 08 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 Let me be more specific: BinaryHeap does not work like a range. It _is_  
 a range. It implements all of a forward range's primitives with the  
 required semantics.

Is not anything that works like a range automatically a range?
 But where should I put it then? I thought it would be even more  
 confusing if I put something in std.container that wait a minute, is not  
 a container.

How is it not a container? Because it uses a different container as a back-end?
 Generally, a range with benefits (i.e. extra functions) is still a range
 if it implements the required range primitives with the required
 semantics.

So many containers are ranges, then. And we will see this question for any of those containers added to Phobos. Or we decide on some rules to disambiguate between the two, and follow those. I hold that a range is a view that does not change the underlying data, and does not store all its data. That makes an array a container, which I feel is correct. It may still have range functionality, and thus be both, but it should be categorized as a container first. Are there other ranges that violate this definition? -- Simen
Jun 08 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 08 Jun 2010 11:53:06 -0400, Simen kjaeraas  
<simen.kjaras gmail.com> wrote:

 I hold that a range is a view that does not change the underlying data,
 and does not store all its data.
 That makes an array a container, which I feel is correct. It may still
 have range functionality, and thus be both, but it should be
 categorized as a container first.

This is not exactly correct in my opinion. A range that presents a read-only view of the underlying data would unnecessarily exclude many algorithms. I would say it's a view of the data which can change the data, but not restructure the data (swapping two elements' values is not restructuring, re-linking two linked list nodes is restructuring). Essentially, it's a read-only view of the topology. Array is special in that even the container itself cannot change its own topology since it's based on contiguous memory which cannot be restructured, so it can also be considered a range. -Steve
Jun 08 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 08 Jun 2010 12:26:19 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I agree that a BinaryHeap built on top of a container may ultimately  
 affect the topology of the container, which makes it unlike e.g. Take or  
 Chain. I could choose to disallow that and simply require that  
 BinaryHeap always works on top of a range, not a container. But I think  
 it's useful to have the growing functionality, and I don't think that  
 makes BinaryHeap hopelessly confusing.

What about making a BinaryHeapRange and a BinaryHeapContainer (well, with better names)? Essentially, the container flavored heap would use a BinaryHeapRange for its range type, and would supply the necessary functions for it to be a container. I admit not reading any code for BinaryHeap, so this might just be a foolish suggestion... -Steve
Jun 08 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 06/08/2010 10:53 AM, Simen kjaeraas wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:


 But where should I put it then? I thought it would be even more
 confusing if I put something in std.container that wait a minute, is
 not a container.

How is it not a container? Because it uses a different container as a back-end?

It does not implement the container primitives and is not a reference type.

So it is not a container because you chose not to make it a container.
 There are already rules that disambiguate range operations from similar  
 container operations. For example a range defines popFront() whereas a  
 container defines removeFront().

And my point is that removeFront() is popFront() with a different name, so many containers could be considered ranges with mutated member functions. :p But we're arguing semantics.
 I agree that a BinaryHeap built on top of a container may ultimately  
 affect the topology of the container, which makes it unlike e.g. Take or  
 Chain. I could choose to disallow that and simply require that  
 BinaryHeap always works on top of a range, not a container. But I think  
 it's useful to have the growing functionality, and I don't think that  
 makes BinaryHeap hopelessly confusing.

To me, this makes it a container. Now, my favorite way of dealing with this: Where would I look for a binary heap if I wanted one? I would think of it as a container, and thus check std.container. If it was not there, I would use the search function to find it. I can invent reasons, but it's mostly grounded in learned names and categories. -- Simen
Jun 08 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 Instead of assuming that I'm simultaneously retard(ed) and stubborn

I'm sorry, that was not my intention.
 I wrote BinaryHeap out of necessity. The necessity was to solve the  
 selection problem (see topN, topNIndex in std.algorithm) and to  
 implement n-way merge (see std.algorithm.nWayUnion, I think I'll rename  
 it to nWayMerge).

 In all these cases, there was no necessity for defining a heap as a  
 container; instead, using it as a means to add structure over an  
 existing range were sufficient.

 If I defined the heap as a container, there would be need for one extra  
 allocation and also extra indirections to ensure reference semantics. I  
 wanted to avoid all that.

Now this is exactly what I wanted! Thank you. Sorry we had to fight to get there. :p
 It's a good point. I agree it would make sense to define a binary heap  
 container in addition to a binary heap range. I'm only weary that they  
 don't have very clean means to reuse code, so we'll end up with some  
 unpleasant implementation internals. I guess I'll just have to do that.

I fear that this will also lead to confusion. "What? Two binheaps? What do they need that for?" Basically, the binheap range is a heapifying range, yes? So to me, it seems to make sense to have std.range.heapify, and std.container.binheap. -- Simen
Jun 09 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
  Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 The problem is that operating on a range as a heap is not just one  
 operation (i.e. heapify), it's a handful of them. STL took the approach  
 of providing make_heap, push_heap, pop_heap, sort_heap, and is_heap.  
 Their use is quite clunky. I think that's too unstructured an approach  
 and would like to improve on it.

I might have mixed up names here. I meant heapify not as the operation, but as in 'turn this into a heap'. -- Simen
Jun 09 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 Having a module and a package with the same name does not currently  
 work, and I don't think there's a good rationale for that limitation.

////////////// module a; public import a.b; struct b { static int c; } ////////////// module a.b; int c; ////////////// module foo; import a; a.b.c = 3; // Where's you lookup rules now? -- Simen
Jun 09 2010
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el  9 de junio a las 14:59 me escribiste:
 On 06/09/2010 02:48 PM, Walter Bright wrote:
Andrei Alexandrescu wrote:
On 06/09/2010 07:57 AM, Michel Fortin wrote:
On 2010-06-08 17:41:22 -0400, "Simen kjaeraas" <simen.kjaras gmail.com>
said:

Now, my favorite way of dealing with this: Where would I look for a
binary heap if I wanted one? I would think of it as a container, and
thus
check std.container. If it was not there, I would use the search
function
to find it. I can invent reasons, but it's mostly grounded in learned
names and categories.

And if you were accustomed to the STL, you'd just look for a binary heap header to include instead of trying to philosophize about which category of things it fits best. That's why I suggested "std.binaryheap" earlier. (Could be "std.binheap" if you want it short.)

I don't think this will scale. There are quite a few data structures out there, I'm afraid we'll have too many modules too soon.

One solution is to have std.container consist of the following: public import std.containers.binaryheap; public import std.containers.redblacktree;

I wanted to do that but with the singular: public import std.container.binaryheap; So if someone imports std.container then they get them all, if someone imports std.container.something then they only get something. Having a module and a package with the same name does not currently work, and I don't think there's a good rationale for that limitation.

That's why the convetion .all is used. If that worked I guess .all wasn't necessary in the first place. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- You can try the best you can If you try the best you can The best you can is good enough
Jun 09 2010