www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Container hierarchy vs. container types

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hi everyone,


I'm on a trip to Romania visiting relatives. Being disconnected for a 
while, I thought some about container types and started writing some code.

It didn't take long to figure that a classic class hierarchy with 
inheritance etc. would contain prohibitively many types (with awkward 
names too), or would lose a lot of expressiveness. There are just so 
many types of containers, and they have so different characteristics.

My second shot was with a capability-based architecture suggested (in an 
unrelated context) by Scott Meyers 
(http://www.artima.com/cppsource/codefeatures.html - good article, 
recommended regardless). Some capabilities I came up with were:

enum ContCaps : uint
{
     hasLength = 1,
         hasAssignableLength = 2,
         hasPushBack = 4,
         hasRemoveBack = 8,
         hasPushFront = 16,
         hasRemoveFront = 32,
         linear = 64,
         hasFront = 128,
         hasBack = 256,
         randomAccess = 512,
}

A container would look like this:

interface Container(T, uint caps = 0) ...

The trick is that any container could contain more or less any free 
combination of capabilities. There may be a few combinations that don't 
quite make sense, but most do.

The capability-based design prescribes that a container with N 
capabilities inherits each and every container with N-1 capabilities 
obtained by chopping away one capability in turn. For example, a 
container with Container!(int, 19) would inherit Container!(int, 18), 
Container!(int, 17), and Container!(int, 3). That was a bit difficult to 
implement, but I pulled it off with a string mixin.

The problem is that the number of effective base interfaces grows 
exponentially with the number of capabilities. (That's a problem also 
experienced by Scott.) If I instantiated a Container with all 
capabilities, the compiler would hang forever. If I had 6-7 
capabilities, the code would compile after a while but would generate 
interfaces that have size 140KB-260KB excluding payload, which is 
prohibitive.

The joys of inheritance hierarchies!

The design doesn't even address an important aspect that I'm still 
mulling over - how do I model whether the container exposes lvalues or 
is completely encapsulated? For example a trie is unable to expose the 
lvalues of the arrays it holds because it distributes storage.

But wait, there's more.

Containers must expose ranges which face similar issues. Here are my 
incomplete range capabilities:

enum RangeCaps
{
     rvalueElem = 1,
         assignableElem = 2,
         swappableElem = 4,
         infinite = 8,
         noLength = 16,
}

These capabilities are orthogonal on the fundamental range kinds (input, 
forward, double-ended, random). So we're looking at not one thick 
hierarchy, but two!

I decided to let no assumption unchallenged and got back to the 
question: do we really need a container _hierarchy_? How much joy and 
usefulness do people derive from dynamically changing an array with a 
list under the same interface? Or a rb-tree vs. a binary tree vs. a 
left-leaning binary tree?

As far as I can tell such changes of containers are a design-time 
decision, and extremely rarely (never as far as I remember) a runtime 
decision to hide behind a binary interface. I'm starting to think that 
it's quite the other way around, and a Sapir-Whorf kind of thing: The 
fact that Java lacks symbol aliasing capability conditions the thinking 
that a hierarchy is the right thing to do, whereas the true solution is 
to design a flat federation of container types with name- and 
semantics-compatible primitives, to then use them directly or via 
design-time aliases.

Obviously I might be wrong in any number of ways. Switching containers 
dynamically might be very useful. The precision, flexibility, and beauty 
I'm aiming for may be useless. Container hierarchies may have some other 
awesomeness I'm failing to see. But for the time being I'm very 
seriously considering a simple design:

* Each specific container is an unconnected class (or struct, haven't 
decided yet - probably structs to accommodate reference-counted containers);

* Naming matches dictate semantic matches. There should be no false 
friends - if a container defines removeFront, that should mean the same 
thing as for other containers (i.e. O(1) removal of the first element).

This design follows and enriches STL's design even after we have, as a 
community, seen many hierarchy-based designs, and even though it looks 
like hierarchies have "won". The main improvements I want to bring to 
STL's design are (a) a better categorization of containers, (b) of 
course replace iterators with ranges, (c) either eliminate allocators or 
make them work.

Needless to say, I'd be very curious to hear other opinions in this 
matter. Please speak up - the future of D depends, in small part, on 
this too.


Andrei
Mar 04 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 As far as I can tell such changes of containers are a design-time 
 decision, and extremely rarely (never as far as I remember) a runtime 
 decision to hide behind a binary interface.

In your post there's lot of stuff to think about. I can see you are trying to innovate, so here's a comment about data structures of the future. In Gcc 4.5 they have added a profile mode: http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html It gives you simple suggestions, based on simple statistics collected at run-time, about data structures choice. Today data structures are mostly chosen at compile time by the programmer, but once the runtime has those statistics collected at run-time, you can think of feeding that data into the program itself at runtime, so data structures can adapt themselves. This can be useful for programs that run on servers, for minutes, hours or days, where they have to face different workloads as time passes. Bye, bearophile
Mar 04 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 As far as I can tell such changes of containers are a design-time 
 decision, and extremely rarely (never as far as I remember) a runtime 
 decision to hide behind a binary interface.

In your post there's lot of stuff to think about. I can see you are trying to innovate, so here's a comment about data structures of the future. In Gcc 4.5 they have added a profile mode: http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html It gives you simple suggestions, based on simple statistics collected at run-time, about data structures choice. Today data structures are mostly chosen at compile time by the programmer, but once the runtime has those statistics collected at run-time, you can think of feeding that data into the program itself at runtime, so data structures can adapt themselves. This can be useful for programs that run on servers, for minutes, hours or days, where they have to face different workloads as time passes. Bye, bearophile

Good point, perfect. I agree with that, and I meant to massage that point within the message. A structure that's an e.g. set that adapts its strategy dynamically depending on load is a perfect example. I still don't think that's the job of a hierarchy - it's the encapsulation of an algebraic type together with a strategy for choosing the type currently used. Apologies for the dense message. I wanted to concentrate as much info in as little time/space as possible. Andrei
Mar 04 2010
next sibling parent reply Sandwich <bigsandwich gmail.com> writes:
/*Me ends lurking*/

As far as allocators go, you may want to take a look at how eastl handles them:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html#eastl_allocator

/*Resumes Lurking*/

Andrei Alexandrescu Wrote:

 bearophile wrote:
 Andrei Alexandrescu:
 As far as I can tell such changes of containers are a design-time 
 decision, and extremely rarely (never as far as I remember) a runtime 
 decision to hide behind a binary interface.

In your post there's lot of stuff to think about. I can see you are trying to innovate, so here's a comment about data structures of the future. In Gcc 4.5 they have added a profile mode: http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html It gives you simple suggestions, based on simple statistics collected at run-time, about data structures choice. Today data structures are mostly chosen at compile time by the programmer, but once the runtime has those statistics collected at run-time, you can think of feeding that data into the program itself at runtime, so data structures can adapt themselves. This can be useful for programs that run on servers, for minutes, hours or days, where they have to face different workloads as time passes. Bye, bearophile

Good point, perfect. I agree with that, and I meant to massage that point within the message. A structure that's an e.g. set that adapts its strategy dynamically depending on load is a perfect example. I still don't think that's the job of a hierarchy - it's the encapsulation of an algebraic type together with a strategy for choosing the type currently used. Apologies for the dense message. I wanted to concentrate as much info in as little time/space as possible. Andrei

Mar 04 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sandwich wrote:
 /*Me ends lurking*/
 
 As far as allocators go, you may want to take a look at how eastl handles them:
 
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html#eastl_allocator
 
 /*Resumes Lurking*/

Thanks. Wonder how many other knowledgeable lurkers are out there :o). Andrei
Mar 04 2010
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Apologies for the dense message. I wanted to concentrate as much info in 
 as little time/space as possible.

Actually, thank you for making a concentrated message.
Mar 04 2010
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
[snip]

I like this idea. I've gotten used to using template constraints and  
ranges when coding which matches this strategy naturally. A lot of the  
capabilities seem to match ranges, but why did you use different names for  
some of them?
removeBack : popBack
removeFront: popFront
pushBack   : put
pushFront  : putFront?
Mar 04 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 [snip]
 
 I like this idea. I've gotten used to using template constraints and 
 ranges when coding which matches this strategy naturally. A lot of the 
 capabilities seem to match ranges, but why did you use different names 
 for some of them?
 removeBack : popBack
 removeFront: popFront
 pushBack   : put
 pushFront  : putFront?
 

It's intentional. You don't want to use a container as a range by mistake. Andrei
Mar 04 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-03-05 01:53:32 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Robert Jacques wrote:
 On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 [snip]
 
 I like this idea. I've gotten used to using template constraints and 
 ranges when coding which matches this strategy naturally. A lot of the 
 capabilities seem to match ranges, but why did you use different names 
 for some of them?
 removeBack : popBack
 removeFront: popFront
 pushBack   : put
 pushFront  : putFront?

It's intentional. You don't want to use a container as a range by mistake.

I agree that "put" has different semantics than "push" so the name should be different, but what's the difference in semantics between "remove" and "pop"? Both remove an element from the subject (range or container). I think they should share the same name. You say you don't want to mix ranges with containers. But a container somewhat has the same semantics as an input stream: if you remove/pop something form it, it's no longer there for anyone. You decided to implement input streams as ranges, but at the same time you avoid reusing the same interface for containers; isn't that a little contradictory? Why couldn't a container have the same interface as a range, only with the added ability to push elements into it? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Mar 05 2010
parent reply Jonathan M Davis <jmdavisProg gmail.com> writes:
 Why couldn't a container have the same interface as a range, only with
 the added ability to push elements into it?
 

Probably because it would become way too easy to accidentally use the container as a range, which in many cases would end up emptying it when passing it to one of the std algorithms. - Jonathan M Davis
Mar 05 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-03-05 11:32:27 -0500, Jonathan M Davis <jmdavisProg gmail.com> said:

 Why couldn't a container have the same interface as a range, only with
 the added ability to push elements into it?

Probably because it would become way too easy to accidentally use the container as a range, which in many cases would end up emptying it when passing it to one of the std algorithms.

Yes, I understand that. My point is that the problem already exists with ranges: by exposing a stream as a range, you'll empty the stream when iterating over it. If that's acceptable for streams, why is it not for containers? And perhaps you actually want to empty the container as you iterate over it. If you use your container as a queue, that's what you'll expect. In fact, a stream is just like a queue, and you want algorithms to work on streams don't you? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Mar 05 2010
next sibling parent Jonathan M Davis <jmdavisProg gmail.com> writes:
Michel Fortin wrote:

 On 2010-03-05 11:32:27 -0500, Jonathan M Davis <jmdavisProg gmail.com>
 said:
 
 Why couldn't a container have the same interface as a range, only with
 the added ability to push elements into it?

Probably because it would become way too easy to accidentally use the container as a range, which in many cases would end up emptying it when passing it to one of the std algorithms.

Yes, I understand that. My point is that the problem already exists with ranges: by exposing a stream as a range, you'll empty the stream when iterating over it. If that's acceptable for streams, why is it not for containers? And perhaps you actually want to empty the container as you iterate over it. If you use your container as a queue, that's what you'll expect. In fact, a stream is just like a queue, and you want algorithms to work on streams don't you?

It would probably be better in such a case to have a range wrap the container. That way, you can have range semantics for the container if you want them, but you don't risk using them on a container where you don't. As for streams, I don't know. I do like the idea of using ranges with streams, but I haven't really looked at what the downsides to that might be. It depends a lot on what the std algorithms do. With some it might be fine while with others it would be a really bad idea. I'd have to look into it more to say, but at least with a stream, it's expected that iterating over it will alter it. With a container, that's not usually the case. And in cases where it is, either that container could be a special case and be a range as well, or it you could have a wrapper range do it. Personally, I'd prefer the wrapper idea. It would probably allow you to create generic wrappers to use on pretty much any container (or at least containers with a particular set of functions) rather than special casing particular containers. Regardless, I think that making containers ranges in the general case is a bad idea. - Jonathan M Davis
Mar 05 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2010-03-05 11:32:27 -0500, Jonathan M Davis <jmdavisProg gmail.com> 
 said:
 
 Why couldn't a container have the same interface as a range, only with
 the added ability to push elements into it?

Probably because it would become way too easy to accidentally use the container as a range, which in many cases would end up emptying it when passing it to one of the std algorithms.

Yes, I understand that. My point is that the problem already exists with ranges: by exposing a stream as a range, you'll empty the stream when iterating over it. If that's acceptable for streams, why is it not for containers? And perhaps you actually want to empty the container as you iterate over it. If you use your container as a queue, that's what you'll expect. In fact, a stream is just like a queue, and you want algorithms to work on streams don't you?

A range is a view on a container. As such, popping off a range does not affect the topology of the underlying container. Consider, for contrast, a balanced binary tree. Removing an element triggers a rebalance of the tree, but popping an element off an iterator in that tree does not affect the tree itself. Fair enough? Andrei
Mar 05 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-03-05 16:09:08 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 A range is a view on a container. As such, popping off a range does not 
 affect the topology of the underlying container. Consider, for 
 contrast, a balanced binary tree. Removing an element triggers a 
 rebalance of the tree, but popping an element off an iterator in that 
 tree does not affect the tree itself.
 
 Fair enough?

That's a fine definition, except for the word "topology". Whether it changes the topology should be an implementation detail. Popping of a C++ vector or a hash table doesn't change the topology, will you make them ranges? That doesn't make much sense. Also, do you consider that popping off a stream affects its topology? If your stream is buffered, you're probably advancing a pointer through a buffer with 'pop', and deallocating old buffers from time to time. So this definition would forbid a stream from being a range. That doesn't make much sense. And similarly you could have a queue container where you insert things at the back and pop things from the front. But that's basically the same thing as a stream. Is a queue a container or a range? I think this definition is more on the spot: a range is something you consume, a container is something used to hold data. Those definition are not mutually exclusive, but is this really a problem? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Mar 05 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2010-03-05 16:09:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 A range is a view on a container. As such, popping off a range does 
 not affect the topology of the underlying container. Consider, for 
 contrast, a balanced binary tree. Removing an element triggers a 
 rebalance of the tree, but popping an element off an iterator in that 
 tree does not affect the tree itself.

 Fair enough?

That's a fine definition, except for the word "topology". Whether it changes the topology should be an implementation detail. Popping of a C++ vector or a hash table doesn't change the topology, will you make them ranges? That doesn't make much sense.

It's essential. The way ranges work, they can never change the topology of the collection they operate on. That is a fundamental aspect. A container operation may or may not affect its topology.
 Also, do you consider that popping off a stream affects its topology?

A stream has no topology I can think of. Within this context, I define topology loosely as the way the data of a container is stored and interconnected in memory.
 If 
 your stream is buffered, you're probably advancing a pointer through a 
 buffer with 'pop', and deallocating old buffers from time to time. So 
 this definition would forbid a stream from being a range. That doesn't 
 make much sense.
 
 And similarly you could have a queue container where you insert things 
 at the back and pop things from the front. But that's basically the same 
 thing as a stream. Is a queue a container or a range?
 
 I think this definition is more on the spot: a range is something you 
 consume, a container is something used to hold data. Those definition 
 are not mutually exclusive, but is this really a problem?

I think it is. For starters, copying a forward range copies its state, whereas copying a container and then removing stuff from it results in two containers with less net state. By and large, clearly there are important distinctions between ranges and containers. My perception is that you do have that understanding, but are doing a Plato on me. I'm a bit tired of having a Plato done on me, particularly in this case when I don't have the answers. Would be great to strengthen the weak definitions we have now instead of challenging them. Andrei
Mar 05 2010
prev sibling next sibling parent Jerry Quinn <jlquinn optonline.net> writes:
Andrei Alexandrescu Wrote:

 I decided to let no assumption unchallenged and got back to the 
 question: do we really need a container _hierarchy_? How much joy and 
 usefulness do people derive from dynamically changing an array with a 
 list under the same interface? Or a rb-tree vs. a binary tree vs. a 
 left-leaning binary tree?

 As far as I can tell such changes of containers are a design-time 
 decision, and extremely rarely (never as far as I remember) a runtime 
 decision to hide behind a binary interface. I'm starting to think that 
 it's quite the other way around, and a Sapir-Whorf kind of thing: The 
 fact that Java lacks symbol aliasing capability conditions the thinking 
 that a hierarchy is the right thing to do, whereas the true solution is 
 to design a flat federation of container types with name- and 
 semantics-compatible primitives, to then use them directly or via 
 design-time aliases.

I've rarely found it helpful that an ArrayList is a List, or that HashMap is a Map. If an API I'm passing to takes the base class, it could easily have been done as a template. And dynamic switching hasn't ever been useful to me. As long as the API is the same, recompiling will work and the containers will continue to behave well. If dynamic switching is really necessary for an application, it can be done by wrapping the container with a class hierarchy as needed.
 Obviously I might be wrong in any number of ways. Switching containers 
 dynamically might be very useful. The precision, flexibility, and beauty 
 I'm aiming for may be useless. Container hierarchies may have some other 
 awesomeness I'm failing to see. But for the time being I'm very 
 seriously considering a simple design:
 
 * Each specific container is an unconnected class (or struct, haven't 
 decided yet - probably structs to accommodate reference-counted containers);

Also, structs would reduce the indirections required to access the data, when placing them inside other aggregates as well as reduce memory allocation traffic. I'd rather not pay for 2 allocations to create an instance of: class C { vector!(int) v; }
 * Naming matches dictate semantic matches. There should be no false 
 friends - if a container defines removeFront, that should mean the same 
 thing as for other containers (i.e. O(1) removal of the first element).

This I would hesitate a bit on. I recently changed STL maps for custom hashmaps and not changing operator[] for different code is useful. I'm changing the computational complexity of an operation but conceptually it's the same kind of access. I'm doing so because I know that for my application, the tradeoffs elsewhere are acceptable.
 This design follows and enriches STL's design even after we have, as a 
 community, seen many hierarchy-based designs, and even though it looks 
 like hierarchies have "won". The main improvements I want to bring to 
 STL's design are (a) a better categorization of containers, (b) of 
 course replace iterators with ranges, (c) either eliminate allocators or 
 make them work.
 
 Needless to say, I'd be very curious to hear other opinions in this 
 matter. Please speak up - the future of D depends, in small part, on 
 this too.

The concept of allocators is nice in principal, though I've never had to put them into action. A couple of places I could see having used STL allocators: wrapping JNI memory or GC memory in an STL container. Neither is particularly necessary in D. Another place I could envision an allocator is if I load a large static data structure with mmap and want to wrap an STL structure around it. I've wanted to do this in particular with strings. This is where C++ allocators fall down at least. Trying to mix custom allocated data structures with standard ones is going to be painful. Jerry
Mar 04 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmail.com> writes:
I'd definitely go with the unconnected class solution. Sometimes it's useful 
to be able to have a class hierarchy where you don't have to care whether 
about things like whether a Map is a HashMap or a SortedMap, but it 
frequently comes back to bite you - especially when they require that 
different operations be implemented by the elements they hold. It can easily 
lead to an inefficient use of containers - especially after changing one 
type for another. Additionally, from a performance perspective, by avoiding 
the class hierarchy, it's possible for them to be final and allow inlining 
and such. So, there are performance gains by avoiding the class hiercharchy.

Also, I recall Scott Meyers talking in Effective STL how containers aren't 
truly interchangeable and how you can get yourself into trouble when you try 
and swap one out for another. You really should be aware of what container 
type your using and what it's strengths and weaknesses are.

What it generally comes down to is that you want containers to have the same 
functions when they do the same thing, so that it's possible to swap one for 
another when you want to. The same API is generally enough for that. The 
fact that D has auto makes it that much easier. Being able to swap out 
vector and list or ArrayList and LinkedList can be useful in some 
circumstances, but having a List which could be either (or yet another 
unknown type) is not all that useful, generally speaking. And, as I said 
before, it makes it really easy to use such containers in inefficient ways 
since the performance of each operation of each operation can differ 
significantly depending on the type. It also leads to using only the 
functions which are the lowest common denominator, which means that the 
containers are not used to their full potential and are probably being used 
less efficiently.

If you have a set of unconnected container classes and someone wants a 
hierarchy of containers for their project, it would be quite simple to set 
up a set of wrapper classes for the containers that they want to be in a 
hierarchy. On the other hand, you can't take a class hierarchy and use a 
wrapper class to turn it into a flat set of unconnected containers.

I say that it makes sense for functions with the same behavior to have the 
same names, so that you can effectively duck type the containers, but I 
don't see much benefit in putting them in a hierarchy where you supposedly 
don't have to care about what the actual container is that you're using as 
long as it implements a particular interface.

In any case, I'm glad to see that you're finally working on the container 
problem. I think that it's easily the thing that phobos lacks most, and it 
would definitely be easier to get people to use D, if there were containers. 
Most programmers I know would probably not touch D as long as there are no 
standard containers. It'll be good once we have them. And hopefully, they'll 
be awesome.

- Jonathan M Davis
Mar 04 2010
prev sibling next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Hi everyone,
 I'm on a trip to Romania visiting relatives. Being disconnected for a
 while, I thought some about container types and started writing some code.
 It didn't take long to figure that a classic class hierarchy with
 inheritance etc. would contain prohibitively many types (with awkward
 names too), or would lose a lot of expressiveness. There are just so
 many types of containers, and they have so different characteristics.
 My second shot was with a capability-based architecture suggested (in an
 unrelated context) by Scott Meyers
 (http://www.artima.com/cppsource/codefeatures.html - good article,
 recommended regardless). Some capabilities I came up with were:
 enum ContCaps : uint
 {
      hasLength = 1,
          hasAssignableLength = 2,
          hasPushBack = 4,
          hasRemoveBack = 8,
          hasPushFront = 16,
          hasRemoveFront = 32,
          linear = 64,
          hasFront = 128,
          hasBack = 256,
          randomAccess = 512,
 }

This is why I hate nominative typing (http://en.wikipedia.org/wiki/Nominative_type_system).
Mar 04 2010
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 I decided to let no assumption unchallenged and got back to the 
 question: do we really need a container _hierarchy_? How much joy and 
 usefulness do people derive from dynamically changing an array with a 
 list under the same interface? Or a rb-tree vs. a binary tree vs. a 
 left-leaning binary tree?

I've never, in 35 years of programming, ever had a need to change the container type at runtime. I don't think the thought has ever even crossed my mind. I appreciate bearophile's point of runtime profiling being used to change the container, but I suggest that such containers be internally self-modifying, rather than externally using user-visible virtual functions. I more and more like the idea of having a set of names defined by convention that have conventional behaviors, in other words, duck typing. It's infinitely extendable and flexible without needing to modify the core library in any way. It also fits right in with D's template constraints. The bit flag approach is not pretty. Furthermore, to add a capability means one has to redo that enum in the core library, which is not scalable and people don't like making their own custom core libraries. Even if they did, it makes for mutually incompatible third party libraries. Yuk.
Mar 04 2010
prev sibling next sibling parent reply BLS <windevguy hotmail.de> writes:
On 05/03/2010 00:22, Andrei Alexandrescu wrote:
 I want to bring to STL's design are (a) a better categorization of
 containers, (b) of course replace iterators with ranges, (c) either
 eliminate allocators or make them work.

I would like to see : collection update events. The C# C5 library is (afaik) the only container library which implements this very useful feature. (example a database informs connected users about relevant changes to force update the GUI, no need to pull just in case every 5 seconds) Beside, Implementing update events could be a good reason to give std.pattern a new try... http://www.itu.dk/research/c5/
 Please speak up - the future of D depends, in small part, on
 this too.
 Andrei

I think it is a big win for the D2 community to have a container library. I.E. porting C++ will be more comfortable. Bjoern
Mar 05 2010
parent reply Jacob Carlborg <doob me.com> writes:
On 3/5/10 10:44, BLS wrote:
 On 05/03/2010 00:22, Andrei Alexandrescu wrote:
 I want to bring to STL's design are (a) a better categorization of
 containers, (b) of course replace iterators with ranges, (c) either
 eliminate allocators or make them work.

I would like to see : collection update events. The C# C5 library is (afaik) the only container library which implements this very useful feature. (example a database informs connected users about relevant changes to force update the GUI, no need to pull just in case every 5 seconds)

That looks really nice to have.
 Beside, Implementing update events could be a good reason to give
 std.pattern a new try...

 http://www.itu.dk/research/c5/

 Please speak up - the future of D depends, in small part, on
 this too.
 Andrei

I think it is a big win for the D2 community to have a container library. I.E. porting C++ will be more comfortable. Bjoern

Mar 05 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
== Quote from Jacob Carlborg (doob me.com)'s article
 On 3/5/10 10:44, BLS wrote:
 On 05/03/2010 00:22, Andrei Alexandrescu wrote:
 I want to bring to STL's design are (a) a better categorization of
 containers, (b) of course replace iterators with ranges, (c) either
 eliminate allocators or make them work.

I would like to see : collection update events. The C# C5 library is (afaik) the only container library which implements this very useful feature. (example a database informs connected users about relevant changes to force update the GUI, no need to pull just in case every 5 seconds)


It's a nice touch. I hope at least to allow that kind of customization from the outside. Andrei
Mar 05 2010
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Needless to say, I'd be very curious to hear other opinions in this  
 matter. Please speak up - the future of D depends, in small part, on  
 this too.

As you know, you and I disagree on some aspects of containers. However, I think our philosophies are converging. Having built a container library, I can tell you the one and only reason I made it have an interface hierarchy -- Tango. Tango had a container library, which was based on ancient Doug Lea collections, with some added features. Because I intended dcollections to replace Tango's collections, I tried to encompass the same feature set that it had. One of those features was the ability to use interfaces without knowing the implementation. Since then, Tango has replaced their container collection with something different, and guess what -- no more interface hierarchy. They have a single interface ICollection, and all containers implement that interface. No Map interface, or List interface or anything. However, although I think generic containers can avoid *requiring* an interface, if you design your classes well, slapping on an interface costs almost nothing. It depends on whether you wish to use classes or structs to implement containers. I still think classes are better, because modifying one aspect of a class is easy to do. If someone wants to make a new type of HashMap that does everything my HashMap does, except changes one little bit, it's really easy. With the advent of alias this, it's also possible to do something similar with structs, but not as straightforward, and not without recompiling. Also, passing containers around by value by default is one of the aspects of STL that I think sucks. When working with STL, I almost never passed around a container by value, I always used a reference, because passing by value can incur large hidden-allocation costs. I'll go over a quick set of points that are pro-interface. First, using an interface hides the implementation. It may not be possible to have your code on display for the compiler to use. Using an interface is a perfectly acceptable way to hide proprietary code that you cannot legally divulge. This is probably the weakest of the points, but I put it out there. Second, D is a statically compiled language, but with the (hopefully soon) evolution to dynamic linking, using an interface is ideal. If you for instance wish to pass a map to or from a plugin library, using an interface is probably the best way to do it. Interfaces are less susceptible to implementation changes/differences. Third, code that uses an interface is compiled once per interface. Code that uses duck typing is compiled once per set of arguments. While this might not seem like much, it can reduce the footprint of generated code. Using duck typing, you may have two almost identical generated functions that differ only by the function addresses used. Finally, interfaces simplify understanding. Once you have used an interface, you know "oh yeah, this is a map, so I can use it like a map." You can strive to build a container library that follows those principles, even making assertions that force the compiler to prove those principles, but it's not as easy for a person to understand as it is to look at interface documentation and know what it does. This becomes important when using libraries that use special implementations of containers. Like for instance a database result or an XML tree. Interfaces in other languages can be viewed as advantageous in other ways, but D has advanced compile-time interfaces so far that those don't really matter in D. For example, declaring that a function requires a map container can be done with duck typing via conditional compilation. At the same time, just like I think ranges don't fit every model (*cough* I/O), interfaces aren't the answer to every aspect of containers. I don't think ranges fit well with interfaces, because iterating interface ranges prevents inlining -- the major draw of ranges in the first place -- and ranges are so much more useful with value semantics. I also think functions that can be tuned to each implementation should be. For this reason, dcollections containers provide a lot of functionality that is not included in the interfaces, simply because the functions are so specific to the implementation, it would be the only class that implemented that interface. For example, all the functions that return cursors (and soon ranges) are not interface functions. This doesn't make them useless via interfaces, but you cannot use every aspect of the container via an interface. An interface is like a common denominator, and I think it should be useful for some purposes. If nobody will ever use the interface as a parameter to a function, then there is no point in declaring the interface (I realize that I have created such interfaces in dcollections and I plan to correct that -- one nice benefit of the contemplation triggered by this discussion). I am working on updating dcollections as we post, and I think I have come up with a nifty way to do ranges that will both retain the usefulness of the cursor, and provide a common way to plug the collections into std.algorithm. Good luck with your containers, I still hold out hope that dcollections can be integrated in Phobos, but I probably need to get it working in order to have any chance of competing :) -Steve
Mar 05 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Needless to say, I'd be very curious to hear other opinions in this 
 matter. Please speak up - the future of D depends, in small part, on 
 this too.

As you know, you and I disagree on some aspects of containers. However, I think our philosophies are converging.

To summarize my understanding of your points: (a) When you only want to tweak a container's behavior in small ways, inheritance is great. (b) Pass by reference is more natural for containers than pass by value. (c) Proprietary code hiding is better with interfaces. (d) Dynamically-linked code works best with interfaces. (e) Interface-based code requires less recompilation and may save on generated code. (f) Interfaces simplify understanding. At the same time, you argue that ranges aren't fit as interfaces because: (a) Inlinining is difficult/impossible (b) Value semantics is more natural for ranges (c) I didn't understand this because you changed the subject from ranges back to containers:
 I also think functions that can be tuned to each 
 implementation should be.  For this reason, dcollections containers 
 provide a lot of functionality that is not included in the interfaces, 
 simply because the functions are so specific to the implementation, it 
 would be the only class that implemented that interface.  For example, 
 all the functions that return cursors (and soon ranges) are not 
 interface functions.  This doesn't make them useless via interfaces, but 
 you cannot use every aspect of the container via an interface.  An 
 interface is like a common denominator, and I think it should be useful 
 for some purposes.  If nobody will ever use the interface as a parameter 
 to a function, then there is no point in declaring the interface (I 
 realize that I have created such interfaces in dcollections and I plan 
 to correct that -- one nice benefit of the contemplation triggered by 
 this discussion).

One other thing I'm worried about, and was an absolute pain in my preliminary tests of container implementations, is this issue of "iterators/ranges/cursors are not part of the container interface". It is absolutely fundamental that you can get an abstract range from an abstract container because most of what you can do to a container you must do with ranges. Otherwise the duplication of functionality is horrid. This failure alone is a reason to abandon the container hierarchy design.
 I am working on updating dcollections as we post, and I think I have 
 come up with a nifty way to do ranges that will both retain the 
 usefulness of the cursor, and provide a common way to plug the 
 collections into std.algorithm.
 
 Good luck with your containers, I still hold out hope that dcollections 
 can be integrated in Phobos, but I probably need to get it working in 
 order to have any chance of competing :)

Good luck to you too. As I specified in my private message - if you undo the fundamental mistake of making find()/contains() part of the container (and of course all other disastrous mistakes of that kind), then I'm sure dcollections will be a very solid library. Andrei
Mar 05 2010
next sibling parent Jonathan M Davis <jmdavisProg gmail.com> writes:
Andrei Alexandrescu wrote:


 (b) Pass by reference is more natural for containers than pass by value.

I concur with this. It rarely makes sense to outright copy containers, and when you do, you're explicitly looking to do so. If a container were a struct, then returning it from a function would be expensive. And it could get really messy trying to have one container as an element in another - particularly if you can't get a reference to the element in the container. You'd have to copy it out of the container, modify it, and copy it back in again. That's way too much unnecessary copying. And it would be easy to think that after you'd changed the element after getting it out of the container that it had changed the one in the container when you hadn't. Container!(Container!(E)) cont; ... auto element = cont.get(x); element.add(someValue); //With a class, the element in the container would now be altered. vs Container!(Container!(E)) cont; ... auto element = cont.get(x); //this involves a copy element.add(someValue); cont.set(x, element); //this involves a copy Now, by getting at the elements of a container by reference, that reduces the problem, but generally speaking, it makes no sense to be copying containers, and it's expensive to do so. I'd vote to just make them final classes. You gain the efficiency of inlining and have reference semantics which is really what typically makes sense for containers. If you want it copied, you can clone it or construct a new one with the elements of the first one. I really don't see what would be gained by making a container a struct. - Jonathan M Davis
Mar 05 2010
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Fri, 05 Mar 2010 11:53:47 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Needless to say, I'd be very curious to hear other opinions in this 
 matter. Please speak up - the future of D depends, in small part, on 
 this too.

However, I think our philosophies are converging.

To summarize my understanding of your points: (a) When you only want to tweak a container's behavior in small ways, inheritance is great. (b) Pass by reference is more natural for containers than pass by value. (c) Proprietary code hiding is better with interfaces. (d) Dynamically-linked code works best with interfaces. (e) Interface-based code requires less recompilation and may save on generated code. (f) Interfaces simplify understanding. At the same time, you argue that ranges aren't fit as interfaces because: (a) Inlinining is difficult/impossible (b) Value semantics is more natural for ranges

A range is not a container. A range is a small reference-like construct that points to a containers elements. I stand by what I said, a container is more natural as a reference, ranges are more natural with value semantics.

I wasn't debating the point, just organizing ideas.
 If I pass an abstract range into something like std.algorithm.sort, the 
 performance is going to be much worse than using a value-type range.  
 The implementation has the option of doing the latter, so why not make 
 it an interface function?  In addition, std.algorithm.sort is going to 
 expect value-type ranges, so you have to make copies everywhere, 
 incurring more performance hits.

I agree. Interface ranges are unfortunately a poor practical proposition.
 One other thing I'm worried about, and was an absolute pain in my 
 preliminary tests of container implementations, is this issue of 
 "iterators/ranges/cursors are not part of the container interface". It 
 is absolutely fundamental that you can get an abstract range from an 
 abstract container because most of what you can do to a container you 
 must do with ranges. Otherwise the duplication of functionality is 
 horrid. This failure alone is a reason to abandon the container 
 hierarchy design.

It depends on what you want to do with the container. std.algorithm is not the only reason to have containers.

It's not the only reason, but it is the primordial reason - just like "transportation" is a reason for buying a car. It would be a failure to define containers that can't work with std.algorithm.
 Duplication of functionality is unavoidable if you do not have abstract 
 ranges.  However, by "duplication" it most likely means forwarding calls 
 to range-accepting functions.  For example, a List interface may have a 
 sort function.  The implementation can forward to std.algorithm.sort if 
 the list is an array, or to an internal merge sort if it is a linked 
 list (in fact, this is similar to what I do now in dcollections).  I 
 don't really know how you would do this without knowing one is a linked 
 list and one isn't.

The sort example is good, but I wonder how many of those are out there.
 It's possible to take a flat non-interfaced collection library and build 
 an interface on top of it, I don't see why it detracts from the original 
 non-interface API.

Perfect. So I take it we can focus on the federation of containers approach? Andrei
Mar 05 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 05 Mar 2010 11:53:47 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Needless to say, I'd be very curious to hear other opinions in this  
 matter. Please speak up - the future of D depends, in small part, on  
 this too.

However, I think our philosophies are converging.

To summarize my understanding of your points: (a) When you only want to tweak a container's behavior in small ways, inheritance is great. (b) Pass by reference is more natural for containers than pass by value. (c) Proprietary code hiding is better with interfaces. (d) Dynamically-linked code works best with interfaces. (e) Interface-based code requires less recompilation and may save on generated code. (f) Interfaces simplify understanding. At the same time, you argue that ranges aren't fit as interfaces because: (a) Inlinining is difficult/impossible (b) Value semantics is more natural for ranges

A range is not a container. A range is a small reference-like construct that points to a containers elements. I stand by what I said, a container is more natural as a reference, ranges are more natural with value semantics. If I pass an abstract range into something like std.algorithm.sort, the performance is going to be much worse than using a value-type range. The implementation has the option of doing the latter, so why not make it an interface function? In addition, std.algorithm.sort is going to expect value-type ranges, so you have to make copies everywhere, incurring more performance hits.
 One other thing I'm worried about, and was an absolute pain in my  
 preliminary tests of container implementations, is this issue of  
 "iterators/ranges/cursors are not part of the container interface". It  
 is absolutely fundamental that you can get an abstract range from an  
 abstract container because most of what you can do to a container you  
 must do with ranges. Otherwise the duplication of functionality is  
 horrid. This failure alone is a reason to abandon the container  
 hierarchy design.

It depends on what you want to do with the container. std.algorithm is not the only reason to have containers. Duplication of functionality is unavoidable if you do not have abstract ranges. However, by "duplication" it most likely means forwarding calls to range-accepting functions. For example, a List interface may have a sort function. The implementation can forward to std.algorithm.sort if the list is an array, or to an internal merge sort if it is a linked list (in fact, this is similar to what I do now in dcollections). I don't really know how you would do this without knowing one is a linked list and one isn't. It's possible to take a flat non-interfaced collection library and build an interface on top of it, I don't see why it detracts from the original non-interface API. -Steve
Mar 05 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 05 Mar 2010 16:16:15 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
  It depends on what you want to do with the container.  std.algorithm  
 is not the only reason to have containers.

It's not the only reason, but it is the primordial reason - just like "transportation" is a reason for buying a car. It would be a failure to define containers that can't work with std.algorithm.

I disagree. I generally use containers for collecting items for later lookup. That's generally a container-specific function that can be part of an interface. Other functions I usually use are iteration, which again can be part of the interface. Besides, making containers have interfaces does not preclude them from using std.algorithm. I agree std.algorithm is definitely a reason to have containers, but that's not mutually exclusive with interfaces. -Steve
Mar 05 2010