www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Decision on container design

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Today after work I plan to start making one pass through std.container. 
After having thought of things for a long time, my conclusions are as 
follows:

1. Containers will be classes.

2. Most of the methods in existing containers will be final. It's up to 
the container to make a method final or not.

3. Containers and their ranges decide whether they give away references 
to their objects. Sealing is a great idea but it makes everybody's life 
too complicated. I'll defer sealing to future improvements in the 
language and/or the reflection subsystem.

4. Containers will assume that objects are cheap to copy so they won't 
worry about moving primitives.

Any showstoppers, please share.


Andrei
Jan 28 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei:

As far as I understand the implications, I agree with your conclusions.
Other people will be free to create an alternative D2 library like this one:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
but a standard library can't be too much hard to use.

Bye,
bearophile
Jan 28 2011
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2011-01-28 13:31:58 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Today after work I plan to start making one pass through std.container. 
 After having thought of things for a long time, my conclusions are as 
 follows:
 
 1. Containers will be classes.
 
 2. Most of the methods in existing containers will be final. It's up to 
 the container to make a method final or not.
 
 3. Containers and their ranges decide whether they give away references 
 to their objects. Sealing is a great idea but it makes everybody's life 
 too complicated. I'll defer sealing to future improvements in the 
 language and/or the reflection subsystem.
 
 4. Containers will assume that objects are cheap to copy so they won't 
 worry about moving primitives.
 
 Any showstoppers, please share.

Not my preferred choices (especially #1), but having containers in Phobos will certainly be an improvement over not having them. So go ahead! About #4, it'd be nice to have the containers use move semantics when possible even if they fallback to (cheap) copy semantic when move isn't available. That way, if you have a type which is moveable but not copyable you can still put it in a container. Does that makes sense? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 28 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/28/11 3:05 PM, Michel Fortin wrote:
 On 2011-01-28 13:31:58 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 Today after work I plan to start making one pass through
 std.container. After having thought of things for a long time, my
 conclusions are as follows:

 1. Containers will be classes.

 2. Most of the methods in existing containers will be final. It's up
 to the container to make a method final or not.

 3. Containers and their ranges decide whether they give away
 references to their objects. Sealing is a great idea but it makes
 everybody's life too complicated. I'll defer sealing to future
 improvements in the language and/or the reflection subsystem.

 4. Containers will assume that objects are cheap to copy so they won't
 worry about moving primitives.

 Any showstoppers, please share.

Not my preferred choices (especially #1), but having containers in Phobos will certainly be an improvement over not having them. So go ahead!

Well if you brought forth some strong argument I'm all ears. What I see for now is that struct containers are just difficult to implement and need to have carefully explained semantics, whereas a lot of people know how classes behave from day one.
 About #4, it'd be nice to have the containers use move semantics when
 possible even if they fallback to (cheap) copy semantic when move isn't
 available. That way, if you have a type which is moveable but not
 copyable you can still put it in a container. Does that makes sense?

That's what I did up until now. It is tantamount to defining a bunch of methods (aliases or not) that add to the interface that the user must absorb, but that are seldom useful. It just seems that the entire move paraphernalia doesn't lift its weight. Andrei
Jan 28 2011
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2011-01-28 17:09:08 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 1/28/11 3:05 PM, Michel Fortin wrote:
 Not my preferred choices (especially #1), but having containers in
 Phobos will certainly be an improvement over not having them. So go ahead!

Well if you brought forth some strong argument I'm all ears. What I see for now is that struct containers are just difficult to implement and need to have carefully explained semantics, whereas a lot of people know how classes behave from day one.

We already argument this over and over in the past. First, I totally acknowledge that C++ style containers have a problem: they make it easier to copy the content than pass it by reference. On the other side of the spectrum, I think that class semantics makes it too easy to have null dereferences, it's easy to get lost when you have a container of containers. I have some experience with containers having class-style semantics: in Objective-C, I ended up creating a set of macro-like functions which I use to initialize containers whenever I use them in case they are null. And I had to do more of these utility functions to handle a particular data structure of mine which is a dictionary of arrays of objects. In C++, I'd have declared this as a "map< string, vector< Object > >" and be done with it; no need for special care initializing each vector, so much easier than in Objective-C. I agree that defining structs to have reference semantics as you have done is complicated. But I like the lazy initialization, and we have a precedent for that with AAs (ideally, AAs would be a compatible container too). Can't we just use the GC instead of reference counting? I'd make things much easier. Here is a implementation: struct Container { struct Impl { ... } private Impl* _impl; ref Impl impl() property { if (!impl) impl = new Impl; return *impl; } alias impl this; } I also believe reference semantics are not to be used everywhere, even though they're good most of the time. I'd like to have a way to bypass it and get a value-semantic container. With the above, it's easy as long as you keep Container.Impl public: void main() { Container lazyHeapAllocatedContainer; Container.Impl stackAllocatedContainer; } void MyObject { Container.Impl listOfObjects; }
 About #4, it'd be nice to have the containers use move semantics when
 possible even if they fallback to (cheap) copy semantic when move isn't
 available. That way, if you have a type which is moveable but not
 copyable you can still put it in a container. Does that makes sense?

That's what I did up until now. It is tantamount to defining a bunch of methods (aliases or not) that add to the interface that the user must absorb, but that are seldom useful. It just seems that the entire move paraphernalia doesn't lift its weight.

But could we limit this to say that only containers that can return elements by ref? Perhaps that won't help. You know the problem better than me, I don't really have anything more to say. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 28 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:

 On the other side 
 of the spectrum, I think that class semantics makes it too easy to have 
 null dereferences,

That's why I have asked for a suffix syntax+semantics for notnull reference types in D :-)
 I agree that defining structs to have reference semantics as you have 
 done is complicated. But I like the lazy initialization, and we have a 
 precedent for that with AAs

Built-in AAs are currently broken and in need to be fixed: import std.stdio: writeln; void foo(int[int] aa, int n) { aa[n] = n; } void main() { int[int] a; foo(a, 0); writeln(a); a[1] = 1; foo(a, 2); writeln(a); } Bye, bearophile
Jan 28 2011
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden gmail.com> said:

 Unfortunately, this design has big issues:
 
 
 void fill(Appender appender)
 {
      appender.put("hello");
      appender.put("world");
 }
 
 void test()
 {
      Appender<string> appender;
      fill(appender); // Appender is supposed to have reference semantics
      assert(appender.length != 0); // fails!
 }
 
 Asserting above fails because at the time you pass appender object to 
 the  fill method it isn't initialized yet (lazy initialization). As 
 such, a  null is passed, creating an instance at first appending, but 
 the result  isn't seen to the caller.

That's indeed a problem. I don't think it's a fatal flaw however, given that the idiom already exists in AAs. That said, the nice thing about my proposal is that you can easily reuse the Impl to create a new container to build a new container wrapper with the semantics you like with no loss of efficiency. As for the case of Appender... personally in the case above I'd be tempted to use Appender.Impl directly (value semantics) and make fill take a 'ref'. There's no point in having an extra heap allocation, especially if you're calling test() in a loop or if there's a good chance fill() has nothing to append to it. That's the issue with containers. The optimal semantics always change depending on the use case.
 An explicit initialization is needed to work around this design issue. 
 The  worst thing is that in many cases it would work fine (you might 
 have  already initialized it indirectly) but sometimes you get 
 unexpected  result. I got hit by this in past, and it wasn't easy to 
 trace down.
 
 As such, I strongly believe containers either need to have copy 
 semantics,  or be classes. However, copy semantics contradicts with the 
 "cheap copy  ctor" idiom because you need to copy all the elements from 
 source  container.

Personally, I'm really concerned by the case where you have a container of containers. Class semantics make things really complicated as you always have to initialize everything in the container explicitly; value semantics makes things semantically easier but quite inefficient as moving elements inside of the outermost container implies copying the containers. Making containers auto-initialize themselves on first use solves the case where containers are references-types; making containers capable of using move semantics solves the problem for value-type containers. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 28 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/28/11 8:12 PM, Michel Fortin wrote:
 On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden gmail.com> said:

 Unfortunately, this design has big issues:


 void fill(Appender appender)
 {
 appender.put("hello");
 appender.put("world");
 }

 void test()
 {
 Appender<string> appender;
 fill(appender); // Appender is supposed to have reference semantics
 assert(appender.length != 0); // fails!
 }

 Asserting above fails because at the time you pass appender object to
 the fill method it isn't initialized yet (lazy initialization). As
 such, a null is passed, creating an instance at first appending, but
 the result isn't seen to the caller.

That's indeed a problem. I don't think it's a fatal flaw however, given that the idiom already exists in AAs. That said, the nice thing about my proposal is that you can easily reuse the Impl to create a new container to build a new container wrapper with the semantics you like with no loss of efficiency. As for the case of Appender... personally in the case above I'd be tempted to use Appender.Impl directly (value semantics) and make fill take a 'ref'. There's no point in having an extra heap allocation, especially if you're calling test() in a loop or if there's a good chance fill() has nothing to append to it. That's the issue with containers. The optimal semantics always change depending on the use case.

Yep, yep, I found myself wrestling with the same issues. All good points. On one hand containers are a target for optimization because many will use them. On the other hand you'd want to have reasonably simple and idiomatic code in the container implementation because you want people to understand them easily and also to write their own. I thought for a while of a layered approach in which you'd have both the value and the sealed reference version of a container... it's just too much aggravation.
 An explicit initialization is needed to work around this design issue.
 The worst thing is that in many cases it would work fine (you might
 have already initialized it indirectly) but sometimes you get
 unexpected result. I got hit by this in past, and it wasn't easy to
 trace down.

 As such, I strongly believe containers either need to have copy
 semantics, or be classes. However, copy semantics contradicts with the
 "cheap copy ctor" idiom because you need to copy all the elements from
 source container.

Personally, I'm really concerned by the case where you have a container of containers. Class semantics make things really complicated as you always have to initialize everything in the container explicitly; value semantics makes things semantically easier but quite inefficient as moving elements inside of the outermost container implies copying the containers. Making containers auto-initialize themselves on first use solves the case where containers are references-types; making containers capable of using move semantics solves the problem for value-type containers.

Neither values nor references are perfect indeed. For example, someone mentioned, hey, in STL I write set< vector<double> > and it Just Works(tm). On the other hand, if you swap the two names it still seems to work but it's awfully inefficient (something that may trip even experienced developers). Andrei
Feb 01 2011
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2011-02-01 11:12:13 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 1/28/11 8:12 PM, Michel Fortin wrote:
 On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden gmail.com> said:
 
 Unfortunately, this design has big issues:
 
 
 void fill(Appender appender)
 {
 appender.put("hello");
 appender.put("world");
 }
 
 void test()
 {
 Appender<string> appender;
 fill(appender); // Appender is supposed to have reference semantics
 assert(appender.length != 0); // fails!
 }
 
 Asserting above fails because at the time you pass appender object to
 the fill method it isn't initialized yet (lazy initialization). As
 such, a null is passed, creating an instance at first appending, but
 the result isn't seen to the caller.

That's indeed a problem. I don't think it's a fatal flaw however, given that the idiom already exists in AAs. That said, the nice thing about my proposal is that you can easily reuse the Impl to create a new container to build a new container wrapper with the semantics you like with no loss of efficiency. As for the case of Appender... personally in the case above I'd be tempted to use Appender.Impl directly (value semantics) and make fill take a 'ref'. There's no point in having an extra heap allocation, especially if you're calling test() in a loop or if there's a good chance fill() has nothing to append to it. That's the issue with containers. The optimal semantics always change depending on the use case.

Yep, yep, I found myself wrestling with the same issues. All good points. On one hand containers are a target for optimization because many will use them. On the other hand you'd want to have reasonably simple and idiomatic code in the container implementation because you want people to understand them easily and also to write their own. I thought for a while of a layered approach in which you'd have both the value and the sealed reference version of a container... it's just too much aggravation.

But are you not just pushing the aggravation elsewhere? If I need a by value container for some reason (performance or semantics) I'll have to write my own, and likely others will write their own too. Using classes for containers is just marginally better than making them by-value structs: you can use 'new' with a by-value struct if you want it to behave as a class-like by-reference container: struct Container { ... } auto c = new Container(); The only noticeable difference from a class container is that now c is now a Container*.
 Personally, I'm really concerned by the case where you have a container
 of containers. Class semantics make things really complicated as you
 always have to initialize everything in the container explicitly; value
 semantics makes things semantically easier but quite inefficient as
 moving elements inside of the outermost container implies copying the
 containers. Making containers auto-initialize themselves on first use
 solves the case where containers are references-types; making containers
 capable of using move semantics solves the problem for value-type
 containers.

Neither values nor references are perfect indeed. For example, someone mentioned, hey, in STL I write set< vector<double> > and it Just Works(tm). On the other hand, if you swap the two names it still seems to work but it's awfully inefficient (something that may trip even experienced developers).

Isn't that solved by C++0x, using move semantics in swap? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Feb 01 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/1/11 10:44 AM, Michel Fortin wrote:
 On 2011-02-01 11:12:13 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 1/28/11 8:12 PM, Michel Fortin wrote:
 On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden gmail.com> said:

 Unfortunately, this design has big issues:


 void fill(Appender appender)
 {
 appender.put("hello");
 appender.put("world");
 }

 void test()
 {
 Appender<string> appender;
 fill(appender); // Appender is supposed to have reference semantics
 assert(appender.length != 0); // fails!
 }

 Asserting above fails because at the time you pass appender object to
 the fill method it isn't initialized yet (lazy initialization). As
 such, a null is passed, creating an instance at first appending, but
 the result isn't seen to the caller.

That's indeed a problem. I don't think it's a fatal flaw however, given that the idiom already exists in AAs. That said, the nice thing about my proposal is that you can easily reuse the Impl to create a new container to build a new container wrapper with the semantics you like with no loss of efficiency. As for the case of Appender... personally in the case above I'd be tempted to use Appender.Impl directly (value semantics) and make fill take a 'ref'. There's no point in having an extra heap allocation, especially if you're calling test() in a loop or if there's a good chance fill() has nothing to append to it. That's the issue with containers. The optimal semantics always change depending on the use case.

Yep, yep, I found myself wrestling with the same issues. All good points. On one hand containers are a target for optimization because many will use them. On the other hand you'd want to have reasonably simple and idiomatic code in the container implementation because you want people to understand them easily and also to write their own. I thought for a while of a layered approach in which you'd have both the value and the sealed reference version of a container... it's just too much aggravation.

But are you not just pushing the aggravation elsewhere? If I need a by value container for some reason (performance or semantics) I'll have to write my own, and likely others will write their own too.

If semantics are the primary concern, you could (and in fact Phobos could) provide a Value!C template that automatically calls dup in this(this) etc. For performance I agree there is stuff that class containers leave on the table.
 Using classes for containers is just marginally better than making them
 by-value structs: you can use 'new' with a by-value struct if you want
 it to behave as a class-like by-reference container:

 struct Container {
 ...
 }

 auto c = new Container();

 The only noticeable difference from a class container is that now c is
 now a Container*.

Well one problem now is that if you have a Container* you don't know whether it's dynamically allocated or the address of some stack-allocated object. This is pretty big; a major issue that I believe C++ has is that you can seldom reason modularly about functions because C++ makes it impossible to represent reference semantics with local/remote/shared/no ownership without resorting to convention. A better solution is to define something like auto c = new Classify!Container; which transforms a value into a class object. With this, the question becomes a matter of choosing the right default: do we want values most of the time and occasional references, or vice versa? I think most of the time you need references, as witnessed by the many '&'s out there in code working on STL containers.
 Personally, I'm really concerned by the case where you have a container
 of containers. Class semantics make things really complicated as you
 always have to initialize everything in the container explicitly; value
 semantics makes things semantically easier but quite inefficient as
 moving elements inside of the outermost container implies copying the
 containers. Making containers auto-initialize themselves on first use
 solves the case where containers are references-types; making containers
 capable of using move semantics solves the problem for value-type
 containers.

Neither values nor references are perfect indeed. For example, someone mentioned, hey, in STL I write set< vector<double> > and it Just Works(tm). On the other hand, if you swap the two names it still seems to work but it's awfully inefficient (something that may trip even experienced developers).

Isn't that solved by C++0x, using move semantics in swap?

This particular incarnation yes, but that doesn't automatically fix user code that forgets the cost of copying. But that took a large language change. My point was that values by default is not automatically a good choice. Andrei
Feb 01 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 A better solution is to define something like
 
 auto c = new Classify!Container;
 
 which transforms a value into a class object.
 
 With this, the question becomes a matter of choosing the right default: 
 do we want values most of the time and occasional references, or vice 
 versa? I think most of the time you need references, as witnessed by the 
 many '&'s out there in code working on STL containers.

I agree that most times a reference is better. This brings back the need for a very good (efficient, syntactically readable, and even if not safe, able to spot most common errors, and RAII-safe) way to allocate class instances in-place, on the stack or inside another struct/class. Bye, bearophile
Feb 01 2011
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2011-02-01 12:07:55 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 With this, the question becomes a matter of choosing the right default: 
 do we want values most of the time and occasional references, or vice 
 versa? I think most of the time you need references, as witnessed by 
 the many '&'s out there in code working on STL containers.

What exactly is "most of the time"? In C++, you pass containers by '&' for function parameters, using '&' elsewhere is rare. One thing I proposed some time ago to address this problem (and to which no one replied) was this: ref struct Container { ... } // new "ref struct" concept void func(Container c) { // c is implicitly a "ref Container" } Container a; // by value func(a); // implicitly passed by ref Containers would be stored by value, but always passed by ref in functions parameters. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Feb 01 2011
parent Simon Buerger <krox gmx.net> writes:
On 01.02.2011 20:01, Michel Fortin wrote:
 On 2011-02-01 12:07:55 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 With this, the question becomes a matter of choosing the right
 default: do we want values most of the time and occasional
 references, or vice versa? I think most of the time you need
 references, as witnessed by the many '&'s out there in code working
 on STL containers.

What exactly is "most of the time"? In C++, you pass containers by '&' for function parameters, using '&' elsewhere is rare. One thing I proposed some time ago to address this problem (and to which no one replied) was this: ref struct Container { ... } // new "ref struct" concept void func(Container c) { // c is implicitly a "ref Container" } Container a; // by value func(a); // implicitly passed by ref Containers would be stored by value, but always passed by ref in functions parameters.

Thats would not be "per-value" as most understand it. Container globalC; func(c); void func(Container paramC) { c.add(42); // modifies globalC in reference-semantics // leaves globalC as it was in value-semantics } Your Idea would be somehow truly value-based if you default not only to "ref" but to "const ref", because then the function would not be able to alter globalC. But making parameters default-const was not considered the right way for D. - Krox
Feb 01 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday 01 February 2011 09:07:55 Andrei Alexandrescu wrote:
 On 2/1/11 10:44 AM, Michel Fortin wrote:
 On 2011-02-01 11:12:13 -0500, Andrei Alexandrescu
 
 <SeeWebsiteForEmail erdani.org> said:
 On 1/28/11 8:12 PM, Michel Fortin wrote:
 On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden gmail.com> said:
 Unfortunately, this design has big issues:
 
 
 void fill(Appender appender)
 {
 appender.put("hello");
 appender.put("world");
 }
 
 void test()
 {
 Appender<string> appender;
 fill(appender); // Appender is supposed to have reference semantics
 assert(appender.length != 0); // fails!
 }
 
 Asserting above fails because at the time you pass appender object to
 the fill method it isn't initialized yet (lazy initialization). As
 such, a null is passed, creating an instance at first appending, but
 the result isn't seen to the caller.

That's indeed a problem. I don't think it's a fatal flaw however, given that the idiom already exists in AAs. That said, the nice thing about my proposal is that you can easily reuse the Impl to create a new container to build a new container wrapper with the semantics you like with no loss of efficiency. As for the case of Appender... personally in the case above I'd be tempted to use Appender.Impl directly (value semantics) and make fill take a 'ref'. There's no point in having an extra heap allocation, especially if you're calling test() in a loop or if there's a good chance fill() has nothing to append to it. That's the issue with containers. The optimal semantics always change depending on the use case.

Yep, yep, I found myself wrestling with the same issues. All good points. On one hand containers are a target for optimization because many will use them. On the other hand you'd want to have reasonably simple and idiomatic code in the container implementation because you want people to understand them easily and also to write their own. I thought for a while of a layered approach in which you'd have both the value and the sealed reference version of a container... it's just too much aggravation.

But are you not just pushing the aggravation elsewhere? If I need a by value container for some reason (performance or semantics) I'll have to write my own, and likely others will write their own too.

If semantics are the primary concern, you could (and in fact Phobos could) provide a Value!C template that automatically calls dup in this(this) etc. For performance I agree there is stuff that class containers leave on the table.
 Using classes for containers is just marginally better than making them
 by-value structs: you can use 'new' with a by-value struct if you want
 it to behave as a class-like by-reference container:
 
 struct Container {
 ...
 }
 
 auto c = new Container();
 
 The only noticeable difference from a class container is that now c is
 now a Container*.

Well one problem now is that if you have a Container* you don't know whether it's dynamically allocated or the address of some stack-allocated object. This is pretty big; a major issue that I believe C++ has is that you can seldom reason modularly about functions because C++ makes it impossible to represent reference semantics with local/remote/shared/no ownership without resorting to convention. A better solution is to define something like auto c = new Classify!Container; which transforms a value into a class object. With this, the question becomes a matter of choosing the right default: do we want values most of the time and occasional references, or vice versa? I think most of the time you need references, as witnessed by the many '&'s out there in code working on STL containers.

Java implements containers as classes (not that it really has any other choice), so all containers in Java have reference semantics, and I've _never_ found that to be a problem. I do think that there are rare cases where it makes sense for a container to be a value type, but I really do think that it's a rare case for the average programmer. On the other hand, having containers be value types in C++ is _frequently_ a problem. - Jonathan M Davis
Feb 01 2011
prev sibling parent Simon Buerger <krox gmx.net> writes:
On 01.02.2011 18:08, Steven Schveighoffer wrote:
 On Tue, 01 Feb 2011 11:44:36 -0500, Michel Fortin
 <michel.fortin michelf.com> wrote:

 On 2011-02-01 11:12:13 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 1/28/11 8:12 PM, Michel Fortin wrote:
 On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden gmail.com>
 said:

 Unfortunately, this design has big issues:
 void fill(Appender appender)
 {
 appender.put("hello");
 appender.put("world");
 }
 void test()
 {
 Appender<string> appender;
 fill(appender); // Appender is supposed to have reference semantics
 assert(appender.length != 0); // fails!
 }
 Asserting above fails because at the time you pass appender
 object to
 the fill method it isn't initialized yet (lazy initialization). As
 such, a null is passed, creating an instance at first appending, but
 the result isn't seen to the caller.

given that the idiom already exists in AAs. That said, the nice thing about my proposal is that you can easily reuse the Impl to create a new container to build a new container wrapper with the semantics you like with no loss of efficiency. As for the case of Appender... personally in the case above I'd be tempted to use Appender.Impl directly (value semantics) and make fill take a 'ref'. There's no point in having an extra heap allocation, especially if you're calling test() in a loop or if there's a good chance fill() has nothing to append to it.



I've been thinking of making Appender.Impl public or at least its own type. A stack-based appender makes a lot of sense when you are using it temporarily to build an array. But array-based containers really are in a separate class from node-based containers. It's tempting to conflate the two because they are both 'containers', but arrays allow many more optimizations/features that node-based containers simply can't do.
 Yep, yep, I found myself wrestling with the same issues. All good
 points. On one hand containers are a target for optimization
 because many will use them. On the other hand you'd want to have
 reasonably simple and idiomatic code in the container
 implementation because you want people to understand them easily
 and also to write their own. I thought for a while of a layered
 approach in which you'd have both the value and the sealed
 reference version of a container... it's just too much aggravation.

But are you not just pushing the aggravation elsewhere? If I need a by value container for some reason (performance or semantics) I'll have to write my own, and likely others will write their own too.

foo(container.dup) ; // value semantics I'm sure some template guru can make a wrapper type for this for the rare occasions that you need it. We can work on solving the "auto-initialization" issue (where a nested container 'just works'), I think there are ways to do it. If that still doesn't help for your issues, then writing your own may be the only valid option.
 Using classes for containers is just marginally better than making
 them by-value structs: you can use 'new' with a by-value struct if
 you want it to behave as a class-like by-reference container:

 struct Container {
 ...
 }

 auto c = new Container();

 The only noticeable difference from a class container is that now c
 is now a Container*.

And doesn't get cleaned up by the GC properly. Plus, each member call must check if the container is 'instantiated', since we can have no default ctors. Yes, it's a trade-off, and I think by far class-based containers win for the common case.
 Personally, I'm really concerned by the case where you have a
 container
 of containers. Class semantics make things really complicated as you
 always have to initialize everything in the container explicitly;
 value
 semantics makes things semantically easier but quite inefficient as
 moving elements inside of the outermost container implies copying the
 containers. Making containers auto-initialize themselves on first use
 solves the case where containers are references-types; making
 containers
 capable of using move semantics solves the problem for value-type
 containers.

someone mentioned, hey, in STL I write set< vector<double> > and it Just Works(tm). On the other hand, if you swap the two names it still seems to work but it's awfully inefficient (something that may trip even experienced developers).

Isn't that solved by C++0x, using move semantics in swap?

swap isn't the problem. foreach(s; myVectorSet) { // if s is by value, it must be copied for each iteration in the loop } -Steve

Just to note: the "correct" solution for the last problem is foreach(const ref s; myVectorSet) which is working in current D. In a more value-based language you may even want to default to "const ref" for foreach-loop-values, and even for function-parameters. I suggested that a while ago, but wasn't liked much for D, for good reasons. - Krox
Feb 01 2011
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2011-01-28 19:00:02 -0500, Tomek Sowiński <just ask.me> said:

 Michel Fortin napisał:
 
 We already argument this over and over in the past. First, I totally
 acknowledge that C++ style containers have a problem: they make it
 easier to copy the content than pass it by reference. On the other side
 of the spectrum, I think that class semantics makes it too easy to have
 null dereferences, it's easy to get lost when you have a container of
 containers.
 
 I have some experience with containers having class-style semantics: in
 Objective-C, I ended up creating a set of macro-like functions which I
 use to initialize containers whenever I use them in case they are null.
 And I had to do more of these utility functions to handle a particular
 data structure of mine which is a dictionary of arrays of objects. In
 C++, I'd have declared this as a "map< string, vector< Object > >" and
 be done with it; no need for special care initializing each vector, so
 much easier than in Objective-C.
 
 I agree that defining structs to have reference semantics as you have
 done is complicated. But I like the lazy initialization, and we have a
 precedent for that with AAs (ideally, AAs would be a compatible
 container too). Can't we just use the GC instead of reference counting?
 I'd make things much easier. Here is a implementation:
 
 	struct Container
 	{
 		struct Impl { ... }
 
 		private Impl* _impl;
 		ref Impl impl()  property
 		{
 			if (!impl) impl = new Impl;
 			return *impl;
 		}
 	
 		alias impl this;
 	}
 
 I also believe reference semantics are not to be used everywhere, even
 though they're good most of the time. I'd like to have a way to bypass
 it and get a value-semantic container. With the above, it's easy as
 long as you keep Container.Impl public:
 
 	void main() {
 		Container      lazyHeapAllocatedContainer;
 		Container.Impl stackAllocatedContainer;
 	}
 
 	void MyObject {
 		Container.Impl listOfObjects;
 	}

Is there anything implementation specific in the outer struct that provides ref semantics to Impl? If not, Container could be generic, parametrized by Impl type.

You could provide an implementation-specific version of some functions as an optimization. For instance there is no need to create the Impl when asking for the length, if the pointer is null, length is zero. Typically, const function can be implemented in the outward container with a shortcut checking for null.
 Overall, I think a value-like implementation in a referency wrapper is 
 a clear-cut idiom, bringing order to otherwise messy struct-implemented 
 ref-semantics. Do you know of a existing collection library that 
 exploits this idea?

No. Only associative arrays in D do that, that I know of. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 28 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/29/11 5:01 AM, Simen kjaeraas wrote:
 Tomek Sowiński <just ask.me> wrote:

 Michel Fortin napisał:

 Is there anything implementation specific in the outer struct that

 ref semantics to Impl? If not, Container could be generic,

 Impl type.

You could provide an implementation-specific version of some functions as an optimization. For instance there is no need to create the Impl when asking for the length, if the pointer is null, length is zero. Typically, const function can be implemented in the outward container with a shortcut checking for null.

I think the reference struct can still be orthogonal to the container. struct Ref(Impl) { private Impl* _impl; ref Impl impl() property { if (!impl) impl = new Impl; return *impl; } static if (hasLength!Impl) { auto length() property { return impl ? impl.length : 0; } } alias impl this; }

Now, other functions may also exploit such a shortcut. How do you plan to implement that? Actually, by adopting another idiom, it can be done: struct Ref(Impl) { private Impl* _impl; property ref Impl impl() { return *(_impl = (_impl ? _impl : new Impl)); } alias impl this; auto opDispatch(string name, T...)(T args) { mixin("return impl."~name~"(_impl,args);"); } } struct ExampleImpl { static int length(ExampleImpl* that) { return that ? *that.actualLength : 0; } int actualLength( ) { return 42; } } Of course, I did not say it was a good idiom. :p

I do something similar with RefCounted. There are problems - you need to know in advance which functions you can implement on a null container (empty and length are obvious candidates, but there could be others). Andrei
Feb 01 2011
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 29 Jan 2011 02:32:28 +0300, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2011-01-28 17:09:08 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 On 1/28/11 3:05 PM, Michel Fortin wrote:
 Not my preferred choices (especially #1), but having containers in
 Phobos will certainly be an improvement over not having them. So go  
 ahead!

see for now is that struct containers are just difficult to implement and need to have carefully explained semantics, whereas a lot of people know how classes behave from day one.

We already argument this over and over in the past. First, I totally acknowledge that C++ style containers have a problem: they make it easier to copy the content than pass it by reference. On the other side of the spectrum, I think that class semantics makes it too easy to have null dereferences, it's easy to get lost when you have a container of containers. I have some experience with containers having class-style semantics: in Objective-C, I ended up creating a set of macro-like functions which I use to initialize containers whenever I use them in case they are null. And I had to do more of these utility functions to handle a particular data structure of mine which is a dictionary of arrays of objects. In C++, I'd have declared this as a "map< string, vector< Object > >" and be done with it; no need for special care initializing each vector, so much easier than in Objective-C. I agree that defining structs to have reference semantics as you have done is complicated. But I like the lazy initialization, and we have a precedent for that with AAs (ideally, AAs would be a compatible container too). Can't we just use the GC instead of reference counting? I'd make things much easier. Here is a implementation: struct Container { struct Impl { ... } private Impl* _impl; ref Impl impl() property { if (!impl) impl = new Impl; return *impl; } alias impl this; }

Unfortunately, this design has big issues: void fill(Appender appender) { appender.put("hello"); appender.put("world"); } void test() { Appender<string> appender; fill(appender); // Appender is supposed to have reference semantics assert(appender.length != 0); // fails! } Asserting above fails because at the time you pass appender object to the fill method it isn't initialized yet (lazy initialization). As such, a null is passed, creating an instance at first appending, but the result isn't seen to the caller. An explicit initialization is needed to work around this design issue. The worst thing is that in many cases it would work fine (you might have already initialized it indirectly) but sometimes you get unexpected result. I got hit by this in past, and it wasn't easy to trace down. As such, I strongly believe containers either need to have copy semantics, or be classes. However, copy semantics contradicts with the "cheap copy ctor" idiom because you need to copy all the elements from source container.
 I also believe reference semantics are not to be used everywhere, even  
 though they're good most of the time. I'd like to have a way to bypass  
 it and get a value-semantic container. With the above, it's easy as long  
 as you keep Container.Impl public:

 	void main() {
 		Container      lazyHeapAllocatedContainer;
 		Container.Impl stackAllocatedContainer;
 	}

 	void MyObject {
 		Container.Impl listOfObjects;
 	}


 About #4, it'd be nice to have the containers use move semantics when
 possible even if they fallback to (cheap) copy semantic when move isn't
 available. That way, if you have a type which is moveable but not
 copyable you can still put it in a container. Does that makes sense?

of methods (aliases or not) that add to the interface that the user must absorb, but that are seldom useful. It just seems that the entire move paraphernalia doesn't lift its weight.

But could we limit this to say that only containers that can return elements by ref? Perhaps that won't help. You know the problem better than me, I don't really have anything more to say.

-- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 28 2011
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Tomek Sowi=C5=84ski <just ask.me> wrote:

 Michel Fortin napisa=C5=82:

 Is there anything implementation specific in the outer struct that =



 provides
 ref semantics to Impl? If not, Container could be generic,  =



 parametrized by
 Impl type.

You could provide an implementation-specific version of some function=


 as an optimization. For instance there is no need to create the Impl
 when asking for the length, if the pointer is null, length is zero.
 Typically, const function can be implemented in the outward container=


 with a shortcut checking for null.

I think the reference struct can still be orthogonal to the container.=

 	struct Ref(Impl)
 	{
 		private Impl* _impl;
 		ref Impl impl()  property
 		{
 			if (!impl) impl =3D new Impl;
 			return *impl;
 		}

 		static if (hasLength!Impl)
 		{
 			auto length()  property
 			{
 				return impl ? impl.length : 0;
 			}
 		}

 		alias impl this;
 	}

Now, other functions may also exploit such a shortcut. How do you plan to implement that? Actually, by adopting another idiom, it can be done: struct Ref(Impl) { private Impl* _impl; property ref Impl impl() { return *(_impl =3D (_impl ? _impl : new Impl)); } alias impl this; auto opDispatch(string name, T...)(T args) { mixin("return impl."~name~"(_impl,args);"); } } struct ExampleImpl { static int length(ExampleImpl* that) { return that ? *that.actualLength : 0; } int actualLength( ) { return 42; } } Of course, I did not say it was a good idiom. :p -- = Simen
Jan 29 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 28 Jan 2011 18:32:28 -0500, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2011-01-28 17:09:08 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 On 1/28/11 3:05 PM, Michel Fortin wrote:
 Not my preferred choices (especially #1), but having containers in
 Phobos will certainly be an improvement over not having them. So go  
 ahead!

see for now is that struct containers are just difficult to implement and need to have carefully explained semantics, whereas a lot of people know how classes behave from day one.

We already argument this over and over in the past. First, I totally acknowledge that C++ style containers have a problem: they make it easier to copy the content than pass it by reference. On the other side of the spectrum, I think that class semantics makes it too easy to have null dereferences, it's easy to get lost when you have a container of containers. I have some experience with containers having class-style semantics: in Objective-C, I ended up creating a set of macro-like functions which I use to initialize containers whenever I use them in case they are null. And I had to do more of these utility functions to handle a particular data structure of mine which is a dictionary of arrays of objects. In C++, I'd have declared this as a "map< string, vector< Object > >" and be done with it; no need for special care initializing each vector, so much easier than in Objective-C.

I see value in this idiom, and I have ideas on how to solve this with class-based containers. Essentially, I think we need a way to construct a "container of containers" where the owner container is the one who has class semantics, and the sub-containers only provide access through the owner. I haven't fleshed out how this will work, but it solves another really important problem that dcollections currently has -- over-allocation. The issue is, if you have a nested container (say a map of strings to linked-lists), the map container pre-allocates a page of elements, to ease stress on the GC. But so does each linked list! However, since the lists all are part of the map, they could potentially share the same allocator. If you have few elements in each list, this could significantly reduce the over-allocation of memory. -Steve
Jan 31 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 01 Feb 2011 11:44:36 -0500, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2011-02-01 11:12:13 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 On 1/28/11 8:12 PM, Michel Fortin wrote:
 On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden gmail.com>  
 said:

 Unfortunately, this design has big issues:
   void fill(Appender appender)
 {
 appender.put("hello");
 appender.put("world");
 }
  void test()
 {
 Appender<string> appender;
 fill(appender); // Appender is supposed to have reference semantics
 assert(appender.length != 0); // fails!
 }
  Asserting above fails because at the time you pass appender object to
 the fill method it isn't initialized yet (lazy initialization). As
 such, a null is passed, creating an instance at first appending, but
 the result isn't seen to the caller.

given that the idiom already exists in AAs. That said, the nice thing about my proposal is that you can easily reuse the Impl to create a new container to build a new container wrapper with the semantics you like with no loss of efficiency. As for the case of Appender... personally in the case above I'd be tempted to use Appender.Impl directly (value semantics) and make fill take a 'ref'. There's no point in having an extra heap allocation, especially if you're calling test() in a loop or if there's a good chance fill() has nothing to append to it.



I've been thinking of making Appender.Impl public or at least its own type. A stack-based appender makes a lot of sense when you are using it temporarily to build an array. But array-based containers really are in a separate class from node-based containers. It's tempting to conflate the two because they are both 'containers', but arrays allow many more optimizations/features that node-based containers simply can't do.
  Yep, yep, I found myself wrestling with the same issues. All good  
 points. On one hand containers are a target for optimization because  
 many will use them. On the other hand you'd want to have reasonably  
 simple and idiomatic code in the container implementation because you  
 want people to understand them easily and also to write their own. I  
 thought for a while of a layered approach in which you'd have both the  
 value and the sealed reference version of a container... it's just too  
 much aggravation.

But are you not just pushing the aggravation elsewhere? If I need a by value container for some reason (performance or semantics) I'll have to write my own, and likely others will write their own too.

foo(container.dup) ; // value semantics I'm sure some template guru can make a wrapper type for this for the rare occasions that you need it. We can work on solving the "auto-initialization" issue (where a nested container 'just works'), I think there are ways to do it. If that still doesn't help for your issues, then writing your own may be the only valid option.
 Using classes for containers is just marginally better than making them  
 by-value structs: you can use 'new' with a by-value struct if you want  
 it to behave as a class-like by-reference container:

 	struct Container {
 		...
 	}

 	auto c = new Container();

 The only noticeable difference from a class container is that now c is  
 now a Container*.

And doesn't get cleaned up by the GC properly. Plus, each member call must check if the container is 'instantiated', since we can have no default ctors. Yes, it's a trade-off, and I think by far class-based containers win for the common case.
 Personally, I'm really concerned by the case where you have a container
 of containers. Class semantics make things really complicated as you
 always have to initialize everything in the container explicitly; value
 semantics makes things semantically easier but quite inefficient as
 moving elements inside of the outermost container implies copying the
 containers. Making containers auto-initialize themselves on first use
 solves the case where containers are references-types; making  
 containers
 capable of using move semantics solves the problem for value-type
 containers.

mentioned, hey, in STL I write set< vector<double> > and it Just Works(tm). On the other hand, if you swap the two names it still seems to work but it's awfully inefficient (something that may trip even experienced developers).

Isn't that solved by C++0x, using move semantics in swap?

swap isn't the problem. foreach(s; myVectorSet) { // if s is by value, it must be copied for each iteration in the loop } -Steve
Feb 01 2011
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I do something similar with RefCounted. There are problems - you need to  
 know in advance which functions you can implement on a null container  
 (empty and length are obvious candidates, but there could be others).

Static functions can safely be called. Hence, this template: import std.traits; template isStaticFunc(T, string fn) { enum isStaticFunc = is(typeof({ mixin("alias T."~fn~" func;"); ParameterTypeTuple!func args; mixin("T."~fn~"(args);"); })); } Now, Ref can look like this: struct Ref(Impl) { private Impl* _impl; property ref Impl impl() { return *(_impl = (_impl ? _impl : new Impl)); } //alias impl this; // Apparently, alias this takes precedence over opDispatch, so // using both doesn't work. Well, unless you put opDispatch in Impl. auto opDispatch(string name, T...)(T args) if ( isStaticFunc!(Impl, name)) { mixin("return impl."~name~"(_impl,args);"); } } struct ExampleImpl { static int length(ExampleImpl* that) { return that ? that.actualLength : 0; } int actualLength( ) { return 42; } } import std.stdio; void main( ) { Ref!ExampleImpl a; writeln( a.length ); } -- Simen
Feb 01 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 01 Feb 2011 14:26:26 -0500, Simon Buerger <krox gmx.net> wrote:

 On 01.02.2011 18:08, Steven Schveighoffer wrote:

 swap isn't the problem.

 foreach(s; myVectorSet)
 {
 // if s is by value, it must be copied for each iteration in the loop
 }

Just to note: the "correct" solution for the last problem is foreach(const ref s; myVectorSet) which is working in current D. In a more value-based language you may even want to default to "const ref" for foreach-loop-values, and even for function-parameters. I suggested that a while ago, but wasn't liked much for D, for good reasons.

This is not a "solution". I cannot enforce that someone uses foreach with ref semantics. Unless the type is a reference type, like a class. Note that const ref isn't necessarily want, you could want to mutate the elements of the vector, meaning you really want ref. The whole point is, it's too easy to get it wrong, and the wrong thing looks innocuous and does not look like it will perform horribly. Compare this to someone who actually *wants* value semantics when iterating a reference type (which should be very rare): foreach(s; myVectorSet) { auto setIAmGoingToUse = s.dup; // clear intentions: "yes, I want to do an expensive copy" } -Steve
Feb 01 2011
prev sibling next sibling parent Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:
Michel Fortin napisa=C5=82:

 We already argument this over and over in the past. First, I totally=20
 acknowledge that C++ style containers have a problem: they make it=20
 easier to copy the content than pass it by reference. On the other side=20
 of the spectrum, I think that class semantics makes it too easy to have=20
 null dereferences, it's easy to get lost when you have a container of=20
 containers.
=20
 I have some experience with containers having class-style semantics: in=20
 Objective-C, I ended up creating a set of macro-like functions which I=20
 use to initialize containers whenever I use them in case they are null.=20
 And I had to do more of these utility functions to handle a particular=20
 data structure of mine which is a dictionary of arrays of objects. In=20
 C++, I'd have declared this as a "map< string, vector< Object > >" and=20
 be done with it; no need for special care initializing each vector, so=20
 much easier than in Objective-C.
=20
 I agree that defining structs to have reference semantics as you have=20
 done is complicated. But I like the lazy initialization, and we have a=20
 precedent for that with AAs (ideally, AAs would be a compatible=20
 container too). Can't we just use the GC instead of reference counting?=20
 I'd make things much easier. Here is a implementation:
=20
 	struct Container
 	{
 		struct Impl { ... }
=20
 		private Impl* _impl;
 		ref Impl impl()  property
 		{
 			if (!impl) impl =3D new Impl;
 			return *impl;
 		}
 =09
 		alias impl this;
 	}
=20
 I also believe reference semantics are not to be used everywhere, even=20
 though they're good most of the time. I'd like to have a way to bypass=20
 it and get a value-semantic container. With the above, it's easy as=20
 long as you keep Container.Impl public:
=20
 	void main() {
 		Container      lazyHeapAllocatedContainer;
 		Container.Impl stackAllocatedContainer;
 	}
=20
 	void MyObject {
 		Container.Impl listOfObjects;
 	}

Is there anything implementation specific in the outer struct that provides= ref semantics to Impl? If not, Container could be generic, parametrized by= Impl type. Overall, I think a value-like implementation in a referency wrapper is a cl= ear-cut idiom, bringing order to otherwise messy struct-implemented ref-sem= antics. Do you know of a existing collection library that exploits this ide= a? --=20 Tomek
Jan 28 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 01/29/2011 01:01 AM, bearophile wrote:
 Built-in AAs are currently broken and in need to be fixed:

 import std.stdio: writeln;
 void foo(int[int] aa, int n) {
      aa[n] = n;
 }
 void main() {
      int[int] a;
      foo(a, 0);
      writeln(a);
      a[1] = 1;
      foo(a, 2);
      writeln(a);
 }

 Bye,
 bearophile

Variation on the theme: void foo(int[int] aa, int n) { aa[n] = n; } void foo(int[] a, int n) { a ~= n; } void bar(ref int[int] aa, int n) { aa[n] = n; } void bar(ref int[] a, int n) { a ~= n; } unittest { int[int] aa; foo(aa, 3); writeln(aa.length); int[] a; foo(a, 3); writeln(a.length); int[int] bb; bar(bb, 3); writeln(bb.length); int[] b; bar(b, 3); writeln(b.length); } Denis -- _________________ vita es estrany spir.wikidot.com
Jan 28 2011
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 1. Containers will be classes.

This page: http://www.jroller.com/scolebourne/entry/the_next_big_jvm_language1 A quotation:
3) Everything is a monitor. In Java and the JVM, every object is a monitor,
meaning that you can synchronize on any object. This is incredibly wasteful at
the JVM level. Senior JVM guys have indicated large percentage improvements in
JVM space and performance if we removed the requirement that every object can
be synchronized on. (Instead, you would have specific classes like Java 5 Lock)<

I have read similar comments in various other places. What about creating a nomonitor annotation, for D2 classes to not create a monitor for specific classes annotated with it? This may reduce some class overhead. Bye, bearophile
Jan 28 2011
prev sibling next sibling parent Tomek =?ISO-8859-2?Q?Sowi=F1ski?= <just ask.me> writes:
Michel Fortin napisa=B3:

 Is there anything implementation specific in the outer struct that prov=


 ref semantics to Impl? If not, Container could be generic, parametrized=


 Impl type. =20

You could provide an implementation-specific version of some functions=20 as an optimization. For instance there is no need to create the Impl=20 when asking for the length, if the pointer is null, length is zero.=20 Typically, const function can be implemented in the outward container=20 with a shortcut checking for null.

I think the reference struct can still be orthogonal to the container. struct Ref(Impl) { private Impl* _impl; ref Impl impl() property { if (!impl) impl =3D new Impl; return *impl; } static if (hasLength!Impl) { auto length() property { return impl ? impl.length : 0; } } alias impl this; } Reusability lightens the burden of the container's author (less fuss for us= er implementations) and somewhat standardizes containers as they all must e= xhibit a certain API with certain semantics to be able to fit into Ref. The downside is that the syntax for the most common case (ref semantics) is= a little nosier than for value-like behavior. --=20 Tomek
Jan 29 2011
prev sibling next sibling parent Tomek =?ISO-8859-2?Q?Sowi=F1ski?= <just ask.me> writes:
Michel Fortin napisa=B3:

 As for the case of Appender... personally in the case above I'd be=20
 tempted to use Appender.Impl directly (value semantics) and make fill=20
 take a 'ref'. There's no point in having an extra heap allocation,=20
 especially if you're calling test() in a loop or if there's a good=20
 chance fill() has nothing to append to it.

Or take an output range. --=20 Tomek
Jan 29 2011
prev sibling next sibling parent Tomek =?ISO-8859-2?Q?Sowi=F1ski?= <just ask.me> writes:
bearophile napisa=B3:

 This page:
 http://www.jroller.com/scolebourne/entry/the_next_big_jvm_language1
=20
 A quotation:
=20
3) Everything is a monitor. In Java and the JVM, every object is a monit=


object. This is incredibly wasteful at the JVM level. Senior JVM guys ha=


in JVM space and performance if we removed the requirement that every ob=


would have specific classes like Java 5 Lock)<

I have read similar comments in various other places. =20 What about creating a nomonitor annotation, for D2 classes to not create=

 with it? This may reduce some class overhead.

Better just remove it, it's not used often. Besides, there are different lo= cks, one size doesn't fit all. --=20 Tomek
Jan 29 2011
prev sibling next sibling parent SHOO <zan77137 nifty.com> writes:
(2011/01/29 3:31), Andrei Alexandrescu wrote:
 Today after work I plan to start making one pass through std.container.
 After having thought of things for a long time, my conclusions are as
 follows:

 1. Containers will be classes.

 2. Most of the methods in existing containers will be final. It's up to
 the container to make a method final or not.

 3. Containers and their ranges decide whether they give away references
 to their objects. Sealing is a great idea but it makes everybody's life
 too complicated. I'll defer sealing to future improvements in the
 language and/or the reflection subsystem.

 4. Containers will assume that objects are cheap to copy so they won't
 worry about moving primitives.

 Any showstoppers, please share.


 Andrei

1, 2

There is hardly the superiority that a containers are structs. :-) In the first place I had doubt towards treating structs as classes (OOP). When I use it by such a way, the lack of the default constructor is fatal.
3, 4

there are many unnecessary cases. Because it is thought that a lot of repetition access to the element and life cycles occurs in the case of a container, handling it by the element unit should be high-performance. In addition, for usability, a thing of simple behavior is preferable rather than a complicated thing. However, on the other hand, the design that wrong use is impossible should be considered enough.
Jan 29 2011
prev sibling next sibling parent reply Simon Buerger <krox gmx.net> writes:
On 28.01.2011 19:31, Andrei Alexandrescu wrote:
 1. Containers will be classes.

 2. Most of the methods in existing containers will be final. It's up
 to the container to make a method final or not.

 3. Containers and their ranges decide whether they give away
 references to their objects. Sealing is a great idea but it makes
 everybody's life too complicated. I'll defer sealing to future
 improvements in the language and/or the reflection subsystem.

 4. Containers will assume that objects are cheap to copy so they won't
 worry about moving primitives.

Not perfectly what I would like, but a reasonable choice, and most important to actually have a mature container-lib. But there are other choices remaining: what containers will be there and what will they be called? My suggestion is * Set, MulitSet, Map, MultiMap (hash-table based) * OrderedSet, OrderedMultiSet, OrderedMap, OrderedMultiMap (tree-based) * Sequence (like stl-Deque. the name is just more intuitive. Funny enough, the stl-deque implemenation has nothing to do with a "doubly linked list") * Array (like stl-vector. I think "vector" is a kinda strange name, but that may be only my impression) * List (linked list) * Stack/Queue/PriorityQueue should be done on top of an other class, with a "impl"-template-param, like the stl-ones Things to note: * container should be named with respect to their use, not the implementation. "HashSet" is a bad name, because the user shouldnt care about the implemenation. * unordered sets are used more often than ordered. So it should be "Set/OrderedSet", and not "UnorderedSet/Set" (also, the first one is two characters less typing *g*) * opEqual should work between different types of Sets (or Maps, or sequences). Nothing wrong with comparing an ordered to an unordered one, or a list to an array. just my 2 cents, Krox
Jan 29 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Simon Buerger:

 * container should be named with respect to their use, not the 
 implementation. "HashSet" is a bad name, because the user shouldnt 
 care about the implemenation.

This is good for a language like Python, or even Java, but for a system language I want to know the algorithms I'm using under the hood. So I am not sure. Bye, bearophile
Jan 29 2011
parent reply Jim <bitcirkel yahoo.com> writes:
bearophile Wrote:

 Simon Buerger:
 
 * container should be named with respect to their use, not the 
 implementation. "HashSet" is a bad name, because the user shouldnt 
 care about the implemenation.

This is good for a language like Python, or even Java, but for a system language I want to know the algorithms I'm using under the hood. So I am not sure.

There could be an interface named Set and various implementations of it, e.g. HashSet. I think this would be a good principle in general for all abstract collection types.
Jan 29 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/29/2011 12:44 PM, Jim wrote:
 bearophile Wrote:

 Simon Buerger:

 * container should be named with respect to their use, not the
 implementation. "HashSet" is a bad name, because the user shouldnt
 care about the implemenation.

This is good for a language like Python, or even Java, but for a system language I want to know the algorithms I'm using under the hood. So I am not sure.

There could be an interface named Set and various implementations of it, e.g. HashSet. I think this would be a good principle in general for all abstract collection types.

Thus approach has a few advantages but a lot of serious issues. I decided to not define a hierarchy for std.container. Andrei
Jan 29 2011
prev sibling next sibling parent KennyTM~ <kennytm gmail.com> writes:
On Jan 30, 11 01:09, Simon Buerger wrote:
 On 28.01.2011 19:31, Andrei Alexandrescu wrote:
 1. Containers will be classes.

 2. Most of the methods in existing containers will be final. It's up
 to the container to make a method final or not.

 3. Containers and their ranges decide whether they give away
 references to their objects. Sealing is a great idea but it makes
 everybody's life too complicated. I'll defer sealing to future
 improvements in the language and/or the reflection subsystem.

 4. Containers will assume that objects are cheap to copy so they won't
 worry about moving primitives.

Not perfectly what I would like, but a reasonable choice, and most important to actually have a mature container-lib. But there are other choices remaining: what containers will be there and what will they be called? My suggestion is * Set, MulitSet, Map, MultiMap (hash-table based) * OrderedSet, OrderedMultiSet, OrderedMap, OrderedMultiMap (tree-based) * Sequence (like stl-Deque. the name is just more intuitive. Funny enough, the stl-deque implemenation has nothing to do with a "doubly linked list")

A 'deque' just mean a double-ended queue, not necessarily doubly linked list. I don't like the name Sequence since it doesn't specify the container can be modified from both ends. 'Deque' is just fine.
 * Array (like stl-vector. I think "vector" is a kinda strange name, but
 that may be only my impression)
 * List (linked list)

 * Stack/Queue/PriorityQueue should be done on top of an other class,
 with a "impl"-template-param, like the stl-ones

 Things to note:
 * container should be named with respect to their use, not the
 implementation. "HashSet" is a bad name, because the user shouldnt care
 about the implemenation.

Except that there is already a std.container.RedBlackTree named in this fashion :).
 * unordered sets are used more often than ordered. So it should be
 "Set/OrderedSet", and not "UnorderedSet/Set" (also, the first one is two
 characters less typing *g*)

 * opEqual should work between different types of Sets (or Maps, or
 sequences). Nothing wrong with comparing an ordered to an unordered one,
 or a list to an array.

This breaks transitivity of '==' unnecessarily. I don't see the benefit of comparing two different kinds of containers. If you really need to compare them, use std.algorithm.equal.
 just my 2 cents,
 Krox

Jan 29 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday 29 January 2011 09:09:00 Simon Buerger wrote:
 * container should be named with respect to their use, not the
 implementation. "HashSet" is a bad name, because the user shouldnt
 care about the implemenation.

It was decided previously that containers in std.container would specifically be named based on the type of data structure that they are rather than what they're used for. Hence why the newest container to std.container is called RedBlackTree. If you want to call it TreeSet or something similar, then all you have to do is alias it. So, it's easy to get names based on usage if that's what you want, but that's not the naming scheme that std.container is using. - Jonathan M Davis
Jan 29 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 01/29/2011 06:09 PM, Simon Buerger wrote:
 Things to note:
 * container should be named with respect to their use, not the implementation.
 "HashSet" is a bad name, because the user shouldnt care about the
implemenation.

 * unordered sets are used more often than ordered. So it should be
 "Set/OrderedSet", and not "UnorderedSet/Set" (also, the first one is two
 characters less typing *g*)

 * opEqual should work between different types of Sets (or Maps, or sequences).
 Nothing wrong with comparing an ordered to an unordered one, or a list to an
 array.

+++ on all those points Denis -- _________________ vita es estrany spir.wikidot.com
Jan 30 2011
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 29 Jan 2011 12:09:00 -0500, Simon Buerger <krox gmx.net> wrote:

 On 28.01.2011 19:31, Andrei Alexandrescu wrote:
 1. Containers will be classes.

 2. Most of the methods in existing containers will be final. It's up
 to the container to make a method final or not.

 3. Containers and their ranges decide whether they give away
 references to their objects. Sealing is a great idea but it makes
 everybody's life too complicated. I'll defer sealing to future
 improvements in the language and/or the reflection subsystem.

 4. Containers will assume that objects are cheap to copy so they won't
 worry about moving primitives.

Not perfectly what I would like, but a reasonable choice, and most important to actually have a mature container-lib. But there are other choices remaining: what containers will be there and what will they be called? My suggestion is * Set, MulitSet, Map, MultiMap (hash-table based) * OrderedSet, OrderedMultiSet, OrderedMap, OrderedMultiMap (tree-based) * Sequence (like stl-Deque. the name is just more intuitive. Funny enough, the stl-deque implemenation has nothing to do with a "doubly linked list") * Array (like stl-vector. I think "vector" is a kinda strange name, but that may be only my impression) * List (linked list) * Stack/Queue/PriorityQueue should be done on top of an other class, with a "impl"-template-param, like the stl-ones Things to note: * container should be named with respect to their use, not the implementation. "HashSet" is a bad name, because the user shouldnt care about the implemenation. * unordered sets are used more often than ordered. So it should be "Set/OrderedSet", and not "UnorderedSet/Set" (also, the first one is two characters less typing *g*) * opEqual should work between different types of Sets (or Maps, or sequences). Nothing wrong with comparing an ordered to an unordered one, or a list to an array.

http://www.dsource.org/projects/dcollections -Steve
Jan 31 2011
parent reply Simon Buerger <krox gmx.net> writes:
On 31.01.2011 17:53, Steven Schveighoffer wrote:
 http://www.dsource.org/projects/dcollections

 -Steve

Well, seems not bad on a quick look. But source is updated 2 years ago, so I doubt it would compile with current dmd. Anyway, the topic here is the std-behaviour of the std-lib. But sure, always nice to have custom alternatives. Krox
Jan 31 2011
next sibling parent David Nadlinger <see klickverbot.at> writes:
On 1/31/11 6:48 PM, Simon Buerger wrote:
 Well, seems not bad on a quick look. But source is updated 2 years ago,…

http://www.dsource.org/projects/dcollections/browser/branches/d2 – four months are not quite as long as two years. Oh, and Steve is the very author of dcollections. ;) David
Jan 31 2011
prev sibling parent Simon Buerger <krox gmx.net> writes:
Okay, my fault. Didnt realize you were the author, and the project is 
still active. The "2 years" came from here: 
http://www.dsource.org/projects/dcollections/browser/trunk/dcollections. 
I thought, that "trunk" was the most recent version. Added a bookmark, 
and will definitely take a closer look later. Thx for mentioning.


On 31.01.2011 19:09, Steven Schveighoffer wrote:
 On Mon, 31 Jan 2011 12:48:06 -0500, Simon Buerger <krox gmx.net> wrote:

 On 31.01.2011 17:53, Steven Schveighoffer wrote:
 http://www.dsource.org/projects/dcollections

 -Steve

Well, seems not bad on a quick look. But source is updated 2 years ago, so I doubt it would compile with current dmd. Anyway, the topic here is the std-behaviour of the std-lib. But sure, always nice to have custom alternatives.

latest meaningful change was 4 months ago: http://www.dsource.org/projects/dcollections/changeset/102 It should compile on the latest DMD (if not, file a ticket). BTW, it changes very little mostly because I haven't had many complaints about it (maybe not a good sign?) and I haven't had much opportunity to use it, as my day job does not allow using D unfortunately. It was proposed as a possibility for the std lib, but Andrei and I couldn't come to an agreement on what the collections should look like. Ironically, his latest decision moves std.container closer to dcollections in design. However, it should be relatively compatible with phobos' container lib (in fact RedBlackTree is a direct port of dcollections' version). -Steve

Jan 31 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 31 Jan 2011 12:48:06 -0500, Simon Buerger <krox gmx.net> wrote:

 On 31.01.2011 17:53, Steven Schveighoffer wrote:
 http://www.dsource.org/projects/dcollections

 -Steve

Well, seems not bad on a quick look. But source is updated 2 years ago, so I doubt it would compile with current dmd. Anyway, the topic here is the std-behaviour of the std-lib. But sure, always nice to have custom alternatives.

latest meaningful change was 4 months ago: http://www.dsource.org/projects/dcollections/changeset/102 It should compile on the latest DMD (if not, file a ticket). BTW, it changes very little mostly because I haven't had many complaints about it (maybe not a good sign?) and I haven't had much opportunity to use it, as my day job does not allow using D unfortunately. It was proposed as a possibility for the std lib, but Andrei and I couldn't come to an agreement on what the collections should look like. Ironically, his latest decision moves std.container closer to dcollections in design. However, it should be relatively compatible with phobos' container lib (in fact RedBlackTree is a direct port of dcollections' version). -Steve
Jan 31 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 31 Jan 2011 13:36:57 -0500, Simon Buerger <krox gmx.net> wrote:

 Okay, my fault. Didnt realize you were the author, and the project is  
 still active. The "2 years" came from here:  
 http://www.dsource.org/projects/dcollections/browser/trunk/dcollections.  
 I thought, that "trunk" was the most recent version. Added a bookmark,  
 and will definitely take a closer look later. Thx for mentioning.

Sorry about that, I have had at least 3 people be confused at why trunk wasn't the D2 version. I really need to swap those (make the 1.x version the branch). Unfortunately, it will screw up the D1 documentation, since it's auto-generated. But that should probably be moot since the D1 version isn't changing... -Steve
Jan 31 2011
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
In general I actually like this idea, but can we still have sealed 
ref-counted arrays and associative arrays somewhere in Phobos (not 
necessarily any more esoteric containers than these and not necessarily 
in std.container, maybe in a new module called std.refcounted or 
something)?  For huge arrays/AAs I think ref counting is an important 
optimization, especially with the GC in its current state but to a 
lesser extent even when we get a better GC.  GC tends to be slow 
compared to manual memory management in cases where most allocated 
blocks are large.

Also, I like reference counting for huge arrays/AAs enough that I 
submitted a ref counted hash table implementation to the Phobos list for 
review, which noone seemed to notice.  Please comment.

On 1/28/2011 1:31 PM, Andrei Alexandrescu wrote:
 Today after work I plan to start making one pass through std.container.
 After having thought of things for a long time, my conclusions are as
 follows:

 1. Containers will be classes.

 2. Most of the methods in existing containers will be final. It's up to
 the container to make a method final or not.

 3. Containers and their ranges decide whether they give away references
 to their objects. Sealing is a great idea but it makes everybody's life
 too complicated. I'll defer sealing to future improvements in the
 language and/or the reflection subsystem.

 4. Containers will assume that objects are cheap to copy so they won't
 worry about moving primitives.

 Any showstoppers, please share.


 Andrei

Jan 29 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/29/2011 12:54 PM, dsimcha wrote:
 Also, I like reference counting for huge arrays/AAs enough that I
 submitted a ref counted hash table implementation to the Phobos list for
 review, which noone seemed to notice. Please comment.

Where is it? Andrei
Jan 29 2011
parent reply dsimcha <dsimcha yahoo.com> writes:
http://www.dsource.org/projects/aa/browser/trunk/randAA/randaasealed.d

On 1/29/2011 2:18 PM, Andrei Alexandrescu wrote:
 On 01/29/2011 12:54 PM, dsimcha wrote:
 Also, I like reference counting for huge arrays/AAs enough that I
 submitted a ref counted hash table implementation to the Phobos list for
 review, which noone seemed to notice. Please comment.

Where is it? Andrei

Jan 29 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/29/2011 01:29 PM, dsimcha wrote:
 http://www.dsource.org/projects/aa/browser/trunk/randAA/randaasealed.d

 On 1/29/2011 2:18 PM, Andrei Alexandrescu wrote:
 On 01/29/2011 12:54 PM, dsimcha wrote:
 Also, I like reference counting for huge arrays/AAs enough that I
 submitted a ref counted hash table implementation to the Phobos list for
 review, which noone seemed to notice. Please comment.

Where is it? Andrei


Well to create a proposal you'd need to generate the documentation and put it somewhere and write a note to this group motivating the poposal and asking for a review manager. Andrei
Jan 29 2011
parent reply dsimcha <dsimcha yahoo.com> writes:
I've uploaded the documentation to 
http://cis.jhu.edu/~dsimcha/randaasealed.html and mentioned it again on 
the mailing list.  The documentation is pretty sparse because 
interface-wise it's just a standard hash table.  More generally, though, 
are we still interested in sealed/ref counted containers?

On 1/29/2011 2:52 PM, Andrei Alexandrescu wrote:
 On 01/29/2011 01:29 PM, dsimcha wrote:
 http://www.dsource.org/projects/aa/browser/trunk/randAA/randaasealed.d

 On 1/29/2011 2:18 PM, Andrei Alexandrescu wrote:
 On 01/29/2011 12:54 PM, dsimcha wrote:
 Also, I like reference counting for huge arrays/AAs enough that I
 submitted a ref counted hash table implementation to the Phobos list
 for
 review, which noone seemed to notice. Please comment.

Where is it? Andrei


Well to create a proposal you'd need to generate the documentation and put it somewhere and write a note to this group motivating the poposal and asking for a review manager. Andrei

Jan 29 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/29/11 3:36 PM, dsimcha wrote:
 I've uploaded the documentation to
 http://cis.jhu.edu/~dsimcha/randaasealed.html and mentioned it again on
 the mailing list. The documentation is pretty sparse because
 interface-wise it's just a standard hash table. More generally, though,
 are we still interested in sealed/ref counted containers?

Sorry for being slow in continuing this thread. Regarding the general issue that someone makes an informal proposal (either here, as a DIP, or on the Phobos mailing list), followed by a thundering silence: I believe that a good technique is to formalize the proposal review process, which has been a homerun for Boost. The disadvantage of that is that almost without exception this is very taxing to library submitters. This means the submitter must put a lot of thought and a lot of work into motivating, polishing, and documenting an artifact without any guarantee that it would lead to inclusion in the target library. I've seen very, VERY elaborate Boost submissions fail - literally months of work gone to waste. I'm not sure how to save people from doing work up front in hope of an uncertain outcome in the future. I do know what does _not_ work: the take it or leave it approach: "Hey, I have this code for abstraction XYZ that I extracted from a project of mine and I think it may be of general interest. It's at http://site.com/path/to/code.{d,html}. It needs polishing here and there, it's largely undocumented, but I'm sure the ideas shine through. Eh?" The doc at http://cis.jhu.edu/~dsimcha/randaasealed.html is somewhere in between. It is clear you have a good understanding of sealing and hash containers, but let me ask you this - if you wanted to sell this to someone, what would you do? Probably you'd show some relevant benchmarks putting the built-in hashes to shame. Maybe you'd have some good examples - yes, we know it's a hash, but it doesn't hurt to see some code pulp over there. Maybe you'd explain sealing and discuss its relative advantages and disadvantages (which have not yet been documented anywhere - a great opportunity). Maybe you'd even show some numbers showing how sealing does as well/better/worse than reference leaking. This is a good amount of upfront work for little promise. Again, I don't know yet how to optimize for minimizing that. What I did see works on Boost is a request for interest in the form of a discussion (usually with NO source, only USAGE examples) asking if there are people who are interested in such a notion. In this particular case, you'd need numbers to make a strong case which means that code must be already written. For something like e.g. "how about a Finite State Automaton library?" perhaps upfront code wouldn't be necessary for gauging interest. Andrei
Feb 01 2011
next sibling parent dsimcha <dsimcha yahoo.com> writes:
I see your point.  Some of the stuff I've posted lately (RandAA, 
CsvRange) is in a usable but admittedly rough state, and was written 
partially for myself and partially for Phobos.

If this came off as take-it-or-leave-it, this was unintended. These 
posts were supposed to be informal requests for comment.  My main goal 
in these posts was to gauge whether there's enough interest, whether the 
design was good enough that it's worth polishing up for Phobos, and 
whether I missed any key features.  If there was sufficient interest I 
had every intention of polishing/fleshing out these libraries.

On 2/1/2011 11:00 AM, Andrei Alexandrescu wrote:
 On 1/29/11 3:36 PM, dsimcha wrote:
 I've uploaded the documentation to
 http://cis.jhu.edu/~dsimcha/randaasealed.html and mentioned it again on
 the mailing list. The documentation is pretty sparse because
 interface-wise it's just a standard hash table. More generally, though,
 are we still interested in sealed/ref counted containers?

Sorry for being slow in continuing this thread. Regarding the general issue that someone makes an informal proposal (either here, as a DIP, or on the Phobos mailing list), followed by a thundering silence: I believe that a good technique is to formalize the proposal review process, which has been a homerun for Boost. The disadvantage of that is that almost without exception this is very taxing to library submitters. This means the submitter must put a lot of thought and a lot of work into motivating, polishing, and documenting an artifact without any guarantee that it would lead to inclusion in the target library. I've seen very, VERY elaborate Boost submissions fail - literally months of work gone to waste. I'm not sure how to save people from doing work up front in hope of an uncertain outcome in the future. I do know what does _not_ work: the take it or leave it approach: "Hey, I have this code for abstraction XYZ that I extracted from a project of mine and I think it may be of general interest. It's at http://site.com/path/to/code.{d,html}. It needs polishing here and there, it's largely undocumented, but I'm sure the ideas shine through. Eh?" The doc at http://cis.jhu.edu/~dsimcha/randaasealed.html is somewhere in between. It is clear you have a good understanding of sealing and hash containers, but let me ask you this - if you wanted to sell this to someone, what would you do? Probably you'd show some relevant benchmarks putting the built-in hashes to shame. Maybe you'd have some good examples - yes, we know it's a hash, but it doesn't hurt to see some code pulp over there. Maybe you'd explain sealing and discuss its relative advantages and disadvantages (which have not yet been documented anywhere - a great opportunity). Maybe you'd even show some numbers showing how sealing does as well/better/worse than reference leaking. This is a good amount of upfront work for little promise. Again, I don't know yet how to optimize for minimizing that. What I did see works on Boost is a request for interest in the form of a discussion (usually with NO source, only USAGE examples) asking if there are people who are interested in such a notion. In this particular case, you'd need numbers to make a strong case which means that code must be already written. For something like e.g. "how about a Finite State Automaton library?" perhaps upfront code wouldn't be necessary for gauging interest. Andrei

Feb 02 2011
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
BTW, I thought the case for a sealed hash table with an optimized memory 
allocation scheme was self-evident to most heavy D users, given how the 
builtin hash table leaks memory/fragments the heap (I think it's a 
little of both) and destroys multithreaded performance.

On 2/1/2011 11:00 AM, Andrei Alexandrescu wrote:
 On 1/29/11 3:36 PM, dsimcha wrote:
 I've uploaded the documentation to
 http://cis.jhu.edu/~dsimcha/randaasealed.html and mentioned it again on
 the mailing list. The documentation is pretty sparse because
 interface-wise it's just a standard hash table. More generally, though,
 are we still interested in sealed/ref counted containers?

Sorry for being slow in continuing this thread. Regarding the general issue that someone makes an informal proposal (either here, as a DIP, or on the Phobos mailing list), followed by a thundering silence: I believe that a good technique is to formalize the proposal review process, which has been a homerun for Boost. The disadvantage of that is that almost without exception this is very taxing to library submitters. This means the submitter must put a lot of thought and a lot of work into motivating, polishing, and documenting an artifact without any guarantee that it would lead to inclusion in the target library. I've seen very, VERY elaborate Boost submissions fail - literally months of work gone to waste. I'm not sure how to save people from doing work up front in hope of an uncertain outcome in the future. I do know what does _not_ work: the take it or leave it approach: "Hey, I have this code for abstraction XYZ that I extracted from a project of mine and I think it may be of general interest. It's at http://site.com/path/to/code.{d,html}. It needs polishing here and there, it's largely undocumented, but I'm sure the ideas shine through. Eh?" The doc at http://cis.jhu.edu/~dsimcha/randaasealed.html is somewhere in between. It is clear you have a good understanding of sealing and hash containers, but let me ask you this - if you wanted to sell this to someone, what would you do? Probably you'd show some relevant benchmarks putting the built-in hashes to shame. Maybe you'd have some good examples - yes, we know it's a hash, but it doesn't hurt to see some code pulp over there. Maybe you'd explain sealing and discuss its relative advantages and disadvantages (which have not yet been documented anywhere - a great opportunity). Maybe you'd even show some numbers showing how sealing does as well/better/worse than reference leaking. This is a good amount of upfront work for little promise. Again, I don't know yet how to optimize for minimizing that. What I did see works on Boost is a request for interest in the form of a discussion (usually with NO source, only USAGE examples) asking if there are people who are interested in such a notion. In this particular case, you'd need numbers to make a strong case which means that code must be already written. For something like e.g. "how about a Finite State Automaton library?" perhaps upfront code wouldn't be necessary for gauging interest. Andrei

Feb 02 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/2/11 8:56 AM, dsimcha wrote:
 BTW, I thought the case for a sealed hash table with an optimized memory
 allocation scheme was self-evident to most heavy D users, given how the
 builtin hash table leaks memory/fragments the heap (I think it's a
 little of both) and destroys multithreaded performance.

Perhaps a very good argument could be made with some benchmarks that show how your table is better than others in casual use and in heavy use. Andrei
Feb 02 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 28 January 2011 10:31:58 Andrei Alexandrescu wrote:
 Today after work I plan to start making one pass through std.container.
 After having thought of things for a long time, my conclusions are as
 follows:
 
 1. Containers will be classes.
 
 2. Most of the methods in existing containers will be final. It's up to
 the container to make a method final or not.
 
 3. Containers and their ranges decide whether they give away references
 to their objects. Sealing is a great idea but it makes everybody's life
 too complicated. I'll defer sealing to future improvements in the
 language and/or the reflection subsystem.
 
 4. Containers will assume that objects are cheap to copy so they won't
 worry about moving primitives.
 
 Any showstoppers, please share.

Overall, I think definitely support this approach. I definitely think that containers should be reference types, and I'm not sure that I care much whether they end up being structs or classes. My one concern would be the ability to inline the container's function calls, and marking them as final fixes that problem. I'm not sure that I know enough about the sealing issue to comment on it, but if what we're doing doesn't preclude having it later, then that sounds like it's a good compromise. And I definitely think that make it so that containers don't have to worry about moving primitives is the way to go. So, I'm essentially 100% behind this, and I don't see any real issues with it. Containers is one of the places that Phobos is generally behind on and which really needs to be taken care if we really want people to be using D and Phobos. So, it's great to see that progress is being made with regards to containers. - Jonathan M Davis
Jan 29 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 02/01/2011 05:00 PM, Andrei Alexandrescu wrote:
 Regarding the general issue that someone makes an informal proposal (either
 here, as a DIP, or on the Phobos mailing list), followed by a thundering
 silence: I believe that a good technique is to formalize the proposal review
 process, which has been a homerun for Boost. The disadvantage of that is that
 almost without exception this is very taxing to library submitters. This means
 the submitter must put a lot of thought and a lot of work into motivating,
 polishing, and documenting an artifact without any guarantee that it would lead
 to inclusion in the target library. I've seen very, VERY elaborate Boost
 submissions fail - literally months of work gone to waste.

An alternative, or a complementary approach, may be to delegate part of your responsability. In this case, I'm thinking at a pool of people which "mission" would be to obviously show interest (or lack of) for proposals made on the mailing list --whatever their advancement, formality, code quality... This would provide a valuable indicator while removing some load from your shoulders, I guess. Such people may be chosen by cooptation. Note this approach is not exclusive of formal & heavy adoption process like Boost's; instead it can be a complementary or preliminary way of judging interest for proposals. A similar principle may indeed be used for other purpose: specification & evolution of D-the-language, implementation & bug removal of the reference compiler,... Denis -- _________________ vita es estrany spir.wikidot.com
Feb 01 2011