digitalmars.D - std.container and classes

Jonathan M Davis (15/15) Dec 13 2011 Is the plan for std.container still to have all of its containers be fin...

Andrei Alexandrescu (50/65) Dec 17 2011 Apologies for being slow on this. It may be a fateful time to discuss

Jesse Phillips (23/35) Dec 17 2011 This is an interesting reasoning to go with class. Which is similar to

Peter Alexander (4/10) Dec 17 2011 Allocators are very commonly used in C++, although people rarely use

Peter Alexander (10/26) Dec 17 2011 My thoughts on this:
Vladimir Panteleev (4/10) Dec 17 2011 Since containers are templated, why not use asserts for this, and
Jonathan M Davis (36/114) Dec 17 2011 The only reason that I can think of to use a reference-counted struct in...

Andrei Alexandrescu (6/13) Dec 17 2011 Being on the heap is not the main issue. The main issue is the data

Jonathan M Davis (37/54) Dec 17 2011 My initial take on this is that the primary difference between a final c...

Andrei Alexandrescu (7/45) Dec 18 2011 If this is a new notion, it might take a while for all of its aspects to...

Jonathan M Davis (36/41) Dec 18 2011 After thinking about this a bit, I don't think that custom allocators ar...

Jacob Carlborg (6/47) Dec 18 2011 The Tango runtime has added a new method in Object, "dispose". This

Jonathan M Davis (53/56) Dec 20 2011 That sounds a lot like using clear, though clear doesn't free memory unl...

foobar (17/86) Dec 20 2011 I disagree with the above conclusion. you conflate two issues

Jonathan M Davis (9/27) Dec 20 2011 And if they're classes and not managed by the GC nor in a struct which m...

foobar (9/18) Dec 21 2011 Memory management should be the responsibility of the allocator

Andrei Alexandrescu (11/19) Dec 21 2011 (Please don't overquote. It took me a million years to scroll all the

foobar (7/18) Dec 21 2011 Could you give an example to illustrate this design difference?

Jacob Carlborg (11/67) Dec 20 2011 With "dispose" it would give you the option to control the lifetime of

Jonathan M Davis (20/26) Dec 20 2011 Well, both delete and scope are going away, so std.typecons.scoped would...

Jacob Carlborg (7/33) Dec 21 2011 I was thinking if there were good uses for "delete" and "scope" they

Caligo (8/10) Dec 20 2011 Two questions:

Andrei Alexandrescu (9/17) Dec 20 2011 Good containers are hard to define. It took the C++ community ten years

so (3/4) Dec 21 2011 Name please?

Froglegs (3/7) Dec 21 2011 I believe he is referring to

so (4/13) Dec 21 2011 Thank you Mr Frog.

Steven Schveighoffer (11/21) Dec 19 2011 This whole thread of discussion is somewhat more complicated than it has...

Jerry (16/130) Dec 20 2011 One advantage of a struct container is that when you make it a member of...

Froglegs (8/8) Dec 20 2011 I don't really think ref counted struct vs class is fair,

foobar (16/102) Dec 20 2011 My thoughts are:

Jonathan M Davis (32/46) Dec 20 2011 Containers are fairly straightforward in general. So, if you need to imp...

foobar (10/24) Dec 20 2011 [snip]

Jonathan M Davis <jmdavisProg gmx.com> writes:

Is the plan for std.container still to have all of its containers be final 
classes (classes so that they're reference types and final so that their 
functions are inlinable)? Or has that changed? I believe that Andrei said 
something recently about discussing reference counting and containers with 
Walter.

The reason that I bring this up is that Array and SList are still structs, and 
the longer that they're structs, the more code that will break when they get 
changed to classes. Granted, some level of code breakage may occur when we add 
custom allocators to them, but since that would probably only affect the 
constructor (and preferably wouldn't affect anything if you want to simply 
create a container with the GC heap as you would now were Array and SList 
classes), the breakage for that should be minimal.

Is there any reason for me to not just go and make Array and SList final 
classes and create a pull request for it?

- Jonathan M Davis

Dec 13 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/13/11 9:08 PM, Jonathan M Davis wrote:
 Is the plan for std.container still to have all of its containers be final
 classes (classes so that they're reference types and final so that their
 functions are inlinable)? Or has that changed? I believe that Andrei said
 something recently about discussing reference counting and containers with
 Walter.

 The reason that I bring this up is that Array and SList are still structs, and
 the longer that they're structs, the more code that will break when they get
 changed to classes. Granted, some level of code breakage may occur when we add
 custom allocators to them, but since that would probably only affect the
 constructor (and preferably wouldn't affect anything if you want to simply
 create a container with the GC heap as you would now were Array and SList
 classes), the breakage for that should be minimal.

 Is there any reason for me to not just go and make Array and SList final
 classes and create a pull request for it?

 - Jonathan M Davis

Apologies for being slow on this. It may be a fateful time to discuss 
that right now, after all the discussion of what's appropriate for 
stdlib vs. application code etc.

As some of you know, Walter and I went back and forth several times on 
this. First, there was the issue of making containers value types vs. 
reference types. Making containers value types would be in keep with the 
STL approach. However, Walter noted that copying entire containers by 
default is most often NOT desirable and there's significant care and 
adornments in C++ programs to make sure that that default behavior is 
avoided (e.g. adding const& to function parameters).

So we decided to make containers reference types, and that seemed to be 
a good choice.

The second decision is classes vs. structs. Walter correctly pointed out 
that the obvious choice for defining a reference type in D - whether the 
type is momonorphic or polymorphic - is making it a class. If containers 
aren't classes, the reasoning went, it means we took a wrong step 
somewhere; it might mean our flagship abstraction for reference types is 
not suitable for, well, defining a reference type.

Fast forward a couple of months, a few unslept nights, and a bunch of 
newsgroup and IRC conversations. Several additional pieces came together.

The most important thing I noticed is that people expect standard 
containers to have sophisticated memory management. Many ask not about 
containers as much as "containers with custom allocators". Second, 
containers tend to be large memory users by definition. Third, 
containers are self-contained (heh) and relatively simple in terms of 
what they model, meaning that they _never_ suffer from circular 
references, like general entity types might.

All of these arguments very strongly suggest that many want containers 
to be types with deterministic control over memory and accept 
configurable allocation strategies (regions, heap, malloc, custom). So 
that would mean containers should be reference counted structs.

This cycle of thought has happened twice, and the evidence coming the 
second time has been stronger. The first time around I went about and 
started implementing std.container with reference counting in mind. The 
code is not easy to write, and is not to be recommended for most types, 
hence my thinking (at the end of the first cycle) that we should switch 
to class containers. One fear I have is that people would be curious, 
look at the implementation of std.container, and be like "so am I 
expected to do all this to define a robust type"? I start to think that 
the right answer to that is to improve library support for good 
reference counted types, and define reference counted struct containers 
that are deterministic.

Safety is also an issue. I was hoping I'd provide safety as a policy, 
e.g. one may choose for a given container whether they want safe or not 
(and presumably fast). I think it's best to postpone that policy and 
focus for now on defining safe containers with safe ranges. This 
precludes e.g. using T[] as a range for Array!T.

Please discuss.


Andrei

Dec 17 2011

Jesse Phillips <jessekphillips+d gmail.com> writes:

On Sat, 17 Dec 2011 17:31:46 -0600, Andrei Alexandrescu wrote:

 The second decision is classes vs. structs. Walter correctly pointed out
 that the obvious choice for defining a reference type in D - whether the
 type is momonorphic or polymorphic - is making it a class. If containers
 aren't classes, the reasoning went, it means we took a wrong step
 somewhere; it might mean our flagship abstraction for reference types is
 not suitable for, well, defining a reference type.

This is an interesting reasoning to go with class. Which is similar to 
what you end up saying here:

 One fear I have is that people would be curious,
 look at the implementation of std.container, and be like "so am I
 expected to do all this to define a robust type"? I start to think that
 the right answer to that is to improve library support for good
 reference counted types, and define reference counted struct containers
 that are deterministic.

I'm not really sure the answer, but here are some thoughts.

There is interest in having reference counting, and making it easy. There 
was a question on SO about how to make a reference counted object. I gave 
it my best shot answering having never done it myself. It isn't very 
straight forward and I think people will expect to use it similar to 
auto_prt. And considering how similar each wrapped object is, I think D 
can pull it off.

http://stackoverflow.com/questions/4632355/making-a-reference-counted-
object-in-d-using-refcountedt

So building a reference counted container type probably shouldn't be a 
challenge in D (or I should say it should be a goal to make it not a 
challenge). For example, it might be as simple as, a build a value 
container which you then expose wrapped with RefCounted!T

alias RefCounted!ContainerImp Container;

But as to whether this should be what is implemented in the standard 
library, I don't know. You make mention of custom allocators and such. Is 
this interest going to be of benefit, or is it just something people are 
use to having from C++? If it makes sense to have such then the 
containers should support it. Don't classes allow for custom allocators? 
Is there a reason that can't be used? Do we need to improve on that?

Dec 17 2011

Peter Alexander <peter.alexander.au gmail.com> writes:

On 18/12/11 12:06 AM, Jesse Phillips wrote:
 But as to whether this should be what is implemented in the standard
 library, I don't know. You make mention of custom allocators and such. Is
 this interest going to be of benefit, or is it just something people are
 use to having from C++? If it makes sense to have such then the
 containers should support it. Don't classes allow for custom allocators?
 Is there a reason that can't be used? Do we need to improve on that?

Allocators are very commonly used in C++, although people rarely use 
them as they are implemented in the standard library because they were 
poorly designed. We can't repeat the same mistakes.

Dec 17 2011

Peter Alexander <peter.alexander.au gmail.com> writes:

On 17/12/11 11:31 PM, Andrei Alexandrescu wrote:
 The most important thing I noticed is that people expect standard
 containers to have sophisticated memory management. Many ask not about
 containers as much as "containers with custom allocators". Second,
 containers tend to be large memory users by definition. Third,
 containers are self-contained (heh) and relatively simple in terms of
 what they model, meaning that they _never_ suffer from circular
 references, like general entity types might.

 All of these arguments very strongly suggest that many want containers
 to be types with deterministic control over memory and accept
 configurable allocation strategies (regions, heap, malloc, custom). So
 that would mean containers should be reference counted structs.

My thoughts on this:

Allocators are a must, and need to handled better than they were in C++.

Classes are simpler than ref-counted structs, and ref-counting is a 
massive performance drain and code-bloater anyway, so just go with final 
classes.


 Safety is also an issue. I was hoping I'd provide safety as a policy,
 e.g. one may choose for a given container whether they want safe or not
 (and presumably fast). I think it's best to postpone that policy and
 focus for now on defining safe containers with safe ranges. This
 precludes e.g. using T[] as a range for Array!T.

IMO they should just be safe and not provide the option of non-safe. It 
just complicates things to cater to people that won't use them anyway 
C++ standard library containers are unsafe and relatively lean, yet are 
rarely used when performance really matters.

Dec 17 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Saturday, 17 December 2011 at 23:31:47 UTC, Andrei 
Alexandrescu wrote:
 Safety is also an issue. I was hoping I'd provide safety as a 
 policy, e.g. one may choose for a given container whether they 
 want safe or not (and presumably fast). I think it's best to 
 postpone that policy and focus for now on defining safe 
 containers with safe ranges. This precludes e.g. using T[] as a 
 range for Array!T.

Since containers are templated, why not use asserts for this, and 
let the user choose with -release (just as with built-in arrays)?

Dec 17 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, December 17, 2011 17:31:46 Andrei Alexandrescu wrote:
 On 12/13/11 9:08 PM, Jonathan M Davis wrote:
 Is the plan for std.container still to have all of its containers be
 final classes (classes so that they're reference types and final so
 that their functions are inlinable)? Or has that changed? I believe
 that Andrei said something recently about discussing reference counting
 and containers with Walter.
 
 The reason that I bring this up is that Array and SList are still
 structs, and the longer that they're structs, the more code that will
 break when they get changed to classes. Granted, some level of code
 breakage may occur when we add custom allocators to them, but since
 that would probably only affect the constructor (and preferably
 wouldn't affect anything if you want to simply create a container with
 the GC heap as you would now were Array and SList classes), the
 breakage for that should be minimal.
 
 Is there any reason for me to not just go and make Array and SList final
 classes and create a pull request for it?
 
 - Jonathan M Davis

 
 Apologies for being slow on this. It may be a fateful time to discuss
 that right now, after all the discussion of what's appropriate for
 stdlib vs. application code etc.
 
 As some of you know, Walter and I went back and forth several times on
 this. First, there was the issue of making containers value types vs.
 reference types. Making containers value types would be in keep with the
 STL approach. However, Walter noted that copying entire containers by
 default is most often NOT desirable and there's significant care and
 adornments in C++ programs to make sure that that default behavior is
 avoided (e.g. adding const& to function parameters).
 
 So we decided to make containers reference types, and that seemed to be
 a good choice.
 
 The second decision is classes vs. structs. Walter correctly pointed out
 that the obvious choice for defining a reference type in D - whether the
 type is momonorphic or polymorphic - is making it a class. If containers
 aren't classes, the reasoning went, it means we took a wrong step
 somewhere; it might mean our flagship abstraction for reference types is
 not suitable for, well, defining a reference type.
 
 Fast forward a couple of months, a few unslept nights, and a bunch of
 newsgroup and IRC conversations. Several additional pieces came together.
 
 The most important thing I noticed is that people expect standard
 containers to have sophisticated memory management. Many ask not about
 containers as much as "containers with custom allocators". Second,
 containers tend to be large memory users by definition. Third,
 containers are self-contained (heh) and relatively simple in terms of
 what they model, meaning that they _never_ suffer from circular
 references, like general entity types might.
 
 All of these arguments very strongly suggest that many want containers
 to be types with deterministic control over memory and accept
 configurable allocation strategies (regions, heap, malloc, custom). So
 that would mean containers should be reference counted structs.
 
 This cycle of thought has happened twice, and the evidence coming the
 second time has been stronger. The first time around I went about and
 started implementing std.container with reference counting in mind. The
 code is not easy to write, and is not to be recommended for most types,
 hence my thinking (at the end of the first cycle) that we should switch
 to class containers. One fear I have is that people would be curious,
 look at the implementation of std.container, and be like "so am I
 expected to do all this to define a robust type"? I start to think that
 the right answer to that is to improve library support for good
 reference counted types, and define reference counted struct containers
 that are deterministic.
 
 Safety is also an issue. I was hoping I'd provide safety as a policy,
 e.g. one may choose for a given container whether they want safe or not
 (and presumably fast). I think it's best to postpone that policy and
 focus for now on defining safe containers with safe ranges. This
 precludes e.g. using T[] as a range for Array!T.
 
 Please discuss.

The only reason that I can think of to use a reference-counted struct instead 
of a class is becuse then it's easier to avoid the GC heap entirely.  Almost 
all of a container's memory is going to end up on the heap regardless, because 
the elements almost never end up in the container itself. They're in a dynamic 
array or in nodes or something similar. So, whether the container is ref-
counted or a class is almost irrelevant. If anything making it a class makes 
more sense, because then it doesn't have to worry about the extra cost of ref-
counting. However, once we bring allocators into the picture, the situation 
changes slightly. In both cases, the elements themselves end up where the 
allocator puts them. However, in the case of a struct, the member variables 
themselves end up on the stack, whereas with the class, they end up on the GC 
heap unless the programmer puts the object somewhere else with emplace or 
possibly with a function on the allocator.

The other concern is that with a class, it's immediately clear that you're 
dealing with a reference, but that's much easier to miss with a struct when 
structs are almost always value types.

Also, some containers can't be an a usable state from their init value (e.g. 
RedBlackTree), so they're more awkward as structs.  disable might fix that 
problem, but I don't know. It would be also be awkward to not be able to just 
declare a RedBlackTree without immediately initializing it.

So, in general, I think that final classes make more sense. The ref-counted 
struct is just trying to do what a class does already, and the class doesn't 
have the extra overhead of the ref-counting. The only concern is the fact that 
even when using a custom allocator, the few member variables that the 
container has end up on the GC heap unless the programmer goes to extra effort 
to avoid it, and part of the point of using the custom allocatorc is 
specifically to avoid the GC heap. The allocator API could make that easy to 
deal with though by having a function which takes the type and the arguments 
for the constructor and constructs it for you. e.g.

auto arr = custom.alloc!Array(4, 5, 7);

So, I favor final classes. I just don't see much benefit in ref-counted
structs. 
They're more confusing and generally harder to deal with, with the one 
exception being avoiding the GC heap for the member variables of the 
container.

- Jonathan M Davis

Dec 17 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/17/11 7:52 PM, Jonathan M Davis wrote:
 The only reason that I can think of to use a reference-counted struct instead
 of a class is becuse then it's easier to avoid the GC heap entirely.  Almost
 all of a container's memory is going to end up on the heap regardless, because
 the elements almost never end up in the container itself.

Being on the heap is not the main issue. The main issue is the data 
becoming garbage once all references are gone.

 They're in a dynamic
 array or in nodes or something similar. So, whether the container is ref-
 counted or a class is almost irrelevant.

I think this argument is starting off the wrong premise, and is already 
wrong by this point, so I skipped the rest. Am I right?


Andrei

Dec 17 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Sunday, December 18, 2011 00:15:40 Andrei Alexandrescu wrote:
 On 12/17/11 7:52 PM, Jonathan M Davis wrote:
 The only reason that I can think of to use a reference-counted struct
 instead of a class is becuse then it's easier to avoid the GC heap
 entirely. Almost all of a container's memory is going to end up on the
 heap regardless, because the elements almost never end up in the
 container itself.

 
 Being on the heap is not the main issue. The main issue is the data
 becoming garbage once all references are gone.
 
 They're in a dynamic
 array or in nodes or something similar. So, whether the container is
 ref-
 counted or a class is almost irrelevant.

 
 I think this argument is starting off the wrong premise, and is already
 wrong by this point, so I skipped the rest. Am I right?

My initial take on this is that the primary difference between a final class
and 
a ref-counted struct is the fact that the class is on the heap and needs no 
ref-counting, whereas the struct is no the stack and has to do ref-counting, 
and everything else is the same regardless, since the memory for the container 
has to go on the heap regardless (be it the GC heap or wherever the allocator 
puts it).

But what you're bringing up is that in the case of the class, everything in 
the container stays around until it's garbage collected, whereas in the case 
of the struct, it goes away as soon as the ref-count hits zero? I hadn't 
thought of that.

But if we're talking about the GC heap, since most of the internals are going 
to be on the heap in either case, with the only difference being the container 
itself and its value type member variables. However, once you use a custom 
allocator (say one that uses malloc and free), then in the struct case, 
everything gets cleaned up as soon as the ref-count hits zero, whereas with 
the class it sits around until the container itself is collected - assuming 
that the class itself is on the GC heap. If it was using the malloc-free 
allocator, then it would be around until it was manually freed by the 
allocator.

I don't know. That is certainly food for thought. My natural inclinition is to 
just make it a class, since it's a reference type, and then you don't have to 
worry about the cost of ref-counting or have any confusion over whether the 
container is a reference type or not. But if there's any real performance 
advantage in using ref-counted structs due to something to do with memory 
management, then that would be highly valuable.

If no custom allocators are used, then I'd expect the struct to be worse off, 
since it has the cost of being copied as well as the cost of ref-counting, 
whereas the only cost in passing the class around is copying it's reference, 
and other than the few member variables that the container has, there's no 
difference in how long the memory in the container sticks around. It has to 
wait for the GC in either case. It's only when a custom allocator which could 
free the memory as soon as the ref-count hits zero comes into play that it 
makes a difference, in which case the struct would probably be more efficient. 
Unless I'm missing something?

It certainly doesn't seem like an easy call.

- Jonathan M Davis

Dec 17 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/18/11 1:42 AM, Jonathan M Davis wrote:
 On Sunday, December 18, 2011 00:15:40 Andrei Alexandrescu wrote:
 On 12/17/11 7:52 PM, Jonathan M Davis wrote:
 The only reason that I can think of to use a reference-counted struct
 instead of a class is becuse then it's easier to avoid the GC heap
 entirely. Almost all of a container's memory is going to end up on the
 heap regardless, because the elements almost never end up in the
 container itself.

 Being on the heap is not the main issue. The main issue is the data
 becoming garbage once all references are gone.

 They're in a dynamic
 array or in nodes or something similar. So, whether the container is
 ref-
 counted or a class is almost irrelevant.

 I think this argument is starting off the wrong premise, and is already
 wrong by this point, so I skipped the rest. Am I right?

 My initial take on this is that the primary difference between a final class
and
 a ref-counted struct is the fact that the class is on the heap and needs no
 ref-counting, whereas the struct is no the stack and has to do ref-counting,
 and everything else is the same regardless, since the memory for the container
 has to go on the heap regardless (be it the GC heap or wherever the allocator
 puts it).

 But what you're bringing up is that in the case of the class, everything in
 the container stays around until it's garbage collected, whereas in the case
 of the struct, it goes away as soon as the ref-count hits zero? I hadn't
 thought of that.

 But if we're talking about the GC heap, since most of the internals are going
 to be on the heap in either case, with the only difference being the container
 itself and its value type member variables. However, once you use a custom
 allocator (say one that uses malloc and free), then in the struct case,
 everything gets cleaned up as soon as the ref-count hits zero, whereas with
 the class it sits around until the container itself is collected - assuming
 that the class itself is on the GC heap. If it was using the malloc-free
 allocator, then it would be around until it was manually freed by the
 allocator.

 I don't know. That is certainly food for thought.

If this is a new notion, it might take a while for all of its aspects to 
sink in. I'm not saying that to be smug - reference counting vs. other 
methods of collecting garbage is definitely a difficult topic on today's 
architectures, and I can say at least it took me quite a while to build 
some reasonable mental model of the related tradeoffs.


Andrei

Dec 18 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, December 18, 2011 02:46:01 Andrei Alexandrescu wrote:
 If this is a new notion, it might take a while for all of its aspects to
 sink in. I'm not saying that to be smug - reference counting vs. other
 methods of collecting garbage is definitely a difficult topic on today's
 architectures, and I can say at least it took me quite a while to build
 some reasonable mental model of the related tradeoffs.

After thinking about this a bit, I don't think that custom allocators are 
really going to work with classes very well. Unless you allocate the entire 
class with the custom allocator (as opposed to just telling the container to 
use the custom allocator), the memory won't be cleaned up until the object is 
collected by the GC (even though it's using a custom allocator to allocate its 
internals), in which case, what does the custom allocator really buy us? So, 
the idea of just passing the allocator to the container doesn't seem to cut it 
for a class. The object itself needs to be allocated with the custom 
allocator. On top of that, if you're allocating the container with a custom 
allocator, doesn't that risk issues like what the to-be-deprecated scope 
modifier causes? You practically end up having to wrap the class in a struct 
anyway, in which case, what's the point of using a class?

Just in general, it's much harder to control the memory if the container is a 
class rather than a struct. With a struct, you can do pretty much whatever you 
want with the memory. Its lifetime is strictly controlled, which gives much 
greater opportunites to optimize its memory usage. With a class, it's going to 
be difficult to completely divorce the container from the GC in a clean manner.

With a struct, we could easily make the default not use the GC at all, which 
would make the default use of the containers much more efficient. You also 
wouldn't have to worry about creating the container itself with an allocator 
and all of the potential issues that that brings.

I don't particularly like the idea of the cost of passing the container around 
as a struct (since it's few member variables will have to be memcopied and the 
ref-counting will have to be done), but that's not a huge cost (though it _is_ 
a pervasive one), and I'm begginning to think that it's a necessary one if we 
want containers to be able to properly and efficiently manage their own memory. 
Depending on the use case, the benefits from the containers being able to fully 
manage their own memory without the programmer having to do anything special 
could easily outweigh the additional cost of passing a struct around. And if 
ranges over such containers are passed around more frequently than the 
containers themselves (as I would expect), then the minor cost of passing the 
container around is even less of a big deal, since it would primarily be the 
ranges being passed around.

So, I'm beginning to think that we're going to have to go the struct route.

- Jonathan M Davis

Dec 18 2011

Jacob Carlborg <doob me.com> writes:

On 2011-12-19 05:12, Jonathan M Davis wrote:
 On Sunday, December 18, 2011 02:46:01 Andrei Alexandrescu wrote:
 If this is a new notion, it might take a while for all of its aspects to
 sink in. I'm not saying that to be smug - reference counting vs. other
 methods of collecting garbage is definitely a difficult topic on today's
 architectures, and I can say at least it took me quite a while to build
 some reasonable mental model of the related tradeoffs.

 After thinking about this a bit, I don't think that custom allocators are
 really going to work with classes very well. Unless you allocate the entire
 class with the custom allocator (as opposed to just telling the container to
 use the custom allocator), the memory won't be cleaned up until the object is
 collected by the GC (even though it's using a custom allocator to allocate its
 internals), in which case, what does the custom allocator really buy us? So,
 the idea of just passing the allocator to the container doesn't seem to cut it
 for a class. The object itself needs to be allocated with the custom
 allocator. On top of that, if you're allocating the container with a custom
 allocator, doesn't that risk issues like what the to-be-deprecated scope
 modifier causes? You practically end up having to wrap the class in a struct
 anyway, in which case, what's the point of using a class?

 Just in general, it's much harder to control the memory if the container is a
 class rather than a struct. With a struct, you can do pretty much whatever you
 want with the memory. Its lifetime is strictly controlled, which gives much
 greater opportunites to optimize its memory usage. With a class, it's going to
 be difficult to completely divorce the container from the GC in a clean manner.

 With a struct, we could easily make the default not use the GC at all, which
 would make the default use of the containers much more efficient. You also
 wouldn't have to worry about creating the container itself with an allocator
 and all of the potential issues that that brings.

 I don't particularly like the idea of the cost of passing the container around
 as a struct (since it's few member variables will have to be memcopied and the
 ref-counting will have to be done), but that's not a huge cost (though it _is_
 a pervasive one), and I'm begginning to think that it's a necessary one if we
 want containers to be able to properly and efficiently manage their own memory.
 Depending on the use case, the benefits from the containers being able to fully
 manage their own memory without the programmer having to do anything special
 could easily outweigh the additional cost of passing a struct around. And if
 ranges over such containers are passed around more frequently than the
 containers themselves (as I would expect), then the minor cost of passing the
 container around is even less of a big deal, since it would primarily be the
 ranges being passed around.

 So, I'm beginning to think that we're going to have to go the struct route.

 - Jonathan M Davis

The Tango runtime has added a new method in Object, "dispose". This 
method is called when "scope" is used or "delete" is explicitly called. 
I don't know if that would help.

-- 
/Jacob Carlborg

Dec 18 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:
 The Tango runtime has added a new method in Object, "dispose". This
 method is called when "scope" is used or "delete" is explicitly called.
 I don't know if that would help.

That sounds a lot like using clear, though clear doesn't free memory unless 
the finalizer does (and I'm not sure that managing memory with finalizers 
actually works right now; there were issues with that previously - but it 
might have only been GC memory, so perhaps malloc and free would still work). 
The issue with them is escaping references. You can easily end up with 
references to data which has been destroyed. It also makes them harder to pass 
around unless you essentially wrap the container in ref-counted struct.

The lifetime of the container must be managed if you really want to be able to 
use custom allocators or have the container be as efficient as possible with 
regards to its memory usage in the general case. With a struct, it manages 
itself. It can do whatever it wants to with memory internally, and the 
combination of its constructor, postblit constructor, and destructor allows it 
to take care of it itself and clean up its memory when it's destroyed. You 
also don't have issues of stuff referring to a container which doesn't exist 
anymore (unless you add pointers to the container into the mix and then move 
or destroy the container).

With classes, if they're owned by the GC, then it's the GC that manages their 
lifetime. It destroys them when they're no longer referenced and it runs a 
collection cycle. You have no control over its lifetime, and even if it tries 
to manage its memory internally on some level, that memory won't be cleaned up 
until the GC collects the container itself. So, if you want to manage the 
lifetime of the class, you can't use a naked reference anymore. You need a way 
to get the GC to collect it, or you need to have it somewhere else other than 
the GC heap.

If you use something like Scoped, then its lifetime will be tied to the scope 
that it's in, but you risk references to it escaping, and limiting the 
container to a particular scope could be far too limiting anyway. That being 
the case, if you want to use an allocator other than the GC for the class 
itself, then you're probably going to have to stick it in a struct. That 
struct could then manage the container's lifetime on some level. It would 
probably have to use ref-counting and then call clear on the container when 
the ref-count hits zero. That should then allow the container to at least 
clean up its internal memory, assuming that that's not on the GC heap, but 
unless you use emplace on non-GC heap memory, the container itself still can't 
be collected until the GC collects it (since clear destroys it but doesn't 
free its memory). Not to mention, if you're using a ref-counted struct to hold 
a class in order to manage that class' lifetime, why on earth did you make it 
a class in the first place?

If what you're suggesting with dispose is that it would act like clear except 
that it would actually free the container, well that pretty much goes against 
the direction that we've been trying to go with the GC, which is to not have 
the programmer explicitly freeing anything on the GC heap (which is why delete 
is being removed from the language).

I really think that if we want deterministic destruction of containers, they 
need to be structs. And if we want to use custom allocators with containers, I 
don't see how we could reasonably expect them to be of much use unless we're 
expecting that programmers will explicitly call clear on them when they're 
done with them. So, while the fact that containers need to be reference types 
does imply that classes are the way to go, I think that we're going to have to 
go with structs if we want their memory management to be more efficient than 
simply letting the GC handle it.

- Jonathan M Davis

Dec 20 2011

"foobar" <foo bar.com> writes:

On Wednesday, 21 December 2011 at 05:08:27 UTC, Jonathan M Davis 
wrote:
 On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:
 The Tango runtime has added a new method in Object, "dispose". 
 This
 method is called when "scope" is used or "delete" is 
 explicitly called.
 I don't know if that would help.

 That sounds a lot like using clear, though clear doesn't free 
 memory unless the finalizer does (and I'm not sure that 
 managing memory with finalizers actually works right now; there 
 were issues with that previously - but it might have only been 
 GC memory, so perhaps malloc and free would still work). The 
 issue with them is escaping references. You can easily end up 
 with references to data which has been destroyed. It also makes 
 them harder to pass around unless you essentially wrap the 
 container in ref-counted struct.

 The lifetime of the container must be managed if you really 
 want to be able to use custom allocators or have the container 
 be as efficient as possible with regards to its memory usage in 
 the general case. With a struct, it manages itself. It can do 
 whatever it wants to with memory internally, and the 
 combination of its constructor, postblit constructor, and 
 destructor allows it to take care of it itself and clean up its 
 memory when it's destroyed. You also don't have issues of stuff 
 referring to a container which doesn't exist anymore (unless 
 you add pointers to the container into the mix and then move or 
 destroy the container).

 With classes, if they're owned by the GC, then it's the GC that 
 manages their lifetime. It destroys them when they're no longer 
 referenced and it runs a collection cycle. You have no control 
 over its lifetime, and even if it tries to manage its memory 
 internally on some level, that memory won't be cleaned up until 
 the GC collects the container itself. So, if you want to manage 
 the lifetime of the class, you can't use a naked reference 
 anymore. You need a way to get the GC to collect it, or you 
 need to have it somewhere else other than the GC heap.

 If you use something like Scoped, then its lifetime will be 
 tied to the scope that it's in, but you risk references to it 
 escaping, and limiting the container to a particular scope 
 could be far too limiting anyway. That being the case, if you 
 want to use an allocator other than the GC for the class 
 itself, then you're probably going to have to stick it in a 
 struct. That struct could then manage the container's lifetime 
 on some level. It would probably have to use ref-counting and 
 then call clear on the container when the ref-count hits zero. 
 That should then allow the container to at least clean up its 
 internal memory, assuming that that's not on the GC heap, but 
 unless you use emplace on non-GC heap memory, the container 
 itself still can't be collected until the GC collects it (since 
 clear destroys it but doesn't free its memory). Not to mention, 
 if you're using a ref-counted struct to hold a class in order 
 to manage that class' lifetime, why on earth did you make it a 
 class in the first place?

 If what you're suggesting with dispose is that it would act 
 like clear except that it would actually free the container, 
 well that pretty much goes against the direction that we've 
 been trying to go with the GC, which is to not have the 
 programmer explicitly freeing anything on the GC heap (which is 
 why delete is being removed from the language).

 I really think that if we want deterministic destruction of 
 containers, they need to be structs. And if we want to use 
 custom allocators with containers, I don't see how we could 
 reasonably expect them to be of much use unless we're expecting 
 that programmers will explicitly call clear on them when 
 they're done with them. So, while the fact that containers need 
 to be reference types does imply that classes are the way to 
 go, I think that we're going to have to go with structs if we 
 want their memory management to be more efficient than simply 
 letting the GC handle it.

 - Jonathan M Davis

I disagree with the above conclusion. you conflate two issues 
that are orthogonal:
a. value vs. ref semantics which is already seemed to be decided 
in favor of the latter and hence classes. b. memory and lifetime 
management

The containers should allow for (disregard the specifics of the 
syntax):

Container a = new(SharedMemAllocator) LinkedList();
Container b = new(MallocAllocator) LinkedList();
Container c = new(GC) LinkedList();

When adding an item to the above containers the relevant 
allocator will enact its policy about intermixing with other 
allocators - by default the item will be copied if it comes from 
a separate allocator. I don't see anything here that forces the 
use of structs instead of classes.

Dec 20 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, December 21, 2011 08:07:23 foobar wrote:
 I disagree with the above conclusion. you conflate two issues
 that are orthogonal:
 a. value vs. ref semantics which is already seemed to be decided
 in favor of the latter and hence classes. b. memory and lifetime
 management
 
 The containers should allow for (disregard the specifics of the
 syntax):
 
 Container a = new(SharedMemAllocator) LinkedList();
 Container b = new(MallocAllocator) LinkedList();
 Container c = new(GC) LinkedList();
 
 When adding an item to the above containers the relevant
 allocator will enact its policy about intermixing with other
 allocators - by default the item will be copied if it comes from
 a separate allocator. I don't see anything here that forces the
 use of structs instead of classes.

And if they're classes and not managed by the GC nor in a struct which manages 
their lifetime, how are they going to be freed? Does the user have to 
explicitly free them themselves? How is that better than using a ref-counted 
struct?

And no, using reference semantics does _not_ require classes. A class is just 
the easiest way to get reference semantics. It doesn't necessarily mean that 
it's the best way. That depends on the context.

- Jonathan M Davis

Dec 20 2011

"foobar" <foo bar.com> writes:

On Wednesday, 21 December 2011 at 07:30:45 UTC, Jonathan M Davis 
wrote:
 And if they're classes and not managed by the GC nor in a 
 struct which manages their lifetime, how are they going to be 
 freed? Does the user have to explicitly free them themselves? 
 How is that better than using a ref-counted struct?

Memory management should be the responsibility of the allocator 
and it shouldn't be limited only to a single policy of 
ref-counting which has many disadvantages.

 And no, using reference semantics does _not_ require classes. A 
 class is just the easiest way to get reference semantics. It 
 doesn't necessarily mean that it's the best way. That depends 
 on the context.

 - Jonathan M Davis

I didn't use the word 'require' anywhere, but it is indeed by far 
the simplest and best way to define a ref-type. All I'm saying is 
that when travelling from NYC to Washington DC through China, you 
really should have good reasons to not use the direct route.

Dec 21 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/21/11 1:07 AM, foobar wrote:
 The containers should allow for (disregard the specifics of the syntax):

 Container a = new(SharedMemAllocator) LinkedList();
 Container b = new(MallocAllocator) LinkedList();
 Container c = new(GC) LinkedList();

 When adding an item to the above containers the relevant allocator will
 enact its policy about intermixing with other allocators - by default
 the item will be copied if it comes from a separate allocator. I don't
 see anything here that forces the use of structs instead of classes.

(Please don't overquote. It took me a million years to scroll all the 
way down on my phone. Thanks.)

This is an alluring proposition but it's a lot more difficult than one 
might think. The problem is the entire container design is different if 
e.g. using refcounting vs. classic garbage collection.

You may want to try designing a simple container and isolate all 
memory/lifetime-specific issues into an allocator. See what allocator 
interface you come up with. I was unable to solve this problem, but it's 
possible it has a solution.


Andrei

Dec 21 2011

"foobar" <foo bar.com> writes:

On Wednesday, 21 December 2011 at 16:39:13 UTC, Andrei 
Alexandrescu wrote:
 (Please don't overquote. It took me a million years to scroll 
 all the way down on my phone. Thanks.)

 This is an alluring proposition but it's a lot more difficult 
 than one might think. The problem is the entire container 
 design is different if e.g. using refcounting vs. classic 
 garbage collection.

 You may want to try designing a simple container and isolate 
 all memory/lifetime-specific issues into an allocator. See what 
 allocator interface you come up with. I was unable to solve 
 this problem, but it's possible it has a solution.


 Andrei

Could you give an example to illustrate this design difference? 
Which container would be affected by this?

There's already one good allocator API design I'm aware of:  
http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2005/n1850.pdf
Can we reuse the design/ideas or can you see problems with it?

Dec 21 2011

Jacob Carlborg <doob me.com> writes:

On 2011-12-21 06:07, Jonathan M Davis wrote:
 On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:
 The Tango runtime has added a new method in Object, "dispose". This
 method is called when "scope" is used or "delete" is explicitly called.
 I don't know if that would help.

 That sounds a lot like using clear, though clear doesn't free memory unless
 the finalizer does (and I'm not sure that managing memory with finalizers
 actually works right now; there were issues with that previously - but it
 might have only been GC memory, so perhaps malloc and free would still work).
 The issue with them is escaping references. You can easily end up with
 references to data which has been destroyed. It also makes them harder to pass
 around unless you essentially wrap the container in ref-counted struct.

 The lifetime of the container must be managed if you really want to be able to
 use custom allocators or have the container be as efficient as possible with
 regards to its memory usage in the general case. With a struct, it manages
 itself. It can do whatever it wants to with memory internally, and the
 combination of its constructor, postblit constructor, and destructor allows it
 to take care of it itself and clean up its memory when it's destroyed. You
 also don't have issues of stuff referring to a container which doesn't exist
 anymore (unless you add pointers to the container into the mix and then move
 or destroy the container).

 With classes, if they're owned by the GC, then it's the GC that manages their
 lifetime. It destroys them when they're no longer referenced and it runs a
 collection cycle. You have no control over its lifetime, and even if it tries
 to manage its memory internally on some level, that memory won't be cleaned up
 until the GC collects the container itself. So, if you want to manage the
 lifetime of the class, you can't use a naked reference anymore. You need a way
 to get the GC to collect it, or you need to have it somewhere else other than
 the GC heap.

With "dispose" it would give you the option to control the lifetime of 
the object, in some cases.

 If you use something like Scoped, then its lifetime will be tied to the scope
 that it's in, but you risk references to it escaping, and limiting the
 container to a particular scope could be far too limiting anyway. That being
 the case, if you want to use an allocator other than the GC for the class
 itself, then you're probably going to have to stick it in a struct. That
 struct could then manage the container's lifetime on some level. It would
 probably have to use ref-counting and then call clear on the container when
 the ref-count hits zero. That should then allow the container to at least
 clean up its internal memory, assuming that that's not on the GC heap, but
 unless you use emplace on non-GC heap memory, the container itself still can't
 be collected until the GC collects it (since clear destroys it but doesn't
 free its memory). Not to mention, if you're using a ref-counted struct to hold
 a class in order to manage that class' lifetime, why on earth did you make it
 a class in the first place?

 If what you're suggesting with dispose is that it would act like clear except
 that it would actually free the container, well that pretty much goes against
 the direction that we've been trying to go with the GC, which is to not have
 the programmer explicitly freeing anything on the GC heap (which is why delete
 is being removed from the language).

I thought that one of the problems with the GC and memory was that you 
can't rely on when/if the destructors are called. With "dispose" it 
would give the developer a reliable method to cleanup resources, 
assuming "scope" or "delete" is used.

 I really think that if we want deterministic destruction of containers, they
 need to be structs. And if we want to use custom allocators with containers, I
 don't see how we could reasonably expect them to be of much use unless we're
 expecting that programmers will explicitly call clear on them when they're
 done with them. So, while the fact that containers need to be reference types
 does imply that classes are the way to go, I think that we're going to have to
 go with structs if we want their memory management to be more efficient than
 simply letting the GC handle it.

 - Jonathan M Davis

I was thinking that classes could be the safe and default but and 
scope/delete could be used as an optimization.

-- 
/Jacob Carlborg

Dec 20 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, December 21, 2011 08:44:10 Jacob Carlborg wrote:
 I thought that one of the problems with the GC and memory was that you
 can't rely on when/if the destructors are called. With "dispose" it
 would give the developer a reliable method to cleanup resources,
 assuming "scope" or "delete" is used.

 I was thinking that classes could be the safe and default but and
 scope/delete could be used as an optimization.

Well, both delete and scope are going away, so std.typecons.scoped would have 
to be used instead, but that only really works for local variables. It would 
probably work for a member variable as well, but in both cases, you have to 
worry about the class going away when Scoped goes away and how that affects all 
of the references to it.

If, on other hand, you use a ref-counted struct, it takes care of itself 
without the risks of the container going away while references to it still 
exist (unless you keep a pointer to it somewhere instead of the ref-counted 
struct). So, ref-counted structs would be much safer, and they could use 
advanced memory management internally by default, making them more efficient by 
default.

What I really don't get, however, is how custom allocators would be of much 
use for classes unless the entire class is allocated that way (in which case, 
it's probably going to end up in a struct of some kind anyway unless you want 
the same issues that you get with scoped), since if the class is on the GC 
heap, its internal memory management couldn't release any of it until the 
container is collected by the GC. So, it seems to me that the desire for 
custom allocators would favor the use of ref-counted structs over classes.

- Jonathan M Davis

Dec 20 2011

Jacob Carlborg <doob me.com> writes:

On 2011-12-21 08:53, Jonathan M Davis wrote:
 On Wednesday, December 21, 2011 08:44:10 Jacob Carlborg wrote:
 I thought that one of the problems with the GC and memory was that you
 can't rely on when/if the destructors are called. With "dispose" it
 would give the developer a reliable method to cleanup resources,
 assuming "scope" or "delete" is used.

 I was thinking that classes could be the safe and default but and
 scope/delete could be used as an optimization.

 Well, both delete and scope are going away, so std.typecons.scoped would have
 to be used instead, but that only really works for local variables. It would
 probably work for a member variable as well, but in both cases, you have to
 worry about the class going away when Scoped goes away and how that affects all
 of the references to it.

I was thinking if there were good uses for "delete" and "scope" they 
could stay. I think the "dispose" method in Tango makes them more useful.

 If, on other hand, you use a ref-counted struct, it takes care of itself
 without the risks of the container going away while references to it still
 exist (unless you keep a pointer to it somewhere instead of the ref-counted
 struct). So, ref-counted structs would be much safer, and they could use
 advanced memory management internally by default, making them more efficient by
 default.

 What I really don't get, however, is how custom allocators would be of much
 use for classes unless the entire class is allocated that way (in which case,
 it's probably going to end up in a struct of some kind anyway unless you want
 the same issues that you get with scoped), since if the class is on the GC
 heap, its internal memory management couldn't release any of it until the
 container is collected by the GC. So, it seems to me that the desire for
 custom allocators would favor the use of ref-counted structs over classes.

 - Jonathan M Davis

I sounds like a struct is what should be used, it will be most flexible 
as well.

-- 
/Jacob Carlborg

Dec 21 2011

Caligo <iteronvexor gmail.com> writes:

On Sun, Dec 18, 2011 at 10:12 PM, Jonathan M Davis <jmdavisProg gmx.com>wrote:

 So, I'm beginning to think that we're going to have to go the struct route.

 - Jonathan M Davis

Two questions:

1. If you guys, the D experts, are having such a difficult time with this,
what happens to the rest of us when we need to implement data structures
that are not offered by Phobos?  Do we just wait and see what happens with
Phobos and learn from it?

2.  Are the new containers going to be multi-threaded?  i.e., will I be
able to insert elements into a container from multiple threads?

Dec 20 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/20/11 10:15 PM, Caligo wrote:
 Two questions:

 1. If you guys, the D experts, are having such a difficult time with
 this, what happens to the rest of us when we need to implement data
 structures that are not offered by Phobos?

Good containers are hard to define. It took the C++ community ten years 
and a great mathematician to define some passable ones. I have no 
worries - it's a hard task.

 Do we just wait and see what
 happens with Phobos and learn from it?

No need. The basic design is pretty clear - containers offer ranges, and 
algorithms use ranges.

 2.  Are the new containers going to be multi-threaded?  i.e., will I be
 able to insert elements into a container from multiple threads?

Containers will not be implicitly shared. We'll define specialized 
shared containers.


Andrei

Dec 20 2011

so <so so.so> writes:

On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 a great mathematician

Name please?

Dec 21 2011

"Froglegs" <lugtug gmail.com> writes:

On Wednesday, 21 December 2011 at 12:50:44 UTC, so wrote:
 On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 a great mathematician

 Name please?

I believe he is referring to

http://en.wikipedia.org/wiki/Alexander_Stepanov

Dec 21 2011

so <so so.so> writes:

On Wed, 21 Dec 2011 14:56:28 +0200, Froglegs <lugtug gmail.com> wrote:

 On Wednesday, 21 December 2011 at 12:50:44 UTC, so wrote:
 On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 a great mathematician

 Name please?

 I believe he is referring to

 http://en.wikipedia.org/wiki/Alexander_Stepanov

Thank you Mr Frog.
I didn't know the "mathematician" part of him.
I am warning you Java (or any die hard OOP) guys, don't read his articles.

Dec 21 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 18 Dec 2011 01:15:40 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/17/11 7:52 PM, Jonathan M Davis wrote:
 The only reason that I can think of to use a reference-counted struct  
 instead
 of a class is becuse then it's easier to avoid the GC heap entirely.   
 Almost
 all of a container's memory is going to end up on the heap regardless,  
 because
 the elements almost never end up in the container itself.

 Being on the heap is not the main issue. The main issue is the data  
 becoming garbage once all references are gone.

This whole thread of discussion is somewhat more complicated than it has  
to be.

A ref-counted type is implemented via a reference to allocated data.  That  
data can be a class instance or a struct pointer.

Using a reference style struct just introduces problems you don't have to  
worry about with a class.  It gains you nothing except having to crowbar  
the language into believing your struct is really a reference type.

And that's really hard to do.

-Steve

Dec 19 2011

Jerry <jlquinn optonline.net> writes:

Jonathan M Davis <jmdavisProg gmx.com> writes:

 On Saturday, December 17, 2011 17:31:46 Andrei Alexandrescu wrote:
 On 12/13/11 9:08 PM, Jonathan M Davis wrote:
 Is the plan for std.container still to have all of its containers be
 final classes (classes so that they're reference types and final so
 that their functions are inlinable)? Or has that changed? I believe
 that Andrei said something recently about discussing reference counting
 and containers with Walter.
 
 The reason that I bring this up is that Array and SList are still
 structs, and the longer that they're structs, the more code that will
 break when they get changed to classes. Granted, some level of code
 breakage may occur when we add custom allocators to them, but since
 that would probably only affect the constructor (and preferably
 wouldn't affect anything if you want to simply create a container with
 the GC heap as you would now were Array and SList classes), the
 breakage for that should be minimal.
 
 Is there any reason for me to not just go and make Array and SList final
 classes and create a pull request for it?
 
 - Jonathan M Davis

 
 Apologies for being slow on this. It may be a fateful time to discuss
 that right now, after all the discussion of what's appropriate for
 stdlib vs. application code etc.
 
 As some of you know, Walter and I went back and forth several times on
 this. First, there was the issue of making containers value types vs.
 reference types. Making containers value types would be in keep with the
 STL approach. However, Walter noted that copying entire containers by
 default is most often NOT desirable and there's significant care and
 adornments in C++ programs to make sure that that default behavior is
 avoided (e.g. adding const& to function parameters).
 
 So we decided to make containers reference types, and that seemed to be
 a good choice.
 
 The second decision is classes vs. structs. Walter correctly pointed out
 that the obvious choice for defining a reference type in D - whether the
 type is momonorphic or polymorphic - is making it a class. If containers
 aren't classes, the reasoning went, it means we took a wrong step
 somewhere; it might mean our flagship abstraction for reference types is
 not suitable for, well, defining a reference type.
 
 Fast forward a couple of months, a few unslept nights, and a bunch of
 newsgroup and IRC conversations. Several additional pieces came together.
 
 The most important thing I noticed is that people expect standard
 containers to have sophisticated memory management. Many ask not about
 containers as much as "containers with custom allocators". Second,
 containers tend to be large memory users by definition. Third,
 containers are self-contained (heh) and relatively simple in terms of
 what they model, meaning that they _never_ suffer from circular
 references, like general entity types might.
 
 All of these arguments very strongly suggest that many want containers
 to be types with deterministic control over memory and accept
 configurable allocation strategies (regions, heap, malloc, custom). So
 that would mean containers should be reference counted structs.
 
 This cycle of thought has happened twice, and the evidence coming the
 second time has been stronger. The first time around I went about and
 started implementing std.container with reference counting in mind. The
 code is not easy to write, and is not to be recommended for most types,
 hence my thinking (at the end of the first cycle) that we should switch
 to class containers. One fear I have is that people would be curious,
 look at the implementation of std.container, and be like "so am I
 expected to do all this to define a robust type"? I start to think that
 the right answer to that is to improve library support for good
 reference counted types, and define reference counted struct containers
 that are deterministic.
 
 Safety is also an issue. I was hoping I'd provide safety as a policy,
 e.g. one may choose for a given container whether they want safe or not
 (and presumably fast). I think it's best to postpone that policy and
 focus for now on defining safe containers with safe ranges. This
 precludes e.g. using T[] as a range for Array!T.
 
 Please discuss.

 The only reason that I can think of to use a reference-counted struct instead 
 of a class is becuse then it's easier to avoid the GC heap entirely.  Almost 
 all of a container's memory is going to end up on the heap regardless, because 
 the elements almost never end up in the container itself. They're in a dynamic 
 array or in nodes or something similar. So, whether the container is ref-
 counted or a class is almost irrelevant. If anything making it a class makes 
 more sense, because then it doesn't have to worry about the extra cost of ref-
 counting. However, once we bring allocators into the picture, the situation 
 changes slightly. In both cases, the elements themselves end up where the 
 allocator puts them. However, in the case of a struct, the member variables 
 themselves end up on the stack, whereas with the class, they end up on the GC 
 heap unless the programmer puts the object somewhere else with emplace or 
 possibly with a function on the allocator.

 The other concern is that with a class, it's immediately clear that you're 
 dealing with a reference, but that's much easier to miss with a struct when 
 structs are almost always value types.

 Also, some containers can't be an a usable state from their init value (e.g. 
 RedBlackTree), so they're more awkward as structs.  disable might fix that 
 problem, but I don't know. It would be also be awkward to not be able to just 
 declare a RedBlackTree without immediately initializing it.

 So, in general, I think that final classes make more sense. The ref-counted 
 struct is just trying to do what a class does already, and the class doesn't 
 have the extra overhead of the ref-counting. The only concern is the fact that 
 even when using a custom allocator, the few member variables that the 
 container has end up on the GC heap unless the programmer goes to extra effort 
 to avoid it, and part of the point of using the custom allocatorc is 
 specifically to avoid the GC heap. The allocator API could make that easy to 
 deal with though by having a function which takes the type and the arguments 
 for the constructor and constructs it for you. e.g.

 auto arr = custom.alloc!Array(4, 5, 7);

 So, I favor final classes. I just don't see much benefit in ref-counted
structs. 
 They're more confusing and generally harder to deal with, with the one 
 exception being avoiding the GC heap for the member variables of the 
 container.

 - Jonathan M Davis

One advantage of a struct container is that when you make it a member of a
struct
or class that will have lots of instances and therefore lots of tiny
containers.  

In C++ at my work, we embed tiny vectors in objects all the time.  If
you have a situation where there's enough of the objects that the memory
actually matters, and the vectors need to be resizeable but are often
small, the container overhead matters.  With a class, you've added a
reference pointer, vtable, typeinfo on top of the storage for the
container itself.

Also, the extra dereference to get from the object to its member
container is often going to be a cache miss, slowing things down
further.  So if you embed an Array class in an object, you pay for 2
pointer dereferences to get to the data rather than one.  With a class
you'll have to roll your own to avoid the costs.

Jerry

Dec 20 2011

"Froglegs" <lugtug gmail.com> writes:

   I don't really think ref counted struct vs class is fair, 
because in reality most containers don't need ref counting.  I 
can't think of one instance in C++ where I stuck a container 
directly in a shared_ptr or anything similar.

Also as far I as I can tell making it a class would bloat it with 
unnecessary data(vtable), and being it is common to have many 
many instances of these containers, that doesn't sound like such 
a great thing.

Dec 20 2011

"foobar" <foo bar.com> writes:

On Saturday, 17 December 2011 at 23:31:47 UTC, Andrei 
Alexandrescu wrote:
 On 12/13/11 9:08 PM, Jonathan M Davis wrote:
 Is the plan for std.container still to have all of its 
 containers be final
 classes (classes so that they're reference types and final so 
 that their
 functions are inlinable)? Or has that changed? I believe that 
 Andrei said
 something recently about discussing reference counting and 
 containers with
 Walter.

 The reason that I bring this up is that Array and SList are 
 still structs, and
 the longer that they're structs, the more code that will break 
 when they get
 changed to classes. Granted, some level of code breakage may 
 occur when we add
 custom allocators to them, but since that would probably only 
 affect the
 constructor (and preferably wouldn't affect anything if you 
 want to simply
 create a container with the GC heap as you would now were 
 Array and SList
 classes), the breakage for that should be minimal.

 Is there any reason for me to not just go and make Array and 
 SList final
 classes and create a pull request for it?

 - Jonathan M Davis

 Apologies for being slow on this. It may be a fateful time to 
 discuss that right now, after all the discussion of what's 
 appropriate for stdlib vs. application code etc.

 As some of you know, Walter and I went back and forth several 
 times on this. First, there was the issue of making containers 
 value types vs. reference types. Making containers value types 
 would be in keep with the STL approach. However, Walter noted 
 that copying entire containers by default is most often NOT 
 desirable and there's significant care and adornments in C++ 
 programs to make sure that that default behavior is avoided 
 (e.g. adding const& to function parameters).

 So we decided to make containers reference types, and that 
 seemed to be a good choice.

 The second decision is classes vs. structs. Walter correctly 
 pointed out that the obvious choice for defining a reference 
 type in D - whether the type is momonorphic or polymorphic - is 
 making it a class. If containers aren't classes, the reasoning 
 went, it means we took a wrong step somewhere; it might mean 
 our flagship abstraction for reference types is not suitable 
 for, well, defining a reference type.

 Fast forward a couple of months, a few unslept nights, and a 
 bunch of newsgroup and IRC conversations. Several additional 
 pieces came together.

 The most important thing I noticed is that people expect 
 standard containers to have sophisticated memory management. 
 Many ask not about containers as much as "containers with 
 custom allocators". Second, containers tend to be large memory 
 users by definition. Third, containers are self-contained (heh) 
 and relatively simple in terms of what they model, meaning that 
 they _never_ suffer from circular references, like general 
 entity types might.

 All of these arguments very strongly suggest that many want 
 containers to be types with deterministic control over memory 
 and accept configurable allocation strategies (regions, heap, 
 malloc, custom). So that would mean containers should be 
 reference counted structs.

 This cycle of thought has happened twice, and the evidence 
 coming the second time has been stronger. The first time around 
 I went about and started implementing std.container with 
 reference counting in mind. The code is not easy to write, and 
 is not to be recommended for most types, hence my thinking (at 
 the end of the first cycle) that we should switch to class 
 containers. One fear I have is that people would be curious, 
 look at the implementation of std.container, and be like "so am 
 I expected to do all this to define a robust type"? I start to 
 think that the right answer to that is to improve library 
 support for good reference counted types, and define reference 
 counted struct containers that are deterministic.

 Safety is also an issue. I was hoping I'd provide safety as a 
 policy, e.g. one may choose for a given container whether they 
 want safe or not (and presumably fast). I think it's best to 
 postpone that policy and focus for now on defining safe 
 containers with safe ranges. This precludes e.g. using T[] as a 
 range for Array!T.

 Please discuss.


 Andrei

My thoughts are:
- value vs. reference: definitely reference hence classes
- reference type - BAD idea. it adds code bloat, code complexity 
and hurts performance, not to mention it conflicts with 
concurrency which is the way of the future. - allocators - 
EXCELLENT idea. As discussed previously the design must not be 
the c++ one. - In addition to the general purpose ref. containers 
we should provide a few special value type containers such as 
small arrays with primitive types, a FlagSet with 1 bit per flag, 
etc.
- safety - definitely safe by default and this is wat we should 
concentrate ATM. unsafe could be added later.

ref-counted seems redundant to me if we provide allocators. There 
should be an RC_alloc for this.

Dec 20 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, December 20, 2011 22:15:43 Caligo wrote:
 On Sun, Dec 18, 2011 at 10:12 PM, Jonathan M Davis 

<jmdavisProg gmx.com>wrote:
 So, I'm beginning to think that we're going to have to go the struct
 route.
 
 - Jonathan M Davis

 
 Two questions:
 
 1. If you guys, the D experts, are having such a difficult time with this,
 what happens to the rest of us when we need to implement data structures
 that are not offered by Phobos?  Do we just wait and see what happens with
 Phobos and learn from it?

Containers are fairly straightforward in general. So, if you need to implement 
your own, it's not generally all that hard. The problem here is that the 
standard library needs to be efficient in the general case. We don't know under 
what circumstances these containers are going to be used, so we're under 
different constraints than someone who's writing a container for their own use.

Clearly, containers can be either structs or classes. Both will work. The 
issue is which will be more efficient in the general case. And if you're 
implementing your own containers, you can choose to do what Phobos does or go 
another route - especially since you're going to know under what circumstances 
you're using your own containers. For the most part, I think that the issues 
being discussed with regards to classes vs structs are as big a deal in this 
case as they are because it's the standard library.

Ideally, Phobos will take the most efficient route such that if you want your 
containers to be as absolutely as efficient as possible, you'd do the same
thing 
that they're doing with your own containers, but you wouldn't necessarily need 
to. Since, as Andrei has been pointing out, the standard library is the sort 
of place where you go to extremes to make code efficient, and you don't always 
do things in the way that would normally be considered the nice way to do it, 
because it _needs_ to have that efficiency and can afford to be nastier code,  
since it's generally the experts who are working on it.

So yes, learn from Phobos, but you don't have to do everything the way that it 
does.

 2.  Are the new containers going to be multi-threaded?  i.e., will I be
 able to insert elements into a container from multiple threads?

No. That would introduce unnecessary overhead. If you want synchronized 
containers, then wrap the standard ones in your own. Maybe some sort of 
synchronized wrapper will be provided by Phobos at some point, but the 
containers themselves are definitely not going to be. You can add the multi-
threading protections and the cost that they bring on top of an implementation 
that doesn't have them, but if you have such protections built-in, those using 
the containers can't remove them and would be stuck with the performance cost.

- Jonathan M Davis

Dec 20 2011

"foobar" <foo bar.com> writes:

On Wednesday, 21 December 2011 at 04:45:48 UTC, Jonathan M Davis 
wrote:
 On Tuesday, December 20, 2011 22:15:43 Caligo wrote:

[snip]
 2.  Are the new containers going to be multi-threaded?  i.e., 
 will I be
 able to insert elements into a container from multiple threads?

 No. That would introduce unnecessary overhead. If you want 
 synchronized containers, then wrap the standard ones in your 
 own. Maybe some sort of synchronized wrapper will be provided 
 by Phobos at some point, but the containers themselves are 
 definitely not going to be. You can add the multi-
 threading protections and the cost that they bring on top of an 
 implementation that doesn't have them, but if you have such 
 protections built-in, those using the containers can't remove 
 them and would be stuck with the performance cost.

 - Jonathan M Davis

actually, a synchronized container doesn't really provide 
concurrent write access, it serialized the access of multiple 
threads by using a lock. What about true concurrent containers 
such that two threads can access concurrently the same container 
at different places? E.g Array could be divided into chunks and 
each chunk is owned by a separate thread, Tree that can divide 
it's sub-trees among several threads, etc..

Dec 20 2011

D Programming

C/C++ Programming

Other

digitalmars.D - std.container and classes