www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.container and classes

reply Jonathan M Davis <jmdavisProg gmx.com> writes:
Is the plan for std.container still to have all of its containers be final 
classes (classes so that they're reference types and final so that their 
functions are inlinable)? Or has that changed? I believe that Andrei said 
something recently about discussing reference counting and containers with 
Walter.

The reason that I bring this up is that Array and SList are still structs, and 
the longer that they're structs, the more code that will break when they get 
changed to classes. Granted, some level of code breakage may occur when we add 
custom allocators to them, but since that would probably only affect the 
constructor (and preferably wouldn't affect anything if you want to simply 
create a container with the GC heap as you would now were Array and SList 
classes), the breakage for that should be minimal.

Is there any reason for me to not just go and make Array and SList final 
classes and create a pull request for it?

- Jonathan M Davis
Dec 13 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/13/11 9:08 PM, Jonathan M Davis wrote:
 Is the plan for std.container still to have all of its containers be final
 classes (classes so that they're reference types and final so that their
 functions are inlinable)? Or has that changed? I believe that Andrei said
 something recently about discussing reference counting and containers with
 Walter.

 The reason that I bring this up is that Array and SList are still structs, and
 the longer that they're structs, the more code that will break when they get
 changed to classes. Granted, some level of code breakage may occur when we add
 custom allocators to them, but since that would probably only affect the
 constructor (and preferably wouldn't affect anything if you want to simply
 create a container with the GC heap as you would now were Array and SList
 classes), the breakage for that should be minimal.

 Is there any reason for me to not just go and make Array and SList final
 classes and create a pull request for it?

 - Jonathan M Davis
Apologies for being slow on this. It may be a fateful time to discuss that right now, after all the discussion of what's appropriate for stdlib vs. application code etc. As some of you know, Walter and I went back and forth several times on this. First, there was the issue of making containers value types vs. reference types. Making containers value types would be in keep with the STL approach. However, Walter noted that copying entire containers by default is most often NOT desirable and there's significant care and adornments in C++ programs to make sure that that default behavior is avoided (e.g. adding const& to function parameters). So we decided to make containers reference types, and that seemed to be a good choice. The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type. Fast forward a couple of months, a few unslept nights, and a bunch of newsgroup and IRC conversations. Several additional pieces came together. The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs. This cycle of thought has happened twice, and the evidence coming the second time has been stronger. The first time around I went about and started implementing std.container with reference counting in mind. The code is not easy to write, and is not to be recommended for most types, hence my thinking (at the end of the first cycle) that we should switch to class containers. One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic. Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T. Please discuss. Andrei
Dec 17 2011
next sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Sat, 17 Dec 2011 17:31:46 -0600, Andrei Alexandrescu wrote:

 The second decision is classes vs. structs. Walter correctly pointed out
 that the obvious choice for defining a reference type in D - whether the
 type is momonorphic or polymorphic - is making it a class. If containers
 aren't classes, the reasoning went, it means we took a wrong step
 somewhere; it might mean our flagship abstraction for reference types is
 not suitable for, well, defining a reference type.
This is an interesting reasoning to go with class. Which is similar to what you end up saying here:
 One fear I have is that people would be curious,
 look at the implementation of std.container, and be like "so am I
 expected to do all this to define a robust type"? I start to think that
 the right answer to that is to improve library support for good
 reference counted types, and define reference counted struct containers
 that are deterministic.
I'm not really sure the answer, but here are some thoughts. There is interest in having reference counting, and making it easy. There was a question on SO about how to make a reference counted object. I gave it my best shot answering having never done it myself. It isn't very straight forward and I think people will expect to use it similar to auto_prt. And considering how similar each wrapped object is, I think D can pull it off. http://stackoverflow.com/questions/4632355/making-a-reference-counted- object-in-d-using-refcountedt So building a reference counted container type probably shouldn't be a challenge in D (or I should say it should be a goal to make it not a challenge). For example, it might be as simple as, a build a value container which you then expose wrapped with RefCounted!T alias RefCounted!ContainerImp Container; But as to whether this should be what is implemented in the standard library, I don't know. You make mention of custom allocators and such. Is this interest going to be of benefit, or is it just something people are use to having from C++? If it makes sense to have such then the containers should support it. Don't classes allow for custom allocators? Is there a reason that can't be used? Do we need to improve on that?
Dec 17 2011
parent Peter Alexander <peter.alexander.au gmail.com> writes:
On 18/12/11 12:06 AM, Jesse Phillips wrote:
 But as to whether this should be what is implemented in the standard
 library, I don't know. You make mention of custom allocators and such. Is
 this interest going to be of benefit, or is it just something people are
 use to having from C++? If it makes sense to have such then the
 containers should support it. Don't classes allow for custom allocators?
 Is there a reason that can't be used? Do we need to improve on that?
Allocators are very commonly used in C++, although people rarely use them as they are implemented in the standard library because they were poorly designed. We can't repeat the same mistakes.
Dec 17 2011
prev sibling next sibling parent Peter Alexander <peter.alexander.au gmail.com> writes:
On 17/12/11 11:31 PM, Andrei Alexandrescu wrote:
 The most important thing I noticed is that people expect standard
 containers to have sophisticated memory management. Many ask not about
 containers as much as "containers with custom allocators". Second,
 containers tend to be large memory users by definition. Third,
 containers are self-contained (heh) and relatively simple in terms of
 what they model, meaning that they _never_ suffer from circular
 references, like general entity types might.

 All of these arguments very strongly suggest that many want containers
 to be types with deterministic control over memory and accept
 configurable allocation strategies (regions, heap, malloc, custom). So
 that would mean containers should be reference counted structs.
My thoughts on this: Allocators are a must, and need to handled better than they were in C++. Classes are simpler than ref-counted structs, and ref-counting is a massive performance drain and code-bloater anyway, so just go with final classes.
 Safety is also an issue. I was hoping I'd provide safety as a policy,
 e.g. one may choose for a given container whether they want safe or not
 (and presumably fast). I think it's best to postpone that policy and
 focus for now on defining safe containers with safe ranges. This
 precludes e.g. using T[] as a range for Array!T.
IMO they should just be safe and not provide the option of non-safe. It just complicates things to cater to people that won't use them anyway C++ standard library containers are unsafe and relatively lean, yet are rarely used when performance really matters.
Dec 17 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Saturday, 17 December 2011 at 23:31:47 UTC, Andrei 
Alexandrescu wrote:
 Safety is also an issue. I was hoping I'd provide safety as a 
 policy, e.g. one may choose for a given container whether they 
 want safe or not (and presumably fast). I think it's best to 
 postpone that policy and focus for now on defining safe 
 containers with safe ranges. This precludes e.g. using T[] as a 
 range for Array!T.
Since containers are templated, why not use asserts for this, and let the user choose with -release (just as with built-in arrays)?
Dec 17 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 17, 2011 17:31:46 Andrei Alexandrescu wrote:
 On 12/13/11 9:08 PM, Jonathan M Davis wrote:
 Is the plan for std.container still to have all of its containers be
 final classes (classes so that they're reference types and final so
 that their functions are inlinable)? Or has that changed? I believe
 that Andrei said something recently about discussing reference counting
 and containers with Walter.
 
 The reason that I bring this up is that Array and SList are still
 structs, and the longer that they're structs, the more code that will
 break when they get changed to classes. Granted, some level of code
 breakage may occur when we add custom allocators to them, but since
 that would probably only affect the constructor (and preferably
 wouldn't affect anything if you want to simply create a container with
 the GC heap as you would now were Array and SList classes), the
 breakage for that should be minimal.
 
 Is there any reason for me to not just go and make Array and SList final
 classes and create a pull request for it?
 
 - Jonathan M Davis
Apologies for being slow on this. It may be a fateful time to discuss that right now, after all the discussion of what's appropriate for stdlib vs. application code etc. As some of you know, Walter and I went back and forth several times on this. First, there was the issue of making containers value types vs. reference types. Making containers value types would be in keep with the STL approach. However, Walter noted that copying entire containers by default is most often NOT desirable and there's significant care and adornments in C++ programs to make sure that that default behavior is avoided (e.g. adding const& to function parameters). So we decided to make containers reference types, and that seemed to be a good choice. The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type. Fast forward a couple of months, a few unslept nights, and a bunch of newsgroup and IRC conversations. Several additional pieces came together. The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs. This cycle of thought has happened twice, and the evidence coming the second time has been stronger. The first time around I went about and started implementing std.container with reference counting in mind. The code is not easy to write, and is not to be recommended for most types, hence my thinking (at the end of the first cycle) that we should switch to class containers. One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic. Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T. Please discuss.
The only reason that I can think of to use a reference-counted struct instead of a class is becuse then it's easier to avoid the GC heap entirely. Almost all of a container's memory is going to end up on the heap regardless, because the elements almost never end up in the container itself. They're in a dynamic array or in nodes or something similar. So, whether the container is ref- counted or a class is almost irrelevant. If anything making it a class makes more sense, because then it doesn't have to worry about the extra cost of ref- counting. However, once we bring allocators into the picture, the situation changes slightly. In both cases, the elements themselves end up where the allocator puts them. However, in the case of a struct, the member variables themselves end up on the stack, whereas with the class, they end up on the GC heap unless the programmer puts the object somewhere else with emplace or possibly with a function on the allocator. The other concern is that with a class, it's immediately clear that you're dealing with a reference, but that's much easier to miss with a struct when structs are almost always value types. Also, some containers can't be an a usable state from their init value (e.g. RedBlackTree), so they're more awkward as structs. disable might fix that problem, but I don't know. It would be also be awkward to not be able to just declare a RedBlackTree without immediately initializing it. So, in general, I think that final classes make more sense. The ref-counted struct is just trying to do what a class does already, and the class doesn't have the extra overhead of the ref-counting. The only concern is the fact that even when using a custom allocator, the few member variables that the container has end up on the GC heap unless the programmer goes to extra effort to avoid it, and part of the point of using the custom allocatorc is specifically to avoid the GC heap. The allocator API could make that easy to deal with though by having a function which takes the type and the arguments for the constructor and constructs it for you. e.g. auto arr = custom.alloc!Array(4, 5, 7); So, I favor final classes. I just don't see much benefit in ref-counted structs. They're more confusing and generally harder to deal with, with the one exception being avoiding the GC heap for the member variables of the container. - Jonathan M Davis
Dec 17 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/17/11 7:52 PM, Jonathan M Davis wrote:
 The only reason that I can think of to use a reference-counted struct instead
 of a class is becuse then it's easier to avoid the GC heap entirely.  Almost
 all of a container's memory is going to end up on the heap regardless, because
 the elements almost never end up in the container itself.
Being on the heap is not the main issue. The main issue is the data becoming garbage once all references are gone.
 They're in a dynamic
 array or in nodes or something similar. So, whether the container is ref-
 counted or a class is almost irrelevant.
I think this argument is starting off the wrong premise, and is already wrong by this point, so I skipped the rest. Am I right? Andrei
Dec 17 2011
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Sunday, December 18, 2011 00:15:40 Andrei Alexandrescu wrote:
 On 12/17/11 7:52 PM, Jonathan M Davis wrote:
 The only reason that I can think of to use a reference-counted struct
 instead of a class is becuse then it's easier to avoid the GC heap
 entirely. Almost all of a container's memory is going to end up on the
 heap regardless, because the elements almost never end up in the
 container itself.
Being on the heap is not the main issue. The main issue is the data becoming garbage once all references are gone.
 They're in a dynamic
 array or in nodes or something similar. So, whether the container is
 ref-
 counted or a class is almost irrelevant.
I think this argument is starting off the wrong premise, and is already wrong by this point, so I skipped the rest. Am I right?
My initial take on this is that the primary difference between a final class and a ref-counted struct is the fact that the class is on the heap and needs no ref-counting, whereas the struct is no the stack and has to do ref-counting, and everything else is the same regardless, since the memory for the container has to go on the heap regardless (be it the GC heap or wherever the allocator puts it). But what you're bringing up is that in the case of the class, everything in the container stays around until it's garbage collected, whereas in the case of the struct, it goes away as soon as the ref-count hits zero? I hadn't thought of that. But if we're talking about the GC heap, since most of the internals are going to be on the heap in either case, with the only difference being the container itself and its value type member variables. However, once you use a custom allocator (say one that uses malloc and free), then in the struct case, everything gets cleaned up as soon as the ref-count hits zero, whereas with the class it sits around until the container itself is collected - assuming that the class itself is on the GC heap. If it was using the malloc-free allocator, then it would be around until it was manually freed by the allocator. I don't know. That is certainly food for thought. My natural inclinition is to just make it a class, since it's a reference type, and then you don't have to worry about the cost of ref-counting or have any confusion over whether the container is a reference type or not. But if there's any real performance advantage in using ref-counted structs due to something to do with memory management, then that would be highly valuable. If no custom allocators are used, then I'd expect the struct to be worse off, since it has the cost of being copied as well as the cost of ref-counting, whereas the only cost in passing the class around is copying it's reference, and other than the few member variables that the container has, there's no difference in how long the memory in the container sticks around. It has to wait for the GC in either case. It's only when a custom allocator which could free the memory as soon as the ref-count hits zero comes into play that it makes a difference, in which case the struct would probably be more efficient. Unless I'm missing something? It certainly doesn't seem like an easy call. - Jonathan M Davis
Dec 17 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/18/11 1:42 AM, Jonathan M Davis wrote:
 On Sunday, December 18, 2011 00:15:40 Andrei Alexandrescu wrote:
 On 12/17/11 7:52 PM, Jonathan M Davis wrote:
 The only reason that I can think of to use a reference-counted struct
 instead of a class is becuse then it's easier to avoid the GC heap
 entirely. Almost all of a container's memory is going to end up on the
 heap regardless, because the elements almost never end up in the
 container itself.
Being on the heap is not the main issue. The main issue is the data becoming garbage once all references are gone.
 They're in a dynamic
 array or in nodes or something similar. So, whether the container is
 ref-
 counted or a class is almost irrelevant.
I think this argument is starting off the wrong premise, and is already wrong by this point, so I skipped the rest. Am I right?
My initial take on this is that the primary difference between a final class and a ref-counted struct is the fact that the class is on the heap and needs no ref-counting, whereas the struct is no the stack and has to do ref-counting, and everything else is the same regardless, since the memory for the container has to go on the heap regardless (be it the GC heap or wherever the allocator puts it). But what you're bringing up is that in the case of the class, everything in the container stays around until it's garbage collected, whereas in the case of the struct, it goes away as soon as the ref-count hits zero? I hadn't thought of that. But if we're talking about the GC heap, since most of the internals are going to be on the heap in either case, with the only difference being the container itself and its value type member variables. However, once you use a custom allocator (say one that uses malloc and free), then in the struct case, everything gets cleaned up as soon as the ref-count hits zero, whereas with the class it sits around until the container itself is collected - assuming that the class itself is on the GC heap. If it was using the malloc-free allocator, then it would be around until it was manually freed by the allocator. I don't know. That is certainly food for thought.
If this is a new notion, it might take a while for all of its aspects to sink in. I'm not saying that to be smug - reference counting vs. other methods of collecting garbage is definitely a difficult topic on today's architectures, and I can say at least it took me quite a while to build some reasonable mental model of the related tradeoffs. Andrei
Dec 18 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 18, 2011 02:46:01 Andrei Alexandrescu wrote:
 If this is a new notion, it might take a while for all of its aspects to
 sink in. I'm not saying that to be smug - reference counting vs. other
 methods of collecting garbage is definitely a difficult topic on today's
 architectures, and I can say at least it took me quite a while to build
 some reasonable mental model of the related tradeoffs.
After thinking about this a bit, I don't think that custom allocators are really going to work with classes very well. Unless you allocate the entire class with the custom allocator (as opposed to just telling the container to use the custom allocator), the memory won't be cleaned up until the object is collected by the GC (even though it's using a custom allocator to allocate its internals), in which case, what does the custom allocator really buy us? So, the idea of just passing the allocator to the container doesn't seem to cut it for a class. The object itself needs to be allocated with the custom allocator. On top of that, if you're allocating the container with a custom allocator, doesn't that risk issues like what the to-be-deprecated scope modifier causes? You practically end up having to wrap the class in a struct anyway, in which case, what's the point of using a class? Just in general, it's much harder to control the memory if the container is a class rather than a struct. With a struct, you can do pretty much whatever you want with the memory. Its lifetime is strictly controlled, which gives much greater opportunites to optimize its memory usage. With a class, it's going to be difficult to completely divorce the container from the GC in a clean manner. With a struct, we could easily make the default not use the GC at all, which would make the default use of the containers much more efficient. You also wouldn't have to worry about creating the container itself with an allocator and all of the potential issues that that brings. I don't particularly like the idea of the cost of passing the container around as a struct (since it's few member variables will have to be memcopied and the ref-counting will have to be done), but that's not a huge cost (though it _is_ a pervasive one), and I'm begginning to think that it's a necessary one if we want containers to be able to properly and efficiently manage their own memory. Depending on the use case, the benefits from the containers being able to fully manage their own memory without the programmer having to do anything special could easily outweigh the additional cost of passing a struct around. And if ranges over such containers are passed around more frequently than the containers themselves (as I would expect), then the minor cost of passing the container around is even less of a big deal, since it would primarily be the ranges being passed around. So, I'm beginning to think that we're going to have to go the struct route. - Jonathan M Davis
Dec 18 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-12-19 05:12, Jonathan M Davis wrote:
 On Sunday, December 18, 2011 02:46:01 Andrei Alexandrescu wrote:
 If this is a new notion, it might take a while for all of its aspects to
 sink in. I'm not saying that to be smug - reference counting vs. other
 methods of collecting garbage is definitely a difficult topic on today's
 architectures, and I can say at least it took me quite a while to build
 some reasonable mental model of the related tradeoffs.
After thinking about this a bit, I don't think that custom allocators are really going to work with classes very well. Unless you allocate the entire class with the custom allocator (as opposed to just telling the container to use the custom allocator), the memory won't be cleaned up until the object is collected by the GC (even though it's using a custom allocator to allocate its internals), in which case, what does the custom allocator really buy us? So, the idea of just passing the allocator to the container doesn't seem to cut it for a class. The object itself needs to be allocated with the custom allocator. On top of that, if you're allocating the container with a custom allocator, doesn't that risk issues like what the to-be-deprecated scope modifier causes? You practically end up having to wrap the class in a struct anyway, in which case, what's the point of using a class? Just in general, it's much harder to control the memory if the container is a class rather than a struct. With a struct, you can do pretty much whatever you want with the memory. Its lifetime is strictly controlled, which gives much greater opportunites to optimize its memory usage. With a class, it's going to be difficult to completely divorce the container from the GC in a clean manner. With a struct, we could easily make the default not use the GC at all, which would make the default use of the containers much more efficient. You also wouldn't have to worry about creating the container itself with an allocator and all of the potential issues that that brings. I don't particularly like the idea of the cost of passing the container around as a struct (since it's few member variables will have to be memcopied and the ref-counting will have to be done), but that's not a huge cost (though it _is_ a pervasive one), and I'm begginning to think that it's a necessary one if we want containers to be able to properly and efficiently manage their own memory. Depending on the use case, the benefits from the containers being able to fully manage their own memory without the programmer having to do anything special could easily outweigh the additional cost of passing a struct around. And if ranges over such containers are passed around more frequently than the containers themselves (as I would expect), then the minor cost of passing the container around is even less of a big deal, since it would primarily be the ranges being passed around. So, I'm beginning to think that we're going to have to go the struct route. - Jonathan M Davis
The Tango runtime has added a new method in Object, "dispose". This method is called when "scope" is used or "delete" is explicitly called. I don't know if that would help. -- /Jacob Carlborg
Dec 18 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:
 The Tango runtime has added a new method in Object, "dispose". This
 method is called when "scope" is used or "delete" is explicitly called.
 I don't know if that would help.
That sounds a lot like using clear, though clear doesn't free memory unless the finalizer does (and I'm not sure that managing memory with finalizers actually works right now; there were issues with that previously - but it might have only been GC memory, so perhaps malloc and free would still work). The issue with them is escaping references. You can easily end up with references to data which has been destroyed. It also makes them harder to pass around unless you essentially wrap the container in ref-counted struct. The lifetime of the container must be managed if you really want to be able to use custom allocators or have the container be as efficient as possible with regards to its memory usage in the general case. With a struct, it manages itself. It can do whatever it wants to with memory internally, and the combination of its constructor, postblit constructor, and destructor allows it to take care of it itself and clean up its memory when it's destroyed. You also don't have issues of stuff referring to a container which doesn't exist anymore (unless you add pointers to the container into the mix and then move or destroy the container). With classes, if they're owned by the GC, then it's the GC that manages their lifetime. It destroys them when they're no longer referenced and it runs a collection cycle. You have no control over its lifetime, and even if it tries to manage its memory internally on some level, that memory won't be cleaned up until the GC collects the container itself. So, if you want to manage the lifetime of the class, you can't use a naked reference anymore. You need a way to get the GC to collect it, or you need to have it somewhere else other than the GC heap. If you use something like Scoped, then its lifetime will be tied to the scope that it's in, but you risk references to it escaping, and limiting the container to a particular scope could be far too limiting anyway. That being the case, if you want to use an allocator other than the GC for the class itself, then you're probably going to have to stick it in a struct. That struct could then manage the container's lifetime on some level. It would probably have to use ref-counting and then call clear on the container when the ref-count hits zero. That should then allow the container to at least clean up its internal memory, assuming that that's not on the GC heap, but unless you use emplace on non-GC heap memory, the container itself still can't be collected until the GC collects it (since clear destroys it but doesn't free its memory). Not to mention, if you're using a ref-counted struct to hold a class in order to manage that class' lifetime, why on earth did you make it a class in the first place? If what you're suggesting with dispose is that it would act like clear except that it would actually free the container, well that pretty much goes against the direction that we've been trying to go with the GC, which is to not have the programmer explicitly freeing anything on the GC heap (which is why delete is being removed from the language). I really think that if we want deterministic destruction of containers, they need to be structs. And if we want to use custom allocators with containers, I don't see how we could reasonably expect them to be of much use unless we're expecting that programmers will explicitly call clear on them when they're done with them. So, while the fact that containers need to be reference types does imply that classes are the way to go, I think that we're going to have to go with structs if we want their memory management to be more efficient than simply letting the GC handle it. - Jonathan M Davis
Dec 20 2011
next sibling parent reply "foobar" <foo bar.com> writes:
On Wednesday, 21 December 2011 at 05:08:27 UTC, Jonathan M Davis 
wrote:
 On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:
 The Tango runtime has added a new method in Object, "dispose". 
 This
 method is called when "scope" is used or "delete" is 
 explicitly called.
 I don't know if that would help.
That sounds a lot like using clear, though clear doesn't free memory unless the finalizer does (and I'm not sure that managing memory with finalizers actually works right now; there were issues with that previously - but it might have only been GC memory, so perhaps malloc and free would still work). The issue with them is escaping references. You can easily end up with references to data which has been destroyed. It also makes them harder to pass around unless you essentially wrap the container in ref-counted struct. The lifetime of the container must be managed if you really want to be able to use custom allocators or have the container be as efficient as possible with regards to its memory usage in the general case. With a struct, it manages itself. It can do whatever it wants to with memory internally, and the combination of its constructor, postblit constructor, and destructor allows it to take care of it itself and clean up its memory when it's destroyed. You also don't have issues of stuff referring to a container which doesn't exist anymore (unless you add pointers to the container into the mix and then move or destroy the container). With classes, if they're owned by the GC, then it's the GC that manages their lifetime. It destroys them when they're no longer referenced and it runs a collection cycle. You have no control over its lifetime, and even if it tries to manage its memory internally on some level, that memory won't be cleaned up until the GC collects the container itself. So, if you want to manage the lifetime of the class, you can't use a naked reference anymore. You need a way to get the GC to collect it, or you need to have it somewhere else other than the GC heap. If you use something like Scoped, then its lifetime will be tied to the scope that it's in, but you risk references to it escaping, and limiting the container to a particular scope could be far too limiting anyway. That being the case, if you want to use an allocator other than the GC for the class itself, then you're probably going to have to stick it in a struct. That struct could then manage the container's lifetime on some level. It would probably have to use ref-counting and then call clear on the container when the ref-count hits zero. That should then allow the container to at least clean up its internal memory, assuming that that's not on the GC heap, but unless you use emplace on non-GC heap memory, the container itself still can't be collected until the GC collects it (since clear destroys it but doesn't free its memory). Not to mention, if you're using a ref-counted struct to hold a class in order to manage that class' lifetime, why on earth did you make it a class in the first place? If what you're suggesting with dispose is that it would act like clear except that it would actually free the container, well that pretty much goes against the direction that we've been trying to go with the GC, which is to not have the programmer explicitly freeing anything on the GC heap (which is why delete is being removed from the language). I really think that if we want deterministic destruction of containers, they need to be structs. And if we want to use custom allocators with containers, I don't see how we could reasonably expect them to be of much use unless we're expecting that programmers will explicitly call clear on them when they're done with them. So, while the fact that containers need to be reference types does imply that classes are the way to go, I think that we're going to have to go with structs if we want their memory management to be more efficient than simply letting the GC handle it. - Jonathan M Davis
I disagree with the above conclusion. you conflate two issues that are orthogonal: a. value vs. ref semantics which is already seemed to be decided in favor of the latter and hence classes. b. memory and lifetime management The containers should allow for (disregard the specifics of the syntax): Container a = new(SharedMemAllocator) LinkedList(); Container b = new(MallocAllocator) LinkedList(); Container c = new(GC) LinkedList(); When adding an item to the above containers the relevant allocator will enact its policy about intermixing with other allocators - by default the item will be copied if it comes from a separate allocator. I don't see anything here that forces the use of structs instead of classes.
Dec 20 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, December 21, 2011 08:07:23 foobar wrote:
 I disagree with the above conclusion. you conflate two issues
 that are orthogonal:
 a. value vs. ref semantics which is already seemed to be decided
 in favor of the latter and hence classes. b. memory and lifetime
 management
 
 The containers should allow for (disregard the specifics of the
 syntax):
 
 Container a = new(SharedMemAllocator) LinkedList();
 Container b = new(MallocAllocator) LinkedList();
 Container c = new(GC) LinkedList();
 
 When adding an item to the above containers the relevant
 allocator will enact its policy about intermixing with other
 allocators - by default the item will be copied if it comes from
 a separate allocator. I don't see anything here that forces the
 use of structs instead of classes.
And if they're classes and not managed by the GC nor in a struct which manages their lifetime, how are they going to be freed? Does the user have to explicitly free them themselves? How is that better than using a ref-counted struct? And no, using reference semantics does _not_ require classes. A class is just the easiest way to get reference semantics. It doesn't necessarily mean that it's the best way. That depends on the context. - Jonathan M Davis
Dec 20 2011
parent "foobar" <foo bar.com> writes:
On Wednesday, 21 December 2011 at 07:30:45 UTC, Jonathan M Davis 
wrote:
 And if they're classes and not managed by the GC nor in a 
 struct which manages their lifetime, how are they going to be 
 freed? Does the user have to explicitly free them themselves? 
 How is that better than using a ref-counted struct?
Memory management should be the responsibility of the allocator and it shouldn't be limited only to a single policy of ref-counting which has many disadvantages.
 And no, using reference semantics does _not_ require classes. A 
 class is just the easiest way to get reference semantics. It 
 doesn't necessarily mean that it's the best way. That depends 
 on the context.

 - Jonathan M Davis
I didn't use the word 'require' anywhere, but it is indeed by far the simplest and best way to define a ref-type. All I'm saying is that when travelling from NYC to Washington DC through China, you really should have good reasons to not use the direct route.
Dec 21 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/21/11 1:07 AM, foobar wrote:
 The containers should allow for (disregard the specifics of the syntax):

 Container a = new(SharedMemAllocator) LinkedList();
 Container b = new(MallocAllocator) LinkedList();
 Container c = new(GC) LinkedList();

 When adding an item to the above containers the relevant allocator will
 enact its policy about intermixing with other allocators - by default
 the item will be copied if it comes from a separate allocator. I don't
 see anything here that forces the use of structs instead of classes.
(Please don't overquote. It took me a million years to scroll all the way down on my phone. Thanks.) This is an alluring proposition but it's a lot more difficult than one might think. The problem is the entire container design is different if e.g. using refcounting vs. classic garbage collection. You may want to try designing a simple container and isolate all memory/lifetime-specific issues into an allocator. See what allocator interface you come up with. I was unable to solve this problem, but it's possible it has a solution. Andrei
Dec 21 2011
parent "foobar" <foo bar.com> writes:
On Wednesday, 21 December 2011 at 16:39:13 UTC, Andrei 
Alexandrescu wrote:
 (Please don't overquote. It took me a million years to scroll 
 all the way down on my phone. Thanks.)

 This is an alluring proposition but it's a lot more difficult 
 than one might think. The problem is the entire container 
 design is different if e.g. using refcounting vs. classic 
 garbage collection.

 You may want to try designing a simple container and isolate 
 all memory/lifetime-specific issues into an allocator. See what 
 allocator interface you come up with. I was unable to solve 
 this problem, but it's possible it has a solution.


 Andrei
Could you give an example to illustrate this design difference? Which container would be affected by this? There's already one good allocator API design I'm aware of: http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2005/n1850.pdf Can we reuse the design/ideas or can you see problems with it?
Dec 21 2011
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-12-21 06:07, Jonathan M Davis wrote:
 On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:
 The Tango runtime has added a new method in Object, "dispose". This
 method is called when "scope" is used or "delete" is explicitly called.
 I don't know if that would help.
That sounds a lot like using clear, though clear doesn't free memory unless the finalizer does (and I'm not sure that managing memory with finalizers actually works right now; there were issues with that previously - but it might have only been GC memory, so perhaps malloc and free would still work). The issue with them is escaping references. You can easily end up with references to data which has been destroyed. It also makes them harder to pass around unless you essentially wrap the container in ref-counted struct. The lifetime of the container must be managed if you really want to be able to use custom allocators or have the container be as efficient as possible with regards to its memory usage in the general case. With a struct, it manages itself. It can do whatever it wants to with memory internally, and the combination of its constructor, postblit constructor, and destructor allows it to take care of it itself and clean up its memory when it's destroyed. You also don't have issues of stuff referring to a container which doesn't exist anymore (unless you add pointers to the container into the mix and then move or destroy the container). With classes, if they're owned by the GC, then it's the GC that manages their lifetime. It destroys them when they're no longer referenced and it runs a collection cycle. You have no control over its lifetime, and even if it tries to manage its memory internally on some level, that memory won't be cleaned up until the GC collects the container itself. So, if you want to manage the lifetime of the class, you can't use a naked reference anymore. You need a way to get the GC to collect it, or you need to have it somewhere else other than the GC heap.
With "dispose" it would give you the option to control the lifetime of the object, in some cases.
 If you use something like Scoped, then its lifetime will be tied to the scope
 that it's in, but you risk references to it escaping, and limiting the
 container to a particular scope could be far too limiting anyway. That being
 the case, if you want to use an allocator other than the GC for the class
 itself, then you're probably going to have to stick it in a struct. That
 struct could then manage the container's lifetime on some level. It would
 probably have to use ref-counting and then call clear on the container when
 the ref-count hits zero. That should then allow the container to at least
 clean up its internal memory, assuming that that's not on the GC heap, but
 unless you use emplace on non-GC heap memory, the container itself still can't
 be collected until the GC collects it (since clear destroys it but doesn't
 free its memory). Not to mention, if you're using a ref-counted struct to hold
 a class in order to manage that class' lifetime, why on earth did you make it
 a class in the first place?

 If what you're suggesting with dispose is that it would act like clear except
 that it would actually free the container, well that pretty much goes against
 the direction that we've been trying to go with the GC, which is to not have
 the programmer explicitly freeing anything on the GC heap (which is why delete
 is being removed from the language).
I thought that one of the problems with the GC and memory was that you can't rely on when/if the destructors are called. With "dispose" it would give the developer a reliable method to cleanup resources, assuming "scope" or "delete" is used.
 I really think that if we want deterministic destruction of containers, they
 need to be structs. And if we want to use custom allocators with containers, I
 don't see how we could reasonably expect them to be of much use unless we're
 expecting that programmers will explicitly call clear on them when they're
 done with them. So, while the fact that containers need to be reference types
 does imply that classes are the way to go, I think that we're going to have to
 go with structs if we want their memory management to be more efficient than
 simply letting the GC handle it.

 - Jonathan M Davis
I was thinking that classes could be the safe and default but and scope/delete could be used as an optimization. -- /Jacob Carlborg
Dec 20 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, December 21, 2011 08:44:10 Jacob Carlborg wrote:
 I thought that one of the problems with the GC and memory was that you
 can't rely on when/if the destructors are called. With "dispose" it
 would give the developer a reliable method to cleanup resources,
 assuming "scope" or "delete" is used.
 I was thinking that classes could be the safe and default but and
 scope/delete could be used as an optimization.
Well, both delete and scope are going away, so std.typecons.scoped would have to be used instead, but that only really works for local variables. It would probably work for a member variable as well, but in both cases, you have to worry about the class going away when Scoped goes away and how that affects all of the references to it. If, on other hand, you use a ref-counted struct, it takes care of itself without the risks of the container going away while references to it still exist (unless you keep a pointer to it somewhere instead of the ref-counted struct). So, ref-counted structs would be much safer, and they could use advanced memory management internally by default, making them more efficient by default. What I really don't get, however, is how custom allocators would be of much use for classes unless the entire class is allocated that way (in which case, it's probably going to end up in a struct of some kind anyway unless you want the same issues that you get with scoped), since if the class is on the GC heap, its internal memory management couldn't release any of it until the container is collected by the GC. So, it seems to me that the desire for custom allocators would favor the use of ref-counted structs over classes. - Jonathan M Davis
Dec 20 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-12-21 08:53, Jonathan M Davis wrote:
 On Wednesday, December 21, 2011 08:44:10 Jacob Carlborg wrote:
 I thought that one of the problems with the GC and memory was that you
 can't rely on when/if the destructors are called. With "dispose" it
 would give the developer a reliable method to cleanup resources,
 assuming "scope" or "delete" is used.
 I was thinking that classes could be the safe and default but and
 scope/delete could be used as an optimization.
Well, both delete and scope are going away, so std.typecons.scoped would have to be used instead, but that only really works for local variables. It would probably work for a member variable as well, but in both cases, you have to worry about the class going away when Scoped goes away and how that affects all of the references to it.
I was thinking if there were good uses for "delete" and "scope" they could stay. I think the "dispose" method in Tango makes them more useful.
 If, on other hand, you use a ref-counted struct, it takes care of itself
 without the risks of the container going away while references to it still
 exist (unless you keep a pointer to it somewhere instead of the ref-counted
 struct). So, ref-counted structs would be much safer, and they could use
 advanced memory management internally by default, making them more efficient by
 default.

 What I really don't get, however, is how custom allocators would be of much
 use for classes unless the entire class is allocated that way (in which case,
 it's probably going to end up in a struct of some kind anyway unless you want
 the same issues that you get with scoped), since if the class is on the GC
 heap, its internal memory management couldn't release any of it until the
 container is collected by the GC. So, it seems to me that the desire for
 custom allocators would favor the use of ref-counted structs over classes.

 - Jonathan M Davis
I sounds like a struct is what should be used, it will be most flexible as well. -- /Jacob Carlborg
Dec 21 2011
prev sibling parent reply Caligo <iteronvexor gmail.com> writes:
On Sun, Dec 18, 2011 at 10:12 PM, Jonathan M Davis <jmdavisProg gmx.com>wrote:

 So, I'm beginning to think that we're going to have to go the struct route.

 - Jonathan M Davis
Two questions: 1. If you guys, the D experts, are having such a difficult time with this, what happens to the rest of us when we need to implement data structures that are not offered by Phobos? Do we just wait and see what happens with Phobos and learn from it? 2. Are the new containers going to be multi-threaded? i.e., will I be able to insert elements into a container from multiple threads?
Dec 20 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/20/11 10:15 PM, Caligo wrote:
 Two questions:

 1. If you guys, the D experts, are having such a difficult time with
 this, what happens to the rest of us when we need to implement data
 structures that are not offered by Phobos?
Good containers are hard to define. It took the C++ community ten years and a great mathematician to define some passable ones. I have no worries - it's a hard task.
 Do we just wait and see what
 happens with Phobos and learn from it?
No need. The basic design is pretty clear - containers offer ranges, and algorithms use ranges.
 2.  Are the new containers going to be multi-threaded?  i.e., will I be
 able to insert elements into a container from multiple threads?
Containers will not be implicitly shared. We'll define specialized shared containers. Andrei
Dec 20 2011
parent reply so <so so.so> writes:
On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 a great mathematician
Name please?
Dec 21 2011
parent reply "Froglegs" <lugtug gmail.com> writes:
On Wednesday, 21 December 2011 at 12:50:44 UTC, so wrote:
 On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 a great mathematician
Name please?
I believe he is referring to http://en.wikipedia.org/wiki/Alexander_Stepanov
Dec 21 2011
parent so <so so.so> writes:
On Wed, 21 Dec 2011 14:56:28 +0200, Froglegs <lugtug gmail.com> wrote:

 On Wednesday, 21 December 2011 at 12:50:44 UTC, so wrote:
 On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 a great mathematician
Name please?
I believe he is referring to http://en.wikipedia.org/wiki/Alexander_Stepanov
Thank you Mr Frog. I didn't know the "mathematician" part of him. I am warning you Java (or any die hard OOP) guys, don't read his articles.
Dec 21 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 18 Dec 2011 01:15:40 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/17/11 7:52 PM, Jonathan M Davis wrote:
 The only reason that I can think of to use a reference-counted struct  
 instead
 of a class is becuse then it's easier to avoid the GC heap entirely.   
 Almost
 all of a container's memory is going to end up on the heap regardless,  
 because
 the elements almost never end up in the container itself.
Being on the heap is not the main issue. The main issue is the data becoming garbage once all references are gone.
This whole thread of discussion is somewhat more complicated than it has to be. A ref-counted type is implemented via a reference to allocated data. That data can be a class instance or a struct pointer. Using a reference style struct just introduces problems you don't have to worry about with a class. It gains you nothing except having to crowbar the language into believing your struct is really a reference type. And that's really hard to do. -Steve
Dec 19 2011
prev sibling parent reply Jerry <jlquinn optonline.net> writes:
Jonathan M Davis <jmdavisProg gmx.com> writes:

 On Saturday, December 17, 2011 17:31:46 Andrei Alexandrescu wrote:
 On 12/13/11 9:08 PM, Jonathan M Davis wrote:
 Is the plan for std.container still to have all of its containers be
 final classes (classes so that they're reference types and final so
 that their functions are inlinable)? Or has that changed? I believe
 that Andrei said something recently about discussing reference counting
 and containers with Walter.
 
 The reason that I bring this up is that Array and SList are still
 structs, and the longer that they're structs, the more code that will
 break when they get changed to classes. Granted, some level of code
 breakage may occur when we add custom allocators to them, but since
 that would probably only affect the constructor (and preferably
 wouldn't affect anything if you want to simply create a container with
 the GC heap as you would now were Array and SList classes), the
 breakage for that should be minimal.
 
 Is there any reason for me to not just go and make Array and SList final
 classes and create a pull request for it?
 
 - Jonathan M Davis
Apologies for being slow on this. It may be a fateful time to discuss that right now, after all the discussion of what's appropriate for stdlib vs. application code etc. As some of you know, Walter and I went back and forth several times on this. First, there was the issue of making containers value types vs. reference types. Making containers value types would be in keep with the STL approach. However, Walter noted that copying entire containers by default is most often NOT desirable and there's significant care and adornments in C++ programs to make sure that that default behavior is avoided (e.g. adding const& to function parameters). So we decided to make containers reference types, and that seemed to be a good choice. The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type. Fast forward a couple of months, a few unslept nights, and a bunch of newsgroup and IRC conversations. Several additional pieces came together. The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs. This cycle of thought has happened twice, and the evidence coming the second time has been stronger. The first time around I went about and started implementing std.container with reference counting in mind. The code is not easy to write, and is not to be recommended for most types, hence my thinking (at the end of the first cycle) that we should switch to class containers. One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic. Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T. Please discuss.
The only reason that I can think of to use a reference-counted struct instead of a class is becuse then it's easier to avoid the GC heap entirely. Almost all of a container's memory is going to end up on the heap regardless, because the elements almost never end up in the container itself. They're in a dynamic array or in nodes or something similar. So, whether the container is ref- counted or a class is almost irrelevant. If anything making it a class makes more sense, because then it doesn't have to worry about the extra cost of ref- counting. However, once we bring allocators into the picture, the situation changes slightly. In both cases, the elements themselves end up where the allocator puts them. However, in the case of a struct, the member variables themselves end up on the stack, whereas with the class, they end up on the GC heap unless the programmer puts the object somewhere else with emplace or possibly with a function on the allocator. The other concern is that with a class, it's immediately clear that you're dealing with a reference, but that's much easier to miss with a struct when structs are almost always value types. Also, some containers can't be an a usable state from their init value (e.g. RedBlackTree), so they're more awkward as structs. disable might fix that problem, but I don't know. It would be also be awkward to not be able to just declare a RedBlackTree without immediately initializing it. So, in general, I think that final classes make more sense. The ref-counted struct is just trying to do what a class does already, and the class doesn't have the extra overhead of the ref-counting. The only concern is the fact that even when using a custom allocator, the few member variables that the container has end up on the GC heap unless the programmer goes to extra effort to avoid it, and part of the point of using the custom allocatorc is specifically to avoid the GC heap. The allocator API could make that easy to deal with though by having a function which takes the type and the arguments for the constructor and constructs it for you. e.g. auto arr = custom.alloc!Array(4, 5, 7); So, I favor final classes. I just don't see much benefit in ref-counted structs. They're more confusing and generally harder to deal with, with the one exception being avoiding the GC heap for the member variables of the container. - Jonathan M Davis
One advantage of a struct container is that when you make it a member of a struct or class that will have lots of instances and therefore lots of tiny containers. In C++ at my work, we embed tiny vectors in objects all the time. If you have a situation where there's enough of the objects that the memory actually matters, and the vectors need to be resizeable but are often small, the container overhead matters. With a class, you've added a reference pointer, vtable, typeinfo on top of the storage for the container itself. Also, the extra dereference to get from the object to its member container is often going to be a cache miss, slowing things down further. So if you embed an Array class in an object, you pay for 2 pointer dereferences to get to the data rather than one. With a class you'll have to roll your own to avoid the costs. Jerry
Dec 20 2011
parent "Froglegs" <lugtug gmail.com> writes:
   I don't really think ref counted struct vs class is fair, 
because in reality most containers don't need ref counting.  I 
can't think of one instance in C++ where I stuck a container 
directly in a shared_ptr or anything similar.

Also as far I as I can tell making it a class would bloat it with 
unnecessary data(vtable), and being it is common to have many 
many instances of these containers, that doesn't sound like such 
a great thing.
Dec 20 2011
prev sibling parent "foobar" <foo bar.com> writes:
On Saturday, 17 December 2011 at 23:31:47 UTC, Andrei 
Alexandrescu wrote:
 On 12/13/11 9:08 PM, Jonathan M Davis wrote:
 Is the plan for std.container still to have all of its 
 containers be final
 classes (classes so that they're reference types and final so 
 that their
 functions are inlinable)? Or has that changed? I believe that 
 Andrei said
 something recently about discussing reference counting and 
 containers with
 Walter.

 The reason that I bring this up is that Array and SList are 
 still structs, and
 the longer that they're structs, the more code that will break 
 when they get
 changed to classes. Granted, some level of code breakage may 
 occur when we add
 custom allocators to them, but since that would probably only 
 affect the
 constructor (and preferably wouldn't affect anything if you 
 want to simply
 create a container with the GC heap as you would now were 
 Array and SList
 classes), the breakage for that should be minimal.

 Is there any reason for me to not just go and make Array and 
 SList final
 classes and create a pull request for it?

 - Jonathan M Davis
Apologies for being slow on this. It may be a fateful time to discuss that right now, after all the discussion of what's appropriate for stdlib vs. application code etc. As some of you know, Walter and I went back and forth several times on this. First, there was the issue of making containers value types vs. reference types. Making containers value types would be in keep with the STL approach. However, Walter noted that copying entire containers by default is most often NOT desirable and there's significant care and adornments in C++ programs to make sure that that default behavior is avoided (e.g. adding const& to function parameters). So we decided to make containers reference types, and that seemed to be a good choice. The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type. Fast forward a couple of months, a few unslept nights, and a bunch of newsgroup and IRC conversations. Several additional pieces came together. The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs. This cycle of thought has happened twice, and the evidence coming the second time has been stronger. The first time around I went about and started implementing std.container with reference counting in mind. The code is not easy to write, and is not to be recommended for most types, hence my thinking (at the end of the first cycle) that we should switch to class containers. One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic. Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T. Please discuss. Andrei
My thoughts are: - value vs. reference: definitely reference hence classes - reference type - BAD idea. it adds code bloat, code complexity and hurts performance, not to mention it conflicts with concurrency which is the way of the future. - allocators - EXCELLENT idea. As discussed previously the design must not be the c++ one. - In addition to the general purpose ref. containers we should provide a few special value type containers such as small arrays with primitive types, a FlagSet with 1 bit per flag, etc. - safety - definitely safe by default and this is wat we should concentrate ATM. unsafe could be added later. ref-counted seems redundant to me if we provide allocators. There should be an RC_alloc for this.
Dec 20 2011
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, December 20, 2011 22:15:43 Caligo wrote:
 On Sun, Dec 18, 2011 at 10:12 PM, Jonathan M Davis 
<jmdavisProg gmx.com>wrote:
 So, I'm beginning to think that we're going to have to go the struct
 route.
 
 - Jonathan M Davis
Two questions: 1. If you guys, the D experts, are having such a difficult time with this, what happens to the rest of us when we need to implement data structures that are not offered by Phobos? Do we just wait and see what happens with Phobos and learn from it?
Containers are fairly straightforward in general. So, if you need to implement your own, it's not generally all that hard. The problem here is that the standard library needs to be efficient in the general case. We don't know under what circumstances these containers are going to be used, so we're under different constraints than someone who's writing a container for their own use. Clearly, containers can be either structs or classes. Both will work. The issue is which will be more efficient in the general case. And if you're implementing your own containers, you can choose to do what Phobos does or go another route - especially since you're going to know under what circumstances you're using your own containers. For the most part, I think that the issues being discussed with regards to classes vs structs are as big a deal in this case as they are because it's the standard library. Ideally, Phobos will take the most efficient route such that if you want your containers to be as absolutely as efficient as possible, you'd do the same thing that they're doing with your own containers, but you wouldn't necessarily need to. Since, as Andrei has been pointing out, the standard library is the sort of place where you go to extremes to make code efficient, and you don't always do things in the way that would normally be considered the nice way to do it, because it _needs_ to have that efficiency and can afford to be nastier code, since it's generally the experts who are working on it. So yes, learn from Phobos, but you don't have to do everything the way that it does.
 2.  Are the new containers going to be multi-threaded?  i.e., will I be
 able to insert elements into a container from multiple threads?
No. That would introduce unnecessary overhead. If you want synchronized containers, then wrap the standard ones in your own. Maybe some sort of synchronized wrapper will be provided by Phobos at some point, but the containers themselves are definitely not going to be. You can add the multi- threading protections and the cost that they bring on top of an implementation that doesn't have them, but if you have such protections built-in, those using the containers can't remove them and would be stuck with the performance cost. - Jonathan M Davis
Dec 20 2011
parent "foobar" <foo bar.com> writes:
On Wednesday, 21 December 2011 at 04:45:48 UTC, Jonathan M Davis 
wrote:
 On Tuesday, December 20, 2011 22:15:43 Caligo wrote:
[snip]
 2.  Are the new containers going to be multi-threaded?  i.e., 
 will I be
 able to insert elements into a container from multiple threads?
No. That would introduce unnecessary overhead. If you want synchronized containers, then wrap the standard ones in your own. Maybe some sort of synchronized wrapper will be provided by Phobos at some point, but the containers themselves are definitely not going to be. You can add the multi- threading protections and the cost that they bring on top of an implementation that doesn't have them, but if you have such protections built-in, those using the containers can't remove them and would be stuck with the performance cost. - Jonathan M Davis
actually, a synchronized container doesn't really provide concurrent write access, it serialized the access of multiple threads by using a lock. What about true concurrent containers such that two threads can access concurrently the same container at different places? E.g Array could be divided into chunks and each chunk is owned by a separate thread, Tree that can divide it's sub-trees among several threads, etc..
Dec 20 2011