
digitalmars.D - improvement request - enabling by-value-containers

Simon Buerger <krox gmx.net> writes:
For every library it's a design decision whether containers should be 
value or reference types. In the C++ STL they are value types (i.e. the 
copy constructor does a real copy), while in Tango and Phobos the 
decision was to go for reference types, afaik. I would like to be 
able to write value types too, which currently isn't possible in a 
really good way. The following points would need some love (by-value 
containers are probably not the only area where they could be useful):

(1) Allow default-constructors for structs
I don't see a reason why "this(int foo)" is allowed but "this()" is 
not. There might be some useful non-trivial initialization to do for 
complex structs.

(2) const parameters by reference
If a parameter to a function is read-only, the right notation depends on 
the type of that parameter, i.e. "in" for simple stuff like ints, and 
"ref const" for big structures. Using "in" for big data implies a 
whole copy, even though it's constant, and using "ref const" for 
simple types is a useless indirection. This is a problem for generic 
code when the type is templated, because there is no way to switch 
between "in" and "ref const" with compile-time reflection.

Solution one: make "ref" a real type-constructor, so you could do the 
following (this is possible in C++):

static if(is(T == struct))
	alias ref const T const_type;
else
	alias const scope T const_type;
// "const scope" is (currently) equivalent to "in"
void foo(const_type x)

Solution two: let "in" decide whether to pass by reference or by value, 
depending on the type. Probably the better solution, because the 
programmer doesn't need to make the decision himself anymore.

(3) make foreach parameters constant
When you do "foreach(x;a)" the x value gets copied in each iteration. 
Once again, that matters for big types, especially when you have a 
copy constructor. The current workaround is prepending "ref": nothing 
gets copied, but the compiler won't know it is meant to be read-only. 
Solution: either allow "ref const" or "in" in foreach. Or you could 
even make x default to constant if not stated as "ref" explicitly. 
The last alternative seems logical to me, but it may break existing code.
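
For readers trying this today: as far as I know, current D compilers accept "ref const" on a foreach variable, which gives the no-copy, read-only behaviour asked for in (3). The sketch below is only an illustration; Big and sumIds are hypothetical names, not something from this thread.

```d
struct Big { int[256] payload; int id; }

// Hypothetical helper: sums the ids without copying any element.
int sumIds(const Big[] items)
{
    int total = 0;
    foreach (ref const item; items)   // no copy, and item is read-only
    {
        total += item.id;
        // item.id = 0;               // would not compile: item is const
    }
    return total;
}

void main()
{
    auto items = new Big[](3);
    foreach (i, ref item; items)      // plain ref: mutation is allowed
        item.id = cast(int) i + 1;
    assert(sumIds(items) == 6);
}
```

Note that nothing here is const by default; the read-only intent still has to be spelled out at each loop.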

Comments welcome,
Krox
Dec 08 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, December 08, 2010 14:14:57 Simon Buerger wrote:
 For Every lib its a design descision if containers should be value- or
 reference-types. In C++ STL they are value-types (i.e. the
 copy-constructor does a real copy), while in tango and phobos the
 descision was to go for reference-types afaik, but I would like to be
 able to write value-types too, which isn't possible (in a really good
 way) currently. Following points would need some love (by-value
 containers are probably not the only area, where these could be useful)

In my experience it's extremely rare that it makes any sense to copy a container on a regular basis. Having an easy means of creating a deep copy of a container, or of copying the elements from one container to another efficiently, would be good, but having containers be value types is almost always a bad idea. Copying containers just isn't a typical need, certainly not enough to have them copied just because you passed them to a function or returned them from one. I think that reference types for containers are very much the correct decision. There should be good ways to copy containers, but copying shouldn't be the default for much of anything in the way of containers.
 (1) Allow default-constructors for structs
 I don't see a reason, why "this(int foo)" is allowed, but "this()" is
 not. There might be some useful non-trivial init to do for complex
 structs.

It has to do with the init property. It has to be known at compile time for all types. For classes, that's easy because it's null, but for structs, that's what all of their member variables are directly initialized to. If you add a default constructor, then init would have to be whatever that constructor produced, which would shift it from compile time to run time. It should be possible to have default constructors which are limited in a number of ways (like having to be nothrow and possibly pure), but that hasn't been sorted out, and even if it is, plenty of cases where people want default constructors still wouldn't likely work. It just doesn't work to have default constructors which can run completely arbitrary code. You could get exceptions thrown in weird places and a variety of other problems which we can't have in situations where init is used. Hopefully, we'll get limited default constructors at some point, but it hasn't happened yet (and probably won't without a good proposal that deals with all of the potential issues), and regardless, it will never be as flexible as what C++ does. It's primarily a side effect of insisting that all variables be default initialized if they're not directly initialized.
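
To make the init point concrete, here is a minimal sketch. Matrix and its create() function are made-up names for illustration; the factory function is just one common workaround and not something proposed in this thread.

```d
import std.stdio : writeln;

struct Matrix
{
    int rows = 2;      // member initializers define Matrix.init
    int cols = 2;
    double[] data;     // null in Matrix.init, nothing is allocated

    // this() { ... }  // a no-argument constructor is rejected for structs

    // Hypothetical factory standing in for a default constructor.
    static Matrix create()
    {
        Matrix m;                                   // starts as Matrix.init
        m.data = new double[](m.rows * m.cols);     // run-time work goes here
        m.data[] = 0.0;
        return m;
    }
}

// Matrix.init is a compile-time constant, so the compiler can check this.
static assert(Matrix.init.rows == 2 && Matrix.init.cols == 2);

void main()
{
    Matrix a;                     // no user code runs, a is simply Matrix.init
    assert(a.data is null);

    auto b = Matrix.create();     // explicit run-time initialization
    assert(b.data.length == 4);
    writeln(a.data.length, " ", b.data.length);
}
```

The cost of this approach is that "Matrix a;" silently yields an object without storage, which is exactly the situation a by-value container would have to guard against.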
 (2) const parameters by reference
 If a parameter to a function is read-only, the right notion depends on
 the type of that parameter. I.e. "in" for simple stuff like ints, and
 "ref const" for big structures. Using "in" for big data implies a
 whole copy, even though it's constant, and using "ref const" for
 simple types is a useless indirection. This is a problem for generic
 code, when the type is templated, because there is now way to switch
 between "in" and "ref const" with compile-time-reflection.
 
 Solution one: make "ref" a real type-constructor, so you could do the
 following (this is possible in C++):
 
 static if(is(T == struct))
 	alias ref const T const_type;
 else
 	alias const scope T const_type;
 // "const scope" is (currently) equivalent to "in"
 void foo(const_type x)
 
 Solution two: let "in" decide wheather to pass by reference or value,
 depending on the type. Probably the better solution cause the
 programmer dont need to care of the descision himself anymore.

I think that auto ref is supposed to deal with some of this, but it's buggy at the moment, and I'm not sure exactly what it's supposed to do. There was some discussion on this one in a recent thread.
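
As a rough sketch of what auto ref does in compilers where it works: on a template function, an auto ref parameter binds by reference when the argument is an lvalue and by value when it is an rvalue. That is a different rule from the size-based choice proposed in (2), since an int lvalue is still passed by reference. Big and howPassed are hypothetical names used only for this illustration.

```d
import std.stdio : writeln;

struct Big { int[256] payload; }

// Hypothetical helper: reports how its argument arrived.
// auto ref is only available on template functions.
string howPassed(T)(auto ref const(T) x)
{
    return __traits(isRef, x) ? "by reference" : "by value";
}

void main()
{
    Big b;
    int i = 1;
    writeln(howPassed(b));      // lvalue argument
    writeln(howPassed(i));      // lvalue argument, even though int is small
    writeln(howPassed(Big()));  // rvalue argument
    writeln(howPassed(42));     // rvalue argument
}
```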
 (3) make foreach parameters constant
 when you do "foreach(x;a)" the x value gets copied in each iteration,
 once again, that matters for big types especially when you have a
 copy-constructor. Current work-around is prepending "ref": nothing
 gets copied, but the compiler wont know it is meant to be read-only.
 Solution: either allow "ref const" or "in" in foreach. Or you could
 even make x default to constant if not stated as "ref" explicitly.
 Last alternative seems logical to me, but it may break existing code.

I'd hate to see foreach variables be const by default. That would be overly limiting and would definitely break a lot of code. Making ref const work properly would be good (I think that it works in at least some cases) for structs that you don't want to be copied but wouldn't be all that useful otherwise. Nothing in D is const by default, and I think that making anything const by default would clash with the rest of the language. Particularly since then how would you make it mutable? No, it should be possible to have const refs to structs for foreach variables, but it shouldn't be the default. The language as a whole just does not support that.

- Jonathan M Davis
Dec 08 2010
Simon Buerger <krox gmx.net> writes:
On 08.12.2010 23:45, Jonathan M Davis wrote:
 On Wednesday, December 08, 2010 14:14:57 Simon Buerger wrote:
 For Every lib its a design descision if containers should be value- or
 reference-types. In C++ STL they are value-types (i.e. the
 copy-constructor does a real copy), while in tango and phobos the
 descision was to go for reference-types afaik, but I would like to be
 able to write value-types too, which isn't possible (in a really good
 way) currently. Following points would need some love (by-value
 containers are probably not the only area, where these could be useful)

It's extremely rare in my experience that it makes any sense to copy a container on a regular basis. Having an easy means of creating a deep copy of a container or copying the elements from one container to another efficiently would be good, but having containers be value types is almost always a bad idea. It's just not a typical need to need to copy containers - certainly not enough to have them be copied just because you passed them to a function or returned them from one. I think that reference types for containers is very much the correct decision. There should be good ways to copy containers, but copying shouldn't be the default for much of anything in the way of containers.

From a pragmatic viewpoint you are right, copying containers is rare. But on the other hand, classes imply a kind of identity, so that a set is a different object than another set with the very same elements. That feels wrong from an aesthetic or mathematical viewpoint. Furthermore, if you have for example a vector of vectors,

Vector!int row = [1,2,3];
auto vec = Vector!(Vector!int)(5, row);

then vec should be 5 rows, and not 5 times the same row.
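
A minimal sketch of the value semantics described above, assuming a struct with a postblit. This Vector is a toy written for this example, not a Phobos or Tango type, and makeRows is a hypothetical stand-in for the (5, row) constructor. Newer compilers also offer copy constructors as an alternative to the postblit.

```d
// Toy by-value container: every copy gets its own elements.
struct Vector(T)
{
    T[] data;

    // Postblit: runs after the bitwise copy, so the copy stops sharing storage.
    this(this) { data = data.dup; }
}

alias IntVec = Vector!int;

// Hypothetical helper mirroring Vector!(Vector!int)(5, row).
IntVec[] makeRows(size_t n, IntVec row)
{
    auto rows = new IntVec[](n);
    foreach (ref r; rows)
        r = row;                 // assignment copies, which triggers the postblit
    return rows;
}

void main()
{
    auto row = IntVec([1, 2, 3]);
    auto rows = makeRows(5, row);

    rows[0].data[0] = 99;
    assert(rows[1].data[0] == 1);   // the other rows are unaffected
    assert(row.data[0] == 1);       // and so is the original
}
```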
 (1) Allow default-constructors for structs
 I don't see a reason, why "this(int foo)" is allowed, but "this()" is
 not. There might be some useful non-trivial init to do for complex
 structs.

It has to do with the init property. It has to be known at compile-time for all types. For classes, that's easy because it's null, but for structs, that's what all of their member variables are directly initialized to. If you add a default constructor, then it would have to be to whatever that constructed them to, which would shift it from compile time to runtime. It should be possible to have default constructors which are definitely limited in a number of ways (like having to be nothrow and possibly pure), but that hasn't been sorted out, and even if it is, plenty of cases where people want default constructors still wouldn't likely work. It just doesn't work to have default constructors which can run completely arbitrary code. You could get exceptions thrown in weird places and a variety of other problems which we can't have in situations where init is used. Hopefully, we'll get limited default constructors at some point, but it hasn't happened yet (and probably won't without a good proposal that deals with all of the potentiall issues), and regardless, it will never be as flexible as what C++ does. It's primarily a side effect of insisting that all variables be default initialized if they're not directly initialized.

I partially see your point: the constructor would be called in places the programmer didn't expect. But actually, what's the problem with an exception? They can always happen anyway (at least outOfMemory).
 (2) const parameters by reference
 If a parameter to a function is read-only, the right notion depends on
 the type of that parameter. I.e. "in" for simple stuff like ints, and
 "ref const" for big structures. Using "in" for big data implies a
 whole copy, even though it's constant, and using "ref const" for
 simple types is a useless indirection. This is a problem for generic
 code, when the type is templated, because there is now way to switch
 between "in" and "ref const" with compile-time-reflection.

 Solution one: make "ref" a real type-constructor, so you could do the
 following (this is possible in C++):

 static if(is(T == struct))
 	alias ref const T const_type;
 else
 	alias const scope T const_type;
 // "const scope" is (currently) equivalent to "in"
 void foo(const_type x)

 Solution two: let "in" decide wheather to pass by reference or value,
 depending on the type. Probably the better solution cause the
 programmer dont need to care of the descision himself anymore.

I think that auto ref is supposed to deal with some of this, but it's buggy at the moment, and I'm not sure exactly what it's supposed to do. There was some discussion on this one in a recent thread.

Letting "in" decide would be cleaner IMO, but anyway, good to hear that the problem is recognized. I will look for the other thread.
 (3) make foreach parameters constant
 when you do "foreach(x;a)" the x value gets copied in each iteration,
 once again, that matters for big types especially when you have a
 copy-constructor. Current work-around is prepending "ref": nothing
 gets copied, but the compiler wont know it is meant to be read-only.
 Solution: either allow "ref const" or "in" in foreach. Or you could
 even make x default to constant if not stated as "ref" explicitly.
 Last alternative seems logical to me, but it may break existing code.

I'd hate to see foreach variables be const by default. That would be overly limiting and would definitely break a lot of code. Making ref const work properly would be good (I think that it works in at least some cases) for structs that you don't want to be copied but wouldn't be all that useful otherwise. Nothing in D is const by default, and I think that making anything const by default would clash with the rest of the language. Particularly since then how would you make it mutable? No, it should be possible to have const refs to structs for foreach variables, but it shouldn't be the default. The language as a whole just does not support that.

You are right that default-const would be contrary to the rest of the language, but when I think longer about this... the same default-const should apply to all function parameters. They should be input, output or inout. But the "mutable copy of the original", which is common in C/C++/D/everything alike, is actually pretty weird (modifying non-output parameters inside a function is considered bad style even in C++ and Java). But well, that would really be a step too big for D2... maybe I'll suggest it for D3 some day *g*

Krox
Dec 09 2010
Jesse Phillips <jessekphillips+D gmail.com> writes:
Simon Buerger Wrote:

 vector!int row = [1,2,3];
 auto vec = Vector!(Vector!int)(5, row);
 
 then vec should be 5 rows, and not 5 times the same row.

Why? You put row in there and said there were 5 of them.

vec[] = row.dup;

I believe that would be the correct syntax if you wanted to store 5 different vectors with the same content (works for arrays).
 I partially see your point, the constructor would be called in places 
 the programmer didnt expect, but actually, what's the problem with an 
 exception? They can always happen anyway (at least outOfMemory)

I think there is even more to it. init is used at compile time so that properties of the class/struct can be checked. I don't think exceptions are supported in CTFE.
 letting "in" decide would be cleaner IMO, but anyway good to hear that 
 problem is recognized. Will look for the other thread.

I'm not sure if the spec says "in" must be passed by reference, only that that is how it is done. I'd think it'd be up to the compiler.
 You are right that default-const would be contrary to the rest of the 
 language, but when I think longer about this... the same default-const 
 should apply for all function parameter. They should be input, output 
 or inout. But the "mutable copy of the original" which is common in 
 C/C++/D/everything-alike, is actually pretty weird. (modifying 
 non-output parameters inside a function is considered bad style even 
 in C++ and Java). But well, that would be really a step too big for 
 D2... maybe I'll suggest it for D3 some day *g*
 
 Krox

I believe Bearophile has beaten you to that. I think it is even in Bugzilla. I think it would only make sense to add it to D3 if it becomes common to mark function parameters as "in". But I agree it is easier to think "I want to modify this" than it is to say "I'm not modifying this, so it should be in". Though currently I don't think there is a way to mark the current default behavior.
Dec 09 2010
Simon Buerger <krox gmx.net> writes:
On 09.12.2010 23:39, Jesse Phillips wrote:
 Simon Buerger Wrote:

 vector!int row = [1,2,3];
 auto vec = Vector!(Vector!int)(5, row);

 then vec should be 5 rows, and not 5 times the same row.

Why? You put row in there and said there was 5 of them. vec[] = row.dup; I believe that would be the correct syntax if you wanted to store 5 different vectors of the same content (Works for arrays).

No, that line would duplicate row once, and store that same copy in every element of vec.
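
The difference can be checked with built-in arrays. This is a small sketch; sharedRows and separateRows are names made up for the example.

```d
// One dup, then the same slice is stored in every element.
int[][] sharedRows(int[] row, size_t n)
{
    auto vec = new int[][](n);
    vec[] = row.dup;
    return vec;
}

// One dup per element, so every row owns its own memory.
int[][] separateRows(int[] row, size_t n)
{
    auto vec = new int[][](n);
    foreach (ref r; vec)
        r = row.dup;
    return vec;
}

void main()
{
    int[] row = [1, 2, 3];

    auto a = sharedRows(row, 5);
    a[0][0] = 99;
    assert(a[1][0] == 99);    // all rows alias the same copy
    assert(row[0] == 1);      // the original is untouched

    auto b = separateRows(row, 5);
    b[0][0] = 99;
    assert(b[1][0] == 1);     // the rows are independent
}
```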
 I partially see your point, the constructor would be called in places
 the programmer didnt expect, but actually, what's the problem with an
 exception? They can always happen anyway (at least outOfMemory)

I think there is even more too it. init is used during compile time so properties of the class/struct can be checked. I don't think exceptions are supported for CTFE.
 letting "in" decide would be cleaner IMO, but anyway good to hear that
 problem is recognized. Will look for the other thread.

I'm not sure if the spec says in must be passed by reference, only that is how it is done. I'd think it'd be up to the compiler.

Other way around, "in" is currently passed by value, though the spec does not explicitly disallow by reference, so it might be implemented without even changing the spec.
 You are right that default-const would be contrary to the rest of the
 language, but when I think longer about this... the same default-const
 should apply for all function parameter. They should be input, output
 or inout. But the "mutable copy of the original" which is common in
 C/C++/D/everything-alike, is actually pretty weird. (modifying
 non-output parameters inside a function is considered bad style even
 in C++ and Java). But well, that would be really a step too big for
 D2... maybe I'll suggest it for D3 some day *g*

I believe Bearophile has beat you too that. Think it is even in Bugzilla. I think it would only make sense to add it to D3 if it becomes common to mark functions parameters as in. But I agree it is easier to think, I want to modify this then it is to say I'm not modifying this so it should be in. Though currently I don't think there is a way to mark the current default behavior.

Well, it would be good style to add in/out to each and every parameter there is, but I don't do it myself either (except in container implementations, where good style and the last bit of optimization seem important).

Krox
Dec 09 2010
Jonathan Schmidt-Dominé <devel the-user.org> writes:
Hi!

Just about my experiences: when trying to hack some algorithms quickly in 
Ruby I made a lot of mistakes because I had to care about a .clone 
everywhere and because Array.new(5, []) does not work as expected (sorry, 
but Array.new(5) { [] } is not nice). So in fact C++ made my life 
easier than the new, stylish, simple Ruby programming language, because of 
the great by-value containers in the STL.

However, some reasons for by-value-containers:
*First of all, you often have to deal with multi-dimensional data
structures: you map something to a map of lists of whatever, and you want
to manage such data in a simple and generic way. Simplifications or
extensions to the data structure should not force you to refactor all more
or less generic code fragments. For example, copying some entries around
should not look different just because you added a dimension to your data
structure. But without proper value semantics you are forced to do that,
because at some point you will have to switch from by-value to
by-reference because of limitations made somewhere.
*Another argument: it should be very simple (at least in C++ it is, I have
never had problems with it, I just added the & here and there) to handle
references to by-value types, but wrapping by-reference types into
by-value types is really ugly, although it may be the right thing
somewhere.
*By-value containers support more generic code. A copy_if on a
multi-dimensional container should of course copy the elements and not
just copy some references, and it would be bad if the generic
implementation had to test whether the type is by-reference but a
container supporting clone, then possibly use clone, worry about whether
it is a deep clone, etc. I just do not see a simple way to make such
generic algorithms easy to implement (even with new language features) if
by-value types are not fully supported and not used for containers.
*Whether or not you think by-value containers are good, a better
alternative for "in" would be great for generic code. In C++ I can use
something like parameter_type<T>::result to choose by-reference or
by-value automatically. It is not very nice, but it is simply impossible
in D: there are no reference types, there is no way to implement such
decisions in the parameter lists, and in normal code it is even more
impossible. But imagine there were a simple marker you could put in front
of a variable or parameter declaration that chooses by-ref/by-value
automatically, let's say §, and of course it should be const and scope.
You would put a § before your parameters, because you may not know whether
the type is big or not (there are sometimes big PODs, somebody may want to
pass a FILE object to a generic function or whatever). You would use a §
in a read-only foreach, and you would not have to bother about anything.
When extracting a container element temporarily (the pivot in quicksort or
whatever; you may want to sort your PODs, by-value containers, primitives,
pointers, class objects etc.), you would use a § and it would be perfect.
This would allow the implementation of by-value containers (of course
default constructors are also required), and it would allow writing more
generic high-quality code everywhere. In D3 this could even be the default
for function parameters; the C-compatible default behaviour is simply
nonsense.

Regards

The User
Dec 14 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/14/10 3:29 PM, Jonathan Schmidt-Dominé wrote:
 Hi!

 Just about my experiences: When trying to hack some algorithms quickly in
 Ruby I made a lot of mistakes because I had to care about a .clone
 everywhere and because Array.new(5, []) does not work as expected (sorry,
 but Array.new(5) { return [] } is not nice). So in fact C++ made my life
 easier than the new, stylish, simple Ruby-programming-language, because of
 the great by-value-containers in the STL.

Thanks for sharing.
 However, some reasons for by-value-containers:
 *First of all you often have to deal with mutli-dimensional data-
 structures,
 you map something to a map of lists of whatever and you want to manage
 suche
 data in a simple and generic way, simplification or extensions to the data-
 structure should not force you to refactor all more or less generic code-
 fragments. For example copying some entries around should not look
 different
 just because you added a dimension in your data-structure. But without
 proper value-semantics you are forced to do that, because at some point you
 will have to switch from by-value to by-reference because of limitations
 made somewhere.

I think this argument goes exactly the other way. C++ containers have terrible compositional behavior. Using vector<vector<T> > or vector<map<T> > in C++98 is suicide. C++0x fixes that by means of introducing rvalue references, but reference semantics obviate all that.
 *Another argument: It should be very simple (at least in C++ it is, I have
 never had problems with it, I just added the&  here and there) to handle
 references to by-value-types, but wrapping by-reference-types into by-value-
 types is really ugly, although it may be the right thing somewhere.

"here and there" is more like "every time I define a function". I mean that's a lot, no? Wrapping could work either way, and after thinking about it a lot I have difficulty decreeing one is considerably easier/simpler than the other.
 *By-value-containers support more generic code. A copy_if on a multi-
 dimensional container should of course copy the element and not just copy
 some references, and it would be bad if the generic implementation would
 have to test if the type is by-reference, but a container supporting clone,
 eventually using clone, bothering if it is a deep clone etc. I just do not
 see a simple way to make such generic algorithms easy to implement (even
 with new language features) if by-value-types are not fully supported and
 not used for containers.

copy_if on a multidimensional container should not naively copy entire hyperplanes. More generally, I think that whenever an arbitrarily large object is to be copied, that should be explicit instead of implicit. A lot of focus in C++ is dedicated to making sure you don't copy the wrong thing.
 *Whether or not you think by-value-containers are good, a better
 alternative
 for in would be great for generic code. In C++ I can use something like
 parameter_type<T>::result choosing by-reference or by-value automatically,
 it is not very nice, but it is simply impossible with D, there are no
 reference-types, there are no ways to implement such decisions in the
 parameter-lists, and in normal code it is impossible more than ever. But
 imagine there would be a simple you could put in front of a variable or
 paramater declaration choosing by-ref/by-value automatically, lets say §,
 and of course it should be const and scope, so you would put a § before
 your
 parameters, because you may not know if type is big or not (there are
 sometimes big PODs, somebody may want to pass a FILE-object to a generic
 function or whatever), you would use a § an a read-only-foreach, and you
 would not have to bother about anything, when extracting a container-
 element
 temporarily (pivot in quicksort or whatever, you may want to sort your
 PODs,
 by-value-containers, primitives, pointers, class-objects etc.), you would
 use a § and it would be perfect. This would allow the implementation of by-
 value-containers (of course default-constructors are also required), but it
 would allow to write more generic high-quality code everywhere. In D3 this
 could even be the default for function-parameters, the C-compatible default-
 behaviour is simply nonsense.

It would be great to have D just obviate the necessity of parameter_type<T>::result in the first place, which is what auto ref is meant for (it's §).
 The User

s/The/A/ I guess ;o).

One issue that I noticed about myself and other people coming to D from C++ is that we expect to bring with us, along with the many things that make C++ great, the baggage of common worries, misgivings, and just rote work that we got used to.

Andrei
Dec 14 2010
bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 One issue that I noticed about myself and other people coming to D from 
 C++ is that we expect to bring with us, along with the many things that 
 make C++ great, the baggage of common worries, misgivings, and just rote 
 work that we got used to.

Yes, this is a common thing (it happened to me too, with Python and other languages). You need to be careful and think three times before designing things. Knowing several languages helps a bit against that. I think reference semantics (but with final methods) for containers has the upper hand so far in this discussion. Surely other people will write other kinds of D containers (like the C++ containers used by Electronic Arts), but I think the std lib needs to be not too hard to use. (C++ is sometimes too hard to use for me. In D I was looking for a language that is a bit simpler to use.)

Bye,
bearophile
Dec 14 2010
Jonathan Schmidt-Dominé <devel the-user.org> writes:
Hi!

 *Another argument: It should be very simple (at least in C++ it is, I
 have
 never had problems with it, I just added the&  here and there) to handle
 references to by-value-types, but wrapping by-reference-types into
 by-value- types is really ugly, although it may be the right thing
 somewhere.

"here and there" is more like "every time I define a function". I mean that's a lot, no?

possible to write generic sort, copy_if etc. without unnecessary copies or references.
 Wrapping could work either way, and after thinking about it a lot I have
 difficulty decreeing one is considerably easier/simpler than the other.

wrapping by-value-types in by-reference-types does not. And structs without default-constructors are not very nice, bacause you would have to check first in any operation if the wrapped by-reference-container is null.
 copy_if on a multidimensional container should not naively copy entire
 hyperplanes. More generally, I think that whenever an arbitrarily large
 object is to be copied, that should be explicit instead of implicit. A
 lot of focus in C++ is dedicated to making sure you don't copy the wrong
 thing.

The result of copy_if should be a copy and not a view on the original data, and even such a view should not rely on the input-types to be by-reference (by-value types should also be accessible by reference in such a view, unless it is const, but that is offtopic). When functions like copy_if, map or fold should be combined in a generic way, by-value-containers are much more intuitive and they work without guesses where to add a clone. And Array x(5, [0,1,2]) should result in a 5·3 array.
 It would be great to have D just obviate the necessity of
 parameter_type<T>::result in the first place, which is what auto ref is
 meant for (it's §).

For me something like ref const scope (or ref scope for primitives) should be used as often as possible, it should even be the default in many cases (function parameters, foreach-variables etc.), although I understand that this is unlikely because of compatibility. Then by-value containers would simply make the world consistent, and the places where you would have to write ref or & would be less obtrusive and more generic than all the clone stuff you would have to write with by-reference containers.
 s/The/A/ I guess ;o).

anything in this world. :D The User
Dec 14 2010
Kagamin <spam here.lot> writes:
Jonathan Schmidt-Dominé Wrote:

 For me something like ref const scope (or ref scope for primitives) should 
 be used as often as possible, it should even be the default in many cases 
 (function parameters, foreach-variables etc.)

So you want your containers by reference everywhere?
Dec 14 2010
Jonathan Schmidt-Dominé <devel the-user.org> writes:
Sorry, ref const scope for such containers and PODs etc., const scope for 
primitives and references. Of course, that is exactly what you want to have 
for function parameters and for temporary variables holding a read-only 
container element, like foreach variables etc. That would not influence the 
value semantics.
Dec 15 2010
Jonathan Schmidt-Dominé <devel the-user.org> writes:
 Sorry, ref const scope for such containers and PODs etc., const scope for
 primitives and references. Of course, that is exactly what you want to
 have for function-paramaters, temporary variables temporarily holding a
 read-only container-element, like foreach-variables etc. That would not
 influence the value-semantics.

Maybe you want to say that this is a waste because containers often have only one or two members: pointer and length, a “d”-pointer, etc. Well, I think there should be a possibility to tell D that passing a container as const ref does not require the extra level of indirection. The same could be possible for non-const references and for swapping. swap(x, y) would simply check how to swap x and y in a nice way (I think swap(x, y) is always better than tmp = x; x = y; y = tmp;). That would make it optimal with regard to extra indirection and unnecessary copying. However, even with normal references it would not require more indirection than by-reference types.
Dec 15 2010
prev sibling next sibling parent Jonathan Schmidt-Dominé <devel the-user.org> writes:
 copy_if on a multidimensional container should not naively copy entire
 hyperplanes. More generally, I think that whenever an arbitrarily large
 object is to be copied, that should be explicit instead of implicit. A
 lot of focus in C++ is dedicated to making sure you don't copy the wrong
 thing.

In my opinion by-value types are good for mathematical objects and data like numbers, tuples, sets, lists, vectors, dictionaries etc.; by-reference types are good for things like graphical objects/widgets, hardware resources, I/O handles, factories etc. Unnecessary copies? The compiler should care about that; I want to have clear semantics and generic syntax.
Dec 14 2010
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-14 17:05:47 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/14/10 3:29 PM, Jonathan Schmidt-Dominé wrote:
 However, some reasons for by-value-containers:
 *First of all you often have to deal with multi-dimensional data structures,
 you map something to a map of lists of whatever and you want to manage such
 data in a simple and generic way, simplification or extensions to the data
 structure should not force you to refactor all more or less generic code
 fragments. For example copying some entries around should not look different
 just because you added a dimension in your data structure. But without
 proper value-semantics you are forced to do that, because at some point you
 will have to switch from by-value to by-reference because of limitations
 made somewhere.

I think this argument goes exactly the other way. C++ containers have terrible compositional behavior. Using vector<vector<T> > or vector<map<T> > in C++98 is suicide. C++0x fixes that by means of introducing rvalue references, but reference semantics obviate all that.

That would depend on what you're doing. Sure, if the next thing you do is run an algorithm that swaps vectors or maps, then it will be stupidly slow. But if you're just filling the containers and iterating on the data later, then it works quite well and is likely to perform better than by-reference containers. Also, won't move semantics fix this whole performance problem? Reference semantics isn't the only solution.
 *Another argument: It should be very simple (at least in C++ it is, I have
 never had problems with it, I just added the & here and there) to handle
 references to by-value-types, but wrapping by-reference-types into
 by-value-types is really ugly, although it may be the right thing somewhere.

"here and there" is more like "every time I define a function". I mean that's a lot, no?

I agree writing 'const T &' everywhere in C++ is a pain, and it shouldn't be that way. Perhaps what we need is a way to tell the compiler that a certain type should automatically be passed by reference when given as a function argument. By passing them by reference as in 'ref', you know the reference won't escape the function's scope, and your container can even be located on the stack.
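For comparison, this is roughly what the explicit version looks like in D today. It is only a sketch: IntList, its copies counter, sumByValue and sumByRef are hypothetical names invented here, and nothing in it is the automatic mechanism wished for above.

```d
// Hypothetical by-value container: every copy duplicates the elements.
struct IntList
{
    int[] data;
    static size_t copies;   // counts postblit calls, for illustration only

    this(this)
    {
        data = data.dup;
        ++copies;
    }
}

// By value: the caller's container is copied (postblit runs) on each call.
int sumByValue(IntList list)
{
    int total = 0;
    foreach (v; list.data)
        total += v;
    return total;
}

// By 'ref const': read-only access to the caller's container, no copy.
// By default a 'ref' parameter only accepts lvalues.
int sumByRef(ref const IntList list)
{
    int total = 0;
    foreach (v; list.data)
        total += v;
    return total;
}

void main()
{
    auto big = IntList([1, 2, 3, 4]);   // lives on the stack

    IntList.copies = 0;
    assert(sumByValue(big) == 10);
    assert(IntList.copies == 1);        // one full copy for the call

    IntList.copies = 0;
    assert(sumByRef(big) == 10);
    assert(IntList.copies == 0);        // nothing copied
}
```

The caller writes the same thing in both cases; only the function signature decides whether a copy is made, which is the part the proposal would like the compiler to pick per type.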
 Wrapping could work either way, and after thinking about it a lot I 
 have difficulty decreeing one is considerably easier/simpler than the 
 other.

Wrapping a by-reference inside a by-value container is easy, but wasteful (extra allocation, extra dereference, extra null pointer check). I think that was the point.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 14 2010
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-14 16:29:18 -0500, Jonathan Schmidt-Dominé 
<devel the-user.org> said:

 Just about my experiences: When trying to hack some algorithms quickly in
 Ruby I made a lot of mistakes because I had to care about a .clone
 everywhere and because Array.new(5, []) does not work as expected (sorry,
 but Array.new(5) { [] } is not nice). So in fact C++ made my life
 easier than the new, stylish, simple Ruby-programming-language, because of
 the great by-value-containers in the STL.

I have to echo a similar concern with by-reference containers from my experience of Cocoa. It's really too easy to have two references to the same container without realizing it. I feel much more secure that my logic is correct when I play with C++ containers than with Cocoa's.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 14 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:

 I have to echo a similar concern with by-reference containers from my 
 experience of Cocoa. It's really too easy to have two references to the 
 same container without realizing it.

A partial (but maybe better) solution to this problem is to introduce "linear types" in D, and then let the compiler allocate a container on the stack as an automatic optimization where possible:
http://en.wikipedia.org/wiki/Linear_types

Bye,
bearophile
Dec 14 2010
next sibling parent reply KennyTM~ <kennytm gmail.com> writes:
On Dec 15, 10 14:23, bearophile wrote:
 Michel Fortin:

 I have to echo a similar concern with by-reference containers from my
 experience of Cocoa. It's really too easy to have two references to the
 same container without realizing it.

 A partial (but maybe better) solution to this problem is to introduce "linear types" in D, and then let the compiler allocate a container on the stack as an automatic optimization where possible:
 http://en.wikipedia.org/wiki/Linear_types

 Bye,
 bearophile

std.typecons.Unqiue ?
Dec 15 2010
parent KennyTM~ <kennytm gmail.com> writes:
On Dec 16, 10 02:24, KennyTM~ wrote:
 On Dec 15, 10 14:23, bearophile wrote:
 Michel Fortin:

 I have to echo a similar concern with by-reference containers from my
 experience of Cocoa. It's really too easy to have two references to the
 same container without realizing it.

 A partial (but maybe better) solution to this problem is to introduce "linear types" in D, and then let the compiler allocate a container on the stack as an automatic optimization where possible:
 http://en.wikipedia.org/wiki/Linear_types

 Bye,
 bearophile

std.typecons.Unqiue ?

(BTW, I meant 'Unique' :) )
Dec 15 2010
prev sibling parent Jonathan Schmidt-Dominé <devel the-user.org> writes:
 A partial (but maybe better) solution to this problem is to introduce
 "linear types" in D, and then let the compiler allocate a container on the
 stack as an automatic optimization where possible:
 http://en.wikipedia.org/wiki/Linear_types

Well, then you would have a lot of null pointers when using the by-reference containers, which is not very intuitive. What should it be good for?
Dec 15 2010
prev sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 09/12/2010 21:55, Simon Buerger wrote:
 On 08.12.2010 23:45, Jonathan M Davis wrote:
 On Wednesday, December 08, 2010 14:14:57 Simon Buerger wrote:
 For every lib it's a design decision whether containers should be value- or
 reference-types. In C++ STL they are value-types (i.e. the
 copy-constructor does a real copy), while in tango and phobos the
 decision was to go for reference-types afaik, but I would like to be
 able to write value-types too, which isn't possible (in a really good
 way) currently. Following points would need some love (by-value
 containers are probably not the only area, where these could be useful)

It's extremely rare in my experience that it makes any sense to copy a container on a regular basis. Having an easy means of creating a deep copy of a container or copying the elements from one container to another efficiently would be good, but having containers be value types is almost always a bad idea. It's just not a typical need to need to copy containers - certainly not enough to have them be copied just because you passed them to a function or returned them from one. I think that reference types for containers is very much the correct decision. There should be good ways to copy containers, but copying shouldn't be the default for much of anything in the way of containers.


I would go further than that actually, it seems to me that the idea of by-value containers is completely idiotic. I was *hesitant* to say this because it goes against conventional C++ "wisdom" (or rather, C++ mentality), and I'm just a random junior programmer on a web forum, and I am saying it in a somewhat inflammatory way... But frankly, I've been thinking about it for the last few days (the issue came up earlier, in the "Destructors, const structs, and opEquals" thread), and I could not change my mind. For the love of life, how can anyone think this is a good idea? I'm struggling to find even one use-case where it would make sense. (a non-subjective use-case at least)
  From a pragmatic viewpoint you are right, copying containers is rare.
 But on the other hand, classes imply a kind of identity, so that a set
 is a different object than another object with the very same elements.

Yeah, classes have identity, but they retain the concept of equality. So what's wrong with that? Equality comparisons would still work the same way as by-value containers.
 That feels wrong from an aesthetical or mathematical viewpoint.

Aesthetics are very subjective (I can say the exact same thing about the opposite case). As for a mathematical viewpoint, yes, it's not exactly the same, but first of all, it's not generally a good idea to strictly emulate mathematical semantics in programming languages. So to speak, mathematical "objects" are immutable, and they exist in a magical infinite space world without the notion of execution or side-effects. Trying to model those semantics in a programming language brings forth a host of issues (most of them performance-related). But more importantly, even if you wanted to do that (to have it right from a mathematical viewpoint), mutable by-value containers are just as bad; you should use immutable data instead.
 Furthermore, if you have for example a vector of vectors,

 Vector!int row = [1,2,3];
 auto vec = Vector!(Vector!int)(5, row);

 then vec should be 5 rows, and not 5 times the same row.

Then instead of "Vector" use a static-length vector type, don't use a container.

--
Bruno Medeiros - Software Engineer
Dec 21 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Bruno Medeiros:

 For the love of life, how can anyone think this is a 
 good idea? I'm struggling to find even one use-case where it would make 
 sense. (a non-subjective use-case at least)

I agree that in general collections are better managed by reference. But if you need a hash that you know will not contain more than 10-20 items, you need max performance, and you don't need to pass it around, then a by-value in-place hash may be useful. I have used one a few times. This is not a generic case, but besides the normal collections, Phobos2 may add a few little by-value ones like this.

Bye,
bearophile
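A rough idea of what such an in-place hash could look like, as a sketch only: SmallHashSet is a hypothetical name, the slot count is a compile-time parameter, and it uses plain linear probing with no removal. It is not an existing Phobos type.

```d
import std.traits : hasIndirections;

// Hypothetical fixed-capacity hash set stored entirely inside the struct.
// Open addressing with linear probing; no heap allocation, no removal.
struct SmallHashSet(size_t slots)
{
    private int[slots] keys;
    private bool[slots] used;
    private size_t count;

    // Returns true if the key was newly inserted, false if already present.
    bool insert(int key) @nogc nothrow
    {
        size_t i = cast(size_t) key % slots;
        foreach (_; 0 .. slots)
        {
            if (!used[i])
            {
                used[i] = true;
                keys[i] = key;
                ++count;
                return true;
            }
            if (keys[i] == key)
                return false;
            i = (i + 1) % slots;
        }
        assert(false, "SmallHashSet is full");
    }

    bool contains(int key) const @nogc nothrow
    {
        size_t i = cast(size_t) key % slots;
        foreach (_; 0 .. slots)
        {
            if (!used[i])
                return false;
            if (keys[i] == key)
                return true;
            i = (i + 1) % slots;
        }
        return false;
    }

    size_t length() const @nogc nothrow { return count; }
}

// The whole set is plain data: copying it is a memory copy of the struct.
static assert(!hasIndirections!(SmallHashSet!16));

void main()
{
    SmallHashSet!16 s;
    assert(s.insert(3));
    assert(!s.insert(3));      // duplicate is rejected

    auto t = s;                // independent by-value copy
    assert(t.insert(19));      // 19 % 16 == 3, so this probes to the next slot

    assert(t.contains(3) && t.contains(19) && t.length == 2);
    assert(s.contains(3) && !s.contains(19) && s.length == 1);
}
```

Because there is no pointer inside, such a set can sit on the stack or inside another struct, and copying it never touches the heap.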
Dec 21 2010
prev sibling parent reply Simon Buerger <krox gmx.net> writes:
On 21.12.2010 18:45, Bruno Medeiros wrote:
 On 09/12/2010 21:55, Simon Buerger wrote:
 From a pragmatic viewpoint you are right, copying containers is rare.
 But on the other hand, classes imply a kind of identity, so that a set
 is a different object than another object with the very same elements.

Yeah, classes have identity, but they retain the concept of equality. So what's wrong with that? Equality comparisons would still work the same way as by-value containers.

Identity is wrong, because if I pass the set {1,2,3} to a function, I would like to pass exactly these three values, not some mutable object. This may imply that the function-parameter should be const, which is probably a good idea anyway. I want it to be mutable, I want to use "out"/"ref", the same way as with the simple builtin-types.
 That feels wrong from an aesthetical or mathematical viewpoint.

Aesthetics are very subjective (I can say the exact same thing about the opposite case). As for a mathematical viewpoint, yes, it's not exactly the same, but first of all, it's not generally a good idea to strictly emulate mathematical semantics in programming languages. So to speak, mathematical "objects" are immutable, and they exist in a magical infinite space world without the notion of execution or side-effects. Trying to model those semantics in a programming language brings forth a host of issues (most of them performance-related). But more importantly, even if you wanted to do that (to have it right from a mathematical viewpoint), mutable by-value containers are just as bad; you should use immutable data instead.

You might be right that modeling mathematics is not perfect, at least in C/C++/D/Java. Though functional programming is fine with it, and it uses immutable data just as you suggested. But I'm aware that that's not the way to go for D. Anyway, though total math-like behavior is impossible, with

    auto A = Set(1,2,3);
    auto B = A;
    B.add(42);

letting A and B have different contents is much closer to math than letting both be equal. Though neither is perfect.

And for the "immutable data": it's not perfectly possible, but in many circumstances it is considered good style to use "const" and "assumeUnique" as much as possible. It helps optimizing, multi-threading and code-correctness. So it is a topic not only in functional programming but also in D.
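Spelled out as compilable D, the two behaviours look roughly like this. IntSet and IntSetRef are hypothetical types written for this sketch (backed by a built-in associative array), not anything from Phobos or Tango.

```d
// Hypothetical by-value set: the postblit gives each copy its own storage.
struct IntSet
{
    private bool[int] items;

    this(int[] values...)
    {
        foreach (v; values)
            items[v] = true;
    }

    this(this) { items = items.dup; }   // copy the elements on every copy

    void add(int v) { items[v] = true; }
    size_t length() const { return items.length; }
}

// Hypothetical by-reference set: the same interface, but as a class.
final class IntSetRef
{
    private bool[int] items;

    this(int[] values...)
    {
        foreach (v; values)
            items[v] = true;
    }

    void add(int v) { items[v] = true; }
    size_t length() const { return items.length; }
}

void main()
{
    // Value semantics: B is a separate set, A keeps its three elements.
    auto A = IntSet(1, 2, 3);
    auto B = A;
    B.add(42);
    assert(A.length == 3 && B.length == 4);

    // Reference semantics: D is another name for the same set as C.
    auto C = new IntSetRef(1, 2, 3);
    auto D = C;
    D.add(42);
    assert(C is D);
    assert(C.length == 4);
}
```

Note that the postblit is what makes IntSet a real value type here; without it the struct copy would still share the associative array's storage.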
 Furthermore, if you have for example a vector of vectors,

 Vector!int row = [1,2,3];
 auto vec = Vector!(Vector!int)(5, row);

 then vec should be 5 rows, and not 5 times the same row.

Then instead of "Vector" use a static-length vector type, don't use a container.

Maybe you want to change that stuff later on, so static-length is no option. The following example might demonstrate the problem more clearly. It is intended to init a couple of sets to empty.

    set!int[42] a;

    version(by_reference_wrong):
    a[] = set!int.empty; // this does not work as intended

    version(by_reference_correct):
    foreach(ref x; a)
        x = set!int.empty;

    version(by_value):
    // nothing to be done, already everything empty

Obviously the by_value version is the cleanest. Furthermore, the first example demonstrates that by-reference does not work together with the slice-syntax (which is equivalent to the constructor-call in my original example). Replacing "set!int.empty" with "new set!int" doesn't change the situation, but only makes it sound more weird to my ears: "new vector"? What was wrong with the old one? And I don't want "_an_ empty set", I want "_the_ empty set". Every empty set is equal, so there is only one.

Last but not least let me state: I do _not_ think that value-containers will go into phobos/tango some day; that would be too difficult in practice. I just want to state that there are certain reasons for it. (And originally this thread asked for some small changes in the language to make it possible, not the standard.)

Krox

ps: I'll go on vacation now, see you next year, if there is still need for discussion. Merry christmas all :)
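The three variants can be checked with a small self-contained D program. RefSet and ValueSet below are hypothetical stand-ins for a class-based and a struct-based "set!int"; the array length is shortened to 4 for readability.

```d
// Hypothetical by-reference set (class) and by-value set (struct).
final class RefSet
{
    private bool[int] items;
    void add(int v) { items[v] = true; }
    size_t length() const { return items.length; }
}

struct ValueSet
{
    private bool[int] items;
    this(this) { items = items.dup; }   // each copy gets its own storage
    void add(int v) { items[v] = true; }
    size_t length() const { return items.length; }
}

void main()
{
    // by_reference_wrong: the right-hand side is evaluated once,
    // so all four slots refer to the same object.
    RefSet[4] refs;
    refs[] = new RefSet;
    refs[0].add(1);
    assert(refs[0] is refs[3]);
    assert(refs[3].length == 1);

    // by_reference_correct: one fresh object per slot.
    foreach (ref r; refs)
        r = new RefSet;
    refs[0].add(1);
    assert(refs[0] !is refs[3]);
    assert(refs[3].length == 0);

    // by_value: default initialization already gives four independent
    // empty sets, nothing to be done.
    ValueSet[4] vals;
    vals[0].add(1);
    assert(vals[0].length == 1);
    assert(vals[3].length == 0);
}
```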
Dec 22 2010
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Sorry for the long delay in replying.

On 22/12/2010 12:04, Simon Buerger wrote:
 On 21.12.2010 18:45, Bruno Medeiros wrote:
 On 09/12/2010 21:55, Simon Buerger wrote:
 From a pragmatic viewpoint you are right, copying containers is rare.
 But on the other hand, classes imply a kind of identity, so that a set
 is a different object than another object with the very same elements.

Yeah, classes have identity, but they retain the concept of equality. So what's wrong with that? Equality comparisons would still work the same way as by-value containers.

Identity is wrong, because if I pass the set {1,2,3} to a function, I would like to pass exactly these three values, not some mutable object. This may imply that the function-parameter should be const, which is probably a good idea anyway. I want it to be mutable, I want to use "out"/"ref", the same way as with the simple builtin-types.

I don't understand this, it doesn't seem to make sense. You say you don't want the set to be "some mutable object", yet also say you "want it to be mutable". Does "it" refer to something else? I don't get it.

Assuming just this text:
" Identity is wrong, because if I pass the set {1,2,3} to a function, I would like to pass exactly these three values, not some mutable object. "
Then pass in some unmodifiable collection. Hard to suggest a better alternative without a concrete example.
 That feels wrong from an aesthetical or mathematical viewpoint.

Aesthetics are very subjective (I can say the exact same thing about the opposite case). As for a mathematical viewpoint, yes, it's not exactly the same, but first of all, it's not generally a good idea to strictly emulate mathematical semantics in programming languages. So to speak, mathematical "objects" are immutable, and they exist in a magical infinite space world without the notion of execution or side-effects. Trying to model those semantics in a programming language brings forth a host of issues (most of them performance-related). But more importantly, even if you wanted to do that (to have it right from a mathematical viewpoint), mutable by-value containers are just as bad; you should use immutable data instead.

You might be right that modeling mathematics is not perfect, at least in C/C++/D/Java. Though functional programming is fine with it, and it uses immutable data just as you suggested. But I'm aware that that's not the way to go for D. Anyway, though total math-like behavior is impossible, but with

Why is "total math-like behavior is impossible" ?
 Furthermore, if you have for example a vector of vectors,

 Vector!int row = [1,2,3];
 auto vec = Vector!(Vector!int)(5, row);

 then vec should be 5 rows, and not 5 times the same row.

Then instead of "Vector" use a static-length vector type, don't use a container.

Maybe you want to change that stuff later on, so static-length is no option. The following example might demonstrate the problem more clearly. It is intended to init a couple of sets to empty.

    set!int[42] a;

    version(by_reference_wrong):
    a[] = set!int.empty; // this does not work as intended

    version(by_reference_correct):
    foreach(ref x; a)
        x = set!int.empty;

    version(by_value):
    // nothing to be done, already everything empty

Obviously the by_value version is the cleanest.

Not the best comparison, since the by_reference_correct version could be improved to something like:

    applyFill(a, set!int.empty) // if the last parameter is lazy

or

    applyFill(a, { return set!int.empty; }) // otherwise, param is delegate instead

but in any case this is just a very specific example. What about the other cases where by-value code would be more verbose than by-reference? (particularly when you want to avoid needless copies)
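One possible shape for the lazy variant, as a sketch: applyFill is only the name suggested above, and this implementation and the RefSet class are assumptions made for illustration. It relies on the fact that a lazy parameter's expression is evaluated again each time the parameter is used.

```d
// Hypothetical by-reference set used as the element type.
final class RefSet
{
    private bool[int] items;
    void add(int v) { items[v] = true; }
    size_t length() const { return items.length; }
}

// Assumed implementation of the suggested helper: because 'value' is lazy,
// the argument expression is re-evaluated for every slot.
void applyFill(T)(T[] target, lazy T value)
{
    foreach (ref slot; target)
        slot = value;
}

void main()
{
    RefSet[4] a;
    applyFill(a[], new RefSet);   // 'new RefSet' runs once per element

    a[0].add(1);
    assert(a[0] !is a[3]);        // distinct objects
    assert(a[0].length == 1);
    assert(a[3].length == 0);
}
```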
 Replacing "set!int.empty" with "new set!int" doesn't change the
 situation, but make it sound only more weird in my ears: "new vector"?
 what was wrong with the old one? and I don't want "_an_ empty set", I
 want "_the_ empty set". Every empty set is equal, so there is only one.

The only way to truly solve this problem for you is to use by-value containers, and actually use them with value semantics all the time (i.e., don't turn them into by-ref by passing pointers or other references to them)! ...Well, this does contradict a bit what I originally said, "how can anyone think this is a good idea?" But note that I was talking within the context of C++. If you use by-value containers in the way described above (with actual value-semantics usage), you will sooner or later incur such heavy performance costs that it's no longer a good idea to be using C++ in the first place.

--
Bruno Medeiros - Software Engineer
Jan 27 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On Tue, 14 Dec 2010 18:24:09 -0500
bearophile <bearophileHUGS lycos.com> wrote:

 Yes, this is a common thing (it happened to me too, with Python and other

 things. Knowing several languages helps a bit against that.

s/bit/lot/ ;-) ?

 I think so far reference semantics (but final methods) for containers is

 rite other kind of D containers (like the C++ containers used by Electronic Arts), but I think the std lib needs to be not too much hard to use. (C++ is sometimes too much hard to use for me. In D I was looking for a bit simpler to use language).

Would these ones be useful? https://bitbucket.org/denispir/denispir-d/src/a5975e94f15c/collections.d
(quickly written for personal use -- definitions as struct/class can be changed)

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com
Dec 15 2010
prev sibling parent spir <denis.spir gmail.com> writes:
On Wed, 15 Dec 2010 03:06 +0100
Jonathan Schmidt-Dominé <devel the-user.org> wrote:

 copy_if on a multidimensional container should not naively copy entire
 hyperplanes. More generally, I think that whenever an arbitrarily large
 object is to be copied, that should be explicit instead of implicit. A
 lot of focus in C++ is dedicated to making sure you don't copy the wrong
 thing.

 In my opinion by-value-types are good for mathematical objects and data like
 numbers, tuples, sets, lists, vectors, dictionaries etc., by-reference-types
 are good for things like graphical objects/widgets, hardware resources, I/O
 handles, factories etc.
 Unnecessary copies? The compiler should care about that, I want to have
 clear semantics and generic syntax.

+++ The choice of value/ref should always be based on (human) semantics _only_. Any other design (of language or app) is wrong.

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com
Dec 15 2010