
digitalmars.D - Passing dynamic arrays

Jens Mueller <jens.k.mueller gmx.de> writes:
Hi,

I do not understand what's going on behind the scene with this code. Or
better said I have some idea but maybe I do not see the whole point.

void foo(int[] array) {
	array.length += 1000; // may copy the array
	array[0] = 1;
}

auto a = new int[1];
foo(a);
assert(a[0] == 1); // fails if a needs to be copied inside foo

I do understand that array.length += 1000 may copy the array. Page 98 of
The D Programming Language and
http://www.digitalmars.com/d/2.0/arrays.html#resize shed some light on
this matter. Passing a to foo is achieved by copying let's say a begin
and an end pointer. Now due to array.length += 1000 new memory might be
needed and that's why the begin and end pointer change and array[0]
works now on different data. That's why the assert fails. Right?

I find this behavior rather strange. Arrays are neither passed by value
(copying the whole array) nor by reference. I see reasons for doing it
like this, e.g. doing array = array[1..$] inside should not affect the
outside.
But I wonder whether these semantics are well enough documented?
I think I should use ref int[] in the example above, shouldn't I?

Jens
Nov 08 2010
"Denis Koroskin" <2korden gmail.com> writes:
On Mon, 08 Nov 2010 20:30:03 +0300, Jens Mueller <jens.k.mueller gmx.de>  
wrote:

 Hi,

 I do not understand what's going on behind the scene with this code. Or
 better said I have some idea but maybe I do not see the whole point.

 void foo(int[] array) {
 	array.length += 1000; // may copy the array
 	array[0] = 1;
 }

 auto a = new int[1];
 foo(a);
 assert(a[0] == 1); // fails if a needs to be copied inside foo

 I do understand that array.length += 1000 may copy the array. Page 98 of
 The D Programming Language and
 http://www.digitalmars.com/d/2.0/arrays.html#resize shed some light on
 this matter. Passing a to foo is achieved by copying let's say a begin
 and an end pointer. Now due to array.length += 1000 new memory might be
 needed and that's why the begin and end pointer change and array[0]
 works now on different data. That's why the assert fails. Right?

 I find this behavior rather strange. Arrays are neither passed by value
 (copying the whole array) nor by reference. I see reasons for doing it
 like this, e.g. doing array = array[1..$] inside should not affect the
 outside.
 But I wonder whether these semantics are well enough documented?
 I think I should use ref int[] in the example above, shouldn't I?

 Jens

Yes, you understood it correctly. Changes to the array structure (i.e. its length and pointer to contents) aren't visible to the outer scope, but changes to the contents are. int[] is merely a tuple of a T* ptr and a size_t length.
Nov 08 2010
bearophile <bearophileHUGS lycos.com> writes:
Jens Mueller:

 I find this behavior rather strange.

I don't know if it's strange, but surely it is a little bug-prone corner of D. I have had two or three bugs in my code because of that.
 Arrays are neither passed by value
 (copying the whole array) nor by reference.

They are passed by "fat reference" :-)
 I think I should use ref int[] in the example above, shouldn't I?

Right. Bye, bearophile
Nov 08 2010
Daniel Gibson <metalcaedes gmail.com> writes:
bearophile schrieb:
 Jens Mueller:
 
 I find this behavior rather strange.

I don't know if it's strange, but surely it is a little bug-prone corner of D. I have had two or three bugs in my code because of that.

If you pass a dynamic array to a function and change its size within the function, you have undefined behaviour - you never know whether it will affect the original array (from the calling function) or not. So IMHO a compiler warning would be appropriate in that case.

(It would be even better to have more consistent array handling throughout the different kinds of arrays, as I wrote in another branch of this thread, but if that is no option, for example because it contradicts TDPL, a compiler warning is a good compromise.)

Cheers,
- Daniel
Nov 08 2010
Ali Çehreli <acehreli yahoo.com> writes:
Steven Schveighoffer wrote:
 On Mon, 08 Nov 2010 13:35:38 -0500, Daniel Gibson
 If you pass a dynamic array to a function and change its size within
 the function, you have undefined behaviour - you never know if it will
 affect the original array (from the calling function) or not.

Not exactly. If you happen to change its size *and* change the original data afterwards, then it's somewhat undefined

Let's also note that appending to the array qualifies as "change its size *and* change the original data afterwards." We cannot be sure whether appending affects the passed-in array.
 (I'd call it confusing,
 since the behavior is perfectly defined, just hard to describe).

I like the term "discretionary sharing semantics", where any slice can leave the sharing contract at its own discretion, regardless of whether it has modified the shared elements so far.

Ali
Nov 08 2010
Ali Çehreli <acehreli yahoo.com> writes:
Steven Schveighoffer wrote:

 No, it doesn't. If you are appending to data that was passed in, you are
 not changing the *original data* passed in.  You are only appending 

I must be remembering an old behavior. I think appending could affect the original if it had enough capacity. Ali
Nov 08 2010
Pillsy <pillsbury gmail.com> writes:
Steven Schveighoffer Wrote:

 On Mon, 08 Nov 2010 18:29:27 -0500, Ali Çehreli <acehreli yahoo.com> wrote:

 I must be remembering an old behavior. I think appending could
 affect the original if it had enough capacity.


 Before the array append changes were introduced (somewhere around
 2.040 I think?), appending to a slice that started at the beginning of
 the memory block could affect the other data in the array.  But that
 was a memory corruption issue, somewhat different than what we 
 are talking about.

Ah! This is a lot of what was confusing me about arrays; I still thought they had this behavior. The fact that they don't makes me a good deal more comfortable with them, though I still don't like the non-deterministic way that they may copy their elements or they may share structure after you append stuff to them. Cheers, Pillsy
Nov 09 2010
Pillsy <pillsbury gmail.com> writes:
Steven Schveighoffer Wrote:

 On Tue, 09 Nov 2010 08:14:40 -0500, Pillsy <pillsbury gmail.com> wrote:

 Ah! This is a lot of what was confusing me about arrays; I still thought  
 they had this behavior. The fact that they don't makes me a good deal  
 more comfortable with them, though I still don't like the  
 non-deterministic way that they may copy their elements or they may  
 share structure after you append stuff to them.


 As I said before, this rarely affects code.  The common cases I've seen:

 1. You append to an array and return it.
 2. You modify data in the array.
 3. You use a passed in array as a buffer, which means you overwrite the  
 array, and then start appending when it runs out of space.

 I don't ever remember seeing:

 You append to an array, then go back and modify the first few bytes of the  
 array.

I've certainly encountered situations in at least one other language where standard library functions will return mutable arrays which may or may not share structure with their inputs. This has been such a frequent source of pain when using that language that I tend to react very negatively to the possibility in any context.
 Let's assume this is a very common thing and absolutely needs to be  
 addressed.  What would you like the behavior to be? 

Using a separate library type for a buffer you can append to. I think of "a buffer or abstract list you can cheaply append to" as a different sort of type from a fixed-size buffer anyway, since it so often is a different type.

Arrays/slices are a very basic type in D, and I'm generally thinking that giving your basic types simpler, easier-to-understand semantics is worth paying a modest cost. [...]
 IMO, the benefits of just being able to append to an array any time you  
 want without having to set up some special type far outweighs this little  
 quirk that almost nobody encounters.  You can append to *any* array, no  
 matter where the data is located, or whether the data is a slice, and it  
 just works.  I can't see how anyone would prefer another solution!

There's a difference between appending and appending in place. The problem with not appending in place (and arrays not having the possibility of a reserve that's larger than the actual amount, of course) is one of efficiency.

Having

auto s = "foo";
s ~= "bar";

result in a new array of length 6 containing "foobar" being allocated and assigned to s is obviously useful and desirable behavior. If the expansion can happen in place, that's a perfectly reasonable performance optimization to have in the case of strings or other immutable arrays. Indeed, one of the reasons that functional programming and GC go together like peanut butter and jelly is that together they let you get all sorts of efficiency wins from shared structure.

However, I've found working with languages that mix a lot of imperative and functional constructs (Lisp is one, but not the only one) that if you're going to do this, it's really very important that there not be any doubt about when mutable state is shared and when it isn't. D is trying to be that same kind of multi-paradigm language. This means that, for mutable arrays, having

int[] x = [1, 2, 3];
x ~= [4, 5, 6];

maybe reallocate and maybe not seems like it's only really there to protect people from doing inefficient things by accident when they append onto the back of an array repeatedly (or to make that admittedly common case more convenient). This really doesn't strike me as worth the trouble. Like I said elsewhere, the uncertainty gives me the screaming willies.

Cheers, Pillsy
Nov 09 2010
Daniel Gibson <metalcaedes gmail.com> writes:
Jonathan M Davis schrieb:
 On Monday, November 08, 2010 10:35:38 Daniel Gibson wrote:
 bearophile schrieb:
 Jens Mueller:
 I find this behavior rather strange.

 I don't know if it's strange, but surely it is a little bug-prone corner
 of D. I have had two or three bugs in my code because of that.

 If you pass a dynamic array to a function and change its size within the
 function, you have undefined behaviour - you never know if it will affect
 the original array (from the calling function) or not.

Implementation-defined behavior would be more accurate. Undefined would mean that it's something dangerous which is not defined by the language and that you shouldn't be doing. It's perfectly defined by the language in this case.

It's just that how much extra memory is allocated for a dynamic array is implementation-defined, so in cases where the array runs out of room, it's implementation-dependent whether a reallocation will be necessary. In every other case, it's completely language-defined and deterministic. The algorithm for additional capacity is the only part that isn't.

Ok, undefined may have been the wrong word; non-deterministic may be better. Anyway, you can't know whether there's space left behind the array that can be obtained by realloc, or whether increasing the length will cause the array to be copied to a new block of memory, so that the array in the calling function points to different memory than the array in the called function.
 So IMHO a compiler warning would  be appropriate in that case.

 (It would be even better to have more consistent array handling throughout
 the different kinds of arrays, as I wrote in another branch of this
 thread, but if that is no option, for example because it contradicts TDPL,
 a compiler warning is a good compromise)

Honestly, if you want an array that you're passing in to be altered by the function that you're passing it to, it really should be passed by ref anyway.

The documentation[1] says: "For dynamic array and object parameters, which are passed by reference, in/out/ref apply only to the reference and not the contents."

So, by reading the documentation one would assume that dynamic arrays are passed by reference - *real* reference, implying that any change to the array within the called function will be visible outside of the function. The truth, however, is that dynamic arrays are not passed by reference, and any changes to the length will be lost (even if the array's data won't be copied).
 And if you want to guarantee that it isn't going to be altered, make it
 const, dup it, or have its individual elements be const or immutable.
 Problem solved. I really don't see this as an issue. I can understand why
 there might be some confusion - particularly since the online docs aren't
 very clear on the matter, but it really isn't complicated, and I'd hate to
 have to dup an array just to be able to append to it, which is what the
 warning that you're suggesting would require. A compiler warning is as
 good as an error really, since you're going to have to fix it anyway, so
 I'd definitely be against making this either an error or a warning. I see
 no problem with being allowed to resize arrays that are passed to
 functions.

So maybe yet another solution would be to *really* pass dynamic arrays by reference (like the doc pretends it's already done)?
 
 - Jonathan M Davis

Cheers,
- Daniel

[1] http://www.digitalmars.com/d/2.0/function.html
Nov 08 2010
bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 Daniel Gibson:
 So maybe yet another solution would be to *really* pass dynamic arrays by
 reference (like the doc pretends it's already done)?

That would be a devastating change.

That's a solution that I proposed a long time ago; I received no answers :-) A disadvantage of passing dynamic arrays by reference by default is a decrease in performance.

Bye,
bearophile
Nov 08 2010
"Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 08 Nov 2010 19:30:03 +0200, Jens Mueller <jens.k.mueller gmx.de>  
wrote:

 I find this behavior rather strange. Arrays are neither passed by value
 (copying the whole array) nor by reference. I see reasons for doing it
 like this, e.g. doing array = array[1..$] inside should not affect the
 outside.

Compare to this C-ish program:

void foo(int *array, int length) {
    array = (int*)realloc(array, (length += 1000) * sizeof(int)); // may copy the array
    array[0] = 1;
}

int* a = new int[1];
foo(a, 1);
assert(a[0] == 1); // fails if a needs to be copied inside foo

C's realloc *may* copy the memory area being reallocated. Just like when using realloc, it's something you must be aware of when using D arrays.

--
Best regards,
 Vladimir                            mailto:vladimir thecybershadow.net
Nov 08 2010
Daniel Gibson <metalcaedes gmail.com> writes:
Vladimir Panteleev schrieb:
 On Mon, 08 Nov 2010 19:30:03 +0200, Jens Mueller <jens.k.mueller gmx.de> 
 wrote:
 
 I find this behavior rather strange. Arrays are neither passed by value
 (copying the whole array) nor by reference. I see reasons for doing it
 like this, e.g. doing array = array[1..$] inside should not affect the
 outside.

Compare to this C-ish program:

 void foo(int *array, int length) {
     array = (int*)realloc(array, (length += 1000) * sizeof(int)); // may copy the array
     array[0] = 1;
 }

 int* a = new int[1];
 foo(a, 1);
 assert(a[0] == 1); // fails if a needs to be copied inside foo

 C's realloc *may* copy the memory area being reallocated. Just like when
 using realloc, it's something you must be aware of when using D arrays.

This might technically be the same thing, but D's nice syntax hides this.

IMHO passing arrays to functions is really inconsistent in D2 anyway: static arrays are passed by value, but dynamic arrays are passed by reference - but then again, as this thread shows, not really.. And what about associative arrays? (I don't know, haven't tried yet; afaik it isn't documented.)

Certainly D's behaviour regarding dynamic arrays has technical reasons and makes sense when you know how they're implemented (a fat pointer that is passed by value), but it *feels* inconsistent. IMHO either all kinds of arrays should be passed by reference (real reference, not coincidental reference like dynamic arrays are now) *or* by "logical" value, i.e. dynamic arrays would be dup'ed and associative arrays would be cloned.

BTW: What were the reasons to pass static arrays by value in D2 (while in D1 they're passed by reference)?

Cheers,
- Daniel
Nov 08 2010
Walter Bright <newshound2 digitalmars.com> writes:
Daniel Gibson wrote:
 BTW: What were the reasons to pass static arrays by value in D2 (while 
 in D1 they're passed by reference)?

It makes things like vectors (i.e. float[3]) natural to manipulate. It also segues nicely into hopeful future support for the CPU's vector instructions.
Nov 08 2010
Daniel Gibson <metalcaedes gmail.com> writes:
Walter Bright schrieb:
 Daniel Gibson wrote:
 BTW: What were the reasons to pass static arrays by value in D2 (while 
 in D1 they're passed by reference)?

It makes things like vectors (i.e. float[3]) natural to manipulate. It also segues nicely into hopeful future support for the CPU's vector instructions.

Why can't that be done when the static arrays are passed by reference?
Nov 08 2010
Daniel Gibson <metalcaedes gmail.com> writes:
Daniel Gibson schrieb:
 Walter Bright schrieb:
 Daniel Gibson wrote:
 BTW: What were the reasons to pass static arrays by value in D2 
 (while in D1 they're passed by reference)?

It makes things like vectors (i.e. float[3]) natural to manipulate. It also segues nicely into hopeful future support for the CPU's vector instructions.

Why can't that be done when the static arrays are passed by reference?

Ah, I guess you mean something like "alias float[3] vec3", so one may expect vec3 to behave like a value type (like when you define it in a struct), so it's passed by value. That does make sense, even though I'm not sure what's more important: consistency between different kinds of arrays, or expectations towards types defined from static arrays. ;-)

I still don't get the part about the CPU's vector instructions though. I don't have any assembly knowledge and no experience with directly using the CPU's vector instructions, but the example of a C function wrapping SSE instructions from the wikipedia article[1], which multiplies two arrays of floats, loads the arrays by reference and even stores the result in one of them.
Nov 08 2010
Daniel Gibson <metalcaedes gmail.com> writes:
Daniel Gibson schrieb:
 Daniel Gibson schrieb:
 Walter Bright schrieb:
 Daniel Gibson wrote:
 BTW: What were the reasons to pass static arrays by value in D2 
 (while in D1 they're passed by reference)?

It makes things like vectors (i.e. float[3]) natural to manipulate. It also segues nicely into hopeful future support for the CPU's vector instructions.

Why can't that be done when the static arrays are passed by reference?

Ah I guess you mean something like "alias float[3] vec3", so one may expect vec3 to behave like a value type (like when you define it in a struct) so it's passed by value. That does make sense, even though I'm not sure what's more important: consistency between different kinds of arrays or expectations towards typed defined from static arrays. ;-) I still don't get the part with the CPU's vector instructions though. I don't have any assembly knowledge an no experience with directly using CPU's vector instructions, but the example of a C function wrapping SSE instructions from the wikipedia article[1], which multiplies two arrays of floats, loads the arrays by reference and even stores the result in one of them.

Hrm forgot the link: [1] http://en.wikipedia.org/wiki/Vector_processor
Nov 08 2010
Walter Bright <newshound2 digitalmars.com> writes:
Daniel Gibson wrote:
 Daniel Gibson schrieb:
 Daniel Gibson schrieb:
 Walter Bright schrieb:
 Daniel Gibson wrote:
 BTW: What were the reasons to pass static arrays by value in D2 
 (while in D1 they're passed by reference)?

It makes things like vectors (i.e. float[3]) natural to manipulate. It also segues nicely into hopeful future support for the CPU's vector instructions.

Why can't that be done when the static arrays are passed by reference?

Ah I guess you mean something like "alias float[3] vec3", so one may expect vec3 to behave like a value type (like when you define it in a struct) so it's passed by value. That does make sense, even though I'm not sure what's more important: consistency between different kinds of arrays or expectations towards typed defined from static arrays. ;-) I still don't get the part with the CPU's vector instructions though. I don't have any assembly knowledge an no experience with directly using CPU's vector instructions, but the example of a C function wrapping SSE instructions from the wikipedia article[1], which multiplies two arrays of floats, loads the arrays by reference and even stores the result in one of them.

Hrm forgot the link: [1] http://en.wikipedia.org/wiki/Vector_processor

You don't write an add function by using references to ints. The vector instructions treat them like values, so a value type should correspond to it.
Nov 08 2010
Daniel Gibson <metalcaedes gmail.com> writes:
Walter Bright schrieb:
 Daniel Gibson wrote:
 Daniel Gibson schrieb:
 Daniel Gibson schrieb:
 Walter Bright schrieb:
 Daniel Gibson wrote:
 BTW: What were the reasons to pass static arrays by value in D2 
 (while in D1 they're passed by reference)?

It makes things like vectors (i.e. float[3]) natural to manipulate. It also segues nicely into hopeful future support for the CPU's vector instructions.

Why can't that be done when the static arrays are passed by reference?

Ah I guess you mean something like "alias float[3] vec3", so one may expect vec3 to behave like a value type (like when you define it in a struct) so it's passed by value. That does make sense, even though I'm not sure what's more important: consistency between different kinds of arrays or expectations towards typed defined from static arrays. ;-) I still don't get the part with the CPU's vector instructions though. I don't have any assembly knowledge an no experience with directly using CPU's vector instructions, but the example of a C function wrapping SSE instructions from the wikipedia article[1], which multiplies two arrays of floats, loads the arrays by reference and even stores the result in one of them.

Hrm forgot the link: [1] http://en.wikipedia.org/wiki/Vector_processor

You don't write an add function by using references to ints. The vector instructions treat them like values, so a value type should correspond to it.

Ok, thanks for explaining :-)
Nov 08 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, November 08, 2010 09:40:47 bearophile wrote:
 Jens Mueller:
 I find this behavior rather strange.

I don't know if it's strange, but surely it is a little bug-prone corner of D. I have had two or three bugs in my code because of that.

I don't know. I find it to be pretty straightforward, though I can understand why people would find it confusing at first.

As long as you don't alter a dynamic array's size in any way, then it and any references to it or to any part of it will continue to point to the same data. If you dup an array, then it's guaranteed to point to different data (albeit a copy of that data). If you alter the size of an array but don't explicitly dup it, then it _might_ point to the same data and it might not (hence the potential confusion).

So, if you want to guarantee that an array continues to point to the same data, then just don't alter its size. If you want to guarantee that it's copied and points to different data, then dup or idup it. If you don't care whether it continues to point to the same data or not, then feel free to resize it by setting its length or appending to it.

Granted, it's easier to understand what's going on when you understand that a dynamic array is essentially

struct array(T) {
    T* ptr;
    size_t length;
}

but you don't really need to. Truth be told, I don't think that I've _ever_ had a bug due to how arrays reallocate. I think that as long as you understand that resizing could mean reallocation, it's quite easy to avoid bugs with it. That doesn't mean that they'll never happen, but I don't find this particular corner of the language to be particularly bug-prone.

- Jonathan M Davis
Nov 08 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 08 Nov 2010 13:35:38 -0500, Daniel Gibson <metalcaedes gmail.com>  
wrote:

 bearophile schrieb:
 Jens Mueller:

 I find this behavior rather strange.

 I don't know if it's strange, but surely it is a little bug-prone
 corner of D. I have had two or three bugs in my code because of that.

 If you pass a dynamic array to a function and change its size within
 the function, you have undefined behaviour - you never know if it will
 affect the original array (from the calling function) or not.

Not exactly. If you happen to change its size *and* change the original data afterwards, then it's somewhat undefined (I'd call it confusing, since the behavior is perfectly defined, just hard to describe).

Such cases are very rare. You are usually changing data on the array in place, or appending to the array, but not usually both.
 So IMHO a compiler warning would  be appropriate in that case.

 (It would be even better to have more consistent array handling  
 throughout the different kinds of arrays, as I wrote in another branch  
 of this thread, but if that is no option, for example because it  
 contradicts TDPL, a compiler warning is a good compromise)

First, D doesn't have compiler warnings. Either something is an error, or it is not. You can use the -w switch to turn on extra checks that become errors, but that's it.

Second, if you made that a compiler warning, then 90% of D functions would exhibit a warning. This may appear to be a surprising issue the one time it happens to you, but when it does, you learn how arrays work and move on. In practice, it's not a huge killer such as to warrant making it a compiler error or warning.

-Steve
Nov 08 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, November 08, 2010 10:35:38 Daniel Gibson wrote:
 bearophile schrieb:
 Jens Mueller:
 I find this behavior rather strange.

I don't know if it's strange, but surely it is a little bug-prone corner of D. I have had two or three bugs in my code because of that.

 If you pass a dynamic array to a function and change its size within the function, you have undefined behaviour - you never know if it will affect the original array (from the calling function) or not.

Implementation-defined behavior would be more accurate. Undefined would mean that it's something dangerous which is not defined by the language and that you shouldn't be doing. It's perfectly defined by the language in this case.

It's just that how much extra memory is allocated for a dynamic array is implementation-defined, so in cases where the array runs out of room, it's implementation-dependent whether a reallocation will be necessary. In every other case, it's completely language-defined and deterministic. The algorithm for additional capacity is the only part that isn't.
 So IMHO a compiler warning would  be appropriate in that case.
 
 (It would be even better to have more consistent array handling throughout
 the different kinds of arrays, as I wrote in another branch of this
 thread, but if that is no option, for example because it contradicts TDPL,
 a compiler warning is a good compromise)

Honestly, if you want an array that you're passing in to be altered by the function that you're passing it to, it really should be passed by ref anyway. And if you want to guarantee that it isn't going to be altered, make it const, dup it, or have its individual elements be const or immutable. Problem solved.

I really don't see this as an issue. I can understand why there might be some confusion - particularly since the online docs aren't very clear on the matter - but it really isn't complicated, and I'd hate to have to dup an array just to be able to append to it, which is what the warning that you're suggesting would require. A compiler warning is as good as an error really, since you're going to have to fix it anyway, so I'd definitely be against making this either an error or a warning. I see no problem with being allowed to resize arrays that are passed to functions.

- Jonathan M Davis
Nov 08 2010
spir <denis.spir gmail.com> writes:
On Mon, 08 Nov 2010 19:04:40 +0100
Daniel Gibson <metalcaedes gmail.com> wrote:

 IMHO passing arrays to functions are really inconsistent in D2 anyway:
 static arrays are passed by value but dynamic arrays are passed by
 reference, but then again, as this thread shows, not really..

It may be better to have 2 kinds of "sequences", one having true value semantics (assignment & parameter passing also protect the content), the other true reference semantics (say, an array-list). Static arrays may just be considered as an additional hint to the compiler for possible optimization.

Or, have arrays implement true value semantics, but pass them as ref when needed. But then, we may sometimes need assignment not to copy... What do you think?

Denis
--
vit esse estrany ☣

spir.wikidot.com
Nov 08 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 08 Nov 2010 14:22:36 -0500, Ali Çehreli <acehreli yahoo.com> wrote:

 Steven Schveighoffer wrote:
  > On Mon, 08 Nov 2010 13:35:38 -0500, Daniel Gibson
  >> If you pass a dynamic array to a function and change its size within
  >> the function, you have undefined behaviour - you never know if it  
 will
  >> affect the original array (from the calling function) or not.
  >
  > Not exactly.  If you happen to change its size *and* change the  
 original
  > data afterwards, then it's somewhat undefined

 Let's also note that appending to the array qualifies as "change its  
 size *and* change the original data afterwards." We cannot be sure  
 whether appending affects the passed-in array.

No, it doesn't. If you are appending to data that was passed in, you are not changing the *original data* passed in. You are only appending to it. For example:

char[] s = "foo".dup;
s ~= "bar";

does not change the first 3 characters at all. So any aliases to s would not be affected. However, any data aliased to the original s may or may not be aliased to the new s.

Once you start changing that original data (either via s or via an alias to the original s), this is where the confusing behavior occurs. In my experience, this does not cause a problem in the vast majority of cases.

-Steve
Nov 08 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, November 08, 2010 11:22:36 Ali Çehreli wrote:
 Steven Schveighoffer wrote:
  > On Mon, 08 Nov 2010 13:35:38 -0500, Daniel Gibson
  >
  >> If you pass a dynamic array to a function and change its size within
  >> the function, you have undefined behaviour - you never know if it will
  >> affect the original array (from the calling function) or not.
  >
  > Not exactly.  If you happen to change its size *and* change the original
  > data afterwards, then it's somewhat undefined

 Let's also note that appending to the array qualifies as "change its
 size *and* change the original data afterwards." We cannot be sure
 whether appending affects the passed-in array.

Yes you can. It _never_ alters the array which was passed in. Sure, it _could_ alter the memory just off the end of the passed-in array if no arrays refer to that memory, but that doesn't cause any problems. It doesn't alter the original array at all. It just means that when you resize that array, it may have to reallocate whereas before it might have been able to resize in place. And if the array in the called function reallocates instead of resizing in place, then the original array would either have been forced to reallocate anyway, or it may be able to resize in place, depending on how much you try to resize it and whether any other arrays refer to the memory past its end.

In _no_ case does appending or concatenating to an array alter any other arrays, even if they point to the same memory. They may or may not end up pointing to the same memory afterwards (depending on whether a reallocation takes place), but you never alter any other arrays.

- Jonathan M Davis
Nov 08 2010
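Jonathan's claim, that appending inside a callee never alters the caller's array, can be sketched like this (a D2 compiler is assumed; names are illustrative):

```d
void appendAndScribble(int[] arr)
{
    arr ~= 42;          // may grow in place or reallocate...
    arr[$ - 1] = 43;    // ...but either way this only touches the new element
}

void main()
{
    auto a = [1, 2, 3];
    appendAndScribble(a);
    // The caller's slice still has length 3 and unchanged contents:
    assert(a == [1, 2, 3]);
}
```

Whether the append grew in place (using memory just past `a`'s end) or reallocated, the three elements the caller sees are never modified.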
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, November 08, 2010 11:30:33 Daniel Gibson wrote:
 So IMHO a compiler warning would  be appropriate in that case.
 
 (It would be even better to have more consistent array handling
 throughout the different kinds of arrays, as I wrote in another branch
 of this thread, but if that is no option, for example because it
 contradicts TDPL, a compiler warning is a good compromise)

Honestly, if you want an array that you're passing in to be altered by the function that you're passing it to, it really should be passed by ref anyway.

The documentation[1] says: "For dynamic array and object parameters, which are passed by reference, in/out/ref apply only to the reference and not the contents."

So, by reading the documentation one would assume that dynamic arrays are passed by reference - *real* reference, implying that any change to the array within the called function will be visible outside of the function. The truth however is that dynamic arrays are not passed by reference, and any changes to the length will be lost (even if the array's data won't be copied).

Then the documentation should be updated to be more clear.
 And if you want to guarantee that it isn't going to be altered, make it
 const, dup it, or have its individual elements be const or immutable.
 Problem solved. I really don't see this as an issue. I can understand
 why there might be some confusion - particularly since the online docs
 aren't very clear on the matter, but it really isn't complicated, and
 I'd hate to have to dup an array just to be able to append to it, which
 is what the warning that you're suggesting would require. A compiler
 warning is as good as an error really, since you're going to have to fix
 it anyway, so I'd definitely be against making this either an error or a
 warning. I see no problem with being allowed to resize arrays that are
 passed to functions.

So maybe yet another solution would be to *really* pass dynamic arrays by reference (like the doc pretends it's already done)?

That would be a devastating change. It would cause bugs all over the place - especially when dealing with std.algorithm. Remember that an array is a range. Really, when an array is passed to a function, you're passing a slice of that array which happens to slice the whole array.

- Jonathan M Davis
Nov 08 2010
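For completeness, here is a sketch of the `ref` version that the original poster asked about, where the length change does become visible to the caller (D2 assumed):

```d
void foo(ref int[] array)
{
    array.length += 1000;   // if this reallocates, the caller's slice is rebound too
    array[0] = 1;
}

void main()
{
    auto a = new int[1];
    foo(a);
    assert(a.length == 1001);
    assert(a[0] == 1);      // no longer fails: the slice was passed by reference
}
```

With `ref`, the parameter *is* the caller's slice, so both the pointer and the length survive the call.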
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On Mon, 08 Nov 2010 20:30:33 +0100
Daniel Gibson <metalcaedes gmail.com> wrote:

 The documentation[1] says: "For dynamic array and object parameters, which are
 passed by reference, in/out/ref apply only to the reference and not the contents."
 So, by reading the documentation one would assume that dynamic arrays are passed
 by reference - *real* reference, implying that any change to the array within
 the called function will be visible outside of the function.
 The truth however is that dynamic arrays are not passed by reference and any
 changes to the length will be lost (even if the array's data won't be copied).

Exactly. Pass (and assign) by reference means different semantics from what D does with dyn arrays: that changes are shared.

Denis
-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com
Nov 08 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday 08 November 2010 12:07:39 spir wrote:
 On Mon, 08 Nov 2010 20:30:33 +0100
 
 Daniel Gibson <metalcaedes gmail.com> wrote:
 The documentation[1] says: "For dynamic array and object parameters,
 which are passed by reference, in/out/ref apply only to the reference
 and not the contents." So, by reading the documentation one would assume
 that dynamic arrays are passed by reference - *real* reference, implying
 that any change to the array within the called function will be visible
 outside of the function.
 The truth however is that dynamic arrays are not passed by reference and
 any changes to the length will be lost (even if the arrays data won't be
 copied).

Exactly. Pass (and assign) by reference means different semantics from what D does with dyn arrays: that changes are shared.

The reference semantics for Objects and dynamic arrays are identical. They are passed by reference, but their reference is passed by value. So, any changes to the contents will affect the object or array which was passed in. However, changes to the reference (such as assigning a new Object or array, or doing any operation on an array which would result in reallocation) do not change the object or array which was referred to by that reference.

"For dynamic array and object parameters, which are passed by reference, in/out/ref apply only to the reference and not the contents." is perfectly correct. The problem is whether you read the "which..." as applying to both dynamic arrays and object parameters or just object parameters. It's perfectly correct as is, but apparently ambiguous enough to cause confusion. The docs should be updated to be clearer, but the quoted documentation, at least, is correct.

And there's nothing funny about how arrays work in comparison to objects, except that arrays happen to have operations which can cause reallocation (in addition to new) whereas objects don't.

- Jonathan M Davis
Nov 08 2010
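Jonathan's distinction, shared contents versus a reference that is itself copied, can be sketched as follows (D2 assumed; names are illustrative):

```d
void touch(int[] arr)
{
    arr[0] = 99;        // mutates shared contents: visible to the caller
    arr = new int[5];   // rebinds only the local reference
    arr[0] = -1;        // from here on, invisible to the caller
}

void main()
{
    auto a = [1, 2, 3];
    touch(a);
    // The element write through the shared contents stuck;
    // the rebinding of the local reference did not:
    assert(a == [99, 2, 3]);
}
```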
prev sibling next sibling parent reply so <so so.do> writes:
D arrays are very powerful, but you first need to understand what is going
on. You should check the book.
An inconsistency is the copy of static arrays at assignment, but a necessary
one.
One thing I don't like about D arrays is an undefined case in dynamic
array reallocation.

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Nov 08 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/8/10 4:50 PM, so wrote:
 D arrays are very powerful, but you first need to understand what is going
 on. You should check the book.

Or a mildly outdated but accurate preview of the relevant chapter: http://erdani.com/d/thermopylae.pdf

Andrei
Nov 08 2010
prev sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 09/11/2010 01:43, Jonathan M Davis wrote:
 On Monday, November 08, 2010 16:50:46 so wrote:
 D arrays are very powerful, but you first need to understand what is going on.
 You should check the book.
 An inconsistency is the copy of static arrays at assignment, but a necessary
 one.
 One thing I don't like about D arrays is an undefined case in dynamic
 array reallocation.

It's perfectly defined, just not knowable at compile time. You can even check the array's capacity if you want to try and figure out when it's going to happen. And there's not really any reasonable alternative. What would you have happen instead? Make an array reallocate _every_ time that it's resized? That would be highly inefficient and could really degrade performance. Appending becomes O(n) instead of amortized O(1).

If you're not altering the actual elements of the array, then the current implementation is great. If you _are_ altering them, then simply dup the array to guarantee that it's been reallocated.

- Jonathan M Davis

Making the array reallocate _every_ time that it's resized (to a greater length) is actually not that unreasonable. Would it be highly inefficient? Only if you write bad code. TDPL agrees with you, I quote:

"
One easy way out would be to always reallocate a upon appending to it [...]
Although that behavior is easiest to implement, it has serious efficiency
problems. For example, oftentimes arrays are iteratively grown in a loop:

int[] a;
foreach (i; 0 .. 100) {
    a ~= i;
}
"

Hum, "oftentimes"? I wonder if such code is really that common (and what languages are we talking about here?)

But more importantly, there is a simple solution: don't write such code, don't use arrays as if they were lists; preallocate instead and then fill the array. So with this alternative behavior, you can still write efficient code, and nearly as easily. The only advantage of the current behavior is that it is more noob friendly, which is an advantage of debatable value.

-- 
Bruno Medeiros - Software Engineer
Nov 26 2010
next sibling parent reply =?ISO-8859-9?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:
On 11/26/2010 07:22 PM, Bruno Medeiros wrote:
 But more importantly, there is a simple solution: don't write such code,
 don't use arrays like if they are lists, preallocate instead and then
 fill the array. So with this alternative behavior, you can still write
 efficient code, and nearly as easily.

What about when you don't know the length before, or working with immutable elements?
 The only advantage of the current behavior is that it is more noob
 friendly, which is an advantage of debatable value.

I believe you will find to have that exactly backwards.
Nov 26 2010
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 26/11/2010 19:36, Pelle Månsson wrote:
 On 11/26/2010 07:22 PM, Bruno Medeiros wrote:
 But more importantly, there is a simple solution: don't write such code,
 don't use arrays like if they are lists, preallocate instead and then
 fill the array. So with this alternative behavior, you can still write
 efficient code, and nearly as easily.

What about when you don't know the length before, or working with immutable elements?

I must recognize I made a huge blunder with that, my reasoning was indeed very wrong. :(

Don't know the length: do the exponential growth yourself. Double the array length when it gets full, and keep track of the real length in a separate variable. At the end, downsize the array to the real length.
- This version is significantly more complex than the code with the current behavior, significantly enough to counter my argument (the "nearly as easily" part).
- It's actually also slightly less efficient, because whenever you grow the array, half of the elements have to be default-initialized, which is not the case when you just grow the capacity (using the current resize behavior). I think one might resolve this by doing some casts to void[] and back, but that would make this code version even more complex.

Immutable: preallocate another array _whose elements are typed as tail-immutable_, fill it, and at the end cast it to the array typed with immutable elements.
- Ouch, this one is even worse, especially depending on what the type of the immutable elements is, because of the need to determine the tail-immutable version of that type. If the elements are Objects, it's not even possible to do that, so you would need to type the temporary array as void*[]. And that would make the code even more complex if you'd want to access and use the elements while the array is being constructed. (Alternatively you could cast away all the immutable from the element type, but it's less safe; on a more complex loop you would risk modifying them by mistake.) :/

-- 
Bruno Medeiros - Software Engineer
Nov 26 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/26/10 12:22 PM, Bruno Medeiros wrote:
 On 09/11/2010 01:43, Jonathan M Davis wrote:
 On Monday, November 08, 2010 16:50:46 so wrote:
 D arrays are very powerful, but you first need to understand what is going
 on.
 You should check the book.
 An inconsistency is the copy of static arrays at assignment, but a
 necessary one.
 One thing I don't like about D arrays is an undefined case in dynamic
 array reallocation.

It's perfectly defined, just not knowable at compile time. You can even check the array's capacity if you want to try and figure out when it's going to happen. And there's not really any reasonable alternative. What would you have happen instead? Make an array reallocate _every_ time that it's resized? That would be highly inefficient and could really degrade performance. Appending becomes O(n) instead of amortized O(1).

If you're not altering the actual elements of the array, then the current implementation is great. If you _are_ altering them, then simply dup the array to guarantee that it's been reallocated.

- Jonathan M Davis

Making the array reallocate _every_ time that it's resized (to a greater length) is actually not that unreasonable. Would it be highly inefficient? Only if you write bad code. TDPL agrees with you, I quote:

"
One easy way out would be to always reallocate a upon appending to it [...]
Although that behavior is easiest to implement, it has serious efficiency
problems. For example, oftentimes arrays are iteratively grown in a loop:

int[] a;
foreach (i; 0 .. 100) {
    a ~= i;
}
"

Hum, "oftentimes"? I wonder if such code is really that common (and what languages are we talking about here?)

It would be difficult to challenge the assumption that appends in a loop are common.
 But more importantly, there is a simple solution: don't write such code,
 don't use arrays like if they are lists, preallocate instead and then
 fill the array. So with this alternative behavior, you can still write
 efficient code, and nearly as easily.

I disagree. Often you don't know the length to preallocate (e.g. input is from a file etc). The fact that there's a convenient append operator only makes things more in favor of supporting such idioms. The technique (exponential capacity growth) is well known.
 The only advantage of the current behavior is that it is more noob
 friendly, which is an advantage of debatable value.

I don't think the current behavior favors noobs. Andrei
Nov 26 2010
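When the final length really is unknown, as in Andrei's file-input example, Phobos also offers std.array.appender as an alternative to repeated `~=`. A sketch (the `.data` accessor is the D2-era spelling):

```d
import std.array : appender;

void main()
{
    auto app = appender!(int[])();
    foreach (i; 0 .. 100)
        app.put(i);             // amortized O(1); capacity grows exponentially
    int[] a = app.data;
    assert(a.length == 100 && a[99] == 99);
}
```

The appender keeps its own capacity bookkeeping, so it avoids the per-append runtime lookups that plain `~=` on a slice has to perform.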
next sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 26/11/2010 19:16, Andrei Alexandrescu wrote:
 On 11/26/10 12:22 PM, Bruno Medeiros wrote:

 But more importantly, there is a simple solution: don't write such code,
 don't use arrays like if they are lists, preallocate instead and then
 fill the array. So with this alternative behavior, you can still write
 efficient code, and nearly as easily.

I disagree. Often you don't know the length to preallocate (e.g. input is from a file etc). The fact that there's a convenient append operator only makes things more in favor of supporting such idioms. The technique (exponential capacity growth) is well known.
 The only advantage of the current behavior is that it is more noob
 friendly, which is an advantage of debatable value.

I don't think the current behavior favors noobs. Andrei

You could still do exponential capacity growth by manipulating the length property, but yeah, that would create a host of complexity and other issues (see my reply to Pelle). Yeah, my reasoning was really broken. :'( (I need some R&R, lol) -- Bruno Medeiros - Software Engineer
Nov 26 2010
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 26/11/2010 22:12, spir wrote:
 On Fri, 26 Nov 2010 21:59:37 +0000
 Bruno Medeiros<brunodomedeiros+spam com.gmail>  wrote:

 You could still do exponential capacity growth by manipulating the
 length property, but yeah, that would create a host of complexity and
 other issues (see my reply to Pelle). Yeah, my reasoning was really
 broken. :'(

What is the reason why D does not internally manage exp growth with a (private) capacity field? It's very common, and I thought the reason was precisely efficiency, as it does not require reallocating so often, esp. in cases such as feeding an array in a loop like in Andrei's example.

Denis
-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com

But D does exactly that: there is a capacity field (internal to the GC), and array growth is managed automatically, in an exponential way.

-- 
Bruno Medeiros - Software Engineer
Nov 26 2010
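Bruno's point can be observed directly through the `capacity` and `reserve` array primitives that came with the druntime append changes mentioned earlier in the thread (a sketch, D2 assumed):

```d
void main()
{
    int[] a;
    a.reserve(100);            // ask the runtime for room up front
    auto cap = a.capacity;     // slack beyond .length; at least 100 here
    foreach (i; 0 .. 100)
        a ~= i;                // stays within the reserved block: no reallocation
    assert(a.length == 100 && a.capacity >= 100);
}
```

`capacity` exposes the GC-internal field Bruno refers to; without the explicit `reserve`, the runtime still grows the block exponentially on its own.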
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 26/11/2010 19:16, Andrei Alexandrescu wrote:
 "
 One easy way out would be to always reallocate a upon appending to it
 [...]
 Although that behavior is easiest to implement, it
 has serious efficiency problems. For example, oftentimes arrays are
 iteratively grown in a loop:

 int[] a;
 foreach (i; 0 .. 100) {
 a ~= i;
 }

 "

 Hum, "oftentimes"? I wonder if such code is really that common (and what
 languages are we talking about here?)

It would be difficult to challenge the assumption that appends in a loop are common.

Well, there was actually no assumption yet; I wanted first of all to know what languages you had in mind, because I wasn't sure I understood you correctly.

C and C++ don't even have (dynamic) arrays. Java, Javascript (and C# as well, I think) have them, but there is no append operation; arrays cannot be resized. So trivially that idiom is not common in these languages. :)

I don't know about Python, Ruby, Erlang, Haskell, Perl, PHP. Or perhaps you were just being very liberal in your meaning of "arrays", and were also thinking of constructs like C++'s std::vector?

-- 
Bruno Medeiros - Software Engineer
Nov 26 2010
prev sibling parent reply Kagamin <spam here.lot> writes:
Bruno Medeiros Wrote:

 Making the array reallocate _every_ time that it's resized (to a greater 
 length) is actually not that unreasonable. Would it be highly 
 inefficient? Only if you write bad code. TDPL agrees with you, I quote:

Challenge: make D slower than C#.
Nov 26 2010
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 26/11/2010 21:30, Kagamin wrote:
 Bruno Medeiros Wrote:

 Making the array reallocate _every_ time that it's resized (to a greater
 length) is actually not that unreasonable. Would it be highly
 inefficient? Only if you write bad code. TDPL agrees with you, I quote:

Challenge: make D slower than C#.

Huh? -- Bruno Medeiros - Software Engineer
Nov 26 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, November 08, 2010 16:50:46 so wrote:
 D arrays are very powerful, but you first need to understand what is going on.
 You should check the book.
 An inconsistency is the copy of static arrays at assignment, but a necessary
 one.
 One thing I don't like about D arrays is an undefined case in dynamic
 array reallocation.

It's perfectly defined, just not knowable at compile time. You can even check the array's capacity if you want to try and figure out when it's going to happen. And there's not really any reasonable alternative. What would you have happen instead? Make an array reallocate _every_ time that it's resized? That would be highly inefficient and could really degrade performance. Appending becomes O(n) instead of amortized O(1).

If you're not altering the actual elements of the array, then the current implementation is great. If you _are_ altering them, then simply dup the array to guarantee that it's been reallocated.

- Jonathan M Davis
Nov 08 2010
prev sibling next sibling parent so <so so.do> writes:
I didn't mean that one, check page 112 on  
http://erdani.com/d/thermopylae.pdf

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Nov 08 2010
prev sibling next sibling parent so <so so.do> writes:
Oh yeah, you are right, I said reallocation. Should have said assignment.

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Nov 08 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 08 Nov 2010 18:29:27 -0500, Ali Çehreli <acehreli yahoo.com> wrote:

 Steven Schveighoffer wrote:

  > No, it doesn't. If you are appending to data that was passed in, you  
 are
  > not changing the *original data* passed in.  You are only appending  
 to it.

 I must be remembering an old behavior. I think appending could affect  
 the original if it had enough capacity.

Before the array append changes were introduced (somewhere around 2.040 I think?), appending to a slice that started at the beginning of the memory block could affect the other data in the array. But that was a memory corruption issue, somewhat different than what we are talking about. -Steve
Nov 09 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Nov 2010 08:14:40 -0500, Pillsy <pillsbury gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Mon, 08 Nov 2010 18:29:27 -0500, Ali Çehreli <acehreli yahoo.com>  
 wrote:

 I must be remembering an old behavior. I think appending could
 affect the original if it had enough capacity.


 Before the array append changes were introduced (somewhere around
 2.040 I think?), appending to a slice that started at the beginning of
 the memory block could affect the other data in the array.  But that
 was a memory corruption issue, somewhat different than what we
 are talking about.

Ah! This is a lot of what was confusing me about arrays; I still thought they had this behavior. The fact that they don't makes me a good deal more comfortable with them, though I still don't like the non-deterministic way that they may copy their elements or they may share structure after you append stuff to them.

As I said before, this rarely affects code. The common cases I've seen:

1. You append to an array and return it.
2. You modify data in the array.
3. You use a passed-in array as a buffer, which means you overwrite the array, and then start appending when it runs out of space.

I don't ever remember seeing: you append to an array, then go back and modify the first few bytes of the array.

Let's assume this is a very common thing and absolutely needs to be addressed. What would you like the behavior to be? How can anything different from the current behavior be reasonable?

IMO, the benefits of just being able to append to an array any time you want, without having to set up some special type, far outweigh this little quirk that almost nobody encounters. You can append to *any* array, no matter where the data is located, or whether the data is a slice, and it just works. I can't see how anyone would prefer another solution!

-Steve
Nov 09 2010
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On Tue, 09 Nov 2010 15:13:55 -0500
Pillsy <pillsbury gmail.com> wrote:

 There's a difference between appending and appending in place. The problem
 with not appending in place (and arrays not having the possibility of a
 reserve that's larger than the actual amount, of course) is one of
 efficiency. Having

 auto s = "foo";
 s ~= "bar";

 result in a new array being allocated that is of length 6 and contains
 "foobar" is obviously useful and desirable behavior. If the expansion can
 happen in place, that's a perfectly reasonable performance optimization to
 have in the case of strings or other immutable arrays. Indeed, one of the
 reasons that functional programming and GC go together like peanut butter
 and jelly is that together they let you get all sorts of wins in terms of
 efficiency from shared structure.

 However, I've found working with languages that mix a lot of imperative and
 functional constructs that if you're going to do this, it's really very
 important that there not be any doubt about when mutable state is shared
 and when it isn't. D is trying to be that same kind of multi-paradigm
 language.

+++

 This means that, for mutable arrays, having

 int[] x = [1, 2, 3];
 x ~= [4, 5, 6];

 maybe reallocate and maybe not seems like it's only really there to protect
 people from doing inefficient things by accident when they append onto the
 back of an array repeatedly (or to make that admittedly common case more
 convenient). This really doesn't strike me as worth the trouble. Like I
 said elsewhere, the uncertainty gives me the screaming willies.

There is some trouble in there; but it's hard to point it out clearly. After

int[] ints1;
...
ints2 = ints1;
...

depending on what happens to each array, especially when passed to funcs that could manipulate them, the relation initially established between variables may or may not be maintained. Also, it may be broken only in some cases, depending on what operations are performed during a given run.

Possibly, in practice, things are much easier to do right than seems at first sight. But my impression is I will be bitten more than once, and badly. (*)

Denis

(*) Reminds me of python's famous gotcha, which instead _establishes_ an unexpected relation:

def f(i, l=[]):
    l.append(i)
    return l
print f(1)   # [1]
print f(1)   # [1, 1]
print f(1)   # [1, 1, 1]

-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com
Nov 09 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Nov 2010 15:13:55 -0500, Pillsy <pillsbury gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Tue, 09 Nov 2010 08:14:40 -0500, Pillsy <pillsbury gmail.com> wrote:

 Ah! This is a lot of what was confusing me about arrays; I still thought
 they had this behavior. The fact that they don't makes me a good deal
 more comfortable with them, though I still don't like the
 non-deterministic way that they may copy their elements or they may
 share structure after you append stuff to them.


 As I said before, this rarely affects code.  The common cases I've seen:

 1. You append to an array and return it.
 2. You modify data in the array.
 3. You use a passed in array as a buffer, which means you overwrite the
 array, and then start appending when it runs out of space.

 I don't ever remember seeing:

 You append to an array, then go back and modify the first few bytes of  
 the
 array.

where standard library functions will return mutable arrays which may or may not share structure with their inputs. This has been such a frequent source of pain when using that language that I tend to react very negatively to the possibility in any context.

Care to name names? I want to understand this dislike of D arrays, because out of all the languages I've ever used, D arrays are by far the easiest and most intuitive to use.

I don't expect to be convinced, but at least we can have some debate on this, and maybe we can avoid mistakes made by other languages.
 Let's assume this is a very common thing and absolutely needs to be
 addressed.  What would you like the behavior to be?

Using a different library type for a buffer you can append to. I think of "a buffer or abstract list you can cheaply append to" as a different sort of type from a fixed-size buffer anyway, since it so often is a different type. Arrays/slices are a very basic type in D, and I'm generally thinking that giving your basic types simpler, easier-to-understand semantics is worth paying a modest cost.

There was a time when the T[new] idea was expected to be part of the language. Both Andrei and Walter were behind it, and seldom does something not make it into the language when that happens. It turns out that, after all the academic and theoretical discussions were finished and it came time to implement, it was a clunky and confusing feature. Andrei said that for TDPL he had a whole table dedicated to what type to use in which cases (T[] or T[new]) and he didn't even know how to fill out the table.

The beauty of D's arrays is that the slice and the array are both the same type, so you only need to define one function to handle both, and appending "just works". I feel like this is simply a case of 'not well enough understood.'

BTW, you can allocate a fixed buffer by doing:

T[BUFSIZE] buffer;

This cannot be appended to. It is still difficult to allocate one of these on the heap, which is a language shortcoming, but it can be fixed.
 [...]
 IMO, the benefits of just being able to append to an array any time you
 want without having to set up some special type far outweighs this  
 little
 quirk that almost nobody encounters.  You can append to *any* array, no
 matter where the data is located, or whether the data is a slice, and it
 just works.  I can't see how anyone would prefer another solution!

There's a difference between appending and appending in place. The problem with not appending in place (and arrays not having the possibility of a reserve that's larger than the actual amount, of course) is one of efficiency. Having

auto s = "foo";
s ~= "bar";

result in a new array being allocated that is of length 6 and contains "foobar", and assigning that array to `s`, is obviously useful and desirable behavior. If the expansion can happen in place, that's a perfectly reasonable performance optimization to have in the case of strings or other immutable arrays. Indeed, one of the reasons that functional programming and GC go together like peanut butter and jelly is that together they let you get all sorts of wins in terms of efficiency from shared structure.

However, I've found working with languages that mix a lot of imperative and functional constructs (Lisp is one, but not the only one) that if you're going to do this, it's really very important that there not be any doubt about when mutable state is shared and when it isn't. D is trying to be that same kind of multi-paradigm language. This means that, for mutable arrays, having

int[] x = [1, 2, 3];
x ~= [4, 5, 6];

To leave no doubt about whether this reallocates or not, try:

bool willReallocate = x.length + 3 > x.capacity;

But I still don't understand this concept. If you find out it's not going to reallocate, what are you going to do? I mean, you have three cases here:

1. You *don't* want it to reallocate -- well, you can't enforce this, but you can use ref to ensure the original is always affected
2. You *want* it to reallocate -- use dup or ~
3. You don't care -- just use the array directly

I don't see how these three options aren't enough.
 maybe reallocate and maybe not seems like it's only really there to  
 protect people from doing inefficient things by accident when they  
 append onto the back of an array repeatedly (or to make that admittedly  
 common case more convenient). This really doesn't strike me as worth the  
 trouble. Like I said elsewhere, the uncertainty gives me the screaming  
 willies.

I hear you, but at the same time, we are talking about common and uncommon cases here. D (at least in my mind) tries to be a practical language -- make the common things easy as long as they are safe. And the cases where D's arrays may surprise you are pretty uncommon IMO. -Steve
Nov 10 2010
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On Fri, 26 Nov 2010 18:22:46 +0000
Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote:

 Making the array reallocate _every_ time that it's resized (to a greater
 length) is actually not that unreasonable. Would it be highly
 inefficient? Only if you write bad code. TDPL agrees with you, I quote:

 "
 One easy way out would be to always reallocate a upon appending to it
 [...]
 Although that behavior is easiest to implement, it
 has serious efficiency problems. For example, oftentimes arrays are
 iteratively grown in a loop:

    int[] a;
    foreach (i; 0 .. 100) {
      a ~= i;
    }

 "

 Hum, "oftentimes"? I wonder if such code is really that common (and what
 languages are we talking about here?)

 But more importantly, there is a simple solution: don't write such code,
 don't use arrays as if they were lists, preallocate instead and then
 fill the array. So with this alternative behavior, you can still write
 efficient code, and nearly as easily.

 The only advantage of the current behavior is that it is more noob
 friendly, which is an advantage of debatable value.

Well, except that "noobs" usually don't care about performance.
(Anybody else would preallocate, I guess, if only because it is just a few more key strokes; but the corresponding idiom is not that obvious:
T[] xxx = new T[yyy.length];
)

Denis
-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com
Nov 26 2010
prev sibling parent spir <denis.spir gmail.com> writes:
On Fri, 26 Nov 2010 21:59:37 +0000
Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote:

 You could still do exponential capacity growth by manipulating the
 length property, but yeah, that would create a host of complexity and
 other issues (see my reply to Pelle). Yeah, my reasoning was really
 broken. :'(

What is the reason why D does not internally manage exp growth with a (private) capacity field? It's very common, and I thought the reason was precisely efficiency, as it does not require reallocating so often, esp. in cases such as feeding an array in a loop like in Andrei's example.

Denis
-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com
Nov 26 2010