digitalmars.D - Word of warning about passing arrays to functions..

reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
This was something that bit me.  I thought arrays were passed by reference 
to functions, like classes.  Hence the whole copy-on-write situation.

Well, they're not.  Sort of.

If you only _modify_ the contents of the array, it's as if you passed it 
byref (this is more of a programmer's error than anything - a ticking time 
bomb).  But if you do something that changes the array's .ptr (like resizing 
it, which may cause the data to be moved), the array in the calling function 
is left alone.

I was doing this:

void something(char[] s)
{
    // modify s in place
}

And it worked fine.  I could pass in a string and it would be modified in 
place.  But then I changed it so something() changed the length of the 
array, and it no longer worked.

I was puzzled, until I added an "inout" to the parameter decl, at which 
point it worked again.
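
A minimal sketch of the behaviour described above, assuming a current D compiler, where the `ref` storage class does the job that `inout` does in this thread. The function names are made up for illustration. A slice of a fixed-size array is used so that growing it is certain to move the data.

```d
import std.stdio;

// By-value slice parameter: the function gets its own (ptr, length) pair.
void modifyInPlace(char[] s)
{
    s[0] = 'J';                // writes through the pointer shared with the caller
}

// Resizing a by-value slice only changes the function's local pair.
void growByValue(char[] s)
{
    s.length = s.length + 1;
    s[$ - 1] = '!';
}

// With ref, the caller's (ptr, length) pair itself is updated.
void growByRef(ref char[] s)
{
    s.length = s.length + 1;
    s[$ - 1] = '!';
}

void main()
{
    char[5] buf = "hello";
    char[] s = buf[];

    modifyInPlace(s);
    assert(s == "Jello");      // the content change is visible

    growByValue(s);
    assert(s == "Jello");      // the length change is not

    growByRef(s);
    assert(s == "Jello!");     // now it is

    writeln(s);
}
```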

Just something to remember when you're passing in arrays to functions. 
Jun 21 2005
next sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
Makes perfect sense to me.  I just think of arrays as a struct - pointer 
and length.  Modify the data pointed to and you modify everyone's 
copy... modify the pointer and you modify the struct (stack) copy.
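
The struct picture can be sketched directly. `FakeSlice` below is a hypothetical stand-in for what a char[] carries, not the compiler's actual layout, and the function names are invented for the example.

```d
import std.stdio;

// Hypothetical model of an array reference: a pointer and a length.
struct FakeSlice
{
    char* ptr;
    size_t length;
}

void touchData(FakeSlice s)
{
    s.ptr[0] = 'X';            // changes memory that every copy points at
}

void touchStruct(FakeSlice s)
{
    s.length = 0;              // changes only this function's stack copy
    s.ptr = null;
}

void main()
{
    char[3] buf = "abc";
    auto s = FakeSlice(buf.ptr, buf.length);

    touchData(s);
    assert(buf[] == "Xbc");    // the data was modified for everyone

    touchStruct(s);
    assert(s.length == 3 && s.ptr is buf.ptr);   // the caller's struct is untouched

    writeln(buf[]);
}
```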

-[Unknown]



Jun 21 2005
parent Dawid Ciężarkiewicz <arael fov.pl> writes:
Unknown W. Brackets wrote:

 Makes perfect sense to me.  I just think of arrays as a struct - pointer
 and length.  Modify the data pointed to and you modify everyone's
 copy... modify the pointer and you modify the struct (stack) copy.

I like this explanation. It's simple, and I'll keep it in mind when dealing with arrays. Thanks.

--
Dawid Ciężarkiewicz | arael
Jun 22 2005
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Tue, 21 Jun 2005 21:27:40 -0400, Jarrett Billingsley wrote:

 And it worked fine.  I could pass in a string and it would be modified in 
 place.  But then I changed it so something() changed the length of the 
 array, and it no longer worked.

Well, it did work according to the compiler; it just didn't do what you expected. When 's' is passed, you get a pointer and a length. Your routine modified the data in place, that is, it updated the RAM that the pointer points to. But when you change the length, D allocates a new lot of RAM, copies the data to it, and then sets 's' to the new RAM. But because you passed 's' as an "in" parameter, the changes to it are not passed back to the calling routine. Thus, from the perspective of the calling routine, the parameter still points to the original RAM.

The tricky part is that if your function modified the RAM before it changed the length, the calling routine will see your changes but not the new length. If you change the length before modifying the RAM, the calling routine will not see your modifications.

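The ordering effect can be sketched as follows, assuming a current D compiler; the function names are hypothetical. The array is a slice of a fixed-size buffer, so the resize is certain to reallocate. With garbage-collected memory the runtime may instead extend the block in place, in which case the second function's write would also reach the caller.

```d
import std.stdio;

void modifyThenResize(char[] s)
{
    s[0] = 'A';                // still the caller's memory
    s.length = s.length + 1;   // s moves to new memory afterwards
}

void resizeThenModify(char[] s)
{
    s.length = s.length + 1;   // s now points at a fresh copy
    s[0] = 'B';                // lands in the copy, not in the caller's memory
}

void main()
{
    char[4] buf = "test";
    char[] s = buf[];

    modifyThenResize(s);
    assert(s == "Aest");       // change seen, new length not seen

    resizeThenModify(s);
    assert(s == "Aest");       // neither the change nor the new length is seen

    writeln(s);
}
```
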
 I was puzzled, until I added an "inout" to the parameter decl, at which 
 point it worked again.

Yes, by making the [pointer/length] combination an "inout" parameter, your changes to the length get sent back to the calling routine.

--
Derek
Melbourne, Australia
22/06/2005 12:01:14 PM
Jun 21 2005
next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:194ul3972wdxy.1hekvtrwup6wl$.dlg 40tude.net...
 When 's' is passed, you get a pointer and a length. Your routine modified
 the data in place, that is it updated the RAM that the pointer points to.
 But when you change the length, D allocates a new lot of RAM and copies
 the data to it then sets 's' to the new RAM. But because you passed 's' as
 an "in" parameter, the changes to it are not passed back to the calling
 routine. Thus from the perspective of the calling routine, the parameter
 still points to the original RAM.

Oh I know well _why_ it happens, it's just that it wasn't readily apparent that it _was_ happening at first.
Jun 21 2005
prev sibling parent reply Jan-Eric Duden <jeduden whisset.com> writes:
[...] that modify arrays.
If I call the sort method of an array in something(), the change is visible.
If I call the reverse method of an array in something(), the change is visible.
If I set the length property in something(), the change is not visible.

Maybe arrays should always be passed as inout; otherwise this behavior only asks for trouble.

Just my 0.2 euro cent....
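
The three cases above can be sketched like this. In the D of this thread, sort and reverse were array properties; the sketch assumes a current compiler and uses `sort` and `reverse` from `std.algorithm` instead. The wrapper names are invented.

```d
import std.algorithm : reverse, sort;
import std.stdio;

void sortInside(int[] a)    { sort(a); }        // reorders the shared elements
void reverseInside(int[] a) { reverse(a); }     // reorders the shared elements
void shrinkInside(int[] a)  { a.length = 1; }   // changes only the local length

void main()
{
    int[] a = [3, 1, 2];

    sortInside(a);
    assert(a == [1, 2, 3]);    // visible

    reverseInside(a);
    assert(a == [3, 2, 1]);    // visible

    shrinkInside(a);
    assert(a == [3, 2, 1]);    // not visible: the caller still has length 3

    writeln(a);
}
```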
Jun 22 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
This is another case where enforcing readonly for 'in' parameters, and giving a compile-time (when possible) or DBC-style runtime error for violations, would have shown the bug immediately.

It makes sense to want to modify the contents of an array without modifying the reference, e.g. to add 1 to all numbers in an array of integers. This is an example of an array passed with 'in'.

It makes sense to want to append data to an array, modifying the array reference's length and possibly the data pointer. This is an example of an array passed with 'inout'.

It makes no sense to pass an array as 'in' and attempt to modify the length and/or data pointer. If a modifiable reference is required, it's simple enough to create one for that purpose, e.g.

void foo(char[] a)
{
    char[] b = a;
    // it would be a compile-time error to modify 'a'.
    // it would be fine to modify 'b'.
}

In cases where a compile-time error is not possible, i.e. the compiler cannot detect the error, an implicit DBC check at the end of the function should compare the input array with the local copy; any changes would throw an exception.

As a side issue, it also makes sense to want to pass an array whose data and reference are unmodifiable. I thought this wasn't possible, but it seems it now is:

import std.stdio;

const char[] test1 = "testing1";
char[] test2 = "testing2";

void foo(char[] p)
{
    p[0] = 'a';
}

void main()
{
    foo(test1);
    writefln("%s",test1);

    foo(test2);
    writefln("%s",test2);
}

[output]
testing1
aesting2

Regan
Jun 22 2005
parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
Shouldn't the compiler complain when passing a const char[] to a
func(char[]) ??

L.

| void foo(char[] p)
| {
| p[0] = 'a';
| }
|
| void main()
| {
| foo(test1);
| writefln("%s",test1);
|
| foo(test2);
| writefln("%s",test2);
| }
|
| [output]
| testing1
| aesting2
Jun 22 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 22 Jun 2005 16:09:28 +0300, Lionello Lunesu  
<lio lunesu.removethis.com> wrote:
 Shouldn't the compiler complain when passing a const char[] to a
 func(char[]) ??

void foo(char[] p)
{
    p[0] = 'a';
}

'p' is an 'in' parameter. 'in' means "I will read this parameter". So, technically, passing a const as 'in' is fine. However, the function goes on to modify the 'in' parameter.

IMO the readonly nature of 'in' needs to be enforced. To do this I would:

1. have a compile-time flag for variables signalling whether they are readonly or not, and error on incorrect usage (like this one)
2. have a runtime DBC flag for variables signalling whether they are readonly or not (this includes user classes, structs, unions, etc.), and throw an exception for incorrect usage.

#2 is DBC only, so disabled with -release.

2 rules are also required:
1. immutable data cannot become mutable
2. mutable data can implicitly be immutable.

Regan
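
For comparison, the two rules can be sketched with the `const` type qualifier of current D, which did not exist in this form when the thread was written. Reading "immutable" in the rules as "read-only view" is an assumption of this sketch, and `countA` is a made-up function.

```d
import std.stdio;

// A read-only view: accepts mutable and immutable data alike.
size_t countA(const(char)[] p)
{
    size_t n;
    foreach (c; p)
        if (c == 'a') ++n;
    return n;
}

// Rule 2: mutable data converts implicitly to a read-only view.
static assert(is(char[] : const(char)[]));
// Rule 1: a read-only view does not convert back to mutable data.
static assert(!is(const(char)[] : char[]));
// Writing through a read-only view is rejected at compile time.
static assert(!__traits(compiles, { const(char)[] p; p[0] = 'a'; }));

void main()
{
    char[] m = "banana".dup;   // mutable data
    string i = "data";         // immutable data
    assert(countA(m) == 3);
    assert(countA(i) == 2);
    writeln("ok");
}
```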
Jun 22 2005
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opssslvmy023k2f5 nrage.netwin.co.nz...
 On Wed, 22 Jun 2005 16:09:28 +0300, Lionello Lunesu 
 <lio lunesu.removethis.com> wrote:
 Shouldn't the compiler complain when passing a const char[] to a
 func(char[]) ??

 void foo(char[] p) { p[0] = 'a'; }

 'p' is an 'in' parameter. 'in' means "I will read this parameter". So,
 technically passing a const as 'in' is fine. However, the function goes on
 to modify the 'in' parameter.

 IMO the readonly nature of 'in' needs to be enforced. To do this I would:

 1. have a compile time flag for variables signalling whether they are
 readonly or not, error on incorrect usage (like this one)

Too many people have tried, too many times, to convince Walter to implement const. These efforts have no future.

 2. have a runtime DBC flag for variables signalling whether they are 
 readonly or not (this includes user classes, structs, unions, etc) throw 
 an exception for incorrect usage.

I've tried to do this already. Without opAssign it is just impossible.
 #2 is DBC only so disabled with -release

 2 rules are also required:
 1. immutable data cannot become mutable
 2. mutable data can implicitly be immutable.

 Regan

PS: about RAII: http://www.digitalmars.com/d/archives/7988.html
Jun 23 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Thu, 23 Jun 2005 11:32:06 -0700, Andrew Fedoniouk
<news terrainformatica.com> wrote:

 Too many times and too many people were trying to convince
 Walter to implement const. These efforts has no future.

You can think what you like. I'm going to keep suggesting enforced/checked "readonly" for 'in' (not 'const') until I get a definitive answer from Walter.

It's my impression that some of what we've been suggesting has already been implemented. After all, a while back this:

const char[] bob = "test";
void foo(char[] a) { a[0] = 'a'; }
foo(bob);

actually modified the string 'bob'. Walter fixed that.

 2. have a runtime DBC flag for variables signalling whether they are
 readonly or not (this includes user classes, structs, unions, etc) throw
 an exception for incorrect usage.

 I've tried to do this already. Without opAssign it is just impossible.

I'm not suggesting *you* do it at all. I'm suggesting the compiler does it. My other response to you in another thread describes how this would work.
 PS: about RAII: http://www.digitalmars.com/d/archives/7988.html

That is an interesting thread. Thanks. So, what did you want me to see in there in particular?

Regan
Jun 23 2005
parent reply Oskar Linde <olREM OVEnada.kth.se> writes:
Regan Heath wrote:

 On Thu, 23 Jun 2005 11:32:06 -0700, Andrew Fedoniouk
 <news terrainformatica.com> wrote:
 Too many times and too many people were trying to convince
 Walter to implement const. These efforts has no future.

 You can think what you like. I'm going to keep suggesting enforced/checked
 "readonly" for 'in' (not 'const') until I get a definitive answer from
 Walter. It's my impression that some of what we've been suggesting has
 already been implemented, after all a while back this:

 const char[] bob = "test";
 void foo(char[] a) { a[0] = 'a'; }
 foo(bob);

 actually modified the string 'bob'. Walter fixed that.

The only thing fixed, afaik, was that bob was placed in a read only memory segment which is what const in D does. Const is a storage class, not a type modifier. Citing Walter from 2004-05-09:
 <hot air>
 The reason const as a type modifier is not in D is because the cost is 
 high (in terms of complexity of the language, and complexity of the
 implementation) and the benefits are slight. I've used const for 15 years
 and not once has it ever found or prevented a bug. Because it can be 
 legally subverted, it's  useless as a hint to the optimizer. For the final 
 indignity, it's ugly to have 'const' appear all over the place (doing a wc
 on some headers shows that 'const' is usually among the most frequently
 occurring words).
 </hot air>

As a contrast to that last argument, it is interesting to note that some C++ experts today would prefer it if const had been the default type modifier, with an explicit mutable attribute for the other cases.

/ Oskar
Jun 24 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 24 Jun 2005 09:09:46 +0000, Oskar Linde <olREM OVEnada.kth.se>  
wrote:
 The only thing fixed, afaik, was that bob was placed in a read only memory
 segment which is what const in D does.

But before, it was only doing this for the array reference; this is a step forward, if only a small one.

 Const is a storage class, not a type
 modifier.

Yep, and I think that's the 'correct' way to implement it.

Regan
Jun 24 2005
parent reply Oskar Linde <Oskar_member pathlink.com> writes:
In article <opssvdy0s823k2f5 nrage.netwin.co.nz>, Regan Heath says...
 Const is a storage class, not a type
 modifier.

 Yep and I think that's the 'correct' way to implement it. Regan

I actually agree, and it fulfills one need. But people also want to use it as a way of signalling ownership:

const char[] getHostName();

saying you may read this char[], but it is not yours to change.

In C/C++, ownership is even more important to keep track of, because with it also comes the responsibility of cleaning up, and for this reason the auto_ptr/shared_ptr idioms have appeared in C++:

auto_ptr<> is used as a means of transferring ownership: a function taking an auto_ptr<> argument assumes ownership of the data, and a function returning an auto_ptr<> gives the ownership away.

A shared_ptr<> is ref-counted, and you could check its ref-count to see if you are the sole owner.

D really needs a way to declare ownership/constness/readonliness. You need to know if

void func1(char[] arr);

will modify the contents of arr, and if you may modify the contents returned from

char[] func2();

Today, defensive .dup-ing seems to be the only way to be safe.

/Oskar
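
A small sketch of defensive .dup-ing, assuming a current D compiler. The body given to `func1` here is invented; the point is only that its signature does not say whether it writes to its argument.

```d
import std.stdio;

// Hypothetical library function: the signature alone does not reveal
// that it modifies the contents of arr.
void func1(char[] arr)
{
    if (arr.length)
        arr[0] = '#';
}

void main()
{
    char[] name = "host".dup;

    // Defensive copy: func1 gets its own data to scribble on.
    func1(name.dup);
    assert(name == "host");

    // Without the copy, the caller's data is changed.
    func1(name);
    assert(name == "#ost");

    writeln(name);
}
```

The price of the first call is an allocation and a copy on every call, whether or not func1 actually writes.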
Jun 24 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 24 Jun 2005 11:38:27 +0000 (UTC), Oskar Linde  
<Oskar_member pathlink.com> wrote:
 I actually agree, and it fulfills one need. But people also want to use it
 as a way of signalling ownership:

I know. The solution as I see it is to have both a compile-time and runtime readonly flag for all variables and to enforce const and 'in' as being readonly.

In other words, as the compiler runs it flags variables as readonly based on the presence of const, or being passed as an 'in' parameter to a function. It then gives an error if any of these variables are modified.

If that cannot catch all cases then a runtime readonly flag could be used to ensure the remaining cases are caught. This could be a DBC feature and disabled in -release builds for efficiency's sake. By readonly flag I mean every single variable has a bit that is set when it's readonly, and if modified while that bit is set it throws an exception.

If this is too much overhead then another option is to have implicit 'out' blocks on all functions where it checks the 'in' parameters have not been modified in any way. A simple binary comparison is enough in most cases.

These ideas should solve the problems you've mentioned below.

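The implicit end-of-function check could be approximated by hand roughly as follows. This is only a sketch under assumptions: `sameReference` is a made-up helper that compares pointer and length, and `enforce` is used so the failure is an ordinary exception, whereas the proposal would have the compiler generate the check and drop it in -release builds.

```d
import std.exception : enforce;
import std.stdio;

// Hypothetical helper: a binary comparison of two (ptr, length) pairs.
bool sameReference(const(int)[] before, const(int)[] after)
{
    return before.ptr is after.ptr && before.length == after.length;
}

void addOne(int[] a)
{
    const(int)[] atEntry = a;      // snapshot of the reference on entry
    foreach (ref x; a)
        x += 1;                    // changing the contents is allowed
    enforce(sameReference(atEntry, a), "'in' array reference was modified");
}

void appendZero(int[] a)
{
    const(int)[] atEntry = a;
    a ~= 0;                        // changes the local length, so the check fails
    enforce(sameReference(atEntry, a), "'in' array reference was modified");
}

void main()
{
    int[] data = [1, 2, 3];

    addOne(data);
    assert(data == [2, 3, 4]);

    bool threw = false;
    try
        appendZero(data);
    catch (Exception e)
        threw = true;
    assert(threw);                 // the violation is reported at runtime

    writeln("ok");
}
```
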
 const char[] getHostName();

 saying you may read this char[], but it is not yours to change.

Solved if 'const' causes the readonly bit to be set at compile time and/or runtime and it is enforced.
 In C/C++, ownership is even more important to keep track of, because with
 it also comes the responsibility of cleaning up, and for this reason the
 auto_ptr/shared_ptr idioms have appeared in C++:

Some of the need for "cleaning up" is alleviated by the fact that we have a GC. All that remains is a requirement for deterministic destruction to release resources. 'auto' is the D solution for that, it handles most/all cases required and will improve in efficiency when Walter makes 'auto' classes stack allocated.
 auto_ptr<> is used as a means of transfering ownership: A function taking
 an auto_ptr<> argument assumes ownership of the data, and a function
 returning an auto_ptr<> gives the ownership away.

 a shared_ptr<> is ref-counted and you could check its ref-count to see if
 you are the sole owner.

If 'in' and const both have enforced readonly semantics, then you can safely know whether you have ownership of a variable: unless it's 'in' or const, you do. Otherwise you have to .dup in order to write to it.
 D really needs a way to declare ownership/constness/readonliness.
 You need to know if

 void func1(char[] arr);

 will modify the contents of arr

As 'arr' is passed as 'in' it *will not*, however, D does not enforce this contract. IMO it needs to.
 , and if you may modify the contents returned
 from

 char[] func2();

Without 'const' you can. With 'const' you cannot.
 Today, defensive .dup-ing seems to be the only way to be safe.

True.

What do you think of the changes I have described? Somewhat strangely (for this NG) I have posted this idea several times and gotten little or no comment from anyone. Either I'm talking complete garbage or everyone (silently) agrees.

Regan
Jun 24 2005
parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
Hi,

| I know. The solution as I see it is to have both a compile time and
| runtime readonly flag for all variables and to enforce const and 'in' as
| being readonly.

Sounds like the C++ const, type modifier. You keep a flag for every 
variable, comes down to every variable having either a "const" or 
non-constant flag..

And aren't these flags only needed at compile time? The prototype of each 
variable or function argument determines the state of the flag, whether it's 
const or not. Exactly what's done in C++.

I really like 'const' in C++, and use it extensively in my APIs, to prevent 
users from doing stuff they shouldn't do. Or, I don't want them to do : )

L. 
Jun 24 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 24 Jun 2005 17:39:30 +0300, Lionello Lunesu  
<lio lunesu.removethis.com> wrote:
 | I know. The solution as I see it is to have both a compile time and
 | runtime readonly flag for all variables and to enforce const and 'in' as
 | being readonly.

 Sounds like the C++ const, type modifier. You keep a flag for every
 variable, comes down to every variable having either a "const" or
 non-constant flag..

Probably. I've never used const or read anything describing how you'd implement it in a compiler :)
 And aren't these flags only needed at compile time? The prototype of each
 variable or function argument determines the state of the flag, whether  
 it's const or not. Exactly what's done in C++.

My 'runtime' suggestion is really a "just in case it can't all be done at compile time" suggestion.
 I really like 'const' in C++, and use it extensively in my API's, to  
 prevent
 users from doing stuff they shouldn't do. Or, I don't want them to do : )

I've never used it before; it just seems like something we *need* in order to carry out "copy-on-write" efficiently. Dup'ing everything based on guesswork is inefficient and feels like "voodoo"-style programming to me.

Regan
Jun 26 2005