digitalmars.D.learn - peculiarities with char[] and std.string

Kyle K <Kyle_member pathlink.com> writes:
Greetings.

I was poking around the std.string lib, and was wondering if someone could
answer a few questions about it. I'm relatively new to D, so I'm sure there are
pretty obvious answers.

I notice that most of the functions, like toStringz() and tolower(), implement
the copy-on-write convention... but since the default function parameter is in,
is there not already an implicit copy of the data being made? For example,

import std.stdio;

int main()
{
    char[] str, str2;
    str = "foo";
    str2 = bob(str);
    writefln("%s:%s", str, str2);  // should print "foo:keke"
    return 0;
}

char[] bob(in char[] str)
{
    str = "keke";
    return str;
}

Works fine with my copy of DMD. Is this behavior not to be relied on as you
shouldn't ever touch memory you didn't allocate (according to the FAQ)?


Also, why is the following the case:

printf("%s", "hello\0"); // Fails with access violation
printf("%s", cast(char *)"hello\0"); // OK

Is the implicit casting from char[] to char * doing something I'm not aware of in
terms of the length of the string, like chopping off the \0?

My last question is which is the preferred method of making a copy of a string?
Suppose I want str2 to be a copy of str, then:

str2.length = str.length;
str2[] = str;
//      These two equivalent?
str2 = str.dup;

Sorry for all the questions and thanks for the help, let me know if this info is
somewhere obvious.. I wasn't able to find it in the spec.

Regards
Kyle K.
Jun 19 2006
xs0 <xs0 xs0.com> writes:
Kyle K wrote:
 Greetings.
 
 I was poking around the std.string lib, and was wondering if someone could
 answer a few questions about it. I'm relatively new to D, so I'm sure there are
 pretty obvious answers.
 
 I notice that most of the functions, like toStringz() and tolower(), implement
 the copy-on-write convention... but since the default function parameter is in,
 is there not already an implicit copy of the data being made? 
No, just a copy of the _reference_ is made, but both point to the same data.
 For example,
 
 import std.stdio;
 int main()
 {
 char []str, str2;
 str="foo";
 str2= bob(str);
 writefln("%s:%s", str, str2);  // should print "foo:keke"
 return 0;
 }
 char []bob(in char[] str)
 {
 str = "keke"; 
 return str;
 }
 
 Works fine with my copy of DMD. Is this behavior not to be relied on as you
 shouldn't ever touch memory you didn't allocate (according to the FAQ)?
Well, you didn't touch the memory you didn't allocate :) If you had

char[] bob(in char[] str)
{
     str[0] = 'a';
     return str;
}

You'd get "aoo:aoo" as output (or a crash, as you can't write into
constants on some platforms)
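
A minimal sketch of the two cases, assuming a current D2 compiler rather than the 2006 DMD used in this thread. Under that assumption the literals are copied into mutable arrays with .dup and the `in` qualifier is dropped, since newer compilers treat `in` parameters as const. The names rebind and mutate are made up for illustration:

```d
import std.stdio;

// Rebinding the parameter only changes the callee's own copy of the slice
// (pointer + length); the caller's array is left alone.
char[] rebind(char[] s)
{
    s = "keke".dup;
    return s;
}

// Writing through the slice changes the shared data, so the caller sees it.
char[] mutate(char[] s)
{
    s[0] = 'a';
    return s;
}

void main()
{
    char[] str = "foo".dup;

    char[] a = rebind(str);
    writefln("%s:%s", str, a);   // foo:keke

    char[] b = mutate(str);
    writefln("%s:%s", str, b);   // aoo:aoo
}
```
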
 Also, why is the following the case:
 
 printf("%s", "hello\0"); // Fails with access violation
 printf("%s", cast(char *)"hello\0"); // OK
 
 Is the implicit casting from char[] to char * doing something I'm not aware of
 in terms of the length of the string, like chopping off the \0?
"hello\0" is a D char[] array, which is composed of length + char*. printf
doesn't know about D arrays, so it takes the length to be the pointer to data,
which fails for obvious reasons. When you cast it to char*, you lose the
length, keep the pointer, and it works. I think you should use something like

printf("%.*s", "hello"); // no zero needed/wanted in this case..

Better yet, use writef/ln instead - it knows all about D's types..
 My last question is which is the preferred method of making a copy of a string?
 Suppose I want str2 to be a copy of str, then:
 
 str2.length = str.length;
 str2[] = str;
 //      These two equivalent?
 str2 = str.dup;
Generally, .dup is/could/should be faster, as it's obvious you want a copy, so
there's no need to initialize the destination array on resizing, for example.

Hope that helped :)

xs0
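
The two approaches from the question can be compared side by side. This is only a sketch, assuming a current D2 compiler; copyManual and copyDup are hypothetical helper names, not anything from std.string:

```d
import std.stdio;

// Resize first (which default-initializes the new elements), then copy over them.
char[] copyManual(const(char)[] src)
{
    char[] dst;
    dst.length = src.length;
    dst[] = src[];
    return dst;
}

// Allocate and copy in one step.
char[] copyDup(const(char)[] src)
{
    return src.dup;
}

void main()
{
    char[] str = "foo".dup;
    char[] a = copyManual(str);
    char[] b = copyDup(str);

    assert(a == str && b == str);                    // same contents
    assert(a.ptr !is str.ptr && b.ptr !is str.ptr);  // separate storage
    writeln(a, " ", b);
}
```

Both give an independent copy with the same contents; the difference xs0 describes is only in the work done to get there.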
Jun 19 2006
Kyle K <Kyle_member pathlink.com> writes:
In article <e76aq8$qsr$1 digitaldaemon.com>, xs0 says...
Well, you didn't touch the memory you didn't allocate :) If you had

char[] bob(in char[] str)
{
     str[0] = 'a';
     return str;
}

You'd get "aoo:aoo" as output (or a crash, as you can't write into 
constants on some platforms)
Ah ok, that makes sense. So using 'in' with arrays and aggregate types will always still give you a reference? I assume with primitives the semantics remain pass-by-value, such that foo(in int b) will never modify the caller's data?
Hope that helped :)
It did, thanks a lot! :D
Jun 19 2006
BCS <BCS pathlink.com> writes:
Kyle K wrote:
 In article <e76aq8$qsr$1 digitaldaemon.com>, xs0 says...
 
Well, you didn't touch the memory you didn't allocate :) If you had

char[] bob(in char[] str)
{
    str[0] = 'a';
    return str;
}

You'd get "aoo:aoo" as output (or a crash, as you can't write into 
constants on some platforms)
Ah ok, that makes sense. So using 'in' with arrays and aggregate types will always still give you a reference? I assume with primitives the semantics remain pass-by-value, such that foo(in int b) will never modify the caller's data?
Actually "in" always gives you a copy of the actual "thing". Arrays are
reference types so you get a copy of the reference. Same with objects, as they
are also reference types. Structs on the other hand are not reference types and
as such will get passed by value.

import std.stdio;

class fooC{int i;}
struct fooS{int i;}

void main()
{
    fooC c1 = new fooC, c2;
    c1.i = 0;
    c2 = fn(c1);
    writef(c1.i, " ", c2.i, \n);   // prints "1 1"

    fooS s1, s2;
    s1.i = 0;
    s2 = fn(s1);
    writef(s1.i, " ", s2.i, \n);   // prints "0 1"
}

fooC fn(in fooC v) { v.i=1; return v; }
fooS fn(in fooS v) { v.i=1; return v; }
Jun 19 2006
Kyle K <Kyle_member pathlink.com> writes:
In article <e76jri$1ds7$1 digitaldaemon.com>, BCS says...

 Ah ok, that makes sense. So using 'in' with arrays and aggregate types will
 always still give you a reference? I assume with primitives the semantics
 remain pass-by-value, such that foo(in int b) will never modify the caller's
 data?
 
 
Actually "in" always gives you a copy of the actual "thing". Arrays are
reference types so you get a copy of the reference. Same with objects, as they
are also reference types. Structs on the other hand are not reference types and
as such will get passed by value.

class fooC{int i;}
struct fooS{int i;}

void main()
{
    fooC c1 = new fooC, c2;
    c1.i = 0;
    c2 = fn(c1);
    writef(c1.i, " ", c2.i, \n);   // prints "1 1"

    fooS s1, s2;
    s1.i = 0;
    s2 = fn(s1);
    writef(s1.i, " ", s2.i, \n);   // prints "0 1"
}

fooC fn(in fooC v) { v.i=1; return v; }
fooS fn(in fooS v) { v.i=1; return v; }
Got it, thanks a bunch. I knew it had to be something simple... :D
Jun 19 2006