digitalmars.D - Newbie Question about strings

reply hellcatv hotmail.com writes:
does the following result in undefined behavior (as if I had realloc'd the
char * inp in C)?
i.e. could the inp[0]='A' also affect the char[] s?


import std.string;

void mod (char [] inp) {
    inp ~= "8";      // append: may reallocate inp's storage
    inp[0] = 'A';    // does this write reach the caller's s?
    printf ("\n%s ", std.string.toStringz(inp));
    printf ("%d ", inp.length);
}

int main () {
    char [] s = "1234567";
    printf ("%s\n", std.string.toStringz(s));
    mod(s);
    printf ("%s\n", std.string.toStringz(s));
    return 0;
}
May 10 2004
parent reply "Ben Hinkle" <bhinkle4 juno.com> writes:
The "~=" operator will reallocate if there isn't space already, so after the
append inp may or may not still share memory with s. That is why std.string
uses "copy-on-write" semantics - meaning if you don't "own" an array you make
a copy before changing it.
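
A minimal sketch of that advice, written for a present-day D compiler (an assumption: the 2004 compiler discussed in this thread accepted char[] s = "1234567"; directly, whereas current D needs the .dup). The names mod and modCopy are only illustrative. Whether mod's write shows up in the caller depends on whether ~= had to reallocate, so the sketch deliberately makes no claim about that output.

```d
import std.stdio;

// Same shape as mod() in the question: append, then write to element 0.
void mod(char[] inp)
{
    inp ~= "8";     // may or may not reallocate; inp is a local slice (pointer + length)
    inp[0] = 'A';   // reaches the caller's data only if no reallocation happened
    writeln(inp, " ", inp.length);
}

// Hypothetical variant that follows the "copy before changing" rule.
void modCopy(const(char)[] inp)
{
    char[] mine = inp.dup;  // private copy: the caller's data is never touched
    mine ~= "8";
    mine[0] = 'A';
    writeln(mine, " ", mine.length);
}

void main()
{
    char[] s = "1234567".dup;

    modCopy(s);
    assert(s == "1234567");  // guaranteed: modCopy worked on its own copy

    mod(s);
    assert(s.length == 7);   // the caller's length never changes...
    writeln(s);              // ...but s[0] may now be '1' or 'A'
}
```

Either way the behaviour is defined; the only open question is whether the callee's write is visible to the caller, which is exactly what the explicit copy removes.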

May 10 2004
parent reply Sean Kelly <sean f4.ca> writes:
Ben Hinkle wrote:

 The "~=" operator will reallocate if there isn't space already. That is why
 the std.string uses "copy-on-write" semantics - meaning if you don't "own"
 an array you make a copy before changing it.

But D is a GC language. Would there even be a dangling reference in this case? I assumed that this would just result in a side-effect.

Sean
May 10 2004
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message
news:c7osgt$10f4$1 digitaldaemon.com...
 Ben Hinkle wrote:

  The "~=" operator will reallocate if there isn't space already. That is why
  the std.string uses "copy-on-write" semantics - meaning if you don't "own"
  an array you make a copy before changing it.

 But D is a GC language. Would there even be a dangling reference in this case? I assumed that this would just result in a side-effect.

umm, I'm not sure what the GC has to do with it, but yeah, the GC will collect the copy if all the references go away. COW is to prevent side-effects.
May 10 2004
next sibling parent reply Daniel Horn <hellcatv hotmail.com> writes:
right the docs say "you" but I wasn't sure if it means I must do it or 
by modifying it, the lib does a copy-on-write.

so I must specifically make a copy of it in order to guarantee that my 
function will not result in side effects?

could I make a wrapper struct that guaranteed it would copy when passed 
into a function (like C++ strings)?  in C I could wrap a static array 
into a struct in order to get pass-by-value semantics (of course the 
size of this array was known) and in C++ I could make a copy constructor 
that would implicitly get called when I passed the string into a wrapper 
function.

is there anything similar in D for char[] arrays?

May 10 2004
parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Daniel Horn wrote:

 right the docs say "you" but I wasn't sure if it means I must do it or
 by modifying it, the lib does a copy-on-write.

copy-on-write is not enforced by the compiler but it is a technique used by std.string (and probably the rest of phobos). If you look at std.string.tolower, for example, you will see how it delays making a copy until it absolutely has to. I'm not sure which part of the doc you are looking at.

 so I must specifically make a copy of it in order to guarantee that my
 function will not result in side effects?

If you write a statement like str[3] = 'a'; then you should think about using COW. If you want to guarantee your function has no side effect then you should make a copy. If you write str = tolower(str); in your function then you don't have to make a copy since tolower uses COW already and it will make a copy if it needs to.

 could I make a wrapper struct that guaranteed it would copy when passed
 into a function (like C++ strings)?  in C I could wrap a static array
 into a struct in order to get pass-by-value semantics (of course the
 size of this array was known) and in C++ I could make a copy constructor
 that would implicitly get called when I passed the string into a wrapper
 function.

 is there anything similar in D for char[] arrays?

I suppose you could wrap the array in a struct that overloads opIndex assignment and do something funky, but I haven't really thought about it. Seems like a lot of trouble to avoid using strings.
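
For what it's worth, here is a rough sketch of such a wrapper in present-day D, which is an assumption on two counts: it relies on the struct postblit this(this), a feature that came to the language after this thread, and the names OwnedString and shout are made up for illustration. The idea is simply that every by-value copy of the struct duplicates the buffer, much like a C++ copy constructor would.

```d
// Hypothetical wrapper: every copy of the struct gets its own buffer.
struct OwnedString
{
    char[] data;

    this(const(char)[] s) { data = s.dup; }

    // Postblit: runs on the new copy whenever the struct is copied,
    // for example when it is passed to a function by value.
    this(this) { data = data.dup; }
}

// Hypothetical callee: free to modify its parameter, since it owns the copy.
size_t shout(OwnedString s)
{
    s.data[0] = 'A';
    s.data ~= "!";
    return s.data.length;
}

void main()
{
    auto s = OwnedString("1234567");
    assert(shout(s) == 8);         // the callee saw "A234567!"
    assert(s.data == "1234567");   // the caller's string is unchanged
}
```

The price is a copy on every pass-by-value, needed or not, which is the cost copy-on-write is meant to avoid.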
May 10 2004
next sibling parent reply hellcatv hotmail.com writes:
you have some good points
but what is a good way for a new person (or someone reading someone else's code)
to know if the function is exhibiting copy on write

if C++ had the same feature for strings then I would assume that a const string
would not be modified and a non const string would be...

can I assume that all phobos string functions that need to modify their input
perform copy on write, then?  it's a potential pitfall for new programmers that
the opCatAssign operator ~= only sometimes copies while the tolower function
copies on write

perhaps this just needs to be mentioned carefully in the
documentation...preferably in a consistent manner

I also noticed that after

    char [] blah="1234567";
    char [] bleh=blah;
    bleh~="";
    bleh[0]='A';

blah[0] is still '1'
perhaps ~= also guarantees copy-on-write semantics? :-)
that would make phobos a quite consistent library then


May 10 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
hellcatv hotmail.com wrote:

 you have some good points
 but what is a good way for a new person (or someone reading someone else's
 code) to know if the function is exhibiting copy on write

You should always assume that it may do it, unless it is explicitly documented as doing something "in place". You should be careful not to expect that a routine is guaranteed to make a copy. Like the tolower example (as I understand from Ben's post): it will make a copy if there were any uppercase letters in the original. Otherwise, there is no reason to do so, and it will just return a reference to the original string. If you want to make sure to have a unique copy, you have to call .dup yourself. I hope the compiler is intelligent enough to detect and drop unnecessary .dups.

 I also noticed that
 char [] blah="1234567";
 char [] bleh=blah;
 bleh~="";
 bleh[0]='A';
 blah[0] is still '1'

 perhaps ~= also guarantees copy-on-write semantics? :-)
 that would make phobos a quite consistent library then

The implementation is likely to do so, but currently, the language spec does not guarantee it. Actually, in your example, bleh~="" might be optimized away completely. (At least, that's how I understand the specs.)
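
To make the tolower behaviour described above concrete, here is a minimal sketch of the delayed-copy technique in present-day D. It is an illustration under assumptions, not the actual Phobos source: lowerCow is a made-up name and it handles ASCII letters only.

```d
// Hypothetical ASCII-only helper illustrating copy-on-write;
// not the real std.string implementation.
const(char)[] lowerCow(const(char)[] s)
{
    char[] copy = null;                 // allocated only if something must change
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            if (copy is null)
                copy = s.dup;           // first change: take a private copy
            copy[i] = cast(char)(c + ('a' - 'A'));
        }
    }
    return copy is null ? s : copy;     // untouched input is returned as-is
}

void main()
{
    char[] a = "abc".dup;
    assert(lowerCow(a).ptr is a.ptr);   // nothing to change: same memory, no copy

    char[] b = "AbC".dup;
    auto r = lowerCow(b);
    assert(r == "abc");                 // result is lowercased...
    assert(b == "AbC");                 // ...and the caller's data is untouched
}
```

A caller who needs a unique, writable result regardless of the input still has to call .dup on it, as noted above.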
May 10 2004
parent "Walter" <newshound digitalmars.com> writes:
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c7psqp$2f7r$1 digitaldaemon.com...
 You should always assume that it may do it, unless it is explicitly
 documented as doing something "in place". You should be careful not to expect
 that a routine is guaranteed to make a copy. Like the tolower example (as I
 understand from Ben's post): it will make a copy if there were any
 uppercase letters in the original. Otherwise, there is no reason to do so,
 and it will just return a reference to the original string.

COW coupled with gc enables D string processing programs to smoke C++ std::string ones in the performance department. The combination of the two enables one to do things like mix slices, static data, and gc'd data without worrying about which is which. C++ std::string has to worry, and the implementations I've looked at resolve the problem by always copying.
May 11 2004
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Ben Hinkle wrote:
 
 copy-on-write is not enforced by the compiler but it is a technique used by
 std.string (and probably the rest of phobos). If you look at
 std.string.tolower, for example, you will see how it delays making a copy
 until it absolutely has to. I'm not sure which part of the doc you are
 looking at.

COW is great in many cases but it can be a nightmare with multithreaded programming. It almost makes me wish that we could specify the behavior with a template parameter.

Sean
May 10 2004
parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Sean Kelly wrote:

 COW is great in many cases but it can be a nightmare with multithreaded
 programming.  It almost makes me wish that we could specify the behavior
 with a template parameter.

Why is that? If you know that no other part of the program may have a reference to some string, then you may write to it. Otherwise, you just have to copy the string first. I see no difference whether the "other part" is a local variable in the same routine or some part in another thread. Of course, if the reference itself is shared between threads, you have to lock it before writing anything, but that is the same with any variable.
May 10 2004
prev sibling parent Sean Kelly <sean f4.ca> writes:
Ben Hinkle wrote:
 "Sean Kelly" <sean f4.ca> wrote in message
 news:c7osgt$10f4$1 digitaldaemon.com...
  But D is a GC language.  Would there even be a dangling reference in
  this case?  I assumed that this would just result in a side-effect.

 umm, I'm not sure what the GC has to do with it, but yeah, the GC will collect the copy if all the references go away. COW is to prevent side-effects.

By GC I meant that the string is effectively passed by reference, so a reallocation would not leave the passed variable pointing to bad memory as may happen in C using pointers. I just wanted to clarify the semantics that the result is not "undefined" but rather merely that the function has a side-effect.

Sean
May 10 2004