digitalmars.D - V2 string

reply Derek Parnell <derek psych.ward> writes:
I'm converting Bud to compile using V2 and so far it's been a very hard
thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
over the place, which is exactly what I thought would happen. Bud does a
lot of text manipulation so having 'string' as invariant means that calls
to functions that return string need to often be .dup'ed because I need to
assign the result to a malleable variable. 

I might have to rethink the design of the application to avoid the
performance hit of all these dups.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 04 2007
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far its been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 
 
 I might have to rethink of the design of the application to avoid the
 performance hit of all these dups.
 

First of all, if you were returning string literals as char[] and trying to manipulate them, they'd fail on linux at run time (because string literals are put into read only segments). Second, you can use char[] instead of string.
Jul 04 2007
parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far its been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 
 
 I might have to rethink of the design of the application to avoid the
 performance hit of all these dups.
 

First of all, if you were returning string literals as char[] and trying to manipulate them, they'd fail on linux at run time (because string literals are put into read only segments).

But I'm not, and never have been, returning string literals anywhere.
 Second, you can use char[] instead of string.

The idiom I'm using is that functions that receive text have those parameters as 'string' to guard against the function inadvertently modifying that which is passed, and functions that return text return 'string' to guard against calling functions inadvertently modifying data that they did not create (own).

This leads to constructs like ...

    char[] result;

    result = SomeTextFunc(data).dup;

Another commonly used idiom that I had to stop using was ...

    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string

I now code ...

        text.length = 0; // Reset to an empty string

which is slightly less readable.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 04 2007
next sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 05 Jul 2007 02:23:11 +0300, Derek Parnell <derek psych.ward> wrote:

 This leads to constructs like ...

    char[] result;

    result = SomeTextFunc(data).dup;

Is SomeTextFunc allocating a copy of the string which it is returning? If it is, then there's no reason why it should return a "string" type. If it isn't, then modifying the data in the returned char[] could have unforeseen consequences.
 Another commonly used idiom that I had to stop using was ...

    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string

Since empty string literals don't really point to data, I'd suggest that empty string and array literals shouldn't be const/invariant in favor of the above example. It breaks some consistency, but "a foolish consistency is the hobgoblin of little minds" ;)

-- 
Best regards,
 Vladimir
 mailto:thecybershadow gmail.com
Jul 04 2007
parent Derek Parnell <derek nomail.afraid.org> writes:
On Thu, 05 Jul 2007 04:44:41 +0300, Vladimir Panteleev wrote:

 On Thu, 05 Jul 2007 02:23:11 +0300, Derek Parnell <derek psych.ward> wrote:
 
 This leads to constructs like ...

    char[] result;

    result = SomeTextFunc(data).dup;

Is SomeTextFunc allocating a copy of the string which it is returning? If it is, then there's no reason why it should return a "string" type. If it isn't, then modifying the data in the returned char[] could have unforeseen consequences.

Yes, I realize this and I'm not saying it's doing the wrong thing, and actually I'm not even complaining. I'm just letting people know some of the observations I've had in moving to v2.

In this case, someone has to copy the resulting data - either the function that created it or the routine that called the function. If the called function does the duplication, it could be a waste if the calling function is not going to further modify it; that is why I elected to pass a 'const' reference to the new data. The calling function can then decide if it needs a copy (to modify it) or not.

    string result;
    result = SomeTextFunc(data); // no need to dup if I'm not changing it.

I've got a set of aliases to help me ...

    alias char[] text;
    alias wchar[] wtext;
    alias dchar[] dtext;

so now I see 'text' as mutable and 'string' as immutable.
 Another commonly used idiom that I had to stop using was ...

    char[] txt;
    txt = getvalue();
    if (wrongvalue(txt))
        txt = ""; // Reset to an empty string

Since empty string literals don't really point to data, I'd suggest that empty string and array literals shouldn't be const/invariant in favor of the above example. It breaks some consistency, but "a foolish consistency is the hobgoblin of little minds" ;)

Nice idea, but I can't see it happening because of the inconsistency angle. Instead I've decided to use the idiom ...

    text txt;
    txt = getvalue();
    if (wrongvalue(txt))
        txt = text.init; // Reset to an empty string

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
5/07/2007 3:52:27 PM
Jul 04 2007
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertantly
 modifying that which is passed, and functions that return text return
 'string' to guard against calling functions inadvertantly modifying data
 that they did not create (own).
 
 This leads to constructs like ...
 
    char[] result;
 
    result = SomeTextFunc(data).dup;

If you're needing to guard against inadvertent modification, that's just what const strings are for. I'm not understanding the issue here.
 Another commonly used idiom that I had to stop using was ...
 
    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string
 
 I now code ...
 
        text.length = 0; // Reset to an empty string
 
 which is slightly less readable.

This should do it nicely:

    text = null;
Jul 05 2007
next sibling parent reply Regan Heath <regan netmail.co.nz> writes:
Walter Bright Wrote:
 Derek Parnell wrote:
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertantly
 modifying that which is passed, and functions that return text return
 'string' to guard against calling functions inadvertantly modifying data
 that they did not create (own).
 
 This leads to constructs like ...
 
    char[] result;
 
    result = SomeTextFunc(data).dup;

If you're needing to guard against inadvertent modification, that's just what const strings are for. I'm not understanding the issue here.
 Another commonly used idiom that I had to stop using was ...
 
    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string
 
 I now code ...
 
        text.length = 0; // Reset to an empty string
 
 which is slightly less readable.

 This should do it nicely:

    text = null;

Aaargh! You're confusing empty and non-existent (null) again! <g>

In some cases there is an important difference between the two. In this case maybe not, I don't really know.

Regan
Jul 05 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Regan Heath wrote:
 Aaargh!  You're confusing empty and non-existant (null) again!  <g>

In this case, no.
 In some cases there is an important difference between the two.

The only case is when you're extending into a preallocated buffer. Such cannot be the case with string literals.
Jul 05 2007
parent reply James Dennett <jdennett acm.org> writes:
Walter Bright wrote:
 Regan Heath wrote:
 Aaargh!  You're confusing empty and non-existant (null) again!  <g>

In this case, no.

But a way of emptying something was asked for, and you showed a way to make it null, not empty -- can you explain your "In this case, no"?
 In some cases there is an important difference between the two.

The only case is when you're extending into a preallocated buffer.

I've found many times when the difference between an empty string and no string was important; they generally have nothing to do with extending at all. I'd be interested to know why you assert that no such cases exist.

-- James
Jul 05 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

I'd like to know of such cases.
Jul 05 2007
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:

 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

I'd like to know of such cases.

    char[] Option;
    Option = getOptionFromUser();
    if (Option.ptr is null)
    {
        Option = DefaultOption;
    }

However, if the user sets the option to "" then that is what they want and not the default one.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 05 2007
next sibling parent reply Derek Parnell <derek nomail.afraid.org> writes:
On Fri, 6 Jul 2007 14:23:43 +1000, Derek Parnell wrote:

 On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:
 
 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

I'd like to know of such cases.

    char[] Option;
    Option = getOptionFromUser();
    if (Option.ptr is null)
    {
        Option = DefaultOption;
    }

 However, if the user sets the option to "" then that is what they want and not the default one.

And if you must nitpick that one can code this a different way then here is another example.

Let's say that there is this library routine, which is closed source and I don't have access to its source, that accepts a string as its argument. Furthermore, if that passed string is null the routine uses a default value - whatever that is because I don't know it. Now in my code I call it with ...

    SomeFunc("");   -- Use an empty string to do its magic
    SomeFunc(null); -- But this time, use the default value

Remember, I have no control over the SomeFunc routine's implementation.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
6/07/2007 2:54:45 PM
Jul 05 2007
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Derek Parnell wrote:
 On Fri, 6 Jul 2007 14:23:43 +1000, Derek Parnell wrote:
 
 On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:

 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.


    Option = getOptionFromUser();
    if (Option.ptr is null)
    {
        Option = DefaultOption;
    }

 However, if the user sets the option to "" then that is what they want and not the default one.

 And if you must nitpick that one can code this a different way then here is another example.

 Let's say that there is this library routine, which is closed source and I don't have access to its source, that accepts a string as its argument. Furthermore, if that passed string is null the routine uses a default value - whatever that is because I don't know it. Now in my code I call it with ...

    SomeFunc("");   -- Use an empty string to do its magic
    SomeFunc(null); -- But this time, use the default value

 Remember, I have no control over the SomeFunc routine's implementation.

In databases NULL being different from empty seems to be a big deal too.

Anyway googling for "null versus empty" turns up a bevy of hits, so from that I think we can presume that the distinction is important to a non-empty subset of programmers.

--bb
Jul 05 2007
parent Sean Kelly <sean f4.ca> writes:
Bill Baxter wrote:
 
 Anyway googling for "null versus empty" turns up a bevy of hits, so from 
 that I think we can presume that the distinction is important to a 
 non-empty subset of programmers.

Either that or it's important to a non-null set of programmers. ;-)

Sean
Jul 06 2007
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 Let's say that there is this library routine, which is closed source and I
 don't have access to its source, that accepts a string as its argument.
 Further more, if that passed string is null the routine uses a default
 value - whatever that is because I don't know it. Now in my code I call it
 with ...
 
    SomeFunc("");   -- Use an empty string to do its magic
    SomeFunc(null); -- But this time, use the default value
 
 Remember, I have no control over the SomeFunc routine's implementation.

Of course, if a function is documented to behave that way, and you have no control over it, you must adhere to its documentation. There are other ways to do default arguments. I suspect we could argue about it like we could argue about tab stops, and never reach any sort of resolution <g>.
Jul 06 2007
next sibling parent Regan Heath <regan netmail.co.nz> writes:
Walter Bright wrote:
 Derek Parnell wrote:
 Let's say that there is this library routine, which is closed source 
 and I
 don't have access to its source, that accepts a string as its argument.
 Further more, if that passed string is null the routine uses a default
 value - whatever that is because I don't know it. Now in my code I 
 call it
 with ...

    SomeFunc("");   -- Use an empty string to do its magic
    SomeFunc(null); -- But this time, use the default value

 Remember, I have no control over the SomeFunc routine's implementation.

Of course, if a function is documented to behave that way, and you have no control over it, you must adhere to its documentation. There are other ways to do default arguments. I suspect we could argue about it like we could argue about tab stops, and never reach any sort of resolution <g>.

The first argument which I think holds water is that it is trivial to represent empty and non-existent in C, eg.

    char *empty = "";
    char *non_existent = NULL;

The other argument is the one made earlier about databases. In a database, empty and non-existent are important distinct states a value could have.

Currently, D can model these but it worries me that you don't seem to think that it's important. So, perhaps in future you might decide to get rid of this, or do so accidentally.

Regan
Jul 06 2007
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Walter Bright wrote:
 Derek Parnell wrote:
 Let's say that there is this library routine, which is closed source 
 and I
 don't have access to its source, that accepts a string as its argument.
 Further more, if that passed string is null the routine uses a default
 value - whatever that is because I don't know it. Now in my code I 
 call it
 with ...

    SomeFunc("");   -- Use an empty string to do its magic
    SomeFunc(null); -- But this time, use the default value

 Remember, I have no control over the SomeFunc routine's implementation.

Of course, if a function is documented to behave that way, and you have no control over it, you must adhere to its documentation. There are other ways to do default arguments. I suspect we could argue about it like we could argue about tab stops, and never reach any sort of resolution <g>.

Uh, unlike tab stops, I think it is widely recognized by the developer community that it is useful to have a distinction between *valid* and *invalid* values of something. Why is there a NAN for floats (and in D NAN is the default value for floats)? What if NAN was equal to zero?

Didn't you yourself, Walter, say once that if there was a way to have an actual invalid value for ints (without sacrificing precision) you would like to have that, and you would place it as the default value for int, instead of -1 (which is a valid int)?

So why shouldn't arrays (which are already reference types) have a value that means "invalid array", especially if we can get that for free (unlike ints)?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 07 2007
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Derek Parnell, on July 6 at 14:23 you wrote to me:
 On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:
 
 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

I'd like to know of such cases.

    char[] Option;
    Option = getOptionFromUser();
    if (Option.ptr is null)
    {
        Option = DefaultOption;
    }

 However, if the user sets the option to "" then that is what they want and not the default one.

Basically it is the same issue as NULL and NOT NULL in SQL...

-- 
LUCA - Leandro Lucarella - Using Debian GNU/Linux Sid - GNU Generation
------------------------------------------------------------------------
E-Mail / JID: luca lugmen.org.ar
GPG Fingerprint: D9E1 4545 0F4B 7928 E82C 375D 4B02 0FE0 B08B 4FB2
GPG Key: gpg --keyserver pks.lugmen.org.ar --recv-keys B08B4FB2
------------------------------------------------------------------------
I know you are looking at me, but I would swear that, in those dark eyes of yours, there is a sensitive Indian who thinks: "How amazing that this white fellow is trying to communicate with me, an inferior being on the scale of homo sapiens". That is why, dear Indian, I cannot stop looking at you as if you were a shitty guinea pig that I can step on whenever I like.
	-- Ricardo Vaporeso. Letter to the Aborigines, ed. Gredos, Barcelona, 1912, page 102.
Jul 06 2007
prev sibling next sibling parent James Dennett <jdennett acm.org> writes:
Walter Bright wrote:
 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

I'd like to know of such cases.

Any time you need a difference between "specified, and known to be empty" and "unspecified or unknown", which is very common. The alternative is to carry a boolean around to say whether the string is in use.

Others have raised the case of null meaning "use default" (but let's not spend too much time on that specific case), and the fact that the database world often (though not always) distinguishes null from empty. Many people have found good reason to do this.

The "Maybe" or "Fallible" type constructors used in other languages also cover cases where "absent" can usefully be handled separately from "empty" (in more general cases than just strings).

-- James
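
Editorial sketch: the "Maybe" idea mentioned above can be approximated in current D with `std.typecons.Nullable`, which carries the "is it set?" flag alongside the value. This is only an illustration under that assumption, not something proposed in the thread; `chooseOption` and the `"default"` value are hypothetical names.

```d
import std.typecons : Nullable;

// Hypothetical helper: fall back to defaultOption only when nothing
// was specified; an explicitly empty string is kept as-is.
string chooseOption(Nullable!string fromUser, string defaultOption)
{
    return fromUser.isNull ? defaultOption : fromUser.get;
}

void main()
{
    Nullable!string unset;                    // "unspecified or unknown"
    auto emptyChoice = Nullable!string("");   // "specified, and known to be empty"

    assert(chooseOption(unset, "default") == "default");
    assert(chooseOption(emptyChoice, "default") == "");
}
```

The wrapper makes the two states distinct in the type rather than relying on whether the array pointer happens to be null.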
Jul 06 2007
prev sibling parent Serg Kovrov <kovrov bugmenot.com> writes:
Walter Bright wrote:
 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

I'd like to know of such cases.

I'm used to this pattern:

    void foo(char[] bar=null)
    {
        if (bar is null)
            m_bar = "default_value";
        else
            m_bar = bar; // even if it's empty
    }

often as a one-liner:

    m_bar = (bar is null) ? "default_value" : bar;

This is the one I use most, but of course there are more.

-- 
serg.
Jul 07 2007
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 05 Jul 2007 00:42:25 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertantly
 modifying that which is passed, and functions that return text return
 'string' to guard against calling functions inadvertantly modifying data
 that they did not create (own).
 
 This leads to constructs like ...
 
    char[] result;
 
    result = SomeTextFunc(data).dup;

If you're needing to guard against inadvertent modification, that's just what const strings are for. I'm not understanding the issue here.

There is no issue. I'm not raising an issue. I'm just making some observations about my experience so far in moving to V2.

I'm not surprised by the effort that I'm having. I expected it. Why? Because I knew that most of the strings I work with are text (mutable things), and using the D 'string', an immutable thing, for function signatures was going to mean I'd have to change things to suit. I chose to use 'string' to safeguard myself from making stupid errors in coding. And it's working.

My next pass through the application code will be to find places where I can safely return a 'text' thing instead of a 'string' thing, which is a performance tuning exercise.
 Another commonly used idiom that I had to stop using was ...
 
    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string
 
 I now code ...
 
        text.length = 0; // Reset to an empty string
 
 which is slightly less readable.

This should do it nicely: text = null;

Not really. I want an empty text and not a non-text. Also, it doesn't fit right with other data types - the consistency thing again.

    text = typeof(text).init;

works better for me because I can also use this construct in templates without problems.

But really, this thread can die now. I didn't mean to go off into weird tangential subjects.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 05 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 This should do it nicely:

 	text = null;

Not really. I want an empty text and not a non-text.

Such a distinction is critical in C code, but is not of much use in D code. What do you need the distinction for?
 Also, it doesn't fit
 right with other data types - the consistency thing again.
 
    text = typeof(text).init; 
 
 works better for me because I can also use this construct in templates
 without problems.

The .init for char[] is null, not "".
 But really, this thread can die now. I didn't mean to go off into weird
 tangental subects.

I think you've raised a couple of very important stylistic issues, and it is worth pursuing.
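
Editorial sketch: the empty-versus-null distinction argued over in this subthread can be observed directly. The code below assumes a current D compiler (where `string` is `immutable(char)[]`); `isUnset` is a hypothetical helper, not a Phobos function.

```d
// Hypothetical helper: identity test against the null array.
bool isUnset(const(char)[] s)
{
    return s is null;   // true only for the null array, not for ""
}

void main()
{
    string empty = "";   // literal: zero length, but a non-null pointer
    string unset = null; // no array at all

    assert((char[]).init is null); // the .init of char[] is null
    assert(!isUnset(empty));
    assert(isUnset(unset));
    assert(empty == unset);        // == compares contents, so both look empty
    assert(empty.length == 0 && unset.length == 0);
}
```

So `==` and `.length` cannot tell the two apart, while `is null` can, which is why the choice between `text = ""` and `text = null` matters to code that tests identity.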
Jul 05 2007
prev sibling next sibling parent Regan Heath <regan netmail.co.nz> writes:
Derek Parnell Wrote:
 On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:
 
 Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far its been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 
 
 I might have to rethink of the design of the application to avoid the
 performance hit of all these dups.
 

First of all, if you were returning string literals as char[] and trying to manipulate them, they'd fail on linux at run time (because string literals are put into read only segments).

But I'm not, and never have been, returning string literals anywhere.
 Second, you can use char[] instead of string.

The idiom I'm using is that functions that receive text have those parameters as 'string' to guard against the function inadvertantly modifying that which is passed

Yep, makes sense.
 , and functions that return text return
 'string' to guard against calling functions inadvertantly modifying data
 that they did not create (own).

Question; Do these functions keep a copy of the returned string? Or, to re-phrase, after returning the string do they still 'own' it, or have they washed their hands of it? Are they in a sense passing ownership to the calling function perhaps?

If they no longer 'own' the string then they can return it as a char[] instead of string and all your problems are solved, right?

I imagine that if they return a slice of the input string, and that string was 'string' not char[], then they would also return string (because doing otherwise would be claiming ownership of the input string and giving it away to the caller, which may not be valid).

Maybe you have a lot of functions returning slices to the input string? Maybe you need to template them? i.e.

    T func(T)(T param) { }

so if you pass string you get string, if you pass char[] you get char[].

Maybe all string routines which return slices of the input should be so templated?

Regan
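
Editorial sketch of the templating idea above, assuming a current D compiler. `firstWord` is a hypothetical function invented for illustration; it returns a slice of its input, so the result has the same mutability as the argument.

```d
// Hypothetical example: T is deduced from the argument, so the
// returned slice is a string for string input, char[] for char[] input.
T firstWord(T)(T text)
{
    foreach (i, c; text)
        if (c == ' ')
            return text[0 .. i]; // slicing keeps the qualifiers of T
    return text;
}

void main()
{
    string fixed = "hello world";
    char[] editable = "hello world".dup;

    string a = firstWord(fixed);    // string in, string out
    char[] b = firstWord(editable); // char[] in, char[] out
    b[0] = 'H';                     // allowed: b is a slice of editable

    assert(a == "hello");
    assert(editable == "Hello world");
}
```

The caller who passed mutable data gets mutable data back without a `.dup`, and the caller who passed a `string` cannot modify it through the result. Current D also offers the `inout` qualifier for this purpose.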
Jul 05 2007
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Derek Parnell wrote:
 On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:
 
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertantly
 modifying that which is passed, and functions that return text return
 'string' to guard against calling functions inadvertantly modifying data
 that they did not create (own).
 
 This leads to constructs like ...
 
    char[] result;
 
    result = SomeTextFunc(data).dup;
 
 Another commonly used idiom that I had to stop using was ...
 
    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string
 
 I now code ...
 
        text.length = 0; // Reset to an empty string
 
 which is slightly less readable.
 

Why is 'text.length = 0;' or 'text = text.init;' better than the idiom:

    str = "".dup;

, which also works for any kind of string, not just empty strings?

I found however, that there is a bug with that code:
http://d.puremagic.com/issues/show_bug.cgi?id=1314

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 05 2007
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far its been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 

So just use char[] instead of 'string'. I don't plan to use the aliases much either.

Sean
Jul 05 2007
parent reply Derek Parnell <derek nomail.afraid.org> writes:
On Thu, 05 Jul 2007 00:15:41 -0700, Sean Kelly wrote:

 Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far its been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 

So just use char[] instead of 'string'. I don't plan to use the aliases much either.

It's not so clear cut. Firstly, a lot of phobos routines now return 'string' results and expect 'string' inputs. Secondly, I like the idea of general purpose functions returning 'const' data, because it helps guard against inadvertent modifications by the calling routines. It is up to the calling function to explicitly decide if it is going to modify returned stuff or not.

For example, if I know that I'll not need to modify the 'fullpath' then I might do this ...

    string fullpath;

    fullpath = CanonicalPath(shortname);

However, if I might need to update it ...

    char[] fullpath;

    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }

The point is that the 'CanonicalPath' function hasn't got a clue what the calling function is intending to do with the result so it is trying to be responsible by guarding it against mistakes by the caller.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
5/07/2007 5:17:33 PM
Jul 05 2007
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 However, if I might need to update it ...
 
    char[] fullpath;
 
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }
 
 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

If you write it like this:

    string fullpath;

    fullpath = CanonicalPath(shortname);
    version(Windows)
    {
       fullpath = std.string.tolower(fullpath);
    }

you won't need to do the .dup .
Jul 05 2007
next sibling parent reply Regan Heath <regan netmail.co.nz> writes:
Walter Bright Wrote:
 Derek Parnell wrote:
 However, if I might need to update it ...
 
    char[] fullpath;
 
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }
 
 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

 If you write it like this:

    string fullpath;

    fullpath = CanonicalPath(shortname);
    version(Windows)
    {
       fullpath = std.string.tolower(fullpath);
    }

 you won't need to do the .dup .

Because tolower does it for you, but it still returns string and if for example you need to add something to the end of the path, like a filename, you will end up doing yet another dup somewhere.

I think the solution may be to template all functions which return the input string, or part of the input string, eg.

    T tolower(T)(T input) { }

That way if you call it with char[] you get a char[] back, if you call it with string you get a string back.

However... tolower is an interesting case. As a caller I expect it to modify the string, or perhaps give a modified copy back (both options are valid and should perhaps be supported?).

So, the 'string tolower(string)' version has 2 cases: the first case where it doesn't need to modify the input and can simply return it, no problem. But in case 2, where it does modify it, it should dup and return char[]. My reasoning being that after it has completed and returned the copy, the caller now 'owns' the string (as it's the only copy in existence and no-one else has a reference to it).

To achieve that we'd need to overload on return type, or something clever... but then, how do we call it?

    auto s = tolower(input);

tolower cannot be selected at compile time, and the type of s cannot be known either, so that's an impossible situation, yes?

Regan
Jul 05 2007
next sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Regan Heath wrote:
 Walter Bright Wrote:
 Derek Parnell wrote:
 However, if I might need to update it ...

    char[] fullpath;

    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }

 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

    string fullpath;

    fullpath = CanonicalPath(shortname);
    version(Windows)
    {
        fullpath = std.string.tolower(fullpath);
    }

you won't need to do the .dup.

Because tolower does it for you, but it still returns string and if for example you need to add something to the end of the path, like a filename you will end up doing yet another dup somewhere. I think the solution may be to template all functions which return the input string, or part of the input string, eg. T tolower(T)(T input) { } That way if you call it with char[] you get a char[] back, if you call it with string you get a string back. However... tolower is an interesting case. As a caller I expect it to modify the string, or perhaps give a modified copy back (both options are valid and should perhaps be supported?). So, the 'string tolower(string)' version has 2 cases, the first case where it doesn't need to modify the input and can simply return it, no problem. But case 2, where it does modify it should dup and return char[]. My reasoning being that after it has completed and returned the copy, the caller now 'owns' the string (as it's the only copy in existance and no-one else has a reference to it).

Indeed, I think this illustrates that some standard library functions may not have the correct signature, and tolower is likely one of them. The most general case for tolower is:

    char[] tolower(const(char)[] s);

Since tolower creates a new array, but does not keep it, it can give away its ownership of the array (i.e., return a mutable).

The second case, more specific, is simply syntactic sugar for making that array invariant:

    invariant(char)[] tolowerinv(const(char)[] str) {
      return cast(invariant) tolower(str);
    }

The current signature:

    const(char)[] tolower(const(char)[] str)

is kinda incorrect, because it returns a const reference for an array that has no mutable references, and that is the same as an invariant reference, so tolower might as well return invariant(char)[].
 To achieve that we'd need to overload on return type, or something clever... 
but then, how do we call it?
 
 auto s = tolower(input);
 
 tolower cannot be selected at compile time, and the type of s cannot be known
either, so that's an impossible situation, yes?
 
 Regan

The 'something clever' to distinguish both cases is simply naming two different functions, like tolower or tolowerinv (if the second function is needed at all).

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 05 2007
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Bruno Medeiros wrote:
 Regan Heath wrote:
 tolower is an interesting case.  As a caller I expect it to modify the 
 string, or perhaps give a modified copy back (both options are valid 
 and should perhaps be supported?).

 So, the 'string tolower(string)' version has 2 cases, the first case 
 where it doesn't need to modify the input and can simply return it, no 
 problem. 
 But case 2, where it does modify it should dup and return char[].  My 
 reasoning being that after it has completed and returned the copy, the 
 caller now 'owns' the string (as it's the only copy in existance and 
 no-one else has a reference to it).

Indeed, I think this illustrates that some standard library functions may not have the correct signature, and I tolower is likely one of them. The most general case for tolower is: char[] tolower(const(char)[] s); Since tolower creates a new array, but does not keep it, it can give away it's ownership of the the array (ie, return a mutable).

Sorry, but you seem to have missed a bit above: if the string doesn't contain any uppercase characters tolower returns the input without .dup-ing it (aka copy-on-write).
 The second case, more specific, is simply syntactic sugar for making 
 that array invariant:
 
   invariant(char)[] tolowerinv(const(char)[] str) {
     return cast(invariant) tolower(str);
   }

Yes, but only if it actually needs to modify the string. You seem to have missed that the two cases can't (in general) be distinguished at compile time; it's only at run time when a choice is made between a copy and no copy.
 The current signature:
   const(char)[] tolower(const(char)[] str)
 is kinda incorrect, because it returns a const reference for an array 
 that has no mutable references, and that is the same as an invariant 
 reference, so tolower might as well return invariant(char)[].

Again, that only holds if a copy was actually made at run time. If no copy was made the original input is returned, to which there may be mutable references.
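
A minimal sketch of the aliasing problem described above, written in present-day D, where `invariant` is spelled `immutable`. `cowLower` is a hypothetical ASCII-only stand-in for the copy-on-write behaviour being discussed, not the actual Phobos routine.

```d
import std.stdio;

// Hypothetical ASCII-only copy-on-write lowercase (an assumption for
// illustration, not the Phobos implementation).
const(char)[] cowLower(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            char[] r = s.dup;               // first upper-case char: copy once
            foreach (ref ch; r[i .. $])
                if (ch >= 'A' && ch <= 'Z')
                    ch = cast(char)(ch + ('a' - 'A'));
            return r;
        }
    }
    return s;                               // nothing to change: return the input itself
}

void main()
{
    char[] buf = "test".dup;                // mutable and already lower case
    auto low = cowLower(buf);               // so no copy is made
    writeln(low.ptr is buf.ptr);            // true: the result aliases the input
    buf[0] = 'T';                           // mutate through the original reference
    writeln(low);                           // Test: the "result" changed as well
}
```

Because the no-copy path hands back the caller's own array, casting the result to an invariant array would only be sound on the path where a copy was made, and that is a run-time distinction.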
Jul 05 2007
parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Frits van Bommel wrote:
 Bruno Medeiros wrote:
 Regan Heath wrote:
 tolower is an interesting case.  As a caller I expect it to modify 
 the string, or perhaps give a modified copy back (both options are 
 valid and should perhaps be supported?).

 So, the 'string tolower(string)' version has 2 cases, the first case 
 where it doesn't need to modify the input and can simply return it, 
 no problem. But case 2, where it does modify it should dup and return 
 char[].  My reasoning being that after it has completed and returned 
 the copy, the caller now 'owns' the string (as it's the only copy in 
 existance and no-one else has a reference to it).

Indeed, I think this illustrates that some standard library functions may not have the correct signature, and I tolower is likely one of them. The most general case for tolower is: char[] tolower(const(char)[] s); Since tolower creates a new array, but does not keep it, it can give away it's ownership of the the array (ie, return a mutable).

Sorry, but you seem to have missed a bit above: if the string doesn't contain any uppercase characters tolower returns the input without .dup-ing it (aka copy-on-write).

Oops, sorry, that's right, I missed that part about tolower not modifying the string if it wasn't necessary. :(
 The second case, more specific, is simply syntactic sugar for making 
 that array invariant:

   invariant(char)[] tolowerinv(const(char)[] str) {
     return cast(invariant) tolower(str);
   }

Yes, but only if it actually needs to modify the string. You seem to have missed that the two cases can't (in general) be distinguished at compile time; it's only at run time when a choice is made between a copy and no copy.
 The current signature:
   const(char)[] tolower(const(char)[] str)
 is kinda incorrect, because it returns a const reference for an array 
 that has no mutable references, and that is the same as an invariant 
 reference, so tolower might as well return invariant(char)[].

Again, that only holds if a copy was actually made at run time. If no copy was made the original input is returned, to which there may be mutable references.

You're right, if a copy is not made *every* time (which is the case after all), then the above doesn't hold. But then, what I think is happening is that Phobos' current tolower is suboptimal in terms of usefulness, because we don't know whether a new copy is made or not. I'm wondering now what would be the more useful form, or forms, of tolower (and similar functions) to have.

Now that I think of it again (admittedly I haven't got much experience with string manipulation in C++ or D), perhaps the best form is an in-place mutable version:

    char[] tolower(char[] str);

And it's this one after all that is the most general form. If you want to call tolower on a const or invariant array you dup it yourself on the call:

    char[] str = tolower("FOO".dup);

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
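
A rough sketch of the in-place form proposed above, in present-day D. `lowerInPlace` is a hypothetical name and handles ASCII only; it is meant to show the shape of the signature, not to replace a library function.

```d
import std.stdio;

// Hypothetical in-place ASCII lowercase with the proposed signature:
// it never allocates, and returns the same array it was given.
char[] lowerInPlace(char[] str)
{
    foreach (ref c; str)
        if (c >= 'A' && c <= 'Z')
            c = cast(char)(c + ('a' - 'A'));
    return str;
}

void main()
{
    char[] owned = "FOO".dup;
    auto same = lowerInPlace(owned);
    writeln(same.ptr is owned.ptr);       // true: modified in place

    string lit = "BAR";
    char[] copy = lowerInPlace(lit.dup);  // the caller decides to copy
    writeln(lit, " ", copy);              // BAR bar
}
```

With this shape the decision to copy sits with the caller, who always knows whether the data is owned.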
Jul 05 2007
parent reply Regan Heath <regan netmail.co.nz> writes:
Bruno Medeiros wrote:
 The current signature:
   const(char)[] tolower(const(char)[] str)
 is kinda incorrect, because it returns a const reference for an array 
 that has no mutable references, and that is the same as an invariant 
 reference, so tolower might as well return invariant(char)[].

Again, that only holds if a copy was actually made at run time. If no copy was made the original input is returned, to which there may be mutable references.

You're right, if a copy is not made *every* time (which is the case after all), then the above doesn't hold. But then, what I think is happening is that Phobo's current tolower is suboptimal in terms of usefulness, because the fact that we don't know if a new copy is made or not. I'm wondering now what would be the more useful form, or forms, of tolower (and similar functions) to have. Now that I think of it again (admittedly I haven't got much experience with string manipulation in C++ or D, though), but perhaps the best form is an in-place mutable version: char[] tolower(char[] str); And it's this one after all that is the most general form. If you want to call tolower on a const or invariant array you dup it yourself on the call: char[] str = tolower("FOO".dup);

True.. but it's unfortunate that the most efficient case, where no duplication is needed, is no longer possible :( If we template the function, eg. T tolower(T)(T input) { } and we have some way to check whether the input is const or not (at runtime is(string) or something?) perhaps we can code the existing efficient solution (no dup of const data) as well as the general case where it mutates. In the mutate case it can dup if the input is const and not dup if it isn't (adding an efficient solution which doesn't currently exist). The only problem is that the case where you pass const data and it has to dup, you get back a const reference to a piece of data with no other owner (meaning it doesn't need to be const) which might cause another dup in your code at a later point. Regan
Jul 06 2007
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Regan Heath wrote:
 Bruno Medeiros wrote:
 The current signature:
   const(char)[] tolower(const(char)[] str)
 is kinda incorrect, because it returns a const reference for an 
 array that has no mutable references, and that is the same as an 
 invariant reference, so tolower might as well return invariant(char)[].

Again, that only holds if a copy was actually made at run time. If no copy was made the original input is returned, to which there may be mutable references.

You're right, if a copy is not made *every* time (which is the case after all), then the above doesn't hold. But then, what I think is happening is that Phobo's current tolower is suboptimal in terms of usefulness, because the fact that we don't know if a new copy is made or not. I'm wondering now what would be the more useful form, or forms, of tolower (and similar functions) to have. Now that I think of it again (admittedly I haven't got much experience with string manipulation in C++ or D, though), but perhaps the best form is an in-place mutable version: char[] tolower(char[] str); And it's this one after all that is the most general form. If you want to call tolower on a const or invariant array you dup it yourself on the call: char[] str = tolower("FOO".dup);

True.. but it's unfortunate that the most efficient case, where no duplication is needed, is no longer possible :(

Algorithms should care about worst-case performance, or average-case performance. That most efficient "case", where a string is already in lower case, is a minority case in most applications, and is never a worst-case scenario. So why bother? Also, writing tolower like that would give other performance problems like these:
 The only problem is that the case where you pass const data and it has 
 to dup, you get back a const reference to a piece of data with no other 
 owner (meaning it doesn't need to be const) which might cause another 
 dup in your code at a later point.
 
 Regan

Indeed, with such a scenario, you would end up with worse performance overall.

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 07 2007
prev sibling parent Regan Heath <regan netmail.co.nz> writes:
Bruno Medeiros wrote:
 The 'something clever' to distinguish both cases is simply naming two 
 different functions, like tolower or tolowerinv (if the second function 
 is needed at all).

I was hoping for something clever'er ;) Regan
Jul 05 2007
prev sibling next sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Regan Heath wrote:
 Walter Bright Wrote:
 Derek Parnell wrote:
 However, if I might need to update it ...

    char[] fullpath;

    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }

 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

    string fullpath;

    fullpath = CanonicalPath(shortname);
    version(Windows)
    {
        fullpath = std.string.tolower(fullpath);
    }

you won't need to do the .dup.

Because tolower does it for you, but it still returns string and if for example you need to add something to the end of the path, like a filename you will end up doing yet another dup somewhere. I think the solution may be to template all functions which return the input string, or part of the input string, eg. T tolower(T)(T input) { } That way if you call it with char[] you get a char[] back, if you call it with string you get a string back.

It doesn't make sense to template it, because you'd still have two different function versions that would work differently. The one that receives a string does a dup, the one that receives a char[] does not dup. The return type of tolower(string str) might also be char[] and not string, if tolower(string str) always did a dup, even when no character modifications are necessary.

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 05 2007
parent Regan Heath <regan netmail.co.nz> writes:
Bruno Medeiros wrote:
 It doesn't make sense to template it, because you'd still have two 
 different function versions, that would work differently. The one that 
 receives a string does a dup, the one that receives a char[] does not 
 dup. The return type of tolower(string str) might also be char[] and not 
 string, if tolower(string str) would allways does a dup, even if no 
 character modifications are necessary.

If the template is

    T tolower(T)(T input) {}

then you have

    string tolower(string input) {}
    char[] tolower(char[] input) {}

and your cases are:

1. input string, output same string (no dup)
2. input string, output string (dup)
3. input char[], output same char[] (no dup)

Case #2 is admittedly not ideal because it may cause a later dup in your code. But case #1 handles the efficient no-modification case of string, and case #3 handles both modification and non-modification without any call to dup.

I think the above is better than the current implementation as it avoids a dup in case #3.

Regan
Jul 06 2007
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Regan Heath wrote:
 Walter Bright Wrote:
 string fullpath;

 fullpath = CanonicalPath(shortname);
 version(Windows)
 {
        fullpath = std.string.tolower(fullpath);
 }

 you won't need to do the .dup .

Because tolower does it for you, but it still returns string

tolower only dups the string if it needs to. It won't dup a string that is already in lower case.
 and if for example
 you need to add something to the end of the path, like a filename you 
 will end up
 doing yet another dup somewhere.

Concatenating strings does not require a .dup.
Jul 05 2007
next sibling parent Regan Heath <regan netmail.co.nz> writes:
Walter Bright wrote:
 Regan Heath wrote:
 Walter Bright Wrote:
 string fullpath;

 fullpath = CanonicalPath(shortname);
 version(Windows)
 {
        fullpath = std.string.tolower(fullpath);
 }

 you won't need to do the .dup .

Because tolower does it for you, but it still returns string

tolower only dups the string if it needs to. It won't dup a string that is already in lower case.

Sure, but there is still a case where it does dup. (dup #1)
  > and if for example
  > you need to add something to the end of the path, like a filename you 
  > will end up
  > doing yet another dup somewhere.
 
 Concatenating strings does not require a .dup.

opCatAssign does. (dup #2)

OR

    newString = constString ~ bitToAdd;

(is a copy of constString to newString, which is essentially a dup) (dup #2)

So, the worst case scenario is that 2 dups are done. Further, if the input is char[] you can still get this worst case scenario because tolower returns string instead of char[].

With a templated version you get a much more efficient tolower for char[].

Regan
Jul 06 2007
prev sibling parent Regan Heath <regan netmail.co.nz> writes:
Proof of concept.

Only duplicate when the input is 'string', allowing for more efficient 
handling of char[] parameters and allowing callers to pass a mutable 
char[] parameter, receive the result as a mutable char[] and avoid 
future dup calls on the returned data.

Output:
sStringM: 0x  416080 becomes 0x  880FD0 DUP
sCharM  : 0x  880FE0 becomes 0x  880FE0 SAME
sString : 0x  416110 becomes 0x  416110 SAME
sChar   : 0x  880FC0 becomes 0x  880FC0 SAME


Code:
# /*
#  * Common Public License Version 1.0
#  * http://www.opensource.org/licenses/cpl1.0.php
#  */
# import std.stdio;
# import std.uni;
# import std.utf;
#
# void main()
# {
# 	string sStringM = "tEsT";
# 	char[] sCharM = sStringM.dup;
# 	string rStringM = .tolower(sStringM);
# 	char[] rCharM = .tolower(sCharM);
# 	
# 	writefln("sStringM: 0x%08x becomes 0x%08x %s", sStringM.ptr, rStringM.ptr, (sStringM.ptr!=rStringM.ptr)?"DUP":"SAME");
# 	writefln("sCharM  : 0x%08x becomes 0x%08x %s", sCharM.ptr, rCharM.ptr, (sCharM.ptr!=rCharM.ptr)?"DUP":"SAME");
#
# 	string sString = "test";
# 	char[] sChar = sString.dup;
# 	string rString = .tolower(sString);
# 	char[] rChar = .tolower(sChar);
#
# 	writefln("sString : 0x%08x becomes 0x%08x %s", sString.ptr, rString.ptr, (sString.ptr!=rString.ptr)?"DUP":"SAME");
# 	writefln("sChar   : 0x%08x becomes 0x%08x %s", sChar.ptr, rChar.ptr, (sChar.ptr!=rChar.ptr)?"DUP":"SAME");
# }
#
# T tolower(T)(T s)
# {
#     bool changed;
#     char[] r;
#
#     if (is(typeof(s) == char[]))
#     {
#     	changed = true;
#     	r = cast(char[])s;
#     }
#
#     for (size_t i = 0; i < s.length; i++)
#     {
# 	auto c = s[i];
# 	if ('A' <= c && c <= 'Z')
# 	{
# 	    if (!changed)
# 	    {
# 		r = s.dup;
# 		changed = true;
# 	    }
# 	    r[i] = cast(char) (c + (cast(char)'a' - 'A'));
# 	}
# 	else if (c >= 0x7F)
# 	{
# 	    foreach(size_t j, dchar dc; s[i .. length])
# 	    {
# 		if (std.uni.isUniUpper(dc))
# 		{
# 		    dc = std.uni.toUniLower(dc);
# 		    if (!changed)
# 		    {
# 			r = s[0 .. i + j].dup;
# 			changed = true;
# 		    }
# 		}
# 		if (changed)
# 		{
# 		    if (r.length != i + j)
# 			r = r[0 .. i + j];
# 		    std.utf.encode(r, dc);
# 		}
# 	    }
# 	    break;
# 	}
#     }
#     return changed ? r : s;
# }
Jul 06 2007
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 05 Jul 2007 01:06:45 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 However, if I might need to update it ...
 
    char[] fullpath;
 
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }
 
 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

If you write it like this:

    string fullpath;

    fullpath = CanonicalPath(shortname);
    version(Windows)
    {
        fullpath = std.string.tolower(fullpath);
    }

you won't need to do the .dup.

If you have any failing, Walter, it's your ability to focus on insignificant minutiae as a form of distraction from the point that people are really trying to make. I was not talking about how to do efficient lower case conversion.

I'll make my code example more free from assumed functionality.

  char[] qwerty;

  qwerty = KJHGF(poiuy).dup;
  version(xyzzy)
  {
      MNBVC(qwerty);
  }

As you can see, my point is made without regard to converting stuff to lower case.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 05 2007
next sibling parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Derek Parnell wrote:

 I'll make my code example more free from assumed functionality.
 
 
  char[] qwerty;
  
  qwerty = KJHGF(poiuy).dup;
  version(xyzzy)
  {
      MNBVC(qwerty);
  }
 
 As you can see, my point is made without regard to converting stuff to
 lower case.

What you are doing there is mixing two styles of functions. Functional (KJHGF) and in-place modifying functions (MNBVC). Walter's modification was making both use a common style (functional). Mixing those two function styles will naturally require different types of constness. -- Oskar
Jul 05 2007
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 I'll make my code example more free from assumed functionality.
 
 
  char[] qwerty;
  
  qwerty = KJHGF(poiuy).dup;
  version(xyzzy)
  {
      MNBVC(qwerty);
  }
 
 As you can see, my point is made without regard to converting stuff to
 lower case.

My point is that the way the snippet is written is inside out. Do not use .dup to preemptively make a copy in case it gets changed somewhere later on. The style is to make a .dup *only if* the contents will be changed and do the .dup *at the site* of the modification. In other words, dups should be done from the bottom up, not from the top down.

I think such a style helps fit things together nicely and avoids strange .dups appearing in inexplicable places.
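
As a sketch of what "dup at the site of the modification" can look like in present-day D: `forwardSlashes` below is a hypothetical helper (not from Bud or Phobos) that keeps `string` in its signature and copies only when it finds something to change. `assumeUnique` is the `std.exception` helper for turning a freshly built, unshared `char[]` into a `string` without a second copy.

```d
import std.exception : assumeUnique;
import std.stdio;

// Hypothetical helper: the copy happens here, where the change is made,
// and only if a backslash is actually present.
string forwardSlashes(string path)
{
    foreach (i, c; path)
    {
        if (c == '\\')
        {
            char[] r = path.dup;            // the one and only dup
            foreach (ref ch; r[i .. $])
                if (ch == '\\')
                    ch = '/';
            return assumeUnique(r);         // no other reference to r exists
        }
    }
    return path;                            // unchanged: no copy at all
}

void main()
{
    string a = "src/app/main.d";
    string b = `src\app\main.d`;
    writeln(forwardSlashes(a).ptr is a.ptr);  // true
    writeln(forwardSlashes(b));               // src/app/main.d
}
```

The caller never writes .dup; it holds a `string` before and after the call.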
Jul 05 2007
next sibling parent Derek Parnell <derek psych.ward> writes:
On Thu, 05 Jul 2007 11:51:30 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 I'll make my code example more free from assumed functionality.
 
  char[] qwerty;
  
  qwerty = KJHGF(poiuy).dup;
  version(xyzzy)
  {
      MNBVC(qwerty);
  }
 
 As you can see, my point is made without regard to converting stuff to
 lower case.

My point is that the way the snippet is written is inside out. Do not use .dup to preemptively make a copy in case it gets changed somewhere later one. The style is to make a .dup *only if* the contents will be changed and do the .dup *at the site* of the modification. In other words, dups should be done from the bottom up, not from the top down. I think such a style helps fit things together nicely and avoids strange .dups appearing in inexplicable places.

Thanks. This is what I meant by rethinking the design of my routines. I'll strongly consider your suggestion even though it does complicate the algorithm for readers of the code.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 05 2007
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to Walter,

 Derek Parnell wrote:
 
 I'll make my code example more free from assumed functionality.
 
 char[] qwerty;
 
 qwerty = KJHGF(poiuy).dup;
 version(xyzzy)
 {
 MNBVC(qwerty);
 }
 As you can see, my point is made without regard to converting stuff
 to lower case.
 

My point is that the way the snippet is written is inside out. Do not use .dup to preemptively make a copy in case it gets changed somewhere later on. The style is to make a .dup *only if* the contents will be changed and do the .dup *at the site* of the modification. In other words, dups should be done from the bottom up, not from the top down. I think such a style helps fit things together nicely and avoids strange .dups appearing in inexplicable places.

The one issue I can see with this is where an input is const but may be changed (and .duped) at any of a number of points. The data, though, only needs to be .duped once.

|char[] Whatever(const char[] str)
|{
| if(c1) str = Mod1(str.dup);
| if(c2) str = Mod2(str.dup);
| if(c3) str = Mod3(str.dup);
| return str;
|}
// causes excess duping

I can't think of a better solution than this (and this is BAD):

|char[] Whatever(const char[] str)
|{
| sw: switch(-1)
| {
|  foreach(bool b; T!(true, false))
|  {
|   if(c1) {static if(b){str = str.dup; goto case 1;} else {case 1: str = Mod1(str.dup);}}
|   if(c2) {static if(b){str = str.dup; goto case 2;} else {case 2: str = Mod2(str.dup);}}
|   if(c3) {static if(b){str = str.dup; goto case 3;} else {case 3: str = Mod3(str.dup);}}
|   return str;
|  }
| }
|}
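
One possible way to get "dup at most once" without the switch trick is to remember whether a private copy already exists. This is only a sketch in present-day D: `mod1`, `mod2` and `mod3` are hypothetical in-place steps (the Mod functions above return a value instead), and the conditions are passed in as parameters for the sake of a self-contained example.

```d
import std.stdio;

// Hypothetical in-place steps standing in for Mod1/Mod2/Mod3.
void mod1(char[] s) { foreach (ref c; s) if (c == '\\') c = '/'; }
void mod2(char[] s) { foreach (ref c; s) if (c >= 'A' && c <= 'Z') c = cast(char)(c + ('a' - 'A')); }
void mod3(char[] s) { foreach (ref c; s) if (c == ' ') c = '_'; }

char[] whatever(const(char)[] str, bool c1, bool c2, bool c3)
{
    char[] buf;
    bool owned = false;

    // Nested helper: copies the input the first time it is called and
    // hands back the same private buffer on every later call.
    char[] own()
    {
        if (!owned)
        {
            buf = str.dup;
            owned = true;
        }
        return buf;
    }

    if (c1) mod1(own());
    if (c2) mod2(own());
    if (c3) mod3(own());
    return own();       // a mutable result needs one copy even if nothing ran
}

void main()
{
    writeln(whatever(`C:\My Docs`, true, true, true));    // c:/my_docs
    writeln(whatever(`C:\My Docs`, false, false, false)); // C:\My Docs
}
```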
Jul 05 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
BCS wrote:
 The one issue I can see with this is where an input is const but may be 
 changed (and .duped) at any of a number of points. The data though only 
 needs to be .duped once.
 
 |char[] Whatever(const char[] str)
 |{
 | if(c1) str = Mod1(str.dup);
 | if(c2) str = Mod2(str.dup);
 | if(c3) str = Mod3(str.dup);
 | return str;
 |}
 // causes exces duping

My experience with this is:

1) Such cases are unusual.

2) In the few cases where they do happen, they are not in that 5% of the code that is a bottleneck.

3) If such code is performance critical, there's usually a better way to write it that will yield even better performance than taking repeated passes over the same string. Best performance usually comes by merging all the operations into one pass.
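
To illustrate point 3, here is a sketch of merging several per-character operations into one pass with one allocation. The three transformations (backslash to slash, ASCII upper to lower, space to underscore) are made up for the example and are not taken from the thread.

```d
import std.stdio;

// Hypothetical example: three independent character rewrites merged into
// a single pass that writes into one freshly allocated buffer.
char[] normalizeOnePass(const(char)[] str)
{
    auto r = new char[str.length];
    foreach (i, c; str)
    {
        if (c == '\\')
            r[i] = '/';
        else if (c == ' ')
            r[i] = '_';
        else if (c >= 'A' && c <= 'Z')
            r[i] = cast(char)(c + ('a' - 'A'));
        else
            r[i] = c;
    }
    return r;
}

void main()
{
    writeln(normalizeOnePass(`C:\My Docs`));  // c:/my_docs
}
```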
Jul 05 2007
prev sibling parent Sean Kelly <sean f4.ca> writes:
Derek Parnell wrote:
 On Thu, 05 Jul 2007 00:15:41 -0700, Sean Kelly wrote:
 
 Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far its been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 

much either.

It's not so clear cut. Firstly, a lot of phobos routines now return 'string' results and expect 'string' inputs.

I'd argue that the parameters should be "const char[]" rather than "string", and it's hard to say for the return values.
 Secondly, I like the idea of
 general purpose functions returning 'const' data, because it helps guard
 against inadvertent modifications by the calling routines. It is up to the
 calling function to explicitly decide if it is going to modify returned
 stuff or not.
 
 For example, if I know that I'll not need to modify the 'fullpath' then I
 might do this ...
 
    string fullpath;
 
    fullpath = CanonicalPath(shortname);

I would say that whether the return value is const/invariant indicates ownership. If the called function/class owns the data then it is const or invariant. If it does not then it is not const/invariant. This seems to largely limit "string" as a return value to property methods.
 However, if I might need to update it ...
 
    char[] fullpath;
 
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }
 
 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

Right. See above. Sean
Jul 05 2007
prev sibling parent reply "Kristian Kilpi" <kjkilpi gmail.com> writes:
On Thu, 05 Jul 2007 01:18:28 +0300, Derek Parnell <derek psych.ward> wrote:
 I'm converting Bud to compile using V2 and so far its been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable.

 I might have to rethink of the design of the application to avoid the
 performance hit of all these dups.

That got me thinking about string functions in general.

First, I am wondering why some functions are formed as follows (but I'm sure someone will (hopefully) enlighten me about that ;) ):

   string foo(string bar);

That is, if they return something else than 'bar' (they do some string manipulation). Shouldn't they return char[] instead? For example:

   char[] foo(string bar) { return bar ~ "blah"; }

And this brings us to the 'tolower()' function (for instance). Sometimes it .dups and sometimes it doesn't. So, if I don't know if the input string contains upper cased chars, I have to .dup the return value, even if it may already have been .dupped by 'tolower()'...

   char[] a = "abc".dup;
   char[] b = tolower(a).dup;  //.dupped once ('tolower()' returns plain 'a')

   char[] a = "ABC".dup;
   char[] b = tolower(a).dup;  //.dupped twice!

So 'tolower()' is a hybrid of two function groups: (1) functions that modify the input string, (2) functions that return a (modified) copy of the input string. (If the input string doesn't contain upper cased chars it behaves like (1) (even if it doesn't actually modify the input string), otherwise it behaves like (2).)

I don't think this is a good thing. There should be two different functions, one for each group:

   char[] tolower(char[] str);  //modifies and returns 'str'

   char[] getlower(string str);  //returns a copy

If one likes the copy-on-write behaviour of 'tolower()', I think it would work only by using reference counting. For example (the 'String' class uses reference counting):

   String a, b;

   a = "abc";
   b = tolower(a);  //'b' points to 'a' ('tolower()' simply returns 'a')
   b[0] = 'x';  //'b' .dups its contents before modification, so 'a' is not changed
Jul 05 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Kristian Kilpi wrote:
 First, I am wondering why some functions are formed as follows:
 (but I'm sure someone will (hopefully) enlight me about that ;) )
 
   string foo(string bar);
 
 That is, if they return something else than 'bar' (they do some string 
 manipulation).
 Shouldn't they return char[] instead?

No, because then they must always dup the string. If they don't need to dup the string, they can return a reference to the parameter, and if so, it must be const.
 There should be two different functions, one for each group:
 
   char[] tolower(char[] str);  //modifies and returns 'str'
 
   char[] getlower(string str);  //returns a copy

When one would use a mutating tolower, one is already manipulating the contents of a string character by character. In such cases, one can tolower the characters in that process, instead of doing it later (the former will be more efficient anyway, and the only advantage to a mutating tolower is an efficiency improvement). Using the functional-style copy-on-write string functions will result in easy to understand, less buggy programs. Doing strings in this manner is a proven success in just about every programming language.
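
The first point above, that a function which may hand back its own parameter cannot have a mutable return type, follows from the implicit conversion rules. A small sketch in present-day D, where `immutable` is the current spelling of `invariant` and `string` is `immutable(char)[]`; `maybeCopy` is a hypothetical function used only to show the return type.

```d
// Implicit conversions between the three array flavours, checked at compile time.
static assert(!__traits(compiles, { string s = "x"; char[] m = s; }));         // string -> char[]: rejected
static assert( __traits(compiles, { string s = "x"; const(char)[] c = s; }));  // string -> const: accepted
static assert( __traits(compiles, { char[] m; const(char)[] c = m; }));        // char[] -> const: accepted
static assert(!__traits(compiles, { char[] m; string s = m; }));               // char[] -> string: rejected

// Hypothetical function that sometimes returns its parameter and sometimes
// a fresh copy: const(char)[] accepts both, char[] would force a dup.
const(char)[] maybeCopy(const(char)[] bar, bool copy)
{
    return copy ? bar.dup : bar;
}

void main()
{
    string lit = "abc";
    char[] buf = lit.dup;
    assert(maybeCopy(lit, false).ptr is lit.ptr);   // parameter returned as-is
    assert(maybeCopy(buf, true).ptr !is buf.ptr);   // fresh copy
}
```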
Jul 05 2007
parent "Kristian Kilpi" <kjkilpi gmail.com> writes:
On Thu, 05 Jul 2007 22:11:37 +0300, Walter Bright <newshound1 digitalmars.com> wrote:
 Kristian Kilpi wrote:
 First, I am wondering why some functions are formed as follows:
 (but I'm sure someone will (hopefully) enlight me about that ;) )
    string foo(string bar);
 That is, if they return something else than 'bar' (they do some string
 manipulation).
 Shouldn't they return char[] instead?

 No, because then they must always dup the string. If they don't need to
 dup the string, they can return a reference to the parameter, and if so,
 it must be const.

 There should be two different functions, one for each group:
    char[] tolower(char[] str);  //modifies and returns 'str'
    char[] getlower(string str);  //returns a copy

 When one would use a mutating tolower, one is already manipulating the
 contents of a string character by character. In such cases, one can
 tolower the characters in that process, instead of doing it later (the
 former will be more efficient anyway, and the only advantage to a
 mutating tolower is an efficiency improvement).

That makes sense (especially with strings). Of course, as said, it's not a perfect solution because unnecessary .dupping can occur. For example:

   s = "blah " ~ foo(tolower(str).dup);

'foo()' modifies its input string and returns it. If 'foo' were a copy-on-write function, you could just do:

   s = "blah " ~ foo(tolower(str));

That's much nicer, but 'str' could be copied twice in both the cases above. If both 'foo()' and 'tolower()' modified 'str', no copying would have been done (by these functions).

Well, it's just how you like to code and build things. Both ways have their own pros and cons.
Jul 06 2007