digitalmars.D - V2 string

Derek Parnell (12/12) Jul 04 2007 I'm converting Bud to compile using V2 and so far it's been a very hard

Walter Bright (5/15) Jul 04 2007 First of all, if you were returning string literals as char[] and trying...

Derek Parnell (23/39) Jul 04 2007 But I'm not, and never have been, returning string literals anywhere.

Vladimir Panteleev (13/21) Jul 04 2007 Is SomeTextFunc allocating a copy of the string which it is returning? I...

Derek Parnell (28/52) Jul 04 2007 Yes, I realize this and I'm not saying it's doing the wrong thing, and

Walter Bright (5/28) Jul 05 2007 If you're needing to guard against inadvertent modification, that's just...

Regan Heath (4/36) Jul 05 2007 Aaargh! You're confusing empty and non-existent (null) again!

Walter Bright (4/6) Jul 05 2007 The only case is when you're extending into a preallocated buffer. Such

James Dennett (9/16) Jul 05 2007 But a way of emptying something was asked for, and you showed

Walter Bright (2/6) Jul 05 2007 I'd like to know of such cases.

Derek Parnell (13/20) Jul 05 2007 char[] Option;

Derek Parnell (16/36) Jul 05 2007 And if you must nitpick that one can code this a different way then here...

Bill Baxter (6/41) Jul 05 2007 In databases NULL being different from empty seems to be a big deal too.

Sean Kelly (3/7) Jul 06 2007 Either that or it's important to a non-null set of programmers.

Walter Bright (6/16) Jul 06 2007 Of course, if a function is documented to behave that way, and you have

Regan Heath (12/32) Jul 06 2007 The first argument which I think holds water is that it is trivial to
Bruno Medeiros (16/36) Jul 07 2007 Uh, unlike tab stops, I think it is widely recognized by the developer

Leandro Lucarella (17/37) Jul 06 2007 It's basically the same issue as NULL and NOT NULL in SQL...

James Dennett (14/21) Jul 06 2007 Any time you need a difference between "specified, and
Serg Kovrov (13/20) Jul 07 2007 I'm used to this pattern:

Derek Parnell (22/53) Jul 05 2007 There is no issue. I'm not raising an issue. I'm just making some

Walter Bright (6/20) Jul 05 2007 Such a distinction is critical in C code, but is not of much use in D

Regan Heath (13/41) Jul 05 2007 Question; Do these functions keep a copy of the returned string? Or, t...
Bruno Medeiros (9/36) Jul 05 2007 Why is 'text.length = 0;' or 'text = text.init;' better than the idiom:

Sean Kelly (4/10) Jul 05 2007 So just use char[] instead of 'string'. I don't plan to use the aliases...

Derek Parnell (26/36) Jul 05 2007 It's not so clear cut. Firstly, a lot of phobos routines now return

Walter Bright (9/22) Jul 05 2007 If you write it like this:

Regan Heath (15/41) Jul 05 2007 Because tolower does it for you, but it still returns string and if for ...

Bruno Medeiros (23/75) Jul 05 2007 Indeed, I think this illustrates that some standard library functions

Frits van Bommel (11/42) Jul 05 2007 Sorry, but you seem to have missed a bit above: if the string doesn't

Bruno Medeiros (20/67) Jul 05 2007 Oops, sorry, that's right, I missed that part about tolower not

Regan Heath (18/42) Jul 06 2007 True.. but it's unfortunate that the most efficient case, where no

Bruno Medeiros (11/46) Jul 07 2007 Algorithms should care about worst-case performance, or average-case

Regan Heath (3/6) Jul 05 2007 I was hoping for something clever'er ;)

Bruno Medeiros (10/47) Jul 05 2007 It doesn't make sense to template it, because you'd still have two

Regan Heath (17/23) Jul 06 2007 If the template is

Walter Bright (4/20) Jul 05 2007 tolower only dups the string if it needs to. It won't dup a string that

Regan Heath (11/33) Jul 06 2007 opCatAssign does. (dup #2)
Regan Heath (88/88) Jul 06 2007 Proof of concept.

Derek Parnell (19/45) Jul 05 2007 If you have any failing, Walter, it's your ability to focus on insignifacn...

Oskar Linde (8/21) Jul 05 2007 What you are doing there is mixing two styles of functions. Functional
Walter Bright (9/22) Jul 05 2007 My point is that the way the snippet is written is inside out. Do not

Derek Parnell (8/32) Jul 05 2007 Thanks. This is what I meant by taking rethinking the design of my
BCS (29/54) Jul 05 2007 The one issue I can see with this is where an input is const but may be ...

Walter Bright (9/21) Jul 05 2007 My experience with this is:

Sean Kelly (9/48) Jul 05 2007 I'd argue that the parameters should be "const char[]" rather than

Kristian Kilpi (51/61) Jul 05 2007

Walter Bright (12/25) Jul 05 2007 No, because then they must always dup the string. If they don't need to

Kristian Kilpi (21/39) Jul 06 2007

Derek Parnell <derek psych.ward> writes:

I'm converting Bud to compile using V2 and so far it's been a very hard
thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
over the place, which is exactly what I thought would happen. Bud does a
lot of text manipulation so having 'string' as invariant means that calls
to functions that return string need to often be .dup'ed because I need to
assign the result to a malleable variable. 

I might have to rethink the design of the application to avoid the
performance hit of all these dups.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jul 04 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far it's been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 
 
 I might have to rethink the design of the application to avoid the
 performance hit of all these dups.
 

First of all, if you were returning string literals as char[] and trying 
to manipulate them, they'd fail on linux at run time (because string 
literals are put into read only segments).

Second, you can use char[] instead of string.
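A minimal sketch of the conversions being discussed, written for present-day D2 (where the thread's `invariant` is spelled `immutable` and `string` is an alias for `immutable(char)[]`): a literal cannot be bound to a `char[]`, `.dup` gives a mutable copy, and `.idup` gives an immutable one.

```d
void main()
{
    string lit = "hello";      // a literal is immutable(char)[]
    // char[] bad = "hello";   // D2 compile error: cannot convert to char[]

    char[] buf = lit.dup;      // .dup: a mutable copy we are free to change
    buf[0] = 'H';

    string frozen = buf.idup;  // .idup: an immutable snapshot of buf
    buf[1] = 'E';              // later edits do not affect the snapshot

    assert(lit == "hello");
    assert(frozen == "Hello");
    assert(buf == "HEllo");
}
```

Each `.dup`/`.idup` allocates, which is the cost Derek is running into when results must land in a mutable variable.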

Jul 04 2007

Derek Parnell <derek psych.ward> writes:

On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far it's been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 
 
 I might have to rethink the design of the application to avoid the
 performance hit of all these dups.
 

 
 First of all, if you were returning string literals as char[] and trying 
 to manipulate them, they'd fail on linux at run time (because string 
 literals are put into read only segments).

But I'm not, and never have been, returning string literals anywhere.
 
 Second, you can use char[] instead of string.

The idiom I'm using is that functions that receive text have those
parameters as 'string' to guard against the function inadvertently
modifying that which is passed, and functions that return text return
'string' to guard against calling functions inadvertently modifying data
that they did not create (own).

This leads to constructs like ...

   char[] result;

   result = SomeTextFunc(data).dup;

Another commonly used idiom that I had to stop using was ...

   char[] text;
   text = getvalue();
   if (wrongvalue(text))
       text = ""; // Reset to an empty string

I now code ...

       text.length = 0; // Reset to an empty string

which is slightly less readable.
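The reset options in this exchange can be compared side by side. This is a sketch assuming present-day D2 semantics; note that `==` compares contents only, so just `is null` distinguishes an empty array from a null one.

```d
void main()
{
    char[] text = "abc".dup;

    // text = "";          // D2 error: "" is immutable(char)[], not char[]

    text.length = 0;       // Derek's workaround: the array is now empty
    assert(text.length == 0);

    text = "abc".dup;
    text = text[0 .. 0];   // empty slice: length 0, pointer still set
    assert(text.length == 0 && text !is null);

    text = null;           // Walter's suggestion: no array at all
    assert(text.length == 0 && text is null);

    // == compares contents only, so it cannot tell these states apart
    assert(text == "");
}
```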

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jul 04 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Thu, 05 Jul 2007 02:23:11 +0300, Derek Parnell <derek psych.ward> wrote:

 This leads to constructs like ...

    char[] result;

    result = SomeTextFunc(data).dup;

Is SomeTextFunc allocating a copy of the string which it is returning? If
it is, then there's no reason why it should return a "string" type. If it
isn't, then modifying the data in the returned char[] could have
unforeseen consequences.
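A sketch of the first case, using a hypothetical function `quoted` (not from Bud): a function that allocates fresh text and keeps no reference to it can return `char[]`, so the caller gets a mutable result without a `.dup`.

```d
// Hypothetical: builds a brand-new buffer and keeps no reference to it,
// so it can hand ownership to the caller as mutable char[].
char[] quoted(string s)
{
    char[] buf = "\"".dup;
    buf ~= s;
    buf ~= '"';
    return buf;
}

void main()
{
    char[] q = quoted("bud");   // no .dup needed: the caller owns the buffer
    q[1] = 'B';
    assert(q == "\"Bud\"");

    string frozen = q.idup;     // an immutable copy, only if one is wanted
    assert(frozen == "\"Bud\"");
}
```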

 Another commonly used idiom that I had to stop using was ...

    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string

Since empty string literals don't really point to data, I'd suggest that
empty string and array literals shouldn't be const/invariant in favor of
the above example. It breaks some consistency, but "a foolish consistency
is the hobgoblin of little minds" ;)

--

Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Jul 04 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Thu, 05 Jul 2007 04:44:41 +0300, Vladimir Panteleev wrote:

 On Thu, 05 Jul 2007 02:23:11 +0300, Derek Parnell <derek psych.ward> wrote:
 
 This leads to constructs like ...

    char[] result;

    result = SomeTextFunc(data).dup;

 
 Is SomeTextFunc allocating a copy of the string which it is returning?
 If it is, then there's no reason why it should return a "string" type. 
 If it isn't, then modifying the data in the returned char[] could have
 unforeseen consequences.

Yes, I realize this and I'm not saying it's doing the wrong thing, and
actually I'm not even complaining. I'm just letting people know some of the
observations I've had in moving to v2. In this case, someone has to copy
the resulting data - either the function that created it or the routine
that called the function. If the called function does the duplication, it
could be a waste if the calling function is not going to further modify it,
that is why I elected to pass a 'const' reference to the new data. The
calling function can then decide if it needs a copy (to modify it) or not.

   string result;
   result = SomeTextFunc(data); // no need to dup if I'm not changing it.
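The arrangement Derek describes, where the callee returns read-only `string` and each caller decides whether to copy, might look like this. `someTextFunc` is a hypothetical stand-in for his `SomeTextFunc`.

```d
// Hypothetical stand-in: builds new text and hands it back read-only.
string someTextFunc(string data)
{
    return "[" ~ data ~ "]";
}

void main()
{
    string data = "src";

    string view = someTextFunc(data);       // read-only use: no copy made
    char[] edit = someTextFunc(data).dup;   // caller will modify: copy first
    edit[1] = 'S';

    assert(view == "[src]");
    assert(edit == "[Src]");
}
```

The trade-off is that a caller who always wants to modify pays for an allocation inside the callee and a second one for the `.dup`.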


I've got a set of aliases to help me ...

   alias char[]  text;
   alias wchar[] wtext;
   alias dchar[] dtext;

so now I see 'text' as mutable and 'string' as immutable.

 Another commonly used idiom that I had to stop using was ...

    char[] txt;
    txt = getvalue();
    if (wrongvalue(txt))
        txt = ""; // Reset to an empty string

 
 Since empty string literals don't really point to data, I'd 
 suggest that empty string and array literals shouldn't be
 const/invariant in favor of the above example. It breaks some
 consistency, but "a foolish consistency is the hobgoblin of
 little minds" ;)

Nice idea, but I can't see it happening because of the inconsistency angle.

Instead I've decided to use the idiom ...

    text txt;
    txt = getvalue();
    if (wrongvalue(txt))
        txt = text.init; // Reset to an empty string
  
-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
5/07/2007 3:52:27 PM

Jul 04 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertently
 modifying that which is passed, and functions that return text return
 'string' to guard against calling functions inadvertently modifying data
 that they did not create (own).
 
 This leads to constructs like ...
 
    char[] result;
 
    result = SomeTextFunc(data).dup;

If you're needing to guard against inadvertent modification, that's just 
what const strings are for. I'm not understanding the issue here.

 Another commonly used idiom that I had to stop using was ...
 
    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string
 
 I now code ...
 
        text.length = 0; // Reset to an empty string
 
 which is slightly less readable.

This should do it nicely:

	text = null;

Jul 05 2007

Regan Heath <regan netmail.co.nz> writes:

Walter Bright Wrote:
 Derek Parnell wrote:
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertently
 modifying that which is passed, and functions that return text return
 'string' to guard against calling functions inadvertently modifying data
 that they did not create (own).
 
 This leads to constructs like ...
 
    char[] result;
 
    result = SomeTextFunc(data).dup;

 
 If you're needing to guard against inadvertent modification, that's just 
 what const strings are for. I'm not understanding the issue here.
 
 Another commonly used idiom that I had to stop using was ...
 
    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string
 
 I now code ...
 
        text.length = 0; // Reset to an empty string
 
 which is slightly less readable.

 
 This should do it nicely:
 
 	text = null;

Aaargh!  You're confusing empty and non-existent (null) again!  <g>

In some cases there is an important difference between the two.  In this case,
maybe not; I don't really know.

Regan

Jul 05 2007

Walter Bright <newshound1 digitalmars.com> writes:

Regan Heath wrote:
 Aaargh!  You're confusing empty and non-existent (null) again!  <g>

In this case, no.

 In some cases there is an important difference between the two.

The only case is when you're extending into a preallocated buffer. Such 
cannot be the case with string literals.

Jul 05 2007

James Dennett <jdennett acm.org> writes:

Walter Bright wrote:
 Regan Heath wrote:
 Aaargh!  You're confusing empty and non-existent (null) again!  <g>

 
 In this case, no.

But a way of emptying something was asked for, and you showed
a way to make it null, not empty -- can you explain your "In
this case, no"?

 In some cases there is an important difference between the two.

 
 The only case is when you're extending into a preallocated buffer. 

I've found many times when the difference between an empty
string and no string was important; they generally have
nothing to do with extending at all.  I'd be interested to
know why you assert that no such cases exist.

-- James

Jul 05 2007

Walter Bright <newshound1 digitalmars.com> writes:

James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

I'd like to know of such cases.

Jul 05 2007

Derek Parnell <derek psych.ward> writes:

On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:

 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

 
 I'd like to know of such cases.

  char[] Option;

  Option = getOptionFromUser();
  if (Option.ptr is null)
  {
   Option = DefaultOption;
  }

However, if the user sets the option to "" then that is what they want and
not the default one.
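A runnable version of this option example, under stated assumptions: `DefaultOption` and `resolveOption` are hypothetical names, and the user's input is passed in rather than read by `getOptionFromUser`. It relies on a D string literal, even `""`, having a non-null pointer.

```d
// Hypothetical default used when the user gives no option at all.
enum DefaultOption = "release";

// null -> the user gave no option: use the default.
// ""   -> the user explicitly chose an empty option: keep it.
string resolveOption(string fromUser)
{
    return fromUser is null ? DefaultOption : fromUser;
}

void main()
{
    assert(resolveOption(null) == "release");
    assert(resolveOption("") == "");
    assert(resolveOption("debug") == "debug");
}
```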

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jul 05 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Fri, 6 Jul 2007 14:23:43 +1000, Derek Parnell wrote:

 On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:
 
 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

 
 I'd like to know of such cases.

 
   char[] Option;
 
   Option = getOptionFromUser();
   if (Option.ptr is null)
   {
    Option = DefaultOption;
   }
 
 However, if the user sets the option to "" then that is what they want and
 not the default one.

And if you must nitpick that one could code this a different way, then here
is another example.

Let's say that there is this library routine, which is closed source and I
don't have access to its source, that accepts a string as its argument.
Furthermore, if that passed string is null the routine uses a default
value - whatever that is, because I don't know it. Now in my code I call it
with ...

   SomeFunc("");   // Use an empty string to do its magic
   SomeFunc(null); // But this time, use the default value

Remember, I have no control over the SomeFunc routine's implementation.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
6/07/2007 2:54:45 PM

Jul 05 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Derek Parnell wrote:
 On Fri, 6 Jul 2007 14:23:43 +1000, Derek Parnell wrote:
 
 On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:

 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

 I'd like to know of such cases.

   char[] Option;

   Option = getOptionFromUser();
   if (Option.ptr is null)
   {
    Option = DefaultOption;
   }

 However, if the user sets the option to "" then that is what they want and
 not the default one.

 
 And if you must nitpick that one could code this a different way, then here
 is another example.
 
 Let's say that there is this library routine, which is closed source and I
 don't have access to its source, that accepts a string as its argument.
 Furthermore, if that passed string is null the routine uses a default
 value - whatever that is, because I don't know it. Now in my code I call it
 with ...
 
    SomeFunc("");   // Use an empty string to do its magic
    SomeFunc(null); // But this time, use the default value
 
 Remember, I have no control over the SomeFunc routine's implementation.
 

In databases NULL being different from empty seems to be a big deal too.

Anyway googling for "null versus empty" turns up a bevy of hits, so from 
that I think we can presume that the distinction is important to a 
non-empty subset of programmers.

--bb

Jul 05 2007

Sean Kelly <sean f4.ca> writes:

Bill Baxter wrote:
 
 Anyway googling for "null versus empty" turns up a bevy of hits, so from 
 that I think we can presume that the distinction is important to a 
 non-empty subset of programmers.

Either that or it's important to a non-null set of programmers.


;-) Sean

Jul 06 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 Let's say that there is this library routine, which is closed source and I
 don't have access to its source, that accepts a string as its argument.
 Furthermore, if that passed string is null the routine uses a default
 value - whatever that is, because I don't know it. Now in my code I call it
 with ...
 
    SomeFunc("");   // Use an empty string to do its magic
    SomeFunc(null); // But this time, use the default value
 
 Remember, I have no control over the SomeFunc routine's implementation.

Of course, if a function is documented to behave that way, and you have 
no control over it, you must adhere to its documentation.

There are other ways to do default arguments. I suspect we could argue 
about it like we could argue about tab stops, and never reach any sort 
of resolution <g>.

Jul 06 2007

Regan Heath <regan netmail.co.nz> writes:

Walter Bright wrote:
 Derek Parnell wrote:
 Let's say that there is this library routine, which is closed source 
 and I
 don't have access to its source, that accepts a string as its argument.
 Furthermore, if that passed string is null the routine uses a default
 value - whatever that is because I don't know it. Now in my code I 
 call it
 with ...

    SomeFunc("");   // Use an empty string to do its magic
    SomeFunc(null); // But this time, use the default value

 Remember, I have no control over the SomeFunc routine's implementation.

 
 Of course, if a function is documented to behave that way, and you have 
 no control over it, you must adhere to its documentation.
 
 There are other ways to do default arguments. I suspect we could argue 
 about it like we could argue about tab stops, and never reach any sort 
 of resolution <g>.

The first argument which I think holds water is that it is trivial to 
represent empty and non-existent in C, e.g.

char *empty = "";
char *nonexistent = NULL;

The other argument is the one made earlier about databases.  In a 
database, empty and non-existent are important, distinct states a value 
could have.

Currently, D can model these, but it worries me that you don't seem to 
think that it's important.  So, perhaps in future you might decide to 
get rid of this, or do so accidentally.

Regan

Jul 06 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Walter Bright wrote:
 Derek Parnell wrote:
 Let's say that there is this library routine, which is closed source 
 and I
 don't have access to its source, that accepts a string as its argument.
 Furthermore, if that passed string is null the routine uses a default
 value - whatever that is because I don't know it. Now in my code I 
 call it
 with ...

    SomeFunc("");   // Use an empty string to do its magic
    SomeFunc(null); // But this time, use the default value

 Remember, I have no control over the SomeFunc routine's implementation.

 
 Of course, if a function is documented to behave that way, and you have 
 no control over it, you must adhere to its documentation.
 
 There are other ways to do default arguments. I suspect we could argue 
 about it like we could argue about tab stops, and never reach any sort 
 of resolution <g>.

Uh, unlike tab stops, I think it is widely recognized by the developer 
community that it is useful to have a distinction between *valid* and 
*invalid* values of something.

Why is there a NaN for floats (and in D NaN is the default value for 
floats)? What if NaN was equal to zero? Didn't you yourself, Walter, 
say once that if there was a way to have an actual invalid value for 
ints (without sacrificing precision) you would like to have that, and 
you would place it as the default value for int, instead of 0 (which is 
a valid int)?
So why shouldn't arrays (which are already reference types) have a value 
that means "invalid array", especially if we can get that for free 
(unlike ints)?
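The contrast Bruno draws can be checked directly. In D, `float.init` is NaN and `char.init` is 0xFF (never a valid UTF-8 code unit), while `int.init` is 0, an ordinary value; a minimal check:

```d
import std.math : isNaN;

void main()
{
    float f;   // default-initialised to float.init
    char  c;   // default-initialised to char.init
    int   i;   // default-initialised to int.init

    assert(isNaN(f));    // NaN: deliberately not a usable number
    assert(c == 0xFF);   // 0xFF: never a valid UTF-8 code unit
    assert(i == 0);      // 0: an ordinary, valid int
}
```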


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 07 2007

Leandro Lucarella <llucax gmail.com> writes:

Derek Parnell, on 6 July at 14:23, you wrote:
 On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:
 
 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

 
 I'd like to know of such cases.

 
   char[] Option;
 
   Option = getOptionFromUser();
   if (Option.ptr is null)
   {
    Option = DefaultOption;
   }
 
 However, if the user sets the option to "" then that is what they want and
 not the default one.

It's basically the same issue as NULL and NOT NULL in SQL...

-- 
LUCA - Leandro Lucarella - Using Debian GNU/Linux Sid - GNU Generation
------------------------------------------------------------------------
E-Mail / JID:     luca lugmen.org.ar
GPG Fingerprint:  D9E1 4545 0F4B 7928 E82C  375D 4B02 0FE0 B08B 4FB2 
GPG Key:          gpg --keyserver pks.lugmen.org.ar --recv-keys B08B4FB2
------------------------------------------------------------------------
I know you're looking at me, but I would swear that in those black eyes
of yours there is a sensitive Indian who thinks: "How great that this
white guy is trying to communicate with me, an inferior being on the
scale of homo sapiens". That is why, dear Indian, I can't stop looking
at you as if you were a damn guinea pig I can step on whenever I
want.
	-- Ricardo Vaporeso. Letter to the Aborigines, ed. Gredos,
		Barcelona, 1912, page 102.

Jul 06 2007

James Dennett <jdennett acm.org> writes:

Walter Bright wrote:
 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

 
 I'd like to know of such cases.

Any time you need a difference between "specified, and
known to be empty" and "unspecified or unknown", which
is very common.  The alternative is to carry a boolean
around to say whether the string is in use.

Others have raised the case of null meaning "use default"
(but let's not spend too much time on that specific case),
and the fact that the database world often (though not
always) distinguishes null from empty.  Many people have
found good reason to do this.  The "Maybe" or "Fallible"
type constructors used in other languages also cover cases
where "absent" can usefully be handled separately from
"empty" (in more general cases than just strings).

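The "carry a boolean around" alternative is essentially what a Maybe-style wrapper does. A sketch using `std.typecons.Nullable` from present-day Phobos, with a hypothetical `lookup` over a database-like row:

```d
import std.typecons : Nullable;

// Hypothetical lookup: a missing key yields a null Nullable,
// a present key yields its value even when that value is "".
Nullable!string lookup(string[string] fields, string key)
{
    Nullable!string result;     // starts out null
    if (auto p = key in fields)
        result = *p;            // now non-null, even if *p is ""
    return result;
}

void main()
{
    string[string] row = ["name": "", "city": "Melbourne"];

    assert(lookup(row, "phone").isNull);   // unspecified / unknown

    auto name = lookup(row, "name");
    assert(!name.isNull);                  // specified...
    assert(name.get == "");                // ...and known to be empty

    auto city = lookup(row, "city");
    assert(city.get == "Melbourne");
}
```

For strings alone, D's null array already provides this extra state without a separate flag; the wrapper generalizes it to types that have no spare value.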
-- James

Jul 06 2007

Serg Kovrov <kovrov bugmenot.com> writes:

Walter Bright wrote:
 James Dennett wrote:
 I've found many times when the difference between an empty
 string and no string was important; they generally have
 nothing to do with extending at all.  I'd be interested to
 know why you assert that no such cases exist.

 
 I'd like to know of such cases.

I'm used to this pattern:
string m_bar; // field assumed by the snippet

void foo(string bar = null)
{
     if (bar is null)
         m_bar = "default_value";
     else
         m_bar = bar; // even if it's empty
}

often as a one-liner:
m_bar = (bar is null) ? "default_value" : bar;

This is most used one (at least by me), but of course there are more.


-- serg.

Jul 07 2007

Derek Parnell <derek psych.ward> writes:

On Thu, 05 Jul 2007 00:42:25 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertently
 modifying that which is passed, and functions that return text return
 'string' to guard against calling functions inadvertently modifying data
 that they did not create (own).
 
 This leads to constructs like ...
 
    char[] result;
 
    result = SomeTextFunc(data).dup;

 
 If you're needing to guard against inadvertent modification, that's just 
 what const strings are for. I'm not understanding the issue here.

There is no issue. I'm not raising an issue. I'm just making some
observations about my experience so far in moving to V2.

I'm not surprised by the effort that I'm having. I expected it. Why?
Because I knew that most of the strings I work with are text (mutable
things), and using the D 'string', an immutable thing, in function
signatures was going to mean I'd have to change things to suit.

I chose to use 'string' to safeguard myself from making stupid errors in
coding. And it's working. My next pass through the application code will be
to find places where I can safely return a 'text' thing instead of a
'string' thing, which is a performance-tuning exercise.

 Another commonly used idiom that I had to stop using was ...
 
    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string
 
 I now code ...
 
        text.length = 0; // Reset to an empty string
 
 which is slightly less readable.

 
 This should do it nicely:
 
 	text = null;

Not really. I want an empty text and not a non-text. Also, it doesn't fit
right with other data types - the consistency thing again.

   text = typeof(text).init; 

works better for me because I can also use this construct in templates
without problems.

But really, this thread can die now. I didn't mean to go off into weird
tangential subjects.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jul 05 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 This should do it nicely:

 	text = null;

 
 Not really. I want an empty text and not a non-text.

Such a distinction is critical in C code, but is not of much use in D 
code. What do you need the distinction for?

 Also, it doesn't fit
 right with other data types - the consistency thing again.
 
    text = typeof(text).init; 
 
 works better for me because I can also use this construct in templates
 without problems.

The .init for char[] is null, not "".
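This matters for the template use Derek mentions: a generic reset via `T.init` produces `null`, not an allocated empty array, for `char[]`. A sketch with a hypothetical `reset` helper:

```d
// Hypothetical generic helper in the spirit of `x = typeof(x).init`.
void reset(T)(ref T x)
{
    x = T.init;
}

void main()
{
    int n = 42;
    char[] text = "abc".dup;

    reset(n);
    reset(text);

    assert(n == 0);
    assert(text is null);   // char[].init is null...
    assert(text == "");     // ...which still compares equal to ""
}
```

Code that must keep "empty" distinct from "absent" therefore cannot rely on `.init` alone.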

 But really, this thread can die now. I didn't mean to go off into weird
 tangential subjects.

I think you've raised a couple of very important stylistic issues, and 
it is worth pursuing.

Jul 05 2007

Regan Heath <regan netmail.co.nz> writes:

Derek Parnell Wrote:
 On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:
 
 Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far it's been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 
 
 I might have to rethink the design of the application to avoid the
 performance hit of all these dups.
 

 
 First of all, if you were returning string literals as char[] and trying 
 to manipulate them, they'd fail on linux at run time (because string 
 literals are put into read only segments).

 
 But I'm not, and never have been, returning string literals anywhere.
  
 Second, you can use char[] instead of string.

 
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertently
 modifying that which is passed

Yep, makes sense.

 , and functions that return text return
 'string' to guard against calling functions inadvertently modifying data
 that they did not create (own).

Question: Do these functions keep a copy of the returned string?  Or, to
re-phrase, after returning the string do they still 'own' it, or have they
washed their hands of it?  Are they in a sense passing ownership to the calling
function perhaps?

If they no longer 'own' the string then they can return it as a char[] instead
of string and all your problems are solved, right?

I imagine that if they return a slice of the input string, and that string was
'string', not char[], then they would also return string (because doing otherwise
would be claiming ownership of the input string and giving it away to the
caller, which may not be valid).

Maybe you have a lot of functions returning slices to the input string?

Maybe you need to template them? i.e.

T func(T)(T param)
{
    // ... work on param; a returned slice keeps param's type
    return param;
}

so if you pass string you get string, if you pass char[] you get char[].

Maybe all string routines which return slices of the input should be so
templated?
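The templating idea can be sketched with a hypothetical `firstWord` function: because `T` is deduced from the argument, a returned slice has the same mutability as the input, so neither caller needs a `.dup`.

```d
// Hypothetical helper: returns the text before the first space.
// T is deduced from the argument: string in, string out;
// char[] in, char[] out.
T firstWord(T)(T s)
{
    foreach (i, c; s)
        if (c == ' ')
            return s[0 .. i];
    return s;
}

void main()
{
    string fixed = "make all";
    char[] buffer = "make all".dup;

    auto a = firstWord(fixed);
    auto b = firstWord(buffer);

    static assert(is(typeof(a) == string));
    static assert(is(typeof(b) == char[]));

    b[0] = 'M';                  // b is a mutable slice of buffer
    assert(a == "make");
    assert(buffer == "Make all");
}
```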

Regan

Jul 05 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Derek Parnell wrote:
 On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:
 
 The idiom I'm using is that functions that receive text have those
 parameters as 'string' to guard against the function inadvertently
 modifying that which is passed, and functions that return text return
 'string' to guard against calling functions inadvertently modifying data
 that they did not create (own).
 
 This leads to constructs like ...
 
    char[] result;
 
    result = SomeTextFunc(data).dup;
 
 Another commonly used idiom that I had to stop using was ...
 
    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string
 
 I now code ...
 
        text.length = 0; // Reset to an empty string
 
 which is slightly less readable.
 


Why is 'text.length = 0;' or 'text = text.init;' better than the idiom:
   str = "".dup;
, which also works for any kind of string, not just empty strings?

I found, however, that there is a bug with that code:
http://d.puremagic.com/issues/show_bug.cgi?id=1314

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 05 2007

Sean Kelly <sean f4.ca> writes:

Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far it's been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 

So just use char[] instead of 'string'.  I don't plan to use the aliases 
much either.


Sean

Jul 05 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Thu, 05 Jul 2007 00:15:41 -0700, Sean Kelly wrote:

 Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far it's been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 

 
 So just use char[] instead of 'string'.  I don't plan to use the aliases 
 much either.

It's not so clear cut. Firstly, a lot of phobos routines now return
'string' results and expect 'string' inputs. Secondly, I like the idea of
general purpose functions returning 'const' data, because it helps guard
against inadvertent modifications by the calling routines. It is up to the
calling function to explicitly decide if it is going to modify returned
stuff or not.

For example, if I know that I'll not need to modify the 'fullpath' then I
might do this ...

   string fullpath;

   fullpath = CanonicalPath(shortname);


However, if I might need to update it ...

   char[] fullpath;

   fullpath = CanonicalPath(shortname).dup;
   version(Windows)
   {
      setLowerCase(fullpath);
   }

The point is that the 'CanonicalPath' function hasn't got a clue what the
calling function is intending to do with the result so it is trying to be
responsible by guarding it against mistakes by the caller.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
5/07/2007 5:17:33 PM

Jul 05 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 However, if I might need to update it ...
 
    char[] fullpath;
 
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }
 
 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

If you write it like this:

string fullpath;

fullpath = CanonicalPath(shortname);
version(Windows)
{
       fullpath = std.string.tolower(fullpath);
}

you won't need to do the .dup .
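Walter's functional style can be sketched as follows, in present-day D. `canonicalPath` is a hypothetical stand-in for Bud's `CanonicalPath`, and `lowerAscii` is a hypothetical ASCII-only lower-caser, not `std.string.tolower`. The caller keeps `fullpath` as a `string` and rebinds it to a new value, so no `.dup` appears at the call site.

```d
/// Hypothetical stand-in for Bud's CanonicalPath; here it returns its input.
string canonicalPath(string shortname)
{
    return shortname;
}

/// Functional ASCII lower-casing: builds a new string and never writes
/// to its argument. For simplicity this sketch always allocates.
string lowerAscii(string s)
{
    char[] buf = s.dup;
    foreach (ref c; buf)
        if (c >= 'A' && c <= 'Z')
            c += 'a' - 'A';
    return cast(string) buf; // safe: buf is the only reference to this memory
}

/// Caller in Walter's style: fullpath stays a string and is simply
/// rebound to the new value.
string resolve(string shortname)
{
    string fullpath = canonicalPath(shortname);
    version (Windows)
    {
        fullpath = lowerAscii(fullpath);
    }
    return fullpath;
}
```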

Jul 05 2007

Regan Heath <regan netmail.co.nz> writes:

Walter Bright Wrote:
 Derek Parnell wrote:
 However, if I might need to update it ...
 
    char[] fullpath;
 
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }
 
 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

 
 If you write it like this:
 
 string fullpath;
 
 fullpath = CanonicalPath(shortname);
 version(Windows)
 {
        fullpath = std.string.tolower(fullpath);
 }
 
 you won't need to do the .dup .

Because tolower does it for you, but it still returns string and if for example
you need to add something to the end of the path, like a filename you will end
up doing yet another dup somewhere.

I think the solution may be to template all functions which return the input
string, or part of the input string, eg.

T tolower(T)(T input)
{
}

That way if you call it with char[] you get a char[] back, if you call it with
string you get a string back.

However...

tolower is an interesting case.  As a caller I expect it to modify the string,
or perhaps give a modified copy back (both options are valid and should perhaps
be supported?).

So, the 'string tolower(string)' version has 2 cases, the first case where it
doesn't need to modify the input and can simply return it, no problem.  

But case 2, where it does modify it should dup and return char[].  My reasoning
being that after it has completed and returned the copy, the caller now 'owns'
the string (as it's the only copy in existence and no-one else has a reference
to it).

To achieve that we'd need to overload on return type, or something clever... 
but then, how do we call it?

auto s = tolower(input);

tolower cannot be selected at compile time, and the type of s cannot be known
either, so that's an impossible situation, yes?

Regan

Jul 05 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Regan Heath wrote:
 Walter Bright Wrote:
 Derek Parnell wrote:
 However, if I might need to update it ...

    char[] fullpath;

    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }

 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

 If you write it like this:

 string fullpath;

 fullpath = CanonicalPath(shortname);
 version(Windows)
 {
        fullpath = std.string.tolower(fullpath);
 }

 you won't need to do the .dup .

 
 Because tolower does it for you, but it still returns string and if for
example you need to add something to the end of the path, like a filename you
will end up doing yet another dup somewhere.
 
 I think the solution may be to template all functions which return the input
string, or part of the input string, eg.
 
 T tolower(T)(T input)
 {
 }
 
 That way if you call it with char[] you get a char[] back, if you call it with
string you get a string back.
 
 However...
 
 tolower is an interesting case.  As a caller I expect it to modify the string,
or perhaps give a modified copy back (both options are valid and should perhaps
be supported?).
 
 So, the 'string tolower(string)' version has 2 cases, the first case where it
doesn't need to modify the input and can simply return it, no problem.  
 
 But case 2, where it does modify it should dup and return char[].  My
reasoning being that after it has completed and returned the copy, the caller
now 'owns' the string (as it's the only copy in existence and no-one else has a
reference to it).
 

Indeed, I think this illustrates that some standard library functions 
may not have the correct signature, and I think tolower is likely one of them.
The most general case for tolower is:
   char[] tolower(const(char)[] s);
Since tolower creates a new array, but does not keep it, it can give 
away its ownership of the array (ie, return a mutable).

The second case, more specific, is simply syntactic sugar for making 
that array invariant:

   invariant(char)[] tolowerinv(const(char)[] str) {
     return cast(invariant) tolower(str);
   }

The current signature:
   const(char)[] tolower(const(char)[] str)
is kinda incorrect, because it returns a const reference for an array 
that has no mutable references, and that is the same as an invariant 
reference, so tolower might as well return invariant(char)[].


 To achieve that we'd need to overload on return type, or something clever... 
but then, how do we call it?
 
 auto s = tolower(input);
 
 tolower cannot be selected at compile time, and the type of s cannot be known
either, so that's an impossible situation, yes?
 
 Regan

The 'something clever' to distinguish both cases is simply naming two 
different functions, like tolower or tolowerinv (if the second function 
is needed at all).


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 05 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Bruno Medeiros wrote:
 Regan Heath wrote:
 tolower is an interesting case.  As a caller I expect it to modify the 
 string, or perhaps give a modified copy back (both options are valid 
 and should perhaps be supported?).

 So, the 'string tolower(string)' version has 2 cases, the first case 
 where it doesn't need to modify the input and can simply return it, no 
 problem. 
 But case 2, where it does modify it should dup and return char[].  My 
 reasoning being that after it has completed and returned the copy, the 
 caller now 'owns' the string (as it's the only copy in existence and 
 no-one else has a reference to it).

 
 Indeed, I think this illustrates that some standard library functions 
 may not have the correct signature, and I think tolower is likely one of them.
 The most general case for tolower is:
   char[] tolower(const(char)[] s);
 Since tolower creates a new array, but does not keep it, it can give 
 away its ownership of the array (ie, return a mutable).

Sorry, but you seem to have missed a bit above: if the string doesn't 
contain any uppercase characters tolower returns the input without 
.dup-ing it (aka copy-on-write).

 The second case, more specific, is simply syntactic sugar for making 
 that array invariant:
 
   invariant(char)[] tolowerinv(const(char)[] str) {
     return cast(invariant) tolower(str);
   }

Yes, but only if it actually needs to modify the string.

You seem to have missed that the two cases can't (in general) be 
distinguished at compile time; it's only at run time when a choice is 
made between a copy and no copy.
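The run-time choice Frits describes can be sketched as a copy-on-write function. It uses a hypothetical name, `cowLower`, handles ASCII only, and is written in present-day D. Both outcomes share one static return type. That is why the signature alone cannot tell the caller whether it received a fresh copy.

```d
/// Copy-on-write ASCII lower-casing (hypothetical; ASCII only).
/// Whether a copy is made is decided at run time, by the data.
string cowLower(string s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            // First upper-case char found: copy once, then fix the rest.
            char[] buf = s.dup;
            foreach (ref d; buf[i .. $])
                if (d >= 'A' && d <= 'Z')
                    d += 'a' - 'A';
            return cast(string) buf; // buf is unique, so this is safe
        }
    }
    return s; // nothing to change: hand back the caller's own array
}
```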

 The current signature:
   const(char)[] tolower(const(char)[] str)
 is kinda incorrect, because it returns a const reference for an array 
 that has no mutable references, and that is the same as an invariant 
 reference, so tolower might as well return invariant(char)[].

Again, that only holds if a copy was actually made at run time. If no 
copy was made the original input is returned, to which there may be 
mutable references.

Jul 05 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Frits van Bommel wrote:
 Bruno Medeiros wrote:
 Regan Heath wrote:
 tolower is an interesting case.  As a caller I expect it to modify 
 the string, or perhaps give a modified copy back (both options are 
 valid and should perhaps be supported?).

 So, the 'string tolower(string)' version has 2 cases, the first case 
 where it doesn't need to modify the input and can simply return it, 
 no problem. But case 2, where it does modify it should dup and return 
 char[].  My reasoning being that after it has completed and returned 
 the copy, the caller now 'owns' the string (as it's the only copy in 
 existence and no-one else has a reference to it).

 Indeed, I think this illustrates that some standard library functions 
 may not have the correct signature, and I think tolower is likely one of them.
 The most general case for tolower is:
   char[] tolower(const(char)[] s);
 Since tolower creates a new array, but does not keep it, it can give 
 away its ownership of the array (ie, return a mutable).

 
 Sorry, but you seem to have missed a bit above: if the string doesn't 
 contain any uppercase characters tolower returns the input without 
 .dup-ing it (aka copy-on-write).
 

Oops, sorry, that's right, I missed that part about tolower not
modifying the string if it wasn't necessary. :(


 The second case, more specific, is simply syntactic sugar for making 
 that array invariant:

   invariant(char)[] tolowerinv(const(char)[] str) {
     return cast(invariant) tolower(str);
   }

 
 Yes, but only if it actually needs to modify the string.
 
 You seem to have missed that the two cases can't (in general) be 
 distinguished at compile time; it's only at run time when a choice is 
 made between a copy and no copy.
 
 The current signature:
   const(char)[] tolower(const(char)[] str)
 is kinda incorrect, because it returns a const reference for an array 
 that has no mutable references, and that is the same as an invariant 
 reference, so tolower might as well return invariant(char)[].

 
 Again, that only holds if a copy was actually made at run time. If no 
 copy was made the original input is returned, to which there may be 
 mutable references.

You're right, if a copy is not made *every* time (which is the case
after all), then the above doesn't hold.
But then, what I think is happening is that Phobos's current tolower is
suboptimal in terms of usefulness, because we don't know
whether a new copy is made or not. I'm wondering now what would be the more
useful form, or forms, of tolower (and similar functions) to have.
Now that I think of it again (admittedly I haven't got much experience 
with string manipulation in C++ or D), perhaps the best form 
is an in-place mutable version:
   char[] tolower(char[] str);
And it's this one after all that is the most general form. If you want 
to call tolower on a const or invariant array you dup it yourself on the 
call:
   char[] str = tolower("FOO".dup);
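Bruno's in-place form can be sketched as follows. It uses a hypothetical name, `lowerInPlace`, handles ASCII only, and is written in present-day D. The function writes through its argument and returns the same array, so it cannot accept a `string` at all.

```d
/// Bruno's suggested in-place form (hypothetical name; ASCII only):
/// it modifies its argument and returns the same array.
char[] lowerInPlace(char[] s)
{
    foreach (ref c; s)
        if (c >= 'A' && c <= 'Z')
            c += 'a' - 'A';
    return s;
}
```

A caller holding a `string` pays for exactly one explicit `.dup` at the call, as in `lowerInPlace("FOO".dup)`. A caller that already owns a `char[]` pays for none.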


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 05 2007

Regan Heath <regan netmail.co.nz> writes:

Bruno Medeiros wrote:
 The current signature:
   const(char)[] tolower(const(char)[] str)
 is kinda incorrect, because it returns a const reference for an array 
 that has no mutable references, and that is the same as an invariant 
 reference, so tolower might as well return invariant(char)[].

 Again, that only holds if a copy was actually made at run time. If no 
 copy was made the original input is returned, to which there may be 
 mutable references.

 
 You're right, if a copy is not made *every* time (which is the case
 after all), then the above doesn't hold.
 But then, what I think is happening is that Phobos's current tolower is
 suboptimal in terms of usefulness, because we don't know
 whether a new copy is made or not. I'm wondering now what would be the more
 useful form, or forms, of tolower (and similar functions) to have.
 Now that I think of it again (admittedly I haven't got much experience 
 with string manipulation in C++ or D), perhaps the best form 
 is an in-place mutable version:
   char[] tolower(char[] str);
 And it's this one after all that is the most general form. If you want 
 to call tolower on a const or invariant array you dup it yourself on the 
 call:
   char[] str = tolower("FOO".dup);

True.. but it's unfortunate that the most efficient case, where no 
duplication is needed, is no longer possible :(

If we template the function, eg.

T tolower(T)(T input)
{
}

and we have some way to check whether the input is const or not (at 
runtime is(string) or something?) perhaps we can code the existing 
efficient solution (no dup of const data) as well as the general case 
where it mutates.  In the mutate case it can dup if the input is const 
and not dup if it isn't (adding an efficient solution which doesn't 
currently exist).

The only problem is that the case where you pass const data and it has 
to dup, you get back a const reference to a piece of data with no other 
owner (meaning it doesn't need to be const) which might cause another 
dup in your code at a later point.

Regan

Jul 06 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Regan Heath wrote:
 Bruno Medeiros wrote:
 The current signature:
   const(char)[] tolower(const(char)[] str)
 is kinda incorrect, because it returns a const reference for an 
 array that has no mutable references, and that is the same as an 
 invariant reference, so tolower might as well return invariant(char)[].

 Again, that only holds if a copy was actually made at run time. If no 
 copy was made the original input is returned, to which there may be 
 mutable references.

 You're right, if a copy is not made *every* time (which is the case
 after all), then the above doesn't hold.
 But then, what I think is happening is that Phobos's current tolower is
 suboptimal in terms of usefulness, because we don't know
 whether a new copy is made or not. I'm wondering now what would be the more
 useful form, or forms, of tolower (and similar functions) to have.
 Now that I think of it again (admittedly I haven't got much experience 
 with string manipulation in C++ or D), perhaps the best 
 form is an in-place mutable version:
   char[] tolower(char[] str);
 And it's this one after all that is the most general form. If you want 
 to call tolower on a const or invariant array you dup it yourself on 
 the call:
   char[] str = tolower("FOO".dup);

 
 True.. but it's unfortunate that the most efficient case, where no 
 duplication is needed, is no longer possible :(
 

Algorithms should care about worst-case performance, or average-case 
performance. That most efficient "case", where a string is already 
lower case, is a minority case in most applications, and is never a 
worst-case scenario. So why bother?
Also, doing this tolower like that would give other performance problems 
like these:

 The only problem is that the case where you pass const data and it has 
 to dup, you get back a const reference to a piece of data with no other 
 owner (meaning it doesn't need to be const) which might cause another 
 dup in your code at a later point.
 
 Regan

Indeed, with such scenario, you would end up with worse performance overall.

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 07 2007

Regan Heath <regan netmail.co.nz> writes:

Bruno Medeiros wrote:
 The 'something clever' to distinguish both cases is simply naming two 
 different functions, like tolower or tolowerinv (if the second function 
 is needed at all).

I was hoping for something clever'er ;)

Regan

Jul 05 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Regan Heath wrote:
 Walter Bright Wrote:
 Derek Parnell wrote:
 However, if I might need to update it ...

    char[] fullpath;

    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }

 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

 If you write it like this:

 string fullpath;

 fullpath = CanonicalPath(shortname);
 version(Windows)
 {
        fullpath = std.string.tolower(fullpath);
 }

 you won't need to do the .dup .

 
 Because tolower does it for you, but it still returns string and if for
example you need to add something to the end of the path, like a filename you
will end up doing yet another dup somewhere.
 
 I think the solution may be to template all functions which return the input
string, or part of the input string, eg.
 
 T tolower(T)(T input)
 {
 }
 
 That way if you call it with char[] you get a char[] back, if you call it with
string you get a string back.
 

It doesn't make sense to template it, because you'd still have two 
different function versions, that would work differently. The one that 
receives a string does a dup, the one that receives a char[] does not 
dup. The return type of tolower(string str) could also be char[] rather than 
string, if tolower(string str) always did a dup, even if no 
character modifications are necessary.


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 05 2007

Regan Heath <regan netmail.co.nz> writes:

Bruno Medeiros wrote:
 It doesn't make sense to template it, because you'd still have two 
 different function versions, that would work differently. The one that 
 receives a string does a dup, the one that receives a char[] does not 
 dup. The return type of tolower(string str) could also be char[] rather than 
 string, if tolower(string str) always did a dup, even if no 
 character modifications are necessary.

If the template is

T tolower(T)(T input) {}

then you have

string tolower(string input) {}
char[] tolower(char[] input) {}

and your cases are:

1. input string, output same string (no dup)
2. input string, output string (dup)
3. input char[], output same char[] (no dup)




call to dup.

I think the above is better than the current implementation as it avoids 


Regan

Jul 06 2007

Walter Bright <newshound1 digitalmars.com> writes:

Regan Heath wrote:
 Walter Bright Wrote:
 string fullpath;

 fullpath = CanonicalPath(shortname);
 version(Windows)
 {
        fullpath = std.string.tolower(fullpath);
 }

 you won't need to do the .dup .

 
 Because tolower does it for you, but it still returns string

tolower only dups the string if it needs to. It won't dup a string that 
is already in lower case.

 and if for example
 you need to add something to the end of the path, like a filename you 
 will end up
 doing yet another dup somewhere.

Concatenating strings does not require a .dup.
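Walter's point can be seen in a small sketch in present-day D, where `joinPath` is an illustrative name. `~` always allocates a new array and leaves its operands untouched, so no separate `.dup` is written. The concatenation is still a copy of its operands.

```d
/// Illustrative only: appending a file name to a path held as a string.
string joinPath(string dir, string file)
{
    // `~` allocates a fresh array holding both operands; neither input
    // is modified, so no explicit .dup is needed.
    return dir ~ "/" ~ file;
}
```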

Jul 05 2007

Regan Heath <regan netmail.co.nz> writes:

Walter Bright wrote:
 Regan Heath wrote:
 Walter Bright Wrote:
 string fullpath;

 fullpath = CanonicalPath(shortname);
 version(Windows)
 {
        fullpath = std.string.tolower(fullpath);
 }

 you won't need to do the .dup .

 Because tolower does it for you, but it still returns string

 tolower only dups the string if it needs to. It won't dup a string that 
 is already in lower case.

  > and if for example
  > you need to add something to the end of the path, like a filename you 
  > will end up
  > doing yet another dup somewhere.

 Concatenating strings does not require a .dup.

OR

newString = constString ~ bitToAdd; (is a copy of constString to 

So, the worst case scenario is that 2 dups are done.

Further if the input is char[] you can still get this worst case 
scenario because tolower returns string instead of char[].  With a 
templated version you get a much more efficient tolower for char[].

Regan

Jul 06 2007

Regan Heath <regan netmail.co.nz> writes:

Proof of concept.

Only duplicate when the input is 'string', allowing for more efficient 
handling of char[] parameters and allowing callers to pass a mutable 
char[] parameter, receive the result as a mutable char[], and avoid 
future dup calls on the returned data.

Output:
sStringM: 0x  416080 becomes 0x  880FD0 DUP
sCharM  : 0x  880FE0 becomes 0x  880FE0 SAME
sString : 0x  416110 becomes 0x  416110 SAME
sChar   : 0x  880FC0 becomes 0x  880FC0 SAME


Code:














rStringM.ptr, (sStringM.ptr!=rStringM.ptr)?"DUP":"SAME");

rCharM.ptr, (sCharM.ptr!=rCharM.ptr)?"DUP":"SAME");







rString.ptr, (sString.ptr!=rString.ptr)?"DUP":"SAME");

(sChar.ptr!=rChar.ptr)?"DUP":"SAME");
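The listing above was not preserved intact. The following is not Regan's code. It is only a minimal sketch, in present-day D, of the behaviour his output describes. `lowerT` is a hypothetical, ASCII-only template. It lower-cases a `char[]` in place and copies a `string` only when that string contains an upper-case letter.

```d
/// Hypothetical templated lower-caser (ASCII only), sketching the idea in
/// Regan's proof of concept rather than reproducing his listing.
T lowerT(T)(T s)
    if (is(T == string) || is(T == char[]))
{
    static if (is(T == char[]))
    {
        // Caller owns mutable data: convert in place, never allocate.
        foreach (ref c; s)
            if (c >= 'A' && c <= 'Z')
                c += 'a' - 'A';
        return s;
    }
    else
    {
        // Immutable input: copy only if some character has to change.
        foreach (i, c; s)
        {
            if (c >= 'A' && c <= 'Z')
            {
                char[] buf = s.dup;
                foreach (ref d; buf[i .. $])
                    if (d >= 'A' && d <= 'Z')
                        d += 'a' - 'A';
                return cast(string) buf; // buf is unique, so this is safe
            }
        }
        return s;
    }
}
```

This reproduces the four cases in the output above: only the upper-case `string` input is copied.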

Jul 06 2007

Derek Parnell <derek psych.ward> writes:

On Thu, 05 Jul 2007 01:06:45 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 However, if I might need to update it ...
 
    char[] fullpath;
 
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }
 
 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

 
 If you write it like this:
 
 string fullpath;
 
 fullpath = CanonicalPath(shortname);
 version(Windows)
 {
        fullpath = std.string.tolower(fullpath);
 }
 
 you won't need to do the .dup .

If you have any failing, Walter, it's your ability to focus on insignificant
minutiae as a form of distraction from the point that people are really
trying to make.

I was not talking about how to do efficient lower case conversion. 

I'll make my code example more free from assumed functionality.


 char[] qwerty;
 
 qwerty = KJHGF(poiuy).dup;
 version(xyzzy)
 {
     MNBVC(qwerty);
 }

As you can see, my point is made without regard to converting stuff to
lower case.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jul 05 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Derek Parnell wrote:

 I'll make my code example more free from assumed functionality.
 
 
  char[] qwerty;
  
  qwerty = KJHGF(poiuy).dup;
  version(xyzzy)
  {
      MNBVC(qwerty);
  }
 
 As you can see, my point is made without regard to converting stuff to
 lower case.

What you are doing there is mixing two styles of functions. Functional 
(KJHGF) and in-place modifying functions (MNBVC). Walter's modification 
was making both use a common style (functional).

Mixing those two function styles will naturally require different types 
of constness.

-- 
Oskar

Jul 05 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 I'll make my code example more free from assumed functionality.
 
 
  char[] qwerty;
  
  qwerty = KJHGF(poiuy).dup;
  version(xyzzy)
  {
      MNBVC(qwerty);
  }
 
 As you can see, my point is made without regard to converting stuff to
 lower case.

My point is that the way the snippet is written is inside out. Do not 
use .dup to preemptively make a copy in case it gets changed somewhere 
later on. The style is to make a .dup *only if* the contents will be 
changed and do the .dup *at the site* of the modification.

In other words, dups should be done from the bottom up, not from the top 
down.

I think such a style helps fit things together nicely and avoids strange 
.dups appearing in inexplicable places.
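The contrast Walter draws can be sketched in present-day D. `fetchText` and `patchInPlace` are hypothetical stand-ins for Derek's `KJHGF` and `MNBVC`. A `bool` parameter replaces `version(xyzzy)` so that both paths can be exercised. The top-down version copies up front. The bottom-up version keeps the `string` and makes its only copy right where the modification happens.

```d
/// Hypothetical stand-ins for Derek's KJHGF and MNBVC.
string fetchText(string key) { return key; }
void patchInPlace(char[] s) { if (s.length) s[0] = '#'; }

/// Top-down (Derek's snippet): copy up front, whether or not it is modified.
char[] topDown(string key, bool patch)
{
    char[] text = fetchText(key).dup;
    if (patch)
        patchInPlace(text);
    return text;
}

/// Bottom-up (Walter's advice): keep the string, and copy only at the
/// site where the contents are actually changed.
string bottomUp(string key, bool patch)
{
    string text = fetchText(key);
    if (patch)
    {
        char[] copy = text.dup;   // the only .dup, next to the modification
        patchInPlace(copy);
        text = cast(string) copy; // copy is unique, so this is safe
    }
    return text;
}
```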

Jul 05 2007

Derek Parnell <derek psych.ward> writes:

On Thu, 05 Jul 2007 11:51:30 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 I'll make my code example more free from assumed functionality.
 
  char[] qwerty;
  
  qwerty = KJHGF(poiuy).dup;
  version(xyzzy)
  {
      MNBVC(qwerty);
  }
 
 As you can see, my point is made without regard to converting stuff to
 lower case.

 
 My point is that the way the snippet is written is inside out. Do not 
 use .dup to preemptively make a copy in case it gets changed somewhere 
 later on. The style is to make a .dup *only if* the contents will be 
 changed and do the .dup *at the site* of the modification.
 
 In other words, dups should be done from the bottom up, not from the top 
 down.
 
 I think such a style helps fit things together nicely and avoids strange 
 .dups appearing in inexplicable places.

Thanks. This is what I meant by rethinking the design of my
routines. I'll strongly consider your suggestion even though it does
complicate the algorithm for readers of the code.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jul 05 2007

BCS <ao pathlink.com> writes:

Reply to Walter,

 Derek Parnell wrote:
 
 I'll make my code example more free from assumed functionality.
 
 char[] qwerty;
 
 qwerty = KJHGF(poiuy).dup;
 version(xyzzy)
 {
 MNBVC(qwerty);
 }
 As you can see, my point is made without regard to converting stuff
 to lower case.
 

 My point is that the way the snippet is written is inside out. Do not
 use .dup to preemptively make a copy in case it gets changed somewhere
 later on. The style is to make a .dup *only if* the contents will be
 changed and do the .dup *at the site* of the modification.
 
 In other words, dups should be done from the bottom up, not from the
 top down.
 
 I think such a style helps fit things together nicely and avoids
 strange .dups appearing in inexplicable places.
 


The one issue I can see with this is where an input is const but may be changed 
(and .duped) at any of a number of points. The data though only needs to 
be .duped once.

|char[] Whatever(const char[] str)
|{
| if(c1) str = Mod1(str.dup);
| if(c2) str = Mod2(str.dup);
| if(c3) str = Mod3(str.dup);
| return str;
|}
// causes excess duping


I can't think of a better solution than this (and this is BAD):

|char[] Whatever(const char[] str)
|{
| sw: switch(-1)
| {
|  foreach(bool b; T!(true, false))
|  {
|   if(c1) {static if(b){str = str.dup; goto case 1;} else {case 1:  str = Mod1(str.dup);}}
|   if(c2) {static if(b){str = str.dup; goto case 2;} else {case 2:  str = Mod2(str.dup);}}
|   if(c3) {static if(b){str = str.dup; goto case 3;} else {case 3:  str = Mod3(str.dup);}}
|   return str;
|  }
| }
|}
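Where the single-copy behaviour BCS describes does matter, a less contorted alternative to the switch/goto version is to track ownership with a flag. This is a minimal sketch in present-day D. `mod1`, `mod2` and `mod3` are hypothetical in-place edits standing in for `Mod1`/`Mod2`/`Mod3`. The `copies` counter exists only so the at-most-one-copy property can be checked.

```d
/// Hypothetical in-place edits standing in for BCS's Mod1/Mod2/Mod3.
void mod1(char[] s) { if (s.length > 0) s[0] = 'X'; }
void mod2(char[] s) { if (s.length > 1) s[1] = 'Y'; }
void mod3(char[] s) { if (s.length > 2) s[2] = 'Z'; }

size_t copies; // counts allocations so the single-copy claim can be checked

/// Copies the input at most once, on the first edit that is needed;
/// later edits reuse the private buffer.
string whatever(string str, bool c1, bool c2, bool c3)
{
    char[] buf;
    bool owned = false;

    // Nested function: returns a private buffer, making it on first use.
    char[] mine()
    {
        if (!owned)
        {
            buf = str.dup;
            owned = true;
            ++copies;
        }
        return buf;
    }

    if (c1) mod1(mine());
    if (c2) mod2(mine());
    if (c3) mod3(mine());
    return owned ? cast(string) buf : str; // buf is unique once owned
}
```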

Jul 05 2007

Walter Bright <newshound1 digitalmars.com> writes:

BCS wrote:
 The one issue I can see with this is where an input is const but may be 
 changed (and .duped) at any of a number of points. The data though only 
 needs to be .duped once.
 
 |char[] Whatever(const char[] str)
 |{
 | if(c1) str = Mod1(str.dup);
 | if(c2) str = Mod2(str.dup);
 | if(c3) str = Mod3(str.dup);
 | return str;
 |}
 // causes excess duping

My experience with this is:

1) Such cases are unusual

2) The few cases where they do happen, they are not in that 5% of the 
code that is a bottleneck

3) If such code is performance critical, there's usually a better way to 
write it that will yield even better performance than taking repeated 
passes over the same string. Best performance usually comes by merging 
all the operations into one pass.

Jul 05 2007

Sean Kelly <sean f4.ca> writes:

Derek Parnell wrote:
 On Thu, 05 Jul 2007 00:15:41 -0700, Sean Kelly wrote:
 
 Derek Parnell wrote:
 I'm converting Bud to compile using V2 and so far it's been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable. 

 So just use char[] instead of 'string'.  I don't plan to use the aliases 
 much either.

 
 It's not so clear cut. Firstly, a lot of phobos routines now return
 'string' results and expect 'string' inputs.

I'd argue that the parameters should be "const char[]" rather than 
"string", and it's hard to say for the return values.

 Secondly, I like the idea of
 general purpose functions returning 'const' data, because it helps guard
 against inadvertent modifications by the calling routines. It is up to the
 calling function to explicitly decide if it is going to modify returned
 stuff or not.
 
 For example, if I know that I'll not need to modify the 'fullpath' then I
 might do this ...
 
    string fullpath;
 
    fullpath = CanonicalPath(shortname);

I would say that whether the return value is const/invariant indicates 
ownership.  If the called function/class owns the data then it is const 
or invariant.  If it does not then it is not const/invariant.  This 
seems to largely limit "string" as a return value to property methods.
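Sean's ownership rule can be sketched in present-day D with two hypothetical declarations. A class that owns its data hands out a read-only view. A function that builds fresh data hands ownership to the caller as a mutable `char[]`.

```d
/// Hypothetical class that owns its data: it exposes a read-only view,
/// so callers cannot modify the object's state through the result.
class Target
{
    private char[] _name;
    this(string n) { _name = n.dup; }
    const(char)[] name() const { return _name; }
}

/// Hypothetical builder: the result is freshly allocated and referenced
/// by no one else, so ownership passes to the caller as mutable char[].
char[] outputName(const(char)[] base)
{
    char[] result = base.dup;
    result ~= ".exe";
    return result;
}
```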

 However, if I might need to update it ...
 
    char[] fullpath;
 
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
       setLowerCase(fullpath);
    }
 
 The point is that the 'CanonicalPath' function hasn't got a clue what the
 calling function is intending to do with the result so it is trying to be
 responsible by guarding it against mistakes by the caller.

Right.  See above.


Sean

Jul 05 2007

"Kristian Kilpi" <kjkilpi gmail.com> writes:

On Thu, 05 Jul 2007 01:18:28 +0300, Derek Parnell <derek psych.ward> wrote:
 I'm converting Bud to compile using V2 and so far it's been a very hard
 thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
 over the place, which is exactly what I thought would happen. Bud does a
 lot of text manipulation so having 'string' as invariant means that calls
 to functions that return string need to often be .dup'ed because I need to
 assign the result to a malleable variable.

 I might have to rethink of the design of the application to avoid the
 performance hit of all these dups.

That got me thinking about string functions in general.

First, I am wondering why some functions are formed as follows:
(but I'm sure someone will (hopefully) enlighten me about that ;) )

   string foo(string bar);

That is, if they return something other than 'bar' (they do some string
manipulation).
Shouldn't they return char[] instead? For example:

   char[] foo(string bar) {
     return bar ~ "blah";
   }


And this brings us to the 'tolower()' function (for instance).

Sometimes it .dups and sometimes it doesn't. So, if I don't know whether the
input string contains uppercase chars, I have to .dup the return value, even
though it may already have been .dupped by 'tolower()'...

   char[] a = "abc".dup;
   char[] b = tolower(a).dup;  //.dupped once ('tolower()' returns plain 'a')

   char[] a = "ABC".dup;
   char[] b = tolower(a).dup;  //.dupped twice!

So 'tolower()' is a hybrid of two function groups:
(1) functions that modify the input string,
(2) functions that return a (modified) copy of the input string.

(If the input string doesn't contain uppercase chars, it behaves like (1)
(even though it doesn't actually modify the input string); otherwise it
behaves like (2).)

I don't think this is a good thing.
There should be two different functions, one for each group:

   char[] tolower(char[] str);  //modifies and returns 'str'

   char[] getlower(string str);  //returns a copy


If one likes the copy-on-write behaviour of 'tolower()', I think it would
only work properly with reference counting.

For example (the 'String' class uses reference counting):

   String a, b;

   a = "abc";
   b = tolower(a);  //'b' points to 'a' ('tolower()' simply returns 'a')

   b[0] = 'x';  //'b' .dups its contents before modification, so 'a' is not changed
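The reference-counting idea can be sketched in present-day D. This is a minimal sketch under stated assumptions. `CowString` and `lowerCow` are hypothetical names, not Phobos and not the `String` class mentioned above. Lowercasing is ASCII-only. The count only tracks how many handles share a buffer; the garbage collector still reclaims the memory.

```d
import std.stdio : writeln;

// Hypothetical reference-counted, copy-on-write string handle.
struct CowString
{
    private static struct Payload { char[] data; size_t refs; }
    private Payload* p;

    this(string s) { p = new Payload(s.dup, 1); }

    this(this) { if (p) ++p.refs; }   // a copied handle shares the buffer
    ~this() { if (p) --p.refs; }       // a dying handle stops sharing it

    bool sharesWith(ref const CowString other) const { return p is other.p; }

    char opIndex(size_t i) const { return p.data[i]; }

    // Copy-on-write: duplicate the buffer only if another handle shares it.
    void opIndexAssign(char c, size_t i)
    {
        if (p.refs > 1)
        {
            --p.refs;
            p = new Payload(p.data.dup, 1);
        }
        p.data[i] = c;
    }

    const(char)[] toString() const { return p.data; }
}

// Hypothetical COW tolower (ASCII only): the first write triggers the copy;
// if nothing needs lowering, the caller's buffer is shared, not copied.
CowString lowerCow(CowString s)
{
    foreach (i; 0 .. s.toString().length)
    {
        char c = s[i];
        if (c >= 'A' && c <= 'Z')
            s[i] = cast(char)(c + ('a' - 'A'));
    }
    return s;
}

void main()
{
    auto a = CowString("abc");
    auto b = lowerCow(a);       // nothing to lower: b shares a's buffer
    writeln(b.sharesWith(a));   // true
    b[0] = 'x';                 // b copies before writing, a is unchanged
    writeln(a.toString(), " ", b.toString());   // abc xbc
}
```

The copy-on-write decision is made at write time from the count, so neither `lowerCow` nor its caller needs a defensive `.dup`.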

Jul 05 2007

Walter Bright <newshound1 digitalmars.com> writes:

Kristian Kilpi wrote:
 First, I am wondering why some functions are formed as follows:
 (but I'm sure someone will (hopefully) enlighten me about that ;) )
 
   string foo(string bar);
 
 That is, if they return something other than 'bar' (they do some string 
 manipulation).
 Shouldn't they return char[] instead?

No, because then they must always dup the string. If they don't need to 
dup the string, they can return a reference to the parameter, and if so, 
it must be const.
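This point can be illustrated with a minimal copy-on-write sketch in present-day D, where `string` is `immutable(char)[]`. `asciiLowerCow` is a hypothetical, ASCII-only stand-in for `tolower`, not the Phobos function. When nothing changes, it returns the caller's own array, which is only safe because the result type is immutable. Otherwise it copies exactly once.

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

// Hypothetical copy-on-write ASCII lowercasing.
string asciiLowerCow(string s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            char[] buf = s.dup;             // first change found: copy once
            foreach (ref d; buf[i .. $])
                if (d >= 'A' && d <= 'Z')
                    d = cast(char)(d + ('a' - 'A'));
            return assumeUnique(buf);       // nobody else holds buf
        }
    }
    return s;                               // unchanged: share the caller's data
}

void main()
{
    string a = "abc";
    writeln(asciiLowerCow(a) is a);         // true: same memory, no copy
    writeln(asciiLowerCow("ABC"));          // abc
}
```

If the return type were mutable `char[]`, the `return s;` path would be impossible without a `.dup`, which is exactly the "must always dup" cost described above.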

 There should be two different functions, one for each group:
 
   char[] tolower(char[] str);  //modifies and returns 'str'
 
   char[] getlower(string str);  //returns a copy

When one would use a mutating tolower, one is already manipulating the 
contents of a string character by character. In such cases, one can 
tolower the characters in that process, instead of doing it later (the 
former will be more efficient anyway, and the only advantage to a 
mutating tolower is an efficiency improvement).

Using the functional-style copy-on-write string functions will result in
easy-to-understand, less buggy programs. Doing strings in this manner is
a proven success in just about every programming language.
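A minimal sketch of lowercasing "in that process" is shown here in present-day D. It assumes a hypothetical `slugify` task and handles ASCII only. The loop already visits every character, so it lowers each one as it goes and needs neither a mutating nor a copying `tolower` call.

```d
import std.stdio : writeln;

// Hypothetical one-pass transformation: lowercase and replace spaces
// while building the result, instead of calling tolower afterwards.
char[] slugify(string title)
{
    char[] result;
    result.reserve(title.length);
    foreach (char c; title)
    {
        if (c >= 'A' && c <= 'Z')
            result ~= cast(char)(c + ('a' - 'A'));   // lowercase in passing
        else if (c == ' ')
            result ~= '-';
        else
            result ~= c;
    }
    return result;   // freshly built, so the caller may modify it
}

void main()
{
    writeln(slugify("V2 String Changes"));   // v2-string-changes
}
```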

Jul 05 2007

"Kristian Kilpi" <kjkilpi gmail.com> writes:

On Thu, 05 Jul 2007 22:11:37 +0300, Walter Bright <newshound1 digitalmars.com> wrote:
 Kristian Kilpi wrote:
 First, I am wondering why some functions are formed as follows:
 (but I'm sure someone will (hopefully) enlighten me about that ;) )
    string foo(string bar);
 That is, if they return something other than 'bar' (they do some string
 manipulation).
 Shouldn't they return char[] instead?

 No, because then they must always dup the string. If they don't need to
 dup the string, they can return a reference to the parameter, and if so,
 it must be const.

 There should be two different functions, one for each group:
    char[] tolower(char[] str);  //modifies and returns 'str'
    char[] getlower(string str);  //returns a copy

 When one would use a mutating tolower, one is already manipulating the
 contents of a string character by character. In such cases, one can
 tolower the characters in that process, instead of doing it later (the
 former will be more efficient anyway, and the only advantage to a
 mutating tolower is an efficiency improvement).

That makes sense (especially with strings).

Of course, as I said, it's not a perfect solution, because
unnecessary .dupping can occur.

For example:

   s = "blah " ~ foo(tolower(str).dup);

'foo()' modifies its input string and returns it.

If 'foo' were a copy-on-write function, you could just do:

   s = "blah " ~ foo(tolower(str));

That's much nicer, but 'str' could be copied twice in both of the cases above.
If both 'foo()' and 'tolower()' modified 'str' in place, no copying
would be done (by these functions).

Well, it's just a matter of how you like to code and build things.
Both approaches have their own pros and cons.
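The copy counting above can be checked with a small sketch in present-day D. Everything here is hypothetical. `mapCow` and `mapInPlace` are illustrative helpers. `lower` stands in for `tolower` (ASCII only), and `underscore` stands in for the unspecified `foo`. A global counter records how many buffers the helpers allocate. The final `~` concatenation allocates in both chains and is not counted.

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

size_t copies;   // buffers allocated by the helpers below

char lower(char c) { return (c >= 'A' && c <= 'Z') ? cast(char)(c + 32) : c; }
char underscore(char c) { return c == ' ' ? '_' : c; }   // stand-in for 'foo'

// Copy-on-write: return s itself, or a modified copy (at most one copy).
string mapCow(string s, char function(char) f)
{
    foreach (i, c; s)
    {
        if (f(c) != c)
        {
            ++copies;
            char[] buf = s.dup;
            foreach (ref d; buf[i .. $])
                d = f(d);
            return assumeUnique(buf);
        }
    }
    return s;
}

// In place: modify the caller's buffer and return it (never copies).
char[] mapInPlace(char[] s, char function(char) f)
{
    foreach (ref c; s)
        c = f(c);
    return s;
}

void main()
{
    string str = "Hello World";

    // Like  s = "blah " ~ foo(tolower(str));  with copy-on-write helpers.
    copies = 0;
    auto s1 = "blah " ~ mapCow(mapCow(str, &lower), &underscore);
    writeln(s1, "  copies: ", copies);   // blah hello_world  copies: 2

    // The same chain in place, on a buffer the caller already owns.
    char[] buf = str.dup;
    copies = 0;
    auto s2 = "blah " ~ mapInPlace(mapInPlace(buf, &lower), &underscore);
    writeln(s2, "  copies: ", copies);   // blah hello_world  copies: 0
}
```

The trade-off is visible in the counter: the in-place chain avoids both copies, but only because the caller already paid for a mutable buffer and accepts that it is overwritten.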

Jul 06 2007
