digitalmars.D - COW vs. in-place.

Dave (8/8) Jul 31 2006 What if selected functions in phobos were modified to take an optional

Dawid Ciężarkiewicz (16/30) Jul 31 2006 I don't get it (the example).

BCS (3/27) Jul 31 2006 how about:
Dave (15/51) Jul 31 2006 Right now, if you call toupper with a string with any lower-case chars.
Dave (7/11) Jul 31 2006 It would take many calls to the modified toupper to cost as much as

Dawid Ciężarkiewicz (10/22) Jul 31 2006 Yes. Still - I'd rather see duplicated functions for that or something l...

Dave (3/28) Jul 31 2006 Not a bad idea... The main prob. would be that there would be a lot of

Derek (15/17) Jul 31 2006 void toUpper_inplace(char[] x)

Kirk McDonald (20/39) Jul 31 2006 I've got one better. Say we have a whole bunch of inplace string

Dave (32/75) Jul 31 2006 With this one, you're always dup'ing instead of .dup'ing only when

Derek Parnell (34/60) Jul 31 2006 I'm getting confused about what you are after now, sorry.

Dave (86/148) Jul 31 2006 Sorry, I think some of that got lost in the thread...

Kirk McDonald (21/31) Jul 31 2006 Using a function parameter as you suggest is fine and all (it helps in

Dawid Ciężarkiewicz (3/11) Aug 01 2006 Well. IMO not so much. There are not so many essential functions operati...

Reiner Pope (67/70) Jul 31 2006 Argh, that's what all of my proposals are about. See:

Andrei Khropov (7/11) Aug 01 2006 I agree. Adding additional parameter doesn't seem to be a good idea and ...

kris (3/17) Aug 01 2006 "to CoW or not to CoW ~ that is the question ..."

Jarrett Billingsley (5/11) Jul 31 2006 I've had the same idea; would be great for those trying to write librari...

Dave (9/25) Jul 31 2006 Too much water under the bridge now anyway (or is there?), but I've

Lionello Lunesu (12/18) Aug 01 2006 str being an UTF-8 string, I don't think you can guarantee that it CAN b...

Dawid Ciężarkiewicz (3/26) Aug 01 2006 Well thought.

Thomas Kuehne (29/50) Aug 02 2006 -----BEGIN PGP SIGNED MESSAGE-----

Regan Heath (11/17) Aug 02 2006 I think it's the right idea, but I think it's simply a variation of the ...
Sean Kelly (8/18) Aug 02 2006 Why not:

Dave (2/27) Aug 02 2006
Reiner Pope (5/9) Aug 03 2006 This is not copy on write. That is simply 'always copy', and this

Dave (16/26) Aug 03 2006 But presumably the user would only do the dup if they didn't want to mod...

Reiner Pope (17/54) Aug 03 2006 While I'm not convinced that CoW is such a bad situation, I agree with
Oskar Linde (29/63) Aug 03 2006 What is the advantage of redundantly assigning the result of an in-place...

Sean Kelly (6/22) Aug 03 2006 I like returning the mutated value so the function call can be embedded

Oskar Linde (60/84) Aug 03 2006 I have already seen the above foreach error in others D code.

Dave (14/56) Aug 03 2006 Hmmm - Is the current implementation of std.string.toupper wrong then?

Oskar Linde (14/79) Aug 03 2006 Not really. Some of the newer case folding mappings from Unicode are

Dave (9/44) Aug 04 2006 Not (pedantically) in-place for those cases, but for all cases you could...

Dave (28/57) Aug 03 2006 No advantage - the poster was just using the example from the OP. And wh...

Sean Kelly (6/16) Aug 03 2006 To do true COW, toupper would have to test every element against its

Oskar Linde (15/19) Aug 03 2006 There are at least three ways an array algorithm can operate:

Reiner Pope (40/58) Aug 03 2006 To the caller, however, there are only two situations (in an ideal world...

renox (12/26) Aug 03 2006 In ruby, they have this nice convention that a.function() leaves a

Kirk McDonald (11/38) Aug 03 2006 What about:

Oskar Linde (5/42) Aug 03 2006 It doesn't really apply to functions that are verbs, like capitalize,

Tom S (12/55) Aug 03 2006 I know we aren't supposed to like pointers, but it could also work the
Kirk McDonald (9/58) Aug 03 2006 Those make me think the function is /asking/ if the array/string is

Dave <Dave_member pathlink.com> writes:

What if selected functions in phobos were modified to take an optional 
parameter that specified COW or in-place? The default for each would be 
whatever they do now.

For example, toupper and tolower?

How many times have we seen something like this:

str = toupper(str); // or equivalent in another language.

Thanks,

- Dave

Jul 31 2006

Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:

Dave wrote:

 
 What if selected functions in phobos were modified to take an optional
 parameter that specified COW or in-place? The default for each would be
 whatever they do now.
 
 For example, toupper and tolower?
 
 How many times have we seen something like this:
 
 str = toupper(str); // or equivalent in another language.
 
 Thanks,
 
 - Dave

I don't get it (the example).

str = toupper(str);

does not mean that str can be modified in place. For example:

-- BEGIN --
void bla(char[] str) {
  str = toupper(str);
  /* something else */
}

bla("this string is readonly");
--- END ---
If you mean something else, sorry, I still don't get it.

I'd rather wait until the const/immutability problem in D is resolved. Don't
forget that an additional "option" has a runtime cost. There are some
const/immutability proposals that could help by providing compile-time
information to deal with your proposal.

Jul 31 2006

BCS <BCS pathlink.com> writes:

Dawid Ciezarkiewicz wrote:
 Dave wrote:
 
 
What if selected functions in phobos were modified to take an optional
parameter that specified COW or in-place? The default for each would be
whatever they do now.

For example, toupper and tolower?

How many times have we seen something like this:

str = toupper(str); // or equivalent in another language.

Thanks,

- Dave

 
 
 I don't get it (the example).
 
 str = toupper(str);
 
 does not mean that str can be modified in place
 

how about:

str[] = toupper(str)[];
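BCS's line relies on D's slice assignment: `str[] = expr[]` copies the elements of the right-hand side into the memory `str` already refers to, instead of rebinding `str`. A minimal sketch in present-day D, assuming a hypothetical ASCII-only `upperCopy` standing in for `toupper` (it always returns a fresh array): the caller's buffer ends up upper-cased, but a temporary is still allocated, and the copy only works when the result has the same length as `str`.

```d
// Hypothetical stand-in for toupper: ASCII only, always returns a new array.
char[] upperCopy(char[] s)
{
    char[] r = s.dup;
    foreach (ref c; r)
        if ('a' <= c && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
    return r;
}

unittest
{
    char[] str = "hello".dup;
    char[] other = str;            // a second reference to the same memory
    str[] = upperCopy(str)[];      // element-wise copy into str's existing memory
    assert(str == "HELLO");
    assert(other == "HELLO");      // the shared memory was overwritten, not rebound
}
```

With `str = upperCopy(str);` instead, `other` would still read "hello". Note also that when the real toupper has nothing to change it returns `str` itself, and the D spec treats a copy between overlapping slices as an error, so this form needs some care.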

Jul 31 2006

Dave <Dave_member pathlink.com> writes:

Dawid Ciężarkiewicz wrote:
 Dave wrote:
 
 What if selected functions in phobos were modified to take an optional
 parameter that specified COW or in-place? The default for each would be
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

 Thanks,

 - Dave

 
 I don't get it (the example).
 
 str = toupper(str);
 
 does not mean that str can be modified in place
 
 -- BEGIN --
 void bla(char[] str) {
   str = toupper(str);
   /* something else */
 }
 
 bla("this string is readonly");
 --- END ---
 If you mean something else, sorry, I still don't get it.
 

Right now, if you call toupper with a string that has any lower-case chars 
in it, it will .dup the string passed into it, modify the dup'd string 
instead of the original, and then return the dup. If it doesn't need to 
modify the string, it returns a reference to the original string. Often, 
though, people just want to modify the original string anyway, so they do 
this:

str = toupper(str);

A new version could be declared as:

char[] toupper(char[] s, CIP cip = CIP.COW);

and changed to modify the original instead of dup'ing it if COW 
isn't specified (COW is the default).

Then instead of str = toupper(str) you could do:

toupper(str,CIP.InPlace);

and avoid the duplication (a ref. to the modified string is still returned).
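A minimal sketch of that signature, assuming an ASCII-only conversion and a `CIP` enum with the two members named above (`COW`, `InPlace`). In `CIP.COW` mode the input is duplicated lazily, only when the first lower-case character is found; in `CIP.InPlace` mode the caller's buffer is written directly. The name `toupperCIP` is hypothetical, not a Phobos function.

```d
// Hypothetical flag: copy-on-write or in-place.
enum CIP { COW, InPlace }

// ASCII-only sketch; real case mapping would also need to handle Unicode.
char[] toupperCIP(char[] s, CIP cip = CIP.COW)
{
    char[] r = s;          // start by referring to the caller's data
    bool copied = false;
    foreach (i, c; s)
    {
        if ('a' <= c && c <= 'z')
        {
            if (cip == CIP.COW && !copied)
            {
                r = s.dup;     // first change: take a private copy
                copied = true;
            }
            r[i] = cast(char)(c - ('a' - 'A'));
        }
    }
    return r;              // same memory as s unless a COW copy was made
}

unittest
{
    char[] a = "abc".dup;
    char[] b = toupperCIP(a);               // COW (default): a is untouched
    assert(a == "abc" && b == "ABC" && b.ptr != a.ptr);

    char[] c = toupperCIP(a, CIP.InPlace);  // in place: a itself changes
    assert(a == "ABC" && c.ptr == a.ptr);
}
```

In both modes a reference to the result is returned, so existing `str = toupper(str)` call sites would keep working.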

 I'd rather wait until the const/immutability problem in D is resolved. Don't
 forget that an additional "option" has a runtime cost. There are some
 const/immutability proposals that could help by providing compile-time
 information to deal with your proposal.

Jul 31 2006

Dave <Dave_member pathlink.com> writes:

Dawid Ciężarkiewicz wrote:
 I'd rather wait until the const/immutability problem in D is resolved. Don't
 forget that an additional "option" has a runtime cost. There are some
 const/immutability proposals that could help by providing compile-time
 information to deal with your proposal.

It would take many calls to the modified toupper to cost as much as 
needlessly duplicating one large text file, and now you have to either 
live with the dups or write your own in-place toupper <g>

None of the const/immutability ideas will take care of having to "copy 
on write"; they were all more-or-less just ways of enforcing COW so 
there wouldn't be mistakes.

Jul 31 2006

Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:

Dave wrote:

 Dawid Ciężarkiewicz wrote:
 I'd rather wait until the const/immutability problem in D is resolved.
 Don't forget that an additional "option" has a runtime cost. There are some
 const/immutability proposals that could help by providing compile-time
 information to deal with your proposal.

 
 It would take many calls to the modified toupper to cost as much as
 needlessly duplicating one large text file, and now you have to either
 live with the dups or write your own in-place toupper <g>

Yes. Still, I'd rather see duplicated functions for that, or something like
it (just to have it decided at compile time).
 
 None of the const/immutability ideas will take care of having to "copy
 on write"; they were all more-or-less just ways of enforcing COW so
 there wouldn't be mistakes.

Well, right.

Maybe just writing a new module (std.strinplace) that does what you want and
then sending it to Walter/the D newsgroup would be good. I guess with the new
import improvements the names could stay as they are, and people interested
in this speedup would statically import this module and use fully qualified
names where they want such behavior.

Jul 31 2006

Dave <Dave_member pathlink.com> writes:

Dawid Ciężarkiewicz wrote:
 Dave wrote:
 
 Dawid Ciężarkiewicz wrote:
 I'd rather wait until the const/immutability problem in D is resolved.
 Don't forget that an additional "option" has a runtime cost. There are some
 const/immutability proposals that could help by providing compile-time
 information to deal with your proposal.

 It would take many calls to the modified toupper to cost as much as
 needlessly duplicating one large text file, and now you have to either
 live with the dups or write your own in-place toupper <g>

 
 Yes. Still, I'd rather see duplicated functions for that, or something like
 it (just to have it decided at compile time).
  
 None of the const/immutability ideas will take care of having to "copy
 on write"; they were all more-or-less just ways of enforcing COW so
 there wouldn't be mistakes.

 
 Well, right.
 
 Maybe just writing a new module (std.strinplace) that does what you want and
 then sending it to Walter/the D newsgroup would be good. I guess with the new
 import improvements the names could stay as they are, and people interested
 in this speedup would statically import this module and use fully qualified
 names where they want such behavior.

Not a bad idea... The main prob. would be that there would be a lot of 
duplication of code.

Jul 31 2006

Derek <derek psyc.ward> writes:

On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:

Not a bad idea... The main prob. would be that there would be a lot of 
duplication of code.

void toUpper_inplace(char[] x)
{
 . . .
}

char[] toUpper(char[] x)
{
   char[] y = x.dup;
   toUpper_inplace(y);
   return y;
}

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 31 2006

Kirk McDonald <kirklin.mcdonald gmail.com> writes:

Derek wrote:
 On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:
 
 
Not a bad idea... The main prob. would be that there would be a lot of 
duplication of code.

 
 
 void toUpper_inplace(char[] x)
 {
  . . .
 }
 
 char[] toUpper(char[] x)
 {
    char[] y = x.dup;
    toUpper_inplace(y);
    return y;
 }
 

I've got one better. Say we have a whole bunch of inplace string 
functions, like the one above and this one:

void toLower_inplace(char[] x) {
     // ...
}

and others. Then we can:

char[] cow_func(alias fn)(char[] x) {
     char[] y = x.dup;
     fn(y);
     return y;
}

alias cow_func!(toUpper_inplace) toUpper;
alias cow_func!(toLower_inplace) toLower;

Etc. Obviously, you'd have to provide a different template for each 
function footprint, but the string library has a lot of repeated footprints.
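A self-contained version of Derek's split and Kirk's template, in present-day D. The in-place bodies are elided in the thread, so the ASCII-only loops below are placeholders written for illustration. `cow_func` takes the in-place function as an `alias` template parameter and builds the copying variant from it; the post's `alias cow_func!(toUpper_inplace) toUpper;` is spelled here in the newer `alias toUpper = ...;` form. Note that the wrapper always duplicates, even when nothing would change.

```d
// Placeholder in-place cores (ASCII only), standing in for the elided bodies.
void toUpper_inplace(char[] x)
{
    foreach (ref c; x)
        if ('a' <= c && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
}

void toLower_inplace(char[] x)
{
    foreach (ref c; x)
        if ('A' <= c && c <= 'Z')
            c = cast(char)(c + ('a' - 'A'));
}

// Turns any in-place function of shape void(char[]) into a copying one.
char[] cow_func(alias fn)(char[] x)
{
    char[] y = x.dup;   // always copies, even if fn would change nothing
    fn(y);
    return y;
}

alias toUpper = cow_func!(toUpper_inplace);
alias toLower = cow_func!(toLower_inplace);

unittest
{
    char[] s = "Hello".dup;
    assert(toUpper(s) == "HELLO" && s == "Hello"); // copy changed, original intact
    toLower_inplace(s);
    assert(s == "hello");                           // in-place version mutates
}
```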

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://dsource.org/projects/pyd/wiki

Jul 31 2006

Dave <Dave_member pathlink.com> writes:

Kirk McDonald wrote:
 Derek wrote:
 On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:


 Not a bad idea... The main prob. would be that there would be a lot 
 of duplication of code.


 void toUpper_inplace(char[] x)
 {
  . . .
 }

 char[] toUpper(char[] x)
 {
    char[] y = x.dup;
    toUpper_inplace(y);
    return y;
 }


With this one, you're always dup'ing instead of .dup'ing only when 
needed (the current one is actually more efficient).

 
 I've got one better. Say we have a whole bunch of inplace string 
 functions, like the one above and this one:
 
 void toLower_inplace(char[] x) {
     // ...
 }
 
 and others. Then we can:
 
 char[] cow_func(alias fn)(char[] x) {
     char[] y = x.dup;
     fn(y);
     return y;
 }
 
 alias cow_func!(toUpper_inplace) toUpper;
 alias cow_func!(toLower_inplace) toLower;
 
 Etc. Obviously, you'd have to provide a different template for each 
 function footprint, but the string library has a lot of repeated 
 footprints.
 

I think to maximize code re-use you'd have to build the "COW or not to 
COW" logic into the "base" function. And if you did that you'd have to 
live with a little more function call overhead (passing a bool or small 
enum around) in order to avoid the defensive copying like in cow_func above.

I'm wondering: if Phobos had been built that way (making it the 
'D way' of doing things), would all the concerns about GC performance 
and "const" have been so acute over the last year or so (hindsight is 
always closer to 20-20, of course)?

The problem w/ all the dup'ing is when you put something like this in a 
tight loop you get sloooowwwww code:

import std.file, std.string, std.stdio;

void main()
{
   char[][] formatted;
   char[][] text = split(cast(char[])read("largefile.txt"), ".");
   foreach(char[] sentence; text)
   {
     formatted ~= capitalize(tolower(strip(sentence))) ~ ".\r\n";
   }
   //...
   foreach(char[] sentence; formatted)
   {
     writefln(sentence);
   }
}

None of those functions (except for read()) would really have to do much 
allocating because the input file for all intents and purposes is 
read-only here (it won't get implicitly modified even if COW isn't used).

- Dave

Jul 31 2006

Derek Parnell <derek nomail.afraid.org> writes:

On Mon, 31 Jul 2006 18:01:14 -0500, Dave wrote:

 Kirk McDonald wrote:
 Derek wrote:
 On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:


 Not a bad idea... The main prob. would be that there would be a lot 
 of duplication of code.


 void toUpper_inplace(char[] x)
 {
  . . .
 }

 char[] toUpper(char[] x)
 {
    char[] y = x.dup;
    toUpper_inplace(y);
    return y;
 }


 
 With this one, you're always dup'ing instead of .dup'ing only when 
 needed (the current one is actually more efficient).

I'm getting confused about what you are after now, sorry. 

It seems that you want a CoW version, an InPlace version, and a
non-destructive version of each function, and to let the compiler and/or the
author choose the best one for the job at hand.

The example above gave the InPlace and non-destructive versions, and the
current version is CoW. 

...

 The problem w/ all the dup'ing is when you put something like this in a 
 tight loop you get sloooowwwww code:

Not if the author has a choice ...
 
import std.file, std.string, std.stdio;

void main()
{
   char[][] formatted;
   char[][] text = split(cast(char[])read("largefile.txt"), ".");
   foreach(char[] sentence; text)
   {
     strip_IP(sentence);
     tolower_IP(sentence);
     capitalize_IP(sentence);
     formatted ~= sentence ~ ".\r\n";
   }
   //...
   foreach(char[] sentence; formatted)
   {
     writefln(sentence);
   }
}
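One detail matters for the hypothetical `_IP` functions in this loop: `tolower_IP` and `capitalize_IP` can rewrite characters through an ordinary `char[]` parameter, but an in-place `strip` has to shorten the slice itself, so it needs the slice by `ref` (D1's `inout`); otherwise only the function's own copy of the slice would shrink. A minimal ASCII-only sketch of that assumption; the bodies are illustrative, not from the thread.

```d
bool isSpaceAscii(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Narrows the caller's slice; no characters are copied.
void strip_IP(ref char[] s)
{
    size_t b = 0, e = s.length;
    while (b < e && isSpaceAscii(s[b])) b++;
    while (e > b && isSpaceAscii(s[e - 1])) e--;
    s = s[b .. e];
}

// Rewrites characters in place; the slice itself is unchanged.
void tolower_IP(char[] s)
{
    foreach (ref c; s)
        if ('A' <= c && c <= 'Z')
            c = cast(char)(c + ('a' - 'A'));
}

// Upper-cases the first character and lower-cases the rest.
void capitalize_IP(char[] s)
{
    tolower_IP(s);
    if (s.length && 'a' <= s[0] && s[0] <= 'z')
        s[0] = cast(char)(s[0] - ('a' - 'A'));
}

unittest
{
    char[] sentence = "  hELLO wORLD ".dup;
    strip_IP(sentence);
    tolower_IP(sentence);
    capitalize_IP(sentence);
    assert(sentence == "Hello world");
}
```

Because the loop's `sentence` is a local copy of each slice, `strip_IP(sentence)` narrows only that copy, which is exactly what the following `formatted ~= sentence ~ ".\r\n";` needs; the character changes, however, land in the buffer returned by read().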


-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
1/08/2006 11:18:40 AM

Jul 31 2006

Dave <Dave_member pathlink.com> writes:

Derek Parnell wrote:
 On Mon, 31 Jul 2006 18:01:14 -0500, Dave wrote:
 
 Kirk McDonald wrote:
 Derek wrote:
 On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:


 Not a bad idea... The main prob. would be that there would be a lot 
 of duplication of code.

 void toUpper_inplace(char[] x)
 {
  . . .
 }

 char[] toUpper(char[] x)
 {
    char[] y = x.dup;
    toUpper_inplace(y);
    return y;
 }


 With this one, you're always dup'ing instead of .dup'ing only when 
 needed (the current one is actually more efficient).

 
 I'm getting confused about what you are after now, sorry. 
 
 It seems that you want a CoW version, an InPlace version, and a
 non-destructive version of each function, and to let the compiler and/or the
 author choose the best one for the job at hand.
 
 The example above gave the InPlace and non-destructive versions, and the
 current version is CoW. 
 
 ...
 
 The problem w/ all the dup'ing is when you put something like this in a 
 tight loop you get sloooowwwww code:

 
 Not if the author has a choice ...
  
 import std.file, std.string, std.stdio;
 
 void main()
 {
    char[][] formatted;
    char[][] text = split(cast(char[])read("largefile.txt"), ".");
    foreach(char[] sentence; text)
    {
      strip_IP(sentence);
      tolower_IP(sentence);
      capitalize_IP(sentence);
      formatted ~= sentence ~ ".\r\n";
    }
    //...
    foreach(char[] sentence; formatted)
    {
      writefln(sentence);
    }
 }
 
 

Sorry, I think some of that got lost in the thread...

I'm asking if it would make sense to change the current functions so COW is
optional. That way current code wouldn't be broken, but we'd have the choice.

For example, the current tolower w/ the changes added (denoted by **):

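import std.uni, std.utf;

//** char[] tolower(char[] s)
char[] tolower(char[] s, bool cow = true)
//**
{
     int changed;
     int i;
     char[] r = s;

     changed = 0;
     for (i = 0; i < s.length; i++)
     {
         auto c = s[i];
         if ('A' <= c && c <= 'Z')
         {
             //**if (!changed)
             if (cow && !changed)
             //**
             {   r = s.dup;
                 changed = 1;
             }
             r[i] = cast(char)(c + ('a' - 'A'));
         }
         else if (c >= 0x7F)
         {
             // A lower-cased character can need a different number of
             // UTF-8 bytes, so from the first upper-case code point on, the
             // result is re-encoded into a new buffer even when cow is false.
             bool rebuilding = false;
             foreach (size_t j, dchar dc; s[i .. $])
             {
                 if (!rebuilding)
                 {
                     if (!std.uni.isUniUpper(dc))
                         continue;

                     // Keep the prefix processed so far: reuse r if it is
                     // already a private copy, otherwise copy it now.
                     r = changed ? r[0 .. i + j] : r[0 .. i + j].dup;
                     rebuilding = true;
                 }
                 dc = std.uni.toUniLower(dc);
                 std.utf.encode(r, dc);  // appends the encoded character to r
             }
             break;
         }
     }
     return r;
}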

So the sample code would become:

import std.file, std.string, std.stdio;

void main()
{
   char[][] formatted;
   char[][] text = split(cast(char[])read("largefile.txt"), ".");
   foreach(char[] sentence; text)
   {
     formatted ~= capitalize(tolower(strip(sentence, false), false), false)
                  ~ ".\r\n";
   }
   //...
   foreach(char[] sentence; formatted)
   {
     writefln(sentence);
   }
}

Then I suggested either making the cow parameter default to false, or
wondered how things would have worked out if the original data owner became
responsible for their own dups:

void main()
{
   char[][] formatted;
   char[] original = cast(char[])read("largefile.txt");
   char[][] text = split(original.dup, "."); //** work on a copy
   foreach(char[] sentence; text)
   {
     formatted ~= capitalize(tolower(strip(sentence))) ~ ".\r\n";
   }
   //...
   foreach(char[] sentence; formatted)
   {
     writefln(sentence);
   }
   //** The 'original' (unmodified) data is used again here
}

If everything were done in place in Phobos, then it would become second
nature for the owner to dup when needed. And the user wouldn't need to rely
on the hope that the library developer didn't make a mistake and forget to
COW when they were supposed to.

Thanks,

- Dave

Jul 31 2006

Kirk McDonald <kirklin.mcdonald gmail.com> writes:

Dave wrote:
 Sorry, I think some of that got lost in the thread...
 
 I'm asking if it would make sense to change the current functions so COW 
 is optional. That way current code wouldn't be broken but we'd have the 
 choice.
 

Using a function parameter as you suggest is fine and all (it helps in 
code re-use as your example ably shows), but I find calling, e.g. 
tolower_inplace clearer than some strange 'false' parameter at the end 
of the argument list. If we make the 'cow' parameter default to 'true', 
we might also provide a wrapper:

char[] inplace_wrap(alias fn)(char[] s) {
     return fn(s, false);
}

alias inplace_wrap!(tolower) tolower_inplace;
alias inplace_wrap!(toupper) toupper_inplace;
// &c, &c

(I like this method of function wrapping, can you tell?) Or we could 
just as easily default cow to 'false' and have the wrapper be 'cow_wrap' 
instead. (It would also be easy enough to provide both.)
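Kirk's wrappers can be tried against any function that takes the proposed trailing `bool cow` flag. A minimal sketch, assuming a hypothetical ASCII-only `lowerFlag(char[] s, bool cow = true)` in place of the modified `tolower`: `inplace_wrap` binds `cow = false`, `cow_wrap` binds `cow = true`, so call sites carry a descriptive name instead of a bare `false`.

```d
// Hypothetical base function with the proposed flag (ASCII only).
char[] lowerFlag(char[] s, bool cow = true)
{
    char[] r = s;
    bool copied = false;
    foreach (i, c; s)
    {
        if ('A' <= c && c <= 'Z')
        {
            if (cow && !copied)
            {
                r = s.dup;      // copy only once, and only if something changes
                copied = true;
            }
            r[i] = cast(char)(c + ('a' - 'A'));
        }
    }
    return r;
}

// Bind the flag once, by name.
char[] inplace_wrap(alias fn)(char[] s) { return fn(s, false); }
char[] cow_wrap(alias fn)(char[] s)     { return fn(s, true); }

alias lower_inplace = inplace_wrap!(lowerFlag);
alias lower_cow     = cow_wrap!(lowerFlag);

unittest
{
    char[] s = "ABC".dup;
    assert(lower_cow(s) == "abc" && s == "ABC");   // copy made, s intact
    char[] t = lower_inplace(s);
    assert(t.ptr == s.ptr && s == "abc");          // s itself was rewritten
}
```

Unlike `cow_func` earlier in the thread, the COW path here copies only when a change is actually needed, because the decision lives in the base function.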

 If everything were done in place in Phobos, then it would become second 
 nature for the owner to dup when needed. And the user wouldn't need to 
 rely on the hope that the library developer didn't make a mistake and 
 forget to COW when they were supposed to.

I sure hope the library makes this an important, documented part of its 
interface.

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://dsource.org/projects/pyd/wiki

Jul 31 2006

Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:

Dave wrote:
 Maybe just writing a new module (std.strinplace) that does what you want and
 then sending it to Walter/the D newsgroup would be good. I guess with the
 new import improvements the names could stay as they are, and people
 interested in this speedup would statically import this module and use
 fully qualified names where they want such behavior.

 
 Not a bad idea... The main prob. would be that there would be a lot of
 duplication of code.

Well, IMO not so much. There aren't that many essential functions operating
on strings, and they don't change very often.

Aug 01 2006

Reiner Pope <reiner.pope gmail.com> writes:

Dave wrote:
 None of the const/immutability ideas will take care of having to "copy 
 on write"; they were all more-or-less just ways of enforcing COW so 
 there wouldn't be mistakes.

Argh, that's what all of my proposals are about. See:
rocheck in 'YACP -- Yet Another Const Proposal' on digitalmars.D
'constness for arrays' by xs0 on digitalmars.D
'what's wrong with just a runtime-checked const'? on digitalmars.D.learn

These all explore a way to make array functions work optimally in all 
cases. The rocheck proposal (the most recent one) would look as follows:

rocheck char[] toupper(rocheck char[] input)
{   foreach (i, c; input)
     {   if (islower(c))
         {   // ensureWritable checks whether it is mutable and copies if not
             char[] temp = input.ensureWritable;
             temp[i] = chartoupper(c);
             // if we did indeed duplicate, then make sure we now use the
             // duplicated one
             input = temp;
         }
     }
     return input;
}

// Another alternative: faster, but more code
rocheck char[] toupper(rocheck char[] input)
{   foreach (i, c; input)
     {   if (islower(c))
         {   char[] temp = input.ensureWritable;
             foreach (inout c2; temp[i..$])
             {   if (islower(c2)) c2 = toupper(c2);
             }
             return temp;
         }
     }
     return input;
}

// Now look what we can do:
char[] foo = "hello".dup;
foo = toupper(foo).ensureWritable;
// ensureWritable is a no-op here, because there is never a const
// reference. It's only there to please the const checking of the compiler.
readonly char[] bar = baz.getName();
foo = toupper(bar).ensureWritable;
// If toupper modifies, then it will dup it (since bar is readonly). Iff
// not, then ensureWritable will dup it. This way, we ensure exactly one
// duplication, which is as required.
readonly char[] asdf = CIP1(CIP2(CIP3(bar)));
/// CIP1, 2 and 3 are rocheck functions like toupper above. If none of
/// them modify, then no duplication takes place. If one of them does, then
/// only one duplication takes place.

Having it integrated into the language is more powerful, because it 
actually works with const checking and makes the syntax cleaner. 
Consider how you would get the same efficiency with the last statement 
using the CIP enum when just modifying the library:


CIP1, CIP2 and CIP3 would all need signatures as follows:
char[] CIP1(char[] input, inout CIP cipness) {...}

It would be inout so that you can tell it about the input, and it can 
tell you about the output. If you don't know the ownership of the 
output, you will get unnecessary dups. Here is how you would emulate the 
last line of the rocheck sample code:

CIP temp = CIP.COW;
char[] bar; // We mustn't modify this
bar = CIP1(bar, temp); // bar *might* be modifiable in place, but only
                       // temp knows
bar = CIP2(bar, temp);
bar = CIP3(bar, temp);
// We still don't know whether bar is the original, unmodifiable one, or
// not. However, temp can tell us.

This code is much more verbose than one built into the language.
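For comparison, the library-only emulation can be made concrete. A minimal sketch, assuming a `ref bool owned` flag in place of the `inout CIP cipness` parameter (present-day `ref` replaces D1's `inout`): each step copies only if it must write and the caller does not yet own the buffer, then reports ownership back, so a chain of such calls performs at most one duplication. `upperStep` and `underscoreStep` are hypothetical ASCII-only examples.

```d
// Copies s only if it must write and the caller does not own the buffer yet.
// On return, owned says whether the result may be modified freely.
char[] upperStep(char[] s, ref bool owned)
{
    char[] r = s;
    foreach (i, c; s)
    {
        if ('a' <= c && c <= 'z')
        {
            if (!owned)
            {
                r = s.dup;       // the only copy made anywhere in a chain
                owned = true;
            }
            r[i] = cast(char)(c - ('a' - 'A'));
        }
    }
    return r;
}

// Same ownership protocol: replaces spaces with underscores.
char[] underscoreStep(char[] s, ref bool owned)
{
    char[] r = s;
    foreach (i, c; s)
    {
        if (c == ' ')
        {
            if (!owned)
            {
                r = s.dup;
                owned = true;
            }
            r[i] = '_';
        }
    }
    return r;
}

unittest
{
    char[] bar = "read only".dup;   // treat as data we must not modify
    bool owned = false;
    char[] t = upperStep(bar, owned);      // copies: bar has lower-case letters
    char[] p = t;
    t = underscoreStep(t, owned);          // already owned: writes in place
    assert(t == "READ_ONLY" && t.ptr == p.ptr && bar == "read only");
}
```

This shows the cost Reiner describes: every function and every call site has to thread the flag through by hand.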

Cheers,

Reiner

Jul 31 2006

"Andrei Khropov" <andkhropov nospam_mtu-net.ru> writes:

Dawid Ciężarkiewicz wrote:

 I'd rather wait until the const/immutability problem in D is resolved. Don't
 forget that an additional "option" has a runtime cost. There are some
 const/immutability proposals that could help by providing compile-time
 information to deal with your proposal.

I agree. Adding an extra parameter doesn't seem like a good idea: it raises
the question of whether the default behavior should be to copy or not, and it
introduces the possibility of subtle errors when the flag is mistakenly
omitted.

-- 
AKhropov

Aug 01 2006

kris <foo bar.com> writes:

Andrei Khropov wrote:
 Dawid Ciężarkiewicz wrote:
 
 
I'd rather wait until the const/immutability problem in D is resolved. Don't
forget that an additional "option" has a runtime cost. There are some
const/immutability proposals that could help by providing compile-time
information to deal with your proposal.

 
 
 I agree. Adding an extra parameter doesn't seem like a good idea: it raises
 the question of whether the default behavior should be to copy or not, and it
 introduces the possibility of subtle errors when the flag is mistakenly
 omitted.
 


"to CoW or not to CoW ~ that is the question ..."

"to err is human; to moo is bovine"

Aug 01 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Dave" <Dave_member pathlink.com> wrote in message 
news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

I've had the same idea; would be great for those trying to write libraries 
that make as few allocations as possible.  Not to mention just plain more 
efficient if you don't need a copy.

Jul 31 2006

Dave <Dave_member pathlink.com> writes:

Jarrett Billingsley wrote:
 "Dave" <Dave_member pathlink.com> wrote in message 
 news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

 
 I've had the same idea; would be great for those trying to write libraries 
 that make as few allocations as possible.  Not to mention just plain more 
 efficient if you don't need a copy. 
 

Too much water under the bridge now anyway (or is there?), but I've 
often thought that it would've been better to do the same and make 
in-place the default and COW the exception anyhow. This wouldn't have 
been a hurdle for people coming from the C lib. to Phobos anyway -- 
they're used to it (e.g.: strcat, et al). As to users of other 
languages, all the docs. would have to do is make sure to point out what 
in-place means, with maybe an example of how to .dup your string before 
you pass it in if needed.

Jul 31 2006

"Lionello Lunesu" <lio lunesu.remove.com> writes:

"Dave" <Dave_member pathlink.com> wrote in message 
news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

Since str is a UTF-8 string, I don't think you can guarantee that it CAN be 
made uppercase in-place. It seems quite possible to me that some uppercase 
Unicode characters are longer than their lowercase versions, possibly 
crossing a UTF-8 byte-count border. But there are other string functions 
that don't have this problem.
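The concern can be checked with a concrete character. A small sketch using present-day Phobos (`std.uni.toUpper` and `std.utf.codeLength`, rather than the 2006-era `std.uni.toUniUpper`): U+0131 LATIN SMALL LETTER DOTLESS I takes two bytes in UTF-8, while its simple upper-case mapping, plain `I`, takes one, so upper-casing a `char[]` in place would have to shift the rest of the buffer. The helper name `upperChangesUtf8Length` is hypothetical.

```d
import std.uni : toUpper;
import std.utf : codeLength;

// True if upper-casing c changes the number of UTF-8 code units it needs.
bool upperChangesUtf8Length(dchar c)
{
    return codeLength!char(c) != codeLength!char(toUpper(c));
}

unittest
{
    assert(toUpper('\u0131') == 'I');           // dotless i maps to plain I
    assert(upperChangesUtf8Length('\u0131'));   // 2 bytes become 1
    assert(!upperChangesUtf8Length('a'));       // ASCII stays 1 byte
}
```

Here the result is shorter, which an in-place function could still handle by compacting and returning a shorter slice; a mapping that grows the string could not be done in the existing buffer at all.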

In either case, a standard library should simply provide two functions, one 
in-place and the other COW. In many cases, the COW function could use the 
in-place one, eliminating duplicate code. For example, in my own lib I use 
.ToUpper() for the in-place version and .UpperCase() for the COW one.

L.

Aug 01 2006

Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:

Lionello Lunesu wrote:

 
 "Dave" <Dave_member pathlink.com> wrote in message
 news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional
 parameter that specified COW or in-place? The default for each would be
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

 
 Since str is a UTF-8 string, I don't think you can guarantee that it CAN be
 made uppercase in-place. It seems quite possible to me that some uppercase
 Unicode characters are longer than their lowercase versions, possibly
 crossing a UTF-8 byte-count border. But there are other string functions
 that don't have this problem.

This _is_ a problem.

 In either case, a standard library should simply provide two functions,
 one in-place and the other COW. In many cases, the COW function could use
 the in-place one, eliminating duplicate code. For example, in my own lib I
 use .ToUpper() for the in-place version and .UpperCase() for the COW one.

Well thought out.

Aug 01 2006

Thomas Kuehne <thomas-dloop kuehne.cn> writes:


Dawid Ciężarkiewicz wrote on 2006-08-01:
 Lionello Lunesu wrote:

 
 "Dave" <Dave_member pathlink.com> wrote in message
 news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional
 parameter that specified COW or in-place? The default for each would be
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

 
 Since str is a UTF-8 string, I don't think you can guarantee that it CAN be
 made uppercase in-place. It seems quite possible to me that some uppercase
 Unicode characters are longer than their lowercase versions, possibly
 crossing a UTF-8 byte-count border. But there are other string functions
 that don't have this problem.

 This _is_ a problem.

http://www.unicode.org/reports/tr21/

from ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt

[excerpt not preserved in the archive]

This allows the code point count to be kept constant; the UTF-8 fragment
(code unit) count, however, is a problem. Currently (5.0.0 2006-03-03,
08:22:43 GMT) there are 9 + 2 cases where the fragment count changes:

[entries not preserved in the archive]

Only used for Turkic languages (tr, az):

[entries not preserved in the archive]

Thomas
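The effect Thomas describes can be demonstrated with a few characters chosen for illustration; they are not necessarily the entries from his list. The D sketch below assumes present-day Phobos: `std.uni.toUpper`/`toLower` on a single `dchar` apply the simple case mapping, and `std.utf.codeLength!char` gives the UTF-8 length of a code point; `utf8Len` is a hypothetical helper. Each mapping keeps exactly one code point but shrinks the number of UTF-8 code units, so the result could not simply overwrite the original bytes.

```d
import std.uni : toLower, toUpper;
import std.utf : codeLength;

/// Number of UTF-8 code units (bytes) needed to encode c.
size_t utf8Len(dchar c)
{
    return codeLength!char(c);
}

unittest
{
    // LATIN SMALL LETTER LONG S (U+017F) uppercases to 'S': 2 bytes -> 1.
    dchar longS = '\u017F';
    assert(toUpper(longS) == 'S');
    assert(utf8Len(longS) == 2 && utf8Len(toUpper(longS)) == 1);

    // LATIN SMALL LETTER DOTLESS I (U+0131) uppercases to 'I': 2 bytes -> 1.
    dchar dotlessI = '\u0131';
    assert(toUpper(dotlessI) == 'I');
    assert(utf8Len(dotlessI) == 2 && utf8Len(toUpper(dotlessI)) == 1);

    // KELVIN SIGN (U+212A) lowercases to 'k': 3 bytes -> 1.
    dchar kelvin = '\u212A';
    assert(toLower(kelvin) == 'k');
    assert(utf8Len(kelvin) == 3 && utf8Len(toLower(kelvin)) == 1);
}
```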



Aug 02 2006

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 31 Jul 2006 11:18:40 -0500, Dave <Dave_member pathlink.com> wrote:
 What if selected functions in phobos were modified to take an optional  
 parameter that specified COW or in-place? The default for each would be  
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

I think it's the right idea, but I think it's simply a variation of the  
idea that the array itself needs a flag to tell functions whether they  
have to copy, or can modify in place. A 'readonly' flag, as mentioned here  
in other threads.

I'd prefer the flag was internal to the array so that my function  
signatures were simpler and less cluttered by things not directly related  
to the function.

That said, your idea can be implemented right now. The internal array flag  
requires Walter to agree and change D's arrays.

Regan

Aug 02 2006

Sean Kelly <sean f4.ca> writes:

Dave wrote:
 
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.
 
 For example, toupper and tolower?
 
 How many times have we seen something like this:
 
 str = toupper(str); // or equivalent in another language.

Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW

or alternately:

     char[] toupper(char[] src, char[] dst = null);

where dst is an optional destination argument.
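Sean's second alternative, an optional destination buffer, could be sketched as follows. Assumptions: the name `toUpperInto` is hypothetical (chosen to avoid clashing with Phobos), only ASCII is handled, and a `null` or too-short `dst` makes the function allocate. The one signature then covers both styles: pass nothing to get a copy, or pass the source as its own destination to update it in place.

```d
/// Hypothetical sketch (ASCII only): writes the uppercase form of src into
/// dst and returns the filled part of dst. If dst is null or too short, a
/// new buffer is allocated, so the default behaves as a copying function.
/// Passing the same array as src and dst gives an in-place update.
char[] toUpperInto(const(char)[] src, char[] dst = null)
{
    if (dst.length < src.length)
        dst = new char[src.length];
    foreach (i, c; src)
        dst[i] = (c >= 'a' && c <= 'z') ? cast(char)(c - ('a' - 'A')) : c;
    return dst[0 .. src.length];
}

unittest
{
    char[] text = "abc".dup;
    char[] same = toUpperInto(text, text);   // in-place
    assert(same == "ABC" && same.ptr is text.ptr);

    assert(toUpperInto("xyz") == "XYZ");     // copy into a fresh buffer

    char[16] scratch;                        // caller-supplied buffer
    assert(toUpperInto("hi", scratch[]) == "HI");
}
```

A caller-supplied buffer also lets a loop reuse one allocation across many calls.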


Sean

Aug 02 2006

Dave <Dave_member pathlink.com> writes:

Sean Kelly wrote:
 Dave wrote:
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would 
 be whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

 
 Why not:
 
     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
 

That's how I think things should be, but it might break a lot of code now <g>

 or alternately:
 
     char[] toupper(char[] src, char[] dst = null);
 
 where dst is an optional destination argument.
 
 
 Sean

Aug 02 2006

Reiner Pope <reiner.pope gmail.com> writes:

 Why not:
 
     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW

This is not copy on write. That is simply 'always copy', and this 
performs worse than COW (which in turn performs worse than in-place, if 
in-place is possible). Walter has also said earlier that, with COW, it 
should be the responsibility of the writer to ensure the copy, not the 
caller.

Aug 03 2006

Dave <Dave_member pathlink.com> writes:

Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW

 
 This is not copy on write. That is simply 'always copy', and this 

But presumably the user would only do the dup if they didn't want to
modify str, so CoW would basically go away as a design pattern.

 performs worse than COW (which in turn performs worse than in-place, if 
 in-place is possible). Walter has also said earlier that, with COW, it 
 should be the responsibility of the writer to ensure the copy, not the 
 caller.

That's what I'm questioning ultimately. The caller knows best whether the
object that _they created_ should be modified or copied, and they can do
that best before calling a modifying function, no matter whether the
caller happens to be the developer of another library function or an
application programmer.

What's more, CoW for arrays is inconsistent with how other reference
objects are treated: class objects are really not made for CoW (the
language doesn't even provide a rudimentary copy constructor), and the
same goes for AAs, which don't have a .dup, for example.

Ultimately, most data that is modified is used in its modified form for
the rest of the program's "lifetime", and however the original data was
obtained (e.g., by reading it from disk), that step can be repeated if
needed instead of keeping copies around.

I think CoW for arrays was a mistake: it is most often unnecessary, it
will cause D to repeat many of Java's performance woes for the average
user, and, as I mentioned, it is inconsistent as well. It's a
lose-lose-lose.

- Dave

Aug 03 2006

Reiner Pope <reiner.pope gmail.com> writes:

Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW

 This is not copy on write. That is simply 'always copy', and this 

 
 But presumably the user would only do the dup if they didn't want to 
 modify str, so CoW would basically go away as a design pattern.
 
 performs worse than COW (which in turn performs worse than in-place, 
 if in-place is possible). Walter has also said earlier that, with COW, 
 it should be the responsibility of the writer to ensure the copy, not 
 the caller.

 
 That's what I'm questioning ultimately. The caller knows best if the 
 object that _they created_ should be modified or copied and they can do 
 that best before a call to a modifying function. No matter if that 
 happens to be the developer of another lib. function or an application 
 programmer.
 
 What's more, CoW for arrays is inconsistent with how other reference 
 objects are treated (class objects are really not made for CoW - there's 
 not even a rudimentary copy ctor provided by the language. Same with 
 AA's, which don't have a .dup for example).
 
 Ultimately, most data that is modified is used modified for its 
 remaining program "lifetime", and however the original data was sourced 
 (e.g.: reading from disk) can be replicated if needed instead of having 
 to keep copies around.
 
 I think CoW for arrays was a mistake -- it is most often unnecessary, 
 will cause D to repeat many of Java's performance woes for the average 
 user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.
 
 - Dave

While I'm not convinced that CoW is such a bad situation, I agree with 
you that it is not perfect. However, a proper solution would need to 
make use of some facts:
  - the caller knows best whether the array may be edited in-place
  - whether the string should be modified in-place is often not known at 
compile time.
These require the passing of a bool indicating whether it should be 
copied on write, or not, which is just as you suggest.

However, to support this with the nicest code, it would be best to be 
both compiler-checked and language-supported. Of course, this is just 
advertising for the rocheck type modifier I'm proposing in YACP. Language
support would also mean that, when a call is inlined in a situation where
the read-only status is known at compile time, the CoW check could be
optimized away.

Cheers,

Reiner

Aug 03 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW



What is the advantage of redundantly assigning the result of an in-place 
function to itself? In my opinion, all in-place functions should have a 
void return type to avoid common mistakes such as:

foreach(e; arr.reverse) { ... }
// OOPS, arr is now reversed

.dup followed by calling an in-place function is certainly ok, but in 
those cases, an ordinary functional (non-in-place) function would have 
been more efficient.
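Oskar's argument can be illustrated with a pair of hypothetical functions (not the built-in `.reverse` property): a mutating `reverseInPlace` that returns `void`, and a copying `reversed`. Because `reverseInPlace` returns nothing, the `foreach (e; arr.reverse)` mistake becomes a compile-time error instead of a silent mutation, which the `static assert` below checks.

```d
/// Hypothetical in-place reverse. Returning void means a call cannot be
/// used where a value is expected, such as the aggregate of a foreach.
void reverseInPlace(T)(T[] a)
{
    for (size_t i = 0, j = a.length; i + 1 < j; ++i, --j)
    {
        auto tmp = a[i];
        a[i] = a[j - 1];
        a[j - 1] = tmp;
    }
}

/// Hypothetical copying counterpart: returns a reversed copy and leaves
/// the argument alone.
T[] reversed(T)(const(T)[] a)
{
    auto r = new T[a.length];
    foreach (i, x; a)
        r[a.length - 1 - i] = x;
    return r;
}

unittest
{
    int[] arr = [1, 2, 3, 4];
    int sum = 0;
    foreach (e; reversed(arr))  // iterating a reversed copy is harmless
        sum += e;
    assert(sum == 10 && arr == [1, 2, 3, 4]);

    // The mistake from the post does not compile with a void function.
    static assert(!__traits(compiles, { foreach (e; reverseInPlace(arr)) {} }));

    reverseInPlace(arr);        // the mutation is now an explicit statement
    assert(arr == [4, 3, 2, 1]);
}
```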

 This is not copy on write. That is simply 'always copy', and this 

 
 But presumably the user would only do the dup if they didn't want to 
 modify str, so CoW would basically go away as a design pattern.
 
 performs worse than COW (which in turn performs worse than in-place, 
 if in-place is possible). Walter has also said earlier that, with COW, 
 it should be the responsibility of the writer to ensure the copy, not 
 the caller.

 
 That's what I'm questioning ultimately. The caller knows best if the 
 object that _they created_ should be modified or copied and they can do 
 that best before a call to a modifying function. No matter if that 
 happens to be the developer of another lib. function or an application 
 programmer.
 
 What's more, CoW for arrays is inconsistent with how other reference 
 objects are treated (class objects are really not made for CoW - there's 
 not even a rudimentary copy ctor provided by the language. Same with 
 AA's, which don't have a .dup for example).

 
 Ultimately, most data that is modified is used modified for its 
 remaining program "lifetime", and however the original data was sourced 
 (e.g.: reading from disk) can be replicated if needed instead of having 
 to keep copies around.

 
 I think CoW for arrays was a mistake -- it is most often unnecessary, 
 will cause D to repeat many of Java's performance woes for the average 
 user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.

Consider the following (just made up) case insensitive multi-file word 
count application:

import std.stdio;
import std.file;
import std.string;

void main(char[][] args) {
         int[char[]] wc;
         foreach(filename; args[1..$]) {
                 char[] data = cast(char[]) read(filename);
                 foreach(word; data.split())
                         wc[tolower(word)]++;
         }
         writefln("num words: ",wc.length);
}

If you ran this program on the full collection of 18000 Gutenberg books, 
you would inevitably run out of memory. Why would that happen, when a 
standard English dictionary only occupies a couple of megabytes?

Without knowing the intricate details of D and Phobos, I bet you would 
have no way of knowing that you got killed by the cow. :)
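A likely explanation is that a CoW `tolower` returns its argument unchanged when a word is already lowercase, so many keys in `wc` are slices of `data`, and every such key keeps an entire file buffer reachable. One way around it, sketched here in present-day D rather than the D1 of the thread, is to copy a key only when it is first inserted. `countWords` is a hypothetical name; `std.array.split` and `std.uni.toLower` are Phobos functions.

```d
import std.array : split;
import std.uni : toLower;

/// Hypothetical helper: adds case-insensitive word counts from `text`
/// to `counts` and returns the updated table. A key is copied with .idup
/// only when it is first inserted, so no stored key is a slice of `text`
/// and the caller's buffer does not stay alive just because of the table.
size_t[string] countWords(string text, size_t[string] counts = null)
{
    foreach (word; text.split())
    {
        string key = toLower(word); // may be a slice of `text` if unchanged
        if (auto p = key in counts)
            ++*p;
        else
            counts[key.idup] = 1;   // the stored key owns its memory
    }
    return counts;
}

unittest
{
    string text = "The cat saw the CAT";
    auto wc = countWords(text);
    assert(wc["the"] == 2 && wc["cat"] == 2 && wc["saw"] == 1);

    // No key points into the original buffer.
    foreach (k; wc.byKey)
        assert(k.ptr < text.ptr || k.ptr >= text.ptr + text.length);
}
```

For several files, the table is threaded through the calls, e.g. `wc = countWords(readText(name), wc);`, and each file's buffer becomes collectable as soon as its call returns.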

/Oskar

Aug 03 2006

Sean Kelly <sean f4.ca> writes:

Oskar Linde wrote:
 Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW



 
 What is the advantage of redundantly assigning the result of an in-place 
 function to itself? In my opinion, all in-place functions should have a 
 void return type to avoid common mistakes such as:
 
 foreach(e; arr.reverse) { ... }
 // OOPS, arr is now reversed

I like returning the mutated value so the function call can be embedded 
in other code.  And arr.reverse is already a built-in mutating function, 
according to the spec.

 .dup followed by calling an in-place function is certainly ok, but in 
 those cases, an ordinary functional (non-in-place) function would have 
 been more efficient.

Why?


Sean

Aug 03 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Sean Kelly wrote:
 Oskar Linde wrote:
 Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW



 What is the advantage of redundantly assigning the result of an 
 in-place function to itself? In my opinion, all in-place functions 
 should have a void return type to avoid common mistakes such as:

 foreach(e; arr.reverse) { ... }
 // OOPS, arr is now reversed

 
 I like returning the mutated value so the function call can be embedded 
 in other code.  

I have already seen the above foreach error in other people's D code.
I believe it is good library design to clearly mark functions with 
side-effects. Giving them a void return type will prevent any mistake of 
the following kind (assume toupper is in-place modifying as well as 
returning):

func(toupper(mystring));
func(arr.reverse);

where the side effect was unintended.
And could these be errors?

arr2 = arr1.reverse;

toupper(mystring) ~ mystring;

 And arr.reverse is already a built-in mutating function, 
 according to the spec.

Yes. I find that unfortunate and inconsistent with how Phobos is 
designed. Luckily, arr.sort and arr.reverse are not callable as 
arr.sort() and arr.reverse(), so they really don't look like functions.

 .dup followed by calling an in-place function is certainly ok, but in 
 those cases, an ordinary functional (non-in-place) function would have 
 been more efficient.

 
 Why?

What I meant was that .dup + in-place will never be more efficient than a 
copying algorithm. In-place algorithms are often more complicated. If 
you want a copy anyway, it is more efficient to use a copying algorithm. 
As an example, consider stable sorting, where efficient copying 
algorithms are trivial.
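Oskar's stable-sorting example can be made concrete. A copying merge sort is short and stable because every merge writes into a fresh buffer, whereas a stable in-place merge is considerably more involved. The sketch below is a textbook top-down merge sort, not a Phobos routine; `Item` is a hypothetical record type whose `tag` field shows that equal keys keep their original order.

```d
/// Hypothetical record type: ordered by `key` only, so `tag` can reveal
/// whether equal keys keep their original order.
struct Item
{
    int key;
    char tag;

    int opCmp(const Item other) const
    {
        return (key > other.key) - (key < other.key);
    }
}

/// Textbook top-down merge sort that returns a stably sorted copy;
/// the argument is never modified.
T[] stableSorted(T)(const(T)[] a)
{
    if (a.length <= 1)
        return a.dup;

    T[] left = stableSorted(a[0 .. a.length / 2]);
    T[] right = stableSorted(a[a.length / 2 .. $]);

    auto result = new T[a.length];
    size_t i, j, k;
    while (i < left.length && j < right.length)
    {
        // Take from the right half only when strictly smaller, so equal
        // elements keep their original relative order (stability).
        if (right[j] < left[i])
            result[k++] = right[j++];
        else
            result[k++] = left[i++];
    }
    // At most one half still has elements; copy its tail.
    result[k .. $] = i < left.length ? left[i .. $] : right[j .. $];
    return result;
}

unittest
{
    auto items = [Item(2, 'a'), Item(1, 'b'), Item(2, 'c'), Item(1, 'd')];
    auto sorted = stableSorted(items);
    assert(sorted == [Item(1, 'b'), Item(1, 'd'), Item(2, 'a'), Item(2, 'c')]);
    assert(items[0] == Item(2, 'a')); // input untouched
}
```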

Re: Library design

I would like to see both copying and in-place versions of algorithms 
where it makes sense, but only one behavior should be default. That 
default should be consistent throughout the standard library and 
preferably be recommended in an official style guide for third party 
libraries to follow.

I see two valid designs:


1. in-place default, copying algorithms specially named
-------------------------------------------------------

Design:
void toUpper(char[] str); // in-place
char[] toUpperCopy(char[] str); // copy

Pros:
* in-place is often more efficient and therefore default.
* many functions are imperative verbs, and as such one expects them to 
be modifying
* Similar to how the C++ STL is designed
Cons:
* many functions can not be expressed in-place (example: UTF-8 toUpper)


2. copying default, in-place versions specially named
-----------------------------------------------------

Design:
void toUpperInPlace(char[] str); // in-place
char[] toUpper(char[] str); // copy

Pros:
* copying is safer, and is therefore a better default
* in-place is an optimization and would stand out as such
* default is functional (no-side effects), side effects stand out
* people used to functional style programming would not find any
surprises
* all functions can be defined as copying functions
* it is how many popular languages are designed (Ruby, Python, PHP, all 
"functional" languages, etc.)
Cons:
* could confuse people, lead to silent errors:
toupper(str); // doesn't change str
cos(x); // doesn't change x ;)

For the record, I am in favor of number 2, and that may have biased the 
arguments above.

/Oskar

Aug 03 2006

Dave <Dave_member pathlink.com> writes:

Oskar Linde wrote:
 
 1. in-place default, copying algorithms specially named
 -------------------------------------------------------
 
 Design:
 void toUpper(char[] str); // in-place
 char[] toUpperCopy(char[] str); // copy
 
 Pros:
 * in-place is often more efficient and therefore default.
 * many functions are imperative verbs, and as such one expects them to 
 be modifying
 * Similar to how the C++ STL is designed
 Cons:
 * many functions can not be expressed in-place (example: UTF-8 toUpper)
 

Hmmm, is the current implementation of std.string.toupper wrong, then?

(If you removed the if(!changed) {...} blocks [where the CoW is milked],
you would effectively have an in-place implementation.)

 
 2. copying default, in-place versions specially named
 -----------------------------------------------------
 
 Design:
 void toUpperInPlace(char[] str); // in-place
 char[] toUpper(char[] str); // copy
 
 Pros:
 * copying is safer, and is therefore a better default

Only if the coder expects that to be the default, *and* they most often
need the original data intact later in the program.

And that safety is not much of an advantage when your code is
three-legged dog slow and eats up resources that could be used by other
processes :) Walking to work may be safer than going 70 MPH on the
freeway, but it would take me a week and I'd starve.

 * in-place is an optimization and would stand out as such

It's only considered an 'optimization' right now because it's different
from the default (CoW).

 * default is functional (no-side effects), side effects stand out
 * people used to functional style programming would not find any
 surprises
 * all functions can be defined as copying functions
 * how many popular languages are designed (Ruby, Python, php, all 
 "functional" languages, etc...)

Yes, but all of these are languages where performance is not a priority
(except perhaps some of the functional languages). Plus, think of all the
time and effort that has been spent on GCs because of this design
choice :)

 Cons:
 * could confuse people, lead to silent errors:
 toupper(str); // doesn't change str
 cos(x); // doesn't change x ;)
 
 For the record, I am in favor of number 2 and that would have biased the 
 arguments above.



 
 /Oskar

Aug 03 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Dave wrote:
 Oskar Linde wrote:
 1. in-place default, copying algorithms specially named
 -------------------------------------------------------

 Design:
 void toUpper(char[] str); // in-place
 char[] toUpperCopy(char[] str); // copy

 Pros:
 * in-place is often more efficient and therefore default.
 * many functions are imperative verbs, and as such one expects them to 
 be modifying
 * Similar to how the C++ STL is designed
 Cons:
 * many functions can not be expressed in-place (example: UTF-8 toUpper)

 
 Hmmm - Is the current implementation of std.string.toupper wrong then?

Not really. Some of the newer case folding mappings from Unicode are 
missing from std.uni, so it could be better though.

 (If you removed the if(!changed) {...} blocks [where the CoW is milked] 
 you would effectively have an in-place implementation).

Again, not really. :) See Thomas Kuehne's post further up the thread. 
There are certain Unicode case foldings where the number of UTF-8 code 
units changes. These are handled correctly by std.string.toupper/tolower, 
but the result cannot be produced in-place.

 2. copying default, in-place versions specially named
 -----------------------------------------------------

 Design:
 void toUpperInPlace(char[] str); // in-place
 char[] toUpper(char[] str); // copy

 Pros:
 * copying is safer, and is therefore a better default

 
 Only if the coder expects that is the default, *and* they most often 
 need the original data intact later in the program.
 
 And that safety is not much of an advantage when your code is 
 three-legged dog slow and eats up resources that could be used by other 
 processes :) Walking to work may be safer than going 70 MPH on the 
 freeway, but it would take me a week and I'd starve.

Is someone prejudiced here? :) I could counter that with how functional 
style programming is superior in all other ways, but I won't. ;)

 * in-place is an optimization and would stand out as such

 
 It's only considered an 'optimization' right now because it's different 
 from the default (CoW).
 
 * default is functional (no-side effects), side effects stand out
 * people used to functional style programming would not find any
 surprises
 * all functions can be defined as copying functions
 * how many popular languages are designed (Ruby, Python, php, all 
 "functional" languages, etc...)

 
 Yes, but all of these are languages where performance is not an 
 imperative (excepting some of the functional languages perhaps). Plus 
 think of all the time and effort that have been spent on GC's because of 
 this design choice :)
 
 Cons:
 * could confuse people, lead to silent errors:
 toupper(str); // doesn't change str
 cos(x); // doesn't change x ;)

 For the record, I am in favor of number 2 and that would have biased 
 the arguments above.

 


I could live with either one. It is, after all, only a matter of naming. 
Consistency is the most important thing. The argument that in-place as a 
concept applies to only a small subset of all functions is IMHO quite 
strong.

/Oskar

Aug 03 2006

Dave <Dave_member pathlink.com> writes:

Oskar Linde wrote:
 Dave wrote:
 
 Again, not really. :) See Thomas Kuehne's post further up the thread. 
 There are certain unicode case foldings where the number of UTF-8 
 element changes. This will be handled correctly by 
 std.string.toupper/tolower, but the result can not be in-place.
 

Not (pedantically) in-place for those cases, but in all cases you could
still avoid a complete .dup (and, of course, the string arguments would
have to be passed as inout for toupper/tolower,
std.uni.encode, etc.).
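Dave's suggestion can be sketched in present-day D, where a `ref` parameter plays the role of D1's `inout`: characters are overwritten in place while their UTF-8 length is unchanged, and only at the first length change is the remainder rebuilt into a new buffer. The function name is hypothetical; `std.utf.decode`, `std.utf.encode`, `std.utf.codeLength` and `std.uni.toUpper` are Phobos functions, and only simple (one-to-one) case mappings are applied.

```d
import std.uni : toUpper;
import std.utf : codeLength, decode, encode;

/// Hypothetical: uppercases s, overwriting it in place while every mapped
/// character keeps its UTF-8 length. At the first character whose length
/// changes, the rest is rebuilt into a new buffer and s is rebound to it
/// (this is why s is passed by ref).
void toUpperMostlyInPlace(ref char[] s)
{
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        immutable dchar upper = toUpper(decode(s, i)); // decode advances i
        if (codeLength!char(upper) != i - start)
        {
            char[] rebuilt = s[0 .. start].dup; // already-converted prefix
            encode(rebuilt, upper);
            while (i < s.length)
                encode(rebuilt, toUpper(decode(s, i)));
            s = rebuilt;
            return;
        }
        char[4] buf;
        immutable n = encode(buf, upper);
        s[start .. i] = buf[0 .. n]; // same length, so overwrite in place
    }
}

unittest
{
    char[] a = "caf\u00E9".dup;   // U+00E9 and U+00C9 are both 2 bytes
    auto before = a.ptr;
    toUpperMostlyInPlace(a);
    assert(a == "CAF\u00C9" && a.ptr is before); // same buffer reused

    char[] b = "a\u017Fb".dup;    // U+017F (2 bytes) uppercases to 'S' (1 byte)
    toUpperMostlyInPlace(b);
    assert(b == "ASB");           // rebuilt into a new buffer
}
```

For typical text the buffer is reused; a copy appears only for inputs such as U+017F whose case mapping changes the encoded length.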

 2. copying default, in-place versions specially named
 -----------------------------------------------------

 Design:
 void toUpperInPlace(char[] str); // in-place
 char[] toUpper(char[] str); // copy

 Pros:
 * copying is safer, and is therefore a better default

 Only if the coder expects that is the default, *and* they most often 
 need the original data intact later in the program.

 And that safety is not much of an advantage when your code is 
 three-legged dog slow and eats up resources that could be used by 
 other processes :) Walking to work may be safer than going 70 MPH on 
 the freeway, but it would take me a week and I'd starve.

 
 Is someone prejudiced here? :) I could counter that with how functional 
 style programming is superior in all other ways, but I won't. ;)
 

I didn't intend it that way <g>. I'm just pointing out that I'm not overly
concerned with complete safety in a language like D when it can cost a
lot.

 I could live with either one. It is after all only a matter of naming. 
 Consistency is the most important thing. The argument that there are 
 only a small subset of all functions for which in-place as a concept is 
 applicable is IMHO quite strong.

You're right, it is a very small subset in Phobos right now, but 'CoW'
seems to be the design pattern chosen for D. As this thread went on, I
became concerned that CoW for arrays is probably not the way to go for a
language like D (all IMHO).

 
 /Oskar

Aug 04 2006

Dave <Dave_member pathlink.com> writes:

Oskar Linde wrote:
 Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW



 What is the advantage of redundantly assigning the result of an in-place

No advantage; the poster was just using the example from the OP. What the
OP example was showing is that, the way things are now (CoW), the coder
often ends up assigning the result back to the original string reference,
in which case the .dup inside toupper is a total waste.

     writefln(toupper(str));             // in-place

     char[] st2 = cast(char[])file.read("somedata");
     writefln("Uppercase string: ", toupper(st2.dup)); // dup only if needed
     writefln("Original string:  ", st2);

 function to itself? In my opinion, all in-place functions should have a
 void return type to avoid common mistakes such as:

     writefln(toupper(str));             // function chain

Many of C's string functions do this too.

 foreach(e; arr.reverse) { ... }
 // OOPS, arr is now reversed

 .dup followed by calling an in-place function is certainly ok, but in
 those cases, an ordinary functional (non-in-place) function would have
 been more efficient.

If the programmer needs to keep a copy of the original, the way
toupper/tolower/etc. are done now is more efficient only in the case
where the data was not modified.

My argument is that, most often, when data is modified at some point in a
program, it is because the rest of the program needs the modified version
and not a copy of the original (so defensive .dups won't be done anyhow).

 I think CoW for arrays was a mistake -- it is most often unnecessary, will
 cause D to repeat many of Java's performance woes for the average user,
 and as I mentioned is inconsistent as well. It's a lose-lose-lose.

 Consider the following (just made up) case insensitive multi-file word
 count application:

 import std.stdio;
 import std.file;
 import std.string;

 void main(char[][] args) {
         int[char[]] wc;
         foreach(filename; args[1..$]) {
                 char[] data = cast(char[]) read(filename);
                 foreach(word; data.split())
                         wc[tolower(word)]++;
         }
         writefln("num words: ",wc.length);
 }

 If you ran this program on the full collection of 18000 Gutenberg books,
 you would inevitably run out of memory. Why would you do that when a
 standard English dictionary only occupies a couple of megabytes?
 Without knowing the intricate details of D and Phobos, I bet you would
 have no way of knowing that you got killed by the cow. :)

Exactly my point, and a great example. It's that kind of stuff that is
really tough on a newbie trying to get the most out of a high-performance
language.

IMHO, it's not too big a leap for a beginner to suspect that data will be
modified when they pass a by-reference argument into a function like
toupper. If 'in-place' is clearly documented, then I don't see a problem.

- Dave

 /Oskar

Aug 03 2006

Sean Kelly <sean f4.ca> writes:

Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW

 
 This is not copy on write. That is simply 'always copy', and this 
 performs worse than COW (which in turn performs worse than in-place, if 
 in-place is possible). Walter has also said earlier that, with COW, it 
 should be the responsibility of the writer to ensure the copy, not the 
 caller.

To do true COW, toupper would have to test every element against its 
uppercase equivalent; the first difference would trigger a copy. For 
mutating algorithms such as this, I think it makes more sense for them 
to always change the data in place if possible and to be documented as such.


Sean
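Sean's description of true CoW translates into a short sketch (ASCII only; `toUpperCow` is a hypothetical name): scan until the first character that would change, return the argument itself if there is none, and otherwise copy once and convert only from that point on.

```d
/// Copy-on-write uppercase for ASCII text (hypothetical name). Returns s
/// itself when nothing changes; otherwise returns a new array.
const(char)[] toUpperCow(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'a' && c <= 'z')
        {
            // First difference found: copy once, then convert from here on.
            char[] copy = s.dup;
            foreach (ref ch; copy[i .. $])
                if (ch >= 'a' && ch <= 'z')
                    ch -= 'a' - 'A';
            return copy;
        }
    }
    return s; // nothing to change: no allocation at all
}

unittest
{
    string upper = "ALREADY UPPER";
    assert(toUpperCow(upper).ptr is upper.ptr); // no copy was made

    string mixed = "Mixed";
    auto result = toUpperCow(mixed);
    assert(result == "MIXED" && result.ptr !is mixed.ptr);
}
```

The `const` return type reflects the usual CoW caveat: the result may alias the argument, so it must not be written to as if it were a private copy.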

Aug 03 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Dave wrote:
 
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.

There are at least three ways an array algorithm can operate:
- in-place
- copying
- CoW

In this case, CoW would mean a function that makes a copy in all cases 
except when the return value would be identical to the argument; as 
such, it is semantically very close to the copying version.

It would make more sense to have separate in-place and copying 
functions, and add a possible runtime CoW-flag to the copying function.

I don't think a runtime flag for CoW vs. in-place makes much sense 
when the compile-time semantics are different.

An efficient implementation of a copying algorithm would also often be 
quite different from an in-place version, speaking for separate functions.

/Oskar

Aug 03 2006

Reiner Pope <reiner.pope gmail.com> writes:

Oskar Linde wrote:
 Dave wrote:
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would 
 be whatever they do now.

 
 There are at least three ways an array algorithm can operate:
 - in-place
 - copying
 - CoW

To the caller, however, there are only two situations (in an ideal world 
with adequate const protection*):
  - modifies my copy (in-place)
  - doesn't modify my copy

As long as the function sticks to what it promises, then it should be 
free to implement it in the fastest/easiest way possible.

*I know that there is a difference at the moment: with CoW, you have to 
be careful about modifying the returned value, because it might also be 
your original, in which case you would be modifying both. However, this 
is where const protection helps, especially the runtime flag included in 
rocheck.

 It would make more sense to have separate in-place and copying 
 functions, and add a possible runtime CoW-flag to the copying function.
 

When would you ever want the copying function instead of the CoW function? 
The overhead of keeping track of CoW is generally minimal, but in the 
situations where CoW requires no copying, it gains a huge advantage. The 
only situation where choosing copying makes sense is if you have 
determined that the CoW overhead is too much. In that case, however, you 
probably wouldn't want to pass the flag at runtime, but rather fix the 
choice at compile time, I would say.

 I don't think a runtime flag for CoW vs in-place does make much sense 
 when the compile time semantics are different.
 
 An efficient implementation of a copying algorithm would also often be 
 quite different from an in-place version, speaking for separate functions.

There's a simple solution to this:

// If the implementations for in-place and copying are substantially
// different, then wrap them like this
rocheck T[] sort(rocheck T[] array)
{
     if (array.isMutable())
         return inPlaceSort(array.ensureWritable());
     else
         return copyingSort(array);
}

// If there is no real difference, put them together in the one function
rocheck dchar[] toupper(rocheck dchar[] array)
{
     // Do some stuff and call ensureWritable() when required, which
     // manages whether copying is necessary behind the scenes
}

The point behind the runtime flag is that the required checking can be 
made to be low overhead, with O(1) cost, whereas unnecessary copying has 
O(n) cost.
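`rocheck` is only a proposal, but its runtime half can be approximated with a library type. In this sketch, `RoCheck`, `isMutable`, `ensureWritable` and `sortChecked` are hypothetical names borrowed from or modelled on Reiner's pseudo-code, and `std.algorithm.sorting.sort` does the actual sorting. Unlike his `sort`, it uses one in-place routine for both cases and lets `ensureWritable` decide whether to copy first: the flag check is O(1), and the O(n) copy happens only for data marked read-only.

```d
import std.algorithm.sorting : sort;

/// Hypothetical library emulation of the proposed `rocheck` idea: an array
/// plus a runtime flag telling whether its holder may write to it.
struct RoCheck(T)
{
    T[] data;
    bool readOnly;

    bool isMutable() const { return !readOnly; }

    /// O(1) when already writable; otherwise copies once (O(n)) and clears
    /// the flag, so later writes go straight to the private copy.
    T[] ensureWritable()
    {
        if (readOnly)
        {
            data = data.dup;
            readOnly = false;
        }
        return data;
    }
}

/// Sorts in place when the caller allows it; otherwise sorts a private copy.
/// Either way, data the caller marked read-only is never modified.
void sortChecked(T)(ref RoCheck!T arr)
{
    sort(arr.ensureWritable());
}

unittest
{
    int[] original = [3, 1, 2];

    auto ro = RoCheck!int(original, true);
    sortChecked(ro);
    assert(ro.data == [1, 2, 3] && original == [3, 1, 2]);

    auto rw = RoCheck!int(original, false);
    sortChecked(rw);
    assert(original == [1, 2, 3]); // sorted in place, no copy made
}
```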

Cheers,

Reiner

Aug 03 2006

renox <renosky free.fr> writes:

Dave wrote:
 
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.
 
 For example, toupper and tolower?
 
 How many times have we seen something like this:
 
 str = toupper(str); // or equivalent in another language.

In Ruby, they have this nice convention that a.function() leaves a 
unchanged and a.function!() modifies a.

Something like this would be nice; the hard part is choosing a naming 
convention that actually gets followed.

functionXIP (eXecute In Place), functionWSD (With Side Effect)?
Sigh, it's hard to achieve something as simple and elegant as '!': caution, 
this function modifies the object!

In the absence of a proper naming suffix, an optional parameter could be 
used, yes.

Regards,
Renaud Hebert


 
 Thanks,
 
 - Dave

Aug 03 2006

Kirk McDonald <kirklin.mcdonald gmail.com> writes:

renox wrote:
 Dave wrote:
 
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would 
 be whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

 
 
 In ruby, they have this nice convention that a.function() leaves a 
 unchanged and a.function!() modifies a.
 
 Something like this would be nice, the hard part is choosing the correct 
 naming convention so that it is followed..
 
 functionXIP (eXecute In Place), functionWSD (With Side Effect)?
 Sigh, hard to achieve something as simple and elegant as '!' : caution 
 this function modifies the object!
 
 In the absence of proper naming termination, an optionnal parameter 
 could be used yes.
 

What about:

void   toupper(char[] s);  // Modifies s in-place
char[] asupper(char[] s);  // COW function

Of course, this convention would only apply to functions named 
"tosomething", but I bet most/all of the functions for which an 
"in-place" operation makes sense are named that.

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://dsource.org/projects/pyd/wiki

Aug 03 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Kirk McDonald wrote:
 Of course, this convention would only apply to functions named 
 "tosomething", but I bet most/all of the functions for which an 
 "in-place" operation makes sense are named that.

It doesn't really apply to functions that are verbs, like capitalize, 
sort and map.

For those, one option is to use the past participle for the COW 
versions: capitalized, sorted and mapped.
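A minimal sketch of the verb/participle naming in D. Phobos' `std.algorithm.sorting.sort` already sorts in place; `sorted`, `capitalize` and `capitalized` below are hypothetical names for illustration, and the capitalization is ASCII-only.

```d
import std.algorithm.sorting : sort;

// Hypothetical COW counterpart to Phobos' in-place sort, named with
// the past participle. The input array a itself is not reordered.
T[] sorted(T)(T[] a)
{
    T[] r = a.dup;
    r.sort(); // sorts the copy in place
    return r;
}

// Hypothetical verb form: upper-cases the first byte in place
// if it is in 'a'..'z' (ASCII-only).
void capitalize(char[] s)
{
    if (s.length && s[0] >= 'a' && s[0] <= 'z')
        s[0] = cast(char)(s[0] - ('a' - 'A'));
}

// Hypothetical participle form: copies only when a change is needed.
const(char)[] capitalized(const(char)[] s)
{
    if (s.length && s[0] >= 'a' && s[0] <= 'z')
    {
        char[] r = s.dup;
        capitalize(r);
        return r;
    }
    return s;
}
```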

/Oskar

Aug 03 2006

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Oskar Linde wrote:
 It doesn't really apply to functions that are verbs, like capitalize, 
 sort and map.
 
 For those one option is: capitalized, sorted and mapped for COW versions.

I know we aren't supposed to like pointers, but it could also work the 
following way:

void   toupper(char[]* s);  // modifies *s in-place
char[] toupper(char[] s);   // moo

then by writing:

toupper(&foo);

you'd make it pretty clear that foo is to be modified. Internally, the 
in-place version could immediately call something like
void toupper_inPlace(inout char[] s);
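A minimal sketch of the pointer-overload idea in current D, assuming ASCII-only conversion. The names follow the post and are not Phobos API; `inout` as a parameter storage class in D1 is spelled `ref` today. Overload resolution picks the pointer version for `toupper(&foo)`, since `char[]*` does not convert to `char[]`.

```d
// Hypothetical sketch of the pointer-overload idea; names follow the
// post above and are not Phobos API. ASCII-only conversion.

/// Worker: converts the caller's buffer in place.
void toupper_inPlace(ref char[] s)
{
    foreach (ref c; s)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
}

/// Called as toupper(&foo): the '&' at the call site signals mutation.
void toupper(char[]* s)
{
    toupper_inPlace(*s);
}

/// Called as toupper(foo): copy-on-write, foo is left untouched.
char[] toupper(char[] s)
{
    foreach (c; s)
    {
        if (c >= 'a' && c <= 'z')
        {
            char[] r = s.dup;
            toupper_inPlace(r);
            return r;
        }
    }
    return s; // already upper case: no copy
}
```

One trade-off to weigh: both overloads share a name, so the `&` is the only cue at the call site.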


--
Tomasz Stachowiak

Aug 03 2006

Kirk McDonald <kirklin.mcdonald gmail.com> writes:

Oskar Linde wrote:
 It doesn't really apply to functions that are verbs, like capitalize, 
 sort and map.
 
 For those one option is: capitalized, sorted and mapped for COW versions.
 
 /Oskar

Those make me think the function is /asking/ whether the array/string is 
capitalized, sorted, &c. For sheer, bloody-minded consistency's sake, we 
could use ascapitalized, assorted, &c, but those read pretty poorly. 
Hrm. On second thought, your idea is better. :-)


Aug 03 2006
