digitalmars.D - COW vs. in-place.

reply Dave <Dave_member pathlink.com> writes:
What if selected functions in phobos were modified to take an optional 
parameter that specified COW or in-place? The default for each would be 
whatever they do now.

For example, toupper and tolower?

How many times have we seen something like this:

str = toupper(str); // or equivalent in another language.

Thanks,

- Dave
Jul 31 2006
next sibling parent reply Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:
Dave wrote:

 
 What if selected functions in phobos were modified to take an optional
 parameter that specified COW or in-place? The default for each would be
 whatever they do now.
 
 For example, toupper and tolower?
 
 How many times have we seen something like this:
 
 str = toupper(str); // or equivalent in another language.
 
 Thanks,
 
 - Dave
I don't get it (the example). str = toupper(str); does not mean that str can be modified in place:

-- BEGIN --
void bla(char[] str)
{
    str = toupper(str);
    /* something else */
}

bla("this string is readonly");
--- END ---

If you mean something else - sorry, I still don't get it.

I'd rather wait until the const/immutability problem in D is resolved. Don't forget that an additional "option" has a runtime cost. Some of the const/immutability proposals could provide compile-time information that would help with your proposal.
Jul 31 2006
next sibling parent BCS <BCS pathlink.com> writes:
Dawid Ciezarkiewicz wrote:
 Dave wrote:
 
 
What if selected functions in phobos were modified to take an optional
parameter that specified COW or in-place? The default for each would be
whatever they do now.

For example, toupper and tolower?

How many times have we seen something like this:

str = toupper(str); // or equivalent in another language.

Thanks,

- Dave
I don't get it (the example). str = toupper(str); does not mean that str can be modified in place
how about: str[] = toupper(str)[];
Jul 31 2006
prev sibling next sibling parent Dave <Dave_member pathlink.com> writes:
Dawid Ciężarkiewicz wrote:
 Dave wrote:
 
 What if selected functions in phobos were modified to take an optional
 parameter that specified COW or in-place? The default for each would be
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.

 Thanks,

 - Dave
 I don't get it (the example). str = toupper(str); does not mean that str can be modified in place
 -- BEGIN --
 void bla(char[] str)
 {
     str = toupper(str);
     /* something else */
 }
 bla("this string is readonly");
 --- END ---
 If you mean something else - sorry, still I don't get it.
Right now, if you call toupper with a string that has any lower-case chars in it, it will .dup the string passed into it, modify the dup'd string instead of the original, and return the dup. If it doesn't need to modify the string, it returns a reference to the original string.

Often though, people just want to modify the original string anyway, so they do this:

    str = toupper(str);

A new version could be declared as:

    char[] toupper(char[] s, CIP cip = CIP.COW);

and changed to modify the original in place, instead of dup'ing it, if COW isn't specified (which it is by default). Then instead of str = toupper(str) you could do:

    toupper(str, CIP.InPlace);

and avoid the duplication (a ref. to the modified string is still returned).
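
To make the proposal concrete, here is a minimal sketch of such a function in present-day D, not the Phobos implementation. The name toupperSketch is hypothetical, the CIP enum only mirrors the name used in the post, and the sketch handles ASCII only so that the in-place path can never change the string's length.

```d
import std.ascii : isLower, toUpper;

/// Hypothetical mode flag, named after the CIP enum in the post.
enum CIP { COW, InPlace }

/// ASCII-only sketch: under CIP.COW the input is duplicated on the first
/// write; under CIP.InPlace the caller's buffer is modified directly.
char[] toupperSketch(char[] s, CIP cip = CIP.COW)
{
    char[] r = s;
    bool copied = false;
    foreach (i, c; s)
    {
        if (!isLower(c))
            continue;
        if (cip == CIP.COW && !copied)
        {
            r = s.dup;      // first write: switch to a private copy
            copied = true;
        }
        r[i] = cast(char) toUpper(c);
    }
    return r;
}

void main()
{
    char[] a = "hello".dup;

    char[] b = toupperSketch(a);               // default: copy on write
    assert(a == "hello" && b == "HELLO");

    char[] c = toupperSketch(a, CIP.InPlace);  // writes into a, no allocation
    assert(a == "HELLO" && c.ptr is a.ptr);

    char[] d = toupperSketch(a);               // nothing to change: no copy
    assert(d.ptr is a.ptr);
}
```

The last assertion shows the property the thread keeps returning to: with COW, an input that needs no change is handed back without any allocation.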
 I'd rather wait till const/immutability in D problem will be resolved. Don't
 forget that additional "option" is runtime cost. There are some
 propositions of const/immutability that could help providing compile time
 information to deal with your proposition.
Jul 31 2006
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
Dawid Ciężarkiewicz wrote:
 I'd rather wait till const/immutability in D problem will be resolved. Don't
 forget that additional "option" is runtime cost. There are some
 propositions of const/immutability that could help providing compile time
 information to deal with your proposition.
It would take many calls to the modified toupper to cost as much as needlessly duplicating one large text file, and now you have to either live with the dups or write your own in-place toupper <g> None of the const/immutability ideas will take care of having to "copy on write"; they were all more-or-less just ways of enforcing COW so there wouldn't be mistakes.
Jul 31 2006
next sibling parent reply Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:
Dave wrote:

 Dawid Ciężarkiewicz wrote:
 I'd rather wait till const/immutability in D problem will be resolved.
 Don't forget that additional "option" is runtime cost. There are some
 propositions of const/immutability that could help providing compile time
 information to deal with your proposition.
It would take many calls to the modified toupper to cost as much as needlessly duplicating one large text file, and now you have to either live with the dups or write your own in-place toupper <g>
Yes. Still - I'd rather see duplicated functions for that or something like it (just to have it in compile time).
 None of the const/immutability ideas will take care of having to "copy
 on write"; they were all more-or-less just ways of enforcing COW so
 there wouldn't be mistakes.
Well, right. Maybe just writing a new module (std.strinplace) that does what you want and then sending it to Walter or the D discussion group would be good. I guess with the new import improvements, names could stay as they were, and people interested in this speedup would statically import this module and use FQNs where they want such behavior.
Jul 31 2006
parent reply Dave <Dave_member pathlink.com> writes:
Dawid Ciężarkiewicz wrote:
 Dave wrote:
 
 Dawid Ciężarkiewicz wrote:
 I'd rather wait till const/immutability in D problem will be resolved.
 Don't forget that additional "option" is runtime cost. There are some
 propositions of const/immutability that could help providing compile time
 information to deal with your proposition.
It would take many calls to the modified toupper to cost as much as needlessly duplicating one large text file, and now you have to either live with the dups or write your own in-place toupper <g>
Yes. Still - I'd rather see duplicated functions for that or something like it (just to have it in compile time).
 None of the const/immutability ideas will take care of having to "copy
 on write"; they were all more-or-less just ways of enforcing COW so
 there wouldn't be mistakes.
Well, right. Maybe just writing a new module (std.strinplace) that does what you want and then sending it to Walter or the D discussion group would be good. I guess with the new import improvements, names could stay as they were, and people interested in this speedup would statically import this module and use FQNs where they want such behavior.
Not a bad idea... The main prob. would be that there would be a lot of duplication of code.
Jul 31 2006
next sibling parent reply Derek <derek psyc.ward> writes:
On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:

Not a bad idea... The main prob. would be that there would be a lot of 
duplication of code.
void toUpper_inplace(char[] x) { . . . }

char[] toUpper(char[] x)
{
    char[] y = x.dup;
    toUpper_inplace(y);
    return y;
}

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 31 2006
parent reply Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Derek wrote:
 On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:
 
 
Not a bad idea... The main prob. would be that there would be a lot of 
duplication of code.
 void toUpper_inplace(char[] x) { . . . }
 char[] toUpper(char[] x)
 {
     char[] y = x.dup;
     toUpper_inplace(y);
     return y;
 }
I've got one better. Say we have a whole bunch of inplace string functions, like the one above and this one:

void toLower_inplace(char[] x) {
    // ...
}

and others. Then we can:

char[] cow_func(alias fn)(char[] x) {
    char[] y = x.dup;
    fn(y);
    return y;
}

alias cow_func!(toUpper_inplace) toUpper;
alias cow_func!(toLower_inplace) toLower;

Etc. Obviously, you'd have to provide a different template for each function footprint, but the string library has a lot of repeated footprints.

--
Kirk McDonald
Pyd: Wrapping Python with D
http://dsource.org/projects/pyd/wiki
Jul 31 2006
parent reply Dave <Dave_member pathlink.com> writes:
Kirk McDonald wrote:
 Derek wrote:
 On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:


 Not a bad idea... The main prob. would be that there would be a lot 
 of duplication of code.
 void toUpper_inplace(char[] x) { . . . }
 char[] toUpper(char[] x)
 {
     char[] y = x.dup;
     toUpper_inplace(y);
     return y;
 }
With this one, you're always dup'ing instead of .dup'ing only when needed (the current one is actually more efficient).
 
 I've got one better. Say we have a whole bunch of inplace string 
 functions, like the one above and this one:
 
 void toLower_inplace(char[] x) {
     // ...
 }
 
 and others. Then we can:
 
 char[] cow_func(alias fn)(char[] x) {
     char[] y = x.dup;
     fn(y);
     return y;
 }
 
 alias cow_func!(toUpper_inplace) toUpper;
 alias cow_func!(toLower_inplace) toLower;
 
 Etc. Obviously, you'd have to provide a different template for each 
 function footprint, but the string library has a lot of repeated 
 footprints.
 
I think to maximize code re-use you'd have to build the "COW or not to COW" logic into the "base" function. And if you did that you'd have to live with a little more function call overhead (passing a bool or small enum around) in order to avoid the defensive copying like in cow_func above.

I'm wondering - if Phobos had been built that way (making it the 'D way' of doing things), would all the concerns about GC performance and "const" have been so acute over the last year or so (hindsight is always closer to 20-20 of course)?

The problem w/ all the dup'ing is when you put something like this in a tight loop you get sloooowwwww code:

import std.file, std.string, std.stdio;

void main()
{
    char[][] formatted;
    char[][] text = split(cast(char[])read("largefile.txt"), ".");
    foreach(char[] sentence; text)
    {
        formatted ~= capitalize(tolower(strip(sentence))) ~ ".\r\n";
    }
    //...
    foreach(char[] sentence; formatted)
    {
        writefln(sentence);
    }
}

None of those functions (except for read()) would really have to do much allocating because the input file for all intents and purposes is read-only here (it won't get implicitly modified even if COW isn't used).

- Dave
Jul 31 2006
parent reply Derek Parnell <derek nomail.afraid.org> writes:
On Mon, 31 Jul 2006 18:01:14 -0500, Dave wrote:

 Kirk McDonald wrote:
 Derek wrote:
 On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:


 Not a bad idea... The main prob. would be that there would be a lot 
 of duplication of code.
 void toUpper_inplace(char[] x) { . . . }
 char[] toUpper(char[] x)
 {
     char[] y = x.dup;
     toUpper_inplace(y);
     return y;
 }
With this one, you're always dup'ing instead of .dup'ing only when needed (the current one is actually more efficient).
I'm getting confused about what you are after now, sorry. It seems that you want a CoW version, an InPlace version, and a non-destructive version of each function, and to let the compiler and/or the author choose the best one for the job at hand. The example above gave the InPlace and non-destructive versions, and the current version is CoW.

...
 The problem w/ all the dup'ing is when you put something like this in a 
 tight loop you get sloooowwwww code:
Not if the author has a choice ...

import std.file, std.string, std.stdio;

void main()
{
    char[][] formatted;
    char[][] text = split(cast(char[])read("largefile.txt"), ".");
    foreach(char[] sentence; text)
    {
        strip_IP(sentence);
        tolower_IP(sentence);
        capitalize_IP(sentence);
        formatted ~= sentence ~ ".\r\n";
    }
    //...
    foreach(char[] sentence; formatted)
    {
        writefln(sentence);
    }
}

--
Derek (skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
1/08/2006 11:18:40 AM
Jul 31 2006
parent reply Dave <Dave_member pathlink.com> writes:
Derek Parnell wrote:
 On Mon, 31 Jul 2006 18:01:14 -0500, Dave wrote:
 
 Kirk McDonald wrote:
 Derek wrote:
 On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:


 Not a bad idea... The main prob. would be that there would be a lot 
 of duplication of code.
 void toUpper_inplace(char[] x) { . . . }
 char[] toUpper(char[] x)
 {
     char[] y = x.dup;
     toUpper_inplace(y);
     return y;
 }
 With this one, you're always dup'ing instead of .dup'ing only when needed (the current one is actually more efficient).
 I'm getting confused about what you are after now, sorry. It seems that you want a CoW version, an InPlace version, and a non-destructive version of each function, and to let the compiler and/or the author choose the best one for the job at hand. The example above gave the InPlace and non-destructive versions, and the current version is CoW. ...
 The problem w/ all the dup'ing is when you put something like this in a 
 tight loop you get sloooowwwww code:
 Not if the author has a choice ...

 import std.file, std.string, std.stdio;

 void main()
 {
     char[][] formatted;
     char[][] text = split(cast(char[])read("largefile.txt"), ".");
     foreach(char[] sentence; text)
     {
         strip_IP(sentence);
         tolower_IP(sentence);
         capitalize_IP(sentence);
         formatted ~= sentence ~ ".\r\n";
     }
     //...
     foreach(char[] sentence; formatted)
     {
         writefln(sentence);
     }
 }
Sorry, I think some of that got lost in the thread...

I'm asking if it would make sense to change the current functions so COW is optional. That way current code wouldn't be broken but we'd have the choice.

For example, the current tolower w/ the changes added (denoted by **):

//** char[] tolower(char[] s)
char[] tolower(char[] s, bool cow = true) //**
{
    int changed;
    int i;
    char[] r = s;

    changed = 0;
    for (i = 0; i < s.length; i++)
    {
        auto c = s[i];
        if ('A' <= c && c <= 'Z')
        {
            //**if (!changed)
            if (cow && !changed) //**
            {
                r = s.dup;
                changed = 1;
            }
            r[i] = c + (cast(char)'a' - 'A');
        }
        else if (c >= 0x7F)
        {
            foreach (size_t j, dchar dc; s[i .. length])
            {
                //**if (!changed)
                if (cow && !changed) //**
                {
                    if (!std.uni.isUniUpper(dc))
                        continue;
                    r = s[0 .. i + j].dup;
                    changed = 1;
                }
                dc = std.uni.toUniLower(dc);
                std.utf.encode(r, dc);
            }
            break;
        }
    }
    return r;
}

So the sample code would become:

import std.file, std.string, std.stdio;

void main()
{
    char[][] formatted;
    char[][] text = split(cast(char[])read("largefile.txt"), ".");
    foreach(char[] sentence; text)
    {
        formatted ~= capitalize(tolower(strip(sentence, false), false), false) ~ ".\r\n";
    }
    //...
    foreach(char[] sentence; formatted)
    {
        writefln(sentence);
    }
}

Then I suggested either making the cow parameter default to false, or wondered how things would have worked out if the original data owner became responsible for their own dups:

void main()
{
    char[][] formatted;
    char[] original = cast(char[])read("largefile.txt").dup; //**
    char[][] text = split(original, ".");
    foreach(char[] sentence; text)
    {
        formatted ~= capitalize(tolower(strip(sentence))) ~ ".\r\n";
    }
    //...
    foreach(char[] sentence; formatted)
    {
        writefln(sentence);
    }
    //** The 'original' (duplicated, unmodified) data is used again here
}

If everything was done inplace in Phobos, then it would become 2nd nature for the owner to dup when needed. And the user wouldn't need to rely on the hope that the library developer didn't make a mistake and forget to COW when they were supposed to.

Thanks,

- Dave
Jul 31 2006
parent Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Dave wrote:
 Sorry, I think some of that got lost in the thread...
 
 I'm asking if it would make sense to change the current functions so COW 
 is optional. That way current code wouldn't be broken but we'd have the 
 choice.
 
Using a function parameter as you suggest is fine and all (it helps in code re-use as your example ably shows), but I find calling, e.g., tolower_inplace clearer than some strange 'false' parameter at the end of the argument list. If we make the 'cow' parameter default to 'true', we might also provide a wrapper:

char[] inplace_wrap(alias fn)(char[] s) {
    return fn(s, false);
}

alias inplace_wrap!(tolower) tolower_inplace;
alias inplace_wrap!(toupper) toupper_inplace;
// &c, &c

(I like this method of function wrapping, can you tell?)

Or we could just as easily default cow to 'false' and have the wrapper be 'cow_wrap' instead. (It would also be easy enough to provide both.)
 If everything was done inplace in Phobos, then it would become 2nd 
 nature for the owner to dup when needed. And the user wouldn't need to 
 rely on the hope that the library developer didn't make a mistake and 
 forget to COW when they were supposed to.
I sure hope the library makes this an important, documented part of its interface. -- Kirk McDonald Pyd: Wrapping Python with D http://dsource.org/projects/pyd/wiki
Jul 31 2006
prev sibling parent Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:
Dave wrote:
 Maybe just writing a new module (std.strinplace) that does what you want and
 then sending it to Walter or the D discussion group would be good. I guess with
 the new import improvements, names could stay as they were, and people
 interested in this speedup would statically import this module and use
 FQNs where they want such behavior.
Not a bad idea... The main prob. would be that there would be a lot of duplication of code.
Well. IMO not so much. There are not so many essential functions operating on strings and they don't change too often.
Aug 01 2006
prev sibling parent Reiner Pope <reiner.pope gmail.com> writes:
Dave wrote:
 None of the const/immutability ideas will take care of having to "copy 
 on write"; they were all more-or-less just ways of enforcing COW so 
 there wouldn't be mistakes.
Argh, that's what all of my proposals are about. See:

- rocheck in 'YACP -- Yet Another Const Proposal' on digitalmars.D
- 'constness for arrays' by xs0 on digitalmars.D
- 'what's wrong with just a runtime-checked const'? on digitalmars.D.learn

These all explore a way to make array functions work optimally in all cases. The rocheck proposal (the most recent one) would look as follows:

rocheck char[] toupper(rocheck char[] input)
{
    foreach (i, c; input)
    {
        if (islower(c))
        {
            char[] temp = input.ensureWritable; // ensureWritable checks whether it is mutable and copies if not
            temp[i] = chartoupper(c);
            input = temp; // if we did indeed duplicate, then make sure we now use the duplicated one
        }
    }
    return input;
}

// Another alternative: faster, but more code
rocheck char[] toupper(rocheck char[] input)
{
    foreach (i, c; input)
    {
        if (islower(c))
        {
            char[] temp = input.ensureWritable;
            foreach (inout c2; temp[i..$])
            {
                if (islower(c2)) c2 = chartoupper(c2);
            }
            return temp;
        }
    }
    return input;
}

// Now look what we can do:
char[] foo = "hello".dup;
foo = toupper(foo).ensureWritable; // ensureWritable is a no-op here, because there is never a const reference. It's only there to please the compiler's const checking.

readonly char[] bar = baz.getName();
foo = toupper(bar).ensureWritable; // If toupper modifies, then it will dup (since bar is readonly). If not, then ensureWritable will dup. This way, we ensure exactly one duplication, as required.

readonly char[] asdf = CIP1(CIP2(CIP3(bar))); // CIP1, 2 and 3 are rocheck functions like toupper above. If none of them modify, then no duplication takes place. If one of them does, then only one duplication takes place.

Having it integrated into the language is more powerful, because it actually works with const checking and makes the syntax cleaner.

Consider how you would get the same efficiency with the last statement using the CIP enum when just modifying the library. CIP1, CIP2 and CIP3 would all need signatures as follows:

char[] CIP1(char[] input, inout CIP cipness) {...}

It would be inout so that you can tell it about the input, and it can tell you about the output. If you don't know the ownership of the output, you will get unnecessary dups. Here is how you would emulate the last line of the rocheck sample code:

CIP temp = CIP.COW;
char[] bar; // We mustn't modify this
bar = CIP1(bar, temp); // bar *might* be modifiable inplace, but only temp knows
bar = CIP2(bar, temp);
bar = CIP3(bar, temp);
// We still don't know whether bar is the original, unmodifiable one, or not. However, temp can tell us.

This code is much more verbose than one built into the language.

Cheers,

Reiner
Jul 31 2006
prev sibling parent reply "Andrei Khropov" <andkhropov nospam_mtu-net.ru> writes:
Dawid Ciężarkiewicz wrote:

 I'd rather wait till const/immutability in D problem will be resolved. Don't
 forget that additional "option" is runtime cost. There are some
 propositions of const/immutability that could help providing compile time
 information to deal with your proposition.
I agree. Adding an additional parameter doesn't seem like a good idea: it raises the question of whether the default behavior should be to copy or not, and it introduces the possibility of subtle errors when the flag is mistakenly omitted.

--
AKhropov
Aug 01 2006
parent kris <foo bar.com> writes:
Andrei Khropov wrote:
 Dawid Ciężarkiewicz wrote:
 
 
I'd rather wait till const/immutability in D problem will be resolved. Don't
forget that additional "option" is runtime cost. There are some
propositions of const/immutability that could help providing compile time
information to deal with your proposition.
I agree. Adding additional parameter doesn't seem to be a good idea and also raises the question whether the default behavior will be to copy or not and also introduces possibility of subtle errors when passing the flag was mistakenly omitted.
"to CoW or not to CoW ~ that is the question ..." "to err is human; to moo is bovine"
Aug 01 2006
prev sibling next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Dave" <Dave_member pathlink.com> wrote in message 
news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
I've had the same idea; would be great for those trying to write libraries that make as few allocations as possible. Not to mention just plain more efficient if you don't need a copy.
Jul 31 2006
parent Dave <Dave_member pathlink.com> writes:
Jarrett Billingsley wrote:
 "Dave" <Dave_member pathlink.com> wrote in message 
 news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
I've had the same idea; would be great for those trying to write libraries that make as few allocations as possible. Not to mention just plain more efficient if you don't need a copy.
Too much water under the bridge now anyway (or is there?), but I've often thought that it would've been better to do the same and make in-place the default and COW the exception anyhow. This wouldn't have been a hurdle for people coming from the C lib. to Phobos anyway -- they're used to it (e.g.: strcat, et al). As to users of other languages, all the docs. would have to do is make sure to point out what in-place means, with maybe an example of how to .dup your string before you pass it in if needed.
Jul 31 2006
prev sibling next sibling parent reply "Lionello Lunesu" <lio lunesu.remove.com> writes:
"Dave" <Dave_member pathlink.com> wrote in message 
news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
str being a UTF-8 string, I don't think you can guarantee that it CAN be made uppercase in-place. It seems quite possible to me that some uppercase Unicode characters are larger than their lowercase versions, possibly crossing a UTF-8 byte-count border. But there are other string functions that don't have this problem.

In either case, a standard library should simply provide two functions, one in-place and the other COW. In many cases, the COW function could use the in-place one, eliminating duplicate code. For example, in my own lib I use .ToUpper() for the in-place version and .UpperCase() for the COW one.

L.
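
The byte-count concern can be checked directly. The sketch below uses present-day D's std.uni.toUpper and std.utf.codeLength (library names that postdate this thread) on two code points picked purely for illustration: one whose uppercase form needs more UTF-8 code units, and one whose uppercase form needs fewer.

```d
import std.uni : toUpper;
import std.utf : codeLength;

void main()
{
    // Growing case: U+0250 (turned a) needs two UTF-8 code units,
    // while its uppercase form U+2C6F needs three.
    dchar small = '\u0250';
    dchar big = toUpper(small);
    assert(big == '\u2C6F');
    assert(codeLength!char(small) == 2);
    assert(codeLength!char(big) == 3);

    // Shrinking case: U+0131 (dotless i, two code units) maps to ASCII 'I'.
    assert(toUpper(cast(dchar) '\u0131') == 'I');
    assert(codeLength!char('\u0131') == 2);
    assert(codeLength!char('I') == 1);
}
```

In the growing case an in-place toupper would have to write past the end of the caller's buffer, so a general UTF-8 version needs some way to fall back to allocating.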
Aug 01 2006
parent reply Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:
Lionello Lunesu wrote:

 
 "Dave" <Dave_member pathlink.com> wrote in message
 news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional
 parameter that specified COW or in-place? The default for each would be
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
str being an UTF-8 string, I don't think you can guarantee that it CAN be made uppercase in-place. It seems to me that it's quite possible that some uppercase UNICODE characters are larger than their lowercase versions, possibly crossing an UTF-8 byte-count border. But there are other string functions that don't have this problem.
This _is_ a problem.
 In either case, a standard library should simply provide two functions,
 one in-place and the other COW. In many cases, the COW function could use
 the in-place one, eliminating duplicate code. For example, in my own lib I
 use .ToUpper() for the in-place version and .UpperCase() for the COW one.
Well thought.
Aug 01 2006
parent Thomas Kuehne <thomas-dloop kuehne.cn> writes:
Dawid Ciężarkiewicz wrote on 2006-08-01:
 Lionello Lunesu wrote:

 
 "Dave" <Dave_member pathlink.com> wrote in message
 news:ealack$bjg$1 digitaldaemon.com...
 What if selected functions in phobos were modified to take an optional
 parameter that specified COW or in-place? The default for each would be
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
str being an UTF-8 string, I don't think you can guarantee that it CAN be made uppercase in-place. It seems to me that it's quite possible that some uppercase UNICODE characters are larger than their lowercase versions, possibly crossing an UTF-8 byte-count border. But there are other string functions that don't have this problem.
This _is_ problem.
http://www.unicode.org/reports/tr21/ from ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt

This makes it possible to keep the code point count constant; the UTF-8 fragment count, however, is a problem. Currently (5.0.0 2006-03-03, 08:22:43 GMT) there are 9 + 2 cases where the fragment count changes:

Only used for Turkic languages (tr, az):

Thomas
Aug 02 2006
prev sibling next sibling parent "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 31 Jul 2006 11:18:40 -0500, Dave <Dave_member pathlink.com> wrote:
 What if selected functions in phobos were modified to take an optional  
 parameter that specified COW or in-place? The default for each would be  
 whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
I think it's the right idea, but I think it's simply a variation of the idea that the array itself needs a flag to tell functions whether they have to copy, or can modify in place. A 'readonly' flag, as mentioned here in other threads. I'd prefer the flag was internal to the array so that my function signatures were simpler and less cluttered by things not directly related to the function. That said, your idea can be implemented right now. The internal array flag requires Walter to agree and change D's arrays. Regan
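
As a rough library-level illustration of the flag-inside-the-array idea, the sketch below pairs a slice with a readonly flag. FlaggedText and toupperFlagged are hypothetical names, ensureWritable is borrowed from the rocheck posts elsewhere in this thread, and none of this is a real D array feature. ASCII only, to keep it short.

```d
/// Hypothetical wrapper: a slice plus a flag saying whether the holder
/// is allowed to write through it.
struct FlaggedText
{
    char[] data;
    bool readonly;

    /// Returns a writable view, duplicating only if the flag forbids writes.
    FlaggedText ensureWritable()
    {
        return readonly ? FlaggedText(data.dup, false) : this;
    }
}

/// The function signature carries no extra mode parameter; the decision
/// to copy or not travels with the data itself.
FlaggedText toupperFlagged(FlaggedText input)
{
    foreach (i, c; input.data)
    {
        if (c < 'a' || c > 'z')
            continue;
        // First lowercase letter found: get something we may write to,
        // then finish the rest of the string in one pass.
        FlaggedText w = input.ensureWritable();
        foreach (ref c2; w.data[i .. $])
            if (c2 >= 'a' && c2 <= 'z')
                c2 = cast(char)(c2 - 'a' + 'A');
        return w;
    }
    return input;   // nothing to change: no copy, flag preserved
}

void main()
{
    char[] original = "abc".dup;
    auto up = toupperFlagged(FlaggedText(original, true));
    assert(original == "abc" && up.data == "ABC" && !up.readonly);

    auto again = toupperFlagged(up);            // already upper: returned as is
    assert(again.data.ptr is up.data.ptr);

    char[] mine = "xyz".dup;
    auto rw = toupperFlagged(FlaggedText(mine, false));
    assert(mine == "XYZ" && rw.data.ptr is mine.ptr);   // modified in place
}
```

Chained calls then copy at most once, because the result of the first modifying call is no longer flagged readonly.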
Aug 02 2006
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
Dave wrote:
 
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.
 
 For example, toupper and tolower?
 
 How many times have we seen something like this:
 
 str = toupper(str); // or equivalent in another language.
Why not:

    str = toupper(str);     // in-place
    str = toupper(str.dup); // COW

or alternately:

    char[] toupper(char[] src, char[] dst = null);

where dst is an optional destination argument.


Sean
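
A minimal sketch of the optional-destination variant in present-day D, assuming ASCII input so the output length always equals the input length. The name toupperInto and the fallback allocation policy are assumptions for illustration; the post only gives the signature.

```d
/// ASCII-only sketch: writes the uppercased text into dst when it is
/// large enough, otherwise into a freshly allocated buffer.
char[] toupperInto(const(char)[] src, char[] dst = null)
{
    if (dst.length < src.length)
        dst = new char[src.length];   // no usable destination supplied
    foreach (i, c; src)
        dst[i] = (c >= 'a' && c <= 'z') ? cast(char)(c - 'a' + 'A') : c;
    return dst[0 .. src.length];
}

void main()
{
    char[] str = "in place".dup;
    toupperInto(str, str);                  // src and dst alias: in-place
    assert(str == "IN PLACE");

    char[] copy = toupperInto("fresh");     // no dst: allocates a result
    assert(copy == "FRESH");

    char[16] buf;
    char[] viaBuf = toupperInto("stack", buf[]);   // caller-owned storage
    assert(viaBuf == "STACK" && viaBuf.ptr is buf.ptr);
}
```

Passing the same array as both arguments gives the in-place behavior, while a reusable scratch buffer lets a loop avoid per-call allocation.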
Aug 02 2006
next sibling parent Dave <Dave_member pathlink.com> writes:
Sean Kelly wrote:
 Dave wrote:
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would 
 be whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
That's how I think things should be, but it might break a lot of code now <g>
 or alternately:
 
     char[] toupper(char[] src, char[] dst = null);
 
 where dst is an optional destination argument.
 
 
 Sean
Aug 02 2006
prev sibling parent reply Reiner Pope <reiner.pope gmail.com> writes:
 Why not:
 
     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
This is not copy on write. That is simply 'always copy', and this performs worse than COW (which in turn performs worse than in-place, if in-place is possible). Walter has also said earlier that, with COW, it should be the responsibility of the writer to ensure the copy, not the caller.
Aug 03 2006
next sibling parent reply Dave <Dave_member pathlink.com> writes:
Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
This is not copy on write. That is simply 'always copy', and this
But presumably the user would only do the dup if they didn't want to modify str, so CoW would basically go away as a design pattern.
 performs worse than COW (which in turn performs worse than in-place, if 
 in-place is possible). Walter has also said earlier that, with COW, it 
 should be the responsibility of the writer to ensure the copy, not the 
 caller.
That's what I'm questioning ultimately. The caller knows best if the object that _they created_ should be modified or copied and they can do that best before a call to a modifying function. No matter if that happens to be the developer of another lib. function or an application programmer. What's more, CoW for arrays is inconsistent with how other reference objects are treated (class objects are really not made for CoW - there's not even a rudimentary copy ctor provided by the language. Same with AA's, which don't have a .dup for example). Ultimately, most data that is modified is used modified for its remaining program "lifetime", and however the original data was sourced (e.g.: reading from disk) can be replicated if needed instead of having to keep copies around. I think CoW for arrays was a mistake -- it is most often unnecessary, will cause D to repeat many of Java's performance woes for the average user, and as I mentioned is inconsistent as well. It's a lose-lose-lose. - Dave
Aug 03 2006
next sibling parent Reiner Pope <reiner.pope gmail.com> writes:
Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
This is not copy on write. That is simply 'always copy', and this
But presumably the user would only do the dup if they didn't want to modify str, so CoW would basically go away as a design pattern.
 performs worse than COW (which in turn performs worse than in-place, 
 if in-place is possible). Walter has also said earlier that, with COW, 
 it should be the responsibility of the writer to ensure the copy, not 
 the caller.
That's what I'm questioning ultimately. The caller knows best if the object that _they created_ should be modified or copied and they can do that best before a call to a modifying function. No matter if that happens to be the developer of another lib. function or an application programmer. What's more, CoW for arrays is inconsistent with how other reference objects are treated (class objects are really not made for CoW - there's not even a rudimentary copy ctor provided by the language. Same with AA's, which don't have a .dup for example). Ultimately, most data that is modified is used modified for its remaining program "lifetime", and however the original data was sourced (e.g.: reading from disk) can be replicated if needed instead of having to keep copies around. I think CoW for arrays was a mistake -- it is most often unnecessary, will cause D to repeat many of Java's performance woes for the average user, and as I mentioned is inconsistent as well. It's a lose-lose-lose. - Dave
While I'm not convinced that CoW is such a bad situation, I agree with you that it is not perfect. However, a proper solution would need to make use of some facts:
- the caller knows best whether the array may be edited in-place
- whether the string should be modified in-place is often not known at compile time.

These require the passing of a bool indicating whether it should be copied on write, or not, which is just as you suggest. However, to support this with the nicest code, it would be best to be both compiler-checked and language-supported. Of course, this is just advertising for the rocheck type modifier I'm proposing in YACP.

The benefit of language support can also mean that inlining in situations with readonlyness known at compile time may have the CoW checking optimized away.

Cheers,
Reiner
Aug 03 2006
prev sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
What is the advantage of redundantly assigning the result of an in-place function to itself? In my opinion, all in-place functions should have a void return type to avoid common mistakes such as:

foreach(e; arr.reverse) { ... }
// OOPS, arr is now reversed

.dup followed by calling an in-place function is certainly ok, but in those cases, an ordinary functional (non-in-place) function would have been more efficient.
 This is not copy on write. That is simply 'always copy', and this 
But presumably the user would only do the dup if they didn't want to modify str, so CoW would basically go away as a design pattern.
 performs worse than COW (which in turn performs worse than in-place, 
 if in-place is possible). Walter has also said earlier that, with COW, 
 it should be the responsibility of the writer to ensure the copy, not 
 the caller.
That's what I'm questioning ultimately. The caller knows best if the object that _they created_ should be modified or copied and they can do that best before a call to a modifying function. No matter if that happens to be the developer of another lib. function or an application programmer. What's more, CoW for arrays is inconsistent with how other reference objects are treated (class objects are really not made for CoW - there's not even a rudimentary copy ctor provided by the language. Same with AA's, which don't have a .dup for example).
 
 Ultimately, most data that is modified is used modified for its 
 remaining program "lifetime", and however the original data was sourced 
 (e.g.: reading from disk) can be replicated if needed instead of having 
 to keep copies around.
 
 I think CoW for arrays was a mistake -- it is most often unnecessary, 
 will cause D to repeat many of Java's performance woes for the average 
 user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.
Consider the following (just made up) case insensitive multi-file word count application:

import std.stdio;
import std.file;
import std.string;

void main(char[][] args) {
        int[char[]] wc;
        foreach(filename; args[1..$]) {
                char[] data = cast(char[]) read(filename);
                foreach(word; data.split())
                        wc[tolower(word)]++;
        }
        writefln("num words: ",wc.length);
}

If you ran this program on the full collection of 18000 Gutenberg books, you would inevitably run out of memory. Why would you do that when a standard English dictionary only occupies a couple of megabytes?

Without knowing the intricate details of D and Phobos, I bet you would have no way of knowing that you got killed by the cow. :)

/Oskar
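A minimal sketch of the aliasing behind this, written in present-day D syntax rather than the 2006 dialect used in the thread. It assumes, as described here, that a CoW tolower hands its argument straight back when the word is already lower-case, so the stored key is a slice of the file buffer. The names `data`, `word` and `kept` are made up for illustration.

```d
unittest
{
    // Stand-in for one whole file read into memory.
    char[] data = "the quick brown fox".dup;

    // A word produced by splitting is only a window onto `data`.
    const(char)[] word = data[4 .. 9];
    assert(word == "quick");
    assert(word.ptr is data.ptr + 4);      // no bytes were copied

    // Keeping the slice keeps the whole of `data` reachable for the GC;
    // keeping a .dup retains only the five bytes of the word itself.
    const(char)[][] kept;
    kept ~= word;                          // pins the entire buffer
    kept ~= word.dup;                      // independent copy
    assert(kept[0].ptr is data.ptr + 4);
    assert(kept[1].ptr !is data.ptr + 4);
    assert(kept[1] == "quick");
}
```

Under that assumption, every file that contributes even one already-lower-case word stays in memory for as long as the table lives.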
Aug 03 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
What is the advantage of redundantly assigning the result of an in-place function to itself? In my opinion, all in-place functions should have a void return type to avoid common mistakes such as: foreach(e; arr.reverse) { ... } // OOPS, arr is now reversed
I like returning the mutated value so the function call can be embedded in other code. And arr.reverse is already a built-in mutating function, according to the spec.
 .dup followed by calling an in-place function is certainly ok, but in 
 those cases, an ordinary functional (non-in-place) function would have 
 been more efficient.
Why? Sean
Aug 03 2006
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Sean Kelly wrote:
 Oskar Linde wrote:
 Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
What is the advantage of redundantly assigning the result of an in-place function to itself? In my opinion, all in-place functions should have a void return type to avoid common mistakes such as: foreach(e; arr.reverse) { ... } // OOPS, arr is now reversed
I like returning the mutated value so the function call can be embedded in other code.
I have already seen the above foreach error in other people's D code. I believe it is good library design to clearly mark functions with side-effects. Giving them a void return type will prevent any mistake of the following kind (assume toupper is in-place modifying as well as returning):

func(toupper(mystring));
func(arr.reverse);

where the side effect was unintended. Could these be errors, too?

arr2 = arr1.reverse;
toupper(mystring) ~ mystring;
 And arr.reverse is already a built-in mutating function, 
 according to the spec.
Yes. I find that unfortunate and inconsistent with how Phobos is designed. Luckily, arr.sort and arr.reverse are not callable as arr.sort() and arr.reverse(), so they really don't look like functions.
 .dup followed by calling an in-place function is certainly ok, but in 
 those cases, an ordinary functional (non-in-place) function would have 
 been more efficient.
Why?
What I meant was that .dup + inplace will never be more efficient than a copying algorithm. In-place algorithms are often more complicated. If you want a copy anyway, it is more efficient to use a copying algorithm. As an example, consider stable sorting, where efficient copying algorithms are trivial.

Re: Library design

I would like to see both copying and in-place versions of algorithms where it makes sense, but only one behavior should be default. That default should be consistent throughout the standard library and preferably be recommended in an official style guide for third party libraries to follow. I see two valid designs:

1. in-place default, copying algorithms specially named
-------------------------------------------------------

Design:
void toUpper(char[] str); // in-place
char[] toUpperCopy(char[] str); // copy

Pros:
* in-place is often more efficient and therefore default.
* many functions are imperative verbs, and as such one expects them to be modifying
* Similar to how the C++ STL is designed
Cons:
* many functions can not be expressed in-place (example: UTF-8 toUpper)

2. copying default, in-place versions specially named
-----------------------------------------------------

Design:
void toUpperInPlace(char[] str); // in-place
char[] toUpper(char[] str); // copy

Pros:
* copying is safer, and is therefore a better default
* in-place is an optimization and would stand out as such
* default is functional (no-side effects), side effects stand out
* people used to functional style programming would not find any surprises
* all functions can be defined as copying functions
* how many popular languages are designed (Ruby, Python, php, all "functional" languages, etc...)
Cons:
* could confuse people, lead to silent errors:
toupper(str); // doesn't change str
cos(x); // doesn't change x ;)

For the record, I am in favor of number 2 and that would have biased the arguments above.

/Oskar
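For illustration only, design 2 might look roughly like this in present-day D. The names `toUpperAscii` and `toUpperAsciiInPlace` are hypothetical, and the sketch is deliberately ASCII-only, because an in-place version cannot cope with case mappings that change the encoded length.

```d
/// Hypothetical in-place version (ASCII only). It returns void, so it
/// cannot be chained or passed along by accident.
void toUpperAsciiInPlace(char[] s)
{
    foreach (ref c; s)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - 'a' + 'A');
}

/// Hypothetical copying version: the argument is never written to.
char[] toUpperAscii(const(char)[] s)
{
    char[] r = s.dup;
    toUpperAsciiInPlace(r);
    return r;
}

unittest
{
    char[] name = "dave".dup;

    char[] shout = toUpperAscii(name);   // copy: `name` is left alone
    assert(name == "dave" && shout == "DAVE");

    toUpperAsciiInPlace(name);           // the side effect is spelled out in the name
    assert(name == "DAVE");

    // The void return type rules out calls like func(toUpperAsciiInPlace(name)).
    static assert(is(typeof(toUpperAsciiInPlace(name)) == void));
}
```

Design 1 would be the same pair of functions with the short name given to the in-place one instead.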
Aug 03 2006
parent reply Dave <Dave_member pathlink.com> writes:
Oskar Linde wrote:
 
 1. in-place default, copying algorithms specially named
 -------------------------------------------------------
 
 Design:
 void toUpper(char[] str); // in-place
 char[] toUpperCopy(char[] str); // copy
 
 Pros:
 * in-place is often more efficient and therefore default.
 * many functions are imperative verbs, and as such one expects them to 
 be modifying
 * Similar to how the C++ STL is designed
 Cons:
 * many functions can not be expressed in-place (example: UTF-8 toUpper)
 
Hmmm - Is the current implementation of std.string.toupper wrong then? (If you removed the if(!changed) {...} blocks [where the CoW is milked] you would effectively have an in-place implementation).
 
 2. copying default, in-place versions specially named
 -----------------------------------------------------
 
 Design:
 void toUpperInPlace(char[] str); // in-place
 char[] toUpper(char[] str); // copy
 
 Pros:
 * copying is safer, and is therefore a better default
Only if the coder expects that is the default, *and* they most often need the original data intact later in the program. And that safety is not much of an advantage when your code is three-legged dog slow and eats up resources that could be used by other processes :) Walking to work may be safer than going 70 MPH on the freeway, but it would take me a week and I'd starve.
 * in-place is an optimization and would stand out as such
It's only considered an 'optimization' right now because it's different from the default (CoW).
 * default is functional (no-side effects), side effects stand out
 * people used to functional style programming would not find any
 surprises
 * all functions can be defined as copying functions
 * how many popular languages are designed (Ruby, Python, php, all 
 "functional" languages, etc...)
Yes, but all of these are languages where performance is not an imperative (excepting some of the functional languages perhaps). Plus think of all the time and effort that have been spent on GC's because of this design choice :)
 Cons:
 * could confuse people, lead to silent errors:
 toupper(str); // doesn't change str
 cos(x); // doesn't change x ;)
 
 For the record, I am in favor of number 2 and that would have biased the 
 arguments above.
 
 /Oskar
Aug 03 2006
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Dave wrote:
 Oskar Linde wrote:
 1. in-place default, copying algorithms specially named
 -------------------------------------------------------

 Design:
 void toUpper(char[] str); // in-place
 char[] toUpperCopy(char[] str); // copy

 Pros:
 * in-place is often more efficient and therefore default.
 * many functions are imperative verbs, and as such one expects them to 
 be modifying
 * Similar to how the C++ STL is designed
 Cons:
 * many functions can not be expressed in-place (example: UTF-8 toUpper)
Hmmm - Is the current implementation of std.string.toupper wrong then?
Not really. Some of the newer case folding mappings from Unicode are missing from std.uni, so it could be better though.
 (If you removed the if(!changed) {...} blocks [where the CoW is milked] 
 you would effectively have an in-place implementation).
Again, not really. :) See Thomas Kuehne's post further up the thread. There are certain Unicode case foldings where the number of UTF-8 elements changes. This will be handled correctly by std.string.toupper/tolower, but the result can not be in-place.
 2. copying default, in-place versions specially named
 -----------------------------------------------------

 Design:
 void toUpperInPlace(char[] str); // in-place
 char[] toUpper(char[] str); // copy

 Pros:
 * copying is safer, and is therefore a better default
Only if the coder expects that is the default, *and* they most often need the original data intact later in the program. And that safety is not much of an advantage when your code is three-legged dog slow and eats up resources that could be used by other processes :) Walking to work may be safer than going 70 MPH on the freeway, but it would take me a week and I'd starve.
Is someone prejudiced here? :) I could counter that with how functional style programming is superior in all other ways, but I won't. ;)
 * in-place is an optimization and would stand out as such
It's only considered an 'optimization' right now because it's different from the default (CoW).
 * default is functional (no-side effects), side effects stand out
 * people used to functional style programming would not find any
 surprises
 * all functions can be defined as copying functions
 * how many popular languages are designed (Ruby, Python, php, all 
 "functional" languages, etc...)
Yes, but all of these are languages where performance is not an imperative (excepting some of the functional languages perhaps). Plus think of all the time and effort that have been spent on GC's because of this design choice :)
 Cons:
 * could confuse people, lead to silent errors:
 toupper(str); // doesn't change str
 cos(x); // doesn't change x ;)

 For the record, I am in favor of number 2 and that would have biased 
 the arguments above.
I could live with either one. It is after all only a matter of naming. Consistency is the most important thing. The argument that there are only a small subset of all functions for which in-place as a concept is applicable is IMHO quite strong. /Oskar
Aug 03 2006
parent Dave <Dave_member pathlink.com> writes:
Oskar Linde wrote:
 Dave wrote:
 
 Again, not really. :) See Thomas Kuehne's post further up the thread. 
 There are certain unicode case foldings where the number of UTF-8 
 element changes. This will be handled correctly by 
 std.string.toupper/tolower, but the result can not be in-place.
 
Not (pedantically) in-place for those cases, but for all cases you could still get around a complete .dup (and of course the string arguments would have to change to be passed inout for to/upper/lower, std.uni.encode, etc.).
 2. copying default, in-place versions specially named
 -----------------------------------------------------

 Design:
 void toUpperInPlace(char[] str); // in-place
 char[] toUpper(char[] str); // copy

 Pros:
 * copying is safer, and is therefore a better default
Only if the coder expects that is the default, *and* they most often need the original data intact later in the program. And that safety is not much of an advantage when your code is three-legged dog slow and eats up resources that could be used by other processes :) Walking to work may be safer than going 70 MPH on the freeway, but it would take me a week and I'd starve.
Is someone prejudiced here? :) I could counter that with how functional style programming is superior in all other ways, but I won't. ;)
I didn't intend it that way <g> Just pointing out that I'm not overly concerned with complete safety with a language like D when it can cost a lot.
 I could live with either one. It is after all only a matter of naming. 
 Consistency is the most important thing. The argument that there are 
 only a small subset of all functions for which in-place as a concept is 
 applicable is IMHO quite strong.
You're right, it is a very small subset in Phobos right now but 'CoW' seems to be the design pattern chosen for D. As this thread went on I became concerned that CoW for arrays is probably not the way to go for a language like D (all IMHO).
 
 /Oskar
Aug 04 2006
prev sibling parent Dave <Dave_member pathlink.com> writes:
Oskar Linde wrote:
 Dave wrote:
 Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
What is the advantage of redundantly assigning the result of an in-place
No advantage - the poster was just using the example from the OP. And what the OP example was showing is that the way it is now (CoW), the coder (often) ends-up assigning the results back to the original string reference, in which case the .dup inside toupper is a total waste.

writefln(toupper(str)); // in-place
char[] st2 = cast(char[])file.read("somedata");
writefln("Uppercase string: ", toupper(st2.dup)); // dup only if needed
writefln("Original string: ", st2);

 function to itself? In my opinion, all in-place functions should have a void return type to avoid common mistakes such as:

     writefln(toupper(str));             // function chain

Many of C's string functions do this too.

 foreach(e; arr.reverse) { ... }
 // OOPS, arr is now reversed

 .dup followed by calling an in-place function is certainly ok, but in those cases, an ordinary functional (non-in-place) function would have been more efficient.

If the programmer needs to keep a copy of the original, the way toupper/tolower/etc is done now is more efficient only in the case where the data was not modified.

My argument is that most often when data is modified at some point in a program, it is because the rest of the program needs the modified version and not a copy of the original (so defensive .dups won't be done anyhow).

 I think CoW for arrays was a mistake -- it is most often unnecessary, will cause D to repeat many of Java's performance woes for the average user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.
 Consider the following (just made up) case insensitive multi-file word count application:

 import std.stdio;
 import std.file;
 import std.string;

 void main(char[][] args) {
         int[char[]] wc;
         foreach(filename; args[1..$]) {
                 char[] data = cast(char[]) read(filename);
                 foreach(word; data.split())
                         wc[tolower(word)]++;
         }
         writefln("num words: ",wc.length);
 }

 If you ran this program on the full collection of 18000 Gutenberg books, you would inevitably run out of memory. Why would you do that when a standard English dictionary only occupies a couple of megabytes?
 Without knowing the intricate details of D and Phobos, I bet you would have no way of knowing that you got killed by the cow. :)

Exactly my point and great example. It's that kind of stuff that is really tough on a newbie trying to get the most out of a high-performance language.

IMHO, it's not too big of a leap for a beginner to suspect that data will be modified when they pass a byref argument into a function like toupper. If 'in-place' is clearly documented then I don't see a problem.

- Dave

 /Oskar
Aug 03 2006
prev sibling parent Sean Kelly <sean f4.ca> writes:
Reiner Pope wrote:
 Why not:

     str = toupper(str);     // in-place
     str = toupper(str.dup); // COW
This is not copy on write. That is simply 'always copy', and this performs worse than COW (which in turn performs worse than in-place, if in-place is possible). Walter has also said earlier that, with COW, it should be the responsibility of the writer to ensure the copy, not the caller.
To do true COW, toupper would have to test every element against its uppercase equivalent--the first diff would cause a copy to occur. For mutating algorithms such as this, I think it makes more sense for them to always change the data in place if possible and to document them as such. Sean
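The "copy on first difference" behaviour described above can be sketched as follows. This is a hypothetical ASCII-only helper in present-day D, not the Phobos implementation, and the name `cowUpper` is made up.

```d
/// Hypothetical ASCII-only CoW upper-casing: scan until the first character
/// that would change, copy once at that point, then finish in the copy.
const(char)[] cowUpper(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'a' && c <= 'z')
        {
            char[] r = s.dup;               // the first difference triggers the copy
            foreach (ref ch; r[i .. $])
                if (ch >= 'a' && ch <= 'z')
                    ch = cast(char)(ch - 'a' + 'A');
            return r;
        }
    }
    return s;   // nothing changed, so the caller gets its own slice back
}

unittest
{
    const(char)[] already = "ABC";
    assert(cowUpper(already).ptr is already.ptr);    // unchanged: same memory

    const(char)[] mixed = "aBc";
    auto up = cowUpper(mixed);
    assert(up == "ABC" && up.ptr !is mixed.ptr);     // changed: fresh copy
    assert(mixed == "aBc");                          // original untouched
}
```

The price of CoW here is the extra comparison per element before any copy is made; an in-place version would skip both the scan and the allocation.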
Aug 03 2006
prev sibling next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Dave wrote:
 
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.
There are at least three ways an array algorithm can operate:
- in-place
- copying
- CoW

In this case, CoW would mean a function that made a copy in all cases except when the return value would become identical to the argument and as such, is semantically very close to the copying version.

It would make more sense to have separate in-place and copying functions, and add a possible runtime CoW-flag to the copying function.

I don't think a runtime flag for CoW vs in-place makes much sense when the compile time semantics are different.

An efficient implementation of a copying algorithm would also often be quite different from an in-place version, speaking for separate functions.

/Oskar
Aug 03 2006
parent Reiner Pope <reiner.pope gmail.com> writes:
Oskar Linde wrote:
 Dave wrote:
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would 
 be whatever they do now.
There are at least three ways an array algorithm can operate: - in-place - copying - CoW
To the caller, however, there are only two situations (in an ideal world with adequate const protection*):
- modifies my copy (in-place)
- doesn't modify my copy

As long as the function sticks to what it promises, then it should be free to implement it in the fastest/easiest way possible.

*I know that there is a difference at the moment: with CoW, you have to be careful about modifying the returned value, because it might also be your original, in which case you would be modifying both. However, this is where const protection helps, especially the runtime flag included in rocheck.
 It would make more sense to have separate in-place and copying 
 functions, and add a possible runtime CoW-flag to the copying function.
 
When would you ever want the copying function instead of the CoW function? Most of the time, the overhead from keeping track of CoW is minimal, and in the situations where CoW requires no copying, it gets a huge advantage. The only situation where choosing copying makes sense is if you have determined that the CoW overhead is too much. In that case, however, you probably wouldn't want to send the flag at runtime, but change it at compile time, I would say.
 I don't think a runtime flag for CoW vs in-place does make much sense 
 when the compile time semantics are different.
 
 An efficient implementation of a copying algorithm would also often be 
 quite different from an in-place version, speaking for separate functions.
There's a simple solution to this:

// If the implementations for in-place and copying are substantially different, then wrap them like this
rocheck T[] sort(rocheck T[] array)
{
    if (array.isMutable())
        return inPlaceSort(array.ensureWritable());
    else
        return copyingSort(array);
}

// If there is no real difference, put them together in the one function
rocheck dchar[] toupper(rocheck dchar[] array)
{
    // Do some stuff and call ensureWritable() when required, which manages whether copying is necessary behind the scenes
}

The point behind the runtime flag is that the required checking can be made to be low overhead, with O(1) cost, whereas unnecessary copying has O(n) cost.

Cheers,
Reiner
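Since rocheck is only a proposal, the following is a rough library-level approximation in present-day D, not the proposed language feature. The `Checked` struct, its fields, and the helpers `asReadOnly`, `asWritable` and `upper` are all hypothetical; only the idea of a run-time flag plus an `ensureWritable()` call is taken from the post.

```d
/// Hypothetical stand-in for a rocheck array: a view plus a run-time flag
/// recording whether the holder is allowed to write through it.
struct Checked
{
    const(char)[] view;   // always valid for reading
    char[] own;           // meaningful only while `writable` is true
    bool writable;

    /// O(1) check; pays the O(n) copy only when the data is read-only.
    char[] ensureWritable()
    {
        if (!writable)
        {
            own = view.dup;
            view = own;
            writable = true;
        }
        return own;
    }
}

Checked asReadOnly(const(char)[] s) { return Checked(s, null, false); }
Checked asWritable(char[] s)        { return Checked(s, s, true); }

/// ASCII-only upper-casing that asks for write access at the first change.
Checked upper(Checked s)
{
    foreach (i, c; s.view)
    {
        if (c >= 'a' && c <= 'z')
        {
            char[] w = s.ensureWritable();
            foreach (ref ch; w[i .. $])
                if (ch >= 'a' && ch <= 'z')
                    ch = cast(char)(ch - 'a' + 'A');
            break;
        }
    }
    return s;
}

unittest
{
    char[] mine = "abc".dup;
    auto a = upper(asWritable(mine));
    assert(a.view.ptr is mine.ptr && mine == "ABC");   // modified in place

    const(char)[] theirs = "abc";
    auto b = upper(asReadOnly(theirs));
    assert(b.view == "ABC" && theirs == "abc");        // copied on write
}
```

The same function body serves both callers; the flag decides at run time whether any copy is made.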
Aug 03 2006
prev sibling parent reply renox <renosky free.fr> writes:
Dave wrote:
 
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would be 
 whatever they do now.
 
 For example, toupper and tolower?
 
 How many times have we seen something like this:
 
 str = toupper(str); // or equivalent in another language.
In Ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a. Something like this would be nice; the hard part is choosing the correct naming convention so that it is followed: functionXIP (eXecute In Place), functionWSD (With Side Effect)? Sigh, hard to achieve something as simple and elegant as '!': caution, this function modifies the object! In the absence of a proper naming suffix, an optional parameter could be used, yes.

Regards,
Renaud Hebert
 
 Thanks,
 
 - Dave
Aug 03 2006
parent reply Kirk McDonald <kirklin.mcdonald gmail.com> writes:
renox wrote:
 Dave wrote:
 
 What if selected functions in phobos were modified to take an optional 
 parameter that specified COW or in-place? The default for each would 
 be whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
In ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a. Something like this would be nice, the hard part is choosing the correct naming convention so that it is followed.. functionXIP (eXecute In Place), functionWSD (With Side Effect)? Sigh, hard to achieve something as simple and elegant as '!' : caution this function modifies the object! In the absence of proper naming termination, an optionnal parameter could be used yes.
What about:

void toupper(char[] s); // Modifies s in-place
char[] asupper(char[] s); // COW function

Of course, this convention would only apply to functions named "tosomething", but I bet most/all of the functions for which an "in-place" operation makes sense are named that.

--
Kirk McDonald
Pyd: Wrapping Python with D
http://dsource.org/projects/pyd/wiki
Aug 03 2006
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Kirk McDonald wrote:
 renox wrote:
 Dave wrote:

 What if selected functions in phobos were modified to take an 
 optional parameter that specified COW or in-place? The default for 
 each would be whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
In ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a. Something like this would be nice, the hard part is choosing the correct naming convention so that it is followed.. functionXIP (eXecute In Place), functionWSD (With Side Effect)? Sigh, hard to achieve something as simple and elegant as '!' : caution this function modifies the object! In the absence of proper naming termination, an optionnal parameter could be used yes.
What about: void toupper(char[] s); // Modifies s in-place char[] asupper(char[] s); // COW function Of course, this convention would only apply to functions named "tosomething", but I bet most/all of the functions for which an "in-place" operation makes sense are named that.
It doesn't really apply to functions that are verbs, like capitalize, sort and map. For those one option is: capitalized, sorted and mapped for COW versions. /Oskar
Aug 03 2006
next sibling parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Oskar Linde wrote:
 Kirk McDonald wrote:
 renox wrote:
 Dave wrote:

 What if selected functions in phobos were modified to take an 
 optional parameter that specified COW or in-place? The default for 
 each would be whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
In ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a. Something like this would be nice, the hard part is choosing the correct naming convention so that it is followed.. functionXIP (eXecute In Place), functionWSD (With Side Effect)? Sigh, hard to achieve something as simple and elegant as '!' : caution this function modifies the object! In the absence of proper naming termination, an optionnal parameter could be used yes.
What about: void toupper(char[] s); // Modifies s in-place char[] asupper(char[] s); // COW function Of course, this convention would only apply to functions named "tosomething", but I bet most/all of the functions for which an "in-place" operation makes sense are named that.
It doesn't really apply to functions that are verbs, like capitalize, sort and map. For those one option is: capitalized, sorted and mapped for COW versions.
I know we aren't supposed to like pointers, but it could also work the following way:

void toupper(char[]* s); // modifies *s in-place
char[] toupper(char[] s); // moo

then by writing:

toupper(&foo);

you'd make it pretty clear that foo is to be modified. Internally, the in-place version could immediately call sth like

void toupper_inPlace(inout char[] s);

--
Tomasz Stachowiak
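A small sketch of how that overload pair could resolve in present-day D, assuming ASCII-only text. The function name `upper` is hypothetical and chosen to avoid clashing with anything in Phobos.

```d
/// Copying overload (hypothetical): takes the slice by value, returns a new array.
char[] upper(const(char)[] s)
{
    char[] r = s.dup;
    upper(&r);            // reuse the in-place overload on the private copy
    return r;
}

/// In-place overload (hypothetical): the caller has to write `&foo`,
/// which makes the intent to modify visible at the call site.
void upper(char[]* s)
{
    foreach (ref c; *s)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - 'a' + 'A');
}

unittest
{
    char[] foo = "moo".dup;

    auto bar = upper(foo);    // picks the copying overload
    assert(bar == "MOO" && foo == "moo");

    upper(&foo);              // picks the in-place overload
    assert(foo == "MOO");
}
```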
Aug 03 2006
prev sibling parent Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Oskar Linde wrote:
 Kirk McDonald wrote:
 
 renox wrote:

 Dave wrote:

 What if selected functions in phobos were modified to take an 
 optional parameter that specified COW or in-place? The default for 
 each would be whatever they do now.

 For example, toupper and tolower?

 How many times have we seen something like this:

 str = toupper(str); // or equivalent in another language.
In ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a. Something like this would be nice, the hard part is choosing the correct naming convention so that it is followed.. functionXIP (eXecute In Place), functionWSD (With Side Effect)? Sigh, hard to achieve something as simple and elegant as '!' : caution this function modifies the object! In the absence of proper naming termination, an optionnal parameter could be used yes.
What about: void toupper(char[] s); // Modifies s in-place char[] asupper(char[] s); // COW function Of course, this convention would only apply to functions named "tosomething", but I bet most/all of the functions for which an "in-place" operation makes sense are named that.
It doesn't really apply to functions that are verbs, like capitalize, sort and map. For those one option is: capitalized, sorted and mapped for COW versions. /Oskar
Those make me think the function is /asking/ if the array/string is capitalized, sorted, &c. For sheer, bloodyminded consistency's sake, we could use ascapitalized, assorted, &c, but those read pretty poorly. Hrm. On second thought, your idea is better. :-) -- Kirk McDonald Pyd: Wrapping Python with D http://dsource.org/projects/pyd/wiki
Aug 03 2006