digitalmars.D - xxxInPlace or xxxCopy?

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I'm consolidating some routines from std.string into std.array. They are 
specialized for operating on arrays, and include the likes of insert, 
remove, replace.

One question is whether operations should be performed in place or on a 
copy. For example:

string s = "Mary has a lil lamb.";
// Implicit copy
auto s1 = replace(s, "lil", "li'l");
assert(s == "Mary has a lil lamb.");
// Explicit in-place
replaceInPlace(s, "lil", "li'l");
assert(s == "Mary has a li'l lamb.");

So that would make copying the default behavior. Alternatively, we could 
make in-place the default behavior and ask for the Copy suffix:

string s = "Mary has a lil lamb.";
// Explicit copy
auto s1 = replaceCopy(s, "lil", "li'l");
assert(s == "Mary has a lil lamb.");
// Implicit in-place
replace(s, "lil", "li'l");
assert(s == "Mary has a li'l lamb.");


Thoughts?

Andrei
Jan 19 2011
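The two behaviours being weighed can be sketched in D roughly as follows. This is only an illustration, not the proposed library code: `replace` is `std.array.replace` as found in current Phobos, which returns a new array, while `replaceEachInPlace` is a hypothetical helper written for this example and only handles element-for-element substitution.

```d
import std.array : replace;

// Hypothetical helper (not part of Phobos): overwrite every element equal
// to `from` with `to`, mutating the caller's array.
void replaceEachInPlace(T)(T[] arr, T from, T to)
{
    foreach (ref x; arr)
        if (x == from)
            x = to;
}

void main()
{
    int[] a = [1, 2, 3, 2];

    // Copy semantics: the original array is left untouched.
    auto b = replace(a, [2], [9]);
    assert(a == [1, 2, 3, 2]);
    assert(b == [1, 9, 3, 9]);

    // In-place semantics: the caller's array changes.
    replaceEachInPlace(a, 2, 9);
    assert(a == [1, 9, 3, 9]);
}
```

A mutable int[] is used here on purpose: with string, the elements are immutable, so only the copying form is available.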
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 One question is whether operations should be performed in place or on a 
 copy. For example:

Strings are meant to be immutable, and the functional style is simpler to understand and safer to use, so I firmly suggest that the default functions (with the shorter names) create a new string/array, and that the versions that work in place get the longer names. In some languages the versions that work in place have a bang (!) suffix, like replace and replace!. I guess a name like "replaceBang" is too cryptic.
 auto s1 = replace(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");

You probably meant:
 assert(s1 == "Mary has a lil lamb.");

Bye, bearophile
Jan 19 2011
next sibling parent reply so <so so.do> writes:
Strange, we are again on the opposite sides...
Second one looks much better to me.
I think, most of the time we need inplace, and it deserves the better  
syntax.
Jan 19 2011
parent bearophile <bearophileHUGS lycos.com> writes:
so:

 Strange, we are again on the opposite sides...
 Second one looks much better to me.
 I think, most of the time we need inplace, and it deserves the better  
 syntax.

In the meantime the world is going more functional... :-) Bye, bearophile
Jan 19 2011
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
bearophile <bearophileHUGS lycos.com> wrote:

 auto s1 = replace(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");

You probably meant:
 assert(s1 == "Mary has a lil lamb.");


Nope. (s1 == "Mary has a li'l lamb.") && (s == "Mary has a lil lamb."). -- Simen
Jan 19 2011
prev sibling parent so <so so.do> writes:
 In the meantime the world is going more functional... :-)

I love how they solve this problem, but if you go down that path while ignoring reality there wouldn't be much of a reason to use D, no? :)
Jan 20 2011
prev sibling next sibling parent Jesse Phillips <jessekphillips+D gmail.com> writes:
Andrei Alexandrescu Wrote:

 So that would make copying the default behavior. Alternatively, we could 
 make in-place the default behavior and ask for the Copy suffix:

Do what sort does. On another thought what about: auto s = replace(s1[], "lil", "li'l"); isn't the empty [] the specification for saving a range in its current form? Just seems like this would be how we'd want to do things.
Jan 19 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, January 19, 2011 15:33:16 Andrei Alexandrescu wrote:
 I'm consolidating some routines from std.string into std.array. They are
 specialized for operating on arrays, and include the likes of insert,
 remove, replace.
 
 One question is whether operations should be performed in place or on a
 copy. For example:
 
 string s = "Mary has a lil lamb.";
 // Implicit copy
 auto s1 = replace(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");
 // Explicit in-place
 replaceInPlace(s, "lil", "li'l");
 assert(s == "Mary has a li'l lamb.");

++vote;
 So that would make copying the default behavior. Alternatively, we could
 make in-place the default behavior and ask for the Copy suffix:
 
 string s = "Mary has a lil lamb.";
 // Explicit copy
 auto s1 = replaceCopy(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");
 // Implicit in-place
 replace(s, "lil", "li'l");
 assert(s == "Mary has a li'l lamb.");

--vote;
 
 Thoughts?

Haven't we been using the approach that string operations generally make copies (in many cases slices) and marking functions that do it in place with InPlace? That's certainly the approach that I'd prefer. And considering that strings (which would be the most common use of arrays, I would think) have immutable elements and generally _can't_ do anything in place, that would imply that copying/slicing would be the default rather than doing operations in place.

Also, if you're looking to minimize code breakage, you're going to have to go with using copy by default and in place for functions marked for it, because the existing versions of functions like replace have been making copies. So, switching to in place by default would break more code.

Being forced to use functions with copy in the name would make dealing with strings more annoying, since they would _have_ to be using the copy versions, and it would be the versions with copy in the name which would be used the most, which seems really backwards.

So, I really think that copying should be the default and in place functions should be marked with InPlace. It's more consistent with current behavior and would generally result in less typing.

- Jonathan M Davis
Jan 19 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/19/11 6:53 PM, Jonathan M Davis wrote:
 On Wednesday, January 19, 2011 15:33:16 Andrei Alexandrescu wrote:
 I'm consolidating some routines from std.string into std.array. They are
 specialized for operating on arrays, and include the likes of insert,
 remove, replace.

 One question is whether operations should be performed in place or on a
 copy. For example:

 string s = "Mary has a lil lamb.";
 // Implicit copy
 auto s1 = replace(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");
 // Explicit in-place
 replaceInPlace(s, "lil", "li'l");
 assert(s == "Mary has a li'l lamb.");

++vote;
 So that would make copying the default behavior. Alternatively, we could
 make in-place the default behavior and ask for the Copy suffix:

 string s = "Mary has a lil lamb.";
 // Explicit copy
 auto s1 = replaceCopy(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");
 // Implicit in-place
 replace(s, "lil", "li'l");
 assert(s == "Mary has a li'l lamb.");

--vote;

So I guess vote stays unchanged :o).
 Thoughts?

Haven't we been using the approach that string operations generally make copies (in many cases slices) and marking functions that do it in place with InPlace?

Problem is, even though the example uses strings, the functions apply to all arrays. Andrei
Jan 19 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 Problem is, even though the example uses strings, the functions apply to 
 all arrays.

Important general rule: if converting string functions into generic functions makes them worse string functions, then don't move them to the algorithm module, or create special string functions for the string module. Bye, bearophile
Jan 19 2011
prev sibling next sibling parent Jerry Quinn <jlquinn optonline.net> writes:
Andrei Alexandrescu Wrote:

 On 1/19/11 6:53 PM, Jonathan M Davis wrote:
 On Wednesday, January 19, 2011 15:33:16 Andrei Alexandrescu wrote:
 I'm consolidating some routines from std.string into std.array. They are
 specialized for operating on arrays, and include the likes of insert,
 remove, replace.

 One question is whether operations should be performed in place or on a
 copy. For example:


So I guess vote stays unchanged :o).
 Thoughts?

Haven't we been using the approach that string operations generally make copies (in many cases slices) and marking functions that do it in place with InPlace?

Problem is, even though the example uses strings, the functions apply to all arrays.

The big difference is operating on immutable arrays vs mutable ones. For immutable arrays, you have to do copies. But mutable ones allow in-place editing. If I'm working with mutable arrays of ints, I don't want to have to type InPlace after every function and I *really* don't want the array to be copied or efficiency will go down the tubes. Nor do I want to add Copy to every string operation. This might be an argument to leave the string functions where they are. To a certain extent, strings are special, even though they really aren't. Is it too ugly to contemplate algorithms doing in-place operations on mutable arrays and returning a copy instead for immutable ones?
Jan 19 2011
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
so:

 Don't simplicity and understandability favor the in-place style for these
 types of algorithms?

Nope, functional-style code is what you are looking for :-)
 As Jesse Phillips said, it is the same as sort.

You have to think of the normal sort as a performance hack, something that is good because copying data wastes a lot of time if the array is large or if you have to sort many small arrays. Normally in Python you prefer sorted(), which returns a sorted copy, unless performance is important. I'd like something like sorted() in D too. In a program there is code that's performance-critical, and other code that doesn't change the total runtime much. Often the second kind of code is a good part of the whole program. In this part you want very short, readable, safer code, even functional-style :-) Bye, bearophile
Jan 19 2011
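A sorted() in the Python sense is easy to sketch on top of the in-place std.algorithm sort. The `sorted` function below is a hypothetical helper for illustration, not a Phobos function, and it assumes element types that can be duplicated from a const slice (such as int).

```d
import std.algorithm : sort;

// Hypothetical helper in the spirit of Python's sorted(): returns a sorted
// copy and leaves the argument untouched.
T[] sorted(T)(const(T)[] input)
{
    auto result = input.dup; // copy first...
    sort(result);            // ...then sort the copy in place
    return result;
}

void main()
{
    int[] x = [3, 2, 1];

    auto y = sorted(x);      // functional style: x is unchanged
    assert(x == [3, 2, 1]);
    assert(y == [1, 2, 3]);

    sort(x);                 // in-place style: x itself is reordered
    assert(x == [1, 2, 3]);
}
```

The copy costs one allocation per call, which is the performance trade-off described above.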
parent bearophile <bearophileHUGS lycos.com> writes:
so:

 I didn't know that, this solution is what i meant.
 So, they didn't blindly enforce functional language rules to a  
 non-functional language.

Python was designed a long time ago by Guido, who I think didn't know much about functional programming. So they first added an in-place sort() and later added a more functional sorted(). D2 is more functional than Python2, and I think the behaviour of sorted() is the better default in D2 :-) Bye, bearophile
Jan 20 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday 19 January 2011 18:23:14 Jerry Quinn wrote:
 Andrei Alexandrescu Wrote:
 On 1/19/11 6:53 PM, Jonathan M Davis wrote:
 On Wednesday, January 19, 2011 15:33:16 Andrei Alexandrescu wrote:
 I'm consolidating some routines from std.string into std.array. They
 are specialized for operating on arrays, and include the likes of
 insert, remove, replace.
 
 One question is whether operations should be performed in place or on a
 copy. For example:


 Thoughts?

Haven't we been using the approach that string operations generally make copies (in many cases slices) and marking functions that do it in place with InPlace?

Problem is, even though the example uses strings, the functions apply to all arrays.

The big difference is operating on immutable arrays vs mutable ones. For immutable arrays, you have to do copies. But mutable ones allow in-place editing. If I'm working with mutable arrays of ints, I don't want to have to type InPlace after every function and I *really* don't want the array to be copied or efficiency will go down the tubes. Nor do I want to add Copy to every string operation. This might be an argument to leave the string functions where they are. To a certain extent, strings are special, even though they really aren't. Is it too ugly to contemplate algorithms doing in-place operations on mutable arrays and returning a copy instead for immutable ones?

I'd say that yes, it's too ugly to contemplate. The reason is simple: the behavior of the function then changes drastically depending on whether the array you give it is immutable or not. - Jonathan M Davis
Jan 19 2011
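The objection can be made concrete with a small sketch. `negateAll` is a hypothetical pair of overloads invented for this example (nothing like it is being proposed for Phobos); it mutates when it can and copies when it cannot, so the same call leaves the caller's data in different states.

```d
// Hypothetical overload for mutable arrays: works in place.
int[] negateAll(int[] a)
{
    foreach (ref x; a)
        x = -x;
    return a;
}

// Hypothetical overload for immutable arrays: must return a copy.
immutable(int)[] negateAll(immutable(int)[] a)
{
    auto copy = a.dup;       // mutable int[] copy of the input
    foreach (ref x; copy)
        x = -x;
    return copy.idup;
}

void main()
{
    int[] m = [1, 2, 3];
    immutable(int)[] im = [1, 2, 3];

    negateAll(m);            // caller's array is changed
    negateAll(im);           // result is silently thrown away

    assert(m == [-1, -2, -3]);
    assert(im == [1, 2, 3]);
}
```

Both calls look identical at the call site, yet only one of them has any effect unless the return value is used.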
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/19/11 8:36 PM, so wrote:
 And honestly, from the standpoint of code simplicity and
 understandability,
 there's a lot to be said for making copies being the default rather than
 mutation. You can then use the InPlace versions if you need the boost in
 efficiency.

 - Jonathan M Davis

Don't simplicity and understandability favor the in-place style for these types of algorithms? As Jesse Phillips said, it is the same as sort.

We also have toupperInPlace and tolowerInPlace as precedents pointing the other way. Andrei
Jan 19 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, January 19, 2011 17:10:07 Andrei Alexandrescu wrote:
 On 1/19/11 6:53 PM, Jonathan M Davis wrote:
 On Wednesday, January 19, 2011 15:33:16 Andrei Alexandrescu wrote:
 I'm consolidating some routines from std.string into std.array. They are
 specialized for operating on arrays, and include the likes of insert,
 remove, replace.
 
 One question is whether operations should be performed in place or on a
 copy. For example:
 
 string s = "Mary has a lil lamb.";
 // Implicit copy
 auto s1 = replace(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");
 // Explicit in-place
 replaceInPlace(s, "lil", "li'l");
 assert(s == "Mary has a li'l lamb.");

++vote;
 So that would make copying the default behavior. Alternatively, we could
 make in-place the default behavior and ask for the Copy suffix:
 
 string s = "Mary has a lil lamb.";
 // Explicit copy
 auto s1 = replaceCopy(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");
 // Implicit in-place
 replace(s, "lil", "li'l");
 assert(s == "Mary has a li'l lamb.");

--vote;

So I guess vote stays unchanged :o).
 Thoughts?

Haven't we been using the approach that string operations generally make copies (in many cases slices) and marking functions that do it in place with InPlace?

Problem is, even though the example uses strings, the functions apply to all arrays.

True. But I would expect a string to be by far the most used type of array. So, unless you want to specialize the functions so that they work one way for strings and another way for other arrays (which sounds like a really bad idea), it would make the most sense to pick the way that's most likely to be used as the default. And since strings are the most likely case, choosing what works best for strings seems like the best idea IMHO. And honestly, from the standpoint of code simplicity and understandability, there's a lot to be said for making copies being the default rather than mutation. You can then use the InPlace versions if you need the boost in efficiency. - Jonathan M Davis
Jan 19 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/19/11 9:11 PM, Jonathan M Davis wrote:
 On Wednesday 19 January 2011 18:36:55 so wrote:
 And honestly, from the standpoint of code simplicity and
 understandability,
 there's a lot to be said for making copies being the default rather than
 mutation. You can then use the InPlace versions if you need the boost in
 efficiency.

 - Jonathan M Davis

Don't simplicity and understandability favor the in-place style for these types of algorithms? As Jesse Phillips said, it is the same as sort.

No. I'd argue that it's clearer to see stuff like

auto newStr = replace(str, "hello", "world");
auto sorted = sort(newStr);

than to see stuff like

replace(str, "hello", "world");
sort(newStr);

If you have

replace(str, "hello", "world");

you don't know whether it's changed the value in place or if you're throwing away a return value. However, if you have

auto newStr = replace(str, "hello", "world");
replaceInPlace(newStr, "world", "hello");

it's quite clear that the first one returns a value and the second one does it in place. Whereas if you have

auto newStr = replaceCopy(str, "hello", "world");
replace(newStr, "world", "hello");

the first one is clear, but the second one is only clear because seeing the first one makes it obvious that the second one must be doing something different.

This is a good argument, thanks Jonathan. Andrei
Jan 19 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 I think what might help out in D is if we had a way to mark some
 functions so the compiler guarantees that their return values *are
 not* to be discarded. For example, this code will compile:
 
 import std.stdio;
 import std.string;
 void main()
 {
     string s = "Mary has a lil lamb.";
     replace(s, "lil", "li'l");  // returns a copy, but discards it
 }
 
 If the replace function is marked with some kind of @nodiscard
 annotation, then this would be a compile error since it doesn't make
 sense to construct a new string, return it, and discard it.
 
 But maybe that's going overboard. How often do these kinds of bugs creep in?

Such bugs are common enough. GNU C has the warn_unused_result attribute (that is like your @nodiscard if you use -Werror to turn warnings into errors):
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

Some C lints require a void cast where you don't want to use a function result:

cast(void)replace(s, "lil", "li'l");

In a language where the default is different, and where you don't want to use a function result, you have to add an annotation:

@unused replace(s, "lil", "li'l");

Something like @nodiscard is more useful in C than D because in C there are no true built-in exceptions, so error return values are common, and ignoring them is a mistake. In some cases like replace() or the C realloc() ignoring a result is always a programmer error. So something like @nodiscard is useful in D too.

Bye,
bearophile
Jan 20 2011
next sibling parent reply Trass3r <un known.com> writes:
If such an annotation was introduced, it should be the other way around.
But imo discarding a return value should always result in a warning,
the function returns something for a reason.
Jan 20 2011
parent reply foobar <foo bar.com> writes:
Jonathan M Davis Wrote:

 On Thursday 20 January 2011 03:51:48 Trass3r wrote:
 If such an annotation was introduced, it should be the other way around.
 But imo discarding a return value should always result in a warning,
 the function returns something for a reason.

Actually, there are plenty of cases where you throw away the return value. A number of overloaded operators are prime examples - such as opAssign. std.algorithm.sort both sorts in place _and_ returns a sorted range (so that other algorithms can then know that the range is sorted). It's really quite easy to get legitimate cases where throwing away the return value makes perfect sense. Now, if you're calling a strongly pure function and throwing away its return value, then yes, that's definitely a bug, since the only effect of the function is its return value. Frequently however, that's not the case. Yes, you can have bugs because you didn't actually use the return value of a function, but it's not necessarily uncommon to have function calls which legitimately throw away their return value. - Jonathan M Davis

You brought up an interesting idea: D already supports purity and as you said it doesn't make sense to discard return values of such functions. Therefore, it makes sense that for pure functions, this would result in a compile time error.
Jan 20 2011
parent reply Don <nospam nospam.com> writes:
Steven Schveighoffer wrote:
 On Thu, 20 Jan 2011 10:36:00 -0500, foobar <foo bar.com> wrote:
 
 Jonathan M Davis Wrote:

 On Thursday 20 January 2011 03:51:48 Trass3r wrote:
 If such an annotation was introduced, it should be the other way around.
 But imo discarding a return value should always result in a warning,
 the function returns something for a reason.

Actually, there are plenty of cases where you throw away the return value. A number of overloaded operators are prime examples - such as opAssign. std.algorithm.sort both sorts in place _and_ returns a sorted range (so that other algorithms can then know that the range is sorted). It's really quite easy to get legitimate cases where throwing away the return value makes perfect sense. Now, if you're calling a strongly pure function and throwing away its return value, then yes, that's definitely a bug, since the only effect of the function is its return value. Frequently however, that's not the case. Yes, you can have bugs because you didn't actually use the return value of a function, but it's not necessarily uncommon to have function calls which legitimately throw away their return value. - Jonathan M Davis

You brought up an interesting idea: D already supports purity and as you said it doesn't make sense to discard return values of such functions. Therefore, it makes sense that for pure functions, this would result in a compile time error.

Pure functions no longer have that requirement. You can pass mutable references to pure functions, which makes them weak-pure. -Steve

If you don't use the return value of a strongly pure, nothrow function, you could be given an 'expression has no effect' error. Currently the function call is silently dropped.
Jan 20 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Don:

 If you don't use the return value of a strongly pure, nothrow function, 
 you could be given an 'expression has no effect' error.
 Currently the function call is silently dropped.

I have added this at the end of the enhancement request 5464 (but the error message is different). Bye, bearophile
Jan 20 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday 20 January 2011 03:51:48 Trass3r wrote:
 If such an annotation was introduced, it should be the other way around.
 But imo discarding a return value should always result in a warning,
 the function returns something for a reason.

Actually, there are plenty of cases where you throw away the return value. A number of overloaded operators are prime examples - such as opAssign. std.algorithm.sort both sorts in place _and_ returns a sorted range (so that other algorithms can then know that the range is sorted). It's really quite easy to get legitimate cases where throwing away the return value makes perfect sense. Now, if you're calling a strongly pure function and throwing away its return value, then yes, that's definitely a bug, since the only effect of the function is its return value. Frequently however, that's not the case. Yes, you can have bugs because you didn't actually use the return value of a function, but it's not necessarily uncommon to have function calls which legitimately throw away their return value. - Jonathan M Davis
Jan 20 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 20 Jan 2011 10:36:00 -0500, foobar <foo bar.com> wrote:

 Jonathan M Davis Wrote:

 On Thursday 20 January 2011 03:51:48 Trass3r wrote:
 If such an annotation was introduced, it should be the other way around.
 But imo discarding a return value should always result in a warning,
 the function returns something for a reason.

Actually, there are plenty of cases where you throw away the return value. A number of overloaded operators are prime examples - such as opAssign. std.algorithm.sort both sorts in place _and_ returns a sorted range (so that other algorithms can then know that the range is sorted). It's really quite easy to get legitimate cases where throwing away the return value makes perfect sense. Now, if you're calling a strongly pure function and throwing away its return value, then yes, that's definitely a bug, since the only effect of the function is its return value. Frequently however, that's not the case. Yes, you can have bugs because you didn't actually use the return value of a function, but it's not necessarily uncommon to have function calls which legitimately throw away their return value. - Jonathan M Davis

You brought up an interesting idea: D already supports purity and as you said it doesn't make sense to discard return values of such functions. Therefore, it makes sense that for pure functions, this would result in a compile time error.

Pure functions no longer have that requirement. You can pass mutable references to pure functions, which makes them weak-pure. -Steve
Jan 20 2011
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/20/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 On Thursday 20 January 2011 03:51:48 Trass3r wrote:
 If such an annotation was introduced, it should be the other way around.
 But imo discarding a return value should always result in a warning,
 the function returns something for a reason.

Actually, there are plenty of cases where you throw away the return value.

Yeah. There are functions that return a value and also have side effects. An example might be a class method that modifies its private fields and returns the number of fields that were affected. While you might not need the return value in most cases, you do want the side effects to happen. That's why forcing an error on every function whose return value isn't used would not be a good idea, and that's where the annotation idea comes from.
Jan 20 2011
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 If the replace function is marked with some kind of @nodiscard
 annotation, then this would be a compile error since it doesn't make
 sense to construct a new string, return it, and discard it.

http://d.puremagic.com/issues/show_bug.cgi?id=5464 Bye, bearophile
Jan 20 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 01/20/2011 11:31 AM, bearophile wrote:
 Andrej Mitrovic:

 I think what might help out in D is if we had a way to mark some
 functions so the compiler guarantees that their return values *are
 not* to be discarded. For example, this code will compile:

 import std.stdio;
 import std.string;
 void main()
 {
      string s = "Mary has a lil lamb.";
      replace(s, "lil", "li'l");  // returns a copy, but discards it
 }

 If the replace function is marked with some kind of @nodiscard
 annotation, then this would be a compile error since it doesn't make
 sense to construct a new string, return it, and discard it.

 But maybe that's going overboard. How often do these kinds of bugs creep in?

 Such bugs are common enough. GNU C has the warn_unused_result attribute (that is like your @nodiscard if you use -Werror to turn warnings into errors):
 http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

 Some C lints require a void cast where you don't want to use a function result:

 cast(void)replace(s, "lil", "li'l");

 In a language where the default is different, and where you don't want to use a function result, you have to add an annotation:

 @unused replace(s, "lil", "li'l");

 Something like @nodiscard is more useful in C than D because in C there are no true built-in exceptions, so error return values are common, and ignoring them is a mistake. In some cases like replace() or the C realloc() ignoring a result is always a programmer error. So something like @nodiscard is useful in D too.

But I thought D had such a feature already. I'm probably confused, but I think I've had a compiler warning in precisely such cases (ignoring a func result).

denis
_________________
vita es estrany
spir.wikidot.com
Jan 20 2011
prev sibling next sibling parent Trass3r <un known.com> writes:
 If you have replace(str, "hello", "world");
 you don't know whether it's changed the value in place or if you're
 throwing away a return value.

 auto newStr = replace(str, "hello", "world");
 replaceInPlace(newStr, "world", "hello");
 it's quite clear that the first one returns a value and the second one
 does it in place.

Very true. Imho function names would also be more understandable this way, because xInPlace is unambiguous while xCopy might lead to confusion (at least I could imagine a stranger misinterpreting replaceCopy etc.)
Jan 20 2011
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
so:

 I don't understand how the first two are clear and the last two are not so.
 Where both have the name "replace" for different things, and replace to me  
 means "replace in place".
 With this in hand, how is the first "replace" is quite clear?

In Python I am used to immutable strings, so string methods like replace return a modified copy. D1 string functions are similar. I'd like D2 to be like Python here, but in practice an in-place replace procedure and a strongly-pure replace function that returns a modified copy are about equally clear :-) Yet, if you perform many in-place operations on strings you may get confused (it happened to me), such confusion is less common with functional-style string functions. Bye, bearophile
Jan 20 2011
prev sibling next sibling parent so <so so.do> writes:
 And honestly, from the standpoint of code simplicity and  
 understandability,
 there's a lot to be said for making copies being the default rather than
 mutation. You can then use the InPlace versions if you need the boost in
 efficiency.

 - Jonathan M Davis

Don't simplicity and understandability favor the in-place style for these types of algorithms? As Jesse Phillips said, it is the same as sort.
Jan 19 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday 19 January 2011 18:36:55 so wrote:
 And honestly, from the standpoint of code simplicity and
 understandability,
 there's a lot to be said for making copies being the default rather than
 mutation. You can then use the InPlace versions if you need the boost in
 efficiency.
 
 - Jonathan M Davis

Don't simplicity and understandability favor the in-place style for these types of algorithms? As Jesse Phillips said, it is the same as sort.

No. I'd argue that it's clearer to see stuff like

auto newStr = replace(str, "hello", "world");
auto sorted = sort(newStr);

than to see stuff like

replace(str, "hello", "world");
sort(newStr);

If you have

replace(str, "hello", "world");

you don't know whether it's changed the value in place or if you're throwing away a return value. However, if you have

auto newStr = replace(str, "hello", "world");
replaceInPlace(newStr, "world", "hello");

it's quite clear that the first one returns a value and the second one does it in place. Whereas if you have

auto newStr = replaceCopy(str, "hello", "world");
replace(newStr, "world", "hello");

the first one is clear, but the second one is only clear because seeing the first one makes it obvious that the second one must be doing something different.

And even then, I'd argue that the name replaceCopy is more ambiguous than replaceInPlace. I think that it's far more likely that a function xCopy is going to have possible alternate meanings than xInPlace would, since not only is copy both a verb and a noun, but it can be used in a lot more situations, whereas InPlace is pretty limited and thus clear. Not to mention, if a function says copy, that implies that it's actually _copying_ rather than slicing, whereas many xCopy functions would actually be slicing rather than copying. So, using Copy in the name is actually ambiguous _regardless_ of what the first part of the function name is.

In functional languages, it's _required_ that a function return the changed value instead of changing the one passed in. You're far less likely to accidentally mutate stuff if you program that way, even if you're not dealing with immutable or const values. I think that code is much cleaner if you program in a functional style. The problem is, of course, that you can't constantly be copying everything all the time, because there's a definite performance hit for doing that. So, you have functions which make changes in place when you need to do that. So, I'd argue that it's generally better to program using a functional style if you can and then use mutation if necessary for performance. x and xInPlace support that while xCopy and x do not.

However, I think that the biggest argument in favor of using x and xInPlace is that strings are by far the most used type of array, and they _need_ to use the version which makes a copy or slices the array. So, the x / xInPlace naming scheme would result in x being used more than xInPlace, whereas xCopy / x would result in xCopy being used the most. And I really think that the shorter version should be the one which is going to be used the most. Not to mention, that's the way that the string functions have been done thus far, so sticking to x / xInPlace will break less code.

- Jonathan M Davis
Jan 19 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
so:

 Just check boost/string/replace, they have in place replaces default too.  
 You might not like boost (some don't) but it is the closest example to D.

You will find that the D1 string functions come much more from here than from Boost:
http://docs.python.org/release/2.5.2/lib/string-methods.html

Bye, bearophile
Jan 21 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 01/21/2011 07:47 PM, so wrote:
 replace is clearer in the first case, because you're getting the
 return value.
 ...

I am really trying hard to understand this, but your reasons for why the first is clearer than the second make no sense to me, I am sorry. I still think the second is clearer, but whatever; as long as I can see the interface or the doc, I am fine.

string replace(string, ...);
void replace(ref string, ...);
 Regardless, I don't see anything wrong with naming functions in a
 manner that
 implies that a functional style is the default

I am not against enforcing such a rule; I am against doing it implicitly and working with assumptions. Just check boost/string/replace: they have in-place replaces as the default too. You might not like Boost (some don't), but it is the closest example to D.

Without any additional information, I would necessarily assume replace performs an /action/ because it's an action verb: meaning it changes the argument. Like 'so', I cannot understand the converse reasoning. If you want people to guess that a true function returns a result, just name it according to its result: replacedString, or just replaced. Nobody, I guess, would ever think that a routine called replacedString acts in-place.

Denis
_________________
vita es estrany
spir.wikidot.com
Jan 21 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
One common mistake newbies make in Python is calling the sorted method
and expecting it to sort in place:

>>> x = [3, 2, 1]
>>> sorted(x)
[1, 2, 3]
>>> x
[3, 2, 1]

There are a few functions in the Python lib that have "InPlace" added to their names to avoid confusion, so it's not a new convention and it seems like a good way to go.
Jan 19 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/20/11, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 One common mistake newbies make in Python is calling the sorted method
 and expecting it to sort in place:

 >>> x = [3, 2, 1]
 >>> sorted(x)
 [1, 2, 3]
 >>> x
 [3, 2, 1]

There are a few functions in the Python lib that have "InPlace" added to their names to avoid confusion, so it's not a new convention and it seems like a good way to go.

What I meant by the first sentence is that due to the interpreter outputting the sorted list, a newbie might think that x was sorted, so he uses it in his own code until he notices the bug.

I think what might help out in D is if we had a way to mark some functions so the compiler guarantees that their return values *are not* to be discarded. For example, this code will compile:

import std.stdio;
import std.string;

void main()
{
    string s = "Mary has a lil lamb.";
    replace(s, "lil", "li'l"); // returns a copy, but discards it
}

If the replace function is marked with some kind of nodiscard annotation, then this would be a compile error, since it doesn't make sense to construct a new string, return it, and discard it. But maybe that's going overboard. How often do these kinds of bugs creep in?
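
For what it's worth, a rough sketch of how such a check can look in present-day D, assuming a compiler recent enough to ship the `@mustuse` attribute in `core.attribute` (DMD 2.100 or later). That attribute applies to struct and union types rather than to functions, so the wrapper type `Replaced` and the function `replaceChecked` are hypothetical names made up for illustration.

```d
import core.attribute : mustuse;
import std.array : replace;

// Hypothetical wrapper type: a value of a @mustuse type may not be silently discarded.
@mustuse struct Replaced
{
    string value;
}

// Hypothetical copy-returning replace whose result the caller has to use.
Replaced replaceChecked(string s, string from, string to)
{
    return Replaced(replace(s, from, to));
}

void main()
{
    string s = "Mary has a lil lamb.";

    // replaceChecked(s, "lil", "li'l"); // discarding the result like this is rejected at compile time

    auto r = replaceChecked(s, "lil", "li'l");
    assert(s == "Mary has a lil lamb.");        // original untouched
    assert(r.value == "Mary has a li'l lamb."); // the copy carries the change
}
```

The cost of this approach is the extra wrapper type at every call site, which is one reason it may indeed be going overboard for plain string functions.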
Jan 19 2011
prev sibling next sibling parent so <so so.do> writes:
 auto newStr = replace(str, "hello", "world");
 replaceInPlace(newStr, "world", "hello");

 it's quite clear that the first one returns a value and the second
 one does
 it in place. Whereas if you have

 auto newStr = replaceCopy(str, "hello", "world");
 replace(newStr, "world", "hello");

 the first one is clear, but the second one is only clear because seeing  
 the first
 one makes it obvious that the second one must be doing something  
 different.

I don't understand how the first two are clear and the last two are not. Both use the name "replace" for different things, and replace to me means "replace in place". With this in hand, how is the first "replace" quite clear? I am sure this is the case for many people. The problem is the naming here. If you had named it something like "replaced" and returned a copy, it would be obvious and clear.

Here, aren't you just dictating functional language rules to a multi-paradigm language, implicitly? In a fully functional language "replace(something)" might mean "replace and give me a copy", but it is not what we have.
Jan 20 2011
prev sibling next sibling parent so <so so.do> writes:
 You have to think of the normal sort as a performance hack, something  
 that is good because copying data wastes a lot of time, if the array is  
 large or if you have to sort an many small arrays. Normally in Python  
 you prefer sorted(), that returns a sorted copy, unless performance is  
 important. I'd like something like sorted() in D too.

I didn't know that; this solution is what I meant. So, they didn't blindly impose functional language rules on a non-functional language.
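
A minimal sketch of what a Python-style sorted() could look like in D, assuming it is built on the in-place `std.algorithm.sort`. The helper name `sorted` is hypothetical and is fixed to `int` elements to keep the example short.

```d
import std.algorithm : sort;

// Hypothetical helper: returns a sorted copy and leaves the input alone.
int[] sorted(const(int)[] input)
{
    int[] copy = input.dup; // duplicate first, so the caller's array is not touched
    sort(copy);             // std.algorithm.sort works in place on the copy
    return copy;
}

void main()
{
    int[] x = [3, 2, 1];

    auto y = sorted(x);
    assert(y == [1, 2, 3]);
    assert(x == [3, 2, 1]); // x is unchanged by the copying version

    sort(x);                // the in-place version mutates x itself
    assert(x == [1, 2, 3]);
}
```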
Jan 20 2011
prev sibling next sibling parent Justin Johansson <jj nospam.com> writes:
On 20/01/11 10:33, Andrei Alexandrescu wrote:
 I'm consolidating some routines from std.string into std.array. They are
 specialized for operating on arrays, and include the likes of insert,
 remove, replace.

 One question is whether operations should be performed in place or on a
 copy. For example:

Though your question has already prompted a number of answers, are you sure that your question *saliently* poses the problem to be answered? In short, work on stating the problem as succinctly as you can, rather than asking for answers that shoot from the hip. Cheers, Justin Johansson
Jan 20 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, January 20, 2011 05:48:12 so wrote:
 auto newStr = replace(str, "hello", "world");
 replaceInPlace(newStr, "world", "hello");
 
 it's quite clear that the first one returns a value and the second
 one does
 it in place. Whereas if you have
 
 auto newStr = replaceCopy(str, "hello", "world");
 replace(newStr, "world", "hello");
 
 the first one is clear, but the second one is only clear because seeing
 the first
 one makes it obvious that the second one must be doing something
 different.

I don't understand how the first two are clear and the last two are not. Both use the name "replace" for different things, and replace to me means "replace in place". With this in hand, how is the first "replace" quite clear? I am sure this is the case for many people. The problem is the naming here. If you had named it something like "replaced" and returned a copy, it would be obvious and clear.

Here, aren't you just dictating functional language rules to a multi-paradigm language, implicitly? In a fully functional language "replace(something)" might mean "replace and give me a copy", but it is not what we have.

replace is clearer in the first case, because you're getting the return value. If you don't get the return value, then it's not immediately clear whether it's replacing "world" with "hello" in the return value or whether the function is void and "world" is being replaced in the original string (though the fact that we're dealing with strings here means that it _can't_ alter the original string - it's more of a question when dealing with arrays with mutable elements).

Also, replaced would just be downright confusing to me, since it's not a verb. I'd expect it to be some sort of boolean test for whether something had been replaced, though that doesn't make a whole lot of sense in the context. I expect functions to be verbs unless checking state. Now, as I understand it, python uses past participles such as replaced and sorted, but having never programmed in python, I'm not particularly familiar with that naming scheme and it would really throw me off at first.

Regardless, I don't see anything wrong with naming functions in a manner that implies that a functional style is the default - particularly when we're talking about strings, which pretty much _have_ to be used in a functional style, because their elements are immutable. Andrei is essentially asking us whether the default behavior of an array function should typically be to return the changed value or to change it in place, with the longer name going to the function which has the other behavior. And since strings _have_ to be copied/sliced, and strings are generally going to be the most common type of array used, it would make sense to make the default behavior be copying/slicing, making the functions which alter arrays in place have InPlace in their name.

- Jonathan M Davis
Jan 20 2011
parent spir <denis.spir gmail.com> writes:
On 01/21/2011 09:21 PM, Jonathan M Davis wrote:
 The issue is when you don't look at the documentation or trying to avoid having
 to look at the documentation. If you see

 auto result = replace(str, "hello", "goodbye");

 it's quite clear that a copy is taking place. And if a copy/slice is taking
 place, then that is what you would normally see when replace is used. However,
 if replace alters the array in place, then

 replace(str, "hello", "goodbye");

 would be what you would normally see. And without looking at the documentation,
 it's not clear whether that is doing it in-place or if you're throwing away the
 return value. However, in the case where replace does a copy/slice, it _is_
 clear, because the return value is saved.

I don't follow you here. You use in your reasoning the particularity of C-like funcs which can be both proper functions and action routines. Indeed, as you say, one can throw away a result after calling a routine which is mainly a function, but for a side-effect; right. But the same applies conversely: one can well call a routine which is mainly an action (in this case, that operates in-place) and returns whatever outcome flag, so that:

auto result = replace(str, "hello", "goodbye");

actually operates in-place. Which is consistent with its name, an action verb suggesting an action. Replace could e.g. return the number of replacements performed (actually useful, what do you think?). Without more information, and guessing from the name, that is precisely what I would think (and try to imagine what meta-info replace returns).

Do not misinterpret: I actually support the choice of making return/copy the default (where both would make sense), because it's safer. But since we are changing many names, why not avoid misleading ones, precisely for the default case?

Denis
_________________
vita es estrany
spir.wikidot.com
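
To make that reading concrete, here is a rough sketch of a routine that operates in place and returns the number of replacements. The name `replaceCounting` and its signature are hypothetical, written by hand for illustration rather than taken from Phobos.

```d
// Hypothetical action routine: rewrites s in place (by rebinding the slice)
// and returns how many replacements were performed.
size_t replaceCounting(ref char[] s, const(char)[] from, const(char)[] to)
{
    assert(from.length > 0, "cannot replace an empty pattern");

    char[] result;
    size_t count = 0;
    size_t i = 0;
    while (i < s.length)
    {
        if (i + from.length <= s.length && s[i .. i + from.length] == from)
        {
            result ~= to;       // emit the replacement text
            i += from.length;   // skip past the matched pattern
            ++count;
        }
        else
        {
            result ~= s[i];     // copy one unmatched character
            ++i;
        }
    }
    s = result;
    return count;
}

void main()
{
    char[] str = "hello, hello".dup;
    auto result = replaceCounting(str, "hello", "goodbye");
    assert(result == 2);               // the return value is a count, not a new string
    assert(str == "goodbye, goodbye"); // the argument itself was changed
}
```

Note that the call site reads exactly like the copying form, `auto result = ...`, which is the ambiguity being pointed out.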
Jan 21 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 01/20/2011 12:33 AM, Andrei Alexandrescu wrote:
 I'm consolidating some routines from std.string into std.array. They are
 specialized for operating on arrays, and include the likes of insert,
 remove, replace.

 One question is whether operations should be performed in place or on a
 copy. For example:

 string s = "Mary has a lil lamb.";
 // Implicit copy
 auto s1 = replace(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");
 // Explicit in-place
 replaceInPlace(s, "lil", "li'l");
 assert(s == "Mary has a li'l lamb.");

 So that would make copying the default behavior. Alternatively, we could
 make in-place the default behavior and ask for the Copy suffix:

 string s = "Mary has a lil lamb.";
 // Explicit copy
 auto s1 = replaceCopy(s, "lil", "li'l");
 assert(s == "Mary has a lil lamb.");
 // Implicit in-place
 replace(s, "lil", "li'l");
 assert(s == "Mary has a li'l lamb.");


 Thoughts?

I have thought about these issues (there are several playing together) in other languages. The first problem is indeed that both operations may often be useful. If you define it to operate in-place, then when the user instead wants a new element, they need to copy first:

col2 = col1;
col2.sort();

If instead you define it to create a new element, then conversely when the user wants it to operate in-place, they need to reassign:

col1 = col1.sorted;

The second point is how to hint the user to the actual semantics, and avoid possibly nasty bugs. It's mainly a question of naming. I have decided to follow once and for all the guideline below:

* In-place modification is an action, thus its name is an action verb, like "sort" (indeed, English is very often ambiguous; in such cases, the verbal sense takes precedence, else add some more words).
* Creating a new element is a function in the pure, math sense of the word (not the C sense); name it after what it creates. Usually, a simple adjective does the job, else add a noun: "sorted", "sortedTable", "sortedList".
* Never mix both action & function in the same routine (except for signaling errors in a language without any exception system).

It is often worth having both operations; the difference in naming makes this easy to manage. When having both is overkill, I decided to return a new element for methods operating globally, and modify in-place for methods operating at the level of element(s). The reason is that the first ones are usually costly, so it's worth using the safer functional scheme (and copying sometimes allows a faster algo), while creating a whole new collection after any minimal change on element(s) is obviously not very efficient.

These questions, as taken implicitly in this thread, mostly concern collections. Now, the case of strings, chosen as the initial example, is as always very particular. For this reason I'm not a fan of the policy of using the same methods as for (other) arrays, except in cases where it's obvious. D strings are even more particular by having immutable elements. Well... My 2 cents.

Denis
_________________
vita es estrany
spir.wikidot.com
Jan 20 2011
prev sibling next sibling parent "Akakima" <akakima33 gmail.com> writes:
Is it ok to use:

In place:

trim( string )
replace( string, from, to )

or Copy:

trim( string, outstring )
replace( string, from, to, outstring )
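
A rough sketch of the second form in D, where an `out` parameter plays the role of outstring. The function name `replaceOut` is hypothetical; the only library call used is `std.array.replace`.

```d
import std.array : replace;

// Hypothetical "copy" form: the result is delivered through an out parameter
// and the input string is left untouched.
void replaceOut(string s, string from, string to, out string outstring)
{
    outstring = replace(s, from, to);
}

void main()
{
    string s = "Mary has a lil lamb.";
    string result;

    replaceOut(s, "lil", "li'l", result);
    assert(s == "Mary has a lil lamb.");       // input unchanged
    assert(result == "Mary has a li'l lamb."); // output written to the extra argument
}
```

One trade-off of this shape is that the result can no longer be used directly inside a larger expression, since it needs a named variable first.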
Jan 20 2011
prev sibling next sibling parent reply so <so so.do> writes:
 replace is clearer in the first case, because you're getting the return  
 value.
 ...

I am really trying hard to understand this, but your reasons for why the first is clearer than the second make no sense to me, I am sorry. I still think the second is clearer, but whatever; as long as I can see the interface or the doc, I am fine.

string replace(string, ...);
void replace(ref string, ...);
 Regardless, I don't see anything wrong with naming functions in a manner
 that
 implies that a functional style is the default

I am not against enforcing such a rule; I am against doing it implicitly and working with assumptions. Just check boost/string/replace: they have in-place replaces as the default too. You might not like Boost (some don't), but it is the closest example to D.
Jan 21 2011
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, January 21, 2011 12:15:42 spir wrote:
 On 01/21/2011 07:47 PM, so wrote:
 replace is clearer in the first case, because you're getting the
 return value.
 ...

I am really trying hard to understand this, but your reasons for why the first is clearer than the second make no sense to me, I am sorry. I still think the second is clearer, but whatever; as long as I can see the interface or the doc, I am fine.

string replace(string, ...);
void replace(ref string, ...);
 Regardless, I don't see anything wrong with naming functions in a
 manner that
 implies that a functional style is the default

I am not against enforcing such a rule; I am against doing it implicitly and working with assumptions. Just check boost/string/replace: they have in-place replaces as the default too. You might not like Boost (some don't), but it is the closest example to D.

Without any additional information, I would necessarily assume replace performs an /action/ because it's an action verb: meaning it changes the argument. Like 'so', I cannot understand the converse reasoning. If you want people to guess that a true function returns a result, just name it according to its result: replacedString, or just replaced. Nobody, I guess, would ever think that a routine called replacedString acts in-place.

The fact that a function performs an action has nothing to do with whether it alters its arguments or just returns a value. It could be either. In functional languages, functions _must_ return a result and _can't_ alter their arguments. Many, many functions are written that way in pretty much _all_ languages. In fact, I'd argue that the _normal_ case is that you pass arguments to a function, and it returns a result without altering the arguments. It's only when you get into reference types that that changes. And, of course, arrays are reference types (albeit somewhat special ones). But since non-reference type arguments never get altered, and many functions with reference type arguments don't alter their arguments (in fact, I'd argue that _most_ functions don't alter their arguments - regardless of whether they're reference types or not), _not_ altering the arguments would be what you would typically expect of a function unless the name made it obvious that that wasn't the case, or what the function did made it obvious, or you read the documentation and then _knew_ what it did.

And honestly, I find the whole python thing of using the past participle to indicate that the result is returned rather than done in place just weird. I'd expect a function like sorted to give me a boolean result telling me whether a range is sorted, _not_ to return a sorted version of the range that you gave it. I expect function names to be verbs, not past participles. Now, as bizarre as that convention may be, it could make functions clearer if you know about the convention and it is followed. However, as someone who has never dealt with code written that way, reading code that was written that way would be rather confusing at first.

In any case, I'd argue that having a function _not_ alter its arguments is the typical default case of functions in general, so assuming that a function alters in place just because you passed it an array seems odd to me.

- Jonathan M Davis
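
A small sketch of the distinction drawn above between value arguments, slices with mutable elements, and strings. The function names `bump` and `bumpAll` are made up for illustration.

```d
// Value type: the callee works on its own copy of the argument.
void bump(int n)
{
    n += 1;
}

// Slice with mutable elements: the callee can change the caller's elements.
void bumpAll(int[] a)
{
    foreach (ref e; a)
        e += 1;
}

void main()
{
    int n = 1;
    bump(n);
    assert(n == 1);          // the caller's value is untouched

    int[] a = [1, 2, 3];
    bumpAll(a);
    assert(a == [2, 3, 4]);  // the caller's elements were changed

    // string is immutable(char)[]: writing to an element does not compile,
    // so any "replace" on a string has to hand back a new value.
    string s = "abc";
    static assert(!__traits(compiles, s[0] = 'x'));
    assert(s == "abc");
}
```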
Jan 21 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, January 21, 2011 10:47:01 so wrote:
 replace is clearer in the first case, because you're getting the return
 value.
 ...

I am really trying hard to understand this, but your reasons for why the first is clearer than the second make no sense to me, I am sorry. I still think the second is clearer, but whatever; as long as I can see the interface or the doc, I am fine.

string replace(string, ...);
void replace(ref string, ...);

The issue is when you don't look at the documentation or are trying to avoid having to look at the documentation. If you see

auto result = replace(str, "hello", "goodbye");

it's quite clear that a copy is taking place. And if a copy/slice is taking place, then that is what you would normally see when replace is used. However, if replace alters the array in place, then

replace(str, "hello", "goodbye");

would be what you would normally see. And without looking at the documentation, it's not clear whether that is doing it in-place or if you're throwing away the return value. However, in the case where replace does a copy/slice, it _is_ clear, because the return value is saved.

So, if copying/slicing is the default, then you won't _need_ to read the documentation to know whether a copy/slice is happening or whether it's happening in-place, because the code itself will make it obvious (unless you screwed up and forgot to assign the return value to a variable or pass it to a function). But in the case where in-place is the default, it is _not_ obvious by reading the code. Sure, once you read the documentation, you'll know. But you have to read the documentation. So, copying/slicing by default is immediately obvious whereas in-place is not.
 Regardless, I don't see anything wrong with naming functions in a manner
 that
 implies that a functional style is the default

I am not against enforcing such a rule; I am against doing it implicitly and working with assumptions. Just check boost/string/replace: they have in-place replaces as the default too. You might not like Boost (some don't), but it is the closest example to D.

If you want consistency among your functions, then you have to pick either copying or in place as the default. That doesn't necessarily mean that _all_ functions must _always_ be named that way (e.g. the current behavior of sort is an interesting example since it does _both_). However, if you're going for consistency, then you have to pick one or the other. Unless you want to explicitly put Copy and InPlace in all of the array functions and not have any without it, you're going to _have_ to deal with the fact that a function without Copy or InPlace in its name is still going to have to do one or the other (unless you're talking about a function which is just querying something about an array rather than manipulating it - like cmp).

So, when you have a function like replace, you have to choose whether it's going to do it in place or copy/slice the array. A different version of the function with a different name (such as replaceCopy or replaceInPlace) then deals with the other case. Phobos has already been going for the default of copying/slicing rather than doing it in-place. Given that strings _have_ to be copied or sliced and that strings are the most common type of array, making copying/slicing the default makes good sense.

It's fine if Boost wants to pick in-place as the default. That's their choice. They're also dealing with a different programming language with different pros and cons. Personally, I prefer that copying/slicing be the default if it's efficient enough to do so, since that promotes a functional style of programming, which is going to tend to be more straightforward and less error-prone, but if in-place mutation was going to be the normal use case (as is probably the case with Boost), then it's probably better to make in-place the norm, because that's the way that's going to be used most. However, since that's _not_ the way that's likely to be used most in D (due to strings having immutable elements), I really don't think that in-place as the default makes the most sense for D.

- Jonathan M Davis
Jan 21 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, January 21, 2011 12:48:57 spir wrote:
 On 01/21/2011 09:21 PM, Jonathan M Davis wrote:
 The issue is when you don't look at the documentation or trying to avoid
 having to look at the documentation. If you see
 
 auto result = replace(str, "hello", "goodbye");
 
 it's quite clear that a copy is taking place. And if a copy/slice is
 taking place, then that is what you would normally see when replace is
 used. However, if replace alters the array in place, then
 
 replace(str, "hello", "goodbye");
 
 would be what you would normally see. And without looking at the
 documentation, it's not clear whether that is doing it in-place or if
 you're throwing away the return value. However, in the case where
 replace does a copy/slice, it _is_ clear, because the return value is
 saved.

I don't follow you here. You use in your reasoning the particularity of C-like funcs which can be both proper functions and action routines. Indeed, as you say, one can throw away a result after calling a routine which is mainly a function, but for a side-effect; right. But the same applies conversely: one can well call a routine which is mainly an action (in this case, that operates in-place) and returns whatever outcome flag, so that:

auto result = replace(str, "hello", "goodbye");

actually operates in-place. Which is consistent with its name, an action verb suggesting an action. Replace could e.g. return the number of replacements performed (actually useful, what do you think?). Without more information, and guessing from the name, that is precisely what I would think (and try to imagine what meta-info replace returns).

Do not misinterpret: I actually support the choice of making return/copy the default (where both would make sense), because it's safer. But since we are changing many names, why not avoid misleading ones, precisely for the default case?

Sure, you can always come up with more exotic stuff that the return value could do, but I would expect that your average programmer would think that

auto result = replace(str, "hello", "goodbye");

made a copy of the string with "hello" having been replaced with "goodbye" in the return value rather than in-place. Stuff like returning the number of replacements made is less typical, and I wouldn't expect that to be what a programmer would initially expect the function to do. Obviously, you're going to have to look at the documentation to be sure regardless of what the function actually does, but in this case, the obvious answer would be the correct one.

I really don't find having functions returning results without altering their arguments as the normal case to be odd at all, let alone misleading, since that's what most functions actually do. True, it becomes more ambiguous once you're dealing with reference types like arrays, and ultimately, you have to read the documentation to be sure, but the most typical case is for a function to take in a set of arguments and return a result without altering those arguments. I see no reason to change that just because you're dealing with a reference type. I find replace to be perfectly clear as it is.

- Jonathan M Davis
Jan 21 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 01/21/2011 10:03 PM, Jonathan M Davis wrote:
 I really don't find having functions returning results without altering their
 arguments as the normal case to be odd at all, let alone misleading, since
 that's what most functions actually do.

Same for me. I don't find having this version as the normal case odd at all, either. I just find using action verbs to denote that misleading; e.g. sort(array) so-to-say "naturally" means "_sort_ this array", not "gimme a new _sorted_ array". Many programmers name a function whose (main) purpose is to construct a new element after said element (not only me, I copied this practice from others). They are right on this: it's highly informative and never misleading (except for issues inherent to English). Then, using action verbs for action functions is, in contrast, also sensible:

writeReport(reportData);

Denis
_________________
vita es estrany
spir.wikidot.com
Jan 21 2011