digitalmars.D.learn - string vs. const(char)[] on the function signature

Ali Çehreli <acehreli yahoo.com> writes:
tl;dr - Please see the conclusion section below.

Although the question is valid for other reference types as well, I will 
only use strings in this post.

The question is whether a function should take a string parameter as 
'string' or as 'const(char)[]'.

string on the function API is a restriction: Such a function wants the 
caller to provide an immutable string:

void foo(string s)  // <-- too restrictive
{
     // ... doesn't mutate s ...
}

void main()
{
     char[] s;
     foo(s);  // <-- ERROR: not immutable
}

The caller can copy the string before calling the function but that 
would only be to satisfy the limitation that the immutable parameter brings.
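As a minimal sketch of that call-site copy, the caller can use the built-in .idup property. The body of foo() here is only a placeholder that returns the length so the call has something observable:

```d
// Placeholder for a function that insists on an immutable string.
size_t foo(string s)
{
    return s.length;
}

void main()
{
    char[] s = "hello".dup;

    // Passing the mutable array directly does not compile.
    static assert(!__traits(compiles, foo(s)));

    // .idup allocates an immutable copy just to satisfy the signature.
    assert(foo(s.idup) == 5);
}
```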

So, a better parameter type is 'const char[]' (or const(char)[]) because 
it can be bound to both mutable and immutable:

void foo(const char[] s)  // <-- welcoming
{
     // ... doesn't mutate s ...
}

void main()
{
     char[] s;
     foo(s);               // <-- works

     foo("");              // <-- works
}

So far so good.

Unfortunately, a problem arises when foo() decides to make a call to a 
function that really (or unnecessarily) needs an immutable string:

void bar(string s)  // <-- really needs immutable
{
     // ...
}

void foo(const char[] s)
{
     // ... doesn't mutate s ...

     bar(s);  // <-- ERROR: not immutable
}

A solution is to make an immutable copy of the string before calling 
bar(), but that would be the same inefficiency as how the original 
caller could make a copy before calling foo():

import std.conv;

void foo(const char[] s)
{
     // ... doesn't mutate s ...

     bar(to!string(s));  // <-- sometimes an unnecessary copy
}

Although to!string is a no-op when its argument is already a string, 
the call above always makes a copy because the 'const char[]' (or 
const(char)[]) parameter erases the mutability of the original string. 
Inside foo(), the compiler cannot know whether the caller's string was 
mutable or immutable.
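The difference can be observed by comparing .ptr values. This is a small sketch; the variable names are made up for illustration:

```d
import std.conv : to;

void main()
{
    string imm = "hello";

    // The same characters, seen through a const slice.
    const(char)[] viewOfImm = imm;

    // string -> string: no copy, the same memory is returned.
    assert(to!string(imm).ptr is imm.ptr);

    // const(char)[] -> string: a fresh immutable copy is made, even
    // though the underlying data happened to be immutable all along.
    assert(to!string(viewOfImm).ptr !is imm.ptr);
}
```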

In order to retain the mutability information, one can think of inout. 
Unfortunately, conditional compilation as in the following code can't 
work with inout:

import std.stdio;

void foo(inout(char)[] s)
{
     // ... doesn't mutate s ...

     // NOT A SOLUTION
     static if (is (typeof(s[0]) == immutable)) {
         writeln("immutable");

     } else {
         writeln("mutable");
     }
}

This is not surprising: since inout is not a template, the compiler 
generates a single function body for mutable, const, and immutable 
arguments. To support all three, the function body cannot mutate such 
parameters.
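For contrast, here is a sketch of what inout does provide: the qualifier of the argument is carried over to the return type at the call site, even though there is only one function body. The helper firstWord() is hypothetical:

```d
// Hypothetical helper: returns the part of s before the first space.
inout(char)[] firstWord(inout(char)[] s)
{
    foreach (idx, c; s)
    {
        if (c == ' ')
            return s[0 .. idx];
    }
    return s;
}

void main()
{
    char[] mutableText = "hello world".dup;
    const(char)[] constText = mutableText;
    string immutableText = "hello world";

    // The return type follows the qualifier of the argument.
    static assert(is(typeof(firstWord(mutableText)) == char[]));
    static assert(is(typeof(firstWord(constText)) == const(char)[]));
    static assert(is(typeof(firstWord(immutableText)) == string));

    assert(firstWord(immutableText) == "hello");
}
```

The caller gets its own qualifier back, but inside firstWord() the element type is always inout(char), which is why the static if test above cannot distinguish the cases.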

Another way of retaining the mutability information is using templates. 
Since this time the actual "type" would be erased, we can (and should) 
use a template constraint to accept only strings:

import std.traits;

void foo(T)(T[] s)
     if (is(Unqual!T == char))
{
     // ... doesn't mutate s ...
}

(Note: Of course the constraint could also be isSomeChar!T, but I wanted 
to continue with the same example that uses 'char'.)
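A sketch of the isSomeChar!T variant, assuming a made-up function countUnits() that simply returns the number of code units:

```d
import std.traits : isSomeChar;

// Hypothetical function: accepts arrays of char, wchar, or dchar with
// any mutability, and rejects everything else.
size_t countUnits(T)(T[] s)
    if (isSomeChar!T)
{
    return s.length;
}

void main()
{
    assert(countUnits("abc") == 3);        // immutable(char)[]
    assert(countUnits("abc"w) == 3);       // immutable(wchar)[]
    assert(countUnits("abc"d.dup) == 3);   // dchar[]

    // Non-character arrays fail the constraint.
    static assert(!__traits(compiles, countUnits([1, 2, 3])));
}
```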

This is great because now we can pass the parameter to to!string 
unconditionally and don't pay anything if the original is already an 
immutable string. Here is a program that demonstrates that nothing is 
being copied for originally immutable strings:

import std.stdio;
import std.traits;
import std.conv;

void foo(T)(T[] s_param)
     if (is(Unqual!T == char))
{
     writeln('\n', s_param);
     writeln("in foo: ", s_param.ptr);

     auto s = to!(string)(s_param);
     bar(s);
}

void bar(string s)
{
     writeln("in bar: ", s.ptr);
}

void main()
{
     char[] m = "originally mutable".dup;
     foo(m);

     foo("originally immutable");
}

Here is the output on my system:

originally mutable
in foo: 2B70E4DE7F80
in bar: 2B70E4DE7F60  // <-- copied

originally immutable
in foo: 47D820
in bar: 47D820        // <-- not copied


CONCLUSION:

* For parameters of reference types that are not modified in the 
function, const is a better choice than immutable because const can take 
both mutable and immutable.

(I still include this among the guidelines under the "How to use" 
section here:

   http://ddili.org/ders/d.en/const_and_immutable.html
)

* The choice above complicates matters when the parameter needs to be 
forwarded to a function that takes its parameter as immutable, because 
'const' erases the mutability information of the actual variable.

* The solution is to make the function a template so that the actual 
type is retained. This solution prevents unnecessary copies when the 
actual variable is already immutable.


QUESTIONS:

What do you think about all of this? Can you see better idioms? Should 
we simply ignore this issue and stick with immutable anyway, especially 
for strings since they are everywhere? Should the original foo() take 
string and have the callers make a copy if the original variable is mutable?

Thank you,
Ali

P.S. I had opened a similar thread about the return types of functions:

   http://forum.dlang.org/thread/itr5o1$poi$1@digitalmars.com

Since then, I have learned that pure functions can simply return by 
mutable because the return values of pure functions can implicitly be 
converted to immutable:

char[] foo() pure     // <-- returns mutable
{
     char[] result;
     return result;
}

void main()
{
     char[] m = foo();  // <-- works
     string s = foo();  // <-- works
}
Jul 13 2012
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, July 13, 2012 10:19:34 Ali Çehreli wrote:
> QUESTIONS:
>
> What do you think about all of this? Can you see better idioms? Should
> we simply ignore this issue and stick with immutable anyway, especially
> for strings since they are everywhere? Should the original foo() take
> string and have the callers make a copy if the original variable is mutable?

In a lot of cases, the answer is to simply template on string type so 
that you don't care. In other cases, you're going to need a string 
anyway, so why bother templatizing (e.g. you need to assign the string 
to a member variable or pass it to a C function)? And in some cases, 
there's no real gain in taking anything other than string.

It was debated for std.file whether it should take anything other than 
string, and we stuck with string, because it just wasn't worth bothering 
with anything else. Any string-related costs were nothing in comparison 
to the I/O going on, and whether string or wstring was what was needed 
for calling the C functions depended on the OS. So, we just stuck with 
string. I believe that std.net.curl ends up using a combination of 
const(char)[] and string depending on what the string is ultimately used 
for.

std.path and std.string, on the other hand, templatize everything so 
that they work with any string type, but they're operating on strings 
and returning them, not saving them or using them with other APIs (let 
alone C APIs).

In general, I would argue that if you're going to operate on a string 
and then return it, you should templatize on string type (or even use a 
range, if the function can be genericized beyond strings). But in cases 
where you're going to need a specific string type, it's often best to 
just take that string type (usually string). In some of those cases 
though (particularly if you're going to have to copy it anyway), taking 
const(char)[] makes more sense. Classes are one case where you're likely 
to be forced to take either string or const(char)[], because you can't 
templatize virtual functions.

So, ultimately, what makes the most sense depends entirely on the 
situation. I'm not sure that we can really give any hard and fast rules. 
It mostly comes down to balancing flexibility and performance. We should 
make functions as flexible as possible with regard to string type but 
restrict it when it makes sense to do so for performance or when the 
extra flexibility just isn't warranted (or isn't possible - e.g. virtual 
functions).

You make very good points overall, and I think that you understand the 
situation fairly well, but I don't know how we'd really go about giving 
much in the way of concrete guidelines, since it's so 
situation-dependent.

- Jonathan M Davis
Jul 13 2012
"bearophile" <bearophileHUGS lycos.com> writes:
Ali Çehreli:
> tl;dr - Please see the conclusion section below.

One case for your discussion:

http://d.puremagic.com/issues/show_bug.cgi?id=8164

Bye,
bearophile
Jul 13 2012
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
I know this is really old, but just catching up on old posts.

On Fri, 13 Jul 2012 13:19:34 -0400, Ali Çehreli <acehreli yahoo.com> wrote:

> tl;dr - Please see the conclusion section below.
[snip]
> CONCLUSION:
>
> * For parameters of reference types that are not modified in the
> function, const is a better choice than immutable because const can take
> both mutable and immutable.
>
> (I still include this among the guidelines under the "How to use"
> section here:
>
>    http://ddili.org/ders/d.en/const_and_immutable.html
> )
>
> * The choice above complicates matters when the parameter needs to be
> forwarded to a function that takes as immutable, because 'const' erases
> the mutability information of the actual variable.
>
> * The solution is to make the function a template so that the actual
> type is retained. This solution prevents unnecessary copies when the
> actual variable is already immutable.

IMO, a function that does not utilize the benefits of immutable should 
actually be re-labeled const or inout.

For example (please, don't suggest I use some tricks to make this a one 
liner, it's an example):

int foo(immutable(int)[] arr)
{
     int x = 0;
     foreach(m ; arr) x += m;
     return x;
}

Clearly, this does not need to be immutable. Making it immutable does 
not help or change anything.

However, we have this special case of char[] arrays, because the most 
common type used is 'string', which is immutable.

But using string has benefits -- you can simply store the string 
somewhere without worrying it gets changed or erased.

However, the library (phobos) should not force you into immutable ever. 
Yes, strings are immutable, and we can have some benefits for that, but 
phobos shouldn't make it difficult to avoid immutable, it should not 
have an opinion there.

Almost all phobos functions that accept 'strings' should take 
const(char)[] or inout(char)[], not string.

Now, we cannot control what we don't write, so it's quite easy to see 
that someone might label something as string when it should have been 
inout(char)[], and you simply have to deal with it. I think the correct 
solution is to define both an inout/const version, which uses .idup, and 
an immutable version which does not. There is no reason to have a 
mutable version.

It would be nice to be able to make a recommended pattern that allows 
you to avoid code duplication. Something like this:

template constOrImmutable(T)
{
     static if(is(T == immutable))
         alias T constOrImmutable;
     else
         alias const(T) constOrImmutable;
}

void foo(T)(constOrImmutable!(T)[] arg)
{
     bar(to!(immutable(T)[])(arg)); // should idup if T is not immutable
}

Of course, this still results in two identical instantiations for 
mutable and const, even though the resulting code is the same. Hopefully 
the compiler optimizes this out.

-Steve
Sep 13 2012
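
The pair of overloads suggested in the last post (a const version that uses .idup and an immutable version that does not) might look like the following minimal sketch. The names keep() and stored are hypothetical and exist only to make the copy observable:

```d
// Hypothetical storage slot that really needs an immutable string.
string stored;

// Immutable overload: the string can be kept as-is, no copy.
void keep(string s)
{
    stored = s;
}

// Const overload: accepts mutable and const data, so it must copy.
void keep(const(char)[] s)
{
    stored = s.idup;
}

void main()
{
    string lit = "abc";
    keep(lit);                      // exact match: string overload
    assert(stored.ptr is lit.ptr);  // not copied

    char[] m = "xyz".dup;
    keep(m);                        // only the const overload matches
    assert(stored == "xyz");
    assert(stored.ptr !is m.ptr);   // copied
}
```

Because overload resolution prefers the exact match, immutable arguments never pay for a copy, while mutable and const arguments share a single copying implementation.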