
digitalmars.D.learn - [Design] return char[] or string?

"Stewart Gordon" <smjg_1998 yahoo.com> writes:
While I haven't got into using D 2.x, I've already begun thinking about 
making libraries compatible with it.  On this basis, a design decision to 
consider is whether functions that return a string should return it as a 
char[] or a const(char)[].  (I use "string" with its general meaning, and 
"const(char)[]" to refer to that specific type.  Obviously for 1.0 
compatibility, I'd have to use the "string" alias wherever I want 
const(char)[].)

Obviously, a function that takes a string as a parameter has to take in a 
const(char)[], to be able to accept a string literal or otherwise a constant 
string.  But what about the return type?

Looking through the 2.x version of std.string, they all return const(char)[] 
rather than char[].  (Except for those that return something else such as a 
number.)  This is necessary in most cases because of the copy-on-write 
policy.

But otherwise, it seems that both have their pros and cons.

There seem to be two cases to consider: libraries targeted specifically at D 
2.x, and libraries that (attempt to) support both 1.x and 2.x.  At the 
moment, it's the latter that really matters.

Let's see.  The string-returning functions in my library more or less fall 
into these categories:
(a) functions that build a string in a local variable, which is then 
returned
(b) functions that return a copy of a member variable
(c) property setters and the like that simply pass the argument through
(d) functions that call a function in Phobos and return the result

In the case of (a), there is no obvious benefit to returning a const(char)[] 
rather than a char[].

Many of the cases of (b) are property getters.  If we have such things 
returning a const(char)[], then the getter no longer needs to copy the 
member variable.  Though versioning would be needed to implement this 
behaviour without causing havoc under 1.x.  The alternative, leaving them 
returning char[], leads to inconsistency with (c), which would have to 
return const(char)[].

That leaves (d), to which the obvious answer is to return whatever type the 
Phobos function returns.

On one hand, if the string is generated on the fly, and so altering it would 
not cause a problem, it seems wasteful to return a const(char)[] only for 
the caller to have to .dup it if it wants to modify it.

On the other hand, from the library user's point of view, it can be seen as 
a confusing inconsistency if some functions return char[] and others 
const(char)[], when no difference in the semantics of what's returned 
accounts for this.  It also borders on breaking the encapsulation principle, 
whereby internal implementation details should not be exposed in my 
library's API.

What do you people think?

Stewart. 
Jul 29 2007
Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Stewart Gordon wrote:
 While I haven't got into using D 2.x, I've already begun thinking about 
 making libraries compatible with it.  On this basis, a design decision 
 to consider is whether functions that return a string should return it 
 as a char[] or a const(char)[].  (I use "string" with its general 
 meaning, and "const(char)[]" to refer to that specific type.  Obviously 
 for 1.0 compatibility, I'd have to use the "string" alias wherever I 
 want const(char)[].)
 
 Obviously, a function that takes a string as a parameter has to take in 
 a const(char)[], to be able to accept a string literal or otherwise a 
 constant string.  But what about the return type?
 
 Looking through the 2.x version of std.string, they all return 
 const(char)[] rather than char[].  (Except for those that return 
 something else such as a number.)  This is necessary in most cases 
 because of the copy-on-write policy.
 
 But otherwise, it seems that both have their pros and cons.
 
 There seem to be two cases to consider: libraries targeted specifically 
 at D 2.x, and libraries that (attempt to) support both 1.x and 2.x.  At 
 the moment, it's the latter that really matters.
 
 Let's see.  The string-returning functions in my library more or less 
 fall into these categories:
 (a) functions that build a string in a local variable, which is then 
 returned
 (b) functions that return a copy of a member variable
 (c) property setters and the like that simply pass the argument through
 (d) functions that call a function in Phobos and return the result
 
 In the case of (a), there is no obvious benefit to returning a 
 const(char)[] rather than a char[].
 
 Many of the cases of (b) are property getters.  If we have such things 
 returning a const(char)[], then the getter no longer needs to copy the 
 member variable.  Though versioning would be needed to implement this 
 behaviour without causing havoc under 1.x.  The alternative, leaving 
 them returning char[], leads to inconsistency with (c), which would have 
 to return const(char)[].
 
 That leaves (d), to which the obvious answer is to return whatever type 
 the Phobos function returns.
 
 On one hand, if the string is generated on the fly, and so altering it 
 would not cause a problem, it seems wasteful to return a const(char)[] 
 only for the caller to have to .dup it if it wants to modify it.
 
 On the other hand, from the library user's point of view, it can be seen 
 as a confusing inconsistency if some functions return char[] and others 
 const(char)[], when no difference in the semantics of what's returned 
 accounts for this.  It also borders on breaking the encapsulation 
 principle, whereby internal implementation details should not be exposed 
 in my library's API.
 
 What do you people think?
 
 Stewart.

It's a question of ownership.  If the function is returning a new string,
and giving ownership of that string to the caller, then it should return
a char[].  If the function is returning a string which the caller is
merely borrowing, it should return a const(char)[].  In most cases,
thinking of things this way causes the return type to be obvious.

And, of course, you can always convert a char[] to a const(char)[].

In (a), the function is returning a new string to the caller; it should
return char[].

(b) should usually return const(char)[], unless of course you want the
caller to mutate the string.  If you're going through the trouble of
wrapping a member with a getter/setter, then that probably means you
don't want the user messing with it directly.

The other cases are less clear, and will vary from function to function.

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org
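A minimal sketch of the ownership rule above, written for present-day D 2 (where the `string` alias is `immutable(char)[]` rather than the `const(char)[]` of the 2007 betas discussed here). The `Person` class and its `name`/`greeting` members are hypothetical, chosen only to illustrate cases (a) and (b):

```d
import std.stdio : writeln;

// Hypothetical class illustrating the ownership rule.
class Person
{
    private char[] _name;

    // Take a private copy so later changes by the caller cannot leak in.
    this(const(char)[] name) { _name = name.dup; }

    // Case (b): the caller merely borrows the stored string, so hand out
    // a read-only view of it; no copy is needed.
    const(char)[] name() const { return _name; }

    // Case (a): a freshly built string that the caller owns, so return it
    // as mutable char[].
    char[] greeting() const
    {
        char[] s = "Hello, ".dup;
        s ~= _name;
        return s;
    }
}

void main()
{
    auto p = new Person("Stewart");

    char[] g = p.greeting();
    g[0] = 'J';                     // fine: the caller owns this buffer

    const(char)[] n = p.name();     // borrowed, read-only view
    // n[0] = 'X';                  // would not compile: n is read-only

    const(char)[] view = g;         // char[] converts implicitly to const(char)[]
    writeln(g, " / ", n, " / ", view);
}
```

The implicit `char[]` to `const(char)[]` conversion in the last lines is what makes returning the mutable type the more flexible choice when the caller really does own the result.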
Jul 29 2007
Regan Heath <regan netmail.co.nz> writes:
Some comments inline...

Kirk McDonald wrote:
 Stewart Gordon wrote:
 While I haven't got into using D 2.x, I've already begun thinking 
 about making libraries compatible with it.  On this basis, a design 
 decision to consider is whether functions that return a string should 
 return it as a char[] or a const(char)[].  (I use "string" with its 
 general meaning, and "const(char)[]" to refer to that specific type.  
 Obviously for 1.0 compatibility, I'd have to use the "string" alias 
 wherever I want const(char)[].)

 Obviously, a function that takes a string as a parameter has to take 
 in a const(char)[], to be able to accept a string literal or otherwise 
 a constant string.  But what about the return type?


It's a pity D cannot distinguish string literals passed as char[] (mutable) parameters and place those in RAM.  Obviously it would have to create a separate copy for each and every use.
 Looking through the 2.x version of std.string, they all return 
 const(char)[] rather than char[].  (Except for those that return 
 something else such as a number.)  This is necessary in most cases 
 because of the copy-on-write policy.


True, however when you perform 'copy on write' you get a copy of the original and that copy is unique and owned by the copier and therefore can be mutable, or in other words char[] not const(char)[].
 But otherwise, it seems that both have their pros and cons.

 There seem to be two cases to consider: libraries targeted 
 specifically at D 2.x, and libraries that (attempt to) support both 
 1.x and 2.x.  At the moment, it's the latter that really matters.

 Let's see.  The string-returning functions in my library more or less 
 fall into these categories:
 (a) functions that build a string in a local variable, which is then 
 returned
 (b) functions that return a copy of a member variable
 (c) property setters and the like that simply pass the argument through
 (d) functions that call a function in Phobos and return the result

 In the case of (a), there is no obvious benefit to returning a 
 const(char)[] rather than a char[].

 Many of the cases of (b) are property getters.  If we have such things 
 returning a const(char)[], then the getter no longer needs to copy the 
 member variable.  Though versioning would be needed to implement this 
 behaviour without causing havoc under 1.x.  The alternative, leaving 
 them returning char[], leads to inconsistency with (c), which would 
 have to return const(char)[].

 That leaves (d), to which the obvious answer is to return whatever 
 type the Phobos function returns.

 On one hand, if the string is generated on the fly, and so altering it 
 would not cause a problem, it seems wasteful to return a const(char)[] 
 only for the caller to have to .dup it if it wants to modify it.


Indeed, and some Phobos functions are doing this; it has been a source of irritation for me since the inception of 'const'.
 On the other hand, from the library user's point of view, it can be 
 seen as a confusing inconsistency if some functions return char[] and 
 others const(char)[], when no difference in the semantics of what's 
 returned accounts for this.  It also borders on breaking the 
 encapsulation principle, whereby internal implementation details 
 should not be exposed in my library's API.


I think perhaps providing more than one overload could help lessen confusion, e.g. having:

char[] tolowerInplace(char[] s)

in addition to the standard tolower.
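A minimal sketch of the overload idea, assuming present-day D and handling ASCII letters only. `tolowerInplace` follows Regan's suggested name and `tolowerCopy` is a hypothetical companion for read-only input; neither is a Phobos function:

```d
import std.stdio : writeln;

// Hypothetical in-place variant: lowers ASCII letters in the caller's own
// buffer and returns that same buffer, so nothing is allocated.
char[] tolowerInplace(char[] s)
{
    foreach (ref c; s)
        if (c >= 'A' && c <= 'Z')
            c = cast(char)(c + ('a' - 'A'));
    return s;
}

// Hypothetical copying variant for read-only input: it makes a private
// copy, which the caller then owns, so it can return mutable char[].
char[] tolowerCopy(const(char)[] s)
{
    return tolowerInplace(s.dup);
}

void main()
{
    char[] buf = "Hello World".dup;
    tolowerInplace(buf);                 // buf itself is modified
    writeln(buf);

    char[] owned = tolowerCopy("ABC");   // accepts a literal, returns a new buffer
    writeln(owned);
}
```

Both variants return `char[]` because in each case the result belongs to the caller: either it was the caller's buffer to begin with, or it is a fresh copy.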
 It's a question of ownership. If the function is returning a new string, 
 and giving ownership of that string to the caller, then it should return 
 a char[]. If the function is returning a string which the caller is 
 merely borrowing, it should return a const(char)[]. In most cases, 
 thinking of things this way causes the return type to be obvious.
 
 And, of course, you can always convert a char[] to a const(char)[].

This is how I tend to think about it also.
 In (a), the function is returning a new string to the caller; it should 
 return char[].
 
 (b) should usually return const(char)[], unless of course you want the 
 caller to mutate the string. If you're going through the trouble of 
 wrapping a member with a getter/setter, then that probably means you 
 don't want the user messing with it directly.
 
 The other cases are less clear, and will vary from function to function.

As I mentioned above, I have been repeatedly annoyed by a number of Phobos
string functions since the introduction of 'const'.  I think in some cases
we need to rethink some of the functions and how they work in order to
provide a more 'const'-aware/friendly library.

Example: "string[] split(in string s)" in std.string.  If the input is
char[] then this function essentially casts the input to const, and if I
want to perform further modification of the input I now have to dup the
results.  In a sense this function 'takes ownership' of the input and does
not give it back again.

I think in this case split should be templated.  If the input is char[]
the result should be char[][]; if the input is string the result should
be string[].

This works fine for cases where the input is never copied, but not in
cases where it is conditionally copied, "string tolower(string s)" in
std.string for example.  It cannot know ahead of time whether it's going
to need to 'copy on write', so simply templating it doesn't help.
However, I suggested a possible templated solution which dups only in the
case where the input is 'string':

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55337

I figure if you want a copy of the input you can manually dup the
parameter you pass.

Another solution (also hinted at above) may be to provide more than one
overload.  You might do this where you cannot easily template a solution
to efficiently handle the common case for each input type
(mutable/const).

As for your cases mentioned above...

I would probably implement (c), a property setter, as code that sets the
member followed by a call to the getter, so it would return the same as
(b).  That said, I haven't written a lot of these, so perhaps my
experience using them isn't sufficient.  Is there some reason you'd
rather return char[] from a setter?

I'm hoping in the case of (d) that Phobos will change or provide more
overloads to handle the different use-cases.

Regan
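Regan's templated `split` suggestion can be sketched as follows, assuming present-day D. `splitOn` is a hypothetical name (not the Phobos function) and splits on a single delimiter character. Because it only returns slices of its input, the input's element type (mutable `char`, `const(char)` or `immutable(char)`) carries through to the result instead of being forced to `string[]`:

```d
import std.stdio : writeln;

// Hypothetical templated split on one delimiter character.  The result
// consists of slices into `s`, so their element type matches the input's.
C[][] splitOn(C)(C[] s, char delim)
    if (is(immutable C == immutable char))
{
    C[][] parts;
    size_t start = 0;
    foreach (i, ch; s)
    {
        if (ch == delim)
        {
            parts ~= s[start .. i];
            start = i + 1;
        }
    }
    parts ~= s[start .. $];   // last field (possibly empty)
    return parts;
}

void main()
{
    // Mutable input gives mutable fields that alias the caller's buffer,
    // so further in-place edits need no dup.
    char[] line = "a,b,c".dup;
    char[][] fields = splitOn(line, ',');
    fields[0][0] = 'A';
    writeln(line);            // the edit is visible through `line`

    // Immutable input (a string literal) gives immutable fields.
    string[] ro = splitOn("x,y", ',');
    writeln(ro);
}
```

This covers only the "never copied" case; a function like tolower, which copies conditionally, needs a different approach, as noted above.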
Jul 30 2007
Stewart Gordon <smjg_1998 yahoo.com> writes:
Regan Heath Wrote:

<snip>
 As for your cases mentioned above...
 
 I would probably implement (c), a property setter, as code that sets the 
 member followed by a call to the getter so it would return the same as 
 (b).  That said I haven't written a lot of these so perhaps my 
 experience using them isn't sufficient.

I've never really liked this idea.  In general, either it would just
return the same string that was passed in, IWC there's no point calling
the getter rather than simply returning the argument, or there would be a
performance hit where the return value isn't used.

Much better would be if D would chain property assignments implicitly:

http://www.digitalmars.com/d/archives/digitalmars/D/10199.html

If only Walter would finally answer this request (among many others)!

Stewart.
Aug 21 2007
Regan Heath <regan netmail.co.nz> writes:
Stewart Gordon wrote:
 Regan Heath Wrote:
 
 <snip>
 As for your cases mentioned above...
 
 I would probably implement (c), a property setter, as code that
 sets the member followed by a call to the getter so it would return
 the same as (b).  That said I haven't written a lot of these so
 perhaps my experience using them isn't sufficient.

 I've never really liked this idea.  In general, either it would just
 return the same string that was passed in, IWC there's no point calling
 the getter rather than simply returning the argument

I'd hope the call to the getter would be inlined.  The point I see is
consistency: the getter might return the stored value with some sort of
modification, perhaps due to a change in required functionality at some
point, or perhaps because more than one getter uses the same data member.

 , or
 there would be a performance hit where the return value isn't used.

Yeah, that's always going to be a problem. It's a pity we cannot overload on return type.
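Regan's suggestion that the setter return the getter's result can be sketched in present-day D as below. The `Doc` class and its `title` members are hypothetical; the point is that the setter defers to the getter, so a later change to what the getter reports is picked up by both:

```d
import std.stdio : writeln;

// Hypothetical class with a getter/setter pair for one string member.
class Doc
{
    private char[] _title;

    // Getter: the single place that decides what callers see.
    const(char)[] title() const { return _title; }

    // Setter: stores a private copy, then returns whatever the getter
    // reports rather than simply echoing the argument.
    const(char)[] title(const(char)[] t)
    {
        _title = t.dup;
        return title();
    }
}

void main()
{
    auto d = new Doc();
    d.title = "Intro";          // property-style call of the setter
    writeln(d.title());
}
```

The cost Stewart mentions remains: the setter always produces a return value, even when the assignment is used as a plain statement.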
 Much better would be if D would chain property assignments
 implicitly:
 
 http://www.digitalmars.com/d/archives/digitalmars/D/10199.html
 
 If only Walter would finally answer this request (among many others)!

Yeah, this is another of those cases where a property doesn't quite work
the same as a plain old data member, p.property += x; being the more
common one.

Regan
Aug 22 2007
Manfred Nowak <svv1999 hotmail.com> writes:
Regan Heath wrote

 Yeah, this is another of those cases where a property doesn't
 quite work the same as a plain old data member, p.property += x;
 being the more common one.

Again, I do not see the deeper reason for a whole discussion, this time
about properties.  Properties _are_ restricted.  If one does not want
these restrictions, one can use a class instead.

-manfred
Aug 22 2007
Regan Heath <regan netmail.co.nz> writes:
Manfred Nowak wrote:
 Regan Heath wrote
 
 Yeah, this is another of those cases where a property doesn't
 quite work the same as a plain old data member, p.property += x;
 being the more common one.

 Again I do not see the deeper reason for a whole discussion.  This time
 this discussion about properties.  Properties _are_ restricted.  If one
 do not want this restrictions, one can use a class instead.

http://www.digitalmars.com/d/property.html

"Properties are member functions that can be syntactically treated as if
they were fields"

I was under the impression the main benefit to properties was being able
to replace an existing field (one in use by some user code) with a
property and have it work without user code changes, e.g.

---------BEFORE--------
class A
{
    public int a;
}

void foo(ref int i) {}

void main()
{
    A a = new A();
    int b;

    b = a.a = 5;
    a.a += 1;
    a.a++;
    foo(a.a);
}

---------AFTER---------
class A
{
    int _a;
    public int a() { return _a; }
    public int a(int _aa) { _a = _aa; return a(); }
}

void main()
{
    A a = new A();
    int b;

    b = a.a = 5;  //error
    a.a += 1;     //error
    a.a++;        //error
    foo(a.a);     //error
}

Sadly there are plenty of cases where a property cannot be "syntactically
treated as a field" but needs a completely different syntax.

Sure, there are other benefits for properties like performing some
complex calculation on the input to the setter, or error checking it, or
whatever, but I don't think this is the core benefit to properties, as
these can be achieved with plain old methods, i.e. set<Propertyname>.

Something that would solve 3 of the errors above is the ability to return
by 'ref', e.g.

class A
{
    int _a;
    public ref int a() { return _a; }
    public ref int a(int _aa) { _a = _aa; return a(); }
}

void main()
{
    A a = new A();
    int b;

    b = a.a = 5;  //error
    a.a += 1;     //ok
    a.a++;        //ok
    foo(a.a);     //ok
}

The problem with the remaining error is that a setter might take and
return 2 different types, as mentioned here:

http://www.digitalmars.com/d/archives/digitalmars/D/10199.html

Regan
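For comparison, present-day D does support returning by `ref`. A minimal sketch (the class and function names are hypothetical, and `@property` is used to make the intent explicit) showing that compound assignment, increment and passing to a `ref` parameter then work on the accessor; it does not try to settle the chained-assignment case left open above:

```d
import std.stdio : writeln;

// Hypothetical class exposing its field through a ref-returning accessor.
class A
{
    private int _a;

    // Returning by ref makes `obj.a` an lvalue that aliases _a.
    @property ref int a() { return _a; }
}

// Doubles its argument in place.
void foo(ref int i) { i *= 2; }

void main()
{
    auto obj = new A();
    obj.a += 1;     // _a is now 1
    obj.a++;        // _a is now 2
    foo(obj.a);     // _a is now 4
    writeln(obj.a);
}
```

The trade-off is that a ref accessor hands out direct write access to the field, so the setter-side checks that motivate properties in the first place are bypassed.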
Aug 23 2007
Manfred Nowak <svv1999 hotmail.com> writes:
Regan Heath wrote

 "Properties are member functions that can be syntactically treated
 as if they were fields"
 
 I was under the impression the main benefit to properties was
 being able to replace an existing field (one in use by some user
 code) with a property and have it work without user code changes.

For me the definition implies that fields can replace properties, but no
property can replace a field.

I read the definition above like this:
: if a member function can be syntactically treated as a field, then
: it is a property
i.e., properties have less syntactical power than fields.

Because member functions with at most one formal parameter can be
treated as fields in assignments, i.e. without the parentheses, D has
properties according to the definition in the docs.

All those recognizable errors in yours and Stewart's examples are based
on the wrong assumption, that properties are more powerful than fields.

-manfred
Aug 23 2007
Regan Heath <regan netmail.co.nz> writes:
Manfred Nowak wrote:
 Regan Heath wrote
 
 "Properties are member functions that can be syntactically treated
 as if they were fields"

 I was under the impression the main benefit to properties was
 being able to replace an existing field (one in use by some user
 code) with a property and have it work without user code changes.

 For me the definition implies that fields can replace properties, but no
 property can replace a field.  I read the definition above like this:
 : if a member function can be syntactically treated as a field, then
 : it is a property

Sure, you can reverse the definition if you like. The end result is that as there are no member functions in D which can be syntactically treated as if they were fields (in all cases) then D does not have properties by this definition. Note I use "(in all cases)" above, that is the assumption I am making, if any. I believe this assumption is implied by the definition, just as I believe your assumption below is not.
 i.e.,  properties have less syntactical power than fields.

This is certainly true, but how do you draw that conclusion from the definition? (this is not the assumption to which I refer above)
 Because member functions with at most one formal parameter can be 
 treated as fields in assignments, i.e. without the parentheses, D has 
 properties according to the definition in the docs.

It seems you are giving us a new definition for properties in D which
reads something like:

"properties are member functions with at most one formal parameter
(which) can be treated as fields in assignments only"

Let's take a look at the full text of the paragraph in the docs containing
the definition of a property:

"Properties are member functions that can be syntactically treated as if
they were fields.  Properties can be read from or written to.  A property
is read by calling a method with no arguments; a property is written by
calling a method with its argument being the value it is set to."

Where does the definition mention only supporting assignments?  (this is
your assumption)  Where does it mention "at most one formal parameter"?
 All those recognizable errors in yours and Stewart's examples are 
 based on the wrong assumption, that properties are more powerful than 
 fields.

It's possible my assumption "that properties should be treated
syntactically like fields (in all cases)" is incorrect, but given the core
purpose of properties (to replace fields seamlessly) I don't think it's an
outrageous assumption to make.  I am simply expecting to be able to treat
properties as fields (syntactically speaking), which is (almost to the
letter/word) exactly what the definition states.

We can quibble over the definition all we like and frankly I don't really
care to.  We all know the docs are seldom precisely defined, nor do they
necessarily reflect reality.

The simple fact remains that just about every new D user writes:

char[] p;
p.length += 5;

and expects the 'length' "property" to be incremented by 5.

Unless the error cases mentioned are supported by properties, the core
reason for having properties (being able to seamlessly refactor,
replacing a field with a property) is null and void, as it will regularly
result in changes being required to user code.  Properties are not as
useful as many of us wish they were.

I assume you agree that it would be quite nice to be able to use
properties in the error cases listed?

Regan
Aug 23 2007
Manfred Nowak <svv1999 hotmail.com> writes:
Regan Heath wrote
 I assume you agree that it would be quite nice to be able to use 
 properties in the error cases listed?

Not exactly properties, but one should be able to iron out those errors.
As I stated before, one can use an inner class:

class A{
private:
  int _a;
public:
  Property a;
  this(){ a= new Property;}
  class Property{
    int opCall() { return _a; }
    int opAssign(int assgn) { _a= assgn; return opCall(); }
    int opAddAssign( int add){ _a+= add; return opCall();}
    int opPostInc(){ int tmp= _a++; return tmp;}
  }
}
void foo(inout A.Property i) {}  // wart!!
void main(){
  A a = new A();
  int b;
  b = a.a = 5;
  a.a += 1;
  a.a++;
  foo(a.a);
}

In the examples given here two warts remain:
- classes cannot derive from basic types, therefore the type of the
  formal parameter of `foo' has to be changed
- Stewart's point stays unhandled.

In fact the first wart may be closed by allowing something like
`class Property: int' or `alias int Property'.

Stewart's point does not need overloading by return type; a conditional
or lazy return `return? <expr>' would be enough, where `return?' has the
semantics of not being evaluated if the value of the expression `<expr>'
is not needed or is fed as an actual parameter to a function at a
position in the formal parameter list where a `lazy' parameter is
declared.

-manfred
Aug 23 2007
"Stewart Gordon" <smjg_1998 yahoo.com> writes:
"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:faki47$2ifr$1 digitalmars.com...
 Regan Heath wrote

 "Properties are member functions that can be syntactically treated
 as if they were fields"


What I think it means is simply that you can get/set a property with notation that looks as though you're getting/setting a field.
 I was under the impression the main benefit to properties was
 being able to replace an existing field (one in use by some user
 code) with a property and have it work without user code changes.


I think that's more or less what was meant to happen, but it hasn't quite turned out that way.
 For me the definition implies that fields can replace properties, but
 no property can replace a field.

I don't see how you work that out.
 I read the definition above like this:
 : if a member function can be syntactically treated as a field, then
 : it is a property
 i.e.,  properties have less syntactical power than fields.

 Because member functions with at most one formal parameter can be
 treated as fields in assignments, i.e. without the parentheses, D has
 properties according to the definition in the docs.

 All those recognizable errors in yours and Stewart's examples are
 based on the wrong assumption, that properties are more powerful than
 fields.

I suppose properties have less syntactical power, but more power in terms
of practical uses.  If that makes sense....

Stewart.
Aug 23 2007
Derek Parnell <derek psych.ward> writes:
On Thu, 23 Aug 2007 22:20:16 +0100, Stewart Gordon wrote:

 I suppose properties have less syntactical power, but more power in terms of 
 practical uses.  If that makes sense....

Yes I can see what you mean.

( silly example: )

long x() { return _x * 2 + _y; }
void x(int y) { _x = y / 2; _y = y - 2; }
void x(char[] y) { . . . }

foo.x = 16;
long A = foo.x;
foo.x = "abc";
long B = foo.x;

On the other hand, when I first read about properties I thought
"Brilliant!" and started using them.  However, it soon became apparent
that they were not brilliant but were really a PITA.  I can't be bothered
with them now as they just increase the cost of maintenance, because they
cannot syntactically be treated as the fields that they look like.  They
deceive coders because they look like fields but they are not fields.

In short, properties in D are evil in the same way that 'goto' is evil.
If you use them, be prepared to pay extra for them.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
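Derek's point that a property can look like a field without behaving like one can be checked with a runnable version of his getter and integer setter in present-day D. `Foo` is a hypothetical class and the elided `char[]` overload is left out. Writing 16 and reading it back yields 30, because the setter stores 8 and 14 and the getter computes 8 * 2 + 14:

```d
import std.stdio : writeln;

// Hypothetical class built from the getter and integer setter above.
class Foo
{
    private long _x, _y;

    // Getter: derives its value from two stored fields.
    long x() const { return _x * 2 + _y; }

    // Setter: splits the assigned value across the two fields.
    void x(int y) { _x = y / 2; _y = y - 2; }
}

void main()
{
    auto foo = new Foo();
    foo.x = 16;          // looks like a field assignment...
    long a = foo.x;      // ...but reading it back does not give 16
    writeln(a);
}
```

A real field would read back exactly what was written; here the round trip silently changes the value, which is the kind of surprise Derek describes.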
Aug 23 2007