digitalmars.D.learn - [Design] return char[] or string?

"Stewart Gordon" <smjg_1998 yahoo.com> writes:

While I haven't got into using D 2.x, I've already begun thinking about 
making libraries compatible with it.  On this basis, a design decision to 
consider is whether functions that return a string should return it as a 
char[] or a const(char)[].  (I use "string" with its general meaning, and 
"const(char)[]" to refer to that specific type.  Obviously for 1.0 
compatibility, I'd have to use the "string" alias wherever I want 
const(char)[].)

Obviously, a function that takes a string as a parameter has to take in a 
const(char)[], to be able to accept a string literal or otherwise a constant 
string.  But what about the return type?

Looking through the 2.x version of std.string, they all return const(char)[] 
rather than char[].  (Except for those that return something else such as a 
number.)  This is necessary in most cases because of the copy-on-write 
policy.

But otherwise, it seems that both have their pros and cons.

There seem to be two cases to consider: libraries targeted specifically at D 
2.x, and libraries that (attempt to) support both 1.x and 2.x.  At the 
moment, it's the latter that really matters.

Let's see.  The string-returning functions in my library more or less fall 
into these categories:
(a) functions that build a string in a local variable, which is then 
returned
(b) functions that return a copy of a member variable
(c) property setters and the like that simply pass the argument through
(d) functions that call a function in Phobos and return the result

In the case of (a), there is no obvious benefit to returning a const(char)[] 
rather than a char[].

Many of the cases of (b) are property getters.  If we have such things 
returning a const(char)[], then the getter no longer needs to copy the 
member variable.  Though versioning would be needed to implement this 
behaviour without causing havoc under 1.x.  The alternative, leaving them 
returning char[], leads to inconsistency with (c), which would have to 
return const(char)[].

That leaves (d), to which the obvious answer is to return whatever type the 
Phobos function returns.

On one hand, if the string is generated on the fly, and so altering it would 
not cause a problem, it seems wasteful to return a const(char)[] only for 
the caller to have to .dup it if it wants to modify it.

On the other hand, from the library user's point of view, it can be seen as 
a confusing inconsistency if some functions return char[] and others 
const(char)[], when no difference in the semantics of what's returned 
accounts for this.  It also borders on breaking the encapsulation principle, 
whereby internal implementation details should not be exposed in my 
library's API.

What do you people think?

Stewart.

Jul 29 2007

Kirk McDonald <kirklin.mcdonald gmail.com> writes:

Stewart Gordon wrote:
 While I haven't got into using D 2.x, I've already begun thinking about 
 making libraries compatible with it.  On this basis, a design decision 
 to consider is whether functions that return a string should return it 
 as a char[] or a const(char)[].  (I use "string" with its general 
 meaning, and "const(char)[]" to refer to that specific type.  Obviously 
 for 1.0 compatibility, I'd have to use the "string" alias wherever I 
 want const(char)[].)
 
 Obviously, a function that takes a string as a parameter has to take in 
 a const(char)[], to be able to accept a string literal or otherwise a 
 constant string.  But what about the return type?
 
 Looking through the 2.x version of std.string, they all return 
 const(char)[] rather than char[].  (Except for those that return 
 something else such as a number.)  This is necessary in most cases 
 because of the copy-on-write policy.
 
 But otherwise, it seems that both have their pros and cons.
 
 There seem to be two cases to consider: libraries targeted specifically 
 at D 2.x, and libraries that (attempt to) support both 1.x and 2.x.  At 
 the moment, it's the latter that really matters.
 
 Let's see.  The string-returning functions in my library more or less 
 fall into these categories:
 (a) functions that build a string in a local variable, which is then 
 returned
 (b) functions that return a copy of a member variable
 (c) property setters and the like that simply pass the argument through
 (d) functions that call a function in Phobos and return the result
 
 In the case of (a), there is no obvious benefit to returning a 
 const(char)[] rather than a char[].
 
 Many of the cases of (b) are property getters.  If we have such things 
 returning a const(char)[], then the getter no longer needs to copy the 
 member variable.  Though versioning would be needed to implement this 
 behaviour without causing havoc under 1.x.  The alternative, leaving 
 them returning char[], leads to inconsistency with (c), which would have 
 to return const(char)[].
 
 That leaves (d), to which the obvious answer is to return whatever type 
 the Phobos function returns.
 
 On one hand, if the string is generated on the fly, and so altering it 
 would not cause a problem, it seems wasteful to return a const(char)[] 
 only for the caller to have to .dup it if it wants to modify it.
 
 On the other hand, from the library user's point of view, it can be seen 
 as a confusing inconsistency if some functions return char[] and others 
 const(char)[], when no difference in the semantics of what's returned 
 accounts for this.  It also borders on breaking the encapsulation 
 principle, whereby internal implementation details should not be exposed 
 in my library's API.
 
 What do you people think?
 
 Stewart.

It's a question of ownership. If the function is returning a new string, 
and giving ownership of that string to the caller, then it should return 
a char[]. If the function is returning a string which the caller is 
merely borrowing, it should return a const(char)[]. In most cases, 
thinking of things this way causes the return type to be obvious.

And, of course, you can always convert a char[] to a const(char)[].

In (a), the function is returning a new string to the caller; it should 
return char[].

(b) should usually return const(char)[], unless of course you want the 
caller to mutate the string. If you're going through the trouble of 
wrapping a member with a getter/setter, then that probably means you 
don't want the user messing with it directly.
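
The ownership rule for cases (a) and (b) can be sketched roughly as follows. This is only an illustration, assuming a reasonably recent D2 compiler; the class Greeter and its members makeGreeting and name are made-up names, not part of any library.

```d
// Hypothetical example: Greeter, makeGreeting and name are illustrative only.
class Greeter
{
    private char[] _name;

    this(const(char)[] name)
    {
        _name = name.dup; // private copy, so the object owns its data
    }

    // Case (a): a freshly built string. The caller owns it and may mutate it.
    char[] makeGreeting()
    {
        char[] result = "Hello, ".dup;
        result ~= _name;
        return result;
    }

    // Case (b): a borrowed view of the member. No copy is made, and the
    // caller cannot write through the returned slice.
    const(char)[] name()
    {
        return _name;
    }
}

unittest
{
    auto g = new Greeter("world");

    char[] owned = g.makeGreeting();
    owned[0] = 'J';                  // fine: the caller owns this array
    assert(owned == "Jello, world");

    const(char)[] borrowed = g.name();
    // borrowed[0] = 'W';            // would not compile: elements are const
    char[] copy = borrowed.dup;      // an explicit copy is needed to mutate
    copy[0] = 'W';
    assert(g.name() == "world");     // the member itself is untouched
}
```

The getter in (b) avoids the copy precisely because the const return type stops the caller from modifying the member behind the object's back.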

The other cases are less clear, and will vary from function to function.

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org

Jul 29 2007

Regan Heath <regan netmail.co.nz> writes:

Some comments inline...

Kirk McDonald wrote:
 Stewart Gordon wrote:
 While I haven't got into using D 2.x, I've already begun thinking 
 about making libraries compatible with it.  On this basis, a design 
 decision to consider is whether functions that return a string should 
 return it as a char[] or a const(char)[].  (I use "string" with its 
 general meaning, and "const(char)[]" to refer to that specific type.  
 Obviously for 1.0 compatibility, I'd have to use the "string" alias 
 wherever I want const(char)[].)

 Obviously, a function that takes a string as a parameter has to take 
 in a const(char)[], to be able to accept a string literal or otherwise 
 a constant string.  But what about the return type?


It's a pity D cannot differentiate string literals and place those 
passed as char[] (mutable) parameters in RAM.  Obviously it would have 
to create a separate one for each and every use.

 Looking through the 2.x version of std.string, they all return 
 const(char)[] rather than char[].  (Except for those that return 
 something else such as a number.)  This is necessary in most cases 
 because of the copy-on-write policy.


True, however when you perform 'copy on write' you get a copy of the 
original and that copy is unique and owned by the copier and therefore 
can be mutable, or in other words char[] not const(char)[].

 But otherwise, it seems that both have their pros and cons.

 There seem to be two cases to consider: libraries targeted 
 specifically at D 2.x, and libraries that (attempt to) support both 
 1.x and 2.x.  At the moment, it's the latter that really matters.

 Let's see.  The string-returning functions in my library more or less 
 fall into these categories:
 (a) functions that build a string in a local variable, which is then 
 returned
 (b) functions that return a copy of a member variable
 (c) property setters and the like that simply pass the argument through
 (d) functions that call a function in Phobos and return the result

 In the case of (a), there is no obvious benefit to returning a 
 const(char)[] rather than a char[].

 Many of the cases of (b) are property getters.  If we have such things 
 returning a const(char)[], then the getter no longer needs to copy the 
 member variable.  Though versioning would be needed to implement this 
 behaviour without causing havoc under 1.x.  The alternative, leaving 
 them returning char[], leads to inconsistency with (c), which would 
 have to return const(char)[].

 That leaves (d), to which the obvious answer is to return whatever 
 type the Phobos function returns.

 On one hand, if the string is generated on the fly, and so altering it 
 would not cause a problem, it seems wasteful to return a const(char)[] 
 only for the caller to have to .dup it if it wants to modify it.


Indeed, and some Phobos functions are doing this; it has been a source of 
irritation for me since the inception of 'const'.

 On the other hand, from the library user's point of view, it can be 
 seen as a confusing inconsistency if some functions return char[] and 
 others const(char)[], when no difference in the semantics of what's 
 returned accounts for this.  It also borders on breaking the 
 encapsulation principle, whereby internal implementation details 
 should not be exposed in my library's API.


I think perhaps providing more than one overload could help lessen 
confusion, things like having:

char[] tolowerInplace(char [] s)

in addition to the standard tolower.

 It's a question of ownership. If the function is returning a new string, 
 and giving ownership of that string to the caller, then it should return 
 a char[]. If the function is returning a string which the caller is 
 merely borrowing, it should return a const(char)[]. In most cases, 
 thinking of things this way causes the return type to be obvious.
 
 And, of course, you can always convert a char[] to a const(char)[].

This is how I tend to think about it also.

 In (a), the function is returning a new string to the caller; it should 
 return char[].
 
 (b) should usually return const(char)[], unless of course you want the 
 caller to mutate the string. If you're going through the trouble of 
 wrapping a member with a getter/setter, then that probably means you 
 don't want the user messing with it directly.
 
 The other cases are less clear, and will vary from function to function.

As I mentioned above I have been repeatedly annoyed by a number of 
Phobos string functions since the introduction of 'const'.

I think in some cases we need to rethink some of the functions and how 
they work in order to provide a more 'const' aware/friendly library.

Example "string[] split(in string s)" in std.string.

If the input is char[] then this function essentially casts the input to 
const and if I want to perform further modification of the input I now 
have to dup the results.

In a sense this function 'takes ownership' of the input and does not 
give it back again.

I think in this case split should be templated.  If the input is char[] 
the result should be char[][], if the input is string the result should 
be string[].
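
A templated split along these lines might look like the sketch below. It is not the Phobos split: splitOnSpaces is a hypothetical helper that only splits on single spaces, and it assumes a recent D2 compiler in which string is immutable(char)[]. The point is only that the element type of the result follows the element type of the input.

```d
// Hypothetical helper, not std.string.split: splits on single spaces only.
// T is deduced as char, const(char) or immutable(char) from the argument.
T[][] splitOnSpaces(T)(T[] s)
    if (is(immutable T == immutable char))
{
    T[][] parts;
    size_t start = 0;
    foreach (i, c; s)
    {
        if (c == ' ')
        {
            parts ~= s[start .. i]; // a slice of the input, no copy
            start = i + 1;
        }
    }
    parts ~= s[start .. $];
    return parts;
}

unittest
{
    char[] buf = "a b c".dup;
    char[][] m = splitOnSpaces(buf);     // mutable in, mutable slices out
    m[0][0] = 'x';                       // writes through to buf
    assert(buf == "x b c");

    string lit = "d e";
    string[] parts = splitOnSpaces(lit); // immutable in, immutable slices out
    assert(parts == ["d", "e"]);
}
```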

This works fine for cases where the input is never copied, but not for 
cases where it is conditionally copied, "string tolower(string s)" in 
std.string for example.

It cannot know ahead of time whether it's going to need to 'copy on 
write' so simply templating it doesn't help, however I suggested a 
possible templated solution which dups only in the case where the input 
is 'string':

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55337

I figure if you want a copy of the input you can manually dup the parameter 
you pass.
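
As a rough sketch of that idea (not the code from the linked post, and not the Phobos implementation), a template can choose between working in place and working on a copy from the element type of its argument. The name toLowerAscii is hypothetical, the conversion is ASCII-only, and a recent D2 compiler is assumed.

```d
// Hypothetical sketch: ASCII-only lowercase conversion.
// Mutable input is modified in place; const or immutable input is
// duplicated first, so the caller's data is never written to.
char[] toLowerAscii(T)(T[] s)
    if (is(immutable T == immutable char))
{
    char[] result;
    static if (is(T == char))
        result = s;      // mutable input: reuse the caller's array
    else
        result = s.dup;  // const/immutable input: work on a copy

    foreach (ref c; result)
    {
        if (c >= 'A' && c <= 'Z')
            c = cast(char)(c + ('a' - 'A'));
    }
    return result;
}

unittest
{
    char[] buf = "ABC".dup;
    char[] r = toLowerAscii(buf);
    assert(r is buf && buf == "abc");      // same memory, changed in place

    string lit = "DEF";
    char[] copy = toLowerAscii(lit);
    assert(copy == "def" && lit == "DEF"); // original untouched
}
```

A caller holding a char[] who wants to keep the original simply passes buf.dup instead of buf.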

Another solution (also hinted at above) may be to provide more than one 
overload, you might do this where you cannot easily template a solution 
to efficiently handle the common case for each input type (mutable/const).

As for your cases mentioned above...

I would probably implement (c), a property setter, as code that sets the 
member followed by a call to the getter so it would return the same as 
(b).  That said I haven't written a lot of these so perhaps my 
experience using them isn't sufficient.

Is there some reason you'd rather return char[] from a setter?

I'm hoping in the case of (d) that phobos will change or provide more 
overloads to handle the different use-cases.

Regan

Jul 30 2007

Stewart Gordon <smjg_1998 yahoo.com> writes:

Regan Heath Wrote:

<snip>
 As for your cases mentioned above...
 
 I would probably implement (c), a property setter, as code that sets the 
 member followed by a call to the getter so it would return the same as 
 (b).  That said I haven't written a lot of these so perhaps my 
 experience using them isn't sufficient.

I've never really liked this idea.  In general, either it would just return the
same string that was passed in, in which case there's no point calling the getter rather
than simply returning the argument, or there would be a performance hit where
the return value isn't used.

Much better would be if D would chain property assignments implicitly:

http://www.digitalmars.com/d/archives/digitalmars/D/10199.html

If only Walter would finally answer this request (among many others)!

Stewart.

Aug 21 2007

Regan Heath <regan netmail.co.nz> writes:

Stewart Gordon wrote:
 Regan Heath Wrote:
 
 <snip>
 As for your cases mentioned above...
 
 I would probably implement (c), a property setter, as code that
 sets the member followed by a call to the getter so it would return
 the same as (b).  That said I haven't written a lot of these so
 perhaps my experience using them isn't sufficient.

 
 I've never really liked this idea.  In general, either it would just
 return the same string that was passed in, in which case there's no point
 calling the getter rather than simply returning the argument

I'd hope the call to the getter would be inlined.

The point I see is consistency, the getter might return the stored value 
with some sort of modification, perhaps due to a change in required 
functionality at some point, or perhaps because more than one getter 
uses the same data member.

, or
 there would be a performance hit where the return value isn't used.

Yeah, that's always going to be a problem.  It's a pity we cannot 
overload on return type.

 Much better would be if D would chain property assignments
 implicitly:
 
 http://www.digitalmars.com/d/archives/digitalmars/D/10199.html
 
 If only Walter would finally answer this request (among many others)!

Yeah, this is another of those cases where a property doesn't quite work 
the same as a plain old data member, p.property += x; being the more 
common one.

Regan

Aug 22 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Regan Heath wrote

 Yeah, this is another of those cases where a property doesn't
 quite work the same as a plain old data member, p.property += x;
 being the more common one.

Again I do not see the deeper reason for a whole discussion, this time 
about properties. Properties _are_ restricted. If one does not want 
these restrictions, one can use a class instead.

-manfred

Aug 22 2007

Regan Heath <regan netmail.co.nz> writes:

Manfred Nowak wrote:
 Regan Heath wrote
 
 Yeah, this is another of those cases where a property doesn't
 quite work the same as a plain old data member, p.property += x;
 being the more common one.

 
 Again I do not see the deeper reason for a whole discussion, this time 
 about properties. Properties _are_ restricted. If one does not want 
 these restrictions, one can use a class instead.

http://www.digitalmars.com/d/property.html
"Properties are member functions that can be syntactically treated as if 
they were fields"

I was under the impression the main benefit to properties was being able 
to replace an existing field (one in use by some user code) with a 
property and have it work without user code changes.

eg.

---------BEFORE--------

class A
{
   public int a;
}

void foo(ref int i) {}

void main()
{
   A a = new A();
   int b;

   b = a.a = 5;
   a.a += 1;
   a.a++;
   foo(a.a);
}

---------AFTER---------

class A
{
   int _a;
   public int a() { return _a; }
   public int a(int _aa) { _a = _aa; return a(); }
}

void main()
{
   A a = new A();
   int b;

   b = a.a = 5;  //error
   a.a += 1;     //error
   a.a++;        //error
   foo(a.a);     //error
}

Sadly there are plenty of cases where a property cannot be 
"syntactically treated as a field" but needs a completely different syntax.

Sure, there are other benefits for properties like performing some 
complex calculation on the input to the setter, or error checking it, or 
whatever but I don't think this is the core benefit to properties as 
these can be achieved with plain old methods i.e. set<Propertyname>

Something that would solve 3 of the errors above is the ability to 
return by 'ref', eg.

class A
{
   int _a;
   public ref int a() { return _a; }
   public ref int a(int _aa) { _a = _aa; return a(); }
}

void main()
{
   A a = new A();
   int b;

   b = a.a = 5;  //error
   a.a += 1;     //ok
   a.a++;        //ok
   foo(a.a);     //ok
}

The problem with the remaining error is that a setter might take and 
return 2 different types, as mentioned here:
   http://www.digitalmars.com/d/archives/digitalmars/D/10199.html
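
For comparison, D2 compilers released after this thread do accept ref-returning functions. The sketch below assumes such a compiler and uses a single ref getter with no separate setter, so it is not the exact two-function form shown above, and it gives up any checking a setter could have done. The body of foo is made up purely so that the effect is visible.

```d
class A
{
    private int _a;

    // One ref-returning getter: callers receive an lvalue for _a.
    @property ref int a() { return _a; }
}

// Illustrative body only; the original foo was empty.
void foo(ref int i) { i *= 2; }

unittest
{
    auto obj = new A;
    int b;

    b = obj.a = 5;   // assigns through the returned reference
    obj.a += 1;      // 6
    obj.a++;         // 7
    foo(obj.a);      // 14, passed by reference
    assert(b == 5);
    assert(obj.a == 14);
}
```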

Regan

Aug 23 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Regan Heath wrote

 "Properties are member functions that can be syntactically treated
 as if they were fields"
 
 I was under the impression the main benefit to properties was
 being able to replace an existing field (one in use by some user
 code) with a property and have it work without user code changes.

For me the definition implies that fields can replace properties, but 
no property can replace a field.

I read the definition above like this:
: if a member function can be syntactically treated as a field, then 
: it is a property
i.e.,  properties have less syntactical power than fields.

Because member functions with at most one formal parameter can be 
treated as fields in assignments, i.e. without the parentheses, D has 
properties according to the definition in the docs.

All those recognizable errors in yours and Stewart's examples are 
based on the wrong assumption, that properties are more powerful than 
fields.

-manfred

Aug 23 2007

Regan Heath <regan netmail.co.nz> writes:

Manfred Nowak wrote:
 Regan Heath wrote
 
 "Properties are member functions that can be syntactically treated
 as if they were fields"

 I was under the impression the main benefit to properties was
 being able to replace an existing field (one in use by some user
 code) with a property and have it work without user code changes.

 
 For me the definition implies that fields can replace properties, but 
 no property can replace a field.
 
 I read the definition above like this:
 : if a member function can be syntactically treated as a field, then 
 : it is a property

Sure, you can reverse the definition if you like.  The end result is 
that as there are no member functions in D which can be syntactically 
treated as if they were fields (in all cases) then D does not have 
properties by this definition.

Note I use "(in all cases)" above, that is the assumption I am making, 
if any.  I believe this assumption is implied by the definition, just as 
I believe your assumption below is not.

 i.e.,  properties have less syntactical power than fields.

This is certainly true, but how do you draw that conclusion from the 
definition? (this is not the assumption to which I refer above)

 Because member functions with at most one formal parameter can be 
 treated as fields in assignments, i.e. without the parentheses, D has 
 properties according to the definition in the docs.

It seems you are giving us a new definition for properties in D which 
reads something like:

"properties are member functions with at most one formal parameter 
(which) can be treated as fields in assignments only"


Let's take a look at the full text of the paragraph in the docs 
containing the definition of a property:

"Properties are member functions that can be syntactically treated as if 
they were fields. Properties can be read from or written to. A property 
is read by calling a method with no arguments; a property is written by 
calling a method with its argument being the value it is set to."

Where does the definition mention only supporting assignments? (this is 
your assumption)

Where does it mention "at most one formal parameter"?

 All those recognizable errors in yours and Stewart's examples are 
 based on the wrong assumption, that properties are more powerful than 
 fields.

It's possible my assumption "that properties should be treated 
syntactically like fields (in all cases)" is incorrect but given the 
core purpose of properties (to replace fields seamlessly) I don't think 
it's an outrageous assumption to make.

I am simply expecting to be able to treat properties as fields 
(syntactically speaking) which is (almost to the letter/word) exactly 
what the definition states.

We can quibble over the definition all we like and frankly I don't really 
care to.  We all know the docs are seldom precisely defined nor do they 
necessarily reflect reality.

The simple fact remains that just about every new D user writes:

char[] p;
p.length += 5;

and expects the 'length' "property" to be incremented by 5.

Unless the error cases mentioned are supported by properties then the 
core reason for having properties (being able to seamlessly refactor 
replacing a field with a property) is null and void as it will regularly 
result in changes being required to user code.

Properties are not as useful as many of us wish they were.

I assume you agree that it would be quite nice to be able to use 
properties in the error cases listed?

Regan

Aug 23 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Regan Heath wrote
 I assume you agree that it would be quite nice to be able to use 
 properties in the error cases listed?

Not exactly properties, but one should be able to iron out those 
errors. As I stated before, one can use an inner class:

class A{
  private:
   int _a;
  public:
   Property a;
   this(){ a= new Property;}
   class Property{
     int opCall() { return _a; }
     int opAssign(int assgn) { _a= assgn; return opCall(); }
     int opAddAssign( int add){ _a+= add; return opCall();}
     int opPostInc(){ int tmp= _a++; return tmp;}
   }
}
void foo(inout A.Property i) {} // wart!!
void main(){
   A a = new A();
   int b;
   b = a.a = 5;
   a.a += 1;
   a.a++;
   foo(a.a);
}

In the examples given here two warts remain:
- classes cannot derive from basic types, therefore the type of the 
formal parameter of `foo' has to be changed
- Stewart's point stays unhandled.

In fact the first wart may be closed by allowing something like 
`class Property: int' or `alias int Property'.

Stewart's point does not need overloading by return type; a 
conditional or lazy return `return? <expr>' would be enough, where 
`return?' has the semantics that the expression `<expr>' is not 
evaluated if its value is not needed, or if it is fed as an actual 
parameter to a function at a position in the formal parameter list 
where a `lazy' parameter is declared.

-manfred

Aug 23 2007

"Stewart Gordon" <smjg_1998 yahoo.com> writes:

"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:faki47$2ifr$1 digitalmars.com...
 Regan Heath wrote

 "Properties are member functions that can be syntactically treated
 as if they were fields"


What I think it means is simply that you can get/set a property with 
notation that looks as though you're getting/setting a field.

 I was under the impression the main benefit to properties was
 being able to replace an existing field (one in use by some user
 code) with a property and have it work without user code changes.


I think that's more or less what was meant to happen, but it hasn't quite 
turned out that way.

 For me the definition implies that fields can replace properties, but
 no property can replace a field.

I don't see how you work that out.

 I read the definition above like this:
 : if a member function can be syntactically treated as a field, then
 : it is a property
 i.e.,  properties have less syntactical power than fields.

 Because member functions with at most one formal parameter can be
 treated as fields in assignments, i.e. without the parentheses, D has
 properties according to the definition in the docs.

 All those recognizable errors in yours and Stewart's examples are
 based on the wrong assumption, that properties are more powerful than
 fields.

I suppose properties have less syntactical power, but more power in terms of 
practical uses.  If that makes sense....

Stewart.

Aug 23 2007

Derek Parnell <derek psych.ward> writes:

On Thu, 23 Aug 2007 22:20:16 +0100, Stewart Gordon wrote:

 I suppose properties have less syntactical power, but more power in terms of 
 practical uses.  If that makes sense....

Yes I can see what you mean.

( silly example: )
    long x() { return _x * 2 + _y; }
    void x(int y) { _x = y / 2; _y = y - 2; }
    void x(char[] y) { . . . }

    foo.x = 16;
    long A = foo.x;
    foo.x = "abc";
    long B = foo.x;
  
On the other hand, when I first read about properties I thought
"Brilliant!" and started using them. However it soon became apparent that
they were not brilliant but were really a PITA. I can't be bothered with
them now as they just increase the cost of maintenance, because they cannot
be syntactically treated as the fields that they look like. They deceive
coders because they look like fields but they are not fields.

In short, properties in D are evil in the same way that 'goto' is evil. If
you use them, be prepared to pay extra for them.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Aug 23 2007
