
digitalmars.D - Semantics of toString

reply Justin Johansson <free beer.com> writes:
I assert that the semantics of "toString" or similarly named/purposed
methods/functions in many PLs (including but not limited to D) are
ill-defined.

To put this statement into perspective, I would be most appreciative of D NG
readers responding with their own idea(s) of what the semantics of "toString"
are (or should be) in a language-agnostic ideology.

If there are more than, say, two or three different views on the said
semantics then my "ill-definition" assertion is surely correct.

If there are no replies on this matter, then I guess I'm left concludeless.

Just thinking in the language round-up that this is (just another) one of the
things we should address as a community.

So what does "toString" mean to you?

**beers,
Justin

**caveat: free beer offer available in-store only
Nov 05 2009
next sibling parent Michal Minich <michal minich.sk> writes:
Hello Justin,

My practice tells me to use toString only for debugging: to quickly get a string representation of an object in human-readable format, and nothing else, ever. So it is good that toString is part of D's Object class. It is quite unsuitable for, e.g., serializing an object to XML/HTML or other formats. You may later find out that your object should not only be toString-ed to XML, but now to JSON as well... Better to use a specific method for a specific purpose. What matters more to me among the Object methods is opEquals being part of them. But that is a different story.
Nov 05 2009
prev sibling next sibling parent Ary Borenszweig <ary esperanto.org.ar> writes:
Justin Johansson wrote:
 So what does "toString" mean to you?

A string useful for debugging purposes and, when possible, useful for programming tasks. For example, in Java there's StringWriter, and its toString method returns the String being written; I think that's fine. An XML node might return its XML representation. But most of the time an object doesn't have a use as a string.
Nov 05 2009
prev sibling next sibling parent "Nick Sabalausky" <a a.a> writes:
"Justin Johansson" <free beer.com> wrote in message 
news:hcuhet$15a2$1 digitalmars.com...
 So what does "toString" mean to you?


(Deliberately not reading the other replies before posting...) It means to me, obtain a string-representation of an object (or an instance of a non-class type) in whatever form is reasonably appropriate for the given type. This string representation might include all data, but this is not guaranteed. It might be unique to each object, but this is not guaranteed. It might be fully-suitable for serialization, but this is not guaranteed.
Nov 05 2009
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Justin Johansson wrote:
 So what does "toString" mean to you?

It's a hack from the early days of D. Should be unavailable unless the -debug flag is set, to discourage people from using it. I hate it.
Nov 05 2009
next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Don" <nospam nospam.com> wrote in message 
news:hcvf9l$91i$1 digitalmars.com...
 Justin Johansson wrote:
 So what does "toString" mean to you?

It's a hack from the early days of D. Should be unavailable unless the -debug flag is set, to discourage people from using it. I hate it.

What don't you like about it?
Nov 05 2009
parent reply Don <nospam nospam.com> writes:
Nick Sabalausky wrote:
 "Don" <nospam nospam.com> wrote in message 
 news:hcvf9l$91i$1 digitalmars.com...
 Justin Johansson wrote:
 So what does "toString" mean to you?

the -debug flag is set, to discourage people from using it. I hate it.

What don't you like about it?

It cannot even do the most basic stuff.

(1) You can't even make a struct that behaves like an int.

    struct MyInt
    {
        int z;
        string toString() { .... }
    }

    void main()
    {
        int a = 400;
        MyInt b = 400;
        writefln("%05d %05d", a, b);
        writefln("%x %x", a, b);
    }

(2) It doesn't behave like a stream. Suppose you have XmlDoc.toString(). You can't emit the doc piece by piece. You have to create the ENTIRE string in one go!
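
[Editor's sketch] For readers on a more recent Phobos than the one discussed in this thread: std.format there recognises a toString overload that receives a sink delegate and the caller's FormatSpec, which is one way to approach complaint (1). The following is a minimal sketch under that assumption, not the behaviour of the 2009 library. MyInt is taken from the post above, but is constructed as MyInt(400) because the struct has no constructor taking an int.

```d
import std.format : FormatSpec, formatValue, format;
import std.stdio : writefln;

struct MyInt
{
    int z;

    // Format-aware overload: the caller's spec (%05d, %x, ...) is forwarded
    // unchanged to the wrapped int, and the text goes to the sink.
    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char fmt) const
    {
        sink.formatValue(z, fmt);
    }
}

void main()
{
    int a = 400;
    auto b = MyInt(400);
    writefln("%05d %05d", a, b); // both arguments use the same spec
    writefln("%x %x", a, b);
}
```

With this overload the struct is formatted by whatever specifier the caller picks, instead of always collapsing to a single pre-built string.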
Nov 05 2009
parent reply Yigal Chripun <yigal100 gmail.com> writes:
The first issue you raise is IMO a problem with writefln and not with toString, since writefln doesn't handle user-defined types properly. I think that writefln (btw, horrible name) should only deal with strings and their formatting, and all other types need to provide an (optionally formatted) string. A numeric type would provide formatting properties like number of decimal places, thousands separator, etc., while a user-defined Specification type could provide a choice of standard formats:

    auto spec = new Specification(HTML);
    string ansi = spec.toString(Specification.ANSI);
    string iso = spec.toString(Specification.ISO);
    writefln("{1} {0}", ansi, iso); // I'm using the Tango/C# formatting

The C-style format string that specifies types is a horrible, horrible thing and should be removed.

Regarding the second issue:

    foreach (node; XmlDoc.preOrder())
        writefln("{0}", node.toString());
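
[Editor's sketch] The "piece by piece" idea can also be expressed with a sink delegate, so that the caller decides whether the pieces go to stdout or into a buffer. This is only an illustration: Node, emit and render are hypothetical names invented here, not part of Phobos, Tango, or any XmlDoc type mentioned in the thread.

```d
import std.array : appender;
import std.stdio : write, writeln;

// Hypothetical minimal XML-like node, for illustration only.
class Node
{
    string name;
    Node[] children;

    this(string name, Node[] children = null)
    {
        this.name = name;
        this.children = children;
    }

    // Emits the node in small pieces; no complete string is built here.
    void emit(scope void delegate(const(char)[]) sink) const
    {
        sink("<");
        sink(name);
        sink(">");
        foreach (child; children)
            child.emit(sink);
        sink("</");
        sink(name);
        sink(">");
    }
}

// Collecting the pieces into one string remains possible when wanted.
string render(const Node n)
{
    auto buf = appender!string();
    n.emit((const(char)[] s) { buf.put(s); });
    return buf.data;
}

void main()
{
    auto doc = new Node("a", [new Node("b"), new Node("c")]);
    doc.emit((const(char)[] s) { write(s); }); // streamed straight to stdout
    writeln();
}
```

The design choice is that the type only knows how to produce pieces; buffering is the caller's decision rather than something toString forces.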
Nov 06 2009
parent reply =?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:
Your formatting string should be written as writeln(ansi, " ", iso);
Nov 06 2009
parent reply Yigal Chripun <yigal100 gmail.com> writes:
On 06/11/2009 12:34, Pelle Månsson wrote:
 How do you do %.3f in {}-notation?

writefln("{0:F3}", value);
 Your formatting string should be written as writeln(ansi, " ", iso);

That is incorrect since in my example I use the format string to switch the order of the strings (hence the numbers inside the {}).

Please go and read the Tango documentation starting with
http://www.dsource.org/projects/tango/wiki/TutCSharpFormatter

It also has links to the MSDN docs which describe the modifiers, for instance:
http://msdn.microsoft.com/en-us/library/dwhawy9k%28VS.100%29.aspx

This is one area in Phobos that needs to be rewritten from scratch or, better yet, use Tango. I'm still waiting for when hell will freeze over and Tango and Phobos will be merged together in one consistent API.
Nov 06 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Not sure to what extent it helps, but Phobos supports positional parameters too. Andrei
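
[Editor's sketch] As a small illustration of the positional parameters mentioned above: std.format accepts the POSIX-style "%N$" prefix, so the argument order can be swapped in the format string. The helper names swapped and fixed3 are made up for this example; the sample strings stand in for the ansi and iso values from the earlier post.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: prints the second argument first, then the first.
string swapped(string first, string second)
{
    return format("%2$s %1$s", first, second);
}

// A positional argument combined with a precision, as in %.3f.
string fixed3(double value)
{
    return format("%1$.3f", value);
}

void main()
{
    writeln(swapped("ANSI", "ISO"));
    writeln(fixed3(3.14159));
}
```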
Nov 06 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Yigal Chripun, on 6 November at 14:23 you wrote:
 the c style format string that specifies types is a horrible horrible
 thing and should be removed.


 Your formatting string should be written as writeln(ansi, " ", iso);


This is horrible, horrible for internationalization, you just can't assume how a language order words. Anyway, about the type in the format, I think it's nice, as you just proved, tango have it too "{0:F3}" is saying "treat the value as a float and format it that way". The deal is, the type should not be used to know the size of the parameter in the stack like in C's printf(), it should be just a hint to convert the value to another type. So, type specification is important. Variables reordering is important too, and you even have it in POSIX's printf(): printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec); (see http://www.opengroup.org/onlinepubs/9699919799/functions/printf.html) I like printf()'s format (I don't know if it's just because I'm used to it though :).

I think you found a bug in Phobos. I tried this:

    import std.stdio;

    void main()
    {
        int hour = 1, min = 2, precision = 2, sec = 3;
        writef("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);
    }

and it prints 1:002:003

But it should really print: 1:02:03

right?

Andrei
Nov 06 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, on 6 November at 08:50 you wrote:
 So, type specification is important. Variables reordering is important
 too, and you even have it in POSIX's printf():

 	printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);

 (see http://www.opengroup.org/onlinepubs/9699919799/functions/printf.html)

 I like printf()'s format (I don't know if it's just because I'm used to it
 though :).

import std.stdio; void main() { int hour = 1, min = 2, precision = 2, sec = 3; writef("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec); } and it prints 1:002:003 But it should really print: 1:02:03 right?

Yes. ------------------------ $ cat t.c #include <stdio.h> int main() { int hour = 1, min = 2, precision = 2, sec = 3; printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec); return 0; } $ make t cc t.c -o t $ ./t 1:02:03 -----------------------

Thanks! http://d.puremagic.com/issues/show_bug.cgi?id=3479 Andrei
Nov 06 2009
prev sibling parent Yigal Chripun <yigal100 gmail.com> writes:
On 06/11/2009 15:38, Leandro Lucarella wrote:
 Yigal Chripun, on 6 November at 14:23 you wrote:
 the c style format string that specifies types is a horrible horrible
 thing and should be removed.


 How do you do %.3f in {}-notation?

writefln("{0:F3}", value);
 Your formatting string should be written as writeln(ansi, " ", iso);


This is horrible, horrible for internationalization, you just can't assume how a language order words. Anyway, about the type in the format, I think it's nice, as you just proved, tango have it too "{0:F3}" is saying "treat the value as a float and format it that way". The deal is, the type should not be used to know the size of the parameter in the stack like in C's printf(), it should be just a hint to convert the value to another type. So, type specification is important. Variables reordering is important too, and you even have it in POSIX's printf(): printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec); (see http://www.opengroup.org/onlinepubs/9699919799/functions/printf.html) I like printf()'s format (I don't know if it's just because I'm used to it though :).

F in the above is _not_ a type specifier. It is a format specifier that means "fixed". Moreover, each type defines its own format specifiers, and there's also a way to custom-format stuff. Here are some more examples (from MSDN):

    string myName = "Fred";
    Console.WriteLine(String.Format("Name = {0}, hours = {1:hh}, minutes = {1:mm}",
                                    myName, DateTime.Now));
    // Depending on the current time, the example displays output like the following:
    // Name = Fred, hours = 11, minutes = 30

    string FormatString1 = String.Format("{0:dddd MMMM}", DateTime.Now);
    string FormatString2 = DateTime.Now.ToString("dddd MMMM");

    Console.WriteLine("{0:F}", DateTime.Now); // NOT float
    // F for DateTime means Full date/time pattern (long time).

Another point about the .NET design is that it's locale-aware, e.g.:

    // Display using pt-BR culture's short date format
    DateTime thisDate = new DateTime(2008, 3, 15);
    CultureInfo culture = new CultureInfo("pt-BR");
    Console.WriteLine(thisDate.ToString("d", culture)); // Displays 15/3/2008

Besides, the printf format is plain unreadable. It's like comparing ASCII to Unicode: D moved to native Unicode support and should move to this much better design as well.
Nov 06 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Don (nospam nospam.com)'s article
 It's a hack from the early days of D. Should be unavailable unless the
 -debug flag is set, to discourage people from using it. I hate it.

Why? You've said this several times w/o giving your reason. IMHO toString() is a great way to get a default string representation of something. If you care about the formatting details, then you use a non-special method. How else would you recommend giving objects a sane default string representation?
Nov 05 2009
parent div0 <div0 users.sourceforge.net> writes:
dsimcha wrote:
 
 Why?  You've said this several times w/o giving your reason.  IMHO toString()
 is a great way to get a default string representation of something.

And that's *exactly* what is wrong. There is *never* a good default for anything. Just look at all the discussion of nullable. (shit people even complain about float.init == NaN)

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
Nov 05 2009
prev sibling parent Justin Johansson <no spam.com> writes:
Don Wrote:

It's a hack from the early days of D. Should be unavailable unless the -debug flag is set, to discourage people from using it. I hate it.

There are some interesting replies coming along here. Thanks everybody for chipping in. I must admit though, when I read Don's reply just now the first thought that went through my mind was "Sweet!" Justin
Nov 05 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Yigal Chripun, on 6 November at 14:23 you wrote:
the c style format string that specifies types is a horrible horrible
thing and should be removed.


How do you do %.3f in {}-notation?

writefln("{0:F3}", value);
Your formatting string should be written as writeln(ansi, " ", iso);


This is horrible, horrible for internationalization; you just can't assume how a language orders words.

Anyway, about the type in the format, I think it's nice. As you just proved, Tango has it too: "{0:F3}" is saying "treat the value as a float and format it that way". The deal is, the type should not be used to know the size of the parameter on the stack like in C's printf(); it should be just a hint to convert the value to another type.

So, type specification is important. Variable reordering is important too, and you even have it in POSIX's printf():

	printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);

(see http://www.opengroup.org/onlinepubs/9699919799/functions/printf.html)

I like printf()'s format (I don't know if it's just because I'm used to it though :).

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
"All mail clients suck. This one just sucks less." -me, circa 1995
Nov 06 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on 6 November at 08:50 you wrote:
So, type specification is important. Variables reordering is important
too, and you even have it in POSIX's printf():

	printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);

(see http://www.opengroup.org/onlinepubs/9699919799/functions/printf.html)

I like printf()'s format (I don't know if it's just because I'm used to it
though :).

I think you found a bug in Phobos. I tried this: import std.stdio; void main() { int hour = 1, min = 2, precision = 2, sec = 3; writef("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec); } and it prints 1:002:003 But it should really print: 1:02:03 right?

Yes.

------------------------
$ cat t.c
#include <stdio.h>
int main()
{
	int hour = 1, min = 2, precision = 2, sec = 3;
	printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);
	return 0;
}
$ make t
cc t.c -o t
$ ./t
1:02:03
-----------------------

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Vaporeso, finding himself engulfed by depression, decides to end his life
by drinking Chinato Garda mixed 50/50 with kerosene. In the ordeal he loses
mobility in his right limbs, lower and upper. From that moment he is
regarded as the leading man of the left-wing movement of the West.
Nov 06 2009
prev sibling next sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
Justin Johansson wrote:

...
 So what does "toString" mean to you?

Whatever you got, give it to me as a string for my printf debugging while my debugger is broken.
Nov 08 2009
prev sibling parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.
 
 To put this statement into perspective, I would be most appreciative of D
 NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.
 

My other reply didn't take the language-agnostic part into account, sorry.

Semantics of toString would depend on the object. I would think there are three general types of objects:

1. objects with only one sensible or one clear default string representation, like integers. Maybe none of these even exist (except strings themselves?)
2. objects that, given some formatting options or locale, have a clear string representation: floating points, dates, currency and the like.
3. objects that have no sensible default representation.

toString() would not make sense for type 3) objects, and for type 2) objects only as part of a formatting / localization package. toString() as a debugging aid sometimes doubles as a formatter for type 1) and 2) objects, but that may be more confusing than it's worth.
Nov 08 2009
next sibling parent reply Justin Johansson <no spam.com> writes:
Thanks for that, Lutger.

Do you think it would make better sense if programming languages/their libraries separated functions/methods which are currently loosely purposed as "toString" into methods which are more specific to the types you suggest (leaving only the types/classifications and number thereof to argue about)?

In my own D project, I've introduced a toDebugString method and left toString alone. There are times when I like D's default toString printing out the name of the object class. For debug purposes there are times also when I like to see a string printed out in quotes so you can tell the difference between "123" and 123. Then again, since I'm working on a scripting language, sometimes I like to see debug output distinguish between different numeric types.

Anyway, going by the replies on this topic, it looks like most people view toString as being good for debug purposes and that's about it.

Cheers
Justin
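
[Editor's sketch] One way the toString / toDebugString split described above might look. Everything here (the Value struct, its Kind enum, the factory functions and the exact debug format) is hypothetical and invented for illustration; the post does not show the actual project code.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical script value, for illustration only.
struct Value
{
    enum Kind { integer, floating, text }

    Kind kind;
    long i;
    double f;
    string s;

    static Value fromInt(long v)     { return Value(Kind.integer, v); }
    static Value fromFloat(double v) { return Value(Kind.floating, 0, v); }
    static Value fromText(string v)  { return Value(Kind.text, 0, double.nan, v); }

    // Plain representation: the text "123" and the number 123 look the same.
    string toString() const
    {
        string r;
        final switch (kind)
        {
            case Kind.integer:  r = i.to!string; break;
            case Kind.floating: r = f.to!string; break;
            case Kind.text:     r = s; break;
        }
        return r;
    }

    // Debug representation: strings are quoted, numbers carry their kind.
    string toDebugString() const
    {
        string r;
        final switch (kind)
        {
            case Kind.integer:  r = i.to!string ~ " (integer)"; break;
            case Kind.floating: r = f.to!string ~ " (floating)"; break;
            case Kind.text:     r = "\"" ~ s ~ "\""; break;
        }
        return r;
    }
}

void main()
{
    writeln(Value.fromText("123").toDebugString());
    writeln(Value.fromInt(123).toDebugString());
}
```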
Nov 08 2009
parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Your design makes better sense (to me at least) because it is based on why you want a string from some object. Take .NET for example: it does provide very elaborate and nice formatting options based on toString() with parameters. For some types, however, the default toString() gives you the name of the type itself, which is in no way related to formatting an object. You learn to work with it, but I find it a bit muddled.

As a last note, I think people view toString as a debug thing mostly because it is very underpowered.
Nov 10 2009
next sibling parent reply Don <nospam nospam.com> writes:
Lutger wrote:
 Justin Johansson wrote:
 
 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

The semantics of toString would depend on the object. I would think there are three general types of objects:

1. Objects with only one sensible or one clear default string representation, like integers. Maybe none of these even exist (except strings themselves?).
2. Objects that, given some formatting options or a locale, have a clear string representation: floating points, dates, currency and the like.
3. Objects that have no sensible default representation.

toString() would not make sense for type 3 objects, and would make sense for type 2 objects only as part of a formatting / localization package. toString() as a debugging aid sometimes doubles as a formatter for type 1 and type 2 objects, but that may be more confusing than it's worth.

Do you think it would make better sense if programming languages/their libraries separated functions/methods which are currently loosely purposed as "toString" into methods which are more specific to the types you suggest (leaving only the types/classifications and number thereof to argue about)? In my own D project, I've introduced a toDebugString method and left toString alone. There are times when I like D's default toString printing out the name of the object class. For debug purposes there are times also when I like to see a string printed out in quotes so you can tell the difference between "123" and 123. Then again, and since I'm working on a scripting language, sometimes I like to see debug output distinguish between different numeric types. Anyway, going by the replies on this topic, it looks like most people view toString as being good for debug purposes and that's about it. Cheers Justin

Your design makes better sense (to me at least) because it is based on why you want a string from some object. Take .NET for example: it does provide very elaborate and nice formatting options based on toString() with parameters. For some types however, the default toString() gives you the name of the type itself, which is in no way related to formatting an object. You learn to work with it, but I find it a bit muddled. As a last note, I think people view toString as a debug thing mostly because it is very underpowered.

There is a definite use for such a thing. But the existing toString() is much, much worse than useless. People think you can do something with it, but you can't. E.g., people have asked for BigInt to support toString(). That is an over-my-dead-body.
Nov 10 2009
next sibling parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Don wrote:
...
 
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.
Nov 10 2009
next sibling parent reply Justin Johansson <no spam.com> writes:
Lutger Wrote:

 Don wrote:
 ...
 
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I have a feeling (and I may well be wrong) that toString might be used in relation to associative arrays. I implemented an AA recently based upon a struct key (I think). Though I cannot remember the exact details I do remember DMD saying something about toString not implemented and so without thinking I gave the struct a toString and that kept DMD happy. Since the code was throw-away I didn't bother to investigate. Like I say, I cannot remember the details but others may recall some similar experience. For all I know it may be a case of RTFM? beers, Justin
Nov 10 2009
parent Justin Johansson <no spam.com> writes:
Bill Baxter Wrote:

 On Tue, Nov 10, 2009 at 3:59 AM, Justin Johansson <no spam.com> wrote:
 Lutger Wrote:

 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I have a feeling (and I may well be wrong) that toString might be used in relation to associative arrays.  I implemented an AA recently based upon a struct key (I think).  Though I cannot remember the exact details I do remember DMD saying something about toString not implemented and so without thinking I gave the struct a toString and that kept DMD happy. Since the code was throw-away I didn't bother to investigate. Like I say, I cannot remember the details but others may recall some similar experience.  For all I know it may be a case of RTFM?

Shouldn't be the case. From TFM:

"""
Classes can be used as the KeyType. For this to work, the class definition must override the following member functions of class Object:
• hash_t toHash()
• bool opEquals(Object)
• int opCmp(Object)
"""

--bb

I think you are right; if I can dig up what it was, and if it is relevant to this discussion, I'll post it. Ignore what I said for the moment. Just wondering now though, in reference to Lutger's comment
 Since you are in the know and probably the biggest toString() hater around:
 are there plans (or rejections thereof) to change toString() before D2 turns
 gold? Seems to me it could break quite some code.



how much core code would be broken if toString was actually banished?
Nov 10 2009
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}
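For illustration only, here is a minimal sketch of how a type might implement the straw-man signature and how a caller might consume it. The `Point` struct and the `render` helper are hypothetical names invented for this example, the `fmt` argument is simply ignored, and the `scope` qualifier on the delegate is an addition that is not part of the straw man.

```d
import std.conv : to;

// Hypothetical example type; not from the thread.
struct Point
{
    int x, y;

    // Straw-man signature: emit partial results through sink().
    // fmt is accepted but ignored in this sketch.
    void toString(scope void delegate(const(char)[]) sink, string fmt) const
    {
        sink("(");
        sink(to!string(x));
        sink(", ");
        sink(to!string(y));
        sink(")");
    }
}

// Hypothetical helper: collects the pieces into one string for callers
// that really do want a string.
string render(T)(auto ref const T value, string fmt = "")
{
    char[] buf;
    value.toString(delegate(const(char)[] piece) { buf ~= piece; }, fmt);
    return buf.idup;
}

void main()
{
    import std.stdio : write, writeln;

    auto p = Point(3, 4);

    // Stream the pieces straight to stdout without building a string.
    p.toString(delegate(const(char)[] piece) { write(piece); }, "");
    writeln();

    // Or collect them when a string is needed.
    assert(render(p) == "(3, 4)");
}
```

The same member function serves both the streaming caller and the collecting caller, which seems to be the point of passing a sink rather than returning a string.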
Nov 10 2009
next sibling parent reply Justin Johansson <no spam.com> writes:
Don Wrote:

 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

That's starting to look like a "serialize" method!
Nov 10 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 07:49:11 -0500, Justin Johansson <no spam.com> wrote:
 
 Don Wrote:

 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

That's starting to look like a "serialize" method!

As it should. I should be able to print a 10000 element container without having to load a string representation of 10000 elements in memory. I'd also like to see the name toString changed to something more appropriate, like output(). And although I think a direct translation is mostly possible, emulating writefln string formatting from tango would be a burden. I don't know if there's any way around it without coming up with some complicated "formatting provider" interface/object implementation, and I don't think it's worth it. Unfortunately, I doubt Walter accepts this, it's been proposed in the past without success. -Steve
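To make the 10000 element point concrete, here is a rough sketch in which a container writes itself piece by piece, so the consumer only ever sees small chunks. `IntList` and `largestChunk` are hypothetical names made up for this illustration; the small per-element strings from `to!string` are still allocated, but no string holding the whole container is ever built.

```d
import std.conv : to;

// Hypothetical container; not from the thread.
struct IntList
{
    int[] items;

    // Emits the list element by element through the sink.
    void toString(scope void delegate(const(char)[]) sink, string fmt) const
    {
        sink("[");
        foreach (i, item; items)
        {
            if (i > 0)
                sink(", ");
            sink(to!string(item));
        }
        sink("]");
    }
}

// Hypothetical helper: reports the largest piece the sink ever received,
// as a stand-in for "how much text was in flight at once".
size_t largestChunk(const IntList list)
{
    size_t largest;
    list.toString(delegate(const(char)[] piece) {
        if (piece.length > largest)
            largest = piece.length;
    }, "");
    return largest;
}

void main()
{
    import std.array : array;
    import std.range : iota;

    auto list = IntList(iota(10_000).array);

    // The longest piece is a single four-digit number.
    assert(largestChunk(list) == 4);
}
```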

Walter does not feel strongly about Phobos. The save() method in "On Iteration" intentionally makes it possible to define ranges as interfaces, which in turn should pave the way towards defining a coherent text streaming mechanism. Andrei
Nov 10 2009
prev sibling next sibling parent Don <nospam nospam.com> writes:
Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 4:40 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

That looks pretty good, actually. I guess I would like to see plain no-arg toString() still supported.

The thing is, the toString() function is essentially a virtual function present in every struct. Each one of those functions needs a very strong justification to exist.
 A default toString() could be implemented in terms of the fancy one as:
 
 string toString() {
      char[] buf;
      toString( (const(char)[] s) { buf ~= s; }, "" );
      return assumeUnique(buf); // assumeUnique is in std.exception
 }
 
 could be a mixin in a library I suppose.

More for the benefit of consumers, or producers? Because

void toString(void delegate(const(char)[]) sink, string fmt) { sink("xxx"); }

isn't much more complex than:

string toString() { return "xxx"; }

other than the signature.
 
 I think I would like to see the format strings not necessarily tied to
 writefln's particular format.

I think the format strings are actually pretty similar, Tango vs writefln? There might be enough common ground. I think the Tango format is a slight superset of the writefln one.
Nov 10 2009
prev sibling next sibling parent reply grauzone <none example.net> writes:
Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.



How are you supposed to print a BigInt then?
 Since you are in the know and probably the biggest toString() hater 
 around: are there plans (or rejections thereof) to change toString() 
 before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

Just put it into an "interface DebugOutput", remove Object.toString(), and be done with it. That interface could be defined in the same module as writefln or format, and its use will be clear.
Nov 10 2009
parent reply Don <nospam nospam.com> writes:
grauzone wrote:
 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something 
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.



How are you supposed to print a BigInt then?

(The problem is even more obvious if you consider BigFloat).
 Just put it into an "interface DebugOutput", remove Object.toString(), 
 and be done with it. That interface could be defined in the same module 
 as writefln or format, and its use will be clear.

BigInt is a struct, so it doesn't have interfaces.
Nov 10 2009
parent reply grauzone <none example.net> writes:
Don wrote:
 grauzone wrote:
 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something 
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.



How are you supposed to print a BigInt then?

(The problem is even more obvious if you consider BigFloat).
 Just put it into an "interface DebugOutput", remove Object.toString(), 
 and be done with it. That interface could be defined in the same 
 module as writefln or format, and its use will be clear.

BigInt is a struct, so it doesn't have interfaces.

Structs are a different matter. Nothing dictates that a struct should have a toString method, or what arguments that method should have, right? (There's this compiler/runtime hack to make struct toString work with writefln, but now that writefln uses compile time varargs, it can go.)
Nov 10 2009
parent Don <nospam nospam.com> writes:
grauzone wrote:
 Don wrote:
 grauzone wrote:
 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing 
 toString()
 is much, much worse than useless. People think you can do 
 something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.



How are you supposed to print a BigInt then?

(The problem is even more obvious if you consider BigFloat).
 Just put it into an "interface DebugOutput", remove 
 Object.toString(), and be done with it. That interface could be 
 defined in the same module as writefln or format, and its use will be 
 clear.

BigInt is a struct, so it doesn't have interfaces.

Structs are a different matter. Nothing dictates that a struct should have a toString method, or what arguments that method should have, right? (There's this compiler/runtime hack to make struct toString work with writefln, but now that writefln uses compile time varargs, it can go.)

This discussion is about that hack. Yes, it might be unnecessary if compile time varargs work sufficiently well.
Nov 10 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.) Andrei
Nov 10 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something 
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.) Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?

It should take an interface. Andrei
Nov 10 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.) Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?

It should take an interface. Andrei

So yet another type in object.d? Or require users to import something specific in every module that's going to use toString? --bb

I am not sure. Opinions as always are welcome. Andrei
Nov 10 2009
next sibling parent Don <nospam nospam.com> writes:
Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 5:27 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing
 toString()
 is much, much worse than useless. People think you can do something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

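Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.) Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?

It should take an interface. Andrei

So yet another type in object.d? Or require users to import something specific in every module that's going to use toString? --bb

I am not sure. Opinions as always are welcome. Andrei
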
That's why my opinion is that the delegate idea is nice. :-) But I guess toString is already defined by Object, right? So it would make sense for an interface needed by an Object method to be defined in object.d. I suppose it could be an interface defined inside the Object class itself? (Does that work? can you define interfaces inside classes?)

It also needs to be used by structs, which aren't inherited from Object. So I don't see how a nested interface could work. I suggest a design acceptance criterion: the simplest case should be about as simple as: return "xxx"; or put("xxx");
Nov 11 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Wed, 11 Nov 2009 04:27:45 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing 
 toString()
 is much, much worse than useless. People think you can do 
 something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

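Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

Straw man:

void toString(void delegate(const(char)[]) sink, string fmt)
{
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.) Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?

It should take an interface. Andrei

So yet another type in object.d? Or require users to import something specific in every module that's going to use toString? --bb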

I am not sure. Opinions as always are welcome. Andrei

Some ranges may be polymorphic, so having a base interface hierarchy in Phobos would be useful anyway. BTW, save() is already implemented and used throughout Phobos under a different name - opSlice (i.e. auto copy = range[]). It's a bikeshed discussion, but why save() and not opSlice(), or even clone()?

It can't be clone() because it doesn't clone. For example say you have a T[] - one would expect clone() actually copies the content. But using opSlice is a good idea. Andrei
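The distinction being argued here can be seen with a plain array, which already acts as a forward range in Phobos. The sketch below is only an illustration: `savedViewSharesElements` and `dupCopiesElements` are hypothetical function names, and the behaviour shown is that of `save` and `.dup` on built-in arrays.

```d
import std.range; // provides front, popFront and save for built-in arrays

// save() remembers a position but still refers to the same elements.
bool savedViewSharesElements()
{
    int[] data = [1, 2, 3];
    auto r = data;
    auto checkpoint = r.save; // a second view, no element is copied

    r.popFront();             // advancing r does not move the checkpoint
    bool independentPosition = r.front == 2 && checkpoint.front == 1;

    checkpoint[0] = 42;       // writes through to the shared storage
    return independentPosition && data[0] == 42;
}

// .dup really copies the contents, which is what "clone" would suggest.
bool dupCopiesElements()
{
    int[] data = [1, 2, 3];
    auto copy = data.dup;
    copy[0] = 7;
    return data[0] == 1 && copy[0] == 7;
}

void main()
{
    assert(savedViewSharesElements());
    assert(dupCopiesElements());
}
```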
Nov 11 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 Well, range doesn't own any of the contents it covers, so deep copy is 
 impossible.
 Yet, there is also the .dup array property, which pretends to be a 
 standard way of creating instance copies.

Well so the second sentence contradicts the first. Let me put it another way: you have the entire vocabulary at your disposal to define save(). Wouldn't you think clone() may be a bit more confusing than others? Andrei
Nov 11 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
2009/11/11 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 Well, range doesn't own any of the contents it covers, so deep copy is
 impossible.
 Yet, there is also the .dup array property, which pretends to be a standard
 way of creating instance copies.

Well so the second sentence contradicts the first. Let me put it another way: you have the entire vocabulary at your disposal to define save(). Wouldn't you think clone() may be a bit more confusing than others?

makeBreadCrumb() ? :-) --bb
Nov 11 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

Bad idea... A range only makes sense as a struct, not an interface/object. I'll tell you why: performance.

You are right. If range interfaces accommodate block transfers, this problem may be addressed. I agree that one virtual call per character output would be overkill. (I seem to recall it's one of the reasons why C++'s iostreams are so inefficient.)
 Ranges are special in two respects:
 
 1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing than 
 using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I don't 
 think there's a way around this.

 2. They are useful for passing to std.algorithm.  But std.algorithm is 
 template-interfaced.  No need for using interfaces because the correct 
 instantiation will be chosen.
 
 If you are intending to add a streaming module that uses ranges, would 
 it not be templated for the range type as std.algorithm is?  If not, the 
 next logical choice is a delegate, which requires no vtable lookup.  
 Using an interface is just asking for a performance penalty for not much 
 gain.

I think the cost of calling through the delegate is roughly the same as a virtual call.
 Here's what I mean by not much gain: I would expect a stream range that 
 does output to have a method in it for outputting a buffer (I'd laugh at 
 you if you wanted to define a stream range that outputs a character at a 
 time).  So the difference between:

Well I'd laugh at you if you thought I'm that brain dead :o).
 x.toString(outputRange, format)
 
 and
 
 x.toString(&outputRange.sink, format)
 
 is pretty darn minimal, and if outputRange is an interface or object, 
 this saves a virtual call per buffer write.  Plus the second form is 
 more universal, you can pass any delegate, and not have to use a range 
 type to wrap a delegate.
 
 Don't fall into the "OOP newbie" trap -- where just because you've found 
 a new concept that is amazing, you want to use it for everything.  I say 
 this because I've seen in the past where someone discovers the power of 
 OOP and then wants to use it for everything, when in some cases, it's 
 overkill.  Just look at some Java "classes"...

There is no need to worry that I'll fall into at least that particular OOP newbie trap. What I think we should do is define a text output interface that allows writing individual characters of all widths and also arrays of all widths. That would be a universal means for text output.

interface TextOutputStream
{
    void put(dchar); // also accommodates char and wchar
    void put(in char[]);
    void put(in wchar[]);
    void put(in dchar[]);
}

The toString method (re-baptized as toStream) would take such an interface. Better ideas are always welcome. Perhaps I'm falling into another OOP newbie trap! (Seriously!)

One possible course of action would be to extend the text output stream to print (and possibly format) some or all primitive types, a la today's phobos streams. That would make TextOutputStream fatter and more diluted, something that I don't like. But then we might define a FormattingTextOutputStream that extends TextOutputStream with all that stuff.
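As a rough sketch of how the proposed interface might be used, the example below pairs it with one possible implementation. Only `TextOutputStream` and the `toStream` name come from the post; `StringStream`, `Celsius` and `demo` are hypothetical, and storing everything as UTF-8 is an assumption of this sketch rather than part of the proposal.

```d
import std.conv : to;
import std.utf : encode;

// The interface as proposed in the post (parameter names added).
interface TextOutputStream
{
    void put(dchar c);      // also accommodates char and wchar
    void put(in char[] s);
    void put(in wchar[] s);
    void put(in dchar[] s);
}

// Hypothetical in-memory implementation that transcodes to UTF-8.
class StringStream : TextOutputStream
{
    private char[] buf;

    void put(dchar c)
    {
        char[4] tmp;
        immutable n = encode(tmp, c); // UTF-8 encode one code point
        buf ~= tmp[0 .. n];
    }

    void put(in char[] s)  { buf ~= s; }
    // foreach with a char loop variable transcodes to UTF-8 code units.
    void put(in wchar[] s) { foreach (char c; s) buf ~= c; }
    void put(in dchar[] s) { foreach (char c; s) buf ~= c; }

    string data() const { return buf.idup; }
}

// Hypothetical type that writes itself to any TextOutputStream.
struct Celsius
{
    int degrees;

    void toStream(TextOutputStream stream) const
    {
        stream.put(to!string(degrees)); // UTF-8 array overload
        stream.put(" C"w);              // UTF-16 array overload
    }
}

string demo()
{
    auto stream = new StringStream;
    Celsius(21).toStream(stream);
    stream.put('!');                    // single-character overload
    return stream.data;
}

void main()
{
    assert(demo() == "21 C!");
}
```

Each `put` here is a virtual call through the interface, so the array overloads are what keep the call count per piece of text low.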
  From another thread:
 Walter does not feel strongly about Phobos.

Huh? I feel like this sentence doesn't make sense, so maybe there's a typo.

I meant to say, Walter does not want to do library design. Andrei
Nov 12 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

A range only makes sense as a struct, not an interface/object. I'll tell you why: performance.

You are right. If range interfaces accommodate block transfers, this problem may be addressed. I agree that one virtual call per character output would be overkill. (I seem to recall it's one of the reasons why C++'s iostreams are so inefficient.)

IIRC, I don't think C++ iostreams use polymorphism

Oh yes they do. (Did you even google?) Virtual multiple inheritance, the works. http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/
, and I don't think 
 they use the "one char at a time" method.

Well they do offer one char at a time and also a block transfer. http://msdn.microsoft.com/en-us/library/760t8w1z%28VS.80%29.aspx I'm not sure how the heck but they still manage to call one virtual method per char, otherwise they'd be plenty fast, which they aren't. I seem to recall write() has a default implementation that calls put() in a loop or something. It's not a topic that I want to study closely. iostreams suck, why spend time on learning the quirks of a broken design.
 Ranges are special in two respects:
  1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing than 
 using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I 
 don't think there's a way around this.

 2. They are useful for passing to std.algorithm.  But std.algorithm 
 is template-interfaced.  No need for using interfaces because the 
 correct instantiation will be chosen.
  If you are intending to add a streaming module that uses ranges, 
 would it not be templated for the range type as std.algorithm is?  If 
 not, the next logical choice is a delegate, which requires no vtable 
 lookup.  Using an interface is just asking for a performance penalty 
 for not much gain.

I think the cost of calling through the delegate is roughly the same as a virtual call.

Not exactly. I think you are right that struct member calls are faster than delegates, but only slightly. The difference being that a struct member call does not need to load the function address from the stack, it can hard-code the address directly. However, virtual calls have to be lower performing because you are doing two indirections, one to the class vtable, then one to the function address itself. Plus those two locations are most likely located on the heap, not the stack, and so may not be in the cache.

I think the only way to figure is to measure. For one thing I disagree with the comment about the cache - a vtable is quite likely to be warm after a couple of calls. I know one thing - Walter's old format function used delegates and it was unusably slow.
 x.toString(outputRange, format)
  and
  x.toString(&outputRange.sink, format)
  is pretty darn minimal, and if outputRange is an interface or 
 object, this saves a virtual call per buffer write.  Plus the second 
 form is more universal, you can pass any delegate, and not have to 
 use a range type to wrap a delegate.
  Don't fall into the "OOP newbie" trap -- where just because you've 
 found a new concept that is amazing, you want to use it for 
 everything.  I say this because I've seen in the past where someone 
 discovers the power of OOP and then wants to use it for everything, 
 when in some cases, it's overkill.  Just look at some Java "classes"...

There is no need to worry that I'll fall into at least that particular OOP newbie trap. What I think we should do is define a text output interface that allows writing individual characters of all widths and also arrays of all widths. That would be a universal means for text output.

interface TextOutputStream
{
    void put(dchar); // also accommodates char and wchar
    void put(in char[]);
    void put(in wchar[]);
    void put(in dchar[]);
}

The toString method (re-baptized as toStream) would take such an interface. Better ideas are always welcome. Perhaps I'm falling into another OOP newbie trap! (Seriously!)

This still fits within a single function, which takes one of the 3 widths (pick one, they can all be translated to each other):

void put(in char[] str)
{
    foreach(dchar dc; str)
    {
        put((&dc)[0..1]);
    }
}

Note that you probably want to build a buffer of dchars instead of putting one at a time, but you get the idea.
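The buffered variant hinted at in that last note might look roughly like the sketch below. `putAsUtf32` and `countCalls` are hypothetical names, the 64-element buffer size is arbitrary, and the sink is a plain delegate standing in for whichever `put` overload would receive the dchar array.

```d
// Decodes UTF-8 input and forwards it in blocks of up to 64 code points,
// so the sink is called once per block rather than once per character.
void putAsUtf32(in char[] str, scope void delegate(in dchar[]) sink)
{
    dchar[64] buf;
    size_t used;

    foreach (dchar dc; str) // foreach decodes UTF-8 into code points
    {
        buf[used++] = dc;
        if (used == buf.length)
        {
            sink(buf[0 .. used]);
            used = 0;
        }
    }

    if (used > 0)
        sink(buf[0 .. used]); // flush the final partial block
}

// Hypothetical helper: counts how many times the sink is invoked.
size_t countCalls(in char[] text)
{
    size_t calls;
    putAsUtf32(text, delegate(in dchar[] chunk) { ++calls; });
    return calls;
}

void main()
{
    // Five characters fit in one block, so the sink is called once.
    assert(countCalls("hello") == 1);
    assert(countCalls("") == 0);
}
```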

I don't get the idea. I'm seeing one virtual call per character.
 Also, putting a single character is probably pretty uncommon, but can be 
 handled in a similar fashion.

I'm not sure about the uncommonality of outputting one character, but it may be good to discourage it just to not foster slow code.
 That being said, one other point that makes all this moot is -- toString 
 is for debugging, not for general purpose.  We don't need to support 
 everything that is possible.  You should be able to say "hey, toString 
 only accepts char[], deal."  Of course, you could substitute wchar[] or 
 dchar[], but I think by far char[] is the most common (and is the 
 default type for string literals).

I was hoping we could elevate the usefulness of toString a bit.
 That's not to say there is no reason to have a TextOutputStream object.  
 Such a thing is perfectly usable for a toString which takes a char[] 
 delegate sink, just pass &put.  In fact, there could be a default 
 toString function in Object that does just that:
 
 class Object
 {
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
 }

I'd agree with the delegate idea if we established that UTF-8 is favored compared to all other formats. Andrei
Nov 12 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 11:46:48 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range 
 and write to it. (The sink is a simplified range.)

A range only makes sense as a struct, not an interface/object. I'll tell you why: performance.

You are right. If range interfaces accommodate block transfers, this problem may be addressed. I agree that one virtual call per character output would be overkill. (I seem to recall it's one of the reasons why C++'s iostreams are so inefficient.)


Oh yes they do. (Did you even google?) Virtual multiple inheritance, the works. http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/

From my C++ book, it appears to only use virtual inheritance. I don't know enough about virtual inheritance to know how that changes function calls. As far as virtual functions, only the destructor is virtual, so there is no issue there.

You're right, but there is an issue because as far as I can recall these functions' implementations do end up calling a virtual function per char; that might be streambuf.overflow. I'm not keen on investigating this any further, but I'd be grateful if you shared any related knowledge. At the end of the day, there seems to be violent agreement that we don't want one virtual call per character or one delegate call per character.
  void put(in char[] str)
 {
   foreach(dchar dc; str)
   {
      put((&dc)[0..1]);
   }
 }
  Note that you probably want to build a buffer of dchars instead of 
 putting one at a time, but you get the idea.

I don't get the idea. I'm seeing one virtual call per character.

You missed the note. I didn't implement it, but you could easily implement a stack-allocated buffer to cache the conversions, passing multiple converted code-points at once. But I don't think it's even worth discussing per my other points.
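The stack-allocated buffer mentioned above could look roughly like the following sketch. The function name `putTranscoded`, the 64-element buffer size, and the delegate-based sink are all assumptions made for illustration; the point is only that the sink is called once per block of converted code points instead of once per character.

```d
// Hypothetical helper: decode UTF-8 input and hand it to a dchar sink in blocks.
void putTranscoded(in char[] str, scope void delegate(in dchar[]) sink)
{
    dchar[64] buf;          // stack-allocated staging buffer
    size_t used = 0;

    foreach (dchar dc; str) // the language decodes UTF-8 into code points here
    {
        buf[used++] = dc;
        if (used == buf.length)
        {
            sink(buf[]);    // one sink call per full buffer
            used = 0;
        }
    }
    if (used > 0)
        sink(buf[0 .. used]); // flush the remainder
}
```

With this shape, a 100-character ASCII string reaches the sink in two calls (64 + 36) rather than 100.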
 That being said, one other point that makes all this moot is -- 
 toString is for debugging, not for general purpose.  We don't need to 
 support everything that is possible.  You should be able to say "hey, 
 toString only accepts char[], deal."  Of course, you could substitute 
 wchar[] or dchar[], but I think by far char[] is the most common (and 
 is the default type for string literals).

I was hoping we could elevate the usefulness of toString a bit.

Whatever kind of data the output stream gets, it's going to convert it to the format it wants anyway (as for stdout, I think that would be utf8), so the only benefit is if you have data stored in a different width that you want to output. Calling a conversion function in that case is reasonable enough, I think, and saves the output stream from having to convert/deal with it. In other words, I don't think it's going to be that common a case where you need anything other than utf8 output, and therefore I don't think the cost of creating an interface, making virtual calls, disallowing simple delegate passing etc. is worth the convenience *just in case* you have data stored as wchar[] you want to output.

I'm not sure. http://www.gnu.org/s/libc/manual/html_node/Streams-and-I18N.html#Streams-and-I18N gnu defines means to set and detect a utf-16 console, which dmd observes (grep std/ for fwide). But then I'm not sure how many are using that kind of stuff.
 That's not to say there is no reason to have a TextOutputStream 
 object.  Such a thing is perfectly usable for a toString which takes 
 a char[] delegate sink, just pass &put.  In fact, there could be a 
 default toString function in Object that does just that:
  class Object
 {
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
 }

I'd agree with the delegate idea if we established that UTF-8 is favored compared to all other formats.

D seems to favor UTF8 -- it is the default type for string literals. I don't think I've ever used dchar, and I usually only use wchar to talk to Win32 functions when required. The question I'd ask is -- how common is it where the versions other than char[] would be more convenient?

I don't know. I think Asian-language users might give a salient answer. Andrei
Nov 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
   From my C++ book, it appears to only use virtual inheritance.  I 
 don't know enough about virtual inheritance to know how that changes 
 function calls.
  As far as virtual functions, only the destructor is virtual, so 
 there is no issue there.

You're right, but there is an issue because as far as I can recall these functions' implementations do end up calling a virtual function per char; that might be streambuf.overflow. I'm not keen on investigating this any further, but I'd be grateful if you shared any related knowledge.

Yep, you are right. It appears the reason they do this is so the conversion to the appropriate width can be done per character (and is a no-op for char).
 At the end of the day, there seems to be violent agreement that we 
 don't want one virtual call per character or one delegate call per 
 character.

After running my tests, it appears the virtual call vs. delegate is so negligible, and the virtual call vs. direct call is only slightly less negligible, I think the virtualness may not matter. However, I think avoiding one *call* per character is a worthy goal. This doesn't mean I change my mind :) I still think there is little benefit to having to conjure up an entire object just to convert something to a string vs. writing a simple inner function. One way to find out is to support only char[], and see who complains :) It'd be much easier to go from supporting char[] to supporting all the widths than going from supporting all to just one.

One problem I just realized is that, if we e.g. offer only put(in char[]) or a delegate to that effect, we make it impossible to output one character efficiently. The (&c)[0 .. 1] trick will not work in safe mode. You'd have to allocate a one-element array dynamically. Also, many OSs adopted UTF-16 as their standard format. It may be wise to design for compatibility. Andrei
Nov 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  I 
 don't know enough about virtual inheritance to know how that 
 changes function calls.
  As far as virtual functions, only the destructor is virtual, so 
 there is no issue there.

You're right, but there is an issue because as far as I can recall these functions' implementations do end up calling a virtual function per char; that might be streambuf.overflow. I'm not keen on investigating this any further, but I'd be grateful if you shared any related knowledge.

Yep, you are right. It appears the reason they do this is so the conversion to the appropriate width can be done per character (and is a no-op for char).
 At the end of the day, there seems to be violent agreement that we 
 don't want one virtual call per character or one delegate call per 
 character.

After running my tests, it appears the virtual call vs. delegate is so negligible, and the virtual call vs. direct call is only slightly less negligible, I think the virtualness may not matter. However, I think avoiding one *call* per character is a worthy goal. This doesn't mean I change my mind :) I still think there is little benefit to having to conjure up an entire object just to convert something to a string vs. writing a simple inner function. One way to find out is to support only char[], and see who complains :) It'd be much easier to go from supporting char[] to supporting all the widths than going from supporting all to just one.

One problem I just realized is that, if we e.g. offer only put(in char[]) or a delegate to that effect, we make it impossible to output one character efficiently. The (&c)[0 .. 1] trick will not work in safe mode. You'd have to allocate a one-element array dynamically.

char[1] buf; buf[0] = c; put(buf);

This would not compile in SafeD.
 Although it would be a useful feature to be able to convert a value type 
 to an array of one element reference, especially since that should be as 
 safe as taking a slice of a static array.
 
 Another solution, although I'm unaware of the added costs:
 
 void toString(void delegate(in char[]...) put, string fmt);
 
 Also, many OSs adopted UTF-16 as their standard format. It may be wise 
 to design for compatibility.

So you want toString's to look like this?

    version(utf16isdefault)
    {
       textobj.put("Array: "w);
       ...
    }
    else
    {
       textobj.put("Array: ");
       ...
    }

-Steve

I was just thinking of offering an interface that offers utf8 and utf16 and utf32. Andrei
Nov 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 16:19:39 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  
 I don't know enough about virtual inheritance to know how that 
 changes function calls.
  As far as virtual functions, only the destructor is virtual, so 
 there is no issue there.

You're right, but there is an issue because as far as I can recall these functions' implementations do end up calling a virtual function per char; that might be streambuf.overflow. I'm not keen on investigating this any further, but I'd be grateful if you shared any related knowledge.

Yep, you are right. It appears the reason they do this is so the conversion to the appropriate width can be done per character (and is a no-op for char).
 At the end of the day, there seems to be violent agreement that we 
 don't want one virtual call per character or one delegate call per 
 character.

After running my tests, it appears the virtual call vs. delegate is so negligible, and the virtual call vs. direct call is only slightly less negligible, I think the virtualness may not matter. However, I think avoiding one *call* per character is a worthy goal. This doesn't mean I change my mind :) I still think there is little benefit to having to conjure up an entire object just to convert something to a string vs. writing a simple inner function. One way to find out is to support only char[], and see who complains :) It'd be much easier to go from supporting char[] to supporting all the widths than going from supporting all to just one.

One problem I just realized is that, if we e.g. offer only put(in char[]) or a delegate to that effect, we make it impossible to output one character efficiently. The (&c)[0 .. 1] trick will not work in safe mode. You'd have to allocate a one-element array dynamically.

char[1] buf; buf[0] = c; put(buf);

This would not compile in SafeD.

:O Why not? I would expect that using a local buffer would be the main way for converting non-string things to strings, or to avoid calling the delegate/vfunction lots of times.

Well a stack-allocated buffer is stack-allocated, and passing a slice out of it to a function may cause the function to escape the slice.
 i.e. if I want to output an integer i:
 
 
 if(i == 0) put("0");
 else
 {
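   char[20] buf;
   size_t idx = buf.length; // one past the last digit written
   while(i != 0)            // assumes i > 0; a negative i needs separate handling
   {
     --idx;
     buf[idx] = cast(char)('0' + i % 10); // store the ASCII digit, not the raw value
     i /= 10;
   }
   put(buf[idx..$]); // no compily in SafeD???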
 }
 
 Do I have to allocate a heap buffer in SafeD?

I'm afraid so. Unless of course you have a put(dchar) routine handy :o).
 Also, many OSs adopted UTF-16 as their standard format. It may be 
 wise to design for compatibility.

So you want toString's to look like this?

    version(utf16isdefault)
    {
       textobj.put("Array: "w);
       ...
    }
    else
    {
       textobj.put("Array: ");
       ...
    }

-Steve

I was just thinking of offering an interface that offers utf8 and utf16 and utf32.

Yes, and your explanation for this is that many OSes adopt UTF-16 as their standard format. My expectation is that the outputter will convert to the required OS format anyway, regardless of what you pass it, so why should we write code to cater to what the OS wants? I'd like to write string-handling code once and be done with it, not try to optimize my toString functions so that they use the "right" methods for the current OS. I asserted that the only reason you want to use the functions other than the char[] version is in the case where your data is *stored* as wchar[] or dchar[]. Otherwise, it makes no sense to do the conversion because the outputter already does it for you. So the question becomes, how often do you need to output data that's already in dchar[] or wchar[] format, and is it worth passing around a list of functions just in case you need that, or should you just call a conversion routine the few times you need it? Let's not forget that this is mainly for debugging...

If it's mainly for debugging maybe it's not worth spending time on. Andrei
Nov 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Thu, Nov 12, 2009 at 1:54 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Let's not forget that this is mainly for debugging...


Nonsense! Developers spend a lot of time debugging. Helping people debug their programs is certainly worth spending time on. --bb

Sorry sorry. I just meant to say it's not worth coming up with an airtight design. We might afford some extra conversions and extra virtual calls I guess. But that being said, I'd so much want to start thinking of an actual text serialization infrastructure. Why develop one later with the mention "well use that stuff for debugging only, this is the real stuff." Andrei
Nov 12 2009
parent Yigal Chripun <yigal100 gmail.com> writes:
Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 17:13:06 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Bill Baxter wrote:
 On Thu, Nov 12, 2009 at 1:54 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Let's not forget that this is mainly for debugging...


Nonsense! Developers spend a lot of time debugging. Helping people debug their programs is certainly worth spending time on. --bb

Sorry sorry. I just meant to say it's not worth coming up with an airtight design. We might afford some extra conversions and extra virtual calls I guess. But that being said, I'd so much want to start thinking of an actual text serialization infrastructure. Why develop one later with the mention "well use that stuff for debugging only, this is the real stuff."

The main purpose to serialize is to be able to deserialize. The main reason to print debug information is so a person can read it. I don't know if those two goals overlap enough. I think we need both. Maybe one uses the other, I'm not sure, but a way to say "here's how you interact with writefln and friends" would be very nice. -Steve

I'd add to that that a format facility should be locale aware as in .Net. i.e: (pseudo-code)

    auto str = format("{0}", 2.4, CurrentCulture); // or specify a specific locale

str will be either "2.4" or "2,4" based on locale. This serves an entirely different purpose from serialization even though both have common parts. You can't and shouldn't try to de-serialize the above text representation.
Nov 12 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Thu, Nov 12, 2009 at 1:54 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Let's not forget that this is mainly for debugging...

If it's mainly for debugging maybe it's not worth spending time on.

Nonsense! Developers spend a lot of time debugging. Helping people debug their programs is certainly worth spending time on. --bb
Nov 12 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 17:13:06 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Bill Baxter wrote:
 On Thu, Nov 12, 2009 at 1:54 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Let's not forget that this is mainly for debugging...


Nonsense! Developers spend a lot of time debugging. Helping people debug their programs is certainly worth spending time on. --bb

Sorry sorry. I just meant to say it's not worth coming up with an airtight design. We might afford some extra conversions and extra virtual calls I guess. But that being said, I'd so much want to start thinking of an actual text serialization infrastructure. Why develop one later with the mention "well use that stuff for debugging only, this is the real stuff."

The main purpose to serialize is to be able to deserialize. The main reason to print debug information is so a person can read it. I don't know if those two goals overlap enough. I think we need both. Maybe one uses the other, I'm not sure, but a way to say "here's how you interact with writefln and friends" would be very nice. -Steve
Nov 12 2009
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 By far a direct call is faster, but I was surprised at how
 little overhead virtual calls add in relation to the loop counter.  I had
 to use 10 billion loops or else the difference was undetectable.
 I used dmd 1.046 -release -O (the -release is needed to get rid of the
 class method checking the invariant every call).
 The relative assembly for calling a virtual method is:
 mov	ECX,[EBX]
 mov	EAX,EBX
 push	dword ptr -8[EBP]
 call	dword ptr 014h[ECX]
 and the assembly for calling a delegate is:
 push	dword ptr -8[EBP]
 mov	EAX,-010h[EBP]
 call	EBX
 -Steve

Your benchmarks don't show that the direct call is much faster. You had inlining disabled. Was this intentional? If so, it proves my point that most of the overhead from virtual calls comes from the fact that they can't usually be inlined, not because they're virtual.
Nov 12 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Thu, 12 Nov 2009 12:38:00 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
  By far a direct call is faster, but I was surprised at how
 little overhead virtual calls add in relation to the loop counter.  I
 had
 to use 10 billion loops or else the difference was undetectable.
 I used dmd 1.046 -release -O (the -release is needed to get rid of the
 class method checking the invariant every call).
 The relative assembly for calling a virtual method is:
 mov	ECX,[EBX]
 mov	EAX,EBX
 push	dword ptr -8[EBP]
 call	dword ptr 014h[ECX]
 and the assembly for calling a delegate is:
 push	dword ptr -8[EBP]
 mov	EAX,-010h[EBP]
 call	EBX
 -Steve

Your benchmarks don't show that the direct call is much faster. You had inlining disabled. Was this intentional? If so, it proves my point that most of the overhead from virtual calls comes from the fact that they can't usually be inlined, not because they're virtual.

small but present amount.

Yes, about 0.5 nanoseconds. In other words, if your CPU is roughly 2 GHz, about one **clock cycle**. This is definitely negligible IMHO.
 Inlining makes the struct member function call disappear (b/c foo does
 nothing!), so it's not really a relevant benchmark.

Right, my point is that the overhead of indirect function calls compared to direct function calls is too small to ever be worth considering assuming the direct function call is not inlined. However, when the direct function call may be inlined, this is where indirect calls really hurt because they usually can't be inlined.
 I did the "struct" version as a baseline.  Consider that the struct
 version is the cost of doing the loop increments, pushing the 'this'
 pointer and argument, and calling the function.  Any difference from that
 is the overhead of virtual/delegate/interface calls.
 Inlining is not possible with delegates (yet), so it's not really
 important for this argument.
 -Steve
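For anyone who wants to repeat the comparison, a minimal sketch of such a micro-benchmark follows. It is not Steve's original test program, which is not shown in the thread; the type names `Direct`, `Virtual` and `Timings` and the function `measure` are hypothetical. Note that an optimizing build may inline the direct call, which is exactly the effect discussed above, so the flags used change what is being measured.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

struct Direct  { long total; void foo(int x) { total += x; } } // direct (non-virtual) call
class  Virtual { long total; void foo(int x) { total += x; } } // virtual call through the vtable

struct Timings
{
    Duration direct, viaVtable, viaDelegate;
    long checksum; // sum of all totals, so the work cannot simply be discarded
}

Timings measure(int loops)
{
    Timings t;
    Direct d;
    auto v = new Virtual;
    Direct target;
    void delegate(int) dg = &target.foo; // delegate bound to a struct on the stack

    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. loops) d.foo(i);
    t.direct = sw.peek();

    sw.reset(); // resets elapsed time; the stopwatch keeps running
    foreach (i; 0 .. loops) v.foo(i);
    t.viaVtable = sw.peek();

    sw.reset();
    foreach (i; 0 .. loops) dg(i);
    t.viaDelegate = sw.peek();

    t.checksum = d.total + v.total + target.total;
    return t;
}
```

Calling `measure` with a large loop count and printing the three `Duration` fields gives a rough picture; the numbers depend heavily on the compiler, flags and CPU, so they should be read as relative indications only.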

Nov 12 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 A delegate is equivalent to a struct member function call.  (load data 
 pointer (i.e. this), push args, call function)

I think this particular point is incorrect. Andrei
Nov 12 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Steven Schveighoffer wrote:
 A delegate is equivalent to a struct member function call.  (load data
 pointer (i.e. this), push args, call function)

I think this particular point is incorrect. Andrei

Most of the overhead from indirect function calls comes from the fact that they (usually) can't be inlined, not because they are indirect. The struct member function call is faster mostly because it can be inlined, not because it's direct. Here's roughly what the ASM would look like for a call to a member function of a struct on the stack, if I is a metasyntactic variable for any immediate value:

    mov EAX, EBP;  // Copy frame pointer to EAX
    add EAX, I;    // Add the offset of the struct to EAX.
    push EAX;      // EAX is now the this ptr. Push it.
    call I;        // Call the function.

And for a delegate that lives on the stack:

    mov EAX, [EBP + I];  // Move delegate's this ptr into EAX.
    push EAX;            // Push delegate's this ptr onto stack.
    call [EBP + I];      // Call whatever address is at offset I from EBP.

I've actually benchmarked how much indirect function calls cost compared to direct calls that aren't inlined. The short answer is it's not measurable, at least when calling the same function indirectly in a loop over and over. It could in theory cause pipeline stalls because it's a branch, but according to some Intel optimization manual Don posted here a while back, modern CPUs predict the address of indirect function calls in their branch predictor. This means that if the same path is taken again and again, the overhead will be negligible.
Nov 12 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 
 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

Bad idea... A range only makes sense as a struct, not an interface/object. I'll tell you why: performance. Ranges are special in two respects: 1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

Oops, I meant 3 virtual functions -- front, popNext, and empty. -Steve

Output range has only one method: put. I'm not sure, but I don't think there is a performance difference between calling a virtual function through an interface and invoking a delegate. But I agree passing a delegate is more generic. You can substitute an output range with a delegate (obj.toString(&range.put, fmt)) without any performance hit, but not vice versa (obj.toString(new DelegateWrapRange(&myput), fmt) implies an additional allocation and additional indirection per range.put call).
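The asymmetry described above can be sketched as follows. Everything here except the name `DelegateWrapRange` (taken from the post) is hypothetical: `CharSink`, `StringSink` and `Point` exist only for illustration, and the `fmt` parameter is left out to keep the example short.

```d
import std.conv : to;

interface CharSink { void put(in char[] s); }

// Hypothetical sink object that collects output in memory.
final class StringSink : CharSink
{
    char[] data;
    void put(in char[] s) { data ~= s; }
}

// Going the other way needs a wrapper object: one extra allocation,
// plus an extra indirection on every put call.
final class DelegateWrapRange : CharSink
{
    private void delegate(in char[]) dg;
    this(void delegate(in char[]) dg) { this.dg = dg; }
    void put(in char[] s) { dg(s); }
}

struct Point
{
    int x, y;

    // Delegate-based form: accepts any callable sink.
    void toString(scope void delegate(in char[]) sink) const
    {
        sink("(");
        sink(x.to!string);
        sink(", ");
        sink(y.to!string);
        sink(")");
    }

    // Object-based form simply forwards; no wrapper is needed in this direction.
    void toString(CharSink sink) const
    {
        toString(&sink.put);
    }
}
```

Passing `&obj.put` costs nothing beyond forming the delegate, whereas `new DelegateWrapRange(&myput)` allocates on every conversion, which is the point being made about the delegate form being the more general one.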

I think that, on the contrary, working with a delegate is less generic. A delegate is cost-wise much like a class with only one (non-final) method. Since we're taking that hit already, we may as well define actual interfaces and classes that have multiple methods. That makes things more flexible and more efficient. Andrei
Nov 12 2009
next sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

Bad idea... A range only makes sense as a struct, not an interface/object. I'll tell you why: performance. Ranges are special in two respects: 1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

Oops, I meant 3 virtual functions -- front, popNext, and empty. -Steve

Output range has only one method: put. I'm not sure, but I don't think there is a performance difference between calling a virtual function through an interface and invoking a delegate. But I agree passing a delegate is more generic. You can substitute an output range with a delegate (obj.toString(&range.put, fmt)) without any performance hit, but not vice versa (obj.toString(new DelegateWrapRange(&myput), fmt) implies an additional allocation and additional indirection per range.put call).

I think that, on the contrary, working with a delegate is less generic. A delegate is cost-wise much like a class with only one (non-final) method. Since we're taking that hit already, we may as well define actual interfaces and classes that have multiple methods. That makes things more flexible and more efficient.

How? It seems to introduce more requirements on the implementation, but I'm not seeing any benefit in exchange. FWIW, with regard to performance, I can easily imagine the compiler being able to perform the equivalent of a "named return value" optimisation on a delegate return, giving some chance of inlining. That's a lot less obvious with an interface.
Nov 12 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range 
 and write to it. (The sink is a simplified range.)

Bad idea... A range only makes sense as a struct, not an interface/object. I'll tell you why: performance. Ranges are special in two respects: 1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

Oops, I meant 3 virtual functions -- front, popNext, and empty. -Steve

Output range has only one method: put. I'm not sure, but I don't think there is a performance difference between calling a virtual function through an interface and invoking a delegate. But I agree passing a delegate is more generic. You can substitute an output range with a delegate (obj.toString(&range.put, fmt)) without any performance hit, but not vice versa (obj.toString(new DelegateWrapRange(&myput), fmt) implies an additional allocation and additional indirection per range.put call).

I think that, on the contrary, working with a delegate is less generic. A delegate is cost-wise much like a class with only one (non-final) method. Since we're taking that hit already, we may as well define actual interfaces and classes that have multiple methods. That makes things more flexible and more efficient.

How? It seems to introduce more requirements on the implementation, but I'm not seeing any benefit in exchange.

The benefit is that it allows writing all character widths.
 FWIW, with regard to performance, I can easily imagine the compiler 
 being able to perform the equivalent of a "named return value" 
 optimisation on a delegate return, giving some chance of inlining.
 That's a lot less obvious with an interface.

That seems plausible. Andrei
Nov 12 2009
prev sibling parent reply Justin Johansson <no spam.com> writes:
Andrei Alexandrescu Wrote:

 Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 
 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

Bad idea... A range only makes sense as a struct, not an interface/object. I'll tell you why: performance. Ranges are special in two respects: 1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

Oops, I meant 3 virtual functions -- front, popNext, and empty. -Steve

Output range has only one method: put. I'm not sure, but I don't think there is a performance difference between calling a virtual function through an interface and invoking a delegate. But I agree passing a delegate is more generic. You can substitute an output range with a delegate (obj.toString(&range.put, fmt)) without any performance hit, but not vice versa (obj.toString(new DelegateWrapRange(&myput), fmt) implies an additional allocation and additional indirection per range.put call).

I think that, on the contrary, working with a delegate is less generic. A delegate is cost-wise much like a class with only one (non-final) method. Since we're taking that hit already, we may as well define actual interfaces and classes that have multiple methods. That makes things more flexible and more efficient. Andrei

"Since we're taking that hit already, we may as well define
 actual interfaces and classes that have multiple methods."

Which do you mean -- interfaces, classes or both? Don't interfaces have a higher cost than classes? Justin
Nov 12 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Justin Johansson wrote:
 Andrei Alexandrescu Wrote:
 
 Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

A range only makes sense as a struct, not an interface/object. I'll tell you why: performance. Ranges are special in two respects: 1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

-Steve

I'm not sure, but I don't think there is a performance difference between calling a virtual function through an interface and invoking a delegate. But I agree passing a delegate is more generic. You can substitute an output range with a delegate (obj.toString(&range.put, fmt)) without any performance hit, but not vice versa (obj.toString(new DelegateWrapRange(&myput), fmt) implies an additional allocation and additional indirection per range.put call).

A delegate is cost-wise much like a class with only one (non-final) method. Since we're taking that hit already, we may as well define actual interfaces and classes that have multiple methods. That makes things more flexible and more efficient. Andrei

"Since we're taking that hit already, we may as well define
 actual interfaces and classes that have multiple methods."

Which do you mean -- interfaces, classes or both? Don't interfaces have a higher cost than classes?

My understanding is that the costs are comparable. Andrei
Nov 12 2009
prev sibling next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:

Denis:
 BTW, save() is already implemented and used throughout the Phobos under a
 different name - opSlice (i.e. auto copy = range[]). It's a bikeshed
 discussion, but why save() and not opSlice(), or even clone()?

2009/11/11 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>
 It can't be clone() because it doesn't clone. For example say you have a
 T[] - one would expect clone() actually copies the content. But using
 opSlice is a good idea.

I don't get it. Shouldn't save() copy the content?

Do you mean we could use opSlice() (the parameterless version) as a save function and write "auto r2 = r1[];"?
But, again, maybe I don't get something: for dyn. arrays (aka the range archetype) opSlice is not a save, it's just an alias. So using opSlice doesn't work for remembering positions with arrays.

   Philippe
Nov 11 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 11 Nov 2009 20:08:52 +0300, Philippe Sigaud  
<philippe.sigaud gmail.com> wrote:

 Denis:
 BTW, save() is already implemented and used throughout the Phobos under
 a different name - opSlice (i.e. auto copy = range[]). It's a bikeshed
 discussion, but why save() and not opSlice(), or even clone()?

 2009/11/11 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>
 It can't be clone() because it doesn't clone. For example say you have a
 T[] - one would expect clone() actually copies the content. But using
 opSlice is a good idea.

I don't get it. Shouldn't save() copy the content? Do you mean we could use opSlice() (the parameterless version) as a save function and write "auto r2 = r1[];"? But, again maybe I don't get something: for dyn. arrays (aka the range archetype) opSlice is not a save, it's just an alias. So using opSlice doesn't work for remembering positions with arrays. Philippe

It remembers array bounds, not contents.
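
A short sketch of that distinction for built-in arrays, using only core language features: the parameterless slice saves the position (bounds), while the elements stay shared; `.dup` is what actually copies them.

```d
void main()
{
    int[] r1 = [1, 2, 3, 4];
    auto r2 = r1[];          // parameterless slice: same bounds, same memory

    r1 = r1[1 .. $];         // advance r1 only
    assert(r1.length == 3);
    assert(r2.length == 4);  // r2 still remembers the original bounds

    r1[0] = 99;              // write through r1 (original index 1)
    assert(r2[1] == 99);     // r2 sees it: the contents are not copied

    auto r3 = r2.dup;        // an actual copy of the contents
    r2[0] = -1;
    assert(r3[0] == 1);      // the copy is unaffected
}
```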
Nov 11 2009
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Thu, Nov 12, 2009 at 10:46 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 I'd agree with the delegate idea if we established that UTF-8 is favored
 compared to all other formats.

D seems to favor UTF8 -- it is the default type for string literals. I
 don't think I've ever used dchar, and I usually only use wchar to talk to
 Win32 functions when required.

 The question I'd ask is -- how common is it where the versions other than
 char[] would be more convenient?

I don't know. I think Asian-language users might give a salient answer.

This isn't authoritative, but I don't think utf-16 is commonly used in Japan (except for calling Windows APIs). If you look at Mozilla the default Japanese encoding listed is Shift-JIS. A lot of Japanese email still gets sent as ISO-2022-JP. Otherwise utf-8 I think. A quick look at www.asahi.com shows they're using EUC-JP. nicovideo.jp is using utf-8. I seem to recall that my Japanese Visual Studio even saved files in Utf-8, or at least could be set to use utf-8. In short, I think utf-8 is closer to being a widely accepted standard for documents over there than utf-16 is. --bb
Nov 12 2009
prev sibling next sibling parent Justin Johansson <no spam.com> writes:
Don Wrote:

 Lutger wrote:
 Justin Johansson wrote:
 
 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

Semantics of toString would depend on the object. I would think there are three general types of objects:

1. objects with only one sensible or one clear default string representation, like integers. Maybe even none of these exist (except strings themselves?)
2. objects that, given some formatting options or locale, have a clear string representation: floating points, dates, currency and the like.
3. objects that have no sensible default representation.

toString() would not make sense for type 3) objects, and only for type 2) objects as part of a formatting / localization package.

toString() as a debugging aid sometimes doubles as a formatter for 1) and 2) class objects, but that may be more confusing than it's worth.

Do you think it would make better sense if programming languages/their libraries separated functions/methods which are currently loosely purposed as "toString" into methods which are more specific to the types you suggest (leaving only the types/classifications and number thereof to argue about)? In my own D project, I've introduced a toDebugString method and left toString alone. There are times when I like D's default toString printing out the name of the object class. For debug purposes there are times also when I like to see a string printed out in quotes so you can tell the difference between "123" and 123. Then again, and since I'm working on a scripting language, sometimes I like to see debug output distinguish between different numeric types. Anyway going by the replies on this topic, looks like most people view toString as being good for debug purposes and that about it. Cheers Justin

Your design makes better sense (to me at least) because it is based on why you want a string from some object. Take .NET for example: it does provide very elaborate and nice formatting options based on toString() with parameters. For some types however, the default toString() gives you the name of the type itself which is in no way related to formatting an object. You learn to work with it, but I find it a bit muddled. As a last note, I think people view toString as a debug thing mostly because it is very underpowered.

There is a definite use for such a thing. But the existing toString() is much, much worse than useless. People think you can do something with it, but you can't. eg, people have asked for BigInt to support toString(). That is an over-my-dead-body.

s/over-my-dead-body/over-your-dead-body/ :-)

At least those are the words that Brendan Eich uses when people seek to make JavaScript multi-threaded.

http://weblogs.mozillazine.org/roadmap/archives/2007/02/threads_suck.html
http://www.teknico.net/misc/fortune/concurrency.en.txt

Google: http://www.google.com.au/#hl=en&q=Brendan+Eich+"your+dead+body"

Best regards and thanks to all respondents on "toString" topic,

Justin
Nov 10 2009
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:

 Just out of curiousity, how does someone print out the
 value of a BigInt right now?

I have added a toString to my copy of the BigInt. Bye, bearophile
Nov 10 2009
prev sibling parent reply Don <nospam nospam.com> writes:
Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

Semantics of toString would depend on the object. I would think there are three general types of objects:

1. objects with only one sensible or one clear default string representation, like integers. Maybe even none of these exist (except strings themselves?)
2. objects that, given some formatting options or locale, have a clear string representation: floating points, dates, currency and the like.
3. objects that have no sensible default representation.

toString() would not make sense for type 3) objects, and only for type 2) objects as part of a formatting / localization package.

toString() as a debugging aid sometimes doubles as a formatter for 1) and 2) class objects, but that may be more confusing than it's worth.

Do you think it would make better sense if programming languages/their libraries separated functions/methods which are currently loosely purposed as "toString" into methods which are more specific to the types you suggest (leaving only the types/classifications and number thereof to argue about)? In my own D project, I've introduced a toDebugString method and left toString alone. There are times when I like D's default toString printing out the name of the object class. For debug purposes there are times also when I like to see a string printed out in quotes so you can tell the difference between "123" and 123. Then again, and since I'm working on a scripting language, sometimes I like to see debug output distinguish between different numeric types. Anyway going by the replies on this topic, looks like most people view toString as being good for debug purposes and that about it. Cheers Justin

Your design makes better sense (to me at least) because it is based on why you want a string from some object. Take .NET for example: it does provide very elaborate and nice formatting options based on toString() with parameters. For some types however, the default toString() gives you the name of the type itself which is in no way related to formatting an object. You learn to work with it, but I find it a bit muddled. As a last note, I think people view toString as a debug thing mostly because it is very underpowered.

There is a definite use for such a thing. But the existing toString() is much, much worse than useless. People think you can do something with it, but you can't. eg, people have asked for BigInt to support toString(). That is an over-my-dead-body.

You can definitely do something with it -- printf debugging. And if I were using BigInt, that's exactly why I'd want BigInt to have a toString.

I almost always want to print the value out in hex. And with some kind of digit separators, so that I can see how many digits it has.

 Just out of curiosity, how does someone print out the
 value of a BigInt right now?

In Tango, there's just .toHex() and .toDecimalString(). Needs proper formatting options, it's the biggest thing which isn't done. I hit one too many compiler segfaults and started patching the compiler instead <g>. But I really want a decent toString().

Given a BigInt n, you should be able to just do

writefln("%s %x", n, n); // Phobos
formatln("{0} {0:X}", n); // Tango

To solve this part of the issue, it would be enough to have toString() take a string parameter. (it would be "x" or "X" in this case).

string toString(string fmt);

But the performance would still be very poor, and that's much more difficult to solve.
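
A rough sketch of what `string toString(string fmt)` could look like, assuming a made-up `Num` struct wrapping a `ulong` as a stand-in for BigInt (this is not Tango's or Phobos's BigInt API). It only illustrates the format-string parameter; it still builds the whole result in memory, which is the performance problem mentioned above.

```d
import std.format : format;

// Hypothetical stand-in for BigInt, for illustration only.
struct Num
{
    ulong value;

    // fmt is the conversion character taken from the format string.
    string toString(string fmt) const
    {
        switch (fmt)
        {
            case "x": return format("%x", value);
            case "X": return format("%X", value);
            default:  return format("%d", value);
        }
    }
}

void main()
{
    auto n = Num(255);
    assert(n.toString("s") == "255");
    assert(n.toString("x") == "ff");
    assert(n.toString("X") == "FF");
}
```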
Nov 10 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Don:
 But the performance would still be very poor, and that's much more 
 difficult to solve.

This may help:
http://fredrik-j.blogspot.com/2008/07/making-division-in-python-faster.html
http://fredrik-j.blogspot.com/2008/07/division-sequel-with-bonus-material.html
http://bugs.python.org/issue3451

Bye,
bearophile
Nov 10 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:

 Though they may be useful, those don't look to have anything to do
 with formatting user types into strings, which is the subject at hand.

Don has said: "But the performance would still be very poor, and that's much more difficult to solve." And those links show a way to quickly convert a large multi-precision integer into a string. What is it that I am missing? Bye, bearophile
Nov 10 2009
next sibling parent Don <nospam nospam.com> writes:
bearophile wrote:
 Bill Baxter:
 
 Though they may be useful, those don't look to have anything to do
 with formatting user types into strings, which is the subject at hand.

Don has said: "But the performance would still be very poor, and that's much more difficult to solve." And those links show a way to quickly convert a large multi-precision integer into a string. What is that I am missing?

It's problem 2 from my original posts: being able to output something large (eg an xml doc) in a piece-by-piece manner.
Nov 10 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:
 Maybe it's just my ignorance of BigNum issues, but those links look to
 me to be about division and not generating string representations.  Are
 those somehow synonymous in BigInt land?

Look at the numeral() function inside here from those blog posts:
http://www.dd.chalmers.se/~frejohl/code/div.py

To convert a positive integer to string you have to keep dividing a number by 10, and accumulate the modulus as the digit, converted to ['0', '9']. When the number is zero you are done:

n = 541489
result = ""
while n:
    n, digit = divmod(n, 10)
    result = str(digit) + result # don't do this
print repr(result) # prints '541489'

But all those large divisions are slow if the number is huge. So that div.py python program shows a faster algorithm that does something smarter, to decrease the computational complexity of all that.

Bye,
bearophile
Nov 10 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:
 Well, anyway, [...]

You are welcome. Bye, bearophile
Nov 10 2009
prev sibling parent Don <nospam nospam.com> writes:
Bill Baxter wrote:
 2009/11/10 Denis Koroskin <2korden gmail.com>:
 On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam nospam.com> wrote:

 Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly
 named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative
 of
 D NG readers responding with their own idea(s) of what the semantics
 of
 "toString" are (or should be) in a language agnostic ideology.

Semantics of toString would depend on the object. I would think there are three general types of objects:

1. objects with only one sensible or one clear default string representation, like integers. Maybe even none of these exist (except strings themselves?)
2. objects that, given some formatting options or locale, have a clear string representation: floating points, dates, currency and the like.
3. objects that have no sensible default representation.

toString() would not make sense for type 3) objects, and only for type 2) objects as part of a formatting / localization package.

toString() as a debugging aid sometimes doubles as a formatter for 1) and 2) class objects, but that may be more confusing than it's worth.

Do you think it would make better sense if programming languages/their libraries separated functions/methods which are currently loosely purposed as "toString" into methods which are more specific to the types you suggest (leaving only the types/classifications and number thereof to argue about)? In my own D project, I've introduced a toDebugString method and left toString alone. There are times when I like D's default toString printing out the name of the object class. For debug purposes there are times also when I like to see a string printed out in quotes so you can tell the difference between "123" and 123. Then again, and since I'm working on a scripting language, sometimes I like to see debug output distinguish between different numeric types. Anyway going by the replies on this topic, looks like most people view toString as being good for debug purposes and that about it. Cheers Justin

Your design makes better sense (to me at least) because it is based on why you want a string from some object. Take .NET for example: it does provide very elaborate and nice formatting options based on toString() with parameters. For some types however, the default toString() gives you the name of the type itself which is in no way related to formatting an object. You learn to work with it, but I find it a bit muddled. As a last note, I think people view toString as a debug thing mostly because it is very underpowered.

There is a definite use for such a thing. But the existing toString() is much, much worse than useless. People think you can do something with it, but you can't. eg, people have asked for BigInt to support toString(). That is an over-my-dead-body.

You can definitely do something with it -- printf debugging. And if I were using BigInt, that's exactly why I'd want BigInt to have a toString.

I almost always want to print the value out in hex. And with some kind of digit separators, so that I can see how many digits it has.

 Just out of curiosity, how does someone print out the
 value of a BigInt right now?

In Tango, there's just .toHex() and .toDecimalString(). Needs proper formatting options, it's the biggest thing which isn't done. I hit one too many compiler segfaults and started patching the compiler instead <g>. But I really want a decent toString().

Given a BigInt n, you should be able to just do

writefln("%s %x", n, n); // Phobos
formatln("{0} {0:X}", n); // Tango

To solve this part of the issue, it would be enough to have toString() take a string parameter. (it would be "x" or "X" in this case).

string toString(string fmt);

But the performance would still be very poor, and that's much more difficult to solve.

Another part (i.e. memory allocation) could be solved by providing an optional buffer to the toString:

char[] toString(string format = "s" /* comes from %s which is a default qualifier */, char[] buffer = null)
{
    // operate on the buffer, possibly resizing it
    // which is safe and fast - it only allocates
    // when *really* necessary, instead of always, as now
    return buffer;
}

With Don's delegate idea, if you do have a toString with special performance concerns, then it can use its own stack-allocated buffer.

void toString(void delegate(const(char)[]) put, string format)
{
    char[512] preallocBuffer;
    foreach( ... ) {
        ...
        put(preallocBuffer[0..lenUsed]);
    }
}
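
One way the stack-buffer idea could be fleshed out, as a sketch only: `Bytes` is a made-up type that hex-dumps its contents, and the flush-when-full policy is an assumption, not something specified in the thread. The format argument is ignored here.

```d
// Hypothetical container, for illustration only.
struct Bytes
{
    const(ubyte)[] data;

    void toString(scope void delegate(const(char)[]) put, string format) const
    {
        char[512] buf;                 // stack-allocated, no GC allocation
        size_t used = 0;
        static immutable hex = "0123456789abcdef";

        foreach (b; data)
        {
            // Flush the buffer to the sink whenever it is about to overflow.
            if (used + 2 > buf.length)
            {
                put(buf[0 .. used]);
                used = 0;
            }
            buf[used++] = hex[b >> 4];
            buf[used++] = hex[b & 0xF];
        }
        if (used)
            put(buf[0 .. used]);       // flush the remainder
    }
}

void main()
{
    ubyte[] raw = [0xDE, 0xAD, 0xBE, 0xEF];
    char[] result;
    Bytes(raw).toString((const(char)[] s) { result ~= s; }, "");
    assert(result == "deadbeef");
}
```

The caller decides where the pieces go; here they are appended to a `char[]`, but the same `toString` could write straight to a file or socket without the intermediate string.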

Thanks. 'put' is so much better than 'sink'. <g>
 If the buffer is going to be passed in, then probably it should be
 passed in as a full fledged output stream object with .write() methods
 and such.  I don't want to have to worry about buffer management to
 write a toString method.  That should be encapsulated.  But it seems
 to me that Don's method offers exactly the right minimality of
 interface to allow encapsulating that management without requiring it
 to be done in a heavy-handed way.

One thing it doesn't (easily) handle is the case where an int argument gives the length of another one. (eg the "%*s" writefln format). I guess this can still be handled (very inefficiently) by converting the parameter value into a text number -- generally, though, that'd only be for direct interchangeability with a built-in type; you'd normally do such things by calling a member function on the struct. The other issue is grauzone's comment: perhaps compile-time varargs make this whole approach obsolete.
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 9:16 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:
 Maybe it's just my ignorance of BigNum issues, but those links look to
 me to be about division and not generating string representations.  Are
 those somehow synonymous in BigInt land?

 Look at the numeral() function inside here from those blog posts:
 http://www.dd.chalmers.se/~frejohl/code/div.py

 To convert a positive integer to string you have to keep dividing a number
 by 10, and accumulate the modulus as the digit, converted to ['0', '9'].
 When the number is zero you are done:

 n = 541489
 result = ""
 while n:
     n, digit = divmod(n, 10)
     result = str(digit) + result # don't do this
 print repr(result) # prints '541489'

 But all those large divisions are slow if the number is huge. So that div.py
 python program shows a faster algorithm that does something smarter, to
 decrease the computational complexity of all that.

Well, anyway, slowness of BigInt is not what Don was referring to. He was talking about the general slowness of a toString interface that forces allocating enough memory to hold the entire result, instead of being able to dole out the result piecemeal.

--bb
Nov 10 2009
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 5:27 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change toString() before
 D2 turns gold? Seems to me it could break quite some code.

 I'm hoping someone will come up with a design. Straw man:

 void toString(void delegate(const(char)[]) sink, string fmt)
 {
     // fmt holds the format string from writefln/formatln.
     // call sink() to print partial results.
 }

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.) Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?

It should take an interface.

So yet another type in object.d? Or require users to import something specific in every module that's going to use toString? --bb

I am not sure. Opinions as always are welcome.

That's why my opinion is that the delegate idea is nice. :-) But I guess toString is already defined by Object, right? So it would make sense for an interface needed by an Object method to be defined in object.d. I suppose it could be an interface defined inside the Object class itself? (Does that work? can you define interfaces inside classes?) --bb
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 3:59 AM, Justin Johansson <no spam.com> wrote:
 Lutger Wrote:

 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater around:
 are there plans (or rejections thereof) to change toString() before D2 turns
 gold? Seems to me it could break quite some code.

 I have a feeling (and I may well be wrong) that toString might be used
 in relation to associative arrays. I implemented an AA recently based upon
 a struct key (I think). Though I cannot remember the exact details I do
 remember DMD saying something about toString not implemented and
 so without thinking I gave the struct a toString and that kept DMD happy.
 Since the code was throw-away I didn't bother to investigate.

 Like I say, I cannot remember the details but others may recall some similar
 experience. For all I know it may be a case of RTFM?

Shouldn't be the case. From TFM:
"""
Classes can be used as the KeyType. For this to work, the class definition must override the following member functions of class Object:

* hash_t toHash()
* bool opEquals(Object)
* int opCmp(Object)
"""

--bb
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

 Semantics of toString would depend on the object. I would think there
 are three general types of objects:

 1. objects with only one sensible or one clear default string
    representation, like integers. Maybe even none of these exist (except
    strings themselves?)
 2. objects that, given some formatting options or locale, have a clear
    string representation: floating points, dates, currency and the like.
 3. objects that have no sensible default representation.

 toString() would not make sense for type 3) objects, and only for type 2)
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for 1)
 and 2) class objects, but that may be more confusing than it's worth.

 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely purposed
 as "toString" into methods which are more specific to the types you suggest
 (leaving only the types/classifications and number thereof to argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString printing
 out the name of the object class. For debug purposes there are times also
 when I like to see a string printed out in quotes so you can tell the
 difference between "123" and 123. Then again, and since I'm working on a
 scripting language, sometimes I like to see debug output distinguish
 between different numeric types.

 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that's about it.

 Cheers
 Justin

 Your design makes better sense (to me at least) because it is based on why
 you want a string from some object.
 Take .NET for example: it does provide very elaborate and nice formatting
 options based on toString() with parameters. For some types however, the
 default toString() gives you the name of the type itself which is in no way
 related to formatting an object. You learn to work with it, but I find it a
 bit muddled.
 As a last note, I think people view toString as a debug thing mostly
 because it is very underpowered.

There is a definite use for such a thing. But the existing toString() is much, much worse than useless. People think you can do something with it, but you can't. eg, people have asked for BigInt to support toString(). That is an over-my-dead-body.

You can definitely do something with it -- printf debugging. And if I were using BigInt, that's exactly why I'd want BigInt to have a toString. Just out of curiosity, how does someone print out the value of a BigInt right now? --bb
Nov 10 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 10 Nov 2009 07:49:11 -0500, Justin Johansson <no spam.com> wrote:

 Don Wrote:

 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater around:
 are there plans (or rejections thereof) to change toString() before D2 turns
 gold? Seems to me it could break quite some code.

 I'm hoping someone will come up with a design. Straw man:

 void toString(void delegate(const(char)[]) sink, string fmt)
 {
     // fmt holds the format string from writefln/formatln.
     // call sink() to print partial results.
 }

That's starting to look like a "serialize" method!

As it should. I should be able to print a 10000 element container without having to load a string representation of 10000 elements in memory. I'd also like to see the name toString changed to something more appropriate, like output(). And although I think a direct translation is mostly possible, emulating writefln string formatting from tango would be a burden. I don't know if there's any way around it without coming up with some complicated "formatting provider" interface/object implementation, and I don't think it's worth it. Unfortunately, I doubt Walter accepts this, it's been proposed in the past without success. -Steve
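
A sketch of the streaming behaviour described above, assuming a made-up `Bag` container. It uses the one-argument sink overload of `toString`, which `std.format` in current Phobos recognizes (this postdates the thread), so each element is handed to the sink as it is formatted rather than being collected into one big string first.

```d
import std.format : format, formattedWrite;

// Hypothetical container, for illustration only.
struct Bag
{
    int[] items;

    // Each element is formatted and pushed to the sink as it is visited.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("[");
        foreach (i, item; items)
        {
            if (i)
                sink(", ");
            formattedWrite(sink, "%d", item);
        }
        sink("]");
    }
}

void main()
{
    auto bag = Bag([1, 2, 3]);
    // format() collects the pieces here; writing to a file or stdout
    // would receive the same pieces without a combined string.
    assert(format("%s", bag) == "[1, 2, 3]");
}
```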
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 6:11 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Don:
 But the performance would still be very poor, and that's much more
 difficult to solve.

This may help: http://fredrik-j.blogspot.com/2008/07/making-division-in-python-faster.html http://fredrik-j.blogspot.com/2008/07/division-sequel-with-bonus-material.html http://bugs.python.org/issue3451

Though they may be useful, those don't look to have anything to do with formatting user types into strings, which is the subject at hand. --bb
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 4:30 AM, Don <nospam nospam.com> wrote:
 Just out of curiosity, how does someone print out the
 value of a BigInt right now?

In Tango, there's just .toHex() and .toDecimalString(). Needs proper formatting options, it's the biggest thing which isn't done. I hit one too many compiler segfaults and started patching the compiler instead <g>. But I really want a decent toString().

Ah, ok. So there is something, it's just not called "toString". --bb
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 4:40 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt) {
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

That looks pretty good, actually. I guess I would like to see plain no-arg toString() still supported. A default toString() could be implemented in terms of the fancy one as:

string toString() {
    char[] buf;
    toString( (const(char)[] s) { buf ~= s; }, "" );
    return assumeUnique(buf);
}

could be a mixin in a library I suppose.

I think I would like to see the format strings not necessarily tied to writefln's particular format.

--bb
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 7:04 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:

 Though they may be useful, those don't look to have anything to do
 with formatting user types into strings, which is the subject at hand.

Don has said: "But the performance would still be very poor, and that's much more difficult to solve." And those links show a way to quickly convert a large multi-precision integer into a string. What is it that I am missing?

Maybe it's just my ignorance of BigNum issues, but those links look to me to be about division and not generating string representations. Are those somehow synonymous in BigInt land? --bb
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Nov 10, 2009 at 7:29 AM, Don <nospam nospam.com> wrote:
 Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 4:40 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt) {
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

That looks pretty good, actually. I guess I would like to see plain no-arg toString() still supported.

The thing is, the toString() function is essentially a virtual function present in every struct. Each one of those functions needs a very strong justification to exist.

Structs can't have virtual functions... so what do you mean?
 A default toString() could be implemented in terms of the fancy one as:

 string toString() {
     char[] buf;
     toString( (const(char)[] s) { buf ~= s; }, "" );
     return assumeUnique(buf);
 }

 could be a mixin in a library I suppose.

More for the benefit of consumers, or producers?

Consumers. I was just thinking it would be a little annoying to have to reproduce the above 3-line snippet of code every time I want to get the string version of an object. But I guess such needs can be adequately served by std.string.format or sformat. So scratch that, no old-style toString() needed.
 Because
 void toString(void delegate(const(char)[]) sink, string fmt) {
    sink("xxx");
 }
 isn't much more complex than:
 string toString()
 {
    return "xxx";
 }
 other than the signature.

Yeh, for authors of toString methods it's fine. Well, a different way to write delegates would be nice, but that's a different discussion.
 I think I would like to see the format strings not necessarily tied to
 writefln's particular format.

I think the format strings are actually pretty similar, Tango vs writefln. There might be enough common ground. I think the Tango format is a slight superset of the writefln one.

Pretty similar, maybe, but I'd be surprised if they just happened to be identical without any attempt at compatibility having been made. --bb
Nov 10 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam nospam.com> wrote:

 Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

sorry. Semantics of toString would depend on the object, I would think there are three general types of objects:

1. objects with only one sensible or one clear default string representation, like integers. Maybe even none of these exist (except strings themselves?)

2. objects that, given some formatting options or locale, have a clear string representation: floating points, dates, currency and the like.

3. objects that have no sensible default representation.

toString() would not make sense for 3) type objects and only for 2) type objects as part of a formatting / localization package.

toString() as a debugging aid sometimes doubles as a formatter for 1) and 2) class objects, but that may be more confusing than it's worth.

Do you think it would make better sense if programming languages/their libraries separated functions/methods which are currently loosely purposed as "toString" into methods which are more specific to the types you suggest (leaving only the types/classifications and number thereof to argue about)?

In my own D project, I've introduced a toDebugString method and left toString alone. There are times when I like D's default toString printing out the name of the object class. For debug purposes there are times also when I like to see a string printed out in quotes so you can tell the difference between "123" and 123. Then again, and since I'm working on a scripting language, sometimes I like to see debug output distinguish between different numeric types.

Anyway going by the replies on this topic, looks like most people view toString as being good for debug purposes and that's about it.

Cheers
Justin

on why you want a string from some object. Take .NET for example: it does provide very elaborate and nice formatting options based on toString() with parameters. For some types however, the default toString() gives you the name of the type itself which is in no way related to formatting an object. You learn to work with it, but I find it a bit muddled.

As a last note, I think people view toString as a debug thing mostly because it is very underpowered.

There is a definite use for such a thing. But the existing toString() is much, much worse than useless. People think you can do something with it, but you can't. eg, people have asked for BigInt to support toString(). That is an over-my-dead-body.

You can definitely do something with it -- printf debugging. And if I were using BigInt, that's exactly why I'd want BigInt to have a toString.

I almost always want to print the value out in hex. And with some kind of digit separators, so that I can see how many digits it has.

 Just out of curiosity, how does someone print out the
 value of a BigInt right now?

In Tango, there's just .toHex() and .toDecimalString(). Needs proper formatting options, it's the biggest thing which isn't done. I hit one too many compiler segfaults and started patching the compiler instead <g>. But I really want a decent toString().

Given a BigInt n, you should be able to just do

writefln("%s %x", n, n);  // Phobos
formatln("{0} {0:X}", n); // Tango

To solve this part of the issue, it would be enough to have toString() take a string parameter. (it would be "x" or "X" in this case).

string toString(string fmt);

But the performance would still be very poor, and that's much more difficult to solve.

Yes, it would solve half of the toString problems. Another part (i.e. memory allocation) could be solved by providing an optional buffer to the toString:

char[] toString(string format = "s" /* comes from %s which is a default qualifier */, char[] buffer = null)
{
    // operate on the buffer, possibly resizing it
    // which is safe and fast - it only allocates
    // when *really* necessary, instead of always, as now
    return buffer;
}

You can use it almost the same way you used it before:

string s = assumeUnique(someObject.toString()); // because we return a mutable string now

Optimization example:

int sprintf(string format, ...)
{
    char[512] preallocatedBuffer;
    char[] buffer = preallocatedBuffer[]; // buffer may grow, but
                                          // initially points to a preallocatedBuffer
    char[] storage = buffer[]; // storage for a current element
    ...
    for (...) { // iterate over qualifiers (and arguments)
        string currentQualifier = format[i..j];
        auto currentArgument = argsTuple[n];
        char[] result = currentArgument.toString(currentQualifier, storage);
        if (result.ptr is storage.ptr) {
            // okay, string was constructed in-place
            storage = storage[result.length..$];
        } else {
            // storage didn't have enough space for the whole
            // string (a reallocation occurred)
            int offset = buffer.length - storage.length;
            // increase the capacity
            buffer.length *= 2;
            // append our string to the buffer
            buffer[offset..offset+storage.length] = storage[];
            // renew the temporary storage
            storage = preallocatedBuffer[];
        }
    }
    ...
}

Another example:

class Array(T)
{
    // ...
    private T[] elements;

    char[] toString(string format, char[] buffer)
    {
        auto builder = StringBuilder(buffer); // reallocates when no space left
        builder.append("[");
        foreach (i, o; elements) {
            if (i > 0) builder.append(", "); // separator
            buffer = builder.getBuffer()[builder.length..$];
            char[] result = o.toString(format, buffer);
            if (result.ptr is buffer.ptr) {
                // no reallocation
                builder.length += result.length; // without copying
            } else {
                builder.append(result);
            }
        }
        builder.append("]");
        return builder.toString();
    }
}

auto array = new Array!(int);
array ~= [0, 1, 2, 3, 4];
assert(array.toString() == "[0, 1, 2, 3, 4]");

It's not very easy to take advantage of, but it's usable the old way (well, almost). Any ideas?
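The core of the buffer idea above can be sketched in a few lines: write into the caller's buffer when it is big enough, allocate only otherwise, and let the caller detect which happened by comparing pointers. The type Id is hypothetical, and the format parameter from the proposal is left out to keep the sketch short.

```d
// Hypothetical value type used only for this sketch.
struct Id
{
    uint value;

    // Formats `value` in decimal. Uses `buffer` when it is large enough,
    // otherwise allocates a new array of exactly the needed size.
    char[] toString(char[] buffer = null) const
    {
        char[10] digits;              // uint.max has 10 decimal digits
        size_t n = digits.length;
        uint v = value;
        do
        {
            digits[--n] = cast(char)('0' + v % 10);
            v /= 10;
        } while (v != 0);

        auto text = digits[n .. $];
        if (buffer.length < text.length)
            buffer = new char[text.length];   // allocate only when needed
        buffer[0 .. text.length] = text[];
        return buffer[0 .. text.length];
    }
}
```

With a stack array passed in, result.ptr is the same as the array's ptr and nothing is heap-allocated; with no buffer (or one that is too small) the result points at fresh memory, which is the check the sprintf example relies on.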
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
2009/11/10 Denis Koroskin <2korden gmail.com>:
 On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam nospam.com> wrote:

 Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

 Semantics of toString would depend on the object, I would think there are
 three general types of objects:

 1. objects with only one sensible or one clear default string
 representation, like integers. Maybe even none of these exist (except
 strings themselves?)

 2. objects that, given some formatting options or locale, have a clear
 string representation: floating points, dates, currency and the like.

 3. objects that have no sensible default representation.

 toString() would not make sense for 3) type objects and only for 2) type
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for 1) and
 2) class objects, but that may be more confusing than it's worth.

 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString printing
 out the name of the object class. For debug purposes there are times also
 when I like to see a string printed out in quotes so you can tell the
 difference between "123" and 123. Then again, and since I'm working on a
 scripting language, sometimes I like to see debug output distinguish
 between different numeric types.

 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that's about it.

 Cheers
 Justin

 on why you want a string from some object.
 Take .NET for example: it does provide very elaborate and nice formatting
 options based on toString() with parameters. For some types however, the
 default toString() gives you the name of the type itself which is in no way
 related to formatting an object. You learn to work with it, but I find it a
 bit muddled.
 As a last note, I think people view toString as a debug thing mostly
 because it is very underpowered.

There is a definite use for such a thing. But the existing toString() is much, much worse than useless. People think you can do something with it, but you can't. eg, people have asked for BigInt to support toString(). That is an over-my-dead-body.

 You can definitely do something with it -- printf debugging. And if I
 were using BigInt, that's exactly why I'd want BigInt to have a
 toString.

I almost always want to print the value out in hex. And with some kind of digit separators, so that I can see how many digits it has.

 Just out of curiosity, how does someone print out the
 value of a BigInt right now?

In Tango, there's just .toHex() and .toDecimalString(). Needs proper formatting options, it's the biggest thing which isn't done. I hit one too many compiler segfaults and started patching the compiler instead <g>. But I really want a decent toString().

 Given a BigInt n, you should be able to just do

 writefln("%s %x", n, n);  // Phobos
 formatln("{0} {0:X}", n); // Tango

 To solve this part of the issue, it would be enough to have toString()
 take a string parameter. (it would be "x" or "X" in this case).

 string toString(string fmt);
 But the performance would still be very poor, and that's much more
 difficult to solve.

Yes, it would solve half of the toString problems. Another part (i.e. memory allocation) could be solved by providing an optional buffer to the toString:

char[] toString(string format = "s" /* comes from %s which is a default qualifier */, char[] buffer = null)
{
    // operate on the buffer, possibly resizing it
    // which is safe and fast - it only allocates
    // when *really* necessary, instead of always, as now
    return buffer;
}

With Don's delegate idea, if you do have a toString with special performance concerns, then it can use its own stack-allocated buffer.

void toString(void delegate(const(char)[]) put, string format) {
    char[512] preallocBuffer;
    foreach( ... ) {
        ...
        put(preallocBuffer[0..lenUsed]);
    }
}

Which in some cases (like writefln) should be almost as efficient as passing a buffer in. It avoids willy-nilly unbounded allocations anyway.

But the nice thing is that it's easy to upgrade to. You can keep it simple and leave toString pretty much like you had it before, just changing the signature and the return.

void toString(void delegate(const(char)[]) put, string format) {
    char[] ret;
    foreach( ... ) {
        ...
        ret ~= "...";
    }
    put(ret); // only this line needed to change for Don-style toString
}

And to get the string you just need to call format:

assert(std.string.format(thing) == "blah");

If the buffer is going to be passed in, then probably it should be passed in as a full fledged output stream object with .write() methods and such. I don't want to have to worry about buffer management to write a toString method. That should be encapsulated. But it seems to me that Don's method offers exactly the right minimality of interface to allow encapsulating that management without requiring it to be done in a heavy-handed way.

--bb
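As a concrete, compilable version of the first outline above, here is a small sketch in which toString owns a fixed stack buffer and hands it to the delegate in chunks. The type Stars and the 8-byte chunk size are assumptions chosen only to make the chunking visible.

```d
// Hypothetical type used only for this sketch.
struct Stars
{
    size_t count;

    // Emits `count` asterisks through `put`, at most 8 at a time,
    // from a stack buffer; toString itself never touches the heap.
    void toString(scope void delegate(const(char)[]) put, string format) const
    {
        char[8] chunk = '*';          // every element initialised to '*'
        size_t left = count;
        while (left > 0)
        {
            const n = left < chunk.length ? left : chunk.length;
            put(chunk[0 .. n]);
            left -= n;
        }
    }
}
```

Whether any allocation happens at all is then decided by the delegate the caller supplies, not by the type being printed.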
Nov 10 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

Straw man:

void toString(void delegate(const(char)[]) sink, string fmt) {
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.) Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?
Nov 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

I'm hoping someone will come up with a design. Straw man:

void toString(void delegate(const(char)[]) sink, string fmt) {
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.)

 Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?

It should take an interface.

So yet another type in object.d? Or require users to import something specific in every module that's going to use toString? --bb
Nov 10 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 11 Nov 2009 04:27:45 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

Straw man:

void toString(void delegate(const(char)[]) sink, string fmt) {
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.)

Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?

It should take an interface.

So yet another type in object.d? Or require users to import something specific in every module that's going to use toString? --bb

I am not sure. Opinions as always are welcome. Andrei

Some ranges may be polymorphic, so having a base interface hierarchy in Phobos would be useful anyway. BTW, save() is already implemented and used throughout Phobos under a different name - opSlice (i.e. auto copy = range[]). It's a bikeshed discussion, but why save() and not opSlice(), or even clone()?
Nov 11 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 11 Nov 2009 18:50:47 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 04:27:45 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: are there plans (or rejections thereof) to change toString() before D2 turns gold? Seems to me it could break quite some code.

Straw man:

void toString(void delegate(const(char)[]) sink, string fmt) {
    // fmt holds the format string from writefln/formatln.
    // call sink() to print partial results.
}

I think the best option for toString is to take an output range and write to it. (The sink is a simplified range.)

Andrei

It means toString() must be either a template, or accept an abstract InputRange interface?

It should take an interface.

So yet another type in object.d? Or require users to import something specific in every module that's going to use toString? --bb

I am not sure. Opinions as always are welcome.

Andrei

Some ranges may be polymorphic, so having base interface hierarchy in Phobos would be useful anyway. BTW, save() is already implemented and used throughout the Phobos under a different name - opSlice (i.e. auto copy = range[]). It's a bikeshed discussion, but why save() and not opSlice(), or even clone()?

It can't be clone() because it doesn't clone. For example say you have a T[] - one would expect clone() actually copies the content. But using opSlice is a good idea. Andrei

Well, a range doesn't own any of the contents it covers, so a deep copy is impossible. Yet there is also the .dup array property, which pretends to be a standard way of creating instance copies.
Nov 11 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

Bad idea...

A range only makes sense as a struct, not an interface/object. I'll tell you why: performance.

Ranges are special in two respects:

1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

2. They are useful for passing to std.algorithm. But std.algorithm is template-interfaced. No need for using interfaces because the correct instantiation will be chosen.

If you are intending to add a streaming module that uses ranges, would it not be templated for the range type as std.algorithm is? If not, the next logical choice is a delegate, which requires no vtable lookup. Using an interface is just asking for a performance penalty for not much gain.

Here's what I mean by not much gain: I would expect a stream range that does output to have a method in it for outputting a buffer (I'd laugh at you if you wanted to define a stream range that outputs a character at a time). So the difference between:

x.toString(outputRange, format)

and

x.toString(&outputRange.sink, format)

is pretty darn minimal, and if outputRange is an interface or object, this saves a virtual call per buffer write. Plus the second form is more universal, you can pass any delegate, and not have to use a range type to wrap a delegate.

Don't fall into the "OOP newbie" trap -- where just because you've found a new concept that is amazing, you want to use it for everything. I say this because I've seen in the past where someone discovers the power of OOP and then wants to use it for everything, when in some cases, it's overkill. Just look at some Java "classes"...

From another thread:
 Walter does not feel strongly about Phobos.

Huh? I feel like this sentence doesn't make sense, so maybe there's a typo. -Steve
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

Bad idea... A range only makes sense as a struct, not an interface/object. I'll tell you why: performance. Ranges are special in two respects: 1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

Oops, I meant 3 virtual functions -- front, popNext, and empty. -Steve
Nov 12 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

Bad idea... A range only makes sense as a struct, not an interface/object. I'll tell you why: performance. Ranges are special in two respects: 1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

Oops, I meant 3 virtual functions -- front, popNext, and empty. -Steve

Output range has only one method: put. I'm not sure, but I don't think there is a performance difference between calling a virtual function through an interface and invoking a delegate. But I agree passing a delegate is more generic. You can substitute an output range with a delegate (obj.toString(&range.put, fmt)) without any performance hit, but not vice versa (obj.toString(new DelegateWrapRange(&myput), fmt) implies an additional allocation and additional indirection per range.put call).
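A small sketch of the substitution described above: any object with a put method can be fed to a delegate-taking toString by passing &range.put, with no wrapper type. StringCollector and Greeting are hypothetical names used only here.

```d
// Hypothetical minimal output range: a struct with a single put method.
struct StringCollector
{
    char[] data;

    void put(const(char)[] s)
    {
        data ~= s;
    }
}

// Hypothetical type whose toString uses the delegate signature from the thread.
struct Greeting
{
    void toString(scope void delegate(const(char)[]) sink, string fmt) const
    {
        sink("hello");
        sink(", world");
    }
}
```

Given a StringCollector named range, the call Greeting().toString(&range.put, "") leaves the pieces in range.data. Going the other way, from an arbitrary delegate to a range object, needs a wrapper such as the DelegateWrapRange mentioned above.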
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 08:56:06 -0500, Denis Koroskin <2korden gmail.com>  
wrote:

 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

Bad idea... A range only makes sense as a struct, not an interface/object. I'll tell you why: performance. Ranges are special in two respects: 1. They are foreachable. I think everyone agrees that calling 2 interface functions per loop iteration is much lower performing than using opApply, which calls one delegate function per loop. My recommendation -- use opApply when dealing with polymorphism. I don't think there's a way around this.

Oops, I meant 3 virtual functions -- front, popNext, and empty. -Steve

Output range has only one method: put.

I was referring to range's ability to interact with foreach. An output range wouldn't qualify as a foreachable entity anyways (and rightfully so). Just covering all the bases.
 I'm not sure, but I don't think there is a performance difference  
 between calling a virtual function through an interface and invoking a  
 delegate.

Yes, there is:

A delegate is equivalent to a struct member function call (load data pointer (i.e. this), push args, call function).

A virtual function uses a vtable to look up the function address, and then is equivalent to a struct member call.

An interface function call is equivalent to a virtual call with the added penalty that you might have to adjust the 'this' pointer before calling.
 But I agree passing a delegate is more generic. You can substitute an  
 output range with a delegate (obj.toString(&range.put, fmt)) without any  
 performance hit, but not vice versa (obj.toString(new  
 DelegateWrapRange(&myput), fmt) implies an additional allocation and  
 additional indirection per range.put call).

You can use scope classes to avoid the allocation, but you can't get around the virtual/interface call penalty. But even if a range is a struct, it's simply a different form of delegate, one in which you undoubtedly call only one member function. Might as well use a delegate to allow the most usefulness. -Steve
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

A range only makes sense as a struct, not an interface/object. I'll tell you why: performance.

You are right. If range interfaces accommodate block transfers, this problem may be addressed. I agree that one virtual call per character output would be overkill. (I seem to recall it's one of the reasons why C++'s iostreams are so inefficient.)

IIRC, I don't think C++ iostreams use polymorphism, and I don't think they use the "one char at a time" method.
 Ranges are special in two respects:
  1. They are foreachable.  I think everyone agrees that calling 2  
 interface functions per loop iteration is much lower performing than  
 using opApply, which calls one delegate function per loop.  My  
 recommendation -- use opApply when dealing with polymorphism.  I don't  
 think there's a way around this.

 2. They are useful for passing to std.algorithm.  But std.algorithm is  
 template-interfaced.  No need for using interfaces because the correct  
 instantiation will be chosen.
  If you are intending to add a streaming module that uses ranges, would  
 it not be templated for the range type as std.algorithm is?  If not,  
 the next logical choice is a delegate, which requires no vtable  
 lookup.  Using an interface is just asking for a performance penalty  
 for not much gain.

I think the cost of calling through the delegate is roughly the same as a virtual call.

Not exactly. I think you are right that struct member calls are faster than delegates, but only slightly. The difference being that a struct member call does not need to load the function address from the stack, it can hard-code the address directly. However, virtual calls have to be lower performing because you are doing two indirections, one to the class vtable, then one to the function address itself. Plus those two locations are most likely located on the heap, not the stack, and so may not be in the cache.
 x.toString(outputRange, format)
  and
  x.toString(&outputRange.sink, format)
  is pretty darn minimal, and if outputRange is an interface or object,  
 this saves a virtual call per buffer write.  Plus the second form is  
 more universal, you can pass any delegate, and not have to use a range  
 type to wrap a delegate.
  Don't fall into the "OOP newbie" trap -- where just because you've  
 found a new concept that is amazing, you want to use it for  
 everything.  I say this because I've seen in the past where someone  
 discovers the power of OOP and then wants to use it for everything,  
 when in some cases, it's overkill.  Just look at some Java "classes"...

There is no need to worry that I'll fall into at least that particular OOP newbie trap. What I think we should do is define a text output interface that allows writing individual characters of all widths and also arrays of all widths. That would be a universal means for text output.

interface TextOutputStream
{
    void put(dchar); // also accommodates char and wchar
    void put(in char[]);
    void put(in wchar[]);
    void put(in dchar[]);
}

The toString method (re-baptized as toStream) would take such an interface. Better ideas are always welcome. Perhaps I'm falling into another OOP newbie trap! (Seriously!)

This still fits within a single function, which takes one of the 3 widths (pick one, they can all be translated to each other):

void put(in char[] str)
{
    foreach(dchar dc; str)
    {
        put((&dc)[0..1]);
    }
}

Note that you probably want to build a buffer of dchars instead of putting one at a time, but you get the idea. Also, putting a single character is probably pretty uncommon, but can be handled in a similar fashion.

That being said, one other point that makes all this moot is -- toString is for debugging, not for general purpose. We don't need to support everything that is possible. You should be able to say "hey, toString only accepts char[], deal." Of course, you could substitute wchar[] or dchar[], but I think by far char[] is the most common (and is the default type for string literals).

That's not to say there is no reason to have a TextOutputStream object. Such a thing is perfectly usable for a toString which takes a char[] delegate sink, just pass &put. In fact, there could be a default toString function in Object that does just that:

class Object
{
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
}

Of course, then TextOutputStream has to be druntime-accessible, so maybe it's not a great idea... But there are ways around that:

abstract class BaseTextOutputStream : TextOutputStream
{
    void format(const Object o, string fmt)
    {
        o.toString(&this.put, fmt);
    }
}
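The "build a buffer of dchars" note could look roughly like the sketch below, written in present-day D2. The function name putTranscoded and the buffer size of 64 are assumptions for illustration, not anything proposed in the thread. The point is that the sink is called once per block rather than once per character.

```d
// Hypothetical helper: decodes UTF-8 and hands code points to `sink`
// in blocks, using a stack-allocated scratch buffer.
void putTranscoded(scope const(char)[] str,
                   scope void delegate(const(dchar)[]) sink)
{
    dchar[64] buf;            // scratch space on the stack
    size_t n = 0;
    foreach (dchar dc; str)   // foreach decodes UTF-8 into code points
    {
        buf[n++] = dc;
        if (n == buf.length)  // buffer full: flush one block
        {
            sink(buf[0 .. n]);
            n = 0;
        }
    }
    if (n > 0)
        sink(buf[0 .. n]);    // flush the remainder
}

void main()
{
    dchar[] result;
    size_t calls;
    putTranscoded("hello, world", (const(dchar)[] block) {
        result ~= block;
        ++calls;
    });
    assert(result == "hello, world"d);
    assert(calls == 1);       // 12 code points fit in a single block
}
```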
  From another thread:
 Walter does not feel strongly about Phobos.

typo.

I meant to say, Walter does not want to do library design.

I'm trying to remember but I thought he did care about this particular issue, but it may be muddled in my memory. Also note that toString has special status from the compiler in regards to structs (that hack with the xtoString function in the struct's typeinfo), so it doesn't just affect library code. -Steve
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 11:14:56 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the cost of calling through the delegate is roughly the same as  
 a virtual call.

Not exactly. I think you are right that struct member calls are faster than delegates, but only slightly. The difference being that a struct member call does not need to load the function address from the stack, it can hard-code the address directly. However, virtual calls have to be lower performing because you are doing two indirections, one to the class vtable, then one to the function address itself. Plus those two locations are most likely located on the heap, not the stack, and so may not be in the cache.

Some rudimentary attempts at benchmarking:

testme.d:

struct S
{
    void foo(int x){}
}

interface I
{
    void foo(int x);
}

class C : I
{
    void foo(int x){}
}

const loopcount = 10_000_000_000L;

void doVirtual()
{
    C c = new C;
    for(auto x = loopcount; x > 0; x--)
        c.foo(x);
}

void doInterface()
{
    I i = new C;
    for(auto x = loopcount; x > 0; x--)
        i.foo(x);
}

void doDelegate()
{
    auto d = new C;
    auto dg = &d.foo;
    for(auto x = loopcount; x > 0; x--)
        dg(x);
}

void doStruct()
{
    S s;
    for(auto x = loopcount; x > 0; x--)
        s.foo(x);
}

void main(char[][] args)
{
    switch(args[1])
    {
    case "virtual":
        doVirtual();
        break;
    case "interface":
        doInterface();
        break;
    case "struct":
        doStruct();
        break;
    case "delegate":
        doDelegate();
        break;
    }
}

[steves steveslaptop testd]$ time ./testme interface

real    1m18.152s
user    1m16.638s
sys     0m0.015s

[steves steveslaptop testd]$ time ./testme virtual

real    1m11.146s
user    1m10.497s
sys     0m0.014s

[steves steveslaptop testd]$ time ./testme struct

real    1m5.828s
user    1m5.249s
sys     0m0.011s

[steves steveslaptop testd]$ time ./testme delegate

real    1m10.464s
user    1m9.856s
sys     0m0.010s

According to this, delegates are slightly faster than virtual calls, but not by much. By far a direct call is faster, but I was surprised at how little overhead virtual calls add in relation to the loop counter. I had to use 10 billion loops or else the difference was undetectable.

I used dmd 1.046 -release -O (the -release is needed to get rid of the class method checking the invariant every call).

The relative assembly for calling a virtual method is:

mov     ECX,[EBX]
mov     EAX,EBX
push    dword ptr -8[EBP]
call    dword ptr 014h[ECX]

and the assembly for calling a delegate is:

push    dword ptr -8[EBP]
mov     EAX,-010h[EBP]
call    EBX

-Steve
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 11:46:48 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

A range only makes sense as a struct, not an interface/object. I'll tell you why: performance.

You are right. If range interfaces accommodate block transfers, this problem may be addressed. I agree that one virtual call per character output would be overkill. (I seem to recall it's one of the reasons why C++'s iostreams are so inefficient.)


Oh yes they do. (Did you even google?) Virtual multiple inheritance, the works. http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/

From my C++ book, it appears to only use virtual inheritance. I don't know enough about virtual inheritance to know how that changes function calls. As far as virtual functions, only the destructor is virtual, so there is no issue there.
  void put(in char[] str)
 {
   foreach(dchar dc; str)
   {
      put((&dc)[0..1]);
   }
 }
  Note that you probably want to build a buffer of dchars instead of  
 putting one at a time, but you get the idea.

I don't get the idea. I'm seeing one virtual call per character.

You missed the note. I didn't implement it, but you could easily implement a stack-allocated buffer to cache the conversions, passing multiple converted code-points at once. But I don't think it's even worth discussing per my other points.
 That being said, one other point that makes all this moot is --  
 toString is for debugging, not for general purpose.  We don't need to  
 support everything that is possible.  You should be able to say "hey,  
 toString only accepts char[], deal."  Of course, you could substitute  
 wchar[] or dchar[], but I think by far char[] is the most common (and  
 is the default type for string literals).

I was hoping we could elevate the usefulness of toString a bit.

Whatever kind of data the output stream gets, it's going to convert it to the format it wants anyways (as for stdout, I think that would be utf8), so the only benefit is if you have data stored in a different width that you want to output. Calling a conversion function in that case I think is reasonable enough, and saves the output stream from having to convert/deal with it. In other words, I don't think it's going to be that common a case where you need anything other than utf8 output, so I don't think the cost of creating an interface, making virtual calls, disallowing simple delegate passing etc. is worth the convenience *just in case* you have data stored as wchar[] you want to output.
 That's not to say there is no reason to have a TextOutputStream  
 object.  Such a thing is perfectly usable for a toString which takes a  
 char[] delegate sink, just pass &put.  In fact, there could be a  
 default toString function in Object that does just that:
  class Object
 {
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
 }

I'd agree with the delegate idea if we established that UTF-8 is favored compared to all other formats.

D seems to favor UTF8 -- it is the default type for string literals. I don't think I've ever used dchar, and I usually only use wchar to talk to Win32 functions when required. The question I'd ask is -- how common is it where the versions other than char[] would be more convenient? -Steve
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 12:38:00 -0500, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
  By far a direct call is faster, but I was surprised at how
 little overhead virtual calls add in relation to the loop counter.  I  
 had
 to use 10 billion loops or else the difference was undetectable.
 I used dmd 1.046 -release -O (the -release is needed to get rid of the
 class method checking the invariant every call).
 The relative assembly for calling a virtual method is:
 mov	ECX,[EBX]
 mov	EAX,EBX
 push	dword ptr -8[EBP]
 call	dword ptr 014h[ECX]
 and the assembly for calling a delegate is:
 push	dword ptr -8[EBP]
 mov	EAX,-010h[EBP]
 call	EBX
 -Steve

Your benchmarks don't show that the direct call is much faster. You had inlining disabled. Was this intentional? If so, it proves my point that most of the overhead from virtual calls comes from the fact that they can't usually be inlined, not because they're virtual.

The direct call was 5 seconds faster. Divide by 10 billion and you get a small but present amount. Inlining makes the struct member function call disappear (b/c foo does nothing!), so it's not really a relevant benchmark. I did the "struct" version as a baseline. Consider that the struct version is the cost of doing the loop increments, pushing the 'this' pointer and argument, and calling the function. Any difference from that is the overhead of virtual/delegate/interface calls. Inlining is not possible with delegates (yet), so it's not really important for this argument. -Steve
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  I  
 don't know enough about virtual inheritance to know how that changes  
 function calls.
  As far as virtual functions, only the destructor is virtual, so there  
 is no issue there.

You're right, but there is an issue because as far as I can recall these functions' implementation do end up calling a virtual function per char; that might be streambuf.overflow. I'm not keen on investigating this any further, but I'd be grateful if you shared any related knowledge.

Yep, you are right. It appears the reason they do this is so the conversion to the appropriate width can be done per character (and is a no-op for char).
 At the end of the day, there seem to be violent agreement that we don't  
 want one virtual call per character or one delegate call per character.

After running my tests, it appears the virtual call vs. delegate is so negligible, and the virtual call vs. direct call is only slightly less negligible, I think the virtualness may not matter. However, I think avoiding one *call* per character is a worthy goal. This doesn't mean I change my mind :) I still think there is little benefit to having to conjure up an entire object just to convert something to a string vs. writing a simple inner function. One way to find out is to support only char[], and see who complains :) It'd be much easier to go from supporting char[] to supporting all the widths than going from supporting all to just one. -Steve
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  I  
 don't know enough about virtual inheritance to know how that changes  
 function calls.
  As far as virtual functions, only the destructor is virtual, so  
 there is no issue there.

You're right, but there is an issue because as far as I can recall these functions' implementation do end up calling a virtual function per char; that might be streambuf.overflow. I'm not keen on investigating this any further, but I'd be grateful if you shared any related knowledge.

Yep, you are right. It appears the reason they do this is so the conversion to the appropriate width can be done per character (and is a no-op for char).
 At the end of the day, there seem to be violent agreement that we  
 don't want one virtual call per character or one delegate call per  
 character.

After running my tests, it appears the virtual call vs. delegate is so negligible, and the virtual call vs. direct call is only slightly less negligible, I think the virtualness may not matter. However, I think avoiding one *call* per character is a worthy goal. This doesn't mean I change my mind :) I still think there is little benefit to having to conjure up an entire object just to convert something to a string vs. writing a simple inner function. One way to find out is to support only char[], and see who complains :) It'd be much easier to go from supporting char[] to supporting all the widths than going from supporting all to just one.

One problem I just realized is that, if we e.g. offer only put(in char[]) or a delegate to that effect, we make it impossible to output one character efficiently. The (&c)[0 .. 1] trick will not work in safe mode. You'd have to allocate a one-element array dynamically.

char[1] buf;
buf[0] = c;
put(buf);

Although it would be a useful feature to be able to convert a value type to an array of one element reference, especially since that should be as safe as taking a slice of a static array.

Another solution, although I'm unaware of the added costs:

void toString(void delegate(in char[]...) put, string fmt);
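The small-stack-buffer approach also extends to a single code point of any width. The sketch below is written in present-day D2 and uses std.utf.encode; the helper name putChar is an assumption for illustration. It makes no claim about what SafeD accepts, which is the open question in this exchange.

```d
import std.utf : encode;

// Hypothetical helper: sends one code point through a char[] sink
// without allocating on the heap.
void putChar(scope void delegate(const(char)[]) sink, dchar c)
{
    char[4] buf;                    // a code point needs at most 4 UTF-8 code units
    immutable len = encode(buf, c); // number of code units written
    sink(buf[0 .. len]);
}

void main()
{
    char[] result;
    auto sink = (const(char)[] s) { result ~= s; };
    putChar(sink, 'A');
    putChar(sink, '\u00E9');        // encodes to two code units
    assert(result == "A\u00E9");
    assert(result.length == 3);
}
```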
 Also, many OSs adopted UTF-16 as their standard format. It may be wise  
 to design for compatibility.

So you want toString's to look like this?

version(utf16isdefault)
{
    textobj.put("Array: "w);
    ...
}
else
{
    textobj.put("Array: ");
    ...
}

-Steve
Nov 12 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 16:19:39 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  I  
 don't know enough about virtual inheritance to know how that  
 changes function calls.
  As far as virtual functions, only the destructor is virtual, so  
 there is no issue there.

You're right, but there is an issue because as far as I can recall these functions' implementation do end up calling a virtual function per char; that might be streambuf.overflow. I'm not keen on investigating this any further, but I'd be grateful if you shared any related knowledge.

Yep, you are right. It appears the reason they do this is so the conversion to the appropriate width can be done per character (and is a no-op for char).
 At the end of the day, there seem to be violent agreement that we  
 don't want one virtual call per character or one delegate call per  
 character.

After running my tests, it appears the virtual call vs. delegate is so negligible, and the virtual call vs. direct call is only slightly less negligible, I think the virtualness may not matter. However, I think avoiding one *call* per character is a worthy goal. This doesn't mean I change my mind :) I still think there is little benefit to having to conjure up an entire object just to convert something to a string vs. writing a simple inner function. One way to find out is to support only char[], and see who complains :) It'd be much easier to go from supporting char[] to supporting all the widths than going from supporting all to just one.

One problem I just realized is that, if we e.g. offer only put(in char[]) or a delegate to that effect, we make it impossible to output one character efficiently. The (&c)[0 .. 1] trick will not work in safe mode. You'd have to allocate a one-element array dynamically.

char[1] buf; buf[0] = c; put(buf);

This would not compile in SafeD.

:O Why not? I would expect that using a local buffer would be the main way for converting non-string things to strings, or to avoid calling the delegate/vfunction lots of times. i.e. if I want to output an integer i:

if(i == 0)
    put("0");
else
{
    // assumes i > 0; a sign would need separate handling
    char[20] buf;
    int idx = buf.length - 1;
    while(i != 0)
    {
        buf[idx] = cast(char)('0' + i % 10); // store the digit character, not its numeric value
        --idx;
        i /= 10;
    }
    put(buf[idx + 1..$]); // no compily in SafeD???
}

Do I have to allocate a heap buffer in SafeD?
 Also, many OSs adopted UTF-16 as their standard format. It may be wise  
 to design for compatibility.

So you want toString's to look like this?

version(utf16isdefault)
{
    textobj.put("Array: "w);
    ...
}
else
{
    textobj.put("Array: ");
    ...
}

-Steve

I was just thinking of offering an interface that offers utf8 and utf16 and utf32.

Yes, and your explanation for this is because many OSes adopt UTF-16 as their standard format. My expectation is that the outputter will convert to the required OS format anyways, regardless of what you pass it, so why should we write code to cater to what the OS wants? I'd like to write string-handling code once and be done with it, not try to optimize my toString functions so that they use the "right" methods for the current OS. I asserted that the only reason you want to use the functions other than the char[] version is in the case where your data is *stored* as wchar[] or dchar[]. Otherwise, it makes no sense to do the conversion because the outputter already does it for you. So the question becomes, how often do you need to output data that's already in dchar[] or wchar[] format, and is it worth passing around a list of functions just in case you need that, or should you just call a conversion routine the few times you need it? Let's not forget that this is mainly for debugging... -Steve
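As a rough illustration of "just call a conversion routine the few times you need it", here is a sketch in present-day D2 using std.utf.toUTF8. The Label struct is hypothetical, and the sink signature omits the fmt argument. Only the type that stores UTF-16 pays for the conversion; the sink itself stays char[]-only.

```d
import std.utf : toUTF8;

// Hypothetical type whose data happens to be stored as UTF-16.
struct Label
{
    wstring text;

    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("Label: ");
        sink(toUTF8(text)); // convert at the call site (this allocates)
    }
}

void main()
{
    char[] result;
    auto label = Label("caf\u00E9"w);
    label.toString((const(char)[] s) { result ~= s; });
    assert(result == "Label: caf\u00E9");
}
```

If the allocation matters, the conversion could instead decode into a small stack buffer and pass blocks to the sink.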
Nov 12 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 16:54:13 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:


  Let's not forget that this is mainly for debugging...

If it's mainly for debugging maybe it's not worth spending time on.

Debugging is not always done by the developer on his system where a debugger is available. The main use I see for toString is logging (for the purpose of debugging post-mortem failures on customer's systems). -Steve
Nov 12 2009