
digitalmars.D - Re: Semantics of toString

Andrei Alexandrescu Wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 11:46:48 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range 
 and write to it. (The sink is a simplified range.)

A range only makes sense as a struct, not an interface/object. I'll tell you why: performance.
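For illustration, here is a minimal sketch in present-day D of the struct-based approach. It assumes Phobos's `std.range.primitives` (`put`, `isOutputRange`) and `std.array.appender`, and `Point` is a hypothetical type. Because the sink's type is a template parameter, each `put` is resolved at compile time and can be inlined. That is the performance argument for treating ranges as structs rather than interfaces.

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

// Hypothetical type, used only for illustration.
struct Point
{
    int x, y;

    // Writes a textual form into any output range that accepts chars.
    // W is a template parameter, so with a struct range such as Appender
    // every put below is a direct (non-virtual) call.
    void toString(W)(ref W sink) const
        if (isOutputRange!(W, char))
    {
        import std.conv : to;
        put(sink, "Point(");
        put(sink, x.to!string);
        put(sink, ", ");
        put(sink, y.to!string);
        put(sink, ")");
    }
}

void main()
{
    auto buf = appender!string();
    Point(3, 4).toString(buf);
    assert(buf.data == "Point(3, 4)");
}
```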

You are right. If range interfaces accommodate block transfers, this problem may be addressed. I agree that one virtual call per character output would be overkill. (I seem to recall it's one of the reasons why C++'s iostreams are so inefficient.)
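The following sketch shows why block transfers matter when the sink is an interface. `TextSink` and `CountingSink` are hypothetical names. They only count how many virtual `put` calls each strategy makes for the same text.

```d
import std.stdio : writeln;

// Hypothetical sink interface: every put is a virtual call.
interface TextSink
{
    void put(in char[] s);
}

// Records the number of virtual calls received and collects the text.
class CountingSink : TextSink
{
    size_t calls;
    char[] text;

    void put(in char[] s)
    {
        ++calls;
        text ~= s;
    }
}

void main()
{
    string msg = "hello, world";

    auto perChar = new CountingSink;
    foreach (i; 0 .. msg.length)
        perChar.put(msg[i .. i + 1]); // one virtual call per (ASCII) char

    auto perBlock = new CountingSink;
    perBlock.put(msg);                // one virtual call for the whole block

    assert(perChar.text == perBlock.text);
    writeln(perChar.calls, " calls vs ", perBlock.calls); // 12 calls vs 1
}
```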


Oh yes they do. (Did you even google?) Virtual multiple inheritance, the works. http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/

From my C++ book, it appears to only use virtual inheritance. I don't know enough about virtual inheritance to know how that changes function calls. As far as virtual functions go, only the destructor is virtual, so there is no issue there.

You're right, but there is an issue, because as far as I can recall these functions' implementations do end up calling a virtual function per char; that might be streambuf::overflow. I'm not keen on investigating this any further, but I'd be grateful if you shared any related knowledge. At the end of the day, there seems to be violent agreement that we don't want one virtual call per character or one delegate call per character.
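  void put(in char[] str)
 {
   // a dchar loop variable makes foreach decode UTF-8 into code points
   foreach(dchar dc; str)
   {
      // forward each code point as a one-element dchar[] slice
      put((&dc)[0..1]);
   }
 }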
  Note that you probably want to build a buffer of dchars instead of 
 putting one at a time, but you get the idea.

I don't get the idea. I'm seeing one virtual call per character.

You missed the note. I didn't implement it, but you could easily implement a stack-allocated buffer to cache the conversions, passing multiple converted code points at once. But I don't think it's even worth discussing per my other points.
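The buffered variant described in the note above could look something like the following sketch. `DcharStream` is a hypothetical class, and the 64-element stack buffer is an arbitrary size. The UTF-8 overload decodes into the buffer and forwards whole blocks, so the number of `dchar[]` calls drops from one per code point to one per 64 code points.

```d
// Hypothetical stream whose native unit is dchar; in a real design the
// dchar[] overload would be the expensive (e.g. virtual) call.
class DcharStream
{
    size_t calls;   // number of dchar[] block writes
    dchar[] output; // collected output, for demonstration

    void put(in dchar[] str)
    {
        ++calls;
        output ~= str;
    }

    // UTF-8 overload: decodes into a stack buffer and forwards full
    // blocks instead of making one put per code point.
    void put(in char[] str)
    {
        dchar[64] buf = void;
        size_t n = 0;
        foreach (dchar dc; str) // decodes UTF-8 into code points
        {
            buf[n++] = dc;
            if (n == buf.length)
            {
                put(buf[0 .. n]);
                n = 0;
            }
        }
        if (n)
            put(buf[0 .. n]); // flush the remainder
    }
}

void main()
{
    auto s = new DcharStream;
    string text = "héllo wörld"; // 11 code points, fits in one block
    s.put(text);
    assert(s.calls == 1);
    assert(s.output == "héllo wörld"d);

    auto big = new DcharStream;
    char[100] ascii = 'a';
    big.put(ascii[]); // 100 code points -> blocks of 64 and 36
    assert(big.calls == 2);
}
```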
 That being said, one other point that makes all this moot is -- 
 toString is for debugging, not for general purpose.  We don't need to 
 support everything that is possible.  You should be able to say "hey, 
 toString only accepts char[], deal."  Of course, you could substitute 
 wchar[] or dchar[], but I think by far char[] is the most common (and 
 is the default type for string literals).

I was hoping we could elevate the usefulness of toString a bit.

Whatever kind of data the output stream gets, it's going to convert it to the format it wants anyway (as for stdout, I think that would be utf8); the only benefit is if you have data stored in a different width that you want to output. Calling a conversion function in that case I think is reasonable enough, and saves the output stream from having to convert/deal with it. In other words, I don't think it's going to be that common a case where you need anything other than utf8 output, and therefore the cost of creating an interface, making virtual calls, disallowing simple delegate passing, etc. is not worth the convenience *just in case* you have data stored as wchar[] you want to output.
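The following sketch converts at the edge and assumes the UTF-8-only delegate sink argued for here. `Label` is a hypothetical type that stores UTF-16 text. It calls Phobos's `std.utf.toUTF8` once and then passes the result to the sink, so the sink never has to deal with wchar data.

```d
import std.utf : toUTF8;

// The UTF-8-only sink shape proposed in this thread.
alias Sink = void delegate(in char[]);

// Hypothetical type that happens to store its text as UTF-16.
struct Label
{
    wstring text;

    void toString(scope Sink sink) const
    {
        sink(toUTF8(text)); // convert once, then emit UTF-8
    }
}

void main()
{
    string result;
    Label("日本語"w).toString((in char[] s) { result ~= s; });
    assert(result == "日本語");
}
```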

I'm not sure. http://www.gnu.org/s/libc/manual/html_node/Streams-and-I18N.html#Streams-and-I18N GNU defines means to set and detect a utf-16 console, which dmd observes (grep std/ for fwide). But then I'm not sure how many are using that kind of stuff.
 That's not to say there is no reason to have a TextOutputStream 
 object.  Such a thing is perfectly usable for a toString which takes 
 a char[] delegate sink, just pass &put.  In fact, there could be a 
 default toString function in Object that does just that:
  class Object
 {
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
 }
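Here is a runnable sketch of the two-overload idea in present-day D. `TextOutputStream` and `MemoryStream` are hypothetical stand-ins and not Phobos types. `Widget` plays the role of a class following the proposed convention. The stream overload just passes `&tos.put`, a delegate bound to the stream object, to the delegate overload. The `alias` line keeps the inherited `Object.toString()` visible alongside the new overloads.

```d
// Hypothetical stand-in for the proposed TextOutputStream.
interface TextOutputStream
{
    void put(in char[] buf);
}

// Hypothetical in-memory implementation, for demonstration.
class MemoryStream : TextOutputStream
{
    char[] data;
    void put(in char[] buf) { data ~= buf; }
}

class Widget
{
    alias toString = Object.toString; // keep the inherited overload visible
    int id = 7;

    // Core overload: writes UTF-8 text through a delegate sink.
    void toString(scope void delegate(in char[]) put, string fmt) const
    {
        import std.conv : to;
        put("Widget#");
        put(id.to!string);
    }

    // Stream overload: forwards the stream's bound put method.
    void toString(TextOutputStream tos, string fmt) const
    {
        toString(&tos.put, fmt);
    }
}

void main()
{
    auto w = new Widget;

    auto stream = new MemoryStream;
    w.toString(stream, null);
    assert(stream.data == "Widget#7");

    // The delegate overload also accepts a plain closure.
    string s;
    w.toString((in char[] buf) { s ~= buf; }, null);
    assert(s == "Widget#7");
}
```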

I'd agree with the delegate idea if we established that UTF-8 is favored compared to all other formats.

D seems to favor UTF8 -- it is the default type for string literals. I don't think I've ever used dchar, and I usually only use wchar to talk to Win32 functions when required. The question I'd ask is -- how common is the case where the versions other than char[] would be more convenient?

I don't know. I think Asian-language users might give a salient answer.

亞洲用戶有一個突出的答案 (Asian users have a salient answer.)
Nov 12 2009