
digitalmars.D - toString refactor in druntime

reply Benjamin Thaut <code benjamin-thaut.de> writes:
I'm planning on doing a pull request for druntime which rewrites every 
toString function within druntime to use the new sink signature. That 
way druntime would cause a lot fewer allocations which end up being 
garbage right away. Are there any objections against doing so? Any 
reasons why such a pull request would not get accepted?

Kind Regards
Benjamin Thaut
Oct 27 2014
next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Benjamin Thaut"  wrote in message news:m2kt16$2566$1 digitalmars.com... 

 I'm planning on doing a pull request for druntime which rewrites every 
 toString function within druntime to use the new sink signature. That 
 way druntime would cause a lot fewer allocations which end up being 
 garbage right away. Are there any objections against doing so? Any 
 reasons why such a pull request would not get accepted?
How ugly is it going to be, since druntime can't use std.format?
Oct 27 2014
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 27.10.2014 11:07, Daniel Murphy wrote:
 "Benjamin Thaut"  wrote in message news:m2kt16$2566$1 digitalmars.com...
 I'm planning on doing a pull request for druntime which rewrites every
 toString function within druntime to use the new sink signature. That
 way druntime would cause a lot fewer allocations which end up being
 garbage right away. Are there any objections against doing so? Any
 reasons why such a pull request would not get accepted?
How ugly is it going to be, since druntime can't use std.format?
They wouldn't get any uglier than they already are, because the current toString functions within druntime also can't use std.format.

An example would be the toString function of TypeInfo_StaticArray:

override string toString() const
{
    SizeStringBuff tmpBuff = void;
    return value.toString() ~ "[" ~ cast(string)len.sizeToTempString(tmpBuff) ~ "]";
}

Would be replaced by:

override void toString(void delegate(const(char)[]) sink) const
{
    SizeStringBuff tmpBuff = void;
    value.toString(sink);
    sink("[");
    sink(cast(string)len.sizeToTempString(tmpBuff));
    sink("]");
}

The advantage would be that the new version now ideally never allocates, while the old version allocated 3 times, of which 2 allocations end up being garbage right away.

Also I remember reading that the long-term goal is to convert all toString functions to the sink version.

Kind Regards
Benjamin Thaut
Oct 27 2014
next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Benjamin Thaut"  wrote in message news:m2m3j2$ciu$1 digitalmars.com...

 They wouldn't get any uglier than they already are, because the current 
 toString functions within druntime also can't use std.format.

 An example would be the toString function of TypeInfo_StaticArray:

 override string toString() const
 {
 SizeStringBuff tmpBuff = void;
 return value.toString() ~ "[" ~ cast(string)len.sizeToTempString(tmpBuff) 
 ~ "]";
 }

 Would be replaced by:

 override void toString(void delegate(const(char)[]) sink) const
 {
 SizeStringBuff tmpBuff = void;
 value.toString(sink);
 sink("[");
 sink(cast(string)len.sizeToTempString(tmpBuff));
 sink("]");
 }

 The advantage would be that the new version now ideally never allocates. 
 While the old version allocated 3 times of which 2 allocations end up 
 being garbage right away.

 Also I remember reading that the long-term goal is to convert all toString 
 functions to the sink version.
It's very ugly compared to the formattedWrite version, and it does increase the line count compared to the current version (this is the main disadvantage of the sink-based toString IMO). If this is as bad as it gets, PR approval shouldn't be a problem.
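For comparison, the formattedWrite version being alluded to would look roughly like this sketch (it cannot actually be used inside druntime, since druntime cannot import Phobos; shown only to illustrate the line-count difference):

```d
import std.format : formattedWrite;

// Hypothetical Phobos-based equivalent of the sink overload above.
// A delegate taking const(char)[] qualifies as an output range,
// so formattedWrite can drive it directly.
override void toString(void delegate(const(char)[]) sink) const
{
    formattedWrite(sink, "%s[%s]", value, len);
}
```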
Oct 27 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/27/14 2:45 PM, Daniel Murphy wrote:
 "Benjamin Thaut"  wrote in message news:m2m3j2$ciu$1 digitalmars.com...

 They wouldn't get any uglier than they already are, because the
 current toString functions within druntime also can't use std.format.

 An example would be the toString function of TypeInfo_StaticArray:

 override string toString() const
 {
 SizeStringBuff tmpBuff = void;
 return value.toString() ~ "[" ~
 cast(string)len.sizeToTempString(tmpBuff) ~ "]";
 }

 Would be replaced by:

 override void toString(void delegate(const(char)[]) sink) const
 {
 SizeStringBuff tmpBuff = void;
 value.toString(sink);
 sink("[");
 sink(cast(string)len.sizeToTempString(tmpBuff));
 sink("]");
 }

 The advantage would be that the new version now ideally never
 allocates. While the old version allocated 3 times of which 2
 allocations end up being garbage right away.

 Also I remember reading that the long-term goal is to convert all
 toString functions to the sink version.
It's very ugly compared to the formattedWrite version, and it does increase the line count compared to the current version (this is the main disadvantage of the sink-based toString IMO). If this is as bad as it gets, PR approval shouldn't be a problem.
Might I suggest a helper that takes a sink and a variadic list of strings, and outputs those strings to the sink in order. -Steve
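Such a helper might look like the following sketch (the name `putAll` is made up for illustration, not an existing druntime function):

```d
// Hypothetical helper: forwards each piece to the sink in order,
// so call sites shrink to a single line.
void putAll(scope void delegate(const(char)[]) sink,
            scope const(char)[][] pieces...)
{
    foreach (p; pieces)
        sink(p);
}
```

The TypeInfo example upthread would then reduce to something like `putAll(sink, "[", cast(string)len.sizeToTempString(tmpBuff), "]");` after the member's own toString call.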
Oct 27 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Mon, 27 Oct 2014 15:20:24 -0400
Steven Schveighoffer via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 Might I suggest a helper that takes a sink and a variadic list of
 strings, and outputs those strings to the sink in order.
hehe. a simple CTFE writef seems to be a perfect fit for druntime. i implemented a very "barebone" yet powerful PoC in ~18Kb of code (~500 lines). it can be made even smaller if necessary. then one can use something like `writef!"<%s:%5s>"(sink, msg, code);` and so on.
Oct 27 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/27/14 4:24 PM, ketmar via Digitalmars-d wrote:
 On Mon, 27 Oct 2014 15:20:24 -0400
 Steven Schveighoffer via Digitalmars-d <digitalmars-d puremagic.com>
 wrote:

 Might I suggest a helper that takes a sink and a variadic list of
 strings, and outputs those strings to the sink in order.
hehe. a simple CTFE writef seems to be a perfect fit for druntime. i implemented a very "barebone" yet powerful PoC in ~18Kb of code (~500 lines). it can be made even smaller if necessary. then one can use something like `writef!"<%s:%5s>"(sink, msg, code);` and so on.
I think this is overkill for this purpose. We need something simple to save a few lines of code. -Steve
Oct 27 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Mon, 27 Oct 2014 17:04:55 -0400
Steven Schveighoffer via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 I think this is overkill for this purpose. We need something simple
 to save a few lines of code.
an 18KB (even less) module which consists mostly of functional templates, generates a nice string mixin, adds part of writef's functionality, and can convert nicely formatted strings and args to a series of calls to any function is overkill? ok. who am i to teach people that metaprogramming is the way to automate boring things... sure, i won't force anyone to use that, i can't take away the pleasure of writing sink boilerplate calls from people.
Oct 27 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/27/14 6:02 PM, ketmar via Digitalmars-d wrote:
 On Mon, 27 Oct 2014 17:04:55 -0400
 Steven Schveighoffer via Digitalmars-d <digitalmars-d puremagic.com>
 wrote:

 I think this is overkill for this purpose. We need something simple
 to save a few lines of code.
an 18KB (even less) module which consists mostly of functional templates, generates a nice string mixin, adds part of writef's functionality, and can convert nicely formatted strings and args to a series of calls to any function is overkill? ok. who am i to teach people that metaprogramming is the way to automate boring things...
Meta has a cost with the current compiler. It would be nice if it didn't, but I have practical concerns.
 sure, i won't force anyone to use that, i can't take the pleasure of
 writing sink boilerplate calls from people.
I think a few simple functions can suffice for druntime's purposes. We don't need a kitchen sink function (pun intended). -Steve
Oct 28 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Tue, 28 Oct 2014 08:37:43 -0400
Steven Schveighoffer via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 Meta has a cost with the current compiler. It would be nice if it
 didn't, but I have practical concerns.
i don't think that there will be a lot of calls to 'write[f]' anyway. i know that CTFE is not costless (i once wrote a simple Conway's Life sample and summoned the OOM-killer with a CTFE .lif parser! ;-), but this can be compensated by adding CTFE writef use function-by-function.
 I think a few simple functions can suffice for druntime's purposes. We
 don't need a kitchen sink function (pun intended).
ah, those famous last words... ;-) from my observations it's enough to implement '%[+-]width[.maxlen]s' and the same for '%x'. i also added codes to skip arguments and to print all that's left ('%*'). i'm sure that it can be done in <10KB, and it will be very handy to have. druntime doesn't do a lot of printing and string conversions anyway. and phobos is already riddled with templates and CTFE.
Oct 28 2014
parent "Martin Nowak" <code dawg.eu> writes:
On Tuesday, 28 October 2014 at 21:32:14 UTC, ketmar via 
Digitalmars-d wrote:
 On Tue, 28 Oct 2014 08:37:43 -0400
 I think a few simple functions can suffice for druntime's 
 purposes. We don't need a kitchen sink function (pun intended).
ah, those famous last words... ;-) from my observations it's enough to implement '%[+-]width[.maxlen]s' and the same for '%x'. i also added codes to skip arguments and to print all that's left ('%*'). i'm sure that it can be done in <10KB, and it will be very handy to have. druntime doesn't do a lot of printing and string conversions anyway. and phobos is already riddled with templates and CTFE.
https://github.com/D-Programming-Language/druntime/pull/662
Oct 31 2014
prev sibling parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 27.10.2014 11:07, Daniel Murphy wrote:

 "Benjamin Thaut"  wrote in message news:m2kt16$2566$1 digitalmars.com...
 I'm planning on doing a pull request for druntime which rewrites every
 toString function within druntime to use the new sink signature. That
 way druntime would cause a lot fewer allocations which end up being
 garbage right away. Are there any objections against doing so? Any
 reasons why such a pull request would not get accepted?
How ugly is it going to be, since druntime can't use std.format?
They wouldn't get any uglier than they already are, because the current toString functions within druntime also can't use std.format.

An example would be the toString function of TypeInfo_StaticArray:

override string toString() const
{
    SizeStringBuff tmpBuff = void;
    return value.toString() ~ "[" ~ cast(string)len.sizeToTempString(tmpBuff) ~ "]";
}

Would be replaced by:

override void toString(void delegate(const(char)[]) sink) const
{
    SizeStringBuff tmpBuff = void;
    value.toString(sink);
    sink("[");
    sink(cast(string)len.sizeToTempString(tmpBuff));
    sink("]");
}
The thing that really worries me about this sink API is that your code here produces (at least) 4 calls to a delegate. That's a lot of indirect function calling, which can be a severe performance hazard on some systems. We're trading out garbage for low-level performance hazards, which may imply a reduction in portability.

I wonder if the trade-off between flexibility and performance went a little too far towards flexibility in this case. It's better, but I don't think it hits the mark, and I'm not sure that hitting the mark on both issues is impossible.

But in any case, I think all sink code like this should aim to call the user-supplied sink delegate at most *once* per toString. I'd like to see code that used the stack to compose the string locally, then fed it through to the supplied sink delegate in fewer (or one) calls.

Ideally, I guess I'd prefer to see an overload which receives a slice to write to instead and do away with the delegate call. Particularly in druntime, where API and potential platform portability decisions should be *super*conservative.
 The advantage would be that the new version now ideally never allocates.
 While the old version allocated 3 times of which 2 allocations end up being
 garbage right away.

 Also I remember reading that the long-term goal is to convert all toString
 functions to the sink version.

 Kind Regards
 Benjamin Thaut
Oct 27 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
 On 28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 27.10.2014 11:07, Daniel Murphy wrote:

 "Benjamin Thaut"  wrote in message news:m2kt16$2566$1 digitalmars.com...
 I'm planning on doing a pull request for druntime which rewrites every
 toString function within druntime to use the new sink signature. That
 way druntime would cause a lot fewer allocations which end up being
 garbage right away. Are there any objections against doing so? Any
 reasons why such a pull request would not get accepted?
How ugly is it going to be, since druntime can't use std.format?
They wouldn't get any uglier than they already are, because the current toString functions within druntime also can't use std.format.

An example would be the toString function of TypeInfo_StaticArray:

override string toString() const
{
    SizeStringBuff tmpBuff = void;
    return value.toString() ~ "[" ~ cast(string)len.sizeToTempString(tmpBuff) ~ "]";
}

Would be replaced by:

override void toString(void delegate(const(char)[]) sink) const
{
    SizeStringBuff tmpBuff = void;
    value.toString(sink);
    sink("[");
    sink(cast(string)len.sizeToTempString(tmpBuff));
    sink("]");
}
The thing that really worries me about this sink API is that your code here produces (at least) 4 calls to a delegate. That's a lot of indirect function calling, which can be a severe performance hazard on some systems. We're trading out garbage for low-level performance hazards, which may imply a reduction in portability.
I think given the circumstances, we are better off. But when we find a platform that does perform worse, we can try and implement alternatives. I don't want to destroy performance on the platforms we *do* support, for the worry that some future platform isn't as friendly to this method.
 But in any case, I think all sink code like this should aim to call
 the user-supplied sink delegate at most *once* per toString.
 I'd like to see code that used the stack to compose the string
 locally, then feed it through to the supplied sink delegate in fewer
 (or one) calls.
This is a good goal to have, regardless. The stack is always pretty high performing. However, it doesn't scale well.

If you look above, the function already uses the stack to output the number. It would be trivial to add 2 chars to put the "[]" there also so only one sink call occurs. But an aggregate which relies on members to output themselves is going to have a tough time following this model. Only at the lowest levels can we enforce such a rule.

Another thing to think about is that the inliner can potentially get rid of the cost of delegate calls.
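Concretely, the "two extra chars" idea might look like this sketch (assuming `SizeStringBuff` is a fixed-size char array and `sizeToTempString` formats into it, as in the druntime snippet upthread):

```d
// Sketch: compose "[<len>]" in one stack buffer so the trailing part
// needs a single sink call instead of three.
override void toString(void delegate(const(char)[]) sink) const
{
    SizeStringBuff tmpBuff = void;
    value.toString(sink); // still one call per member

    auto digits = cast(string)len.sizeToTempString(tmpBuff);
    char[tmpBuff.length + 2] composed = void; // digits plus "[]"
    composed[0] = '[';
    composed[1 .. 1 + digits.length] = digits[];
    composed[1 + digits.length] = ']';
    sink(composed[0 .. digits.length + 2]);
}
```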
 Ideally, I guess I'd prefer to see an overload which receives a slice
 to write to instead and do away with the delegate call. Particularly
 in druntime, where API and potential platform portability decisions
 should be *super*conservative.
This puts the burden on the caller to ensure enough space is allocated. Or you have to reenter the function to finish up the output. Neither of these seem like acceptable drawbacks. What would you propose for such a mechanism? Maybe I'm not thinking of your ideal API. -Steve
Oct 28 2014
parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 28 October 2014 22:51, Steven Schveighoffer via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
 On 28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 27.10.2014 11:07, Daniel Murphy wrote:

 "Benjamin Thaut"  wrote in message news:m2kt16$2566$1 digitalmars.com...
 I'm planning on doing a pull request for druntime which rewrites every
 toString function within druntime to use the new sink signature. That
 way druntime would cause a lot fewer allocations which end up being
 garbage right away. Are there any objections against doing so? Any
 reasons why such a pull request would not get accepted?
How ugly is it going to be, since druntime can't use std.format?
They wouldn't get any uglier than they already are, because the current toString functions within druntime also can't use std.format.

An example would be the toString function of TypeInfo_StaticArray:

override string toString() const
{
    SizeStringBuff tmpBuff = void;
    return value.toString() ~ "[" ~ cast(string)len.sizeToTempString(tmpBuff) ~ "]";
}

Would be replaced by:

override void toString(void delegate(const(char)[]) sink) const
{
    SizeStringBuff tmpBuff = void;
    value.toString(sink);
    sink("[");
    sink(cast(string)len.sizeToTempString(tmpBuff));
    sink("]");
}
The thing that really worries me about this sink API is that your code here produces (at least) 4 calls to a delegate. That's a lot of indirect function calling, which can be a severe performance hazard on some systems. We're trading out garbage for low-level performance hazards, which may imply a reduction in portability.
I think given the circumstances, we are better off. But when we find a platform that does perform worse, we can try and implement alternatives. I don't want to destroy performance on the platforms we *do* support, for the worry that some future platform isn't as friendly to this method.
Video game consoles are very real, and very now. I suspect they may even represent the largest body of native code in the world today.

I don't know if 'alternatives' is the right phrase, since this approach isn't implemented yet, and I wonder if a slightly different API strategy exists which may not exhibit this problem.
 But in any case, I think all sink code like this should aim to call
 the user-supplied sink delegate at most *once* per toString.
 I'd like to see code that used the stack to compose the string
 locally, then feed it through to the supplied sink delegate in fewer
 (or one) calls.
This is a good goal to have, regardless. The stack is always pretty high performing. However, it doesn't scale well. If you look above, the function already uses the stack to output the number. It would be trivial to add 2 chars to put the "[]" there also so only one sink call occurs.
It would be trivial, and that's precisely what I'm suggesting! :)
 But an aggregate which relies on members to output themselves is going to
 have a tough time following this model. Only at the lowest levels can we
 enforce such a rule.
I understand this, which is the main reason I suggest to explore something other than a delegate based interface.
 Another thing to think about is that the inliner can potentially get rid of
 the cost of delegate calls.
druntime is a binary lib. The inliner has no effect on this equation.
 Ideally, I guess I'd prefer to see an overload which receives a slice
 to write to instead and do away with the delegate call. Particularly
 in druntime, where API and potential platform portability decisions
 should be *super*conservative.
This puts the burden on the caller to ensure enough space is allocated. Or you have to reenter the function to finish up the output. Neither of these seem like acceptable drawbacks.
Well, that's why I'm open for discussion. I'm sure there's room for creativity here. It doesn't seem that unreasonable to reenter the function to me, actually; I'd prefer a second static call in the rare event that a buffer wasn't big enough over many indirect calls in every single case. There's no way that reentry would be slower. It may be more inconvenient, but I wonder if some API creativity could address that...?
 What would you propose for such a mechanism? Maybe I'm not thinking of your
 ideal API.
I haven't thought of one I'm really happy with. I can imagine some 'foolproof' solution at the API level which may accept some sort of growable string object (which may represent a stack allocation by default). This could lead to a virtual call if the buffer needs to grow, but that's not really any worse than a delegate call, and it's only in the rare case of overflow, rather than many calls in all cases.
Oct 28 2014
next sibling parent "Kagamin" <spam here.lot> writes:
On Tuesday, 28 October 2014 at 23:06:32 UTC, Manu via 
Digitalmars-d wrote:
 I haven't thought of one I'm really happy with.
 I can imagine some 'foolproof' solution at the API level which 
 may
 accept some sort of growable string object (which may represent 
 a
 stack allocation by default). This could lead to a virtual call 
 if the
 buffer needs to grow, but that's not really any worse than a 
 delegate
 call, and it's only in the rare case of overflow, rather than 
 many
 calls in all cases.
If you want to combine two approaches of writing data, you will need to write toString logic twice, both variants intertwined on every byte written.
Oct 29 2014
prev sibling next sibling parent reply "Kagamin" <spam here.lot> writes:
struct Sink
{
    import std.algorithm.comparison : min;

    char[] buff;
    void delegate(in char[]) sink;

    void write(in char[] s)
    {
        auto len = min(s.length, buff.length);
        buff[0 .. len] = s[0 .. len];
        buff = buff[len .. $];
        const s1 = s[len .. $];
        if (s1.length) sink(s1);
    }
}

override void toString(ref Sink sink) const
{
    value.toString(sink);
    sink.write("[");
    len.toString(sink);
    sink.write("]");
}
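A caller might wire this up along the following lines (a sketch only; `obj` stands for any object with the proposed `toString(ref Sink)` overload, and the overflow appender is just for illustration):

```d
// The stack buffer takes the fast path; anything that doesn't fit is
// forwarded to the delegate, which here appends to a heap array.
char[32] stackBuf = void;
char[] overflow;
auto s = Sink(stackBuf[], (in char[] chunk) { overflow ~= chunk; });

obj.toString(s);

// Bytes actually written to the stack portion (buff shrinks as it fills):
auto used = stackBuf.length - s.buff.length;
auto result = stackBuf[0 .. used] ~ overflow;
```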
Oct 29 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/29/14 6:03 AM, Kagamin wrote:
 struct Sink
 {
     char[] buff;
     void delegate(in char[]) sink;

     void write(in char[] s)
     {
        auto len=min(s.length,buff.length);
        buff[0..len]=s[0..len];
        buff=buff[len..$];
        const s1=s[len..$];
        if(s1.length)sink(s1);
     }
 }

 override void toString(ref Sink sink) const
 {
     value.toString(sink);
     sink.write("[");
     len.toString(sink);
     sink.write("]");
 }
This would require sink to write the buffer before its first call, since you don't track that. Wouldn't it be better to track the "used" length in buff directly so write can handle that?

Not a bad idea, BTW.

-Steve
Oct 30 2014
next sibling parent reply "Kagamin" <spam here.lot> writes:
On Thursday, 30 October 2014 at 15:32:30 UTC, Steven 
Schveighoffer wrote:
 This would require sink to write the buffer before its first
 call, since you don't track that.
That's delegated to the sink delegate.
 Wouldn't it be better to track the "used" length in buff 
 directly so write can handle that?
The used length is tracked by shrinking the buff. This only shows the callee's perspective.
Oct 30 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/30/14 11:54 AM, Kagamin wrote:
 On Thursday, 30 October 2014 at 15:32:30 UTC, Steven Schveighoffer wrote:
 This would require sink to write the buffer before its first call,
 since you don't track that.
That's delegated to the sink delegate.
Keep in mind, sink delegate is not a singly implemented function, it's implemented wherever the output is done. So it's a lot of boilerplate to copy around.
 Wouldn't it be better to track the "used" length in buff directly so
 write can handle that?
The used length is tracked by shrinking the buff. This only shows the callee's perspective.
No, what I mean is:

struct Sink
{
    char[] buff;
    uint used; // buff[used..$] is what your buff used to be.
    void delegate(in char[]) sink;
    ...
}

This way, when write finds it runs out of space, the first thing it does is sink the buff, then starts sinking the rest. In fact, you can just keep using buff once you sink it, to avoid future "extra calls" to sink.

Note that this can be implemented right on top of the existing sink mechanism.

-Steve
Oct 30 2014
parent "Kagamin" <spam here.lot> writes:
On Thursday, 30 October 2014 at 16:03:01 UTC, Steven 
Schveighoffer wrote:
 Keep in mind, sink delegate is not a singly implemented 
 function, it's implemented wherever the output is done. So it's 
 a lot of boilerplate to copy around.
Only 2 lines.
 This way, when write finds it runs out of space, first thing it 
 does is sink the buff, then starts sinking the rest. In fact, 
 you can just keep using buff once you sink it, to avoid future 
 "extra calls" to sink.
I tried to make the code minimalistic. I thought that would be an advantage.
Oct 30 2014
prev sibling parent "Kagamin" <spam here.lot> writes:
On Thursday, 30 October 2014 at 15:32:30 UTC, Steven 
Schveighoffer wrote:
 This would require sink to write the buffer before its first
 call, since you don't track that.
Also, that's not always true; it depends on the sink implementation. If you construct the string in memory, you can concatenate the buffered result with the sunk result any time later.
Oct 30 2014
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/28/14 7:06 PM, Manu via Digitalmars-d wrote:
 On 28 October 2014 22:51, Steven Schveighoffer via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
 On 28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 27.10.2014 11:07, Daniel Murphy wrote:

 "Benjamin Thaut"  wrote in message news:m2kt16$2566$1 digitalmars.com...
 I'm planning on doing a pull request for druntime which rewrites every
 toString function within druntime to use the new sink signature. That
 way druntime would cause a lot fewer allocations which end up being
 garbage right away. Are there any objections against doing so? Any
 reasons why such a pull request would not get accepted?
How ugly is it going to be, since druntime can't use std.format?
They wouldn't get any uglier than they already are, because the current toString functions within druntime also can't use std.format.

An example would be the toString function of TypeInfo_StaticArray:

override string toString() const
{
    SizeStringBuff tmpBuff = void;
    return value.toString() ~ "[" ~ cast(string)len.sizeToTempString(tmpBuff) ~ "]";
}

Would be replaced by:

override void toString(void delegate(const(char)[]) sink) const
{
    SizeStringBuff tmpBuff = void;
    value.toString(sink);
    sink("[");
    sink(cast(string)len.sizeToTempString(tmpBuff));
    sink("]");
}
The thing that really worries me about this sink API is that your code here produces (at least) 4 calls to a delegate. That's a lot of indirect function calling, which can be a severe performance hazard on some systems. We're trading out garbage for low-level performance hazards, which may imply a reduction in portability.
I think given the circumstances, we are better off. But when we find a platform that does perform worse, we can try and implement alternatives. I don't want to destroy performance on the platforms we *do* support, for the worry that some future platform isn't as friendly to this method.
 Video game consoles are very real, and very now. I suspect they may even represent the largest body of native code in the world today.
Sorry, I meant future *D supported* platforms, not future not-yet-existing platforms.
 I don't know if 'alternatives' is the right phrase, since this
 approach isn't implemented yet, and I wonder if a slightly different
 API strategy exists which may not exhibit this problem.
Well, the API already exists and is supported. The idea is to migrate the existing toString calls to the new API.
 But an aggregate which relies on members to output themselves is going to
 have a tough time following this model. Only at the lowest levels can we
 enforce such a rule.
I understand this, which is the main reason I suggest to explore something other than a delegate based interface.
Before we start ripping apart our existing APIs, can we show that the performance is really going to be so bad? I know virtual calls have a bad reputation, but I hate to make these choices absent real data.

For instance, D's underlying i/o system uses FILE *, which is about as virtual as you can get. So are you avoiding a virtual call to use a buffer to then pass to a virtual call later?
 Another thing to think about is that the inliner can potentially get rid of
 the cost of delegate calls.
druntime is a binary lib. The inliner has no effect on this equation.
It depends on the delegate and the item being output, whether the source is available to the compiler, and whether or not it's a virtual function. True, some cases will not be inlinable. But the "tweaks" we implement for platform X which does not do well with delegate calls, could be to make this more available.
 Ideally, I guess I'd prefer to see an overload which receives a slice
 to write to instead and do away with the delegate call. Particularly
 in druntime, where API and potential platform portability decisions
 should be *super*conservative.
This puts the burden on the caller to ensure enough space is allocated. Or you have to reenter the function to finish up the output. Neither of these seem like acceptable drawbacks.
Well, that's why I'm open for discussion. I'm sure there's room for creativity here. It doesn't seem that unreasonable to reenter the function to me, actually; I'd prefer a second static call in the rare event that a buffer wasn't big enough over many indirect calls in every single case.
A reentrant function has to track the state of what has been output, which is horrific in my opinion.
 There's no way that reentry would be slower. It may be more
 inconvenient, but I wonder if some API creativity could address
 that...?
The largest problem I see is, you may not know before you start generating strings whether it will fit in the buffer, and therefore, you may still end up eventually calling the sink.

Note, you can always allocate a stack buffer, use an inner function as a delegate, and get the inliner to remove the indirect calls. Or use an alternative private mechanism to build the data.

Would you say that *one* delegate call per object output is OK?
 What would you propose for such a mechanism? Maybe I'm not thinking of your
 ideal API.
I haven't thought of one I'm really happy with. I can imagine some 'foolproof' solution at the API level which may accept some sort of growable string object (which may represent a stack allocation by default). This could lead to a virtual call if the buffer needs to grow, but that's not really any worse than a delegate call, and it's only in the rare case of overflow, rather than many calls in all cases.
This is a typical mechanism that Tango used -- pass in a ref to a dynamic array referencing a stack buffer. If it needed to grow, just update the length, and it moves to the heap. In most cases, the stack buffer is enough. But the idea is to try and minimize the GC allocations, which are performance killers on the current platforms.

I think adding the option of using a delegate is not limiting -- you can always, on a platform that needs it, implement an alternative protocol that is internal to druntime. We are not preventing such protocols by adding the delegate version. But on our currently supported platforms, the delegate vs. GC call is so much better. I can't see any reason to avoid the latter.

-Steve
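The Tango-style scheme described here might be sketched as follows (a hypothetical type, not an actual druntime or Tango API; growth falls back to the GC only on overflow):

```d
// Growable buffer that starts in caller-provided (e.g. stack) storage
// and spills to the GC heap only when it overflows.
struct StringBuf
{
    char[] data;   // initially a stack slice supplied by the caller
    size_t used;

    void put(in char[] s)
    {
        if (used + s.length > data.length)
        {
            // Rare path: grow onto the heap, copying what we have so far.
            auto bigger = new char[](2 * (used + s.length));
            bigger[0 .. used] = data[0 .. used];
            data = bigger;
        }
        data[used .. used + s.length] = s[];
        used += s.length;
    }

    inout(char)[] result() inout { return data[0 .. used]; }
}
```

A toString overload could then take `ref StringBuf` and call `put` freely; the common case never leaves the stack.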
Oct 30 2014
next sibling parent reply "Jonathan Marler" <johnnymarler gmail.com> writes:
 Before we start ripping apart our existing APIs, can we show 
 that the performance is really going to be so bad? I know 
 virtual calls have a bad reputation, but I hate to make these 
 choices absent real data.

 For instance, D's underlying i/o system uses FILE *, which is 
 about as virtual as you can get. So are you avoiding a virtual 
 call to use a buffer to then pass to a virtual call later?
I think it's debatable how useful this information would be, but I've written a small D program to explore the performance statistics of the various methods. I've uploaded the code to my server; feel free to download/modify/use it. Here are the various methods I've tested:

/**
Method 1: ReturnString
    string toString();
Method 2: SinkDelegate
    void toString(void delegate(const(char)[]) sink);
Method 3: SinkDelegateWithStaticHelperBuffer
    struct SinkStatic { char[64] buffer; void delegate(const(char)[]) sink; }
    void toString(ref SinkStatic sink);
Method 4: SinkDelegateWithDynamicHelperBuffer
    struct SinkDynamic { char[] buffer; void delegate(const(char)[]) sink; }
    void toString(ref SinkDynamic sink);
    void toString(SinkDynamic sink);
*/

Dmd/No Optimization (dmd dtostring.d):

RuntimeString run 1 (loopcount 10000000)
  Method 1     : 76 ms
  Method 2     : 153 ms
  Method 3     : 156 ms
  Method 4ref  : 157 ms
  Method 4noref: 167 ms
StringWithPrefix run 1 (loopcount 1000000)
  Method 1     : 159 ms
  Method 2     : 22 ms
  Method 3     : 80 ms
  Method 4ref  : 80 ms
  Method 4noref: 83 ms
ArrayOfStrings run 1 (loopcount 1000000)
  Method 1     : 1 sec and 266 ms
  Method 2     : 79 ms
  Method 3     : 217 ms
  Method 4ref  : 226 ms
  Method 4noref: 222 ms

Dmd/With Optimization (dmd -O dtostring.d):

RuntimeString run 1 (loopcount 10000000)
  Method 1     : 35 ms
  Method 2     : 67 ms
  Method 3     : 67 ms
  Method 4ref  : 72 ms
  Method 4noref: 70 ms
StringWithPrefix run 1 (loopcount 1000000)
  Method 1     : 154 ms
  Method 2     : 9 ms
  Method 3     : 86 ms
  Method 4ref  : 63 ms
  Method 4noref: 65 ms
ArrayOfStrings run 1 (loopcount 1000000)
  Method 1     : 1 sec and 252 ms
  Method 2     : 37 ms
  Method 3     : 191 ms
  Method 4ref  : 193 ms
  Method 4noref: 201 ms

I would like to note that passing a stack-allocated buffer to the various toString methods along with a sink delegate may not get much performance benefit.
One reason is that the logic can get a little hairy when trying to decide whether the buffer is large enough for the string (see my code for ArrayOfStrings), which creates more code, and that alone can slow the processor down because there is more code memory to manage. Also note that adding a helper buffer alongside the sink delegate doesn't help at all, since it is equivalent to passing around the delegate (meaning you could just create a buffered sink that calls the real delegate when it gets full and pass around its own sink delegate).
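The buffered-sink idea mentioned above -- a wrapper that accumulates small writes and calls the real delegate only when full -- might look like this. This is a sketch; `BufferedSink` is a hypothetical name:

```d
// Sketch: coalesce small writes in a fixed buffer and forward to the real
// sink only on overflow and on flush(), so a typical toString costs one
// delegate call instead of several.
struct BufferedSink
{
    void delegate(const(char)[]) sink;  // the real, possibly expensive sink
    char[64] buf = void;
    size_t used;

    void put(const(char)[] s)
    {
        if (used + s.length > buf.length)
        {
            flush();
            if (s.length > buf.length) { sink(s); return; } // too big to buffer
        }
        buf[used .. used + s.length] = s;
        used += s.length;
    }

    void flush()
    {
        if (used) { sink(buf[0 .. used]); used = 0; }
    }
}

void main()
{
    int calls;
    char[] result;
    auto b = BufferedSink((const(char)[] s) { ++calls; result ~= s; });
    b.put("uint");
    b.put("[");
    b.put("16");
    b.put("]");
    b.flush();
    assert(result == "uint[16]");
    assert(calls == 1);  // four put()s collapsed into one delegate call
}
```

The trade-off, as noted above, is the extra branch and copy per write; whether that beats the indirect calls depends on the platform.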
Oct 30 2014
next sibling parent "Jonathan Marler" <johnnymarler gmail.com> writes:
On Thursday, 30 October 2014 at 18:54:00 UTC, Jonathan Marler
wrote:
 Before we start ripping apart our existing APIs, can we show 
 that the performance is really going to be so bad? I know 
 virtual calls have a bad reputation, but I hate to make these 
 choices absent real data.

 For instance, D's underlying i/o system uses FILE *, which is 
 about as virtual as you can get. So are you avoiding a virtual 
 call to use a buffer to then pass to a virtual call later?
I think its debatable how useful this information would be but I've written a small D Program to try to explore the different performance statistics for various methods. I've uploaded the code to my server, feel free to download/modify/use.
Whoops, here's the link: http://marler.info/dtostring.d
Oct 30 2014
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/30/14 2:53 PM, Jonathan Marler wrote:
 Before we start ripping apart our existing APIs, can we show that the
 performance is really going to be so bad? I know virtual calls have a
 bad reputation, but I hate to make these choices absent real data.

 For instance, D's underlying i/o system uses FILE *, which is about as
 virtual as you can get. So are you avoiding a virtual call to use a
 buffer to then pass to a virtual call later?
I think its debatable how useful this information would be but I've written a small D Program to try to explore the different performance statistics for various methods. I've uploaded the code to my server, feel free to download/modify/use. Here's the various methods I've tested. /** Method 1: ReturnString string toString(); Method 2: SinkDelegate void toString(void delegate(const(char)[]) sink); Method 3: SinkDelegateWithStaticHelperBuffer struct SinkStatic { char[64] buffer; void delegate(const(char)[]) sink; } void toString(ref SinkStatic sink); Method 4: SinkDelegateWithDynamicHelperBuffer struct SinkDynamic { char[] buffer; void delegate(const(char)[]) sink; } void toString(ref SinkDynamic sink); void toString(SinkDynamic sink); */ Dmd/No Optimization (dmd dtostring.d): RuntimeString run 1 (loopcount 10000000) Method 1 : 76 ms Method 2 : 153 ms
I think the above result is deceptive, and the test isn't very useful. The RuntimeString toString isn't a very interesting data point -- it's simply a single string. Not many cases are like that. Most types have multiple members, and it's the need to *construct* a string from that data which is usually the issue. But I would caution, the whole point of my query was about data on the platforms of which Manu speaks. That is, platforms that have issues dealing with virtual calls. x86 doesn't seem to be one of them. -Steve
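To make the multiple-members point concrete, here is a hedged sketch of a hypothetical two-member type under Method 1 and Method 2 (this is user-level code, so unlike druntime it may use std.format):

```d
struct Point
{
    int x, y;

    // Method 1: each ~ may allocate, and the intermediates become garbage.
    string toString() const
    {
        import std.conv : to;
        return "(" ~ x.to!string ~ ", " ~ y.to!string ~ ")";
    }

    // Method 2: the same output with no GC allocation inside toString;
    // formattedWrite accepts a sink delegate as an output range.
    void toString(void delegate(const(char)[]) sink) const
    {
        import std.format : formattedWrite;
        sink("(");
        formattedWrite(sink, "%d", x);
        sink(", ");
        formattedWrite(sink, "%d", y);
        sink(")");
    }
}

void main()
{
    auto p = Point(3, 4);
    assert(p.toString() == "(3, 4)");

    char[] text;
    p.toString((const(char)[] s) { text ~= s; });
    assert(text == "(3, 4)");
}
```

It's the Method 1 version that generates several short-lived allocations per call, which is what the ArrayOfStrings numbers above reflect.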
Oct 30 2014
next sibling parent "Jonathan Marler" <johnnymarler gmail.com> writes:
On Thursday, 30 October 2014 at 20:15:36 UTC, Steven 
Schveighoffer wrote:
 On 10/30/14 2:53 PM, Jonathan Marler wrote:
 Before we start ripping apart our existing APIs, can we show 
 that the
 performance is really going to be so bad? I know virtual 
 calls have a
 bad reputation, but I hate to make these choices absent real 
 data.

 For instance, D's underlying i/o system uses FILE *, which is 
 about as
 virtual as you can get. So are you avoiding a virtual call to 
 use a
 buffer to then pass to a virtual call later?
I think its debatable how useful this information would be but I've written a small D Program to try to explore the different performance statistics for various methods. I've uploaded the code to my server, feel free to download/modify/use. Here's the various methods I've tested. /** Method 1: ReturnString string toString(); Method 2: SinkDelegate void toString(void delegate(const(char)[]) sink); Method 3: SinkDelegateWithStaticHelperBuffer struct SinkStatic { char[64] buffer; void delegate(const(char)[]) sink; } void toString(ref SinkStatic sink); Method 4: SinkDelegateWithDynamicHelperBuffer struct SinkDynamic { char[] buffer; void delegate(const(char)[]) sink; } void toString(ref SinkDynamic sink); void toString(SinkDynamic sink); */ Dmd/No Optimization (dmd dtostring.d): RuntimeString run 1 (loopcount 10000000) Method 1 : 76 ms Method 2 : 153 ms
I think the above result is deceptive, and the test isn't very useful. The RuntimeString toString isn't a very interesting data point -- it's simply a single string. Not many cases are like that. Most types have multiple members, and it's the need to *construct* a string from that data which is usually the issue. But I would caution, the whole point of my query was about data on the platforms of which Manu speaks. That is, platforms that have issues dealing with virtual calls. x86 doesn't seem to be one of them. -Steve
Like I said, "I think it's debatable how useful this information would be". Yes, you are correct that RuntimeString isn't very interesting; it just establishes a baseline for the "simplest" possible case. In the real world this would almost never happen. I provided the code to encourage others to modify it and maybe post some more interesting cases than the ones I already provided. I wanted to get some "real world" examples of how each API would change the code. I would love to see someone improve on my implementation of ArrayOfStrings. I wrote this code in a couple of hours, so I'm sure there's a lot of room for improvement.
Oct 30 2014
prev sibling next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Thursday, 30 October 2014 at 20:15:36 UTC, Steven 
Schveighoffer wrote:
 I think the above result is deceptive, and the test isn't very 
 useful. The RuntimeString toString isn't a very interesting 
 data point -- it's simply a single string. Not many cases are 
 like that. Most types have multiple members, and it's the need 
 to *construct* a string from that data which is usually the 
 issue.

 But I would caution, the whole point of my query was about data 
 on the platforms of which Manu speaks. That is, platforms that 
 have issues dealing with virtual calls. x86 doesn't seem to be 
 one of them.
OTOH, ArrayOfStrings shows that allocating is worse by several orders of magnitude. This will not change on any architecture. And the simple sink variant is still faster than the rest by almost an order of magnitude; this, too, is unlikely to be much different on those architectures.
Oct 31 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/31/14 4:40 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 On Thursday, 30 October 2014 at 20:15:36 UTC, Steven Schveighoffer wrote:
 I think the above result is deceptive, and the test isn't very useful.
 The RuntimeString toString isn't a very interesting data point -- it's
 simply a single string. Not many cases are like that. Most types have
 multiple members, and it's the need to *construct* a string from that
 data which is usually the issue.

 But I would caution, the whole point of my query was about data on the
 platforms of which Manu speaks. That is, platforms that have issues
 dealing with virtual calls. x86 doesn't seem to be one of them.
OTOH, ArrayOfStrings shows that allocating is worse by several orders of magnitudes. This will not change on any architecture. And the simple sink variant is still faster than the rest by almost an order of magnitude, this may also be unlikely to be much different on these architectures.
I find it hard to believe that the delegate version is going to be slower than the memory version on any architecture also. But I must defer to Manu as the expert in those architectures. This is why I asked for a test. The presented data is useful, but not for the purpose of my query. I need to know whether it performs badly on those platforms, not how it performs on x86. We should be willing to entertain other proposals for how toString should work, but not if it's proven that what we have will suffice. It should be possible to perform such a test without D support. In any case, I think there is still room for improvement inside the implementations of toString, as has been mentioned. -Steve
Oct 31 2014
parent reply "Jonathan Marler" <johnnymarler gmail.com> writes:
On Friday, 31 October 2014 at 11:42:05 UTC, Steven Schveighoffer 
wrote:
 On 10/31/14 4:40 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 On Thursday, 30 October 2014 at 20:15:36 UTC, Steven 
 Schveighoffer wrote:
 I think the above result is deceptive, and the test isn't 
 very useful.
 The RuntimeString toString isn't a very interesting data 
 point -- it's
 simply a single string. Not many cases are like that. Most 
 types have
 multiple members, and it's the need to *construct* a string 
 from that
 data which is usually the issue.

 But I would caution, the whole point of my query was about 
 data on the
 platforms of which Manu speaks. That is, platforms that have 
 issues
 dealing with virtual calls. x86 doesn't seem to be one of 
 them.
OTOH, ArrayOfStrings shows that allocating is worse by several orders of magnitudes. This will not change on any architecture. And the simple sink variant is still faster than the rest by almost an order of magnitude, this may also be unlikely to be much different on these architectures.
I find it hard to believe that the delegate version is going to be slower than the memory version on any architecture also. But I must defer to Manu as the expert in those architectures. This is why I asked for a test. The presented data is useful, but not for the purpose of my query. I need to know if it performs bad on these platforms, not how it performs on x86. We should be willing to entertain other proposals for how toString should work, but not if it's proven that what we have will suffice. It should be possible to perform such a test without D support. In any case, I think there is still room for improvement inside the implementations of toString as has been mentioned. -Steve
I wrote a Windows CE app to run on our printers here at HP to test what the Microsoft ARM compiler does with virtual function calls. I had to do an operation on a global volatile variable to prevent the compiler from inlining the non-virtual function call, but I finally got it to work. Calling the function 100 million times yielded the following times:

Windows Compiler on ARM (Release)
-------------------------------------------
NonVirtual: 0.537000 seconds
Virtual   : 1.281000 seconds

Windows Compiler on x86 (Release)
-------------------------------------------
NonVirtual: 0.226000 seconds
Virtual   : 0.226000 seconds

Windows Compiler on x86 (Debug)
-------------------------------------------
NonVirtual: 2.940000 seconds
Virtual   : 3.204000 seconds

Here's the link to the code: http://marler.info/virtualtest.c
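A rough D equivalent of this microbenchmark could be sketched as below (not the C program linked above; absolute numbers will differ by compiler and target, and a sufficiently smart optimizer may devirtualize the call, so treat it only as a template):

```d
import core.stdc.stdio : printf;
import std.datetime.stopwatch : StopWatch;

class C
{
    int x;
    void virt() { ++x; }          // dispatched through the vtable
    final void nonVirt() { ++x; } // direct (non-virtual) call
}

void main()
{
    auto c = new C;
    enum N = 100_000_000;

    StopWatch sw;
    sw.start();
    foreach (i; 0 .. N) c.nonVirt();
    sw.stop();
    printf("NonVirtual: %lld ms\n", sw.peek.total!"msecs");

    sw.reset();
    sw.start();
    foreach (i; 0 .. N) c.virt();
    sw.stop();
    printf("Virtual   : %lld ms\n", sw.peek.total!"msecs");

    assert(c.x == 2 * N);  // keep the calls observable, like the volatile trick
}
```

The accumulating member plays the same role as the global volatile variable in the C version: it prevents the loop body from being optimized away.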
Oct 31 2014
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/31/14 4:50 PM, Jonathan Marler wrote:

 I wrote a Windows CE app to run on our printers here at HP to test what
 the Microsoft ARM compiler does with virtual function calls.  I had to
 do an operation with a global volatile variable to prevent the compiler
 from inlining the non-virtual function call but I finally got it to work.

 Calling the function 100 million times yielded the following times:

 Windows Compiler on ARM (Release)
 -------------------------------------------
 NonVirtual: 0.537000 seconds
 Virtual   : 1.281000 seconds

 Windows Compiler on x86 (Release)
 -------------------------------------------
 NonVirtual: 0.226000 seconds
 Virtual   : 0.226000 seconds

 Windows Compiler on x86 (Debug)
 -------------------------------------------
 NonVirtual: 2.940000 seconds
 Virtual   : 3.204000 seconds


 Here's the link to the code:

 http://marler.info/virtualtest.c
Thanks, this is helpful. -Steve
Nov 03 2014
prev sibling parent Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 31 October 2014 06:15, Steven Schveighoffer via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 10/30/14 2:53 PM, Jonathan Marler wrote:
 Before we start ripping apart our existing APIs, can we show that the
 performance is really going to be so bad? I know virtual calls have a
 bad reputation, but I hate to make these choices absent real data.

 For instance, D's underlying i/o system uses FILE *, which is about as
 virtual as you can get. So are you avoiding a virtual call to use a
 buffer to then pass to a virtual call later?
I think its debatable how useful this information would be but I've written a small D Program to try to explore the different performance statistics for various methods. I've uploaded the code to my server, feel free to download/modify/use. Here's the various methods I've tested. /** Method 1: ReturnString string toString(); Method 2: SinkDelegate void toString(void delegate(const(char)[]) sink); Method 3: SinkDelegateWithStaticHelperBuffer struct SinkStatic { char[64] buffer; void delegate(const(char)[]) sink; } void toString(ref SinkStatic sink); Method 4: SinkDelegateWithDynamicHelperBuffer struct SinkDynamic { char[] buffer; void delegate(const(char)[]) sink; } void toString(ref SinkDynamic sink); void toString(SinkDynamic sink); */ Dmd/No Optimization (dmd dtostring.d): RuntimeString run 1 (loopcount 10000000) Method 1 : 76 ms Method 2 : 153 ms
I think the above result is deceptive, and the test isn't very useful. The RuntimeString toString isn't a very interesting data point -- it's simply a single string. Not many cases are like that. Most types have multiple members, and it's the need to *construct* a string from that data which is usually the issue. But I would caution, the whole point of my query was about data on the platforms of which Manu speaks. That is, platforms that have issues dealing with virtual calls. x86 doesn't seem to be one of them. -Steve
I want to get back to this (and other topics), but I'm still about 30 posts behind, and I have to go... I really can't keep up with this forum these days :/ I'll be back on this topic soon...
Nov 01 2014
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/30/2014 8:30 AM, Steven Schveighoffer wrote:
 This is a typical mechanism that Tango used -- pass in a ref to a dynamic array
 referencing a stack buffer. If it needed to grow, just update the length, and
it
 moves to the heap. In most cases, the stack buffer is enough. But the idea is
to
 try and minimize the GC allocations, which are performance killers on the
 current platforms.
We keep solving the same problem over and over. std.internal.scopebuffer does this handily. It's what it was designed for - it works, it's fast, and it virtually eliminates the need for heap allocations. Best of all, it's an Output Range, meaning it fits in with the range design of Phobos.
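For reference, typical ScopeBuffer usage looks roughly like this (sketched from memory of its documented shape; being std.internal, the module and its details may change):

```d
import std.internal.scopebuffer;

void main()
{
    char[16] tmp = void;
    auto buf = ScopeBuffer!char(tmp); // starts out on the stack
    scope(exit) buf.free();           // releases any malloc'd overflow memory

    buf.put("len = ");                // output-range interface
    buf.put('4');
    buf.put('2');
    assert(buf[] == "len = 42");
}
```

Because it satisfies the output-range interface, it can be handed to any Phobos routine that writes to a range.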
Nov 01 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Saturday, 1 November 2014 at 07:02:03 UTC, Walter Bright wrote:
 On 10/30/2014 8:30 AM, Steven Schveighoffer wrote:
 This is a typical mechanism that Tango used -- pass in a ref 
 to a dynamic array
 referencing a stack buffer. If it needed to grow, just update 
 the length, and it
 moves to the heap. In most cases, the stack buffer is enough. 
 But the idea is to
 try and minimize the GC allocations, which are performance 
 killers on the
 current platforms.
We keep solving the same problem over and over. std.internal.scopebuffer does this handily. It's what it was designed for - it works, it's fast, and it virtually eliminates the need for heap allocations. Best of all, it's an Output Range, meaning it fits in with the range design of Phobos.
It is not the same thing as a ref/out buffer argument. We have been running ping-pong comments about this several times now. All std.internal.scopebuffer does is reduce the heap allocation count at the cost of stack consumption (and switching to raw malloc for the heap); it does not change the big-O estimate of heap allocations unless it is used as a buffer argument, at which point it is no better than a plain array.
Nov 01 2014
next sibling parent "Jonathan Marler" <johnnymarler gmail.com> writes:
On Saturday, 1 November 2014 at 12:31:15 UTC, Dicebot wrote:
 On Saturday, 1 November 2014 at 07:02:03 UTC, Walter Bright 
 wrote:
 On 10/30/2014 8:30 AM, Steven Schveighoffer wrote:
 This is a typical mechanism that Tango used -- pass in a ref 
 to a dynamic array
 referencing a stack buffer. If it needed to grow, just update 
 the length, and it
 moves to the heap. In most cases, the stack buffer is enough. 
 But the idea is to
 try and minimize the GC allocations, which are performance 
 killers on the
 current platforms.
We keep solving the same problem over and over. std.internal.scopebuffer does this handily. It's what it was designed for - it works, it's fast, and it virtually eliminates the need for heap allocations. Best of all, it's an Output Range, meaning it fits in with the range design of Phobos.
It is not the same thing as ref/out buffer argument. We have been running ping-pong comments about it for a several times now. All std.internal.scopebuffer does is reducing heap allocation count at cost of stack consumption (and switching to raw malloc for heap) - it does not change big-O estimate of heap allocations unless it is used as a buffer argument - at which point it is no better than plain array.
Sorry if this is a stupid question, but what exactly is being discussed here? Are we talking about passing a scope buffer to toString, or about the implementation of toString allocating its own scope buffer? An API change, implementation notes, or something else? Thanks.
Nov 01 2014
prev sibling next sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On Saturday, 1 November 2014 at 12:31:15 UTC, Dicebot wrote:
 It is not the same thing as ref/out buffer argument. We have 
 been running ping-pong comments about it for a several times 
 now. All std.internal.scopebuffer does is reducing heap 
 allocation count at cost of stack consumption (and switching to 
 raw malloc for heap) - it does not change big-O estimate of 
 heap allocations unless it is used as a buffer argument - at 
 which point it is no better than plain array.
Agreed. Its API also has severe safety problems. David
Nov 01 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/1/2014 10:04 AM, David Nadlinger wrote:
 Agreed. Its API also has severe safety problems.
Way overblown. And it's nothing that druntime developers cannot handle easily. druntime is full of system programming.
Nov 01 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/1/2014 5:31 AM, Dicebot wrote:
 On Saturday, 1 November 2014 at 07:02:03 UTC, Walter Bright wrote:
 On 10/30/2014 8:30 AM, Steven Schveighoffer wrote:
 This is a typical mechanism that Tango used -- pass in a ref to a dynamic array
 referencing a stack buffer. If it needed to grow, just update the length, and
it
 moves to the heap. In most cases, the stack buffer is enough. But the idea is
to
 try and minimize the GC allocations, which are performance killers on the
 current platforms.
We keep solving the same problem over and over. std.internal.scopebuffer does this handily. It's what it was designed for - it works, it's fast, and it virtually eliminates the need for heap allocations. Best of all, it's an Output Range, meaning it fits in with the range design of Phobos.
It is not the same thing as ref/out buffer argument.
Don't understand your comment.
 We have been running
 ping-pong comments about it for a several times now. All
 std.internal.scopebuffer does is reducing heap allocation count at cost of
stack
 consumption (and switching to raw malloc for heap) - it does not change big-O
 estimate of heap allocations unless it is used as a buffer argument - at which
 point it is no better than plain array.
1. Stack allocation is not much of an issue here because these routines in druntime are not recursive, and there's plenty of stack available for what druntime is using toString for.

2. "All it does" is an inadequate summary. Most of the time it completely eliminates the need for allocations. This is a BIG deal, and the source of much of the speed of Warp. The cases where it does fall back to heap allocation do not leave GC memory around.

3. There is big-O in the worst case, which is very unlikely, and big-O in 99% of the cases, which for scopebuffer is O(1).

4. You're discounting the API of scopebuffer, which is usable as an output range, fitting it right in with Phobos' pipeline design.

Furthermore, again, I know it works and is fast.
Nov 01 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Saturday, 1 November 2014 at 17:50:33 UTC, Walter Bright wrote:
 It is not the same thing as ref/out buffer argument.
Don't understand your comment.
Steven's comment mentioned two things about the Tango approach: using a stack buffer as the initial buffer, and extensive use of ref parameters for such arguments. std.internal.scopebuffer on its own only addresses the former.
 We have been running
 ping-pong comments about it for a several times now. All
 std.internal.scopebuffer does is reducing heap allocation 
 count at cost of stack
 consumption (and switching to raw malloc for heap) - it does 
 not change big-O
 estimate of heap allocations unless it is used as a buffer 
 argument - at which
 point it is no better than plain array.
1. stack allocation is not much of an issue here because these routines in druntime are not recursive, and there's plenty of stack available for what druntime is using toString for.
Fibers. Ironically, one of the more problematic parts of Tango for us is the Layout module, which uses a somewhat big stack buffer for formatting arguments into a string. It is a very common cause of fiber stack overflows (and thus segfaults) unless the default size is changed. With 64-bit systems and lazy stack page allocation it is no longer a critical problem, but it remains an important one, because any growth of the effective fiber stack prevents one from using fibers as a truly cheap context abstraction. A persistent thread-local heap buffer that gets reused by many bound fibers can be much better in that regard.
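The persistent thread-local buffer described here could be sketched as follows (hypothetical names; note it assumes a fiber does not yield while the buffer is borrowed, since all fibers bound to the thread share it):

```d
// Sketch: one reusable formatting buffer per thread. Every fiber bound to
// this thread borrows it instead of carrying a large buffer on its own
// stack, so fiber stacks stay small. Module-level variables are
// thread-local by default in D.
char[] tlsFormatBuffer;

char[] acquireFormatBuffer(size_t needed)
{
    if (tlsFormatBuffer.length < needed)
        tlsFormatBuffer.length = needed;  // grows rarely, then persists
    return tlsFormatBuffer[0 .. needed];
}

void main()
{
    auto a = acquireFormatBuffer(128);
    a[0 .. 5] = "hello";
    auto b = acquireFormatBuffer(64);  // reuses the same storage
    assert(b.ptr is a.ptr);
    assert(b[0 .. 5] == "hello");
}
```

The allocation happens once per thread (per high-water mark) rather than once per fiber or per call.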
 2. "All it does" is an inadequate summary. Most of the time it 
 completely eliminates the need for allocations. This is a BIG 
 deal, and the source of much of the speed of Warp. The cases 
 where it does fall back to heap allocation do not leave GC 
 memory around.
We have very different applications in mind. Warp is a "normal" single-user application, and I am speaking about servers. Those have very different performance profiles and requirements; successful experience in one domain is simply irrelevant to the other.
 3. There is big-O in worst case, which is very unlikely, and 
 big-O in 99% of the cases, which for scopebuffer is O(1).
And with servers you need to always consider the worst case as the default, or risk a DoS attack. I am not even sure where the 99% comes from, because input length is usually defined by the application domain, and the stack usage of scopebuffer is set inside library code.
 4. You're discounting the API of scopebuffer which is usable as 
 an output range, fitting it right in with Phobos' pipeline 
 design.
Any array is an output range :)
Nov 02 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/2/2014 3:57 PM, Dicebot wrote:
 On Saturday, 1 November 2014 at 17:50:33 UTC, Walter Bright wrote:
 It is not the same thing as ref/out buffer argument.
Don't understand your comment.
Steven comment has mentioned two things about Tango approach - using stack buffer as initial buffer and extensive usage of ref parameters for such arguments. std.internal.scopebuffer on its own only addresses the former.
I still have no idea how this would apply here.
 We have been running
 ping-pong comments about it for a several times now. All
 std.internal.scopebuffer does is reducing heap allocation count at cost of
stack
 consumption (and switching to raw malloc for heap) - it does not change big-O
 estimate of heap allocations unless it is used as a buffer argument - at which
 point it is no better than plain array.
1. stack allocation is not much of an issue here because these routines in druntime are not recursive, and there's plenty of stack available for what druntime is using toString for.
Fibers. Ironically one more problematic parts of Tango for us is the Layout one which uses somewhat big stack buffer for formatting the arguments into string. It is a very common reason for fiber stack overflows (and thus segfaults) unless default size is changed. With 64-bit systems and lazy stack page allocation it is not a critical problem anymore but it is still important problem because any growth of effective fiber stack prevents one from using it as truly cheap context abstraction. Persistent thread-local heap buffer that gets reused by many bound fibers can be much better in that regard.
There is no problem with having the max stack allocation for scopebuffer use set smaller for 32 bit code than 64 bit code.
 2. "All it does" is an inadequate summary. Most of the time it completely
 eliminates the need for allocations. This is a BIG deal, and the source of
 much of the speed of Warp. The cases where it does fall back to heap
 allocation do not leave GC memory around.
We have a very different applications in mind. Warp is "normal" single user application and I am speaking about servers. Those have very different performance profiles and requirements, successful experience in one domain is simply irrelevant to other.
What part of druntime would be a special case for servers?
 3. There is big-O in worst case, which is very unlikely, and big-O in 99% of
 the cases, which for scopebuffer is O(1).
And with servers you need to always consider worst case as default or meet the DoS attack.
Minimizing allocations is about dealing with the most common cases.
 I am not even sure where 99% comes from because input length is
 usually defined by application domain and stack usage of scopebuffer is set
 inside library code.
It's not that hard.
 4. You're discounting the API of scopebuffer which is usable as an output
 range, fitting it right in with Phobos' pipeline design.
Any array is output range :)
The point is what to do when the array gets full.
Nov 02 2014
prev sibling parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 31 October 2014 01:30, Steven Schveighoffer via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 10/28/14 7:06 PM, Manu via Digitalmars-d wrote:
 On 28 October 2014 22:51, Steven Schveighoffer via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
 On 28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 Am 27.10.2014 11:07, schrieb Daniel Murphy:

 "Benjamin Thaut"  wrote in message
 news:m2kt16$2566$1 digitalmars.com...
 I'm planning on doing a pull request for druntime which rewrites
 every
 toString function within druntime to use the new sink signature. That
 way druntime would cause a lot less allocations which end up beeing
 garbage right away. Are there any objections against doing so? Any
 reasons why such a pull request would not get accepted?
How ugly is it going to be, since druntime can't use std.format?
They wouldn't get any uglier than they already are, because the current toString functions within druntime also can't use std.format. An example would be the toString function of TypeInfo_StaticArray:

override string toString() const
{
    SizeStringBuff tmpBuff = void;
    return value.toString() ~ "[" ~ cast(string)len.sizeToTempString(tmpBuff) ~ "]";
}

Would be replaced by:

override void toString(void delegate(const(char)[]) sink) const
{
    SizeStringBuff tmpBuff = void;
    value.toString(sink);
    sink("[");
    sink(cast(string)len.sizeToTempString(tmpBuff));
    sink("]");
}

The advantage would be that the new version ideally never allocates, while the old version allocated 3 times, of which 2 allocations end up being garbage right away. Also, I remember reading that the long-term goal is to convert all toString functions to the sink version.

Kind Regards
Benjamin Thaut
The thing that really worries me about this sink API is that your code here produces (at least) 4 calls to a delegate. That's a lot of indirect function calling, which can be a severe performance hazard on some systems. We're trading garbage for low-level performance hazards, which may imply a reduction in portability.
I think given the circumstances, we are better off. But when we find a platform that does perform worse, we can try and implement alternatives. I don't want to destroy performance on the platforms we *do* support, for the worry that some future platform isn't as friendly to this method.
Video games consoles are very real, and very now. I suspect they may even represent the largest body of native code in the world today.
Sorry, I meant future *D supported* platforms, not future not-yet-existing platforms.
I'm not sure what you mean. I've used D on current and existing games consoles. I personally think it's one of D's most promising markets... if not for just a couple of remaining details. Also, my suggestion will certainly perform better on all platforms. No platform benefits from an indirect function call per write compared with an approach that avoids it.
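For what it's worth, the per-fragment delegate cost being debated is easy to observe with a counting sink. A sketch; `StaticArrayInfo` is a hypothetical stand-in for the TypeInfo_StaticArray example quoted earlier:

```d
// Counting how often the sink is invoked for a toString shaped like the
// TypeInfo_StaticArray example (type and member names are hypothetical).
struct StaticArrayInfo
{
    string valueName = "uint";
    size_t len = 16;

    void toString(void delegate(const(char)[]) sink) const
    {
        char[20] tmp = void;                 // format len without allocating
        size_t i = tmp.length, n = len;
        do { tmp[--i] = cast(char)('0' + n % 10); n /= 10; } while (n);

        sink(valueName);    // indirect call 1
        sink("[");          // indirect call 2
        sink(tmp[i .. $]);  // indirect call 3
        sink("]");          // indirect call 4
    }
}

void main()
{
    int calls;
    char[] text;
    StaticArrayInfo info;
    info.toString((const(char)[] s) { ++calls; text ~= s; });
    assert(text == "uint[16]");
    assert(calls == 4);  // one indirect call per output fragment
}
```

Whether four indirect calls per object matters is exactly the platform-dependent question raised in this thread.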
 I don't know if 'alternatives' is the right phrase, since this
 approach isn't implemented yet, and I wonder if a slightly different
 API strategy exists which may not exhibit this problem.
Well, the API already exists and is supported. The idea is to migrate the existing toString calls to the new API.
Really? Bummer... I haven't seen this API anywhere yet. Seems a shame to make such a mistake with a brand new API. Too many competing API patterns :/
 But an aggregate which relies on members to output themselves is going to
 have a tough time following this model. Only at the lowest levels can we
 enforce such a rule.
I understand this, which is the main reason I suggest to explore something other than a delegate based interface.
Before we start ripping apart our existing APIs, can we show that the performance is really going to be so bad? I know virtual calls have a bad reputation, but I hate to make these choices absent real data.
My career for a decade always seems to find its way back to fighting virtual calls. (in proprietary codebases, so I can't easily present case studies) But it's too late now I guess. I should have gotten in when someone came up with the idea... I thought it was new.
 For instance, D's underlying i/o system uses FILE *, which is about as
 virtual as you can get. So are you avoiding a virtual call to use a buffer
 to then pass to a virtual call later?
I do a lot of string processing, but it never finds its way to a FILE*. I don't write console based software.
 Another thing to think about is that the inliner can potentially get rid
 of
 the cost of delegate calls.
druntime is a binary lib. The inliner has no effect on this equation.
It depends on the delegate and the item being output, whether the source is available to the compiler, and whether or not it's a virtual function. True, some cases will not be inlinable. But the "tweaks" we implement for platform X which does not do well with delegate calls, could be to make this more available.
I suspect the cases where the inliner can do something useful would be in quite a significant minority (with respect to phobos and druntime in particular). I haven't tried it, but I have a lifetime of disassembling code of this sort, and I'm very familiar with the optimisation patterns.
 Ideally, I guess I'd prefer to see an overload which receives a slice
 to write to instead and do away with the delegate call. Particularly
 in druntime, where API and potential platform portability decisions
 should be *super*conservative.
This puts the burden on the caller to ensure enough space is allocated. Or you have to reenter the function to finish up the output. Neither of these seems like an acceptable drawback.
Well, that's why I'm open for discussion. I'm sure there's room for creativity here. It doesn't seem that unreasonable to reenter the function to me actually; I'd prefer a second static call in the rare event that a buffer wasn't big enough, to many indirect calls in every single case.
A reentrant function has to track the state of what has been output, which is horrific in my opinion.
How so? It doesn't seem that bad to me. We're talking about druntime here, the single most used library in the whole ecosystem... that shit should be tuned to the max. It doesn't matter how pretty the code is.
 There's no way that reentry would be slower. It may be more
 inconvenient, but I wonder if some API creativity could address
 that...?
The largest problem I see is, you may not know before you start generating strings whether it will fit in the buffer, and therefore, you may still end up eventually calling the sink.
Right. The API should be structured to make a virtual call _only_ in the rare instance the buffer overflows. That is my suggestion. You can be certain to supply a buffer that will not overflow in many/most cases.
 Note, you can always allocate a stack buffer, use an inner function as a
 delegate, and get the inliner to remove the indirect calls. Or use an
 alternative private mechanism to build the data.
We're talking about druntime specifically. It is a binary lib. The inliner won't save you.
 Would you say that *one* delegate call per object output is OK?
I would say that an uncontrollable virtual call is NEVER okay, especially in otherwise trivial and such core functions like toString in druntime. But one is certainly better than many. Remember I was arguing for final-by-default for years (because it's really important)... and I'm still extremely bitter about that outcome.
 What would you propose for such a mechanism? Maybe I'm not thinking of
 your
 ideal API.
I haven't thought of one I'm really happy with. I can imagine some 'foolproof' solution at the API level which may accept some sort of growable string object (which may represent a stack allocation by default). This could lead to a virtual call if the buffer needs to grow, but that's not really any worse than a delegate call, and it's only in the rare case of overflow, rather than many calls in all cases.
This is a typical mechanism that Tango used -- pass in a ref to a dynamic array referencing a stack buffer. If it needs to grow, just update the length, and it moves to the heap. In most cases, the stack buffer is enough. But the idea is to try and minimize the GC allocations, which are performance killers on the current platforms.
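A minimal sketch of that Tango-style pattern (hypothetical helper name; not actual Tango or druntime code): the caller hands in a stack buffer, and allocation happens only on overflow:

```d
// Concatenate parts into buf; fall back to a single heap allocation
// only when the supplied buffer is too small.
char[] concatInto(char[] buf, const(char)[][] parts...)
{
    size_t needed = 0;
    foreach (p; parts)
        needed += p.length;
    if (needed > buf.length)
        buf = new char[needed]; // overflow path: allocate once
    size_t pos = 0;
    foreach (p; parts)
    {
        buf[pos .. pos + p.length] = p[];
        pos += p.length;
    }
    return buf[0 .. needed];
}

void main()
{
    char[32] stackBuf = void;
    auto s = concatInto(stackBuf[], "uint", "[", "16", "]");
    assert(s == "uint[16]");
    assert(s.ptr is stackBuf.ptr); // fit on the stack: no heap allocation
}
```

The common case never touches the heap; only oversized output pays for an allocation.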
I wouldn't hard-code to overflow to the GC heap specifically. It should be an API that the user may overflow to wherever they like.
 I think adding the option of using a delegate is not limiting -- you can
 always, on a platform that needs it, implement an alternative protocol that
 is internal to druntime. We are not preventing such protocols by adding the
 delegate version.
You're saying that some platform may need to implement a further, completely different API? Then no existing code will compile for that platform. This is madness. We already have more than enough APIs.
 But on our currently supported platforms, the delegate vs. GC call is soo
 much better. I can't see any reason to avoid the latter.
The latter? (the GC?) .. Sorry, I'm confused.
Nov 01 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/1/14 9:30 AM, Manu via Digitalmars-d wrote:
 On 31 October 2014 01:30, Steven Schveighoffer via Digitalmars-d
 Sorry, I meant future *D supported* platforms, not future not-yet-existing
 platforms.
I'm not sure what you mean. I've used D on current and existing games consoles. I personally think it's one of D's most promising markets... if not for just a couple of remaining details.
I don't think D officially supports these platforms. I could be wrong.
 Also, my suggestion will certainly perform better on all platforms.
 There is no platform that can benefit from the existing proposal of an
 indirect function call per write vs something that doesn't.
Performance isn't the only consideration. In your case, it has a higher priority than ease of implementation, flexibility, or usability. But that's not the case everywhere. Consider the flip-side: on x86, your mechanism may be a hair faster than just having a delegate. Is it worth all the extra trouble for those folks to have to save some state or deal with reallocating buffers in their toString functions?
 Before we start ripping apart our existing APIs, can we show that the
 performance is really going to be so bad? I know virtual calls have a bad
 reputation, but I hate to make these choices absent real data.
My career for a decade always seems to find its way back to fighting virtual calls. (in proprietary codebases, so I can't easily present case studies) But it's too late now I guess. I should have gotten in when someone came up with the idea... I thought it was new.
At the moment, you are stuck with most toString calls allocating on the GC every time they are called. I think the virtual call thing should be a pleasant improvement :) But in all seriousness, I am not opposed to an alternative API, but the delegate one seems to find the right balance of flexibility and ease of implementation. I think we can use any number of toString APIs, and in fact, we should be able to build on top of the delegate version a mechanism to reduce (but not eliminate obviously) virtual calls.
 For instance, D's underlying i/o system uses FILE *, which is about as
 virtual as you can get. So are you avoiding a virtual call to use a buffer
 to then pass to a virtual call later?
I do a lot of string processing, but it never finds it's way to a FILE*. I don't write console based software.
Just an example. Point taken.
 A reentrant function has to track the state of what has been output, which
 is horrific in my opinion.
How so? It doesn't seem that bad to me. We're talking about druntime here, the single most used library in the whole ecosystem... that shit should be tuned to the max. It doesn't matter how pretty the code is.
Keep in mind that any API addition is something that all users have to deal with. If we are talking about a specialized, tuned API that druntime and phobos can use, I don't think it would be impossible to include this. But to say we only support horrible allocate-every-toString-call mechanism, and please-keep-your-own-state-machine mechanism is not good. The main benefit of the delegate approach is that it's easy to understand, easy to use, and reasonably efficient. It's a good middle ground. It's also easy to implement a sink. Both sides are easy, it makes the whole thing more approachable.
 The largest problem I see is, you may not know before you start generating
 strings whether it will fit in the buffer, and therefore, you may still end
 up eventually calling the sink.
Right. The API should be structured to make a virtual call _only_ in the rare instance the buffer overflows. That is my suggestion. You can be certain to supply a buffer that will not overflow in many/most cases.
I, and I'm sure most of the developers, are open to new ideas to make something like this as painless as possible. I still think we should keep the delegate mechanism.
 Note, you can always allocate a stack buffer, use an inner function as a
 delegate, and get the inliner to remove the indirect calls. Or use an
 alternative private mechanism to build the data.
We're talking about druntime specifically. It is a binary lib. The inliner won't save you.
Let's define the situation here -- there is a boundary in druntime across which no inlining can occur. Before the boundary or after the boundary, inlining is fair game. So for instance, if a druntime object has 3 members it needs to toString in order to satisfy its own toString, those members will probably all be druntime objects as well. In which case it can optimize those sub-calls. And let's also not forget that druntime has template objects in it as well, which are ripe for inlining. This is what I meant.
 Would you say that *one* delegate call per object output is OK?
I would say that an uncontrollable virtual call is NEVER okay, especially in otherwise trivial and such core functions like toString in druntime. But one is certainly better than many.
I'm trying to get a feel for how painful this has to be :) If we can have one virtual call, it means you can build a mechanism that works with delegates + a more efficient one, by just sinking the result of the efficient one. This means you can work with the existing APIs right now.
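For instance, a buffer-based implementation could be "sunk" through the delegate API with exactly one indirect call (a sketch with invented names):

```d
// Build the whole representation in a scratch buffer, then hand the
// result to the caller's delegate in a single indirect call.
void toStringViaBuffer(scope void delegate(const(char)[]) sink)
{
    static immutable parts = ["uint", "[", "16", "]"];
    char[64] tmp = void;
    size_t pos = 0;
    foreach (part; parts)
    {
        tmp[pos .. pos + part.length] = part[];
        pos += part.length;
    }
    sink(tmp[0 .. pos]); // exactly one indirect call per object
}

void main()
{
    string result;
    toStringViaBuffer((s) { result = s.idup; });
    assert(result == "uint[16]");
}
```

An efficient internal mechanism can coexist with the delegate API this way: the sink sees one call carrying the finished string.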
 This is a typical mechanism that Tango used -- pass in a ref to a dynamic
 array referencing a stack buffer. If it needed to grow, just update the
 length, and it moves to the heap. In most cases, the stack buffer is enough.
 But the idea is to try and minimize the GC allocations, which are
 performance killers on the current platforms.
I wouldn't hard-code to overflow to the GC heap specifically. It should be an API that the user may overflow to wherever they like.
Just keep in mind the clients of this API are on 3 sides: 1. Those who implement the toString call. 2. Those who implement a place for those toString calls to go. 3. Those who wish to put the 2 together. We want to reduce the burden as much as possible on all of them. We also don't want to require implementing ALL these different toString mechanisms -- I should be able to implement one of them, and all can use it.
 I think adding the option of using a delegate is not limiting -- you can
 always, on a platform that needs it, implement an alternative protocol that
 is internal to druntime. We are not preventing such protocols by adding the
 delegate version.
You're saying that some platform may need to implement a further completely different API? Then no existing code will compile for that platform. This is madness. We already have more than enough API's.
You just said you don't use FILE *. Why do we have to ensure all pieces of Phobos implement everything you desire when you aren't going to use it? I don't think it's madness to *provide* a mechanism for more efficient (on some platforms) code, and then ask those who are interested to use that mechanism, while not forcing it on all those who aren't. You can always submit a pull request to add it where you need it! But having an agreed upon API is an important first step. So let's get that done.
 But on our currently supported platforms, the delegate vs. GC call is soo
 much better. I can't see any reason to avoid the latter.
The latter? (the GC?) .. Sorry, I'm confused.
My bad, I meant the former. No wonder you were confused ;) -Steve
Nov 03 2014
parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Monday, 3 November 2014 at 15:42:57 UTC, Steven Schveighoffer 
wrote:
 At the moment, you are stuck with most toString calls 
 allocating on the GC every time they are called. I think the 
 virtual call thing should be a pleasant improvement :)
Note that delegate calls aren't virtual calls, but indirect calls. The former need 2 memory accesses, the latter none (or 3 vs. 1 if the delegate/object isn't yet in a register).
Nov 03 2014
prev sibling parent "Martin Nowak" <code dawg.eu> writes:
On Tuesday, 28 October 2014 at 23:06:32 UTC, Manu via 
Digitalmars-d wrote:
 Video games consoles are very real, and very now.
What architecture/platform? The indirect function call argument seems a bit displaced for toString.
 This puts the burden on the caller to ensure enough space is 
 allocated. Or
 you have to reenter the function to finish up the output. 
 Neither of these
 seem like acceptable drawbacks.
Well that's why I open for discussion. I'm sure there's room for creativity here.
Well, the obvious alternative is a printf-style return that tells you how much space is needed.
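Such a printf-style return would mirror C's snprintf: write what fits, report the length the complete output needs, and let the caller retry with a bigger buffer. A sketch with invented names:

```d
// Hypothetical non-template toString variant: writes into buf and
// returns the number of chars the complete output requires. If that
// exceeds buf.length, the caller retries with a larger buffer.
size_t formatLabel(char[] buf, const(char)[] name, const(char)[] suffix)
{
    const(char)[][2] parts = [name, suffix];
    size_t needed = name.length + suffix.length;
    size_t pos = 0;
    foreach (part; parts)
    {
        // copy only as much of this part as still fits
        size_t room = pos < buf.length ? buf.length - pos : 0;
        size_t take = part.length < room ? part.length : room;
        buf[pos .. pos + take] = part[0 .. take];
        pos += take;
    }
    return needed;
}

void main()
{
    char[4] small = void;
    auto needed = formatLabel(small[], "uint", "[16]");
    assert(needed == 8); // didn't fit: caller must retry
    auto big = new char[needed];
    formatLabel(big, "uint", "[16]");
    assert(big == "uint[16]");
}
```

The retry is a second static call, made only when the first buffer was too small.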
 It doesn't seem that unreasonable to reenter the function to me
 actually, I'd prefer a second static call in the rare event 
 that a
 buffer wasn't big enough,
Second virtual call ;).
Oct 31 2014
prev sibling next sibling parent "Martin Nowak" <code dawg.eu> writes:
On Monday, 27 October 2014 at 07:42:30 UTC, Benjamin Thaut wrote:
 Any reasons why such a pull request would not get accepted?
I did that for the exception hierarchy and sure would accept that for TypeInfo. Not so sure about Object itself, as it would nail down the API.
Oct 31 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which rewrites every toString
 function within druntime to use the new sink signature. That way druntime would
 cause a lot less allocations which end up being garbage right away. Are there
 any objections against doing so? Any reasons why such a pull request would not
 get accepted?
Why a sink version instead of an Output Range?
Oct 31 2014
next sibling parent reply "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Friday, 31 October 2014 at 19:04:29 UTC, Walter Bright wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which 
 rewrites every toString
 function within druntime to use the new sink signature. That 
 way druntime would
 cause a lot less allocations which end up being garbage right 
 away. Are there
 any objections against doing so? Any reasons why such a pull 
 request would not
 get accepted?
Why a sink version instead of an Output Range?
I guess because it's for druntime, and we don't want to pull in std.range?
Oct 31 2014
parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 1 November 2014 05:06, via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 On Friday, 31 October 2014 at 19:04:29 UTC, Walter Bright wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which rewrites every
 toString
 function within druntime to use the new sink signature. That way druntime
 would
 cause a lot less allocations which end up being garbage right away. Are
 there
 any objections against doing so? Any reasons why such a pull request
 would not
 get accepted?
Why a sink version instead of an Output Range?
I guess because it's for druntime, and we don't want to pull in std.range?
I'd say that I'd be nervous to see druntime chockers full of templates...?
Nov 01 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/1/2014 6:35 AM, Manu via Digitalmars-d wrote:
 I'd say that I'd be nervous to see druntime chockers full of templates...?
What's a chocker? Why would templates make you nervous? They're not C++ templates!
Nov 01 2014
parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 2 November 2014 04:15, Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 11/1/2014 6:35 AM, Manu via Digitalmars-d wrote:
 I'd say that I'd be nervous to see druntime chockers full of templates...?
What's a chocker?
It's Australian for 'lots'.
 Why would templates make you nervous? They're not C++ templates!
What do you mean? How are D templates any different than C++ templates in a practical sense?

I want a binary lib to be a binary lib. I don't think it's good form for the lowest level library in the language ecosystem to depend on templates (ie, client-side code generation). This is the fundamental lib that will be present in every D application there is. If it is not a binary lib, then it can't be updated.

Consider performance improvements are made to druntime, which every application should enjoy. If the code is templates, then the old version at time of compiling is embedded into existing client software, and the update will have no effect unless the client software is rebuilt. More important, what about security fixes in druntime... imagine a critical security problem in druntime (I bet there's lots!); if we can't update druntime, then *every* D application is an exploit. Very shaky foundation for an ecosystem...

druntime is a fundamental ecosystem library. It should be properly semantically versioned, and particularly for security reasons, I think this should be taken very very seriously.

This argument could equally apply to phobos, and I've always been nervous about it too for the same reasons... but I'll draw a line there, in that phobos is not critical for an application to build and link, and so much of the API is already templates that it would be impossible to change that now.
Nov 02 2014
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/3/14 8:45 AM, Manu via Digitalmars-d wrote:
 On 2 November 2014 04:15, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 Why would templates make you nervous? They're not C++ templates!
What do you mean? How are D templates any different than C++ templates in a practical sense?
Probably about three times easier to read and write.
 I want a binary lib to be a binary lib. I don't think it's good form
 for the lowest level library in the language ecosystem to depend on
 templates (ie, client-side code generation).
 This is the fundamental lib that will be present in every D
 application there is. If it is not a binary lib, then it can't be
 updated.

 Consider performance improvements are made to druntime, which every
 application should enjoy. If the code is templates, then the old
 version at time of compiling is embedded into existing client
 software, the update will have no effect unless the client software is
 rebuilt.
 More important, what about security fixes in druntime... imagine a
 critical security problem in druntime (I bet there's lots!); if we
 can't update druntime, then *every* D application is an exploit. Very
 shaky foundation for an ecosystem...
The same argument goes for all statically linked libraries.
 druntime is a fundamental ecosystem library. It should be properly
 semantically version-ed, and particularly for security reasons, I
 think this should be taken very very seriously.

 This argument could equally be applicable to phobos, and I've always
 been nervous about it too for the same reasons... but I'll draw a line
 there, in that phobos is not critical for an application to build and
 link, and so much of the API is already templates, it would be
 impossible to change that now.
Within reason, most of the runtime and standard library ought to be generic so as to adapt best to application needs. Generics are a very powerful mechanism for libraries. Andrei
Nov 03 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/2/2014 11:45 PM, Manu via Digitalmars-d wrote:
 On 2 November 2014 04:15, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 Why would templates make you nervous? They're not C++ templates!
What do you mean? How are D templates any different than C++ templates in a practical sense?
They're much more straightforward to use, syntactically and semantically.
 I want a binary lib to be a binary lib. I don't think it's good form
 for the lowest level library in the language ecosystem to depend on
 templates (ie, client-side code generation).
What's the problem with that?
 This is the fundamental lib that will be present in every D
 application there is. If it is not a binary lib, then it can't be
 updated.
Have you ever looked at the C openssl.lib? The .h files with it are loaded with metaprogramming done with C macros. Yet I've never heard anyone complain about it. C .h files for DLLs are typically stuffed with C macros.
 Consider performance improvements are made to druntime, which every
 application should enjoy. If the code is templates, then the old
 version at time of compiling is embedded into existing client
 software, the update will have no effect unless the client software is
 rebuilt.
 More important, what about security fixes in druntime... imagine a
 critical security problem in druntime (I bet there's lots!); if we
 can't update druntime, then *every* D application is an exploit. Very
 shaky foundation for an ecosystem...
The defense presents openssl as Exhibit A! (The templates really only present the interface to the dll, not the guts of it.)
 druntime is a fundamental ecosystem library. It should be properly
 semantically version-ed, and particularly for security reasons, I
 think this should be taken very very seriously.
openssl!!! BTW, you should know that if a template is instantiated by the library itself, the compiler won't re-instantiate it and insert that in the calling code. It'll just call the instantiated binary.
Nov 03 2014
next sibling parent reply =?UTF-8?Q?Tobias=20M=C3=BCller?= <troplin bluewin.ch> writes:
Walter Bright <newshound2 digitalmars.com> wrote:

[snip]

 Have you ever looked at the C openssl.lib? The .h files with it are
 loaded with metaprogramming done with C macros. Yet I've never heard
 anyone complain about it.
Those macros are a very common complaint in my experience.
 C .h files for DLLs are typically stuffed with C macros.
[snip]
 The defense presents openssl as Exhibit A!
Presenting OpenSSL as a case for good interface design is a crime by itself! Tobi
Nov 03 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/3/2014 9:36 AM, Tobias Müller wrote:
 Presenting OpenSSL as a case for good interface design is a crime by
 itself!
Not at all. I presented it as an example of a C library that has a metaprogramming interface, but that interface has not prevented bug fix updates to the shared library itself without requiring recompiling of apps that call it. All shared C libraries have a metaprogramming interface if they have macros in the .h file.
Nov 03 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/3/14 4:37 PM, Walter Bright wrote:
 On 11/3/2014 9:36 AM, Tobias Müller wrote:
 Presenting OpenSSL as a case for good interface design is a crime by
 itself!
Not at all. I presented it as an example of a C library that has a metaprogramming interface, but that interface has not prevented bug fix updates to the shared library itself without requiring recompiling of apps that call it. All shared C libraries have a metaprogramming interface if they have macros in the .h file.
I had a very nasty experience with using a template-based API. I vowed to avoid it wherever possible. The culprit was std::string -- it changed something internally from one version of libc++ to the next on Linux. So I had to recompile everything, but the whole system I was using was built on .so objects. Templates do NOT make good API types IMO. -Steve
Nov 03 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/3/2014 2:28 PM, Steven Schveighoffer wrote:
 I had a very nasty experience with using a template-based API. I vowed to avoid
 it wherever possible.

 The culprit was std::string -- it changed something internally from one version
 of libc++ to the next on Linux. So I had to recompile everything, but the whole
 system I was using was with .so objects.

 templates do NOT make good API types IMO.
It seems this is blaming templates for a different problem. If I have:

struct S { int x; };

in my C .h file, and I change it to:

struct S { int x,y; };

then all my API functions that take S as a value argument will require recompilation of any code that uses it. Would you conclude that C sux for making APIs? Of course not. You'd say that a stable API should use reference types, not value types. Having templates or not is irrelevant.
Nov 03 2014
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/3/14 8:09 PM, Walter Bright wrote:
 On 11/3/2014 2:28 PM, Steven Schveighoffer wrote:
 I had a very nasty experience with using a template-based API. I vowed
 to avoid
 it wherever possible.

 The culprit was std::string -- it changed something internally from
 one version
 of libc++ to the next on Linux. So I had to recompile everything, but
 the whole
 system I was using was with .so objects.

 templates do NOT make good API types IMO.
It seems this is blaming templates for a different problem. If I have: struct S { int x; }; in my C .h file, and I change it to: struct S { int x,y; }; Then all my API functions that take S as a value argument will require recompilation of any code that uses it. Would you conclude that C sux for making APIs? Of course not. You'd say that a stable API should use reference types, not value types.
A string is a reference type; the data is on the heap. But that is not the issue. The issue is that it's IMPOSSIBLE for me to ensure std::string remains stable, because it's necessarily completely exposed. There is no encapsulation. Even if I used a pointer to a std::string, the implementation change is going to cause issues. On the contrary, having C-strings as a parameter type has never broken for me. In later projects, I added a simple "immutable string" type modeled after D, which works so much better :) -Steve
Nov 03 2014
prev sibling parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 3 November 2014 19:55, Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 11/2/2014 11:45 PM, Manu via Digitalmars-d wrote:
 On 2 November 2014 04:15, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 Why would templates make you nervous? They're not C++ templates!
What do you mean? How are D templates any different than C++ templates in a practical sense?
They're much more straightforward to use, syntactically and semantically.
This isn't anything to do with what I'm talking about. I'm not nervous because I don't like the D template syntax; it's because I don't feel it's a good idea for druntime (specifically) to have non-static interfaces which may expose, or create dependencies on, druntime internals.
 I want a binary lib to be a binary lib. I don't think it's good form
 for the lowest level library in the language ecosystem to depend on
 templates (ie, client-side code generation).
What's the problem with that?
Templates which operate on library internals within client code create more dependencies on the library. It obscures the clarity of the API.
 This is the fundamental lib that will be present in every D
 application there is. If it is not a binary lib, then it can't be
 updated.
Have you ever looked at the C openssl.lib? The .h files with it are loaded with metaprogramming done with C macros. Yet I've never heard anyone complain about it. C .h files for DLLs are typically stuffed with C macros.
I'm not familiar with openssl, but regardless, I wouldn't consider openssl to lie at the same level in the ecosystem as druntime. The only library I would use as comparison is the CRT, which is firmly self-contained, with a well defined API. I understand your point in principle; if it is effectively 'helper' code which may be embedded in the client, *but still operates exclusively via the API*, then I have no issue. It doesn't appear to me that this is the case here...?
 Consider performance improvements are made to druntime, which every
 application should enjoy. If the code is templates, then the old
 version at time of compiling is embedded into existing client
 software, the update will have no effect unless the client software is
 rebuilt.
 More important, what about security fixes in druntime... imagine a
 critical security problem in druntime (I bet there's lots!); if we
 can't update druntime, then *every* D application is an exploit. Very
 shaky foundation for an ecosystem...
The defense presents openssl as Exhibit A! (The templates really only present the interface to the dll, not the guts of it.)
That's fine. But how is that the case here? Performing a string conversion implies direct access to the data being converted. If the function is a template, then my app is directly accessing internal druntime data.

I don't like being classified as 'the offense'; in fact, I'm really, really tired of unfairly being assigned this position whenever I try and make a stand for things that matter to me and my kind. I'm just highlighting an issue I recognise in the API, where there is no way to get a string out of druntime without GC or indirect function calls. It seems to be the appropriate time for me to raise this opinion, since we seem to be in a full-throttle state of supporting nogc, which is excellent, but replacing it with indirect function calls isn't awesome. This takes one problem and replaces it with a different problem with different characteristics.

I would like to see an overload like the C lib; API receives memory, writes to it. This may not be the API that most people use, but I think it should exist. I then suggested that this API may be crafted in such a way that the higher-level goals can also be expressed through it. It could be wrapped in a little thing that may request a memory expansion if the buffer supplied wasn't big enough:

struct OutputBuffer
{
    char[] buffer;
    bool function(size_t size) extendBuffer; // <- user-supplied (indirect) function that may expand the buffer in some app-specific way.
}

toString writes to 'buffer'; if it's not big enough, ONLY THEN make an indirect function call to get more memory. The API is similar to the sink callback, but only used in the rare case of overflow.

Perhaps my little example could be re-jigged to support the sink delegate approach somehow, or perhaps it's just a different overload. I don't know exactly; other people have much more elaborate needs than myself. I'm just saying I don't like where this API is headed. It doesn't satisfy my needs, and I'm super-tired of being vilified for having that opinion!
I'd be happy with 'toString(char[] outputBuffer)', but I think a design may exist where all requirements are satisfied, rather than just dismissing my perspective as niche and annoying and rolling with what's easy. I would propose these criteria:

   * the function is not a template exposing druntime internals to the host application
   * toString should be capable of receiving a pre-allocated(/stack) buffer and just writing to it
   * indirect function calls should happen only in the rare case of output buffer overflow, NOT in all cases
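A hedged sketch of the buffer-first design proposed above. All names here (OutputBuffer, extendBuffer, put) are illustrative, not actual druntime API, and the overflow callback is a delegate in this sketch for convenience:

```d
// Illustrative sketch only -- NOT the druntime interface. toString-style
// code writes into caller-supplied memory; an indirect call happens only
// on overflow, matching the criteria listed above.
struct OutputBuffer
{
    char[] buffer;   // caller-supplied (possibly stack) memory
    size_t used;     // how much of the buffer is filled
    // user-supplied callback, invoked ONLY when the buffer is too small
    bool delegate(ref OutputBuffer, size_t needed) extendBuffer;

    void put(const(char)[] s)
    {
        if (used + s.length > buffer.length)
        {
            // rare case: ask the app for more memory via an indirect call
            if (extendBuffer is null || !extendBuffer(this, used + s.length))
                assert(0, "output buffer overflow");
        }
        buffer[used .. used + s.length] = s[];
        used += s.length;
    }
}

void main()
{
    char[32] stack;                  // pre-allocated stack buffer
    auto buf = OutputBuffer(stack[]);
    buf.put("[");
    buf.put("42");
    buf.put("]");
    assert(buf.buffer[0 .. buf.used] == "[42]");
}
```

In the common case every put is a direct call plus a bounds check; the delegate is never invoked unless the caller undersized the buffer.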
 druntime is a fundamental ecosystem library. It should be properly
 semantically version-ed, and particularly for security reasons, I
 think this should be taken very very seriously.
openssl!!! BTW, you should know that if a template is instantiated by the library itself, the compiler won't re-instantiate it and insert that in the calling code. It'll just call the instantiated binary.
I don't think that's what's on offer here. If toString is a template (ie, receives an OutputRange, which is a user-defined type), how can it be that the user's instantiation would already be present in druntime? It seems highly unlikely to me. At best, it's unreliable.
Nov 07 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/7/2014 5:41 PM, Manu via Digitalmars-d wrote:
 On 3 November 2014 19:55, Walter Bright via Digitalmars-d
 This isn't anything to do with what I'm talking about. I'm not nervous
 because I don't like the D template syntax, it's because I don't feel
 it's a good idea for druntime (specifically) to have non-static
 interfaces which may expose, or create dependencies on druntime
 internals.
My point with the C macro interfaces is it's not the "template" that makes an API design non-static, it's the way the API is designed. Such a determination can only be made on a case by case basis, not a blanket templates-are-bad.
 The only library I would use as comparison is the CRT, which is firmly
 self-contained, with a well defined API.
Hardly. It still uses macros, and it's still quite sensitive to various struct declarations which are in the corresponding .h files. If you don't agree, take a look in druntime at all the conditional compilation for the various CRTs which presumably have the same API. Check out fileno(), for example. Or errno. Or heck, just grep for #define in the C .h files, or grep for "struct{".
 That's fine. But how is that the case here? Performing a string
 conversion implies direct access to the data being converted. If the
 function is a template, then my app is directly accessing internal
 druntime data.
It all depends on how you design it. Recall that what defines an Output Range is the existence of put(r,e). There's no dependency on internals unless you deliberately expose it.
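A minimal sketch of this point (writeBracketed is a hypothetical helper, not Phobos or druntime code): the algorithm depends only on the free function put(r, e), so an appender and a plain delegate go through the same code path without the algorithm seeing either one's internals.

```d
import std.array : appender;
import std.range.primitives : put, isOutputRange;

// Generic formatting code depends only on put(r, e); it never touches
// the range's internals.
void writeBracketed(R)(ref R r, const(char)[] s)
    if (isOutputRange!(R, const(char)[]))
{
    put(r, "[");
    put(r, s);
    put(r, "]");
}

void main()
{
    // an appender works...
    auto app = appender!string();
    writeBracketed(app, "42");
    assert(app.data == "[42]");

    // ...and so does a plain delegate, through the very same code path
    char[] collected;
    auto dg = (const(char)[] s) { collected ~= s; };
    writeBracketed(dg, "7");
    assert(collected == "[7]");
}
```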
 I don't like being classified as 'the offense', in fact, I'm really,
 really tired of unfairly being assigned this position whenever I try
 and make a stand for things that matter to me and my kind.
Sorry, I thought you'd find the metaphor amusing. I didn't mean it to be offensive (!). Oops, there I go again!
 which is excellent, but replacing it with indirect function calls
 isn't awesome. This takes one problem and replaces it with a different
 problem with different characteristics.
I think this is a misunderstanding of output ranges.
 I would like to see an overload like the C lib; API receives memory,
 writes to it. This may not be the API that most people use, but I
 think it should exist.
 I then suggested that this API may be crafted in such a way that the
 higher-level goals can also be expressed through it. It could be
 wrapped in a little thing that may request a memory expansion if the
 buffer supplied wasn't big enough:

 struct OutputBuffer
 {
    char[] buffer;
    bool function(size_t size) extendBuffer; // <- user-supplied
 (indirect) function that may expand the buffer in some app-specific
 way.
 }

 toString writes to 'buffer', if it's not big enough, ONLY THEN make an
 indirect function call to get more memory.  The API is similar to the
 sink callback, but only used in the rare case of overflow.
We're reinventing Output Ranges again.
 I don't know exactly, other people have much more elaborate needs than
 myself. I'm just saying I don't like where this API is headed. It
 doesn't satisfy my needs, and I'm super-tired of being vilified for
 having that opinion!
Well, my opinion that sink should be replaced with output range isn't very popular here either, join the club.
    * toString should be capable of receiving a pre-allocated(/stack)
 buffer and just write to it
    * indirect function calls should happen only in the rare case
 of output buffer overflow, NOT in all cases
Again, this is reinventing output ranges. It's exactly the niche they serve.
 I don't think that's what's on offer here. If toString is a template
 (ie, receives an OutputRange, which is a user-defined type), how can
 it be that the user's instantiation would already be present in
 druntime?
If one uses an output range that is imported from druntime, which is quite likely, then it is also quite likely that the instantiation with that type will already be present in druntime.
Nov 07 2014
prev sibling next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Oct 31, 2014 at 12:04:24PM -0700, Walter Bright via Digitalmars-d wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
I'm planning on doing a pull request for druntime which rewrites
every toString function within druntime to use the new sink
signature. That way druntime would cause a lot less allocations which
end up being garbage right away. Are there any objections against
doing so? Any reasons why such a pull request would not get accepted?
Why a sink version instead of an Output Range?
To allow toString to be a virtual function, perhaps? Besides, the sink version basically allows encapsulation of an output range -- instead of calling x.toString(outputRange) you just write:

	x.toString((const(char)[] data) { outputRange.put(data); });

Most of the time, you don't even need to bother with this, because given an output range, formattedWrite("%s"...) will automatically figure out which overload of toString to call and what arguments to pass to it.

T

-- 
The only difference between male factor and malefactor is just a little emptiness inside.
Oct 31 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/31/2014 12:07 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Oct 31, 2014 at 12:04:24PM -0700, Walter Bright via Digitalmars-d
wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which rewrites
 every toString function within druntime to use the new sink
 signature. That way druntime would cause a lot less allocations which
 end up being garbage right away. Are there any objections against
 doing so? Any reasons why such a pull request would not get accepted?
Why a sink version instead of an Output Range?
To allow toString to be a virtual function, perhaps?
Output ranges can be virtual functions. All an output range is is a type with a "put" method.
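A minimal illustration of this claim (Sink and UpperSink are hypothetical types, not druntime code): a class whose put method is virtual still satisfies isOutputRange, so output ranges and virtual dispatch are not mutually exclusive.

```d
import std.ascii : toUpper;
import std.range.primitives : isOutputRange;

class Sink
{
    char[] data;
    // class methods are virtual by default in D
    void put(const(char)[] s) { data ~= s; }
}

class UpperSink : Sink
{
    override void put(const(char)[] s)
    {
        foreach (c; s) data ~= toUpper(c);
    }
}

// a type with a suitable 'put' method is an output range
static assert(isOutputRange!(Sink, const(char)[]));

void main()
{
    Sink s = new UpperSink;
    s.put("abc");              // dispatches virtually to UpperSink.put
    assert(s.data == "ABC");
}
```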
 Besides, the sink version basically allows encapsulation of an output
 range
The point of an output range is to encapsulate and abstract. Putting another wrapper around it just gives the impression we don't know what we're doing.
 -- instead of calling x.toString(outputRange) you just write:

 	x.toString((const(char)[] data) { outputRange.put(data); });

 Most of the time, you don't even need to bother with this, because given
 an output range, formattedWrite("%s"...) will automatically figure out
 which overload of toString to call and what arguments to pass to it.
What I object to with the sink design is there is no consistency in design -- we cannot preach ranges as a best practice and then use some other methodology.

BTW, just to be clear, I applaud fixing druntime to remove unnecessary GC allocations, and agree that with proper design most of the allocations can go away. It's just that sink and output ranges are both designed to solve the same problem in pretty much the same way. The difference appears to be little more than tomayto tomahto.
Oct 31 2014
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Oct 31, 2014 at 02:01:50PM -0700, Walter Bright via Digitalmars-d wrote:
 On 10/31/2014 12:07 PM, H. S. Teoh via Digitalmars-d wrote:
On Fri, Oct 31, 2014 at 12:04:24PM -0700, Walter Bright via Digitalmars-d wrote:
On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
I'm planning on doing a pull request for druntime which rewrites
every toString function within druntime to use the new sink
signature. That way druntime would cause a lot less allocations
which end up being garbage right away. Are there any objections
against doing so? Any reasons why such a pull request would not get
accepted?
Why a sink version instead of an Output Range?
To allow toString to be a virtual function, perhaps?
Output ranges can be virtual functions. All an output range is is a type with a "put" method.
The problem is that you don't know the type of the output range in advance. So you'd have to templatize toString. Which means it can no longer be virtual.

T

-- 
A tree is held up by its roots, a man by his friends. (Russian proverb)
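A sketch of the conflict described above (Widget is an illustrative type, not druntime code): member function templates are implicitly non-virtual in D, so a templated toString could not be overridden, while the sink signature stays an ordinary virtual method.

```d
class Widget
{
    // A templated toString would accept any output range type, but a
    // member function template is implicitly final -- subclasses could
    // not override it:
    //     void toString(R)(ref R sink) { ... }

    // The sink signature is a plain method, virtual and overridable:
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("Widget");
    }
}

void main()
{
    char[] buf;
    auto w = new Widget;
    // any output destination can be adapted with a delegate
    w.toString((const(char)[] s) { buf ~= s; });
    assert(buf == "Widget");
}
```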
Oct 31 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/31/14 2:11 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Oct 31, 2014 at 02:01:50PM -0700, Walter Bright via Digitalmars-d
wrote:
 On 10/31/2014 12:07 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Oct 31, 2014 at 12:04:24PM -0700, Walter Bright via Digitalmars-d
wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which rewrites
 every toString function within druntime to use the new sink
 signature. That way druntime would cause a lot less allocations
 which end up being garbage right away. Are there any objections
 against doing so? Any reasons why such a pull request would not get
 accepted?
Why a sink version instead of an Output Range?
To allow toString to be a virtual function, perhaps?
Output ranges can be virtual functions. All an output range is is a type with a "put" method.
The problem is that you don't know the type of the output range in advance. So you'd have to templatize toString. Which means it can no longer be virtual.
Yah, for such stuff a delegate that takes a const(char)[] comes to mind. -- Andrei
Oct 31 2014
parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Oct 31, 2014 at 02:57:58PM -0700, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 10/31/14 2:11 PM, H. S. Teoh via Digitalmars-d wrote:
On Fri, Oct 31, 2014 at 02:01:50PM -0700, Walter Bright via Digitalmars-d wrote:
On 10/31/2014 12:07 PM, H. S. Teoh via Digitalmars-d wrote:
On Fri, Oct 31, 2014 at 12:04:24PM -0700, Walter Bright via Digitalmars-d wrote:
On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
I'm planning on doing a pull request for druntime which rewrites
every toString function within druntime to use the new sink
signature. That way druntime would cause a lot less allocations
which end up being garbage right away. Are there any objections
against doing so? Any reasons why such a pull request would not
get accepted?
Why a sink version instead of an Output Range?
To allow toString to be a virtual function, perhaps?
Output ranges can be virtual functions. All an output range is is a type with a "put" method.
The problem is that you don't know the type of the output range in advance. So you'd have to templatize toString. Which means it can no longer be virtual.
Yah, for such stuff a delegate that takes a const(char)[] comes to mind. --
[...] Which is what the sink version of toString currently does. T -- Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG
Oct 31 2014
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/31/14 5:01 PM, Walter Bright wrote:
 On 10/31/2014 12:07 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Oct 31, 2014 at 12:04:24PM -0700, Walter Bright via
 Digitalmars-d wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which rewrites
 every toString function within druntime to use the new sink
 signature. That way druntime would cause a lot less allocations which
 end up being garbage right away. Are there any objections against
 doing so? Any reasons why such a pull request would not get accepted?
Why a sink version instead of an Output Range?
To allow toString to be a virtual function, perhaps?
Output ranges can be virtual functions. All an output range is is a type with a "put" method.
He said "toString" not "sink". And there are more use cases than a type that implements 'put'.
 What I object to with the sink design is there is no consistency in
 design - we cannot preach ranges as a best practice and then use some
 other methodology.
Keep in mind that saying "toString will take output ranges" means that ALL toString implementers must handle ALL forms of output ranges. It's not an issue of "we don't know what we're doing", it's an issue of "let's not make everyone who wants to spit out a simple string handle 5+ different use cases -- and you'd better test for them, because the compiler won't complain until it's used!"

I think toString should be first and foremost SIMPLE. It already was -- return a string. But that forces people to allocate, and we want to avoid that. Using a sink is pretty much just as simple.
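For comparison, a sink-based toString gives the implementer exactly one case to write and test (Pair is an illustrative type, not druntime code):

```d
struct Pair
{
    string a, b;

    // One signature, one case: a delegate taking const(char)[].
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("(");
        sink(a);
        sink(", ");
        sink(b);
        sink(")");
    }
}

void main()
{
    char[] buf;
    Pair("x", "y").toString((const(char)[] s) { buf ~= s; });
    assert(buf == "(x, y)");
}
```

Any concrete output destination (array, appender, file writer) can be adapted to the sink with a one-line delegate at the call site.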
 BTW, just to be clear, I applaud fixing druntime to remove unnecessary
 GC allocations, and agree that with proper design most of the
 allocations can go away. It's just that sink and output ranges are both
 designed to solve the same problem in pretty much the same way. The
 difference appears to be little more than tomayto tomahto.
It is a huge difference to say EVERYONE who implements toString will take any templated type that purports to be an output range, vs giving one case to handle. -Steve
Nov 03 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
 It is a huge difference to say EVERYONE who implements toString will take any
 templated type that purports to be an output range, vs giving one case to
handle.
All an output range is is a type with a 'put' method. That's it. You're making it out to be far more complex than it is.
Nov 03 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/3/14 4:40 PM, Walter Bright wrote:
 On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
 It is a huge difference to say EVERYONE who implements toString will
 take any
 templated type that purports to be an output range, vs giving one case
 to handle.
All an output range is is a type with a 'put' method. That's it. You're making it out to be far more complex than it is.
Directly from the docs: (http://dlang.org/phobos/std_range.html#isOutputRange)

    void myprint(in char[] s) { }
    static assert(isOutputRange!(typeof(&myprint), char));

No 'put' in sight, except as a substring of isOutputRange. I don't think you realize what a beast supporting all output ranges is, or using them (hint: calling r.put for a generic output range is an ERROR).

-Steve
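The doc example above can be exercised end-to-end; the key detail is that generic code must call the free function std.range.put, since the delegate has no put member at all:

```d
import std.range.primitives : put, isOutputRange;

void main()
{
    char[] collected;
    void myprint(in char[] s) { collected ~= s; }

    // a plain delegate qualifies as an output range of char
    static assert(isOutputRange!(typeof(&myprint), char));

    // Generic code must go through the free function put();
    // dg.put("hi") would not compile -- there is no 'put' member.
    auto dg = &myprint;
    put(dg, "hi");
    assert(collected == "hi");
}
```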
Nov 03 2014
next sibling parent reply "Jonathan Marler" <johnnymarler gmail.com> writes:
On Monday, 3 November 2014 at 22:33:25 UTC, Steven Schveighoffer 
wrote:
 On 11/3/14 4:40 PM, Walter Bright wrote:
 On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
 It is a huge difference to say EVERYONE who implements 
 toString will
 take any
 templated type that purports to be an output range, vs giving 
 one case
 to handle.
All an output range is is a type with a 'put' method. That's it. You're making it out to be far more complex than it is.
Directly from the docs: (http://dlang.org/phobos/std_range.html#isOutputRange) void myprint(in char[] s) { } static assert(isOutputRange!(typeof(&myprint), char)); No 'put' in sight, except as a substring of isOutputRange. I don't think you realize what a beast supporting all output ranges is, or using them (hint: calling r.put for a generic output range is an ERROR). -Steve
In many cases templates are good because they provide a way for the programmer to use a library optimized for their particular application. This is the case for the toString function. An argument can be made that using templates is dangerous because, if they are used incorrectly, the number of template instantiations can blow up. But this can always be solved by the programmer by changing all their template calls to use the same template parameters. This allows the template solution to simultaneously support a sink that represents a real function, or a delegate, or whatever the application needs.

I understand that people like having a binary library that instantiates its own functions with a static interface, and I think there's value to that. But most of that value is in dynamic libraries that the compiler cannot optimize. When the compiler can optimize, let it:)

I updated my test code to use a templated sink; here's the link:

http://marler.info/dtostring.d

    Method 1: ReturnString
              string toString();
    Method 2: SinkDelegate
              void toString(void delegate(const(char)[]) sink);
    Method 3: SinkTemplate
              void toString(T)(T sink) if(isOutputRange!(T,const(char)[]));
    Method 4: SinkDelegateWithStaticHelperBuffer
              struct SinkStatic { char[64] buffer; void delegate(const(char)[]) sink; }
              void toString(ref SinkStatic sink);
    Method 5: SinkDelegateWithDynamicHelperBuffer
              struct SinkDynamic { char[] buffer; void delegate(const(char)[]) sink; }
              void toString(ref SinkDynamic sink);
              void toString(SinkDynamic sink);

(DMD Compiler on x86) "dmd dtostring.d"
RuntimeString run 1 (loopcount 10000000)
   Method 1     : 76 ms
   Method 2     : 153 ms
   Method 3     : 146 ms
   Method 4     : 157 ms
   Method 5ref  : 165 ms
   Method 5noref: 172 ms
StringWithPrefix run 1 (loopcount 1000000)
   Method 1     : 149 ms
   Method 2     : 22 ms
   Method 3     : 21 ms
   Method 4     : 80 ms
   Method 5ref  : 81 ms
   Method 5noref: 82 ms
ArrayOfStrings run 1 (loopcount 1000000)
   Method 1     : 1 sec
   Method 2     : 81 ms
   Method 3     : 77 ms
   Method 4     : 233 ms
   Method 5ref  : 232 ms
   Method 5noref: 223 ms

(DMD Compiler on x86 with Optimization) "dmd -O dtostring.d"
RuntimeString run 1 (loopcount 10000000)
   Method 1     : 30 ms
   Method 2     : 65 ms
   Method 3     : 55 ms
   Method 4     : 68 ms
   Method 5ref  : 68 ms
   Method 5noref: 67 ms
StringWithPrefix run 1 (loopcount 1000000)
   Method 1     : 158 ms
   Method 2     : 9 ms
   Method 3     : 8 ms
   Method 4     : 63 ms
   Method 5ref  : 64 ms
   Method 5noref: 66 ms
ArrayOfStrings run 1 (loopcount 1000000)
   Method 1     : 1 sec, 292 ms
   Method 2     : 35 ms
   Method 3     : 34 ms
   Method 4     : 193 ms
   Method 5ref  : 198 ms
   Method 5noref: 200 ms

The results aren't surprising. The template outperforms the delegate sink. In a very big project one might try to limit the number of instantiations of toString by using a specific toString instance that accepts some common OutputRange wrapper type, which would make the template version perform the same as the sink delegate version; but for projects that don't need to worry about that, you will get better performance from more compiler optimization.
Nov 03 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/3/14 6:05 PM, Jonathan Marler wrote:
 On Monday, 3 November 2014 at 22:33:25 UTC, Steven Schveighoffer wrote:
 On 11/3/14 4:40 PM, Walter Bright wrote:
 On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
 It is a huge difference to say EVERYONE who implements toString will
 take any
 templated type that purports to be an output range, vs giving one case
 to handle.
All an output range is is a type with a 'put' method. That's it. You're making it out to be far more complex than it is.
Directly from the docs: (http://dlang.org/phobos/std_range.html#isOutputRange) void myprint(in char[] s) { } static assert(isOutputRange!(typeof(&myprint), char)); No 'put' in sight, except as a substring of isOutputRange. I don't think you realize what a beast supporting all output ranges is, or using them (hint: calling r.put for a generic output range is an ERROR). -Steve
In many cases templates are good because they provide a way for the programmer to use a library optimized for their particular application. This is the case for the toString function. An argument can be made that using templates is dangerous because, if they are used incorrectly, the number of template instantiations can blow up. But this can always be solved by the programmer by changing all their template calls to use the same template parameters. This allows the template solution to simultaneously support a sink that represents a real function, or a delegate, or whatever the application needs.
If we make toString a template, we preclude it from being a virtual function, and we force the object to expose its inner workings. I think the template solution has advantages, one being the possibility for optimization. But I don't think the gains are significant enough. It's also more complex than necessary.
 I understand that people like having a binary library that instantiates
 it's own functions that have a static interface and I think there's
 value to that.  But most of the value is in dynamic libraries that the
 compiler cannot optimize.  When the compiler can optimize, let it:)

 I updated my test code to use a templated sink, here the link:

 http://marler.info/dtostring.d


     Method 1: ReturnString
               string toString();
     Method 2: SinkDelegate
               void toString(void delegate(const(char)[]) sink);
     Method 3: SinkTemplate
               void toString(T)(T sink) if(isOutputRange!(T,const(char)[]));
     Method 4: SinkDelegateWithStaticHelperBuffer
               struct SinkStatic { char[64] buffer; void
 delegate(const(char)[]) sink; }
           void toString(ref SinkStatic sink);
     Method 5: SinkDelegateWithDynamicHelperBuffer
               struct SinkDynamic { char[] buffer; void
 delegate(const(char)[]) sink; }
           void toString(ref SinkDynamic sink);
           void toString(SinkDynamic sink);


 (DMD Compiler on x86) "dmd dtostring.d"
 RuntimeString run 1 (loopcount 10000000)
    Method 1     : 76 ms
    Method 2     : 153 ms
    Method 3     : 146 ms
    Method 4     : 157 ms
    Method 5ref  : 165 ms
    Method 5noref: 172 ms
 StringWithPrefix run 1 (loopcount 1000000)
    Method 1     : 149 ms
    Method 2     : 22 ms
    Method 3     : 21 ms
    Method 4     : 80 ms
    Method 5ref  : 81 ms
    Method 5noref: 82 ms
 ArrayOfStrings run 1 (loopcount 1000000)
    Method 1     : 1 sec
    Method 2     : 81 ms
    Method 3     : 77 ms
    Method 4     : 233 ms
    Method 5ref  : 232 ms
    Method 5noref: 223 ms


 (DMD Compiler on x86 with Optimization) "dmd -O dtostring.d"
 RuntimeString run 1 (loopcount 10000000)
    Method 1     : 30 ms
    Method 2     : 65 ms
    Method 3     : 55 ms
    Method 4     : 68 ms
    Method 5ref  : 68 ms
    Method 5noref: 67 ms
 StringWithPrefix run 1 (loopcount 1000000)
    Method 1     : 158 ms
    Method 2     : 9 ms
    Method 3     : 8 ms
    Method 4     : 63 ms
    Method 5ref  : 64 ms
    Method 5noref: 66 ms
 ArrayOfStrings run 1 (loopcount 1000000)
    Method 1     : 1 sec, 292 ms
    Method 2     : 35 ms
    Method 3     : 34 ms
    Method 4     : 193 ms
    Method 5ref  : 198 ms
    Method 5noref: 200 ms

 The results aren't suprising.  The template out performs the delegate
 sink.  In a very big project one might try to limit the number of
 instantiations of toString by using a specific toString instance that
 accepts some type common OutputRange wrapper which would make the
 template version perform the same as the sink delegate version, but for
 projects that don't need to worry about that, you will get better
 performance from more compiler optimization.
I think the performance gains are minimal. The only one that is significant is StringWithPrefix, which has an 11% gain. But that's still only 1ms, and 1ms on a PC can be attributed to external forces. I would increase the loop count on that one.

Note, if you really want to see gains, use -inline.

-Steve
Nov 03 2014
parent reply "Jonathan Marler" <johnnymarler gmail.com> writes:
On Tuesday, 4 November 2014 at 02:49:55 UTC, Steven Schveighoffer 
wrote:
 On 11/3/14 6:05 PM, Jonathan Marler wrote:
 In many cases templates are good because they provide a 
 way for the
 programmer to use a library optimized for their particular 
 application.
 This is the case for the toString function.  An argument can 
 be made
 that using templates is dangerous because if they are used 
 incorrectly,
 the number of template instantiations can blow up.  But this can 
 always be
 solved by the programmer by changing all their template calls 
 to use the
 same template parameters.  This allows the template solution to
 simultaneously support a sink that represents a real function, 
 or a
 delegate, or whatever the application needs.
If we make toString a template, we preclude it from being a virtual function, and we force the object to expose its inner workings. I think the template solution has advantages, one being the possibility for optimization. But I don't think the gains are significant enough. It's also more complex than necessary.
I was thinking you could have the best of both worlds with templates. For example, you could define the toString template like this:

    void toStringTemplate(T)(T sink) if(isOutputRange!(T,const(char)[]))

Then you could declare an alias like this:

    alias toString = toStringTemplate!(void delegate(const(char)[]));

Which (correct me if I'm wrong) I believe is equivalent to the original sink delegate function. This allows programmers to write the logic for toString once and allows a developer using the library to choose whether they want to use the delegate version or the generic output range version. This gives the user of the library the ability to choose the best version for their own application.

Note: I added this "alias" method to my dtostring.d test code and it wasn't as fast as the delegate version. I'm not sure why, as I thought the generated code would be identical. If anyone has any insight as to why this happened, let me know. The code is at http://marler.info/dtostring.d
Nov 03 2014
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 4 November 2014 at 04:34:09 UTC, Jonathan Marler 
wrote:
 On Tuesday, 4 November 2014 at 02:49:55 UTC, Steven 
 Schveighoffer wrote:
 On 11/3/14 6:05 PM, Jonathan Marler wrote:
 In many cases templates are good because they provide a 
 way for the
 programmer to use a library optimized for their particular 
 application.
 This is the case for the toString function.  An argument can 
 be made
 that using templates is dangerous because if they are used 
 incorrectly,
 the number of template instantiations can blow up.  But this 
 can always be
 solved by the programmer by changing all their template calls 
 to use the
 same template parameters.  This allows the template solution 
 to
 simultaneously support a sink that represents a real 
 function, or a
 delegate, or whatever the application needs.
If we make toString a template, we preclude it from being a virtual function, and we force the object to expose its inner workings. I think the template solution has advantages, one being the possibility for optimization. But I don't think the gains are significant enough. It's also more complex than necessary.
I was thinking you could have the best of both worlds with templates. For example, you could define the toString template like this:

    void toStringTemplate(T)(T sink) if(isOutputRange!(T,const(char)[]))

Then you could declare an alias like this:

    alias toString = toStringTemplate!(void delegate(const(char)[]));

Which (correct me if I'm wrong) I believe is equivalent to the original sink delegate function. This allows programmers to write the logic for toString once and allows a developer using the library to choose whether they want to use the delegate version or the generic output range version. This gives the user of the library the ability to choose the best version for their own application.

Note: I added this "alias" method to my dtostring.d test code and it wasn't as fast as the delegate version. I'm not sure why, as I thought the generated code would be identical. If anyone has any insight as to why this happened, let me know. The code is at http://marler.info/dtostring.d
I'm sure it's been mentioned before, but you should try ldc/gdc as they have much more capable optimisers.
Nov 03 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/3/2014 2:33 PM, Steven Schveighoffer wrote:
 On 11/3/14 4:40 PM, Walter Bright wrote:
 On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
 It is a huge difference to say EVERYONE who implements toString will
 take any
 templated type that purports to be an output range, vs giving one case
 to handle.
All an output range is is a type with a 'put' method. That's it. You're making it out to be far more complex than it is.
Directly from the docs: (http://dlang.org/phobos/std_range.html#isOutputRange) void myprint(in char[] s) { } static assert(isOutputRange!(typeof(&myprint), char)); No 'put' in sight, except as a substring of isOutputRange. I don't think you realize what a beast supporting all output ranges is, or using them (hint: calling r.put for a generic output range is an ERROR).
The documentation says, more specifically, that the requirement is that it support put(r,e). Arrays are output ranges NOT because output ranges need to know about arrays, but because arrays themselves have a put() operation (as defined in std.array).

All the algorithm (such as a toString()) needs to do is call put(r,e). It doesn't need to do anything else. It isn't any more complicated than the sink interface.
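A small demonstration of the array case. (Note: in current Phobos the free function put for arrays lives in std.range.primitives; older documentation attributed it to std.array.)

```d
import std.range.primitives : put;

void main()
{
    char[5] storage;            // fixed, pre-allocated buffer
    char[] slice = storage[];

    // put() for a char[] target copies into the slice and advances it,
    // so no allocation ever happens here
    put(slice, "ab");
    put(slice, "cd");

    assert(storage[0 .. 4] == "abcd");
    assert(slice.length == 1);  // one byte of capacity left
}
```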
Nov 03 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/3/14 8:16 PM, Walter Bright wrote:
 On 11/3/2014 2:33 PM, Steven Schveighoffer wrote:
 On 11/3/14 4:40 PM, Walter Bright wrote:
 On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
 It is a huge difference to say EVERYONE who implements toString will
 take any
 templated type that purports to be an output range, vs giving one case
 to handle.
All an output range is is a type with a 'put' method. That's it. You're making it out to be far more complex than it is.
Directly from the docs: (http://dlang.org/phobos/std_range.html#isOutputRange) void myprint(in char[] s) { } static assert(isOutputRange!(typeof(&myprint), char)); No 'put' in sight, except as a substring of isOutputRange. I don't think you realize what a beast supporting all output ranges is, or using them (hint: calling r.put for a generic output range is an ERROR).
The documentation says, more specifically, that the requirement is that it support put(r,e). The array operands are output ranges NOT because the output ranges need to know about arrays, but because arrays themselves have a put() operation (as defined in std.array).
You can't import std.array from druntime.
 All the algorithm (such as a toString()) needs to do is call put(r,e).
 It doesn't need to do anything else. It isn't any more complicated than
 the sink interface.
Again, std.range.put isn't defined in druntime. And neither is isOutputRange. Are you planning on moving these things to druntime? -Steve
Nov 03 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/3/2014 5:47 PM, Steven Schveighoffer wrote:
 Again, std.range.put isn't defined in druntime.
True.
 And neither is isOutputRange.
Wouldn't really be needed.
 Are you planning on moving these things to druntime?
This illustrates a long-running issue about what goes in druntime and what in phobos. It's not much of an argument for having a different API that does the same thing, only incompatible.
Nov 03 2014
parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Nov 03, 2014 at 05:56:56PM -0800, Walter Bright via Digitalmars-d wrote:
 On 11/3/2014 5:47 PM, Steven Schveighoffer wrote:
Again, std.range.put isn't defined in druntime.
True.
And neither is isOutputRange.
Wouldn't really be needed.
Are you planning on moving these things to druntime?
This illustrates a long running issue about what goes in druntime and what in phobos.
[...]

Another prime example is std.typecons.Tuple, which blocked the implementation of byPair in AA's.

T

--
Political correctness: socially-sanctioned hypocrisy.
Nov 03 2014
prev sibling next sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Friday, 31 October 2014 at 19:09:28 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 Besides, the sink version basically allows encapsulation of an 
 output
 range -- instead of calling x.toString(outputRange) you just 
 write:

 	x.toString((const(char)[] data) { outputRange.put(data); });
In recent compiler versions we can just write:

	x.toString(data => outputRange.put(data));
Oct 31 2014
parent reply "Jonathan Marler" <johnnymarler gmail.com> writes:
On Friday, 31 October 2014 at 22:13:31 UTC, Jakob Ovrum wrote:
 On Friday, 31 October 2014 at 19:09:28 UTC, H. S. Teoh via 
 Digitalmars-d wrote:
 Besides, the sink version basically allows encapsulation of an 
 output
 range -- instead of calling x.toString(outputRange) you just 
 write:

 	x.toString((const(char)[] data) { outputRange.put(data); });
In recent compiler versions we can just write: x.toString(data => outputRange.put(data));
No need for the extra function, just call:

	x.toString(&(outputRange.put));
Oct 31 2014
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 1 November 2014 at 05:27:16 UTC, Jonathan Marler 
wrote:
 No need for the extra function, just call:

 x.toString(&(outputRange.put));
That doesn't work for a wide variety of possible cases, notably when `put` is a function template or when the code depends on std.range.put or some other UFCS `put` function. As such, it should be avoided in generic code, and then you might as well avoid it in general, lest your algorithm unnecessarily end up breaking with output ranges you didn't test for after refactoring. (Note that parentheses are not required in your example)
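A small sketch of the failure mode (Appender is from Phobos; the renderTo name is purely for illustration): Appender.put is a function template, so taking its address without an instantiation fails to compile, while a lambda works because it instantiates put at the call site.

```d
import std.array : appender;

// Hypothetical sink-style API, named only for illustration.
void renderTo(void delegate(const(char)[]) sink)
{
    sink("hello");
}

void main()
{
    auto app = appender!string();

    // renderTo(&app.put);      // error: put is a template, no address to take
    renderTo(s => app.put(s));  // fine: the lambda instantiates put here
    assert(app.data == "hello");
}
```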
Oct 31 2014
parent "Jonathan Marler" <johnnymarler gmail.com> writes:
On Saturday, 1 November 2014 at 06:04:56 UTC, Jakob Ovrum wrote:
 On Saturday, 1 November 2014 at 05:27:16 UTC, Jonathan Marler 
 wrote:
 No need for the extra function, just call:

 x.toString(&(outputRange.put));
That doesn't work for a wide variety of possible cases, notably when `put` is a function template or when the code depends on std.range.put or some other UFCS `put` function. As such, it should be avoided in generic code, and then you might as well avoid it in general, lest your algorithm unnecessarily end up breaking with output ranges you didn't test for after refactoring. (Note that parentheses are not required in your example)
Ah yes, you are right that this wouldn't work in generic code. Meaning, if the code calling toString were itself a template accepting output ranges, then &outputRange.put often wouldn't work. In this case I think the anonymous function is a good way to go. I was more thinking of the case where the code calling toString is user code and the outputRange is a known type. Thanks for catching my silly assumption.
Nov 01 2014
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/31/14 3:07 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Oct 31, 2014 at 12:04:24PM -0700, Walter Bright via Digitalmars-d
wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which rewrites
 every toString function within druntime to use the new sink
 signature. That way druntime would cause a lot less allocations which
end up being garbage right away. Are there any objections against
 doing so? Any reasons why such a pull request would not get accepted?
Why a sink version instead of an Output Range?
To allow toString to be a virtual function, perhaps?

Besides, the sink version basically allows encapsulation of an output range -- instead of calling x.toString(outputRange) you just write:

	x.toString((const(char)[] data) { outputRange.put(data); });
No, please don't do that. It's put(outputRange, data); -Steve
Nov 03 2014
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 3 November 2014 at 16:02:23 UTC, Steven Schveighoffer 
wrote:
 On 10/31/14 3:07 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Oct 31, 2014 at 12:04:24PM -0700, Walter Bright via 
 Digitalmars-d wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which 
 rewrites
 every toString function within druntime to use the new sink
 signature. That way druntime would cause a lot less 
 allocations which
end up being garbage right away. Are there any objections 
 against
 doing so? Any reasons why such a pull request would not get 
 accepted?
Why a sink version instead of an Output Range?
To allow toString to be a virtual function, perhaps?

Besides, the sink version basically allows encapsulation of an output range -- instead of calling x.toString(outputRange) you just write:

	x.toString((const(char)[] data) { outputRange.put(data); });
No, please don't do that. It's put(outputRange, data); -Steve
Why?
Nov 08 2014
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/8/14 5:25 AM, John Colvin wrote:
 On Monday, 3 November 2014 at 16:02:23 UTC, Steven Schveighoffer wrote:
 On 10/31/14 3:07 PM, H. S. Teoh via Digitalmars-d wrote:
     x.toString((const(char)[] data) { outputRange.put(data); });
No, please don't do that. It's put(outputRange, data);
Why?
import std.range;

struct ORange
{
    void put(char) {}
}
static assert(isOutputRange!(ORange, char));

void main()
{
    char[] buf = "hello".dup;
    ORange r;
    //r.put(buf); // fails
    put(r, buf); // works
}

I have said before, making the UFCS function the same name as the member hook it uses was a bad anti-pattern. Now, you must always use the non-UFCS version of put, which sucks.

-Steve
Nov 08 2014
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/31/14 3:04 PM, Walter Bright wrote:
 On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
 I'm planning on doing a pull request for druntime which rewrites every
 toString
 function within druntime to use the new sink signature. That way
 druntime would
cause a lot less allocations which end up being garbage right away.
 Are there
 any objections against doing so? Any reasons why such a pull request
 would not
 get accepted?
Why a sink version instead of an Output Range?
A sink is an output range. Supporting all output ranges isn't necessary. -Steve
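That claim can be checked directly (a minimal sketch using the Phobos traits, which druntime itself cannot import): a delegate taking const(char)[] already passes isOutputRange, so generic code written against put() accepts a sink unchanged.

```d
import std.range : isOutputRange, put;

void main()
{
    string collected;
    auto sink = (const(char)[] s) { collected ~= s; };

    // The sink delegate itself satisfies the output-range interface.
    static assert(isOutputRange!(typeof(sink), char));

    put(sink, "abc");
    assert(collected == "abc");
}
```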
Nov 03 2014