
digitalmars.D.learn - Bloat with std.(string.)format?

Chris <wendlec tcd.ie> writes:
If I have code like this:

auto builder = appender!string;
builder ~= "Hello, World!";
builder ~= "I'm here!";
builder ~= "Now I'm there!";

the object file grows by 10-11 lines with each call to `builder ~=`. If I use this:

builder ~= format("%s", "Hello, World!");
builder ~= format("%s", "I'm here!");
builder ~= format("%s", "Now I'm there!");

The object file is more than twice as big and it grows by 20 
lines with each call to `format`.

If I use

builder ~= format("%s %s %s", "Hello, World!", "I'm here!", "Now I'm there!");

the code bloat is even worse.

There are many situations where a formatting string is preferable 
to concatenation; however, it adds _a lot_ of bloat. Would a 
custom formatter be preferable to reduce code bloat, or should 
std/format.d be optimized? (Or both?)

dmd 2.067.1
-release -boundscheck=off -inline -O
Sep 17 2015
John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 17 September 2015 at 09:54:07 UTC, Chris wrote:
 [...]
Some initial bloat is expected: format is pretty big (although twice as big is a lot, unless your original code was quite small?). The extra bloat per call is likely due to inlining. I would hope that dmd would spot consecutive inlining of the same function and merge the copies, but perhaps it doesn't. You could certainly make a less feature-complete implementation of format that is smaller. Have you tried with ldc or gdc? In particular, have you tried using ldc with --gc-sections on Linux?
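A minimal sketch of the "less feature-complete implementation" idea, assuming all you need is `%s` substitution of string arguments (plus `%%`). `simpleFormat` is a hypothetical name, not a Phobos function. It is an ordinary (non-template) function taking a typesafe variadic slice, so a single copy is compiled no matter how many call sites or arguments there are. Whether that actually beats `std.format` on binary size with dmd 2.067.1 has not been measured here.

```d
import std.array : appender;

// Hypothetical minimal formatter (not part of Phobos): it understands only
// "%s" and "%%", and every argument must already be a string. Because the
// arguments arrive as one typesafe variadic slice rather than as template
// parameters, one compiled copy serves every call.
string simpleFormat(const(char)[] fmt, const(char)[][] args...)
{
    auto result = appender!string();
    size_t next = 0; // index of the next argument to substitute

    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] == '%' && i + 1 < fmt.length)
        {
            if (fmt[i + 1] == 's')
            {
                // Too few arguments is only caught when asserts are enabled
                // (they are removed by -release).
                assert(next < args.length, "simpleFormat: too few arguments");
                result.put(args[next++]); // substitute the argument verbatim
                ++i;                      // skip the 's'
                continue;
            }
            if (fmt[i + 1] == '%')
            {
                result.put('%'); // "%%" is a literal percent sign
                ++i;
                continue;
            }
        }
        // Anything else (including an unrecognized or trailing '%') is copied through.
        result.put(fmt[i]);
    }
    return result.data;
}
```

For example, `builder ~= simpleFormat("%s %s %s", "Hello, World!", "I'm here!", "Now I'm there!");` would stand in for the three-argument `format` call from the original post. The trade-off is that numbers, widths, precision and so on have to be converted to strings by the caller.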
Sep 17 2015
Chris <wendlec tcd.ie> writes:
On Thursday, 17 September 2015 at 10:33:44 UTC, John Colvin wrote:
 Some initial bloat is expected, format is pretty big (although 
 twice as big is a lot, unless your original code was quite 
 small?).
It was in a test program. Only a few lines. But it would still add a lot of bloat in a program that uses it in different modules, wouldn't it?
 The extra bloat per call is likely due to inlining. I would 
 hope that dmd would spot consecutive inlining of the same 
 function and merge them, but perhaps it doesn't.
 You could certainly make a less feature complete implementation 
 of format that is smaller.
Don't know if it's worth the trouble.
 Have you tried with ldc or gdc. In particular, have you tried 
 using ldc with --gc-sections on linux?
Not yet. GDC and LDC always lag behind (this time considerably), so I'm usually stuck with DMD for development.
Sep 17 2015
John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 17 September 2015 at 10:53:17 UTC, Chris wrote:
 On Thursday, 17 September 2015 at 10:33:44 UTC, John Colvin 
 wrote:
 Some initial bloat is expected, format is pretty big (although 
 twice as big is a lot, unless your original code was quite 
 small?).
 It was in a test program. Only a few lines. But it would still add a lot of bloat in a program that uses it in different modules, wouldn't it?
The upfront cost is paid only once per unique set of template arguments per binary. So no, it doesn't scale badly there. Inlining, on the other hand, will (roughly speaking) increase binary size linearly with the number of calls. That's the cost you pay for (hopefully) better performance.
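A small sketch of the "once per unique set of template arguments" rule, using a hypothetical function template `instantiationCalls` rather than `format` itself. Each distinct list of argument types creates a separate instantiation with its own `static` counter, so the returned count shows which calls share one compiled copy. By the same rule, the two single-argument `format` calls in the original post should share one instantiation, while the three-argument call needs a second one.

```d
// Hypothetical illustration (not from Phobos): a variadic function template
// with a static local. Every distinct list of argument types produces a
// separate instantiation, and each instantiation has its own `count`.
size_t instantiationCalls(Args...)(Args args)
{
    static size_t count; // one per instantiation (thread-local by default)
    return ++count;      // how many times this instantiation has been called
}
```

Calling it with one string twice returns 1 and then 2 (same instantiation), whereas the first call with three strings, or with an `int`, returns 1 again because each is a new instantiation.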
 The extra bloat per call is likely due to inlining. I would 
 hope that dmd would spot consecutive inlining of the same 
 function and merge them, but perhaps it doesn't.
 You could certainly make a less feature complete 
 implementation of format that is smaller.
 Don't know if it's worth the trouble.
I would say it's not worth it unless you have a real problem with binary size in an actual finished product. Even then, I'd say you could get bigger, easier gains by messing around with -fvisibility settings, --gc-sections, strip etc. on GDC and LDC.
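On the inlining cost mentioned above: if inlining really is what makes each call site grow, one option is a wrapper the compiler is told never to inline. This is a sketch under two assumptions: a DMD release newer than the 2.067.1 used here (`pragma(inline)` was added later), and a Phobos where `format` is importable from `std.format`. `formatNoInline` is a hypothetical name.

```d
import std.format : format;

// Hypothetical wrapper: pragma(inline, false) inside the body tells the
// compiler never to inline this function, so each call site should compile
// to a plain call instead of a copy of the formatting code, even with -inline.
// It is still a template, so it is instantiated once per set of argument types.
string formatNoInline(Args...)(string fmt, Args args)
{
    pragma(inline, false);
    return format(fmt, args);
}
```

Leaving `-inline` off the dmd command line avoids inlining program-wide instead, at whatever performance cost that has for the rest of the code.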
 Have you tried with ldc or gdc. In particular, have you tried 
 using ldc with --gc-sections on linux?
 Not yet. GDC and LDC always lag behind (this time considerably), so I'm usually stuck with DMD for development.
That's a shame. https://github.com/ldc-developers/ldc/releases/tag/v0.16.0-alpha3 is at 2.067.1, is that not up-to-date enough?
Sep 17 2015
Chris <wendlec tcd.ie> writes:
On Thursday, 17 September 2015 at 12:49:03 UTC, John Colvin wrote:
 [...]
 That's a shame. https://github.com/ldc-developers/ldc/releases/tag/v0.16.0-alpha3 is at 2.067.1, is that not up-to-date enough?
Thanks. That's up to date enough now. Is it stable, though? For version 2.067.1 it took a long time. Maybe we should focus some of our efforts on getting LDC and GDC up to date faster.
Sep 17 2015
John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 17 September 2015 at 13:42:15 UTC, Chris wrote:
 On Thursday, 17 September 2015 at 12:49:03 UTC, John Colvin 
 wrote:
 [...]
 Thanks. That's up to date enough now. Is it stable, though?
Reasonably so in my testing, but expect more bugs than in a full release.
 For version 2.067.1 it took a long time this time. Maybe we 
 should focus some of our efforts on LDC and GDC being up to 
 date faster.
It would be great to have more people working on them, yes.
Sep 17 2015
Chris <wendlec tcd.ie> writes:
On Thursday, 17 September 2015 at 15:17:21 UTC, John Colvin wrote:
 [...]
 It would be great to have more people working on them, yes.
I suppose it's an area most people (including myself) shy away from. I know next to nothing about compiler implementation.
Sep 17 2015
Kagamin <spam here.lot> writes:
On Thursday, 17 September 2015 at 15:45:10 UTC, Chris wrote:
 I suppose it's an area most people (including myself) shy away 
 from. I know next to nothing about compiler implementation.
Sometimes it's just diagnosing test failures.
Sep 18 2015