www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Policy for exposing range structs

reply Seb <seb wilzba.ch> writes:
If I understand it correctly, the current policy in Phobos is 
that range methods should use static nested structs to avoid the 
name clutter and document the capabilities of the returned ranges 
in the documentation.
However there are a lot of old functions that still use public 
structs, which leads to the rather confusing documentation output 
in std.range:

Chunks · chunks · Cycle · cycle ... evenChunks · EvenChunks · 
FrontTransversal · frontTransversal · ... · indexed · Indexed · 
... · lockstep · Lockstep · .. · Recurrence · recurrence · 
RefRange · refRange · repeat · Repeat · ... · sequence · Sequence 
· SortedRange · ... · Take · take · ... Transversal · transversal 
· TransverseOptions · zip · Zip

(off-topic: it's quite interesting to see that sometimes the 
structs are before the method and sometimes after it)

Two arguments to keep exposing the structs are that (1) an API 
user can explicitly specify the return type (in contrast to auto) 
and (2) one can see the capabilities of the struct in the 
documentation.

There are many cases where methods in these structs are optional 
and depend on the capabilities of the input range (e.g. backward, 
length, random access, ...). I could imagine that

1) We rework ddoc, s.t. it doesn't list these structs in the 
overview and adds the struct methods to the function (could get 
ugly)
2) We deprecate those exposed structs and make them 
private/nested (does anyone know whether it's common to depend on 
those structs?)

What are your thoughts on this matter?

(This is a follow-up from a short discussion with Jakob Ovrum on 
github [1].)

[1] https://github.com/D-Programming-Language/phobos/pull/4027
Mar 25 2016
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/25/16 7:37 AM, Seb wrote:
 If I understand it correctly, the current policy in Phobos is that range
 methods should use static nested structs to avoid the name clutter and
 document the capabilities of the returned ranges in the documentation.
 However there are a lot of old functions that still use public structs,
 which leads to the rather confusing documentation output in std.range:

 Chunks · chunks · Cycle · cycle ... evenChunks · EvenChunks ·
 FrontTransversal · frontTransversal · ... · indexed · Indexed · .... ·
 lockstep · Lockstep · .. · Recurrence · recurrence · RefRange · refRange
 · repeat · Repeat · ... · sequence · Sequence · SortedRange · ... ·
Take
 · take · ... Transversal · transversal · TransverseOptions · zip · Zip

 (off-topic: it's quite interesting to see that sometimes the structs are
 before the method and sometimes after it)

 Two arguments to keep exposing the structs are that (1) an API user can
 explicitly specify the return type (in contrast to auto) and (2) one can
 see the capabilities of the struct in the documentation.

 There are many cases where methods in these structs are optional and
 depend on the capabilities of the input range (e.g. backward, length,
 random access, ...). I could imagine that

 1) We rework ddoc, s.t. it doesn't list these structs in the overview
 and adds the struct methods to the function (could get ugly)
 2) We deprecate those exposed structs and make them private/nested (does
 anyone know whether it's common to depend on those structs?)

 What are your thoughts on this matter?
We should actually be moving *away from* voldemort types: https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com -Steve
Mar 25 2016
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 25 March 2016 at 14:07:41 UTC, Steven Schveighoffer 
wrote:
 We should actually be moving *away from* voldemort types:
Indeed, they basically suck. Really, I think the structs ought to be where the bulk of the documentation is, and the function is just listed as a convenience method for creating them. That's really how it works anyway.
Mar 25 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/25/2016 10:30 AM, Adam D. Ruppe wrote:
 On Friday, 25 March 2016 at 14:07:41 UTC, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:
Indeed, they basically suck.
I love those things! -- Andrei
Mar 25 2016
next sibling parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Saturday, March 26, 2016 01:24:11 Andrei Alexandrescu via Digitalmars-d 
wrote:
 On 03/25/2016 10:30 AM, Adam D. Ruppe wrote:
 On Friday, 25 March 2016 at 14:07:41 UTC, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:
Indeed, they basically suck.
I love those things! -- Andrei
They're very cool in principle but have some issues in practice. As long as we can reasonably fix those issues, I think that we should continue to use them, but if we can't, then we'll have to reconsider using them. But I hope that we can solve the issues. It would certainly be a shame if we couldn't. - Jonathan M Davis
Mar 25 2016
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/26/16 1:24 AM, Andrei Alexandrescu wrote:
 On 03/25/2016 10:30 AM, Adam D. Ruppe wrote:
 On Friday, 25 March 2016 at 14:07:41 UTC, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:
Indeed, they basically suck.
I love those things! -- Andrei
I prefer voldemort types too! To repeat all the template machinery in an external struct is annoying. So nice to just put "struct X" inside the function. But clearly, not worth the pain if this can't be fixed. -Steve
Mar 26 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:

 https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com

 -Steve
Has this bug been submitted? -- Andrei
Mar 25 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/25/16 11:07 AM, Andrei Alexandrescu wrote:
 On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:

 https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com
Has this bug been submitted? -- Andrei
I'm not sure it's a bug that can be fixed. It's caused by the design of the way template name mangling is included. I can submit a general "enhancement", but I don't know what it would say? Make template mangling more efficient? :) I suppose having a bug report with a demonstration of why we should change it is a good thing. I'll add that. -Steve
Mar 25 2016
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Mar 25, 2016 at 11:40:11AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 3/25/16 11:07 AM, Andrei Alexandrescu wrote:
On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
We should actually be moving *away from* voldemort types:

https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com
Has this bug been submitted? -- Andrei
I'm not sure it's a bug that can be fixed. It's caused by the design of the way template name mangling is included. I can submit a general "enhancement", but I don't know what it would say? Make template mangling more efficient? :) I suppose having a bug report with a demonstration of why we should change it is a good thing. I'll add that.
[...] We've been talking about compressing template symbols a lot recently, but there's a very simple symbol size reduction that we can do right now: most of the templates in Phobos are eponymous templates, and under the current mangling scheme that means repetition of the template name and the eponymous member in the symbol. My guess is that most of the 4k symbol bloats come from eponymous templates. In theory, a single character (or something in that vicinity) ought to be enough to indicate an eponymous template. That should cut down symbol size significantly (I'm guessing about 30-40% reduction at a minimum, probably more in practice) without requiring a major overhaul of the mangling scheme. T -- Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
Mar 25 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/25/16 12:18 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Mar 25, 2016 at 11:40:11AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 3/25/16 11:07 AM, Andrei Alexandrescu wrote:
 On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:

 https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com
Has this bug been submitted? -- Andrei
I'm not sure it's a bug that can be fixed. It's caused by the design of the way template name mangling is included. I can submit a general "enhancement", but I don't know what it would say? Make template mangling more efficient? :) I suppose having a bug report with a demonstration of why we should change it is a good thing. I'll add that.
[...] We've been talking about compressing template symbols a lot recently, but there's a very simple symbol size reduction that we can do right now: most of the templates in Phobos are eponymous templates, and under the current mangling scheme that means repetition of the template name and the eponymous member in the symbol. My guess is that most of the 4k symbol bloats come from eponymous templates. In theory, a single character (or something in that vicinity) ought to be enough to indicate an eponymous template. That should cut down symbol size significantly (I'm guessing about 30-40% reduction at a minimum, probably more in practice) without requiring a major overhaul of the mangling scheme.
I don't think it's that simple. For example: auto foo(T)(T t) Needs to repeat T (whatever it happens to be) twice -- once for the template foo, and once for the function parameter. If foo returns an internally defined type that can be passed to foo: x.foo.foo.foo.foo Each nesting multiplies the size of the symbol by 2 (at least, maybe even 3). So it's exponential growth. Even if you compress it to one character, having a chain of, say, 16 calls brings you to 65k characters for the symbol. We need to remove the number of times the symbol is repeated, via some sort of substitution. Added the bug report. Take a look and see what you think. https://issues.dlang.org/show_bug.cgi?id=15831 -Steve
Mar 25 2016
parent reply Anon <anon anon.anon> writes:
On Friday, 25 March 2016 at 18:20:12 UTC, Steven Schveighoffer 
wrote:
 On 3/25/16 12:18 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Mar 25, 2016 at 11:40:11AM -0400, Steven Schveighoffer 
 via Digitalmars-d wrote:
 On 3/25/16 11:07 AM, Andrei Alexandrescu wrote:
 On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:

 https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com
Has this bug been submitted? -- Andrei
I'm not sure it's a bug that can be fixed. It's caused by the design of the way template name mangling is included. I can submit a general "enhancement", but I don't know what it would say? Make template mangling more efficient? :) I suppose having a bug report with a demonstration of why we should change it is a good thing. I'll add that.
[...] We've been talking about compressing template symbols a lot recently, but there's a very simple symbol size reduction that we can do right now: most of the templates in Phobos are eponymous templates, and under the current mangling scheme that means repetition of the template name and the eponymous member in the symbol. My guess is that most of the 4k symbol bloats come from eponymous templates. In theory, a single character (or something in that vicinity) ought to be enough to indicate an eponymous template. That should cut down symbol size significantly (I'm guessing about 30-40% reduction at a minimum, probably more in practice) without requiring a major overhaul of the mangling scheme.
I don't think it's that simple. For example: auto foo(T)(T t) Needs to repeat T (whatever it happens to be) twice -- once for the template foo, and once for the function parameter. If foo returns an internally defined type that can be passed to foo: x.foo.foo.foo.foo Each nesting multiplies the size of the symbol by 2 (at least, maybe even 3). So it's exponential growth. Even if you compress it to one character, having a chain of, say, 16 calls brings you to 65k characters for the symbol. We need to remove the number of times the symbol is repeated, via some sort of substitution. Added the bug report. Take a look and see what you think. https://issues.dlang.org/show_bug.cgi?id=15831 -Steve
These repetitions could be eliminated relatively easily (from a user's perspective, anyway; things might be more difficult in the actual implementation). Two changes to the mangling: 1) `LName`s of length 0 (which currently cannot exist) mean to repeat the previous `LName` of the current symbol. 2) N `Number` is added as a valid `Type`, meaning "Type Back Reference". Basically, all instances of a struct/class/interface/enum type in a symbol's mangling get counted (starting from zero), and subsequent instances of that type can be referred to by N0, N1, N2, etc. So given: ``` module mod; struct Foo; Foo* func(Foo* a, Foo* b); ``` `func` currently mangles as: _D3mod4funcFPS3mod3FooPS3mod3FooZPS3mod3Foo It would instead be mangled as: _D3mod4funcFPS3mod3FooPN0ZPN0 Nested templates declarations would get numbered depth first as follows: S7!(S2!(S0, S1), S6!(S3, S5!(S4))) I have another idea for reducing the byte impact of template string value parameters, but it is a bit more complicated and I need to finish de-bugging and optimizing some code to make sure it will work as well as I think. I'll post more on that soon, I suspect.
Mar 25 2016
next sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/25/16 3:04 PM, Anon wrote:
 These repetitions could be eliminated relatively easily (from a user's
 perspective, anyway; things might be more difficult in the actual
 implementation).

 Two changes to the mangling:
[snip] Please add these ideas to the bug report! I'm not sure if it completely fixes this or not, but it's worth exploring options. This is similar to how I was thinking we should approach this. -Steve
Mar 25 2016
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Friday, 25 March 2016 at 19:04:46 UTC, Anon wrote:
 Two changes to the mangling:

 1) `LName`s of length 0 (which currently cannot exist) mean to 
 repeat the previous `LName` of the current symbol.

 2) N `Number` is added as a valid `Type`, meaning "Type Back 
 Reference". Basically, all instances of a 
 struct/class/interface/enum type in a symbol's mangling get 
 counted (starting from zero), and subsequent instances of that 
 type can be referred to by N0, N1, N2, etc.
I have implemented the second part (simplest I thought) to see what it would bring (in LDC). 'Q' seems unused, so I used that instead of 'N' ;) I'm not 100% certain of the implementation, as it relies on DMD using the same pointer values for recurring types but it seemed to work with complicated code. The core.demangle demangler was relatively easy to upgrade for repeated types. Unfortunately, and perhaps expectedly, it did not give a large size reduction / speed boost. Not very statistically sound results though ;) The change is relatively easy, I think, so perhaps it is still worthwhile pursuing. An implementation issue that changes in mangling have to deal with: One surprising thing was that std.traits uses the mangled name in `ParameterStorageClassTuple`. So std.traits contains its own mini mangling parser, and also calls .mangleof on Types, to skip to the next function argument in the mangled function string. The "type back reference" scheme means that Type.mangleof may not correspond to how the Type embedded in the mangling of a function, and so I had to upgrade std.traits with its own parser (I copied core.demangle.Demangle, and removed all output complications). For changes in mangling, it'd be good if std.traits.ParameterStorageClassTuple could be rewritten to not use the mangled function name + string processing.
Apr 01 2016
parent reply Johan Engelen <j j.nl> writes:
On Friday, 1 April 2016 at 11:00:29 UTC, Johan Engelen wrote:
 Unfortunately, and perhaps expectedly, it did not give a large 
 size reduction / speed boost. Not very statistically sound 
 results though ;)
The times I measured are not including linking (!). But I think the small difference in file size does not predict much improved link times. Meanwhile, I've implemented hashing of function names and other symbols *for the backend*, giving an object file size reduction of ~25% (hashing everything larger than 100 chars) for my current testcase (251MB -> 189MB). Hashing symbols in the FE is not possible with my testcase because of std.traits.ParameterStorageClassTuple... :/
 For changes in mangling, it'd be good if 
 std.traits.ParameterStorageClassTuple could be rewritten to not 
 use the mangled function name + string processing.
std.traits.ParameterStorageClassTuple is a major obstacle for aggressive mangling changes. Perhaps someone has the time and skills to rewrite it if possible? (My D-fu is not really up to that task) I'll try and keep you posted with some more measurements, to get a feel for what matters / what doesn't matter too much.
Apr 01 2016
parent reply Johan Engelen <j j.nl> writes:
On Friday, 1 April 2016 at 14:46:42 UTC, Johan Engelen wrote:
 Meanwhile, I've implemented hashing of function names and other 
 symbols *for the backend*, giving an object file size reduction 
 of ~25% (hashing everything larger than 100 chars) for my 
 current testcase (251MB -> 189MB).
 Hashing symbols in the FE is not possible with my testcase 
 because of std.traits.ParameterStorageClassTuple... :/
See my PR for LDC: https://github.com/ldc-developers/ldc/pull/1445 "This adds MD5 hashing of symbol names that are larger than threshold set by -hashthres. What is very unfortunate is that std.traits depends on the mangled name, doing string parsing of the mangled name of symbols to obtain symbol traits. This means that mangling cannot be changed (dramatically, like hashing) at a high level, and the hashing has to be done on a lower level. Hashed symbols look like this: _D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ ddemangle gives: one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo Meaning: this symbol is defined in module one.two.three on line 34. The identifier is foo and is contained in the struct or class Result. Symbols that may be hashed: - functions - struct/class initializer - vtable - typeinfo (needed surgery inside FE code) The feature is experimental, and has been tested on Weka.io's codebase. Compilation with -hashthres=1000 results in a binary that is half the size of the original (201MB vs. 461MB). I did not observe a significant difference in total build times. Hash threshold of 8000 gives 229MB, 800 gives 195MB binary size: there is not much gain after a certain hash threshold. Linking Weka's code fails with a threshold of 500: phobos contains a few large symbols (one larger than 8kb!) and this PR currently does not disable hashing of symbols that are inside phobos, hence "experimental". Future work could try to figure out whether a symbol is inside phobos or not."
Apr 19 2016
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 19 April 2016 at 14:44:12 UTC, Johan Engelen wrote:
 What is very unfortunate is that std.traits depends on the 
 mangled name, doing string parsing of the mangled name of 
 symbols to obtain symbol traits.
I'm surprised that's how it works. I just assumed there was some __traits call for that.
 Compilation with -hashthres=1000 results in a binary that is 
 half the size of the original (201MB vs. 461MB).
Seems promising!
Apr 19 2016
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/19/16 10:44 AM, Johan Engelen wrote:

 The feature is experimental, and has been tested on Weka.io's codebase.
 Compilation with -hashthres=1000 results in a binary that is half the
 size of the original (201MB vs. 461MB). I did not observe a significant
 difference in total build times.
I'd be surprised link times did not improve. With my trivial test cases, the linker started getting hung up with very long symbols. The compilation itself was quick though. Certainly, cutting exe sizes (this is significantly more than half, BTW) is a good thing in itself. Another thing, however, is memory usage of the compiler. My understanding is that having large symbol names requires more memory. How does that fare? -Steve
Apr 19 2016
next sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
What about using old string compression algorithms on the
mangled name? That should effectively get rid of all the
repeated words.

-- 
Marco
Apr 19 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/19/16 4:50 PM, Marco Leise wrote:
 What about using old string compression algorithms on the
 mangled name? That should effectively get rid of all the
 repeated words.
It's somewhat recursive, so I don't know how well string compression will work. It's possible, definitely. But I think a substitution scheme that is tailored to the issue may prove more effective. Just the fact that removing the struct out of the function itself cleans it up shows that there is a path to make this happen. -Steve
Apr 19 2016
prev sibling parent Johan Engelen <j j.nl> writes:
On Tuesday, 19 April 2016 at 17:59:48 UTC, Steven Schveighoffer 
wrote:
 On 4/19/16 10:44 AM, Johan Engelen wrote:

 The feature is experimental, and has been tested on Weka.io's 
 codebase.
 Compilation with -hashthres=1000 results in a binary that is 
 half the
 size of the original (201MB vs. 461MB). I did not observe a 
 significant
 difference in total build times.
I'd be surprised link times did not improve. With my trivial test cases, the linker started getting hung up with very long symbols. The compilation itself was quick though.
`nm` on my Mac cannot handle the long symbol names, and I have to use llvm-nm :)
 Certainly, cutting exe sizes (this is significantly more than 
 half, BTW) is a good thing in itself.

 Another thing, however, is memory usage of the compiler. My 
 understanding is that having large symbol names requires more 
 memory. How does that fare?
I don't know. Generating the (unhashed) mangled symbol names seems fast. LDC is using so much memory (>6 GB) that I think a few extra megabytes for symbol names is not noticeable (the symbol name is cached for functions; adding caching for other symbol mangled names did not change compile times but I didn't look at memory). What worries me is that std.traits is doing all that string processing for simple things like getting the storage class of a function parameter. When it starts to do CTFE on a string that is 1 MB, that's not good I think.
Apr 20 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/25/2016 11:40 AM, Steven Schveighoffer wrote:
 On 3/25/16 11:07 AM, Andrei Alexandrescu wrote:
 On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:

 https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com
Has this bug been submitted? -- Andrei
I'm not sure it's a bug that can be fixed. It's caused by the design of the way template name mangling is included. I can submit a general "enhancement", but I don't know what it would say? Make template mangling more efficient? :) I suppose having a bug report with a demonstration of why we should change it is a good thing. I'll add that. -Steve
Compression of template names. -- Andrei
Mar 25 2016
next sibling parent reply Anon <anon anon.anon> writes:
On Saturday, 26 March 2016 at 05:22:56 UTC, Andrei Alexandrescu 
wrote:
 On 03/25/2016 11:40 AM, Steven Schveighoffer wrote:
 On 3/25/16 11:07 AM, Andrei Alexandrescu wrote:
 On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
 We should actually be moving *away from* voldemort types:

 https://forum.dlang.org/post/n96k3g$ka5$1 digitalmars.com
Has this bug been submitted? -- Andrei
I'm not sure it's a bug that can be fixed. It's caused by the design of the way template name mangling is included. I can submit a general "enhancement", but I don't know what it would say? Make template mangling more efficient? :) I suppose having a bug report with a demonstration of why we should change it is a good thing. I'll add that. -Steve
Compression of template names. -- Andrei
Would literally not help. The problem in the bug report is recursive expansion of templates creating mangled name length O(2^n) where n is the number of recursive calls. If you compress that to an eighth of its size, you get O(2^(n-3)), which isn't actually fixing things, as that is still O(2^n). The (conceptually) simple change I suggested brings the mangled name length down to O(n). You could compress *that*, but then you are compressing such a small amount of data most compression algorithms will cause the size to grow, not shrink.
Mar 26 2016
parent reply Johan Engelen <j j.nl> writes:
On Saturday, 26 March 2016 at 17:42:06 UTC, Anon wrote:
 The (conceptually) simple change I suggested brings the mangled 
 name length down to O(n).
Hi Anon, I've started implementing your idea. But perhaps you already have a beginning of an implementation? If so, please contact me :) https://github.com/JohanEngelen Thanks, Johan
Mar 31 2016
parent reply Anon <anon anon.anon> writes:
On Thursday, 31 March 2016 at 11:15:18 UTC, Johan Engelen wrote:
 Hi Anon,
   I've started implementing your idea. But perhaps you already 
 have a beginning of an implementation? If so, please contact me 
 :)
 https://github.com/JohanEngelen

 Thanks,
   Johan
No, I haven't started implemented things for that idea. The experiments I did with it were by manually altering mangled names in Vim. I've been spending my D time thinking about potential changes to how template string value parameters are encoded. My code is a bit messy (and not integrated with the compiler at all), but I use a bootstring technique (similar to Punycode[1]) to encode Unicode text using only [a-zA-Z0-9_]. The results are always smaller than base16 and base64 encodings. For plain ASCII text, the encoding tends to grow by a small amount. For text containing larger UTF-8 code points, the encoding usually ends up smaller than the raw UTF-8 string. A couple examples of my encoder at work: --- some_identifier some_identifier_ /usr/include/d/std/stdio.d usrincludedstdstdiod_jqacdhbd Hello, World! HelloWorld_0far4i こんにちは世界 (UTF-8: 21 bytes) XtdCDr5mL02g3rv (15 bytes) --- I still need to clean up the encoder/decoder and iron out some specifics on how this could fit into the mangling, but I should have time to work on this some more later today/tomorrow. [1]: https://en.wikipedia.org/wiki/Punycode
Mar 31 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 31 March 2016 at 16:38:59 UTC, Anon wrote:
 I've been spending my D time thinking about potential changes 
 to how template string value parameters are encoded.
How does it compare to simply gzipping the string and writing it out with base62?
Mar 31 2016
parent reply Anon <anon anon.anon> writes:
On Thursday, 31 March 2016 at 16:46:42 UTC, Adam D. Ruppe wrote:
 On Thursday, 31 March 2016 at 16:38:59 UTC, Anon wrote:
 I've been spending my D time thinking about potential changes 
 to how template string value parameters are encoded.
How does it compare to simply gzipping the string and writing it out with base62?
My encoding is shorter in the typical use case, at least when using xz instead gzip. (xz was quicker/easier to get raw compressed data without a header.) 1= Raw UTF-8, 2= my encoder, 3= `echo -n "$1" | xz -Fraw | base64` --- 1. some_identifier 2. some_identifier_ 3. AQA0c29tZV9pZGVudGlmaWVyAA== 1. /usr/include/d/std/stdio.d 2. usrincludedstdstdiod_jqacdhbd 3. AQAZL3Vzci9pbmNsdWRlL2Qvc3RkL3N0ZGlvLmQa 1. Hello, World! 2. HelloWorld_0far4i 3. AQAMSGVsbG8sIFdvcmxkIQA= 1. こんにちは世界 2. XtdCDr5mL02g3rv 3. AQAU44GT44KT44Gr44Gh44Gv5LiW55WMAA== --- The problem is that compression isn't magical, and a string needs to be long enough and have enough repetition to compress well. If it isn't, compression causes the data to grow, and base64 compounds that. For the sake of fairness, let's also do a larger (compressible) string. Input: 1000 lines, each with the text "Hello World" 1. 12000 bytes 2. 12008 bytes 3. 94 bytes However, my encoding is still fairly compressible, so we *could* route it through the same compression if/when a symbol is determined to be compressible. That yields 114 bytes. The other thing I really like about my encoder is that plain C identifiers are left verbatim visible in the result. That would be especially nice with, e.g., opDispatch. Would a hybrid approach (my encoding, optionally using compression when it would be advantageous) make sense? My encoder already has to process the whole string, so it could do some sort of analysis to estimate how compressible the result would be. I don't know what that would look like, but it could work. Alternately, we could do the compression on whole mangled names, not just the string values, but I don't know how desirable that is.
Mar 31 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 31 March 2016 at 17:30:44 UTC, Anon wrote:
 My encoding is shorter in the typical use case
Yeah, but my thought is the typical use case isn't actually the problem - it is OK as it is. Longer strings are where it gets concerning to me.
 Would a hybrid approach (my encoding, optionally using 
 compression when it would be advantageous) make sense?
Yeah, that might be cool too.
 Alternately, we could do the compression on whole mangled 
 names, not just the string values, but I don't know how 
 desirable that is.
There are often a lot of repeats in there... so maybe.
Mar 31 2016
parent reply Anon <anon anon.anon> writes:
On Thursday, 31 March 2016 at 17:52:43 UTC, Adam D. Ruppe wrote:
 Yeah, but my thought is the typical use case isn't actually the 
 problem - it is OK as it is. Longer strings are where it gets 
 concerning to me.
Doubling the size of UTF-8 (the effect of the current base16 encoding) bothers me regardless of string length. Especially when use of __MODULE__ and/or __FILE__ as template arguments seems to be fairly common. Having thought about it a bit more, I am now of the opinion that super-long strings have no business being in template args, so we shouldn't cater to them. The main case with longer strings going into template arguments I'm aware of is for when strings will be processed, then fed to `mixin()`. However, that compile time string processing is much better served with a CTFE-able function using a Rope for manipulating the string until it is finalized into a normal string. If you are doing compile-time string manipulation with templates, the big symbol is the least of your worries. The repeated allocation and reallocation will quickly make your code uncompilable due to soaring RAM usage. The same is (was?) true of non-rope CTFE string manipulation. Adding a relatively memory-intensive operation like compression isn't going to help in that case. Granted, the language shouldn't have a cut-off for string length in template arguments, but if you load a huge string as a template argument, I think something has gone wrong in your code. Catering to that seems to me to be encouraging it, despite the existence of much better approaches. The only other case I can think of where you might want a large string as a template argument is something like: ``` struct Foo(string s) { mixin(s); } Foo!q{...} foo; ``` But that is much better served as something like: ``` mixin template Q() { mixin(q{...}); // String doesn't end up in mangled name } struct Foo(alias A) { mixin A; } Foo!Q foo; ``` Or better yet (when possible): ``` mixin template Q() { ... // No string mixin needed } struct Foo(alias A) { mixin A; } Foo!Q foo; ``` The original mangling discussion started from the need to either fix a mangling problem or officially discourage Voldemort types. Most of the ideas we've been discussing and/or working on have been toward keeping Voldemort types, since many here want them. I'm not sure what use case would actually motivate compressing strings/symbols. My motivations for bootstring encoding: * Mostly care about opDispatch, and use of __FILE__/__MODULE__ as compile-time parameters. Symbol bloat from their use isn't severe, but it could be better. * ~50% of the current mangling size for template string parameters * Plain C identifier strings (so, most identifiers) will end up directly readable in the mangled name even without a demangler * Retains current ability to access D symbols from C (in contrast to ideas that would use characters like '$' or '?') * I already needed bootstring encoding for an unrelated project, and figured I could offer to share it with D, since it seems like it would fit here, too
Mar 31 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 31 March 2016 at 20:12:34 UTC, Anon wrote:
 Having thought about it a bit more, I am now of the opinion 
 that super-long strings have no business being in template 
 args, so we shouldn't cater to them.
meh, if it is allowed, it is going to happen. Why make it worse when there's so little cost in making it better?
 The main case with longer strings going into template arguments 
 I'm aware of is for when strings will be processed, then fed to 
 `mixin()`.
I often don't actually modify the string at all and by putting the string as a template argument, it enables a kind of compile-time memoization like I talked about here a short while ago: http://stackoverflow.com/a/36251271/1457000 The string may be exceedingly if imported from a file or generated externally and cached: MyType!(import("definition.txt")) foo; enum foo = ctfeFunction(); MyType!foo test; I've never had a huge problem with this in practice, but I've never had a huge problem with big names in practice at all either. I can imagine both though. (what bothers me more than the mangle actually is the compiler error messages showing all those strings. Gross. I'd like to see XML error messages to make displaying them easier. But that's a separate topic.)
 The original mangling discussion started from the need to 
 either fix a mangling problem or officially discourage 
 Voldemort types.
Yes, indeed, that's probably the bigger problem.
 * Retains current ability to access D symbols from C (in 
 contrast to ideas that would use characters like '$' or '?')
$ is actually a valid identifier character in C (and quite a few other languages). You can use it in the linker as well as in C source code.
Mar 31 2016
parent Anon <anon anon.anon> writes:
On Thursday, 31 March 2016 at 20:40:03 UTC, Adam D. Ruppe wrote:
 meh, if it is allowed, it is going to happen. Why make it worse 
 when there's so little cost in making it better?
Define "little cost". Whatever compression algorithm chosen will need support added to any/all tools that want to demangle D. GDB and LLDB currently link to liblzma (on my system, at least. Configurations may vary). nm and objdump don't link to any compression lib. Good luck convincing binutils to add compression dependencies like that for D when they don't need them any other mangling schemes. And no, ddemangle isn't a solution here, as then all those projects would need to be able to refer to it, and the best way for them to do that is to bundle it. Since ddemangle is written in D, that would mean binutils would suddenly depend on having a working D compiler. That won't happen in the next decade. Also, any D code that uses core.demangle for any reason would suddenly depend on that compression library. I'm not even fully convinced that my bootstring idea is low enough cost, and it's fairly simple, fast, and memory efficient compared to any compression algorithm.
 I often don't actually modify the string at all and by putting 
 the string as a template argument, it enables a kind of 
 compile-time memoization like I talked about here a short while 
 ago: http://stackoverflow.com/a/36251271/1457000

 The string may be exceedingly if imported from a file or 
 generated externally and cached:

 MyType!(import("definition.txt")) foo;

 enum foo = ctfeFunction();

 MyType!foo test;
Just assigning to an enum gets you memoization (tested on LDC w/ DFE 2.68.2). I don't see how the template factors into this. Now, yes, if you call the function directly multiple times assigning to different enums, it won't memoize those. And it doesn't work lazily how the SO question asked for, but this does: enum lazily(string name: "foo") = ctfeFunction(); If you don't refer to lazily!"foo", ctfeFunction() never gets called. If you do, it gets called once, regardless of how many times you use lazily!"foo". That gives you lazy memoization of any CTFEable function without ever needing the function parameters to become template parameters. I'm not sure what MyType is, but that looks like a prime candidate for my previous post's mixin examples. If not, you could use "definition.txt" as its parameter, and have it import() as an implementation detail.
 $ is actually a valid identifier character in C
Nope. $ as an identifier character is a commonly supported extension, but code that uses it doesn't compile with `clang -std=c11 -Werror -pedantic`.
Mar 31 2016
prev sibling parent reply David Nadlinger <code klickverbot.at> writes:
On Saturday, 26 March 2016 at 05:22:56 UTC, Andrei Alexandrescu 
wrote:
 Compression of template names. -- Andrei
Compression in the usual sense won't help. Sure, it might reduce the object file size, but the full string will again have to be generated first, still requiring absurd amounts time and space. The latter is definitely not negligible for symbol names of several hundreds of kilobytes; it shows up prominently in compiler profiles of affected Weka builds. — David
Mar 27 2016
parent reply Liran Zvibel <liran weka.io> writes:
On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
 Compression in the usual sense won't help. Sure, it might 
 reduce the object file size, but the full string will again 
 have to be generated first, still requiring absurd amounts time 
 and space. The latter is definitely not negligible for symbol 
 names of several hundreds of kilobytes; it shows up prominently 
 in compiler profiles of affected Weka builds.
We love Voldemort types at Weka, and use them a lot in our non-gc-allocating ranges and algorithm libraries. Also, we liberally nest templates inside of other templates. I don't think we can do many of the things we do if we had to define everything at module level. This flexibility is amazing for us and part of the reason we love D. But, as David said -- it comes with a great price for us. I just processed our biggest executable, and came up with the following numbers: total symbols: 99649 Symbols longer than 1k: 9639 Symbols longer than 500k: 102 Symbols longer than 1M: 62. The longest symbols are about 5M bytes! This affects our exe sizes in a terrible way, and also increases our compile and link times considerably. I will only be able to come up with statistics of how much time was wasted due to too-long-symbols after we fix it, but obviously this is a major problem for us. I think we should try the solution proposed by Anon, as it has a good possibility of saving quite a bit. It's important to make sure that when a template is given as a template parameter, the complete template is treated as the LName. Thinking about the compression idea by Andrei, I think we get such long names since we have huge symbols that are being passed as Voldemort names to template parameters. Then we repeat the huge symbols several times in the new template. Think of a .5M symbol passed few times to a template, this is probably how we get to 5M size symbols. This could end up being too complex, but if we assign "huffman coding" like names to the complete template names in a module scope (lets say, only if the template name is longer than 30 bytes), we then will be able to replace a very long string by the huffman coded version coupled with the LName+Number idea above, we will be able to shorten symbol names considerably. solution, and then we can see if we also have to recursively couple it with huffman-coding of the results template names. Liran
Mar 30 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/30/16 3:19 PM, Liran Zvibel wrote:
 On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
 Compression in the usual sense won't help. Sure, it might reduce the
 object file size, but the full string will again have to be generated
 first, still requiring absurd amounts time and space. The latter is
 definitely not negligible for symbol names of several hundreds of
 kilobytes; it shows up prominently in compiler profiles of affected
 Weka builds.
We love Voldemort types at Weka, and use them a lot in our non-gc-allocating ranges and algorithm libraries. Also, we liberally nest templates inside of other templates. I don't think we can do many of the things we do if we had to define everything at module level. This flexibility is amazing for us and part of the reason we love D.
Voldemort types are what cause the bloat, templates inside templates aren't as much of a problem. It's because the Voldemort type has to include in its symbol name at least twice, and I think 3 times actually (given the return type), the template parameter/function parameter types of the function it resides in. If the template is just a template, it's just included once. This is why moving the type outside the function is effective at mitigation. It's linear growth vs. exponential. I too like Voldemort types, but I actually found moving the types outside the functions quite straightforward. It's just annoying to have to repeat the template parameters. If you make them private, then you can simply avoid all the constraints. It's a bad leak of implementation, since now anything in the file has access to that type directly, but it's better than the issues with voldemort types. See the update to my iopipe library here: https://github.com/schveiguy/iopipe/commit/1b0696dc82fce500c6b314ec3d8e5e11e0c1bcd7 This one commit made my example program 'convert' (https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d) save over 90% binary size (went from 10MB to <1MB). This also calmed down some REALLY horrible stack traces when I was debugging. As in, I could actually understand what function it was talking about, and it didn't take 10 seconds to print stack trace.
 But, as David said -- it comes with a great price for us.

 I just processed our biggest executable, and came up with the following
 numbers:
 total symbols: 99649
 Symbols longer than 1k: 9639
 Symbols longer than 500k: 102
 Symbols longer than 1M: 62. The longest symbols are about 5M bytes!

 This affects our exe sizes in a terrible way, and also increases our
 compile and link times considerably. I will only be able to come up with
 statistics of how much time was wasted due to too-long-symbols after we
 fix it, but obviously this is a major problem for us.
From my testing, it doesn't take much to get to the point where the linker is unusable. A simple struct when nested in 15 calls to a function makes the linker take an unreasonable amount of time (over 1.5 minutes, I didn't wait to see how long). See my bug report for details. Another factor in the name length is the module name which is included in every type and function. So you have a factor like 3^15 for the name, but then you multiply this by the module names as well.
 I think we should try the solution proposed by Anon, as it has a good
 possibility of saving quite a bit.
 It's important to make sure that when a template is given as a template
 parameter, the complete template is treated as the LName.
I hope this is given serious thought, looks like someone has already started implementation. Anon, it appears that your mechanism has been well received by a few knowledgeable people here. I encourage you to solidify your proposal in a DIP (D improvement proposal) here: http://wiki.dlang.org/DIPs. -Steve
Mar 31 2016
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 31 March 2016 at 13:10:49 UTC, Steven Schveighoffer 
wrote:
 Voldemort types are what cause the bloat, templates inside 
 templates aren't as much of a problem.
So here's an idea of other things don't work out: voldemort types don't have a name that can be said... that could be true in the mangle too. We could potentially just hash it to some fixed length. Take the existing name, SHA1 it, and call the mangle function_containing_type$that_hash. Demangling the inside is useless anyway, so we lose nothing from that. For example, a function foo.bar() returning a voldemort type might have the mangle: _D3foo3barv$55ca6286e3e4f4fba5d0448333fa99fc5a404a73 where that's the hash of whatever filth the compiler makes up for it. and methods on the voldemort type would start with that too: _D503foo3barv$55ca6286e3e4f4fba5d0448333fa99fc5a404a735empty That's getting messy to see as an example, but it took the name of the function + the voldemort as a whole to be the public name of the type. A demangler could scan that for the $ and recognize it as ugliness and slice it off, printing the name as a somewhat human-readable foo.bar().empty perhaps, so we can see it is a method on the return value of that function. It isn't completely impossible like demanging a whole hash; should be short in the binary and readable enough to make sense of in a stack trace. I actually recall something like this being done at some point in the past, but I don't remember the details. I think it just truncated the name though, it didn't attempt to remain semi-reversible. Of course, a chain of voldemort types will still include that type as a template argument in the next call, so names as a whole can still be very long, but how long are your chains? I can see this becoming a 10 KB long name in extreme circumstances (still yikes) but not megabytes. And, still, having to run over the names to build the hash is going to be a memory/speed thing in the compiler, but SHA1ing a few megabytes isn't as slow as writing it out to disk over and over again. Another thing to consider is what my D to Javascript thing did: abbreviate EVERYTHING in the object file. You can't demangle that at all, but it leads to much smaller binaries. This probably isn't that useful for D in general, but might solve Weka's problem since they can use internal hacks.
Mar 31 2016
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/31/16 9:38 AM, Adam D. Ruppe wrote:
 On Thursday, 31 March 2016 at 13:10:49 UTC, Steven Schveighoffer wrote:
 Voldemort types are what cause the bloat, templates inside templates
 aren't as much of a problem.
So here's an idea of other things don't work out: voldemort types don't have a name that can be said... that could be true in the mangle too. We could potentially just hash it to some fixed length. Take the existing name, SHA1 it, and call the mangle function_containing_type$that_hash. Demangling the inside is useless anyway, so we lose nothing from that.
Ugh, let's try the huffman coding thing first :) Stack traces would be unusable. However, this does seem promising if that doesn't work out. The hash would definitely solve the linking and binary size issue. A possible thing the compiler *could* do is place inside the binary a hash-to-actual-symbol table that the exception printer can utilize to print a nicer stack trace... -Steve
Mar 31 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 31 March 2016 at 14:00:38 UTC, Steven Schveighoffer 
wrote:
 Ugh, let's try the huffman coding thing first :)
Do that after and along with!
 Stack traces would be unusable.
That's why I'd keep the function name outside the hash. The inner spam wouldn't be readable (not like a megabyte long name is readable anyway...), but the function name (which includes template arguments*) still is and that's probably the most useful part anyway. * I just thought of another thing though.... string arguments to templates are included in the mangle, and with CTFE mixin stuff, they can become VERY long. void foo(string s)() {} pragma(msg, foo!"hi there, friend".mangleof); _D1p48__T3fooVAyaa16_68692074686572652c20667269656e64Z3fooFNaNbNiNfZv That "68692074686572652c20667269656e64" portion of it is the string represented as hexadecimal. If you do a CTFE thing, the code string you pass in may be kept in ALL the names generated from it. We should probably do something about these too. Simply gzipping before converting to hex is a possible option if we want it reversible. Or, of course, hashing too if we don't care about that. And IMO a huge string in a stack trace is unreadable anyway... I'd be tempted to say if the string is longer than like 64 chars, just hash it. But this is debatable.
 A possible thing the compiler *could* do is place inside the 
 binary a hash-to-actual-symbol table that the exception printer 
 can utilize to print a nicer stack trace...
Indeed. I actually just emailed Liran with this suggestion as well. I don't think it is ideal for D in general, but for an internal project, it might solve some problems.
Mar 31 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/31/16 10:30 AM, Adam D. Ruppe wrote:
 On Thursday, 31 March 2016 at 14:00:38 UTC, Steven Schveighoffer wrote:
 Ugh, let's try the huffman coding thing first :)
Do that after and along with!
 Stack traces would be unusable.
That's why I'd keep the function name outside the hash. The inner spam wouldn't be readable (not like a megabyte long name is readable anyway...), but the function name (which includes template arguments*) still is and that's probably the most useful part anyway.
Moving types outside the function actually results in a pretty readable stack trace. I noticed the stack trace printer just gives up after so many characters anyway. -Steve
Mar 31 2016
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/31/16 9:38 AM, Adam D. Ruppe wrote:

 Of course, a chain of voldemort types will still include that type as a
 template argument in the next call, so names as a whole can still be
 very long, but how long are your chains? I can see this becoming a 10 KB
 long name in extreme circumstances (still yikes) but not megabytes.
If you are hashing anyway, just hash the hash :) In other words, this function: auto foo(T)(T t) { static struct R {} return R; } would result in a return type of a hash, with no care about whether T was a hashed symbol or not. This puts a cap on the size of the name, it would never get that big. -Steve
Mar 31 2016
prev sibling next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 31 March 2016 at 13:10:49 UTC, Steven Schveighoffer 
wrote:
 I too like Voldemort types, but I actually found moving the 
 types outside the functions quite straightforward. It's just 
 annoying to have to repeat the template parameters. If you make 
 them private, then you can simply avoid all the constraints. 
 It's a bad leak of implementation, since now anything in the 
 file has access to that type directly, but it's better than the 
 issues with voldemort types.
If you move anything with a Voldemort type to their own modules, then do what you say, then there is no longer an access issue. Leads to a proliferation of modules. I can think of another alternative, but it is probably a needless complexity. Suppose there is a protection attribute with the property that things in the module can only access it if given permission explicitly. For instance, taking the D wiki Voldemort type example and modifying it to your approach would give struct TheUnnameable { int value; this(int x) {value = x;} int getValue() { return value; } } auto createVoldemortType(int value) { return TheUnnameable(value); } The Unnameable would then be changed to explicit struct TheUnnameable { explicit(createVoldemortType); int value; this(int x) {value = x;} int getValue() { return value; } } where explicit used as a protection attribute would restrict TheUnnameable to only things where the explicit function gives explicit permission.
Mar 31 2016
prev sibling parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, March 31, 2016 09:10:49 Steven Schveighoffer via Digitalmars-d 
wrote:
 On 3/30/16 3:19 PM, Liran Zvibel wrote:
 On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
 Compression in the usual sense won't help. Sure, it might reduce the
 object file size, but the full string will again have to be generated
 first, still requiring absurd amounts time and space. The latter is
 definitely not negligible for symbol names of several hundreds of
 kilobytes; it shows up prominently in compiler profiles of affected
 Weka builds.
We love Voldemort types at Weka, and use them a lot in our non-gc-allocating ranges and algorithm libraries. Also, we liberally nest templates inside of other templates. I don't think we can do many of the things we do if we had to define everything at module level. This flexibility is amazing for us and part of the reason we love D.
Voldemort types are what cause the bloat, templates inside templates aren't as much of a problem. It's because the Voldemort type has to include in its symbol name at least twice, and I think 3 times actually (given the return type), the template parameter/function parameter types of the function it resides in. If the template is just a template, it's just included once. This is why moving the type outside the function is effective at mitigation. It's linear growth vs. exponential.
This makes me wonder if there would be a sane way to basically treat the Voldemort type as if it were outside the function it was declared from a mangling standpoint while still treating its accessibility the same way. Maybe that wouldn't work due to potential name clashes, but if the problem is purely because the type is declared inside the function, I would think that it would make sense to try and find a way to treat it more like it was declared outside the function when dealing with the name mangling. Then - in theory - you'd get the benefits of using a Voldemort type while getting the name mangling cost that you get when you declare a private type outside of the function. - Jonathan M Davis
Mar 31 2016
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Mar 25, 2016 at 11:37:24AM +0000, Seb via Digitalmars-d wrote:
 If I understand it correctly, the current policy in Phobos is that
 range methods should use static nested structs to avoid the name
 clutter and document the capabilities of the returned ranges in the
 documentation.  However there are a lot of old functions that still
 use public structs, which leads to the rather confusing documentation
 output in std.range:
Actually, certain types of range structs *have* to be in module space rather than inside the function, because of compiler limitations / language restrictions. An example that comes to mind is a range with an alias parameter -- I forget the details, but basically there are some cases where this will not work correctly with certain uses if it's declared inside the function, so it has to be in module space. This should probably be investigated more, though. It seems to be a grey area of semantics that could use a DIP. [...]
 (off-topic: it's quite interesting to see that sometimes the structs
 are before the method and sometimes after it)
I think it's a recent change in convention that recommends declaring such structs after the function. In any case, declaration order in D is (generally -- mostly in module space) not important, so it's not a big deal.
 Two arguments to keep exposing the structs are that (1) an API user
 can explicitly specify the return type (in contrast to auto) and (2)
 one can see the capabilities of the struct in the documentation.
I think (1) is moot, because there's always typeof and ReturnType. In fact, I'd argue that it's better to use typeof / ReturnType, because sometimes the user *shouldn't* need to know exactly what the template arguments are. For example, if some of the template arguments come from IFTI, where the exact types may not be immediately obvious from the user's POV, or if there are default template parameters that are a pain to spell out every single time. I argue that (2) is a bad idea, because a range ought to be opaque. The whole point of the range API is that it should be possible for the implementation to change drastically or be replaced by a different implementation, yet user code should still continue to Just Work(tm) without any modifications. As such, user code should not depend on implementation details of the range, and really shouldn't know anything else about the range other than what is specified in the range API. All the user ought to know is whether it's an input range, forward range, bidirectional range, etc.. Anything more than that leads to breakage when the range implementation is updated/replaced, which breaks the whole premise of using ranges in the first place.
 There are many cases where methods in these structs are optional and
 depend on the capabilities of the input range (e.g. backward, length,
 random access, ...). I could imagine that
 
 1) We rework ddoc, s.t. it doesn't list these structs in the overview
 and adds the struct methods to the function (could get ugly)
No. The docs of the function should simply state what kind of range it returns -- input, forward, bidirectional, or random access. If it depends on what the function is given, the docs should explain under what conditions the function will return which kind of range. Listing individual range methods for every range function in Phobos is a lot of needless repetition, and is error-prone. If anything, we should write a page that explains exactly what methods are available to each kind of range, and just link to that from the function docs. That's what hyperlinks are for.
 2) We deprecate those exposed structs and make them private/nested
 (does anyone know whether it's common to depend on those structs?)
[...] I don't know if it's common, but I *have* seen people spell out the type explicitly. So changing this now will probably break existing code, and people will likely be resistant to that. (Arguably, though, it would be for the better -- users really shouldn't need to know the exact range type.) T -- Winners never quit, quitters never win. But those who never quit AND never win are idiots.
Mar 25 2016