www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Use case: eliminate hidden allocations in buildPath

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,


Walter and I were talking about eliminating the surreptitious 
allocations in buildPath:



We'd need to keep the existing version working, so we're looking at 
adding one or more new overloads. We're looking at giving the user the 
option to control any needed memory allocation (or even arrange things 
such that there's no memory allocated at all).

It's a generous design space, so although we have a couple of ideas 
let's hear others first.


Thanks,

Andrei
Dec 04 2013
next sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-12-04 23:14:48 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Walter and I were talking about eliminating the surreptitious 
 allocations in buildPath:
 

 
 We'd need to keep the existing version working, so we're looking at 
 adding one or more new overloads. We're looking at giving the user the 
 option to control any needed memory allocation (or even arrange things 
 such that there's no memory allocated at all).
 
 It's a generous design space, so although we have a couple of ideas 
 let's hear others first.
Allow an allocator as the first argument. Then pass an allocator that uses preallocated memory (or any other strategy that does not really need to allocate). While technically the allocator still "allocates" memory, since you control the allocator it does it the you can redefine "allocate" to not allocate. Here's a funny thought: allow plain arrays to be *typed* allocators through UFCS, just like arrays are ranges. If you have an array of chars, then "allocating" from it will simply return a slice and "bump the pointer" by becoming the remaining unused slice. The big problem with buildPath is that it won't work with overloading because your allocator has the same type as the other parameters. You'll need to wrap it in some kind of allocator shell. :-/ -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Dec 04 2013
parent reply Manu <turkeyman gmail.com> writes:
On 5 December 2013 10:13, Michel Fortin <michel.fortin michelf.ca> wrote:

 On 2013-12-04 23:14:48 +0000, Andrei Alexandrescu <
 SeeWebsiteForEmail erdani.org> said:

  Walter and I were talking about eliminating the surreptitious allocations
 in buildPath:



 We'd need to keep the existing version working, so we're looking at
 adding one or more new overloads. We're looking at giving the user the
 option to control any needed memory allocation (or even arrange things such
 that there's no memory allocated at all).

 It's a generous design space, so although we have a couple of ideas let's
 hear others first.
Allow an allocator as the first argument. Then pass an allocator that uses preallocated memory (or any other strategy that does not really need to allocate). While technically the allocator still "allocates" memory, since you control the allocator it does it the you can redefine "allocate" to not allocate.
Allocator as the first argument? This is so you can use UFCS on the allocator to make the call? I think it maybe makes sense for a function that takes an output range to receive it as the first argument for this reason: outputRange.myFunction(); // I'm not sure if I like this, not sure it's communicating the right process... but maybe people find it convenient? An overload that receives an allocator though, I'd probably make the allocator the last argument, this way it can have a default arg that is the default GC allocator, thus eliminating the other (original?) overload of the function which receives neither an output range or an allocator (using the GC intrinsically). Here's a funny thought: allow plain arrays to be *typed* allocators through
 UFCS, just like arrays are ranges. If you have an array of chars, then
 "allocating" from it will simply return a slice and "bump the pointer" by
 becoming the remaining unused slice. The big problem with buildPath is that
 it won't work with overloading because your allocator has the same type as
 the other parameters. You'll need to wrap it in some kind of allocator
 shell. :-/
I'm not sure I like this idea. I think I'd prefer to see 2 overloads, one that receives an output range, and one that receives an allocator (which may default to a GC allocator?).
Dec 04 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-12-05 02:24:02 +0000, Manu <turkeyman gmail.com> said:

 Allocator as the first argument? This is so you can use UFCS on the
 allocator to make the call?
Haha, no. That's because buildPath's arguments are (const(C[])[] paths...) with a "..." at the end. Can we put an argument after the variadic argument? I didn't check, but I though it was impossible... thus the allocator as the first argument. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Dec 04 2013
next sibling parent Manu <turkeyman gmail.com> writes:
On 5 December 2013 13:14, Michel Fortin <michel.fortin michelf.ca> wrote:

 On 2013-12-05 02:24:02 +0000, Manu <turkeyman gmail.com> said:

  Allocator as the first argument? This is so you can use UFCS on the
 allocator to make the call?
Haha, no. That's because buildPath's arguments are (const(C[])[] paths...) with a "..." at the end. Can we put an argument after the variadic argument? I didn't check, but I though it was impossible... thus the allocator as the first argument.
Oh yeah! :P I was thinking in a more general sense, as a principle to be applied across phobos, not just for this one function.
Dec 04 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-12-05 04:14, Michel Fortin wrote:

 Haha, no. That's because buildPath's arguments are (const(C[])[]
 paths...) with a "..." at the end. Can we put an argument after the
 variadic argument? I didn't check, but I though it was impossible...
 thus the allocator as the first argument.
No, that's not possible. One could think it would be, as long as there is no conflict in the types. -- /Jacob Carlborg
Dec 05 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Dec 04, 2013 at 03:14:48PM -0800, Andrei Alexandrescu wrote:
 Hello,
 
 
 Walter and I were talking about eliminating the surreptitious
 allocations in buildPath:
 

 
 We'd need to keep the existing version working, so we're looking at
 adding one or more new overloads. We're looking at giving the user
 the option to control any needed memory allocation (or even arrange
 things such that there's no memory allocated at all).
 
 It's a generous design space, so although we have a couple of ideas
 let's hear others first.
[...] What about a new overload that takes an output range instead of returning a string? The caller can then create the appropriate output range that does whatever is desired (malloc a buffer, use a static fixed-size buffer, etc.), and buildPath doesn't have to care about the implementation details. So, something like: void buildPath(OutputRange,Range)(OutputRange output, Range segments) if (isOutputRange!(OutputRange, ElementType!Range)) { ... } // On a related note, I wonder if it's profitable to extend the concept of an output range to include void delegates that take range elements as arguments. The current toString overload looked for by std.format, for example, has this signature: void toString(scope void delegate(const(char)[] s) dg); The delegate here essentially behaves like an output range (it takes snippets of string data and presumably appends them to some buffer somewhere). So why not extend the concept of output ranges to include such delegates, so that we can write: void toString(R)(R r) if (isOutputRange!R); Then any output range can be used as the target of a string conversion, e.g., writing straight to a file or network socket without the unnecessary buffering that returning a string implies. "But!" I hear you cry, "you can't call a delegate as dg.put(...), which you have to if it is to conform to the output range interface!" To that I'd say: module std.range; ... void put(R,T)(R dg, T data) if (is(R == void delegate(T)) || is(R == void function(T))) { dg(data); } If I'm not mistaken, this should make isOutputRange true for functions and delegates that take the appropriate argument types. This then allows us to use buildPath with no hidden allocations, for example: char[1024] buffer; void appendToBuf(const(char)[] data) { ... } void main() { buildPath(&appendToBuf, "usr", "local", "share", "filename"); } T -- Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel
Dec 04 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, December 04, 2013 16:38:54 H. S. Teoh wrote:
 What about a new overload that takes an output range instead of
 returning a string?
I would have thought that that would be the obvious way to solve the problem. In general, I think that when a function allocates any kind of string or array which it returns, we should overload it with a version that takes an output range. In many cases, it would probably even make sense to make the default just use std.array.appender and make it so that only the output range overload is really doing any work. What to do in cases of allocation where we're not dealing with arrays being returned is a tougher question, and I think that that starts going into custom allocator territory, but for arrays, output ranges are definitely the way to go IMHO. - Jonathan M Davis
Dec 04 2013
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 5 December 2013 11:02, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Wednesday, December 04, 2013 16:38:54 H. S. Teoh wrote:
 What about a new overload that takes an output range instead of
 returning a string?
I would have thought that that would be the obvious way to solve the problem. In general, I think that when a function allocates any kind of string or array which it returns, we should overload it with a version that takes an output range. In many cases, it would probably even make sense to make the default just use std.array.appender and make it so that only the output range overload is really doing any work.
This seems the intuitive approach to me. What to do in cases of allocation where we're not dealing with arrays being
 returned is a tougher question, and I think that that starts going into
 custom
 allocator territory, but for arrays, output ranges are definitely the way
 to go
 IMHO.
You mean for internal working buffers? It get's tricky. In my experience, many functions can use alloca, with a fallback in the event of very large workspaces. In the event of very large workspaces, it is often possible to break the process into stages which allow an iterative use of a smaller alloca-ed buffer if the function doesn't perform highly random access on the working data. In the event of large workspace and heavily randomised access to the working dataset (ie, difficult to avoid one large allocation), then and only then, maybe start thinking about receiving an allocator to work with (default arg to a gc allocator), or, if it's a large complex process, break the process into sub-functions which the user can issue, and they will then inherit ownership and management of the workspace, which they can treat how they like. These are some of my approaches. Almost all code I write must never allocate, except for functions that produce new allocations, where either receiving an output range or an allocator seems intuitive. The most important question to ask is 'can this function possibly be called recursively?'. Many library functions are effectively leaf functions, and can't lead back into user code via any path, which means alloca is safe to use liberally. If there is potential for a recursive call, then i guess you need to start preferring receiving allocators or output ranges, unless the alloca is in the order of a normal function's stack usage (<1kb-ish?).
Dec 04 2013
prev sibling next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei 
Alexandrescu wrote:
 Hello,


 Walter and I were talking about eliminating the surreptitious 
 allocations in buildPath:



 We'd need to keep the existing version working, so we're 
 looking at adding one or more new overloads. We're looking at 
 giving the user the option to control any needed memory 
 allocation (or even arrange things such that there's no memory 
 allocated at all).

 It's a generous design space, so although we have a couple of 
 ideas let's hear others first.


 Thanks,

 Andrei
Use an output range. It's the generic D approach, and what we already do for the string functions such as std.string.translate: (look down for the output range overloads). Anything "allocator" related should be carried by the output range itself. The function itself should not care nor know about any of that.
Dec 05 2013
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-12-05 09:09, monarch_dodra wrote:

 Use an output range. It's the generic D approach, and what we already do
 for the string functions such as std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output range
 itself. The function itself should not care nor know about any of that.
In general case, what would you suggest for functions not operating on ranges? -- /Jacob Carlborg
Dec 05 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 December 2013 at 08:45:08 UTC, Jacob Carlborg 
wrote:
 On 2013-12-05 09:09, monarch_dodra wrote:

 Use an output range. It's the generic D approach, and what we 
 already do
 for the string functions such as std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output 
 range
 itself. The function itself should not care nor know about any 
 of that.
In general case, what would you suggest for functions not operating on ranges?
How do you mean? As in functions that only output a "single item"?
Dec 05 2013
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-12-05 09:52, monarch_dodra wrote:

 How do you mean? As in functions that only output a "single item"?
Yes. -- /Jacob Carlborg
Dec 05 2013
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 December 2013 at 08:59:08 UTC, Jacob Carlborg 
wrote:
 On 2013-12-05 09:52, monarch_dodra wrote:

 How do you mean? As in functions that only output a "single 
 item"?
Yes.
I'd say simply with pass by ref, where the caller for puts the object on the stack (or wherever it wishes). If the object needs to further allocate, the said object itself should know how to it. In the use case where you *must* have a "foo(ref T* o)"-style signature, I'm not sure.
Dec 05 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/5/13 12:52 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 08:45:08 UTC, Jacob Carlborg wrote:
 On 2013-12-05 09:09, monarch_dodra wrote:

 Use an output range. It's the generic D approach, and what we already do
 for the string functions such as std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output range
 itself. The function itself should not care nor know about any of that.
In general case, what would you suggest for functions not operating on ranges?
How do you mean? As in functions that only output a "single item"?
Returns a non-linear data structure such as a hash table. Andrei
Dec 05 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu 
wrote:
 On 12/5/13 12:52 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 08:45:08 UTC, Jacob Carlborg 
 wrote:
 On 2013-12-05 09:09, monarch_dodra wrote:

 Use an output range. It's the generic D approach, and what 
 we already do
 for the string functions such as std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output 
 range
 itself. The function itself should not care nor know about 
 any of that.
In general case, what would you suggest for functions not operating on ranges?
How do you mean? As in functions that only output a "single item"?
Returns a non-linear data structure such as a hash table. Andrei
Output range! :) Output range interface makes no linearity requirements. Just that: "out.put(this)" compiles.
Dec 05 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
 Andrei
Output range! :) Output range interface makes no linearity requirements. Just that: "out.put(this)" compiles.
Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc. Andrei
Dec 05 2013
next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 December 2013 at 16:46:52 UTC, Andrei Alexandrescu 
wrote:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei 
 Alexandrescu wrote:
 Andrei
Output range! :) Output range interface makes no linearity requirements. Just that: "out.put(this)" compiles.
Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc. Andrei
Say, something like a graph? At that point, I'd say you'd have to pass an "allocating scheme" to the function, so an allocator, yes. Depending on the data being generated, the constructed item could carry the allocator itself. For example: auto newNode = myNode.GenerateNeighbor(); In But I think that'd be a special case situation. For everything else, output range is an easy and intuitive, and fits well with the rest of phobos. We'd want (IMHO) to integrate the allocators directly inside the output ranges.
Dec 05 2013
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 December 2013 at 17:06:13 UTC, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 16:46:52 UTC, Andrei 
 Alexandrescu wrote:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei 
 Alexandrescu wrote:
 Andrei
Output range! :) Output range interface makes no linearity requirements. Just that: "out.put(this)" compiles.
Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc. Andrei
 In But I think that'd be a special case situation. For 
 everything else, output range is an easy and intuitive, and 
 fits well with the rest of phobos. We'd want (IMHO) to 
 integrate the allocators directly inside the output ranges.
I really need to re-read myself before posting, sorry :/ But I think that'd be a special case situation. For everything else, output ranges are easy and intuitive, and fit well with the rest of phobos. We'd want (IMHO) to integrate allocation directly inside output ranges.
Dec 05 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/5/13 9:06 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 16:46:52 UTC, Andrei Alexandrescu wrote:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
 Andrei
Output range! :) Output range interface makes no linearity requirements. Just that: "out.put(this)" compiles.
Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc. Andrei
Say, something like a graph? At that point, I'd say you'd have to pass an "allocating scheme" to the function, so an allocator, yes. Depending on the data being generated, the constructed item could carry the allocator itself. For example: auto newNode = myNode.GenerateNeighbor(); In But I think that'd be a special case situation. For everything else, output range is an easy and intuitive, and fits well with the rest of phobos. We'd want (IMHO) to integrate the allocators directly inside the output ranges.
Trees and graphs are not special cases. They're bread and butter objects with fields that are in turn allocated etc. Consider how difficult it would be (for both the implementer and the user) to fill via an output range an object (class or struct) that has a few fields that need allocation (such as strings and arrays). It becomes a mini-serialization project. We can't realistically propose that as a way to allow users to control memory allocation. Andrei
Dec 05 2013
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 December 2013 at 18:34:40 UTC, Andrei Alexandrescu 
wrote:
 Trees and graphs are not special cases. They're bread and 
 butter objects with fields that are in turn allocated etc.
Yes, and they aren't really linear containers either, but that doesn't mean we throw the input range concept out the window. The graph itself could carry the allocator, and just the same you have a function "getNeighbors" that returns an input range of a node's neighboring nodes, the graph could also give a collection of different ouput ranges that takes care of allocation. I'm just saying it's not something I'd like to see that *functions* take allocators as arguments. I think that should be abstracted by the containers/ranges themselves, in such a way that the functions only have to use things like "pushBack" (eg: std::back_inserter).
 Consider how difficult it would be (for both the implementer 
 and the user) to fill via an output range an object (class or 
 struct) that has a few fields that need allocation (such as 
 strings and arrays). It becomes a mini-serialization project. 
 We can't realistically propose that as a way to allow users to 
 control memory allocation.
I'm out of my league on this one.
Dec 05 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 05, 2013 at 08:46:52AM -0800, Andrei Alexandrescu wrote:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
Andrei
Output range! :) Output range interface makes no linearity requirements. Just that: "out.put(this)" compiles.
Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc.
[...] I got your point from your previous post. It made me wonder how something like, say, a directed graph could be constructed in a way such that the caller gets to control how memory is allocated. My first thought was to use an abstract factory object that exposes an API that allowed construction of nodes and edges. The function would take this API as argument, and the caller provides the actual implementation. However, on further thought, this doesn't address the case when the function needs to do something opaque to the API, such as creating a derived class of a graph node (assuming the API has some kind of method that allocates and creates nodes). Ultimately, it seems that an allocator is the only way that doesn't make the caller/callee interface extremely ugly and leaky. T -- People tell me I'm stubborn, but I refuse to accept it!
Dec 05 2013
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
05-Dec-2013 20:46, Andrei Alexandrescu пишет:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
 Andrei
Output range! :) Output range interface makes no linearity requirements. Just that: "out.put(this)" compiles.
Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc.
Pass desired container type that follows some implicit protocol such as isGraph!T. Then container type defines allocation scheme (=some container specific default). As an extended version allow passing reference to an allocator to use with that container type. -- Dmitry Olshansky
Dec 05 2013
prev sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-12-05 16:46:52 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/5/13 8:19 AM, monarch_dodra wrote:
 Output range! :)
 
 Output range interface makes no linearity requirements. Just that:
 "out.put(this)" compiles.
Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc.
You have to admit that a "bump the pointer" allocator is pretty much the same thing as an output range that fills a preexisting array. So here's an interesting idea: perhaps typed allocators should *be* output ranges. The difference is that if you "put" something on the allocator, it copies/moves it to the allocator's heap and returns you a pointer. While if you "put" something on a generic output range, don't expect any pointer out of it. Some functions, like buildPath, would only require an output range. Others will need a more complete allocator, that's those functions that want to refer back to elements they've "put" on the allocator's heap. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Dec 07 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/7/13 5:13 AM, Michel Fortin wrote:
 On 2013-12-05 16:46:52 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 12/5/13 8:19 AM, monarch_dodra wrote:
 Output range! :)

 Output range interface makes no linearity requirements. Just that:
 "out.put(this)" compiles.
Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc.
You have to admit that a "bump the pointer" allocator is pretty much the same thing as an output range that fills a preexisting array.
Only of the output range is untyped. An allocator traffics in untyped chunks; an output range usually has an element type.
 So here's an interesting idea: perhaps typed allocators should *be*
 output ranges. The difference is that if you "put" something on the
 allocator, it copies/moves it to the allocator's heap and returns you a
 pointer. While if you "put" something on a generic output range, don't
 expect any pointer out of it.

 Some functions, like buildPath, would only require an output range.
 Others will need a more complete allocator, that's those functions that
 want to refer back to elements they've "put" on the allocator's heap.
I don't think that would work. Andrei
Dec 07 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-12-07 16:08:18 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/7/13 5:13 AM, Michel Fortin wrote:
 You have to admit that a "bump the pointer" allocator is pretty much the
 same thing as an output range that fills a preexisting array.
Only of the output range is untyped. An allocator traffics in untyped chunks; an output range usually has an element type.
If I remember well, you said before that you may also design "typed allocators" that'd work at a layer above untyped allocators. That's what I was referring to, although I wasn't very clear (the words "typed allocator" only appeared after the paragraph quoted above). Anyway, here's another thing to consider in the Output Range vs. Allocator debate: allocators can be faster than output ranges because you can tell the allocator in advance how much data you need. For instance, if you're joining multiple strings together you can quickly calculate the final length, allocate that, write to allocated memory and return the result. In fact, that's exactly what buildPath currently does when allocating from the GC. With an output range that appends to a string the string needs to constantly grow to accommodate new content, with no insight as to the final length. The way to fix that with an output range would be to have have a "reserve" function telling the range "prepare yourself to receive that much data", which may or may not do something depending on the kind of output range it is. There's no such function in the definition of an output range right now. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Dec 07 2013
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 7 Dec 2013 13:27:07 -0500
schrieb Michel Fortin <michel.fortin michelf.ca>:

 [=E2=80=A6] The way to fix that with an output range would be to have=20
 have a "reserve" function telling the range "prepare yourself to=20
 receive that much data", which may or may not do something depending on=20
 the kind of output range it is.
This could even preallocate blocks on a filesystem for file record based output ranges. (In order to avoid file fragmentation on mechanical disks.) --=20 Marco
Dec 07 2013
prev sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 5 December 2013 at 08:09:55 UTC, monarch_dodra wrote:
 On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei 
 Alexandrescu wrote:
 Hello,


 Walter and I were talking about eliminating the surreptitious 
 allocations in buildPath:



 We'd need to keep the existing version working, so we're 
 looking at adding one or more new overloads. We're looking at 
 giving the user the option to control any needed memory 
 allocation (or even arrange things such that there's no memory 
 allocated at all).

 It's a generous design space, so although we have a couple of 
 ideas let's hear others first.


 Thanks,

 Andrei
Use an output range. It's the generic D approach, and what we already do for the string functions such as std.string.translate: (look down for the output range overloads). Anything "allocator" related should be carried by the output range itself. The function itself should not care nor know about any of that.
And thanks to your improvements[1] to put() output ranges should actually be much more usable than they ever were before. I've mentioned this a few times but I tried to add an output range overload of toUpper and toLower but immediately encountered problems with appending dchars to char arrays. Your recently merged improvements to put() appear to have addressed that problem and many others and I'll soon have another go at adding these overloads now that that change is in place so thanks for doing the hard work. Before this got merged I don't even think output ranges could be easily used for this improvement to buildPath so it got merged at a very convenient time. (my memory is a bit poor but if I'm remembering correctly appender worked because it had done its own handling of narrow strings but you couldn't just use a static or dynamic narrow string array as an output range) 1. https://github.com/D-Programming-Language/phobos/pull/1569
Dec 05 2013
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 December 2013 at 20:47:32 UTC, Brad Anderson wrote:
 And thanks to your improvements[1] to put() output ranges 
 should actually be much more usable than they ever were before. 
 I've mentioned this a few times but I tried to add an output 
 range overload of toUpper and toLower but immediately 
 encountered problems with appending dchars to char arrays.  
 Your recently merged improvements to put() appear to have 
 addressed that problem and many others and I'll soon have 
 another go at adding these overloads now that that change is in 
 place so thanks for doing the hard work.

 Before this got merged I don't even think output ranges could 
 be easily used for this improvement to buildPath so it got 
 merged at a very convenient time.

 (my memory is a bit poor but if I'm remembering correctly 
 appender worked because it had done its own handling of narrow 
 strings but you couldn't just use a static or dynamic narrow 
 string array as an output range)

 1. https://github.com/D-Programming-Language/phobos/pull/1569
Yes, appender did its own trans-coding, which helped the situation, but it made things like output delegates absolutely useless. My next step was to *remove* the on the fly trancoding in appender, to rely only "put", directly, but I fear that would cause un-necessary breakge, due to UFCS not triggering if the member function exists: In the sense that put(appender!string(), myDstring); //Yes appender!string().put(myDstring); //Nope
Dec 05 2013
prev sibling parent reply "inout" <inout gmail.com> writes:
On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei 
Alexandrescu wrote:
 Hello,


 Walter and I were talking about eliminating the surreptitious 
 allocations in buildPath:



 We'd need to keep the existing version working, so we're 
 looking at adding one or more new overloads. We're looking at 
 giving the user the option to control any needed memory 
 allocation (or even arrange things such that there's no memory 
 allocated at all).

 It's a generous design space, so although we have a couple of 
 ideas let's hear others first.


 Thanks,

 Andrei
My approach in cases like that used to be to pass an optional char[] buffer = null as last argument (it doesn't work as nice in variadic functions though): // constructs the result in a buffer if one is passed until there is no more space left, in which case it reallocates the buffer char[] buildPath(const(char)[] path1, const(char)[] path2, char[] buffer = null) { Appender!(char) appender = useBuffer(buffer); ... }
Dec 05 2013
parent "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 5 December 2013 at 17:30:37 UTC, inout wrote:
 My approach in cases like that used to be to pass an optional 
 char[] buffer = null as last argument (it doesn't work as nice 
 in variadic functions though):

 // constructs the result in a buffer if one is passed until 
 there is no more space left, in which case it reallocates the 
 buffer
 char[] buildPath(const(char)[] path1, const(char)[] path2, 
 char[] buffer = null) {
   Appender!(char) appender = useBuffer(buffer);
   ...
 }
The nice thing about output ranges instead of this approach is the allocation strategy is left up to the caller rather than the function. I think generalized SSO (a small static buffer that can fall back to a dynamic allocation) will become a very common idiom as more and more of phobos accepts output ranges.
Dec 05 2013