digitalmars.D - Use case: eliminate hidden allocations in buildPath

Andrei Alexandrescu (12/12) Dec 04 2013 Hello,

Michel Fortin (18/30) Dec 04 2013 Allow an allocator as the first argument. Then pass an allocator that

Manu (16/42) Dec 04 2013 Allocator as the first argument? This is so you can use UFCS on the

Michel Fortin (9/11) Dec 04 2013 Haha, no. That's because buildPath's arguments are (const(C[])[]

Manu (4/12) Dec 04 2013 Oh yeah! :P
Jacob Carlborg (5/9) Dec 05 2013 No, that's not possible. One could think it would be, as long as there

H. S. Teoh (48/63) Dec 04 2013 [...]
Jonathan M Davis (12/14) Dec 04 2013 I would have thought that that would be the obvious way to solve the pro...
Manu (26/43) Dec 04 2013 This seems the intuitive approach to me.
monarch_dodra (9/22) Dec 05 2013 Use an output range. It's the generic D approach, and what we

Jacob Carlborg (5/11) Dec 05 2013 In general case, what would you suggest for functions not operating on

monarch_dodra (3/16) Dec 05 2013 How do you mean? As in functions that only output a "single item"?

Jacob Carlborg (4/5) Dec 05 2013 Yes.

monarch_dodra (7/11) Dec 05 2013 I'd say simply with pass by ref, where the caller for puts the

Andrei Alexandrescu (3/17) Dec 05 2013 Returns a non-linear data structure such as a hash table.

monarch_dodra (5/29) Dec 05 2013 Output range! :)

Andrei Alexandrescu (5/10) Dec 05 2013 Hrm, construction of a hash table is linearizable so bad example on my

monarch_dodra (11/26) Dec 05 2013 Say, something like a graph? At that point, I'd say you'd have to

monarch_dodra (6/29) Dec 05 2013 I really need to re-read myself before posting, sorry :/
Andrei Alexandrescu (9/32) Dec 05 2013 Trees and graphs are not special cases. They're bread and butter objects...

monarch_dodra (14/22) Dec 05 2013 Yes, and they aren't really linear containers either, but that

H. S. Teoh (17/30) Dec 05 2013 [...]
Dmitry Olshansky (8/19) Dec 05 2013 Pass desired container type that follows some implicit protocol such as
Michel Fortin (16/25) Dec 07 2013 You have to admit that a "bump the pointer" allocator is pretty much

Andrei Alexandrescu (5/26) Dec 07 2013 Only of the output range is untyped. An allocator traffics in untyped

Michel Fortin (24/30) Dec 07 2013 If I remember well, you said before that you may also design "typed

Marco Leise (7/11) Dec 07 2013 This could even preallocate blocks on a filesystem for file

Brad Anderson (18/49) Dec 05 2013 And thanks to your improvements[1] to put() output ranges should

monarch_dodra (11/28) Dec 05 2013 Yes, appender did its own trans-coding, which helped the

inout (12/25) Dec 05 2013 My approach in cases like that used to be to pass an optional

Brad Anderson (6/17) Dec 05 2013 The nice thing about output ranges instead of this approach is

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Hello,


Walter and I were talking about eliminating the surreptitious 
allocations in buildPath:



We'd need to keep the existing version working, so we're looking at 
adding one or more new overloads. We're looking at giving the user the 
option to control any needed memory allocation (or even arrange things 
such that there's no memory allocated at all).

It's a generous design space, so although we have a couple of ideas 
let's hear others first.


Thanks,

Andrei

Dec 04 2013

Michel Fortin <michel.fortin michelf.ca> writes:

On 2013-12-04 23:14:48 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Walter and I were talking about eliminating the surreptitious 
 allocations in buildPath:
 

 
 We'd need to keep the existing version working, so we're looking at 
 adding one or more new overloads. We're looking at giving the user the 
 option to control any needed memory allocation (or even arrange things 
 such that there's no memory allocated at all).
 
 It's a generous design space, so although we have a couple of ideas 
 let's hear others first.

Allow an allocator as the first argument. Then pass an allocator that 
uses preallocated memory (or any other strategy that does not really 
need to allocate). While technically the allocator still "allocates" 
memory, since you control the allocator it does it the you can redefine 
"allocate" to not allocate.

Here's a funny thought: allow plain arrays to be *typed* allocators 
through UFCS, just like arrays are ranges. If you have an array of 
chars, then "allocating" from it will simply return a slice and "bump 
the pointer" by becoming the remaining unused slice. The big problem 
with buildPath is that it won't work with overloading because your 
allocator has the same type as the other parameters. You'll need to 
wrap it in some kind of allocator shell. :-/

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Dec 04 2013

Manu <turkeyman gmail.com> writes:

On 5 December 2013 10:13, Michel Fortin <michel.fortin michelf.ca> wrote:

 On 2013-12-04 23:14:48 +0000, Andrei Alexandrescu <
 SeeWebsiteForEmail erdani.org> said:

  Walter and I were talking about eliminating the surreptitious allocations
 in buildPath:



 We'd need to keep the existing version working, so we're looking at
 adding one or more new overloads. We're looking at giving the user the
 option to control any needed memory allocation (or even arrange things such
 that there's no memory allocated at all).

 It's a generous design space, so although we have a couple of ideas let's
 hear others first.

 Allow an allocator as the first argument. Then pass an allocator that uses
 preallocated memory (or any other strategy that does not really need to
 allocate). While technically the allocator still "allocates" memory, since
 you control the allocator it does it the you can redefine "allocate" to not
 allocate.

Allocator as the first argument? This is so you can use UFCS on the
allocator to make the call?
I think it maybe makes sense for a function that takes an output range to
receive it as the first argument for this reason:
 outputRange.myFunction(); // I'm not sure if I like this, not sure it's
communicating the right process... but maybe people find it convenient?

An overload that receives an allocator though, I'd probably make the
allocator the last argument, this way it can have a default arg that is the
default GC allocator, thus eliminating the other (original?) overload of
the function which receives neither an output range or an allocator (using
the GC intrinsically).

Here's a funny thought: allow plain arrays to be *typed* allocators through
 UFCS, just like arrays are ranges. If you have an array of chars, then
 "allocating" from it will simply return a slice and "bump the pointer" by
 becoming the remaining unused slice. The big problem with buildPath is that
 it won't work with overloading because your allocator has the same type as
 the other parameters. You'll need to wrap it in some kind of allocator
 shell. :-/


I'm not sure I like this idea. I think I'd prefer to see 2 overloads, one
that receives an output range, and one that receives an allocator (which
may default to a GC allocator?).

Dec 04 2013

Michel Fortin <michel.fortin michelf.ca> writes:

On 2013-12-05 02:24:02 +0000, Manu <turkeyman gmail.com> said:

 Allocator as the first argument? This is so you can use UFCS on the
 allocator to make the call?

Haha, no. That's because buildPath's arguments are (const(C[])[] 
paths...) with a "..." at the end. Can we put an argument after the 
variadic argument? I didn't check, but I though it was impossible... 
thus the allocator as the first argument.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Dec 04 2013

Manu <turkeyman gmail.com> writes:

On 5 December 2013 13:14, Michel Fortin <michel.fortin michelf.ca> wrote:

 On 2013-12-05 02:24:02 +0000, Manu <turkeyman gmail.com> said:

  Allocator as the first argument? This is so you can use UFCS on the
 allocator to make the call?

 Haha, no. That's because buildPath's arguments are (const(C[])[] paths...)
 with a "..." at the end. Can we put an argument after the variadic
 argument? I didn't check, but I though it was impossible... thus the
 allocator as the first argument.


Oh yeah! :P
I was thinking in a more general sense, as a principle to be applied across
phobos, not just for this one function.

Dec 04 2013

Jacob Carlborg <doob me.com> writes:

On 2013-12-05 04:14, Michel Fortin wrote:

 Haha, no. That's because buildPath's arguments are (const(C[])[]
 paths...) with a "..." at the end. Can we put an argument after the
 variadic argument? I didn't check, but I though it was impossible...
 thus the allocator as the first argument.

No, that's not possible. One could think it would be, as long as there 
is no conflict in the types.

-- 
/Jacob Carlborg

Dec 05 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Dec 04, 2013 at 03:14:48PM -0800, Andrei Alexandrescu wrote:
 Hello,
 
 
 Walter and I were talking about eliminating the surreptitious
 allocations in buildPath:
 

 
 We'd need to keep the existing version working, so we're looking at
 adding one or more new overloads. We're looking at giving the user
 the option to control any needed memory allocation (or even arrange
 things such that there's no memory allocated at all).
 
 It's a generous design space, so although we have a couple of ideas
 let's hear others first.

[...]

What about a new overload that takes an output range instead of
returning a string? The caller can then create the appropriate output
range that does whatever is desired (malloc a buffer, use a static
fixed-size buffer, etc.), and buildPath doesn't have to care about the
implementation details. So, something like:

	void buildPath(OutputRange,Range)(OutputRange output,
					  Range segments)
		if (isOutputRange!(OutputRange, ElementType!Range))
	{ ... }

//

On a related note, I wonder if it's profitable to extend the concept of
an output range to include void delegates that take range elements as
arguments. The current toString overload looked for by std.format, for
example, has this signature:

	void toString(scope void delegate(const(char)[] s) dg);

The delegate here essentially behaves like an output range (it takes
snippets of string data and presumably appends them to some buffer
somewhere). So why not extend the concept of output ranges to include
such delegates, so that we can write:

	void toString(R)(R r) if (isOutputRange!R);

Then any output range can be used as the target of a string conversion,
e.g., writing straight to a file or network socket without the
unnecessary buffering that returning a string implies.

"But!" I hear you cry, "you can't call a delegate as dg.put(...), which
you have to if it is to conform to the output range interface!" To that
I'd say:

	module std.range;
	...
	void put(R,T)(R dg, T data)
		if (is(R == void delegate(T)) ||
		    is(R == void function(T)))
	{
		dg(data);
	}

If I'm not mistaken, this should make isOutputRange true for functions
and delegates that take the appropriate argument types.

This then allows us to use buildPath with no hidden allocations, for
example:

	char[1024] buffer;

	void appendToBuf(const(char)[] data) { ... }

	void main() {
		buildPath(&appendToBuf, "usr", "local", "share", "filename");
	}


T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel

Dec 04 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, December 04, 2013 16:38:54 H. S. Teoh wrote:
 What about a new overload that takes an output range instead of
 returning a string?

I would have thought that that would be the obvious way to solve the problem. 
In general, I think that when a function allocates any kind of string or array 
which it returns, we should overload it with a version that takes an output 
range. In many cases, it would probably even make sense to make the default 
just use std.array.appender and make it so that only the output range overload 
is really doing any work.

What to do in cases of allocation where we're not dealing with arrays being 
returned is a tougher question, and I think that that starts going into custom 
allocator territory, but for arrays, output ranges are definitely the way to go 
IMHO.

- Jonathan M Davis

Dec 04 2013

Manu <turkeyman gmail.com> writes:

On 5 December 2013 11:02, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Wednesday, December 04, 2013 16:38:54 H. S. Teoh wrote:
 What about a new overload that takes an output range instead of
 returning a string?

 I would have thought that that would be the obvious way to solve the
 problem.
 In general, I think that when a function allocates any kind of string or
 array
 which it returns, we should overload it with a version that takes an output
 range. In many cases, it would probably even make sense to make the default
 just use std.array.appender and make it so that only the output range
 overload
 is really doing any work.

This seems the intuitive approach to me.


What to do in cases of allocation where we're not dealing with arrays being
 returned is a tougher question, and I think that that starts going into
 custom
 allocator territory, but for arrays, output ranges are definitely the way
 to go
 IMHO.

You mean for internal working buffers?
It get's tricky. In my experience, many functions can use alloca, with a
fallback in the event of very large workspaces.
In the event of very large workspaces, it is often possible to break the
process into stages which allow an iterative use of a smaller alloca-ed
buffer if the function doesn't perform highly random access on the working
data.
In the event of large workspace and heavily randomised access to the
working dataset (ie, difficult to avoid one large allocation), then and
only then, maybe start thinking about receiving an allocator to work with
(default arg to a gc allocator), or, if it's a large complex process, break
the process into sub-functions which the user can issue, and they will then
inherit ownership and management of the workspace, which they can treat how
they like.

These are some of my approaches. Almost all code I write must never
allocate, except for functions that produce new allocations, where either
receiving an output range or an allocator seems intuitive.

The most important question to ask is 'can this function possibly be called
recursively?'. Many library functions are effectively leaf functions, and
can't lead back into user code via any path, which means alloca is safe to
use liberally. If there is potential for a recursive call, then i guess you
need to start preferring receiving allocators or output ranges, unless the
alloca is in the order of a normal function's stack usage (<1kb-ish?).

Dec 04 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei 
Alexandrescu wrote:
 Hello,


 Walter and I were talking about eliminating the surreptitious 
 allocations in buildPath:



 We'd need to keep the existing version working, so we're 
 looking at adding one or more new overloads. We're looking at 
 giving the user the option to control any needed memory 
 allocation (or even arrange things such that there's no memory 
 allocated at all).

 It's a generous design space, so although we have a couple of 
 ideas let's hear others first.


 Thanks,

 Andrei

Use an output range. It's the generic D approach, and what we 
already do for the string functions such as std.string.translate:

(look down for the output range overloads).

Anything "allocator" related should be carried by the output 
range itself. The function itself should not care nor know about 
any of that.

Dec 05 2013

Jacob Carlborg <doob me.com> writes:

On 2013-12-05 09:09, monarch_dodra wrote:

 Use an output range. It's the generic D approach, and what we already do
 for the string functions such as std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output range
 itself. The function itself should not care nor know about any of that.

In general case, what would you suggest for functions not operating on 
ranges?

-- 
/Jacob Carlborg

Dec 05 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 5 December 2013 at 08:45:08 UTC, Jacob Carlborg 
wrote:
 On 2013-12-05 09:09, monarch_dodra wrote:

 Use an output range. It's the generic D approach, and what we 
 already do
 for the string functions such as std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output 
 range
 itself. The function itself should not care nor know about any 
 of that.

 In general case, what would you suggest for functions not 
 operating on ranges?

How do you mean? As in functions that only output a "single item"?

Dec 05 2013

Jacob Carlborg <doob me.com> writes:

On 2013-12-05 09:52, monarch_dodra wrote:

 How do you mean? As in functions that only output a "single item"?

Yes.

-- 
/Jacob Carlborg

Dec 05 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 5 December 2013 at 08:59:08 UTC, Jacob Carlborg 
wrote:
 On 2013-12-05 09:52, monarch_dodra wrote:

 How do you mean? As in functions that only output a "single 
 item"?

 Yes.

I'd say simply with pass by ref, where the caller for puts the 
object on the stack (or wherever it wishes). If the object needs 
to further allocate, the said object itself should know how to it.

In the use case where you *must* have a "foo(ref T* o)"-style 
signature, I'm not sure.

Dec 05 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/5/13 12:52 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 08:45:08 UTC, Jacob Carlborg wrote:
 On 2013-12-05 09:09, monarch_dodra wrote:

 Use an output range. It's the generic D approach, and what we already do
 for the string functions such as std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output range
 itself. The function itself should not care nor know about any of that.

 In general case, what would you suggest for functions not operating on
 ranges?

 How do you mean? As in functions that only output a "single item"?

Returns a non-linear data structure such as a hash table.

Andrei

Dec 05 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu 
wrote:
 On 12/5/13 12:52 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 08:45:08 UTC, Jacob Carlborg 
 wrote:
 On 2013-12-05 09:09, monarch_dodra wrote:

 Use an output range. It's the generic D approach, and what 
 we already do
 for the string functions such as std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output 
 range
 itself. The function itself should not care nor know about 
 any of that.

 In general case, what would you suggest for functions not 
 operating on
 ranges?

 How do you mean? As in functions that only output a "single 
 item"?

 Returns a non-linear data structure such as a hash table.

 Andrei

Output range! :)

Output range interface makes no linearity requirements. Just 
that: "out.put(this)" compiles.

Dec 05 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
 Andrei

 Output range! :)

 Output range interface makes no linearity requirements. Just that:
 "out.put(this)" compiles.

Hrm, construction of a hash table is linearizable so bad example on my 
part. But I'm talking about general structured data such as objects with 
allocated fields and connections to other objects etc. etc.

Andrei

Dec 05 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 5 December 2013 at 16:46:52 UTC, Andrei Alexandrescu 
wrote:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei 
 Alexandrescu wrote:
 Andrei

 Output range! :)

 Output range interface makes no linearity requirements. Just 
 that:
 "out.put(this)" compiles.

 Hrm, construction of a hash table is linearizable so bad 
 example on my part. But I'm talking about general structured 
 data such as objects with allocated fields and connections to 
 other objects etc. etc.

 Andrei

Say, something like a graph? At that point, I'd say you'd have to 
pass an "allocating scheme" to the function, so an allocator, 
yes. Depending on the data being generated, the constructed item 
could carry the allocator itself. For example: auto newNode = 
myNode.GenerateNeighbor();

In But I think that'd be a special case situation. For everything 
else, output range is an easy and intuitive, and fits well with 
the rest of phobos. We'd want (IMHO) to integrate the allocators 
directly inside the output ranges.

Dec 05 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 5 December 2013 at 17:06:13 UTC, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 16:46:52 UTC, Andrei 
 Alexandrescu wrote:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei 
 Alexandrescu wrote:
 Andrei

 Output range! :)

 Output range interface makes no linearity requirements. Just 
 that:
 "out.put(this)" compiles.

 Hrm, construction of a hash table is linearizable so bad 
 example on my part. But I'm talking about general structured 
 data such as objects with allocated fields and connections to 
 other objects etc. etc.

 Andrei


 In But I think that'd be a special case situation. For 
 everything else, output range is an easy and intuitive, and 
 fits well with the rest of phobos. We'd want (IMHO) to 
 integrate the allocators directly inside the output ranges.

I really need to re-read myself before posting, sorry :/

But I think that'd be a special case situation. For
everything else, output ranges are easy and intuitive, and
fit well with the rest of phobos. We'd want (IMHO) to
integrate allocation directly inside output ranges.

Dec 05 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/5/13 9:06 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 16:46:52 UTC, Andrei Alexandrescu wrote:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
 Andrei

 Output range! :)

 Output range interface makes no linearity requirements. Just that:
 "out.put(this)" compiles.

 Hrm, construction of a hash table is linearizable so bad example on my
 part. But I'm talking about general structured data such as objects
 with allocated fields and connections to other objects etc. etc.

 Andrei

 Say, something like a graph? At that point, I'd say you'd have to pass
 an "allocating scheme" to the function, so an allocator, yes. Depending
 on the data being generated, the constructed item could carry the
 allocator itself. For example: auto newNode = myNode.GenerateNeighbor();

 In But I think that'd be a special case situation. For everything else,
 output range is an easy and intuitive, and fits well with the rest of
 phobos. We'd want (IMHO) to integrate the allocators directly inside the
 output ranges.

Trees and graphs are not special cases. They're bread and butter objects 
with fields that are in turn allocated etc.

Consider how difficult it would be (for both the implementer and the 
user) to fill via an output range an object (class or struct) that has a 
few fields that need allocation (such as strings and arrays). It becomes 
a mini-serialization project. We can't realistically propose that as a 
way to allow users to control memory allocation.


Andrei

Dec 05 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 5 December 2013 at 18:34:40 UTC, Andrei Alexandrescu 
wrote:
 Trees and graphs are not special cases. They're bread and 
 butter objects with fields that are in turn allocated etc.

Yes, and they aren't really linear containers either, but that 
doesn't mean we throw the input range concept out the window.

The graph itself could carry the allocator, and just the same you 
have a function "getNeighbors" that returns an input range of a 
node's neighboring nodes, the graph could also give a collection 
of different ouput ranges that takes care of allocation.

I'm just saying it's not something I'd like to see that 
*functions* take allocators as arguments. I think that should be 
abstracted by the containers/ranges themselves, in such a way 
that the functions only have to use things like "pushBack" (eg: 
std::back_inserter).

 Consider how difficult it would be (for both the implementer 
 and the user) to fill via an output range an object (class or 
 struct) that has a few fields that need allocation (such as 
 strings and arrays). It becomes a mini-serialization project. 
 We can't realistically propose that as a way to allow users to 
 control memory allocation.

I'm out of my league on this one.

Dec 05 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Dec 05, 2013 at 08:46:52AM -0800, Andrei Alexandrescu wrote:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
Andrei

Output range! :)

Output range interface makes no linearity requirements. Just that:
"out.put(this)" compiles.

 
 Hrm, construction of a hash table is linearizable so bad example on
 my part. But I'm talking about general structured data such as
 objects with allocated fields and connections to other objects etc.
 etc.

[...]

I got your point from your previous post. It made me wonder how
something like, say, a directed graph could be constructed in a way such
that the caller gets to control how memory is allocated. My first
thought was to use an abstract factory object that exposes an API that
allowed construction of nodes and edges. The function would take this
API as argument, and the caller provides the actual implementation.

However, on further thought, this doesn't address the case when the
function needs to do something opaque to the API, such as creating a
derived class of a graph node (assuming the API has some kind of method
that allocates and creates nodes). Ultimately, it seems that an
allocator is the only way that doesn't make the caller/callee interface
extremely ugly and leaky.


T

-- 
People tell me I'm stubborn, but I refuse to accept it!

Dec 05 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

05-Dec-2013 20:46, Andrei Alexandrescu пишет:
 On 12/5/13 8:19 AM, monarch_dodra wrote:
 On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
 Andrei

 Output range! :)

 Output range interface makes no linearity requirements. Just that:
 "out.put(this)" compiles.

 Hrm, construction of a hash table is linearizable so bad example on my
 part. But I'm talking about general structured data such as objects with
 allocated fields and connections to other objects etc. etc.

Pass desired container type that follows some implicit protocol such as 
isGraph!T. Then container type defines allocation scheme (=some 
container specific default).

As an extended version allow passing reference to an allocator to use 
with that container type.

-- 
Dmitry Olshansky

Dec 05 2013

Michel Fortin <michel.fortin michelf.ca> writes:

On 2013-12-05 16:46:52 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/5/13 8:19 AM, monarch_dodra wrote:
 Output range! :)
 
 Output range interface makes no linearity requirements. Just that:
 "out.put(this)" compiles.

 
 Hrm, construction of a hash table is linearizable so bad example on my 
 part. But I'm talking about general structured data such as objects 
 with allocated fields and connections to other objects etc. etc.

You have to admit that a "bump the pointer" allocator is pretty much 
the same thing as an output range that fills a preexisting array.

So here's an interesting idea: perhaps typed allocators should *be* 
output ranges. The difference is that if you "put" something on the 
allocator, it copies/moves it to the allocator's heap and returns you a 
pointer. While if you "put" something on a generic output range, don't 
expect any pointer out of it.

Some functions, like buildPath, would only require an output range. 
Others will need a more complete allocator, that's those functions that 
want to refer back to elements they've "put" on the allocator's heap.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Dec 07 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/7/13 5:13 AM, Michel Fortin wrote:
 On 2013-12-05 16:46:52 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 12/5/13 8:19 AM, monarch_dodra wrote:
 Output range! :)

 Output range interface makes no linearity requirements. Just that:
 "out.put(this)" compiles.

 Hrm, construction of a hash table is linearizable so bad example on my
 part. But I'm talking about general structured data such as objects
 with allocated fields and connections to other objects etc. etc.

 You have to admit that a "bump the pointer" allocator is pretty much the
 same thing as an output range that fills a preexisting array.

Only of the output range is untyped. An allocator traffics in untyped 
chunks; an output range usually has an element type.

 So here's an interesting idea: perhaps typed allocators should *be*
 output ranges. The difference is that if you "put" something on the
 allocator, it copies/moves it to the allocator's heap and returns you a
 pointer. While if you "put" something on a generic output range, don't
 expect any pointer out of it.

 Some functions, like buildPath, would only require an output range.
 Others will need a more complete allocator, that's those functions that
 want to refer back to elements they've "put" on the allocator's heap.

I don't think that would work.


Andrei

Dec 07 2013

Michel Fortin <michel.fortin michelf.ca> writes:

On 2013-12-07 16:08:18 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/7/13 5:13 AM, Michel Fortin wrote:
 You have to admit that a "bump the pointer" allocator is pretty much the
 same thing as an output range that fills a preexisting array.

 
 Only of the output range is untyped. An allocator traffics in untyped 
 chunks; an output range usually has an element type.

If I remember well, you said before that you may also design "typed 
allocators" that'd work at a layer above untyped allocators. That's 
what I was referring to, although I wasn't very clear (the words "typed 
allocator" only appeared after the paragraph quoted above).

Anyway, here's another thing to consider in the Output Range vs. 
Allocator debate: allocators can be faster than output ranges because 
you can tell the allocator in advance how much data you need.

For instance, if you're joining multiple strings together you can 
quickly calculate the final length, allocate that, write to allocated 
memory and return the result. In fact, that's exactly what buildPath 
currently does when allocating from the GC.

With an output range that appends to a string the string needs to 
constantly grow to accommodate new content, with no insight as to the 
final length. The way to fix that with an output range would be to have 
have a "reserve" function telling the range "prepare yourself to 
receive that much data", which may or may not do something depending on 
the kind of output range it is. There's no such function in the 
definition of an output range right now.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Dec 07 2013

Marco Leise <Marco.Leise gmx.de> writes:

Am Sat, 7 Dec 2013 13:27:07 -0500
schrieb Michel Fortin <michel.fortin michelf.ca>:

 [=E2=80=A6] The way to fix that with an output range would be to have=20
 have a "reserve" function telling the range "prepare yourself to=20
 receive that much data", which may or may not do something depending on=20
 the kind of output range it is.

This could even preallocate blocks on a filesystem for file
record based output ranges. (In order to avoid
file fragmentation on mechanical disks.)

--=20
Marco

Dec 07 2013

"Brad Anderson" <eco gnuk.net> writes:

On Thursday, 5 December 2013 at 08:09:55 UTC, monarch_dodra wrote:
 On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei 
 Alexandrescu wrote:
 Hello,


 Walter and I were talking about eliminating the surreptitious 
 allocations in buildPath:



 We'd need to keep the existing version working, so we're 
 looking at adding one or more new overloads. We're looking at 
 giving the user the option to control any needed memory 
 allocation (or even arrange things such that there's no memory 
 allocated at all).

 It's a generous design space, so although we have a couple of 
 ideas let's hear others first.


 Thanks,

 Andrei

 Use an output range. It's the generic D approach, and what we 
 already do for the string functions such as 
 std.string.translate:

 (look down for the output range overloads).

 Anything "allocator" related should be carried by the output 
 range itself. The function itself should not care nor know 
 about any of that.

And thanks to your improvements[1] to put() output ranges should 
actually be much more usable than they ever were before. I've 
mentioned this a few times but I tried to add an output range 
overload of toUpper and toLower but immediately encountered 
problems with appending dchars to char arrays.  Your recently 
merged improvements to put() appear to have addressed that 
problem and many others and I'll soon have another go at adding 
these overloads now that that change is in place so thanks for 
doing the hard work.

Before this got merged I don't even think output ranges could be 
easily used for this improvement to buildPath so it got merged at 
a very convenient time.

(my memory is a bit poor but if I'm remembering correctly 
appender worked because it had done its own handling of narrow 
strings but you couldn't just use a static or dynamic narrow 
string array as an output range)

1. https://github.com/D-Programming-Language/phobos/pull/1569

Dec 05 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 5 December 2013 at 20:47:32 UTC, Brad Anderson wrote:
 And thanks to your improvements[1] to put() output ranges 
 should actually be much more usable than they ever were before. 
 I've mentioned this a few times but I tried to add an output 
 range overload of toUpper and toLower but immediately 
 encountered problems with appending dchars to char arrays.  
 Your recently merged improvements to put() appear to have 
 addressed that problem and many others and I'll soon have 
 another go at adding these overloads now that that change is in 
 place so thanks for doing the hard work.

 Before this got merged I don't even think output ranges could 
 be easily used for this improvement to buildPath so it got 
 merged at a very convenient time.

 (my memory is a bit poor but if I'm remembering correctly 
 appender worked because it had done its own handling of narrow 
 strings but you couldn't just use a static or dynamic narrow 
 string array as an output range)

 1. https://github.com/D-Programming-Language/phobos/pull/1569

Yes, appender did its own trans-coding, which helped the 
situation, but it made things like output delegates absolutely 
useless.

My next step was to *remove* the on the fly trancoding in 
appender, to rely only "put", directly, but I fear that would 
cause un-necessary breakge, due to UFCS not triggering if the 
member function exists:

In the sense that
put(appender!string(), myDstring); //Yes
appender!string().put(myDstring); //Nope

Dec 05 2013

"inout" <inout gmail.com> writes:

On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei 
Alexandrescu wrote:
 Hello,


 Walter and I were talking about eliminating the surreptitious 
 allocations in buildPath:



 We'd need to keep the existing version working, so we're 
 looking at adding one or more new overloads. We're looking at 
 giving the user the option to control any needed memory 
 allocation (or even arrange things such that there's no memory 
 allocated at all).

 It's a generous design space, so although we have a couple of 
 ideas let's hear others first.


 Thanks,

 Andrei

My approach in cases like that used to be to pass an optional 
char[] buffer = null as last argument (it doesn't work as nice in 
variadic functions though):

// constructs the result in a buffer if one is passed until there 
is no more space left, in which case it reallocates the buffer
char[] buildPath(const(char)[] path1, const(char)[] path2, char[] 
buffer = null) {
   Appender!(char) appender = useBuffer(buffer);
   ...
}

Dec 05 2013

"Brad Anderson" <eco gnuk.net> writes:

On Thursday, 5 December 2013 at 17:30:37 UTC, inout wrote:
 My approach in cases like that used to be to pass an optional 
 char[] buffer = null as last argument (it doesn't work as nice 
 in variadic functions though):

 // constructs the result in a buffer if one is passed until 
 there is no more space left, in which case it reallocates the 
 buffer
 char[] buildPath(const(char)[] path1, const(char)[] path2, 
 char[] buffer = null) {
   Appender!(char) appender = useBuffer(buffer);
   ...
 }

The nice thing about output ranges instead of this approach is 
the allocation strategy is left up to the caller rather than the 
function.  I think generalized SSO (a small static buffer that 
can fall back to a dynamic allocation) will become a very common 
idiom as more and more of phobos accepts output ranges.

Dec 05 2013

D Programming

C/C++ Programming

Other

digitalmars.D - Use case: eliminate hidden allocations in buildPath