digitalmars.D - Re: why allocators are not discussed here

"H. S. Teoh" <hsteoh quickfur.ath.cx> Jun 26 2013

"Adam D. Ruppe" <destructionator gmail.com> Jun 26 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jun 26, 2013 at 01:16:31AM +0200, Adam D. Ruppe wrote:
 On Tuesday, 25 June 2013 at 22:50:55 UTC, H. S. Teoh wrote:
And maybe (b) can be implemented by making gc_alloc / gc_free
overridable function pointers? Then we can override their values
and use scope guards to revert them back to the values they were
before.


 Yea, I was thinking this might be a way to go. You'd have a global
 (well, thread-local) allocator instance that can be set and reset
 through stack calls.
 
 You'd want it to be RAII or delegate based, so the scope is clear.
 
 with_allocator(my_alloc, {
      do whatever here
 });
 
 
 or
 
 {
    ChangeAllocator!my_alloc dummy;
 
    do whatever here
 } // dummy's destructor ends the allocator scope
 
 
 I think the former is a bit nicer, since the dummy variable is a bit
 silly. We'd hope that delegate can be inlined.


Actually, D's frontend leaves something to be desired when it comes to
inlining delegates. It *is* done sometimes, but not as often as one may
like. For example, opApply generally doesn't inline its delegate, even
when it's just a thin wrapper around a foreach loop.

But yeah, I think the former has nicer syntax. Maybe we can help the
compiler with inlining by making the delegate a compile-time parameter?
But it forces a switch of parameter order, which is Not Nice (hurts
readability 'cos the allocator argument comes after the block instead of
before).
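
A minimal sketch of the two scoping styles discussed above, assuming a hypothetical `Allocator` struct of two function pointers and a thread-local `currentAllocator`. None of these names exist in druntime or Phobos; they only illustrate how the override is saved and restored.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical allocator interface: a pair of overridable function pointers.
struct Allocator
{
    void* function(size_t) alloc;
    void function(void*) release;
}

void* mallocAlloc(size_t n) { return malloc(n); }
void mallocFree(void* p) { free(p); }

int countedAllocs; // module-level variables are thread-local in D
void* countingAlloc(size_t n) { ++countedAllocs; return malloc(n); }

// The "global (well, thread-local) allocator instance".
Allocator currentAllocator = Allocator(&mallocAlloc, &mallocFree);

// Delegate-based form: the override lasts exactly as long as dg runs.
void withAllocator(Allocator a, scope void delegate() dg)
{
    auto saved = currentAllocator;
    currentAllocator = a;
    scope(exit) currentAllocator = saved; // restored even if dg throws
    dg();
}

// RAII form: the destructor of the guard variable ends the allocator scope.
struct ChangeAllocator
{
    private Allocator saved;
    @disable this();
    @disable this(this);
    this(Allocator a) { saved = currentAllocator; currentAllocator = a; }
    ~this() { currentAllocator = saved; }
}

void main()
{
    auto counting = Allocator(&countingAlloc, &mallocFree);
    immutable before = countedAllocs;

    bool gotMemory;
    withAllocator(counting, {
        auto p = currentAllocator.alloc(16);
        gotMemory = p !is null;
        currentAllocator.release(p);
    });
    assert(gotMemory);
    assert(countedAllocs == before + 1);
    assert(currentAllocator.alloc is &mallocAlloc); // reverted

    {
        auto guard = ChangeAllocator(counting);
        currentAllocator.release(currentAllocator.alloc(16));
    } // guard's destructor reverts the allocator here
    assert(countedAllocs == before + 2);
    assert(currentAllocator.alloc is &mallocAlloc);
}
```

The delegate form keeps the allocator argument in front of the block; the RAII form needs the dummy variable but involves no delegate call.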


 But, the template still has a big advantage: you can change the
 type. And I think that is potentially enormously useful.


True. It can use different types for different allocators that do (or
don't do) cleanups at the end of the scope, depending on what the
allocator needs to do.


 Another question is how to tie into output ranges. Take std.conv.to.
 
 auto s = to!string(10); // currently, this hits the gc
 
 What if I want it to go on a stack buffer? One option would be to
 rewrite it to use an output range, and then call it like:
 
 char[20] buffer;
 // it returns the slice of the buffer it actually used
 auto s = to!string(10, buffer);
 
 (and we can do overloads so to!string(10, radix) still works, as
 well as to!string(10, radix, buffer). Hassle, I know...)
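
 Something close to this can already be sketched with std.format.sformat,
 which formats into a caller-supplied buffer and returns the slice it
 used. The toBuffer wrapper below is a hypothetical name, not a Phobos
 function, and it is only an approximation of the proposed overload.

```d
import std.format : sformat;

// Hypothetical stand-in for the proposed to!string(value, buffer):
// formats into the caller's buffer and returns the part that was written.
char[] toBuffer(T)(T value, char[] buf)
{
    return sformat(buf, "%s", value);
}

void main()
{
    char[20] buffer;

    char[] s = toBuffer(10, buffer[]);
    assert(s == "10");
    assert(s.ptr is buffer.ptr); // s aliases the stack buffer, no copy made

    // A radix can be expressed through the format string instead of an
    // extra parameter. This overwrites the start of the same buffer.
    char[] h = sformat(buffer[], "%x", 255);
    assert(h == "ff");
}
```

 The returned slice is only valid while the buffer is, which is the same
 borrowed-slice caveat that applies to any stack buffer.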


I think supporting the multi-argument version of to!string() is a good
thing, but what to do with library code that calls to!string()? It'd be
nice if we could somehow redirect those GC calls without having to comb
through the entire Phobos codebase for stray calls to to!string().


[...]
 The fun part is the output range works for that, and could also work
 for something like this:
 
 struct malloced_string {
     char* ptr;
     size_t length;
     size_t capacity;
     void put(char c) {
         if(length >= capacity) {
            // grow geometrically; start at 16 since capacity begins at 0
            capacity = capacity ? capacity * 2 : 16;
            ptr = cast(char*) realloc(ptr, capacity);
         }
         ptr[length++] = c;
     }
 
     char[] slice() { return ptr[0 .. length]; }
     alias slice this;
     mixin RefCounted!this; // pretend this works
 }
 
 
 {
    malloced_string str;
    auto got = to!string(10, str);
 } // str is out of scope, so it gets free()'d. unsafe though: if you
 stored a copy of got somewhere, it is now a pointer to freed memory.
 I'd kinda like language support of some sort to help mitigate that
 though, like being a borrowed pointer that isn't allowed to be
 stored, but that's another discussion.


Nice!
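
A runnable variant of the malloc-backed output range, as a sketch only. It swaps the reference counting in the quoted code for a simpler technique, a non-copyable struct with a destructor, so that exactly one owner calls free(). `MallocedString` is an illustrative name, and the direct put() call stands in for the hypothetical to!string(10, str).

```d
import core.stdc.stdlib : realloc, free;

struct MallocedString
{
    private char* ptr;
    private size_t length;
    private size_t capacity;

    @disable this(this);    // no copies, so there is a single owner
    ~this() { free(ptr); }  // free(null) is a no-op for an unused string

    void put(char c)
    {
        if (length == capacity)
        {
            immutable newCap = capacity ? capacity * 2 : 16;
            auto p = cast(char*) realloc(ptr, newCap);
            assert(p !is null, "out of memory");
            ptr = p;
            capacity = newCap;
        }
        ptr[length++] = c;
    }

    void put(const(char)[] s)
    {
        foreach (c; s)
            put(c);
    }

    // Borrowed view: must not outlive this MallocedString.
    char[] slice() { return ptr[0 .. length]; }
}

void main()
{
    {
        MallocedString str;
        str.put("10");             // stand-in for to!string(10, str)
        char[] got = str.slice;
        assert(got == "10");

        str.put("0123456789abcdefXYZ"); // forces at least one regrow
        assert(str.slice.length == 21);
        assert(str.slice[0 .. 2] == "10");
    } // str goes out of scope here and its buffer is free()'d
}
```

Note that `got` is stale after the second put() if realloc moved the buffer, which is the dangling-slice hazard described in the quoted post.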


 And that should work. So then what we might do is provide these
 little output range wrappers for various allocators, and use them on
 many functions.
 
 So we'd write:
 
 import std.allocators;
 import std.range;
 
 // mallocator is provided in std.allocators and offers the goods
 OutputRange!(char, mallocator) str;
 
 auto got = to!string(10, str);


I like this. However, it still doesn't address how to override the
default allocator in, say, Phobos functions.


 What's nice here is the output range is useful for more than just
 allocators. You could also to!string(10, my_file) or a delegate,
 blah blah blah. So it isn't too much of a burden, it is something
 you might naturally use anyway.


Now *that* is a very nice idea. I like having a way of bypassing the
intermediate string buffer, and just writing the output directly to where
it's intended to go. I think to() with an output range parameter definitely
should be implemented. It doesn't address all of the issues, but it's a
very big first step IMO.
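
To illustrate why an output range parameter is more general than a buffer parameter, here is a small sketch. `writeDecimal` is a hypothetical helper, not a Phobos function; the only library pieces assumed are std.array.appender and the put / isOutputRange primitives from std.range.primitives.

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

// Hypothetical helper: writes value in decimal to any char output range.
void writeDecimal(Sink)(ref Sink sink, uint value)
    if (isOutputRange!(Sink, char))
{
    char[10] digits;            // uint.max has 10 decimal digits
    size_t i = digits.length;
    do
    {
        digits[--i] = cast(char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    put(sink, digits[i .. $]);  // the sink decides where the text goes
}

void main()
{
    // Sink 1: a GC-backed appender, the "normal" string result.
    auto app = appender!string();
    writeDecimal(app, 10);
    assert(app.data == "10");

    // Sink 2: a delegate, which could just as well write to a file or
    // socket. This one only counts characters and stores nothing.
    size_t count;
    auto counter = (const(char)[] s) { count += s.length; };
    writeDecimal(counter, 12345);
    assert(count == 5);
}
```

The formatting code never mentions an allocator; the caller picks one implicitly by choosing the sink.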


Also, we may have the problem of the wrong allocator
being used to free the object.


 Another reason why encoding the allocator into the type is so nice.
 For the minimal D I've been playing with, the idea I'm running with
 is all allocated memory has some kind of special type, and then
 naked pointers are always assumed to be borrowed, so you should
 never store or free them.


Interesting idea. So basically you can tell which allocator was used to
allocate an object just by looking at its type? That's not a bad idea,
actually.


 auto foo = HeapArray!char(capacity);
 
 void bar(char[] lol){}
 
 bar(foo); // allowed, foo has an alias this on slice


This is nice. Hooray for alias this. :)
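
A rough sketch of how such a type could look, assuming malloc/free as the backing allocator. The quoted post says foo frees itself with refcounting; this sketch uses the simpler technique of a non-copyable owner instead, and `HeapArray` here is an illustration, not an existing library type.

```d
import core.stdc.stdlib : malloc, free;

struct HeapArray(T)
{
    private T* ptr;
    private size_t len;

    @disable this(this); // single owner instead of refcounting

    this(size_t n)
    {
        ptr = cast(T*) malloc(n * T.sizeof);
        assert(ptr !is null || n == 0, "out of memory");
        len = n;
        ptr[0 .. n] = T.init;
    }

    ~this() { free(ptr); }

    // Borrowed view of the storage; callers must not store or free it.
    T[] slice() { return ptr[0 .. len]; }
    alias slice this;
}

// Ordinary functions keep taking plain slices.
size_t bar(char[] lol) { return lol.length; }
void fill(char[] lol, char c) { lol[] = c; }

void main()
{
    auto foo = HeapArray!char(8);
    assert(bar(foo) == 8);      // allowed, foo has an alias this on slice
    fill(foo, 'x');             // writes go through to foo's storage
    assert(foo.slice[0] == 'x');
    assert(foo.slice[7] == 'x');
} // foo's destructor frees the memory here
```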


 // but....
 
 struct A {
    // not allowed, because you don't know when lol is going to be freed
    char[] lol;
 }
 
 
 foo frees itself with refcounting.


This is a bit inconvenient. So your member variables will have to know
what allocation type is being used. Not the end of the world, of course,
but not as pretty as one would like.


On Wed, Jun 26, 2013 at 03:24:57AM +0200, Adam D. Ruppe wrote:
 I was just quickly skimming some criticism of C++ allocators, since
 my thought here is similar to what they do. On one hand, maybe D can
 do it right by tweaking C++'s design rather than discarding it.
 
 On the other hand, with all the C++ I've done, I have never actually
 used STL allocators, which could say something about me or could say
 something about them.
 
 
 One thing I saw said making the differently allocated object a
 different type sucks. ...but must it? The complaint there was "so
 much for just doing a function that takes a std::string". But, the
 way I'd want to do it in D is the function would take a char[]
 instead, and our special allocated type provides that via opSlice
 and/or alias this.


Yeah, I think alias this adds a whole new factor into the equation.
Having a distinct type makes it much easier to implement, and allows
you to mix differently-allocated objects without having to worry about
things like calling the right version of gc_free to clean up properly.
You can even have the same underlying data type be allocated in two
different ways, and the cleanup will happen correctly.

Basically, when you allocate some object O of class C using allocator A,
then it follows that no matter what you do with the gc_alloc/gc_free
function pointers afterwards, O must be freed using A.free. So in a
sense, O needs to carry around a function pointer to A.free in its dtor
(or whoever frees it). So this actually argues for having a distinct
type for an instance of C allocated using A, vs. an instance of C
allocated using a different allocator B. You need to store that function
pointer to A.free and B.free *somewhere*, otherwise things won't work
properly.
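
One way to picture this is a sketch where the allocator is a compile-time parameter of the owning type, so the "pointer to A.free" lives in the type rather than in each object. `CountingMalloc` and `Owned` are hypothetical names made up for this example.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical allocator; each tag yields a distinct type with its own
// (thread-local) counter of live allocations.
struct CountingMalloc(string tag)
{
    static int live;
    static void* alloc(size_t n) { ++live; return malloc(n); }
    static void release(void* p) { --live; free(p); }
}

alias AllocA = CountingMalloc!"A";
alias AllocB = CountingMalloc!"B";

// The allocator is part of the type, so the destructor always knows
// which release function matches the allocation.
struct Owned(T, Alloc)
{
    private T* ptr;

    @disable this(this);

    static Owned make(T value)
    {
        Owned o;
        o.ptr = cast(T*) Alloc.alloc(T.sizeof);
        assert(o.ptr !is null, "out of memory");
        *o.ptr = value;
        return o;
    }

    ~this() { if (ptr) Alloc.release(ptr); }

    ref T get() { return *ptr; }
}

void main()
{
    {
        auto a = Owned!(int, AllocA).make(1);
        auto b = Owned!(int, AllocB).make(2);
        static assert(!is(typeof(a) == typeof(b))); // distinct types
        assert(AllocA.live == 1 && AllocB.live == 1);
        assert(a.get() + b.get() == 3);
    }
    // Each object was released through the allocator that created it.
    assert(AllocA.live == 0 && AllocB.live == 0);
}
```

Storing a runtime function pointer in each object would also work and would keep a single type, at the cost of one extra word per object.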


[...]
 Anyway, bottom line is I don't think that criticism necessarily
 applies to D.


Agreed: in D, having distinct types per allocator is, at the very least,
not as bad as it is in C++.


 But there's surely many others and I'm more or less a
 n00b re c++'s allocators so idk yet.


Who *isn't* a n00b wrt C++'s allocators, since so few people actually
use them? :-P


T

-- 
He who sacrifices functionality for ease of use, loses both and deserves
neither. -- Slashdotter

Jun 26 2013

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Wednesday, 26 June 2013 at 16:40:20 UTC, H. S. Teoh wrote:
 I think supporting the multi-argument version of to!string() is 
 a good thing, but what to do with library code that calls 
 to!string()? It'd be nice if we could somehow redirect those GC 
 calls without having to comb through the entire Phobos codebase 
 for stray calls to to!string().



Let's consider what kinds of allocations we have. We can break 
them up into two broad groups: internal and visible.

Internal allocations, in theory, don't matter. These can be on 
the stack, the gc heap, malloc/free, whatever. The function 
itself is responsible for their entire lifetime.

Changing these either optimizes, in the case of reusing a region, 
or leaks if you switch it to manual and the function doesn't know 
it.

Visible allocations are important because the caller is 
responsible for freeing them. Here, I really think we want the 
type system's help: either it should return something that we 
know we're responsible for, or take a buffer/output range from us 
to receive the data in the first place.

Either way, the function signature should reflect what's going on 
with visible allocations. It'd possibly return a wrapped type and 
it'd take an output range/buffer/allocator.



With internals though, the only reason I can see why you'd want 
to change them outside the function is to give them a region of 
some sort to work with, especially since you don't know for sure 
what it is doing - these are all local variables to the 
function/call stack. And here, I don't think we want to change 
the allocator wholesale.

At most, we'd want to give it hints that what we're doing is 
short lived. (Or, better yet, have it figure this out on its own, 
like a generational gc.)



So I think this is more about tweaking the gc than replacing it, 
at most adding a couple new functions to it:

GC.hint_short_lived // returns a helper struct with a static refcount:

struct TempGcAllocator {
      static int tempCount = 0;
      static void* localRegion;
      this() { tempCount++; } // pretend this works
      ~this() {
            tempCount--;
            if(tempCount == 0)
                  gc.tryToCollect(localRegion);
      }

      T create(T, Args...)(Args args) { return GC.new_short_lived T(args); }
}


and gc.tryToCollect() does a quick scan for anything pointing into 
the local region. If there's nothing, it frees the whole thing. If 
there is, in the name of memory safety, it just reintegrates that 
local region into the regular memory and gc's its components 
normally.



The reason the count is static is that you don't have to pass 
this thing down the call stack. Any function that wants to adapt 
to this generational hint system just calls hint_short_lived. If 
you're a leaf function, that's ok, the static count means you'll 
inherit the region from the function above you.

You would NOT use this in main(), as that defeats the purpose.
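
A sketch of just the static nesting count, with no real GC involved. `ShortLivedHint`, `open` and `regionsReleased` are hypothetical names; the counter bump in the destructor stands in for where gc.tryToCollect(localRegion) would be called. A static factory is used because D structs cannot have a default constructor.

```d
struct ShortLivedHint
{
    static int depth;            // thread-local nesting count
    static int regionsReleased;  // stand-in for gc.tryToCollect calls
    private bool active;

    @disable this(this);

    static ShortLivedHint open()
    {
        ShortLivedHint h;
        h.active = true;
        ++depth;
        return h;
    }

    ~this()
    {
        if (!active)
            return;              // a default-initialized hint does nothing
        if (--depth == 0)
            ++regionsReleased;   // outermost hint closed: try to collect
    }
}

void leaf()
{
    auto hint = ShortLivedHint.open();
    assert(ShortLivedHint.depth >= 1); // inherits any enclosing region
}

void outer()
{
    immutable before = ShortLivedHint.regionsReleased;
    auto hint = ShortLivedHint.open();
    leaf();
    leaf();
    // The leaves did not end the region because outer still holds a hint.
    assert(ShortLivedHint.regionsReleased == before);
}

void main()
{
    immutable before = ShortLivedHint.regionsReleased;
    outer();
    assert(ShortLivedHint.depth == 0);
    assert(ShortLivedHint.regionsReleased == before + 1);

    leaf(); // called on its own, a leaf opens and closes its own region
    assert(ShortLivedHint.regionsReleased == before + 2);
}
```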


 I think to() with an output range parameter definitely
 should be implemented.


No doubt about it, we should aim for most phobos functions not to 
allocate at all, if given an output range they can use.


 Interesting idea. So basically you can tell which allocator was 
 used to allocate an object just by looking at its type?


Right, then you'll know if you have to free() it. (Or it can free 
itself with its destructor.)


 This is a bit inconvenient. So your member variables will have 
 to know what allocation type is being used. Not the end of the
 world, of course, but not as pretty as one would like.


Yeah, you'd need to know if you own them or not too (are you 
responsible for freeing that string you just got passed? If no, 
are you sure it won't be freed while you're still using it?), but 
I just think that's a part of memory management you can't 
sidestep.

There's two easy answers: 1) always make a private copy of 
anything you store (and perhaps write to) or 2) use a gc and 
trust it to always be the owner.

In any other case, I think you *have* to think about it, and the 
type telling you can help you make that decision.


 and allows you to mix differently-allocated objects without 
 having to


Important to remember though that you are borrowing these 
references, not taking ownership.

I think the rule that all pointers/slices are borrowed is fairly 
workable though. With the gc, that's ok, you don't own anything. 
The garbage collector is responsible for it all, so store away. 
(Though if it is mutable, you might want to idup it so you don't 
get overwritten by someone else. But that's a separate question 
from allocation method.... and already encoded in D's type 
system).

So never free() a naked pointer unless you know what you're 
doing, like interfacing with a C library. Prefer to only free a 
ManuallyAllocated!(pointer).

Hell, a C library binding could change the type too; it'd still be 
binary compatible. RefCounted!T wouldn't be, but 
ManuallyAllocated!T would just be a wrapper around T*.
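
A sketch of what such a wrapper might look like. `ManuallyAllocated`, `dupz` and `release` are hypothetical names. The static assert only checks that the wrapper has the same size as a raw pointer; whether it is passed and returned like one in every C calling convention is an assumption this sketch does not test.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy, strlen;

// Marks a pointer the holder is responsible for freeing.
struct ManuallyAllocated(T)
{
    T* ptr;
    inout(T)* borrow() inout { return ptr; } // naked pointer: borrowed only
}

// Same size as the pointer it wraps.
static assert(ManuallyAllocated!char.sizeof == (char*).sizeof);

// Returns a malloc'd, zero-terminated copy; the type says "you free this".
ManuallyAllocated!char dupz(const(char)[] s)
{
    auto p = cast(char*) malloc(s.length + 1);
    assert(p !is null, "out of memory");
    memcpy(p, s.ptr, s.length);
    p[s.length] = '\0';
    return ManuallyAllocated!char(p);
}

// The only function that frees, and it only accepts the wrapper.
void release(T)(ref ManuallyAllocated!T m)
{
    free(m.ptr);
    m.ptr = null;
}

// Takes a naked pointer, so by convention it neither stores nor frees it.
size_t borrowedLength(const(char)* s) { return strlen(s); }

void main()
{
    auto owned = dupz("hello");
    assert(borrowedLength(owned.borrow) == 5);
    release(owned);
    assert(owned.ptr is null);
}
```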

I think I'm starting to ramble!

Jun 26 2013
